Introduction

At times us developers have to deal with delimited text files in our applications. Such files have been around since yonks and I often come across data import/export tasks where delimited files are used. Till now the common way in .NET has been to read each line and then extract data using some sort of creative string functions within for loops. But there is another way by using LINQ. In this tutorial I will show you how to use LINQ to read such data. By the end of tutorial you will appreciate how easy and logical it is to use LINQ for reading data from delimited text files.

Sample Data

I will use a sample file which contains a data about customers. When working with text files we must know the number of columns and the data contained in each column. Below is a list of columns in their right order for our file.

  1. First Name
  2. Last Name
  3. Job Title
  4. City
  5. Country

The file itself will contain this data. I have pulled this out of Employees table in Northwind database.

image

Reading Data

Before we start reading our csv file we will create a class which will hold a record we will read from our csv file. For this I will create a customer class which looks like this. I know I pulled data from Employees table so just imagine that data above is customer data. Employees can also be customers ;)

public class Customer
{
 string Firstname { get; set; }
 string Lastname { get; set; }
 string JobTitle { get; set; }
 string City { get; set; }
 string Country { get; set; }
}
Reading Entire File

Now we are ready to read data from our file using LINQ. Using this code we can read the entire file. I am also using a foreach statement to output the results.

var query =
 from line in File.ReadAllLines(filePath)
 let customerRecord = line.Split(',')
 select new Customer()
 {
 Firstname = customerRecord[0],
 Lastname = customerRecord[1],
 JobTitle = customerRecord[2],
 City = customerRecord[3],
 Country = customerRecord[4]
 };
foreach (var item in query)
{
 Console.WriteLine("{0}, {1}, {2}, {3}, {4}",
 item.Firstname, item.Lastname, item.JobTitle, item.City, item.Country);
}

File.ReadAllLines() returns an array of lines and we then use the split function of array to split it by a comma. Its just that simple.

Reading selected records

We can use this code to read all customers who live in UK.

var query =
 from c in
 (from line in File.ReadAllLines(filePath)
 let customerRecord = line.Split(',')
 select new Customer()
 {
 Firstname = customerRecord[0],
 Lastname = customerRecord[1],
 JobTitle = customerRecord[2],
 City = customerRecord[3],
 Country = customerRecord[4]
 })
 where c.Country == "UK"
 select c;

This code can be used to read customers who have sales in their job title.

var query =
 from c in
 (from line in File.ReadAllLines(filePath)
 let customerRecord = line.Split(',')
 select new Customer()
 {
 Firstname = customerRecord[0],
 Lastname = customerRecord[1],
 JobTitle = customerRecord[2],
 City = customerRecord[3],
 Country = customerRecord[4]
 })
 where c.JobTitle.Contains("Sales")
 select c;

I am sure that above queries can be polished by using a bit more syntax sugar but I am just too excited to see LINQ working with a csv file.

Conclusion

LINQ makes it extremely simple to work with delimited text files. Once we have the records from a csv file in an object we can use all the power of LINQ to query our hearts out on such files. This functionality is available to us via LINQ to objects and we do not need another flavour of LINQ to achieve this. Stay tuned for more posts on LINQ.

kick it on DotNetKicks.com

Technorati Tags: ,,

Tagged with:
 

16 Responses to Tutorial Reading A Text File Using LINQ

  1. Deepak says:

    Thanks Mike

  2. I’m really fascinated by the fact that LINQ can be used in a great variety of scenarios.

    I remember I was a bit confused when I saw it for the first time. I was primary frustrated by the fact that the first operator was “from” and “select” was the last one. But then I understood the reasons.

  3. Deepak says:

    Mike,

    The best thing about LINQ is that it comes with such a small learning curve for C# and VB.NET programmers. And if you have written any SQL then after writing few LINQ queries you feel like you knew it all the time.

    In my case I feel that after a long time something (LINQ) excites me so much that I want to learn every little detail about it.

  4. JabberBoxer says:

    Just wanted to share the code we used to extend the functionality of your read CSV, this will now read your CSV into multiple objects.

    ‘~ Example of CSV file
    ‘~ “InvoiceNumber”,”InvoiceDate”, “FirstName”, “LastName”
    ‘~ “11111″,”9/1/2008″, “Jerrame”, “Hertz”
    ‘~ “22222″,”9/2/2008″, “John”, “Doe”
    ‘~ “33333″,”9/3/2008″, “Jane”, “Doe”

    Imports System.IO

    Module Module1
    Sub Main()
    ReadCSV(“c:\test.csv”)
    End Sub
    End Module

    Module LINQ2CSV
    Public Sub ReadCSV(ByVal FilePath As String)
    ‘~ Load the values from the CSV file.
    Dim Orders = From line In File.ReadAllLines(FilePath) _
    Let OrderRecord = line.Split(“,”) _
    Select New Invoice() _
    With {.Order = New Order( _
    TrimQoutes(OrderRecord(0)), _
    TrimQoutes(OrderRecord(1))), _
    .Customer = New Customer( _
    TrimQoutes(OrderRecord(0)), _
    TrimQoutes(OrderRecord(2)), _
    TrimQoutes(OrderRecord(3)))}

    End Sub

    ‘~ Remove qoutes from string. May be able to avoid this if I use the
    ‘~ parse object instead of the the file object.
    Private Function TrimQoutes(ByVal value As String) As String
    Return value.Trim.Replace(“”"”, “”)
    End Function

    End Module
    ‘~ Order object.
    Public Class Invoice
    Public [Order] As Order
    Public [Customer] As Customer
    End Class

    Public Class Order
    Public InvoiceNumber As String
    Public InvoiceDate As String

    Public Sub New(ByVal InvoiceNumber As String, ByVal InvoiceDate As String)
    Me.InvoiceNumber = InvoiceNumber
    Me.InvoiceDate = InvoiceDate
    End Sub
    End Class

    Public Class Customer
    Public InvoiceNumber As String
    Public FirstName As String
    Public LastName As String

    Public Sub New(ByVal InvoiceNumber As String, ByVal FirstName As String, ByVal LastName As String)
    Me.InvoiceNumber = InvoiceNumber
    Me.FirstName = FirstName
    Me.LastName = LastName
    End Sub

    End Class

  5. [...] from a text file. Firstly, I read the contents of the file into a collection using an approach I described in an earlier post [...]

  6. Andrew says:

    Awesome, thanks exactly what I was looking for.

  7. Deepak says:

    Andrew,

    Thanks for your comment.

  8. Vish says:

    Hi,

    Can you suggest me how can I import a CSV file (pretty large in size) to a SQL DB using Linq in Silverlight ASP .NET project.

    I have written a test app to read the records from the DB and display them on the grid using Linq.

    How would I do bulk insert from such a large file (it has close to 50+ columns)?

    Thanks!

  9. richard says:

    the only problem I encounter here is when there’s a comma “,” included in a field.

  10. Deepak says:

    Hi Richard,

    I know what you mean. In fact this has been my number one issue with flat files. If I find a comma within a field and text is not enclosed in say double quotes then I consider it to be an error and skip the record.

  11. Vijaya says:

    This article is good.
    but it would be more helpful if you included code to convert the object into a record to be written to a database.
    for example, i want to read the text file and write all records of UK employees in sales dept to the database.

    but this was still helpful.
    keep up the good work.

  12. Deepak says:

    Thanks Vijaya.

  13. Gil says:

    I found a solution for the comma problem by using Linq To Csv.

    What I didn’t like much was that you have to declare all the variables inside the select and the csv columns must match the array declarations or everything will be messup… i.e. if you move the first column in the csv file to the end the Firstname = customerRecord[0] will get the LastName and so on…

    I tried doing a loop inside the select but there’s no way (as long as I know) to dynamically declare these variables with the column’s name and get the value from the array (i.e. FirstName = customerRecord[0])… does someone have any idea on how to do this?

  14. dan says:

    hey good example man. really helped me. cheers.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>