Wednesday, 5 March 2008

Introduction To LINQ

Posted by on Sunday, December 23, 2007

This article is an introduction to LINQ and provides examples of using LINQ to query objects, XML, and relational data.

Language Integrated Query (LINQ) gives .NET developers the ability to query and transform data using their .NET language of choice. The data can live inside XML documents, inside relational database tables, or inside collections of objects. What truly distinguishes LINQ from other data access APIs in .NET however, is LINQ's deep integration into the .NET languages like C# and VB. In this article, we will explore the basic features of LINQ and demonstrate this beautiful integration.

Manipulating Data

Traditionally, your data source determines your selection of a low-level data access API. To fetch relational data from a table, you can use components based on ADO.NET. To parse XML data, you might use some combination of XPath and XML DOM APIs. Each data access API requires its own specialized syntax (SQL queries, as an example), and typically forces a developer to shuffle data between their business objects and the data access API. Sophisticated frameworks like object-relational mappers (OR/M) can take much of the grunge work out of data shuffling, but these frameworks still impose their own APIs on a developer.

All of the diverse data access strategies and APIs means there is a tremendous amount of mental energy expended to build the basic data features needed in every business application. LINQ is poised to eliminate this mental overhead.

Designing LINQ

Microsoft designed LINQ to provide general-purpose query facilities to the .NET platform. More specifically, LINQ defines a set of standard query operators in the System.Linq namespace to select, filter, aggregate, and partition data from any type that implements the IEnumerable or IEnumerable<T> interfaces. The standard query operators are compliant with the Common Language Specification (CLS) and work inside any .NET language that supports generics.

In the following C# code, we're using LINQ to query a collection of strings.

string[] instructors = { "Aaron", "Fritz", "Keith", "Scott" };

IEnumerable<string> query =  from n in instructors
                            
where n.Length == 5
                            
orderby n descending
                            
select n;

foreach (string name in query)
{
    
Console.WriteLine(name);
}                                        

Inside the code is a LINQ query expression. The code will print out all the instructor names in descending alphabetical order. The next query produces the same results, but queries data inside an XML fragment.

XElement instructors = XElement.Parse(
              
@"<instructors>
                   <instructor>Aaron</instructor>
                   <instructor>Fritz</instructor>
                   <instructor>Keith</instructor>
                   <instructor>Scott</instructor>
                 </instructors>"
            );

IEnumerable<string> query = from n in instructors.Elements("instructor")
                            
where n.Value.Length == 5
                            
orderby n.Value descending
                            select n.Value;

foreach (string name in query)
{
    
Console.WriteLine(name);
}  

Although we made some slight changes to dig into the XML, the operators inside the query expression remain the same. LINQ query expressions are similar to SQL SELECT commands with from, where, orderby, group, and select clauses.

Microsoft also designed LINQ to integrate closely with .NET languages and their environment. When typing the above queries, Visual Studio provides Intellisense for auto-completion. The compiler can perform syntax and type checking on query expressions. Take the following code as an example:

string[] instructors = { "Aaron", "Fritz", "Keith", "Scott" };

IEnumerable<string> query = from n in instructors
                            
where n.Length == 5
                            
orderby n descending
                            select n.Length;

This code will generate an error. Variables in a query expression all have strong types. The query is selecting the Length property of each string, thus retuning a set of integers (IEnumerable<int>). Compilation will fail because it's illegal to assign this result to a variable of type IEnumerable<string>.

Finally, Microsoft designed LINQ to be extensible. The query operators we used earlier to query an array of strings are the same query operators used to query data in an XML document, or data in a SQL Server database. As developers, we are free to implement additional query operators for our own applications. We can even replace the implementation of the built-in standard query operators for our own types. Best of all, any customizations we make enjoy the same level of support and integration as the standard operators.

LINQ Query Essentials

A LINQ query retrieves data from a data source. As mentioned earlier, any type implementing IEnumerable or IEnumerable<T> is a valid source of data and is known as a queryable type. Array, List<T>, and Dictionary<K,T> are all queryable types inside the Base Class Library. We can define our own custom types implementing IEnumerable<T>, and LINQ will work on our custom types with no additional modifications.

A LINQ provider is a gateway for LINQ to act upon data that is not inside a queryable type. LINQ providers can give LINQ access to data inside SQL Server, inside XML, and more. Microsoft currently ships the 5 LINQ technologies shown in the diagram below.

LINQ Technologes

 

LINQ to Objects

Anytime you are working with a collection, like an array or a list, you have an opportunity to use LINQ to Objects. Generally speaking, we can replace foreach loops and other collection iterating code with concise, readable LINQ queries.

As an example, let's imagine we need to get a list of all the svchost.exe processes running on a machine and return them sorted according to their working set. A .NET 2.0 solution might look like the following:

List<Process> processList = new List<Process>(Process.GetProcesses());

processList =
    processList.FindAll(
delegate(Process p)
                        {
                            
return String.Equals(p.ProcessName, "svchost");
                        });

processList.Sort(
delegate(Process p1, Process p2)
                 {
                     
return p2.WorkingSet64.CompareTo(p1.WorkingSet64);
                 });
return processList;

Here we use Process.GetAllProcesses to retrieve a list of all running processes. At this point we could use foreach to loop through the array and find each svchost process, but this code achieves the same effect using List.FindAll and an anonymous method (the anonymous method provides the search logic). We can sort the List using a similar approach – invoke the List.Sort method and provide an anonymous method that implements the comparison logic.

Now, let's use LINQ to produce the same result:

IEnumerable<Process> processList =
    from p in Process.GetProcesses()
   
where String.Equals(p.ProcessName, "svchost")
 
  orderby p.WorkingSet64 descending
    select
p;
return processList;

The from clause in the query specifies the data source. In this case we use only a single source of data (an array of all running processes), but a query can use multiple data sources. The where clause applies a filter, while orderby will sort the sequence of processes. Finally, the select clause is always found at the bottom of a LINQ query expression and determines what the query will return when we execute the query.

A SQL savvy reader will think our LINQ query appears backwards. In LINQ, the from clause appears first and the select clause appears last. In contrast, the SELECT clause always appears first in a SQL query. One reason for LINQ to structure queries in this fashion is because the IDE can see the data source before we begin typing the where, orderby, and select clauses. With the data source information available the IDE can provide Intellisense support when we write the query.

Deferred Execution

It's important to understand that the result of a LINQ query expression is not a sequence or collection of objects. Instead, the query expression returns a query object. This query object represents the commands needed to execute the query (an expression tree). The query does not execute until the program requests data from the query object. This behavior is known as deferred execution and is a powerful feature of LINQ. Deferred execution facilitates the composability of queries and allows applications to pass queries around as data.

A query object implements the IQueryable<T> interface, which inherits from IEnumerable<T> - meaning we can still treat the query object like a collection. The query executes when we iterate the collection.

As an example, let's use the following class which represents information about an executing process. The class represents a summary of the total information in the regular System.Diagnostics.Process class because we are storing just the process ID and name. Notice the constructor will print a message when each time we construct a ProcessSummary object.

public class ProcessSummary
{
    
public ProcessSummary(int id, string name)
    {
        ID = id;
        Name = name;
        
Console.WriteLine("ProcessSummary created for {0}", id);
    }
    
public int ID { get; set; }
    
public string Name { get; set; }
}

We'll use the ProcessSummary class in the following code. Notice how the select clause of the LINQ query creates a new ProcessSummary object for each Process.

IEnumerable<ProcessSummary> query =
    
from p in Process.GetProcesses()
    
select new ProcessSummary(p.Id, p.ProcessName);

Console.WriteLine("Query complete");

foreach (ProcessSummary summary in query)
{
    
Console.WriteLine("{0,10}({1,4})", summary.Name, summary.ID);
}                                    

Running the code produces the following output. Notice that the program does not create a ProcessSummary until the iteration is ready to use the ProcessSummary.

One has to be careful with deferred execution. Take the following code as an example. In this code we print out a Count of the total number of processes before listing each process.

IEnumerable<ProcessSummary> query =
  
from p in Process.GetProcesses()
  
select new ProcessSummary(p.Id, p.ProcessName);

Console.WriteLine("Query complete");

Console.WriteLine("Total processes: {0}", query.Count());

foreach (ProcessSummary summary in query)
{
    
Console.WriteLine("{0,10}({1,4})", summary.Name, summary.ID);
}                                    

If we examine the output of the above code (see below), we'll notice the program creates a full list of all ProcessSummary items twice. The program creates the first list when we ask it for a count of all items that the query will return. The program has to execute the query to retrieve the count. The program creates the second list as we iterate through the results of the query. The program has to execute the query a second time to retrieve this list.

deferred execution

If we wanted to avoid executing the query twice, we could force the query to materialize the ProcessSummary sequence once using the ToArray or ToList methods, as shown below.

IEnumerable<ProcessSummary> summaries =
   (
from p in Process.GetProcesses()
    
select new ProcessSummary(p.Id, p.ProcessName)).ToList();

Console.WriteLine("Query complete");

Console.WriteLine("Total processes: {0}", summaries.Count());

foreach (ProcessSummary summary in summaries)
{
    
Console.WriteLine("{0,10}({1,4})", summary.Name, summary.ID);
}

We can see in the program output that the query executes once (to create a List of ProcessSummary objects). The Count method and foreach expression can then use this list as a data source.

Now that we've seen some LINQ basics, let's take a quick overview of the other LINQ providers in .NET 3.5.

LINQ to XML

You might be thinking we don't need yet another XML API – but LINQ to XML is special. LINQ to XML utilizes the same standard query operators as the other LINQ enabled technologies and provides some processing extensions specific to XML. The end result is an API that gives a developer all of the power of XPath, XSLT, and the XML DOM API along with all of the benefits, like language integration, that LINQ offers.

LINQ to XML does introduce a new class hierarchy in .NET 3.5. The essential class in this hierarchy is the XElement class in the System.Xml.Linq namespace. One technique you can utilize with XElement is the functional construction of XML as shown in the following code sample.

XElement instructors =
 
new XElement("instructors",
    
new XElement("instructor", "Aaron"),
    
new XElement("instructor", "Fritz"),
    
new XElement("instructor", "Keith"),
    
new XElement("instructor", "Scott")
);

If we use the ToString method on the XElement we've created above, we'll see the following XML.

<instructors>
  <instructor>Aaron</instructor>
  <instructor>Fritz</instructor>
  <instructor>Keith</instructor>
  <instructor>Scott</instructor>
</
instructors>

Functional construction is enabled by a params argument to the XElement constructor and allows us to create an entire XML document using a single statement in C#. Proper indenting of the statement allows us to follow the structure of the XML. If you've ever created XML using the XmlDocument API in .NET 2.0, you'll probably find functional construction creates more readable, concise code.

We can use the XML we've created above in a LINQ query to find the number of instructors named "Scott".

int numberOfScotts =
  (
from i in instructors.Elements("instructor")
 
 where i.Value == "Scott"
 
 select i).Count();

LINQ to SQL

Much has been written on the "object relational impedance mismatch" in software. Software stores data in objects – relational databases store information in rows. Object relational mappers (OR/Ms), like LINQ to SQL, manage relational data as objects and try to shield an application developer from the nuances of relational data.

The first step in LINQ to SQL is defining entity classes and associating those classes with tables in a SQL Server database. Here is an excerpt of a Customer class that is associated with the Customer table in the standard Northwind database.

[Table(Name = "dbo.Customers")]
public partial class Customer
{              
    [Column(IsPrimaryKey =
true)]
    
public string CustomerID
    {
        
get
        {
            
return this._CustomerID;
        }
        
set
        {
            
if ((this._CustomerID != value))
            {
                
this._CustomerID = value;
            }
        }                    
    }
    
private string _CustomerID;

    [Column]
    
public string CompanyName
    {
        
get
        {
            
return this._CompanyName;
        }
        
set
        {
            
if ((this._CompanyName != value))
            {
                
this._CompanyName = value;
            }
        }
    }

    
// ... and so on
}

There are two approaches available for mapping a class to a table. One approach is to use the attributes shown above – Table and Column. Another approach is to use an external XML file. Using an XML files means you don't need to clutter the entity class with attributes. Both approaches, however, come with a number of options that allow you customize the exact mappings. Visual Studio 2008 includes a graphical LINQ to SQL class designer tool, and there is also a command line utility (sqlmetal.exe) that will code generate entity classes and mapping files.

A DataContext class provides access to all entities in LINQ to SQL. The DataContext class manages connections, transactions, change tracking, identity maps, and works well in a unit of work pattern. The data context also translates LINQ queries into SQL queries and issues those queries against a database. In the following code we are querying the Customers table in Northwind to retrieve a list of Customer objects.

IEnumerable<Customer> customers =
    
from c in context.Customers
    
where c.Country == "France"
    orderby c.CustomerID ascending
    select c;

foreach (Customer c in customers)
{
    
Console.WriteLine(c.CustomerID);
}

When we iterate through the enumerable collection of customers (LINQ to SQL also uses deferred execution), the DataContext issues the following command to SQL Server.

SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName],
[t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region],
[t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [dbo].[Customers] AS [t0]
WHERE [t0].[Country] = @p0
ORDER BY [t0].[CustomerID]

LINQ to SQL also handles more difficult queries, including queries that use joins and queries between related entities. There are some scenarios that go beyond the abilities of LINQ to SQL, but many of these scenarios can be handled by LINQ to SQL's big brother – the Entity Framework.

LINQ to Entities

The Entity Framework (EF) is also an OR/M framework but works at a higher level of abstraction than LINQ to SQL. When working with EF a developer creates a conceptual schema, a store schema, and a mapping between the two. Unlike LINQ to SQL where each class maps to exactly one database table, the conceptual schema in EF allows objects to be modeled with greater flexibility. A single class can map to multiple tables, for instance, and there are several options to manage persistence for inheritance hierarchies.

LINQ to Entities allows a developer to query the conceptual data model. The queries again use all the same standard query operators as we have already seen in this article. We'll return to the entity framework in a future article to give the topic the coverage it deserves.

LINQ to DataSets

The DataSet is an in-memory representation of relational data and (for better or worse) has long been a staple of data access in the .NET world. LINQ to DataSets allows a developer to use the same standard query operators to query the data inside a DataSet or DataTable.

SqlDataAdapter adapter = new SqlDataAdapter(
                            
"SELECT * FROM Customers",
                            _connectionString);
DataTable customers = new DataTable("customers");
adapter.Fill(customers);

int countOfFrenchCustomers =
    (
from c in customers.AsEnumerable()
    
where c.Field<string>("Country") == "France"
     select c).Count();

LINQ to DataSets will work with both typed and untyped DataSets. For untyped DataSets, LINQ to DataSets provides extensions like the Field<T> method that allows strong typing inside the query.

Summary

Although language integrated queries have been around software since the golden years of xBase languages, LINQ is a significant addition to the .NET platform. LINQ gives us consistency by defining a set of standard query operators that work across diverse data sources – objects, relational data, XML, and more. The deep integration of LINQ into .NET languages and tools allows developers to be more productive and work at a higher level of abstraction when creating data oriented applications.
 

No comments: