Developer Feature Article
Taking the Sting Out of ADO.NET
By Scott Mauvais, Microsoft Consulting Services
Over the past few months, I have talked about how easy it is to build applications based on the Microsoft® .NET Framework and how the Microsoft Visual Studio® .NET development environment increases your productivity. This month I planned to continue this discussion by showing you how .NET assemblies work and how easy it is to deploy, version, and secure your code with them. Instead, I am going to talk about ADO.NET.

I decided to postpone my discussion of .NET assemblies and go with Microsoft ADO.NET after talking to several developers and learning that a lot of people are having a hard time wrapping their arms around ADO.NET. Because most applications rely on some sort of back-end data, it makes sense to talk about ADO.NET first and then dive into deployment later.
Fortunately, after receiving some pointers, developers tend
to pick up ADO.NET pretty quickly. It is even easier if you
have worked with other Microsoft data access models (such as
ADO, DAO, or RDO) in the past. When I introduce other
developers to ADO.NET, I start off with an overview of the
goals of ADO.NET and examine some of the problems it is trying
to solve. I then quickly review the ADO.NET object model.
Finally, I walk them through some sample code that
demonstrates how to perform some basic tasks such as
connecting to a database and retrieving datasets.
After this quick overview,
everyone has been able to start building data-aware
applications right away and soon tackle more complex tasks
such as data binding, heterogeneous queries, and distributed
transactions. As I have had good success with this approach in
the past, this is exactly what I will do here. Rather than
spend a great deal of time walking through code in this
article (though we will dive into some!), I'll point you to
other resources. I will also include some information on the
best practices for storing connection strings because this is
a fairly common question and this seems to be a good forum to
tackle the issue.
The best way to learn more about ADO.NET is to browse to your favorite online bookstore and order a copy of Rebecca Riordan's new book, Microsoft ADO.NET Step-by-Step. Her book is chock-full of code
samples and walks you through all the common tasks you will
face on a real project. To get a feel for the book, review the
table
of contents and read Chapter
10: ADO.NET Data-Binding in Windows
Forms.
In the Beginning There Was the Recordset, and It Was Good
When ADO first came out, it was a godsend for us database
developers. Finally we had a single object model that provided
a consistent access pattern regardless of where our data was
stored. The nature of development has changed a great deal
since Microsoft first began development of ADO. At that time,
most Windows® and Web applications accessed back-end data
stored in relational databases or in other stores (such as
mail and directories) that could easily present their data in
a row and column format similar to the relational model.
When most ADO programmers think of data access, they think in terms of rows and columns, and they think of Recordsets. This makes sense because the Recordset is the primary means by which ADO exposes data to the client application. Many of the difficulties people have with ADO.NET come from the fact that ADO.NET does not have a Recordset object; there is no single object that provides a one-to-one mapping of the functionality provided by a classic ADO Recordset. And this is a good thing, trust me.
First, Recordset objects
are huge objects. As a one-stop shop, they provide a great
deal of functionality and this is part of their problem. If
all I want to do is grab some data and blast it out on a Web
page, I still have to instantiate a Recordset and incur all the overhead of an
object that can do everything from looping through resultsets
to filtering data to performing batch updates. Besides
incurring the expense of creating a relatively heavy object,
each of these actions requires me to tweak different
configuration settings to optimize performance. Should I
accidentally mix up the property settings, the result can be a
huge drop in performance.
Second, Recordset objects
are COM-based and it's fairly expensive to serialize a Recordset if you want to send it
across the wire; and because Recordsets use a binary format, it's nearly impossible to share Recordset-based data with systems running on different platforms.
And Then There Was XML, and It Was Better
Initially, the appeal of XML came from the ease with which
data could be transformed from one format to another. Got XML
data, need HTML? Easy, just apply an XSLT and you're done. How
about Insert statements?
Sure, no problem. XML also solved both the serialization and
cross-platform challenges of ADO Recordset objects. Because XML is made up of
text, it is very easy to transmit your data to another system
regardless of platform or what sort of firewalls may sit
between the two systems.
This necessitated a second change in the approach to
development. Once your data moves into XML, it is completely
disconnected from the data source. While this greatly increases
scalability (in fact, the use of disconnected data has been a
standard design pattern for high-volume systems for quite some
time) it requires additional work on the part of the
developer. No longer can you simply rely on the database
engine to ensure consistency and perform conflict resolution;
this is plumbing you often have to write yourself.
Finally, unless you were using Microsoft SQL Server™ 2000
for all your data storage (which, of course, would be the
right thing to do, but not all of us are quite there
yet), you would still have to rely on Recordset objects to get your data into and out
of your data store. Therefore you'd find yourself writing more
plumbing to marshal data between XML and Recordset objects. This could be quite
difficult at times because XML is often hierarchical, whereas a Recordset by its very nature is almost always tabular.
And Then Came ADO.NET, and It Was the Best
As you might suspect, ADO.NET combines the Recordset and XML models to give
you a single view of your data. No longer do you need to
choose between XML and ADO or worse yet, work with both
simultaneously. Microsoft designed ADO.NET to be a first-class
XML citizen (rather than just trying to bolt XML support onto existing products, as others have done). ADO.NET loads and
persists your data (including metadata such as relations and
constraints) into XML. Once it is in XML, you can use standard
XML technologies such as XML Schemas, XPath queries, and XSLT
to validate, query, and manipulate your data. You can also use
the standard object-oriented approach of methods and
properties if you are more comfortable with that model.
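
To make the XML story concrete, here is a minimal sketch of round-tripping a DataSet through XML; the table and column names are just hypothetical placeholders.

// A minimal sketch: persist a DataSet (data plus schema) to XML
// and load it back. The "Customers" table and its columns here are
// hypothetical placeholders.
using System.Data;

class DataSetXmlSketch
{
    static void Main()
    {
        DataSet ds = new DataSet("CustomerData");

        // Build a tiny in-memory table so there is something to persist.
        DataTable customers = ds.Tables.Add("Customers");
        customers.Columns.Add("CustomerID", typeof(string));
        customers.Columns.Add("CompanyName", typeof(string));
        customers.Rows.Add(new object[] { "ALFKI", "Alfreds Futterkiste" });

        // Write the data and its schema as XML.
        ds.WriteXml("customers.xml", XmlWriteMode.WriteSchema);

        // A second DataSet can rebuild the same structure from that file,
        // with no knowledge of the original data source.
        DataSet copy = new DataSet();
        copy.ReadXml("customers.xml");
    }
}

Once the data is in that XML form, the standard XML tools mentioned above can validate, query, and transform it.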
With all this functionality, you might think that ADO.NET
is falling into that same one-stop-shop trap of the ADO Recordset. Not to worry; as I
mentioned above, ADO.NET does not have a single, huge
recordset-like object. Rather than a monolithic data access object, Microsoft broke the object model up into a handful of highly specialized data access objects instead of a couple of generic ones.
In the next section, I will take a closer look at those
objects. If you want to learn more about the evolution of
Microsoft's data access strategy, check out Omri Gazitt's
excellent article from the November 2000 issue of MSDN®
entitled Introducing ADO+: Data Access Services for the
Microsoft .NET Framework. If you are already very familiar
with ADO, you will want to read the newly updated ADO.NET for
the ADO Programmer by Doug Rothaus and Mike Pizzo.
So, What Does ADO.NET Look Like Anyway?
After you get over the shock and initial disbelief that
ADO.NET does not have a recordset, you will notice some
similarities to classic ADO. ADO.NET still sports Connection and Command objects and they have similar
functionality: you use a Connection object to connect to your data source
and the Command object to
execute commands, return results, enlist transactions, and
pass parameters to and from the data source. Because they are
so similar to classic ADO, I won't go into any more detail
here. To get more information see ADO.NET Connections and
ADO.NET Commands in the Microsoft .NET Framework SDK on
MSDN.
On to the differences. Microsoft has replaced the
multi-purpose ADO Recordset
with a combination of four objects:
- DataTable: A collection
of rows (DataRow) and
columns (DataColumn) from a
single table. If you want, you can also include metadata
such as constraints (ForeignKeyConstraint and UniqueConstraint) and more advanced properties
such as calculated and auto-increment columns. In some
respects, it is similar to a classic ADO Recordset but it is always an in-memory
representation cached on the machine running your
application.
- DataSet: A collection
of DataTable objects
together with the relationships (DataRelations) and constraints (Constraints) that relate them to one another. It is a memory-resident, XML representation of the
relational structure. The DataSet provides a consistent programming
model regardless of the source of your data. Key to this
goal is that the DataSet
has no knowledge of the underlying data source used to
populate it; it is a completely self-contained, disconnected
collection of data.
- DataAdapter: The DataAdapter is the glue that connects a DataSet with the underlying data source. You use a DataAdapter initially to populate (the Fill() method) the DataSet and then to update (surprise, it's called Update()) the data source if necessary. The DataAdapter also contains the batch update features associated with ADO Recordsets. (See the sketch that follows this list.)
- DataReader: The DataReader provides blazingly
fast access to query results. Because it is optimized for
performance, it does not provide the caching capabilities of
the DataSet. In this
manner, you can look at it as a firehose Recordset in classic ADO.
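
As promised in the DataAdapter item above, here is a minimal sketch of the Fill()/Update() pattern. It assumes a local SQL Server with the Northwind sample database; the connection string is a placeholder, not a recommendation.

// A minimal sketch: fill a DataSet from Northwind's Customers table
// and push a change back. The connection string is an assumption;
// substitute your own server and security settings.
using System.Data;
using System.Data.SqlClient;

class DataAdapterSketch
{
    static void Main()
    {
        string connString =
            "server=(local);Integrated Security=SSPI;database=northwind;";

        SqlDataAdapter adapter = new SqlDataAdapter(
            "SELECT CustomerID, CompanyName FROM Customers", connString);

        // Let a command builder generate the INSERT/UPDATE/DELETE
        // commands from the SELECT statement.
        SqlCommandBuilder builder = new SqlCommandBuilder(adapter);

        // Fill() opens the connection, runs the query, populates the
        // DataSet, and closes the connection again.
        DataSet ds = new DataSet();
        adapter.Fill(ds, "Customers");

        // Work with the disconnected, in-memory data.
        DataRow firstRow = ds.Tables["Customers"].Rows[0];
        firstRow["CompanyName"] = firstRow["CompanyName"] + " (updated)";

        // Update() reconnects and sends only the changed rows back.
        adapter.Update(ds, "Customers");
    }
}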
The first question that comes to mind is whether you should use a DataSet (which includes the DataTable) or a DataReader. I have heard many
people oversimplify the decision and suggest using the DataReader object for Web
applications and the DataSet
for Windows applications. It is not that easy because the
decision is a classic performance versus functionality
tradeoff, so the proper choice depends on your usage scenario.
The DataSet provides greater
functionality (batch updates, disconnected relational model,
and XML manipulation among others) while the DataReader uses less memory and is
optimized for speed in read-only, forward scrolling
scenarios.
If your data is read-only, you probably want to stick with the DataReader, unless you need to scroll forward and backward, in which case you would need to use a DataSet. Also, if you
are doing singleton queries you will see a performance boost
by using the DataReader even
if your data is not read-only. However, you'll have to handle
the updates yourself by calling stored procedures or writing
Update statements, but you
would probably need to do that anyway. Finally, if you are
dealing with a huge result set, you are probably better off
using the DataReader because
of the overhead associated with caching all that data. These,
of course, are just guidelines and you need to test and verify
that they are appropriate for your applications in your
environment. Your mileage may (and probably will) vary.
In the next section, I will show you how to use the DataReader to easily data-enable
your applications. I chose to use this object because it
facilitates the most common data access pattern: connect to
the database, grab some data, and present it to the user.
Furthermore, the process is very easy to grasp and it can be
implemented with only a handful of lines of code. If you are
interested in how to use the other objects, you can download
some of my extensively commented sample
code from my web site. After reading this article, you
will be able to learn the basics for the DataTable, DataSet, and DataAdapter on your own by reading the comments.
To go beyond the basics in my sample code, you will want to
buy a copy of Rebecca Riordan's Microsoft
ADO.NET Step-by-Step, which is available from Microsoft
Press®.
One last thing before I dive into some sample
code: The Connection, Command, DataAdapter, and DataReader collectively make up what is known as a .NET Data Provider. Currently, ADO.NET supports two different flavors: one for Microsoft SQL Server™ (7.0 and
above) and one for other OLE DB data sources. The class names
for the two providers are somewhat different. The SQL Server
.NET Data Provider prefixes the class names with Sql, while
the OLE DB .NET Data Provider uses OleDb. In other words, the
class names for the SQL Server .NET Data Provider are SqlConnection, SqlCommand, SqlDataAdapter, and SqlDataReader; the names for the OLE DB .NET
Data Provider follow the same pattern.
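
To illustrate the parallel naming, here is a minimal sketch that opens a connection with each provider; both connection strings are hypothetical placeholders rather than recommendations.

// A minimal sketch of the two .NET Data Providers side by side.
// Both connection strings are hypothetical placeholders.
using System.Data.OleDb;
using System.Data.SqlClient;

class ProviderSketch
{
    static void Main()
    {
        // SQL Server .NET Data Provider: class names are prefixed with Sql.
        SqlConnection sqlConn = new SqlConnection(
            "server=(local);Integrated Security=SSPI;database=northwind;");

        // OLE DB .NET Data Provider: class names are prefixed with OleDb,
        // and the connection string names an OLE DB provider.
        OleDbConnection oleDbConn = new OleDbConnection(
            "Provider=SQLOLEDB;Data Source=(local);" +
            "Integrated Security=SSPI;Initial Catalog=northwind;");

        sqlConn.Open();
        oleDbConn.Open();

        // ... issue SqlCommand/OleDbCommand objects against them here ...

        oleDbConn.Close();
        sqlConn.Close();
    }
}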
If you are connecting to Microsoft SQL Server, you should
use the SQL Server .NET Data Provider because it talks
directly to Microsoft SQL Server's native TDS data transfer
protocol. Conversely, the OLE DB .NET Data Provider relies on
COM Interop to call the standard OLE DB provider.
So, What Does ADO.NET Code Look Like?
While you may have thought it was confusing when you first
looked at it, ADO.NET code is not that difficult now that you
have a little background. Let's start off by comparing two
snippets of code that assume you have already declared your variables, connected to your database (Northwind), and set up your command object. The first bit of code represents what you would currently do in Microsoft Visual Basic® 6.0 and classic ADO.
'-- Using classic VB and ADO to loop through data
'-- adRecordSet is an ADODB.Recordset
Do While (Not adRecordSet.EOF)
    Debug.Print adRecordSet("CustomerID") & "-" _
        & adRecordSet("CompanyName")
    adRecordSet.MoveNext
Loop

The second snippet implements the same functionality but uses ADO.NET and Microsoft Visual C#™.

// Using C# and ADO.NET to loop through data
// drCustomers is a System.Data.SqlClient.SqlDataReader
while (drCustomers.Read())
{
    Console.WriteLine(drCustomers["CustomerID"]
        + " "
        + drCustomers["CompanyName"]);
} // While
As you can see, the code is strikingly similar even though
it is implemented in different languages. There are only two
real differences. First, the VB code outputs to the debug
window (Debug.Print), while
the C# code writes to STDOUT (Console.WriteLine). Big deal. The other, more
subtle difference is that the ADO.NET DataReader is not positioned at the first row when it is first opened, while the classic ADO Recordset is. You need to move the DataReader to the first row by calling the Read() method before you get any data; with the ADO Recordset, you call MoveNext() after you have accessed that data.
This helps prevent those nasty bugs where you forgot to add the MoveNext() method in your loop. (Like you have never wondered why a Web page took so long to render, only to find out, when the query timed out, that you had a long table with several thousand rows of ALFKI-Alfreds Futterkiste.)
Now that you have seen how to use the DataReader, I'll show
you how to combine the Connection, Command, and DataReader all together into a working
application. In Listing 1, I have removed all the error
handling to make it easier to read in this format. You can download
the complete version from my Web site.
Listing 1. Complete listing for the DataReader test program.
// Create aliases for namespaces.
using System;
using System.Configuration;
using System.Data;
using System.Data.SqlClient;

// Namespace is used to declare scope and create
// globally unique types.
namespace MSPO_ADO_NET
{
    // The name of your class.
    class DataReader_Sample
    {
        // The entry point.
        [STAThread]
        static void Main(string[] args)
        {
            // Declare your variables.
            SqlDataReader drCustomers = null;
            SqlConnection adConn = null;
            SqlCommand adCmd = null;

            // Instantiate your variables.
            // You could have done this above when you
            // declared them, but I separated them
            // so the individual processes would be
            // easier to identify.
            adConn = new SqlConnection();
            adCmd = new SqlCommand();

            // Pull the connection string from the
            // application's .config file.
            // This also could have been done via a
            // constructor string when the Connection
            // was declared.
            adConn.ConnectionString =
                ConfigurationSettings.AppSettings["ConnString"];

            // Hook up the Command to the Connection.
            adCmd.Connection = adConn;

            // Set the text of the query you want to execute.
            // This too could have been part of the
            // declaration of the Command object.
            // You would normally use a stored proc here.
            adCmd.CommandText =
                "SELECT TOP 5 CustomerID, CompanyName FROM Customers";
            adCmd.CommandType = CommandType.Text;

            // Open the connection.
            // Usually you would use a try/catch block here.
            adConn.Open();

            // Execute the query and
            // instantiate the DataReader.
            drCustomers = adCmd.ExecuteReader();

            while (drCustomers.Read())
            {
                Console.WriteLine(drCustomers["CustomerID"]
                    + " "
                    + drCustomers["CompanyName"]);
            } // While

            // It's Miller Time! Close objects.
            drCustomers.Close();
            adConn.Close();

            return;
        }
    }
}
The comments should make the code pretty understandable,
but I want to point out a couple of things. The first is that
I have written this example using just about as many lines of
code as possible. While most of the ADO.NET objects provide overloaded constructors that let you set the most common properties when you create the objects, I have chosen not to do that here. The main reason is that I wanted to make the code as readable as possible, so I have explicitly set each property. Second, unlike in COM (and VB in particular), the cost of making multiple property assignments at runtime is much lower in .NET, so the performance impact is minimal. It does, however, tend to give one carpal tunnel syndrome.
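
For comparison, here is a minimal sketch of the more compact form that uses those constructor overloads; it assumes the same ConnString appSettings key and Customers query as Listing 1.

// A minimal sketch of a more compact Listing 1, using constructor
// overloads instead of setting each property separately. It assumes
// the same "ConnString" appSettings key as Listing 1.
using System;
using System.Configuration;
using System.Data.SqlClient;

class DataReader_Compact
{
    [STAThread]
    static void Main(string[] args)
    {
        SqlConnection adConn = new SqlConnection(
            ConfigurationSettings.AppSettings["ConnString"]);

        // The constructor takes the command text and the connection;
        // CommandType.Text is the default, so it is not set explicitly.
        SqlCommand adCmd = new SqlCommand(
            "SELECT TOP 5 CustomerID, CompanyName FROM Customers", adConn);

        adConn.Open();
        SqlDataReader drCustomers = adCmd.ExecuteReader();

        while (drCustomers.Read())
        {
            Console.WriteLine(drCustomers["CustomerID"]
                + " " + drCustomers["CompanyName"]);
        }

        drCustomers.Close();
        adConn.Close();
    }
}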
The second thing I want to mention is how I retrieved the connection string. The specific line is

adConn.ConnectionString =
    ConfigurationSettings.AppSettings["ConnString"];

I chose to store the connection string in the application's configuration file under my own key called ConnString. Here are the contents of the configuration file:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <appSettings>
    <add key="ConnString"
         value="server=smauvais-2600\shiloh191;
                Integrated Security=SSPI;
                database=northwind;
                Persist Security Info=false;" />
  </appSettings>
</configuration>
This is the best way to store connection information for
.NET applications. The other common places would be Universal
Data Link (UDL) files, the Registry, and the COM+ Catalog. Let's look at each of these alternatives in turn.
First, UDL files are dirt slow and should be avoided. That was
an easy one. As for the Registry, the application .config file
is better because of the deployment problems associated with
registry settings. Because it is located in your application's
directory and part of its manifest, you can use XCOPY to deploy it to all your
clients. The other common place to store this type of
information is in the COM+ Catalog. You would then access it
through the COM+ constructor string. The problem with this
approach (besides not being very secure) is that only serviced
components have access to the COM+ catalog and there is no
reason to derive all your classes from ServicedComponent simply to get access to COM+
constructor strings.
The only real downside of the .config file approach is that
you might be storing sensitive information in these files. The
first line of defense here is NTFS permissions and the Encrypted File System (EFS). Second, typically the most problematic information is the user ID and password. You can
(and should) avoid using these at all and rely instead on
integrated security. Finally, you can always encrypt the
information in the .config file.
This concludes my overview of ADO.NET. I started off by
looking at some of the design goals of ADO.NET, and then I
reviewed the object model. Next I walked you through some
sample ADO.NET code that retrieves data from the Northwind
sample database and compared that process to the classic
VB/ADO approach. Finally I concluded with a discussion of the
best practices around storing connection strings in Microsoft
.NET applications.
Microsoft Press Solutions
Now that you have an overview of ADO.NET and have seen how
easy it is to work with, you should be ready to start building
your own data-centric Microsoft .NET applications. As you start to take on some relatively complex projects, the best way to ramp up quickly on all the features and best practices of ADO.NET is to get a copy of Rebecca Riordan's new book, Microsoft ADO.NET Step-by-Step.
Microsoft Press provides in-depth documentation for these
and all the other issues related to developing for .NET. For a
complete list of .NET titles from Microsoft Press, see the Inside
Information About Microsoft .NET page.
For a broader overview of all the Microsoft .NET
technologies, you should look at David S. Platt's Introducing
Microsoft .NET. It has some great information on COM
Interop, Windows Forms, and .NET memory management.
Another good book is XML
Step by Step, Second Edition by Michael J. Young. It is a
practical, hands-on learning title that clearly explains the
basics of XML and shows both nonprogrammers and Web developers
alike how to create effective XML documents and display them
on the Web.
One of my favorites is Jake Sturm's book, Developing
XML Solutions, which examines XML-related technologies for
data exchange and shows how XML fits in the Microsoft Windows
architecture.
For a complete list of all the developer titles, see the Developer
Tools section.
For some great links and the latest information on Web
Services, see the MSDN Web
Services home page. A good troubleshooting resource is
Keith Ballinger's article on Web
Services Interoperability and SOAP.
For more information on Microsoft Visual Studio .NET, see
the Visual
Studio Next Generation Home Page.
Last Updated: Tuesday, February 5, 2002