Developer Feature Article
Taking the Sting Out of ADO.NET
By Scott Mauvais, Microsoft Consulting Services
Over the past few months, I have talked about how easy it is to build applications based on the Microsoft® .NET Framework and how the Microsoft Visual Studio® .NET development environment increases your productivity. This month I planned to continue this discussion by showing you how .NET assemblies work and how easy it is to deploy, version, and secure your code with them. Instead, I am going to talk about ADO.NET.

I decided to postpone my discussion of .NET assemblies and go with Microsoft ADO.NET after talking to several developers and learning that a lot of people are having a hard time wrapping their arms around ADO.NET. Because most applications rely on some sort of back-end data, it makes sense to talk about ADO.NET first and then dive into deployment later.
Fortunately, after receiving some pointers, developers tend
to pick up ADO.NET pretty quickly. It is even easier if you
have worked with other Microsoft data access models (such as
ADO, DAO, or RDO) in the past. When I introduce other
developers to ADO.NET, I start off with an overview of the
goals of ADO.NET and examine some of the problems it is trying
to solve. I then quickly review the ADO.NET object model.
Finally, I walk them through some sample code that
demonstrates how to perform some basic tasks such as
connecting to a database and retrieving datasets.
After this quick overview,
everyone has been able to start building data-aware
applications right away and soon tackle more complex tasks
such as data binding, heterogeneous queries, and distributed
transactions. As I have had good success with this approach in
the past, this is exactly what I will do here. Rather than
spend a great deal of time walking through code in this
article (though we will dive into some!), I'll point you to
other resources. I will also include some information on the
best practices for storing connection strings because this is
a fairly common question and this seems to be a good forum to
tackle the issue.
The best way to learn more about ADO.NET is to browse to your favorite online bookstore and order a copy of Rebecca Riordan's new book, Microsoft ADO.NET Step-by-Step. Her book is chock-full of code
samples and walks you through all the common tasks you will
face on a real project. To get a feel for the book, review the
table
of contents and read Chapter
10: ADO.NET Data-Binding in Windows
Forms.
In the Beginning There Was the Recordset, and It Was Good
When ADO first came out, it was a godsend for us database
developers. Finally we had a single object model that provided
a consistent access pattern regardless of where our data was
stored. The nature of development has changed a great deal
since Microsoft first began development of ADO. At that time,
most Windows® and Web applications accessed back-end data
stored in relational databases or in other stores (such as
mail and directories) that could easily present their data in
a row and column format similar to the relational model.
When most ADO programmers think of data access, they think in terms of rows and columns, and they think of Recordsets. This makes sense because the Recordset is the primary means by which ADO exposes data to the client application. Many of the difficulties people have with ADO.NET come from the fact that ADO.NET does not have a Recordset object; there is no single object that provides a one-to-one mapping of the functionality provided by a classic ADO Recordset. And this is a good thing, trust me.
First, Recordset objects
are huge objects. As a one-stop shop, they provide a great
deal of functionality and this is part of their problem. If
all I want to do is grab some data and blast it out on a Web
page, I still have to instantiate a Recordset and incur all the overhead of an
object that can do everything from looping through resultsets
to filtering data to performing batch updates. Besides
incurring the expense of creating a relatively heavy object,
each of these actions requires me to tweak different
configuration settings to optimize performance. Should I
accidentally mix up the property settings, the result can be a
huge drop in performance.
Second, Recordset objects
are COM-based and it's fairly expensive to serialize a Recordset if you want to send it
across the wire; and because Recordsets use a binary format, it's nearly impossible to share Recordset-based data with systems running on different platforms.
And Then There Was XML, and It Was Better
Initially, the appeal of XML came from the ease with which
data could be transformed from one format to another. Got XML
data, need HTML? Easy, just apply an XSLT and you're done. How
about Insert statements?
Sure, no problem. XML also solved both the serialization and
cross-platform challenges of ADO Recordset objects. Because XML is made up of
text, it is very easy to transmit your data to another system
regardless of platform or what sort of firewalls may sit
between the two systems.
This necessitated a second change in the approach to
development. Once your data moves into XML, it is completely
disconnected from the data source. While this greatly increases
scalability (in fact, the use of disconnected data has been a
standard design pattern for high-volume systems for quite some
time) it requires additional work on the part of the
developer. No longer can you simply rely on the database
engine to ensure consistency and perform conflict resolution;
this is plumbing you often have to write yourself.
Finally, unless you were using Microsoft SQL Server™ 2000
for all your data storage (which, of course, would be the
right thing to do, but not all of us are quite there
yet), you would still have to rely on Recordset objects to get your data into and out
of your data store. Therefore you'd find yourself writing more
plumbing to marshal data between XML and Recordset objects. This could be quite
difficult at times because XML is often hierarchical, whereas a Recordset by its very nature is almost always tabular.
And Then Came ADO.NET, and It Was the Best
As you might suspect, ADO.NET combines the Recordset and XML models to give
you a single view of your data. No longer do you need to
choose between XML and ADO or worse yet, work with both
simultaneously. Microsoft designed ADO.NET to be a first-class
XML citizen (rather than just trying to bolt XML support onto existing products, as others have done). ADO.NET loads and
persists your data (including metadata such as relations and
constraints) into XML. Once it is in XML, you can use standard
XML technologies such as XML Schemas, XPath queries, and XSLT
to validate, query, and manipulate your data. You can also use
the standard object-oriented approach of methods and
properties if you are more comfortable with that model.
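
To make the XML story concrete, here is a minimal sketch of round-tripping a DataSet through XML; the table and column names are just hypothetical placeholders.

// A minimal sketch: persist a DataSet (data plus schema) to XML
// and load it back. The "Customers" table and its columns here are
// hypothetical placeholders.
using System.Data;

class DataSetXmlSketch
{
    static void Main()
    {
        DataSet ds = new DataSet("CustomerData");

        // Build a tiny in-memory table so there is something to persist.
        DataTable customers = ds.Tables.Add("Customers");
        customers.Columns.Add("CustomerID", typeof(string));
        customers.Columns.Add("CompanyName", typeof(string));
        customers.Rows.Add(new object[] { "ALFKI", "Alfreds Futterkiste" });

        // Write the data and its schema as XML.
        ds.WriteXml("customers.xml", XmlWriteMode.WriteSchema);

        // A second DataSet can rebuild the same structure from that file,
        // with no knowledge of the original data source.
        DataSet copy = new DataSet();
        copy.ReadXml("customers.xml");
    }
}

Once the data is in that XML form, the standard XML tools mentioned above can validate, query, and transform it.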
With all this functionality, you might think that ADO.NET
is falling into that same one-stop-shop trap of the ADO Recordset. Not to worry; as I
mentioned above, ADO.NET does not have a single, huge
recordset-like object. Rather than a monolithic data access object, Microsoft broke the object model up into a handful of highly specialized data access objects instead of a couple of generic ones.
In the next section, I will take a closer look at those
objects. If you want to learn more about the evolution of
Microsoft's data access strategy, check out Omri Gazitt's
excellent article from the November 2000 issue of MSDN®
entitled Introducing ADO+: Data Access Services for the
Microsoft .NET Framework. If you are already very familiar
with ADO, you will want to read the newly updated ADO.NET for
the ADO Programmer by Doug Rothaus and Mike Pizzo.
So, What Does ADO.NET Look Like Anyway?
After you get over the shock and initial disbelief that
ADO.NET does not have a recordset, you will notice some
similarities to classic ADO. ADO.NET still sports Connection and Command objects and they have similar
functionality: you use a Connection object to connect to your data source
and the Command object to
execute commands, return results, enlist transactions, and
pass parameters to and from the data source. Because they are
so similar to classic ADO, I won't go into any more detail
here. To get more information see ADO.NET Connections and
ADO.NET Commands in the Microsoft .NET Framework SDK on
MSDN.
On to the differences. Microsoft has replaced the
multi-purpose ADO Recordset
with a combination of four objects:
- DataTable: A collection
of rows (DataRow) and
columns (DataColumn) from a
single table. If you want, you can also include metadata
such as constraints (ForeignKeyConstraint and UniqueConstraint) and more advanced properties
such as calculated and auto-increment columns. In some
respects, it is similar to a classic ADO Recordset but it is always an in-memory
representation cached on the machine running your
application.
- DataSet: A collection
of DataTable objects
together with the relationships (DataRelations) and constraints (Constraints) that relate them to one another. It is a memory-resident, XML representation of the
relational structure. The DataSet provides a consistent programming
model regardless of the source of your data. Key to this
goal is that the DataSet
has no knowledge of the underlying data source used to
populate it; it is a completely self-contained, disconnected
collection of data.
- DataAdapter: The DataAdapter is the glue that connects a DataSet with the underlying data source. You use a DataAdapter initially to populate (the Fill() method) the DataSet and then to update (surprise, it's called Update()) the data source if necessary. The DataAdapter also contains the batch update features associated with ADO Recordsets. (See the sketch that follows this list.)
- DataReader: The DataReader provides blazingly
fast access to query results. Because it is optimized for
performance, it does not provide the caching capabilities of
the DataSet. In this
manner, you can look at it as a firehose Recordset in classic ADO.
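
As promised in the DataAdapter item above, here is a minimal sketch of the Fill()/Update() pattern. It assumes a local SQL Server with the Northwind sample database; the connection string is a placeholder, not a recommendation.

// A minimal sketch: fill a DataSet from Northwind's Customers table
// and push a change back. The connection string is an assumption;
// substitute your own server and security settings.
using System.Data;
using System.Data.SqlClient;

class DataAdapterSketch
{
    static void Main()
    {
        string connString =
            "server=(local);Integrated Security=SSPI;database=northwind;";

        SqlDataAdapter adapter = new SqlDataAdapter(
            "SELECT CustomerID, CompanyName FROM Customers", connString);

        // Let a command builder generate the INSERT/UPDATE/DELETE
        // commands from the SELECT statement.
        SqlCommandBuilder builder = new SqlCommandBuilder(adapter);

        // Fill() opens the connection, runs the query, populates the
        // DataSet, and closes the connection again.
        DataSet ds = new DataSet();
        adapter.Fill(ds, "Customers");

        // Work with the disconnected, in-memory data.
        DataRow firstRow = ds.Tables["Customers"].Rows[0];
        firstRow["CompanyName"] = firstRow["CompanyName"] + " (updated)";

        // Update() reconnects and sends only the changed rows back.
        adapter.Update(ds, "Customers");
    }
}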
The first question that comes to mind is whether you should use a DataSet (which includes the DataTable) or a DataReader. I have heard many
people oversimplify the decision and suggest using the DataReader object for Web
applications and the DataSet
for Windows applications. It is not that easy because the
decision is a classic performance versus functionality
tradeoff, so the proper choice depends on your usage scenario.
The DataSet provides greater
functionality (batch updates, disconnected relational model,
and XML manipulation among others) while the DataReader uses less memory and is
optimized for speed in read-only, forward scrolling
scenarios.
If your data is read-only, you probably want to stick with the DataReader, unless you need to scroll forward and backward, in which case you would need to use a DataSet. Also, if you
are doing singleton queries you will see a performance boost
by using the DataReader even
if your data is not read-only. However, you'll have to handle
the updates yourself by calling stored procedures or writing
Update statements, but you
would probably need to do that anyway. Finally, if you are
dealing with a huge result set, you are probably better off
using the DataReader because
of the overhead associated with caching all that data. These,
of course, are just guidelines and you need to test and verify
that they are appropriate for your applications in your
environment. Your mileage may (and probably will) vary.
In the next section, I will show you how to use the DataReader to easily data-enable
your applications. I chose to use this object because it
facilitates the most common data access pattern: connect to
the database, grab some data, and present it to the user.
Furthermore, the process is very easy to grasp and it can be
implemented with only a handful of lines of code. If you are
interested in how to use the other objects, you can download
some of my extensively commented sample
code from my web site. After reading this article, you
will be able to learn the basics for the DataTable, DataSet, and DataAdapter on your own by reading the comments.
To go beyond the basics in my sample code, you will want to
buy a copy of Rebecca Riordan's Microsoft
ADO.NET Step-by-Step, which is available from Microsoft
Press®.
One last thing before I dive into some sample
code: The Connection, Command, DataAdapter, and DataReader collectively make up what is known as a .NET Data Provider. Currently, ADO.NET supports two different flavors: one for Microsoft SQL Server™ (7.0 and
above) and one for other OLE DB data sources. The class names
for the two providers are somewhat different. The SQL Server
.NET Data Provider prefixes the class names with Sql, while
the OLE DB .NET Data Provider uses OleDb. In other words, the
class names for the SQL Server .NET Data Provider are SqlConnection, SqlCommand, SqlDataAdapter, and SqlDataReader; the names for the OLE DB .NET
Data Provider follow the same pattern.
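
To illustrate the parallel naming, here is a minimal sketch that opens a connection with each provider; both connection strings are hypothetical placeholders rather than recommendations.

// A minimal sketch of the two .NET Data Providers side by side.
// Both connection strings are hypothetical placeholders.
using System.Data.OleDb;
using System.Data.SqlClient;

class ProviderSketch
{
    static void Main()
    {
        // SQL Server .NET Data Provider: class names are prefixed with Sql.
        SqlConnection sqlConn = new SqlConnection(
            "server=(local);Integrated Security=SSPI;database=northwind;");

        // OLE DB .NET Data Provider: class names are prefixed with OleDb,
        // and the connection string names an OLE DB provider.
        OleDbConnection oleDbConn = new OleDbConnection(
            "Provider=SQLOLEDB;Data Source=(local);" +
            "Integrated Security=SSPI;Initial Catalog=northwind;");

        sqlConn.Open();
        oleDbConn.Open();

        // ... issue SqlCommand/OleDbCommand objects against them here ...

        oleDbConn.Close();
        sqlConn.Close();
    }
}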
If you are connecting to Microsoft SQL Server, you should
use the SQL Server .NET Data Provider because it talks
directly to Microsoft SQL Server's native TDS data transfer
protocol. Conversely, the OLE DB .NET Data Provider relies on
COM Interop to call the standard OLE DB provider.
So, What Does ADO.NET Code Look Like?
While you may have thought it was confusing when you first
looked at it, ADO.NET code is not that difficult now that you
have a little background. Let's start off by comparing two
snippets of code that assume you have already declared your variables, connected to your database (Northwind), and set up your command object. The first bit of code represents what you would currently do in Microsoft Visual Basic® 6.0 and classic ADO.
'-- Using classic VB and ADO to loop through data
'-- adRecordSet is an ADODB.Recordset
Do While (Not adRecordSet.EOF)
    Debug.Print adRecordSet("CustomerID") & "-" _
        & adRecordSet("CompanyName")
    adRecordSet.MoveNext
Loop

The second snippet implements the same functionality but uses ADO.NET and Microsoft Visual C#™.

// Using C# and ADO.NET to loop through data
// drCustomers is a System.Data.SqlClient.SqlDataReader
while (drCustomers.Read())
{
    Console.WriteLine(drCustomers["CustomerID"]
        + " "
        + drCustomers["CompanyName"]);
} // While
As you can see, the code is strikingly similar even though
it is implemented in different languages. There are only two
real differences. First, the VB code outputs to the debug
window (Debug.Print), while
the C# code writes to STDOUT (Console.WriteLine). Big deal. The other, more
subtle difference is that the ADO.NET DataReader is not positioned at the first row when it is first opened, while the classic ADO Recordset is. You need to move the DataReader to the first row by calling the Read() method before you get any data; with the ADO Recordset, you call MoveNext() after you have accessed that data.
This helps prevent those nasty bugs where you forgot to add the MoveNext() method in your loop. (Like you have never wondered why a Web page took so long to render, only to find out, when the query timed out, that you had a long table with several thousand rows of ALFKI-Alfreds Futterkiste.)
Now that you have seen how to use the DataReader, I'll show
you how to combine the Connection, Command, and DataReader all together into a working
application. In Listing 1, I have removed all the error
handling to make it easier to read in this format. You can download
the complete version from my Web site.
Listing 1. Complete listing for the DataReader test program.
// Create aliases for namespaces.
using System;
using System.Configuration;
using System.Data;
using System.Data.SqlClient;

// Namespace is used to declare scope and create
// globally unique types.
namespace MSPO_ADO_NET
{
    // The name of your class.
    class DataReader_Sample
    {
        // The entry point.
        [STAThread]
        static void Main(string[] args)
        {
            // Declare your variables.
            SqlDataReader drCustomers = null;
            SqlConnection adConn = null;
            SqlCommand adCmd = null;

            // Instantiate your variables.
            // You could have done this above when you
            // declared them, but I separated them
            // so the individual processes would be
            // easier to identify.
            adConn = new SqlConnection();
            adCmd = new SqlCommand();

            // Pull the connection string from the
            // application's .config file.
            // This also could have been done via a
            // constructor string when the Connection
            // was declared.
            adConn.ConnectionString =
                ConfigurationSettings.AppSettings["ConnString"];

            // Hook up the Command to the Connection.
            adCmd.Connection = adConn;

            // Set the text of the query you want to execute.
            // This too could have been part of the
            // declaration of the Command object.
            // You would normally use a stored proc here.
            adCmd.CommandText =
                "SELECT TOP 5 CustomerID, CompanyName FROM Customers";
            adCmd.CommandType = CommandType.Text;

            // Open the connection.
            // Usually you would use a try/catch block here.
            adConn.Open();

            // Execute the query and
            // instantiate the DataReader.
            drCustomers = adCmd.ExecuteReader();

            while (drCustomers.Read())
            {
                Console.WriteLine(drCustomers["CustomerID"]
                    + " "
                    + drCustomers["CompanyName"]);
            } // While

            // It's Miller Time! Close objects.
            drCustomers.Close();
            adConn.Close();

            return;
        }
    }
}
The comments should make the code pretty understandable,
but I want to point out a couple of things. The first is that
I have written this example using just about as many lines of
code as possible. While most of the ADO.NET objects provide overloaded constructors that let you set the most common properties when you create the objects, I have chosen not to do that here. The main reason is that I wanted to make the code as readable as possible, so I have explicitly set each property. Second, unlike in COM (and VB in particular), the cost of making multiple property assignments at runtime is much lower in .NET, so the performance impact is minimal. It does, however, tend to give one carpal tunnel syndrome.
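
For comparison, here is a minimal sketch of the more compact form that uses those constructor overloads; it assumes the same ConnString appSettings key and Customers query as Listing 1.

// A minimal sketch of a more compact Listing 1, using constructor
// overloads instead of setting each property separately. It assumes
// the same "ConnString" appSettings key as Listing 1.
using System;
using System.Configuration;
using System.Data.SqlClient;

class DataReader_Compact
{
    [STAThread]
    static void Main(string[] args)
    {
        SqlConnection adConn = new SqlConnection(
            ConfigurationSettings.AppSettings["ConnString"]);

        // The constructor takes the command text and the connection;
        // CommandType.Text is the default, so it is not set explicitly.
        SqlCommand adCmd = new SqlCommand(
            "SELECT TOP 5 CustomerID, CompanyName FROM Customers", adConn);

        adConn.Open();
        SqlDataReader drCustomers = adCmd.ExecuteReader();

        while (drCustomers.Read())
        {
            Console.WriteLine(drCustomers["CustomerID"]
                + " " + drCustomers["CompanyName"]);
        }

        drCustomers.Close();
        adConn.Close();
    }
}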
The second thing I want to mention is how I retrieved the connection string. The specific line is

adConn.ConnectionString =
    ConfigurationSettings.AppSettings["ConnString"];

I chose to store the connection string in the application's configuration file under my own key called ConnString. Here are the contents of the configuration file:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <appSettings>
    <add key="ConnString"
         value="server=smauvais-2600\shiloh191;
                Integrated Security=SSPI;
                database=northwind;
                Persist Security Info=false;" />
  </appSettings>
</configuration>
This is the best way to store connection information for
.NET applications. The other common places would be Universal
Data Link (UDL) files, the Registry, and the COM+ Catalog. Let's look at each of these alternatives in turn.
First, UDL files are dirt slow and should be avoided. That was
an easy one. As for the Registry, the application .config file
is better because of the deployment problems associated with
registry settings. Because it is located in your application's
directory and part of its manifest, you can use XCOPY to deploy it to all your
clients. The other common place to store this type of
information is in the COM+ Catalog. You would then access it
through the COM+ constructor string. The problem with this
approach (besides not being very secure) is that only serviced
components have access to the COM+ catalog and there is no
reason to derive all your classes from ServicedComponent simply to get access to COM+
constructor strings.
The only real downside of the .config file approach is that
you might be storing sensitive information in these files. The
first line of defense here is NTFS permissions and the Encrypted File System (EFS). Second, typically the most problematic information is the user ID and password. You can
(and should) avoid using these at all and rely instead on
integrated security. Finally, you can always encrypt the
information in the .config file.
This concludes my overview of ADO.NET. I started off by
looking at some of the design goals of ADO.NET, and then I
reviewed the object model. Next I walked you through some
sample ADO.NET code that retrieves data from the Northwind
sample database and compared that process to the classic
VB/ADO approach. Finally I concluded with a discussion of the
best practices around storing connection strings in Microsoft
.NET applications.
Microsoft Press Solutions
Now that you have an overview of ADO.NET and have seen how
easy it is to work with, you should be ready to start building
your own data-centric Microsoft .NET applications. As you start to take on some relatively complex projects, the best way to ramp up quickly on all the features and best practices of ADO.NET is to get a copy of Rebecca Riordan's new book, Microsoft ADO.NET Step-by-Step.
Microsoft Press provides in-depth documentation for these
and all the other issues related to developing for .NET. For a
complete list of .NET titles from Microsoft Press, see the Inside
Information About Microsoft .NET page.
For a broader overview of all the Microsoft .NET
technologies, you should look at David S. Platt's Introducing
Microsoft .NET. It has some great information on COM
Interop, Windows Forms, and .NET memory management.
Another good book is XML
Step by Step, Second Edition by Michael J. Young. It is a
practical, hands-on learning title that clearly explains the
basics of XML and shows both nonprogrammers and Web developers
alike how to create effective XML documents and display them
on the Web.
One of my favorites is Jake Sturm's book, Developing
XML Solutions, which examines XML-related technologies for
data exchange and shows how XML fits in the Microsoft Windows
architecture.
For a complete list of all the developer titles, see the Developer
Tools section.
For some great links and the latest information on Web
Services, see the MSDN Web
Services home page. A good troubleshooting resource is
Keith Ballinger's article on Web
Services Interoperability and SOAP.
For more information on Microsoft Visual Studio .NET, see
the Visual
Studio Next Generation Home Page.
Last Updated: Tuesday, February 5, 2002