Try a FREE Issue
  Shop the Store
  Shop the Pay-per-
View Archives

  Business & Personal Computing
  IT Professional
  Programming &
Web Development
  Graphics & Design
  Element K Journals Home Page

  Visit Discussion
Forums
  Contact Us
  Get Published
  About Element K Journals
  Advertise with Us!
  Meet Our Partners

  Tips
  Weekly Buzz
 
Access Online Courses
 
Search Element K Journals Articles


[ Element K Journals ]
Edu. Gde. to Computers in the Classroom
[ Element K Journals ]


 

SQL Server Solutions
Back Issues  •  About our Experts

 

April 2000

Using heterogeneous queries

 
by Scott Mauvais

Now that corporate IT managers are (finally) viewing SQL Server as a serious enterprise platform, as developers and DBAs, we're facing new challenges. Before the release of SQL Server 7.0, we fought just to get in the game. To do so, we focused all our energies just on proving we could build and maintain high-performance, scaleable applications in a 24 x 7 environment.

Now that we're in the data center, we need to learn the rules of the game—and the first rule is to play nice with others. With some two million servers running countless different flavors of UNIX and another 350,000 IBM mainframe and mid-range computers, the challenge of integrating all this disparate data into a structure usable by mere mortals might appear insurmountable.

Fortunately, interoperability has always been one of SQL Server's strengths—and the heterogeneous query support in version 7.0 makes it even easier. As you can see in Figure A, SQL Server is able to connect to many different data sources. You can now transparently access any OLE DB data source with the standard Transact-SQL you use with your local data.

Figure A: SQL Server 7.0 supports many heterogeneous data sources.
[ Figure A ]

Last month, in the article entitled, "Using distributed queries with SQL Server 7.0", we looked at a related technology, SQL Server's distributed query processor. The distributed query enables you to access data on remote SQL servers and to present users with a single result set as if all of the data came from the local server. In this article, we're going to expand upon the distributed queries we learned about last month by showing you how to use heterogeneous queries to integrate remote data from sources other than SQL Server.

We'll start by showing you how to create linked servers with SQL Server Enterprise Manager to connect to several data sources regardless of format or location. From there, we're going to write queries that access both relational and non-relational data and join these result sets with native SQL Server data. At the end of the article, we're going to explore some performance considerations that apply equally to both heterogeneous and distributed queries.

Understanding the SQL Server query architecture

Before we get into the details of setting up heterogeneous queries, let's review the architecture behind SQL Server 7.0's distributed query processor and its heterogeneous query support. A heterogeneous query enables developers to issue a single SQL statement against a single SQL server that integrates data from several different stores. You can do all of this with only one connection to a single server because SQL Server takes care of all the hard work for you.

Imagine that your company stores its human resources data in Oracle and its sales data in SQL Server. Your manager has just asked to you to create a monthly report that compares each salesperson's salary with his sales volume. Oh, and she wants you to automatically email it to each sales rep's manager so you need to query the Active Directory also.

Without heterogeneous queries not only would you have to learn PL/SQL for Oracle and LDAP for the Active Directory, but you would have to manage the connectivity and security for yourself. Moreover, once you received the query results back, you would have to write your own algorithm to merge this data and join it with your local SQL Server Sales data. With heterogeneous queries, you don't need to worry about any of this. Instead, you simply build a standard SQL query as you've always done.

When you issue a standard query requesting data, the SQL Server Query Processor first parses it, then compiles and optimizes it, and finally executes it—returning the results to the client application. When processing a heterogeneous query, during the parse phase, SQL Server breaks down the SQL statement and separates out the portions that refer to non-local data. SQL Server then sends the requests that it can't fulfill off to OLE DB. OLE DB in turn passes each request off to the specific OLE DB provider responsible for a given data source. If, for example, your query referenced an employee's pay grade from PeopleSoft running on Oracle and his manager's email address from LDAP, OLE DB would send the compensation request off to the OLE DB Provider for Oracle and the directory request off to the OLE DB Provider for Microsoft Directory Services.

Once the data source has processed the request and returned the data—called a rowset in OLE DB-speak—the process is reversed. The data store first sends the data back to the OLE DB provider. The provider returns the data to OLE DB, which in turn passes rowset back to SQL Server. When all the OLE DB providers involved in the query have returned all of their rowsets, SQL Server merges them all together and returns a single result set back to the client application. From the user's perspective—and more importantly from the developer's—all the data came from a single SQL Server.

Because Microsoft built SQL Server on OLE DB, heterogeneous queries aren't limited to traditional relational data sources. With SQL Server, you can query directory services, file systems, mail stores—any data source that has an OLE DB provider written for it. For example if you wanted to locate all of the files in a given directory written by a specific author, SQL Server would pass the request off to OLE DB, which would then pass the request off to the OLE DB Provider for Indexing Service. If you want to determine the OLE DB providers installed on any SQL server, you can use the xp_enum_oledb_providers extended stored procedure. Now that we understand the fundamentals of heterogeneous queries, let's move on and look at setting some up and executing statements against them.

Accessing remote servers

As with distributed queries last month, the first step to using heterogeneous queries is telling SQL Server how to access the data. You can do this either at design time by creating and storing a linked server configuration or you can use the OpenRowSet() function and link to the data source dynamically at run time.

Because each OLE DB provider is specific to a particular data source, the behavior of each provider is a bit different. The main benefit is that the provider—once configured—knows exactly how to locate the data source, establish a connection, authenticate the user, and request the desired data in the most efficient manner. The downside is that the information required to configure a linked server varies by OLE DB provider.

As we learned last month, you can set up a linked SQL server with as little as a network address. On the other hand, if you want to use Index Server to query the file system, you'll need to provide the name of the catalog. To use Jet to access a file server-style database (such as one in Microsoft Access) or a flat file, you'll need the fully qualified path to the data source and the name of the ISAM driver you want to use.

Defining a linked server in SQL Server Enterprise Manager

Once you know the required configuration information, adding a linked server is very straightforward. You can use either SQL Server Enterprise Manager or stored procedures. We covered the related store procedure approach, especially sp_AddLinkedServer, in last month's article so we won't do so here. There are several options when using SQL Server Enterprise Manager that we saved to show you this month, so let's look at them now.

Begin by starting SQL Server Enterprise Manager and connecting to the desired server. Next, expand the Security Folder and then right-click on Linked Servers. From the shortcut menu, choose New Linked Server, as shown in Figure B. This takes you to the General tab of the Linked Server Properties dialog box, as shown Figure C. Let's begin by setting up a connection to an Oracle server so we can look at each of these input fields in detail. You should define the properties as follows:

Figure B: You can use SQL Server Enterprise Manager to define a Linked Server.
[ Figure B ]

Figure C: Use the Linked Server Properties dialog box to define the properties of a new linked server.
[ Figure C ]

  • Linked Server—This is the name you'll use to access the data source. In our example, we've chosen to call the linked server ORACLE_HR. Notice that SQL Server forces you to use all capitals in this field.

  • Server Type—Here you'll specify whether you're connecting to a SQL server or to some other data source. Until you select the Other Data Source option button, the rest of the fields in this section of the dialog box are unavailable.

  • Provider Name—Use this list to specify the type of OLE DB provider you want to use to connect to the remote server. In our example, we chose the Microsoft OLE DB Provider for Oracle.

  • Product Name—This is the friendly name of the OLE DB provider but it doesn't have to relate directly to the provider itself. We generally use it as a description field like in ODBC to help us remember exactly why we created the linked server. Be careful not to include SQL Server in the product name because it will confuse SQL Server and, depending on the exact combination of settings you use, you might receive one of several cryptic (and not overly helpful) error messages. (See last month's article for more details.)

  • Data Source—This is the information that the OLE DB provider will use to locate the remote data and is usually specific to the provider itself. For Oracle, use the SQL*Net name you've defined for the Oracle connection. In our example, we've used the standard data source name of Oracle.World.

  • Provider String—This tells the provider how to connect to the heterogeneous data source. Typically, the information you enter in this text box includes user credentials in a DataSource;UserName;Password DSNProviderString-style format. For Oracle, however, all of this information is stored in the SQL*Net connection—so we'll leave it blank.

  • Collation Compatible—Checking this option tells the SQL Server query optimizer that this data source uses the same code page and sort order as your local SQL server. While this can give you quite a performance boost when executing distributed queries against multiple SQL servers with similar configurations, you're better off leaving this blank when dealing with heterogeneous data unless you know exactly what you're doing.

  • Data Access—This option enables you to specify that you want this data source available for heterogeneous queries, so you should leave it checked.

  • RPC and RPC Out—Checking these options enables SQL Server to communicate with the data source via remote procedure calls. RPC is used for executing stored procedures on remote SQL servers. You don't need to check these to connect to an Oracle server, so in our example, we left them unchecked.

Once you've configured the general information for the linked server, you next need to configure the security for the connection. The security options enable you to identify how your SQL server can log on to the remote server. To configure the security options, select the Security tab as shown in Figure D.

Figure D: You must also configure the security for a remote data source.
[ Figure D ]

You use this tab to define how you're going to map your local SQL Server users to users on the remote data source—the Oracle server in this example. As you can see, the screen has two sections; we're going to look at the lower section first. Here you define a mapping table to translate user accounts on your local SQL server to valid accounts on the other server (in this case, the Oracle server).

In our example, we've mapped the SQL Server sa account to an Oracle DBA account called system. This means that when the local sa user executes a heterogeneous query that involves this remove server, the distributed query processor instructs Oracle to run its portion of the query under the system user's security context. We've also mapped the scott account because, while the user name exists on both servers, the passwords are different. Finally, we checked the Impersonate box for the user demo because the user name and password are the same on both servers. Checking this tells SQL Server to pass the user's credentials directly to the remote server.

Now let's return to the top section of the Security tab. Use this section to define how you want to handle users you haven't defined in the mapping section. You can use one of the following four choices:

  • No Security Context Will Be Used
  • They Will Be Impersonated
  • They Will Be Mapped
  • They Are Not Allowed To Access

Let's discuss each of these four options in detail.

No Security Context Will Be Used

When you use this option, SQL Server makes no attempt to pass any credentials when connecting to a remote data source. For most data stores, this has the result of denying access because most servers will reject anonymous connections. However, this option works just fine for some data sources, such as xBase, which don't support authentication.

They Will Be Impersonated

When you use this option, SQL Server first checks to see if the current user has an entry in the mapping table. If the he doesn't, it passes the user's local login and password to the remote server. We rarely use this option because it requires that you make the user's account and password the same on both your local SQL server as well as the remote server—and it's too much effort to keep both accounts in synch.

Also, there's a risk of a user inadvertently gaining access because of a security oversight. If you want a user to access a remote system using his local credentials, you should proactively grant access. We also use this as an easy concession to security paranoid data center DBAs that are none-too-thrilled to have a toy database (SQL Server) accessing their real databases (Oracle).

They Will Be Mapped

Use this option if you want any user who isn't listed in the lower portion of the tab to be mapped to the username and password you provide in the adjacent text boxes. This setting is quite similar to SQL Server 7.0's new application database role, only much more frightening. With an application role, SQL Server explicitly delegates security to the application where users' actions are (presumably) tightly controlled and closely matches the intended use of the application.

Conversely, if you map all users, every valid SQL Server login has access to the remote data. Worse still, the only way to deny user access is to set up a bogus map with an invalid remote login/password combination.

We do occasionally use this setting, however, when we're first setting up the permissions in SQL Server. This way, when we have views and stored procedures that access remote data, we can test out permissions and ownership trees on these local objects without having to constantly bother the other DBAs by asking them to set up accounts on their systems.

They Are Not Allowed To Access

Use this setting if you want SQL Server to deny access to the remote data to any user who you haven't explicitly defined in the mapping table. We use this configuration most often for the same reasons we don't use the They Will Be Impersonated option.

Using linked servers

Once you've configured security, click OK to save your changes. Now that we've set up a linked server, you might be asking yourself "Where is a linked server saved and how exactly do I use it?" You can look at a linked server as a virtual server.

When you create a linked server, SQL Sever encapsulates all of the information the OLE DB provider needs when it connects to the remote data source—the network, security, and semantics of the data source, and saves this configuration information in its system tables. This way, you can set the link up once and all of your developers can use it without having to manually specify the connection information each time they want to access non-SQL Server data.

Better yet, if you need to move the data to another server—or even a different platform—all you need to do is update the associated linked server. If you require more granular control—for example, if you don't know where the specific data will be located or what provider you'll want to use until runtime—you can call the OpenRowSet() function as part of your FROM clause to access heterogeneous data.

Writing heterogeneous queries

Once you've defined a linked server, you can refer to any remote resource just as you would if it were a native SQL Server table. When constructing a heterogeneous query, you reference objects on a linked server using a four-part syntax that should be familiar to anyone who has written cross-database queries:

LinkedServer.Catalog.Schema.Object

For example, to retrieve the employee list from the demo database that ships with Oracle, we used the following query (because Oracle's system catalog is case sensitive, make sure you use all upper-case letters):

SELECT EMPLOYEE_ID, FIRST_NAME, MIDDLE_INITIAL,

LAST_NAME, FUNCTION

FROM ORACLE_HR..DEMO.EMPLOYEE E,

ORACLE_HR..DEMO.JOB J

WHERE E.JOB_ID = J.JOB_ID

ORDER BY LAST_NAME, FIRST_NAME,

MIDDLE_INITIAL

Not only can you read from heterogeneous data sources, but you can also issue updates by using the following query:

UPDATE ORACLE_HR..DEMO.EMPLOYEE

SET MIDDLE_INITIAL = 'Q'

WHERE EMPLOYEE_ID = 7876

You can also create views against the Oracle server that behave just like native SQL Server views. For example, you might start by executing the following SQL statement in the Northwind database to create a simple view:

CREATE VIEW qryOracleJob as

SELECT *

FROM ORACLE_HR..DEMO.JOB

Next, you can use the view by using either the SELECT SQL statement or from within SQL Server Enterprise Manager. To view it within SQL Server Enterprise Manager, navigate to the views object in the Northwind database. Right-click on qryOracleJob. From the shortcut menu, choose Open View | Return All Rows. You should now be able to see Oracle data just as if it were native to your SQL server.

Finally, you can join Oracle data together with native SQL Server data and present the user with a single result set. For example, we created the qryOracleEmployees view in Listing A to transform the data in the EMPLOYEE table in Oracle's demo database so data resembles the Northwind Employees table. The SQL syntax is relatively basic so we won't discuss it other than to draw attention to the CASE statement that changes the primary keys in the Oracle DEMO.EMPLOYEE table to match the keys in the SQL Server Northwind.Employees table. This way, we can join the Oracle EMPLOYEE table to other tables in the Northwind database in place of the standard Employees table in SQL Server's Northwind database.

Listing A: Joining an Oracle table to a SQL Server table

CREATE VIEW qryOracleEmployees AS

SELECT EmployeeID = CASE E.EMPLOYEE_ID

WHEN 7369 THEN 1

WHEN 7499 THEN 2

WHEN 7505 THEN 3

WHEN 7506 THEN 4

WHEN 7507 THEN 5

WHEN 7521 THEN 6

WHEN 7555 THEN 7

WHEN 7557 THEN 8

WHEN 7560 THEN 9

ELSE EMPLOYEE_ID

END,

FirstName = E.FIRST_NAME,

LastName = E.LAST_NAME,

Title = J.FUNCTION,

TitleOfCourtesy = "Other",

BirthDate = DATEADD(day, E.EMPLOYEE_ID,

=>CAST('1/1/1940' AS DATETIME)),

HireDate = HIRE_DATE,

Address = '1600 Pennsylvania Avenue',

City = 'Washington',

Region = 'DC',

PostalCode = '20500',

Country = 'USA',

HomePhone = '(202) 456-1414',

Extension = '123',

Notes = 'Total Compensation ' +

=>CAST(SALARY + ISNULL(COMMISSION,

=>0) AS VARCHAR(16)),

ReportsTo = MANAGER_ID,

PhotoPath = ''

FROM ORACLE_HR..DEMO.EMPLOYEE E,

ORACLE_HR..DEMO.JOB J

WHERE E.JOB_ID = J.JOB_ID

After creating qryOracleEmployees, we can use the query in Listing B to easily calculate the sales per employee even if the Sales database is in SQL Server and my HR database is in Oracle.

Listing B: Query to easily calculate the sales per employee

CREATE VIEW qrySalesBySalesPerson AS

SELECT O.EmployeeID, E.FirstName,

=>E.LastName, SUM(OD.UnitPrice *

=>OD.Quantity) AS Sales

FROM [Order Details] OD,

Orders O,

qryOracleEmployees E

WHERE O.OrderID = OD.OrderID

AND O.EmployeeID = E.EmployeeID

GROUP BY O.EmployeeID, E.FirstName, E.LastName

This discussion, along with last month's article on distributed queries, should give you a strong understanding of how to work with traditional relational data whether it's stored in SQL Server or some other relational database management system. In the next section, we're going to walk you through linking to non-relational data sources.

Working with non-relational data sources

Being able to access any number of different vendors' RDBMSs from a single SQL statement greatly increases our productivity and adds much needed flexibility to our designs. Structured data, however, represents a very small portion of corporate data. By far, the majority of corporate data resides in a myriad of unstructured data sources from email stores to file systems to Web servers. We're going to walk you through setting up linked servers for two common, non-relational stores to which most of you will have easy access: the Windows NT file system and the new Windows 2000 Active Directory.

Querying the Windows NT file system

Rather than having to develop a full-text query engine for the Windows NT file system, OLE DB allowed the SQL Server development team to build upon Index Server that ships with the Windows NT Option Pack. If you don't already have a copy of Index Server, you can download it free from

www.microsoft.com/ntserver/nts/downloads/recommended/
NT4OptPk/default.asp

To add your Web data store as a linked server, execute the following SQL statement:

-- Set up linked server

sp_AddLinkedServer

@Server = WebStore,

@SrvProduct = 'Index Server',

@Provider = 'MSIDXS',

@DataSrc = 'Web'

Note that Web is the default name for the Index Server catalog of the Default Web Site. If you've changed the name of the catalog, you'll need to update the @DataSrc argument accordingly. Next, you need to configure the security by executing the following query:

-- Deny all access by default

sp_DropLinkedSrvLogin

@RmtSrvName = WebStore,

@LocalLogin = null

-- Map the sa account to my network account

sp_AddLinkedSrvLogin

@RmtSrvName = 'WebStore',

@UseSelf = 'false',

@LocalLogin = 'sa',

@RmtUser = 'MAUVAIS\Administrator',

@RmtPassword = 'Secret'

Finally, we execute a query that lists all the files in on the Default Web Site that reference SQL Server and ranks them by relevance:

-- Execute query

SELECT *

FROM OPENQUERY(WebStore,

'SELECT Rank, path, DocTitle,

Size, HitCount FROM SCOPE('' "/" '')

WHERE CONTAINS ('' "SQL Server" '' ) ')

ORDER BY Rank Desc

Querying the active directory

Let's move on from the file system to the Active Directory. As companies begin to consolidate multiple directories into Windows 2000's Active Directory, it will be a boon to developers whose apps are directory-enabled and to consultants who can help make legacy applications directory-aware.

Better yet, it will make our lives going forward much easier because we can stop worrying about writing separate interfaces for user maintenance and then synchronizing our user data with the various other stores of user information located elsewhere in the enterprise. Let's see what we can do to get a head start on this trend. We start off just as we have in the other examples by adding a linked server and configuring security as shown in Listing C.

Listing C: Linked server and security

-- Add linked server and deny all access by default

sp_AddLinkedServer

@Server = ADSI,

@SrvProduct = 'Active Directory Services',

@Provider = 'ADSDSOObject'

sp_DropLinkedSrvLogin

@RmtSrvName = ADSI,

@LocalLogin = null

--Map sa to NT admin

sp_AddLinkedSrvLogin

@RmtSrvName = 'ADSI',

@UseSelf = 'false',

@LocalLogin = 'sa',

@RmtUser = 'MAUVAIS\Administrator',

@RmtPassword = 'Secret'

Now that you've defined the linked server, to quickly get a list of all the users in your directory, you could use this query:

--List users in AD

SELECT name, adspath, cn

FROM OpenQuery(ADSI,

'SELECT adspath, name, cn

FROM ''LDAP://DC=mauvais,DC=Com''

WHERE objectCategory=''person''

AND objectClass=''user'' ')

You'll want to note the use of two single quotes to escape the quoting required by the LDAP's SQL dialect. Some documentation we've seen incorrectly uses a set of double quotes. If you accidentally use double quotes, ADODB will return generic error number 7321 with the description

An error occurred while preparing a query for execution against OLE DB provider ‘ADSDSOObject'.

You might come across this error fairly often as you first start writing heterogeneous queries. It simply means you have a syntax error in the query string portion of OpenQuery.

You might be temped to include objectClass in your SQL string to determine what other classes the Active Directory provides. However, you'll run into problems with such a query because objectClass contains multiple values in the Active Directory. Transact-SQL only knows how to work with attributes that hold a single value.

While we're talking about potential problems, be careful not to leave the security configured for They Will Be Impersonated or your queries will only return information on DNS root servers! To make matters worse, most of the attributes for these entries are null¾ making troubleshooting frustrating and difficult. (We spent several hours tracking this one down.) To the best of our knowledge, this problem isn't documented anywhere.

Let's return back to our original example of joining the PeopleSoft data with the sales database and the Active Directory. Here's the query to accomplish that:

SELECT E.FirstName, E.LastName,

M.FIRST_NAME AS MANAGER_FIRST,

M.LAST_NAME AS MANAGER_LAST,

S.Sales, AD.Mail

FROM qryOracleEmployees E,

ORACLE_HR..DEMO.EMPLOYEE M,

qrySalesBySalesPerson S,

OpenQuery(ADSI,

‘SELECT givenName, sn, mail

FROM ''LDAP://DC=mauvais,DC=Com''

WHERE objectCategory=''person''

AND objectClass=''user'' ') AD

WHERE E.ReportsTo = M.EMPLOYEE_ID

AND S.EmployeeID = E.EmployeeID

AND AD.sn = M.LAST_NAME

AND AD.givenName = M.FIRST_NAME

Now you have all the information you need to mail a report to the sales managers.

Maximizing the performance of heterogeneous queries

Finally, we want to spend a bit of time discussing performance of the distributed query processor, because it will make or break your application—whether you're writing heterogeneous queries across multiple data sources or distributed queries with native SQL Server data. To do this, we'll examine the performance difference between two nearly identical distributed queries accessing the Northwind database on two different servers. This first query joins the local Categories table to the Products table on a remote server and limits the result set to only those products in category 7, which is the Produce category:

SELECT C.CategoryID, C.CategoryName,

P.ProductName

FROM Categories C,

SAP.northwind.dbo.Products P

WHERE P.CategoryID = C.CategoryID

and C.CategoryID = 7

This second query does exactly the same thing except it uses the CategoryName instead of the CategoryID in the WHERE clause:

SELECT C.CategoryID, C.CategoryName,

P.ProductName

FROM Categories C,

SAP.northwind.dbo.Products P

WHERE P.CategoryID = C.CategoryID

and C.CategoryName = 'Produce'

If you were to run these two queries against a single server, they would perform equally well. When you execute them as distributed queries, however, the first query runs about three times faster. Why such a large difference? All of the relevant columns in the join condition and the WHERE clauses have indices and both queries return the correct five rows, so there's nothing wrong with the data or the schema. The problem lies in the way the distributed query processor optimizes the query and how it decides what portion to send off the remote server.

To get an idea of how SQL Server optimizes the query, let's take a look at the execution plans for the two queries. First let's look at the plan for the faster query, CategoryID = 7, as shown in Figure E.

Figure E: You can use the query plan to analyze the execution of the query containing the WHERE CategoryID = 7 clause.
[ Figure E ]

When SQL Server executes the first query, the join condition and the criteria in the WHERE clause are both based on the same field, CategoryID. Therefore, the distributed query processor knows that it cares about only the products in Category 7 and it sends the appropriate query off to the remote server, as you can see in the argument section of the pop-up window. Also notice the row count of five matches the number of rows in the result set.

In the second, slower query, CategoryName = 'Produce', the row count for the remote query is 66, as you can see in Figure F. You can see the reason in the argument section of the pop-up window. Rather than sending a query with a WHERE clause and thus limiting the result set to the five rows in the Products like it did for the first query, this time SQL Server asks for all the rows ordered by CategoryID. Why would it do this?

Figure F: Use the query plan to analyze the execution of the query containing the WHERE CategoryName = 'Produce' clause.
[ Figure F ]

It does this because the WHERE clause of our second query, C.CategoryName = 'Produce', references only a column in the local table. As a result, the query SQL Server builds to send to the remote server thus has no criteria to restrict the number of rows returned by that server. Therefore, the query processor does the next best thing and instructs the remote server to send the result set ordered by CategoryID so it can quickly match them to the rows returned by the local query.

You might be wondering why the remote query processed 66 rows when there are 77 rows in the Products table. Didn't we just say that it requested all the rows? As astute readers will probably guess, if you execute the query

SELECT * FROM Products ORDER BY CategoryID ASC

the five rows between 61 and 65 belong to CategoryID 7. Because SQL Server orders the result set by CategoryID, as soon as SQL Server hit row 66, it knew it had all the rows it needed and stopped fetching. For old-timers, this is just like what we used to do in xBase where we would use SKIP to go through an indexed table until the key value changed.

As you've probably surmised, the key to insuring good performance with heterogeneous queries is to limit the data you pass between servers. Fortunately, SQL Server provides several tools to help us analyze what the query optimizer is doing.

Conclusion

In this article, we showed you how to leverage OLE DB to write standard Transact-SQL queries that access data from heterogeneous sources throughout the enterprise. More importantly, we saw how to join together data from several sources and present it in a single result set where end users and developers can easily understand and manipulate it.

We started by using Enterprise Manager to create an Oracle linked server. At the same time, we took a close look at the options you have for controlling access to remote data. From there, we moved on to non-relational data sources where corporations still keep the majority of their critical information. We demonstrated how to access the Active Directory and the unstructured information in the NT file system. Finally, we wrapped up by looking at the performance implications of working in a distributed environment. Now that you know how to work with heterogeneous queries, you can integrate mission critical data into your applications—and start to move your SQL Server apps into the data center.


 

Top of Page  •  Back Issues


Copyright © 2001 Element K Content LLC. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of Element K Content LLC is prohibited. Element K is a service mark of Element K LLC.

[ Advertise with Us! ]    Privacy Policy     Questions? Comments?