Try a FREE Issue
  Shop the Store
  Shop the Pay-per-
View Archives

  Business & Personal Computing
  IT Professional
  Programming &
Web Development
  Graphics & Design
  Element K Journals Home Page

  Visit Discussion
Forums
  Contact Us
  Get Published
  About Element K Journals
  Advertise with Us!
  Meet Our Partners

  Tips
  Weekly Buzz
 
Access Online Courses
 
Search Element K Journals Articles


[ Element K Journals ]
Inside Web Design
[ Element K Journals ]


 

SQL Server Solutions
Back Issues  •  About our Experts

 

March 2000

Using distributed queries with SQL Server 7.0

 
by Scott Mauvais

With SQL Server taking a more prominent role in most companies' IT strategies, those of us that are responsible for managing these databases are finding our server rooms filling up with SQL boxes as more and more applications roll into production. At first glance, having a server dedicated to a few applications rather than a single huge server looks like a good thing—we probably even argued for it. Sure, the extra hardware was expensive, but it sure made tuning easy. However, as we delve deeper into this multiple server approach, we've discovered some complications.

We're first introduced to some of these complications when some business manager says something to the effect of "We already track that information so don't bother collecting it yourself—just use our data. The IS folks will take care of it—it's all in SQL Server." All of a sudden, you now need to validate your Web orders against the quantity-on-hand in your inventory control database—and these databases are on separate servers. One solution is to use linked servers and distributed queries. These queries enable you to integrate data from several linked SQL servers and present it to your users as if it came from a single source—your application. You can easily define linked servers in SQL Server Enterprise Manager, as shown in Figure A.

Figure A: You can define a linked server by using SQL Server Enterprise Manager.
[ Figure A ]

Before we dive into the details of our application, you should understand the technology behind SQL Server's distributed queries. A distributed query enables users and application developers to access data from multiple remote data sources by using the same standard SQL statements such as SELECT and INSERT that you use for local tables. SQL Server takes care of parsing the query and sending the various portions to the proper servers—and constructing the data returned by those servers into a single result set.

SQL Server can work with any data source that supports OLE DB. Obvious examples of these include relational databases such as SQL Server, Oracle, and anything that's accessible via ODBC; but it also includes non-relational stores such as LDAP, file systems, and email stores. In this article, we're going to focus solely on SQL Server; next month, we're going to look at working with heterogeneous data—specifically, data in non-relational stores.

We're going to begin by defining linked servers, and then writing distributed queries to join remote servers' data with data from our local server. Next, we'll simulate a multi-application environment by making use of different tables in Northwind on each of our servers. We're going to access the Customers table on one server (Solomon), the Orders and Order Details tables on another (Great Plains), and the Products table on a third (SAP). Don't worry if you only have one server, later we'll show you how to easily set up this environment on a single SQL server.

Setting up a linked server within Enterprise Manager

To access data on another server, you must first tell SQL Server how to access the remote data. You can do this by using either of two methods: defining a linked server or using the OpenRowSet() function to create a dynamic reference. A linked server enables you to define (and save) all of the information you need for connecting to a remote server.

When you set up a linked server, SQL Server adds a row to the sysservers table. In this table, SQL Server stores the necessary information to allow it to connect to the remote data source. If you're familiar with Microsoft Access, you can liken this to creating a linked table, except that this points to the entire server rather than to a single table in a single database.

You can define a linked server by using either SQL Server Enterprise Manager or stored procedures. To define a linked server in SQL Server Enterprise Manager, begin by accessing the Security folder below the server from which you want to establish the link to a remote server. In our example, we're connecting from the Great Plains server with the customer database to the remote Solomon server holding the orders database. Next, right-click on the Linked Servers object. From the shortcut menu, choose New Linked Server, as shown in Figure A.

You use the General tab, as shown in Figure B, to define information about the linked server. Let's look at each of the options on the General property page in order:

  • Linked Server—This is the name you'll use to access the remote data source. The SQL Server documentation incorrectly states that this must be the computer name of your SQL server. While this is the default behavior, you can cheat and use the SQL Client Configuration utility to provide an alias for a server, and then use the alias as the linked server name. If you're working on a single server, you can use the same technique to create three different aliases all pointing to the same SQL server.

  • Server Type—This is the type of data source to which you want to link. You can choose either SQL Server or an OLE DB data source. When you select SQL Server, the remaining options in the Server Type section of the dialog box are disabled. In this article, we're focusing on defining links to other SQL servers only, so we're going to skip the remaining Server Type options.

  • Collation Compatible—If you check this option, it tells the query optimizer that both servers use the same collation order and character sets. If this is the case, the local SQL server parses the query text more aggressively and sends a larger portion of the query to the remote server. This entails letting the remote server perform comparisons on character data and return ordered result sets rather than forcing it to return all character data to the local server for processing. You should check this box if you're absolutely positive that the character sets and sort orders are identical on both servers. If not, you'll get unpredictable results.

  • Data Access—You should leave this option checked. If you uncheck this box, you're telling SQL Server to disallow distributed queries against this data source.

  • RPC—You should select this option to allow RPC access in from the remote server.

  • RPC Out—You should select this option to allow RPC access out to the remote server. You must leave this option checked if you want to execute stored procedures on the remote server.

Figure B: To configure a SQL server as a linked server, define a name for that server and choose SQL Server as the Server Type.
[ Figure B ]

Let's move on to the Security tab. You use this tab to define the security context that users will use to access the remote data store. We'll talk about security in more detail in next month's article. For the time being, just select the They Will Be Impersonated option, as shown in Figure C. This option enables your local users to connect to the linked server (Solomon) by using the same login ID and password they used to connect to the local server. Click OK to save your linked server definition to the sysservers table. Note that in our sample queries, we're going to log on to both the local and linked servers as the sa user in order to keep our examples simple.

Figure C: Use the Security tab to set the linked server's security context.
[ Figure C ]

Using distributed queries with linked servers

Now that we have a linked server set up, we're ready to execute a distributed query. When executing a distributed query, you reference objects on a linked server by using the following four-part syntax:
LinkedServer.Catalog.Schema.Object
As you can see, Microsoft designed this syntax to work with any OLE DB data sources rather than just with SQL Server. That said, specifying a remote SQL Server object is extremely easy and will look familiar to anyone who has worked with inter-database queries. Let's look at each of the four parts:

  • LinkedServer—This is the name of the linked server you just set up. In our example, the linked server's name is Solomon.

  • Catalog—This is the name of the database. In our example, it's customer.

  • Schema—This is the owner of the object you're querying. It will typically be the database owner (dbo) user.

  • Object—This is the table, view, or stored procedure you want to access. In our example, we're going to retrieve data from the Customers table.

You're going to see that throughout our distributed query examples we've included the object's owner in each query. When working with inter-database queries, SQL Server can resolve the object owner's name for you. Thus, both of the following queries are valid:

SELECT * FROM Northwind..Customers
SELECT * FROM Northwind.dbo.Customers
OLE DB, however, doesn't support implicit name resolution. If you remove the object owner from the four-part syntax, you'll receive a 7314 error telling you the object doesn't exist. As a result, it's best for you to include the object's owner as part of the four-part name—even if you're simply connecting to another SQL server.

Basic queries

For example, let's say we want to query the Customer database on the linked Solomon server to get an alphabetical list of cities in which our customers are located. To do so, we must use the four-part syntax for accessing a linked server, as shown in the following query:
SELECT City
 FROM Solomon.Customer.dbo.Customers
 GROUP BY City
 ORDER BY City

Table joins

You can join tables, too. The joined tables can be all on the linked server, or you can join a table on the linked server to a table on your local server. For example, assume we're currently logged on to the Great Plains server that contains the Sales database—the Northwind Orders and Order Details tables.

Now, let's say we want to determine what customers have ordered products that we have since discontinued. To accomplish this, we're going to need to connect to the Inventory database on the SAP server for the Products table, the Customer database on the Solomon server for the Customers table, and the local server for the Orders and Orders Detail tables stored in the Great Plains database.

To obtain this information, we can use the query shown in Listing A. We used brackets as delimiters for names with embedded spaces. In addition, because of the length of the four-part names for each table, we used aliases to help you keep your sanity (and to keep you from getting carpal tunnel syndrome!). Once we have the result set in hand, we could contact each of these customers and let them know if we have any new products to offer as replacements for their favorite goods.

Listing A: Table joins in a distributed query

SELECT C.CustomerID, C.CompanyName, C.ContactName, C.Phone,
 O.OrderID, O.OrderDate,
 P.ProductName, P.Discontinued
 FROM Solomon.Customer.dbo.Customers AS C,
  Sales.dbo.Orders  AS O,
  Sales.dbo.[Order Details] AS D,
  SAP.Inventory.dbo.Products AS P
 WHERE C.CustomerID = O.CustomerID
  AND O.OrderID = D.OrderID
  AND D.ProductID = P.ProductID
  AND P.Discontinued = 1
 ORDER BY C.CompanyName, O.OrderID, P.ProductName
Let's take a moment to consider what it would have taken to retrieve this same result set if we didn't use distributed queries. For starters, our application would have required three separate connections along with the associated overhead required to establish them. We would also have had to know where the three servers were located and how to connect to them. Because our application is responsible for knowing how to connect to these disparate servers, we probably need to provide maintenance screens to enable the user to change the server name, login credentials, and a myriad of other settings. As they say on late night infomercials, "But wait, there's more!"

So far, we've only addressed the difficulty in connecting to the various databases. We still need to processes the result sets. To find out who has purchased products that are now discontinued we would have to construct a procedure around the following pseudo code:

Fetch all discontinued products
 For each product, fetch all orders
  For each order, Fetch customer
No, the coding isn't that difficult—a standard outer/inner looping construct—but the performance will be abysmal. We're essentially writing our own ISAM-style database engine in a SQL application. For each and every value in the header table, we're manually joining to the detail table to get the matching records for just that one value.

Potentially we could loop through and fetch every customer for every product. These nested loops and merge joins are exactly what we're supposed to rely on RDBMSs to do for us. One final thought here before we get back to our regularly scheduled program. We were only reading the data, so imagine what would have been involved if we had wanted to update each of those data sources in a single transaction.

Stored procedures

In our previous example, we used a standard SQL statement. If we want, we can also execute stored procedures on linked servers. For example, we can execute everyone's favorite system stored procedure, sp_help, using the same four-part syntax:
GreatPlains.Sales.dbo.sp_help

Updating data

You've now seen how you can retrieve data from remote data sources—but what about updating it? Well, it works just as you would expect with standard Transact-SQL. For example, if you want to change the name of Save-a-lot-Markets to Spend-a-bunch Markets in the customers table, you could use the following query:
UPDATE Solomon.Customer.dbo.Customers
 SET CompanyName = 'Spend-a-bunch Markets'
 WHERE CustomerID = 'SAVEA'
You can also use the INSERT and DELETE statements the same way. Just be sure you identify the remote data by using the four-part syntax.

Using stored procedures to set up linked servers

Earlier we saw how to use Enterprise Manager to create linked servers. For your production servers, you'll want to use SQL Server's system stored procedures to define linked servers so that you can reconfigure your server should you ever need to.

The stored procedures themselves are straightforward, but we've found some significant problems that aren't well documented. Thus, we're going to walk you through the process in rather minute detail with quite a few examples so that you can avoid some of the mistakes we've made along the way.

You can use the sp_addlinkedserver stored procedure to add a linked server. It uses the following syntax:

sp_addlinkedserver [@server =] 'server'
 [, [@srvproduct =] 'product name']
 [, [@provider =] 'provider name']
 [, [@datasrc =] 'data source']
 [, [@location =] 'location']
 [, [@provstr =] 'provider string']
 [, [@catalog =] 'catalog']
Let's look at each of the seven parameters you use in sp_addlinkedserver.

Server

The server parameter represents the name by which you'll refer to the linked server. This is the only required parameter for the sp_addlinkedserver stored procedure. If you don't specify any of the other parameters, SQL Server assumes you want to set up a link to a remote SQL Server (and not a heterogeneous data source). It also forces you to have the linked server name be the same as the server's network name. For example, you could use the following syntax to add a linked server definition for the server named GreatPlains:
sp_addlinkedServer @server='GreatPlains'

Provider name

The provider name parameter represents the OLE DB provider name and it must match one of the providers installed on the local computer. You need to be careful with the provider name parameter because the sp_addlinkedserver system stored procedure doesn't validate it. For example, consider the following query:
sp_addlinkedserver
@server = 'bad_GreatPlains',
@srvproduct = 'Remote SQL Data',
@provider = 'BAD_SQLOLEDB'
@datasrc = 'GreatPlains'
go

Server added.
1>
In this example, SQL Server successfully adds a linked server even though the provider name parameter is invalid. But when you try to run a query against this linked server, you'll receive the following error:

1> SELECT * FROM bad_GreatPlains.Sales.

=>dbo.Orders

2> go

Msg 7403, Level 16, State 1, Line 1

Could not locate registry entry for OLE DB

=>provider 'BAD_SQLOLEDB'.

1>

If you're unsure of the exact OLE DB provider name, you can use the xp_enum_oledb_providers extended stored procedure. It reads the local computer's Registry and returns a list of installed providers in a single column list that's just perfect for cutting and pasting.

Product name

You use the product name parameter to specify the friendly name for the OLEDB provider, but using this parameter is anything but friendly. In fact, it can be downright frustrating. The documentation correctly states that it defaults to SQL Server but there's much more to it that that.

If you set it to SQL Server, you cannot pass any other parameters. If you try, you'll receive an error 15428 as demonstrated by the following code:

sp_addlinkedserver
@server = 'GreatPlains',
@srvproduct = 'SQL Server',
@provider = 'SQLOLEDB'
go

Msg 15428, Level 16, State 1, Procedure
sp_addlinkedserver, Line 67. You cannot
specify a provider or any properties for
product 'SQL Server'.
1<
If you try to take advantage of it defaulting to ‘SQL Server' and leave it blank while passing other parameters, SQL Server returns error 15429:
sp_addlinkedserver
@server = 'GreatPlains',
@srvproduct = 'SQL Server',
@provider = 'SQLOLEDB',
go

Msg 15429, Level 16, State 1, Procedure
sp_addlinkedserver, Line 72 '(null)' is
an invalid product name. @datasrc =
'GreatPlains'
1>
So, how about if we try using ‘Remote SQL Server Data' as the friendly name? No such luck. We receive error 15429 again, but with a different description this time:
sp_addlinkedserver
@server = 'GreatPlains',
@srvproduct = 'Remote SQL Server Data',
@provider = 'SQLOLEDB',
@datasrc = 'GreatPlains'
go

Msg 15429, Level 16, State 1, Procedure
sp_addlinkedserver, Line 72 'Remote SQL
Server Data' is an invalid product name.
1>
With a little experimentation, we've learned that we simply can't use ‘SQL Server' as any part of the product name unless the product name exactly equals ‘SQL Server', and you pass no other parameters. This query works:
sp_addlinkedserver
@server = 'GreatPlains',
@srvproduct = 'SQL Server'
go

Server added.
1>
As does this:
sp_addlinkedserver
@server = 'GreatPlains',
@srvproduct = 'Remote SQL Data',
@provider = 'SQLOLEDB'
go

Server added.
1>
If you don't want to use a friendly name, simply pass an empty string, @srvproduct = ''.

Data source

Use the data source parameter to specify the name of the data source the OLE DB provider will use to locate the remote data. For SQL Server, you can use either the network name of the database server or an aliased name you created with the Client Configuration Utility.

Location

This parameter enables you to specify the location of the data source that the OLE DB will use when connecting. It isn't used when connecting to remote SQL Servers.

Provider string

Use the provider string parameter to tell the OLE DB provider how to connect to the remote data store. This parameter accepts only a DSNProviderString style connection string; the DataSource;UserName;Password format acceptable for the OpenRowSet function doesn't work here. Here's the syntax:
sp_addlinkedserver @server = 'ProvStr_SAP',
 @srvproduct = '',
 @provider = 'SQLOLEDB',
 @datasrc = 'SAP',
 @provstr = 'Server=SAP;UID=sa;pwd=Secret;'

Catalog

You can use the catalog parameter to specify what catalog the OLE DB provider should use when connecting to the remote data source. For SQL Server data sources, use the default database. For example, in the following query, we pass the Inventory database for the catalog parameter:
sp_addlinkedserver @server='SAPCatalog',
 @srvproduct = '',
 @provider = 'SQLOLEDB',
 @datasrc = 'SAP',
 @catalog = 'Inventory'
You can determine if you specified a catalog for a linked server by selecting the linked server's tables icon in Enterprise Manager. If the right pane contains a list of tables, these are the tables from the default database, as shown in Figure D. If we hadn't specified a catalog, the right pane would have been empty.

Figure D: You can use SQL Server Enterprise Manager to view the default database for a linked server.
[ Figure D ]

Although you can define a default database (catalog) when you use the sp_addlinkedserver stored procedure, you can query any database on the server by using the following standard four-part syntax:

LinkedServer.Catalog.Schema.Object
In fact, because you must specify all four parts anyway, defining a default database doesn't do much for you. Therefore, we usually save a little typing and skip the catalog parameter when defining a linked server with sp_addlinkedserver.

Configuring linked server properties

Once you've defined a linked server by using sp_addlinkedserver, you'll still need to configure the server's properties if you want to execute stored procedures on it. (Remember, in SQL Server Enterprise Manager, we had check boxes for the RPC option.) If you use stored procedures to add a linked server, you can use the sp_serveroption system stored procedure to configure its options. This procedure works just like sp_dboption (for configuring your local server) that you might be more accustomed to using. Use the following syntax:
sp-_serveroption [@server = ] ['server' ]
 [, [@optname = ] 'option name' ]
 [, [@optvalue = ] 'option value' ]
There are several possible option settings that you can determine by calling sp_serveroption with no parameters. We're going to only concern ourselves with the two that relate to RPC. In the following examples, we use sp_serveroption to set the RPC and RPC Out server options on the linked server:
exec sp_serveroption 'SAP', 'rpc', TRUE
exec sp_serveroption 'SAP', 'rpc out', TRUE
Now you can execute stored procedures on the remote SQL Server.

Viewing linked servers

If you want to see a list of the linked servers you've defined or to confirm that your newly added server really does exist, you can use the sp_linkedservers stored procedure. This procedure returns a list of not only the linked servers, but also the configuration information for each server such as the Provider String, Data Source, and so on.

Dropping linked servers

Finally, if you want to remove a linked sever, you can use the stored procedure sp_dropserver. For example, if we wanted to remove the linked server we just added with the invalid OLE DB Provider, we could execute this statement:
sp_dropserver  'bad_GreatPlains'

Creating ad hoc distributed queries

As we mentioned earlier, you can execute distributed queries either against a pre-configured linked server, or you can reference the remote data source dynamically as part of your query syntax. To dynamically reference a remote data source, you use the OpenRowSet() function as the table name in the FROM clause of a query. Use the following syntax with OpenRowSet():
OpenRowSet(Provider, Connection String,
=>SQL String)
Replace the variables in the function with the following information:

  • ;Provider—Use this to define the OLE DB provider to connect to the remote server. For a SQL server, use 'SQLOLEDB'.

  • Connection String—This tells the OLE DB provider how to connect to the remote data store. You can format this either as DataSource;UserName;Password or DSNProviderString.

  • SQL String—This is a string that the data source will recognize and will either result in it returning a result set or perform an update, insert, or delete. If you're connecting to a SQL server, you can use any valid Transact-SQL statement.

We can rewrite our original example, to retrieve an alphabetical list of cities from the customer table by using the following syntax:

SELECT C.City
FROM OpenRowSet('SQLOLEDB',
Solomon;sa;Secret;' ,
‘SELECT City FROM Customer.dbo.Customers')
AS C
GROUP BY C.City
ORDER BY C.City
In this example, we use the AS syntax for aliasing the table name so that we can make sure we don't have ambiguous table names in the SELECT clause (if we've referenced more than one table). Alternately, we could use the following query using a DSNProviderString:
SELECT C.City
FROM OpenRowSet('SQLOLEDB',
'Server=Solomon;UID=sa;PWD=Secret' ,
'SELECT City FROM Customer.dbo.Customers')
AS C
GROUP BY C.City
ORDER BY C.City
If you use a DSNProviderString, make sure you don't accidentally leave a space between the Server and UID values or the query parser won't be able to interpret the login credentials—and you'll be denied access. We spent a very frustrating (and unbillable) hour or two debugging an application that had an extra space buried deep in a dynamically built SQL statement.

Finally, you can update data on a remote data store by using OpenRowSet. For example, we can return the CompanyName back to Save-a-bunch Markets by using the following query:

UPDATE OpenRowSet('SQLOLEDB',
'Solomon';'sa';'Secret',
'SELECT CustomerID, CompanyName
FROM Customer.dbo.Customers
WHERE CustomerID = ''SAVEA'' ')
SET CompanyName = 'Save-a-bunch Markets'
WHERE CustomerID = 'SAVEA'
Here we used the DataSource;UserName;Password syntax for the connection string, but DSNProviderString would have worked just as well. You'll also want to note the query string portion of the OpenRowSet function. While a simple
SELECT * FROM Customer.dbo.Customers
would have worked, we included the same WHERE clause and column list used in my UPDATE clause to minimize the rows processed.

Conclusion

In this article, we showed you how to utilize SQL Sever 7.0's new distributed queries to present data that's distributed on several servers to end users as a single, coherent result set. Not only does the ability to execute distributed queries make our lives as DBAs much easier, but it greatly eases the burden on developers from writing distributed applications. It frees them from having to write their own code to connect to multiple servers, issue multiple SQL queries, and step through and synchronize multiple result sets.

This month, we looked at how to defined linked servers in SQL Server Enterprise Manager and through stored procedures, and how to execute distributed queries against linked servers. In addition, we also showed you how to execute ad hoc distributed queries when connecting to remote servers. So as not to overwhelm ourselves, we limited ourselves to a SQL Server only environment. Next month, we're going to expand our data source horizons and look at other types of data including non-relational ones. We'll also take a closer look at the performance and security issues when dealing with distributed, heterogeneous data sources.


 

Top of Page  •  Back Issues


Copyright © 2001 Element K Content LLC. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of Element K Content LLC is prohibited. Element K is a service mark of Element K LLC.

[ Advertise with Us! ]    Privacy Policy     Questions? Comments?