|
![]() ![]()
March 2000Using distributed queries with SQL Server 7.0by Scott Mauvais With SQL Server taking a more prominent role in most companies' IT strategies, those of us that are responsible for managing these databases are finding our server rooms filling up with SQL boxes as more and more applications roll into production. At first glance, having a server dedicated to a few applications rather than a single huge server looks like a good thing—we probably even argued for it. Sure, the extra hardware was expensive, but it sure made tuning easy. However, as we delve deeper into this multiple server approach, we've discovered some complications. We're first introduced to some of these complications when some business manager says something to the effect of "We already track that information so don't bother collecting it yourself—just use our data. The IS folks will take care of it—it's all in SQL Server." All of a sudden, you now need to validate your Web orders against the quantity-on-hand in your inventory control database—and these databases are on separate servers. One solution is to use linked servers and distributed queries. These queries enable you to integrate data from several linked SQL servers and present it to your users as if it came from a single source—your application. You can easily define linked servers in SQL Server Enterprise Manager, as shown in Figure A. Figure A: You can define a linked server by using SQL Server
Enterprise Manager. Before we dive into the details of our application, you should
understand the technology behind SQL Server's distributed queries. A
distributed query enables users and application developers to access data
from multiple remote data sources by using the same standard SQL
statements such as SQL Server can work with any data source that supports OLE DB. Obvious examples of these include relational databases such as SQL Server, Oracle, and anything that's accessible via ODBC; but it also includes non-relational stores such as LDAP, file systems, and email stores. In this article, we're going to focus solely on SQL Server; next month, we're going to look at working with heterogeneous data—specifically, data in non-relational stores. We're going to begin by defining linked servers, and then writing
distributed queries to join remote servers' data with data from our local
server. Next, we'll simulate a multi-application environment by making use
of different tables in Northwind on each of our servers. We're going to
access the Setting up a linked server within Enterprise ManagerTo access data on another server, you must first tell SQL Server how to access the remote data. You can do this by using either of two methods: defining a linked server or using theOpenRowSet() function to create a
dynamic reference. A linked server enables you to define (and save)
all of the information you need for connecting to a remote server.
When you set up a linked server, SQL Server adds a row to the
You can define a linked server by using either SQL Server Enterprise Manager or stored procedures. To define a linked server in SQL Server Enterprise Manager, begin by accessing the Security folder below the server from which you want to establish the link to a remote server. In our example, we're connecting from the Great Plains server with the customer database to the remote Solomon server holding the orders database. Next, right-click on the Linked Servers object. From the shortcut menu, choose New Linked Server, as shown in Figure A. You use the General tab, as shown in Figure B, to define information about the linked server. Let's look at each of the options on the General property page in order:
Figure B: To configure a SQL server as a linked server, define a
name for that server and choose SQL Server as the Server Type. Let's move on to the Security tab. You use this tab to define the
security context that users will use to access the remote data store.
We'll talk about security in more detail in next month's article. For the
time being, just select the Figure C: Use the Security tab to set the linked server's
security context. Using distributed queries with linked serversNow that we have a linked server set up, we're ready to execute a distributed query. When executing a distributed query, you reference objects on a linked server by using the following four-part syntax:LinkedServer.Catalog.Schema.ObjectAs you can see, Microsoft designed this syntax to work with any OLE DB data sources rather than just with SQL Server. That said, specifying a remote SQL Server object is extremely easy and will look familiar to anyone who has worked with inter-database queries. Let's look at each of the four parts:
You're going to see that throughout our distributed query examples we've included the object's owner in each query. When working with inter-database queries, SQL Server can resolve the object owner's name for you. Thus, both of the following queries are valid: SELECT * FROM Northwind..Customers SELECT * FROM Northwind.dbo.CustomersOLE DB, however, doesn't support implicit name resolution. If you remove the object owner from the four-part syntax, you'll receive a 7314 error telling you the object doesn't exist. As a result, it's best for you to include the object's owner as part of the four-part name—even if you're simply connecting to another SQL server. Basic queriesFor example, let's say we want to query the Customer database on the linked Solomon server to get an alphabetical list of cities in which our customers are located. To do so, we must use the four-part syntax for accessing a linked server, as shown in the following query:SELECT City FROM Solomon.Customer.dbo.Customers GROUP BY City ORDER BY City Table joinsYou can join tables, too. The joined tables can be all on the linked server, or you can join a table on the linked server to a table on your local server. For example, assume we're currently logged on to the Great Plains server that contains the Sales database—the NorthwindOrders and Order Details tables.
Now, let's say we want to determine what customers have ordered
products that we have since discontinued. To accomplish this, we're going
to need to connect to the Inventory database on the SAP server for the
To obtain this information, we can use the query shown in Listing A. We used brackets as delimiters for names with embedded spaces. In addition, because of the length of the four-part names for each table, we used aliases to help you keep your sanity (and to keep you from getting carpal tunnel syndrome!). Once we have the result set in hand, we could contact each of these customers and let them know if we have any new products to offer as replacements for their favorite goods. Listing A: Table joins in a distributed query SELECT C.CustomerID, C.CompanyName, C.ContactName, C.Phone, O.OrderID, O.OrderDate, P.ProductName, P.Discontinued FROM Solomon.Customer.dbo.Customers AS C, Sales.dbo.Orders AS O, Sales.dbo.[Order Details] AS D, SAP.Inventory.dbo.Products AS P WHERE C.CustomerID = O.CustomerID AND O.OrderID = D.OrderID AND D.ProductID = P.ProductID AND P.Discontinued = 1 ORDER BY C.CompanyName, O.OrderID, P.ProductNameLet's take a moment to consider what it would have taken to retrieve this same result set if we didn't use distributed queries. For starters, our application would have required three separate connections along with the associated overhead required to establish them. We would also have had to know where the three servers were located and how to connect to them. Because our application is responsible for knowing how to connect to these disparate servers, we probably need to provide maintenance screens to enable the user to change the server name, login credentials, and a myriad of other settings. As they say on late night infomercials, "But wait, there's more!" So far, we've only addressed the difficulty in connecting to the various databases. We still need to processes the result sets. To find out who has purchased products that are now discontinued we would have to construct a procedure around the following pseudo code: Fetch all discontinued products For each product, fetch all orders For each order, Fetch customerNo, the coding isn't that difficult—a standard outer/inner looping construct—but the performance will be abysmal. We're essentially writing our own ISAM-style database engine in a SQL application. For each and every value in the header table, we're manually joining to the detail table to get the matching records for just that one value. Potentially we could loop through and fetch every customer for every product. These nested loops and merge joins are exactly what we're supposed to rely on RDBMSs to do for us. One final thought here before we get back to our regularly scheduled program. We were only reading the data, so imagine what would have been involved if we had wanted to update each of those data sources in a single transaction. Stored proceduresIn our previous example, we used a standard SQL statement. If we want, we can also execute stored procedures on linked servers. For example, we can execute everyone's favorite system stored procedure,sp_help , using the same four-part syntax: GreatPlains.Sales.dbo.sp_help Updating dataYou've now seen how you can retrieve data from remote data sources—but what about updating it? Well, it works just as you would expect with standard Transact-SQL. For example, if you want to change the name of Save-a-lot-Markets to Spend-a-bunch Markets in the customers table, you could use the following query:UPDATE Solomon.Customer.dbo.Customers SET CompanyName = 'Spend-a-bunch Markets' WHERE CustomerID = 'SAVEA'You can also use the INSERT
and DELETE statements the same way. Just be sure you identify
the remote data by using the four-part syntax.
Using stored procedures to set up linked serversEarlier we saw how to use Enterprise Manager to create linked servers. For your production servers, you'll want to use SQL Server's system stored procedures to define linked servers so that you can reconfigure your server should you ever need to.The stored procedures themselves are straightforward, but we've found some significant problems that aren't well documented. Thus, we're going to walk you through the process in rather minute detail with quite a few examples so that you can avoid some of the mistakes we've made along the way. You can use the sp_addlinkedserver [@server =] 'server' [, [@srvproduct =] 'product name'] [, [@provider =] 'provider name'] [, [@datasrc =] 'data source'] [, [@location =] 'location'] [, [@provstr =] 'provider string'] [, [@catalog =] 'catalog']Let's look at each of the seven parameters you use in sp_addlinkedserver .
ServerTheserver parameter represents the name by
which you'll refer to the linked server. This is the only required
parameter for the sp_addlinkedserver stored procedure. If you
don't specify any of the other parameters, SQL Server assumes you want to
set up a link to a remote SQL Server (and not a heterogeneous data
source). It also forces you to have the linked server name be the same as
the server's network name. For example, you could use the following syntax
to add a linked server definition for the server named GreatPlains: sp_addlinkedServer @server='GreatPlains' Provider nameTheprovider name parameter represents
the OLE DB provider name and it must match one of the providers installed
on the local computer. You need to be careful with the provider
name parameter because the sp_addlinkedserver system
stored procedure doesn't validate it. For example, consider the following
query: sp_addlinkedserver @server = 'bad_GreatPlains', @srvproduct = 'Remote SQL Data', @provider = 'BAD_SQLOLEDB' @datasrc = 'GreatPlains' go Server added. 1>In this example, SQL Server successfully adds a linked server even though the provider name parameter is invalid. But when
you try to run a query against this linked server, you'll receive the
following error:
=>dbo.Orders 2> go Msg 7403, Level 16, State 1, Line 1 Could not locate registry entry for OLE DB =>provider 'BAD_SQLOLEDB'. 1> If you're unsure of the exact OLE DB Product nameYou use theproduct name parameter to
specify the friendly name for the OLEDB provider, but using this parameter
is anything but friendly. In fact, it can be downright frustrating. The
documentation correctly states that it defaults to SQL Server but there's
much more to it that that.
If you set it to SQL Server, you cannot pass any other parameters. If you try, you'll receive an error 15428 as demonstrated by the following code: sp_addlinkedserver @server = 'GreatPlains', @srvproduct = 'SQL Server', @provider = 'SQLOLEDB' go Msg 15428, Level 16, State 1, Procedure sp_addlinkedserver, Line 67. You cannot specify a provider or any properties for product 'SQL Server'. 1<If you try to take advantage of it defaulting to ‘SQL
Server' and leave it blank while passing other parameters, SQL
Server returns error 15429: sp_addlinkedserver @server = 'GreatPlains', @srvproduct = 'SQL Server', @provider = 'SQLOLEDB', go Msg 15429, Level 16, State 1, Procedure sp_addlinkedserver, Line 72 '(null)' is an invalid product name. @datasrc = 'GreatPlains' 1>So, how about if we try using ‘Remote SQL Server
Data' as the friendly name? No such luck. We receive error 15429
again, but with a different description this time: sp_addlinkedserver @server = 'GreatPlains', @srvproduct = 'Remote SQL Server Data', @provider = 'SQLOLEDB', @datasrc = 'GreatPlains' go Msg 15429, Level 16, State 1, Procedure sp_addlinkedserver, Line 72 'Remote SQL Server Data' is an invalid product name. 1>With a little experimentation, we've learned that we simply can't use ‘SQL Server' as any part of the product name unless
the product name exactly equals ‘SQL Server' , and you pass no
other parameters. This query works: sp_addlinkedserver @server = 'GreatPlains', @srvproduct = 'SQL Server' go Server added. 1>As does this: sp_addlinkedserver @server = 'GreatPlains', @srvproduct = 'Remote SQL Data', @provider = 'SQLOLEDB' go Server added. 1>If you don't want to use a friendly name, simply pass an empty string, @srvproduct = '' .
Data sourceUse thedata source parameter to specify
the name of the data source the OLE DB provider will use to locate the
remote data. For SQL Server, you can use either the network name of the
database server or an aliased name you created with the Client
Configuration Utility.
LocationThis parameter enables you to specify the location of the data source that the OLE DB will use when connecting. It isn't used when connecting to remote SQL Servers.Provider stringUse theprovider string parameter to
tell the OLE DB provider how to connect to the remote data store. This
parameter accepts only a DSNProviderString style connection
string; the DataSource;UserName;Password format acceptable
for the OpenRowSet function doesn't work here. Here's the
syntax: sp_addlinkedserver @server = 'ProvStr_SAP', @srvproduct = '', @provider = 'SQLOLEDB', @datasrc = 'SAP', @provstr = 'Server=SAP;UID=sa;pwd=Secret;' CatalogYou can use thecatalog parameter to specify
what catalog the OLE DB provider should use when connecting to the remote
data source. For SQL Server data sources, use the default database. For
example, in the following query, we pass the Inventory database for the
catalog parameter: sp_addlinkedserver @server='SAPCatalog', @srvproduct = '', @provider = 'SQLOLEDB', @datasrc = 'SAP', @catalog = 'Inventory'You can determine if you specified a catalog for a linked server by selecting the linked server's tables icon in Enterprise Manager. If the right pane contains a list of tables, these are the tables from the default database, as shown in Figure D. If we hadn't specified a catalog, the right pane would have been empty. Figure D: You can use SQL Server Enterprise Manager to view the
default database for a linked server. Although you can define a default database (catalog) when you use the
LinkedServer.Catalog.Schema.ObjectIn fact, because you must specify all four parts anyway, defining a default database doesn't do much for you. Therefore, we usually save a little typing and skip the catalog parameter when defining a linked server with
sp_addlinkedserver .
Configuring linked server propertiesOnce you've defined a linked server by usingsp_addlinkedserver , you'll still need to
configure the server's properties if you want to execute stored procedures
on it. (Remember, in SQL Server Enterprise Manager, we had check boxes for
the RPC option.) If you use stored procedures to add a linked server, you
can use the sp_serveroption system stored procedure to
configure its options. This procedure works just like
sp_dboption (for configuring your local server) that you
might be more accustomed to using. Use the following syntax: sp-_serveroption [@server = ] ['server' ] [, [@optname = ] 'option name' ] [, [@optvalue = ] 'option value' ]There are several possible option settings that you can determine by calling sp_serveroption
with no parameters. We're going to only concern ourselves with the two
that relate to RPC. In the following examples, we use
sp_serveroption to set the RPC and RPC Out server options on
the linked server: exec sp_serveroption 'SAP', 'rpc', TRUE exec sp_serveroption 'SAP', 'rpc out', TRUENow you can execute stored procedures on the remote SQL Server. Viewing linked serversIf you want to see a list of the linked servers you've defined or to confirm that your newly added server really does exist, you can use thesp_linkedservers stored
procedure. This procedure returns a list of not only the linked servers,
but also the configuration information for each server such as the
Provider String, Data Source, and so on.
Dropping linked serversFinally, if you want to remove a linked sever, you can use the stored proceduresp_dropserver . For
example, if we wanted to remove the linked server we just added with the
invalid OLE DB Provider, we could execute this statement: sp_dropserver 'bad_GreatPlains' Creating ad hoc distributed queriesAs we mentioned earlier, you can execute distributed queries either against a pre-configured linked server, or you can reference the remote data source dynamically as part of your query syntax. To dynamically reference a remote data source, you use theOpenRowSet() function as the table name in the
FROM clause of a query. Use the following syntax with
OpenRowSet() : OpenRowSet(Provider, Connection String, =>SQL String)Replace the variables in the function with the following information:
We can rewrite our original example, to retrieve an alphabetical list of cities from the customer table by using the following syntax: SELECT C.City FROM OpenRowSet('SQLOLEDB', Solomon;sa;Secret;' , ‘SELECT City FROM Customer.dbo.Customers') AS C GROUP BY C.City ORDER BY C.CityIn this example, we use the AS syntax
for aliasing the table name so that we can make sure we don't have
ambiguous table names in the SELECT clause (if we've
referenced more than one table). Alternately, we could use the following
query using a DSNProviderString : SELECT C.City FROM OpenRowSet('SQLOLEDB', 'Server=Solomon;UID=sa;PWD=Secret' , 'SELECT City FROM Customer.dbo.Customers') AS C GROUP BY C.City ORDER BY C.CityIf you use a DSNProviderString , make
sure you don't accidentally leave a space between the Server and UID
values or the query parser won't be able to interpret the login
credentials—and you'll be denied access. We spent a very frustrating (and
unbillable) hour or two debugging an application that had an extra space
buried deep in a dynamically built SQL statement.
Finally, you can update data on a remote data store by using
UPDATE OpenRowSet('SQLOLEDB', 'Solomon';'sa';'Secret', 'SELECT CustomerID, CompanyName FROM Customer.dbo.Customers WHERE CustomerID = ''SAVEA'' ') SET CompanyName = 'Save-a-bunch Markets' WHERE CustomerID = 'SAVEA'Here we used the DataSource;UserName;Password syntax for the connection
string, but DSNProviderString would have worked just as well.
You'll also want to note the query string portion of the
OpenRowSet function. While a simple SELECT * FROM Customer.dbo.Customerswould have worked, we included the same WHERE clause and column list used in my
UPDATE clause to minimize the rows processed.
ConclusionIn this article, we showed you how to utilize SQL Sever 7.0's new distributed queries to present data that's distributed on several servers to end users as a single, coherent result set. Not only does the ability to execute distributed queries make our lives as DBAs much easier, but it greatly eases the burden on developers from writing distributed applications. It frees them from having to write their own code to connect to multiple servers, issue multiple SQL queries, and step through and synchronize multiple result sets.This month, we looked at how to defined linked servers in SQL Server Enterprise Manager and through stored procedures, and how to execute distributed queries against linked servers. In addition, we also showed you how to execute ad hoc distributed queries when connecting to remote servers. So as not to overwhelm ourselves, we limited ourselves to a SQL Server only environment. Next month, we're going to expand our data source horizons and look at other types of data including non-relational ones. We'll also take a closer look at the performance and security issues when dealing with distributed, heterogeneous data sources.
![]() |