SQL Server Solutions

May 2000

Understanding parallel queries in SQL Server 7.0

 
by Scott Mauvais

When Microsoft shipped SQL Server 7.0, it checked off another key box in its head-to-head competition with Oracle and DB2: the ability to execute queries in parallel on multiprocessor servers. If you've ever watched Windows NT Performance Monitor, you might have noticed a situation where SQL Server uses only a single processor. "Wait a minute," you might have asked yourself. "I thought SQL Server was supposed to be a true, multithreaded app." Well, the answer is yes and no.

Prior to version 7.0, SQL Server as a whole took advantage of all the processors in your system and spread the work across them, but when it came down to the basic unit of work, the individual query, everything executed serially. In other words, the application was multithreaded, but an individual query had only one thread of execution. Both Oracle and IBM used this fact, along with SQL Server's lack of row-level locking, to great advantage when selling against SQL Server in shops with large data sets and applications that relied upon complex queries.

All that has changed with the release of SQL Server 7.0. In this article, we'll take a detailed look at the parallel query architecture and how SQL Server decides whether to execute a query in parallel or not. From there, we'll examine the performance tradeoffs and learn how to control how SQL Server executes parallel queries. For example, you can limit the number of processors SQL Server can use to process parallel queries, as shown in Figure A. Finally, we'll set up and execute some sample queries and compare the performance between the serial and parallel execution plans.

Figure A: You can configure the number of processors SQL Server uses for parallel queries.
[ Figure A ]

Parallel query architecture

First, let's make sure we understand what is meant by intra-query parallelism. When SQL Server executes a query in parallel, it breaks the single query down into subsections and spreads those subsections across multiple processors. A classic example of a parallel query is the calculation of quarterly sales. To apply this to the Northwind sample database, you would use something like the following query:

SELECT TOP 4 DATEPART(qq, OrderDate) AS FQ,
       DATEPART(year, OrderDate) AS FY,
       SUM(UnitPrice * Quantity) AS Sales
FROM Orders O,
     [Order Details] OD
WHERE O.OrderID = OD.OrderID
GROUP BY DATEPART(year, OrderDate),
         DATEPART(qq, OrderDate)
ORDER BY FY DESC, FQ DESC

In the classic example, this query would be broken down into four subtasks, one for each of the four quarters. On a quad-processor machine, the server would delegate one fiscal quarter to each processor.

SQL Server behaves quite similarly, except that it doesn't break the query down into tasks based on column values or the number of groups returned. This makes sense because the server can't know the number of distinct values until it actually executes the query, so it has no way of partitioning the work across the processors using this approach.

When executing a query in parallel, SQL Server breaks the rows down into streams, regardless of value, and sends each stream (consisting of a roughly equal number of rows) to a separate processor. Each processor then executes the complete query against its subset of the rows. Finally, near the end of the query execution, SQL Server aggregates these separate streams and produces a single result set.

Well, that's almost correct. Sometimes, not every processor participates in a parallel query. SQL Server supports what it calls degrees of parallelism, or DOP. A degree of parallelism refers to the number of processors involved in executing a query. A query with a DOP of 2 executes on two processors; a DOP of 4 corresponds to four processors. What this means is that a given query may execute with a DOP of either 2 or 4 on a quad-processor machine, depending on the load on the server. While obvious, it's probably worth noting that the DOP can't exceed the number of processors installed in the machine and allocated to SQL Server.

One thing that makes SQL Server's parallel query architecture stand out is its lack of explicit parallelism directives. Other databases require the developer to explicitly define a query as parallel at design time by including specific syntax in the query. Microsoft, by contrast, has written parallelism into SQL Server's Query Optimizer rather than into the various logical operations (for example, seek, join, and sequence). Instead of relying on the developer, SQL Server decides at optimization time whether to execute a parallel or serial query plan.

When the Query Optimizer detects that a given query could benefit from parallelism, it inserts exchange operators into the query. These operators manage the streams and the flow control during the execution of the query. With this approach in mind, Microsoft developed what it calls parallel everything, where each query operation can execute in parallel when combined with an exchange operator.

In the SQL Server 7.0 release, the INSERT, UPDATE, and DELETE operators (just the operators, not the entire query) always execute serially to avoid race conditions. The remainder of the query, such as the WHERE clause or the SELECT portion of an INSERT query, can execute in parallel. In other words, SQL Server identifies the rows affected by the data modification in parallel but it applies the changes serially. Microsoft has said that future versions of SQL Server will be fully parallel.
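
If you're curious whether the optimizer actually chose a parallel plan for a particular query, one simple way to check is to display the estimated execution plan with SET SHOWPLAN_TEXT. The sketch below reuses the quarterly sales query from earlier; when a parallel plan is selected, the exchange operators typically show up as Parallelism steps (Gather Streams, Repartition Streams, and so on) in the output:

-- Display the estimated plan without actually running the query
SET SHOWPLAN_TEXT ON
GO
SELECT TOP 4 DATEPART(qq, OrderDate) AS FQ,
       DATEPART(year, OrderDate) AS FY,
       SUM(UnitPrice * Quantity) AS Sales
FROM Orders O, [Order Details] OD
WHERE O.OrderID = OD.OrderID
GROUP BY DATEPART(year, OrderDate), DATEPART(qq, OrderDate)
ORDER BY FY DESC, FQ DESC
GO
SET SHOWPLAN_TEXT OFF
GO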

Performance tradeoffs

Let's move on and examine how the optimizer determines whether a query would run faster if executed on multiple processors. To decide this, the optimizer looks at the query itself and the current operating environment. This is a key point that also distinguishes SQL Server from its competitors.

Similar to its approach with dynamic locking, Microsoft has built into SQL Server the ability to monitor its own usage and optimize queries appropriately. Thus, depending on the current load on the server, a query that executes in parallel one time may execute serially the next. For SQL Server to execute a query in parallel, the query and the current environment must pass the following six tests:

  1. The server must have more than one processor and SQL Server must be configured to use at least two of them. Okay, that one was obvious. It's worth noting here that although SQL Server 7.0 Desktop Edition will use up to two processors, it doesn't support intra-query parallelism.

  2. The developer or DBA must not have disabled or reduced the level of parallel query processing. We'll talk more about this later in the article.

  3. The server must have sufficient memory available because executing a query in parallel requires significantly more memory than executing the same query serially. The higher the DOP, the more memory required. If the server doesn't have enough memory available, the Query Optimizer backs off the DOP (all the way down to 1, or serial execution) until it finds the server has sufficient memory to run the query. You can see the additional memory requirements for yourself by examining the Granted Workspace Memory counter of the SQL Server:Memory Manager object in Performance Monitor.

  4. The server must have sufficient CPU bandwidth available. Just as a parallel query uses more memory on your server, it also uses more CPU cycles than the same query executed serially. Parallel queries increase the workload on your server's CPUs because of the additional overhead of processing the exchange operators and of coordinating the multiple processors. If the server is under heavy load, the optimizer reduces the DOP.

  5. The query must be sufficiently complex for the time saved by executing on multiple threads to overcome the time lost due to the CPU overhead. In terms of parallelism, complex queries are those that involve scans or joins against large tables or aggregations involving many rows or many distinct values. To determine if a query is a good candidate for intra-query parallelism, SQL Server uses the cost threshold for parallelism global configuration parameter. We'll talk more about this value later when we discuss controlling parallel queries.

  6. There must be sufficient rows in each stream. Here, the Query Optimizer uses its statistics about data distribution to determine if the overhead of starting up a parallel query would be greater than the performance gained.

After the optimizer has evaluated a query against these six criteria, it places exchange operators in the queries it has determined will benefit from parallel execution.

As a developer or DBA, you can exercise a great deal of influence over how SQL Server handles these six criteria and, consequently, you can control the DOP at several different levels. First, you can disable all parallel queries or limit the maximum DOP at the server level. If that's too sweeping a change, you can increase the complexity threshold a query must cross before the optimizer considers it for parallel execution. Finally, you can have extremely granular control by specifying the maximum DOP on a query-by-query basis. Let's look at each of these approaches.

Disabling parallel query processing for your server

To turn off all parallel query processing, bring up the SQL Server Properties window shown in Figure A. To do so, right-click on your server in Enterprise Manager and choose Properties. Next, click on the Processor tab.

By default, SQL Server uses all available processors for executing parallel queries. To reduce the number of processors used, select the Use option button in the Parallelism section of the dialog box. Doing so enables the combo box where you can select the maximum number of processors you want SQL Server to use. This setting affects the outcome of the second test mentioned earlier. Be careful not to make changes in the Processor Control section of this dialog box. These settings affect the processors used by SQL Server itself, not just the parallel query processor.

The other parallelism setting you can change in this window is the cost threshold for parallelism. This setting determines how complex a given query must be before the Query Optimizer considers it for parallel execution. The default setting is five seconds, meaning that if the optimizer estimates a query will take more than five seconds to run when executed serially, it will consider it for parallel execution. If SQL Server estimates the query will run faster than that, it's considered trivial and will always be executed serially. This corresponds to the fifth of the six tests.
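
If you'd rather check the current values of these two settings from a query window, here's a quick sp_configure sketch. Both are advanced options, so they're only visible while show advanced options is in effect:

EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'max degree of parallelism'
EXEC sp_configure 'cost threshold for parallelism'
EXEC sp_configure 'show advanced options', 0
RECONFIGURE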

If you prefer to change settings through queries instead of dialog boxes, you can also configure these parallelism options by using Transact-SQL. The following commands turn off intra-query parallelism for the entire server:

EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'max degree of parallelism', 1
RECONFIGURE
EXEC sp_configure 'show advanced options', 0
RECONFIGURE
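
To undo this change later, set the option back to its default value of 0, which tells SQL Server to use all of the available processors:

EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'max degree of parallelism', 0
RECONFIGURE
EXEC sp_configure 'show advanced options', 0
RECONFIGURE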

Just like the maximum degree of parallelism, you can change the cost threshold for parallelism via Transact-SQL. For example, to increase the threshold to seven seconds, you could use the following statements:

EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'cost threshold for parallelism', 7
RECONFIGURE
EXEC sp_configure 'show advanced options', 0
RECONFIGURE

Disabling parallel query processing for an individual query

If you require more control over individual queries, you can use an optimizer hint to control the DOP. Unlike the locking hints you might be familiar with, which affect concurrency and consistency and are placed following the FROM clause, the DOP hint is part of the new OPTION clause. Going back to our quarterly sales example, if we want to limit the intra-query parallelism to two processors, we could add a hint like this:

SELECT TOP 4 DATEPART(qq, OrderDate) AS FQ,
       DATEPART(year, OrderDate) AS FY,
       SUM(UnitPrice * Quantity) AS Sales
FROM Orders O, [Order Details] OD
WHERE O.OrderID = OD.OrderID
GROUP BY DATEPART(year, OrderDate),
         DATEPART(qq, OrderDate)
ORDER BY FY DESC, FQ DESC
OPTION (MAXDOP 2)

It's worth drawing attention to the specific syntax. Some of the early SQL Server documentation as well as several articles list the syntax as (MAXDOP=n). The correct syntax is (MAXDOP n); don't use an equal sign.

Why should you adjust the DOP?

With all of SQL Server's great self-tuning we keep hearing about, you might be wondering why you would want to adjust the DOP. The most common reason is if you have a server that isn't dedicated to a single task. In this case, you might want to ensure that a handful of complex queries don't monopolize all of the CPU horsepower on your server and affect the performance of all the other applications running on it.

With the declining cost of hardware, the need for a server to do double duty by running both SQL Server and, say, IIS has gone away for the most part. However, this declining cost has also led to a new class of super servers that host a whole series of SQL applications, running the gamut from high-volume OLTP systems to data warehouses holding hundreds of gigabytes. With servers like this, you'll probably want to limit the DOP so you can maintain the agreed-upon quality of service across all your applications. Turning off parallel queries is also a documented means of working around a couple of bugs in SQL Server. For more information, refer to Knowledge Base articles Q240853 and Q247500.

Now that we understand how SQL Server decides whether to run a query in parallel across a server's processors and we've discussed how to influence this behavior, let's move on and examine the performance differences between parallel and serial queries.

Analyzing the performance of parallel queries

To study the performance of parallel queries in a controlled environment, we created a script that increased the size of the sample Northwind database a thousandfold. We chose this approach primarily because it provided a repeatable test bed that wasn't based on a proprietary schema or confidential information. We'll discuss the details of the script in a future article, but, in the meantime, you can download it along with instructions for its use from my Web site.
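
To give you a rough idea of what such a script has to do, here's a sketch of one possible approach (this isn't the downloadable script described above, and the column lists simply assume the standard Northwind schema): copy the original Orders and Order Details rows back into their tables over and over, offsetting the OrderID on each pass so the primary and foreign keys still line up.

SET NOCOUNT ON
DECLARE @copy int, @offset int
SET @copy = 1
WHILE @copy <= 999                      -- 999 copies plus the originals = 1,000x
BEGIN
    SET @offset = @copy * 100000        -- keeps each pass's OrderIDs unique

    SET IDENTITY_INSERT Orders ON
    INSERT INTO Orders (OrderID, CustomerID, EmployeeID, OrderDate,
           RequiredDate, ShippedDate, ShipVia, Freight, ShipName, ShipAddress,
           ShipCity, ShipRegion, ShipPostalCode, ShipCountry)
    SELECT OrderID + @offset, CustomerID, EmployeeID, OrderDate,
           RequiredDate, ShippedDate, ShipVia, Freight, ShipName, ShipAddress,
           ShipCity, ShipRegion, ShipPostalCode, ShipCountry
    FROM Orders
    WHERE OrderID < 100000              -- original rows only
    SET IDENTITY_INSERT Orders OFF

    INSERT INTO [Order Details] (OrderID, ProductID, UnitPrice, Quantity, Discount)
    SELECT OrderID + @offset, ProductID, UnitPrice, Quantity, Discount
    FROM [Order Details]
    WHERE OrderID < 100000

    SET @copy = @copy + 1
END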

To compare the performance of parallel versus serial execution, we constructed a simple query that computes, for each product, the average sale amount, the number of customers who bought it, and the number of times it was ordered. Here's the query we used:

SELECT ProductID,
       COUNT(C.CustomerID) AS NumCustomers,
       COUNT(O.OrderID) AS NumOrders,
       AVG(UnitPrice * Quantity) AS AvgPrice
FROM Customers C,
     Orders O, [Order Details] OD
WHERE C.CustomerID = O.CustomerID
  AND O.OrderID = OD.OrderID
GROUP BY ProductID
ORDER BY ProductID

This query provides us with a good performance test of parallel queries because it joins three large tables and performs a good-sized aggregation. To run this same query serially, simply append OPTION (MAXDOP 1) to the end of the query.
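
For reference, here's the full serial version with the hint appended:

SELECT ProductID,
       COUNT(C.CustomerID) AS NumCustomers,
       COUNT(O.OrderID) AS NumOrders,
       AVG(UnitPrice * Quantity) AS AvgPrice
FROM Customers C,
     Orders O, [Order Details] OD
WHERE C.CustomerID = O.CustomerID
  AND O.OrderID = OD.OrderID
GROUP BY ProductID
ORDER BY ProductID
OPTION (MAXDOP 1)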

Using Performance Monitor

Figure B shows a Windows NT Performance Monitor graph we captured of the query, first executed serially and then executed in parallel, on the same dual-processor Pentium 500 server. While there are several interesting counters we could have looked at, here we chose to focus strictly on CPU utilization.

The solid black line represents the % Total Processor Time counter of the System object. The white line charts the % Processor Time counter for CPU 0 from the Processor object, and the dotted line represents CPU 1. At the time we executed these queries, there was absolutely no other activity on the server, so the rise and fall of the graphs denote the beginning and ending points of the query execution.

Figure B: You can analyze serial and parallel query performance with Windows NT Performance Monitor.
[ Figure B ]

The first thing you'll undoubtedly notice is the shorter execution time of the second, parallel query. Astute readers will suspect that some of the performance gain during parallel execution came because of SQL Server caching the data after it ran the query the first time. To correct for this measuring error, we used DBCC PINTABLE against the relevant tables before running these tests to ensure they were already in the data cache. To double-check our results, we reversed the order of the tests and achieved the identical outcome.
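
If you want to repeat the test, here's a sketch of how DBCC PINTABLE is typically called. It takes a database ID and a table ID, so you repeat the call for each table the query touches; pinned tables stay in cache until you release them with DBCC UNPINTABLE:

-- Pin the Orders table in Northwind so repeated test runs aren't skewed
-- by physical reads; repeat for Customers and [Order Details]
USE Northwind
DECLARE @db int, @tbl int
SET @db  = DB_ID('Northwind')
SET @tbl = OBJECT_ID('Orders')
DBCC PINTABLE (@db, @tbl)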

Faster execution is, of course, the expected result because of the increased CPU resources available. The interesting thing, however, is the relative amplitude of the CPU utilization between the serial and parallel queries. During the serial execution, CPU 1 (the dotted line) immediately jumps up to around 90 percent utilization, while CPU 0 (the white line) languishes around 10 percent. The result is a total CPU usage of around 50 percent.

Even more interesting is the apparent inverse relationship between the two processors. As one processor peaks within its range, the other drops off. While there's no way to know for sure, the most likely cause is that the thread executing the query on CPU 1 has to wait for SQL Server or Windows NT itself to return the desired data from the cache to the CPU for processing.

When we turn on parallel execution, the first thing you'll notice is the nearly 100 percent Total Processor Time and the almost identical tracking of the two processors. For a good portion of the query's execution, both CPUs are pegged at 100 percent utilization. While this is fine in our test environment, it's probably not acceptable in a production environment where other users would be running their own queries. It's in complex queries such as this one where you'll probably want to scale down the DOP.

The other important aspect of the parallel query plan is that it only provided a little more than a 20 percent performance boost over the serial plan, based on a rough estimate of the CPU graphs. This shows that the scaling isn't linear because of the overhead of coordinating the multiple processors.

Using SQL Server Profiler

To get a better measure of the relative performance of serial and parallel execution plans, we can turn to another tool provided with SQL Server, the Profiler. Before you can effectively use the SQL Server Profiler to analyze parallel queries, you need to add the Event Sub Class data column. To do so, open up your trace and select File | Properties. This brings up the Trace Properties dialog box. Next, click on the Data Columns tab, highlight Event Sub Class in the Unselected Data list box, and click Add, as shown in Figure C. The Event Sub Class column lets you see the DOP for each query in the Trace window.

Figure C: You'll need to add Event Sub Class to Profiler to analyze parallel query performance.
[ Figure C ]

With the Event Sub Class column added, let's execute our sales by product query with both serial and parallel execution plans and look at the results. You can see our SQL Server Profiler results in Figure D.

Figure D: SQL Server Profiler enables you to view serial and parallel DOP, CPU usage, and duration.
[ Figure D ]

You can tell the first query (above the highlight) is the serial one both by the OPTION (MAXDOP 1) at the end of the query and the 0 in the Event Sub Class column. We know the second query executed in parallel because of the 2 in the Event Sub Class column. From looking at the Duration and CPU columns, we see that the parallel query executed about 25 percent faster (19,377 ms versus 25,687 ms) yet required 43 percent more CPU time (35,999 ms versus 25,141 ms). You'll also notice that both queries performed the same amount of physical work: 16,902 reads.
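
As a quick sanity check, you can work those percentages out directly from the Figure D numbers:

-- Duration: 25,687 ms serial vs. 19,377 ms parallel; CPU: 25,141 ms vs. 35,999 ms
SELECT (25687 - 19377) * 100.0 / 25687 AS PctFasterDuration,  -- roughly 24.6
       (35999 - 25141) * 100.0 / 25141 AS PctMoreCpu          -- roughly 43.2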

Conclusion

In this article, we covered SQL Server's new intra-query parallelism. While previous versions of SQL Server were indeed multithreaded, an individual query would only execute on a single thread.

We discussed how SQL Server breaks a query into subtasks and assigns each to a separate processor. We then looked at how the SQL Server Query Optimizer decides whether to use a serial or parallel execution plan. Next, we looked at three methods that we as developers and DBAs can employ to control the degree of parallelism, from the server as a whole down to the level of an individual query. Finally, we showed some examples of parallel queries and compared their performance to normal serial queries.

With the addition of parallel query execution, SQL Server takes an important step toward being a real, enterprise-class database, capturing the previously wasted horsepower in SMP servers by using the full CPU resources available. This faster execution comes at a cost, however, as coordinating the query's multiple threads requires additional memory and CPU cycles.


 
