June 2000

Using Northwind to Create a Huge Test Database

by Scott Mauvais

Last month, in the article "Understanding parallel queries in SQL Server 7.0," we used a sample database based on the standard Northwind database included with SQL Server, except that it was 1,000 times the size of the original. We needed a database this large to make sure the dataset was big enough for the query optimizer to consider parallelism when it executed a query. In this article, we're going to show you the BigNW.sql script we used to create this large database, as shown in Figure A. This script creates a new database named BigNW based on the Northwind sample database. We'll walk you through the major sections of this script. You can also download the script from our FTP site at ftp://ftp.elementkjournals.com/sql/sql006.zip. By the end of the article, you'll not only have the means to create a database as large as your hard disks can handle, but you'll also have a script that you can modify to create large databases based on your own production data.

Figure A: Use the BigNW.sql script to create a large database for testing your server.

Why do you want a large database?

You might be asking yourself, "Why would I want such a large database?" Once you have a large dataset built on real-life data, you can start testing all sorts of scenarios in your environment. Obvious scenarios include those listed in Table A.

Table A: Scenarios to test in your environment
You'll also find a large database useful in some unexpected ways. As a developer, you'll find performance tuning becomes much easier when you run suspect queries against a large dataset. It's often hard to spot bad queries when you run them against a small dataset (and a small dataset is typical on most developers' computers) because the performance hit may be only a couple of seconds at most. When we started testing against databases several times larger than those in production, our poorly written stored procedures jumped right out at us because they would often run for minutes. Whoops!

Several years back, when Microsoft Access first shipped, we created some Access Basic scripts to increase the size of its version of the Northwind database so we could test the limits of the Jet engine. We did this by reading the order entry tables, modifying the data just a bit, and inserting it (or appending it, in Access-speak) back into the original tables. We then repeated this process until we'd created a database of the desired size. During the technical beta for SQL Server 7.0, we compared the performance of some complex queries on the various beta releases. To make sure we had an easily reproducible test database, we decided to port our Access Basic scripts to the Transact-SQL script called BigNW.sql.
When mapping out the tasks in the script, we broke it into four major sections: setting up the environment, creating the BigNW database, modifying the tables, and creating the sample data. We'll take a look at each in turn.
Setting up the environment

The first part of the script sets up the environment by declaring and assigning values to some variables to hold database names, paths, and the like. Probably the most important variable here is @lHowBig, which determines the number of times the process will loop through the tables while creating sample data. We've set @lHowBig to a default value of 100, which means that the resulting database will have 100 times the number of customers and orders as the original. Executing this script in the default mode will require 60 MB of disk space and will take around two minutes to execute (depending on your hardware).
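The declarations at the top of the script look something like the following sketch. The variables @lHowBig, @sDBName, @sBackupName, and @sSQL all appear in the script excerpts later in this article; the exact sizes and the defaults for the name variables are our own stand-ins:

DECLARE @sDBName     sysname,       -- name of the database to create
        @lHowBig     int,           -- size multiplier for the sample data
        @sBackupName varchar(255),  -- backup destination used for the copy
        @sSQL        varchar(255)   -- buffer for dynamic SQL statements

SELECT  @sDBName = 'BigNW',         -- the name used throughout this article
        @lHowBig = 100              -- the default size factor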
Managing the temporary tables generated by the script

Declaring the variables and assigning values to them is pretty basic stuff. One thing we do want to draw your attention to, however, is how we'll handle temporary tables. Normally, when you create a table, if you start the table name with a pound sign (#), you signify to SQL Server that you want it to be a temporary table. For example, the following SQL statement creates a temporary table named #BigNW_Vars:

CREATE TABLE #BigNW_Vars (DBName sysname, HowBig int)

When SQL Server sees a statement like this, it creates a new table within the TempDB system database. This table is private to the user who created it, and the server guarantees that the name won't collide with any other user's temporary tables, even if multiple users issue the same statement at the same time. That's the good part. The downside is that SQL Server accomplishes this by obfuscating the table name, appending an arbitrary extension to it. This makes it extremely difficult to check for the table's existence by name. That matters here because, although the script drops its temporary tables when it finishes, BigNW.SQL might still abend for reasons such as lack of disk space and leave them behind.

But, before we can drop the tables, we have to first verify that they exist. As we mentioned earlier, we can't use the standard technique of querying sysobjects for the name:

IF EXISTS (SELECT * FROM tempdb..sysobjects
           WHERE name = '#BigNW_Vars')
    DROP TABLE #BigNW_Vars

To get around this problem, you can use the OBJECT_ID() function, which resolves the obfuscated name for you:

IF OBJECT_ID('tempdb..#BigNW_Vars') IS NOT NULL
    DROP TABLE #BigNW_Vars

When you use this code snippet, make sure you specify the database location. Using OBJECT_ID('#BigNW_Vars') alone would look for the table in the current database rather than in TempDB, so the test would always fail.

Creating temporary tables

We're now ready to start creating some temporary tables. Because you must execute this script in three Transact-SQL batches, you'll lose all of your local variables when they go out of scope as soon as the first batch ends with the GO
command. To work around this, in the first batch, we'll create the ##BigNW_Vars table to hold the variables we're going to need to access in later batches by using the following syntax:

CREATE TABLE ##BigNW_Vars (DBName sysname, HowBig int)

INSERT INTO ##BigNW_Vars (DBName, HowBig)
VALUES (@sDBName, @lHowBig)

When you want to retrieve these variables in a subsequent batch, you simply select the values into that batch's local variables, like we do below with the @lHowBig variable:

SELECT @lHowBig = HowBig FROM ##BigNW_Vars
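To see the whole pattern in one place, here's a compact, self-contained illustration of how the global temporary table carries a value across a GO boundary; the literal values are ours, not the script's:

-- Batch 1: save the value before it goes out of scope
DECLARE @lHowBig int
SELECT @lHowBig = 100
CREATE TABLE ##BigNW_Vars (DBName sysname, HowBig int)
INSERT INTO ##BigNW_Vars (DBName, HowBig) VALUES ('BigNW', @lHowBig)
GO

-- Batch 2: the local variable is gone, so redeclare and reload it
DECLARE @lHowBig int
SELECT @lHowBig = HowBig FROM ##BigNW_Vars
PRINT 'HowBig is ' + CAST(@lHowBig AS varchar(10))
GO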
One final word about variables before we move on to creating the database: To make the code snippets easier to read, we're going to replace the variables with their default values. When you read through the code in the download file, you'll find statements like:

SELECT @sSQL = 'BACKUP DATABASE Northwind'
             + ' TO ' + @sBackupName + ' WITH INIT'
EXEC (@sSQL)

But in this article, we're going to show this statement as follows (with the backup destination elided, since it depends on the path variables set at the top of the script):

BACKUP DATABASE Northwind TO ... WITH INIT

By showing you the statements in this format, you won't have to wade through all the dynamic SQL. Now that we've declared and populated our variables and dealt with the temporary tables, let's move on to creating the database.

Creating the BigNW database
Creating the BigNW database is a three-step process. First, we need to drop the database if it already exists. Second, we'll create an empty database called BigNW. Finally, we'll dump the original Northwind database and load it over our new, empty BigNW. Begin with the following statements:

IF EXISTS (SELECT * FROM master..sysdatabases
           WHERE name = 'BigNW')
    EXEC ('DROP DATABASE BigNW')

Before we attempt to drop the database, we query master..sysdatabases to verify that it exists. Once we know it exists, we drop it with the dynamic EXEC statement shown above. In the remainder of the database section in the script file, we'll create the database, back up the Northwind database, and restore it over our new database. Finally, we'll set some database options suited to bulk loading.
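Pieced together from the steps above, the sequence looks roughly like the following sketch. We've expanded the dynamic SQL, and the file paths and backup file name are our own stand-ins; the logical file names Northwind and Northwind_log are the ones SQL Server 7.0 ships with for the Northwind sample:

-- 1. Create an empty shell database to restore over
CREATE DATABASE BigNW

-- 2. Back up the stock Northwind database (path is illustrative)
BACKUP DATABASE Northwind
    TO DISK = 'C:\MSSQL7\BACKUP\Northwind.bak' WITH INIT

-- 3. Restore the backup over the new database, moving the files
--    so they don't collide with Northwind's own files
RESTORE DATABASE BigNW
    FROM DISK = 'C:\MSSQL7\BACKUP\Northwind.bak'
    WITH REPLACE,
         MOVE 'Northwind'     TO 'C:\MSSQL7\DATA\BigNW.mdf',
         MOVE 'Northwind_log' TO 'C:\MSSQL7\DATA\BigNW.ldf'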
Modify the tables

In this section, we need to accomplish two things: dropping the constraints and indices for faster loading, and increasing the size of some of the primary key fields so they can hold more unique values. We must remove the constraints first because SQL Server will not allow us to modify the underlying fields while the constraints are still being enforced. Even if we didn't need to change the fields, we would still want to drop the constraints because we don't want to incur the overhead of enforcing them for each and every row we insert.

While we're at it, we also want to remove the indices. If we don't, SQL Server won't only write the data page for the rows we add, but it will also have to update the various index pages. It gets even worse when you consider that Northwind makes liberal use of clustered indices. Because SQL Server keeps the physical order of the rows the same as the clustered index order, we'll end up causing many page splits as we insert our new data if we leave the indices on the tables.

When we first started developing this script, we didn't take the time to write the portion that removes the indices. Like most software projects, the schedule was very tight, and this was clearly a feature that could wait until the second version. But as soon as we added the code to drop the indices, we saw a threefold increase in performance!

In this version of the script, we use a small stored procedure to list the constraints that reference a given table, so we know which ones to drop:

CREATE PROC procShowRefs
    @psTableName SYSNAME = ''
AS
IF @psTableName = ''
BEGIN
    PRINT 'USAGE: procShowRefs psTableName = Name'
    RETURN (1)
END
SELECT OBJECT_NAME(fkeyid) AS TableName,
       OBJECT_NAME(constid) AS ConstraintName
FROM sysreferences
WHERE rkeyid = OBJECT_ID(@psTableName)

To generate a list of indices for a specific table, search the online Help for the sp_helpindex system stored procedure.
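For example, to see which constraints you'd need to drop before altering the Customers table, you could call the procedure like this. The names in the comment are the ones the stock Northwind schema uses; your output may differ if you've modified the database:

-- List every foreign key that references Customers
EXEC procShowRefs @psTableName = 'Customers'

-- Typical output against stock Northwind:
--   TableName             ConstraintName
--   --------------------  ---------------------------------
--   Orders                FK_Orders_Customers
--   CustomerCustomerDemo  FK_CustomerCustomerDemo_Customers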
You still need to make one more change to the database schema to support generating the sample data, but we'll cover that after describing the process we're going to use to loop through the tables.

Creating the sample data

We have just one more major element of this process to discuss, and it's the most important one: just how are we going to expand the database by a factor of 100? We'll do so fairly easily. We're simply going to copy all the rows out of the tables we want to grow and place them into temporary tables. Then we'll insert this data back into the original tables 100 times. While simple in theory, it does present some interesting challenges in practice.

First, we must prevent key violations by generating a new primary key
for each row we insert. This isn't a problem for tables where the primary key is numeric (by numeric, we mean a column with the Identity property, where SQL Server generates each new key for us automatically).

Creating new values for text-based primary keys presents a different problem. We have just such an issue with the CustomerID column in the Customers table. When you install Northwind, the CustomerID column is only five characters wide, both in Customers and in every table that references it. Our scheme appends the cycle number to the original value, so we need to widen the column. If you widen it in the Customers table alone, SQL Server rejects the change with an error like this one:

Column 'Customers.CustomerID' is not the same length as referencing column 'CustomerCustomerDemo.CustomerID' in foreign key 'FK_CustomerCustomerDemo_Customers'.

Did you know that there's even a table called CustomerCustomerDemo? Neither did we, until this error message pointed it out. That's why you must widen the column in every referencing table, as shown below.
Finally, we'll also add a new column called Cycle to each table so we can track which pass of the loop created each row. Here are both sets of schema changes, the widened CustomerID columns and the new Cycle columns:

ALTER TABLE Customers
    ALTER COLUMN CustomerID VARCHAR(11) NOT NULL
ALTER TABLE CustomerCustomerDemo
    ALTER COLUMN CustomerID VARCHAR(11) NOT NULL
ALTER TABLE Orders
    ALTER COLUMN CustomerID VARCHAR(11) NOT NULL

-- Add Cycle column to each table
ALTER TABLE Customers ADD Cycle INT
ALTER TABLE Orders ADD Cycle INT
ALTER TABLE [Order Details] ADD Cycle INT

Inserting data

We're now ready to start inserting data. We'll start with the master tables, move on to the header tables, and conclude with the transactions. We do this to maintain referential integrity. To keep this script down to a manageable size, we'll only include the order entry tables.

In Northwind, like most other order management databases, the customer table is the master. Let's step through the code we use to expand it:

DECLARE @i INT
DECLARE @lHowBig INT

-- Retrieve the value for the size factor
SELECT @lHowBig = HowBig FROM ##BigNW_Vars

-- Init the counter to 0
SELECT @i = 0

-- Remove any pre-existing new rows
DELETE FROM Customers WHERE Cycle IS NOT NULL

-- Copy to the temp table we will use as source
-- and set the cycle counter to 0
SELECT * INTO #BigNW_Cust FROM Customers
UPDATE #BigNW_Cust SET Cycle = @i
table. Whenever you insert a row, you'll populate the Cycle column with
the current cycle number. Because we allowed Now that you have the Now let's move on to the looping section where the real work gets done: BEGIN BEGIN TRAN SELECT @i = @i + 1 -- Update PK so it is unique and insert UPDATE #BigNW_Cust SET CustomerID = LEFT(CustomerID, 5) + CAST(@i AS VARCHAR(5)), Cycle = @i INSERT Customers SELECT * FROM #BigNW_Cust PRINT "Customer passes complete: " + CAST(@i as varchar(5)) COMMIT WORK END Start by testing the current value of the counter to make sure you're
not complete. Assuming you aren't, increase the counter by one. Then,
update all 91 rows in the temporary table by appending the value of
Next, insert the contents of the temporary table back into the main Customers table, print out some status information so you know something is happening, and restart the loop. When you're done, use the following query to test the results: FROM Customers ORDER BY CustomerID The output for this query is shown in Table A. Notice how the rows sort in alphabetical rather than numeric order. Table A: The first rows of the Customers table after running
BigNW.SQL
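As an additional sanity check of our own (this query isn't part of BigNW.SQL), you can confirm that every cycle inserted the same number of rows:

-- Each cycle should show exactly 91 customers;
-- the original rows appear under a NULL cycle
SELECT Cycle, COUNT(*) AS CustomerRows
FROM Customers
GROUP BY Cycle
ORDER BY Cycle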
You might be wondering why you need to use the LEFT function rather than simply appending CAST(@i AS VARCHAR(5)) to the existing CustomerID.

Table C: The result if we don't use the LEFT function

As you can see, each loop just added its value to the result of the previous one. By the time we reach the seventh cycle, the value would be larger than the field's data type of 11 characters. If you run the script that way, SQL Server halts the insert with a string-truncation error.
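If you want to watch the compounding happen without touching any tables, this little stand-alone loop (our own illustration, not part of the script) prints the runaway values:

-- Simulate appending the cycle number without LEFT()
DECLARE @id VARCHAR(11), @i INT
SELECT @id = 'ALFKI', @i = 0

WHILE @i < 6
BEGIN
    SELECT @i = @i + 1
    SELECT @id = @id + CAST(@i AS VARCHAR(5))
    PRINT @id   -- ALFKI1, ALFKI12, ... ALFKI123456
END
-- A seventh pass would need 12 characters, which is
-- what makes the INSERT into the VARCHAR(11) column fail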
With the master table populated, we're ready for the Orders table. Listing A shows how we copy the order columns into a temporary table and run the same kind of loop; as before, the script resets @i to 0 before this section begins.
Listing A: Use this query to copy all of the columns from the Northwind..Orders table into a temporary table.

SELECT CustomerID, EmployeeID, OrderDate, RequiredDate,
       ShippedDate, ShipVia, Freight, ShipName, ShipAddress,
       ShipCity, ShipRegion, ShipPostalCode, ShipCountry, Cycle
INTO #BigNW_Ord
FROM Orders

UPDATE #BigNW_Ord SET Cycle = @i

-- Done yet?
WHILE @i < @lHowBig
BEGIN
    SELECT @i = @i + 1

    UPDATE #BigNW_Ord
    SET CustomerID = LEFT(CustomerID, 5) + CAST(@i AS VARCHAR(5)),
        Cycle = @i

    INSERT Orders SELECT * FROM #BigNW_Ord

    PRINT 'Orders passes complete: ' + CAST(@i AS VARCHAR(5))
END

The first difference you'll notice is that we have explicitly listed
the columns rather than just using SELECT *. We did this because OrderID is an Identity column; by leaving it out of the temporary table, we let SQL Server generate a brand-new OrderID for each row we insert back into Orders. That takes care of the primary key, but the new orders still need CustomerID values that match the customers we created. Getting around this is extremely easy. We simply use the same code we used to generate unique CustomerID values in the Customers loop; because we append the same cycle number, each new order lines up with a customer created in the same cycle.

It's worth noting here that we didn't add any new rows to the other master tables Orders references, such as Employees and Shippers, so the copied rows can simply reuse those foreign key values as they are.

That brings us to the final table, [Order Details]:
-- Make two copies of [Order Details]:
-- one, OrdDet, that we will update and
-- insert back into the original table.
-- We will use the other, OrdDetOrig,
-- to refresh OrdDet after each cycle
SELECT * INTO #BigNW_OrdDet FROM [Order Details]
SELECT * INTO #BigNW_OrdDetOrig FROM [Order Details]

UPDATE #BigNW_OrdDet SET Cycle = @i

We start off just like we did for the previous two tables, but here you
need to make two copies. After we walk through the next section, the reason for two will become clear. Because the table doesn't have an Identity column, we can revert back to the SELECT * syntax we used for the Customers table. Listing B shows the loop.

Listing B: Use this script to expand the Order Details table in the BigNW database.

WHILE @i < @lHowBig
BEGIN
    SELECT @i = @i + 1

    -- Refresh OrdDet temp table with original
    -- [Order Details] table. We need to do
    -- this because we need the original values in
    -- the OrderID column
    TRUNCATE TABLE #BigNW_OrdDet
    INSERT INTO #BigNW_OrdDet
    SELECT * FROM #BigNW_OrdDetOrig WHERE Cycle IS NULL

    UPDATE #BigNW_OrdDet
    SET OrderID = (OrderID + (830 * @i)),
        Cycle = @i

    INSERT [Order Details] SELECT * FROM #BigNW_OrdDet

    PRINT 'Order Details passes complete: ' + CAST(@i AS VARCHAR(5))
END

The key to understanding this entire loop is this one line of code:
UPDATE #BigNW_OrdDet
SET OrderID = (OrderID + (830 * @i))

The formula works because of three things we know about the Orders table. First, OrderID is an Identity column that increments by one, so SQL Server assigned the new OrderID values sequentially as we inserted each pass. Second, we know that there are 830 orders in the original Northwind. Although not relevant to the formula, knowing that the first OrderID is 10248 is handy for spot-checking the results. Now that we know these three pieces of information, we can calculate the OrderID that SQL Server assigned to any new order: the order inserted during cycle @i received its original OrderID plus 830 times @i. Applying the same formula to the OrderID column in #BigNW_OrdDet makes each batch of detail rows line up with the orders created in the same cycle.

Because we relied on the Identity property to generate the new OrderID values, the calculation must always start from the original OrderID. This brings us back to why we need two copies of the original [Order Details] data: if we updated the same temporary table on every pass, the additions would compound just like the CustomerID values did in Table C. Instead, at the top of each cycle we truncate #BigNW_OrdDet and refill it from the untouched #BigNW_OrdDetOrig, so the formula always works against the original OrderID values.
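To verify the arithmetic for yourself, a query along these lines (our own spot-check, not part of BigNW.SQL) shows how each cycle's OrderID range steps up by exactly 830:

-- The NULL cycle should run 10248-11077; cycle 1 should run
-- 11078-11907 (830 higher), and so on up the cycles
SELECT Cycle,
       MIN(OrderID) AS FirstOrderID,
       MAX(OrderID) AS LastOrderID,
       COUNT(*)     AS OrderCount
FROM Orders
GROUP BY Cycle
ORDER BY Cycle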
There you have it: the BigNW database is now 100 times larger, and you have a great platform for testing. The only thing left to do is to re-create the constraints and indices and do a little cleanup. As with the initial creation, it's quite easy, so we will leave that as an exercise for the reader.

Conclusion

In this article, we showed you how to expand the stock Northwind database into a platform where you can do capacity planning, test your operational procedures, and improve the performance of your existing apps. Along the way, we covered many related topics and showed you several techniques for solving some of the more common headaches of T-SQL, including temporary tables and variable scope. Finally, we looked closely at methods of generating large amounts of test data while maintaining referential integrity in your database.