Populating a SQL Server Test Database with Random Data

By: Tibor Nagy | Updated: 2010-12-09

Problem

If you develop new functionality for a SQL Server database, you have probably already run into the typical problem of testing against large databases. I have to run a series of performance and functional tests on a database with a few million rows, but I do not have the necessary test data. I have heard about some excellent commercial tools, but they are expensive and my company cannot afford them. How can I generate the test data on my own and load it into the tables of the test database?

Solution

I will show you some tricks to generate and multiply rows for the test database. Each data type needs a different approach as you can see below.

Numeric data

First of all, you will need to generate some numeric data. The obvious idea is to use the RAND() function. Unfortunately, when RAND() is called without a seed it is evaluated once per query, not once per row, so it returns the same value for every row the query touches. You could pass a different seed value on each call, but that becomes very painful when generating large volumes of data, so I suggest using other methods.
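The behaviour described above is easy to check on a scratch table. The following is only a minimal sketch: the temporary tables #RandDemo and #RandResult and their column names are hypothetical and exist purely for this demonstration.

```sql
-- Hypothetical scratch table used only for this demonstration
CREATE TABLE #RandDemo (ID int NOT NULL);
INSERT INTO #RandDemo (ID) VALUES (1), (2), (3), (4), (5);

-- RAND() without a seed is evaluated once for the whole statement,
-- so SameForAllRows holds one repeated value.
-- RAND(ID) is seeded from the row, so it differs per row, but it is
-- repeatable: the same seed always yields the same number.
SELECT ID,
       RAND()   AS SameForAllRows,
       RAND(ID) AS SeededFromID
INTO   #RandResult
FROM   #RandDemo;

SELECT * FROM #RandResult ORDER BY ID;
```

Seeding from a column does vary the value per row, but because the result is fully determined by the seed you would still have to invent a fresh seed for every row and every run, which is the painful part.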

If you need a different random value for every row, you will get much better results by generating a NEWID() and calculating its checksum. The following statement sets the INTVALUE1 column to a random integer value in every row:

UPDATE TESTTABLE  SET INTVALUE1=CHECKSUM(NEWID())

By modifying this expression further you can generate decimal values, and with the ABS() function you can eliminate the negative numbers:

UPDATE TESTTABLE  SET DECVALUE1=ABS(CHECKSUM(NEWID()))/100.0
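To try both expressions without touching a real table, a scratch table is enough. This is a sketch under assumptions: the table #NumDemo, its ID column and the decimal(12,2) type are hypothetical choices, not part of the original example. It also applies the modulo before ABS(), a small variation that keeps the decimal in a known range and avoids the arithmetic overflow ABS() raises if the checksum ever happens to be the smallest possible int value.

```sql
-- Hypothetical scratch table; column names follow the article's examples
CREATE TABLE #NumDemo (
    ID        int IDENTITY(1,1) PRIMARY KEY,
    INTVALUE1 int NULL,
    DECVALUE1 decimal(12,2) NULL
);

-- Seed ten empty rows
INSERT INTO #NumDemo (INTVALUE1)
VALUES (NULL), (NULL), (NULL), (NULL), (NULL),
       (NULL), (NULL), (NULL), (NULL), (NULL);

-- NEWID() is evaluated for every row, so each row gets its own values.
-- The modulo limits the checksum to -999999..999999 before ABS() is applied,
-- so DECVALUE1 ends up between 0.00 and 9999.99.
UPDATE #NumDemo
SET    INTVALUE1 = CHECKSUM(NEWID()),
       DECVALUE1 = ABS(CHECKSUM(NEWID()) % 1000000) / 100.0;

SELECT * FROM #NumDemo ORDER BY ID;
```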

Fixed range data sets

We can use the above method with some changes to generate values within a fixed range. For example, when you need a Boolean value, you can choose from a two-element set such as (0, 1) or (Y, N). You can use the parity of the random integer to convert it into Boolean data. Beware: you cannot use a CASE expression with several WHEN branches to translate the NEWID result, because NEWID is invoked again for every branch that is evaluated, so the branches test different random values. The following statements generate random yes or no values for the column BOOLVALUE1:

--Generate random 0 or 1 value for every row  
UPDATE TESTTABLE  SET BOOLVALUE1 = ABS(CHECKSUM(NEWID()))%2  
--Translate the values to Yes or No  
UPDATE TESTTABLE  SET BOOLVALUE1 = 'N' WHERE BOOLVALUE1='0'  
UPDATE TESTTABLE  SET BOOLVALUE1 = 'Y' WHERE BOOLVALUE1='1'
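If you prefer a single pass, one possible variation (not the approach shown above) is a CASE expression with exactly one WHEN and an ELSE: NEWID() then appears only once, so there is no second evaluation to disagree with the first. The sketch below also shows the same modulo idea for a small integer range. The table #BoolDemo and the RATING column are hypothetical names used only for illustration.

```sql
-- Hypothetical scratch table; BOOLVALUE1 follows the article's column name
CREATE TABLE #BoolDemo (
    ID         int IDENTITY(1,1) PRIMARY KEY,
    BOOLVALUE1 char(1) NULL,
    RATING     int NULL
);

-- Seed ten empty rows
INSERT INTO #BoolDemo (BOOLVALUE1)
VALUES (NULL), (NULL), (NULL), (NULL), (NULL),
       (NULL), (NULL), (NULL), (NULL), (NULL);

UPDATE #BoolDemo
SET    -- One WHEN plus ELSE: NEWID() is referenced once, so every row
       -- gets either 'Y' (even checksum) or 'N' (odd checksum), never NULL.
       BOOLVALUE1 = CASE WHEN CHECKSUM(NEWID()) % 2 = 0 THEN 'Y' ELSE 'N' END,
       -- Modulo 5 gives -4..4, ABS() gives 0..4, adding 1 gives 1..5.
       RATING     = ABS(CHECKSUM(NEWID()) % 5) + 1;

SELECT * FROM #BoolDemo ORDER BY ID;
```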

Text data

Text data requires special attention if you would like to have a database with some reasonable content. You have to build an initial dictionary and then use it to breed more rows. For example it is common to use names in various database fields. The following example shows you how to create a table containing 100 different names in a few seconds.

--Create table for first names  
CREATE TABLE [NAMES1] (FIRST_NAME [varchar](20))  
--Create table for family names  
CREATE TABLE [NAMES2] (FAMILY_NAME [varchar](20))  
--Fill first names  
INSERT INTO NAMES1 VALUES ('John')  
INSERT INTO NAMES1 VALUES ('Jack')  
INSERT INTO NAMES1 VALUES ('Jill')  
INSERT INTO NAMES1 VALUES ('Bill')  
INSERT INTO NAMES1 VALUES ('Mary')  
INSERT INTO NAMES1 VALUES ('Kate')  
INSERT INTO NAMES1 VALUES ('Kevin')  
INSERT INTO NAMES1 VALUES ('Matt')  
INSERT INTO NAMES1 VALUES ('Rachel')  
INSERT INTO NAMES1 VALUES ('Tom')  
--Fill family names  
INSERT INTO NAMES2 VALUES ('Smith')  
INSERT INTO NAMES2 VALUES ('Morgan')  
INSERT INTO NAMES2 VALUES ('Simpson')  
INSERT INTO NAMES2 VALUES ('Walker')  
INSERT INTO NAMES2 VALUES ('Bauer')  
INSERT INTO NAMES2 VALUES ('Taylor')  
INSERT INTO NAMES2 VALUES ('Morris')  
INSERT INTO NAMES2 VALUES ('Elliott')  
INSERT INTO NAMES2 VALUES ('Clark')  
INSERT INTO NAMES2 VALUES ('Rock')  
--Generate 10x10=100 different names  
SELECT * INTO TESTTABLE FROM NAMES1 CROSS JOIN NAMES2

You can apply the above technique to quickly generate many distinct records from sample tables of just a few rows. Cross joins and self joins multiply row counts very quickly, so be cautious when using them on big tables. For example, cross joining two tables of one thousand rows each produces one million rows.
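The multiplication effect can be sketched on a very small scale. The example below is illustrative only: the temporary tables #First, #Family and #People and the BALANCE column are hypothetical, and the name lists are shortened to three rows each to keep the arithmetic easy to follow.

```sql
-- Hypothetical, shortened dictionaries (three rows each)
CREATE TABLE #First  (FIRST_NAME  varchar(20));
CREATE TABLE #Family (FAMILY_NAME varchar(20));
INSERT INTO #First  VALUES ('John'), ('Jill'), ('Mary');
INSERT INTO #Family VALUES ('Smith'), ('Morgan'), ('Walker');

-- 3 x 3 = 9 name pairs; cross joining with the same two tables again
-- multiplies that by another 9, giving 81 rows. Each row also receives
-- its own random BALANCE between 0.00 and 999.99.
SELECT a.FIRST_NAME,
       b.FAMILY_NAME,
       ABS(CHECKSUM(NEWID()) % 100000) / 100.0 AS BALANCE
INTO   #People
FROM   #First  AS a
CROSS JOIN #Family AS b
CROSS JOIN #First  AS c
CROSS JOIN #Family AS d;

SELECT COUNT(*) AS TotalRows FROM #People;  -- 3 * 3 * 3 * 3 = 81
```

With the ten-row dictionaries from the article the same four-way join would already produce 10,000 rows, which is why it pays to estimate the row count before running such a statement.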

Next Steps

Build supporting tables to generate the text values
Create stored procedures to fill test database

About the author

Tibor Nagy is a SQL Server professional in the financial industry with experience in SQL 2000-2012, DB2 and MySQL.
