Populate Large Tables with Random Data for SQL Server Performance Testing

By: Ben Richardson   |   Updated: 2017-10-23   |   Comments (2)   |   Related: > Testing


Problem

Performance testing is one of the most important ways to evaluate SQL Server database efficiency. Poorly written queries affect database performance. However, if you only have a small amount of data in the database, it is difficult to evaluate how well a query performs. For small datasets, the difference in performance between different query scripts is not discernible. To evaluate query performance, we need large datasets. In this tip we will see how to create large tables of random data that can be used for performance testing.

Solution

The solution to this problem is to write a script that adds a large amount of random data to the SQL Server database so that queries can be evaluated for performance and execution.

Creating Large SQL Server Tables Filled With Random Data

We will explain the process of creating large tables with random data with the help of an example. We will create a dummy library database with two tables: tblAuthors and tblBooks. The first table will store information about imaginary authors and the second table will store information about imaginary books. The tables will have a one-to-many relationship, where an author can have multiple books. First we will insert a large amount of random data into the tblAuthors table, since it doesn't have any foreign keys. Next we will explain the process of adding random data to a table that has a foreign key.

Create an Example SQL Server Database

First we need to create the example library database and add the tables to it. Take a look at the following script:

CREATE TABLE tblAuthors
(
   Id int identity primary key,
   Author_name nvarchar(50),
   country nvarchar(50)
)

CREATE TABLE tblBooks
(
   Id int identity primary key,
   Author_id int foreign key references tblAuthors(Id),
   Price int,
   Edition int
)

The tblAuthors table contains three columns: Id, Author_name and Country. The tblBooks table contains four columns: Id, Author_id, Price and Edition. The Author_id column of the tblBooks table is a foreign key column and references the Id column of the tblAuthors table. This implements the one-to-many relationship between the two tables.

Adding a Large Amount of Random Data to the tblAuthors Table in SQL Server

Now let’s add data to the tblAuthors table. We have chosen to add records to this table first since it is an independent table and has no foreign keys. The tblBooks table, on the other hand, has a foreign key that references the tblAuthors table. Hence we must have records in the tblAuthors table before we can insert any records into the tblBooks table.

The following script inserts 12,000 dummy records into the tblAuthors table. You can add more if you want.

Declare @Id int
Set @Id = 1

While @Id <= 12000
Begin 
   Insert Into tblAuthors values ('Author - ' + CAST(@Id as nvarchar(10)),
              'Country - ' + CAST(@Id as nvarchar(10)) + ' name')
   Print @Id
   Set @Id = @Id + 1
End

Take a look at the above script. Here we declare an integer variable @Id and initialize it with 1. Inside the while loop we use the INSERT statement to insert records into the tblAuthors table.

Look at the values being inserted. We do not need to insert a value for the Id column: since it has the identity property, a value is generated automatically for each record. We only have to insert values for the Author_name and country columns. For Author_name, we concatenate the string ‘Author - ’ with the value of the @Id variable, using the CAST function to convert @Id from an integer to a string. The values inserted into the Author_name column will be Author - 1, Author - 2, and so on up to Author - 12000. We use the same technique to add values for the Country column.
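As a faster alternative to the WHILE loop, the same 12,000 rows can be generated with a single set-based INSERT. The sketch below builds a numbers list with stacked CTEs (the CTE names L0 through L3 and Nums are just illustrative, not part of the tip's original script); row-by-row loops are typically much slower than a set-based statement for this kind of bulk load.

```sql
-- Set-based alternative: build a 12,000-row tally with stacked CTEs,
-- then insert all authors in one statement (no loop required).
;WITH
L0 AS (SELECT 1 AS c FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(c)), -- 10 rows
L1 AS (SELECT 1 AS c FROM L0 a CROSS JOIN L0 b),  -- 100 rows
L2 AS (SELECT 1 AS c FROM L1 a CROSS JOIN L1 b),  -- 10,000 rows
L3 AS (SELECT 1 AS c FROM L2 a CROSS JOIN L0 b),  -- 100,000 rows
Nums AS (SELECT TOP (12000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L3)
INSERT INTO tblAuthors  -- identity column Id is excluded automatically
SELECT 'Author - ' + CAST(n AS nvarchar(10)),
       'Country - ' + CAST(n AS nvarchar(10)) + ' name'
FROM Nums;
```

Either approach produces the same 12,000 rows; the loop version in this tip is easier to follow, while the set-based version finishes in a fraction of the time on most servers.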

Now if you select all the records from the tblAuthors table, you will get 12,000 records. The table will look like this:

Id      Author_name       country
1       Author - 1        Country - 1 name
2       Author - 2        Country - 2 name
3       Author - 3        Country - 3 name
4       Author - 4        Country - 4 name
5       Author - 5        Country - 5 name
6       Author - 6        Country - 6 name
7       Author - 7        Country - 7 name
8       Author - 8        Country - 8 name
9       Author - 9        Country - 9 name
...
12000   Author - 12000    Country - 12000 name

Adding a Large Amount of Random Data to the tblBooks Table in SQL Server

Now let’s add some data to the tblBooks table. This is a bit trickier than inserting data into the tblAuthors table, because the Author_id column of the tblBooks table references the Id column of the tblAuthors table. This means that the Author_id column can only have values between 1 and 12000, i.e. the values in the Id column of the tblAuthors table. We also have to add random values for the Price and Edition columns.

To see the solution for this problem check out the following script. The explanation for this code follows.

Declare @RandomAuthorId int
Declare @RandomPrice int
Declare @RandomEdition int

Declare @LowerLimitForAuthorId int
Declare @UpperLimitForAuthorId int

Set @LowerLimitForAuthorId = 1
Set @UpperLimitForAuthorId = 12000


Declare @LowerLimitForPrice int
Declare @UpperLimitForPrice int

Set @LowerLimitForPrice = 50 
Set @UpperLimitForPrice = 100 

Declare @LowerLimitForEdition int
Declare @UpperLimitForEdition int

Set @LowerLimitForEdition = 1
Set @UpperLimitForEdition = 10


Declare @count int
Set @count = 1

While @count <= 20000
Begin 

   Select @RandomAuthorId = Round(((@UpperLimitForAuthorId - @LowerLimitForAuthorId) * Rand()) + @LowerLimitForAuthorId, 0)
   Select @RandomPrice = Round(((@UpperLimitForPrice - @LowerLimitForPrice) * Rand()) + @LowerLimitForPrice, 0)
   Select @RandomEdition = Round(((@UpperLimitForEdition - @LowerLimitForEdition) * Rand()) + @LowerLimitForEdition, 0)


   Insert Into tblBooks values (@RandomAuthorId, @RandomPrice, @RandomEdition)
   Print @count
   Set @count = @count + 1
End

Take a look at the code above. At the start we declare three variables: @RandomAuthorId, @RandomPrice and @RandomEdition. These will store the values to be inserted into the Author_id, Price and Edition columns of the tblBooks table.

Next we declare variables to store the lower and upper limit values for the Author_id, Price and Edition columns. We want the Author_id column to only have values between 1 and 12000, therefore the @LowerLimitForAuthorId variable is set to 1 and the @UpperLimitForAuthorId variable is set to 12000. Similarly, the @LowerLimitForPrice variable is set to 50 and the @UpperLimitForPrice variable is set to 100 because we want the Price between 50 and 100. Finally, the @LowerLimitForEdition variable is set to 1 and the @UpperLimitForEdition variable is set to 10 because we want the Edition to have values between 1 and 10.

Next we use the Rand() function, which returns a value between 0 and 1, multiply it by the difference between the upper and lower limits, and add the lower limit. This produces a value within the specified range. However, this value is a decimal. To convert it to an integer we use the Round function, passing 0 as the second argument so the number is rounded to zero decimal places. Finally, we insert the resulting values into the tblBooks table.

Now if you select all the records from the tblBooks table, you will see that 20,000 records were inserted. The values for Author_id are between 1 and 12000, the values for Price between 50 and 100, and the values for Edition between 1 and 10, as specified in the query. The result set will look like this:

Id      Author_id   Price   Edition
1       8878        56      2
2       9605        71      5
3       3860        61      8
4       7425        81      7
5       4775        77      5
6       66          60      3
7       241         78      9
8       10583       93      2
9       7920        96      8
...
20000   2096        92      6

Your values will be different since the Rand function generates these numbers randomly.
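One reason the script above uses a loop is that RAND() without a seed is evaluated once per statement, so a single set-based INSERT would repeat the same "random" value on every row. A common set-based workaround is to derive a fresh value per row from NEWID(), which is re-evaluated for each row. The sketch below is an alternative technique, not part of the original script; it assumes the cross join of two catalog views yields at least 20,000 rows, which it will on any normal instance.

```sql
-- Set-based alternative: ABS(CHECKSUM(NEWID())) % range gives a new
-- pseudo-random integer per row, unlike RAND() which is fixed per statement.
INSERT INTO tblBooks  -- identity column Id is excluded automatically
SELECT TOP (20000)
       1  + ABS(CHECKSUM(NEWID())) % 12000,  -- Author_id in 1..12000
       50 + ABS(CHECKSUM(NEWID())) % 51,     -- Price in 50..100
       1  + ABS(CHECKSUM(NEWID())) % 10      -- Edition in 1..10
FROM sys.all_objects a CROSS JOIN sys.all_objects b;
```

A side benefit of the modulo approach is a uniform distribution across the range; the Round(Rand() * span + lower, 0) formula in the loop slightly under-represents the two endpoint values because rounding gives them only half an interval each.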

Next Steps

Now you have a large amount of data in your database. You can evaluate the performance of your queries against this dataset, and you will see a clear distinction between the performance of different queries. For instance, try updating records using both JOINs and cursors: you will see that the JOIN performs faster than the cursor. On small datasets, this distinction is not detectable. Similarly, you can improve your existing queries and check whether they perform faster than their previous versions.
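As a starting point for that comparison, here is a minimal sketch of the set-based half: a single UPDATE driven by a join between the two tables from this tip. The country value and the +5 price bump are arbitrary choices for illustration, and the foreign key column is assumed to be named Author_id as described in the text. Timing this against a cursor that fetches and updates the same rows one at a time makes the performance gap obvious on a 20,000-row table.

```sql
-- Set-based UPDATE via a join: raise the price of every book whose
-- author comes from 'Country - 1 name' (an arbitrary example value).
UPDATE b
SET    b.Price = b.Price + 5
FROM   tblBooks b
       INNER JOIN tblAuthors a ON a.Id = b.Author_id
WHERE  a.country = 'Country - 1 name';
```

Wrap both versions in SET STATISTICS TIME ON, or compare their actual execution plans, to quantify the difference.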




About the author
MSSQLTips author Ben Richardson Ben Richardson is the owner of Acuity Training, a UK based IT training business offering SQL training up to advanced administration courses.

This author pledges the content of this article is based on professional experience and not AI generated.




Comments For This Article




Wednesday, November 15, 2023 - 6:29:08 AM - anoop agarwal Back To Top (91754)
Thanks for this, works great.

Wednesday, June 9, 2021 - 11:44:54 PM - HARIVSAN Back To Top (88832)
It worked flawlessly and it is very quick.