By: Aseel Al-Laham | Updated: 2023-08-21 | Comments (1) | Related: > Testing
Problem
In my line of work with databases and business intelligence, I have observed a common oversight among developers regarding testing their queries. Failing to perform this crucial step can result in production issues, which they may attribute to real-life data instead of improper Unit Testing. We can easily avoid these significant issues by taking the time to test queries before deploying them.
In this tip, we will discuss the process of generating test data to verify query logic without relying on real-life data, along with some helpful guidelines. We will also cover various scenarios and their corresponding solutions.
Solution
Before diving into testing SQL queries, we must first understand what unit testing is and how it applies to SQL code.
Unit Testing is a type of software testing that tests individual software units or components to ensure that each performs as expected. In SQL, when talking about Testing, it typically means data testing, which is validating data against test criteria. On the other hand, Unit Testing validates query logic using fixed inputs without relying on real-life data.
And if you are new to SQL queries, check out the following tips to get started:
Generating Test Data Guidelines
When we develop queries, we commonly encounter two database permissions: full access to Add, Delete, and Modify data to an existing database or only the ability to read the data. In addition, there may be situations where we come across empty databases and need interface development to add data, which prevents us from properly Unit Testing our queries. So, to handle whatever comes your way, below are a few guidelines to follow.
Guideline 1 - Generate Test Data Using APIs
You can generate test data if you can access the interface applications or APIs of the transactional database. Many companies have test environments that you can use to create your queries. By using these applications or APIs, you can gain practical knowledge about the business aspect of the data and even detect any related bugs or issues. If you are new to this process and require help generating test data, don't hesitate to contact the Operation or QA teams.
If you are new to APIs and don't know how to use them, you can use software like Postman to use and test APIs. For more information, check out these links:
- API Testing with Postman
- Postman Tutorial for Beginners to Perform API Testing
- Generate Test Data with Postman and Random Key
Guideline 2 - Generate Test Data Using T-SQL
If you have to add, delete, and update permissions, you can generate test data by creating it yourself. Still, it's essential to understand the business rules related to the data. For instance, a field value in one table may depend on the value(s) of another/other field(s) in the same or a different table. Understanding these rules will allow you to add data accurately.
For example, you need to generate data for a Person table that includes the following fields: ID, NationalID, FirstName, LastName, Gender, and BirthDate. We can create it by using a few predefined functions and simple query techniques as follow:
-- First, we will create the table. -- MSSQLTips.com CREATE TABLE Person ( ID NUMERIC(38,0) PRIMARY KEY IDENTITY (1,1), NationalID NVARCHAR(10), FirstName NVARCHAR(30), LastName NVARCHAR(30), Gender NVARCHAR(10), BirthDate DATE )
Second, we will create a query that will populate the Person table with 39600 rows.
-- MSSQLTips.com declare @max int = 200; with FirstName(FirstName,Gender) as ( SELECT N'John' , N'Male' UNION ALL SELECT N'Tim' , N'Male' UNION ALL SELECT N'Laura' , N'Female' UNION ALL SELECT N'Jeff' , N'Male' UNION ALL SELECT N'Alexander' , N'Male' UNION ALL SELECT N'Burt' , N'Male' UNION ALL SELECT N'Christopher', N'Male' UNION ALL SELECT N'Daniel' , N'Male' UNION ALL SELECT N'Daniela' , N'Female' UNION ALL SELECT N'Eric' , N'Male' UNION ALL SELECT N'Alexandra' , N'Female' UNION ALL SELECT N'Bertha' , N'Female' UNION ALL SELECT N'Christine' , N'Female' UNION ALL SELECT N'Noor' , N'Female' UNION ALL SELECT N'Leen' , N'Female' UNION ALL SELECT N'Aseel' , N'Female' UNION ALL SELECT N'Erica' , N'Female' UNION ALL SELECT N'Jackson' , N'Male' UNION ALL SELECT N'Walid' , N'Male' UNION ALL SELECT NULL , NULL UNION ALL -- This row was added because FirstName,Gender columns in Person table are nullable SELECT NULL , N'Male' UNION ALL -- This row was added because FirstName column in Person table is nullable SELECT NULL , N'Female' -- This row was added because FirstName column in Person table is nullable ) , LastNames(LastName) as ( SELECT N'Johnson' UNION ALL SELECT N'Hudson' UNION ALL SELECT N'Jackson' UNION ALL SELECT N'Ranallo' UNION ALL SELECT N'Curry' UNION ALL SELECT N'Allaham' UNION ALL SELECT N'Alnatur' UNION ALL SELECT N'Alkhattab' UNION ALL SELECT Null -- This row was added because LastName column in Person table is nullable ) , SortedNames(FirstName, LastName,Gender, RowNum) as -- This code will generate Sorted Names ( /* Number of records from this Select will be: Number of Male records in FirstName WITH clause Multiplied By Number of records IN LastNames WITH clause*/ select FirstName , LastName , Gender , ROW_NUMBER() over (Order by newid()) from FirstName cross join LastNames where Gender = 'Male' UNION /* Number of records from this Select will be: Number of Female records in FirstName WITH clause Multiplied By Number of records IN LastNames WITH clause*/ select FirstName , LastName , Gender , ROW_NUMBER() over (Order by newid()) from FirstName cross join LastNames where Gender = 'Female' UNION /* Number of records from this Select will be: Number of Null Gender records in FirstName WITH clause Multiplied By Number of records IN LastNames WITH clause*/ select FirstName , LastName , Gender , ROW_NUMBER() over (Order by newid()) from FirstName cross join LastNames where Gender is null ), Names AS ( select FirstName , LastName , Gender from SortedNames ) , Multiplier as ( SELECT 1 N UNION ALL SELECT N+1 FROM Multiplier WHERE N < @max ) INSERT INTO [dbo].[Person]([NationalID],[FirstName],[LastName],[Gender],[BirthDate]) select RIGHT(CAST(CAST(CRYPT_GEN_RANDOM(4) AS bigint) AS varchar(10)), 10) NationalID /* Generate random 10 numbers*/ , FirstName , LastName , Gender , DATEADD(DAY, ABS(CHECKSUM(NEWID()) % 36500), '1940-1-1') BirthDate /* Generate random dates between 1940-1-1 and 2040-01-01; 36500 is around 100 years*/ from Names cross join Multiplier /*To cross multiply Names WITH clause that returns 198 random {FirstName LastName with gender} combinations with the number that we specify in @max parameter*/ order by NEWID() option (maxrecursion 0);
I used a few commands in the above code to help me create my test data. If you are unfamiliar with these commands, please refer to the following links for more information:
- WITH clause (CTE) in SQL Server
- Fix SQL Server CTE Maximum Recursion Exhausted Error
- SQL CROSS JOIN to get Every Combination of Records
- UNION vs. UNION ALL in SQL Server
- SQL Server Window Functions ROW_NUMBER
- NEWID
- CRYPT_GEN_RANDOM
- CHECKSUM
- ABS
- DATEADD
- SQL CAST Function for Data Type Conversions
In addition to creating a query to populate a table, we can use SQL Server INSERT and GO commands to load it with nth records. For example, the following code will run the same INSERT command 20 times:
-- MSSQLTips.com INSERT INTO [dbo].[Person] ( [NationalID] ,[FirstName] ,[LastName] ,[Gender] ,[BirthDate] ) VALUES ( RIGHT(CAST(CAST(CRYPT_GEN_RANDOM(4) AS bigint) AS varchar(10)), 10) ,N'John' ,N'Tim' ,N'Male' ,'1991-09-02' ) GO 20
For more information, check out this tip: Executing a TSQL batch multiple times using GO.
Guideline 3 - Generate Data via Queries
If you have only read permission for a database that does not have a data pattern you need for testing your query, don't give up. Instead, manipulate your data logically.
Whenever I'm testing queries, I follow a personal rule of thumb: "When in doubt, add your data to the query."
After we finish listing the main guidelines to generate test data, we will review two scenarios and how to add our data to the query to validate our query.
Guideline 4 - Generate Data Using Data Generation Tools
In some situations, utilizing test data generation tools provides significant ease in inserting data into the database, ultimately benefiting developers and testers. This guideline is notably more efficient and cost-effective than the complex and expensive manual insertion or script-writing process, especially if you have multiple large databases.
A few of these tools include:
- Red Gate SQL Data Generator
- Visual Studio Data Generators
- Toad for SQL Server
- devart Data Generator for SQL Server
Data Generation Examples
Below we will cover some examples of generating data based on using queries, which was briefly covered in Guideline 3 above.
Scenario #1 - Generate Rows for Missing Data Scenarios
In a table with few fields, an entire row of data that must meet the query logic is missing; for that, we may do the following steps:
- Use the dual table to prepare the required row.
- Union the new row with the source table.
- Use the modified dataset in the FROM or WITH clauses as a sub-query.
For example, in the previously created Person table, we were requested to count the persons in the Person table by age generation, and gender. Also, when age is null, map it to Unknown, and when Gender is null or an empty string, map it to Unknown. Unfortunately, the Person table doesn't have rows with empty BirthDate or empty string gender; at the same time, we don't have permission to create an Age generation table. Don't panic. We can test our code.
First, we will create our dummy record, as the following image shows.
Second, we will transform our data to map null values and empty strings to Unknown, extract age from the BirthDate column, and find age accordingly.
-- MSSQLTips.com With Person_SQ as ( SELECT ID, NationalID, FirstName ,LastName,Gender,BirthDate FROM Person UNION ALL SELECT (SELECT MAX(ID) FROM Person) + 1 ID, N'0000000001' NationalID, N'Dummy' FirstName , N'Record' LastName, N'' Gender, NULL BirthDate ) , AgeGeneration as ( SELECT 1 GenID , 78 FromAge , 999 ToAge , N'The Silent Generation' Generation , N'1900-1945' AgeRange UNION ALL SELECT 2 GenID , 59 FromAge , 77 ToAge , N'Baby Boomers' Generation , N'59-77' AgeRange UNION ALL SELECT 3 GenID , 43 FromAge , 58 ToAge , N'Gen X: Born' Generation , N'43-58' AgeRange UNION ALL SELECT 4 GenID , 27 FromAge , 42 ToAge , N'Millennials' Generation , N'27-42' AgeRange UNION ALL SELECT 5 GenID , 11 FromAge , 26 ToAge , N'Gen Z' Generation , N'11-26' AgeRange UNION ALL SELECT 6 GenID , -999 FromAge , 10 ToAge , N'Gen Alpha' Generation , N'<=10' AgeRange UNION ALL SELECT 7 GenID , NULL FromAge , NULL ToAge , N'Unknown' Generation , N'Unknown' AgeRange ) SELECT * FROM ( SELECT Person_SQ.ID, NationalID, FirstName ,LastName,Gender,BirthDate , DateDiff(Year,BirthDate, GETDATE()) Age , AG.GenID, AG.Generation , AG.AgeRange, CASE WHEN Gender IS NULL OR Gender = N'' THEN N'Unknown' ELSE Gender END MappedGender FROM Person_SQ LEFT JOIN AgeGeneration AG ON (DateDiff(Year,BirthDate, GETDATE()) BETWEEN AG.FromAge AND ToAge) OR (BirthDate IS NULL AND AG.GenID = 7) ) MainQuery ORDER BY ID DESC;
Third, we will write an aggregation query for the required logic.
-- MSSQLTips.com With Person_SQ as ( SELECT ID, NationalID, FirstName ,LastName,Gender,BirthDate FROM Person UNION ALL SELECT (SELECT MAX(ID) FROM Person) + 1 ID, N'0000000001' NationalID, N'Dummy' FirstName , N'Record' LastName, N'' Gender, NULL BirthDate ) , AgeGeneration as ( SELECT 1 GenID , 78 FromAge , 999 ToAge , N'The Silent Generation' Generation , N'1900-1945' AgeRange UNION ALL SELECT 2 GenID , 59 FromAge , 77 ToAge , N'Baby Boomers' Generation , N'59-77' AgeRange UNION ALL SELECT 3 GenID , 43 FromAge , 58 ToAge , N'Gen X: Born' Generation , N'43-58' AgeRange UNION ALL SELECT 4 GenID , 27 FromAge , 42 ToAge , N'Millennials' Generation , N'27-42' AgeRange UNION ALL SELECT 5 GenID , 11 FromAge , 26 ToAge , N'Gen Z' Generation , N'11-26' AgeRange UNION ALL SELECT 6 GenID , -999 FromAge , 10 ToAge , N'Gen Alpha' Generation , N'<=10' AgeRange UNION ALL SELECT 7 GenID , NULL FromAge , NULL ToAge , N'Unknown' Generation , N'Unknown' AgeRange ) SELECT Generation [Age Generation] , MappedGender Gender , FORMAT(COUNT(*) , '#,#') [Total Number of Persons] FROM ( SELECT Person_SQ.ID, NationalID, FirstName ,LastName,Gender,BirthDate , DateDiff(Year,BirthDate, GETDATE()) Age , AG.GenID, AG.Generation , AG.AgeRange, CASE WHEN Gender IS NULL OR Gender = N'' THEN N'Unknown' ELSE Gender END MappedGender FROM Person_SQ LEFT JOIN AgeGeneration AG ON (DateDiff(Year,BirthDate, GETDATE()) BETWEEN AG.FromAge AND ToAge) OR (BirthDate IS NULL AND AG.GenID = 7) ) MainQuery GROUP BY Generation , MappedGender , GenID ORDER BY GenID , MappedGender;
In addition to adding a dummy record to test the needed transformation in the previous example, we used a WITH clause and UNION ALL command to logically create a lookup table that is not physically created in the database.
Although this is a simple example, the same process can be applied to test a wide range of query complexities.
Scenario #2 - Generate Rows for Missing Data Scenarios
In a table with many fields, a particular value(s) in a column that must meet the query logic is missing. For that, we may do the following steps:
- Use the case statement to change the value for a few rows to the required value(s).
- Write a WITH clause that contains the modified column data, which will be aliased with the original column name.
- To test the query, call the modified dataset with a WITH clause in the FROM clause.
For example, in the previously created Person table, we need to return all information for the records in which the NationalID value has two or more occurrences in the dataset, and it goes the same if NationalID is null. Unfortunately, the Person table doesn't have this pattern of data. Still, we can test our code.
Most of the developers will write the query as follows:
-- MSSQLTips.com WITH Person_SQ AS ( SELECT ID , CASE WHEN ID = 56 THEN N'1046776265' WHEN ID = 58 THEN N'3820053562' WHEN ID = 59 THEN N'3820053562' WHEN ID = 101 THEN N'3175732123' WHEN ID = 1102 THEN N'3175732123' WHEN ID = 1103 THEN N'3175732123' WHEN ID = 39600 THEN N'3175732123' WHEN ID = 39592 THEN N'3175732123' WHEN ID = 39570 THEN N'3175732123' WHEN ID = 1 THEN NULL WHEN ID = 12 THEN NULL WHEN ID = 33 THEN NULL WHEN ID = 46 THEN NULL ELSE NationalID END NationalID, FirstName, LastName, Gender, BirthDate FROM Person ) , MoreThanOneOccurrence AS ( SELECT COUNT(*) CNT , NationalID FROM Person_SQ AS Person GROUP BY NationalID HAVING COUNT(*) > 1 ) SELECT Person.* , MT1O.CNT [Number Of Occurrence] FROM Person_SQ AS Person INNER JOIN MoreThanOneOccurrence MT1O ON Person.NationalID = MT1O.NationalID ORDER BY MT1O.CNT DESC , Person.NationalID
If null values are not considered in an INNER JOIN, any occurrence of NationalID being null and, in our scenario, occurring more than once, will not be returned. This was discovered by creating data test cases logically. Therefore, the correct code should be adjusted accordingly.
-- MSSQLTips.com WITH Person_SQ AS ( SELECT ID , CASE WHEN ID = 56 THEN N'1046776265' WHEN ID = 58 THEN N'3820053562' WHEN ID = 59 THEN N'3820053562' WHEN ID = 101 THEN N'3175732123' WHEN ID = 1102 THEN N'3175732123' WHEN ID = 1103 THEN N'3175732123' WHEN ID = 39600 THEN N'3175732123' WHEN ID = 39592 THEN N'3175732123' WHEN ID = 39570 THEN N'3175732123' WHEN ID = 1 THEN NULL WHEN ID = 12 THEN NULL WHEN ID = 33 THEN NULL WHEN ID = 46 THEN NULL ELSE NationalID END NationalID, FirstName, LastName, Gender, BirthDate FROM Person ) , MoreThanOneOccurrence AS ( SELECT COUNT(*) CNT , NationalID FROM Person_SQ AS Person GROUP BY NationalID HAVING COUNT(*) > 1 ) SELECT Person.* , MT1O.CNT [Number Of Occurrence] FROM Person_SQ AS Person INNER JOIN MoreThanOneOccurrence MT1O ON ISNULL(Person.NationalID, '') = ISNULL(MT1O.NationalID, '') ORDER BY MT1O.CNT DESC , Person.NationalID
Conclusion
Although we covered some simple examples, the same process can be applied to test a wide range of query complexities.
We learned that we can Unit Test queries, which is not a myth. And we can test query logic using different methods as described in the Guidelines sections, which cover all types of database permission that any databases and/or business intelligence developers can have.
Next Steps
- Read more about SQL Server Random Sorted Result Set
- Read more about SQL Server Function to Generate Random Numbers
- Read more about CHECKSUM Functions in SQL Server
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2023-08-21