By: Koen Verbeeck | Updated: 2020-07-15 | Comments (6) | Related: > Dates
Problem
I have a common use case where I need to convert a date range into a set of rows in a SQL Server table. For example, I have a record saying an employee took a holiday from 2020-08-01 till 2020-08-20. This single row needs to be transformed into 20 rows, one for each day of vacation. How can I do this in a fast and scalable manner, since my tables contains thousands of records which might result in millions of rows in output?
Solution
Converting date ranges (or any other type of range) into a set of rows is a common requirement. For example, the source system supplies rows with a start and end date for a specific interval, but in your database, you need a transactional table with a row for each day, since this simplifies calculations. You could for example filter out weekends and holidays much easier than when you are using just the start and end date of the interval.
Often, the term "exploding the table" is used, since a small set of ranges can result in a huge output of rows. If you have one rows with 2020-01-01 as the start date and 2020-12-31 as end date, this would already result in 366 rows. Imagine you need to perform a similar calculation for millions of customers. Since the output can be large, it's important the solution is fast and scalable. This rules out loops and cursors in T-SQL, since they are row-based and are not suited for large volumes of rows. Other terminology includes "starbursting" or "unpacking a relation on a time interval".
In this tip, a solution is presented using a "numbers table", sometimes also called a "tally table". For more background information about this type of table, check out these excellent tips by Aaron Bertrand:
The tip SQL Server Function to return a range of dates does something similar, but uses a recursive CTE which is not scalable and also has a limit on the maximum number of recursions.
Sample Data
With the following SQL statement, we can create a simple table for holding the sample data:
CREATE TABLE dbo.EmployeeHoliday (EmployeeID VARCHAR(10) NOT NULL ,StartDate DATE NOT NULL ,EndDate DATE NOT NULL);
Let's insert 2 rows into that table. Two employees taking a holiday, both with start and end date.
INSERT INTO dbo.EmployeeHoliday SELECT [EmployeeID] = 'A', [StartDate] = '2020-06-01', [EndDate] = '2020-06-05' UNION ALL SELECT 'B','2020-12-15','2020-12-31';
This is the desired end result, one row for each day in the range of the start and end date:
Creating the SQL Server Numbers Table
There are different methods of creating a numbers table, such as cross joining two large sets together (take a look at Aaron's tips if you're interested in examples), but you can also create a "virtual" table by using common table expressions (CTE). The following syntax is coined by Itzik Ben-Gan, a T-SQL guru:
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1) ,E02(N) AS (SELECT 1 FROM E00 a, E00 b) ,E04(N) AS (SELECT 1 FROM E02 a, E02 b) ,E08(N) AS (SELECT 1 FROM E04 a, E04 b) ,E16(N) AS (SELECT 1 FROM E08 a, E08 b) ,E32(N) AS (SELECT 1 FROM E16 a, E16 b) ,cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E32) SELECT N FROM cteTally WHERE N <= 10000;
In the first CTE E00, two rows are joined together. In the next CTE E02, this first CTE is cross joined to itself. This gives us 4 rows. This process is repeated a couple of times. The last CTE E32 will return 2^32 rows which is also the highest number an integer can hold in SQL Server. But this CTE only returns rows with the value 1.
Using the window function ROW_NUMBER, we can assign numbers to each row. For the ORDER BY in the OVER clause, we use the subquery (SELECT NULL). This tricks SQL Server into not sorting the dataset. If it would, it could be a serious performance issue. Running the entire SQL statement returns a unique sequential number for each row, starting with the number 1:
Generating a one million row table takes just a couple of seconds on my machine. But in this use case we need dates, not numbers. We can transform the numbers into dates by using the DATEADD function. The SQL statement now becomes:
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1) ,E02(N) AS (SELECT 1 FROM E00 a, E00 b) ,E04(N) AS (SELECT 1 FROM E02 a, E02 b) ,E08(N) AS (SELECT 1 FROM E04 a, E04 b) ,E16(N) AS (SELECT 1 FROM E08 a, E08 b) ,E32(N) AS (SELECT 1 FROM E16 a, E16 b) ,cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E32) SELECT ExplodedDate = DATEADD(DAY,N - 1,'2020-01-01') FROM cteTally WHERE N <= 366
This will generate all of the dates of the year 2020 (keep in mind it's a leap year).
Exploding the Range in the SQL Server Numbers Table
Using the numbers table (or dates table more accurately), we can finally "explode" our sample data. We just need to join the tally table to the sample table using a range join.
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1) ,E02(N) AS (SELECT 1 FROM E00 a, E00 b) ,E04(N) AS (SELECT 1 FROM E02 a, E02 b) ,E08(N) AS (SELECT 1 FROM E04 a, E04 b) ,E16(N) AS (SELECT 1 FROM E08 a, E08 b) ,E32(N) AS (SELECT 1 FROM E16 a, E16 b) ,cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E32) ,DateRange AS ( SELECT ExplodedDate = DATEADD(DAY,N - 1,'2020-01-01') FROM cteTally WHERE N <= 366 ) SELECT * FROM dbo.EmployeeHoliday eh JOIN DateRange d ON d.ExplodedDate >= eh.[StartDate] AND d.ExplodedDate <= eh.[EndDate];
This gives us the desired result set:
Starting with 2020-01-01 and just taking 366 days is quite limited for a generated date table. You can expand these limits though and this shouldn't affect performance too badly. Let's take 2000-01-01 as starting point and generate 100 years of dates (roughly 365,000 rows):
The query now finishes just under 3 seconds.
Test with More Data
Let's do the same use case, but now using the Fact Internet Sales table of the AdventureWorks 2017 data warehouse. We'll explode the dates between the order date and the due date. The fact table contains 60,398 rows. The exploded result set will contain 785,174 rows. The adapted SQL query:
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1) ,E02(N) AS (SELECT 1 FROM E00 a, E00 b) ,E04(N) AS (SELECT 1 FROM E02 a, E02 b) ,E08(N) AS (SELECT 1 FROM E04 a, E04 b) ,E16(N) AS (SELECT 1 FROM E08 a, E08 b) ,E32(N) AS (SELECT 1 FROM E16 a, E16 b) ,cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E32) ,DateRange AS ( SELECT ExplodedDate = DATEADD(DAY,N - 1,'2010-01-01') FROM cteTally WHERE N <= 5000 ) SELECT f.[SalesOrderNumber], f.[SalesOrderLineNumber], [f].[OrderDate], [f].[DueDate], d.[ExplodedDate] INTO #Test FROM [AdventureWorksDW2017].[dbo].[FactInternetSales] f JOIN DateRange d ON d.ExplodedDate >= f.[OrderDate] AND d.ExplodedDate <= f.[DueDate]; SELECT COUNT(1) FROM #test;
In this test I'm not returning the rows to SSMS, because displaying all the rows in the grid would take too much time. Rather I'm inserting the data into a temp table and then returning the row count from this temp table. Exploding the Internet Sales Fact table took 20 seconds on my machine:
Tip Update - Alternative Approach
In the comments, Jeff Moden suggested an alternate way of solving the problem. Instead of hard-coding the limit to 5,000 rows, the limit is dynamically calculated using the DATEDIFF of the order and due date. This value is applied to a TOP clause, selecting only the number of rows of the tally table that are actually needed. The resulting rows are then joined to the fact table using CROSS APPLY, effectively exploding the data range.
Here’s the adapted code:
WITH H1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) H0(N)) ,cteTALLY(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM H1 a, H1 b, H1 c, H1 d, H1 e, H1 f, H1 g, H1 h) SELECT f.SalesOrderNumber ,f.SalesOrderLineNumber ,f.OrderDate ,f.DueDate ,ExplodedDate = DATEADD(dd,t.N-1,f.OrderDate) INTO #Test FROM AdventureWorksDW2017.dbo.FactInternetSales f CROSS APPLY ( SELECT TOP (DATEDIFF(dd,f.OrderDate,f.DueDate) + 1) N FROM cteTally ORDER BY N ) t;
Let’s say for example that one particular row of the Internet Sales Fact table, the number of days between is 5 (which means we need 6 exploded rows, since we need to include the first date). This would result in the following subquery:
SELECT TOP 6 N FROM cteTally ORDER BY N
This would return 6 rows of the tally table (1,2,3,4,5,6). Using CROSS APPLY, these 6 rows are joined to the fact table, repeating the original row of the fact table 6 times, but for each row adding (N-1) days to the order date.
The solution is more efficient, because there’s no work table created in the tempdb database and the tally table isn’t materialized in memory. For more information, check out the comments.
Conclusion
In this tip we saw how you can explode a date range using a "numbers" table. It's a common scenario for time series analysis for example. The code can be optimized by using a persisted date table instead of generating one on the fly. The purpose of this tip however was to show you the techniques to quickly generate a large numbers table to improve your T-SQL code by using set-based logic.
Next Steps
- If you haven't already, check out Aaron's tips about the basics of the numbers table:
- Another great introduction to tally tables by Jeff Moden: The "Numbers" or "Tally" Table: What it is and how it replaces a loop.
- For more T-SQL tips, check out this overview.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2020-07-15