By: Jared Westover | Updated: 2023-11-21 | Comments (2) | Related: > Common Table Expressions
Problem
Have you built a recursive CTE lately? Yeah, me either. In the real world, I use a recursive CTE rarely. Along with a PIVOT operator, the syntax for a recursive CTE terrifies me. But when you break it down, there are only two parts. Don't believe me? Keep on reading. Also, why do I get an error message about maximum recursion exhausted at 100?
Solution
In this article, we'll explore what a recursive CTE is and when to use it. We'll cover the two main parts along with an optional third. I'll mention an alternative to using a recursive CTE—spoiler, it's a WHILE loop. We'll also look at getting around that pesky maximum recursion error. By the end of this article, you'll find building a recursive CTE less intimidating. If you don't need to create one today, I hope you bookmark this article for later.
What is a CTE?
You've likely used a Common Table Expression (CTE) if you write T-SQL. They come in two varieties: non-recursive and recursive. I include the former in every query I write that's longer than two lines of code. Microsoft introduced CTEs in SQL Server 2005—along with synonyms, everyone's favorite feature. Oracle was the first RDBMS I remember building a CTE in, and it was love at first sight.
CTEs are like derived tables and hold a temporary result set—don't confuse them with temporary tables. Unlike temporary tables, you can only reference them in a single statement.
People find using CTEs makes reading code easier than nested queries. A developer friend hates using them and says they make reading code harder. He rewrites other people's queries and removes the CTEs. Seriously, what's up with this guy? Style is a matter of personal preference. For me, reading top-down makes everything in life simpler. I prefer the CTE below compared to the derived table.
DROP TABLE IF EXISTS #Hero; CREATE TABLE #Hero ( Id INT NOT NULL, [HeroName] VARCHAR(25) NOT NULL, [CreatedDate] DATE NOT NULL ); GO INSERT INTO #Hero ( Id, HeroName, CreatedDate ) VALUES (1, 'Batman', '01-01-2023'), (2, 'Superman', '01-01-2023'), (3, 'Wonder Woman', '01-01-2023'), (4, 'Wonder Woman', '01-03-2023'); -- Find the duplicate superhero using a CTE. ;WITH DupHero AS (SELECT Id, ROW_NUMBER() OVER (PARTITION BY HeroName ORDER BY h.CreatedDate) AS rn FROM #Hero AS h) SELECT h.HeroName FROM #Hero AS h JOIN DupHero AS d ON d.Id = h.Id WHERE d.rn > 1; -- Find the duplicate superhero using a derived table. SELECT h.HeroName FROM #Hero AS h JOIN ( SELECT Id, ROW_NUMBER() OVER (PARTITION BY HeroName ORDER BY CreatedDate) AS rn FROM #Hero ) AS d ON d.Id = h.Id WHERE d.rn > 1;
I often find myself using a CTE in combination with a windowing function. For example, I start by generating a row number in the first CTE and reference that number in the next. Years ago, a coworker called this the waterfall effect. Something about that phrase stuck with me. If you are not using CTEs, at least give them a try.
Explore Recursive CTEs
Let's explore the other type of CTE that frightens people: recursive. The Merriam-Webster dictionary defines recursion as a technique involving "…a procedure, subroutine, function, or algorithm that calls itself one or more times until a specified condition is met..." The definition sounds like a WHILE loop or cursor to me.
Have you used the Fibonacci sequence? If you work with an Agile team, they love using it to assign story points. It's a type of recursion where each number is the sum of the prior two numbers—I stopped at 21 below because of blackjack.
0, 1, 1, 2, 3, 5, 8, 13, 21,…
The typical recursive CTE example involves a company hierarchy or family tree. A recursive CTE references a result set multiple times until it meets a condition. In the family tree example, that's all the family members. Without a recursive CTE, you might use a WHILE loop or cursor to obtain the same results. Here is an article by Edwin Sarmiento where he provides an example. Now, let's look at the two parts of a recursive CTE.
Parts of a Recursive CTE
Someone asking me to write a recursive CTE in an interview is what nightmares are made of. At its core, a recursive CTE consists of two parts; people sometimes throw a third one in there, but we'll stick with two for now.
Let's first look at the entire dataset.
DROP TABLE IF EXISTS #Hero; CREATE TABLE #Hero ( Id INT NOT NULL, [HeroName] VARCHAR(25) NOT NULL, [BossId] INT NULL, [BirthYear] DATE NOT NULL ); GO INSERT INTO #Hero ( Id, HeroName, BossId, BirthYear ) VALUES (5, 'Professor X', NULL, '01-01-1932'), (6, 'Cyclops', 5, '01-01-1977'), (3, 'Wolverine', 6, '01-01-1832'), (4, 'Storm', 3, '01-01-1975'), (2, 'Rogue', 3, '01-01-1981'), (1, 'Jubilee', 2, '01-01-1989'); ;WITH RecursiveCTE AS (SELECT Id, HeroName, BossId FROM #Hero WHERE BossId IS NULL -- Here we start with Professor X UNION ALL SELECT h.Id, h.HeroName, h.BossId FROM #Hero h INNER JOIN RecursiveCTE r ON h.BossId = r.Id) SELECT r.Id, r.HeroName, r.BossId FROM RecursiveCTE r;
I didn't place the Ids in order to illustrate the magic. The recursive CTE ordered my structure in the screenshot below.
The Anchor or Parent
In the example of a company hierarchy, the anchor is the CEO. The word anchor always confuses me. I know what an anchor is, but something about saying CEO, parent, or boss simplifies it. Pick whatever word clicks in your brain. It's vital to remember SQL executes this query one time.
What does the query look like to return the boss record? I've included it below, but it's where the BossId is NULL. We'll assume the boss doesn't have a boss.
SELECT Id, HeroName, BossId FROM #Hero WHERE BossId IS NULL; -- Here we start with Professor X
Another example where we generate a family tree would be the oldest known relative. In my case, it's a great-grandmother.
The Recursive Part
Since we figured out the parent, we need to write the recursive part of the query. SQL Server executes this query until we expose all levels in the hierarchy. Below is an example keeping with our structure above.
SELECT h.Id, h.HeroName, h.BossId FROM #Hero h INNER JOIN RecursiveCTE r -- Here we join to the CTE ON h.BossId = r.Id
Sometimes, you'll see people add in a third part called the break or end condition. Most times, this entails adding a simple WHERE clause to the recursive part. It adds more complexity to my two-part system, so I like ignoring it. However, I've included an example below for you to review. Let's say we only want to return superheroes born before a particular date in our structure.
;WITH RecursiveCTE AS (SELECT Id, HeroName, BossId FROM #Hero WHERE BossId IS NULL -- Here we start with Professor X UNION ALL SELECT h.Id, h.HeroName, h.BossId FROM #Hero h INNER JOIN RecursiveCTE r ON h.BossId = r.Id WHERE h.BirthYear < '01-01-1985') –- The break condition SELECT r.Id, r.HeroName, r.BossId FROM RecursiveCTE r;
At the center of it all, you combine the boss with the recursion using the UNION ALL operator. What if, for some reason, you wanted to change it to a UNION? Well, SQL doesn't like it and throws an error.
Building a Dataset
I often use a recursive CTE to build a specific dataset rather than a hierarchy. In a prior article, I used the code below. In a way, this is a CROSS JOIN of sorts. I know it isn't a CROSS JOIN; there's no need to leave a comment. Can you spot the two parts of the recursive CTE?
DECLARE @StartDate DATE = '2023-02-01'; DECLARE @EndDate DATE = '2023-02-28'; WITH Dates AS (SELECT @StartDate AS Date UNION ALL SELECT DATEADD(DAY, 1, Date) AS Date FROM Dates WHERE Date < @EndDate) SELECT d.Date FROM Dates d;
Maximum Recursion
SQL Server is smart and understands our shortcomings as humans. To keep us from entering an infinite loop by default, SQL permits us to scale 100 levels deep. If you execute a recursive CTE with more than 100 levels, SQL returns the error message below.
If you want to bypass the 100-level restriction, there's a command for that. If you remove the limitation, the query gets out in production, and SQL hangs, your DBA will email you.
DECLARE @StartDate DATE = '2022-02-01'; DECLARE @EndDate DATE = '2023-02-28'; WITH Dates AS (SELECT @StartDate AS Date UNION ALL SELECT DATEADD(DAY, 1, Date) AS Date FROM Dates WHERE Date < @EndDate) SELECT d.Date FROM Dates d OPTION (MAXRECURSION 0);
Summary
Many things in life are scary, like taxes or running an update in production, and wondering if you included a WHERE clause. But as we saw in this article, recursive CTEs don't need to be one of them. There are only two parts: the boss and the recursion. The older I get, the easier it is to forget syntax. If you're prone to forgetfulness, bookmark this page for your future self. I better do it now, so I don't forget.
Have you constructed a recursive CTE in real life? If so, what other use cases have you encountered?
Key Points
- Recursive CTEs help traverse recursive structures, like a company hierarchy, bill of material, or family tree.
- At first glance, the syntax for a recursive CTE is confusing. But it's easier to digest if you remember there are only two parts: the boss and recursion.
- I prefer a recursive CTE over trying to write a cursor or WHILE loop to accomplish the same thing. I won't hold it against you if you like the loop method.
- If you trust your code, don't worry about maximum recursion. However, add it to avoid an error in production.
Next Steps
- Can't get enough of CTEs? Ben Snaidero wrote the article SQL Server CTE vs Temp Table vs Table Variable Performance Test comparing the performance of each.
- Andy Brown explores why CTEs make code more readable in How to use SQL Server CTEs to make your T-SQL code readable by humans.
- I mentioned PIVOT in the opening, and for a good reason: the syntax is confusing. I wrote SQL Pivot Like a Pro to simplify the PIVOT operator.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2023-11-21