Troubleshoot Chained SQL Server CTE

By:   |   Updated: 2024-10-16   |   Comments (8)   |   Related: > Common Table Expressions


Problem

I started using Common Table Expressions (CTEs) in Oracle before Microsoft SQL Server and found them easier to read than a derived table. Something about reading from the top down just clicked in my brain. As the years have passed, I'm less of a fan for one reason: troubleshooting a chained CTE with several links is hard—especially when it's someone else's code. Even the ones I wrote years ago fill me with unease when something breaks. Is there an easier way to break down each link in the chain when things go wrong?

Solution

This SQL tutorial will explore how chained CTEs might make troubleshooting code harder. The process can be further complicated if your developers like using local variables. Next, we'll look at using a temporary table as a substitute. But what are the trade-offs of using a temporary table versus a CTE? Also, should you stop using CTEs altogether? These are a few of the questions I'll address throughout this article.

Defining Terms

Before diving in, let's define each element we'll examine in this article. Since countless articles have already been written about CTEs, my intent is not to write another one. Instead, I want to highlight the differences between a derived table and a subquery since those terms are often confused. Below, I've defined all three of them.

SQL Common Table Expression

A Common Table Expression (CTE) is like a derived table but is defined top-down. It's often described as a temporary result set that's only accessible within the query where it's defined. Unlike a temporary table, SQL Server doesn't store the results, so you can't reuse them outside the query.

Below is a simple CTE example. This query retrieves the salesperson who made the most recent sale of each unique product from product category 2.

-- mssqltips.com
;WITH cte
AS (SELECT SalesPersonId,
           ProductId,
           ROW_NUMBER() OVER (PARTITION BY ProductId ORDER BY TransactionDate DESC) rn
    FROM dbo.SalesTransactions
    WHERE ProductCategoryId = 2
   )
SELECT SalesPersonId,
       ProductId
FROM cte
WHERE rn = 1;

It seems that no matter the query, I end up incorporating a CTE into it, especially when using windowing functions.

Derived Table

A derived table is also like a CTE, but you place it in the FROM clause of a query. Some refer to derived tables as subqueries, though that term can be confusing. I suppose you could say they are a type of subquery. A derived table only exists for the duration of a single query, just like a CTE. The query below shows a derived table in action.

-- mssqltips.com
SELECT SalesPersonId,
       ProductId
FROM
(
    SELECT SalesPersonId,
           ProductId,
           ROW_NUMBER() OVER (PARTITION BY ProductId ORDER BY TransactionDate DESC) rn
    FROM dbo.SalesTransactions
    WHERE ProductCategoryId = 2
) AS dt
WHERE rn = 1;

If you execute these queries, SQL Server uses the same execution plan, and the logical reads are identical.

Subquery

SQL Server offers two types of subqueries: correlated and non-correlated. Here, I'll focus on the non-correlated type. A non-correlated subquery runs independently of the main query. For example, you might use a subquery if you want to return the sum of all sales for a specific column. You'll typically find a subquery in the SELECT clause as a column expression or in the WHERE clause as a predicate. Below is a simple example.

-- mssqltips.com
SELECT SalesPersonId,
       ProductId,
       (
           SELECT COUNT(1) FROM dbo.SalesTransactions WHERE ProductCategoryId = 2
       ) AS Category2Count
FROM dbo.SalesTransactions;

With the terms defined, let's explore why you might want to use a chained CTE instead of a nested derived table.

Nested Derived Tables

Why would you need to create a nested derived table? Let me offer two reasons. First, nesting becomes necessary when you need to perform multiple transformations on a result set. Second, when you want to execute the code step by step. Perhaps you could even combine three derived tables into one, but steps make following the code logic easier.

Below is an example of a nested derived table. I'm only going two levels deep in this example, but I'm sure you've seen much worse. This query finds the top three most recent sales of products from category 2 where the sales amount is greater than or equal to the average.

-- mssqltips.com
SELECT TOP (3)
    dt2.SalesPersonId,
    dt2.ProductId,
    dt2.SaleAmount
FROM
(
    SELECT dt1.SalesPersonId,
           dt1.ProductId,
           dt1.SaleAmount,
           dt1.AverageSalesAmount,
           ROW_NUMBER() OVER (ORDER BY dt1.TransactionDate DESC) AS rn
    FROM
    (
        SELECT SalesPersonId,
               ProductId,
               TransactionDate,
               AVG(SaleAmount) OVER (ORDER BY (SELECT NULL)) AS AverageSalesAmount,
               SaleAmount
        FROM dbo.SalesTransactions
        WHERE ProductCategoryId = 2
    ) AS dt1
) AS dt2
WHERE dt2.SaleAmount >= dt2.AverageSalesAmount
ORDER BY dt2.SaleAmount DESC,
         dt2.rn ASC;

Chained CTEs

Like the nested derived table, a chained CTE allows you to perform multiple transformations step-by-step. For many, reading a chained CTE is easier since the syntax flows vertically at the top of the query rather than being nested within. Below is the same query, but it uses a chained CTE.

-- mssqltips.com
;WITH cte1
AS (SELECT SalesPersonId,
           ProductId,
           TransactionDate,
           AVG(SaleAmount) OVER (ORDER BY (SELECT NULL)) AS AverageSalesAmount,
           SaleAmount
    FROM dbo.SalesTransactions
    WHERE ProductCategoryId = 2
   ),
      cte2
AS (SELECT SalesPersonId,
           ProductId,
           ROW_NUMBER() OVER (ORDER BY TransactionDate DESC) AS rn,
           AverageSalesAmount,
           SaleAmount
    FROM cte1
   )
SELECT TOP (3)
    SalesPersonId,
    ProductId,
    SaleAmount
FROM cte2
WHERE SaleAmount >= AverageSalesAmount
ORDER BY SaleAmount DESC,
         rn ASC;

Some people prefer vanilla over chocolate ice cream; others even enjoy strawberry. Ultimately, the choice comes down to personal preference. From a readability standpoint, I prefer chained CTEs. However, I've worked with people who can't stand CTEs—I'm not quite sure why!

Troubleshooting Chained CTEs

One thing I detest about a chained CTE is troubleshooting a step when something is off. The first CTE is rarely the issue, but things get hard when it's the fifth or sixth link in the chain. Imagine someone telling us the results are wrong—they expect more rows. My instinct is to run the first CTE, but I don't see any issues. Our next step is to break out each of the CTEs using comments so I can check where the problem exists in the chain.

-- mssqltips.com
;WITH cte1
AS (SELECT SalesPersonId,
           ProductId,
           TransactionDate,
           AVG(SaleAmount) OVER (ORDER BY (SELECT NULL)) AS AverageSalesAmount,
           SaleAmount
    FROM dbo.SalesTransactions
    WHERE ProductCategoryId = 2
   )--,
      --cte2
--AS (
SELECT SalesPersonId,
           ProductId,
           ROW_NUMBER() OVER (ORDER BY TransactionDate DESC) AS rn,
           AverageSalesAmount,
           SaleAmount
    FROM cte1
--   )
--SELECT TOP (3)
--    SalesPersonId,
--    ProductId,
--    SaleAmount
--FROM cte2
--WHERE SaleAmount >= AverageSalesAmount
--ORDER BY SaleAmount DESC,
--         rn ASC;

I dread breaking the query down with comments. There must be an easier way to troubleshoot this code.

Temporary Tables

SQL Server creates temporary tables in tempdb, and they are commonly used to store intermediate results. One advantage of temporary tables is that SQL automatically generates statistics so that you might get better execution plans than a table variable. Let's see how we can rewrite the CTE above using temporary tables.

DROP TABLE IF EXISTS #temp1
DROP TABLE IF EXISTS #temp2

SELECT SalesPersonId,
       ProductId,
       TransactionDate,
       AVG(SaleAmount) OVER (ORDER BY (SELECT NULL)) AS AverageSalesAmount,
       SaleAmount
INTO #temp1
FROM dbo.SalesTransactions
WHERE ProductCategoryId = 2;

SELECT SalesPersonId,
       ProductId,
       ROW_NUMBER() OVER (ORDER BY TransactionDate DESC) AS rn,
       AverageSalesAmount,
       SaleAmount
INTO #temp2
FROM #temp1;

SELECT TOP (3)
    SalesPersonId,
    ProductId,
    SaleAmount
FROM #temp2
WHERE SaleAmount >= AverageSalesAmount
ORDER BY SaleAmount DESC,
         rn ASC;

We still use the step-based method for our transformations, but here, we can simply query each temp table to see the results up to a specific step. This approach is much easier for me to troubleshoot.

Downside to Temporary Tables

As Thomas Sowell said, "There are no solutions. There are only trade-offs." With that in mind, here are some downsides to replacing chained CTEs with temporary tables:

  • Additional resource usage: Creating temporary objects, including tables and statistics, requires extra time, memory, and storage.
  • Lack of indexes: For example, in the code above, if covering indexes exist later in the chain, you would need to recreate them for the temporary tables.
  • Manually CREATE and DROP temporary tables: If you don't drop them and leave the session open, it can lead to clutter.

With the above points in mind, don't take this article as a recommendation to rewrite all your chained CTEs as temporary tables. If you do, remember—don't blame me.

Help Converting

You might think, "I don't want to rewrite everything as temporary tables," and I wouldn't blame you. However, this is one area where I believe an AI, like ChatGPT or Copilot, can help. Let's see how this works in the real world. Below, I've added the prompt and code generated.

-- Prompt: Please take the code below and convert it to temporary tables. Do not explicitly create the tables.

-- Step 1: Insert data into a temporary table (equivalent to cte1)
SELECT SalesPersonId,
       ProductId,
       TransactionDate,
       AVG(SaleAmount) OVER (ORDER BY (SELECT NULL)) AS AverageSalesAmount,
       SaleAmount
INTO #TempTable1
FROM dbo.SalesTransactions
WHERE ProductCategoryId = 2;

-- Step 2: Insert data into a second temporary table (equivalent to cte2)
SELECT SalesPersonId,
       ProductId,
       ROW_NUMBER() OVER (ORDER BY TransactionDate DESC) AS rn,
       AverageSalesAmount,
       SaleAmount
INTO #TempTable2
FROM #TempTable1;

-- Step 3: Select statement with the final result
SELECT TOP (3)
    SalesPersonId, -- column names
    ProductId,
    SaleAmount
FROM #TempTable2
WHERE SaleAmount >= AverageSalesAmount
ORDER BY SaleAmount DESC,
         rn ASC;

-- Optional: Drop the temporary tables if needed
-- DROP TABLE #TempTable1, #TempTable2;

Not too shabby! Instead of you or the developer writing this code as a temporary table, you can use AI to convert it.

Summary

Chained CTEs are a helpful tool in SQL Server and, for some, much easier to read than nested derived tables. With a chained CTE, you don't need to worry about including additional indexes as you might with the temporary table approach. However, when a chain grows to several links, they can be hard to debug. Transforming them into separate steps with temporary tables can simplify troubleshooting.

I'd love to hear from you. What methods do you use when trying to troubleshoot long-chained CTEs? Feel free to share tips in the comments below.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Jared Westover Jared Westover is a passionate technology specialist at Crowe, helping to build data solutions. For the past two decades, he has focused on SQL Server, Azure, Power BI, and Oracle. Besides writing for MSSQLTips.com, he has published 12 Pluralsight courses about SQL Server and Reporting Services. Jared enjoys reading and spending time with his wife and three sons when he's not trying to make queries go faster.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2024-10-16

Comments For This Article




Sunday, November 3, 2024 - 1:22:28 PM - Jared Westover Back To Top (92615)
@Barak810

Thank you for reading! I love Itzik Ben-Gan's content, thank you for the link.

Sunday, November 3, 2024 - 1:21:23 PM - Jared Westover Back To Top (92614)
@Nico

Thank you for reading and the kind words.

Tuesday, October 29, 2024 - 11:12:04 AM - Nico Botes Back To Top (92606)
Great write-up Jared, just when CTE-hell broke loose, this helped quite a bit putting things into context. Thanks!

Wednesday, October 23, 2024 - 5:54:15 AM - Philippe Addelia Back To Top (92595)
I like CTE's however there are cases where Temp tables will perform better.
If the dataset stored in the CTE is very large, memory pressure can become a performance bottleneck, in that case Temp tables are better.
The same applies to variable tables as the data set grow. At some point the temp table does perform better.
If I need to re-use the result set several times in the query, Temp table are better because they remain available beyond their first use.
If I know that the dataset will remain fairly small, then I use CTE and or table variables, this will be faster than temp tables.
My rough cut-off limit between CTE and temp table is around 20,000 rows and 5-6 columns, while it depends on the environment.

Friday, October 18, 2024 - 3:45:28 PM - Jared Westover Back To Top (92589)
@Richard

Thank you for reading and adding the comment.

Thursday, October 17, 2024 - 10:21:17 AM - hotchan Back To Top (92577)
I don’t really like CTE, it often causes a lot of performance issues by using a bad execution plan. temp table would be good to use and easy to debug. how do you think?

Thursday, October 17, 2024 - 3:53:35 AM - Barak810 Back To Top (92573)
I strongly recommend checking out 'Fundamentals of Table Expressions' by Itzik Ben-Gan.
I believe CTEs are covered somewhere between parts 3 and 6 in this 13-part series. You'll find some valuable insights.

https://www.google.com/search?q=itzik+ben+gan+%22Fundamentals+of+table+expressions%22+site%3Asqlperformance.com

Wednesday, October 16, 2024 - 11:28:09 AM - Richard Back To Top (92571)
Great overview on the pros and cons of CTEs and comparing them to temp tables. Thank you. I tend to use CTEs as a base for Windows functions, but it's usually one CTE.














get free sql tips
agree to terms