By: Jared Westover | Updated: 2024-05-30 | Comments (2) | Related: > TSQL
Problem
Cursors get a bad rap in SQL Server, and for good reasons. They have their uses, such as performing database maintenance tasks. I avoid them when it comes to standard T-SQL code. Performance issues become apparent when dealing with tables of any notable size. If you come from a more procedural language background, it can be hard to think outside the cursor. Don't worry; there's hope.
Solution
In this article, I want to review a typical pattern we've all seen. It involves using a cursor or WHILE loop to insert or update data. We'll start by looking at why a developer might default to a cursor in the first place. Next, I'll build a typical cursor to solve this problem. Then, we'll explore how we can achieve the same outcome with a set-based operation in a fraction of the time. Please bookmark and share this article whenever you see a cursor pop-up in your new code.
A Look at SQL Cursors
There are dozens of articles describing cursors, but the following is a definition in one sentence: It's an operation in SQL Server that allows you to work with one row at a time. Developers with a procedural-focused background, such as C or Python, might gravitate towards cursors and WHILE loops.
You can declare a simple cursor using the code below.
-- mssqltips.com DECLARE @characters TABLE ( Id INT, Name VARCHAR(50) ); INSERT INTO @characters VALUES (1, 'Leonard'), (2, 'Sammy'), (3, 'Natalie'), (4, 'Teddy'), (5, 'Blonde'); DECLARE @character VARCHAR(50); DECLARE movie_cursor CURSOR FOR SELECT Name FROM @characters; OPEN movie_cursor; FETCH NEXT FROM movie_cursor INTO @character; WHILE @@FETCH_STATUS = 0 BEGIN PRINT @character; FETCH NEXT FROM movie_cursor INTO @character; END; CLOSE movie_cursor; DEALLOCATE movie_cursor;
Results:
(5 rows affected) Leonard Sammy Natalie Teddy Blonde
This cursor only prints out the names from the table variable. Even the most procedural-minded person would agree that the cursor above is a bad idea. We can get the same output using the code below. It's been a long time since I've seen anything this bad during code review.
-- mssqltips.com DECLARE @characters TABLE ( Id INT, Name VARCHAR(50) ); INSERT INTO @characters VALUES (1, 'Leonard'), (2, 'Sammy'), (3, 'Natalie'), (4, 'Teddy'), (5, 'Blonde'); SELECT Name FROM @characters;
I won't make a blanket statement that you should never use a cursor to modify data in SQL Server. However, you can replace a cursor with a set-based approach in nearly every instance I've faced. This statement especially applies to modern versions of SQL Server, let's say 2017 and up.
If you work in an environment where cursors are everywhere in old code, it can be hard to make changes. From an outsider's perspective, it's hard to convince a product owner or stakeholder to rewrite dozens of stored procedures when the code performs well. Any code change comes with risk. You need to test. And what about potential bugs in the new code? As data-minded people, we see the shortcomings when using cursors against larger data sets.
One action item you can take moving forward is to urge developers to adopt a more set-based approach. In the following example, we'll look at a typical scenario where a developer might try to use a cursor. But we'll offer a set-based approach that outperforms the cursor with a big dataset.
Replace The SQL Cursor
Developers might use a cursor to capture newly created identity values. Then, they perform actions on those values in the same transaction. For example, imagine inserting data into a destination table from a source table, and you need to update the source table with the newly inserted identity values.
Let's create a small dataset to see this in action. With the code below, we'll end up with two tables. The source table holds the initial values, and the other table needs the values inserted.
-- mssqltips.com CREATE TABLE #source ( SourceId INT, DestinationId INT NULL, FullName VARCHAR(20) ); INSERT INTO #source VALUES (8, NULL, 'Bob Jones'), (10, NULL, 'Sue Smith'), (14, NULL, 'Jane Red'), (22, NULL, 'Bob Jones'), (2, NULL, 'Sarah Clark'), (42, NULL, 'Jane Red'); CREATE TABLE #destination ( DestinationId INT IDENTITY(1, 1), FullName VARCHAR(20) ); GO
The source table has a column we need to update to reference the DestinationId for the inserted identity values. I've included the code below to solve this using a cursor. Finally, we select the results from the source table.
-- mssqltips.com DECLARE @SourceId INT, @DestinationId INT, @FullName VARCHAR(20); DECLARE Cursor_Demo CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY FOR SELECT SourceId, FullName FROM #source; OPEN Cursor_Demo; FETCH NEXT FROM Cursor_Demo INTO @SourceId, @FullName; WHILE (@@FETCH_STATUS = 0) BEGIN INSERT INTO #destination ( FullName ) VALUES (@FullName); SET @DestinationId = SCOPE_IDENTITY(); UPDATE #source SET DestinationId = @DestinationId WHERE SourceId = @SourceId; FETCH NEXT FROM Cursor_Demo INTO @SourceId, @FullName; END; CLOSE Cursor_Demo; DEALLOCATE Cursor_Demo; SELECT * FROM #source GO
Results:
SourceId DestinationId FullName ----------- ------------- -------------------- 8 1 Bob Jones 10 2 Sue Smith 14 3 Jane Red 22 4 Bob Jones 2 5 Sarah Clark 42 6 Jane Red
Using a cursor technically works. However, cursors don't scale well. This process would take a long time if our source table contained 50 or 100 thousand rows. Is there a better option? I'm glad you asked.
Using a Set-Based Approach
Instead of using a cursor, let's devise a set-based way of populating the source table in our example above. There is a little-known clause in T-SQL called OUTPUT. Microsoft says that OUTPUT returns information from, or expressions based on, each row affected by an INSERT, UPDATE, DELETE, or MERGE statement. This approach sounds promising. The example below only returns the results and does not insert them into a table.
-- mssqltips.com DECLARE @OutputTest AS TABLE ( Id INT IDENTITY(121, 14), Color VARCHAR(10) NULL ); INSERT INTO @OutputTest ( Color ) OUTPUT Inserted.Id, Inserted.Color VALUES ('Red'), ('Green'), ('Blue'); GO
Results:
Id Color -------- ---------- 121 Red 135 Green 149 Blue
Did you notice I set the identity seed and increment values oddly to keep you from guessing the IDs right away?
Let's recreate our source and destination tables from above. I'll also add a bridge table to hold the identity values from the OUTPUT clause.
-- mssqltips.com DROP TABLE IF EXISTS #source; DROP TABLE IF EXISTS #destination; DROP TABLE IF EXISTS #bridge; CREATE TABLE #source ( SourceId INT, DestinationId INT NULL, FullName VARCHAR(20) ); INSERT INTO #source VALUES (8, NULL, 'Bob Jones'), (10, NULL, 'Sue Smith'), (14, NULL, 'Jane Red'), (22, NULL, 'Bob Jones'), (2, NULL, 'Sarah Clark'), (42, NULL, 'Jane Red'); CREATE TABLE #destination ( DestinationId INT IDENTITY(1, 1), FullName VARCHAR(20) ); CREATE TABLE #bridge ( SourceId INT, DestinationId INT ); GO
You might think we can perform an INSERT statement and call it a day. Unfortunately, it's not that easy. The code below returns an error message since we can't reference a column that is not part of the target or destination table.
-- mssqltips.com -- This doesn't work INSERT INTO #Destination (FullName) OUTPUT inserted.DestinationId, #Source.SourceId INTO #Bridge SELECT FullName FROM #Source s; GO
Results:
Msg 4104, Level 16, State 1, Line 244 The multi-part identifier "#source.SourceId" could not be bound.
Don't worry; there's another way to get this method working. Instead of using an INSERT statement, we'll use MERGE.
Enter the SQL MERGE Statement
The MERGE statement allows you to perform INSERT, UPDATE, and DELETE all in one statement. Some people love it, and others don't. Aaron Bertrand wrote several articles about its shortcomings, and I recommend reading them. But for our use case, it works perfectly. I've included the syntax below for inserting our rows into the destination and updating the source table.
-- mssqltips.com MERGE INTO #destination dest USING #source src ON 1 = 0 WHEN NOT MATCHED BY TARGET THEN INSERT ( FullName ) VALUES (src.FullName) OUTPUT src.SourceId, Inserted.DestinationId INTO #bridge; UPDATE s SET s.DestinationId = b.DestinationId FROM #source s INNER JOIN #bridge b ON b.SourceId = s.SourceId; SELECT * FROM #source
Results:
SourceId DestinationId FullName ----------- ------------- -------------- 8 1 Bob Jones 10 2 Sue Smith 14 3 Jane Red 22 4 Bob Jones 2 5 Sarah Clark 42 6 Jane Red
This approach scales much better than the cursor when dealing with a larger dataset. If our source and destination table had more than 50,000 rows, you would only wait seconds rather than minutes when using the cursor.
The next time you review code and see a cursor doing something OUTPUT could handle, send the developer a link to this article.
Key Points
- Avoid cursors for typical T-SQL scripts unless no other solution works. Even if your dataset is small, data tends to grow over time, and if it doesn't, things might not be going well for the business.
- When part of a MERGE or standard INSERT, UPDATE, or DELETE, the OUTPUT clause provides several options for logging data by capturing the before-and-after values.
- Remember, when you perform an INSERT and use the OUTPUT clause, you can only reference the columns inserted into the target table. However, that's when the MERGE comes in handy.
Next Steps
- Jeremy Kadlec wrote the insightful article, SQL Server Cursor Example. He provides several real-world examples for building a cursor the right way.
- If you are going to use a cursor, Aaron Bertrand has written extensively about tuning them to perform better. Check out Aaron's article, What impact can different cursor options have?
- Jim Evans wrote a helpful article on using the OUTPUT clause for auditing data changes. Check out his article, Archive SQL Server Data using the SQL OUTPUT Clause, to learn more.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2024-05-30