Replace SQL Cursors with Set Based Operations

By: Jared Westover | Updated: 2024-05-30 | Comments (2) | Related: > TSQL

Problem

Cursors get a bad rap in SQL Server, and for good reasons. They have their uses, such as performing database maintenance tasks. I avoid them when it comes to standard T-SQL code. Performance issues become apparent when dealing with tables of any notable size. If you come from a more procedural language background, it can be hard to think outside the cursor. Don't worry; there's hope.

Solution

In this article, I want to review a typical pattern we've all seen. It involves using a cursor or WHILE loop to insert or update data. We'll start by looking at why a developer might default to a cursor in the first place. Next, I'll build a typical cursor to solve this problem. Then, we'll explore how we can achieve the same outcome with a set-based operation in a fraction of the time. Please bookmark and share this article whenever you see a cursor pop-up in your new code.

A Look at SQL Cursors

There are dozens of articles describing cursors, but the following is a definition in one sentence: It's an operation in SQL Server that allows you to work with one row at a time. Developers with a procedural-focused background, such as C or Python, might gravitate towards cursors and WHILE loops.

You can declare a simple cursor using the code below.

-- mssqltips.com
DECLARE @characters TABLE
(
    Id INT,
    Name VARCHAR(50)
);
INSERT INTO @characters
VALUES
(1, 'Leonard'),
(2, 'Sammy'),
(3, 'Natalie'),
(4, 'Teddy'),
(5, 'Blonde');

DECLARE @character VARCHAR(50);
DECLARE movie_cursor CURSOR FOR SELECT Name FROM @characters;

OPEN movie_cursor;
FETCH NEXT FROM movie_cursor
INTO @character;
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @character;
    FETCH NEXT FROM movie_cursor
    INTO @character;
END;

CLOSE movie_cursor;
DEALLOCATE movie_cursor;

Results:

(5 rows affected)
Leonard
Sammy
Natalie
Teddy
Blonde

This cursor only prints out the names from the table variable. Even the most procedural-minded person would agree that the cursor above is a bad idea. We can get the same output using the code below. It's been a long time since I've seen anything this bad during code review.

-- mssqltips.com
DECLARE @characters TABLE
(
    Id INT,
    Name VARCHAR(50)
);
INSERT INTO @characters
VALUES
(1, 'Leonard'),
(2, 'Sammy'),
(3, 'Natalie'),
(4, 'Teddy'),
(5, 'Blonde');

SELECT Name FROM @characters;

I won't make a blanket statement that you should never use a cursor to modify data in SQL Server. However, you can replace a cursor with a set-based approach in nearly every instance I've faced. This statement especially applies to modern versions of SQL Server, let's say 2017 and up.

If you work in an environment where cursors are everywhere in old code, it can be hard to make changes. From an outsider's perspective, it's hard to convince a product owner or stakeholder to rewrite dozens of stored procedures when the code performs well. Any code change comes with risk. You need to test. And what about potential bugs in the new code? As data-minded people, we see the shortcomings when using cursors against larger data sets.

One action item you can take moving forward is to urge developers to adopt a more set-based approach. In the following example, we'll look at a typical scenario where a developer might try to use a cursor. But we'll offer a set-based approach that outperforms the cursor with a big dataset.

Replace The SQL Cursor

Developers might use a cursor to capture newly created identity values. Then, they perform actions on those values in the same transaction. For example, imagine inserting data into a destination table from a source table, and you need to update the source table with the newly inserted identity values.

Let's create a small dataset to see this in action. With the code below, we'll end up with two tables. The source table holds the initial values, and the other table needs the values inserted.

-- mssqltips.com
CREATE TABLE #source
(
    SourceId INT,
    DestinationId INT NULL,
    FullName VARCHAR(20)
);
INSERT INTO #source
VALUES
(8, NULL, 'Bob Jones'),
(10, NULL, 'Sue Smith'),
(14, NULL, 'Jane Red'),
(22, NULL, 'Bob Jones'),
(2, NULL, 'Sarah Clark'),
(42, NULL, 'Jane Red');

CREATE TABLE #destination
(
    DestinationId INT IDENTITY(1, 1),
    FullName VARCHAR(20)
);
GO

The source table has a column we need to update to reference the DestinationId for the inserted identity values. I've included the code below to solve this using a cursor. Finally, we select the results from the source table.

-- mssqltips.com
DECLARE @SourceId INT,
        @DestinationId INT,
        @FullName VARCHAR(20);

DECLARE Cursor_Demo CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY FOR
SELECT SourceId,
       FullName
FROM #source;

OPEN Cursor_Demo;
FETCH NEXT FROM Cursor_Demo
INTO @SourceId,
     @FullName;

WHILE (@@FETCH_STATUS = 0)
BEGIN

    INSERT INTO #destination
    (
        FullName
    )
    VALUES
    (@FullName);

    SET @DestinationId = SCOPE_IDENTITY();

    UPDATE #source
    SET DestinationId = @DestinationId
    WHERE SourceId = @SourceId;

    FETCH NEXT FROM Cursor_Demo
    INTO @SourceId,
         @FullName;
END;


CLOSE Cursor_Demo;
DEALLOCATE Cursor_Demo;

SELECT * FROM #source
GO

Results:

SourceId    DestinationId FullName
----------- ------------- --------------------
8           1             Bob Jones
10          2             Sue Smith
14          3             Jane Red
22          4             Bob Jones
2           5             Sarah Clark
42          6             Jane Red

Using a cursor technically works. However, cursors don't scale well. This process would take a long time if our source table contained 50 or 100 thousand rows. Is there a better option? I'm glad you asked.

Using a Set-Based Approach

Instead of using a cursor, let's devise a set-based way of populating the source table in our example above. There is a little-known clause in T-SQL called OUTPUT. Microsoft says that OUTPUT returns information from, or expressions based on, each row affected by an INSERT, UPDATE, DELETE, or MERGE statement. This approach sounds promising. The example below only returns the results and does not insert them into a table.

-- mssqltips.com
DECLARE @OutputTest AS TABLE
(
    Id INT IDENTITY(121, 14),
    Color VARCHAR(10) NULL
);

INSERT INTO @OutputTest
(
    Color
)
OUTPUT Inserted.Id,
       Inserted.Color
VALUES
('Red'),
('Green'),
('Blue');
GO

Results:

Id       Color
-------- ----------
121      Red
135      Green
149      Blue

Did you notice I set the identity seed and increment values oddly to keep you from guessing the IDs right away?

Let's recreate our source and destination tables from above. I'll also add a bridge table to hold the identity values from the OUTPUT clause.

-- mssqltips.com
DROP TABLE IF EXISTS #source;
DROP TABLE IF EXISTS #destination;
DROP TABLE IF EXISTS #bridge;

CREATE TABLE #source
(
    SourceId INT,
    DestinationId INT NULL,
    FullName VARCHAR(20)
);
INSERT INTO #source
VALUES
(8,  NULL, 'Bob Jones'),
(10, NULL, 'Sue Smith'),
(14, NULL, 'Jane Red'),
(22, NULL, 'Bob Jones'),
(2,  NULL, 'Sarah Clark'),
(42, NULL, 'Jane Red');

CREATE TABLE #destination
(
    DestinationId INT IDENTITY(1, 1),
    FullName VARCHAR(20)
);

CREATE TABLE #bridge
(
    SourceId INT,
    DestinationId INT
);
GO

You might think we can perform an INSERT statement and call it a day. Unfortunately, it's not that easy. The code below returns an error message since we can't reference a column that is not part of the target or destination table.

-- mssqltips.com
-- This doesn't work
INSERT INTO #Destination (FullName)
OUTPUT inserted.DestinationId, #Source.SourceId INTO #Bridge 
SELECT FullName FROM #Source s;
GO

Results:

Msg 4104, Level 16, State 1, Line 244
The multi-part identifier "#source.SourceId" could not be bound.

Don't worry; there's another way to get this method working. Instead of using an INSERT statement, we'll use MERGE.

Enter the SQL MERGE Statement

The MERGE statement allows you to perform INSERT, UPDATE, and DELETE all in one statement. Some people love it, and others don't. Aaron Bertrand wrote several articles about its shortcomings, and I recommend reading them. But for our use case, it works perfectly. I've included the syntax below for inserting our rows into the destination and updating the source table.

-- mssqltips.com
MERGE INTO #destination dest
    USING #source src
    ON 1 = 0
    WHEN NOT MATCHED BY TARGET THEN
        INSERT
        (
            FullName
        )
        VALUES
        (src.FullName)
    OUTPUT src.SourceId,
           Inserted.DestinationId
    INTO #bridge;

    UPDATE s 
    SET s.DestinationId = b.DestinationId
    FROM #source s
        INNER JOIN #bridge b
            ON b.SourceId = s.SourceId;

SELECT * FROM #source

Results:

SourceId    DestinationId FullName
----------- ------------- --------------
8           1             Bob Jones
10          2             Sue Smith
14          3             Jane Red
22          4             Bob Jones
2           5             Sarah Clark
42          6             Jane Red

This approach scales much better than the cursor when dealing with a larger dataset. If our source and destination table had more than 50,000 rows, you would only wait seconds rather than minutes when using the cursor.

The next time you review code and see a cursor doing something OUTPUT could handle, send the developer a link to this article.

Key Points

Avoid cursors for typical T-SQL scripts unless no other solution works. Even if your dataset is small, data tends to grow over time, and if it doesn't, things might not be going well for the business.
When part of a MERGE or standard INSERT, UPDATE, or DELETE, the OUTPUT clause provides several options for logging data by capturing the before-and-after values.
Remember, when you perform an INSERT and use the OUTPUT clause, you can only reference the columns inserted into the target table. However, that's when the MERGE comes in handy.

Next Steps

Jeremy Kadlec wrote the insightful article, SQL Server Cursor Example. He provides several real-world examples for building a cursor the right way.
If you are going to use a cursor, Aaron Bertrand has written extensively about tuning them to perform better. Check out Aaron's article, What impact can different cursor options have?
Jim Evans wrote a helpful article on using the OUTPUT clause for auditing data changes. Check out his article, Archive SQL Server Data using the SQL OUTPUT Clause, to learn more.

About the author

Jared Westover is a passionate technology specialist at Crowe, helping to build data solutions. For the past two decades, he has focused on SQL Server, Azure, Power BI, and Oracle. Besides writing for MSSQLTips.com, he has published 12 Pluralsight courses about SQL Server and Reporting Services. Jared enjoys reading and spending time with his wife and three sons when he's not trying to make queries go faster.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Article Last Updated: 2024-05-30

Replace SQL Cursors with Set Based Operations Including OUTPUT and MERGE

Problem

Solution

A Look at SQL Cursors

Replace The SQL Cursor

Using a Set-Based Approach

Enter the SQL MERGE Statement

Key Points

Next Steps

About the author

Comments For This Article

Wednesday, June 5, 2024 - 5:06:43 PM - Jared Westover	Back To Top (92295)
Hi @Barak, Thank you for reading the article. I don't believe I explicitly said cursors were "bad" in the article. The only example where I used the word "bad" was performing a SELECT statement. Would you agree that performing a SELECT, as in my example with a cursor, is less than ideal? I agree that MERGE has its own set of problems.

Monday, June 3, 2024 - 1:10:35 PM - Barak	Back To Top (92285)
DON'T USE MERGE ‼ Cursors are "bad"? MERGE is catastrofic. It will work perfect on the Adventureworks and Stack Overflow with no load on the DB, but - just don't. https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/