SQL MERGE vs INSERT, UPDATE, DELETE Performance Considerations

By: Jared Westover | Updated: 2023-03-08 | Comments (6) | Related: 1 | 2 | 3 | 4 | 5 | 6 | 7 | > TSQL

Problem

Whenever someone mentions the MERGE statement, at least one person points out that the performance is sub-optimal compared to the same basic T-SQL INSERT, UPDATE, and DELETE statements. You can find a wide variety of articles either supporting or condemning MERGE. Like sushi or exercise, it's one of those SQL statements that people love or hate. Full disclosure, I like using MERGE. So, I'm putting the claims of it performing sub-optimally to the test. Perhaps by the end of this tutorial, I'll change my mind and stop using MERGE.

Solution

In this tutorial, we'll start by looking at the MERGE statement and why you would want to use it. I'll then look at alternatives to MERGE. I'll present the primary functionality MERGE provides that other options don't. We'll set up a decent size dataset and perform our comparison test. By the end of this tutorial, I hope to answer the question: Is MERGE slow? Perhaps, more importantly: Should new code projects include the MERGE statement?

Exploring the SQL Server MERGE Statement

Let's start by answering, "What is MERGE?" Microsoft defines MERGE as "a statement that runs insert, update, or delete operations on a target table from the results of a join with a source table." It seems simple, right? You have a source table, aka the source of truth. Then you have a destination table that you'll update based on the source of truth. The MERGE statement exists in other RDMS such as Oracle, MySQL, and Azure Synapse.

Data engineers commonly use MERGE for loading or syncing data into an environment, perhaps for a data warehouse. For example, if your database has an employee table synced nightly from an origination system, you could use MERGE to get the job done. MERGE inserts new employees, updates existing ones, and deletes ones missing from the source. Below is a simple example of using the MERGE statement.

MERGE dbo.DestinationTable AS dest
USING dbo.SourceTable AS src -- Source Clauses
ON (dest.SpecialKey = src.SpecialKey)
WHEN MATCHED THEN -- Matched Clauses
    UPDATE SET Column1 = src.Column1,
               Column2 = src.Column2,
               Column3 = src.Column3
WHEN NOT MATCHED BY TARGET THEN
    INSERT
    (
        Column1,
        Column2,
        Column3
    )
    VALUES
    (src.Column1, src.Column2, src.Column3);
GO

Don't forget to include the semicolon at the end of MERGE or SQL yells at you.

Unique Advantage of MERGE

One of the main reasons I advocate MERGE is that it offers the ability to use a special OUTPUT clause to reference columns not part of the inserted or deleted tables. The standard OUTPUT clause works with INSERT, UPDATE, and DELETE; however, you can't reference columns not being inserted or deleted. Daniel Hutmacher wrote a short blog post on the topic that's worth the time investment. This technique helps replace the logic of cursors or WHILE loops.

It's impossible to perform a search without coming across articles pointing out known issues with the MERGE statement. Aaron Bertrand has written several of them, and I recommend reading them. MERGE is not a perfect statement, but this article focuses on performance. When I say performance, I care primarily about the time something takes to complete.

Alternative to MERGE

What's the alternative to MERGE? It's straightforward—you individually perform INSERT, UPDATE, and DELETE statements. Microsoft recommends not using MERGE if you perform any of those individual operations. Of course, this would also mean you don't need access to the non-inserted and deleted columns mentioned above.

Building a Test Dataset

Now, let's set up the base for our test. It's important to mimic a real-world dataset versus a simplistic one with a single integer column. The basic idea is to update one table based on the source of another. Let's name our first table "EmployeeSource" and the second "EmployeeDestination." We'll also create a trusty numbers table. I rarely delete data in the real world due to compliance regulations. So, let's include a BIT column indicating a soft delete instead of deleting. Remember, you'll need about 7 GB of free space.

USE [master];
GO

IF DATABASEPROPERTYEX('MergeDemo', 'Version') IS NOT NULL
BEGIN
    ALTER DATABASE MergeDemo SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE MergeDemo;
END;
GO

CREATE DATABASE MergeDemo;
GO

ALTER DATABASE MergeDemo SET RECOVERY SIMPLE;
GO

USE MergeDemo;
GO

DROP TABLE IF EXISTS dbo.Numbers;
GO

DECLARE @UpperBound INT = 10010000;
;WITH cteN (Number)
AS (SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id])
    FROM sys.all_columns AS s1
        CROSS JOIN sys.all_columns AS s2)
SELECT [Number]
INTO dbo.Numbers
FROM cteN
WHERE [Number] <= @UpperBound;

CREATE UNIQUE CLUSTERED INDEX CIX_Number
ON dbo.Numbers ([Number])
WITH (FILLFACTOR = 100);

CREATE TABLE dbo.EmployeeDestination
(
    Id INT IDENTITY(1, 1) NOT NULL,
    EmployeeNumber CHAR(20) NOT NULL,
    FirstName NVARCHAR(256) NOT NULL,
    LastName NVARCHAR(256) NOT NULL,
    MiddleIntial CHAR(1) NULL,
    YearlySalary DECIMAL(36, 2) NOT NULL,
    DepartmentCode CHAR(5) NOT NULL,
    StartDate DATE NOT NULL,
    IsActive BIT NOT NULL
        DEFAULT 1,
    IsDeleted BIT NOT NULL
        DEFAULT 0,
    CONSTRAINT PK_Merge_Employee_Id
        PRIMARY KEY CLUSTERED (Id)
);

CREATE TABLE dbo.EmployeeSource
(
    RowNumber INT NOT NULL,
    EmployeeNumber CHAR(20) NOT NULL,
    FirstName NVARCHAR(256) NOT NULL,
    LastName NVARCHAR(256) NOT NULL,
    MiddleIntial CHAR(1) NULL,
    YearlySalary DECIMAL(36, 2) NOT NULL,
    DepartmentCode CHAR(5) NOT NULL,
    StartDate DATE NOT NULL,
    IsActive BIT NOT NULL,
    IsDeleted BIT NOT NULL
);
GO

The next code block performs the following actions:

The first statement inserts 10 million rows of data into the "EmployeeDestination" table.
The second statement inserts 1 million rows of data into the "EmployeeSource" table.
The third statement updates 100,000 rows of data already in the "EmployeeSource" table.
The last statement inserts 10,000 new rows of data into the "EmployeeSource" table.

After performing these actions, the goal is to insert 10,000 new rows of data and update 100,000 existing rows of data in the "EmployeeDestination" table.

INSERT INTO dbo.EmployeeDestination
(
    EmployeeNumber,
    FirstName,
    LastName,
    MiddleIntial,
    YearlySalary,
    DepartmentCode,
    StartDate,
    IsActive
)
SELECT TOP (10000000)
       CONCAT('E00', n.Number) AS EmployeeNumber,
       SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZ', (ABS(CHECKSUM(NEWID())) % 26) + 1, 6) AS FirstName,
       SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZ', (ABS(CHECKSUM(NEWID())) % 26) + 1, 7) AS LastName,
       SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZ', (ABS(CHECKSUM(NEWID())) % 26) + 1, 1) AS MiddleInitial,
       ABS(CHECKSUM(NEWID()) % 100000) + 45000 AS YearlySalary,
       CONCAT('M000', ABS(CHECKSUM(NEWID()) % 3) + 1) AS DepartmentCode,
       DATEADD(DAY, RAND(CHECKSUM(NEWID())) * (1 + DATEDIFF(DAY, '2020-01-01', '03-31-2023')), '2020-01-01') AS StartDate,
       CASE
           WHEN (n.Number % 1000) = 0 THEN
               0
           ELSE
               1
       END AS IsActive
FROM dbo.Numbers n;
GO

INSERT INTO dbo.EmployeeSource
(
    RowNumber,
    EmployeeNumber,
    FirstName,
    LastName,
    MiddleIntial,
    YearlySalary,
    DepartmentCode,
    StartDate,
    IsActive,
    IsDeleted
)
SELECT ROW_NUMBER() OVER (ORDER BY EmployeeNumber) AS RowNumber,
       EmployeeNumber,
       FirstName,
       LastName,
       MiddleIntial,
       YearlySalary,
       DepartmentCode,
       StartDate,
       IsActive,
       IsDeleted
FROM dbo.EmployeeDestination
WHERE (Id % 10) = 0;


UPDATE dbo.EmployeeSource
SET YearlySalary = ABS(CHECKSUM(NEWID()) % 100000) + 45000,
    IsDeleted = 1,
    DepartmentCode = 'M0004'
WHERE (RowNumber % 10) = 0;

INSERT INTO dbo.EmployeeSource
(
    RowNumber,
    EmployeeNumber,
    FirstName,
    LastName,
    MiddleIntial,
    YearlySalary,
    DepartmentCode,
    StartDate,
    IsDeleted,
    IsActive
)
SELECT TOP (10000)
       ROW_NUMBER() OVER (ORDER BY n.Number) AS RowNumber,
       CONCAT('E00', n.Number) AS EmployeeNumber,
       SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZ', (ABS(CHECKSUM(NEWID())) % 26) + 1, 6) AS FirstName,
       SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZ', (ABS(CHECKSUM(NEWID())) % 26) + 1, 7) AS LastName,
       SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZ', (ABS(CHECKSUM(NEWID())) % 26) + 1, 1) AS MiddleInitial,
       ABS(CHECKSUM(NEWID()) % 100000) + 45000 AS YearlySalary,
       CONCAT('M000', ABS(CHECKSUM(NEWID()) % 3) + 1) AS DepartmentCode,
       DATEADD(DAY, RAND(CHECKSUM(NEWID())) * (1 + DATEDIFF(DAY, '2020-01-01', '03-31-2023')), '2020-01-01') AS StartDate,
       0 AS IsDeleted,
       1 AS IsActive
FROM dbo.Numbers n
WHERE n.Number > 10000000;
GO

At this point in the process, I made a copy-only backup that I'll restore after each test. I wanted to ensure that certain factors like statistics or cache didn't affect the results.

Executing the SQL MERGE Statement

With either of these methods, I want to ensure SQL Server only updates rows where a modification exists. Said another way, the row in the source data and destination data are different. This technique is beneficial when dealing with temporal tables. Now that we have our dataset let's first run the MERGE statement:

MERGE dbo.EmployeeDestination AS dest
USING dbo.EmployeeSource AS src
ON (dest.EmployeeNumber = src.EmployeeNumber)
WHEN MATCHED AND EXISTS
                 (
                     SELECT FirstName,
                            LastName,
                            MiddleIntial,
                            YearlySalary,
                            DepartmentCode,
                            StartDate,
                            IsActive,
                            IsDeleted
                     FROM dbo.EmployeeSource
                     EXCEPT
                     SELECT FirstName,
                            LastName,
                            MiddleIntial,
                            YearlySalary,
                            DepartmentCode,
                            StartDate,
                            IsActive,
                            IsDeleted
                     FROM dbo.EmployeeDestination 
                 ) THEN
    UPDATE SET FirstName = src.FirstName,
               LastName = src.LastName,
               MiddleIntial = src.MiddleIntial,
               YearlySalary = src.YearlySalary,
               DepartmentCode = src.DepartmentCode,
               StartDate = src.StartDate,
               IsActive = src.IsActive,
               IsDeleted = src.IsDeleted
WHEN NOT MATCHED BY TARGET THEN
    INSERT
    (
        EmployeeNumber,
        FirstName,
        LastName,
        MiddleIntial,
        YearlySalary,
        DepartmentCode,
        StartDate,
        IsActive,
        IsDeleted
    )
    VALUES
    (src.EmployeeNumber, src.FirstName, src.LastName, src.MiddleIntial, src.YearlySalary, src.DepartmentCode,
     src.StartDate, src.IsActive, src.IsDeleted);
GO

After each iteration, I executed the above statement five times with a restore of the database. This process was laborious, but I wanted consistent comparisons.

Performing INSERT Statements and UPDATE Statements

Below is the code for performing the UPDATE and INSERT in separate statements. Like the MERGE, I only want to update rows where a modification occurred.

UPDATE dest
SET dest.FirstName = src.FirstName,
    dest.LastName = src.LastName,
    dest.MiddleIntial = src.MiddleIntial,
    dest.YearlySalary = src.YearlySalary,
    dest.DepartmentCode = src.DepartmentCode,
    dest.StartDate = src.StartDate,
    IsActive = src.IsActive,
    IsDeleted = src.IsDeleted
FROM dbo.EmployeeDestination_NonMerge dest
    INNER JOIN dbo.EmployeeSource src
        ON dest.EmployeeNumber = src.EmployeeNumber -- JOIN Condition
WHERE EXISTS
(
    SELECT FirstName,
           LastName,
           MiddleIntial,
           YearlySalary,
           DepartmentCode,
           StartDate,
           IsActive,
           IsDeleted
    FROM dbo.EmployeeSource
    EXCEPT
    SELECT FirstName,
           LastName,
           MiddleIntial,
           YearlySalary,
           DepartmentCode,
           StartDate,
           IsActive,
           IsDeleted
    FROM dbo.EmployeeDestination 
);

INSERT INTO dbo.EmployeeDestination 
(
    EmployeeNumber,
    FirstName,
    LastName,
    MiddleIntial,
    YearlySalary,
    DepartmentCode,
    StartDate,
    IsActive,
    IsDeleted
)
SELECT EmployeeNumber,
       FirstName,
       LastName,
       MiddleIntial,
       YearlySalary,
       DepartmentCode,
       StartDate,
       IsActive,
       IsDeleted
FROM dbo.EmployeeSource src
WHERE NOT EXISTS
(
    SELECT src.EmployeeNumber
    FROM dbo.EmployeeDestination dest
    WHERE src.EmployeeNumber = dest.EmployeeNumber
);
GO

As with MERGE, I executed the two statements five times with a restored database after each run.

Reviewing the Results

I captured the performance markers below with Extended Events. It's a helpful tool that allows for comprehensive and efficient monitoring and collection of system information from SQL Server. You could also use STATISTICS TIME & IO. Now back to the results.

The numbers reflect an average of five runs. There wasn't a significant difference between individual executions.

Method	Logical Reads	Writes	CPU Time	Duration	Duration %
MERGE Statement	5992820	99404	838904667	79088005	18.9%
UPDATE/INSERT Operations	5829529	99316	720380333	66536671

Before comparing the statements, I suspected MERGE might be slower. According to the results, MERGE is about 19% slower. The outcome surprised me. I would have guessed 3-5%. Does this change my mind? Yea, kind of. When starting a new project requiring a larger data sync, I'll likely choose the individual INSERT operations and UPDATE operations. A 19% difference is huge if your sync process takes hours. Dwain Camps found similar results when comparing the methods in this article.

Do you prefer the MERGE syntax? Then please keep on using it. I drive a particular route to my office because of the traffic pattern, even though it takes a bit longer sometimes.

I look forward to hearing about your experiences with MERGE in the comments below. If you feel the test was somehow skewed, please let me know. I based the original criteria on my specific needs.

Key Takeaways

A handy feature of MERGE is the ability to reference columns not in the inserted or deleted tables in the OUTPUT clause.
If you only perform individual INSERT, UPDATE, or DELETE statements, consider skipping MERGE.
Do you like using MERGE? Keep doing so but be aware that the command might add performance overhead.

Next Steps

Would you like to learn more about capturing performance metrics using Extended Events? Check out this link for a list of tips.
Arshad Ali wrote a detailed article on using the MERGE statement. I highly recommend giving it a read.
Aaron Bertrand wrote a massive collection of articles on proceeding with caution when using the MERGE statement.

About the author

Jared Westover is a passionate technology specialist at Crowe, helping to build data solutions. For the past two decades, he has focused on SQL Server, Azure, Power BI, and Oracle. Besides writing for MSSQLTips.com, he has published 12 Pluralsight courses about SQL Server and Reporting Services. Jared enjoys reading and spending time with his wife and three sons when he's not trying to make queries go faster.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Article Last Updated: 2023-03-08

Performance Considerations with SQL MERGE vs INSERT, UPDATE, DELETE