UPDATE Statement Performance in SQL Server

By: Bhavesh Patel   |   Updated: 2019-01-04   |   Comments (3)   |   Related: > Performance Tuning


Problem

In SQL Server we use the UPDATE statement to modify data. Updates can be done in various ways: row by row, in one big batch, or in several smaller batches. In this tip we will look at the performance differences between these methods.

Solution

Setup SQL Server Test environment

First, we'll set up a SQL Server database table to run the UPDATE operations against. The following script creates a table and inserts 1 million rows.

USE master
GO

CREATE DATABASE UpdateDB
GO

USE UpdateDB
GO

CREATE TABLE CustTransaction
(
id INT PRIMARY KEY IDENTITY(1,1),
col1 NVARCHAR(500),
col2 NVARCHAR(500),
col3 NVARCHAR(500),
col4 NVARCHAR(500),
col5 NVARCHAR(500),
col6 NVARCHAR(500)
) 
GO

INSERT INTO CustTransaction
SELECT 
'C1' + REPLICATE('0',200),
'C2' + REPLICATE('0',200),
'C3' + REPLICATE('0',200),
'C4' + REPLICATE('0',200),
'C5' + REPLICATE('0',200),
'C6' + REPLICATE('0',200)
-- GO with a count repeats the preceding batch that many times, inserting one row per execution
GO 1000000 

Basics about SQL Server Updates

To limit the rows that are updated when you execute an UPDATE statement, a WHERE clause can be used. The WHERE clause specifies the search conditions that define which rows to update in the base table.

Here is a basic example in which data is updated based on a filter condition.

UPDATE TABLENAME
SET COLUMNNAME = (SomeValue)
WHERE COLUMNNAME = (SomeValue)
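
For example, against the CustTransaction table created above (the id value and the new column value here are just for illustration):

UPDATE CustTransaction
SET col2 = 'C2-updated'
WHERE id = 100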

In addition, rows can be filtered by referencing another table with a join.

-- when the target table is aliased in the FROM clause, UPDATE should reference the alias
UPDATE Alias
SET COLUMNNAME = (SomeValue)
FROM TABLENAME AS Alias
INNER JOIN OTHERTABLE AS OtherAlias ON Alias.columnName = OtherAlias.columnName 
WHERE OtherAlias.COLUMNNAME = (SomeValue)

In addition, you can update the data in one table based on the value from another table.

UPDATE Alias
SET COLUMNNAME = OtherAlias.columnName
FROM TABLENAME AS Alias 
INNER JOIN OTHERTABLE AS OtherAlias ON Alias.columnName = OtherAlias.columnName 
WHERE OtherAlias.columnName = (SomeValue)

For this tip, we are going to keep things simple and just use one table for the update.

UPDATE Scenarios

I need to update a big SQL Server table and will use the following options:

  • Row by row updates
  • One bulk update
  • Updates in smaller batches

We need to be aware of the differences between these methods: which one will be faster, and whether performance is impacted by infrastructure, isolation level, concurrent processes that are running, etc.

UPDATE Row By Row

I am going to update the CustTransaction table using a row by row method with a loop. I prepared a sample query and executed it as follows.

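A minimal sketch of such a row by row update, assuming we iterate over the id primary key and simply overwrite col1 on every row, might look like this:

USE UpdateDB
GO

DECLARE @id INT = 1
DECLARE @maxId INT

SELECT @maxId = MAX(id) FROM CustTransaction

WHILE @id <= @maxId
BEGIN
   -- one single-row UPDATE per loop iteration
   UPDATE CustTransaction
   SET col1 = 'updated value'
   WHERE id = @id

   SET @id = @id + 1
END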

The CustTransaction table has 1 million rows and it took around 22 minutes to update them all. Updating row by row is very slow, but it is less resource intensive and consistency is at the statement level. If you want all of the updates to succeed or fail together, you need to wrap them in a transaction; otherwise, if one update fails, the rest will still succeed. Also, when updating row by row inside a transaction over a large volume of data, deadlocks become more likely, because the row iteration keeps the transaction open longer.

UPDATE in Bulk

I am going to update the CustTransaction table with one command. I prepared a sample query and executed it as follows.

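A single set-based statement covering the whole table, assuming the same col1 assignment as above, is simply:

UPDATE CustTransaction
SET col1 = 'updated value'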

This took around 5 minutes. It is much faster than the row by row approach, but a single big update is best used when a limited number of rows needs to change. A bulk update is an expensive operation in terms of query cost, because the single statement consumes more resources and the entire change must be written to the transaction log as one unit. A long running update like this can also cause blocking issues for other processes.

UPDATE in Batches

Another approach is to update the data in smaller batches.  So instead of doing row by row updates or one big update,  I have divided the data into batches of 10K rows and performed the update operation as shown below.

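A sketch of the batched approach, assuming we walk the id primary key in 10,000 row ranges and use the same col1 assignment as above:

DECLARE @batchSize INT = 10000
DECLARE @startId INT = 1
DECLARE @maxId INT

SELECT @maxId = MAX(id) FROM CustTransaction

WHILE @startId <= @maxId
BEGIN
   -- update one contiguous range of ids per iteration
   UPDATE CustTransaction
   SET col1 = 'updated value'
   WHERE id >= @startId AND id < @startId + @batchSize

   SET @startId = @startId + @batchSize
END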

This completed in less than 2 minutes, which is much faster than the other two approaches. In addition, the transaction size is minimized and also the data logged and resources used are minimized.

I prefer to update data in batches for large volumes of data. You can set your own batch size after analyzing your data and environment to see what makes the most sense. Another option for batching updates is to use the TOP clause to control the batch size, as sketched below.
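
A minimal sketch of TOP-based batching, assuming the updated value of col1 can be used to identify rows that still need to be touched:

DECLARE @rows INT = 1

WHILE @rows > 0
BEGIN
   UPDATE TOP (10000) CustTransaction
   SET col1 = 'updated value'
   WHERE col1 <> 'updated value'

   -- stop once no rows are left to update
   SET @rows = @@ROWCOUNT
END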

Things to Consider

  • For optimizing update operations you should try to minimize the transaction size.
  • Always make sure you use a WHERE clause unless you want to update the entire table.
  • Do large updates during low peak usage times to minimize blocking of other processes.
  • If triggers are used on a table, it is helpful to disable the triggers prior to doing large updates if possible.
  • Make sure you have the correct indexes in place to optimize the update.
  • If you have indexes that are not being used for the table, removing them could help optimize the update.
  • Understand the Isolation Level because this could directly or indirectly impact query performance.
  • Make sure you understand the recovery model for the database. Different recovery models can help reduce the use of the transaction log when updates are done in batches, but understand the differences before making changes (see the query after this list).
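
For example, you can check the current recovery model before deciding whether a change is worthwhile:

SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'UpdateDB'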

About the author
Bhavesh Patel is a SQL Server database professional with 10+ years of experience.

This author pledges the content of this article is based on professional experience and not AI generated.




Comments For This Article




Monday, May 17, 2021 - 4:11:30 AM - Amit
really helpful

Monday, January 7, 2019 - 8:23:03 AM - Mark Horninger

 What was the recovery model set to?  I'm thinking full recovery vs simple recovery might make a big difference in performance.


Friday, January 4, 2019 - 10:27:56 AM - Doug

I'm not certain why your 'bulk' update performed so slowly. Executing on our dev server, which is pretty much a bare-bones configuration (single processor with 4 cores, 12GB RAM with only 6GB given to SQL Server), it ran in 4 seconds. 

Your batch update ran in 25 seconds on my server. If you have SQL Server 2012 or above, the following code performs even faster (5 seconds) if you want to update in batches. One of the keys is that the WHERE condition is based on a clustered primary key:

declare @startID int = (select min(id) from CustTransaction);
declare @batchSize int = 10000;
declare @rows int = -1;

while @rows <> 0
begin
 raiserror('Beginning loop. Last ROWCOUNT was %d',0,1, @rows) with nowait;
 update top (@batchSize) CustTransaction
 set col1 = col2,
  col2 = col1,
  col3 = col4,
  col4 = col3,
  col5 = col6,
  col6 = col5
 where id >= @startID and id < @startID + @batchSize;

 set @rows = @@ROWCOUNT;
 set @startID += @batchSize;
end;

I'm not sure why you got such a larger performance swing between the bulk and batch updates. I've typically used batch updating when I know I'm going to touch the majority of rows in a table and the table size is over 10M rows. Under that, then a single set-based update will usually perform well.

I'm not even going to try the RBAR example :-). 

Also, many times we want to create test tables to insert a certain number of rows into a table for testing purposes. While your code certainly works, it is time consuming. The following code will create and populate the same table in under 10 seconds. If you already have a tally (numbers) table built it will likely take even less time. btw, I killed your code after 18 minutes. It had only populated 290K rows. I hate to think how long it took on your box.

CREATE Table CustTransaction
(
id INT PRIMARY KEY IDENTITY(1,1),
col1 NVARCHAR(500),
col2 NVARCHAR(500),
col3 NVARCHAR(500),
col4 NVARCHAR(500),
col5 NVARCHAR(500),
col6 NVARCHAR(500)
)
GO
;WITH
  p1 as (select 1 as C union all select 1),
  p2 as (select 1 as C from p1 as A, p1 as B),
  p3 as (select 1 as C from p2 as A, p2 as B),
  p4 as (select 1 as C from p3 as A, p3 as B),
  p5 as (select 1 as C from p4 as A, p4 as B),
  p6 as (select 1 as C from p5 as A, p5 as B),
  tally as (select row_number() over(order by C) as N from p6
)
INSERT INTO CustTransaction
SELECT 
'C1' + replicate ('0',200),
'C2' + replicate ('0',200),
'C3' + replicate ('0',200),
'C4' + replicate ('0',200),
'C5' + replicate ('0',200),
'C6' + replicate ('0',200)
from tally
where N between 1 and 1000000
