By: Aaron Bertrand | Updated: 2020-08-19 | Comments (1) | Related: > Database Design
Problem
We recently performed a DDL operation against a SQL Server table – simply increasing the size of a varchar column – which should have been instantaneous. Instead, we killed it after observing 20 minutes of HARD_SYNC_COMMIT waits and a blocked replication log reader. Could this issue have been avoided? What went wrong?
Solution
Here was the original table (grossly simplified):
CREATE TABLE dbo.Floob
( FloobID int IDENTITY(1,1) PRIMARY KEY,
PostalCode varchar(32)
);
And here was the command to change the PostalCode column from 32 characters to 64:
ALTER TABLE dbo.Floob
ALTER COLUMN PostalCode varchar(64) NOT NULL
WITH (ONLINE = ON);
If you don't spot the issue right away, don't worry, you're not alone. We missed it too. The problem is that the original table had the column defined as nullable (though NULL was prevented through other logic). However, the DDL command that was generated from source control inadvertently changed PostalCode to NOT NULL, based on a rule (or assumption) rather than the current state of the table.
Also, I want to be clear that this issue wasn't specifically caused by the operation being online – it is just why it was surprising to see even though we explicitly asked for an online operation. You may not be able to test this scenario exactly, since altering a column online is restricted to Enterprise Edition (just like online index operations). If you try the syntax on a lesser edition, you will see this error message:
Online index operations can only be performed in Enterprise edition of SQL Server.
Changing nullability can be, effectively, a size-of-data operation, whether online or not. On a small table, this is unlikely to be noticed; on a large table, however, it can become a problem. And we can prove this simply by inspecting the transaction log before and after a change, and understanding how those changes cause impact downstream (basically, anything that uses the log reader or redo, like replication and Availability Groups).
To set up a quick test, we can repeatedly run a script with minor changes to detect cases where the amount of work logged varies with the number of rows (which shouldn't happen for an operation that is truly online and indifferent to the number of rows affected):
- Setting the initial column NULL or NOT NULL
- Populating the table with 10, 100, 1,000, or 10,000 rows
- Altering the column explicitly to be NULL or NOT NULL
- Performing this change with ONLINE = ON or OFF
The skeleton of the test looks like this:
-- create table with null or not null
-- populate with n rows
-- count number of log records in fn_dblog()
-- alter table using null or not null, online or offline
-- find delta in number of log records in fn_dblog()
And here is the actual code (I just re-created the database each time to keep things clean):
USE master;
GO ALTER DATABASE Splunge SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
GO DROP DATABASE Splunge;
GO CREATE DATABASE Splunge;
GO USE Splunge;
SET NOCOUNT ON;
GO CREATE TABLE dbo.Floob
(
FloobID int IDENTITY(1,1) PRIMARY KEY,
PostalCode varchar(32) NULL -- change to NOT NULL
);
GO DECLARE @n int = 10; -- 100, 1000, 10000 INSERT dbo.Floob(PostalCode) SELECT TOP (@n) 'N0T L33T'
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2;
GO DECLARE @i int = (SELECT COUNT(*) FROM sys.fn_dblog(DB_ID(), NULL)); DECLARE @d datetime2 = sysdatetime(); ALTER TABLE dbo.Floob
ALTER COLUMN PostalCode varchar(64)
NULL -- change to NOT NULL
WITH (ONLINE = ON); -- change to OFF (or comment out) SELECT DurationMilliseconds = DATEDIFF(MILLISECOND, @d, sysdatetime()); SELECT LogRecords = COUNT(*)-@i FROM sys.fn_dblog(DB_ID(), NULL);
I ran each of these test variation 10 times and averaged out the duration (which varied from test to test) and log records produced (which was constant for each variation). Here are the results for the log records:
And the duration, which should be no surprise after seeing the log activity:
Essentially, changing the column from NULL to NOT NULL invoked a whole bunch of additional log activity, and this increased with the number of rows affected, making it a size-of-data operation, at least to a certain extent. The majority of this log activity fell under LOP_INSERT_ROWS for online operations, and LOP_MODIFY_ROW when running the ALTER offline. This log activity can obviously impact duration and, as suggested before, might affect downstream systems and data movement as well.
You will notice that the online operations required about double the log activity as offline, and about double the duration, as well. This is the price you pay for not blocking – other activities can run against the table while the column is being changed, even if it does become a size-of-data operation.
Summary
Please use caution when planning out schema changes. The original nullability state of a column is something to easily overlook when your DDL commands are meant to look a certain way or conform to some specific standard or pattern. If your intent is to change nullability then, by all means, it is something you have to do. In our case, this was an unintended side effect, and the impact to the log was unexpected because, under most other scenarios, increasing the size of a variable-length column is an online, metadata-only operation whether or not you specify. In a future tip, I will talk about what happens when the column participates in a non-clustered index, and ways ONLINE column changes may still cause locking/blocking.
Next Steps
See these related tips and other resources:
- SQL Server 2016 Online ALTER COLUMN Operation
- Be Cautious Altering Tables Using SQL Server Management Studio SSMS
- Best practices for SQL Server database ALTER table operations
- SQL Server Data Type Consistency
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2020-08-19