SQL Server Insert Performance for Clustered Indexes vs. Heap Tables

By: Ben Snaidero | Updated: 2017-07-07 | Comments (7) | Related: > Indexing

Problem

I've read many SQL Server best practice documents that state every table should have a clustered index as opposed to leaving it as a heap structure with nonclustered indexes. Most of the documentation notes the administrative benefits of using clustered indexes. Are there any performance implications, either positive or negative, from implementing a clustered index on a table?

Solution

In order to test the performance of each of these types of table/index structures we will create two identical tables with one difference. One will have a primary key created with a clustered index and the other will have the primary key created using a nonclustered index so the table data will stay in the heap structure. Below is the T-SQL to create and load sample data into these tables.

-- Create table and indexes
CREATE TABLE testtable ([col1]  [int] NOT NULL PRIMARY KEY CLUSTERED,
                        [col2] [int]  NULL,
                        [col3] [int]  NULL,
                        [col4] [varchar](50) NULL,
                        [col5] uniqueidentifier); 
 
-- Load sample data into table
DECLARE @val INT
SELECT @val=1
WHILE @val < 5000000
BEGIN  
   INSERT INTO testtable (col1, col2, col3, col4, col5) 
       VALUES (@val,round(rand()*100000,0),
               round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid())
   SELECT @val=@val+1
END
GO
 
-- Create sample table and indexes
CREATE TABLE testtable2 ([col1]  [int] NOT NULL PRIMARY KEY NONCLUSTERED,
                         [col2] [int]  NULL,
                         [col3] [int]  NULL,
                         [col4] [varchar](50) NULL,
                         [col5] uniqueidentifier); 
 
INSERT INTO testtable2 SELECT * FROM testtable;

One thing I noticed right away was that the heap table was using more space since the table and index are separate structures. With the clustered index the data is stored in the leaf pages of the index so less space is used. Below is a chart that shows how much space is used by each object after the initial load.

TableName	IndexName	Used (KB)	Reserved (KB)	Rows
testtable	PK__testtabl__357D0D3E3D086A66	257952	257992	4999999
testtable2	HEAP	256992	257032	4999999
testtable2	PK__testtabl__357D0D3F2CBA35D8	89432	89608	4999999

Now that we have some data loaded let's perform another data load and monitor the performance using SQL Profiler and check the space usage after the inserts complete. Below is the T-SQL to load another 100,000 records each table.

-- insert when all pages are full
DECLARE @val INT
SELECT @val=5000000
WHILE @val < 5100000
BEGIN  
   INSERT INTO testtable (col1, col2, col3, col4, col5) 
       VALUES (@val,round(rand()*100000,0),
               round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid())
   SELECT @val=@val+1
END
GO
 
DECLARE @val INT
SELECT @val=5000000
WHILE @val < 5100000
BEGIN  
   INSERT INTO testtable2 (col1, col2, col3, col4, col5) 
       VALUES (@val,round(rand()*100000,0),
               round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid())
   SELECT @val=@val+1
END
GO

Looking first at the space usage as one would expect (since the pages were all full ie. nothing has been deleted from the table) each object is using proportionately more space. Both the heap and clustered index increased in size by almost the same amount.

TableName	IndexName	Used (KB)	Reserved (KB)	Rows
testtable	PK__testtabl__357D0D3E3D086A66	263128	263176	5099999
testtable2	HEAP	262392	262472	5099999
testtable2	PK__testtabl__357D0D3F2CBA35D8	91216	91272	5099999

A little more interesting is the SQL Profiler output. Due to the fact that there are two objects to be updated the nonclustered index and heap table insert requires slightly more CPU and performs more reads and writes than the clustered index insert. It also takes slightly longer to complete.

Index Type	CPU (ms)	Reads	Writes	Duration (ms)
CLUSTERED	3500	304919	654	11288
HEAP	3890	406083	904	11438

Let's now randomly delete some data and then run another 100,000 record insert to see if having free space on some of the pages in the table makes any difference with performance. Below is the T-SQL to perform both the delete and the data load.

-- remove 1000000 random records from each table
DELETE FROM testtable WHERE col1 in (SELECT TOP 1000000 col1 FROM testtable ORDER BY newid());
DELETE FROM testtable2 WHERE col1 not in (SELECT col1 FROM testtable);
GO
 
-- insert when there is free space in pages
DECLARE @val INT
SELECT @val=5100000
WHILE @val < 5200000
BEGIN  
   INSERT INTO testtable (col1, col2, col3, col4, col5) 
       VALUES (@val,round(rand()*100000,0),
               round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid())
   SELECT @val=@val+1
END
GO
 
DECLARE @val INT
SELECT @val=5100000
WHILE @val < 5200000
BEGIN  
   INSERT INTO testtable2 (col1, col2, col3, col4, col5) 
       VALUES (@val,round(rand()*100000,0),
               round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid())
   SELECT @val=@val+1
END
GO

Looking at the SQL Profiler output we see a much larger performance gap between the two types of structures this time with the table with the clustered index having much better performance.

Index Type	CPU (ms)	Reads	Writes	Duration (ms)
CLUSTERED	3562	304859	653	10334
HEAP	4973	422142	7053	13042

This difference is most likely due to the fact that when inserting data into a heap table the database engine will search for empty space within each page to store the new data being inserted. This is done since the data in the heap table is unsorted so it can go anywhere. The table with the clustered index simply has data appended to the end of the clustered index since the primary key is increasing. Let's also take a look at the space usage for the table. If the heap structure is indeed reusing space it should not have grown any more since we deleted 1,000,000 records and only inserted 100,000. Below is the table sizes after this last batch of inserts completed. We can see here that the size of the heap table has not changed confirming our earlier assumption.

TableName	IndexName	Used (KB)	Reserved (KB)	Rows
testtable	PK__testtabl__357D0D3E3D086A66	268304	268360	4199999
testtable2	HEAP	262392	262472	4199999
testtable2	PK__testtabl__357D0D3F2CBA35D8	93008	93064	4199999

Summary

In this test case we showed that using a clustered index gave us better performance in every category that we measured, CPU, IO and total duration. The only side effect of using a clustered index was that it did use a little bit more space. This space could also be reclaimed by rebuilding the index during a maintenance window (or online if you are running Enterprise Edition). In any case, before implementing any index changes in production you should always test in your own environment to ensure you also see a positive improvement in performance.

Next Steps

More tips on using clustered indexes:
More tips on indexing:
- Understanding SQL Server Indexing
- Indexing

About the author

Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on performance tuning.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Article Last Updated: 2017-07-07