SQL Server Clustered Index


By:
Overview

We will likely want indexes to help queries. One of the most important indexing decisions is to determine which column(s) will make up the clustered index.

Explanation

You may or may not have read it before but I'll re-hash it here. You get only 1 clustered index per table since the clustered index determines the logical ordering of the data in the table and it makes up the actual table itself. Not having a clustered index should be conscious decision that should be considered carefully.

How can a clustered index help and why do you want one? If you're grouping, sorting, or range searching large sets of data, clustering on the column(s) involved in your query can considerably speed up the query and will consume fewer resources (i.e. CPU, memory). The data that makes up a clustered index is stored as a B tree structure with the data itself linked in doubly linked list fashion in clustered key order making it ideal for sorting, grouping and range queries.

Consider this example where we SELECT a range of rows from AdventureWorks table Production.TransactionHistory using the clustered, unique primary key on TransactionID:

set statistics io on
go
select * 
from Production.TransactionHistory
where TransactionID between 211000 and 211200
go

Examining the I/O counts we see that 6 pages of data needed to be read to satisfy the request

(201 row(s) affected))))) Table 'TransactionHistory'. Scan count 1, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Now let's drop and re-create the primary key as a non-clustered one.

alter table Production.TransactionHistory     
drop constraint PK_TransactionHistory_TransactionID
go
alter table Production.TransactionHistory 
add constraint PK_TransactionHistory_TransactionID primary key nonclustered (TransactionID) 
go

Re-running the same query and re-examining the I/O counts we see that 205 pages of data are now needed to satisfy the same request

(201 row(s) affected))))) Table 'TransactionHistory'. Scan count 1, logical reads 205, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server automatically makes the primary key a clustered index (if not defined otherwise). The primary key does not have to be the clustered index of the table if other columns make more sense to support critical queries.

The choice of the clustered index is a balancing act between query performance vs. table health. Highly volatile tables (i.e. tables that experience a lot of inserts), can experience severe logical fragmentation. If your clustered index is on not on monotonically increasing data (i.e. an identity, sequentially increasing datetime, sequential uniqueidentifier) and the table is highly volatile, you will experience rapid fragmentation of the table which can result in extra I/O to retrieve the data to satisfy a query request.

If no columns make good candidates for a clustered index, I usually make an available monotonically increasing value the clustered index.

Additional Information

Last Update: 9/10/2011




Comments For This Article

















get free sql tips
agree to terms