By: Jared Westover | Updated: 2023-01-11 | Comments (1) | Related: > Indexing
Problem
When I started creating SQL Server indexes, I only added columns to the key. Even if the handy green index hint suggested otherwise. It's been so long ago I can't remember where I first read about placing columns in the include or nonkey portion. However, somewhere along life's journey, I conformed. Why should you put columns in the key versus the nonkey area when creating an index? Does it matter? These are just of few of the questions we'll explore together.
Solution
In this tutorial, I'll start by defining the context of a covering index in SQL Server. Next, we'll discover how SQL Server stores the nonkey columns in an index structure. We'll explore two primary benefits of adding columns to the include portion of an index. There are more, but these are the two I've experienced the most. By the end of this tutorial, you'll have the skills to start creating better indexes today.
Covering Index in SQL Server
Let's start by answering the question, what's a covering index? The simplest way to describe it is an index that includes every column for a specific query. Sometimes you hear this referred to as covering the query. For example, take the simple query below.
SELECT Column1, Column2 FROM Table1 WHERE Column1 = 'Some value';
If you want to cover the query, an index like the one below does the trick.
CREATE NONCLUSTERED INDEX [IX_Column1_Column2] ON Table1 ( Column1, Column2 );
The index may also cover other queries, but we don't know. This speaks to another concept of database or server tuning. Where you're not looking at one query in isolation; however, that's beyond the scope of this article. For now, we'll focus on single queries.
In the example above, what would happen if we excluded Column2 from the index's key section? Likely, SQL would perform Key or RID lookups to fetch it from the base table. Avoiding lookups wins as one of the main reasons to cover a query. I'm not saying you never want lookups in your query plan. You might need to create dozens of indexes on a single table in a busy production environment to avoid them entirely. Since indexes take up space and require maintenance creating one for every column is not the best idea.
SQL Server Index Structure
I've included an illustration of an index structure below. You've likely seen some form of the image more times than you can count. I'll keep things simple and suggest only three levels exist. The root node, where SQL stores the index key. The intermediate node points to the next level. Finally, the leaf level or actual data pages are at the bottom of the tree.
It's essential to hold a basic understanding of an index structure before moving on.
Index Key Columns
What should you place in the index key? For the sake of clarity, I'm referring to nonclustered indexes. I recommend columns used in JOIN and WHERE clauses. Also, if your query has an ORDER BY or GROUP BY, those might be candidates. The ones mentioned above, Microsoft also recommends. Another way to think of columns in the key is anything we plan to search or sort.
Index Nonkey Columns
What about the nonkey portion? Which columns should you add? Pretty much none of the ones I mentioned in the key but everything else that satisfies the query requirements. Another way to think about it is columns where SQL Server doesn't need to worry about searching or sorting decisions. Kind of like driving around my five-year-old; he sits in the backseat since he can't navigate or even change the radio station. Plus, it's the law where I live.
When you create an index and place all columns in the key, SQL stores them at the root node. Keep in mind that a nonclustered index key has a maximum size of 1,700 bytes starting with SQL 2016. Before that, it was 900 bytes. From my understanding, this limit controls the number of levels in the tree.
Next, let's review two primary reasons to add columns to the nonkey portion.
Key Size Reduction
The first reason to use nonkey columns is to reduce the index key size. When you place columns in the include portion, SQL only stores them at the leaf level. I've heard people say that it doesn't noticeably affect the number of pages created. At a basic level, I agree with that belief. However, if you know you'll never use the column in a WHERE or ORDER BY clause, you're bloating the number of pages at both the root node and intermediate level. If you're dealing with a small dataset, it doesn't matter. However, like the weeds in my lawn, tables always seem to grow over time.
Let's create a dataset to visualize the differences. The script below creates a single table with five columns. You can see I placed a clustered index on Column1. Next, we insert 100 thousand rows into our new table.
DROP TABLE IF EXISTS dbo.IndexKeySize; CREATE TABLE dbo.IndexKeySize ( Column1 INT IDENTITY(1, 1), Column2 INT NOT NULL, Column3 CHAR(250) NOT NULL, Column4 CHAR(250) NOT NULL, Column5 NVARCHAR(MAX) NULL, CONSTRAINT PK_IndexKeySize_Column1 PRIMARY KEY CLUSTERED (Column1) ); GO INSERT INTO dbo.IndexKeySize ( Column2, Column3, Column4, Column5 ) SELECT TOP (100000) ABS(CHECKSUM(NEWID()) % 100) + 1 AS Column2, SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZ', (ABS(CHECKSUM(NEWID())) % 26)+1, 10) AS Column3, SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZ', (ABS(CHECKSUM(NEWID())) % 26)+1, 20) AS Column4, SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZ', (ABS(CHECKSUM(NEWID())) % 26)+1, 20) AS Column5 FROM sys.all_columns AS n1 CROSS JOIN sys.all_columns AS n2; GO
Let's imagine the query below executes hundreds of times a day.
SELECT Column3, Column4 FROM dbo.IndexKeySize WHERE Column2 = 2;
To improve performance, we'll create an index. If we don't use nonkey columns, the DDL for the index will look like the code below.
CREATE NONCLUSTERED INDEX [IX_Column2_Column3_Column4] ON dbo.IndexKeySize ( Column2, Column3, Column4 ); GO
Success, our shiny new index is waiting to be called on. Let's check out how many levels and pages we have. I'm going to run the system function sys.dm_db_index_physical_stats.
SELECT i.[name], ips.index_type_desc, ips.alloc_unit_type_desc, ips.index_depth, ips.index_level, ips.page_count FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.IndexKeySize'), NULL, NULL, 'DETAILED') ips INNER JOIN sys.indexes i ON i.index_id = ips.index_id AND [ips].[object_id] = [i].[object_id]; GO
You can see from the screenshot above; our index consists of five levels and 7,147 pages. What about creating another index but designating Column3 and Column4 as nonkey?
CREATE NONCLUSTERED INDEX [IX_Column2+Column3+Column4] ON dbo.IndexKeySize (Column2) INCLUDE ( Column3, Column4 ); GO
Now, let's rerun sys.dm_db_index_physical_stats.
Notice that our new index with the nonkey columns is only three levels deep for 6,685 pages. Which index do you think SQL Server will use if we rerun the query above?
You guessed it! The smaller one with fewer pages.
Non-supported Key Data Types
The second reason to use nonkey columns is when you can't place them in the key. Did you notice Column5 was an NVARCHAR(MAX)? What if we have the following query running in production hundreds of times per day?
SELECT Column5 FROM dbo.IndexKeySize WHERE Column2 = 2;
Can you guess what happens when I try and add Column5 to the index key?
CREATE NONCLUSTERED INDEX [IX_Column2_Column5] ON dbo.IndexKeySize (Column2,Column5); GO
SQL delivers a nice error message saying please don't do that. A way to work around this limitation is to place the column in the nonkey portion.
CREATE NONCLUSTERED INDEX [IX_Column2+Column5] ON dbo.IndexKeySize (Column2) INCLUDE (Column5); GO
I haven't tried to add every data type to the nonkey, but Microsoft confirms you can use all data types except text, ntext, and image as nonkey columns.
Wrapping Up
The original question I asked was, does it matter if you place a column in the key or nonkey part of an index? Given the items we looked at above, yes, it does matter. You don't want your index keys growing too large because that increases the size of the root and intermediate pages. Additionally, we looked at limitations with placing certain data types in an index key. In these cases, one of your only options is to use the include. What do you generally do in practice regarding key and nonkey columns? Are there any additional benefits to using nonkey columns? Please leave your comments below.
Next Steps
- Would you like to learn more about covering indexes and performance in SQL Server? Fikrat Azizov wrote an exceptionally detailed article on the topic.
- Do you think the order of columns matter for the index key? The answer might just surprise you. I wrote an article that explores the question in detail.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2023-01-11