SQL Server Covering Index Performance

By: Fikrat Azizov   |   Updated: 2015-02-20   |   Comments (3)   |   Related: Indexing


Problem

I was recently involved in troubleshooting data warehouse queries for one of my customers whose ETL package was running longer than usual. The package was quite simple: it inserted a large number of rows from a staging table into a large table containing hundreds of millions of rows. The initial investigation revealed that the package's OLEDB destination adapter was responsible for most of the execution time. The destination table had a number of indexes, some of them larger than 200 GB, and the execution plan for the bulk inserts showed that index updates were contributing significantly to the slowness of the inserts.

The total execution cost of a few large non-clustered indexes accounted for more than a 40% share of the entire execution plan. We considered disabling the largest indexes, but it turned out they were heavily used by some daily reports, and disabling them could significantly hurt the performance of those reports.

Disabling the large indexes prior to the ETL and re-enabling them afterwards was also not an option, because the index rebuild required significant time and reports relying on those indexes needed to run immediately after the ETL package. So, I started looking into ways to reduce index sizes, assuming that smaller indexes would require cheaper index updates and therefore provide better performance for data insert and update transactions. The rejected approach is sketched below for reference.
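For reference, the disable-and-rebuild approach we rejected would look roughly like this (a sketch only; the table and index names are illustrative, not the customer's actual objects):

ALTER INDEX IX_BigTable_Reporting ON dbo.BigTable DISABLE;

-- ... run the ETL load here; a disabled index is not maintained during inserts ...

-- A disabled non-clustered index cannot simply be switched back on; it must be
-- rebuilt, and on an index of this size the rebuild was too slow for reports
-- that had to run immediately after the ETL.
ALTER INDEX IX_BigTable_Reporting ON dbo.BigTable REBUILD;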

Solution

Fully covering and partially covering SQL Server indexes

A closer examination showed that even though the largest indexes had only a few key fields, they contained a high number of columns in their INCLUDE lists; apparently some developers had created them in order to satisfy queries with large SELECT lists. For those of you who are not familiar with covering indexes, I'd recommend reading this article by Joe Webb.

The biggest benefit of a covering index is that when it contains all the fields required by a query, it may greatly improve that query's performance. The best results are often achieved by placing the fields used in filter or join criteria into the index key and placing the fields required only in the SELECT list into the INCLUDE part of the covering index. As you can see from the sample execution plan below, the index IDX_Employees_Covering provides all the fields in the SELECT list and therefore completely satisfies the query:

Query:

SELECT FirstName, LastName, Birthdate, DepartmentId, PositionId, 
ManagerId, Salary, Address, City, State, HiredDate
FROM Employees 
WHERE DepartmentId between 10 and 70 
AND PositionId between 10 and 80

Index:

CREATE NONCLUSTERED INDEX IDX_Employees_Covering ON Employees (DepartmentId,PositionId) 
INCLUDE (FirstName,LastName,Birthdate,ManagerId,Salary,Address,City,State,HiredDate)

Figure 1. Execution plan for fully covering index

However, in some cases a fully covering index may become only partially covering, due to table or query enhancements. As you can see from the example below for the same query, the index IDX_Employees_PartiallyCovering doesn't provide all the required fields, and SQL Server is forced to use an additional operator, a Key Lookup, in order to bring the missing columns (City, HiredDate, State in this example) from the clustered index:

Index:

CREATE NONCLUSTERED INDEX IDX_Employees_PartiallyCovering 
ON Employees (DepartmentId,PositionId) 
INCLUDE (FirstName,LastName,Birthdate,ManagerId,Salary,Address,Email);

Figure 2. Execution plan for partially covering index

Index optimization

In the problem mentioned above, my goal was to optimize the existing indexes without compromising the performance of read queries. So, I found all the queries in the plan cache that referenced the large indexes and analyzed their execution plans, using the query described by Jonathan Kehayias in this article. This analysis revealed that none of the large covering indexes fully satisfied the queries; in other words, the execution plans contained Lookup operators. So I figured that removing some of the columns from the INCLUDE part of an index would not change those plans, since a Lookup operator had to bring in additional fields anyway, as shown in the figures below.
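For the plan cache search, a query along the following lines can be used (a simplified sketch in the spirit of Jonathan Kehayias' script, not his exact code; the index name is just an example, and searching plan XML can be expensive on a busy server):

-- Search cached plans for references to a specific index by name.
-- QUOTENAME adds the brackets that showplan XML uses around index names.
DECLARE @IndexName sysname = QUOTENAME('IDX_Employees_Covering');

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT st.text     AS query_text,
       qp.query_plan,
       cp.usecounts
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)  AS st
WHERE qp.query_plan.exist('//Object[@Index=sql:variable("@IndexName")]') = 1;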

Execution plans for partially covering indexes:

Figure 3. Large index, before optimization

Figure 4. Trimmed index, after optimization

I removed most of the fields from the INCLUDE part of the covering indexes without touching the index keys, and index sizes dropped by up to 40%. As a result, query performance improvements exceeded all my expectations: the execution costs of the related SELECT queries dropped and they ran faster. Moreover, we gained more than a 45% improvement on data warehouse bulk insert transactions. Below I've tried to reproduce the problem and provide a comparison between various indexing options.

Test scenario: Measuring partially covering index performance

We'll create a test table in a test database and measure key performance metrics (duration, number of reads and query execution cost) for the following two queries:

1. SELECT FirstName, LastName, Birthdate, DepartmentId, PositionId, 
ManagerId, Salary, Address, City, State, HiredDate 
FROM Employees 
WHERE DepartmentId between 10 and 70 
AND PositionId between 10 and 80

2. INSERT INTO Employees 
SELECT FirstName, LastName, Birthdate, DepartmentId, PositionId, 
ManagerId, Salary, Address, City, State, HiredDate
FROM [testDb].[dbo].[Employees_Archive]

The measurements will be taken under the following indexing conditions:

1. Without non-clustered indexes, with only a clustered index
2. A large, partially covering non-clustered index on top of the clustered index
3. A fully covering non-clustered index on top of the clustered index
4. A trimmed (non-covering) index on top of the clustered index

In order to get accurate results, we'll clear the buffer and plan caches by running the following commands prior to each test:

dbcc dropcleanbuffers;
dbcc freeproccache;

We'll also run the following commands at the start of each query session to get I/O and time statistics, and we'll turn on the Actual Execution Plan in SSMS:

set statistics time on;
set statistics io on;

This script will create the Employees table in the testDb database and populate it with 500K rows (it took a few minutes on my test server):

CREATE DATABASE testDb
GO
USE testDb
GO
SET NOCOUNT ON;

CREATE TABLE Employees (
Id int Identity(1,1) Primary Key not null,
FirstName varchar(500),LastName varchar(500),Birthdate smalldatetime,DepartmentId int,
PositionId int,ManagerId int,Salary decimal(10,2),Email varchar(50),Address varchar(500),
City varchar(200),State varchar(200),HiredDate date);
GO

DECLARE @cntvar int=1,@RowCnt int=500000
WHILE @cntvar<=@RowCnt 
BEGIN 
INSERT INTO Employees 
(FirstName,LastName,Birthdate,DepartmentId,PositionId,
ManagerId,Salary,Email,Address,City,State,HiredDate) 
VALUES
('TestName'+replicate('X',90),
'TesLastName'+replicate('X',90),
dateadd(hour,@cntvar,'1950-01-01'),
@cntvar%1000,
@cntvar%10000, 
@cntvar%1000,
50000,
'[email protected]',
'testStreet'+replicate('X',90),
'testCity'+replicate('X',90),
'testState'+replicate('X',85),
dateadd(hour,@cntvar,'1980-01-01')
); 
SET @cntvar=@cntvar+1;
END
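The row-by-row loop above is easy to follow but slow. As an alternative (a sketch, not part of the original test: a single set-based insert driven by a ROW_NUMBER numbers list, producing the same 500K rows), the table can be populated much faster:

;WITH Numbers AS (
    -- Generate 500,000 sequential integers; the cross join supplies enough rows.
    SELECT TOP (500000) CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS int) AS n
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b
)
INSERT INTO Employees 
(FirstName,LastName,Birthdate,DepartmentId,PositionId,
ManagerId,Salary,Email,Address,City,State,HiredDate)
SELECT 'TestName'+replicate('X',90),
       'TesLastName'+replicate('X',90),
       dateadd(hour,n,'1950-01-01'),
       n%1000,
       n%10000,
       n%1000,
       50000,
       '[email protected]',   -- same test value as in the loop above
       'testStreet'+replicate('X',90),
       'testCity'+replicate('X',90),
       'testState'+replicate('X',85),
       dateadd(hour,n,'1980-01-01')
FROM Numbers;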

The next query creates a duplicate of this table in order to test bulk inserts into the Employees table:

SELECT * INTO Employees_Archive from Employees; 

Below are the scripts for creating indexes:

1. Partially covering index:

CREATE NONCLUSTERED INDEX IDX_Employees_PartiallyCovering
ON Employees (DepartmentId,PositionId) 
INCLUDE (FirstName,LastName,Birthdate,ManagerId,Salary,Address,Email);

2. Fully covering index:

CREATE NONCLUSTERED INDEX IDX_Employees_Covering
ON Employees (DepartmentId,PositionId)
INCLUDE (FirstName,LastName,Birthdate,ManagerId,Salary,Address,City,State,HiredDate);

3. Trimmed (non-covering) index:

CREATE NONCLUSTERED INDEX IDX_Employees_Trimmed
ON Employees (DepartmentId,PositionId);
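Once the indexes are created, their sizes can be compared with a query along these lines (a sketch using sys.dm_db_partition_stats; one page is 8 KB, hence the arithmetic):

-- Approximate size of each index on the Employees table, in MB.
SELECT i.name                                AS index_name,
       SUM(ps.used_page_count) * 8 / 1024.0  AS size_mb
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
    ON ps.object_id = i.object_id
   AND ps.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID('dbo.Employees')
GROUP BY i.name;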

Here is a summary of the test results (the full test results can be found here: Test Results):

                         |                  Select query                  |                   Bulk insert
Index conditions         | Duration (ms) | Logical reads | Execution cost | Duration (ms) | Logical reads | Execution cost
-------------------------+---------------+---------------+----------------+---------------+---------------+---------------
Only clustered index     |          4715 |         42367 |         29.227 |         41770 |       1786510 |        453.955
Partially covering index |          1563 |         15252 |          5.877 |        182407 |       3880338 |        1426.57
Fully covering index     |           396 |          2156 |           1.65 |        140775 |       4133971 |        1679.55
Trimmed index            |           378 |         14152 |          4.851 |         67328 |       3393525 |        462.294

In this test, the best read performance was achieved with the fully covering index; however, the insert queries involving the fully covering index performed worst in terms of execution cost, I/O and duration. On the other hand, the trimmed index (the index with all columns removed from the INCLUDE list) proved to be the best for insert queries, while performing only slightly worse for read queries than the fully covering index. The most important takeaway from this experiment is that removing columns from the INCLUDE part of an index improves performance for both read and write queries in cases where the execution plans do not change, because smaller indexes require less memory and disk I/O.
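When trimming an existing index in practice, the rebuild can be done in a single step with the DROP_EXISTING option instead of dropping and recreating the index (a sketch; the shortened INCLUDE list below is illustrative):

-- Rebuild the covering index with a smaller INCLUDE list in one operation.
CREATE NONCLUSTERED INDEX IDX_Employees_Covering
ON Employees (DepartmentId,PositionId)
INCLUDE (FirstName,LastName,Birthdate,ManagerId)
WITH (DROP_EXISTING = ON);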

Conclusion

Covering indexes are a great tool for improving read queries; however, they can also significantly degrade the performance of write transactions on large tables. When write queries perform poorly, the indexes on the underlying tables should be reviewed. If an index fully covers at least some of the important queries, I'd be hesitant to reduce its size, as it may become partially covering and less efficient for those queries. However, in cases where an index is only partially covering for all key queries and I/O transactions are underperforming, downsizing it would certainly be useful.
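As a starting point for such a review, each index's read/write balance can be checked with a query along these lines (a sketch against sys.dm_db_index_usage_stats; note that its counters reset when the instance restarts):

-- Compare how often each index on Employees serves reads vs. how often writes maintain it.
SELECT i.name                                            AS index_name,
       us.user_seeks + us.user_scans + us.user_lookups   AS reads,
       us.user_updates                                   AS writes
FROM sys.dm_db_index_usage_stats AS us
JOIN sys.indexes AS i
    ON i.object_id = us.object_id
   AND i.index_id  = us.index_id
WHERE us.database_id = DB_ID()
  AND i.object_id = OBJECT_ID('dbo.Employees');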


About the author
Fikrat Azizov has been working with SQL Server since 2002 and has earned two MCSE certifications. He’s currently working as a Solutions Architect at Slalom Canada.

This author pledges the content of this article is based on professional experience and not AI generated.




Comments For This Article




Monday, October 29, 2018 - 10:51:44 AM - Fikrat Azizov

Hi there,

Thanks for the question. I think the best way to handle your case is to try to find minimal combinations of fields which are common to as many queries as possible and use them to create indexes. In other words, if query 1 uses columns a,b,c and query 2 uses columns a,b, then an index on a,b,c would serve both queries. Then you could create separate indexes for the rest of the queries. Once you've designed indexes based on commonalities, it's a good idea to run your queries and examine their execution plans to see if those indexes are being used.

 


Monday, October 29, 2018 - 4:03:38 AM - VISHNUVARDHAN

Hi,

Thanks for the clear explanation.

I would like to know how many indexes we need to create on a table if we are using different filter conditions in different places.

I have a table with 70 columns and 10,000 records. I retrieve data from the table with different filter conditions at different times, so I can't point to one key column; all of them have importance at their time.

For this scenario, how should indexes be created on the table?

Like:

a1 index1 non-clustered
a2 index2 non-clustered
a3 index3 non-clustered

or:

a1,a2,a3 index1 non-clustered
id index2 clustered


Friday, February 20, 2015 - 3:57:47 AM - subrata

I am facing one problem: when I use my query in an SSRS report, it displays this message: "An error occurred while the query design method was being saved. An item with the same key has already been added." How do I solve that problem? My code is below:

SELECT Ctga1.CtgVarDesc AS MainProduct, CASE WHEN Ctga1.CtgChrWebActive = 1 THEN 'Active' END AS Active, CASE WHEN Ctga1.CtgChrWebActive = 2 THEN 'Inactive' END AS Inactive, 
       Ctgb1.CtgVarDesc AS SubProduct1, CASE WHEN Ctgb1.CtgChrWebActive = 1 THEN 'Active' END AS Active, CASE WHEN Ctgb1.CtgChrWebActive = 2 THEN 'Inactive' END AS InactiveActive, 
       Ctgc1.CtgVarDesc AS SubProduct2, CASE WHEN Ctgc1.CtgChrWebActive = 1 THEN 'Active' END AS Active, CASE WHEN Ctgc1.CtgChrWebActive = 2 THEN 'Inactive' END AS InActive, 
       Prdd1.PrdVarDesc AS SubProduct3, CASE WHEN Prdd1.PrdChrActiveYN = 'y' THEN 'Active' END AS Active, CASE WHEN Prdd1.PrdChrActiveYN = 'N' THEN 'Inactive' END AS InActive
FROM CbzCtgMst AS Ctga1 INNER JOIN
     CbzCtgMst AS Ctgb1 ON Ctga1.CtgIntId = Ctgb1.CtgIntPtId INNER JOIN
     CbzCtgMst AS Ctgc1 ON Ctgb1.CtgIntId = Ctgc1.CtgIntPtId INNER JOIN
     CbzPrdMst AS Prdd1 ON Ctgc1.CtgIntId = Prdd1.PrdIntCtgId
ORDER BY MainProduct, SubProduct1, SubProduct2, SubProduct3

 














