Avoid Index Redundancy in SQL Server Tables

By: Armando Prato | Updated: 2008-03-06 | Comments (5) | Related: > Indexing

Problem

I'm trying to design indexes on a new Customer table and I have three separate query searches to satisfy. One query search is by lastname. The 2nd query search is by last name and first name. The last query search is by lastname and an active flag. How should I create my indexes?

Solution

I've come across more instances than I'd like of data models where the original database developer would solve this problem by creating three separate indexes as follows:

create index ix_customer_lastname on dbo.Customer(lastname)
GO

create index ix_customer_lastname_firstname on dbo.Customer(lastname, firstname)
GO

create index ix_customer_lastname_activesw on dbo.Customer(lastname, activesw)
GO

The reality is, in this case you only need one!

The following queries

select * from dbo.Customer where lastname = 'Washington'
GO

select * from dbo.Customer where lastname = 'Washington' and activesw = 1
GO

select * from dbo.Customer where firstname = 'George' and lastname = 'Washington'
GO

select * from dbo.Customer
where firstname = 'George' and lastname = 'Washington' and activesw = 1
GO

can all be satisfied using the following single index

create index ix_customer_lastname on dbo.Customer(lastname, firstname, activesw)
GO

The lead column of each index is lastname and this is the most important one. SQL Server keeps frequency distribution statistics in the form of a histogram on the lead column of every index. It will also keep secondary selectivity information for the additional index column permutations.

Having redundant indexes in your database wastes SQL Server system resources because the database engine has to maintain more data structures than it needs to. Consider a new Customer being added to the database. The new last name inserted would have to be accounted for in the logical ordering of each index resulting in unnecessary overhead to maintain each of these indexes. Furthermore, redundant indexes waste disk space.

Let's create a Customer table and some indexes to illustrate these concepts. This script will create the table and 10,000 customer rows.

set nocount on
go
create table dbo.Customer (customerid int identity primary key,
firstname char(40), lastname char(40),
address1 char(500), city char(40), state char(20), activesw bit)

declare @counter int, @id varchar(4)
select @counter = 1
while (@counter <= 10000)
begin
select @id = cast(@counter as varchar)
insert into customer(firstname, lastname, address1, city, activesw)
select 'f'+@id, 'l'+@id, 'a'+@id, 'c'+@id, 1

select @counter = @counter + 1
end

create index ix_customer_lastname on dbo.Customer(lastname)
create index ix_customer_lastname_firstname on dbo.Customer(lastname, firstname)
create index ix_customer_lastname_activesw on dbo.Customer(lastname, activesw)
go

Now let's examine the space used by the table's indexes

fig 2
We see that the indexes take up 2kb of disk space

Now let's run the following queries and examine the optimizer's index selections

set showplan_text on
go
select customerid from dbo.Customer where lastname = 'l22'
select customerid from dbo.Customer where lastname = 'l22' and activesw = 1
select customerid from dbo.Customer where firstname = 'f22' and lastname = 'l22'
select customerid from dbo.Customer where firstname = 'f22' and lastname = 'l22' and activesw = 1

Your query plans may look similar to the following:

fig 3 1

Now these are interesting. The first and second queries used index ix_customer_lastname_activesw. The third query used index ix_customer_lastname_firstname. For the last query, the optimizer decided to use ix_customer_lastname_firstname as well and decided to do a lookup to get the activesw value. If you haven't noticed yet, index ix_customer_lastname was not used to satisfy any of these queries. It's just sitting around, literally taking up space.

Now let's drop these indexes and add a single index to satisfy the same four queries

drop index dbo.Customer.ix_customer_lastname
drop index dbo.Customer.ix_customer_lastname_firstname
drop index dbo.Customer.ix_customer_lastname_activesw

create index ix_customer_lastname on dbo.Customer(lastname, firstname, activesw)

If we now re-run sp_spaceused, we'll see that the index space used by the table has been reduced by over 100%

Fig 5

Now let's re-run the queries we ran earlier and examine the new execution plans

fig 6 1

As we now see, the single index satisfies all four queries. In addition, the last query that performed a lookup for the activesw now reads it directly from the index and does not resort to searching the clustered table for the value.

By carefully defining your indexes you can save the database engine some work, save some space on your disks, and still have high performing queries.

Next Steps

Examine your data model for redundant indexes and consider an investigation into the elimination of these redundancies
Read more about sp_spaceused in the SQL tips archive - Using DBCC UPDATEUSAGE to get accurate space allocation information in SQL Server
Read Creating Indexes with Included Columns in the SQL Server 2005 Books Online

About the author

Armando Prato has close to 30 years of industry experience and has been working with SQL Server since version 6.5.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Article Last Updated: 2008-03-06

Comments For This Article

Tuesday, March 17, 2009 - 7:39:21 PM - aprato

I'm not sure why you'd need to query on both id and date? Can you elaborate to why? I would think you would only need to use one or the other.

If the clustered index is id, then the id will be located at the leaf level of the non-clustered index. You wouldn't need to specify id in the index definition if it's just to be returned in the output because you're already getting it for free

a query like

select id
from mytable
where date1 = <some date>

would become a covered query where just the non-clustered index structure is read.
The engine will grab the id right from the leaf.

If you read this tip, you'll be able to drill into the internals of the pages which can
be an eye opener.

http://www.mssqltips.com/tip.asp?tip=1578

Saturday, March 14, 2009 - 4:40:36 AM - ranjan

I have a table with three colum as
id,date1,date2

Three index is created as
ClusterIndex1 = id
Nonclusterindex1 = date1,id
Nonclusterindex2 = date2,id

I have written 2 different query,in that id, date1 is in one where clause id , date2 is another where clause.

I suggest index creation should be

ClusterIndex1 = id
Nonclusterindex1 = date1
Nonclusterindex2 = date2

Unnecessary adding some columns in index creation will affect performance. But the index space used by both scenario is same. What would be the different between those two creation of index.

If this is correct can u send any link for the proof.

Monday, June 23, 2008 - 6:37:57 AM - grobido

Take a look at this tip, I think this is what you are referring to.

http://www.mssqltips.com/tip.asp?tip=1485

Sunday, June 22, 2008 - 6:36:27 PM - aprato

I think I know to what you're referring.... covering indexes? Using a covering index, you can eliminate a deadlock in READ COMMITTED isolation by covering a query such that a bookmark lookup is not needed against a clustered table that is in process of being updated. Higher isolation levels, may have a different result, however. If the logical ordering is not important, you could also use the new index INCLUDE feature of SQL Server 2005.

Thursday, June 19, 2008 - 8:32:47 PM - ROd1

I definitely agree on your post, the less redundancy the better --- Now, dealing with deadlocks i've read that sometimes when several transactions/queries are being run at once (concurrent users), certain table operations can lock the indexes of tables, causing every other query wanting to write to that table to wait for getting exclusive access to that index (as they need to be also updated). Then, i remember the tip to reduce this kind of waits was to have yet an extra index (smaller) including some of the most selective columns of that table as an extra path for SQL Server to use in case the first index is locked. What do you think? ROd@Mx