Identify SQL Server Indexes With Duplicate Columns

By:   |   Updated: 2015-05-01   |   Comments (6)   |   Related: > Indexing


Problem

Although not as much of a SQL Server performance issue as missing indexes where they could be used, having too many indexes defined on a table can cause a lot of unnecessary IO on your SQL Server database as well as use up additional disk space. This tip will provide a query that will identify any tables that have indexes that share leading columns so you can decide which ones are not actually needed and remove them.

Solution

Sample SQL Server Table Setup

In order to provide you with some test cases I've included some T-SQL to create a couple of sample tables. These two tables provide examples of the following test cases:

  1. Exact duplicate indexes
  2. Indexes that shares leading columns
  3. Indexes that share first column, but subsequent columns are different

Of course you can always skip this step and test the queries below in one of your own development environments. Either way here is the T-SQL to setup these sample tables.

-- Create sample table and indexes
CREATE TABLE testtable1 ([col1] [int] NOT NULL primary key  clustered,
                            [col2] [int]  NULL,
                            [col3] [int]  NULL,
                            [col4] [varchar](50) NULL); 

CREATE INDEX idx_testtable1_col2col3 on testtable1 (col2  asc, col3 asc);
CREATE INDEX idx_testtable1_col2col4 on testtable1 (col2  asc, col4 asc);
CREATE INDEX idx_testtable1_col3 on testtable1 (col3  asc);
CREATE INDEX idx_testtable1_col3col4 on testtable1 (col3  asc, col4 asc);

CREATE TABLE testtable2 ([col1] [int] NOT NULL primary key  clustered,
                        [col2] [int]  NULL,
                        [col3] [int]  NULL,
                        [col4] [varchar](50) NULL); 
 
CREATE INDEX idx_testtable2_col3col4 on testtable2 (col3  asc, col4 asc);
CREATE INDEX idx_testtable2_col3col4_1 on testtable2 (col3  asc, col4 asc);

Build Script

In an ideal world you would have a naming convention for your SQL Server indexes that would identify which columns make up the index. Since these types of standards don't always exist it often happens that duplicate indexes can get created on tables and in order to identify these indexes we'll have to use the following catalog views:

To build this query we will break the problem down into two steps. The first thing we will need is a list of all the tables and indexes with their associated columns. For the columns that make up the index we will create a comma separated list of values as is described in this link using the STUFF function and the FOR XML PATH clause. Here is the T-SQL to retrieve this listing. Note the WHERE clause which excludes any tables that are not user created.

select distinct object_name(i.object_id) tablename,i.name indexname,
        (select distinct stuff((select ', ' + c.name
                                  from sys.index_columns ic1 inner join 
                                       sys.columns c on ic1.object_id=c.object_id and 
                                                        ic1.column_id=c.column_id
                                 where ic1.index_id = ic.index_id and 
                                       ic1.object_id=i.object_id and 
                                       ic1.index_id=i.index_id
                                 order by index_column_id FOR XML PATH('')),1,2,'')
           from sys.index_columns ic 
          where object_id=i.object_id and index_id=i.index_id) as columnlist
  from sys.indexes i inner join 
       sys.index_columns ic on i.object_id=ic.object_id and i.index_id=ic.index_id inner join
       sys.objects o on i.object_id=o.object_id 
 where o.is_ms_shipped=0

Now that we have a complete list of all our tables and indexes (including their columns) we can use that output and join it to itself comparing the columnlist output column. All we have to do is join on table name and compare the columnlist column from the first result set with the substring of the columnlist column from the second result set using the length of the first columnlist column. Of course we will omit exact matches except where the index name is also different. Here is the T-SQL for this query.

select t1.tablename,t1.indexname,t1.columnlist,t2.indexname,t2.columnlist from
   (select distinct object_name(i.object_id) tablename,i.name indexname,
             (select distinct stuff((select ', ' + c.name
                                       from sys.index_columns ic1 inner join 
                                            sys.columns c on ic1.object_id=c.object_id and 
                                                             ic1.column_id=c.column_id
                                      where ic1.index_id = ic.index_id and 
                                            ic1.object_id=i.object_id and 
                                            ic1.index_id=i.index_id
                                      order by index_column_id FOR XML PATH('')),1,2,'')
                from sys.index_columns ic 
               where object_id=i.object_id and index_id=i.index_id) as columnlist
       from sys.indexes i inner join 
    	    sys.index_columns ic on i.object_id=ic.object_id and 
                                    i.index_id=ic.index_id inner join
            sys.objects o on i.object_id=o.object_id 
      where o.is_ms_shipped=0) t1 inner join
   (select distinct object_name(i.object_id) tablename,i.name indexname,
             (select distinct stuff((select ', ' + c.name
                                       from sys.index_columns ic1 inner join 
                                            sys.columns c on ic1.object_id=c.object_id and 
                                                             ic1.column_id=c.column_id
                                      where ic1.index_id = ic.index_id and 
                                            ic1.object_id=i.object_id and 
                                            ic1.index_id=i.index_id
                                      order by index_column_id FOR XML PATH('')),1,2,'')
                from sys.index_columns ic 
               where object_id=i.object_id and index_id=i.index_id) as columnlist
       from sys.indexes i inner join 
    	    sys.index_columns ic on i.object_id=ic.object_id and 
                                    i.index_id=ic.index_id inner join
            sys.objects o on i.object_id=o.object_id 
 where o.is_ms_shipped=0) t2 on t1.tablename=t2.tablename and 
       substring(t2.columnlist,1,len(t1.columnlist))=t1.columnlist and 
       (t1.columnlist<>t2.columnlist or 
         (t1.columnlist=t2.columnlist and t1.indexname<>t2.indexname))

And there you have it. A query that will provide a complete listing of all indexes that share leading columns. Please note that the way the query is written if you have two indexes that are exact duplicates it will show up twice in the result set. The only thing left to do now is to decide which indexes we should remove so that we can reclaim some space and reduce the IO required for any DML statements executed against these tables. The following tips can be used to help you determine which indexes you should remove.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Ben Snaidero Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on performance tuning.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2015-05-01

Comments For This Article




Monday, August 12, 2019 - 7:08:23 AM - Nick Stafford Back To Top (82034)

Hi,

I'm aware this is pretty old now but thought it'd be useful to pop this in.

1. Thanks for the script, certainly useful.

2. You might want to consider applying the comma in the column string even if there's only one column in the index. There's a slight issue in that since you're just comparing the starts of the column strings using substring you're susceptible to cases where you have a single column index (such as a PK with just an ID column) and then have indexes with columns which also start with the letters 'ID'. For example ID2 if there's some sort of linking between records in the same table.


Friday, May 8, 2015 - 5:44:41 AM - Thomas Franz Back To Top (37134)

add OBJECT_SCHEMA_NAME(i.object_id) as schemaname to the select list of t1 and t2 and join on them (plus add t1.schemanname to the outer select list).


Wednesday, May 6, 2015 - 6:31:15 PM - Tony Green Back To Top (37112)

This is a good start however it does not take into account Filtered Indexes.  It would say that the following 2 indexes are the same.

CREATE INDEX idx_testtable1_col2col3 on testtable1 (col2  asc, col3 asc);

CREATE INDEX idx_testtable1_col2col32 on testtable1 (col2  asc, col3 asc) WHERE [col3] = 2;

Tony

 


Tuesday, May 5, 2015 - 7:18:38 AM - Niroshan Back To Top (37093)

This is very useful. But the order by index_column_id sometimes gives incorrect results. It should be order by key_ordinal


Friday, May 1, 2015 - 9:48:25 PM - Ben Snaidero Back To Top (37085)

Hi Kyle,

Actually I didn't know that. Most of the databases I work with put all objects in the dbo schema.  Thanks for catching and providing a fix for this.

Ben


Friday, May 1, 2015 - 11:13:43 AM - Kyle Back To Top (37082)

This is a very timely article for me as I'm working on just this thing today. Thank you :)

FYI ... perhaps you know this already, but this returns false positives if you have tables with the same names under different schemas. Adding the schema name to "tablename" gets rid of them. (eg: sys.schemas.name + '.' + sys.objects.name as tablename) 

 















get free sql tips
agree to terms