How column COLLATION can affect SQL Server query performance

By:   |   Updated: 2014-04-24   |   Comments (12)   |   Related: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | > Query Optimization


Problem

Most DBAs, myself included, install SQL Server with the default server collation SQL_Latin1_General_CP1_CI_AS and all of our table columns get created using this default setting. This tip will look at the performance impacts of querying data with this setting as it compares to querying columns with the collation set to SQL_Latin1_General_CP1_CS_AS.

Solution

Table Setup

In order to carry out this small test we will create a simple 3 column table with an integer primary key column as well as two varchar columns. One of the varchar columns will have the collation set to SQL_Latin1_General_CP1_CI_AS and the other will have the collation set to SQL_Latin1_General_CP1_CS_AS. We'll also create an index on both columns. If we don't have an index on the column it would have to do a table scan in both cases and we would not see any difference in performance. Also, in the real world, any heavily queried column would be indexed so it makes sense to do the same for our performance test. Below is the query to create this table and indexes as well the T-SQL to load some random string data into the table.

CREATE TABLE [dbo].[test](
 [col1] [bigint] NOT NULL,
 [col2] [varchar](5) COLLATE SQL_Latin1_General_CP1_CI_AS,
 [col3] [varchar](5) COLLATE SQL_Latin1_General_CP1_CS_AS,
PRIMARY KEY CLUSTERED ([col1] ASC) )
GO

CREATE NONCLUSTERED INDEX [IX_test_col2] ON [dbo].[test] ([col2] ASC)
CREATE NONCLUSTERED INDEX [IX_test_col3] ON [dbo].[test] ([col3] ASC)
GO

DECLARE @i INTEGER
DECLARE @x CHAR(1),@y CHAR(1),@z CHAR(1)
DECLARE @u INT,@v INT,@w INT
SELECT @i=1
WHILE @i < 500000
BEGIN

 SELECT @u=ROUND(RAND()+1,0),@v=ROUND(RAND()+1,0),@w=ROUND(RAND()+1,0)
 IF @u=1 
    SELECT @x=CHAR((RAND()*24)+65)
 ELSE
    SELECT @x=CHAR((RAND()*24)+97)
 IF @v=1 
    SELECT @y=CHAR((RAND()*24)+65)
 ELSE
    SELECT @y=CHAR((RAND()*24)+97)
 IF @w=1 
    SELECT @z=CHAR((RAND()*24)+65)
 ELSE
    SELECT @z=CHAR((RAND()*24)+97)

 INSERT INTO [dbo].[test] VALUES (@i,@x+@y+@z,@x+@y+@z)
 SELECT @i=@i+1
END

Test Scenario 1 - Case insensitive query

Let's first take a look at the results when performing a case insensitive search of each of our columns. In this scenario, for the query on the column defined with SQL_Latin1_General_CP1_CI_AS we can simply use the equality operator in our WHERE clause. For the SQL_Latin1_General_CP1_CS_AS column we have two options, either we can use the T-SQL UPPER() function or specify the collation in the WHERE clause. Here are examples of each query.

select * from test where col2='npE'
select * from test where UPPER(col3)='NPE'
select * from test where col3 COLLATE SQL_Latin1_General_CP1_CI_AS='NPE'

Looking at the explain plan for each of these queries we can see that both the query with the function call as well as the one where we specify the collation need to perform an index scan to execute the query. The first query uses an index seek which will most likely execute much faster using fewer resources.

Explain Plan - CI Query with CI Column


Explain Plan - CI Query with CS Column and Function


Explain Plan - CI Query with CS Column and Collation

We can confirm the performance by looking at the SQL Trace results. From the chart below we can see that the first query outperforms the other two in every category.

CPU Reads Writes Duration Rows
CI Query - CI Column 0 193 0 0 51
CI Query - CS Column with Function 93 2674 0 87 51
CI Query - CS Column with Collation 63 2674 0 62 51

Test Scenario 2 - Case sensitive query

Now let's take a look at the results when performing a case sensitive search on each of our columns. In this scenario the query of the SQL_Latin1_General_CP1_Cs_AS column is pretty straightforward as we can just use the equality operator in our WHERE clause. For the SQL_Latin1_General_CP1_CI_AS column our only option is to specify the collation in the where clause. Here are examples of each query.

select * from test where col2 COLLATE SQL_Latin1_General_CP1_CS_AS='npE'
select * from test where col3='npE'

Looking at the explain plan for these queries we can see that again the query where we specify the collation needs to perform an index scan to execute the query. The later query uses an index seek which will most likely execute much faster using fewer resources.

Explain Plan - CS Query with CI Column and Collation


Explain Plan - CS Query with CS Column

We'll again confirm the performance by looking at the SQL Trace results. From the chart below we can see that the second query which uses an index seek outperforms the other query in every category as was the case in our previous scenario.

CPU Reads Writes Duration Rows
CS Query - CI Column with Collation 63 2674 0 64 7
CS Query - CS Column 0 24 0 0 7

Summary

These simple test scenarios above illustrate how we need to be aware of how our data is being queried by the application. Once we know this we can then design the schema correctly so we can avoid any of these unnecessary index scans.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Ben Snaidero Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on performance tuning.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2014-04-24

Comments For This Article




Friday, February 23, 2018 - 1:40:29 AM - kartis panagiotis Back To Top (75284)

 

 

I created a database for a dictionary. will contain German and Greek words but when I use collate Greek does not support German and when I use German collate does not support Greek Can you help me

thanks

kartis panagiotis


Thursday, July 6, 2017 - 8:33:06 AM - Ben Snaidero Back To Top (58956)

 Hi Miguel,

Here is a good link you can use for creating trace files.

https://www.mssqltips.com/sqlservertip/2232/working-with-sql-server-profiler-trace-files/

Thanks for reading


Wednesday, July 5, 2017 - 10:12:13 PM - Miguel Chillitupa Back To Top (58939)

 

 Awesome... How do I get that trace???


Tuesday, March 17, 2015 - 9:53:17 AM - kulmamies Back To Top (36559)

I believe that the case sensitive query from case insensitive index can be optimized by changing:

select * from test where col2 COLLATE SQL_Latin1_General_CP1_CS_AS='npE'
to
select * from test where (col2 = 'npE') and (col2 COLLATE SQL_Latin1_General_CP1_CS_AS='npE')

by that way, it finds 51 rows from the index and then filters 7 of them to the result.
A super clever optimization engine might be able to do this automatically.


Thursday, March 5, 2015 - 12:03:47 PM - Anne Back To Top (36456)

I believe SQL_Latin1_General_CP1_CI_AS is the default of U.S. But ever since we had a group of consultant developers and dba who worked in our org many years ago, they don't use the default but setup to use Latin1_General_CI_AS ,that becomes our standard when we setup our new servers in our organization. I searched on line a lot about the topic, it seems it is better to use windows collation Latin1_General_CI_AS. instead of SQL collation.  SQL_Latin1_General_CP1_CI_AS  is there for back compatibility.

 

I am also interested in how you get the result like the neat two rows table you have using SQL trace, how do you use it?

 

Thanks

 

 


Wednesday, May 7, 2014 - 5:04:21 AM - humbleDBA Back To Top (30643)

I believe that some of the results you get are not necessarily valid as you are using functions on the column-side of the predicates and thus making them non-SARGable.

I've put a link to an article by Gail Shaw (aka GilaMonster on SSC), below, which covers this though there are many more examples on the web...

https://www.simple-talk.com/sql/database-administration/gail-shaws-sql-server-howlers/ *

*see the srction 'Query predicate is not SARGable'


Wednesday, May 7, 2014 - 4:16:00 AM - Herbert Tobisch Back To Top (30641)

I guess what you want do say is, if a a table column has case insensitive collation, then any index on it will be , say, in uppercase. So, if you want to perform a case sensitive search, the whole index will be scanned  instead of just a precise lookup. Is this correct? Are there other issues ? You are not quite clear about that.

Anyway, it's a good hint to consider collation as a performace factor.

 

 


Tuesday, May 6, 2014 - 11:34:06 PM - Ben Snaidero Back To Top (30638)

Mine has always been SQL_Latin1_General_CP1_CI_AS as well but based on this link that may be due to my OS setting always being US English.

http://social.msdn.microsoft.com/Forums/sqlserver/en-US/bfdc32d3-3d36-4d63-8d87-6ee972fd8130/on-sqllatin1generalcp1cias-sql-server-2008-default-collation?forum=transactsql

Thanks for reading


Tuesday, May 6, 2014 - 12:41:12 PM - Lee Linares Back To Top (30632)

I believe the default collation is SQL_Latin1_General_CP1_CI_AS. As least all mine are.


Tuesday, May 6, 2014 - 7:44:40 AM - Stevan Allen Back To Top (30626)

Hi,

 

Is SQL_Latin1_General_CP1_CS_AS the default location for the U.S?

 

As I get Latin1_General_CI_AS as default location when installing SQL server in the U.K


Tuesday, May 6, 2014 - 7:32:25 AM - WiseOldMan Back To Top (30625)

I guess the real question here, is do we need "npE" to not be the same as "npe" or "NPE" or any of the other combinations.  In a case-insensitive collation, they are all the same.  If we don't need them to be different, then why would we use a case-sensitive collation.  I think that is the point you are trying to make with your conclusion.


Thursday, April 24, 2014 - 8:30:37 AM - Kevin Back To Top (30497)

I don't think this has anything to do with the collation, but rather it has to do with the fact that you are changing the collation in one query, but not in the other.  I would expect a query with a collate command or a function to be slower than the same query without it.  Is there a performance issue when running the same query with different collations?  For example, is it faster to find an item in a case insensitive column vs a case sensitive column (with no collate command) since there are less possible values to match in the case insensitive column?  Also, does the server collation matter since a string in a query would probably use that collation therefore possibly causing a collation change for the query to run?















get free sql tips
agree to terms