By: Ben Snaidero | Updated: 2015-09-01 | Comments (1) | Related: > Indexing
Problem
Beginning in SQL Server 2012 we have a new type of index that we can create: a column store index. A column store index stores the data for each column on its own set of pages rather than storing rows together, so scan-heavy queries (i.e. data warehouse type queries) perform much better than they would against a regular b-tree index. This tip walks through a simple example to see how much better a column store index performs, even with a relatively small data set.
Solution
In this tip we will look at column store index performance compared to traditional SQL Server indexes.
Sample Data Setup
For this test we'll create a simple table with some column types you might find in a warehouse table and load 5,000,000 records into it. That is nowhere near the size of a typical warehouse table, but it should be enough to show the performance difference between the column store and b-tree indexes. Here is the T-SQL to create the table and load the data.
-- Create SampleData table
CREATE TABLE [dbo].[SampleData](
   [RowKey] [int] NOT NULL,
   [CreateDate] [int] NOT NULL,
   [OtherDate] [int] NOT NULL,
   [VarcharColumn1] [varchar](20) NULL,
   [VarcharColumn2] [varchar](20) NULL,
   [VarcharColumn3] [varchar](20) NULL,
   [VarcharColumn4] [varchar](20) NULL,
   [VarcharColumn5] [varchar](20) NULL,
   [IntColumn1] int NULL,
   [IntColumn2] int NULL,
   [IntColumn3] int NULL,
   [IntColumn4] int NULL,
   [IntColumn5] int NULL,
   [IntColumn6] int NULL,
   [IntColumn7] int NULL,
   [IntColumn8] int NULL,
   [IntColumn9] int NULL,
   [IntColumn10] int NULL,
   [FloatColumn1] float NULL,
   [FloatColumn2] float NULL,
   [FloatColumn3] float NULL,
   [FloatColumn4] float NULL,
   [FloatColumn5] float NULL
)
GO

-- Load sample data into table
-- (note: <= rather than < so that exactly 5,000,000 rows are inserted)
DECLARE @val INT
SELECT @val = 1
WHILE @val <= 5000000
BEGIN
   INSERT INTO SampleData VALUES
      (@val,
       CAST(CONVERT(varchar, DATEADD(DAY, ABS(CHECKSUM(NEWID()) % 365), '2015-01-01'), 112) AS integer),
       CAST(CONVERT(varchar, DATEADD(DAY, ABS(CHECKSUM(NEWID()) % 365), '2015-01-01'), 112) AS integer),
       'TEST' + CAST(ROUND(RAND()*100, 0) AS VARCHAR),
       'TEST' + CAST(ROUND(RAND()*100, 0) AS VARCHAR),
       'TEST' + CAST(ROUND(RAND()*100, 0) AS VARCHAR),
       'TEST' + CAST(ROUND(RAND()*100, 0) AS VARCHAR),
       'TEST' + CAST(ROUND(RAND()*100, 0) AS VARCHAR),
       ROUND(RAND()*100000, 0),
       ROUND(RAND()*100000, 0),
       ROUND(RAND()*100000, 0),
       ROUND(RAND()*100000, 0),
       ROUND(RAND()*100000, 0),
       ROUND(RAND()*100000, 0),
       ROUND(RAND()*100000, 0),
       ROUND(RAND()*100000, 0),
       ROUND(RAND()*100000, 0),
       ROUND(RAND()*100000, 0),
       ROUND(RAND()*10000, 2),
       ROUND(RAND()*10000, 2),
       ROUND(RAND()*10000, 2),
       ROUND(RAND()*10000, 2),
       ROUND(RAND()*10000, 2))
   SELECT @val = @val + 1
END
GO
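As a side note, the row-by-row loop above can take quite a while for 5,000,000 rows. Below is a sketch of a set-based alternative that loads the same shape of data in a single statement. It is an assumption-laden sketch, not part of the original test: the tally built by cross-joining sys.all_objects is just one way to generate 5,000,000 sequential numbers, and it uses ABS(CHECKSUM(NEWID())) instead of RAND() because RAND() is evaluated only once per statement and would give every row the same value.

```sql
-- Sketch: set-based load as a faster alternative to the WHILE loop.
-- ABS(CHECKSUM(NEWID())) is used per row because RAND() is evaluated
-- once per statement in a set-based INSERT.
;WITH Tally AS (
   SELECT TOP (5000000)
          ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
   FROM sys.all_objects a
   CROSS JOIN sys.all_objects b
   CROSS JOIN sys.all_objects c
)
INSERT INTO SampleData WITH (TABLOCK)
SELECT n,
       CAST(CONVERT(varchar, DATEADD(DAY, ABS(CHECKSUM(NEWID())) % 365, '2015-01-01'), 112) AS int),
       CAST(CONVERT(varchar, DATEADD(DAY, ABS(CHECKSUM(NEWID())) % 365, '2015-01-01'), 112) AS int),
       'TEST' + CAST(ABS(CHECKSUM(NEWID())) % 101 AS varchar),
       'TEST' + CAST(ABS(CHECKSUM(NEWID())) % 101 AS varchar),
       'TEST' + CAST(ABS(CHECKSUM(NEWID())) % 101 AS varchar),
       'TEST' + CAST(ABS(CHECKSUM(NEWID())) % 101 AS varchar),
       'TEST' + CAST(ABS(CHECKSUM(NEWID())) % 101 AS varchar),
       ABS(CHECKSUM(NEWID())) % 100001,
       ABS(CHECKSUM(NEWID())) % 100001,
       ABS(CHECKSUM(NEWID())) % 100001,
       ABS(CHECKSUM(NEWID())) % 100001,
       ABS(CHECKSUM(NEWID())) % 100001,
       ABS(CHECKSUM(NEWID())) % 100001,
       ABS(CHECKSUM(NEWID())) % 100001,
       ABS(CHECKSUM(NEWID())) % 100001,
       ABS(CHECKSUM(NEWID())) % 100001,
       ABS(CHECKSUM(NEWID())) % 100001,
       ROUND(CAST(ABS(CHECKSUM(NEWID())) % 1000000 AS float) / 100, 2),
       ROUND(CAST(ABS(CHECKSUM(NEWID())) % 1000000 AS float) / 100, 2),
       ROUND(CAST(ABS(CHECKSUM(NEWID())) % 1000000 AS float) / 100, 2),
       ROUND(CAST(ABS(CHECKSUM(NEWID())) % 1000000 AS float) / 100, 2),
       ROUND(CAST(ABS(CHECKSUM(NEWID())) % 1000000 AS float) / 100, 2)
FROM Tally;
```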
Index Setup
Since column store indexes are not updatable in SQL Server 2012 (this changed with SQL Server 2014), we'll create both indexes after the data load. This also ensures the b-tree index does not get fragmented by the initial inserts. Below is the T-SQL to create each index.
-- b-tree index
CREATE NONCLUSTERED INDEX IX_SampleData_Reg
ON SampleData (VarcharColumn1, FloatColumn1);
GO

-- column store index
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_SampleData_ColStore
ON SampleData (VarcharColumn1, FloatColumn1);
GO
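One consequence of the SQL Server 2012 restriction mentioned above is that once the nonclustered column store index exists, the table becomes read-only. If you later need to change the data, you have to disable the index first and rebuild it afterwards. A minimal sketch:

```sql
-- In SQL Server 2012 the table is read-only while the columnstore index is enabled
ALTER INDEX IX_SampleData_ColStore ON SampleData DISABLE;

-- ... perform any INSERT/UPDATE/DELETE against SampleData here ...

-- Rebuild to bring the columnstore index back online
-- (the table becomes read-only again)
ALTER INDEX IX_SampleData_ColStore ON SampleData REBUILD;
```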
I did run into one issue when trying to create the column store index. Since I was running this on my home desktop, which does not have a lot of memory, I received the following error.
The statement has been terminated.
Msg 8658, Level 17, State 1, Line 1
Cannot start the columnstore index build because it requires at least 142360 KB,
while the maximum memory grant is limited to 93392 KB per query in workload group
'default' (2) and resource pool 'default' (2). Retry after modifying columnstore
index to contain fewer columns, or after increasing the maximum memory grant limit
with Resource Governor.
Below is a query you can use to check this setting, as well as the statement I used to increase it. On a server with plenty of memory you are unlikely to hit this error unless the table you are indexing is large relative to the amount of memory on the server.
SELECT name, request_max_memory_grant_percent
FROM sys.dm_resource_governor_workload_groups
WHERE name = 'default';
GO

ALTER WORKLOAD GROUP [default] WITH (request_max_memory_grant_percent = 75);
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO
Column Store Index Performance Test
Now that all the setup is complete, let's run a typical data warehouse query: an aggregation grouped by one of the columns. Since both indexes cover the same columns, the optimizer chooses the column store index on its own, so to complete our comparison we'll use a table hint to force the b-tree index for the second query. Here is the T-SQL for each query.
-- column store query
SELECT VarcharColumn1, AVG(FloatColumn1)
FROM SampleData
GROUP BY VarcharColumn1
GO

-- b-tree query
SELECT VarcharColumn1, AVG(FloatColumn1)
FROM SampleData
GROUP BY VarcharColumn1
OPTION (TABLE HINT(SampleData, INDEX (IX_SampleData_Reg)))
GO
Test Results Analysis
In order to gather some performance statistics on each query execution I had SQL Profiler running while the queries executed. Below are the results from my trace.
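If you'd rather not set up a Profiler trace, you can gather comparable per-query numbers directly in your session with SET STATISTICS IO and SET STATISTICS TIME; the reads, CPU time, and elapsed time appear in the Messages tab instead of a trace:

```sql
-- Per-session alternative to a Profiler trace for capturing reads and timings
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT VarcharColumn1, AVG(FloatColumn1)
FROM SampleData
GROUP BY VarcharColumn1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```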
| Query | Reads | CPU (ms) | Duration (ms) |
|---|---|---|---|
| B-Tree | 20192 | 2308 | 630 |
| Column Store | 10117 | 170 | 130 |
It's easy to see from these test results that the column store index performs much better. It did roughly half the reads, used less than a tenth of the CPU, and ran almost five times faster than its b-tree counterpart. This feature seems to live up to Microsoft's claims of up to 10x better performance with column store indexes over traditional row-based storage.
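Part of the reduction in reads comes from the column store index reading only the referenced columns, each stored in compressed segments. If you are curious how the data is broken up, you can inspect the segments with a query along these lines (a sketch against the sys.column_store_segments catalog view; note that in SQL Server 2012 the join to sys.partitions is on hobt_id, while later versions also expose partition_id):

```sql
-- Sketch: inspect columnstore segment row counts and on-disk sizes
-- for the SampleData table (SQL Server 2012 joins on hobt_id)
SELECT p.object_id,
       s.column_id,
       s.segment_id,
       s.row_count,
       s.on_disk_size
FROM sys.column_store_segments s
JOIN sys.partitions p
  ON p.hobt_id = s.hobt_id
WHERE p.object_id = OBJECT_ID('dbo.SampleData')
ORDER BY s.column_id, s.segment_id;
```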
Next Steps
- More tips on column store indexes:
- Using Profiler and Server Side Traces
- More tips on indexing
About the author
This author pledges the content of this article is based on professional experience and not AI generated.