By: Ben Snaidero | Updated: 2017-11-16
Problem
A good point was brought up about a tip I wrote last month on using a GUID column for a primary key: this column type can cause really poor performance over time because it results in a lot of page splits. In this tip we will run through basically the same steps as the original tip, but this time we will measure the page splits encountered when inserting data for each of the following datatypes: integer, bigint, GUID and sequential GUID. We will also measure the impact these page splits have on the SQL Server transaction log.
Solution
Before we dive into our page split demonstration, I want to give a quick overview of the two types of page splits we can encounter in a SQL Server database as they pertain to clustered indexes and inserting data. As we all know, data in a SQL Server database is stored in pages. Since each page is 8KB in size, we are inevitably going to have to add and/or split database pages in order to add more data. This leads us to the two types of page splits that can occur: one where we add a new page to the end of the structure, and one where the data needs to go in the middle of the structure but the target page is full, so we actually have to split this full page into two pages. In the former case, since integer, bigint and sequential GUID are ever-increasing columns (assuming the integer types are identity values of some sort), we expect all or most of the page splits with these indexes to happen when adding a page to the end of the index structure, which requires very little overhead and does not increase the level of fragmentation in the object. The latter case is the one that can lead to poor performance. When a page split occurs in the middle of an object's structure it impacts:
- INSERTs and UPDATEs while the page is actually being split during the DML operation
- SELECT queries due to excessive fragmentation caused by the page splits
Since a random GUID column can have data added anywhere in the structure due to its random nature, it can become very susceptible to poor performance because of these bad page splits (assuming no other maintenance is done).
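As an aside, if you want to watch splits accumulate without reading the transaction log, the sys.dm_db_index_operational_stats DMV exposes allocation counters you can sample before and after a workload. Below is a minimal sketch (meant to be run against the test tables created later in this tip); note that leaf_allocation_count counts both good and bad splits, so unlike the fn_dblog approach used below it cannot tell the two apart.

-- sample the allocation (split) counters for the test tables
SELECT OBJECT_NAME(ios.object_id) AS table_name,
       ios.index_id,
       ios.leaf_allocation_count,    -- page allocations at the leaf level (good and bad splits)
       ios.nonleaf_allocation_count  -- page allocations in the intermediate index levels
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
WHERE OBJECT_NAME(ios.object_id) LIKE 'testtable%';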
Now that we understand a little more about page splits, let's run through a demonstration using a large number of inserts against tables with different column types for the primary key. Below is the T-SQL to set up the 4 tables. Each table has a primary key column that is either an integer, bigint, GUID or sequential GUID.
CREATE TABLE testtable_int ([pk_col] [int] not null primary key clustered,
   [col1] [int], [col2] [datetime], [col3] [varchar](20));

CREATE TABLE testtable_bigint ([pk_col] [bigint] not null primary key clustered,
   [col1] [int], [col2] [datetime], [col3] [varchar](20));

CREATE TABLE testtable_guid ([pk_col] [uniqueidentifier] not null primary key clustered,
   [col1] [int], [col2] [datetime], [col3] [varchar](20));

CREATE TABLE testtable_seqguid ([pk_col] [uniqueidentifier] default newsequentialid() not null primary key clustered,
   [col1] [int], [col2] [datetime], [col3] [varchar](20));
We can use the following T-SQL to load each of these tables with 1,000,000 records, which should be enough to see a pretty significant number of page splits.
-- data load
DECLARE @val int
DECLARE @pkguid uniqueidentifier
SELECT @val=0
WHILE @val < 1000000
BEGIN
   INSERT INTO testtable_int (pk_col, col1, col2, col3)
      VALUES (@val, round(rand()*100000,0), getdate(), 'TEST' + CAST(@val AS VARCHAR));
   INSERT INTO testtable_bigint (pk_col, col1, col2, col3)
      VALUES (@val, round(rand()*100000,0), getdate(), 'TEST' + CAST(@val AS VARCHAR));
   SELECT @pkguid = newid();
   INSERT INTO testtable_guid (pk_col, col1, col2, col3)
      VALUES (@pkguid, round(rand()*100000,0), getdate(), 'TEST' + CAST(@val AS VARCHAR));
   -- pk_col is omitted here because newsequentialid() can only be used in a
   -- default constraint, so the column default generates the key value
   INSERT INTO testtable_seqguid (col1, col2, col3)
      VALUES (round(rand()*100000,0), getdate(), 'TEST' + CAST(@val AS VARCHAR));
   SELECT @val=@val+1
END
GO
With the data loaded, we now need to find these bad page splits, and luckily for us this information can be pulled from the SQL Server transaction log. Assuming you are using the full recovery model and no log backup has been taken, or you are using the simple recovery model and the VLFs haven't been overwritten, we can use the undocumented function fn_dblog to query the SQL Server transaction log. More details on this function can be found here. Using the query from that link, we can see in the chart below that, as we suspected, we have quite a few more bad page splits on the table with the random GUID primary key. This also explains the large amount of fragmentation we saw in my original tip. Below is the query we can use to inspect the SQL Server transaction log, along with its output for our test tables.
SELECT COUNT(1) AS NumberOfSplits,
       AllocUnitName,
       Context
FROM fn_dblog(NULL,NULL)
WHERE Operation = 'LOP_DELETE_SPLIT'
GROUP BY AllocUnitName, Context
ORDER BY NumberOfSplits DESC
NumberOfSplits | AllocUnitName | Context |
---|---|---|
9044 | dbo.testtable_guid.PK__testtabl__CF31D692BC64152C | LCX_CLUSTERED |
41 | dbo.testtable_seqguid.PK__testtabl__CF31D6922AC4B1C1 | LCX_INDEX_INTERIOR |
38 | dbo.testtable_guid.PK__testtabl__CF31D692BC64152C | LCX_INDEX_INTERIOR |
23 | dbo.testtable_bigint.PK__testtabl__CF31D6927EBF9CD7 | LCX_INDEX_INTERIOR |
16 | dbo.testtable_int.PK__testtabl__CF31D692E6421A60 | LCX_INDEX_INTERIOR |
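To tie these split counts back to the fragmentation observed in the original tip, we can also check the leaf-level fragmentation of each test table. Below is a minimal sketch using sys.dm_db_index_physical_stats; the random GUID table should show a far higher avg_fragmentation_in_percent than the other three.

-- check leaf-level fragmentation of each test table's clustered index
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       ips.avg_fragmentation_in_percent,  -- logical fragmentation of the leaf level
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
WHERE OBJECT_NAME(ips.object_id) LIKE 'testtable%'
ORDER BY ips.avg_fragmentation_in_percent DESC;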
Now that we know how many page splits occurred for each datatype, let's take a look at the effect these page splits have on the amount of space required in the SQL Server transaction log. I ran through the data load script a second time (after truncating the tables), but this time I ran each table load separately in its own loop so I could measure the transaction log usage with DBCC SQLPERF(LOGSPACE). You can see from the results below that the page splits caused by the random GUID column required a lot more transaction log space than the other three column types. It's especially interesting to compare it with the table that used the sequential GUID: both columns require exactly the same amount of storage (16 bytes), yet the table loaded with random GUIDs used much more transaction log space.
Column Type | Transaction Log Usage |
---|---|
GUID | 237 MB |
Integer | 138 MB |
BigInt | 146 MB |
Sequential GUID | 155 MB |
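For reference, here is a minimal sketch of the measurement pattern described above: capture the output of DBCC SQLPERF(LOGSPACE) into a temp table before and after each single-table load and compare the two readings. The temp table columns follow the documented result set of the command.

-- capture a log space reading (run once before and once after each table's load loop)
CREATE TABLE #logspace (
   database_name      sysname,
   log_size_mb        float,
   log_space_used_pct float,
   status             int
);

INSERT INTO #logspace
EXEC ('DBCC SQLPERF(LOGSPACE)');

SELECT database_name, log_size_mb, log_space_used_pct
FROM #logspace
WHERE database_name = DB_NAME();

DROP TABLE #logspace;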
Next Steps
- Read other tips comparing INT and GUID data types.
- Read tips on fill factors to reduce page splits.
- Read more on tracking page splits.