By: Ben Snaidero | Updated: 2017-10-19 | Comments (2) | Related: > Database Design
Problem
Reading some of the comments from my last tip got me thinking about the performance of table joins on different datatypes. I had always assumed (I know I should know better) that as long as the columns, regardless of whether or not there is a foreign key relationship defined, are indexed that there should be no performance issues based on datatype of the join column chosen to join two tables. In this tip we will compare joining tables using the following datatypes to see if there is in fact any performance difference: INT, BIGINT, GUID (one randomly generated and generated using the newsequentialid() function).
Solution
In order to perform this test, let's setup 4 similar sets of parent/detail tables. The only difference will be that each parent table primary key and its associated detail table join column will use a different column type: one of INT, BIGINT and GUID (one randomly generated and generated using the newsequentialid() function). Below is the T-SQL to generate these tables and load some sample data.
CREATE TABLE parent_int ([pk_col] [int] not null primary key clustered, [col1] [int], [col2] [datetime], [col3] [varchar](20)); CREATE TABLE parent_bigint ([pk_col] [bigint] not null primary key clustered, [col1] [int], [col2] [datetime], [col3] [varchar](20)); CREATE TABLE parent_guid ([pk_col] [uniqueidentifier] not null primary key clustered, [col1] [int], [col2] [datetime], [col3] [varchar](20)); CREATE TABLE parent_seqguid ([pk_col] [uniqueidentifier] default newsequentialid() not null primary key clustered, [col1] [int], [col2] [datetime], [col3] [varchar](20)); CREATE TABLE detail_int ([pk_col] [int] IDENTITY(1,1) not null primary key clustered, [join_col] [int], [col1] [int], [col2] [datetime], [col3] [varchar](20)); CREATE TABLE detail_bigint ([pk_col] [int] IDENTITY(1,1) not null primary key clustered, [join_col] [bigint], [col1] [int], [col2] [datetime], [col3] [varchar](20)); CREATE TABLE detail_guid ([pk_col] [int] IDENTITY(1,1) not null primary key clustered, [join_col] [uniqueidentifier], [col1] [int], [col2] [datetime], [col3] [varchar](20)); CREATE TABLE detail_seqguid ([pk_col] [int] IDENTITY(1,1) not null primary key clustered, [join_col] [uniqueidentifier], [col1] [int], [col2] [datetime], [col3] [varchar](20)); ALTER TABLE dbo.detail_int ADD CONSTRAINT FK_detail_int_parent_int FOREIGN KEY (join_col) REFERENCES dbo.parent_int (pk_col) ON UPDATE NO ACTION ON DELETE NO ACTION; ALTER TABLE dbo.detail_bigint ADD CONSTRAINT FK_detail_bigint_parent_int FOREIGN KEY (join_col) REFERENCES dbo.parent_bigint (pk_col) ON UPDATE NO ACTION ON DELETE NO ACTION; ALTER TABLE dbo.detail_guid ADD CONSTRAINT FK_detail_guid_parent_guid FOREIGN KEY (join_col) REFERENCES dbo.parent_guid (pk_col) ON UPDATE NO ACTION ON DELETE NO ACTION; ALTER TABLE dbo.detail_seqguid ADD CONSTRAINT FK_detail_seqguid_parent_guid FOREIGN KEY (join_col) REFERENCES dbo.parent_seqguid (pk_col) ON UPDATE NO ACTION ON DELETE NO ACTION; CREATE INDEX detail_int_join_col ON dbo.detail_int (join_col); CREATE INDEX detail_bigint_join_col ON dbo.detail_bigint (join_col); CREATE INDEX detail_guid_join_col ON dbo.detail_guid (join_col); CREATE INDEX detail_seqguid_join_col ON dbo.detail_seqguid (join_col); -- data load DECLARE @returnid table (pk_col uniqueidentifier) DECLARE @val int DECLARE @randval int DECLARE @val2 int DECLARE @pkint int DECLARE @pkguid uniqueidentifier DECLARE @pkseqguid uniqueidentifier SELECT @val=1 WHILE @val < 200000 BEGIN INSERT INTO parent_int (pk_col, col1, col2, col3) VALUES (@val,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR)); INSERT INTO parent_bigint (pk_col, col1, col2, col3) VALUES (@val,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR)); SELECT @pkint = @val; SELECT @pkguid = newid(); INSERT INTO parent_guid (pk_col, col1, col2, col3) VALUES (@pkguid,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR)); INSERT INTO parent_seqguid ( col1, col2, col3) output inserted.pk_col into @returnid VALUES (round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR)); SELECT @pkseqguid = r.pk_col from @returnid r SELECT @randval=FLOOR(RAND()*(10-1)+1); SELECT @val2=1 WHILE @val2 <= @randval BEGIN INSERT INTO detail_int (join_col,col1,col2,col3) VALUES (@pkint,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR)); INSERT INTO detail_bigint (join_col,col1,col2,col3) VALUES (@pkint,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR)); INSERT INTO detail_guid (join_col,col1,col2,col3) VALUES (@pkguid,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR)); INSERT INTO detail_seqguid (join_col,col1,col2,col3) VALUES (@pkseqguid,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR)); SELECT @val2=@val2+1 END SELECT @val=@val+1 END GO
As with the last test we did let's run through some SELECT queries both before and after the indexes on the tables have been rebuilt to see if there is any difference in performance with the indexes being more or less fragmented. Below is the state of the indexes after the initial data load.
TableName | IndexName | IndexType | FragmentationPct | PageCount |
---|---|---|---|---|
parent_seqguid | PK__parent_s__CF31D692708D8E83 | CLUSTERED | 0.71 | 1253 |
parent_int | PK__parent_i__CF31D69240B1A39F | CLUSTERED | 0.42 | 952 |
parent_guid | PK__parent_g__CF31D6926ABDEAE2 | CLUSTERED | 99.23 | 1821 |
parent_bigint | PK__parent_b__CF31D6927AAC240B | CLUSTERED | 0.47 | 1052 |
detail_seqguid | detail_seqguid_join_col | NONCLUSTERED | 0.83 | 3220 |
detail_int | detail_int_join_col | NONCLUSTERED | 0.57 | 1733 |
detail_guid | detail_guid_join_col | NONCLUSTERED | 99.38 | 5970 |
detail_bigint | detail_bigint_join_col | NONCLUSTERED | 0.67 | 2231 |
To test the performance we will run two different SELECT statements. One that returns all the records from the parent and the associated records from the detail table. The second query will return a single record from the parent and the associated record from the detail table. Note that when querying a single record from the parent table you need to make sure that the detail table has the same number of records to ensure the test results are not skewed based on the size of the dataset returned. Below is the T-SQL for these statements.
-- all records in parent select * from parent_int p inner join detail_int d on d.join_col=p.pk_col; select * from parent_bigint p inner join detail_bigint d on d.join_col=p.pk_col; select * from parent_guid p inner join detail_guid d on d.join_col=p.pk_col; select * from parent_seqguid p inner join detail_seqguid d on d.join_col=p.pk_col; -- single record in parent select * from parent_int p inner join detail_int d on d.join_col=p.pk_col where p.pk_col=121143; select * from parent_bigint p inner join detail_bigint d on d.join_col=p.pk_col where p.pk_col=121143; select * from parent_guid p inner join detail_guid d on d.join_col=p.pk_col where p.pk_col='A10B3C3C-3ABC-47CB-8DDE-22DC1DF89447'; select * from parent_seqguid p inner join detail_seqguid d on d.join_col=p.pk_col where p.pk_col='E89535AA-27A9-E711-985B-3402863D5BD9';
As always we'll use SQL Profiler to collect the performance statistics. Below are the SQL Profile results from this first test run.
Table/Query | CPU (ms) | Reads | Duration (ms) |
---|---|---|---|
parent_int/Full Join | 1295 | 6242 | 10431 |
parent_bigint/Full Join | 1397 | 6832 | 11523 |
parent_guid/Full Join | 1483 | 8616 | 14734 |
parent_seqguid/Full Join | 2512 | 8208 | 14008 |
parent_int/Single Record | 0 | 27 | 2 |
parent_bigint/Single Record | 0 | 27 | 1 |
parent_guid/Single Record | 0 | 27 | 2 |
parent_seqguid/Single Record | 0 | 27 | 1 |
Looking at these results we see that in order to join the entire table the seqguid case uses quite a bit more CPU than the other 3 cases. Also both GUID cases performed ~20% more IO than the integer test cases. In the test cases where the query was limited to just one record from the parent table the performance was almost identical for all 4 test cases. Even though we have quite a bit of fragmentation with the index on the random GUID column it still performs just as well as the other datatypes in this case.
Now let's rebuild our indexes, verify we don't have any more fragmentation and run through our test cases again. Below is the T-SQL to rebuild the indexes.
ALTER INDEX [PK__parent_g__CF31D6926ABDEAE2] ON [dbo].[parent_guid] REBUILD; ALTER INDEX [PK__parent_s__CF31D692708D8E83] ON [dbo].[parent_seqguid] REBUILD; ALTER INDEX [PK__parent_i__CF31D69240B1A39F] ON [dbo].[parent_int] REBUILD; ALTER INDEX [PK__parent_b__CF31D6927AAC240B] ON [dbo].[parent_bigint] REBUILD; ALTER INDEX [detail_seqguid_join_col] ON [dbo].[detail_seqguid] REBUILD; ALTER INDEX [detail_int_join_col] ON [dbo].[detail_int] REBUILD; ALTER INDEX [detail_guid_join_col] ON [dbo].[detail_guid] REBUILD; ALTER INDEX [detail_bigint_join_col] ON [dbo].[detail_bigint] REBUILD;
As you can see below we no longer have any index fragmentation.
TableName | IndexName | IndexType | FragmentationPct | PageCount |
---|---|---|---|---|
parent_seqguid | PK__parent_s__CF31D692708D8E83 | CLUSTERED | 0.23 | 1254 |
parent_int | PK__parent_i__CF31D69240B1A39F | CLUSTERED | 0.31 | 953 |
parent_guid | PK__parent_g__CF31D6926ABDEAE2 | CLUSTERED | 0.23 | 1252 |
parent_bigint | PK__parent_b__CF31D6927AAC240B | CLUSTERED | 0.28 | 1052 |
detail_seqguid | detail_seqguid_join_col | NONCLUSTERED | 0.09 | 3221 |
detail_int | detail_int_join_col | NONCLUSTERED | 0.17 | 1734 |
detail_guid | detail_guid_join_col | NONCLUSTERED | 0.09 | 3222 |
detail_bigint | detail_bigint_join_col | NONCLUSTERED | 0.13 | 2232 |
Now let's rerun our select queries again and see if there is any difference with performance. Below are the SQL Profiler results from this second test run.
Table/Query | CPU (ms) | Reads | Duration (ms) |
---|---|---|---|
parent_int/Full Join | 1102 | 6243 | 10231 |
parent_bigint/Full Join | 1328 | 6832 | 11275 |
parent_guid/Full Join | 1295 | 8044 | 14101 |
parent_seqguid/Full Join | 2532 | 8203 | 14206 |
parent_int/Single Record | 0 | 27 | 2 |
parent_bigint/Single Record | 0 | 27 | 2 |
parent_guid/Single Record | 0 | 27 | 2 |
parent_seqguid/Single Record | 0 | 27 | 1 |
These results match almost exactly with our first test run. The only difference being, now that the index has been rebuilt, the random GUID datatype is performing fewer reads in the case where we are querying the entire parent table. Although it's not quite as good as either of the integer test cases.
So there we have it. At least for this simple test case, we have demonstrated that there is very little difference with table join performance using these 4 datatypes.
Next Steps
- Stay tuned for additional related tips on SQL Server performance to address common questions.
- Read other tips on performance tuning.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2017-10-19