Compare SQL Server Table Join Performance for INT vs BIGINT vs GUID Data Types

By:   |   Updated: 2017-10-19   |   Comments (2)   |   Related: > Database Design


Problem

Reading some of the comments from my last tip got me thinking about the performance of table joins on different datatypes.  I had always assumed (I know I should know better) that as long as the columns, regardless of whether or not there is a foreign key relationship defined, are indexed that there should be no performance issues based on datatype of the join column chosen to join two tables.  In this tip we will compare joining tables using the following datatypes to see if there is in fact any performance difference: INT, BIGINT, GUID (one randomly generated and generated using the newsequentialid() function).

Solution

In order to perform this test, let's setup 4 similar sets of parent/detail tables.  The only difference will be that each parent table primary key and its associated detail table join column will use a different column type: one of INT, BIGINT and GUID (one randomly generated and generated using the newsequentialid() function).  Below is the T-SQL to generate these tables and load some sample data.

CREATE TABLE parent_int     ([pk_col] [int] not null primary key clustered,
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
CREATE TABLE parent_bigint  ([pk_col] [bigint] not null primary key clustered,
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
CREATE TABLE parent_guid    ([pk_col] [uniqueidentifier] not null primary key clustered,
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
CREATE TABLE parent_seqguid ([pk_col] [uniqueidentifier] default newsequentialid() not null primary key clustered,
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20)); 
CREATE TABLE detail_int     ([pk_col] [int] IDENTITY(1,1) not null primary key clustered,
                             [join_col] [int],
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
CREATE TABLE detail_bigint  ([pk_col] [int] IDENTITY(1,1) not null primary key clustered,
                             [join_col] [bigint],
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
CREATE TABLE detail_guid    ([pk_col] [int] IDENTITY(1,1) not null primary key clustered,
                             [join_col] [uniqueidentifier],
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
CREATE TABLE detail_seqguid ([pk_col] [int] IDENTITY(1,1) not null primary key clustered,
                             [join_col] [uniqueidentifier],
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));

ALTER TABLE dbo.detail_int ADD CONSTRAINT FK_detail_int_parent_int FOREIGN KEY
 (join_col) REFERENCES dbo.parent_int (pk_col) 
 ON UPDATE  NO ACTION ON DELETE  NO ACTION;

ALTER TABLE dbo.detail_bigint ADD CONSTRAINT FK_detail_bigint_parent_int FOREIGN KEY
 (join_col) REFERENCES dbo.parent_bigint (pk_col) 
 ON UPDATE  NO ACTION ON DELETE  NO ACTION;

ALTER TABLE dbo.detail_guid ADD CONSTRAINT FK_detail_guid_parent_guid FOREIGN KEY
 (join_col) REFERENCES dbo.parent_guid (pk_col) 
 ON UPDATE  NO ACTION ON DELETE  NO ACTION;

ALTER TABLE dbo.detail_seqguid ADD CONSTRAINT FK_detail_seqguid_parent_guid FOREIGN KEY
 (join_col) REFERENCES dbo.parent_seqguid (pk_col) 
 ON UPDATE  NO ACTION ON DELETE  NO ACTION;

CREATE INDEX detail_int_join_col ON dbo.detail_int (join_col);
CREATE INDEX detail_bigint_join_col ON dbo.detail_bigint (join_col);
CREATE INDEX detail_guid_join_col ON dbo.detail_guid (join_col);
CREATE INDEX detail_seqguid_join_col ON dbo.detail_seqguid (join_col);

-- data load
DECLARE @returnid table (pk_col uniqueidentifier)
DECLARE @val int
DECLARE @randval int
DECLARE @val2 int
DECLARE @pkint int
DECLARE @pkguid uniqueidentifier
DECLARE @pkseqguid uniqueidentifier
SELECT @val=1
WHILE @val < 200000
BEGIN  
   INSERT INTO parent_int (pk_col, col1, col2,  col3) 
      VALUES (@val,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
   INSERT INTO parent_bigint (pk_col, col1, col2,  col3) 
      VALUES (@val,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
   SELECT @pkint = @val;
   SELECT @pkguid = newid();
   INSERT INTO parent_guid (pk_col, col1, col2,  col3) 
     VALUES (@pkguid,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
   INSERT INTO parent_seqguid ( col1, col2,  col3)
     output inserted.pk_col into @returnid 
     VALUES (round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
   SELECT @pkseqguid = r.pk_col from @returnid r
   SELECT @randval=FLOOR(RAND()*(10-1)+1);
   SELECT @val2=1
   WHILE @val2 <= @randval
   BEGIN
      INSERT INTO detail_int (join_col,col1,col2,col3)
         VALUES (@pkint,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
      INSERT INTO detail_bigint (join_col,col1,col2,col3)
         VALUES (@pkint,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
      INSERT INTO detail_guid (join_col,col1,col2,col3)
         VALUES (@pkguid,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
      INSERT INTO detail_seqguid (join_col,col1,col2,col3)
         VALUES (@pkseqguid,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
      SELECT @val2=@val2+1
   END
   
   SELECT @val=@val+1
END
GO                        

As with the last test we did let's run through some SELECT queries both before and after the indexes on the tables have been rebuilt to see if there is any difference in performance with the indexes being more or less fragmented.  Below is the state of the indexes after the initial data load.

TableName IndexName IndexType FragmentationPct PageCount
parent_seqguid PK__parent_s__CF31D692708D8E83 CLUSTERED 0.71 1253
parent_int PK__parent_i__CF31D69240B1A39F CLUSTERED 0.42 952
parent_guid PK__parent_g__CF31D6926ABDEAE2 CLUSTERED 99.23 1821
parent_bigint PK__parent_b__CF31D6927AAC240B CLUSTERED 0.47 1052
detail_seqguid detail_seqguid_join_col NONCLUSTERED 0.83 3220
detail_int detail_int_join_col NONCLUSTERED 0.57 1733
detail_guid detail_guid_join_col NONCLUSTERED 99.38 5970
detail_bigint detail_bigint_join_col NONCLUSTERED 0.67 2231

To test the performance we will run two different SELECT statements.  One that returns all the records from the parent and the associated records from the detail table.  The second query will return a single record from the parent and the associated record from the detail table.  Note that when querying a single record from the parent table you need to make sure that the detail table has the same number of records to ensure the test results are not skewed based on the size of the dataset returned.  Below is the T-SQL for these statements.

-- all records in parent
select * from parent_int p inner join detail_int d on d.join_col=p.pk_col;
select * from parent_bigint p inner join detail_bigint d on d.join_col=p.pk_col;
select * from parent_guid p inner join detail_guid d on d.join_col=p.pk_col;
select * from parent_seqguid p inner join detail_seqguid d on d.join_col=p.pk_col;
  
-- single record in parent
select * from parent_int p inner join detail_int d on d.join_col=p.pk_col 
  where p.pk_col=121143;
select * from parent_bigint p inner join detail_bigint d on d.join_col=p.pk_col 
  where p.pk_col=121143;
select * from parent_guid p inner join detail_guid d on d.join_col=p.pk_col 
  where p.pk_col='A10B3C3C-3ABC-47CB-8DDE-22DC1DF89447';
select * from parent_seqguid p inner join detail_seqguid d on d.join_col=p.pk_col 
  where p.pk_col='E89535AA-27A9-E711-985B-3402863D5BD9';

As always we'll use SQL Profiler to collect the performance statistics.  Below are the SQL Profile results from this first test run.

Table/Query CPU (ms) Reads Duration (ms)
parent_int/Full Join 1295 6242 10431
parent_bigint/Full Join 1397 6832 11523
parent_guid/Full Join 1483 8616 14734
parent_seqguid/Full Join 2512 8208 14008
parent_int/Single Record 0 27 2
parent_bigint/Single Record 0 27 1
parent_guid/Single Record 0 27 2
parent_seqguid/Single Record 0 27 1

Looking at these results we see that in order to join the entire table the seqguid case uses quite a bit more CPU than the other 3 cases.  Also both GUID cases performed ~20% more IO than the integer test cases.  In the test cases where the query was limited to just one record from the parent table the performance was almost identical for all 4 test cases.  Even though we have quite a bit of fragmentation with the index on the random GUID column it still performs just as well as the other datatypes in this case.

Now let's rebuild our indexes, verify we don't have any more fragmentation and run through our test cases again.  Below is the T-SQL to rebuild the indexes.

ALTER INDEX [PK__parent_g__CF31D6926ABDEAE2] ON [dbo].[parent_guid] REBUILD;
ALTER INDEX [PK__parent_s__CF31D692708D8E83] ON [dbo].[parent_seqguid] REBUILD;
ALTER INDEX [PK__parent_i__CF31D69240B1A39F] ON [dbo].[parent_int] REBUILD;
ALTER INDEX [PK__parent_b__CF31D6927AAC240B] ON [dbo].[parent_bigint] REBUILD;
ALTER INDEX [detail_seqguid_join_col] ON [dbo].[detail_seqguid] REBUILD;
ALTER INDEX [detail_int_join_col] ON [dbo].[detail_int] REBUILD;
ALTER INDEX [detail_guid_join_col] ON [dbo].[detail_guid] REBUILD;
ALTER INDEX [detail_bigint_join_col] ON [dbo].[detail_bigint] REBUILD;

As you can see below we no longer have any index fragmentation.

TableName IndexName IndexType FragmentationPct PageCount
parent_seqguid PK__parent_s__CF31D692708D8E83 CLUSTERED 0.23 1254
parent_int PK__parent_i__CF31D69240B1A39F CLUSTERED 0.31 953
parent_guid PK__parent_g__CF31D6926ABDEAE2 CLUSTERED 0.23 1252
parent_bigint PK__parent_b__CF31D6927AAC240B CLUSTERED 0.28 1052
detail_seqguid detail_seqguid_join_col NONCLUSTERED 0.09 3221
detail_int detail_int_join_col NONCLUSTERED 0.17 1734
detail_guid detail_guid_join_col NONCLUSTERED 0.09 3222
detail_bigint detail_bigint_join_col NONCLUSTERED 0.13 2232

Now let's rerun our select queries again and see if there is any difference with performance.  Below are the SQL Profiler results from this second test run.

Table/Query CPU (ms) Reads Duration (ms)
parent_int/Full Join 1102 6243 10231
parent_bigint/Full Join 1328 6832 11275
parent_guid/Full Join 1295 8044 14101
parent_seqguid/Full Join 2532 8203 14206
parent_int/Single Record 0 27 2
parent_bigint/Single Record 0 27 2
parent_guid/Single Record 0 27 2
parent_seqguid/Single Record 0 27 1

These results match almost exactly with our first test run.  The only difference being, now that the index has been rebuilt, the random GUID datatype is performing fewer reads in the case where we are querying the entire parent table.  Although it's not quite as good as either of the integer test cases.

So there we have it.  At least for this simple test case, we have demonstrated that there is very little difference with table join performance using these 4 datatypes.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Ben Snaidero Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on performance tuning.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2017-10-19

Comments For This Article




Thursday, October 19, 2017 - 2:08:43 PM - Ben Snaidero Back To Top (68547)

@Gregory Surplus

That was actually a typo.  The table has been updated so it should make sense now.  Thanks for noticing and reading :)

 


Thursday, October 19, 2017 - 11:15:05 AM - Gregory Surplus Back To Top (68545)

 I'm a little confused here.  I read the following:

[quote]

 Looking at these results we see that in order to join the entire table the seqguid case uses quite a bit more CPU than the other 3 cases.

[/quote]

But the table above is showing the BigInt join with the highest CPU usage.  2532.  Am I missing something?

 















get free sql tips
agree to terms