SQL Table Join Performance for INT vs BIGINT vs GUID Data Types

By: Ben Snaidero | Updated: 2017-10-19 | Comments (2) | Related: > Database Design

Problem

Reading some of the comments from my last tip got me thinking about the performance of table joins on different datatypes. I had always assumed (I know I should know better) that as long as the columns, regardless of whether or not there is a foreign key relationship defined, are indexed that there should be no performance issues based on datatype of the join column chosen to join two tables. In this tip we will compare joining tables using the following datatypes to see if there is in fact any performance difference: INT, BIGINT, GUID (one randomly generated and generated using the newsequentialid() function).

Solution

In order to perform this test, let's setup 4 similar sets of parent/detail tables. The only difference will be that each parent table primary key and its associated detail table join column will use a different column type: one of INT, BIGINT and GUID (one randomly generated and generated using the newsequentialid() function). Below is the T-SQL to generate these tables and load some sample data.

CREATE TABLE parent_int     ([pk_col] [int] not null primary key clustered,
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
CREATE TABLE parent_bigint  ([pk_col] [bigint] not null primary key clustered,
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
CREATE TABLE parent_guid    ([pk_col] [uniqueidentifier] not null primary key clustered,
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
CREATE TABLE parent_seqguid ([pk_col] [uniqueidentifier] default newsequentialid() not null primary key clustered,
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20)); 
CREATE TABLE detail_int     ([pk_col] [int] IDENTITY(1,1) not null primary key clustered,
                             [join_col] [int],
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
CREATE TABLE detail_bigint  ([pk_col] [int] IDENTITY(1,1) not null primary key clustered,
                             [join_col] [bigint],
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
CREATE TABLE detail_guid    ([pk_col] [int] IDENTITY(1,1) not null primary key clustered,
                             [join_col] [uniqueidentifier],
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
CREATE TABLE detail_seqguid ([pk_col] [int] IDENTITY(1,1) not null primary key clustered,
                             [join_col] [uniqueidentifier],
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));

ALTER TABLE dbo.detail_int ADD CONSTRAINT FK_detail_int_parent_int FOREIGN KEY
 (join_col) REFERENCES dbo.parent_int (pk_col) 
 ON UPDATE  NO ACTION ON DELETE  NO ACTION;

ALTER TABLE dbo.detail_bigint ADD CONSTRAINT FK_detail_bigint_parent_int FOREIGN KEY
 (join_col) REFERENCES dbo.parent_bigint (pk_col) 
 ON UPDATE  NO ACTION ON DELETE  NO ACTION;

ALTER TABLE dbo.detail_guid ADD CONSTRAINT FK_detail_guid_parent_guid FOREIGN KEY
 (join_col) REFERENCES dbo.parent_guid (pk_col) 
 ON UPDATE  NO ACTION ON DELETE  NO ACTION;

ALTER TABLE dbo.detail_seqguid ADD CONSTRAINT FK_detail_seqguid_parent_guid FOREIGN KEY
 (join_col) REFERENCES dbo.parent_seqguid (pk_col) 
 ON UPDATE  NO ACTION ON DELETE  NO ACTION;

CREATE INDEX detail_int_join_col ON dbo.detail_int (join_col);
CREATE INDEX detail_bigint_join_col ON dbo.detail_bigint (join_col);
CREATE INDEX detail_guid_join_col ON dbo.detail_guid (join_col);
CREATE INDEX detail_seqguid_join_col ON dbo.detail_seqguid (join_col);

-- data load
DECLARE @returnid table (pk_col uniqueidentifier)
DECLARE @val int
DECLARE @randval int
DECLARE @val2 int
DECLARE @pkint int
DECLARE @pkguid uniqueidentifier
DECLARE @pkseqguid uniqueidentifier
SELECT @val=1
WHILE @val < 200000
BEGIN  
   INSERT INTO parent_int (pk_col, col1, col2,  col3) 
      VALUES (@val,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
   INSERT INTO parent_bigint (pk_col, col1, col2,  col3) 
      VALUES (@val,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
   SELECT @pkint = @val;
   SELECT @pkguid = newid();
   INSERT INTO parent_guid (pk_col, col1, col2,  col3) 
     VALUES (@pkguid,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
   INSERT INTO parent_seqguid ( col1, col2,  col3)
     output inserted.pk_col into @returnid 
     VALUES (round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
   SELECT @pkseqguid = r.pk_col from @returnid r
   SELECT @randval=FLOOR(RAND()*(10-1)+1);
   SELECT @val2=1
   WHILE @val2 <= @randval
   BEGIN
      INSERT INTO detail_int (join_col,col1,col2,col3)
         VALUES (@pkint,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
      INSERT INTO detail_bigint (join_col,col1,col2,col3)
         VALUES (@pkint,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
      INSERT INTO detail_guid (join_col,col1,col2,col3)
         VALUES (@pkguid,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
      INSERT INTO detail_seqguid (join_col,col1,col2,col3)
         VALUES (@pkseqguid,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
      SELECT @val2=@val2+1
   END
   
   SELECT @val=@val+1
END
GO

As with the last test we did let's run through some SELECT queries both before and after the indexes on the tables have been rebuilt to see if there is any difference in performance with the indexes being more or less fragmented. Below is the state of the indexes after the initial data load.

TableName	IndexName	IndexType	FragmentationPct	PageCount
parent_seqguid	PK__parent_s__CF31D692708D8E83	CLUSTERED	0.71	1253
parent_int	PK__parent_i__CF31D69240B1A39F	CLUSTERED	0.42	952
parent_guid	PK__parent_g__CF31D6926ABDEAE2	CLUSTERED	99.23	1821
parent_bigint	PK__parent_b__CF31D6927AAC240B	CLUSTERED	0.47	1052
detail_seqguid	detail_seqguid_join_col	NONCLUSTERED	0.83	3220
detail_int	detail_int_join_col	NONCLUSTERED	0.57	1733
detail_guid	detail_guid_join_col	NONCLUSTERED	99.38	5970
detail_bigint	detail_bigint_join_col	NONCLUSTERED	0.67	2231

To test the performance we will run two different SELECT statements. One that returns all the records from the parent and the associated records from the detail table. The second query will return a single record from the parent and the associated record from the detail table. Note that when querying a single record from the parent table you need to make sure that the detail table has the same number of records to ensure the test results are not skewed based on the size of the dataset returned. Below is the T-SQL for these statements.

-- all records in parent
select * from parent_int p inner join detail_int d on d.join_col=p.pk_col;
select * from parent_bigint p inner join detail_bigint d on d.join_col=p.pk_col;
select * from parent_guid p inner join detail_guid d on d.join_col=p.pk_col;
select * from parent_seqguid p inner join detail_seqguid d on d.join_col=p.pk_col;
  
-- single record in parent
select * from parent_int p inner join detail_int d on d.join_col=p.pk_col 
  where p.pk_col=121143;
select * from parent_bigint p inner join detail_bigint d on d.join_col=p.pk_col 
  where p.pk_col=121143;
select * from parent_guid p inner join detail_guid d on d.join_col=p.pk_col 
  where p.pk_col='A10B3C3C-3ABC-47CB-8DDE-22DC1DF89447';
select * from parent_seqguid p inner join detail_seqguid d on d.join_col=p.pk_col 
  where p.pk_col='E89535AA-27A9-E711-985B-3402863D5BD9';

As always we'll use SQL Profiler to collect the performance statistics. Below are the SQL Profile results from this first test run.

Table/Query	CPU (ms)	Reads	Duration (ms)
parent_int/Full Join	1295	6242	10431
parent_bigint/Full Join	1397	6832	11523
parent_guid/Full Join	1483	8616	14734
parent_seqguid/Full Join	2512	8208	14008
parent_int/Single Record	0	27	2
parent_bigint/Single Record	0	27	1
parent_guid/Single Record	0	27	2
parent_seqguid/Single Record	0	27	1

Looking at these results we see that in order to join the entire table the seqguid case uses quite a bit more CPU than the other 3 cases. Also both GUID cases performed ~20% more IO than the integer test cases. In the test cases where the query was limited to just one record from the parent table the performance was almost identical for all 4 test cases. Even though we have quite a bit of fragmentation with the index on the random GUID column it still performs just as well as the other datatypes in this case.

Now let's rebuild our indexes, verify we don't have any more fragmentation and run through our test cases again. Below is the T-SQL to rebuild the indexes.

ALTER INDEX [PK__parent_g__CF31D6926ABDEAE2] ON [dbo].[parent_guid] REBUILD;
ALTER INDEX [PK__parent_s__CF31D692708D8E83] ON [dbo].[parent_seqguid] REBUILD;
ALTER INDEX [PK__parent_i__CF31D69240B1A39F] ON [dbo].[parent_int] REBUILD;
ALTER INDEX [PK__parent_b__CF31D6927AAC240B] ON [dbo].[parent_bigint] REBUILD;
ALTER INDEX [detail_seqguid_join_col] ON [dbo].[detail_seqguid] REBUILD;
ALTER INDEX [detail_int_join_col] ON [dbo].[detail_int] REBUILD;
ALTER INDEX [detail_guid_join_col] ON [dbo].[detail_guid] REBUILD;
ALTER INDEX [detail_bigint_join_col] ON [dbo].[detail_bigint] REBUILD;

As you can see below we no longer have any index fragmentation.

TableName	IndexName	IndexType	FragmentationPct	PageCount
parent_seqguid	PK__parent_s__CF31D692708D8E83	CLUSTERED	0.23	1254
parent_int	PK__parent_i__CF31D69240B1A39F	CLUSTERED	0.31	953
parent_guid	PK__parent_g__CF31D6926ABDEAE2	CLUSTERED	0.23	1252
parent_bigint	PK__parent_b__CF31D6927AAC240B	CLUSTERED	0.28	1052
detail_seqguid	detail_seqguid_join_col	NONCLUSTERED	0.09	3221
detail_int	detail_int_join_col	NONCLUSTERED	0.17	1734
detail_guid	detail_guid_join_col	NONCLUSTERED	0.09	3222
detail_bigint	detail_bigint_join_col	NONCLUSTERED	0.13	2232

Now let's rerun our select queries again and see if there is any difference with performance. Below are the SQL Profiler results from this second test run.

Table/Query	CPU (ms)	Reads	Duration (ms)
parent_int/Full Join	1102	6243	10231
parent_bigint/Full Join	1328	6832	11275
parent_guid/Full Join	1295	8044	14101
parent_seqguid/Full Join	2532	8203	14206
parent_int/Single Record	0	27	2
parent_bigint/Single Record	0	27	2
parent_guid/Single Record	0	27	2
parent_seqguid/Single Record	0	27	1

These results match almost exactly with our first test run. The only difference being, now that the index has been rebuilt, the random GUID datatype is performing fewer reads in the case where we are querying the entire parent table. Although it's not quite as good as either of the integer test cases.

So there we have it. At least for this simple test case, we have demonstrated that there is very little difference with table join performance using these 4 datatypes.

Next Steps

Stay tuned for additional related tips on SQL Server performance to address common questions.
Read other tips on performance tuning.

About the author

Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on performance tuning.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Article Last Updated: 2017-10-19

Compare SQL Server Table Join Performance for INT vs BIGINT vs GUID Data Types

Problem

Solution

Next Steps

About the author

Comments For This Article