By: Chad Boyd | Updated: 2007-03-02 | Comments (4) | Related: > Identities
Problem
I have a table that makes use of integer based sequential values (identity based or otherwise) and I need to determine what gaps exist within this table. What is the best way to determine these gaps using a set-based approach (i.e. no cursor operations)?
Solution
A variety of simple queries can be used with any version of SQL Server to determine the first-level gaps within a given integer-based column in a table, some of which are more scalable than others. My favorite, and probably the most scalable solution I've found is a morph of a solution created by Umachandar Jayachandran.
Let's setup an example with this code and then begin to issues the queries below.
SELECT a.pkid + 1 as GapValue FROM t1 a WHERE NOT EXISTS( SELECT * FROM t1 b WHERE b.pkid = a.pkid + 1 ) AND a.pkid < (select max(pkid) from t1);
The query above will give you the minimum integer value (first-level gap) for every gap in the pkid column from table t1. The query from above will return the following result:
GapValue ----------- 4 8 14 27 31
This is a great start, however, there is at least 1 fairly problematic limitation with the above: you will never get a result set that contains all ID's in a multiple-value gap range, you will only get the first ID of a gap range (for example, notice in the data that the first gap starts at 4 as output in the result set, however there also is no 5 or 6 in the data, but the results of the query only provide the first value in a gap range). This can pose a significant problem, for example, if you are trying to insert records into the table containing the gaps based of a query from data in another table. You may still end up getting the data in successfully, but you will not have done so by using up all the gaps, or even the smallest values for the id's in question.
To address this issue, with SQL Server 2000 it would be extremely difficult and would require some iterative techniques. However, with SQL Server 2005 and the advent of Recursion and CTE's (recursive common table expressions), you can easily morph the solution from above to provide you what you're looking for, as follows:
declare @i int; SELECT @i = MAX(pkid) FROM t1; WITH tmp (gapId) AS ( SELECT DISTINCT a.pkid + 1 FROM t1 a WHERE NOT EXISTS( SELECT * FROM t1 b WHERE b.pkid = a.pkid + 1) AND a.pkid < @i UNION ALL SELECT a.gapId + 1 FROM tmp a WHERE NOT EXISTS( SELECT * FROM t1 b WHERE b.pkid = a.gapId + 1) AND a.gapId < @i ) SELECT gapId FROM tmp ORDER BY gapId;
Not only will this result in a data-set including ALL the id values in all gap ranges, but is also extremely efficient and scalable when compared to other possible solutions using functions and iterative (i.e. cursor based) solutions:
gapId ----------- 4 5 6 8 14 15 16 17 18 19 20 21 22 23 24 25 27 31 32 33 34
Next Steps
- Be sure to check out the supporting code example to understand the entire process.
- This code is a prime example of new and enhanced approaches to resolving historical SQL Server problems with SQL Server 2005.
- As you begin to work with SQL Server 2005, research the new coding options to expand your knowledge and improve the functionality and performance of your code.
- Check out the following related tips:
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2007-03-02