By: Ahmad Yaseen | Updated: 2017-06-29 | Comments (4) | Related: > Performance Tuning
Problem
Most of the time when we write a T-SQL query to retrieve data from a specific table, we are interested in retrieving a specific sorted range and number of records rather than retrieving all the records. The range of data can be managed using the WHERE clause, the sort of the data is managed by the ORDER BY clause and the retrieved number of records can be managed by the TOP or ROW_NUMBER statements. Which one is better, using the TOP or ROW_NUMBER statements to limit the number of retrieved records?
Solution
The TOP T-SQL statement is used to limit the number of rows returned in the query result set to a specific number. If the TOP statement is used with the ORDER BY clause, the returned result will be limited to the first ordered N records; otherwise, it will return the first N records with no order.
On the other hand, the ROW_NUMBER statement returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
It is not a difficult task to return the same query results written with the TOP statement vs. the ROW_NUMBER statement. However, is it better to limit the number of records returned from the query using the TOP or ROW_NUMBER statements?
For each of the runs below, I enabled the TIME and IO statistics and the Actual Execution Plan to compare the queries performance. This is the command to turn on the stattsics for IO and TIME.
SET STATISTICS IO ON SET STATISTICS TIME ON
Test 1 - Return 10 Rows
Let us run the below two queries that return the first 10 records from the Product table in two ways, the first using the TOP statement and the second using the ROW_NUMBER statement. I enabled the TIME and IO statistics and the Actual Execution Plan to compare the queries performance:
SELECT TOP 10 [ProductID] ,[Name] ,[ProductNumber] FROM [MSSQLTipsDemo].[Production].[Product] ORDER BY ProductID OPTION (RECOMPILE) GO SELECT PP.[ProductRank],PP.ProductID , PP.[Name] ,PP.[ProductNumber] FROM ( SELECT ROW_NUMBER() OVER (ORDER BY [ProductID]) AS [ProductRank] ,ProductID ,[Name] ,[ProductNumber] FROM [MSSQLTipsDemo].[Production].[Product] ) AS PP WHERE PP.[ProductRank] <=10 ORDER BY [ProductRank] OPTION (RECOMPILE)
You can see the below results are the same, as they are using the ORDER BY clause:
Comparing the execution plans for both queries, you will find that the query using the TOP statement (37%) has less weight compared to the query using the ROW_NUMBER statement (63%) relative to the batch.
The IO statistics show that the query using the TOP statement performs less logical reads than the query using the ROW_NUMBER statement:
The TIME statistics show the query using the TOP statement executed faster than the query using the ROW_NUMBER statement:
Comparing the Actual and Estimated number of rows returned from the Clustered Index Scan operator in the two queries, both actual and estimated number of rows for the query using the TOP statement are equal, which is not the case for the querying using the ROW_NUMBER statement where there are different actual and estimated values. Note this difference as we do more tests.
Test 2 - Return 100 Rows
Let us increase the number of records returned from the previous queries to be 100 records:
SELECT TOP 100 [ProductID] ,[Name] ,[ProductNumber] FROM [MSSQLTipsDemo].[Production].[Product] ORDER BY ProductID OPTION (RECOMPILE) GO SELECT PP.[ProductRank],PP.ProductID , PP.[Name] ,PP.[ProductNumber] FROM ( SELECT ROW_NUMBER() OVER (ORDER BY [ProductID]) AS [ProductRank] ,ProductID ,[Name] ,[ProductNumber] FROM [MSSQLTipsDemo].[Production].[Product] ) AS PP WHERE PP.[ProductRank] <=100 ORDER BY [ProductRank] OPTION (RECOMPILE)
Comparing the execution plans for both queries again, you will see that the two queries have the same weight:
Moreover, both queries perform the same number of logical reads.
The query using the TOP statement executed faster than the query using the ROW_NUMBER statement this time too.
Comparing the Actual and Estimated number of rows returned from the Clustered Index Scan operator of the two queries, both actual and estimated number of rows for the both queries are the same:
Test 3 - Return 1000 Rows
Increasing the number of records returned from the previous queries again to be 1000 records this time:
SELECT TOP 1000 [ProductID] ,[Name] ,[ProductNumber] FROM [MSSQLTipsDemo].[Production].[Product] ORDER BY ProductID OPTION (RECOMPILE) GO SELECT PP.[ProductRank],PP.ProductID , PP.[Name] ,PP.[ProductNumber] FROM ( SELECT ROW_NUMBER() OVER (ORDER BY [ProductID]) AS [ProductRank] ,ProductID ,[Name] ,[ProductNumber] FROM [MSSQLTipsDemo].[Production].[Product] ) AS PP WHERE PP.[ProductRank] <=1000 ORDER BY [ProductRank] OPTION (RECOMPILE)
Comparing the execution plans for both queries, you will find that the query using the ROW_NUMBER statement has less weight than the query using the TOP statement this time:
In addition, both queries perform the same number of logical reads:
The query with the TOP statement again executed faster than the query with the ROW_NUMBER.
The actual and estimated number of rows for the query using the TOP statement are equal, but the numbers are different for the ROW_NUMBER statement.
Conclusion
Execution Plan
The execution plan comparison gave us a different indication each time with how long the queries took relative to the batch. First TOP was faster, than it was slower, but the overall execution plans were the same will all of the tests. Therefore we cannot build our conclusion using just the execution plan comparison. It seems we are comparing the estimated costs for these queries, based on the estimated number of rows returned from the queries, not the actual number of rows. This is clear from the difference between the actual number and estimated number of records shown in the Clustered Index Scan operator for the query that used the ROW_NUMBER statement. Therefore, we cannot trust the costs that are calculated based on the estimated values to compare the queries performance.
IO Stats
Moving to the IO statistics gathered from the previous tests, there was a small difference in the number of logical reads between the query using the TOP statement versus the ROW_NUMBER statement for the first test, but as the result sets got bigger the logical reads were equal.
Time Stats
The Time statistics are clear in all tests. The time required to execute the query using the TOP statement was always less than the ROW_NUMBER statement. This shows for our example, that the TOP statement was always faster than the ROW_NUMBER statement.
Next Steps
- This is just a start to get an idea of how things can act differently. It is better to use more than one performance metric to get the real performance difference between two queries, without basing your decision on one set of tests.
- Take the time to write queries in multiple ways and capture the stats as we did above to see what works the best and implement the fastest solution.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2017-06-29