SQL Server Performance Impact of Querying Unnecessary Columns

By:   |   Updated: 2017-12-06   |   Comments (6)   |   Related: > Performance Tuning


Problem

Performing unnecessary processing during query execution can sometimes prove to be one of the bigger culprits when it comes to the overall performance of your SQL Server database instance.  Far too many times I've seen queries that do "SELECT *" when they really only need one or two columns from the result set.  Sometimes there is a column list defined in the query but within the code only a few of the columns are actually used.  This tip will use a fairly extreme example to show how querying only the columns you actually require can have a positive impact on SQL Server performance.

Solution

If you would like to follow along yourself or do further testing, the schema used for the demonstration below is from the AdventureWorks2014 database which can be downloaded from the following link.  AdventureWorks2014 download.  The second set of test queries in the demonstration use an enlarged version of the SalesOrderDetail table which can be created by using the script found here.  Once you have the schema created you can also add a fairly wide Notes column to both the Employee and SalesOrderDetailEnlarged tables and populate them both with some data.  Below is the TSQL to add these two columns and populate them.

ALTER TABLE HumanResources.Employee ADD Notes VARCHAR(max);
UPDATE HumanResources.Employee SET Notes=REPLICATE('x',30000);
ALTER TABLE [Sales].[SalesOrderDetailEnlarged] ADD Notes VARCHAR(max); 
UPDATE [Sales].[SalesOrderDetailEnlarged] SET Notes=REPLICATE('x',30000);

SQL Server Performance Test Case 1

Now that we have our schema created let’s look at our first example.  As mentioned in the problem statement we are going to run two different versions of this query.  One that returns all the columns from the table and one that leaves out this new large column we have just added.  For the first test we are going to look at a relatively small table and see the effect that returning this large column data has on the query performance.  Here are the queries for this test.

select * from HumanResources.Employee order by HireDate desc
GO
select [BusinessEntityID],[NationalIDNumber],[LoginID],[OrganizationNode],[OrganizationLevel]
      ,[JobTitle],[BirthDate],[MaritalStatus],[Gender],[HireDate],[SalariedFlag],[VacationHours]
      ,[SickLeaveHours],[CurrentFlag],[rowguid],[ModifiedDate] from HumanResources.Employee
order by HireDate desc
GO
   

When running these queries in order to gather some statistics you will need to select the "Include Client Statistics" and "Include Actual Execution Plan" buttons on the toolbar in SQL Server Management Studio.  Once that is done we can take a look at the results.  Below is a subset of the client statistics and as you can see the query that omitted the large column from the result set ran 20ms faster.  Most of this can probably be attributed to the fact that it returned 2332268 fewer bytes.

  No Large Column All Columns
Query Profile Statistics   
  Rows returned by SELECT statements 292 292
Network Statistics       
  Number of server roundtrips 4 4
  TDS packets sent from client 4 4
  TDS packets received from server 26 595
  Bytes sent from client 956 478
  Bytes received from server 92,195 2,424,463
Time Statistics      
  Client processing time 45 66
  Total execution time 58 78
  Wait time on server replies 11 12

If we dig a little further and look at the explain plan for the above queries we can confirm that it is definitely the row size that is different between the two result sets.  Looking at the first query with all the columns we see that the "Estimated Row Size" is 4854 bytes.

query plan

Looking at the second query without the large Notes columns we see that the "Estimated Row Size" is only 828 bytes.

query plan

EEven for a small table like this we see a performance benefit in just returning the data that we need.  Let's now take a look at a large table.

SQL Server Performance Test Case 2

For this example we will use the SalesOrderDetailEnlarged table we created with the script linked above with the following two queries.  Just as with the first test case in the second query we will leave our large Notes column out of the result set.

select * from [Sales].[SalesOrderDetailEnlarged] order by [ModifiedDate] desc
GO
select [SalesOrderID],[SalesOrderDetailID],[CarrierTrackingNumber],[OrderQty],[ProductID]
      ,[SpecialOfferID],[UnitPrice],[UnitPriceDiscount],[LineTotal],[rowguid],[ModifiedDate]
  FROM [Sales].[SalesOrderDetailEnlarged]
order by [ModifiedDate] desc
GO
   

As we did in the first test case let’s first take a look at the client statistics for these queries.  As you can see when querying the table with the extra data from the large Notes column we have a duration of almost 20 minutes compared to only about 20 seconds when this column is left out of the result set.

  No Large Column All Columns
Query Profile Statistics       
  Rows returned by SELECT statements 4,852,682 4,852,682
Network Statistics       
  Number of server roundtrips 4 4
  TDS packets sent from client 4 4
  TDS packets received from server 95,209 9,619,945
  Bytes sent from client 848 490
  Bytes received from server 389,963,000 39,403,270,000
Time Statistics  
  Client processing time 17,768 120,0682
  Total execution time 21,281 120,7548
  Wait time on server replies 3,513 6,866

Again digging deeper and looking at the explain plan let’s see if it was just returning the extra data that caused the longer run time or if there were other factors.  The explain plan for each query is below and as it turns out there was another factor that contributed to the long execution time.  If you look at the sort task in the both explain plans we see that the sort actually spilled to disk due to the large amount of data.  This along with the larger "Estimated Row Size", 4121 bytes for this query, would have a negative impact on the query performance and explains why we had much poorer performance compared with the second query.

query plan
sql server performance impact query unnecessary columns 004

Looking at the second query although we still see the sort spilling to disk it had to write much fewer pages to the tempdb, 16053 compared with 96600 for the query with all columns.  Also for this query the "Estimated Row Size" is only 95 bytes, much less data needed to be returned to the calling application.

query plan
query plan

Summary

The two simple examples above show how you must be diligent is ensuring that only the required data is queried and returned to the application.  Querying unnecessary data not only leads to poor performance for the person initiating the call, but can also effect other processes with all the extra processing/CPU, network traffic and database IO that is needed to execute the query.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Ben Snaidero Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on performance tuning.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2017-12-06

Comments For This Article




Wednesday, August 22, 2018 - 5:30:38 PM - mohammad jouhari Back To Top (77274)

 Thanks man for the effort and the explanition. For me I had to clear the cache before the execustion of each query. Without that I managed to have Total execustion time in select * less than the specefied colums.


Wednesday, December 6, 2017 - 4:06:02 PM - Jimbo99 Back To Top (73695)

Just me, but there is always one person that needs to drill down on an issue and nail it down to a responsible party that requires the other data columns/fields. I mean anytime you need a sub-report from the dataset queried ? For example, counts of membership by physician. The next logical question from the query, who comprises those members, what are their demographics ? When was their last office visit, and when is their next scheduled appointment ?

 


Wednesday, December 6, 2017 - 1:12:25 PM - Ben Snaidero Back To Top (73685)

@Jetus - The first script to add/update the columns has been updated.  Thanks for noticing my typo

@Keon Verbeeck - This example was a fairly extreme example just to illustrate the point.  It can have other effects though, not just throughput.  As you can see in the explain plan where the sort spilled to disk.  In this case there was quite a bit more tempdb space used 

Thanks for reading.


Wednesday, December 6, 2017 - 10:13:52 AM - Koen Verbeeck Back To Top (73676)

 

Expanding on my previous comment: selecting less columns has of course an impact on the data throughput (less data has to be transferred to the client app), but the SQL query itself should not be impacted that much by omitting columns.


Wednesday, December 6, 2017 - 9:23:02 AM - Koen Verbeeck Back To Top (73671)

I'm under the impression that it doesn't matter how many columns you select in your SELECT statement, because SELECT is one of the last operators to be executed in a query plan. When reading the data, SQL Server has to scan the data pages anyway, so it doesn't really matter if you skip columns or not (at least not for read performance). It only makes a difference if you have columnstore indexes, because they have the ability to skip columns. Regular rowstore indexes (or heaps) do not.

So why are you seeing such a big performance improvement? My guess is that's caused because your test column "Notes" is a blob column, and is actually stored outside the table. Hence, regular page scans don't include the column, while if you explicitly include the column (or use SELECT *), SQL Server has to do all the extra work of including the blob data.


Wednesday, December 6, 2017 - 9:02:21 AM - Jetus Back To Top (73670)

 

Ben, you added column [Notes] twice into the [Sales].[SalesOrderDetailEnlarged] table instead of [HumanResources.Employee] in your very first SQL code.















get free sql tips
agree to terms