By: Ben Snaidero | Updated: 2012-07-10 | Comments (11) | Related: > Performance Tuning
Problem
Everyone knows it's best not to use function calls in your WHERE clause as it will affect your performance (if you did not know this already, this article will convince you not to), but what about using them say in the SELECT list. This tip will look at how removing function calls from the SELECT list can dramatically improve performance, especially when returning larger data sets i.e. search query results.
Solution
Sample SQL Server Table and Functions
For this example we will setup two sample tables and two functions that access these tables. As for populating the tables you will have to use a tool like Visual Studio to populate them with some reasonably realistic data so as not to skew the results. In my case I populated each table with about 200,000 records. One thing to note with these examples is that there is a requirement for the functions to only return one record per sale, even if there is actually multiple buyers.
Here is the DDL code:
-- Table creation logic CREATE TABLE [dbo].[CarSale]( [CarSaleID] [int] IDENTITY(1,1) NOT NULL, [PurchaseDate] [smalldatetime] NOT NULL, CONSTRAINT [PK_CarSale] PRIMARY KEY CLUSTERED ([CarSaleID] ASC) );
CREATE TABLE [dbo].[Buyer]( [BuyerID] [int] IDENTITY(1,1) NOT NULL, [CarSaleID] [int] NOT NULL, [LastName] [varchar](50) NULL, [FirstName] [varchar](100) NULL, [CompanyName] [varchar](200) NULL, CONSTRAINT [PK_Buyer] PRIMARY KEY NONCLUSTERED ([BuyerID] ASC) );
ALTER TABLE [dbo].[Buyer] WITH CHECK ADD CONSTRAINT [FK_Buyer_CarSale] FOREIGN KEY([CarSaleID]) REFERENCES [dbo].[CarSale] ([CarSaleID]) ON UPDATE CASCADE ON DELETE CASCADE;
CREATE CLUSTERED INDEX [IX_Buyer_CarSalelID] ON [dbo].[Buyer]([CarSaleID] ASC);
-- Function creation logic CREATE FUNCTION [dbo].[fnGetBuyerFirstName] (@CarSaleID INT) RETURNS VARCHAR (500) AS BEGIN RETURN (SELECT Top 1 FirstName FROM Buyer WHERE CarSaleID= @CarSaleID ORDER BY BuyerID) END GO
CREATE FUNCTION [dbo].[fnGetBuyerLastName] (@CarSaleID INT) RETURNS VARCHAR (500) AS BEGIN RETURN (SELECT Top 1 coalesce(LastName,CompanyName) FROM Buyer WHERE CarSaleID= @CarSaleID ORDER BY BuyerID) END GO
Original Query Example
Our original query is pretty straightforward. It does a simple SELECT and uses two functions to retrieve the first and last name from the other table.
Here is the code:
SELECT cs.PurchaseDate, dbo.fnGetBuyerFirstName(cs.CarSaleID), dbo.fnGetBuyerLastName(cs.CarSaleID) FROM CarSale cs ORDER BY CarSaleID;
Looking at the explain plan for our original query we can see that for each record returned by the query we are going to call these functions and basically do a lookup on the Buyer table twice (once for each function). This is not going to be very efficient if the CarSale table contains a large number of records.
Even if we restrict our query to a very specific WHERE clause and retrieve only one record and check the explain plan for this updated query, we still see that it's going to have to do two lookups on the Buyer table.
Below is the code for the updated query:
SELECT cs.PurchaseDate, dbo.fnGetBuyerFirstName(cs.CarSaleID), dbo.fnGetBuyerLastName(cs.CarSaleID) FROM CarSale cs WHERE CarSaleID=5 ORDER BY CarSaleID;
One thing to note here is that this is the best case scenario as this query is only going to return one record. Any query with a wider WHERE clause returning more data will become slower as the data set returned grows.
Query Update to Remove Function Calls Example
Now let's try removing the function calls and getting our query result using a table join and see what happens. Here are both new queries, the first without a WHERE clause and the second with one.
SELECT cs.PurchaseDate,FirstName,LastName FROM CarSale cs INNER JOIN (SELECT CarSaleID,MIN(BuyerID) AS SingleBuyerID FROM Buyer GROUP BY CarSaleID) m2 ON cs.CarSaleID=m2.CarSaleID INNER JOIN Buyer m ON m2.CarSaleID=cs.CarSaleID AND m2.SingleBuyerID=m.BuyerID ORDER BY cs.CarSaleID;
SELECT cs.PurchaseDate,FirstName,LastName FROM CarSale cs INNER JOIN (SELECT CarSaleID,MIN(BuyerID) AS SingleBuyerID FROM Buyer GROUP BY CarSaleID) m2 ON cs.CarSaleID=m2.CarSaleID INNER JOIN Buyer m ON m2.CarSaleID=cs.CarSaleID AND m2.SingleBuyerID=m.BuyerID WHERE cs.CarSaleID=5 ORDER BY cs.CarSaleID;
Looking at the explain plans for the query without the WHERE clause and the query with the WHERE clause we can see that in both cases the new explain plans should perform better since we no longer have to perform a lookup for each record returned by the query. This is now handled by the MERGE join.
To confirm this and see just how much improvement we get from removing the function calls I ran a SQL Profiler trace and ran each of these queries. Here are the results:
Query | WHERE Clause | CPU (ms) | Reads | Writes | Duration |
---|---|---|---|---|---|
Original | NO | 10734 | 1239655 | 0 | 25879 |
YES | 0 | 9 | 0 | 0 | |
No Function Call | NO | 578 | 16337 | 0 | 2457 |
YES | 0 | 11 | 0 | 0 |
Looking at the SQL Profiler trace results we can see that by removing the function calls we get a considerable benefit when the result set of the query is large in all resource areas, read, cpu and total duration. When just returning a single record though the original query actually performs slightly better. Again, this WHERE clause is a best case scenario, as the data set returned grows the query without function calls will start to outperform the one with them.
Final Update Adding Common Table Expression (CTE) Example
Because we have this requirement from the function for only returning a single buyer we can use a CTE to further optimize this query.
Here is the code both with and without the WHERE clause:
WITH summary AS (SELECT CarSaleID, BuyerID, FirstName, LastName, ROW_NUMBER() OVER(PARTITION BY CarSaleID ORDER BY CarSaleID) AS rk FROM Buyer) SELECT PurchaseDate,s.FirstName,s.LastName FROM CarSale cs INNER JOIN summary s ON s.CarSaleID=cs.CarSaleID WHERE s.rk = 1;
WITH summary AS (SELECT CarSaleID, BuyerID, FirstName, LastName, ROW_NUMBER() OVER(PARTITION BY CarSaleID ORDER BY CarSaleID) AS rk FROM Buyer) SELECT PurchaseDate,s.FirstName,s.LastName FROM CarSale cs INNER JOIN summary s ON s.CarSaleID=cs.CarSaleID WHERE s.rk = 1 AND cs.CarSaleID=5;
After generating the explain plans for both of these queries, no WHERE clause and WHERE clause, it's easy to see how much more streamlined these two queries are compared to the originals. As well, looking at the trace results below we can see that this version of the query outperforms all previous examples both with and without the WHERE clause.
Query | WHERE Clause | CPU (ms) | Reads | Writes | Duration |
---|---|---|---|---|---|
No Function Call add WITH statement | NO | 266 | 15796 | 0 | 1931 |
YES | 0 | 6 | 0 | 0 |
Summary
I would agree with anyone that wanted to argue the first example is much easier to read and understand, but given the performance improvements we get by removing these function calls I think it's pretty obvious that it's worth eliminating them and saving on valuable CPU and I/O cycles.
Next Steps
- How to read SQL Server Explain Plans
- Other common uses for Common Table Expressions (CTE)
- More information on ROW_NUMBER() and RANK() functions
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2012-07-10