Refactor SQL Server scalar UDF to inline TVF to improve performance

By:   |   Updated: 2017-03-22   |   Comments (2)   |   Related: > Functions User Defined UDF


Problem

My previous tip Understand the Performance Behavior of SQL Server Scalar User Defined Functions focuses on the characteristics when analyzing queries with scalar UDF. This tip will recommend an option to improve query performance when using a scalar UDF.

Solution

One viable option to improve the performance of query which uses a scalar UDF is to refactor the scalar UDF code into a Table Valued Function (TVF). This might not always be possible given the complexity of the scalar UDF which can vary to a very high degree, but it should certainly be given consideration and tested.

One of the main advantages to refactor scalar UDF code into a TVF is typically the effort required is minimal compared to re-writing into another object type such as a stored procedure. A TVF can be re-fitted back into the main query by selecting directly against the TVF or using an APPLY operator to invoke the TVF for each row returned by an outer table expression of a query. Using the APPLY operator is the more appropriate way to invoke a TVF when joining to a query.

Other advantages of using a TVF over a scalar UDF includes:

  • Queries which reference a TVF can possibly leverage parallelism
  • A TVF can return more than one column for rows that satisfy the filter criteria
  • Statistics I/O generated also includes the disk activity generated by the TVF
  • Encapsulate reusable code with the ability to work on bigger items
  • Works similarly to using a View, with the ability to take parameters

There are two types of TVFs:

  • Inline TVF
    • Consists of one single SELECT statement and returns a table data type
    • All referenced objects are included in the final query plan
    • The query optimizer can access all referenced objects' statistics hence produces more accurate cardinality estimates
    • The query references the base tables in its execution plan
  • Multi-line TVF
    • Can have multiple lines like a stored procedure and return a table data type
    • Referenced objects in a multi-line TVF produces a separate query plan
    • There are no statistics available hence generally gives a low cardinality estimate
    • Requires intermediate results to go into tempdb due to table variable creation, allocation, and deallocation. This can cause tempdb contention.

Refactor scalar UDF code into an inline TVF

To make sure the query evaluates the same rows and produces the same output result between the query which references the scalar UDF and the inline TVF, we will add an ORDER BY into the benchmark query structure to be used in the benchmark as below.

SELECT TOP 5000 *
FROM [Sales].[InvoiceLines] po
JOIN [Sales].[OrderLines] pl
ON po.[StockItemID] = pl.[StockItemID]
ORDER BY po.InvoiceID
OPTION (RECOMPILE)

Scalar UDF

Below is the definition of the scalar UDF as per the previous tip.

USE [WideWorldImporters]
GO
CREATE FUNCTION dbo.ufn_GetTotalQuantity (@StockItemID INT)
RETURNS INT
AS
BEGIN
        DECLARE @Qty INT

        SELECT @Qty = SUM(Quantity)
        FROM Warehouse.StockItemTransactions
        WHERE StockItemID = @StockItemID

        RETURN (@Qty)
END

The benchmark query which references the scalar UDF below and is the same as the previous tip with addition of an ORDER BY clause.

SELECT TOP 5000 *, dbo.ufn_GetTotalQuantity (po.StockItemID) TotalQty 
FROM [Sales].[InvoiceLines] po 
JOIN [Sales].[OrderLines] pl 
ON po.[StockItemID] = pl.[StockItemID] 
ORDER BY po.InvoiceID
OPTION (RECOMPILE)

Inline TVF

We will refactor the scalar UDF code into an inline TVF. After the code refactor, the inline TVF code looks like below.

USE [WideWorldImporters]
GO
CREATE FUNCTION [dbo].[tvf_GetTotalQuantity] (@StockItemID INT)
RETURNS TABLE
AS
RETURN (
 SELECT  SUM(Quantity) Qty
 FROM Warehouse.StockItemTransactions
 WHERE StockItemID = @StockItemID
    );
GO

The equivalent benchmark query to produce the same result using the TVF is below.

SELECT TOP 5000 *, tv.Qty TotalQty
FROM [Sales].[InvoiceLines] po
JOIN [Sales].[OrderLines] pl
ON po.[StockItemID] = pl.[StockItemID]
CROSS APPLY dbo.[tvf_GetTotalQuantity] (po.StockItemID) tv
ORDER BY po.InvoiceID
OPTION (RECOMPILE)

Query Cost Comparison

Do not try to compare the query cost or I/O statistics between the two benchmark queries. The SQL Server query cost will show that the query which uses the scalar UDF is lower. But in the previous tip, this is misleading because the SQL Server query plan excludes the query cost of the scalar UDF from the main query plan.

Query Cost Comparison between the SQL Query and UDF

Benchmark Scenario

We now benchmark the execution duration of the query which uses the scalar UDF and TVF by increasing the number of rows returned from each query.  This is done by specifying a different number of rows in the SELECT TOP syntax in the first line of the main query with the specified number of rows in the table below.

Rows returned from Query

Scalar UDF (seconds)

Inline TVF (seconds)
SELECT TOP 5000 ... 9 2
SELECT TOP 10000 ... 12 2
SELECT TOP 30000 ... 30 2
SELECT TOP 50000 ... 45 2
SELECT TOP 100000 ... 94 4

The more rows evaluated in the query using the scalar UDF the longer it takes to return the result, where the performance of the query using the TVF is consistent.

UDF the longer it takes to return the result, where the performance of the query using the TVF is consistent
Summary

Inline TVFs have more advantages and typically is the preferred solution if a single T-SQL statement can be written to return the result. But if a TVF needs to be written in multiple T-SQL statements and encounter performance issue due to lack of statistics, there are workarounds such as having the multi-line TVF select data into a temporary table and then the temporary table joined to the main query. In any situation, proper testing and understanding the features is essential towards a better solution.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Simon Liew Simon Liew is an independent SQL Server Consultant in Sydney, Australia. He is a Microsoft Certified Master for SQL Server 2008 and holds a Master’s Degree in Distributed Computing.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2017-03-22

Comments For This Article




Wednesday, March 22, 2017 - 8:23:14 PM - Simon Liew Back To Top (51549)

Hi Jeff,

Thank you for adding more info into this tip. I concur with your points because I have seen many issues with mTVF causing tempdb contention and poor performance. 

In some instances, I had to take out mTVF from main query, select mTVF data into a temporary table first and then rejoin temp table to main query. The inaccurate cardinality estimation can really kill performance on complex query structure.


Wednesday, March 22, 2017 - 9:43:09 AM - Jeff Moden Back To Top (51533)

Good artilcle,Simon.  Thanks for taking the time to do the testing and publish the results.  Its an older trick but it seems that a lot of people still don't know about it so I'm glad to see folks continuing to publish about it.

Just to drive a point home, converting a Scalar Function to an mTVF (Multi-statment Table Valued Function) will make things worse.  Much worse.  Using mTVFs even for non-scalar functionality is a form of RBAR on steriods that will make Scalar UDFs look good. 

Simon (the author) is specifically talking about iTVFs (INLINE Table Valued Functions), which must be written as a single statement just like you would a view.  In fact, some of us refer to them as a "parameterized view", although they're much more flexible than just a view in many cases.  Even supposedly "memory only" Scalar Functions are a good 7 times slower than the equivalent iTVF when written properly.  I've got an example at the following article.

http://www.sqlservercentral.com/articles/T-SQL/91724/

If you have any doubt about whether or not a function is a slower mTVF or a higher performing iTVF, look for the word "BEGIN" in the function.  mTVFs require a BEGIN, iTVFs cannot use the word BEGIN.

 















get free sql tips
agree to terms