By: Andy Novick | Updated: 2009-01-23 | Comments (1) | Related: 1 | 2 | 3 | 4 | > Functions User Defined UDF
Problem
When you send e-mail to large lists, validating the e-mail addresses before sending out the e-mail is worth doing to prevent having mail rejection messages clog up your mail server. I had been doing the validation with a T-SQL User Defined Function (UDF), but it was getting too slow as the number of e-mails grew. I needed a faster alternative, what options do I have?
Solution
While I could have improved the T-SQL in my function validating e-mail addresses, which requires a lot of string handling, T-SQL isn't the best technology for this problem. When the task at hand is one that .Net code handles better then T-SQL a SQLCLR scalar function can be a great solution. When introduced in SQL Server 2005, the SQLCLR allows stored procedures, functions, triggers, user-defined types and user-defined aggregates to be written in C#, VB.Net or C++. I showed how to write a SQLCLR stored procedure in the tip Writing to an operating system file using the SQL Server SQLCLR. This tip will show how to write a SQLCLR scalar function and compare it to the original T-SQL.
Although I acknowledge my T-SQL UDF is not totally optimized, in my tests of validating 10,000 email addresses the SQLCLR took 53ms and the T-SQL UDF took 14,822ms. That is a significant improvement, so read on...
Lets take a look at the original T-SQL UDF. Here's the code:
SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE FUNCTION [dbo].[udf_Txt_IsEmail] ( @EmailAddr varchar(255) -- Email address to check ) RETURNS BIT -- 1 if @EmailAddr is a valid email address /* * Checks an text string to be sure it's a valid e-mail address. * Returns 1 when it is, otherwise 0. * Example: SELECT CASE WHEN 1=dbo.udf_Txt_IsEmail('[email protected]') THEN 'Is an e-mail address' ELSE 'Not an e-mail address' END * * Test: print case when 1=dbo.udf_txt_isEmail('[email protected]') then 'Passes' else 'Fails' end + ' test for good addr' print case when 0=dbo.udf_txt_isEmail('@novicksoftware.com') then 'Passes' else 'Fails' end + ' test for no user' print case when 0=dbo.udf_txt_isEmail('[email protected]') then 'Passes' else 'Fails' end + ' test for 1 char domain' print case when 1=dbo.udf_txt_isEmail('[email protected]') then 'Passes' else 'Fails' end + ' test for 2 char domain' print case when 0=dbo.udf_txt_isEmail('[email protected]') then 'Passes' else 'Fails' end + ' test for no domain' print case when 0=dbo.udf_txt_isEmail('anov [email protected]') then 'Passes' else 'Fails' end + ' test for space in name' print case when 0=dbo.udf_txt_isEmail('ano#[email protected]') then 'Passes' else 'Fails' end + ' test for # in user' print case when 0=dbo.udf_txt_isEmail('anovick@novick*software.com') then 'Passes' else 'Fails' end + ' test for * asterisk in domain' ****************************************************************/ AS BEGIN DECLARE @AlphabetPlus VARCHAR(255) , @Max INT -- Length of the address , @Pos INT -- Position in @EmailAddr , @OK BIT -- Is @EmailAddr OK -- Check basic conditions IF @EmailAddr IS NULL OR NOT @EmailAddr LIKE '_%@__%.__%' OR CHARINDEX(' ',LTRIM(RTRIM(@EmailAddr))) > 0 RETURN(0) SELECT @AlphabetPlus = 'abcdefghijklmnopqrstuvwxyz01234567890_-.@' , @Max = LEN(@EmailAddr) , @Pos = 0 , @OK = 1 WHILE @Pos < @Max AND @OK = 1 BEGIN SET @Pos = @Pos + 1 IF NOT @AlphabetPlus LIKE '%' + SUBSTRING(@EmailAddr, @Pos, 1) + '%' SET @OK = 0 END -- WHILE RETURN @OK END go GRANT EXEC on dbo.udf_txt_isEmail to PUBLIC go
It's not the most efficient of code. That's why I have to update it.
One of the features of this function is that the header comment includes a series of tests. While these might be not be the most complete set of tests for an e-mail validation function, there the ones that I've been using for awhile. They're easy to execute. In SSMS, just select the text and execute the script. It prints out a series of Passed/Failed messages about each test. Later in the tip they'll be used to validate the new SQLCLR function.
The SQLCLR alternative will use a regular expression to validate the e-mail address. Regular expressions are expressions in a mini-language specific to searching text strings. They are perfectly suited to the task of validating e-mail addresses. There are variations between implementations of the regular expression syntax, but most adhere to similar rules. The .Net syntax is described in the documentation for .Net. This link, http://msdn.microsoft.com/en-us/library/hs600312(VS.71).aspx is a good place to start. There are many tutorials that you can find on-line if you want to learn more. There are also libraries of regular expressions that can be found. I sometimes use expressions from http://regexlib.com/. For validating e-mails I've chosen an expression from the MSDN .Net documentation in an example that shows how to validate e-mail addresses. You can find it in your Visual Studio documentation or on-line at http://msdn.microsoft.com/en-us/library/01escwtf(VS.71).aspx
Previous tips on the SQLCLR have shown how to create a SQLCLR project, Writing to an operating system file using the SQL Server SQLCLR, or to compile a .Net program from the command line and then load it into SQL Server CLR function to delete older backup and log files in SQL Server, so this tip will concentrate on the validation code and testing for performance differences. I started with the existing project ns_txt and used the menu command Project/Add User-Defined Function. I gave the function the name ns_txt_email and Visual Studio created a prototype function as shown here:
Writing the code was pretty simple, given that there was an example in the Visual Studio documentation. There were a few steps that I needed to take:
- Add a using statement for the System.Text.RegularExpressions namespace
- Change the return type of the function to SqlBoolean
- Add the parameter email with type SqlString
- Change the return statement to return a new SqlBoolean with the result of the Regex.IsMatch function as it's value
Here's the code after I was done:
The regular expression is a long string constant that is cut off in the picture. The full string is:
^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
The Regex.IsMatch function returns a CLR Boolean, which is returned by the function as a SqlBoolean. This is the type that best corresponds to the T-SQL type BIT.
The next task is to build and deploy the project and try the function. The function ns_txt_email can be used from SSMS, but it can also be used. from Visual Studio. For this project I decided to test by putting my tests in the Test.sql script that Visual Studio provides as part of a Database project. You can add scripts in the "Test Scripts" folder. Execute them by right clicking on the script and choosing "Debug Script". My test script for the SQLCLR function is here:
print case when 1=dbo.ns_txt_email('[email protected]') then 'Passes' else 'Fails' end + ' test for good addr' print case when 0=dbo.ns_txt_email('@novicksoftware.com') then 'Passes' else 'Fails' end + ' test for no user' print case when 0=dbo.ns_txt_email('[email protected]') then 'Passes' else 'Fails' end + ' test for 1 char in tld' print case when 1=dbo.ns_txt_email('[email protected]') then 'Passes' else 'Fails' end + ' test for 2 char in tld' print case when 0=dbo.ns_txt_email('[email protected]') then 'Passes' else 'Fails' end + ' test for no domain' print case when 0=dbo.ns_txt_email('anov [email protected]') then 'Passes' else 'Fails' end + ' test for space in name' print case when 0=dbo.ns_txt_email('ano#[email protected]') then 'Passes' else 'Fails' end + ' test for # in user' print case when 0=dbo.ns_txt_email('anovick@novick*software.com') then 'Passes' else 'Fails' end + ' test for * asterisk in domain'
The output of a test script is sent to Visual Studio's Output window. The following is the result of running the script. The results have been truncated on the right side for readability:
Auto-attach to process '[5968] [SQL] nsl6' on machine 'nsl6' succeeded. Debugging script from project script file. The thread 'nsl6 [58]' (0x10d8) has exited with code 0 (0x0). The thread 'nsl6 [58]' (0x10d8) has exited with code 0 (0x0). The thread 'nsl6 [58]' (0x10d8) has exited with code 0 (0x0). 'sqlservr.exe' (Managed): Loaded 'C:\WINDOWS\assembly\GAC_32\mscorlib\2.0 'sqlservr.exe' (Managed): Loaded 'C:\Program Files\Microsoft SQL Server\M 'sqlservr.exe' (Managed): Loaded 'C:\WINDOWS\assembly\GAC_32\System.Data 'sqlservr.exe' (Managed): Loaded 'C:\WINDOWS\assembly\GAC_MSIL\System\2.0 'sqlservr.exe' (Managed): Loaded 'C:\WINDOWS\assembly\GAC_32\System.Trans 'sqlservr.exe' (Managed): Loaded 'C:\WINDOWS\assembly\GAC_MSIL\System.Sec 'sqlservr.exe' (Managed): Loaded 'C:\WINDOWS\assembly\GAC_MSIL\System.Xml 'sqlservr.exe' (Managed): Loaded 'ns_txt', Symbols loaded. Auto-attach to process '[5968] sqlservr.exe' on machine 'nsl6' succeeded. Passes test for good addr Passes test for no user Passes test for 1 char in tld Passes test for 2 char domain Passes test for no domain Passes test for space in name Passes test for # in user Passes test for * asterisk in domain No rows affected. (0 row(s) returned) Finished running sp_executesql.
So the new function passes the original set of tests.
Now how about performance? Is the SQLCLR function really faster than the old T-SQL UDF? To create some reasonable test data that anyone could use this script creates 10,000 sample e-mail addresses. All of them should be valid.
use tempdb go create table #test_emails ( id int not null identity(1,1) primary key , email nvarchar(255) not null ) go declare @i int set @i=0 while (@i < 10000) begin insert into #test_emails (email) select substring ( 'abcdefghijklmnopqrstuvwyz01234567890zywvutsr' , convert(smallint, 1+ rand() * 35) , convert(smallint, 2+ rand() * 10)) + '@' + substring ('abcdefghijklmnopqrstuvwyzzywvutsrqp' , 1+convert(smallint, rand() * 25) , 2+convert(smallint, rand() * 10)) + '.' + rtrim(substring ( 'com net biz infous za ca uk edu gov tv ' , 1 + (4* convert(smallint, rand() * 10)) , 4)) SET @i = @i + 1 end go
The next script compares the performance of the two functions. It first selects the table of temporary email address in the expectation of pulling the tables pages into memory. The SET STATISTICS TIME ON command asks SQL Server for the elapsed and CPU times for each command. The results are summed so that the time to return results to the client isn't a factor in the measurement.
use tempdb go select * from #test_emails go set statistics time on select sum(case when 1=ns_lib.dbo.udf_txt_IsEmail(email) then 1 else 0 end) from #test_emails select sum(case when 1=ns_lib.dbo.ns_txt_email(email) then 1 else 0 end) from #test_emails set statistics time off go
After sending the results to the Text Results window and ignoring the list of sample e-mails the results are:
UDF results ----------- 10000 (1 row(s) affected) SQL Server Execution Times: CPU time = 6187 ms, elapsed time = 14822 ms. SQLCLR results -------------- 10000 (1 row(s) affected) SQL Server Execution Times: CPU time = 47 ms, elapsed time = 53 ms.
The CPU time for the SQLCLR is only 47 milliseconds compared to 6.187 seconds for the T-SQL UDF. The SQLCLR function is 131 times faster, a dramatic difference. I'm sure that I could improve the T-SQL a lot to make it faster, but I doubt that the T-SQL UDF is ever going to get close. At least in this case SQLCLR can be dramatically faster in terms of CPU and the difference in elapsed time is even larger.
Next Steps
T-SQL functions that handle strings are candidates for replacement with SQLCLR functions. That's especially true of validation functions, such as udf_txt_IsEmail, due to the availability of the regular expression feature in .Net. If a function is dragging your performance down, consider replacing it with a similar SQLCLR UDF.
- You can download and use the ns_txt_email function in your own applications
- Review related tips on the SQLCLR such as these:
- You might also like to learn more about regular expressions in the Visual Studio documentation.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2009-01-23