Generate Random Strings with High Performance with a SQL CLR function

By: Jeffrey Yao | Updated: 2015-04-17

Problem

In my work, random strings are useful in many ways. For example, I want to replace all sensitive information of some columns with random strings after I restore the production SQL Server database to a test environment, or I want to generate dummy data for development purposes. So is there a way I can generate random strings easily?

I have the following requirements for the random string generation:

I can define the string length to be within a range.
I can repeatedly generate the exact same strings if needed, so I can make sure my data quantity and quality are the same.
I can generate random strings with simple patterns. For example, the postal code in Canada has the format A1A 1A1, i.e. LetterNumberLetter NumberLetterNumber, such as V3V 2A4 or M9B 0B5.

Solution

There are many ways in T-SQL to generate random strings. Here is one good discussion of this topic: "Generating random strings with T-SQL".

Generally speaking, with pure T-SQL, we can use Rand(), NewID(), CRYPT_GEN_RANDOM() and Convert/Cast/Substring T-SQL to create random strings.

However, just using pure T-SQL has two obvious disadvantages:

Non-deterministic functions such as Rand(), NewID() and CRYPT_GEN_RANDOM() are not allowed to be used inside a UDF, which means you need to create some additional layer to bypass this limitation.
For heavy-load string generation, the pure T-SQL solution's performance is compromised.
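
The "additional layer" in the first point is commonly a view that wraps the non-deterministic call, which a UDF can then select from. The following is a minimal sketch of that workaround, not code from this tip; the names dbo.vw_new_id and dbo.ufn_random_hex are hypothetical.

```sql
-- Hypothetical helper view: exposes NEWID() so a function can read it indirectly.
create view dbo.vw_new_id
as
select new_id = newid();
go

-- Hypothetical scalar UDF: returns up to 32 random hex characters.
create function dbo.ufn_random_hex (@len int)
returns varchar(32)
as
begin
    -- A GUID converts to 36 characters; removing the 4 hyphens leaves 32 hex characters.
    return (select left(replace(convert(varchar(36), new_id), '-', ''), @len)
            from dbo.vw_new_id);
end
go

select random_hex = dbo.ufn_random_hex(12);
```

This works, but it needs an extra object per random source and gives no way to repeat a result, which is part of why the tip moves to CLR.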

For string manipulation inside SQL Server, the general consensus is that a CLR function is the better fit. So in this tip, I will provide two CLR functions that meet the requirements above:

Generate a random string whose length falls within a specified range. A seed parameter ensures repeatability: the same seed returns the same random string.
Generate a random string that follows a simple pattern defined by the pattern parameter. This function also takes a seed parameter to ensure repeatable string generation.
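
Once the functions are deployed as described later in this tip, the repeatability requirement can be spot-checked with a query like the one below. This is only a sketch: the seed value 42 is arbitrary, and it assumes the T-SQL wrapper name dbo.ucf_random_pattern used later in this tip.

```sql
-- The same non-zero seed and pattern should produce the same string on every call.
select first_call  = dbo.ucf_random_pattern('@!@ !@!', 42)
     , second_call = dbo.ucf_random_pattern('@!@ !@!', 42)
     , is_same     = case when dbo.ucf_random_pattern('@!@ !@!', 42)
                             = dbo.ucf_random_pattern('@!@ !@!', 42)
                          then 1 else 0 end;
```

A seed of 0 (the default) makes the function fall back to an unseeded generator, so results from that path are not guaranteed to repeat.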

I will not repeat the steps for creating and deploying an assembly with Visual Studio; you can find the details in the links in the [Next Steps] section.

using System;
using System.Data;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Security.Cryptography;
using System.Text;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction]
    public static SqlString fn_random_string(SqlInt32 minLen, SqlInt32 maxLen, SqlInt32 seed)
    {
        int min_i = (int)minLen;
        int max_i = (int)maxLen;

        int i = 0;
        if (min_i <= 0 || min_i > max_i)
        { return new SqlString(string.Empty); }
        else
        {
            int sd = (int)seed;
            Random r = new Random();
            if (sd != 0)
            {
                r = new Random(sd);
            }

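            i = r.Next(min_i, max_i + 1);
            byte[] rnd = new byte[i];
            if (sd != 0)
            {
                // A crypto RNG cannot be seeded, so draw from the seeded Random to repeat the same string.
                r.NextBytes(rnd);
            }
            else
            {
                using (var rng = new RNGCryptoServiceProvider())
                { rng.GetNonZeroBytes(rnd); }
            }
            // Base64 of i bytes is never shorter than i characters, so Substring is safe.
            string rs = Convert.ToBase64String(rnd);
            return new SqlString(rs.Substring(0, i));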
        }
    } //fn_random_string


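    [Microsoft.SqlServer.Server.SqlFunction]
    public static SqlString fn_random_pattern(SqlString pat, SqlInt32 seed)
    {
        // SqlString.ToString() returns the literal "Null" for a NULL input, so check IsNull first.
        string pattern = pat.IsNull ? string.Empty : pat.Value;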
        if (pattern == string.Empty)
        { return new SqlString(string.Empty); }
        else
        {
            string CharList = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
            string NumList = "0123456789";
            char[] cl_a = CharList.ToCharArray();
            char[] nl_a = NumList.ToCharArray();
            int sd = (int)seed;
            Random rnd = new Random();
            if (sd != 0)
            {
                rnd = new Random(sd);
            }

            StringBuilder sb = new StringBuilder(pattern.Length);

            char[] a = pattern.ToCharArray();
            for (int i = 0; i < a.Length; i++)
            {
                switch (a[i])
                {
                    case '@':
                        sb.Append(cl_a[rnd.Next(0, CharList.Length)]);
                        break;
                    case '!':
                        sb.Append(nl_a[rnd.Next(0, NumList.Length)]);
                        break;
                    default:
                        sb.Append(a[i]);
                        break;
                }
            }//for
            return new SqlString(sb.ToString());
        }//else
    } // fn_random_pattern
} //UserDefinedFunctions

In my case, after building the application to generate the DLL file, which I put in the c:\MSSQLTips\Random_String\bin\ folder, I need to run the following to import the DLL into SQL Server 2012.

use MSSQLTips -- this is my test database
alter database MSSQLTips set trustworthy on;
exec sp_configure 'clr enabled', 1;
reconfigure with override
go

create assembly clr_random_string
from 'C:\mssqltips\Random_String\bin\CLR_Rand_String.dll'
with permission_set = safe
go

create function dbo.ucf_random_string (@minLen int, @maxLen int, @seed int =0)
returns nvarchar(max) with execute as caller
as 
external name [clr_random_string].[UserDefinedFunctions].fn_random_string;
go

/*
@pattern: @ means one letter from a to z (lower or upper case), ! means one digit, i.e. 0 to 9. Any other character is kept unchanged.
So if @pattern='abc !! def', we may get strings like 'abc 12 def' or 'abc 87 def'.
For a Canadian postal code the pattern can be '@!@ !@!' (the blank space in the middle is kept as-is in the generated string, like 'V1A 2P5').
*/
create function dbo.ucf_random_pattern (@pattern nvarchar(max), @seed int=0 )
returns nvarchar(max) with execute as caller
as 
external name [clr_random_string].[UserDefinedFunctions].fn_random_pattern;
go
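
To confirm that the assembly and both functions were registered, the catalog views can be queried. The following is a small sketch that assumes the assembly name clr_random_string from the script above.

```sql
-- Show the assembly and the permission set it was created with.
select name, permission_set_desc, create_date
from sys.assemblies
where name = 'clr_random_string';

-- Show the T-SQL functions bound to methods in that assembly.
select function_name = object_name(am.object_id)
     , am.assembly_class
     , am.assembly_method
from sys.assembly_modules am
inner join sys.assemblies a
    on a.assembly_id = am.assembly_id
where a.name = 'clr_random_string';
```

If the deployment succeeded, the second query should list ucf_random_string and ucf_random_pattern alongside their CLR method names.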

We can use the following code to generate some random strings:

Use MSSQLTips
-- generate a single random string
select RandStr=dbo.ucf_random_string(10, 30, default)
, RandPattern=dbo.ucf_random_pattern('What is the time, Mr. @@@@@@? It is ! am', default);

-- generate random Canadian postal codes / US zip codes
select top 10 Canada_PostCode= upper(dbo.ucf_random_pattern('@!@ !@!', row_number() over (order by column_id)))
, US_ZipCode= dbo.ucf_random_pattern('!!!!!-!!!!', ceiling(rand(column_id)) + row_number() over (order by column_id))
from sys.all_columns

Here is the result:

Performance Comparison between CLR and T-SQL

Here I compare usp_generateIdentifier, as seen on stackoverflow.com, with my CLR function dbo.ucf_random_string. I run each one 20,000 times and record the cumulative elapsed time after every 2,000 runs. The test code uses the same parameters for both the T-SQL stored procedure and the CLR function.

Here is the test code:

-- test performance between CLR and T-SQL
Use MSSQLTips;
set nocount on;
declare @i int=1, @start_time datetime = getdate(); 
declare @str varchar(8000), @seed int;
declare @t_tsql table (run_count int, duration_ms int); -- for tsql execution stats
declare @t_clr table (run_count int, duration_ms int); -- for clr execution stats
-- run tsql solution 20,000 times
while @i <= 20000
begin
set @seed = @i;
exec dbo.usp_generateIdentifier
@minLen = 2000
, @maxLen = 4000
, @seed = @seed output
, @string = @str output;

if (@i % 2000 = 0)
  insert into @t_tsql (run_count, duration_ms)
  select @i, datediff(ms, @start_time, getdate());
set @i = @i+1;
end

select @i = 1, @start_time = getdate(); -- reinitialize variable
-- run clr solution 20,000 times
while @i <= 20000
begin
set @seed = @i;
select @str = dbo.ucf_random_string(2000, 4000, @seed)
if (@i % 2000 = 0)
  insert into @t_clr (run_count, duration_ms)
  select @i, datediff(ms, @start_time, getdate());
set @i = @i+1;
end

select t1.run_count, tsql_duration=t1.duration_ms, clr_duration=t2.duration_ms
from @t_tsql t1
inner join @t_clr t2
on t1.run_count = t2.run_count;

I put the results into an Excel sheet and graphed the data as shown below:

Next Steps

I like this CLR approach especially because its assembly permission set is SAFE, meaning I do not need to worry that future .NET DLL patches will break this CLR code.

Please read the following articles to know more about how to work with CLR functions.

This tip's code has been tested in a Visual Studio 2013 and SQL Server 2012 environment. It should be applicable to SQL Server 2008 and above as well.
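
As a possible next step, the masking scenario from the Problem section could be sketched as follows. The table dbo.Customer and its columns CustomerID, PostalCode and Phone are hypothetical, and the sketch assumes CustomerID is a non-zero int so that every row gets its own repeatable seed.

```sql
-- Hypothetical table and columns, for illustration only.
-- Seeding by CustomerID means re-running the script yields the same masked values.
update c
set    c.PostalCode = upper(dbo.ucf_random_pattern('@!@ !@!', c.CustomerID))
     , c.Phone      = dbo.ucf_random_pattern('!!!-!!!-!!!!', c.CustomerID + 1000000)
from   dbo.Customer c;
```

The offset added to the second seed is an arbitrary choice, made so that the two columns are not driven by the same seed.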

About the author

Jeffrey Yao is a senior SQL Server consultant, striving to automate DBA work as much as possible to have more time for family, life and more automation.

This author pledges the content of this article is based on professional experience and not AI generated.
