By: Jeffrey Yao | Updated: 2015-04-17 | Comments (5) | Related: > Testing
Problem
In my work, random strings are useful in many ways. For example, I want to replace all sensitive information of some columns with random strings after I restore the production SQL Server database to a test environment, or I want to generate dummy data for development purposes. So is there a way I can generate random strings easily?
I have the following requirements for the random string generation:
- I can define the string length to be within a range.
- I can repeatedly generate the exact same strings if needed, so I can make sure my data quantity and quality are the same.
- I can generate random string with simple patterns, for example, the postal code in Canada has a format of A1A 1A1, i.e. LetterNumberLetter NumberLetterNumber, such as V3V 2A4 or M9B 0B5.
Solution
There are many ways in T-SQL to generate random strings. Here is one good discussion of this topic "Generating random strings with T-SQL".
Generally speaking, with pure T-SQL, we can use Rand(), NewID(), CRYPT_GEN_RANDOM() and Convert/Cast/Substring T-SQL to create random strings.
However, just using pure T-SQL has two obvious disadvantages:
- Non-deterministic functions such as Rand(), NewID() and CRYPT_GEN_RANDOM() are not allowed to be used inside a UDF, which means you need to create some additional layer to bypass this limitation.
- For heavy-load string generation, the pure T-SQL solution's performance is compromised.
For string manipulation inside SQL Server, the majority agree that a CLR function will be better positioned. So in this tip, I will provide two CLR functions to meet the above-mentioned requirements.
- Generate a random string with its length specified, and also with a seed parameter, we ensure repeatablility with the same random string when using the same seed.
- Generate a random string with a simple pattern defined by the pattern parameter, this function also has a seed parameter to ensure repeatable string generations.
I will not repeat the steps about how to create/deploy an assembly with Visual Studio, but you can refer to the links in [Next Steps] section to find the details.
using System; using System.Data; using System.Data.SqlTypes; using Microsoft.SqlServer.Server; using System.Security.Cryptography; using System.Text; public partial class UserDefinedFunctions { [Microsoft.SqlServer.Server.SqlFunction] public static SqlString fn_random_string(SqlInt32 minLen, SqlInt32 maxLen, SqlInt32 seed) { int min_i = (int)minLen; int max_i = (int)maxLen; int i = 0; if (min_i <= 0 || min_i > max_i) { return new SqlString(string.Empty); } else { int sd = (int)seed; Random r = new Random(); if (sd != 0) { r = new Random(sd); } i = r.Next(min_i, max_i + 1); byte[] rnd = new byte[i]; using (var rng = new RNGCryptoServiceProvider()) { rng.GetNonZeroBytes(rnd); string rs = Convert.ToBase64String(rnd); rs = rs.Substring(0, i); return new SqlString(rs); } } } //fn_random_string public static SqlString fn_random_pattern(SqlString pat, SqlInt32 seed) { string pattern = pat.ToString(); if (pattern == string.Empty) { return new SqlString(string.Empty); } else { string CharList = "abcdefghijklmnopqrstvvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"; string NumList = "0123456789"; char[] cl_a = CharList.ToCharArray(); char[] nl_a = NumList.ToCharArray(); int sd = (int)seed; Random rnd = new Random(); if (sd != 0) { rnd = new Random(sd); } StringBuilder sb = new StringBuilder(pattern.Length); char[] a = pattern.ToCharArray(); for (int i = 0; i < a.Length; i++) { switch (a[i]) { case '@': sb.Append(cl_a[rnd.Next(0, CharList.Length)]); break; case '!': sb.Append(nl_a[rnd.Next(0, NumList.Length)]); break; default: sb.Append(a[i]); break; } }//for return new SqlString(sb.ToString()); }//else } // fn_random_pattern } //UserDefinedFunctions
In my case, after I build the application to generate a DLL file, which I put it under c:\MSSQLTips\Random_String\bin\ folder, I need to run the following to import the DLL into SQL Server 2012.
use MSSQLTips -- this is my test database alter database MSSQLTips set trustworthy on; exec sp_configure 'clr enabled', 1; reconfigure with override go create assembly clr_random_string from 'C:\mssqltips\Random_String\bin\CLR_Rand_String.dll' with permission_set = safe go create function dbo.ucf_random_string (@minLen int, @maxLen int, @seed int =0) returns nvarchar(max) with execute as caller as external name [clr_random_string].[UserDefinedFunctions].fn_random_string; go /* @pattern: @ means one letter from a to z (both lower and upper cases), ! means one digit number, i.e. 0 to 9. Anything else will not change. so if we have a @pattern='abc !! def', then we may have strings like 'abc 12 def' or 'abc 87 def' generated. For canadian post code the pattern can be '@!@ !@!' (the middle blank space will be kept as it is in the generated string, like 'V1A 2P5' */ create function dbo.ucf_random_pattern (@pattern nvarchar(max), @seed int=0 ) returns nvarchar(max) with execute as caller as external name [clr_random_string].[UserDefinedFunctions].fn_random_pattern; go
We can use the following code to generate some random strings:
Use MSSQLTips -- generate a single random string select RandStr=dbo.ucf_random_string(10, 30, default) , RandPattern=dbo.ucf_random_pattern('What is the time, Mr. @@@@@@? It is ! am', default); -- generate random Canada Post Code / US zip code select top 10 Canada_PostCode= upper(dbo.ucf_random_pattern('@!@ !@!', row_number() over (order by column_id))) , US_ZipCode= dbo.ucf_random_pattern('!!!!!-!!!!', ceiling(rand(column_id)) + row_number() over (order by column_id)) from sys.all_columns
Here is the result:
Performance Comparison between CLR and T-SQL
Here I compare the execution of usp_generateIdentifier as seen on stackoverflow.com and my CLR version dbo.ucf_Random_String, I run each 20,000 times and for each 2000 times, I will record the duration as the script runs. The test code uses the same parameters for both the T-SQL Stored Procedure and the CLR function.
Here is the test code:
-- test performance between CLR and T-SQL Use MSSQLTips; set nocount on; declare @i int=1, @start_time datetime = getdate(); declare @str varchar(8000), @seed int; declare @t_tsql table (run_count int, duration_ms int); -- for tsql execution stats declare @t_clr table (run_count int, duration_ms int); -- for clr execution stats -- run tsql solution 20,000 times while @i <= 20000 begin set @seed = @i; exec dbo.usp_generateIdentifier @minLen = 2000 , @maxLen = 4000 , @seed = @seed output , @string = @str output; if (@i % 2000 = 0) insert into @t_tsql (run_count, duration_ms) select @i, datediff(ms, @start_time, getdate()); set @i = @i+1; end select @i = 1, @start_time = getdate(); -- reinitialize variable -- run clr solution 20,000 times while @i <= 20000 begin set @seed = @i; select @str = dbo.ucf_random_string(2000, 4000, @seed) if (@i % 2000 = 0) insert into @t_clr (run_count, duration_ms) select @i, datediff(ms, @start_time, getdate()); set @i = @i+1; end select t1.run_count, tsql_duration=t1.duration_ms, clr_duration=t2.duration_ms from @t_tsql t1 inner join @t_clr t2 on t1.run_count = t2.run_count;
I put the results into an Excel sheet and graphed the data as shown below:
Next Steps
I like this CLR approach especially because its assembly permission is to set to SAFE, meaning I never need to worry that any future .NET DLL patches will break this CLR code.
Please read the following articles to know more about how to work with CLR functions.
- Introduction to SQL Server CLR table valued functions
- Various SQL CLR function tips on this website
- SQL CLR Function Tutorial from Microsoft
This tip's code has been tested in Visual Studio 2013 and SQL Server 2012 environment. It should be applicable to SQL Server 2008 and above as well.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2015-04-17