SQL Server T-SQL Code to Generate A Normal Distribution

By:   |   Updated: 2016-03-29   |   Comments (2)   |   Related: > Testing


Problem

I need some sample data. How can I generate a normal distribution in T-SQL with a specific mean, standard deviation and with a certain number of values to the right of the decimal point?

Solution

The RAND() function alone in T-SQL will not generate a normal distribution (a bell curve), but it should generate values evenly distributed from 0 to 1. In this tip, we will examine T-SQL code that will allow you to specify a mean (average), standard deviation, the precision, and the number of samples to be generated. Our example will use the polar method first described in 1958 by Box and Muller in their article "A Note on the Generation of Random Normal Deviates" which appeared in The Annals of Mathematical Statistics. We will set our mean to 75, standard deviation to 5, precision to 1, and the number of samples to 1000.

Let's start by creating a temporary table using the T-SQL below.

IF OBJECT_ID('tempdb..#tblNormalDistribution') IS NOT NULL
  DROP TABLE #tblNormalDistribution
go
create table #tblNormalDistribution (x float)
go

Next, we want to declare and set our variables as shown below. As we will see later, each iteration of our loop will generate two values. So for us to generate 1,000 values, we need 500 iterations.

declare @pi float, @2pi float, @randNum1 float, @randNum2 float
declare @value1 float, @value2 float
declare @iteration int, @numberOfIterations int
declare @mean float
declare @stdDev float --standard deviation
declare @precision int --number of places to the right of the decimal point

select @iteration = 0
select @pi = pi()
select @2pi = 2.0 * @pi
select @mean =75.0
select @stdDev =5.0
select @precision = 1
select @numberOfIterations = 500 --Two values will be generated each iteration

Now it is time to enter into our loop to generate and insert our normally distributed values.

while (@iteration < @numberOfIterations)
begin
  select @randNum1 = rand()
  select @randNum2 = rand()
  select @value1 = round((sqrt(-2.0*log(@randNum1))*cos(@2pi*@randNum2))*@stdDev, @precision)+@mean
  select @value2 = round((sqrt(-2.0*log(@randNum1))*sin(@2pi*@randNum2))*@stdDev, @precision)+@mean
    
  insert into #tblNormalDistribution (x) values (@value1)
  insert into #tblNormalDistribution (x) values (@value2)

  select @iteration = @iteration + 1
end

After loading the data into the temp table, we will execute the following T-SQL code to verify that our data is normally distributed about our specified mean of 75 with our chosen standard deviation of 5.

select count(*) as [Count],
       min(x)   as [Min],
       max(x)   as [Max],
       avg(x)   as [Average],
       stdev(x) as [Standard Deviation] from #tblNormalDistribution

The image below shows the results of executing the queries. We can see that the mean is close to 75 and the standard deviation is almost 5. The Min and Max columns demonstrate our precision of 1.

Validate the results in SQL Server Management Studio

We can use the next T-SQL query to generate values for a histogram.

select round(x,0) as testValue,
count(*) as testValueCount
from #tblNormalDistribution
group by round(x,0)
order by testValue 

The first 13 rows returned by the previous query are shown here.

Query to generate a histogram
Next Steps

Adjust the values for mean, standard deviation, precision and number of iterations and watch how the results change. Also, please check out these other tips and tutorials on T-SQL and the RAND() function on MSSQLTips.com.



sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Dallas Snider Dr. Dallas Snider is an Assistant Professor in the Computer Science Department at the University of West Florida and has 18+ years of SQL experience.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2016-03-29

Comments For This Article




Monday, April 4, 2016 - 11:41:56 AM - Scott Back To Top (41131)

Ray,

Converting this to a set-based query is not a bad idea but you didn't exactly follow the original algorithm.  The idea is to generate two random numbers, then generate two data points from those numbers by alternating the SIN and COS functions.  You generated one data point from each pair of random numbers, choosing the SIN or COS formula based on even or odd row numbers.  Not a huge change, but mathemeticians take these things seriously.

You need a set-based query that generates two data points for each iteration.  I used (SELECT TOP (n) 1 FROM sys.columns) because the count (50) was so low, for larger numbers another approach would be needed (such as the tally table in your query).

DECLARE @2pi FLOAT = 2.0 * PI();
DECLARE @mean FLOAT = 75.0;
DECLARE @stdDev FLOAT = 5.0;
DECLARE @precision INT = 1;
DECLARE @iterations INT = 50;
DECLARE @two31 FLOAT = 1.0 / 2147483648.0;

DECLARE @samples TABLE ( x FLOAT NOT NULL );


INSERT INTO @samples ( x )
SELECT  val
FROM ( SELECT TOP (@iterations) i = 1 FROM sys.columns ) i 
CROSS JOIN (
    SELECT  Rnd1 = SQRT(-2.0 * LOG(CAST(ABS(CHECKSUM(NEWID())) AS FLOAT) * @two31)) * @stdDev,
            Rnd2 = @2pi * CAST(ABS(CHECKSUM(NEWID())) AS FLOAT) * @two31
) r CROSS APPLY (
    SELECT val = ROUND(r.Rnd1 * COS(Rnd2), @precision) + @mean
    UNION ALL
    SELECT val = ROUND(r.Rnd1 * SIN(Rnd2), @precision) + @mean
) v;

SELECT  [Count] = COUNT(*),
        [Min] = MIN(x),
        [Max] = MAX(x),
        Average = AVG(x),
        [Standard Deviation] = STDEV(x)
FROM @samples;

SELECT  testValue = ROUND(x,0),
        testValueCount = COUNT(*)
FROM @samples
GROUP BY ROUND(x,0)
ORDER BY testValue;


Tuesday, March 29, 2016 - 12:51:08 PM - Ray Back To Top (41083)

 Hi Doc,

I always enjoy reading this type of tip.  I am no math whiz but I do find them interesting.

I have been trying to learn more about numbers or tally tables and just as an exercise decided to try a "set-based" approach to your problem.  I certainly am not implying it is better I just wanted to see if I could come close. Based on your tests of Avg, StdDev, etc. I think it works properly.

Here is my solution, (I am sure you will recognize most of it as your own:) ).

 

 

IF OBJECT_ID('tempdb..#tblNormalDistribution') IS NOT NULL

  DROP TABLE #tblNormalDistribution

go

create table #tblNormalDistribution (Id Integer Not Null, x float)

Go

Declare

@iterationInteger = 100000

, @2PiFloat = 2 * PI()

, @meanFloat = 75.0

, @stDevFloat = 5.0

, @precisionInteger = 1

Insert Into #tblNormalDistribution (Id, x )

Select tn.Number

, Case When tn.Number % 2 = 0 Then Round(Sqrt(-2.0 * Log(Abs(Cast(Checksum(NewId())As Float) / Power(2.0, 31))))

* cos(@2Pi * (Abs(Cast(Checksum(NewId())As Float) / Power(2.0, 31)))) * @stDev, @precision) + @mean

Else Round(Sqrt(-2.0 * Log(Abs(Cast(Checksum(NewId())As Float) / Power(2.0, 31))))

* sin(@2Pi * (Abs(Cast(Checksum(NewId())As Float) / Power(2.0, 31)))) * @stDev, @precision) + @mean

End

From dbo.theNumbers As tn

Where tn.Number < @iteration

Select count(*) as Count,

       min(x)   as Min,

       max(x)   as Max,

       avg(x)   as Average,

       stdev(x) as [Standard Deviation]

from #tblNormalDistribution

Select

Round(x, 0) As testValue

, Count(*) As testValueCount

From #tblNormalDistribution

Group By Round(x, 0)

Order By testValue; 

 















get free sql tips
agree to terms