Creating a box plot graph in SQL Server Reporting Services

By:   |   Updated: 2014-11-03   |   Comments (4)   |   Related: > Reporting Services Charts


Problem

I have a data set about durations that I want to plot in a graph. I need to display the minimum, maximum, average and other statistical information in a convenient way. How can I do this using SQL Server Reporting Services (SSRS)?

Solution

An efficient way to display different statistical properties is a box plot. It allows us to display the following statistical information in a condensed representation:

  • Maximum value
  • Minimum value
  • Average
  • Median (which is the 50th percentile)
  • Percentiles (typically 25th, the bottom of the box and 75th, the top of the box)
  • Possible outliers

Using a box plot, you can quickly find out if there are outliers, if the data is tightly grouped or if it is rather skewed.

An example:

Box plot example

In Reporting Services, there are different graph types available that allow us to create this type of graph. The two most important ones are the candlestick and the boxplot itself of course. Both can be found in the Range category.

Range graph types

The tip Distribution Analysis using a SQL Server Reporting Services Box Plot Chart explains in great detail how to make a box plot using the candlestick graph and shows some customization tricks, such as adding outliers to the graph. This tip will focus on the box plot type, which was used to build the example above.

Set-up a Box Plot in SSRS

We will use the same data as in the tip about formatting durations in SSAS, which was about tracking durations for several customer representatives and the handling of service tickets. For your convenience, here’s the entire script to set-up the test data.

CREATE TABLE dbo.DimRepresentative(
  SK_Representative INT NOT NULL
 ,Name    VARCHAR(50) NOT NULL
 CONSTRAINT [PK_DimRepresentative] PRIMARY KEY CLUSTERED 
 (
  SK_Representative ASC
 )
);
GO

INSERT INTO dbo.DimRepresentative(SK_Representative,Name)
VALUES  (1,'Bruce')
  ,(2,'Selena')
  ,(3,'Gordon')
  ,(4,'Harvey');

CREATE TABLE dbo.FactCustomerService
 (SK_CustomerServiceFact INT IDENTITY(1,1) NOT NULL
 ,SK_Representative  INT NULL
 ,TicketNumber   UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() -- degenerate dimension
 ,TicketTimestamp  DATETIME2(3) NOT NULL
 ,Duration    INT NOT NULL);

INSERT INTO dbo.FactCustomerService(SK_Representative,TicketTimestamp,Duration)
SELECT
  SK_Representative = FLOOR(RAND()*4) + 1 -- random assign a duration to a customer representative
 ,TicketTimestamp = DATEADD(DAY,RAND()*100,'2014-01-01')
 ,Duration   = RAND()*100000;
GO 100

CREATE VIEW dbo.CustomerServiceFact_Cube
AS
(
 SELECT
   SK_CustomerServiceFact
  ,SK_Representative
  ,TicketNumber
  ,TicketTimestamp
  ,TicketMonth = CONCAT('2014, ',DATENAME(MONTH,TicketTimestamp)) -- concat is SQL 2012+ only
  ,MonthOrder  = MONTH(TicketTimestamp) -- used for sorting in the report/cube
  ,Duration  = Duration / 86400.0
 FROM dbo.FactCustomerService
);
GO

Creating the Box Plot Graph in Visual Studio

Open up a new report in Visual Studio. Add a connection to the database holding the test data and create a new dataset. The following query is used to retrieve all the statistical information from the database:

SELECT DISTINCT
  Representative = r.Name
 ,MaxDuration = MAX([Duration]) OVER (PARTITION BY r.Name) / 86400.0
 ,MinDuration = MIN([Duration]) OVER (PARTITION BY r.Name) / 86400.0
 ,MeanDuration = AVG([Duration]) OVER (PARTITION BY r.Name) / 86400.0
 ,Percentile25 = (PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY Duration) OVER (PARTITION BY r.Name)) / 86400.0
 ,Percentile75 = (PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY Duration) OVER (PARTITION BY r.Name)) / 86400.0
 ,MedianDuration = (PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY Duration) OVER (PARTITION BY r.Name)) / 86400.0
FROM [dbo].[FactCustomerService] c
JOIN [dbo].DimRepresentative  r ON c.SK_Representative = r.SK_Representative;

Note that this query will only work using SQL Server 2012 or above, because it uses the analytical windowing function PERCENTILE_CONT introduced in SQL Server 2012 to calculate the percentiles and the median. The query returns the results in decimal time. The maximum, minimum and average can be calculated starting from SQL Server 2005, because those are not actual windowing functions using windows (defined by RANGE or ROWS); they are normal aggregates using the OVER clause. The difference is described in this blog post.

The dataset looks like this:

The SSRS dataset

Add a new graph to the reporting canvas and select the boxplot graph type.

Fresh inserted box plot

Click somewhere in the chart area so that the Chart Data form pops up. Click on the green plus icon next to Values and choose MaxDuration.

Step 1 of creating the graph

This will insert a new series called MaxDuration. If you want, you can change the name in the properties window.

Step 2 of creating the graph

Right-click on the series and choose Series Properties... There you can see MaxDuration has been configured as the high whisker. Now we can setup all the other properties of the box plot using the other measures. Leave category and tooltip empty.

Step 3 of creating the graph

Now add the Representative column as a category group to the chart.

Step 4 of creating the graph

Finally we need to clean up the layout a bit: give the chart and the y-axis proper titles, drop the legend and the x-axis title and perhaps make the axis and gridlines less prominent. When we preview the report, we have a box plot like this one:

Step 5 of creating the graph

Customizing the Box Plot in Reporting Services

Unfortunately, there aren’t many options to customize this box plot. You can change the color of the box and the thickness of the lines, but that’s about it. Luckily SSRS is smart enough to change the color of the median and mean lines depending on the color of the box. For example, they turn white/grey against a darker box color.

Not many options to customize a box plot

But suppose I want to choose the color of those lines myself. Or that I want to change the thickness of that line. In that case we have to use a small work around. Remove the median and the mean from the series and click the plus icon in the Chart Data window again to add a new series. Choose MeanDuration from the list. This will add a new series with MeanDuration configured as the high whisker.

Added a new series

Right-click the series and change the chart type to Candlestick.

Changing the chart type

Right-click the series again and go to the properties. Set the open and close value to MeanDuration and leave low and high empty. Change the color in the Fill pane and the border thickness in the Border pane to your liking.

Configure the candlestick series

Repeat the same process for MedianDuration. When the report is run, the box plot looks like this:

The final box plot

Because the open and close values for the candlestick series are populated with the same values, they are reduced to single lines. Due to this trick, we can change color and line weight of the median and mean lines in the box plot.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Koen Verbeeck Koen Verbeeck is a seasoned business intelligence consultant at AE. He has over a decade of experience with the Microsoft Data Platform in numerous industries. He holds several certifications and is a prolific writer contributing content about SSIS, ADF, SSAS, SSRS, MDS, Power BI, Snowflake and Azure services. He has spoken at PASS, SQLBits, dataMinds Connect and delivers webinars on MSSQLTips.com. Koen has been awarded the Microsoft MVP data platform award for many years.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2014-11-03

Comments For This Article




Friday, March 25, 2016 - 11:47:42 AM - Nate Mills Back To Top (41059)

 Thanks very much for this!  It was super helpful.  For me, though, I had to do something else with the SQL since my database is set up with 2008 compatibility:

With core as (

SELECT *

FROM theDB.dbo.theTable)

 

SELECT  core.[Category],

COUNT([Measurable]) Count,

MIN([Measurable]) minTime,

MAX([Measurable]) maxTime,

AVG([Measurable]) meanTime,

med.medianTime,

CASE WHEN pct25.pct25th IS NULL THEN MIN([Measurable]) ELSE pct25.pct25th END pct25th,

CASE WHEN pct75.pct75th IS NULL THEN MAX([Measurable]) ELSE pct75.pct75th END pct75th

FROM core LEFT JOIN

 (SELECT

[Category],

AVG([Measurable]) medianTime

FROM (

SELECT

[Category],

[Measurable],

RowDesc,

ROW_NUMBER() OVER (PARTITION BY [Category] ORDER BY [Measurable] ASC, RowDesc DESC) AS RowAsc

FROM (

SELECT

[Category],

[Measurable],

ROW_NUMBER() OVER (PARTITION BY [Category] ORDER BY [Measurable] DESC) AS RowDesc

FROM core) xx

) x

WHERE

RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)

GROUP BY [Category]) med ON core.[Category] = med.[Category] LEFT JOIN

(SELECT [Category], Min(subq.[Measurable]) pct25th 

FROM

(SELECT TOP 75 PERCENT [Category], [Measurable] 

FROM core

GROUP BY [Category], [Measurable]

ORDER BY [Measurable] DESC) subq

GROUP BY [Category]) pct25 ON core.[Category] = pct25.[Category] LEFT JOIN

(SELECT [Category], Min(subq.[Measurable]) pct75th

FROM

(SELECT TOP 25 PERCENT [Category], [Measurable] 

FROM core

GROUP BY [Category], [Measurable]

ORDER BY [Measurable] DESC) subq

GROUP BY [Category]) pct75 ON core.[Category] = pct75.[Category]

GROUP BY core.[Category],

med.medianTime,

pct25.pct25th,

pct75.pct75th

HAVING (COUNT([Measurable]) > 0)

 

This produces the final values, so the box plot mappings don't need to be "sums", they can just be the direct values.

Hopefully this helps some other folks, too!


Tuesday, May 19, 2015 - 5:34:22 AM - Koen Verbeeck Back To Top (37219)

Hi Siva,

I guess if all the values are 0, then the box will be right on the axis, right?


Thursday, May 14, 2015 - 9:53:35 AM - Siva Back To Top (37174)

Hi Koen,

 

Really a great article which healed my pains of Box Plot, because I couldn't find any documentation on msdn, but luckily got your blog here. The Step by Step explanation could help even the beginner like me. 

 

question:

in my Result Set, I have Null values which was replaced by zero, for some business reason, but I am unable to find the Boxes in the chart for which the Min, Max, Median, High & Low Box values are all same = zero. 

is there any way, we can show the empty boxes.. (No I Guess..)

any quick replies would help me a lot. 

 

Thanks,

Siva


Monday, November 3, 2014 - 5:08:34 PM - Jacque Back To Top (35173)

Great tip, thanks.















get free sql tips
agree to terms