Creating a Table Warehouse to Retain Historical Data for a SQL Server Table

By:   |   Updated: 2013-12-09   |   Comments (7)   |   Related: More > Database Administration


Problem

I have a database table that gets groomed every night. How can I create a quick data warehouse to store all the data in a separate table so that it doesn't get groomed? The table only holds 60 days worth of data and I need to store more data for reporting purposes.

Solution

A lot of applications, including SCOM and even SSRS groom their databases so they don't grow out of control. For instance, in SSRS there is a table called ExecutionLogStorage in the ReportServer database. This table holds details about report executions. SSRS grooms this table so that it only keeps a specified number of days of data. This is good for performance reasons obviously because the larger the table grows, the worse it will perform. However, what if you need to keep more days than specified? You can probably change the number of days in a config file but then the table will begin to grow and the server will not perform optimally. In cases like this, I like to create what I call a table warehouse. A table warehouse is basically a new table that stores data from another table, but doesn't get groomed. A table warehouse will grow much larger than the source table, but the application doesn't use this table so it shouldn't effect performance, but it will effect disk space so make sure you plan accordingly. This is basically a data warehouse, but I like to think of a data warehouse as more than one table and I usually transform and massage the data before I move it to a data warehouse.

Some reasons why you might want to create a table warehouse include archiving data and reporting on older data that may get deleted in the future.

You can probably create this type of table several different ways but I like to use the UNION operator. In this tip, I'll show you the statement I use and we'll walk through it.

First, let's look at the ExecutionLogStorage table:

Let's look at the ExecutionLogStorage table

This table stores information such as InstanceName, UserName, Format, Parameters, Start/End Time, etc. By default, it only stores information for 60 days. Every night, SSRS grooms this table so that the 61st day of data is deleted. We need to keep more than 60 days of data so that's where the table warehouse comes into play.

First thing we'll need to do is create our table warehouse table that will hold the data. I don't need all the columns from the ExecutionLogStorage table so I'll just create my table warehouse to store InstanceName, UserName, Format, TimeStart, TimeEnd, TimeProcessing, Status, ByteCount and RowCount.

  CREATE TABLE [dbo].[ExecutionLogStorageTW](
 [InstanceName] [nvarchar](38) NOT NULL,
 [UserName] [nvarchar](260) NULL,
 [Format] [nvarchar](26) NULL,
 [TimeStart] [datetime] NOT NULL,
 [TimeEnd] [datetime] NOT NULL,
 [TimeProcessing] [int] NOT NULL,
 [Status] [nvarchar](40) NOT NULL,
 [ByteCount] [bigint] NOT NULL,
 [RowCount] [bigint] NOT NULL
    ) 

Next we can use the following query to see the differences between the ExecutionLogStorage table and our new ExecutionLogStorageTW table:

SELECT MIN(TableName) AS TableName
 ,InstanceName
 ,UserName
 ,[Format]
 ,TimeStart
 ,TimeEnd
 ,TimeProcessing
 ,[Status]
 ,ByteCount
 ,[RowCount]
FROM (
 SELECT 'ELS' AS TableName
  ,els.InstanceName
  ,els.UserName
  ,els.[Format]
  ,els.TimeStart
  ,els.TimeEnd
  ,els.TimeProcessing
  ,els.[Status]
  ,els.ByteCount
  ,els.[RowCount]
 FROM [ReportServer].[dbo].[ExecutionLogStorage] els
 
 UNION ALL
 
 SELECT 'ELSDW' AS TableName
  ,elstw.InstanceName
  ,elstw.UserName
  ,elstw.[Format]
  ,elstw.TimeStart
  ,elstw.TimeEnd
  ,elstw.TimeProcessing
  ,elstw.[Status]
  ,elstw.ByteCount
  ,elstw.[RowCount]
 FROM [ReportServer].[dbo].[ExecutionLogStorageTW] elstw
 ) tmp
GROUP BY InstanceName
 ,UserName
 ,[Format]
 ,TimeStart
 ,TimeEnd
 ,TimeProcessing
 ,[Status]
 ,ByteCount
 ,[RowCount]
HAVING COUNT(*) = 1

This query will UNION all the results from ExecutionLogStorage and ExecutionLogStorageTW and display the data that is different. As you can see from the results below there are 72325 results that are different. The TableName column displays the table that the data resides in, therefore you can see that all the data resides in the ELS (ExecutionLogStorage) table.

Since ExecutionLogStorageTW is empty it should show the exact results as:

SELECT * FROM [ReportServer].[dbo].[ExecutionLogStorage]


This query will UNION all the results from ExecutionLogStorage and ExecutionLogStorageTW

Now that I can see the differences, all I need to do is INSERT them into my new table using the following INSERT statement:

INSERT INTO [ReportServer].[dbo].[ExecutionLogStorageTW]
SELECT InstanceName
 ,UserName
 ,[Format]
 ,TimeStart
 ,TimeEnd
 ,TimeProcessing
 ,[Status]
 ,ByteCount
 ,[RowCount]
FROM (
 SELECT 'ELS' AS TableName
  ,els.InstanceName
  ,els.UserName
  ,els.[Format]
  ,els.TimeStart
  ,els.TimeEnd
  ,els.TimeProcessing
  ,els.[Status]
  ,els.ByteCount
  ,els.[RowCount]
 FROM [ReportServer].[dbo].[ExecutionLogStorage] els
 
 UNION ALL
 
 SELECT 'ELSDW' AS TableName
  ,elstw.InstanceName
  ,elstw.UserName
  ,elstw.[Format]
  ,elstw.TimeStart
  ,elstw.TimeEnd
  ,elstw.TimeProcessing
  ,elstw.[Status]
  ,elstw.ByteCount
  ,elstw.[RowCount]
 FROM [ReportServer].[dbo].[ExecutionLogStorageTW] elstw
 ) tmp
GROUP BY InstanceName
 ,UserName
 ,[Format]
 ,TimeStart
 ,TimeEnd
 ,TimeProcessing
 ,[Status]
 ,ByteCount
 ,[RowCount]
HAVING COUNT(*) = 1

We can now query the COUNT on both tables and they should match:

We can now query the COUNT on both tables and they should match

We now have two tables with identical data. But what happens when ExecutionLogStorage gets groomed? The query we are using will show that a difference exists and will re-INSERT the data. Since we only want to INSERT data from the ExecutionLogStorage (ELS) table we can just simply add to the HAVING clause

AND MIN(TableName) = 'ELS'

...making the complete statement look like this:

INSERT INTO [ReportServer].[dbo].[ExecutionLogStorageTW]
SELECT InstanceName
 ,UserName
 ,[Format]
 ,TimeStart
 ,TimeEnd
 ,TimeProcessing
 ,[Status]
 ,ByteCount
 ,[RowCount]
FROM (
 SELECT 'ELS' AS TableName
  ,els.InstanceName
  ,els.UserName
  ,els.[Format]
  ,els.TimeStart
  ,els.TimeEnd
  ,els.TimeProcessing
  ,els.[Status]
  ,els.ByteCount
  ,els.[RowCount]
 FROM [ReportServer].[dbo].[ExecutionLogStorage] els
 
 UNION ALL
 
 SELECT 'ELSDW' AS TableName
  ,elstw.InstanceName
  ,elstw.UserName
  ,elstw.[Format]
  ,elstw.TimeStart
  ,elstw.TimeEnd
  ,elstw.TimeProcessing
  ,elstw.[Status]
  ,elstw.ByteCount
  ,elstw.[RowCount]
 FROM [ReportServer].[dbo].[ExecutionLogStorageTW] elstw
 ) tmp
GROUP BY InstanceName
 ,UserName
 ,[Format]
 ,TimeStart
 ,TimeEnd
 ,TimeProcessing
 ,[Status]
 ,ByteCount
 ,[RowCount]
HAVING COUNT(*) = 1
 AND MIN(TableName) = 'ELS'

You can schedule this with a SQL Agent job or run on demand. If the job was to fail, it will simply look at the differences between the tables since the last run and execute accordingly.

Next Steps
  • Like I mentioned earlier, there may be other ways to complete this task. This is just one example that I found to be the quickest and easiest to setup.
  • A data warehouse is much more complex to setup and is more common for analysis. This tip is meant to show a quick and easy way to replicate a few tables.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Brady Upton Brady Upton is a Database Administrator and SharePoint superstar in Nashville, TN.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2013-12-09

Comments For This Article




Friday, March 7, 2014 - 9:49:04 AM - Jason Back To Top (29679)

outer join will do the difference


Saturday, December 21, 2013 - 6:23:17 PM - Jeff Moden Back To Top (27866)

My only bone to pick here is in the following quoted from the article...

This is good for performance reasons obviously because the larger the table grows, the worse it will perform.
...{snip}...
...but the application doesn’t use this table so it shouldn’t effect performance...

The bone is that table size is usually not responsible for poor performance.  Bad code, poorly designed tables, and lack of / improper / too many indexes is.

 

 

 

 


Friday, December 20, 2013 - 2:00:59 PM - Edward Back To Top (27862)

Interestingly enough, our customer has always wanted us to keep the history even while we prune records to keep the table manageable.

So what we did was create a table that exactly duplicated the original table and in the stored procedure that carries out inserts, and updates we would either:

  1. Insert all the records older than a certain date into the History table as the new record was being added, or
  2. Copy each altered or deleted record to the history table before carrying out the update or delete.

(Some tables requiring that changes be tracked, and all records be preserved until the real life item has been eliminated, and others that report one time only events that need to be tracked but records older than a certain period are not immediately used.)


Friday, December 20, 2013 - 7:33:08 AM - Elliot Back To Top (27856)

If you knew the data that was grooming the data you could append the OUTPUT clause onto it to facilitate the data move


Wednesday, December 11, 2013 - 5:56:49 AM - Benjamin Back To Top (27761)

But your last statement has IN(TableName) = 'ELS' meaning it will only report from the top table too! As far as I understand it, the only difference between your way & mine would be that the archive table will have copies of the main table too - ie ones that are less than 60 days old - I would say this is no bad thing for 2 reasons. 1) With your way of doing things, if the automated procedure failed for some reason, then you would have lost some data for good, and 2) having worked with such a setup, it was a pain having to refer to the current & archive table in every query that used it - having them all in one table would've made my life easier & the queries quicker.


Tuesday, December 10, 2013 - 4:03:19 PM - Brady Back To Top (27754)

This will not work. When using UNION ALL, only the differences between the tables will be the result set. When using EXCEPT, the result set will display the results from the top table.


Tuesday, December 10, 2013 - 10:38:50 AM - Benjamin Back To Top (27752)

Wouldn't using the EXCEPT clause be a more elegant solution here?

INSERT INTO [ReportServer].[dbo].[ExecutionLogStorage]
SELECT InstanceName ,UserName ,[Format] ,TimeStart ,TimeEnd ,TimeProcessing ,[Status] ,ByteCount ,[RowCount]
FROM [ReportServer].[dbo].[ExecutionLog]
EXCEPT
SELECT InstanceName ,UserName ,[Format] ,TimeStart ,TimeEnd ,TimeProcessing ,[Status] ,ByteCount ,[RowCount]
FROM [ReportServer].[dbo].[ExecutionLogStorage]














get free sql tips
agree to terms