By: Tim Smith | Updated: 2018-01-04 | Related: Integration Services Performance
Problem
We use SQL Server Data Tools (SSDT) for importing and transforming some of our data sets and have different needs for timing some of the events within our SQL Server Integration Services (SSIS) packages. With some packages we want to determine the total time a package ran, while with other packages we want to determine the amount of time a specific import or part of the package ran. What are some examples to help us track this information as closely as possible? We realize we may be limited in some of our packages, but a close estimate will help.
Solution
Our solution depends on what we need to know in the ETL flow. When we consider tracking the time of a process during a package execution, we may be tracking one of the following:
- Do we need to know the length of time for a specific execution within the SSIS package - such as the length of time for a data flow process?
- Do we need to know the length of time for the overall package execution - from start to finish?
- Do we need to know the length of time for a specific block of executions - such as the length of time for a data flow process plus data transformations within an execute SQL task?
For instance, suppose that we make a call to an API which returns a block of data, with each call generally taking a few minutes to return the data. In this example, we're trying to track the time for a specific call when we get data, which will affect how and where we add our timing information.
Option 1 - SQL Server Agent Job
If we need to know the total time that an SSIS package ran and we execute the package through a SQL Server Agent job, we can add a T-SQL timestamp step before the package executes and another one after the package finishes. In the below code, we use the day as an identifier to update the row when the package finishes - this might change depending on what the identifier is. For example, if our loader is monthly and it takes more than one day to load, we may use a month and year identifier for updating the row of data. As an alternative to get the duration only, we could also derive this information from the SQL Server Agent job history by looking at how long the job step executed. Both examples are shown below.
/*
CREATE TABLE tbDataLoaders(
 DateLog VARCHAR(12),
 DataLoader VARCHAR(100),
 StartDateTime DATETIME,
 EndDateTime DATETIME
)
*/

---- Step 1
DECLARE @date VARCHAR(12) = CONVERT(VARCHAR(12),GETDATE(),112)

INSERT INTO tbDataLoaders (DateLog,DataLoader,StartDateTime)
VALUES (@date,'EnergyDataLoaderOne Package',GETDATE())

---- Step 2
DECLARE @date VARCHAR(12) = CONVERT(VARCHAR(12),GETDATE(),112)

UPDATE tbDataLoaders
SET EndDateTime = GETDATE()
WHERE DateLog = @date

---- SQL agent information from msdb
USE msdb
GO

SELECT
 last_run_duration RunDuration
 , last_run_outcome RunOutcome
FROM sysjobsteps t
 INNER JOIN sysjobs tt ON t.job_id = tt.job_id
WHERE tt.name = 'OurDataLoader'
The duration of the executed job step in sysjobsteps reports the data in the hhmmss format. Since some environments may not allow developers to access the system databases, the alternative of stamping the time through the agent job steps may be an option that developers can use.
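If we do have access to msdb, the hhmmss value can be converted into total seconds for easier comparison. The below query is a minimal sketch of that conversion, assuming the same 'OurDataLoader' job name from the example above.

---- Convert last_run_duration (hhmmss stored as an integer) into total seconds
SELECT
 t.step_name StepName
 , t.last_run_duration RunDuration
 , (t.last_run_duration / 10000) * 3600
   + ((t.last_run_duration / 100) % 100) * 60
   + (t.last_run_duration % 100) TotalSeconds
FROM msdb.dbo.sysjobsteps t
 INNER JOIN msdb.dbo.sysjobs tt ON t.job_id = tt.job_id
WHERE tt.name = 'OurDataLoader'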
Option 2 - Use T-SQL or Stored Procedures in the SSIS package
SSIS comes with an Execute SQL Task, which allows us to call any T-SQL statement. At the beginning of the SSIS package we can write a timestamp value to a table, and at the end of the package we can close out that row with a timestamp value. This is similar to the above technique, except the step is added to the SSIS package itself. We'll borrow the T-SQL from above and add it to our example SSIS package.
Since DBAs may restrict permissions, sometimes we're limited to updating timestamps within the SSIS package. In this case, our table would live in our loading database, since we'd have permissions there. If permissions were restricted further, such as with object level permissions, we may wrap the same T-SQL in stored procedures, where the user would only need execute permission on the stored procedures and write permissions on the loading table.
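As a rough illustration of the stored procedure approach, the below T-SQL is a minimal sketch; the procedure names are assumptions (not required names) and it reuses the tbDataLoaders table from Option 1. The Execute SQL Tasks at the start and end of the package would then call these procedures.

---- Hypothetical wrapper procedures; the Execute SQL Tasks only need EXECUTE permission on these
CREATE PROCEDURE stpLogLoaderStart
 @loader VARCHAR(100)
AS
BEGIN
 DECLARE @date VARCHAR(12) = CONVERT(VARCHAR(12),GETDATE(),112)

 INSERT INTO tbDataLoaders (DateLog,DataLoader,StartDateTime)
 VALUES (@date,@loader,GETDATE())
END
GO

CREATE PROCEDURE stpLogLoaderEnd
AS
BEGIN
 DECLARE @date VARCHAR(12) = CONVERT(VARCHAR(12),GETDATE(),112)

 UPDATE tbDataLoaders
 SET EndDateTime = GETDATE()
 WHERE DateLog = @date
END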
Option 3 - Add a time column to the SSIS data flow
We may want to obtain more granular time information when a specific part of our SSIS package runs, such as tracking the time of a file's data flow. For a simple example of this, what if we're importing 100 files and want to track the time range between several files? For example: file 1 is imported at 11:01AM, file 2 is imported at 11:03AM, file 3 is imported at 11:16AM, and file 4 is imported at 11:18AM. We know that file 2 ran longer than files 1 and 3. In the below example within SSIS, we add a new column using the derived column option with a time function as part of the data flow, which stamps each row with the current time. When the example file below this imports the data, the time is tracked.
Make sure that the table has an appropriate time column for this data, as this will stamp the column with the time (in this example, the ImportTime column on the table will receive the time of the data flow). With this technique, we can be more granular in measuring the time of our tasks if we want to look at the time for a specific import.
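Once the rows carry the stamp, we can compare the stamps to estimate how long each file's data flow took. The below query is a minimal sketch, assuming a destination table named tbEnergyData with a FileName column alongside the ImportTime column from the example above - both names are illustrative, not required.

---- Estimate each file's import window from the first and last stamped rows
;WITH FileTimes AS
(
 SELECT
  FileName
  , MIN(ImportTime) FirstRowTime
  , MAX(ImportTime) LastRowTime
 FROM tbEnergyData
 GROUP BY FileName
)
SELECT
 FileName
 , FirstRowTime
 , LastRowTime
 , DATEDIFF(SECOND,FirstRowTime,LastRowTime) ImportSeconds
FROM FileTimes
ORDER BY FirstRowTime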
Option 4 - Add data and time information with an SSIS Script task
I've seen some SSIS packages which use script tasks for additional functionality where a timestamp may be required. Because script tasks use the .NET libraries, we can use the DateTime structure in .NET for tracking time. First, let's look at a simple output in a console application within Visual Studio. We'll execute the below code to validate our output - wait a moment before hitting Enter (your dates and times will differ).
using System;

namespace ConsoleExample
{
    class Program
    {
        static void Main(string[] args)
        {
            DateTime start = DateTime.Now;

            // Wait a period before hitting enter
            Console.ReadLine();

            DateTime end = DateTime.Now;

            Console.WriteLine(start.ToString() + " " + end.ToString());
            Console.ReadLine();
        }
    }
}
In a script task within SSIS, we will add a start and an end variable with the same DateTime class and separate them by 10 seconds using the System.Threading library. After the ten seconds pass, the script task will write both the start and end times to a log file in our import directory.
public void Main()
{
    // TODO: Add your code here
    DateTime start = DateTime.Now;

    System.Threading.Thread.Sleep(10000);

    DateTime end = DateTime.Now;

    using (System.IO.StreamWriter writeFile = new System.IO.StreamWriter(@"C:\import\log.txt"))
    {
        writeFile.WriteLine("Start: " + start.ToString() + "; end: " + end.ToString());
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}
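If we only need the elapsed duration rather than both timestamps, a small variation could subtract the two DateTime values and write the resulting TimeSpan instead. The below snippet is a sketch of that idea, reusing the same assumed log file path from the example above.

// Variation: log the elapsed duration instead of the raw start and end times
DateTime start = DateTime.Now;

System.Threading.Thread.Sleep(10000);

TimeSpan elapsed = DateTime.Now - start;

// Append to the same assumed log file rather than overwriting it
using (System.IO.StreamWriter writeFile = new System.IO.StreamWriter(@"C:\import\log.txt", true))
{
    writeFile.WriteLine("Elapsed seconds: " + elapsed.TotalSeconds.ToString());
}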
Next Steps
- We want to consider what permissions and level of access we'll have, as we may be limited in where we can add timestamps or track information.
- We want to consider how granular we want the tracking to be - the entire package, a set of actions within the package, etc.
- Finally, we want to consider where we want to keep this information - the import table itself, a separate tracking table, a logging table, etc.