By: Aakash Patel | Updated: 2019-05-30 | Comments (3) | Related: > Integration Services Administration
Problem
SQL Server Integration Services (SSIS) executions sometimes go off the rails and runs indefinitely instead of failing gracefully. It is extremely important to stop such executions to allow new instances to kick off at the next scheduled interval. In this tip, we will set up a basic monitoring and alerting mechanism to stop orphaned SSIS executions.
Solution
As described in the problem section, I have noticed multiple times that SSIS executions go orphaned for one reason or the other and silently eat server resources. Not only that, these executions won’t stop until a server restart. During a server restart, the start-up stored procedure ([dbo].[sp_ssis_startup]) will take care of these ghost executions, but we probably do not want to wait until a server restart and let it steal server resources.
Unfortunately, I do not have a demo script to create orphaned executions, but most of the time I have noticed this situation due to severed connections either due to a remote server restart or some form of network interruption while a data transfer is in progress.
To overcome this situation of ghost executions, I wrote a monitoring solution. The monitoring solution consists of a configuration table and a stored procedure. We have a dedicated user database to hold monitoring and maintenance utilities developed by the DBA group, but you can create these objects in any user database.
We will go through the logic step by step. The complete stored procedure is at the end of this article.
Step 1: Configuration for Monitoring and Alerting
This is a table-driven monitoring solution. We will first create a table to hold configuration data for our monitoring solution. The configuration data will include SSIS package name along with its environment name (optional), threshold duration in seconds and alerting email address.
The [threshold_time_sec] duration represents the maximum amount of time (in seconds) each package should take to finish. Any package execution running over that duration will be picked up by the monitoring procedure and will be stopped. The [alert_email] column holds the values of email address for the group of people/person, to be notified if there is an issue.
---- create table to hold configuration data CREATE TABLE [dbo].[ssis_monitor_configure] ( [package_name] [nvarchar](100) NOT NULL, [environment_name] [nvarchar](100) NULL, [threshold_time_sec] [int] NULL, [monitored] [bit] NULL, [alert_email] [varchar](100) NULL ) ON [primary]; GO CREATE CLUSTERED INDEX ix_ssis_monitor_configure_package_name_env_name_monitored ON dbo.ssis_monitor_configure (package_name, environment_name, monitored); GO
Step 2: Populating Configuration Data
Once the table is created, we will insert the configuration data for each package. You can use the below query to get started and populate it with SSIS package data from your environment. MAKE SURE to tune/change the configuration data as per your need. Alternatively, you can start with just a few packages in the configuration table and grow from there.
---- populate configuration data INSERT INTO [dbo].[ssis_monitor_configure] ([package_name] , [environment_name] , [threshold_time_sec] , [monitored] , [alert_email]) SELECT DISTINCT p.name 'pacakge_name', NULL 'environment_name', --populate environment name 3600 'threshold_sec', --change it as per your need? 0, --monitored '[email protected]' 'alert' --alert email FROM ssisdb.internal.packages p (NOLOCK);
Step 3: Find Long Running Executions
We will query the SSISDB catalog system view to find long running executions on the server and load them into a temporary table.
-- load long running package information in a temp table SELECT ex.execution_id, ex.package_name, ex.environment_name, ex.caller_name, DATEDIFF(SECOND, CAST(ex.start_time AS datetime2), SYSDATETIME()) AS 'run_time_sec', smc.threshold_time_sec, smc.alert_email INTO #execution_info FROM ssisdb.catalog.executions(nolock) ex JOIN dbo.ssis_monitor_configure(nolock) smc ON ex.package_name = smc.package_name WHERE ex.end_time IS NULL -- still running AND ex.status = 2 -- ( 1-created , 2-running ,3-canceled,4-failed ,5-pending,6-ended unexpectedly,7-succeeded ,8-stopping, 9-completed ) AND (ex.environment_name = smc.environment_name OR smc.environment_name IS NULL) AND (DATEDIFF(SECOND, CAST(ex.start_time AS datetime2), SYSDATETIME()) > smc.threshold_time_sec);
Step 4: Alert On-Call Person if Necessary
If there are multiple long running executions on the server (more than defined threshold) – we will alert the on call device instead of taking any automated action as there may be a bigger issue in the environment and we probably want some human intervention.
-- alert DBA On call if long running instances are higher than threshold IF (@@ROWCOUNT > @oncallthreshold) BEGIN SET @emailsubject = 'SSIS Alert: ' + @@servername + ': More than ' + CAST(@oncallthreshold AS varchar(10)) + ' long running SSIS executions detected.' EXEC msdb.dbo.sp_send_dbmail @recipients = @dbaOncall, @subject = @emailsubject, @body_format = 'HTML'; RETURN; END
Step 5: Stop Executions Automatically
In case of only a few long running executions, we will go through each execution and stop them. We will use a cursor as we must stop these executions one by one. We will use system stored procedure (ssisdb.catalog.stop_operation) to stop these executions and notify the appropriate group/person entered in the configuration table.
DECLARE ssis_monitor_cursor CURSOR FAST_FORWARD FOR SELECT execution_id, package_name, alert_email FROM #execution_info; OPEN ssis_monitor_cursor; FETCH NEXT FROM ssis_monitor_cursor INTO @execution_id, @package_name, @alertemail; WHILE @@fetch_status = 0 BEGIN --stop the long running execution EXEC ssisdb.catalog.stop_operation @operation_id = @execution_id; --build email body SET @emailsubject = 'SSIS Alert: ' + @@servername + ': ' + @package_name + ' is past defined threshold and has been stopped'; SET @tableHTML = N'<table border="2" cellpadding="2" cellspacing="0">' + CHAR(13); SET @tableHTML = @tableHTML + N'<tr><th>execution_id</th><th>package_name</th><th>environment_name</th><th>calling_user</th><th>exec_time_seconds</th><th>threshold_sec</th></tr>' + CHAR(13); SET @tableHTML = @tableHTML + CAST((SELECT execution_id AS td, package_name AS td, COALESCE(environment_name, 'not defined') AS td, caller_name AS td, run_time_sec AS td, threshold_time_sec AS td FROM #execution_info WHERE execution_id = @execution_id FOR xml RAW ('tr'), ELEMENTS) AS nvarchar(max)) + CHAR(13); SET @tableHTML = @tableHTML + N'</table>' + '<br>' + CHAR(13); --- notify dba group EXEC msdb.dbo.sp_send_dbmail @recipients = @alertemail, @subject = @emailsubject, @body = @tableHTML, @body_format = 'HTML'; FETCH NEXT FROM ssis_monitor_cursor INTO @execution_id, @package_name, @alertemail; END; CLOSE ssis_monitor_cursor; DEALLOCATE ssis_monitor_cursor;
Step 6: User Inputs
Make sure to change the following variable values or column values in the stored procedure/configuration table for this solution to work.
- @dbaOncall - email address for the on-call device
- @oncallthreshold - threshold for number of long running executions to alert on call device
- [threshold_time_sec] - column in the configuration table dbo.ssis_monitor_configure
- [alert_email] - column in the configuration table dbo.ssis_monitor_configure
Finally, hook up the stored procedure to a SQL Server Agent Job and run it on scheduled interval to monitor for ghost executions of SSIS packages.
Complete SSIS Monitoring Stored Procedure
Here is the complete script.
CREATE PROCEDURE [dbo].[ssis_monitor] ---------------------------------------------------------------------------------------------------------------------------- -- LOCATION : SSIS servers -- AUTHOR : Aakash Patel -- DATE : 05/09/2018 -- INPUTS : -- OUTPUTS : -- DEPENDENCIES : none -- DEPENDENTS : none -- DESCRIPTION : This proc checks for long running ssis executions and stops them based on the defined threshold from dbo.ssis_monitor_configure table -- which app calls this proc : agent job -- FREQUENCY : every xx seconds -- MODIFICATION HISTORY : -- 05/09/2018 – Aakash Patel : Initial Build ---------------------------------------------------------------------------------------------------------------------------- AS BEGIN BEGIN TRY DECLARE @execution_id bigint DECLARE @package_name nvarchar(100) DECLARE @tableHTML varchar(max) DECLARE @emailsubject AS varchar(150) DECLARE @alertemail varchar(50) DECLARE @dbaOncall varchar(50) = '[email protected]' --email address for on call device DECLARE @oncallthreshold tinyint = 5 --threshold for long running instances to alert on call device -- load long running package information in a temp table SELECT ex.execution_id, ex.package_name, ex.environment_name, ex.caller_name, DATEDIFF(SECOND, CAST(ex.start_time AS datetime2), SYSDATETIME()) AS 'run_time_sec', smc.threshold_time_sec, smc.alert_email INTO #execution_info FROM ssisdb.catalog.executions(nolock) ex JOIN dbo.ssis_monitor_configure(nolock) smc ON ex.package_name = smc.package_name WHERE ex.end_time IS NULL -- still running AND ex.status = 2 -- ( 1-created , 2-running ,3-canceled,4-failed ,5-pending,6-ended unexpectedly,7-succeeded ,8-stopping, 9-completed ) AND (ex.environment_name = smc.environment_name OR smc.environment_name IS NULL) AND (DATEDIFF(SECOND, CAST(ex.start_time AS datetime2), SYSDATETIME()) > smc.threshold_time_sec); -- alert DBA On call if long running instances are higher than threshold IF (@@ROWCOUNT > @oncallthreshold) BEGIN SET @emailsubject = 'SSIS Alert: ' + @@servername + ': More than ' + CAST(@oncallthreshold AS varchar(10)) + ' long running SSIS executions detected.' EXEC msdb.dbo.sp_send_dbmail @recipients = @dbaOncall, @subject = @emailsubject, @body_format = 'HTML'; RETURN; END -- loop through long running executions and stop them ELSE BEGIN DECLARE ssis_monitor_cursor CURSOR FAST_FORWARD FOR SELECT execution_id, package_name, alert_email FROM #execution_info; OPEN ssis_monitor_cursor; FETCH NEXT FROM ssis_monitor_cursor INTO @execution_id, @package_name, @alertemail; WHILE @@fetch_status = 0 BEGIN --stop the long running execution EXEC ssisdb.catalog.stop_operation @operation_id = @execution_id; --build email body SET @emailsubject = 'SSIS Alert: ' + @@servername + ': ' + @package_name + ' is past defined threshold and has been stopped'; SET @tableHTML = N'<table border="2" cellpadding="2" cellspacing="0">' + CHAR(13); SET @tableHTML = @tableHTML + N'<tr><th>execution_id</th><th>package_name</th><th>environment_name</th><th>calling_user</th><th>exec_time_seconds</th><th>threshold_sec</th></tr>' + CHAR(13); SET @tableHTML = @tableHTML + CAST((SELECT execution_id AS td, package_name AS td, COALESCE(environment_name, 'not defined') AS td, caller_name AS td, run_time_sec AS td, threshold_time_sec AS td FROM #execution_info WHERE execution_id = @execution_id FOR xml RAW ('tr'), ELEMENTS) AS nvarchar(max)) + CHAR(13); SET @tableHTML = @tableHTML + N'</table>' + '<br>' + CHAR(13); --- notify dba group EXEC msdb.dbo.sp_send_dbmail @recipients = @alertemail, @subject = @emailsubject, @body = @tableHTML, @body_format = 'HTML'; FETCH NEXT FROM ssis_monitor_cursor INTO @execution_id, @package_name, @alertemail; END; CLOSE ssis_monitor_cursor; DEALLOCATE ssis_monitor_cursor; END; END TRY ----- error handling BEGIN CATCH --insert your standard error handling code END CATCH END;
Next Steps
- Check out these other SSIS Administration Tips
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2019-05-30