Find SSIS Package Stuck in Running Status

By:   |   Updated: 2019-05-30   |   Comments (3)   |   Related: > Integration Services Administration


Problem

SQL Server Integration Services (SSIS) executions sometimes go off the rails and runs indefinitely instead of failing gracefully. It is extremely important to stop such executions to allow new instances to kick off at the next scheduled interval.  In this tip, we will set up a basic monitoring and alerting mechanism to stop orphaned SSIS executions.

Solution

As described in the problem section, I have noticed multiple times that SSIS executions go orphaned for one reason or the other and silently eat server resources. Not only that, these executions won’t stop until a server restart. During a server restart, the start-up stored procedure ([dbo].[sp_ssis_startup]) will take care of these ghost executions, but we probably do not want to wait until a server restart and let it steal server resources.

Unfortunately, I do not have a demo script to create orphaned executions, but most of the time I have noticed this situation due to severed connections either due to a remote server restart or some form of network interruption while a data transfer is in progress.

To overcome this situation of ghost executions, I wrote a monitoring solution. The monitoring solution consists of a configuration table and a stored procedure. We have a dedicated user database to hold monitoring and maintenance utilities developed by the DBA group, but you can create these objects in any user database.

We will go through the logic step by step. The complete stored procedure is at the end of this article.

Step 1: Configuration for Monitoring and Alerting

This is a table-driven monitoring solution. We will first create a table to hold configuration data for our monitoring solution. The configuration data will include SSIS package name along with its environment name (optional), threshold duration in seconds and alerting email address.

The [threshold_time_sec] duration represents the maximum amount of time (in seconds) each package should take to finish. Any package execution running over that duration will be picked up by the monitoring procedure and will be stopped.  The [alert_email] column holds the values of email address for the group of people/person, to be notified if there is an issue.

---- create table to hold configuration data
CREATE TABLE [dbo].[ssis_monitor_configure] (
  [package_name] [nvarchar](100) NOT NULL,
  [environment_name] [nvarchar](100) NULL,
  [threshold_time_sec] [int] NULL,
  [monitored] [bit] NULL,
  [alert_email] [varchar](100) NULL
) ON [primary];
GO
CREATE CLUSTERED INDEX ix_ssis_monitor_configure_package_name_env_name_monitored
ON dbo.ssis_monitor_configure (package_name, environment_name, monitored);
GO

Step 2: Populating Configuration Data

Once the table is created, we will insert the configuration data for each package. You can use the below query to get started and populate it with SSIS package data from your environment. MAKE SURE to tune/change the configuration data as per your need. Alternatively, you can start with just a few packages in the configuration table and grow from there.

---- populate configuration data

INSERT INTO [dbo].[ssis_monitor_configure] ([package_name]
, [environment_name]
, [threshold_time_sec]
, [monitored]
, [alert_email])
  SELECT DISTINCT
    p.name 'pacakge_name',
    NULL 'environment_name', --populate environment name
    3600 'threshold_sec', --change it as per your need? 
    0, --monitored 
    '[email protected]' 'alert' --alert email
  FROM ssisdb.internal.packages p (NOLOCK);

Step 3: Find Long Running Executions

We will query the SSISDB catalog system view to find long running executions on the server and load them into a temporary table.

-- load long running package information in a temp table
SELECT
  ex.execution_id,
  ex.package_name,
  ex.environment_name,
  ex.caller_name,
  DATEDIFF(SECOND, CAST(ex.start_time AS datetime2), SYSDATETIME()) AS 'run_time_sec',
  smc.threshold_time_sec,
  smc.alert_email INTO #execution_info
FROM ssisdb.catalog.executions(nolock) ex
JOIN dbo.ssis_monitor_configure(nolock) smc
  ON ex.package_name = smc.package_name
WHERE ex.end_time IS NULL  -- still running 
AND ex.status = 2 -- ( 1-created , 2-running ,3-canceled,4-failed ,5-pending,6-ended unexpectedly,7-succeeded ,8-stopping, 9-completed )
AND (ex.environment_name = smc.environment_name
OR smc.environment_name IS NULL)
AND (DATEDIFF(SECOND, CAST(ex.start_time AS datetime2), SYSDATETIME()) > smc.threshold_time_sec);

Step 4: Alert On-Call Person if Necessary

If there are multiple long running executions on the server (more than defined threshold) – we will alert the on call device instead of taking any automated action as there may be a bigger issue in the environment and we probably want some human intervention.

-- alert DBA On call if long running instances are higher than threshold 
IF (@@ROWCOUNT > @oncallthreshold)
BEGIN
  SET @emailsubject = 'SSIS Alert: ' + @@servername + ': More than ' + CAST(@oncallthreshold AS varchar(10)) + ' long running SSIS executions detected.'
  EXEC msdb.dbo.sp_send_dbmail @recipients = @dbaOncall,
                               @subject = @emailsubject,
                               @body_format = 'HTML';
  RETURN;
END   

Step 5: Stop Executions Automatically

In case of only a few long running executions, we will go through each execution and stop them. We will use a cursor as we must stop these executions one by one. We will use system stored procedure (ssisdb.catalog.stop_operation) to stop these executions and notify the appropriate group/person entered in the configuration table.

DECLARE ssis_monitor_cursor CURSOR FAST_FORWARD FOR
SELECT
  execution_id,
  package_name,
  alert_email
FROM #execution_info;
OPEN ssis_monitor_cursor;

FETCH NEXT FROM ssis_monitor_cursor INTO @execution_id, @package_name, @alertemail;

WHILE @@fetch_status = 0
BEGIN

  --stop the long running execution
  EXEC ssisdb.catalog.stop_operation @operation_id = @execution_id;

  --build email body
  SET @emailsubject = 'SSIS Alert: ' + @@servername + ': ' + @package_name + ' is past defined threshold and has been stopped';
  SET @tableHTML = N'<table border="2" cellpadding="2" cellspacing="0">' + CHAR(13);
  SET @tableHTML = @tableHTML + N'<tr><th>execution_id</th><th>package_name</th><th>environment_name</th><th>calling_user</th><th>exec_time_seconds</th><th>threshold_sec</th></tr>' + CHAR(13);
  SET @tableHTML = @tableHTML + CAST((SELECT
    execution_id AS td,
    package_name AS td,
    COALESCE(environment_name, 'not defined') AS td,
    caller_name AS td,
    run_time_sec AS td,
    threshold_time_sec AS td
  FROM #execution_info
  WHERE execution_id = @execution_id
  FOR xml RAW ('tr'), ELEMENTS)
  AS nvarchar(max)) + CHAR(13);
  SET @tableHTML = @tableHTML + N'</table>' + '<br>' + CHAR(13);

  --- notify dba group 
  EXEC msdb.dbo.sp_send_dbmail @recipients = @alertemail,
                               @subject = @emailsubject,
                               @body = @tableHTML,
                               @body_format = 'HTML';

  FETCH NEXT FROM ssis_monitor_cursor INTO @execution_id, @package_name, @alertemail;

END;

CLOSE ssis_monitor_cursor;
DEALLOCATE ssis_monitor_cursor;

Step 6: User Inputs

Make sure to change the following variable values or column values in the stored procedure/configuration table for this solution to work.

  • @dbaOncall - email address for the on-call device
  • @oncallthreshold - threshold for number of long running executions to alert on call device
  • [threshold_time_sec] - column in the configuration table dbo.ssis_monitor_configure
  • [alert_email] - column in the configuration table dbo.ssis_monitor_configure

Finally, hook up the stored procedure to a SQL Server Agent Job and run it on scheduled interval to monitor for ghost executions of SSIS packages.

Complete SSIS Monitoring Stored Procedure

Here is the complete script.

CREATE PROCEDURE [dbo].[ssis_monitor]
----------------------------------------------------------------------------------------------------------------------------
-- LOCATION : SSIS servers 
-- AUTHOR : Aakash Patel
-- DATE : 05/09/2018
-- INPUTS : 
-- OUTPUTS : 
-- DEPENDENCIES : none
-- DEPENDENTS : none
-- DESCRIPTION : This proc checks for long running ssis executions and stops them based on the defined threshold from dbo.ssis_monitor_configure table
-- which app calls this proc  : agent job
-- FREQUENCY : every xx seconds
-- MODIFICATION HISTORY : 
-- 05/09/2018 – Aakash Patel : Initial Build 
----------------------------------------------------------------------------------------------------------------------------
AS
BEGIN
  BEGIN TRY
    DECLARE @execution_id bigint
    DECLARE @package_name nvarchar(100)
    DECLARE @tableHTML varchar(max)
    DECLARE @emailsubject AS varchar(150)
    DECLARE @alertemail varchar(50)
    DECLARE @dbaOncall varchar(50) = '[email protected]'  --email address for on call device
    DECLARE @oncallthreshold tinyint = 5  --threshold for long running instances to alert on call device

    -- load long running package information in a temp table

    SELECT
      ex.execution_id,
      ex.package_name,
      ex.environment_name,
      ex.caller_name,
      DATEDIFF(SECOND, CAST(ex.start_time AS datetime2), SYSDATETIME()) AS 'run_time_sec',
      smc.threshold_time_sec,
      smc.alert_email INTO #execution_info
    FROM ssisdb.catalog.executions(nolock) ex
    JOIN dbo.ssis_monitor_configure(nolock) smc
      ON ex.package_name = smc.package_name
    WHERE ex.end_time IS NULL  -- still running 
    AND ex.status = 2 -- ( 1-created , 2-running ,3-canceled,4-failed ,5-pending,6-ended unexpectedly,7-succeeded ,8-stopping, 9-completed )
    AND (ex.environment_name = smc.environment_name
    OR smc.environment_name IS NULL)
    AND (DATEDIFF(SECOND, CAST(ex.start_time AS datetime2), SYSDATETIME()) > smc.threshold_time_sec);

    -- alert DBA On call if long running instances are higher than threshold 
    IF (@@ROWCOUNT > @oncallthreshold)
    BEGIN
      SET @emailsubject = 'SSIS Alert: ' + @@servername + ': More than ' + CAST(@oncallthreshold AS varchar(10)) + ' long running SSIS executions detected.'
      EXEC msdb.dbo.sp_send_dbmail @recipients = @dbaOncall,
                                   @subject = @emailsubject,
                                   @body_format = 'HTML';
      RETURN;
    END
    -- loop through long running executions and stop them
    ELSE
    BEGIN
      DECLARE ssis_monitor_cursor CURSOR FAST_FORWARD FOR
      SELECT
        execution_id,
        package_name,
        alert_email
      FROM #execution_info;
      OPEN ssis_monitor_cursor;

      FETCH NEXT FROM ssis_monitor_cursor INTO @execution_id, @package_name, @alertemail;

      WHILE @@fetch_status = 0
      BEGIN

        --stop the long running execution
        EXEC ssisdb.catalog.stop_operation @operation_id = @execution_id;

        --build email body
        SET @emailsubject = 'SSIS Alert: ' + @@servername + ': ' + @package_name + ' is past defined threshold and has been stopped';
        SET @tableHTML = N'<table border="2" cellpadding="2" cellspacing="0">' + CHAR(13);
        SET @tableHTML = @tableHTML + N'<tr><th>execution_id</th><th>package_name</th><th>environment_name</th><th>calling_user</th><th>exec_time_seconds</th><th>threshold_sec</th></tr>' + CHAR(13);
        SET @tableHTML = @tableHTML + CAST((SELECT
          execution_id AS td,
          package_name AS td,
          COALESCE(environment_name, 'not defined') AS td,
          caller_name AS td,
          run_time_sec AS td,
          threshold_time_sec AS td
        FROM #execution_info
        WHERE execution_id = @execution_id
        FOR xml RAW ('tr'), ELEMENTS)
        AS nvarchar(max)) + CHAR(13);
        SET @tableHTML = @tableHTML + N'</table>' + '<br>' + CHAR(13);

        --- notify dba group 
        EXEC msdb.dbo.sp_send_dbmail @recipients = @alertemail,
                                     @subject = @emailsubject,
                                     @body = @tableHTML,
                                     @body_format = 'HTML';

        FETCH NEXT FROM ssis_monitor_cursor INTO @execution_id, @package_name, @alertemail;

      END;

      CLOSE ssis_monitor_cursor;
      DEALLOCATE ssis_monitor_cursor;
    END;

  END TRY

  ----- error handling 
  BEGIN CATCH
  --insert your standard error handling code
  END CATCH
END;
Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Aakash Patel Aakash Patel is a Senior SQL Server DBA for a software firm in Connecticut with 10+ years of experience.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2019-05-30

Comments For This Article




Tuesday, June 14, 2022 - 2:31:24 PM - Joel Back To Top (90161)
What is the purpose of the monitored field? I don't see it utilized in any of the code

Thursday, December 2, 2021 - 12:03:18 AM - nasa Back To Top (89511)
Excellent. This helped me resolve the issue.
For an error "Cannot resolve the collation conflict between "Latin1_General_CI_AS" and "SQL_Latin1_General_CP1_CI_AS" in the equal to operation."


i created the table with appropriate collation and Bingo IT WORKED:

CREATE TABLE #ssis_monitor_configure(
[package_name] [nvarchar](100)collate SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[environment_name] [nvarchar](100) collate SQL_Latin1_General_CP1_CI_AS NULL,
[threshold_time_sec] [int] NULL,
[monitored] [bit] NULL,
[alert_email] [varchar](100) collate SQL_Latin1_General_CP1_CI_AS NULL
) ON [primary];
GO

Monday, August 26, 2019 - 4:41:59 AM - Kenny Hanberg Back To Top (82136)

Thanks for sharing this solution Aakash.

Locking only for packagenames and environments, though is far from enough on our setup.

Had to add folder and project names to the solution.















get free sql tips
agree to terms