By: Pablo Echeverria | Updated: 2022-09-22 | Comments | Related: > SQL Server and Oracle Comparison
Problem
My company has multiple scheduled recurring activities in both SQL Server and Oracle and sometimes the jobs hang. How can I handle this so processing can continue with minimal interruptions?
Solution
Job hangs can happen for numerous reasons:
- A deadlock occurs and doesn't get cleared automatically
- The job finished but is reported as still running; maybe the agent or the server rebooted
- A package is corrupted, causing it to fail intermittently
- A local resource is temporarily crowded
- An executable failed with a transient error
- Network communication got interrupted
- Remote resources are temporarily unavailable
- DNS intermittent malfunction, maybe it was updated recently, or one replica is failing
- Waiting for user input before closing
- No free space left for writing the log
- A command needs to finish graciously before starting the next command
- Antivirus blocking the job at that moment
- A remote database session got killed and was unable to be recovered/terminated locally
- A remote database query failed due to any of the reasons above
When any of these occur, sometimes you can't handle it in the code (retry/rollback). If the job can be restarted, you can automate this task to continue processing information with minimal disruption. However, be careful to exclude long-running jobs (suspected corrupt data, waiting on resource locked by another session, big logfile, etc.) or when the failure must be fixed first (network configuration changes, corrupt database files, disk full, etc.), as that will adversely affect the performance of those jobs and will only increase the time they take to complete.
How do you automate this task in SQL Server 2019 and Oracle Windows 19c, and what are the differences?
Oracle
In Oracle Windows 19c, there is a standard way to create this setup:
- Fun with Event Based Oracle Scheduling
- MAX_RUN_DURATION Attribute of the SCHEDULER to Setup a Sniper Job
Often, it's still not clear or straightforward even after reading the following documentation:
- Scheduler (DBMS_SCHEDULER) Enhancements in Oracle Database 12c Release 1 (12.1)
- Using Events to Start Jobs
- Advanced Queuing User's Guide
- DBMS_SCHEDULER
So, I'm going to show a more straightforward approach that is easier to understand and can be extended and adapted to your environment. This will be simpler than creating the job, a scheduler program (with its metadata argument EVENT_MESSAGE being thrown), and a stored procedure (which receives the SYS.SCHEDULER$_EVENT_INFO). And once you understand how it works, it will be much easier to create the scheduler program and stored procedure to improve performance as recommended by Oracle experts.
First, you need to subscribe to event-based notifications sent from the scheduler event queue. To do this, run the command below, logged in as a DBA user, taking note of the name assigned:
BEGIN DBMS_SCHEDULER.ADD_EVENT_QUEUE_SUBSCRIBER('DATA_EVENT'); END; /
Note: Use the command below to remove the subscription:
BEGIN DBMS_SCHEDULER.REMOVE_EVENT_QUEUE_SUBSCRIBER('DATA_EVENT'); END; /
To check which users are subscribed to event-based notifications from the scheduler event queue, run the following commands. The ones added are the last row in the first output and the last two rows in the second output.
SET LINESIZE 300 COL OWNER FOR A5 COL QUEUE_TABLE FOR A21 COL CONSUMER_NAME FOR A25 COL RULE FOR A35 SELECT OWNER, QUEUE_TABLE, CONSUMER_NAME, RULE FROM DBA_QUEUE_SUBSCRIBERS WHERE QUEUE_NAME = 'SCHEDULER$_EVENT_QUEUE'; OWNER QUEUE_TABLE CONSUMER_NAME RULE ----- --------------------- ------------------------- ----------------------------------- SYS SCHEDULER$_EVENT_QTAB ILM_AGENT SYS SCHEDULER$_EVENT_QTAB SCHEDULER$_EVENT_AGENT SYS SCHEDULER$_EVENT_QTAB DATA_EVENT tab.user_data.object_owner = 'SYS' COL OBJECT_NAME FOR A30 SELECT OWNER, OBJECT_NAME, OBJECT_TYPE, STATUS FROM DBA_OBJECTS WHERE OBJECT_NAME LIKE '%SCHEDULER$_EVENT_QUEUE%'; OWNER OBJECT_NAME OBJECT_TYPE STATUS ----- ------------------------------ ----------------------- ------- SYS SCHEDULER$_EVENT_QUEUE_R RULE SET VALID SYS SCHEDULER$_EVENT_QUEUE_N RULE SET VALID SYS SCHEDULER$_EVENT_QUEUE QUEUE VALID SYS SCHEDULER$_EVENT_QUEUE UNDEFINED VALID SYS SCHEDULER$_EVENT_QUEUE$21 RULE SET VALID SYS SCHEDULER$_EVENT_QUEUE$21 RULE VALID
Now let's show the query to check which jobs have exceeded their max duration. It will only return records after you modify the existing jobs (to be explained later). The table DBA_SCHEDULER_JOBS has information about the existing jobs. We're interested in the last time the job ran and if it's running. The table SYS.AQ$SCHEDULER$_EVENT_QTAB means AQ for Advanced Queuing and QTAB for Queue Table. Note: SE.USER_DATA is an object, so it must be cast appropriately to get its nested columns EVENT_TIMESTAMP and EVENT_TYPE. We're interested at what time the job exceeded its max duration:
COL OWNER FOR A10 COL JOB_NAME FOR A10 COL EVENT_TIMESTAMP FOR A15 COL EVENT_TYPE FOR A16 COL LAST_START_DATE FOR A10 COL STATE FOR A10 SELECT J.OWNER, J.JOB_NAME, TREAT(SE.USER_DATA AS SYS.SCHEDULER$_EVENT_INFO).EVENT_TIMESTAMP EVENT_TIMESTAMP, TREAT(SE.USER_DATA AS SYS.SCHEDULER$_EVENT_INFO).EVENT_TYPE EVENT_TYPE, J.LAST_START_DATE, J.STATE FROM DBA_SCHEDULER_JOBS J INNER JOIN SYS.AQ$SCHEDULER$_EVENT_QTAB SE ON TREAT(SE.USER_DATA AS SYS.SCHEDULER$_EVENT_INFO).OBJECT_OWNER = J.OWNER AND TREAT(SE.USER_DATA AS SYS.SCHEDULER$_EVENT_INFO).OBJECT_NAME = J.JOB_NAME WHERE TREAT(SE.USER_DATA AS SYS.SCHEDULER$_EVENT_INFO).EVENT_TIMESTAMP >= J.LAST_START_DATE AND TREAT(SE.USER_DATA AS SYS.SCHEDULER$_EVENT_INFO).EVENT_TYPE = 'JOB_OVER_MAX_DUR' AND J.STATE = 'RUNNING' ORDER BY J.LAST_START_DATE;
With this query, we can create an event-based job that will stop other jobs that have exceeded their max duration. You can filter them as needed by modifying the query. The second step is to create the job with JOB_ACTION containing the previous query escaping single quotes and stopping the job with DBMS_SCHEDULER.STOP_JOB. Specifying this job will launch when the event type is JOB_OVER_MAX_DUR, specifying the queue type SYS.SCHEDULER$_EVENT_QUEUE and specifying the subscriber name created initially. Note: TAB.USER_DATA is a special collection and must be used exactly as that.
BEGIN DBMS_SCHEDULER.CREATE_JOB( job_name => 'STOP_HUNG_JOBS', job_type => 'PLSQL_BLOCK', job_action => ' BEGIN FOR i IN ( SELECT DISTINCT OWNER||''.''||JOB_NAME JOB FROM ( SELECT J.OWNER, J.JOB_NAME, TREAT(SE.USER_DATA AS SCHEDULER$_EVENT_INFO).EVENT_TIMESTAMP EVENT_TIMESTAMP, TREAT(SE.USER_DATA AS SCHEDULER$_EVENT_INFO).EVENT_TYPE EVENT_TYPE, J.LAST_START_DATE, J.STATE FROM DBA_SCHEDULER_JOBS J INNER JOIN SYS.AQ$SCHEDULER$_EVENT_QTAB SE ON TREAT(SE.USER_DATA AS SCHEDULER$_EVENT_INFO).OBJECT_OWNER = J.OWNER AND TREAT(SE.USER_DATA AS SCHEDULER$_EVENT_INFO).OBJECT_NAME = J.JOB_NAME WHERE TREAT(SE.USER_DATA AS SCHEDULER$_EVENT_INFO).EVENT_TIMESTAMP >= J.LAST_START_DATE AND TREAT(SE.USER_DATA AS SCHEDULER$_EVENT_INFO).EVENT_TYPE = ''JOB_OVER_MAX_DUR'' AND J.STATE = ''RUNNING'' ORDER BY J.LAST_START_DATE)) LOOP DBMS_SCHEDULER.STOP_JOB(i.JOB, TRUE); END LOOP; END; ', event_condition => 'TAB.USER_DATA.EVENT_TYPE = ''JOB_OVER_MAX_DUR''', queue_spec => 'SYS.SCHEDULER$_EVENT_QUEUE, DATA_EVENT', enabled => TRUE); END; /
Logged in with the user MYDB, you can create a job with the query below to test it. It is set to wait for 120 seconds, to launch every minute, and to raise the event JOB_OVER_MAX_DUR if it has been running for more than 60 seconds, which is the smallest possible value.
BEGIN DBMS_SCHEDULER.CREATE_JOBS( JOB_DEFINITION_ARRAY( JOB_DEFINITION( job_name => '"MYDB"."TEST"', job_type => 'PLSQL_BLOCK', job_action => 'DBMS_LOCK.SLEEP(120);', repeat_interval => 'FREQUENCY=MINUTELY;INTERVAL=1;', max_run_duration => INTERVAL '60' SECOND, enabled => TRUE, number_of_arguments => 0))); END; /
To specify the MAX_RUN_DURATION for an existing job is as follows:
BEGIN DBMS_SCHEDULER.SET_ATTRIBUTE( name => '"MYDB"."TEST"', attribute => 'MAX_RUN_DURATION', value => NUMTODSINTERVAL(60, 'SECOND')); END; /
You can check the current active jobs and their configuration with the queries below:
SET LINESIZE 300 SET PAGESIZE 500 COL OWNER FOR A5 COL JOB_NAME FOR A22 COL STATE FOR A9 COL REPEAT_INTERVAL FOR A30 COL LAST_START_DATE FOR A16 COL LAST_RUN_DURATION FOR A16 COL NEXT_RUN_DATE FOR A16 COL START_DATE FOR A16 COL MAX_RUN_DURATION FOR A15 SELECT OWNER, JOB_NAME, STATE, LAST_START_DATE, LAST_RUN_DURATION, NEXT_RUN_DATE FROM DBA_SCHEDULER_JOBS WHERE JOB_NAME IN ('TEST','STOP_HUNG_JOBS') AND ENABLED='TRUE'; OWNER JOB_NAME STATE LAST_START_DATE LAST_RUN_DURATIO NEXT_RUN_DATE ----- ---------------------- --------- ---------------- ---------------- ---------------- SYS STOP_HUNG_JOBS SCHEDULED 30-AUG-22 09.07. +000000000 00:00 26.017000 PM AME :00.297000 RICA/MEXICO_CITY MYDB TEST RUNNING 30-AUG-22 09.07. 30-AUG-22 09.07. 26.314000 PM AME 04.205000 PM AME RICA/MEXICO_CITY RICA/MEXICO_CITY SELECT OWNER, JOB_NAME, START_DATE, REPEAT_INTERVAL, ENABLED, MAX_RUN_DURATION FROM DBA_SCHEDULER_JOBS WHERE JOB_NAME IN ('TEST','STOP_HUNG_JOBS') AND ENABLED='TRUE'; OWNER JOB_NAME START_DATE REPEAT_INTERVAL ENABL MAX_RUN_DURATIO ----- ---------------------- ---------------- ------------------------------ ----- --------------- SYS STOP_HUNG_JOBS TRUE MYDB TEST 30-AUG-22 09.02. FREQUENCY=MINUTELY;INTERVAL=1; TRUE +000 00:01:00 04.283000 PM AME RICA/MEXICO_CITY
You can see what happens in each schedule with the query and output below:
COL LOG_DATE FOR A10 COL OWNER FOR A5 COL JOB_NAME FOR A22 COL STATUS FOR A10 COL REQ_START_DATE FOR A10 COL ACTUAL_START_DATE FOR A10 COL RUN_DURATION FOR A13 COL ADDITIONAL_INFO FOR A10 SET LINESIZE 300 SET PAGESIZE 500 SELECT LOG_DATE, OWNER, JOB_NAME, STATUS, ACTUAL_START_DATE, RUN_DURATION, ADDITIONAL_INFO FROM DBA_SCHEDULER_JOB_RUN_DETAILS WHERE JOB_NAME IN ('TEST','STOP_HUNG_JOBS') ORDER BY ACTUAL_START_DATE; LOG_DATE OWNER JOB_NAME STATUS ACTUAL_STA RUN_DURATION ADDITIONAL ---------- ----- ---------------------- ---------- ---------- ------------ ---------- 30-AUG-22 MYDB TEST STOPPED 30-AUG-22 +000 00:01:18 REASON="St 08.33.37.0 09.32.18.3 op job wit 17000 PM - 61000 PM A h force ca 06:00 MERICA/MEX lled by us ICO_CITY er: 'SYS'" 30-AUG-22 SYS STOP_HUNG_JOBS SUCCEEDED 30-AUG-22 +000 00:00:01 08.33.37.0 09.33.35.8 17000 PM - 77000 PM A 06:00 MERICA/MEX ICO_CITY 30-AUG-22 MYDB TEST STOPPED 30-AUG-22 +000 00:01:18 REASON="St 08.34.55.6 09.33.37.0 op job wit 89000 PM - 17000 PM A h force ca 06:00 MERICA/MEX lled by us ICO_CITY er: 'SYS'" 30-AUG-22 SYS STOP_HUNG_JOBS SUCCEEDED 30-AUG-22 +000 00:00:01 08.34.55.6 09.34.54.8 89000 PM - 92000 PM A 06:00 MERICA/MEX ICO_CITY
By creating this setup, you have automated the hung job stop functionality, and in the next job schedule, it can continue processing data without noticing and requiring manual intervention.
Additionally, to clear the scheduler job run details for any reason, you can do it with the following:
DELETE FROM DBA_SCHEDULER_JOB_RUN_DETAILS WHERE JOB_NAME IN ('TEST','STOP_HUNG_JOBS');
To clear the queue events, use the following:
DECLARE PO DBMS_AQADM.AQ$_PURGE_OPTIONS_T; BEGIN PO.BLOCK := FALSE; DBMS_AQADM.PURGE_QUEUE_TABLE( QUEUE_TABLE => 'SCHEDULER$_EVENT_QTAB', PURGE_CONDITION => NULL, PURGE_OPTIONS => PO); END; /
Once you've confirmed the stop job works successfully, you can drop the test job:
BEGIN DBMS_SCHEDULER.STOP_JOB('MYDB.TEST'); DBMS_SCHEDULER.DISABLE('MYDB.TEST'); DBMS_SCHEDULER.DROP_JOB('MYDB.TEST', TRUE); END; /
The first command stops the job if it's running, the second disables the job from running in the next schedule, and the third deletes the job. Now you need to set the MAX_RUN_DURATION for the existing jobs you want to be stopped, and you're done.
SQL Server
No job property indicates the max time it should run, nor a way to raise an event when it has exceeded a certain amount of time. However, in the schedules, you can specify an active end time, and one job can be associated with multiple schedules, so this can be used to create a job that stops hung jobs.
Let's show the query to check which jobs have exceeded the schedule active end time, but it will only return records after you specify this property in a schedule:
SELECT js.schedule_id, j.name job_name, msdb.dbo.agent_datetime(CONVERT(VARCHAR, ja.start_execution_date, 112), s.active_end_time) MustFinishBy FROM msdb.dbo.sysjobs j INNER JOIN msdb.dbo.sysjobschedules js ON js.job_id = j.job_id INNER JOIN msdb.dbo.sysschedules s ON s.schedule_id = js.schedule_id INNER JOIN msdb.dbo.sysjobactivity ja ON ja.job_id = j.job_id WHERE ja.start_execution_date IS NOT NULL AND s.active_end_time IS NOT NULL
With this query, we can create a job that will stop other jobs that have exceeded the scheduled active end time; you can filter them as needed by modifying the query. The steps below create the job, associate it with the local server, add a step to the job, add a schedule to run it daily (3rd parameter) every (5th parameter) minute (4th parameter), and attach the schedule to the job:
EXEC msdb.dbo.sp_add_job 'STOP_HUNG_JOBS', 1 EXEC msdb.dbo.sp_add_jobserver @job_name='STOP_HUNG_JOBS', @server_name = @@SERVERNAME EXEC msdb.dbo.sp_add_jobstep @job_name='STOP_HUNG_JOBS', @step_id=1, @step_name='STOP_HUNG_JOBS', @command=' DECLARE @cmd VARCHAR(MAX); SET @cmd = ''''; SELECT @cmd=@cmd+''EXEC msdb.dbo.sp_stop_job ''''''+job_name+'''''';'' FROM ( SELECT DISTINCT job_name FROM ( SELECT js.schedule_id, j.name job_name, msdb.dbo.agent_datetime(CONVERT(VARCHAR, ja.start_execution_date, 112), s.active_end_time) MustFinishBy FROM msdb.dbo.sysjobs j INNER JOIN msdb.dbo.sysjobschedules js ON js.job_id = j.job_id INNER JOIN msdb.dbo.sysschedules s ON s.schedule_id = js.schedule_id INNER JOIN msdb.dbo.sysjobactivity ja ON ja.job_id = j.job_id WHERE ja.start_execution_date IS NOT NULL AND s.active_end_time IS NOT NULL) A WHERE GETDATE() > A.MustFinishBy) B; EXEC (@cmd); ' EXEC msdb.dbo.sp_add_schedule 'MinutelyEvery1Minute', 1, 4, 1, 0x4, 1 EXEC msdb.dbo.sp_attach_schedule @job_name='STOP_HUNG_JOBS', @schedule_name='MinutelyEvery1Minute'
To test this job, you can create schedules per minute programmatically with the query below. They're created to run every minute and set to end the minute after it starts:
DECLARE @cnt INT = 0, @min VARCHAR(2), @nextmin VARCHAR(2), @hr VARCHAR(2), @nexthr VARCHAR(2); WHILE @cnt < 1440 BEGIN SET @min = RIGHT('0'+CAST(@cnt%60 AS VARCHAR(2)), 2) SET @nextmin = RIGHT('0'+CAST((@cnt+1)%60 AS VARCHAR(2)), 2) SET @hr = RIGHT('0'+CAST(@cnt/60 AS VARCHAR(2)), 2) SET @nexthr = CASE WHEN @nextmin='00' THEN RIGHT('0'+CAST((@cnt/60)+1 AS VARCHAR(2)), 2) ELSE @hr END SET @nexthr = CASE WHEN @nexthr='24' THEN '00' ELSE @nexthr END EXEC ('EXEC msdb.dbo.sp_add_schedule '''+@hr+':'+@min+''', 1, 4, 1, @active_start_time='''+@hr+@min+'00'', @active_end_time='''+@nexthr+@nextmin+'00''') SET @cnt = @cnt + 1; END;
If you already have a schedule, you can update it with the query below, 000100 means the first minute after midnight:
EXEC msdb.dbo.sp_update_schedule @name='00:00', @active_end_time = 000100;
You can check the existing schedules and their active end time with the query below:
SELECT name, active_start_date, active_start_time, active_end_date, active_end_time FROM msdb.dbo.sysschedules
Note: The results look strange:
But, they display correctly when you view the schedules:
Also, note that the last schedule, even when we set it to 00:00:00, changed to 11:59:59 PM.
Now you can create the test job as follows:
EXEC msdb.dbo.sp_add_job 'test', 1 EXEC msdb.dbo.sp_add_jobserver @job_name='test', @server_name = @@SERVERNAME EXEC msdb.dbo.sp_add_jobstep @job_name='test', @step_id=1, @step_name='step1', @command='WAITFOR DELAY ''00:02:00'';'
And you can associate the job with the created schedules programmatically:
DECLARE @cnt INT = 0, @min VARCHAR(2), @hr VARCHAR(2); WHILE @cnt < 1440 BEGIN SET @min = RIGHT('0'+CAST(@cnt%60 AS VARCHAR(2)), 2) SET @hr = RIGHT('0'+CAST(@cnt/60 AS VARCHAR(2)), 2) EXEC ('EXEC msdb.dbo.sp_attach_schedule @job_name=''test'', @schedule_name='''+@hr+':'+@min+'''') SET @cnt = @cnt + 1; END;
You can see what happens in each schedule as described in this tip: Script for SQL Server Agent Job Issues Across All Instances.
In my case, the output of all jobs history is below:
By creating this setup, you have automated the hung job stop functionality. In the next job schedule, it can continue processing data without noticing or requiring manual intervention.
Once you've confirmed the stop job works successfully, you can drop the test job, which also deletes the per minute schedules created earlier and its job history:
EXEC msdb.dbo.sp_delete_job @job_name='test'
Now you just need to set the active_end_time for the existing schedules you want to be stopped, and you're done.
Next Steps
You can learn more about SQL Server scheduled jobs in the links below:
- Create Azure SQL Database Scheduled Jobs
- Simple way to create a SQL Server Job Using T-SQL
- How To Schedule SQL Scripts On SQL Server Express Edition
- Automatic cleanup of SQL Server Agent scheduled jobs
- SQL Server Agent Job Schedule Reporting
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2022-09-22