By: David Bird | Updated: 2011-01-12 | Comments (5) | Related: > Clustering
Problem
Have you noticed that when a failover is initiated on a SQL Server cluster it does not always do so successfully? Often nothing is wrong, it's just taking too long to confirm a resource or Windows id. The cause may vary but the solution is the same. You log onto the cluster server and bring the cluster resource group online. So how can you automate this?
Solution
Use the Windows Cluster Command to query the cluster resource group's status. Then when any resources are found offline, execute a subsequent cluster command to bring them online. A remote server is used to execute the commands to insure their execution regardless of the state of the cluster.
The cluster command supports remote executions from a Windows non-clustered server making it very flexible.
A Windows batch command script is used to execute the cluster commands and identify the cluster resource groups needing help. It will be slightly customized for each cluster. Although this is not a SQL Server component, SQL Agent is used to schedule the execution of the scripts. As an alternative, Windows Scheduled Tasks could be used, but that would require additional scripting not covered in this article.
In addition to bringing cluster resource groups online, there are several other requirements:
- Notification when offline resource groups are detected
- Keeping a history of output to assist when issues arise
- Automate this process through scheduling
This article and its attachments address all of these requirements.
Technical Requirements
- Windows clustered servers
- Remote server to act as the monitoring server to run the scripts
- SQL Agent on monitoring server
- SMTP Mail access from monitoring server
- Windows cluster command on monitoring server
- A domain id with local administrator permission on the Windows cluster and remote access from the monitoring server executing the cluster commands.
Monitoring Server
The server you use to schedule and execute the cluster monitoring script needs to be reliable with few external dependencies. It should also:- use internal storage
- stand-alone server (not clustered or virtual)
- have a reboot schedule different from the servers it is monitoring
Step by Step Instructions
The SQL Agent job uses multiple steps to meet all the requirements mentioned earlier. The instructions for creating each step are provided in tandem with instructions on customizing the respective scripts. The script changes will involve server names, cluster resources, email addresses, and other items unique to your environment. Nothing complex.Attached to this article are the scripts used and a sample SQL Agent job. The scripts are required, but the SQL Agent Job is optional. It has been provided as a sample to help get you started and for comparison with the job you create.
SQL Agent Job Outline
- Create one-step for each cluster to check its cluster resources and bring them online.
- Send e-mail when resources are found offline.
- Keep history.
- Purge old history.
Step 1 - Create Job
- Connect to your server that will be used to monitor the clustered servers.
- Create the new job "DBA - Bring Cluster Resources Online". As an alternative, you can use the attached sample Job SQL script.
- Confirm the SQL Agent service is assigned a windows domain id with permission to execute the cluster command on the remote cluster.
- Create the following folders on the server for the scripts and output
E:\DBA_Scripts\Cluster
E:\DBA_Scripts\SendEmail
E:\Output\Job\Cluster_online
Step 2 - Check Cluster Step(s)
The script BringResourcesOnline_SERVERNAME.cmd was designed to be as simple as possible. It checks all the cluster resources on the Windows cluster. If any are found offline, a cluster command is executed to bring them online. If a resource group happens to be online when the cluster command is executed nothing will happen. Please note the cluster status values used are for Windows 2003, they may vary with different versions of Windows operating systems.Execution of the cluster command is quick with the exception of when the Windows cluster name resource is down. Then it hangs for awhile. When this happens the job should hang as well. Any cluster problems that require more than bringing a cluster group online is out of scope for this script.
Create Cluster Online Script
- Edit the sample script BringResourcesOnline_SERVERNAME.cmd.
- Set variable ServerName to Windows cluster name. Do not use the SQL Server cluster name because it might be offline.
- Review the cluster group names in the "Bring all Groups Online" section. Change any group name to match the resource groups on your Windows cluster. Add additional lines if you have more than two cluster resource groups.
Example: "cluster %ServerName% groupname "Group" /ONLINE" - Save the script file in the Cluster folder. Include the name of Windows cluster being monitored in the file name.
Create Cluster Check Step
- Create new step
- General Pane
- Type: Operating system (CmdExec)
- Run as: SQL Server Agent Service Account
- Command: call E:\DBA_Scripts\Cluster\BringResourcesOnline_SERVERNAME.cmd
* Specify your drive, folder, and script name used.
- Advance Pane
- Specify an output file name Cluster_online.txt with the job output directory E:\Output\Job\Cluster_online
- Check the box "Append output to existing file"
- Check the box "Include step output in history"
Note: Do not worry about the appending to the file, the output file will be renamed in a later step. - Click OK to save
Step 3 - Send E-mail
This step checks the output file Cluster_online.txt for any cluster resources that need to be brought online. When found, an email is sent with the output file as an attachment. This is why the output is appended together, so only one email is sent. Two scripts are used; one checks the captured output and the other sends email.Customize Send Email Script
- Edit send_offline_email.cmd
- Set variable ClusterOuput to the full location of the file Cluster_online.txt
- Set variable EmailGroup to a valid email address. If more than one address is to be used, separate them with semicolons.
- Set variable SendmailCommand to the full location of the SendMailCDO.vbs script
Customize SendMailCDO.vbs Script
The sample code used for SendMailCDO.vbs was published by Paul Sadowski. I made some modifications, which are documented in the script.- Edit SendMailCDO.vbs
- Change YOUR.STMP.COM to your SMTP server.
- Save in SendMail folder. I use a separate folder because it might be used by other jobs/scripts in the future.
Create Email Step
- Create new step
- Type: Operating system (CmdExec)
- Run as: SQL Server Agent Service Account
- Command: call E:\DBA_Scripts\Cluster\send_offline_email.cmd
* Specify your drive and folder used. - On Advance pane, check the box "Include step output in history"
- Click OK to save
Step 4 - Output History
Rename the output file Cluster_online.txt by appending the name with the current date and time.Customize Rename Script
- Edit rename.cmd
- Set variable ClusterOutput to the full for the the output file Cluster_online.txt.
Example E:\Output\Job\Cluster_online\ - Save file in Cluster folder
Create Rename Step
- Create new step
- Type: Operating system (CmdExec)
- Run as: SQL Server Agent Service Account
- Command: call E:\DBA_Scripts\Cluster\rename.cmd
* Specify your drive and folder used. - On Advance pane, check the box "Include step output in history"
- Click OK to save
Step 5 - Delete Old Output Files
This step uses two scripts. The first provides the path and delete settings for the Cluster_online.txt files. The second is a VBScript published by Microsoft used to delete old files.Customize Delete Script
- Edit: delete_Cluster_online_files.cmd
- Set variable ScriptFolder to E:\DBA_Scripts\Cluster\ to the location of DeleteFiles.vbs.
Example: E:\DBA_Scripts\Cluster\ - Set variable ClusterOutput to the location of this jobs output files.
Example E:\Output\Job\Cluster_online\ - The script is set to delete any files not accessed in the last 30 days. Change the value 30 to a more appropiate setting if need be.
- Save file in Cluster folder
Script DeleteFiles.vbs
This script was published by Microsoft. There are no changes for this script.- Save file in Cluster folder
Create Delete Step
- Create new step
- Type: Operating system (CmdExec)
- Run as: SQL Server Agent Service Account
- Command: call E:\DBA_Scripts\Cluster\delete_Cluster_online_files.cmd
* Specify your drive and folder used. - On Advance pane, check the box "Include step output in history"
- Click OK to save
Finished
- Save the job.
- Add a schedule once you have fully tested it.
- If you used the sample job included with this article, you will need to enable it when you are ready and add a schedule.
Testing and Scheduling
Testing this job and scripts can be a little tricky if you do not have a cluster to play with. You can run each script manually to verify the execution is as expected and then run the entire job for one cluster. Then add more clusters as you feel more confident.I schedule this job to run 1 hour after a scheduled reboot. It continues to run every 30 minutes until the start of the normal business day.
Sample Cluster Commands
cluster servername node /status
cluster servername group /status
cluster servername group "SQL Group" /online
Sample Execution Output:
Sample Execution Output for Partially Online Resource Group
Conclusion
The goal of this article is to provide you with the commands and scripts needed to bring a Windows cluster resource online after a failed failover. It is nothing fancy and does not address any issues when resources are truly unavailable. It acts as a second attempt to bring the cluster resources online in place of someone manually having to do this. You may never need this script but after a few early Sunday morning wake up calls, you might find it useful.
Next Steps
- Download scripts for this tip
- Check out these additional tips about clustering
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2011-01-12