By: Pablo Echeverria | Updated: 2018-03-21 | Comments (4) | Related: > Monitoring
Problem
As a proactive measure, you need to know when any of your SQL Server services are not running. It could be due to a failure and it's better to know about the issues before the users start to complain. Also, it is possible the SQL Server Agent stops working and you won’t notice it until you check the service manually because jobs didn’t run as scheduled.
Solution
Here is a script that checks the status of all SQL Server services across all of your instances and reports when a service is not running.
As it is regular on my scripts, it checks the servers in parallel, so in less than 5 seconds I can get the results for around 170 instances. Also, the script is able to identify active-active clusters and determine if the service is up in one of the nodes, to not report false information. You can run this on demand or schedule it as a regular job, so you can be alerted of issues.
In the script there are some variables you need to modify:
- $server: the name of the server where you have stored your server inventory
- $database: the database where you have created the inventory table
- $query: modify it depending on the names of your columns, this one queries the servers that aren’t clusters
- $query2: modify depending on the names of your columns, this one queries the clusters
- $Maxthreads: query XX servers at a time; if you’re tight on resources you can lower this number, but the results may take longer to appear.
The script works as follows:
- Query the database “Inventory” in the table “Inventory”,
which contains one entry for each instance with the details:
- Environment
- ServerName
- Cluster
- ClusterName
- Use the library “FailoverClusters” with the method “Get-ClusterGroup” to identify which node the services are running. In case it is an active-active configuration, both nodes will be returned.
- Parallelize the work, one thread for each server. If there is only one server or one active node in the cluster, it is going to store the services that are stopped. If it is an active-active cluster, it is going to get all stopped services, and then filter the services that are running on another node.
- For all identified entries, use the WMI service to check if the service is disabled or not, to filter the results and remove unneeded entries. In case it can’t find the details using WMI, the StartMode column will be empty, meaning you need to check the service manually.
- Show any error (like lack of permissions to query the services) and show the list of stopped services.
Sample Output from Monitoring Script
After running the script, here is the output.
We can see we have an error for a server that cannot be connected to. Then we see a list of all of the services that are stopped for each server.
PS C:\Users\PabloEcheverria\Desktop> .\GetServicesStatus.ps1 Production PRODUCTIONSERVER2: Cannot open Service Control Manager on computer 'PRODUCTIONSERVER2'. This operation might require other privileges. Environment ServerName Service Status StartMode ----------- ---------- ------- ------ --------- Production PROD04 SQL Server Agent (SQLEXPRESS) Stopped Test TEST01 SQL Server Browser Stopped Development DEV01 SQL Active Directory Helper Service Stopped Production PROD05 MSSQLServerADHelper Stopped Manual Production prod06node01 SQL Active Directory Helper Service Stopped Production prod06node01 SQL Server CEIP service (MSSQLSERVER) Stopped Production prod06node01 SQL Server Analysis Services CEIP (MSSQLSERVER) Stopped Production prod07node01 SQLDIAG Stopped Manual Production prod07node02 SQLDIAG Stopped Manual
Script to create Instance Inventory Table and Insert Records
This will create the table and some example records. You will need to put in the correct name for your servers.
CREATE TABLE [Inventory] ( [Environment] VARCHAR(128), [ServerName] VARCHAR(128), [Cluster] INT, [ClusterName] VARCHAR(128) ) -- insert sample records into table INSERT INTO [Inventory] VALUES ('Development', 'DEV01', 0, NULL) INSERT INTO [Inventory] VALUES ('Test', 'TEST01', 0, NULL) INSERT INTO [Inventory] VALUES ('Production', 'PROD04', 0, NULL) INSERT INTO [Inventory] VALUES ('Production', 'PROD05', 0, NULL) INSERT INTO [Inventory] VALUES ('Production', 'PROD06', 1, 'WIN06') --Active-Passive cluster INSERT INTO [Inventory] VALUES ('Production', 'PROD07', 1, 'WIN07') --Active-Active cluster -- query table results SELECT DISTINCT [Environment], [ServerName] FROM [Inventory] WHERE [Cluster] = 0 SELECT DISTINCT [Environment], [ClusterName] FROM [Inventory] WHERE [Cluster] = 1
PowerShell Script to Check SQL Server Service Status
This is the complete PowerShell script. As noted above, you will need to adjust some of the variables for your environment.
$ErrorActionPreference = "Stop" #stop when an error is encountered # Declare variables $server = "." $database = "Inventory" $query = @" SELECT DISTINCT [Environment], [ServerName] FROM [Inventory] WHERE [Cluster] = 0 "@ $query2 = @" SELECT DISTINCT [Environment], [ClusterName] FROM [Inventory] WHERE [Cluster] = 1 "@ $SleepTimer = 1000 #after X milliseconds, check if the jobs have finished. 1000 is every second. $MaxResultTime = 300 #after X seconds, all jobs are killed. 300 is 5 minutes. $Maxthreads = 20 #number of parallel jobs # import modules Import-Module SqlPs -DisableNameChecking Import-Module FailoverClusters $error.clear() #clear error generated by last command # get list of instances $objects = @(Invoke-Sqlcmd -ServerInstance $server -Database $database -Query $query) $clusters = @(Invoke-Sqlcmd -ServerInstance $server -Database $database -Query $query2) # get node(s) for the clusters and add to the list of servers foreach ($cluster in $clusters) { $services = Get-ClusterGroup -Cluster $cluster.ClusterName foreach ($service in $services) { if (!($service.Name -eq "Cluster Group") -And !($service.Name -eq "Available Storage")) { $found = 0 for ($i = 0; $i -lt $objects.length; $i++) { if ($objects[$i].ServerName.StartsWith($cluster.ClusterName)) { $found = 1 if (!$objects[$i].ServerName.contains($service.OwnerNode.Name)) { $objects[$i].ServerName = $objects[$i].ServerName + " " + $service.OwnerNode.Name } break } } if ($found -eq 0) { $obj = New-Object -TypeName PSObject $obj | Add-Member -MemberType NoteProperty -Name Environment -Value $cluster.Environment $obj | Add-Member -MemberType NoteProperty -Name ServerName -Value "$($cluster.ClusterName) $($service.OwnerNode.Name)" $objects += $obj } } } } # environment setup $output = @() $errors = "" $Jobs = @() $ISS = [system.management.automation.runspaces.initialsessionstate]::CreateDefault() $RunspacePool = [runspacefactory]::CreateRunspacePool(1, $MaxThreads, $ISS, $Host) $RunspacePool.Open() $script = { Param ([string]$environment, [string]$serverName) $ErrorActionPreference = "Stop" try { $firstSet = @() $final = @() if (!$serverName.contains(" ")) { # one server $firstSet = Get-Service -computername $serverName | where {$_.DisplayName -like "*SQL*" -and $_.Status -eq "Stopped"} | select @{Name="Environment";Expression={$environment}}, @{Name="ServerName";Expression={$serverName}}, @{Name="Service";Expression={$_.DisplayName}}, Status } else { $parts = $serverName.split(" ") if ($parts.count -eq 2) { # one active node in the cluster $firstSet = Get-Service -computername $parts[1] | where {$_.DisplayName -like "*SQL*" -and $_.Status -eq "Stopped"} | select @{Name="Environment";Expression={$environment}}, @{Name="ServerName";Expression={$parts[1]}}, @{Name="Service";Expression={$_.DisplayName}}, Status } else { # more than one active node in the cluster for ($i = 1; $i -lt $parts.count; $i++) { $secondSet += Get-Service -computername $parts[$i] | where {$_.DisplayName -like "*SQL*"} | select @{Name="Environment";Expression={$environment}}, @{Name="ServerName";Expression={$parts[$i]}}, @{Name="Service";Expression={$_.DisplayName}}, Status } $thirdSet = $secondSet | where {$_.Status -ne "Running"} for ($i = 0; $i -lt $thirdSet.count; $i++) { $running = $secondSet | where {$_.Service -eq $thirdSet[$i].Service -and $_.Status -eq "Running"} | measure if ($running.count -eq 0) { $firstSet += $thirdSet[$i]; } } } } $m = $firstSet | measure if ($m.count -gt 0) { foreach ($row in $firstSet) { $props = Get-WmiObject win32_service -computer $row.ServerName -filter "Name='$($row.Service)'" if (!$props -or ($props.StartMode -ne "Disabled")) { $final += $row | select Environment, ServerName, Service, Status, @{Name="StartMode";Expression={$props.StartMode}} } } } "" #indicate there was no error $final #return results } catch { return $environment + " " + $serverName + ": " + $LastExitCode + " " + $_ | foreach { $_.ToString() } | Out-String #Get-WmiObject error, maybe permissions } } function CreateThread() { param ([string]$environment, [string]$serverName, [ref]$Jobs) $PowershellThread = [powershell]::Create().AddScript($script) #scriptToRun $PowershellThread.AddArgument($environment) | out-null $PowershellThread.AddArgument($serverName) | out-null $PowershellThread.RunspacePool = $RunspacePool $Handle = $PowershellThread.BeginInvoke() $Job = "" | select Handle, Thread, object $Job.Handle = $Handle; $Job.Thread = $PowershellThread $Jobs.value += $Job } $ResultTimer = Get-Date #start time #start processing first task for each instance for ($i=0; $i -lt $objects.length; $i++) { CreateThread $objects[$i].Environment $objects[$i].ServerName ([ref]$Jobs) } while (@($Jobs | where {$_.Handle -ne $Null}).count -gt 0) { #update completed jobs, get errors and result, and dispose them foreach ($Job in @($Jobs | where {$_.Handle -ne $Null -and $_.Handle.IsCompleted -eq $True})) { $results = $Job.Thread.EndInvoke($Job.Handle) if ($results[0] -and $results[0] -ne "") { $errors += $results[0] } for ($i = 1; $i -lt $results.count; $i++) { $output += $results[$i] } #end thread $Job.Thread.Dispose() $Job.Thread = $Null $Job.Handle = $Null } #show progress $inProgress = @($Jobs | where {$_.Handle.IsCompleted -eq $False}).count Write-Progress ` -Activity "Gathering data" ` -PercentComplete (($objects.length - $inProgress) * 100 / $objects.length) ` -Status "$inProgress pending" #exit on timeout $currentTime = Get-Date if (($currentTime - $ResultTimer).totalseconds -gt $MaxResultTime) { Write-Error "Child script appears to be frozen, try increasing MaxResultTime" break } #sleep Start-Sleep -Milliseconds $SleepTimer } #dispose thread pools $RunspacePool.Close() | Out-Null $RunspacePool.Dispose() | Out-Null $errors #return errors $output | Format-Table -Auto #return results
Notes:
If you are running this on a machine that does not have the Failover Clustering features installed you may get an error.
There are two options to fix this error:
- You can comment out this line "Import-Module FailoverClusters" in the PowerShell script using a # in front of the line. If you take this approach you should mark all of your SQL Server instances as non-clustered in the Inventory table.
- You can install the Remote Server Administration Tools. Here are two
articles about this:
- https://social.technet.microsoft.com/Forums/scriptcenter/en-US/d377b496-5441-4896-8706-d60ffda75d6b/can-i-install-the-powershell-failoverclusters-on-a-nonclustered-server?forum=ITCG
- https://stackoverflow.com/questions/36333362/failoverclusters-module-is-not-getting-listed-while-running-the-cmdlet-get-modul
Next Steps
- Modify the script for your inventory table to see which services are stopped.
- Tweak the process to only check for certain services like the database engine and SQL Agent.
- Tweak the process to allow you to ignore certain processes for specific instances.
- You can set this script to run as a scheduled job and notify you periodically on stopped services. Check my other tip on sending a formatted email here.
- You can learn more about Windows services for SQL Server here.
- You can learn more about services isolation here.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2018-03-21