By: Jeffrey Yao | Updated: 2015-05-27 | Comments (10) | Related: > Replication
Problem
I have many SQL Server transactional replication sessions to monitor, for some implementations, we have one strict requirement, i.e. the latency should be less than 90 seconds, otherwise, there may be timeout errors inside front-end application. Is there a good way to monitor the health of our SQL Server replication implementation?
Solution
There are many ways to monitor replication health, such as checking dbo.MSRepl_Errors table in the distribution database, check various replication SQL Server Agent Jobs (distribution jobs, log reader job, etc.), using various replication stored procedures, such as sp_replcounter, sp_repltrans and others, etc. Actually if you Google "Monitoring SQL Server Transactional Replication", you will see numerous links.
For replication latency monitoring, the best way is probably via a tracer token.
In this tip, I will use Replication Management Object (RMO) and PowerShell to send trace tokens on the publisher and then measure the latency of the token as it is sent from publisher to distributor and then from the distributor to the subscriber. All the latency data can be saved to a table for further analysis and processing. The latency value can be a good indicator for replication health.
There are two advantages with this RMO/PowerShell approach:
- High scalability: we can monitor multiple replication systems easily and quickly
- Low configuration requirement: only a list of distributor names is needed, so the solution is efficient with minimum administrative overhead
Solution Explanation
- With distributor names, loop through each distributor and enumerate the publishers that use the distributor
- For each publisher, check whether it has subscribers, if it has subscribers, enumerate the publication databases on this publisher
- For each publication database which has publication(s), enumerate its publications
- For each publication, post a tracer token
- Wait for ($threshold) seconds as defined in $threshold input parameter
- Same steps as 1, 2, 3
- For each publication, enumerate its tracer tokens
- For each tracer token, enumerate its history
- Log the history data into a datatable, and return the datatable
#version=11.0.0.0 means sql server 2012 client sdk installed add-type -AssemblyName "Microsoft.SqlServer.Rmo, version=11.0.0.0, Culture=Neutral, PublicKeyToken=89845dcd8080cc91" add-type -AssemblyName "Microsoft.SqlServer.Replication, version=11.0.0.0, Culture=Neutral, PublicKeyToken=89845dcd8080cc91"; add-type -AssemblyName "Microsoft.SqlServer.ConnectionInfo, version=11.0.0.0, Culture=Neutral, PublicKeyToken=89845dcd8080cc91"; #Import-Module sqlps -DisableNameChecking; #Set-Location c: <# .SYNOPSIS Test the transactional replication latency. Supports SQL Server versions 2005 and above. Assume the distribution agent is running continuously. .PARAMETER Distributors One or more distributor sql instance names. Default value is the current server name where the cmdlet starts .PARAMETER Threshold The wait time in seconds, after tracer tokens are posted, but before the cmdlet tries to check whether the tracer token has arrived on subscriber The default value is 15 (seconds) .EXAMPLE Test-ReplicationLatency -Distributor server1, server2 -threshold 10 Check the transactional replication latency on all publications of the publishers whose distributors are server1 or server2 .NOTES Author: Jeffrey Yao Requires: PowerShell Version 3.0, SQL Server SMO, .OUTPUTS A DataTable that contains distributor, publisher, subscriber and latency information #> Set-StrictMode -Version 3; Function Test-ReplicationLatency { [CmdletBinding()] param ( [Parameter(Mandatory=$false, ValueFromPipeLine=$true, Position=0)] [string[]]$Distributors=$env:ComputerName, [Parameter(Mandatory=$false, Position=1)] [int]$Threshold=15 #wait for $Threshold seconds, default 15 seconds, before retrieving the tracer token status ) [string[]]$distributor_list= $Distributors; $dt = New-Object "system.data.datatable"; $col = new-object "system.data.DataColumn" ('TokenID', [System.String]); $dt.Columns.Add($col); $col = new-object "system.data.DataColumn" ('PublisherCommitTime', [System.DateTime]); $dt.Columns.Add($col); $col = new-object "system.data.DataColumn" ('Distributor_Latency', [System.Int16]); $dt.Columns.Add($col); $col = new-object "system.data.DataColumn" ('Subscriber', [System.String]); $dt.Columns.Add($col); $col = new-object "system.data.DataColumn" ('Subscriber_DB', [System.String]); $dt.Columns.Add($col); $col = new-object "system.data.DataColumn" ('Subscriber_Latency', [System.Int16]) $dt.Columns.Add($col); $col = new-object "system.data.DataColumn" ('Overall_Latency', [System.Int16]) $dt.Columns.Add($col); $col = new-object "system.data.DataColumn" ('Publisher', [System.String]); $dt.Columns.Add($col); $col = new-object "system.data.DataColumn" ('Publisher_DB', [System.String]); $dt.Columns.Add($col); $col = new-object "system.data.DataColumn" ('Publication', [System.String]); $dt.Columns.Add($col); $col = new-object "system.data.DataColumn" ('Distributor', [System.String]); $dt.Columns.Add($col); $col = new-object "system.data.DataColumn" ('Threshold', [System.Int16]); $dt.Columns.Add($col); foreach ($dist in $distributor_list) { try { #[string]$distributor = $dist; #'VHOL1415960\sql2014' #'CGYSQLP22' #'cgysqlt06','trlsqlp90' $distsvr = new-object "Microsoft.SqlServer.Replication.ReplicationServer" $dist foreach ($dp in $distsvr.DistributionPublishers) {# each publisher on the distributor $pubsvr= new-object "Microsoft.SqlServer.Replication.ReplicationServer" $dp.name; if (($pubsvr.EnumRegisteredSubscribers()).count -gt 0) # the publisher has subscribers, so we will send tracer token { foreach ($db in ($pubsvr.ReplicationDatabases | where {$_.HasPublications})) { #each Publication Database, post token $db.TransPublications | % { #post tracer toke on each publication $_.PostTracerToken() | Out-Null; } #post tracer toke on each publication }#each Publication Database, post token } } } catch { $Error[0].Exception.Message; } } Start-Sleep -Seconds $threshold; #wait for $threshold seconds foreach ($dist in $distributor_list) { try { #$dist; [boolean]$del_token = $true; [int]$TokenID =0; [datetime]$PubCommitTime=Get-Date; $distsvr = new-object "Microsoft.SqlServer.Replication.ReplicationServer" $dist foreach ($dp in $distsvr.DistributionPublishers) {# each publisher on the distributor $pubsvr= new-object "Microsoft.SqlServer.Replication.ReplicationServer" $dp.name; if (($pubsvr.EnumRegisteredSubscribers()).count -gt 0) # the publisher has subscribers, so we have sent tracer token in last loop { foreach ($db in ($pubsvr.ReplicationDatabases | where {$_.HasPublications -eq $true})) { #each Publication Database, collect token $db.TransPublications | % { #collect token history of each publication $pub = $_; $PubMon=new-object "microsoft.sqlserver.replication.PublicationMonitor"($pub.name, $pubsvr.DistributionDatabase, $pubsvr.name, $db.name, $dist); if ($PubMon.LoadProperties()) #LoadProperties() should return true, meaning $PubMon is correctly created { $PubMon.EnumTracerTokens() | % { $TokenID = $_.TracerTokenID; $PubCommitTime = $_.PublisherCommitTime; $PubMon.EnumTracerTokenHistory($TokenID).Tables[0] | % { $r = $dt.NewRow(); $r.TokenID = $TokenID; $r.PublisherCommitTime = $PubCommitTime; $r.Distributor_Latency = $_.distributor_latency; $r.Subscriber = $_.subscriber; $r.Subscriber_DB = $_.Subscriber_db; $r.Subscriber_Latency = $_.subscriber_latency; $r.Overall_Latency = $_.overall_latency; $r.Publisher = $pubsvr.name; $r.Publication = $pub.name; $r.Publisher_DB = $db.name; $r.Distributor = $dist; $r.Threshold = $Threshold; $dt.rows.add($r); ## when we have multiple subscribers, we want to make sure ## even if there is one subscriber that has not received the tracer token, we will not delete this tracer if ($r.Overall_Latency -is 'DBNull') { $del_token = $false;} } #clean up the tracer token after it arrives at the subscriber side already if ($del_token) { $PubMon.CleanUpTracerTokenHistory($TokenID);} else { #reset $del_token to true $del_token = $true; } } } #PubMon.LoadProperties() -eq $true; } #collect token history of each publication } #each Publication Database , collect token } } # each publisher on the distributor }#try catch { $error[0].Exception.Message;} } Write-Output $dt -NoEnumerate; }# Test-ReplicationLatency
The following is an example I run on my laptop, where I have two SQL Server instances, [tp_w520] serves as both a distributor and publisher and [tp_w520\SQL2K8R2] serves as a subscriber.
We can actually set up a SQL Server Agent Job or a Windows Task to run every [X] minutes and then save the results to a database table, and based on this table, we can setup various alerts for latency trend analysis.
The following is an example to save the results from the previous Test-ReplicationLatency cmdlet into a SQL Server table
We first need to create a table:
use MSSQLTips if object_id('dbo.tblReplicationLatency', 'U') is not null drop table dbo.tblReplicationLatency; create table dbo.tblReplicationLatency (id int identity primary key , Publisher varchar(100) , Publisher_DB varchar(100) , Publication varchar(100) , Subscriber varchar(100) , Subscriber_DB varchar(100) , Distributor_Latency int , Subscriber_Latency int , Overall_Latency int , Distributor varchar(100) , LogDate datetime default current_timestamp); go
We then need a PowerShell function to write data into the table:
#This Save-DataTable is used to write a datatable into a sql server table function Save-DataTable { [cmdletbinding()] param ( [parameter(Mandatory=$true)] [string] $SQLInstance, [parameter (Mandatory=$true, ValueFromPipeline=$true)] $SourceDataTable, [parameter (mandatory=$true)] [string] $DestinationDB, [parameter (mandatory=$true)] [string] $DestinationTable, # can be two-part naming convention, i.e. [schema_name].[table_name] [hashtable] $ColumnMapping=@{} ) try { $conn = New-Object System.Data.SqlClient.SqlConnection ("Server=$SQLInstance; Database=$DestinationDB; trusted_connection=TRUE"); $conn.Open(); $bulkcopy = New-Object System.Data.SqlClient.SqlBulkCopy($conn); $bulkcopy.DestinationTableName=$DestinationTable; #you may change to your own table if ($ColumnMapping.count -gt 0) { $ColumnMapping.keys | % {$bc_mapping = new-object System.Data.SqlClient.SqlBulkCopyColumnMapping($_, $ColumnMapping[$_]); $bulkcopy.ColumnMappings.Add($bc_mapping); } | Out-Null; }# mapping columns needed $bulkcopy.WriteToServer($SourceDataTable); } catch { Write-Error $_; } finally { $bulkcopy.close(); $conn.Close(); } return; } # Save-DataTable
In PowerShell IDE, we run the following script
[hashtable]$mapping=@{ 'Publisher'='Publisher'; 'Publisher_DB'='Publisher_DB'; 'Publication'='Publication'; 'Subscriber'='Subscriber'; 'Subscriber_DB'='Subscriber_DB'; 'Distributor_Latency'='Distributor_Latency'; 'Subscriber_Latency'='Subscriber_Latency'; 'Overall_Latency'='Overall_Latency'; 'Distributor'='Distributor'}; Test-ReplicationLatency -Distributors tp_w520 -Threshold 15 | Save-DataTable -SQLInstance tp_w520 -DestinationDB MSSQLTips -DestinationTable 'dbo.tblReplicationLatency' -ColumnMapping $mapping;
After a few runs, we can check the table and see something like the following:
Summary
This tip shows how to use PowerShell and RMO together to check the transactional replication health via tracer tokens. With some additional coding, we can collect the result into a table, and then use the table to setup alerts if we find:
- The latency is beyond a defined threshold
- Analyze the latency trend at different time windows
- Proactively check the publication database's transaction log size when long distributor latency is recorded because a large transaction log may mean a long read time for the log reader
- Check network traffic load between the distributor and subscriber when long subscriber latency is detected
Next Steps
Replication monitoring can be done in multiple ways, but using RMO and PowerShell seems a better way in terms of scalability and efficiency. You may read other great tips about replication latency:
- Monitor SQL Server replication latency using tracer tokens
- Determine data latency between Publisher and Subscriber in SQL Server Transactional Replication
- SQL Server Replication Interview Questions
- Troubleshooting transactional replication latency issues in SQL Server
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2015-05-27