Disk Latency for SQL Server Database and Transaction Log Files

By: Eli Leiba   |   Updated: 2019-08-19   |   Comments (9)   |   Related: > Performance Tuning


Problem

I need to create a method of reporting which SQL Server database data and log files have latency issues. The criterion for measuring how busy each file's IO is, the "average total latency", is calculated per file as the ratio between the IO stalls and the sum of read and write requests, as shown below.

average total latency = io_stall / (num_of_reads + num_of_writes)

For example, a file with 5,000 ms of accumulated IO stalls across 200 reads and 300 writes has an average total latency of 5,000 / 500 = 10 ms.
Solution

My solution involves creating a T-SQL stored procedure in the SQL Server master database, called dbo.usp_FindTopVirtualFilesStatsIOLatency, that collects the needed IO information for the database data and transaction log files.

The statistics for all of the files are exposed through the sys.dm_io_virtual_file_stats system function. This function takes a database_id and a file_id as parameters (passing NULL for both returns all files) and provides cumulative IO statistics, such as the number of reads and writes for each data and transaction log file. It also reports the IO stall time, which is the total time users waited for IO to complete on each file. From these columns the average total latency can be computed.
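
As a quick illustration, the raw inputs to the calculation can be queried directly. Here is a minimal sketch (passing NULL for both parameters returns every file in every database):

SELECT database_id,
   file_id,
   num_of_reads,    -- cumulative read requests since instance startup
   num_of_writes,   -- cumulative write requests since instance startup
   io_stall,        -- total time (ms) users waited for IO on this file
   io_stall / NULLIF (num_of_reads + num_of_writes, 0) AS average_tot_latency
FROM sys.dm_io_virtual_file_stats (NULL, NULL)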

The IO stall metric is considered a general indicator of performance problems. High stall times signal I/O bottlenecks, which can point to busy physical disks or to queries that return large data sets to the client.

IO stalls mostly stem from three causes:

  1. A poorly performing disk subsystem
  2. Badly defined queries
  3. An overloaded disk subsystem

The procedure takes an integer parameter (@N) that indicates how many files are displayed in the result set.

The procedure joins the results from the sys.dm_io_virtual_file_stats system function with the system view sys.master_files (joined by the file and database id columns) and reports the database name, the average total latency, the file size, the file name and path.

SQL Server Stored Procedure to Show Latency for Database Files

USE master
GO

-- =================================================================================
-- Author:         Eli Leiba
-- Create date:    07-2019
-- Procedure Name: dbo.usp_FindTopVirtualFilesStatsIOLatency
-- Description:
--                 This procedure finds the names and paths of the top N data or log
--                 files on the server with the highest average total IO latency.
-- ==================================================================================
CREATE PROCEDURE dbo.usp_FindTopVirtualFilesStatsIOLatency (@N INT = 3)
AS
BEGIN
   SET NOCOUNT ON

   SELECT TOP (@N) DB_NAME (a.database_id) AS dbname,
      -- average total latency (ms); NULLIF guards against division by zero for idle files
      a.io_stall / NULLIF (a.num_of_reads + a.num_of_writes, 0) AS average_tot_latency,
      ROUND ((a.size_on_disk_bytes / SQUARE (1024.0)), 1) AS size_mb,
      b.physical_name AS [fileName]
   FROM sys.dm_io_virtual_file_stats (NULL, NULL) AS a
      INNER JOIN sys.master_files AS b
         ON a.database_id = b.database_id
            AND a.file_id = b.file_id
   ORDER BY average_tot_latency DESC

   SET NOCOUNT OFF
END
GO			

Example Stored Procedure Execution and Results

Report the five files with the highest average total latency on my SQL Server disk system:

USE master
GO

EXEC dbo.usp_FindTopVirtualFilesStatsIOLatency @N=5
GO			

And the results are (on my server):

[Screenshot: result set from the procedure, with the ContainedDatabase transaction log file at the top]

From the above screenshot we can see that the transaction log file for the ContainedDatabase database has the highest latency based on the calculation. We can also see, from the .ldf file extensions, that all of the reported files are transaction log files.

Next Steps
  • You can create and compile this simple procedure in your master database and use it as a simple tool for finding the top N data or transaction log files by average total IO latency on your SQL Server.
  • The data gathered by this stored procedure can justify allocating new hardware resources, in addition to guiding performance tuning of queries for the databases residing on the reported files. Since the counters are cumulative, you can also compare two snapshots to measure latency over a specific interval, as sketched below.
  • The procedure was tested on SQL Server 2014 and 2017, but should work with SQL Server 2005 - 2017.
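
Because sys.dm_io_virtual_file_stats accumulates its counters from instance startup, the procedure reports averages over the whole uptime. Here is a rough sketch of a snapshot-delta variant; the 30-second interval and the #snap temp table name are arbitrary choices for illustration:

-- Take a baseline snapshot of the cumulative counters
SELECT database_id, file_id, num_of_reads, num_of_writes, io_stall
INTO #snap
FROM sys.dm_io_virtual_file_stats (NULL, NULL)

-- Wait out the measurement interval (30 seconds here)
WAITFOR DELAY '00:00:30'

-- Average latency only for the IO issued during the interval
SELECT DB_NAME (a.database_id) AS dbname,
   b.physical_name AS [fileName],
   (a.io_stall - s.io_stall) /
      NULLIF ((a.num_of_reads + a.num_of_writes) -
              (s.num_of_reads + s.num_of_writes), 0) AS interval_latency_ms
FROM sys.dm_io_virtual_file_stats (NULL, NULL) AS a
   INNER JOIN #snap AS s
      ON a.database_id = s.database_id AND a.file_id = s.file_id
   INNER JOIN sys.master_files AS b
      ON a.database_id = b.database_id AND a.file_id = b.file_id
ORDER BY interval_latency_ms DESC

DROP TABLE #snap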





About the author
MSSQLTips author Eli Leiba Eli Leiba is a senior application DBA, a teacher and a senior database consultant with 19 years of RDBMS experience.




Comments For This Article




Tuesday, August 27, 2019 - 11:37:00 AM - Greg Robidoux

Hi Jeffrey,

You can just run this code:

SELECT TOP (5) DB_NAME (a.database_id) AS dbname,
  a.io_stall / NULLIF (a.num_of_reads + a.num_of_writes, 0) AS average_tot_latency,
  ROUND ((a.size_on_disk_bytes / SQUARE (1024.0)), 1) AS size_mb,
  b.physical_name AS [fileName]
FROM sys.dm_io_virtual_file_stats (NULL, NULL) AS a
  INNER JOIN sys.master_files AS b
    ON a.database_id = b.database_id
      AND a.file_id = b.file_id
ORDER BY average_tot_latency DESC

Tuesday, August 27, 2019 - 11:31:01 AM - Jeffrey Ayoub

Thank you very much for the information. 

Do you have the same script in a form I can use without creating a stored procedure? I am not able to create stored procedures on any servers without approval, and I am wondering if you have a version of the script I can just run in a query window to get the same results.

Please let me know when you have a chance. 

Your help is appreciated.  


Sunday, August 25, 2019 - 7:44:31 AM - eli leiba

In my opinion, a value of 50 ms is considered high (for production environments).

For further reading, please check out:

https://kaminario.com/company/blog/whats-an-acceptable-io-latency/
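
To act on a threshold like this, the article's query can be filtered so that only files above a chosen cutoff are reported. A minimal sketch, assuming a 50 ms cutoff (@threshold_ms is an illustrative variable name, not part of the original procedure):

DECLARE @threshold_ms BIGINT = 50   -- illustrative cutoff; tune for your environment

SELECT DB_NAME (a.database_id) AS dbname,
   a.io_stall / NULLIF (a.num_of_reads + a.num_of_writes, 0) AS average_tot_latency,
   b.physical_name AS [fileName]
FROM sys.dm_io_virtual_file_stats (NULL, NULL) AS a
   INNER JOIN sys.master_files AS b
      ON a.database_id = b.database_id AND a.file_id = b.file_id
WHERE a.io_stall / NULLIF (a.num_of_reads + a.num_of_writes, 0) >= @threshold_ms
ORDER BY average_tot_latency DESC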


Sunday, August 25, 2019 - 6:17:50 AM - eli leiba

These factors affect IO performance:

  • Multidisk Arrays
  • Average IOPS per-drive 
  • RAID Factor
  • Read and Write Workload

Sunday, August 25, 2019 - 6:01:31 AM - eli leiba

"average tot latency" is in milliseconds time units


Monday, August 19, 2019 - 8:41:26 PM - Jeffrey Ayoub

The article is very good. I want to ask: what would you consider a high average total latency? I think Microsoft mentioned that anything over 20 ms is considered high, but I want to know what you consider high.

Thanks

Jeff


Monday, August 19, 2019 - 11:51:29 AM - João Paulo Oliveira

Hi! Thanks for sharing this with us.

May I ask a question?

Is "average tot latency" in milliseconds?


Monday, August 19, 2019 - 10:43:17 AM - Annon

Larry K. There is a large stack that makes up IO and depending on where and how you are looking, you will see different values.

Remember: SQL Server is an application running on top of an OS (potential bottleneck point A), possibly running through Fibre Channel or network cards (potential bottleneck point B) plugged into network ports (potential bottleneck point C) of either a network switch (potential bottleneck point D) or a storage processor (potential bottleneck point E), which are then plugged into drives (potential bottleneck point F). Your SAN team has software tools that most likely look at either the storage processor (potential bottleneck E) or the aggregate of the drives (potential bottleneck F).

The question you need to ask the storage team is: what benchmarking tool (HammerDB, DiskSpd) will cause your SAN monitoring tool to budge?

My suspicion is that no SAN tool will show any issue, because the issue is not with the SAN drives (bottleneck F - too small a pool of drives, misaligned formatting, a LUN shared by both OLTP and DW workloads, or constant backups running 24/7) but somewhere else in the chain.

You need to take a blank server and ask the SAN team: if the stars are aligned, the wind is at our back, and we are dealing with a frictionless surface, what is the highest throughput we can expect from the SAN at 4K block size, 8K block size, 32K block size, 64K block size, etc.? Then ask them to prove it: what benchmarking tool shows that number running on the same OS that you are using? If they can't, they are drinking the SAN vendor's Kool-Aid about "billions of I/Os per microsecond, yada yada yada".

Every SAN vendor quotes throughput rates at the tiniest block size. We had issues where the SAN team turned on compression on the storage processor and its CPU spiked at 80% most of the day. Their tools showed that disk IO plummeted and that we could put hundreds more servers on with that I/O pattern. This is not good for TDE databases, which are already encrypted. Network ports can have large queue lengths and potentially dropped packets that never show up in SAN tools. You really need to take this one step at a time and find what a reasonable value is for each component at each bottleneck point while your DiskSpd test is running.


Monday, August 19, 2019 - 9:26:01 AM - Larry Kruse

Eli - we run in a mostly virtualized environment with all-SSD disks. Whenever I question our infrastructure team about the disk I/O latency values we see, they take a look using their tools, never find any latency, and the latency values they do see are wildly different from the ones I see using your query.

What accounts for the differences?  Why are we seeing elevated latency values when the infrastructure team sees none?  How can we (as DBAs) really get a handle on I/O issues when the values that SQL Server records are so inaccurate?

Larry K.














