How to Identify IO Bottlenecks in MS SQL Server

By: Tibor Nagy | Updated: 2011-03-17 | Comments (5) | Related: > Performance Tuning

Problem

We experience regular slowdowns on our SQL Server databases. After analyzing the memory and CPU usage we would like to continue the root cause investigation by examining I/O bottlenecks. What is your recommendation to recognize I/O related bottlenecks in SQL Server?

Solution

The I/O subsystem is a key factor when it comes to SQL Server performance since database pages are constantly moved in and out of the buffer pool. Also the transaction logs and tempDB generate significant I/O traffic. Therefore you have to ensure that your I/O subsystem performs as expected, otherwise you will be a victim of increased response times and frequent time-outs. In this tip I will describe some of the ways to identify I/O related bottlenecks using the built-in tools and provide some disk configuration ideas.

Performance Monitor

You can use Performance Monitor to check the load on your I/O subsystem. The following performance counters can be setup to check disk performance.

The PhysicalDisk Object: Avg. Disk Queue Length counter shows you the average number of read and write requests that were queued on the selected physical disk. The higher the number the more disk operations are waiting. It requires attention if this value frequently exceeds a value of 2 during peak usage of SQL Server. If you have multiple drives you should take this number and divide by the number of drives in the array to see if the number is above 2. For example, you have 4 drives and a disk queue length of 10, this would be 10/4 = 2.5, so this would be the value you want to use not 10.

Avg. Disk Sec/Read and Avg. Disk Sec/Write shows the average time of a data reads and writes from/to the disk. It is good up to 10 ms, but it is still acceptable if less than 20 ms. Any higher value needs further investigation.

Physical Disk: %Disk Time is the ratio of elapsed time when the disk drive was busy with read or write requests. The rule of thumb for this value is that it should be below 50 percent.

The counters Disk Reads/Sec and Disk Writes/Sec show you the rate of read/write operations on the disk. It should be less than 85 percent of the disk capacity since the disk access time increases exponentially beyond this value.

You can determine the disk capacity by gradually increasing the load on the system. One way to do this is to use SQLIO. You should look for the point where the throughput is constant, but the latency increases.

You can use the counters for RAID configurations with the following calculations:
Raid 0: I/O per disk = (reads + writes) / number of disks
Raid 1: I/O per disk = [reads + (writes*2)] / 2
Raid 5: I/O per disk = [reads + (writes*4)] / number of disks
Raid 10: I/O per disk = [reads + (writes*2)] / number of disks

Here is an example of your I/O per disk for RAID 1, if we get these values from the counters:

Disk Reads/sec = 90
Disk Writes/sec = 75
The formula for I/O on a RAID-1 array is [reads + (writes*2)] / 2 or [90 + (75*2)] / 2 = 120 I/Os per disk

Dynamic Management Views

There are some useful Dynamic Management Views (DMVs) to check I/O bottlenecks.

An I/O latch wait occurs when a page is accessed for reading or writing but the page is not available in the buffer pool. It causes waits on PAGEIOLATCH_EX or PAGEIOLATCH_SH, depending upon the type of request. These wait types can indicate an I/O bottleneck. You can query the sys.dm_os_wait_stats DMV to find latch wait statistics. You can identify I/O problems if you save query outputs of waiting_task_counts and wait_time_ms values from a normal working state of your SQL Server and compare these values when performance is degraded.

select *
from sys.dm_os_wait_stats  
where wait_type like 'PAGEIOLATCH%'
order by wait_type asc

Pending I/O requests can be found by querying the following DMVs and can be used to identify which disk is responsible for the bottleneck.

select database_id, 
       file_id, 
       io_stall,
       io_pending_ms_ticks,
       scheduler_address 
from sys.dm_io_virtual_file_stats(NULL, NULL) iovfs,
     sys.dm_io_pending_io_requests as iopior
where iovfs.file_handle = iopior.io_handle

Disk Fragmentation

I would recommend that you check the disk fragmentation and the configuration of your disks used by the SQL Server instance.

Fragmentation of files on NTFS can cause significant reductions in performance. Disks should be defragmented regularly and a defragmentation policy and plan should be put in place. Research shows that in some cases a SAN can actually perform worse with defragmentation enabled thus SANs need to be treated on a case-by-case basis.

Fragmentation of indexes can also cause high I/O utilization on NTFS, but this does not have the same affect on SANs which perform better on random I/Os.

Disk Configuration / Best Practices

As a general rule, you should have log files on a physical disk that is separate from the data files for better performance. The I/O profile for a heavily used database data files (including tempDB) are random. The I/O profile for all database log files are sequential except when a transaction needs to be rolled back.

The internal disks should only be used for database log files, because they excel at sequential I/O, but perform poorly on random I/Os.

The database data and log files should each be placed on their own dedicated disk packs. To ensure optimal performance, I recommend that the database log file be placed on two internal disks configured as RAID 1. The database data file should reside on a SAN system which is only accessed by SQL Server and has controlled querying and reporting. Ad hoc access should be disallowed.

Write caching should be enabled where possible and you should make sure the cache is protected from power failures and other possible failures.

To limit possible I/O bottlenecks for your OLTP system you should not mix OLAP and OLTP environments. Also, make sure your code is tuned and create indexes where necessary to avoid unnecessary I/Os.

Next Steps

Collect and compare performance counters
Analyze DMV information
Run SQL Server Profiler to find high Read and Write queries that can be tuned
Check the Performance Tips category on MSSQLTips.com

About the author

Tibor Nagy is a SQL Server professional in the financial industry with experience in SQL 2000-2012, DB2 and MySQL.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Article Last Updated: 2011-03-17

Comments For This Article

Wednesday, August 13, 2014 - 9:16:03 AM - ananda

I have collected Physical disk performance counter around 1 Hour by using system perfmon tool.

Disk Reads/sec - average values = 24.10862341

Disk write/sec - average values = 37.1755549

Data and log files are configured RAID 5 array controller. total 5 HD, 300*5

as per IO formula in RAID 5 =(24.10862341 + (37.1755549*4)) /5 = 34.5621686.

so as per this result 34 IO per seconds. is it correct? what is value of 4 in this formula.

Avg. Disk Queue Length - 15 second interval values not reached more than 10.

Pls. suggstion.

Thursday, November 1, 2012 - 7:53:29 AM - Dave

Hi All,

Not sure this is the right place to ask question but Please move it to the right area

One of our Prod box running SQL2008R2 having CPU high utilization issue

I already checked Min/Max memory changed it but no effect(Min 20GB Max is 100GB)

SQL remain on 51GB

2. I used DBCC DROPCLEANBUFFERS, after couple of minutes cpu is back to 80%

3. DBCC MEMORYSTATUS is showing me locked allocated page which is taking 47GB memorym, I need to know do I need to disable this page locking, why it suddenly locked the pages?, the server OS is W28

Is is windows issue or DBA ?

How to resolve it, kindly help me and tell me how to fix this issue ASAP

Thanks

Dave

Friday, July 29, 2011 - 1:19:25 AM - Usman Butt

Hi Tibor,

Thanks for the response. So based upon on your feedback, can I safely say that one can let the fragged indexes on SAN since it is good at random I/Os for some time (the performance would not detoriate considerably) ? If yes, then when should I start considering index defragmentation on SANs (if should be)? Thanks in advance.

Kind Regards,
Usman

Thursday, July 28, 2011 - 11:18:17 AM - Tibor Nagy

Hi Usman,

I meant that fragmentation does not have that big effect on SAN as it has on a traditional NTFS system.

Regards,

Tibor

Thursday, July 28, 2011 - 2:45:11 AM - Usman Butt

Hi,

First of all seems like reading it a bit late :) But still wanted to ask the details of your statement

"Fragmentation of indexes can also cause high I/O utilization on NTFS, but this does not have the same affect on SANs which perform better on random I/Os"

How index fragmentation would not cause big I/Os on SAN while considering there would be much more page scans and use of fragged indexes always yield poor executions? Are you talking about the fragmentation of indexes on multiple DISKs arrays (the storage)? If this is the case then your statement is very confusing and could mislead others. Please make me understand this. Thanks.

Regards,
Usman