By: Tim Smith | Updated: 2016-10-21
Problem
We're frequently experiencing the Azure SQL wait LOG_RATE_GOVERNOR when we load data, and it has limited our ability to return data from our queries, along with slowing some of our other loaders. We see that this wait disappears when we scale to a higher pricing tier, but we want to know what other options we have when we experience this wait, or whether we should simply stay at a higher pricing tier.
Solution
When experiencing this wait, I generally see average log write percentages in the range of 96-100% with the below query:
SELECT end_time AS [EndTime]
 , (SELECT Max(v) FROM (VALUES (avg_cpu_percent), (avg_data_io_percent), (avg_log_write_percent)) AS value(v)) AS [AvgDTU_Percent]
 , ((dtu_limit)*((SELECT Max(v) FROM (VALUES (avg_cpu_percent), (avg_data_io_percent), (avg_log_write_percent)) AS value(v))/100.00)) AS [AvgDTUsUsed]
 , dtu_limit AS [DTULimit]
FROM sys.dm_db_resource_stats
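To confirm that log throttling is the dominant wait, we can also look at the wait statistics the database has accumulated. The below is only a minimal sketch that matches any wait type whose name contains LOG_RATE:

SELECT wait_type
 , waiting_tasks_count
 , wait_time_ms
FROM sys.dm_db_wait_stats
WHERE wait_type LIKE '%LOG_RATE%'  -- counters reset when the database restarts or changes tier
ORDER BY wait_time_ms DESC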
This wait appears to relate exclusively to the writes, while the HADR-related throttle wait appears to involve the underlying availability replication in addition to the writes. When the writes themselves are the cause, we have a few options before we start scaling, since we may be able to tune the writes below the throttle. Even then, we may still need to scale either vertically or horizontally.
1. Add appropriate objects related to writes.
If our writes involve selects, then we can reduce the load by using strict filters backed by indexes, or primary keys that prevent duplicate inserts. This helps both the read and write loads, along with smaller batch sizes (the next point).
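As a minimal sketch of this idea, assuming a hypothetical staging table stgSales loading into a Sales table keyed on SaleID, a primary key on the target plus an existence filter keeps duplicate rows from being written at all:

-- Hypothetical target table; the primary key blocks duplicate SaleIDs outright
CREATE TABLE Sales (
 SaleID INT NOT NULL PRIMARY KEY
 , SaleDate DATE NOT NULL
 , Amount DECIMAL(19,4) NOT NULL
)

-- Insert only rows that don't already exist, reducing both the read and the write load
INSERT INTO Sales (SaleID, SaleDate, Amount)
SELECT s.SaleID, s.SaleDate, s.Amount
FROM stgSales s
WHERE NOT EXISTS (SELECT 1 FROM Sales t WHERE t.SaleID = s.SaleID)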
2. Use smaller batches.
In some cases, the write requirements may be flexible, and batching the writes into smaller transactions won't cause issues or disrupt the application. This will generally be a useful approach for a smaller or flexible environment. If an application doesn't require real-time access to data, a batched write - like a small insert-and-delete combination per iteration - can be completed at a convenient time, or kept small enough to run throughout the day, such as every minute, hour, or day.
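A minimal sketch of batching, reusing the hypothetical stgSales and Sales tables above; the 5,000-row batch size and one-second delay are placeholders to tune:

-- Move staged rows in small chunks so each transaction writes a small amount of log
DECLARE @batch INT = 5000

WHILE EXISTS (SELECT 1 FROM stgSales)
BEGIN
 -- Each iteration is its own small transaction: remove a batch from staging
 -- and write the same rows to the target through the OUTPUT clause
 DELETE TOP (@batch) s
 OUTPUT DELETED.SaleID, DELETED.SaleDate, DELETED.Amount
  INTO Sales (SaleID, SaleDate, Amount)
 FROM stgSales s

 -- Optional pause between batches to spread the log write load over time
 WAITFOR DELAY '00:00:01'
END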
Consider the example of a small live table that feeds a set of distributed tables once the live data has passed: rather than removing the live data on each pass with a DELETE, we can TRUNCATE the table and reload the data. The former will have an impact on this wait, while the latter will have a significantly reduced effect.
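As a rough illustration of the difference, assuming a hypothetical LiveData table that is cleared and reloaded on each pass (and that no foreign keys reference it, which TRUNCATE requires):

-- Heavier on the log: every removed row is logged individually
DELETE FROM LiveData

-- Lighter on the log: deallocation is minimally logged, then reload for the next pass
TRUNCATE TABLE LiveData

INSERT INTO LiveData (RecordID, RecordValue)
SELECT RecordID, RecordValue
FROM NextPassData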
3. Vertically scale to a higher pricing tier on a schedule.
One of the more efficient techniques when dealing with heavy writes on a schedule (like an overnight ETL) is to vertically scale the database to a higher pricing tier before the process and scale back down when it finishes. This carries two potential benefits: a higher tier can help the process finish faster, reducing the cost of the same process being throttled, and it encourages developers to consider how they might use the same technique for some application processes. On the latter point, if real-time processing isn't required, scheduling a window to process a heavy write load might save resources over distributing the write load throughout the day.
In one of my tests, migrating historical data from a main table into an archive table after scaling up several pricing tiers resulted in much faster write processing than trying to distribute the same data set into the archive table throughout the day on a higher tier. The result was that the "cost" of the higher pricing tier for a short, scheduled period was much lower than the cost of running a higher tier to process transactions throughout the day. If vertically scaling can be scheduled without interruption - and be careful, because the scale operation can abort some transactions - it is definitely worth testing as an approach. One major warning, however: you will want alerting if the scale-down fails; otherwise, you will be stuck on a higher pricing tier for a longer period of time than expected.
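A minimal sketch of the scale-up and scale-down step itself, run from the master database; the database name and tiers are placeholders, the same change can be made through the portal, PowerShell, or the REST API, and remember the change is asynchronous and can abort in-flight transactions:

-- Before the heavy write window: scale up to a higher tier
ALTER DATABASE [MyDatabase] MODIFY (EDITION = 'Premium', SERVICE_OBJECTIVE = 'P2')

-- Verify the current objective (from master) before starting the load and after scaling back
SELECT d.name, dso.edition, dso.service_objective
FROM sys.databases d
JOIN sys.database_service_objectives dso ON d.database_id = dso.database_id
WHERE d.name = 'MyDatabase'

-- After the process finishes: scale back down, and alert if this step fails
ALTER DATABASE [MyDatabase] MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S1')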
4. Horizontally scale the write load.
Vertically scaling might be less cost efficient than using additional Azure SQL databases to scale the writes. Just as read queries can be offloaded to a geo-replicated secondary, you can divide the writes across smaller databases on smaller tiers (i.e., sharding/partitioning), scaling the writes across each database. Real estate data offers a great example of this because the data can generally be divided into the heavily searched and the seldom searched. We still want all of the data available so our application can return any of it when queried, but we can demarcate the highly searched from the seldom searched.
Some data sets have a subset that is highly requested, whether by area, time, type, etc. Horizontally scaling does carry architectural costs for combining the data when necessary, though in some environments those combining queries run rarely enough that the cost may not be significant. For reducing read and write contention, geo-replicated secondaries can assist with offloading reads, which may help reduce query times when the primary is receiving a write load.
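For the read-offloading piece, a minimal sketch of adding a readable geo-replicated secondary, assuming a hypothetical partner server named SecondaryServer already exists and that this is run from the master database on the primary; read queries then connect to the secondary server using the same database name:

-- Create a readable geo-replicated secondary of MyDatabase on the partner server
ALTER DATABASE [MyDatabase]
 ADD SECONDARY ON SERVER [SecondaryServer]
 WITH (ALLOW_CONNECTIONS = ALL)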
When considering either vertically or horizontally scaling while your environment is still in a POC or testing phase, I would recommend loading multiples of what you expect and tracking the overall use against the numeric multiple of the expected load. For example, if we see our writes throttled at three times an expected load, then we can present a scaling case early if we expect the loads to increase. In some cases, we might want to iterate on the scaling design due to release times, so testing higher loads gives us a more accurate picture of when we'll need to scale and which routes to consider when we do.
Next Steps
- Before scaling, ensure that you're not missing any objects, like indexes or constraints, that might reduce the write load.
- If applicable, test smaller transaction batches spread over a period of time, test them on a schedule, and compare the write load using the above query.
- Compare your requirements with the costs and benefits of vertically or horizontally scaling.
- Read more Azure tips