What are the SQL Server _WA_Sys... statistics?

By:   |   Updated: 2012-08-09   |   Comments (13)   |   Related: > Indexing


Problem

I was checking the statistics objects in my Microsoft SQL Server database and found some statistics with strange names. Could you please explain what the _WA_Sys... statistics are? Why are they created? Can I safely delete them? Do they differ from the other statistics objects?  Check out this tip to learn more.

Solution

In the SQL Server universe it vital to keep your statistics up-to-date otherwise you can face performance issues. When you set the AUTO_CREATE_STATISTICS option on at a database level, the Query Optimizer creates statistics on individual columns used in a predicate. These statistics are necessary to generate the query plan. They are created on columns that do not have a histogram in an existing statistics object. The name of the auto-created statistics includes the column name and the object ID in hexadecimal format: _WA_Sys_<column_name>_<XXXX>. These statistics are used by the Query Optimizer to determine the optimal Query Execution Plan.

Please note that the AUTO_CREATE_STATISTICS database option generates only single-column statistics for the full table and it does not generate filtered statistics.

How can you check if the Query Optimizer created such single-column statistics for a table? Just run the following query from SQL Server Management Studio:

SELECT stat.name AS 'Statistics',
OBJECT_NAME(stat.object_id) AS 'Object',
COL_NAME(scol.object_id, scol.column_id) AS 'Column'
FROM sys.stats AS stat (NOLOCK) Join sys.stats_columns AS scol (NOLOCK)
ON stat.stats_id = scol.stats_id AND stat.object_id = scol.object_id
INNER JOIN sys.tables AS tab (NOLOCK) on tab.object_id = stat.object_id
WHERE stat.name like '_WA%'
ORDER BY stat.name

The query will return the statistics name, the table name and the column name for all the statistics that have been added automatically by the Query Optimizer. This information can be useful to decide if you should add indexes to these columns or not.

Removing _WA_Sys... statistics in SQL Server

The _WA_Sys stats are created when the statistics are missing. You can drop these statistics, but it is worth having this statistical information in your database. If you drop them then they will be automatically created during the next query execution involving that column. This additional step negatively impacts the query performance.

I recommend converting the _WA_Sys stats to a statistics object with a more reasonable name. This can be done by dropping the auto-created statistics and running sp_createstats to create the statistics with a more user-friendly name:

EXEC sp_createstats @indexonly = 'NO', @fullscan = 'FULLSCAN', @norecompute ='NO'

After executing this command, the SQL Server will create all the missing single-column statistics for your database.

Comparison of _WA_Sys... statistics with the regular statistics objects

The Query Optimizer computes a histogram which is not 100% accurate for the _WA_Sys statistics. With the statistics, the histogram is used to describe the distribution of the values between certain ranges. It is usually represented by rectangles on a graph where the x-axis shows the ranges and the y-axis (the height of the rectangles) shows the number of data elements within the range. In SQL Server it measures the frequency of the occurrence for the distinct values in the given column. The statistical information used for the histogram is the following:

  • RANGE_HI_KEY: the upper boundary for a histogram step
  • RANGE_ROWS: number of rows falling within the histogram step (smaller then the actual upper boundary but greater than the previous step boundary)
  • EQ_ROWS: number of rows which equal to the upper boundary
  • DISTINCT_RANGE_ROWS: number of distinct values in the histogram step
  • AVG_RANGE_ROWS: average number of rows with duplicate values in the histogram step

The histogram for a statistics object can be retrieved by the following query:

DBCC SHOW_STATISTICS (table_name, statistics_name) WITH HISTOGRAM

If you explicitly create statistics for an index with the FULLSCAN option then the Query Optimizer scans all the rows and returns exact statistical data. You can see this on the histogram of the statistics:

Histogram of the SQL Server statistics

Let's check what happens if we drop the statistics and run a query to automatically create the statistics as shown above.  In my example, for the _WA_Sys... statistics the histogram includes estimated values because the Query Optimizer does not sample all the rows. As such, you can see that not all the values are whole numbers:

Histogram of the SQL Server statistics with estimated values
Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Tibor Nagy Tibor Nagy is a SQL Server professional in the financial industry with experience in SQL 2000-2012, DB2 and MySQL.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2012-08-09

Comments For This Article




Sunday, August 17, 2014 - 8:53:49 AM - Tibor Nagy Back To Top (34174)

Kurt,

I do not recommend deleting the _wa_ entries directly from the system table although SQL Server 2000 allows you to do it. It is safer to drop some statistics, especially if there are any duplicates.


Friday, August 15, 2014 - 11:19:23 AM - Kurt Survance Back To Top (34156)

 

Thanks for the response.  I know I need to drop them.  My question was about a safe method to drop them.  deleting then directly from sysindexes would be fast.  Drop statistics seems to only drop 1 stats collection at a time.  What are your thoughts about deleting all the _wa_ entries directly from the system table?


Thursday, August 14, 2014 - 10:59:49 AM - Tibor Nagy Back To Top (34134)

Marios,

I think it is better to drop the duplicated statistics because updating two copies generates an overhead during database operations and causes additional disk IO.


Thursday, August 14, 2014 - 10:57:03 AM - Tibor Nagy Back To Top (34133)

Kurt,

I think that you can try to drop some of these automatically generated statistics but it would be good to check which ones are regularly used. It sounds like many of them were created a long time ago and not used anymore.


Tuesday, August 12, 2014 - 8:40:51 PM - Kurt Survance Back To Top (34105)

 

I am working on an old poorly performing sql 2000 instance (against my will.)  It is a complete disaster.  I tried to create an obviously needed index and got a message that no more than 249 indexes + statistics can be created on a single table.  I went to sysindexes and saw five indexes and 244 _WA_statistics on a table with nowhere near that many columns.I have no idea where they came from. Create and Update stats are of course turned on.

I am wnodering if I can safely delete the _WA_ stats from sysindexes or should I use  a drop statsstatement  which as far as I know only drops one stats collection at a time.  I suppose I could write a query that would create a drop statement for each one.  

This database is a basket case tended by no one for years, so I suspect if I deleted all the _WA_ stats only a reasonable number would be re-created.  Things are so bad that sudden loss of all the stats and the recompiles would go almost unnoticed.

What do you think?


Saturday, June 29, 2013 - 8:31:08 AM - Marios Philippopoulos Back To Top (25631)

Would you recommend that all duplicate single-column stats - autocreated prior to creation of an index with leading key on the same column - be dropped?

I've seen it mentioned that they should, as they might confuse the optimizer into generating suboptimal exec plans.

 

Thank you for the article,

Marios Philippopoulos


Sunday, February 17, 2013 - 6:35:02 AM - Tibor Nagy Back To Top (22208)

It depends on what is queried. If query only using the indexes then only the statistics for those indexes are necessary.

If you remove some of the indexes and run a query then the SQl server will automatically create statistics for the affected columns.


Wednesday, February 13, 2013 - 5:18:27 AM - AA Back To Top (22093)

Additional question. If we have a single index for 2 columns say col1 and col2 in that index in that order. Do we need statistics for col1 or col2?

Futhermore if we now have 2 indexes one for col1,col2 and the other for col2,col1. Do we need statistics on the individual columns? If we remove those statisics will the query engine create them again?


Saturday, August 11, 2012 - 4:01:52 PM - Armando Prato Back To Top (19001)

Interestingly enough, the _WA_Sys_<column>_<XXXX>  format of the auto generated statistic has a meaning

 

WA = Washington state (where Microsoft is located)

Sys = System generated statistic

<column> = column number in the table the statistic is for

<XXXX> =  the table's OBJECT_ID in hex


Thursday, August 9, 2012 - 2:21:32 PM - BN Back To Top (18984)

Additional question. If we have a single index for 2 columns say col1 and col2 in that index in that order. Do we need statistics for col1 or col2?

Futhermore if we now have 2 indexes one for col1,col2 and the other for col2,col1. Do we need statistics on the individual columns? If we remove those statisics will the query engine create them again?

 


Thursday, August 9, 2012 - 2:21:19 PM - Tibor Nagy Back To Top (18983)

Martyn,

The stats are automatically created if there are no stats for the given column. Next time, when you run a similar query against the same column, the stats will be available so the query optimizer will use them. This is why I recommend the to drop these stats and subtitute them with manually created stats.

Regards,

Tibor


Thursday, August 9, 2012 - 12:47:52 PM - Martyn Back To Top (18982)

So to summarise, the query optimizer does the best it can when creating stats on the table but it samples only the amount used for the query running against it. This suggests that if another similar query is run against the same table the stats could be updated to incude that data too and overtime the stats for the column in question would become reasonably full and potetially better - is this correct?

If we are to run the fullscan to build a full list of stats, will that create stats that aren't used/needed at this stage? If the stats haven't been updated for a while is it safe to assume that the stats either contain enough info or they are no longer used... and if so how do you find out so the unused stats can be dropped?

Thanks


Thursday, August 9, 2012 - 9:29:09 AM - is there an easy way to drop all WA_SYS_ statistics? Back To Top (18977)















get free sql tips
agree to terms