How to update statistic sql server and how optimizer use indexes to update auto-created statistics?
Consider the scenario where auto-created statistics are created on a column to satisfy a query. Afterward; a nonclustered index is created on the column, with or without other key columns. Could the SQL Server optimizer leverage indexes to update these auto created statistics when running UPDATE STATISTICS statements? Is there a difference between (default) sample rates & fullscan statistic updates when adding these indexes? How to update statistic sql server?SELECT @@VERSION; Microsoft SQL Server 2017 (RTM-CU12) (KB4464082) - 14.0.3045.24 (X64) Oct 18 2018 23:11:05 What is update statistics SQL Server?
When are SQL Server Statistics Updated
sp_updatestats will update the statistics for every user-defined table in the database it is run against. That means that when SQL Server detects out-of-date statistics it will first update the statistics, generate a new execution plan, and then execute the query. Index statistics can be updated automatically by the SQL Server Engine, or manually using the sp_updatestats stored procedure, that runs UPDATE STATISTICS against all user-defined and internal tables in the current database, or using the UPDATE STATISTICS T-SQL command, that can be used to update the statistics. To Update STATISTICS using SQL Server Maintenance Plan Connect to SQL Server instance in SSMS. Right-click on the Maintenance Plans and go to Maintenance Plan Wizard. Select the Update Statistics maintenance task from the list of tasks. Click Next, and you can define the Update Statistics task. TL;DR
- Auto created statistics will be improved by an index if you are using fullscan updates & have the column as the first key column in an index.
- (Default) sample rate statistic updates will use the clustered index / heap table scan even when a NCI exists with the column as the first key column.
- Even when updating the statistic of the NCI.
- As such, updating statistics with a sample rate will not be positively impacted by adding an index.
Setup
Creating a table with a primary key & 10M rows:
CREATE TABLE dbo.testing(ID INT IDENTITY(1,1) PRIMARY KEY NOT NULL, val nchar(500)); INSERT INTO dbo.testing WITH(TABLOCK) (val) SELECT top(10000000) REPLICATE('b',((ROW_NUMBER()OVER (ORDER BY (SELECT NULL))) @0)) FROM master..spt_values spt1 CROSS APPLY master..spt_values spt2 CROSS APPLY master..spt_values spt3;
- A nchar(500) value is used to show a clear difference between statistic updates.
- We can create an auto created statistic on theval column with this query:
- SELECT val FROM dbo.testing whereval = 'bla' order by val;
Full scan stat updates
Updating the statistic with fullscan:
SET STATISTICS IO, TIME ON; UPDATE STATISTICS dbo.testing([_WA_Sys_00000002_4F5F3681]) WITH FULLSCAN;
Results in 127 seconds of cpu tme on my machine:
CPU time = 127141 ms, elapsed time = 80147 ms.
Now when we look at the query that the stat updates actually runs:
SELECT StatMan([SC0]) FROM (SELECT TOP 100 PERCENT [val] AS [SC0] FROM [dbo].[testing] WITH (READUNCOMMITTED) ORDER BY [SC0] ) AS _MS_UPDSTATS_TBL OPTION (MAXDOP 2);
We can see that it went parallel on the two cores on my machine & did order by theval column. As a side note, there was also a memory grant & tempdb was used to support the ordering, The memory grant was as high as 12484 MB or ~ 12 GB
Let's see what happens when we create this index:
CREATE INDEX IX_val on dbo.testing(val);
And re-run the stat update:
SET STATISTICS IO, TIME ON; UPDATE STATISTICS dbo.testing([_WA_Sys_00000002_4F5F3681]) WITH FULLSCAN;
This results in an execution time of
CPU time = 10578 ms, elapsed time = 10882 ms.
And the memory grant & tempdb usage is gone since it does not have to sort anymore.
The statistic update with fullscan on the auto created stat is able to use the index.
This would also work for an index like this:
CREATE INDEX IX_val_val2 on dbo.testing(val,val2);
But not when creating an index like this:
CREATE INDEX IX_val2_val on dbo.testing(val2,val);
Sample rates
If we change the update stats to sample 10 percent instead of doing a fullscan:
UPDATE STATISTICS dbo.testing([_WA_Sys_00000002_4F5F3681]) WITH SAMPLE 10 PERCENT
Without an index this amounts to
CPU time = 10500 ms, elapsed time = 11100 ms.
With an index this amounts to
CPU time = 10312 ms, elapsed time = 11038 ms.
Auto created statistics with sample rates are not positively impacted by indexes on the column where the column is the first key column. Why this happens is explained further below
Comparison with updating the index' stats
As expected, updating the statistic on the index takes the same amount of time as the auto created statistics.
UPDATE STATISTICS dbo.testing(IX_val) WITH SAMPLE 10 PERCENT CPU time = 11203 ms, elapsed time = 11849 ms. UPDATE STATISTICS dbo.testing(IX_val) WITH FULLSCAN; CPU time = 10938 ms, elapsed time = 11005 ms.
Sampled statistic update on an indexed column stat
The query used when updating statistics with a sample rate of 10 percent is different:
SELECT StatMan([SC0], [SC1], [SB0000]) FROM (SELECT TOP 100 PERCENT [SC0], [SC1], step_direction([SC0]) over (order by NULL) AS [SB0000] FROM (SELECT [val] AS [SC0], [ID] AS [SC1] FROM [dbo].[testing] TABLESAMPLE SYSTEM (9.999986e+000 PERCENT) WITH (READUNCOMMITTED) ) AS _MS_UPDSTATS_TBL_HELPER ORDER BY [SC0], [SC1], [SB0000] ) AS _MS_UPDSTATS_TBL OPTION (MAXDOP 1)
TABLESAMPLE will give us random pages and it's explanation can be found here.
The statistic orders by val,id,step_direction([SC0]) over (order by NULL)
We can trace the query plan of this StatMan query with the profiler & the Showplan XML Statistics Profile event:
UPDATE STATISTICS dbo.testing(IX_val) WITH SAMPLE 10 PERCENT
The execution plan shows the clustered index being used, instead of the expected nonclustered index:
I cannot share the query plan with paste the plan unfortunately, it is malformed.
The same is true for heap tables. The usage of the NCI would not make a difference here, the TABLESAMPLE SYSTEM (9.999986e+000 PERCENT) retrieves 10 percent of the data pages in a random order. This is why the CI was used.