CSS SQL Server Engineers

SQL Setup ToolSuite Introduction (2) – Product Browser

Oftentimes when I work on a setup case I wish I had a simple tool that shows me the detailed properties of installed products and patch information. In the Windows Control Panel I can find all installed products and their updates, but it doesn't provide detailed information like the package name of a product, the product code and package code, the cached MSI/MSP name, etc. This is not a difficult task, so I built such a tool and published it to GitHub:

https://github.com/suyouquan/SQLSetupTools

With the tool you can easily browse, search, or export detailed product information to text files.

You can download the latest version from the link above, or Version 1.2 here:

https://github.com/suyouquan/SQLSetupTools/releases/download/v1.2/ProductBrowser_V1.2_NET4.5.zip

Screenshot for your reference:


SQL Setup ToolSuite Introduction (3) – SQL Registry Viewer

You may want to know which registry keys a SQL Server installation adds to the system. If you use a registry snapshot tool to compare the Windows registry before and after the SQL installation, you will find 40,000~60,000 modifications. However, if you study the modifications carefully you will find that most of them are not very meaningful; for example, lots of modifications go to the "HKLM\DRIVERS\DriverDatabase\DeviceIds\" entries. The most interesting modifications are:

  • Installer related registry keys under
    HKEY_CLASSES_ROOT\Installer and
    Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\UserData\S-1-5-18

  • COM+ related, like
    Computer\HKEY_CLASSES_ROOT\CLSID
    Computer\HKEY_CLASSES_ROOT\Interface
    Computer\HKEY_CLASSES_ROOT\TypeLib

  • SQL specific
    Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server

  • Service and performance log related keys, and other keys under WOW6432Node

  • Others...

So is it possible to list all these important SQL Server related registry keys? My answer is the SQL Registry Viewer tool on GitHub:
https://github.com/suyouquan/SQLSetupTools

This tool pre-reads all SQL Server metadata from the setup source (the setup media containing the MSI/MSP files). The metadata includes the product code, patch code, package code, files, registry keys that will be added to the system, etc. It then uses this prepared metadata to scan the registry and display only SQL Server related keys in the UI. The tool will ask you to specify the service pack or CU you have installed on your system so that it can produce an accurate report. You can browse/search/export the keys in the UI easily. Just be aware that this tool is not intended to list every SQL Server related key; only the important ones are displayed.

The binary is here:

https://github.com/suyouquan/SQLSetupTools/releases/download/1.0/SQLRegistryViewer_V1.0_NET4.5.zip

Enjoy and have fun!

Uniqueifier considerations and error 666

This post is intended to shed some light on uniqueifiers and table designs that rely on their usage.

First, some quick information about the subject.

A uniqueifier (or uniquifier, as reported by SQL Server internal tools) has been used in the engine for a long time (since SQL Server 7.0), and even though it is known to many and referenced in books and blogs, our documentation clearly states that it is not exposed externally by the engine (https://docs.microsoft.com/en-us/sql/relational-databases/sql-server-index-design-guide).

"If the clustered index is not created with the UNIQUE property, the Database Engine automatically adds a 4-byte uniqueifier column to the table. When it is required, the Database Engine automatically adds a uniqueifier value to a row to make each key unique. This column and its values are used internally and cannot be seen or accessed by users."

While it's unlikely that you will face an issue related to uniqueifiers, we have seen rare cases where a customer reaches the uniqueifier limit of 2,147,483,648, generating error 666.

Msg 666, Level 16, State 2, Line 1

The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID <PARTITIONID>. Dropping and re-creating the index may resolve this; otherwise, use another clustering key.

The error message is clear about the actions you should take to resolve the issue.

Dropping and recreating the index may resolve the problem if you don't have 2,147,483,648 records with the same key value. Recreating the index allows the uniqueifier to be reset, giving you some time to review the table design before reaching the limit again.

As of February 2018, the design goal for the storage engine is not to reset uniqueifiers during REBUILDs. As such, rebuilding the index ideally would not reset uniqueifiers, and the issue would continue to occur when inserting new data with a key value for which the uniqueifiers were exhausted. However, the current engine behavior is different in one specific case: if you use the statement ALTER INDEX ALL ON <TABLE> REBUILD WITH (ONLINE = ON), it will reset the uniqueifiers (on all versions from SQL Server 2005 through SQL Server 2017).

Important: This is something that is not documented and can change in future versions, so our recommendation is that you should review table design to avoid relying on it.
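For illustration, here is a minimal sketch of the statement discussed above, using a hypothetical dbo.Orders table clustered on a non-unique key:

-- Hypothetical table name; rebuilding all indexes online has been observed to reset
-- uniqueifiers, but this behavior is undocumented and may change in future versions.
ALTER INDEX ALL ON dbo.Orders REBUILD WITH (ONLINE = ON);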

Related to the table design aspect, and based on the second recommendation from the error message, our question is: “Is it a good design to choose a non-unique clustering key that could have several million/billion duplicate key values?”

The short answer is probably NO. Of course, we know that when it comes to table design there is usually not a right or wrong answer, but the majority of cases should not rely heavily on uniqueifiers.

While most cases will likely have at most hundreds or thousands of duplicated records for a single value, and it is straightforward to solve the issue if you get error 666 (having fewer duplicated rows than the uniqueifier limit), it can cause some downtime while executing the required steps. Consequently, the best course of action is to review tables that rely on uniqueifiers and proactively work to improve their design.

Hopefully this gives you a better understanding of uniqueifiers and helps you review table design in order to avoid error 666 in your production environment.

If you want more information about uniqueifiers, review the post at https://blogs.msdn.microsoft.com/luti/2018/02/16/uniqueifier-details-in-sql-server/ and example 03 from the companion content for Chapter 6 (Index Internals) of the SQL Server 2008 Internals book (https://www.sqlskills.com/blogs/kimberly/companion-content-for-chapter-6-index-internals-of-sql-server-2008-internals/).

Installation of SQL Server 2017 failing with ‘VS Shell installation has failed with exit code 1638’

Dear all,

Depending on which products were installed on the server beforehand, a SQL Server 2017 setup may fail with the following error:

TITLE: Microsoft SQL Server 2017 Setup
------------------------------

The following error has occurred:

VS Shell installation has failed with exit code 1638.

Note that it can happen even after SQL Server 2017 setup has already executed successfully once. For example, the following steps would reach the situation: install a first SQL Server 2017 instance (no error), install SSMS 17 from the web (no error), then install a second instance (error 1638).

Anyway, the important thing is that this situation is covered by KB 4092997, and as indicated in the KB, a simple repair of 'Microsoft Visual C++ 2015 Redistributable (x64)' followed by a server restart should clear the problem (please check the current KB content though, as it may be updated with further details after this post is published).

We're working on making the interactive link of SQL Server setup's pop-up message a bit more useful than it is today, and the more specific situation where our very own SSMS introduces the problem is under review.

Regards,

Guillaume Fourrat

Escalation Engineer

Troubleshooting data movement latency between synchronous-commit AlwaysOn Availability Groups

Writer: Simon Su
Technical Reviewer: Pam Lahoud, Sourabh Agarwal, Tejas Shah
Applies to: SQL Server 2014 SP2, SQL Server 2016 SP1, SQL Server 2017 RTM 

On synchronous-commit mode AG nodes you may sometimes observe that your transactions are pending on HADR_SYNC_COMMIT waits. HADR_SYNC_COMMIT waits indicate that SQL Server is waiting for the signal from the remote replicas to commit the transaction. To understand the transaction commit latency you can refer to the articles below:

Troubleshooting High HADR_SYNC_COMMIT wait type with AlwaysOn Availability Groups
https://blogs.msdn.microsoft.com/sql_server_team/troubleshooting-high-hadr_sync_commit-wait-type-with-always-on-availability-groups/ 

SQL Server 2012 AlwaysOn – Part 12 – Performance Aspects and Performance Monitoring II
https://blogs.msdn.microsoft.com/saponsqlserver/2013/04/24/sql-server-2012-alwayson-part-12-performance-aspects-and-performance-monitoring-ii/ 

In the links above you will learn that the transaction delay can be evaluated with the two performance counters below:

  • SQL Server:Database Replica –> Transaction Delay
  • SQL Server:Database Replica –> Mirrored Write Transactions/sec

For example, assume you have poorly performing AG nodes and you see that “SQL Server:Database Replica –> Transaction Delay” is 1000 ms (milliseconds) and “SQL Server:Database Replica –> Mirrored Write Transactions/sec” is 50; this means that on average each transaction has a delay of 1000 ms / 50 = 20 ms.
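If you prefer to sample these counters from T-SQL rather than Performance Monitor, a minimal sketch like the one below reads them from sys.dm_os_performance_counters (the database name in the instance filter is a placeholder you would replace; the raw values for rate-type counters are cumulative, so sample twice and work with the deltas):

-- Sketch: read the two AG delay counters for one availability database.
SELECT object_name, counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Transaction Delay', 'Mirrored Write Transactions/sec')
  AND object_name LIKE '%Database Replica%'
  AND instance_name = 'YourAGDatabase'   -- placeholder database name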

Given the above example, can we know where the delay of 20 ms comes from? What are the factors that cause this latency? To answer these kinds of questions we need to understand how synchronous commit works:

https://blogs.msdn.microsoft.com/psssql/2011/04/01/alwayson-hadron-learning-series-how-does-alwayson-process-a-synchronous-commit-request/ 

To track the data movement between replicas, we are lucky to have updated extended events (xevents):
https://support.microsoft.com/en-us/help/3173156/update-adds-alwayson-extended-events-and-performance-counters-in-sql-s 

In synchronous-commit mode the basic logic of the log block movement is as below:

On the primary:

1.1 Log block -> LogCache -> LogFlush -> LocalHarden (LDF)
1.2 Log block -> LogPool -> LogCapture -> SendToRemote
(1.1 and 1.2 happen in parallel)

On the remote synchronous replica:

The harden process is similar to the one on the primary:
LogBlock receive -> LogCache -> LogFlush -> HardenToLDF -> AckPrimary

 

The xevents (see the first link in this article) fire at different places in the log block movement. The figure below shows the detailed log movement flow of each step and the related xevents:

As shown above, once the xevent trace is captured we know the precise time point of each step of the log block movement, so you can tell exactly where the transaction latency comes from. Commonly the delay comes from three parts:

1. The duration of log harden on the primary

    It equals the time delta between log_flush_start (step 2) and log_flush_complete (step 3)

2. The duration of log harden on the remote replica

    It equals the time delta between log_flush_start (step 10) and log_flush_complete (step 11)

3. The duration of network traffic

    The sum of the time deltas of (primary: hadr_log_block_send_complete -> secondary: hadr_transport_receive_log_block_message, steps 6-7) and (secondary: hadr_lsn_send_complete -> primary: hadr_receive_harden_lsn_message, steps 12-13)

 

I use the below script to capture the xevents:

/* Note: this trace can generate a very large amount of data very quickly, depending on the actual transaction rate. On a busy server it can grow by several GB per minute, so do not run the script for too long, to avoid impacting the production server. */

CREATE EVENT SESSION [AlwaysOn_Data_Movement_Tracing] ON SERVER 
ADD EVENT sqlserver.file_write_completed,
ADD EVENT sqlserver.file_write_enqueued,
ADD EVENT sqlserver.hadr_apply_log_block,
ADD EVENT sqlserver.hadr_apply_vlfheader,
ADD EVENT sqlserver.hadr_capture_compressed_log_cache,
ADD EVENT sqlserver.hadr_capture_filestream_wait,
ADD EVENT sqlserver.hadr_capture_log_block,
ADD EVENT sqlserver.hadr_capture_vlfheader,
ADD EVENT sqlserver.hadr_db_commit_mgr_harden,
ADD EVENT sqlserver.hadr_db_commit_mgr_harden_still_waiting,
ADD EVENT sqlserver.hadr_db_commit_mgr_update_harden,
ADD EVENT sqlserver.hadr_filestream_processed_block,
ADD EVENT sqlserver.hadr_log_block_compression,
ADD EVENT sqlserver.hadr_log_block_decompression,
ADD EVENT sqlserver.hadr_log_block_group_commit ,
ADD EVENT sqlserver.hadr_log_block_send_complete,
ADD EVENT sqlserver.hadr_lsn_send_complete,
ADD EVENT sqlserver.hadr_receive_harden_lsn_message,
ADD EVENT sqlserver.hadr_send_harden_lsn_message,
ADD EVENT sqlserver.hadr_transport_flow_control_action,
ADD EVENT sqlserver.hadr_transport_receive_log_block_message,
ADD EVENT sqlserver.log_block_pushed_to_logpool,
ADD EVENT sqlserver.log_flush_complete ,
ADD EVENT sqlserver.log_flush_start,
ADD EVENT sqlserver.recovery_unit_harden_log_timestamps 
ADD TARGET package0.event_file(SET filename=N'c:\mslog\AlwaysOn_Data_Movement_Tracing.xel',max_file_size=(500),max_rollover_files=(4))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=ON) 

GO 
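The session definition above uses STARTUP_STATE=ON but does not start the trace by itself; starting and stopping it is standard ALTER EVENT SESSION syntax, as in this sketch:

-- Start the session when you are ready to capture the repro window.
ALTER EVENT SESSION [AlwaysOn_Data_Movement_Tracing] ON SERVER STATE = START;

-- ...reproduce the latency for a few minutes...

-- Stop it promptly to keep the .xel files small.
ALTER EVENT SESSION [AlwaysOn_Data_Movement_Tracing] ON SERVER STATE = STOP;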

For demo purposes I just run “insert into [AdventureWorks2014]..t1 values(1)” and then capture the xevent trace on the primary and secondary. Below are the screenshots of the captured xevents:

 Primary: 

On the secondary synchronous replica:

 

Note: You may notice that the log_block_id (146028889512) of hadr_receive_harden_lsn_message is not the same as the others (146028889488). This is because the returned id is always the next immediate id after the hardened log block; we can use the hadr_db_commit_mgr_update_harden xevent to correlate the xevents:

 

With the above xevent log we now get the detailed latency breakdown of the transaction commit below:

Segment                     | From                                                                | To                                                                                | Latency
Network: Primary->Secondary | Primary: hadr_log_block_send_complete, 2018-03-06 16:56:28.2174613 | Secondary: hadr_transport_receive_log_block_message, 2018-03-06 16:56:32.1241242 | 3.907 seconds
Network: Secondary->Primary | Secondary: hadr_lsn_send_complete, 2018-03-06 16:56:32.7863432     | Primary: hadr_receive_harden_lsn_message, 2018-03-06 16:56:33.3732126            | 0.587 seconds
Log harden (Primary)        | log_flush_start, 2018-03-06 16:56:28.2168580                       | log_flush_complete, 2018-03-06 16:56:28.8785928                                  | 0.663 seconds
Log harden (Secondary)      | log_flush_start, 2018-03-06 16:56:32.1242499                       | log_flush_complete, 2018-03-06 16:56:32.7861231                                  | 0.663 seconds

 

I list the time delta (the latency) just for the network and the log harden processes here; there can be other sources of latency as well, like log block compression/decompression, but mainly the latency comes from these three parts:

  1. Network latency between replicas. In the above example, it is 3.907 + 0.587 = 4.494 seconds
  2. Log harden on the primary = 0.663 seconds
  3. Log harden on the secondary = 0.663 seconds

To get the total transaction delay, we cannot simply sum these up, because the log flush on the primary and the network transfer happen in parallel. Say the network takes 4.494 seconds, but the primary log harden finished (log_flush_complete: 2018-03-06 16:56:28.8785928) far before the primary got confirmation from the replica (hadr_receive_harden_lsn_message: 2018-03-06 16:56:33.3732126). Luckily, we do not need to manually determine which timestamp to use to calculate the total commit time of a transaction. We can use the time delta between the two hadr_log_block_group_commit xevents to know the time to commit. For example, in the above log:

Primary: hadr_log_block_group_commit: 2018-03-06 16:56:28.2167393 

Primary: hadr_log_block_group_commit: 2018-03-06 16:56:33.3732847 

Total time to commit = delta of the above two timestamps = 5.157 seconds

This number is equal to the network transfer time plus the log harden time on the secondary. This makes sense because the secondary has to wait for the log block to arrive over the network before it can harden it; it cannot harden the log in parallel the way the primary does.

If you look at the second hadr_log_block_group_commit event, it has a “processing_time” column, which is exactly the commit time of the transaction that we are talking about:

 

So now you have an overall picture of the log block movement between synchronous-commit mode replicas, and you know where the latency (if any) comes from: the replicas, the network, the disk (log harden), or elsewhere.

By the way, you may notice the “hadr_db_commit_mgr_harden_still_waiting” xevent occurring in the primary's trace. This event fires every 2 seconds (the 2 seconds is hardcoded) while the primary is waiting for the acknowledgement message from the secondary replica. If the ack comes back within 2 seconds you won't see this xevent.

 

Reference 

New in SSMS – Always On Availability Group Latency Reports
https://blogs.msdn.microsoft.com/sql_server_team/new-in-ssms-always-on-availability-group-latency-reports/ 

AlwaysOn Availability Groups Troubleshooting and Monitoring Guide
https://msdn.microsoft.com/library/dn135328 

Troubleshooting SQL Server Scheduling and Yielding

Writer: Simon Su
Technical Reviewer: Pam Lahoud, Sourabh Agarwal, Tejas Shah
  

Scheduling and Yielding Knowledge Recap 

We all know that SQL Server is a multi-threaded, multi-tasking system, and it has its own thread scheduling mechanism, which is a small part of the job of what we call SQLOS. If you are not familiar with SQLOS you can refer to the links below for details:

A new platform layer in SQL Server 2005 to exploit new hardware capabilities and their trends
https://blogs.msdn.microsoft.com/slavao/2005/07/20/platform-layer-for-sql-server 

Inside the SQL Server 2000 User Mode Scheduler
https://msdn.microsoft.com/library/aa175393.aspx 

How To Diagnose and Correct Errors 17883, 17884, 17887, and 17888
https://technet.microsoft.com/en-us/library/cc917684.aspx 

 

Inside the SQL Server source code, there are many voluntary yield points so that multiple threads run efficiently and cooperatively. If a SQL Server worker thread does not voluntarily yield, it will likely prevent other threads from running on the same scheduler. When the owner of the scheduler has not yielded within 60 seconds and, as a result, pending requests (tasks) are stalled, SQL Server logs a "non-yielding scheduler" error in the error log like below:

 

2018-03-10 21:16:35.89 Server      ***********************************************
2018-03-10 21:16:35.89 Server      *
2018-03-10 21:16:35.89 Server      * BEGIN STACK DUMP:
2018-03-10 21:16:35.89 Server      *   03/10/18 21:16:35 spid 22548
2018-03-10 21:16:35.89 Server      *
2018-03-10 21:16:35.89 Server      * Non-yielding Scheduler
2018-03-10 21:16:35.89 Server      *
2018-03-10 21:16:35.89 Server      *********************************************** 

 

A mini dump is also generated in the LOG folder. You can contact Microsoft support to understand the details of the mini dump and check whether this non-yielding scheduler warning is a serious issue or whether it can be safely ignored.

 

How long a thread has been running on a scheduler without yielding

Regarding SQL Server scheduling and yielding, everything sounds perfect until I got deeply involved in a customer's case troubleshooting a sudden transaction drop issue (I will write another post to share that story). In that case I needed to find out how long a SQL Server worker thread had been running on a scheduler without yielding. We know that if a non-yielding condition exceeds 60 seconds, SQL Server logs a non-yielding scheduler error accordingly. But what about threads that have been running on a scheduler without yielding for less than 60 seconds? Do we have a way to get detailed information about these non-yielding threads? Note that you cannot use the CPU time column of a statement in a Profiler trace, because the CPU time of a query doesn't mean it has been running exclusively on the CPU without yielding for that long. Some yields could have occurred during the life cycle of the query execution, and SQL Server records these yields with the "SOS_SCHEDULER_YIELD" wait type.

 

A SQL Server worker has a quantum target of 4 ms (milliseconds); however, due to the complexity of real workloads there can be places where the code runs unexpectedly long before it reaches a voluntary yield point. Normally this is not a problem because the thread will eventually yield without starving the threads on the runnable list. In case we really need to know the details of this kind of situation (like how long a thread has been running on the CPU without yielding), we can use the approaches below.

Before talking about the method to troubleshoot this kind of scheduling and yielding issue, let's see what a scheduler and its tasks look like:

Here is the logic of how scheduling works:

  1. When the task is running, it is in the "Running" state and it is the active worker of the scheduler.
  2. When the task waits for nothing but CPU to run, it is put into the "Runnable" queue of the scheduler.
  3. When the task is waiting for something (like a lock, disk I/O, etc.) it is in the "Suspended" state.
  4. If the suspended task finishes its wait (waits for nothing) and is ready to run, it is put at the end of the Runnable queue.
  5. If the running thread voluntarily yields, it is put back at the end of the Runnable queue.
  6. If the running thread needs to wait for something, it is switched out of the scheduler and put into the suspended state.
  7. If the running thread finishes its work, then the top thread of the Runnable queue becomes the "Running" thread.

 

Resource Wait Time 

For a suspended task that is waiting for something, we have lots of ways to get the wait-related information like wait_time, wait_resource, etc. For example, both the sys.dm_os_waiting_tasks and sys.dm_exec_requests DMVs can show the detailed wait statistics:

SELECT session_id,status,wait_time,wait_type,wait_resource,last_wait_type
FROM sys.dm_exec_requests
WHERE session_id=52

Result: 
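A companion query against sys.dm_os_waiting_tasks returns similar detail for the same example session (a sketch; session 52 is the session used above):

SELECT session_id, wait_duration_ms, wait_type, resource_description, blocking_session_id
FROM sys.dm_os_waiting_tasks
WHERE session_id = 52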

 

Signal Wait Time 

If you query sys.dm_os_wait_stats you will find a column called "signal_wait_time_ms". Signal wait time is the time a thread spent on the scheduler's "runnable" list waiting to get on the CPU and run again. The sys.dm_os_wait_stats output gives you an overall picture of waits for each wait type, including the signal wait time. If you want detailed information about the signal wait time of each individual session, you can leverage the wait_info and wait_info_external xevents. There is an excellent article discussing how to use the wait_info event to trace REDO latency:

https://blogs.msdn.microsoft.com/alwaysonpro/2015/01/06/troubleshooting-redo-queue-build-up-data-latency-issues-on-alwayson-readable-secondary-replicas-using-the-wait_info-extended-event/ 

The same approach applies to all other waits. I use the steps below to simulate signal waits:

1. Create a table in tempdb 

USE tempdb
CREATE TABLE t1 (c1 int)

2. Configure SQL Server to use only one CPU (never do this on a production server!):

EXEC SP_CONFIGURE  'affinity mask',2 --use only the second CPU of the system
RECONFIGURE WITH OVERRIDE
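Note that 'affinity mask' is an advanced option; if advanced options are not already enabled on the instance, sp_configure will refuse to set it, so you may need to enable them first (standard sp_configure usage):

EXEC sp_configure 'show advanced options', 1
RECONFIGURE WITH OVERRIDE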

3. Now start the xevent trace: 

IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name LIKE 'SignalWaitDemo')
DROP EVENT SESSION [SignalWaitDemo] ON SERVER 
GO 
CREATE EVENT SESSION [SignalWaitDemo] ON SERVER 
ADD EVENT sqlos.wait_info(
    ACTION(sqlos.scheduler_id,sqlserver.database_id,sqlserver.session_id)
    --Capture End event (opcode=1) only 
WHERE ([package0].[equal_uint64]([opcode],(1)) 
--Only capture user sessions (session_id>=50) 
AND [package0].[greater_than_equal_uint64]([sqlserver].[session_id],(50))
--You can change duration to bigger value, say, change below 10 ms to 3000ms 
AND [duration]>=(10)))
ADD TARGET package0.event_file(SET filename=N'E:\temp\Wait_Info_Demo.xel')
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30  SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF) 

GO 

 ALTER EVENT SESSION [SignalWaitDemo] ON SERVER STATE=START; 

4. Then use the ostress.exe tool to simulate a workload against SQL Server:

ostress -n1000 -r200 -S. -isignal_wait.sql

In signal_wait.sql, I have the below query:

SET NOCOUNT ON
USE tempdb
DECLARE @I int=0,@k int=0

BEGIN
IF(rand()>0.9)update t1 set c1=c1+10*(rand()-0.5)
DECLARE @document varchar(64);  
SELECT @document = 'Reflectors are vital safety' +  
                   ' components of your bicycle.';  
DECLARE @j int 
SET @j=CHARINDEX('bicycle', @document);
SET @j=CHARINDEX('are', @document);
SET @j=CHARINDEX('vital', @document);
SET @j=CHARINDEX('dd', @document);
SET @j=CHARINDEX('your', @document);
END 

While the ostress tool is running, I query "select * from sys.dm_exec_requests where session_id>50" and get the output below:

You can see there are lots of runnable threads as well as suspended threads. The suspended threads are waiting for the UPDATE lock, while the runnable ones are waiting for the scheduler in order to run.

I then stop the SignalWaitDemo xevent trace and get the result below:

You can see sessions with very long signal_duration values in the result, which means they were sitting in the runnable queue for that long.

 

Non-yielding Scheduler Time 

From the above description, we know how to check resource wait time and signal wait time. Now comes the million-dollar question: how do we know how long a thread has been running on a given scheduler without yielding (I call it non-yielding scheduler time)?

Note that non-yielding scheduler time means how long a thread occupies the scheduler without yielding. It is not always the same as CPU execution time: the thread holding the scheduler may itself be preempted by the operating system if another application is using the same CPU at that point in time. This is not common, since in most cases the server is dedicated to SQL Server and no other heavy application runs on the machine.

The fact is that we do not have a handy way to track how long a thread has actually been running continuously on a scheduler without yielding. I would expect a max_non_yielding_scheduler_time_ms column somewhere in a DMV, but there is none at this moment.

 

The good news is that we have yield_count in the sys.dm_os_schedulers DMV, as below:

select yield_count, runnable_tasks_count, * from sys.dm_os_schedulers where scheduler_id<1024 

Every time the scheduler yields, yield_count is increased by one. We can query this DMV regularly and compute the delta of this column. If yield_count doesn't change during the monitoring interval, we know that someone has been running on the scheduler for that whole period of time.

For example: 

At timestamp 1, yield_count is 33555.

After 5 seconds, we query it again, and if yield_count is still the same, 33555, then we know that a thread has been holding the scheduler for at least 5 seconds.
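Expressed as T-SQL, the check described above could look like the following sketch (scheduler 1 is just an example id):

-- Snapshot yield_count twice, 5 seconds apart, for one scheduler.
DECLARE @count1 bigint, @count2 bigint;
SELECT @count1 = yield_count FROM sys.dm_os_schedulers WHERE scheduler_id = 1;
WAITFOR DELAY '00:00:05';
SELECT @count2 = yield_count FROM sys.dm_os_schedulers WHERE scheduler_id = 1;
IF @count1 = @count2
    PRINT 'Scheduler 1 did not yield in the last 5 seconds';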

Once we identify the non-yielding scheduler, we can use the scheduler's active worker to join sys.dm_os_workers to get the active task, and use the active task to join sys.dm_exec_requests to get the related user session information. A scheduler's active worker is the worker thread that is currently running on the scheduler, which normally corresponds to the session running on that scheduler.

Here is the script to save yield_count and other related information to a permanent table called "yields". The script runs at the specified interval until you manually stop it:

USE <yourdb>
 CREATE TABLE yields 
 (runtime datetime, scheduler_id bigint,yield_count bigint,runnable int, session_id int,start_time datetime,command varchar(200),database_id int) 

 GO   

 SET NOCOUNT ON 
 WHILE(1=1)
 BEGIN 
 INSERT INTO yields
 SELECT getdate() 'runtime', a.scheduler_id, a.yield_count, runnable_tasks_count, session_id,start_time, command,database_id 
 FROM sys.dm_os_schedulers a
inner join sys.dm_os_workers b on a.active_worker_address=b.worker_address 
left join sys.dm_exec_requests c on c.task_address=b.task_address 
--Most system has less than 1024 cores, use this to ignore those HIDDEN schedulers 
WHERE a.scheduler_id<1024 
 --Monitor it every 5 seconds. you can change it to meet your needs
 WAITFOR DELAY '00:00:05'
 END 

To get interesting non-yielding scheduler information out of the yields table, I use the script below. It is not perfect, but it gives you an idea of how to extract meaningful information from the captured data.

DECLARE scheduler_cur CURSOR  
FOR SELECT scheduler_id from yields group by scheduler_id order by scheduler_id
OPEN scheduler_cur
DECLARE @id bigint
FETCH NEXT  FROM scheduler_cur INTO @id
WHILE (@@FETCH_STATUS=0)
BEGIN 
 DECLARE delta_cur CURSOR 
 FOR SELECT runtime, yield_count,scheduler_id,runnable,session_id,start_time, command,database_id 
 FROM yields WHERE scheduler_id=@id ORDER BY runtime ASC 
 OPEN delta_cur
 DECLARE @runtime_previous datetime,@yieldcount_previous bigint
 DECLARE @runtime datetime,@yieldcount bigint,@scheduler_id bigint,@runnable int,@session_id int,@start_time datetime,@command varchar(200),@database_id int

 FETCH NEXT FROM delta_cur INTO  @runtime ,@yieldcount ,@scheduler_id,@runnable ,@session_id ,@start_time,@command,@database_id 
 SET @runtime_previous=@runtime;SET @yieldcount_previous=@yieldcount
 FETCH NEXT FROM delta_cur INTO  @runtime ,@yieldcount ,@scheduler_id ,@runnable,@session_id ,@start_time,@command ,@database_id  

 WHILE(@@FETCH_STATUS=0)
 BEGIN 
--We find one non-yielding scheduler during the runtime delta
IF(@yieldcount=@yieldcount_previous)
BEGIN 
PRINT 'Non-yielding Scheduler Time delta found!'
  SELECT @runtime_previous 'runtime_previous', @runtime 'runtime', datediff(second, @runtime_previous,@runtime) 'non_yielding_scheduler_time_second', @yieldcount_previous 'yieldcount_previous',
  @yieldcount 'yieldcount' ,@scheduler_id 'scheduler_id',@runnable 'runnable_tasks' ,@session_id 'session_id' ,@start_time 'start_time', 

  @command 'command' ,@database_id  'database_id'
END 

-- print @id
SET @runtime_previous=@runtime;SET @yieldcount_previous=@yieldcount
FETCH NEXT FROM delta_cur INTO  @runtime ,@yieldcount ,@scheduler_id,@runnable ,@session_id ,@start_time,@command ,@database_id    

 END

 CLOSE delta_cur
 DEALLOCATE delta_cur
 FETCH NEXT  FROM scheduler_cur INTO @id 

END 
CLOSE scheduler_cur
DEALLOCATE scheduler_cur 

The output looks like below: 

From the above output you can see that scheduler 1 registered non_yielding_scheduler_time several times. Actually, scheduler 1 is hung because I suspended its active worker thread in a debugger.

 

If you want to capture more information about the user session, like application name, hostname, etc., you can run a Profiler trace or an xevent session at the same time to capture those events, and then correlate that information with the yields table to drill down further.

 

 

Lesson learned from an Availability Group performance case

Writer: Simon Su
Technical Reviewer: Pam Lahoud, Sourabh Agarwal, Tejas Shah 

Problem description 

One of my customers implemented a synchronous AG (Availability Group) solution with a very high workload, needing 10K transactions/sec in the AG databases. With in-memory technology this 10K/sec goal was achieved, but they found a very strange behavior in SQL Server's transaction processing. During stress testing, about every 5~10 minutes the transactions/sec counter (actually the SQL Server 2016 XTP Transactions:Transactions Created/sec counter) would suddenly drop to zero and then quickly resume to normal within a second or tens of microseconds. Normally you would not notice this because the duration of the dip is so short. My customer's transactions are very time-sensitive, so he has his own transactions/sec calculation formula, and he found this short sharp drop in his monitoring log. If we look at SQL Server 2016 XTP Transactions:Transactions Created/sec in his captured performance monitor log from the primary replica, it looks like below:

I highlighted the sharp drop with a red circle in the above chart. If we export the performance monitor log to a text file, the "Transactions Created/sec" counter has the values below:

You can see that the counter suddenly dropped to 33 at 37:53.4, as highlighted above. I do not think this drop is serious, since SQL Server resumes the same high transaction processing speed from the next second. However, my customer was curious about this little dip and wanted to find out its root cause.

How to troubleshoot AG performance delay? 

For AG performance troubleshooting, we have two very good public articles: 

https://blogs.msdn.microsoft.com/saponsqlserver/2013/04/21/sql-server-2012-alwayson-part-11-performance-aspects-and-performance-monitoring-i/ 

 https://blogs.msdn.microsoft.com/saponsqlserver/2013/04/24/sql-server-2012-alwayson-part-12-performance-aspects-and-performance-monitoring-ii/ 

If you are not familiar with AG performance troubleshooting concepts and steps, please read the above two articles first. Let us look at the two key performance counters to check the transaction delay on my customer's synchronous-commit mode replicas:

  • SQL Server:Database Replica –> Transaction Delay
  • SQL Server:Database Replica –> Mirrored Write Transactions/sec

In performance monitor, these two counters look like below: 

 

The Transaction Delay value is an accumulation of the delay of all the currently committing transactions, in milliseconds. You can see that the Transaction Delay counter has spikes at the same points where Transactions Created/sec suddenly drops. Its spikes indicate that at those points in time the AG transactions experienced delay during commit. This gives us a very good starting point: we can focus on the transaction delay in our AG performance troubleshooting.

So what causes the transaction delay? Is it the primary replica, the secondary replica, or another factor like network traffic?

As a mandatory first step of performance troubleshooting, we captured performance monitor logs to check how both replicas behaved. We wanted to find out whether there was any performance bottleneck on the primary or the secondary; for example, whether CPU usage was high when the transaction delay spike happened, whether the disk queue length was long, whether disk latency was large, etc. We expected to find something with the same spike trend as "Transactions Created/sec" or "Transaction Delay". Unfortunately, we did not find anything interesting: CPU usage was as low as 30%, disk speed was quite fast, and there was no disk queue length at all. We then checked AG-related counters, like the log send queue and the recovery queue mentioned in the two links above, but again we did not find anything helpful. We reached the conclusions below from the performance monitor log:

--There is no overall CPU performance bottleneck.

--There is no disk performance bottleneck, and especially no disk issue on the secondary replica.

--There is no network traffic issue.

In short, the performance monitor log does not tell us much about why the transaction delay is happening.

 

Who introduces the transaction delay? 

To investigate the details of the AG transaction performance, we need to study the performance of data movement between the two synchronous replicas. I wrote another article discussing the detailed steps to troubleshoot log block movement latency: 

Troubleshooting data movement latency between synchronous-commit AlwaysOn Availability Groups
https://blogs.msdn.microsoft.com/psssql/2018/04/05/troubleshooting-data-movement-latency-between-synchronous-commit-always-on-availability-groups/ 

I used a similar script to the one in the above article to capture xevent traces on both replicas. From the xevent logs we found that the transaction latency was not caused by the following factors:

  • Network transfer

  • Local log harden

  • Remote log harden

The latency happens on the primary replica after it receives the LSN harden message from the remote node. This is a big milestone because it gives us a clear direction for further investigation: we should focus on the primary to understand why it cannot commit the transaction in time. Below is the figure showing where the delay comes from:

 

From the above xevent log we can see that the delay (a gap of about 3.2 seconds) occurs mainly between the hadr_receive_harden_lsn_message and hadr_db_commit_mgr_update_harden xevents, i.e. between step 13 and step 14 in the figure below:

Normally, once hadr_receive_harden_lsn_message arrives from the remote replica, SQL Server processes the message and updates the LSN progress very quickly. Here we see there is a delay in processing the messages.

HADR_LOGPROGRESS_SYNC Wait 

Now comes the challenge: how do we troubleshoot further why steps 13-14 produce the latency? To get the answer, I use the below script (wait.sql) to capture the request status every second:

declare @i integer = 0
WHILE (1=1)
BEGIN
    set @i = @i + 1

    RAISERROR ('-- sys.dm_exec_requests --', 0, 1) WITH NOWAIT
    SELECT GETDATE() 'runtime', * FROM sys.dm_exec_requests WHERE session_id > 50

    RAISERROR ('-- sys.dm_os_waiting_tasks --', 0, 1) WITH NOWAIT
    SELECT GETDATE() 'runtime', * FROM sys.dm_os_waiting_tasks WHERE session_id > 50

    --Please don't use such a small delay in production, it will eat up one core's usage.
    WAITFOR DELAY '00:00:01.000';
END
GO

 

Luckily, the script output gives me a big finding: whenever the sharp transaction drop occurs, there are always HADR_LOGPROGRESS_SYNC waits happening as well:

The HADR_LOGPROGRESS_SYNC wait is a "concurrency control wait when updating the log progress status of database replicas". To update the log progress for an AG database (like the latest hardened LSN from the remote replica), a thread has to acquire the HADR_LOGPROGRESS_SYNC lock first. At any given point in time, only one thread can hold this lock, and while it is held, other threads that want to update the log progress have to wait until the lock is released. For example, thread A holds this lock to record that the latest hardened LSN is 1:20:40; after thread A finishes, it releases the lock, and then thread B takes the lock and updates the remote hardened LSN to 1:20:44. LSN progress updates have to be serialized to keep the log consistent.
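To see how much these two wait types have accumulated on your own primary replica, a simple sketch against sys.dm_os_wait_stats can help (the values are cumulative since instance start, so take two snapshots and compare the deltas):

SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN ('HADR_LOGPROGRESS_SYNC', 'HADR_SYNC_COMMIT')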

 

Besides the HADR_LOGPROGRESS_SYNC waits in the output, there are also lots of HADR_SYNC_COMMIT waits occurring. This is expected, because we know there is latency happening at that time (see the transaction delay spike at the beginning of this article). Here is the screenshot of the HADR_SYNC_COMMIT threads:

What is the relationship between the HADR_LOGPROGRESS_SYNC wait and the HADR_SYNC_COMMIT wait? It took me some time to understand that for a synchronous replica, a log block can contain several log records from different transactions, and these transactions are grouped together to commit on the replica; this is the behavior called "group commit" in Availability Groups. When the log block is hardened on the remote replica, the replica sends the hardened LSN back to the primary (we call these sync progress messages). The primary receives the hardened LSN and then acquires the HADR_LOGPROGRESS_SYNC lock to update the latest hardened LSN on the primary database. All the transactions waiting on HADR_SYNC_COMMIT are then signaled that the remote commit is done if their expected hardened LSN is less than the latest hardened LSN from the remote replica. When the local commit and the remote commit are both done, the user transaction is considered "committed". Note that we are talking about synchronous-commit mode replicas here. If the thread cannot acquire the HADR_LOGPROGRESS_SYNC lock to update the latest LSN, then many threads can end up in HADR_SYNC_COMMIT waits because they cannot get the signal from the log progress update thread.

 

Now comes the million-dollar question: why is there such a long HADR_LOGPROGRESS_SYNC wait? In other words, who owns the HADR_LOGPROGRESS_SYNC lock for that long? From the HADR_LOGPROGRESS_SYNC wait figure shown above, we see that SPID 438 has a last_wait_type of HADR_LOGPROGRESS_SYNC; is it possible that it is the owner of the HADR_LOGPROGRESS_SYNC lock? Later investigation confirmed that SPID 438 was indeed holding HADR_LOGPROGRESS_SYNC at that time. But why did it hold the lock for so long?

 

 

Scheduler Yielding issue 

We checked the output of wait.sql to see if we could find out why SPID 438 held the lock for more than 2 seconds. From the output I saw that the status of SPID 438 was "background", so I could not tell whether it was running or sitting in the runnable queue. To figure out whether this thread was really running or runnable, we can check the active worker thread of its scheduler: if the active worker thread of the scheduler is the same as this thread, then we know this thread is running on the scheduler. I wrote the article below to demonstrate how to troubleshoot thread scheduling and yielding:

Troubleshooting SQL Server Scheduling and Yielding 

https://blogs.msdn.microsoft.com/psssql/2018/04/05/troubleshooting-sql-server-scheduling-and-yielding/ 

 

I used the same technique to capture logs. The finding is simple: the HADR_LOGPROGRESS_SYNC thread was sitting in the runnable queue for about 100 ms to 1 second, which caused lots of HADR_SYNC_COMMIT waits of the same duration, and no doubt this in turn caused the transaction delay spike you saw at the beginning of this article. Here is what the scheduler yield_count looks like:

You can see that the yield_count (27076130) of scheduler 23 does not change within 1 second, which means someone is actively running on the scheduler without yielding to other threads. You can also see that runnable_tasks_count is 7, which means there are 7 threads waiting in the runnable queue.

The wait_info xevent trace also confirms that the HADR_LOGPROGRESS_SYNC thread was waiting in the runnable queue for a while:

You can see that for SPID 438 the signal_duration (2407 ms) is the same as the duration column. This means it had been sitting in the runnable queue for about 2407 ms.

 

Who is holding the scheduler without yielding 

From the above investigation we understand that the thread that owns the HADR_LOGPROGRESS_SYNC lock cannot get a chance to run on the scheduler in time, and hence it causes the transaction delay spike (i.e. the sharp transaction rate drop). Using the technique described in the article "Troubleshooting SQL Server Scheduling and Yielding", we finally found that the "offending" thread was running a query that accesses a big in-memory table. The memory-optimized table is big and its index is also very large, and the query often takes hundreds of microseconds to run. If the worker thread that picks up the HADR_LOGPROGRESS_SYNC message happens to be on the same scheduler, then the two compete for CPU at the same time. In this case, SQL Server was running the query without yielding for about one second, and this one second of non-yielding scheduler time caused the HADR_LOGPROGRESS_SYNC thread to wait in the runnable queue for one second. Because of this, all the HADR_LOGPROGRESS_SYNC waiters had to wait one second for the lock, which in turn blocked the threads in HADR_SYNC_COMMIT waits for one second as well.

The solution is simple: we involved the product group to add yielding code when scanning the in-memory table, and the problem is fixed (in SQL Server 2016 SP1 CU7, see https://support.microsoft.com/en-us/help/4057280/high-cpu-usage-when-large-index-use-in-query-on-memory-optimized-table).

 

 

July 10, 2018 Windows updates cause SQL startup issues due to “TCP port is already in use” errors

We have recently become aware of a regression in one of the TCP/IP functions that manages the TCP port pool which was introduced in the July 10, 2018 Windows updates for Windows 7/Server 2008 R2 and Windows 8.1/Server 2012 R2.

This regression may cause the restart of the SQL Server service to fail with the error "TCP port is already in use". We have also observed this issue preventing Availability Group listeners from coming online during failover events, for both planned and unexpected failovers. When this occurs, you may observe errors similar to the below in the SQL ERRORLOGs:

Error: 26023, Severity: 16, State: 1.
Server TCP provider failed to listen on [ <IP ADDRESS> <ipv4> <PORT>]. Tcp port is already in use.
Error: 17182, Severity: 16, State: 1.
TDSSNIClient initialization failed with error 0x2740, status code 0xa. Reason: Unable to initialize the TCP/IP listener. Only one usage of each socket address (protocol/network address/port) is normally permitted.
Error: 17182, Severity: 16, State: 1.
TDSSNIClient initialization failed with error 0x2740, status code 0x1. Reason: Initialization failed with an infrastructure error. Check for previous errors. Only one usage of each socket address (protocol/network address/port) is normally permitted.
Error: 17826, Severity: 18, State: 3.
Could not start the network library because of an internal error in the network library. To determine the cause, review the errors immediately preceding this one in the error log.
Error: 17120, Severity: 16, State: 1.
SQL Server could not spawn FRunCommunicationsManager thread. Check the SQL Server error log and the Windows event logs for information about possible related problems.

If the issue is impacting an Availability Group listener, you may also observe the below error in addition to the above:

Error: 26075, Severity: 16, State: 1.
Failed to start a listener for virtual network name '<LISTENER NAME>'. Error: 10048.

Additionally, you may also observe the following errors in the Windows System logs:

The SQL Server (<INSTANCE NAME>) service entered the stopped state.
The SQL Server (<INSTANCE NAME>) service terminated with the following service-specific error:  Only one usage of each socket address (protocol/network address/port) is normally permitted.

And if the instance is part of a cluster:

Cluster resource 'SQL Server (<INSTANCE NAME>)' of type 'SQL Server' in clustered role 'SQL Server (<INSTANCE NAME>)' failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
The Cluster service failed to bring clustered role 'SQL Server (<INSTANCE NAME>)' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

It is also possible for this issue to impact the creation of a new Availability Group listener. In such scenarios, you may encounter an error like below from SQL Server Management Studio:

The configuration changes to the availability group listener were completed, but the TCP provider of the instance of SQL Server failed to listen on the specified port [<LISTENER NAME>:<PORT>]. This TCP port is already in use. Reconfigure the availability group listener, specifying an available TCP port. For information about altering an availability group listener, see the “ALTER AVAILABILITY GROUP (Transact-SQL)” topic in SQL Server Books Online. (Microsoft SQL Server, Error: 19486)

For this scenario, you may see errors similar to below in the SQL ERRORLOGs:

Error: 19476, Severity: 16, State: 4.
The attempt to create the network name and IP address for the listener failed. If this is a WSFC availability group, the WSFC service may not be running or may be inaccessible in its current state, or the values provided for the network name and IP address may be incorrect. Check the state of the WSFC cluster and validate the network name and IP address with the network administrator. Otherwise, contact your primary support provider.
The Service Broker endpoint is in disabled or stopped state.
Error: 26023, Severity: 16, State: 1.
Server TCP provider failed to listen on [ <IP ADDRESS> <PORT>]. Tcp port is already in use.
Error: 26075, Severity: 16, State: 1.
Failed to start a listener for virtual network name ‘<LISTENER NAME>:’. Error: 10048.
Stopped listening on virtual network name ‘<LISTENER NAME>:’. No user action is required.
Error: 10800, Severity: 16, State: 1.
The listener for the WSFC resource ‘<RESOURCE GUID>’ failed to start, and returned error code 10048, ‘Only one usage of each socket address (protocol/network address/port) is normally permitted.‘. For more information about this error code, see “System Error Codes” in the Windows Development Documentation.
Error: 19452, Severity: 16, State: 1.
The availability group listener (network name) with Windows Server Failover Clustering resource ID ‘<RESOURCE GUID>’, DNS name ‘<LISTENER NAME>’, port <PORT> failed to start with a permanent error: 10048. Verify port numbers, DNS names and other related network configuration, then retry the operation.

 

Solution:

The Windows team has already released hotfixes to address this issue, and multiple customers have already confirmed that these hotfixes resolved issues related to this regression. The tables below list the KB articles for the patches that introduced the regression and the KB articles for their corresponding hotfixes.

 

For Windows 7/Server 2008 R2

KBs that introduced the regression             | KBs that fix the regression
July 10, 2018—KB4338818 (Monthly Rollup)       | July 18, 2018—KB4338821 (Preview of Monthly Rollup)
July 10, 2018—KB4338823 (Security-only update) | Improvements and fixes - Windows 7 Service Pack 1 and Windows Server 2008 R2 Service Pack 1 (KB4345459)

For Windows Server 2012

KBs that introduced the regression             | KBs that fix the regression
July 10, 2018—KB4338830 (Monthly Rollup)       | July 18, 2018—KB4338816 (Preview of Monthly Rollup)
July 10, 2018—KB4338820 (Security-only update) | Improvements and fixes - Windows Server 2012 (KB4345425)

For Windows 8.1/Server 2012 R2

KBs that introduced the regression             | KBs that fix the regression
July 10, 2018—KB4338815 (Monthly Rollup)       | July 18, 2018—KB4338831 (Preview of Monthly Rollup)
July 10, 2018—KB4338824 (Security-only update) | Improvements and fixes - Windows 8.1 and Server 2012 R2 (KB4345424)

 

You can choose to install either of the applicable KBs that fix the regression in order to resolve issues with the SQL Server service or Availability Group listeners failing to start or come online with "TCP port is already in use" errors caused by this regression. For example, if your system has KB4338815, you can install either KB4338831 or KB4345424 to fix the regression. The difference between the two is that KB4345424 provides only the fix for the regression, whereas KB4338831 includes all of the fixes from KB4338815 as well as some additional quality improvements as a preview of the next Monthly Rollup update (which includes the fix for the regression).

In addition to the monthly rollup/security-only updates mentioned above, this regression was also introduced in updates for specific Windows 10/Server 2016 builds. Please note that the build-specific updates do not have a corresponding hotfix-only patch, therefore each build has only one applicable patch to address the regression, as noted in the table below.

 

KB that introduced the regression              | KB that fixes the regression
July 10, 2018—KB4338819 (OS Build 17134.165)   | July 16, 2018—KB4345421 (OS Build 17134.167)
July 10, 2018—KB4338825 (OS Build 16299.547)   | July 16, 2018—KB4345420 (OS Build 16299.551)
July 10, 2018—KB4338826 (OS Build 15063.1206)  | July 16, 2018—KB4345419 (OS Build 15063.1209)
July 10, 2018—KB4338814 (OS Build 14393.2363)  | July 16, 2018—KB4345418 (OS Build 14393.2368)
July 10, 2018—KB4338829 (OS Build 10240.17914) | July 16, 2018—KB4345455 (OS Build 10240.17918)

 

There can be other causes of the "TCP port is already in use" errors preventing SQL resources from starting/coming online which are not due to the regression mentioned above. If you are encountering similar errors but do not have the July 10, 2018 updates installed on your system, or you already have the fix installed, then you may find our colleague Chris Thompson's blog - https://blogs.msdn.microsoft.com/sql_pfe_blog/2016/10/05/tcp-port-is-already-in-use/ - useful in identifying whether any other process(es) may be using the port meant for your SQL instance(s).


AGLatency report tool introduction

I wrote an article discussing data movement latency between AG replicas:

Troubleshooting data movement latency between synchronous-commit AlwaysOn Availability Groups

Now I have developed a tool to analyze AG log block movement latency between replicas and create a report accordingly.


You can download it here: https://github.com/suyouquan/AGLatency/releases/download/V1.0/AGLatencyV1.01.zip

Video: https://github.com/suyouquan/AGLatency/blob/master/AGLatency.mp4

You capture the log block xevent trace from both the primary and secondary replicas for 5-10 minutes, and then this tool will generate a report about the latency of the log block movement.


 

Regards,
Simon

Understanding Optimizer Timeout and how Complex queries can be Affected in SQL Server

What Is Optimizer Timeout?

SQL Server uses a cost-based query optimizer. Therefore, it selects a query plan with the lowest cost after it has built and examined multiple query plans. One of the objectives of the SQL Server query optimizer (QO) is to spend a "reasonable time" in query optimization as compared to query execution. Therefore, QO has a built-in threshold of tasks to consider before it stops the optimization process. If this threshold is reached before QO has considered most, if not all, possible plans, then it has reached the Optimizer TimeOut limit. An event is reported in the query plan as Time Out under "Reason For Early Termination of Statement Optimization." It's important to understand that this threshold isn't based on clock time but on the number of possibilities considered. In current SQL QO versions, over half a million possibilities are considered before a time out is reached.


Optimizer timeout is by design in Microsoft SQL Server, and in many cases encountering it is not a factor affecting query performance. However, in some cases the query plan choice may be affected by optimizer timeout and thus performance could be impacted. When you encounter such issues, understanding the optimizer timeout mechanism and how complex queries can be affected in SQL Server can help you better troubleshoot and improve your performance issue.

What are the Symptoms?
Here are some of the factors involved:
  •  You have a complex query that involves lots of joined tables (for example, 8 or more tables are joined).
  • The query may run slowly or slower than when you compare it to another SQL Server version or another system.
  • The query plan of the query shows the following information in the XML query plan: StatementOptmEarlyAbortReason="TimeOut". Or, if you verify the properties of the left-most plan operator in Microsoft SQL Server Management Studio, you notice the value of "Reason For Early Termination of Statement Optimization" is “TimeOut.”
For example, the following is the XML output of a query plan that shows the optimizer timeout:
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.518" Build="13.0.5201.2" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
 <BatchSequence>
  <Batch>
   <Statements>
    <StmtSimple StatementCompId="6" StatementEstRows="419.335" StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="TimeOut" ......>
    ...
   </Statements>
  </Batch>
 </BatchSequence>
</ShowPlanXML>

 

The following is an example of a graphical representation of a plan, which displays a "TimeOut" value:
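Beyond inspecting an individual plan in SSMS, you can also search the plan cache for plans that hit this condition. The following is only a rough illustrative sketch; it can be expensive on a large plan cache, so run it with care:

-- Sketch: find cached plans whose XML carries the optimizer timeout marker.
SELECT TOP (20)
       cp.usecounts,
       cp.objtype,
       st.text,
       qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE CAST(qp.query_plan AS nvarchar(max)) LIKE N'%StatementOptmEarlyAbortReason="TimeOut"%'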

 

 

How does it work, then?

There's no simple formula to determine what conditions would cause the optimizer threshold to be reached or exceeded. However, the following are some factors that determine how many plans are explored by QO in the process of looking for a "best plan":

  • In what order to join tables:
    • join Table1 with Table2 and the result with Table3
    • join Table1 with Table3 and the result with Table2
      Note: The larger the number of tables, the more possibilities there are.
  • What heap or binary tree (HoBT) access structure to use to retrieve the rows from a table:
    • Nonclustered Index1
    • Nonclustered Index2
    • Clustered index, and so on
  • What access method:
    • index seek
    • index scan
    • table scan
  • What physical join operator to use:
    • Nested Loop
    • Hash Match
    • Merge Join (NL, HM, MJ)
  • Should a parallel plan be used or a serial one?

To illustrate, take the example of a join between 3 tables (T1, T2, and T3), where each table has only a clustered index. There are two joins involved here, and because there are 3 physical join possibilities (NL, HM, MJ), the two joins can be performed in 6 (2 * 3) ways. Also consider the join order:

  • T1 joined to T2 and then to T3
  • T1 joined to T3 and then to T2
  • T2 joined to T3 and then to T1

Now multiply 6 ways * 3 join orders and we have a minimum of 18 possible plans to choose from. If you include the possibility of parallelism and other factors like seek or scan of the HoBT, the possible plans increase even more. If you are a math wizard you can figure out that when a query involves, for example, 10 tables, the possible permutations are in the millions. Therefore, you can see that a query with lots of joins is more likely to reach the optimizer timeout threshold than one with fewer joins.

Note The query predicates (filters in the WHERE clause) and existence of constraints will reduce the number of access methods considered and thus the possibilities considered.

The result of reaching the optimizer timeout threshold is that SQL Server has not considered the entire set of possibilities for optimization and it may have missed a set of plans that could produce shorter execution times. QO will stop at the threshold and consider the least-costly query plan at that point, although there may be better unexplored options. This may result in a query execution that's suboptimal.

But I see an Optimizer Timeout with a simpler query?

Nothing with QO is simple (black and white). There are so many possible scenarios, and its complexity is so high, that it is hard to grasp all of the possibilities. The Query Optimizer may dynamically adjust the timeout threshold based on the cost of the plan found at a certain stage. For example, if a plan that appears relatively "cheap" is found, then the task limit to search for a better plan may be reduced. Therefore, a grossly underestimated cardinality estimate may be one example of hitting an optimizer timeout early. In this case, the focus of investigation is cardinality estimation. This is a rarer case than the previously discussed scenario of running a complex query, but it is possible.

So, what do I do?

You may have to do nothing. In many cases the plan you get is quite reasonable and the query you are running is performing well. But in the cases where you find the need to tune and optimize, consider the following options:

First, do this: 

  • Establish that the query under investigation is slower when you compare it to running on a different build of SQL Server, with a different CE configuration, or on a different system. One of my mottos in performance tuning is "There is no performance problem without a baseline."
  • Examine your query in detail when you determine its complexity. Upon initial examination, it may not be obvious that the query is complex and involves many joins. This is common when views or table-valued functions are involved. For example, on the surface the query may seem simple because it joins two views. But when you examine the views, you may find that each view joins 7 tables; when the two views are joined, you end up with a 14-table join.

The following are different alternatives that you can explore to help improve the performance of the query. Again, be aware that the fact that an optimizer timeout is present in a query plan, does not necessarily mean that it's the reason for query slowness. 

  • Force a particular plan: If you determine through testing that a particular plan is better for your query, ask QO to select that plan (for example, with a plan guide or the USE PLAN query hint).
  • Try to reduce the permutations/possibilities that QO needs to consider. This involves testing the query with different options. Note: As with most decisions involving QO, the choices are not always deterministic because a large variety of factors is considered; therefore, there is no single guaranteed successful strategy, and these may improve or worsen the performance of the selected query. For more information, see Query Hints (a combined example appears after the rewrite sample below):
    • Eliminate the order permutations: OPTION (FORCE ORDER)
    • Reduce the JOIN possibilities: OPTION (HASH JOIN, MERGE JOIN), OPTION (HASH JOIN, LOOP JOIN) or OPTION (MERGE JOIN)
  • Change Cardinality Estimation (CE) configuration: You can attempt to change the Cardinality Estimation configuration by switching from Legacy CE to New CE or from New CE to Legacy CE. Changing the Cardinality Estimation configuration can cause the QO to pick a different path when SQL Server evaluates and creates query plans. So even if an optimizer timeout issue occurs, it is possible that you end up with a plan that performs more optimally than the one selected using the alternate CE configuration. For more information, see how you can assess and choose the best cardinality estimation configuration for your SQL Server system.
  • Optimizer fixes: If you have not enabled QO fixes via T4199 or by using database compatibility levels for SQL Server 2016 and later or ALTER DATABASE SCOPED CONFIGURATION ..QUERY_OPTIMIZER_HOTFIXES = ON, you may consider applying these fixes. This may cause the optimizer to take a different path in plan exploration and therefore possibly end up with a more optimal query plan. For more information, see SQL Server query optimizer hotfix trace flag 4199 servicing model.
  • Re-write the query: Consider breaking up the single multi-table query into multiple queries by using temporary tables. However, you shouldn't always do this; breaking up the query is just one of the ways to simplify the task for the optimizer. See the following sample:
select ...
from t1
join t2
on t1.id = t2.id
join t3
on t3.id = t2.id
join t4
on t4.id = t3.id

To optimize, try to break down into two queries:

select ...
into #temp1
from t1
join t2
on t1.id = t2.id

select ...
From t3
join #temp1
on t3.id = #temp1.id
join t4
on t4.id = t3.id
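For the hint-based options listed earlier, the hints are simply attached to the original query with an OPTION clause. Here is a sketch against the same placeholder tables t1-t4 (combining FORCE ORDER with join hints is just one illustration; test each combination, since hints can also make things worse):

select t1.id
from t1
join t2
on t1.id = t2.id
join t3
on t3.id = t2.id
join t4
on t4.id = t3.id
option (force order, hash join, merge join) -- keep the written join order; consider only hash and merge joins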

 

Important: Using multiple Common Table Expressions (CTEs) is not an appropriate way to simplify a query. Multiple CTEs will only increase the complexity of the query, so it is counterproductive. CTEs appear to break a query up logically, but they are combined into a single query and optimized as a single large join of tables in the end.

Enjoy!

Joseph Pilov

 




