Channel: CSS SQL Server Engineers

My Kerberos Checklist…


I’ve had numerous questions regarding Kerberos, both internally within Microsoft and from customers.  It continues to be a complicated topic, and the documentation that is out there can be less than straightforward.  Based on some previous items I’ve worked on, I wanted to share my experience in regards to configuring and troubleshooting Kerberos.

Let me start by looking at two scenarios for reference.  One that is basic and the other that is complex.

image

image

As you’ll find, once we figure out how to configure the basic scenario, the complex scenario ends up being very similar.

Data Collection:

The first thing to do when you tackle a Kerberos issue is to understand your environment.  I find that a lot of the Kerberos issues I troubleshoot come down to gathering the right information to make an informed analysis and identify the problem point.  The following data points relate to all servers involved.  We will circle back on the Client after we talk about the Servers.

  1. Know your topology
  2. What is the Service Account being used for the application in question?
  3. What Service Principal Name (SPN) does your app require?
  4. What SPNs are defined for that service account?
  5. What are the delegation settings for the service account?
  6. Local Policy related information
  7. Additional application specific information

 

Consistent vs. Intermittent Kerberos Issues

The data collection points above should allow you to get Kerberos working in most cases.  I say most cases because the above refers specifically to configuration.  I typically break issues down into consistent vs. intermittent.  If the issue is reproducible every time, it is a configuration issue.  If it is intermittent, it is usually not a configuration issue; if it were, it would happen all the time.  Intermittent means it works most of the time, and in order to work at all, it has to be configured correctly.  The exception to this would be a Farm-type situation where the configuration is not the same on every box in the farm.  Sometimes you may hit Server A, which is configured properly, and another time you may hit Server B, which is not, causing an error.  Which brings us to the first data collection point…

Know your topology

Before you begin, you should know what servers are involved in your application as a whole.  If we are talking about a single web application, you probably have at least two servers to consider and know about – the Web Server and the Backend (SQL for our purposes).  They both play a part.  This becomes even more important in a distributed environment where you may have 3+ servers.

As you’ll see, with the data collection items, we basically will walk the line down your servers to check them one by one.

What is the Service Account?

For the particular server you are looking at, what is the service account that the application is using?  This is important, because this will tell us where the SPN needs to go.  It also plays a part in Delegation.  Not every service will be a Windows Service, so this could be dependent on the application in question.  Here are some examples:

SharePoint

IIS – not a windows service

image

Reporting Services

Windows Service

image

SQL Server

Windows Service

image

For Windows services, you can also look in the Services MMC to get the information.  Again, you need to know what your application is doing:

image

What SPN does your app require?

We can look at all sorts of SPN listings, but before you do, we need to know what it is we are looking for.  I think this is one of the more complicated parts of Kerb configuration because the SPN is dependent on the application you are using.  The format of the SPN is consistent between applications, but what is required is dependent on the application, or from an SPN point of view, the service.  It is a Service Principal Name after all!

The SPN has the following format:  <service>/<host>:<port/name>

The port/name piece of this is optional and dependent on what the service will accept.
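As a rough sketch of the format above (the helper name and hosts here are hypothetical; real SPNs are registered in Active Directory with tools like SetSPN, not built by hand like this):

```python
def build_spn(service, host, port_or_name=None):
    """Compose an SPN as <service>/<host>:<port/name>.

    Illustration only.  The optional third piece is appended only
    when the service in question actually accepts one.
    """
    spn = f"{service}/{host}"
    if port_or_name is not None:
        spn = f"{spn}:{port_or_name}"
    return spn

print(build_spn("http", "passsp.pass.local"))             # http/passsp.pass.local
print(build_spn("MSSQLSvc", "passsql.pass.local", 1433))  # MSSQLSvc/passsql.pass.local:1433
```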

HTTP – For a default configuration, the port is never used for an HTTP SPN.  SPNs are unique, and if you add an HTTP SPN with a port on it, it will be ignored because it is not correct.  IIS and Internet Explorer do not affix the port number to the SPN request when they look for it.  From an Internet Explorer perspective, you can alter this behavior via a registry key so that it will, but I have yet to see anyone do that; most people aren’t aware of it from what I can tell.  From my experience, I would stay away from adding a port to an HTTP SPN.

MSSQLSvc – you can look at the following blog post to read more about how SQL determines the SPN needed.  http://blogs.msdn.com/b/psssql/archive/2010/03/09/what-spn-do-i-use-and-how-does-it-get-there.aspx

For the next couple of items, we will use the SharePoint service as the example – spservice.  In this case it is a web application, so we know it will use the HTTP service from an SPN perspective.  The host piece is dependent on how we are connecting to the web server.  This is true for any application really.  From an HTTP perspective it is the URL; for SQL it is the connection string.  Another thing to know is that both IIS and SQL will resolve a NetBIOS name to the Fully Qualified Domain Name if it can.  For example – http://passsp will be resolved to passsp.pass.local.

For our spservice example with a url of http://passsp, our SPN turns out to be http/passsp.pass.local and it is placed on the spservice account.

Another special note about HTTP SPNs.  If for example my SharePoint AppPool (service) was using Network Service, this is considered the machine context so the SPN would go on the machine account (PASSSP).  However, HTTP is considered a covered service for a special service type called HOST.  Every Machine account has a HOST entry for the FQDN as well as the NetBIOS name.  You don’t need to add an HTTP SPN on the machine account as long as your URL matches the machine name.

When adding an SPN, I also always recommend that you add both the FQDN SPN (i.e. http/passsp.pass.local) as well as the NetBIOS SPN (i.e. http/passsp).  The NetBIOS SPN is a safety measure in case the DNS resolution fails and it just submits the NetBIOS SPN request.

What SPN is defined?

Now that we know the service account and what our SPN should be, we can look at the SPNs that are defined on that account.  We can use SetSPN to do this, although there are other tools that can help get this information for you (ADSIEdit, LDAP queries, etc.).  SetSPN is nice, though, as it ships with the Operating System starting with Windows 2008.  Let’s have a look at our SharePoint service account – spservice:

image

Based on what we came up with above, we can see that the passsp SPNs are in place.  You’ll also notice another SPN present, which means this service account is hosting two HTTP services (could be two AppPools on the one server, or on two separate servers).

You could run into a situation where the SPN is defined on another account as well.  This may be a misplaced or a duplicate SPN; both will cause an issue for you.  Usually when I grab SPN information from an environment, I grab all SPNs defined in the domain so that I can look for misplaced or duplicate SPNs.  The SetSPN tool that comes with Windows 2008 and later (and can be downloaded for Windows 2003) contains a switch, –X, that will look for duplicates for you.

image

In the above, you can see two accounts that had the http/passsp.pass.local SPN.  You can then decide which one really needs to be there based on the Service Account being used. 
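The duplicate check that setspn –X performs can be sketched roughly like this (the account names and the domain snapshot are made up for illustration; the real tool queries Active Directory):

```python
from collections import defaultdict

def find_duplicate_spns(accounts):
    """Given a mapping of account -> list of SPNs, report any SPN that
    is registered on more than one account (what setspn -X looks for)."""
    owners = defaultdict(list)
    for account, spns in accounts.items():
        for spn in spns:
            owners[spn].append(account)
    return {spn: accts for spn, accts in owners.items() if len(accts) > 1}

# Hypothetical domain snapshot:
domain = {
    "PASS\\spservice": ["http/passsp.pass.local", "http/passsp"],
    "PASS\\olduser":   ["http/passsp.pass.local"],
}
print(find_duplicate_spns(domain))
# {'http/passsp.pass.local': ['PASS\\spservice', 'PASS\\olduser']}
```

Once a duplicate shows up, you decide which account really needs it based on the service account actually in use.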

What are the delegation settings?

Delegation only comes into play if you want the Client’s Windows credentials forwarded to another service.  For example, SharePoint to Reporting Services, Reporting Services to SQL, or even SQL to SQL in a Linked Server scenario.  NTLM does not allow for the forwarding of credentials.  This is accomplished through the process of delegation as part of the Kerberos Protocol. There are two main types of Delegation – Full Trust or Constrained Delegation.  Of note, you will not see the Delegation Tab on the Account within Active Directory unless an SPN has been assigned to that account.

Full Trust

This means that the given service can forward the Client’s credentials to any service.  You are indiscriminate in who you communicate with.  This is the less secure option of the two, but it is the easiest to configure (which I would expect, being less secure – secure always means complicated, right?).

image

Constrained Delegation

Constrained means that you are going to specify which services you can actually delegate to.  The services are represented by SPNs.  This is the more secure approach, but it has some drawbacks.  As mentioned before, it is more complicated.  The reason is that you have to know exactly what your application is trying to delegate to; it may not be just the service you are interested in.  For example, you may be configuring SharePoint for delegation to Reporting Services, but then realize that you just broke a connection to SQL, or maybe a connection to some web service you are trying to hit that requires Kerberos.  It’s not really that bad as long as you understand everything that your application is going to reach out to that would require passing on the Client’s credentials.

The other drawback to Constrained Delegation is that you lose the ability to cross a domain boundary, meaning a cross-domain scenario will fail from a delegation perspective.  Users from another domain can hit your application, but all of the services that you are communicating with need to be in the same domain.  For example, SharePoint (Domain A) cannot delegate to SQL (Domain B).  Under Constrained Delegation, that will fail.

In the image below, the 3rd radio button means that you want to use Constrained Delegation.  The sub radio buttons define whether you want to use only Kerberos, or whether you want to enable Protocol Transitioning.  I’m not going to get into Protocol Transitioning in this blog post as it is big enough already, but you will have to deal with Protocol Transitioning if you are using the Claims to Windows Token Service.  This comes into the picture if you are doing anything with Excel Services in SharePoint or PowerPivot.

image

 

You will need to go back to your application’s topology to determine if enabling delegation is required.  If we look at our Double Hop example from above, Reporting Services would need to have delegation enabled for its service account, but SQL would not, as SQL isn’t going out to anything using the Client’s credentials.
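The rule of thumb from the topology can be stated mechanically (a sketch; the hop names are just labels): every hop between the client and the final destination needs delegation, while the last hop does not.

```python
def needs_delegation(topology):
    """Given an ordered hop list such as ['Client', 'RS', 'SQL'],
    return the services that must forward the client's credentials:
    every hop except the first (the client itself) and the last
    (the final destination, which forwards to no one)."""
    return topology[1:-1]

print(needs_delegation(["Client", "Reporting Services", "SQL"]))
# ['Reporting Services']
print(needs_delegation(["Client", "SharePoint", "Reporting Services", "SQL"]))
# ['SharePoint', 'Reporting Services']
```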

Local Policy Settings

There is at least one Local Policy setting you’ll need to pay attention to when trying to delegate.  That is the “Impersonate a client after authentication” policy.

image

If your middle server is a web server, you can take advantage of a built-in group that has this permission.  For Windows 2003, the group is called IIS_WPG.  For Windows 2008 and later it is the IIS_IUSRS group.  By default, SharePoint and RS should place themselves in that group, so you usually don’t have to worry about it.  I’m just mentioning it here as a step in the checklist.  I rarely see this as the issue unless you are doing a custom application with a Domain User account for the service account.

Client

Let’s circle back on the Client.  You may be asking: all this is great for the application, but is there anything special I need to do for the user account coming from the client?  Not really.  By default you should be good to go from the Client’s user account.  However, there is an account setting you should be aware of within Active Directory: the “Account is sensitive and cannot be delegated” setting.  If that is checked, you will have issues with that specific user.  To this date, I have yet to see a customer actually have that checked.  That doesn’t mean people don’t do it; I just haven’t seen it.

image

Application Specific Settings

When I started getting into Kerberos, I found that almost all of the issues were based on the Active Directory settings (SPN, Delegation, etc.).  Not to say that that has lessened, but I’ve also seen a shift in the complexity of getting specific applications up and running.  As applications become more complex, you should be aware of what settings may come into play within that app that could affect Kerberos.  If you have gone through everything above and it all looks good, chances are that there is an application-specific setting that is interfering.

There is a lot to mention in this area, so I will spin up another blog post to discuss application specific settings to touch on IIS, SharePoint, Excel Services, PowerPivot and Reporting Services.  SQL doesn’t really have any Kerb specific settings as long as the SPN and delegation settings (if needed) are in place.

Tying it together…

So, we’ve looked at what my checklist is, but it was really focused on one service.  What I’ve found is that it is as simple as that.  All I do is repeat the checklist on each server that plays a part in the application (topology).  Think of it as wash, rinse, repeat.  When I help customers get Kerberos configured, I just walk the line down each server to make sure everything lines up, and I have been fairly successful with that approach.  As I’ve gained more experience with it (I usually deal with it every day), I can often target a specific segment depending on where the error is coming from.  Other times it may not be that straightforward.  Even when I target a specific area, if that doesn’t pan out, I just start from the beginning and apply the checklist to each server/service that is playing a part.

Once you approach it that way, it really doesn’t matter how many hops there are or what services are involved; you just follow the checklist one more time.  The point where complications usually come into play is when Constrained Delegation is implemented and we didn’t account for everything, or when you hit up against an app-specific issue.  Outside of that, it is usually straightforward based on the above.  Just find out what the SPN needs to be and where it needs to go and you are 80% there.

I realize I’m making it sound simple when it can be very frustrating and complicated, but the above has worked well for me in the past. Hopefully the above is helpful to you as you try to implement Kerberos within your environment. 

There is definitely way more to cover on this topic and I will continue to blog about those items.

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton


When Does sp_prepare Return Metadata


I was running an RML Utilities Suite test pass and encountered varying behavior from our sp_prepare suite.  Here is what I uncovered.

The command sp_prepare returns (or does not return) metadata depending on the server version.  For the client version, it is only significant whether it is prior to SQL 2012 or it is a later one (i.e. 2012 RTM, SP1, etc.).

1. Prior to SQL 2012, sp_prepare returns metadata to the user. This was implemented by internally setting FMTONLY ON and executing the statement.

2. In SQL 2012 RTM and SP1, sp_prepare does NOT return metadata, if client version is 2012 or greater. FMTONLY ON is deprecated and used only for backward compatibility with the older (i.e. 2008) clients.

3. In SQL 2012 CU6 (build 11.0.2401.0) and later, and SP1 CU3 and later, sp_prepare DOES return metadata to the user, if the batch contains one statement.  This is to address a performance issue with some scenarios (see hotfix KB2772525).

The following matrix shows when sp_prepare should return metadata for batches containing one statement.

Client \ Server      | 2008/R2 | 2012 RTM | 2012 CU6 + | 2012 SP1 | 2012 SP1 CU3 + | SQL 14
2008 R2              | yes     | yes      | yes        | yes      | yes            | yes
2012 (all versions)  | yes     | no       | yes        | no       | yes            | yes
SQL 14 CTP           | yes     | no       | yes        | no       | yes            | yes

yes - sp_prepare returns metadata
no - sp_prepare does NOT return metadata

The following matrix shows when sp_prepare should return metadata for multi-statement batches, such as

declare @p1 int
set @p1 = NULL
exec sp_prepare @p1 output, NULL, N'select * from sys.objects; select 1;', 1
select @p1

Client \ Server      | 2008/R2 | 2012 RTM | 2012 CU6 + | 2012 SP1 | 2012 SP1 CU3 + | SQL 14
2008 R2              | yes     | yes      | yes        | yes      | yes            | yes
2012 (all versions)  | yes     | no       | no         | no       | no             | no
SQL 14 CTP           | yes     | no       | no         | no       | no             | no
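The two matrices above can be condensed into a small lookup (a sketch only; the version strings are the matrix row/column labels, not official build identifiers): metadata is returned whenever either side is pre-2012, and otherwise only for single-statement batches on server builds carrying the KB2772525 fix.

```python
# Server builds that returned metadata again for single-statement
# batches (the KB2772525 fix), per the matrices above.
FIXED_BUILDS = {"2012 CU6 +", "2012 SP1 CU3 +", "SQL 14"}

def returns_metadata(client, server, single_statement):
    if client == "2008 R2":
        return True               # pre-2012 clients always get metadata
    if server == "2008/R2":
        return True               # pre-2012 servers always return it
    if not single_statement:
        return False              # multi-statement: 2012+ never returns it
    return server in FIXED_BUILDS # single statement: only the fixed builds

print(returns_metadata("2012 (all versions)", "2012 CU6 +", True))   # True
print(returns_metadata("SQL 14 CTP", "2012 SP1", True))              # False
```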

Bob Dorr - Principal SQL Server Escalation Engineer

Invalid or loopback address when configuring SharePoint against a SQL Server


I was presented with a connectivity issue when trying to configure SharePoint 2013 using a CTP build of SQL 2014.  They got the following error when it was trying to create the Configuration Database.

Exception: System.ArgumentException: myserver,50000 is an invalid or loopback address.  Specify a valid server address.
   at Microsoft.SharePoint.Administration.SPServer.ValidateAddress(String address)
   at Microsoft.SharePoint.Administration.SPServer..ctor(String address, SPFarm farm, Guid id)
   at Microsoft.SharePoint.Administration.SPConfigurationDatabase.RegisterDefaultDatabaseServices(SqlConnectionStringBuilder connectionString)
   at Microsoft.SharePoint.Administration.SPConfigurationDatabase.Provision(SqlConnectionStringBuilder connectionString)
   at Microsoft.SharePoint.Administration.SPFarm.Create(SqlConnectionStringBuilder configurationDatabase, SqlConnectionStringBuilder administrationContentDatabase, IdentityType identityType, String farmUser, SecureString farmPassword, SecureString masterPassphrase)
   at Microsoft.SharePoint.Administration.SPFarm.Create(SqlConnectionStringBuilder configurationDatabase, SqlConnectionStringBuilder administrationContentDatabase, String farmUser, SecureString farmPassword, SecureString masterPassphrase)
   at Microsoft.SharePoint.PostSetupConfiguration.ConfigurationDatabaseTask.CreateOrConnectConfigDb()
   at Microsoft.SharePoint.PostSetupConfiguration.ConfigurationDatabaseTask.Run()
   at Microsoft.SharePoint.PostSetupConfiguration.TaskThread.ExecuteTask()

They had indicated that they had hit this before, and they worked around it by creating a SQL Alias.  However this time it was not working.  It was presented to me as a possible issue with using SQL 2014 and I was asked to have a look to see if this would affect other customers using SQL 2014.

I found some references regarding the error, and the majority of comments indicated to have SQL Server use the default port of 1433.  Some others said to create an Alias.  Some of the SharePoint documentation even shows how to change the SQL port, and also how to create an Alias, but none really explained why this was necessary, or what SharePoint was actually looking for.

For this issue, it has nothing to do with SQL 2014 specifically and could happen with any version of SQL.  The issue is what SharePoint is looking for.  Whatever you put in for the server name needs to be a valid DNS name.  For a non-default port (anything other than 1433), you would need to create a SQL Alias.  If you create a SQL Alias, the name should be resolvable and not a made-up name that doesn’t exist in DNS.  Otherwise, you will get the same error.

 

Techie Details

I started by looking at the error first.  Of note, this is a SharePoint specific error and not a SQL error.

Exception: System.ArgumentException: myserver,50000 is an invalid or loopback address.  Specify a valid server address.
   at Microsoft.SharePoint.Administration.SPServer.ValidateAddress(String address)

This was an ArgumentException when SPServer.ValidateAddress was called.  I’m going to assume that the string being passed in is whatever we entered for the database server.  In my case it would be “myserver,50000”.  I’ve seen this type of behavior before, here is one example.  My first question was, what is ValidateAddress actually doing?  I had an assumption based on the behavior that it was doing a name lookup on what was being passed in, but I don’t like assumptions, so I wanted to verify.

Enter JustDecompile!  This is a great tool if you want to see what .NET assemblies are really doing.  The trick sometimes is to figure out what the actual assembly is.  I know SharePoint 2013 uses the .NET 4.0 Framework, so the assemblies that are GAC’d will be in C:\Windows\Microsoft.NET\assembly\GAC_MSIL.  After that, I go off of the namespace, as assemblies are typically aligned to the namespaces that are within them.  I didn’t see an assembly for Microsoft.SharePoint.Administration, so I grabbed the Microsoft.SharePoint assembly within C:\Windows\Microsoft.NET\assembly\GAC_MSIL\Microsoft.SharePoint\v4.0_15.0.0.0__71e9bce111e9429c.  This prompted me to load a few others, but it told me which ones to go get.

Within the Microsoft.SharePoint assembly, we can see that we have the Administration namespace.

image

So, now we want the SPServer object and the ValidateAddress method.

image

internal static void ValidateAddress(string address)
{
    Uri uri;
    if (address == null)
    {

        throw new ArgumentNullException("address");

    }

    UriHostNameType uriHostNameType = Uri.CheckHostName(address); <-- This is what gets us into trouble
    if (uriHostNameType == UriHostNameType.Unknown)
    {
        object[] objArray = new object[] { address };
        throw new ArgumentException(SPResource.GetString("InvalidServerAddress", objArray)); <-- The exception will be thrown here
    }

    uri = (uriHostNameType != UriHostNameType.IPv6 ||
        address.Length <= 0 ||
        address[0] == '[' ||
        address[address.Length - 1] == ']' ?
        new Uri(string.Concat("http://", address)) : new Uri(string.Concat("http://[", address, "]")));
    if (uri.IsLoopback)
    {

        object[] objArray1 = new object[] { address };
        throw new ArgumentException(SPResource.GetString("InvalidServerAddress", objArray1));
    }
}

Uri.CheckHostName Method
http://msdn.microsoft.com/en-us/library/system.uri.checkhostname.aspx

Determines whether the specified host name is a valid DNS name.

So, if the string we pass in does not parse as a valid DNS name – and the comma in “myserver,50000” makes it invalid – CheckHostName returns Unknown and the validation throws.  We never get to the point where we actually hit SQL itself.
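As a rough approximation of that check (this is not the actual Uri.CheckHostName implementation, just a simplified sketch: labels of letters, digits, and hyphens separated by dots):

```python
import re

# One DNS label: starts/ends with a letter or digit, hyphens allowed inside.
DNS_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$")

def looks_like_dns_name(name):
    """Return True if every dot-separated label is a plausible DNS label."""
    return all(DNS_LABEL.match(label) for label in name.split("."))

print(looks_like_dns_name("myserver"))        # True  -> passes validation
print(looks_like_dns_name("myserver,50000"))  # False -> "invalid or loopback address"
```

The comma that SqlClient uses for a port number is simply not legal in a host name, which is why the alias (a clean name with no comma) sidesteps the error.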

 

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton

SQL Connection Pool Timeout Debugging


This is a follow-up to two blog posts from back in 2009 that talked about leaked connections.  Part 1 and Part 2 of that series covered how to determine that you actually filled your pool.  It was centered around the following error:

Exception type: System.InvalidOperationException
Message: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.
InnerException: <none>
StackTrace (generated):
    SP               IP               Function
    000000001454DDC0 00000642828425A8 System.Data.ProviderBase.DbConnectionFactory.GetConnection(System.Data.Common.DbConnection)
    000000001454DE10 0000064282841BA2 System.Data.ProviderBase.DbConnectionClosed.OpenConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionFactory)
    000000001454DE60 000006428284166C System.Data.SqlClient.SqlConnection.Open()

The issue I just worked on involved the same exception, but in this case the pools were not exhausted.  The issue was occurring within BizTalk 2006 R2.  We narrowed this down to the following exception:

0:138> !pe e09e13f0
Exception object: 00000000e09e13f0
Exception type: System.Data.SqlClient.SqlException
Message: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
InnerException: <none>
StackTrace (generated):
    SP               IP               Function
    0000000015CBDF10 00000642828554A3 System_Data!System.Data.SqlClient.SqlInternalConnection.OnError(System.Data.SqlClient.SqlException, Boolean)+0x103
    0000000015CBDF60 0000064282854DA6 System_Data!System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(System.Data.SqlClient.TdsParserStateObject)+0xf6
    0000000015CBDFC0 0000064282CDCCF1 System_Data!System.Data.SqlClient.TdsParserStateObject.ReadSniError(System.Data.SqlClient.TdsParserStateObject, UInt32)+0x291
    0000000015CBE0A0 000006428284ECCA System_Data!System.Data.SqlClient.TdsParserStateObject.ReadSni(System.Data.Common.DbAsyncResult, System.Data.SqlClient.TdsParserStateObject)+0x13a
    0000000015CBE140 000006428284E9E1 System_Data!System.Data.SqlClient.TdsParserStateObject.ReadNetworkPacket()+0x91
    0000000015CBE1A0 0000064282852763 System_Data!System.Data.SqlClient.TdsParserStateObject.ReadBuffer()+0x33
    0000000015CBE1D0 00000642828526A1 System_Data!System.Data.SqlClient.TdsParserStateObject.ReadByte()+0x21
    0000000015CBE200 0000064282851B5C System_Data!System.Data.SqlClient.TdsParser.Run(System.Data.SqlClient.RunBehavior, System.Data.SqlClient.SqlCommand, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.BulkCopySimpleResultSet, System.Data.SqlClient.TdsParserStateObject)+0xbc
    0000000015CBE2D0 00000642828519E6 System_Data!System.Data.SqlClient.SqlInternalConnectionTds.CompleteLogin(Boolean)+0x36
    0000000015CBE320 000006428284A997 System_Data!System.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(System.Data.SqlClient.ServerInfo, System.String, Boolean, Int64, System.Data.SqlClient.SqlConnection)+0x147
    0000000015CBE3C0 000006428284859F System_Data!System.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(System.String, System.String, Boolean, System.Data.SqlClient.SqlConnection, System.Data.SqlClient.SqlConnectionString, Int64)+0x52f
    0000000015CBE530 0000064282847505 System_Data!System.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(System.Data.SqlClient.SqlConnection, System.Data.SqlClient.SqlConnectionString, System.String, Boolean)+0x135
    0000000015CBE5D0 00000642828471E3 System_Data!System.Data.SqlClient.SqlInternalConnectionTds..ctor(System.Data.ProviderBase.DbConnectionPoolIdentity, System.Data.SqlClient.SqlConnectionString, System.Object, System.String, System.Data.SqlClient.SqlConnection, Boolean)+0x153
    0000000015CBE670 0000064282846E36 System_Data!System.Data.SqlClient.SqlConnectionFactory.CreateConnection(System.Data.Common.DbConnectionOptions, System.Object, System.Data.ProviderBase.DbConnectionPool, System.Data.Common.DbConnection)+0x296
    0000000015CBE730 0000064282846947 System_Data!System.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionPool, System.Data.Common.DbConnectionOptions)+0x37
    0000000015CBE790 000006428284689D System_Data!System.Data.ProviderBase.DbConnectionPool.CreateObject(System.Data.Common.DbConnection)+0x29d
    0000000015CBE830 000006428292905D System_Data!System.Data.ProviderBase.DbConnectionPool.UserCreateRequest(System.Data.Common.DbConnection)+0x5d
    0000000015CBE870 0000064282846412 System_Data!System.Data.ProviderBase.DbConnectionPool.GetConnection(System.Data.Common.DbConnection)+0x6b2
    0000000015CBE930 00000642828424B4 System_Data!System.Data.ProviderBase.DbConnectionFactory.GetConnection(System.Data.Common.DbConnection)+0x54
    0000000015CBE980 0000064282841BA2 System_Data!System.Data.ProviderBase.DbConnectionClosed.OpenConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionFactory)+0xf2
    0000000015CBE9D0 000006428284166C System_Data!System.Data.SqlClient.SqlConnection.Open()+0x10c
    0000000015CBEA60 0000064282928C2D Microsoft_BizTalk_Bam_EventObservation!Microsoft.BizTalk.Bam.EventObservation.DirectEventStream.StoreSingleEvent(Microsoft.BizTalk.Bam.EventObservation.IPersistQueryable)+0x8d
    0000000015CBEAE0 0000064282928947 Microsoft_BizTalk_Bam_EventObservation!Microsoft.BizTalk.Bam.EventObservation.DirectEventStream.StoreCustomEvent(Microsoft.BizTalk.Bam.EventObservation.IPersistQueryable)+0x47

The end result was to either increase the connection timeout for that connection string, or to look at the performance on the SQL Server and determine why SQL wasn’t able to satisfy the connection.  The customer had indicated that this occurred during month-end operations, which probably means pressure on SQL Server ramped up.  It may have come down to not having enough workers within SQL to handle the connection request, which resulted in a timeout after the default connection timeout of 15 seconds.
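The first remedy – raising Connect Timeout above the 15-second default – is just a connection-string change.  As a hedged illustration (the key name follows SqlClient conventions; the server and database values are made up, and a real .NET app would use SqlConnectionStringBuilder):

```python
def set_connect_timeout(conn_str, seconds):
    """Rebuild a SqlClient-style connection string with the given
    Connect Timeout, replacing any existing entry.  Sketch only."""
    parts = [p for p in conn_str.split(";")
             if p and not p.strip().lower().startswith("connect timeout")]
    parts.append(f"Connect Timeout={seconds}")
    return ";".join(parts)

raised = set_connect_timeout(
    "Server=myserver;Database=BAMPrimaryImport;Integrated Security=SSPI", 60)
print(raised)
# Server=myserver;Database=BAMPrimaryImport;Integrated Security=SSPI;Connect Timeout=60
```

Raising the timeout only masks the symptom, of course; if SQL is starved for workers at month end, the server-side pressure is the real fix.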

Techie details:

This will look at how we determined what the problem was once we had a memory dump of the process. These debugging instructions are based on a 64-bit dump.  The steps should be similar for a 32-bit dump as well.  For the dumps, we used the SOS debugging extension which ships with the .NET Framework.  You can load the extension in the debugger by using the following command:

0:000> .loadby sos mscorwks

Let’s first find the Connection Pools that are in the dump:

0:138> !dumpheap -stat -type DbConnectionPool

000006428281fce8        4          416 System.Data.ProviderBase.DbConnectionPool+TransactedConnectionPool
000006428085dbc8       28          672 System.Data.ProviderBase.DbConnectionPoolCounters+Counter
000006428281f6d8        8          704 System.Data.ProviderBase.DbConnectionPool+PoolWaitHandles
0000064282810450        4          704 System.Data.ProviderBase.DbConnectionPool
000006428281d320      165         5280 System.Data.ProviderBase.DbConnectionPoolIdentity

This shows the MethodTable that we can use to go get the different items.  Of note, you may see multiple items, and may have to go through each one.

0:138> !dumpheap -mt 0x0000064282810450
------------------------------
Heap 4
         Address               MT     Size
00000000c021b348 0000064282810450      176    
total 1 objects
------------------------------
Heap 6
         Address               MT     Size
00000000e05add10 0000064282810450      176    
total 1 objects
------------------------------
Heap 12
         Address               MT     Size
000000014004b1d8 0000064282810450      176    
total 1 objects
------------------------------
Heap 13
         Address               MT     Size
00000001502e6af0 0000064282810450      176
 

We have 4 pools.  Let’s have a look at each pool and see how many connections we have for each.

Pool 1:

0:138> !do 0x00000000c021b348
Name: System.Data.ProviderBase.DbConnectionPool
MethodTable: 0000064282810450
EEClass: 00000642827da538
Size: 176(0xb0) bytes
(C:\WINDOWS\assembly\GAC_64\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name

00000642827ef760  400153f       18 ...nnectionPoolGroup  0 instance 0000000160036630 _connectionPoolGroup
0000064282818d18  4001540       20 ...nPoolGroupOptions  0 instance 0000000160036608 _connectionPoolGroupOptions

000006427843d998  4001551       98         System.Int32  1 instance                7 _totalObjects <-- Only 7 Objects out of a total pool size of 500

0:138> !do 0000000160036608
Name: System.Data.ProviderBase.DbConnectionPoolGroupOptions
MethodTable: 0000064282818d18
EEClass: 000006428282ce58
Size: 40(0x28) bytes
(C:\WINDOWS\assembly\GAC_64\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00000642784358f8  4001598       14       System.Boolean  1 instance                1 _poolByIdentity
000006427843d998  4001599        8         System.Int32  1 instance                1 _minPoolSize
000006427843d998  400159a        c         System.Int32  1 instance              500 _maxPoolSize <-- Total pool size

Pool 2:

0:138> !do 0x00000000e05add10
Name: System.Data.ProviderBase.DbConnectionPool
0000064282818d18  4001540       20 ...nPoolGroupOptions  0 instance         e05ad798 _connectionPoolGroupOptions
000006427843d998  4001551       98         System.Int32  1 instance                6 _totalObjects <-- Only 6 Objects out of a total pool size of 100

0:138> !do e05ad798
Name: System.Data.ProviderBase.DbConnectionPoolGroupOptions
              MT            Field           Offset                 Type VT             Attr            Value Name
00000642784358f8  4001598       14       System.Boolean  1 instance                1 _poolByIdentity
000006427843d998  4001599        8         System.Int32  1 instance                0 _minPoolSize
000006427843d998  400159a        c         System.Int32  1 instance              100 _maxPoolSize <-- Total pool size

Pool 3:

0:138> !do 0x000000014004b1d8
Name: System.Data.ProviderBase.DbConnectionPool
0000064282818d18  4001540       20 ...nPoolGroupOptions  0 instance         d01e8288 _connectionPoolGroupOptions
000006427843d998  4001551       98         System.Int32  1 instance                7 _totalObjects <-- Only 7 Objects out of a total pool size of 500

0:138> !do d01e8288
Name: System.Data.ProviderBase.DbConnectionPoolGroupOptions
              MT            Field           Offset                 Type VT             Attr            Value Name
00000642784358f8  4001598       14       System.Boolean  1 instance                1 _poolByIdentity
000006427843d998  4001599        8         System.Int32  1 instance                1 _minPoolSize
000006427843d998  400159a        c         System.Int32  1 instance              500 _maxPoolSize <-- Total pool size

Pool 4:

0:138> !do 0x00000001502e6af0
Name: System.Data.ProviderBase.DbConnectionPool
0000064282818d18  4001540       20 ...nPoolGroupOptions  0 instance        1600f1940 _connectionPoolGroupOptions
000006427843d998  4001551       98         System.Int32  1 instance                4 _totalObjects <-- Only 4 Objects out of a total pool size of 100

0:138> !do 1600f1940
Name: System.Data.ProviderBase.DbConnectionPoolGroupOptions
              MT            Field           Offset                 Type VT             Attr            Value Name
00000642784358f8  4001598       14       System.Boolean  1 instance                1 _poolByIdentity
000006427843d998  4001599        8         System.Int32  1 instance                0 _minPoolSize
000006427843d998  400159a        c         System.Int32  1 instance              100 _maxPoolSize <-- Total pool size

The connection pools are dictated by the Connection String used.  So, this means 4 different connection strings were used.  We can look at the stack objects to see if we can pick apart some more information.
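The pool-per-string behavior is worth internalizing, since even a whitespace or casing difference in the string creates a separate pool. Here is a rough illustration (a simplified model for intuition only, not the actual System.Data implementation; as the `_poolByIdentity` field above hints, real pooling also keys on the caller's identity under Integrated Security):

```python
# Simplified model: ADO.NET keeps one connection pool per distinct
# connection string (plus identity, when pooling by identity).
class ConnectionPool:
    def __init__(self, max_pool_size):
        self.max_pool_size = max_pool_size  # corresponds to _maxPoolSize in the dump
        self.total_objects = 0              # corresponds to _totalObjects in the dump

pools = {}  # keyed by the exact connection string text

def get_pool(connection_string, max_pool_size=100):
    # Any textual difference in the string yields a brand-new pool.
    if connection_string not in pools:
        pools[connection_string] = ConnectionPool(max_pool_size)
    return pools[connection_string]

get_pool("server=A;database=X;Max Pool Size=500", 500)
get_pool("server=A;database=X")   # different string, so a second pool
get_pool("server=A;database=X")   # same string, same pool
print(len(pools))                 # 2
```

That is why four pools in this dump implies four distinct connection strings were in play.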

0:138> !dso
OS Thread Id: 0x70b0 (138)
RSP/REG          Object           Name
...
000000001454df30 00000001602a0f00 System.Data.SqlClient.SqlConnection
000000001454df40 00000000c0ace890 System.String
000000001454df48 00000001602a0cf0 Microsoft.BizTalk.Bam.EventObservation.BAMTraceFragment
000000001454df50 0000000150511568 System.String
000000001454df60 00000001602a0b00 Microsoft.BizTalk.Bam.EventObservation.DirectEventStream
000000001454df70 00000001602a0b00 Microsoft.BizTalk.Bam.EventObservation.DirectEventStream
000000001454df78 00000001602a0cf0 Microsoft.BizTalk.Bam.EventObservation.BAMTraceFragment
000000001454df80 00000001505112d0 System.String
000000001454df88 0000000150511568 System.String
000000001454df90 00000001602a0cf0 Microsoft.BizTalk.Bam.EventObservation.BAMTraceFragment
000000001454dfa8 00000001602a13d0 System.InvalidOperationException
000000001454dfb0 00000001602a0b38 System.Object
000000001454dfb8 000000015050d780 System.Data.SqlClient.SqlCommand
...

Here is the SQL Command Object that was issuing the command when we had the exception.

0:138> !do 000000015050d780
Name: System.Data.SqlClient.SqlCommand
MethodTable: 000006428279dbd0
EEClass: 00000642827d1dc0
Size: 224(0xe0) bytes
(C:\WINDOWS\assembly\GAC_64\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
0000064278436018  400018a        8        System.Object  0 instance 0000000000000000 __identity
00000642828144d8  40008de       10 ...ponentModel.ISite  0 instance 0000000000000000 site
00000642826664d8  40008df       18 ....EventHandlerList  0 instance 0000000000000000 events
0000064278436018  40008dd      210        System.Object  0   static 00000000f0269548 EventDisposed
000006427843d998  40016f2       b0         System.Int32  1 instance              672 ObjectID
0000064278436728  40016f3       20        System.String  0 instance 00000000f0020178 _commandText <-- The query/command issued
000006428279c370  40016f4       b4         System.Int32  1 instance                4 _commandType
000006427843d998  40016f5       b8         System.Int32  1 instance               30 _commandTimeout
000006428279d908  40016f6       bc         System.Int32  1 instance                3 _updatedRowSource
00000642784358f8  40016f7       d0       System.Boolean  1 instance                0 _designTimeInvisible
000006428288d490  40016f8       28 ...ent.SqlDependency  0 instance 0000000000000000 _sqlDep
00000642784358f8  40016f9       d1       System.Boolean  1 instance                0 _inPrepare
000006427843d998  40016fa       c0         System.Int32  1 instance               -1 _prepareHandle
00000642784358f8  40016fb       d2       System.Boolean  1 instance                0 _hiddenPrepare
00000642827e3128  40016fc       30 ...rameterCollection  0 instance 000000015050d940 _parameters
00000642827eea48  40016fd       38 ...ent.SqlConnection  0 instance 000000015050f308 _activeConnection <-- The SqlConnection that we used for this command
00000642784358f8  40016fe       d3       System.Boolean  1 instance                0 _dirty

In this case, we know the SqlConnection isn’t valid because we erred trying to get it from the pool.  The Command Text would have been interesting had this been a query timeout, but for a connection timeout it is irrelevant.  We can poke at the strings on the stack to find the connection string used for this operation.

0:138> !do 00000001505112d0
Name: System.String
MethodTable: 0000064278436728
EEClass: 000006427803e520
Size: 330(0x14a) bytes
(C:\WINDOWS\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: server=MyServer; database= MyDatabase;Integrated Security=SSPI;Connect Timeout=25; pooling=true; Max Pool Size=500; Min Pool Size=1

From this, we can see Max Pool Size is at 500, so that narrows it down to two of the four pools listed above. When we went through the pools previously, I noticed that one of the pools had something that the others didn’t, and it happened to be one of the pools with the Pool Size of 500.  Let’s look at the full dump of the pool in question.

0:138> !do 0x000000014004b1d8
Name: System.Data.ProviderBase.DbConnectionPool
MethodTable: 0000064282810450
EEClass: 00000642827da538
Size: 176(0xb0) bytes
(C:\WINDOWS\assembly\GAC_64\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
000006427843d998  400153c       88         System.Int32  1 instance           200000 _cleanupWait
000006428281d320  400153d        8 ...ctionPoolIdentity  0 instance 000000014004b1b8 _identity
00000642827ef2d0  400153e       10 ...ConnectionFactory  0 instance 0000000140022860 _connectionFactory
00000642827ef760  400153f       18 ...nnectionPoolGroup  0 instance 00000000d01e82b0 _connectionPoolGroup <-- We can get the connection string from this object
0000064282818d18  4001540       20 ...nPoolGroupOptions  0 instance 00000000d01e8288 _connectionPoolGroupOptions
000006428281d3c0  4001541       28 ...nPoolProviderInfo  0 instance 0000000000000000 _connectionPoolProviderInfo
00000642828102f8  4001542       8c         System.Int32  1 instance                1 _state
000006428281d4b8  4001543       30 ...InternalListStack  0 instance 000000014004b288 _stackOld
000006428281d4b8  4001544       38 ...InternalListStack  0 instance 000000014004b2a0 _stackNew
0000064278424d50  4001545       40 ...ding.WaitCallback  0 instance 000000014004c570 _poolCreateRequest
0000064278425c90  4001546       48 ...Collections.Queue  0 instance 0000000000000000 _deactivateQueue
0000064278424d50  4001547       50 ...ding.WaitCallback  0 instance 0000000000000000 _deactivateCallback
000006427843d998  4001548       90         System.Int32  1 instance                0 _waitCount
000006428281f6d8  4001549       58 ...l+PoolWaitHandles  0 instance 000000014004b3a8 _waitHandles
00000642784369f0  400154a       60     System.Exception  0 instance 00000000e09e13f0 _resError <-- We had an error on this pool
00000642784358f8  400154b       a0       System.Boolean  1 instance                1 _errorOccurred
000006427843d998  400154c       94         System.Int32  1 instance            10000 _errorWait
0000064278468a80  400154d       68 ...m.Threading.Timer  0 instance 00000001505bc420 _errorTimer
0000064278468a80  400154e       70 ...m.Threading.Timer  0 instance 000000014004c5f0 _cleanupTimer
000006428281fce8  400154f       78 ...tedConnectionPool  0 instance 000000014004c3e8 _transactedConnectionPool
0000000000000000  4001550       80                       0 instance 000000014004b400 _objectList
000006427843d998  4001551       98         System.Int32  1 instance                7 _totalObjects
000006427843d998  4001553       9c         System.Int32  1 instance                8 _objectID
0000064278425e20  400153b      c00        System.Random  0   static 00000000e0188968 _random
000006427843d998  4001552      968         System.Int32  1   static               18 _objectTypeCount

First, let’s see if we can line up the connection string for this pool with what was on the stack to make sure we are looking at the right pool.

0:138> !do 00000000d01e82b0
Name: System.Data.ProviderBase.DbConnectionPoolGroup
MethodTable: 00000642827ef760
EEClass: 00000642827da418
Size: 72(0x48) bytes
(C:\WINDOWS\assembly\GAC_64\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
0000064282816978  4001584        8 ...ConnectionOptions  0 instance 0000000170021600 _connectionOptions
0000064282818d18  4001585       10 ...nPoolGroupOptions  0 instance 00000000d01e8288 _poolGroupOptions
00000642823f2650  4001586       18 ....HybridDictionary  0 instance 00000000b00fb528 _poolCollection
000006427843d998  4001587       30         System.Int32  1 instance                1 _poolCount
000006427843d998  4001588       34         System.Int32  1 instance                1 _state
00000642828193b0  4001589       20 ...GroupProviderInfo  0 instance 00000000d01e82f8 _providerInfo
0000000000000000  400158a       28 ...DbMetaDataFactory  0 instance 0000000000000000 _metaDataFactory
000006427843d998  400158c       38         System.Int32  1 instance                7 _objectID
000006427843d998  400158b      978         System.Int32  1   static               20 _objectTypeCount

0:138> !do 0000000170021600
Name: System.Data.SqlClient.SqlConnectionString
MethodTable: 0000064282817158
EEClass: 00000642828234e0
Size: 184(0xb8) bytes
(C:\WINDOWS\assembly\GAC_64\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
0000064278436728  4000bef        8        System.String  0 instance 0000000150020230 _usersConnectionString
000006427843e080  4000bf0       10 ...ections.Hashtable  0 instance 00000001700216b8 _parsetable
00000642828180a0  4000bf1       18 ...mon.NameValuePair  0 instance 0000000170021878 KeyChain
00000642784358f8  4000bf2       28       System.Boolean  1 instance                0 HasPasswordKeyword
00000642784358f8  4000bf3       29       System.Boolean  1 instance                0 UseOdbcRules
000006427843cf18  4000bf4       20 ...ity.PermissionSet  0 instance 00000000d01e8330 _permissionset
00000642825a4958  4000beb      3e0 ...Expressions.Regex  0   static 00000000f026d658 ConnectionStringValidKeyRegex
00000642825a4958  4000bec      3e8 ...Expressions.Regex  0   static 00000000d01e7798 ConnectionStringValidValueRegex
00000642825a4958  4000bed      3f0 ...Expressions.Regex  0   static 0000000080032770 ConnectionStringQuoteValueRegex
00000642825a4958  4000bee      3f8 ...Expressions.Regex  0   static 0000000080034800 ConnectionStringQuoteOdbcValueRegex

0:138> !do 0000000150020230
Name: System.String
MethodTable: 0000064278436728
EEClass: 000006427803e520
Size: 330(0x14a) bytes
(C:\WINDOWS\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: server=MyServer; database= MyDatabase;Integrated Security=SSPI;Connect Timeout=25; pooling=true; Max Pool Size=500; Min Pool Size=1

We have a match!  So now let’s look at the error that was on the pool.

0:138> !pe 00000000e09e13f0
Exception object: 00000000e09e13f0
Exception type: System.Data.SqlClient.SqlException
Message: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
InnerException: <none>
StackTrace (generated):
    SP               IP               Function
    0000000015CBDF10 00000642828554A3 System_Data!System.Data.SqlClient.SqlInternalConnection.OnError(System.Data.SqlClient.SqlException, Boolean)+0x103
    0000000015CBDF60 0000064282854DA6 System_Data!System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(System.Data.SqlClient.TdsParserStateObject)+0xf6
    0000000015CBDFC0 0000064282CDCCF1 System_Data!System.Data.SqlClient.TdsParserStateObject.ReadSniError(System.Data.SqlClient.TdsParserStateObject, UInt32)+0x291
    0000000015CBE0A0 000006428284ECCA System_Data!System.Data.SqlClient.TdsParserStateObject.ReadSni(System.Data.Common.DbAsyncResult, System.Data.SqlClient.TdsParserStateObject)+0x13a
    0000000015CBE140 000006428284E9E1 System_Data!System.Data.SqlClient.TdsParserStateObject.ReadNetworkPacket()+0x91
    0000000015CBE1A0 0000064282852763 System_Data!System.Data.SqlClient.TdsParserStateObject.ReadBuffer()+0x33
    0000000015CBE1D0 00000642828526A1 System_Data!System.Data.SqlClient.TdsParserStateObject.ReadByte()+0x21
    0000000015CBE200 0000064282851B5C System_Data!System.Data.SqlClient.TdsParser.Run(System.Data.SqlClient.RunBehavior, System.Data.SqlClient.SqlCommand, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.BulkCopySimpleResultSet, System.Data.SqlClient.TdsParserStateObject)+0xbc
    0000000015CBE2D0 00000642828519E6 System_Data!System.Data.SqlClient.SqlInternalConnectionTds.CompleteLogin(Boolean)+0x36
    0000000015CBE320 000006428284A997 System_Data!System.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(System.Data.SqlClient.ServerInfo, System.String, Boolean, Int64, System.Data.SqlClient.SqlConnection)+0x147
    0000000015CBE3C0 000006428284859F System_Data!System.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(System.String, System.String, Boolean, System.Data.SqlClient.SqlConnection, System.Data.SqlClient.SqlConnectionString, Int64)+0x52f
    0000000015CBE530 0000064282847505 System_Data!System.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(System.Data.SqlClient.SqlConnection, System.Data.SqlClient.SqlConnectionString, System.String, Boolean)+0x135
    0000000015CBE5D0 00000642828471E3 System_Data!System.Data.SqlClient.SqlInternalConnectionTds..ctor(System.Data.ProviderBase.DbConnectionPoolIdentity, System.Data.SqlClient.SqlConnectionString, System.Object, System.String, System.Data.SqlClient.SqlConnection, Boolean)+0x153
    0000000015CBE670 0000064282846E36 System_Data!System.Data.SqlClient.SqlConnectionFactory.CreateConnection(System.Data.Common.DbConnectionOptions, System.Object, System.Data.ProviderBase.DbConnectionPool, System.Data.Common.DbConnection)+0x296
    0000000015CBE730 0000064282846947 System_Data!System.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionPool, System.Data.Common.DbConnectionOptions)+0x37
    0000000015CBE790 000006428284689D System_Data!System.Data.ProviderBase.DbConnectionPool.CreateObject(System.Data.Common.DbConnection)+0x29d
    0000000015CBE830 000006428292905D System_Data!System.Data.ProviderBase.DbConnectionPool.UserCreateRequest(System.Data.Common.DbConnection)+0x5d
    0000000015CBE870 0000064282846412 System_Data!System.Data.ProviderBase.DbConnectionPool.GetConnection(System.Data.Common.DbConnection)+0x6b2
    0000000015CBE930 00000642828424B4 System_Data!System.Data.ProviderBase.DbConnectionFactory.GetConnection(System.Data.Common.DbConnection)+0x54
    0000000015CBE980 0000064282841BA2 System_Data!System.Data.ProviderBase.DbConnectionClosed.OpenConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionFactory)+0xf2
    0000000015CBE9D0 000006428284166C System_Data!System.Data.SqlClient.SqlConnection.Open()+0x10c
    0000000015CBEA60 0000064282928C2D Microsoft_BizTalk_Bam_EventObservation!Microsoft.BizTalk.Bam.EventObservation.DirectEventStream.StoreSingleEvent(Microsoft.BizTalk.Bam.EventObservation.IPersistQueryable)+0x8d
    0000000015CBEAE0 0000064282928947 Microsoft_BizTalk_Bam_EventObservation!Microsoft.BizTalk.Bam.EventObservation.DirectEventStream.StoreCustomEvent(Microsoft.BizTalk.Bam.EventObservation.IPersistQueryable)+0x47

As we can see, it is a normal connection timeout error, which makes sense, as our pools were not exhausted.  Of note, they had set their Connect Timeout to 25 seconds in the connection string, which means they would need to bump it higher or look at what is going on with SQL Server at the time this occurs.  Not much more we can get from the dump.
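One last observation from the pool dump: the `_resError`, `_errorOccurred`, and `_errorWait` fields we walked past implement a blocking period. After a failed open, the pool caches the exception and rethrows it to subsequent callers for a window of time (10 seconds here, per `_errorWait`) rather than retrying the server immediately. Here is a rough sketch of that behavior (illustrative only, not the actual System.Data source; the comments map names back to the dump above):

```python
import time

class PoolWithBlockingPeriod:
    """Toy model of the _resError / _errorOccurred / _errorWait fields."""
    def __init__(self, error_wait=10.0):
        self.error_wait = error_wait   # _errorWait (seconds here; ms in the dump)
        self.cached_error = None       # _resError
        self.error_deadline = 0.0

    def open_connection(self, connect):
        now = time.monotonic()
        if self.cached_error is not None and now < self.error_deadline:
            # Fail fast: rethrow the cached error rather than hitting
            # the server again during the blocking period.
            raise self.cached_error
        try:
            return connect()
        except Exception as exc:
            self.cached_error = exc            # _errorOccurred = 1
            self.error_deadline = now + self.error_wait
            raise

def failing_connect():
    raise TimeoutError("Timeout expired.")

pool = PoolWithBlockingPeriod()
for attempt in range(3):
    try:
        pool.open_connection(failing_connect)
    except TimeoutError as exc:
        print(attempt, exc)  # later attempts get the cached error immediately
```

This is why a burst of threads can all report the same timeout even though only one of them actually waited on the server.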

 

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton

Can’t Connect to SQL when provisioning a Project App in SharePoint


The customer’s issue was that they were trying to provision a Project site within the Project SharePoint application. This was done via a PowerShell script that they ran on one of the SharePoint App Servers.

They had two SharePoint App Servers – AppServerA and AppServerB. They had indicated that the provisioning would fail on either App Server and it started failing around November of last year (4 months ago). The error that they would see when the failure occurred was the following from the SharePoint ULS Logs:

02/05/2014 10:14:32.87        OWSTIMER.EXE (0x2024)        0x0BC8        Project Server        Database        880i        High        System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server) at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)

02/05/2014 10:14:32.87        OWSTIMER.EXE (0x2024)        0x0BC8        Project Server        Database        880j        High        SqlError: 'A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)' Source: '.Net SqlClient Data Provider' Number: 53 State: 0 Class: 20 Procedure: '' LineNumber: 0 Server: ''        f5009e1d-12cd-4a70-a0af-f0400acf99e6

02/05/2014 10:14:32.87        OWSTIMER.EXE (0x2024)        0x0BC8        Project Server        Database        tzkv        High        SqlCommand: 'CREATE PROCEDURE dbo.MSP_TimesheetQ_Acknowledge_Control_Message @serverUID UID , @ctrlMsgId int AS BEGIN IF @@TRANCOUNT > 0 BEGIN RAISERROR ('Queue operations cannot be used from within a transaction.', 16, 1) RETURN END DECLARE @lastError INT SELECT @lastError = 0 UPDATE dbo.MSP_QUEUE_TIMESHEET_HEALTH SET LAST_CONTROL_ID = @ctrlMsgId WHERE SERVER_UID = @serverUID SELECT @lastError = @@ERROR Exit1: RETURN @lastError END ' CommandType: Text CommandTimeout: 0        f5009e1d-12cd-4a70-a0af-f0400acf99e6

02/05/2014 10:14:32.87        OWSTIMER.EXE (0x2024)        0x0BC8        Project Server        Provisioning        6935        Critical        Error provisioning database. Script: C:\Program Files\Microsoft Office Servers\14.0\Sql\Project Server\Core\addqueue1timesheetsps12.sql, Line: 0, Error: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server), Line: CREATE PROCEDURE dbo.MSP_TimesheetQ_Acknowledge_Control_Message @serverUID UID , @ctrlMsgId int AS BEGIN IF @@TRANCOUNT > 0 BEGIN RAISERROR ('Queue operations cannot be used from within a transaction.', 16, 1) RETURN END DECLARE @lastError INT SELECT @lastError = 0 UPDATE dbo.MSP_QUEUE_TIMESHEET_HEALTH SET LAST_CONTROL_ID = @ctrlMsgId WHERE SERVER_UID = @serverUID SELECT @lastError = @@ERROR Exit1: RETURN @lastError END .        f5009e1d-12cd-4a70-a0af-f0400acf99e6

02/05/2014 10:14:32.89        OWSTIMER.EXE (0x2024)        0x0BC8        Project Server        Provisioning        6971        Critical        Failed to provision site /CMS with error: Microsoft.Office.Project.Server.Administration.ProvisionException: Failed to provision databases. ---> Microsoft.Office.Project.Server.Administration.ProvisionException: CREATE PROCEDURE dbo.MSP_TimesheetQ_Acknowledge_Control_Message @serverUID UID , @ctrlMsgId int AS BEGIN IF @@TRANCOUNT > 0 BEGIN RAISERROR ('Queue operations cannot be used from within a transaction.', 16, 1) RETURN END DECLARE @lastError INT SELECT @lastError = 0 UPDATE dbo.MSP_QUEUE_TIMESHEET_HEALTH SET LAST_CONTROL_ID = @ctrlMsgId WHERE SERVER_UID = @serverUID SELECT @lastError = @@ERROR Exit1: RETURN @lastError END ---> System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server) at

One thing they had mentioned was that if they increased the connection timeout to 60 seconds, it would sometimes work. My thought process was that if a longer connection timeout sometimes allowed it to work, we may have had a timeout when actually connecting to SQL Server, but that wasn’t the error being reported.

Looking at the actual error we can draw some conclusions.

provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server

By default, we should be using TCP. If there is a serious error with that, we fall back to Named Pipes. The error Named Pipes got back was that we couldn’t open the connection, not a timeout. Think of this as “SQL Server does not exist or access denied”. SQL Server in this case was also a default instance cluster, not a named instance, so SQL Browser was not coming into the picture. This is a straight shot to port 1433 via TCP.
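The TCP-first, Named Pipes-fallback flow just described can be sketched like this (a simplified illustration; the real provider order comes from the client protocol settings in the registry, and the failure messages here are placeholders):

```python
def connect_with_fallback(protocols):
    """Try each client protocol in order; if all fail, the error the
    caller sees names only the last provider tried."""
    last_error = None
    for name, attempt in protocols:
        try:
            return attempt()
        except OSError as exc:
            last_error = (name, exc)
    name, exc = last_error
    # This is why a TCP-level failure can surface as a Named Pipes error.
    raise ConnectionError(f"provider: {name} Provider, error: {exc}")

def tcp_open():
    raise OSError("low-level TCP open failed")   # hypothetical failure

def np_open():
    raise OSError("Could not open a connection to SQL Server")

try:
    connect_with_fallback([("TCP", tcp_open), ("Named Pipes", np_open)])
except ConnectionError as exc:
    print(exc)
```

The takeaway: the provider named in the exception is not necessarily the one that failed first.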

Which machine was getting the error?

For troubleshooting, we need to consider which machines are involved. One thing that we noticed over the course of troubleshooting was that the error always occurred on AppServerB and we were always starting the script from AppServerA. If you think about how SharePoint works with its App Servers, when a service is running, you can have it started on individual App Servers and control the load.  The fact that we were always seeing the error on AppServerB led me to believe that the Project Application Server Service was only started on AppServerB and not AppServerA.  Looking in Central Admin, this was correct.  So, we want to concentrate data collection from AppServerB.

Network Traces

The first thing we looked at was a network trace. We collected network traces from AppServerB and the SQL Server.  Going back to the error, we know that TCP was not working as expected and that Named Pipes then failed.  Named Pipes uses the SMB protocol to talk, which first reaches out to TCP port 445.  We didn’t see any traffic in the network trace going to that port, and we didn’t see any SMB traffic relevant to the error.  We only saw browser announcements, which had nothing to do with us.  This tells me that we never hit the wire, so the network traces wouldn’t be helpful.

BIDTrace

Enter BIDTrace.  BIDTrace is really just diagnostic logging within our client providers and server SNI stack.  Think Event Tracing for Windows (ETW).  I’m not going to dive into how to set this up as it would take its own blog post.  You can read more about it in the following MSDN Page:

Data Access Tracing in SQL Server 2012
http://msdn.microsoft.com/en-us/library/hh880086.aspx

Typically I won’t go this route unless I know what I’m looking to get out of it.  It is actually pretty rare that I’ll jump to this, but this particular issue was an excellent fit.  We had some evidence that we were not getting far enough to hit the wire, and we knew we were getting an error when trying to make a connection to SQL.  So, what I was looking for here was whether there was some Windows error we were getting that wasn’t presented in the actual exception.

Here is the Logman command that I used to start the capture after getting the BIDTrace items configured.

Logman start MyTrace -pf ctrl.guid -ct perf -o Out%d.etl -mode NewFile -max 150 -ets

A few things I’ll point out with this command.  The output file has a %d in it.  This is a format string because we will end up with multiple files.  -mode tells logman to create a new file after hitting the listed max size, and -max 150 caps each file at 150MB.  I did this because when we first captured to a single file, the ETL file was 300MB, and when I converted it to text it was over 1GB.  That’s a lot to look through, and I also had trouble opening it.  So, I decided to break it up.  Of note, it took about 4-5 minutes to reproduce the issue.  That’s a long time to capture a BIDTrace.  When you capture a BIDTrace, it is better to get as small a capture window as you can.  These files fill up fast.

Here is the ctrl.guid that I used to capture.  This is effectively the event providers that I wanted to capture:

{8B98D3F2-3CC6-0B9C-6651-9649CCE5C752}  0x630ff  0   MSDADIAG.ETW
{914ABDE2-171E-C600-3348-C514171DE148}  0x630ff  0   System.Data.1
{C9996FA5-C06F-F20C-8A20-69B3BA392315}  0x630ff  0   System.Data.SNI.1

The capture will produce ETL files which are binary files.  You need to convert them after you are done.  I use TraceRPT to do this.  It is part of Windows.  Here is the command I used to output it to a CSV file to look at.

TraceRPT out5.etl -of CSV

In our case, it had generated 5 ETL files (remember the %d?).  So, we grabbed the last file that was produced, out5.etl, and converted it.  Although at first I didn’t know it was out5.etl; I actually started with out4.etl.  One problem, though, was that I didn’t have timestamps within the CSV output.  I had CPU clock ticks, which are hard to visualize compared to an actual timestamp.
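As an aside, the clock values TraceRPT emitted in this trace look like Windows FILETIME ticks: 100-nanosecond intervals since 1601-01-01 UTC. If that holds for your capture (verify against a known event before trusting it), converting them to readable timestamps is trivial. For example, the clock value that appears next to our statement later in the output decodes right back to the time of the ULS error:

```python
from datetime import datetime, timedelta, timezone

def filetime_to_datetime(ticks):
    """Convert 100-ns intervals since 1601-01-01 UTC (FILETIME) to a datetime."""
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=ticks // 10)

# Clock value from the trace rows for the statement we searched for:
print(filetime_to_datetime(130361840460213871))
# 2014-02-06 18:14:06.021387+00:00, i.e. 13:14:06 Eastern,
# right around where the ULS log put the failure.
```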

Enter Message Analyzer! Message Analyzer is a replacement for Network Monitor, but it has another awesome ability: it can open ETL files.  One other thing I had was the timestamp of the error from the SharePoint ULS log for the attempt we made while capturing the BIDTrace.

02/06/2014 13:14:20.55     OWSTIMER.EXE (0x2024)                       0x1C4C    Project Server                    Database                          880i    High        System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)     at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)     at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)     at System.Data.SqlClient.TdsParser.Connect(ServerInfo serverInfo, SqlInternalConnectionTds connHandler, Boolean ignoreSniOpenTimeout, Int64 timerExpire, Boolean encrypt, Boolean trustServerCert, Boolean integratedSec...    bc7aaa60-93fc-4873-8f75-416d802aa55b

02/06/2014 13:14:20.55     OWSTIMER.EXE (0x2024)                       0x1C4C    Project Server                    Provisioning                      6993    Critical    Provisioning '/Test3': Failed to provision databases. An exception occurred: CREATE PROCEDURE dbo.MSP_TimesheetQ_Get_Job_Count_Simple   @correlationID UID ,    @groupState int ,    @msgType int  AS BEGIN    IF @@TRANCOUNT > 0    BEGIN              RAISERROR ('Queue operations cannot be used from within a transaction.', 16, 1)       RETURN    END     SELECT COUNT(*) FROM dbo.MSP_QUEUE_TIMESHEET_GROUP        WHERE CORRELATION_UID = @correlationID       AND   GRP_QUEUE_STATE = @groupState       AND   GRP_QUEUE_MESSAGE_TYPE = @msgType END .    bc7aaa60-93fc-4873-8f75-416d802aa55b

Our issue occurred at 1:14:20.55 Server Time. We can also see the statement it was going to try and run.  If we open the ETL file within Message Analyzer, we can see the timestamps that are covered within the file. 

image

We can see that this went up to 12:14:04 local time.  We were looking for 12:14:20.55, so out4.etl was not the file I was looking for, which left out5.etl.  Technically you can read the data within Message Analyzer, as you can see from the lower right of the screenshot; it’s Unicode data, and we see l.e.a.v.e.  I still prefer the TraceRPT CSV output, as I can get the readable text from that.  It is just a little easier to work with.

So, I have the CSV output from out5.etl, but what do we look for?  Well, we know the statement it was trying to run, so let’s look for that: MSP_TimesheetQ_Get_Job_Count_Simple. We get a hit and it looks like this:

System.Data,      TextW,            0,          0,          0,          0,         18,          0, 0x0000000000000000, 0x00002024, 0x00001C4C,                    0,             ,                     ,   {00000000-0000-0000-0000-000000000000},                                         ,   130361840460213871,       7080,      21510,        2, "<sc.SqlCommand.set_CommandText|API> 4187832#, '"
System.Data,      TextW,            0,          0,          0,          0,         18,          0, 0x0000000000000000, 0x00002024, 0x00001C4C,                    0,             ,                     ,   {00000000-0000-0000-0000-000000000000},                                         ,   130361840460213910,       7080,      21510,        2, "CREATE PROCEDURE dbo.MSP_TimesheetQ_Get_Job_Count_Simple   @correlationID UID ,    @groupState int ,    @msgType int  AS BEGIN    IF @@TRANCOUNT > 0    BEGIN              RAISERROR ('Queue operations cannot be used from within a transaction.', 16, 1)       RETURN    END     SELECT COUNT(*) FROM dbo.MSP_QUEUE_TIMESHEET_GROUP        WHERE CORRELATION_UID = @correlationID       AND   GRP_QUEUE_STATE = @groupState       AND   GRP_QUEUE_MESSAGE_TYPE = @msgType END "
System.Data,      TextW,            0,          0,          0,          0,         18,          0, 0x0000000000000000, 0x00002024, 0x00001C4C,                    0,             ,                     ,   {00000000-0000-0000-0000-000000000000},                                         ,   130361840460213935,       7080,      21510,        2, "' "

Not the prettiest, but when looking in notepad or some other text reader, we can just go over to the right to get a better view.

image

The first time you look at this it can be a little overwhelming.  Especially if you aren’t familiar with how SNI/TDS works.  If we go through the results, we’ll see a few interesting things.

<prov.DbConnectionHelper.ConnectionString_Set|API> 4184523#, 'Data Source=<server>;Initial Catalog=<database>;Integrated Security=True;Pooling=False;Asynchronous Processing=False;Connect Timeout=15;Application Name="Microsoft Project Server"' "

<GetProtocolEnum|API|SNI>

<Tcp::FInit|API|SNI>

<Tcp::SocketOpenSync|API|SNI>

<Tcp::SocketOpenSync|RET|SNI> 10055{WINERR}

<Tcp::Open|ERR|SNI> ProviderNum: 7{ProviderNum}, SNIError: 0{SNIError}, NativeError: 10055{WINERR} <-- 10055 = WSAENOBUFS

<Np::FInit|RET|SNI> 0{WINERR}

<Np::OpenPipe|API|SNI> 212439#, szPipeName: '\\<server>\PIPE\sql\query', dwTimeout: 5000

<Np::OpenPipe|ERR|SNI> ProviderNum: 1{ProviderNum}, SNIError: 40{SNIError}, NativeError: 53{WINERR} <-- ERROR_BAD_NETPATH = network path was not found

We can get the Connection string, which was also available in the SharePoint ULS Log.  We will also see some entries around Protocol Enumeration.  This is where we look at the Client Registry items to see what Protocols we will go through and in what order (TCP, NP, LPC, etc…).  Then we see TCP trying to connect.  You’ll recall I mentioned that we try TCP first by default.  We then see that this received a Windows error of 10055 (WSAENOBUFS).  We then see Named Pipes fail with Error 53 which is ERROR_BAD_NETPATH.  We got what we were looking for out of the BIDTrace.

WSAENOBUFS is the key here.  It is a Winsock error, and we actually have a KB article on it.

When you try to connect from TCP ports greater than 5000 you receive the error 'WSAENOBUFS (10055)'
http://support.microsoft.com/kb/196271

There is a registry key called MaxUserPort which can increase the number of dynamic ports that are available.  In Windows 2003, the dynamic port range topped out under 5000.  Starting in Windows 2008, the default range was increased, as we used to see a lot of problems here, especially when connection pooling was not being used.  Here is the port range on my Windows 8.1 machine.

image

And for a Windows 2008 R2 Server, which the customer was using:

image

I have 64510 ports available.  On the customer’s machine, they mentioned that for a previous issue an engineer had asked them to add this registry key, and they set the value to 4999.  By setting it to 4999, we are effectively limiting the number of ports that would otherwise have been available.  If you look back at the connection string, you can see that Pooling was set to False.  This means connection pooling is turned off, and every time we go to connect, we establish a new hard connection.  This eats up a port.  You can look at NETSTAT to see what it looks like.  We did that while running the provisioning scripts, and we saw it get up to around 3000 ports before it was done.  You will also see a lot of ports in a TIME_WAIT status.  When you disconnect and the port is released, it goes into a TIME_WAIT state for a set amount of time, the default of which is around 4 minutes.  That’s 4 minutes you can’t use that port.  If you are opening and closing connections a lot, you will run out of ports because many will be sitting in the TIME_WAIT state.  That’s typically when we would bump up the number of ports using the MaxUserPort registry key.  However, this is never really a fix; you are just putting a band-aid on without understanding the problem.
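You can visualize this port consumption with a small sketch.  Python is used here purely for illustration, with a throwaway local listener standing in for SQL Server: each non-pooled connection grabs its own ephemeral local port, and once closed that port lingers in TIME_WAIT.

```python
import socket

# Throwaway local listener standing in for SQL Server.
server = socket.socket()
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen(5)
target = server.getsockname()

used_ports = set()
for _ in range(5):
    conn = socket.socket()
    conn.connect(target)                   # like a new, non-pooled connection
    used_ports.add(conn.getsockname()[1])  # the ephemeral local port it consumed
    conn.close()                           # port now lingers in TIME_WAIT

# Five connect/close cycles burn five distinct ephemeral ports.
print(sorted(used_ports))
```

Run this in a loop with a capped dynamic port range (like the 4999 setting above) and you can see how a busy, non-pooled application exhausts its ports.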

End result…

In our case, Project Server was turning off connection pooling.  I don’t know why they are doing that, but that, in conjunction with the MaxUserPort being set to 4999, was causing this issue.  We had removed the MaxUserPort registry key and rebooted AppServerB, and it started working after that.  Of note, we had also started the Project Application Server on AppServerA and cleaned up the TCP registry keys on that machine as well so that they could effectively balance their load on the SharePoint App Server.

 

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

Don’t Rely On a Static IP Address for Your SQL Database


I’ve seen a number of customers open support incidents because they couldn’t connect to their SQL Database server which was ultimately due to the incorrect assumption that the server’s IP address is static. In fact, the IP address of your logical server is not static and is subject to change at any time. All connections should be made using the fully qualified DNS name (FQDN) rather than the IP address.

The following picture from the Windows Azure SQL Database Connection Management Technet article shows the network topology for a SQL Database cluster.

image

Your logical server (e.g., with an FQDN of xyz.database.windows.net) resides on a SQL Database cluster in one of the backend SQL Server nodes. Within a given region (e.g., North Central US, South Central US, North Europe, etc.) there are generally many SQL Database clusters, as required to meet the aggregate capacity of all customers.  All logical servers within a cluster are accessed through the network load balancer (the single blue block with the note saying “Load balancer forwards ‘sticky’ sessions…” in the diagram) via a virtual IP address.

If you do a reverse name lookup from your server’s IP address you will actually see the name of the cluster load balancer. For example, if I try to ping one of my servers (whose actual server name starts with ljvt in the screenshot below) you will see that the displayed name associated with the IP address is instead data.sn3-1.database.windows.net, where the sn3-1 portion of the name maps to the specific cluster in the region (South Central) hosting this server.

image

Microsoft may do an online migration of your logical server between clusters within a region, balancing capacity across the clusters in that region. This move is a live operation, and there is no loss of availability to your database while it takes place. When the migration completes, existing connections to your logical server are terminated, and upon reconnecting via the fully qualified domain name your app will be directed to the new cluster.  However, if your application caches or connects by IP address instead of FQDN, then your connection attempts will fail.
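In application code this means resolving the name on every new connection attempt.  A minimal sketch of the pattern (the Azure server name in the comment is hypothetical; "localhost" is used only so the lookup resolves anywhere):

```python
import socket

def current_endpoint(fqdn, port=1433):
    # Resolve at connect time; the answer can change after a cluster
    # migration, so never cache or hard-code the IP address.
    infos = socket.getaddrinfo(fqdn, port, type=socket.SOCK_STREAM)
    return infos[0][4]  # sockaddr of the first answer

# A real app would pass its own name, e.g. "xyz.database.windows.net".
addr = current_endpoint("localhost")
print(addr)
```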

A migration moves all of your settings, including any SQL Database firewall rules that you have.  Consequently, there are no Azure-specific changes required in order to connect.  However, if your on-premises network infrastructure blocks or filters outgoing TCP/IP traffic to port 1433 (the port used for SQL connections) and you had it restricted to a fixed IP address, then you may need to adjust your client firewall/router.  The IP address of your SQL Database server will always be part of the address ranges listed in the Windows Azure Datacenter IP Ranges list.  You should allow outgoing traffic on port 1433 to these address ranges rather than to a specific IP address.

Keith Elmore – Principal Escalation Engineer

JDBC: This driver is not configured for integrated authentication


I’ve had about 4 cases in the last two months that centered around the following error when trying to use Windows Integrated authentication with JDBC.

java.sql.SQLException: This driver is not configured for integrated authentication

The key environment point was that they were trying to do this on a Linux platform, not a Windows platform.  Specifically, they were running WebSphere on Linux.  The last one I worked on was running WebSphere 8.5.

There is only one location within the JDBC Driver where this particular error is raised.  It is when we are trying to use Kerberos and the Authentication Scheme is set to NativeAuthentication, which is the default setting for this property. Starting in the JDBC 4.0 driver, you can use the authenticationScheme connection property to indicate how you want to use Kerberos to connect to SQL.  There are two settings here.

NativeAuthentication (default) – This uses the sqljdbc_auth.dll and is specific to the Windows platform.  This was the only option prior to the JDBC 4.0 driver.

JavaKerberos – Makes use of the Java APIs to invoke Kerberos and does not rely on the Windows platform.  This is Java specific and not bound to the underlying operating system, so it can be used on both Windows and Linux platforms.

So, if you are receiving the error above, there are three possibilities that could be causing it to show up.  First, you actually specified authenticationScheme=NativeAuthentication in your connection string and you are on a Linux Platform.  Second, you specified integratedSecurity and omitted authenticationScheme, which defaulted to NativeAuthentication, and you are on a Unix/Linux Platform.  Third, you are using a version of the JDBC Driver prior to the 4.0 driver and trying to use Integrated Authentication on a Unix/Linux platform.  In the third case, even if you specify authenticationScheme=JavaKerberos, it won’t help as the older drivers aren’t aware of it, so it is ignored.
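Putting that together, a connection URL for JavaKerberos on Linux would look something like the following (server and database names are placeholders); the key is combining integratedSecurity with authenticationScheme=JavaKerberos:

```
jdbc:sqlserver://sqlprod01.contoso.com:1433;databaseName=MyDb;integratedSecurity=true;authenticationScheme=JavaKerberos
```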

The following document outlines how to use Kerberos with the JDBC Driver and walks through what is needed to get JavaKerberos working properly.

Using Kerberos Integrated Authentication to Connect to SQL Server
http://msdn.microsoft.com/en-us/library/gg558122%28v=sql.110%29.aspx

Another aspect we discovered is that the WebSphere 8.5 release appears to ship with the 3.0 version of the SQL JDBC Driver.  That version will not honor the JavaKerberos setting, and you will get the error listed above.

Configuration

So, you will need to make sure your driver is updated to the 4.0 driver or later.  After that is done, you will need to make sure that the Kerberos configuration file (krb5.ini or krb5.conf) is configured properly for your platform.  The above referenced documentation has a sample of what that should look like.  You will also need to generate keytab files for the platform to reference.  A login configuration file also needs to be set up; if you don’t have one, the driver will automatically configure it using the Krb5LoginModule.  If you need to use a different login module, you will need to make sure that is configured for your environment.  Assuming all of that is in place, the driver should be able to connect using JavaKerberos.
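For reference, a minimal krb5.conf sketch, with hypothetical realm and KDC names (see the documentation linked above for the authoritative sample):

```
[libdefaults]
    default_realm = CONTOSO.COM

[realms]
    CONTOSO.COM = {
        kdc = dc1.contoso.com
    }

[domain_realm]
    .contoso.com = CONTOSO.COM
```

The default_realm line is the one that matters for the multi-domain limitation discussed below: it should point at the domain your SQL Server resides in.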

The following blog does a good job of walking through the steps to get this set up for Java.  It mentions WebLogic, but really it just goes through the Java aspects.  It walks through how to create the keytab files and what to do with the krb5.ini file.

Configure Kerberos with Weblogic Server (really just a Java reference)
https://blogbypuneeth.wordpress.com/configure-kerberos-with-weblogic-server/

Known Limitation

If you have a multi-domain environment with SQL Servers in different domains that you are trying to hit, you will run into issues.  We found that in order to get it to work properly, you need to set the default domain within the Kerberos configuration file to the domain that the SQL Server resides in.  You can only have one default domain, so if you have multiple SQL Servers in different domains, you will have to pick one.

SQL JDBC Driver Versioning and Files

I’ve also heard a lot of questions and seen confusion on the file versioning, file name and system requirements.  Here is a table where I tried to highlight what comes with what driver for reference.

JDBC Driver Version | JAR File      | JDBC API Support | Supported JVM
------------------- | ------------- | ---------------- | -------------
2.0                 | sqljdbc.jar   | 3.0              | 1.5
2.0                 | sqljdbc4.jar  | 4.0              | 1.6 or later
3.0                 | sqljdbc.jar   | 3.0              | 1.5
3.0                 | sqljdbc4.jar  | 4.0              | 1.6 or later
4.0                 | sqljdbc.jar   | 3.0              | 1.5
4.0                 | sqljdbc4.jar  | 4.0              | 1.6 or later
4.1                 | sqljdbc.jar   | 3.0              | 1.5
4.1                 | sqljdbc4.jar  | 4.0              | 1.6 or later
4.1                 | sqljdbc41.jar | 4.0              | 1.7 or later

Also, we have documentation regarding the System Requirements that you can look at that goes a little further into this.

System Requirements for the JDBC Driver
http://msdn.microsoft.com/en-us/library/ms378422(v=sql.110).aspx

Hopefully this will help clear things up for you when using the SQL JDBC Driver on a Unix/Linux Platform.

 

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

How to get unstuck when using SQL Aliases


I got a case recently where the customer had a SQL Alias setup but was having issues connecting to their application. Being in Business Intelligence Support we deal with plenty of connectivity issues and this is one topic of connectivity that does not get touched on a lot.
 
A SQL alias is an alternate name that can be used to make a connection. The alias encapsulates the required elements of a connection string, and exposes them with a name chosen by the user. Aliases can be used with any client application.

It started by getting this error message when trying to connect to SQL from a client application. For the sake of this write up we are going to use SQL Server Reporting Services (SSRS) as our application. We received an error similar to the following:

ERROR [08001] [Microsoft][SQL Server Native Client 10.0]TCP Provider: No such host is known.
ERROR [HYT00] [Microsoft][SQL Server Native Client 10.0]Login timeout expired
ERROR [01S00] [Microsoft][SQL Server Native Client 10.0]Invalid connection string attribute
ERROR [08001] [Microsoft][SQL Server Native Client 10.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.

image

From this we can tell we cannot connect to SQL Server because it could not find the server.
 
We tried a Universal Data Link (UDL) file to start with, using the SQL Server name the customer provided, not knowing at that point that it was actually a SQL alias. Of note, I always start with a UDL test because it is the simplest way to test connectivity; it gets the customer’s application out of the picture.

image

Next we did a Ping on the IP of the SQL server and this was successful. This test rules out any DNS or name resolution issues in the environment.

image

So since Ping was successful, I went down the TELNET road. This lets us see if anything is listening on the port we expect the server to be on.  This could be impaired by a firewall or by the listener not being there.

image

We know the connection was successful because we got a blank screen that looks like this…

image

This got me thinking: why do Ping and TELNET work but not the UDL? I tested the UDL with the ODBC driver, the MS OLE DB provider, and SQL Server Native Client (SNAC), none of which worked using the name. I even tested the UDL forcing the TCP protocol, which also failed.

TCP:<server name>,<port #>
e.g. TCP:SQL2008R2,1433

None of this was making sense, especially since the alias looked correct in SQL Server Configuration Manager; the customer had said they were using an alias and that it looked correct.
 
Aliases can be found under SQL Server Configuration Manager in the SQL Native Client configuration area.

image

At this point everything looked OK, so I had to do further research into where a SQL alias can reside.  During that research I found a bunch of references to Cliconfg.  What is Cliconfg, you may ask? I had the same question! In short, Cliconfg is the old client network configuration tool that ships with Windows. You can find more info on Cliconfg here.
 
NOTE: Be aware that Cliconfg is missing an "I" in the config part. This is due to the old 8.3 naming conventions for files.
 
So on the application server we went to Start > Run, typed in Cliconfg, and noticed that the alias was configured with an IP address instead of the name we saw in SQL Server Configuration Manager. The customer indicated that the IP listed was incorrect.
 
In Cliconfg we saw something similar to the following…

image

I know the SQL Server IP address is 10.0.0.4, but the alias in Cliconfg was configured for 10.0.0.272. So to help correct this we edited the Connection parameters and set Server Name to the actual name of the SQL Server instead of the IP Address.

image

image

After changing that in Cliconfg we were able to connect to the SQL Server using the UDL successfully. Then we went back to the application, in this example being SSRS, and it was also able to connect successfully.

image

image

Bitness matters!

Be aware that client aliases can lead to connectivity issues if not configured correctly. These client aliases are really just entries in the registry. Also be aware that BITNESS MATTERS! The bitness of the alias depends on the bitness of the application: if the application is 32-bit, then the alias needs to be 32-bit.
 
From SQL Server Configuration Manager, the bitness is broken out in the GUI.

image

For Cliconfg there are actually separate applications: one for 32-bit and one for 64-bit. By default, if you go to Start > Run and type in Cliconfg on a 64-bit OS, you will get the 64-bit version.  If the application is 32-bit and you added a 64-bit alias, then it will not pick up the alias.  To get the 32-bit version of Cliconfg, go to this path…

%SystemDrive%\Windows\sysWOW64\cliconfg.exe
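Since the aliases are just registry entries, you can also inspect them directly in the registry; the usual locations are:

```
64-bit aliases:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSQLServer\Client\ConnectTo

32-bit aliases (on a 64-bit OS):
    HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\MSSQLServer\Client\ConnectTo
```

Comparing these two keys side by side is a quick way to spot a mismatched 32-bit/64-bit alias like the one in this case.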

Mark Ghanayem
Microsoft Business Intelligence Support


Getting the latest SQL Server Native Client


If you are installing a Service Pack (SP) or Cumulative Update (CU) for SQL Server, you may notice that the SQL Server Native Client doesn’t get updated.  It may also be difficult to find the item to actually get it updated.

If you go look at the Feature Packs for the Service Packs, or go look at the Cumulative Update downloads, the sqlncli.msi package may not be listed there.  So, how do you get it?

Get it from the SP or CU Package

When you go to run the SP or CU Package, it will self extract to a GUID folder.  When it is done self extracting, you will see the SQL Setup landing page.  I actually want to ignore the Setup landing page for this.  But while that is there, we can go into Explorer and browse to the GUID folder.

SNAGHTML1ceb00f

From there, you will want to go to the language folder that matches your locale.  For me, it is the 1033 folder.

SNAGHTML1d03140

From there, we can go to \x64\setup\x64 and you should see the sqlncli.msi there.  If you are 32-bit, it will be an x86 folder.

SNAGHTML1d1d7bc

From there, you can copy the MSI out to wherever you need to run it.

If you are looking for the 32-bit SQL Server Native Client but are running on an x64 machine, use the x64 MSI.  It will lay down both the 32-bit and 64-bit driver/provider.

 

Adam W. Saxton | Microsoft Business Intelligence Support - Escalation Services
@GuyInACube | YouTube | Facebook.com\guyinacube

How to get up and running with Oracle and Linked Servers


I have had more linked server cases set up against an Oracle database than against any other non-SQL Server database. Being in Business Intelligence Support we deal with plenty of connectivity issues and this is one topic of connectivity that does not get touched on a lot.

In less than a month I got 4 Oracle linked server cases that all had different issues. The thing that really got me was that I did not understand the Oracle side of things well enough to troubleshoot effectively. For example, in one case I did not have a good understanding of the ODAC providers (Oracle’s providers for connecting to different tools and applications) and the tnsnames.ora file, and how they related to the whole setup. By having the whole picture, I feel we can understand Oracle linked server setups much better.

Walkthrough:

So for me to understand how the Oracle side worked I needed to get an Oracle server up and running.

As such, I decided to create an Oracle 11G server. You can download the bits using the following link.

Oracle Database Software
http://www.oracle.com/technetwork/database/enterprise-edition/downloads/index-092322.html

After all that, I created a table in the system (default) schema.

Then I needed to create a listener, which I learned is very important from an Oracle standpoint for the database to run properly.

What’s a table without any data? After creating the table I added some test data so I could compare the results between the Oracle database and the linked server.

image

Once I had the Oracle side all up and ready I started to create my Linked Server in SSMS.

Now that I had my Oracle server up and operational, I needed to find a very important file, as it deals with the connectivity between Oracle and SQL. This file is called the tnsnames.ora file. I need to make sure I can locate it on the Oracle database server itself. The normal default location is

C:\<database folder>\product\11.2.0\dbhome_1\ NETWORK\ADMIN\tnsnames.ora

ex: C:\OracleDatabase\product\11.2.0\dbhome_1\NETWORK\ADMIN\tnsnames.ora

The service ID you have set up will be the connection information you need when creating the linked server in SSMS. In this case I am going to use MSORATEST.

image

Now that we know the Oracle server is setup and we have our tnsnames.ora information ready, we need to start setting up the SQL Server to have the ability to create a Linked Server that connects to an Oracle database.

So at this point we would need to download and install the proper ODAC provider from ORACLE to get that process started. REMEMBER – BITNESS MATTERS!

Listed below are the sites on where to download the proper provider needed:

For 64-bit providers
http://www.oracle.com/technetwork/database/windows/downloads/index-090165.html

For 32-bit providers
http://www.oracle.com/technetwork/topics/dotnet/utilsoft-086879.html

For this example we are using 64 bit Oracle version 11g

image

For a quick test to verify you have downloaded and installed it properly, you can do a quick UDL test. On the desktop, create a new text file (make sure to show extensions so you can see the .txt part of the name). Then rename the entire file, including the extension, to Test.udl and confirm the extension change. Once you open it and go to the Provider tab, you should see something like “Oracle Provider for OLE DB” listed.

image

Now once you have confirmed the provider is installed, search for the tnsnames.ora file on the SQL Server machine. Normally the default location is C:\<folder chosen to save it in>\app\oracle\product\11.2.0\client\network\ADMIN.

Example location we are going to use will be: D:\app\sql2012\product\11.2.0\client_1\network\ADMIN.

image

What we would add to the SQL Server TNSNames file:

SPORTS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = ORACLE11GMG.markg.local)(PORT = 4977))
    (ADDRESS = (PROTOCOL = TCP)(HOST = ORACLE11GMG.markg.local)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = Sports)
    )
  )

Once you have the tnsnames.ora file correctly filled in on the SQL Server machine, you can now setup the Linked Server in SQL Server Management Studio.

Using SQL Server Management Studio to create a linked server to another instance of SQL Server
https://technet.microsoft.com/en-us/library/ff772782(v=sql.110).aspx#SSMSProcedure

Before we start going through the actual steps, you need to make sure the “OraOLEDB.Oracle” provider is listed under Linked Servers > Providers.

image

Also make sure that under properties for the provider you select Allow inprocess.

image

When you use the “Allow inprocess” option for linked server providers, SQL Server loads the COM DLL in its own memory space. We do not normally recommend it because it can lead to stability issues in SQL Server; if the provider crashes, it will crash SQL Server as well. But some providers, such as the Oracle one, require it.

When running “out of process”, SQL launches the MSDAINITIALIZE process, and that process loads the COM server (in this case, the OLE DB provider). If it is idle for a number of minutes, or if the driver crashes the process, it unloads, and the next linked server request loads a new MSDAINITIALIZE process. You can see MSDAINITIALIZE by running dcomcnfg and working your way down Component Services.

Generally only Administrators or the local system account can launch this, so if SQL is running under a domain account, you should add it to the local Administrators group or have it run as Local System.

Now we can start creating the Oracle Linked Server in Management Studio.

Go to Server Objects > Linked Servers > right click and select New Linked Server…

image

Then start filling in the necessary information to continue to create an Oracle Linked server

General Tab:

Linked server – Name of your linked server

Server type – Choose “Other data source” when using Oracle or any other non-SQL Server database

Provider – Oracle Provider for OLE DB (downloaded from the Oracle site)

Product name – Oracle

Data source – MSORATEST (this comes from the information in the tnsnames.ora file you added onto the SQL machine)

Provider string – leave blank

Sample image of what it would look like once completed.

image

Then you will need to go to the Security Tab.

Select the option – Be made using this security context. The credentials you need to add are the ones that get you logged into your Oracle database.

Do note: this is probably not the safest option; mapping logins would be more secure. By doing this, every user hitting this linked server will connect to Oracle using that context. I did it this way because it was easier for me and I am my own admin.

image

Then open up the Linked Server in Management Studio and search for the system tables:

image

Test that it works by running a four-part query.

<Linked server name>.<Database name>.<Schema>.<Table name> (if there is no specific database name, just use “..”)

Ex: [SPORTS]..[MARK].[SPORTSDALLAS]

image

image

If you get the error “Cannot create an instance of OLE DB provider” when trying to create a linked server after filling in all the information, follow this blog.

Troubleshooting “Cannot create an instance of OLE DB provider”
http://blogs.msdn.com/b/dataaccesstechnologies/archive/2011/09/28/troubleshooting-cannot-create-an-instance-of-ole-db-provider.aspx

Mark Ghanayem
Microsoft Business Intelligence Support

Using the SSMS ConnectionDialog


I was attempting to add the SSMS connection dialog to my utility and ran into problems with referenced assemblies.

The ConnectionDialog is documented here: https://msdn.microsoft.com/en-us/library/microsoft.sqlserver.management.ui.connectiondlg.aspx

The following is a snippet using the SSMS ConnectionDialog in a C# application.

Microsoft.SqlServer.Management.UI.ConnectionDlg.ConnectionDialog dlg =
    new Microsoft.SqlServer.Management.UI.ConnectionDlg.ConnectionDialog();
Microsoft.SqlServer.Management.Smo.RegSvrEnum.UIConnectionInfo connInfo =
    new Microsoft.SqlServer.Management.Smo.RegSvrEnum.UIConnectionInfo { ApplicationName = "My App" };
IDbConnection conn;

dlg.AddServer(new Microsoft.SqlServer.Management.UI.ConnectionDlg.SqlServerType());
dlg.Text = "Connect to My Database";

DialogResult dr = dlg.ShowDialogValidateConnection(this, ref connInfo, out conn);
if (DialogResult.OK == dr)
{
    // Reuse the validated connection's string and force pooling on.
    m_strConnString = conn.ConnectionString + ";Pooling=TRUE";

    if (false == m_strConnString.Contains("Database="))
        m_strConnString += ";Database=MyDb";

    bRC = true;
}
else
{
    bRC = false;
}

To compile this properly, references to RegSvrEnum and ManagementControls are required.   I compiled it on my system and provided it to a peer, who quickly hit an application failure due to missing assembly references.

I built the application using the SQL Server SSMS 2015 July Preview, and they had SQL Server 2014 on their system.   No problem, I thought; I made sure both assemblies were in the same directory as my application, but this still failed.

Following the trail of missing assemblies I had to provide the following in order for the application to execute.

  • Microsoft.SqlServer.Management.Controls
  • Microsoft.SqlServer.Management.UserSettings
  • Microsoft.SqlServer.RegSvrEnum
  • SqlWorkbench.Interfaces

There is not a redistributable package containing these assemblies.  The supported way is to install the matching SSMS package on the target system.   SSMS can be installed separately using the following link: https://msdn.microsoft.com/en-us/library/mt238290.aspx?f=255&MSPPError=-2147217396  

Bob Dorr – Principal SQL Server Escalation Engineer

In-Memory OLTP files –what are they and how can I relocate them?


In SQL 2014 and above, you can create memory optimized tables with the In-Memory OLTP feature.   When you use this feature, SQL Server actually generates native code to optimize performance.  As a result, there will be dll and pdb files, plus other intermediate files.  In fact, native compilation is one of the three pillars of its high performance; the other two are the no-lock/no-latch implementation and optimizing for in-memory access (no buffer pool handling).

Each stored procedure or table will have a separate set of files generated.   These are managed by SQL Server, and normally you don’t need to worry about them.  But we recently got a report from a customer who got the following error when starting their database:

"Msg 41322, Level 16, State 13, Line 0
MAT/PIT export/import encountered a failure for memory optimized table or natively compiled stored procedure with object ID 214291823 in database ID 6. The error code was 0x80030070".

The error 0x80030070 is operating system error for ERROR_DISK_FULL “There is not enough space on the disk”.

It turned out that the customer had lots of memory optimized objects (tables and stored procedures), which resulted in lots of files being generated.

Where do these files get stored?

They are stored in the default database file location for the server instance.

SQL Server will always create a subfolder like <default data file location>\xtp\<dbid> and store the files there.  The file names follow the convention xtp_<p or t>_<dbid>_<objectid>.*.     For example, when I created a sample In-Memory OLTP database with just one memory optimized table named t, my instance of SQL Server generated the following files.

image
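If you are not sure what the default data file location is for your instance, one quick way to check it (on SQL Server 2012 and later) is:

```sql
-- Returns the instance-level default data file path, which is where
-- the xtp\<dbid> subfolder will be created.
SELECT SERVERPROPERTY('InstanceDefaultDataPath') AS DefaultDataPath;
```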

 

If you query sys.dm_os_loaded_modules, you will see the native dlls loaded. See the screenshot below.

image

Additionally, these files will always be deleted and recreated under the following conditions:

  1. SQL Server restarts
  2. The database is taken offline and brought back online
  3. A table or procedure is dropped and recreated

 

How can I relocate these files?

If you want these files stored in a different location, all you need to do is change the default data file location.  SQL Server Management Studio allows you to do that, but you will need to restart SQL Server after the change.  Once you do, the In-Memory OLTP related files will be created in the new location.

image

 

Jack Li |Senior Escalation Engineer | Microsoft SQL Server

twitter| pssdiag |Sql Nexus

Are My Statistics Correct?


The question is often “Are my statistics up-to-date?”, which can be a bit misleading.   I can make sure I have up-to-date statistics, but the statistics may still not be accurate.

I recently engaged on an issue where the statistics were rebuilt nightly.   A maintenance job change had been made, moving from FULLSCAN to WITH SAMPLE statistics creation/update, that dramatically altered the statistical layout.  The underlying data was skewed, and as such the execution plans generated varied significantly.  Queries that ran in 1 minute now took over an hour to complete, using an alternate plan with significant memory grants and TEMPDB usage.

As you can imagine this issue has resulted in a series of DCR asks from the product team. 

The dilemma we all run into is: what level of SAMPLED statistics is appropriate?   The answer is you have to test, but that is not always feasible, and in the case of Microsoft CSS we generally don’t have historical histogram states to revisit.

Microsoft CSS is engaged to help track down the source of a poorly performing query.   It is a common step to locate possible cardinality mismatches and study them closer.   Studying the statistics dates, row modification counters, atypical parameter usage and the like are among the fundamental troubleshooting steps.

The script below is one way Microsoft CSS may help determine the accuracy of the current statistics.   You can use similar techniques to check the accuracy of your SAMPLING choices or to store historical information.  The example loads a specific histogram for the ‘SupportCases’ table, then executes queries using the key values and range information to determine the actual counts (as if FULLSCAN had been executed).   The final select of the captured data can be used to detect variations between the current actuals and the in-use histogram.

create table #tblHistogram
(
    vData sql_variant,
    range_rows bigint,
    eq_rows bigint,
    distinct_range_rows bigint,
    avg_range_rows bigint,
    actual_eq_rows bigint DEFAULT(NULL),
    actual_range_rows bigint DEFAULT(NULL)
)
go

create procedure #spHistogram @strTable sysname, @strIndex sysname
as
dbcc show_statistics(@strTable, @strIndex) with HISTOGRAM
go

truncate table #tblHistogram
go

insert into #tblHistogram (vData, range_rows, eq_rows, distinct_range_rows, avg_range_rows)
    exec #spHistogram 'SupportCases', 'cix_SupportCases'
go

-- EQ_ROWS
update #tblHistogram
    set actual_eq_rows = (select count(*) from SupportCases with(NOLOCK) where ServiceRequestNumber = h.vData)
    from #tblHistogram h;

-- RANGE_ROWS
with BOUNDS (LowerBound, UpperBound)
as
(
    select LAG(vData) over(order by vData) as [LowerBound], vData as [UpperBound] from #tblHistogram
)
update #tblHistogram
    set actual_range_rows = ActualRangeRows
    from (select LowerBound, UpperBound,
            (select count(*) from SupportCases with(NOLOCK)
                where ServiceRequestNumber > LowerBound and ServiceRequestNumber < UpperBound) as ActualRangeRows
          from BOUNDS) as t
    where vData = t.UpperBound
go

select /* TOP 10 NEWID(), */ vData, eq_rows, actual_eq_rows, range_rows, actual_range_rows
    from #tblHistogram
    where eq_rows <> actual_eq_rows or range_rows <> actual_range_rows
-- order by 1
go

Testing the script, I leveraged UPDATE STATISTICS with SAMPLE 1 PERCENT and skewed data in my table.   This resulted in several steps of the histogram having a statistical variation of +200% from the actual (FULLSCAN) values.

I continued to test variants of the SAMPLE percentage until the statistical variation from actuals fell within a noise range.   For my data this was 65 PERCENT.   SAMPLING at 65 PERCENT allows reduction of statistics creation/modification time while retaining the necessary statistical relevance.
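The testing loop itself is simple. Using the article’s example table and index, each iteration rebuilds the statistics at a candidate sampling level; the histogram comparison is then rerun against the FULLSCAN baseline:

```sql
-- Baseline: FULLSCAN gives the 'true' histogram to compare against.
UPDATE STATISTICS SupportCases cix_SupportCases WITH FULLSCAN;

-- Candidate sampling level; rerun the histogram comparison after each change
-- and raise the percentage until the variation falls within acceptable noise.
UPDATE STATISTICS SupportCases cix_SupportCases WITH SAMPLE 65 PERCENT;
```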

Bob Dorr – Principal SQL Server Escalation Engineer

The given network name is unusable because there was a failure trying to determine if the network name is valid for use by the clustered SQL instance


Have you seen this message before? We see our customers encounter this message while performing SQL Server installation. If there is a problem, you will normally get this message in the “Instance Configuration” page of the “new SQL Server Failover Cluster setup” sequence.

Here is how the screen appears with the message at the bottom:

image

After you provide the SQL Server Network Name and instance name, you will click Next. At this point the setup program performs a few validations. If those validations fail, you will notice the error message at the bottom of the screen. If you click on the error message, you will see some additional information embedded at the end of this message which is not visible by default in this view. Here is an example:

image

In general you might encounter one of the following messages:

The given network name is unusable because there was a failure trying to determine if the network name is valid for use by the clustered SQL instance due to the following error: ‘The network address is invalid.’

The given network name is unusable because there was a failure trying to determine if the network name is valid for use by the clustered SQL instance due to the following error: ‘Access is denied.’

The troubleshooting steps and resolution for these situations depend on what the last part of the error message indicates. Let’s take a quick look at how the setup program performs the validation of the network name. The setup program calls the Windows API NetServerGetInfo and passes two parameters: the network name that you typed in the setup screen, and level 101. There are multiple outcomes from this Windows API call:

1. The API call returns OS error code 53 [The network path was not found]. This tells the setup program that the network name provided is good to use, since nobody is currently using that same name on the network. This is what you ideally want to happen. Setup can proceed to the next steps.

2. The API call returns success. This tells the setup program that there is another computer active with this same name, and hence we cannot use the network name provided in the setup screen. This is essentially a duplicate name scenario. This is straightforward, and you can provide a different name to be used by setup.

3. The API call returns other unexpected failure states like the following:

RPC error code 1707 which translates to "The network address is invalid"
Security error code 5 which translates to "Access is denied"

    These are the same error messages you actually get on the setup screen in the last part of that long error message. Now, let us review the steps you can take to troubleshoot these errors and resolve them.

As a first step, you can isolate this issue to this specific API call and remove SQL Server setup from the picture. You can take the sample code for the Windows API NetServerGetInfo, build a console application, and pass the same network name as the parameter to this call. Observe which one of the error codes discussed above is returned. You expect to get back OS error 53, but you might be getting 1707 or 5 instead.

If you then use Process Monitor to track the activity, you will notice a CreateFile call to \\SQL-VNN-TEST\PIPE\srvsvc encountering a BAD NETWORK NAME or ACCESS DENIED result.

If you do not have the required permissions to create computer objects, make sure that the computer objects are pre-staged with the appropriate permissions as described in the document: Failover Cluster Step-by-Step Guide: Configuring Accounts in Active Directory. Also validate that there is no stale entry in the DNS server pointing this network name to a different IP address. If possible, clean up all entries related to this network name from Active Directory and other name resolution servers like DNS. It is a good idea to create fresh entries for this network name as described in the sections “Steps for prestaging the cluster name account” and “Steps for prestaging an account for a clustered service or application”.

In the past when our networking team debugged this, they noticed that the error code changes (from 53 to 1707) while the network request flows through the various drivers in the network stack. RDBSS will show the correct error code, but when the request reaches MUP it gets changed to one of the incorrect error codes we finally encounter. Typically this happens when there is a filter driver sitting in the network stack intercepting these calls and eventually changing the return codes. So the next step for you will be to review all processes and services that are running on this system and evaluate if you can disable or remove the non-critical ones just during the installation or troubleshooting timeframe.

Check if this problem happens only for a specific name or for any network name that you pass for validation. This can help establish whether there is a generic network issue at play rather than a problem looking up one specific network name.

It will be great to hear from you if you encountered this issue and which one of the above steps you used to resolve it. Also, if there is something we have overlooked, please let us know so we can add it to this list of steps.

Thanks,

Suresh Kandoth – SQL Server Escalation Services

Will SQL Server use ‘incomplete’ or ‘dirty’ statistics during online index rebuild?


We had a customer who opened an issue with us and wanted to know the behavior of statistics during an online index rebuild.  Specifically, he suspected that SQL Server might have used ‘incomplete’ statistics because his application uses the read uncommitted isolation level.

This type of question comes up frequently.  I thought I’d share my research and the answers I gave this customer so that readers can benefit from this blog.

In order to answer the question more accurately, let’s be specific.     Let’s call the statistics for index1 before the online index rebuild Stats1, and the statistics after the rebuild Stats2.   Furthermore, let’s call any incomplete statistics produced during the rebuild Stats3.   Now the question becomes: during an online index rebuild of index1 (started but not completed), which stats will a query compiled during the rebuild use (Stats1, Stats2 or Stats3)?

Here are few key points that answer the above question:

  1. First of all, there is no Stats3.  SQL Server never writes in-flight statistics into the stats blob for use during an online index rebuild.  Even if you are using dirty reads, you won’t get the non-existent Stats3.
  2. During the online index rebuild, Stats1 (the old stats) continues to be available for use until the very end.
  3. Stats2 (the new stats) is written at the very end of the index rebuild.
  4. During the brief period when SQL Server switches to the new stats (Stats2), no one can access the stats at all.  Even with the read uncommitted isolation level, you can’t access them.    This is because SQL Server acquires a schema modification lock at the very end of the online index rebuild to make metadata changes, including the stats change.   Even at the read uncommitted isolation level, you still need a schema stability lock on the table, and you can’t have that while someone else holds a schema modification lock.  In short, you will never see anything in between.  You either see before (Stats1) or after (Stats2).
  5. After the online index rebuild, all queries involving the table will recompile.

What about Index reorg?

REORG does nothing related to statistics update. In other words, REORG doesn’t update the stats for the index at all; I have posted a blog on this.   In the interest of finding the impact of reorg on locks and recompiles, I did more research.  Reorg won’t cause recompilation of your query or hold a schema modification lock.  It requests a schema stability lock, which is much lighter weight.  Reorg does acquire and release X locks on pages or rows, but these have no effect on stats or on queries at the read uncommitted isolation level.  In other words, your query at read uncommitted isolation will continue to run without any impact.  Reorg only helps with how data is accessed physically.   No stats update, no recompile.
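You can verify the reorg behavior yourself with STATS_DATE. A minimal sketch (the table and index names here are hypothetical):

```sql
-- Statistics date before and after a REORGANIZE vs. a REBUILD.
SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS stats_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.t');

ALTER INDEX ix_t ON dbo.t REORGANIZE;   -- stats date unchanged afterwards
ALTER INDEX ix_t ON dbo.t REBUILD;      -- rebuild updates the index statistics
```

Rerun the SELECT after each ALTER INDEX to see that only the REBUILD moves the statistics date for the index.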

What is the duration of the schema modification lock?

For an online index rebuild, the schema modification lock (which SQL Server acquires for the rebuild) is held very briefly, towards the end.  All it does is the metadata update.

Jack Li |Senior Escalation Engineer | Microsoft SQL Server

twitter| pssdiag |Sql Nexus



SQL Server 2016 Temporal Data Assists Machine Learning Models


Microsoft is always seeking out ways to improve the customer experience and satisfaction.  A project that is currently active looks at the SQL Server incidents reported to Microsoft SQL Server Support and applies Machine Learning.   A specific aspect of the project is to predict when a case needs advanced assistance (escalation, onsite, development, or upper-level management assistance).

Not every model requires historical data, but working with our data scientists I realized the importance of temporal data as it relates to our approach.  We are trying to predict on day 1, 2, 3, … that an issue has a high probability of requiring advanced assistance.   While building the training set for Machine Learning it became clear that the model needs to understand what the issue looked like on day 1, day 2, and so forth.

I made a quick chart in Excel to help us all visualize the concept.

image

If I want to predict whether an issue has a high probability of needing advanced assistance, I need to know what an advanced issue looked like over time.   If I take the training values from when the incident was resolved, Machine Learning is limited to learning the resolution patterns.

Let me expound on this a bit more.  If I provide day-10 training data to the machine learning model, I am influencing the model accuracy at day 10.   The model can be very accurate for day 10, but I want to predict issues that need assistance and address them on day 1.

Using a temporal approach, the training data is expanded to each day in the life of the advanced incidents.   The model now understands what an advanced issue looked like on day 1, 2, …, allowing it to provide relevant predictions.   When a case scores with high relevancy in this model we can adjust resources and assist the customer quickly.

I prefer easy math so let’s assume 1000 issues needed advanced assistance over the past year and each of them took 10 days to resolve.   Instead of a training set of 1000 issues at the point of resolution, applying a temporal design expands the training set to 1000 x 10 = 10,000 views.

When using Machine Learning carefully consider if SQL Server 2016 Temporal Tables are relevant to the accuracy and design of your model.
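As a sketch of what such a design can look like (all object names here are hypothetical), a SQL Server 2016 system-versioned table keeps every historical state of a case row, and FOR SYSTEM_TIME AS OF reconstructs what the case looked like on any given day when building the training set:

```sql
CREATE TABLE dbo.SupportCase
(
    CaseId        int           NOT NULL PRIMARY KEY CLUSTERED,
    Severity      tinyint       NOT NULL,
    OwnerTeam     nvarchar(64)  NOT NULL,
    SysStartTime  datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    SysEndTime    datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SupportCaseHistory));

-- Training-set view of the case as it existed on a particular day of its life.
SELECT CaseId, Severity, OwnerTeam
FROM dbo.SupportCase
    FOR SYSTEM_TIME AS OF '2016-06-02T00:00:00'
WHERE CaseId = 42;
```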

Bob Dorr – Principal SQL Server Escalation Engineer

Spool operator and trace flag 8690


If you have seen enough query plans, you have surely run into spool operators (index spool or table spool). They are documented in https://technet.microsoft.com/en-us/library/ms181032(v=sql.105).aspx

The spool operator can improve query performance because it stores intermediate results so that SQL Server doesn’t have to rescan or re-compute them for repeated uses.  The spool operator has many uses.

For this blog, I’m talking about a spool on the inner side of a nested loop.  If your query plan has this type of spool, you will see something similar to the plan below:

 

plan2

Spools improve performance in the majority of cases.  But the decision to spool is based on estimates. Sometimes these can be incorrect due to unevenly distributed or skewed data, causing slow performance.

You can actually disable the spool on the inner side of a nested loop with trace flag 8690.   This trace flag helped two of my customers last week.  I want to point out this is an exception (that I resolved two issues this way in one week).  In the vast majority of situations, you don’t need to manually disable the spool with this trace flag.

I don’t recommend disabling the table spool server-wide.  But you can use QUERYTRACEON to localize it to a single query if you have exhausted other ways to tune the query and find that disabling the table spool helps.
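A sketch of what that looks like on a single query (the table names are placeholders; note that the QUERYTRACEON hint itself requires sysadmin rights unless it is applied through a plan guide):

```sql
SELECT o.OrderId, d.Quantity
FROM dbo.Orders AS o
JOIN dbo.OrderDetails AS d
    ON d.OrderId = o.OrderId
OPTION (QUERYTRACEON 8690);  -- disable the spool on the inner side of the
                             -- nested loop for this query only
```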

Jack Li |Senior Escalation Engineer | Microsoft SQL Server

twitter| pssdiag |Sql Nexus

What to do when you run out of disk space for In-Memory OLTP checkpoint files


While data for memory optimized tables resides in memory all the time with SQL Server 2014 and 2016’s In-Memory OLTP feature, we still need a means to cut down recovery time in case of a crash or restart.  For disk-based tables, checkpoint flushes the dirty pages into the data file(s).  With In-Memory OLTP, there is a separate set of checkpoint files that SQL Server uses.  These checkpoint files reside in a directory you specify when you create the MEMORY_OPTIMIZED_DATA filegroup required to enable the In-Memory OLTP feature.

The question is: what happens if the disk that hosts the In-Memory checkpoint files runs out of space?  I decided to do some testing and document the symptoms and recovery steps here in case you run into this issue.  With Azure, the test was really easy.  All I had to do was spin up a VM and attach a very small disk to simulate the out-of-disk-space condition.

If your disk runs out of space, you will see the various errors below, though your database stays online.

Your insert, update or delete may fail with the following error:

Msg 3930, Level 16, State 1, Line 29

The current transaction cannot be committed and cannot support operations that write to the log file. Roll back the transaction.

In the errorlog, you will see

2015-12-23 21:38:23.920 spid11s     [ERROR] Failed to extend file ‘f:\temp\imoltp_mod1\7ef8758a-228c-4bd3-9605-d7562d23fa76\a78f6449-bd73-4160-8a3f-413f4eba8fb300000ad-00013ea0-0002′ (‘GetOverlappedResult’). Error code: 0x80070070. (d:\b\s1\sources\sql\ntdbms\hekaton\sqlhost\sqllang\fsstgl

2015-12-23 21:40:49.710 spid11s     [ERROR] Database ID: [6]. Failure to allocate cache file. Error code: 0x80070070. (d:\b\s1\sources\sql\ntdbms\hekaton\engine\hadr\ckptagent.cpp : 890 – ‘ckptAgentAllocateCfp’)

If you manually issue a checkpoint command, you will get this error:

Msg 41315, Level 16, State 0, Line 5

Checkpoint operation failed in database ‘testdb’.

 

What to do when you encounter such condition?

Step 1: Add an additional ‘container’

If you can add more space to the disk, just do so.  If you can’t, you can add another ‘container’ to the MEMORY_OPTIMIZED_DATA filegroup pointing to a folder on another drive.  You can do so by issuing a command like this:  ALTER DATABASE testdb ADD FILE (name='imoltp_mod1', filename='f:\checkpoint\imoltp_mod1') TO FILEGROUP imoltp_mod

Step 2: Manually issue a checkpoint.  After you have added space or an additional ‘container’ as above, just run CHECKPOINT against the database.  Then you are all set.
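Putting the two steps together (the database, filegroup and path names follow the article’s example; the new container name and drive are assumptions, so adjust them to your environment):

```sql
-- Step 1: add a container on a drive that still has free space.
ALTER DATABASE testdb
    ADD FILE (NAME = 'imoltp_mod2', FILENAME = 'g:\checkpoint\imoltp_mod2')
    TO FILEGROUP imoltp_mod;

-- Step 2: force a checkpoint so new checkpoint files can be allocated.
USE testdb;
CHECKPOINT;
```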

 

Jack Li |Senior Escalation Engineer | Microsoft SQL Server

twitter| pssdiag |Sql Nexus

Wanting your non-sysadmin users to enable certain trace flags without changing your app?


Even after supporting SQL Server for so many years, we still face something new almost every day.   Sometimes you just have to combine things to achieve what you need.  Here is an example that came out of troubleshooting a customer’s issue.

A couple of months ago, we ran into a need to enable a trace flag while troubleshooting a highly critical performance issue.  This customer had 30 databases that served many applications on a single server.   One application produced queries that negatively impacted the entire server. Through troubleshooting, we discovered a trace flag (which is rarely used, by the way) that helped the query plans for that set of queries.   The problem is that the trace flag was not suited for the entire server because it would negatively impact other queries.

The initial thought was to enable the trace flag at the session level.  We ran into two challenges.  First, the application would need a code change (which they couldn’t make) to enable it.  Secondly, DBCC TRACEON requires sysadmin rights, and the customer’s application used a non-sysadmin login.   These two restrictions made it seem impossible to use the trace flag.

However, we eventually came up with a way of using a logon trigger coupled with wrapping the DBCC TRACEON command inside a stored procedure.   In doing so, we solved both problems.  We were able to isolate the trace flag to just that application without requiring a sysadmin login.

Below is the code using trace flag 9481.  I used trace flag 9481 in the demo here because it’s easy to verify that it has indeed taken effect.

 

alter database master set trustworthy on
go

use master
go

create procedure proc_enable_tf
with execute as owner
as
exec('dbcc traceon(9481)')
go

grant execute on proc_enable_tf to public
go

create TRIGGER trigger_enable_tf
ON ALL SERVER
FOR LOGON
AS
BEGIN
    IF app_name() = 'Microsoft SQL Server Management Studio - Query'    -- replace this with your application name
    begin
        exec master.dbo.proc_enable_tf
    end
END;

 

After you execute the above code on SQL Server 2014, you can create a login that is not a member of sysadmin.  Then log in with that user using Management Studio and run a query, gathering the XML query plan.  In the query plan, you can examine CardinalityEstimationModelVersion to see that it’s 70 (instead of 120, which is the default).

You will also see the message “DBCC TRACEON 9481, server process ID (SPID) 58. This is an informational message only; no user action is required” in the errorlog.

 

Reference:

Optimizer trace flags are documented in  https://support.microsoft.com/en-us/kb/2801413 and https://support.microsoft.com/en-us/kb/920093.

 

Jack Li |Senior Escalation Engineer | Microsoft SQL Server

twitter| pssdiag |Sql Nexus
