I recently ran into a problem with one of my own internal applications, and it re-raised a philosophical question I have discussed with customers before. There are really two sides to the question:
1) Should I set my non-default instance to listen on TCP 1433?
2) Should I set my default instance to listen on something other than TCP 1433?

In both cases, I recommend "no".

Let me tackle the second question first, since it is simpler. I know that years ago one of the common security recommendations was to put your default instance on a port other than TCP 1433 to make it more difficult for attackers to find. However, I can say with complete comfort that if an attacker has a network trace containing a connection attempt to your instance, they can very easily figure out the port on which your instance is listening. Even if you encrypt the conversation, the first five packets are unencrypted, because nothing can be encrypted until the client has contacted the instance. Since those first five packets look the same for any connection attempt, they are very easy to spot in a network trace. Although I don't do this maliciously (I swear!), I do it on a regular basis when someone sends me a network trace to a named instance and neglects to tell me the port on which the instance is listening.
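The following minimal Python sketch shows how the listening port falls out of a trace. It assumes you have already exported the TCP segments from a capture as hypothetical `(src_port, dst_port, payload)` tuples. It only looks for the TDS PRELOGIN message (packet type 0x12 in the MS-TDS specification), which the client sends in the clear before any encryption is negotiated.

```python
import struct
from typing import Iterable, Set, Tuple

TDS_PRELOGIN = 0x12   # TDS packet type for PRELOGIN
TDS_HEADER_LEN = 8    # every TDS packet starts with an 8-byte header


def looks_like_prelogin(payload: bytes) -> bool:
    """True if a TCP payload starts with a plausible TDS PRELOGIN header."""
    if len(payload) < TDS_HEADER_LEN:
        return False
    # Type (1 byte), Status (1 byte), Length (2 bytes, big-endian).
    pkt_type, _status, length = struct.unpack(">BBH", payload[:4])
    return pkt_type == TDS_PRELOGIN and length >= TDS_HEADER_LEN


def find_sql_ports(segments: Iterable[Tuple[int, int, bytes]]) -> Set[int]:
    """Return destination ports that received a TDS PRELOGIN packet."""
    return {dst for _src, dst, payload in segments if looks_like_prelogin(payload)}


if __name__ == "__main__":
    # Hypothetical segments: a PRELOGIN to port 50123 and an unrelated HTTP request.
    prelogin = bytes([0x12, 0x01, 0x00, 0x2F, 0, 0, 0, 0]) + bytes(39)
    other = b"GET / HTTP/1.1\r\n"
    print(find_sql_ports([(61000, 50123, prelogin), (61001, 80, other)]))
```

A real capture will contain other traffic that happens to begin with 0x12. In practice you would also check that the PRELOGIN option tokens parse; this sketch keeps only the header check.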

In addition, if you move your default instance to a port other than TCP 1433, you now need to specify that port in every connection string, either directly (servername,port#) or indirectly via a client alias. Given how easy it is to find this conversation in a network trace, I really cannot see the additional effort as being worth the negligible security benefit (security by obscurity is never a great idea).
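For reference, a client can name the server in three ways. The hypothetical helper below builds only the server part of a connection string. It is a sketch, not SqlClient's own logic.

```python
from typing import Optional


def data_source(server: str, instance: Optional[str] = None,
                port: Optional[int] = None) -> str:
    """Build the server part of a connection string (hypothetical helper)."""
    if port is not None:
        # Explicit port: "server,port". SQL Browser is never consulted.
        return f"{server},{port}"
    if instance:
        # Named instance: "server\instance". The client asks SQL Browser for the port.
        return f"{server}\\{instance}"
    # Default instance: just the server name. The client goes straight to TCP 1433.
    return server


if __name__ == "__main__":
    print(data_source("sql01"))               # sql01
    print(data_source("sql01", port=50000))   # sql01,50000
    print(data_source("sql01", "PROD"))       # sql01\PROD
```

Moving the default instance off 1433 turns every plain `sql01` in your environment into something like `sql01,50000` (or an alias). That is the maintenance cost described above.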

The first question is a little more complex. Setting your named instance to listen on TCP 1433 does indeed give you the benefit of not having to specify the instance name or port number in the connection string. This is because the SQL Server client libraries don't bother querying SQL Browser for the port number if they don't detect an instance name in the connection string. Instead, they go straight to TCP 1433.
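The Python sketch below shows that decision, plus the SQL Browser lookup a client performs for a named instance. The message formats follow my reading of the [MS-SQLR] SQL Server Resolution Protocol: a 0x04 CLNT_UCAST_INST request and a 0x05 SVR_RESP reply. The function names are hypothetical. Real client libraries add retries and error handling that this omits.

```python
import socket
from typing import Dict, Optional

DEFAULT_PORT = 1433      # default instance, TCP
SQL_BROWSER_PORT = 1434  # SQL Browser, UDP


def build_instance_request(instance: str) -> bytes:
    """CLNT_UCAST_INST: 0x04, then the instance name, then a NUL byte."""
    return b"\x04" + instance.encode("ascii") + b"\x00"


def parse_browser_response(data: bytes) -> Dict[str, str]:
    """SVR_RESP: 0x05, 2-byte little-endian size, then 'key;value;...;;'."""
    if len(data) < 3 or data[0] != 0x05:
        raise ValueError("not an SVR_RESP message")
    size = int.from_bytes(data[1:3], "little")
    text = data[3:3 + size].decode("ascii", errors="replace")
    fields = text.rstrip(";").split(";")
    return dict(zip(fields[0::2], fields[1::2]))


def resolve_tcp_port(host: str, instance: Optional[str] = None,
                     port: Optional[int] = None, timeout: float = 2.0) -> int:
    if port is not None:
        return port              # explicit port wins
    if not instance:
        return DEFAULT_PORT      # no instance name: SQL Browser is never queried
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_instance_request(instance), (host, SQL_BROWSER_PORT))
        data, _addr = sock.recvfrom(65535)
    # Raises KeyError if the instance does not report a "tcp" entry.
    return int(parse_browser_response(data)["tcp"])
```

The middle branch explains the whole problem. A named instance on 1433 "just works" with a default-instance connection string, and it silently breaks once the port moves.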

The downside to this approach shows up when you are working with application administrators who don't know anything about the SQL Server instance. If they don't know that the instance is a named instance, they might configure their connection string as if it were a default instance. Since the instance is listening on TCP 1433, the connection attempt will succeed. The real problem comes later, when you decide to change the port on which your SQL Server instance is listening (maybe you read my blog :)). If you do, but don't change the client connection string, the client won't be able to connect. And, because the client thinks your instance is a default instance, it won't query SQL Browser, so it will never discover the new port. The only ways to fix this are to create an alias on the client (tough to maintain over time) or to modify the connection string to specify an instance name. Now, instead of just taking downtime on the SQL Server side, you have to take downtime on the client side, too.

In conclusion, there is a negligible security benefit to changing the port of your default instance, and there is significant potential for outages if you set a named instance to TCP 1433. Therefore, apart from setting a static port for your named instances, I recommend you just leave the port settings at their defaults.

P.S. Please don't set any of your instances to TCP 1434 either. While not technically wrong, it is very confusing, since SQL Browser listens on UDP 1434 and hardly anybody mentions the protocol (TCP vs. UDP) when talking about ports. If you put SQL Server on TCP 1434, it can get quite hard to make sure both sides of the conversation are talking about the same service.

Evan Basalik | Senior Support Escalation Engineer | Microsoft SQL Server Escalation Services

A customer had encountered an issue with their SharePoint 2010 / Reporting Services 2012 deployment. They had set up a Reporting Services Data Source configured to connect to a standalone Analysis Services instance. When they clicked “Test Connection”, they saw the following:

Within the SharePoint ULS Log, we saw the following – which was really the same error:

Throwing Microsoft.ReportingServices.ReportProcessing.ReportProcessingException: , Microsoft.ReportingServices.ReportProcessing.ReportProcessingException: Cannot create a connection to data source 'AdventureWorksAS2012.rsds'. ---> Microsoft.AnalysisServices.AdomdClient.AdomdConnectionException: The connection either timed out or was lost. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host

at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size) --- End of inner exception stack trace ---

When I saw this error, I did not attribute it to an authentication issue, because this error usually indicates a network-related issue. I was able to reproduce the issue in my local environment. Once I had it reproduced, I grabbed an Analysis Services Profiler trace and saw the following.

The minute I saw that, my mindset shifted to an authentication issue, and I was pretty sure it was Kerberos related. Given our deployment of SharePoint 2010 and RS 2012, that also meant a Claims/Kerberos issue. Some people think that because we are now a Claims-aware service, Kerberos isn’t needed any longer. As you will see below, Kerberos is definitely in play and contributes to the issue.

So, I started with a network trace using Network Monitor 3.4. After collecting the trace, I applied a filter for the KerberosV5 protocol. Here is what I saw:

There were actually two things going on here.

I was missing the MSOLAPSvc.3/BSPegasus SPN
The Claims Service Account did not have Constrained Delegation configured to allow delegation to the OLAP Service.
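The missing SPN in item 1 is just a service class plus a host name. The hypothetical Python helper below (not a Microsoft tool) builds the Analysis Services SPN for both the NetBIOS name and the fully qualified name. The SPN in this scenario has no instance suffix. The `:instance` form for a named instance is the commonly documented one, and I include it here as an assumption.

```python
from typing import List, Optional


def olap_spns(netbios: str, domain: str, instance: Optional[str] = None) -> List[str]:
    """MSOLAPSvc.3 SPNs for the NetBIOS and fully qualified host names."""
    fqdn = f"{netbios}.{domain}"
    suffix = f":{instance}" if instance else ""  # named instances append ":instance"
    return [f"MSOLAPSvc.3/{host}{suffix}" for host in (netbios, fqdn)]


if __name__ == "__main__":
    for spn in olap_spns("BSPegasus", "battlestar.local"):
        print(spn)
```

Each string would then be registered on the Analysis Services service account, for example with `setspn -S`, which checks for duplicates before adding.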

I added my MSOLAP SPNs. In this case it was requesting the NetBIOS name, so I added both the NetBIOS and the fully qualified names:

What surprised me here was that I didn’t see any PRINCIPAL_UNKNOWN errors, just the KDC_ERR_BADOPTION. In the past, I usually ignored BADOPTION errors, and sometimes they can be a red herring. The key here is the number. The BADOPTIONs I typically ignored had a code of 5. These had 13. Of note, this BADOPTION was caused by Item 2 above: the lack of Constrained Delegation configured within Active Directory.

The thing to remember about this deployment is that it starts with Claims. This means that we will be using the Claims to Windows Token Service (C2WTS). There will be a Windows Service on the affected server, and it will have an associated Service Account. In my case, the service account is BATTLESTAR\claimsservice. After adding the SPN, I configured Constrained Delegation to the MSOLAP service. This is done on the Delegation tab for the service account in question. If you are using LocalSystem for the C2WTS service account, it would be on the machine account for the server that the C2WTS service is running on.

NOTE: In order to see the Delegation tab in Active Directory, an SPN needs to be on that account. However, there is no SPN needed for the Claims Service Account. In my case, I just added a bogus SPN to get it to show. The SPN I added isn’t used for anything other than to get the Delegation tab to show.

After I had that in place, I did an IISReset to flush the cache for that logged-in session. Because I still got the same error, I ran another network trace.

Notice that the BADOPTION is no longer present after the MSOLAP TGS request, because our change corrected that one. However, now we see a BADOPTION after the TGS request for the RSService. I ran into this a few months back. A lot of people either aren’t aware of it, or the configuration is so confusing that it is simply missed. Even though you set up the Claims Service with Delegation settings, the actual Shared Service that we are using also needs these delegation settings. In this case, that is Reporting Services. So, we have to repeat what we did for the Claims Service with the Reporting Services account.

NOTE: In this configuration, the Reporting Services Service Account will not have any SPNs on it, as they are not needed (unless you are sharing the account with something else). So, we’ll need to add a bogus SPN on the RS Service Account to get the Delegation tab to show up.

In my case, I’m sharing my RSService account with a native mode service, so I actually have an HTTP SPN on the account and the Delegation tab is available.

NOTE: Because the Claims Service forces Constrained Delegation due to its need for Protocol Transition, the RS Service MUST use Constrained Delegation as well. You can’t go from more secure to less secure; it will fail.

Now let's look at the network trace with these changes.

You can see that we got a successful response on the 4th line without getting the BADOPTION. We still see one more BADOPTION, but I didn’t concern myself with it, because…

It was now working!!!

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton

I was working with a customer who was encountering problems trying to use a PerformancePoint Dashboard against an Analysis Services Instance. The issue came down to the Claims to Windows Token Service (C2WTS) configuration. This service takes the Claims context and converts it to a Windows Token for use against back-end servers.

When trying to create a Data Source within PerformancePoint Dashboard Designer, using the Unattended Service Account, the test succeeds. If we switch that over to Per-user Identity, we see the following:

Within the Event Logs for the SharePoint App Server, we see the following from PerformancePoint:

Log Name:      Application
Source:        Microsoft-SharePoint Products-PerformancePoint Service
Date:          9/6/2012 11:59:57 AM
Event ID:      37
Task Category: PerformancePoint Services
Level:         Error
Keywords:
User:          BATTLESTAR\spservice
Computer:      AdmAdama.battlestar.local
Description:
The following data source cannot be used because PerformancePoint Services is not configured correctly.

Data source location: http://admadama:82/Data Connections for PerformancePoint/5_.000
Data source name: New Data Source 3

Monitoring Service was unable to retrieve a Windows identity for "BATTLESTAR\asaxton". Verify that the web application authentication provider in SharePoint Central Administration is the default windows Negotiate or Kerberos provider. If the user does not have a valid active directory account the data source will need to be configured to use the unattended service account for the user to access this data.

Exception details:
System.InvalidOperationException: Could not retrieve a valid Windows identity. ---> System.ArgumentException: Token cannot be zero.
   at System.Security.Principal.WindowsIdentity.CreateFromToken(IntPtr userToken)
   at System.Security.Principal.WindowsIdentity..ctor(IntPtr userToken, String authType, Int32 isAuthenticated)
   at System.Security.Principal.WindowsIdentity..ctor(IntPtr userToken)
   at Microsoft.IdentityModel.WindowsTokenService.S4UClient.CallService(Func`2 contractOperation)
   at Microsoft.SharePoint.SPSecurityContext.GetWindowsIdentity()
   --- End of inner exception stack trace ---
   at Microsoft.SharePoint.SPSecurityContext.GetWindowsIdentity()
   at Microsoft.PerformancePoint.Scorecards.ServerCommon.ConnectionContextHelper.SetContext(ConnectionContext connectionContext, ICredentialProvider credentials)

This error indicates that the C2WTS Service failed to get the Windows credential. The S4UClient call is the key indicator. We reviewed the C2WTS settings, which aren’t many. I remembered that if you use a Domain User account for the C2WTS Windows Service, you have to add it to the Local Administrators group on the box that is trying to invoke it. In our case, that is the server hosting the PerformancePoint Service App. You don’t have to do this step if you leave the C2WTS service running as LocalSystem.

Once that was done, we recycled the C2WTS Windows Service and tried again. We were then presented with a different error:

Log Name:      Application
Source:        Microsoft-SharePoint Products-PerformancePoint Service
Date:          9/6/2012 12:09:42 PM
Event ID:      9
Task Category: PerformancePoint Services
Level:         Warning
Keywords:
User:          BATTLESTAR\spservice
Computer:      AdmAdama.battlestar.local
Description:
The user "BATTLESTAR\asaxton" does not have access to the following data source server.

Data source location: http://admadama:82/Data Connections for PerformancePoint/5_.000
Data source name: New Data Source 3
Server name: bspegasus\kjssas

Exception details:
Microsoft.AnalysisServices.AdomdClient.AdomdConnectionException: A connection cannot be made to redirector. Ensure that 'SQL Browser' service is running. ---> System.Net.Sockets.SocketException: The requested name is valid, but no data of the requested type was found
   at System.Net.Sockets.TcpClient..ctor(String hostname, Int32 port)
   at Microsoft.AnalysisServices.AdomdClient.XmlaClient.GetTcpClient(ConnectionInfo connectionInfo)
   --- End of inner exception stack trace ---
   at Microsoft.AnalysisServices.AdomdClient.XmlaClient.GetTcpClient(ConnectionInfo connectionInfo)
   at Microsoft.AnalysisServices.AdomdClient.XmlaClient.OpenTcpConnection(ConnectionInfo connectionInfo)
   at Microsoft.AnalysisServices.AdomdClient.XmlaClient.Connect(ConnectionInfo connectionInfo, Boolean beginSession)
   at Microsoft.AnalysisServices.AdomdClient.XmlaClient.GetInstancePort(ConnectionInfo connectionInfo)
   at Microsoft.AnalysisServices.AdomdClient.XmlaClient.GetTcpClient(ConnectionInfo connectionInfo)
   at Microsoft.AnalysisServices.AdomdClient.XmlaClient.OpenTcpConnection(ConnectionInfo connectionInfo)
   at Microsoft.AnalysisServices.AdomdClient.XmlaClient.Connect(ConnectionInfo connectionInfo, Boolean beginSession)
   at Microsoft.AnalysisServices.AdomdClient.AdomdConnection.XmlaClientProvider.Connect(Boolean toIXMLA)
   at Microsoft.AnalysisServices.AdomdClient.AdomdConnection.ConnectToXMLA(Boolean createSession, Boolean isHTTP)
   at Microsoft.AnalysisServices.AdomdClient.AdomdConnection.Open()
   at Microsoft.PerformancePoint.Scorecards.DataSourceProviders.AdomdConnectionPool`1.GetConnection(String connectionString, ConnectionContext connectionCtx, String effectiveUserName, CultureInfo culture, NewConnectionHandler newConnectionHandler, TestConnectionHandler testConnectionHandler)

Based on the error message, I first thought this might be because of SQL Browser. And I know that for SQL Browser, when we have a Named Instance, you have to add the DISCO SPNs per the following KB Article:

An SPN for the SQL Server Browser service is required when you establish a connection to a named instance of SQL Server Analysis Services or of SQL Server
http://support.microsoft.com/kb/950599

My thought was that I had to give the Claims and PerformancePoint service accounts delegation rights to the DISCO service. Based on my testing, this turned out not to be needed at all. I have this working with those SPNs in place and without the Claims/PerformancePoint service accounts having Constrained Delegation rights to the DISCO service.

After playing around with this a little more, I remembered having been told a while back that the Claims Service Account needs the “Act as part of the operating system” right in order to work correctly. My mindset was that if the account was a local admin, this wouldn’t be needed. With that right missing, I was able to reproduce the second error that the customer was hitting. This requirement is listed on page 126 of the following whitepaper.

Configuring Kerberos Authentication for Microsoft SharePoint 2010 Products
http://www.microsoft.com/en-us/download/details.aspx?id=23176

Of note, you get the “Impersonate a client after authentication” right that it lists for free. SharePoint makes the Claims Service account a member of WSS_WPG, which is a member of the IIS_IUSRS group.

The C2WTS Service Account is automatically granted the “Log on as a service” right when you start the C2WTS Service from Central Admin in the “Manage services on server” area.

The lesson learned here is that the Claims to Windows Token Service Account needs to be in the Local Administrators group. It must also have the “Act as part of the operating system” right, which you can assign within Local Policies.

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton

We have had several posts on this blog regarding the 18056 error: two from Bob Dorr (part 1 and part 2) and another from Tejas Shah. However, we still see a lot of questions about this error message, which can show up for different reasons. After those posts were published, we released the following:

FIX: Errors when a client application sends an attention signal to SQL Server 2008 or SQL Server 2008 R2
http://support.microsoft.com/kb/2543687

This fix was specific to the following message, in the case where it was caused by an Attention:

Error: 18056, Severity: 20, State: 29.
The client was unable to reuse a session with <SPID>, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

Since this was released, there has continued to be confusion over this error. The intent of the fix above was to limit the amount of noise in the ERRORLOG. It applied specifically to the 18056 State 29 that was logged when an Attention was received. The Attention is the important part here. If an Attention occurred during the reset of a connection, we would normally log that to the ERRORLOG as State 29. With this fix applied, if the Attention occurs during the reset of a connection, you should no longer see the error within the ERRORLOG. This does NOT mean that you will never see a State 29 again.

I will use this post to explain further how we handle these errors to give you a better understanding. To do that, I will expand on Bob Dorr's blog post that I linked above which lists out the states.

States

Default, 1
GetLogin1, 2
UnprotectMem1, 3
UnprotectMem2, 4
GetLogin2, 5
LoginType, 6
LoginDisabled, 7
PasswordNotMatch, 8
BadPassword, 9
BadResult, 10
FCheckSrvAccess1, 11
FCheckSrvAccess2, 12
LoginSrvPaused, 13
LoginType, 14
LoginSwitchDb, 15
LoginSessDb, 16
LoginSessLang, 17
LoginChangePwd, 18
LoginUnprotectMem, 19
RedoLoginTrace, 20
RedoLoginPause, 21
RedoLoginInitSec, 22
RedoLoginAccessCheck, 23
RedoLoginSwitchDb, 24
RedoLoginUserInst, 25
RedoLoginAttachDb, 26
RedoLoginSessDb, 27
RedoLoginSessLang, 28
RedoLoginException, 29 (Kind of generic, but you can use dm_os_ring_buffers to help track down the source, and perhaps the -y startup parameter. Think E_FAIL or General Network Error)
ReauthLoginTrace, 30
ReauthLoginPause, 31
ReauthLoginInitSec, 32
ReauthLoginAccessCheck, 33
ReauthLoginSwitchDb, 34
ReauthLoginException, 35

**** Login assignments from master ****

LoginSessDb_GetDbNameAndSetItemDomain, 36
LoginSessDb_IsNonShareLoginAllowed, 37
LoginSessDb_UseDbExplicit, 38
LoginSessDb_GetDbNameFromPath, 39
LoginSessDb_UseDbImplicit, 40 (We can cause this by changing the default database for the login at the server)
LoginSessDb_StoreDbColl, 41
LoginSessDb_SameDbColl, 42
LoginSessDb_SendLogShippingEnvChange, 43

**** Connection String Values ****

RedoLoginSessDb_GetDbNameAndSetItemDomain, 44
RedoLoginSessDb_IsNonShareLoginAllowed, 45
RedoLoginSessDb_UseDbExplicit, 46 (The database specified in the connection string, Database=XYX, no longer exists)
RedoLoginSessDb_GetDbNameFromPath, 47
RedoLoginSessDb_UseDbImplicit, 48
RedoLoginSessDb_StoreDbColl, 49
RedoLoginSessDb_SameDbColl, 50
RedoLoginSessDb_SendLogShippingEnvChange, 51

**** Common Windows API Calls ****

ImpersonateClient, 52
RevertToSelf, 53
GetTokenInfo, 54
DuplicateToken, 55
RetryProcessToken, 56
LoginChangePwdErr, 57
WinAuthOnlyErr, 58

**** New with SQL 2012 ****

DbAuthGetLogin1, 59
DbAuthUnprotectMem1, 60
DbAuthUnprotectMem2, 61
DbAuthGetLogin2, 62
DbAuthLoginType, 63
DbAuthLoginDisabled, 64
DbAuthPasswordNotMatch, 65
DbAuthBadPassword, 66
DbAuthBadResult, 67
DbAuthFCheckSrvAccess1, 68
DbAuthFCheckSrvAccess2, 69
OldHash, 70
LoginSessDb_ObtainRoutingEnvChange, 71
DbAcceptsGatewayConnOnly, 72
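When sifting through an ERRORLOG, it helps to pull the state out of each 18056 line and map it back to the table above. The following is a minimal Python sketch. It assumes the ERRORLOG line format shown earlier (`Error: 18056, Severity: 20, State: 29.`). The mapping is deliberately only a subset of the table, and `classify_18056` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Subset of the 18056 states listed above; extend as needed.
STATE_NAMES = {
    29: "RedoLoginException",
    40: "LoginSessDb_UseDbImplicit",
    46: "RedoLoginSessDb_UseDbExplicit",
}

ERROR_18056 = re.compile(r"Error: 18056, Severity: (\d+), State: (\d+)\.")


def classify_18056(line: str) -> Optional[Tuple[int, str]]:
    """Return (state, name) for an 18056 ERRORLOG line, or None otherwise."""
    match = ERROR_18056.search(line)
    if match is None:
        return None
    state = int(match.group(2))
    return state, STATE_NAMES.get(state, "not in this subset")


if __name__ == "__main__":
    print(classify_18056("spid52  Error: 18056, Severity: 20, State: 29."))
```

A State 29 result is the cue to go to dm_os_ring_buffers, as described later in this post.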

Pooled Connections

An 18056 error can only occur when we are trying to reset a pooled connection. Most applications I see these days are set up to use pooled connections; for example, a .NET application uses connection pooling by default. Pooled connections avoid some of the overhead of creating a physical hard connection.

With a pooled connection, when you close the connection in your application, the physical hard connection will stick around. When the application then goes to open a connection, using the same connection string as before, it will grab an existing connection from the pool and then reset the connection.
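The pooling behavior can be modeled in a few lines of Python. This is a toy model, not how SqlClient is implemented. Here, pools are keyed only by the exact connection string, although real pools also take the identity into account. The `needs_reset` flag stands in for the "reset connection" bit that the client sets on the next request.

```python
from collections import defaultdict
from typing import Dict, List


class PhysicalConnection:
    """Stand-in for a physical (hard) connection to the server."""

    def __init__(self, conn_id: int) -> None:
        self.conn_id = conn_id
        self.needs_reset = False  # set when the connection is handed out again


class ToyPool:
    """Idle connections are kept per exact connection string."""

    def __init__(self) -> None:
        self._idle: Dict[str, List[PhysicalConnection]] = defaultdict(list)
        self._opened = 0

    def open(self, conn_str: str) -> PhysicalConnection:
        idle = self._idle[conn_str]
        if idle:
            conn = idle.pop()
            conn.needs_reset = True   # reused: the next request carries the reset bit
            return conn
        self._opened += 1             # no idle match: create a new hard connection
        return PhysicalConnection(self._opened)

    def close(self, conn_str: str, conn: PhysicalConnection) -> None:
        # "Closing" only returns the physical connection to the pool.
        self._idle[conn_str].append(conn)


if __name__ == "__main__":
    cs = "Server=sql01;Integrated Security=true"  # hypothetical connection string
    pool = ToyPool()
    first = pool.open(cs)
    pool.close(cs, first)
    second = pool.open(cs)
    print(second is first, second.needs_reset)  # True True
```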

When a connection is reset, you will not see sp_reset_connection over the wire. You will only see the "reset connection" bit set in the TDS Packet Header.

Frame: Number = 175, Captured Frame Length = 116, MediaType = ETHERNET
+ Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[00-15-5D-4C-B9-60],SourceAddress:[00-15-5D-4C-B9-52]
+ Ipv4: Src = 10.0.0.11, Dest = 10.0.0.130, Next Protocol = TCP, Packet ID = 18133, Total IP Length = 102
+ Tcp: [Bad CheckSum]Flags=...AP..., SrcPort=59854, DstPort=1433, PayloadLen=62, Seq=4058275796 - 4058275858, Ack=1214473613, Win=509 (scale factor 0x8) = 130304
- Tds: SQLBatch, Version = 7.3 (0x730b0003), SPID = 0, PacketID = 1, Flags=...AP..., SrcPort=59854, DstPort=1433, PayloadLen=62, Seq=4058275796 - 4058275858, Ack=1214473613, Win=130304
- PacketHeader: SPID = 0, Size = 62, PacketID = 1, Window = 0
PacketType: SQLBatch, 1(0x01)
Status: End of message true, ignore event false, reset connection true, reset connection skip tran false
Length: 62 (0x3E)
SPID: 0 (0x0)
PacketID: 1 (0x1)
Window: 0 (0x0)
- TDSSqlBatchData:
+ AllHeadersData: Head Type = MARS Header
SQLText: select @@version

In the above example, we are issuing a SQL Batch on a pooled connection. Because it was a pooled connection, we have to signal that we need to reset the connection before the Batch is executed. This is done via the "reset connection" bit.
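To compare the two packets in this post, here is a small decoder for the 8-byte TDS packet header. The status bit values are from my reading of the MS-TDS specification: 0x01 end of message, 0x02 ignore, 0x08 reset connection, 0x10 reset connection skip tran. The packet-type table is only a subset, and the function name is hypothetical. The sample bytes are reconstructed from the header fields Network Monitor displays: SQLBatch, end of message and reset connection set, and Length 62.

```python
import struct
from typing import Dict

PACKET_TYPES = {0x01: "SQLBatch", 0x06: "Attention", 0x12: "PreLogin"}  # subset

STATUS_BITS = {
    0x01: "end of message",
    0x02: "ignore event",
    0x08: "reset connection",
    0x10: "reset connection skip tran",
}


def decode_tds_header(header: bytes) -> Dict[str, object]:
    """Decode Type, Status, Length, SPID, PacketID, Window (big-endian)."""
    if len(header) < 8:
        raise ValueError("a TDS header is 8 bytes")
    ptype, status, length, spid, packet_id, window = struct.unpack(">BBHHBB", header[:8])
    return {
        "type": PACKET_TYPES.get(ptype, hex(ptype)),
        "flags": [name for bit, name in STATUS_BITS.items() if status & bit],
        "length": length,
        "spid": spid,
        "packet_id": packet_id,
        "window": window,
    }


if __name__ == "__main__":
    print(decode_tds_header(bytes.fromhex("0109003e00000100")))  # frame 175
    print(decode_tds_header(bytes.fromhex("0601000800000100")))  # frame 176
```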

After the above SQLBatch is issued, the app could then turn around and issue an Attention to cancel the request. In the past, this is what resulted in the 18056 with State 29 when an Attention was involved.

Frame: Number = 176, Captured Frame Length = 62, MediaType = ETHERNET
+ Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[00-15-5D-4C-B9-60],SourceAddress:[00-15-5D-4C-B9-52]
+ Ipv4: Src = 10.0.0.11, Dest = 10.0.0.130, Next Protocol = TCP, Packet ID = 18143, Total IP Length = 48
+ Tcp: [Bad CheckSum]Flags=...AP..., SrcPort=59854, DstPort=1433, PayloadLen=8, Seq=4058275858 - 4058275866, Ack=1214473613, Win=509 (scale factor 0x8) = 130304
- Tds: Attention, Version = 7.3 (0x730b0003), SPID = 0, PacketID = 1, Flags=...AP..., SrcPort=59854, DstPort=1433, PayloadLen=8, Seq=4058275858 - 4058275866, Ack=1214473613, Win=130304
- PacketHeader: SPID = 0, Size = 8, PacketID = 1, Window = 0
PacketType: Attention, 6(0x06)
Status: End of message true, ignore event false, reset connection false, reset connection skip tran false
Length: 8 (0x8)
SPID: 0 (0x0)
PacketID: 1 (0x1)
Window: 0 (0x0)

In this case, we would still be in the process of resetting the connection, which would be a problem. Bob Dorr's Part 2 blog post linked above goes into good detail about how this actually occurs.

So, no more State 29?

The thing to realize about State 29 is that it is a generic state. It only indicates that an exception occurred while trying to redo a login on a Pooled Connection, and that none of the logic producing the more specific states listed above accounted for it. Think of it as similar to an E_FAIL or a General Network Error.

Going forward, if you have the above fix applied or are running SQL 2012 (which includes it), a State 29 will not be caused by an Attention. We no longer log the 18056 for an Attention, to avoid noise. However, if you look at dm_os_ring_buffers, you will still see the actual Attention (Error 3617).

<Record id= "3707218" type="RING_BUFFER_EXCEPTION" time="267850787"><Exception><Task address="0x52BDDC8"></Task><Error>3617</Error><Severity>25</Severity><State>23</State><UserDefined>0</UserDefined></Exception><Stack

There are things that occur in the course of resetting a login that could trigger a State 29. One example that we have seen is a Lock Timeout (1222).

In the Lock Timeout scenario, the only thing logged to the ERRORLOG was the 18056. We had to review the dm_os_ring_buffers DMV to see the Lock Timeout.

<Record id= "3707217" type="RING_BUFFER_EXCEPTION" time="267850784"><Exception><Task address="0x4676A42C8"></Task><Error>1222</Error><Severity>16</Severity><State>55</State><UserDefined>0</UserDefined></Exception><Stack

The Lock Timeout was a result of statements issuing "SET LOCK_TIMEOUT 0", which affects the connection itself. When the connection is "reset", those SET options are carried forward. The outcome then depends on timing and on whether an exclusive lock is held on something the Login logic is looking for. It could end up affecting Logins on a Pooled Connection when that connection is reused. The default lock timeout for a connection is -1 (wait indefinitely).

Now what?

If you receive a State 29, you should follow that up by looking in the dm_os_ring_buffers. You will want to look at the RING_BUFFER_EXCEPTION buffer type.

SELECT CAST(record AS XML) AS recordXML
FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type = 'RING_BUFFER_EXCEPTION'

The error that you find should help explain the condition, and/or allow you to troubleshoot the problem further. If you see 3617, then you will want to look at applying the hotfix above to prevent those messages from being logged. If you see a different error, then you may want to collect additional data (Profiler Trace, Network Trace, etc…) to assist with determining what could have led to that error.
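You can also pull the `recordXML` values into a script instead of reading them in Management Studio. A short Python sketch like the one below can then summarize them. It assumes each record is a complete `<Record>` element as returned by the query above. The samples printed in this post are truncated at `<Stack`, so the test input omits the stack. `KNOWN_ERRORS` covers only the two errors discussed in this post, and `summarize_exceptions` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

# Only the two errors discussed in this post.
KNOWN_ERRORS = {
    3617: "Attention: the client cancelled the request",
    1222: "Lock request time out period exceeded",
}


def summarize_exceptions(records: Iterable[str]) -> List[Dict[str, object]]:
    """Extract Error/Severity/State from RING_BUFFER_EXCEPTION records."""
    summary = []
    for xml_text in records:
        root = ET.fromstring(xml_text)
        exc = root.find("Exception")
        if exc is None or exc.findtext("Error") is None:
            continue  # not an exception record we can read
        error = int(exc.findtext("Error"))
        summary.append({
            "record_id": root.get("id"),
            "error": error,
            "severity": int(exc.findtext("Severity", "0")),
            "state": int(exc.findtext("State", "0")),
            "meaning": KNOWN_ERRORS.get(error, "look up this error number"),
        })
    return summary
```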

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton

I’m always amazed that issues usually come in batches. I was looped into a few cases that had the following symptoms. They were running SharePoint 2010 and Reporting Services 2012 SP1. When they went to use a data source with Windows Authentication, they were seeing the following error:

System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)

This caused me to raise an eyebrow (visions of Spock, as the new Star Trek movie opens today <g>). A lot of thoughts were floating in my head, and they all told me that this error didn’t make sense, for a bunch of reasons.

The default protocol order for connecting to SQL from a client is TCP and then Named Pipes. So, because we failed with a Named Pipes error, either something was wrong with TCP or someone had changed the protocol order (which I have never seen in a customer case, so very unlikely).
This is RS 2012, which means we are a Shared Service and rely on the Claims to Windows Token Service (C2WTS). This forces Constrained Delegation. I was pretty sure most people would not have configured delegation for the Named Pipes SQL SPN, as most people go down the TCP route. You can read more about SQL’s SPNs being protocol based here. I found some interesting things about this aspect as well, which I will cover in a later post.
This error tells me that we couldn’t establish a connection to SQL via Named Pipes. Think of this as a “Server Not Found” type error. I immediately tossed out any Kerberos/Claims related issue due to that thinking – again more on the kerb piece of this in a later post.
This is really the first time I’ve had someone hit me up with a Named Pipes connection failure from an RS/SharePoint Integration perspective ever. And I just got hit with 3 of them within the same week. Something is up.
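The second point is easier to see with the SPN strings side by side. This hypothetical Python helper follows the SQL Server 2008+ format as I understand it from the SQL Server SPN documentation. TCP uses `MSSQLSvc/<FQDN>:<port>`. Named Pipes uses `MSSQLSvc/<FQDN>:<instancename>` for a named instance and plain `MSSQLSvc/<FQDN>` for a default instance. The host name is made up.

```python
from typing import Optional


def mssql_spn(fqdn: str, protocol: str, port: int = 1433,
              instance: Optional[str] = None) -> str:
    """Return the MSSQLSvc SPN a client would request for a protocol."""
    if protocol == "tcp":
        return f"MSSQLSvc/{fqdn}:{port}"
    if protocol == "np":
        # Named Pipes is keyed on the instance name, not a port.
        return f"MSSQLSvc/{fqdn}:{instance}" if instance else f"MSSQLSvc/{fqdn}"
    raise ValueError(f"unsupported protocol: {protocol!r}")


if __name__ == "__main__":
    host = "sqlhost.battlestar.local"  # hypothetical host
    print(mssql_spn(host, "tcp"))      # MSSQLSvc/sqlhost.battlestar.local:1433
    print(mssql_spn(host, "np"))       # MSSQLSvc/sqlhost.battlestar.local
```

Because the two strings differ, Constrained Delegation configured only for the TCP SPN does not cover a Named Pipes connection.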

Because this told me we had an actual connection issue via Named Pipes, I started down the normal connectivity troubleshooting path. As with any connectivity issue, I started with a UDL (Universal Data Link) file, which is basically just a text file renamed with a .udl extension. It’s important to run this from the same machine that is hitting the SqlException. In my case, that was my SharePoint App server, not the WFE server.
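You can also script the UDL instead of creating it by hand; the Python sketch below produces an equivalent file. It assumes the usual layout of a saved .udl file: an `[oledb]` header, a comment line, then the OLE DB init string, saved as UTF-16 text. It also assumes the SQLOLEDB.1 provider with Windows authentication. Double-clicking the file opens the Data Link Properties dialog, where you can change any of it and click Test Connection. `udl_bytes`, `write_udl`, and the server name are hypothetical.

```python
from pathlib import Path

UDL_TEMPLATE = (
    "[oledb]\r\n"
    "; Everything after this line is an OLE DB initstring\r\n"
    "{init}\r\n"
)


def udl_bytes(data_source: str, provider: str = "SQLOLEDB.1") -> bytes:
    """Render a .udl file that uses Windows authentication (SSPI)."""
    init = (f"Provider={provider};Integrated Security=SSPI;"
            f"Persist Security Info=False;Data Source={data_source}")
    # UDL files are Unicode text; Python's "utf-16" codec writes a BOM.
    return UDL_TEMPLATE.format(init=init).encode("utf-16")


def write_udl(path: Path, data_source: str) -> None:
    # write_bytes avoids any newline translation on Windows.
    path.write_bytes(udl_bytes(data_source))


if __name__ == "__main__":
    # "np:" forces Named Pipes and "tcp:" forces TCP for the hypothetical server.
    write_udl(Path("np-test.udl"), "np:sqlhost")
    write_udl(Path("tcp-test.udl"), "tcp:sqlhost")
```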

You’ll notice the “np:” in front of the server name. This forces the Named Pipes protocol and ignores the default protocol order. And this worked. I also tried “tcp:” to force TCP in the UDL, and this worked too. I then went back to my data source and tried forcing TCP there.
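The effect of the prefix can be sketched in Python. This models only the behavior described here. An explicit `tcp:` or `np:` prefix pins the protocol. Otherwise the client walks its default order of TCP, then Named Pipes. `lpc:` (shared memory) is included for completeness, and the function name is hypothetical.

```python
from typing import List, Tuple

DEFAULT_ORDER = ["tcp", "np"]           # TCP first, then Named Pipes
KNOWN_PREFIXES = {"tcp", "np", "lpc"}   # lpc = shared memory


def protocols_to_try(data_source: str) -> Tuple[List[str], str]:
    """Return (protocols to attempt, server part) for a Data Source value."""
    prefix, sep, rest = data_source.partition(":")
    if sep and prefix.lower() in KNOWN_PREFIXES:
        return [prefix.lower()], rest   # prefix pins one protocol, no fallback
    return list(DEFAULT_ORDER), data_source


if __name__ == "__main__":
    print(protocols_to_try("np:sqlhost"))   # (['np'], 'sqlhost')
    print(protocols_to_try("sqlhost"))      # (['tcp', 'np'], 'sqlhost')
```

With the default order, a Named Pipes error at the end of the walk implies TCP was tried and failed first. That is why the original Named Pipes error pointed back at TCP.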

System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 0 - The requested name is valid, but no data of the requested type was found.)

This made no sense. I even made sure I was logged in as the RS Service Account, as that is the context in which we would have been connecting to SQL. Same result. Also, a network trace showed nothing on either the TCP or the Named Pipes side that related to this connection attempt. That meant we never hit the wire.

As I was going to collect some additional diagnostic logging (Kerberos ETW tracing and LSASS Logging) I ended up doing an IISRESET and a recycle of the C2WTS service. We went to reproduce the issue, but got a different error this time.

System.IO.FileLoadException: Could not load file or assembly 'System.EnterpriseServices, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. Either a required impersonation level was not provided, or the provided impersonation level is invalid. (Exception from HRESULT: 0x80070542) File name: 'System.EnterpriseServices, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' ---> System.Runtime.InteropServices.COMException (0x80070542): Either a required impersonation level was not provided, or the provided impersonation level is invalid. (Exception from HRESULT: 0x80070542)

This error I did know and can work with; I had blogged about it last July here. Checking the “Act as part of the operating system” policy showed that the C2WTS service account had in fact not been given that right. I added that account to the policy, restarted the C2WTS Windows Service, and performed an IISRESET, which yielded the following:

The connectivity errors were clearly related to the missing policy setting. That was unexpected, because the errors didn’t line up with normal connectivity issues. They also gave no hint of where to look for more information, since none of the normal paths showed anything useful.

Of note, I tried reproducing this on SharePoint 2013, but only got the FileLoadException. I think this is partly a timing issue with how the IIS AppPools and the C2WTS service are started. That doesn’t mean you can’t see this on SharePoint 2013. Even on SharePoint 2010, the first time through I hit the FileLoadException.

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton

I ran into a new Kerberos scenario that I hadn’t hit before when I was working on the cases related to this blog post. It’s rare that I actually see a case related to the Named Pipes protocol. When I do, it is usually a customer trying to get it set up with a cluster deployment. I have never had a Named Pipes case related to Kerberos. On top of that, I’ve never had a SQL-related Kerberos issue that looked like an actual network-related issue. I usually see a traditional “Login failed for user” type error from SQL Server itself.

As part of my troubleshooting for the other blog post with the Claims configuration, I stumbled upon some information and theories about how Named Pipes behaves when Kerberos is in the picture that I had never seen or dealt with before. I love it when I see new things! It is very humbling and always reminds me that there are a lot of things I don’t know. And, if you have read my other blog posts or have seen me present at conferences like PASS, you know I have a passion for Kerberos!

Here is what I saw from an error perspective using SharePoint 2013 and Reporting Services 2012 SP1.

System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server) ---> System.ComponentModel.Win32Exception: Access is denied

This is a typical error if we can’t connect to SQL. Think of it like a “Server doesn’t exist” type error. We didn’t get the normal “Login failed for user” error that would point towards Kerberos. With this error, we didn’t even make it to SQL. The interesting piece here, though, is the “Access is denied” inner exception. That could point to a permission issue.

In the last blog post, I talked about protocol order when connecting to SQL and that the default is TCP. In this case, I was forcing Named Pipes, so the fact that the error is a Named Pipes error is expected.

I dropped down to a network trace to see how far we actually got and whether it revealed any other information. One thing to keep in mind here is that we are in a Claims to Windows Token Service (C2WTS) scenario with the SharePoint/RS 2012 integration, so Kerberos/Constrained Delegation will be in the picture. A lot of people aren’t familiar with how Named Pipes actually works. From a network perspective, Named Pipes uses the SMB (Server Message Block) protocol. This is the same protocol used for file shares, and you’ll see the traffic on port 445. It can be a little confusing because SMB sits on top of TCP, but we aren’t actually using TCP port 1433; it is just a different way to connect to SQL Server. The IP 10.0.0.20 was the SharePoint Server hosting the Reporting Services Service.

300    9:04:40 AM 5/17/2013    10.0.0.20    captthrace.battlestar.local    SMB    SMB:C; Negotiate, Dialect = PC NETWORK PROGRAM 1.0, LANMAN1.0, Windows for Workgroups 3.1a, LM1.2X002, LANMAN2.1, NT LM 0.12, SMB 2.002, SMB 2.???    {SMBOverTCP:42, TCP:41, IPv4:1}
302    9:04:40 AM 5/17/2013    captthrace.battlestar.local    10.0.0.20    SMB2    SMB2:R   NEGOTIATE (0x0), Revision: (0x2ff) - SMB2 wildcard revision number., ServerGUID={97B805C2-296C-477B-82B4-DEB6170A2A01} Authentication Method: GSSAPI,     {SMBOverTCP:42, TCP:41, IPv4:1}
303    9:04:40 AM 5/17/2013    10.0.0.20    captthrace.battlestar.local    SMB2    SMB2:C   NEGOTIATE (0x0), ClientGUID= {9CB563F9-BEF4-11E2-9403-00155D4CB97B},     {SMBOverTCP:42, TCP:41, IPv4:1}
304    9:04:40 AM 5/17/2013    captthrace.battlestar.local    10.0.0.20    SMB2    SMB2:R   NEGOTIATE (0x0), Revision: (0x300) - SMB 3.0 dialect revision number., ServerGUID={97B805C2-296C-477B-82B4-DEB6170A2A01} Authentication Method: GSSAPI,     {SMBOverTCP:42, TCP:41, IPv4:1}
323    9:04:40 AM 5/17/2013    10.0.0.20    captthrace.battlestar.local    SMB2    SMB2:C   SESSION SETUP (0x1) Authentication Method: GSSAPI,     {SMBOverTCP:42, TCP:41, IPv4:1}
326    9:04:40 AM 5/17/2013    captthrace.battlestar.local    10.0.0.20    SMB2    SMB2:R - NT Status: System - Error, Code = (22) STATUS_MORE_PROCESSING_REQUIRED SESSION SETUP (0x1), SessionFlags=0x0 Authentication Method: GSSAPI,     {SMBOverTCP:42, TCP:41, IPv4:1}
327    9:04:40 AM 5/17/2013    10.0.0.20    captthrace.battlestar.local    SMB2    SMB2:C   SESSION SETUP (0x1) Authentication Method: GSSAPI,     {SMBOverTCP:42, TCP:41, IPv4:1}
     - ResponseToken: NTLM AUTHENTICATE MESSAGE Version:NTLM v2, Workstation: CAPTHELO
          Signature: NTLMSSP
328    9:04:40 AM 5/17/2013    captthrace.battlestar.local    10.0.0.20    SMB2    SMB2:R - NT Status: System - Error, Code = (34) STATUS_ACCESS_DENIED SESSION SETUP (0x1) ,     {SMBOverTCP:42, TCP:41, IPv4:1}
329    9:04:40 AM 5/17/2013    10.0.0.20    captthrace.battlestar.local    TCP    TCP:Flags=...A.R.., SrcPort=49665, DstPort=Microsoft-DS(445), PayloadLen=0, Seq=2945236632, Ack=2852397926, Win=0 (scale factor 0x8) = 0    {TCP:41, IPv4:1}
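The SMB session above is the client trying to open the SQL Server pipe on captthrace over port 445. The sketch below shows the well-known pipe names for a default instance and a named instance. Each is written as a SqlClient data source with the np: prefix, which forces the Named Pipes protocol. The helper name and the instance name SQL2012 are hypothetical. The default pipe names can also be changed in SQL Server Configuration Manager, so check what your instance actually listens on.

```python
from typing import Optional

SMB_PORT = 445  # Named Pipes traffic to a remote server travels over SMB on this TCP port


def named_pipe_data_source(server: str, instance: Optional[str] = None) -> str:
    """Hypothetical helper: build an np: data source for SQL Server's default pipe names."""
    if instance is None or instance.upper() == "MSSQLSERVER":
        pipe = r"\pipe\sql\query"  # default instance
    else:
        pipe = "\\pipe\\MSSQL$" + instance + "\\sql\\query"  # named instance
    return "np:\\\\" + server + pipe


print(named_pipe_data_source("captthrace"))             # np:\\captthrace\pipe\sql\query
print(named_pipe_data_source("captthrace", "SQL2012"))  # np:\\captthrace\pipe\MSSQL$SQL2012\sql\query
```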

In the network trace, we can see that we were trying to connect via NTLM. I already knew that would be a problem, as we have to use Kerberos here. Kerberos over Named Pipes has been supported since SQL 2008, so it should work. At this point, I’m thinking we actually have a Kerberos issue, even though the original error message looked like a network issue. So, let’s see if we can validate that. I already had Kerberos event logging enabled; these entries are located in the System Event Log. You can ignore errors that show “KDC_ERR_PREAUTH_REQUIRED”; those are just noise and expected. Also realize that errors may be cached, and if they are, you will not see them in the Event Log or a network trace. It may take an IISRESET, a restart of the C2WTS Windows Service, or even a reboot of the box to get the items to show up in the Event Log or network trace. See this blog post.

Log Name:      System
Source:        Microsoft-Windows-Security-Kerberos
Date:          5/17/2013 9:04:40 AM
Event ID:      3
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      CaptHelo.battlestar.local
Description:
A Kerberos error message was received:
on logon session
Client Time:
Server Time: 14:4:40.0000 5/17/2013 Z
Error Code: 0xd KDC_ERR_BADOPTION
Extended Error: 0xc0000225 KLIN(0)
Client Realm:
Client Name:
Server Realm: BATTLESTAR.LOCAL
Server Name: cifs/captthrace.battlestar.local
Target Name: cifs/captthrace.battlestar.local@BATTLESTAR.LOCAL
Error Text:
File: 9
Line: 12be
Error Data is in record data.
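When you have a lot of Kerberos events to sift through, the triage described earlier (skip KDC_ERR_PREAUTH_REQUIRED, look at everything else) is easy to script. Below is a minimal Python sketch. The (error code, target name) tuples are a hypothetical, already-parsed form of the events. The code-to-name table is a small subset of the Kerberos error codes defined in RFC 4120.

```python
# Subset of Kerberos error codes from RFC 4120, section 7.5.9.
KRB_ERROR_NAMES = {
    0x07: "KDC_ERR_S_PRINCIPAL_UNKNOWN",
    0x0D: "KDC_ERR_BADOPTION",
    0x19: "KDC_ERR_PREAUTH_REQUIRED",
}
NOISE = {0x19}  # PREAUTH_REQUIRED is expected during normal logons


def interesting_errors(events):
    """events: iterable of (error_code, target_name) tuples (hypothetical shape)."""
    return [
        (KRB_ERROR_NAMES.get(code, hex(code)), target)
        for code, target in events
        if code not in NOISE
    ]


sample = [
    (0x19, "krbtgt/BATTLESTAR.LOCAL"),  # made-up noise entry
    (0x0D, "cifs/captthrace.battlestar.local@BATTLESTAR.LOCAL"),  # the event above
]
print(interesting_errors(sample))
```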

This entry was the only error other than PREAUTH_REQUIRED, and two things about it were interesting. The first was KDC_ERR_BADOPTION. When I see this, especially in a Claims type configuration, it tells me we have a Constrained Delegation issue. The second was the CIFS SPN. CIFS stands for “Common Internet File System” and is used for file sharing; this was our SMB traffic. We can also see this in the network trace.

319 9:04:40 AM 5/17/2013 10.0.0.20 10.0.0.1 KerberosV5 KerberosV5:TGS Request Realm: BATTLESTAR.LOCAL Sname: cifs/captthrace.battlestar.local {TCP:44, IPv4:14}
321 9:04:40 AM 5/17/2013 10.0.0.1 10.0.0.20 KerberosV5 KerberosV5:KRB_ERROR - KDC_ERR_BADOPTION (13) {TCP:44, IPv4:14}

This was interesting, because I never gave Constrained Delegation rights to CIFS for the C2WTS account or the computer account. When we talk about SPNs, delegation, and placement, the rule is that the SPN should be on the account that is running the service. For CIFS, that is the system itself, so the SPN lives on the machine account of the SQL Server we are trying to connect to.

CIFS is one of those special service classes, similar to HTTP. It is covered by the HOST SPN on the machine account, so we won’t see an actual CIFS SPN defined, but when we go to the delegation side of things, you will see it.
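Here is a rough way to picture how the KDC finds the account for a cifs/ request when no CIFS SPN is registered. If the exact SPN isn't found, service classes that Active Directory maps to HOST (through the sPNMappings setting) are looked up as HOST/&lt;name&gt; instead. This Python model is a simplification, and the alias set is only a subset of the default mappings. The registered SPNs and account names below are illustrative; CAPTTHRACE$ is a hypothetical machine account name.

```python
# Subset of service classes that Active Directory maps to HOST by default.
HOST_ALIASES = {"cifs", "http", "www", "w3svc", "rpcss", "spooler"}


def resolve_spn(requested_spn: str, registered: dict):
    """Return the account owning requested_spn, falling back to HOST/ for aliases.

    registered maps lowercase SPNs to account names (illustrative data).
    """
    exact = registered.get(requested_spn.lower())
    if exact is not None:
        return exact
    service, _, target = requested_spn.partition("/")
    if service.lower() in HOST_ALIASES:
        return registered.get(("host/" + target).lower())
    return None


registered = {
    "host/captthrace.battlestar.local": "CAPTTHRACE$",
    "http/capthelo.battlestar.local": "BATTLESTAR\\spservice",
}
print(resolve_spn("cifs/captthrace.battlestar.local", registered))  # CAPTTHRACE$
```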

I added this to both the Claims service account and the computer account. I say computer account because the actual SMB request comes from the machine and not directly from the RS process. Under the hood, it is effectively making a call to the CreateFile Windows API.

After resetting IIS and cycling the C2WTS Service, I still saw the same exact error. This was one of those reboot moments. After rebooting the server, I then got the following:

I didn’t necessarily expect this, as I expected it to fail on the Kerberos side to SQL. So, I ran a report with a WAITFOR DELAY in it so I could see the connection, then had a look at sys.dm_exec_connections on the SQL Server and saw that we had connected with NTLM:

For our purposes this will work, as I’m not going further than SQL. This is technically a single hop between the SharePoint Server system context and the SQL Server. If you really want Kerberos as the auth_scheme, you can configure it by creating the appropriate Named Pipes SPN and configuring the appropriate delegation for the C2WTS service account and for the machine account the SMB request originates from. Also realize that if you have a misplaced Named Pipes SQL SPN, you will encounter a “Cannot Generate SSPI Context” error similar to the following:

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton

A case came up where the user was trying to use Report Builder in a Reporting Services instance that was not integrated with SharePoint. It was in Native Mode configuration. They indicated that they were getting a 401 error. My initial thought was that we were hitting a Kerberos issue. Of note, they were trying to hit a List that was in SharePoint 2013.

SharePoint 2013 sites use Claims authentication by default, so most people would probably ignore the Kerberos aspects of the SharePoint site. I was able to reproduce the issue locally because I had done the same thing.

I created the data source within Report Builder to hit my SharePoint 2013 site, http://capthelo/, and when I clicked “Test Connection” within the Data Source dialog, I got the following error.

dataextension!ReportServer_0-1!9cc!06/11/2013-14:25:58:: e ERROR: Throwing Microsoft.ReportingServices.DataExtensions.SharePointList.SPDPException: , Microsoft.ReportingServices.DataExtensions.SharePointList.SPDPException: An error occurred when accessing the specified SharePoint list. The connection string might not be valid. Verify that the connection string is correct. ---> System.Net.WebException: The request failed with HTTP status 401: Unauthorized.

This happens because when you click “Test Connection,” the connection test is actually performed on the Report Server itself, not directly from Report Builder. I blogged a while back about Report Builder and firewalls, where I talk about how some items in Report Builder try to connect directly, but “Test Connection” is not one of them.

At this point, we could ignore the error, hit OK on the Data Source dialog, and try to create a DataSet. When I go to the Query Designer, it appears to have worked. This is because the DataSets and Query Designer are coming from Report Builder itself. It is a direct web request from the Report Builder process, not the Report Server, so I don’t get an error.

However, this is misleading. It may make you believe that everything is working properly, but when you deploy and try to run the report, you will be back to the 401 error, because the request now comes from the Report Server, which goes down the same path as the original “Test Connection” error. From the DataSet/Query Designer perspective, this is a straight shot from Report Builder to SharePoint, so we can get away with an NTLM connection for the web request, and the Windows credential is valid.

From the Report Server, however, this is a double hop, and forwarding Windows credentials requires Kerberos. That holds even when your SharePoint 2013 site is configured for Claims. This actually has nothing to do with SharePoint; it has everything to do with Reporting Services. The Report Server is the one trying to delegate the Windows credential to whoever the receiving party is for the web request (or the SQL connection, if that is your data source). In this case, it is SharePoint 2013. Because Kerberos isn’t configured properly, IIS (which is hosting SharePoint) receives an anonymous credential for the web request and rejects it with a 401 error.

In my case, I was using a Domain User Account for the RS Service Account (BATTLESTAR\RSService – http://chieftyrol). It had the proper HTTP SPN on it. Also my SharePoint site was using a Domain User account for the AppPool identity within IIS (BATTLESTAR\spservice – http://capthelo) and this had the proper HTTP SPN on it.

So now I just need to verify the delegation properties for the RSService account. Because I’m using the RSService account for other things, including Claims within SharePoint 2013, I’m forced to use Constrained Delegation on this account. If you are not bound to Constrained Delegation, you could choose the option “Trust this user for delegation to any service (Kerberos Only)”, which is considered Full Trust and should correct the issue. If you are using Constrained Delegation, you have to add the service that you want to delegate to. In my case, that is my SharePoint site: http/capthelo.battlestar.local. After I added it, it looked like the following.

Then I restarted the Reporting Services Service and created the Data Source again. At that point, the “Test Connection” returned Success!
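The two delegation options on the RSService account can be summarized with a simplified Python model. It does not reflect how the KDC is implemented. It only captures the rule described above. Full trust allows delegation to any service. Constrained Delegation allows only the services listed on the account, and a request for anything else surfaces as KDC_ERR_BADOPTION. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DelegationSettings:
    """Hypothetical model of an account's delegation tab."""
    trust_for_any_service: bool = False  # "Trust this user for delegation to any service (Kerberos Only)"
    allowed_services: set = field(default_factory=set)  # constrained delegation list


def delegation_result(settings: DelegationSettings, target_spn: str) -> str:
    if settings.trust_for_any_service:
        return "OK"
    allowed = {spn.lower() for spn in settings.allowed_services}
    return "OK" if target_spn.lower() in allowed else "KDC_ERR_BADOPTION"


rsservice = DelegationSettings(allowed_services={"http/capthelo.battlestar.local"})
print(delegation_result(rsservice, "HTTP/capthelo.battlestar.local"))  # OK
print(delegation_result(DelegationSettings(), "http/capthelo.battlestar.local"))  # KDC_ERR_BADOPTION
```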

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton

TCP Chimney is enabled by default if you apply Windows Server 2003 SP2. This operating system feature provides the capability to offload TCP/IP packet processing from the processor to the network adapter, along with some other balancing options. (For a full description of this feature, see http://support.microsoft.com/kb/912222.)

TCP Chimney has been known to cause issues on SQL Server systems such as general network errors and working set trimming. The following articles document these known issues:

http://support.microsoft.com/kb/942861

http://support.microsoft.com/kb/918483

We’ve also identified situations where TCP Chimney has impacted transaction throughput and caused delays between when a statement has been completed by the SQL engine and the time to receive the begin event of the next statement. This impact can be significant especially in application workloads that have throughput requirements to execute a series of statements within a certain time boundary.

For example, your application has a key transaction that consists of multiple statements. Each individual statement on the engine side is optimized and has a very short duration. The overall duration of the transaction is short because each statement has a low duration and the time between batches is short as well. A profiler trace of this transaction typically shows a pattern like the following. Note that there is very little time between the completion of one batch and the start of the next:

However, with TCP Chimney enabled, you notice a marked delay between the completion of one batch and the start of the next, for the exact same series of statements and work. In this example, note the delay of approximately 500 ms between the completion of one batch and the start of the next:

In this scenario, with the 500 ms delay between statements, you would see the SPID spend most of its time in an AWAITING COMMAND state in sys.sysprocesses, with a waittype of 0x0000.

This type of delay can affect application throughput as well as concurrency. For example, if the above statements are all encompassed in an implicit transaction, the added delay significantly increases the overall duration of the implicit transaction, locks are held longer than normal, and you may see unexpected blocking. You can run a comparison test of the same implicit transaction on two systems, one with TCP Chimney enabled and the other with it disabled. If you compare the sum of the individual statement durations against the total duration of the entire transaction, you may see that the overall transaction duration is significantly higher when TCP Chimney is enabled. With TCP Chimney enabled, the delta between the overall transaction duration and the sum of the statement durations shows that the majority of the time is spent awaiting the next batch/command.

Here is an example comparison of the same workload with TCP Chimney enabled and disabled. All durations are in milliseconds. Note the significant increase in transaction duration and the large delta (the transaction duration minus the sum of the durations of all statements within the transaction) when TCP Chimney is enabled:

Implicit Transaction Summary TCP Chimney Enabled

spid  TransactionID  TranStart     TranEnd       TranDuration  sum_batch_duration  batch_count  delta
----  -------------  ------------  ------------  ------------  ------------------  -----------  -----
57    916972         09:40:24.450  09:41:17.623  53173         601                 516          52572
57    896243         09:39:31.620  09:40:01.840  30220         322                 301          29898
57    877227         09:39:12.120  09:39:15.293  3173          306                 161          2867
57    876313         09:38:58.590  09:38:58.603  13            0                   1            13
57    895388         09:39:18.510  09:39:18.527  16            16                  4            0
57    915675         09:40:02.653  09:40:02.670  16            16                  4            0

Implicit Transaction Summary TCP Chimney Disabled

spid  TransactionID  TranStart     TranEnd       TranDuration  sum_batch_duration  batch_count  delta
----  -------------  ------------  ------------  ------------  ------------------  -----------  -----
54    127910         11:13:47.287  11:13:52.490  5203          4060                516          1143
54    107344         11:13:23.380  11:13:24.427  1046          382                 301          664
51    87187          11:12:50.067  11:12:50.550  483           0                   1            483
54    88182          11:13:03.987  11:13:07.237  3250          2878                161          372
51    106432         11:13:10.487  11:13:10.487  0             0                   1            0
54    126550         11:13:25.490  11:13:26.007  516           516                 4            0
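The delta column is simple arithmetic, and pairing transactions with the same batch_count makes the comparison explicit. This Python sketch recomputes delta (TranDuration minus sum_batch_duration) for the three transactions that appear in both summaries. It also reports the share of each transaction spent between batches. The values are copied from the tables above; the helper names are illustrative.

```python
# (batch_count, TranDuration ms, sum_batch_duration ms) copied from the summaries above.
ENABLED = [(516, 53173, 601), (301, 30220, 322), (161, 3173, 306)]
DISABLED = [(516, 5203, 4060), (301, 1046, 382), (161, 3250, 2878)]


def delta(tran_duration: int, sum_batch_duration: int) -> int:
    """Time in the transaction not accounted for by statement execution."""
    return tran_duration - sum_batch_duration


def idle_share(tran_duration: int, sum_batch_duration: int) -> float:
    """Fraction of the transaction spent between batches (0.0 for a zero-length transaction)."""
    if tran_duration == 0:
        return 0.0
    return delta(tran_duration, sum_batch_duration) / tran_duration


for (batches, on_dur, on_sum), (_, off_dur, off_sum) in zip(ENABLED, DISABLED):
    print(
        f"{batches} batches: enabled delta={delta(on_dur, on_sum)} ms "
        f"({idle_share(on_dur, on_sum):.0%}), "
        f"disabled delta={delta(off_dur, off_sum)} ms ({idle_share(off_dur, off_sum):.0%})"
    )
```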

If you observe a similar pattern and suspect TCP Chimney, you may want to disable TCP Chimney to provide immediate relief. Another option is to follow up with your network adapter vendor to see if they have an updated driver that will address the problem and allow for use of TCP Chimney. For additional information see http://support.microsoft.com/default.aspx?scid=kb;EN-US;948496.

TCP Chimney is off by default in Windows Server 2008 - see http://support.microsoft.com/kb/951037.

Sarah Henwood | Microsoft SQL Server Escalation Services

While I was setting up one of my demos for SQL PASS, I started hitting 401.1 errors. I was building a SharePoint Integrated setup with Reporting Services.

I knew I had a distributed environment, so I accounted for my Kerberos configuration. I lined up my SPNs and made sure my accounts were trusted for delegation. So I was a little surprised to hit a 401.1 error when trying to run a report or create a new data source through the SharePoint RS library.

I was using a Domain user account for my RS Service. The key was that I configured the Service account to use the Domain user before it had started up at all. Out of the gate, I was using the Domain user account and never touched the Network Service account. This was done by way of specifying the Domain user account within the SQL 2008 Setup wizard.

An interesting side effect of doing this is that we don't add RSWindowsNegotiate to the rsreportserver.config file. All that was listed was RSWindowsNTLM. Well, that explained the 401.1 error. After I manually added RSWindowsNegotiate, everything worked like a champ.

I found that we will add RSWindowsNegotiate when we use the Network Service account. Because I hadn't used that account, the setting was never populated to the config file.

RSReportServer Configuration File

http://msdn.microsoft.com/en-us/library/ms157273.aspx

RSWindowsNegotiate
The report server accepts either Kerberos or NTLM security tokens. This is the default setting when the report server is running in native mode and the service account is Network Service. This setting is omitted when the report server is running in native mode and the service account is configured as a domain user account.
If a domain account is configured for the Report Server Service account and a Service Principal Name (SPN) is not configured for the report server, this setting might prevent users from logging on to the server.

Of note, once the setting is there, we will not remove it if you change from the Network Service account to a Domain User account.
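If you need to check or fix this on several servers, the change is small enough to script. Below is a minimal Python sketch using the standard library's ElementTree. It assumes the usual Configuration/Authentication/AuthenticationTypes layout of rsreportserver.config; the function name and the trimmed-down sample document are hypothetical. If RSWindowsNegotiate is missing, the sketch adds it ahead of RSWindowsNTLM, which is the order you would typically see when the setting is populated for you.

```python
import xml.etree.ElementTree as ET


def ensure_negotiate(config_xml: str) -> str:
    """Return config_xml with RSWindowsNegotiate present in AuthenticationTypes."""
    root = ET.fromstring(config_xml)
    auth_types = root.find("./Authentication/AuthenticationTypes")
    if auth_types is None:
        raise ValueError("no Authentication/AuthenticationTypes element found")
    if auth_types.find("RSWindowsNegotiate") is None:
        # Put Negotiate first so it is offered ahead of NTLM.
        auth_types.insert(0, ET.Element("RSWindowsNegotiate"))
    return ET.tostring(root, encoding="unicode")


sample = (
    "<Configuration><Authentication><AuthenticationTypes>"
    "<RSWindowsNTLM/>"
    "</AuthenticationTypes></Authentication></Configuration>"
)
print(ensure_negotiate(sample))
```

ElementTree drops XML comments when it parses, so for the real rsreportserver.config you may prefer to take a backup and make the edit by hand.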

Adam W. Saxton | Microsoft SQL Server Escalation Services

We ran into some interesting situations with the SQL 2005 JDBC Driver (v1.2) and its use with failover partners. Take the following connection string:

jdbc:sqlserver://myserver1;databaseName=AdventureWorks;failoverPartner=myserver2;

In this connection string, our primary server is myserver1 and our failover server is myserver2. If the primary server becomes unresponsive, we will fail over to myserver2. This connection string should work perfectly fine.

Let's look at another situation:

jdbc:sqlserver://myserver1;databaseName=AdventureWorks;failoverPartner=myserver2\instance;

In this situation, we are connecting to a named instance for the failoverPartner. Again this should work perfectly fine from a usage standpoint of failoverPartner.

Port Number used with failoverPartner

jdbc:sqlserver://myserver1;databaseName=AdventureWorks;failoverPartner=myserver2:1699;

In this case, we are connecting either to a named instance (by way of its port) or to a default instance on a non-standard port (that is, a port other than 1433). This may or may not work as expected. If we were able to connect successfully to the primary server at least once, we cache the failover connection information that the primary server supplied over that connection. In that case, the primary server actually supplies the failover partner's connection information, and we ignore what you put in the application's connection string, so we will probably connect to the failoverPartner successfully.

However, if we were not able to connect to the Primary Server at all (i.e. Primary Server is physically down or unreachable), then we will rely on the application connection string to connect to the failoverPartner. Our JDBC Driver doesn't parse the port number for the failoverPartner property. We will treat it as the actual server name. This is what we would see in the JDBC Log output:

Oct 28, 2008 10:21:22 AM com.microsoft.sqlserver.jdbc.SQLServerConnection loginWithFailover
FINE: ConnectionID:1 TransactionID:0x0000000000000000 This attempt No: 1
Oct 28, 2008 10:21:22 AM com.microsoft.sqlserver.jdbc.SQLServerConnection connectHelper
FINE: ConnectionID:1 TransactionID:0x0000000000000000 Connecting with server: myserver2:1699 port: 1433 Timeout slice: 400 Timeout Full: 5
Oct 28, 2008 10:21:22 AM com.microsoft.sqlserver.jdbc.TDSChannel open
FINE: TDSChannel ( ConnectionID:1 TransactionID:0x0000000000000000): Opening TCP socket...
Oct 28, 2008 10:21:23 AM com.microsoft.sqlserver.jdbc.SQLServerException logException
FINE: *** SQLException:com.microsoft.sqlserver.jdbc.SQLServerConnection@471e30 com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host has failed.

Notice that server equals myserver2:1699 with a port of 1433. This is because the port number that was specified in the connection string was not parsed out.
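To make the failure mode concrete, here is a small Python model of what the log shows. It is not the driver's code. It splits a jdbc:sqlserver URL into properties. Then it mimics what version 1.2 did when the primary was unreachable: it uses the failoverPartner value verbatim as the host name, with the default port 1433. The function names are hypothetical.

```python
DEFAULT_PORT = 1433


def parse_jdbc_url(url: str) -> dict:
    """Split jdbc:sqlserver://server;key=value;... into a property dictionary."""
    prefix = "jdbc:sqlserver://"
    if not url.startswith(prefix):
        raise ValueError("not a jdbc:sqlserver URL")
    server, _, rest = url[len(prefix):].partition(";")
    props = {"serverName": server}
    for pair in filter(None, rest.split(";")):  # skip the empty piece after a trailing ';'
        key, _, value = pair.partition("=")
        props[key] = value
    return props


def failover_target_v12(props: dict) -> tuple:
    """Model of the v1.2 behaviour: failoverPartner is not split into host and port."""
    return props["failoverPartner"], DEFAULT_PORT


props = parse_jdbc_url(
    "jdbc:sqlserver://myserver1;databaseName=AdventureWorks;failoverPartner=myserver2:1699;"
)
print(failover_target_v12(props))  # ('myserver2:1699', 1433), matching the log above
```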

This is the exception you will receive on the application side:

com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host has failed. java.net.UnknownHostException
        at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(Unknown Source)
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(Unknown Source)
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.loginWithFailover(Unknown Source)
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(Unknown Source)
        at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(Unknown Source)
        at java.sql.DriverManager.getConnection(Unknown Source)
        at java.sql.DriverManager.getConnection(Unknown Source)

This issue is currently not going to be changed and will still be present in the 2.0 release of the driver.

Named Instance on Primary Server when failoverPartner is specified

jdbc:sqlserver://myserver1\instance;databaseName=AdventureWorks;failoverPartner=myserver2;

Here we have a named instance for the primary server and just a default instance for the failoverPartner. Let's assume that either the primary server is physically down or the SQL Browser service on that server is not running. This will result in the following exception:

com.microsoft.sqlserver.jdbc.SQLServerException: The connection to the named instance has failed. Error: java.net.SocketTimeoutException: Receive timed out.
        at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(Unknown Source)
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.getInstancePort(Unknown Source)
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(Unknown Source)
        at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(Unknown Source)
        at java.sql.DriverManager.getConnection(Unknown Source)
        at java.sql.DriverManager.getConnection(Unknown Source)

In most situations, because we received an exception, we would try to connect again, and the driver should use the failoverPartner at this point. This is what we will see:

Notice that the loginWithFailover method is missing from the call stack. We didn't make it far enough to even attempt the connection to the failoverPartner. In this situation, we are trying to resolve the instance name to a port number. Because we are unable to communicate with the SQL Browser service (UDP 1434), we cannot perform the lookup, and we just error out at that point. To work around this issue, you can specify the port number instead of the instance name.

This is what we would see in the JDBC Log output:

Oct 28, 2008 10:39:19 AM com.microsoft.sqlserver.jdbc.SQLServerConnection getInstancePort
FINE: ConnectionID:1 TransactionID:0x0000000000000000 Unexpected UDP timeout at 1 seconds resolving instance port. Target -> udp:myserver2/10.0.0.2:1434.
Oct 28, 2008 10:39:28 AM com.microsoft.sqlserver.jdbc.SQLServerException logException
FINE: *** SQLException:com.microsoft.sqlserver.jdbc.SQLServerConnection@b09e89 com.microsoft.sqlserver.jdbc.SQLServerException: The connection to the named instance has failed. Error: java.net.SocketTimeoutException: Receive timed out. The connection to the named instance has failed. Error: java.net.SocketTimeoutException: Receive timed out.
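For context on why the lookup fails, here is a Python sketch of the request the client sends to SQL Browser on UDP 1434 and of how the TCP port is read from the reply. It is based on my understanding of the SQL Server Resolution Protocol (the CLNT_UCAST_INST request and SVR_RESP reply), simplified to ASCII names. The function names, the instance name, and the port 1699 in the sample reply are hypothetical. If nothing answers on UDP 1434, the receive simply times out, which matches the SocketTimeoutException above. Specifying the port directly skips this exchange entirely.

```python
import socket

SQL_BROWSER_PORT = 1434  # UDP


def build_instance_request(instance: str) -> bytes:
    """CLNT_UCAST_INST: 0x04 followed by the null-terminated instance name."""
    return b"\x04" + instance.encode("ascii") + b"\x00"


def parse_tcp_port(response: bytes):
    """SVR_RESP: 0x05, 2-byte little-endian size, then 'key;value;...' text."""
    if len(response) < 3 or response[0] != 0x05:
        raise ValueError("not an SVR_RESP message")
    size = int.from_bytes(response[1:3], "little")
    fields = response[3:3 + size].decode("ascii").split(";")
    pairs = dict(zip(fields[0::2], fields[1::2]))
    return int(pairs["tcp"]) if "tcp" in pairs else None


def resolve_instance_port(host: str, instance: str, timeout: float = 1.0):
    """Ask SQL Browser for the instance's TCP port; raises socket.timeout if nobody answers."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_instance_request(instance), (host, SQL_BROWSER_PORT))
        data, _ = sock.recvfrom(65535)
        return parse_tcp_port(data)


reply_text = "ServerName;MYSERVER1;InstanceName;INSTANCE;IsClustered;No;Version;10.0.1600.22;tcp;1699;;"
reply = b"\x05" + len(reply_text).to_bytes(2, "little") + reply_text.encode("ascii")
print(parse_tcp_port(reply))  # 1699
```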

This issue is actually going to be addressed in the 2.0 release of the JDBC Driver and should not be a problem.

Adam W. Saxton | Microsoft SQL Server Escalation Services

One of my colleagues, Kamil Sykora, compiled a document that goes through how to troubleshoot leaked SqlConnection objects (from a .NET 2.0 perspective). It was a fairly large document, so I’m not going to post the whole thing. I’m going to split it out over several posts and base the examples off of a custom demo that I have created.

A common issue that we often observe is "leaking" connections in a .NET application. While leaking objects is technically not possible in a .NET application, the issue that we often observe is that customers are not closing SqlConnection objects before they go out of scope. This results in unused SqlConnection objects holding on to internal references and native objects until these SqlConnection objects get collected by the Garbage Collector.

The most common symptom of this is this error message:

Exception type: System.InvalidOperationException
Message: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
InnerException: <none>
StackTrace (generated):
    SP       IP       Function
    0636F4B8 653CF486 System_Data_ni!System.Data.ProviderBase.DbConnectionFactory.GetConnection(System.Data.Common.DbConnection)+0x133f46
    0636F4C4 652D69BA System_Data_ni!System.Data.ProviderBase.DbConnectionClosed.OpenConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionFactory)+0x6a
    0636F4F8 652F5440 System_Data_ni!System.Data.SqlClient.SqlConnection.Open()+0x70

The steps to take when we see this exception are:

1) Find out how the customer is opening and closing connections, and ensure that they are explicitly closing them in all cases. If doing this is not sufficient, or it’s not 100% clear that all connections are getting closed, continue with the next steps.
2) Obtain a user dump of the process once the issue occurs. We can obtain a hang dump as soon as the exception occurs (good) or a crash dump on the exception (better).
3) Follow the debugging steps in this series to confirm whether there are any unreferenced connections that are still holding on to internal references.
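To see why unclosed connections end in that timeout, here is a toy Python model of a pool with a hard limit of 100 (the default Max Pool Size in ADO.NET). It does not reflect how the SqlClient pool is written, and the class and function names are hypothetical. One difference is worth noting. The real pool can eventually reclaim a leaked connection after the garbage collector cleans up the SqlConnection object, while this toy never does. In .NET, the equivalent of the with block below is a using block around the SqlConnection.

```python
import threading


class ToyPool:
    """Toy stand-in for a connection pool with a maximum size."""

    def __init__(self, max_pool_size: int = 100):
        self._slots = threading.BoundedSemaphore(max_pool_size)

    def open(self, timeout: float) -> "ToyConnection":
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("Timeout expired. The timeout period elapsed prior to "
                               "obtaining a connection from the pool.")
        return ToyConnection(self)

    def _release(self) -> None:
        self._slots.release()


class ToyConnection:
    def __init__(self, pool: ToyPool):
        self._pool = pool
        self._open = True

    def close(self) -> None:
        if self._open:  # closing twice must not return the slot twice
            self._open = False
            self._pool._release()

    def __enter__(self) -> "ToyConnection":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()


def successful_opens(pool: ToyPool, attempts: int, leak: bool, timeout: float = 0.01) -> int:
    """Open connections in a loop; when leak is True, never close them."""
    opened = 0
    for _ in range(attempts):
        try:
            conn = pool.open(timeout)
        except TimeoutError:
            break
        opened += 1
        if not leak:
            with conn:  # closed on exit, like a using block
                pass
    return opened


print(successful_opens(ToyPool(100), 150, leak=False))  # 150
print(successful_opens(ToyPool(100), 150, leak=True))   # 100
```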

The following debugging instructions are based on an x86 user dump. Similar steps can be taken for a 64-bit dump as noted below.

For the dumps, we used the SOS debugging extension which ships with the .NET Framework. You can load the extension in the debugger by using the following command:

0:000> .loadby sos mscorwks

Locating the pool(s)

First we find all the pool object method tables in the process.

0:000> !dumpheap -stat -type DbConnectionPool
total 26 objects
Statistics:
      MT    Count    TotalSize Class Name
65404260        1           16 System.Data.ProviderBase.DbConnectionPoolIdentity
65436c90        1           24 System.Collections.Generic.List`1[[System.Data.ProviderBase.DbConnectionPool, System.Data]]
65436598        1           24 System.Collections.Generic.List`1[[System.Data.ProviderBase.DbConnectionPoolGroup, System.Data]]
6540444c        2           24 System.Data.ProviderBase.DbConnectionPool+DbConnectionInternalListStack
65400c70        1           32 System.Data.ProviderBase.DbConnectionPoolGroupOptions
654000a4        1           40 System.Data.ProviderBase.DbConnectionPoolGroup
6543397c        1           52 System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[System.Data.ProviderBase.DbConnectionPoolGroup, System.Data]]
654044a8        1           52 System.Data.ProviderBase.DbConnectionPool+PoolWaitHandles
6543085c        1           60 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[System.Data.ProviderBase.DbConnectionPoolGroup, System.Data]][]
65404638        1           64 System.Data.ProviderBase.DbConnectionPool+TransactedConnectionPool
653fff4c        1          100 System.Data.ProviderBase.DbConnectionPool
653ffde4       14          168 System.Data.ProviderBase.DbConnectionPoolCounters+Counter
Total 26 objects
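When the output is long, picking out the right method table by eye is error prone. Several nested types (for example DbConnectionPool+PoolWaitHandles) share the same prefix. This Python sketch is a hypothetical helper rather than anything in SOS. It matches the class name exactly in !dumpheap -stat output; the sample is an excerpt of the output above.

```python
SAMPLE = """\
      MT    Count    TotalSize Class Name
654044a8        1           52 System.Data.ProviderBase.DbConnectionPool+PoolWaitHandles
654000a4        1           40 System.Data.ProviderBase.DbConnectionPoolGroup
653fff4c        1          100 System.Data.ProviderBase.DbConnectionPool
Total 26 objects
"""


def find_method_table(stat_output: str, class_name: str):
    """Return the MT column for an exact class-name match in !dumpheap -stat output."""
    for line in stat_output.splitlines():
        parts = line.split(None, 3)  # MT, Count, TotalSize, Class Name
        if len(parts) == 4 and parts[3] == class_name:
            return parts[0]
    return None


print(find_method_table(SAMPLE, "System.Data.ProviderBase.DbConnectionPool"))  # 653fff4c
```

The MT it returns is the value you pass to !dumpheap -mt in the next step.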

Then we dump out the individual pool objects. In this case, there is only one pool. We dump out the pool and look at the _totalObjects member variable to see how many objects are in that pool. Note that in the case below, we have at least one pool with 100 connections, which is the default maximum number of connections in a pool. We also dump out the _connectionPoolGroupOptions variable to double-check that _maxPoolSize has been reached.

0:000> !dumpheap -mt 653fff4c
Address       MT     Size
012bbe80 653fff4c      100
total 1 objects
Statistics:
      MT    Count    TotalSize Class Name
653fff4c        1          100 System.Data.ProviderBase.DbConnectionPool
Total 1 objects

0:000> !do 012bbe80
Name: System.Data.ProviderBase.DbConnectionPool
MethodTable: 653fff4c
EEClass: 653ffedc
Size: 100(0x64) bytes
(C:\WINNT\assembly\GAC_32\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
79102290 4001517       44         System.Int32 1 instance   220000 _cleanupWait
65404260 4001518        4 ...ctionPoolIdentity 0 instance 012bd960 _identity
6540012c 4001519        8 ...ConnectionFactory 0 instance 01275c34 _connectionFactory
654000a4 400151a        c ...nnectionPoolGroup 0 instance 01279e7c _connectionPoolGroup
65400c70 400151b       10 ...nPoolGroupOptions 0 instance 01279e5c _connectionPoolGroupOptions
65426f4c 400151c       14 ...nPoolProviderInfo 0 instance 00000000 _connectionPoolProviderInfo
65426eac 400151d       48         System.Int32 1 instance        1 _state
6540444c 400151e       18 ...InternalListStack 0 instance 012bbee4 _stackOld
6540444c 400151f       1c ...InternalListStack 0 instance 012bbef0 _stackNew
791186fc 4001520       20 ...ding.WaitCallback 0 instance 012bc348 _poolCreateRequest
791087cc 4001521       24 ...Collections.Queue 0 instance 00000000 _deactivateQueue
791186fc 4001522       28 ...ding.WaitCallback 0 instance 00000000 _deactivateCallback
79102290 4001523       4c         System.Int32 1 instance       32 _waitCount
654044a8 4001524       2c ...l+PoolWaitHandles 0 instance 012bbf80 _waitHandles
790fdf04 4001525       30     System.Exception 0 instance 00000000 _resError
7910be50 4001526       5c       System.Boolean 1 instance        0 _errorOccurred
79102290 4001527       50         System.Int32 1 instance     5000 _errorWait
791127fc 4001528       34 ...m.Threading.Timer 0 instance 00000000 _errorTimer
791127fc 4001529       38 ...m.Threading.Timer 0 instance 012bc4c0 _cleanupTimer
65404638 400152a       3c ...tedConnectionPool 0 instance 012bc16c _transactedConnectionPool
00000000 400152b       40                       0 instance 012bbfb4 _objectList
79102290 400152c       54         System.Int32 1 instance      100 _totalObjects
79102290 400152e       58         System.Int32 1 instance        2 _objectID
791080f0 4001516      5fc        System.Random 0   static 012bd9c0 _random
79102290 400152d      828         System.Int32 1   static        2 _objectTypeCount

Here is the DbConnectionPoolGroupOptions object that we can get _maxPoolSize from:

0:000> !do 01279e5c
Name: System.Data.ProviderBase.DbConnectionPoolGroupOptions
MethodTable: 65400c70
EEClass: 6544cb58
Size: 32(0x20) bytes
(C:\WINNT\assembly\GAC_32\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
7910be50 4001573       10       System.Boolean 1 instance        0 _poolByIdentity
79102290 4001574        4         System.Int32 1 instance        0 _minPoolSize
79102290 4001575        8         System.Int32 1 instance      100 _maxPoolSize
79102290 4001576        c         System.Int32 1 instance    15000 _creationTimeout
7911228c 4001577       14      System.TimeSpan 1 instance 01279e70 _loadBalanceTimeout
7910be50 4001578       11       System.Boolean 1 instance        1 _hasTransactionAffinity
7910be50 4001579       12       System.Boolean 1 instance        0 _useDeactivateQueue
7910be50 400157a       13       System.Boolean 1 instance        0 _useLoadBalancing

At this point we have found that our pool has 100 connections and a max pool size of 100. This means that any new connection request to this pool will return the above-mentioned error message. This is the immediate cause of the error message, and we do not have to spend time looking for other potential causes, such as physical connectivity problems.
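The comparison we just did by eye can also be scripted against saved !do output. Here is a minimal sketch. It assumes each field row ends with the Attr, Value and Name columns, as in the dumps above, and that !do prints Int32 values in decimal, as it does for _cleanupWait and _creationTimeout.

```python
from typing import Dict


def parse_fields(do_output: str) -> Dict[str, str]:
    """Map field name -> displayed value for the field rows of !do output."""
    fields = {}
    for line in do_output.splitlines():
        tokens = line.split()
        # Field rows end with: <Attr> <Value> <Name>, where Attr is instance or static.
        # This skips the column header and rows like "shared static _emptyArray".
        if len(tokens) >= 3 and tokens[-3] in ("instance", "static"):
            fields[tokens[-1]] = tokens[-2]
    return fields


def pool_is_exhausted(pool_do: str, options_do: str) -> bool:
    """True when _totalObjects on the pool has reached _maxPoolSize on its options."""
    total = int(parse_fields(pool_do)["_totalObjects"])
    max_size = int(parse_fields(options_do)["_maxPoolSize"])
    return total >= max_size


# Rows copied from the two !do outputs above.
POOL_FIELDS = """\
79102290 400152c       54         System.Int32 1 instance      100 _totalObjects
79102290 4001523       4c         System.Int32 1 instance       32 _waitCount
"""
OPTIONS_FIELDS = """\
79102290 4001574        4         System.Int32 1 instance        0 _minPoolSize
79102290 4001575        8         System.Int32 1 instance      100 _maxPoolSize
"""

print(pool_is_exhausted(POOL_FIELDS, OPTIONS_FIELDS))  # True
```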

Next time, we will go into the internal connection object.

Adam W. Saxton | Microsoft SQL Server Escalation Services

In the last post in this series, we looked at how we can determine that our Connection pool was exhausted. In this post I'm going to go a little deeper into the Internal connection itself and how we can verify if this is a closed or active connection.

Dumping out the internal connection objects

A connection object in the System.Data.SqlClient namespace consists of two parts:

The SqlConnection class that is used by customers’ code
The SqlInternalConnectionTds internal class that is used by the pooling code. This class is not directly accessible to the user.

The SqlConnection class has a pointer to a SqlInternalConnectionTds object if it’s open (_innerConnection member variable). The _innerConnection member variable is null if the connection is closed. Whenever a connection is closed by the code, the internal object gets disassociated from the external object and the ownership of the internal object transfers to the pool object. This relationship allows us to identify SqlConnection objects that have not been closed.

The SqlInternalConnectionTds object has a weak reference back to the owning SqlConnection object.

Since there are typically multiple pools and not all of them are full, we want to start with the internal objects that we know belong to a full pool.

Going back to the pool in question, let's dump out the items within this pool.

0:000> !do 012bbe80
Name: System.Data.ProviderBase.DbConnectionPool
...
00000000 400152b 40 0 instance 012bbfb4 _objectList
79102290 400152c 54 System.Int32 1 instance 100 _totalObjects
...

0:000> !do 012bbfb4
Name: System.Collections.Generic.List`1[[System.Data.ProviderBase.DbConnectionInternal, System.Data]]
MethodTable: 654413c4
EEClass: 7912f680
Size: 24(0x18) bytes
(C:\WINNT\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
7912d8f8 40009c7        4      System.Object[] 0 instance 012bbfcc _items
79102290 40009c8        c         System.Int32 1 instance      100 _size
79102290 40009c9       10         System.Int32 1 instance      100 _version
790fd0f0 40009ca        8        System.Object 0 instance 00000000 _syncRoot
7912d8f8 40009cb        0      System.Object[] 0   shared   static _emptyArray
    >> Domain:Value dynamic statics NYI
00155858:NotInit <<

0:000> !da 012bbfcc
Name: System.Data.ProviderBase.DbConnectionInternal[]
MethodTable: 7912d8f8
EEClass: 7912de6c
Size: 416(0x1a0) bytes
Array: Rank 1, Number of elements 100, Type CLASS
Element Methodtable: 654009f0
[0] 012be414
[1] 012bf3e4
[2] 012bf008
...
[98] 0148114c
[99] 01485fcc

At this point we want to save all 100 internal connection addresses into a file and remove the array indexes so that the file only contains:

012be414
012bf3e4
012bf008
...
0148114c
01485fcc

Visual Studio is handy for this: hold Alt while dragging the mouse to select the first 3-4 columns (the index prefixes) as a block, delete them, then save the file.
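If you would rather not edit the file by hand, a short Python script can strip the indexes instead. This sketch assumes the element rows look exactly like the !da output above. The file path in the usage comment is the one the .foreach command later in this post reads.

```python
import re
from pathlib import Path
from typing import List

# Element rows of !da output look like "[0] 012be414"; capture only the address.
ELEMENT_ROW = re.compile(r"^\s*\[\d+\]\s+([0-9a-fA-F]+)\s*$")


def extract_addresses(da_output: str) -> List[str]:
    """Return the element addresses from !da output, in order."""
    addresses = []
    for line in da_output.splitlines():
        match = ELEMENT_ROW.match(line)
        if match:
            addresses.append(match.group(1))
    return addresses


def save_addresses(addresses: List[str], path: str) -> None:
    """Write one address per line, which .foreach /f reads as separate tokens."""
    Path(path).write_text("\n".join(addresses) + "\n")


SAMPLE = """\
Array: Rank 1, Number of elements 100, Type CLASS
Element Methodtable: 654009f0
[0] 012be414
[1] 012bf3e4
[99] 01485fcc
"""

print(extract_addresses(SAMPLE))  # ['012be414', '012bf3e4', '01485fcc']
# save_addresses(extract_addresses(text), r"c:\temp\InternalConnections.txt")
```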

Processing the internal connections

The goal at this point is to start from these SqlInternalConnectionTds objects and find any SqlConnection objects that are no longer referenced. If a SqlConnection still references its SqlInternalConnectionTds but cannot be reached through !gcroot, the code abandoned it without closing it.

Using .foreach to dump out the connections is easiest, since it avoids the manual work of processing each of the 100 connections in question:

.foreach /f ( place "c:\temp\InternalConnections.txt") { dd poi(poi( place +4)+4) l1}
(32 bit)

.foreach /f ( place "c:\temp\InternalConnections.txt") { dq poi(poi( place +8)+8) l1}
(64 bit)

Explanation of the .foreach command:

place – our placeholder, or variable name, that represents each of the addresses in the file
poi() – dereferences an address, returning the pointer-sized value stored there
dd – dumps out a double word (the 32-bit address); this would be dq (quad word) in a 64-bit dump
l1 – limits the output to a single element
place + 8 – the weak reference is at offset 8 from the SqlInternalConnectionTds (64 bit) or at offset 4 (32 bit):

0:000> !do 012be414
Name: System.Data.SqlClient.SqlInternalConnectionTds
MethodTable: 65404744
EEClass: 6544d9e0
Size: 140(0x8c) bytes
(C:\WINNT\assembly\GAC_32\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
79102290 4000f67       1c         System.Int32 1 instance        4 _objectID
...
79104c38 4000f6d        4 System.WeakReference 0 instance 012be55c _owningObject
...

The WeakReference object holds its handle (m_handle) at offset 8 (64 bit) or offset 4 (32 bit). This is the second +8 (or +4) in the command:

0:000> !do 012be55c
Name: System.WeakReference
MethodTable: 79104c38
EEClass: 79104bd4
Size: 16(0x10) bytes
(C:\WINNT\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
791016bc 40005a9        4        System.IntPtr 1 instance   3f1268 m_handle
7910be50 40005aa        8       System.Boolean 1 instance        0 m_IsLongReference

The value at that location is the owning object if it exists.

Non-Null and has an owning object:

0:000> dd 3f1268 l1
003f1268 01575138

Null and no owning object:

0:000> dd 3f1268 l1
003f1268 00000000
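To make the double dereference concrete, here is a toy Python model of what poi(poi(place+4)+4) does on a 32-bit dump. Memory is a plain dictionary seeded with the addresses from the !do and dd output above. It does not read a real dump, and the 64-bit offsets (ptr_size=8) come from the text rather than from a dump shown here.

```python
from typing import Dict


def poi(memory: Dict[int, int], address: int) -> int:
    """Mimic the debugger's poi(): return the pointer-sized value stored at address."""
    return memory[address]


def owning_object(memory: Dict[int, int], internal_conn: int, ptr_size: int = 4) -> int:
    """Follow SqlInternalConnectionTds -> WeakReference -> handle -> target object."""
    weak_ref = poi(memory, internal_conn + ptr_size)  # _owningObject field
    handle = poi(memory, weak_ref + ptr_size)         # WeakReference.m_handle
    return poi(memory, handle)                        # the value "dd <handle> l1" prints


# Values taken from the 32-bit !do and dd output above.
MEMORY = {
    0x012BE414 + 4: 0x012BE55C,  # _owningObject -> WeakReference
    0x012BE55C + 4: 0x003F1268,  # m_handle
    0x003F1268: 0x01575138,      # handle slot -> owning SqlConnection
}

print(hex(owning_object(MEMORY, 0x012BE414)))  # 0x1575138
```

A result of 0 corresponds to the "Null and no owning object" case.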

Output of the foreach command:

0:000> .foreach /f ( place "c:\temp\InternalConnections.txt") { dd poi(poi( place +4)+4) l1}
003f1268 01575138
003f127c 0157336c
003f1290 0157136c
003f1298 0156f138
003f1244 015809fc
...
003f2d34 014ac514
003f2d2c 014acbf4
003f2d1c 014ac7d4
003f2d14 015817cc

As we can see, the internal connections have an owning SqlConnection object. This either means that they are actively being used by the code (not likely) or they have been abandoned (more likely).
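Picking the non-null owners out of that list by hand is tedious for 100 entries. The following sketch turns saved .foreach output into !gcroot commands. It assumes each useful line is a handle/value pair as above. The null line in the sample is the example shown earlier, not part of this output.

```python
from typing import List


def gcroot_commands(foreach_output: str) -> List[str]:
    """Build one !gcroot command per non-null owning object in the .foreach output."""
    commands = []
    for line in foreach_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip "...", prompts, and anything that is not "<handle> <value>"
        try:
            int(parts[0], 16)
            target = int(parts[1], 16)
        except ValueError:
            continue
        if target != 0:  # a null value means the weak reference has no owner
            commands.append("!gcroot " + parts[1])
    return commands


SAMPLE = """\
003f1268 01575138
003f127c 0157336c
...
003f1268 00000000
"""

for command in gcroot_commands(SAMPLE):
    print(command)
```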

Finding out if a connection is actively used

To find out if a SqlConnection is still being used by the code, we can run the !gcroot command. This command tells us whether the object is reachable from a GC root. If it is not, the object is eligible for garbage collection.

0:000> !gcroot 0157336c
Note: Roots found on stacks may be false positives. Run "!help gcroot" for
more info.
Scan Thread 0 OSTHread 590
DOMAIN(00155858):HANDLE(WeakSh):3f127c:Root:0157336c(System.Data.SqlClient.SqlConnection)

At this point in the application, we only have one thread running which is thread ID 0.

Here the output indicates that the object is reachable from thread 0. However, this can be a false positive because thread references can be old. We still have to verify that the object actually exists on that thread:

0:000> kL
ChildEBP RetAddr
0012f31c 7739bf53 ntdll!KiFastSystemCallRet
0012f3b8 7b0831a5 user32!NtUserWaitMessage+0xc
0012f434 7b082fe3 System_Windows_Forms_ni+0xb31a5
0012f464 7b0692c2 System_Windows_Forms_ni+0xb2fe3
0012f490 79e7c6cc System_Windows_Forms_ni+0x992c2
0012f510 79e7c8e1 mscorwks!CallDescrWorkerWithHandler+0xa3
0012f64c 79e7c783 mscorwks!MethodDesc::CallDescr+0x19c
0012f668 79e7c90d mscorwks!MethodDesc::CallTargetWorker+0x1f
0012f67c 79eefb9e mscorwks!MethodDescCallSite::Call_RetArgSlot+0x18
0012f7e0 79eef830 mscorwks!ClassLoader::RunMain+0x263
0012fa48 79ef01da mscorwks!Assembly::ExecuteMainMethod+0xa6
0012ff18 79fb9793 mscorwks!SystemDomain::ExecuteMainMethod+0x43f
0012ff68 79fb96df mscorwks!ExecuteEXE+0x59
0012ffb0 7900b1b3 mscorwks!_CorExeMain+0x15c
0012ffc0 77e6f23b mscoree!_CorExeMain+0x2c
0012fff0 00000000 kernel32!BaseProcessStart+0x23

We can see that we have managed code on this thread. Let's look at what the managed stack looks like:

0:000> !clrstack
OS Thread Id: 0x590 (0)
ESP EIP
0012f32c 7c8285ec [InlinedCallFrame: 0012f32c] System.Windows.Forms.UnsafeNativeMethods.WaitMessage()
0012f328 7b08374f System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(Int32, Int32, Int32)
0012f3c8 7b0831a5 System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
0012f440 7b082fe3 System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
0012f470 7b0692c2 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
0012f480 00e70097 SqlConnectionLeakWin.Program.Main()
0012f69c 79e7c74b [GCFrame: 0012f69c]

The thread doesn't appear to be doing anything with SQL here. Let's look at the objects on the stack:

0:000> !dso
OS Thread Id: 0x590 (0)
ESP/REG Object   Name
ebx      01253384 System.Windows.Forms.Application+ThreadContext
esi      015cc2e8 System.Collections.Hashtable+HashtableEnumerator
0012f354 01299fc4 System.Windows.Forms.NativeMethods+MSG[]
0012f358 01253384 System.Windows.Forms.Application+ThreadContext
0012f360 01299ad8 System.Windows.Forms.Application+ComponentManager
0012f3d8 01253384 System.Windows.Forms.Application+ThreadContext
0012f42c 01253384 System.Windows.Forms.Application+ThreadContext
0012f43c 01296b84 System.Windows.Forms.ApplicationContext
0012f444 0127fe4c System.ComponentModel.EventHandlerList
0012f458 01252a8c SqlConnectionLeakWin.Form1
0012f460 01253384 System.Windows.Forms.Application+ThreadContext
0012f474 01252a8c SqlConnectionLeakWin.Form1

None of the objects on the stack is a SqlConnection, so we can conclude that this SqlConnection object is no longer being used and has not been closed. This proves that the application's code did not close all of its connections, and the code needs further investigation to find where connections are left open.

Reference:

Part 1

Adam W. Saxton | Microsoft SQL Server Escalation Services

We get a lot of calls related to Kerberos configuration, and I'm planning to write more about our experiences and troubleshooting techniques for these types of issues across the box (Engine, AS and RS).

In Windows 2000/2003, SetSPN had only a few switches associated with it.

Switches:
-R = reset HOST ServicePrincipalName
Usage:   setspn -R computername
-A = add arbitrary SPN
Usage:   setspn -A SPN computername
-D = delete arbitrary SPN
Usage:   setspn -D SPN computername
-L = list registered SPNs
Usage:   setspn [-L] computername

Another problem was that SetSPN was part of the Resource Kit and did not ship with the OS.

This has changed in Windows 2008. SetSPN is now part of the OS from the moment you install it, and its capabilities have improved, most notably the ability to look for duplicate SPNs. In the past I have used numerous tools to look for duplicate SPNs, ranging from DHDiag (an internal CSS tool that uses LDIFDE) to queryspn.vbs to DelegConfig.

Here are the new switches for SetSPN that ships with Windows 2008:

Modifiers:
-F = perform the duplicate checking on forestwide level
-P = do not show progress (useful for redirecting output to file)

Switches:
-R = reset HOST ServicePrincipalName
Usage:   setspn -R computername
-A = add arbitrary SPN
Usage:   setspn -A SPN computername
-S = add arbitrary SPN after verifying no duplicates exist
Usage:   setspn -S SPN computername
-D = delete arbitrary SPN
Usage:   setspn -D SPN computername
-L = list registered SPNs
Usage:   setspn [-L] computername
-Q = query for existence of SPN
Usage:   setspn -Q SPN
-X = search for duplicate SPNs
Usage:   setspn -X

The Q switch is the really nice feature here. It allows you to see whether an SPN is already registered in your domain. You can also combine it with the F modifier to search the whole forest.

C:\>setspn -q MSSQLSvc/mymachine:1433

No such SPN found.

C:\>setspn -q MSSQLSvc/mymachine.mydomain.com:1433
CN=MYMACHINE,OU=Workstations,DC=mydomain,DC=com
        MSSQLSvc/mymachine.mydomain.com:1433
        HOST/MYMACHINE
        HOST/MYMACHINE.MYDOMAIN.COM

Existing SPN found!
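Conceptually, a duplicate check like -X comes down to grouping SPNs by value and flagging any SPN held by more than one account. The sketch below shows only that idea and is not how SetSPN is implemented. The input mapping and the second account are hypothetical, and it assumes SPN comparison is case-insensitive.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicate_spns(registrations: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """registrations maps an account DN to its SPNs; return SPNs owned by 2+ accounts."""
    owners = defaultdict(set)
    for account, spns in registrations.items():
        for spn in spns:
            owners[spn.lower()].add(account)  # case-insensitive comparison (assumption)
    return {spn: sorted(accounts) for spn, accounts in owners.items() if len(accounts) > 1}


REGS = {
    # From the setspn -q output above.
    "CN=MYMACHINE,OU=Workstations,DC=mydomain,DC=com": [
        "MSSQLSvc/mymachine.mydomain.com:1433",
        "HOST/MYMACHINE",
        "HOST/MYMACHINE.MYDOMAIN.COM",
    ],
    # Hypothetical service account that registered the same SPN.
    "CN=SQL Service,OU=Service Account,DC=mydomain,DC=com": [
        "MSSQLSvc/MYMACHINE.mydomain.com:1433",
    ],
}

print(find_duplicate_spns(REGS))
```

A duplicate like this is exactly what -S is meant to prevent before the SPN is added.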

This is just another thing that will make Kerberos configuration/troubleshooting easier for users.

Adam W. Saxton | Microsoft SQL Server Escalation Services

I tend to get quite a few Kerberos-related cases. They span the whole box, from the Engine, to Reporting Services, to straight connectivity with custom applications. I had one handed to me yesterday because the engineer had gone through everything we normally go through and wasn’t getting anywhere.

The situation was an 8-node cluster with multiple instances across the nodes. These were running Windows 2008 with SQL 2008. One node in particular was having an issue when they issued a Linked Server query from a remote client.

When trying to hit the linked server from within Management Studio on the client machine, we received the following message:

Msg 18456, Level 14, State 1, Line 1
Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'

Kerberos Configuration:

When we see this type of error, it is typically Kerberos related: the service we are using (ServerA) is unable to delegate the client’s credentials to the backend server (ServerB, the linked server). The first thing we do is go through our regular Kerberos checklist: SPNs and delegation settings. Both SQL Servers were using the same domain user service account (SNEAKERNET\SQLSvc). We can use SetSPN to check which SPNs are on that account. NOTE: There are numerous ways to look for SPNs, but SetSPN is one of the easier command line tools available. You could also use LDIFDE (http://support.microsoft.com/kb/237677), ADSIEdit (http://technet.microsoft.com/en-us/library/cc773354(WS.10).aspx) and other tools. You will also see us use an in-house tool called DHDiag to collect SPNs. This is just a wrapper that calls LDIFDE to output the results.

So, here are the SetSPN results:

C:\Users\Administrator>setspn -l sqlsvc
Registered ServicePrincipalNames for CN=SQL Service,OU=Service Account,DC=sneakernet,DC=local:
        MSSQLSvc/SQL02:26445
        MSSQLSvc/SQL02.sneakernet.local:26445
        MSSQLSvc/SQL01.sneakernet.local:14556
        MSSQLSvc/SQL01:14556

Why do we see SQL01 and SQL02 when our machine names are ServerA and ServerB? Because SQL01 and SQL02 are the virtual names for the cluster. A virtual name moves to whichever node is active for that instance, whereas ServerA and ServerB are the physical machine names and may or may not actually be hosting that instance. We can also see that we have two distinct instances because of the ports (14556 & 26445). Some of our documentation (e.g. http://msdn.microsoft.com/en-us/library/ms189585(SQL.90).aspx) indicates that for clusters, you also need to add a SQL SPN that does not include the port number. I have yet to see a case where this is actually needed; none of the clusters I’ve seen have had one. If it is needed, you will typically receive a KDC_ERR_S_PRINCIPAL_UNKNOWN error when Kerberos event logging is enabled. If you do see that error and it lists the portless SPN, go ahead and add it. But, from my experience, you won’t see it.
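Checking that both the short-name and FQDN forms of the port-based SPN exist for an instance is easy to automate. This sketch compares a setspn -l listing against those two forms. Here, host is the virtual cluster name (SQL01/SQL02), not the physical node. The portless SPN is deliberately left out, per the experience described above, and the case-insensitive comparison is an assumption.

```python
from typing import List


def expected_sql_spns(host: str, domain: str, port: int) -> List[str]:
    """Port-based MSSQLSvc SPNs for one TCP instance: short name and FQDN."""
    return [f"MSSQLSvc/{host}:{port}", f"MSSQLSvc/{host}.{domain}:{port}"]


def missing_spns(registered: List[str], host: str, domain: str, port: int) -> List[str]:
    """Return the expected SPNs that are not in the registered list."""
    have = {spn.lower() for spn in registered}
    return [spn for spn in expected_sql_spns(host, domain, port) if spn.lower() not in have]


# Copied from the setspn -l sqlsvc output above.
REGISTERED = [
    "MSSQLSvc/SQL02:26445",
    "MSSQLSvc/SQL02.sneakernet.local:26445",
    "MSSQLSvc/SQL01.sneakernet.local:14556",
    "MSSQLSvc/SQL01:14556",
]

print(missing_spns(REGISTERED, "SQL01", "sneakernet.local", 14556))  # []
```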

Ok, our SPNs look good. Let's look at our delegation settings. In this case we really care about the SQL Service account, because that is the context that will be performing the delegation.

We can do this by going to the properties for that account within Active Directory Users and Computers. You will see a Delegation tab on the account. If you don’t see the delegation tab, then the account does not have an SPN attached to it. In this case we have “Trust this user for delegation to any service (Kerberos only)”. This is what I call Full or Open Delegation as opposed to Constrained Delegation (which is more secure). We are good to go here. Nine times out of ten, the SPN or Delegation setting is going to be the cause of your issue. In this case it isn’t. What can we do now?

Kerberos Event Logging and Network Traces:

We can enable Kerberos Event Logging (http://support.microsoft.com/default.aspx?scid=kb;EN-US;262177) which will give us errors within the System Log for Kerberos. This can sometimes be very helpful in diagnosing what may or may not be happening. This produced the following results on ServerA:

Error Code: 0x1b Unknown Error
Error Code: 0x19 KDC_ERR_PREAUTH_REQUIRED
And KDC_ERR_BADOPTION

These are not uncommon, and when we looked at them, they didn’t relate to our issue, so they gave us nothing. Notably, a linked server query from ServerB to ServerA worked, and it also produced the same events listed above. So, nothing to gain here.

The next thing we can look at is a network trace, as this will show us the communication between the service in question and the domain controller. I usually end up at this level if the SPNs and delegation settings check out. This is really where some customers can have issues, because these traces are typically hard to interpret and will require a call to CSS. We grabbed a trace in the failing and the working conditions to see what was different. We saw the following:

Failing:
525355 2009-06-30 15:55:39.468865 10.0.0.90 10.0.0.10 KRB5 TGS-REQ
KDC_REQ_BODY
KDCOptions: 40810000 (Forwardable, Renewable, Canonicalize)
Realm: SNEAKERNET.LOCAL
Server Name (Enterprise Name): ServerA$@SNEAKERNET.LOCAL

Working:
353115 23.437037 10.0.0.20 10.0.0.11 KRB5 TGS-REQ
KDC_REQ_BODY
KDCOptions: 40810000 (Forwardable, Renewable, Canonicalize)
Realm: SNEAKERNET.LOCAL
Server Name (Service and Instance): MSSQLSvc/SQL02.sneakernet.local:26445

You’ll notice that we are hitting different DCs here, but that wasn’t the issue, as we also saw the failing one hitting different DCs as we continued. The other difference is that the working one requested the right SPN, whereas the failing one requested the physical machine account (ServerA$). This was forcing us into NTLM and causing the Login failed error. But why was that happening? So far we had zero information indicating what could be causing it.

SSPIClient:

We then used an internal tool called SSPIClient, which calls the InitializeSecurityContext API directly; that is the API we use for impersonation. This tool allowed us to take SQL Server out of the picture and focus on the Kerberos issue directly. We could see that we were falling back to NTLM, which confirmed what we saw in the network trace.

2009-07-01 16:34:24.577 ENTER InitializeSecurityContextA
2009-07-01 16:34:24.577 phCredential              = 0x0090936c
2009-07-01 16:34:24.577 phContext                 = 0x00000000
2009-07-01 16:34:24.577 pszTargetName             = 'MSSQLSvc/SQL02.sneakernet.local:26445'
2009-07-01 16:34:24.577 fContextReq               = 0x00000003 ISC_REQ_DELEGATE|ISC_REQ_MUTUAL_AUTH
2009-07-01 16:34:24.577 TargetDataRep             = 16
2009-07-01 16:34:24.577 pInput                    = 0x00000000
2009-07-01 16:34:24.577 phNewContext              = 0x0090937c
2009-07-01 16:34:24.577 pOutput                   = 0x0017d468
2009-07-01 16:34:24.577 pOutput->ulVersion        = 0
2009-07-01 16:34:24.577 pOutput->cBuffers         = 1
2009-07-01 16:34:24.577 pBuffers[00].cbBuffer   = 52
2009-07-01 16:34:24.577 pBuffers[00].BufferType = 2 SECBUFFER_TOKEN
2009-07-01 16:34:24.577 pBuffers[00].pvBuffer   = 0x02c99f90
2009-07-01 16:34:24.578 02c99f90 4e 54 4c 4d 53 53 50 00 01 00 00 00 97 b2 08 e2   NTLMSSP.........
2009-07-01 16:34:24.578 02c99fa0 03 00 03 00 31 00 00 00 09 00 09 00 28 00 00 00   ....1.......(...
2009-07-01 16:34:24.578 pfContextAttr             = 0x00001000 ISC_RET_INTERMEDIATE_RETURN
2009-07-01 16:34:24.578 ptsExpiry                 = 0x0017d43c -> 2009-07-01 10:39:24 *** EXPIRED *** (05:55:00 diff)
2009-07-01 16:34:24.578 EXIT InitializeSecurityContextA returned 0x00090312 SEC_I_CONTINUE_NEEDED (The function completed successfully, but must be called again to complete the context)

NOTE: We purged all of the Kerberos Tickets before we did this to make sure we would request the ticket from the KDC. This was done using KerbTray which is part of the Windows Resource Kit.

This tells us that we were requesting the expected SPN for the target, but the output buffer contains an NTLMSSP token. This means we fell back to NTLM instead of getting Kerberos. It still doesn’t explain why.
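The NTLMSSP signature is easy to spot programmatically. NTLM messages begin with the bytes NTLMSSP\0, followed by a little-endian message type (1 is NEGOTIATE). The sketch below applies that check to the logged buffer. The 0x60 branch is only a first-byte heuristic for a GSS-API initial token, which is the framing Kerberos or SPNEGO initial tokens use. It is not a full parser.

```python
NTLM_SIGNATURE = b"NTLMSSP\x00"
NTLM_MESSAGE_TYPES = {1: "NEGOTIATE", 2: "CHALLENGE", 3: "AUTHENTICATE"}


def classify_token(token: bytes) -> str:
    """Roughly classify an SSPI output token by its leading bytes."""
    if token.startswith(NTLM_SIGNATURE) and len(token) >= 12:
        # The 4 bytes after the signature are the little-endian NTLM message type.
        msg_type = int.from_bytes(token[8:12], "little")
        return "NTLM " + NTLM_MESSAGE_TYPES.get(msg_type, f"type {msg_type}")
    if token[:1] == b"\x60":
        # 0x60 is the ASN.1 tag that frames a GSS-API initial context token.
        return "GSS-API initial token (Kerberos or SPNEGO)"
    return "unknown"


# First 16 bytes of pBuffers[00] from the SSPIClient log above.
LOGGED = bytes.fromhex("4e 54 4c 4d 53 53 50 00 01 00 00 00 97 b2 08 e2")
print(classify_token(LOGGED))  # NTLM NEGOTIATE
```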

End Result:

Unfortunately, this was one of those issues that just escaped us. This tends to happen with odd Kerberos cases. We had the Directory Services team engaged as well, and they did not know what else we could collect, short of a kernel dump, to see what was going on. We noticed that the nodes had not been rebooted since April 5th, which is a while. The SQL Service had been recycled on June 25th. We decided to fail over to another node and reboot ServerA. After the reboot, we tried SSPIClient again and saw a proper response come back, which also didn’t list EXPIRED. At this point the issue was resolved. We don’t have hard data to indicate exactly what the issue was, but the thought is that something cached was invalid and causing the issue. Rebooting cleared that out and allowed us to work as expected.
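The EXPIRED flag in the failing log is simple arithmetic on the two timestamps: the expiry (10:39:24) is 5 hours 55 minutes earlier than the time of the call (16:34:24). Here is a sketch that reproduces the check. It assumes both times are in the same time zone, which the log does not state, and it says nothing about why the expiry was in the past.

```python
from datetime import datetime, timedelta
from typing import Tuple

FMT = "%Y-%m-%d %H:%M:%S"


def expiry_status(log_time: str, expiry: str) -> Tuple[bool, timedelta]:
    """Return (expired?, how far log_time is past the expiry)."""
    now = datetime.strptime(log_time, FMT)
    expires = datetime.strptime(expiry, FMT)
    return expires < now, now - expires


# Times from the failing SSPIClient log (milliseconds dropped).
expired, diff = expiry_status("2009-07-01 16:34:24", "2009-07-01 10:39:24")
print(expired, diff)  # True 5:55:00
```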

Which leads me to my motto: When in doubt, Reboot!

Adam W. Saxton | Microsoft SQL Server Escalation Services

We have had a few customer calls come in on this scenario, so I thought it needed to be documented a bit.

Scenario:

In this scenario, the customer has a data source defined on the Report Server. Some were using Named Instances, others were using a Default Instance for the Data Source.

There are some aspects of Report Builder that will run server side (from the context of the Report Server). For example, DataSource retrieval and preview of a report. This is assuming that we are in connected mode in Report Builder.

There are other aspects that will run Client Side. Some examples of that are the Query Designer and general Metadata lookup for the DataSet. This is where the problems come into play when a firewall is involved.

In all of the cases, reports and Report Builder function normally locally. When they try to create a new report through Report Builder, they encounter errors similar to the following:

A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: SQL Network Interfaces, error: 26 - Error Locating Server/Instance Specified)

A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 0 - The requested name is valid, but no data of the requested type was found.)

A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)

The first error is specific to a named instance. The other two occur when we are trying to connect directly to the SQL Server. Named instances have to do a lookup to get the port number for the actual instance we are connecting to. This lookup is fielded by SQL Browser over UDP port 1434. Whenever you see “error: 26 - Error Locating Server/Instance Specified”, it is SQL Browser related. The underlying issue is still the same as for the other messages.
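For reference, the SQL Browser lookup is a small UDP exchange defined by the SQL Server Resolution Protocol (MC-SQLR). As I understand that spec, the client sends a CLNT_UCAST_INST byte (0x04) followed by the null-terminated instance name. SQL Browser answers with 0x05, a 2-byte little-endian length, and a semicolon-delimited string of name/value pairs that includes tcp;<port>. This sketch assumes ASCII names and ignores retries. When this exchange fails, the client reports error 26. The sample response is trimmed and made up to match the example configuration in this post.

```python
import socket
from typing import Dict

CLNT_UCAST_INST = 0x04  # request: information about one named instance
SVR_RESP = 0x05         # response header byte


def build_request(instance: str) -> bytes:
    """CLNT_UCAST_INST packet: 0x04 + instance name + terminating null."""
    return bytes([CLNT_UCAST_INST]) + instance.encode("ascii") + b"\x00"


def parse_response(data: bytes) -> Dict[str, str]:
    """Parse a SVR_RESP packet into its name/value pairs."""
    if len(data) < 3 or data[0] != SVR_RESP:
        raise ValueError("not an SSRP SVR_RESP packet")
    size = int.from_bytes(data[1:3], "little")
    text = data[3:3 + size].decode("ascii", errors="replace")
    record = text.split(";;")[0]  # a single-instance reply ends with ";;"
    parts = record.split(";")
    return dict(zip(parts[0::2], parts[1::2]))


def lookup_tcp_port(server: str, instance: str, timeout: float = 2.0) -> int:
    """Ask SQL Browser on UDP 1434 for the instance's TCP port (needs name resolution)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(instance), (server, 1434))
        data, _ = sock.recvfrom(65535)
    return int(parse_response(data)["tcp"])


SAMPLE_TEXT = "ServerName;MYSERVER;InstanceName;MYINSTANCE;IsClustered;No;tcp;2644;;"
SAMPLE_RESPONSE = (
    bytes([SVR_RESP]) + len(SAMPLE_TEXT).to_bytes(2, "little") + SAMPLE_TEXT.encode("ascii")
)
print(parse_response(SAMPLE_RESPONSE)["tcp"])  # 2644
```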

The way I reproduced the issue was by doing the following on my lab setup which was configured for Basic Authentication:

Open Report Builder (which starts with a blank report – and I was in connected mode with my Report Server)
Create a DataSource which I select from the existing data sources on my Report Server
Create a DataSet
At this point, the DataSet Properties window should open up, at which point you can click on “Query Designer…”
I was then prompted for Credentials and then was met with the following:

The Problem:

The overall problem is that Report Builder cannot see the SQL Server when external to the network that SQL Server resides on. SQL Server is typically not exposed through the firewall. Assume the following configuration:

Report Server:

Internet RS URL: http://www.mysite.com/ReportServer
Public IP: 201.201.201.201
Private IP: 10.0.0.5
DataSource Connection String: server=MyServer\MyInstance;Database=AdventureWorks;

SQL Server:

Server Name: MyServer
Instance Name: MyInstance (Port 2644)
Private IP: 10.0.0.4

When Report Builder is opened from a client machine on the Internet (or anywhere external to the private network that SQL Server is a part of) and it goes to hit the data source, it actually tries to connect to MyServer\MyInstance directly. Because this is a named instance, we do the SQL Browser lookup first. In this case, it will be a NetBIOS lookup. If we are doing a straight TCP connection, we will end up doing a DNS lookup. Because we are on the Internet, there is no WINS or DNS server that is aware of MyServer. NetBIOS or DNS will come back basically saying it couldn’t find the server name you are requesting, which results in one of the errors I outlined above.
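The decision the client makes from the server value can be sketched in a few lines. This illustration is based on the behavior described above: a named instance needs a SQL Browser lookup, and a default instance goes straight to TCP 1433. It also assumes that an explicit port skips the lookup. It is not the actual SqlClient parsing code. In every case the host name itself must still resolve from the client, and that is the step that fails on the Internet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionPlan:
    host: str               # must resolve via DNS/NetBIOS from the client machine
    instance: Optional[str]
    port: Optional[int]     # None until SQL Browser answers
    browser_lookup: bool    # True if UDP 1434 must be queried first


def connection_plan(server_value: str) -> ConnectionPlan:
    """Illustrative parse of a server= value such as MyServer\\MyInstance or host,port."""
    value = server_value.strip()
    if value.lower().startswith("tcp:"):
        value = value[4:]
    target, _, port = value.partition(",")
    host, _, instance = target.partition("\\")
    if port:
        # An explicit port means the client does not need SQL Browser.
        return ConnectionPlan(host, instance or None, int(port), False)
    if instance:
        # Named instance without a port: ask SQL Browser on UDP 1434 first.
        return ConnectionPlan(host, instance, None, True)
    # Default instance: go straight to TCP 1433.
    return ConnectionPlan(host, None, 1433, False)


print(connection_plan(r"MyServer\MyInstance"))
```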

Report Builder doesn’t go through the Reporting Services web service for DataSource calls, which would make them server based. By design, these calls are client side, and Report Builder will try to retrieve that data from the client. I think some of the confusion is that people think that because we are in “connected” mode with the Report Server, all functionality occurs on the Report Server itself, in which case we would expect the Report Server to be able to communicate with the SQL Server successfully. This, unfortunately, is not the case.

Are there any workarounds?

The next logical question would be, how do I get this to work? There are two possible workarounds I can think of: one that is not very realistic, and another that is possible but also somewhat of a pain.

Workaround 1:

This involves exposing your SQL Server to the internet, which I do NOT recommend and I doubt most companies are willing to do. At that point, you could have an External DataSource along with an Internal DataSource. People using Report Builder on the internet could reference the External DataSource, which has connection information for the SQL Server that is usable from the internet. At that point the design aspects would work, but Preview could fail, depending on your network configuration, if the Report Server cannot reach the external IP address for SQL Server from the internal side.

Then when you publish, the report can reference the Internal DataSource.

Workaround 2:

Another option is to expose your data through a web service that is accessible via the Internet. Report Builder users can then access a DataSource that uses the web service, because that resource is available to them externally.

Update - Workaround 3 (SSAS/OLAP) – Thanks David!:

For SSAS/OLAP you can set up a connection proxy over HTTP. This would be usable both internally and externally and can be easily exposed through a firewall. Be sure to use a non-standard port that is configured on your firewall for security purposes. Also, be aware that you are exposing your backend to the internet, so take the appropriate security measures. SQL Server has a similar feature through HTTP endpoints, but be aware that it has been deprecated and is not guaranteed to be available in a future release.

Overall, it will be difficult for people using Report Builder externally to access resources that are on an internal network when designing a report. Hopefully, this will allow you to better plan your deployment of Reporting Services.

Adam W. Saxton | Microsoft SQL Server Escalation Services

Hi,

I wanted to make everybody aware of this feature in SQL 2008.

Are you tired of having to use NetMon to narrow down a connectivity issue with SQL Server 2008 or have to wait for an elusive connectivity error to reoccur?

A new ring buffer called RING_BUFFER_CONNECTIVITY has been added to the DMV sys.dm_os_ring_buffers in SQL 2008 RTM.

This automatically logs connection closures initiated on the server side. If you see nothing in the DMV, then the client most likely reset or closed the connection. You can enable logging of all connection closures (client or server) with trace flag 7827.

Please read this blog for more information!

http://blogs.msdn.com/sql_protocols/archive/2008/05/20/connectivity-troubleshooting-in-sql-server-2008-with-the-connectivity-ring-buffer.aspx

So if SQL Server 2008 has stayed online since the connection failure, make sure to capture the information from sys.dm_os_ring_buffers using the query in the blog post above. It may give you enough information to narrow your troubleshooting down to the client or the server without costly Netmon traces.
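As an illustration, once you have the record XML from that query, a few lines of script can tally what the buffer logged. The following is a minimal Python sketch, not a supported tool. It assumes each record contains a ConnectivityTraceRecord element with a RecordType child (values such as ConnectionClose). These element names are our reading of the format, so check them against your own output. The sample record is hypothetical and was not captured from a real server.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample record, for illustration only. The element names
# (ConnectivityTraceRecord, RecordType, Spid, RemoteHost) are assumptions;
# compare them with the record XML your own server returns.
SAMPLE_RECORDS = [
    """<Record id="1" type="RING_BUFFER_CONNECTIVITY">
  <ConnectivityTraceRecord>
    <RecordType>ConnectionClose</RecordType>
    <Spid>58</Spid>
    <RemoteHost>10.0.0.3</RemoteHost>
  </ConnectivityTraceRecord>
</Record>""",
]


def tally_record_types(records):
    """Count RecordType values across a list of ring buffer record XML strings."""
    counts = Counter()
    for xml_text in records:
        root = ET.fromstring(xml_text)
        record_type = root.findtext("ConnectivityTraceRecord/RecordType", default="Unknown")
        counts[record_type] += 1
    return counts


def interpret(counts):
    """Rule of thumb from this post (trace flag 7827 off): an empty buffer
    suggests the client, not the server, closed the connection."""
    if not counts:
        return "No server-side closures logged; the client most likely reset/closed the connection."
    return f"Server-side records found: {dict(counts)}"


if __name__ == "__main__":
    print(interpret(tally_record_types(SAMPLE_RECORDS)))
```

The interpretation only holds with trace flag 7827 off. With the flag on, client-initiated closures are logged as well, so the record contents, rather than their absence, tell you which side closed the connection.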

Hope this helps!

Eric Burgess
SQL Server Escalation Team

I was working with Keith Elmore on one of our internal processes, and he was hitting a “Cannot generate SSPI context” error when trying to connect from Management Studio. I also saw this come up in a double-hop situation (IIS to SQL) when I set up a local repro.

We went through the normal checklist for Kerberos troubleshooting. In the case of Management Studio, that really just consisted of validating the SPN, because it was a single hop and we were only trying a direct connection without any delegation. The SPN checked out, and there was only one SPN. No duplicates.

We have an internal tool called SSPIClient that simply exercises the Windows API calls for Kerberos authentication (InitializeSecurityContext). Here is what it showed:

2009-12-30 21:11:16.185 Connecting via ODBC to [DRIVER=SQL Server;Server=tcp:passsql\demo;Trusted_Connection=Yes;]
2009-12-30 21:11:16.232 ENTER InitializeSecurityContextA
2009-12-30 21:11:16.232 phCredential              = 0x0055ffb4
2009-12-30 21:11:16.232 phContext                 = 0x0055ffc4
2009-12-30 21:11:16.232 pszTargetName             = 'MSSQLSvc/PASSSQL.pass.local:59256'
2009-12-30 21:11:16.232 fContextReq               = 0x00000003 ISC_REQ_DELEGATE|ISC_REQ_MUTUAL_AUTH
2009-12-30 21:11:16.232 TargetDataRep             = 16
2009-12-30 21:11:16.232 pInput                    = 0x0018d55c
2009-12-30 21:11:16.232 pInput->ulVersion         = 0
2009-12-30 21:11:16.232 pInput->cBuffers          = 1
2009-12-30 21:11:16.232 pBuffers[00].cbBuffer   = 112
2009-12-30 21:11:16.232 pBuffers[00].BufferType = 2 SECBUFFER_TOKEN
2009-12-30 21:11:16.232 pBuffers[00].pvBuffer   = 0x03753870
2009-12-30 21:11:16.232 03753870 a1 6e 30 6c a0 03 0a 01 01 a2 65 04 63 60 61 06   .n0l......e.c`a.
2009-12-30 21:11:16.232 03753880 09 2a 86 48 86 f7 12 01 02 02 03 00 7e 52 30 50   .*.H........~R0P
2009-12-30 21:11:16.232 03753890 a0 03 02 01 05 a1 03 02 01 1e a4 11 18 0f 32 30   ..............20
2009-12-30 21:11:16.232 037538a0 30 39 31 32 33 30 32 31 31 31 31 36 5a a5 05 02   091230211116Z...
2009-12-30 21:11:16.232 037538b0 03 01 0d b4 a6 03 02 01 29 a9 0c 1b 0a 50 41 53   ........)....PAS
2009-12-30 21:11:16.232 037538c0 53 2e 4c 4f 43 41 4c aa 17 30 15 a0 03 02 01 01   S.LOCAL..0......
2009-12-30 21:11:16.232 037538d0 a1 0e 30 0c 1b 0a 73 71 6c 73 65 72 76 69 63 65   ..0...sqlservice
2009-12-30 21:11:16.232 phNewContext              = 0x0055ffc4
2009-12-30 21:11:16.232 pOutput                   = 0x0018d574
2009-12-30 21:11:16.232 pOutput->ulVersion        = 0
2009-12-30 21:11:16.232 pOutput->cBuffers         = 1
2009-12-30 21:11:16.232 pBuffers[00].cbBuffer   = 12256
2009-12-30 21:11:16.232 pBuffers[00].BufferType = 2 SECBUFFER_TOKEN
2009-12-30 21:11:16.232 pBuffers[00].pvBuffer   = 0x03759d68
2009-12-30 21:11:16.232 pfContextAttr             = 0x00000000
2009-12-30 21:11:16.232 ptsExpiry                 = 0x0018d548 -> 1601-01-01 00:00:00 *** EXPIRED *** (3585189:11:16 diff)
2009-12-30 21:11:16.232 EXIT InitializeSecurityContextA returned 0x80090322 SEC_E_WRONG_PRINCIPAL (The target principal name is incorrect)
2009-12-30 21:11:16.232
2009-12-30 21:11:16.232 ******************** ODBC Errors ********************
2009-12-30 21:11:16.232 Return code = -1.
2009-12-30 21:11:16.232 SQLError[00] SQLState    'S1000'
2009-12-30 21:11:16.232 SQLError[00] NativeError 0
2009-12-30 21:11:16.232 SQLError[00] Message     '[Microsoft][ODBC SQL Server Driver]Cannot generate SSPI context'
2009-12-30 21:11:16.232 ******************** ODBC Errors ********************

It was reporting that the target principal name was incorrect, but you can see in the output that it is showing sqlservice, which is correct. We had rebooted the SQL Server in question, at which point the SQL Service wouldn’t even start. Keith asked whether the password had been changed recently. We took a look, and sure enough, the password had been changed the day before. This happens to be an account that we use for multiple things.

We updated the service account password through SQL Server Configuration Manager and restarted SQL Server. The service started, the SSPI error disappeared, and we were able to connect to SQL Server successfully.
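In hindsight, the SSPIClient output already pointed at the service account. The 112-byte input buffer that InitializeSecurityContextA received is the server's reply, and it can be decoded by hand. Below is a minimal Python sketch of a DER walker written only for this buffer (single-byte tags, no general ASN.1 support). It assumes the bytes are a SPNEGO negTokenResp wrapping a Kerberos KRB-ERROR as defined in RFC 4120, which is what these particular bytes decode to. The hex is copied from the log above.

```python
# Minimal DER walker for this one buffer only: single-byte tags and
# short/long-form lengths. Assumes a SPNEGO negTokenResp that wraps a
# Kerberos KRB-ERROR; this is not a general ASN.1 decoder.

# pBuffers[00] of pInput, copied from the SSPIClient log above.
HEX_DUMP = """
a1 6e 30 6c a0 03 0a 01 01 a2 65 04 63 60 61 06
09 2a 86 48 86 f7 12 01 02 02 03 00 7e 52 30 50
a0 03 02 01 05 a1 03 02 01 1e a4 11 18 0f 32 30
30 39 31 32 33 30 32 31 31 31 31 36 5a a5 05 02
03 01 0d b4 a6 03 02 01 29 a9 0c 1b 0a 50 41 53
53 2e 4c 4f 43 41 4c aa 17 30 15 a0 03 02 01 01
a1 0e 30 0c 1b 0a 73 71 6c 73 65 72 76 69 63 65
"""


def read_tlv(buf, pos=0):
    """Read one tag-length-value at pos; return (tag, value, next_pos)."""
    tag, length = buf[pos], buf[pos + 1]
    pos += 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        count = length & 0x7F
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    return tag, buf[pos:pos + length], pos + length


def fields(value):
    """Split a constructed value into {tag: inner_value} for its children."""
    out, pos = {}, 0
    while pos < len(value):
        tag, inner, pos = read_tlv(value, pos)
        out[tag] = inner
    return out


def inner_int(tagged):
    """[n] EXPLICIT INTEGER -> int."""
    return int.from_bytes(read_tlv(tagged)[1], "big")


def inner_str(tagged):
    """[n] EXPLICIT GeneralString -> str."""
    return read_tlv(tagged)[1].decode("ascii")


def decode_krb_error(buf):
    """Pull msg-type, error-code, realm and sname out of a SPNEGO-wrapped KRB-ERROR."""
    _, neg_resp, _ = read_tlv(buf)              # negTokenResp [1]
    _, seq, _ = read_tlv(neg_resp)              # SEQUENCE
    _, octets, _ = read_tlv(fields(seq)[0xA2])  # responseToken [2] OCTET STRING
    _, gss, _ = read_tlv(octets)                # GSS-API [APPLICATION 0] token
    _, _mech_oid, pos = read_tlv(gss)           # Kerberos mechanism OID
    tok_id = gss[pos:pos + 2]                   # 03 00 marks a KRB-ERROR token
    tag, krb_error, _ = read_tlv(gss, pos + 2)
    if tag != 0x7E or tok_id != b"\x03\x00":
        raise ValueError("buffer does not contain a KRB-ERROR")
    f = fields(read_tlv(krb_error)[1])          # SEQUENCE inside [APPLICATION 30]
    # sname [10]: PrincipalName { name-type [0], name-string [1] SEQUENCE OF GeneralString }
    principal = read_tlv(f[0xAA])[1]
    names = read_tlv(fields(principal)[0xA1])[1]
    parts, pos = [], 0
    while pos < len(names):
        _, part, pos = read_tlv(names, pos)
        parts.append(part.decode("ascii"))
    return {
        "msg_type": inner_int(f[0xA1]),    # 30 means KRB-ERROR
        "error_code": inner_int(f[0xA6]),  # RFC 4120 error number
        "realm": inner_str(f[0xA9]),
        "sname": "/".join(parts),
    }


if __name__ == "__main__":
    print(decode_krb_error(bytes.fromhex(HEX_DUMP)))
```

For this buffer the decoded error code is 41, which RFC 4120 defines as KRB_AP_ERR_MODIFIED. It was returned from realm PASS.LOCAL for the principal sqlservice. That error commonly means the service could not decrypt the ticket with the key it holds. This is consistent with the running service still using keys derived from the old password.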

I’m sure other people have known about this type of condition. Still, in all my years here, and across the many Kerberos issues I’ve troubleshot, this was the first time I had run across it. I thought I would share it in case you run across something like this that you can’t explain.

If you change your service password, be sure to recycle the SQL Service so that Kerberos can function properly.

Adam W. Saxton | Microsoft SQL Server Escalation Services

We have had a number of people ask how they can get a 64-bit Jet ODBC driver/OLE DB Provider. Windows ships only the 32-bit versions of these. The answer is that the Windows versions won’t be x64, because those components are deprecated. What does deprecated mean? Here is the excerpt from the MDAC/WDAC Roadmap on MSDN:

Deprecated MDAC/WDAC Components
These components are still supported in the current release of MDAC/WDAC, but they might be removed in future releases. Microsoft recommends, when you develop new applications, that you avoid using these components. Additionally, when you upgrade or modify existing applications, remove any dependency on these components.

And here is what it lists about the Jet Database Engine:

Microsoft Jet Database Engine 4.0: Starting with version 2.6, MDAC no longer contains Jet components. In other words, MDAC 2.6, 2.7, 2.8, and all future MDAC/WDAC releases do not contain Microsoft Jet, the Microsoft Jet OLE DB Provider, the ODBC Desktop Database Drivers, or Jet Data Access Objects (DAO). The Microsoft Jet Database Engine 4.0 components entered a state of functional deprecation and sustained engineering, and have not received feature level enhancements since becoming a part of Microsoft Windows in Windows 2000.

There is no 64-bit version of the Jet Database Engine, the Jet OLEDB Driver, the Jet ODBC Drivers, or Jet DAO available. This is also documented in KB article 957570. On 64-bit versions of Windows, 32-bit Jet runs under the Windows WOW64 subsystem. For more information on WOW64, see http://msdn.microsoft.com/en-us/library/aa384249(VS.85).aspx. Native 64-bit applications cannot communicate with the 32-bit Jet drivers running in WOW64.

Instead of Microsoft Jet, Microsoft recommends using Microsoft SQL Server Express Edition or Microsoft SQL Server Compact Edition when developing new, non-Microsoft Access applications requiring a relational data store. These new or converted Jet applications can continue to use Jet with the intention of using Microsoft Office 2003 and earlier files (.mdb and .xls) for non-primary data storage. However, for these applications, you should plan to migrate from Jet to the 2007 Office System Driver. You can download the 2007 Office System Driver, which allows you to read from and write to pre-existing files in either Office 2003 (.mdb and .xls) or the Office 2007 (*.accdb, *.xlsm, *.xlsx and *.xlsb) file formats. IMPORTANT Please read the 2007 Office System End User License Agreement for specific usage limitations.

Note: SQL Server applications can also access the 2007 Office System, and earlier, files from SQL Server heterogeneous data connectivity and Integrations Services capabilities as well, via the 2007 Office System Driver. Additionally, 64-bit SQL Server applications can access to 32-bit Jet and 2007 Office System files by using 32-bit SQL Server Integration Services (SSIS) on 64-bit Windows.
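Because a native 64-bit process cannot load the 32-bit Jet drivers, it helps to confirm the bitness of the process that is making the connection. Here is a minimal Python sketch using only the standard library. The WOW64 check relies on the PROCESSOR_ARCHITEW6432 environment variable, which Windows sets for 32-bit processes running under WOW64. Treat it as a heuristic.

```python
import os
import struct


def process_bits():
    """Bitness of the current process, derived from its pointer size."""
    return struct.calcsize("P") * 8


def running_under_wow64():
    """Heuristic (Windows only): WOW64 sets PROCESSOR_ARCHITEW6432 for
    32-bit processes on 64-bit Windows. Normally False elsewhere."""
    return process_bits() == 32 and "PROCESSOR_ARCHITEW6432" in os.environ


def can_load_32bit_jet():
    """Only a 32-bit process (native or under WOW64) can load the 32-bit
    Jet drivers; a native 64-bit process cannot."""
    return process_bits() == 32


if __name__ == "__main__":
    print(f"Process bitness: {process_bits()}-bit")
    print(f"Running under WOW64: {running_under_wow64()}")
    print(f"Can load 32-bit Jet: {can_load_32bit_jet()}")
```

If this reports a 64-bit process, the component that talks to Jet has to run as a 32-bit process. One example is the 32-bit SSIS runtime mentioned in the excerpt.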

This all pertains to the components that actually ship with Windows. The Office team has since taken up Jet as part of Access and has come out with what they call the Access Connectivity Engine (ACE) driver. For more information on the ACE drivers, you can check out this blog post, which goes into detail. The ACE driver/provider is completely backward compatible with Jet 4.0.

Office 2010 will introduce a 64-bit version of Office. With it comes a 64-bit version of the ACE Driver/Provider, which will in essence give you a 64-bit version of Jet. The downside is that it doesn’t ship with the operating system; it will be a redistributable. Because Office 2010 hasn’t been released yet, only a beta version of this driver is available.

2010 Office System Driver Beta: Data Connectivity Components
http://www.microsoft.com/downloads/details.aspx?familyid=C06B8369-60DD-4B64-A44B-84B371EDE16D&displaylang=en

Adam W. Saxton | Microsoft SQL Server Escalation Services

This month has turned into another Kerberos month for me. I had an email discussion about SPNs for SQL Server and what we can do to get them created and in a usable state. I thought I would share my response, as it will probably be helpful for someone. Here is the comment that started the conversation. By the way, this was actually a good question. I see this kind of comment a lot with regard to SPN placement, not necessarily for setup, but for SPNs in general.

“In prior versions of setup we used to be able to specify the port number for the default and Named Instance. Now, (SQL 2008 & R2) it takes the defaults. 1433 and Dynamic for Named Instances.

If you want to use Kerberos with TCP, you need to know the port number to create the SPN. For Default instances, if you’re using 1433 then you’re ok. But, Named Instances listen on a dynamic port by default, and since you can’t set the port number, any SPN you create will probably be wrong and Kerberos won’t work. It would be great if we could ask the user if they want to change the port number during setup, like we did with SQL 2000.”

Let’s have a look at Books Online first.

Registering a Service Principal Name
http://msdn.microsoft.com/en-us/library/ms191153.aspx

This article goes through the different formats that apply to SQL 2008 (they are the same for R2). It also touches on two items that are important to understand: 1. Automatic SPN Registration and 2. Client Connections. Here is the excerpt from the above article on Automatic SPN Registration.

Automatic SPN Registration
When an instance of the SQL Server Database Engine starts, SQL Server tries to register the SPN for the SQL Server service. When the instance is stopped, SQL Server tries to unregister the SPN. For a TCP/IP connection the SPN is registered in the format MSSQLSvc/<FQDN>:<tcpport>. Both named instances and the default instance are registered as MSSQLSvc, relying on the <tcpport> value to differentiate the instances.
For other connections that support Kerberos the SPN is registered in the format MSSQLSvc/<FQDN>:<instancename> for a named instance. The format for registering the default instance is MSSQLSvc/<FQDN>.
Manual intervention might be required to register or unregister the SPN if the service account lacks the permissions that are required for these actions.

What does this mean? It means that if the SQL Service account is using Local System or Network Service as the logon account, we will have the permission necessary to register the SPN against the Domain Machine Account. By default, the machine accounts have permission to modify themselves. If we change this over to a Domain User Account for the SQL Service account, things change a little. By default a Domain User does not have the permission required to create the SPN. So, when you start SQL Server with a Domain User Account, you will see an entry in your ERRORLOG similar to the following:

2010-03-05 09:39:53.20 Server The SQL Server Network Interface library could not register the Service Principal Name (SPN) for the SQL Server service. Error: 0x2098, state: 15. Failure to register an SPN may cause integrated authentication to fall back to NTLM instead of Kerberos. This is an informational message. Further action is only required if Kerberos authentication is required by authentication policies.

This permission is called “Write servicePrincipalName” and can be changed through an MMC snap-in called ADSI Edit. For instructions on how to modify this setting, refer to Step 3 in the following KB article. WARNING: I do NOT recommend you do this on a Cluster. We have seen it cause connectivity issues due to Active Directory replication issues when more than one Domain Controller is used in your environment.

How to use Kerberos authentication in SQL Server
http://support.microsoft.com/kb/319723

So, if I enable that permission, let’s see what the SQL Service does. I have two machines I’m going to use for this: ASKJCTP3 (running the RC build of 2008 R2) and MySQLCluster (SQL 2008 running a Named Instance called SQL2K8).

SetSPN Details:

SPN's with TCP and NP enabled on Default Instance:

C:\>setspn -l sqlservice
Registered ServicePrincipalNames for CN=SQL Service,OU=Services,DC=dsdnet,DC=local:
MSSQLSvc/ASKJCTP3.dsdnet.local:1433
MSSQLSvc/ASKJCTP3.dsdnet.local

SPN's with only NP enabled on Default Instance:

C:\>setspn -l sqlservice
Registered ServicePrincipalNames for CN=SQL Service,OU=Services,DC=dsdnet,DC=local:
MSSQLSvc/ASKJCTP3.dsdnet.local

SPN's with TCP and NP enabled on Clustered Named Instance:

C:\>setspn -l sqlservice
Registered ServicePrincipalNames for CN=SQL Service,OU=Services,DC=dsdnet,DC=local:
MSSQLSvc/MYSQLCLUSTER.dsdnet.local:54675
MSSQLSvc/MYSQLCLUSTER.dsdnet.local:SQL2K8

SPN's with only NP enabled on a Clustered Named Instance:

C:\>setspn -l sqlservice
Registered ServicePrincipalNames for CN=SQL Service,OU=Services,DC=dsdnet,DC=local:
MSSQLSvc/MYSQLCLUSTER.dsdnet.local:SQL2K8
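The four listings above follow the registration rules quoted from Books Online. As a minimal Python sketch (the function and parameter names are illustrative, not part of any SQL Server API), you can derive the set of SPNs the service attempts to register from four inputs: the FQDN, the instance name, the TCP port, and whether Named Pipes is enabled.

```python
def registered_spns(fqdn, instance=None, tcp_port=None, named_pipes=True):
    """Illustrative helper, not a SQL Server API: the SPNs the service tries
    to register at startup, per the Books Online rules.

    instance=None means the default instance; tcp_port=None means TCP is off.
    TCP -> MSSQLSvc/<FQDN>:<port>
    Named Pipes -> MSSQLSvc/<FQDN>:<instance> (named) or MSSQLSvc/<FQDN> (default)
    """
    spns = set()
    if tcp_port is not None:
        spns.add(f"MSSQLSvc/{fqdn}:{tcp_port}")
    if named_pipes:
        spns.add(f"MSSQLSvc/{fqdn}:{instance}" if instance else f"MSSQLSvc/{fqdn}")
    return spns


if __name__ == "__main__":
    # Clustered named instance with TCP (dynamic port 54675) and NP enabled.
    for spn in sorted(registered_spns("MYSQLCLUSTER.dsdnet.local", "SQL2K8", 54675)):
        print(spn)
```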

Let’s look at what the client will do. When I say client, this could mean a lot of different things. Really it means an application trying to connect to SQL Server by way of a Provider/Driver. NOTE: Specifying the SPN as part of the connection is specific to SQL Native Client 10 and later. It does not apply to SqlClient or the Provider/Driver that ships with Windows.

Service Principal Name (SPN) Support in Client Connections
http://msdn.microsoft.com/en-us/library/cc280459.aspx

MSSQLSvc/fqdn
The provider-generated, default SPN for a default instance when a protocol other than TCP is used.
fqdn is a fully-qualified domain name.
MSSQLSvc/fqdn:port
The provider-generated, default SPN when TCP is used.
port is a TCP port number.
MSSQLSvc/fqdn:InstanceName
The provider-generated, default SPN for a named instance when a protocol other than TCP is used.
InstanceName is a SQL Server instance name

Based on this, if I have a straight TCP connection, the Provider/Driver will use the port in the SPN. Let’s see what happens when I make connections using a UDL file. For the UDL I’m going to use the SQL Native Client 10 OLE DB Provider. Starting with SNAC10, we can specify which SPN to use for the connection, which gives us some flexibility when we control how the application connects. I will also show what the Kerberos request looks like in the network trace, since that shows us which SPN is actually being used. All of these connection attempts were made against ASKJCTP3, which is a Default Instance.

Because this is a Default Instance, I added the Instance Name SPN manually.

C:\>setspn -l sqlservice
Registered ServicePrincipalNames for CN=SQL Service,OU=Services,DC=dsdnet,DC=local:
        MSSQLSvc/ASKJCTP3.dsdnet.local:MSSQLSERVER
        MSSQLSvc/ASKJCTP3.dsdnet.local:1433
        MSSQLSvc/ASKJCTP3.dsdnet.local
        MSSQLSvc/MYSQLCLUSTER.dsdnet.local:54675
        MSSQLSvc/MYSQLCLUSTER.dsdnet.local:SQL2K8

Straight TCP with no SPN Specified:

58 1.796875 {TCP:7, IPv4:5} 10.0.0.3 10.0.0.1 KerberosV5 KerberosV5:TGS Request Realm: DSDNET.LOCAL Sname: MSSQLSvc/askjctp3.dsdnet.local:1433

TCP with specifying an SPN for the connection:

32 1.062500 {TCP:11, IPv4:5} 10.0.0.3 10.0.0.1 KerberosV5 KerberosV5:TGS Request Realm: DSDNET.LOCAL Sname: MSSQLSvc/ASKJCTP3.dsdnet.local:MSSQLSERVER

Forcing Named Pipes with no SPN specified:

68 1.828125 {TCP:21, IPv4:5} 10.0.0.3 10.0.0.1 KerberosV5 KerberosV5:TGS Request Realm: DSDNET.LOCAL Sname: MSSQLSvc/askjctp3.dsdnet.local

The way the provider/driver determines which SPN to use is based on the Protocol being used. Of note, starting in SQL 2008 we allowed for Kerberos to be used with Named Pipes. If you have a Named Instance and you are using the Named Pipes protocol, we will look for an SPN with the Named Instance specified. For a Default Instance and Named Pipes, we will just look for the SPN with no port or Named Instance Name specified as shown above.
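The client-side selection shown in these traces can be sketched the same way. This is a minimal Python illustration, assuming the provider follows the Books Online table above. An explicitly configured SPN (SNAC10 and later) wins. Otherwise, TCP uses the port, and Named Pipes uses the instance name (or nothing, for a default instance). The function and parameter names are hypothetical.

```python
def client_spn(fqdn, protocol, instance=None, tcp_port=None, explicit_spn=None):
    """Hypothetical helper: the SPN a client would put in its Kerberos TGS request.

    protocol: "tcp" or "np" (Named Pipes); instance=None means the default instance.
    explicit_spn models the SPN connection option available from SNAC10 on.
    """
    if explicit_spn:
        return explicit_spn
    if protocol == "tcp":
        if tcp_port is None:
            raise ValueError("a TCP SPN needs the port the instance listens on")
        return f"MSSQLSvc/{fqdn}:{tcp_port}"
    if protocol == "np":
        return f"MSSQLSvc/{fqdn}:{instance}" if instance else f"MSSQLSvc/{fqdn}"
    raise ValueError(f"unsupported protocol: {protocol!r}")


if __name__ == "__main__":
    host = "askjctp3.dsdnet.local"
    # The three connection attempts from the traces above.
    print(client_spn(host, "tcp", tcp_port=1433))
    print(client_spn(host, "tcp", tcp_port=1433,
                     explicit_spn="MSSQLSvc/ASKJCTP3.dsdnet.local:MSSQLSERVER"))
    print(client_spn(host, "np"))
```

For a named instance over TCP, the real client first has to learn the port (for example, from SQL Browser) before it can build the SPN. That lookup is outside this sketch.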

Because you can specify the SPN from the client side, you can easily control which SPN is used. You can also see how the provider determines the SPN on its own.

Now that we know all of the above, let’s go back to the original question. Your company may or may not want to grant the Write permission to the Domain User Account. If your company is not willing to open up that permission on the service account, your only recourse is to set a static port for the Named Instance instead of letting it use a dynamic port. This would also be my recommendation for Clusters. In this case, you will need to know exactly which SPNs are needed and create them manually using SetSPN or the tool of your choice.
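For the manual route, a small check can compare what is already registered against what the instance needs. The following minimal Python sketch parses setspn -l output (the listing is copied from earlier in this post). For each missing SPN, it builds a setspn -S command but does not run it. setspn -S adds an SPN after checking for duplicates (Windows Server 2008 and later). The static port 1435 and the account name DSDNET\sqlservice are hypothetical examples.

```python
# `setspn -l sqlservice` output, copied from the listing earlier in this post.
LISTING = r"""
C:\>setspn -l sqlservice
Registered ServicePrincipalNames for CN=SQL Service,OU=Services,DC=dsdnet,DC=local:
        MSSQLSvc/ASKJCTP3.dsdnet.local:MSSQLSERVER
        MSSQLSvc/ASKJCTP3.dsdnet.local:1433
        MSSQLSvc/ASKJCTP3.dsdnet.local
        MSSQLSvc/MYSQLCLUSTER.dsdnet.local:54675
        MSSQLSvc/MYSQLCLUSTER.dsdnet.local:SQL2K8
"""


def parse_setspn_list(output):
    """Return the SPNs in `setspn -l` output: lines containing '/' and no spaces."""
    spns = []
    for line in output.splitlines():
        line = line.strip()
        if "/" in line and " " not in line:
            spns.append(line)
    return spns


def missing_spn_commands(output, needed, account):
    """Build (not run) setspn -S commands for needed SPNs not yet registered.
    SPNs are compared case-insensitively, as Active Directory does."""
    registered = {spn.casefold() for spn in parse_setspn_list(output)}
    return [f"setspn -S {spn} {account}"
            for spn in needed if spn.casefold() not in registered]


if __name__ == "__main__":
    # Hypothetical: the clustered named instance moved to static port 1435.
    needed = ["MSSQLSvc/MYSQLCLUSTER.dsdnet.local:1435",
              "MSSQLSvc/MYSQLCLUSTER.dsdnet.local:SQL2K8"]
    for cmd in missing_spn_commands(LISTING, needed, r"DSDNET\sqlservice"):
        print(cmd)
```

After a move like this, the old dynamic-port SPN (:54675) is stale. setspn -D removes an SPN from an account.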

Even though we don’t provide the ability to set the port during setup, you can still modify the port settings for the instance through SQL Server Configuration Manager. This lets you create static SPNs and also helps with Firewall rules.

Adam W. Saxton | Microsoft SQL Server Escalation Services

http://twitter.com/awsaxton

I saw a lot of hits on the web when I searched for error message 18056 with State 29. I even saw two Microsoft Connect items filed for this issue against SQL Server 2008 instances:

http://connect.microsoft.com/SQL/feedback/ViewFeedback.aspx?FeedbackID=468478

http://connect.microsoft.com/SQLServer/feedback/details/540092/sql-server-2008-sp1-cu6-periodically-does-not-accept-connections

So, I thought it was high time we penned a blog post on when this message can be safely ignored and when it should raise alarm bells. Before I get into the nitty-gritty details, let me explain the condition under which 18056 is raised with state 29.

Most applications today use connection pooling to reduce the number of times a new connection needs to be opened to the backend database server. When the client application reuses a pooled connection to send a new request to the server, SQL Server performs certain operations to facilitate the connection reuse. If any exception occurs during this process (we shall call it Redo Login for this discussion), we report an 18056 error. As with the famous 18456 (Login Failed) error message, the state number gives us more insight into why the Redo Login task failed. State 29 occurs when an Attention is received from the client while the Redo Login code is executing. This is when you would see the message below, which has plagued many a mind to date on SQL Server 2008 instances:

2009-02-19 04:40:03.41 spid58 Error: 18056, Severity: 20, State: 29.

2009-02-19 04:40:03.41 spid58 The client was unable to reuse a session with SPID 58, which had been reset for connection pooling. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

Is this a harmful message?

The answer that always brings a smile to my face: It depends! Whether this error message is just plain noise or something that should send all the admins in the environment running helter-skelter can be summarized in one line.

If the above error message (note that the state number should be 29) is the only error in the SQL Server Errorlog, and no other errors are noticed in the environment (connectivity failures to the SQL instance in question, degraded performance, high CPU usage, Out of Memory errors), then this message can be treated as benign and safely ignored.

Why is this message there?

Well, our intentions here were noble, and we didn’t put the error message out there to create confusion. This error message is just reporting that a client was reusing a pooled connection and that, while the connection was being reset, the server received an attention (in this case, a client disconnect). This could be due to either a performance bottleneck on the server/environment or a plain application disconnect. The error message is aimed at helping troubleshoot the first category of problems. If you do see other issues at the same time, though, these errors may be an indicator of what is going on at the engine side.

What should you do when you see your Errorlog bloating with these error messages?

a. The foremost task is to scan the SQL Errorlog and determine whether this error message is preceded or followed by other error messages or warnings. Examples include Non-yielding messages and Out of Memory (OOM) errors (Error 701, Failed Allocate Pages, etc.).

b. The next action item would be to determine if there is high CPU usage on the server or any other resource bottleneck on the Windows Server. Windows Performance Monitor (Perfmon) would be your best friend here.

c. Lastly, check whether the network between the client and server is facing any latency issues or whether network packet drops are occurring frequently. A Netmon trace should help you here.
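Step a lends itself to a quick script. The following minimal Python sketch applies the rule of thumb above to a list of ERRORLOG lines. It returns "benign" only when the sole errors are 18056 with state 29 and none of the listed warnings appear. The warning patterns are an illustrative list, not an exhaustive one. The script only covers the Errorlog; steps b and c still need Perfmon and Netmon.

```python
import re

# Matches the "Error: N, Severity: S, State: T." lines in the ERRORLOG.
ERROR_LINE = re.compile(r"Error: (\d+), Severity: (\d+), State: (\d+)")
# Illustrative (not exhaustive) warning text to flag, compared in lowercase.
WARNING_PATTERNS = ("non-yielding", "failed allocate pages")


def triage_errorlog(lines):
    """Return "benign", "investigate", or "no 18056 state 29 entries"."""
    saw_18056_state_29 = False
    for line in lines:
        match = ERROR_LINE.search(line)
        if match:
            number, _severity, state = (int(g) for g in match.groups())
            if (number, state) == (18056, 29):
                saw_18056_state_29 = True
                continue
            return "investigate"  # some other error is present
        if any(pattern in line.lower() for pattern in WARNING_PATTERNS):
            return "investigate"  # e.g. a non-yielding scheduler warning
    return "benign" if saw_18056_state_29 else "no 18056 state 29 entries"


if __name__ == "__main__":
    excerpt = [
        "2009-02-19 04:40:03.41 spid58 Error: 18056, Severity: 20, State: 29.",
        "2009-02-19 04:40:03.41 spid58 The client was unable to reuse a session "
        "with SPID 58, which had been reset for connection pooling.",
    ]
    print(triage_errorlog(excerpt))  # prints: benign
```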

Tejas Shah

Escalation Engineer - Microsoft

To 1433 or not to 1433...that is the question

SharePoint Adventures : “The connection either timed out or was lost” with RS DataSource to SSAS

AdomdConnectionException using PerformancePoint hitting Analysis Services

Breaking Down 18065

States

Pooled Connections

So, no more State 29?

Now what?

SharePoint Adventures : When connectivity is not connectivity

SharePoint Adventures : Claims, Named Pipes and Kerberos

Getting a SharePoint List Data Source to work with Reporting Services Native Mode

TCP Chimney Offload – Possible Performance and Concurrency Impacts to SQL Server Workloads

Implicit Transaction Summary TCP Chimney Enabled

Implicit Transaction Summary TCP Chimney Disabled

RSWindowsNegotiate and 401.1 Error when using RS 2008

SQL 2005 JDBC Driver and Database Mirroring

How to troubleshoot leaked SqlConnection objects (.NET 2.0) - Part 1

How to troubleshoot leaked SqlConnection Objects (.NET 2.0) - Part 2

Dumping out the internal connection objects

Processing the internal connections

Searching for Duplicate SPN's got a little easier

When in doubt, Reboot!

Kerberos Configuration:

Kerberos Event Logging and Network Traces:

SSPIClient:

End Result:

Report Builder and Firewalls

SQL 2008 - New Functionality to the dm_os_ring_buffers for Connectivity Troubleshooting

‘Cannot Generate SSPI Context’ and Service Account Passwords

How to get a x64 version of Jet?

Deprecated MDAC/WDAC Components

What SPN do I use and how does it get there?

Error 18056 can be unwanted noise in certain scenarios