
Tags: Connectivity

    This message has come across my desk a couple of times in the last week, and when that happens I like to produce blog content.

    The error occurs when you try to use a pooled connection and the reset of the connection state encounters an error. Additional details are often logged in the SQL Server error log, but the 'failure ID' is the key to understanding where to go next.

    Event ID:           18056

    Description:     The client was unable to reuse a session with SPID 157, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

    Map the failure ID using the following list (failure ID states for SQL 2008 and SQL 2008 R2):

     

            Default = 1,

            GetLogin1,                    2

            UnprotectMem1,                3

            UnprotectMem2,                4

            GetLogin2,                    5

            LoginType,                    6

            LoginDisabled,                7

            PasswordNotMatch,             8

            BadPassword,                  9

            BadResult,                    10

            CheckSrvAccess1,              11

            CheckSrvAccess2,              12

     

            LoginSrvPaused,                  13

            LoginType,                       14

            LoginSwitchDb,                   15

            LoginSessDb,                     16            

            LoginSessLang,                   17

            LoginChangePwd,                  18

            LoginUnprotectMem,               19

     

            RedoLoginTrace,                  20

            RedoLoginPause,                  21

            RedoLoginInitSec,                22

            RedoLoginAccessCheck,            23

            RedoLoginSwitchDb,               24

            RedoLoginUserInst,               25

            RedoLoginAttachDb,               26

            RedoLoginSessDb,                 27     

            RedoLoginSessLang,               28

            RedoLoginException,              29             (Somewhat generic, but you can use sys.dm_os_ring_buffers to help track down the source, and perhaps -y)

     

            ReauthLoginTrace,                30

            ReauthLoginPause,                31

            ReauthLoginInitSec,              32

            ReauthLoginAccessCheck,          33

            ReauthLoginSwitchDb,             34

            ReauthLoginException,            35

                               Login assignments from master

            LoginSessDb_GetDbNameAndSetItemDomain,           36

            LoginSessDb_IsNonShareLoginAllowed,              37

            LoginSessDb_UseDbExplicit,                       38

            LoginSessDb_GetDbNameFromPath,                   39

            LoginSessDb_UseDbImplicit,                       40      (I can cause this by changing the default database for the login at the server)

            LoginSessDb_StoreDbColl,                         41

            LoginSessDb_SameDbColl,                          42

            LoginSessDb_SendLogShippingEnvChange,            43

     

                                    Connection string values

     

            RedoLoginSessDb_GetDbNameAndSetItemDomain,       44

            RedoLoginSessDb_IsNonShareLoginAllowed,          45

            RedoLoginSessDb_UseDbExplicit,                   46      (The database specified in the connection string, e.g. Database=XYX, no longer exists)

            RedoLoginSessDb_GetDbNameFromPath,               47

            RedoLoginSessDb_UseDbImplicit,                   48

            RedoLoginSessDb_StoreDbColl,                     49

            RedoLoginSessDb_SameDbColl,                      50

            RedoLoginSessDb_SendLogShippingEnvChange,        51  

      

                                    Common Windows API calls

     

            ImpersonateClient,                            52

            RevertToSelf,                                 53

            GetTokenInfo,                                 54

            DuplicateToken,                               55

            RetryProcessToken,                            56

            inChangePwdErr,                               57

            WinAuthOnlyErr,                               58

     

    Error: 18056, Severity: 20, State: 46.

    The client was unable to reuse a session with SPID 1971, which had been reset for connection pooling. The failure ID is 46. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

    State 46 = x_elfRedoLoginSessDb_UseDbExplicit = 0n46

     

    There is only one place in the code that sets this state (we are simply trying to execute a use-database operation and it fails), and it runs after we have sent message 4060 to the client, saying that we could not open the database or that the user does not have permission to the database. Since there are no messages about a database going offline or being recovered, and this connection was already established, the question becomes: "Were there any permission changes at this time that would prevent this login from accessing the database?"

     

    I tried this with a test application.

     

    Connection pool using database dbTest

    User RDORRTest with default database dbTest

     

    When I drop the user from the database dbTest, the client starts getting the errors I expected to see.

     

    07/28/10 07:56:45.391 [0x00001E5C] SQLState: 28000, Native Error: 18456 [Microsoft][SQL Server Native Client 10.0][SQL Server]Login failed for user 'RDORRTest'.

    07/28/10 07:56:45.410 [0x00001E5C] SQLState: 42000, Native Error: 4064 [Microsoft][SQL Server Native Client 10.0][SQL Server]Cannot open user default database. Login failed.

     

    My SQL Server error log shows:

     

    2010-07-28 08:02:40.41 Logon       Error: 18456, Severity: 14, State: 50.

    2010-07-28 08:02:40.41 Logon       Login failed for user 'RDORRTest'. Reason: Current collation did not match the database's collation during connection reset.

    2010-07-28 08:02:40.41 spid53      Error: 18056, Severity: 20, State: 50.

    2010-07-28 08:02:40.41 spid53      The client was unable to reuse a session with SPID 53, which had been reset for connection pooling. The failure ID is 50. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

     

    A password change for the login at the server generated state 8.

    If I rename the database, I don't get any information about the rename in the error log, and I start getting connection failures.

     

    All my attempts so far had used a login set up with a default database. However, to reach the state 46 condition, I had to specify the DATABASE in the connection string.

     

    Now all I had to do was drop the user from the database, and I got state 46.

     

    2010-07-28 08:29:51.61 Logon       Error: 18456, Severity: 14, State: 46.

    2010-07-28 08:29:51.61 Logon       Login failed for user 'RDORRTest'. Reason: Failed to open the database configured in the login object while revalidating the login on the connection. [CLIENT: 65.53.66.207]

     

    When I added the user back, I no longer got the error, and the connections continued their work.

     

    Bob Dorr - Principal SQL Server Escalation Engineer


    I have received several questions on my blog post http://blogs.msdn.com/b/psssql/archive/2010/08/03/how-it-works-error-18056-the-client-was-unable-to-reuse-a-session-with-spid-which-had-been-reset-for-connection-pooling.aspx related to SQL Server 2008 honoring a query cancel (attention) during the processing of the reset connection. This post augments my prior post.

    Facts

    • You will not see sp_reset_connection on the wire when tracing network packets. It is only a bit set in the TDS header, not RPC text in the packet.
    • sp_reset_connection is an internal operation and generates RPC events to show its activity.
    • Newer builds of SQL Server added logical disconnect and connect events. http://blogs.msdn.com/b/psssql/archive/2007/03/29/sql-server-2005-sp2-trace-event-change-connection-based-events.aspx
    • An attention from the client (explicit cancel or query timeout) records the time it arrives (out-of-band), but the attention event is not produced until the query has ceased execution and honored the attention. This makes the start time of the attention the time it was received, the end time the time it was fully honored, and the duration how long it took to interrupt execution, handle any necessary rollback operations, and return control of the session to the client.
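    To make the first and last facts concrete: the reset request is only a flag in the 8-byte TDS packet header, and an attention is a packet type of its own. The Python sketch below decodes a header. The constants reflect my reading of the MS-TDS specification (Status 0x08 RESETCONNECTION, 0x10 RESETCONNECTIONSKIPTRAN, packet type 0x06 for attention) and should be checked against the spec before you rely on them. The header bytes in the example are made up.

```python
import struct

# Assumed header layout (8 bytes; Length and SPID in network byte order):
#   Type(1) Status(1) Length(2) SPID(2) PacketID(1) Window(1)
PACKET_TYPE_SQL_BATCH = 0x01
PACKET_TYPE_RPC = 0x03
PACKET_TYPE_ATTENTION = 0x06
STATUS_EOM = 0x01
STATUS_RESETCONNECTION = 0x08
STATUS_RESETCONNECTIONSKIPTRAN = 0x10
HEADER = struct.Struct(">BBHHBB")


def parse_tds_header(data: bytes) -> dict:
    """Decode a TDS packet header and flag reset and attention packets."""
    if len(data) < HEADER.size:
        raise ValueError("a TDS packet header is 8 bytes")
    ptype, status, length, spid, packet_id, _window = HEADER.unpack_from(data)
    return {
        "type": ptype,
        "length": length,
        "spid": spid,
        "packet_id": packet_id,
        "end_of_message": bool(status & STATUS_EOM),
        # Either reset flag asks the server to run the sp_reset_connection logic.
        "reset_connection": bool(status & (STATUS_RESETCONNECTION | STATUS_RESETCONNECTIONSKIPTRAN)),
        "attention": ptype == PACKET_TYPE_ATTENTION,
    }


if __name__ == "__main__":
    # Made-up headers: a SQL batch with EOM + RESETCONNECTION, then an attention.
    batch = bytes([PACKET_TYPE_SQL_BATCH, STATUS_EOM | STATUS_RESETCONNECTION,
                   0x00, 0x40, 0x00, 0x35, 0x01, 0x00])
    attention = bytes([PACKET_TYPE_ATTENTION, STATUS_EOM, 0x00, 0x08, 0x00, 0x35, 0x01, 0x00])
    print(parse_tds_header(batch))
    print(parse_tds_header(attention))
```

    A packet capture shows only this status bit, with no "sp_reset_connection" text. That is why the reset is visible through the RPC trace events but not on the wire.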

    The questions normally center on Error 18056, State 29, and how one can encounter it. I have outlined the high-level flow that produces the error in the diagram below.

    The application reuses a connection from the pool. When this occurs, the client driver sets the reset bit in the TDS header when the next command is executed. In the diagram, I used an ODBC example of SQLExecDirect.

    1. The command is received at the SQL Server, assigned to a worker, and begins processing. If the reset bit is set, the sp_reset_connection logic is invoked.
    2. When tracing, the RPC:Starting and logical Disconnect events are produced.
    3. The login is redone: permissions are checked, and other validations confirm that the password has not expired, the database still exists and is online, the user has permission in the database, and so on.
    4. The client explicitly cancels (SQLCancel), or the client driver detects a query timeout, and an attention is submitted to the SQL Server. SQL Server reads the attention, captures the starting time, and notifies the session of the cancel request: STOP! (Note: This is often a point of confusion. In this scenario, the overall query timeout applies to both the reset login and the execution of the query.)
    5. During all these checks, the logic also checks whether a query cancellation (attention) has arrived. If so, the Redo Login processing is interrupted, the 18056 is reported, and processing stops.
    6. The attention event is always produced after the completed event. (When looking at a trace, look for the attention event after the completed event to determine whether the execution was cancelled.) This allows the attention event to show the duration required to honor the attention. For example, if SET XACT_ABORT is enabled, an attention is upgraded to a rollback of the transaction. If it was a long-running transaction, the rollback processing could be significant. Without SET XACT_ABORT, the attention interrupts processing as quickly as possible and leaves the transaction active. The client is then responsible for the scope of the transaction.

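    [Diagram: high-level flow of a pooled-connection reset, showing Redo Login interrupted by an attention and Error 18056 being reported]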

    The "If Cancelled" check used by Redo Login is where the behavior changed between SQL 2005 and SQL 2008. SQL 2005 did not check for the cancel as frequently, so it was not honored until command execution started. SQL Server 2008 honors the attention during the Redo Login processing.
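    The difference can be pictured with a toy model: the validation steps run in order, and the only question is whether the cancel check runs between them. The Python sketch below is a hypothetical illustration of the diagram, not SQL Server's code. The step names and the check_cancel_during_redo switch (True approximating 2008, False approximating 2005) are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Illustrative validation steps performed while redoing the login on a reset.
REDO_LOGIN_STEPS = [
    "check login permissions",
    "check password has not expired",
    "check database exists and is online",
    "check user has permission in the database",
]


def reset_pooled_connection(
    attention_after_steps: Optional[int], check_cancel_during_redo: bool
) -> Tuple[str, List[str]]:
    """Toy model of Redo Login followed by command execution.

    attention_after_steps: how many validation steps finish before the attention
        arrives, or None if the client never cancels.
    check_cancel_during_redo: True approximates SQL Server 2008 (the "If Cancelled"
        check runs during Redo Login); False approximates SQL Server 2005.
    """
    completed: List[str] = []

    def attention_pending() -> bool:
        return attention_after_steps is not None and len(completed) >= attention_after_steps

    for step in REDO_LOGIN_STEPS:
        if check_cancel_during_redo and attention_pending():
            # Redo Login is interrupted: error 18056 with failure ID 29 (RedoLoginException).
            return "18056 state 29: Redo Login interrupted", completed
        completed.append(step)

    if attention_pending():
        # Without the in-flight check, the attention is honored once the command starts.
        return "attention honored at start of command execution", completed
    return "command executed", completed


if __name__ == "__main__":
    print(reset_pooled_connection(2, check_cancel_during_redo=True))
    print(reset_pooled_connection(2, check_cancel_during_redo=False))
```

    In both cases the cancel is honored. The difference is whether it surfaces as an 18056 during the reset or as an interrupted query.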

    Here is an example I received that shows the behavior. Notice that the execution (rs.Open) is done asynchronously, so control returns to the client as soon as the query is put on the wire to the SQL Server. The cn.Cancel following the rs.Open submits the attention for the request that is traveling to the SQL Server. This produces the same pattern as shown in the diagram above, interrupting the Redo Login. If you were not using pooled connections, the reset activity would not take place and the query itself would be interrupted.

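    Dim cn, rs, i

    Set cn = CreateObject("ADODB.Connection")
    Set rs = CreateObject("ADODB.Recordset")

    For i = 1 To 1000
        ' After the first pass, Open typically gets a pooled session, so the next command carries the reset bit
        cn.Open "Provider=SQLNCLI10;Integrated Security=SSPI;Data Source=SQL2K8Server;Initial Catalog=whatever;"
        Set rs.ActiveConnection = cn
        rs.CursorLocation = 2                      ' adUseServer
        ' 0 = adOpenForwardOnly, 1 = adLockReadOnly, 48 = adAsyncExecute (16) + adAsyncFetch (32)
        rs.Open "select * from whatever", cn, 0, 1, 48
        cn.Cancel                                  ' attention sent while the reset (Redo Login) may still be running
        cn.Close                                   ' returns the session to the pool
    Next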

    Internally, an attention is raised as a 3617 error and handled by the SQL Server error handlers to stop execution of the request. You can see the 3617 errors in sys.dm_os_ring_buffers, and you can also watch them with the trace Exception events.

    <Record id= "1715" type="RING_BUFFER_EXCEPTION" time="12558630"><Exception><Task address= 0x11B4D1B88</Task><Error>3617</Error><Severity>25</Severity><State>23</State><UserDefined>0</UserDefined></Exception><Stack
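    If you collect these records, a small parser makes it easy to pick out the attentions. The Python sketch below assumes records shaped like the one above. Its sample was made well-formed by hand (Task address quoted, elements closed, truncated Stack omitted), so treat the exact layout as an assumption. The T-SQL string is the kind of query you might run to fetch the raw records. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# T-SQL to fetch raw exception records (kept as a string for reference).
RING_BUFFER_QUERY = """
SELECT CAST(record AS XML) AS record
FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type = N'RING_BUFFER_EXCEPTION';
"""

ATTENTION_ERROR = 3617  # an attention is raised internally as error 3617


def parse_exception_record(record_xml: str) -> dict:
    """Pull id, time, error, severity and state out of one exception record."""
    root = ET.fromstring(record_xml)
    exc = root.find("Exception")
    if exc is None:
        raise ValueError("record has no <Exception> element")
    return {
        "id": int(root.get("id")),
        "time": int(root.get("time")),
        "error": int(exc.findtext("Error")),
        "severity": int(exc.findtext("Severity")),
        "state": int(exc.findtext("State")),
    }


def attention_records(records):
    """Keep only records that represent an honored attention (error 3617)."""
    parsed = (parse_exception_record(r) for r in records)
    return [p for p in parsed if p["error"] == ATTENTION_ERROR]


# Hand-made, well-formed sample modeled on the truncated record above.
SAMPLE = (
    '<Record id="1715" type="RING_BUFFER_EXCEPTION" time="12558630">'
    '<Exception><Task address="0x11B4D1B88"></Task><Error>3617</Error>'
    '<Severity>25</Severity><State>23</State><UserDefined>0</UserDefined>'
    '</Exception></Record>'
)

if __name__ == "__main__":
    print(attention_records([SAMPLE]))
```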

    Bob Dorr - Principal SQL Server Escalation Engineer


    Previous Post: SharePoint Adventures : Setting up Reporting Services with SharePoint Integration

    In the previous post, I walked through getting RS 2008 R2 integrated with SharePoint 2010. What I didn't cover was how to get this working with Kerberos. Kerberos itself can be complicated, partly because you need to track so many things, and as the deployment becomes more distributed, you have to track even more.

    A while back, I posted a blog post describing my Kerberos Checklist. I'll use this as we step through my SharePoint deployment to get Kerberos working in this environment.

    Before we get into the details, there is one piece I want to point out that is special to SharePoint 2010. The authentication model you select for your site makes a big difference in whether this will work. SharePoint lets you choose between Classic and Claims. If you choose a Claims site, you will not be able to get Kerberos to work with RS 2008 R2 integrated with SharePoint 2010, and once a site is Claims based, you can't change it back. Part of the reason Kerberos won't work is that when we detect a Claims site, we always use Trusted Authentication from the RS perspective. This means a Windows token is not passed from SharePoint to the Report Server; an SPUser token is passed instead. My next post will go into how you can determine whether your site is Classic or Claims.

    That being said, let's dive in…

    For this setup, I only have 3 servers involved.

    Component            Server Name      Service Account          Delegation Required   Custom App Settings?
    SharePoint           DSDContosoSP     DSDCONTOSO\spservice     Yes                   Yes
    Reporting Services   DSDContosoRS     DSDCONTOSO\rsservice     Yes                   Yes
    SQL Server           DSDContosoSQL    DSDCONTOSO\sqlservice    No                    No

    SharePoint

    Service Principal Name (SPN)

    Because we are going to use Kerberos, we need some SPNs. In the table above, the SharePoint service account is DSDCONTOSO\spservice. This means those SPNs need to go on that user account (DSDCONTOSO\spservice) and not on the machine account (DSDContosoSP). Had we been using LocalSystem or Network Service, the SPNs would have gone on the machine account. That's always how you figure out where the SPNs go: it's based on the context the service is running under.
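    The rule in the previous paragraph is mechanical enough to write down. The Python sketch below encodes it under a few assumptions: the set of built-in account spellings is not exhaustive, the helper name spn_owner is hypothetical, and the machine account is returned in its DOMAIN\NAME$ form (SETSPN also accepts the plain computer name).

```python
# Built-in accounts whose SPNs belong on the computer account (case-insensitive).
MACHINE_CONTEXT_ACCOUNTS = {
    "localsystem",
    "local system",
    "nt authority\\system",
    "networkservice",
    "network service",
    "nt authority\\network service",
}


def spn_owner(service_account: str, domain: str, machine_name: str) -> str:
    """Return the AD account an SPN should be registered on."""
    if service_account.strip().lower() in MACHINE_CONTEXT_ACCOUNTS:
        # Machine context: the computer account, whose account name ends in '$'.
        return f"{domain}\\{machine_name}$"
    # Domain user context: the SPN goes on that user account.
    return service_account


if __name__ == "__main__":
    print(spn_owner(r"DSDCONTOSO\spservice", "DSDCONTOSO", "DSDContosoSP"))
    print(spn_owner("Network Service", "DSDCONTOSO", "DSDContosoSP"))
```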

    So, let's have a look at DSDCONTOSO\spservice. We'll use SETSPN to do that. Starting with Windows 2008, SETSPN ships with the operating system and is available directly from a command prompt.

    [Screenshot: SETSPN -L output for the spservice account, run with and without the domain name]

    We can see that there are no SPNs registered on the spservice account. You'll also notice that I can run SETSPN with or without the domain name, because I only have a single domain. If you had multiple domains, supplying the domain name tells SETSPN which domain's account you want to modify, which is especially helpful if you have the same account name in multiple domains. Going forward, I will leave out the domain name.

    As a quick check, I also look at the Machine Account (DSDContosoSP). This is just a double check to make sure I won't run into a duplicate situation.

    [Screenshot: SETSPN -L output for the DSDContosoSP machine account]

    Because SharePoint is a web application, we are interested in the HTTP SPN. We do not see one listed on the machine account. Take note of the HOST SPN, though. It will be found on any machine account, because it gets created when the machine account is created; you should never see it on a user account. The HOST SPN covers the HTTP service if you are running within the context of the machine account (LocalSystem or Network Service). Had that been the case, we wouldn't have needed an HTTP SPN. But because we are using the spservice user account, we need to put an HTTP SPN on the spservice account.

    NOTE: Domain Admin permissions are required to add (-a) or delete (-d) an SPN. Anyone can list (-l) out an SPN.

    [Screenshot: adding the two HTTP SPNs to the spservice account with SETSPN -A]

    We added two SPNs to the spservice account. HTTP/dsdcontososp and HTTP/dsdcontososp.dsdcontoso.local. HTTP SPNs are based on the URL that you are going to use. In this case we are just using the machine name for the URL (http://dsdcontososp). Internet Explorer will convert this to the Fully Qualified Domain Name (FQDN) when it builds out the required SPN. So the SPN request for that URL would be HTTP/dsdcontososp.dsdcontoso.local. And we have just added that.

    NOTE: HTTP SPNs should NOT include a port. They are based purely on the host within the URL, without the port number. This means that if you have two sites running on different ports of the same host, you should use the same service account for both, because they share an SPN. For example, http://dsdcontososp and http://dsdcontososp:5555 both use the SPN HTTP/dsdcontososp.

    What about the NetBIOS SPN? I always add it for good measure. Hopefully it will never be needed, but on the off chance that the name lookup fails, we will be covered. To get the FQDN, we do a reverse name lookup, which has to go out to the DNS server. If for whatever reason the DNS lookup fails, we just use the NetBIOS name, so the SPN would look like HTTP/dsdcontososp. Adding both means I won't be hit by intermittent DNS issues, and end users won't be interrupted unnecessarily. So, my take is to always add both the NetBIOS and FQDN SPNs.
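    Putting the URL rule, the no-port rule, and the NetBIOS-plus-FQDN advice together, the Python sketch below derives both HTTP SPNs from a site URL and prints the matching SETSPN commands. It assumes that a single-label host is a NetBIOS name in the given DNS domain and that an FQDN's first label matches the NetBIOS name. The helper names are hypothetical.

```python
from typing import List
from urllib.parse import urlsplit


def http_spns(url: str, dns_domain: str) -> List[str]:
    """Return the NetBIOS and FQDN HTTP SPNs for a site URL (port ignored)."""
    host = urlsplit(url).hostname  # .hostname drops any :port and lowercases
    if not host:
        raise ValueError(f"no host in URL: {url!r}")
    if "." in host:
        netbios, fqdn = host.split(".", 1)[0], host
    else:
        netbios, fqdn = host, f"{host}.{dns_domain.lower()}"
    return [f"HTTP/{netbios}", f"HTTP/{fqdn}"]


def setspn_add_commands(spns: List[str], account: str) -> List[str]:
    """Build SETSPN -A command lines (run as a Domain Admin) for each SPN."""
    return [f"setspn -A {spn} {account}" for spn in spns]


if __name__ == "__main__":
    for url in ("http://dsdcontososp", "http://dsdcontososp:5555"):
        print(url, http_spns(url, "dsdcontoso.local"))
    spns = http_spns("http://dsdcontososp", "dsdcontoso.local")
    for cmd in setspn_add_commands(spns, r"DSDCONTOSO\spservice"):
        print(cmd)
```

    Both URLs yield the same pair of SPNs, which is why two sites on different ports of one host must share a service account.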

    If we look at the spservice account now, we will see both SPNs that we added.

    [Screenshot: SETSPN -L output showing both HTTP SPNs on the spservice account]

    Delegation

    We know that we are going from the SharePoint server to the Report Server. In order for credentials to be forwarded from SharePoint to RS, we need to give the Service Account permission to delegate. By default, this is disabled.

    NOTE: Domain Admin permissions are required to modify delegation settings on an account.

    [Screenshot: Delegation tab of the spservice account]

    NOTE: The Delegation tab is only visible if SPNs are present on the account.

    The Delegation Tab of the account is where we will find these settings. Here is how the options break down:

    • Do not trust this user for delegation: No trust. We cannot delegate.
    • Trust this user for delegation to any service (Kerberos only): Full trust. We can delegate to any service.
    • Trust this user for delegation to specified services only: Constrained delegation. Requires you to list the services we can delegate to in the list below the radio buttons.
        • Use Kerberos only: Constrained delegation with the Kerberos protocol only.
        • Use any authentication protocol: Constrained delegation with protocol transitioning. Useful on the Claims side of things.

    For this example, I'm not going to go into the Constrained Delegation side of things. I will do that in a later post. We will just stick with Full Trust.

    So, I select "Trust this user for delegation to any service (Kerberos only)". Keep in mind that Constrained Delegation is the more secure option, but it also comes with its own restrictions. Stay tuned for more information about that.

    [Screenshot: "Trust this user for delegation to any service (Kerberos only)" selected for spservice]

    SharePoint Settings

    The SPN and delegation settings are really the basic Kerberos settings needed for any application. However, SharePoint has some app specific settings that we need to pay attention to. For this, we will head over to SharePoint's Central Admin Site.

    We will go to Application Management and then to Manage Web Applications.

    [Screenshot: Application Management > Manage Web Applications in Central Admin]

    Select the site you are interested in (in my case, the SharePoint - 80 site), and then click Authentication Providers.

    [Screenshot: Authentication Providers for the SharePoint - 80 site]

    Click on Default.

    [Screenshot: authentication settings for the Default zone]

    We want to choose "Negotiate (Kerberos)" and then hit "Save".

    [Screenshot: "Negotiate (Kerberos)" selected]

    This configures the SharePoint site to use Negotiate, which will always attempt Kerberos first if an SPN is available. We can test whether this is working properly by going back to the SharePoint site. It should come up as normal, without any credential prompts or 401.1 errors. If you do see either, something isn't right.

    However, at this point our reports are expected to stop working. The underlying error will be a 401.1 against the Report Server, because it hasn't been set up for Kerberos yet.

    [Screenshot: report failing with HTTP 401: Unauthorized]

    In the SharePoint ULS Log we will see:

    02/21/2011 08:16:46.68         w3wp.exe (0x0F44)         0x0F68        SQL Server Reporting Services         UI Pages         aacz        High         Web part failed in SetParamPanelVisibilityForParamAreaContent: System.Net.WebException: The request failed with HTTP status 401: Unauthorized.
    at Microsoft.Reporting.WebForms.Internal.Soap.ReportingServices2005.Execution.RSExecutionConnection.GetSecureMethods()
    at Microsoft.Reporting.WebForms.Internal.Soap.ReportingServices2005.Execution.RSExecutionConnection.IsSecureMethod(String methodname)
    at Microsoft.Reporting.WebForms.Internal.Soap.ReportingServices2005.Execution.RSExecutionConnection.SetConnectionSSLForMethod(String methodname)
    at Microsoft.Reporting.WebForms.Internal.Soap.ReportingServices2005.Execution.RSExecutionConnection.ProxyMethodInvocation.Execute[TReturn](RSExecutionConnection connection, ProxyMethod`1 initialMethod, ProxyMethod`1 retryMethod)
    at Microso...        23c6017c-3d37-4b70-b378-d5dd875518f6

    Which brings us to the next stop in our journey…

    Reporting Services

    Service Principal Name (SPN)

    Our service account for Reporting Services is rsservice, not Network Service, so the SPNs will go on the rsservice account itself. Reporting Services is also a web application, so we are still working with an HTTP SPN. Let's check what is on the service account and the machine account.

    [Screenshot: SETSPN -L output for the rsservice account]

    [Screenshot: SETSPN -L output for the DSDContosoRS machine account]

    Everything looks good here. Again, HTTP SPNs are URL based, so we will create the SPN based on the URL, which you can get from the Web Service URL tab of the Reporting Services Configuration Manager.

    [Screenshot: Web Service URL tab in Reporting Services Configuration Manager]

    Based on that, our SPNs will be the following - HTTP/dsdcontosors and HTTP/dsdcontosors.dsdcontoso.local

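    [Screenshot: adding HTTP/dsdcontosors and HTTP/dsdcontosors.dsdcontoso.local to the rsservice account with SETSPN -A]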

    And doing a listing of the rsservice account, we should see two SPNs on it.

    [Screenshot: SETSPN -L output showing both HTTP SPNs on the rsservice account]

    Reporting Services Settings

    I'm covering the settings before delegation to show that delegation may not be needed. However, there is a Reporting Services setting that is required for Kerberos to work against Reporting Services. It resides in the rsreportserver.config file, which by default is found at:

    C:\Program Files\Microsoft SQL Server\MSRS10_50.<instance name>\Reporting Services\ReportServer

    The setting we are interested in is AuthenticationTypes. Depending on how Reporting Services was set up, your current value may differ from mine.

    <Authentication>
        <AuthenticationTypes>
            <RSWindowsNTLM/>
        </AuthenticationTypes>
        <RSWindowsExtendedProtectionLevel>Off</RSWindowsExtendedProtectionLevel>
        <RSWindowsExtendedProtectionScenario>Proxy</RSWindowsExtendedProtectionScenario>
        <EnableAuthPersistence>true</EnableAuthPersistence>
    </Authentication>

    For mine, I see RSWindowsNTLM under AuthenticationTypes. This is because when I first set up Reporting Services, I used a domain account instead of the default Network Service, and doing so defaults the setting to RSWindowsNTLM. Had I chosen Network Service as the account during setup, this setting would have been RSWindowsNegotiate, and you could later change the service to a domain account without this setting changing.

    All I need to do for mine to get Kerberos working is to change it over to RSWindowsNegotiate. You can either add it on top of RSWindowsNTLM or replace RSWindowsNTLM.

    NOTE: RSWindowsNegotiate is specific to Internet Explorer. Other browsers may need RSWindowsKerberos instead. You will need to test that to see what works best for your configuration.

    In my case, I just added it on top of RSWindowsNTLM:

    <Authentication>
        <AuthenticationTypes>
            <RSWindowsNegotiate/>
            <RSWindowsNTLM/>
        </AuthenticationTypes>
        <RSWindowsExtendedProtectionLevel>Off</RSWindowsExtendedProtectionLevel>
        <RSWindowsExtendedProtectionScenario>Proxy</RSWindowsExtendedProtectionScenario>
        <EnableAuthPersistence>true</EnableAuthPersistence>
    </Authentication>
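    If you need to make this change on several servers, it can be scripted. The Python sketch below adds RSWindowsNegotiate ahead of the existing entries, as I did by hand. It is a sketch, not a supported tool. ElementTree drops XML comments by default and does not indent the new element, so back up rsreportserver.config and review the output before replacing the real file. The helper name enable_negotiate is hypothetical, and the test input wraps the fragment in an assumed <Configuration> root.

```python
import xml.etree.ElementTree as ET


def enable_negotiate(config_xml: str) -> str:
    """Add <RSWindowsNegotiate/> ahead of the existing authentication types."""
    root = ET.fromstring(config_xml)
    # iter() also checks the root itself, so a bare <AuthenticationTypes> works too.
    types = next(root.iter("AuthenticationTypes"), None)
    if types is None:
        raise ValueError("no <AuthenticationTypes> element found")
    if types.find("RSWindowsNegotiate") is None:
        # Insert first so Negotiate is listed on top of RSWindowsNTLM.
        types.insert(0, ET.Element("RSWindowsNegotiate"))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    fragment = ("<Configuration><Authentication><AuthenticationTypes>"
                "<RSWindowsNTLM/></AuthenticationTypes></Authentication></Configuration>")
    print(enable_negotiate(fragment))
```

    The function is idempotent: running it on an already-updated file leaves the list unchanged.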

    At this point, my Hello World report should come up OK, because it doesn't use any data sources.

    [Screenshot: Hello World report rendering successfully]

    However, the report that does use a data source still fails, although with a different message this time.

    [Screenshot: report with a data source failing with a different error message]

    In the ULS log, we won't see an error by default, because the Reporting Services Monitoring trace points have not been enabled within Central Admin. The error itself will be "Login failed for user 'NT AUTHORITY\ANONYMOUS'". That error comes directly from SQL Server, whereas the 401.1 errors were web-related errors.

    Delegation

    In order for Reporting Services to forward credentials to a back-end data source, we need to enable delegation on its service account. Data sources are processed within the Report Server Windows service, not SharePoint, so the SharePoint settings don't help us here.

    We will do what we did with the SharePoint account and enable Full Trust for the Reporting Services account.

    [Screenshot: Full Trust delegation enabled on the rsservice account]

    This by itself is not enough to get our report with the data source working, though. It just allows Reporting Services to forward the user's credentials to another service. The service we are forwarding to still needs to be set up properly. In this example, we are forwarding to SQL Server, which does not have its SPN configured yet, so we will still fail with a Login Failed message from SQL. Reporting Services itself should be good to go at this point, though.

    Which brings us to the last stop in our journey…

    SQL Server

    Service Principal Name (SPN)

    I have previously written a blog post about SQL Server SPNs: "What SPN do I use and how does it get there?"

    It covers how SQL Server can manage its own SPNs for you, and which SPN is needed based on the protocol you are connecting with. I won't go through all the details again here, so I will make a few assumptions.

    First, that SQL Server's ability to manage its own SPNs is not working, because I'm using a domain account and haven't given it the permissions necessary for that. You can verify this in the SQL ERRORLOG:

    2011-02-21 08:58:01.40 Server The SQL Server Network Interface library could not register the Service Principal Name (SPN) for the SQL Server service. Error: 0x2098, state: 15. Failure to register an SPN may cause integrated authentication to fall back to NTLM instead of Kerberos. This is an informational message. Further action is only required if Kerberos authentication is required by authentication policies.

    Second, that I'm going to be connecting with the TCP protocol and not Named Pipes.

    The SQL Service is using the sqlservice account. And because we are using the TCP Protocol, the SPN will need the port number. In this case, it is a default instance, so we know the port will be 1433. So, our SPN will look like the following for SQL - MSSQLSvc/dsdcontososql:1433 and MSSQLSvc/dsdcontososql.dsdcontoso.local:1433. You'll notice I'm doing both the NETBIOS and FQDN SPNs here. It is the same reason as with the HTTP SPN. In this case, the SQL Client connectivity components will do a reverse lookup on the server name to try and resolve the FQDN. So, with everything working as it should, it should always try to get the FQDN SPN even if you supply the NETBIOS server name in the connection string.

    The SPNs for SQL Server are derived from the connection string the client is using. The client, in this case, is Reporting Services. Reporting Services is a .NET application, so it uses SqlClient to connect to SQL.

    We can see that there are no SQL SPNs registered on the service account or the machine account.
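    For TCP connections, the SQL Server SPN follows the pattern MSSQLSvc/host:port, and both the NetBIOS and FQDN forms are registered for the same reason as with HTTP. The Python sketch below builds the pair. It assumes TCP, a default instance on 1433 unless another port is passed, and that a single-label server name belongs to the given DNS domain. The helper name is hypothetical.

```python
from typing import List


def mssql_tcp_spns(server: str, dns_domain: str, port: int = 1433) -> List[str]:
    """NetBIOS and FQDN MSSQLSvc SPNs for a TCP connection to SQL Server."""
    server = server.lower()  # SPNs compare case-insensitively
    if "." in server:
        netbios, fqdn = server.split(".", 1)[0], server
    else:
        netbios, fqdn = server, f"{server}.{dns_domain.lower()}"
    return [f"MSSQLSvc/{netbios}:{port}", f"MSSQLSvc/{fqdn}:{port}"]


if __name__ == "__main__":
    for spn in mssql_tcp_spns("DSDContosoSQL", "dsdcontoso.local"):
        print(f"setspn -A {spn} DSDCONTOSO\\sqlservice")
```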

    [Screenshot: SETSPN -L output for the sqlservice account and the DSDContosoSQL machine account]

    So, let's go ahead and add the SPNs.

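    [Screenshot: adding the two MSSQLSvc SPNs to the sqlservice account with SETSPN -A]

    [Screenshot: SETSPN -L output showing both MSSQLSvc SPNs on the sqlservice account]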

    Everything looks good on the SPN front. For good measure, you may want to use SETSPN to search for duplicates with the -X switch, a feature added in Windows 2008. It searches the entire domain for duplicates. You should never have a duplicate SPN, as it will cause errors.
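    Conceptually, a duplicate check groups every SPN by the accounts it is registered on and flags any SPN with more than one owner. The Python sketch below illustrates that idea on an in-memory inventory. It is not how SETSPN -X is implemented, and the sample accounts in the usage are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_duplicate_spns(spns_by_account: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Return {spn: [accounts]} for every SPN registered on more than one account."""
    owners = defaultdict(set)
    for account, spns in spns_by_account.items():
        for spn in spns:
            owners[spn.lower()].add(account)  # SPN comparison is case-insensitive
    return {spn: sorted(accounts) for spn, accounts in owners.items() if len(accounts) > 1}


if __name__ == "__main__":
    # Hypothetical inventory: the same SQL SPN on both the user and machine accounts.
    inventory = {
        r"DSDCONTOSO\sqlservice": ["MSSQLSvc/dsdcontososql:1433",
                                   "MSSQLSvc/dsdcontososql.dsdcontoso.local:1433"],
        r"DSDCONTOSO\DSDContosoSQL$": ["MSSQLSvc/DSDContosoSQL:1433"],
    }
    print(find_duplicate_spns(inventory))
```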

    [Screenshot: SETSPN -X output showing no duplicate SPNs]

    It looks like we do not have any duplicate SPNs. At this point, the report with a data source pointing to SQL Server should run OK, because there are no application-specific settings to configure for SQL Server beyond the SPN.

    [Screenshot: report with a SQL Server data source rendering successfully]

    NOTE: Depending on how you approached the setup, you may still encounter an error because failed Kerberos requests may still be cached. You can either wait for the cache to clear or restart the services. On my box, I had to recycle SharePoint and Reporting Services, as well as log off and back on (or just run klist purge on the client).

    Delegation

    For your back end server, you may not need to enable delegation. If the hops stop with this server, then we are done and do not need delegation. However, if this backend server will be continuing on to another service, then delegation will be necessary if it will try to forward the windows user credential.

    A great example of this with SQL Server is the use of a Linked Server. However, just the fact that you have a Linked Server doesn't mean that you need delegation. It is dependent on how you configure authentication on the Linked Server.

    [Screenshot: Linked Server security options, including "Be made using the login's current security context"]

    If "Be made using the login's current security context" is selected for the Linked Server, then we will need to enable delegation for the SQL Service account.

    There are also other features that may require delegation from SQL Server. SQLCLR might, depending on what you are doing. The general rule of thumb: if anything within SQL Server reaches out to another resource and needs to send the current user's credentials, you need delegation enabled on the SQL Server service account.

    In my case, nothing in SQL Server reaches out to another resource, so I'm going to leave it alone.

    Summary

    So, that's it. We went through each stop along the communication path (SharePoint, RS, and SQL) and validated the settings for each one as we got there. We also saw certain things begin to work as we enabled items: the report without a data source started working before SQL Server was set up, because it wasn't reaching out to SQL. And we looked at when you need to enable delegation, depending on whether that service needs to reach out to another service. For Reporting Services, had we not been hitting a data source, we might not have needed to enable delegation on the rsservice account, as the HelloWorld report showed. But when we need to access data, we need delegation if we want to use Kerberos. The other option would be to store the credentials within the data source.

    Hopefully this helps someone trying to set up this type of deployment, or any deployment that requires Kerberos in order to work.

    Adam W. Saxton | Microsoft SQL Server Escalation Services
    http://twitter.com/awsaxton


    I recently ran into a problem with one of my own internal applications and it re-raised a philosophical question I have had before with customers.  There are really two sides to the question:
    1) Should I set my non-default instance to listen on TCP 1433?
    2) Should I set my default instance to listen on something other than TCP 1433?

    In both cases, I recommend "no". 

    Let me tackle the second question first since that is a simpler question.  I know that years ago one of the common security recommendations was to put your default instance on a port other than TCP 1433 to make it more difficult for attackers to find it.  However, I can say with complete comfort that if an attacker has a network trace that has a connection attempt to your instance, they can figure out the port on which your instance is listening very easily.  Even if you encrypt the conversation, the first five packets of the conversation are unencrypted because you cannot encrypt anything until you have contacted the instance.  Since those first five packets are the same for any connection attempt, it is very easy to detect them in a network trace.  Although I don't do this maliciously (I swear!), I do this on a regular basis when someone sends me a network trace to a named instance and neglects to tell me the port on which the instance is listening.
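    To illustrate why those first packets are easy to spot, here is a minimal Python sketch of the kind of heuristic you could apply to TCP payloads pulled from a capture. It assumes each payload starts with an 8-byte TDS packet header and checks only two things: the packet type is PRELOGIN (0x12), and the header's big-endian length matches the payload. The function names are hypothetical. Real parsers, such as the Network Monitor TDS parser, do far more.

```python
import struct

TDS_PRELOGIN = 0x12  # TDS packet type of the client's first (pre-login) message


def looks_like_tds_prelogin(payload: bytes) -> bool:
    """Heuristic: does this TCP payload start with a single-packet TDS PRELOGIN?"""
    if len(payload) < 8:  # a TDS packet header is always 8 bytes
        return False
    packet_type, status, length = struct.unpack(">BBH", payload[:4])
    # End-of-message bit set, and the length field covers the whole payload.
    return packet_type == TDS_PRELOGIN and bool(status & 0x01) and length == len(payload)


def find_sql_ports(segments):
    """Given (dst_port, payload) pairs from a trace, return ports that saw a PRELOGIN."""
    return sorted({port for port, payload in segments if looks_like_tds_prelogin(payload)})
```

    Nothing in this check depends on the port number, which is why moving an instance off TCP 1433 does little to hide it.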

    In addition, if you change your default instance to a port other than TCP 1433, you now need to specify it in every connection string - either directly (servername,port#) or indirectly via client alias.  Given how easy it is to find this conversation in a network, I really cannot see the additional effort as being worth the negligible security benefit (security by obfuscation is never a great idea).
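    For reference, here is a small sketch of the three forms the server portion of a connection string takes. The host and instance names are hypothetical. Rejecting a port and an instance name together is a simplification made for this sketch only.

```python
from typing import Optional


def data_source(host: str, port: Optional[int] = None, instance: Optional[str] = None) -> str:
    """Build the server part of a SQL Server connection string (illustrative sketch)."""
    if port is not None and instance is not None:
        # This sketch keeps the three cases separate for clarity.
        raise ValueError("specify either a port or an instance name, not both")
    if port is not None:
        return f"{host},{port}"        # explicit port: no SQL Browser lookup needed
    if instance is not None:
        return f"{host}\\{instance}"   # named instance: client asks SQL Browser for the port
    return host                        # default instance: client goes straight to TCP 1433
```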

    The first question is a little bit more complex. Setting your named instance to TCP 1433 does indeed give you the benefit of not having to specify the instance name or port number in the connection string. This is because the SQL Server client libraries don't bother querying SQL Browser for the port number if they don't detect an instance name in the connection string. Instead, they go straight to TCP 1433.
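    The client-side decision described above can be modeled roughly as follows. This is a simplified sketch under stated assumptions, not the client libraries' actual code. It assumes the only inputs that matter are an optional ",port" suffix and an optional "\instance" suffix, and it ignores protocol prefixes and aliases.

```python
from typing import Tuple, Union

DEFAULT_PORT = 1433


def plan_connection(data_source: str) -> Tuple[str, str, Union[int, str]]:
    """Return (action, host, target) for how a client would locate the server.

    Simplified model: an explicit port wins, an instance name triggers a
    SQL Browser query (UDP 1434), and anything else goes straight to TCP 1433.
    """
    host_part, _, port = data_source.partition(",")
    host, _, instance = host_part.partition("\\")
    if port:
        return ("tcp", host, int(port))
    if instance:
        return ("browser", host, instance)  # SQL Browser replies with the instance's port
    return ("tcp", host, DEFAULT_PORT)
```

    Under this model, a client configured with just "SQL01" keeps dialing TCP 1433 even after the named instance moves to another port. That is the outage scenario described next.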

    The downside to this approach shows up when you are working with application administrators who don't know anything about the SQL Server instance. If they don't know that the instance is a named instance, they might configure their connection string as if the SQL Server instance were a default instance. Since the instance is listening on TCP 1433, the connection attempt will succeed. The real problem comes later, when you decide to change the port on which your SQL Server instance is listening (maybe you read my blog :)). If you do, but don't change the client connection string, the client won't be able to connect. And, because the client thinks your instance is a default instance, it won't query SQL Browser, so it will never find out the new port. The only way to fix this is to create an alias on the client (tough to maintain over time) or to modify the connection string to specify an instance name. Now, instead of just taking downtime on the SQL Server side, you have to take downtime on the client side, too.

    In conclusion, there is a negligible security benefit to modifying the port for your default instance, and there is significant potential for outages when you set your named instance to TCP 1433. Therefore, apart from setting a static port for your named instances, I recommend you just leave the port settings at their defaults.

    P.S.  Please don't set any of your instances to TCP 1434 either.  While not technically wrong, it is very confusing since SQL Browser listens on UDP 1434 and hardly anybody references the protocol (TCP vs. UDP) when talking about ports.  Making sure both sides of the conversation are talking about the same service can then get quite confusing if you put SQL Server on TCP 1434.

    Evan Basalik | Senior Support Escalation Engineer | Microsoft SQL Server Escalation Services


    A customer had encountered an issue with their SharePoint 2010 / Reporting Services 2012 deployment. They had set up a Reporting Services Data Source to connect to a standalone Analysis Services instance. When they clicked “Test Connection”, they saw the following:

     

    [Screenshot: Test Connection error]

    Within the SharePoint ULS Log, we saw the following – which was really the same error:

    Throwing Microsoft.ReportingServices.ReportProcessing.ReportProcessingException: , Microsoft.ReportingServices.ReportProcessing.ReportProcessingException: Cannot create a connection to data source 'AdventureWorksAS2012.rsds'. ---> Microsoft.AnalysisServices.AdomdClient.AdomdConnectionException: The connection either timed out or was lost. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host

    at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size) --- End of inner exception stack trace ---

    When I saw this error, I did not attribute it to an authentication issue, as this usually indicates a network-related issue. I was actually able to reproduce the issue in my local environment. Once I had it reproduced, I grabbed an Analysis Services Profiler trace and saw the following.

    [Screenshot: Analysis Services Profiler trace]

    The minute I saw that, my mindset shifted to an authentication issue, and I was pretty sure it was Kerberos related. Given our deployment of SharePoint 2010 and RS 2012, that also meant a Claims/Kerberos issue. Some people think that because we are now a Claims-aware service, Kerberos isn’t needed any longer. What you will see below is that Kerberos is definitely in play and contributes to the issue.

    So, I started with a network trace using Network Monitor 3.4. After I collected the trace, I filtered on the KerberosV5 protocol. Here is what I saw:

    [Screenshot: Network Monitor trace filtered on KerberosV5]

    There were actually two things going on here.

    1. I was missing the MSOLAPSvc.3/BSPegasus SPN
    2. The Claims Service Account did not have Constrained Delegation set up to allow delegation to the OLAP Service.

    I added my MSOLAP SPNs. In this case it was requesting the NetBIOS name, so I added both the NetBIOS and the fully qualified names:

    [Screenshot: MSOLAPSvc.3 SPNs added]
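    If you script this step instead of using a GUI, a minimal sketch is to generate both SPN forms and the matching setspn commands. Several values are assumptions: the FQDN bspegasus.battlestar.local is inferred from the domain name used elsewhere in these posts, the Analysis Services account BATTLESTAR\olapservice is hypothetical, and the ":instance" suffix applies only to named instances. The sketch only builds command strings and does not run them.

```python
from typing import List, Optional


def msolap_spns(netbios: str, fqdn: str, instance: Optional[str] = None) -> List[str]:
    """MSOLAPSvc.3 SPNs for an Analysis Services instance (sketch).

    The instance suffix (host:instance) is used only for named instances.
    """
    suffix = f":{instance}" if instance else ""
    return [f"MSOLAPSvc.3/{netbios}{suffix}", f"MSOLAPSvc.3/{fqdn}{suffix}"]


def setspn_commands(spns: List[str], account: str) -> List[str]:
    """Build setspn -S command lines; -S checks for duplicate SPNs before adding."""
    return [f"setspn -S {spn} {account}" for spn in spns]
```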

    What surprised me was that I didn’t see any PRINCIPAL_UNKNOWN errors here, just KDC_ERR_BADOPTION. In the past, I usually ignored BADOPTION errors, as they can be a red herring. The key here is the number: the BADOPTION errors I typically ignored had a code of 5 with them; these had 13. Of note, this BADOPTION was caused by item 2 above: the lack of Constrained Delegation configured within Active Directory.

    The thing to remember about this deployment is that it starts out as Claims. This means that we will be using the Claims to Windows Token Service (C2WTS). There will be a Windows Service on the affected server, and it will have an associated Service Account. In my case, my service account is BATTLESTAR\claimsservice. After adding the SPN, I allowed Constrained Delegation to the MSOLAP service. This is done on the Delegation tab for the service account in question. If you are using LocalSystem for the C2WTS service account, it would be on the machine account for the server that the C2WTS service is running on.

    NOTE: In order to see the Delegation tab in Active Directory, an SPN needs to be on that account.  However, there is no SPN needed for the Claims Service Account.  In my case, I just added a bogus SPN to get it to show.  The SPN I added isn’t used for anything other than to get the Delegation tab to show.

    [Screenshots: Delegation tab settings for the Claims service account]

    After I had that in place, I did an IISReset to flush any cache for that logged-in session and ran a network trace again, because I still got the same error.

    [Screenshot: Network trace after the IISReset]

    You'll notice that the BADOPTION is no longer present after the MSOLAP TGS request. That’s because what we did corrected that one. However, now we see a BADOPTION after the TGS request for the RSService. This is something I ran into a few months back that a lot of people either aren’t aware of, or the configuration is so confusing that it is just missed. Even though you set up the Claims Service with delegation settings, the actual Shared Service that we are using also needs these delegation settings. In this case, that is Reporting Services. So, we have to repeat what we did for the Claims Service with the Reporting Services account.

    NOTE: In this configuration, the Reporting Services Service Account will not have any SPNs on it, as they are not needed (unless you are sharing the account with something else). So, we’ll need to add a bogus SPN on the RS Service Account to get the Delegation tab to show up.

    In my case, I’m sharing my RSService account with a native mode service, so I actually have an HTTP SPN on the account and the Delegation tab is available. 

    NOTE: Because the Claims Service is forced to use Constrained Delegation by its need for Protocol Transition, the RS Service MUST also use Constrained Delegation. You can’t go from more secure to less secure. It will fail.
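    One way to keep track of this requirement is a checklist-style model of the hops. This hypothetical sketch does not model the Kerberos protocol itself. It treats each account's "Trust this user for delegation to specified services only" list as a set of SPNs and reports every hop whose target SPN is missing. The account and SPN names come from this post, and the two-hop chain reflects the two BADOPTION errors seen in the traces.

```python
from typing import Dict, List, Set, Tuple

# account -> SPNs it is allowed to delegate to (Constrained Delegation list)
AllowedToDelegate = Dict[str, Set[str]]
Hop = Tuple[str, str]  # (account doing the delegation, target SPN)


def failing_hops(chain: List[Hop], allowed: AllowedToDelegate) -> List[Hop]:
    """Return the hops whose account is not allowed to delegate to the target SPN."""
    return [(account, spn) for account, spn in chain if spn not in allowed.get(account, set())]


CHAIN = [
    ("BATTLESTAR\\claimsservice", "MSOLAPSvc.3/BSPegasus"),
    ("BATTLESTAR\\rsservice", "MSOLAPSvc.3/BSPegasus"),
]
```

    With only the Claims account configured, the Reporting Services hop still fails, which matches the second BADOPTION in the trace.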

    [Screenshot: Delegation tab for the Reporting Services service account]

    Now let's look at the network trace with these changes.

    [Screenshot: Network trace after the Reporting Services delegation change]

    You can see that we got a successful response on the 4th line without getting the BADOPTION.  We still see one more BADOPTION, but I didn’t concern myself with it, because…

    [Screenshot: Successful Test Connection]

    It was now working!!!

     

    Adam W. Saxton | Microsoft Escalation Services
    http://twitter.com/awsaxton


    I was working with a customer who was encountering problems trying to use a PerformancePoint Dashboard against an Analysis Services instance. The issue came down to the configuration of the Claims to Windows Token Service (C2WTS). This service takes the Claims context and converts it to a Windows token for use against back-end servers.

    When trying to create a Data Source within PerformancePoint Dashboard Designer, using the Unattended Service Account, the test succeeds.  If we switch that over to Per-user Identity, we see the following:

    [Screenshot: Per-user Identity data source test error]

    Within the Event Logs for the SharePoint App Server, we see the following from PerformancePoint:

    Log Name:      Application
    Source:        Microsoft-SharePoint Products-PerformancePoint Service
    Date:          9/6/2012 11:59:57 AM
    Event ID:      37
    Task Category: PerformancePoint Services
    Level:         Error
    Keywords:     
    User:          BATTLESTAR\spservice
    Computer:      AdmAdama.battlestar.local
    Description:
    The following data source cannot be used because PerformancePoint Services is not configured correctly.

    Data source location: http://admadama:82/Data Connections for PerformancePoint/5_.000
    Data source name: New Data Source 3

    Monitoring Service was unable to retrieve a Windows identity for "BATTLESTAR\asaxton".  Verify that the web application authentication provider in SharePoint Central Administration is the default windows Negotiate or Kerberos provider.  If the user does not have a valid active directory account the data source will need to be configured to use the unattended service account for the user to access this data.

    Exception details:
    System.InvalidOperationException: Could not retrieve a valid Windows identity. ---> System.ArgumentException: Token cannot be zero.
       at System.Security.Principal.WindowsIdentity.CreateFromToken(IntPtr userToken)
       at System.Security.Principal.WindowsIdentity..ctor(IntPtr userToken, String authType, Int32 isAuthenticated)
       at System.Security.Principal.WindowsIdentity..ctor(IntPtr userToken)
       at Microsoft.IdentityModel.WindowsTokenService.S4UClient.CallService(Func`2 contractOperation)
       at Microsoft.SharePoint.SPSecurityContext.GetWindowsIdentity()
       --- End of inner exception stack trace ---
       at Microsoft.SharePoint.SPSecurityContext.GetWindowsIdentity()
       at Microsoft.PerformancePoint.Scorecards.ServerCommon.ConnectionContextHelper.SetContext(ConnectionContext connectionContext, ICredentialProvider credentials)

    This error indicates that the C2WTS Service failed to get the Windows credential. The S4UClient call is the key indicator. We reviewed the C2WTS settings, which aren’t many, and the one thing I remembered is that if you are using a domain user account for the C2WTS Windows Service, you have to add it to the local Administrators group on the box that is trying to invoke it. In our case, that is the server hosting the PerformancePoint Service App. You don’t have to do this step if you leave the C2WTS service running as LocalSystem.

    Once that is done, we need to recycle the C2WTS Windows Service and try it again.  We were then presented with a different error:

    [Screenshot: Second data source test error]

    Log Name:      Application
    Source:        Microsoft-SharePoint Products-PerformancePoint Service
    Date:          9/6/2012 12:09:42 PM
    Event ID:      9
    Task Category: PerformancePoint Services
    Level:         Warning
    Keywords:     
    User:          BATTLESTAR\spservice
    Computer:      AdmAdama.battlestar.local
    Description:
    The user "BATTLESTAR\asaxton" does not have access to the following data source server.

    Data source location: http://admadama:82/Data Connections for PerformancePoint/5_.000
    Data source name: New Data Source 3
    Server name: bspegasus\kjssas

    Exception details:
    Microsoft.AnalysisServices.AdomdClient.AdomdConnectionException: A connection cannot be made to redirector. Ensure that 'SQL Browser' service is running. ---> System.Net.Sockets.SocketException: The requested name is valid, but no data of the requested type was found
       at System.Net.Sockets.TcpClient..ctor(String hostname, Int32 port)
       at Microsoft.AnalysisServices.AdomdClient.XmlaClient.GetTcpClient(ConnectionInfo connectionInfo)
       --- End of inner exception stack trace ---
       at Microsoft.AnalysisServices.AdomdClient.XmlaClient.GetTcpClient(ConnectionInfo connectionInfo)
       at Microsoft.AnalysisServices.AdomdClient.XmlaClient.OpenTcpConnection(ConnectionInfo connectionInfo)
       at Microsoft.AnalysisServices.AdomdClient.XmlaClient.Connect(ConnectionInfo connectionInfo, Boolean beginSession)
       at Microsoft.AnalysisServices.AdomdClient.XmlaClient.GetInstancePort(ConnectionInfo connectionInfo)
       at Microsoft.AnalysisServices.AdomdClient.XmlaClient.GetTcpClient(ConnectionInfo connectionInfo)
       at Microsoft.AnalysisServices.AdomdClient.XmlaClient.OpenTcpConnection(ConnectionInfo connectionInfo)
       at Microsoft.AnalysisServices.AdomdClient.XmlaClient.Connect(ConnectionInfo connectionInfo, Boolean beginSession)
       at Microsoft.AnalysisServices.AdomdClient.AdomdConnection.XmlaClientProvider.Connect(Boolean toIXMLA)
       at Microsoft.AnalysisServices.AdomdClient.AdomdConnection.ConnectToXMLA(Boolean createSession, Boolean isHTTP)
       at Microsoft.AnalysisServices.AdomdClient.AdomdConnection.Open()
       at Microsoft.PerformancePoint.Scorecards.DataSourceProviders.AdomdConnectionPool`1.GetConnection(String connectionString, ConnectionContext connectionCtx, String effectiveUserName, CultureInfo culture, NewConnectionHandler newConnectionHandler, TestConnectionHandler testConnectionHandler)

    At first, based on the error message, I thought that this might be because of SQL Browser. And I know that for SQL Browser, when we have a named instance, you have to add the DISCO SPNs per the following KB article:

    An SPN for the SQL Server Browser service is required when you establish a connection to a named instance of SQL Server Analysis Services or of SQL Server
    http://support.microsoft.com/kb/950599
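    For a named Analysis Services instance, the SPNs involved look roughly like this. This sketch reflects my reading of KB 950599: the MSOLAPDisco.3 SPNs belong on the account that SQL Browser runs under, and the instance SPNs use a host:instance form. The FQDN bspegasus.battlestar.local is an assumption inferred from the domain in the event log.

```python
from typing import Dict, List


def named_instance_spns(netbios: str, fqdn: str, instance: str) -> Dict[str, List[str]]:
    """SPNs for a named Analysis Services instance (sketch).

    "ssas_account": instance SPNs, registered on the SSAS service account.
    "browser_account": MSOLAPDisco.3 SPNs, registered on the SQL Browser account.
    """
    hosts = (netbios, fqdn)
    return {
        "ssas_account": [f"MSOLAPSvc.3/{h}:{instance}" for h in hosts],
        "browser_account": [f"MSOLAPDisco.3/{h}" for h in hosts],
    }
```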

    My thought was that I had to give the Claims and PerformancePoint service accounts delegation rights to the DISCO service. Based on my testing, this turned out not to be needed at all. I have this working with those SPNs in place and without the Claims/PerformancePoint service accounts having Constrained Delegation rights to the DISCO service.

    After playing around with this a little more, I remembered that I had been told a while back that the Claims Service Account needs to have “Act as part of the operating system” right in order to work correctly.  My mindset was that if the account was a local admin, this wouldn’t be needed.  With that right missing, I was able to reproduce the 2nd error that the customer was hitting.  This is actually listed on page 126 of the following whitepaper. 

    Configuring Kerberos Authentication for Microsoft SharePoint 2010 Products
    http://www.microsoft.com/en-us/download/details.aspx?id=23176

    Of note, you get the “Impersonate a client after authentication” right that it lists for free, because the Claims Service account will be a member of WSS_WPG, which SharePoint makes a member of the IIS_IUSRS group.

    The C2WTS Service Account will be automatically added to the “Log on as a service” right when you start the C2WTS Service from Central Admin in the “Manage services on server” area.

    The lesson learned here is that the Claims to Windows Token Service account needs to be in the local Administrators group and has to have the “Act as part of the operating system” right, which you can assign within Local Policies.

    [Screenshot: “Act as part of the operating system” policy setting]
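    To check this right across several servers, one option is to export the user rights with `secedit /export /cfg rights.inf /areas USER_RIGHTS` and parse the result. The minimal sketch below assumes the exported INF has a `[Privilege Rights]` section with lines such as `SeTcbPrivilege = *S-1-5-...`, where SeTcbPrivilege is the internal name of “Act as part of the operating system” and SIDs carry a leading `*`. Because an account may appear either as a SID or as a name, you may need to resolve the C2WTS account's SID before calling `has_right`.

```python
from typing import Dict, List


def parse_user_rights(inf_text: str) -> Dict[str, List[str]]:
    """Parse the [Privilege Rights] section of a secedit export (sketch)."""
    rights: Dict[str, List[str]] = {}
    in_section = False
    for raw in inf_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            in_section = line.lower() == "[privilege rights]"
            continue
        if in_section and "=" in line:
            name, _, value = line.partition("=")
            # Members are comma separated; SIDs are written with a leading "*".
            rights[name.strip()] = [m.strip().lstrip("*") for m in value.split(",") if m.strip()]
    return rights


def has_right(rights: Dict[str, List[str]], right: str, member: str) -> bool:
    """Case-insensitive check that member (SID or account name) holds the right."""
    return member.lower() in (m.lower() for m in rights.get(right, []))
```

    The exported file is typically UTF-16, so read it with `open("rights.inf", encoding="utf-16")` before passing the text in.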

     

    Adam W. Saxton | Microsoft Escalation Services
    http://twitter.com/awsaxton


  • 02/13/13--12:13: Breaking Down 18056
  • We have had three blog posts on this blog regarding the 18056 error: two from Bob Dorr (part 1 and part 2) and another from Tejas Shah. However, we still see a lot of questions about this error message, which can show up for different reasons. After those blog posts were made, we released the following:

    FIX: Errors when a client application sends an attention signal to SQL Server 2008 or SQL Server 2008 R2
    http://support.microsoft.com/kb/2543687

    This fix was specific to the following message and had to do with Attentions:

    Error: 18056, Severity: 20, State: 29.
    The client was unable to reuse a session with <SPID>, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

    Since this was released, there has continued to be confusion over this error. The intent of the fix above was to limit the amount of noise in the ERRORLOG, and it was specific to receiving State 29 with 18056 when an Attention was received. The Attention is the important part here. If an Attention occurred during the reset of a connection, we would normally log that to the ERRORLOG under State 29. With this fix applied, if the Attention occurs during the reset of a connection, you should no longer see the error within the ERRORLOG. This does NOT mean that you will no longer see a State 29.

    I will use this post to explain further how we handle these errors to give you a better understanding.  To do that, I will expand on Bob Dorr's blog post that I linked above which lists out the states.  

    States

    Default = 1,
    GetLogin1, 2
    UnprotectMem1, 3
    UnprotectMem2, 4
    GetLogin2, 5
    LoginType, 6
    LoginDisabled, 7
    PasswordNotMatch, 8
    BadPassword, 9
    BadResult, 10
    FCheckSrvAccess1, 11
    FCheckSrvAccess2, 12
    LoginSrvPaused, 13
    LoginType, 14
    LoginSwitchDb, 15
    LoginSessDb, 16
    LoginSessLang, 17
    LoginChangePwd, 18
    LoginUnprotectMem, 19
    RedoLoginTrace, 20
    RedoLoginPause, 21
    RedoLoginInitSec, 22
    RedoLoginAccessCheck, 23
    RedoLoginSwitchDb, 24
    RedoLoginUserInst, 25
    RedoLoginAttachDb, 26
    RedoLoginSessDb, 27
    RedoLoginSessLang, 28
    RedoLoginException, 29    (Kind of generic, but you can use dm_os_ring_buffers to help track down the source, and perhaps the -y startup parameter. Think E_FAIL or General Network Error.)
    ReauthLoginTrace, 30
    ReauthLoginPause, 31
    ReauthLoginInitSec, 32
    ReauthLoginAccessCheck, 33
    ReauthLoginSwitchDb, 34
    ReauthLoginException, 35

    **** Login assignments from master ****

    LoginSessDb_GetDbNameAndSetItemDomain, 36
    LoginSessDb_IsNonShareLoginAllowed, 37
    LoginSessDb_UseDbExplicit, 38
    LoginSessDb_GetDbNameFromPath, 39
    LoginSessDb_UseDbImplicit, 40    (We can cause this by changing the default database for the login at the server)
    LoginSessDb_StoreDbColl, 41
    LoginSessDb_SameDbColl, 42
    LoginSessDb_SendLogShippingEnvChange, 43

    **** Connection String Values ****

    RedoLoginSessDb_GetDbNameAndSetItemDomain, 44
    RedoLoginSessDb_IsNonShareLoginAllowed, 45
    RedoLoginSessDb_UseDbExplicit, 46    (The database specified in the connection string, Database=XYZ, no longer exists)
    RedoLoginSessDb_GetDbNameFromPath, 47
    RedoLoginSessDb_UseDbImplicit, 48
    RedoLoginSessDb_StoreDbColl, 49
    RedoLoginSessDb_SameDbColl, 50
    RedoLoginSessDb_SendLogShippingEnvChange, 51

    **** Common Windows API Calls ****

    ImpersonateClient, 52
    RevertToSelf, 53
    GetTokenInfo, 54
    DuplicateToken, 55
    RetryProcessToken, 56
    LoginChangePwdErr, 57
    WinAuthOnlyErr, 58

    **** New with SQL 2012 ****

    DbAuthGetLogin1, 59
    DbAuthUnprotectMem1, 60
    DbAuthUnprotectMem2, 61
    DbAuthGetLogin2, 62
    DbAuthLoginType, 63
    DbAuthLoginDisabled, 64
    DbAuthPasswordNotMatch, 65
    DbAuthBadPassword, 66
    DbAuthBadResult, 67
    DbAuthFCheckSrvAccess1, 68
    DbAuthFCheckSrvAccess2, 69
    OldHash, 70
    LoginSessDb_ObtainRoutingEnvChange, 71
    DbAcceptsGatewayConnOnly, 72
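    If you see these errors often, a small lookup helps turn the message text into a state name. This Python sketch copies the names from the list above in order, so the ID is simply the position starting at 1. It assumes the message contains the phrase "failure ID is N", as in the ERRORLOG text quoted earlier. The function name is hypothetical.

```python
import re

# Names from the list above, in order; the failure ID is the 1-based position.
# "LoginType" appears twice (IDs 6 and 14) in the original list.
_NAMES = """
Default GetLogin1 UnprotectMem1 UnprotectMem2 GetLogin2 LoginType
LoginDisabled PasswordNotMatch BadPassword BadResult FCheckSrvAccess1
FCheckSrvAccess2 LoginSrvPaused LoginType LoginSwitchDb LoginSessDb
LoginSessLang LoginChangePwd LoginUnprotectMem
RedoLoginTrace RedoLoginPause RedoLoginInitSec RedoLoginAccessCheck
RedoLoginSwitchDb RedoLoginUserInst RedoLoginAttachDb RedoLoginSessDb
RedoLoginSessLang RedoLoginException
ReauthLoginTrace ReauthLoginPause ReauthLoginInitSec ReauthLoginAccessCheck
ReauthLoginSwitchDb ReauthLoginException
LoginSessDb_GetDbNameAndSetItemDomain LoginSessDb_IsNonShareLoginAllowed
LoginSessDb_UseDbExplicit LoginSessDb_GetDbNameFromPath
LoginSessDb_UseDbImplicit LoginSessDb_StoreDbColl LoginSessDb_SameDbColl
LoginSessDb_SendLogShippingEnvChange
RedoLoginSessDb_GetDbNameAndSetItemDomain RedoLoginSessDb_IsNonShareLoginAllowed
RedoLoginSessDb_UseDbExplicit RedoLoginSessDb_GetDbNameFromPath
RedoLoginSessDb_UseDbImplicit RedoLoginSessDb_StoreDbColl
RedoLoginSessDb_SameDbColl RedoLoginSessDb_SendLogShippingEnvChange
ImpersonateClient RevertToSelf GetTokenInfo DuplicateToken RetryProcessToken
LoginChangePwdErr WinAuthOnlyErr
DbAuthGetLogin1 DbAuthUnprotectMem1 DbAuthUnprotectMem2 DbAuthGetLogin2
DbAuthLoginType DbAuthLoginDisabled DbAuthPasswordNotMatch DbAuthBadPassword
DbAuthBadResult DbAuthFCheckSrvAccess1 DbAuthFCheckSrvAccess2 OldHash
LoginSessDb_ObtainRoutingEnvChange DbAcceptsGatewayConnOnly
""".split()

FAILURE_IDS = {i: name for i, name in enumerate(_NAMES, start=1)}
_FAILURE_RE = re.compile(r"failure ID is (\d+)")


def decode_18056(message):
    """Return (failure_id, state_name) for an 18056 message, or None if no ID is found."""
    match = _FAILURE_RE.search(message)
    if match is None:
        return None
    failure_id = int(match.group(1))
    return failure_id, FAILURE_IDS.get(failure_id, "Unknown")
```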

    Pooled Connections

    An 18056 error can only occur when we are trying to reset a pooled connection. Most applications I see these days are set up to use pooled connections. For example, a .NET application will use connection pooling by default. The reason for using pooled connections is to avoid some of the overhead of creating a physical hard connection.

    With a pooled connection, when you close the connection in your application, the physical hard connection will stick around. When the application then goes to open a connection, using the same connection string as before, it will grab an existing connection from the pool and then reset the connection.
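    The pooling behavior just described can be sketched as a toy model. This is not how ADO.NET is implemented. It only illustrates two points from the text: a pool is keyed by the exact connection string, and a connection handed back out of the pool is marked so that its next request carries the "reset connection" bit. All class names are hypothetical.

```python
from collections import defaultdict


class PhysicalConnection:
    """Stand-in for a physical (hard) connection to the server."""

    def __init__(self, conn_id):
        self.conn_id = conn_id
        self.needs_reset = False  # True: the next request should set the reset bit


class ConnectionPool:
    """Toy model of client-side pooling keyed by the exact connection string."""

    def __init__(self):
        self._idle = defaultdict(list)
        self._next_id = 1

    def open(self, conn_str):
        idle = self._idle[conn_str]
        if idle:
            conn = idle.pop()
            conn.needs_reset = True  # reused: reset happens with the next request
            return conn
        conn = PhysicalConnection(self._next_id)  # no match: new physical connection
        self._next_id += 1
        return conn

    def close(self, conn_str, conn):
        # Closing only returns the physical connection to its pool.
        self._idle[conn_str].append(conn)
```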

    When a connection is reset, you will not see sp_reset_connection over the wire. You will only see the "reset connection" bit set in the TDS Packet Header.

    Frame: Number = 175, Captured Frame Length = 116, MediaType = ETHERNET
    + Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[00-15-5D-4C-B9-60],SourceAddress:[00-15-5D-4C-B9-52]
    + Ipv4: Src = 10.0.0.11, Dest = 10.0.0.130, Next Protocol = TCP, Packet ID = 18133, Total IP Length = 102
    + Tcp: [Bad CheckSum]Flags=...AP..., SrcPort=59854, DstPort=1433, PayloadLen=62, Seq=4058275796 - 4058275858, Ack=1214473613, Win=509 (scale factor 0x8) = 130304
    - Tds: SQLBatch, Version = 7.3 (0x730b0003), SPID = 0, PacketID = 1, Flags=...AP..., SrcPort=59854, DstPort=1433, PayloadLen=62, Seq=4058275796 - 4058275858, Ack=1214473613, Win=130304
    - PacketHeader: SPID = 0, Size = 62, PacketID = 1, Window = 0
    PacketType: SQLBatch, 1(0x01)
    Status: End of message true, ignore event false, reset connection true, reset connection skip tran false
    Length: 62 (0x3E)
    SPID: 0 (0x0)
    PacketID: 1 (0x1)
    Window: 0 (0x0)
    - TDSSqlBatchData:
    + AllHeadersData: Head Type = MARS Header
    SQLText: select @@version

    In the above example, we are issuing a SQL Batch on a pooled connection. Because it was a pooled connection, we have to signal that we need to reset the connection before the Batch is executed. This is done via the "reset connection" bit.
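    To make the header fields concrete, here is a small Python decoder for the 8-byte TDS packet header. It follows my understanding of the MS-TDS specification: the status bits are 0x01 end of message, 0x02 ignore event, 0x08 reset connection, and 0x10 reset connection skip tran, and the length and SPID are big-endian. Only a few packet types are mapped. The test bytes are rebuilt from the header values shown in frames 175 and 176.

```python
import struct

PACKET_TYPES = {0x01: "SQLBatch", 0x03: "RPC", 0x06: "Attention", 0x10: "Login7", 0x12: "PreLogin"}


def parse_tds_header(data: bytes) -> dict:
    """Decode the 8-byte TDS packet header at the start of data."""
    if len(data) < 8:
        raise ValueError("a TDS packet header is 8 bytes")
    ptype, status, length, spid, packet_id, window = struct.unpack(">BBHHBB", data[:8])
    return {
        "type": PACKET_TYPES.get(ptype, hex(ptype)),
        "end_of_message": bool(status & 0x01),
        "ignore_event": bool(status & 0x02),
        "reset_connection": bool(status & 0x08),
        "reset_connection_skip_tran": bool(status & 0x10),
        "length": length,
        "spid": spid,
        "packet_id": packet_id,
        "window": window,
    }
```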

    After the above SQLBatch is issued, the app could then turn around and issue an Attention to cancel the request. This is what resulted in the 18056 with State 29 in the past under the condition of an attention.

    Frame: Number = 176, Captured Frame Length = 62, MediaType = ETHERNET
    + Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[00-15-5D-4C-B9-60],SourceAddress:[00-15-5D-4C-B9-52]
    + Ipv4: Src = 10.0.0.11, Dest = 10.0.0.130, Next Protocol = TCP, Packet ID = 18143, Total IP Length = 48
    + Tcp: [Bad CheckSum]Flags=...AP..., SrcPort=59854, DstPort=1433, PayloadLen=8, Seq=4058275858 - 4058275866, Ack=1214473613, Win=509 (scale factor 0x8) = 130304
    - Tds: Attention, Version = 7.3 (0x730b0003), SPID = 0, PacketID = 1, Flags=...AP..., SrcPort=59854, DstPort=1433, PayloadLen=8, Seq=4058275858 - 4058275866, Ack=1214473613, Win=130304
    - PacketHeader: SPID = 0, Size = 8, PacketID = 1, Window = 0
    PacketType: Attention, 6(0x06)
    Status: End of message true, ignore event false, reset connection false, reset connection skip tran false
    Length: 8 (0x8)
    SPID: 0 (0x0)
    PacketID: 1 (0x1)
    Window: 0 (0x0)

    In this case, we would still be in the process of doing the connection reset which would be a problem. Bob Dorr's Part 2 blog that is linked above goes into good detail for how this actually occurs.

    So, no more State 29?

    The thing to realize about State 29 is that it is a generic state indicating that an exception occurred while trying to redo a login (pooled connection), and that the exception was not accounted for by any of the other logic that produces the states listed above. Think of it as something similar to an E_FAIL or a General Network Error.

    Going forward, assuming you have the above fix applied, or are running on SQL 2012, which includes it as well, a State 29 will not be caused by an Attention, because we no longer log the 18056 for an Attention. However, if you look at dm_os_ring_buffers, you will still see the actual Attention (Error 3617). We just don't log the 18056 any longer, to avoid noise.

    <Record id= "3707218" type="RING_BUFFER_EXCEPTION" time="267850787"><Exception><Task address="0x52BDDC8"></Task><Error>3617</Error><Severity>25</Severity><State>23</State><UserDefined>0</UserDefined></Exception><Stack

    There are things that occur in the course of resetting a login that could trigger a State 29. One example that we have seen is a Lock Timeout (1222).

    In the Lock Timeout scenario, the only thing logged to the ERRORLOG was the 18056. We had to review the dm_os_ring_buffers DMV to see the Lock Timeout.

    <Record id= "3707217" type="RING_BUFFER_EXCEPTION" time="267850784"><Exception><Task address="0x4676A42C8"></Task><Error>1222</Error><Severity>16</Severity><State>55</State><UserDefined>0</UserDefined></Exception><Stack

    The Lock Timeout was a result of statements issuing "SET LOCK_TIMEOUT 0", which affects the connection itself. When the connection is "reset", the SET statements are carried forward. Then, depending on timing and on whether an exclusive lock is held on what the login logic is looking for, the setting could end up affecting logins off of a pooled connection when that connection is reused. The default lock timeout for a connection is -1 (wait indefinitely).

    Now what?

    If you receive a State 29, you should follow that up by looking in the dm_os_ring_buffers. You will want to look at the RING_BUFFER_EXCEPTION buffer type.

    select cast(record as XML) as recordXML
    from sys.dm_os_ring_buffers
    where ring_buffer_type = 'RING_BUFFER_EXCEPTION'

    The error that you find should help explain the condition, and/or allow you to troubleshoot the problem further. If you see 3617, then you will want to look at applying the hotfix above to prevent those messages from being logged. If you see a different error, then you may want to collect additional data (Profiler Trace, Network Trace, etc…) to assist with determining what could have led to that error.
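    If you pull the recordXML column into a script, a minimal triage sketch might look like the following. It assumes each record is complete XML shaped like the RING_BUFFER_EXCEPTION records shown above. Those samples are truncated at <Stack, so the test uses completed copies. The triage text just restates the guidance in this post, and fetching the rows (for example through an ODBC driver) is left out.

```python
import xml.etree.ElementTree as ET

ATTENTION = 3617
LOCK_TIMEOUT = 1222


def exception_errors(records):
    """Extract (error, severity, state) tuples from RING_BUFFER_EXCEPTION record XML."""
    results = []
    for text in records:
        exc = ET.fromstring(text).find("Exception")
        if exc is None:
            continue
        results.append((int(exc.findtext("Error")), int(exc.findtext("Severity")), int(exc.findtext("State"))))
    return results


def triage(error):
    """Map an error number to the next step suggested in this post."""
    if error == ATTENTION:
        return "attention: consider the KB 2543687 hotfix"
    if error == LOCK_TIMEOUT:
        return "lock timeout: look for SET LOCK_TIMEOUT on pooled connections"
    return "other: collect Profiler and network traces"
```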

     

    Adam W. Saxton | Microsoft Escalation Services
    http://twitter.com/awsaxton

     

     

     


    I’m always amazed that issues usually come in batches.  I was looped into a few cases that had the following symptoms.   They were running SharePoint 2010 and Reporting Services 2012 SP1.  When they went to use a data source with Windows Authentication, they were seeing the following error:

    [Screenshot: Data source error]

    System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)   

    This caused me to raise an eyebrow (visions of Spock as the new Star Trek movie is opening today <g>).  A lot of thoughts were floating in my head that all told me that this error didn’t make sense, for a bunch of reasons.

    1. The default protocol order for connecting to SQL from a client is TCP and then Named Pipes.  So, because we failed with a Named Pipes error, that meant something was either wrong with TCP or someone changed the Protocol order (which I have never seen in a customer case – so very unlikely)
    2. This is RS 2012, which means we are a Shared Service and rely on the Claims to Windows Token Service (C2WTS). This forces Constrained Delegation. I'm pretty sure most people would not have configured delegation to the Named Pipes SQL SPN, as most people go down the TCP route. You can read more about SQL’s SPNs being protocol based here. More on this related aspect in a later post, as I found some interesting things about this as well.
    3. This error tells me that we couldn’t establish a connection to SQL via Named Pipes.  Think of this as a “Server Not Found” type error.  I immediately tossed out any Kerberos/Claims related issue due to that thinking – again more on the kerb piece of this in a later post.
    4. This is really the first time I’ve had someone hit me up with a Named Pipes connection failure from an RS/SharePoint Integration perspective ever.  And I just got hit with 3 of them within the same week.  Something is up.
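    Point 1 can be illustrated with a simplified model of protocol fallback. This is an assumption-laden sketch, not the SqlClient implementation. It tries each protocol in the default order and, if all of them fail, surfaces only the last provider's error. That is why a TCP problem can show up as a Named Pipes error. The provider callables are hypothetical stand-ins.

```python
class ConnectError(Exception):
    """Raised by a (hypothetical) provider when it cannot connect."""


def connect_with_fallback(server, providers, order=("tcp", "np")):
    """Try each protocol in order; if all fail, report only the last provider's error.

    `providers` maps a protocol name to a callable(server) that returns a
    connection object or raises ConnectError.
    """
    last_error = None
    for proto in order:
        try:
            return providers[proto](server)
        except ConnectError as exc:
            last_error = f"({proto} provider) {exc}"
    raise ConnectError(last_error)
```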

    Because this told me we had an actual connection issue over Named Pipes, I started down the normal connectivity troubleshooting path. With any connectivity issue, I start with a UDL (Universal Data Link) file, which is basically just a text file renamed with an extension of UDL. It’s important to run this from the same machine that is hitting the SqlException. In my case, that was my SharePoint App server, not the WFE server.

    [Screenshot: UDL Data Link Properties with “np:” in front of the server name]

    You’ll notice the “np:” in front of the server name. This forces the Named Pipes protocol and ignores the default protocol order. And this worked. I also tried “tcp:” to force TCP in the UDL, and this worked too. I went back to my data source and tried forcing TCP there.
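    If you would rather pre-fill the UDL than type the settings into the dialog, this sketch builds the file contents. Several details are assumptions: the "[oledb]" header and comment line reflect the format I have seen Windows write, the provider name SQLNCLI11.1 (SQL Server 2012 Native Client) and the server name SQL01 are illustrative, and the file is written as UTF-16, which is how these files are usually saved. Double-clicking the file should open the Data Link Properties dialog with these values.

```python
UDL_HEADER = "[oledb]\r\n; Everything after this line is an OLE DB initstring\r\n"


def build_udl_text(data_source, provider="SQLNCLI11.1"):
    """Return UDL contents that use Windows authentication against data_source."""
    init = (f"Provider={provider};Integrated Security=SSPI;"
            f"Persist Security Info=False;Data Source={data_source}")
    return UDL_HEADER + init + "\r\n"


def write_udl(path, data_source, provider="SQLNCLI11.1"):
    """Write the UDL as UTF-16; newline="" keeps the explicit CRLFs untranslated."""
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(build_udl_text(data_source, provider))
```

    For example, `write_udl("test.udl", "np:SQL01")` forces Named Pipes, and `"tcp:SQL01"` forces TCP, matching the prefixes described above.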

    [Screenshot: Data source forcing TCP]

    System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 0 - The requested name is valid, but no data of the requested type was found.)

    This made no sense. I even made sure I was logged in as the RS Service Account, as that is the context in which we would have been connecting to SQL. Same result. Also, in a network trace, I saw nothing on either the TCP or the Named Pipes side of the house that related to this connection attempt, which meant we never hit the wire.

    As I was getting ready to collect some additional diagnostic logging (Kerberos ETW tracing and LSASS logging), I ended up doing an IISRESET and a recycle of the C2WTS service.  When we went to reproduce the issue, we got a different error this time.


    System.IO.FileLoadException: Could not load file or assembly 'System.EnterpriseServices, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. Either a required impersonation level was not provided, or the provided impersonation level is invalid. (Exception from HRESULT: 0x80070542)  File name: 'System.EnterpriseServices, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' ---> System.Runtime.InteropServices.COMException (0x80070542): Either a required impersonation level was not provided, or the provided impersonation level is invalid. (Exception from HRESULT: 0x80070542)   

    This error I did know and could work with.  I had blogged about this error last July here.  Checking the “Act as part of the operating system” policy showed that the C2WTS service account had in fact not been granted that right.  Adding that account to that policy right, restarting the C2WTS Windows Service and performing an IISRESET then yielded the following:

    image

    The connectivity errors were clearly related to the missing policy setting.  That was unexpected: it didn’t line up with normal connectivity issues, and the errors weren’t very helpful in pointing to where to look for more information, as none of the normal paths showed anything useful.

    Of note, I tried reproducing this on SharePoint 2013, but only got the FileLoadException.  I think this is partly a timing issue between when the IIS AppPools start and when the C2WTS service starts.  That doesn’t necessarily mean you won’t see this on SharePoint 2013.  Even on SharePoint 2010, my first attempt hit the FileLoadException.

     

    Adam W. Saxton | Microsoft Escalation Services
    http://twitter.com/awsaxton


    I ran into a new Kerberos Scenario that I hadn’t hit before when I was working on the cases related to this blog post. It’s rare that I actually see a case related to the Named Pipes protocol.  When I do, it is usually a customer trying to get it setup with a Cluster deployment.  I have never had a Named Pipes case related to Kerberos.  On top of that, I’ve never had a SQL related Kerberos issue that looked like an actual network related issue.  I usually see a traditional “Login failed for user” type error from the SQL Server itself.

    As part of my troubleshooting for the other blog post with the Claims configuration, I stumbled upon some information and theories about how Named Pipes responds when Kerberos is in the picture that I hadn’t ever seen or dealt with before.  I love when I see new things! It is very humbling and always reminds me there are a lot of things that I don’t know.  And, if you have read my other blog posts, or have seen me present at conferences like PASS, you know I have a passion for Kerberos!

    Here is what I saw from an error perspective using SharePoint 2013 and Reporting Services 2012 SP1.


    System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server) ---> System.ComponentModel.Win32Exception: Access is denied

    This is a typical error if we can’t connect to SQL.  Think of this like a “Server doesn’t exist” type error.  We didn’t get the normal “Login failed for user” error that would possibly point towards Kerberos.  In this error, we didn’t even make it to SQL.  The interesting piece here though is the “Access is denied” inner exception.  That does possibly point to a permission issue. 

    I had talked in the last Blog Post about protocol order with connecting to SQL and that the default was TCP.  In this case, I was forcing Named Pipes, so the fact that the error is a Named Pipes error is expected.

    I dropped down to a network trace to see how far we actually got and whether it revealed any other information.  One thing to keep in mind is that we are in a Claims to Windows Token Service (C2WTS) scenario with the SharePoint/RS 2012 integration, so Kerberos/Constrained Delegation will be in the picture.  A lot of people aren’t necessarily familiar with how Named Pipes actually works on the network: it uses the SMB (Server Message Block) protocol.  This is the same protocol used for file shares, and you’ll see the traffic on port 445.  It can be a little confusing because SMB sits on top of TCP, but we aren’t using the SQL Server TCP port 1433.  It is just a different way to connect to SQL Server.  The IP 10.0.0.20 was the SharePoint Server hosting the Reporting Services service.
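    To make the “Named Pipes is really SMB on 445” point concrete, here is a small Python sketch.  The pipe names follow SQL Server’s default pipe naming (\\server\pipe\sql\query for a default instance, \\server\pipe\MSSQL$instance\sql\query for a named instance), but the helper functions are hypothetical, and a server configured with a custom pipe name or a non-default TCP port would not match these values.

```python
SMB_PORT = 445          # Named Pipes traffic is carried by SMB on this port
SQL_DEFAULT_TCP = 1433  # default-instance TCP port; named instances often differ


def named_pipe_path(server, instance=None):
    """Build the default SQL Server pipe path for a server and instance."""
    parts = ["", "", server, "pipe"]
    if instance is not None and instance.upper() != "MSSQLSERVER":
        parts.append(f"MSSQL${instance}")
    parts += ["sql", "query"]
    return "\\".join(parts)  # e.g. \\server\pipe\sql\query


def wire_port(protocol):
    """Port you would expect to see in a network trace for each protocol."""
    return {"np": SMB_PORT, "tcp": SQL_DEFAULT_TCP}[protocol]


if __name__ == "__main__":
    print(named_pipe_path("captthrace.battlestar.local"), wire_port("np"))
```

    This is why the trace below shows SMB negotiation and session setup against port 445 rather than any traffic on 1433.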

    300    9:04:40 AM 5/17/2013    10.0.0.20    captthrace.battlestar.local    SMB    SMB:C; Negotiate, Dialect = PC NETWORK PROGRAM 1.0, LANMAN1.0, Windows for Workgroups 3.1a, LM1.2X002, LANMAN2.1, NT LM 0.12, SMB 2.002, SMB 2.???    {SMBOverTCP:42, TCP:41, IPv4:1}

    302    9:04:40 AM 5/17/2013    captthrace.battlestar.local    10.0.0.20    SMB2    SMB2:R   NEGOTIATE (0x0), Revision: (0x2ff) - SMB2 wildcard revision number., ServerGUID={97B805C2-296C-477B-82B4-DEB6170A2A01} Authentication Method: GSSAPI,     {SMBOverTCP:42, TCP:41, IPv4:1}

    303    9:04:40 AM 5/17/2013    10.0.0.20    captthrace.battlestar.local    SMB2    SMB2:C   NEGOTIATE (0x0), ClientGUID= {9CB563F9-BEF4-11E2-9403-00155D4CB97B},     {SMBOverTCP:42, TCP:41, IPv4:1}

    304    9:04:40 AM 5/17/2013    captthrace.battlestar.local    10.0.0.20    SMB2    SMB2:R   NEGOTIATE (0x0), Revision: (0x300) - SMB 3.0 dialect revision number., ServerGUID={97B805C2-296C-477B-82B4-DEB6170A2A01} Authentication Method: GSSAPI,     {SMBOverTCP:42, TCP:41, IPv4:1}

    323    9:04:40 AM 5/17/2013    10.0.0.20    captthrace.battlestar.local    SMB2    SMB2:C   SESSION SETUP (0x1) Authentication Method: GSSAPI,     {SMBOverTCP:42, TCP:41, IPv4:1}

    326    9:04:40 AM 5/17/2013    captthrace.battlestar.local    10.0.0.20    SMB2    SMB2:R  - NT Status: System - Error, Code = (22) STATUS_MORE_PROCESSING_REQUIRED  SESSION SETUP (0x1), SessionFlags=0x0 Authentication Method: GSSAPI,     {SMBOverTCP:42, TCP:41, IPv4:1}

    327    9:04:40 AM 5/17/2013    10.0.0.20    captthrace.battlestar.local    SMB2    SMB2:C   SESSION SETUP (0x1) Authentication Method: GSSAPI,     {SMBOverTCP:42, TCP:41, IPv4:1}
         - ResponseToken: NTLM AUTHENTICATE MESSAGE Version:NTLM v2, Workstation: CAPTHELO
              Signature: NTLMSSP

    328    9:04:40 AM 5/17/2013    captthrace.battlestar.local    10.0.0.20    SMB2    SMB2:R  - NT Status: System - Error, Code = (34) STATUS_ACCESS_DENIED  SESSION SETUP (0x1) ,     {SMBOverTCP:42, TCP:41, IPv4:1}

    329    9:04:40 AM 5/17/2013    10.0.0.20    captthrace.battlestar.local    TCP    TCP:Flags=...A.R.., SrcPort=49665, DstPort=Microsoft-DS(445), PayloadLen=0, Seq=2945236632, Ack=2852397926, Win=0 (scale factor 0x8) = 0    {TCP:41, IPv4:1}

    In the network trace we can see that we were trying to connect via NTLM.  I already know that will be a problem, as we have to go Kerberos here.  SQL Server has supported Kerberos over Named Pipes since SQL 2008, so it should work.  At this point, I’m thinking we actually have a Kerberos issue even though it looked like a network issue from the original error message.  So, let’s go see if we can validate that.  I already had Kerberos Event Logging enabled; these entries will be in the System Event Log.  You can ignore errors that show “KDC_ERR_PREAUTH_REQUIRED”.  That is just noise and expected.  Also realize that errors may be cached, and if they are, you will not see them in the Event Log or a network trace.  It may take an IISRESET, a restart of the C2WTS Windows Service, or even a reboot of the box to get them to show up in the Event Log or network trace.  See this Blog Post.

    Log Name:      System
    Source:        Microsoft-Windows-Security-Kerberos
    Date:          5/17/2013 9:04:40 AM
    Event ID:      3
    Task Category: None
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      CaptHelo.battlestar.local
    Description:
    A Kerberos error message was received:
    on logon session
    Client Time:
    Server Time: 14:4:40.0000 5/17/2013 Z
    Error Code: 0xd KDC_ERR_BADOPTION
    Extended Error: 0xc0000225 KLIN(0)
    Client Realm:
    Client Name:
    Server Realm: BATTLESTAR.LOCAL
    Server Name: cifs/captthrace.battlestar.local
    Target Name: cifs/captthrace.battlestar.local@BATTLESTAR.LOCAL
    Error Text:
    File: 9
    Line: 12be
    Error Data is in record data.

    This entry was the only non-PREAUTH_REQUIRED error.  Two things that were interesting about this.  First was KDC_ERR_BADOPTION.  When I see this, especially in a Claims type configuration, it tells me we have a Constrained Delegation issue.  The other item that was interesting was the CIFS SPN.  CIFS is used for File Sharing.  It stands for “Common Internet File System”.  This was our SMB traffic.  We can also see this in the Network Trace.

    319    9:04:40 AM 5/17/2013    10.0.0.20    10.0.0.1    KerberosV5    KerberosV5:TGS Request Realm: BATTLESTAR.LOCAL Sname: cifs/captthrace.battlestar.local     {TCP:44, IPv4:14}

    321    9:04:40 AM 5/17/2013    10.0.0.1    10.0.0.20    KerberosV5    KerberosV5:KRB_ERROR  - KDC_ERR_BADOPTION (13)    {TCP:44, IPv4:14}
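    Because the System log fills up with expected KDC_ERR_PREAUTH_REQUIRED entries, a small filter can help surface entries like the BADOPTION one above.  This is a hedged Python sketch: it assumes you have the event descriptions as text in the format shown earlier, the code-to-name table is only a small subset of the KDC error codes defined in RFC 4120, and the function name is hypothetical.

```python
import re

# Subset of RFC 4120 KDC error codes relevant to this kind of troubleshooting.
KDC_ERRORS = {
    0x06: "KDC_ERR_C_PRINCIPAL_UNKNOWN",
    0x07: "KDC_ERR_S_PRINCIPAL_UNKNOWN",
    0x0D: "KDC_ERR_BADOPTION",
    0x18: "KDC_ERR_PREAUTH_FAILED",
    0x19: "KDC_ERR_PREAUTH_REQUIRED",
}

# Codes that are expected during normal logons and can be ignored.
NOISE = {0x19}

ERROR_LINE = re.compile(r"Error Code:\s*0x([0-9a-fA-F]+)")
# [ \t]* keeps the match on the same line if "Server Name:" is empty.
SERVER_LINE = re.compile(r"Server Name:[ \t]*(\S+)")


def interesting_errors(events):
    """Yield (error_name, server_name) for each event that is not noise."""
    for text in events:
        code_match = ERROR_LINE.search(text)
        if not code_match:
            continue
        code = int(code_match.group(1), 16)
        if code in NOISE:
            continue
        server_match = SERVER_LINE.search(text)
        server = server_match.group(1) if server_match else None
        yield KDC_ERRORS.get(code, f"unknown (0x{code:x})"), server
```

    Fed the event above, this yields ("KDC_ERR_BADOPTION", "cifs/captthrace.battlestar.local"), which is exactly the pairing that pointed at the CIFS delegation problem.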

    This was interesting, because I never gave Constrained Delegation rights to CIFS for the C2WTS or the Computer Account.  When we talk about SPNs, delegation and placement, we say that the SPN should be on the account that is running the service.  For CIFS, that is the system itself, so the SPN lives on the machine account of the SQL Server that we are trying to connect to.

    CIFS is one of those special Service Classes, similar to HTTP.  It is covered by the HOST SPN on the Machine Account and we won’t see an actual CIFS SPN defined, but when we go to the delegation side of things you will see it.



    I added this to both the Claims Service account and the Computer Account.  I say computer account because the actual SMB request comes from the machine and not directly from the RS process.  Under the hood, it is effectively making a call to the CreateFile Windows API.
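    Why the KDC answered with KDC_ERR_BADOPTION until CIFS was added can be shown with a toy model.  This Python sketch is a simplification, not how Active Directory is implemented: an account trusted for delegation to any service may delegate anywhere, while an account configured for constrained delegation may only get delegated tickets for the SPNs in its allowed list (in AD, the msDS-AllowedToDelegateTo attribute), and any other request is refused.  The account values in the usage note are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    name: str
    # "Trust this user for delegation to any service (Kerberos only)"
    unconstrained: bool = False
    # Constrained delegation: the services listed on the Delegation tab
    allowed_to_delegate_to: set = field(default_factory=set)


def delegation_result(account, target_spn):
    """Return the outcome of a delegated ticket request in this toy model."""
    if account.unconstrained:
        return "OK"
    # SPN comparison in AD is case-insensitive.
    allowed = {spn.lower() for spn in account.allowed_to_delegate_to}
    return "OK" if target_spn.lower() in allowed else "KDC_ERR_BADOPTION"
```

    For example, an account allowed only MSSQLSvc/captthrace.battlestar.local:1433 gets KDC_ERR_BADOPTION for cifs/captthrace.battlestar.local until that CIFS entry is added to its list.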

    After resetting IIS and cycling the C2WTS Service, I still saw the same exact error.  This was one of those reboot moments.  After rebooting the server, I then got the following:

    image

    I didn’t necessarily expect this, as I expected to fail on the Kerberos side to SQL.  So, I ran a report with a WAITFOR DELAY in it so I could see the connection, then had a look at sys.dm_exec_connections on the SQL Server and saw that we had connected with NTLM:

    image
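    If you want to check the transport and authentication scheme yourself, sys.dm_exec_connections exposes net_transport and auth_scheme columns.  Below is a hedged Python sketch: QUERY is T-SQL you could run while the WAITFOR DELAY keeps the report’s connection open, and summarize only tallies rows you have already fetched with whatever client you prefer (fetching is not shown).  The sample rows in the usage note are hypothetical.

```python
from collections import Counter

# T-SQL to run on the SQL Server while the report's WAITFOR DELAY is active.
QUERY = """
SELECT session_id, net_transport, auth_scheme
FROM sys.dm_exec_connections;
"""


def summarize(rows):
    """Count connections by (net_transport, auth_scheme).

    Each row is a (session_id, net_transport, auth_scheme) tuple,
    in the same column order as QUERY.
    """
    return Counter((transport, scheme) for _, transport, scheme in rows)
```

    With rows such as (57, "Named pipe", "NTLM") and (58, "TCP", "KERBEROS"), the counter makes it easy to spot Named Pipes connections that fell back to NTLM.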

    For our purposes this will work as I’m not going further than SQL.  This is technically a single hop between the SharePoint Server System context and the SQL Server.  You can configure it for Kerberos if you really want that auth_scheme by creating the appropriate Named Pipes SPN and configuring the appropriate Delegation for the C2WTS Service Account and the Machine Account for where the SMB request is originating from.  Also realize that if you have a misplaced Named Pipes SQL SPN, you will encounter a “Cannot Generate SSPI Context” similar to the following:

     

    image

     

    Adam W. Saxton | Microsoft Escalation Services
    http://twitter.com/awsaxton


    A case came up where the user was trying to use Report Builder in a Reporting Services instance that was not integrated with SharePoint.  It was in Native Mode configuration.  They indicated that they were getting a 401 error.  My initial thought was that we were hitting a Kerberos issue.  Of note, they were trying to hit a List that was in SharePoint 2013. 

    SharePoint 2013 defaults to Claims authentication for its sites, so most people would probably ignore the Kerberos aspects of the SharePoint site.  I was able to reproduce the issue locally because I had done the same thing.

    I created the Data Source within Report Builder to hit my SharePoint 2013 site:  http://capthelo/, and when I click on “Test Connection” within the Data Source Dialog Window, I get the following error.


    dataextension!ReportServer_0-1!9cc!06/11/2013-14:25:58:: e ERROR: Throwing Microsoft.ReportingServices.DataExtensions.SharePointList.SPDPException: , Microsoft.ReportingServices.DataExtensions.SharePointList.SPDPException: An error occurred when accessing the specified SharePoint list. The connection string might not be valid. Verify that the connection string is correct.  ---> System.Net.WebException: The request failed with HTTP status 401: Unauthorized.

    This happens because when you click “Test Connection”, the test is actually performed on the Report Server itself, not directly from Report Builder.  I blogged a while back about Report Builder and Firewalls, where I talk about how some of the items in Report Builder will try to connect directly, but “Test Connection” is not one of them.

    At this point, we could ignore the error, hit OK on the Data Source dialog and try to create a DataSet.  When I go to the Query Designer, it appears to have worked.  This is because the DataSets and Query Designer are coming from Report Builder itself.  It is a direct Web Request from the Report Builder process and not the Report Server, so I don’t get an error.


    However, this is misleading.  This may make you believe that it is working properly, but when you deploy and try to run the report, you will be back to the 401 error because we are now coming from the Report Server which will be down the same path that the original error with the “Test Connection” had.  From the DataSet/Query Designer perspective, this is a straight shot from Report Builder to SharePoint, so we can get away with an NTLM connection for the Web Request and the Windows Credential is valid. 

    From the Report Server, however, this is called a Double Hop and to forward Windows Credentials you need Kerberos to do that.   Even when your SharePoint 2013 site is configured for Claims.  This actually has nothing to do with SharePoint, it has everything to do with Reporting Services.  The Report Server is the one trying to delegate the Windows Credential to whoever the receiving party is for the Web Request (or SQL Connection if that is your Data Source).  In this case, it is SharePoint 2013.  Because Kerberos isn’t configured properly, IIS (which is hosting SharePoint), received an anonymous credential for the Web Request and rejects it accordingly with a 401 error.
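    The double-hop rule can be reduced to a toy decision function.  This is a deliberately simplified Python model with a hypothetical function name: it ignores protocol transition and other Windows nuances and only captures the rule stated above, that the middle tier can pass the user’s identity on to the next hop only when it holds a Kerberos ticket it is allowed to delegate.  Otherwise the back end sees an anonymous caller, which is what produces the 401 here.

```python
def identity_at_second_hop(user, first_hop_auth, can_delegate):
    """Return the identity the back-end server sees in this simplified model."""
    if first_hop_auth.upper() == "KERBEROS" and can_delegate:
        return user
    # An NTLM logon cannot be forwarded, and a Kerberos ticket without
    # delegation rights cannot be either, so no user identity reaches the
    # back end.
    return "ANONYMOUS"
```

    In this case the Report Server had the user’s Windows identity but could not delegate it to http/capthelo.battlestar.local, so IIS received an anonymous request and returned 401.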

    In my case, I was using a Domain User Account for the RS Service Account (BATTLESTAR\RSService – http://chieftyrol).  It had the proper HTTP SPN on it.  Also my SharePoint site was using a Domain User account for the AppPool identity within IIS (BATTLESTAR\spservice – http://capthelo) and this had the proper HTTP SPN on it.


    So, now I just need to verify the Delegation properties for the RSService account.  Because I’m using the RSService account for other things that include Claims within SharePoint 2013, I’m forced to use Constrained Delegation on this account and need to keep using it.  If you are not bound to Constrained Delegation, you could choose the option “Trust this user for delegation to any service (Kerberos Only)”, which is considered Full Trust and should correct the issue.  If you are using Constrained Delegation, you have to add the service that you want to delegate to.  In my case that is my SharePoint site, http/capthelo.battlestar.local.  After I added it, it looked like the following.

    image

    Then I restarted the Reporting Services Service and created the Data Source again.  At that point, the “Test Connection” returned Success!


    Adam W. Saxton | Microsoft Escalation Services
    http://twitter.com/awsaxton


    In the last post in this series, we looked at how we can determine that our Connection pool was exhausted.  In this post I'm going to go a little deeper into the Internal connection itself and how we can verify if this is a closed or active connection.

    Dumping out the internal connection objects

    A connection object in the System.Data.SqlClient namespace consists of two parts:

    • The SqlConnection class that is used by customers’ code
    • The SqlInternalConnectionTds internal class that is used by the pooling code. This class is not directly accessible to the user.

    The SqlConnection class has a pointer to a SqlInternalConnectionTds object if it’s open (_innerConnection member variable). The _innerConnection member variable is null if the connection is closed. Whenever a connection is closed by the code, the internal object gets disassociated from the external object and the ownership of the internal object transfers to the pool object. This relationship allows us to identify SqlConnection objects that have not been closed.

    The SqlInternalConnectionTds object has a weak reference back to the owning SqlConnection object.
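    The ownership rules above can be illustrated with a toy model.  This Python sketch is not System.Data code; the class names only mirror the roles of SqlConnection, SqlInternalConnectionTds and DbConnectionPool, and the maximum of 100 matches the pool in this dump (which is also SqlClient’s default Max Pool Size).  Close() clears the outer object’s reference and hands the inner connection back to the pool, so an outer object dropped without Close() keeps its inner connection out of the pool.

```python
class InternalConnection:
    """Stands in for SqlInternalConnectionTds."""

    def __init__(self, object_id):
        self.object_id = object_id
        self.owning_object = None  # a WeakReference in .NET; plain here


class Pool:
    """Stands in for DbConnectionPool with a fixed maximum size."""

    def __init__(self, max_size=100):
        self.max_size = max_size
        self.free = []
        self.total = 0

    def get(self, owner):
        if self.free:
            inner = self.free.pop()
        elif self.total < self.max_size:
            self.total += 1
            inner = InternalConnection(self.total)
        else:
            # SqlClient reports a pool timeout here; TimeoutError is our analog.
            raise TimeoutError("pool exhausted: all connections are in use")
        inner.owning_object = owner
        return inner

    def put_back(self, inner):
        inner.owning_object = None
        self.free.append(inner)


class Connection:
    """Stands in for SqlConnection."""

    def __init__(self, pool):
        self.pool = pool
        self.inner = None  # mirrors _innerConnection: None when closed

    def open(self):
        self.inner = self.pool.get(self)

    def close(self):
        if self.inner is not None:
            self.pool.put_back(self.inner)
            self.inner = None
```

    Opening and closing in a loop reuses a single internal connection, while opening max_size connections without closing them makes the next open fail, and each leaked internal connection still points back at its owner, which is what the dump analysis below looks for.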

    Since there are typically multiple pools and not all of them are full, we want to start with the internal objects that we know belong to a full pool.

    Going back to the pool in question, let's dump out the items within this pool.

    0:000> !do 012bbe80
    Name: System.Data.ProviderBase.DbConnectionPool
    ...
    00000000  400152b       40                       0 instance 012bbfb4 _objectList
    79102290  400152c       54         System.Int32  1 instance      100 _totalObjects
    ...

    0:000> !do 012bbfb4
    Name: System.Collections.Generic.List`1[[System.Data.ProviderBase.DbConnectionInternal, System.Data]]
    MethodTable: 654413c4
    EEClass: 7912f680
    Size: 24(0x18) bytes
    (C:\WINNT\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
    Fields:
          MT    Field   Offset                 Type VT     Attr    Value Name
    7912d8f8  40009c7        4      System.Object[]  0 instance 012bbfcc _items
    79102290  40009c8        c         System.Int32  1 instance      100 _size
    79102290  40009c9       10         System.Int32  1 instance      100 _version
    790fd0f0  40009ca        8        System.Object  0 instance 00000000 _syncRoot
    7912d8f8  40009cb        0      System.Object[]  0   shared   static _emptyArray
        >> Domain:Value dynamic statics NYI
    00155858:NotInit  <<

    0:000> !da 012bbfcc
    Name: System.Data.ProviderBase.DbConnectionInternal[]
    MethodTable: 7912d8f8
    EEClass: 7912de6c
    Size: 416(0x1a0) bytes
    Array: Rank 1, Number of elements 100, Type CLASS
    Element Methodtable: 654009f0
    [0] 012be414
    [1] 012bf3e4
    [2] 012bf008
    ...
    [98] 0148114c
    [99] 01485fcc

    At this point we want to save all these 100 internal connection addresses into a file and remove all the array indexes so that the file only contains:

    012be414
    012bf3e4
    012bf008
    ...
    0148114c
    01485fcc

    Visual Studio is handy for this, since holding Alt while dragging the mouse lets us box-select the first few columns (the [index] prefixes) and delete them all before saving the file.
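    If Visual Studio is not handy, the same cleanup can be scripted.  A minimal Python sketch, assuming the !da output was pasted into a text file; the script and file names are hypothetical, and the regex keeps only the address after each [index] prefix (entries shown as null are skipped because there is nothing to follow).

```python
import re
import sys

# Matches '!da' element lines such as "[0] 012be414" (32- or 64-bit addresses).
ELEMENT = re.compile(r"^\s*\[\d+\]\s+([0-9a-fA-F]+)\s*$")


def extract_addresses(lines):
    """Return the addresses from '!da' element lines."""
    addresses = []
    for line in lines:
        match = ELEMENT.match(line)
        if match:
            addresses.append(match.group(1))
    return addresses


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python extract_addresses.py da_output.txt > InternalConnections.txt
    with open(sys.argv[1]) as source:
        print("\n".join(extract_addresses(source)))
```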

    Processing the internal connections

    The goal at this point is to find any SqlConnection objects from these SqlInternalConnectionTds objects that are no longer referenced. If the SqlConnection still references the SqlInternalConnectionTds and cannot be reached through !gcroot, it has been abandoned by the code without closing it.

    Using .foreach to dump out the connections is easiest, since it avoids the manual work of processing each of the 100 connections in question:

    .foreach /f ( place "c:\temp\InternalConnections.txt") {  dd poi(poi( place +4)+4) l1}
    (32 bit)

    or

    .foreach /f ( place "c:\temp\InternalConnections.txt") {  dq poi(poi( place +8)+8) l1}
    (64 bit)

    Explanation of the .foreach command:

    place – this is our placeholder, or variable name, that represents each of the addresses in the file
    dd – dumps one double word (a 32-bit pointer value); in a 64-bit dump use dq, which dumps a quad word
    place + 4 (place + 8 on 64-bit) – the weak reference (_owningObject) is at offset 4 from the SqlInternalConnectionTds object in a 32-bit process, or at offset 8 in a 64-bit process:

    0:000> !do 012be414
    Name: System.Data.SqlClient.SqlInternalConnectionTds
    MethodTable: 65404744
    EEClass: 6544d9e0
    Size: 140(0x8c) bytes
    (C:\WINNT\assembly\GAC_32\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
    Fields:
          MT    Field   Offset                 Type VT     Attr    Value Name
    79102290  4000f67       1c         System.Int32  1 instance        4 _objectID
    ...
    79104c38  4000f6d        4 System.WeakReference  0 instance 012be55c _owningObject
    ...

    The WeakReference object has a handle at offset 8 that is the second +8 in the command (64 bit) or at offset 4 (32 bit):

    0:000> !do 012be55c
    Name: System.WeakReference
    MethodTable: 79104c38
    EEClass: 79104bd4
    Size: 16(0x10) bytes
    (C:\WINNT\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
    Fields:
          MT    Field   Offset                 Type VT     Attr    Value Name
    791016bc  40005a9        4        System.IntPtr  1 instance   3f1268 m_handle
    7910be50  40005aa        8       System.Boolean  1 instance        0 m_IsLongReference

    The value at that location is the owning object if it exists.

    Non-Null and has an owning object:

    0:000> dd 3f1268 l1
    003f1268  01575138

    Null and no owning object:

    0:000> dd 3f1268 l1
    003f1268  00000000
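    To see what poi(poi(place+4)+4) is doing, here is a small Python model of the pointer walk using the addresses from the 32-bit dump above.  The dictionary standing in for process memory is hypothetical (unmapped addresses read as 0), the offsets are the ones !do reported in this post, and a real dump should of course be read with the debugger itself.

```python
def poi(memory, address):
    """Read one pointer-sized value, like the debugger's poi() operator."""
    return memory.get(address, 0)  # unmapped addresses read as 0 here


def owning_object(memory, internal_conn, pointer_size=4):
    """Follow SqlInternalConnectionTds -> WeakReference -> handle -> owner.

    pointer_size is 4 for a 32-bit dump and 8 for a 64-bit dump; in this post
    both field offsets equal the pointer size.
    """
    weak_ref = poi(memory, internal_conn + pointer_size)  # _owningObject
    handle = poi(memory, weak_ref + pointer_size)         # m_handle
    return poi(memory, handle)                            # 0 means no owner
```

    With 012be414 -> 012be55c -> 3f1268 -> 01575138 loaded into the dictionary, owning_object returns 0x01575138, the same SqlConnection address that dd 3f1268 l1 printed.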

    Output of the foreach command:

    0:000> .foreach /f ( place "c:\temp\InternalConnections.txt") {  dd poi(poi( place +4)+4) l1}
    003f1268  01575138
    003f127c  0157336c
    003f1290  0157136c
    003f1298  0156f138
    003f1244  015809fc
    ...
    003f2d34  014ac514
    003f2d2c  014acbf4
    003f2d1c  014ac7d4
    003f2d14  015817cc

    As we can see, the internal connections have an owning SqlConnection object. This either means that they are actively being used by the code (not likely) or they have been abandoned (more likely).
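    With 100 lines of output it helps to tally them rather than read them.  A hedged Python sketch, assuming the two-column dd output format shown above (handle, then stored value): it splits the handles into those with an owning SqlConnection (non-zero value) and those without, and ignores prompts, “...” and other lines.

```python
def classify_owners(lines):
    """Split '.foreach' dd output into owned and unowned handles.

    Returns (owned, unowned): owned maps handle -> SqlConnection address,
    unowned lists handles whose stored value is zero.
    """
    owned, unowned = {}, []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip prompts, '...' and blank lines
        handle, value = parts
        try:
            int(handle, 16)  # both columns must be hex numbers
            value_int = int(value, 16)
        except ValueError:
            continue
        if value_int:
            owned[handle] = value
        else:
            unowned.append(handle)
    return owned, unowned
```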

    Finding out if a connection is actively used

    To find out if a SqlConnection is still being used by the code, we can run the !gcroot command.  This command tells us whether the object is reachable from a GC root; if it is not, it is eligible for garbage collection.

    0:000> !gcroot 0157336c
    Note: Roots found on stacks may be false positives. Run "!help gcroot" for
    more info.
    Scan Thread 0 OSTHread 590
    DOMAIN(00155858):HANDLE(WeakSh):3f127c:Root:0157336c(System.Data.SqlClient.SqlConnection)

    At this point in the application, we only have one thread running which is thread ID 0. 

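    Here the output shows that thread 0 was scanned, and the only root reported is a weak handle (HANDLE(WeakSh)) at 3f127c, the same handle we reached through _owningObject in the .foreach output.  A weak handle does not keep the object alive.  Stack roots reported by !gcroot can also be false positives because thread references can be stale, so we still verify what is actually on that thread: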

    0:000> kL
    ChildEBP RetAddr 
    0012f31c 7739bf53 ntdll!KiFastSystemCallRet
    0012f3b8 7b0831a5 user32!NtUserWaitMessage+0xc
    0012f434 7b082fe3 System_Windows_Forms_ni+0xb31a5
    0012f464 7b0692c2 System_Windows_Forms_ni+0xb2fe3
    0012f490 79e7c6cc System_Windows_Forms_ni+0x992c2
    0012f510 79e7c8e1 mscorwks!CallDescrWorkerWithHandler+0xa3
    0012f64c 79e7c783 mscorwks!MethodDesc::CallDescr+0x19c
    0012f668 79e7c90d mscorwks!MethodDesc::CallTargetWorker+0x1f
    0012f67c 79eefb9e mscorwks!MethodDescCallSite::Call_RetArgSlot+0x18
    0012f7e0 79eef830 mscorwks!ClassLoader::RunMain+0x263
    0012fa48 79ef01da mscorwks!Assembly::ExecuteMainMethod+0xa6
    0012ff18 79fb9793 mscorwks!SystemDomain::ExecuteMainMethod+0x43f
    0012ff68 79fb96df mscorwks!ExecuteEXE+0x59
    0012ffb0 7900b1b3 mscorwks!_CorExeMain+0x15c
    0012ffc0 77e6f23b mscoree!_CorExeMain+0x2c
    0012fff0 00000000 kernel32!BaseProcessStart+0x23

    We can see that we have managed code on this thread.  Let's look at what the managed stack looks like:

    0:000> !clrstack
    OS Thread Id: 0x590 (0)
    ESP       EIP    
    0012f32c 7c8285ec [InlinedCallFrame: 0012f32c] System.Windows.Forms.UnsafeNativeMethods.WaitMessage()
    0012f328 7b08374f System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(Int32, Int32, Int32)
    0012f3c8 7b0831a5 System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
    0012f440 7b082fe3 System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
    0012f470 7b0692c2 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
    0012f480 00e70097 SqlConnectionLeakWin.Program.Main()
    0012f69c 79e7c74b [GCFrame: 0012f69c]

    Doesn't appear to be doing anything with SQL here.  Let's look at the objects on the stack:

    0:000> !dso
    OS Thread Id: 0x590 (0)
    ESP/REG  Object   Name
    ebx      01253384 System.Windows.Forms.Application+ThreadContext
    esi      015cc2e8 System.Collections.Hashtable+HashtableEnumerator
    0012f354 01299fc4 System.Windows.Forms.NativeMethods+MSG[]
    0012f358 01253384 System.Windows.Forms.Application+ThreadContext
    0012f360 01299ad8 System.Windows.Forms.Application+ComponentManager
    0012f3d8 01253384 System.Windows.Forms.Application+ThreadContext
    0012f42c 01253384 System.Windows.Forms.Application+ThreadContext
    0012f43c 01296b84 System.Windows.Forms.ApplicationContext
    0012f444 0127fe4c System.ComponentModel.EventHandlerList
    0012f458 01252a8c SqlConnectionLeakWin.Form1
    0012f460 01253384 System.Windows.Forms.Application+ThreadContext
    0012f474 01252a8c SqlConnectionLeakWin.Form1

    We can conclude that this SqlConnection object is no longer being used and has not been closed.  This proves that the application's code did not close all of its connections, and the code needs further investigation to make sure every connection is closed.

    Reference:

    Part 1

    Adam W. Saxton | Microsoft SQL Server Escalation Services


    We get a lot of calls related to Kerberos configuration, and I'm planning to write more about our experiences and troubleshooting techniques for these types of issues across the box (Engine, AS and RS). 

    With Windows 2000/2003 SetSPN had only a few commands associated with it.

    Switches:
    -R = reset HOST ServicePrincipalName
      Usage:   setspn -R computername
    -A = add arbitrary SPN
      Usage:   setspn -A SPN computername
    -D = delete arbitrary SPN
      Usage:   setspn -D SPN computername
    -L = list registered SPNs
      Usage:   setspn [-L] computername

    The other problem was that SetSPN was part of the Resource Kit and did not ship with the OS.

    This has changed in Windows 2008.  SetSPN is now part of the OS from the moment you install it.  It has also been improved, most notably with the ability to look for duplicate SPNs.  In the past I have used numerous tools to look for duplicate SPNs, ranging from DHDiag (an internal CSS tool that uses LDIFDE) to queryspn.vbs to DelegConfig.

    Here are the new switches for SetSPN that ships with Windows 2008:

    Modifiers:
    -F = perform the duplicate checking on forestwide level
    -P = do not show progress (useful for redirecting output to file)

    Switches:
    -R = reset HOST ServicePrincipalName
      Usage:   setspn -R computername
    -A = add arbitrary SPN
      Usage:   setspn -A SPN computername
    -S = add arbitrary SPN after verifying no duplicates exist
      Usage:   setspn -S SPN computername
    -D = delete arbitrary SPN
      Usage:   setspn -D SPN computername
    -L = list registered SPNs
      Usage:   setspn [-L] computername
    -Q = query for existence of SPN
      Usage:   setspn -Q SPN
    -X = search for duplicate SPNs
      Usage:   setspn -X

    The Q switch is really the nice feature here.  This allows you to see if an SPN is already out on your domain.  You could also combine this with the F modifier to look through the whole forest.

    C:\>setspn -q MSSQLSvc/mymachine:1433

    No such SPN found.

    C:\>setspn -q MSSQLSvc/mymachine.mydomain.com:1433
    CN=MYMACHINE,OU=Workstations,DC=mydomain,DC=com
            MSSQLSvc/mymachine.mydomain.com:1433
            HOST/MYMACHINE
            HOST/MYMACHINE.MYDOMAIN.COM

    Existing SPN found!

    This is just another thing that will make Kerberos configuration/troubleshooting easier for users.
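    What setspn -X reports can be approximated offline if you already have an export of accounts and their SPNs (for example from LDIFDE).  A hedged Python sketch: the input mapping is hypothetical, and the function only compares SPN strings case-insensitively, so it illustrates what a duplicate is rather than reproducing SetSPN’s exact checks.

```python
from collections import defaultdict


def find_duplicate_spns(spns_by_account):
    """Return {spn: sorted accounts} for SPNs registered on more than one account.

    spns_by_account maps an account name or DN to the list of its SPNs.
    SPN comparison is case-insensitive, as it is in Active Directory.
    """
    owners = defaultdict(set)
    for account, spns in spns_by_account.items():
        for spn in spns:
            owners[spn.lower()].add(account)
    return {spn: sorted(accounts)
            for spn, accounts in owners.items() if len(accounts) > 1}
```

    A duplicate such as MSSQLSvc/mymachine.mydomain.com:1433 registered on both the machine account and a service account is the classic cause of Kerberos failing and the connection falling back to NTLM.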

    Adam W. Saxton | Microsoft SQL Server Escalation Services


    07/01/09--17:12: When in doubt, Reboot!

    I tend to get quite a bit of Kerberos related cases.  These are related across the box, from the Engine, to Reporting Services to just straight connectivity with custom applications.  I had one given to me yesterday because the engineer had gone through everything we normally go through and wasn’t getting anywhere.

    The situation was an 8 node cluster with multiple instances across the nodes.  These were running Windows 2008 with SQL 2008.  One node in particular was having an issue when they were issuing a Linked Server Query from a remote client.


    When trying to hit the linked server from within Management Studio on the client machine, we received the following message:

    Msg 18456, Level 14, State 1, Line 1
    Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'

    Kerberos Configuration:

    When we see this type of error, it is typically Kerberos related, as the service we are using (ServerA) is unable to delegate the client’s credentials to the backend server (ServerB, the Linked Server).  The first thing we do is go through our regular Kerberos checklist: SPNs and Delegation settings.  Both SQL Servers were using the same Domain User Service Account (SNEAKERNET\SQLSvc).  We can use SetSPN to check what SPNs are on that account.  NOTE:  There are numerous ways to look for SPNs, but SetSPN is one of the easier command line tools available.  You could also use LDIFDE (http://support.microsoft.com/kb/237677), ADSIEdit (http://technet.microsoft.com/en-us/library/cc773354(WS.10).aspx) and other tools.  You will see us use an in-house tool called DHDiag to collect SPNs.  This is just a wrapper that calls LDIFDE to output the results.

    So, here are the SetSPN results:

    C:\Users\Administrator>setspn -l sqlsvc
    Registered ServicePrincipalNames for CN=SQL Service,OU=Service Account,DC=sneakernet,DC=local:
            MSSQLSvc/SQL02:26445
            MSSQLSvc/SQL02.sneakernet.local:26445
            MSSQLSvc/SQL01.sneakernet.local:14556
            MSSQLSvc/SQL01:14556

    Why do we see SQL01 and SQL02 when our machine names are ServerA and ServerB?  This is because SQL01 and SQL02 are the virtual names for the clustered instances.  A virtual name moves to whichever node is active for that instance, whereas ServerA and ServerB are the physical machine names and may or may not actually be hosting that instance.  We can also see that we have two distinct instances because of the ports (14556 & 26445).  Some of our documentation (e.g. http://msdn.microsoft.com/en-us/library/ms189585(SQL.90).aspx) indicates that for clusters, you also need to add a SQL SPN that does not include the port number.  I have yet to see where this is actually needed; none of the clusters I’ve seen have had one.  Typically, if it is needed, you will receive a KDC_ERR_S_PRINCIPAL_UNKNOWN error if you enable Kerberos Event Logging.  If you do see that and it lists that SPN, then go ahead and add it.  But, from my experience, you won’t see it.
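    The check we just did by eye, looking for both the short and the FQDN form of the MSSQLSvc SPN with the port for each virtual name, can be sketched in Python.  Assumptions: the domain suffix, virtual names and ports come from the setspn output above, the function names are hypothetical, and only TCP SPNs of the MSSQLSvc/host:port form are covered (not the port-less cluster SPN or Named Pipes SPNs).

```python
def expected_sql_spns(virtual_name, domain, port):
    """Return the TCP MSSQLSvc SPNs expected for one SQL Server instance."""
    fqdn = f"{virtual_name}.{domain}"
    return {f"MSSQLSvc/{virtual_name}:{port}", f"MSSQLSvc/{fqdn}:{port}"}


def missing_spns(registered, instances, domain):
    """Compare SPNs listed by setspn -l against the expected ones.

    registered: SPN strings as printed by setspn -l.
    instances: (virtual_name, port) pairs.
    """
    have = {spn.strip().lower() for spn in registered}
    missing = []
    for name, port in instances:
        for spn in sorted(expected_sql_spns(name, domain, port)):
            if spn.lower() not in have:
                missing.append(spn)
    return missing
```

    Run against the four SPNs listed for SNEAKERNET\SQLSvc with ("SQL01", 14556) and ("SQL02", 26445), this returns an empty list, matching the conclusion that the SPNs look good.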

    OK, our SPNs look good.  Let’s look at our delegation settings.  In this case we really care about the SQL Service Account, because that is the context that will perform the delegation.


    We can do this by going to the properties for that account within Active Directory Users and Computers.  You will see a Delegation tab on the account.  If you don’t see the delegation tab, then the account does not have an SPN attached to it.  In this case we have “Trust this user for delegation to any service (Kerberos only)”.  This is what I call Full or Open Delegation as opposed to Constrained Delegation (which is more secure).  We are good to go here.  Nine times out of ten, the SPN or Delegation setting is going to be the cause of your issue.  In this case it isn’t.  What can we do now?

    Kerberos Event Logging and Network Traces:

    We can enable Kerberos Event Logging (http://support.microsoft.com/default.aspx?scid=kb;EN-US;262177) which will give us errors within the System Log for Kerberos.  This can sometimes be very helpful in diagnosing what may or may not be happening.  This produced the following results on ServerA:

    Error Code: 0x1b Unknown Error
    Error Code: 0x19 KDC_ERR_PREAUTH_REQUIRED
    And KDC_ERR_BADOPTION

    These errors are not uncommon, and when we looked at them they didn’t relate to our issue, so they gave us nothing to go on.  Of note, a linked server query from ServerB to ServerA worked, and it produced the same events listed above.  So, nothing to gain here.

    The next thing we can look at is a network trace, as this will show us the communication between the service in question and the Domain Controller.  I usually end up at this level if the SPNs and delegation settings check out.  This is where some customers struggle, because these traces are typically hard to interpret and usually require a call to CSS.  We grabbed a trace of both the failing and the working condition to see what was different.  We saw the following:

    Failing:
    525355 2009-06-30 15:55:39.468865 10.0.0.90 10.0.0.10 KRB5 TGS-REQ
    KDC_REQ_BODY
    KDCOptions: 40810000 (Forwardable, Renewable, Canonicalize)
    Realm: SNEAKERNET.LOCAL
    Server Name (Enterprise Name): ServerA$@SNEAKERNET.LOCAL

    Working:
    353115 23.437037 10.0.0.20 10.0.0.11 KRB5 TGS-REQ
    KDC_REQ_BODY
    KDCOptions: 40810000 (Forwardable, Renewable, Canonicalize)
    Realm: SNEAKERNET.LOCAL
    Server Name (Service and Instance): MSSQLSvc/SQL02.sneakernet.local:26445

    You’ll notice that we are hitting different DCs here, but that wasn’t the issue, as the failing request also went to different DCs as we continued.  The other difference is that the working request asked for the right SPN, whereas the failing one requested the physical machine account (ServerA$).  This is what was forcing us into NTLM and causing the Login failed error.  But why was that happening?  So far we had zero information to indicate the cause.

    SSPIClient:

    We then used an internal tool called SSPIClient, which calls the InitializeSecurityContext API directly; that is the same API we use for impersonation.  This tool allowed us to take SQL Server out of the picture and focus on the Kerberos issue directly.  We could see that we were falling back to NTLM, which confirmed what we saw in the network trace.

    2009-07-01 16:34:24.577 ENTER InitializeSecurityContextA
    2009-07-01 16:34:24.577 phCredential              = 0x0090936c
    2009-07-01 16:34:24.577 phContext                 = 0x00000000
    2009-07-01 16:34:24.577 pszTargetName             = 'MSSQLSvc/SQL02.sneakernet.local:26445'
    2009-07-01 16:34:24.577 fContextReq               = 0x00000003 ISC_REQ_DELEGATE|ISC_REQ_MUTUAL_AUTH
    2009-07-01 16:34:24.577 TargetDataRep             = 16
    2009-07-01 16:34:24.577 pInput                    = 0x00000000
    2009-07-01 16:34:24.577 phNewContext              = 0x0090937c
    2009-07-01 16:34:24.577 pOutput                   = 0x0017d468
    2009-07-01 16:34:24.577 pOutput->ulVersion        = 0
    2009-07-01 16:34:24.577 pOutput->cBuffers         = 1
    2009-07-01 16:34:24.577 pBuffers[00].cbBuffer   = 52
    2009-07-01 16:34:24.577 pBuffers[00].BufferType = 2 SECBUFFER_TOKEN
    2009-07-01 16:34:24.577 pBuffers[00].pvBuffer   = 0x02c99f90
    2009-07-01 16:34:24.578 02c99f90  4e 54 4c 4d 53 53 50 00 01 00 00 00 97 b2 08 e2   NTLMSSP.........
    2009-07-01 16:34:24.578 02c99fa0  03 00 03 00 31 00 00 00 09 00 09 00 28 00 00 00   ....1.......(...        
    2009-07-01 16:34:24.578 pfContextAttr             = 0x00001000 ISC_RET_INTERMEDIATE_RETURN
    2009-07-01 16:34:24.578 ptsExpiry                 = 0x0017d43c -> 2009-07-01 10:39:24 *** EXPIRED *** (05:55:00 diff)
    2009-07-01 16:34:24.578 EXIT  InitializeSecurityContextA returned 0x00090312 SEC_I_CONTINUE_NEEDED (The function completed successfully, but must be called again to complete the context)

    NOTE:  We purged all of the Kerberos Tickets before we did this to make sure we would request the ticket from the KDC.  This was done using KerbTray which is part of the Windows Resource Kit.

    This tells us that we were requesting the expected SPN for the target, but the output buffer starts with NTLMSSP.  This means we fell back to NTLM instead of getting Kerberos.  It still doesn’t explain why.
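    The NTLMSSP giveaway in the hex dump above can be checked mechanically.  The following is a minimal Python sketch, not part of SSPIClient or any Microsoft tool: classify_sspi_token is a hypothetical helper that looks for the NTLM signature (NTLMSSP followed by a NUL byte and a little-endian message type) or for a DER-encoded Kerberos mechanism OID.  It is a heuristic byte scan under those assumptions, not an ASN.1 parser.

```python
import struct

# NTLM messages start with this 8-byte signature.
NTLMSSP_SIGNATURE = b"NTLMSSP\x00"
# DER-encoded Kerberos V5 mechanism OIDs: 1.2.840.113554.1.2.2 and
# Microsoft's legacy 1.2.840.48018.1.2.2.
KERBEROS_OIDS = (
    bytes.fromhex("2a864886f712010202"),
    bytes.fromhex("2a864882f712010202"),
)
NTLM_MESSAGE_TYPES = {1: "NEGOTIATE", 2: "CHALLENGE", 3: "AUTHENTICATE"}


def classify_sspi_token(token: bytes) -> str:
    """Heuristically label an SSPI token; a byte scan, not an ASN.1 parser."""
    idx = token.find(NTLMSSP_SIGNATURE)
    if idx != -1 and len(token) >= idx + 12:
        # The NTLM message type is a little-endian uint32 right after the signature.
        (msg_type,) = struct.unpack_from("<I", token, idx + 8)
        return "NTLM " + NTLM_MESSAGE_TYPES.get(msg_type, f"type {msg_type}")
    if any(oid in token for oid in KERBEROS_OIDS):
        return "Kerberos"
    return "unknown"


if __name__ == "__main__":
    # First 12 bytes of the SSPIClient output buffer shown above.
    dump = bytes.fromhex("4e 54 4c 4d 53 53 50 00 01 00 00 00")
    print(classify_sspi_token(dump))
```

    Fed the first 12 bytes of the failing dump, it reports an NTLM NEGOTIATE message, which is the fallback described above.  A token that carries the Kerberos OID instead would be labeled Kerberos.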

    End Result:

    Unfortunately, this was one of those issues that just escaped us, which tends to happen with odd Kerberos cases.  We had the Directory Services team engaged as well, and they did not know of any further data collection we could do, short of a kernel dump, to see what might be going on.  We noticed that the nodes had not been rebooted since April 5th, which was quite a while, although the SQL Service had been recycled on June 25th.  We decided to fail over to another node and reboot ServerA.  After the reboot, we tried SSPIClient again and saw a proper response come back, without the EXPIRED flag.  At this point the issue was resolved.  We don’t have hard data to show exactly what the issue was, but the thought is that something cached had become invalid, and rebooting cleared it out and allowed us to work as expected.

    Which leads me to my motto:  When in doubt, Reboot!

    Adam W. Saxton | Microsoft SQL Server Escalation Services


  • 12/03/09--14:53: Report Builder and Firewalls
  • We have had a few customer calls come in on this scenario, so I thought it needed to be documented a bit.

    Scenario:


    In this scenario, the customer has a data source defined on the Report Server.  Some were using Named Instances, others were using a Default Instance for the Data Source.

    Some aspects of Report Builder run server side (in the context of the Report Server), for example DataSource retrieval and report preview.  This assumes that we are in connected mode in Report Builder.


    There are other aspects that will run Client Side.  Some examples of that are the Query Designer and general Metadata lookup for the DataSet.  This is where the problems come into play when a firewall is involved.

    In all of the cases, reports and Report Builder function normally on the local network.  When users outside that network try to create a new report through Report Builder, they encounter errors similar to the following:

     

    A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: SQL Network Interfaces, error: 26 - Error Locating Server/Instance Specified)

     

    A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 0 - The requested name is valid, but no data of the requested type was found.)

    A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)

    The first error is specific to a Named Instance.  The other two occur when we try to connect directly to the SQL Server.  Named Instances have to do a lookup to get the port number for the actual instance we are connecting to.  This lookup is handled by SQL Browser over UDP port 1434.  Whenever you see “error: 26 - Error Locating Server/Instance Specified”, it is SQL Browser related.  The underlying issue is still the same as in the other messages.
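    To make the SQL Browser step concrete, here is a minimal Python sketch of the named-instance lookup, written from my reading of the SQL Server Resolution Protocol ([MS-SQLR]): the client sends a single-instance request (0x04 plus the instance name) to UDP 1434 and gets back a 0x05 response holding a semicolon-separated key/value string that includes the tcp port.  The function names are illustrative, and a real provider does more than this (broadcast queries, retries, other protocols).

```python
import socket

SQL_BROWSER_PORT = 1434  # SQL Browser listens on UDP 1434
CLNT_UCAST_INST = 0x04   # SSRP request for a single named instance
SVR_RESP = 0x05          # SSRP response marker


def build_instance_request(instance: str) -> bytes:
    """0x04, then the instance name, then a terminating NUL byte."""
    return bytes([CLNT_UCAST_INST]) + instance.encode("ascii") + b"\x00"


def parse_browser_response(data: bytes) -> dict:
    """Parse an SVR_RESP packet into a dict such as {'tcp': '2644', ...}."""
    if len(data) < 3 or data[0] != SVR_RESP:
        raise ValueError("not an SSRP SVR_RESP packet")
    size = int.from_bytes(data[1:3], "little")  # 2-byte little-endian length
    text = data[3:3 + size].decode("ascii", errors="replace")
    # Payload looks like "ServerName;X;InstanceName;Y;...;tcp;2644;;"
    parts = text.rstrip(";").split(";")
    return dict(zip(parts[0::2], parts[1::2]))


def lookup_instance_port(host: str, instance: str, timeout: float = 2.0) -> int:
    """Ask SQL Browser on `host` for the TCP port of `instance`.

    Raises socket.gaierror if `host` cannot be resolved, socket.timeout if
    nothing answers on UDP 1434, and KeyError if the instance has TCP disabled.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_instance_request(instance), (host, SQL_BROWSER_PORT))
        data, _addr = sock.recvfrom(65535)
    return int(parse_browser_response(data)["tcp"])
```

    From a client outside the firewall, lookup_instance_port("MyServer", "MyInstance") would typically fail in one of two places: socket.gaierror when the name can’t be resolved, or socket.timeout when UDP 1434 is blocked.  Both are roughly the situations behind the error 26 message above.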

    The way I reproduced the issue was by doing the following on my lab setup which was configured for Basic Authentication:

    1. Open Report Builder (which starts with a blank report – and I was in connected mode with my Report Server)
    2. Create a DataSource which I select from the existing data sources on my Report Server
    3. Create a DataSet
    4. At this point, the DataSet Properties window should open up, at which point you can click on “Query Designer…”

    5. I was then prompted for Credentials and then was met with the following:

      image

    The Problem:

    The overall problem is that Report Builder cannot see the SQL Server when it runs outside the network where SQL Server resides, and SQL Server is typically not exposed through the firewall.  Assume the following configuration:

    Report Server:

    • Internet RS URL: http://www.mysite.com/ReportServer
    • Public IP:  201.201.201.201
    • Private IP: 10.0.0.5
    • DataSource Connection String:  server=MyServer\MyInstance;Database=AdventureWorks;

    SQL Server:

    • Server Name: MyServer
    • Instance Name: MyInstance (Port 2644)
    • Private IP:  10.0.0.4

    When Report Builder is opened from a client machine on the Internet (or anywhere outside the private network that SQL Server belongs to) and goes to hit the datasource, it actually tries to connect to MyServer\MyInstance itself.  Because this is a named instance, we do the SQL Browser lookup first, which in this case will be a NetBIOS lookup.  For a straight TCP connection, we would do a DNS lookup instead.  Because we are on the Internet, there is no WINS or DNS server that is aware of MyServer, so NetBIOS or DNS comes back saying it couldn’t find the requested server name, which results in one of the errors I outlined above.

    Report Builder doesn’t go through the Reporting Services WebService for DataSource calls, which would make them server based.  By design, these calls are client side, and Report Builder will try to retrieve that data from the client.  I think some of the confusion is that people assume that, because we are in “connected” mode with the Report Server, all functionality occurs on the Report Server itself, in which case we would expect the Report Server to communicate with the SQL Server successfully.  This, unfortunately, is not the case.

    Are there any workarounds?

    The next logical question would be, how do I get this to work?  There are two possible workarounds I can think of: one that is not very realistic, and another that is possible but also somewhat of a pain.

    Workaround 1:

    This involves exposing your SQL Server to the internet, which I do NOT recommend and I doubt most companies are willing to do.  At that point, you could have an External DataSource along with an Internal DataSource.  People using Report Builder on the internet could reference the External DataSource, which has connection information for the SQL Server that is usable from the internet.  The design aspects would then work, but Preview could fail, depending on your network configuration, if the Report Server cannot reach SQL Server’s external IP address from the internal side.

    Then when you publish, the report can reference the Internal DataSource.

    Workaround 2:

    Another option is to expose your data through a WebService that is accessible via the Internet.  Report Builder users can then access a DataSource that uses the WebService, as that resource is available to them externally.

    Update - Workaround 3 (SSAS/OLAP) – Thanks David!:

    For SSAS/OLAP you can set up a Connection Proxy over HTTP.  This would be usable both internally and externally and can be easily exposed through a firewall.  For security purposes, be sure to use a non-standard port that is configured on your firewall.  Also be aware that you are exposing your backend to the internet, and take the appropriate security measures.  SQL Server has a similar feature in HTTP Endpoints, but be aware that it has been deprecated and is not guaranteed to be available in a future release.

     

    Overall, it will be difficult for people using Report Builder externally to access resources that are on an internal network when designing a report.  Hopefully, this will allow you to better plan your deployment of Reporting Services.

     

    Adam W. Saxton | Microsoft SQL Server Escalation Services


    Hi,

     

    I wanted to make everybody aware of this feature in SQL 2008. 

     

    Are you tired of having to use NetMon to narrow down a connectivity issue with SQL Server 2008 or have to wait for an elusive connectivity error to reoccur?

     

    A new ring buffer called 'RING_BUFFER_CONNECTIVITY' has been added to the DMV sys.dm_os_ring_buffers in SQL 2008 RTM.

     

    This will automatically log server-initiated connection closures.  If you see nothing in the DMV, then most likely the client reset or closed the connection.  You can enable logging of every connection closure (client or server) with trace flag 7827.

     

    Please read this blog for more information!

    http://blogs.msdn.com/sql_protocols/archive/2008/05/20/connectivity-troubleshooting-in-sql-server-2008-with-the-connectivity-ring-buffer.aspx

     

    So if SQL Server 2008 has stayed online since the connection failure, make sure to capture the information from sys.dm_os_ring_buffers, based on the query in the blog above, as it may give you enough information to narrow your troubleshooting down to the client or the server without costly NetMon traces.
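    Each row of sys.dm_os_ring_buffers carries its details as XML in the record column.  If you export those records, a small Python sketch like the following can flatten them for a quick look.  The element names in the sample (ConnectivityTraceRecord, RecordType, RemoteHost and so on) are assumptions for illustration only, so check them against the XML your own server returns; the flattening itself does not depend on specific names.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def flatten_record(record_xml: str) -> dict:
    """Flatten one ring buffer <Record> into {leaf_tag: text}.

    If a leaf tag repeats, the last occurrence wins.
    """
    root = ET.fromstring(record_xml)
    fields = {}
    for elem in root.iter():
        # Leaf elements only: no children and some non-blank text.
        if len(elem) == 0 and elem.text and elem.text.strip():
            fields[elem.tag] = elem.text.strip()
    # Keep the outer <Record> attributes too, prefixed with '@'.
    fields.update({"@" + key: value for key, value in root.attrib.items()})
    return fields


def count_by_record_type(records) -> Counter:
    """Tally records by their RecordType leaf (e.g. errors vs. closures)."""
    return Counter(flatten_record(r).get("RecordType", "?") for r in records)


# Hypothetical sample record; element names are assumptions, not a schema.
SAMPLE = """<Record id="2" type="RING_BUFFER_CONNECTIVITY" time="12345">
  <ConnectivityTraceRecord>
    <RecordType>Error</RecordType>
    <Spid>0</Spid>
    <RemoteHost>10.0.0.3</RemoteHost>
    <RemotePort>50123</RemotePort>
  </ConnectivityTraceRecord>
</Record>"""

if __name__ == "__main__":
    print(flatten_record(SAMPLE))
    print(count_by_record_type([SAMPLE]))
```

    Tallying by record type is a quick way to see whether the server logged errors or only connection closures before digging into individual records.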

     

    Hope this helps!

    Eric Burgess
    SQL Server Escalation Team

    I was working with Keith Elmore on one of our internal processes, and he was hitting a “Cannot generate SSPI context” error when trying to connect from Management Studio.  I also saw this come up in a double-hop situation (IIS to SQL) when I set up a local repro.


    We went through the normal checklist for Kerberos troubleshooting, but really that just consisted of validating the SPN, since Management Studio was a single hop and we were just trying to make a direct connection without any delegation.  The SPN checked out, and there was only one SPN.  No duplicates.


    We have an internal tool called SSPIClient, which goes through the motions of just making the Windows API calls for Kerberos authentication (InitializeSecurityContext).

    2009-12-30 21:11:16.185 Connecting via ODBC to [DRIVER=SQL Server;Server=tcp:passsql\demo;Trusted_Connection=Yes;]

    2009-12-30 21:11:16.232 ENTER InitializeSecurityContextA
    2009-12-30 21:11:16.232 phCredential              = 0x0055ffb4
    2009-12-30 21:11:16.232 phContext                 = 0x0055ffc4
    2009-12-30 21:11:16.232 pszTargetName             = 'MSSQLSvc/PASSSQL.pass.local:59256'
    2009-12-30 21:11:16.232 fContextReq               = 0x00000003 ISC_REQ_DELEGATE|ISC_REQ_MUTUAL_AUTH
    2009-12-30 21:11:16.232 TargetDataRep             = 16
    2009-12-30 21:11:16.232 pInput                    = 0x0018d55c
    2009-12-30 21:11:16.232 pInput->ulVersion         = 0
    2009-12-30 21:11:16.232 pInput->cBuffers          = 1
    2009-12-30 21:11:16.232 pBuffers[00].cbBuffer   = 112
    2009-12-30 21:11:16.232 pBuffers[00].BufferType = 2 SECBUFFER_TOKEN
    2009-12-30 21:11:16.232 pBuffers[00].pvBuffer   = 0x03753870
    2009-12-30 21:11:16.232 03753870  a1 6e 30 6c a0 03 0a 01 01 a2 65 04 63 60 61 06   .n0l......e.c`a.
    2009-12-30 21:11:16.232 03753880  09 2a 86 48 86 f7 12 01 02 02 03 00 7e 52 30 50   .*.H........~R0P
    2009-12-30 21:11:16.232 03753890  a0 03 02 01 05 a1 03 02 01 1e a4 11 18 0f 32 30   ..............20
    2009-12-30 21:11:16.232 037538a0  30 39 31 32 33 30 32 31 31 31 31 36 5a a5 05 02   091230211116Z...
    2009-12-30 21:11:16.232 037538b0  03 01 0d b4 a6 03 02 01 29 a9 0c 1b 0a 50 41 53   ........)....PAS
    2009-12-30 21:11:16.232 037538c0  53 2e 4c 4f 43 41 4c aa 17 30 15 a0 03 02 01 01   S.LOCAL..0......
    2009-12-30 21:11:16.232 037538d0  a1 0e 30 0c 1b 0a 73 71 6c 73 65 72 76 69 63 65   ..0...sqlservice
    2009-12-30 21:11:16.232 phNewContext              = 0x0055ffc4
    2009-12-30 21:11:16.232 pOutput                   = 0x0018d574
    2009-12-30 21:11:16.232 pOutput->ulVersion        = 0
    2009-12-30 21:11:16.232 pOutput->cBuffers         = 1
    2009-12-30 21:11:16.232 pBuffers[00].cbBuffer   = 12256
    2009-12-30 21:11:16.232 pBuffers[00].BufferType = 2 SECBUFFER_TOKEN
    2009-12-30 21:11:16.232 pBuffers[00].pvBuffer   = 0x03759d68
    2009-12-30 21:11:16.232 pfContextAttr             = 0x00000000
    2009-12-30 21:11:16.232 ptsExpiry                 = 0x0018d548 -> 1601-01-01 00:00:00 *** EXPIRED *** (3585189:11:16 diff)
    2009-12-30 21:11:16.232 EXIT  InitializeSecurityContextA returned 0x80090322 SEC_E_WRONG_PRINCIPAL (The target principal name is incorrect)
    2009-12-30 21:11:16.232
    2009-12-30 21:11:16.232 ******************** ODBC Errors ********************
    2009-12-30 21:11:16.232 Return code = -1.
    2009-12-30 21:11:16.232 SQLError[00] SQLState    'S1000'
    2009-12-30 21:11:16.232 SQLError[00] NativeError 0
    2009-12-30 21:11:16.232 SQLError[00] Message     '[Microsoft][ODBC SQL Server Driver]Cannot generate SSPI context'
    2009-12-30 21:11:16.232 ******************** ODBC Errors ********************

    It was saying that the principal was incorrect, but you can see in the output that it is showing sqlservice, which is correct.  We had rebooted the SQL Server in question, at which point the SQL Service wouldn’t even start.  Keith asked if the password had been changed recently.  We took a look, and sure enough, the password was changed yesterday.  This happens to be an account that we use for multiple things. 

    We changed the service account password through SQL Server Configuration Manager and restarted SQL.  SQL could start at that point, and the SSPI error disappeared.  We were able to successfully connect to SQL at that point.

    I’m sure other people have known about this type of condition, but in the years that I’ve been here, and with the number of Kerberos issues I’ve troubleshot, this was the first time I had run across it.  I thought I would throw it out there in case anyone else runs across something like this that they can’t explain.

    If you change your service password, be sure to recycle the SQL Service so that Kerberos can function properly.
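    Two values in SSPIClient output are worth decoding.  The return code is an SSPI status whose high bit marks failure (0x80090322 is SEC_E_WRONG_PRINCIPAL, while 0x00090312 SEC_I_CONTINUE_NEEDED is informational), and ptsExpiry is a Windows FILETIME counting 100-nanosecond ticks from 1601-01-01, so a 1601-01-01 expiry on a failed call most likely just means a zero timestamp.  This Python sketch covers only the handful of status codes listed; the helper names are my own.

```python
from datetime import datetime, timedelta

# A few SSPI status codes from the Windows SDK headers (not an exhaustive list).
SSPI_STATUS = {
    0x00000000: "SEC_E_OK",
    0x00090312: "SEC_I_CONTINUE_NEEDED",
    0x80090303: "SEC_E_TARGET_UNKNOWN",
    0x8009030C: "SEC_E_LOGON_DENIED",
    0x80090311: "SEC_E_NO_AUTHENTICATING_AUTHORITY",
    0x80090322: "SEC_E_WRONG_PRINCIPAL",
}


def describe_sspi_status(status: int) -> str:
    # Normalize negative values (signed HRESULTs) to an unsigned 32-bit code.
    status &= 0xFFFFFFFF
    name = SSPI_STATUS.get(status, "unknown status")
    # The high (severity) bit is set for failures.
    kind = "failure" if status & 0x80000000 else "success/info"
    return f"0x{status:08X} {name} ({kind})"


FILETIME_EPOCH = datetime(1601, 1, 1)


def filetime_to_datetime(ticks: int) -> datetime:
    # FILETIME counts 100-nanosecond intervals since 1601-01-01 (UTC).
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


if __name__ == "__main__":
    print(describe_sspi_status(0x80090322))
    print(filetime_to_datetime(0))
```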

     

    Adam W. Saxton | Microsoft SQL Server Escalation Services


    We have had a number of people ask how they can get a 64-bit version of the Jet ODBC driver/OLE DB Provider.  Windows only ships the 32-bit versions of these, and the Windows versions won’t be made x64 because those components are deprecated.  What does deprecated mean?  Here is the excerpt from the MDAC/WDAC Roadmap on MSDN:

    Deprecated MDAC/WDAC Components

    These components are still supported in the current release of MDAC/WDAC, but they might be removed in future releases. Microsoft recommends, when you develop new applications, that you avoid using these components. Additionally, when you upgrade or modify existing applications, remove any dependency on these components.

    And here is what it lists about the Jet Database Engine:

    Microsoft Jet Database Engine 4.0: Starting with version 2.6, MDAC no longer contains Jet components. In other words, MDAC 2.6, 2.7, 2.8, and all future MDAC/WDAC releases do not contain Microsoft Jet, the Microsoft Jet OLE DB Provider, the ODBC Desktop Database Drivers, or Jet Data Access Objects (DAO). The Microsoft Jet Database Engine 4.0 components entered a state of functional deprecation and sustained engineering, and have not received feature level enhancements since becoming a part of Microsoft Windows in Windows 2000.


    There is no 64-bit version of the Jet Database Engine, the Jet OLEDB Driver, the Jet ODBC Drivers, or Jet DAO available. This is also documented in KB article 957570. On 64-bit versions of Windows, 32-bit Jet runs under the Windows WOW64 subsystem. For more information on WOW64, see http://msdn.microsoft.com/en-us/library/aa384249(VS.85).aspx. Native 64-bit applications cannot communicate with the 32-bit Jet drivers running in WOW64.


    Instead of Microsoft Jet, Microsoft recommends using Microsoft SQL Server Express Edition or Microsoft SQL Server Compact Edition when developing new, non-Microsoft Access applications requiring a relational data store. These new or converted Jet applications can continue to use Jet with the intention of using Microsoft Office 2003 and earlier files (.mdb and .xls) for non-primary data storage. However, for these applications, you should plan to migrate from Jet to the 2007 Office System Driver. You can download the 2007 Office System Driver, which allows you to read from and write to pre-existing files in either Office 2003 (.mdb and .xls) or the Office 2007 (*.accdb, *.xlsm, *.xlsx and *.xlsb) file formats. IMPORTANT Please read the 2007 Office System End User License Agreement for specific usage limitations.


    Note: SQL Server applications can also access the 2007 Office System, and earlier, files from SQL Server heterogeneous data connectivity and Integrations Services capabilities as well, via the 2007 Office System Driver. Additionally, 64-bit SQL Server applications can access to 32-bit Jet and 2007 Office System files by using 32-bit SQL Server Integration Services (SSIS) on 64-bit Windows.

    This all pertains to the components that actually ship with Windows.  The Office team has since taken up Jet as part of Access and has come out with what they call the Access Connectivity Engine (ACE) driver.  For more information on the ACE drivers, you can check out this blog post, which goes into detail.  The ACE driver/provider is completely backwards compatible with Jet 4.0.

    Office 2010 will introduce a 64 bit version of Office.  With that is coming a 64 bit version of the ACE Driver/Provider which will in essence give you a 64 bit version of Jet.  The downside is that it doesn’t ship with the operating system but will be a redistributable.  There is a beta version available of this driver, as Office 2010 hasn’t been released yet.

    2010 Office System Driver Beta: Data Connectivity Components
    http://www.microsoft.com/downloads/details.aspx?familyid=C06B8369-60DD-4B64-A44B-84B371EDE16D&displaylang=en
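    Because the Windows Jet components are 32-bit only, the first thing to check when a Jet connection fails is the bitness of the calling process.  The Python sketch below reads the pointer size of the running interpreter and picks a provider name accordingly.  Microsoft.Jet.OLEDB.4.0 is the Windows Jet provider; using Microsoft.ACE.OLEDB.12.0 for the 64-bit case is an assumption about what the ACE redistributable registers, so confirm the name against the version you install.

```python
import struct

JET_PROVIDER = "Microsoft.Jet.OLEDB.4.0"   # ships with Windows, 32-bit only
ACE_PROVIDER = "Microsoft.ACE.OLEDB.12.0"  # assumption: name registered by the ACE redistributable


def process_bits() -> int:
    """Bitness of the current process: pointer size 4 -> 32-bit, 8 -> 64-bit."""
    return struct.calcsize("P") * 8


def pick_access_provider(bits: int) -> str:
    """Jet is only usable from 32-bit processes; 64-bit ones need 64-bit ACE."""
    return JET_PROVIDER if bits == 32 else ACE_PROVIDER


if __name__ == "__main__":
    bits = process_bits()
    print(f"{bits}-bit process -> {pick_access_provider(bits)}")
```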

    Adam W. Saxton | Microsoft SQL Server Escalation Services


    This month has turned into another Kerberos Month for me.  I had an email discussion regarding SPN’s for SQL Server and what we can do to get them created and in a usable state.  I thought I would share my response to the questions as it will probably be helpful for someone.  Here was the comment that started the conversation.  And, by the way, this was actually a good question.  I actually see this kind of comment a lot in regards to SPN placement.  Not necessarily the setup aspect of it, but for SPN’s in general.

    “In prior versions of setup we used to be able to specify the port number for the default and Named Instance.  Now, (SQL 2008 & R2) it takes the defaults.  1433 and Dynamic for Named Instances.

    If you want to use Kerberos with TCP, you need to know the port number to create the SPN.  For Default instances, if you’re using 1433 then you’re ok. But, Named Instances listen on a dynamic port by default, and since you can’t set the port number, any SPN you create will probably be wrong and Kerberos won’t work.  It would be great if we could ask the user if they want to change the port number during setup, like we did with SQL 2000.”

    Let’s have a look at Books Online first.

    Registering a Service Principal Name
    http://msdn.microsoft.com/en-us/library/ms191153.aspx

    This article goes through the different formats that are applicable to SQL 2008 (they are the same for R2 as well).  It also touches on two items that are important to understand.  1.  Automatic SPN Registration and 2. Client Connections. Here is the excerpt from the above article in regards to Automatic SPN Registration.

    Automatic SPN Registration

    When an instance of the SQL Server Database Engine starts, SQL Server tries to register the SPN for the SQL Server service. When the instance is stopped, SQL Server tries to unregister the SPN. For a TCP/IP connection the SPN is registered in the format MSSQLSvc/<FQDN>:<tcpport>. Both named instances and the default instance are registered as MSSQLSvc, relying on the <tcpport> value to differentiate the instances.

    For other connections that support Kerberos the SPN is registered in the format MSSQLSvc/<FQDN>:<instancename> for a named instance. The format for registering the default instance is MSSQLSvc/<FQDN>.

    Manual intervention might be required to register or unregister the SPN if the service account lacks the permissions that are required for these actions.

    What does this mean?  It means that if the SQL Service account is using Local System or Network Service as the logon account, we will have the permission necessary to register the SPN against the Domain Machine Account.  By default, the machine accounts have permission to modify themselves.  If we change this over to a Domain User Account for the SQL Service account, things change a little.  By default a Domain User does not have the permission required to create the SPN.  So, when you start SQL Server with a Domain User Account, you will see an entry in your ERRORLOG similar to the following:

    2010-03-05 09:39:53.20 Server      The SQL Server Network Interface library could not register the Service Principal Name (SPN) for the SQL Server service. Error: 0x2098, state: 15. Failure to register an SPN may cause integrated authentication to fall back to NTLM instead of Kerberos. This is an informational message. Further action is only required if Kerberos authentication is required by authentication policies.

    This permission is called “Write servicePrincipalName” and can be altered through an MMC snap-in called ADSI Edit.  For instructions on how to modify this setting, refer to Step 3 in the following KB article.  WARNING:  I do NOT recommend you do this on a cluster.  We have seen it cause connectivity issues due to Active Directory replication when more than one Domain Controller is used in your environment.

    How to use Kerberos authentication in SQL Server
    http://support.microsoft.com/kb/319723
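    To check whether the service hit this at startup, you can scan the ERRORLOG for the message shown above.  In this Python sketch the regular expression is keyed to the wording of that one message, and the mapping of 0x2098 (8344 decimal) to the Win32 code ERROR_DS_INSUFF_ACCESS_RIGHTS is the only entry provided; the function name is hypothetical.  ERRORLOG files are typically written as UTF-16, so open them with that encoding if a plain read returns garbage.

```python
import re

# Matches the "could not register the Service Principal Name" ERRORLOG entry.
SPN_FAILURE = re.compile(
    r"could not register the Service Principal Name \(SPN\)"
    r".*?Error: (0x[0-9A-Fa-f]+), state: (\d+)"
)
# Win32 error 8344 (0x2098): insufficient access rights in the directory.
KNOWN_CODES = {0x2098: "ERROR_DS_INSUFF_ACCESS_RIGHTS"}


def find_spn_failures(errorlog_text: str):
    """Return (error_code, state, name) for each SPN registration failure found."""
    failures = []
    for line in errorlog_text.splitlines():
        match = SPN_FAILURE.search(line)
        if match:
            code = int(match.group(1), 16)
            failures.append((code, int(match.group(2)), KNOWN_CODES.get(code, "unknown")))
    return failures
```

    Any hit with ERROR_DS_INSUFF_ACCESS_RIGHTS means the service account lacked the Write servicePrincipalName permission, so the SPNs either need that permission or have to be created manually.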


    So, if I enable that permission, let’s see what the SQL Service does.  I have two machines I’m going to use for this: ASKJCTP3 (running the RC build of 2008 R2) and MySQLCluster (SQL 2008 running a Named Instance called SQL2K8).

    SetSPN Details:

    SPN's with TCP and NP enabled on Default Instance:

    C:\>setspn -l sqlservice
    Registered ServicePrincipalNames for CN=SQL Service,OU=Services,DC=dsdnet,DC=local:
            MSSQLSvc/ASKJCTP3.dsdnet.local:1433
            MSSQLSvc/ASKJCTP3.dsdnet.local

    SPN's with only NP enabled on Default Instance:

    C:\>setspn -l sqlservice
    Registered ServicePrincipalNames for CN=SQL Service,OU=Services,DC=dsdnet,DC=local:
            MSSQLSvc/ASKJCTP3.dsdnet.local

    SPN's with TCP and NP enabled on Clustered Named Instance:

    C:\>setspn -l sqlservice
    Registered ServicePrincipalNames for CN=SQL Service,OU=Services,DC=dsdnet,DC=local:
            MSSQLSvc/MYSQLCLUSTER.dsdnet.local:54675
            MSSQLSvc/MYSQLCLUSTER.dsdnet.local:SQL2K8

    SPN's with only NP enabled on a Clustered Named Instance:

    C:\>setspn -l sqlservice
    Registered ServicePrincipalNames for CN=SQL Service,OU=Services,DC=dsdnet,DC=local:
            MSSQLSvc/MYSQLCLUSTER.dsdnet.local:SQL2K8
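    The four listings follow the Automatic SPN Registration rules quoted above: TCP adds a port-based SPN, and Named Pipes adds either the bare FQDN (default instance) or FQDN:InstanceName (named instance).  Here is a minimal Python sketch of that rule, handy for working out what should be on the account before you run SetSPN.  The TCP-only case is not shown in the listings, so the sketch’s output for it is an assumption.

```python
from typing import List, Optional


def registered_spns(fqdn: str, instance: Optional[str], tcp_port: Optional[int],
                    np_enabled: bool) -> List[str]:
    """SPNs the service attempts to register at startup.

    instance is None for a default instance; tcp_port is None when TCP is off.
    """
    spns = []
    if tcp_port is not None:
        # TCP: always port-based, for default and named instances alike.
        spns.append(f"MSSQLSvc/{fqdn}:{tcp_port}")
    if np_enabled:
        # Non-TCP: named instances add the instance name, default uses the bare FQDN.
        spns.append(f"MSSQLSvc/{fqdn}:{instance}" if instance else f"MSSQLSvc/{fqdn}")
    return spns


if __name__ == "__main__":
    for spn in registered_spns("MYSQLCLUSTER.dsdnet.local", "SQL2K8", 54675, True):
        print(spn)
```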

    Let’s look at what the client will do.  When I say client, this could mean a lot of different things.  Really it means an Application trying to connect to SQL Server by way of a Provider/Driver.  NOTE:  Specifying the SPN as part of the connection is specific to SQL Native Client 10 and later.  It does not apply to SqlClient or the Provider/Driver that ships with Windows.

    Service Principal Name (SPN) Support in Client Connections
    http://msdn.microsoft.com/en-us/library/cc280459.aspx

    MSSQLSvc/fqdn

    The provider-generated, default SPN for a default instance when a protocol other than TCP is used.

    fqdn is a fully-qualified domain name.

    MSSQLSvc/fqdn:port

    The provider-generated, default SPN when TCP is used.

    port is a TCP port number.

    MSSQLSvc/fqdn:InstanceName

    The provider-generated, default SPN for a named instance when a protocol other than TCP is used.

    InstanceName is a SQL Server instance name
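    The client-side rules above boil down to one decision on the protocol.  This Python sketch (a hypothetical helper, not a provider API) returns the SPN a SNAC10-style provider would generate when the connection does not specify one; for TCP it assumes the port has already been resolved, for example through SQL Browser for a named instance.

```python
from typing import Optional


def client_default_spn(fqdn: str, protocol: str, port: Optional[int] = None,
                       instance: Optional[str] = None) -> str:
    """SPN the provider generates when the connection string does not supply one.

    protocol is "tcp" or something else (e.g. "np"); instance is None for a
    default instance.
    """
    if protocol.lower() == "tcp":
        if port is None:
            raise ValueError("a TCP SPN needs the resolved port number")
        return f"MSSQLSvc/{fqdn}:{port}"
    if instance:
        # Non-TCP named instance: FQDN plus instance name.
        return f"MSSQLSvc/{fqdn}:{instance}"
    # Non-TCP default instance: bare FQDN.
    return f"MSSQLSvc/{fqdn}"
```

    The first two cases correspond to the TGS requests in the traces below: MSSQLSvc/askjctp3.dsdnet.local:1433 for straight TCP and MSSQLSvc/askjctp3.dsdnet.local for Named Pipes against the Default Instance.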

    Based on this, if I have a straight TCP connection, the Provider/Driver will use the Port for the SPN designation.  Let’s see what happens when I try to make connections using a UDL file.  For the UDL I’m going to use the SQL Native Client 10 OleDb Provider.  Starting with SNAC10, we can specify which SPN to use for the connection.  This provides us some flexibility when we control how the application is going to connect.  Note:  This is not available with the Provider/Driver that actually ships with Windows.  I will also show what the Kerberos request looks like in the network trace, which shows us what SPN is actually being used.  All of these connection attempts were made using ASKJCTP3, which is a Default Instance.

    Because this is a Default Instance, I added the Instance Name SPN manually.

    C:\>setspn -l sqlservice
    Registered ServicePrincipalNames for CN=SQL Service,OU=Services,DC=dsdnet,DC=local:
            MSSQLSvc/ASKJCTP3.dsdnet.local:MSSQLSERVER
            MSSQLSvc/ASKJCTP3.dsdnet.local:1433
            MSSQLSvc/ASKJCTP3.dsdnet.local
            MSSQLSvc/MYSQLCLUSTER.dsdnet.local:54675
            MSSQLSvc/MYSQLCLUSTER.dsdnet.local:SQL2K8

    Straight TCP with no SPN Specified:


    58     1.796875   {TCP:7, IPv4:5}      10.0.0.3      10.0.0.1      KerberosV5    KerberosV5:TGS Request Realm: DSDNET.LOCAL Sname: MSSQLSvc/askjctp3.dsdnet.local:1433

    TCP with specifying an SPN for the connection:


    32     1.062500   {TCP:11, IPv4:5}     10.0.0.3      10.0.0.1      KerberosV5    KerberosV5:TGS Request Realm: DSDNET.LOCAL Sname: MSSQLSvc/ASKJCTP3.dsdnet.local:MSSQLSERVER

    Forcing Named Pipes with no SPN specified:


    68     1.828125   {TCP:21, IPv4:5}     10.0.0.3      10.0.0.1      KerberosV5    KerberosV5:TGS Request Realm: DSDNET.LOCAL Sname: MSSQLSvc/askjctp3.dsdnet.local

     

    The way the provider/driver determines which SPN to use is based on the Protocol being used.  Of note, starting in SQL 2008 we allowed for Kerberos to be used with Named Pipes.  If you have a Named Instance and you are using the Named Pipes protocol, we will look for an SPN with the Named Instance specified.  For a Default Instance and Named Pipes, we will just look for the SPN with no port or Named Instance Name specified as shown above.

    Because you can specify the SPN from the client side, you can easily control which SPN will be used, or simply observe how we determine which SPN to use.

    Now that we know all of the above, let’s go back to the original question.  Your company may or may not want to enable the Write permission for the Domain User Account.  If your company is not willing to open up the permission on the service account, then your only recourse is to set a static port for the Named Instance instead of letting it use a dynamic port.  This would also be my recommendation for Clusters.  In this case, you will need to know exactly which SPNs are needed and create them manually using SetSPN or the tool of your choice.

    Even though we don’t provide the ability to set the port during setup, you can still modify the port settings for the Instance through SQL Server Configuration Manager.  This allows you to set static SPNs, and it also helps with Firewall rules.


    Adam W. Saxton | Microsoft SQL Server Escalation Services

    http://twitter.com/awsaxton


    I saw a lot of hits on the web when I searched for the Error message 18056 with State 29. I even saw two Microsoft Connect items for this issue filed for SQL Server 2008 instances:

    http://connect.microsoft.com/SQL/feedback/ViewFeedback.aspx?FeedbackID=468478

    http://connect.microsoft.com/SQLServer/feedback/details/540092/sql-server-2008-sp1-cu6-periodically-does-not-accept-connections

    So, I thought it was high time that we penned a blog post on when this message can be safely ignored and when it should raise alarm bells. Before I get into the nitty-gritty details, let me explain under what condition 18056 is raised with state 29.

    Most applications today make use of connection pooling to reduce the number of times a new connection needs to be opened to the backend database server. When the client application reuses a connection from the pool to send a new request to the server, SQL Server performs certain operations to facilitate the connection reuse. During this process (we shall call it Redo Login for this discussion), if any exception occurs, we report an 18056 error. As with the famous 18456 Login Failed error message, the state numbers give us more insight into why the Redo Login task failed. State 29 occurs when an Attention is received from the client while the Redo Login code is being executed. This is when you would see the message below, which has puzzled many a mind to date on SQL Server 2008 instances:

    2009-02-19 04:40:03.41 spid58 Error: 18056, Severity: 20, State: 29.

    2009-02-19 04:40:03.41 spid58 The client was unable to reuse a session with SPID 58, which had been reset for connection pooling. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

    Is this a harmful message?

    The answer that always brings a smile to my face: It depends! Whether this error message is just plain noise or something that should send all the admins in the environment running helter-skelter can be summarized in one line.

    If the above error message (note that the state number should be 29) is the only message in the SQL Server Errorlog, and no other errors are noticed in the environment (connectivity failures to the SQL instance in question, degraded performance, high CPU usage, Out of Memory errors), then this message can be treated as benign and safely ignored.

    Why is this message there?

    Well, our intentions here were noble, and we didn’t put the error message out there to create confusion. This error message is just reporting that a client reused a pooled connection and, while the connection was being reset, the server received an attention (in this case, a client disconnect). This could be due either to a performance bottleneck on the server/environment or to a plain application disconnect. The error message is aimed at helping troubleshoot the first category of problems. If you do see other issues at the same time, though, these errors may be an indicator of what is going on at the engine side.

    What should you do when you see your Errorlog bloating with these error messages?

    a. The foremost task is to scan the SQL Errorlog and determine whether this error message is preceded or followed by some other error message or warning, such as Non-yielding messages or Out of Memory (OOM) errors (Error 701, Failed Allocate Pages, etc.).

    b. The next action item is to determine whether there is high CPU usage or any other resource bottleneck on the Windows Server. Windows Performance Monitor (Perfmon) would be your best friend here.

    c. Lastly, check whether the network between the client and server is facing any latency issues or whether network packet drops are occurring frequently. A Netmon trace should help you here.
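    Step (a) can be partly automated. The following Python sketch is a hypothetical helper, not a Microsoft tool. It only understands Errorlog entries in the "Error: N, Severity: S, State: X." form shown above, so Non-yielding reports and other free-text warnings still need a manual read, and steps (b) and (c) are not covered at all.

```python
import re
from typing import Iterable, List, Tuple

# Matches Errorlog header lines such as
# "2009-02-19 04:40:03.41 spid58 Error: 18056, Severity: 20, State: 29."
ERROR_LINE = re.compile(r"Error: (\d+), Severity: (\d+), State: (\d+)\.")


def triage_18056(lines: Iterable[str]) -> Tuple[int, List[int]]:
    """Count 18056/state 29 entries and collect every other error number seen."""
    state_29 = 0
    others = set()
    for line in lines:
        match = ERROR_LINE.search(line)
        if match is None:
            continue
        error, state = int(match.group(1)), int(match.group(3))
        if error == 18056 and state == 29:
            state_29 += 1
        else:
            # Any other error (including 18056 with a different state)
            # means the 18056/29 entries should not be dismissed as noise.
            others.add(error)
    return state_29, sorted(others)


def looks_benign(lines: Iterable[str]) -> bool:
    """True only when 18056/29 is the sole error found (still check b and c)."""
    state_29, others = triage_18056(lines)
    return state_29 > 0 and not others
```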

     

    Tejas Shah

    Escalation Engineer - Microsoft


    06/23/10: My Kerberos Checklist…

    I’ve had numerous questions regarding Kerberos, both internally within Microsoft and with Customers.  It continues to be a complicated topic, and the documentation that is out there can be less than straightforward.  Based on some previous items I’ve worked on, I wanted to share my experience in regards

    Let me start by looking at two scenarios for reference:  one that is basic and one that is complex.

    [Diagram: basic scenario]

    [Diagram: complex scenario]

    As you’ll find, once we figure out how to configure the basic scenario, the complex scenario ends up being very similar.

    Data Collection:

    The first thing to do when you tackle a Kerberos issue is to understand your environment.  I find that a lot of the Kerberos issues I troubleshoot come down to gathering the right information to make an informed analysis and identify the problem point.  The following data points relate to all servers involved.  We will circle back to the Client after we talk about the Servers.

    1. Know your topology
    2. What is the Service Account being used for the application in question?
    3. What Service Principal Name (SPN) does your app require?
    4. What SPNs are defined for that service account?
    5. What are the delegation settings for the service account?
    6. Local Policy related information
    7. Additional application specific information

     

    Consistent vs. Intermittent Kerberos Issues

    The data collection points above should allow you to get Kerberos working in most cases.  I say most cases because the above refers specifically to configuration.  I typically break it down into consistent vs. intermittent issues.  If the issue is reproducible every time, it is a configuration issue.  If it is intermittent, then it is usually not a configuration issue; if it were, it would happen all the time.  Intermittent means it works most of the time, and in order to work at all, it has to be configured correctly.  The exception to this would be a Farm-type situation where the configuration is not the same on every box in the farm.  Sometimes you may hit Server A, which is configured properly, and another time you may hit Server B, which is not and causes an error.  Which brings us to the first Data Collection Point…

    Know your topology

    Before you begin, you should know which servers are involved in your application as a whole.  If we are talking about a single web application, you probably have at least two servers to consider and know about: the Web Server and the Backend (SQL for our purposes).  They both play a part.  This becomes even more important in a distributed environment where you may have 3+ servers.

    As you’ll see, with the data collection items we basically walk down the line of your servers, checking them one by one.

    What is the Service Account?

    For the particular server you are looking at, what is the service account that the application is using?  This is important, because this will tell us where the SPN needs to go.  It also plays a part in Delegation.  Not every service will be a Windows Service, so this could be dependent on the application in question.  Here are some examples:

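    SharePoint:  IIS (not a Windows Service)

    [Screenshot: SharePoint AppPool identity in IIS]

    Reporting Services:  Windows Service

    [Screenshot: Reporting Services service account]

    SQL Server:  Windows Service

    [Screenshot: SQL Server service account]

    For Windows Services, you can also look in the Services MMC to get the information.  Again, you need to know what your application is doing:

    [Screenshot: Services MMC showing the Log On As account for each service]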

    What SPN does your app require?

    We can look at all sorts of SPN listings, but before we do, we need to know what it is we are looking for.  I think this is one of the more complicated parts of Kerberos configuration because the SPN depends on the application you are using.  The format of the SPN is consistent between applications, but what is required depends on the application, or, from an SPN point of view, the service.  It is a Service Principal Name after all!

    The SPN has the following format:  <service>/<host>:<port/name>

    The port/name piece of this is optional and dependent on what the service will accept.

    HTTP – For a default configuration, the port is never used for an HTTP SPN.  SPNs are unique, and if you add an HTTP SPN with a port on it, it will effectively be ignored, as it never matches the SPN that is requested.  IIS and Internet Explorer do not affix the port number to the SPN request when they look for it.  From an Internet Explorer perspective, you can alter this behavior via a registry key so that it will, but I have yet to see anyone do that; most people aren’t aware of it, from what I can tell.  From my experience, I would stay away from adding a port to an HTTP SPN.

    MSSQLSvc – you can look at the following blog post to read more about how SQL determines the SPN needed.  http://blogs.msdn.com/b/psssql/archive/2010/03/09/what-spn-do-i-use-and-how-does-it-get-there.aspx

    For the next couple of items, we will use the SharePoint service account as the example: spservice.  In this case it is a web application, so we know it will use the HTTP service from an SPN perspective.  The host piece depends on how we are connecting to the web server.  This is true for any application, really.  From an HTTP perspective it is the URL; for SQL it is the connection string.  Another thing to know is that both IIS and SQL will resolve a NetBIOS name to the Fully Qualified Domain Name if they can.  For example, http://passsp will be resolved to passsp.pass.local.

    For our spservice example with a url of http://passsp, our SPN turns out to be http/passsp.pass.local and it is placed on the spservice account.

    Another special note about HTTP SPNs.  If for example my SharePoint AppPool (service) was using Network Service, this is considered the machine context so the SPN would go on the machine account (PASSSP).  However, HTTP is considered a covered service for a special service type called HOST.  Every Machine account has a HOST entry for the FQDN as well as the NetBIOS name.  You don’t need to add an HTTP SPN on the machine account as long as your URL matches the machine name.

    When adding an SPN, I also always recommend that you add both the FQDN SPN (i.e. http/passsp.pass.local) as well as the NetBIOS SPN (i.e. http/passsp).  The NetBIOS SPN is a safety measure in case the DNS resolution fails and it just submits the NetBIOS SPN request.
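    The two recommendations above (no port on an HTTP SPN, and registering both the FQDN and the NetBIOS form) can be expressed as a small Python sketch.  The function is hypothetical, and the dns_suffix argument stands in for the DNS resolution that would really happen; nothing here queries DNS or Active Directory.

```python
from typing import List
from urllib.parse import urlsplit


def http_spns(url: str, dns_suffix: str) -> List[str]:
    """Return the FQDN and NetBIOS HTTP SPNs for a web application URL."""
    host = urlsplit(url).hostname  # lower-cased, with any ":port" removed
    if not host:
        raise ValueError(f"no host name in {url!r}")
    netbios = host.split(".")[0]
    # Assumption: a dotted host is already an FQDN; otherwise append the suffix
    # in place of a real DNS lookup.
    fqdn = host if "." in host else f"{netbios}.{dns_suffix}"
    # No port on either SPN, since the port is not part of the HTTP SPN request.
    return [f"http/{fqdn}", f"http/{netbios}"]


print(http_spns("http://passsp", "pass.local"))
# ['http/passsp.pass.local', 'http/passsp']
```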

    What SPN is defined?

    Now that we know the service account and what our SPN should be, we can look at the SPNs that are defined on that account.  We can use SetSPN to do this, although there are other tools that can help get this information for you (ADSIEdit, LDAP queries, etc.).  SetSPN is nice, though, as it ships with the Operating System starting with Windows 2008.  Let’s have a look at our SharePoint Service account, spservice:

    [Screenshot: SPNs registered on spservice]

    Based on what we came up with above, we can see that the passsp SPNs are in place.  You’ll also notice another SPN present, which means this Service Account is hosting two HTTP Services (this could be two AppPools on one server, or on two separate servers). 

    You could run into a situation where the SPN is defined on another account as well.  This may be a misplaced or a duplicate SPN, and both will cause an issue for you.  Usually when I grab SPN information from an environment, I grab all SPNs defined in the Domain so that I can look for misplaced or duplicate SPNs.  The SetSPN tool that comes with Windows 2008 and later (and can be downloaded for Windows 2003) contains a new switch that will look for duplicates for you:  the -X switch.

    [Screenshot: setspn -X output showing a duplicate SPN]

    In the above, you can see two accounts that had the http/passsp.pass.local SPN.  You can then decide which one really needs to be there based on the Service Account being used. 
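    Conceptually, the duplicate check boils down to grouping SPNs by the accounts that own them.  The Python sketch below is not how SetSPN is implemented; it assumes you have already collected each account's SPNs (for example from setspn -l output) into a dictionary, and it compares SPNs case-insensitively.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_duplicate_spns(spns_by_account: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Return {spn: accounts} for every SPN registered on more than one account."""
    owners = defaultdict(set)
    for account, spns in spns_by_account.items():
        for spn in spns:
            owners[spn.lower()].add(account)  # SPN comparison ignores case
    return {spn: sorted(accounts) for spn, accounts in owners.items()
            if len(accounts) > 1}
```

    Each account in the result is only a candidate; which one should keep the SPN is still decided by the Service Account actually in use.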

    What are the delegation settings?

    Delegation only comes into play if you want the Client’s Windows credentials forwarded to another service; for example, SharePoint to Reporting Services, Reporting Services to SQL, or even SQL to SQL in a Linked Server scenario.  NTLM does not allow for the forwarding of credentials; this is accomplished through delegation as part of the Kerberos Protocol.  There are two main types of Delegation:  Full Trust and Constrained Delegation.  Of note, you will not see the Delegation Tab on the Account within Active Directory unless an SPN has been assigned to that account.

    Full Trust

    This means that the given service can forward the Client’s credentials to any service.  You are indiscriminate about who you communicate with.  This is the less secure of the two options, but it is also the easier to configure (which I would expect of the less secure option; secure always means complicated, right?)

    [Screenshot: Delegation tab with "Trust this user for delegation to any service (Kerberos only)" selected]

    Constrained Delegation

    Constrained means that you are going to specify which services you can actually delegate to.  The services are represented by SPNs.  This is the more secure approach, but it has some drawbacks.  As mentioned before, it is more complicated.  The reason is that you have to know exactly what your application is trying to delegate to, and it may not be just the service you are interested in.  For example, you may be configuring SharePoint for Delegation to go to Reporting Services, but then realize that you just broke a connection to SQL, or maybe a connection to some web service you are trying to hit that requires Kerberos.  It’s not really that bad as long as you understand everything your application is going to reach out to that would require passing on the Client’s credentials.

    The other drawback to Constrained Delegation is that you lose the ability to cross a domain boundary.  Meaning a cross domain scenario will fail from a delegation perspective.  Users from another Domain can hit your application, but all of the services that you are communicating to need to be in the same domain.  For example, SharePoint (Domain A) cannot delegate to SQL (Domain B).  Under constrained delegation, that will fail.
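    The two delegation options, and the same-domain restriction on Constrained Delegation described above, can be modelled with a short Python sketch.  This is a simplified model of the rules as stated in this post only (it ignores Protocol Transitioning), and the class, function and host names (such as sql.pass.local) are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class DelegationSetting:
    mode: str                                   # "none", "full" (Full Trust) or "constrained"
    domain: str                                 # domain of the delegating service account
    allowed_spns: FrozenSet[str] = frozenset()  # only used for "constrained"


def can_delegate(setting: DelegationSetting, target_spn: str, target_domain: str) -> bool:
    """Can this service forward the Client's credentials to target_spn?"""
    if setting.mode == "full":
        return True  # Full Trust: any service
    if setting.mode == "constrained":
        same_domain = setting.domain.lower() == target_domain.lower()
        allowed = {spn.lower() for spn in setting.allowed_spns}
        return same_domain and target_spn.lower() in allowed
    return False  # no delegation configured


sharepoint = DelegationSetting("constrained", "pass.local",
                               frozenset({"MSSQLSvc/sql.pass.local:1433"}))
print(can_delegate(sharepoint, "MSSQLSvc/sql.pass.local:1433", "pass.local"))  # True
print(can_delegate(sharepoint, "http/rs.pass.local", "pass.local"))  # False
```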

    In the image below, the third radio button means that you want to use Constrained Delegation.  The two options beneath it ("Use Kerberos only" and "Use any authentication protocol") define whether you want to use Kerberos only, or whether you want to enable Protocol Transitioning.  I’m not going to get into Protocol Transitioning in this blog post, as it is big enough already, but you will have to deal with it if you are using the Claims to Windows Token Service.  This comes into the picture if you are doing anything with Excel Services in SharePoint or PowerPivot.

    [Screenshot: Delegation tab with "Trust this user for delegation to specified services only" selected]

     

    You will need to go back to your application’s topology to determine whether enabling delegation is required.  If we look at our Double Hop example from above, Reporting Services would need to have delegation enabled for its service account, but SQL would not, as SQL isn’t going out to anything using the Client’s credentials.

    Local Policy Settings

    There is at least one Local Policy setting you’ll need to pay attention to when trying to delegate.  That is the “Impersonate a client after authentication” policy.

    [Screenshot: "Impersonate a client after authentication" in Local Security Policy]

    If your middle server is a web server, you can take advantage of a built-in group that has this permission.  For Windows 2003, the group is called IIS_WPG.  For Windows 2008 and later it is the IIS_IUSRS group.  By default, SharePoint and RS should place themselves in that group, so you usually don’t have to worry about it.  I’m just mentioning it here as a step in the checklist.  I rarely see this as the issue, though, unless you are writing a custom application with a Domain User account as the service account.

    Client

    Let’s circle back to the Client.  You may be asking:  all this is great for the application, but is there anything special I need to do for the User Account coming from the client?  Not really.  By default you should be good to go from the Client’s user account.  However, there is an account setting you should be aware of within Active Directory:  the “Account is sensitive and cannot be delegated” setting.  If that is checked, you will have issues with that specific user.  To this date, I have yet to see a customer actually have that checked.  That doesn’t mean people don’t do it; I just haven’t seen it.

    [Screenshot: user account option "Account is sensitive and cannot be delegated"]

    Application Specific Settings

    When I started getting into Kerberos, I found that almost all of the issues were based on the Active Directory settings (SPN, Delegation, etc.).  Not to say that that has lessened, but I’ve also seen a shift in the complexity of getting specific applications up and running.  As applications become more complex, you should be aware of which settings within that app could affect Kerberos.  If you have gone through everything above and it all looks good, chances are that an application-specific setting is interfering. 

    There is a lot to mention in this area, so I will spin up another blog post on application-specific settings for IIS, SharePoint, Excel Services, PowerPivot and Reporting Services.  SQL doesn’t really have any Kerberos-specific settings as long as the SPN and delegation settings (if needed) are in place.

    Tying it together…

    So, we’ve looked at my checklist, but it was really focused on one service.  What I’ve found is that it really is as simple as that:  I just repeat the checklist on each server that plays a part in the application (topology).  Think of it as wash, rinse, repeat.  When I help customers get Kerberos configured, I just walk the line down each server to make sure everything lines up, and I have been fairly successful with that approach.  As I’ve gained more experience with it (I usually deal with it every day), I can often target a specific segment depending on where the error is coming from.  Other times it may not be that straightforward.  Even when I target a specific area, if that doesn’t pan out, I just start from the beginning and apply the checklist to each server/service that is playing a part. 

    Once you approach it that way, it really doesn’t matter how many hops there are or what services are involved.  You just follow the checklist one more time.  The point where complications usually come into play is when Constrained Delegation is implemented and we didn’t account for everything, or when you hit an app-specific issue.  Outside of that, it is usually straightforward based on the above.  Just find out what the SPN needs to be and where it needs to go, and you are 80% there.
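    The wash, rinse, repeat approach can be sketched in Python as one loop over the topology.  This is an illustration under several assumptions:  each hop’s service account, required SPN and delegation flag have already been collected (data collection points 2 through 5), every hop except the last must delegate, and Constrained Delegation lists, Local Policy and application-specific settings are not checked.  All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Hop:
    name: str                 # e.g. "Reporting Services" or "SQL"
    account: str              # service account the service runs as
    required_spn: str         # SPN clients request for this service
    delegation_enabled: bool  # Full Trust or Constrained configured


def walk_the_line(hops: List[Hop], spns_by_account: Dict[str, Iterable[str]],
                  user_is_sensitive: bool = False) -> List[str]:
    """Apply the same checks to every hop and return the problems found."""
    problems = []
    if user_is_sensitive and len(hops) > 1:
        problems.append("client account is marked sensitive and cannot be delegated")
    for i, hop in enumerate(hops):
        registered = {spn.lower() for spn in spns_by_account.get(hop.account, [])}
        if hop.required_spn.lower() not in registered:
            problems.append(f"{hop.name}: {hop.required_spn} is not on {hop.account}")
        # Every hop that passes credentials on to a later hop must delegate.
        if i < len(hops) - 1 and not hop.delegation_enabled:
            problems.append(f"{hop.name}: delegation is not enabled for {hop.account}")
    return problems
```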

    I realize I’m making it sound simple when it can be very frustrating and complicated, but the above has worked well for me in the past. Hopefully the above is helpful to you as you try to implement Kerberos within your environment. 

    There is definitely way more to cover on this topic and I will continue to blog about those items.

    Adam W. Saxton | Microsoft SQL Server Escalation Services
    http://twitter.com/awsaxton


    I was running an RML Utilities Suite test pass and encountered varying behavior from our sp_prepare suite.  Here is what I uncovered.

    sp_prepare returns (or does not return) metadata depending on the server version and the client version.  For the client version, the only thing that matters is whether it is prior to SQL 2012 or later (i.e. 2012 RTM, SP1, etc.).

    1. Prior to SQL 2012, sp_prepare returns metadata to the user. This was implemented by internally setting FMTONLY ON and executing the statement.

    2. In SQL 2012 RTM and SP1, sp_prepare does NOT return metadata if the client version is 2012 or greater. FMTONLY ON is deprecated and used only for backward compatibility with older (i.e. 2008) clients.

    3. In SQL 2012 CU6 (build 11.0.2401.0) and later, and SP1 CU3 and later, sp_prepare DOES return metadata to the user, if the batch contains one statement.  This is to address a performance issue with some scenarios (see hotfix KB2772525).

    The following matrix shows when sp_prepare should return metadata for batches containing one statement.

    Client\Server Version  2008/R2  2012 RTM  2012 CU6+  2012 SP1  2012 SP1 CU3+  SQL 14
    2008 R2                yes      yes       yes        yes       yes            yes
    2012 (all versions)    yes      no        yes        no        yes            yes
    SQL 14 CTP             yes      no        yes        no        yes            yes

    yes - sp_prepare returns metadata
    no - sp_prepare does NOT return metadata

    The following matrix shows when sp_prepare should return metadata for multi-statement batches, such as:

    declare @p1 int
    set @p1 = NULL
    exec sp_prepare @p1 output, NULL, N'select * from sys.objects; select 1;', 1
    select @p1

    Client\Server Version  2008/R2  2012 RTM  2012 CU6+  2012 SP1  2012 SP1 CU3+  SQL 14
    2008 R2                yes      yes       yes        yes       yes            yes
    2012 (all versions)    yes      no        no         no        no             no
    SQL 14 CTP             yes      no        no         no        no             no
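    Both matrices reduce to a few rules, which the following Python sketch encodes so they can be checked cell by cell.  The function is hypothetical and the version labels are simply the column headings used above, not a SQL Server API; "client 2012 or later" covers both the 2012 and the SQL 14 CTP client rows.

```python
SERVER_VERSIONS = ("2008/R2", "2012 RTM", "2012 CU6+", "2012 SP1",
                   "2012 SP1 CU3+", "SQL 14")

# Server builds that return metadata again for single-statement batches
# (2012 CU6 and later, SP1 CU3 and later; see KB2772525).
SINGLE_STATEMENT_FIX = {"2012 CU6+", "2012 SP1 CU3+", "SQL 14"}


def sp_prepare_returns_metadata(client_2012_or_later: bool, server: str,
                                single_statement: bool) -> bool:
    """Does sp_prepare return metadata for this client/server/batch combination?"""
    if server not in SERVER_VERSIONS:
        raise ValueError(f"unknown server version: {server!r}")
    if not client_2012_or_later or server == "2008/R2":
        return True  # pre-2012 behavior: metadata via FMTONLY ON
    if not single_statement:
        return False  # multi-statement batches: no metadata on 2012+ servers
    return server in SINGLE_STATEMENT_FIX
```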

    Bob Dorr - Principal SQL Server Escalation Engineer


    I was presented with a connectivity issue when trying to configure SharePoint 2013 using a CTP build of SQL 2014.  They got the following error when SharePoint was trying to create the Configuration Database.

    Exception: System.ArgumentException: myserver,50000 is an invalid or loopback address.  Specify a valid server address.
       at Microsoft.SharePoint.Administration.SPServer.ValidateAddress(String address)
       at Microsoft.SharePoint.Administration.SPServer..ctor(String address, SPFarm farm, Guid id)
       at Microsoft.SharePoint.Administration.SPConfigurationDatabase.RegisterDefaultDatabaseServices(SqlConnectionStringBuilder connectionString)
       at Microsoft.SharePoint.Administration.SPConfigurationDatabase.Provision(SqlConnectionStringBuilder connectionString)
       at Microsoft.SharePoint.Administration.SPFarm.Create(SqlConnectionStringBuilder configurationDatabase, SqlConnectionStringBuilder administrationContentDatabase, IdentityType identityType, String farmUser, SecureString farmPassword, SecureString masterPassphrase)
       at Microsoft.SharePoint.Administration.SPFarm.Create(SqlConnectionStringBuilder configurationDatabase, SqlConnectionStringBuilder administrationContentDatabase, String farmUser, SecureString farmPassword, SecureString masterPassphrase)
       at Microsoft.SharePoint.PostSetupConfiguration.ConfigurationDatabaseTask.CreateOrConnectConfigDb()
       at Microsoft.SharePoint.PostSetupConfiguration.ConfigurationDatabaseTask.Run()
       at Microsoft.SharePoint.PostSetupConfiguration.TaskThread.ExecuteTask()

    They had indicated that they had hit this before and had worked around it by creating a SQL Alias.  However, this time it was not working.  It was presented to me as a possible issue with using SQL 2014, and I was asked to have a look to see whether this would affect other customers using SQL 2014.

    I found some references regarding the error, and the majority of comments said to have SQL Server use the default port of 1433; others said to create an Alias.  Some of the SharePoint documentation even shows how to change the SQL Port and how to create an Alias, but none of it really explained why this was necessary, or what SharePoint was actually looking for.

    This issue has nothing to do with SQL 2014 specifically and could happen with any version of SQL.  The issue is what SharePoint is looking for.  Whatever you put in for the Server name needs to be a valid DNS name.  For a non-default port (anything other than 1433), you would need to create a SQL Alias.  If you create a SQL Alias, the name should be resolvable and not a made-up name that doesn’t exist in DNS.  Otherwise, you will get the same error.

     

    Techie Details

    I started by looking at the error first.  Of note, this is a SharePoint specific error and not a SQL error.

    Exception: System.ArgumentException: myserver,50000 is an invalid or loopback address.  Specify a valid server address.
       at Microsoft.SharePoint.Administration.SPServer.ValidateAddress(String address)

    This was an ArgumentException when SPServer.ValidateAddress was called.  I’m going to assume that the string being passed in is whatever we entered for the database server.  In my case it would be “myserver,50000”.  I’ve seen this type of behavior before; here is one example.  My first question was:  what is ValidateAddress actually doing?  I had an assumption, based on the behavior, that it was doing a name lookup on what was being passed in, but I don’t like assumptions, so I wanted to verify.

    Enter JustDecompile!  This is a great tool if you want to see what .NET Assemblies are really doing.  The trick sometimes is to figure out what the actual assembly is.  I know SharePoint 2013 uses the .NET 4.0 Framework, so the assemblies that are GAC’d will be in C:\Windows\Microsoft.NET\assembly\GAC_MSIL.  After that, I go off of the namespace, as assemblies are typically aligned to the namespaces within them.  I didn’t see an assembly for Microsoft.SharePoint.Administration, so I grabbed the Microsoft.SharePoint assembly within C:\Windows\Microsoft.NET\assembly\GAC_MSIL\Microsoft.SharePoint\v4.0_15.0.0.0__71e9bce111e9429c.  This prompted me to load a few others, but it told me which ones to go get.

    Within the Microsoft.SharePoint assembly, we can see that we have the Administration namespace.

    [Screenshot: Microsoft.SharePoint assembly in JustDecompile, showing the Administration namespace]

    So, now we want the SPServer object and the ValidateAddress method.

    [Screenshot: SPServer.ValidateAddress in JustDecompile]

    internal static void ValidateAddress(string address)
    {
        Uri uri;
        if (address == null)
        {
            throw new ArgumentNullException("address");
        }

        UriHostNameType uriHostNameType = Uri.CheckHostName(address); // <-- This is what gets us into trouble
        if (uriHostNameType == UriHostNameType.Unknown)
        {
            object[] objArray = new object[] { address };
            throw new ArgumentException(SPResource.GetString("InvalidServerAddress", objArray)); // <-- The exception will be thrown here
        }

        // Wrap bare IPv6 literals in [] so they form a valid URI; everything else is used as-is.
        uri = (uriHostNameType != UriHostNameType.IPv6 ||
            address.Length <= 0 ||
            address[0] == '[' ||
            address[address.Length - 1] == ']' ?
            new Uri(string.Concat("http://", address)) : new Uri(string.Concat("http://[", address, "]")));
        if (uri.IsLoopback)
        {
            object[] objArray1 = new object[] { address };
            throw new ArgumentException(SPResource.GetString("InvalidServerAddress", objArray1));
        }
    }

    Uri.CheckHostName Method
    http://msdn.microsoft.com/en-us/library/system.uri.checkhostname.aspx

    Determines whether the specified host name is a valid DNS name.

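    So, if the string we pass in is not a valid DNS host name, CheckHostName returns UriHostNameType.Unknown and ValidateAddress throws the ArgumentException.  “myserver,50000” is not a valid host name (a comma is not allowed in one), so it fails.  We never get to the point where we actually hit SQL itself.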
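    To see why “myserver,50000” is rejected while a plain server or alias name passes, here is a rough Python approximation of the two checks in ValidateAddress.  It is not the .NET implementation:  check_host_name only distinguishes IP addresses, syntactically valid DNS names (letter/digit/hyphen labels) and everything else, it does no DNS lookup, and the loopback test is a simplified stand-in for Uri.IsLoopback.  Both function names are hypothetical.

```python
import ipaddress
import re

# One DNS label: 1-63 letters, digits or hyphens, not starting or ending with "-".
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def check_host_name(address: str) -> str:
    """Classify a string as "IPv4", "IPv6", "Dns" or "Unknown" (approximation)."""
    if not address:
        return "Unknown"
    try:
        ip = ipaddress.ip_address(address.strip("[]"))
        return "IPv4" if ip.version == 4 else "IPv6"
    except ValueError:
        pass
    name = address[:-1] if address.endswith(".") else address
    if len(name) <= 255 and all(_LABEL.match(label) for label in name.split(".")):
        return "Dns"
    return "Unknown"


def validate_address(address: str) -> None:
    """Mimic ValidateAddress: reject unknown host types, then loopback addresses."""
    message = f"{address} is an invalid or loopback address.  Specify a valid server address."
    kind = check_host_name(address)
    if kind == "Unknown":
        raise ValueError(message)
    is_loopback = (address.lower() == "localhost" if kind == "Dns"
                   else ipaddress.ip_address(address.strip("[]")).is_loopback)
    if is_loopback:
        raise ValueError(message)


print(check_host_name("myserver,50000"))  # Unknown
print(check_host_name("myserver"))        # Dns
```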

     

    Adam W. Saxton | Microsoft Escalation Services
    http://twitter.com/awsaxton


    This is a follow-up to two blog posts from back in 2009 that talked about leaked connections.  Part 1 and Part 2 of that post were about how to determine that you had actually filled your pool.  This was centered around the following error:

    Exception type: System.InvalidOperationException
    Message: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.
    InnerException: <none>
    StackTrace (generated):
        SP               IP               Function
        000000001454DDC0 00000642828425A8 System.Data.ProviderBase.DbConnectionFactory.GetConnection(System.Data.Common.DbConnection)
        000000001454DE10 0000064282841BA2 System.Data.ProviderBase.DbConnectionClosed.OpenConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionFactory)
        000000001454DE60 000006428284166C System.Data.SqlClient.SqlConnection.Open()

    The issue I just worked on had the same exception, but in this case the pools were not exhausted.  The issue was occurring within BizTalk 2006 R2, and we narrowed it down to the following exception:

    0:138> !pe e09e13f0
    Exception object: 00000000e09e13f0
    Exception type: System.Data.SqlClient.SqlException
    Message: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
    InnerException: <none>
    StackTrace (generated):
        SP               IP               Function
        0000000015CBDF10 00000642828554A3 System_Data!System.Data.SqlClient.SqlInternalConnection.OnError(System.Data.SqlClient.SqlException, Boolean)+0x103
        0000000015CBDF60 0000064282854DA6 System_Data!System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(System.Data.SqlClient.TdsParserStateObject)+0xf6
        0000000015CBDFC0 0000064282CDCCF1 System_Data!System.Data.SqlClient.TdsParserStateObject.ReadSniError(System.Data.SqlClient.TdsParserStateObject, UInt32)+0x291
        0000000015CBE0A0 000006428284ECCA System_Data!System.Data.SqlClient.TdsParserStateObject.ReadSni(System.Data.Common.DbAsyncResult, System.Data.SqlClient.TdsParserStateObject)+0x13a
        0000000015CBE140 000006428284E9E1 System_Data!System.Data.SqlClient.TdsParserStateObject.ReadNetworkPacket()+0x91
        0000000015CBE1A0 0000064282852763 System_Data!System.Data.SqlClient.TdsParserStateObject.ReadBuffer()+0x33
        0000000015CBE1D0 00000642828526A1 System_Data!System.Data.SqlClient.TdsParserStateObject.ReadByte()+0x21
        0000000015CBE200 0000064282851B5C System_Data!System.Data.SqlClient.TdsParser.Run(System.Data.SqlClient.RunBehavior, System.Data.SqlClient.SqlCommand, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.BulkCopySimpleResultSet, System.Data.SqlClient.TdsParserStateObject)+0xbc
        0000000015CBE2D0 00000642828519E6 System_Data!System.Data.SqlClient.SqlInternalConnectionTds.CompleteLogin(Boolean)+0x36
        0000000015CBE320 000006428284A997 System_Data!System.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(System.Data.SqlClient.ServerInfo, System.String, Boolean, Int64, System.Data.SqlClient.SqlConnection)+0x147
        0000000015CBE3C0 000006428284859F System_Data!System.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(System.String, System.String, Boolean, System.Data.SqlClient.SqlConnection, System.Data.SqlClient.SqlConnectionString, Int64)+0x52f
        0000000015CBE530 0000064282847505 System_Data!System.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(System.Data.SqlClient.SqlConnection, System.Data.SqlClient.SqlConnectionString, System.String, Boolean)+0x135
        0000000015CBE5D0 00000642828471E3 System_Data!System.Data.SqlClient.SqlInternalConnectionTds..ctor(System.Data.ProviderBase.DbConnectionPoolIdentity, System.Data.SqlClient.SqlConnectionString, System.Object, System.String, System.Data.SqlClient.SqlConnection, Boolean)+0x153
        0000000015CBE670 0000064282846E36 System_Data!System.Data.SqlClient.SqlConnectionFactory.CreateConnection(System.Data.Common.DbConnectionOptions, System.Object, System.Data.ProviderBase.DbConnectionPool, System.Data.Common.DbConnection)+0x296
        0000000015CBE730 0000064282846947 System_Data!System.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionPool, System.Data.Common.DbConnectionOptions)+0x37
        0000000015CBE790 000006428284689D System_Data!System.Data.ProviderBase.DbConnectionPool.CreateObject(System.Data.Common.DbConnection)+0x29d
        0000000015CBE830 000006428292905D System_Data!System.Data.ProviderBase.DbConnectionPool.UserCreateRequest(System.Data.Common.DbConnection)+0x5d
        0000000015CBE870 0000064282846412 System_Data!System.Data.ProviderBase.DbConnectionPool.GetConnection(System.Data.Common.DbConnection)+0x6b2
        0000000015CBE930 00000642828424B4 System_Data!System.Data.ProviderBase.DbConnectionFactory.GetConnection(System.Data.Common.DbConnection)+0x54
        0000000015CBE980 0000064282841BA2 System_Data!System.Data.ProviderBase.DbConnectionClosed.OpenConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionFactory)+0xf2
        0000000015CBE9D0 000006428284166C System_Data!System.Data.SqlClient.SqlConnection.Open()+0x10c
        0000000015CBEA60 0000064282928C2D Microsoft_BizTalk_Bam_EventObservation!Microsoft.BizTalk.Bam.EventObservation.DirectEventStream.StoreSingleEvent(Microsoft.BizTalk.Bam.EventObservation.IPersistQueryable)+0x8d
        0000000015CBEAE0 0000064282928947 Microsoft_BizTalk_Bam_EventObservation!Microsoft.BizTalk.Bam.EventObservation.DirectEventStream.StoreCustomEvent(Microsoft.BizTalk.Bam.EventObservation.IPersistQueryable)+0x47

    The end result was to either increase the connection timeout for that connection string, or look at the performance of the SQL Server and determine why SQL Server wasn’t able to satisfy the connection in time. The customer had indicated that this occurred during month-end operations, which probably means that we ramped up pressure on SQL Server. It may have come down to SQL Server not having enough worker threads to handle the connection request. That would produce a timeout once the connection timeout elapsed: 15 seconds by default, although this connection string set it to 25 seconds.

    Techie details:

    This section looks at how we determined what the problem was once we had a memory dump of the process. These debugging instructions are based on a 64-bit dump, but the steps should be similar for a 32-bit dump. For the dumps, we used the SOS debugging extension, which ships with the .NET Framework. You can load the extension in the debugger with the following command (this is for .NET 2.0 through 3.5; for .NET 4.0 and later the module is clr instead of mscorwks):

    0:000> .loadby sos mscorwks

    Let’s first find the Connection Pools that are in the dump:

    0:138> !dumpheap -stat -type DbConnectionPool

    000006428281fce8        4          416 System.Data.ProviderBase.DbConnectionPool+TransactedConnectionPool
    000006428085dbc8       28          672 System.Data.ProviderBase.DbConnectionPoolCounters+Counter
    000006428281f6d8        8          704 System.Data.ProviderBase.DbConnectionPool+PoolWaitHandles
    0000064282810450        4          704 System.Data.ProviderBase.DbConnectionPool
    000006428281d320      165         5280 System.Data.ProviderBase.DbConnectionPoolIdentity

    The first column is the MethodTable (MT), which we can pass to !dumpheap -mt to list the individual objects of that type. Of note, you may see multiple objects and may have to go through each one.

    0:138> !dumpheap -mt 0x0000064282810450
    ------------------------------
    Heap 4
             Address               MT     Size
    00000000c021b348 0000064282810450      176    
    total 1 objects
    ------------------------------
    Heap 6
             Address               MT     Size
    00000000e05add10 0000064282810450      176    
    total 1 objects
    ------------------------------
    Heap 12
             Address               MT     Size
    000000014004b1d8 0000064282810450      176    
    total 1 objects
    ------------------------------
    Heap 13
             Address               MT     Size
    00000001502e6af0 0000064282810450      176
     

    We have 4 pools.  Let’s have a look at each pool and see how many connections we have for each.

    Pool 1:

    0:138> !do 0x00000000c021b348
    Name: System.Data.ProviderBase.DbConnectionPool
    MethodTable: 0000064282810450
    EEClass: 00000642827da538
    Size: 176(0xb0) bytes
    (C:\WINDOWS\assembly\GAC_64\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
    Fields:
                  MT    Field   Offset                 Type VT     Attr            Value Name

    00000642827ef760  400153f       18 ...nnectionPoolGroup  0 instance 0000000160036630 _connectionPoolGroup
    0000064282818d18  4001540       20 ...nPoolGroupOptions  0 instance 0000000160036608 _connectionPoolGroupOptions

    000006427843d998  4001551       98         System.Int32  1 instance                7 _totalObjects <-- Only 7 Objects out of a total pool size of 500

    0:138> !do 0000000160036608
    Name: System.Data.ProviderBase.DbConnectionPoolGroupOptions
    MethodTable: 0000064282818d18
    EEClass: 000006428282ce58
    Size: 40(0x28) bytes
    (C:\WINDOWS\assembly\GAC_64\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
    Fields:
                  MT    Field   Offset                 Type VT     Attr            Value Name
    00000642784358f8  4001598       14       System.Boolean  1 instance                1 _poolByIdentity
    000006427843d998  4001599        8         System.Int32  1 instance                1 _minPoolSize
    000006427843d998  400159a        c         System.Int32  1 instance              500 _maxPoolSize <-- Total pool size

    Pool 2:

    0:138> !do 0x00000000e05add10
    Name: System.Data.ProviderBase.DbConnectionPool
    0000064282818d18  4001540       20 ...nPoolGroupOptions  0 instance         e05ad798 _connectionPoolGroupOptions
    000006427843d998  4001551       98         System.Int32  1 instance                6 _totalObjects <-- Only 6 Objects out of a total pool size of 100

    0:138> !do e05ad798
    Name: System.Data.ProviderBase.DbConnectionPoolGroupOptions
                  MT            Field           Offset                 Type VT             Attr            Value Name
    00000642784358f8  4001598       14       System.Boolean  1 instance                1 _poolByIdentity
    000006427843d998  4001599        8         System.Int32  1 instance                0 _minPoolSize
    000006427843d998  400159a        c         System.Int32  1 instance              100 _maxPoolSize <-- Total pool size

    Pool 3:

    0:138> !do 0x000000014004b1d8
    Name: System.Data.ProviderBase.DbConnectionPool
    0000064282818d18  4001540       20 ...nPoolGroupOptions  0 instance         d01e8288 _connectionPoolGroupOptions
    000006427843d998  4001551       98         System.Int32  1 instance                7 _totalObjects <-- Only 7 Objects out of a total pool size of 500

    0:138> !do d01e8288
    Name: System.Data.ProviderBase.DbConnectionPoolGroupOptions
                  MT            Field           Offset                 Type VT             Attr            Value Name
    00000642784358f8  4001598       14       System.Boolean  1 instance                1 _poolByIdentity
    000006427843d998  4001599        8         System.Int32  1 instance                1 _minPoolSize
    000006427843d998  400159a        c         System.Int32  1 instance              500 _maxPoolSize <-- Total pool size

    Pool 4:

    0:138> !do 0x00000001502e6af0
    Name: System.Data.ProviderBase.DbConnectionPool
    0000064282818d18  4001540       20 ...nPoolGroupOptions  0 instance        1600f1940 _connectionPoolGroupOptions
    000006427843d998  4001551       98         System.Int32  1 instance                4 _totalObjects <-- Only 4 Objects out of a total pool size of 100

    0:138> !do 1600f1940
    Name: System.Data.ProviderBase.DbConnectionPoolGroupOptions
                  MT            Field           Offset                 Type VT             Attr            Value Name
    00000642784358f8  4001598       14       System.Boolean  1 instance                1 _poolByIdentity
    000006427843d998  4001599        8         System.Int32  1 instance                0 _minPoolSize
    000006427843d998  400159a        c         System.Int32  1 instance              100 _maxPoolSize <-- Total pool size
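    To make the "pools are not exhausted" observation concrete, here is a small Python sketch that takes the _totalObjects and _maxPoolSize values read from the four pools above and computes how full each pool is. It is a rough check under one assumption: a pool can only run out of connections once _totalObjects has reached _maxPoolSize.

```python
# (pool, _totalObjects, _maxPoolSize) as read from the !do output above.
POOLS = [
    ("Pool 1", 7, 500),
    ("Pool 2", 6, 100),
    ("Pool 3", 7, 500),
    ("Pool 4", 4, 100),
]


def utilization(total_objects: int, max_pool_size: int) -> float:
    """Fraction of the pool's capacity that has been allocated."""
    return total_objects / max_pool_size


def at_capacity(total_objects: int, max_pool_size: int) -> bool:
    """True once the pool has created as many connections as Max Pool Size allows."""
    return total_objects >= max_pool_size


for name, total, maximum in POOLS:
    print(f"{name}: {total}/{maximum} "
          f"({utilization(total, maximum):.1%} used, at capacity: {at_capacity(total, maximum)})")
```

    None of the four pools is anywhere near its limit, which rules out pool exhaustion as the cause.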

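    The connection pools are dictated by the connection string used, so this most likely means 4 different connection strings were used. (Because _poolByIdentity is set on all four pools, they are also split per Windows identity, so strictly speaking two pools could share a connection string.) We can look at the stack objects to see if we can pick apart some more information.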
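    The pooling rule can be modeled in a few lines. The Python sketch below is a toy model, not System.Data's implementation: it keys a pool on the exact connection string plus, when pooling by identity (as _poolByIdentity indicates for these pools), the Windows identity. The CONTOSO identity names in the usage are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

PoolKey = Tuple[str, Optional[str]]


def pool_key(conn_str: str, identity: Optional[str], pool_by_identity: bool) -> PoolKey:
    # The connection string is matched as an exact string, so a change in
    # spacing or keyword order yields a different key (and a different pool).
    return (conn_str, identity if pool_by_identity else None)


def count_pools(requests: Iterable[Tuple[str, Optional[str], bool]]) -> int:
    """Count distinct pools for (connection string, identity, pool_by_identity) requests."""
    pools: Dict[PoolKey, int] = {}
    for conn_str, identity, by_identity in requests:
        key = pool_key(conn_str, identity, by_identity)
        pools[key] = pools.get(key, 0) + 1  # number of opens routed to this pool
    return len(pools)


A = "server=MyServer;database=MyDatabase;Integrated Security=SSPI"
B = "server=MyServer; database=MyDatabase;Integrated Security=SSPI"  # extra space only

print(count_pools([(A, r"CONTOSO\svc1", True), (A, r"CONTOSO\svc1", True)]))  # same pool
print(count_pools([(A, r"CONTOSO\svc1", True), (A, r"CONTOSO\svc2", True)]))  # one pool per identity
print(count_pools([(A, None, False), (B, None, False)]))  # strings differ, so two pools
```

    This is why the pool count alone only hints at the number of connection strings, and why finding the actual connection string is the next step.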

    0:138> !dso
    OS Thread Id: 0x70b0 (138)
    RSP/REG          Object           Name
    ...
    000000001454df30 00000001602a0f00 System.Data.SqlClient.SqlConnection
    000000001454df40 00000000c0ace890 System.String
    000000001454df48 00000001602a0cf0 Microsoft.BizTalk.Bam.EventObservation.BAMTraceFragment
    000000001454df50 0000000150511568 System.String
    000000001454df60 00000001602a0b00 Microsoft.BizTalk.Bam.EventObservation.DirectEventStream
    000000001454df70 00000001602a0b00 Microsoft.BizTalk.Bam.EventObservation.DirectEventStream
    000000001454df78 00000001602a0cf0 Microsoft.BizTalk.Bam.EventObservation.BAMTraceFragment
    000000001454df80 00000001505112d0 System.String
    000000001454df88 0000000150511568 System.String
    000000001454df90 00000001602a0cf0 Microsoft.BizTalk.Bam.EventObservation.BAMTraceFragment
    000000001454dfa8 00000001602a13d0 System.InvalidOperationException
    000000001454dfb0 00000001602a0b38 System.Object
    000000001454dfb8 000000015050d780 System.Data.SqlClient.SqlCommand
    ...

    Here is the SQL Command Object that was issuing the command when we had the exception.

    0:138> !do 000000015050d780
    Name: System.Data.SqlClient.SqlCommand
    MethodTable: 000006428279dbd0
    EEClass: 00000642827d1dc0
    Size: 224(0xe0) bytes
    (C:\WINDOWS\assembly\GAC_64\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
    Fields:
                  MT    Field   Offset                 Type VT     Attr            Value Name
    0000064278436018  400018a        8        System.Object  0 instance 0000000000000000 __identity
    00000642828144d8  40008de       10 ...ponentModel.ISite  0 instance 0000000000000000 site
    00000642826664d8  40008df       18 ....EventHandlerList  0 instance 0000000000000000 events
    0000064278436018  40008dd      210        System.Object  0   static 00000000f0269548 EventDisposed
    000006427843d998  40016f2       b0         System.Int32  1 instance              672 ObjectID
    0000064278436728  40016f3       20        System.String  0 instance 00000000f0020178 _commandText <-- The query/command issued
    000006428279c370  40016f4       b4         System.Int32  1 instance                4 _commandType
    000006427843d998  40016f5       b8         System.Int32  1 instance               30 _commandTimeout
    000006428279d908  40016f6       bc         System.Int32  1 instance                3 _updatedRowSource
    00000642784358f8  40016f7       d0       System.Boolean  1 instance                0 _designTimeInvisible
    000006428288d490  40016f8       28 ...ent.SqlDependency  0 instance 0000000000000000 _sqlDep
    00000642784358f8  40016f9       d1       System.Boolean  1 instance                0 _inPrepare
    000006427843d998  40016fa       c0         System.Int32  1 instance               -1 _prepareHandle
    00000642784358f8  40016fb       d2       System.Boolean  1 instance                0 _hiddenPrepare
    00000642827e3128  40016fc       30 ...rameterCollection  0 instance 000000015050d940 _parameters
    00000642827eea48  40016fd       38 ...ent.SqlConnection  0 instance 000000015050f308 _activeConnection <-- The SqlConnection that we used for this command
    00000642784358f8  40016fe       d3       System.Boolean  1 instance                0 _dirty

    In this case, we know the SqlConnection isn’t valid because we failed while trying to get it from the pool. The command text would be interesting had this been a query timeout, but for a connection timeout it is irrelevant. If we poke at the strings on the stack, we will find the connection string used for this operation.
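    The difference between the two kinds of timeout shows up in the stack. As a heuristic sketch in Python (the frame markers are assumptions, picked from the stack in this dump and from the public SqlCommand Execute* method names), a helper can classify a timeout by whether the failing stack went through SqlConnection.Open or through a SqlCommand execute call:

```python
from typing import Iterable

# Frames that only appear while a connection is being opened or logged in.
CONNECT_MARKERS = (
    "SqlConnection.Open",
    "SqlInternalConnectionTds.CompleteLogin",
    "DbConnectionPool.GetConnection",
)
# Frames that appear while a command is executing on an already open connection.
COMMAND_MARKERS = (
    "SqlCommand.ExecuteReader",
    "SqlCommand.ExecuteNonQuery",
    "SqlCommand.ExecuteScalar",
    "SqlCommand.ExecuteXmlReader",
)


def classify_timeout(frames: Iterable[str]) -> str:
    """Return 'connection', 'command', or 'unknown' for a timeout stack trace."""
    frames = list(frames)
    if any(marker in frame for frame in frames for marker in CONNECT_MARKERS):
        return "connection"
    if any(marker in frame for frame in frames for marker in COMMAND_MARKERS):
        return "command"
    return "unknown"


stack = [
    "System_Data!System.Data.SqlClient.SqlInternalConnectionTds.CompleteLogin(Boolean)+0x36",
    "System_Data!System.Data.SqlClient.SqlConnection.Open()+0x10c",
]
print(classify_timeout(stack))
```

    Here SqlConnection.Open is on the stack, so neither the command text nor the command's _commandTimeout of 30 matters; the Connect Timeout from the connection string does.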

    0:138> !do 00000001505112d0
    Name: System.String
    MethodTable: 0000064278436728
    EEClass: 000006427803e520
    Size: 330(0x14a) bytes
    (C:\WINDOWS\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
    String: server=MyServer; database= MyDatabase;Integrated Security=SSPI;Connect Timeout=25; pooling=true; Max Pool Size=500; Min Pool Size=1

    From this, we can see Max Pool Size is 500, which narrows it down to two of the four pools listed above. When we went through the pools previously, I noticed that one of them had something the others didn’t, and it happened to be one of the pools with a Max Pool Size of 500. Let’s look at the full output for the pool in question.
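    The narrowing step can be written as a small Python sketch: parse the connection string found on the stack, then keep only the pools whose _minPoolSize and _maxPoolSize match it. The parser is deliberately simplified (no quoted values or escaped semicolons) and is not SqlClient's own parser; the pool values are the ones read from the dump above.

```python
from typing import Dict

CONN_STR = ("server=MyServer; database= MyDatabase;Integrated Security=SSPI;"
            "Connect Timeout=25; pooling=true; Max Pool Size=500; Min Pool Size=1")

# (pool, address, _minPoolSize, _maxPoolSize) from the !do output above.
POOLS = [
    ("Pool 1", "0x00000000c021b348", 1, 500),
    ("Pool 2", "0x00000000e05add10", 0, 100),
    ("Pool 3", "0x000000014004b1d8", 1, 500),
    ("Pool 4", "0x00000001502e6af0", 0, 100),
]


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split 'key=value;...' into a dict with trimmed, lower-cased keys."""
    settings: Dict[str, str] = {}
    for part in conn_str.split(";"):
        if not part.strip():
            continue  # tolerate empty segments such as a trailing ';'
        key, _, value = part.partition("=")
        settings[key.strip().lower()] = value.strip()
    return settings


settings = parse_connection_string(CONN_STR)
max_size = int(settings["max pool size"])
min_size = int(settings["min pool size"])
candidates = [name for name, _addr, lo, hi in POOLS if lo == min_size and hi == max_size]
print(candidates)
```

    Both remaining pools match on size settings alone, so something else on the pool has to tell them apart.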

    0:138> !do 0x000000014004b1d8
    Name: System.Data.ProviderBase.DbConnectionPool
    MethodTable: 0000064282810450
    EEClass: 00000642827da538
    Size: 176(0xb0) bytes
    (C:\WINDOWS\assembly\GAC_64\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
    Fields:
                  MT    Field   Offset                 Type VT     Attr            Value Name
    000006427843d998  400153c       88         System.Int32  1 instance           200000 _cleanupWait
    000006428281d320  400153d        8 ...ctionPoolIdentity  0 instance 000000014004b1b8 _identity
    00000642827ef2d0  400153e       10 ...ConnectionFactory  0 instance 0000000140022860 _connectionFactory
    00000642827ef760  400153f       18 ...nnectionPoolGroup  0 instance 00000000d01e82b0 _connectionPoolGroup <-- We can get the connection string from this object
    0000064282818d18  4001540       20 ...nPoolGroupOptions  0 instance 00000000d01e8288 _connectionPoolGroupOptions
    000006428281d3c0  4001541       28 ...nPoolProviderInfo  0 instance 0000000000000000 _connectionPoolProviderInfo
    00000642828102f8  4001542       8c         System.Int32  1 instance                1 _state
    000006428281d4b8  4001543       30 ...InternalListStack  0 instance 000000014004b288 _stackOld
    000006428281d4b8  4001544       38 ...InternalListStack  0 instance 000000014004b2a0 _stackNew
    0000064278424d50  4001545       40 ...ding.WaitCallback  0 instance 000000014004c570 _poolCreateRequest
    0000064278425c90  4001546       48 ...Collections.Queue  0 instance 0000000000000000 _deactivateQueue
    0000064278424d50  4001547       50 ...ding.WaitCallback  0 instance 0000000000000000 _deactivateCallback
    000006427843d998  4001548       90         System.Int32  1 instance                0 _waitCount
    000006428281f6d8  4001549       58 ...l+PoolWaitHandles  0 instance 000000014004b3a8 _waitHandles
    00000642784369f0  400154a       60     System.Exception  0 instance 00000000e09e13f0 _resError <-- We had an error on this pool
    00000642784358f8  400154b       a0       System.Boolean  1 instance                1 _errorOccurred
    000006427843d998  400154c       94         System.Int32  1 instance            10000 _errorWait
    0000064278468a80  400154d       68 ...m.Threading.Timer  0 instance 00000001505bc420 _errorTimer
    0000064278468a80  400154e       70 ...m.Threading.Timer  0 instance 000000014004c5f0 _cleanupTimer
    000006428281fce8  400154f       78 ...tedConnectionPool  0 instance 000000014004c3e8 _transactedConnectionPool
    0000000000000000  4001550       80                       0 instance 000000014004b400 _objectList
    000006427843d998  4001551       98         System.Int32  1 instance                7 _totalObjects
    000006427843d998  4001553       9c         System.Int32  1 instance                8 _objectID
    0000064278425e20  400153b      c00        System.Random  0   static 00000000e0188968 _random
    000006427843d998  4001552      968         System.Int32  1   static               18 _objectTypeCount

    First, let's see if we can match the connection string for this pool with the one on the stack, to make sure we are looking at the right pool.

    0:138> !do 00000000d01e82b0
    Name: System.Data.ProviderBase.DbConnectionPoolGroup
    MethodTable: 00000642827ef760
    EEClass: 00000642827da418
    Size: 72(0x48) bytes
    (C:\WINDOWS\assembly\GAC_64\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
    Fields:
                  MT    Field   Offset                 Type VT     Attr            Value Name
    0000064282816978  4001584        8 ...ConnectionOptions  0 instance 0000000170021600 _connectionOptions
    0000064282818d18  4001585       10 ...nPoolGroupOptions  0 instance 00000000d01e8288 _poolGroupOptions
    00000642823f2650  4001586       18 ....HybridDictionary  0 instance 00000000b00fb528 _poolCollection
    000006427843d998  4001587       30         System.Int32  1 instance                1 _poolCount
    000006427843d998  4001588       34         System.Int32  1 instance                1 _state
    00000642828193b0  4001589       20 ...GroupProviderInfo  0 instance 00000000d01e82f8 _providerInfo
    0000000000000000  400158a       28 ...DbMetaDataFactory  0 instance 0000000000000000 _metaDataFactory
    000006427843d998  400158c       38         System.Int32  1 instance                7 _objectID
    000006427843d998  400158b      978         System.Int32  1   static               20 _objectTypeCount

    0:138> !do 0000000170021600
    Name: System.Data.SqlClient.SqlConnectionString
    MethodTable: 0000064282817158
    EEClass: 00000642828234e0
    Size: 184(0xb8) bytes
    (C:\WINDOWS\assembly\GAC_64\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
    Fields:
                  MT    Field   Offset                 Type VT     Attr            Value Name
    0000064278436728  4000bef        8        System.String  0 instance 0000000150020230 _usersConnectionString
    000006427843e080  4000bf0       10 ...ections.Hashtable  0 instance 00000001700216b8 _parsetable
    00000642828180a0  4000bf1       18 ...mon.NameValuePair  0 instance 0000000170021878 KeyChain
    00000642784358f8  4000bf2       28       System.Boolean  1 instance                0 HasPasswordKeyword
    00000642784358f8  4000bf3       29       System.Boolean  1 instance                0 UseOdbcRules
    000006427843cf18  4000bf4       20 ...ity.PermissionSet  0 instance 00000000d01e8330 _permissionset
    00000642825a4958  4000beb      3e0 ...Expressions.Regex  0   static 00000000f026d658 ConnectionStringValidKeyRegex
    00000642825a4958  4000bec      3e8 ...Expressions.Regex  0   static 00000000d01e7798 ConnectionStringValidValueRegex
    00000642825a4958  4000bed      3f0 ...Expressions.Regex  0   static 0000000080032770 ConnectionStringQuoteValueRegex
    00000642825a4958  4000bee      3f8 ...Expressions.Regex  0   static 0000000080034800 ConnectionStringQuoteOdbcValueRegex

    0:138> !do 0000000150020230
    Name: System.String
    MethodTable: 0000064278436728
    EEClass: 000006427803e520
    Size: 330(0x14a) bytes
    (C:\WINDOWS\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
    String: server=MyServer; database= MyDatabase;Integrated Security=SSPI;Connect Timeout=25; pooling=true; Max Pool Size=500; Min Pool Size=1

    We have a match! Now let's look at the error that was recorded on the pool.

    0:138> !pe 00000000e09e13f0
    Exception object: 00000000e09e13f0
    Exception type: System.Data.SqlClient.SqlException
    Message: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
    InnerException: <none>
    StackTrace (generated):
        SP               IP               Function
        0000000015CBDF10 00000642828554A3 System_Data!System.Data.SqlClient.SqlInternalConnection.OnError(System.Data.SqlClient.SqlException, Boolean)+0x103
        0000000015CBDF60 0000064282854DA6 System_Data!System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(System.Data.SqlClient.TdsParserStateObject)+0xf6
        0000000015CBDFC0 0000064282CDCCF1 System_Data!System.Data.SqlClient.TdsParserStateObject.ReadSniError(System.Data.SqlClient.TdsParserStateObject, UInt32)+0x291
        0000000015CBE0A0 000006428284ECCA System_Data!System.Data.SqlClient.TdsParserStateObject.ReadSni(System.Data.Common.DbAsyncResult, System.Data.SqlClient.TdsParserStateObject)+0x13a
        0000000015CBE140 000006428284E9E1 System_Data!System.Data.SqlClient.TdsParserStateObject.ReadNetworkPacket()+0x91
        0000000015CBE1A0 0000064282852763 System_Data!System.Data.SqlClient.TdsParserStateObject.ReadBuffer()+0x33
        0000000015CBE1D0 00000642828526A1 System_Data!System.Data.SqlClient.TdsParserStateObject.ReadByte()+0x21
        0000000015CBE200 0000064282851B5C System_Data!System.Data.SqlClient.TdsParser.Run(System.Data.SqlClient.RunBehavior, System.Data.SqlClient.SqlCommand, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.BulkCopySimpleResultSet, System.Data.SqlClient.TdsParserStateObject)+0xbc
        0000000015CBE2D0 00000642828519E6 System_Data!System.Data.SqlClient.SqlInternalConnectionTds.CompleteLogin(Boolean)+0x36
        0000000015CBE320 000006428284A997 System_Data!System.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(System.Data.SqlClient.ServerInfo, System.String, Boolean, Int64, System.Data.SqlClient.SqlConnection)+0x147
        0000000015CBE3C0 000006428284859F System_Data!System.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(System.String, System.String, Boolean, System.Data.SqlClient.SqlConnection, System.Data.SqlClient.SqlConnectionString, Int64)+0x52f
        0000000015CBE530 0000064282847505 System_Data!System.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(System.Data.SqlClient.SqlConnection, System.Data.SqlClient.SqlConnectionString, System.String, Boolean)+0x135
        0000000015CBE5D0 00000642828471E3 System_Data!System.Data.SqlClient.SqlInternalConnectionTds..ctor(System.Data.ProviderBase.DbConnectionPoolIdentity, System.Data.SqlClient.SqlConnectionString, System.Object, System.String, System.Data.SqlClient.SqlConnection, Boolean)+0x153
        0000000015CBE670 0000064282846E36 System_Data!System.Data.SqlClient.SqlConnectionFactory.CreateConnection(System.Data.Common.DbConnectionOptions, System.Object, System.Data.ProviderBase.DbConnectionPool, System.Data.Common.DbConnection)+0x296
        0000000015CBE730 0000064282846947 System_Data!System.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionPool, System.Data.Common.DbConnectionOptions)+0x37
        0000000015CBE790 000006428284689D System_Data!System.Data.ProviderBase.DbConnectionPool.CreateObject(System.Data.Common.DbConnection)+0x29d
        0000000015CBE830 000006428292905D System_Data!System.Data.ProviderBase.DbConnectionPool.UserCreateRequest(System.Data.Common.DbConnection)+0x5d
        0000000015CBE870 0000064282846412 System_Data!System.Data.ProviderBase.DbConnectionPool.GetConnection(System.Data.Common.DbConnection)+0x6b2
        0000000015CBE930 00000642828424B4 System_Data!System.Data.ProviderBase.DbConnectionFactory.GetConnection(System.Data.Common.DbConnection)+0x54
        0000000015CBE980 0000064282841BA2 System_Data!System.Data.ProviderBase.DbConnectionClosed.OpenConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionFactory)+0xf2
        0000000015CBE9D0 000006428284166C System_Data!System.Data.SqlClient.SqlConnection.Open()+0x10c
        0000000015CBEA60 0000064282928C2D Microsoft_BizTalk_Bam_EventObservation!Microsoft.BizTalk.Bam.EventObservation.DirectEventStream.StoreSingleEvent(Microsoft.BizTalk.Bam.EventObservation.IPersistQueryable)+0x8d
        0000000015CBEAE0 0000064282928947 Microsoft_BizTalk_Bam_EventObservation!Microsoft.BizTalk.Bam.EventObservation.DirectEventStream.StoreCustomEvent(Microsoft.BizTalk.Bam.EventObservation.IPersistQueryable)+0x47

    As we can see, it is a normal connection timeout error, which makes sense because our pools were not exhausted. Of note, they had set the Connect Timeout to 25 seconds in the connection string, so they would need to either raise it or look at what is going on with SQL Server at the time this occurs. There is not much more we can get from the dump.
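    If the fix is to raise the timeout, the change is to the Connect Timeout keyword. The Python sketch below rewrites one keyword in a connection string; in .NET code, SqlConnectionStringBuilder and its ConnectTimeout property do the same job. The 60-second value in the usage is only an example, not a recommendation from this investigation.

```python
def set_connection_option(conn_str: str, key: str, value) -> str:
    """Return conn_str with `key` set to `value`.

    Matching is case-insensitive; an existing entry is replaced in place,
    otherwise the keyword is appended. Quoted values are not handled.
    """
    out = []
    found = False
    for part in conn_str.split(";"):
        if not part.strip():
            continue  # drop empty segments
        name, _, _old = part.partition("=")
        if name.strip().lower() == key.lower():
            out.append(f"{name.strip()}={value}")
            found = True
        else:
            out.append(part.strip())
    if not found:
        out.append(f"{key}={value}")
    return ";".join(out)


original = ("server=MyServer; database= MyDatabase;Integrated Security=SSPI;"
            "Connect Timeout=25; pooling=true; Max Pool Size=500; Min Pool Size=1")
print(set_connection_option(original, "Connect Timeout", 60))
```

    Because pools are matched on the exact connection string, the edited string will also get a pool of its own.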

     

    Adam W. Saxton | Microsoft Escalation Services
    http://twitter.com/awsaxton


    The customer’s issue was that they were trying to provision a Project site within the Project SharePoint Application. This was done via a PowerShell script that they ran on one of the SharePoint App Servers.

    They had two SharePoint App Servers, AppServerA and AppServerB. They had indicated that the provisioning would fail on either App Server, and that it had started failing around November of the previous year (4 months earlier). When the failure occurred, they would see the following errors in the SharePoint ULS logs:

    02/05/2014 10:14:32.87        OWSTIMER.EXE (0x2024)        0x0BC8        Project Server        Database        880i        High        System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server) at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)

    02/05/2014 10:14:32.87        OWSTIMER.EXE (0x2024)        0x0BC8        Project Server        Database        880j        High        SqlError: 'A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)' Source: '.Net SqlClient Data Provider' Number: 53 State: 0 Class: 20 Procedure: '' LineNumber: 0 Server: ''        f5009e1d-12cd-4a70-a0af-f0400acf99e6

    02/05/2014 10:14:32.87        OWSTIMER.EXE (0x2024)        0x0BC8        Project Server        Database        tzkv        High        SqlCommand: 'CREATE PROCEDURE dbo.MSP_TimesheetQ_Acknowledge_Control_Message @serverUID UID , @ctrlMsgId int AS BEGIN IF @@TRANCOUNT > 0 BEGIN RAISERROR ('Queue operations cannot be used from within a transaction.', 16, 1) RETURN END DECLARE @lastError INT SELECT @lastError = 0 UPDATE dbo.MSP_QUEUE_TIMESHEET_HEALTH SET LAST_CONTROL_ID = @ctrlMsgId WHERE SERVER_UID = @serverUID SELECT @lastError = @@ERROR Exit1: RETURN @lastError END ' CommandType: Text CommandTimeout: 0        f5009e1d-12cd-4a70-a0af-f0400acf99e6

    02/05/2014 10:14:32.87        OWSTIMER.EXE (0x2024)        0x0BC8        Project Server        Provisioning        6935        Critical        Error provisioning database. Script: C:\Program Files\Microsoft Office Servers\14.0\Sql\Project Server\Core\addqueue1timesheetsps12.sql, Line: 0, Error: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server), Line: CREATE PROCEDURE dbo.MSP_TimesheetQ_Acknowledge_Control_Message @serverUID UID , @ctrlMsgId int AS BEGIN IF @@TRANCOUNT > 0 BEGIN RAISERROR ('Queue operations cannot be used from within a transaction.', 16, 1) RETURN END DECLARE @lastError INT SELECT @lastError = 0 UPDATE dbo.MSP_QUEUE_TIMESHEET_HEALTH SET LAST_CONTROL_ID = @ctrlMsgId WHERE SERVER_UID = @serverUID SELECT @lastError = @@ERROR Exit1: RETURN @lastError END .        f5009e1d-12cd-4a70-a0af-f0400acf99e6

    02/05/2014 10:14:32.89        OWSTIMER.EXE (0x2024)        0x0BC8        Project Server        Provisioning        6971        Critical        Failed to provision site /CMS with error: Microsoft.Office.Project.Server.Administration.ProvisionException: Failed to provision databases. ---> Microsoft.Office.Project.Server.Administration.ProvisionException: CREATE PROCEDURE dbo.MSP_TimesheetQ_Acknowledge_Control_Message @serverUID UID , @ctrlMsgId int AS BEGIN IF @@TRANCOUNT > 0 BEGIN RAISERROR ('Queue operations cannot be used from within a transaction.', 16, 1) RETURN END DECLARE @lastError INT SELECT @lastError = 0 UPDATE dbo.MSP_QUEUE_TIMESHEET_HEALTH SET LAST_CONTROL_ID = @ctrlMsgId WHERE SERVER_UID = @serverUID SELECT @lastError = @@ERROR Exit1: RETURN @lastError END ---> System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server) at
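    When sifting through ULS logs like these, it helps to pull out just the SqlError details. A Python sketch, assuming the "Number: N State: N Class: N" and "(provider: NAME, error: N" layout seen in the lines above (other log sources may format these differently):

```python
import re
from typing import Dict, Union

SQLERROR_RE = re.compile(r"Number: (?P<number>-?\d+) State: (?P<state>\d+) Class: (?P<klass>\d+)")
PROVIDER_RE = re.compile(r"provider: (?P<provider>[^,]+), error: (?P<error>\d+)")


def parse_sqlerror(line: str) -> Dict[str, Union[int, str]]:
    """Extract SqlError number/state/class and the provider error from a ULS line."""
    details: Dict[str, Union[int, str]] = {}
    m = SQLERROR_RE.search(line)
    if m:
        details.update(number=int(m["number"]), state=int(m["state"]), severity=int(m["klass"]))
    p = PROVIDER_RE.search(line)
    if p:
        details.update(provider=p["provider"].strip(), provider_error=int(p["error"]))
    return details


# Message text shortened for the example.
uls_line = ("SqlError: 'A network-related or instance-specific error occurred "
            "(provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)' "
            "Source: '.Net SqlClient Data Provider' Number: 53 State: 0 Class: 20 Procedure: ''")
print(parse_sqlerror(uls_line))
```

    The provider portion (Named Pipes, error 40) is the detail the rest of this investigation hinges on.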

    One thing they had mentioned was that if they increased the Connection Timeout to 60 seconds, it would sometimes work. My thinking was that if raising the connection timeout sometimes made it work, we might be hitting a timeout while actually connecting to SQL Server; but a timeout wasn’t the error we were seeing.

    Looking at the actual error we can draw some conclusions.

    provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server

    By default, we should be using TCP. If there is a serious error with that, we will fall back to Named Pipes. The error Named Pipes got back was that we couldn’t open the connection, not a timeout. Think of this as “SQL Server does not exist or access denied”. The SQL Server in this case was also a clustered default instance rather than a named instance, so SQL Browser was not coming into the picture. This is a straight shot to port 1433 via TCP.

    Which machine was getting the error?

    For troubleshooting, we need to consider which machines are involved. One thing we noticed over the course of troubleshooting was that the error always occurred on AppServerB, even though we were always starting the script from AppServerA. With SharePoint App Servers, a service can be started on individual App Servers, which lets you control the load. The fact that we were always seeing the error on AppServerB led me to believe that the Project Application Server Service was only started on AppServerB and not on AppServerA. Looking in Central Admin confirmed this. So, we wanted to concentrate data collection on AppServerB.

    Network Traces

    The first thing we looked at was a network trace. We collected network traces from AppServerB and the SQL Server. Going back to the error, we know that TCP was not working as expected and that Named Pipes was then failing. Named Pipes uses the SMB protocol, which first reaches out to TCP port 445. We didn’t see any traffic in the network trace going to that port, nor any SMB traffic that was relevant to the error; we only saw browser announcements, which had nothing to do with us. This tells me that we never hit the wire, so the network traces wouldn’t be helpful.
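    A quick client-side complement to the network trace is to check whether the two TCP ports involved, 1433 for the default instance and 445 for SMB (which Named Pipes rides on), can be reached at all from AppServerB. A minimal Python sketch; a successful connect only proves TCP reachability, not that a SQL login or an SMB session would succeed.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name-resolution failures, refused connections and timeouts.
        return False
```

    For example, running tcp_port_open("SqlClusterName", 1433) and tcp_port_open("SqlClusterName", 445) on AppServerB, where SqlClusterName is a hypothetical stand-in for the clustered SQL Server's network name, shows whether the ports answer outside of SqlClient.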

    BIDTrace

    Enter BIDTrace.  BIDTrace is really just diagnostic logging within our client providers and server SNI stack.  Think Event Tracing for Windows (ETW).  I’m not going to dive into how to set this up as it would take its own blog post.  You can read more about it in the following MSDN Page:

    Data Access Tracing in SQL Server 2012
    http://msdn.microsoft.com/en-us/library/hh880086.aspx

    Typically I won’t go this route unless I know what I’m looking to get out of it; it is actually pretty rare that I jump to this. This particular case was an excellent fit. We had evidence that we were not getting far enough to hit the wire, and we knew we were getting an error when trying to make a connection to SQL. So, what I was looking for was whether some Windows error was occurring that wasn’t presented in the actual exception.

    Here is the Logman command that I used to start the capture after getting the BIDTrace items configured.

    Logman start MyTrace -pf ctrl.guid -ct perf -o Out%d.etl -mode NewFile -max 150 -ets

    A few things I’ll point out about this command. The output file name has a %d in it. This is a format string, because we will end up with multiple files. -mode NewFile tells logman to start a new file once the size given by -max is reached, and we set -max to 150, which caps each file at 150MB. I did this because when we first tried a single file, the ETL file was 300MB, and when I converted it to text it was over 1GB. That’s a lot to look through, and I also had trouble opening it. So, I decided to break it up. Of note, it took about 4-5 minutes to reproduce the issue, which is a long time to capture a BIDTrace. When you capture a BIDTrace, it is better to capture a small window if you can; these files fill up fast.
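
    If you script the capture, building the argument list in code keeps every switch a plain ASCII hyphen, which matters when commands are copied from a web page where a hyphen may have become a typographic dash. The logman_args helper below is hypothetical (it is not part of logman) and only reproduces the switches used above; it insists on a %d placeholder purely to mirror Out%d.etl. Actually running it requires Windows and an elevated prompt.

```python
import subprocess
from typing import List


def logman_args(session: str, provider_file: str, out_pattern: str, max_mb: int) -> List[str]:
    """Build the logman command used for the BIDTrace capture (hypothetical helper)."""
    # This sketch expects a %d placeholder so each rollover file is numbered.
    if "%d" not in out_pattern:
        raise ValueError("this sketch expects a %d placeholder in out_pattern")
    return [
        "logman", "start", session,
        "-pf", provider_file,   # providers listed in ctrl.guid
        "-ct", "perf",          # clock type
        "-o", out_pattern,      # e.g. Out%d.etl -> numbered output files
        "-mode", "NewFile",     # start a new file when -max is reached
        "-max", str(max_mb),    # size cap per file, in MB
        "-ets",                 # send the command directly to Event Trace Sessions
    ]


def start_capture() -> None:
    # Windows only, from an elevated prompt.
    subprocess.run(logman_args("MyTrace", "ctrl.guid", "Out%d.etl", 150), check=True)
```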

    Here is the ctrl.guid that I used for the capture. These are effectively the event providers that I wanted to capture:

    {8B98D3F2-3CC6-0B9C-6651-9649CCE5C752}  0x630ff  0   MSDADIAG.ETW
    {914ABDE2-171E-C600-3348-C514171DE148}  0x630ff  0   System.Data.1
    {C9996FA5-C06F-F20C-8A20-69B3BA392315}  0x630ff  0   System.Data.SNI.1
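
    The provider file is plain text with one provider per line; as this file lays it out, each line holds the provider GUID, the enable flags, the level, and a friendly name. A minimal sketch for regenerating it, assuming you want to keep the provider list in a script; format_provider_lines and write_provider_file are hypothetical helpers.

```python
from pathlib import Path
from typing import Iterable, Tuple

# (GUID, enable flags, level, friendly name), as in the ctrl.guid above
PROVIDERS = [
    ("{8B98D3F2-3CC6-0B9C-6651-9649CCE5C752}", 0x630FF, 0, "MSDADIAG.ETW"),
    ("{914ABDE2-171E-C600-3348-C514171DE148}", 0x630FF, 0, "System.Data.1"),
    ("{C9996FA5-C06F-F20C-8A20-69B3BA392315}", 0x630FF, 0, "System.Data.SNI.1"),
]


def format_provider_lines(providers: Iterable[Tuple[str, int, int, str]]) -> str:
    """Render one provider per line in the same column order as ctrl.guid."""
    lines = [f"{guid}  0x{flags:x}  {level}   {name}" for guid, flags, level, name in providers]
    return "\n".join(lines) + "\n"


def write_provider_file(path: Path, providers=PROVIDERS) -> None:
    # Plain ASCII text, one provider per line.
    path.write_text(format_provider_lines(providers), encoding="ascii")
```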

    The capture will produce ETL files, which are binary. You need to convert them after you are done. I use TraceRPT to do this; it is part of Windows. Here is the command I used to output it to a CSV file:

    TraceRPT out5.etl -of CSV

    In our case, it had generated 5 ETL files (remember the %d?). So, we grabbed the last file produced, out5.etl, and converted it. At first, though, I didn’t know it was out5.etl; I actually started with out4.etl. One problem was that I didn’t have readable timestamps within the CSV output. I had the raw clock time value, which is hard to visualize compared to an actual timestamp.
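
    The raw Clock-Time values in the CSV look like Windows FILETIME values (100-nanosecond ticks since 1601-01-01 UTC). Assuming that holds for your output, a few lines of Python turn them into readable timestamps. The value used here is taken from the out5.etl CSV rows quoted in this post.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_utc(ticks: int) -> datetime:
    """Convert a FILETIME value (100 ns ticks since 1601-01-01 UTC) to a datetime."""
    # Integer division drops the sub-microsecond digit that datetime can't store.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


print(filetime_to_utc(130361840460213871).isoformat())
# 2014-02-06T18:14:06.021387+00:00
```

    That decodes to 18:14:06 UTC on 2014-02-06, the same day as the ULS entries; how it lines up with the ULS timestamp depends on the server's UTC offset.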

    Enter Message Analyzer! Message Analyzer is a replacement for Network Monitor, but it has another awesome ability: it can open ETL files. I also had the timestamp of the error from the SharePoint ULS log for the attempt we made while capturing the BIDTrace.

    02/06/2014 13:14:20.55     OWSTIMER.EXE (0x2024)                       0x1C4C    Project Server                    Database                          880i    High        System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)     at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)     at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)     at System.Data.SqlClient.TdsParser.Connect(ServerInfo serverInfo, SqlInternalConnectionTds connHandler, Boolean ignoreSniOpenTimeout, Int64 timerExpire, Boolean encrypt, Boolean trustServerCert, Boolean integratedSec...    bc7aaa60-93fc-4873-8f75-416d802aa55b

    02/06/2014 13:14:20.55     OWSTIMER.EXE (0x2024)                       0x1C4C    Project Server                    Provisioning                      6993    Critical    Provisioning '/Test3': Failed to provision databases. An exception occurred: CREATE PROCEDURE dbo.MSP_TimesheetQ_Get_Job_Count_Simple   @correlationID UID ,    @groupState int ,    @msgType int  AS BEGIN    IF @@TRANCOUNT > 0    BEGIN              RAISERROR ('Queue operations cannot be used from within a transaction.', 16, 1)       RETURN    END     SELECT COUNT(*) FROM dbo.MSP_QUEUE_TIMESHEET_GROUP        WHERE CORRELATION_UID = @correlationID       AND   GRP_QUEUE_STATE = @groupState       AND   GRP_QUEUE_MESSAGE_TYPE = @msgType END .    bc7aaa60-93fc-4873-8f75-416d802aa55b

    Our issue occurred at 13:14:20.55 server time, which is 12:14:20.55 in my local time. We can also see the statement it was going to try to run. If we open the ETL file in Message Analyzer, we can see the range of timestamps covered by the file.

    [Screenshot: out4.etl opened in Message Analyzer, showing the time range the file covers]

    We can see that this file went up to 12:14:04 local time, and we were looking for 12:14:20.55. So, out4.etl was not the file I was looking for, which left out5.etl. Technically, you can read the data within Message Analyzer, as you can see in the lower right of the screenshot. It’s Unicode data, so we see l.e.a.v.e. I still prefer the TraceRPT CSV output, as I get readable text from it; it is just a little easier to work with.

    So, I have the CSV output from out5.etl, but what do we look for? We know the statement it was trying to run, so let’s look for that: MSP_TimesheetQ_Get_Job_Count_Simple. We get a hit, and it looks like this:

    System.Data,      TextW,            0,          0,          0,          0,         18,          0, 0x0000000000000000, 0x00002024, 0x00001C4C,                    0,             ,                     ,   {00000000-0000-0000-0000-000000000000},                                         ,   130361840460213871,       7080,      21510,        2, "<sc.SqlCommand.set_CommandText|API> 4187832#, '"
    System.Data,      TextW,            0,          0,          0,          0,         18,          0, 0x0000000000000000, 0x00002024, 0x00001C4C,                    0,             ,                     ,   {00000000-0000-0000-0000-000000000000},                                         ,   130361840460213910,       7080,      21510,        2, "CREATE PROCEDURE dbo.MSP_TimesheetQ_Get_Job_Count_Simple   @correlationID UID ,    @groupState int ,    @msgType int  AS BEGIN    IF @@TRANCOUNT > 0    BEGIN              RAISERROR ('Queue operations cannot be used from within a transaction.', 16, 1)       RETURN    END     SELECT COUNT(*) FROM dbo.MSP_QUEUE_TIMESHEET_GROUP        WHERE CORRELATION_UID = @correlationID       AND   GRP_QUEUE_STATE = @groupState       AND   GRP_QUEUE_MESSAGE_TYPE = @msgType END "
    System.Data,      TextW,            0,          0,          0,          0,         18,          0, 0x0000000000000000, 0x00002024, 0x00001C4C,                    0,             ,                     ,   {00000000-0000-0000-0000-000000000000},                                         ,   130361840460213935,       7080,      21510,        2, "' "

    It’s not the prettiest, but in Notepad or another text reader we can just scroll over to the right to get a better view.

    [Screenshot: the same CSV rows scrolled to the right to show the event text]
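
    The search can also be scripted against the converted CSV. This sketch assumes the TraceRPT layout shown above, where the event text is the last column, and returns each row mentioning the statement plus a number of rows after it, since the connection attempt follows the command text. grep_trace is a hypothetical helper, and the file name and encoding in the usage comment are placeholders.

```python
import csv
from typing import Iterable, List


def grep_trace(lines: Iterable[str], needle: str, after: int = 3) -> List[str]:
    """Return the event text of rows containing needle, plus `after` rows following each hit."""
    out: List[str] = []
    remaining = 0
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue
        text = row[-1]  # the event's text is the last column in this layout
        if needle in text:
            remaining = after + 1  # the hit itself plus `after` more rows
        if remaining:
            out.append(text)
            remaining -= 1
    return out


# Usage (hypothetical file name; use the file TraceRPT wrote, adjusting encoding if needed):
# with open("trace.csv", newline="") as f:
#     for text in grep_trace(f, "MSP_TimesheetQ_Get_Job_Count_Simple", after=40):
#         print(text)
```

    Streaming row by row, rather than loading the whole file, matters here given how large these CSV files get.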

    The first time you look at this, it can be a little overwhelming, especially if you aren’t familiar with how SNI/TDS works. If we go through the results, we’ll see a few interesting things.

    <prov.DbConnectionHelper.ConnectionString_Set|API> 4184523#, 'Data Source=<server>;Initial Catalog=<database>;Integrated Security=True;Pooling=False;Asynchronous Processing=False;Connect Timeout=15;Application Name="Microsoft Project Server"' "

    <GetProtocolEnum|API|SNI>

    <Tcp::FInit|API|SNI>

    <Tcp::SocketOpenSync|API|SNI>

    <Tcp::SocketOpenSync|RET|SNI> 10055{WINERR}

    <Tcp::Open|ERR|SNI> ProviderNum: 7{ProviderNum}, SNIError: 0{SNIError}, NativeError: 10055{WINERR} <-- 10055 = WSAENOBUFS

    <Np::FInit|RET|SNI> 0{WINERR}

    <Np::OpenPipe|API|SNI> 212439#, szPipeName: '\\<server>\PIPE\sql\query', dwTimeout: 5000

    <Np::OpenPipe|ERR|SNI> ProviderNum: 1{ProviderNum}, SNIError: 40{SNIError}, NativeError: 53{WINERR} <-- ERROR_BAD_NETPATH = network path was not found

    We can see the connection string, which was also available in the SharePoint ULS log. We also see some entries around protocol enumeration. This is where we look at the client registry items to see which protocols we will go through and in what order (TCP, NP, LPC, etc.). Then we see TCP trying to connect; you’ll recall I mentioned that we try TCP first by default. This received Windows error 10055 (WSAENOBUFS). We then see Named Pipes fail with error 53, which is ERROR_BAD_NETPATH. We got what we were looking for out of the BIDTrace.
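
    Pulling the native Windows error codes out of the SNI error lines, and putting names on them, takes one regular expression. The pattern assumes the `<Function|ERR|SNI> ... NativeError: <n>{WINERR}` shape shown above, and the code-to-name table only covers the two codes seen in this case.

```python
import re
from typing import Iterable, List, Tuple

NATIVE_ERROR = re.compile(
    r"<(?P<where>[^|>]+)\|ERR\|SNI>.*?NativeError: (?P<code>\d+)\{WINERR\}"
)

# Only the codes seen in this case; extend as needed.
WIN_ERRORS = {
    10055: "WSAENOBUFS (no buffer space available)",
    53: "ERROR_BAD_NETPATH (the network path was not found)",
}


def native_errors(lines: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (SNI function, Windows error code, name) for each SNI error line."""
    found = []
    for line in lines:
        m = NATIVE_ERROR.search(line)
        if m:
            code = int(m.group("code"))
            found.append((m.group("where"), code, WIN_ERRORS.get(code, "unknown")))
    return found


trace = [
    "<Tcp::Open|ERR|SNI> ProviderNum: 7{ProviderNum}, SNIError: 0{SNIError}, NativeError: 10055{WINERR}",
    "<Np::OpenPipe|ERR|SNI> ProviderNum: 1{ProviderNum}, SNIError: 40{SNIError}, NativeError: 53{WINERR}",
]
for where, code, name in native_errors(trace):
    print(where, code, name)
# Tcp::Open 10055 WSAENOBUFS (no buffer space available)
# Np::OpenPipe 53 ERROR_BAD_NETPATH (the network path was not found)
```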

    WSAENOBUFS is the key here. It is a WinSock error, and we actually have a KB article on it:

    When you try to connect from TCP ports greater than 5000 you receive the error 'WSAENOBUFS (10055)'
    http://support.microsoft.com/kb/196271

    There is a registry key called MaxUserPort which can increase the number of dynamic ports that are available. In Windows Server 2003, the range was capped at 5000 by default. Starting with Windows Server 2008, the range was increased, as we used to see a lot of problems here, especially when connection pooling was not being used. Here is the port range on my Windows 8.1 machine:

    [Screenshot: TCP dynamic port range on my Windows 8.1 machine]

    And for a Windows Server 2008 R2 machine, which the customer was using:

    [Screenshot: TCP dynamic port range on a Windows Server 2008 R2 machine]

    I have 64510 ports available. The customer mentioned that, for a previous issue, an engineer had asked them to add the MaxUserPort registry key, and they had set the value to 4999. By setting it to 4999, we are effectively limiting the number of ports that would otherwise have been available. If you look back at the connection string, you can see that Pooling was set to False. This means connection pooling is turned off, and every time we connect, we establish a new hard connection. This eats up a port. You can look at NETSTAT to see what this looks like. We did that while running the provisioning scripts and saw it get up to around 3000 or so before it was done. You will also see a lot of ports in the TIME_WAIT state. When you disconnect and the port is released, it goes into TIME_WAIT for a set amount of time, which defaults to around 4 minutes. That’s 4 minutes during which you can’t use that port. If you are opening and closing connections a lot, you will run out of ports because so many are sitting in TIME_WAIT. That’s typically when we would bump up the number of ports using the MaxUserPort registry key. However, this is never really a fix; you are just putting a band-aid on without understanding the problem.
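
    Two small helpers make the port arithmetic concrete. The first tallies TCP states from `netstat -an` style text (the sample lines and addresses are made up). The second is a rough steady-state estimate I'm adding, not anything from Windows documentation: by Little's law, if each non-pooled connection holds its local port while open and then for about 240 seconds in TIME_WAIT, a pool of N ephemeral ports supports at most roughly N / (hold time + 240) new connections per second. The port count in the example assumes the usable range runs from 1025 up to MaxUserPort (4999), which is an assumption for illustration.

```python
from collections import Counter
from typing import Iterable


def tcp_state_counts(netstat_lines: Iterable[str]) -> Counter:
    """Count TCP connection states in `netstat -an` style output."""
    counts: Counter = Counter()
    for line in netstat_lines:
        parts = line.split()
        # Expected shape: TCP  local:port  remote:port  STATE  [PID]
        if len(parts) >= 4 and parts[0] == "TCP":
            counts[parts[3]] += 1
    return counts


def max_new_connections_per_sec(ports: int, time_wait_s: float = 240.0,
                                hold_s: float = 0.0) -> float:
    """Rough ceiling on the non-pooled connection rate before ports run out.

    Each connection occupies a local port for hold_s seconds while open and
    time_wait_s seconds afterward (Little's law: in_use = rate * occupancy).
    """
    return ports / (hold_s + time_wait_s)


sample = """
  Proto  Local Address          Foreign Address        State
  TCP    10.0.0.21:4501         10.0.0.50:1433         ESTABLISHED
  TCP    10.0.0.21:4502         10.0.0.50:1433         TIME_WAIT
  TCP    10.0.0.21:4503         10.0.0.50:1433         TIME_WAIT
""".splitlines()
print(tcp_state_counts(sample))  # Counter({'TIME_WAIT': 2, 'ESTABLISHED': 1})

# Assumes ports 1025-4999 are usable (3975 ports); illustrative only.
print(round(max_new_connections_per_sec(4999 - 1025 + 1), 1))  # 16.6
```

    For comparison, the default dynamic range on Windows Server 2008 and later (49152 to 65535, 16,384 ports) works out to roughly 68 new connections per second under the same assumptions, which is why the larger range hides the problem rather than fixing it.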

    End result…

    In our case, Project Server was turning off connection pooling. I don’t know why it does that, but that, in conjunction with MaxUserPort being set to 4999, was causing this issue. We removed the MaxUserPort registry key and rebooted AppServerB, and it started working after that. Of note, we also started the Project Application Server service on AppServerA and cleaned up the TCP registry keys on that machine as well, so that they could effectively balance their load across the SharePoint App Servers.

     

    Adam W. Saxton | Microsoft SQL Server Escalation Services
    http://twitter.com/awsaxton


    I’ve seen a number of customers open support incidents because they couldn’t connect to their SQL Database server, ultimately due to the incorrect assumption that the server’s IP address is static. In fact, the IP address of your logical server is not static and is subject to change at any time. All connections should be made using the fully qualified domain name (FQDN) rather than the IP address.

    The following picture from the Windows Azure SQL Database Connection Management TechNet article shows the network topology for a SQL Database cluster.

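    [Diagram: SQL Database cluster network topology, from the Windows Azure SQL Database Connection Management article]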

    Your logical server (e.g., with an FQDN of xyz.database.windows.net) resides on a SQL Database cluster, in one of the backend SQL Server nodes. Within a given region (e.g., North Central US, South Central US, North Europe, etc.) there are generally many SQL Database clusters, as required to meet the aggregate capacity of all customers. All logical servers within a cluster are accessed through the network load balancer (the single blue block with the note saying “Load balancer forwards ‘sticky’ sessions…” in the diagram) via a virtual IP address.

    If you do a reverse name lookup on your server’s IP address, you will actually see the name of the cluster load balancer. For example, if I ping one of my servers (whose actual name starts with ljvt in the screenshot below), you will see that the name displayed for the IP address is instead data.sn3-1.database.windows.net, where the sn3-1 portion of the name maps to the specific cluster in the region (South Central US) hosting this server.

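    [Screenshot: ping output showing data.sn3-1.database.windows.net as the name for the server’s IP address]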

    Microsoft may do an online migration of your logical server between clusters within a region to balance capacity across that region’s clusters. This move is a live operation, and there is no loss of availability to your database during it. When the migration completes, existing connections to your logical server are terminated, and upon reconnecting via the FQDN your app will be directed to the new cluster. However, if your application caches or connects by IP address instead of FQDN, your connection attempts will fail.
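
    The pattern that survives a migration is to resolve the FQDN on every connection attempt instead of storing the address. A minimal sketch using Python's standard socket module; the server name is a placeholder, the retry count and 15-second timeout are arbitrary choices, and the resolver/opener parameters exist only so the behaviour can be tested. In practice your driver resolves the name when you give it the FQDN; the point is simply never to cache the IP yourself.

```python
import socket
from typing import Callable, List, Tuple

Address = Tuple[str, int]


def resolve_each_time(fqdn: str, port: int = 1433,
                      resolver: Callable = socket.getaddrinfo) -> List[Address]:
    """Look up the FQDN now; never reuse an address from an earlier lookup."""
    infos = resolver(fqdn, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); keep host and port.
    return [(info[4][0], info[4][1]) for info in infos]


def _default_opener(addr: Address) -> socket.socket:
    # 15-second timeout is an arbitrary choice for this sketch.
    return socket.create_connection(addr, timeout=15)


def connect(fqdn: str, port: int = 1433, attempts: int = 3,
            resolver: Callable = socket.getaddrinfo,
            opener: Callable = _default_opener):
    """Try to connect, re-resolving the name before every attempt."""
    last_error = None
    for _ in range(attempts):
        for addr in resolve_each_time(fqdn, port, resolver):  # fresh lookup per attempt
            try:
                return opener(addr)
            except OSError as err:
                last_error = err
    raise last_error or OSError(f"could not resolve {fqdn}")
```

    If the server moves to another cluster between attempts, the next lookup returns the new virtual IP and the retry goes there, which a cached address can never do.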

    A migration moves all of your settings, including any SQL Database firewall rules that you have. Consequently, no Azure-specific changes are required in order to connect. However, if your on-premises network infrastructure blocks or filters outgoing TCP/IP traffic to port 1433 (the port used for SQL connections) and you restricted it to a fixed IP address, you may need to adjust your client firewall/router. The IP address of your SQL Database server will always be within the address ranges listed in the Windows Azure Datacenter IP Ranges list. You should allow outgoing traffic on port 1433 to these address ranges rather than to a specific IP address.
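
    If your outbound firewall rules are managed from a script, checking an address against a list of ranges, rather than one fixed IP, is simple with Python's ipaddress module. The ranges below are RFC 5737 documentation placeholders, not real Azure ranges; the actual values would come from the Windows Azure Datacenter IP Ranges list. The allowed helper is hypothetical.

```python
import ipaddress
from typing import Iterable


def allowed(ip: str, ranges: Iterable[str]) -> bool:
    """True if ip falls inside any of the CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(r) for r in ranges)


# Placeholder ranges (RFC 5737 documentation blocks), not Azure's real list.
RANGES = ["192.0.2.0/24", "198.51.100.0/24"]

print(allowed("198.51.100.37", RANGES))  # True
print(allowed("203.0.113.5", RANGES))    # False
```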

    Keith Elmore – Principal Escalation Engineer