Channel: CSS SQL Server Engineers

How to configure Database mirroring for use with ODBC applications?


 

If you have an ODBC application that connects to a mirrored database, and all of the following are true:

  • The application uses a connection string without a database name (for example: dsn=ssistest;uid=ssistest;pwd=*********)

  • The DSN is configured to use SQL authentication with a non-blank password.

  • The default database in the DSN configuration is changed to the mirrored database.

 

You will see the following error messages when the database is failed over from the primary to the secondary:

ERROR [28000] [Microsoft][SQL Native Client][SQL Server]Login failed for user 'testappuser'.

ERROR [42000] [Microsoft][SQL Native Client][SQL Server]Cannot open database "mirrordb" requested by the login. The login failed.

 

Cause: Even though you create the same user account with the same password on both servers, the SIDs of the login on the primary and on the secondary do not match. After failover, the database user no longer maps to the login on the new principal, hence the error.
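To make the mismatch concrete, here is a minimal sketch that compares login SIDs from the two servers. It assumes you have already collected name-to-SID pairs from each server (for example from the name and sid columns of sys.server_principals); the function name and the SID values below are made up for illustration.

```python
def find_sid_mismatches(principal_logins, mirror_logins):
    """Return login names whose SID differs, or is missing, on the mirror."""
    mismatched = []
    for name, sid in principal_logins.items():
        if mirror_logins.get(name) != sid:
            mismatched.append(name)
    return sorted(mismatched)


# Hypothetical name -> SID maps, one per server.
principal = {"ssistest": "0x1A2B", "reportuser": "0x3C4D"}
mirror = {"ssistest": "0x9F8E", "reportuser": "0x3C4D"}

print(find_sid_mismatches(principal, mirror))  # ['ssistest']
```

Any login this reports is a candidate for the failure described above, because the database user is tied to the SID rather than to the login name.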

 

There are a couple of ways to work around the problem:

1)      Change the connection string to include the database name. For example: dsn=ssistest;uid=ssistest;pwd=*********;database=mirrordb;

2)      If this is not an option, you can use the SSIS Transfer Logins task to transfer the logins between the primary and the secondary.
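For the first workaround, a small helper can check whether a connection string already names a database and append one if it does not. This is only a sketch: the function names are hypothetical, and the parsing is deliberately simple (it does not handle values wrapped in braces or values that contain semicolons).

```python
def parse_conn_str(conn_str):
    """Split 'key=value;key=value' into a dict with lower-cased keys."""
    pairs = {}
    for part in conn_str.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            pairs[key.strip().lower()] = value.strip()
    return pairs


def with_database(conn_str, database):
    """Append database=<name> unless the string already names one."""
    if "database" in parse_conn_str(conn_str):
        return conn_str
    separator = "" if conn_str.endswith(";") else ";"
    return f"{conn_str}{separator}database={database};"


print(with_database("dsn=ssistest;uid=ssistest;pwd=secret", "mirrordb"))
```

With the database named explicitly, the connection no longer depends on the default database configured in the DSN.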

 

Here are the details:

1)      Create a login on the primary server (for example: ssistest with a password of Password1) and set the default database to master.

2)      Open SQL Server Business Intelligence Development Studio (under the SQL Server program group) and create an SSIS package as per the following steps:

a)      Add two SMOServer connection managers (one for the primary and one for the secondary).

b)      Add the Transfer Logins Task from the Toolbox and double-click it to open the properties screen.

c)      Click on the Logins List and choose the ssistest login that we set up for the mirroring.

d)      In the properties of the Transfer Logins Task, under the ‘Misc’ section, ensure that ‘CopySids’ is set to True (this is very important).

e)      Now execute the task. You should see that the package executed successfully and transferred the logins.

3)      Now on the secondary server you should see the ssistest login. It will be in a disabled state. You need to enable it and set the password to the same one as on the primary. (The Transfer Logins task does this by design; the login on the destination server is assigned a random password.)

4)      Now on the primary server, change the default database of ssistest to mirrordb and make the login a dbo in that database.

5)      Now fail over the database to the secondary.

6)      Change the default database of ssistest on the secondary to mirrordb (we can see that the login is automatically mapped to the dbo role).

7)      Fail over mirrordb back to the primary.

8)      Now create an ODBC DSN mapped the usual way.

9)      Test the application. It should work without a problem. (My app has the connection string the following way: dsn=ssistest;uid=ssistest;pwd=*********;)

10)     Fail over mirrordb and test your connection again from your application. It will still work.

 

 

The only caveat is that Test Connection in the ODBC Administrator does not work after you fail over the database from primary to secondary, unless you leave the dialog on the last property page (the Test Data Source page) while you fail over.

 

 

Here are some more references on why this problem is happening only with SQL logins.

 

1)      From Database Mirroring in SQL Server 2005 http://www.microsoft.com/technet/prodtechnol/sql/2005/dbmirror.mspx#EKGAG

 

+++++

Usually it will take more time to redirect an entire application from the old principal to the new principal than a database mirroring automatic failover will take. The application must detect and retry connections, which may add some time to the process. In addition, if new logins using SQL Server authentication have been added to the servers, you may need to map those logins to the new principal using the system stored procedure sp_change_users_login. Complete application failover may also be delayed if any critical objects on the old principal, such as SQL Agent jobs, have not also been copied to the new principal server. (For more information, see "Preparing the Mirror server for Failover" in the Implementation section later.)

+++++++++

2)      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=125975&wa=wsignin1.0

3)      Database Mirroring Best Practices and Performance Considerations

++++++

Make sure that applications can connect and execute all necessary actions, and that all active SQL Server logins (and their permissions) on the principal server are also present on the mirror server. You can use the Transfer Logins task of SQL Server 2005 Integration Services to accomplish this. (For details on the Transfer Logins task, see SQL Server 2005 Books Online.)

++++++++++

 

For more information on Transfer Logins Task:

·         Transfer Logins Task http://technet.microsoft.com/en-us/library/ms137870(SQL.90).aspx

 

 

So, to summarize: the problems we run into from the application's perspective with SQL logins are expected and documented, and we need to work around them using the ‘Transfer Logins’ task.

 

(For screenshots of the process and a sample SSIS package, please download the attached files).

 

Ramu Konidena

Support Engineer, SQL Developer team.


Timeouts when connecting to Named Instances


When connecting to a SQL Named Instance, you may encounter a timeout error if the client is Vista or Windows 2008 with the client firewall enabled.  This particular issue is only present if you are running the SQL 2008 Browser Service.

The browser service is what allows us to connect to a SQL Named Instance by using the friendly instance name (e.g. MyServer\SqlInstance) as opposed to the server and port (e.g. MyServer,2508).  When connecting to a Default Instance, which listens on port 1433 by default, we do not use the SQL Browser service and will therefore not encounter this issue.
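As a rough illustration of which server names involve the browser service, the sketch below classifies a target string the way the paragraph above describes: an instance name without an explicit port needs a lookup, while a plain server name or a server,port pair does not. The function name is hypothetical and the check is purely textual.

```python
def uses_sql_browser(server):
    """True when the target names an instance and gives no explicit port."""
    host_and_instance, _, port = server.partition(",")
    if port.strip():
        return False  # explicit port: no instance lookup is needed
    return "\\" in host_and_instance


for target in ("MyServer\\SqlInstance", "MyServer,2508", "MyServer"):
    print(target, uses_sql_browser(target))
```

Only the first target in the loop would be exposed to the firewall timeout discussed here.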

This issue, along with possible workarounds, is documented in Books Online under the “Unusual Errors” section:

Troubleshooting: Timeout Expired
http://msdn.microsoft.com/en-us/library/ms190181(SQL.100).aspx

Also note that this could affect a SQL 2005 Named Instance as well if we are running the SQL 2008 Browser Service on the box.  After you install a SQL 2008 Instance, both SQL 2005 and SQL 2008 will share the same SQL Browser Service.

We wanted to point this out as it is not a typical timeout error and may be hard to pinpoint.  So, if you are running on Vista or Windows 2008, when trying to connect to a SQL 2008 Named Instance, and experience a Timeout, take a look at the firewall on the client side as this may be your issue.

Adam W. Saxton

Support Escalation Engineer

How It Works: Creating An EndPoint Adds An Entry To SysLogins


My SQL Server does not have individual Windows users established as separate logins.   Instead it has the Domain\SQLUsers group established as a WINDOWS GROUP login.  You can review your mappings using the following catalog and compatibility views.


select * from syslogins
select * from sys.server_principals
select * from sys.server_permissions

When I used the following CREATE ENDPOINT statement the Domain\UserName appeared in syslogins and server_principals.

CREATE ENDPOINT endpoint_test_tsql
    AS TCP ( LISTENER_PORT = 5033 )
    FOR TSQL()

Principals are entities that can request SQL Server resources. Like other components of the SQL Server authorization model, principals can be arranged in a hierarchy. The scope of influence of a principal depends on the scope of the definition of the principal: Windows, server, database; and whether the principal is indivisible or a collection. A Windows Login is an example of an indivisible principal, and a Windows Group is an example of a principal that is a collection. Every principal has a security identifier (SID).

During CREATE ENDPOINT the AUTHORIZATION is used to establish the ownership of the ENDPOINT object at the server level.

 

A valid SQL Server or Windows login that is assigned ownership of the newly created endpoint object. If AUTHORIZATION is not specified, by default, the caller becomes owner of the newly created object.

This results in the creation of the principal and a syslogins entry with HasAccess = 0.    The Windows user Domain\UserName is not given direct login permissions; login permissions are still handled by the encompassing Domain\SQLUsers group that Domain\UserName belongs to.  However, this Windows user is the owner of the endpoint and is allowed to control the permissions for the endpoint.


Think of this like the dbo in a database. 

 

                Machine\administrators          -               Login permissions and mapped to SQL Administrator

                Machine\rdorr                      -               DOES NOT EXIST in syslogins or sys.server_principals

 

create database dbTest

 

Show ownership of database

sp_helpdb dbTest

dbTest            2.73 MB     Machine\rdorr      7     Aug 29 2008   ...

Still no entry in syslogins, but select * from sys.database_principals maps dbo to the Machine\rdorr SID.   The database requires an owner principal just like the endpoint requires a server-level principal.

Bob Dorr
SQL Server Principal Escalation Engineer

TCP Chimney Offload – Possible Performance and Concurrency Impacts to SQL Server Workloads


TCP Chimney is enabled by default if you apply Windows Server 2003 Sp2.  This is an operating system feature that provides capability to offload TCP/IP packet processing from the processor to the network adapters and some other balancing options.  (For a full description of this feature see http://support.microsoft.com/kb/912222.)

TCP Chimney has been known to cause issues on SQL Server systems such as general network errors and working set trimming.  The following articles document these known issues:

http://support.microsoft.com/kb/942861

http://support.microsoft.com/kb/918483

We’ve also identified situations where TCP Chimney has impacted transaction throughput and caused delays between when a statement has been completed by the SQL engine and the time to receive the begin event of the next statement.  This impact can be significant especially in application workloads that have throughput requirements to execute a series of statements within a certain time boundary. 

For example, your application has a key transaction that consists of multiple statements.  Each individual statement on the engine side is optimized and has very short duration.  The overall duration of the transaction is short because each statement has low duration and the time in between the batches is short as well.  A profiler trace of this transaction typically shows a pattern like the following.  Note that there is very short time in between the complete of one batch and the start of the next batch:

[Image: Profiler trace of the transaction, with very short gaps between batches]

However with TCP Chimney enabled, you notice there is a marked delay between a batch completed and the start of the next batch for the exact same series of statements and work.  In this example, note how there is approximately a 500 ms. delay in between the complete and start of the next batch:

[Image: Profiler trace with TCP Chimney enabled, with approximately 500 ms between batches]

In this scenario, with the 500 ms delay between statements, you would see the SPID spend most of its time awaiting command in sys.sysprocesses with a waittype of 0x0000.

This type of delay can affect application throughput as well as concurrency.  For example, if the above statements are all part of an implicit transaction, the added delay significantly increases the overall duration of the transaction; locks are then held longer than normal and you may see unexpected blocking.  If you run the same implicit transaction on two systems, one with TCP Chimney enabled and the other with it disabled, and compare the sum of the durations of the individual statements with the total duration of the entire transaction, you may see that the overall transaction duration is significantly higher when TCP Chimney is enabled.  With TCP Chimney enabled, the delta between the overall transaction duration and the sum of the statement durations shows that the majority of the time is spent awaiting the next batch/command.

Here is an example comparison of the same workload with TCP Chimney enabled and disabled.  Note the significant increase in transaction duration and the large delta (difference between transaction duration vs. the sum duration of all statements within transaction) when TCP Chimney is enabled:

Implicit Transaction Summary TCP Chimney Enabled

spid    TransactionID  TranStart     TranEnd       TranDuration  sum_batch_duration   batch_count    delta
------- -------------- ------------- ------------  ------------- -------------------- -------------- --------
57      916972         09:40:24.450  09:41:17.623  53173         601                  516            52572
57      896243         09:39:31.620  09:40:01.840  30220         322                  301            29898
57      877227         09:39:12.120  09:39:15.293  3173          306                  161            2867
57      876313         09:38:58.590  09:38:58.603  13            0                    1              13
57      895388         09:39:18.510  09:39:18.527  16            16                   4              0
57      915675         09:40:02.653  09:40:02.670  16            16                   4              0

Implicit Transaction Summary TCP Chimney Disabled

spid    TransactionID  TranStart     TranEnd       TranDuration  sum_batch_duration   batch_count    delta
------- -------------- ------------  ------------  ------------- -------------------- -------------- --------
54      127910         11:13:47.287  11:13:52.490  5203          4060                 516            1143
54      107344         11:13:23.380  11:13:24.427  1046          382                  301            664
51      87187          11:12:50.067  11:12:50.550  483           0                    1              483
54      88182          11:13:03.987  11:13:07.237  3250          2878                 161            372
51      106432         11:13:10.487  11:13:10.487  0             0                    1              0
54      126550         11:13:25.490  11:13:26.007  516           516                  4              0
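The delta column in the tables above is simply TranDuration minus sum_batch_duration. A minimal sketch of that arithmetic, using three rows copied from the TCP Chimney enabled table (the function name is made up for illustration):

```python
# (TransactionID, TranDuration, sum_batch_duration) from the "enabled" table
rows = [
    (916972, 53173, 601),
    (896243, 30220, 322),
    (877227, 3173, 306),
]


def idle_delta(tran_duration, sum_batch_duration):
    """Time inside the transaction that was not spent executing batches."""
    return tran_duration - sum_batch_duration


for tran_id, total, batches in rows:
    delta = idle_delta(total, batches)
    share = delta / total
    print(f"{tran_id}: delta={delta} ({share:.0%} of the transaction)")
```

A delta that makes up most of the transaction duration is the pattern to look for when you suspect the time is going to the gaps between batches rather than to the statements themselves.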

 

If you observe a similar pattern and suspect TCP Chimney, you may want to disable TCP Chimney to provide immediate relief.  Another option is to follow up with your network adapter vendor to see if they have an updated driver that will address the problem and allow for use of TCP Chimney.  For additional information see  http://support.microsoft.com/default.aspx?scid=kb;EN-US;948496

 

TCP Chimney is off by default in Windows Server 2008 - see http://support.microsoft.com/kb/951037.

Sarah Henwood | Microsoft SQL Server Escalation Services

RSWindowsNegotiate and 401.1 Error when using RS 2008


While I was setting up one of my demos for SQL PASS, I started hitting 401.1 errors.  I was setting up a SharePoint Integrated setup with Reporting Services.

I knew I had a distributed environment, so I accounted for my Kerberos configuration.  I lined up my SPNs and made sure my accounts were trusted for delegation.  So, I was a little surprised when I was hitting a 401.1 error when trying to run a report or create a new Datasource through the SharePoint RS Library.

I was using a Domain user account for my RS Service.  The key was that I configured the Service account to use the Domain user before it had started up at all.  Out of the gate, I was using the Domain user account and never touched the Network Service account.  This was done by way of specifying the Domain user account within the SQL 2008 Setup wizard.

An interesting side effect of doing this is that we don't add RSWindowsNegotiate to the rsreportserver.config file.  All that was listed was RSWindowsNTLM.  Well, that explained the 401.1 error.  After manually adding in RSWindowsNegotiate, everything worked like a champ.

I found that we will add RSWindowsNegotiate when we use the Network Service account.  Because I hadn't used that account, the setting was never populated to the config file.
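If you want to check a config file for this quickly, a sketch along the following lines can list the configured authentication types and add the missing entry. The sample XML is a trimmed-down assumption of what the Authentication section of rsreportserver.config looks like, not a complete file, and the function names are hypothetical. Back up the real file before changing it.

```python
import xml.etree.ElementTree as ET

# Trimmed, assumed shape of the Authentication section.
SAMPLE = """
<Configuration>
  <Authentication>
    <AuthenticationTypes>
      <RSWindowsNTLM/>
    </AuthenticationTypes>
  </Authentication>
</Configuration>
"""


def authentication_types(config_xml):
    """Return the element names listed under AuthenticationTypes."""
    root = ET.fromstring(config_xml)
    node = root.find("./Authentication/AuthenticationTypes")
    return [] if node is None else [child.tag for child in node]


def add_negotiate(config_xml):
    """Return the XML with RSWindowsNegotiate added if it is missing."""
    root = ET.fromstring(config_xml)
    node = root.find("./Authentication/AuthenticationTypes")
    if node is not None and node.find("RSWindowsNegotiate") is None:
        node.insert(0, ET.Element("RSWindowsNegotiate"))
    return ET.tostring(root, encoding="unicode")


print(authentication_types(SAMPLE))
print(authentication_types(add_negotiate(SAMPLE)))
```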

 

RSReportServer Configuration File

http://msdn.microsoft.com/en-us/library/ms157273.aspx

 

RSWindowsNegotiate

The report server accepts either Kerberos or NTLM security tokens. This is the default setting when the report server is running in native mode and the service account is Network Service. This setting is omitted when the report server is running in native mode and the service account is configured as a domain user account.

If a domain account is configured for the Report Server Service account and a Service Principal Name (SPN) is not configured for the report server, this setting might prevent users from logging on to the server.

rsconfig

 

Of note, once the setting is there, we will not remove it if you change from the Network Service account to a Domain User account.

 

Adam W. Saxton | Microsoft SQL Server Escalation Services

SQL 2005 JDBC Driver and Database Mirroring


We ran into some interesting situations with the SQL 2005 JDBC Driver (v1.2) and its use with failover partners. Take the following connection string:

jdbc:sqlserver://myserver1;databaseName=AdventureWorks;failoverPartner=myserver2;

In this connection string, our Primary server will be myserver1 with our failover server being myserver2.  If the primary server becomes unresponsive, we will fail over to the myserver2.  This connection string should work perfectly fine.

Let's look at another situation:

jdbc:sqlserver://myserver1;databaseName=AdventureWorks;failoverPartner=myserver2\instance;

In this situation, we are connecting to a named instance for the failoverPartner.  Again this should work perfectly fine from a usage standpoint of failoverPartner.

Port Number used with failoverPartner

jdbc:sqlserver://myserver1;databaseName=AdventureWorks;failoverPartner=myserver2:1699;

In this case, we are connecting either to a named instance (by way of the port) or to a default instance on a non-standard port (the standard port being 1433).  This may or may not work as expected.  If we were able to connect to the Primary Server successfully at least once, we will cache the failover partner information supplied over that connection.  The primary server in that case actually supplies the failover partner, and we ignore what you put in the application connection string.  If that happened, we will probably connect to the failoverPartner successfully.

However, if we were not able to connect to the Primary Server at all (i.e. Primary Server is physically down or unreachable), then we will rely on the application connection string to connect to the failoverPartner.  Our JDBC Driver doesn't parse the port number for the failoverPartner property.  We will treat it as the actual server name.  This is what we would see in the JDBC Log output:

Oct 28, 2008 10:21:22 AM com.microsoft.sqlserver.jdbc.SQLServerConnection loginWithFailover
FINE:  ConnectionID:1 TransactionID:0x0000000000000000 This attempt No: 1
Oct 28, 2008 10:21:22 AM com.microsoft.sqlserver.jdbc.SQLServerConnection connectHelper
FINE:  ConnectionID:1 TransactionID:0x0000000000000000 Connecting with server: myserver2:1699 port: 1433 Timeout slice: 400 Timeout Full: 5
Oct 28, 2008 10:21:22 AM com.microsoft.sqlserver.jdbc.TDSChannel open
FINE: TDSChannel ( ConnectionID:1 TransactionID:0x0000000000000000): Opening TCP socket...
Oct 28, 2008 10:21:23 AM com.microsoft.sqlserver.jdbc.SQLServerException logException
FINE: *** SQLException:com.microsoft.sqlserver.jdbc.SQLServerConnection@471e30 com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host  has failed.

Notice that server equals myserver2:1699 with a port of 1433.  This is because the port number that was specified in the connection string was not parsed out.
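Since the driver will not tell you about this up front, one option is to check connection URLs yourself before they are deployed. The sketch below is a plain string check, not anything the JDBC driver provides; the function names are hypothetical, and property names are compared case-insensitively as a convenience of this sketch.

```python
def parse_jdbc_url(url):
    """Split a SQL Server JDBC URL into (server part, properties dict)."""
    prefix = "jdbc:sqlserver://"
    if not url.startswith(prefix):
        raise ValueError("not a SQL Server JDBC URL")
    server, _, rest = url[len(prefix):].partition(";")
    props = {}
    for part in rest.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            props[key.strip().lower()] = value.strip()
    return server, props


def failover_partner_has_port(url):
    """True when failoverPartner carries a ':port' suffix."""
    _, props = parse_jdbc_url(url)
    return ":" in props.get("failoverpartner", "")


url = ("jdbc:sqlserver://myserver1;databaseName=AdventureWorks;"
       "failoverPartner=myserver2:1699;")
print(failover_partner_has_port(url))  # True
```

A URL flagged this way only works when the partner information was cached from an earlier successful connection to the primary, as described above.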

This is the exception you will receive on the application side:

com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host  has failed. java.net.UnknownHostException
        at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(Unknown Source)
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(Unknown Source)
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.loginWithFailover(Unknown Source)
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(Unknown Source)
        at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(Unknown Source)
        at java.sql.DriverManager.getConnection(Unknown Source)
        at java.sql.DriverManager.getConnection(Unknown Source)

There are currently no plans to change this behavior, and it will still be present in the 2.0 release of the driver.

Named Instance on Primary Server when failoverPartner is specified

jdbc:sqlserver://myserver1\instance;databaseName=AdventureWorks;failoverPartner=myserver2;

Here we have a named instance for the Primary server and just a default instance for the failoverPartner.  Let's assume that either the Primary Server is physically down or the SQL Browser service on that server is not running.  This will result in the following exception:

com.microsoft.sqlserver.jdbc.SQLServerException: The connection to the named instance  has failed. Error: java.net.SocketTimeoutException: Receive timed out.
        at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(Unknown Source)
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.getInstancePort(Unknown Source)
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(Unknown Source)
        at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(Unknown Source)
        at java.sql.DriverManager.getConnection(Unknown Source)
        at java.sql.DriverManager.getConnection(Unknown Source)

In most situations, because we received an exception, the application will try to connect again, expecting the driver to use the failoverPartner at this point.  This is what we will see:

com.microsoft.sqlserver.jdbc.SQLServerException: The connection to the named instance  has failed. Error: java.net.SocketTimeoutException: Receive timed out.
        at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(Unknown Source)
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.getInstancePort(Unknown Source)
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(Unknown Source)
        at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(Unknown Source)
        at java.sql.DriverManager.getConnection(Unknown Source)
        at java.sql.DriverManager.getConnection(Unknown Source)

Notice that the loginWithFailover method is missing from the call stack.  We didn't make it far enough to even attempt the connection to the failoverPartner.  In this situation, we are trying to resolve the instance name to a port number.  Because we are unable to communicate with the SQL Browser service (UDP 1434), we cannot perform the lookup and we error out at that point.  To work around this issue, you could specify the port number instead of the instance name itself.

This is what we would see in the JDBC Log output:

Oct 28, 2008 10:39:19 AM com.microsoft.sqlserver.jdbc.SQLServerConnection getInstancePort
FINE:  ConnectionID:1 TransactionID:0x0000000000000000 Unexpected UDP timeout at 1 seconds resolving instance port.  Target -> udp:myserver2/10.0.0.2:1434.
Oct 28, 2008 10:39:28 AM com.microsoft.sqlserver.jdbc.SQLServerException logException
FINE: *** SQLException:com.microsoft.sqlserver.jdbc.SQLServerConnection@b09e89 com.microsoft.sqlserver.jdbc.SQLServerException: The connection to the named instance  has failed. Error: java.net.SocketTimeoutException: Receive timed out. The connection to the named instance  has failed. Error: java.net.SocketTimeoutException: Receive timed out.

This issue is actually going to be addressed in the 2.0 release of the JDBC Driver and should not be a problem.

Adam W. Saxton | Microsoft SQL Server Escalation Services

How to troubleshoot leaked SqlConnection objects (.NET 2.0) - Part 1


One of my colleagues, Kamil Sykora, compiled a document that goes through how to troubleshoot leaked SqlConnection objects (from a .NET 2.0 perspective).  It was a fairly large document, so I’m not going to post the whole thing.  I’m going to split it out over several posts and base the examples off of a custom demo that I have created. 

A common issue that we often observe is "leaking" connections in a .NET application. While leaking objects is technically not possible in a .NET application, the issue that we often observe is that customers are not closing SqlConnection objects before they go out of scope. This results in unused SqlConnection objects holding on to internal references and native objects until these SqlConnection objects get collected by the Garbage Collector.

The most common symptom of this is this error message:


Exception type: System.InvalidOperationException
Message: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.
InnerException: <none>
StackTrace (generated):
    SP       IP       Function
    0636F4B8 653CF486 System_Data_ni!System.Data.ProviderBase.DbConnectionFactory.GetConnection(System.Data.Common.DbConnection)+0x133f46
    0636F4C4 652D69BA System_Data_ni!System.Data.ProviderBase.DbConnectionClosed.OpenConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionFactory)+0x6a
    0636F4F8 652F5440 System_Data_ni!System.Data.SqlClient.SqlConnection.Open()+0x70

The steps to take when we see this exception are:

  • Find out how the customer is opening and closing connections and ensure that they are explicitly closing them in all cases. If doing this is not sufficient, or it’s not 100% clear that all connections are getting closed, continue with the next steps.
  • Obtain a user dump of the process once the issue occurs. We can obtain a hang dump as soon as the exception occurs (good) or a crash dump on the exception (better).
  • Follow the debugging steps in this series to confirm if there are any unreferenced connections that are still holding on to internal references.

The following debugging instructions are based on an x86 user dump. Similar steps can be taken for a 64-bit dump as noted below.

For the dumps, we used the SOS debugging extension which ships with the .NET Framework.  You can load the extension in the debugger by using the following command:

0:000> .loadby sos mscorwks

Locating the pool(s)

First we find all the pool object method tables in the process.

0:000> !dumpheap -stat -type DbConnectionPool
total 26 objects
Statistics:
      MT    Count    TotalSize Class Name
65404260        1           16 System.Data.ProviderBase.DbConnectionPoolIdentity
65436c90        1           24 System.Collections.Generic.List`1[[System.Data.ProviderBase.DbConnectionPool, System.Data]]
65436598        1           24 System.Collections.Generic.List`1[[System.Data.ProviderBase.DbConnectionPoolGroup, System.Data]]
6540444c        2           24 System.Data.ProviderBase.DbConnectionPool+DbConnectionInternalListStack
65400c70        1           32 System.Data.ProviderBase.DbConnectionPoolGroupOptions
654000a4        1           40 System.Data.ProviderBase.DbConnectionPoolGroup
6543397c        1           52 System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[System.Data.ProviderBase.DbConnectionPoolGroup, System.Data]]
654044a8        1           52 System.Data.ProviderBase.DbConnectionPool+PoolWaitHandles
6543085c        1           60 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[System.Data.ProviderBase.DbConnectionPoolGroup, System.Data]][]
65404638        1           64 System.Data.ProviderBase.DbConnectionPool+TransactedConnectionPool
653fff4c        1          100 System.Data.ProviderBase.DbConnectionPool
653ffde4       14          168 System.Data.ProviderBase.DbConnectionPoolCounters+Counter
Total 26 objects

Then we dump out the individual pool objects. In this case there is only one pool. We dump out the pool and look at the _totalObjects member variable to see how many objects we have in that pool. Note that in the case below the pool has 100 connections, which is the default maximum number of connections in a pool. We also look at the _connectionPoolGroupOptions variable and dump it out to double-check that _maxPoolSize has been reached.

0:000> !dumpheap -mt 653fff4c
Address       MT     Size
012bbe80 653fff4c      100    
total 1 objects
Statistics:
      MT    Count    TotalSize Class Name
653fff4c        1          100 System.Data.ProviderBase.DbConnectionPool
Total 1 objects

0:000> !do 012bbe80
Name: System.Data.ProviderBase.DbConnectionPool
MethodTable: 653fff4c
EEClass: 653ffedc
Size: 100(0x64) bytes
(C:\WINNT\assembly\GAC_32\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
79102290  4001517       44         System.Int32  1 instance   220000 _cleanupWait
65404260  4001518        4 ...ctionPoolIdentity  0 instance 012bd960 _identity
6540012c  4001519        8 ...ConnectionFactory  0 instance 01275c34 _connectionFactory
654000a4  400151a        c ...nnectionPoolGroup  0 instance 01279e7c _connectionPoolGroup
65400c70  400151b       10 ...nPoolGroupOptions  0 instance 01279e5c _connectionPoolGroupOptions
65426f4c  400151c       14 ...nPoolProviderInfo  0 instance 00000000 _connectionPoolProviderInfo
65426eac  400151d       48         System.Int32  1 instance        1 _state
6540444c  400151e       18 ...InternalListStack  0 instance 012bbee4 _stackOld
6540444c  400151f       1c ...InternalListStack  0 instance 012bbef0 _stackNew
791186fc  4001520       20 ...ding.WaitCallback  0 instance 012bc348 _poolCreateRequest
791087cc  4001521       24 ...Collections.Queue  0 instance 00000000 _deactivateQueue
791186fc  4001522       28 ...ding.WaitCallback  0 instance 00000000 _deactivateCallback
79102290  4001523       4c         System.Int32  1 instance       32 _waitCount
654044a8  4001524       2c ...l+PoolWaitHandles  0 instance 012bbf80 _waitHandles
790fdf04  4001525       30     System.Exception  0 instance 00000000 _resError
7910be50  4001526       5c       System.Boolean  1 instance        0 _errorOccurred
79102290  4001527       50         System.Int32  1 instance     5000 _errorWait
791127fc  4001528       34 ...m.Threading.Timer  0 instance 00000000 _errorTimer
791127fc  4001529       38 ...m.Threading.Timer  0 instance 012bc4c0 _cleanupTimer
65404638  400152a       3c ...tedConnectionPool  0 instance 012bc16c _transactedConnectionPool
00000000  400152b       40                       0 instance 012bbfb4 _objectList
79102290  400152c       54         System.Int32  1 instance      100 _totalObjects
79102290  400152e       58         System.Int32  1 instance        2 _objectID
791080f0  4001516      5fc        System.Random  0   static 012bd9c0 _random
79102290  400152d      828         System.Int32  1   static        2 _objectTypeCount

Here is the DbConnectionPoolGroupOptions object that we can get _maxPoolSize from:

0:000> !do 01279e5c
Name: System.Data.ProviderBase.DbConnectionPoolGroupOptions
MethodTable: 65400c70
EEClass: 6544cb58
Size: 32(0x20) bytes
(C:\WINNT\assembly\GAC_32\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
7910be50  4001573       10       System.Boolean  1 instance        0 _poolByIdentity
79102290  4001574        4         System.Int32  1 instance        0 _minPoolSize
79102290  4001575        8         System.Int32  1 instance      100 _maxPoolSize
79102290  4001576        c         System.Int32  1 instance    15000 _creationTimeout
7911228c  4001577       14      System.TimeSpan  1 instance 01279e70 _loadBalanceTimeout
7910be50  4001578       11       System.Boolean  1 instance        1 _hasTransactionAffinity
7910be50  4001579       12       System.Boolean  1 instance        0 _useDeactivateQueue
7910be50  400157a       13       System.Boolean  1 instance        0 _useLoadBalancing

At this point we have found that our pool holds 100 connections and that its max pool size is also 100. This means that any further connection request to this pool will return the error message mentioned above. This is the immediate cause of the error, so we do not have to spend time looking for other potential causes, such as physical connectivity problems.
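When there are several pools to check, the comparison of _totalObjects against _maxPoolSize can be scripted. The following is only a minimal Python sketch, assuming the two !do outputs have been saved as text; the helper names read_int_field and pool_is_exhausted are hypothetical and are not part of SOS or WinDbg.

```python
def read_int_field(do_output: str, field_name: str) -> int:
    """Return the Value column of a System.Int32 field from saved !do output."""
    for line in do_output.splitlines():
        parts = line.split()
        # A field row ends with: ... <Attr> <Value> <Name>
        if len(parts) >= 3 and parts[-1] == field_name:
            return int(parts[-2])
    raise KeyError(field_name)


def pool_is_exhausted(pool_dump: str, options_dump: str) -> bool:
    """True when the pool already holds as many objects as _maxPoolSize allows."""
    total = read_int_field(pool_dump, "_totalObjects")
    max_size = read_int_field(options_dump, "_maxPoolSize")
    return total >= max_size
```

Here pool_dump would be the text of the DbConnectionPool dump and options_dump the text of the DbConnectionPoolGroupOptions dump.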

Next time, we will go into the internal connection object.

Adam W. Saxton | Microsoft SQL Server Escalation Services

How to troubleshoot leaked SqlConnection Objects (.NET 2.0) - Part 2


In the last post in this series, we looked at how we can determine that our Connection pool was exhausted.  In this post I'm going to go a little deeper into the Internal connection itself and how we can verify if this is a closed or active connection.

Dumping out the internal connection objects

A connection object in the System.Data.SqlClient namespace consists of two parts:

  • The SqlConnection class that is used by customers’ code
  • The SqlInternalConnectionTds internal class that is used by the pooling code. This class is not directly accessible to the user.

The SqlConnection class has a pointer to a SqlInternalConnectionTds object if it’s open (_innerConnection member variable). The _innerConnection member variable is null if the connection is closed. Whenever a connection is closed by the code, the internal object gets disassociated from the external object and the ownership of the internal object transfers to the pool object. This relationship allows us to identify SqlConnection objects that have not been closed.

The SqlInternalConnectionTds object has a weak reference back to the owning SqlConnection object.

Since there are typically multiple pools and not all of them are full, we want to start with the internal objects that we know belong to a full pool.

Going back to the pool in question, let's dump out the items within this pool.

0:000> !do 012bbe80
Name: System.Data.ProviderBase.DbConnectionPool
...
00000000  400152b       40                       0 instance 012bbfb4 _objectList
79102290  400152c       54         System.Int32  1 instance      100 _totalObjects
...

0:000> !do 012bbfb4
Name: System.Collections.Generic.List`1[[System.Data.ProviderBase.DbConnectionInternal, System.Data]]
MethodTable: 654413c4
EEClass: 7912f680
Size: 24(0x18) bytes
(C:\WINNT\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
7912d8f8  40009c7        4      System.Object[]  0 instance 012bbfcc _items
79102290  40009c8        c         System.Int32  1 instance      100 _size
79102290  40009c9       10         System.Int32  1 instance      100 _version
790fd0f0  40009ca        8        System.Object  0 instance 00000000 _syncRoot
7912d8f8  40009cb        0      System.Object[]  0   shared   static _emptyArray
    >> Domain:Value dynamic statics NYI
00155858:NotInit  <<

0:000> !da 012bbfcc
Name: System.Data.ProviderBase.DbConnectionInternal[]
MethodTable: 7912d8f8
EEClass: 7912de6c
Size: 416(0x1a0) bytes
Array: Rank 1, Number of elements 100, Type CLASS
Element Methodtable: 654009f0
[0] 012be414
[1] 012bf3e4
[2] 012bf008
...
[98] 0148114c
[99] 01485fcc

At this point we want to save all these 100 internal connection addresses into a file and remove all the array indexes so that the file only contains:

012be414
012bf3e4
012bf008
...
0148114c
01485fcc

Visual Studio is handy for this, since we can hold Alt and drag with the mouse to make a column selection over the first 3-4 columns, delete them all, and then save the file.
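If an editor with column selection is not at hand, the same cleanup can be scripted. This is a small Python sketch, assuming the !da output has been saved as text; the function names are hypothetical, and the output path is whatever you later pass to .foreach /f.

```python
import re

# Matches element rows such as "[0] 012be414" from saved !da output.
ELEMENT_ROW = re.compile(r"^\[(\d+)\]\s+([0-9a-fA-F]+)\s*$")


def extract_addresses(da_output):
    """Return the element addresses, in order, without the array indexes."""
    addresses = []
    for line in da_output.splitlines():
        match = ELEMENT_ROW.match(line.strip())
        if match:
            addresses.append(match.group(2))
    return addresses


def save_addresses(da_output, path):
    """Write one address per line, the layout that .foreach /f reads."""
    with open(path, "w") as handle:
        handle.write("\n".join(extract_addresses(da_output)) + "\n")
```

Header rows and elided lines are skipped because they do not match the element pattern.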

Processing the internal connections

The goal at this point is to find any SqlConnection objects, reached from these SqlInternalConnectionTds objects, that are no longer referenced by the code. If a SqlConnection still references its SqlInternalConnectionTds but cannot be reached through !gcroot, it has been abandoned by the code without being closed.

Using .foreach to dump out the connections is easiest, since it avoids the manual work of processing each of the 100 connections in question:

.foreach /f ( place "c:\temp\InternalConnections.txt") {  dd poi(poi( place +4)+4) l1}
(32 bit)

or

.foreach /f ( place "c:\temp\InternalConnections.txt") {  dq poi(poi( place +8)+8) l1}
(64 bit)

Explanation of the .foreach command:

place – this is our placeholder, or variable name, that represents each of the addresses in the file
dd – dumps out a double word, which here is an address; this would be dq in a 64-bit dump
place + 8 – the weak reference is at offset 8 from the SqlInternalConnectionTds (64 bit) or at offset 4 (32 bit):

0:000> !do 012be414
Name: System.Data.SqlClient.SqlInternalConnectionTds
MethodTable: 65404744
EEClass: 6544d9e0
Size: 140(0x8c) bytes
(C:\WINNT\assembly\GAC_32\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
79102290  4000f67       1c         System.Int32  1 instance        4 _objectID
...
79104c38  4000f6d        4 System.WeakReference  0 instance 012be55c _owningObject
...

The WeakReference object has a handle at offset 8 (64 bit) or at offset 4 (32 bit); this is the second +8 (or +4) in the command:

0:000> !do 012be55c
Name: System.WeakReference
MethodTable: 79104c38
EEClass: 79104bd4
Size: 16(0x10) bytes
(C:\WINNT\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
791016bc  40005a9        4        System.IntPtr  1 instance   3f1268 m_handle
7910be50  40005aa        8       System.Boolean  1 instance        0 m_IsLongReference

The value at that location is the owning object if it exists.

Non-Null and has an owning object:

0:000> dd 3f1268 l1
003f1268  01575138

Null and no owning object:

0:000> dd 3f1268 l1
003f1268  00000000
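The double dereference in the .foreach command can be traced by hand. The following Python sketch only simulates it: MEMORY is a hypothetical stand-in for the 32-bit dump, filled with the values shown above, and poi here is a plain function that mimics the debugger operator of the same name.

```python
# Simulated 32-bit memory: address -> pointer-sized value stored there.
# The values are the ones shown in the dump above.
MEMORY = {
    0x012BE414 + 4: 0x012BE55C,  # SqlInternalConnectionTds._owningObject
    0x012BE55C + 4: 0x003F1268,  # WeakReference.m_handle
    0x003F1268: 0x01575138,      # handle slot -> owning SqlConnection
}


def poi(address):
    """Mimic the debugger's poi(): read the pointer-sized value at address."""
    return MEMORY[address]


def owning_connection(internal_connection, offset=4):
    """Follow internal connection -> WeakReference -> handle -> owner."""
    handle = poi(poi(internal_connection + offset) + offset)
    return poi(handle)  # the value that "dd <handle> l1" displays


print(hex(owning_connection(0x012BE414)))
```

For a 64-bit dump the same walk would use an offset of 8, matching the dq variant of the command.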

Output of the foreach command:

0:000> .foreach /f ( place "c:\temp\InternalConnections.txt") {  dd poi(poi( place +4)+4) l1}
003f1268  01575138
003f127c  0157336c
003f1290  0157136c
003f1298  0156f138
003f1244  015809fc
...
003f2d34  014ac514
003f2d2c  014acbf4
003f2d1c  014ac7d4
003f2d14  015817cc

As we can see, the internal connections have an owning SqlConnection object. This either means that they are actively being used by the code (not likely) or they have been abandoned (more likely).
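With 100 rows it is easy to miss a null entry, so the check can be automated. This is a minimal Python sketch, assuming the .foreach output has been saved as text; split_by_owner is a hypothetical helper name.

```python
def split_by_owner(foreach_output):
    """Split "handle  value" rows into those with and without an owning object."""
    owned, orphaned = [], []
    for line in foreach_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skips the command echo, "..." and blank lines
        handle, owner = parts
        try:
            owner_value = int(owner, 16)
            int(handle, 16)
        except ValueError:
            continue  # not a pair of hex values
        if owner_value:
            owned.append((handle, owner))
        else:
            orphaned.append((handle, owner))
    return owned, orphaned
```

Every address in the owned list is a candidate for the !gcroot check.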

Finding out if a connection is actively used

To find out if a SqlConnection is still being used by the code, we can run the !gcroot command. This command tells us whether the object is still reachable from a root known to the .NET runtime; if it is not, it is eligible for garbage collection.

0:000> !gcroot 0157336c
Note: Roots found on stacks may be false positives. Run "!help gcroot" for
more info.
Scan Thread 0 OSTHread 590
DOMAIN(00155858):HANDLE(WeakSh):3f127c:Root:0157336c(System.Data.SqlClient.SqlConnection)

At this point in the application, we only have one thread running which is thread ID 0. 

Here the output indicates that the object is reachable from thread 0. However, this can be a false positive because thread references can be old. We still have to verify that the object actually exists on that thread:

0:000> kL
ChildEBP RetAddr 
0012f31c 7739bf53 ntdll!KiFastSystemCallRet
0012f3b8 7b0831a5 user32!NtUserWaitMessage+0xc
0012f434 7b082fe3 System_Windows_Forms_ni+0xb31a5
0012f464 7b0692c2 System_Windows_Forms_ni+0xb2fe3
0012f490 79e7c6cc System_Windows_Forms_ni+0x992c2
0012f510 79e7c8e1 mscorwks!CallDescrWorkerWithHandler+0xa3
0012f64c 79e7c783 mscorwks!MethodDesc::CallDescr+0x19c
0012f668 79e7c90d mscorwks!MethodDesc::CallTargetWorker+0x1f
0012f67c 79eefb9e mscorwks!MethodDescCallSite::Call_RetArgSlot+0x18
0012f7e0 79eef830 mscorwks!ClassLoader::RunMain+0x263
0012fa48 79ef01da mscorwks!Assembly::ExecuteMainMethod+0xa6
0012ff18 79fb9793 mscorwks!SystemDomain::ExecuteMainMethod+0x43f
0012ff68 79fb96df mscorwks!ExecuteEXE+0x59
0012ffb0 7900b1b3 mscorwks!_CorExeMain+0x15c
0012ffc0 77e6f23b mscoree!_CorExeMain+0x2c
0012fff0 00000000 kernel32!BaseProcessStart+0x23

We can see that we have managed code on this thread.  Let's look at what the managed stack looks like:

0:000> !clrstack
OS Thread Id: 0x590 (0)
ESP       EIP    
0012f32c 7c8285ec [InlinedCallFrame: 0012f32c] System.Windows.Forms.UnsafeNativeMethods.WaitMessage()
0012f328 7b08374f System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(Int32, Int32, Int32)
0012f3c8 7b0831a5 System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
0012f440 7b082fe3 System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
0012f470 7b0692c2 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
0012f480 00e70097 SqlConnectionLeakWin.Program.Main()
0012f69c 79e7c74b [GCFrame: 0012f69c]

Doesn't appear to be doing anything with SQL here.  Let's look at the objects on the stack:

0:000> !dso
OS Thread Id: 0x590 (0)
ESP/REG  Object   Name
ebx      01253384 System.Windows.Forms.Application+ThreadContext
esi      015cc2e8 System.Collections.Hashtable+HashtableEnumerator
0012f354 01299fc4 System.Windows.Forms.NativeMethods+MSG[]
0012f358 01253384 System.Windows.Forms.Application+ThreadContext
0012f360 01299ad8 System.Windows.Forms.Application+ComponentManager
0012f3d8 01253384 System.Windows.Forms.Application+ThreadContext
0012f42c 01253384 System.Windows.Forms.Application+ThreadContext
0012f43c 01296b84 System.Windows.Forms.ApplicationContext
0012f444 0127fe4c System.ComponentModel.EventHandlerList
0012f458 01252a8c SqlConnectionLeakWin.Form1
0012f460 01253384 System.Windows.Forms.Application+ThreadContext
0012f474 01252a8c SqlConnectionLeakWin.Form1

We can conclude that this SqlConnection object is no longer being used and that it has not been closed.  This proves that the application's code did not close all of its connections, and further code investigation is needed to make sure every connection gets closed.
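The last step, confirming that the SqlConnection address does not show up among the stack objects, can also be checked mechanically. This Python sketch assumes the !dso output has been saved as text; both function names are hypothetical.

```python
def stack_object_addresses(dso_output):
    """Collect the Object column from saved !dso output as integers."""
    addresses = set()
    for line in dso_output.splitlines():
        parts = line.split()
        # Data rows look like: <ESP or register> <object address> <type name>
        if len(parts) >= 3 and len(parts[1]) == 8:
            try:
                addresses.add(int(parts[1], 16))
            except ValueError:
                continue  # header rows are not hex addresses
    return addresses


def is_on_stack(dso_output, object_address):
    """True when the object address is listed among the stack objects."""
    return object_address in stack_object_addresses(dso_output)
```

For the dump above, the SqlConnection at 0157336c is not in the list, which agrees with the conclusion that the code no longer uses it.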

Reference:

Part 1

Adam W. Saxton | Microsoft SQL Server Escalation Services


Searching for Duplicate SPN's got a little easier


We get a lot of calls related to Kerberos configuration, and I'm planning to write more about our experiences and troubleshooting techniques for these types of issues across the box (Engine, AS and RS). 

With Windows 2000/2003 SetSPN had only a few commands associated with it.

Switches:
-R = reset HOST ServicePrincipalName
  Usage:   setspn -R computername
-A = add arbitrary SPN
  Usage:   setspn -A SPN computername
-D = delete arbitrary SPN
  Usage:   setspn -D SPN computername
-L = list registered SPNs
  Usage:   setspn [-L] computername

The other problem was that SetSPN was part of the Resource Kit and did not ship with the OS.

This has changed in Windows 2008.  SetSPN is now part of the OS from the moment you install it.  They have also improved what SetSPN can do, namely by adding the ability to look for duplicate SPNs.  In the past I have used numerous tools to look for duplicate SPNs.  These ranged from DHDiag (an internal CSS tool that uses LDIFDE) to queryspn.vbs to DelegConfig.

Here are the new switches for SetSPN that ships with Windows 2008:

Modifiers:
-F = perform the duplicate checking on forestwide level
-P = do not show progress (useful for redirecting output to file)

Switches:
-R = reset HOST ServicePrincipalName
Usage:   setspn -R computername
-A = add arbitrary SPN
Usage:   setspn -A SPN computername
-S = add arbitrary SPN after verifying no duplicates exist
Usage:   setspn -S SPN computername
-D = delete arbitrary SPN
Usage:   setspn -D SPN computername
-L = list registered SPNs
Usage:   setspn [-L] computername
-Q = query for existence of SPN
Usage:   setspn -Q SPN
-X = search for duplicate SPNs
Usage:   setspn -X

The Q switch is really the nice feature here.  This allows you to see if an SPN is already out on your domain.  You could also combine this with the F modifier to look through the whole forest.

C:\>setspn -q MSSQLSvc/mymachine:1433

No such SPN found.

C:\>setspn -q MSSQLSvc/mymachine.mydomain.com:1433
CN=MYMACHINE,OU=Workstations,DC=mydomain,DC=com
        MSSQLSvc/mymachine.mydomain.com:1433
        HOST/MYMACHINE
        HOST/MYMACHINE.MYDOMAIN.COM

Existing SPN found!

This is just another thing that will make Kerberos configuration/troubleshooting easier for users.
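If you want to run the -Q check from a script, the two result messages shown above are enough to tell the cases apart. This is a minimal Python sketch, assuming setspn is on the PATH of a Windows 2008 machine and prints the messages as shown; spn_exists and query_spn are hypothetical helper names.

```python
import subprocess


def spn_exists(setspn_output):
    """Interpret the text printed by "setspn -Q <SPN>"."""
    text = setspn_output.lower()
    if "existing spn found" in text:
        return True
    if "no such spn found" in text:
        return False
    raise ValueError("unrecognized setspn output")


def query_spn(spn):
    """Run setspn -Q for one SPN and report whether it is registered."""
    result = subprocess.run(
        ["setspn", "-Q", spn], capture_output=True, text=True
    )
    return spn_exists(result.stdout)
```

For example, query_spn("MSSQLSvc/mymachine.mydomain.com:1433") would be expected to return True in the environment shown above.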

Adam W. Saxton | Microsoft SQL Server Escalation Services

When in doubt, Reboot!


I tend to get quite a few Kerberos-related cases.  They come from across the box: the Engine, Reporting Services, and plain connectivity with custom applications.  I had one given to me yesterday because the engineer had gone through everything we normally go through and wasn’t getting anywhere.

The situation was an 8 node cluster with multiple instances across the nodes.  These were running Windows 2008 with SQL 2008.  One node in particular was having an issue when they were issuing a Linked Server Query from a remote client.


When trying to hit the linked server from within Management Studio on the client machine, we received the following message:

Msg 18456, Level 14, State 1, Line 1
Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'

Kerberos Configuration:

When we see this type of error, it is typically Kerberos related, as the service we are using (ServerA) is unable to delegate the client’s credentials to the backend server (ServerB – Linked Server).  The first thing we do is go through our regular Kerberos checklist – SPN’s and Delegation settings. Both SQL Servers were using the same Domain User Service Account (SNEAKERNET\SQLSvc).  We can use SetSPN to check what SPN’s are on that account.  NOTE:  There are numerous ways to look for SPN’s but SetSPN is one of the easier command line tools available.  You could also use LDIFDE (http://support.microsoft.com/kb/237677), ADSIEdit (http://technet.microsoft.com/en-us/library/cc773354(WS.10).aspx) and other tools.  You will see us use an in-house tool called DHDiag to collect SPN’s.  This is just a wrapper that calls LDIFDE to output the results.

So, here are the SetSPN results:

C:\Users\Administrator>setspn -l sqlsvc
Registered ServicePrincipalNames for CN=SQL Service,OU=Service Account,DC=sneakernet,DC=local:
        MSSQLSvc/SQL02:26445
        MSSQLSvc/SQL02.sneakernet.local:26445
        MSSQLSvc/SQL01.sneakernet.local:14556
        MSSQLSvc/SQL01:14556

Why do we see SQL01 and SQL02 when our machine names are ServerA and ServerB?  This is because SQL01 and SQL02 are the virtual names for the cluster.  Each name will move to whatever the active node is for that given instance.  ServerA and ServerB, on the other hand, are the physical machine names and may or may not actually be hosting that instance.  We can also see from this that we have two distinct instances because of the ports (14556 & 26445).  If you look at some of our documentation (e.g. http://msdn.microsoft.com/en-us/library/ms189585(SQL.90).aspx), it indicates that for clusters, you also need to add a SQL SPN that does not include the port number.  I have yet to see where this is actually needed.  No cluster I’ve seen has had one.  Typically, if it is needed, you will receive a KRB_ERR_S_PRINCIPAL_UNKNOWN error if you enable Kerberos Event Logging.  If you do see that and it lists that SPN, then go ahead and add it.  But, from my experience, you won’t see it.
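Reading the instances off the port numbers gets tedious when an account carries many SPNs. The following Python sketch groups the MSSQLSvc entries of a saved setspn -l output by port; instances_by_port is a hypothetical helper name, and it assumes the output layout shown above.

```python
from collections import defaultdict


def instances_by_port(setspn_output):
    """Group MSSQLSvc SPNs by port, using the short host name of each entry."""
    ports = defaultdict(set)
    for line in setspn_output.splitlines():
        spn = line.strip()
        if not spn.startswith("MSSQLSvc/"):
            continue  # header line or SPNs of another service class
        host, sep, port = spn[len("MSSQLSvc/"):].rpartition(":")
        if not sep:
            host, port = port, ""  # SPN registered without a port
        ports[port].add(host.split(".")[0].upper())
    return dict(ports)
```

Each key in the result is one port, so two keys indicate two distinct instances, as in the output above.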

Ok, our SPNs look good. Let's look at our Delegation Settings.  In this case we really care about the SQL Service Account, because that is the context that will be performing the delegation.


We can do this by going to the properties for that account within Active Directory Users and Computers.  You will see a Delegation tab on the account.  If you don’t see the delegation tab, then the account does not have an SPN attached to it.  In this case we have “Trust this user for delegation to any service (Kerberos only)”.  This is what I call Full or Open Delegation as opposed to Constrained Delegation (which is more secure).  We are good to go here.  Nine times out of ten, the SPN or Delegation setting is going to be the cause of your issue.  In this case it isn’t.  What can we do now?

Kerberos Event Logging and Network Traces:

We can enable Kerberos Event Logging (http://support.microsoft.com/default.aspx?scid=kb;EN-US;262177) which will give us errors within the System Log for Kerberos.  This can sometimes be very helpful in diagnosing what may or may not be happening.  This produced the following results on ServerA:

Error Code: 0x1b Unknown Error
Error Code: 0x19 KDC_ERR_PREAUTH_REQUIRED
And KDC_ERR_BADOPTION

These are not uncommon and when we looked at these, they didn’t really relate to our issue.  Which means we had nothing here.  Of note, doing a linked server query from ServerB to ServerA worked, and it also produced the same events listed above.  So, nothing to gain here.

The next thing we can look at is a network trace, as this will show us the communication between the service in question and the Domain Controller.  I usually end up at this level if the SPN’s and Delegation settings check out.  This is really where some customers can have issues, because these traces are typically hard to interpret and will require a call to CSS.  We grabbed a trace in the failing and working conditions to see what was different.  We saw the following:

Failing:
525355 2009-06-30 15:55:39.468865 10.0.0.90 10.0.0.10 KRB5 TGS-REQ
KDC_REQ_BODY
KDCOptions: 40810000 (Forwardable, Renewable, Canonicalize)
Realm: SNEAKERNET.LOCAL
Server Name (Enterprise Name): ServerA$@SNEAKERNET.LOCAL

Working:
353115 23.437037 10.0.0.20 10.0.0.11 KRB5 TGS-REQ
KDC_REQ_BODY
KDCOptions: 40810000 (Forwardable, Renewable, Canonicalize)
Realm: SNEAKERNET.LOCAL
Server Name (Service and Instance): MSSQLSvc/SQL02.sneakernet.local:26445

You’ll notice that we are hitting different DC’s here, but that wasn’t the issue, as we also saw the failing one hitting different DC’s as we continued.  The other difference is that the working one requested the right SPN, whereas the failing one requested the physical machine account context.  This is what was forcing us into NTLM and causing the Login failed error.  But why was that happening?  So far we have zero information to indicate what could be causing it.
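The difference between the two requests comes down to the Server Name line of the TGS-REQ. As a rough Python sketch, assuming the trace details have been copied out as text in the layout shown above, the requested target can be pulled out and checked; the function names are hypothetical.

```python
import re

# Matches lines such as "Server Name (Service and Instance): MSSQLSvc/..."
SERVER_NAME = re.compile(r"^Server Name \(([^)]*)\):\s*(.+)$")


def requested_target(trace_text):
    """Return (name type, name) from the first Server Name line, or None."""
    for line in trace_text.splitlines():
        match = SERVER_NAME.match(line.strip())
        if match:
            return match.group(1), match.group(2).strip()
    return None


def requests_sql_spn(trace_text):
    """True when the ticket request targets a SQL Server SPN."""
    target = requested_target(trace_text)
    return target is not None and target[1].startswith("MSSQLSvc/")
```

Applied to the two requests above, the working one passes this check and the failing one, which asks for ServerA$@SNEAKERNET.LOCAL, does not.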

SSPIClient:

We then used an internal tool called SSPIClient which makes direct calls to the InitializeSecurityContext API call which is how we do impersonation.  This tool allowed us to take SQL Server out of the picture and focus on the Kerberos issue directly.  We could see that we were failing back to NTLM which really confirmed what we saw in the network trace.

2009-07-01 16:34:24.577 ENTER InitializeSecurityContextA
2009-07-01 16:34:24.577 phCredential              = 0x0090936c
2009-07-01 16:34:24.577 phContext                 = 0x00000000
2009-07-01 16:34:24.577 pszTargetName             = 'MSSQLSvc/SQL02.sneakernet.local:26445'
2009-07-01 16:34:24.577 fContextReq               = 0x00000003 ISC_REQ_DELEGATE|ISC_REQ_MUTUAL_AUTH
2009-07-01 16:34:24.577 TargetDataRep             = 16
2009-07-01 16:34:24.577 pInput                    = 0x00000000
2009-07-01 16:34:24.577 phNewContext              = 0x0090937c
2009-07-01 16:34:24.577 pOutput                   = 0x0017d468
2009-07-01 16:34:24.577 pOutput->ulVersion        = 0
2009-07-01 16:34:24.577 pOutput->cBuffers         = 1
2009-07-01 16:34:24.577 pBuffers[00].cbBuffer   = 52
2009-07-01 16:34:24.577 pBuffers[00].BufferType = 2 SECBUFFER_TOKEN
2009-07-01 16:34:24.577 pBuffers[00].pvBuffer   = 0x02c99f90
2009-07-01 16:34:24.578 02c99f90  4e 54 4c 4d 53 53 50 00 01 00 00 00 97 b2 08 e2   NTLMSSP.........
2009-07-01 16:34:24.578 02c99fa0  03 00 03 00 31 00 00 00 09 00 09 00 28 00 00 00   ....1.......(...        
2009-07-01 16:34:24.578 pfContextAttr             = 0x00001000 ISC_RET_INTERMEDIATE_RETURN
2009-07-01 16:34:24.578 ptsExpiry                 = 0x0017d43c -> 2009-07-01 10:39:24 *** EXPIRED *** (05:55:00 diff)
2009-07-01 16:34:24.578 EXIT  InitializeSecurityContextA returned 0x00090312 SEC_I_CONTINUE_NEEDED (The function completed successfully, but must be called again to complete the context)

NOTE:  We purged all of the Kerberos Tickets before we did this to make sure we would request the ticket from the KDC.  This was done using KerbTray which is part of the Windows Resource Kit.

This tells us that we were requesting a given SPN for the Target, but the buffer shows NTLMSSP.  This means we fell down to NTLM instead of getting Kerberos.  This still doesn’t explain why.
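The NTLM fallback can be recognized from the first bytes of the token alone. An NTLM message starts with the ASCII signature "NTLMSSP" followed by a zero byte and a 4-byte little-endian message type. The Python sketch below checks for that on a hex string copied from the dump; the function names are hypothetical.

```python
NTLMSSP_SIGNATURE = b"NTLMSSP\x00"


def token_is_ntlm(hex_bytes):
    """True when the token starts with the NTLMSSP signature."""
    token = bytes.fromhex(hex_bytes)
    return token.startswith(NTLMSSP_SIGNATURE)


def ntlm_message_type(hex_bytes):
    """Return the NTLM message type (1 is the initial negotiate message)."""
    token = bytes.fromhex(hex_bytes)
    if not token.startswith(NTLMSSP_SIGNATURE):
        raise ValueError("not an NTLM token")
    return int.from_bytes(token[8:12], "little")


first_row = "4e 54 4c 4d 53 53 50 00 01 00 00 00 97 b2 08 e2"
print(token_is_ntlm(first_row), ntlm_message_type(first_row))
```

A Kerberos token would not begin with this signature, so a True result here confirms that the security package negotiated NTLM.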

End Result:

Unfortunately, this was one of those issues that just escaped us.  This tends to happen with odd Kerberos cases.  We had the Directory Services team engaged as well, and they did not know what else we could do in terms of data collection outside of a Kernel Dump to see what may be going on.  We noticed that the nodes had not been rebooted since April 5th, which is a while.  The SQL Service was recycled on June 25th.  We decided to fail over to another node and reboot ServerA.  After we rebooted, we tried SSPIClient again and we saw a proper response come back, which also didn’t list EXPIRED.  At this point the issue was resolved.  We don’t have hard data to indicate what exactly the issue was, but the thought is that something cached was invalid and causing the issue.  Rebooting cleared that out and allowed us to work as expected.

Which leads me to my motto:  When in doubt, Reboot!

Adam W. Saxton | Microsoft SQL Server Escalation Services

Report Builder and Firewalls


We have had a few customer calls come in on this scenario, so I thought it needed to be documented a bit.

Scenario:


In this scenario, the customer has a data source defined on the Report Server.  Some were using Named Instances, others were using a Default Instance for the Data Source.

There are some aspects of Report Builder that will run server side (from the context of the Report Server).  For example, DataSource retrieval and preview of a report.  This is assuming that we are in connected mode in Report Builder.


There are other aspects that will run Client Side.  Some examples of that are the Query Designer and general Metadata lookup for the DataSet.  This is where the problems come into play when a firewall is involved.

In all of the cases, reports and Report Builder function normally locally.  When they try to create a new report through Report Builder, they encounter errors similar to the following:

 

A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: SQL Network Interfaces, error: 26 - Error Locating Server/Instance Specified)

 

A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 0 - The requested name is valid, but no data of the requested type was found.)

A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)

The first error is specific to a Named Instance server. The other two occur when we are trying to connect directly to the SQL Server.  Named Instances have to do a lookup to get the port number for the actual instance we are connecting to.  This lookup is fielded by SQL Browser over UDP port 1434.  Whenever you see “error: 26 - Error Locating Server/Instance Specified”, it is SQL Browser related.  The underlying issue is still the same as the other messages.
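To make the SQL Browser lookup more concrete, here is a rough Python sketch of a single-instance query. It rests on assumptions that are not from this post: the request and response layout follows the SQL Server Resolution Protocol as it is generally documented (a 0x04 request byte plus the instance name, and a 0x05 response with a 2-byte length and a semicolon-delimited string), and the function names are hypothetical. Treat it as an illustration rather than a supported client.

```python
import socket

CLNT_UCAST_INST = 0x04  # request type: look up a single named instance
SVR_RESP = 0x05         # response type sent back by SQL Browser


def build_request(instance_name):
    """Build the UDP payload that asks SQL Browser about one instance."""
    return bytes([CLNT_UCAST_INST]) + instance_name.encode("ascii") + b"\x00"


def parse_response(payload):
    """Turn the semicolon-delimited response into a dict of name/value pairs."""
    if not payload or payload[0] != SVR_RESP:
        raise ValueError("not a SQL Browser response")
    size = int.from_bytes(payload[1:3], "little")
    text = payload[3:3 + size].decode("ascii", errors="replace")
    fields = text.strip(";").split(";")
    return dict(zip(fields[0::2], fields[1::2]))


def lookup_port(server, instance_name, timeout=2.0):
    """Ask SQL Browser on UDP 1434 for the TCP port of a named instance."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(instance_name), (server, 1434))
        payload, _ = sock.recvfrom(4096)
    return int(parse_response(payload)["tcp"])
```

When the server name cannot be resolved, or UDP 1434 is blocked by a firewall, lookup_port fails with a socket error or a timeout, which is the situation that a client reports as error 26.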

The way I reproduced the issue was by doing the following on my lab setup which was configured for Basic Authentication:

  1. Open Report Builder (which starts with a blank report – and I was in connected mode with my Report Server)
  2. Create a DataSource which I select from the existing data sources on my Report Server
  3. Create a DataSet
  4. At this point, the DataSet Properties window should open up, at which point you can click on “Query Designer…”

    image
  5. I was then prompted for Credentials and then was met with the following:

    image

The Problem:

The overall problem is that Report Builder cannot see the SQL Server when external to the network that SQL Server resides on.  SQL Server is typically not exposed through the firewall.  Assume the following configuration:

Report Server:

  • Internet RS URL: http://www.mysite.com/ReportServer
  • Public IP:  201.201.201.201
  • Private IP: 10.0.0.5
  • DataSource Connection String:  server=MyServer\MyInstance;Database=AdventureWorks;

SQL Server:

  • Server Name: MyServer
  • Instance Name: MyInstance (Port 2644)
  • Private IP:  10.0.0.4

When Report Builder is opened from a client machine on the Internet (or external to the private network that SQL Server is a part of), when it goes to hit the datasource, it is actually trying to connect to MyServer\MyInstance.  Because this is a named instance, we are doing the SQL Browser lookup first.  In this case, it will be a NetBIOS lookup.  If we are doing a straight TCP connection, we will end up doing a DNS lookup.  Because we are on the Internet, there is no WINS or DNS server that is aware of MyServer.  NetBIOS or DNS will come back basically saying it couldn’t find the server name you are requesting which results in one of the errors I outlined above.

Report Builder doesn’t go through the Reporting Services WebService to do DataSource calls, which would make it server based.  From the design perspective, we are client side and it will try to retrieve that data from the client.  I think some of the confusion is that people think that because we are in “connected” mode with the Report Server, all functionality would occur on the Report Server itself, in which case we would expect the Report Server to be able to communicate with the SQL Server successfully.  This, unfortunately, is not the case.

Are there any workarounds?

The next logical question would be, how do I get this to work?  There are two possible workarounds I can think of: one that is not very realistic and another that is possible, but also somewhat of a pain.

Workaround 1:

This involves exposing your SQL Server to the internet, which I do NOT recommend and which I doubt most companies are willing to do.  At that point, you could have an External DataSource along with an Internal DataSource.  People using Report Builder on the internet could reference the External DataSource, which has the connection information for the SQL Server that is usable from the internet.  The design aspects would then work, but Preview could fail, depending on whether your network configuration lets the Report Server reach the external IP address for SQL Server from the internal side.

Then when you publish, the report can reference the Internal DataSource.

Workaround 2:

Another option is to expose your data through a WebService that is accessible via the Internet.  Report Builder users can then access a DataSource that uses the WebService, as that resource is available to them externally.

Update - Workaround 3 (SSAS/OLAP) – Thanks David!:

For SSAS/OLAP you can set up a Connection Proxy over HTTP.  This would be usable both internally and externally and can be easily exposed through a firewall.  Be sure to use a non-standard port that is configured on your Firewall for security purposes.  Also, be aware that you are exposing your backend to the internet, so take the appropriate security measures.  SQL has a similar feature through the use of an HTTP Endpoint, but be aware that it has been deprecated and is not guaranteed to be available in a future release.

 

Overall, it will be difficult for people using Report Builder externally to access resources that are on an internal network when designing a report.  Hopefully, this will allow you to better plan your deployment of Reporting Services.

 

Adam W. Saxton | Microsoft SQL Server Escalation Services

SQL 2008 - New Functionality to the dm_os_ring_buffers for Connectivity Troubleshooting


Hi,

 

I wanted to make everybody aware of this feature in SQL 2008. 

 

Are you tired of having to use NetMon to narrow down a connectivity issue with SQL Server 2008 or have to wait for an elusive connectivity error to reoccur?

 

A new ring buffer called "RING_BUFFER_CONNECTIVITY" has been added to the DMV sys.dm_os_ring_buffers in SQL 2008 RTM.

 

This will automatically log server-side initiated connection closures. If you see nothing in the DMV, then most likely the client reset or closed the connection. You can enable logging of any connection closure (client or server) with trace flag 7827.

 

Please read this blog for more information!

http://blogs.msdn.com/sql_protocols/archive/2008/05/20/connectivity-troubleshooting-in-sql-server-2008-with-the-connectivity-ring-buffer.aspx

 

So if SQL Server 2008 has stayed online since the connection failure, make sure to capture the information from sys.dm_os_ring_buffers based on the query in the blog above, as it may give you enough information to narrow down your troubleshooting to the client or server without costly NetMon traces.
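As a starting point for capturing that data from a script, the sketch below pulls the raw connectivity records. It is not the query from the linked blog post, only a minimal filter on the ring buffer type; it assumes you already have a DB-API 2.0 cursor connected to the instance (for example from an ODBC driver of your choice), and fetch_connectivity_records is a hypothetical helper name.

```python
# Minimal filter on the ring buffer type; the linked post has a fuller query
# that shreds the XML into columns.
RING_BUFFER_QUERY = """
SELECT CAST(record AS XML) AS record
FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type = 'RING_BUFFER_CONNECTIVITY'
"""


def fetch_connectivity_records(cursor):
    """Run the query through a DB-API 2.0 cursor and return the XML records."""
    cursor.execute(RING_BUFFER_QUERY)
    return [row[0] for row in cursor.fetchall()]
```

Saving the returned records to a file right after a failure keeps the evidence even if the ring buffer later wraps or the service is restarted.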

 

Hope this helps!

Eric Burgess
SQL Server Escalation Team

‘Cannot Generate SSPI Context’ and Service Account Passwords


I was working with Keith Elmore on one of our internal processes and he was hitting a “Cannot generate SSPI context” error when trying to connect from Management Studio.  I also saw this come up in a double hop situation (IIS to SQL) when I set up a local repro.


We went through the normal check list for Kerberos Troubleshooting, but really that just consisted of validating the SPN in the case of Management Studio as it was a single hop and we were just trying to do a direct connection without any delegation.  The SPN checked out, and there was only one SPN.  No duplicates.


We have an internal tool called SSPIClient which will go through the motions of just trying the Windows API calls for Kerberos authentication (InitializeSecurityContext).

2009-12-30 21:11:16.185 Connecting via ODBC to [DRIVER=SQL Server;Server=tcp:passsql\demo;Trusted_Connection=Yes;]

2009-12-30 21:11:16.232 ENTER InitializeSecurityContextA
2009-12-30 21:11:16.232 phCredential              = 0x0055ffb4
2009-12-30 21:11:16.232 phContext                 = 0x0055ffc4
2009-12-30 21:11:16.232 pszTargetName             = 'MSSQLSvc/PASSSQL.pass.local:59256'
2009-12-30 21:11:16.232 fContextReq               = 0x00000003 ISC_REQ_DELEGATE|ISC_REQ_MUTUAL_AUTH
2009-12-30 21:11:16.232 TargetDataRep             = 16
2009-12-30 21:11:16.232 pInput                    = 0x0018d55c
2009-12-30 21:11:16.232 pInput->ulVersion         = 0
2009-12-30 21:11:16.232 pInput->cBuffers          = 1
2009-12-30 21:11:16.232 pBuffers[00].cbBuffer   = 112
2009-12-30 21:11:16.232 pBuffers[00].BufferType = 2 SECBUFFER_TOKEN
2009-12-30 21:11:16.232 pBuffers[00].pvBuffer   = 0x03753870
2009-12-30 21:11:16.232 03753870  a1 6e 30 6c a0 03 0a 01 01 a2 65 04 63 60 61 06   .n0l......e.c`a.
2009-12-30 21:11:16.232 03753880  09 2a 86 48 86 f7 12 01 02 02 03 00 7e 52 30 50   .*.H........~R0P
2009-12-30 21:11:16.232 03753890  a0 03 02 01 05 a1 03 02 01 1e a4 11 18 0f 32 30   ..............20
2009-12-30 21:11:16.232 037538a0  30 39 31 32 33 30 32 31 31 31 31 36 5a a5 05 02   091230211116Z...
2009-12-30 21:11:16.232 037538b0  03 01 0d b4 a6 03 02 01 29 a9 0c 1b 0a 50 41 53   ........)....PAS
2009-12-30 21:11:16.232 037538c0  53 2e 4c 4f 43 41 4c aa 17 30 15 a0 03 02 01 01   S.LOCAL..0......
2009-12-30 21:11:16.232 037538d0  a1 0e 30 0c 1b 0a 73 71 6c 73 65 72 76 69 63 65   ..0...sqlservice
2009-12-30 21:11:16.232 phNewContext              = 0x0055ffc4
2009-12-30 21:11:16.232 pOutput                   = 0x0018d574
2009-12-30 21:11:16.232 pOutput->ulVersion        = 0
2009-12-30 21:11:16.232 pOutput->cBuffers         = 1
2009-12-30 21:11:16.232 pBuffers[00].cbBuffer   = 12256
2009-12-30 21:11:16.232 pBuffers[00].BufferType = 2 SECBUFFER_TOKEN
2009-12-30 21:11:16.232 pBuffers[00].pvBuffer   = 0x03759d68
2009-12-30 21:11:16.232 pfContextAttr             = 0x00000000
2009-12-30 21:11:16.232 ptsExpiry                 = 0x0018d548 -> 1601-01-01 00:00:00 *** EXPIRED *** (3585189:11:16 diff)
2009-12-30 21:11:16.232 EXIT  InitializeSecurityContextA returned 0x80090322 SEC_E_WRONG_PRINCIPAL (The target principal name is incorrect)
2009-12-30 21:11:16.232
2009-12-30 21:11:16.232 ******************** ODBC Errors ********************
2009-12-30 21:11:16.232 Return code = -1.
2009-12-30 21:11:16.232 SQLError[00] SQLState    'S1000'
2009-12-30 21:11:16.232 SQLError[00] NativeError 0
2009-12-30 21:11:16.232 SQLError[00] Message     '[Microsoft][ODBC SQL Server Driver]Cannot generate SSPI context'
2009-12-30 21:11:16.232 ******************** ODBC Errors ********************

It was saying that the principal was incorrect, but you can see in the output that it is showing sqlservice, which is correct.  We had rebooted the SQL Server in question, at which point the SQL Service wouldn’t even start.  Keith asked if the password had been changed recently.  We took a look, and sure enough, the password was changed yesterday.  This happens to be an account that we use for multiple things. 
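When reading a trace like the one above, the two fields that matter most are the SPN that was requested and the status that InitializeSecurityContext returned. Here is a minimal Python sketch that pulls both out of the text. The regular expressions assume the line layout shown in this particular SSPIClient output, and the status table is a small, non-exhaustive set of SSPI codes, so treat it as an illustration rather than a parser for the tool.

```python
import re

# Small, non-exhaustive map of SSPI status codes to their symbolic names.
SSPI_STATUS = {
    0x80090303: "SEC_E_TARGET_UNKNOWN",
    0x8009030C: "SEC_E_LOGON_DENIED",
    0x80090322: "SEC_E_WRONG_PRINCIPAL",
    0x80090324: "SEC_E_TIME_SKEW",
}

def summarize_sspi_trace(text):
    """Return (requested SPN, status code, status name); fields are None if not found."""
    target = re.search(r"pszTargetName\s*=\s*'([^']+)'", text)
    status = re.search(
        r"EXIT\s+InitializeSecurityContext\w*\s+returned\s+(0x[0-9A-Fa-f]+)", text
    )
    code = int(status.group(1), 16) if status else None
    name = SSPI_STATUS.get(code, "unknown") if code is not None else None
    return (target.group(1) if target else None, code, name)

# Two lines copied from the trace above.
sample = (
    "2009-12-30 21:11:16.232 pszTargetName             = "
    "'MSSQLSvc/PASSSQL.pass.local:59256'\n"
    "2009-12-30 21:11:16.232 EXIT  InitializeSecurityContextA returned "
    "0x80090322 SEC_E_WRONG_PRINCIPAL (The target principal name is incorrect)\n"
)

spn, code, name = summarize_sspi_trace(sample)
print(spn, hex(code), name)
```

In this case the SPN was fine, so SEC_E_WRONG_PRINCIPAL pointed at the account behind the SPN (the stale service password) rather than at the SPN itself.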

We changed the service account password through SQL Server Configuration Manager and restarted SQL.  SQL could start at that point, and the SSPI error disappeared.  We were able to successfully connect to SQL at that point.

I’m sure other people have known about this type of condition, but in the years that I’ve been here, along with the number of Kerb issues that I’ve troubleshot in the past, this was the first time I had run across this.  Thought I would throw it out there to share with everyone in case they maybe run across something like this that they can’t explain. 

If you change your service password, be sure to recycle the SQL Service so that Kerberos can function properly.

 

Adam W. Saxton | Microsoft SQL Server Escalation Services

How to get a x64 version of Jet?


We have had a number of people ask about how they can get the Jet ODBC driver/OLE DB Provider as 64 bit.  Windows only ships the 32 bit versions of these.  The answer is that the windows versions won’t be x64 as those items are deprecated.  What does deprecated mean?  Here is the excerpt from the MDAC/WDAC Roadmap on MSDN:

Deprecated MDAC/WDAC Components

These components are still supported in the current release of MDAC/WDAC, but they might be removed in future releases. Microsoft recommends, when you develop new applications, that you avoid using these components. Additionally, when you upgrade or modify existing applications, remove any dependency on these components.

And here is what it lists about the Jet Database Engine:

Microsoft Jet Database Engine 4.0: Starting with version 2.6, MDAC no longer contains Jet components. In other words, MDAC 2.6, 2.7, 2.8, and all future MDAC/WDAC releases do not contain Microsoft Jet, the Microsoft Jet OLE DB Provider, the ODBC Desktop Database Drivers, or Jet Data Access Objects (DAO). The Microsoft Jet Database Engine 4.0 components entered a state of functional deprecation and sustained engineering, and have not received feature level enhancements since becoming a part of Microsoft Windows in Windows 2000.


There is no 64-bit version of the Jet Database Engine, the Jet OLEDB Driver, the Jet ODBC Drivers, or Jet DAO available. This is also documented in KB article 957570. On 64-bit versions of Windows, 32-bit Jet runs under the Windows WOW64 subsystem. For more information on WOW64, see http://msdn.microsoft.com/en-us/library/aa384249(VS.85).aspx. Native 64-bit applications cannot communicate with the 32-bit Jet drivers running in WOW64.


Instead of Microsoft Jet, Microsoft recommends using Microsoft SQL Server Express Edition or Microsoft SQL Server Compact Edition when developing new, non-Microsoft Access applications requiring a relational data store. These new or converted Jet applications can continue to use Jet with the intention of using Microsoft Office 2003 and earlier files (.mdb and .xls) for non-primary data storage. However, for these applications, you should plan to migrate from Jet to the 2007 Office System Driver. You can download the 2007 Office System Driver, which allows you to read from and write to pre-existing files in either Office 2003 (.mdb and .xls) or the Office 2007 (*.accdb, *.xlsm, *.xlsx and *.xlsb) file formats. IMPORTANT Please read the 2007 Office System End User License Agreement for specific usage limitations.


Note: SQL Server applications can also access the 2007 Office System, and earlier, files from SQL Server heterogeneous data connectivity and Integrations Services capabilities as well, via the 2007 Office System Driver. Additionally, 64-bit SQL Server applications can access to 32-bit Jet and 2007 Office System files by using 32-bit SQL Server Integration Services (SSIS) on 64-bit Windows.

This all pertains to the components that actually ship with Windows.  The Office team has since taken up Jet as part of Access and has come out with what they call the Access Connectivity Engine (ACE) driver.  For more information on the ACE Drivers, you can check out this blog post which goes into details.  The ACE driver/provider is completely backwards compatible with Jet 4.0 though.

Office 2010 will introduce a 64 bit version of Office.  With that is coming a 64 bit version of the ACE Driver/Provider which will in essence give you a 64 bit version of Jet.  The downside is that it doesn’t ship with the operating system but will be a redistributable.  There is a beta version available of this driver, as Office 2010 hasn’t been released yet.
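To make the bitness rule concrete, here is a small Python sketch of how an application might pick an OLE DB provider. It only builds the connection string; it does not open a connection. The provider names are the standard Jet 4.0 and ACE 12.0 ProgIDs, while the `ace_installed` flag and the file path are hypothetical inputs that the caller would have to supply.

```python
import struct

JET_PROVIDER = "Microsoft.Jet.OLEDB.4.0"   # ships with Windows, 32-bit only
ACE_PROVIDER = "Microsoft.ACE.OLEDB.12.0"  # Office data connectivity components

def process_bits():
    """Pointer size of the running process in bits (32 or 64)."""
    return struct.calcsize("P") * 8

def pick_provider(bits, ace_installed):
    """Choose a provider that can load in a process of the given bitness."""
    if bits == 64:
        # A native 64-bit process cannot load the 32-bit Jet provider.
        if not ace_installed:
            raise RuntimeError("64-bit process needs the 64-bit ACE provider")
        return ACE_PROVIDER
    return ACE_PROVIDER if ace_installed else JET_PROVIDER

def oledb_connection_string(provider, path):
    return f"Provider={provider};Data Source={path};"

# Hypothetical database path, for illustration only.
provider = pick_provider(process_bits(), ace_installed=True)
print(oledb_connection_string(provider, r"C:\data\sample.mdb"))
```

The ACE provider must match the bitness of the calling process, which is why the 64-bit branch refuses to fall back to Jet.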

2010 Office System Driver Beta: Data Connectivity Components
http://www.microsoft.com/downloads/details.aspx?familyid=C06B8369-60DD-4B64-A44B-84B371EDE16D&displaylang=en

Adam W. Saxton | Microsoft SQL Server Escalation Services

What SPN do I use and how does it get there?


This month has turned into another Kerberos Month for me.  I had an email discussion regarding SPN’s for SQL Server and what we can do to get them created and in a usable state.  I thought I would share my response to the questions as it will probably be helpful for someone.  Here was the comment that started the conversation.  And, by the way, this was actually a good question.  I actually see this kind of comment a lot in regards to SPN placement.  Not necessarily the setup aspect of it, but for SPN’s in general.

“In prior versions of setup we used to be able to specify the port number for the default and Named Instance.  Now, (SQL 2008 & R2) it takes the defaults.  1433 and Dynamic for Named Instances.

If you want to use Kerberos with TCP, you need to know the port number to create the SPN.  For Default instances, if you’re using 1433 then you’re ok. But, Named Instances listen on a dynamic port by default, and since you can’t set the port number, any SPN you create will probably be wrong and Kerberos won’t work.  It would be great if we could ask the user if they want to change the port number during setup, like we did with SQL 2000.”

Let’s have a look at Books Online first.

Registering a Service Principal Name
http://msdn.microsoft.com/en-us/library/ms191153.aspx

This article goes through the different formats that are applicable to SQL 2008 (they are the same for R2 as well).  It also touches on two items that are important to understand.  1.  Automatic SPN Registration and 2. Client Connections. Here is the excerpt from the above article in regards to Automatic SPN Registration.

Automatic SPN Registration

When an instance of the SQL Server Database Engine starts, SQL Server tries to register the SPN for the SQL Server service. When the instance is stopped, SQL Server tries to unregister the SPN. For a TCP/IP connection the SPN is registered in the format MSSQLSvc/<FQDN>:<tcpport>. Both named instances and the default instance are registered as MSSQLSvc, relying on the <tcpport> value to differentiate the instances.

For other connections that support Kerberos the SPN is registered in the format MSSQLSvc/<FQDN>:<instancename> for a named instance. The format for registering the default instance is MSSQLSvc/<FQDN>.

Manual intervention might be required to register or unregister the SPN if the service account lacks the permissions that are required for these actions.

What does this mean?  It means that if the SQL Service account is using Local System or Network Service as the logon account, we will have the permission necessary to register the SPN against the Domain Machine Account.  By default, the machine accounts have permission to modify themselves.  If we change this over to a Domain User Account for the SQL Service account, things change a little.  By default a Domain User does not have the permission required to create the SPN.  So, when you start SQL Server with a Domain User Account, you will see an entry in your ERRORLOG similar to the following:

2010-03-05 09:39:53.20 Server      The SQL Server Network Interface library could not register the Service Principal Name (SPN) for the SQL Server service. Error: 0x2098, state: 15. Failure to register an SPN may cause integrated authentication to fall back to NTLM instead of Kerberos. This is an informational message. Further action is only required if Kerberos authentication is required by authentication policies.

This permission is called “Write servicePrincipalName” and can be altered through an MMC snap in called ADSI Edit.  For instructions on how to modify this setting, refer to Step 3 in the following KB Article.  WARNING:  I do NOT recommend you do this on a Cluster.  We have seen issues with this causing connectivity issues due to Active Directory Replication issues if more than one Domain Controller is used in your environment.

How to use Kerberos authentication in SQL Server
http://support.microsoft.com/kb/319723

clip_image002

So, if I enable that permission, let’s see what the SQL Service does.  I have two machines I’m going to use for this.  ASKJCTP3 (running the RC build of 2008 R2) and MySQLCluster (SQL 2008 running a Named Instance called SQL2K8).

SetSPN Details:

SPN's with TCP and NP enabled on Default Instance:

C:\>setspn -l sqlservice
Registered ServicePrincipalNames for CN=SQL Service,OU=Services,DC=dsdnet,DC=local:
        MSSQLSvc/ASKJCTP3.dsdnet.local:1433
        MSSQLSvc/ASKJCTP3.dsdnet.local

SPN's with only NP enabled on Default Instance:

C:\>setspn -l sqlservice
Registered ServicePrincipalNames for CN=SQL Service,OU=Services,DC=dsdnet,DC=local:
        MSSQLSvc/ASKJCTP3.dsdnet.local

SPN's with TCP and NP enabled on Clustered Named Instance:

C:\>setspn -l sqlservice
Registered ServicePrincipalNames for CN=SQL Service,OU=Services,DC=dsdnet,DC=local:
        MSSQLSvc/MYSQLCLUSTER.dsdnet.local:54675
        MSSQLSvc/MYSQLCLUSTER.dsdnet.local:SQL2K8

SPN's with only NP enabled on a Clustered Named Instance:

C:\>setspn -l sqlservice
Registered ServicePrincipalNames for CN=SQL Service,OU=Services,DC=dsdnet,DC=local:
        MSSQLSvc/MYSQLCLUSTER.dsdnet.local:SQL2K8
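
If you capture listings like these to a file, a few lines of Python can tell you whether the SPNs you expect are actually there. This is only a sketch: it assumes the `setspn -l` output layout shown above (one SPN per line, no spaces inside an SPN) and compares names case-insensitively. The function names are made up for this example.

```python
def parse_setspn_listing(output):
    """Pull the SPN lines out of captured `setspn -l <account>` output."""
    spns = []
    for line in output.splitlines():
        line = line.strip()
        # SPN lines contain a slash and no spaces; the prompt and header lines do not.
        if "/" in line and " " not in line:
            spns.append(line)
    return spns

def missing_spns(listing, expected):
    """Return the expected SPNs that are absent from the listing."""
    have = {spn.lower() for spn in listing}
    return [spn for spn in expected if spn.lower() not in have]

# Listing copied from the "only NP enabled on Default Instance" case above.
captured = r"""
C:\>setspn -l sqlservice
Registered ServicePrincipalNames for CN=SQL Service,OU=Services,DC=dsdnet,DC=local:
        MSSQLSvc/ASKJCTP3.dsdnet.local
"""

listing = parse_setspn_listing(captured)
wanted = ["MSSQLSvc/askjctp3.dsdnet.local:1433", "MSSQLSvc/askjctp3.dsdnet.local"]
print(missing_spns(listing, wanted))
```

With only Named Pipes enabled, the port-based SPN is reported as missing, which matches what the listing shows.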

Let’s look at what the client will do.  When I say client, this could mean a lot of different things.  Really it means an Application trying to connect to SQL Server by way of a Provider/Driver.  NOTE:  Specifying the SPN as part of the connection is specific to SQL Native Client 10 and later.  It does not apply to SqlClient or the Provider/Driver that ships with Windows.

Service Principal Name (SPN) Support in Client Connections
http://msdn.microsoft.com/en-us/library/cc280459.aspx

MSSQLSvc/fqdn

The provider-generated, default SPN for a default instance when a protocol other than TCP is used.

fqdn is a fully-qualified domain name.

MSSQLSvc/fqdn:port

The provider-generated, default SPN when TCP is used.

port is a TCP port number.

MSSQLSvc/fqdn:InstanceName

The provider-generated, default SPN for a named instance when a protocol other than TCP is used.

InstanceName is a SQL Server instance name

Based on this, if I have a straight TCP connection, the Provider/Driver will use the Port for the SPN designation.  Let’s see what happens when I try to make connections using a UDL file.  For the UDL I’m going to use the SQL Native Client 10 OleDb Provider.  Starting with SNAC10, we can specify which SPN to use for the connection.  This provides us some flexibility when we control how the application is going to connect.  Note:  This is not available with the Provider/Driver that actually ships with Windows.  I will also show what the Kerberos request looks like in the network trace.  This will show us what SPN is actually being used.  All of these connection attempts were made using ASKJCTP3 which is a Default Instance.

Because this is a Default Instance, I added the Instance Name SPN manually.

C:\>setspn -l sqlservice
Registered ServicePrincipalNames for CN=SQL Service,OU=Services,DC=dsdnet,DC=local:
        MSSQLSvc/ASKJCTP3.dsdnet.local:MSSQLSERVER
        MSSQLSvc/ASKJCTP3.dsdnet.local:1433
        MSSQLSvc/ASKJCTP3.dsdnet.local
        MSSQLSvc/MYSQLCLUSTER.dsdnet.local:54675
        MSSQLSvc/MYSQLCLUSTER.dsdnet.local:SQL2K8

Straight TCP with no SPN Specified:

clip_image002[5]

58     1.796875   {TCP:7, IPv4:5}      10.0.0.3      10.0.0.1      KerberosV5    KerberosV5:TGS Request Realm: DSDNET.LOCAL Sname: MSSQLSvc/askjctp3.dsdnet.local:1433

TCP with specifying an SPN for the connection:

clip_image004

32     1.062500   {TCP:11, IPv4:5}     10.0.0.3      10.0.0.1      KerberosV5    KerberosV5:TGS Request Realm: DSDNET.LOCAL Sname: MSSQLSvc/ASKJCTP3.dsdnet.local:MSSQLSERVER

Forcing Named Pipes with no SPN specified:

clip_image006

68     1.828125   {TCP:21, IPv4:5}     10.0.0.3      10.0.0.1      KerberosV5    KerberosV5:TGS Request Realm: DSDNET.LOCAL Sname: MSSQLSvc/askjctp3.dsdnet.local

 

The way the provider/driver determines which SPN to use is based on the Protocol being used.  Of note, starting in SQL 2008 we allowed for Kerberos to be used with Named Pipes.  If you have a Named Instance and you are using the Named Pipes protocol, we will look for an SPN with the Named Instance specified.  For a Default Instance and Named Pipes, we will just look for the SPN with no port or Named Instance Name specified as shown above.

With the ability to specify the SPN from the client side, you can see how you can easily manipulate, or even see how we will determine what SPN will be used. 
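
The selection rules described above are simple enough to write down. The Python sketch below mirrors them: a port for TCP, an instance name for a Named Instance on another protocol, and the bare host otherwise. The second helper builds a SQL Native Client 10 OLE DB connection string with an explicit SPN. The `ServerSPN` keyword is the name as I recall it from the client SPN article linked earlier, so verify it against that article before relying on it. The function names are made up for this example.

```python
def client_default_spn(fqdn, protocol, port=None, instance=None):
    """SPN the provider/driver is expected to request when none is specified."""
    if protocol.lower() == "tcp":
        if port is None:
            raise ValueError("a TCP connection needs the port to form the SPN")
        return f"MSSQLSvc/{fqdn}:{port}"
    if instance:
        # Named Instance over a non-TCP protocol such as Named Pipes.
        return f"MSSQLSvc/{fqdn}:{instance}"
    # Default Instance over a non-TCP protocol.
    return f"MSSQLSvc/{fqdn}"

def snac10_connection_string(server, spn=None):
    """OLE DB connection string; ServerSPN is assumed to be the SNAC10 keyword."""
    parts = ["Provider=SQLNCLI10", f"Server={server}", "Integrated Security=SSPI"]
    if spn:
        parts.append(f"ServerSPN={spn}")
    return ";".join(parts) + ";"

print(client_default_spn("askjctp3.dsdnet.local", "tcp", port=1433))
print(client_default_spn("askjctp3.dsdnet.local", "np"))
print(snac10_connection_string(
    "tcp:askjctp3", "MSSQLSvc/ASKJCTP3.dsdnet.local:MSSQLSERVER"))
```

The first two results line up with the Sname values in the TGS requests from the traces above.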

Now that we know all of the above, let’s go back to the original question.  Your company may or may not want to enable the Write permission for the Domain User Account.  If your company is not willing to open up the permission on the service account, then their only recourse will be to set a static port for the Named Instance instead of letting the Named Instance use a dynamic port.  This would also be my recommendation for Clusters.  In this case, you will need to know exactly what SPN’s are needed and create them manually using SetSPN or the tool of your choice.

Even though we don’t provide the ability to set your port during setup, you can still modify the port settings for the Instance through the SQL Server Configuration Manager.  This will allow you to set your static SPN’s as well as assist you with Firewall rules.

image

image

Adam W. Saxton | Microsoft SQL Server Escalation Services

http://twitter.com/awsaxton


Error 18056 can be unwanted noise in certain scenarios


I saw a lot of hits on the web when I searched for the Error message 18056 with State 29. I even saw two Microsoft Connect items for this issue filed for SQL Server 2008 instances:

http://connect.microsoft.com/SQL/feedback/ViewFeedback.aspx?FeedbackID=468478

http://connect.microsoft.com/SQLServer/feedback/details/540092/sql-server-2008-sp1-cu6-periodically-does-not-accept-connections

So, I thought it was high time that we pen a blog post on when this message can be safely ignored and when it is supposed to raise alarm bells. Before I get into the nitty-gritty details, let me explain under what condition 18056 is raised with the state = 29.

Most applications today make use of connection pooling to reduce the number of times a new connection needs to be opened to the backend database server. When the client application reuses a pooled connection to send a new request to the server, SQL Server performs certain operations to facilitate the connection reuse. During this process (we shall call it Redo Login for this discussion) if any exception occurs, we report an 18056 error. The state numbers, like those of the famous 18456: Login Failed error message, give us more insight into why the Redo Login task fails. State 29 occurs when there is an Attention received from the client while the Redo Login code is being executed. This is when you would see the message below which has plagued many a mind till date on SQL Server 2008 instances:

2009-02-19 04:40:03.41 spid58 Error: 18056, Severity: 20, State: 29.

2009-02-19 04:40:03.41 spid58 The client was unable to reuse a session with SPID 58, which had been reset for connection pooling. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

Is this a harmful message?

The answer that always brings a smile to my face: It depends! The dependency of this error message being just plain noise to something that should send all the admins in the environment running helter-skelter can be summarized in one line.

If the above error message (note that the state number should reflect 29) is the only message in the SQL Server Errorlog along with no other errors noticed in the environment (connectivity failures to the SQL instance in question, degraded performance, high CPU usage, Out of Memory errors), then this message can be treated as benign and safely ignored.

Why is this message there?

Well our intentions here were noble and we didn’t put the error message out there to create confusion. This error message is just reporting that a client is reusing a pooled connection and when the connection was reset, the server received an attention (in this case, a client disconnect) during the connection reset processing on the server side. This could be due to either a performance bottleneck on the server/environment or a plain application disconnect. The error message is aimed at helping in troubleshooting the first category of problems. If you do see some other issues at the same time though, these errors may be an indicator of what is going on at the engine side.

What should you do when you see your Errorlog bloating with these error messages?

a.       The foremost task would be to scan the SQL Errorlog and determine if this error message is accompanied before/after by some other error message or warning like Non-yielding messages, Out of Memory (OOM) error message (Error 701, Failed Allocate Pages etc.).

b.      The next action item would be to determine if there is high CPU usage on the server or any other resource bottleneck on the Windows Server. Windows Performance Monitor (Perfmon) would be your best friend here.

c.       Lastly, check if the Network between the Client and Server is facing any latency issues or if network packet drops are occurring frequently. A Netmon trace should help you here.
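
Step (a) is easy to automate as a first pass. The Python sketch below counts the 18056 State 29 entries in an Errorlog and lists every other error header it finds, so you can see at a glance whether the message is alone. It assumes the "Error: N, Severity: N, State: N" header format shown earlier. Warnings that are logged without such a header (non-yielding scheduler messages, for example) are not detected, so this does not replace reading the log.

```python
import re

ERROR_HEADER = re.compile(r"Error:\s*(\d+),\s*Severity:\s*(\d+),\s*State:\s*(\d+)")

def classify_errorlog(lines):
    """Return (count of 18056/State 29 entries, list of all other error headers)."""
    state29 = 0
    others = []
    for line in lines:
        match = ERROR_HEADER.search(line)
        if not match:
            continue
        number, severity, state = (int(g) for g in match.groups())
        if number == 18056 and state == 29:
            state29 += 1
        else:
            others.append((number, severity, state))
    return state29, others

# First line is from the post; the 701 line is a made-up example of an accompanying error.
log = [
    "2009-02-19 04:40:03.41 spid58 Error: 18056, Severity: 20, State: 29.",
    "2009-02-19 04:40:05.10 spid61 Error: 701, Severity: 19, State: 123.",
]

count, others = classify_errorlog(log)
print(count, others)
```

An empty second list is only one of the conditions for treating the message as noise; the CPU, memory and network checks in (b) and (c) still apply.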

 

Tejas Shah

Escalation Engineer - Microsoft

My Kerberos Checklist…


I’ve had numerous questions regarding Kerberos, both internally within Microsoft and with Customers.  It continues to be a complicated topic and the documentation that is out there can be less than straightforward.  Based on some previous items I’ve worked on, I wanted to share my experience in regards

Let me start by looking at two scenarios for reference.  One that is basic and the other that is complex.

image

image

As you’ll find, once we figure out how to configure the basic scenario, the complex scenario ends up being very similar.

Data Collection:

The first thing when you try to tackle a Kerberos issue is to understand your environment.  I find that a lot of the Kerberos issues that I troubleshoot all come down to gathering the right information to make an informed analysis and identify the problem point.  The following data points relate to all servers involved.  We will circle back on the Client after we talk about the Servers.

  1. Know your topology
  2. What is the Service Account being used for the application in question?
  3. What Service Principal Name (SPN) does your app require?
  4. What SPNs are defined for that service account?
  5. What are the delegation settings for the service account?
  6. Local Policy related information
  7. Additional application specific information

 

Consistent vs. Intermittent Kerberos Issues

The data collection points above should allow you to get Kerberos working in most cases.  I say most cases because the above refers specifically to configuration.  I typically break it down to consistent vs. intermittent issues.  If the issue is reproducible every time, it is a configuration issue.  If it is intermittent, then it is usually not a configuration issue.  If it were, it would happen all the time.  Intermittent means it works most of the time.  In order to work at all, it has to be configured correctly.  The exception to this would be if you are in a Farm type situation and the configuration is not the same on every box in the farm.  Sometimes you may hit Server A which is configured properly, and another time you may hit Server B which is not and causes an error.  Which brings us to the first Data Collection Point…

Know your topology

Before you begin, you should know what servers are involved in your application as a whole.  If we are talking about a single web application, you probably have at least two servers to consider and know about – the Web Server and the Backend (SQL for our purposes).  They both play a part.  This becomes even more important in a distributed environment where you may have 3+ servers.

As you’ll see, with the data collection items, we basically will walk the line down your servers to check them one by one.

What is the Service Account?

For the particular server you are looking at, what is the service account that the application is using?  This is important, because this will tell us where the SPN needs to go.  It also plays a part in Delegation.  Not every service will be a Windows Service, so this could be dependent on the application in question.  Here are some examples:

SharePoint

IIS – not a windows service

image

Reporting Services

Windows Service

image

SQL Server

Windows Service

image

For windows services, you can also look in the Services MMC to get the information.  Again, you need to know what your application is doing:

image

What SPN does your app require?

We can look at all sorts of SPN listings, but before you do, we need to know what it is we are looking for.  I think this is one of the more complicated parts of Kerb configuration because the SPN is dependent on the application you are using.  The format of the SPN is consistent between applications, but what is required is dependent on the application, or from an SPN point of view, the service.  It is a Service Principal Name after all!

The SPN has the following format:  <service>/<host>:<port/name>

The port/name piece of this is optional and dependent on what the service will accept.
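
To make the format concrete, here is a small Python sketch that splits an SPN into its three pieces. The `Spn` type and `parse_spn` function are names invented for this example, and the parsing is deliberately simple: it splits on the first "/" and the first ":".

```python
from typing import NamedTuple, Optional

class Spn(NamedTuple):
    service: str
    host: str
    suffix: Optional[str]  # port or instance name; None when omitted

def parse_spn(spn):
    """Split '<service>/<host>:<port/name>' into its components."""
    service, slash, rest = spn.partition("/")
    if not slash or not service or not rest:
        raise ValueError(f"not an SPN: {spn!r}")
    host, _, suffix = rest.partition(":")
    return Spn(service, host, suffix or None)

print(parse_spn("MSSQLSvc/ASKJCTP3.dsdnet.local:1433"))
print(parse_spn("http/passsp.pass.local"))
```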

HTTP – For a default configuration, the port is never used for an HTTP SPN.  SPN’s are unique and if you add an HTTP SPN with a port on it, it will be ignored as it is not correct.  IIS and Internet Explorer do not affix the port number to the SPN request when they look for it.  From an Internet Explorer perspective, you can alter this behavior via a registry key to where it will, but I have yet to see anyone do that.  Most people aren’t aware of it from what I can tell.  From my experience, I would stay away from adding a port to an HTTP SPN.

MSSQLSvc – you can look at the following blog post to read more about how SQL determines the SPN needed.  http://blogs.msdn.com/b/psssql/archive/2010/03/09/what-spn-do-i-use-and-how-does-it-get-there.aspx

For the next couple of items, we will use the SharePoint service as the example – spservice.  In this case it is a web application, so we know it will use the HTTP service from an SPN perspective.  The host piece is dependent on how we are connecting to the web server.  This is true for any application really.  From an HTTP perspective it is the URL, for SQL it is the connection string.  Another thing to know is that both IIS and SQL will resolve a NetBIOS name to the Fully Qualified Domain Name if they can.  For example – http://passsp will be resolved to passsp.pass.local.

For our spservice example with a url of http://passsp, our SPN turns out to be http/passsp.pass.local and it is placed on the spservice account.

Another special note about HTTP SPNs.  If for example my SharePoint AppPool (service) was using Network Service, this is considered the machine context so the SPN would go on the machine account (PASSSP).  However, HTTP is considered a covered service for a special service type called HOST.  Every Machine account has a HOST entry for the FQDN as well as the NetBIOS name.  You don’t need to add an HTTP SPN on the machine account as long as your URL matches the machine name.

When adding an SPN, I also always recommend that you add both the FQDN SPN (i.e. http/passsp.pass.local) as well as the NetBIOS SPN (i.e. http/passsp).  The NetBIOS SPN is a safety measure in case the DNS resolution fails and it just submits the NetBIOS SPN request.
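
As a sketch of that recommendation, the Python below derives both HTTP SPNs from a URL and prints the SetSPN commands to register them. It only builds strings and does not touch Active Directory. The DNS suffix is passed in by the caller rather than resolved, and the use of `setspn -S` (the add-with-duplicate-check switch in the Windows 2008 and later tool) is my choice for the example, not something the checklist above prescribes.

```python
from urllib.parse import urlsplit

def http_spns(url, dns_domain):
    """Return [FQDN SPN, NetBIOS SPN] for the host in the URL; any port is dropped."""
    host = urlsplit(url).hostname
    netbios = host.split(".")[0]
    fqdn = host if "." in host else f"{host}.{dns_domain}"
    return [f"http/{fqdn}", f"http/{netbios}"]

def setspn_commands(spns, account):
    """Command lines that would register each SPN on the service account."""
    return [f"setspn -S {spn} {account}" for spn in spns]

for command in setspn_commands(http_spns("http://passsp", "pass.local"), "spservice"):
    print(command)
```

Dropping the port matches the earlier advice about staying away from ports on HTTP SPNs.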

What SPN is defined?

Now that we know the service account and what our SPN should be, we can look at the SPNs that are defined on that account.  We can use SetSPN to do this, although there are other tools that can help get this information for you (ADSIEdit, LDAP queries, etc…).  SetSPN is nice though as it ships with the Operating System starting with Windows 2008.  Let’s have a look at our SharePoint Service account – spservice:

image

Based on what we came up with above, we can see that the passsp SPN’s are in place.  You’ll also notice another SPN present, which means this Service Account is hosting two HTTP Services (could be two AppPools on the one server, or on two separate servers). 

You could run into a situation where the SPN is defined on another account as well.  This may be a misplaced or a duplicate SPN.  Both will cause an issue for you.  Usually when I grab SPN information from an environment, I grab all SPN’s defined in the Domain so that I can look for misplaced or duplicate SPNs.  The SetSPN tool that comes with Windows 2008 and later (and can be downloaded for Windows 2003), contains a new switch that will look for Duplicates for you.  It is the –X switch.

image

In the above, you can see two accounts that had the http/passsp.pass.local SPN.  You can then decide which one really needs to be there based on the Service Account being used. 
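
If you have already dumped every SPN in the domain, the same duplicate check is easy to run offline. The Python sketch below takes a mapping of account name to SPN list and reports any SPN that appears on more than one account, comparing case-insensitively. The account names in the example are hypothetical.

```python
from collections import defaultdict

def find_duplicate_spns(spns_by_account):
    """Return {spn: [accounts]} for every SPN registered on more than one account."""
    owners = defaultdict(set)
    for account, spns in spns_by_account.items():
        for spn in spns:
            owners[spn.lower()].add(account)
    return {spn: sorted(accts) for spn, accts in owners.items() if len(accts) > 1}

# Hypothetical dump: the same HTTP SPN sits on two accounts.
dump = {
    "spservice": ["http/passsp.pass.local", "http/passsp"],
    "oldservice": ["HTTP/passsp.pass.local"],
}

print(find_duplicate_spns(dump))
```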

What are the delegation settings?

Delegation only comes into play if you want the Client’s Windows credentials forwarded to another service.  For example, SharePoint to Reporting Services, Reporting Services to SQL, or even SQL to SQL in a Linked Server scenario.  NTLM does not allow for the forwarding of credentials.  This is accomplished through the process of delegation as part of the Kerberos Protocol. There are two main types of Delegation – Full Trust or Constrained Delegation.  Of note, you will not see the Delegation Tab on the Account within Active Directory unless an SPN has been assigned to that account.

Full Trust

This means that the given service can forward the Client’s credentials to any service.  You are indiscriminate in who you communicate with.  This is the less secure option of the two, but it is also the easier one to configure (which I would expect with it being less secure – Secure always means complicated, right?)

image

Constrained Delegation

Constrained means that you are going to specify which services you can actually delegate to.  The services are represented by SPN’s.  This is the more secure approach but has some drawbacks.  As mentioned before, it is more complicated.  The reason is that you have to know exactly what your application is trying to delegate to.  It may not be just the service you are interested in.  For example, you may be configuring SharePoint for Delegation to go to Reporting Services, but then realize that you just broke a connection to SQL or maybe a connection to some web service that you are trying to hit that requires Kerberos.  It’s not really that bad as long as you understand everything that your application is going to reach out to and that would require passing on the Client’s credentials.

The other drawback to Constrained Delegation is that you lose the ability to cross a domain boundary.  Meaning a cross domain scenario will fail from a delegation perspective.  Users from another Domain can hit your application, but all of the services that you are communicating to need to be in the same domain.  For example, SharePoint (Domain A) cannot delegate to SQL (Domain B).  Under constrained delegation, that will fail.

In the image below, the 3rd radio dial means that you want to use Constrained Delegation.  The sub radio dials define whether you want to use all Kerberos, or if you want to enable Protocol Transitioning.  I’m not going to get into Protocol Transitioning in this blog post as it is big enough, but you will have to deal with Protocol Transitioning if you are using the Claims to Windows Token Service.  This would come into the picture if you are doing anything with Excel Services in SharePoint or PowerPivot.

image

 

You will need to go back to your application’s topology to determine if enabling delegation is required. If we look at our Double Hop example from above, Reporting Services would need to have delegation enabled for its service account, but SQL would not as SQL isn’t going out to anything using the Client’s credentials.
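
One way to reason about this is to write the topology down as "who calls whom with the Client’s credentials" and read the delegation settings off of it. The Python sketch below does that: any service with downstream calls needs delegation, and under Constrained Delegation its allowed list is the set of SPNs of those downstream services. The service names and the SPN in the example are hypothetical.

```python
def delegation_plan(calls, spn_of):
    """Map each service that forwards credentials to the SPNs it must be allowed to reach.

    calls:  {service: [downstream services reached with the client's credentials]}
    spn_of: {service: SPN that clients request for that service}
    """
    plan = {}
    for service, downstream in calls.items():
        if downstream:
            plan[service] = [spn_of[target] for target in downstream]
    return plan

# Double hop: client -> Reporting Services -> SQL Server.
calls = {"Reporting Services": ["SQL Server"], "SQL Server": []}
spn_of = {"SQL Server": "MSSQLSvc/passsql.pass.local:1433"}  # hypothetical SPN

print(delegation_plan(calls, spn_of))
```

SQL Server does not appear in the plan because it is the last hop and forwards nothing.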

Local Policy Settings

There is at least one Local Policy setting you’ll need to pay attention to when trying to delegate.  That is the “Impersonate a client after authentication” policy.

image

If your middle server is a web server, you can take advantage of a built-in group that has this permission.  For Windows 2003, the group is called IIS_WPG.  For Windows 2008 and later it is the IIS_IUSRS group.  By default, SharePoint and RS should place themselves in that group.  So, you usually don’t have to worry about it.  I’m just mentioning it here as a step in the checklist.  I rarely see this as the issue though unless you are doing a custom application with a Domain User account for the service account.

Client

Let’s circle back on the Client.  You may be asking: all this is great for the application, but is there anything special I need to do for the User Account coming from the client?  Not really.  By default you should be good to go from the Client’s user account.  However, there is an account setting you should be aware of within Active Directory: the “Account is sensitive and cannot be delegated” setting.  If that is checked, you will have issues with that specific user.  To this date, I have yet to see a customer actually have that checked.  Doesn’t mean people don’t do it.  I just haven’t seen it.

image

Application Specific Settings

When I started getting into Kerberos, I found that almost all of the issues were based on the Active Directory settings (SPN, Delegation, etc…).  Not to say that that has lessened, but I’ve also seen a shift in the complexity of getting specific applications up and running.  As applications become more complex, you should be aware of what settings may come into play within that app that could affect Kerberos.  If you have gone through everything above and it all looks good, chances are that there is an application specific setting that is interfering.

There is a lot to mention in this area, so I will spin up another blog post to discuss application specific settings to touch on IIS, SharePoint, Excel Services, PowerPivot and Reporting Services.  SQL doesn’t really have any Kerb specific settings as long as the SPN and delegation settings (if needed) are in place.

Tying it together…

So, we’ve looked at what my checklist is, but it was really focused on one service. What I’ve found is that it is as simple as that.  All I do is repeat the checklist on each server that plays a part in the application (topology).  Think of it as a wash, rinse, repeat.  When I help customers to get Kerberos configured, I just walk the line down each server to make sure everything lines up.  I have been fairly successful with that approach.  As I’ve had more experience with it (as I usually deal with it every day), I can usually target a specific segment depending on where the error is coming from.  Other times it may not be that straightforward.  Even when I target a specific area, if that doesn’t pan out, I just start from the beginning and apply the checklist to each server/service that is playing a part.

Once you approach it that way, it really doesn’t matter how many hops there are or what services are involved.  You just follow the checklist one more time.  The point where complications usually come into play is when Constrained Delegation is implemented and we didn’t account for everything, or when you hit up against an App Specific issue.  Outside of that, it is usually straightforward based on the above.  Just find out what the SPN needs to be and where it needs to go and you are 80% there.

I realize I’m making it sound simple when it can be very frustrating and complicated, but the above has worked well for me in the past. Hopefully the above is helpful to you as you try to implement Kerberos within your environment. 

There is definitely way more to cover on this topic and I will continue to blog about those items.

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

How It Works: Error 18056 - The client was unable to reuse a session with SPID ##, which had been reset for connection pooling


This message has come across my desk a couple of times in the last week and when that happens I like to produce blog content.  

The error occurs when you are trying to use a pooled connection and the reset of the connection state encounters an error.   Additional details are often logged in the SQL Server error log, but the 'failure ID' is the key to understanding where to go next.

Event ID:           18056

Description:     The client was unable to reuse a session with SPID 157, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

Map the failure ID to the following (SQL 2008 and SQL 2008 R2 failure id states)

 

        Default,                      1

        GetLogin1,                    2

        UnprotectMem1,                3

        UnprotectMem2,                4

        GetLogin2,                    5

        LoginType,                    6

        LoginDisabled,                7

        PasswordNotMatch,             8

        BadPassword,                  9

        BadResult,                    10

        CheckSrvAccess1,              11

        CheckSrvAccess2,              12

 

        LoginSrvPaused,                  13

        LoginType,                       14

        LoginSwitchDb,                   15

        LoginSessDb,                     16            

        LoginSessLang,                   17

        LoginChangePwd,                  18

        LoginUnprotectMem,               19

 

        RedoLoginTrace,                  20

        RedoLoginPause,                  21

        RedoLoginInitSec,                22

        RedoLoginAccessCheck,            23

        RedoLoginSwitchDb,               24

        RedoLoginUserInst,               25

        RedoLoginAttachDb,               26

        RedoLoginSessDb,                 27     

        RedoLoginSessLang,               28

        RedoLoginException,              29             (Kind of generic but you can use dm_os_ring_buffers to help track down the source and perhaps -y)

 

        ReauthLoginTrace,                30

        ReauthLoginPause,                31

        ReauthLoginInitSec,              32

        ReauthLoginAccessCheck,          33

        ReauthLoginSwitchDb,             34

        ReauthLoginException,            35

                           Login assignments from master

        LoginSessDb_GetDbNameAndSetItemDomain,           36

        LoginSessDb_IsNonShareLoginAllowed,              37

        LoginSessDb_UseDbExplicit,                       38

        LoginSessDb_GetDbNameFromPath,                   39

        LoginSessDb_UseDbImplicit,                       40      (I can cause this by changing the default database for the login at the server)

        LoginSessDb_StoreDbColl,                         41

        LoginSessDb_SameDbColl,                          42

        LoginSessDb_SendLogShippingEnvChange,            43

 

                                Connection string values

 

        RedoLoginSessDb_GetDbNameAndSetItemDomain,       44

        RedoLoginSessDb_IsNonShareLoginAllowed,          45

        RedoLoginSessDb_UseDbExplicit,                   46      (Database specified in the connection string, Database=XYX, no longer exists)

        RedoLoginSessDb_GetDbNameFromPath,               47

        RedoLoginSessDb_UseDbImplicit,                   48

        RedoLoginSessDb_StoreDbColl,                     49

        RedoLoginSessDb_SameDbColl,                      50

        RedoLoginSessDb_SendLogShippingEnvChange,        51  

  

                                Common Windows API calls

 

        ImpersonateClient,                            52

        RevertToSelf,                                 53

        GetTokenInfo,                                 54

        DuplicateToken,                               55

        RetryProcessToken,                            56

        inChangePwdErr,                               57

        WinAuthOnlyErr,                               58
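
The table above can be turned into a small lookup for reading error logs. This is only a sketch: the names and numbers are copied from the SQL 2008 / SQL 2008 R2 list above, while the function name `describe_18056` and the regular expression are my own assumptions about the message layout shown in this post.

```python
import re

# Names in the order given in the table above; the first entry is failure ID 1.
_NAMES = """
Default GetLogin1 UnprotectMem1 UnprotectMem2 GetLogin2 LoginType
LoginDisabled PasswordNotMatch BadPassword BadResult CheckSrvAccess1
CheckSrvAccess2 LoginSrvPaused LoginType LoginSwitchDb LoginSessDb
LoginSessLang LoginChangePwd LoginUnprotectMem RedoLoginTrace
RedoLoginPause RedoLoginInitSec RedoLoginAccessCheck RedoLoginSwitchDb
RedoLoginUserInst RedoLoginAttachDb RedoLoginSessDb RedoLoginSessLang
RedoLoginException ReauthLoginTrace ReauthLoginPause ReauthLoginInitSec
ReauthLoginAccessCheck ReauthLoginSwitchDb ReauthLoginException
""".split()

# IDs 36-43 and 44-51 share the same eight suffixes.
_SUFFIXES = [
    "GetDbNameAndSetItemDomain", "IsNonShareLoginAllowed", "UseDbExplicit",
    "GetDbNameFromPath", "UseDbImplicit", "StoreDbColl", "SameDbColl",
    "SendLogShippingEnvChange",
]
_NAMES += [f"LoginSessDb_{s}" for s in _SUFFIXES]
_NAMES += [f"RedoLoginSessDb_{s}" for s in _SUFFIXES]
_NAMES += ("ImpersonateClient RevertToSelf GetTokenInfo DuplicateToken "
           "RetryProcessToken inChangePwdErr WinAuthOnlyErr").split()

FAILURE_IDS = {i: name for i, name in enumerate(_NAMES, start=1)}

# Assumed message shape: "... SPID <n> ... The failure ID is <n>."
_PATTERN = re.compile(r"SPID\s+(\d+).*?failure ID is (\d+)", re.DOTALL)


def describe_18056(message):
    """Return (spid, failure_id, name) for an 18056 message, or None."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    spid, failure_id = int(match.group(1)), int(match.group(2))
    return spid, failure_id, FAILURE_IDS.get(failure_id, "Unknown")
```

Feeding it the description text from the event above would give SPID 157 with failure ID 29 mapped to RedoLoginException.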

 

Error: 18056  Severity: 20  State: 46.

The client was unable to reuse a session with SPID 1971  which had been reset for connection pooling. The failure ID is 46. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

State 46 = x_elfRedoLoginSessDb_UseDbExplicit = 0n46

 

There is only one place in the code that sets this state (we are simply trying to execute a usedb and getting a failure), and it is after we have printed message 4060 to the client, indicating that we could not open the database or that the user does not have permissions to the database.    Since there are no messages about a database going offline or being recovered, and this connection was already established, the question becomes: “Would there have been any permission changes at this time to prevent this login from accessing the database?”

 

I tried this with a test application.

 

Connection pool using database dbTest

User RDORRTest with default database dbTest

 

When I drop the user in the database dbTest the client starts getting the errors as I expected to see.

 

07/28/10 07:56:45.391 [0x00001E5C] SQLState: 28000, Native Error: 18456 [Microsoft][SQL Server Native Client 10.0][SQL Server]Login failed for user 'RDORRTest'.

07/28/10 07:56:45.410 [0x00001E5C] SQLState: 42000, Native Error: 4064 [Microsoft][SQL Server Native Client 10.0][SQL Server]Cannot open user default database. Login failed.

 

My SQL Server error log shows

 

2010-07-28 08:02:40.41 Logon       Error: 18456, Severity: 14, State: 50.

2010-07-28 08:02:40.41 Logon       Login failed for user 'RDORRTest'. Reason: Current collation did not match the database's collation during connection reset.

2010-07-28 08:02:40.41 spid53      Error: 18056, Severity: 20, State: 50.

2010-07-28 08:02:40.41 spid53      The client was unable to reuse a session with SPID 53, which had been reset for connection pooling. The failure ID is 50. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

 

A password change for the login at the server generated state 8.

If I rename the database I don’t get any information about the rename in the error log and I start getting connection failures.

 

All my attempts so far had been when the login was set up with a default database.  However, to get to the 46 condition I had to specify the DATABASE for the connection string.

 

Now all I had to do was drop the user from the database and I get state 46.

 

2010-07-28 08:29:51.61 Logon       Error: 18456, Severity: 14, State: 46.

2010-07-28 08:29:51.61 Logon       Login failed for user 'RDORRTest'. Reason: Failed to open the database configured in the login object while revalidating the login on the connection. [CLIENT: 65.53.66.207]

 

Added the user back and I no longer get the error and the connections continue their work.

 

Bob Dorr - Principal SQL Server Escalation Engineer

How It Works: Error 18056 - The client was unable to reuse a session - Part 2


I have had several questions on my blog post: http://blogs.msdn.com/b/psssql/archive/2010/08/03/how-it-works-error-18056-the-client-was-unable-to-reuse-a-session-with-spid-which-had-been-reset-for-connection-pooling.aspx related to SQL Server 2008's honoring of a query cancel (attention) during the processing of the reset connection.  This blog will augment my prior post.

Facts

  • You will not see the sp_reset_connection on the wire when tracing the network packets.   It is only a bit set in the TDS header and not RPC text in the packet.
  • sp_reset_connection is an internal operation and generates RPC events to show its activity.
  • Newer builds of SQL Server added logical disconnect and connect events. http://blogs.msdn.com/b/psssql/archive/2007/03/29/sql-server-2005-sp2-trace-event-change-connection-based-events.aspx
  • An attention from the client (specific cancel or query timeout) records the time it arrives (out-of-band), but the attention event is not produced until the query has ceased execution and honored the attention.   This makes the start time of the attention the received time, the end time the time the attention was completely honored, and the duration how long it took to interrupt the execution, handle rollback operations if necessary and return control of the session to the client.

The questions normally center around the Error 18056, State 29 and how one can encounter it.   I have outlined the high level flow in the diagram below for producing the error.

The application will reuse a connection from the pool.   When this occurs the client driver will set the reset bit in the TDS header when the next command is executed.  In the diagram I used an ODBC example of SQLExecDirect.

  1. The command is received at the SQL Server, assigned to a worker and begins processing.  If the reset bit is located the sp_reset_connection logic is invoked.
  2. When tracing, the RPC:Starting and logical Disconnect events are produced.
  3. The login is redone; checking permissions, making sure password has not expired, database still exists and is online, user has permission in the database and other validations take place.  
  4. Client explicitly cancels (SQLCancel) or query timeout is detected by client drivers and an attention is submitted to the SQL Server.   The attention is read by the SQL Server, starting time captured and the session is notified of the cancel request, STOP!  (Note:  This is often a point of confusion.  The overall query timeout applies to reset login and execution of the query in this scenario.)
  5. During all these checks the logic will also check to see if a query cancellation (attention) has arrived.  If so, the Redo Login processing is interrupted, the 18056 is reported and processing is stopped.
  6. The attention event is always produced after the completed event.  (When looking at a trace look for the attention event after the completed event to determine if the execution was cancelled.)  This allows the attention event to show the duration required to honor the attention.  For example, if XACT_ABORT is enabled (SET XACT_ABORT ON) an attention will upgrade to a rollback of the transaction.   If it was a long running transaction the rollback processing could be significant.   Without XACT_ABORT the attention interrupts processing as quickly as possible and leaves the transaction active.   The client is then responsible for the scope of the transaction.

image

The "If Cancelled" used by Redo Login is where the change occurs between SQL 2005 and SQL 2008.   The cancel was not checked as frequently in SQL 2005 so it was not honored until the command execution started.   SQL Server 2008 will honor the attention during the redo login processing. 
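
To make the "If Cancelled" idea concrete, here is a toy model of the flow in the numbered steps. It is not SQL Server code: the `Attention` exception, the `redo_login` function and the three check names are hypothetical, and a `threading.Event` stands in for the out-of-band attention. The only point is that a cancel flag polled between validation steps stops the redo login before the command itself ever runs.

```python
import threading


class Attention(Exception):
    """Hypothetical signal: a cancel was noticed during the redo login."""


def redo_login(checks, cancelled):
    """Run each (name, check) pair, polling the cancel flag before every step."""
    completed = []
    for name, check in checks:
        if cancelled.is_set():
            raise Attention(f"cancel seen before '{name}'; completed: {completed}")
        check()
        completed.append(name)
    return completed


cancelled = threading.Event()
checks = [
    ("permissions", lambda: None),
    ("password expiry", cancelled.set),  # simulate the attention arriving here
    ("database online", lambda: None),
]

try:
    redo_login(checks, cancelled)
    outcome = "login redone"
except Attention as exc:
    outcome = f"18056-style failure: {exc}"
```

A version that only polled the flag once, after the loop, would behave more like the SQL 2005 description: the cancel would not be honored until command execution started.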

Here is an example that I received that shows the behavior.   Notice that the execution (rs.Open) is done asynchronously so control returns to the client as soon as the query is put on the wire to the SQL Server.   The cn.Cancel following the rs.Open will submit the attention for the request that was traveling to the SQL Server.   This will produce the same pattern as shown in the diagram above, interrupting the Redo Login.  If you were not using pooled connections the reset activity would not be taking place and the query itself would be interrupted.

dim cn
dim rs
dim i

set cn = CreateObject("ADODB.Connection")
set rs = CreateObject("ADODB.Recordset")

for i = 1 to 1000
                ' After the first pass each Open reuses a connection from the pool
                cn.Open "Provider=SQLNCLI10;Integrated Security=SSPI;Data Source=SQL2K8Server; initial catalog =whatever;"
                rs.ActiveConnection = cn
                rs.CursorLocation = 2              ' 2 = adUseServer
                ' 48 = adAsyncExecute (16) + adAsyncFetch (32)
                rs.Open "select * from whatever", cn, 0, 1, 48
                ' Submit the attention while the request is still in flight
                cn.Cancel
                cn.Close
next

Internally an attention is raised as a 3617 error and handled by the SQL Server error handlers to stop execution of the request.   You can see the 3617 errors in sys.dm_os_ring_buffers.  You can watch them with the trace exception events as well.

<Record id= "1715" type="RING_BUFFER_EXCEPTION" time="12558630"><Exception><Task address= 0x11B4D1B88</Task><Error>3617</Error><Severity>25</Severity><State>23</State><UserDefined>0</UserDefined></Exception><Stack
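
If you have pulled a batch of these records out of the ring buffer, a few lines of script can pick out the attentions. This sketch assumes the record text looks like the sample above; it uses regular expressions rather than an XML parser because the sample is cut off mid-record. The function names are my own.

```python
import re

# The (truncated) sample record from this post, kept exactly as shown.
SAMPLE = ('<Record id= "1715" type="RING_BUFFER_EXCEPTION" time="12558630">'
          '<Exception><Task address= 0x11B4D1B88</Task><Error>3617</Error>'
          '<Severity>25</Severity><State>23</State>'
          '<UserDefined>0</UserDefined></Exception><Stack')


def exception_fields(record_text):
    """Pull Error, Severity and State out of a RING_BUFFER_EXCEPTION record."""
    fields = {}
    for tag in ("Error", "Severity", "State"):
        match = re.search(rf"<{tag}>(\d+)</{tag}>", record_text)
        if match:
            fields[tag] = int(match.group(1))
    return fields


def is_attention(record_text):
    """True when the record carries error 3617, the internal attention error."""
    return exception_fields(record_text).get("Error") == 3617
```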

Bob Dorr - Principal SQL Server Escalation Engineer

SharePoint Adventures : Using Kerberos with the Report Server


Previous Post: SharePoint Adventures : Setting up Reporting Services with SharePoint Integration

In the previous post, I walked through getting RS 2008 R2 Integrated with SharePoint 2010. What I didn't touch on was how to get this to work with Kerberos. Kerberos itself can be complicated. This is partly because you need to track so many things. And, as the deployment becomes more distributed, you have to track more things.

A while back, I posted a blog post describing my Kerberos Checklist. I'll use this as we step through my SharePoint deployment to get Kerberos working in this environment.

Before we get into the details, there is one piece I want to point out that is special with SharePoint 2010. The authentication model that you select for your site makes a big difference in whether this will work or not. SharePoint allows you to choose between Classic and Claims. If you choose to have a Claims site, you will not be able to get Kerberos to work with RS 2008 R2 when integrated with SharePoint 2010. If the site is Claims based, you won't be able to change it back either. Part of the reason why Kerberos won't work is because when we detect you are in a Claims site, we always go with Trusted Authentication from the RS Perspective. This means that a Windows Token will not be passed from SharePoint to the Report Server. An SPUser token will be passed instead. My next post will go into how you can determine if your site is Classic or Claims.

That being said, let's dive in…

For this setup, I only have 3 servers involved.

Service             Server Name     Service Account         Delegation Required   Custom App Settings?
SharePoint          DSDContosoSP    DSDCONTOSO\spservice    Yes                   Yes
Reporting Services  DSDContosoRS    DSDCONTOSO\rsservice    Yes                   Yes
SQL Server          DSDContosoSQL   DSDCONTOSO\sqlservice   No                    No

SharePoint

Service Principal Name (SPN)

Because we are going to do Kerberos we are going to need some SPNs. In the table above, we know that the SharePoint Service account is DSDCONTOSO\spservice. This means that those SPNs will need to go on that user account (DSDCONTOSO\spservice) and not the Machine Account (DSDContosoSP). Had we been using LocalSystem or Network Service, the SPNs would have gone on the Machine Account. That's always how you figure out where the SPNs go. It's always based on the context that the Service is running under.
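
The rule in the paragraph above fits in a few lines. This is just an illustration: the function name `spn_account` and the set of built-in account spellings are my own assumptions, not an exhaustive list.

```python
# Spellings of the built-in accounts this sketch recognizes (an assumption,
# not a complete list of how they can appear in service configuration).
BUILT_IN_ACCOUNTS = {
    "localsystem",
    "local system",
    "network service",
    "nt authority\\system",
    "nt authority\\network service",
}


def spn_account(service_account, machine_account):
    """Return the account the SPN belongs on, based on the service context."""
    if service_account.strip().lower() in BUILT_IN_ACCOUNTS:
        # Built-in identities authenticate on the network as the machine.
        return machine_account
    # A domain user account carries its own SPNs.
    return service_account
```

For the SharePoint box in this walkthrough, `spn_account("DSDCONTOSO\\spservice", "DSDContosoSP")` points at the spservice user account, while a Network Service context would point at the DSDContosoSP machine account.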

So, let's have a look at DSDCONTOSO\spservice. We'll use SETSPN to do that. Starting with Windows 2008, SETSPN ships with the operating system and is available directly from a command prompt.

clip_image001

We can see that there are no SPNs registered on the spservice account. You'll also notice that I can run SETSPN with or without the Domain Name. This is because I only have a single domain. If you had multiple domains, supplying the domain name tells SETSPN which domain's account you want to modify. This really is helpful if you have the same account name in multiple domains. Going forward, I will leave out the domain name.

As a quick check, I also look at the Machine Account (DSDContosoSP). This is just a double check to make sure I won't run into a duplicate situation.

clip_image002

Because SharePoint is a web application, we are interested in the HTTP SPN. We do not see one listed on the Machine account. You can take note of the HOST SPN though. This will be found on any Machine Account. You should never see these on a User account. They get created when a Machine Account is created. The HOST SPN will cover the HTTP service if you are running within the context of the Machine Account (LocalSystem or Network Service). Had that been the case, we wouldn't have needed an HTTP SPN. But, because we are using the spservice user account, we will need to put an HTTP SPN on the spservice account.

NOTE: Domain Admin permissions are required to add (-a) or delete (-d) an SPN. Anyone can list (-l) out an SPN.

clip_image003

We added two SPNs to the spservice account: HTTP/dsdcontososp and HTTP/dsdcontososp.dsdcontoso.local. HTTP SPNs are based on the URL that you are going to use. In this case we are just using the machine name for the URL (http://dsdcontososp). Internet Explorer will convert this to the Fully Qualified Domain Name (FQDN) when it builds out the required SPN. So the SPN request for that URL would be HTTP/dsdcontososp.dsdcontoso.local. And we have just added that.

NOTE: HTTP SPNs should NOT have a port listed. They are purely based on the host within the URL without the port number. This means that if you have two sites running on different ports, you should use the same Service Account for both as you will have a shared SPN between the two. For example: http://dsdcontososp & http://dsdcontososp:5555 both use the following SPN: HTTP/dsdcontososp.

What about the Netbios SPN? Well, I always add that for good measure. Hopefully it will never be needed. But, on the off chance that the name lookup fails, we will be covered. When we do the reverse name lookup to get the FQDN, we have to go out to the DNS Server to do that. If for whatever reason the DNS Lookup fails, we will just use the netbios name. So, the SPN would look like HTTP/dsdcontososp. The fact that we added both means I won't be hit by intermittent DNS issues and end users won't be interrupted unnecessarily. So, my take on it is to always add both the NETBIOS and FQDN SPNs.
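
Here is a small sketch of how the two HTTP SPNs fall out of a URL. The function name `http_spns` and the `dns_suffix` parameter are hypothetical; the sketch also assumes that a host name containing a dot is already the FQDN.

```python
from urllib.parse import urlsplit


def http_spns(url, dns_suffix):
    """Return the NETBIOS and FQDN HTTP SPNs for the host in a URL."""
    # .hostname lowercases the host and drops any :port, which matches the
    # rule that HTTP SPNs never carry a port number.
    host = urlsplit(url).hostname
    fqdn = host if "." in host else f"{host}.{dns_suffix}"
    netbios = fqdn.split(".")[0]
    return [f"HTTP/{netbios}", f"HTTP/{fqdn}"]
```

Both `http://dsdcontososp` and `http://dsdcontososp:5555` produce the same pair, which is why two sites on different ports end up sharing a service account.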

If we look at the spservice account now, we will see both SPNs that we added.

clip_image004

Delegation

We know that we are going from the SharePoint server to the Report Server. In order for credentials to be forwarded from SharePoint to RS, we need to give the Service Account permission to delegate. By default, this is disabled.

NOTE: Domain Admin permissions are required to modify delegation settings on an account.

clip_image005

NOTE: The delegation tab will only be visible if SPN's are present on that account.

The Delegation Tab of the account is where we will find these settings. Here is how the options break down:

Do not trust this user for delegation: No Trust. We cannot delegate.
Trust this user for delegation to any service (Kerberos only): Full Trust. We can delegate to any service.
Trust this user for delegation to specified services only: Constrained Delegation.  Requires you to list the services that we can delegate to in the list below the radio dials.
   Use Kerberos Only: Constrained Delegation with Kerberos Protocol only.
   Use any authentication protocol: Constrained Delegation with Protocol Transitioning.  Useful on the Claims side of things.

For this example, I'm not going to go into the Constrained Delegation side of things. I will do that in a later post. We will just stick with Full Trust.

So, I select "Trust this user for delegation to any service (Kerberos Only)". Please take into account that Constrained Delegation is the more secure option. But, it also presents its restrictions as a result. Stay tuned for more information about that.

clip_image006

SharePoint Settings

The SPN and delegation settings are really the basic Kerberos settings needed for any application. However, SharePoint has some app specific settings that we need to pay attention to. For this, we will head over to SharePoint's Central Admin Site.

We will go to Application Management and then to Manage Web Applications.

clip_image007

You will select the site you are interested in, in my case it is the SharePoint - 80 site, and then click on Authentication Providers.

clip_image008

Click on Default.

clip_image009

We want to choose "Negotiate (Kerberos)" and then hit "Save".

clip_image010

This configures that SharePoint site to use Negotiate. Negotiate will always attempt to use Kerberos first if an SPN is available to use. We can test to see if this is working properly by going back to the SharePoint site. It should come up as normal without any prompts for credentials or 401.1 errors. If you encounter either of those, something isn't right.

However, at this point our reports should no longer work. The underlying error here will be a 401.1 against the Report Server because it hasn't been set up for Kerberos.

clip_image011

In the SharePoint ULS Log we will see:

02/21/2011 08:16:46.68         w3wp.exe (0x0F44)         0x0F68        SQL Server Reporting Services         UI Pages         aacz        High         Web part failed in SetParamPanelVisibilityForParamAreaContent: System.Net.WebException: The request failed with HTTP status 401: Unauthorized.
at Microsoft.Reporting.WebForms.Internal.Soap.ReportingServices2005.Execution.RSExecutionConnection.GetSecureMethods()
at Microsoft.Reporting.WebForms.Internal.Soap.ReportingServices2005.Execution.RSExecutionConnection.IsSecureMethod(String methodname)
at Microsoft.Reporting.WebForms.Internal.Soap.ReportingServices2005.Execution.RSExecutionConnection.SetConnectionSSLForMethod(String methodname)
at Microsoft.Reporting.WebForms.Internal.Soap.ReportingServices2005.Execution.RSExecutionConnection.ProxyMethodInvocation.Execute[TReturn](RSExecutionConnection connection, ProxyMethod`1 initialMethod, ProxyMethod`1 retryMethod)
at Microso...        23c6017c-3d37-4b70-b378-d5dd875518f6

Which brings us to the next stop in our journey…

Reporting Services

Service Principal Name (SPN)

Our service account for Reporting Services is rsservice and not Network Service, so the SPNs will go on the rsservice account itself. Also, Reporting Services is a web application, so we are still sticking with an HTTP SPN. Let's check out what is on the Service Account and the Machine Account.

clip_image012

clip_image013

Everything looks good here. Again, HTTP SPN's are URL based. So, we are going to create the SPN based on the url which you can get from the Reporting Services Configuration Manager under the Web Service URL Tab.

clip_image014

Based on that, our SPNs will be the following - HTTP/dsdcontosors and HTTP/dsdcontosors.dsdcontoso.local

clip_image015

And doing a listing of the rsservice account, we should see two SPNs on it.

clip_image016

Reporting Services Settings

I'm doing Settings first instead of Delegation to show that Delegation may not be needed. However, there is a setting for Reporting Services that is needed in order for Kerberos to work successfully against Reporting Services. This setting resides in the rsreportserver.config file which by default should be found at :

C:\Program Files\Microsoft SQL Server\MSRS10_50.<instance name>\Reporting Services\ReportServer

The setting that we are interested in is Authentication Type. If you look at the current setting, you may see different results.

<Authentication>
    <AuthenticationTypes>
        <RSWindowsNTLM/>
    </AuthenticationTypes>
    <RSWindowsExtendedProtectionLevel>Off</RSWindowsExtendedProtectionLevel>
    <RSWindowsExtendedProtectionScenario>Proxy</RSWindowsExtendedProtectionScenario>
    <EnableAuthPersistence>true</EnableAuthPersistence>
</Authentication>

For mine, I see RSWindowsNTLM under the Authentication Types. This is because when I first set up Reporting Services, I used a Domain Account instead of the default Network Service. When you do this, it will default the setting to RSWindowsNTLM. Had I chosen Network Service as the Account to use during setup, this setting would have reflected RSWindowsNegotiate, and I could later have changed it to a Domain Account without this setting changing.

All I need to do for mine to get Kerberos working is to change it over to RSWindowsNegotiate. You can either add it on top of RSWindowsNTLM or replace RSWindowsNTLM.

NOTE: RSWindowsNegotiate is specific to Internet Explorer. Other browsers may need RSWindowsKerberos instead. You will need to test that to see what works best for your configuration.

In my case, I just added it on top of RSWindowsNTLM

<Authentication>
    <AuthenticationTypes>
        <RSWindowsNegotiate/>
        <RSWindowsNTLM/>
    </AuthenticationTypes>
    <RSWindowsExtendedProtectionLevel>Off</RSWindowsExtendedProtectionLevel>
    <RSWindowsExtendedProtectionScenario>Proxy</RSWindowsExtendedProtectionScenario>
    <EnableAuthPersistence>true</EnableAuthPersistence>
</Authentication>
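
If you need to make this change on more than one report server, it can be scripted. The following is a sketch using Python's standard xml.etree module against the Authentication fragment shown above; the function name `add_negotiate` is my own, and for a real rsreportserver.config you would want to back up the file first and check the result, since this sketch does not preserve comments or the original formatting.

```python
import xml.etree.ElementTree as ET

# The original fragment from this post, before the change.
SAMPLE = """<Authentication>
    <AuthenticationTypes>
        <RSWindowsNTLM/>
    </AuthenticationTypes>
    <RSWindowsExtendedProtectionLevel>Off</RSWindowsExtendedProtectionLevel>
    <RSWindowsExtendedProtectionScenario>Proxy</RSWindowsExtendedProtectionScenario>
    <EnableAuthPersistence>true</EnableAuthPersistence>
</Authentication>"""


def add_negotiate(config_xml):
    """Return the XML with RSWindowsNegotiate listed ahead of the other types."""
    root = ET.fromstring(config_xml)
    # iter() also works when Authentication is nested deeper in the file.
    auth_types = next(root.iter("AuthenticationTypes"))
    if auth_types.find("RSWindowsNegotiate") is None:
        auth_types.insert(0, ET.Element("RSWindowsNegotiate"))
    return ET.tostring(root, encoding="unicode")


updated = add_negotiate(SAMPLE)
```

Running it a second time leaves the list unchanged, so it is safe to apply to a server that already has the setting.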

At this point, my Hello World report should come up ok as I'm not using any data sources for it.

clip_image017

The report where I do have a data source will still fail, but with a different message this time.

clip_image018

In the ULS log, we won't see an error by default, because the Reporting Services Monitoring trace points have not been enabled within Central Admin. The error itself will be a "Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'". That error comes directly from SQL Server, whereas the 401.1 errors were Web related errors.

Delegation

In order for Reporting Services to forward credentials to a back end data source, we need to enable delegation permissions on the Service Account. The data sources are processed within the Report Server Windows Service and not SharePoint, so the SharePoint settings don't help us here.

We will do what we did with the SharePoint Account and enable Full Trust for the Reporting Services Account.

clip_image019

This in itself is not enough to get our Report with the Data Source working though. This just allows Reporting Services to forward the user's credentials to another Service. That service we are forwarding to still needs to be set up properly. In this example it is SQL Server we are forwarding to and it does not have its SPN configured yet. So, we will still fail with a Login Failed message from SQL. Reporting Services at this point should be good to go though.

Which brings us to the last stop in our journey…

SQL Server

Service Principal Name (SPN)

I have previously written a blog post concerning the SQL Server SPNs. What SPN do I use and how does it get there?

It goes through how SQL Server can make use of its ability to manage the SPN for you, and which SPN is needed based on which protocol you are trying to connect with. I won't go through all the details again here, so I will make a few assumptions.

First, that the ability for SQL to manage its SPNs is not working because I'm using a Domain Account and I haven't given it the permissions necessary for that to occur. You can also verify this in the SQL ERRORLOG:

2011-02-21 08:58:01.40 Server The SQL Server Network Interface library could not register the Service Principal Name (SPN) for the SQL Server service. Error: 0x2098, state: 15. Failure to register an SPN may cause integrated authentication to fall back to NTLM instead of Kerberos. This is an informational message. Further action is only required if Kerberos authentication is required by authentication policies.

Second, that I'm going to be connecting with the TCP protocol and not Named Pipes.

The SQL Service is using the sqlservice account. And because we are using the TCP Protocol, the SPN will need the port number. In this case, it is a default instance, so we know the port will be 1433. So, our SPN will look like the following for SQL - MSSQLSvc/dsdcontososql:1433 and MSSQLSvc/dsdcontososql.dsdcontoso.local:1433. You'll notice I'm doing both the NETBIOS and FQDN SPNs here. It is the same reason as with the HTTP SPN. In this case, the SQL Client connectivity components will do a reverse lookup on the server name to try and resolve the FQDN. So, with everything working as it should, it should always try to get the FQDN SPN even if you supply the NETBIOS server name in the connection string.
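
As a sketch of the naming rule for TCP connections, the helper below builds the NETBIOS and FQDN forms with the port and turns them into setspn add commands. The function names and the `dns_suffix` parameter are hypothetical; the -a switch is the add option mentioned earlier in this post.

```python
def sql_spns(server, dns_suffix, port=1433):
    """Return the NETBIOS and FQDN MSSQLSvc SPNs for a TCP connection."""
    host = server.lower()
    return [
        f"MSSQLSvc/{host}:{port}",
        f"MSSQLSvc/{host}.{dns_suffix.lower()}:{port}",
    ]


def setspn_add_commands(spns, account):
    """Build one 'setspn -a <SPN> <account>' command line per SPN."""
    return [f"setspn -a {spn} {account}" for spn in spns]


commands = setspn_add_commands(
    sql_spns("DSDContosoSQL", "dsdcontoso.local"), "sqlservice"
)
```

The commands are only printed or reviewed here, not executed; running them still requires the permissions noted earlier.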

The SPNs for SQL Server are derived from the Connection String that the client is using. The client in this case being Reporting Services. Reporting Services is a .NET Application, so it is using SqlClient to connect to SQL.

We can see that there are no SQL SPNs registered on the service account or the machine account

clip_image020

So, let's go ahead and add the SPNs.

clip_image021

clip_image022

Everything looks good on the SPN front. For good measure, you may want to use the setspn tool to search for duplicates. This is a new SETSPN feature that was added in Windows 2008: the -X switch. It will search the entire domain for duplicates. You should never have a duplicate as it will cause an error.
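
To show what "duplicate" means here, the sketch below checks a hand-built inventory of account names and their SPNs for any SPN that sits on more than one account. It does not query Active Directory; the dictionary shape and the function name `find_duplicate_spns` are assumptions for illustration, and the comparison is case-insensitive.

```python
from collections import defaultdict


def find_duplicate_spns(spns_by_account):
    """Return {spn: [accounts]} for every SPN registered on more than one account."""
    owners = defaultdict(set)
    for account, spns in spns_by_account.items():
        for spn in spns:
            owners[spn.lower()].add(account)
    return {spn: sorted(accounts)
            for spn, accounts in owners.items()
            if len(accounts) > 1}


inventory = {
    "spservice": ["HTTP/dsdcontososp", "HTTP/dsdcontososp.dsdcontoso.local"],
    "rsservice": ["HTTP/dsdcontosors", "HTTP/dsdcontosors.dsdcontoso.local"],
    "sqlservice": ["MSSQLSvc/dsdcontososql:1433",
                   "MSSQLSvc/dsdcontososql.dsdcontoso.local:1433"],
}
duplicates = find_duplicate_spns(inventory)
```

With the accounts from this walkthrough the result is empty. Adding HTTP/dsdcontososp to a second account would make it show up with both owners.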

clip_image023

Looks like we do not have any duplicate SPNs. At this point, the report that has a data source pointing to the SQL Server should run OK, as there are no application-specific settings that need to be set for SQL Server outside of the SPN.

clip_image024

NOTE: Depending on how you have approached the setup, you may still encounter an error because the failed Kerberos requests may still be cached. You can either wait for the cache to clear out, or you can restart the services to get it going. I had to recycle SharePoint and Reporting Services for it to start working on my box, as well as log off and back in (or just run klist purge on the client).

Delegation

For your back-end server, you may not need to enable delegation. If the hops stop with this server, then we are done and do not need delegation. However, if this back-end server will be continuing on to another service, then delegation will be necessary if it will try to forward the Windows user's credentials.

A great example of this with SQL Server is the use of a Linked Server. However, just the fact that you have a Linked Server doesn't mean that you need delegation. It is dependent on how you configure authentication on the Linked Server.

clip_image025

If "Be made using the login's current security context" is selected for the Linked Server, then we will need to enable delegation for the SQL Service account.

There are also other things that may require delegation from SQL. SQLCLR is one that might, depending on what you are doing. The general rule of thumb is that if anything within SQL is trying to reach out to another resource and will need to send the current user's credentials, then you will need Delegation enabled on the SQL Service Account.

In my case I'm not, so I'm going to leave it alone.
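The rule of thumb above can be written down as a tiny Python sketch. The function and parameter names are hypothetical. The only Linked Server option the sketch treats as forwarding the user's credentials is the one named in this article. Any other way SQL might send the current user's credentials outward (SQLCLR, for example) is represented by a plain flag that you would set yourself.

```python
# Option text as quoted in the article for the Linked Server security setting.
CURRENT_CONTEXT = "Be made using the login's current security context"


def sql_service_needs_delegation(linked_server_option=None,
                                 other_outbound_uses_user_credentials=False):
    """Return True if the SQL service account needs delegation enabled.

    Delegation is needed when SQL forwards the current user's credentials
    to another resource, either through a Linked Server configured with the
    current security context or through some other outbound call.
    """
    forwards_via_linked_server = linked_server_option == CURRENT_CONTEXT
    return forwards_via_linked_server or other_outbound_uses_user_credentials


# The hops stop at SQL: nothing is forwarded, so no delegation is needed.
print(sql_service_needs_delegation())

# A Linked Server that passes the caller's identity along needs delegation.
print(sql_service_needs_delegation(CURRENT_CONTEXT))
```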

Summary

So, that's it. We went through each stop along the communication path (SharePoint, RS and SQL), and we validated the settings for each one as we got there. We also saw that certain things began to work as we enabled items. The report without the data source started working before SQL was set up, because we weren't reaching out to SQL. We also looked at when you need to enable delegation, which depends on whether that service needs to reach out to another service. For Reporting Services, had we not been hitting a data source, we may not have needed to enable Delegation on the rsservice account, as I showed with the HelloWorld report. But when we need to access data, we need delegation if we want to use Kerberos. The other option would be to store the credentials within the data source.

Hopefully this helps someone when trying to set up this type of deployment, or any deployment that requires Kerberos in order to work.

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton
