Kerberos Authentication 101: Understanding the Essentials of the Kerberos Security Protocol
Knowing the basics of this pervasive protocol can be critical in troubleshooting and solving Windows security problems.
Windows IT professionals deal with security on a daily basis, but very few understand Kerberos, the protocol under the hood. Kerberos became the default authentication protocol in Windows 2000, replacing the antiquated NTLM used in previous versions of Windows.
Kerberos has several important advantages. For example, it:
- is very secure, preventing various types of intrusion attacks
- uses "tickets" that can be securely presented by a client or a service on the client's behalf to a server for access to services
- permits Cross-Forest Trusts to use transitive properties and eliminate the "full mesh" scenario; a single Kerberos trust between the forest roots covers all domains in both forests
- permits interoperability with other Kerberos realms such as Unix; this permits non-Windows clients to authenticate to Windows domains and gain access to resources
- provides authentication across the Internet for Web apps
Therefore, it's important to have a good understanding of how the Kerberos protocol works and be familiar with the details of the security functions. This will help with diagnosing a variety of security issues. In addition, IT professionals should understand how Windows Time Service works because Kerberos security is highly dependent on time services.
Kerberos, or Cerberus, is the three-headed dog of Greek mythology that guards the gates of the underworld, preventing the inhabitants from escaping. The Kerberos protocol, by contrast, prevents the bad guys from getting in. Like the dog's three heads, Kerberos has three components: the client, a service, and a third party that both client and service trust. I love the statement made by Fulvio Ricardi in his Kerberos Protocol Tutorial: Kerberos is "… an authentication protocol for trusted clients on untrusted networks." So, if Kerberos is designed to establish trust on an untrusted network, it should be even more effective on a trusted corporate network.
The Shared Secret
A key feature of Kerberos is the shared secret: a key derived from the password, so that the password itself never travels on the network. Both the KDC (on the server) and the client (workstation) can compute this key because both know the password. The following scenario describes how this works:
- An account is created on the domain controller, or DC (the Kerberos Key Distribution Center or KDC) and given a password.
- Kerberos combines the unencrypted password with a text string (the salt) and runs them through the "string2Key" conversion function. The result is the "shared secret." The salt string is derived from the username. The resulting key is stored with a key version number (kvno), which identifies which version of the key is in use.
- At the workstation, the user enters the account name and password and requests certain services. The Kerberos client generates the secret key on the client. Because Kerberos uses the same algorithm to generate this secret key as was used on the KDC, the two secret keys will match as long as the username and password entered are the same.
- The user and the Authentication Service (AS) running on the KDC communicate using the shared secret.
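The key point of the steps above is that both sides derive the same key independently, so the password never has to cross the network. The following is a minimal sketch of that idea, not the actual Kerberos string-to-key function: it uses PBKDF2 from the Python standard library as a stand-in, and the function name, realm, user name, password and salt layout are all hypothetical values chosen for illustration.

```python
import hashlib

def derive_shared_secret(password: str, salt: str) -> bytes:
    """Illustrative stand-in for string-to-key: password + salt -> 32-byte key."""
    # Assumption: PBKDF2-HMAC-SHA1 with 4096 iterations, used here only to
    # show a deterministic, salted derivation. Real Kerberos encryption types
    # define their own string-to-key procedures.
    return hashlib.pbkdf2_hmac(
        "sha1", password.encode("utf-8"), salt.encode("utf-8"), 4096, dklen=32
    )

# Hypothetical salt built from a realm and a user name.
salt = "EXAMPLE.COM" + "alice"

kdc_key = derive_shared_secret("S3cret!", salt)     # computed when the account is created
client_key = derive_shared_secret("S3cret!", salt)  # computed at logon on the workstation
wrong_key = derive_shared_secret("s3cret!", salt)   # a mistyped password

print(kdc_key == client_key)  # same password and salt give the same key
print(kdc_key == wrong_key)   # a different password gives a different key
```

Because the salt is part of the input, two accounts with the same password still end up with different keys.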
Authentication and Authorization
Using the shared secret method, a user can log in and get access to some application or service, as illustrated in Figure 1. The figure shows the Kerberos messages that are exchanged, such as "AS_REQ." The user logs into a workstation with an existing account. The client sends an AS_REQ message to the KDC containing the user name, along with data encrypted with the shared secret. The KDC looks up the shared secret associated with that user and uses it to decrypt the encrypted portion of the AS_REQ. If decryption succeeds, the request is honored and a "Ticket Granting Ticket" (TGT) is returned in the AS_REP message. The client can then use the TGT to prove that the user is who she says she is and has been properly authenticated. This ticket is good for a configurable time period.
Figure 1. How a user can log in and access an application using the shared secret method.
If the user wants access to some service or application on a server that requires a service ticket, the client presents the TGT just obtained to the server hosting the Ticket Granting Service (TGS) in a TGS_REQ message. In a Windows domain, the TGS, like the AS, is hosted on each DC. The TGS looks up the relevant keys in its database, decrypts and validates the TGS_REQ, and grants the service ticket. The service ticket is encrypted with a secret key known only to the KDC and the target service, and it carries a Session Key for the client and service to share. The service ticket is returned to the client in the TGS_REP. The client cannot decrypt the service ticket because only the service can do that, but it can send it on.

The client then sends the service ticket, together with an Authenticator encrypted with the Session Key, to the application server in the AP_REQ. This is like a locked box plus the key to a second locked box. The service opens the service ticket with its own secret key and finds the Session Key inside. It can then use that Session Key to open the Authenticator, which only a client that received the same Session Key from the TGS could have created. The user is thus validated. The application server then applies the appropriate permissions to determine whether the requested action (such as reading, writing or changing a document) is granted to the user. If mutual authentication is required, the application server returns an AP_REP to prove its own identity to the client, as a security measure.
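The locked-box idea can be made concrete with a toy model. The sketch below follows the standard Kerberos v5 layout, in which the service ticket is sealed with a key known only to the KDC and the service, and the Authenticator is sealed with the session key carried inside the ticket. The `seal` and `unseal` helpers are hypothetical and deliberately simplistic (a hash-based keystream plus an HMAC tag); they are not real Kerberos cryptography and should not be used to protect anything.

```python
import hashlib
import hmac
import json
import os

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce and a counter."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def seal(key: bytes, payload: dict) -> bytes:
    """Toy 'locked box': XOR with a keystream, then append-check via HMAC tag."""
    plaintext = json.dumps(payload).encode("utf-8")
    nonce = os.urandom(16)
    body = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + tag + body

def unseal(key: bytes, blob: bytes) -> dict:
    """Open a box; raises ValueError if the key is wrong or the data was altered."""
    nonce, tag, body = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered data")
    plaintext = bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body))))
    return json.loads(plaintext)

service_key = os.urandom(32)  # long-term secret shared by the KDC and the service
session_key = os.urandom(32)  # generated by the TGS for this client/service pair

# TGS side: the service ticket is sealed with the service's key.
ticket = seal(service_key, {"client": "alice", "session_key": session_key.hex()})

# Client side: it received session_key in the TGS_REP and builds an Authenticator.
authenticator = seal(session_key, {"client": "alice", "timestamp": 1700000000})

# The client holds the session key but cannot open the ticket with it.
try:
    unseal(session_key, ticket)
    client_can_read_ticket = True
except ValueError:
    client_can_read_ticket = False

# Service side: open the ticket, recover the session key, open the Authenticator.
ticket_data = unseal(service_key, ticket)
auth_data = unseal(bytes.fromhex(ticket_data["session_key"]), authenticator)
validated = auth_data["client"] == ticket_data["client"]

print(client_can_read_ticket, validated)
```

The service never talks to the KDC during this exchange. Everything it needs to validate the client arrives inside the ticket.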
The Replay Attack
A replay attack occurs when an intruder captures a packet and presents it to the service as if the intruder were the user. The user's credentials are there -- everything needed to access a resource. This is mitigated by the features of the "Authenticator," which is illustrated in Figure 2. The Authenticator is created for the AS_REQ or the TGS_REQ and sends additional data, such as an encrypted IP list, the client's timestamp and the ticket lifetime. When a packet arrives, its timestamp is checked. If the timestamp is earlier than or the same as that of a previous Authenticator, the packet is rejected because it's a replay. In addition, the timestamp in the Authenticator is compared to the server time. The two must be within five minutes of each other (by default in Windows).
Figure 2. The Authenticator mitigates the possibility of a replay attack.
If the time skew is greater than five minutes the packet is rejected. This limits the number of possible replay attacks. While it is technically possible to steal the packet and present it to the server before the valid packet gets there, it is very difficult to do.
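The two checks described above (timestamp ordering and maximum clock skew) can be sketched in a few lines. This is an illustration of the rule as stated in this article, not the logic of any particular Kerberos implementation; the class name, method name and result strings are hypothetical.

```python
MAX_SKEW_SECONDS = 5 * 60  # the Windows default mentioned above

class ReplayGuard:
    """Tracks the newest Authenticator timestamp seen per client."""

    def __init__(self, max_skew: float = MAX_SKEW_SECONDS) -> None:
        self.max_skew = max_skew
        self.last_seen: dict[str, float] = {}

    def check(self, client: str, timestamp: float, server_now: float) -> str:
        # Check 1: the client's clock must be within the allowed skew.
        if abs(server_now - timestamp) > self.max_skew:
            return "rejected: clock skew too great"
        # Check 2: the timestamp must be newer than anything seen from this client.
        previous = self.last_seen.get(client)
        if previous is not None and timestamp <= previous:
            return "rejected: replay"
        self.last_seen[client] = timestamp
        return "accepted"

guard = ReplayGuard()
now = 1_700_000_000.0  # arbitrary server time in seconds, for illustration

first = guard.check("alice", now - 2, now)        # fresh packet
replayed = guard.check("alice", now - 2, now + 1)  # same packet presented again
stale = guard.check("alice", now - 400, now)       # more than five minutes old
later = guard.check("alice", now + 10, now + 11)   # a genuinely newer packet

print(first, replayed, stale, later, sep="\n")
```

The skew limit also bounds how long replay state has to be remembered: anything older than the window is rejected by the first check regardless.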
It's fairly well known that all computers in a Windows domain must have system times within five minutes of each other. This is due to the Kerberos requirement.
Pre-Authentication
In previous versions of Kerberos (v4 and older), the client did not have to prove knowledge of the password before the KDC responded. A request with a simple valid user name was enough to get a reply, which an attacker could then try to crack offline. In Kerberos v5, the client must first prove that it knows the password before a ticket is issued. This is called Pre-Authentication. It's possible to disable Pre-Authentication in order to provide backward compatibility for old Kerberos v4 libraries, Unix apps and so on.
Warning: Disabling Pre-Authentication is a serious degradation of security.
One of the components of the Authenticator is the ticket lifetime, which is also configurable in Group Policy. By default, the ticket lets the user access server resources for 10 hours without re-authenticating, and it is renewable without intervention by the user.
Time Services
As noted, the Windows Time Service is critical to proper functioning of the Kerberos security model. To keep system clocks on all computers in the domain within five minutes of each other, Windows has used the Network Time Protocol (NTP) since Windows Server 2003, rather than the old Simple Network Time Protocol (SNTP) used previously. NTP uses a "reference clock" on each computer. The reference clock is set to UTC (think GMT) and doesn't change from computer to computer, no matter what time zone the computer is in. This is often confusing to administrators, as it seems that a computer in Belgium could not be within the five-minute time skew of a computer in Atlanta, six time zones away.
It's important to separate the computer's reference clock from what you see in the Date and Time display in the notification area of the taskbar. The Date and Time display is just a convenient way for users to see the local time, and the time zone it uses plays no part in time synchronization. Note, however, that changing the time (as opposed to the time zone) in the Date and Time display does change the reference clock by the delta that you choose.
For instance, as shown in Figure 3, if the UTC time is 13:00, and I'm in Atlanta (GMT -5), then the Date and Time display shows the time as 08:00. If I change the Date and Time display to 09:00 (Figure 4), then the reference clock is set ahead one hour to 14:00 while the UTC time on all other machines is 13:00. This causes the time skew. It's also why you can fix two computers that have a large time skew by changing the time with the Date and Time feature.
Figure 3. UTC time is 13:00, but Date and Time shows the local time in Atlanta.
Warning: Before changing the time, make sure you are indeed one hour out of sync with the actual time or it will cause authentication failures. You can change the time for certain troubleshooting techniques, but be careful that everything is correct when you finish.
Figure 4. Changing the reference clock on one machine can cause time skew.
Note that you can change the time zone and it will not affect the reference clock time. In my example, if I change my time zone to the U.S. Pacific Time zone, the display will show the time as 05:00, but the reference clock will remain unchanged.
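The difference between changing the time zone and changing the time can be checked with a little arithmetic. The sketch below reuses the 13:00 UTC example with fixed offsets for Atlanta (UTC-5) and Pacific Time (UTC-8); the calendar date is arbitrary and daylight saving time is ignored for simplicity.

```python
from datetime import datetime, timedelta, timezone

ATLANTA = timezone(timedelta(hours=-5), "Atlanta (UTC-5)")
PACIFIC = timezone(timedelta(hours=-8), "Pacific (UTC-8)")
MAX_SKEW = timedelta(minutes=5)

# The reference clock: 13:00 UTC on an arbitrary date.
reference = datetime(2024, 1, 15, 13, 0, tzinfo=timezone.utc)

# Changing the time zone only changes how the same instant is displayed.
display_atlanta = reference.astimezone(ATLANTA)
display_pacific = reference.astimezone(PACIFIC)
same_instant = display_atlanta == display_pacific  # aware datetimes compare by instant

# Changing the displayed time from 08:00 to 09:00 moves the reference clock.
changed_display = display_atlanta.replace(hour=9)
new_reference = changed_display.astimezone(timezone.utc)
skew = new_reference - reference

print(display_atlanta.strftime("%H:%M"), display_pacific.strftime("%H:%M"))
print(same_instant)
print(new_reference.strftime("%H:%M"), skew, abs(skew) > MAX_SKEW)
```

The time zone change leaves the underlying instant untouched, while the one-hour change to the displayed time produces a one-hour skew, well beyond the five-minute tolerance.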
This is demonstrated by a situation I found in our lab some time ago. I had a DC in Brussels that had been installed with the incorrect time zone. Rather than showing the Belgium time zone (UTC + 1:00), it showed Pacific Time (U.S. and Canada). It had actually been like this for a couple of years before we noticed it. The local admin had not noticed the displayed time was off from the actual local time. Yet there were no replication failures, no W32Time errors, and no authentication failures. So we changed the time zone and the display changed, but there was no effect on the reference clock. If the local admin had noticed that the displayed time was nine hours slow and changed the time rather than the time zone, then that DC would have a nine-hour time skew and authentication failures would have resulted.