The limits of certificate revocation

In the wake of the DigiNotar debacle, it is time to revisit a question that inevitably comes up each time another certificate authority makes a mistake: does certificate revocation help? The short answer, it turns out, is probably not.

Briefly, revocation checks are additional steps for verifying the validity of a digital certificate that involve communicating with the issuer over the network. These are in addition to the local checks, such as verifying the signature, checking that the certificate has not expired and comparing the name on the certificate to the expected ID. Local checks are cheap from a computational perspective, but they are also static: if a certificate passes these checks once, it will continue to pass them until the expiration date. If the assertions made on the certificate are invalidated at a later time (for example, the user loses their private key), we cannot find out about such changes merely by inspecting the certificate one more time. Instead we have to go back to the issuer and ask.

There are two ways revocation status can be checked. Simplifying somewhat:

  • Certificate revocation lists or CRLs. CRLs are giant lists of all revoked certificates, periodically published by the issuer. Anyone can download these and check if a particular certificate is on the list.
  • Online Certificate Status Protocol, or OCSP. In this model one queries the issuer directly about one particular certificate, asking in effect “what is the latest news on this certificate?”

OCSP addresses one of the main challenges of CRLs. Verifying the validity of a single certificate using a CRL involves downloading a massive registry containing thousands of other, completely unrelated revoked certificates. This is problematic because certificate validation often sits on the critical path for latency. For example, when connecting to a website using SSL, the certificate of the server must be validated. If the user is being extra cautious and includes revocation checking, then any communication with that website is blocked until that step completes. While CRLs can be cached for future use and do not have to be downloaded each time, the cost to “bootstrap” from an empty cache can be prohibitive. In these situations a single OCSP query can be more efficient than an extended CRL download. On the other hand, if hundreds of certificates from the same issuer need to be verified, we reach a cross-over point where economies of scale favor the bulk-mode operation with CRLs.
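The cross-over point can be made concrete with a back-of-the-envelope model. The sketch below counts only bytes transferred, and both sizes are made-up placeholders rather than measurements from any real CA; the function names are likewise invented for illustration.

```python
# Toy cost model for the CRL vs. OCSP trade-off.
# Both sizes are assumed placeholders, not measurements.
CRL_BYTES = 500_000   # assumed size of one full CRL download
OCSP_BYTES = 2_000    # assumed size of one OCSP request/response pair


def crl_cost(num_certs: int) -> int:
    """Bytes needed to check num_certs certificates from one issuer via CRL.

    The list is downloaded once; later checks hit the local cache.
    """
    return CRL_BYTES if num_certs > 0 else 0


def ocsp_cost(num_certs: int) -> int:
    """Bytes needed to check num_certs certificates via OCSP, one query each."""
    return OCSP_BYTES * num_certs


def crossover() -> int:
    """Smallest number of certificates for which the CRL is strictly cheaper."""
    return CRL_BYTES // OCSP_BYTES + 1


if __name__ == "__main__":
    for n in (1, 100, crossover()):
        print(f"{n} certs: CRL={crl_cost(n)} bytes, OCSP={ocsp_cost(n)} bytes")
```

With these assumed numbers, OCSP wins for a single certificate by a wide margin, while the CRL only pays for itself after a few hundred checks against the same issuer. Latency, caching lifetimes and responder load are deliberately left out of the model.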

Armed with these options, it appears at first sight that incidents along the lines of DigiNotar can be contained by promptly revoking the improperly issued certificates, such as the bogus Gmail certificate discovered in the wild being used to intercept traffic to Google. The problem is that the revocation model itself assumes a certain pattern of limited, isolated “mistakes” on the part of the certificate authority. Failure modes beyond that are outside the scope of the threat model, and cannot be mitigated using either CRLs or OCSP.

The standard example of a certificate authority “mistake” is issuing a certificate to the wrong person. To sketch a hypothetical scenario: someone calls up Acme CA, introducing themselves as a Microsoft employee. They request a certificate for the authentication service used by virtually all MSFT online services. Acme CA does not vet the identity of this requester properly (and why should they? They are getting paid to issue certificates, not to say “no”, as pointed out in an earlier post on misaligned economic incentives) and issues the requested certificate to this unauthorized person, who turns out to be working for a repressive government trying to eavesdrop on citizens’ communications.

Both CRLs and OCSP are tailor-made for this scenario. Once the clueless CA realizes their mistake (“what do you mean MSFT has their own cross-signed CA and has no reason to get one-off certificates from us?”) they can blacklist this certificate. It will appear in the next CRL published, and for those who cannot wait that long, the OCSP responder will immediately start reporting a revoked status to anyone who asks. Admittedly this best-case scenario still glosses over some subtleties: which pieces of commonly used software in fact check for revocation by default, and what happens if revocation checks fail for unrelated reasons such as network flakiness, or even because of active attacks, as pointed out in 2009 by Moxie Marlinspike in a Black Hat talk. One can argue that suboptimal decisions by client implementations cannot be blamed on the protocol itself.
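The question of what a client does when the revocation check itself fails is worth spelling out. Below is a minimal sketch of the two policies commonly called “soft-fail” and “hard-fail”; the function and class names are invented for illustration and do not correspond to any particular browser or library.

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    GOOD = "good"
    REVOKED = "revoked"


def accept_certificate(check: Callable[[], Status], hard_fail: bool) -> bool:
    """Decide whether to accept a certificate that already passed local checks.

    `check` is a hypothetical callable that contacts the issuer and raises
    OSError (or a subclass) when no answer can be obtained.
    """
    try:
        status = check()
    except OSError:
        # No answer at all: a hard-fail client rejects the certificate,
        # a soft-fail client carries on as if nothing were wrong.
        return not hard_fail
    return status is Status.GOOD


def revoked() -> Status:
    """The responder is reachable and reports the certificate as revoked."""
    return Status.REVOKED


def blocked_by_attacker() -> Status:
    """An attacker on the network path drops the revocation traffic."""
    raise ConnectionError("responder unreachable")
```

Under the soft-fail policy, `accept_certificate(blocked_by_attacker, hard_fail=False)` returns `True` even though the certificate may have been revoked: the same attacker who is intercepting the connection can also suppress the revocation check. Hard-fail closes that hole at the price of breaking connectivity whenever the responder is merely slow or down.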

But there is a different failure mode for CAs that does not fit the convenient pattern described above. In the earlier example we assumed that:
1. The CA remains in control of its private key; the key itself has not been shipped off to China.
2. The CA remains in control of the certificate fields being signed; for example the serial number, key usage, expiration dates etc. are all set according to the usual procedures. The only wrong field is the so-called “distinguished name” identifying the purported owner.
Clearly #2 implies #1, as attackers could fill in the blanks in the certificate fields to their heart’s content if they had direct possession of the signing key. In that sense #2 is a strong requirement, and it turns out that if it is violated both CRL and OCSP are toast.

Some of the issues lie in the protocol details: as pointed out, OCSP uses serial numbers to identify certificates and so does the CRL format, as explained in this MSDN article. Serial numbers are just a field in the certificate. A sufficiently misguided CA could end up reusing the same serial number, once for a healthy certificate issued to its rightful owner, and again for a completely unrelated certificate issued to unauthorized persons. This means that an OCSP query can be ambiguous, and in the best-case scenario revoking the forged certificate will inflict “collateral damage” on the benign certificate that shares its serial number.
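The collateral damage is easy to demonstrate with a toy model. In the sketch below a certificate is reduced to three fields, revocation is a set lookup keyed on issuer and serial number, and a SHA-256 hash over a made-up text encoding stands in for a fingerprint of the real encoded certificate. All names and values are invented for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    issuer: str
    serial: int
    subject: str

    def fingerprint(self) -> str:
        """SHA-256 over a toy encoding of all fields (stand-in for the DER bytes)."""
        encoded = f"{self.issuer}|{self.serial}|{self.subject}".encode()
        return hashlib.sha256(encoded).hexdigest()


legit = Cert("Acme CA", 1001, "shop.example")
forged = Cert("Acme CA", 1001, "mail.example")  # same serial, reused by mistake

# Revocation keyed on (issuer, serial), as in the scenario described above.
revoked_serials = {(forged.issuer, forged.serial)}

# Hypothetical alternative: revocation keyed on a hash of the whole certificate.
revoked_fingerprints = {forged.fingerprint()}


def is_revoked_by_serial(cert: Cert) -> bool:
    return (cert.issuer, cert.serial) in revoked_serials


def is_revoked_by_fingerprint(cert: Cert) -> bool:
    return cert.fingerprint() in revoked_fingerprints
```

Revoking the forged certificate by serial number also marks `legit` as revoked, because the lookup key cannot tell the two apart. Keyed on the fingerprint, only `forged` is affected.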

But even if we had better identifiers to uniquely identify certificates (the hash would have been an obvious choice), there are structural problems in the design.

The first problem is that both CRL and OCSP responses are digitally signed, either directly by the CA or, in delegation scenarios, by the holder of another certificate issued by the CA. If the attackers had free rein to obtain arbitrary certificates from the issuer, they were also in a position to obtain the credentials required to forge CRL and OCSP responses on behalf of the CA. When the end user decides to check on the status of that bogus Gmail certificate being used to intercept their private communications, our attackers can substitute an equally bogus response that appears to originate from the OCSP responder, in effect saying, “move along, these are not the revoked certificates you are looking for.” (Incidentally the same logical circularity applies to the “CA compromise” status code defined in CRL: if the certificate authority itself has been compromised, the client has no expectation of being able to trust the information in a CRL and the attacker can make up arbitrary CRLs.)
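The circularity can be sketched with a toy trust check. No real cryptography is involved: a “signature” is modeled simply as a reference to the signer’s certificate, and the `ocsp_signing` flag is a stand-in for whatever marker the CA places on a delegated responder certificate. Every name here is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCert:
    subject: str
    issuer: str
    ocsp_signing: bool = False  # stand-in for a delegated-responder marker


@dataclass(frozen=True)
class ToyOcspResponse:
    serial: int
    status: str       # "good" or "revoked"
    signer: ToyCert   # certificate whose key "signed" this response


def client_trusts(response: ToyOcspResponse, ca_name: str) -> bool:
    """Accept a response signed by the CA itself or by a delegate the CA issued."""
    signer = response.signer
    if signer.subject == ca_name:
        return True
    return signer.issuer == ca_name and signer.ocsp_signing


ca = ToyCert(subject="Acme CA", issuer="Acme CA")

# The honest answer from the CA: the bogus certificate is revoked.
honest = ToyOcspResponse(serial=1001, status="revoked", signer=ca)

# The attacker used the same breach to obtain a delegated-responder certificate.
attacker_delegate = ToyCert("totally legit responder", "Acme CA", ocsp_signing=True)
forged = ToyOcspResponse(serial=1001, status="good", signer=attacker_delegate)
```

In this model `client_trusts(forged, "Acme CA")` is `True`: the client has no way to distinguish the attacker’s delegate from a legitimate one, since both were issued by the same CA.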

The second problem is that the location of the OCSP responder and the CRL distribution point (CDP) are themselves fields in the certificate. If the miscreants had the freedom to craft their own certificate and get it signed (instead of conforming to a template defined by the CA), they could simply omit the Authority Information Access field containing the OCSP responder’s location, or point the CDP at a random location controlled by the attacker.
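A minimal sketch of why this matters, with a certificate modeled as a plain dictionary. The field names and URLs are invented; a real client would parse the Authority Information Access and CRL Distribution Points extensions instead.

```python
from typing import Dict, List

# Hypothetical field names standing in for the real certificate extensions.
OCSP_FIELD = "aia_ocsp_url"
CDP_FIELD = "crl_distribution_point"


def revocation_endpoints(cert: Dict[str, str]) -> List[str]:
    """Collect the URLs a client would consult, read from the certificate itself."""
    return [cert[field] for field in (OCSP_FIELD, CDP_FIELD) if cert.get(field)]


# Issued through the CA's normal template: both pointers are present.
templated = {
    "subject": "shop.example",
    OCSP_FIELD: "http://ocsp.ca.example",
    CDP_FIELD: "http://crl.ca.example/latest.crl",
}

# Crafted by the attacker: no pointers at all.
crafted_bare = {"subject": "mail.example"}

# Crafted by the attacker: the CDP points at a server the attacker controls.
crafted_redirect = {
    "subject": "mail.example",
    CDP_FIELD: "http://attacker.example/empty.crl",
}
```

For `crafted_bare` the client finds nowhere to ask, and for `crafted_redirect` the only place it knows to ask is run by the attacker. In both cases the CA’s real revocation infrastructure is never consulted.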

Third, the CA often has no idea what certificates have been issued if the attack has circumvented the usual enrollment process. The serial number, distinguished name or other details required to blacklist the certificate via revocation may not have been logged. In this case the CA only knows that some fraudulent certificates were issued, but has no idea which sites can be targeted. Until the certificate is observed in the wild, there is nothing to revoke.

Finally, and this plagues the recovery efforts, it is often not possible to determine conclusively whether the mishap experienced by the CA indeed amounts to a few isolated cases conforming to the pattern anticipated by revocation designs, or whether attackers managed to breach the process itself at a deeper level beyond repair. Both ineptitude and economic incentives are at work in this uncertainty. On the one hand, logs may be incomplete, or the forensics too inconclusive to say either way. On the other hand, PR pressures motivate organizations to minimize the perceived damage and hope for the best, until hard evidence proves otherwise. One need look no further than the persistent insistence by RSA that the SecurID breach would have no impact on customers, until the Lockheed Martin incident forced them to admit otherwise. DigiNotar set a shining example of transparency here by remaining quiet about the breach until the MITM attack in Iran surfaced. Days after the incident they still lacked a coherent story on what exactly went on. Faced with such negligence, it is better to assume the worst and not depend on revocation for piecemeal mitigation.

