Goto fail and more subtle ways to mismanage vulnerability response


As security professionals we are often guilty of focusing single-mindedly on one aspect of risk management, namely preventing vulnerabilities, to the exclusion of others: detection and response. This bias seems to have dominated discussion of the recent “goto fail” debacle in iOS/OS X and its wildly improbable close cousin in GnuTLS. Apple has been roundly criticized and mocked for this self-explanatory flaw in SecureTransport, its homebrew SSL/TLS implementation. The bug voided all security guarantees the SSL/TLS protocol provides, rendering supposedly “protected” communications vulnerable to eavesdropping.

But much of the conversation and unofficial attempts at post-mortems (true to its secretive nature, Apple never published an official explanation, but conveniently created a well-timed distraction in the form of a whitepaper touting iOS security) focused on the low-level implementation details as root cause. Why is anyone using goto statements in this day and age, when the venerable Edsger Dijkstra declared way back in 1968 that they ought to be considered harmful? Why did they not adopt a coding convention requiring braces around all if/else conditionals? How could any intelligent compiler not flag the remainder of the function as unreachable code, which is exactly what the spurious goto statement made it?** Why was the duplicate line missed in code reviews when it stands out blatantly in the delta? Did Apple not have a good change-control system for introducing code changes? Speaking of sane software engineering practices, how is it possible that code-flow jumps to a point labelled “fail” and yet still returns success, misleading callers into believing that the function completed successfully? To step back one more level, why did Apple decide to maintain its own SSL/TLS implementation instead of leveraging open-source libraries such as NSS or OpenSSL, which have benefited from years of collective improvement and cryptographic expertise that Apple does not have in-house?
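For readers who have not seen the flaw itself, the shape of the mistake can be sketched in a few lines of C. This is a simplified illustration, not Apple's SecureTransport source: the function names (check_hash, check_signature, verify_buggy, verify_fixed) and the boolean inputs are hypothetical stand-ins for the real verification steps. The first function shows how a duplicated goto skips the remaining check while leaving the error code at zero; the second shows how braces and a failure label that cannot return success would have contained the damage.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for real verification steps: 0 means success. */
static int check_hash(bool ok)      { return ok ? 0 : -1; }
static int check_signature(bool ok) { return ok ? 0 : -1; }

/* Buggy shape: the second "goto fail" is not governed by the if above it,
 * so it always executes. When the hash check passed, err is still 0 and
 * the signature check is never reached. */
int verify_buggy(bool hash_ok, bool sig_ok)
{
    int err;

    if ((err = check_hash(hash_ok)) != 0)
        goto fail;
        goto fail;                              /* duplicated line */
    if ((err = check_signature(sig_ok)) != 0)   /* unreachable */
        goto fail;

fail:
    return err;                                 /* may return 0 from "fail" */
}

/* One possible hardening: braces around every branch, a separate success
 * return, and a failure label that never reports success. */
int verify_fixed(bool hash_ok, bool sig_ok)
{
    int err;

    if ((err = check_hash(hash_ok)) != 0) {
        goto fail;
    }
    if ((err = check_signature(sig_ok)) != 0) {
        goto fail;
    }
    return 0;                                   /* only success path */

fail:
    return (err != 0) ? err : -1;               /* "fail" always fails */
}
```

With these definitions, verify_buggy(true, false) reports success even though the signature is bad, while verify_fixed(true, false) returns a non-zero error.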

All good questions, partly motivated by a righteous indignation that such a catastrophic bug could be hiding in plain sight. But what about the aftermath? Once we accept the premise that a critical vulnerability exists, the focus shifts to response. Putting aside questions around why the flaw existed in the first place, let’s ask how well Apple handled its resolution.

  • There was no prior announcement that an important update was about to be released. Compare this to the advance warning MSFT provides for upcoming bulletins.
  • A passing mention in the release notes about the vulnerability, with an ominous statement to the effect that “an attacker with a privileged network position may capture or modify data in sessions protected by SSL/TLS.” Not a word about the critical nature of the flaw or a plea for users to upgrade urgently. One would imagine that an implementation error that defeats SSL– the most widely deployed protocol for protecting communications on the Internet– and allows eavesdropping on millions of users’ traffic would hit a raw nerve in this post-Snowden world of global surveillance. Compare Apple’s nonchalance and brevity to the level of detail in a past critical security update from Debian or even routine MSFT bulletins released every month.
  • The update was released on a Friday afternoon Pacific time. This is the end of the work-week in North America, and well into the weekend in Europe. Due to lack of upfront disclosure by Apple, the exact nature of the vulnerability was not reverse-engineered publicly until several hours later. That is suboptimal timing to say the least for dropping a critical fix, especially in a managed enterprise IT environment with a large Mac fleet and a security team tasked with ensuring that all employees upgrade their devices. (Granted, Apple never seems to have cared much for the enterprise market, as evidenced by weak support for centralized management compared to Windows or even Linux with third-party solutions.)
  • The update addressed the vulnerability only for iOS, leaving Mavericks, the latest and greatest desktop operating system, vulnerable. In other words, Apple 0-dayed its own desktop/laptop users with an incomplete update aimed at mobile users. Why? At least three possibilities come to mind.
    1. Internal disconnect: Apple may not have realized the exact same bug existed in the OS X code base– but this is a stretch, given the extent of code sharing between them.
    2. Optimism/naiveté: Perhaps they were aware of the cross-platform nature of the vulnerability but assumed nobody would figure out exactly what had been fixed, giving Apple a leisurely time-frame to prepare an OS X update before the issue posed a risk to users. To anyone familiar with shrinking time-windows between patch release and exploit development, this is delusional thinking. There is 10 years’ worth of research on reverse-engineering vulnerabilities from patches, even when the vendor remains silent on the details of the vulnerability or on the existence of any vulnerability in the first place.
    3. Deliberate risk-taking / cost-minimization: The final possibility is that Apple did not care, or prioritized mobile platforms over traditional laptops and desktops. Some speculated that Apple was already planning to release an update to Mavericks incorporating this fix and saw no reason to rush an out-of-band patch. (Compare this to the approach MSFT has taken towards critical vulnerabilities. When there is evidence of ongoing or imminent exploitation in the wild, the company has departed from the monthly cycle to deliver updates immediately, as with MS13-008.)
  • No explanation after the fact about the root cause of the vulnerability or steps taken to reduce the chances of similar mistakes in the future. This is perhaps the most damning part. The improbable nature of the bug– one line of code mysteriously duplicated, looking so obviously incorrect on even the most cursory review– fueled much speculation and many conspiracy theories about whether it had been a deliberate attempt to introduce a backdoor into Apple products. Companies are understandably reluctant to release internal postmortems out of fear that they may reveal proprietary information or portray individual employees in an unflattering light. But in this case even an official blog post summarizing the results of an investigation could have sufficed to quell wild theories.

Coincidentally, the same Friday this bug was exposed, this blogger gave a presentation at Airbnb arguing that OS X is a mediocre platform for enterprise security, citing lack of a TPM, compatibility issues with smart-cards and a dubious track record in delivering security updates. For the next four days of the goto-fail fiasco, Apple piled on the evidence supporting that last point. In some ways the continuing silence out of Cupertino represents an even bigger failure to comprehend what it takes to maintain trust when vulnerabilities, even critical ones, are inevitable.

CP

** It turns out in this case the blame goes to gcc. By contrast MSVC does correctly flag the code as unreachable.
