Android RNG vulnerability: when the platform fails


The recent vulnerability in the Android pseudo-random number generator (PRNG) used in critical applications including Bitcoin serves as an instructive example of how distorted incentives in the mobile device ecosystem upends the established wisdom around relying on the platform.

At first this looks like any other PRNG vulnerability, in the spirit of the Debian openssl debacle from 2008 but certainly of lower severity. But unlike Debian which affected ssh and SSL server keys, the consequences in this case lent themselves to better press coverage. All Android Bitcoin applications were  impacted. Money was lost to the tune of $5700. Still PCs get popped and bank account information used for fraud involving much larger amounts, using far more mundane web browser and Flash vulnerabilities. What makes this vulnerability unique?

  • Word of the problem originated with an announcement on Bitcoin mailing list, warning of a problem in Android that required updating all clients. There was no authoritative announcement from Android team or attempts to coordinate updates for other security-critical applications that may be at comparable risk.
  • It is not uncommon for a vendor to be blind-sided by an external researcher publicly dropping news of a security vulnerability, a zero-day so to speak. But when that happens most companies are quick to react, spin up a response and issue some type of announcement in order to get ahead of the messaging. But Google was missing in action completely for the first couple of days. Eventually a post on the Android developer blog confirmed the vulnerability— without providing much in the way of details around root cause.
  • But the post did offer a “solution” to work around the problem: a code snippet for developers to incorporate into their own application. In other words, individual apps were being asked to issue updates to compensate for a platform defect.

In platforms we trust

Platform APIs exists to centralize implementation of shared functionality, which can be inefficient, tricky or plain impossible if every individual application attempted to duplicate for themselves. Cryptographic PRNGs are a particularly good example of functionality best left to the platform. Computers are fundamentally deterministic, contrary to their outward appearance and getting them to generate “random” behavior is a difficult problem. In the absence of special hardware such as upcoming physical RNG in Intel chips, the quality of randomness depends on access to entropy sources serving as seed for initial state. A good implementation will attempt collect from the hardware and surrounding environment: hardware identifiers, timing events, location, CPU load, sensors, running processes, …  Not surprisingly this complexity is a good candidate for being punted to the platform. Windows packages it into platform API with CryptGenRandom. In Linux it is integrated even deeper into the OS as kernel-mode devices mapped under /dev/random and /dev/urandom, which are then invoked by cryptography libraries such as openssl.

Bitcoin applications relying on the platform API to produce cryptographic quality random numbers were on solid ground, following common wisdom in computer science about not reinventing the wheel. This is particularly true in cryptography which abounds in stories of home-brew implementations gone awry. The assumption is that platform owner is responsible for understanding how a particular primitive is best implemented on a given hardware/OS combination. That includes not only security against side-channel attacks but also leveraging hardware capabilities to squeeze that last percent of performance. (For example recent Intel processors have instructions for speeding up AES.) This is  the implicit contract between developers and platform owners: Developers call the “official” supported API which promises identical behavior across a wide-variety of devices and versions of the platform, instead of relying on undocumented behavior or reinventing the wheel in left-field. In return, the platform owner commits to keeping that API stable and providing the ideal implementation taking into account hardware/OS quirks.

Systemic risk

Platforms are not immune to vulnerabilities either and their effects can be far-reaching. For example in 2004 a critical vulnerability was found in the Windows ASN1 parser. The vulnerability affected all applications leveraging that API, directly or indirectly. These dependencies are not always obvious: sometimes the app calls one API which happens to invoke the vulnerable one under the covers, unbeknownst to the developer. For example parsing digital certificates invokes these code paths because X509 is built on ASN1. Widespread usage greatly amplifies the attack surface: any one of these applications can be used to trigger the vulnerability. This helps the attacker, creating multiple avenues to deliver an exploit. Whether or not it also increases the complexity for defender side depends on exactly how tightly applications and platform are connected.

  1. Loose coupling. In the case of Windows most platform functionality is packaged as a shared component called “dynamic link library” or DLL. There is exactly one DLL implementing an API and that single binary happens to be shared by all applications on the system.** In the best case scenario then “security fix” means Microsoft updating that DLL. That is what happened with that ASN1 vulnerability MS04-007. Users received patches from Microsoft Update website, individual applications using the API were not affected. Granted the process is far from perfect and occasionally faulty patches will break functionality, or reveal incorrect behavior in apps relying on undocumented quirks of the API. But it works well enough to operate at large scale by platform owners.
  2. Other times there is no clean separation between “platform” and its consumers. Instead the functionality offered by the platform provider is duplicated in every single application. A notorious example from 2009: critical vulnerability in Active Template Library. ATL is a building-block for developers offered by MSFT, used for writing ActiveX controls. That was bad news enough. Because these controls are often designed to be embedded on web pages, they are accessible to malicious sites. Worse, the vulnerability was traced to a part of ATL that is not distributed as part of Windows. Instead each application effectively incorporates the vulnerable code snippet into itself. Identifying and fixing all such vulnerable controls proved quite the mess, requiring a concerted outreach by MSFT to persuade individual software vendors to publish updates.

Strange case of the Android RNG vulnerability

Android circa 2013 can not accomplish what Windows could in 2003. The RNG vulnerability falls into category #1 while the response decidedly looks like #2. Based on published accounts, root-cause is a bad interaction between the way Android forks new processes and how Java RNG implementation initializes its entropy pool. This is fundamentally a platform problem. When the next version of Android is delivered to a device, this problem will go away. The implementation underlying that Java API for generating random-number will be updated transparently. Third-party applications calling that API will not notice a difference. The exact same app will call the same API but magically receive “more random” looking random numbers, instead of  the predictable ones that caused problems for Bitcoin. Individual developers do not recompile their applications and users do not have to reinstall every application to benefit from the upgrade.

Except this is exactly what happened: Android issued a desperate plea for developers to rewrite their applications and compensate for a bug ultimately fixed in the platform. What is going on here? The answer has little to do with any architectural  deficits in Android or Google shirking responsibility. Instead it is another consequence of the distorted economics of mobile, all but guaranteeing that users can not receive security updates.

[continued]

CP

** Strictly speaking there can be multiple variants. For example on 64-bit Windows, there is one for 32-bit and one for 64-bit applications. Similarly .NET introduces managed assemblies that can request a particular DLL version for compatibility. In these cases the number of files requiring an update can go from 1 to many. But it still remains the case that such updates can be delivered by MSFT– even for legacy versions of the DLL– without action required on the part of application developers.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s