The notion of private computation in the cloud has been around, at least in theory, for almost as long as cloud computing itself, even predating the days when infrastructure-as-a-service went by the distinctly industrial-sounding moniker “grid computing.” That precedence makes sense, because it addresses a significant deal-breaker for anyone facing the decision to outsource computing infrastructure: data security. What happens to proprietary company information when it is sitting on servers owned by somebody else? Can the cloud provider be trusted not to “peek” at the data or tamper with the operation of the services that tenants are running inside the virtual environment? Can the IaaS provider guarantee that a rogue employee will not help themselves to confidential data in the environment? What protections exist if some government with a creative interpretation of Fourth Amendment rights comes knocking?
Initially cloud providers were quick to brush aside these concerns with appeals to brand authority and by brandishing certifications such as ISO 27001 audits and PCI compliance. Some customers, however, remained skeptical, requiring special treatment beyond such assurances. For example, Amazon has a dedicated cloud for its government customers, presumably with improved security controls and isolated from the other riff-raff always threatening to break out of their own VMs to attack other tenants.
Meanwhile the academic community was inspired by these problems to build a new research agenda around computing on encrypted data. These schemes assume cloud providers are only given encrypted data which they cannot decrypt, not even temporarily, an important distinction that critically fails for many of the existing systems as we will see. Using sophisticated cryptographic techniques, the service provider can perform meaningful manipulations on ciphertext, such as searching for text or number-crunching, producing results that are only decryptable by the original data owner. This is a powerful notion. It preserves the main advantage of cloud computing (lease CPU cycles, RAM and disk space from someone else on demand to complete a task) while maintaining confidentiality of the data being processed, including, crucially, the outputs from the task.
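To make the idea of number-crunching on ciphertext concrete, here is a toy sketch of the Paillier cryptosystem, a *partially* homomorphic scheme: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so an untrusted party can add encrypted numbers without ever seeing them. This is an illustration only, with tiny hard-coded primes; real deployments use primes on the order of 1024 bits or more.

```python
import math
import random

# Toy Paillier parameters. NOT secure -- illustration only.
p, q = 17, 19
n = p * q          # public modulus
n2 = n * n
g = n + 1          # standard choice of generator
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1), private
mu = pow(lam, -1, n)  # modular inverse; requires Python 3.8+

def encrypt(m: int) -> int:
    """Encrypt integer m (0 <= m < n) under the public key (n, g)."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    """Decrypt using the private values lam and mu."""
    L = lambda x: (x - 1) // n
    return (L(pow(c, lam, n2)) * mu) % n

# The party holding only ciphertexts can add the underlying plaintexts
# by multiplying ciphertexts modulo n^2 -- without decrypting anything.
c1, c2 = encrypt(12), encrypt(30)
c_sum = (c1 * c2) % n2
print(decrypt(c_sum))  # 42
```

Paillier only supports addition (and multiplication by known constants); fully homomorphic encryption, which supports arbitrary computation, is the far more demanding construction discussed below.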
Cloud privacy in practice
At least that is the vision. Today private computation in the cloud is caught in a chasm between:
- Ineffective window-dressing that provides no meaningful security (the subject of this post)
- Promising ideas that are not quite feasible at scale yet, such as fully homomorphic encryption
In the first category are solutions which boil down to the used-car salesman’s pitch: “trust us, we are honest and/or competent.” Some of these are transparently non-technical in nature: for example, warrant canaries are an attempt to work around the gag orders accompanying national security letters by using the absence of a statement to hint at some incursion by law enforcement. Others attempt to cloak the critical trust assumption in layers of complex technology, hoping that an abundance of buzzwords (encrypted, HSM, “military-grade,” audit-trail, …) can pass for a sound design.
Box enterprise key management
As an example, consider the enterprise key management (EKM) feature pitched by Box. On paper this attempts to solve a very real problem discussed in earlier posts: storing data in the cloud encrypted in such a way that the cloud provider cannot read it. To qualify as “private computation” in the full sense, that guarantee must hold even when the service provider is:
- Incompetent: experiences a data breach by external attackers out to steal any data available
- Malicious: decides to peek into or tamper with hosted data, in violation of existing contractual obligations to the customer
- Legally compelled: required to provide customer data to a law-enforcement agency pursuant to an investigation
A system with these properties would be a far cry from popular cloud storage solutions available today. By default Google Drive, Microsoft OneDrive and Dropbox have full access to customer data. Muddying the waters somewhat, they often tout as a “security feature” that customer data is encrypted inside their own data centers. In reality such encryption is complete window-dressing: it can only protect against risks arising inside the provider’s own environment, such as rogue employees and theft of hardware from data centers. That encryption can be fully peeled away by the hosting service whenever it wants, without any cooperation from the original data custodian.
The solution Box announced with much fanfare claims to do better. Here is an outline of that design, to the extent that can be gleaned from published information:
- There is a master-key for each customer, where “customer” is defined as an enterprise rather than individual end-users. (Recall that Box distinguishes itself from Dropbox and similar services by focusing on managed IT environments.)
- As before, individual files uploaded to Box are encrypted with a key that Box generates.
- The new twist is that those individual bulk-encryption keys are in turn encrypted by the customer-specific master key.
So far this only adds a hierarchical aspect to key management. Where EKM differs is in transferring custody of the master key back to the customer, specifically to HSMs hosted at Amazon AWS, backed up by units hosted in the customer’s data center holding duplicates of the same secret keys. (It is unclear whether these are symmetric or asymmetric keys. The latter design would make more sense, allowing encryption to proceed locally without involving remote HSMs, with only decryption requiring interaction.)
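The two-level key hierarchy described above can be sketched as follows. This is a minimal stdlib-only illustration of key wrapping: a toy SHA-256-based stream cipher stands in for real authenticated encryption (AES-GCM or similar), the function names are hypothetical, and the customer HSM is simulated by a local variable rather than a remote call.

```python
import hashlib
import os

def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Symmetric: the same
    call encrypts and decrypts. Toy construction -- illustration only."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(
            key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Customer-held master key -- in EKM this would live inside the HSM
# and never leave it.
master_key = os.urandom(32)

def store_file(plaintext: bytes):
    # Provider side: a fresh per-file bulk key encrypts the file...
    file_key = os.urandom(32)
    nonce = os.urandom(16)
    ciphertext = toy_cipher(file_key, nonce, plaintext)
    # ...then the bulk key is wrapped under the customer master key.
    # Only the wrapped (encrypted) key is stored alongside the file.
    wrap_nonce = os.urandom(16)
    wrapped_key = toy_cipher(master_key, wrap_nonce, file_key)
    return ciphertext, nonce, wrapped_key, wrap_nonce

def read_file(ciphertext, nonce, wrapped_key, wrap_nonce) -> bytes:
    # Decryption first requires unwrapping the bulk key with the
    # master key -- in EKM, an interaction with the customer's HSM.
    file_key = toy_cipher(master_key, wrap_nonce, wrapped_key)
    return toy_cipher(file_key, nonce, ciphertext)

blob = store_file(b"quarterly financials")
assert read_file(*blob) == b"quarterly financials"
```

Note what the sketch makes visible: every decryption funnels through the master key, which is exactly why custody of that key (and of the HSMs holding it) determines who can ultimately read the data.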
Box implies that this last step is sufficient to provide “Exclusive key control – Box can’t see the customer’s key, can’t read it or copy it.” Is that sufficient? Let’s consider what could go wrong.