Revisiting the ideal layered model for encrypted cloud backups from the previous post:
- Encrypt user data with keys managed directly by the user and not available to any third party
- Upload the resulting ciphertext to the cloud
Many solutions exist for the first part of the problem, so it is natural to ask whether they can be used without modification alongside a standard cloud storage offering to protect data from any mishaps in the cloud. In other words, the goal is to use the cloud as a glorified disk drive storing opaque bits, with no ability to peer into the meaning of those bits or perform any intelligent manipulation on them in the name of “adding value.”
The first point is that this composition is not always straightforward. Many obvious combinations fail because of a mismatch between the level at which cloud backup systems operate and the level at which local data encryption is typically applied. To take one example of how things go wrong: the Encrypting File System, or EFS, has been a feature of Windows since the Windows 2000 (W2K) release. EFS operates at the filesystem level as part of NTFS, allowing entire directories or even individual files to be encrypted. Conveniently, many popular cloud storage systems such as Google Drive, SkyDrive and Dropbox also present themselves as a local folder where files can be dragged and dropped. A natural first attempt, then, is to enable EFS on that folder, on the assumption that this will encrypt the underlying content and keep it safe from prying eyes on the cloud provider side. Enabling EFS is simple enough: open the directory's Properties, click Advanced, and check “Encrypt contents to secure data” in the Advanced Attributes dialog.
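The same flag can also be set from code. The sketch below is a minimal illustration, assuming Python's ctypes on Windows: it calls the Win32 EncryptFileW function in advapi32, which encrypts a file, or marks a directory so that new files created in it are encrypted (roughly what cipher /e does from the command line). The helper name efs_encrypt is made up for this example.

```python
import ctypes
import sys


def efs_encrypt(path: str) -> None:
    """Hypothetical helper: mark `path` (a file or directory on an NTFS
    volume) as EFS-encrypted, like ticking the Advanced Attributes checkbox.

    Raises OSError on failure, or on any platform other than Windows.
    """
    if sys.platform != "win32":
        raise OSError("EFS is only available on Windows (NTFS volumes)")
    # use_last_error=True makes ctypes capture GetLastError() after each call.
    advapi32 = ctypes.WinDLL("advapi32", use_last_error=True)
    encrypt_file = advapi32.EncryptFileW
    encrypt_file.argtypes = [ctypes.c_wchar_p]  # LPCWSTR lpFileName
    encrypt_file.restype = ctypes.c_int         # BOOL: nonzero on success
    if not encrypt_file(path):
        raise ctypes.WinError(ctypes.get_last_error())
```

Pointing this at the local folder of a sync client has exactly the same effect as the checkbox, including the problem described next.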
This may look straightforward, but it does not accomplish the intended effect of keeping the original data invisible to the cloud provider. To see why, we need a few facts about the way these cloud storage services usually operate. Typically a background process, which we will call the synchronization agent, does the work; it often runs as an ordinary application in the user's session rather than as the proper Windows service one might expect. The agent is responsible for monitoring a set of local directories for changes, as well as listening for notifications from the cloud provider about the availability of new content in the cloud. When the operating system informs the agent that a local file has changed, the agent kicks into action and uploads the latest version of the modified file to the cloud. In the other direction, when the cloud provider notifies the agent that a more recent version of a file is available in the cloud, the agent downloads it and drops it into the local folder.
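A toy version of the upload half of such an agent makes the data flow concrete. This is a hypothetical sketch, not how any particular product works: it polls the folder and compares modification times, whereas real agents typically subscribe to change notifications (on Windows, ReadDirectoryChangesW), and it leaves out downloads and deletions entirely. The upload callback and the helper names are placeholders for whatever transport and structure a provider actually uses.

```python
import os
import time
from typing import Callable, Dict

# upload(relative_name, contents) stands in for the provider's transport.
Uploader = Callable[[str, bytes], None]


def snapshot(root: str) -> Dict[str, float]:
    """Map every file under root to its last-modification time."""
    state: Dict[str, float] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            state[path] = os.path.getmtime(path)
    return state


def sync_once(root: str, previous: Dict[str, float], upload: Uploader) -> Dict[str, float]:
    """Upload every file that is new or whose mtime changed since `previous`."""
    current = snapshot(root)
    for path, mtime in sorted(current.items()):
        if previous.get(path) != mtime:
            # An ordinary open/read: no encryption-specific code anywhere.
            with open(path, "rb") as f:
                upload(os.path.relpath(path, root), f.read())
    return current


def run_agent(root: str, upload: Uploader, interval: float = 5.0) -> None:
    """Poll forever; a real agent would block on change notifications instead."""
    state: Dict[str, float] = {}
    while True:
        state = sync_once(root, state, upload)
        time.sleep(interval)
```

Polling keeps the sketch portable and testable. The part that matters for this post is the plain open() call, which is the same whether or not the folder is EFS-protected.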
The reason this does not interact as expected with EFS is that EFS operates at a very low level, beneath the ordinary file I/O calls that agents use and invisible to them. Synchronization takes place under the same user account as the person who owns the EFS-protected directory containing confidential information. When that process attempts to open a file for reading, the EFS driver in the filesystem stack kicks into action. It recognizes that the file is encrypted and looks for a decryption key. Because the same user account holds the private key whose corresponding public key was used when the files were encrypted, the driver can transparently decrypt the file and return its unprotected contents to the agent. The result is that the original cleartext, rather than encrypted files, gets sent to the cloud provider.**
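Nothing in an agent's read path mentions EFS, and that is the whole problem. The sketch below, built around a made-up helper read_like_sync_agent, reads a file the way an agent would and also reports whether NTFS has flagged it with FILE_ATTRIBUTE_ENCRYPTED (exposed in Python as stat.FILE_ATTRIBUTE_ENCRYPTED, with os.stat(...).st_file_attributes available only on Windows). Assuming the file is opened by its owner on the machine holding the EFS private key, both results come back together: the flag says encrypted, while the bytes are cleartext.

```python
import os
import stat
from typing import Tuple


def read_like_sync_agent(path: str) -> Tuple[bool, bytes]:
    """Return (marked_encrypted, contents) for `path`, using only the
    plain file API, exactly as a synchronization agent would."""
    # st_file_attributes exists only on Windows; treat it as 0 elsewhere.
    attrs = getattr(os.stat(path), "st_file_attributes", 0)
    marked_encrypted = bool(attrs & stat.FILE_ATTRIBUTE_ENCRYPTED)
    # For an EFS file opened by its owner, the EFS driver decrypts inside
    # this ordinary open/read, so `contents` is the cleartext.
    with open(path, "rb") as f:
        contents = f.read()
    return marked_encrypted, contents
```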
This transparency is entirely by design. Reading and writing the contents of encrypted files works exactly the same as for ordinary files; from the developer's perspective, the same APIs and code paths are invoked. Imagine the alternative: if a different set of APIs were involved, every application would have to be rewritten to become compatible with EFS. In practice nobody could enable EFS, out of fear that some application they use had not yet been upgraded to become encryption-aware. The EFS designers made the right call in opting for compatibility. It is an unintended consequence that EFS does not compose directly with the cloud storage designs popular today. (There are in fact dedicated APIs, such as ReadEncryptedFileRaw, for retrieving the encrypted contents of a file. These would achieve the desired effect of backing up only ciphertext, but using them requires modifying the synchronization agents to invoke different code paths.)
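For comparison, here is roughly what an encryption-aware code path looks like. The Win32 functions OpenEncryptedFileRawW, ReadEncryptedFileRaw and CloseEncryptedFileRaw export a file as an opaque EFS backup stream (ciphertext plus EFS metadata, never the cleartext) that can later be restored with WriteEncryptedFileRaw. The ctypes wrapper export_encrypted_raw is an illustrative sketch only: error handling is minimal, and the access rights and privileges the export requires are as described in the Win32 documentation.

```python
import ctypes
import sys


def export_encrypted_raw(path: str) -> bytes:
    """Hypothetical helper: return the raw EFS backup stream for `path`.

    Raises OSError on failure, or on any platform other than Windows.
    """
    if sys.platform != "win32":
        raise OSError("EFS raw export requires Windows")
    from ctypes import wintypes

    advapi32 = ctypes.WinDLL("advapi32")
    # typedef DWORD (WINAPI *PFE_EXPORT_FUNC)(PBYTE pbData, PVOID ctx, ULONG len);
    ExportFunc = ctypes.WINFUNCTYPE(
        wintypes.DWORD, ctypes.POINTER(ctypes.c_ubyte), ctypes.c_void_p, wintypes.ULONG)
    advapi32.OpenEncryptedFileRawW.argtypes = [
        wintypes.LPCWSTR, wintypes.ULONG, ctypes.POINTER(ctypes.c_void_p)]
    advapi32.OpenEncryptedFileRawW.restype = wintypes.DWORD
    advapi32.ReadEncryptedFileRaw.argtypes = [ExportFunc, ctypes.c_void_p, ctypes.c_void_p]
    advapi32.ReadEncryptedFileRaw.restype = wintypes.DWORD
    advapi32.CloseEncryptedFileRaw.argtypes = [ctypes.c_void_p]
    advapi32.CloseEncryptedFileRaw.restype = None

    context = ctypes.c_void_p()
    # ulFlags = 0 opens the file for export (reading) rather than import.
    err = advapi32.OpenEncryptedFileRawW(path, 0, ctypes.byref(context))
    if err != 0:  # anything other than ERROR_SUCCESS
        raise ctypes.WinError(err)

    chunks = []

    def on_chunk(data, _callback_context, length):
        chunks.append(ctypes.string_at(data, length))
        return 0  # ERROR_SUCCESS: ask for the next chunk

    callback = ExportFunc(on_chunk)  # keep a reference alive during the call
    try:
        err = advapi32.ReadEncryptedFileRaw(callback, None, context)
        if err != 0:
            raise ctypes.WinError(err)
    finally:
        advapi32.CloseEncryptedFileRaw(context)
    return b"".join(chunks)
```

An agent built around a call like this would upload only the returned blob, which the provider cannot decrypt; that is the kind of modification off-the-shelf agents do not make.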
The next post will look at tricks for combining specific local encryption schemes with arbitrary off-the-shelf cloud storage solutions to achieve the desired privacy property: data stored remotely is not accessible to the cloud provider, regardless of its intentions.
** As an aside, lack of encryption is not the only problem. Even if that could be addressed, there is a loss of functionality: uploaded data would be inaccessible on any other machine, or even on a fresh install of Windows on the same box. EFS uses key pairs that are generated locally on the machine where encryption is first applied, and it is not trivial to roam that key to another device. EFS does have a notion of recovery agents to allow decryption when the original key pair is not available, but that heavyweight process would have to be invoked on every new device to recover access. For now we put aside this problem of access from multiple devices and focus only on privacy in the context of a single device being backed up and restored.