Recap from this series of posts exploring the idea of creating a private cloud storage system (where the service provider cannot read user data even if it wants to or is compelled to) using only commodity systems:
- Encrypting File System (EFS) does not interact as expected with cloud storage systems, leading to unprotected plaintext data being uploaded.
- Parts 4 & 5 explored how BitLocker can be used to encrypt virtual disk images (VHDs) which are then uploaded to the cloud. But this design suffers from very slow upload times due to the lack of incremental sync in most storage systems, not to mention the inability to perform backups while the VHD contents are in use.
- Parts 6 & 7 looked at an alternative design that applies BitLocker to virtual iSCSI targets inside a Windows VM hosted in the cloud. This one supports incremental replication but does not provide data integrity when used over untrusted connections. It also has problems with concurrent access, requiring some higher-level protocol to ensure that only one device is accessing data at a given time.
Given that none of these proof-of-concept implementations proved practical, it is time to ask a different question: what would an ideal cloud storage system look like?
1. 100% user ownership of keys
Cryptographic keys used for encryption are generated by the user and stored only on user-owned and user-operated devices. Keys are never “loaned” to the cloud provider, not even temporarily to perform on-the-fly decryption when the user is accessing data; otherwise the provider could copy or otherwise improperly capture the key, extending that “temporary” access into permanent access. Similarly, keys cannot be stored by the cloud provider, not even in password-protected form, because that would permit the provider to mount an offline attack to guess the password. (SpiderOak fails this criterion, as noted earlier.)
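As a concrete illustration of this rule, here is a minimal sketch in Python (using the third-party `cryptography` package; the file name and function names are invented for this post, not taken from any real product) of the intended key lifecycle: the key is generated on the user's device and only ciphertext ever crosses the wire.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

KEY_PATH = "master.key"  # lives only on the user-owned device

def load_or_create_key() -> bytes:
    """Generate the key locally on first use; it is never uploaded,
    not even wrapped under a password, since a password-protected
    copy would let the provider mount an offline guessing attack."""
    if os.path.exists(KEY_PATH):
        with open(KEY_PATH, "rb") as f:
            return f.read()
    key = AESGCM.generate_key(bit_length=256)
    with open(KEY_PATH, "wb") as f:
        f.write(key)
    return key

def seal_for_upload(plaintext: bytes) -> bytes:
    """Encrypt locally; the returned nonce || ciphertext blob is the
    only thing the cloud provider ever sees."""
    nonce = os.urandom(12)  # unique 96-bit nonce per message
    return nonce + AESGCM(load_or_create_key()).encrypt(nonce, plaintext, None)
```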
2. Locally installed and managed application
Of course the line between “locally installed” software (what used to be called shrink-wrap software in the days when applications came in boxes lining store shelves) and “web-based” software is increasingly blurred. Even local applications can have update mechanisms that call home and receive additional pieces of code to execute. Depending on the vendor, such mechanisms may or may not provide any user control. Microsoft, for instance, tends to make automatic updates at least opt-out; Google, on the other hand, typically favors forcing updates on users.
In this model the software publisher is able to slip in malicious logic targeted at specific users to undermine the encryption process. That said, this is a relatively high-risk move. In principle such a backdoor would be discoverable if someone went to the trouble of reverse-engineering updates. It would also be undeniable for the publisher, since software updates are typically digitally signed. A reputation for delivering malicious updates can be limiting for future business prospects. (As an aside, Hushmail also has an option for using a Java applet, touted as the “safer” option, never mind that Java in the browser has been a source of constant vulnerabilities. But that applet itself comes from Hushmail, so there is nothing stopping the service provider from tampering with its logic if it were so inclined.)
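Because updates are signed, a client can check every update against the publisher's key before executing it, and that same signature is what makes a malicious update undeniable. A minimal sketch, assuming an Ed25519 signing key and the third-party Python `cryptography` package (the function name here is invented for illustration):

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey
from cryptography.exceptions import InvalidSignature

def verify_update(update_bytes: bytes, signature: bytes,
                  publisher_key: Ed25519PublicKey) -> bool:
    """Illustrative check: the publisher's public key ships with the
    initial install, and every update must verify against it."""
    try:
        publisher_key.verify(signature, update_bytes)
        return True
    except InvalidSignature:
        # A bad signature is evidence of tampering; conversely, the
        # publisher cannot disown a correctly signed malicious update.
        return False
```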
3. Standardized, modular protocol for cloud synchronization
To avoid the type of situation described earlier, it is best to decouple the local software that provides encryption from the remote service that provides storage of bits. Ideally this means a modular design: multiple local encryption schemes can be coupled with a given storage provider. Conversely for a given local encryption scheme, there are multiple providers who can store the resulting ciphertext, competing on factors such as space allocated, bandwidth and cost. This relieves the user from depending on a single entity for both providing the encryption logic and storing the resulting ciphertext. More importantly it gives users full control over the implementation: if they distrust a particular software publisher, they can choose a different interoperable one.
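One way to picture this decoupling is as two independent interfaces, where any encryption scheme can be paired with any storage provider. The sketch below is purely illustrative; the interface and method names are invented for this post rather than taken from any existing standard:

```python
from abc import ABC, abstractmethod

class EncryptionScheme(ABC):
    """Runs locally on the user's device; holds the keys."""
    @abstractmethod
    def seal(self, plaintext: bytes) -> bytes: ...
    @abstractmethod
    def open(self, ciphertext: bytes) -> bytes: ...

class StorageProvider(ABC):
    """Remote service; only ever handles opaque ciphertext blobs."""
    @abstractmethod
    def put(self, name: str, blob: bytes) -> None: ...
    @abstractmethod
    def get(self, name: str) -> bytes: ...

class SyncClient:
    """Any scheme can be paired with any provider, so the user can
    swap out either side without losing access to their data."""
    def __init__(self, scheme: EncryptionScheme, store: StorageProvider):
        self.scheme, self.store = scheme, store

    def upload(self, name: str, plaintext: bytes) -> None:
        self.store.put(name, self.scheme.seal(plaintext))

    def download(self, name: str) -> bytes:
        return self.scheme.open(self.store.get(name))
```

The point of the split is that the storage provider never handles keys or plaintext, and distrust of one party never forces the user to abandon the other.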
4. Encryption at the individual file level
This is primarily to simplify access from multiple devices. It is much easier to merge or manage changes at the granularity of individual files than at the lower level of an entire filesystem. The reason the BitLocker-based designs did not handle concurrent access well is that a filesystem is effectively a global structure that cannot be managed piecemeal by multiple devices unaware of each other. With per-file encryption the worst-case scenario is one device overwriting an edit made elsewhere, and that is far more amenable to existing technologies for tracking/merging changes, as long as all versions of the file can be retrieved from the cloud.
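To make the contrast with full-disk designs concrete, here is a hypothetical per-file encryption sketch (again assuming AES-GCM from the Python `cryptography` package): each file is sealed independently, so any one ciphertext can be re-uploaded, versioned, or merged without touching the others.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_file(key: bytes, path: str) -> bytes:
    """Seal one file independently of every other file: a fresh nonce
    per version, with the file name bound as associated data so that
    ciphertexts cannot be silently swapped between files."""
    with open(path, "rb") as f:
        plaintext = f.read()
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, path.encode())
    return nonce + ciphertext  # this blob alone is what gets synced

# Because each file is sealed on its own, a conflict on one document
# only requires fetching the competing versions of that document;
# no other ciphertext in the account needs to change.
```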