Cloud backup and privacy: the problem with SpiderOak (part I)

Continuing the theme from an earlier post (that the economic incentives of cloud computing favor service providers having access to user data, instead of serving as a repository of opaque bits), here we look at a service that attempts to swim against the current.

In the wake of FUD created around cloud computing due to PRISM allegations, SpiderOak has come to the forefront as an exemplary service that optimizes for user privacy. SpiderOak provides a remote backup and file access service, allowing users to save copies of their data in the cloud and access it from any of their devices. This is a crowded space with many competitors, ranging from startups specializing in that one field (Dropbox, Mozy) to established companies (SkyDrive from MSFT, Google Drive from Google) offering cloud storage as one piece of their product portfolio. Wikipedia has a comparison of online backup services, with a helpful table that can be sorted on each attribute.

From a privacy perspective the interesting column is the one labeled “personal encryption.” This non-descriptive label probably owes to the successful campaign of disinformation that cloud service providers have embarked on to reassure users. Every service provider throws around phrases like “military grade encryption” and “256-bit AES” without any consideration of the overall threat model, that is, what exactly that fancy cryptography is designed to protect against. Stripping away this usage of encryption as magic pixie dust, there are three distinct scenarios where it can be effective:

  1. Protecting data in transit. This assumes a bad guy eavesdropping on the network, trying to snoop on private information as it is being backed up to the cloud or, going in the opposite direction, as it is being downloaded from the cloud. This is a well-understood problem, with established solutions. A standard communication protocol such as TLS can be used to set up an encrypted channel from the user to the service provider.
  2. Protecting data at rest, from unauthorized access. This is a slightly more nebulous threat model. Perhaps the provider backs up their own data on tape archives offsite, or sends off defective drives for repair– situations where media containing user data could be stolen. In this case bad guys– who are not affiliated with the cloud provider– attain physical possession of the storage. Proper encryption can still prevent them from recovering any useful information from that media.
  3. Protecting data from the service provider itself. This is the most demanding threat model. One envisions the provider itself going rogue, as opposed to #2 where they are only assumed to be incompetent/accident-prone. The standard scenario is the disgruntled employee with full access to the service, who decides to violate company policy and dig through private information belonging to customers. A slightly different but very contemporary issue is that of law enforcement access.
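
To make scenario #3 concrete, here is a minimal sketch of client-side encryption in which the key is derived from a passphrase on the user's machine, so the provider only ever stores a salt and ciphertext. This is an illustration under stated assumptions, not a description of how SpiderOak or any other service works. The function names are hypothetical, and because the Python standard library has no AES, the cipher below is a stand-in stream cipher built from HMAC-SHA256 in counter mode; it has no integrity protection and should not be used for real data.

```python
import hashlib
import hmac
import os


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a 32-byte key on the client; the passphrase never leaves it."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, 200_000)


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in stream cipher (HMAC-SHA256 in counter mode). Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Return nonce || ciphertext. A fresh random nonce is used per message."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, blob: bytes) -> bytes:
    """Inverse of encrypt(); requires the same key that was derived client-side."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


if __name__ == "__main__":
    salt = os.urandom(16)                      # not secret; may be stored by the provider
    key = derive_key("correct horse battery staple", salt)
    blob = encrypt(key, b"private document")   # this is all the provider receives
    assert decrypt(key, blob) == b"private document"
```

The point of the sketch is where the key lives: the provider holds `salt` and `blob` but never `key` or the passphrase, so an insider or anyone compelling the provider has nothing to decrypt with. A real design would use an authenticated cipher from a vetted library.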

These properties are not entirely orthogonal. If data is not protected in transit, clearly it can not be protected from the service provider either: it is exposed during upload at a minimum. Likewise #3 implies #2: if the service provider can not decrypt the data even with full malicious intent, there is no act of negligence they can commit to enable third parties to decrypt it either. The converse does not hold. Consider encrypted backups. If done correctly, the low-skilled burglars walking off with backup tapes can not recover any user data. But since the provider can decrypt that data using keys held in its own system, so can others with access to the full capabilities of the company. That means not only the disgruntled employee looking for retribution, but also law enforcement showing up with appropriate papers, not to mention an APT group from halfway around the world that 0wned the service provider.
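
The dependencies between the three properties can be checked mechanically with a small model. The sketch below is only an illustration: `KeyCustody`, `Design`, and `protections` are hypothetical names, and it makes the simplifying assumptions that a key held only by the user means data leaves the client already encrypted, and that the client software itself is trustworthy.

```python
from dataclasses import dataclass
from enum import Enum


class KeyCustody(Enum):
    NONE = "no encryption at rest"
    PROVIDER = "provider holds the key"
    CLIENT_ONLY = "only the user holds the key"


@dataclass(frozen=True)
class Design:
    tls: bool              # is the upload/download channel encrypted?
    custody: KeyCustody    # who can decrypt the stored data?


def protections(design: Design) -> dict:
    """Map a design to the three properties (#1, #2, #3) from the list above."""
    client_side = design.custody is KeyCustody.CLIENT_ONLY
    return {
        # #1: an eavesdropper sees only ciphertext (via TLS or client-side encryption)
        "in_transit": design.tls or client_side,
        # #2: stolen media is useless without a key stored elsewhere
        "at_rest": design.custody is not KeyCustody.NONE,
        # #3: the provider has no key, so insiders and subpoenas get only ciphertext
        "from_provider": client_side,
    }


if __name__ == "__main__":
    for tls in (True, False):
        for custody in KeyCustody:
            p = protections(Design(tls, custody))
            # #3 implies both #1 and #2 in every design...
            if p["from_provider"]:
                assert p["in_transit"] and p["at_rest"]
            print(tls, custody.name, p)
    # ...but the converse fails: TLS plus provider-held keys gives #1 and #2 only.
    assert not protections(Design(True, KeyCustody.PROVIDER))["from_provider"]
```

The encrypted-backup case in the text corresponds to `Design(tls=True, custody=KeyCustody.PROVIDER)`: the burglar with the tapes is stopped, but anyone who can act with the provider's full capabilities is not.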

It is easy to verify that #1 has been implemented, since that part can be observed by any user. Granted, there are many ways to get TLS wrong, some more subtle than others. But that pales in comparison to the difficulty of independently verifying the internal processes used by the provider. Are they indeed encrypting data at rest as claimed? Is there an unencrypted copy left somewhere accidentally? This is why designs that provide stringent guarantees about #3 are very appealing. If user data can not be recovered by the provider, it matters much less what goes on behind the closed doors of their data center.
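
As a rough illustration of how observable #1 is from the outside, the following sketch uses Python's standard `ssl` module to connect to a server with certificate and hostname verification enabled and report what was negotiated. The helper names are hypothetical and the hostname in the usage note is a placeholder; a successful handshake says nothing about the subtler TLS mistakes mentioned above, and nothing at all about #2 or #3.

```python
import socket
import ssl


def make_verifying_context() -> ssl.SSLContext:
    """Default client context: verifies the certificate chain and the hostname."""
    return ssl.create_default_context()


def negotiated_tls(host: str, port: int = 443, timeout: float = 5.0):
    """Connect, complete the handshake, and return (protocol version, cipher name).

    Raises ssl.SSLCertVerificationError if the certificate does not validate.
    """
    context = make_verifying_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]
```

Calling `negotiated_tls("backup.example.com")` against a provider's upload endpoint (the hostname here is made up) is something any user can do. There is no comparable call that reveals how the provider handles keys inside its own data center.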



6 thoughts on “Cloud backup and privacy: the problem with SpiderOak (part I)”

  1. Point 3 is certainly interesting. A customer of mine had an experience with this from a very large, well-known cloud service provider who should have known better. While the service provider did deal with their staff member (I believe the police were involved) it was the last straw for my customer who cancelled their contract not long after.

    One thing this event made me think is that the provider’s rogue staff member may not even be that disgruntled – they just don’t feel they owe you (the customer) anything. It’s like that disconnect that sees many people happy to remove cash from a found wallet before handing it in, while at the same time believing themselves incapable of theft. Point 3 should be an essential requirement from any cloud service provider because they can’t protect us from human nature, whatever their work policies say.

    • I suspect many cloud providers mistakenly believe that server-side encryption magically solves that problem.

      In reality encryption only limits data to people holding the keys. Since the service provider at some level can decrypt, that means so can some employees. At best what they are doing is limiting the number of employees to worry about, which can still be a large number. (Often they can not even enumerate that group or it is vaguely defined, due to many privilege-escalation paths and the assumption that anything inside the perimeter is inherently “safe.”)

    • Thank you for that comment. I was not aware of BackupThat.

      From a quick scan, it looks like they actually collect the username/password to your email account. From the FAQ:

      “First, Backup That needs your email address and password in order to link your email account to your Backup That account. Your login creditentials [sic] are stored securely and encrypted. ”

      That sounds quite disconcerting: with those credentials, they could read all your email, not just the attachments backed up in this manner. (It would also be incompatible with 2-factor authentication, since that requires an OTP that changes each time, along with the password.)

      • They have a ton of encryption around my login credentials. If you read further, they talk about how they can’t see your files. I’ve been using them since I saw them at SXSW and I haven’t had a problem. They’ve been certified by etrust as well.

      • Regardless of how well login credentials are encrypted, at the end of the day they can decrypt to recover the original, because logging in to Google requires submitting the cleartext password. (They could have used OAuth 2.0 to access email only by going through the approval flow, and then no password would be involved.)

        All that said, one can always create a throw-away Gmail account only used for this purpose and containing no personal data, and share that password.
