Cloud storage sans surveillance-capitalism
This post picks up on a series of experiments from 2013, originally written in the aftermath of the Snowden disclosures. These experiments started with one question: how feasible is it to use cloud storage services as a glorified remote drive, without giving the service provider any visibility into the data stored? This approach stands in stark contrast to how most cloud providers would prefer their services to be used. Call it the Surveillance Capitalism approach to storage: Google Drive, Dropbox and Microsoft OneDrive all operate in terms of individual files. While each provider may proudly tout their encryption-in-transit and encryption-at-rest to protect those files as they bounce around the internet, they all gloss over one inconvenient detail: the provider has access to the contents of those files. In fact the whole business model is predicated on being able to “add value” to contents. For example, if it is a PDF, index the text and allow searching by keywords across your entire collection of documents. If it is a photograph, analyze and automatically tag the image with names using facial recognition. For the most part, all of these applications require access to the cleartext content. While there is a nascent research field for working with encrypted data (where the service provider only has access to encrypted contents but cannot recover the original plaintext), these applications are largely confined to a research setting. “Trust us,” the standard Silicon Valley bargain goes: “we need access to your data so we can provide valuable services at zero (perceived, upfront) cost to you.”
This approach underserves a vocal segment of consumers who are uncomfortable with that trade-off, who would gladly dispense with these so-called value adds or pay a premium in exchange for better privacy. Specialized services cropped up in the aftermath of the Snowden revelations catering to that segment, promising to provide encryption-at-rest with keys held only by the customer: the Bring Your Own Keys (BYOK) model. Yet each offering was accompanied by the baggage of home-brew design and proprietary clients required to access data protected according to that model. This made integration tricky, because protecting remote data looked nothing like protecting local data. Each platform already has a de facto standard for encrypting local disk drives: BitLocker for Windows, LUKS for Linux and FileVault on OS X. Their prevalence led many individuals and organizations to adopt key management strategies tailored to that specific standard, designed to achieve the desired level of security and reliability. For example, an organization may want encryption keys rooted in hardware such as a TPM while also requiring some recovery option in case that TPM gets bricked. Proprietary designs for encrypting remote storage are unlikely to fit into that framework or achieve the same level of security assurance.
AWS Storage Gateway
The AWS Storage Gateway product is hardly new. Part of the expanding family of Amazon Web Services features, it was first introduced in 2012. Very little has changed in the way of high-level functionality; this blog post could have been published seven years ago. While AWS also provides file-oriented storage options such as S3 and Glacier, ASG operates on a different model: it provides an iSCSI interface, presenting the abstraction of a block device. An iSCSI volume is accessed the same way a local solid-state or spinning drive would be, addressed in terms of chunks of storage: “fetch the contents of block #2” or “write these bits to block #5”. One corollary is that existing operating system features that work on local disks also work on remote iSCSI volumes, modulo minor caveats. In particular, full disk encryption schemes can be used to encrypt them in the same way they can encrypt local drives. An earlier blog post walked through the example of Windows BitLocker To Go encrypting an iSCSI volume using smart cards.
The “gateway” itself is either a dedicated hardware appliance available for sale or a virtual machine that can be hosted on any common virtualization platform. Management of the appliances is split between the AWS Management Console and a restricted shell running on the appliance itself.
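To make the block-device abstraction concrete, here is a minimal sketch in Python. It is a toy in-memory model, not anything from the gateway itself: the class name `BlockDevice`, its methods and the 512-byte block size are all assumptions chosen for illustration.

```python
class BlockDevice:
    """Toy in-memory block device: fixed-size blocks addressed by number.

    Hypothetical illustration only; real iSCSI targets speak SCSI commands
    over TCP, which this sketch does not attempt to model.
    """

    def __init__(self, num_blocks: int, block_size: int = 512) -> None:
        self.num_blocks = num_blocks
        self.block_size = block_size
        # All blocks start out zero-filled, like a freshly provisioned disk.
        self._data = bytearray(num_blocks * block_size)

    def _check(self, index: int) -> None:
        if not 0 <= index < self.num_blocks:
            raise IndexError("block index out of range")

    def read_block(self, index: int) -> bytes:
        """'Fetch the contents of block #index.'"""
        self._check(index)
        start = index * self.block_size
        return bytes(self._data[start:start + self.block_size])

    def write_block(self, index: int, payload: bytes) -> None:
        """'Write these bits to block #index.'"""
        self._check(index)
        if len(payload) != self.block_size:
            raise ValueError("payload must be exactly one block long")
        start = index * self.block_size
        self._data[start:start + self.block_size] = payload


disk = BlockDevice(num_blocks=8)
disk.write_block(5, b"\x01" * 512)
untouched = disk.read_block(2)   # never written, so still zero-filled
```

Note that nothing in this interface knows about files or directories. That is exactly why a file system and a disk encryption layer can be stacked on top without the storage side being aware of either.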
iSCSI beyond the local neighborhood
ASG represents more than a shift from one protocol to another more convenient protocol. After all, no one needed help from Amazon to leverage iSCSI; it is commodity technology dating back two decades. It does not even require specialized storage hardware. Windows Server 2012 can act as a virtual iSCSI target, providing any number of volumes with specific size that can be accessed remotely. So why not launch a few Windows boxes in the cloud— perhaps at AWS even— create iSCSI volumes and call it a day?
The short answer is that iSCSI is not designed to operate over untrusted networks. It provides relatively weak, password-based initial authentication and, more importantly, provides no security on the communication link. The lack of confidentiality is not necessarily a problem when one assumes the data itself is already encrypted, but the lack of integrity is a deal breaker: it means an adversary can modify bits on the wire, resulting in data corruption during reads or writes. Granted, sound full-disk encryption (FDE) schemes seek to prevent attackers from making controlled changes to data. Corrupted blocks will likely decrypt to junk instead of a malicious payload. But this is hardly consolation for customers who lose valuable data. For this reason iSCSI is a better fit inside trusted local networks, such as one spanning a datacenter. On the other hand, if the servers providing those iSCSI targets are inside the datacenter, one has not achieved true “cloud storage” in the sense of having remote backups: now those servers have to be backed up some place else in the cloud, outside the datacenter.
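The integrity problem can be illustrated with a short Python sketch. It models a block crossing a link with and without a keyed integrity tag (HMAC-SHA256 from the standard library). This is not how iSCSI or TLS frame their traffic; the helper names `flip_bit`, `make_tag` and `verify_tag`, and the idea of a pre-shared `link_key`, are assumptions made for the example.

```python
import hashlib
import hmac
import os


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate an adversary on the wire flipping a single bit."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def make_tag(key: bytes, block_no: int, payload: bytes) -> bytes:
    """Keyed tag over the block number and the block contents."""
    message = block_no.to_bytes(8, "big") + payload
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, block_no: int, payload: bytes, tag: bytes) -> bool:
    """Constant-time comparison of the expected and received tags."""
    return hmac.compare_digest(make_tag(key, block_no, payload), tag)


link_key = os.urandom(32)    # hypothetical key shared by both endpoints
block = os.urandom(512)      # stands in for an already-encrypted disk block
tag = make_tag(link_key, 5, block)

tampered = flip_bit(block, 100)

# Without a tag, the receiver sees 512 plausible-looking bytes either way
# and has no basis for rejecting the tampered copy.
same_length = len(tampered) == len(block)

# With a tag, the modification is detected before the block is written.
accepted_original = verify_tag(link_key, 5, block, tag)
accepted_tampered = verify_tag(link_key, 5, tampered, tag)
```

Since the payload is ciphertext, which is indistinguishable from random bytes, the receiver cannot spot the flipped bit by inspection. Only a link-level integrity check (or a disk encryption mode that authenticates blocks) turns silent corruption into a detectable error.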
ASG provides a way out of that conundrum. On the front end, it presents a traditional iSCSI target that other devices on the network can mount and access. While that part is not novel, the crucial piece happens behind the scenes:
- ASG synchronizes contents with AWS S3. That channel does not use iSCSI; otherwise it would be turtles all the way down. Instead Amazon has authored a custom Java application that communicates with Amazon using an HTTP-based transport protected by TLS.
- ASG also has intelligent buffering to synchronize write operations in the background, based on available bandwidth. To be clear, ASG maintains a full copy of the entire disk. It is not a caching optimization designed to keep a small slice of contents based on frequency of access. All data is locally present for read operations. But writes must propagate to the cloud, and this is where local, persistent buffering provides a performance boost by not having to block on slow and unreliable network connections to the cloud. If the VM crashes before incoming writes are synchronized to the cloud, it can pick up and continue after the VM restarts.
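The buffering behavior described above can be sketched roughly as follows. This is a simplified model of the general write-back idea, not Amazon's implementation: the class `WriteBackGateway`, the in-memory `cloud` dictionary standing in for S3, and the choice to coalesce repeated writes to the same block are all assumptions. The real buffer is persistent across restarts, which this in-memory sketch does not attempt.

```python
from collections import OrderedDict


class WriteBackGateway:
    """Toy model: full local copy plus a queue of writes awaiting upload."""

    def __init__(self, num_blocks: int, block_size: int = 512) -> None:
        self.block_size = block_size
        # Full local copy of the volume; reads never touch the network.
        self.local = [bytes(block_size) for _ in range(num_blocks)]
        # Writes not yet uploaded, oldest first (block index -> latest data).
        self.pending = OrderedDict()
        # Hypothetical stand-in for the remote object store.
        self.cloud = {}

    def read(self, index: int) -> bytes:
        return self.local[index]

    def write(self, index: int, data: bytes) -> None:
        """Complete the write locally and queue it for background upload."""
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.local[index] = data
        # Assumption: a newer write to the same block supersedes the older one.
        self.pending.pop(index, None)
        self.pending[index] = data

    def sync(self, max_blocks: int) -> int:
        """Upload up to max_blocks pending writes; return how many were sent."""
        uploaded = 0
        while self.pending and uploaded < max_blocks:
            index, data = self.pending.popitem(last=False)
            self.cloud[index] = data
            uploaded += 1
        return uploaded


gw = WriteBackGateway(num_blocks=8)
gw.write(3, b"a" * 512)
gw.write(3, b"b" * 512)    # replaces the queued write for block 3
gw.write(1, b"c" * 512)
sent = gw.sync(max_blocks=1)   # limited bandwidth: only one block goes out
```

The `max_blocks` parameter is a crude proxy for available bandwidth. Callers of `write` return immediately whether or not the network is up, which is the performance benefit the buffering is meant to provide.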
Encrypted personal cloud storage
Here is a simple model for deploying AWS Storage Gateway to provide end-to-end encrypted personal storage on one device. This example assumes Windows with a virtualization platform such as Hyper-V or VMware Workstation:
- AWS Storage Gateway will run as a guest VM. While ASG is very much an enterprise technology focused on high-end servers, its resource requirements are manageable for moderate desktops and high-end laptops. iSCSI is not particularly CPU intensive but AWS calls for ~8GB of memory allocated to the VM, although the service will run with a little less. It is however more demanding of storage: buffers alone require half a terabyte of disk space even when the underlying iSCSI volume itself is only a handful of GB. AWS software will complain and refuse to start until this space is allocated. Luckily most virtualization platforms support dynamically resized virtual disk images. The resulting image has an upper bound on storage, but only takes up as much space as required to hold current contents. This makes it possible to appease the gateway software without dedicating massive amounts of space upfront.
- Configure networking to allow inbound connections to the VM over the internal network shared between host & guests. The VM must still have a virtual adapter connected to an external network, since it needs to communicate with AWS. But it will not be allowed to accept inbound iSCSI connections from that interface.
- Use the Windows iSCSI initiator to access the iSCSI target over the local virtual network shared between host & guests.
- After the disk is mounted, create an NTFS-formatted volume and configure BitLocker disk encryption as usual. The Windows Disk Management utility treats the ASG volume as an ordinary local disk. In fact the only hint that it is not a vanilla hard drive is contained in the self-reported device name from the gateway software.
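The dynamically resized disk images mentioned in the first step work on an allocate-on-write principle, which the following Python sketch tries to capture. Real image formats are considerably more elaborate; the class `DynamicDisk`, the 4096-byte block size and the use of 512 GiB for "half a terabyte" are assumptions for illustration.

```python
class DynamicDisk:
    """Toy dynamically expanding disk: blocks are allocated on first write."""

    def __init__(self, capacity_bytes: int, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.num_blocks = capacity_bytes // block_size
        # Only blocks that were actually written take up space here.
        self._blocks = {}

    @property
    def capacity(self) -> int:
        """Upper bound reported to the guest VM."""
        return self.num_blocks * self.block_size

    @property
    def allocated(self) -> int:
        """Space actually consumed on the host."""
        return len(self._blocks) * self.block_size

    def write_block(self, index: int, data: bytes) -> None:
        if not 0 <= index < self.num_blocks:
            raise IndexError("block index out of range")
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self._blocks[index] = data

    def read_block(self, index: int) -> bytes:
        if not 0 <= index < self.num_blocks:
            raise IndexError("block index out of range")
        # Unwritten blocks read back as zeros without being stored.
        return self._blocks.get(index, bytes(self.block_size))


buffer_disk = DynamicDisk(capacity_bytes=512 * 1024**3)  # assumed 512 GiB
buffer_disk.write_block(0, b"\xff" * 4096)
buffer_disk.write_block(1000, b"\xee" * 4096)
```

The guest sees the full `capacity` and the gateway software is satisfied, while the host only pays for `allocated`, which grows as the buffers fill.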
This model works for personal storage but poses some usability problems. In particular it requires a local VM on every device requiring access to cloud storage and does not allow concurrent access from multiple devices. (Mounting the same iSCSI target from multiple initiators in read/write mode is an invitation to data corruption.) The next post will consider a slightly more flexible architecture for accessing cloud data from multiple devices. More importantly we circle back to the original question around privacy: does this design achieve the objective of using cloud storage as a glorified drive, without giving Amazon any ability to read customer data? Considering that the AWS Storage Gateway is effectively blackbox software provided by Amazon and accepts remote updates from Amazon, we need to carefully evaluate the threat model and ask what could happen in the event that AWS goes rogue or is compelled by law enforcement to target a specific customer.