Filecoin, Storj and the problem with decentralized storage (part II)

[continued from part I]

Quantifying reliability

Smart-contracts cannot magically prevent hardware failures or compel a service provider at gunpoint to perform the advertised services. At best, blockchains can facilitate contractual arrangements with a fairness criterion: the service provider gets paid if and only if they deliver the goods. Proofs-of-storage verified by a decentralized storage chain are an example of that model. It keeps service providers honest by making their revenue contingent on living up to the stated promise of storing customer data. But as the saying goes, past performance is no guarantee of future results. A service provider can produce the requisite proofs 99 times and then report that all data is lost when it is time for the next one. This can happen because of an “honest” mistake or, more troubling, because it is more profitable for decentralized providers to break existing contracts.

When it comes to honest mistakes (random hardware failures resulting in unrecoverable data loss), the guarantees that can be provided by decentralized storage alone are slightly weaker. This follows from a limitation of existing decentralized designs: their inability to express the reliability of storage systems, except in the most rudimentary ways. All storage systems are subject to risks of hardware failure and data loss. That goes for AWS and Google. For all the sophistication of their custom-designed hardware, they are still subject to the laws of physics. There is still a mean-time-to-failure associated with every component. It follows that there must be a design in place to cope with those failures across the board, ranging from making regular backups to having diesel generators ready to kick in when the grid power fails. We take for granted the existence of this massive infrastructure behind the scenes when dealing with the likes of Amazon. There is no such guarantee for a random counterparty on the blockchain.

Filecoin uses a proof-of-replication intended to show that not only does the storage provider have the data but they have multiple copies. (Ironically that involves introducing even more work on the storage provider to format data for storage, since otherwise they could fool the test by re-encrypting one copy into multiple replicas when necessary. That pushes the economics even further away from the allegedly zero marginal cost.) That may seem comparable to the redundancy of AWS but it is not. Five disks sitting in the same basement hooked up to the same PC can claim “5-way replication.” But it is not meaningful redundancy because all five copies are subject to correlated risk, one lightning-strike or ransomware infection away from total data loss. By comparison Google operates data-centers around the world and can afford to put each of those five copies in a different facility. Each one of those facilities still has a non-zero chance of burning to the ground or losing power during a natural disaster. As long as the locations are far enough from each other, those risks are largely uncorrelated. That key distinction is lost in the primitive notion of “replication” expressed by smart-contracts.
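
A rough way to see why the two setups differ is to put numbers on them. The sketch below is illustrative only: the failure probabilities are made-up assumptions rather than measured figures, and the function names are hypothetical. It compares the chance of losing every copy when all replicas share one site against the chance when each replica sits in its own, independently failing facility.

```python
def copy_loss_probability(p_site: float, p_disk: float) -> float:
    """Chance a single copy is lost: its site fails, or the site
    survives but the disk holding the copy fails."""
    return p_site + (1 - p_site) * p_disk


def loss_all_colocated(p_site: float, p_disk: float, copies: int) -> float:
    """All copies share one site: a single site-wide event (fire,
    lightning, ransomware) destroys everything at once."""
    return p_site + (1 - p_site) * p_disk ** copies


def loss_all_separate_sites(p_site: float, p_disk: float, copies: int) -> float:
    """Each copy lives in its own facility and facilities fail
    independently, so every copy must be lost separately."""
    return copy_loss_probability(p_site, p_disk) ** copies


# Assumed yearly probabilities, chosen only for illustration.
P_SITE = 0.01   # site-wide disaster
P_DISK = 0.03   # individual disk failure
COPIES = 5

colocated = loss_all_colocated(P_SITE, P_DISK, COPIES)
separated = loss_all_separate_sites(P_SITE, P_DISK, COPIES)

print(f"same basement:  {colocated:.2e}")
print(f"separate sites: {separated:.2e}")
```

Whatever values are plugged in, the co-located setup can never do better than the site-wide failure rate, no matter how many disks are added. A replica count alone says nothing about which of the two cases applies.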

Unreliable by design

Reliability questions aside, there is a more troubling problem with the economics of decentralized storage. It may well be the most rational— read: profitable— strategy to operate an unreliable service deliberately designed to lose customer data. Here are two hypothetical examples to demonstrate the notion that on a blockchain, there is no success like failure.

Consider a storage system designed to store data and publish regular proofs of storage as promised, but with one catch: it would never return that data if the customer actually requested it. (From the customer perspective: you have backups but unbeknownst to you, they are unrecoverable.) Why would this design be more profitable? Because streaming a terabyte back to the customer is dominated by an entirely different type of operational expense than storing that terabyte in the first place: network bandwidth. It may well be profitable to set up a data storage operation in the middle-of-nowhere with cheap real-estate, abundant power but expensive bandwidth. Keeping data in storage while publishing the occasional proof involves very little bandwidth, because proof-of-storage protocols are very efficient in space. The only problem comes up if the customer actually wants their entire data streamed back. At that point a different cost structure involving network bandwidth comes into play and it may well be more profitable to walk away.

To make this more concrete: at the time of writing AWS charges ~1¢ per gigabyte per month for “infrequently accessed” data but 9¢ per gigabyte of data outbound over the network. Conveniently, inbound traffic is free; uploading data to AWS costs nothing. As long as the prevailing Filecoin market price is higher than S3 prices, one can operate a Filecoin storage miner on AWS to arbitrage the difference. This is exactly what Dropbox used to do before figuring out how to operate its own datacenter. The only problem with this model is if the customer comes calling for their data too early or too often. In that case the bandwidth costs may well disrupt the profitability equation. If streaming the data back would lose money overall on that contract, the rational choice is to walk away.
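
The arithmetic behind that decision can be sketched in a few lines. The two AWS prices are the approximate figures quoted above; the market price of 1.5¢ per gigabyte per month is a purely hypothetical number, and the model assumes the provider receives no separate fee for retrieval.

```python
STORAGE_USD_PER_GB_MONTH = 0.01  # approximate S3 "infrequent access" price quoted above
EGRESS_USD_PER_GB = 0.09         # approximate S3 outbound bandwidth price quoted above


def contract_profit_per_gb(price_per_gb_month: float, months: int,
                           full_retrievals: int) -> float:
    """Provider profit per gigabyte over the life of a contract,
    given how many times the customer downloads everything."""
    revenue = price_per_gb_month * months
    cost = (STORAGE_USD_PER_GB_MONTH * months
            + EGRESS_USD_PER_GB * full_retrievals)
    return revenue - cost


def months_to_absorb_one_retrieval(price_per_gb_month: float) -> float:
    """Months of storage margin needed to pay for one full download."""
    margin = price_per_gb_month - STORAGE_USD_PER_GB_MONTH
    if margin <= 0:
        return float("inf")  # no margin: a retrieval is never recouped
    return EGRESS_USD_PER_GB / margin


HYPOTHETICAL_MARKET_PRICE = 0.015  # assumed, not a real quote

for retrievals in (0, 1, 2):
    profit = contract_profit_per_gb(HYPOTHETICAL_MARKET_PRICE, 12, retrievals)
    print(f"{retrievals} retrieval(s) in 12 months: {profit:+.3f} USD per GB")

print(f"months of margin per retrieval: "
      f"{months_to_absorb_one_retrieval(HYPOTHETICAL_MARKET_PRICE):.1f}")
```

With the quoted AWS prices, a single full retrieval costs the provider as much as nine months of storage fees, regardless of the market price chosen for the example.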

Walking away from the contract for profit

Recall that storage providers are paid in funny money, namely the utility token associated with the blockchain. That currency is unlikely to work for purchasing anything in the real world and must be converted into dollars, euros or some other unit of measure accepted by the utility company to keep the datacenter lights on. That conversion in turn hinges on a volatile exchange rate. While there are reasonably mature markets in major cryptocurrencies such as Bitcoin and Ethereum, the tail-end of the ICO landscape is characterized by thin order-books and highly speculative trading. Against the backdrop of what amounts to an extreme version of FX risk, the service provider enters into a contract to store data for an extended period of time, effectively betting that the economics will work out. It need not be profitable today but perhaps it is projected to become profitable in the near future based on rosy forecasts of prices going to the moon. What happens if that bet proves incorrect? Again the rational choice is to walk away from the contract and drop all customer data.
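
The bet can be framed as a break-even exchange rate. In this minimal sketch all figures are hypothetical: the contract pays a fixed number of tokens per gigabyte per month, costs are incurred in dollars, and the provider is assumed to walk away as soon as the tokens received no longer cover those costs. Sunk costs and expectations about future prices are deliberately ignored.

```python
def breakeven_usd_per_token(cost_usd_per_gb_month: float,
                            tokens_per_gb_month: float) -> float:
    """Exchange rate below which the contract loses money each month."""
    if tokens_per_gb_month <= 0:
        raise ValueError("contract must pay a positive number of tokens")
    return cost_usd_per_gb_month / tokens_per_gb_month


def rational_to_walk_away(usd_per_token: float,
                          cost_usd_per_gb_month: float,
                          tokens_per_gb_month: float) -> bool:
    """True when monthly token revenue, converted to dollars,
    falls short of the monthly dollar cost."""
    revenue_usd = usd_per_token * tokens_per_gb_month
    return revenue_usd < cost_usd_per_gb_month


COST_USD_PER_GB_MONTH = 0.01  # assumed operating cost
TOKENS_PER_GB_MONTH = 0.002   # assumed contract rate

rate_floor = breakeven_usd_per_token(COST_USD_PER_GB_MONTH, TOKENS_PER_GB_MONTH)
print(f"break-even exchange rate: {rate_floor:.2f} USD per token")

for usd_per_token in (10.0, 4.0):
    leaves = rational_to_walk_away(usd_per_token,
                                   COST_USD_PER_GB_MONTH,
                                   TOKENS_PER_GB_MONTH)
    print(f"at {usd_per_token:.2f} USD per token, walk away: {leaves}")
```

The contract terms are fixed in tokens while the break-even point moves with the market, which is why a thinly traded token turns a storage contract into a currency bet.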

For that matter, what happens when a better opportunity comes along? Suppose the exchange rate is stable, or those risks are managed using a stablecoin, while the market value of storage increases. Buyers are willing to pay more of the native currency per byte of data stashed away. Or another blockchain comes along, promising more profitable utilization of spare disk capacity. That may seem like great news for the storage provider except for one problem: they are stuck with existing customers paying lower rates negotiated earlier. The optimal choice is to renege on those commitments: delete existing customer data and reallocate the scarce space to higher-paying customers.

It is not clear if blockchain incentives can be tweaked to discourage this without creating unfavorable dynamics for honest service providers. Suppose we impose penalties on providers for abandoning storage contracts midway. These penalties cannot be clawback provisions for past payments. The provider may well have already spent that money to cover operational expenses. For the same reason, it is not feasible to withhold payment until the very end of the contract period, without creating the risk that the buyer may walk away. Another option is requiring service providers to put up a surety bond. Before they are allowed to participate in the ecosystem, they must set aside a lump sum on the blockchain held in escrow. These funds would be used to compensate any customers harmed by failure to honor storage contracts. But this has the effect of creating additional barriers to entry and locking away capital in a very unproductive way. Similarly, the idea of taking monetary damages out of future earnings does not work. It seems plausible that if a service provider screws over Alice because Bob offered a better price, recurring fees paid by Bob should be diverted to compensate Alice. But the service provider can trivially circumvent that penalty while still doing business with Bob: just start over with a new identity completely unlinkable to that “other” provider who screwed over Alice. To paraphrase the New Yorker cartoon on identity: on the blockchain nobody knows you are a crook.
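
The tension around surety bonds can be made concrete with a small sketch. Every number and name here is a hypothetical assumption. The bond only deters reneging if forfeiting it costs more than the extra revenue from reselling the space, and whatever amount is locked up carries an opportunity cost for honest providers too.

```python
def gain_from_reneging(old_price: float, new_price: float,
                       gigabytes: float, months_remaining: int) -> float:
    """Extra revenue from dropping an old contract and reselling the
    same space at the new market price for the remaining term."""
    price_gap = max(0.0, new_price - old_price)
    return price_gap * gigabytes * months_remaining


def bond_deters(bond: float, old_price: float, new_price: float,
                gigabytes: float, months_remaining: int) -> bool:
    """A forfeitable bond only works if losing it outweighs the gain."""
    return bond >= gain_from_reneging(old_price, new_price,
                                      gigabytes, months_remaining)


def yearly_cost_of_locked_capital(bond: float, opportunity_rate: float) -> float:
    """What an honest provider gives up by parking the bond in escrow."""
    return bond * opportunity_rate


# Assumed scenario: 1 TB stored, market price rises from 1.0 to 1.5 cents
# per GB per month with a year left on the contract.
gain = gain_from_reneging(0.010, 0.015, 1000, 12)
print(f"gain from reneging: {gain:.2f} USD")

for bond in (50.0, 100.0):
    deterred = bond_deters(bond, 0.010, 0.015, 1000, 12)
    carry = yearly_cost_of_locked_capital(bond, 0.05)  # assumed 5% opportunity rate
    print(f"bond {bond:.0f} USD: deters={deterred}, yearly carry cost={carry:.2f} USD")
```

A bond large enough to cover every plausible price swing over the contract term has to be sized for the worst case, and that worst-case sum sits idle for every provider, honest or not.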

Reputation revisited

Readers may object: surely such an operation will go out of business once the market recognizes their modus operandi and no one is willing to entrust them with storing data? Aside from the fact that the lack of identity on a blockchain makes going out of business meaningless, this posits there is such a thing as “reputation” that buyers take into account when making decisions. The whole point of operating a storage market on chain is to allow customers to select the lowest bidder while relying on the smart-contract logic to guarantee that both sides hold up their end of the bargain. But if we are to invoke some fuzzy notion of reputation as a selection criterion for service providers, why bother with a blockchain? Amazon, Microsoft and Google have stellar reputations in delivering high-reliability, low-cost storage with no horror stories of customers randomly getting ripped off because Google decided one day it would be more profitable to drop all of their files. Not to mention, legions of plaintiffs’ attorneys would be having a field day with any US company that reneges on contracts in such cavalier fashion, assuming a newly awakened FTC does not get in on the action first. There is no reason to accept the inefficiencies of a blockchain or invoke elaborate cryptographic proofs if reputation is a good enough proxy.


