Payment networks, subset-sums and tracking consumer spending

(No cause for alarm, for the most part)

A recent Bloomberg article featured a glowing portrayal of the retail analytics company Cardlytics. Readers may forgiven for assuming this outfit had found the magical solution for tracking consumer spending from payment data alone, which had eluded all other attempts. Cue in cheers from hungry-advertisers and consternation from privacy advocates? Not exactly. What the article neglects to mention is that the company has no access to line-level purchase data. In other words, there is no new source of information here. Nothing new that is not already visible to Visa or MasterCard, or for that matter the bank that issued the credit-card. The “innovation,” assuming one exists, lies in better ways of crunching the existing stash of information that payment networks & banks are already sitting on. The implication being that wealth of information was sitting under-utilized, either because they have not gotten around to it, are contractually prevented from engaging in such data mining— unlikely, given that privacy has never been a forte in the payments industry— or more likely, because they lack in-house technical expertise for it.

To better explain this distinction, let’s recap how a payment network such as Visa operates. Suppose a consumer uses their Chase Visa card to buy a cup of coffee from the neighborhood deli. There are three critical participants in the loop facilitating that transaction:

  • Issuer: the bank that gave the credit-card to the consumer. For example, for a Chase Visa, this would be Chase bank.
  • Acquirer: the bank where the merchant has an account
  • The payment network, in this case Visa. This is the network, both in the metaphorical sense of being a densely connected graph between issuers and acquirers, as well as an actual communication network over which payment requests and authorizations are routed.

(This is a simplification; there are many participants all vying for a cut of the transaction, with the most notable ones being payment processors who assist merchants in getting setup to accept credit-cards, often as part of a bundle that includes providing the point-of-sale hardware.)

From a privacy perspective what matters is that each transaction originating at the merchant is routed through the Visa network and must be authorized by the issuing bank. After all, it is Chase bank that gets to decide whether customer Bob is authorized to spend $2345 on a new TV at BestBuy. That decision must take into account several factors, starting with whether Bob has sufficient spending limit left on that card. There is also the minor matter of verifying this purchase is indeed initiated by Bob, as opposed to a fraudulent charge resulting from a stolen card for example. That means the issuing bank knows the purchase amount and merchant ID— so does Visa, which is responsible for relaying the payment request to the issuer and ferrying back the thumbs up/down response for payment authorization.

Of course there is nothing new here. This is how payment networks have always operated. A corollary is that they become unwitting witnesses to consumer spending patterns in a global way. They have global visibility. By contract, merchants have local visibility: BestBuy is aware of every transaction a particular customer conducts at every BestBuy location over time. (There is some ambiguity around whether they are contractually permitted to use the payment card itself as a fixed identifier to correlate such purchases; card numbers are static for the lifetime of the card and more importantly include card-holder name which is likely constant across multiple cards even as they are replaced.) But that visibility ends as soon as the consumer steps outside the store. BestBuy has no idea what the same customer does next door over at Home Depot. Chase Bank on the other hand does, since they can observe the same credit-card being used at both locations. A card network such as Visa or MasterCard does Chase even one better: they can see all transactions across all merchants for that card type.

Again there is no mystery that this information is being collected or mined for patterns— in fact, that is how fraudulent transactions are detected. Issuing banks are on the lookout for red flags: such as the teetotaler racking up large tabs at bars or that card-holder who never leaves Poughkeepsie jet-setting around the Caribbean. What is new is that instead of being used in a defensive manner for combatting fraud within the payment ecosystem, this data is now mined to identify new revenue streams by directly working with merchants to market to consumers according to spending profiles.

But there is one major limitation of this model that companies including Cardlytics have yet to overcome: they have no visibility into so-called line-level purchase information. In other words, all they can observe is the total amount at the bottom of the receipt. They have no visibility into the contents of a shopping cart or itemized list of drinks on the bar tab— information that the merchant knows but is never transmitted as part of the payment authorization protocol. Line-level receipt information has been the Holy Grail of companies that harvest and traffic in consumers spending data. Much to the chagrin of eager startups partnering with banks for access to payment network flows, they are still not any closer to that stash, which remains carefully guarded by merchants.

To be clear, this is not to minimize the privacy risks. On the contrary, merchant IDs and amounts alone can be extremely revealing— and damaging. BestBuy may sound like a saccharine example but consider other merchants that accept credit cards: charitable organizations, treatment facilities specializing in rare conditions and controversial advocacy groups (Nickelback fan-club anyone?) In each case, the existence alone of a purchase amounts to metadata about the card-holder hinting at everything from political persuasion to medical conditions. In other cases, amounts and frequency or purchases can be telling: consider transactions involving liquor stores or casinos. Depending on who gets to make judgements based on data, historic patterns can mean the difference between casual interest and dangerous levels of attachment.

Finally there is an interesting edge case that may be already exploited in the wild: in certain cases one can work backwards from the total amount to line-level data. Consider a store that only offers four items:


Widget Price
A $2.48
B $5.49
C $8.75
D $14.99

Given that pricing structure, if the cash-register rings up a customer for $11.23— recall this is the only number Visa, issuing-bank and whoever they are willing to share the data with can observe— there is only one possible combination of widgets the customer could have purchased: A and C. There is no other way to create a basket of goods summing up to that price.

This idea “reverse engineering” shopping cart contents from total amount and individual prices is related to a well-known problem in computer science. It is a variant of the subset-sum problem. In the strict version of subset-sum problem, items can not be repeated. In the retail settings of course customers can buy multiple copies of the same item. It turns out that tweak does not appreciably alter the fundamental difficulty of the problem— and solving subset-sum is very difficult in a well-defined computational sense. It ranks among the group of NP-complete problems for which there are no  known efficient algorithms for solving large instances. Worse it is conjectured that no efficient algorithm exists. The “state of the art” exact solutions are barely faster than exhaustively checking all possible combinations of items, which does not scale to large instances of the problem where the menu contains not four, but dozens of different widgets available for purchase. Efficient approximation algorithms are known for many NP-complete problems but close-enough is not good enough in the context of inferring customer spending. If the algorithms returns an alleged shopping-cart but its contents do not add up to the exact purchase price, it is not the right cart; there is no reason to believe the customer bought any of them. 

Are retail analytics companies working to solve subset sums to recover line-level purchase information? This is unlikely for several reasons, even if they were willing to throw large amounts of CPU time at the problem. This approach can only provide a “unique” solution on a menu with a small number of discrete items, each with unique price. As soon as any of these conditions are violated, there is no unique solution. For example if variable quantities are allowed— as in a grocery store where fruits and vegetables are priced by the pound— any amount can be attributed to say bananas alone. Similarly, if there are two items with the same price, they become indistinguishable. Even if every item has a distinct price individually, because the possible combinations increase exponentially with the number of items, there is no guarantee that they will remain free of “collisions:” two completely distinct basket of goods ends up with same total cost, down to the penny. Finally, assuming the pricing ambiguity could be tamed, there is a practical problem around sourcing accurate information: the data-mining operation must have exact pricing information from thousands of merchants, staying on top of variables such as geographic location (state & local taxes that increase the final sales amount) seasonal fluctuations and daily promotions that one particular outlet may decide to implement. Achieving any semblance of accuracy for that information would require cooperation by the merchant. But if one posits that merchants will be complicit in helping mine customer transactions by sharing information with a third-party, there is no need for solving subset-sum instances any longer.  One may as well ask the merchant “what was in the shopping cart for that customer who bought $84.25 worth of groceries at 17:34EST?”

That scenario could ratchet up the privacy risks an order of magnitude. Until now, merchants have treated purchase data as a highly valuable asset, guarded jealously and only used internally to boost the health of this business. It is not up for grabs by third-party analytics services focusing on identifying global patterns to be monetized elsewhere. Even if that service could do a better job of crunching the data provided by one merchant, the end result may well end up benefiting their competitor instead. If incentives shift to the point that merchants are collectively throwing their own stash of consumer spending patterns into a single pile, it would spell trouble for privacy.


On the limits of decentralized exchanges

[Full disclosure: this author works on security for a cryptocurrency exchange]

Decentralized or non-custodial exchanges are one of the areas of active research and development in cryptocurrency. The key differentiator— no pun intended— is that the exchange does not take temporary possession of customer funds in order to enable trading. By contrast, the prevalent model for exchanges today only supports trading of assets that customers have already deposited on the exchange. To make this more concrete, let’s say Alice wants to buy 1 bitcoin for $6000 USD— using recent prices, which are bound to date this blog post. She wires $6000 to the exchange and places a buy order. Bob meanwhile is looking to liquidate his 1BTC position, coincidentally at the same $6000 price point. He deposits the bitcoin with the exchange and places the corresponding sell order. These orders “cross” on the trading engine, and the exchange updates its internal ledger to swap assets between these customers, minus any fees for the exchange. Critics like point to out these two limitations to this model:

  • Assets are tied up temporarily even when they are not being put to use for trading. If a better opportunity were to emerge for putting that bitcoin to use, it may take some time to recover it. (The situation is worse for fiat; consider the challenge of sending funds during a bank holiday.)
  • Customers must trust the exchange with their funds, which are subject to risks of insider malfeasance and external attacks.

Non-custodial exchanges seek to address these problems by foregoing the requirement to park funds as a prerequisite to trading. Before delving into the comparative strengths and weaknesses of this model, let’s pause to clarify the definition of what counts as an “exchange.”

An exchange is not a glorified bulletin-board

An abstract requirement for an exchange is to enable price discovery. In other words, the state of order books on the exchange must reflect true supply/demand conditions for the underlying asset. If one bitcoin is being offered for $8000 and a seller comes along willing to buy at that price, that trade must execute. If the seller has an option to back out of trades after they are matched, then the posted bids/offers are pieces of fiction that are no longer indicative of market conditions. (This is not to be confused with the ability to cancel orders or even the much maligned practice of high-frequency trading where orders are posted and retracted quickly. The point applies only during the time an order is posted on the book, even if those times are measured in milliseconds. Once another order crosses that, that trade must execute with overwhelming probability, with each side ending up with the assets on offer by their counter-party. That execution can not depend on the willingness of buyer/seller to further cooperate. Otherwise one side will always have an excuse to back-out as soon as market conditions shift, inspiring a bout of buyer’s remorse for not having taken a better deal. (Note this “guarantee” can be an economic incentive. For example, if any party who reneges on a trade is forced to pay significant penalties, that could be a sufficient deterrent for making sure most activity on the exchange settles uneventfully.)

That means services which only pair up buyers and sellers without providing any execution guarantees fall short of this definition of an exchange. Price signals on such a venue may well depart from true market valuations because there is no way to know in general if any assets are indeed changing hands at those prices. Even if the marketplace is continuing to observe participant activity following a trade (for example, by monitoring blockchains to see if funds claimed to be in the possession of the counter-parties were indeed moved) this is providing at best a delayed signal after the fact.

Beyond bidirectional channels

There is another “trivial” sense in which trading can be done without custody, at least for cryptocurrency assets. Instead of depositing funds ahead of time on the exchange and later allocating some fraction of those funds to open orders, the customer can instead create a bidirectional payment channel with the exchange. Suppose Bob has 5 bitcoin and plans to diversify some of that into ether and litecoin. Instead of depositing 5BTC all at once, he creates a channel with the exchange, with the commitment transaction funded entirely by Bob. When he wants to convert 1BTC into ether, he sends 1BTC to the exchange. If he also decides to convert another bitcoin into litecoin, the channel state is updated to now pay the exchange 2BTC, for the combined amount required to back both order. (Transaction fees are neglected in this simplified model.) If he were to change his mind and cancel the BTC/LTC order, the payment channel state is updated by revoking previous transactions and reverting to a new state where the exchange now only gets 1BTC, with the remaining 4 going back to Bob.

In this model, only the assets committed to open orders are in possession of the exchange, as opposed to the entire amount the customer has available for trading. It is in some ways similar to the daily settlement model common in equities trading: the customer opens a channel with the exchange prior to the opening of market, places/cancels any number of orders throughout the day and after close of market, the channel is “closed” to settle the balances. (Granted cryptocurrency trading takes place around the clock, without the sharply delineated “market hours” of traditional markets.)

While this reduces the proportion of assets tied up with an exchange or subject to security risk, it does not eliminate those risks entirely. For example, Bob still must trust the exchange that when his order executes on BTC/LTC,  he will receive the litecoin amount corresponding to his bitcoin offer. The natural question is whether one can design an exchange with stronger guarantees, where Bob parts with the bitcoin if and only if he will receive the stated amount in litecoin. In an ideal setting, that property would be enforced on chain, taking any trusted third-parties out of the equation. That may sound like a tall order, considering blockchains are self contained and have no awareness of the state of other blockchains. (There is an attempt to replicate the state of bitcoin on ethereum but it has not gotten very far.) On its face, that creates a challenge for making transactions conditional on each other, such as creating a dependency between Bob parting with assets on one chain only if he can recoup a precise amount of another asset class managed on a completely independent chain.

Atomic swaps

Atomic swapspreviously discussed here by comparison to the fair-exchange notion from traditional cryptography— provide a useful building block for doing pairwise exchange across chains. Returning to the problem of Alice and Bob executing a BTC/LTC swap:

  1. Alice prepares some bitcoin (in other words, an unspent transaction output or UTXO) with a specific script. This script allows either one of two paths for claiming the funds:
    • Using Alice’s key A after some time elapses, or
    • Using Bob’s key and knowledge of a preimage for the hash H=SHA256(R) where R is a random value Alice picks.
  1. Bob in turn prepares the corresponding amount of litecoin, using a script that mirrors the one Alice created for spending conditions:
    • Using Bob’s key after the same deadline, or
    • Using Alice’s key and knowledge of a preimage for the same SHA256 hash H. (Note that Bob does not know the random R solving that puzzle but he can copy H from the posted bitcoin transaction.)
  2. Alice proceeds to claim Bob’s litecoin by using her key and knowledge of R. In doing so, she will be revealing R.
  3. Bob in turn proceeds to claim Alice’s bitcoin using his key and the now disclosed answer to the preimage puzzle.

Limitations of atomic swaps

This primitive solves the basic problem of a pairwise trading between mutually distrusting parties, but there are a number of challenges to overcome before it can become a realistic alternative to traditional exchanges.  Let’s catalog these now.

Dealing with fiat

One of the glaring problems with atomic swaps is they operate on a blockchain. That means trading against fiat currencies such as the US dollar or euro can not be expressed using these conditions. There are two work-arounds offered for this approach, neither of which are satisfactory from the perspective of eliminating trusted third-parties.

First one introduces oracles to vouch for the fact that some event has occurred outside the blockchain, such as Bob wiring funds to Alice. This can be done generically by creating a multi-signature arrangement where the oracle must also sign the transaction before Bob can claim the bitcoin offered by Alice. Or in the context of ethereum it can be done by making payment conditional on a specific function call from the oracle contract. Either way this is introducing a trusted third-party into the equation and adding even greater delays to the settlement process, meaning that a trade that executes at a specific time will settle at a later point in time after the mediator had enough time to examine the evidence of fiat payment.

The second approach runs even more contrary to the spirit of avoiding trusted third- parties: using a stablecoin designed to have stable exchange rate tracking a specific fiat currency.  For example one could represent USD via Tether or other stablecoins. But the 1:1 correspondence between Tether and the dollar is only guaranteed by a private entity. In other words, by replacing USD in a bank account with tether for the convenience of applying atomic swaps, participants are taking on even greater concentration of risk in the single party backing tethers.

Scaling to an open market

Another challenge with atomic swaps is the pairwise aspect: the protocol has exactly two participants and operates in a carefully choreographed manner. Notice that even the very first step by Alice requires knowing that her counter-party is Bob, because his public-key goes into the construction of the initial UTXO. This is all good and well but it does raise one important question.

How did Alice and Bob find each other in the first place? Presumably this is the role the decentralized exchange plays. But if Alice must know about Bob, there is a chicken-and-egg problem. In the standard setting, Alice would post an order announcing that a certain quantity of bitcoin is available in exchange for a different quantity of ether. Later along comes Bob with an offer to take the other side of the trade, and this particular trade executes. But note that at the time Alice posted her order, there is no Bob in the picture. Also recall that it is not sufficient for our definition of an exchange to simply act as a bulletin-board where seller/buyers post bids and then go offline to attempt settlement  on their own when they paired with a counter-party. In the absence of additional enforcement to prevent participants from backing out of trades, this scheme can not guarantee that activity on the exchange corresponds to assets changing hands at those prices.

Secondly what prevents the participants from backing out of executing the atomic swap?  Recall that even with both transactions ready on chain, it is still up to Alice in step #3 to get the wheels moving. The protocol is only atomic to the extent that if Alice executes step #3, it is guaranteed Bob can also execute step #4. But there is no guarantee that after steps #1 and #2 are completed, Alice will proceed to step #3.

One approach to dealing with this is to have the exchange become a counter-party to every trade. In other words, when constructing the atomic swaps Alice always assumes she will be trading her BTC to the exchange for ETH. Likewise Bob assumes he is sending ETH for BTC when constructing the swap transactions. An atomic swap will work in this model since the identity of the buyer is known ahead of time; it is always the exchange. Meanwhile the exchange does not take on any risk in the event of a cross. Until there is a crossing order from Bob, the exchange will not move to claim BTC from Alice. Only when both orders appears on the book (and there is sufficient buffer left on the time-locks for both, to prevent the owner from reclaiming them) does it make sense to execute both swaps. Since funds from one are used to pay the other, the exchange is net neutral or even slightly ahead due to fees collected from the parties. By making sure the exchange is playing the role of “Alice” in the protocol and choosing the random nonce, the exchange can guarantee that step #3 will always take place and every crossed order will settle properly.

But there is a fatal problem with that approach: while it may be “neutral” in the sense that the exchange does not make directional bets on BTC/ETH, it still has to front funds to pair up with every order. Recall that both sides of an atomic swap must have an on-chain transaction that the counter-party can claim if the swap runs to completion successfully. That means for every order— no matter how unrealistic its chances of being filled given prevailing market conditions— the exchange is forced to set aside capital to guarantee execution. Suppose 1BTC is currently trading for 20ETH but a die-hard bitcoin fan wants to lodge an order to sell for 50ETH. In order to guarantee execution of that order, the exchange must set aside of 50ETH of its own capital for as long as this irrationally exuberant order stays on the book. That is not a viable model.

Partial execution

Theres is one more challenge to address when trying to build an exchange on atomic swaps: partial order execution. Using the previous example, suppose Alice is offering 1 BTC for $8000, while Bob and Carol each want to buy 1/2 BTC for $4000. There is perfect coincidence of wants here: Alice can sell her whole bitcoin and collect the USD she requested, by splitting that order evenly between Bob and Carol. Custodial trading engines handle this case without a problem, since they have full control of assets. More importantly they can pace the execution over time to deal with the far more likely scenario where the counter-parties appear at different times. Imagine Alice places her order, followed quickly by Bob while Carol does not materialize until a few hours have elapsed.

The basic atomic swap can not handle this scenario without additional involvement from the customer. This holds true even if the exchange itself is the counter-party to her swap. When there is only 1/2 BTC demand available, the exchange is in a bind: it can proceed with the swap with Alice for one whole BTC, but then it will be left holding the remaining 1/2 after Bob is paid. Demand for that half may not materialize at the same price, leaving the exchange holding the bag. Alice can always go back and prep a new transaction that sends two half BTC chunks to the exchange. (By using channels between Alice and the exchange, this can even be done offline without significant confirmation delays.) But therein lies the catch: Alice is free to back out of that transaction. On a custodial exchange, unless the order is placed as all-or-none, it will execute against any crossing order including partial lots. With swaps, one must track down Alice to restructure the transaction after every partial execution.

Looking ahead

These limitations do not preclude the possibility that different, more sophisticated protocols and cryptographic techniques could address the current limitations of decentralized exchanges. Instead they point to a scaling challenge with going from a protocol that works great in 1:1 setting to one that must function with large number of participants.  As things stand, non-custodial trading may eliminate the risk of funds storage, but the lack of an execution guarantee means that it greatly reduces confidence in the price signals generated by the market.


Threshold ECDSA, key sharding and multi-signature: a comparison

Threshold cryptography refers to splitting cryptographic keys such that performing an operation—such as decrypting or signing a message— requires an interactive process among multiple shareholders without ever disclosing the secret. The emergence of cryptocurrencies has inspired a cottage industry of research on threshold ECDSA, the digital signature algorithm hard-coded by most cryptocurrencies. After some false-starts in 2014, more progress has been made towards efficient schemes. For example one recent paper at IEEE Security & Privacy 2018 made strides in reducing the the level of interaction required during that computation. But the techniques remain a long ways from being a realistic solution for any system storing cryptocurrency, especially when they have to compete with a much simpler alternative already built into common cryptocurrencies: multi-signature. In this blog post we compare three alternatives to see how they stack up:

  1. Naive secret splitting
  2. Multi-signature
  3. Threshold cryptography

Naive secret-splitting

In this model, the bits making up the secret value are split using a plain threshold-sharing scheme such as Shamir’s secret sharing. Such schemes are parameterized by so-called “M-of-N” settings, where N shares exist of which any subset M < N are sufficient to reconstruct the original value. In the case of cryptocurrencies such as Bitcoin the “secret” private-key is a scalar value that defines the private key, which is effectively a large number. That number can be split into shares and reassembled again to create the original private-key as long as a quorum of at least M shares are present. With the key reconstituted, it can be used to sign Bitcoin transactions

That reassembly step also happens to be the Achilles heel of this model: there is a single private-key at the end of the day and it is being repeatedly reassembled each time that key is used to transfer Bitcoin. (Recall that in cryptocurrencies such as Bitcoin or Ethereum, control over funds is asserted by creating a digital signature, which boils down to using the private-key.) That means there is a single point of failure, a trusted system where reassembly takes place where that secret value is exposed.

Native multi-signature

Around 2012 Bitcoin underwent one of its more ambitious feature expansions, introducing support for a scripting language for describing payment conditions. This language greatly expanded the range of conditions that can be tied to spending funds. Previously pay-to-public-key or pay-to-public-key-hash  addresses could only express a condition along the lines “you can spend these funds if the transaction is signed with this key.” With script one could replace the second part with more complex assertions such as  “… signed with key A or signed with key B and provides solution to a puzzle C.” Since the idea of requiring multiple signatures on a transaction is a very common scenario, the designers introduced native instructions into the virtual machine scripting language to support multi-signatures or “multisig:” instead of depending on a single key to sign the transaction, some quorum of M keys out of a larger set of N is required to sign.

This M-of-N condition may resemble traditional secret-splitting but the similarity ends there In Bitcoin multisig, there are N distinct, independent keys. By contrast, secret splitting is working with exactly one key. As far as rest of the world is concerned, there is one signature to verify, produced by one private key and verified with one public key. How that private key is managed behind the scenes by sharding across multiple systems is opaque to everyone else.  By contrast multisig involves creating M different signatures to authorize a transaction. Each of these signatures must be independently verified and the presence of multiple, distinct keys is visible to other participants in the system.

In a useful example of debunking false dilemmas around security trade-offs, multisig improves both security and redundancy. The security advantage is readily apparent when M > 1. An attacker planning to steal funds has multiple targets to chase after instead of just one. More importantly, those keys can be managed independently. There is no single system that must have access to all keys at the same time. Each private-key signs a transaction without requiring access to any of the other keys. (In fact, the signatures do not even depend on each other.  This is an example of co-signing rather than counter-signing where subsequent signatures would incorporate earlier signatures. For Bitcoin the “message” being signed is identical from the perspective of multiple key-holders, which allows obtaining the signatures in parallel, in arbitrary order.) In fact the security gains from multisig are directly related to the extent these multiple keys have uncorrelated risks. Put all eggs in the same basket as Bitfinex did, and multisig degenerates into the same baseline as single-key control.

The redundancy improvement is more subtle. While secret keys falling into the hands of an adversary is perhaps the most common threat model, there is another risk around simply losing access to the key. For example the device where the key is stored could malfunction or accidentally get discarded— newsworthy examples of that are plenty. This is key “loss” in a different sense: no one has access to the secret any longer and any funds on the blockchain controlled by that private-key are effectively taken out of the money supply. In multisig configurations where the quorum number M is less than the total number of keys N, key loss can be tolerated. Even with some shares permanently lost or even temporarily unavailable, transactions can be authorized.

There is one catch: multi-signature transactions take up a lot more space. A configuration requiring a quorum of M signatures out of a possible group of N must include not only all M signatures but also M public-keys. You might expect some type of space-optimization in the protocol, where future transactions out of the same address do not require enumerating the public-keys again since those are fixed for a given multisig address. But there is no such provision. Whether out of a strident belief that addresses must only be used once to preserve privacy— an ideological stance readily contradicted by evidence from actual blockchain statistics— or a pragmatic bias for avoiding complexity, the designers did not include any mechanism for “caching” key-sets (even though signatures were being stored permanently on the blockchain until segregated-witness.) This is more than a matter of engineering efficiency: when it comes to paying fees, transactions are priced by the byte. This makes multi-signature transactions more costly, with costs scaling linearly with in both parameters M and N.

Threshold cryptography

“If you think cryptography is the solution to your problem, then you don’t understand your problem” — attributed to Peter Neumann

Threshold cryptography promises to offer  the best of both worlds. As far as the rest of the blockchain is concerned, there is a single key used to create one signature;  transactions consume the same amount of space as old-school single key addresses. On the other hand, that key has been sharded into multiple shares, with each share given to a different party. More importantly, those shares are never reassembled. Instead the people controlling individual shares all participate in an complex protocol to create the signature, without any participant gaining more information about the shares held by others. Designing those protocols, reasoning about their properties and proving security is where the cryptographic techniques comes in.

It is also where the complexity lies. Depending on the particular problem, these protocols can be too unwieldy for implementation. For some cryptographic systems there are natural solutions. For example RSA allows additively splitting the private-key and easily combining individual signatures created by each piece. (But the joint generation of an RSA modulus such that no one learns the factorization is quite tricky. For this reason, threshold systems commonly assume “trusted dealer” where the entire secret is allowed to exist on one system temporarily during key generation, from which the shares are distributed.) For  the DSA family of algorithms, including its elliptical curve incarnation, there is no simple solution known. Solutions can fall short effectivness for a number of reasons. First and most important is the amount of interaction required among holders of shares during signature generation. In an ideal solution, no interaction is required. Each person performs a computation using their share working independently and outputs some intermediate result. Finally there is a reassembly phase, where an untrusted system collects the partial answers and reassembles them into a final result.

That model is ideal for offline storage of cryptocurrency, because it allows parallel, incremental construction of a transaction. Typical offline storage schemes involve storing cryptographic keys offline, on systems located across multiple geographically diverse sites. Because these systems are not connected to a network, participants must travel on site to have physical access for interacting with them. A threshold signature protocol with zero interaction allows those accesses to happen in any sequence, visiting one site at a time. By contrast, all known threshold signature protocols for ECDSA involve multiple rounds of interaction among holders of key-shares. That means either visiting each site multiple times, or more realistically having to orchestrate multiple groups visiting all sites simultaneously and being in communication in real time to carry out protocol steps— keeping in mind that these systems were supposed to be offline and disconnected from networks. None of the options are particularly appealing. One could knit together the “air-gapped” systems into a private network that remains (hopefully) isolated from the rest of the internet. But even in that best case scenario where the network is private, there is new attack surface: connected systems can now target each other, weakening one of the assumptions about each site having uncorrelated risks. Or the operators could resort to kludgy “sneaker-net” to exchange inputs/outputs with offline systems while leveraging public networks to transfer those inputs/outputs to other sites. The inefficiency of that alternative increases with the communication complexity of the protocol, and the number of “rounds” involved in assembling the final signature.

A second and more subtle limitation arises from the incompatibility of threshold signing protocols with existing cryptographic hardware, such as bitcoin wallets running on smart cards or HSMs. If a protocol requires manipulating secret material in ways outside the handful of algorithms supported by the underlying platform, it can not be instantiated using these systems. Once again RSA is the poster-child for a well-behaved algorithm. Any device capable of importing RSA private keys and signing with them can participate in an NofN threshold signature scheme (note M=N requirement) because the “partial computation” using a share looks exactly like producing a regular RSA signature. That said, there are implementation subtleties. For example, the hardware must support importing an RSA private-key without the Chinese Remainder Theorem or CRT components. Otherwise if the shares had to include CRT components, every participant could recover the full secret.

By contrast existing threshold ECDSA protocols do not use secret key material in any way resembling an ECDSA signature, or even other common algorithms defined on secret material, such as ECDH computation using the same scalar value. That means such protocols are incompatible with gold standard for managing high value secrets. (It is rare for  cryptographic hardware to permit users to implement their own cryptographic algorithms to run inside the secure environment, since a badly designed one allows key extraction.) That is a significant step backwards in operational security: general-purpose computing platforms such as PCs are not appropriate for managing critical secrets. Hardware wallets are built for exactly that scenario, with defenses against tampering, key-extraction and side-channel attacks. Giving up that assurance to save on transaction fees is a dubious trade-off. That said, this is a practical limitation that could be overcome if hardware wallets decided to add support for the more esoteric primitives required to make these protocol work. But as with all other software development, vendors of cryptographic hardware follow the 80/20 approach to defining requirements: they concentrate their engineering efforts on supporting the 20% of features used by 80% of customers, and avoid the long-tail of seldom used obscure algorithms that only a handful of people care about.

Bottom-line: while future developments towards a more efficient threshold ECDSA implementation could overcome these limitations, at the moment it’s a tough sell to give up on the simplicity of native multi-signature in favor of a complex multi-party computation protocol that is incompatible with preferred hardware for high-value secrets.



On-ramps & off-ramps: impact of Bitcoin payments on the regulatory landscape

Money serves three functions: store of value, unit of account and medium of exchange. Bitcoin has achieved a dubious hat-trick in drawing criticism in each category. It is claimed to be too volatile to act as a measure of economic value—how can merchants risk accepting BTC if the prices fluctuate drastically day to day? For that matter, why would consumers risk giving up an appreciating asset if it is likely to be worth a lot more tomorrow? (Recall the infamous Bitcoin pizza incident.) It has also attracted a chorus of skeptics arguing that it can not succeed as a popular payments, pointing to the intrinsic scaling limits and costs of transacting on the congested network. Blockchain space is scarce, topping out around 7 transactions per second worldwide, dwarfed by the throughput of existing centralized payment rails such as credit-card and PIN debit networks. Moreover transactions compete against each other for inclusion in every block by offering “fees” to miners, who are incentivized to include only those with highest fees into the next block. Counterintuitively, those fees depend not on the monetary value of the transaction but on the amount of space it takes up on the blockchain— akin to charging for check-processing based on thickness of the paper used instead of notional value printed on the check. Depending on demand on the network, this results in drastically varying efficiency. In 2014 it was possible to transfer $80M while paying a few cents—outshining every other system including wires in cost-effectiveness— while the late 2017 frenzy witnessed  users spending tens of dollars in fees to send amounts of the same order of magnitude, which is a staggering level of inefficiency comparable to Western Union for cross-border remittances.

After an initial period of exuberance where merchants raced to outmaneuver each other in claiming bitcoin support and scoring cheap PR points, it is clear that bitcoin is not currently working as a payment mechanism. Ethereum never focused on payments as a use-case, while other altcoins such as monero with aspirations to being a medium of exchange are optimizing for dubious black markets and criminal activity outside the mainstream. Jury is out on whether any of the scaling approaches proposed such as Lightning Network will provide a lasting solution. 

 But this is more than a question about technical merits or real-world feasibility: it will have repercussions for the regulatory landscape around cryptocurrency. Whether Bitcoin or another cryptocurrency succeeds as a widely used medium of exchange will determine which participants in the ecosystem become the focus points for regulatory influence. If conducting meaningful commercial activity requires converting into fiat, regulating cryptocurrency becomes a simple problem focused on existing, already regulated businesses which serve as on-ramps & off-ramps to blockchains for the majority of consumers. (This is true even when that conversion into fiat is hidden from consumers, as in the case of most merchants who claim to accept Bitcoin but are in fact relying on a third-party payment processor who accepts Bitcoin from the consumer while paying out in USD to the merchant, effectively absorbing both the exchange risk and information security challenges associated with storing bitcoin.)

The difference can be illustrated by imagining two alternative scenarios. In the first case, Bitcoin or some other cryptocurrency with better intrinsic scaling properties succeeds in becoming a ubiquitous payment mechanism, complete with a vibrant ecosystem of consumers and merchants rivaling that of existing payment networks. Average citizens purchase goods with Bitcoin from one another with the same ease as swiping a credit-card, with one crucial distinction: there is no middleman moderating the transaction. (More accurately, since blockchains are not made out of magic, the “middleman” has been replaced by a decentralized network of miners.) There is no reason to convert bitcoin into or out of USD— which is what happens today behind the scenes when most merchants rush to announce bitcoin support. Such cross-currency hops are not only unnecessary but they would introduce economic inefficiencies compared to transacting in bitcoin as a first-class, “native” currency.

In the second scenario, Bitcoin is relegated to a different role, serving primarily as store of value (or unit of measure, however implausible that appears from current levels of volatility.) In this word everyday retail transactions are not carried out over the bitcoin network, directly or indirectly through second-layer solutions such as Lightning Network. either because no viable scaling solution has emerged or because network fees are too high compared to existing payment rails, dissuading merchants from switching. BTC can still be competitive for niche scenarios, such as cross-border remittances or infrequent large transfers (such as settlements between commercial banks) where 24/7 reliability and speed is more important than saving a few basis points.

In both scenarios, the censorship resistance of the Bitcoin network is identical. While there are legitimate concerns around the geographic concentrations of mining pools in China and a software “monoculture” of one implementation running every node, it would require extraordinary effort to blacklist addresses or make arbitrary edits to the ledger to undo history or reverse payments. Where these scenarios differ is that such interventions are not required in the second case as a prerequisite to effectively regulating flow of funds.

If commercial transactions require converting cryptocurrency to/from fiat, existing authority to influence movement of that fiat currency also translates into influence on use of virtual currency in any meaningful commercial capacity. If consumers are constantly liquidating bitcoin as a prerequisite for most economic activity, one need only focus on regulating these so-called “on-ramps” and “off-ramps”— points in the ecosystem where bitcoin is exchanged for a native currency accepted in-real-life, such as US dollar. This a tractable problem. Dealing in fiat currency requires banking relationships and maintaining banking relationships involves following existing regulatory regime, which includes a functional compliance program among other components. (One need only look at the instructive example of Bitfinex to see what happens when those connections are severed.) It does not matter that payments within the bitcoin network can continue to flow freely because the balances recorded on that ledger, no matter how significant when converted into notional USD numbers, grant very little purchasing power unless the funds can be exchanged back into USD. If there are only a handful of such off-ramps to exit out of bitcoin and into old-fashioned but universally accepted currency necessary for meaningful commercial activity, there is an effective point to focus enforcement activities.

Consider the example of WikiLeaks. In 2011 the US government pressured credit networkings including Visa, MasterCard and Paypal to stop processing payments for the organization. This forced WikiLeaks to start accepting bitcoin. With the massive run-up in price against USD, WikiLeaks can paint a slightly revisionist picture of that forced move as a blessing in retrospect. But what about the cash-flow side of the equation? Even if all WikiLeaks personnel were volunteers resulting in zero labor costs, no salaries to pay etc. as soon as there are any expenses to account for— data-center hosting, advertising etc.— the question is whether those costs can be reimbursed in bitcoin. If the counter-party is not willing to accept bitcoin, WikiLeaks must find a way to convert their existing stash of bitcoin into old-school currency, interfacing with aspects of the financial system where stringent rules apply around which customers one can conduct business with. (Quite apart from access, this hypothetical scenario also raises the question of whether paying for expenses out of bitcoin—converted or not— was something to brag about in the first place: if the currency appreciated 50,000% since initial donation was made, having spent that on operational expenses to keep the lights on has echoes of the famous million-dollar pizza episode in bitcoin.)

Even overcoming the scaling challenges may not be enough to realize a world where consumers pay for their morning cup of coffee in bitcoin. Volatility may prove even more tricky, not being amenable to simple technological fixes. In the long-term, the belief that price of bitcoin will rise against USD means everyone will hoard their cryptocurrency, with no one willing to spend it on such pedestrian purchases as breakfast. (That brings a whole new meaning to the cliche that skipping a latte every day will compound into significant savings.) Meanwhile significant short-term fluctuation discourage merchants. Pricing goods in bitcoin requires hedging against the exchange risk. The path of least resistance for risk-averse merchants is to cash out to USD immediately after the transaction, which is exactly what most merchants who claim to accept bitcoin are doing: in reality their payment processor collects BTC from the buyer while delivering USD minus transaction fees to the merchant. That approach insulates the merchant from the risk of holding bitcoin long term, while superficially granting them an excuse to display the bitcoin logo on checkout pages and press releases. But bragging rights aside, the popularity of that model for merchants highlights the fragile nature of decentralization. The favored integration model for accepting bitcoin is in fact not using bitcoin as a peer-to-peer currency: there is a payment processor in the loop for converting bitcoin into fiat currency to make it more palatable to the merchant. That middleman is likely a regulated money transmitter and can be legally compelled to stop processing payments for specific clients. If cryptocurrency becomes a first-class, native payment system for online or brick-and-mortar transactions, it will require a fundamentally different regulatory approach focused on trying to impose order on unwieldy peer-to-peer networks. If instead the current trajectory continues, incremental approaches centered around on/off-ramps will be sufficient.



We can decrypt it for you: smart-contracts for outsourced encryption

A small-step for cryptographic agility

Among the features introduced by the Byzantine hard-fork of Ethereum, one of the more under-appreciated ones may be support for RSA operations. Most virtual currencies have a very weak story around cryptographic agility, defined as the ability to easily change the cryptographic primitives used such as signature algorithms, block ciphers and hash functions. For example Bitcoin settled on ECDSA over one elliptic curve— secp256k1— as the signature algorithm and SHA256 for hashing. (There is a long-standing proposal to introduce Schnorr signatures that could introduce some variety but it is yet again limited to that one curve.) Ethereum also uses ECDSA over the same curve, but inexplicably hard-codes a pre-standardization version of SHA3 for hashing.

Arguably these decisions are  short-sighted given that large-scale blockchains are notoriously difficult to evolve once deployed. Convergence of the system depends on on all participants agreeing on a set of ground rules for interpreting transactions appearing on the chain. If one faction invents some new-fangled feature that the rest of the network does not recognize, they will disagree on which transactions are valid and what the true state of the world is. In effect the  Bitcoin is culturally allergic to using hard-forks to introduce novel functionality— witness the acrimonious debates over increasing block size— and historically relied on incremental soft-forks for improvement. Ethereum Foundation has shown no such restraint, shepherding the currency through multiple hard-forks. Some were controversial and did not exactly go as planned; the 2016 hard-fork to bailout DAO instead gave rise to Ethereum Classic. More recent forks have been less contentious, likely because they introduced features without obviously being designed for the benefit of a particular contract or stakeholder.

In one sense, RSA support it is pure syntactic sugar: EVM is Turing complete. Modular exponentiation, the operation underlying RSA, could have been implemented in Solidity or any other language that compiles to EVM byte-code. Granted the same could be said of ECDSA support provided by the built-in ecrecover contract invocation: that could have been left up to every one to implement on their own. Instead EVM designers made a pragmatic decision about providing a native operation to support a feature expected to have widespread usage. This is the same judgment call API designers face constantly: whether to provide a single convenient API that packages commonly used functionality or leave it up to user to stitch that together from existing APIs. In this blog post we evaluate one scenario that is now made possible with RSA support, namely outsourcing key-management and encryption to trusted third-parties.

We can decrypt that for you

Common disk-encryption solutions including MSFT Bitlocker and Apple FileVault have an option to back-up keys to the enterprise. This is achieved by encrypting contents to two or more different keys: one chosen by the user and one chosen by the IT department, usually an RSA public-key. This allows the company to access storage on those devices even if the assigned user is unable or unwilling to assist. Taking this one scenario step further: suppose the IT department does not want to manage that additional key that can decrypt all employee devices. Can they outsource this to a trusted third-party in a verifiable way? The third-party is retained to provide on-demand decryption when the IT department needs to access one of the devices using the secondary key.  Alternatively the third-party could act as “back-up:” even if the IT department itself has a key, they may want additional redundancy.

There is an important question around whether such draconian arrangements are justifiable or consistent with precedent. For now we will put aside these policy questions, which are highly dependent on jurisdiction. US companies typically adapt the stance that employees have no expectation of privacy on company-issued devices. In centrally managed IT environments, it is very common for the company to have root/administrator privileges over all machines. Those privileges allows IT personnel to remotely execute arbitrary commands on user devices and access any stored information. In other words, being able to decrypt storage offline is not necessarily granting a new capabilities that is not already present in a traditional, all-Windows environment with machines are joined to an Active Directory domain.

One important tangent on policy is warranted here before moving on to the implementation. On the surface, outsourcing encryption looks awfully similar to the dangerous and discredited notion of mandatory key-escrow. Starting in the 1990s, US government has requested backdoors in cryptographic protocols, asking that all encryption products effectively use two keys: one for the intended recipient and one for law-enforcement in case future access is required. Following strong backlash from cryptography and civil-liberties communities, these ideas seeming fell out of favor as access to strong encryption increasingly became taken for granted. But the uneasy truce came to an abrupt end in the aftermath of the 2014 San Bernardino shootings which resurrected calls for key-escrow. Compelled key-escrow is problematic because unwitting users are being forced to trust third-parties they have no control over or trust, merely by virtue of purchasing gadgets or software from vendors subject to that jurisdiction. This is different than voluntary escrow taking place in a closed system such as an enterprise IT environment, where the company is making a deliberate risk-benefit decision to outsource key-management to selected third-parties. In addition to the freedom of choosing these providers—or not choosing for that matter, in favor of keeping that function in-house— they can also reduce risk by splitting keys. For example, 3 providers can be selected and the original secret split in a 2-of-3 threshold scheme with each share encrypted to a different provider. That arrangement removes single points of failure, requiring cooperation/malfeasance/compromise of multiple providers to recover the secret.

Design outline

A smart contract running on Ethereum can provide a number of useful properties:

  • Backup service gets paid per decryption in a verifiable manner— payment is made if and only if a successful decryption is provided
  • The backup service can post a fidelity-bond attesting to the security of its key management. These funds are under control of the contract and designed to be released to an attacker upon submission of evidence that the key has been compromised. In effect, this is a self-enforcing bug bounty as well as a canary designed to alert users of a key compromise.  As long as the economic value of the key for an attacker is less than the value of the bounty, a rational attacker would choose to claim the bounty and in the process, alert the entire world that the key has been compromised. If no such evidence emerges after some time, the funds are released to the operator of the service. (Alternatively funds can be returned gradually over time, with some fraction returned every few months of successful operation.)

How would such a system operate?  Since a contract itself can not hold secrets, the actual decryption keys must be held by the service provider. But the provider can publish the public-key which is sufficient for encrypting data. The customer is free to encrypt arbitrary plaintext using that key. But when it is time for decryption, they have to go through the smart contract to reach out to the service provider. This contract has three methods:

  1. Request decryption: Customer submits a request to decrypt a ciphertext encrypted in the RSA key of the backup service. This call is also required to send previously agreed-upon amount in ether to the contract as pre-payment. In practice it would also be subject to an authorization check based on message sender address; only the legitimate customer is allowed to request decryption. This prevents random interlopers who stumble on ciphertext from using the contract as a decryption oracle.
  2. Submit decryption result: Invoked by the operator of the backup service, providing the plaintext for a ciphertext previously uploaded by the customer.
  3. Request refund: Invoked by the user to get their money back if the service was unable to provide service in a timely manner. (Note there is a race-condition here in the event of blockchain congestion or miner censorship. The decryption result becomes visible to the customer as soon as the service provider broadcasts it to the network. But it will not count towards meeting the deadline until it is mined.)

Step #2 is where support for RSA operations come in handy. This is the point where the smart-contract must verify whether a correct decryption was provided, by applying the encryption step to the alleged plaintext. (Note we are assuming raw RSA decryption— this is also required to support “blinding” as explained below. The plaintext itself may have padding such as OAEP but its fully-padded version must be returned by the backup service.) If the provided plaintext is correct, the payment originally sent in step #1 is “captured” for good. Otherwise the funds remain in a provisional state. If no decryption materializes after some time period such as 24 hours, the user can invoke the refund function. This function checks the internal contract state and ascertains whether an outstanding decryption request initiated by the same caller has not been successfully resolved. If that is the case, the initial payment is sent back to the caller.

This construction sounds counter-intuitive in one aspect: everything taking place on public blockchains including Ethereum is visible to the entire world. Any one else can observe the above transactions getting posted, including the decryption result. That may look like a serious problem— everyone else is learning the secret— but there is an easy work around: RSA cryptosystem supports blinding using the multiplicative homomorphism of RSA. Instead of decrypting the primary ciphertext, the customer submits a modified version that is masked by a random factor. The decryption of that masked ciphertext still allows recovering the original plaintext, but only for the original submitter with knowledge of the masking factor. Other people observing the blockchain transactions are effectively witnessing the decryption of a random value that reveals no information about the original.

Improving accountability

What about providing accountability for the service provider? The idea here is to augment the contract with additional methods that an attacker who compromised the key can invoke to prove that fact and receive the value of the fidelity-bond as their reward. There are different ways to do this varying in complexity and level of evidence required. For example, an attacker can submit the factorization of the modulus and the contract verifies that the product of factors is equally to the modulus. This is sufficient but not necessary to prove key compromise. For example, in many situations the private key material is kept on tamper-resistant hardware such as an HSM which is designed to perform operations without surrendering the raw key bits. If an adversary obtains possession of the HSM and has the necessary credentials to operate it, the key is effectively compromised regardless of whether raw bits can be exfiltrated out of the hardware.

We can lower the burden of proof and require attackers to only prove they can decrypt arbitrary ciphertexts. This is tricky because the contract itself is used to arrange for decryption of arbitrary ciphertexts and those results are available on the blockchain for all to see. Not to mention, when the RSA cryptosystem is used without padding, existential forgery of what appears to be “successful” decryptions is easy. Anyone can fabricate plaintext/ciphertext pairs for which they claim to have magic decryption capability. Even in a design where that choice is taken away, the customer of the encryption service can subvert the process: they can request a decryption and leverage that result as “evidence” to prove they have control over the private key. (Use of RSA blinding makes it impossible to ascertain that they were the same ciphertext, since a randomly masked value is getting decrypted each time.) A simple if not entirely satisfactory approach is to implement a challenge/response protocol during which ordinary decryption requests are paused. The “prover” who claims to have control over the private key initiates the process by invoking a contract function. To prevent spam, we can require that such challenges are accompanied by some amount of ETH as commitment to weed out provers with no hope of successfully completing the process. That function returns a challenge, which is based on recent block hash of the chain, message sent by the prover and internal contract state that incorporates all ciphertexts it has been asked to decrypt to date.(For example we can maintain a running hash that is updated with each request received.) Note that contracts can not provide “random numbers” in the usual sense; incorporating past ciphertexts prevents using information disclosed during past decryption operations to answer the current challenge. The prover is given a deadline measured in block-height or time to provide the decryption of that ciphertext. If the valid RSA plaintext emerges during that window, the funds maintained as fidelity bond by the contract are sent to the prover. Otherwise the challenge is considered a failure and the contract resumes decryption services. By creating unavoidable penalties in the event of key compromise, this approach provides financial incentives for the service provider to take adequate measures. (But it does not prevent a corrupt provider from voluntarily decrypting ciphertexts when requested by parties other than the legitimate customer.)



What is the value of half a secret?

Putting a price tag on cryptographic keys

Cryptocurrencies such as Bitcoin achieve what may be the purest distillation of money into information: control over funds is achieved by having control over associated cryptographic secrets, specifically ECDSA private-keys used to sign transactions. If you can sign with a specific key, you can spend the funds at a specific address associated with that key. Multi-signature is an extension of this model where instead of using a single key, possession is distributed into a quorum of M keys out of N. This improves both security— compromising a single key does not allow an attacker to walk away with any money— and redundancy: losing access to one key, say due to hardware failure, does not result in the legitimate owner losing access to their money. (Assuming M is less than N; otherwise the loss of any key is fatal.)

It also creates something of a conundrum for putting a price-tag on one such key for the purpose of threat modeling. What is a reasonable amount for the defender to spend in securing that key? For that matter, how much effort would a rational attacker spend on going after one key? Suppose 300 BTC is stored stored in a 3-of-5 multisig address. Intuitively one could argue the value for each key is exactly 1/3 the balance or 100 BTC. But this naive conclusion overlooks the “criticality” condition involved: any combination of less than three keys is worthless from the perspective of controlling the associated funds on the blockchain. (Assuming they are not also used to control some other address with different M-of-N configuration.) But once that third key falls into the wrong hands, the entire sum can be claimed.

In more concrete terms: suppose a disgruntled insider offers to sell an attacker one of the keys for 50BTC. If individual keys are valued based on their pro rata share at 100 BTC, that would be a great deal. But if the prospective buyer has no hope of obtaining any of the other four keys— perhaps they are managed by more honest employees— that 50BTC spent on attacking the first key will yield a return of exactly zero. There is an all-or-nothing phenomenon at work for all would be adversaries: 100% of resources dedicated to getting hold of the first M-1 keys are wasted unless that final key to reach quorum can be successfully obtained.

Critical thresholds

This is an unusual property of Bitcoin that is not easily achieved for other commodities it is typically compared to: half a barrel of oil is worth approximately half that of a full barrel. It is not possible to create a configuration of crude oil where three barrels must be consumed at the same time. If a gold vault is compartmentalized into five rooms containing equal amounts of the precious metal, plundering one out of five rooms is still a good outcome for the attacker.

Of course there exist physical objects with supra-linear scaling properties based on quantity: a two-carat diamond is worth more than two individual one-carat diamonds with identical specs. Most physical objects follow this pattern. If a car is chopped in half down the middle, the left and right-sides are not worth half the original value of the vehicle, because neither side amounts to a functioning mode of transportation any longer. But the value of those scattered parts is not zero either. They can be sold for scraps or combined with other interchangeable parts to put together a fully functioning if somewhat unusual vehicle.

It is usually in the mathematical realm that we run into scenarios where objects can be divvied up in a way that creates critical thresholds. For example, it is possible to split a credit-card number into two shares such that either share contains exactly zero information about the original, in a strong information-theoretical sense: even an infinitely powerful computer could not recover the original value from a single share. But the combination of two such “worthless” shares yields the original secret. There is another unusual property of these constructions: not only is combination of shares below the threshold worthless, the only way they can attain any value at all is in conjunction with other shares generated from that particular split. If you have half the components for a bicycle, there is a cottage industry of bike shops and parts suppliers to equip you with necessary missing pieces to assemble a complete bike. But keys in multi-signature configuration can only be combined with other keys in the same group. There is nowhere else to turn.

Multi-signature wallets and accountability

This is more than a philosophical puzzle. At least one multi-signature wallet provider has argued that their company should be exempt from any regulation because they are only providing one out of multiple keys required to control funds, with the remainder controlled by the customer.  This is in effect arguing that one key has zero value, as long as the quorum requires multiple keys. The argument rests on the observation that if the co-signing system goes rogue or experiences a security breach, its customers can not suffer any losses: additional private-keys not controlled by that service exist, gating access to funds on the blockchain.

But this argument is easily debunked by a reductio ad absurdum: suppose a customer creates a 2-of-3 multisig wallet and hires two co-signing providers to help manage those funds. At some future date, both providers are compromised by the same attacker who absconds with all funds in that wallet, since he/she is able to sign with a quorum of 2 different private-keys. Is it reasonable for both providers to disclaim liability by throwing up their hands and saying “not my fault, someone else had another key”? A more plausible interpretation of fiduciary responsibility is that both providers are implicated in the loss. Even the original argument is self-negating on closer reading: if the value of a multisig key is so minimal that one need not be concerned about the security level of a service that specializes in managing those shares, why bother with outsourcing to those providers in the first place? One could also store that share under the mattress or write it on a post-it note. It is contradictory to argue that you are providing an essential service—helping individuals or organizations offload the complexity of managing private keys— while also seeking an exemption from regulatory scrutiny because the secret you are entrusted for safe-keeping has no value, in the sense that it can not cause monetary losses by itself.

RSA 2FA breach as precedent

It will likely take a long time before a regulatory framework can answer these questions. Individual court cases could set precedent earlier. None of the publicized cases so far have involved a failure of the cosigning service. In the 2016 Bitfinex breach, BitGo acted as the co-signer in what was described as an alternative design to the traditional “cold-wallet” concept where majority of funds are stored in an offline system. During this incident BitGo API co-signed the multisig transactions used by perpetrators to take bitcoin out of the Bitfinex system. But BitGo has maintained—and Bitfinex has never disputed this assertion— that its API was working exactly as designedAPI was working exactly as designed: properly authenticated calls were made by Bitfinex, or someone with credentials required to impersonate Bitfinex, requesting co-signing of transaction. That episode raised questions about the effectiveness of co-signing, specifically for “online” configurations where a single point of failure exists: the Bitfinex system held both its own signing-key and credentials necessary to instruct the cosigner to sign with a second key. But it does not raise the question of co-signer liability precisely as long as the service was following instructions from “Bitfinex.”

Other examples from past security breaches suggest that providers of “additional” security features can not easily shirk responsibility. RSA Data Security experienced a breach of their SecurID authentication system in 2011, resulting in undermining the security of anyone who relied on RSA tokens as the second-factor of authentication— a widespread radius of collateral damage that included the defense contractor Lockheed-Martin. Following the co-signer logic, RSA could have argued that they are merely providing a second factor of authentication on top of an existing one such as passwords. As long as passwords are not compromised, failure of 2FA  alone can not result in bypass of authentication. To its credit RSA did not pursue that line of argument. After an initial period of unconvincing PR, the company began offering replacements to affected users, in keeping with the common sense understanding that 2FA providers have a responsibility to maintain a robust solution. It is unlikely that co-signing services will be held to a different standard.


Ethereum, Solidity and integer overflows: programming blockchains like 1970

Role of language design in exploit mitigation

On Wednesday a new vulnerability was disclosed in the implementation of some ERC20 tokens including BEC operating on the Ethereum blockchain. As far as impact goes, the flaw is about as critical as they come: unauthorized “gifting” of tokens out of thin air.  While this may have been the first time such a bug was observed being exploited in the wild on a blockchain, the type of vulnerability is far from novel: an integer overflow, rooted in the fact that most programming languages operate on numbers with limited range. Perform a calculation where the result falls outside that range and the result of that operation is no longer accurate, violating common-sense assumption. For example two very large numbers added together can produce a small number. Or two positive values multiplied may yield a negative product. This class of bugs are very well known in the realm of low-level languages such as C and C++. Combined with weak type-safety, lack of range checking and manual memory management, such flaws often provide a starting point for building a full remote-code execution exploit.

What is unusual is that the Ethereum virtual machine (EVM) design and the Solidity language made a series of bad decisions that resurrected a vulnerability class from low-level systems programming in the context of a wildly different environment intended for decentralized smart-contracts. To better demonstrate these unforced errors on the part of language & VM designers, it is instructive to compare how a low-level language such as C or x86 assembly handles overflows.

Getting by with finite integers

Suppose we are asked to sum two unsigned integers X and Y, and compare them to some other value Z. This is a common pattern in bounds checking, a defensive measure to prevent the application from reading/writing from memory outside the intended regions.

if (X + Y > Z) {
  // Limits exceeded, bail out
} else { 
  // Continue execution…

For extra color, let’s assume one of these values such as “X” can be controlled by an attacker, while Y and Z are constrained by the application itself. If arithmetic in C operated in the Platonic realm of natural numbers, the above code snippet would always run correctly. But these values are not represented by an idealized “integer” type that can take on any value. Instead they are limited to a fixed range, defined by the size of the integral type. This is dependent on the compilation model, which itself is tightly coupled to the hardware architecture code is being targeted at. For example, in the data-model LP64 common for Linux, the C type “unsigned int” has 32-bits while the larger type “unsigned long” can hold 64-bits. (Integral types can also be signed, further reducing the representable magnitude since roughly half the possible values are now allocated to negative numbers.) That “unsigned int” type can represent every number in the range 0 through 4294967295 inclusive. Equally important, it can not represent any value outside that range.

That poses a problem for doing math: it is possible to write expressions such as X + Y where both X and Y start out in the permitted range but the result falls outside. In other words, the operation overflows the result type. So what happens when a C program is asked to run a calculation resulting in overflow? Intuitively the options are:

  1. Return the result in a different, larger integral type such as “long.” While appealing at first glance, this is not a feasible solution for two reasons. First we run out of integral types when the same trick has to be applied for “long” or even “long long,” which a recent addition. As long as the type system only has a finite number of built-in types, there will be one with the highest capacity where we run out of room. But more importantly, playing games with types will only result in converting the out-of-range problem into a type-cast problem. Typically programmers write expressions such as “sum = X + Y” where X, Y and sum all have the same type. Even if the result is magically upgraded into a wider type, it will still get shoe-horned into the original type downstream. Playing conversion games only delays the inevitable.
  2. Treat it as a fatal error. In the same way division by zero results in halting the program— or more precisely, raising a signal/exception which, if unhandled will terminate the program— every overflow could result in triggering an error designed to immediately stop execution flow.
  3. Undefined behavior. This is a favorite cop-out for language designers, effectively stating that the compiler is free to produce any value: the result can be 0, 42, the compiler authors’ year of birth or it may crash. All of them are valid outcomes if the behavior is deemed “undefined” according to the language specification.
  4. Truncate the result in a well-defined manner and keep soldiering on.

It will come as no surprise to those familiar with the minimalist philosophy of C that the language opts for a combination of #3 and #4. (C++ carries on that tradition, for the most part deferring to C on the behavior of primitive types.) Overflow for unsigned types is perfectly legal and truncates the result. Specifically only the least-significant 32-bits of the result are taken. Meanwhile overflow on signed integral types is undefined. But pragmatically speaking, for most compilers and hardware architectures the result ends up being very similar to unsigned behavior: values are truncated to squeeze into the expected type. Looked another way, C programs are not working in the realm of natural numbers with their infinite possibilities. Instead they are doing modular arithmetic in a finite ring where values outside the range “wrap around” to an equivalent one modulo some power of 2.

One level below: x86/x64 hardware behavior

If this property of C looks arbitrary, going one more level down to look at how processors handle arithmetic at the hardware level may vindicate the designers. C is close to the metal, so to speak. Its constructs often map 1:1 into capabilities of an underlying hardware. Consider the common Intel x86 architecture. This architecture sports 32-bit registers and features instructions for operating on them. For example one can add the contents of register EAX to the value existing in register EBX:

add EBX, EAX

Since registers are also limited in the range of numbers they can express, executing this instruction poses the same dilemma: what happens if EAX and EBX hold values that added together would exceed the maximum integer representable in 32-bits? As with most hardware architectures, x86 adopts the convention of doing modular arithmetic: operations are implicitly performed modulo 2**32 (or some other appropriate power of two when working with 16-bit or 8-bit registers.) Nor is that a quirk limited to Intel: most hardware follows this pattern.

Defensive programming: hardware vs software

Given that context, it is hard to fault C for following the same model: it allows using efficient CPU instructions to perform arithmetic without having to double-check or adjust the results spit out by the processor. And while C is rarely accused of being a high-level language, many other languages that followed including those targeting virtual machines such as Java, have also taken the path of least resistance by going along with the hardware. Languages like Python and Scheme where integers have arbitrary precision by default are the exception, not the rule.

That said, there are some crucial differences in how arithmetic works at a low-level which are missing from the abstractions offered in high-level languages:

  • Multiplication has special case behavior of using the “type expansion” idea. Specifically product of 32-bit integers are in fact written into the pair of registers EDX and EAX, to accommodate a full 64-bit product.
  • CPU has a flags register that contains special flags updated based on the result of the last operations. This includes an overflow flag. Even more conveniently, the x86 instruction set contains conditional jumps based on whether that flag is set: JO, short for jump-on-overflow. That makes life easy for a programmer working in assembly language to detect and respond to overflow conditions with just 1 additional instruction— which for all intents and purposes is an unconditional jump, because the branch-predictor will correctly guess that an overflow will not happen in the common scenario.

By contrast checking for overflow in a language such as C or Java is far more kludgy because these low-level signals are missing. There is no facility provided by the language itself for dealing with overflows in an efficient manner. Options are:

  • Perform an extra arithmetic operation first to check if overflow will happen. For example, we can subtract one operands from the maximum representable value and if that difference is larger than the second operand, addition is guaranteed to overflow.
  • Sanity check the result after performing the operation, by comparing the result against one of the original operands. (Leveraging the observation that an overflow condition exists for unsigned addition if and only if the alleged sum is less than either one of the operands. Note that the rules are a lot more complex for signed arithmetic.)

Many libraries exist for doing arithmetic safely with such overflows checks included, but they trade-off safety for kludgy syntax and readability. Instead of writing a straightforward mathematical expression such as “result = X * Y + (Z * scale) + offset” one must replace each binary operator by a functional call, and potentially conditional checks for the overflow status of that particular subexpression. C++ with its support for operator overloading can at least overcome the syntactic problem, since one can design a replacement for every integer type along with  that performs overflow checking. However in the absence of direct compiler support or very diligent use of inline assembly, these user-defined substitutes will be very inefficient compared to native types.

Compiler features to the rescue?

Where the language falls short, compiler authors can sometimes step in. For example GCC has an option called trapv that will raise a trap when operations on signed integers overflow. This is an important distinction because C already considers such behavior to be undefined— crashing the program is just as valid as truncating the result modulo 2**32. But there is no corresponding option to treat overflow of unsigned integral types as a fatal condition: no “correct” compiler can do that because such overflow is perfect legal C and even explicitly relied on to implement functionality such as hash-functions and cryptographic algorithms. (A research paper from 2007 explores the idea of instrumenting existing binaries to fail on unsigned overflow, confronting exactly that problem of having to identify code that deliberately relies on truncation.)

Solving the problem for signed integers may sound good enough except for the fine-print in C conversion rules: in any operation involving a mixture of signed and unsigned operands, the signed input gets typecast to unsigned and arithmetic performed on unsigned values. In other words, introducing any unsigned value into a calculation voids the overflow check.

Solidity and Ethereum: the missing defense

EVM features large 256-bit words internally and has instructions to perform arithmetic operations on these values. Solidity in turns offers integral types up to 256-bits for programmers to use. But as the preceding discussion suggests, one can not “outrun” integer overflow vulnerabilities by adding bits to integer types any more than buffer overflows are fixed by adding more RAM. (Going from 32-bit →  64-bit architecture has expanded addressable memory to 256 terabytes but that did not spell the end of memory corruption bugs.) As long as resources are finite and attackers allowed to sneak inputs into an operation, it is possible to have an arithmetic overflow.

When it comes to defensive measures for avoiding overflow vulnerabilities, it turns out both platforms are worse than their ancient counterparts. In Solidity operations on integral types silently wrap-around by default in the event of an overflow. There is not even a compiler flag (at the time of writing) to treat overflow events as critical error— the equivalent of 10 year old “trapv” flag in GCC. At the processor level, EVM has no efficient mechanism for tracking overflow state or making decisions conditional on whether a recent computation resulted in overflow. There is nothing comparable to the x86 overflow flag and attendant condition branching instruction. Strictly speaking this is not required; a compiler can synthesize overflow checks with existing EVM instructions, patterned after the kludges used for C. But intrinsic EVM support could have made it that much easier.

For historic perspective, the x86 architecture dates back to the 1970s. So does C, designed as a low-level systems programming language with the explicit goal of closely mirroring the underlying hardware. (C++ arrived on the scene in the early 1980s, adding object-oriented paradigms while continuing to pay homage to the efficiency requirements of C.) Ethereum started four decades later with a completely different use-case: programming smart-contracts on a blockchain, which is an extremely inefficient system for computation and where correctness matters far more than shaving a few virtual cycles. Likewise, Solidity as a language for writing smart-contracts started from a clean slate. Unlike C++ it did not inherit the baggage of having to support existing C code with assumptions about how integer arithmetic words. It’s a tribute to the short-term memory of computer science that a modern programming language designed with the singular purpose of building applications on a blockchain platform— where bugs can result in significant monetary losses— still ended up perpetuating an ancient vulnerability class.