Scaling Bitcoin vs keeping-up-with-Visa

Continuing on theme of the previous post on Bitcoin scaling challenges, there is an implicit assumption driving the scaling controversy that requires a closer look. It is an article of faith that Bitcoin must scale by orders of magnitude process more transactions, and do so quickly. The benchmark for evaluating increased blocksize proposals (or features that effectively result in same outcome by shuffling bits around, such as segregated witness) remains one dimensional: the throughput of the network measured in transactions processed per second or TPS.

Given current limitations that metric peaks at a theoretical maximum of 7TPS although the jury is out on whether that upper bound can be attained in practice. This is often contrasted against the corresponding numbers for a standard payment network such as Visa. Statements published by Visa— attributed to a decidedly ancient 2010 IBM test—  put those figures at average of 150M per day with a maximum peak rate of 24000TPS.

Before questioning this “keeping-up-with-the-card-networks” mantra, it’s worth noting the paucity of attention paid to other statistics such as confirmation time. Imagine trying to buy coffee with Bitcoin. To be on the safe side, the merchant must wait for the transaction to appear on the Blockchain. Not just on the latest block freshly minted, but a few blocks deeper to rule out the possibility of being reversed by a rare reorganization event. Average 10 minute interval for mining blocks makes for an unpleasant retail experience. There are nascent proposals such as Lightning Network layered on top of the core Bitcoin protocol (as opposed to being modifications to the protocol) designed to facilitate near real-time settlement. They are far from seeing any traction. Meanwhile the 10 minute interval remains an inviolable constraint in the protocol.

Granted, this may not matter when buying coffee. Merchants can manage the risk from operating with zero-confirmation as long as the potential loss from the sale of that item is low-enough and the proportion of honest customers in the population high enough. It is similar to the risk-management decision Starbucks already makes when they swipe card  at checkout without demanding a signature from the card holder. That is a good idea. It keeps the line moving and customers caffeinated. Once in a while there is a stolen credit-card used or some customer disputes the charge. Starbucks is left holding the bag because they have no proof. A signed receipt could provide insurance by shifting liability to the issuer but those extra 20 seconds would translate into lost sales on a busy morning. Now multiply the dollar amounts involved by 100x, and waiting for the network to confirm the transaction starts to make more sense, just as merchants get more picky about receipts and checking ID for big-ticket purchases. (Not to mention that replace-by-fee proposal will make it much easier to double-spend unconfirmed transactions.)

Keeping up with the Visas

The narrative animating Bitcoin XT envisions a world with ubiquitous Bitcoin usage. Not only do consumers worldwide purchase their coffee at the corner store in cryptocurrency instead of swiping/inserting/tapping their credit-card, but they get to pay the rent, stream music online, access journalism behind a paywall, settle that debt to a friend who paid for dinner and wire money overseas to the cousin traveling in Timbuktu. In this expansive vision, it’s not only Visa and PayPal who are on the way to extinction. Check clearing and its automated cousin ACH, Federal Reserve Wire Network, the international SWIFT system, peer-to-peer payments such as Venmo, proprietary alternatives such as PayPal, Western Union etc. all face competition from Bitcoin eating into their market share. Naturally that one-network-to-rule-them-all must boast the combined capacity of all of the “deprecated” systems it replaces.

In case such a formidable list of incumbents being obsoleted at once has not already cast some doubts on the feasibility of this vision, let’s dive deeper into one scenario: replacing card networks. We can ask what incentives would drive mass-adoption of Bitcoin as an alternative to using payment cards. This can quickly become a complicated comparison with multiple categories: PIN-debit vs credit vs charge-card vs prepaid, retail vs online payments etc. Here we pick one representative case that today accounts for the bulk of such transactions by volume.

Case study: how (not) to disrupt in-store payments

There are multiple participants involved in a transaction when a consumer buys a TV  from a big-box retail store using their credit card: cardholder, merchant, payment network, issuing bank, not to mention all of the invisible middlemen behind the scenes facilitating that transaction such as the payment processor or the vendors who manufactured the point-of-sale terminal. [Quick refresher] In order to switch over to using Bitcoins, one or more parties must see a compelling benefit.

It’s certainly not the card network or the issuer. They would be out of the picture, disintermediated by an open system where anyone can pay anyone else without going through gate-keepers for access to the network. Despite initial forays into exploring crypto-currency, they have little to gain from watching their consumer credit business evaporate overnight. Issuing banks are already earning money hand over fist from interchange-fees and interest on balances carried by consumers.

Could it be the consumer? The privacy afforded by Bitcoin is not exactly much of a help when walking into a store dotted with surveillance cameras, or for that matter using a smart-phone that requires an Internet connection to broadcast the bitcoin transaction. It’s even worse when considering the economics. One can only spend existing funds with Bitcoin; a credit card allows deferring payments, to handle fluctuations in cash flow by carrying a balance. Worse, consumers receive an effective discount (albeit small one) from credit-card transactions. In the face of increasing competition, issuers are increasingly splitting their profits with customers in the form of airline miles, fungible points exchangeable for gifts or straight cash-back rewards. For a card averaging 1% rewards, paying for a $1000 TV in BTC amounts to leaving $10 on the table.

Merchant perspective

Merchants are the prime candidate to agitate for bitcoin. Frustrated by high card-processing fees, perennially on the lookout for cheaper alternatives (so much that several large US retailers banded together to create their own) they are seemingly the ideal cheerleaders for cryptocurrency. The economic pressure is particularly strong for segments with low profit margins: grocery-chains and large department stores often boast net profits in the single-digit percents. Forking over 2% of the gross sale for the privilege of accepting Visa/MC/Amex starts to sting. Irreversibility of Bitcoin transactions is another selling point. There is no longer the dreaded problems of charge-backs caused by fraudulent transactions. (That’s a bug from the consumer perspective: it is a safety feature to be able to call someone and complain when a card is stolen, and not be on the hook for unauthorized charges— or at least maintain that illusion of zero liability. But merchants greatly appreciate not having to worry about a sale getting reversed.)

Yet in-store payments are exactly the scenario where adoption is challenging because of the long settlement time, coupled with 0-confirmation being unsustainable long term. Let’s assume merchants will eat the cost of upgrading their point-of-sale equipment to accept Bitcoin. Realistically, they were far from enthusiastic about chip-and-PIN or NFC adoption, until card networks threatened a liability-shift pitting them against issuers, but in this case the lure of drastically lower processing fees could alter the equation. It would still require widespread adoption of Lightning or equivalent solution for instant settlement before that investment pays off. This would be much less of an issue for online purchases: given the additional lag involved in digging up the item out of a warehouse, packaging and shipping, an extra half-hour wait is noise. (But even that may change with the rise of same-day delivery and other services compressing the timeline from placing the order to a package showing up.) To wit, success stories of using Bitcoin at Sears/CVS/Home Depot turn out to be enabled by an intermediary exchanging Bitcoins for gift-cards, which are then processed by existing channels at the point-of-sale. Those merchants are not directly accepting Bitcoin; they do not have a Bitcoin address or operate a node on the network.**

Realistically, even highly motivated merchants can not unilaterally force consumers to adopt a new payment system. They can certainly express strong preferences towards minimizing transaction processing costs. They can nudge consumers in that direction, for example by setting minimum amounts for using cards. They can try passing on card fees to consumers or equivalently, offering discounts for cash- although that turns out to be so fraught with contractual issues that few venture there. They can lobby for structural changes behind the scenes: the Durbin amendment made it cheaper to process debit transactions. Some have even gotten into the open-loop card business by partnering with issuers, following the if-you-can’t-beat-them-join-them mantra. (That Costco-branded American Express card, which recently blew up in a very public spate between the two companies, let Costco share in revenue generated from cardholders while also reducing its own processing costs.) But at the end of the day, merchants must adopt to the existing payments ecosystem. As long as they conclude card-acceptance increases revenue by driving sales, they have an incentive to offer that option. In order for Bitcoin to become a viable alternative, there must exist a market segment of customers ready to spend Bitcoin and unable/unwilling to transact any other way. That critical mass does not exist at the moment, and it is far from clear that a demographic can emerge that simultaneously embraces Bitcoin while shunning mobile wallets such as Apple Pay or Android Pay, forcing merchants hand in having to cater to that preference.

Capitalizing on the unusual fee structure

The preceding issues reflect less on inherent flaws in Bitcoin than the current framing of scaling debate. The push for rapidly expanding block-size is predicated on the assumption that limited network capacity is holding back Bitcoin from achieving its manifest destiny of disrupting consumer payments. Build it and they will come, that argument goes. But the real question is not whether Bitcoin can compete with credit-cards in raw capacity, but whether it can compete on economical terms. For that matter, whether it needs to compete in order to be considered successful in its own right. Observe that wire-transfers move billions of dollars everyday, but no one has  penned an article announcing the demise of FedWire due to its inability to pay for coffee down the street. Payment systems have strengths and weaknesses. Even in its infancy, there are many applications where Bitcoin already outshines other alternative.

One theme uniting those scenario is fast and low-cost movement of funds. Writing a check does not incur any fees for either sender or recipient, but it is slow and requires physical shipment of pieces of paper. Wire transfers are real-time but come with hefty fees for individuals. Peer-to-peer systems such as Venmo and Google Wallet can move funds instantly at no-cost within the system, but they suffer from the problem of getting money into & out of the system; hefty credit-card fees are passed on to the consumer.

A  crucial special case of low-friction P2P transfers are cross-border remittances. If wiring money from one bank to another in the US sounds like an expensive proposition, imagine being a migrant worker trying to send money overseas to family. WorldBank tracks global remittances, and recently estimated an average ~7.5% fee globally. That estimate hides the true range of figures: trying to send $200 from Germany to China would incur a whopping 17% in fees. It’s not even a function of geography- France and Algeria are close but the average fee is 15%. Bitcoin vastly undercuts these expensive channels, translating into billions of dollars saved.

A core developer recently raised the alarm about Bitcoin development increasingly focusing on settlement scenarios instead of direct payments. There is a good reason for that bias: it plays to existing strengths of the design. Bitcoin is one of the most efficient ways of moving large sums around. Fees are charged per byte of the transaction. That is a very unusual metric because it is independent of the value moved. Under this regime a transaction moving a million dollars may pay less in fees than one moving a few dollars. By contrast most systems charge fees ad valorem. For credit cards those rates can hover around 2-3%. Even the Durbin amendment capping fees on PIN-debit set those limits as fixed cost plus 0.05% of the amount, retaining the proportionality feature. Bitcoin is unique in decoupling fees from the utility of the transaction as measured in the amount changing hands. It’s as if a bank decided to charge for check-cashing not based on the face value of that check, but on the thickness of paper used.

All of these applications still require Bitcoin to continue scaling up its throughput. Key difference is that the throughput targets are much lower than required for playing catch-up to Visa/MasterCard/Amex. Nor do these changes need to happen on an aggressive schedule, playing a game-of-chicken with hard forks. These markets are emerging gradually, with adoption driven organically because Bitcoin is a compelling  improvement over existing alternatives. No one will give up on Bitcoin because their transfer did not settle faster than Western Union or charged a few more Satoshis in network fees.


** Ironically the intermediary eGiftCards itself uses yet another intermediary (Coinbase) for processing the Bitcoin transaction, and never directly taking on the risk of double-spending or currency volatility of BTC/USD. So much for direct peer-to-peer payments.


Observations on Bitcoin’s scaling challenge

Much ink has been spilled in recent months about Bitcoin’s scaling problems, specifically around the impact of increasing blocksize to allow more transactions to be processed in each batch. Quick recap: Bitcoin processes transactions by including them in a “block” which gets tacked on to a public ledger, called the blockchain. Miners compete to find blocks by solving a computational puzzle, incentivized by a supply of new Bitcoins minted in that block as their reward.  Net “throughput” of Bitcoin as system for moving funds is determined by the frequency of block mining and number of transactions, or “TX” for short, appearing in each block. The difficulty level is periodically adjusted such that blocks are found on average every 10 minutes. (That is a statistical average, not an iron-clad rule. A lucky miner could come up with after a few seconds. Alternatively all miners could get collectively unlucky and require a lot more time.) In other words the protocol adopts to find an equilibrium: as hardware speeds improve or miners ramp up their investments, finding a block becomes more difficult to bring average time back to 10 minutes. Similarly if miners reduce their activity because of increased costs, block difficulty would adjust downward and become easier.

“640K ought to be enough for everybody”

(In fairness, Bill Gates never said that about MS-DOS.)

Curiously block-size has been fixed for some time at 1 megabyte. There are no provisions in the protocol for increasing this dynamically. That stands in sharp contrast to many other attributes that are set to change on a fixed schedule (amount of Bitcoin rewarded for mining a block decreases over time) or adjust automatically in response to current network conditions, such as the block difficulty. There is no provision for growing blocks as the limit is approached— the current situation.

What is the effect of that limitation in terms of funds movement? Good news is that space restrictions have no bearing on on amount of funds moved. A transaction moving a billion dollars need not consume any more space than one moving a few cents. But it does limit the number of independent transactions that can be cleared in each batch. Alice can still send Bob a million dollars, but if hundreds of people like her wanted to send a few dollars to hundreds of people like Bob, they would be competing against each other for inclusion in upcoming blocks. Theoretical calculations suggest a throughput of roughly 7 TX per second, although later arguments cast doubt on the feasibility of achieving that.  The notion of “throughput” is further complicated by the fact that a Bitcoin transaction does not move funds from just one point to another. Each TX can have multiple sources and destinations, moving the combined sum of funds in those sources in any proportion to the destinations. That is a double-edged sword. Paying multiple unrelated people in a single TX is more efficient than creating multiple TX for each destination. On the downside, there is inefficiency introduced by scrounging for multiple inputs from past transactions to create the source. Still adjusting for these factors does not appreciably alter the capacity estimate.

The decentralization argument

Historically the 1MB limit was introduced as a defense against denial-of-service attacks, to guard against a malicious node flooding the network with very large blocks that other nodes can not keep up with. Decentralized trust relies on each node in the network independently validating all incoming blocks and deciding for themselves if that block has been properly mined. “Independently” being the operative keyword- if they were taking some other node’s word that the block is legitimate, that would not add any trust into the system. Instead it would effectively concentrate power, granting the other node extra influence over how others view the state of Bitcoin ledger. Now if some miner on a fast network connection creates giant blocks, other miners on slow connections may take a very long time to receive and validate it. As a result they fall behind and find themselves unable to mine new blocks. They are effectively working from an outdated version of the ledger missing the “latest” block. All of their effort to mine the next block on top of this obsolete state will be wasted..

Arguments against increasing blocksize start from this perspective that larger blocks will render many nodes on the network incapable of keeping up, effectively increasing centralization. This holds true for miners- who end up with “orphaned blocks” when they mined a block based on outdated version of the ledger, having missed out on someone else’s discovery of latest block- but to some extent for ordinary “full-nodes” on the network. It’s the independent verification performed by all those of nodes that keeps Bitcoin honest in the absence of a centralized authority. When fewer and fewer nodes are paying attention to which blocks are mined, the argument goes, that distributed trust decreases.

Minimum system requirements: undefined

This logic may be sound but the problem is that Bitcoin core, the open-source software powering full-nodes, has never come with any type of MSR or minimum system requirements around what it takes to operate a node. It’s very common for large software products to define a baseline of hardware required for successful installation and use. This holds true for commercial software such as Windows- and in the old-days when shrink-wrap software actually came in shrink-wrapped packages, those requirements were prominently displayed on the packaging to alert potential buyers. But it also holds true for open-source distributions such as Ubuntu and specialized applications like Adobe Photoshop.

That brings us to the first ambiguity plaguing this debate: no clear requirements have been spelled out around what it takes to “operate a full Bitcoin node” or for that matter to be a miner, which presumably has even more stringent requirements. No reasonable person would expect to run ray-tracing on their 2010-vintage smartphone, so why would they be entitled to running a full Bitcoin node on a device with limited capabilities? This has been pointed out by other critiques:

“Without an MVP-specification and node characterization, there is nothing to stop us from torquing the protocol to support wristwatches on the Sahara.”

Perhaps in a nod to privacy, bitcoind does not have any remote instrumentation to collect statistics from nodes and upload it to a centralized place for aggregation. So there is little information on how much CPU power the “average” node can harness, or how many GBs of RAM or disk-space it has, much less any operational data on how close bitcoind is to exhausting those limits today. Nor has there been a serious attempt to quantify these in realistic settings:

There would also be a related BIP describing the basic requirements for a full node in terms of RAM, CPU processing, storage and network upload bandwidth, based on experiments — not simulations — […] This would help determine quantitatively how many nodes could propagate information rapidly enough to maintain Bitcoin’s decentralized global consensus at a given block size.

Known unknowns

In the absence of MSR criteria or telemetry data, anecdotal evidence and intuition rules the day when hypothesizing which resource may become a bottleneck when blocksize is increased. This is akin to trying to optimize code without a profiler, going by gut instinct on which sections might be the hot-spots that merit attention. But we can at least speculate on how resource requirements scale, both in “average” scenario under ordinary load as well as “worst-case” scenarios induced by deliberately malicious behavior trying to exhaust the capacity of Bitcoin network. This turns out to be instructive not only for getting a perspective on current squirmish over whether to go with 2/4/8 MB blocks, but for revealing some highly suboptimal design choices made by Satoshi that will remain problematic going forward.


Each full-node verifies incoming blocks, which involves checking off several criteria including:

  1. Miner solved the computational puzzle correctly
  2. Each transaction appearing in this block is syntactically valid- in other words, it conforms to the Bitcoin rules around how the TX structure is formatted
  3. Transactions are authorized by the entity who owns those funds- typically this involves validating one or more cryptographic signatures
  4. No transaction is trying to double-spend funds that have already been spent

Blocksize debate brought renewed attention on #3, and core team has done significant work on improving ECDSA performance over secp256k1. (An earlier post provides some reasons why ECDSA is not ideal from a cost perspective in Bitcoin.) Other costs such as hashing were considered so negligible that scaling section of the wiki could boldly assert:

“…as long as Bitcoin nodes are allowed to max out at least 4 cores of the machines they run on, we will not run out of CPU capacity for signature checking unless Bitcoin is handling 100 times as much traffic as PayPal.”

Turns out there is a design quirk/flaw in Bitcoin signatures ignored by this rosy picture. The entire transaction must be hashed and verified independently for each of its inputs. A transaction with 100 inputs will be hashed 100 times (with a few bytes different each time, precluding reuse of previous results, although initial prefixes shared) and subjected to ECDSA signature verification the same number of times. In algorithmic terms, cost of verifying a transaction has a quadratic O(N²)  dependency on input count. Sure enough the pathalogical TX created during the flooding of the network last summer had exactly this pattern: one giant TX boasting over 5500 inputs and just 1 output. Such quadratic behavior is inherently not scalable. Doubling maximum block-size leads to 4x increase in the worst-case scenario.

There are different ways to address this problem. Placing a hard-limit on the number of inputs is one heavy-handed way solution. It’s unlikely to fly because it would require a hard-fork. Segregated witness offers some hope by not requiring a different serialization of the transaction for each input. But one can still force the pathological behavior, as long as Bitcoin allows a signature mode where only the current input (and all outputs) are signed. That was intended for constructing TX in a  distributed fashion, where the destination of funds is fixed but sources are not. Multiple people can chip in to add some of their own funds into the same single transaction, along the lines of fundraising drive for charity. An alternative is to discourage such activity with economic incentives. Currently fees charged for transactions are based on simplistic measures such as size in bytes. Accurately reflecting the cost of verifying a TX on the originator of that TX would introduce a market-based solution to discourage such activity. (That said defining a better metric is tricky. From another perspective, consuming a large number of inputs is desirable, because it consumes unspent outputs which otherwise have to be kept around in memory/disk.)

One final note: most transactions appearing in a block have already been verified. That’s because Bitcoin uses a peer-to-peer communication system to distribute all transactions around the network. Long before a block containing the TX appear, that TX would have been broadcast, verified and placed into the mem-pool. (Under the covers, the implementation caches the result of signature validation to avoid doing it again.) In other words, CPU load is not a sudden spike occurring when blocks materializes out of thin air; it is spread out over time as TX arrive. As long as a block contains few “surprises” in terms of TX never encountered before, bulk of the signature verification has already been paid. This is useful property for scaling: it removes pressure to operate in real-time. Relevant metric isn’t the cost of verifying a block from scratch, but how well the node is keeping up with sustained volume of TX broadcast over time. It might also improve parallelization, by distributing CPU intensive work across multiple cores if new TX are arriving evenly from different peers, handled by different threads.


Nodes also have to store the blockchain and look up information about past transactions when trying to verify a new one. (Recall that each input to a transaction is a reference to an output from some previous TX.) As of this writing current size of the Blockchain is around 55GB. Strictly speaking, only unspent outputs need to be retained. Those already consumed by a later TX can not appear again. That allows for some pruning. But individual nodes have little control over how much churn there is in the system. If most users decide to hoard Bitcoins and not use them for any payments, most  outputs remain active. In practice one worries about not just raw bytes as measured by Bitcoin protocol, but the overhead of throwing that data into a structure database for easy access. That DB will introduce additional overhead beyond the raw size of the blockchain.

Regardless, larger blocks only have a very slow effect on storage requirements. It’s already given that space devoted to the ledger must increase over time as new blocks are mined. Doubling blocksize only leads to faster rate of increase over time, not a sudden doubling of existing usage. It could mean some users will have to add disk capacity sooner than they had planned. But disk space had to be added sooner or later. Of all the factors potentially affected by blocksize increase, this is least likely to be the bottleneck that causes an otherwise viable full-node to drop off the network.

Whether 55GB is already a significant burden or might become one under various proposals depends on the hardware in question. Even 100GB is peanuts for ordinary desktop/server-class hardware that typically feature multiple terabytes of storage. On the other hand it’s out of the question for embedded-devices and IoT scenarios. Likewise most smartphones and even low-end tablets with solid-state disks are probably out of the running. Does that matter? The answer goes back to the larger question of missing MSR, which in turn is a proxy for lack of clarity around target audience.

Network bandwidth & latency

At first bandwidth does not appear all that different from storage, in that costs increase linearly. Blocks that are twice as large will take twice as long to transmit, resulting in an increased delay before the can recognize when a new one has been successfully mined. That could result in a few additional seconds of delay in processing. On the face of it, that does not sound too bad. Once again the scalability page paints a rosy picture with an estimated 8MB/s bandwidth to scale to Visa-like 2000 transactions per second (TPS):

This sort of bandwidth is already common for even residential connections today, and is certainly at the low end of what colocation providers would expect to provide you with.

If the prospect of going to 2000TPS from the status quo of 7TPS is no-sweat, why all this hand-wringing over a mere doubling?

Miner exceptionalism

This is where miners as a group appear to get special dispensation. There is an assumption that many are stuck on relatively slow connections, which is almost paradoxical. These groups command millions of dollars in custom mining hardware and earn thousands of dollars from each block mined. Yet they are doomed to connect to the Internet with dial-up modems, unable to afford a better ISP. This strange state of affairs is sometimes justified by two excuses:

  • Countries where miners are located. Given heavy concentration of mining power in China, there is indeed both high-latency as measured from US/EU and relatively low bandwidth available overall for communicating with the outside world, not helped by the Great Firewall.
  • Economic considerations around specific locations within a country favorable to miners. Because Bitcoin mining is highly power-intensive, it is attractive to locate facilities close to sources of cheap power, such as dams. That ends up being middle of nowhere, without the benefit of fast network connections. (Except that holds for data-centers in general. Google, Amazon and MSFT also place data-centers in the middle of nowhere but still manage to maintain fast network connections to the rest of the world.)

There is no denying that delays in receiving a block are very costly for miners. If a new block is discovered but some miner operating in the dessert with bad connectivity has not received it, they will be wasting cycles trying to mine on an outdated branch. Their objective is to reset their search as soon as possible, to start mining on top of the latest block. Every extra second delay in receiving or validating a block increases the probability of either wasting time on futile search, or worse, actually finding a competing block that creates a temporary fork that will be resolved with one side or other losing all of their work when the longest chain wins out.

Network connections are also the least actionable of all these resources. One can take unilateral action to scale up (buy better machine with more/faster CPUs), scale out (add servers to a data-center), add disk space or add memory. These actions do not need to be coordinated with anyone. But network pipes are part of the infrastructure of the region and often controlled by telcos or governments, neither responsive or agile. There are few options- such as satellite based internet, which is still high-latency and not competitive with fiber- that an individual entity can take to upgrade their connectivity.

Miners’ dilemma

In fact as many pointed out, increased block-size is doubly detrimental to miners’ economic interests: blocks that are filled to capacity lead to an increase in mining fees as users compete to get their transaction into that scarce space. Remove that scarcity and provide lots of spare room for growth, and that competitive pressure on fees goes away. That may not matter much at the moment. Transaction fees are noise compared to what miners earn from the so-called coinbase transaction, the prize in newly-minted Bitcoins for having found that block.

Those rewards diminish overtime and eventually disappear entirely once all 21 million Bitcoins have been created. At that point fees become the only direct incentive to continue mining. (Participants who have significant Bitcoin assets may still want to subsidize mining at a loss, on the grounds that a competitive market in mining-power provides decentralization and protects those assets.) Those worried about a collapse in mining economics argue fees need to rise over time to compensate and that such increases are healthy. Keeping block-size artificially constrained helps that objective in the short run. But looked another way, it may be counter-productive. Miners holding on to their Bitcoin stand to gain a lot more from increase in the value of Bitcoin relative to fiat currencies such as the US dollar. If scaling Bitcoin helps enable new payment scenarios or brings increased demand for purchasing BTC with fiat, that will benefit miners.

Stakeholder priorities

The preceding discussion may shed light on why miners are very sensitive to even small increases in latency and bandwidth. It still does not answer the question of whether that is a legitimate reason to hold off on block-size increases. Miners are one of the constituents, certainly one of the more prominent, and perhaps justifiably wielding outsized influence on the protocol. They are not the only stakeholders. Lest we forget:

  • People running full nodes
  • Merchants accepting Bitcoin as a form of payment
  • Companies operating commercial services dependent on the Blockchain
  • Companies vying to extend Bitcoin or layer new features
  • Current and future end-users, relying on the network for payments and money transfer

That last group dwarfs all of the others in sheer numbers. They are not actively participating in keeping up the network by mining or even verifying blocks; they simply want to use Bitcoin or hold it as an investment vehicle, often under custody of a third-party service. These stakeholders have different incentives, requirements and expectations. Sometimes those are aligned, other times they are in conflict. Sometimes incentives may change over time or vary based on particular circumstances: miners with significant holdings of BTC on their balance sheet would happily forgo higher transaction fees today, if scaling Bitcoin could drive usage and cause their BTC assets to gain against fiat currencies.

That brings us to the governance question: which constituency should Bitcoin core optimize for? Can feature development be held hostage by the requirements of one group or another? W3C has a clear design principle ranking its constituencies on the totem pole:

“Consider users over authors over implementors over specifiers over theoretical purity”

What the XT debacle and more recent saga of a rage-quitting core developer suggest is that Bitcoin project needs to articulate its own set of priorities and abide by them when making decisions.



Getting by without passwords: LUKS with smart-cards (part IV)

[continued from part III]

Unlocking the encrypted partition

During the boot sequence, the system will prompt for the smart-card PIN as part of the execution of pkcs15-crypt command:

Prompt for smart-card PIN

After providing the correct PIN, the card uses the private-key stored on board to decrypt the LUKS key. That key is returned to LUKS which unlocks the volume and permits the boot process to continue.

One tricky aspect handled by the key-script is managing terminal windows via openvt. Because the partition will be unlocked during boot and OpenSC has to prompt users for smart-card PIN to perform private key operations, that process must have access to console. This can be difficult to guarantee when I/O is being redirected as part of the boot sequence. A cleaner approach is to invoke decryption on a different TTY and switch the display over. This also makes the PIN collection slightly more legible by separating it from the clutter of messages related to boot process and appearing on its own window as above.

Weak links and redundancy

Strictly speaking, the sequence of steps detailed earlier adds an option to unlock an encrypted disk using a smart-card. But much like enabling smart-card login via pam-pkcs11, it does not require using the card. As long as the original LUKS key-slot is still present, using a hardware token is at best a convenience option: connecting a token and typing a short PIN maybe preferable to typing in a complex passphrase. But if the user had picked a “weak” passphrase for that, disk encryption is still subject to offline grinding attacks. In that sense, user-chosen passwords remain the weakest link in that setup.

Realizing the security benefit requires going one step further to remove the original keyslot using via cryptsetup luksRemoveKey. After that point, the only way to unlock the volume is decrypting the random secret.

Doing so however a new set of failure modes. What if the user forgets their PIN? Or loses their card? Even assuming perfect user behavior, what if the card itself malfunctions? There are even legitimate scenarios where the private key on the card may have to be replaced. For example when the certificate expires, issuing a new one may call for generating a new key-pair as well. (In theory PIV standard accommodates this by allowing the card to retain “archived” version of the key-management key. Unlike authentication or signature keys, this is one case where simply replacing an expired credentials is not enough. The old key is still required to decrypt previously encrypted ciphertext.)

This has several solutions, since LUKS permits multiple key-slots.

  1. In the simplest case, the user could create a new key-slot with a random passphrase that is printed on hard-copy and tucked away offline.
  2. For more high-tech solutions, one can repeat the process above using additional public-keys, creating multiple encryptions of the same random secret. (The corresponding private-keys may or may not reside on hardware tokens; that is an implementation detail.) Corresponding ciphertexts could be stored alongside the primary, to be used in case of a hardware problem with the original card.
  3. A slight tweak involves creating additional key-slots containing random secrets, with each one encrypted to a different public-key. By contrast option #2 involves a single key-slot, with the same secret encrypted to multiple public-keys. The latter approach is slightly less flexible for revocation: if one wanted to remove one of the backups (for example, because the corresponding private-key is assumed compromised) it requires  deleting all copies of the ciphertext which reside outside the encrypted volume, as opposed  the removing the key-slot that is part of the volume.

Getting by without PIV

One final note about the functionality required of the card here. As with the examples for local login and SSH authentication, we have used a PIV card or more specifically a PIV token in USB form-factor such as GoldKey or PIVKey. But there is nothing in the above examples that assume a specific card-edge. As long as the middleware can recognize the card— and OpenSC boasts an extensive list of supported hardware— the exact same steps apply. Alternatively one can switch to using pkcs11-tool for encryption/decryption, in conjunction with a PKCS#11 module for the hardware to work with a broader range of hardware including HSMs for example.


Second-guessing Satoshi: ECDSA vs RSA? (part III)

The previous post discussed the potential for blackbox kleptographic implementations of ECDSA to leak private-keys by manipulating random nonces. What about other cryptographic algorithms? Would offline Bitcoin wallets be less susceptible to similar attacks if Satoshi had made different design choices?

As a natural alternative we can look at RSA, the first widely deployed public-key cryptosystem. RSA signatures also involve applying a padding scheme to the raw message hash before feeding it into the trapdoor. Some of these schemes are probabilistic, meaning there is some arbitrariness in the way that the original input is extended from size of the hash function (say 32 bytes for SHA256) to the full size of RSA modulus (2048 bits or higher by contemporary standards) Such freedom in choosing those additional bits create the same problem as ECDSA. But RSA can also be used in conjunction with deterministic padding. One message, exactly one valid signature. While that eliminates a very specific threat involving covert channels from blackbox implementations, it raises other questions. How would RSA stack-up against ECDSA along other aspects- security, performance, efficiency?

Theoretical security: a paucity of provable results

At a very fundamental level, there is no clear winner. Reflecting our general state of ignorance about the complexity of core cryptographic primitives, neither cryptosystem has a formal proof of security. It is widely believed that the difficulty of RSA rests on factoring while that of ECDSA/ECDH rests on the discrete logarithm problem. But even those seemingly obvious facts have eluded proof as of this writing, if we define “proof” to mean a specific type of reduction: showing that an effective attack capable of breaking RSA (ECDSA, respectively) leads to an equally effectively algorithm for factoring (computing discrete logs.) It’s clear that efficient factoring would break RSA and efficient algorithms for computing discrete logs would break elliptic-curve cryptosystems. But it is the converse that matters. Proving or refuting that conjecture remains an open problem: whether there exists a way to decrypt messages or forge signatures without solving either of those problems. In fairness, few cryptosystems have been formally reduced to an existing problem. Rabin cryptosystem is provably equivalent to factoring. A few others rarely used in practice such as McEliece are based on an NP-complete underlying problem, but that does not imply that attacking a random instance is NP-hard.

Both elliptic-curve and RSA signatures vary in security level based on parameters, and there are various guidelines for relating the difficulty of breaking one to another. RSA is simpler because there are relatively few knobs and parameters to tweak when generating keys. Key-size is the main variable, with 2048-bits being the consensus recommendation today. In the past there have been varying opinions about choosing primes of special structure, on the grounds that most efficient factoring algorithms known at the time fared better when order of the group was “smooth” eg P-1 and Q-1 have many small divisors. But the most efficient algorithm known today, the number-field sieve, does not rely on such special structure and operates equally fast (or equally slowly, depending on your perspective) against all RSA moduli. Public exponent is generally chosen to be a small number to allow quick public-key operations, and attempts to optimize performance based on choosing special private-key exponents faces security problems. That leaves modulus-size as the main tunable parameter for determining security of the RSA cryptosystem.

For elliptic curves, the story is more complicated. For starters there are many subtypes of curves: they can be over prime fields or binary fields. Their parameters can be generated according to different criteria. Unlike choosing primes, picking a curve is both tricky and complex. So much that end-users do not generate their own curve, instead pick from a menu of options that are supported by their software. (“openssl ecparam -list_curves” for a glimpse at the options on your system.) This is where competition for curves starts to look like a market for used-cars, with each group pushing the virtues of their own curve. With many options come the possibility of some spectacularly bad ones. Some have been inadvertent, with advances in the field revealing that certain classes of curves were weaker than expected. Others may even be deliberate, clouded by allegations that curve parameters have been “cooked” by a conspiracy— with NSA usually playing the role of the supervillain in these stories— to have a very specific weakness exploitable by the powers-that-be.

Further aggravating matters, Bitcoin lacks cryptographic agility— it has committed to a single curve with no opportunity to increase hardness along this dimension short of a hard-fork. In any case, there is no linear scale as with RSA, where modulus size could be increased gradually one machine word at a time. Bumping up security requires a discontinuity, switching to a different curve with different parameters.

Implementation matters

Cryptographic algorithms also fail because incidental weaknesses in an implementation implementing that has nothing to do with the underlying mathematics. How does each choice fare along this dimension? As noted earlier, ECDSA is critically dependent on a source of randomness for each signature operation- and not just during key generation. Even a partial failure of the RNG during signing is fatal. RSA does not have this problem, particularly when used in conjunction with a deterministic padding mode.

Both algorithms are subject to side-channel attacks: in a naive implementation, the sequence of steps taken during the computation has direct relationship to private-key bits. For example the code may take one code path versus another depending on whether a key-bit is 0 or 1, and these different code-paths could have very different profiles in terms of time required for completion, memory references or even aggregate power consumption. Hardened implementations try to obscure these differences against external observation.  For elliptic curves it can be more difficult to do this. To take one example, in RSA a source of side-channel leaks is that squaring operation is often distinguishable from side-multiply because it can be implemented  faster, for example using the recursive Karatsuba technique or simply noting that only half as many multiplications are required because the same pair of operands appears twice. But this optimization is optional: an  implementation can choose to forgo faster squaring and fall back to doing it the hard way.** For elliptic curves, there is no such option. The basic group operation, point addition, is inherently different when adding a point to itself as opposed to adding two different points. That difference can’t be swept aside by falling back to a less efficient version of addition. More subtle techniques such as the Montgomery ladder are required to avoid key-dependent behavior.


At some level there is an irony to worrying about the performance of Bitcoin—an extremely inefficient system predicated on wasting massive amounts of CPU cycles, and consequently energy, to operate without any centralized trust. But suspending disbelief and assuming that speed matters, RSA appears to have a surprising advantage for transaction signing. Surprising, because in general speed is touted as one of the advantages for elliptic-curve cryptography. That part still holds true, and one can get ballpark estimates from (highly outdated) Crypo++ benchmarks or running openssl speed tests locally:

                              sign    verify    sign/s verify/s
256 bit ecdsa (nistp256)   0.0000s   0.0001s  26840.5  11228.1
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.000554s 0.000025s   1805.1  39505.7

That’s an order of magnitude difference. The problem for ECDSA is that both signing and verification are relatively expensive. RSA on the other hand has a very asymmetric performance profile: private-key operations (decryption and signing) are 20x slower than public-key operations (encryption and verification, respectively.) Bitcoin transactions are signed once by the originator but verified thousands if not millions of times by full-nodes validating the ledger. In that usage mode, RSA gets the nod for lowering total cost across the network over the lifetime of the transaction.

Looking at key-generation on the other hand, ECDSA gains a slight edge. RSA keys have mathematical structure as the product of two large primes; not any old number will do. Key generation involves a relatively costly search for primes by testing candidates repeatedly until primes are discovered. For ECDSA any positive integer less than the order of the generator is a valid private-key. Randomly choosing a value in this range is sufficient and very quick; so much that bulk of the time in ECDSA key-generation is spent computing the public-key. That property also makes it easier to implement deterministic key-derivation schemes such as BIP32, where multiple keys are derived from a single root secret. Because any old integer in a particular range is a valid private key, it is easy to map outputs from key-derivation functions to ECDSA private keys in one-to-one manner. While RSA key generation can be made deterministic based on the starting point for prime search, it does not have a similar uniqueness guarantee.

A different measure of efficiency is space: how many bytes are consumed by the signature. Shorter is better and arguably this is more important than speed for Bitcoin. Signature size is a major determinant of transaction size. Given that each block is capped and current 1MB limit is already becoming a limitation, any increase in TX size further constrains how many transactions can be mined per generation. On that dimension ECDSA looks a lot better. The size of an ECDSA signature is twice that of the underlying curve. For secp256k1 that comes out to 64 bytes. (Somewhat puzzling is that Bitcoin sticks with traditional ASN1 encoding of signature, which introduces up to 7 bytes or 10% of additional overhead compared to plain layout.) At first glance, even that inefficient encoding looks like major savings compared to the alternative of using 2048-bit RSA signatures, which  represents a whopping 300% increase.

Surprisingly some of the overhead for RSA can be optimized away by merging it into the signature. RSA supports message recovery, where part or entirety of the message is squeezed into the signature block itself. For example with 2048b modulus and SHA256 as the underlying hash function, over 220 bytes of plaintext message can be safely recovered using PSS-R padding. Surprisingly the net overhead than could be lower than ECDSA depending on amount of data merged into the signature. Using a data-point from December 2014 that average Bitcoin transaction size is ~250 bytes, we can estimate ~190 bytes of preimage without the ECDSA signature. That is less than the maximum recoverable message which means that the entire TX would become a single RSA signature, without any appreciable bump in space used.

Unfortunately one detail in Bitcoin signing greatly limits the applicability of message recovery optimizations: every input into a transaction must be signed individually. This holds true even if the inputs are using the same address and in theory a single signature could prove control over all of them. For transactions containing multiple inputs, the hypothetical RSA signature overhead would apply for each input. That means one may run out of “message” to squeeze into the signature, unless we complicate matters further by starting to squeeze signatures into each other. In that case part of the “message” recovered from an RSA signature could be a chunk of a different RSA signature. Such cascades break one of the assumptions about TX construction: inputs can be added independently of each other. (There is a even a signature type that allows anyone to add inputs.) If input signatures incorporate fragments of other inputs, parsing becomes tricky.

One last point on space efficiency of existing Bitcoin signatures: ECDSA is far from being state of the art in optimizing signature size. It has been known for some time that pairing-based schemes enable shorter signatures. For example the Boneh–Lynn–Shacham scheme from 2001 provides a security level comparable to 1024-bit RSA using as few as 20 bytes. But pairing based cryptography remains relatively esoteric with limited implementation support and a potential minefield of intellectual property claims, making it an unlikely candidate for Bitcoin.

Verdict on ECDSA for Bitcoin

In retrospect then ECDSA looks like a reasonable middle-ground for the requirements Satoshi faced. It produces relatively compact signatures compared to RSA without requiring elaborate message-recovery tricks, allows rapid key-generation to promote better privacy for funds and has plenty of implementation support. (Modulo the choice of secp256k1; most of the initial focus on elliptic-curve support has been in response to NIST Suite-B curves. secp256k1 is not among them. For example Windows native cryptography API still does not support it.) On the downside, it has a suboptimal distribution of load between signing/verification, inflating the global cost of verifying blocks.


** To be clear, this only removes the timing leak. Other side-channels exist to  still permit distinguishing squaring from multiplication by a different operand. A side-multiply will typically reference different memory regions than multiplying a number by itself. Observations exploiting local channels such as CPU caches as can still infer such key-dependent memory references.

Getting by without passwords: disk encryption (part III)

[continued from part II: local login]

Next up in this series on using hardware tokens to replace passwords: disk encryption.

Threat models for disk encryption

One word of caution here: OS-based disk encryption, particularly on boot volumes, by itself is not sufficient unless also accompanied by another safeguard to ensure boot integrity such as UEFI secure boot, TPM-based measured boot etc. To motivate this, we  distinguish two different threat models:

  • Passive attack: This is the garden-variety lost/stolen laptop scenario. The device is captured by the adversary, who must contend with defeating disk encryption without ever interacting with the legitimate user or possessing their credentials. (In other words, no out-of-band tricks to get disk encryption passphrase.)
  • Active attack or “evil-maid” scenario: Adversary gets access to the device, tampers with it and then returns it back to the owner. Oblivious to the fact that their device has been in nefarious hands, legitimate owner proceeds to power it on, decrypt the disk and use their machine as if nothing happened.

(It’s worth pointing out this dichotomy does not cover the full range of attacks. For example original cold-boot attacks did not require require interaction, but relied on a quirk of Bitlocker default configuration: decryption happens automatically on boot using TPM. It’s as if the “user” is always present and willing to type in a decryption passphrase whenever prompted.)

Disk encryption alone falls short in the second scenario, regardless of the strength of cryptography or strategy for managing decryption keys. The reason is that before disk can be decrypted, some other piece of the code must execute that is responsible for implementing that complex full-disk encryption logic. That typically resides on another partition, which itself can not be encrypted to avoid a turtles-all-the-way-down situation. Otherwise we need another piece of boot-strap code to decrypt that partition, in a turtles-all-the-way-down circularity. But it could also be part of the BIOS firmware. Now if an attacker can tamper with that code and have the user proceed to unlock the disk using their Trojaned version, they have effectively bypassed encryption: during the decryption, their malicious loader could directly inject a payload into the operating system by returning bogus plaintext. (“You are trying to decrypt sectors containing the sshd binary? Here, take this version instead which contains a backdoor that will grant your adversary remote access with a magic passphrase.”) This is why it is necessary to guarantee integrity of the entire boot chain, starting from the firmware to the multiple stages of the boot-loader, all the way until the operating system gains control.

With that disclaimer in mind, here is a proof-of-concept for using off-the-shelf cryptographic hardware for disk encryption.

First the two easy cases.


This has an easy answer, because of the limitations in OSX: existing FileVault design does not support interfacing with smart-cards. (For historical purposes: the current version is technically FileVault2. The original FileVault was based on a different and did support smart-card unlock.)


Supporting hardware-based disk encryption on Windows is also very easy because it is already built-in. That said, we need to distinguish between two cases. Bitlocker for boot volumes does not use smart-cards. It can not; that scenario calls for specific TPM functionality, such as platform configuration registers or PCRs, to guarantee boot integrity which can not be emulated using an external token that is decoupled from the motherboard. Bitlocker-To-Go on the other hand, can use any compatible smart-card to encrypt other types of storage, such as additional logical partitions and even removable volumes such as USB thumb drives. It can even protect remote disks mounted as iSCSI targets and virtual-disk images, described in earlier posts around on using existing cloud-storage services without giving up privacy.

Linux with dm-crypt & LUKS

Linux falls into that gray area between OSX where enterprise-security scenarios usually don’t work because Apple never designed for enterprise, and Windows where the “common” scenario works out-of-the box provided one is adhering to the MSFT script. There is not a single “full-disk encryption” feature built into Linux but a wide array of choices, with only a handful used in practice.

This example describes implementing FDE using a PIV card with the popular LUKS and dm-crypt combination. dm-crypt operates at the level of block devices. Quoting from the cryptsetup wiki:

Device-mapper crypt target provides transparent encryption of block devices using the kernel crypto API. The user can basically specify one of the symmetric ciphers, an encryption mode, a key (of any allowed size), an iv generation mode and then the user can create a new block device in /dev. Writes to this device will be encrypted and reads decrypted. You can mount your filesystem on it as usual or stack dm-crypt device with another device like RAID or LVM volume.

LUKS in turn provides a layer for managing these encryption keys. LUKS can manage  multiple “slots,” each containing versions of the encryption key protected by a different mechanism. One helpful property of this behavior is providing side-by-side support for traditional passwords along with smart-cards, using different slots. It also provides a migration path to switch from passwords to hardware tokens, simply by deleting the associated slot.

LUKS and smart-cards

LUKS by itself does not have a notion of smart-cards, cryptographic hardware or even high-level interfaces such as PKCS#11. Instead it comes with an even more general extensibility mechanism called key-scripts. This model allows specifying an arbitrary binary to be executed and use the stdout output from that binary as the decryption key for unlocking a volume. This is more than enough to implement support for smart-cards. (In fact it has also been used to implement a poor-man’s Bitlocker.)

Our strategy is:

  1. Generate a random symmetric key
  2. Create a LUKS key-slot containing this symmetric key
  3. We will not store the original symmetric key. Instead, we encrypt it using the public-key from one of the smart-card certificates. That resulting ciphertext is stored on disk.
  4. Write a custom LUKS keyscript to decrypt that ciphertext using the corresponding private-key from smart-card
  5. Configure LUKS to invoke that key-script when the partition needs to be unlocked.


Here is a more detailed walk-through the steps.

Caveat emptor: This is for illustration purposes only. In practice these steps would be  automated, to avoid extended exposure of raw secrets in environment variables.

First generate a random secret:

RANDOM_KEY=`openssl rand -base64 32`

Next we assign it into one of the LUKS key slots:

cryptsetup luksAddKey /dev/sda2 <(echo "$RANDOM_KEY")

In this example /dev/sda2 is the partition containing an existing LUKS volume. The command must be run as root and will require one of the existing passphrases to be supplied. After the new key-slot is added, the disk can be unlocked anytime in the future by presenting that secret again.

Tying unlock to a hardware token

The core idea is making that capability—“present original secret”— conditional on having control over the smart-card. This is achieved by encrypting the secret using one of the public keys associated with the card and throwing away the original. From now on, recovering that secret to unlock the disk requires performing the corresponding private-key operation to decrypt the ciphertext. Typically that translates into having possession of the card and knowing the associated PIN.

To encrypt, we retrieve a certificate from the card and extract the public key from it using openssl.

PUBLIC_KEY=`pkcs15-tool --read-certificate 01  |\
            openssl x509 -inform PEM -noout -pubkey`

openssl rsautl -encrypt -inkey <(echo "$PUBLIC_KEY") -pubin \
                <<< $RANDOM_KEY > /crypto/fde_ciphertext.bin

Two ancillary notes about this:

  • PIV standard defines multiple X509 certificate slots that may be present on the card. This example uses the PIV Authentication Certificate, via identifier “01.” (This is a synthetic ID created by OpenSC. There is no such mapping in PIV, where the card-edge assigns keys identifiers such as 0x9A, 0x9B etc.) Any other key that permits decryption could be used; in fact Key Management certificate is more in the spirit of disk encryption.
  • Choice of raw RSA PKCS1.5 format is somewhat arbitrary and driven by convenience. (OAEP would have been preferable but at the time of writing, OpenSC does not grok OAEP, creating a challenge for decryption.) For the special case of managing LUKS secrets, more complex encryption modes such as CMS that use a random symmetric-key to bulk-encrypt data are not necessary.  Entirety of the secret data to be protected fits in a single RSA block for contemporary key sizes.

Unlocking the volume

We store the ciphertext outside the encrypted partition to avoid a circularity; access to the ciphertext is necessary before we can unlock the partition. In this example /etc/crypto is used. In the same directory we also drop a key-script that will be used for unlocking the volume. This is a simple bash script to decrypt the ciphertext using the smart-card to recover the original secret, and output that to stdout for the benefit of LUKS.

Raw RSA decryption is performed by using pkcs15-crypt from OpenSC. We could also have used pkcs11-tool which has more options, but adds another moving part in having to locate the appropriate PKCS #11 module.**

Finally we adjust the crypttab file to invoke the custom key-script when unlocking our encrypted partition:

cem@ubuntu:/$ cat /etc/crypttab
# <target>  <device>    <key file>                   <options>
home        /dev/sdb1   /crypto/fde_ciphertext.bin   luks,keyscript=/crypto/

(Without a key-file or key-script, the default behavior is interactively prompting the user for a passphrase instead.)



** In a way it is a fluke that pkcs15-tool works in this scenario; PIV is a distinct card-edge; it does not conform the PKCS #15 standard for what a card application looks like at the APDU level.

Sprints and marathons: why developer interviews miss the mark


The New York City marathon is one of the preeminent long-distance races in the world. Not surprisingly it comes with stringent qualification criteria, based on having completed either another full or half marathon at an eligible event. Imagine an alternate universe instead: runners qualify for the marathon based on their 100 meter sprint times. No doubt some of the usual entrants could still meet this unusual bar. But there would also be many false positives: the people have no problem sprinting blazingly fast over short distances but run out of fuel and drop-out of the race after a couple of miles. There would also be many false negatives— remarkable endurance athletes who never make it to the start line because the qualifiers screened for a criteria highly uncorrelated to what they are expected to perform.

That absurd hypothetical is not far from the state of the art in interviewing software engineers at leading technology companies. This blogger is certainly not first or most eloquent with a scathing criticism of the leading paradigm for conducting developer interviews. (In fact he has been a happy beneficiary of that broken model early on, one of its false positives.) But faith in measuring candidates based on their performance on contrived problems remains unshakable in many corners of the industry, from garden variety starts up in the valley to hedge-funds in NYC.

Introducing “The Problem”

With minor variations, the setup is same everywhere. Candidate walks into a room. An interviewer is sitting there, occasionally flanked by a shadow, a silent colleague who is there to observe the proceedings before he/she can conduct similar interrogations in the future. They engage in nervous chit-chat and pleasantries for a few minutes, before the interviewer walks over to a white-board and presents the candidate with “The Problem.” Invariably The Problem has two aspects—in fact novice interviewers often conflate these two problems when providing feedback about the candidate:

  • Algorithmic: This is the abstract logic piece, focused on puzzle-solving. It skews heavily towards the theory side, calling for insights into the underlying structure of a problem (“this is just an instance of network flows in disguise”) and benefits from familiarity with algorithms.
  • Coding: Translating the abstract solution into code expressed in a programming language. Often times the choice of that language is dictated by the company (“We are strictly a Rails shop and require that you are already familiar with Ruby”) In more enlightened circumstances, it may be negotiated between interviewer and candidate on the reasonable assumption that a good developer can easily become proficient at a new language or framework on the job.

Sometimes code is written on a white-board, without much attention paid to formatting or  the occasional bogus syntax. Other times candidates are sat down in front of an actual computer, provided some semblance of a work environment and asked to produce working code. (There are even websites such as CoderPad designed for remote coding interviews, with fancy syntax highlighting in several popular languages.) This is supposed to be an improvement both for the accuracy of the interview and “user-friendliness” of the process for the candidate. On the one hand, with a real compiler scrutinizing every semi-colon and variable reference, he/she must produce syntactically valid code. No sketching vague notions in pseudo-code. At the same time an IDE makes it much easier to write code the way it is normally done: jumping back and forth, moving lines around, deleting them and starting over, as opposed to strictly top-to-bottom on a white board. (Of course it is difficult to replicate the actual work space an engineer is accustomed to. It involves everything from the available screen real estate- some will swear by multiple monitors, others prefer a single giant monitor rotated into portrait mode. Experienced developers can be surprisingly finicky about that, having over-adapted to a very specific set up. Try asking an Emacs user to work in Vim.)


Putting aside the logistics of catering to developer preferences, what is the fundamental problem with this approach?

First the supply of good problems is limited. An ideal instance of “The Problem” provides more than a binary success/failure outcome. It permits a wide range of solutions, from the blindingly obvious and inefficient to the staggeringly elegant/difficult/subtle. A problem with a single “correct” solution requiring a flash of insight is a poor signal, similar to a pass/fail grade. Far more useful are those with complex trade-offs (faster but using more memory, slower but permits parallelization etc.) candidates can make progress incrementally and continue to refine their answers throughout.

Second there is an arms race between those coming up with questions and websites trying to prep candidates by publishing those questions. At Google we used to have an internal website where employees could submit these problems and others could vote on them. Some of the best ones inevitably got stamped with a big red “banned” after they had been leaked on interview websites such as Glassdoor. That even gives rise to the occasional problem of outright dishonest performances: the candidate who professes to never having encountered your question in the past, struggles through the interview and yet amazingly proceeds to parrot out the exact solution from the website last-minute.

Sprinting vs running marathons

Focusing on the difficulty of choosing an “optimal interview problem” is still missing the forest for the trees. A fundamental problem is that the entire setup is highly contrived and artificial, with no relationship to the day-to-day grind of software development in commercial settings.

To put it bluntly: most commercial software development is not about advancing the state of the art in algorithms or bringing new insights into complex problems in theoretical computer science— the type of “puzzle solving” celebrated at nano-scale by interview problems. That is not to say interesting problems never come up. But to the extent such innovations happen, they are motivated by work on real-world applications and their solution is the result of deliberate, principled work in that domain. Few engineers accidentally stumble into a deep theory question while fixing random bugs and then proceed to innovate their way (or failing that, give up on the challenge) in 45 minutes.

Much larger chunks of developer time are spent implementing solutions to problems that are well-understood and fully “solved” from theoretical perspective. Solved in quotes, because a robust implementation makes all the difference between a viable, successful product and one that goes nowhere. This is where the craft of engineering shines. That craft calls for a much broader set of skills than abstract puzzle-solving. It means working within pragmatic constraints, choosing to build on top of existing frameworks, however buggy or quirky they may be. (That is the anti-thesis of writing a solution from scratch out of thin-air. So-called greenfield projects where one gets the luxury of a clean slate are the exception rather than the norm in development.) It means debugging: investigating why a piece of code, already written by someone else with an entirely different approach, is not performing as expected. It means plenty of bug fixes and small tweaks. Making judgment calls on when to rewrite faulty components from scratch and when to continue making scoped, tactical fixes until there is more room in the schedule for an overhaul. Last but not least: working as part of a team, with complex dependencies and ever-shifting boundaries of responsibility.

Commercial software engineering is an endurance event over a highly uneven and unpredictable terrain full of obstacles, unfolding on the scale of weeks and months. (It is no accident that one of the earliest and better known books about software project management was titled “Death March.”) Many fashionable paradigms have come and gone— extreme programming, test-driven development, scrum with its emphasis on sprints— but none have altered the fundamental dynamics. Programming interviews by contrast are sprints on a perfectly flat indoor track, no matter how hard they strive to recreate a superficial similarity to actual software engineering.

The mismeasure of a developer

Why does the industry continue to evaluate engineers in this artificial way? One explanation is that the process is a relic of the 1990s, pioneered by companies such as Microsoft before there was a better way to evaluate the output of a developer.

In the days of shrink-wrap software, it was difficult to evaluate individual contributions in isolation from the larger project that person worked on. Imagine a candidate with resume credits on very large-scale applications such as MSFT Word or IBM OS/2. (Putting aside the question of how one feels about the relative quality of either of those products.) Thousands of people contributed to such a large code base. Did the engineer play a pivotal role or did they simply tweak a few lines of code at the periphery, fix a few inconsequential bugs? The opaqueness of proprietary development ensures this will be  difficult to ascertain. Meanwhile the Dunning-Kruger effect makes even the most sincere candidate self-evaluation suspect. It is not only successes that are difficult to highlight; failures are also easily swept under the rug. If you were the engineer in charge of enthusiastically integrating Clippy into MSFT Office or your code was riddled with critical security vulnerabilities that were later patched by other people, those details may be suspiciously absent from your resume.

Proprietary development has not gone away, but the good news is there is a growing amount of open-source software and it is easier to find than ever. It is no longer about downloading tarballs of source code from an obscure FTP site. With a Github handle listed on a resume, it is possible to browse and search through an individual developer’s contributions in a very systematic fashion. Those public repositories may still be that proverbial tip of the iceberg in relative proportion to their total output. For many people open-source contributions still remain a small fraction of contributions compared to private projects undertaken as part of a commercial product. But it still paints a far more accurate picture of their talent as developer- someone engaged in the craft of writing code, as opposed to solving logic puzzles. That is a picture that no 45-minute whiteboard interview can do justice to.


Who actually pays for credit card fraud? (part III)


Previous two posts [part I, part II] reviewed the counter-intuitive dynamics of how payment networks externalize and disperse fraud, by setting issuers and merchants against each other over direct losses. Those actors in turn incorporate expected losses into cost of services rendered to consumers.

As long as fraud costs are predictable and already factored into existing prices, they can be viewed as the cost of doing business.  Granted such losses incorporate an element of chance and perfect projections are not possible. Some years it may be less than projected, adding to the bottom line, while other years “black-swan” events such as the Target breach could result in much higher losses than expected. But as long as risks are manageable, it creates a stalemate in the system, one that goes a long way to explain why card networks have not particularly motivated to institute a crash-upgrade to EMV across the board. Issuers and merchants are locked into an archaic technology— the ubiquitous magnetic stripe and CVC2 codes— which are trivially defeated by vanilla PC malware, card-skimmers, compromised point-of-sale terminals and phishing scams. Yet the cost of upgrading point-of-sale terminals (and payment-processor connections— this one is not a simple matter of attaching an NFC reader and relying on backwards compatibility) may exceed projected reductions in fraud. As late as 2012 Glennbrook stated:

“In the US the common assumption is that the current level of counterfeit card fraud is too low to merit an industry-wide investment in EMV technology.”

Placing incentives on the right actors

That brings up the question about who should be paying for improving security and reducing fraud in the system. There is a certain logic to exempting card-holders out of the fight around who gets to absorb fraud losses: there is very little actionable for individuals in the way of reducing risk. In fact most “intuitive” responses can be counter-productive for the overall payments ecosystem: avoiding plastic in favor of cash— as consumers were reportedly doing in response to recent breaches— may indeed reduce fraud figures but only at the expense of losing issuer and revenue. (It may also increase cash losses which unlike credit-card fraud is not distributed to other individuals in the risk pool. As consumers in Greece noticed, when households began to stockpile cash and valuable, thefts and burglaries increased.)

Issuers and acquirers have more direct control over the risk levers in a given transactions. Issuing bank has the one final say in authorizing every transaction, with knowledge of amount and some notion of merchant involved. To improve their odds, they develop or more commonly out-source sophisticated risk management systems that sift through the haystack of card-holder transactions in real-time and flag suspicious patterns. Similarly merchants can institute policies based on their level of risk tolerance. Examples for retail outlets include:

  • Accepting thresholds set by networks for when signatures are required. Starbucks prefers to keep the line moving for smaller amounts, since quick customer service is critical in a volume business, but take time to pause on larger purchases
  • Requiring a valid government-issued ID with name matching the payment card
  • Prompting for additional card-holder information that can be checked during authorization process. Gas station pumps requesting ZIP code is the canonical example

Likewise e-commerce merchants subject to card-not-present risks can set certain policies such as:

  • Collecting CVC2 codes for additional card verification (Interestingly many large merchants including Amazon did not do this for the longest time)
  • Not shipping to PO boxes. These are easy to obtain and favored by shipping-mules to hide their true address
  • Checking billing address during authorization against information entered by the customer
  • Requiring additional verification when shipping address is different from billing address

All of these are actionable for the issuer/merchant and more importantly, decisions can be made independently by each actor. Set the rules too conservatively, and legitimate customers are driven away because their purchases are declined. Set them too liberally and an entire underground crew of professional crooks decide to prey on that merchant.

EMV deadline

Clearly something did change in the intervening years because Visa and MasterCard set a deadline of Oct 1st 2015 for EMV adoption in the US. The timing of the announcement coincided with the aftermath of large-scale breaches at Target and Home Depot. While correlation is not causation, one could speculate that card networks capitalized on the climate of heightened fear among consumers at the time to accelerate an EMV agenda which had been slow to gain traction until that point. Mainstream media also latched on a simplistic rhetoric that chip-and-PIN equals no more data breaches, creating a perfect environment to push EMV migration. With merchants backed into a corner too busy explaining why their shoddy security resulted in frequent data breaches, there would be no room for the type of careful cost/benefit analysis that has been otherwise the hallmark of risk management in payments.

So-called “upgrade” plan itself took the form of an ultimatum to issuers and merchants. Past the deadline, rules for disputing in-store transactions change. If the issuing bank had provided the customer with a chip&PIN card but the merchant terminal was only capable of swipe transactions, now it is the merchant who gets to eat the loss if that charge is later disputed by the card-holder. Conversely for those few situations when the merchant would have been on the hook, such as skimping on signed receipts in the interest of quick customer turnover, the bank may be stuck with the loss if the merchant POS had been upgraded for processing EMV transactions but customer card only had swipe capability. (Interestingly enough, if neither side had upgraded then business-as-usual rules apply. In that sense, not upgrading is the optimal outcome for both merchant/issuer when viewed as prisoner’s dilemma game, but the threat that the other side may “defect” would inspire both to settle for the more costly action of going through the upgrade.)

This is another great example of cost diffusion at work. Note that Visa and MasterCard are not on the hook for the vast majority of upgrade costs. The letters V and M in “EMV” may stand for their names but past research & development on EMV has become a historic sunk cost at this point. It is the issuing bank that must shell out for printing, personalizing and mailing millions of cards to their customers. Similarly the merchant is responsible for purchasing or leasing new equipment to accept EMV transactions. On the surface, consumers are off the hook again, much like their indemnification over fraudulent purchases. But to echo Robert Heinlein, TANSTAAFL: those issuer and merchant expenses be paid from somewhere. Absent any government subsidies to encourage better security— which have never played a significant role in this space— that source is the price of goods and services. Consumers will indirectly pay for this upgrade cycle too.