Observations on Bitcoin’s scaling challenge


Much ink has been spilled in recent months about Bitcoin’s scaling problems, specifically around the impact of increasing blocksize to allow more transactions to be processed in each batch. Quick recap: Bitcoin processes transactions by including them in a “block” which gets tacked on to a public ledger, called the blockchain. Miners compete to find blocks by solving a computational puzzle, incentivized by a supply of new Bitcoins minted in that block as their reward. Net “throughput” of Bitcoin as a system for moving funds is determined by the frequency of block mining and the number of transactions, or “TX” for short, appearing in each block. The difficulty of the puzzle is periodically adjusted such that blocks are found on average every 10 minutes. (That is a statistical average, not an iron-clad rule. A lucky miner could come up with one after a few seconds. Alternatively all miners could get collectively unlucky and require a lot more time.) In other words the protocol adapts to find an equilibrium: as hardware speeds improve or miners ramp up their investments, finding a block becomes more difficult to bring the average time back to 10 minutes. Similarly if miners reduce their activity because of increased costs, block difficulty would adjust downward and blocks would become easier to find.
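
To make that feedback loop concrete, here is a minimal sketch of the retargeting rule in Python. It glosses over details such as the compact target encoding, but the clamped proportional adjustment is the real mechanism:

```python
# Sketch of Bitcoin's difficulty retargeting rule (simplified).
# Every 2016 blocks, difficulty is scaled so that the next 2016 blocks take
# roughly two weeks (10 minutes per block) if hash power stays constant.
# The adjustment is clamped to a factor of 4 in either direction.

RETARGET_INTERVAL = 2016          # blocks between adjustments
TARGET_SPACING = 10 * 60          # seconds per block the protocol aims for
EXPECTED_TIMESPAN = RETARGET_INTERVAL * TARGET_SPACING  # two weeks

def next_difficulty(current_difficulty: float, actual_timespan: float) -> float:
    """Return the new difficulty given how long the last 2016 blocks took."""
    # Clamp the measured timespan, as the reference client does.
    actual_timespan = max(EXPECTED_TIMESPAN / 4,
                          min(actual_timespan, EXPECTED_TIMESPAN * 4))
    # Blocks came in too fast -> timespan below expected -> difficulty goes up.
    return current_difficulty * EXPECTED_TIMESPAN / actual_timespan

# Example: if the last 2016 blocks took only one week, difficulty doubles.
print(next_difficulty(1.0, EXPECTED_TIMESPAN / 2))   # 2.0
```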

“640K ought to be enough for everybody”

(In fairness, Bill Gates never said that about MS-DOS.)

Curiously block-size has been fixed for some time at 1 megabyte. There are no provisions in the protocol for increasing this dynamically. That stands in sharp contrast to many other attributes that either change on a fixed schedule (the amount of Bitcoin rewarded for mining a block decreases over time) or adjust automatically in response to current network conditions, such as the block difficulty. There is no provision for growing blocks as the limit is approached, which is exactly the current situation.

What is the effect of that limitation in terms of funds movement? The good news is that space restrictions have no bearing on the amount of funds moved. A transaction moving a billion dollars need not consume any more space than one moving a few cents. But it does limit the number of independent transactions that can be cleared in each batch. Alice can still send Bob a million dollars, but if hundreds of people like her wanted to send a few dollars to hundreds of people like Bob, they would be competing against each other for inclusion in upcoming blocks. Theoretical calculations suggest a throughput of roughly 7 TX per second, although later arguments cast doubt on the feasibility of achieving even that. The notion of “throughput” is further complicated by the fact that a Bitcoin transaction does not move funds from just one point to another. Each TX can have multiple sources and destinations, moving the combined sum of funds in those sources in any proportion to the destinations. That is a double-edged sword. Paying multiple unrelated people in a single TX is more efficient than creating a separate TX for each destination. On the downside, there is inefficiency introduced by scrounging for multiple inputs from past transactions to assemble the source. Still, adjusting for these factors does not appreciably alter the capacity estimate.
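
The back-of-envelope arithmetic behind that 7 TX/s figure looks roughly like this; the average transaction size used here is an assumption, not a protocol constant:

```python
# Back-of-envelope throughput estimate. The ~250-byte average transaction
# size is an assumption; real transactions vary widely with input/output count.
BLOCK_SIZE_LIMIT = 1_000_000      # bytes
AVG_TX_SIZE = 250                 # bytes, rough average (assumption)
BLOCK_INTERVAL = 600              # seconds, average time between blocks

tx_per_block = BLOCK_SIZE_LIMIT // AVG_TX_SIZE        # ~4000 transactions
tx_per_second = tx_per_block / BLOCK_INTERVAL         # ~6.7 TX per second
print(tx_per_block, round(tx_per_second, 1))
```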

The decentralization argument

Historically the 1MB limit was introduced as a defense against denial-of-service attacks, to guard against a malicious node flooding the network with very large blocks that other nodes cannot keep up with. Decentralized trust relies on each node in the network independently validating all incoming blocks and deciding for themselves whether each block has been properly mined. “Independently” being the operative keyword- if they were taking some other node’s word that the block is legitimate, that would not add any trust to the system. Instead it would effectively concentrate power, granting the other node extra influence over how others view the state of the Bitcoin ledger. Now if some miner on a fast network connection creates giant blocks, other miners on slow connections may take a very long time to receive and validate them. As a result they fall behind and find themselves unable to mine new blocks. They are effectively working from an outdated version of the ledger missing the “latest” block. All of their effort to mine the next block on top of this obsolete state will be wasted.

Arguments against increasing blocksize start from the perspective that larger blocks will render many nodes on the network incapable of keeping up, effectively increasing centralization. This holds true for miners- who end up with “orphaned blocks” when they mine a block based on an outdated version of the ledger, having missed someone else’s discovery of the latest block- but also to some extent for ordinary “full nodes” on the network. It’s the independent verification performed by all those nodes that keeps Bitcoin honest in the absence of a centralized authority. When fewer and fewer nodes are paying attention to which blocks are mined, the argument goes, that distributed trust decreases.

Minimum system requirements: undefined

This logic may be sound, but the problem is that Bitcoin Core, the open-source software powering full nodes, has never come with any type of MSR, or minimum system requirements, around what it takes to operate a node. It’s very common for large software products to define a baseline of hardware required for successful installation and use. This holds true for commercial software such as Windows- and in the old days when shrink-wrap software actually came in shrink-wrapped packages, those requirements were prominently displayed on the packaging to alert potential buyers. But it also holds true for open-source distributions such as Ubuntu and specialized applications like Adobe Photoshop.

That brings us to the first ambiguity plaguing this debate: no clear requirements have been spelled out around what it takes to “operate a full Bitcoin node,” or for that matter to be a miner, which presumably has even more stringent requirements. No reasonable person would expect to run ray-tracing on their 2010-vintage smartphone, so why would they be entitled to run a full Bitcoin node on a device with limited capabilities? This has been pointed out in other critiques:

“Without an MVP-specification and node characterization, there is nothing to stop us from torquing the protocol to support wristwatches on the Sahara.”

Perhaps in a nod to privacy, bitcoind does not have any remote instrumentation to collect statistics from nodes and upload them to a centralized place for aggregation. So there is little information on how much CPU power the “average” node can harness, or how many GBs of RAM or disk space it has, much less any operational data on how close bitcoind is to exhausting those limits today. Nor has there been a serious attempt to quantify these in realistic settings:

There would also be a related BIP describing the basic requirements for a full node in terms of RAM, CPU processing, storage and network upload bandwidth, based on experiments — not simulations — […] This would help determine quantitatively how many nodes could propagate information rapidly enough to maintain Bitcoin’s decentralized global consensus at a given block size.

Known unknowns

In the absence of MSR criteria or telemetry data, anecdotal evidence and intuition rule the day when hypothesizing which resource may become a bottleneck when blocksize is increased. This is akin to trying to optimize code without a profiler, going by gut instinct on which sections might be the hot-spots that merit attention. But we can at least speculate on how resource requirements scale, both in the “average” scenario under ordinary load as well as “worst-case” scenarios induced by deliberately malicious behavior trying to exhaust the capacity of the Bitcoin network. This turns out to be instructive not only for getting perspective on the current skirmish over whether to go with 2/4/8 MB blocks, but for revealing some highly suboptimal design choices made by Satoshi that will remain problematic going forward.

Processing

Each full-node verifies incoming blocks, which involves checking several criteria, including the following (sketched in code after the list):

  1. Miner solved the computational puzzle correctly
  2. Each transaction appearing in this block is syntactically valid- in other words, it conforms to the Bitcoin rules around how the TX structure is formatted
  3. Transactions are authorized by the entity who owns those funds- typically this involves validating one or more cryptographic signatures
  4. No transaction is trying to double-spend funds that have already been spent
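
A rough sketch of that validation loop is below. The helper functions are hypothetical stand-ins rather than Bitcoin Core’s actual API:

```python
# Sketch of the checks above. Helper names (block_hash, is_well_formed,
# check_signature, tx_id) are hypothetical stand-ins, not Bitcoin Core's actual
# interfaces; utxo_set is a dict mapping (txid, output index) -> unspent output.

def validate_block(block, utxo_set, target) -> bool:
    # 1. Proof of work: the header hash must fall below the current target.
    if int.from_bytes(block_hash(block.header), "big") >= target:
        return False
    for tx in block.transactions:
        # 2. Syntactic validity: well-formed structure, sizes, amounts.
        if not is_well_formed(tx):
            return False
        for index, tx_in in enumerate(tx.inputs):
            # 4. Double-spend check: the referenced output must still be unspent.
            prev_out = utxo_set.pop(tx_in.outpoint, None)
            if prev_out is None:
                return False
            # 3. Authorization: signature(s) must satisfy the spent output's script.
            if not check_signature(tx, index, prev_out.script_pubkey):
                return False
        # Newly created outputs become spendable by later transactions.
        for n, out in enumerate(tx.outputs):
            utxo_set[(tx_id(tx), n)] = out
    return True
```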

The blocksize debate brought renewed attention to #3, and the core team has done significant work on improving ECDSA performance over secp256k1. (An earlier post provides some reasons why ECDSA is not ideal from a cost perspective in Bitcoin.) Other costs such as hashing were considered so negligible that the scaling section of the wiki could boldly assert:

“…as long as Bitcoin nodes are allowed to max out at least 4 cores of the machines they run on, we will not run out of CPU capacity for signature checking unless Bitcoin is handling 100 times as much traffic as PayPal.”

It turns out there is a design quirk/flaw in Bitcoin signatures ignored by this rosy picture. The entire transaction must be hashed and verified independently for each of its inputs. A transaction with 100 inputs will be hashed 100 times (with a few bytes different each time, precluding reuse of previous results, although initial prefixes are shared) and subjected to ECDSA signature verification the same number of times. In algorithmic terms, the cost of verifying a transaction has a quadratic O(N²) dependency on input count. Sure enough, the pathological TX created during the flooding of the network last summer had exactly this pattern: one giant TX boasting over 5500 inputs and just 1 output. Such quadratic behavior is inherently not scalable. Doubling the maximum block-size leads to a 4x increase in the worst-case scenario.
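
A crude model shows the shape of the problem. The byte counts below are assumptions (the real sighash serialization blanks out other inputs’ scripts), but the quadratic growth is the point:

```python
# Rough illustration of the quadratic blow-up: a transaction with n inputs is
# re-serialized and hashed once per input signature, and the serialized size
# itself grows with n. Byte counts are crude assumptions for illustration.

BYTES_PER_INPUT = 41     # outpoint + sequence surviving in each sighash copy (approx.)
FIXED_BYTES = 50         # version, counts, outputs, locktime (approx.)

def total_bytes_hashed(n_inputs: int) -> int:
    per_signature = FIXED_BYTES + n_inputs * BYTES_PER_INPUT   # grows with n
    return n_inputs * per_signature                            # hashed n times

for n in (100, 1000, 5500):
    print(f"{n:>5} inputs -> {total_bytes_hashed(n) / 1e6:,.1f} MB hashed")
```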

There are different ways to address this problem. Placing a hard limit on the number of inputs is one heavy-handed solution. It’s unlikely to fly because it would require a hard-fork. Segregated witness offers some hope by not requiring a different serialization of the transaction for each input. But one can still force the pathological behavior, as long as Bitcoin allows a signature mode where only the current input (and all outputs) are signed. That mode was intended for constructing TX in a distributed fashion, where the destination of funds is fixed but the sources are not: multiple people can chip in to add some of their own funds into the same single transaction, along the lines of a fundraising drive for charity. An alternative is to discourage such activity with economic incentives. Currently fees charged for transactions are based on simplistic measures such as size in bytes. Accurately reflecting the cost of verifying a TX back on the originator of that TX would introduce a market-based solution to discourage such activity. (That said, defining a better metric is tricky. From another perspective, consuming a large number of inputs is desirable, because it consumes unspent outputs which otherwise have to be kept around in memory/disk.)
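
Purely as an illustration of what such a metric could look like, here is a hypothetical fee weight that charges for signature-hashing work on top of raw bytes; the constants and formula are invented for this sketch, not drawn from the protocol or any specific proposal:

```python
# Hypothetical cost-aware fee metric: charge for bytes hashed during signature
# checks in addition to raw transaction size. Weights are made up for illustration.

BYTE_WEIGHT = 1.0
SIGHASH_WEIGHT = 0.01   # arbitrary: cost per byte hashed while verifying signatures

def fee_weight(tx_size_bytes: int, n_inputs: int) -> float:
    sighash_bytes = n_inputs * tx_size_bytes   # each input re-hashes ~the whole TX
    return BYTE_WEIGHT * tx_size_bytes + SIGHASH_WEIGHT * sighash_bytes

# A single 1 MB transaction with 5500 inputs would pay far more than 5500 small
# transactions adding up to the same total size.
print(fee_weight(1_000_000, 5500))      # ~5.6e7
print(5500 * fee_weight(182, 1))        # ~1.0e6
```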

One final note: most transactions appearing in a block have already been verified. That’s because Bitcoin uses a peer-to-peer communication system to distribute all transactions around the network. Long before a block containing the TX appears, that TX would have been broadcast, verified and placed into the mempool. (Under the covers, the implementation caches the result of signature validation to avoid doing it again.) In other words, CPU load is not a sudden spike occurring when a block materializes out of thin air; it is spread out over time as TX arrive. As long as a block contains few “surprises” in terms of TX never encountered before, the bulk of the signature-verification cost has already been paid. This is a useful property for scaling: it removes pressure to operate in real-time. The relevant metric isn’t the cost of verifying a block from scratch, but how well the node is keeping up with the sustained volume of TX broadcast over time. It might also improve parallelization, by distributing CPU-intensive work across multiple cores if new TX are arriving evenly from different peers, handled by different threads.
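
The caching idea can be sketched in a few lines; this is illustrative only and not the actual data structure used by the implementation:

```python
# Illustrative sketch of the caching idea: remember which signature checks
# already succeeded when a TX entered the mempool, so validating a block that
# contains the same TX can skip the expensive ECDSA step.

verified = set()

def check_input(txid: str, index: int, verify) -> bool:
    """verify is a zero-argument callable performing the real ECDSA check."""
    key = (txid, index)
    if key in verified:
        return True                 # already verified at mempool-acceptance time
    if verify():
        verified.add(key)
        return True
    return False

# First call pays the cost; a later call for the same input is a cache hit.
print(check_input("deadbeef", 0, lambda: True))    # True, result cached
print(check_input("deadbeef", 0, lambda: False))   # True, served from cache
```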

Storage

Nodes also have to store the blockchain and look up information about past transactions when trying to verify a new one. (Recall that each input to a transaction is a reference to an output from some previous TX.) As of this writing the current size of the blockchain is around 55GB. Strictly speaking, only unspent outputs need to be retained. Those already consumed by a later TX cannot appear again, which allows for some pruning. But individual nodes have little control over how much churn there is in the system. If most users decide to hoard Bitcoins and not use them for any payments, most outputs remain active. In practice one worries about not just raw bytes as measured by the Bitcoin protocol, but the overhead of throwing that data into a structured database for easy access. That DB will introduce additional overhead beyond the raw size of the blockchain.

Regardless, larger blocks have only a gradual effect on storage requirements. It’s already a given that space devoted to the ledger must increase over time as new blocks are mined. Doubling blocksize only leads to a faster rate of increase over time, not a sudden doubling of existing usage. It could mean some users will have to add disk capacity sooner than they had planned. But disk space had to be added sooner or later. Of all the factors potentially affected by a blocksize increase, this is the least likely to be the bottleneck that causes an otherwise viable full-node to drop off the network.
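
To put numbers on that rate of increase, assume the worst case where every block is full:

```python
# Worst-case ledger growth if every block were filled to the cap.
BLOCKS_PER_YEAR = 365 * 24 * 6          # one block every ~10 minutes

for cap_mb in (1, 2, 4, 8):
    growth_gb_per_year = cap_mb * BLOCKS_PER_YEAR / 1000
    print(f"{cap_mb} MB blocks -> up to ~{growth_gb_per_year:.0f} GB/year")
```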

Whether 55GB is already a significant burden or might become one under various proposals depends on the hardware in question. Even 100GB is peanuts for ordinary desktop/server-class hardware, which typically features multiple terabytes of storage. On the other hand it’s out of the question for embedded devices and IoT scenarios. Likewise most smartphones and even low-end tablets with solid-state storage are probably out of the running. Does that matter? The answer goes back to the larger question of the missing MSR, which in turn is a proxy for lack of clarity around the target audience.

Network bandwidth & latency

At first bandwidth does not appear all that different from storage, in that costs increase linearly. Blocks that are twice as large will take twice as long to transmit, resulting in an increased delay before the network can recognize that a new one has been successfully mined. That could result in a few additional seconds of delay in processing. On the face of it, that does not sound too bad. Once again the scalability page paints a rosy picture, with an estimated 8MB/s bandwidth needed to scale to Visa-like 2000 transactions per second (TPS):

This sort of bandwidth is already common for even residential connections today, and is certainly at the low end of what colocation providers would expect to provide you with.
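
A back-of-envelope calculation suggests where estimates in that ballpark come from; the per-transaction size and peer fan-out below are assumptions:

```python
# Back-of-envelope for the data rate at Visa-scale volume. Assumes ~250 bytes
# per transaction; relaying each TX to many peers multiplies the raw figure,
# which is presumably how estimates on the order of 8 MB/s arise.

TPS = 2000
AVG_TX_SIZE = 250                       # bytes per transaction (assumption)
PEERS = 16                              # rough relay fan-out (assumption)

raw_rate = TPS * AVG_TX_SIZE            # 500,000 bytes/s of new transaction data
relayed = raw_rate * PEERS              # ~8 MB/s once peer relay is counted
print(raw_rate / 1e6, relayed / 1e6)    # 0.5 MB/s raw, ~8 MB/s with relay
```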

If the prospect of going to 2000TPS from the status quo of 7TPS is no sweat, why all this hand-wringing over a mere doubling?

Miner exceptionalism

This is where miners as a group appear to get special dispensation. There is an assumption that many are stuck on relatively slow connections, which is almost paradoxical. These groups command millions of dollars in custom mining hardware and earn thousands of dollars from each block mined. Yet they are doomed to connect to the Internet with dial-up modems, unable to afford a better ISP. This strange state of affairs is sometimes justified by two excuses:

  • Countries where miners are located. Given the heavy concentration of mining power in China, there is indeed both high latency as measured from the US/EU and relatively low bandwidth available overall for communicating with the outside world, not helped by the Great Firewall.
  • Economic considerations around specific locations within a country favorable to miners. Because Bitcoin mining is highly power-intensive, it is attractive to locate facilities close to sources of cheap power, such as dams. That often ends up being the middle of nowhere, without the benefit of fast network connections. (Except that the same holds for data-centers in general. Google, Amazon and MSFT also place data-centers in the middle of nowhere but still manage to maintain fast network connections to the rest of the world.)

There is no denying that delays in receiving a block are very costly for miners. If a new block is discovered but some miner operating in the desert with bad connectivity has not received it, they will be wasting cycles trying to mine on an outdated branch. Their objective is to reset their search as soon as possible, to start mining on top of the latest block. Every extra second of delay in receiving or validating a block increases the probability of either wasting time on a futile search, or worse, actually finding a competing block that creates a temporary fork, which will be resolved with one side or the other losing all of their work when the longest chain wins out.
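
A rough way to quantify that: if block discovery is modeled as a Poisson process with a 10-minute mean, the chance that someone else finds a block while a miner is still catching up grows with the delay:

```python
# Rough model of why every second matters: with blocks arriving as a Poisson
# process averaging one per 600 seconds, the probability that a competing block
# appears during a delay of t seconds is about 1 - e^(-t/600).
import math

for delay in (1, 5, 15, 60):
    p = 1 - math.exp(-delay / 600)
    print(f"{delay:>3}s behind -> ~{p:.1%} chance of working on a stale chain tip")
```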

Network connections are also the least actionable of all these resources. One can take unilateral action to scale up (buy a better machine with more/faster CPUs), scale out (add servers to a data-center), add disk space or add memory. These actions do not need to be coordinated with anyone. But network pipes are part of the infrastructure of the region and often controlled by telcos or governments, neither responsive nor agile. There are few options- such as satellite-based internet, which is still high-latency and not competitive with fiber- that an individual entity can take to upgrade their connectivity.

Miners’ dilemma

In fact, as many have pointed out, an increased block-size is doubly detrimental to miners’ economic interests: blocks that are filled to capacity lead to an increase in mining fees as users compete to get their transactions into that scarce space. Remove that scarcity and provide lots of spare room for growth, and that competitive pressure on fees goes away. That may not matter much at the moment. Transaction fees are noise compared to what miners earn from the so-called coinbase transaction, the prize in newly-minted Bitcoins for having found that block.

Those rewards diminish over time and eventually disappear entirely once all 21 million Bitcoins have been created. At that point fees become the only direct incentive to continue mining. (Participants who have significant Bitcoin assets may still want to subsidize mining at a loss, on the grounds that a competitive market in mining power provides decentralization and protects those assets.) Those worried about a collapse in mining economics argue fees need to rise over time to compensate and that such increases are healthy. Keeping block-size artificially constrained helps that objective in the short run. But looked at another way, it may be counter-productive. Miners holding on to their Bitcoin stand to gain a lot more from an increase in the value of Bitcoin relative to fiat currencies such as the US dollar. If scaling Bitcoin helps enable new payment scenarios or brings increased demand for purchasing BTC with fiat, that will benefit miners.
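
The halving schedule behind that 21 million cap is easy to verify with a few lines of arithmetic:

```python
# Where the 21 million figure comes from: the block reward started at 50 BTC and
# halves every 210,000 blocks (roughly four years) until it rounds down to zero.

subsidy = 50 * 10**8                    # initial reward, in satoshis
total = 0
while subsidy > 0:
    total += 210_000 * subsidy
    subsidy //= 2                       # halving

print(total / 10**8)                    # ~20,999,999.98 BTC
```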

Stakeholder priorities

The preceding discussion may shed light on why miners are very sensitive to even small increases in latency and bandwidth. It still does not answer the question of whether that is a legitimate reason to hold off on block-size increases. Miners are one constituency, certainly one of the more prominent, and perhaps justifiably wielding outsized influence on the protocol. But they are not the only stakeholders. Lest we forget:

  • People running full nodes
  • Merchants accepting Bitcoin as a form of payment
  • Companies operating commercial services dependent on the Blockchain
  • Companies vying to extend Bitcoin or layer new features
  • Current and future end-users, relying on the network for payments and money transfer

That last group dwarfs all of the others in sheer numbers. They are not actively participating in keeping up the network by mining or even verifying blocks; they simply want to use Bitcoin or hold it as an investment vehicle, often under custody of a third-party service. These stakeholders have different incentives, requirements and expectations. Sometimes those are aligned, other times they are in conflict. Sometimes incentives may change over time or vary based on particular circumstances: miners with significant holdings of BTC on their balance sheet would happily forgo higher transaction fees today, if scaling Bitcoin could drive usage and cause their BTC assets to gain against fiat currencies.

That brings us to the governance question: which constituency should Bitcoin core optimize for? Can feature development be held hostage by the requirements of one group or another? W3C has a clear design principle ranking its constituencies on the totem pole:

“Consider users over authors over implementors over specifiers over theoretical purity”

What the XT debacle and the more recent saga of a rage-quitting core developer suggest is that the Bitcoin project needs to articulate its own set of priorities and abide by them when making decisions.

CP

 
