Address ≠ person: the elusive Gini coefficient of cryptocurrencies


Estimating the distribution of digital assets from on-chain data is not straightforward

A false sense of transparency

The Gini coefficient of blockchains has long been a point of contention among defenders and detractors of cryptocurrency alike. Critics like to point to extreme levels of inequality based on the observed distribution of wealth among blockchain addresses. In their view, these statistics are evidence that blockchains, far from having democratized access to finance or created a path for wealth accumulation for average investors, have only enabled another instance of capital concentration. Defenders downplay the significance of such inequality and hold that such disparities do not indicate any fundamental problems with the economics of cryptocurrency. Without picking sides in that ideological debate, this post outlines a different issue: the measures of alleged inequality calculated from blockchain observations are riddled with systematic errors.

Given the transparency of blockchains as a public ledger of addresses and associated balances, the Gini coefficient is very easy to compute in theory. Anyone can retrieve the list of addresses, sort them by associated balance and crunch the numbers. This methodology is the basis of an often-cited 2014 statistic comparing bitcoin to North Korea and more recent attention-grabbing headlines stating that bitcoin concentration “puts the US dollar to shame.” While blockchain statistics are very appealing in their universal accessibility, there are fundamental problems with attempting to characterize cryptocurrency distribution this way.
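To make the "sort and crunch the numbers" step concrete, here is a minimal sketch of the naive per-address calculation. It assumes the balances have already been exported into a plain list; the function name and sample inputs are illustrative and not taken from any of the studies mentioned above.

```python
def gini(balances):
    """Gini coefficient of non-negative balances: 0 is perfect equality,
    values approaching 1 mean almost everything sits in one address."""
    values = sorted(balances)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero balance")
    # Closed form on sorted data, ranks starting at 1:
    # G = 2 * sum(rank * x) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([1, 1, 1, 1]))  # 0.0: four addresses with equal balances
print(gini([0, 0, 0, 1]))  # 0.75: one address holds everything
```

Note that the function has no idea what a balance represents. Whatever unit of analysis is fed in (addresses, people, institutions) is the unit whose inequality gets measured.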

Address ≠ person: omnibus wallets

The first problem is that the transparency afforded in blockchain data only applies at the level of addresses. All of the purported eye-opening measures of inequality (“0.01% of addresses control 27% of funds”) are based on distribution across addresses as the unit of analysis. But an address is not the same thing as a person.

One obvious problem involves omnibus wallets of cryptocurrency service providers, such as centralized exchanges and payment processors. [Full disclosure: This blogger worked at Gemini, a NYC-based exchange and custodian, from 2014-2019.] For operational reasons, it is more convenient for these companies to pool together funds belonging to different customers into a handful of addresses. These addresses do not correspond to any one person or even the parent corporate entity. The Binance cold-wallet address does not hold the funds of Binance, the exchange itself. Those assets belong to Binance customers, who are temporarily parking their funds at Binance to take advantage of trading opportunities or simply because they do not want to incur the headache of custodying their own funds.

While the companies responsible for these addresses do not voluntarily disclose them, in many cases they have been deanonymized thanks to voluntary sleuthing by users and labelled on blockchain explorers. A quick peek shows that they are indeed responsible for some of the largest concentrations of capital on chain, including four of the top ten accounts by bitcoin balance and similarly five of the top ten for Ethereum as of this writing.
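A toy calculation illustrates how much an omnibus wallet alone can move the number. The population below is entirely made up: 1,000 self-custody holders with 1 coin each, plus 1,000 exchange customers with 1 coin each whose funds are pooled into a single hypothetical exchange address.

```python
def gini(balances):
    """Gini coefficient of non-negative balances (0 = equal)."""
    values = sorted(balances)
    n, total = len(values), sum(values)
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical population: everyone owns exactly 1 coin.
self_custody = [1.0] * 1000        # one address per person
exchange_customers = [1.0] * 1000  # real owners, invisible on chain

# What the chain shows: 1,000 small addresses plus one omnibus wallet.
on_chain_view = self_custody + [sum(exchange_customers)]
# What ownership actually looks like: 2,000 people with 1 coin each.
per_person_view = self_custody + exchange_customers

address_gini = gini(on_chain_view)
person_gini = gini(per_person_view)
print(round(address_gini, 3), round(person_gini, 3))
```

In this sketch the per-address figure comes out at roughly 0.5 even though every person holds exactly the same amount, so the per-person figure is zero.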

Address ≠ person: smart-contracts

Ethereum in fact adds another twist that accounts for several other high-value accounts: smart-contracts that hold funds from multiple sources as part of a distributed application. For example, the number one address by balance is currently the staking contract for Ethereum 2.0. This contract is designed to hold in escrow the 32 ETH required as a surety bond from each participant interested in validating the next version of Ethereum using proof-of-stake. The second highest balance belongs to another smart-contract, this one for wrapped Ether or wETH, which is a holding vehicle for converting the native currency ether into the ERC20 token format used in decentralized finance (“DeFi”) applications. Others in the top 25 correspond to specific DeFi applications such as the Compound lending protocol or the bridge to the Polygon network.

None of these addresses are meaningful indicators of ownership by anyone. As such it is surprising that even recent studies on inequality are making meaningless statements such as: “The account with the highest balance in Ethereum contains over 4.16% of all Ethers.” (Depending on when the snapshot was taken, that would be either the Ethereum 2.0 staking contract, now the highest balance with > 7% of all ETH in existence, or the Wrapped Ether contract.)

Spurious inclusion of such addresses in a study obviously inflates the Gini coefficient. But their very existence distorts the picture in a way that cannot be remedied by merely excluding that data point. After all, the funds at that address are real and belong to the thousands of individuals who opted into staking or decided to convert their ether into wrapped ether for participating in DeFi venues. All of these funds would have to be withdrawn and redistributed back to their original wallets to accurately reflect ownership information that is currently hidden behind the contract.
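The claim that dropping the contract address does not fix the distortion can be checked with a small made-up example. The wallet names, balances and deposit amounts below are hypothetical: two holders have locked 32 ETH each in a staking-style contract, while two others keep everything in their own wallets.

```python
def gini(balances):
    """Gini coefficient of non-negative balances (0 = equal)."""
    values = sorted(balances)
    n, total = len(values), sum(values)
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical wallet balances visible on chain (in ETH).
wallets = {"A": 1.0, "B": 1.0, "C": 33.0, "D": 33.0}
# Hypothetical deposits held by the contract on behalf of A and B.
contract_deposits = {"A": 32.0, "B": 32.0}

# 1. Naive: treat the contract as one more "owner".
naive = gini(list(wallets.values()) + [sum(contract_deposits.values())])
# 2. Exclude the contract address from the data set.
excluded = gini(list(wallets.values()))
# 3. Credit each deposit back to the wallet it came from.
redistributed = gini(
    [bal + contract_deposits.get(owner, 0.0) for owner, bal in wallets.items()]
)
print(round(naive, 3), round(excluded, 3), round(redistributed, 3))
```

Here all four holders own 33 ETH, yet both the naive and the "excluded" calculations report substantial inequality (a Gini a little under 0.5). Only the third version, which needs deposit records rather than address balances alone, recovers the equal distribution.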

Investors: retail, institutional and imaginary

On the other extreme, a single person can have multiple wallets, distributing their funds across multiple addresses. Interestingly enough, this can skew the result in either direction. If a single investor with 1000 BTC splits that sum equally among a thousand addresses, counting each one as a unique individual will create the appearance of capital distributed in more egalitarian terms. But it may also go in the other direction. Suppose an investor holding 1 bitcoin splits that balance unevenly across ten addresses: the “primary” wallet gets the lion’s share at 0.90 BTC while all others split the remainder. While keeping the total balance constant, this rearrangement has created several phantom “cryptocurrency owners,” each holding a marginal amount of bitcoin consistent with the narrative of a high Gini coefficient.
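Both effects can be reproduced with a few lines of arithmetic. The surrounding populations are invented for illustration (the paragraph above only specifies how the individual investor splits funds): in the first case the 1000 BTC holder lives alongside 1,000 people with 1 BTC each, in the second the 1 BTC holder is one of ten identical investors.

```python
def gini(balances):
    """Gini coefficient of non-negative balances (0 = equal)."""
    values = sorted(balances)
    n, total = len(values), sum(values)
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Case 1: a whale splits 1000 BTC equally across 1000 addresses.
small_holders = [1.0] * 1000
whale_true = gini(small_holders + [1000.0])       # one person, one entry
whale_split = gini(small_holders + [1.0] * 1000)  # looks like 1000 people

# Case 2: one of ten equal investors splits 1 BTC unevenly:
# 0.90 BTC in a primary wallet, the rest spread over nine addresses.
peers = [1.0] * 9
minnow_true = gini(peers + [1.0])
minnow_split = gini(peers + [0.90] + [0.10 / 9] * 9)

print(round(whale_true, 3), round(whale_split, 3))
print(round(minnow_true, 3), round(minnow_split, 3))
```

In the first case the per-address figure drops from about 0.5 to zero; in the second it rises from zero to roughly 0.47, even though nobody's actual holdings changed in either scenario.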

A different conceptual problem is that even for addresses with a single owner, that owner may be an institutional investor such as a hedge fund or asset manager. Once again, the naive assumption “one address equals one person” results in overestimating the Gini coefficient when the address represents ownership by hundreds or thousands of persons. (In the extreme case, once sovereign-wealth funds start allocating to cryptocurrency, a single blockchain address could literally represent millions of citizens of a country as stakeholders.) It’s as if an economist tried to estimate average savings in the US by looking at the balance of every checking account at a bank, without distinguishing whether the account belongs to a multinational corporation or an ordinary citizen.

Getting the full picture

More subtly, looking at each blockchain in isolation does not paint an accurate picture of total cryptocurrency ownership overall. In traditional finance some amount of positive correlation is expected across different asset types. Investors holding stocks are also likely to have bonds as part of a balanced portfolio. But cryptocurrency has sharp ideological divides that may result in negative correlation where it matters most. If bitcoin maximalists frown upon the proliferation of dubious ICOs for unproven applications while Web3 junkies consider bitcoin the MySpace of cryptocurrency, there would be little overlap in ownership. In this hypothetical universe the correlation is negative: an investor holding BTC is less likely to hold ETH. In that scenario Bitcoin and Ethereum may both have high inequality when measured in isolation while the combined holdings of investors across both chains exhibit a more egalitarian distribution.

It is possible to aggregate assets within a chain, by taking into account all tokens issued on that chain. For example, a single notional balance in US dollars can be calculated for each Ethereum address by taking into account all token balances for that address, maintained in the ERC20 smart-contract responsible for tracking each asset. But this does not work across chains. There is no reason to expect the correlation between different ERC20 holdings (arguably closer in spirit to each other as utility tokens, for various definitions of “utility”) to hold between Ethereum and Bitcoin.
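The negative-correlation scenario is easy to see with numbers. The four investors and their dollar-denominated holdings below are hypothetical: two are almost entirely in BTC, two almost entirely in ETH, and an observer is assumed to somehow know which addresses on each chain belong to the same person.

```python
def gini(balances):
    """Gini coefficient of non-negative balances (0 = equal)."""
    values = sorted(balances)
    n, total = len(values), sum(values)
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical holdings in USD: investor -> (BTC value, ETH value).
holdings = {
    "maximalist_1": (99.0, 1.0),
    "maximalist_2": (99.0, 1.0),
    "web3_fan_1": (1.0, 99.0),
    "web3_fan_2": (1.0, 99.0),
}

btc_only = gini([btc for btc, _ in holdings.values()])
eth_only = gini([eth for _, eth in holdings.values()])
combined = gini([btc + eth for btc, eth in holdings.values()])
print(round(btc_only, 3), round(eth_only, 3), round(combined, 3))
```

Each chain measured in isolation shows a Gini of about 0.49, while the combined portfolios are identical and yield zero. On-chain data alone cannot produce the third number, because nothing links a Bitcoin address to an Ethereum address owned by the same person.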

Better data: paging cryptocurrency exchanges

Is there a better way to estimate the Gini coefficient than this naive accounting by address? The short answer is yes, but it relies on closed data-sets. Centralized cryptocurrency exchanges such as Binance are in a better position to measure inequality using their internal ledgers. While an omnibus account may appear as a handful of high-balance addresses to external observers, the exchange knows exactly how those totals are allocated to each customer. Most exchanges also perform some type of identity validation on customers to comply with KYC/AML regulations, so they can distinguish between individual and institutional investors. This allows excluding institutional investors, but at the risk of introducing a different type of distortion. If high net-worth individuals are investing in cryptocurrency through institutional vehicles such as family offices and hedge funds, focusing on individual investors will bias the Gini coefficient down by removing outliers from the dataset. Finally, exchanges have a comprehensive view into balances of their customers across all assets simultaneously, so they can arrive at an accurate total across chains and even fiat equivalents. (If a customer is holding dollars or euros at a cryptocurrency exchange, should that number be included in their total balance? What if they are holding stable-coins?)

These advantages can yield a more precise estimate of exactly how unequal cryptocurrency ownership is, modulo some caveats. If customers subscribe to the “not your keys, not your bitcoin” school of custody and withdraw all cryptocurrency to their own self-hosted wallet after every purchase, the exchange will underestimate their holdings. Similarly, customers holding assets at multiple exchanges (for example, holding bitcoin at both Binance and FTX) will result in both providers underestimating the balance.

Even with these limitations, getting an independent datapoint from a large-scale exchange would go a long way towards sanity-checking the naive estimates put forward based on raw blockchain data alone. It remains to be seen if any exchange will step up to the plate.
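As a rough sketch of what such an exchange-side calculation could look like, the snippet below aggregates a made-up internal ledger per customer across assets and computes the Gini coefficient with and without institutional accounts. The row layout, customer identifiers and dollar amounts are all assumptions for illustration; no real exchange schema is implied.

```python
from collections import defaultdict


def gini(balances):
    """Gini coefficient of non-negative balances (0 = equal)."""
    values = sorted(balances)
    n, total = len(values), sum(values)
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical ledger rows: (customer_id, customer_type, asset, usd_value)
ledger = [
    ("c1", "individual", "BTC", 500.0),
    ("c1", "individual", "ETH", 500.0),
    ("c2", "individual", "BTC", 1000.0),
    ("c3", "individual", "USD", 1000.0),   # fiat balance, counted here
    ("f1", "institution", "BTC", 97000.0),  # e.g. a family office
]


def totals_per_customer(rows, include_institutions=True):
    """Sum USD value per customer across all assets."""
    totals = defaultdict(float)
    for customer_id, customer_type, _asset, usd_value in rows:
        if include_institutions or customer_type == "individual":
            totals[customer_id] += usd_value
    return list(totals.values())


all_customers = gini(totals_per_customer(ledger))
individuals_only = gini(totals_per_customer(ledger, include_institutions=False))
print(round(all_customers, 3), round(individuals_only, 3))
```

With these invented figures the full customer base yields a Gini of about 0.72, while restricting to individuals yields zero. That is the downward bias described above when wealthy individuals invest through institutional vehicles. Whether to count the fiat row is a policy choice the sketch makes arbitrarily.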

CP
