Role of language design in exploit mitigation
On Wednesday a new vulnerability was disclosed in the implementation of some ERC20 tokens, including BEC, operating on the Ethereum blockchain. As far as impact goes, the flaw is about as critical as they come: unauthorized “gifting” of tokens out of thin air. While this may have been the first time such a bug was observed being exploited in the wild on a blockchain, the type of vulnerability is far from novel: an integer overflow, rooted in the fact that most programming languages operate on numbers with limited range. Perform a calculation whose result falls outside that range and the outcome of that operation is no longer accurate, violating common-sense assumptions. For example, two very large numbers added together can produce a small number. Or two positive values multiplied may yield a negative product. This class of bugs is very well known in the realm of low-level languages such as C and C++. Combined with weak type-safety, lack of range checking and manual memory management, such flaws often provide a starting point for building a full remote-code-execution exploit.
What is unusual is that the designers of the Ethereum virtual machine (EVM) and the Solidity language made a series of bad decisions that resurrected a vulnerability class from low-level systems programming in a wildly different environment intended for decentralized smart-contracts. To better demonstrate these unforced errors on the part of language & VM designers, it is instructive to compare how a low-level language such as C, or x86 assembly itself, handles overflows.
Getting by with finite integers
Suppose we are asked to sum two unsigned integers X and Y, and compare the result to some other value Z. This is a common pattern in bounds checking, a defensive measure to prevent the application from reading from or writing to memory outside the intended regions.
if (X + Y > Z) {
  // Limits exceeded, bail out
} else {
  // Continue execution…
}
For extra color, let’s assume one of these values such as “X” can be controlled by an attacker, while Y and Z are constrained by the application itself. If arithmetic in C operated in the Platonic realm of natural numbers, the above code snippet would always run correctly. But these values are not represented by an idealized “integer” type that can take on any value. Instead they are limited to a fixed range, defined by the size of the integral type. That size depends on the compilation model, which itself is tightly coupled to the hardware architecture the code targets. For example, in the LP64 data model common on Linux, the C type “unsigned int” is 32 bits wide while the larger type “unsigned long” holds 64 bits. (Integral types can also be signed, further reducing the representable magnitude since roughly half the possible values are now allocated to negative numbers.) That “unsigned int” type can represent every number in the range 0 through 4294967295 inclusive. Equally important, it cannot represent any value outside that range.
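For concreteness, here is a minimal sketch (assuming an LP64 platform such as 64-bit Linux; the exact figures vary by platform) that prints those sizes and limits:

#include <stdio.h>
#include <limits.h>

int main(void) {
  // On LP64: 4 bytes and 4294967295
  printf("unsigned int:  %zu bytes, max %u\n", sizeof(unsigned int), UINT_MAX);
  // On LP64: 8 bytes and 18446744073709551615
  printf("unsigned long: %zu bytes, max %lu\n", sizeof(unsigned long), ULONG_MAX);
  return 0;
}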
That poses a problem for doing math: it is possible to write expressions such as X + Y where both X and Y start out in the permitted range but the result falls outside. In other words, the operation overflows the result type. So what happens when a C program is asked to run a calculation resulting in overflow? Intuitively the options are:
1. Return the result in a different, larger integral type such as “long.” While appealing at first glance, this is not a feasible solution for two reasons. First, we run out of integral types when the same trick has to be applied to “long” or even “long long,” which is a relatively recent addition. As long as the type system has only a finite number of built-in types, there will be one with the highest capacity where we run out of room. More importantly, playing games with types only converts the out-of-range problem into a type-cast problem. Programmers typically write expressions such as “sum = X + Y” where X, Y and sum all have the same type. Even if the result is magically upgraded into a wider type, it will still get shoe-horned into the original type downstream. Playing conversion games only delays the inevitable.
2. Treat it as a fatal error. In the same way division by zero halts the program (more precisely, it raises a signal/exception which, if unhandled, terminates the program), every overflow could trigger an error designed to immediately stop execution.
3. Undefined behavior. This is a favorite cop-out for language designers, effectively stating that the compiler is free to produce any value: the result can be 0, 42 or the compiler authors’ year of birth, or the program may crash. All of these are valid outcomes if the behavior is deemed “undefined” according to the language specification.
4. Truncate the result in a well-defined manner and keep soldiering on.
It will come as no surprise to those familiar with the minimalist philosophy of C that the language opts for a combination of #3 and #4. (C++ carries on that tradition, for the most part deferring to C on the behavior of primitive types.) Overflow for unsigned types is perfectly legal and truncates the result: only the least-significant bits that fit the result type are kept, the low 32 bits in the case of an “unsigned int.” Meanwhile overflow on signed integral types is undefined. But pragmatically speaking, for most compilers and hardware architectures the result ends up being very similar to the unsigned behavior: values are truncated to squeeze into the expected type. Looked at another way, C programs are not working in the realm of natural numbers with their infinite possibilities. Instead they are doing modular arithmetic in a finite ring, where values outside the range “wrap around” to an equivalent one modulo some power of 2.
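To make the wrap-around concrete, here is a minimal sketch (assuming a 32-bit “unsigned int”; the values are chosen purely for illustration) showing how the bounds check from the earlier snippet is defeated when X is attacker-controlled:

#include <stdio.h>
#include <limits.h>

int main(void) {
  unsigned int Y = 16, Z = 1024;  // constrained by the application
  unsigned int X = UINT_MAX;      // attacker-supplied value

  // X + Y wraps around modulo 2**32 and becomes 15, so the bounds
  // check passes even though the "real" sum is far larger than Z.
  if (X + Y > Z) {
    printf("limits exceeded\n");
  } else {
    printf("check passed, X + Y = %u\n", X + Y);  // prints 15
  }
  return 0;
}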
One level below: x86/x64 hardware behavior
If this property of C looks arbitrary, going one more level down to look at how processors handle arithmetic at the hardware level may vindicate the designers. C is close to the metal, so to speak: its constructs often map 1:1 onto capabilities of the underlying hardware. Consider the common Intel x86 architecture. It sports 32-bit registers and features instructions for operating on them. For example, one can add the contents of register EAX to the value already in register EBX:
add EBX, EAX
Since registers are also limited in the range of numbers they can express, executing this instruction poses the same dilemma: what happens if EAX and EBX hold values that, added together, would exceed the maximum integer representable in 32 bits? x86 adopts the convention of doing modular arithmetic: operations are implicitly performed modulo 2**32 (or some other appropriate power of two when working with 16-bit or 8-bit registers). Nor is that a quirk limited to Intel: most hardware architectures follow the same pattern.
Defensive programming: hardware vs software
Given that context, it is hard to fault C for following the same model: it allows using efficient CPU instructions to perform arithmetic without having to double-check or adjust the results spit out by the processor. And while C is rarely accused of being a high-level language, many other languages that followed, including those targeting virtual machines such as Java, have also taken the path of least resistance by going along with the hardware. Languages like Python and Scheme, where integers have arbitrary precision by default, are the exception, not the rule.
That said, there are some crucial differences in how arithmetic works at a low level which are missing from the abstractions offered by high-level languages:
- Multiplication gets special-case treatment along the lines of the “type expansion” idea: the product of two 32-bit operands is written into the register pair EDX:EAX, to accommodate a full 64-bit result.
- The CPU has a flags register containing status bits that are updated based on the result of the last operation, including an overflow flag. Even more conveniently, the x86 instruction set contains conditional jumps based on whether that flag is set: JO, short for jump-on-overflow. That makes it easy for a programmer working in assembly language to detect and respond to overflow conditions with just one additional instruction, one that is practically free because the branch predictor will correctly guess that an overflow does not happen in the common scenario.
By contrast, checking for overflow in a language such as C or Java is far more kludgy because these low-level signals are missing. There is no facility provided by the language itself for dealing with overflows in an efficient manner. The options are:
- Perform an extra arithmetic operation first to check whether overflow will happen. For example, we can subtract one operand from the maximum representable value; if that difference is smaller than the other operand, the addition is guaranteed to overflow.
- Sanity-check the result after performing the operation, by comparing it against one of the original operands. (This leverages the observation that unsigned addition overflows if and only if the alleged sum is less than either one of the operands. Note that the rules are a lot more complex for signed arithmetic.)
Many libraries exist for doing arithmetic safely with such overflow checks included, but the price of that safety is kludgy syntax and reduced readability. Instead of writing a straightforward mathematical expression such as “result = X * Y + (Z * scale) + offset”, one must replace each binary operator with a function call, and potentially add conditional checks on the overflow status of each subexpression. C++ with its support for operator overloading can at least overcome the syntactic problem, since one can design a replacement for every integer type that performs overflow checking. However, in the absence of direct compiler support or very diligent use of inline assembly, these user-defined substitutes will be very inefficient compared to native types.
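As a rough sketch of what that looks like in practice (the helpers checked_add and checked_mul below are hypothetical, not taken from any particular library; the addition check uses the post-hoc comparison described above, while the multiplication check uses a common division-based test):

#include <stdbool.h>
#include <stdio.h>

// Hypothetical checked helper: compare the sum against an operand.
static bool checked_add(unsigned int a, unsigned int b, unsigned int *out) {
  unsigned int sum = a + b;      // may wrap around
  if (sum < a) return false;     // wrapped: overflow occurred
  *out = sum;
  return true;
}

// Hypothetical checked helper: dividing the product should recover the operand.
static bool checked_mul(unsigned int a, unsigned int b, unsigned int *out) {
  unsigned int prod = a * b;     // may wrap around
  if (a != 0 && prod / a != b) return false;
  *out = prod;
  return true;
}

int main(void) {
  unsigned int X = 1000, Y = 3000, Z = 7, scale = 100000, offset = 42;
  unsigned int t1, t2, t3, result;

  // result = X * Y + (Z * scale) + offset, with every step checked
  if (!checked_mul(X, Y, &t1) ||
      !checked_mul(Z, scale, &t2) ||
      !checked_add(t1, t2, &t3) ||
      !checked_add(t3, offset, &result)) {
    fprintf(stderr, "arithmetic overflow\n");
    return 1;
  }
  printf("result = %u\n", result);
  return 0;
}

Every intermediate value needs its own temporary and its own branch, which is exactly the loss of readability described above.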
Compiler features to the rescue?
Where the language falls short, compiler authors can sometimes step in. For example, GCC has an option called “-ftrapv” that will raise a trap when operations on signed integers overflow. The compiler is free to do that because C already considers such behavior to be undefined: crashing the program is just as valid as truncating the result modulo 2**32. But there is no corresponding option to treat overflow of unsigned integral types as a fatal condition: no “correct” compiler can do that, because such overflow is perfectly legal C and even explicitly relied on to implement functionality such as hash functions and cryptographic algorithms. (A research paper from 2007 explores the idea of instrumenting existing binaries to fail on unsigned overflow, confronting exactly that problem of having to identify code that deliberately relies on truncation.)
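As a small illustration (the file name and values are made up for the example), a program like the following will abort at runtime when built with “gcc -ftrapv”, instead of silently wrapping:

// trapv_demo.c (build with: gcc -ftrapv trapv_demo.c)
#include <limits.h>
#include <stdio.h>

int main(void) {
  int x = INT_MAX;  // largest value a signed int can hold
  int y = 1;
  // Signed overflow is undefined behavior in C; with -ftrapv the
  // generated code detects it and aborts the program at this point.
  int sum = x + y;
  printf("%d\n", sum);
  return 0;
}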
Solving the problem for signed integers may sound good enough, except for the fine print in C conversion rules: in any operation mixing signed and unsigned operands of the same width, the signed input gets converted to unsigned and the arithmetic is performed on unsigned values. In other words, introducing any unsigned value into a calculation voids the overflow check.
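A one-line demonstration of that conversion rule (a minimal sketch; the values are arbitrary):

#include <stdio.h>

int main(void) {
  int signed_val = -1;
  unsigned int unsigned_val = 1;

  // The signed operand is converted to unsigned before the comparison,
  // so -1 becomes 4294967295 (with a 32-bit unsigned int) and the test succeeds.
  if (signed_val > unsigned_val) {
    printf("-1 compares greater than 1 under the usual arithmetic conversions\n");
  }
  return 0;
}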
Solidity and Ethereum: the missing defense
The EVM features large 256-bit words internally and has instructions to perform arithmetic operations on these values. Solidity in turn offers integral types up to 256 bits for programmers to use. But as the preceding discussion suggests, one cannot “outrun” integer overflow vulnerabilities by adding bits to integer types any more than buffer overflows are fixed by adding more RAM. (Going from 32-bit → 64-bit architectures expanded addressable memory to 256 terabytes, but that did not spell the end of memory corruption bugs.) As long as resources are finite and attackers are allowed to sneak inputs into an operation, it is possible to have an arithmetic overflow.
When it comes to defensive measures for avoiding overflow vulnerabilities, it turns out both platforms are worse than their ancient counterparts. In Solidity, operations on integral types silently wrap around by default in the event of an overflow. There is not even a compiler flag (at the time of writing) to treat overflow events as a critical error, the equivalent of the decade-old “-ftrapv” flag in GCC. At the processor level, the EVM has no efficient mechanism for tracking overflow state or making decisions conditional on whether a recent computation resulted in overflow. There is nothing comparable to the x86 overflow flag and its attendant conditional branch instruction. Strictly speaking this is not required; a compiler can synthesize overflow checks with existing EVM instructions, patterned after the kludges used for C. But intrinsic EVM support could have made it that much easier.
For historic perspective, the x86 architecture dates back to the 1970s. So does C, designed as a low-level systems programming language with the explicit goal of closely mirroring the underlying hardware. (C++ arrived on the scene in the early 1980s, adding object-oriented paradigms while continuing to pay homage to the efficiency requirements of C.) Ethereum started four decades later with a completely different use-case: programming smart-contracts on a blockchain, an extremely inefficient system for computation where correctness matters far more than shaving a few virtual cycles. Likewise, Solidity as a language for writing smart-contracts started from a clean slate. Unlike C++ it did not inherit the baggage of having to support existing C code with assumptions about how integer arithmetic works. It’s a testament to the short-term memory of computer science that a modern programming language designed with the singular purpose of building applications on a blockchain platform (where bugs can result in significant monetary losses) still ended up perpetuating an ancient vulnerability class.
CP