[continued from part I]
To recap the scenario: Alice and Bob are interested in trading bitcoin (BTC) for ether (ETH.) Alice owns BTC, Bob has ETH, and they have agreed on pricing and quantity. (Note we are fast-forwarding past the scene where Alice and Bob miraculously located each other and organized this trade. That is one of the most valuable functions of a market, a point that we will return to.) Now they want to set up a fair-exchange where Alice only receives her ETH if Bob receives the corresponding amount of BTC.
Fragility of ECDSA as a feature
One way to do this involves turning what could be considered a “bug” in the ECDSA signature algorithm—used by both Bitcoin and Ethereum— into a feature. ECDSA is a randomized signature algorithm. Signing a message involves picking a random nonce each time. The random choice of nonce for each operations means even signing the same message multiple times can yield a different result each time. This is in contrast to RSA for example, where the most common padding mode is deterministic. Processing the same message again will yield the exact same signature.** It is critical for this nonce to be unpredictable and unique, otherwise the security of ECDSA completely breaks down:
- If you know the nonce, you can recover the private key.
- If the same unknown nonce is reused across different messages you can recover the private key. (Just ask Sony about their PlayStation code-signing debacle.)
- It gets worse: if multiple messages are signed with different nonces with known relationship (such as, linear combination of some nonces equals another one) you can still recover the private key.
That makes ECDSA highly fragile, dependent critically on a robust source of randomness. It also means implementations susceptible to backdoors: a malicious version can leak private-keys by cooking the nonce while appearing to operate correctly by producing valid signatures. Variants have been introduced to improve this state of affairs. For example deterministic ECDSA schemes compute the nonce as a one-way function of secret-key and message, without relying on any source of randomness from the environment.
But this same fragility can prove useful as a primitive for exchanging funds across different blockchains, by deliberately forcing disclosure of a private key. Specifically, it’s possible to craft an Ethereum smart-contract that releases funds conditionally on observing two valid signatures for different messages with the same nonce.
- Alice has her public-key A, which can be used to create corresponding addresses on both Bitcoin & Ethereum blockchains.
- Bob likewise has public-key B.
- Alice generates a temporary ECDSA key, the “transfer-key” T.
Before starting execution, Alice rearranges her funds and moves the agreed-upon quantity of bitcoin into a UTXO with a specific redeem script. The script is designed to allow spending if either one of these two conditions are satisfied:
- One signature using Alice’s own public key A but only after some time Δ has elapsed. This is a time-lock enabled by the check-locktime-verify instruction.
- 2-of-2 multi-signature using Bob’s public key B and the transfer key T.
Once this UTXO is confirmed, Alice sends Bob a pointer to the UTXO on the blockchain. In practice she would also have to send the redeem script for Bob to verify that it has been constructed. (Since the P2SH address is based on a one-way hash of the script, it is not possible in general to infer the original script from an address alone.)
Once Bob is satisfied that Alice has put forward the expected Bitcoin amount subject to the right spending conditions, he sets up an Ethereum contract. This contract has two methods:
- Refund(): Can only be called by Bob using B and only after some future date. Sends all funds back to Bob’s address. This is used by Bob to reclaim funds tied up in the contract in case Alice abandons the protocol.
- Exchange(signature1, signature2): This method is called by Alice and implements the fair-exchange logic. It expects two signatures using the transfer-key T over predefined messages, which can be fixed ahead of time such as “foo” and “bar”. The method verifies that both signatures are valid and more importantly they are reusing the ECDSA nonce. (In other words, the private key for T has been disclosed.) If these conditions are met, the contract sends all of its available balance to Alice’s address.
Alice in turn needs to verify that this contract has been setup correctly. As a practical matter, all instances of the contract can share the same source-code, differentiated only by parameters they receive during the contract creation. These constructor parameters are the Ethereum addresses for Alice and Bob, along with the public-key for T to check signatures against. That way there is no need to reverse-engineer the contract logic from EVM byte-code. A single reference implementation can be used for all invocations of the protocol. Only the constructor arguments need to be compared against expected values, along with the current contract balance.
Assuming this smart-contract is setup correctly, Alice can proceed with taking delivery of the ETH from Bob. She signs two messages with her private-key, reusing the same nonce for both. Then she invokes the Exchange method on the contract with these signatures. Immutability of smart-contract logic dictates that upon receiving two signatures with the right properties, the contract has no choice but to send all its funds to Alice.
At this point Alice has her ETH but Bob has not claimed his BTC. This is where the fair-exchange logic comes into play: Alice staked her claim to the ETH by deliberately disclosing the private-key T. Looking back at the redeem script for Alice’s UTXO on the blockchain, possession of T and Bob’s key B allows taking control of those funds. Bob can now sign a transaction using both private-keys to move that BTC to a new address he controls exclusively. Meanwhile Alice herself is prevented from taking those funds back herself because of the timelock.
The fine-print: caveats and improvements
A few subtleties about this protocol. Invoking Exchange() on the contract means the entire world learns the private key for T, not just Bob; blockchain messages are broadcast so all nodes can verify correct execution. Why not have Alice send one of the signatures to Bob out-of-band, in private? A related question is why not allow the Bitcoin funds to be moved using the transfer-key T only, instead of requiring a multi-signature? The answer to both of these is that Bob can not count on his knowledge of T being exclusive. Even if the Ethereum smart-contract only expected a single signature (having the expected nonce hard-coded) Alice can still publish the private-key for T to the entire world after she receives her ETH. If her funds only depended on a single key T for control, it would become a race-condition between Bob and everyone else in the world to claim them. Alice does not care; once she discloses T someone will take her BTC. But Bob cares very much that he is the only recipient and not have to race against others to get their TX mined first. Including an additional key B only known to Bob guarantees this, while also making it moot whether other people come in possession of the private-key for T.
Speaking of race conditions, there is still one case of Bob racing against the clock: he must claim the bitcoin before the time-lock on the alternative spending path expires. Recall that Alice can claw-back her bitcoin after some time/block-height is reached. That path is reserved for the case when the protocol does not run to completion, for example Bob never publishes the ethereum smart-contract. But even after Bob has published the contract and Alice invoked it to claim her ETH, the alternative redemption path remains. So there is an obligation for Bob to act in a timely manner. The deadline is driven by how the time-locks are chosen. Recall that the Ethereum smart-contract also has a deadline after which Bob can claw the funds back if Alice fails to deliver T. If this is set to say midnight on a given day while the Bitcoin UTXO is time-locked to midnight the next day (these are approximate, especially when specified as block-height since mining times are randomly distributed) then Bob has 24 hours to broadcast the transaction. That time window can be adjusted based on the preferences of two sides, but only at the risk of increasing recovery time after protocol is abandoned. In that situation Alice is stuck waiting out the expiration of this lock before she can regain control of her funds.
Another limitation in the basic protocol as described is lack of privacy. The transaction is linkable across blockchains: the keys A, B and T are reused on both sides, allowing observes to trace funds from Bitcoin into Ethereum. This situation can be improved. There is no reason for Alice to reuse the same key A for reclaiming her Bitcoin as the key she uses to receive Ethereum from Bob. (In fact Bob only cares about the second one since that is given as a parameter to the contract.) Similarly Bob can split B into two different keys. Dealing with T is a little more tricky. At first it looks like this must be identical on both chains to allow private-key disclosure to work. But there is another trick Alice and Bob can use. After Alice gives the public-key for T to Bob, Bob can craft his Ethereum contract to expect the related key T* = m·T for a random scalar m used to mask the original key. He in turn shares this masking factor with Alice. Since Alice has the private key for T, she can also compute the private key for T* by simply multiplying with m. When she discloses that private-key, Bob can now recover the original key for T by using the inverse of m. Meanwhile to outside observers the keys T and T* appear unrelated. This provides a form of plausible deniability. If many people were engaging in transactions of this exact format with identical parameters, it would not be possible to link the Bitcoin side of the exchange to the Ethereum side. But “identical parameters” is the operative qualification. If Alice and Bob are trading 1BTC while Carol and David are trading 1000BTC, the transactions are easily separated. Similarly if the time-locks on ETH and BTC side are not overlapping, it becomes possible to rule out an ETH contract as being the counterpart of another BTC transaction posted around the same time.
Finally an implementation detail: why use the repeated-nonce trick for disclosing private key instead of simply sending private-key bits to the contract? Because the Solidity language used for writing smart-contract has a convenient primitive for verifying ECDSA signatures given a public-key. It does not have a similar primitive to check if a given private-key corresponds to a public-key. In fact it makes sense for Solidity to have no facilities for working with private-keys. Since all smart-contract execution is public, the assumption is only publicly available information would ever be processed by the contract and never secret material. For this reason we resort to the nonce reuse trick. Ethereum virtual machine also has the additional primitives required to compare two signatures for nonce equality. Interestingly Bitcoin script-language is exactly one instruction shy of being able to accomplish that. The instruction OP_CAT is already defined in the scripting language but currently disabled and for good reason: without other limits, it can be used as a denial-of-service vector. But if OP_CAT were enabled, it could be used to construct a redeem script that receives ECDSA signatures in suitably encoded form (nonce and second component as individual stack-operands) and checks them for nonce reuse. Other “splicing” opcodes such as OP_SUBSTR can also achieve the same effect by parsing the full ASN1 encoded ECDSA signature to extract the nonce piece into an individual stack operand where it can be compared for equality against another nonce. Either way, it would allow inverting the protocol sequence: Bob posts a smart-contract on the Ethereum blockchain first, Alice sets up the corresponding Bitcoin UTXO, which Bob proceeds to claim by disclosing the transfer key.
** RSA does have a randomized padding mode as well called PSS.