[continued from part I]
Simulating a gesture
To demonstrate how multiple interfaces on one smart card, specifically contact and NFC, can provide greater certainty about user intent against malicious hosts, let’s start with a simple example: emulating the presence of an external physical “button” on the card that the user must press before an action can be completed. The goal is to come up with some gesture the card-holder must perform, such that malware on the local PC cannot emulate it via software alone. The key property is that an application running on the smart card must be able to ascertain independently whether the gesture was performed, without relying in any way on the host PC, since the latter may be malicious. This check will gate sensitive operations such as digitally signing a message or authenticating the user to a remote website.
Asking the user to insert the card is not sufficient, since any number of unauthorized actions can follow after that. At first glance a more promising approach is to require that the user remove and reinsert the card. A card application can detect this indirectly: removal cuts off power to the card, resulting in a hard reset and loss of data stored in transient memory. Unfortunately this does not work, because it can be simulated by malware on the PC, which can reset the card or even shut off USB power to the reader hardware while the card is sitting still. That situation is indistinguishable from the physical remove/return sequence as far as the on-card application is concerned.
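A toy model makes the failure concrete. The sketch below is not smart-card code: the `Card` class and both helper functions are hypothetical names invented for illustration. It only shows that the one signal available to the applet, loss of transient state, fires identically for a physical removal and for a reset triggered by the host.

```python
class Card:
    """Hypothetical card model; 'transient' stands in for on-card RAM."""

    def __init__(self):
        self.transient = {}   # wiped whenever power is cut or the card is reset

    def power_cycle(self):
        # Any loss of power or hard reset clears transient memory.
        self.transient = {}

    def reset_detected(self):
        """Applet-side check: was the session marker wiped since the last call?"""
        wiped = "session" not in self.transient
        self.transient["session"] = True
        return wiped


def user_removes_and_reinserts(card):
    card.power_cycle()   # pulling the card out of the slot cuts its power


def malware_resets_in_place(card):
    card.power_cycle()   # host resets the card or cuts USB power to the reader


card = Card()
card.reset_detected()                  # consume the initial power-up
user_removes_and_reinserts(card)
after_gesture = card.reset_detected()
malware_resets_in_place(card)
after_malware = card.reset_detected()
print(after_gesture == after_malware)  # the applet cannot tell the two apart
```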
With a dual-interface card and dual-interface reader (or, less elegantly, two distinct readers, one for contact and one for NFC) we can ask the user to perform a slightly different ritual: start with the card in the contact slot and switch to NFC, or vice versa. The card application can detect which interface it has been activated from, make a note of this, and wait until it is called again from the complementary interface before proceeding with the requested operation. Because the activation areas on the reader are different, no amount of software trickery from the malicious host can move the card into the different physical spot required to access it over the other interface.**
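The interface-switch ritual amounts to a small state machine inside the card application. Here is a minimal sketch in Python, under stated assumptions: `GestureApplet`, `select`, `request_operation`, and the status strings are hypothetical names, and the model simply takes for granted that the card learns the active interface from its own hardware rather than from anything the host says.

```python
from enum import Enum


class Interface(Enum):
    CONTACT = "contact"
    NFC = "nfc"


class GestureApplet:
    """Toy model of an on-card application that requires an interface switch."""

    def __init__(self):
        self.active = None       # interface of the current activation
        self.armed_from = None   # persistent note: where the request was first seen

    def select(self, interface):
        """Called on every activation, including after a reset or power cycle."""
        self.active = interface

    def request_operation(self):
        if self.armed_from is None:
            # First request: note the interface and refuse for now.
            self.armed_from = self.active
            return "SWITCH_INTERFACE_REQUIRED"
        if self.active is self.armed_from:
            # Same interface again: a reset in place proves nothing.
            return "SWITCH_INTERFACE_REQUIRED"
        # Called from the complementary interface: the card was physically moved.
        self.armed_from = None
        return "OK"


card = GestureApplet()
card.select(Interface.CONTACT)
first = card.request_operation()    # refused, gesture is now armed
card.select(Interface.CONTACT)      # malware resets the card where it sits
second = card.request_operation()   # still refused
card.select(Interface.NFC)          # user moves the card to the NFC area
third = card.request_operation()    # allowed
```

Note that the note about the first interface has to live in memory that survives power loss, since the card loses power while being moved.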
This is certainly a great deal of user annoyance for relatively little gain. It also suffers from the same problem as pressing a button built into the card: the user cannot be sure exactly what operation resulted from the gesture, only that something was done. The good news is that the sequence can be augmented to inspect exactly what was requested from the card, by enlisting the help of another device equipped with its own reader. Here is a scenario combining a traditional PC with an NFC-capable mobile device used to double-check the operation:
1. The PC application streams a message to the card over the contact interface and asks the card application to sign it. (We assume that signature requests and responses are only exchanged over this interface and never over NFC, as enforced by the on-card application.)
2. The card application caches the message in non-volatile memory, but returns an error indicating that user confirmation is required.
3. The host PC conveys this error message to the user.
4. The user takes the card out of the reader and taps it against the NFC reader built into her Android phone.
5. An Android application communicates with the card over NFC to retrieve the cached message, then parses and formats it for inspection by the user.
6. The user reviews the message to confirm that it does indeed correspond to the intended operation. If everything looks correct, she tells the mobile app to authorize the action.
7. That confirmation is relayed over NFC to the card application, which records it.
8. The user inserts the card back into the contact slot.
9. The PC application requests the signature again.
10. This time the operation succeeds, because of the user confirmation received in step 7.
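The card-side logic of this flow can be sketched as follows. This is a simulation under several assumptions, not a real applet: `ConfirmingSigner` and its methods are hypothetical names, an HMAC stands in for the actual signature algorithm, and the check that the resubmitted message matches the cached one is an assumption added here so that the PC cannot swap messages after confirmation.

```python
import hashlib
import hmac
from enum import Enum


class Interface(Enum):
    CONTACT = "contact"
    NFC = "nfc"


class ConfirmingSigner:
    """Toy card application: sign over contact, confirm over NFC."""

    def __init__(self, key: bytes):
        self._key = key
        self._pending = None      # cached message (non-volatile on a real card)
        self._confirmed = False   # set only by a confirmation over NFC

    def sign(self, interface, message: bytes):
        if interface is not Interface.CONTACT:
            raise PermissionError("signing is only allowed over contact")
        if self._confirmed and message == self._pending:
            self._pending, self._confirmed = None, False
            # HMAC is a placeholder for the real signature operation.
            return hmac.new(self._key, message, hashlib.sha256).digest()
        # New or changed message: cache it and drop any stale confirmation.
        self._pending, self._confirmed = message, False
        return None   # stands in for the "confirmation required" error

    def read_pending(self):
        """Lets the mobile app fetch the cached message for display."""
        return self._pending

    def confirm(self, interface):
        if interface is not Interface.NFC:
            raise PermissionError("confirmation is only allowed over NFC")
        if self._pending is None:
            raise ValueError("nothing to confirm")
        self._confirmed = True


card = ConfirmingSigner(key=b"\x00" * 32)              # placeholder key
first = card.sign(Interface.CONTACT, b"transfer 10")   # cached, refused
shown = card.read_pending()                            # phone displays this
card.confirm(Interface.NFC)                            # user approves on phone
signature = card.sign(Interface.CONTACT, b"transfer 10")
```

Because `confirm` rejects the contact interface and `sign` rejects NFC, each device is limited to its own half of the protocol.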
Neither the PC nor the mobile device acting in isolation can produce a signature on a message the user did not intend to sign. The PC can submit any message, but no signature is produced until it is independently confirmed. Meanwhile the mobile device can only confirm operations; it cannot originate new requests or even receive responses. This system can still be subverted if both the PC driving the signature operation and the mobile device used for confirming it are compromised by the same adversary. In that case an attacker can present one message to the card while displaying another one to the user for confirmation. But that is a decidedly higher bar than compromising either side in isolation.
The dual-interface requirement is critical here: asking for confirmation from another PC over the same interface would not work, for the same reason that card removal is not a reliable signal. There is no way for the on-card application to detect that it has been moved to another device, as opposed to simply power-cycled while sitting in the same reader.
Example: NFC payments with mobile secure element
The flow described in the previous section is already implemented in a familiar context: NFC payments from a mobile device with a secure element.
This video features the Visa NFC payments demonstration for the 2012 London Olympics. Around the 1:50 mark, there is a transaction with an explicit confirmation step. The phone is tapped against an NFC reader, which communicates with the payWave application on the secure element to complete a purchase. Unlike the preceding transaction in the video, this time the protocol does not run to completion. Instead, transaction details are displayed on the phone UI for confirmation. Once the user is satisfied that this is indeed the purchase they intended to make, a second tap against the reader completes the transaction.
To see how this exploits dual-interface capability to verify user intent, recall earlier posts describing how secure elements in the current generation of Android phones are effectively dual-interface cards, with the “contact” interface permanently wired to the host operating system and the NFC side only accessible to external readers via the built-in antenna. In this case the “PC” is replaced by a point-of-sale terminal asking the user to authorize a transaction. The “message” to sign is typically a step in the EMV payment protocol, although the card response is not exactly a signature in the cryptographic sense. The roles of the contact and contactless interfaces are inverted: the primary transaction flows over NFC, while out-of-band confirmation is obtained from the user over the contact interface hooked up to the mobile device. Because the secure element cannot be taken out and moved independently, the act of bringing the phone into and away from the point-of-sale terminal serves as the equivalent of moving between readers. (We are deliberately glossing over some implementation questions that depend on hardware configuration. For example, if the SE can be used simultaneously on both interfaces, some type of polling mechanism is required to determine when the transaction is confirmed. The alternative is that only a single interface can be active at a time, in which case the host controls whether the SE is attached to NFC or to the contact interface. That also means the user must pull their phone out of the NFC field for confirmation and bring it back, giving rise to a “double-tap” experience. This is how the Android eSE was configured for Google Wallet in its original design.)
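The single-active-interface configuration and the resulting double tap can be modelled in a few lines. This is only a sketch: `SecureElement`, its `route` switch, the status strings, and the sample transaction text are all made up for illustration, and nothing here reflects the actual EMV or payWave message flow.

```python
class SecureElement:
    """Hypothetical SE where the host decides which single interface is live."""

    def __init__(self):
        self.route = "nfc"       # "nfc" (external reader) or "host" (wired side)
        self.pending = None      # transaction awaiting user confirmation
        self.confirmed = False

    def tap(self, transaction):
        """Exchange with the point-of-sale terminal over NFC."""
        if self.route != "nfc":
            raise ConnectionError("SE is not attached to the NFC antenna")
        if self.confirmed and transaction == self.pending:
            self.pending, self.confirmed = None, False
            return "COMPLETED"
        # New or changed transaction: hold it until the user confirms.
        self.pending, self.confirmed = transaction, False
        return "CONFIRMATION_REQUIRED"

    def host_confirm(self):
        """Confirmation from the phone UI over the wired interface."""
        if self.route != "host":
            raise ConnectionError("SE is not attached to the host interface")
        if self.pending is None:
            raise ValueError("no transaction is awaiting confirmation")
        self.confirmed = True


se = SecureElement()
first_tap = se.tap("coffee, 3.50 GBP")    # first tap: confirmation required
se.route = "host"                          # phone leaves the field, host takes over
shown_to_user = se.pending                 # phone UI displays the details
se.host_confirm()                          # user approves on screen
se.route = "nfc"                           # host hands the SE back to the antenna
second_tap = se.tap("coffee, 3.50 GBP")   # second tap completes the purchase
```

With simultaneous access on both interfaces the `route` switch would disappear, and the terminal side would instead have to poll until the confirmation arrives.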
** Note there is one subtlety here related to the physical construction of dual-interface readers: we assume the hardware is properly shielded to isolate the contact slot from the NFC field, or includes hard-coded logic to suppress NFC when contact is detected, to prevent double-activation. Otherwise it is possible for a card sitting in the contact slot to be simultaneously accessed over NFC. The opposite case, however, goes against physics: when the brass contact plate is not touching the corresponding metal points in the reader, that interface cannot be activated. Similar concerns apply while the card is being moved into position, because it may pass through the NFC field temporarily.