On CAPTCHAs and accessibility (part II)

[continued from part I]

Accessibility as a value system

Accessibility has always been part of the design conversation during product development on every MSFT team this blogger worked on. One could cynically attribute this to commercial incentives originating from US government requirements that software comply with the Americans with Disabilities Act. Federal sales are a massive source of Windows revenue, and failing a core requirement that would keep the operating system out of that lucrative market is unthinkable. But the commitment to accessibility extended beyond the operating system division. Online services under the MSN umbrella arguably had an even greater focus on inclusiveness and making sure all of the web properties would be usable by customers with disabilities. As with all other aspects of software engineering, individual bugs and oversights could happen, but you could count on every team having a program manager with accessibility in their portfolio, responsible for championing these considerations during development.

Luckily it was not particularly difficult to get accessibility right either, at least when designing websites. By the early 2000s, standardization efforts around core web technologies had already laid the foundations with features specifically designed for accessibility. For example, HTML images have an alternative-text or alt-text attribute describing the image in words. In situations where users can not see images, screen-reader software working in conjunction with the web browser can instead speak those words aloud. The World Wide Web Consortium had already published guidelines with hints like this— include meaningful alternative text with every image— to educate web developers. MSFT itself had additional internal guidelines for accessibility. For teams operating in the brave new world of “online services” (as distinct from the soon-to-be-antiquated rich-client or shrink-wrap models of delivering software for local installation) accessibility was essentially a solved problem, much like the problem of internationalization, or translating software into multiple languages, which used to bedevil many a software project until ground rules were worked out. As long as you followed certain guidelines— an obvious one being never to hard-code user-facing English text in your code— your software could be easily translated for virtually any market without changing the code. In the same spirit, as long as you followed specific guidelines when designing a website, browsers and screen readers would take care of the rest and make your service accessible to all customers. Unless, that is, you went out of your way to introduce a feature that is inaccessible by design— such as visual CAPTCHAs.

Take #2: audio CAPTCHAs

To the extent CAPTCHAs are difficult enough to stop “offensive” software working on behalf of spammers, they also frustrate “honest” software that exists to assist users with disabilities in navigating the user interface. Strict interpretation of W3C guidelines dictates that every CAPTCHA image be accompanied by alternative text along the lines of “this picture contains the distorted sequence of letters X3JRQA.” Of course if we actually did that, spammers could cheat the puzzle, using automated software to learn the solution from the same hint.

The natural fallback was an audio CAPTCHA: instead of recognizing letters in a deliberately distorted image, users would be asked to recognize letters spoken in a voice recording with deliberate noise added. Once again the trick is knowing exactly how to distort that soundtrack such that humans have an easy time while off-the-shelf voice-recognition software stumbles. Once again, Microsoft Research to the rescue. Our colleagues knew that simply adding white noise (aka Gaussian noise) would not do the trick; voice recognition had become very good at tuning that out. Instead the difficulty of the audio CAPTCHA would rely on background “babble”— normal conversation sounds layered on top of the soundtrack at slightly lower volume. The perceptual challenge here is similar to carrying on a conversation in a loud space, focusing on the speaker in front of us while tuning out the cacophony of all the other voices echoing around the room.
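As a rough illustration of the mixing step only (this is not the MSR algorithm, whose processing was far more sophisticated; the function and parameter names below are invented), the core idea amounts to overlaying a quieter babble track on the speech samples:

```python
import math
import random

def mix_babble(speech, babble, babble_gain=0.5):
    """Overlay background babble on a speech clip at reduced volume.

    speech, babble: lists of PCM samples in [-1.0, 1.0].
    Returns the mixed clip, hard-clipped back into [-1.0, 1.0].
    """
    mixed = []
    for i, s in enumerate(speech):
        b = babble[i % len(babble)]  # loop the babble track if shorter
        m = s + babble_gain * b
        mixed.append(max(-1.0, min(1.0, m)))  # hard clip
    return mixed

# Toy "speech" (a pure tone) and "babble" (a sum of random tones)
rate = 8000
speech = [0.6 * math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]
rng = random.Random(42)
freqs = [rng.uniform(100, 3000) for _ in range(8)]
babble = [sum(0.1 * math.sin(2 * math.pi * f * t / rate) for f in freqs)
          for t in range(rate)]

mixed = mix_babble(speech, babble, babble_gain=0.5)
```

The security-relevant choices all hide inside what goes into `babble` and at what gain: white noise averages out, while overlapping human voices occupy the same spectral bands as the target speech.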

As with visual CAPTCHAs, there were various knobs for adjusting the difficulty level of the puzzles. Chastened by the weak security configuration of the original rollout, this time more conservative choices were made. We recognized we were dealing with an example of the weakest-link effect: while honest users with accessibility needs are constrained to use the audio CAPTCHA, spammers have their choice of attacking either one. If either option is significantly easier to break, that is the one they are going to target. If voice-recognition software could break the audio version, it would not matter how good the original CAPTCHA was. All of the previous work optimizing visual CAPTCHAs would be undermined as rational spammers shifted over to breaking the audio to continue registering bogus accounts.

Fast forward to when the feature rolled out: that dreaded scenario did not come to pass. There was no spike in registrations coming through with audio puzzles. The initial version simply recreated the image puzzle in sound, but later iterations used distinct puzzles. This is important for determining in each case whether someone solved the image or the audio version. But even when using the same puzzle, you would expect attackers to request a large number of audio puzzles if they had an automated break, along with other signals such as a large number of “near misses” where the submitted solution is almost correct except for a letter or two. There was no such spike in the data. Collective sigh of relief all around.
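That “near miss” signal is easy to quantify. As a hypothetical sketch (none of this is the actual Passport telemetry; the function names are invented), one could compute the edit distance between the correct answer and each wrong submission, and watch for a rise in almost-correct failures:

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def near_miss_rate(attempts, max_distance=2):
    """Fraction of wrong answers that were almost right.

    attempts: list of (correct_answer, submitted_answer) pairs.
    A high rate suggests automated solving that gets most letters right,
    as opposed to humans who tend to miss badly or give up.
    """
    wrong = [(c, s) for c, s in attempts if c != s]
    if not wrong:
        return 0.0
    near = sum(1 for c, s in wrong if edit_distance(c, s) <= max_distance)
    return near / len(wrong)
```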

Calibrating the difficulty

Except it turned out the design missed in the opposite direction this time. It is unclear whether spammers even bothered attacking the audio CAPTCHA, much less whether they eventually gave up in frustration and violently chucked their copy of Dragon NaturallySpeaking voice-recognition software across the room. There is little visibility into how the crooks operate. But one thing became clear over time: our audio CAPTCHA was also too difficult for honest users trying to sign up for accounts.

It’s not that anyone made a conscious decision to ship an unsolvable puzzle. On the contrary, deliberate steps were taken to control difficulty. Sound-alike consonants such as “B” and “P” were excluded, since they were considered too difficult to distinguish. This is similar to the visual CAPTCHA avoiding symbols that look identical, such as the digit “1” and the letter “I,” or the letters “O” and “Q,” which are particularly likely to morph into each other as random segments are added around letters. The problem is that all of these intuitions about what qualified as the “right” difficulty level were never validated against actual users.
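The exclusion step itself is trivial to implement; the hard part was picking the groups. A sketch with invented confusable sets (the actual Passport exclusion lists are not public):

```python
import random

# Hypothetical confusable groups for illustration only: visually
# similar glyphs and sound-alike consonants are dropped entirely.
VISUAL_CONFUSABLES = [set("O0Q"), set("I1"), set("5S"), set("2Z")]
AUDIO_CONFUSABLES = [set("BP"), set("DT"), set("MN")]

def safe_alphabet(confusable_groups,
                  base="ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"):
    """Remove every symbol that belongs to a confusable group."""
    excluded = set().union(*confusable_groups)
    return "".join(c for c in base if c not in excluded)

def make_challenge(alphabet, length=6, rng=random):
    """Draw a challenge string from the filtered alphabet."""
    return "".join(rng.choice(alphabet) for _ in range(length))

alphabet = safe_alphabet(VISUAL_CONFUSABLES + AUDIO_CONFUSABLES)
```

Note the hidden trade-off: every excluded symbol shrinks the alphabet and lowers the entropy per character, so an attacker guessing randomly does slightly better. The real mistake described here, though, was tuning these lists by intuition rather than against actual users.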

Widespread suspicion existed within the team that we were overdoing it on the difficulty scale. To anyone actually listening to sample audio clips, the letters were incomprehensible. Those of us raising that objection were met with a bit of folk-psychology wisdom: while the puzzles may sound incomprehensible to our untrained ears, users with visual disabilities are likely to have a far more heightened sense of hearing. They would be just fine, this theory went: our subjective evaluation of difficulty is not an accurate gauge because we are not the target audience. That collective delusion might have persisted, were it not for a proper usability study conducted with real users.

Reality check

The wake-up moment occurred in the usability labs on MSFT’s Redmond-West (“Red-West”) campus. Our usability engineer helped recruit volunteers with specific accessibility requirements involving screen readers. These men and women sat down in front of a computer to work through a scripted task as members of the Passport team stood by helplessly, observing from behind one-way glass. To control for other accessibility issues that may exist in registration flows, the tasks focused on solving audio CAPTCHAs, stripping away every other extraneous action from the study. Volunteers were simply given dozens of audio CAPTCHA samples calibrated for different settings, some easier and some harder than what we had deployed in production.

After two days, the verdict was in: our audio CAPTCHAs were far more difficult than we realized. Even more instructive were the post-study debriefings. One user said he would likely have asked for help from a relative to complete registering for an account— the worst way to fail customers is making them feel they need help from other people in order to go about their business. Another volunteer wondered aloud if the person designing these audio CAPTCHAs was influenced by John Cage and other avant-garde composers. The folk-psychology theory was bunk: users with visual disabilities were just as frustrated trying to make sense of these mangled audio clips as everyone else.

To be clear: this failure rests 100% with the Passport team— not our colleagues in MSFT Research who provided the basic building blocks. If anything, it was an exemplary case of “technology transfer” from research to product: MSR teams carried out innovative work pushing the envelope on a hard problem, handed over working proof-of-concept code and educated the product team on the choice of settings. It was our call to set the difficulty level high, and our cavalier attitude towards usability that green-lighted a critical feature absent any empirical evidence of its usability, all the while patting ourselves on the back that accessibility requirements were satisfied. Mission accomplished, Passport team!

In software engineering we rarely come face-to-face with our errors. Our customers are distant abstractions, caricatured into helpful stereotypes by marketing: “Abby” is the home user who prioritizes online safety, “Todd” owns a small business and appreciates time-saving features, while “Fred” the IT administrator is always looking to reduce technology costs. Yet we never get to hear directly from Abby, Fred or Todd on how well our work actually helps them achieve those objectives. Success can be celebrated in metrics trending up— new registrations, logins per day— and, less commonly, trending down— fewer password resets, less outbound spam originating from Hotmail. Failures are abstract, if not entirely out of sight. Usability studies are the one exception, when rank-and-file engineers have an opportunity to meet these mythical “users” in the flesh and recognize beyond doubt when our work has failed our customers.


On CAPTCHAs and accessibility (part I)

[This is an expanded version of what started out as a Twitter thread]

Fighting spam & failing our customers

When Twitter announced its tweet-by-voice feature, they were probably not expecting the backlash from users with disabilities pointing out that the functionality would be unusable in its present state. It is not the first time technology companies have forgotten about accessibility in the rush to ship features out the door. This is one such story from this blogger’s time at MSFT.

Outbound spam problem

In the early 2000s, MSFT Passport was the identity service for all customer-facing online services the company provided: Hotmail, MSN properties, Xbox Live and even developer-facing services including MSDN. (Later renamed Windows Live and now known simply as MSFT Accounts, not to be confused with a completely unrelated Windows authentication feature named “Passport” that retired in 2016.) This put the Passport team— my team— squarely in the midst of a raging battle against outbound spam. Anyone hosting email has to contend with inbound spam and keeping that constant stream out of their customers’ inboxes. But service providers who give away free email accounts also have to worry about the opposite problem: crooks registering thousands of such free accounts and enlisting the massive resources available to a large-scale provider like Hotmail to push out their fraudulent messages. Since Passport handled all aspects of identity including account registration, it became the first bulwark for keeping spammers out.

While most problems in economics have to do with pricing, the problem of spam originates with the complete absence of cost. If customers had to pay for every piece of email they sent, or were even charged a monthly subscription fee for the privilege of having a Hotmail account, no spammer would find it profitable to use Hotmail accounts for their campaigns. But the Original Sin of the web is an unshakeable conviction that every service must be “free,” at least on the surface. To the extent that companies are to make money, this doctrine goes, services shall be monetized indirectly— subsidized by some other profitable line of business such as hardware coupled to the service or, increasingly, data-mining customer information for ever more targeted and intrusive advertising, which begat our present form of Surveillance Capitalism. To the extent charges could be levied, they had to be indirect.

Enter CAPTCHAs. This unwieldy acronym stands for “Completely Automated Public Turing-test to tell Computers and Humans Apart.” In the early 2000s the terminology had not been standardized; at MSFT we used the simpler acronym HIP, for Human Interaction Proof. The basic idea is a puzzle that is easy for humans but difficult for computers to solve. The most common example is recognizing distorted letters inside an image. Solving that puzzle becomes the new toll booth for access to an otherwise “free” service. So there is a price introduced, but it is charged in the currency of human cognitive workload. Of course spammers are human too: they can sit down and solve these puzzles all day long— or pay other people in developing countries to do so, as researchers eventually discovered happening in the wild. But they can no longer scale the same way: before, they could register accounts about as fast as their script could post data to Passport servers. Now each one of those registration attempts must be accompanied by proof that someone somewhere devoted a few seconds worth of attention to solving a puzzle.
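Mechanically, that toll booth can even be run without server-side state. The sketch below is purely hypothetical (it is not how Passport worked; all names are invented): the server issues a MAC binding the puzzle answer to an expiry time, then verifies it when the registration form comes back. A real deployment would also need one-time nonces so a solved puzzle cannot be replayed:

```python
import base64
import hashlib
import hmac
import json
import secrets
import time

SERVER_KEY = secrets.token_bytes(32)  # secret held only by the server

def issue_challenge(answer, ttl=300, now=None):
    """Return (token, mac) binding the puzzle answer to an expiry time.

    Instead of storing the answer, the server hands the client an opaque
    token plus a MAC over (expiry, answer) and recomputes it on submission.
    """
    now = time.time() if now is None else now
    payload = json.dumps({"exp": now + ttl}).encode()
    mac = hmac.new(SERVER_KEY, payload + answer.upper().encode(),
                   hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode(), mac

def verify_solution(token, mac, submitted, now=None):
    """Check an expiry window and the submitted answer, in constant time."""
    now = time.time() if now is None else now
    payload = base64.urlsafe_b64decode(token.encode())
    if json.loads(payload)["exp"] < now:
        return False  # puzzle expired; client must fetch a fresh one
    expected = hmac.new(SERVER_KEY, payload + submitted.upper().encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, mac)
```

The answer never travels to the client in recoverable form; only the MAC does, so the client must actually solve the puzzle to produce a matching submission.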

Designing CAPTCHAs

So what does an ideal CAPTCHA look like? Recall that the ideal puzzle is easy for humans but difficult for computers. This is a moving target: while our cognitive capacity changes very slowly over generations of evolution, the field of artificial intelligence moves much faster to close the gap.


A visualization of CAPTCHA design space. Advances in AI continue to push the boundary between problems that are solvable by computers and those that are only solvable by humans— so far. Ideal CAPTCHAs are just beyond the reach of AI, while still easy for most people to solve.


Philosophically, there is no small measure of irony in computer scientists devising such puzzles. The very idea of a CAPTCHA contradicts common interpretations of the Church-Turing thesis, one of the founding principles of computer science. Named after Alan Turing and his advisor Alonzo Church, the thesis states that the notion of a Turing machine— an idealized theoretical model of the computers we can construct today, but with infinite memory— captures the notion of computability. According to this thesis, any computational problem that is amenable to “solution” by mechanical procedures can be solved by a Turing machine. Its original formulation was squarely in the realm of mathematics, but it did not take long for the inevitable connections to physics and the philosophy of mind to emerge. One interpretation holds that since the human mind can solve certain complex problems— recognizing faces or understanding language— it ought to be possible to implement the same steps for solving those problems on a computer. In this view, popular among AI researchers, there can not be a computational problem that is magically solvable by humans yet forever out of reach of algorithms. That makes computer science research on CAPTCHAs somewhat akin to engineers designing perpetual motion machines.

Of course few researchers actually believe such problems fundamentally exist. Instead we are simply exploiting a temporary gap between human and AI capabilities. It is a given that AI will continue to improve and encroach into that space of problems temporarily labelled “only solvable by humans.” In other words, we are operating on borrowed time. It is not surprising that many CAPTCHA designs originated with AI researchers: defense and offense feed each other. Less known is that Microsoft Research was at the forefront of this field in the early 2000s. In addition to designing the Passport CAPTCHA, MSR groups broke CAPTCHAs deployed by Ticketmaster, Yahoo and Google, publishing a paper on the subject. (This blogger reached out to the affected companies ahead of time with vulnerability notifications, to make sure there were no objections to publication.) For CAPTCHAs based on recognizing letters, we knew that simple distortion or playing tricks with colors would not be effective. Neural networks are too good at recognizing stand-alone letters. Instead the key is preventing segmentation: make it difficult for OCR to break up the image into distinct letters, by introducing artificial strokes that connect the archipelago of letters into one uninterrupted web of pixels.
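A toy model shows why those connecting strokes matter. Treating the image as a binary grid, naive segmentation amounts to finding connected components of set pixels; a single stroke merges what were separate letters into one blob. (This is an illustrative sketch, not production CAPTCHA code, and the blobs stand in for rendered glyphs.)

```python
from collections import deque

def components(grid):
    """Count 4-connected components of set pixels in a binary grid."""
    h, w = len(grid), len(grid[0])
    seen, count = set(), 0
    for y in range(h):
        for x in range(w):
            if grid[y][x] and (x, y) not in seen:
                count += 1
                q = deque([(x, y)])     # breadth-first flood fill
                seen.add((x, y))
                while q:
                    cx, cy = q.popleft()
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                                   (cx, cy + 1), (cx, cy - 1)):
                        if (0 <= nx < w and 0 <= ny < h and grid[ny][nx]
                                and (nx, ny) not in seen):
                            seen.add((nx, ny))
                            q.append((nx, ny))
    return count

def draw_stroke(grid, y, x0, x1):
    """Add a horizontal stroke joining two glyphs."""
    for x in range(x0, x1 + 1):
        grid[y][x] = 1

# Two separate letter-like blobs on a 12x5 canvas
grid = [[0] * 12 for _ in range(5)]
for y in range(1, 4):
    for x in (1, 2, 8, 9):
        grid[y][x] = 1
```

Before a stroke is drawn, `components(grid)` sees two distinct glyphs ready to be fed to a letter classifier; after `draw_stroke` connects them, the naive cut between letters disappears.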


Example of original Passport CAPTCHA, from ArsTechnica

Some trial-and-error was necessary to find the right balance. The first version proved way too easy. In fact it turned out that around the same time, Microsoft Office introduced an OCR capability for scanning images into a Word document, and that feature alone could partially decode some of the CAPTCHAs. Facepalm moment: a random feature in one MSFT product 0wns the security feature of another MSFT product. We can only hope this at least spurred the sales— or more likely pirating— of Office 2003 among enterprising spammers. After some tweaks to the difficulty parameters, image CAPTCHAs settled on a healthy middle ground, stemming the tide of bogus accounts created by spammers without stopping honest customers from signing up.

There was one major problem remaining however: accessibility. Visual CAPTCHAs work fine for users who could see the images. What about customers with visual disabilities?




Smart-cards vs USB tokens: esoteric form factors (part III)

[continued from part II]

A smart-card is a smart-card is a…

Once we take the literal interpretation of “card” out of smart-card, we can see the same underlying IC appearing in a host of other form factors. Here are some examples, roughly in historical order of appearance.

  • Trusted Platform Module or TPM. Defined by the Trusted Computing Group in the early 2000s, the TPM provides separate hardware intended to act as a root of trust and provide security services such as key management to the primary operating system. Its first major application was BitLocker full-disk encryption in the ill-fated Windows Vista operating system. In an entertaining example of concepts coming full circle, Windows 7 introduced the notion of virtual smart-cards, which leverage the TPM to simulate smart-cards for scenarios such as remote authentication.
  • Electronic passports, or Machine Readable Travel Documents (MRTD) as they are officially designated by the standardizing body ICAO. This is an unusual scenario where NFC is the only interface to the chip; there is no contact plate to interface with vanilla card readers.
  • Embedded secure elements on mobile devices. While it is possible to view these as “TPM for phones,” the integration model tends to be very different. In particular, TPMs are tightly integrated into the boot process for PCs to provide measured-boot functionality, while the eSE has historically been limited to a handful of scenarios such as payments. Nokia and other manufacturers experimented with SEs in the late 2000s, before Google Wallet first introduced it to the US market at scale on Android devices. Apple Pay followed suit a few years later. The SE is effectively a dual-interface card. The “contact” interface is permanently connected to the mobile operating system (eg Android or iOS) while the contactless interface is hooked to the NFC antenna, with one caveat: there is an NFC controller in the middle actively involved in the communication. That controller can alter communications, for example directing traffic either to the embedded SE or the host operating system depending on which “card” application is being accessed. By contrast, a vanilla card usually has no controller standing between antenna and chip.

One vulnerability, multiple shapes

An entertaining example of the shared heritage of this hardware involves the ROCA (Return Of Coppersmith Attack) vulnerability from 2017. Researchers discovered a vulnerability in the RSA key-generation logic in Infineon chips. This was not a classic case of randomness failure: the hardware did not lack for entropy for a change. Instead it had faulty logic that over-constrained the large prime numbers chosen to create the RSA modulus. It turned out that moduli generated as the product of such primes were vulnerable to an ancient attack that allowed efficient factoring, breaking the security of such keys. This one bug affected a wide range of products all based on the same underlying platform:

One bug, one secure hardware platform, multiple manifestations.
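The structure of the flawed primes also made vulnerable keys easy to spot from the public modulus alone. A simplified version of the published fingerprint idea (the real detector from the ROCA paper uses a carefully tuned prime set and parameters) checks whether the modulus is a power of 65537 modulo each of several small primes, which holds for vulnerable keys but almost never for properly random ones:

```python
def subgroup(g, p):
    """Multiplicative subgroup generated by g modulo prime p."""
    elems, x = set(), 1
    while True:
        x = x * g % p
        if x in elems:
            return elems
        elems.add(x)

# A small sample of test primes in the spirit of the published detector
SMALL_PRIMES = [11, 13, 17, 19, 37, 53, 61, 71, 73, 79, 97]
SUBGROUPS = {p: subgroup(65537, p) for p in SMALL_PRIMES}

def has_roca_fingerprint(n):
    """True if n mod p is a power of 65537 for every test prime.

    Vulnerable primes had the special form k*M + (65537**a mod M) for a
    primorial M, so vulnerable moduli land inside these small subgroups;
    a random modulus fails at least one of the checks with overwhelming
    probability.
    """
    return all(n % p in SUBGROUPS[p] for p in SMALL_PRIMES)
```

Note the asymmetry that made the disclosure process tractable: anyone could scan public keys for the fingerprint without being able to factor them, so affected deployments could be identified and rotated.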

Beyond form-factor: trusted user interface

There are some USB tokens with additional functionality that at least theoretically improves security compared to using vanilla cards. Most of these involve the introduction of a trusted user interface, augmenting the token with input/output devices so it can communicate information to the user independently of the host. Consider a scenario where digitally signing a message authorizes the transfer of money to a specified bank account. In addition to securing the secret key material that generates those signatures, one would have to be very careful about the messages being signed. An attacker does not need access to the key bits to wreak havoc, although that is certainly sufficient. They can trick the authorized owner of the key into signing a message with altered contents, diverting funds into a different account without ever gaining direct access to the private key.

This problem is difficult to solve with standard smart-cards, since the message being signed is delivered by the host system the card is attached to. On-card applications assume the message is correct. With dual-interface cards, there are some kludgy solutions involving the use of two different hosts. For example, the message can be streamed over the contact interface initially, but must be confirmed again over the contactless interface. This allows using a second machine to verify that the first one did not submit a bogus message. Note that a dual interface is necessary for this, since the card can not otherwise detect the difference between initial delivery and secondary approval.

A much better solution is to equip the “card” with a UI that can display pertinent details about the message being signed, along with a mechanism for users to accept or cancel the operation. The Feitian LCD PKI token is an example of a USB token with exactly that capability. It features a simple LCD display and buttons. On-board logic parses and extracts crucial fields from messages submitted for signing, such as the amount, currency and recipient information in the case of a money transfer. Instead of immediately signing the message, the token displays a summary and waits for affirmative approval from the user via one of the buttons. (Those buttons are physically part of the token itself. Malicious code running on the host can not fake a button press.) Similar ideas have been adopted for cryptocurrency hardware wallets, with one important difference: most of those wallets are not built around a previously validated smart-card platform. The difference is apparent in the sheer number of trivial & embarrassing vulnerabilities in such products that have long been eliminated in more mature market segments such as EMV.
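The control flow is easy to sketch. The toy model below is purely illustrative (it does not reflect Feitian firmware or any real token; the class and message format are invented): the token itself parses the message, renders a summary on its own display and refuses to sign until the confirmation callback, standing in for a physical button press, approves it:

```python
import hashlib
import hmac
import json

class DisplayToken:
    """Toy model of a signing token with its own display and buttons."""

    def __init__(self, key, confirm_button):
        self._key = key                 # signing key never leaves the token
        self._confirm = confirm_button  # stands in for a physical press

    def sign_transfer(self, raw_message):
        """Parse, display and sign a transfer request, or refuse."""
        msg = json.loads(raw_message)   # parsing happens ON the token
        summary = "PAY {amount} {currency} TO {recipient}".format(**msg)
        if not self._confirm(summary):  # user reads display, presses button
            return None                 # cancelled: nothing gets signed
        return hmac.new(self._key, raw_message.encode(),
                        hashlib.sha256).hexdigest()

# A transfer request as the host would deliver it
msg = json.dumps({"amount": 250, "currency": "EUR",
                  "recipient": "DE89 3704 0044 0532 0130 00"})
```

The crucial property is that the summary shown to the user is derived on the token from the exact bytes that will be signed, so a host that swaps the recipient also changes what appears on the display.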

The challenge for all of these designs is that they are necessarily application-specific, because the token must make sense of the message and summarize it succinctly on the display for the user to make an informed decision. For all the time standards groups have spent putting angle brackets (XML) or curly braces (JSON) around data, there is no universal standard format for encoding information, much less its semantics. This requires at least some parsing and presentation logic hard-coded in the token itself, in a way that can not be influenced by the host. Otherwise, if a malicious host could influence what appears on the display, it could make an unauthorized transaction appear benign, defeating the whole point of having a human in the loop to sanity-check what gets signed. Doing this in a general and secure way outside narrow applications such as cryptocurrency remains an open problem.


The righteous exploit: Facebook & the ethics of attacking your own customers

Vice magazine recently reported on Facebook using a 0-day exploit for the Tails Linux distribution against one of its own users. The target was under investigation by law enforcement, for a series of highly disturbing criminal acts targeting teenage girls on the platform. Previous efforts to unmask the true identity of the suspect had been unsuccessful because they were accessing Facebook using the Tor anonymizing network. For a change from the usual Facebook narrative involving a platform hopelessly run over by trolls, hate-speech and disinformation, this story has an uplifting conclusion: the exploit works. The suspect is arrested. Yet the story was met with mixed reactions, with many seeming to conflate the ethical issue— was it “right” for Facebook security team to have done this?— with more pragmatic questions around how exactly they went about the objective.

Answering the first question is easy. There is a special place in hell reserved for those who prey on the weak, and if these allegations are true, this crook squarely belongs there. There is no reason to doubt that the Facebook security team acted in good faith. They had access to ample internal information to judge the credibility of the allegations and conclude that this suspect posed such a risk to other people that it warranted going beyond any reasonable definition of “assisting a criminal investigation,” into the uncharted territory of actively attacking their own customer with a 0-day exploit. Ultimately the question of guilt is a matter for courts to decide. Contrary to knee-jerk reactions on social media, Facebook did not act as judge, jury and executioner in this episode. The suspect may have been apprehended, but due process stands. Mr. Hernandez was still entitled to his day in court in front of a real judge with a real jury to argue his innocence, and is unlikely to be strapped into an electric chair by a real executioner anytime soon. (In fact the perp pleaded guilty earlier this year and currently awaits sentencing.)

More troubling questions about the episode emerge when looking closer at exactly how Facebook collaborated with the FBI in bringing the suspect to justice. These questions are likely to come up again in other contexts, without the benefit of a comparable villain to short-cut the ethical debate. That is, if they have not already come up in other criminal investigations waiting for an enterprising journalist to unearth the story.


Let’s start with the curious timing of publication: the events chronicled in the article take place from 2015 to 2016. Why did the story come out now? There was no new public disclosure, such as the publication of court documents that could reveal— to our collective surprise— how the FBI magically tracked down the true IP address of a criminal using the Tor network. (A question that incidentally has never been answered satisfactorily in the case of the Ross Ulbricht né Dread Pirate Roberts takedown for Silk Road.) The outline of facts is attributed to current and former Facebook employees speaking as anonymous sources. Why now? A slightly conspiratorial view is that Facebook PR desperately wanted a positive story in the current moment, when the company is under fire for refusing to fact-check disinformation on its platform, a stance made even more difficult to defend after Twitter took the unprecedented step of labelling tweets from the President. Facebook may have assumed the story could be a happy distraction and score cheap brownie points: “We will condone rampant political disinformation on our platform, but look over here— we went out of our way to help bust this awful criminal.” It is not uncommon for companies to play journalists this way and intentionally leak the desired narrative at the right time. If that was the calculation, public reaction suggests they badly misjudged the reception.

Facebook & Tor: strange bed-fellows

A second issue that has been overlooked in most accounts of this incident is that for Facebook, the challenge of deanonymizing miscreants was an entirely self-inflicted problem. The suspect in question used Tor to access Facebook, leaving no identifiable IP address for law enforcement to pursue. There is a wide range of opinion on the merits of anonymous access and censorship resistance, but there is no question that many companies have decided that more harm than good has originated from anonymizing proxies, whether of the vanilla centralized VPN variety or Tor. Netflix has an ongoing arms race to block VPNs, while VPN providers compete by advertising that their service grants access to Netflix. Cloudflare has drawn the ire of the privacy community by throwing up time-wasting CAPTCHAs in the way of any user trying to access websites fronted by their CDN. Yet Facebook has been going against the grain: not only allowing Tor access, but making Facebook available as a Tor hidden service and even going so far as to obtain the first SSL certificate ever issued for a hidden service under the “.onion” domain.

Such embrace of Tor is quite puzzling, coming from the poster-child of Surveillance Capitalism with a checkered history of commitment to privacy. Tor gives ordinary citizens the power to access information and services without disclosing private information about themselves, even in the presence of “curious” third parties trying to scrape together profiles from every available signal. That model makes less sense for accessing a social network that requires identifying yourself and using your real name as the prerequisite to meaningful participation. The implied threat model makes no sense: worrying about hiding your IP address while revealing intimate information about your life on a social network that profits by surveilling its own customers is an incoherent view of privacy. A less charitable view is that Facebook chose to pander to the privacy community in an attempt to whitewash its less-than-impressive record after multiple miscues, including the Beacon debacle and the 2011 FTC settlement.

There is a stronger case to be made around avoiding censorship: direct access to Facebook is frequently blocked by autocratic regimes, and Tor is arguably the most reliable way to bypass such restrictions. Granted, the assumption that expanding access to Facebook results in a better world all around is a laughably absurd idea today. Between Russian interference in the 2016 election, the Cambridge Analytica scandal, a large-scale data breach, discriminatory advertising, ongoing political disinformation and even ethnic violence being orchestrated via Facebook, one could argue the world just might be better off with fewer people accessing this particular platform. But it is easy to forgive Facebook for this bit of self-serving naïveté in 2014, a time when technology companies were still lionized, their negative externalities yet to manifest themselves.

How much of Facebook usage over Tor is legitimate, and how much is criminal behavior— such as the Hernandez case— disinformation, fraud and spam? As with most facts about Facebook, these data points are not known outside the company. (It is possible they are not even known inside Facebook. For all their unparalleled data-mining capabilities, technology companies have a knack for not posing questions that may have inconvenient answers— such as what percent of accounts are fake, what fraction of advertising clicks are engineered by bots and how much activity from your vaunted Tor hidden service is malicious.) What is undisputed is that the crook repeatedly registered new accounts after being booted off the platform. Without a way to identify the miscreant when he returned, Facebook was playing a game of whack-a-mole with these accounts. Services have many options between outright blocking anonymizing proxies and giving them unfettered access to the platform. For example, users could be subject to additional checks, or their access to high-risk features— such as unsolicited messaging of other users or video sharing, both implicated in this incident— could be restricted until the account builds sufficient reputation over time or existing accounts vouch for it.

Crossing the Rubicon

Putting aside the question of whether Facebook could have prevented these actions ahead of time, we turn to the more fraught issue of response. Reading between the lines of the Vice article, victims referred the matter to law enforcement and the FBI initiated a criminal investigation. In these scenarios it is common for the company in question to be subpoenaed for “all relevant information” related to the suspect, in order to identify them in real life. This is where the use of Tor frustrated the investigation. IP addresses are one of the most reliable pieces of forensic evidence that can be collected about actions occurring online. In most cases the IP address used by the person of interest leads directly to their residence or office. In other cases it may lead to a shared network such as a public library or coffee shop, in which case a little more sleuthing is necessary, perhaps looking at nearby footage from surveillance cameras, license-plate readers or any payments made at that establishment using credit cards. With Tor, the trail stops cold at the Tor exit node. If the user had instead used a commercial VPN service, there is a fighting chance the operator of the service could be subpoenaed for records. With a decentralized system such as Tor, there are too many possible nodes, distributed all over the world in different jurisdictions, with no single party that could be held accountable. In fact, that is exactly the strength of Tor and why it is so valuable when used in defense of free speech and privacy.

Facebook's security team could have stopped there after handing over what little information they had to the FBI. Instead they decided to go further and actively work on unmasking the identity of the customer. This is a difficult stance. In the opinion of this blogger, it is ethically the correct one. The miscreant in question caused significant harm to young, vulnerable individuals. That harm would have continued as long as the perp was allowed to operate on the platform. Absent the appetite to walk back the seemingly inviolable commitment to making Facebook available over Tor, the company had no choice other than going on the offensive with an exploit.

Sourcing the exploit

Once the decision is made to pursue active attacks, the only question is how. There is a wide range of options. On the very low-tech end of the spectrum, Facebook employees could impersonate a victim in chat messages and try to social-engineer identifying information out of the suspect. There is no mention in the Vice article of such tactics being attempted, and it is unlikely that a perp with meticulous attention to opsec would reveal identifying information in a moment of carelessness. What the article does imply is that the FBI immediately reached for high-tech solutions in its arsenal. The first exploit attempt failed, likely because it was designed for a different platform— operating system and browser combination— than the esoteric setup this crook had involving the Tails Linux distribution.

Luckily there is no shortage of vulnerabilities to exploit in software. Take #2 saw Facebook contracting an “outside vendor” to develop a custom exploit chain for the specific platform used by the suspect. This is a questionable move, because going outside to source a brand-new exploit all but guarantees the independent availability of that exploit for others. Sure, Facebook can contractually demand “exclusivity” as a condition for commissioning the work, but let’s not kid ourselves: in the market for exploits, there is no honor among thieves. It is unclear whether this outsourcing was a deliberate decision to distance the Facebook security team itself from exploit development or whether the team simply lacked the talent in-house. (If this were Google, one expects Project Zero could crank out a reliable exploit in an afternoon’s work.)

Pulling the trigger

The next questionable step involved the actual delivery of the exploit, although Facebook may not have had any choice in the matter. According to Vice, Facebook handed over the exploit to the FBI for eventual delivery to the perp. It is as if a locksmith, asked to open the door of one particular household to conduct a lawful search, simply handed the master key over to the police and went home. At this point Facebook had gone far beyond the original mission of unmasking one noxious criminal: they handed a 0-day exploit to the FBI, ready for use in any other situation the agency deems appropriate. (Senator Wyden is quoted in the Vice article questioning exactly this logic, asking whether the FBI later submitted the exploit to the Vulnerabilities Equities Process.)

Legally the company may have had no other option. In an ideal world, Facebook holds on to the exploit— they paid for it, after all— and delivers it to the suspect directly, with the tacit agreement of the FBI and some form of immunity against prosecution for what would otherwise be a criminal act committed in the process: Facebook breaking into a machine it is not authorized to access. It is unlikely that option exists in the real world, or even if it did, that the FBI would willingly pass on the opportunity to add a shiny new 0-day to its arsenal at no cost.

Disarming the exploit?

Given that Facebook had already sourced the exploit from a third party, there is no guarantee the FBI would not have received a copy through alternate channels, even if Facebook had managed to hold on to it internally. That brings up the most problematic part of this episode: vulnerability disclosure. According to Vice, the Facebook security team decided that formal vulnerability notification to Tails was not required because “the vulnerable code in question had already been removed in the next version.”

That is either a weak after-the-fact excuse for inaction or a stunning lapse of judgment. There is a material difference between a routine software update that inadvertently fixes a critical security vulnerability (or worse, fixes it silently, deliberately trying to hide its existence) and one that is explicitly billed as a critical security update. Only in the latter case are users put on notice that there is urgency to applying the update in order to protect their systems.

Given that the exploit was already in the hands of the FBI and likely being resold by the original author, disclosure was the only option available to Facebook to neutralize its downstream effects. Had they disclosed the issue to the Tails team after the crook was apprehended, it would have been a great example of responsible use of an exploit to achieve a limited objective with every ethical justification: delivering a criminal suspect into the hands of the justice system. Instead Facebook gave away a free exploit to the FBI, knowing full well it could be used in completely unrelated investigations over which the company has no say. If it is used to bring down another Hernandez or comparable offender, society is better off and we can all cheer from the sidelines for another judicious use of an exploit. If the next target is an immigrant union organizer wanted for jay-walking or a Black Lives Matter activist singled out for surveillance based on her race, the same argument can not be made. From the moment this exploit was brought into existence until every last vulnerable Tails instance has been patched, the Facebook security team bears some responsibility for the outcomes, good or bad.

It turns out trafficking in exploits is not that different from connecting the world’s population and giving everyone a platform to spread their ideas— without first stopping to ask whether they are going to use that capability for charity or malice.


Smart-cards vs USB tokens: optimizing for logical access (part II)

[continued from part I]

The problem with cards

The card form factor, or “ID-1” as it is officially designated, is great for converged access: using the same credential for physical and logical access. Cards can open real doors in the physical world and virtual doors online. But they come with an ungainly requirement: smart-card readers. For physical access this is barely noticed: there is a badge-reader mounted somewhere near the door that everyone uses. Employees only need to remember to bring their card, not their own reader. For logical access, it is more problematic. While card-readers can be quite compact these days, a reader is still one more peripheral to carry around. Every person who needs to access an online resource gated by the card— connect to the company VPN, login to a protected website or decrypt a piece of encrypted email— also needs a reader. For the disappearing breed of fixed-base equipment such as desktop PCs and workstations, one could simply have a reader permanently glued to every desk.

Yet the modern workforce is highly mobile. Many companies only issue laptops to their employees, or at least expect that their personnel can function equally effectively away from the office— good luck getting to that workstation in the office during an extended pandemic lockdown. While a handful of laptops can be configured with built-in readers, the vast majority are going to require a reader dangling precariously from the side, ready to get snapped off, often hanging on to a USB-to-USB-C adapter since most readers were not designed for the newer ports. There is also hardware compatibility to worry about, since most manufacturers used to target Windows and rely on automatic driver installation through plug & play. (This is largely a solved problem today. Most readers comply with the CCID standard, which has solid support through the open-source libccid package on Linux & OSX. Even special snowflake hardware is typically accompanied by drivers for Linux.)

USB tokens

This is where USB tokens come in handy. Tokens combine the reader and “card” into a single device. That might seem more complex and fragile but it actually simplifies detection from the point of view of the operating system. A reader may or may not have a card present, so the operating system must handle insertion and removal events. USB tokens appear as a reader with a card permanently present, removing one variable from the equation. In addition to featuring the same type of secure IC from a card, these devices also have a USB controller sitting in front of that chip to mediate communication. Usually the controller does nothing more fancy than “protocol arbitrage:” taking messages delivered over USB from the host and relaying them to the secure IC using ISO7816 which is the common protocol supported by smart-cards. But sometimes controllers can augment functionality, by presenting new interfaces such as simulating a keyboard to inject one-time passcodes.
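To make the “protocol arbitrage” concrete, here is a toy sketch in Python. This is a simulation only: a real controller parses CCID framing on the USB side and speaks ISO 7816 (T=0 or T=1) to the secure IC, and the class and byte values below are purely illustrative.

```python
def relay_apdu(usb_payload: bytes, card) -> bytes:
    """Toy USB controller: unwrap the host's USB message, forward the
    command APDU to the secure IC, wrap the response for the return trip."""
    apdu = usb_payload          # in reality: strip CCID framing here
    response = card.transmit(apdu)  # ISO 7816 exchange with the secure IC
    return response             # in reality: add CCID framing here

class FakeSecureIC:
    """Stand-in for the smart-card chip: echoes the command and appends
    the ISO 7816 success status word 90 00."""
    def transmit(self, apdu: bytes) -> bytes:
        return apdu + bytes([0x90, 0x00])

# A SELECT-like command flows through the controller untouched.
resp = relay_apdu(bytes([0x00, 0xA4, 0x04, 0x00]), FakeSecureIC())
assert resp[-2:] == bytes([0x90, 0x00])
```

The point of the sketch is how little the controller does: the secure IC sees the same APDUs it would receive from a standalone reader, which is why the same card application can ship unmodified in either form factor.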

Some examples, such as the SafeNet eToken used in SWIFT authentication, define their own card application standard. More often vendors choose to follow an existing standard that enjoys widespread software support. The US government PIV standard is a natural choice, given its ubiquity and out-of-box support on Windows without requiring any additional driver installation. In the late 2000s GoldKey became an early example of offering a PIV implementation in USB token format; they also went to the trouble of getting FIPS 140 certification for their hardware. Taglio PIVKey followed shortly afterwards with USB tokens based on Feitian and later NXP platforms. Eventually other vendors such as Yubico copied the same idea by adding PIV functionality to their existing line.

In principle USB tokens have no intrinsic functionality or security advantages over the equivalent hardware in card form. They are just a repackaging of the exact same secure IC running the exact same application. Extracting secrets from the token can not be appreciably more difficult than extracting secrets from the equivalent card. If anything, the introduction of an additional controller to handle USB communication can only make matters worse. That controller is privy to sensitive information flowing across the interface. For example, when the user enters a PIN to unlock the card, that PIN is visible to the USB controller. Likewise, if the token is used for encryption, all decrypted messages pass through the secondary controller. So the USB token arguably has a larger attack surface, with an additional chip to target. Unlike the smart-card IC, which has been developed with security and tamper-resistance objectives in mind, this second chip is a sitting duck.

Usability matters

Yet from a deployment perspective, USB tokens greatly simplify rolling out strong two-factor authentication in an enterprise. By eliminating the ungainly card reader, they improve usability. Employees only need to carry around one piece of hardware that is easily transported and could even remain permanently attached to their laptop, albeit at the cost of slightly increasing certain risks from compromised devices. Examples:

  • In 2014 Airbnb began issuing GoldKey PIV tokens to employees for production SSH access. Airbnb's fleet is almost exclusively based on MacBooks. While the office space included fixed infrastructure such as monitors and docking stations at every desk, none of that would have been available for employees working from home or traveling— not uncommon for a company in the hospitality business.
  • Regulated cryptocurrency exchange & custodian Gemini issues PIV tokens to employees for access to restricted systems used for administering the exchange.

The common denominator in both of these scenarios is that logical access takes precedence over physical access. Recall that the CAC & PIV programs came to life under the auspices of the US defense department. Defense use-cases are traditionally driven by a territorial mindset of controlling physical space and limiting infiltration of that area by flesh-and-blood adversaries. That makes sense when valuable assets are tangible objects, such as weapons, ammunition and power plants. Technology and finance companies have a different threat model: their most valuable assets are not material in nature. It is not the office building or literal bars of gold sitting in a vault that need to be defended; even companies directly trading in commodities such as gold and oil frequently outsource actual custody of raw material to third parties. Instead the nightmare scenarios revolve around remote attackers getting access to digital assets: accessing customer information, stealing intellectual property or manipulating payment systems to inflict direct monetary damages. When locking down logical access is the overarching goal, the usability and deployment advantages of tokens outweigh their incompatibility with physical-access infrastructure.



Updated: June 20th, with examples of hardware token models


Smart-cards vs USB tokens: when form factor matters (part I)

Early on in the development of cryptography, it became clear that off-the-shelf hardware intended for general-purpose computing was not well suited to safely managing secrets. Smart-cards were the answer: a miniature computer with its own modest CPU, modest amount of RAM and persistent storage, packaged into the shape of an ordinary identity card. Unlike ordinary PCs, these devices were not meant to be generic computers with end-user choice of applications. Instead they were preprogrammed for a specific security application, such as credit-card payments in the case of chip & PIN, or identity verification in the case of the US government identification programs.

Smart-cards solved a vexing problem in securing sensitive information such as credentials and secret keys: bits are inherently copyable. Smart-cards created an “oracle” abstraction for using secrets while making it difficult for attackers to extricate those secrets out of the card. The point-of-sale terminal at the checkout counter can ask this oracle to authorize a payment, or the badge-reader mounted next to a door can ask the card to prove that it is in possession of a specific credential. But neither can ask the card to cough up the underlying secret used in those protocols. Not even the legitimate owner of the card can do that, at least not by design. Tamper-resistance features are built into the hardware to frustrate attempts to extract any information from the card beyond what the intended interface exposes.
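The oracle abstraction can be sketched in a few lines of Python, using HMAC challenge-response as a stand-in for whatever protocol the real card implements. (In actual hardware this boundary is enforced by tamper resistance, not by software; all names below are illustrative.)

```python
import hmac
import hashlib

class CardOracle:
    """Toy model of a smart-card: answers challenges, never reveals the key."""

    def __init__(self, secret: bytes):
        # Name-mangling is not real protection; on a real card,
        # tamper-resistant hardware enforces this boundary.
        self.__secret = secret

    def respond(self, challenge: bytes) -> bytes:
        """Prove possession of the secret without disclosing it."""
        return hmac.new(self.__secret, challenge, hashlib.sha256).digest()

# A reader (or door badge-reader) verifies possession via challenge-response.
card = CardOracle(b"provisioned-at-issuance")
challenge = b"random-nonce-from-reader"
response = card.respond(challenge)

# A verifier holding the same secret checks the response...
expected = hmac.new(b"provisioned-at-issuance", challenge, hashlib.sha256).digest()
assert hmac.compare_digest(response, expected)

# ...but the card exposes no API to read the secret back out.
assert not hasattr(card, "secret")
```

The essential property is that the only operations available are the ones the interface exposes: authorize, prove, respond. Copying the bits would require going around the interface, which is exactly what tamper resistance is there to prevent.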

“Insert or swipe card”

The original card form factor proved convenient in many scenarios where credentials were already carried on passive media in the shape of cards. Consider the ubiquitous credit card. Digits of the card number, expiration date and identity of the cardholder are embossed on the card in raised lettering for the ultimate low-tech payment solution, where a mechanical reader presses the card imprint through carbon paper onto a receipt. Magnetic stripes were added later for automated reading and slightly more discreet encoding of the same information. We can view these as two different “interfaces” to the card. Both involve retrieving information from a passive storage medium on the card. Both are easy targets for perpetrating fraud at scale. So when it came time for card networks to switch to a better cryptographic protocol for authorizing payments, it made sense to keep the form factor constant while introducing a third interface to the card: the chip. By standardizing protocols for chip & PIN and incentivizing/strong-arming participants in the ecosystem to adopt the newer smart-cards, the payment industry opened up a new market for secure IC manufacturers.

In fact EMV ended up introducing two additional interfaces, contact and contactless. As the name implies, the first uses direct contact with a brass plate mounted on the card to communicate with the embedded chip. The latter uses a wireless communication protocol, NFC or Near Field Communication, for shuttling the same bits back and forth. (The important point however is that in both cases those bits typically arrive at the same piece of hardware: while there exist some stacked designs where the card has two different chips operating independently, in most cases a single “dual-interface” IC handles traffic from both interfaces.)

Identity & access management

Identity badges followed a similar evolution. Federal employees in the US always had to wear and display their badges for access to controlled areas. When presidential directive HSPD-12 called for raising the bar on authentication in the early 2000s, the logical next step was upgrading the vanilla plastic cards to smart-cards that supported public-key authentication protocols. This is what the Common Access Card (CAC) and its later incarnation Personal Identity Verification (PIV) systems achieved.

With NFC, smart-cards can open doors at the swipe of a badge. But they can do a lot more when it comes to access control online. In fact the most common use of smart-cards prior to the CAC/PIV programs was in the enterprise space. Companies operating in high-security environments issued smart-cards to employees for accessing information resources. Cards could be used to login to a PC, access websites through a browser and even encrypt email messages. In many ways the US government was behind the curve in adoption of smart-cards for logical access: several European countries had already deployed large-scale electronic ID or eID systems for all citizens, with government services offered online and accessed using those cards.

Can you hear me now?

Another early application of smart-card technology for authentication appeared in wireless communication, with the ubiquitous SIM card. These cards hold the cryptographic secrets for authenticating the “handset”— in other words, a cell phone— to the wireless carrier providing service. Early SIM cards had the same ID-1 dimensions as identity cards. As cell phones miniaturized to the point where some flip-phones were smaller than the card itself, SIM cards followed suit. The second-generation “mini-SIM” looked very much like a smart-card with the plastic surrounding the brass contact plate trimmed away. In fact many SIM cards are still delivered this way, as a full-size card with a SIM-sized section that can be punched out. Over time standards were developed for even smaller form factors designated “micro” and “nano” SIM, chipping away at the empty space surrounding the contact plate.

This underscores the point that most of the space taken up by the card is wasted; it is inert plastic. All of the functionality is concentrated in the tiny area where the contact plate and secure IC are located. The rest of the card can be etched, punched or otherwise cut with no effect on functionality. (Incidentally this statement is not true of contactless cards, because the NFC antenna runs along the circumference of the card. This maximizes the surface area covered by the antenna, and with it NFC performance: recall that these cards have no battery or other internal source of power. When operating in contactless mode, the chip relies on induction to draw power from the electromagnetic field generated by the reader. Antenna size is a major limiting factor, which is why most NFC tags and cards have a loop antenna that closely follows the external contours of their physical shape.)



Using Intel SGX for SSH keys (part II)

[continued from part I]

Features & cryptographic agility

A clear advantage of using an SGX enclave over a smart-card or TPM is the ability to support a much larger collection of algorithms. For example the Intel crypto-api-toolkit is effectively a wrapper over openssl or SoftHSMv2, which supports a wide spectrum of cryptographic primitives. By comparison most smart-card and TPM implementations support a handful of algorithms. Looking at generic signature algorithms, the US government PIV standard only defines RSA and ECDSA, with the latter limited to two curves: NIST P256 and P384. TPM2 specifications define RSA and ECDSA, with ECDSA again limited to the same two curves, with no guarantee that a given TPM model will support them all.

That may not sound too bad for the specific case of managing SSH client keys. OpenSSH did not even have the ability to use ECDSA keys from hardware tokens until recently, making RSA the only game in town. But it does raise a question about how far one can get with the vendor-defined feature set, or what happens in scenarios where more modern cryptographic techniques— such as pairing-based signatures or anonymous attestation— are required capabilities, instead of merely being preferred among a host of acceptable algorithms.


More importantly, end-users have a greater degree of control over extending the algorithms supported by a virtual token implemented in SGX. Since SGX enclaves run ordinary x86 code, adding one more signature algorithm such as Ed25519 comes down to adapting an existing C-language implementation to run inside the enclave. By contrast, end-users usually have no ability to customize code running inside the execution environment of a smart-card. It is often part of the security design for this class of hardware that end-users are not allowed to execute arbitrary code. They are limited to exercising functionality already present, effectively stuck with feature decisions made by the vendor.

Granted, smart-cards and TPMs are not made out of magic; they have an underlying programming model. At some point someone had the requisite privileges for authoring and loading code there. In principle one could start with the same blank-slate, such as a plain smart-card OS with JavaCard support and develop custom applets with all the desired functionality. While that is certainly possible, programming such embedded environments is unlikely to be as straightforward as porting ordinary C code.

It gets trickier when considering an upgrade of already deployed cryptographic modules. Being able to upgrade code while keeping secret-key material intact is intrinsically dangerous— it allows replacing a legitimate application with a backdoored “upgrade” that simply exfiltrates keys or otherwise violates the security policy enforced by the original version. This is why in the common GlobalPlatform model for smart-cards, there is no such thing as an in-place upgrade. An application can be deleted and a new one installed under the exact same identity. But this does not help an attacker, because the deletion will have removed all persistent data associated with the original. Simulating upgradeability in this environment requires a multiple-applet design, where one permanent applet holds all secrets while a second “replaceable” applet with the logic for using them communicates with it over IPC.

With SGX, it is possible to upgrade enclave logic, depending on how secrets are sealed. If secrets are bound to a specific implementation— known as the MRENCLAVE measurement in Intel terminology— any change to the code will render them unusable. If they are only bound to the identity of the enclave author established by code signature— the so-called MRSIGNER measurement— then the implementation can be updated trivially, without losing access to secrets. But that flexibility comes with the risk that the same author can sign a malicious enclave designed to leak all secrets.
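A toy model of the two sealing policies, with HMAC-based key derivation standing in for the CPU's internal EGETKEY derivation (all names and inputs are simplified for illustration; real SGX mixes in additional fields such as security version numbers):

```python
import hashlib
import hmac

PLATFORM_KEY = b"per-CPU fuse key"  # never leaves the processor in real SGX

def sealing_key(policy: str, enclave_code: bytes, signer: bytes) -> bytes:
    # MRENCLAVE policy binds the key to a hash of the exact enclave code;
    # MRSIGNER policy binds it only to the author's signing identity.
    if policy == "MRENCLAVE":
        measurement = hashlib.sha256(enclave_code).digest()
    else:
        measurement = signer
    return hmac.new(PLATFORM_KEY, policy.encode() + measurement,
                    hashlib.sha256).digest()

v1, v2 = b"enclave v1 code", b"enclave v2 code"
author = b"vendor signing key hash"

# Under MRENCLAVE, upgrading the code changes the key: old sealed data is lost.
assert sealing_key("MRENCLAVE", v1, author) != sealing_key("MRENCLAVE", v2, author)

# Under MRSIGNER, any enclave from the same author derives the same key --
# upgrades keep working, but so would a malicious enclave signed by that author.
assert sealing_key("MRSIGNER", v1, author) == sealing_key("MRSIGNER", v2, author)
```

The two assertions are exactly the trade-off described above: MRENCLAVE buys safety from malicious upgrades at the cost of losing data on every legitimate one.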


When it comes to speed, SGX enclaves have a massive advantage over commonly available cryptographic hardware. Even with specialized hardware for accelerating cryptographic operations, the modest resources in an embedded smart-card controller are dwarfed by the computing power & memory available to an SGX enclave.

As an example: a 2048-bit RSA signature operation on a recent-generation Infineon TPM takes several hundred milliseconds, which is a noticeable delay during an SSH connection. (Meanwhile RSA key generation for that length can take half a minute.)

That slowdown may not matter for the specific use case we looked at, namely SSH client authentication, or even other client-side scenarios such as connecting to a VPN or TLS client authentication in a web browser to access websites. In client scenarios, private-key operations are infrequent. When they occur, they are often accompanied by user interaction such as a PIN prompt or certificate selection/confirmation dialog. Shaving milliseconds off an RSA computation is hardly useful when overall completion time is dominated by human response times.

That calculus changes if we flip the scenario and look at the server side. That machine could be dealing with hundreds of clients every second, each necessitating use of the server's private key. Overall performance becomes far more dependent on the speed of cryptography under these conditions. The difference between having that operation take place in an SGX enclave ticking along at the full speed of the main CPU versus offloaded to a slow embedded controller would be very noticeable. (This is why one would typically use a hardware security module in PCIe card form factor for server scenarios: HSMs combine the security and tamper-resistance properties with beefier hardware that can keep up with the load just fine. But an HSM hardly qualifies as “commonly available cryptographic hardware” given their cost and complex integration requirements.)
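Back-of-the-envelope arithmetic makes the difference concrete (illustrative figures, not measured benchmarks; real servers would also parallelize across cores):

```python
# One private-key operation per incoming handshake, single-threaded for simplicity.
tpm_ms_per_sig = 300   # slow embedded controller: hundreds of ms, per above
sgx_ms_per_sig = 1     # enclave at near-native CPU speed (illustrative figure)

tpm_handshakes_per_sec = 1000 / tpm_ms_per_sig
sgx_handshakes_per_sec = 1000 / sgx_ms_per_sig

print(tpm_handshakes_per_sec)  # ~3.3 -- nowhere near "hundreds of clients every second"
print(sgx_handshakes_per_sec)
```

Even granting the TPM generous figures, a single embedded controller tops out at a few handshakes per second, two orders of magnitude short of the stated load.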

State of limitations

One limitation of SGX enclaves is their stateless nature. Recall that for the virtual PKCS#11 token implemented in SGX, the implementation creates the illusion of persistence by returning sealed secrets to the “untrusted” Linux application, which stores them on the local filesystem. When those secrets need to be used again, they are temporarily imported into the enclave. This has some advantages: in principle the token never runs out of space. By contrast a smart-card has limited EEPROM or flash for nonvolatile storage on-board. Standards for card applications may introduce their own limitations beyond that: for example the PIV standard defines 4 primary key slots, plus some number of slots for “retired” keys, regardless of how much free space the card has.

TPMs present an interesting in-between case. The TPM2 standard uses a similar approach, allowing an unbounded number of keys by offloading responsibility for storage to the calling application. When keys are generated, they are exported in an opaque format for storage outside the TPM. These keys can be reanimated from that opaque representation when necessary. (For performance reasons, there is a provision for keeping a handful of “persistent” objects in nonvolatile storage on the TPM, optimizing away the requirement to reload every time.)

PIN enforcement

But there is a crucial difference: TPMs do have local storage for state, which makes it possible to implement useful features that are not possible with pure SGX enclaves. Consider the simple example of PIN enforcement. Here is a typical policy:

  • Users must supply a valid PIN before they use private keys
  • Incorrect PIN attempts are tracked by incrementing a counter
  • To discourage guessing attacks, keys become “unusable” (for some definition of unusable) after 10 consecutive incorrect entries
  • Successful PIN entry resets the failure counter back to zero
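When the failure counter lives in tamper-resistant storage, the policy above is only a few lines of logic (a Python sketch; on a real card both the logic and the counter sit inside the secure IC, where the host cannot tamper with them):

```python
class PinProtectedKey:
    """Toy model of card-side PIN enforcement with a lockout counter."""

    MAX_FAILURES = 10

    def __init__(self, pin: str):
        self._pin = pin
        self._failures = 0   # must live in storage the host cannot roll back

    def verify(self, attempt: str) -> bool:
        if self._failures >= self.MAX_FAILURES:
            raise RuntimeError("key permanently locked")
        if attempt == self._pin:
            self._failures = 0      # successful entry resets the counter
            return True
        self._failures += 1         # each wrong attempt increments it
        return False

key = PinProtectedKey("1234")
assert key.verify("1234")           # correct PIN unlocks the key

for _ in range(10):
    key.verify("0000")              # ten consecutive wrong guesses

try:
    key.verify("1234")              # even the right PIN no longer helps
    locked = False
except RuntimeError:
    locked = True
assert locked
```

The entire scheme hinges on `_failures` being trustworthy, which is precisely the property an SGX enclave cannot provide on its own, as discussed below.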

This is a very common feature for smart-card applications, typically implemented at the global level of the card. TPMs have a similar feature called “dictionary-attack protection” or anti-hammering, with configurable parameters for failure count and lockout period during which all keys on that TPM become unusable when the threshold is hit. (For more advanced scenarios, it is possible to have per-application or per-key PINs. In the TPM2 specification, these are defined as a special type of NVRAM index.)

It is not possible to implement that policy in an SGX enclave. The enclave has no persistent storage of its own to maintain the failure count. While it can certainly seal and export the current count, the untrusted application is free to roll back state by presenting an earlier version where the count stands at a more favorable number.
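A toy sealing scheme makes the rollback concrete. HMAC gives the enclave authenticity— it can detect a tampered blob— but not freshness, so a stale blob verifies just as well. (The sealing key and blob format below are simplified stand-ins for SGX sealing.)

```python
import hashlib
import hmac
import json

SEAL_KEY = b"enclave sealing key"  # in real SGX: derived inside the CPU

def seal(state: dict) -> bytes:
    """Authenticate state before handing it to the untrusted host."""
    blob = json.dumps(state, sort_keys=True).encode()
    return hmac.new(SEAL_KEY, blob, hashlib.sha256).digest() + blob

def unseal(sealed: bytes) -> dict:
    tag, blob = sealed[:32], sealed[32:]
    if not hmac.compare_digest(tag, hmac.new(SEAL_KEY, blob, hashlib.sha256).digest()):
        raise ValueError("tampered blob")
    return json.loads(blob)

# Enclave exports its state after two wrong PIN attempts...
old = seal({"failures": 2})
# ...and again after nine.
new = seal({"failures": 9})

# The untrusted host simply replays the older blob: it still verifies,
# and the enclave has no way to tell that it is stale.
assert unseal(old)["failures"] == 2
```

Any counter the enclave tries to maintain this way can be rewound by the host, which is why the lockout policy cannot be enforced.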

In fact even implementing the plain PIN requirement— without fancy lockout semantics— is tricky in SGX. In the example we considered, this is how it works:

  1. Enclave code “bundles” the PIN along with key bits, in a sealed object exported at time of key generation.
  2. When it is time to use that key, the caller must provide a valid PIN along with the exported object.
  3. After unsealing the object, the supplied PIN can be compared against the one previously set.

So far, so good. Now what happens when the user wants to change the PIN? One could build an API to unseal and reseal all objects with an updated PIN. Adding one level of indirection simplifies this process: instead of bundling the actual PIN, each sealed key carries a commitment to a separate sealed object that holds the PIN. This reduces the problem to resealing one object for all keys associated with the virtual token. But it does not solve the core problem: there is no way to invalidate objects previously sealed under the old PIN. In that sense, the PIN was not really changed. An attacker who learned the previous PIN and made off with the sealed representation of a key can use that key indefinitely.
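A toy sealing scheme (HMAC standing in for SGX sealing, as a simplified illustration) shows why the indirection does not amount to a real PIN change:

```python
import hashlib
import hmac
import json

SEAL_KEY = b"enclave sealing key"  # in real SGX: derived inside the CPU

def seal(obj: dict) -> bytes:
    blob = json.dumps(obj, sort_keys=True).encode()
    return hmac.new(SEAL_KEY, blob, hashlib.sha256).digest() + blob

def unseal(sealed: bytes) -> dict:
    tag, blob = sealed[:32], sealed[32:]
    if not hmac.compare_digest(tag, hmac.new(SEAL_KEY, blob, hashlib.sha256).digest()):
        raise ValueError("tampered blob")
    return json.loads(blob)

# One sealed "PIN object", consulted for every sealed key on the token.
pin_v1 = seal({"pin": "1234"})
key_blob = seal({"key": "secret key bits"})

def use_key(key_sealed: bytes, pin_sealed: bytes, attempt: str) -> str:
    if unseal(pin_sealed)["pin"] != attempt:
        raise PermissionError("wrong PIN")
    return unseal(key_sealed)["key"]

# The user "changes" the PIN by resealing the PIN object...
pin_v2 = seal({"pin": "9999"})

# ...but an attacker who kept pin_v1 alongside the key blob is unaffected:
assert use_key(key_blob, pin_v1, "1234") == "secret key bits"
```

Both PIN objects remain valid forever; nothing in the enclave can revoke `pin_v1`, because revocation is itself a piece of persistent state.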

(You may be wondering how TPMs deal with this, considering they also rely on exporting what are effectively “sealed objects” by another name. The answer is that the TPM2 specification allows setting passwords on keys indirectly, by reference to an NVRAM index. The password set on that NVRAM index then becomes the password for the key. As the “Non-Volatile” part of the name implies, the NVRAM index itself is a persistent TPM object. Changing the passphrase on that index collectively changes the passphrase on every key referencing it, without having to re-import or re-export anything.)

One could try to compensate for this by requiring that users pick high-entropy secrets, such as long alphanumeric passphrases. This effectively shifts the burden from machine to human. With an effective rate-limiting policy on the PIN, as implemented in smart-cards and TPMs, end-users can get away with low-entropy but more usable secrets: the tamper-resistance of the platform guarantees that after ten failed tries the keys become unusable. Without such rate limiting, it becomes the user's responsibility to ensure that an adversary free to make millions of guesses is still unlikely to hit on the correct one.
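Back-of-the-envelope numbers make the trade-off concrete. Against a uniformly chosen secret, the attacker's success probability is simply guesses divided by keyspace:

```python
import math

def p_success(keyspace: int, guesses: int) -> float:
    """Chance of guessing a uniformly random secret within a guess budget."""
    return min(guesses, keyspace) / keyspace

# 4-digit PIN with a hardware lockout after 10 tries: 0.1% chance.
print(p_success(10**4, 10))       # 0.001

# The same PIN with no rate limiting is trivially brute-forced.
print(p_success(10**4, 10**6))    # 1.0

# Without a lockout, entropy must do the work: a random 10-character
# alphanumeric passphrase (62**10 choices) shrugs off a million guesses.
print(p_success(62**10, 10**6))   # ~1.2e-12

# Bits of entropy needed to keep a million guesses below 0.1% success:
print(math.ceil(math.log2(10**6 / 0.001)))   # 30
```

In other words, hardware rate limiting lets four digits do the job that would otherwise require roughly thirty bits of memorized entropy.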

Zombie keys

PIN enforcement is not the only area where statelessness poses challenges. For example, there is no easy way to guarantee permanent deletion of secrets from the enclave. As long as there is a copy of the signed enclave code and exported objects stashed away somewhere in the untrusted world, they can be reanimated by running the enclave and supplying the same objects again.

There is a global SGX version state maintained at the level of the CPU. Incrementing it invalidates material sealed under the previous version. But this is a drastic measure that renders all existing SGX applications on that unit unusable.

Smart-cards and TPMs are much better at both selective and global deletion, precisely because they have state. For example, a TPM2 can be cleared from platform firmware or by invoking the clear command; both options render all previous keys unusable. Similarly, smart-card applications typically offer a way to explicitly delete keys, or to regenerate the key in a particular slot, overwriting its predecessor. (Of course there is also the nuclear option: fry the card in the microwave, which is still nowhere near as wasteful as physically destroying an entire Intel CPU.)

Unknown unknowns: security assurance

There is no easy comparison on the question of security— arguably the most important criterion for deciding on a key-management platform. While Intel SGX is effectively a single product line (although microcode updates can result in material differences between versions), the market for secure embedded ICs is far more fragmented, with a variety of vendors supplying products at different levels of security assurance. Most of those products ship as a complete, integrated solution encompassing everything from the hardware to the high-level application (such as chip & PIN payments or identity management) selected by the vendor. SGX, on the other hand, serves as a foundation for developers to build their own applications on top of core functionality provided by Intel, such as sealed storage and attestation.

When it comes to smart-cards, there is little discretion left to the end-user in the way of software; in fact most products do not allow users to install any new code of their choosing. That is not a bug, it is a feature: it reduces the attack surface of the platform. Indeed, the inability to properly segment hostile applications was an acknowledged limitation in some smart-card platforms. Until version 3, JavaCard required the use of an “off-card verifier” before installing applets to guard against malicious bytecode. The unstated assumption is that the card OS could not be relied on to perform these checks at runtime and stop malicious applets from exceeding their privileges.

By contrast SGX is predicated on the idea that malicious or buggy code supplied by the end-user can peacefully coexist alongside a trusted application, with the isolation guarantees provided by the platform keeping the latter safe. In the comparatively short span SGX has been commercially available, a number of critical vulnerabilities were discovered in the x86 micro-architecture resulting in catastrophic failure of that isolation. To pick a few examples current as of this writing:

  • Foreshadow
  • SgxPectre
  • RIDL
  • PlunderVolt
  • CacheOut
  • CopyCAT

These attacks could be executed purely in software, in some cases by running unprivileged user code. In each case, Intel responded with microcode updates, and in some cases future hardware improvements, to address the vulnerabilities. By contrast, most attacks against cryptographic hardware— such as side-channel observations or fault injection— require physical access. Often they involve invasive techniques such as decapping the chip, which destroys the original unit and makes it difficult to conceal that an attack occurred.

While it is too early to extrapolate from the existing pattern of SGX vulnerabilities, the track record confirms the expectation that being able to run code on the same platform as the enclave does indeed translate into a significant advantage for attackers.


Using Intel SGX for SSH keys (part I)

Previous posts looked at using dedicated cryptographic hardware— smart-cards, USB tokens or the TPM— for managing key material in common scenarios such as SSH, full-disk encryption or PGP. Here we consider doing the same using a built-in feature of recent-generation Intel CPUs: Software Guard Extensions, or SGX for short. This first part focuses on the mechanics of achieving that result using existing open-source software, while a follow-up post will compare SGX against alternatives that leverage discrete hardware.

First let’s clarify the objective by drawing a parallel with smart-cards. The point of using a smart-card for storing keys is to isolate the secret material from code running on the untrusted host machine. While host applications can instruct the card to use those keys, for example to sign or decrypt a message, the host remains at arm’s length from the key itself. In a properly implemented design, raw key bits are only accessible to the card and can not be extracted out of its secure execution environment. In addition to simplifying key management by guaranteeing that only one copy of the key exists at all times, this reduces exposure of the secret to malicious host applications, which are prevented from making additional copies for future exploitation.

Most of this translates directly to the SGX context, except for how the boundary is drawn. SGX is not a separate piece of hardware but a different execution mode of the CPU itself. The corresponding requirement can be rephrased as: manage keys such that raw keys are only accessible to a specific enclave, while presenting an “oracle” abstraction to other applications running on the untrusted commodity OS.

The idea of using a secure execution environment as “virtual” cryptographic hardware is so common that one may expect to find an existing solution. Sure enough, a quick search for “PKCS11 SGX” turns up two open-source projects on GitHub. The first appears to be a work-in-progress that is not quite functional at this time. The second is more promising: called crypto-api-toolkit, the project lives under the official Intel umbrella on GitHub and features a full-fledged implementation of a cryptographic token as an enclave, addressable through a PKCS#11 interface. This property is crucial for interoperability, since most applications on Linux are designed to access cryptographic hardware through PKCS#11. That long list includes OpenSSH (client and server), browsers (Firefox and Chrome) and VPN clients (the reference OpenVPN client as well as openconnect, which is compatible with Cisco VPN appliances). crypto-api-toolkit turns out to check all the necessary boxes.


This PoC is based on an earlier version of the code-base which runs openssl inside the enclave. The latest version on GitHub has switched to SoftHSMv2 as the underlying engine. (In many ways that is a more natural choice, considering SoftHSM itself aims to be a pure, portable simulation of a cryptographic token intended for execution on commodity CPUs.)

Looking closer at the code, there are a number of minor issues that prevent direct use of the module with common applications for manipulating tokens such as the OpenSC suite.

  • crypto-api-toolkit has some unusual requirements around object attributes, which are above and beyond what the PKCS#11 specification demands
  • While the enclave is running a full-featured version of openssl, the implementation restricts the available algorithms and parameters. For example it arbitrarily restricts elliptic-curve keys to a handful of curves, even though openssl recognizes a large collection of curves by OID.
  • A more significant compatibility issue lies in the management of object attributes. The stock implementation does not support the expected way of exporting EC public keys, namely querying a specific attribute on the public-key object.

After a few minor tweaks [minimal patch] to address these issues, the SSH use-case works end-to-end, even if it does not satisfy every PKCS#11 conformance subtlety.

Kicking the tires

The first step is building and installing the project. This creates all of the necessary shared libraries, including the signed enclave, and installs them in the right location, but does not yet create a virtual token. The easiest way to do that is to run the sample PKCS#11 application included with the project.


Initialize a virtual PKCS#11 token implemented in SGX


Now we can observe the existence of a new token and interrogate it. The pkcs11-tool utility from the OpenSC suite comes in handy for this. For example, we can query for supported algorithms, also known as “mechanisms” in PKCS#11 terminology:


New virtual token visible and advertising different algorithms.

(Note that algorithms recognized by OpenSC are listed by their symbolic name, such as “SHA256-RSA-PKCS”, while newer algorithms such as EdDSA are only shown by numeric ID.)

This token however is not yet in a usable state. Initializing a token defines the security-officer (SO) role, which is the PKCS#11 equivalent of the administrator. But the standard “user” role must first be initialized by the SO with a separate call. A quick search shows that the sample application uses the default SO PIN of 12345678:


Initializing the PKCS#11 user role.

With the user role initialized, it is time to generate some keys:


RSA key generation using an SGX enclave as PKCS#11 token


The newly created keypair is reflected in the appearance of corresponding files on the local file system. Each token is associated with a subdirectory under “/opt/intel/crypto-api-toolkit/tokens” where metadata and objects associated with the token are persisted. This is necessary because, unlike a smart-card or USB token, an enclave does not have its own dedicated storage. Instead, any secret material that needs to be persisted must be exported in sealed state and saved by the untrusted OS. Otherwise any newly generated object would cease to exist once the machine is shut down.

The next step is enumerating the objects created and verifying that they are visible to the OpenSSH client:


Enumerating PKCS#11 objects (with & without login) and retrieving the public-key for SSH usage

In keeping with common convention, the RSA private-key has the CKA_PRIVATE attribute set. It will not be visible when enumerating objects unless the user first logs in to the virtual token. This is why the private key object is only visible in the second invocation.
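The visibility rule can be captured in a toy model: objects carrying CKA_PRIVATE are hidden from enumeration until a login succeeds. (This is an illustrative simulation of PKCS#11 semantics, not the real C API.)

```python
# Toy model of PKCS#11 object visibility: objects with CKA_PRIVATE set
# are excluded from enumeration until C_Login succeeds.
class Token:
    def __init__(self, user_pin: str):
        self._pin = user_pin
        self.logged_in = False
        self.objects = [
            {"label": "ssh-key", "class": "public-key",  "private": False},
            {"label": "ssh-key", "class": "private-key", "private": True},
        ]

    def login(self, pin: str) -> None:
        if pin != self._pin:
            raise ValueError("CKR_PIN_INCORRECT")
        self.logged_in = True

    def find_objects(self):
        # Private objects are invisible to sessions that have not logged in.
        return [o for o in self.objects if self.logged_in or not o["private"]]

token = Token(user_pin="0000")
print([o["class"] for o in token.find_objects()])  # ['public-key']
token.login("0000")
print([o["class"] for o in token.find_objects()])  # ['public-key', 'private-key']
```

This mirrors the two pkcs11-tool invocations above: the private-key object only appears in the enumeration performed after login.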

OpenSSH can also see the public key and deems this RSA key usable for authentication. Somewhat confusingly, ssh-keygen with the “-D” argument does not generate a new key as the command name implies. It enumerates existing keys on all tokens associated with the given PKCS#11 module.

We can add this public key to a remote server and attempt a connection to check whether the OpenSSH client is able to sign with the key. While GitHub does not provide interactive shells, it is arguably the easiest way to check whether SSH keys are usable:


SSH using private-key managed in SGX

Beyond RSA

Elliptic curve keys also work:


Elliptic-curve key generation using NIST P-256 curve

Starting with release 8.0, OpenSSH can use elliptic-curve keys on hardware tokens. This is why the patch adds support for querying the CKA_EC_POINT attribute on the public-key object, by defining a new enclave call to retrieve that attribute. (As an aside: while that follows the existing pattern for querying the CKA_EC_PARAMS attribute, it is an inefficient design. These attributes are neither sensitive nor variable over the lifetime of the object. In fact there is nothing sensitive about a public-key object that requires calling into the enclave. It would have been more straightforward to export the object once and for all, in the clear, for storage on the untrusted side.)
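For reference, CKA_EC_POINT holds the uncompressed SEC1 point wrapped in a DER OCTET STRING. A minimal decoder, assuming short-form DER lengths (which always holds for P-256 points), looks like this; the sample point is made up for illustration:

```python
def decode_ec_point(cka_ec_point: bytes) -> tuple:
    """Unwrap a DER OCTET STRING (tag 0x04) containing an uncompressed
    SEC1 point (which, coincidentally, is also prefixed 0x04) and
    return the (x, y) coordinates as integers."""
    assert cka_ec_point[0] == 0x04, "expected DER OCTET STRING tag"
    length = cka_ec_point[1]            # short-form length only (< 128 bytes)
    point = cka_ec_point[2:2 + length]
    assert point[0] == 0x04, "expected uncompressed-point prefix"
    coord_len = (len(point) - 1) // 2   # 32 bytes per coordinate for P-256
    x = int.from_bytes(point[1:1 + coord_len], "big")
    y = int.from_bytes(point[1 + coord_len:], "big")
    return x, y

# Example: a made-up P-256 point, wrapped the way a token would return it.
sec1 = b"\x04" + (1234).to_bytes(32, "big") + (5678).to_bytes(32, "big")
wrapped = b"\x04" + bytes([len(sec1)]) + sec1
print(decode_ec_point(wrapped))   # (1234, 5678)
```

The double 0x04 prefix (OCTET STRING tag followed by the uncompressed-point marker) is a recurring source of interoperability bugs in PKCS#11 clients.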

These ECDSA keys are also usable for SSH with more recent versions of OpenSSH:


More recent versions of OpenSSH support using ECDSA keys via PKCS#11

Going outside the SSH scenario for a second, we can also generate elliptic-curve keys over a different curve such as secp256k1. While that key will not be suitable for SSH, it can be used for signing cryptocurrency transactions:


Generating and using an ECDSA key over secp256k1, commonly used for cryptocurrency applications such as Bitcoin

While this proof-of-concept shows that it is possible to use an SGX enclave as a virtual cryptographic token, how that compares to using real dedicated hardware is a different question. The next post will take up that comparison.




Helpful deceptions: location privacy on mobile devices


Over at the New York Times, an insightful series of articles on privacy continues to give consumers disturbing peeks at how the sausage gets made in the surveillance-capitalism business. The installment on mobile location tracking is arguably one of the more visceral, demonstrating the ability to isolate individuals as they travel from sensitive locations— the Pentagon, White House, CIA parking-lot— back to their own residence. This type of surveillance capability is not in the hands of a hostile nation state (at least not directly; there is no telling where the data ends up downstream after it is repeatedly repurposed and sold.) It is masterminded by run-of-the-mill technology companies foisting their surveillance operation on unsuspecting users in the guise of helpful mobile applications.

But the NYT misses the mark on how users can protect themselves. Its self-help guide dutifully points out that users can selectively disable location permission for apps:

The most important thing you can do now is to disable location sharing for apps already on your phone.

Many apps that request your location, like weather, coupon or local news apps, often work just fine without it. There’s no reason a weather app, for instance, needs your precise, second-by-second location to provide forecasts for your city.

This is correct in principle. For example, Android makes it possible to view which apps have access to location and retract that permission at any time. The only problem is that many apps will demand the permission right back or refuse to function. This is a form of institutionalized extortion, normalized by the expectation that most applications are “free”— which is to say, subsidized by advertising that in turn draws on pervasive data collection. App developers withhold useful functionality unless the customer agrees to give up their privacy and capitulates to this implicit bargain.

Interestingly there is a more effective defense available to consumers on Android, but it is currently hampered by a half-baked implementation. Almost accidentally, Android allows designating an application to provide alternative location information to the system. This feature is intended primarily for those developing Android apps, buried deep under developer options.


It is helpful for an engineer developing an Android app in San Francisco to be able to simulate how her app will behave for a customer located in Paris or Zanzibar, without so much as getting out of her chair. Not surprisingly, there are multiple options in the Play Store that help set artificial locations and even simulate movement. Here is Location Changer configured to provide a static location:


(There would be a certain irony in this app being advertising supported, if it were not common for privacy-enhancing technologies to be subsidized by business models not all that different from the ones they are allegedly protecting against.)

At first this looks like a promising defense against pervasive mobile tracking. Data-hungry tracking apps are happy, still operating under the impression that they retain their entitlement to location data and can track users at will. (There is no indication to the requesting app that the data is not coming from the GPS and is instead provided by another mobile app.) Because that data no longer reflects the actual position of the device, its disclosure is harmless.

That picture breaks down quickly on closer look. The first problem is that the simulated location is indiscriminately provided to all apps. That means not only invasive surveillance apps but also legitimate apps with perfectly good justification for location data will receive bogus information. For example here is Google Maps also placing the user in Zanzibar, somewhat complicating driving directions:


The second problem is that common applications providing simulated location only have rudimentary capabilities, such as reporting a fixed location or simulating motion along a simple linear path— one that goes straight through buildings, tunnels under natural obstacles and crosses rivers. It would be trivial for apps to detect such anomalies and reject the location data or respond with additional prompts to shame the device owner into providing true location. (Most apps do not appear to be making that effort today, probably because few users have resorted to this particular subterfuge. But under an adversarial model, we have to assume that once such tactics are widespread, surveillance apps will respond by adding such detection capabilities.)
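Such anomalies are trivial to flag programmatically. Here is a hypothetical sketch of the sanity check a surveillance app could run, flagging any pair of consecutive fixes that implies implausibly fast travel (the coordinates and threshold are illustrative):

```python
import math

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def flag_anomalies(fixes, max_kmh=1000.0):
    """fixes: list of (timestamp_seconds, (lat, lon)) tuples.
    Flags any leg implying faster-than-airliner travel: a 'teleport'."""
    flagged = []
    for (t1, p1), (t2, p2) in zip(fixes, fixes[1:]):
        hours = max((t2 - t1) / 3600.0, 1e-9)
        if haversine_km(p1, p2) / hours > max_kmh:
            flagged.append((t1, t2))
    return flagged

sf = (37.7749, -122.4194)       # San Francisco
zanzibar = (-6.1659, 39.2026)
fixes = [(0, sf), (600, sf), (1200, zanzibar)]  # "teleported" in 10 minutes
print(flag_anomalies(fixes))    # [(600, 1200)]
```

A convincing simulation therefore has to respect not just plausible locations but plausible transitions between them, which is exactly what naive fixed-location spoofers fail to do.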

What is required is a way to provide realistic location information that is free of anomalies, such as a device stuck at the same location for hours or suddenly “teleported” across hundreds of miles. Individual consumers have access to a relatively modest-sized corpus of such data: their own past history. In theory we can all synthesize realistic-looking location data for the present by sampling and remixing past location history. This solution is still unsatisfactory, since it is built on data sampled from a uniquely identifiable individual. That holds even if the app is replaying an ordinary day in the life over and over again in a Groundhog Day loop. The replay may hold no new information about the person's current whereabouts, but it still reveals information about them. For example, the simulated day will likely start and end at their home residence. What is needed is a way to synthesize realistic location information based on actual data from other people.

Of course a massive repository of such information exists in the hands of the one company that arguably bears most responsibility for creating this problem in the first place: Google. Because Google collects location information from hundreds of millions of iPhone and Android users, the company could craft realistic location data that helps users renegotiate the standard extortion terms with apps by feeding them simulated data.

Paradoxically, Google as a platform provider is highly motivated not to provide such assistance. That is a consequence of the virtuous cycle that sustains platforms such as Android: more users make the platform attractive to developers, who are incentivized to write new apps, and the resulting ecosystem of apps in turn makes the platform appealing to users. In a parallel to what has been called the original sin of the web— reliance on free content subsidized by advertising— that ecosystem of mobile apps is largely built around advertising, which is in turn fueled by surveillance of users. Location data is a crucial part of that surveillance operation. Currently developers face a simple, binary model for access to location: either location data is available, as explicitly requested by the application manifest, or it has been denied, in the unlikely scenario of a privacy-conscious user who has read one too many troubling articles on privacy. There is no middle ground where convincing but bogus location data has been substituted to fool the application at the user's behest. Enabling that option would clearly improve privacy for end-users. But it would also rain on the surveillance business model driving the majority of mobile apps.

This is a situation where the interests of end-users and application developers are in direct conflict. Neither group has a direct business relationship with Google— no one has to buy a software license for their copy of Android, and only a fraction of users have paying subscriptions from Google. Past history here is not encouraging. Unless a major PR crisis or regulatory intervention forces their hand, platform owners side with app developers, for good reason. Compared to the sporadic hand-wringing about privacy among consumers, professional developers are keenly aware of their bottom line at all times. They will walk away from a platform if it becomes too consumer-friendly and interferes with the cavalier tracking and data-collection practices that keep advertising-supported business models afloat.


A clear view into AI risks: watching the watchers

A recent NYT exposé on ClearView only scratches the surface of the problems with outsourcing critical law-enforcement functions to private companies. To recap: ClearView AI is possibly the first startup to have commercialized face-recognition-as-a-service (FRaaS?) and is riding high on a recent string of successes with police departments in the US. The usage model could not be any easier: upload an image of a person of interest, and ClearView locates other pictures of the same person from its massive database of images scraped from public sources such as social media. Imagine going from a grainy image taken by a security camera to the LinkedIn profile of the suspect. It is worth pointing out that the services hosting the original images, including Facebook, were none too happy about the unauthorized scraping. Nor was there any consent from users to participate in this AI experiment; as with all things social-media, privacy is just an afterthought.

Aside from the blatant disregard for privacy, what could go wrong here?
The NYT article already hints at one troubling dimension of the problem. While investigating ClearView, the NYT journalist asked various members of police departments with authorized access to the system to search for himself. This experiment initially turned up several hits as expected, demonstrating the coverage of the system. But halfway through the experiment something strange happened: suddenly the author “disappeared” from the system, with no information returned on subsequent searches, even when using the same image successfully matched before. No satisfactory explanation was forthcoming. At first it was chalked up to a deliberate “security feature” where the system detects and blocks unusual patterns of queries— presumably the same image being searched repeatedly? Later the founder claimed it was a bug, and it was eventually resolved. (Reading between the lines suggests a more conspiratorial interpretation: ClearView got wind of a journalist writing an exposé about the company and decided to remove some evidence demonstrating the uncanny coverage of its database.)

Going with Hanlon’s razor and attributing this case of the “disappearing” person to an ordinary bug, the episode highlights two troubling issues:

  • ClearView learns which individuals are being searched
  • ClearView controls the results returned

Why is this problematic? Let's start with the visibility issue, which is practically unavoidable. It means that a private company effectively knows who is under investigation by law enforcement, and in which jurisdiction. Imagine if every police department CCed Facebook every time it sent an email announcing an investigation into citizen John Smith. That is a massive amount of trust placed in a private entity that is neither accountable to public oversight nor constrained in what it can do with that information.

Granted, there are other situations where private companies are necessarily privy to ongoing investigations. Telcos have been servicing wiretaps and pen-registers for decades, and more recently ISPs have been tapped as a treasure trove of information on the web-browsing history of their subscribers. But as the NYT article makes clear, ClearView is no Facebook or AT&T. Large companies like Facebook, Google and Microsoft receive thousands of subpoenas every year for customer information and have developed procedures over time for compartmentalizing the existence of these requests. (For the most sensitive categories, such as National Security Letters and FISA warrants, there are even more restrictive procedures.) Are there comparable internal controls at ClearView? Does every employee have access to this information stream? What happens when one of those employees, or one of their friends, becomes the subject of an investigation?

For that matter, what prevents ClearView from capitalizing on its visibility into law-enforcement requests and trying to monetize both sides of the equation? What prevents the company from offering an “advance warning” service— for a fee of course— to alert individuals whenever they are being investigated?

Even if one posits that ClearView will act in an aboveboard manner and refrain from abusing its visibility into ongoing investigations for commercial gain, there is the question of operational security. Real-time knowledge of law-enforcement actions is too tempting a target for criminals and nation states alike to pass up. What happens when ClearView is breached by the Russian mob or an APT group working on behalf of China? One can imagine face-recognition systems also being applied to counter-intelligence scenarios to track foreign agents operating on US soil. If you are the nation sponsoring those agents, you want to know when their names come under scrutiny. More importantly, you care whether it is the Poughkeepsie police department or the FBI asking the questions.

Being able to modify search results has equally troubling implications. It is a small leap from alerting someone that they are under investigation to withholding results or better yet, deliberately returning bogus information to throw off an investigation or frame an innocent person. The statistical nature of face-recognition and incompleteness of a database cobbled together from public sources makes it much easier to hide such deception. According to the Times, ClearView returns a match only about 75% of the time. (The article did not cite a figure for the false-positive rate, where the system returns results which are later proven to be incorrect.) Results withheld on purpose to protect designated individuals can easily blend in with legitimate failures to identify a face. Similarly ClearView could offer “immunity from face recognition” under the guise of Right To Be Forgotten requests, offering to delete all information about a person from their database— again for a fee presumably.

As before, even if ClearView avoids such dubious business models and remains dedicated to maintaining the integrity of its database, attackers who breach ClearView infrastructure can not be expected to have similar qualms. A few tweaks to metadata in the database could be enough to skew results. Not to mention that a successful breach is not necessary to poison the database to begin with: Facebook and LinkedIn are full of fake accounts with bogus information. Criminals almost certainly have been building such fake online personae by mashing bits of “true” information from different individuals.

This is a situation where ClearView spouting bromides about the importance of privacy and security will not cut it. Private enterprises afforded this much visibility into active police investigations and with this much influence over the outcome of those investigations need oversight. At a minimum companies like ClearView must be prevented from exploiting their privileged role for anything other than the stated purpose— aiding US law enforcement agencies. They need periodic independent audits to verify that sufficient security controls exist to prevent unauthorized parties from tapping into the sensitive information they are sitting on or subverting the integrity of results returned.