There has been a lot of talk about “unlinkability” in the context of web authentication systems. For example, CardSpace is touted as more privacy-friendly than OpenID because it can support a different identifier for each site the user interacts with. This post is a first attempt to summarize some points around that, complete with very rusty blogging skills.
To start with, the type of unlinkability envisioned here is a very weak form compared to the more elaborate cryptographic protocols involving anonymous credentials. It comes down to a simple question: when a user participates in a federated authentication system (that is, they have an account with an identity provider that lets them authenticate to websites controlled by different entities), does the user appear under a single, consistent identifier everywhere they go?
It is not a stretch to see that when such a universal identifier is handed out to every site, it enables tracking. More precisely, it allows correlating information known by different sites. Netflix might know a user's movie rental history and iTunes might know the same user's music preferences. If the user is known to both sites by the same consistent identifier, the argument goes, the two can collude and build an even more comprehensive dossier about the user. This is a slightly contrived example because movie rentals are uniquely identifying already (as the recent deanonymization paper showed), and chances are so is the music collection. But it is easy to imagine scenarios where neither site alone has enough information to uniquely identify a user, yet when they collude and pool their data, a single candidate emerges. Consider Latanya Sweeney’s discovery in 2000 that 87% of the US population can be identified by birthdate, five-digit zipcode and gender. It does not take very many pieces of information to pick out individuals from the online crowd, provided each piece is tagged with the same unique ID so that they can be assembled together.
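To make the correlation argument concrete, here is a minimal sketch of what colluding amounts to: a join on the shared identifier. Everything in it is made up for illustration (the site names, the user IDs, and the attribute values are hypothetical), and it assumes each site simply stores its records keyed by the identifier it received from the identity provider.

```python
# Hypothetical records held by two sites, keyed by the identifier
# each site received from the identity provider. All data is invented.
site_a = {
    "u1001": {"birthdate": "1970-01-01"},
    "u1002": {"birthdate": "1985-06-15"},
}
site_b = {
    "u1001": {"zipcode": "00000", "gender": "F"},
    "u1003": {"zipcode": "11111", "gender": "M"},
}


def collude(a: dict, b: dict) -> dict:
    """Merge the two sites' records for every identifier both sites know."""
    shared_ids = a.keys() & b.keys()
    return {uid: {**a[uid], **b[uid]} for uid in shared_ids}


merged = collude(site_a, site_b)
```

Neither site alone holds all three attributes from Sweeney's result, but for any user known to both under the same ID, the merged record does. If the two sites had been handed unrelated identifiers for the same person, `shared_ids` would be empty and the join would produce nothing.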
The obvious solution is to project a different identity to each website. Alice might appear as user #123 to Netflix while iTunes remembers her as user #987. With a little help from cryptography it is easy to design schemes where such distinct IDs are generated easily by the authentication system, yet are virtually impossible for the websites receiving them to correlate, even when they collude.
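One simple way such per-site IDs could be generated is with a keyed hash. The sketch below is an assumption on my part, not a description of how CardSpace or any other system actually does it: the identity provider keeps a secret key and derives each identifier as an HMAC over the user's internal ID and the name of the site. The function name `pairwise_id`, the secret, and the site names are all hypothetical.

```python
import hashlib
import hmac


def pairwise_id(provider_secret: bytes, user_id: str, site: str) -> str:
    """Derive a site-specific identifier for a user (illustrative scheme only).

    The user ID is length-prefixed so that different (user, site) pairs
    cannot produce the same input string.
    """
    message = f"{len(user_id)}:{user_id}|{site}".encode("utf-8")
    return hmac.new(provider_secret, message, hashlib.sha256).hexdigest()


# Hypothetical values; a real provider would use a randomly generated key.
secret = b"hypothetical-provider-secret"
netflix_id = pairwise_id(secret, "alice", "netflix.example")
itunes_id = pairwise_id(secret, "alice", "itunes.example")
```

The same user always gets the same ID at a given site, so the site can still recognize a returning visitor, but the IDs at two different sites look unrelated to anyone who does not hold the provider's secret. Note that the identity provider itself can still link them, which is part of why this is a weak form of unlinkability.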