Of Twitter bots, Sybil attacks and verified identities

Seeking a middle ground for online privacy

The exact prevalence of bots has become the linchpin of Elon Musk’s attempt to back out of the proposed acquisition of Twitter. The existence of bots is not disputed by either side; the only question is what percentage of accounts they constitute. Twitter itself puts the figure at around 5%, using a particular metric called “monetizable daily active users” or mDAU to calculate the ratio. Mr. Musk disputes that number and claims it is much higher, without citing any evidence despite having obtained access to raw data from Twitter for carrying out his own research.

Any discussion involving bots and fake accounts naturally leads to the question: why is Twitter not verifying all accounts to make sure they belong to actual humans? After all, the company already has a concept of verified accounts sporting a blue badge to signal that the account really belongs to the person it claims to represent. This deceptively simple question leads into a tangle of complex trade-offs around what exactly verification can achieve and whether it would make any difference to the problem Twitter is trying to solve.

First we need to clarify what is meant by bot accounts. Suppose there is a magical way to perform identity verification online. This is not far-fetched: while not 100% reliable, such solutions are already relied on by cryptocurrency exchanges and other online financial platforms to stay on the right side of Know Your Customer (KYC) regulations. These solutions combine collecting information from the customer (such as the time-honored abuse of social security numbers for authentication), uploading copies of government-issued identity documents, and cross-checking all of this against information maintained by data brokers. None of this is free, but suppose Twitter is willing to fork over a few dollars per customer on the theory that the resulting ecosystem will be much more friendly to advertisers. Will that eliminate bots?
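
As a rough illustration, here is a minimal sketch of such a KYC-style pipeline. Every helper function here is a hypothetical stand-in for a commercial document-forensics or data-broker service, not any real vendor API:

```python
from dataclasses import dataclass

@dataclass
class Applicant:
    legal_name: str
    ssn_last4: str       # the time-honored abuse of SSNs as a shared secret
    id_document: bytes   # scan of a government-issued ID

def document_looks_authentic(doc: bytes) -> bool:
    # Stand-in for a commercial document-forensics check.
    return len(doc) > 0

def broker_record_matches(name: str, ssn_last4: str) -> bool:
    # Stand-in for a data-broker lookup cross-checking the claimed identity.
    return bool(name) and len(ssn_last4) == 4

def verify_identity(a: Applicant) -> bool:
    """Each signal is imperfect, so require all of them to agree;
    even a unanimous pass is probabilistic evidence, not proof."""
    return (document_looks_authentic(a.id_document)
            and broker_record_matches(a.legal_name, a.ssn_last4))
```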

The answer is clearly no, at least not according to the straightforward definition of bots. Among other things, nothing stops a legitimate person from going through ID verification and then transferring control of their account to a bot. There need not be any nefarious intent behind this move. For example, it could be a journalist who sets up the account to tweet links to their articles every time they publish a new one. In fact the definition of “bot” itself is ambiguous. If software is designed to queue up tweets from the author and publish them verbatim at specific future times (as in the sketch following the list below), is that a bot? What if the software augments or edits human-authored content instead of publishing it as-is? Automation is not the problem per se. Having accounts that are controlled by software, even software that is generating content automatically without human intervention, may be perfectly benign. The real questions are:

  1. Who is really behind this account?
  2. Why are they using automation to generate content?
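
To make the scheduling example concrete, here is a minimal sketch of such benign automation. The post_tweet helper is a placeholder rather than a real API; an actual implementation might call something like tweepy’s create_tweet:

```python
import time
from datetime import datetime, timezone

# Tweets authored ahead of time by a human, paired with publish times.
queue = [
    (datetime(2022, 8, 1, 9, 0, tzinfo=timezone.utc),
     "New article: https://example.com/post-1"),
    (datetime(2022, 8, 2, 9, 0, tzinfo=timezone.utc),
     "New article: https://example.com/post-2"),
]

def post_tweet(text: str) -> None:
    # Placeholder for a real API call, e.g. tweepy.Client.create_tweet(text=text).
    print(f"tweeting: {text}")

for when, text in sorted(queue):
    # Wait until the scheduled moment, then publish the text verbatim.
    delay = (when - datetime.now(timezone.utc)).total_seconds()
    if delay > 0:
        time.sleep(delay)
    post_tweet(text)
```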

Motivation is ultimately unknowable from the outside, but the first question can be traced to a name, either a person or a corporate entity. Until such time as we have sentient AI creating its own social-media accounts, there is going to be someone behind the curtain, accountable for all content spewing from that account. Identity verification can point to that person pulling the levers. (For now we disregard the very real possibility of verified accounts being taken over, or even deliberately resold to another actor by the rightful owner.) But that knowledge alone is not particularly useful. What would Twitter do with the information that “nickelbackfan123” is controlled by John Smith of New York, NY? Short of instituting a totalitarian social-credit system along the lines of China’s to gate access to social networks, there is no basis for turning away Mr. Smith or treating him differently than any other customer. Even if ID verification revealed that the customer is a known persona non grata to the US government, a fugitive on the FBI most-wanted list or an OFAC-sanctioned oligarch, Twitter has no positive obligation to participate in some collective-punishment process by denying them an online presence. Social media presence is not a badge of civic integrity or proof of upstanding character, a conclusion entirely familiar to anyone who has spent time online.

But there is one scenario where Twitter can and should preemptively block account creation. Suppose this is not the first account but the 17th one Mr. Smith is creating. (Let us posit that all the other accounts remain active and this is not a case of starting over; after all, in America we stand for second acts and personal reinvention.) When one person is simultaneously controlling dozens of accounts, the potential for abuse is high, especially when this link is not clear to followers. Looked at another way: there is arguably no issue with a known employee of the Russian intelligence agency GRU registering for a Twitter account and using their presence to push disinformation. The danger comes not from the lone nut-job yelling at the cloud, an inevitable part of American politics, but from one person falsely amplifying their message using hundreds of seemingly independent sock-puppet accounts. In the context of information security, this is known as a “Sybil attack”: one actor masquerading as thousands of different actors in order to confuse or mislead systems where equal weight is given to every participant. That makes a compelling case for verified identities online: not stopping bad actors from creating an account, but stopping them from creating the second, third or perhaps the one-hundredth sock-puppet account.
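
In mechanical terms, this kind of Sybil resistance could be as simple as a registry keyed on the verified identity rather than the account handle. A minimal sketch, where both the cap and the hashing scheme are assumptions made for illustration:

```python
from collections import defaultdict

MAX_ACCOUNTS = 5  # illustrative cap; there is no universally "safe" number

# Accounts grouped by a hash of the verified real-world identity, which
# the platform knows but never shows to other users.
accounts_by_identity: dict[str, set[str]] = defaultdict(set)

def register_account(identity_hash: str, handle: str) -> bool:
    """Allow the first few accounts per verified identity, refuse the rest.

    This does nothing to stop a bad actor's first account, but it blocks
    the hundredth sock puppet controlled by the same person."""
    owned = accounts_by_identity[identity_hash]
    if len(owned) >= MAX_ACCOUNTS:
        return False
    owned.add(handle)
    return True
```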

There is no magic “safe” threshold for duplicate accounts; it varies from scenario to scenario. Insisting on a one-person-one-account policy is too restrictive and does not take into account (no pun intended) the use of social media by companies, where one person may have to represent multiple brands in addition to maintaining their own personal presence. Even when restricting our attention to individuals, many prefer to maintain a separation between work and personal identities, with separate social media accounts for different facets of their life. Pet lovers often curate separate accounts for their favorite four-legged companions, accounts that often eclipse their own “real” stream in popularity. If we contain multitudes, it is only fair that Twitter allow a multitude of accounts. In other cases, even two is too many. If someone is booted off the platform for violating terms of service, posting hate speech or threatening other participants, they should not be allowed to rejoin under another account. (A harder question: should all personal accounts associated with that person on the platform be shuttered? Does Fido the dog get to keep posting pictures if his companion just got booted for spreading election conspiracies under a different account?)

Beyond real names

So far the discussion of verified identity has focused only on the relationship between an online service such as Twitter and the individual or corporation registering for an account on that platform. But on social media platforms, the crucial connections run laterally, between different users of the platform as peers. It is one thing for Twitter to have some assurance about the real-world identity connected to a user. What about other participants on the platform?

One does not have to look back too far to see a large-scale experiment in answering that question in the affirmative, and to evaluate how well that turned out. Google Plus, the failed social network designed to compete against Facebook, is today best remembered as the punchline to jokes, if it is remembered at all. But at the time of its launch, G+ was controversial for insisting on the use of “real names”. Of course the company had no way to enforce this at the time. Very few Google services interacted with real-world identities by requiring payment or other interactions with existing financial institutions. (The use of a credit card suddenly allows for cross-checking names against those already verified by another institution such as a bank. While there is no requirement that the name on a credit card be identical to the one appearing on government-issued ID, it is a good proxy in most cases.) Absent such consistency checks, all Google could do was insist that the same name be used across all services: if you are sending email as “John Smith”, then your G+ name shall be John Smith. Given how ineffective this is at stopping users from fabricating names at the outset, there had to be a process for flagging accounts violating this rule. That policing function was naturally crowd-sourced to customers, with the expectation that G+ users would “snitch” on each other by escalating to customer support with a complaint whenever they spotted users with presumably fake names. While it is unclear whether this half-baked implementation would have prevented G+ from turning into the cesspool of conspiracy theories and disinformation that Facebook evolved into, it certainly resulted in one predictable outcome: haphazard enforcement, with allegations of real-names violations used to harass individuals defending unpopular views. In a sense G+ combined the worst of both worlds: weak, low-quality identity verification by the platform provider, coupled with a requirement for consistency between this “verified” identity known to Google and the outward projection visible to other users.

Yet one can also imagine alternative designs that decouple identity verification from the freedom to use pseudonyms or assumed nicknames. Twitter could be 100% confident that the person who signed up is a certain John Smith from New York City in the offline world, while still allowing that customer to operate under a different name as far as all other users are concerned. This affords a reasonable compromise between freedom in expressing identity and discouraging abuse: if Mr. Smith is booted from the platform for threatening speech under a pseudonym, he is not coming back under any other pseudonym. (There is also an additional deterrence factor at play: if the behavior warrants referral to law enforcement, the platform can provide meaningful leads on the identity of the perpetrator, instead of an IP address to chase down.)
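
The data model behind such a design is straightforward. In this minimal sketch (the field names are assumptions for illustration), the pseudonym is public while the verified identity stays private to the platform, and bans attach to the identity rather than the handle:

```python
from dataclasses import dataclass

@dataclass
class Account:
    display_name: str   # pseudonym visible to every other user
    identity_hash: str  # verified legal identity, known only to the platform

banned_identities: set[str] = set()

def ban(account: Account) -> None:
    # Ban the person, not the handle, by recording the underlying identity.
    banned_identities.add(account.identity_hash)

def may_register(identity_hash: str) -> bool:
    # John Smith, once banned as "nickelbackfan123", cannot return
    # under "nickelbackfan456" or any other pseudonym.
    return identity_hash not in banned_identities
```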

This model still raises some thorny questions. What if John Smith deliberately adopts the name of another person in his online profile to mislead other participants? What if the target of impersonation is a major investor or political figure whose perceived opinions could influence others and move markets? Even the definition of “impersonation” is unclear. If someone is publishing stock advice under the pseudonym “NotWarrenBuffett,” is that parody or a deliberate attempt at market manipulation? But these are well-known problems for existing social media platforms. Twitter has developed the blue checkmark scheme to cope with celebrity impostors: accounts with the blue check have been verified to be accurately stating their identity, while those without are… presumably suspect?

That leads to one of the unintended side effects of ubiquitous identity verification. Discouraging the use of pseudonyms (because participants using a pseudonym are relegated to second-class citizenship on the platform compared to those using their legal name) may have a chilling effect on expression. This is less a consequence of verified identities and more about the impact of making the outcome of that process prominently visible: the blue badge on your profile. Today the majority of Twitter accounts are not verified. While the presence of a blue badge elevates trust in a handful of accounts, its absence is not perceived as casting doubt on the credibility of the speaker. This is not necessarily by design, but an artifact of the difficulty of doing robust verification at scale (just ask cryptocurrency exchanges), especially for a service reliant on advertising revenue, where there is no guarantee the sunk cost can be recouped over the lifetime of the customer. In a world where most users sport the verification badge by agreeing to include their legal name in a public profile, those dynamics will be inverted: not disclosing your true identity will be seen as suspect and will reduce the initial credibility assigned to the speaker. Given the level of disinformation circulating online, that increased skepticism may not be a bad outcome.

CP