An interesting paper out of MSFT research lends new support to the six-degrees of separation meme.
Quick recap: the idea that on average every individual can be linked to any other by a chain of no more than 5 acquintances in between dates to the work of Stanley Milgram at Stanford in the 1970s. Milgram used an old fashioned communication network: the postal system aka “snail mail.” Subjects in the experiment were asked to deliver letters to individuals they knew only by name and geographic location. Surfer Bob in San Diego might be asked to deliver a letter to investment banker Alice in New York, by forwarding the letter to somebody that he suspects is closer to Alice. On average letters were passed on six times along the way to their final destination, hence the six-degrees concept which later became a Tony award-winning play on Broadway, later jumping to the big screen with Will Smith before achieving mainstream status as an online game featuring Kevin Bacon. Early part of this decade witnessed a surge of interest in studying so-called “small world graphs” which resulted in the publication of several books that poplarized the concept.
Six-degrees took a major hit in credibility when another researcher discovered in 2006 that 95% of the letters in the Milgram experiment had never reached their intended destination. One possible conclusion was that the social network was in fact, not the connected, small-diameter graph everyone imagined it to be. If 95% of the nodes had no routes between them that would paint a picture of a fragmented, disjoint society reminiscent of high school, broken up into cliques and tribes. Or it could have been that the subjects did not follow through, because sending letters takes up non-trivial amount of time.
Fast forward three decades and instant messaging has drastically lowered the overhead for communication. So the MSR group– which includes the designer of infamous Clippy— looked at the social network formed by users of MSN/Live Messenger. Specifically they studied the social graph implied by 30 billion messages sent among 240 million users in June 2006. If two people exchange messages, they are assumed to know each other. (There are situations when that is not true: for example there are spammers sending IM to random people. Also users may have more than on identity, so individuals do not map uniquely to nodes– this can distort paths because user Alice may be appears as contact on Bob’s personal IM account but not work IM account.) This is a much larger data set than Milgram’s original sample of <200 and crunching that would not have been possible without massive computing power. Finding the shortest-path between two people in a graph is very expensive and requires looking at all connections. There are some optimizations for doing this on all pairs of individuals but it is still a very expensive proposition on such a large dataset.
The result came close to vindicating the original idea: average distance between two IM users was right around 6.5 and 78% of users were seperated by less than seven links. In fact the slightly higher number would be expected because the instant messaging graph has a subset of all real world connections. First not everyone uses the Internet; the digital divide remains very much alive in the US. This has the effect of removing entire nodes from the graph; nodes that could have provided shorter conncetions between people. Similarly not all Internet users have instant messaging. Even two heavy Internet addicts may use different IM networks (one on AIM, the other on GMail for example) because interoperability is still not the norm in this space. The combined effect of these biases is to remove edges from the graph and “inflate” the distances.