I published three articles on small world networks. The first one was in August 2010: "Network of Feedbacks on eBay". The second and third articles were in March 2012: "The small world of this blog" and "More on visitors to this blog".

In those occasions, I stated that the feedbacks on eBay and the visitors to this blog formed small-world networks because the number of links attached to was distributed across the nodes according to a power law.

But a couple of days ago, in those moments of semi-lucidity that precede sleep, I realised that the presence of a power-law distribution does not prove the presence of a small-world network.

According to Steven Strogatz, a network is called "small-world" when the average number of links between nodes is low and when clustering is high. The distribution of the number of links attached to each node doesn't always follow a power law. Therefore, a power-law distribution is not a necessary condition for a network to represent a small-world.

But the crucial point is that I haven't found anywhere a statement claiming that a power-law distribution is sufficient to identify small-world networks. In fact, power-laws pop up in many places. For example, the spectrum emitted by supernova remnants and other non-thermal sources of radiation follows power laws.

The bottom line is that the power-law distributions I discovered in the eBay feedbacks and in the blog visitors, contrary to what I stated in my previous articles, do not automatically imply that we are looking at small-world networks.

Wikipedia states that "a small-world network is defined to be a network where the typical distance L between two randomly chosen nodes (the number of steps required) grows proportionally to the logarithm of the number of nodes N in the network."

Scholarpedia, more generally, states that "A small-world network refers to an ensemble of networks in which the mean geodesic (i.e., shortest-path) distance between nodes increases sufficiently slowly as a function of the number of nodes in the network."

It seems that a universally recognised definition of "small-world network" is not available. But I don't like the definitions given by Wikipedia and Scholarpedia, because they depend on being able to "grow" the network or, at the very least, to have several instances of the network you are studying. But what do you do if you only have one network to work with? You cannot simply choose subsets of nodes, because the resulting networks might not be meaningful.

I like the definition used by Strogatz and Watts: a small-world network is characterised by having low "average path length" and high "clustering". You can apply such a definition to any individual network. To calculate the average path length, take any pair of nodes and count the number of links that form the shortest chain between them; then, repeat it for all pairs of nodes and average the result. Clustering measures the level of overlapping in the network and is calculated as the probability that two nodes linked to a common node are also linked with each other. Or, speaking of frequencies instead of probabilities, clustering measures the fraction of pairs of nodes that, being directly connected to a third node, are also directly connected with each other.

Low "average path length" can be identified in terms of network size. The most well known example is the "six degrees of separation" with a world population (i.e., nodes) of six billions. This was sort-of verified in 1967 by Stanley Milgram. He asked 160 randomly-chosen residents of Wichita, Kansas, and Omaha, Nebraska, to send letters to a particular person or, if they didn't know the person, to somebody who might know him/her. As recipients, he chose the wife of a divinity graduate student in Sharon, Massachusetts, and a stockbroker in Boston. Of the 160 letters, 42 reached their destination, after an average of 5.5 intermediaries. Now, 6.5 links is clearly very low when compared to the number of nodes in the network. In case you are interested, the Web has 19 degrees of separation and the Internet 10.

Understanding what "high clustering" means is a bit trickier. A loop in which all nodes are connected to two neighbours as 0 clustering, because no two nodes connected to the same node are ever connected together. At the other end of the spectrum, a network in which all nodes are connected with each other has a clustering of 1. In a fully-connected network, the average path length is exactly 1 regardless of the network size. Therefore, such a network is a small world, although not particularly interesting.

In general, in loosely connected networks, the presence of "hubs" that connect many nodes is sufficient to reduce the average path length and increase the clustering. This is what happens in social media: some people with many connections in FaceBook or LinkedIn connect through them pairs of people who would normally be "very far" from each other.

To go back to what motivated me to write this article, the power-law distributions I showed in my previous articles might as well to do with small-world networks, but to prove it is for all practical purposes impossible.

I have to think about it...

Power laws and small-worldness have nothing to do with each other. These are orthogonal concepts. Neither implies the other (whether or not one includes clustering as part of the definition, which is highly non-standard to do). The advantage of the notion with mean-geodesic distance and scaling is that one can prove mathematically rigorous results. For once, the mathematicians seem to have mostly won this battle. :)

ReplyDeleteFair enough. I still don't like a definition that only applies to the networks you are able to grow. I understand the need for a rigorous framework, but I find the "static" concept of low average distance and high clustering more interesting. Not surprisingly, I studied Physics instead of Mathematics! :-)

ReplyDelete