I use this blog as a soap box to preach (ahem... to talk :-) about subjects that interest me.

Monday, April 23, 2012

A real small-world network #2

It turns out that I hadn't counted the airports correctly: they are 698, not 699 as I stated in my previous article on this subject.

Furthermore, MMU (Morristown NJ) and SSI (Brunswick GA) are only connected with one another. Same story with MXY (McCarthy AK) and MYK (May Creek AK). Therefore, they are not really part of the network.

This leaves a network of 694 airports. Here is a table showing the minimum path lengths between all possible pairs:

Length    #
      1     3,692
      2    42,794
      3    74,998
      4    74,064
      5    39,197
      6     5,452
      7       268
      8         6
              --------
Total  240,471

Indeed, 240,471 corresponds to all possible airport pairings: 294 * 293 / 2.

As a result, the average path length is 3.498. Quite short, when considering that the network consists of 694 nodes.

Next, I have to calculate the clustering coefficient of the network as the fraction of pairs of nodes that, being directly connected to a third node, are also directly connected with each other.

Another way to see clustering is to consider triplets of nodes that are directly connected. If the nodes of a triplet are connected by two links, the triplet is said to be open. If all three nodes are directly connected to each other (i.e., with three links), the triplet is said to be closed. The clustering coefficient is calculated by dividing the total number of closed triplets by the total number of triplets.

We already know that the total number of open triplets in the network of US airports is 42,794, because that is the number of nodes connected via an intermediate node but not connected directly to each other. Those triplets are open because I programmed the database to rejected duplicate pairs. Therefore, pairs found to be connected via a path of length 2 couldn't be inserted into the database if they were already present with a path of length 1.

I wrote a program that, instead of attempting to insert pairs connected with a path of length 2 into the database, counts the cases in which the third node is connected to the first one. I avoided counting more the once the triplets by only considering triplets in which the nodes were in alphabetical order.

The program counted 17,440 closed triplets, resulting in a clustering factor of 0.29. This is higher than the clustering of the Web and of the Internet, as shown in a presentation slides dated May 2008 (www.newton.ac.uk/programmes/CSM/seminars/050114001.pdf):






Network Vertices  γ Path Cluster
WWW 2 × 108 2.1 16 0.1078
Internet, router 1500002.4 11 0.18
Movie actors 212250 2.3  4.54 0.79
Maths, coauthors 70975 2.5  9.5 0.59
Phone calls 53 × 106 2.1

Synonyms 22311 2.8 4.5 0.7


Original sources: Adamic (1999), Aiello et al (2000), Barabási, Albert (1999),Broder (2000), Watts, Strogatz (1998).

The "γ" indicates how long the right tail of the power-law distribution of cluster sizes is. The lower the γ, the slower the number of clusters decreases with the cluster size. The expression for the power law is as follows: N(size) = K * size, where K is a constant. A γ of 1 means that the number of clusters halves when their size doubles. A γ of 2 means that the number of clusters is reduced to a quarter when their size doubles. And so on.

To calculate the distribution for "my" network, I needed to write another small program that listed the number of links of each airport (i.e., node) or, to say it in a different way, the cluster sizes.

It turned out that there are 128 airports with a single connection to another airport. In case you are curious, the best connected city is Atlanta, with non-stop flights to 158 destinations, followed by Minneapolis/Saint Paul with 146, Chicago with 142, and Denver with 141.

The following plot shows log(N) vs. average log(size). This means that γ = 1.25, and K = 264. K should match 128. Why is it much larger? I haven't got a clue. The discrepancy sounds dramatic, but when considering the logarithms, as it appears on the plot, it doesn't look catastrophic.

And why is the γ half of what reported in the examples? Again, I have no idea.


In any case, now that I have found a bona-fide small-world network, I think for a while I will concentrate on something else.

2 comments:

  1. This comment has been removed by a blog administrator.

    ReplyDelete
    Replies
    1. Advertising in Vietnamese? No thanks!

      Delete