Classroom exercise with networks

In the previous post I looked at graphs created from representing geographic regions with nodes and connecting nodes with edges if the corresponding regions share a border.

It’s an interesting exercise to recover the geographic regions from the network. For example, take a look at the graph for the continental United States.

It’s easy to identify Alaska in the graph. The node on the left represents Maine because Maine is the only state to border exactly one other state. From there you can bootstrap your way to identifying the rest of the states.

Math class

This could make a fun classroom exercise in a math class. Students will naturally come up with the idea of the degree of a node, the number of edges that meet that node, because that’s a handy way to solve the puzzle: the only possibilities for a node of degree n are states that border n other states.

This also illustrates that networks preserve topology, not geometry. That is, the connectivity information is retained, but the shape is dramatically different.

Geography class

Someone asked me on Twitter to make a corresponding graph for Brazil. Mathematica, or at least my version of Mathematica, doesn’t have data on Brazilian states, so I made an adjacency graph using GraphViz.

adjacency graph of Brazilian states

Labeling the blank nodes is much easier for Brazil than for the US because Brazil has about half as many states, and the topology of the graph gives you more to work with. Three nodes connect to only one other node, for example.

Here the exercise doesn’t involve as much logic, but the geography is less familiar, unless of course you’re more familiar with Brazil than the US. Labeling the graph will require staring at a map of Brazil and you might accidentally learn a little about Brazil.

GraphViz

The labeled version of the graph above is available here. And here are the GraphViz source files that make the labeled and unlabeled versions.

The layout of a GraphViz file is very simple. The file looks like this:

    graph G {

        layout=sfdp

        AC [label="Acre"]
        AL [label="Alagoas"]
        ...
        AC -- AM
        AC -- RO
        ...
    }

There are three parts: a layout, node labels, and connections.

GraphViz has several layout engines, and the sfdp one matched what I was looking for in this case. Other layout options lead to overlapping edges that were confusing.

The node names AC, AL, etc. do not appear in the output. They’re just variable names for your convenience. The text inside the label is what appears in the final output. I’ll give an example in the next post in which it’s very convenient for the variables to be different from the labels. The order of the labels doesn’t matter, only which variables are associated with which labels.

Finally, the lines with variables separated by dashes are the connection data. Here we’re telling GraphViz to connect node AC to nodes AM and RO. The order of these lines doesn’t matter.

Related posts

Adjacency networks

Suppose you want to color a map with no two bordering regions having the same color. If this is a map on a plane, you can do this using only four colors, but maybe you’d like to use more.

You can reduce the problem to coloring the nodes in a graph. Each node corresponds to a region, and there is an edge between two nodes if and only if their corresponding regions share a border.

Here is a sort of topologists’s or graph theorist’s view of the continental United States.

This was created using the following sample code from the Mathematica documentation.

    RelationGraph[MemberQ[#2["BorderingStates"], #1] &, 
        EntityList[
            EntityClass["AdministrativeDivision", "ContinentalUSStates"]]]

You can recognize Maine in the graph because it’s the only state that only borders one other state. Alaska is also easy to locate. Exercise for the reader: mentally add Hawaii to the graph.

The analogous graph for Texas counties took much longer to draw: there are 49 continental US states but 254 Texas counties.

This was created with the following code.

    RelationGraph[MemberQ[#2["BorderingCounties"], #1] &, 
        EntityList[EntityClass["AdministrativeDivision", "USCountiesTexas"]]]

You can find El Paso county in the top left; it only borders one county just as Maine only borders one state.

Related posts

Shortest tours of Eurasia and Oceania

This is the final post in a series of three posts about shortest tours, solutions to the so-called traveling salesmen problem.

The first was a tour of Africa. Actually two tours, one for the continent and one for islands. See this post for the Mathematica code used to create the tours.

The second was about the Americas: one tour for the North American continent, one for islands, and one for South America.

This post will look at Eurasia and Oceania. As before, I limit the tours to sovereign states, though there are disputes over which regions are independent nations. I first tried to do separate tours of Europe and Asia, but this would require arbitrarily categorizing some countries as European or Asian. The distinction between Asia and Oceania is a little fuzzy too, but not as complicated.

Oceania

Here’s a map of the tour of Oceania.

Here’s the order of the tour:

  1. Australia
  2. East Timor
  3. Indonesia
  4. Palau
  5. Papua New Guinea
  6. Micronesia
  7. Marshall Islands
  8. Nauru
  9. Solomon Islands
  10. Vanuatu
  11. Fiji
  12. Tuvalu
  13. Kiribati
  14. Samoa
  15. Tonga
  16. New Zealand

The total length of the tour is 28,528 kilometers or 17,727 miles.

Eurasia

Here’s a map of the the Eurasian tour.

Here’s the order of the tour:

  1. Iceland
  2. Norway
  3. Sweden
  4. Finland
  5. Estonia
  6. Latvia
  7. Lithuania
  8. Belarus
  9. Poland
  10. Czech Republic
  11. Slovakia
  12. Hungary
  13. Romania
  14. Moldova
  15. Ukraine
  16. Georgia
  17. Armenia
  18. Azerbaijan
  19. Turkmenistan
  20. Uzbekistan
  21. Afghanistan
  22. Pakistan
  23. Tajikistan
  24. Kyrgyzstan
  25. Kazakhstan
  26. Russia
  27. Mongolia
  28. China
  29. North Korea
  30. South Korea
  31. Japan
  32. Taiwan
  33. Philippines
  34. East Timor
  35. Indonesia
  36. Brunei
  37. Malaysia
  38. Singapore
  39. Cambodia
  40. Vietnam
  41. Laos
  42. Thailand
  43. Myanmar
  44. Bangladesh
  45. Bhutan
  46. Nepal
  47. India
  48. Sri Lanka
  49. Maldives
  50. Yemen
  51. Oman
  52. United Arab Emirates
  53. Qatar
  54. Bahrain
  55. Saudi Arabia
  56. Kuwait
  57. Iran
  58. Iraq
  59. Syria
  60. Lebanon
  61. Jordan
  62. Israel
  63. Cyprus
  64. Turkey
  65. Bulgaria
  66. North Macedonia
  67. Serbia
  68. Bosnia and Herzegovina
  69. Montenegro
  70. Albania
  71. Greece
  72. Malta
  73. Italy
  74. San Marino
  75. Croatia
  76. Slovenia
  77. Austria
  78. Liechtenstein
  79. Switzerland
  80. Monaco
  81. Andorra
  82. Spain
  83. Portugal
  84. France
  85. Belgium
  86. Luxembourg
  87. Germany
  88. Netherlands
  89. Denmark
  90. United Kingdom
  91. Algeria

The total length of the tour is 61,783 kilometers or 38,390 miles.

Three tours of the Americas

The previous post looked at an optimal tour of continental Africa. This post will give analogous tours of continental North America, North American Islands, and South America. The next post looks at Eurasia and Oceania.

North American Continent

Here’s the North American continental tour.

The order of the tour is as follows.

  1. Canada
  2. United States
  3. Mexico
  4. Guatemala
  5. El Salvador
  6. Costa Rica
  7. Panama
  8. Nicaragua
  9. Honduras
  10. Belize

North American Islands

Here’s a tour of the North American islands.

Trinidad and Tabago is about ten miles from the South American continent, but the country is classified as being part of North America, at least for some purposes.

Here is the order of the tour.

  1. Cuba
  2. Jamaica
  3. Haiti
  4. Dominican Republic
  5. Grenada
  6. Trinidad and Tobago
  7. Barbados
  8. Saint Vincent and the Grenadines
  9. Saint Lucia
  10. Dominica
  11. Antigua and Barbuda
  12. Saint Kitts and Nevis
  13. Bahamas

South American tour

Here’s the tour of South America.

Here’s the order of the tour:

  1. Venezuela
  2. Guyana
  3. Suriname
  4. French Guiana
  5. Brazil
  6. Paraguay
  7. Uruguay
  8. Falkland Islands
  9. Argentina
  10. Chile
  11. Bolivia
  12. Peru
  13. Ecuador
  14. Colombia

A traveling salesman tour of Africa

Suppose you’d like to tour Africa, visiting each country once, then returning to your starting point, minimizing the distance traveled.

Here’s my first attempt at a solution using Mathematica, based on an example in the documentation for FindShortestTour.

    africa = CountryData["Africa"]
    FindShortestTour[africa]
    GeoGraphics[{Thick, Red, GeoPath[africa[[%[[2]]]]]}]


This produced the following map:

Hmm. Maybe I should have been more specific about what I mean by “Africa.” My intention was to find a tour of continental Africa, i.e. not including islands. This means I needed to remove several items from Mathematica’s list of African countries. Also, I had in mind sovereign states, not territories of overseas states and not disputed territories.

After doing this, the map is more like what I’d expect.

The tour is then

  1. Algeria
  2. Tunisia
  3. Libya
  4. Egypt
  5. Chad
  6. Central African Republic
  7. Democratic Republic of the Congo
  8. Burundi
  9. Rwanda
  10. Uganda
  11. South Sudan
  12. Sudan
  13. Eritrea
  14. Djibouti
  15. Somalia
  16. Ethiopia
  17. Kenya
  18. Tanzania
  19. Malawi
  20. Zambia
  21. Mozambique
  22. Zimbabwe
  23. Eswatini
  24. Lesotho
  25. South Africa
  26. Botswana
  27. Namibia
  28. Angola
  29. Republic of the Congo
  30. Gabon
  31. Equatorial Guinea
  32. Cameroon
  33. Nigeria
  34. Niger
  35. Mali
  36. Burkina Faso
  37. Benin
  38. Togo
  39. Ghana
  40. Ivory Coast
  41. Liberia
  42. Sierra Leone
  43. Guinea
  44. Guinea-Bissau
  45. Gambia
  46. Senegal
  47. Mauritania
  48. Morocco

The initial tour, including islands, foreign territories, and Western Sahara, was 23,744 miles or 38,213 kilometers. The second tour was 21,074 miles or 33915 kilometers.

Here’s a tour of just the islands, excluding foreign territories.

The order of the tour is

  1. Cape Verde
  2. Seychelles
  3. Mauritius
  4. Madagascar
  5. Comoros
  6. São Tomé and Príncipe

This tour is 13,034 miles or 20,976 kilometers.

Update: See the next two posts for tours of the Americas and Eurasia and Oceania.

Related posts

Master / Apprentice relationship graph in Star Wars

Here’s a graph of master / apprentice relationships for Jedi and Sith in the Star Wars movies. It’s not exhaustive, but it covers the main relationships in Episodes I through VI.

Jedi master/padawan relationships

Here’s what you get if you add a dashed arrow for who killed whom.

Master / apprentice relationships with who killed whom

Here’s the dot (GraphVis) code that created the diagrams.

digraph G {

    Vader [label="Anakin Skywalker/\nDarth Vader"];
    Dooku [label="Dooku/\nDarth Tyranus"];
    Luke  [label="Luke Skywalker"];
    Ben   [label="Obi-Wan Kenobi"];
    Liam  [label="Qui-Gon Jinn"];
    DS    [label="Palpatine/\nDarth Sidious"]
    DP    [label="Darth Plagueis"];
    DM    [label="Darth Maul"];

    Yoda  -> Dooku;
    Yoda  -> Luke;
    Dooku -> Liam;
    Liam  -> Ben;
    Liam  -> Vader;
    Ben   -> Vader;
    Ben   -> Luke;
    DS    -> Dooku;
    DS    -> Vader;
    DP    -> DS;
    DS    -> DM;
    
    Vader -> DS [style=dashed];
    Vader -> Dooku [style=dashed];
    Vader -> Ben [style=dashed];
    Ben   -> DM [style=dashed];    
    DM    -> Liam [style=dashed];
    DS    -> DP [style=dashed];
}

More network visualization posts

Rook graphs and Paley graphs

An m by n rook graph is formed by associating a node with each square of an m by n chess board and connecting two nodes with an edge if a rook can legally move from one to the other. That is, two nodes are connected if they are either in the same row or in the same column.

A Paley graph of order q is a graph with q nodes where q is a prime power and congruent to 1 mod 4. Label each node with an element of the finite field F with q elements and connect two nodes with an edge if the difference between their labels is a quadratic residue. That is, for nodes labeled a and b, connect these nodes if there exist an x in F such that ab = x².

We’ll look at a graph which is both a rook graph and a Paley graph: the 3 by 3 rook graph is the same as the Paley graph of order 9.

Rook graph 3 x 3

Constructing the rook graph is easy. If we number our 3 by 3 chess board as follows

    1 2 3
    4 5 6
    7 8 9

then we connect 1 to 2, 3, 4, and 7. We connect 2 to 1, 3, 5, and 8. Etc. This gives us the following graph.

Rook graph

(The graph was made with Sketchviz introduced here.)

Paley graph of order 9

They Paley graph is a little more complicated to construct. First we need to explain what a field with 9 elements is.

A field with 3 elements is just integers mod 3. We can think of the field with 9 elements as the analog of complex numbers: numbers of the form abi where a and b are integers mod 3. As with the complex numbers, i² = -1.

When we square each of the nine elements of our field, we get four distinct non-zero results: 1, 2, i, and 2i. For example, i is one of our squares because

(1 + 2i)(1 + 2i) = (1 – 4) + 4i = i

Here we used -3 = 0 mod 3 and 4 = 1 mod 3.

Since our squares are 1, 2, i, and 2i, that says 0 is connected to these four values. Then we connect 1 to 2, 0, 1 + i and 1 + 2i because each of these numbers minus 1 is a square. If we keep doing this for all nine nodes, we get the following graph.

Paley graph

Isomorphism

It’s clear that the two graphs above are relabelings of each other. We can see this explicitly by relabeling our chess board as follows:

    0    i    2i
    1  1+i  1+2i
    2  2+i  2+2i

Moving horizontally adds i, which is a square, and moving vertically adds 1, which is a square. So the Paley connections are rook connections.

Uniqueness

Are there more rook graphs that are isomorphic to Paley graphs? No, this is the only one.

Every pair of vertices in a rook graph that are not neighbors have two common. If the points with coordinates (a, b) and (c, d) are on different rows and columns, then both are connected to (a, d) and (c, b).

In a Paley graph of order q, every pair of nodes that aren’t neighbors have (q – 1)/4 common neighbors. For a Paley graph to be isomorphic to a rook graph, we must have (q – 1)/4 = 2, and so q = 9.

More on graphs

Six degrees of Kevin Bacon, Paul Erdos, and Wikipedia

I just discovered the website Six Degrees of Wikipedia. It lets you enter two topics and it will show you how few hops it can take to get from one to the other.

Since the mathematical equivalent of Six Degrees of Kevin Bacon is Six degrees of Paul Erdős, I tried looking for the distance between Kevin Bacon and Paul Erdős and found this:

Erdos to Bacop

Kevin Bacon and Paul Erdős are both known for their prolific collaborations. Many actors have acted with Kevin Bacon, or acted with actors who have acted with him, etc. Many mathematicians wrote papers with Erdős, or have written papers with people who have written papers with him, etc. I’m four degrees of separation from Paul Erdős last I checked.  [1]

Here’s a more complex graph showing the three degrees of separation between Thomas Aquinas and Thomas the Tank Engine.

Aquinas to Tank Engine

Note that the edges are directed, links from the starting page to pages that lead to the target page. If you reverse the starting and ending page, you get different results. For example, there are still three degrees of separation if we reverse our last example, going from Thomas the Tank Engine to Thomas Aquinas, but there are about twice as many possible paths.

More network posts

[1] Your Erdős number, the degrees of separation between your and Paul Erdős, can decrease over time because his collaborators are still writing papers. Even Erdős himself continued to publish posthumously because people who began papers with him finished them years later.

Spectral sparsification

The latest episode of My Favorite theorem features John Urschel, former offensive lineman for the Baltimore Ravens and current math graduate student. His favorite theorem is a result on graph approximation: for every weighted graph, no matter how densely connected, it is possible to find a sparse graph whose Laplacian approximates that of the original graph. For details see Spectral Sparsification of Graphs by Dan Spielman and Shang-Hua Teng.

You could view this theorem as a positive result—it’s possible to find good sparse approximations—or a negative result—the Laplacian doesn’t capture very well how densely a graph is connected.

Related: Blog posts based on my interview with Dan Spielman at the 2014 Heidelberg Laureate Forum.

Random minimum spanning trees

I just ran across a post by John Baez pointing to an article [the link has gone away] by Alan Frieze on random minimum spanning trees.

Here’s the problem.

  1. Create a complete graph with n nodes, i.e. connect every node to every other node.
  2. Assign each edge a uniform random weight between 0 and 1.
  3. Find the minimum spanning tree.
  4. Add up the edges of the weights in the minimum spanning tree.

The surprise is that as n goes to infinity, the expected value of the process above converges to the Riemann zeta function at 3, i.e.

ζ(3) = 1/1³ + 1/2³ + 1/3³ + …

Incidentally, there are closed-form expressions for the Riemann zeta function at positive even integers. For example, ζ(2) = π² / 6. But no closed-form expressions have been found for odd integers.

Simulation

Here’s a little Python code to play with this.

    import networkx as nx
    from random import random

    N = 1000

    G = nx.Graph()
    for i in range(N):
        for j in range(i+1, N):
            G.add_edge(i, j, weight=random())

    T = nx.minimum_spanning_tree(G)
    edges = T.edges(data=True)

    print( sum( e[2]["weight"] for e in edges ) )

When I ran this, I got 1.2307, close to ζ(3) = 1.20205….

I ran this again, putting the code above inside a loop, averaging the results of 100 simulations, and got 1.19701. That is, the distance between my simulation result and ζ(3) went from 0.03 to 0.003.

There are two reasons I wouldn’t get exactly ζ(3). First, I’m only running a finite number of simulations (100) and so I’m not computing the expected value exactly, but only approximating it. (Probably. As in PAC: probably approximately correct.) Second, I’m using a finite graph, of size 1000, not taking a limit as graph size goes to infinity.

My limited results above suggest that the first reason accounts for most of the difference between simulation and theory. Running 100 replications cut the error down by a factor of 10. This is exactly what you’d expect from the central limit theorem. This suggests that for graphs as small as 1000 nodes, the expected value is close to the asymptotic value.

You could experiment with this, increasing the graph size and increasing the number of replications. But be patient. It takes a while for each replication to run.

Generalization

The paper by Frieze considers more than the uniform distribution. You can use any non-negative distribution with finite variance and whose cumulative distribution function F is differentiable at zero. The more general result replaces ζ(3) with ζ(3) / F ‘(0). We could, for example, replace the uniform distribution on weights with an exponential distribution. In this case the distribution function is 1 − exp(−x), at its derivative at the origin is 1, so our simulation should still produce approximately ζ(3). And indeed it does. When I took the average of 100 runs with exponential weights I got a value of 1.2065.

There’s a little subtlety around using the derivative of the distribution at 0 rather than the density at 0. The derivative of the distribution (CDF) is the density (PDF), so why not just say density? One reason would be to allow the most general probability distributions, but a more immediate reason is that we’re up against a discontinuity at the origin. We’re looking at non-negative distributions, so the density has to be zero to the left of the origin.

When we say the derivative of the distribution at 0, we really mean the derivative at zero of a smooth extension of the distribution. For example, the exponential distribution has density 0 for negative x and density exp(−x) for non-negative x. Strictly speaking, the CDF of this distribution is 1 − exp(−x) for non-negative x and 0 for negative x. The left and right derivatives are different, so the derivative doesn’t exist. By saying the distribution function is simply exp(−x), we’ve used a smooth extension from the non-negative reals to all reals.