Can you hear the shape of a network?

Posted on 2 February 2016 by John

Mark Kac asked in 1966 whether you can hear the shape of a drum. In other words, can two drums, made of the same material, produce the exact same sound but have different shapes? More formally, Kac asked whether the eigenvalues of the Laplace’s equation with zero boundary conditions uniquely determine the shape of a region in the plane. The question remained open until 1992. No, you can’t always hear the shape of a drum. But often you can. With some restrictions on the regions, the shape is uniquely determined by the sound, i.e., the Laplace spectrum.

Ten years before Kac asked about hearing the shape of a drum, Günthard and Primas asked the analogous question about graphs. Can you hear the shape of a graph? That is, can two different graphs have the same eigenvalues? At the time, the answer was believed to be yes, but a year later it was found to be no, not always [1]. So the next natural question is when can you hear the shape of a graph, i.e. under what conditions is a graph determined by its eigenvalues? It depends on which matrix you’re taking the eigenvalues of, but under some conditions some matrix spectra uniquely determine graphs. We don’t know in general how common it is for spectra to uniquely determine graphs.

Here are two graphs that have the same adjacency matrix spectra, first published in [2]:

Both have adjacency spectra [-2, 0, 0, 0, 2]. But the graphs are not cospectral as far as the Laplacian is concerned. Their Laplace spectra are [0, 0, 2, 2, 4] and [0, 1, 1, 1, 5] respectively. We could tell that the Laplace spectra would be different before computing them because the second smallest Laplace eigenvalue is positive if and only if a graph is connected.

The graphs below are cospectral for the adjacency, Laplacian, and unsigned Laplacian matrices.

Both graphs have the same number of nodes and edges, and every node has degree 4 in both graphs. But the graph on the left contains more triangles than the one on the right, so they cannot be isomorphic.

One way to test whether two graphs are isomorphic is to compute their spectra. If the spectra are different, the graphs are not isomorphic. So spectral analysis gives a way to show that two graphs are not isomorphic in polynomial time, though the test may be inconclusive.

If two graphs do have the same spectra, what is the probability that they are isomorphic? In [1] the authors answer this question empirically for graphs of order up to 11. Very roughly, there’s about an 80% chance graphs with the same adjacency matrix spectrum are isomorphic. The chances go up to 90% for the Laplacian and 95% for the signless Laplacian.

***

[1] Edwin R. van Dam, Willem H. Haemers. Which graphs are determined by their spectrum? Linear Algebra and its Applications 373 (2003) 241–272

[2] D.M. Cvetkovi´c, Graphs and their spectra, Univ. Beograd. Publ. Elektrotehn. Fak. Ser. Mat. Fiz. 354–356 (1971) 1–50.

Bounding a graph’s diameter by its spectrum

Posted on 28 January 2016 by John

You can get upper and lower bounds on the diameter of a connected graph G from its spectrum.

If G has r distinct eigenvalues—whether of the adjacency matrix A, Laplacian L, or signless Laplacian Q—then the its diameter d is at most r − 1. (Definitions of these matrices here.)

If G has n vertices then we have an lower bound on the diameter: d ≥ 4/nλ₂ where λ₂ is the algebraic connectivity, the second eigenvalue of the graph Laplacian.

Let Δ be the maximum degree of G. Then we also have the following upper bound:

$d \leq 2 \left\lceil \sqrt{\frac{2\Delta}{\lambda_2}} \log_2 n \right\rceil$

How well do these bounds work? Let’s look at a random graph with 50 vertices.

The diameter of this graph is 3. The highest vertex degree is 22.

Each of the three matrices A, L, and Q have 50 distinct eigenvalues. So that says the diameter should be less than 49, which it certainly is.

The lower bound based on algebraic connectivity is 0.0129, and the upper bound is 32.

Altogether, we estimated the diameter, whose true value is 3, to be between 0.0129 and 32. Which is correct, but not very tight.

Maybe the bounds weren’t very good because the diameter was small. Let’s look at graph with a much bigger diameter, a circle with 50 nodes.

Now this graph has diameter 25. The maximum node degree is 2, because every node has degree 2.

Each of the three matrices A, L, and Q have 26 distinct eigenvalues, which predicts that the graph diameter will be no more than 25. In this case, the upper bound is exact.

The algebraic connectivity gives a lower bound of 5.0727 for the diameter and an upper bound of 180. So while these bounds are correct, they are not especially tight either.

Source: Handbook of Linear Algebra

More spectral graph theory

Graph regularity and Laplace eigenvalues

Posted on 27 January 2016 by John

You can tell whether a graph is regular, or nearly regular, by looking at its eigenvalues.

Let G be a graph with n vertices and m edges and let L be the Laplacian matrix of G. Then the sum of the eigenvalues of L is 2m. (The diagonal of L contains the degrees of each node, so it’s trace is twice the number of edges. And the trace of a matrix is the sum of its eigenvalues.)

Let λ_i be the eigenvalues of the L. Then we have

$\sum \lambda_i = 2m \leq \sqrt{n \sum \lambda_i(\lambda_i - 1)}$

with equality if and only if G is regular. [reference]

We can try this out by starting with a dodecahedron graph. It’s a regular graph, since each vertex has three edges. There are a total of 30 edges and the eigenvalues sum to 60. The right-most expression is 60 as expected.

Next remove one of the edges from the graph. The result bears a remarkable resemblance to Iron Man.

Now there are 29 edges, so 2m = 58. The right-most expression above is now 58.31. It’s bigger than 58 because the new graph is not regular. But as expected, it’s close to 58 since the graph is close to regular.

Understanding a graph by peeling away nodes

Posted on 25 January 2016 by John

One way to get some understanding of a graph is to peel away nodes, starting with the least connected. The k-core of a graph is the maximal subgraph such that every vertex has degree at least k. The k-shell is the set of vertices that are part of the k-core but not part of the (k+1)-core.

For example, the 0-core of a graph is simply the entire graph since every vertex has at least zero edges; a vertex can’t have a negative number of edges. The 1-core of a graph contains the vertices that are connected to other vertices. You form the 1-core by throwing away the 0-shell, the set of isolated vertices. You move from the 1-core to the 2-core by removing nodes of degree 1 until everything that’s left has degree at least 2.

NB: You do not form the k-core simply by discarding nodes of degree less than k. For example, consider the star graph below.

star graph with five peripheral nodes

If you peel off the nodes of degree 1, eventually you have nothing left. There is no 2-core, even though the original graph had a node of degree 5. Or going back to the definition, there is no subgraph such that every vertex has degree 2, and so no maximal subgraph. The node of degree 5 only has degree five because it is connected to nodes of degree 1.

The k-cores of a complete graph of degree n are simply the entire graph, for k ≤ n. The k-shells are empty for k < n.

Keyword overlap

Let’s see what the k-cores and k-shells look like for some more complex graphs. First, let’s go back to the post looking at the overlap in keywords. The edges of the graph are the 200 search terms that have lead visitors to the site at least twice. We connect two search terms if they share a word. The 56-core of the graph has 57 vertices, and the 57-core is empty. So the 56-core is a complete graph. Here’s the distribution of the k-shell sizes.

Random graphs

Next let’s look at a random graph. As in the post on spectra of random graphs, we look at an Erdős-Rényi graph G_n,p. First we start with n = 1000 and p = 0.01. So we start with 1000 nodes, and there’s a 0.01 that any particular pair of nodes is connected. Here are the core sizes for one such random graph:

[1000 1000 999 999 990 970 914 793]

So the 7-core has 793 nodes, then the 8-core is empty. I ran this ten times and the 7-core sizes ranged from size 709 to 849. But every time the 8-core was empty. The largest value of k such that the k-core is non-empty is called the degeneracy of the graph, so we could say the degeneracy was 7 every time.

I reran with n = 2000. The degeneracy was 14 every time. The 14-cores were in the range 1723 to 1810.

Finally I set n = 4000 and ran again. The degeneracy was 29 three times and 30 seven times. The final core sizes ranged from 3451 to 3714.

The experiments above suggest that the k-core size abruptly drops to zero, at a predictable value of k, and with a fairly predictable core size. There are papers on k-core size of random graphs, but I have not yet read any. (I have one in my to-read queue that someone let me know about after posting this article.)

Bipartite graphs and the signless Laplacian

Posted on 19 January 2016 by John

The vertices of a bipartite graph can be divided into two sets where edges only go from one set to the other; there are no edges within each set. For example, in the graph below edges connect blue to green nodes, but not blue to blue or green to green.

In this example there are two connected bipartite components. There is a variation on the graph Laplacian called the signless Laplacian whose eigenvalues can tell you how many bipartite components a graph has.

The graph Laplacian is L = D − A where D is the diagonal matrix whose entries are the degrees of each node and A is the adjacency matrix. The signless Laplacian is Q = D + A. Since D and A have only non-negative entries, so does Q.

The smallest eigenvalue of the Laplacian L is always zero, and the second smallest eigenvalue is positive if and only if the graph is connected. The smallest eigenvalue of the signless Laplacian Q is positive if and only if the graph is connected and bipartite. The multiplicity of zero as an eigenvalue equals the number of connected bipartite components.

The graph above has two connected bipartite components. Zero is a double eigenvalue of the signless Laplacian, and the next eigenvalue is 0.2603.

Next we connect two of the blue nodes. Now the graph is connected and bipartite.

Zero is now a single eigenvalue, and the second eigenvalue is 0.0968. This second eigenvalue is positive because there is only one component now, but it is small because the graph can almost be separated into two bipartite components.

Next we remove the edge connecting the two components, but we draw an edge between two of the green vertices.

Now there are two components, but only one of them is bipartite. The smallest eigenvalue is zero, with multiplicity 1, and the second eigenvalue is 0.2015.

Next we repeat the examples above with more edges. Now each component of the graph is a complete bipartite graph.

The smallest eigenvalue is 0, with multiplicity 2. But now the next eigenvalue is 3, This value is larger than before because the components are better connected.

As before, we now join two of the blue nodes.

Now there is one bipartite component. The smallest eigenvalue is 0 with multiplicity 1. The second eigenvalue is 0.2103.

Finally, we remove the edge connecting the two components and connect two of the green vertices.

Now the smallest eigenvalue is 0 with multiplicity 1. While there are two components, there’s only one bipartite component. The next eigenvalue is 0.4812.

More spectral graph theory

Spectra of random graphs

Posted on 17 January 2016 by John

Create a graph by first making a set of n vertices and then connecting nodes i and j, i > j, with probability p. This is known as the Erdős-Rényi graph G_n,p. Here’s such a graph with n = 100 and p = 0.1.

Erdős-Rényi graph

Then the spectrum of the adjacency matrix A of the graph bounces around the spectrum of the expected value of A. This post will illustrate this to give an idea how close the two spectra are.

The adjacency matrix A for our random graph has zeros along the diagonal because the model doesn’t allow loops connecting a vertex with itself. The rest of the entries are 1 with probability p and 0 with probability 1 − p. The entries above the diagonal are independent of each other, but the entries below the diagonal are determined by the entries above by symmetry. (An edge between i and j is also an edge between j and i.)

The expected value of A is a matrix with 0’s along the diagonal and p‘s everywhere else. This matrix has one eigenvalue of p(n − 1) and (n − 1) eigenvalues −p. Let’s see how close the spectra of a dozen random graphs come the spectra of the expected adjacency matrix. As above, n = 100 and p = 0.1.

Because the eigenvalues are close together, I jittered each sample horizontally to make the values easier to see. Notice that the largest eigenvalue in each sample is around 10. The expected value is p(n − 1) = 9.9, so the values are close to their expected value.

The other eigenvalues seem to be centered around 0. This is consistent with the expected value of −p = −0.1. The most obvious thing about the smaller eigenvalues is not their mean but their range. How far can might the eigenvalues be from their expected value? Fan Chung and Mary Radcliffe give the following estimate in their paper On the Spectra of General Random Graphs.

For G_n,p, if p > 8 log(√2 n)/9n, then with probability at least 1 − 1/n we have

$|\lambda_i(A) - \lambda_i(p(J-I))| \leq \sqrt{8np \log(\sqrt{2}n)}$

Here J is the matrix of all 1’s and I is the identity matrix. In our case the theorem says that with probability at least 0.99, we expect the eigenvalues of our random graph to be within 19.903 of their expected value. This is a pessimistic bound since no value is much more than 5 away from its expected value.

Visualizing search keyword overlap

Posted on 15 January 2016 by John

The other day someone asked me to look at the search data for Debug Pest Control, a pest management company based in Rhode Island. One of the things I wanted to visualize was how the search terms overlapped with each other.

To do this, I created a graph where the nodes are the keywords and edges join nodes that share a word. (A “keyword” is actually a set of words, whatever someone types into a search.) The result was a solid gray blob be cause the nodes are so densely connected.

I thinned the graph by first only looking at keywords that have occurred at least twice. Then I made my connection criteria a little stronger. I take the union of the individual words in two keywords and create a connection if at least 1/3 of these words appear in both keywords.

(You can click on the image to see a larger version.)

The area of each red dot is proportional to the number of times someone has come to the site using that keyword. You can see that there’s long-tail behavior: while some dots are much larger than others, there are a lot of little dots.

In case you’re curious, these are the top keywords:

pest control
ri
pest control ri
pest control rhode island,
plants that keep bees away
poisonous snakes in ri
plants that keep bees and wasps away
plants to keep bees away
plants that keep wasps away
pest control companies

The keywords were filtered a little before making the graph. Some of the top keywords contained “debug.” These may have been people using the term to refer to eliminating pests from their property, but more likely they were people who knew the company name. By removing “debug” from each of the keywords, the results focus on people searching for the company’s services rather than the company itself.

Spectral coordinates in Python

Posted on 15 January 2016 by John

A graph doesn’t have any geometric structure unless we add it. The vertices don’t come with any position in space. The same graph can look very different when arranged different ways.

Spectral coordinates are a natural way to draw a graph because they are determined by the properties of the graph, not arbitrary aesthetic choices. Construct the Laplacian matrix and let x and y be the eigenvectors associated with the second and third eigenvalues. (The smallest eigenvalue is always zero and has an eigenvector of all 1’s. The second and third eigenvalues and eigenvectors are the first to contain information about a graph.) The spectral coordinates of the ith node are the ith components of x and y.

We illustrate this with a graph constructed from a dodecahedron, a regular solid with twenty vertices and twelve pentagonal faces. You can make a dodecahedron from a soccer ball by connecting the centers of all the white hexagons. Here’s one I made from Zometool pieces for a previous post:

Although we’re thinking of this graph as sitting in three dimensions, the nodes being the corners of pentagons etc., the graph simply says which vertices are connected to each other. But from this information, we can construct the graph Laplacian and use it to assign plane coordinates to each point. And fortunately, this produces a nice picture:

Here’s how that image was created using Python’s NetworkX library.

    import networkx as nx
    import matplotlib.pyplot as plt
    from scipy.linalg import eigh

    # Read in graph and compute the Laplacian L ...
    
    # Laplacian matrices are real and symmetric, so we can use eigh, 
    # the variation on eig specialized for Hermetian matrices.
    w, v = eigh(L) # w = eigenvalues, v = eigenvectors

    x = v[:,1] 
    y = v[:,2]
    spectral_coordinates = {i : (x[i], y[i]) for i in range(n)}
    G = nx.Graph()
    G.add_edges_from(graph)

    nx.draw(G, pos=spectral_coordinates)
    plt.show()

Update: After posting this I discovered that NetworkX has a method draw_spectral that will compute the spectral coordinates for you.

More spectral graph theory

Spectra of complete graphs, stars, and rings

Posted on 9 January 2016 by John

A few examples help build intuition for what the eigenvalues of the graph Laplacian tell us about a graph. The smallest eigenvalue is always zero (see explanation in footnote here).

For a complete graph on n vertices, all the eigenvalues except the first equal n. The eigenvalues of the Laplacian of a graph with n vertices are always less than or equal to n, this says the complete graph has the largest possible eigenvalue. This makes sense in light of the previous post: adding edges increases the eigenvalues. When you add as many edges as possible, you get the largest possible eigenvalues.

Complete graph on five vertices

$\left[ \begin{array}{rrrrr} 4 & -1 & -1 & -1 & -1 \\ -1 & 4 & -1 & -1 & -1 \\ -1 & -1 & 4 & -1 & -1 \\ -1 & -1 & -1 & 4 & -1 \\ -1 & -1 & -1 & -1 & 4 \end{array} \right]$

Conversely, if all the non-zero eigenvalues of the graph Laplacian are equal, the graph is complete.

Next we look at the case of a star with n − 1 vertices connected to a central vertex. The smallest eigenvalue is zero, as always, and the largest is n. The ones in the middle are all 1. In particular, the second eigenvalue, the algebraic connectivity, is 1. It’s positive because the graph is connected, but it’s not large because the graph is not well connected: if you remove the central vertex it becomes completely disconnected.

star graph with five peripheral nodes

$\left[ \begin{array}{cccccc} 5 & -1 & -1 & -1 & -1 & -1 \\ -1 & 1 & 0 & 0 & 0 & 0 \\ -1 & 0 & 1 & 0 & 0 & 0 \\ -1 & 0 & 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 & 1 & 0 \\ -1 & 0 & 0 & 0 & 0 & 1 \end{array} \right]$

Finally, we look at a cyclical graph, a ring with n vertices. Here the eigenvalues are 2 − 2 cos(2πk/n) where 0 ≤ k ≤ n/2. In particular, smallest non-zero eigenvalue is 2 − 2 cos(2π/n) and so as n increases, the algebraic connectivity approaches zero. This is curious since the topology doesn’t change at all. But from a graph theory perspective, a big ring is less connected than a small ring. In a big ring, each vertex is connected to a small proportion of the vertices, and the average distance between vertices is larger.

Cyclic graph with five nodes

$\left[ \begin{array}{rrrrr} 2 & -1 & 0 & 0 & -1 \\ -1 & 2 & -1 & 0 & 0 \\ 0 & -1 & 2 & -1 & 0 \\ 0 & 0 & -1 & 2 & -1 \\ -1 & 0 & 0 & -1 & 2 \\ \end{array} \right]$

Adding an edge increases eigenvalues

Posted on 8 January 2016 by John

When you add an edge to a graph, each of the eigenvalues of the graph Laplacian either increases or stays the same.

As I wrote in my previous post, the smallest eigenvalue is always 0. The second smallest eigenvalue is 0 if and only if the graph is disconnected. So no edge will increase the smallest eigenvalue, nor will it increase the second eigenvalue if the matrix is still disconnected after adding it. Adding some edges will have more impact on the second eigenvalue than others.

To illustrate this, we start with the graph below.

The second eigenvalue of the Laplacian is 1.2679. When we connect node 2 to node 4, the second eigenvalue increases to 1.4384.

If we connect node 2 to node 3 instead, the second eigenvalue goes up to 1.6972. This makes sense because node 3 is better connected than node 4. Connecting node 2 to node 3 increases the overall connectivity of the graph more than connecting it to node 4. (The second eigenvalue of the graph Laplacian is sometimes called the algebraic connectivity of the graph because it is so strongly associated with the graph’s connectivity.)

Now let’s look at the rest of the eigenvalues. The full set of eigenvalues for the three graphs are given below.

original [0.0000  1.2679  2.0000  4.0000  4.0000  4.7321]
2 -> 4   [0.0000  1.4384  3.0000  4.0000  4.0000  5.5616]
2 -> 3   [0.0000  1.6972  2.3820  4.0000  4.6180  5.3028]

Notice that adding an edge between nodes 2 and 4 or between nodes 2 and 3 increased some of the eigenvalues relative to their original values and left others unchanged.

Swapping the edge between nodes 2 and 4 with an edge between nodes 2 and 3 increases the 2nd and 5th eigenvalue, but lowers the 3rd and 6th.

Related post: Social networks in fact and fiction