You can associate a matrix with a graph and find the eigenvalues of that matrix. This gives you a spectrum associated with the graph. This spectrum tells you something about the graph analogous to the way the spectrum of a star’s light tells you something about the chemical composition of the star.
So maybe it would be useful to visualize graph spectra the way you visualize chemical spectra. How could you do this?
First, scale the spectrum of the graph to the visual spectrum. There are different kinds of matrices you can associate with a graph, and these have different natural ranges. But if we look at the spectrum of the Laplacian matrix, the eigenvalues are between 0 and n for a graph with n nodes.
Eigenvalues typically correspond to frequencies, not wavelengths, but chemical spectra are often displayed in terms of wavelength. We can’t be consistent with both, so we’ll think of our eigenvalues as wavelengths. We could easily redo everything in terms of frequency.
This means we map 0 to violet (400 nm) and n to red (700 nm). So an eigenvalue of λ will be mapped to 400 + 300 λ/n. That tells us where to graph a spectral line, but what color should it be? StackOverflow gives an algorithm for converting wavelength to RGB values. Here’s a little online calculator based on the same code.
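Here’s a sketch of both conversions in Python. The wavelength-to-RGB function is a simplified, piecewise-linear version in the spirit of the StackOverflow answer (it omits the intensity falloff near the edges of the visible range), and the eigenvalue-to-wavelength function implements the formula above.

    def eigenvalue_to_wavelength(lam, n):
        # map a Laplacian eigenvalue of an n-node graph to [400, 700] nm
        return 400 + 300*lam/n

    def wavelength_to_rgb(w):
        # piecewise-linear approximation of visible colors; w in nm, result in [0, 1]^3
        if 380 <= w < 440:
            return (-(w - 440)/60, 0.0, 1.0)   # violet to blue
        elif 440 <= w < 490:
            return (0.0, (w - 440)/50, 1.0)    # blue to cyan
        elif 490 <= w < 510:
            return (0.0, 1.0, -(w - 510)/20)   # cyan to green
        elif 510 <= w < 580:
            return ((w - 510)/70, 1.0, 0.0)    # green to yellow
        elif 580 <= w < 645:
            return (1.0, -(w - 645)/65, 0.0)   # yellow to red
        elif 645 <= w <= 780:
            return (1.0, 0.0, 0.0)             # red
        return (0.0, 0.0, 0.0)                 # outside the visible range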
Here are some examples. First we start with a random graph.
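Code along the following lines, reusing the two functions defined above, would draw the spectral lines of a random graph. The graph size and edge probability here are arbitrary choices of mine.

    import networkx as nx
    import matplotlib.pyplot as plt

    # uses eigenvalue_to_wavelength and wavelength_to_rgb from above
    n = 50
    G = nx.erdos_renyi_graph(n, 0.1)
    for lam in nx.laplacian_spectrum(G):
        w = eigenvalue_to_wavelength(lam, n)
        plt.axvline(w, color=wavelength_to_rgb(w))
    plt.xlim(380, 780)
    plt.xlabel("wavelength (nm)")
    plt.yticks([])
    plt.show()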
There are a couple ways to measure how well a graph remains connected when nodes are removed. One way is to consider nodes dropping out randomly. Another way, the one we look at here, assumes an adversary is trying to remove the most well-connected nodes. This approach was defined by Schneider et al [1]. It is also described, a little more clearly, in [2].
The robustness of a graph is defined as

R = (1/N) Σ S(q)

where the sum runs over q = 1, 2, …, N.
Then [2] explains that
N is the total number of nodes in the initial network and S(q) is the relative size of the largest connected component after removing q nodes with the highest degrees. The normalization factor 1/N ensures 1/N ≤ R ≤ 0.5. Two special cases exist: R = 1/N corresponds to star-like networks while R = 0.5 represents the case of the fully connected network.
Apparently “relative size” means relative to the size of the original network, not to the network with q nodes removed.
There is one difference between [1] and [2]. The former defines the sum up to N and the latter to N-1. But this doesn’t matter because S(N) = 0: when you remove N nodes from a graph with N nodes, there’s nothing left!
I have a couple reservations. One is that it’s not clear whether R is well defined.
Consider the phrase “removing q nodes with the highest degrees.” What if you have a choice of nodes with the highest degree? Maybe all nodes have degree no more than n and several have degree exactly n. It seems your choice of node to remove matters. Removing one node of degree n may give you a very different network than removing another node of the same degree.
Along the same lines, it’s not clear whether it matters how one removes more than one node. For S(2), for example, does it matter whether I remove the two most connected nodes from the original graph at once, or remove the most connected node first, then the most connected node from what remains?
My second reservation is that I don’t think the formula above gives exactly the extreme values it says. Start with a star graph. Once you remove the center node, there are only isolated points left, and each point is 1/N of the original graph. So all the S(q) are equal to 1/N. Then R equals (N – 1)/N², which is less than 1/N.
With the complete graph, removing a point leaves a complete graph of lower order. So S(q) = (N – q)/N. Then R equals (N – 1)/2N, which is less than 1/2.
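Here’s a minimal sketch of this computation, assuming the targeted-attack definition above with ties broken arbitrarily. It reproduces the two special cases just computed.

    import networkx as nx

    def robustness(G):
        # R = (1/N) * sum of S(q) for q = 1..N, removing a highest-degree node each step
        N = G.number_of_nodes()
        H = G.copy()
        total = 0.0
        for q in range(1, N + 1):
            node = max(H.degree, key=lambda pair: pair[1])[0]  # a highest-degree node
            H.remove_node(node)
            # S(q): size of the largest remaining component, relative to the original N
            S = max((len(c) for c in nx.connected_components(H)), default=0) / N
            total += S
        return total / N

    N = 10
    print(robustness(nx.star_graph(N - 1)))   # (N-1)/N^2  = 0.09
    print(robustness(nx.complete_graph(N)))   # (N-1)/(2N) = 0.45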
* * *
[1] C. M. Schneider et al. Mitigation of malicious attacks on networks. Proc. Natl. Acad. Sci. USA, vol. 108, no. 10, March 8, 2011.
[2] Big Data of Complex Networks. Matthias Dehmer et al., editors. CRC Press.
The cover time of a graph is the expected time it takes a simple random walk on the graph to visit every node. A simple random walk starts at some node, then at each step chooses with equal probability one of the adjacent nodes. The cover time is defined to be the maximum over all starting points of expected time to visit all nodes.
It seems plausible that adding edges to a graph would decrease its cover time. Sometimes this is the case and sometimes it isn’t, as we’ll demonstrate below.
Chains, complete graphs, and lollipops
This post will look at three kinds of graphs:
a chain of n vertices
a complete graph on n vertices
a “lollipop” with n vertices
A chain simply connects vertices in a straight line. A complete graph connects every node to every other node.
A lollipop with n vertices takes a chain with n/2 vertices and a complete graph on n/2 vertices and joins them together. The image above is a lollipop graph of order 10. The complete graph becomes a clique in the new graph.
The image looks more like a kite than a lollipop because that’s the way my software (networkx) drew it by default. If you’d like, imagine the complete graph more round and the chain more straight so that the graph more closely resembles a lollipop.
Random walks on graphs
Chains have a long cover time. Complete graphs have a short cover time. What about lollipops?
A plausible answer would be that a lollipop graph would have a moderate cover time, somewhere between that of a complete graph and a chain. But that’s not the case. In fact, the lollipop has a longer cover time than either the complete graph or the chain.
You could think of a lollipop graph with n vertices as a chain of length n with extra edges added. This shows that adding edges does not always decrease the cover time.
The cover times for the three graphs we consider here are of different orders: O(n log n) for the complete graph, O(n²) for the chain, and O(n³) for the lollipop.
Now these are only asymptotic results. Big-O notation leaves out proportionality constants. We know that for sufficiently large n the cover times are ordered so that the complete graph is the smallest, then the chain, then the lollipop. But does n have to be astronomically large before this happens? What happens with a modest size n?
There’s a theoretical result that says the cover time is bounded by 2m(n-1) where m is the number of edges and n the number of vertices. This tells us that when we say the cover time for the chain is O(n²) we can say more: the chain has m = n – 1 edges, so its cover time is less than 2(n-1)², and in particular less than 2n². So at least in this case the proportionality constant isn’t large.
We’ll look at the cover times with some simulation results below, for n = 10 and n = 30, each based on 10,000 runs.
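A sketch of such a simulation follows. The starting nodes are choices of mine (the cover time proper maximizes over all starting nodes), and nx.lollipop_graph(k, k) joins a clique on k nodes to a path on k nodes.

    import random
    import networkx as nx

    def cover_time(G, start):
        # steps of a simple random walk from `start` until every node has been visited
        visited = {start}
        loc = start
        steps = 0
        while len(visited) < G.number_of_nodes():
            loc = random.choice(list(G.neighbors(loc)))
            visited.add(loc)
            steps += 1
        return steps

    def mean_cover_time(G, start, reps=10_000):
        return sum(cover_time(G, start) for _ in range(reps)) / reps

    n = 10
    for name, G, start in [
        ("complete", nx.complete_graph(n), 0),
        ("chain",    nx.path_graph(n), 0),                   # start at one end
        ("lollipop", nx.lollipop_graph(n // 2, n // 2), 0),  # start inside the clique
    ]:
        print(name, mean_cover_time(G, start))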
With 10 nodes each, the cover times for the complete graph, chain, and lollipop were 23.30, 82.25, and 131.08 respectively.
With 30 nodes the corresponding cover times were 114.37, 845.16, and 3480.95.
So the simulated cover times were ordered as the asymptotic theory would suggest.
Random walks mix faster on some graphs than on others. Rapid mixing is one of the ways to characterize expander graphs.
By rapid mixing we mean that a random walk approaches its limiting (stationary) probability distribution quickly relative to random walks on other graphs.
Here’s a graph that supports rapid mixing. Pick a prime p and label nodes 0, 1, 2, 3, …, p-1. Create a circular graph by connecting each node k to k+1 (mod p). Then add an edge between each non-zero k and its multiplicative inverse (mod p). If an element is its own inverse, add a loop connecting the node to itself. For the purpose of this construction, consider 0 to be its own inverse. This construction comes from [reference].
Here’s what the graph looks like with p = 23.
This drawing doesn’t show the three loops connecting a node to itself, at nodes 0, 1, and 22.
At each node in the random walk, we choose an edge uniformly and follow it. The stationary distribution for a random walk on this graph is uniform. That is, in the long run, you’re equally likely to be at any particular node.
If we pick an arbitrary starting node, 13, and take 30 steps of the random walk, the simulated probability distribution is nearly flat.
By contrast, we take a variation on the random walk above. From each node, we move left, right, or stay in place with equal probability. This walk is not as well mixed after 100 steps as the original walk is after only 30 steps.
You can tell from the graph that the walk started around 13. In the previous graph, evidence of the starting point had vanished.
The code below was used to produce these graphs, and it can be used to investigate how quickly a walk on this graph converges to its limiting distribution.
from random import random
import numpy as np
import matplotlib.pyplot as plt

m = 23

# Multiplicative inverses mod 23
inverse = [0, 1, 12, 8, 6, 14, 4, 10, 3, 18, 7, 21, 2, 16, 5, 20, 13, 19, 9, 17, 15, 11, 22]

# Verify that the inverses are correct
assert len(inverse) == m
for i in range(1, m):
    assert i*inverse[i] % m == 1

def take_cayley_step(loc):
    u = random() * 3
    if u < 1: # move left
        newloc = (loc + m - 1) % m
    elif u < 2: # move right
        newloc = (loc + 1) % m
    else: # move to the multiplicative inverse; 0 counts as its own inverse
        newloc = inverse[loc] # for the variation walk, use newloc = loc instead
    return newloc

num_reps = 100000
num_steps = 30

stats = [0]*m
for i in range(num_reps):
    loc = 13 # arbitrary fixed starting point
    for j in range(num_steps):
        loc = take_cayley_step(loc)
    stats[loc] += 1

# plot the simulated distribution over the nodes
normed = np.array(stats)/num_reps
plt.bar(range(m), normed)
plt.show()
“You are the average of the five people you spend the most time with.”
A Google search says this quote is by Jim Rohn. I think other people have said similar things. I’ve heard it quoted many times. The implication is usually that you can improve your life by hanging around better people.
Here are three things it makes me think of.
It sounds approximately true.
It can’t be literally true.
It reminds me of harmonic functions.
There are numbers to back up the assertion that things like your income and even your weight approximately match the average of the people around you. Cause and effect go both ways: people like to hang out with people like themselves, and we become like the people we hang around.
It’s an aphorism, not meant to be taken literally. But a moment’s thought shows that it can’t be literally true for everybody. In any social network, someone has the most money, for example. That person’s net worth cannot be the average of the net worth of others in the group, unless everyone has the exact same net worth. The same applies to the poorest person in the network.
The reason I say that this reminds me of harmonic functions is the mean value theorem. If a function satisfies Laplace’s equation in some region, then at any point in the interior of the region, the value of the function equals the average of the function over a spherical region centered at the point. But this is only true in the interior. On the boundary you might have a maximum or minimum. If the region is bounded and the function extends continuously to the boundary, it attains both a maximum and a minimum there.
I think of the continuous case first because I spent years thinking about such things. But there’s a discrete analog of harmonic functions that applies directly to the example above. If you have some network, such as a social network, and assign a number to each node, you have a discrete harmonic function if the value at every node is the average of the values at its neighboring nodes. For a finite network, a function cannot be harmonic at every point unless it is constant, for reasons given above. But a function could be harmonic at all but two nodes of a graph, or approximately harmonic at all nodes.
Imagine a clock with a prime number of hours. So instead of being divided into 12 hours, it might be divided into 11 or 13, for example. You can add numbers on this clock the way you’d add numbers on an ordinary clock: add the two numbers as ordinary integers and take the remainder by p, the number of hours on the clock. You can multiply them the same way: multiply as ordinary integers, then take the remainder by p. This is called arithmetic modulo p, or just mod p.
Which numbers have square roots mod p? That is, for which numbers x does there exist a number y such that y² = x mod p? It turns out that of the non-zero numbers, half have a square root and half don’t. The numbers that have square roots are called quadratic residues and the ones that do not have square roots are called quadratic nonresidues. Zero is usually left out of the conversation. For example, if you square the numbers 1 through 6 and take the remainder mod 7, you get 1, 4, 2, 2, 4, and 1. So mod 7 the quadratic residues are 1, 2, and 4, and the nonresidues are 3, 5, and 6.
A simple, brute force way to tell whether a number is a quadratic residue mod p is to square all the numbers from 1 to p-1 and see if any of the results equals your number. There are more efficient algorithms, but this is the easiest to understand.
At first it seems that the quadratic residues and nonresidues are scattered randomly. For example, here are the quadratic residues mod 23:
[1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18]
But there are patterns, such as the famous quadratic reciprocity theorem. We can see some pattern in the residues by visualizing them on a graph. We make the numbers mod p vertices of a graph, and join two vertices a and b with an edge if a–b is a quadratic residue mod p.
Here’s the resulting graph mod 13:
And here’s the corresponding graph for p = 17:
And here’s Python code to produce these graphs:
import networkx as nx
import matplotlib.pyplot as plt

def residue(x, p):
    # brute force test of whether x is a quadratic residue mod p
    for i in range(1, p):
        if (i*i - x) % p == 0:
            return True
    return False

p = 17
G = nx.Graph()
for i in range(p):
    for j in range(i+1, p):
        if residue(i-j, p):
            G.add_edge(i, j)

nx.draw_circular(G, with_labels=True)
plt.show()
However, this isn’t the appropriate graph for all values of p. The graphs above are undirected graphs. We’ve implicitly assumed that if there’s an edge between a and b then there should be an edge between b and a. And that’s true for p = 13 or 17, but not, for example, for p = 11. Why’s that?
If p leaves a remainder of 1 when divided by 4, i.e. p = 1 mod 4, then x is a quadratic residue mod p if and only if –x is a quadratic residue mod p. So if a–b is a residue, so is b–a. This means an undirected graph is appropriate. But if p = 3 mod 4, x is a residue mod p if and only if –x is a non-residue mod p. This means we should use directed graphs if p = 3 mod 4. Here’s an example for p = 7.
The code for creating the directed graphs is a little different:
G = nx.DiGraph()
for i in range(p):
    for j in range(p):
        if residue(i-j, p):
            G.add_edge(i, j)

nx.draw_circular(G, with_labels=True)
plt.show()
Unfortunately, the way NetworkX draws directed graphs is disappointing. A directed edge from a to b is drawn with a heavy line segment on the b end. Any suggestions for a Python package that draws more attractive directed graphs?
The idea of creating graphs this way came from Roger Cook’s chapter on number theory applications in Graph Connections. (I’m not related to Roger Cook as far as I know.)
You could look at quadratic residues modulo composite numbers too. And I imagine you could also make some interesting graphs using Legendre symbols.
What is a Fourier transform at its core? An expansion of a function in terms of eigenfunctions of the Laplacian. For a function on the real line, the Laplacian is simply the second derivative. The functions mapped to multiples of themselves by the second derivative are sines and cosines of various frequencies. A Fourier series is a change of basis, using as basis vectors those functions that behave the simplest under the second derivative.
The Fourier transform of a function on a graph is also a change of basis, expanding a discrete function in terms of eigenvectors of the Laplacian, in this case the graph Laplacian.
The Fourier transform of a function f, evaluated at a frequency ω, is the inner product of f with the eigenfunction exp(2πiωt):

f̂(ω) = ∫ f(t) exp(-2πiωt) dt
The inner product of two complex functions f and g is the integral of the product of f and the conjugate of g. Conjugation is why exp(2πiωt) became exp(-2πiωt).
The Fourier transform of a discrete function f on a graph, evaluated at an eigenvalue λi, is the inner product of f (i.e. the vector of values of f at each node) with the eigenvector ui associated with λi:

f̂(λi) = Σj f(j) ūi(j)
Here the inner product is a discrete sum rather than an integral. As before, we take the complex conjugate of the second item in the product.
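Here’s a sketch of what the discrete transform looks like in code for an undirected graph. In that case the Laplacian is real and symmetric, so the conjugation is trivial. The example graph and function are arbitrary choices.

    import numpy as np
    import networkx as nx

    G = nx.path_graph(8)
    L = nx.laplacian_matrix(G).toarray().astype(float)

    # eigh returns ascending eigenvalues and orthonormal eigenvectors as columns
    eigenvalues, eigenvectors = np.linalg.eigh(L)

    f = np.random.rand(G.number_of_nodes())  # a function on the nodes
    f_hat = eigenvectors.T @ f               # Fourier coefficients: inner products with eigenvectors
    f_back = eigenvectors @ f_hat            # the inverse transform recovers f
    assert np.allclose(f, f_back)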
The eigenvectors associated with the smallest eigenvalues of the graph Laplacian are analogous to low frequency sines and cosines. The eigenvector components corresponding to nearby vertices in a graph should be close together. This analogy explains why spectral coordinates work so well.
A (continuous) harmonic function f on an open set is a function that is twice differentiable and satisfies Laplace’s equation: ∇²f = 0. Such functions have a mean value property: at a point x interior to the set, f(x) equals the average value of f over any ball around x that fits inside the region. It turns out that the converse is true: a locally integrable function satisfying the mean value property is harmonic.
We can take the mean value property from the continuous case as the definition in the discrete case. A function on a graph is harmonic at a node x if f(x) equals the average of f’s values on the nodes connected to x. A function is said to have a singularity at points of a graph where it is not harmonic.
A non-constant function on a finite connected graph cannot be harmonic at every node. It has to have at least two singularities, namely the points where it takes on its maximum and its minimum. This is analogous to Liouville’s theorem for (continuous) harmonic functions: a bounded harmonic function on all of Euclidean space must be constant. Some infinite graphs have non-constant functions that are harmonic everywhere. For example, linear functions on the integers are harmonic.
In the continuous case, we want to find solutions to Laplace’s equation satisfying a boundary condition. We want to find a function that has the specified values on the boundary and is harmonic in the interior. With the right regularity conditions, this solution exists and is unique.
We can do the same for discrete harmonic functions. If we specify the values of a function on some non-empty subset of nodes of a connected graph, there is a unique discrete harmonic function that extends those values to the rest of the graph.
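Here’s a sketch of how one might compute such an extension: solve a linear system whose rows enforce the specified values on the chosen nodes and the mean value property everywhere else. The grid graph and boundary values here are arbitrary choices of mine.

    import numpy as np
    import networkx as nx

    G = nx.grid_2d_graph(5, 5)
    nodes = list(G)
    index = {node: i for i, node in enumerate(nodes)}
    L = nx.laplacian_matrix(G, nodelist=nodes).toarray().astype(float)

    # specify values on two opposite sides of the grid
    boundary = {node: float(node[0] == 0) for node in nodes if node[0] in (0, 4)}

    A = np.eye(len(nodes))
    b = np.zeros(len(nodes))
    for node in nodes:
        i = index[node]
        if node in boundary:
            b[i] = boundary[node]  # row i enforces f(node) = specified value
        else:
            A[i] = L[i]            # row i enforces (L f)(node) = 0, i.e. f harmonic at node
    f = np.linalg.solve(A, b)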
Discrete harmonic functions come up in applications such as random walks and electrical networks.
Mark Kac asked in 1966 whether you can hear the shape of a drum. In other words, can two drums, made of the same material, produce the exact same sound but have different shapes? More formally, Kac asked whether the eigenvalues of Laplace’s equation with zero boundary conditions uniquely determine the shape of a region in the plane. The question remained open until 1992. No, you can’t always hear the shape of a drum. But often you can. With some restrictions on the regions, the shape is uniquely determined by the sound, i.e., the Laplace spectrum.
Ten years before Kac asked about hearing the shape of a drum, Günthard and Primas asked the analogous question about graphs. Can you hear the shape of a graph? That is, can two different graphs have the same eigenvalues? At the time, the answer was believed to be yes, but a year later it was found to be no, not always [1]. So the next natural question is when can you hear the shape of a graph, i.e. under what conditions is a graph determined by its eigenvalues? It depends on which matrix you’re taking the eigenvalues of, but under some conditions some matrix spectra uniquely determine graphs. We don’t know in general how common it is for spectra to uniquely determine graphs.
Here are two graphs that have the same adjacency matrix spectrum, first published in [2]:
Both have adjacency spectra [-2, 0, 0, 0, 2]. But the graphs are not cospectral as far as the Laplacian is concerned. Their Laplace spectra are [0, 0, 2, 2, 4] and [0, 1, 1, 1, 5] respectively. We could tell that the Laplace spectra would be different before computing them because the second smallest Laplace eigenvalue is positive if and only if a graph is connected.
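These claims are easy to verify numerically. Here’s a sketch assuming the two graphs are a square plus an isolated vertex and the star K1,4, a well-known cospectral pair; since I can’t see the original figure, that identification is an inference from the spectra quoted above.

    import numpy as np
    import networkx as nx

    G1 = nx.cycle_graph(4)
    G1.add_node(4)            # a square plus an isolated vertex
    G2 = nx.star_graph(4)     # the star K_{1,4}

    print(np.round(np.sort(nx.adjacency_spectrum(G1).real), 4))  # [-2, 0, 0, 0, 2]
    print(np.round(np.sort(nx.adjacency_spectrum(G2).real), 4))  # [-2, 0, 0, 0, 2]
    print(np.round(nx.laplacian_spectrum(G1), 4))                # [0, 0, 2, 2, 4]
    print(np.round(nx.laplacian_spectrum(G2), 4))                # [0, 1, 1, 1, 5]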
The graphs below are cospectral for the adjacency, Laplacian, and unsigned Laplacian matrices.
Both graphs have the same number of nodes and edges, and every node has degree 4 in both graphs. But the graph on the left contains more triangles than the one on the right, so they cannot be isomorphic.
One way to test whether two graphs are isomorphic is to compute their spectra. If the spectra are different, the graphs are not isomorphic. So spectral analysis gives a way to show that two graphs are not isomorphic in polynomial time, though the test may be inconclusive.
If two graphs do have the same spectra, what is the probability that they are isomorphic? In [1] the authors answer this question empirically for graphs of order up to 11. Very roughly, there’s about an 80% chance graphs with the same adjacency matrix spectrum are isomorphic. The chances go up to 90% for the Laplacian and 95% for the signless Laplacian.
[1] Edwin R. van Dam, Willem H. Haemers. Which graphs are determined by their spectrum? Linear Algebra and its Applications 373 (2003) 241–272.
[2] D. M. Cvetković. Graphs and their spectra. Univ. Beograd. Publ. Elektrotehn. Fak. Ser. Mat. Fiz. 354–356 (1971) 1–50.
You can tell whether a graph is regular, or nearly regular, by looking at its eigenvalues.
Let G be a graph with n vertices and m edges and let L be the Laplacian matrix of G. Then the sum of the eigenvalues of L is 2m. (The diagonal of L contains the degrees of each node, so its trace is twice the number of edges. And the trace of a matrix is the sum of its eigenvalues.)
Let λi be the eigenvalues of L. Then we have

2m = Σ λi ≤ √( n ( Σ λi² – Σ λi ) )

with equality if and only if G is regular. [reference] (The inequality comes from applying Cauchy–Schwarz to the degree sequence, since Σ λi² = Σ di² + 2m.)
We can try this out by starting with a dodecahedron graph. It’s a regular graph, since each vertex has three edges. There are a total of 30 edges and the eigenvalues sum to 60. The right-most expression is 60 as expected.
Next remove one of the edges from the graph. The result bears a remarkable resemblance to Iron Man.
Now there are 29 edges, so 2m = 58. The right-most expression above is now 58.31. It’s bigger than 58 because the new graph is not regular. But as expected, it’s close to 58 since the graph is close to regular.
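Here’s a sketch of this experiment in code, using the inequality above.

    import numpy as np
    import networkx as nx

    def right_most(G):
        lam = nx.laplacian_spectrum(G)
        return np.sqrt(len(G) * (np.sum(lam**2) - np.sum(lam)))

    G = nx.dodecahedral_graph()
    print(2 * G.number_of_edges(), right_most(G))  # 60 and 60.0: the graph is regular

    G.remove_edge(0, 1)
    print(2 * G.number_of_edges(), right_most(G))  # 58 and about 58.31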
One way to get some understanding of a graph is to peel away nodes, starting with the least connected. The k-core of a graph is the maximal subgraph such that every vertex has degree at least k. The k-shell is the set of vertices that are part of the k-core but not part of the (k+1)-core.
For example, the 0-core of a graph is simply the entire graph since every vertex has at least zero edges; a vertex can’t have a negative number of edges. The 1-core of a graph contains the vertices that are connected to other vertices. You form the 1-core by throwing away the 0-shell, the set of isolated vertices. You move from the 1-core to the 2-core by removing nodes of degree 1 until everything that’s left has degree at least 2.
NB: You do not form the k-core simply by discarding nodes of degree less than k. For example, consider the star graph below.
If you peel off the nodes of degree 1, eventually you have nothing left. There is no 2-core, even though the original graph had a node of degree 5. Or going back to the definition, there is no subgraph such that every vertex has degree at least 2, and so no maximal such subgraph. The node of degree 5 only has degree 5 because it is connected to nodes of degree 1.
The k-core of a complete graph on n vertices is the entire graph for k ≤ n – 1, and empty for larger k. All the k-shells are empty except the (n – 1)-shell, which is the entire graph.
Let’s see what the k-cores and k-shells look like for some more complex graphs. First, let’s go back to the post looking at the overlap in keywords. The vertices of the graph are the 200 search terms that have led visitors to the site at least twice, and we connect two search terms with an edge if they share a word. The 56-core of the graph has 57 vertices, and the 57-core is empty. So the 56-core is a complete graph. Here’s the distribution of the k-shell sizes.
Next let’s look at a random graph. As in the post on spectra of random graphs, we look at an Erdős-Rényi graph Gn,p. First we start with n = 1000 and p = 0.01. So we start with 1000 nodes, and there’s a probability of 0.01 that any particular pair of nodes is connected. Here are the core sizes for one such random graph:
[1000 1000 999 999 990 970 914 793]
So the 7-core has 793 nodes, then the 8-core is empty. I ran this ten times and the 7-core sizes ranged from size 709 to 849. But every time the 8-core was empty. The largest value of k such that the k-core is non-empty is called the degeneracy of the graph, so we could say the degeneracy was 7 every time.
I reran with n = 2000. The degeneracy was 14 every time. The 14-cores were in the range 1723 to 1810.
Finally I set n = 4000 and ran again. The degeneracy was 29 three times and 30 seven times. The final core sizes ranged from 3451 to 3714.
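Here’s a sketch of how these experiments might be run with networkx: nx.core_number gives each node’s core number, and the degeneracy is the maximum of those.

    import networkx as nx

    n, p = 1000, 0.01
    G = nx.erdos_renyi_graph(n, p)
    degeneracy = max(nx.core_number(G).values())
    core_sizes = [len(nx.k_core(G, k)) for k in range(degeneracy + 1)]
    print(core_sizes)  # sizes of the 0-core up through the final non-empty core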
The experiments above suggest that the k-core size abruptly drops to zero, at a predictable value of k, and with a fairly predictable core size. There are papers on k-core size of random graphs, but I have not yet read any. (I have one in my to-read queue that someone let me know about after posting this article.)
The vertices of a bipartite graph can be divided into two sets where edges only go from one set to the other; there are no edges within each set. For example, in the graph below edges connect blue to green nodes, but not blue to blue or green to green.
In this example there are two connected bipartite components. There is a variation on the graph Laplacian called the signless Laplacian whose eigenvalues can tell you how many bipartite components a graph has.
The graph Laplacian is L = D – A where D is the diagonal matrix whose entries are the degrees of each node and A is the adjacency matrix. The signless Laplacian is Q = D + A. Since D and A have only non-negative entries, so does Q.
The smallest eigenvalue of the Laplacian L is always zero, and the second smallest eigenvalue is positive if and only if the graph is connected. The smallest eigenvalue of the signless Laplacian Q is zero if and only if the graph has a connected bipartite component, and the multiplicity of zero as an eigenvalue equals the number of connected bipartite components.
The graph above has two connected bipartite components. Zero is a double eigenvalue of the signless Laplacian, and the next eigenvalue is 0.2603.
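Here’s a sketch of the computation. NetworkX doesn’t expose the signless Laplacian directly, as far as I know, but it’s easy to build from the degree and adjacency matrices. The two even cycles below are stand-ins for the pictured graph, since each is connected and bipartite.

    import numpy as np
    import networkx as nx

    def signless_laplacian_spectrum(G):
        A = nx.adjacency_matrix(G).toarray().astype(float)
        D = np.diag(A.sum(axis=1))  # diagonal matrix of degrees
        return np.linalg.eigvalsh(D + A)

    # two disjoint even cycles: two connected bipartite components
    G = nx.disjoint_union(nx.cycle_graph(6), nx.cycle_graph(8))
    print(np.round(signless_laplacian_spectrum(G), 4))  # zero appears twice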
Next we connect two of the blue nodes. Now the graph is connected and bipartite.
Zero is now a single eigenvalue, and the second eigenvalue is 0.0968. This second eigenvalue is positive because there is only one component now, but it is small because the graph can almost be separated into two bipartite components.
Next we remove the edge connecting the two components, but we draw an edge between two of the green vertices.
Now there are two components, but only one of them is bipartite. The smallest eigenvalue is zero, with multiplicity 1, and the second eigenvalue is 0.2015.
Next we repeat the examples above with more edges. Now each component of the graph is a complete bipartite graph.
The smallest eigenvalue is 0, with multiplicity 2. But now the next eigenvalue is 3. This value is larger than before because the components are better connected.
As before, we now join two of the blue nodes.
Now there is one bipartite component. The smallest eigenvalue is 0 with multiplicity 1. The second eigenvalue is 0.2103.
Finally, we remove the edge connecting the two components and connect two of the green vertices.
Now the smallest eigenvalue is 0 with multiplicity 1. While there are two components, there’s only one bipartite component. The next eigenvalue is 0.4812.
Create a graph by first making a set of n vertices and then connecting nodes i and j, i > j, with probability p. This is known as the Erdős-Rényi graph Gn,p. Here’s such a graph with n = 100 and p = 0.1.
Then the spectrum of the adjacency matrix A of the graph bounces around the spectrum of the expected value of A. This post will illustrate this to give an idea of how close the two spectra are.
The adjacency matrix A for our random graph has zeros along the diagonal because the model doesn’t allow loops connecting a vertex with itself. The rest of the entries are 1 with probability p and 0 with probability 1-p. The entries above the diagonal are independent of each other, but the entries below the diagonal are determined by the entries above by symmetry. (An edge between i and j is also an edge between j and i.)
The expected value of A is a matrix with 0’s along the diagonal and p‘s everywhere else. This matrix has one eigenvalue of p(n – 1) and (n – 1) eigenvalues –p. Let’s see how close the spectra of a dozen random graphs come to the spectra of the expected adjacency matrix. As above, n = 100 and p = 0.1.
Because the eigenvalues are close together, I jittered each sample horizontally to make the values easier to see. Notice that the largest eigenvalue in each sample is around 10. The expected value is p(n – 1) = 9.9, so the values are close to their expected value.
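Code along these lines could produce such a plot; the jitter width is an arbitrary choice.

    import numpy as np
    import networkx as nx
    import matplotlib.pyplot as plt

    n, p = 100, 0.1
    for sample in range(12):
        G = nx.erdos_renyi_graph(n, p)
        A = nx.adjacency_matrix(G).toarray().astype(float)
        eigenvalues = np.linalg.eigvalsh(A)
        x = sample + 0.2*np.random.rand(n)  # horizontal jitter to separate the samples
        plt.plot(x, eigenvalues, ".", markersize=2)
    plt.xlabel("sample")
    plt.ylabel("eigenvalue")
    plt.show()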
The other eigenvalues seem to be centered around 0. This is consistent with the expected value of –p = -0.1. The most obvious thing about the smaller eigenvalues is not their mean but their range. How far might the eigenvalues be from their expected value? Fan Chung and Mary Radcliffe give the following estimate in their paper On the Spectra of General Random Graphs.
For Gn,p, if p > 8 log(√2 n)/9n, then with probability at least 1 – 1/n we have

|λi(A) – λi(p(J – I))| ≤ √( 8 n p log(√2 n) )

for each eigenvalue λi.
Here J is the matrix of all 1’s and I is the identity matrix. In our case the theorem says that with probability at least 0.99, we expect the eigenvalues of our random graph to be within 19.903 of their expected value. This is a pessimistic bound since no value is much more than 5 away from its expected value.