In his new book The Perfect Shape, Øyvind Hammer shows how to create a graph something like the poster for Alfred Hitchcock’s movie Vertigo.

Hammer’s code uses a statistical language called Past that I’d never heard of. Here’s my interpretation of his code using Python.

import matplotlib.pyplot as plt
from numpy import arange, sin, cos, exp
i = arange(5000)
x1 = 1.0*cos(i/10.0)*exp(-i/2500.0)
y1 = 1.4*sin(i/10.0)*exp(-i/2500.0)
d = 450.0
vx = cos(i/d)*x1 - sin(i/d)*y1
vy = sin(i/d)*x1 + cos(i/d)*y1
plt.plot(vx, vy, "k")
h = max(vy) - min(vy)
w = max(vx) - min(vx)
plt.axes().set_aspect(w/h)
plt.show()

This code produces what’s called a harmonograph, related to the motion of a pendulum free to move in x and y directions:

It’s not exactly the same as the movie poster, but it’s definitely similar. If you find a way to modify the code to make it closer to the poster, leave a comment below.

The other day I ran across a blog post by Brian Hayes that linked to an article by David Cantrell on how to compute the inverse of the gamma function. Cantrell gives an approximation in terms of the Lambert W function.

In this post we’ll write a little Python code to kick the tires on Cantrell’s approximation. The post also illustrates how to do some common tasks using SciPy and matplotlib.

Here are the imports we’ll need.

import matplotlib.pyplot as plt
from scipy import pi, e, sqrt, log, linspace
from scipy.special import lambertw, gamma, psi
from scipy.optimize import root

First of all, the gamma function has a local minimum k somewhere between 1 and 2, and so it only makes sense to speak of its inverse to the left or right of this point. Gamma is strictly increasing for real values larger than k.

To find k we look for where the derivative of gamma is zero. It’s more common to work with the derivative of the logarithm of the gamma function than the derivative of the gamma function itself. That works just as well because gamma has a minimum where its log has a minimum. The derivative of the log of the gamma function is called ψ and is implemented in SciPy as scipy.special.psi. We use the function scipy.optimize.root to find where ψ is zero.

The root function returns more information than just the root we’re after. The root(s) are returned in the arrayx, and in our case there’s only one root, so we take the first element of the array:

How well goes this algorithm work? For starters, we’ll see how well it does when we do a round trip, following the exact gamma with the approximate inverse.

x = linspace(5, 30, 100)
plt.plot(x, AIG(gamma(x)))
plt.show()

This produces the following plot:

We get a straight line, as we should, so next we do a more demanding test. We’ll look at the absolute error in the approximate inverse. We’ll use a log scale on the x-axis since gamma values get large quickly.

y = gamma(x)
plt.plot(y, x- AIG(y))
plt.xscale("log")
plt.show()

This shows the approximation error is small, and gets smaller as its argument increases.

Cantrell’s algorithm is based on an asymptotic approximation, so it’s not surprising that it improves for large arguments.

Suppose you fill the components of a matrix with random values. How are the eigenvalues distributed?

We limit our attention to large, symmetric matrices. We fill the entries of the matrix on the diagonal and above the diagonal with random elements, then fill in the elements below the diagonal by symmetry.

If we choose our random values from a thin-tailed distribution, then Wigner’s semicircle law tells us what to expect from our distribution. If our matrices are large enough, then we expect the probability density of eigenvalues to be semicircular. To illustrate this, we’ll fill a matrix with samples from a standard normal distribution and compute its eigenvalues with the following Python code.

import numpy as np
import matplotlib.pyplot as plt
N = 5000
A = np.random.normal(0, 1, (N, N))
B = (A + A.T)/np.sqrt(2)
eigenvalues = np.linalg.eigvalsh(B)
print(max(eigenvalues), min(eigenvalues))
plt.hist(eigenvalues, bins=70)
plt.show()

We first create an N by N non-symmetric matrix, then make it symmetric by adding it to its transpose. (That’s easier than only creating the upper-triangular elements.) We divide by the square root of 2 to return the variance of each component to its original value, in this case 1. The eigenvalues in this particular experiment ran from -141.095 to 141.257 and their histogram shows the expected semi-circular shape.

Wigner’s semicircle law does not require the samples to come from a normal distribution. Any distribution with finite variance will do. We illustrate this by replacing the normal distribution with a Laplace distribution with the same variance and rerunning the code. That is, we change the definition of the matrix A to

A = np.random.laplace(0, np.sqrt(0.5), (N, N))

and get very similar results. The eigenvalues ran from -140.886 to 141.514 and again we see a semicircular distribution.

But what happens when we draw samples from a heavy-tailed distribution, i.e. one without finite variance? We’ll use a Student-t distribution with ν = 1.8 degrees of freedom. When ν > 2 the t-distribution has variance ν/(ν – 2), but for smaller values of ν it has no finite variance. We change the definition of the matrix A to the following:

A = np.random.standard_t(1.8, (N, N))

and now the distribution is quite different.

In this case the minimum eigenvalue was -9631.558 and the largest was 9633.853. When the matrix components are selected from a heavy-tailed distribution, the eigenvalues also have a heavier-tailed distribution. In this case, nearly all the eigenvalues are between -1000 and 1000, but the largest and smallest were 10 times larger. The eigenvalues are more variable, and their distribution looks more like a normal distribution and less like a semicircle.

The distributions for all thin-tailed symmetric matrices are the same. They have a universal property. But each heavy-tailed distribution gives rise to a different distribution on eigenvalues. With apologies to Tolstoy, thin-tailed matrices are all alike; every thick-tailed matrix is thick-tailed in its own way.

Update: As the first comment below rightfully points out, the diagonal entries should be divided by 2, not sqrt(2). Each of the off-diagonal elements of A + A^{T} are the sum of two independent random variables, but the diagonal elements are twice what they were in A. To put it another way, the diagonal elements are the sum of perfectly correlated random variables, not independent random variables.

I reran the simulations with the corrected code and it made no noticeable difference, either to the plots or to the range of the eigenvalues.

Random walks mix faster on some graphs than on others. Rapid mixing is one of the ways to characterize expander graphs.

By rapid mixing we mean that a random walk approaches its limiting (stationary) probability distribution quickly relative to random walks on other graphs.

Here’s a graph that supports rapid mixing. Pick a prime p and label nodes 0, 1, 2, 3, …, p-1. Create a circular graph by connecting each node k to k+1 (mod p). Then add an edge between each non-zero k to its multiplicative inverse (mod p). If an element is it’s own inverse, add a loop connecting the node to itself. For the purpose of this construction, consider 0 to be its own inverse. This construction comes from [1].

Here’s what the graph looks like with p = 23.

This graph doesn’t show the three loops from a node to itself, nodes 1, 0, and 22.

At each node in the random walk, we choose an edge uniformly and follow it. The stationary distribution for a random walk on this graph is uniform. That is, in the long run, you’re equally likely to be at any particular node.

If we pick an arbitrary starting node, 13, and take 30 steps of the random walk, the simulated probability distribution is nearly flat.

By contrast, we take a variation on the random walk above. From each node, we move left, right, or stay in place with equal probability. This walk is not as well mixed after 100 steps as the original walk is after only 30 steps.

You can tell from the graph that the walk started around 13. In the previous graph, evidence of the starting point had vanished.

The code below was used to produce these graphs.

To investigate how quickly a walk on this graph converges to its limiting distribution, we could run code like the following.

from random import random
import numpy as np
import matplotlib.pyplot as plt
m = 23
# Multiplicative inverses mod 23
inverse = [0, 1, 12, 8, 6, 14, 4, 10, 3, 18, 7, 21, 2, 16, 5, 20, 13, 19, 9, 17, 15, 11, 22]
# Verify that the inverses are correct
assert(len(inverse) == m)
for i in range(1, m):
assert (i*inverse[i] % 23 == 1)
# Simulation
num_reps = 100000
num_steps = 30
def take_cayley_step(loc):
u = random() * 3
if u < 1: # move left
newloc = (loc + m - 1) % m
elif u < 2: # move right
newloc = (loc + 1) % m
else:
newloc = inverse[loc] # or newloc = loc
return newloc
stats = [0]*m
for i in range(num_reps):
loc = 13 # arbitrary fixed starting point
for j in range(num_steps):
loc = take_cayley_step(loc)
stats[loc] += 1
normed = np.array(stats)/num_reps
plt.plot(normed)
plt.xlim(1, m)
plt.ylim(0,2/m)
plt.show()

[1] Shlomo Hoory, Nathan Linial, Avi Wigderson. “Expander graphs and their applications.” Bulletin of the American Mathematical Society, Volume 43, Number 4, October 2006, pages 439–561.

In geometry, you’d say that if a square has side x, then it has area x^{2}.

In calculus, you’d say more. First you’d say that if a square has side nearx, then it has area nearx^{2}. That is, area is a continuous function of the length of a side. As the length of the side changes, there’s never an abrupt jump in area. Next you could be more specific and say that a small change Δx to a side of length x corresponds to approximately a change of 2x Δx in the area.

In probability, you ask what is the area of a square like if you pick the length of its side at random. If you pick the length of the side from a distribution with mean μ, does the distribution of the area have mean μ^{2}? No, but if the probability distribution on side length is tightly concentrated around μ, then the distribution on area will be concentrated near μ^{2}. And you can approximate just how near the area is to μ^{2} using the delta method, analogous to the calculus discussion above.

If the distribution on side lengths is not particularly concentrated, finding the distribution on the area is more interesting. It will depend on the specific distribution on side length, and the mean area might not be particularly close to the square of the mean side length. The function to compute area is trivial, and yet the question of what happens when you stick a random variable into that function is not trivial. Random variables behave as you might expect when you stick them into linear functions, but offer surprises when you stick them into nonlinear functions.

Suppose you pick the length of the side of a square uniformly from the interval [0, 1]. Then the average side is 1/2, and so you might expect the average area to be 1/4. But the expected area is actually 1/3. You could see this a couple ways, analytically and empirically.

First an analytical derivation. If X has a uniform [0, 1] distribution and Z = X^{2}, then the CDF of Z is

Prob(Z ≤ z) = Prob(X ≤ √z) = √ z.

and so the PDF for Z, the derivative of the CDF, is -1/2√z. From there you can compute the expected value by integrating z times the PDF.

You could check your calculations by seeing whether simulation gives you similar results. Here’s a little Python code to do that.

from random import random
N = 1000000
print( sum([random()**2 for _ in range(N)] )/N )

When I run this, I get 0.33386, close to 1/3.

Now lets look at an exponential distribution on side length with mean 1. Then a calculation similar to the one above shows that the expected value of the product is 2. You can also check this with simulation. This time we’ll be a little fancier and let SciPy generate our random values for us.

print( sum(expon.rvs(size=N)**2)/N )

When I ran this, I got 1.99934, close to the expected value of 2.

You’ll notice that in both examples, the expected value of the area is more than the square of the expected value of the side. This is not a coincidence but consequence of Jensen’s inequality. Squaring is a convex function, so the expected value of the square is larger than the square of the expected value for any random variable.

This post explores a sentence from the book Single Digits:

Any number in Pascal’s triangle that is not in the outer two layers will appear at least three times, usually four.

Pascal’s triangle contains the binomial coefficients C(n, r) where n ranges over non-negative numbers and r ranges from 0 to n. The outer layers are the elements with r equal to 0, 1, n-1, and n.

We’ll write some Python code to explore how often the numbers up to 1,000,000 appear. How many rows of Pascal’s triangle should we compute? The smallest number on row n is C(n, 2). Now 1,000,000 is between C(1414, 2) and C(1415, 2) so we need row 1414. This means we need N = 1415 below because the row numbers start with 0.

I’d like to use a NumPy array for storing Pascal’s triangle. In the process of writing this code I realized that a NumPy array with dtype int doesn’t contain Python’s arbitrary-sized integers. This makes sense because NumPy is designed for efficient computation, and using a NumPy array to contain huge integers is unnatural. But I’d like to do it anyway, and the way to make it happen is to set dtype to object.

import numpy as np
from collections import Counter
N = 1415 # Number of rows of Pascal's triangle to compute
Pascal = np.zeros((N, N), dtype=object)
Pascal[0, 0] = 1
Pascal[1,0] = Pascal[1,1] = 1
for n in range(2, N):
for r in range(0, n+1):
Pascal[n, r] = Pascal[n-1, r-1] + Pascal[n-1, r]
c = Counter()
for n in range(4, N):
for r in range(2, n-1):
p = Pascal[n, r]
if p <= 1000000:
c[p] += 1

When we run this code, we find that our counter contains 1732 elements. That is, of the numbers up to one million, only 1732 of them appear inside Pascal’s triangle when we disallow the outer two layers. (The second layer contains consecutive integers, so every positive integer appears in Pascal’s triangle. But most integers only appear in the second layer.)

When Single Digits speaks of “Any number in Pascal’s triangle that is not in the outer two layers” this cannot refer to numbers that are not in the outer two layers because every natural number appears in the outer two layers. Also, when it says the number “will appear at least three times, usually four” it is referring to the entire triangle, i.e. including the outer two layers. So another way to state the sentence quoted above is as follows.

Define the interior of Pascal’s triangle to be the triangle excluding the outer two layers. Every number n in the interior of Pascal’s triangle appears twice more outside of the interior, namely as C(n, 1) and C(n, n-1). Most often n appears at least twice in the interior as well.

This means that any number you find in the interior of Pascal’s triangle, interior meaning not in the two outer layers, will appear at least three times in the full triangle, usually more.

Here are the statistics from our code above regarding just the interior of Pascal’s triangle.

One number, 3003, appears six times.

Six numbers appear four times: 120, 210, 1540, 7140, 11628, and 24310.

Ten numbers appear only once: 6, 20, 70, 252, 924, 3432, 12870, 48620, 184756, and 705432.

The large majority of numbers, 1715 out of 1732, appear twice.

Yesterday Brian Hayes wrote a post about the distribution of primes. He showed how you could take the remainder when primes are divided by 7 and produce something that looks like rolls of six-sided dice. Here we apply the chi-square goodness of fit test to show that the rolls are too evenly distributed to mimic randomness. This post does not assume you’ve seen the chi-square test before, so it serves as an introduction to this goodness of fit test.

In Brian Hayes’ post, he looks at the remainder when consecutive primes are divided by 7, starting with 11. Why 11? Because it’s the smallest prime bigger than 7. Since no prime is divisible by any other prime, all the primes after 7 will have a remainder of between 1 and 6 inclusive when divided by 7. So the results are analogous to rolling six-sided dice.

The following Python code looks at prime remainders and (pseudo)random rolls of dice and computes the chi-square statistic for both.

First, we import some functions we’ll need.

from sympy import prime
from random import random
from math import ceil

The function prime takes an argument n and returns the nth prime. The function random produces a pseudorandom number between 0 and 1. The ceiling function ceil rounds its argument up to an integer. We’ll use it to convert the output of random into dice rolls.

In this example we’ll use six-sided dice, but you could change num_sides to simulate other kinds of dice. With six-sided dice, we divide by 7, and we start our primes with the fifth prime, 11.

num_sides = 6
modulus = num_sides + 1
# Find the index of the smallest prime bigger than num_sides
index = 1
while prime(index) <= modulus:
index += 1

We’re going to take a million samples and count how many times we see 1, 2, …, 6. We’ll keep track of our results in an array of length 7, wasting a little bit of space since the 0th slot will always be 0. (Because the remainder when dividing a prime by a smaller number is always positive.)

# Number of samples
N = 1000000
observed_primes = [0]*modulus
observed_random = [0]*modulus

Next we “roll” our dice two ways, using prime remainders and using a pseudorandom number generator.

for i in range(index, N+index):
m = prime(i) % modulus
observed_primes[m] += 1
m = int(ceil(random()*num_sides))
observed_random[m] += 1

The chi-square goodness of fit test depends on the observed number of events in each cell and the expected number. We expect 1/6th of the rolls to land in cell 1, 2, …, 6 for both the primes and the random numbers. But in a general application of the chi-square test, you could have a different expected number of observations in each cell.

expected = [N/num_sides for i in range(1, modulus)]

The chi-square test statistic sums (O – E)^{2}/E over all cells, where O stands for “observed” and E stands for “expected.”

def chisq_stat(O, E):
return sum( [(o - e)**2/e for (o, e) in zip(O, E)] )

Finally, we compute the chi-square statistic for both methods.

Note that we chop off the first element of the observed and expected lists to get rid of the 0th element that we didn’t use.

When I ran this I got 0.01865 for the prime method and 5.0243 for the random method. Your results for the prime method should be the same, though you might have a different result for the random method.

Now, how do we interpret these results? Since we have six possible outcomes, our test statistics has a chi-square distribution with five degrees of freedom. It’s one less than the number of possibilities because the total counts have to sum to N; if you know how many times 1, 2, 3, 4, and 5 came up, you can calculate how many times 6 came up.

A chi-square distribution with ν degrees of freedom has expected value ν. In our case, we expect a value around 5, and so the chi-square value of 5.0243 is unremarkable. But the value of 0.01864 is remarkably small. A large chi-square statistics would indicate a poor fit, the observed numbers being suspiciously far from their expected values. But a small chi-square value suggests the fit is suspiciously good, closer to the expected values than we’d expect of a random process.

We can be precise about how common or unusual a chi-square statistic is by computing the probability that a sample from the chi square distribution would be larger or smaller. The cdf gives the probability of seeing a value this small or smaller, i.e. a fit this good or better. The sf gives the probability of seeing a value this larger or larger, i.e. a fit this bad or worse. (The scipy library uses sf for “survival function,” another name for the ccdf, complementary cumulative distribution function).

from scipy.stats import chi2
print(chi2.cdf(ch, num_sides-1), chi2.sf(ch, num_sides-1))

This says that for the random rolls, there’s about a 41% chance of seeing a better fit and a 59% chance of seeing a worse fit. Unremarkable.

But it says there’s only a 2.5 in a million chance of seeing a better fit than we get with prime numbers. The fit is suspiciously good. In a sense this is not surprising: prime numbers are not random! And yet in another sense it is surprising since there’s a heuristic that says primes act like random numbers unless there’s a good reason why in some context they don’t. This departure from randomness is the subject of research published just this year.

If you look at dice with 4 or 12 sides, you get a suspiciously good fit, but not as suspicious as with 6 sides. But with 8 or 20-sided dice you get a very bad fit, so bad that its probability underflows to 0. This is because the corresponding moduli, 9 and 21, are composite, which means some of the cells in our chi-square test will have no observations. (Suppose m has a proper factor a. Then if a prime p were congruent to a mod m,p would be have to be divisible by a.)

Update: See the next post for a systematic look at different moduli.

You don’t have to use “dice” that correspond to regular solids. You could consider 10-sided “dice,” for example. For such numbers it may be easier to think of spinners than dice, a spinner with 10 equal arc segments it could fall into.

This is a follow-on to my previous post on green noise. Here we create green noise with Python by passing white noise through a Butterworth filter.

Green noise is in the middle of the audible spectrum (on the Bark scale), just where our hearing is most sensitive, analogous to the green light, the frequency where our eyes are most sensitive. See previous post for details, including an explanation of where the left and right cutoffs below come from.

Here’s the code:

from scipy.io.wavfile import write
from scipy.signal import buttord, butter, filtfilt
from scipy.stats import norm
from numpy import int16
def turn_green(signal, samp_rate):
# start and stop of green noise range
left = 1612 # Hz
right = 2919 # Hz
nyquist = (samp_rate/2)
left_pass = 1.1*left/nyquist
left_stop = 0.9*left/nyquist
right_pass = 0.9*right/nyquist
right_stop = 1.1*right/nyquist
(N, Wn) = buttord(wp=[left_pass, right_pass],
ws=[left_stop, right_stop],
gpass=2, gstop=30, analog=0)
(b, a) = butter(N, Wn, btype='band', analog=0, output='ba')
return filtfilt(b, a, signal)
def to_integer(signal):
# Take samples in [-1, 1] and scale to 16-bit integers,
# values between -2^15 and 2^15 - 1.
signal /= max(signal)
return int16(signal*(2**15 - 1))
N = 48000 # samples per second
white_noise= norm.rvs(0, 1, 3*N) # three seconds of audio
green = turn_green(white_noise, N)
write("green_noise.wav", N, to_integer(green))

Let’s look at the spectrum to see whether it looks right. We’ll use one second of the signal so the x-axis coincides with frequency when we plot the FFT.

Suppose you have a graph of a function, but you don’t have an equation for it or the data that produced it. How can you reconstruction the function?

There are a lot of software packages to digitize images. For example, Web Plot Digitizer is one you can use online. Once you have digitized the graph at a few points, you can fit a spline to the points to approximately reconstruct the function. Then as a sanity check, plot your reconstruction to see if it looks like the original. It helps to have the same aspect ratio so you’re not distracted by something that doesn’t matter, and so that differences that do matter are easier to see.

For example, here is a graph from Zwicker and Fastl’s book on psychoacoustics. It contains many graphs with no data or formulas. This particular one gives the logarithmic transmission factor between free field and the peripheral hearing system.

Here’s Python code to reconstruct the functions behind these two curves.

import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
curve_names = ["Free", "Diffuse"]
plot_styles = { "Free" : 'b-', "Diffuse" : 'g:'}
data = {}
for name in curve_names:
data = np.loadtxt("{}.csv".format(name), delimiter=',')
x = data[:,0]
y = data[:,1]
spline = interpolate.splrep(x, y)
xnew = np.linspace(0, max(x), 100)
ynew = interpolate.splev(xnew, spline, der=0)
plt.plot(xnew, ynew, plot_styles[name])
logical_x_range = 24 # Bark
logical_y_range = 40 # dB
physical_x_range = 7 # inch
physical_y_range = 1.625 # inch
plt.legend(curve_names, loc=2)
plt.xlabel("critical-band rate")
plt.ylabel("attenuation")
plt.xlim((0, logical_x_range))
plt.axes().set_aspect(
(physical_y_range/logical_y_range) /
(physical_x_range/logical_x_range) )
ax = plt.gca()
ax.get_xaxis().set_ticks([0, 4, 8, 12, 16, 20, 24])
ax.get_yaxis().set_ticks([-10, 0, 10, 20, 30])
plt.show()

where C and S are the Fresnel functions, sometimes called the Fresnel cosine integral and Fresnel sine integral. Here’s a plot of the spiral.

Both Fresnel functions approach ½ as t → ∞ and so the curve slowly spirals toward (½, ½) in the first quadrant. And by symmetry, because both functions are odd, the curve spirals toward (-½, -½) in the third quadrant.

Here’s the Python code used to make the plot.

from scipy.special import fresnel
from scipy import linspace
import matplotlib.pyplot as plt
t = linspace(-7, 7, 1000)
y, x = fresnel(t)
plt.plot(x, y)
plt.axes().set_aspect("equal")
plt.show()

The SciPy function fresnel returns both Fresnel functions at the same time. It returns them in the order (S, C) so the code reverses the order of these to match the Cornu curve.

One interesting feature of Cornu’s spiral is that its curvature increases linearly with time. This is easy to verify: because of the fundamental theorem of calculus, the Fresnel functions reduce to sines and cosines when you take derivatives, and you can show that the curvature at time t equals πt.

How fast does the curve spiral toward (½, ½)? Since the curvature at time t is πt, that says that at time t the curve is instantaneously bending like a circle of radius 1/πt. So the radius of the spiral is decreasing like 1/πt.

Cornu’s spiral was actually discovered by Euler. Cornu was an engineer who independently discovered the curve much later. Perhaps because Cornu used the curve in applications, his name is more commonly associated with the curve. At least I’ve more often seen it named after Cornu. This is an example of Stigler’s law that things are usually not named after the first person to discover them.

* * *

For daily posts on analysis, follow @AnalysisFact on Twitter.

Yesterday I was looking into calculating fluctuation strength and playing around with some examples. Along the way I discovered how to create files that sound like police sirens. These are sounds with high fluctuation strength.

The Python code below starts with a carrier wave at f_{c} = 1500 Hz. Not surprisingly, this frequency is near where hearing is most sensitive. Then this signal is modulated with a signal with frequency f_{m}. This frequency determines the frequency of the fluctuations.

The slower example produced by the code below sounds like a police siren. The faster example makes me think more of an ambulance or fire truck. Next time I hear an emergency vehicle I’ll pay more attention.

If you use a larger value of the modulation index β and a smaller value of the modulation frequency f_{m} you can make a sound like someone tuning a radio, which is no coincidence.

from scipy.io.wavfile import write
from numpy import arange, pi, sin, int16
def f(t, f_c, f_m, beta):
# t = time
# f_c = carrier frequency
# f_m = modulation frequency
# beta = modulation index
return sin(2*pi*f_c*t - beta*sin(2*f_m*pi*t))
def to_integer(signal):
# Take samples in [-1, 1] and scale to 16-bit integers,
# values between -2^15 and 2^15 - 1.
return int16(signal*(2**15 - 1))
N = 48000 # samples per second
x = arange(3*N) # three seconds of audio
data = f(x/N, 1500, 2, 100)
write("slow.wav", N, to_integer(data))
data = f(x/N, 1500, 8, 100)
write("fast.wav", N, to_integer(data))

Imagine a clock with a prime number of hours. So instead of being divided into 12 hours, it might be divided into 11 or 13, for example. You can add numbers on this clock the way you’d add numbers on an ordinary clock: add the two numbers as ordinary integers and take the remainder by p, the number of hours on the clock. You can multiply them the same way: multiply as ordinary integers, then take the remainder by p. This is called arithmetic modulo p, or just mod p.

Which numbers have square roots mod p? That is, for which numbers x does there exist a number y such that y^{2} = x mod p? It turns out that of the non-zero numbers, half have a square root and half don’t. The numbers that have square roots are called quadratic residues and the ones that do not have square roots are called quadratic nonresidues. Zero is usually left out of the conversation. For example, if you square the numbers 1 through 6 and take the remainder mod 7, you get 1, 4, 2, 2, 4, and 1. So mod 7 the quadratic residues are 1, 2, and 4, and the nonresidues are 3, 5, and 6.

A simple, brute force way to tell whether a number is a quadratic residue mod p is to square all the numbers from 1 to p-1 and see if any of the results equals your number. There are more efficient algorithms, but this is the easiest to understand.

At first it seems that the quadratic residues and nonresidues are scattered randomly. For example, here are the quadratic residues mod 23:

[1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18]

But there are patterns, such as the famous quadratic reciprocity theorem. We can see some pattern in the residues by visualizing them on a graph. We make the numbers mod p vertices of a graph, and join two vertices a and b with an edge if a–b is a quadratic residue mod p.

Here’s the resulting graph mod 13:

And here’s the corresponding graph for p = 17:

And here’s Python code to produce these graphs:

import networkx as nx
import matplotlib.pyplot as plt
def residue(x, p):
for i in range(1, p):
if (i*i - x) % p == 0:
return True
return False
p = 17
G = nx.Graph()
for i in range(p):
for j in range(i+1, p):
if residue(i-j, p):
G.add_edge(i, j)
nx.draw_circular(G)
plt.show()

However, this isn’t the appropriate graph for all values of p. The graphs above are undirected graphs. We’ve implicitly assumed that if there’s an edge between a and b then there should be an edge between b and a. And that’s true for p = 13 or 17, but not, for example, for p = 11. Why’s that?

If p leaves a remainder of 1 when divided by 4, i.e. p = 1 mod 4, then x is a quadratic residue mod p if and only if –x is a quadratic residue mod p. So if a–b is a residue, so is b–a. This means an undirected graph is appropriate. But if p = 3 mod 4, x is a residue mod p if and only if –x is a non-residue mod p. This means we should use directed graphs if p = 3 mod 4. Here’s an example for p = 7.

The code for creating the directed graphs is a little different:

G = nx.DiGraph()
for i in range(p):
for j in range(p):
if residue(i-j, p):
G.add_edge(i, j)

Unfortunately, the way NetworkX draws directed graphs is disappointing. A directed edge from a to b is drawn with a heavy line segment on the b end. Any suggestions for a Python package that draws more attractive directed graphs?

The idea of creating graphs this way came from Roger Cook’s chapter on number theory applications in Graph Connections. (I’m not related to Roger Cook as far as I know.)

You could look at quadratic residues modulo composite numbers too. And I imagine you could also make some interesting graphs using Legendre symbols.

In 1945, a Cleveland newspaper held a contest to find the woman whose measurements were closest to average. This average was based on a study of 15,000 women by Dr. Robert Dickinson and embodied in a statue called Norma by Abram Belskie. Out of 3,864 contestants, no one was average on all nine factors, and fewer than 40 were close to average on five factors. The story of Norma and the Cleveland contest is told in Todd Rose’s book The End of Average.

People are not completely described by a handful of numbers. We’re much more complicated than that. But even in systems that are well described by a few numbers, the region around the average can be nearly empty. I’ll explain why that’s true in general, then look back at the Norma example.

General theory

Suppose you have N points, each described by n independent, standard normal random variables. That is, each point has the form (x_{1}, x_{2}, x_{2}, …, x_{n}) where each x_{i} is independent with a normal distribution with mean 0 and variance 1. The expected value of each coordinate is 0, so you might expect that most points are piled up near the origin (0, 0, 0, …, 0). In fact most points are in spherical shell around the origin. Specifically, as n becomes larger, most of the points will be in a thin shell with distance √n from the origin. (More details here.)

Simulated contest

In the contest above, n = 9, and so we expect most contestants to be about a distance of 3 from average when we normalize each of the factors being measured, i.e. we subtract the mean so that each factor has mean 0, and we divide each by its standard deviation so the standard deviation is 1 on each factor.

We’ve made several simplifying assumptions. For example, we’ve assumed independence, though presumably some of the factors measured in the contest were correlated. There’s also a selection bias: presumably women who knew they were far from average would not have entered the contest. But we’ll run with our simplified model just to see how it behaves in a simulation.

import numpy as np
# Winning critera: minimum Euclidean distance
def euclidean_norm(x):
return np.linalg.norm(x)
# Winning criteria: min-max
def max_norm(x):
return max(abs(x))
n = 9
N = 3864
# Simulated normalized measurements of contestants
M = np.random.normal(size=(N, n))
euclid = np.empty(N)
maxdev = np.empty(N)
for i in range(N):
euclid[i] = euclidean_norm(M[i,:])
maxdev[i] = max_norm(M[i,:])
w1 = euclid.argmin()
w2 = maxdev.argmin()
print( M[w1,:] )
print( euclidean_norm(M[w1,:]) )
print( M[w2,:] )
print( max_norm(M[w2,:]) )

There are two different winners, depending on how we decide the winner. Using the Euclidean distance to the origin, the winner in this simulation was contestant 3306. Her normalized measurements were

If we judge the winner to be the one whose largest deviation from average is the smallest, the winner is contestant 1916. Her normalized measurements were

Solutions to differential equations often satisfy some sort of maximum principle, which can in turn be used to construct upper and lower bounds on solutions.

We illustrate this in one dimension, using a boundary value problem for an ordinary differential equation (ODE).

Maximum principles

If the second derivative of a function is positive over an open interval (a, b), the function cannot have a maximum in that interval. If the function has a maximum over the closed interval [a, b] then it must occur at one of the ends, at a or b.

This can be generalized, for example, to the following maximum principle. Let L be the differential operator

L[u] = u” + g(x)u’ + h(x)

where g and h are bounded functions on some interval [a, b] and h is non-positive. Suppose L[u] ≥ 0 on (a, b). If u has an interior maximum, then u must be constant.

Boundary value problems

Now suppose that we’re interested in the boundary value problem L[u] = f where we specify the values of u at the endpoints a and b, i.e. u(a) = u_{a} and u(b) = u_{b}. We can construct an upper bound on u as follows.

Suppose we find a function z such that L[z] ≤ f and z(a) ≥ u_{a} and z(b) ≥ u_{b}. Then by applying the maximum principle to u – z, we see that u – z must be ≤ 0, and so z is an upper bound for u.

Similarly, suppose we find a function w such that L[w] ≥ f and w(a) ≤ u_{a} and w(b) ≤ u_{b}. Then by applying the maximum principle to w – u, we see that w – u must be ≤ 0, and so w is an lower bound for u.

Note that any functions z and w that satisfy the above requirements give upper and lower bounds, though the bounds may not be very useful. By being clever in our choice of z and w we may be able to get tighter bounds. We might start by choosing polynomials, exponentials, etc. Any functions that are easy to work with and see how good the resulting bounds are.

Tomorrow’s post is similar to this one but looks at bounds for an initial value problem rather than a boundary value problem.

Airy equation example

The following is an elaboration on an example from [1]. Suppose we want to bound solutions to

u”(x) – x u(x) = 0

where u(0) = 0 and u(1) = 1. (This is a well-known equation, but for purposes of illustration we’ll pretend at first that we know nothing about its solutions.)

For our upper bound, we can simply use z(x) = x. We have L[z] ≤ 0 and z satisfies the boundary conditions exactly.

For our lower bound, we use w(x) = x – βx(1 – x). Why? The function z already satisfies the boundary condition. If we add some multiple of x(1 – x) we’ll maintain the boundary condition since x(1 – x) is zero at 0 and 1. The coefficient β gives us some room to maneuver. Turns out L[w] ≥ 0 if β ≥ 1/2. If we choose β = 1/2 we have

(x + x^{2})/2 ≤ u(x) ≤ x

In general, you don’t know the function you’re trying to bound. That’s when bounds are most useful. But this is a sort of toy example because we do know the solution. The equation in this example is well known and is called Airy’s equation. The Airy functions Ai and Bi are independent solutions. Here’s a plot of the solution with its upper and lower bounds.

Here’s the Python code I used to solve for the coefficients of Ai and Bi and make the plot.

SciPy’s function airy has an optimization that we waste here. The function computes Ai and Bi and their first derivatives all at the same time. We could take advantage of that to remove some redundant computations, but that would make the code harder to read. We chose instead to wait an extra nanosecond for the plot.

How can you convert the frequency of a sound to musical notation? I wrote in an earlier post how to calculate how many half steps a frequency is above or below middle C, but it would be useful go further have code to output musical pitch notation.

In scientific pitch notation, the C near the threshold of hearing, around 16 Hz, is called C0. The C an octave higher is C1, the next C2, etc. Octaves begin with C; other notes use the octave number of the closest C below.

The lowest note on a piano is A0, a major sixth up from C0. Middle C is C4 because it’s 4 octaves above C0. The highest note on a piano is C8.

Math

A4, the A above middle C, has a frequency of 440 Hz. This is nine half steps above C4, so the pitch of C4 is 440*2^{-9/12}. C0 is four octaves lower, so it’s 2^{-4} = 1/16 of the pitch of C4. (Details for this calculation and the one below are given in here.)

For a pitch P, the number of half steps from C0 to P is

h = 12 log_{2}(P / C0).

Software

Here is a page that will let you convert back and forth between frequency and music notation: Music, Hertz, Barks.

If you’d like code rather than just to do one calculation, see the Python code below. It calculates the number of half steps h from C0 up to a pitch, then computes the corresponding pitch notation.

from math import log2, pow
A4 = 440
C0 = A4*pow(2, -4.75)
name = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
def pitch(freq):
h = round(12*log2(freq/C0))
octave = h // 12
n = h % 12
return name[n] + str(octave)

The pitch for A4 is its own variable in case you’d like to modify the code for a different tuning. While 440 is common, it used to be lower in the past, and you’ll sometimes see higher values like 444 today.

If you’d like to port this code to a language that doesn’t have a log2 function, you can use log(x)/log(2) for log2(x).

Powers of 2

When scientific pitch notation was first introduced, C0 was defined to be exactly 16 Hz, whereas now it works out to around 16.35. The advantage of the original system is that all C’s have frequency a power of 2, i.e. Cn has frequency 2^{n+4} Hz. The formula above for the number of half steps a pitch is above C0 simplifies to

h = 12 log_{2}P – 48.

If C0 has frequency 16 Hz, the A above middle C has frequency 2^{8.75} = 430.54, a little flat compared to A 440. But using the A 440 standard, C0 = 16 Hz is a convenient and fairly accurate approximation.