Spectral flatness

White noise has a flat power spectrum. So a reasonable way to measure how close a sound is to being pure noise is to measure how flat its spectrum is.

Spectral flatness is defined as the ratio of the geometric mean to the arithmetic mean of a power spectrum.

The arithmetic mean of a sequence of n items is what you usually think of as a mean or average: add up all the items and divide by n.

The geometric mean of a sequence of n items is the nth root of their product. You could calculate this by taking the arithmetic mean of the logarithms of the items, then taking the exponential of the result. What if some items are negative? Since the power spectrum is the squared absolute value of the FFT, it can’t be negative.

So why should the ratio of the geometric mean to the arithmetic mean measure flatness? And why pure tones have “unflat” power spectra?

If a power spectrum were perfectly flat, i.e. constant, then its arithmetic and geometric means would be equal, so their ratio would be 1. Could the ratio ever be more than 1? No, because the geometric mean is always less than or equal to the arithmetic mean, with equality happening only for constant sequences.

In the continuous realm, the Fourier transform of a sine wave is a pair of delta functions. In the discrete realm, the FFT will be large at two points and small everywhere else. Why should a concentrated function have a small flatness score? If one or two of the components are 1’s and the rest are zeros, the geometric mean is zero. So the ratio of geometric and arithmetic means is zero. If you replace the zero entries with some small ε and take the limit as ε goes to zero, you get 0 flatness as well.

Sometimes flatness is measured on a logarithmic scale, so instead of running from 0 to 1, it would run from -∞ to 0.

Let’s compute the flatness of some examples. We’ll take a mixture of a 440 Hz sine wave and Gaussian white noise with varying proportions, from pure sine wave to pure noise. Here’s what the flatness looks like as a function of the proportions.

spectral flatness

The curve is fairly smooth, though there’s some simulation noise at the right end. This is because we’re working with finite samples.

Here’s what a couple of these signals would sound like. First 30% noise and 70% sine wave:


And now the other way around, 70% noise and 30% sine wave:


Why does the flatness of white noise max out somewhere between 0.5 and 0.6 rather than 1? White noise only has a flat spectrum in expectation. The expected value of the power spectrum at every frequency is the same, but that won’t be true of any particular sample.

Related posts:

Family tree numbering

When you draw a tree of your ancestors, things quickly get out of hand. There are twice as many nodes each time you go back a generation, and so the size of paper you need grows exponentially. Things also get messy because typically you know much more about some lines than others. If you know much about your ancestry, one big tree isn’t going to work.

Ahnentafel numbering system from 1590

There’s a simple solution to this problem, one commonly used in genealogy: assign everyone in the tree a number, starting with yourself as 1. Then follow two simple rules:

  1. The father of person n has number 2n.
  2. The mother of person n has number 2n + 1.

You can tell where someone fits into the tree easily from their number. Men have even numbers, women odd numbers. The number of someone’s child is half their number (rounding down if you get a fraction). For example, person 75 on your tree must be a woman. Her husband would be 74, her child 37, her father 150, etc.

Taking the logarithm base 2 tells you how many generations back someone is. That is, person n is ⌊ log2n ⌋ generations back. Going back to our example of 75, this person would be 6 generations back because log2 75 = 6.2288. (Here ⌊ x ⌋ is the “floor” of x, the largest integer less than x. Using the same notation, the child of n is ⌊ n/2 ⌋.)

Said another way, the people m generations back have numbers 2m through 2m+1 – 1. Your paternal line has numbers equal to powers of 2, and your maternal line has numbers one less than powers of 2.

If you write out a person’s number in binary, you stick a 0 on the end to find their father and a 1 on the end to find their mother. So your paternal grandmother, for example, would have number 101 in binary. Start with your number: 1. Then tack on a zero for your father: 10. Then tack on a 1 for his mother: 101.

In our example of 75 above, this number is 1001011 in binary. Leave off the one on the left, then read from left to right saying “father” every time you see a 0 and “mother” every time you see a 1. So person 75 is your father’s father’s mother’s father’s mother’s mother.

This numbering system goes back to at least 1590. In that year Michaël Eytzinger published the chart in the image above, giving the genealogy of Henry III of France.

Related posts:

The acoustics of kettledrums


Kettledrums (a.k.a. tympani) produce a definite pitch, but in theory they should not. At least the simplest mathematical model of a kettledrum would not have a definite pitch. Of course there are more accurate theories that align with reality.

Unlike many things that work in theory but not in practice, kettledrums work in practice but not in theory.

A musical sound has a definite pitch when the first several Fourier components are small integer multiples of the lowest component, the fundamental. A pitch we hear at 100 Hz would have a first overtone at 200 Hz, the second at 300 Hz, etc. It’s the relative strengths of these components give each instrument its characteristic sound.

An ideal string would make a definite pitch when you pluck it. The features of a real string discarded for the theoretical simplicity, such as stiffness, don’t make a huge difference to the tonality of the string.

An ideal circular membrane would vibrate at frequencies that are much closer together than consecutive integer multiples of the fundamental. The first few frequencies would be at 1.594, 2.136, 2.296, 2.653, and 2.918 times the fundamental. Here’s what that would sound like:


I chose amplitudes of 1, 1/2, 1/3, 1/4, 1/5, and 1/6. This was somewhat arbitrary, but not unrealistic. Including more than the first six Fourier components would make the sound even more muddled.

By comparison, here’s what it would sound like with the components at 2x up to 6x the fundamental, using the same amplitudes.


This isn’t an accurate simulation of tympani sounds, just something simple but more realistic than the vibrations of an idea membrane.

The real world complications of a kettledrum spread out its Fourier components to make it have a more definite pitch. These include the weight of air on top of the drum, the stiffness of the drum head, the air trapped in the body of the drum, etc.

If you’d like to read more about how kettle drums work, you might start with The Physics of Kettledrums by Thomas Rossing in Scientific American, November 1982.

How to create Green noise in Python

This is a follow-on to my previous post on green noise. Here we create green noise with Python by passing white noise through a Butterworth filter.

Green noise is in the middle of the audible spectrum (on the Bark scale), just where our hearing is most sensitive, analogous to the green light, the frequency where our eyes are most sensitive. See previous post for details, including an explanation of where the left and right cutoffs below come from.

Here’s the code:

from scipy.io.wavfile import write
from scipy.signal import buttord, butter, filtfilt
from scipy.stats import norm
from numpy import int16

def turn_green(signal, samp_rate):
    # start and stop of green noise range
    left = 1612 # Hz
    right = 2919 # Hz

    nyquist = (samp_rate/2)
    left_pass  = 1.1*left/nyquist
    left_stop  = 0.9*left/nyquist
    right_pass = 0.9*right/nyquist
    right_stop = 1.1*right/nyquist

    (N, Wn) = buttord(wp=[left_pass, right_pass],
                      ws=[left_stop, right_stop],
                      gpass=2, gstop=30, analog=0)
    (b, a) = butter(N, Wn, btype='band', analog=0, output='ba')
    return filtfilt(b, a, signal)

def to_integer(signal):
    # Take samples in [-1, 1] and scale to 16-bit integers,
    # values between -2^15 and 2^15 - 1.
    signal /= max(signal)
    return int16(signal*(2**15 - 1))

N = 48000 # samples per second

white_noise= norm.rvs(0, 1, 3*N) # three seconds of audio
green = turn_green(white_noise, N)
write("green_noise.wav", N, to_integer(green))

And here’s what it sounds like:

(download .wav file)

Let’s look at the spectrum to see whether it looks right. We’ll use one second of the signal so the x-axis coincides with frequency when we plot the FFT.

from scipy.fftpack import fft

one_sec = green[0:N]
plt.xlim((1500, 3000))

Here’s the output, concentrated between 1600 and 3000 Hz as expected:

spectral plot of green noise

Green noise and Barks

Colors of noise

In a previous post I explained the rationale behind using names of colors to refer to different kinds of noise. The basis is an analogy between the spectra of sounds and the spectra of light. Red noise is biased toward the low end of the audio spectrum just as red light is toward the low end of the visible spectrum. Blue noise is biased toward the high end, just as blue light is toward the high end of the visible spectrum.

visible spectrum

Green noise

Green noise is based on a slightly different analogy with light as described here:

Blue, green and other noise colours seem not to be rigorously defined although the word “colour” is used a lot in describing noise. Some define the 7 rainbow colours to correspond to a width of about three critical bands in the Bark frequency scale such that green lies in the corresponding point of greatest sensitivity … [just as green light has] the greatest sensitivity for the eye. This identifies green noise as the most troublesome for speech systems.

This is different than the usual definition of red noise etc. in that it speaks of colors limited to a particular frequency range rather than being weighted toward that range. Usually red noise contains a broad spectrum of frequencies, but the weighted like 1/f2, so the spectrum decreases fairly quickly as frequency increases.


So what is this Bark frequency scale? First of all, the Bark scale was named in honor of acoustician Heinrich Barkhausen. On this scale, the audible spectrum runs from 0 to 24, each Bark being a sort of psychologically equal division. Lots of things in psychoacoustics work on the Bark scale rather than the scale of Hertz.

There are multiple ways to convert from Hz to Bark and back, each slightly different but approximately equivalent. A convenient form is

z = 6 arcsinh(f/600)

where f is frequency in Hertz and z is frequency in Bark. One reason this form is convenient is that it’s easy to invert:

f = 600 sinh(z/6)

A frequency of 24 Bark corresponds to around 16 kHz, so the audible spectrum doesn’t quite end at 24, at least for most young people, but applications are most concerned with the range of 0–24 Bark.

The paragraph above is a little vague about where the color boundaries should be. When it says there are seven intervals, each “a width of about three critical bands,” I assume it means to divide the range of 0–24 Bark into seven equal pieces, making each 24/7 or 3.4 Barks wide. If we do this, red would run from 0–3.43 Barks, orange from 3.43–6.86, yellow from 6.86–12.29, green from 10.29–13.71, etc.

This would put green noise in the range of 1612 to 2919 Hz. Human hearing is most sensitive around 2000 Hz, near the middle of this interval.

In musical notation, the frequency range of green noise runs from G6 to F#7. See this post for an explanation of the pitch notation and Python code for computing it from frequency.

Update: See the next post for how to create an audio file of green noise in Python. Here’s a spectral plot from that post showing that the frequencies in the noise are in the expected range.

spectral plot of green noise

Consecutive pair magic square

The following magic square has a couple unusual properties. For one, numbers appear in consecutive pairs. Also, you can connect the numbers 1 through 32 in a continuous path.

I found this in Before Sudoku. The authors attribute it to William Mannke, “A Magic Square.” Journal of Recreational Mathematics. 1 (3) page 139, July 1968.

Other magic squares:

How to digitize a graph

Suppose you have a graph of a function, but you don’t have an equation for it or the data that produced it. How can you reconstruction the function?

There are a lot of software packages to digitize images. For example, Web Plot Digitizer is one you can use online. Once you have digitized the graph at a few points, you can fit a spline to the points to approximately reconstruct the function. Then as a sanity check, plot your reconstruction to see if it looks like the original. It helps to have the same aspect ratio so you’re not distracted by something that doesn’t matter, and so that differences that do matter are easier to see.

For example, here is a graph from Zwicker and Fastl’s book on psychoacoustics. It contains many graphs with no data or formulas. This particular one gives the logarithmic transmission factor between free field and the peripheral hearing system.

Here’s Python code to reconstruct the functions behind these two curves.

import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate 

curve_names = ["Free", "Diffuse"]
plot_styles = { "Free" : 'b-', "Diffuse" : 'g:'}

data = {}
for name in curve_names:
    data = np.loadtxt("{}.csv".format(name), delimiter=',')

    x = data[:,0]
    y = data[:,1]
    spline = interpolate.splrep(x, y)
    xnew = np.linspace(0, max(x), 100)
    ynew = interpolate.splev(xnew, spline, der=0)
    plt.plot(xnew, ynew, plot_styles[name])
logical_x_range  = 24    # Bark
logical_y_range  = 40    # dB
physical_x_range = 7     # inch
physical_y_range = 1.625 # inch

plt.legend(curve_names, loc=2)
plt.xlabel("critical-band rate")
plt.xlim((0, logical_x_range))

    (physical_y_range/logical_y_range) / 
    (physical_x_range/logical_x_range) )
ax = plt.gca()
ax.get_xaxis().set_ticks([0, 4, 8, 12, 16, 20, 24])
ax.get_yaxis().set_ticks([-10, 0, 10, 20, 30])


Here’s the reconstructed graph.

Linear or not, random or not, at different levels

Linear vs nonlinear

I’ve run across a lot of ambiguity lately regarding systems described as “nonlinear.” Systems typically have several layers, and are linear at one level and nonlinear at another, and authors are not always clear about which level they’re talking about.

For example, I recently ran across something called a “nonlinear trapezoid filter.” My first instinct was to parse this as

nonlinear (trapezoid filter)

though I didn’t know what that meant. On closer inspection, I think they meant

(nonlinear trapezoid) filter

which is a linear filter, formed by multiplying a spectrum by a “nonlinear trapezoid,” a function whose graph looks like a trapezoid except one of the sides is slightly curved.

One of the things that prompted this post was a discussion with a client about Kalman filters and particle filters. You can have linear methods for tracking nonlinear processes, nonlinear methods for tracking nonlinear processes, etc. You have to be careful in this context when you call something “linear” or not.

Random vs deterministic

There’s a similar ambiguity around whether a system is random or deterministic. Systems are often deterministic at one level and random at another. For example, a bandit design is a deterministic rule for conducting an experiment with random outcomes. Simulations can be even more confusing because there could be several alternating layers of randomness and determinism. People can talk past each other while thinking they’re being clear. They can say something about variance, for example, and other other person nods their head, though they’re thinking about the variance of two different things.

As simple as it sounds, you can often help a team out by asking them to be more explicit when they say something is linear or nonlinear, random or deterministic. In a sense this is nothing new: it helps to be explicit. But I’m saying a little more than that, suggesting a couple particular areas to watch out for, areas where it’s common for people to be vague when they think they’re being clear.

Related posts

The intersection of genomes is empty

From this story in Quanta Magazine:

In fact, there’s no single set of genes that all living things need in order to exist. When scientists first began searching for such a thing 20 years ago, they hoped that simply comparing the genome sequences from a bunch of different species would reveal an essential core shared by all species. But as the number of genome sequences blossomed, that essential core disappeared. In 2010, David Ussery, a biologist at Oak Ridge National Laboratory in Tennessee, and his collaborators compared 1,000 genomes. They found that not a single gene is shared across all of life.

Magic hexagon

The following figure is a magic hexagon: the numbers in any straight path through the figure add to 38, even though paths may have length three, four, or five.

I found this in Before Sudoku. The authors attribute it to Madachy’s Mathematical Recreations.

This is essentially the only magic hexagon filled with consecutive integers starting with one. The only others are rotations or reflections of this one, or the trivial case of a single hexagon.

Related posts:

Personal growth and discrete harmonic functions

“You are the average of the five people you spend the most time with.”

A Google search says this quote is by Jim Rohn. I think other people have said similar things. I’ve heard it quoted many times. The implication is usually that you can improve your life by hanging around better people.

Here are three things it makes me think of.

  1. It sounds approximately true.
  2. It can’t be literally true.
  3. It reminds me of harmonic functions.

There are numbers to back up the assertion that things like your income and even your weight approximately match the average of the people around you. Cause and effect go both ways: people like to hang out with people like themselves, and we become like the people we hang around.

It’s an aphorism, not meant to be taken literally. But a moment’s thought shows that it can’t be literally true for everybody. In any social network, someone has the most money, for example. That person’s net worth cannot be the average of the net worth of others in the group, unless everyone has the exact same net worth. The same applies to the poorest person in the network.

The reason I say that this reminds me of harmonic functions is the mean value theorem. If a function satisfies Laplace’s equation in some region, at any point in the interior of the region, the value of the function equals the average of the function over a spherical region centered at the point. But this is only true in the interior. On the boundary, you might have a maximum or minimum. If the boundary is compact, you will have a maximum and a minimum, provided the function extends continuously to its boundary.

I think of the continuous case first because I spent years thinking about such things. But there’s a discrete analog of harmonic functions that applies directly to the example above. If you have some network, such as a social network, and assign number to each node, you have a discrete harmonic function if the value at every node is the average of the values at its neighboring nodes. For a finite network, a function cannot be harmonic at every point unless it is constant, for reasons given above. But a function could be harmonic at all but two nodes of a graph, or approximately harmonic at all nodes.

Related posts:

Technical memento mori

The Latin phrase memento mori means “remember that you must die.” It has been adopted into English to refer to an object that serves as a reminder of death, especially a skull. This is a common theme in art, such as Albrecht Dürer’s engraving St. Jerome in His Study.

Saint Jerome in His Study (Dürer)

I keep a copy of the book Inside OLE as a sort of technological memento mori.

Inside OLE by Kraig Brockschmidt

At some point in the 1990’s I thought OLE was the way of the future. My professional development as a programmer would be complete once I mastered OLE, and this book was the way to get there.

Most of you probably have no idea what OLE is, which is kinda my point. It stands for Object Linking and Embedding, a framework that began at Microsoft in 1990. It was a brilliant solution to the problems it was designed to address, given the limitations of its time. Today it seems quaint. Now I’m doubly removed from OLE: hardly any Windows software developers think about OLE, and I hardly think about Windows development. The thing I was anxious to understand deeply was irrelevant to me a few years later.

Despite my one-time infatuation with OLE, throughout my career I have mostly focused on things that will last. In particular, I’ve focused more on mathematics than technology because the former has a longer shelf life. As I quipped on Twitter one time, technology has the shelf life of bread, but mathematics has the shelf life of honey. Still, man does not live by honey alone. We need bread too, even if it only lasts a day.

Related post: Remembering COM

Well, F = ma.

Three or four very short stories on the difficulty of learning to use simple things. Depends whether you count the last section as a story.

* * *

When I was taking freshman physics and we were stuck on a problem, the professor would say “Well, F = ma.”

True, but absolutely useless. Yes, we know that F = ma. (Force equals mass times acceleration.) Nobody thought “Oh, that’s it. I was thinking F = ma2. That explains everything.” Newton’s laws are simple (in a sense) but subtle to apply. The difficult part isn’t the abstract principles but their application to concrete problems.

* * *

The heart of Bayesian statistics is not much more complicated than F = ma. Its the statement that

posterior ∝ likelihood × prior.

It takes a few years to learn how to apply that equation well. And when people try to help, their advice sounds about as useless as “Well, F = ma.”

* * *

Learning to use Unix was hard. When I asked for help getting started, a lab assistant said “Go read the man pages.” That’s about as hostile as saying “Want to learn English? Read a dictionary.” Fortunately I knew other people who were helpful. One of them told me about the book.

But still, it took a while to get the gestalt of Unix. I knew how to use a handful of utilities, and kept thinking everything would be fine once I knew maybe 10x as many utilities. Then one day I was talking with a friend who seemed fluent working with Unix. I asked him how he did a few things and realized he used the same tools I did, but used them better. It was almost as if he’d said “I just use F = ma” except when he said it things clicked.

* * *

The motivation for this post, the thing that brought these stories to mind, was listening to a podcast. The show had some good advice, things that I know I need to do, but nothing I hadn’t heard many times before. The hard part is working out what the particulars mean for me personally.

It often takes someone else to help us see what’s right in front of us. I’m grateful for the people who have helped me work out the particulars of things I was convinced of but couldn’t see how to apply. Sometimes I have the pleasure of being able to do that for someone else.

Magic square rows and columns as numbers

Take any 3 by 3 magic square. For example, here’s the ancient Lo Shu square:

[[4,9,2],[3,5,7], [8,1,6]]

If you read the rows as numbers and sum their squares, you get the same thing whether you read left to right or right to left. In this case

4922 + 3572 + 8162 = 2942 + 7532 + 6182.

Similarly, if you read the columns as numbers and sum their squares, you get the same thing whether you read top to bottom or bottom to top:

4382 + 9512 + 2762 = 8342 + 1592 + 6722.

This doesn’t depend on base 10. It’s true of any base. And the entries of the magic square do not have to be single digits as long as you take the first to be the coefficient of b2, the second the coefficient of b, and the last the coefficient of 1, where b is your base.

In addition to rows and columns, you can get analogous results for diagonals.

4562 + 9782 + 2312 = 6542 + 8792 + 1322

4562 + 3122 + 8972 = 6542 + 2132 + 7982

2582 + 9362 + 4712 = 8522 + 6392 + 1742

2582 + 7142 + 6932 = 8522 + 4172 + 3962

How would you prove this? Arthur Benjamin and Kan Yasuda give an elegant proof here using permutation matrices. Or you could use brute-force starting with Édouard Lucas’ theorem that every 3 by 3 magic square has the following form.

[[c-b, c+a+b, c-a], [c-a+b, c, c+a-b], [c+a, c-a-b, c+b]]

(For each ab, and c there are eight variations on magic square given by Lucas, reflections and rotations of his square.)

Benjamin and Yasuda attribute this discovery to R. Holmes in 1970. “The magic magic square”, The Mathematical Gazette, 54(390):376.

Alphamagic squares in French

In earlier blog posts I give examples of alphamagic squares in English and in Spanish. This post looks at French.

French flag

The Wikipedia article on alphamagic squares, quoting The Universal Book of Mathematics, says

French allows just one 3 × 3 alphamagic square involving numbers up to 200, but a further 255 squares if the size of the entries is increased to 300.

My script did not find a French alphamagic with entries no larger than 200, but it did find 254 squares with entries no larger than 300. (For each of these squares there are 7 more variations you can form by rotation and reflection.)

The French alphamagic square with the smallest maximum entry that I found is the following.

[[3, 204, 102], [202, 103, 4], [104, 2, 203]]

When spelled out in French we get

[[trois, deux cent quatre, cent deux], [deux cent deux, cent trois, quatre], [cent quatre, deux, deux cent trois]]

And when we replace each cell with its number of letters we get the following magic square:

[[ 5, 14, 8], [12, 9, 6], [10, , 13]]

A list of all the French magic squares with maximum element 300 that I found is available here.