Kalman filters and bottom-up learning


Kalman filtering is a mixture of differential equations and statistics. Kalman filters are commonly used in tracking applications, such as tracking the location of a space probe or tracking the amount of charge left in a cell phone battery. Kalman filters provide a way to synthesize theoretical predictions and actual measurements, accounting for error in both.

Engineers naturally emphasize the differential equations and statisticians naturally emphasize the statistics. Both perspectives are valuable, but in my opinion and experience, the engineering perspective should come first.

From an engineering perspective, a Kalman filtering problem starts as a differential equation. In an ideal world, one would simply solve the differential equation and be done. But the experienced engineer realizes his or her differential equations don’t capture everything. (Unlike the engineer in this post.) Along the road to the equations at hand, there were approximations, terms left out, and various unknown unknowns.

The Kalman filter accounts for some level of uncertainty in the process dynamics and in the measurements taken. This uncertainty is modeled as randomness, but this doesn’t mean that there’s necessarily anything “random” going on. It simply acknowledges that random variables are an effective way of modeling miscellaneous effects that are unknown or too complicated to account for directly. (See Random is as random does.)

The statistical approach to Kalman filtering is to say that it is simply another estimation problem. You start from a probability model and apply Bayes’ theorem. That probability model has a term inside that happens to come from a differential equation in practice, but this is irrelevant to the statistics. The basic Kalman filter is a linear model with normal probability distributions, and this makes a closed-form solution for the posterior possible.
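To make this concrete, here is a minimal sketch of a one-dimensional Kalman filter in Python, assuming the simplest possible model: a random walk state observed with noise. The function name and parameter values are illustrative, not from any particular library.

import numpy as np

# Minimal sketch of a scalar Kalman filter, assuming the state
# follows a random walk x_t = x_{t-1} + process noise (variance q)
# and we observe y_t = x_t + measurement noise (variance r).
def kalman_1d(measurements, q=0.01, r=1.0, x=0.0, p=1.0):
    estimates = []
    for y in measurements:
        p += q            # predict: uncertainty grows by the process noise
        k = p / (p + r)   # Kalman gain: how much weight the data gets
        x += k * (y - x)  # update: blend prediction and measurement
        p *= (1 - k)      # posterior variance shrinks
        estimates.append(x)
    return np.array(estimates)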

You’d be hard pressed to start from a statistical description of Kalman filtering, such as that given here, and have much appreciation for the motivating dynamics. Vital details have simply been abstracted away. As a client told me once when I tried to understand his problem from the top down, “You’ll never get here from there.”

The statistical perspective is complementary. Some things are clear from the beginning with the statistical formulation that would take a long time to see from the engineering perspective. But while both perspectives are valuable, I believe it’s easier to start on the engineering end and work toward the statistics end rather than the other way around.

History supports this claim. The Kalman filter from the engineering perspective came first and its formulation in terms of Bayesian statistics came later. Except that’s not entirely true.

Rudolf Kálmán published his seminal paper in 1960 and four years later papers started to come out making the connection to Bayesian statistics. But while Kálmán and others were working in the US starting from the engineering end, Ruslan Stratonovich was working in Russia starting from the statistical end. Still, I believe it’s fair to say that most of the development and application of Kalman filters has proceeded from the engineering to the statistics rather than the other way around.


Cepstral alanysis vocabulary

An earlier post defined cepstrum and quefrency. This post explains some of the other quirky terms introduced in the same paper by Bogert, Healy, and Tukey. (Given Tukey’s delight in coining words, we can assume he was the member of the trio responsible for the new terms.)

The paper [1] explains the reason for the new twists on familiar words:

In general, we find ourselves operating on the frequency side in ways customary on the time side and vice versa. Experience has made it clear that “words that sound like other words,” although strange at first sight, considerably reduce confusion on balance. These parallel or “paraphrased” words are made by the interchange of consonants or consonant groups, as in “alanysis” from “analysis,” and are introduced as needed.

The magnitude and phase of a cepstrum are called gamnitude and saphe. (The latter explains the pun “saphe cracking” in the title.)

Filtering in the cepstral domain is called liftering. A high-pass filter corresponds to a long-pass lifter and a low-pass filter corresponds to a short-pass lifter.

Harmonics in spectra correspond to rahmonics in cepstra.

Some of these terms are helpful. As explained in the previous post, the independent variable in cepstral analysis, quefrency, differs enough from frequency that it helps to have a separate term for it. Using the terms long and short rather than high and low is helpful for the same reason. Using repiod for the analog of period seems gratuitous, but maybe it’s necessary for consistency. Once you introduce some new terminology, you have to keep going.

[1] Bruce P. Bogert, M. J. R. Healy, and John W. Tukey. “The Quefrency Alanysis of Time Series for Echoes: Cepstrum, Pseudo-Autocovariance, Cross-Cepstrum and Saphe Cracking.” In The Collected Works of John W. Tukey, Volume 1.

Cepstrum, quefrency, and pitch


John Tukey coined many terms that have passed into common use, such as bit (a shortening of binary digit) and software. Other terms he coined are well known within their niche: boxplot, ANOVA, rootogram, etc. Some of his terms, such as jackknife and vacuum cleaner, were not new words per se but common words he gave a technical meaning to.

Cepstrum is an anagram of spectrum. It involves an unusual use of power spectra, and is roughly analogous to making anagrams of a word. A related term, one we will get to shortly, is quefrency, an anagram of frequency. Some people pronounce the ‘c’ in cepstrum hard (like ‘k’) and some pronounce it soft (like ‘s’).

Let’s go back to an example from my post on guitar distortion. Here’s a note played with a fairly large amount of distortion:

 

And here is its power spectrum:

[Figure: power spectrum of the distorted note]

There’s a lot going on in the spectrum, but the peaks are very regularly spaced. As I mentioned in the post on the sound of a leaf blower, this is the fingerprint of a sound with a definite pitch. Spikes in the spectrum alone don’t indicate a definite pitch if they are irregularly spaced.

The peaks are fairly periodic. How do you find periodic patterns in a signal? Fourier transform! But if you simply take the Fourier transform of a Fourier transform, you essentially get the original signal back. The key to the cepstrum is to do something else between the two Fourier transforms.

The cepstrum starts by taking the Fourier transform, then the magnitude, then the logarithm, and then the inverse Fourier transform.
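In code the pipeline might look like the following. This is a minimal sketch assuming a real-valued NumPy array, with a small epsilon added to guard against taking the log of zero:

import numpy as np

def cepstrum(signal):
    # Fourier transform -> magnitude -> log -> inverse Fourier transform
    spectrum = np.fft.fft(signal)
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)
    return np.fft.ifft(log_magnitude).real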

When we take the magnitude, we throw away phase information, which we don’t need in this context. Taking the log of the magnitude is essentially what you do when you compute sound pressure level. Some define the cepstrum using the magnitude of the Fourier transform and some the magnitude squared. Squaring only introduces a multiple of 2 once we take logs, so it doesn’t affect the location of peaks, only their amplitude.

Taking the logarithm compresses the peaks, bringing them all into roughly the same range, making the sequence of peaks roughly periodic.

When we take the inverse Fourier transform, we now have something like a frequency, but inverted. This is what Tukey called quefrency.

Looking at the guitar power spectrum above, we see a sequence of peaks spaced 440 Hz apart. When we take the inverse Fourier transform of this, we’re looking at a sort of frequency of a frequency, what Tukey calls quefrency. The quefrency scale is inverted: sounds with a high frequency fundamental have overtones that are far apart in the frequency domain, so the sequence of overtone peaks has low frequency.

Here’s the plot of the cepstrum for the guitar sample.

[Figure: cepstrum of the guitar sample]

There’s a big peak at 109 on the quefrency scale. The audio clip was recorded at 48000 samples per second, so 109 on the quefrency scale corresponds to a frequency of 48000/109 ≈ 440 Hz. The second peak is at quefrency 215, which corresponds to 48000/215 ≈ 223 Hz. The second peak corresponds to the perceived pitch of the note, A3, and the first peak corresponds to its first overtone, A4. (Remember the quefrency scale is inverted relative to the frequency scale.)

I cheated a little bit in the plot above. The very highest peaks are at 0. They are so large that they make it hard to see the peaks we’re most interested in. These low quefrency peaks correspond to very high frequency noise, near the edge of the audible spectrum or beyond.

Electric guitar distortion


The other day I asked on Google+ if someone could make an audio clip for me and Dave Jacoby graciously volunteered. I wanted a simple chord on an electric guitar played with varying levels of distortion. Dave describes the process of making the recording as

Fender Telecaster -> EHX LPB clean boost -> Washburn Soloist Distortion (when engaged) -> Fender Frontman 25R amplifier -> iPhone

Let’s look at the Fourier spectrum at four places in the recording: single note and chord, clean and distorted. These are at 0:02, 0:08, 0:39, and 0:43.
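Here is roughly how such spectra can be computed. This is a sketch, with a hypothetical filename, assuming a mono recording:

from scipy.io.wavfile import read
import numpy as np
import matplotlib.pyplot as plt

rate, samples = read("guitar.wav")  # hypothetical filename
start = 2                           # seconds into the recording
segment = samples[start*rate : (start + 1)*rate]
power = np.abs(np.fft.rfft(segment))**2
freqs = np.fft.rfftfreq(len(segment), d=1/rate)
plt.plot(freqs, 10*np.log10(power + 1e-12))  # power in dB
plt.xlabel("frequency (Hz)")
plt.show()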

 

Power spectra

The first note, without distortion, has most of its spectrum concentrated at 220 Hz, the A below middle C.

[Figure: power spectrum of single note, no distortion]

 

The same note with distortion has a power spectrum that decays much more slowly, i.e. the sound has more high frequency components.

[Figure: power spectrum of single note with distortion]

 

Here’s the A major chord without distortion. Note that since the threshold of hearing is around 20 dB, most of the noise components are inaudible.

[Figure: power spectrum of chord, no distortion]

 

Here’s the same chord with distortion. Notice there’s much more noise in the audible range.

[Figure: power spectrum of chord with distortion]

 

Update: See the next post for an analysis of the loudness and sharpness of the audio samples in this post.



Spectral flatness

White noise has a flat power spectrum. So a reasonable way to measure how close a sound is to being pure noise is to measure how flat its spectrum is.

Spectral flatness is defined as the ratio of the geometric mean to the arithmetic mean of a power spectrum.

The arithmetic mean of a sequence of n items is what you usually think of as a mean or average: add up all the items and divide by n.

The geometric mean of a sequence of n items is the nth root of their product. You could calculate this by taking the arithmetic mean of the logarithms of the items, then taking the exponential of the result. What if some items are negative? Since the power spectrum is the squared absolute value of the FFT, it can’t be negative.
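Here is a sketch of the computation in Python, using the log trick just described. The small epsilon is an assumption to guard against taking the log of a zero bin:

import numpy as np

def spectral_flatness(signal):
    # power spectrum of a real-valued signal
    power = np.abs(np.fft.rfft(signal))**2
    # geometric mean computed via the mean of the logs
    gmean = np.exp(np.mean(np.log(power + 1e-12)))
    return gmean / np.mean(power)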

So why should the ratio of the geometric mean to the arithmetic mean measure flatness? And why do pure tones have “unflat” power spectra?

If a power spectrum were perfectly flat, i.e. constant, then its arithmetic and geometric means would be equal, so their ratio would be 1. Could the ratio ever be more than 1? No, because the geometric mean is always less than or equal to the arithmetic mean, with equality happening only for constant sequences.

In the continuous realm, the Fourier transform of a sine wave is a pair of delta functions. In the discrete realm, the FFT will be large at two points and small everywhere else. Why should a concentrated function have a small flatness score? If one or two of the components are 1’s and the rest are zeros, the geometric mean is zero. So the ratio of geometric and arithmetic means is zero. If you replace the zero entries with some small ε and take the limit as ε goes to zero, you get 0 flatness as well.

Sometimes flatness is measured on a logarithmic scale, so instead of running from 0 to 1, it would run from -∞ to 0.

Let’s compute the flatness of some examples. We’ll take a mixture of a 440 Hz sine wave and Gaussian white noise with varying proportions, from pure sine wave to pure noise. Here’s what the flatness looks like as a function of the proportions.

[Figure: spectral flatness as a function of noise proportion]

The curve is fairly smooth, though there’s some simulation noise at the right end. This is because we’re working with finite samples.
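Here is a sketch of how such an experiment might be run, reusing the spectral_flatness sketch above; the sample rate and proportions are illustrative:

from scipy.stats import norm
import numpy as np

N = 48000                       # samples per second
t = np.arange(N) / N            # one second of audio
sine = np.sin(2*np.pi*440*t)
noise = norm.rvs(size=N)

for a in [0.0, 0.3, 0.7, 1.0]:  # proportion of noise
    mix = (1 - a)*sine + a*noise
    print(a, spectral_flatness(mix))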

Here’s what a couple of these signals would sound like. First 30% noise and 70% sine wave:

(download)

And now the other way around, 70% noise and 30% sine wave:

(download)

Why does the flatness of white noise max out somewhere between 0.5 and 0.6 rather than 1? White noise only has a flat spectrum in expectation. The expected value of the power spectrum at every frequency is the same, but that won’t be true of any particular sample.


How to create green noise in Python

This is a follow-on to my previous post on green noise. Here we create green noise with Python by passing white noise through a Butterworth filter.

Green noise is in the middle of the audible spectrum (on the Bark scale), just where our hearing is most sensitive, analogous to green light, which sits where our eyes are most sensitive. See the previous post for details, including an explanation of where the left and right cutoffs below come from.

Here’s the code:

from scipy.io.wavfile import write
from scipy.signal import buttord, butter, filtfilt
from scipy.stats import norm
from numpy import int16

def turn_green(signal, samp_rate):
    # start and stop of green noise range
    left = 1612 # Hz
    right = 2919 # Hz

    nyquist = (samp_rate/2)
    left_pass  = 1.1*left/nyquist
    left_stop  = 0.9*left/nyquist
    right_pass = 0.9*right/nyquist
    right_stop = 1.1*right/nyquist

    (N, Wn) = buttord(wp=[left_pass, right_pass],
                      ws=[left_stop, right_stop],
                      gpass=2, gstop=30, analog=False)
    (b, a) = butter(N, Wn, btype='band', analog=False, output='ba')
    return filtfilt(b, a, signal)

def to_integer(signal):
    # Take samples in [-1, 1] and scale to 16-bit integers,
    # values between -2^15 and 2^15 - 1.
    signal /= max(abs(signal))  # divide by the largest magnitude, not the largest value
    return int16(signal*(2**15 - 1))

N = 48000 # samples per second

white_noise = norm.rvs(0, 1, 3*N) # three seconds of audio
green = turn_green(white_noise, N)
write("green_noise.wav", N, to_integer(green))

And here’s what it sounds like:

(download .wav file)

Let’s look at the spectrum to see whether it looks right. We’ll use one second of the signal so the x-axis coincides with frequency when we plot the FFT.

from scipy.fftpack import fft
import matplotlib.pyplot as plt

one_sec = green[0:N]
plt.plot(abs(fft(one_sec)))
plt.xlim((1500, 3000))
plt.show()

Here’s the output, concentrated between 1600 and 3000 Hz as expected:

[Figure: spectrum of the green noise sample]

Beats: amplitude modulation in radios and musical instruments

What do tuning a guitar and tuning a radio have in common? Both are examples of beats or amplitude modulation.

Examples

In an earlier post I wrote about how beats come up in vibrating systems, such as a mass and spring combination or an electric circuit. Here I look at examples from music and radio.

Music

When two musical instruments play nearly the same note, they produce beats. The number of beats per second is the difference in the two frequencies. So if two flutes are playing an A, one playing at 440 Hz and one at 442 Hz, you’ll hear a pitch at 441 Hz that beats two times a second. Here’s a wave file of two pure sine waves at 440 Hz and 442 Hz.
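A clip like that takes only a few lines of Python to generate. This is a sketch, with a hypothetical filename, in the style of the code later in these posts:

from scipy.io.wavfile import write
import numpy as np

N = 48000                 # samples per second
t = np.arange(3*N) / N    # three seconds
beats = np.sin(2*np.pi*440*t) + np.sin(2*np.pi*442*t)
# normalize by the largest magnitude and scale to 16-bit integers
samples = np.int16(beats/np.max(np.abs(beats)) * (2**15 - 1))
write("beats.wav", N, samples)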

As the players come closer to being in tune, the beats slow down. Sometimes you don’t have two instruments but two strings on the same instrument. Guitarists listen for beats to tell when two strings are playing the same note with the same pitch.

AM radio

The same principle applies to AM radio. A message is transmitted by multiplying a carrier signal by the content you want to broadcast. The beats are the content. As we’ll see below, in some ways the musical example and the AM radio example are opposites. With tuning, we start with two sources and create beats. With AM radio, we start by creating beats, then see that we’ve created two sources, the sidebands of the signal.

Mathematical explanation

Both examples above relate to the following trig identity:

cos(a - b) + cos(a + b) = 2 cos a cos b

And because we’re looking at time-varying signals, slip in a factor of 2πt:

cos(2π(a - b)t) + cos(2π(a + b)t) = 2 cos 2πat cos 2πbt

Music

In the case of two pure tones, slightly out of tune, let a = 441 and b = 1. Playing an A 440 and an A 442 at the same time results in an A 441, twice as loud, with the amplitude going up and down like cos 2πt, i.e. oscillating two times a second. (Why two times and not just once? One beat for the maximum and one for the minimum of cos 2πt.)

It may be hard to hear beats because of complications we’ve glossed over. Musical instruments are never perfectly in phase, but more importantly they’re not pure tones. An oboe, for example, has strong components above the fundamental frequency. I used a flute in this example because although its tone is not simply a sine wave, it’s closer to a sine wave than other instruments, especially when playing higher notes. Also, guitarists often compare the harmonics of two strings. These are purer tones and so it’s easier to hear beats between them.

Radio

For the case of AM radio, read the equation above from right to left. Let a be the frequency of the carrier wave. For example if you’re broadcasting on AM station 700, this means 700 kHz, so a = 700,000. If this station were broadcasting a pure tone at 440 Hz, b would be 440. This would produce sidebands at 700,440 Hz and 699,560 Hz.

[Figure: AM signal with its envelope]

In practice, however, the carrier is not multiplied by a signal like cos 2πbt but by 1 + m cos 2πbt where |m| < 1 to avoid over-modulation. Without this extra factor of 1 the signal would be 100% modulated; the envelope of the signal would pinch all the way down to zero. By including the factor of 1 and using a modulation index m less than 1, the signal looks more like the image above, with the envelope not pinching all the way down. (Over-modulation occurs when m > 1. Instead of the envelope pinching to zero, the upper and lower parts of the envelope cross.)
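Here is a sketch of generating such a signal in Python. The carrier here is far below a real station’s 700 kHz so that the signal can be represented at an audio sample rate; the tone frequency and modulation index are arbitrary:

import numpy as np

N = 48000                  # samples per second
t = np.arange(N) / N       # one second
a, b, m = 5000, 440, 0.5   # carrier Hz, tone Hz, modulation index

# carrier multiplied by 1 + m cos(2 pi b t), with |m| < 1
am = (1 + m*np.cos(2*np.pi*b*t)) * np.cos(2*np.pi*a*t)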


Creating police siren sounds with frequency modulation

Yesterday I was looking into calculating fluctuation strength and playing around with some examples. Along the way I discovered how to create files that sound like police sirens. These are sounds with high fluctuation strength.


The Python code below starts with a carrier wave at fc = 1500 Hz. Not surprisingly, this frequency is near where hearing is most sensitive. Then this signal is modulated with a signal with frequency fm. This frequency determines the frequency of the fluctuations.

The slower example produced by the code below sounds like a police siren. The faster example makes me think more of an ambulance or fire truck. Next time I hear an emergency vehicle I’ll pay more attention.

If you use a larger value of the modulation index β and a smaller value of the modulation frequency fm you can make a sound like someone tuning a radio, which is no coincidence.

Here are the output audio files in .wav format:

slow.wav

fast.wav

from scipy.io.wavfile import write
from numpy import arange, pi, sin, int16

def f(t, f_c, f_m, beta):
    # t    = time
    # f_c  = carrier frequency
    # f_m  = modulation frequency
    # beta = modulation index
    return sin(2*pi*f_c*t - beta*sin(2*f_m*pi*t))

def to_integer(signal):
    # Take samples in [-1, 1] and scale to 16-bit integers,
    # values between -2^15 and 2^15 - 1.
    return int16(signal*(2**15 - 1))

N = 48000 # samples per second
x = arange(3*N) # three seconds of audio

data = f(x/N, 1500, 2, 100)
write("slow.wav", N, to_integer(data))

data = f(x/N, 1500, 8, 100)
write("fast.wav", N, to_integer(data))


Correlation of two sine waves

What is the correlation of two sine waves that differ in phase? The result itself is interesting, and the calculation along the way shows tricks to avoid calculating integrals.

[Figure: two sine waves slightly out of phase]

The correlation of two periodic signals, f and g, is

\frac{ \int \left( f - \mu_f\right) \left( g - \mu_g\right) \, dt } { \left( \int (f - \mu_f)^2\, dt \right)^{1/2} \left( \int (g - \mu_g)^2\, dt \right)^{1/2} }

where the integral is over a period of the two functions. For functions known at discrete points this would be a sum rather than an integral, but in this case we have continuous signals so we integrate.

In our case the two functions are f(t) = sin(t) and g(t) = sin(t + φ) and the integrals are over [0, 2π]. Both functions have average value 0, so the μ terms go away.

We use a trig identity to expand the numerator:

\sin\theta_1 \sin \theta_2 = \frac{1}{2} \left( \cos(\theta_1-\theta_2) - \cos(\theta_1 + \theta_2) \right)

In our case θ1 = t and θ2 = t + φ and so the numerator becomes

\frac{1}{2} \int_0^{2\pi} \left( \cos\phi - \cos(2t + \phi) \right) \, dt = \pi \cos \phi

The first part of the integral is integrating a constant (with respect to t) and so becomes the constant times the length of the integration range. The second part of the integral is zero because it is integrating a cosine over two periods.

Now for the denominator. Over a full period, sin²(t) and cos²(t) take on the same values, just shifted. So the integral of sin²(t) is half the integral of sin²(t) + cos²(t) = 1. Therefore

\int_0^{2\pi} \sin^2 t\, dt = \frac{1}{2}(2\pi) = \pi

and the same argument shows that the integral of sin²(t + φ) is also π. So our correlation is simply cos φ: the correlation of two out-of-phase sine waves is the cosine of their phase difference. It may be a little surprising that it works out to be so simple, but the result makes sense. When φ = 0, or any multiple of 2π, the waves are identical and so the correlation should be 1. When φ = π, or an odd multiple of π, the two waves are perfectly out of phase, and so the correlation should be -1. In between these extremes the correlation oscillates; in fact, it is a cosine.
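The result is easy to verify numerically. Here’s a quick sketch using NumPy; the grid size and phase values are arbitrary:

import numpy as np

# correlation of sin(t) and sin(t + phi) should equal cos(phi)
t = np.linspace(0, 2*np.pi, 100000)
for phi in [0, np.pi/3, np.pi/2, np.pi]:
    r = np.corrcoef(np.sin(t), np.sin(t + phi))[0, 1]
    print(phi, r, np.cos(phi))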

Energy in frequency modulated signals

In an earlier post we proved that if you modulate a cosine carrier by a sine signal you get a signal whose sideband amplitudes are given by Bessel functions. Specifically:

\cos( 2\pi f_c t + \beta \sin(2\pi f_m t) ) = \sum_{n=-\infty}^\infty J_n(\beta) \cos(2\pi(f_c + nf_m)t)

When β = 0, we have the unmodulated carrier, cos(2π fc t), on both sides. When β is positive but small, J0(β) is near 1, and so the frequency component corresponding to the carrier is only slightly diminished. Also, the sideband amplitudes, the values of Jn(β) for n ≠ 0, are small and decay rapidly as |n| increases. As β increases, the amplitude of the carrier component decreases, the sideband amplitudes increase, and the sidebands decay more slowly.

We can be much more precise: the energy in the modulated signal is the same as the energy in the unmodulated signal. As β increases, more energy transfers to the sidebands, but the total energy stays the same. This conservation of energy result applies to more complex signals than just pure sine waves, but it’s easier to demonstrate in the case of a simple signal.

Proof

To prove the energy stays constant, we show that the sum of the squares of the coefficients of the cosine components is the same for the modulated and unmodulated signal. The unmodulated signal is just cos(2π fc t), and so the only coefficient is 1. That means we have to prove

 \sum_{n=-\infty}^\infty J_n(\beta)^2 = 1

This is a well-known result. For example, it is equation 9.1.76 in Abramowitz and Stegun. We’ll show how to prove it from first principles. We’ll actually prove a more general result, the Neumann-Schläfli addition formula, then show our result follows easily from that.

Neumann-Schläfli addition formula

Whittaker and Watson define the Bessel functions by their generating function:

\exp\left(\frac{z}{2}\left(t - \frac{1}{t}\right)\right) = \sum_{n=-\infty}^\infty t^n J_n(z)

This means that when you expand the expression on the left as a power series in t, whatever is multiplied by tn is Jn(z) by definition. (There are other ways of defining the Bessel functions, but this way leads quickly to what we want to prove.)

We begin by factoring the Bessel generating function applied to z + w.

\exp\left(\frac{z+w}{2}\left(t - \frac{1}{t}\right)\right) = \exp\left(\frac{z}{2}\left(t - \frac{1}{t}\right)\right) \exp\left(\frac{w}{2}\left(t - \frac{1}{t}\right)\right)

Next we expand both sides as power series.

\sum_{n=-\infty}^\infty t^n J_n(z+w) = \sum_{j=-\infty}^\infty t^j J_j(z) \sum_{k=-\infty}^\infty t^k J_k(w)

and look at the terms involving tn on both sides. On the left this is Jn(z + w). On the right, we multiply two power series. We will get a term containing tn whenever we multiply terms tj and tk where j and k sum to n.

 J_n(z+w) = \sum_{j+k = n} J_j(z) J_k(w) = \sum_{m=-\infty}^\infty J_m(z) J_{n-m}(w)

The equation above is the Neumann-Schläfli addition formula.

Sum of squared coefficients

To prove that the sum of the squared sideband coefficients is 1, we apply the addition formula with n = 0, z = β, and w = -β.

1 = J_0(\beta - \beta) = \sum_{m=-\infty}^\infty J_m(\beta) J_{-m}(-\beta) = \sum_{m=-\infty}^\infty J_m(\beta)^2

This proves what we were after:

 \sum_{n=-\infty}^\infty J_n(\beta)^2 = 1

We used a couple of facts in the last step that we haven’t discussed. The first is that J0(0) = 1. This follows from the generating function: setting z to 0 makes the left side 1, so the coefficient of t0 must be 1 and the coefficients of all other powers of t must be 0. The second is that J-m(-β) = Jm(β). You can also see this from the generating function, since negating z has the same effect as swapping t and 1/t, which sends tn to t-n.
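The identity is easy to check numerically. Here’s a quick sketch using SciPy’s Bessel function jv, truncating the sum since the terms decay rapidly; the value of β is arbitrary:

from scipy.special import jv

beta = 2.5  # arbitrary modulation index
total = sum(jv(n, beta)**2 for n in range(-50, 51))
print(total)  # 1.0 up to floating point error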
