# Clipped sine waves

One source of distortion in electronic music is clipping. The highest and lowest portions of a wave form are truncated due to limitations of equipment. As the gain is increased, the sound doesn’t simply get louder but also becomes more distorted as more of the signal is clipped off.

## Clipping 0.2

For example, here is what a sine wave looks like when clipped 20%, i.e. cut off to be between -0.8 and 0.8. A simple sine wave has only one Fourier component, itself. But when we clip the sine wave, we move energy into higher frequency components. We can see that in the Fourier components below. You can show by symmetry that the even-numbered coefficients are exactly zero.

## Clipping 0.6

Here are the corresponding plots for 60% clipping, i.e. the absolute value of the signal is cut off to be 0.4. First the signal and then its Fourier components. Here are the first five sine waves with the amplitudes given by the Fourier coefficients. And here we see how the of the sines above do a pretty good job of reconstructing the original clipped sine. We’d need an infinite number of Fourier components to exactly reconstruct the original signal, but the first five components do most of the work. ## Continuous range of clipping

Next let’s look at the ratio of the energy in the 3rd component to that of the 1st component as we continuously vary the amount of clipping. Now for the 5th harmonic. This one is interesting because it’s not strictly increasing but rather has a little bump before it starts increasing. Finally, here’s the ratio of the energy in all higher frequencies to the energy in the fundamental. # Area of sinc and jinc function lobes

Someone left a comment this morning on my blog post on sinc and jinc integrals regarding the area of the lobes.

It would be nice to have the values of integrals of each lobe, i.e. integrals between 0 and multiples of pi. Anyone knows of such a table?

This post will include Python code to address that question. (Update: added asymptotic approximation. See below.)

First, let me back up and explain the context. The sinc function is defined as 

sinc(x) = sin(x) / x

and the jinc function is defined analogously as

jinc(x) = J1(x) / x,

substituting the Bessel function J1 for the sine function. You could think of Bessel functions as analogs of sines and cosines. Bessel functions often come up when vibrations are described in polar coordinates, just as sines and cosines come up when using rectangular coordinates.

Here’s a plot of the sinc and jinc functions: The lobes are the regions between crossings of the x-axis. For the sinc function, the lobe in the middle runs from -π to π, and for n > 0 the nth lobe runs from nπ to (n+1)π. The zeros of Bessel functions are not uniformly spaced like the zeros of the sine function, but they come up in application frequently and so it’s easy to find software to compute their locations.

First of all we’ll need some imports.

    from scipy import sin, pi
from scipy.special import jn, jn_zeros


The sinc and jinc functions are continuous at zero, but the computer doesn’t know that . To prevent division by zero, we return the limiting value of each function for very small arguments.

    def sinc(x):
return 1 if abs(x) < 1e-8 else sin(x)/x

def jinc(x):
return 0.5 if abs(x) < 1e-8 else jn(1,x)/x


You can show via Taylor series that these functions are exact to the limits of floating point precision for |x| < 10-8.

Here’s code to compute the area of the sinc lobes.

    def sinc_lobe_area(n):
n = abs(n)
integral, info = quad(sinc, n*pi, (n+1)*pi)
return 2*integral if n == 0 else integral


The corresponding code for the jinc function is a little more complicated because we need to compute the zeros for the Bessel function J1. Our solution is a little clunky because we have an upper bound N on the lobe number. Ideally we’d work out an asymptotic value for the lobe area and compute zeros up to the point where the asymptotic approximation became sufficiently accurate, and switch over to the asymptotic formula for sufficiently large n.

    def jinc_lobe_area(n):
n = abs(n)
assert(n < N)
integral, info = quad(jinc, jzeros[n-1], jzeros[n])
return 2*integral if n == 0 else integral


Note that the 0th element of the array returned by jn_zeros is the first positive zero of J1; it doesn’t include the zero at the origin.

For both sinc and jinc, the even numbered lobes have positive area and the odd numbered lobes have negative area. Here’s a plot of the absolute values of the lobe areas. ## Asymptotic results

We can approximate the area of the nth lobe of the sinc function by using a midpoint approximation for 1/x. It works out that the area is asymptotically equal to We can do a similar calculation for the area of the nth jinc lobe, starting with the asymptotic approximation for jinc given here. We find that the area of the nth lobe of the jinc function is asymptotically equal to To get an idea of the accuracy of the asymptotic approximations, here are the results for n=100.

    sinc area:      0.00633455
asymptotic:     0.00633452
absolute error: 2.97e-8
relative error: 4.69e-6

jinc area:      0.000283391
asymptotic:     0.000283385
absolute error: 5.66e-9
relative error: 2.00e-5


## More signal processing posts

 Some authors define sinc(x) as sin(πx)/πx. Both definitions are common.

 Scipy has a sinc function in scipy.special, defined as sin(πx)/πx, but it doesn’t have a jinc function.

# The Engineer’s Nyquist frequency and the sampling theorem

The Nyquist sampling theorem says that a band-limited signal can be recovered from evenly-spaced samples. If the highest frequency component of the signal is fc then the function needs to be sampled at a frequency of at least the Nyquist frequency 2fc. Or to put it another way, the spacing between samples needs to be no more than than Δ = 1/2fc.

If the signal is given by a function h(t), then the Nyquist-Shannon sampling theorem says we can recover h(t) by where sinc(x) = sin(πx) / πx.

In practice, signals may not entirely band-limited, but beyond some frequency fc higher frequencies can be ignored. This means that the cutoff frequency fc is somewhat fuzzy. As we demonstrate below, it’s much better to err on the side of making the cutoff frequency higher than necessary. Sampling at a little less than the necessary frequency can cause the reconstructed signal to be a poor approximation of the original. That is, the sampling theorem is robust to over-sampling but not to under-sampling. There’s no harm from sampling more frequently than necessary. (No harm as far as the accuracy of the equation above. There may be economic costs, for example, that come from using an unnecessarily high sampling rate.)

Let’s look at the function h(t) = cos(18πt) + cos(20πt). The bandwidth of this function is 10 Hz, and so the sampling theorem requires that we sample our function at 20 Hz. If we sample at 20.4 Hz, 2% higher than necessary, the reconstruction lines up with the original function so well that the plots of the two functions agree to the thickness of the plotting line. But if we sample at 19.6 Hz, 2% less than necessary, the reconstruction is not at all accurate due to problems with aliasing. One rule of thumb is to use the Engineer’s Nyquist frequency of 2.5 fc which is 25% more than the exact Nyquist frequency. An engineer’s Nyquist frequency is sorta like a baker’s dozen, a conventional safety margin added to a well-known quantity.

Update: Here’s a plot of the error, the RMS difference between the signal and its reconstruction, as a function of sampling frequency. By the way, the function in the example demonstrates beats. The sum of a 9 Hz signal and a 10 Hz signal is a 9.5 Hz signal modulated at 0.5 Hz. More details on beats in this post on AM radio and musical instruments.

# Paul Klee meets Perry the Platypus

I was playing around with something in Mathematica and one of the images that came out of it surprised me. It’s a contour plot for the system function of a low pass filter.

    H[z_] := 0.05634*(1 + 1/z)*(1 - 1.0166/z + 1/z^2) /
((1 - 0.683/z)*(1 - 1.4461/z + 0.7957/z^2))
ContourPlot[ Arg[H[Exp[I (x + I y)]]],
{x, -1, 1}, {y, -1, 1},
ColorFunction -> "StarryNightColors"]


It looks sorta like a cross between Paul Klee’s painting Senecio and Perry the Platypus from Phineas and Ferb. # Time series analysis vs DSP terminology

Time series analysis and digital signal processing are closely related. Unfortunately, the two fields use different terms to refer to the same things.

Suppose you have a sequence of inputs x[n] and a sequence of outputs y[n] for integers n.

## Moving average / FIR

If each output depends on a linear combination of a finite number of previous inputs

y[n] = b0 x[n] + b1 x[n-1] + … + bq x[nq]

then time series analysis would call this a moving average (MA) model of order q, provided b0 = 1. Note that this might not really be an average, i.e. the b‘s are not necessarily positive and don’t necessarily sum to 1.

Digital signal processing would call this a finite impulse response (FIR) filter of order q.

## Autoregressive / IIR

If each output depends on a linear combination of a finite number of previous outputs

y[n] = a1 y[n-1] + … + ap y[np]

then time series analysis would call this an autoregressive (AR) model of order p.

Digital signal processing would call this an infinite impulse response (IIR) filter of order p.

Sometimes you’ll see the opposite sign convention on the a‘s.

## ARMA / IIR

If each output depends on a linear combination of a finite number of previous inputs and outputs

y[n] = b0 x[n] + b1 x[n-1] + … + bq x[nq] + a1 y[n-1] + … + ap y[np]

then time series analysis would call this an autoregressive moving average (ARMA) model of order (pq), i.e. p AR terms and q MA terms.

Digital signal processing would call this an infinite impulse response (IIR) filter with q feedforward coefficients and p feedback coefficients. Also, as above, you may see the opposite sign convention on the a‘s.

## ARMA notation

Box and Jenkins use a‘s for input and z‘s for output. We’ll stick with x‘s and y’s to make the comparison to DSP easier.

Using the backward shift operator B that takes a sample at n to the sample at n-1, the ARMA system can be written

φ(B) y[n] = θ(B) x[n]

where φ and θ are polynomials

φ(B) = 1 – φ1B – φ2B² – … φpBp

and

θ(B) = 1 – θ1B – θ2B² – … θqBq.

## System function notation

In DSP, filters are described by their system function, the z-transform of the impulse response. In this notation (as in Oppenheim and Shafer, for example) we have The φk in Box and Jenkins correspond to the ak in Oppenheim and Schafer. The θk correspond to the (negative) bk.

The system function H(z) corresponds to θ(1/z) / φ(1/z).

## Related

DSP and time series consulting

# Width of mixture PDFs

Suppose you draw samples from two populations, one of which has a wider probability distribution than the other. How does the width of the distribution of the combined sample vary as you change the proportions of the two populations?

The extremes are easy. If you pick only from one population, then the resulting distribution will be exactly as wide as the distribution of that population. But what about in the middle? If you pick from both populations with equal probability, will the width resulting distribution be approximately the average of the widths of the two populations separately?

To make things more specific, we’ll draw from two populations: Cauchy and normal. With probability η we will sample from a Cauchy distribution with scale γ and with probability (1-η) we will sample from a normal distribution with scale σ. The resulting combined distribution is a mixture, known in spectroscopy as a pseudo-Voigt distribution. In that field, the Cauchy distribution is usually called the Lorentz distribution.

(Why”pseudo”? A Voigt random variable is the sum of a Cauchy and a normal random variable. Its PDF is a convolution of a Cauchy and a normal PDF. A pseudo-Voigt random variable is the mixture of a Cauchy and a normal random variable. Its PDF is a convex combination of the PDFs of a Cauchy and a normal PDF. In fact, the convex combination coefficients are η and (1-η) mentioned above. Convex combinations are easier to work with than convolutions, at least in some contexts, and the pseudo-Voigt distribution is a convenient approximation to the Voigt distribution.)

We will measure the width of distributions by full width at half maximum (FWHM). In other words, we’ll measure how far apart the two points are where the distribution takes on half of its maximum value.

It’s not hard to calculate that the FWHM for a Cauchy distribution with scale 2γ and the FWHM for a normal distribution with scale σ is 2 √(2 log 2) σ.

If we have a convex combination of Cauchy and normal distributions, we’d expect the FWHM to be at least roughly the same convex combination of the FWHM of the separate distributions, i.e. we’d expect the FWHM of our mixture to be

2ηγ + 2(1 – η)√(2 log 2)σ.

How close is that guess to being correct? It has to be exactly correct when η is 0 or 1, but how well does it do in the middle? Here are a few experiments fixing γ = 1 and varying σ.   # Acoustic roughness examples

Amplitude modulated signals sound rough to the human ear. The perceived roughness increases with modulation frequency, then decreases, and eventually disappears. The point where roughness reaches is maximum depends on the carrier signal, but for a 1 kHz tone roughness reaches a maximum for modulation at 70 Hz. Roughness also increases as a function of modulation depth.

Amplitude modulation multiplies a carrier signal by

1 + d sin(2π f t)

where d is the modulation depth, f is the modulation frequency, and t is time.

Here are some examples you can listen to. We use a pure 1000 Hz tone and Gaussian white noise as carriers, and vary modulation depth and frequency continuously over 10 seconds. he modulation depth example varies depth from 0 to 1. Modulation frequency varies from 0 to 120 Hz.

First, here’s a pure tone with increasing modulation depth.

Next we vary the modulation frequency.

Now we switch over to Gaussian white noise, first varying depth.

And finally white noise with varying modulation frequency. This one sounds like a prop-driven airplane taking off.

Related: Psychoacoustics consulting

# Kalman filters and bottom-up learning Kalman filtering is a mixture of differential equations and statistics. Kalman filters are commonly used in tracking applications, such as tracking the location of a space probe or tracking the amount of charge left in a cell phone battery. Kalman filters provide a way to synthesize theoretical predictions and actual measurements, accounting for error in both.

Engineers naturally emphasize the differential equations and statisticians naturally emphasize the statistics. Both perspectives are valuable, but in my opinion/experience, the engineering perspective should come first.

From an engineering perspective, a Kalman filtering problem starts as a differential equation. In an ideal world, one would simply solve the differential equation and be done. But the experienced engineer realizes his or her differential equations don’t capture everything. (Unlike the engineer in this post.) Along the road to the equations at hand, there were approximations, terms left out, and various unknown unknowns.

The Kalman filter accounts for some level of uncertainty in the process dynamics and in the measurements taken. This uncertainty is modeled as randomness, but this doesn’t mean that there’s necessarily anything “random” going on. It simply acknowledges that random variables are an effective way of modeling miscellaneous effects that are unknown or too complicated to account for directly. (See Random is as random does.)

The statistical approach to Kalman filtering is to say that it is simply another estimation problem. You start from a probability model and apply Bayes’ theorem. That probability model has a term inside that happens to come from a differential equation in practice, but this is irrelevant to the statistics. The basic Kalman filter is a linear model with normal probability distributions, and this makes a closed-form solution for the posterior possible.

You’d be hard pressed to start from a statistical description of Kalman filtering, such as that given here, and have much appreciation for the motivating dynamics. Vital details have simply been abstracted away. As a client told me once when I tried to understand his problem starting from the top-down, “You’ll never get here from there.”

The statistical perspective is complementary. Some things are clear from the beginning with the statistical formulation that would take a long time to see from the engineering perspective. But while both perspectives are valuable, I believe it’s easier to start on the engineering end and work toward the statistics end rather than the other way around.

History supports this claim. The Kalman filter from the engineering perspective came first and its formulation in terms of Bayesian statistics came later. Except that’s not entirely true.

Rudolf Kálmán published his seminal paper in 1960 and four years later papers started to come out making the connection to Bayesian statistics. But while Kálmán and others were working in the US starting from the engineering end, Ruslan Stratonovich was working in Russia starting from the statistical end. Still, I believe it’s fair to say that most of the development and application of Kalman filters has proceeded from the engineering to the statistics rather than the other way around.

RelatedMore on Kalman filters

# Cepstral alanysis vocabulary

An earlier post defined cepstrum and quefrency. This post explains some of the other quirky terms introduced in the same paper by Bogert, Healy, and Tukey. (Given Tukey’s delight in coining words, we can assume he was the member of the trio responsible for the new terms.)

The paper  explains why the new twists on familiar words:

In general, we find ourselves operating on the frequency side in ways customary on the time side and vice versa. Experience has made it clear that “words that sound like other words,” although strange at first sight, considerably reduce confusion on balance. These parallel or “paraphrased” words are made by the interchange of consonants or consonant groups, as in “alanysis” from “analysis,” and are introduced as needed.

The magnitude and phase of a cepstrum are called gamnitude and saphe. (The latter explains the pun “saphe cracking” in the title.)

Filtering in the cepstral domain is called liftering. A high-pass filter corresponds to a long-pass lifter and a low-pass filter corresponds to a short-pass lifter.

Harmonics in spectra correspond to rahmonics in cepstra.

Some of these terms are helpful. As explained in the previous post, the independent variable in cepstral analysis, quefrency, differs enough from frequency that it helps to have a separate term for it. Using the terms long and short rather than high and low is helpful for the same reason. Using repiod for the analog of period seems gratuitous, but maybe it’s necessary for consistency. Once you introduce some new terminology, you have to keep going.

 Bruce P. Bogert, M. J. R. Healy, John W. Tukey. The Quefrency Analysis of Time Series for Echoes: Cepstrum, Pseudo-Autocovariance, Cross-Cepstrum and Saphe Cracking. Collected works of John Tukey volume 1

# Cepstrum, quefrency, and pitch John Tukey coined many terms that have passed into common use, such as bit (a shortening of binary digit) and software. Other terms he coined are well known within their niche: boxplot, ANOVA, rootogram, etc. Some of his terms, such as jackknife and vacuum cleaner, were not new words per se but common words he gave a technical meaning to.

Cepstrum is an anagram of spectrum. It involves an unusual use of power spectra, and is roughly analogous to making anagrams of a word. A related term, one we will get to shortly, is quefrency, an anagram of frequency. Some people pronounce the ‘c’ in cepstrum hard (like ‘k’) and some pronounce it soft (like ‘s’).

Let’s go back to an example from my post on guitar distortion. Here’s a note played with a fairly large amount of distortion:

And here is its power spectrum: There’s a lot going on in the spectrum, but the peaks are very regularly spaced. As I mentioned in the post on the sound of a leaf blower, this is the fingerprint of a sound with a definite pitch. Spikes in the spectrum alone don’t indicate a definite pitch if they are irregularly spaced.

The peaks are fairly periodic. How to you find periodic patterns in a signal? Fourier transform! But if you simply take the Fourier transform of a Fourier transform, you essentially get the original signal back. The key to the cepstrum is to do something else between the two Fourier transforms.

The cepstrum starts by taking the Fourier transform, then the magnitude, then the logarithm, and then the inverse Fourier transform.

When we take the magnitude, we throw away phase information, which we don’t need in this context. Taking the log of the magnitude is essentially what you do when you compute sound pressure level. Some define the cepstrum using the magnitude of the Fourier transform and some the magnitude squared. Squaring only introduces a multiple of 2 once we take logs, so it doesn’t effect the location of peaks, only their amplitude.

Taking the logarithm compresses the peaks, bringing them all into roughly the same range, making the sequence of peaks roughly periodic.

When we take the inverse Fourier transform, we now have something like a frequency, but inverted. This is what Tukey called quefrency.

Looking at the guitar power spectrum above, we see a sequence of peaks spaced 440 Hz apart. When we take the inverse Fourier transform of this, we’re looking at a sort of frequency of a frequency, what Tukey calls quefrency. The quefrency scale is inverted: sounds with a high frequency fundamental have overtones that are far apart on the frequency domain, so the sequence of the overtone peaks has low frequency.

Here’s the plot of the cepstrum for the guitar sample. There’s a big peak at 109 on the quefrency scale. The audio clip was recorded at 48000 samples per second, so the 109 on the quefrency scale corresponds to a frequency of 48000/109 = 440 Hz. The second peak is at quefrency 215, which corresponds to 48000/215 = 223 Hz. The second peak corresponds to the perceived pitch of the note, A3, and the first peak corresponds to its first harmonic, A4. (Remember the quefrency scale is inverted relative to the frequency scale.)

I cheated a little bit in the plot above. The very highest peaks are at 0. They are so large that they make it hard to see the peaks we’re most interested in. These low quefrency peaks correspond to very high frequency noise, near the edge of the audible spectrum or beyond.