Quantifying Loudness

How do you quantify how loud a sound is? Sounds like a simple question, but it’s not.

What is loudness?

It’s not hard to measure the physical intensity of a sound, but loudness is the perceived intensity of a sound. It is not a physical phenomenon but a psychological one.

Loudness is subjective, but not entirely so. There is general consensus regarding what it means for two sounds to be equally loud, and even for ratios, such as saying when one sound is twice as loud as the other. Loudness is quantifiable, but not easily so.

What does loudness depend on?

Loudness depends on several properties of a sound, such as its frequency, bandwidth, and duration. Loudness must depend on frequency because sounds that are too low or too high have no loudness at all: we simply cannot hear them. But even within the range of audible frequencies, loudness varies quite a bit by pitch. The graph below, via Wikipedia, shows equal loudness contours. The blue lines are from work by Fletcher and Munson in 1937. The red lines are the revised curves per the ISO 226:2003 standard.

Fletcher-Munson curves

The horizontal axis is frequency in Hz and the vertical axis is sound pressure level in decibels. The contour lines represent combinations of frequency and sound pressure level that are perceived to be equally loud. If a tuba and a flute sound equally loud, the sound pressure level coming from the tuba is much higher.

Notice that the curves are not parallel. They’re much closer together for low frequencies than for midrange frequencies, though they are roughly parallel for high frequencies. This means that if you recorded a piano, for example, playing each of its keys at equal loudness, the pitches wouldn’t sound equally loud unless you played the recording back at its original volume.

Complexities and simplifications

As complicated as this is, it’s still a simplification. It is based on pure tones, simple sine waves. A single musical instrument, much less an orchestra or a jackhammer, is more complicated. Loudness is highly nonlinear, and so you cannot say that the loudness of two sounds together is the sum of their individual loudnesses. A-weighting is a relatively simple way to convert sound pressure levels to loudness, but it is only accurate for pure tones at fairly low loudness levels.

To simplify things further, consider a single pure tone, a sine wave at 1 kHz. (This is almost two octaves above middle C. See details here.) Loudness level in phons is defined to match sound pressure level in decibels for a 1 kHz pure tone. So a sound has a loudness level of 40 phons, for example, if it is perceived to be as loud as a pure 1 kHz tone at 40 dB.

At 1 kHz, loudness increases by a factor of 2 for every 10 dB increase in sound pressure level. But because nothing is simple in psychoacoustics, even this is a simplification. It only holds for sounds with loudness level 40 phons or greater. A quiet room is around 40 phons, so the added complications below 40 phons may not be relevant in many applications.

A pure tone at 1 kHz and 20 dB sounds less than a quarter as loud as the same tone at 40 dB. The definition of loudness level in phons still holds below 40 phons. An oboe has a loudness level of 20 phons if it has the same loudness as a sine wave with frequency 1 kHz and sound pressure level 20 dB. But an oboe at 30 phons will sound more than twice as loud as one at 20 phons.
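As a rough illustration (my own sketch, not from the post), here’s the doubling rule in code: a function that converts a loudness level in phons to a loudness ratio relative to a 40-phon reference tone, valid only at or above 40 phons where the rule applies.

    def relative_loudness(phons):
        """Loudness relative to a 40-phon reference tone.

        Uses the rule of thumb that loudness doubles for every 10 phon
        increase. Only valid at or above 40 phons; below that the
        relationship is more complicated, as discussed above.
        """
        if phons < 40:
            raise ValueError("rule of thumb only applies at 40 phons and above")
        return 2 ** ((phons - 40) / 10)

    print(relative_loudness(60))  # 4.0: about four times as loud as a 40-phon tone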

Update: New blog post comparing guitar samples at the same sound pressure level but with differing loudness and sharpness.

Summary

So where are we as far as calculating loudness? We’ve said a lot about what you can’t do, what complications have to be considered. But we’ve concluded this much: for a pure 1 kHz tone, the loudness in phons equals (by definition) the sound pressure level in decibels. And we’ve said how in principle you could define the loudness of other sounds: compare them to a 1 kHz tone that’s just as loud. We haven’t said how to compute this, only how you could determine it empirically.

In future posts I may write about how you do this using the ISO 532B standard or the newer ANSI S3.4-2007 standard.

Analyzing an FM signal

Frequency modulation combines a signal with a carrier wave by changing (modulating) the carrier wave’s frequency.

Starting with a cosine carrier wave with frequency fc Hz and adding a signal with amplitude β and frequency fm Hz results in the combination

\cos( 2\pi f_c t + \beta \sin(2\pi f_m t) )

The factor β is known as the modulation index.

We’d like to understand this signal in terms of cosines without any frequency modulation. It turns out the result is a set of cosines weighted by Bessel functions of β.

\cos( 2\pi f_c t + \beta \sin(2\pi f_m t) ) = \sum_{n=-\infty}^\infty J_n(\beta) \cos(2\pi(f_c + nf_m)t)

Component amplitudes

We will prove the equation above, but first we’ll discuss what it means for the amplitudes of the cosine components.

For small values of β, Bessel functions decay quickly, which means the carrier component (n = 0) will be dominant. For larger values of β, the Bessel function values increase to a maximum then decay like one over the square root of the index. To see this we compare the coefficients for modulation index β = 0.5 and β = 5.0.

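Here’s a short sketch (my addition, using SciPy) that prints the coefficients J_n(β) for both modulation indices so they can be compared directly.

    from scipy.special import jv  # Bessel function of the first kind

    for beta in (0.5, 5.0):
        print(f"beta = {beta}")
        for n in range(11):
            # J_n(beta) is the amplitude of the sideband at f_c + n*f_m
            print(f"  J_{n}({beta}) = {jv(n, beta): .6f}")

For β = 0.5 everything past n = 2 is negligible, while for β = 5 the coefficients remain appreciable out to about n = 6 or 7 before dropping off rapidly.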

For fixed β and large n we have

J_n(\beta) \approx \frac{\beta^n}{2^n \, n!}

and so the sideband amplitudes eventually decay very quickly.

Update: See this post for what the equation above says about energy moving from the carrier to sidebands.

Proof

To prove the equation above, we need three basic trig identities

\begin{aligned}
\cos(A + B) &= \cos A \cos B - \sin A \sin B \\
2\cos A \cos B &= \cos(A-B) + \cos(A+B) \\
2\sin A \sin B &= \cos(A-B) - \cos(A+B)
\end{aligned}

and three Bessel function identities

\begin{aligned}
\cos( z \sin \theta) &= J_0(z) + 2\sum_{k=1}^\infty J_{2k}(z) \cos(2k\theta) \\
\sin( z \sin \theta) &= 2\sum_{k=0}^\infty J_{2k+1}(z) \sin((2k+1)\theta) \\
J_{-n}(z) &= (-1)^n J_n(z)
\end{aligned}

The Bessel function identities above can be found in Abramowitz and Stegun as equations 9.1.42, 9.1.43, and 9.1.5.

And now the proof. We start with

\cos( 2\pi f_c t + \beta \sin(2\pi f_m t) )

and apply the sum identity for cosines to get

\cos(2\pi f_c t) \cos(\beta \sin(2\pi f_m t)) - \sin(2\pi f_c t) \sin(\beta \sin(2\pi f_m t))

Now let’s take the first term

 \cos(2\pi f_c t) \cos(\beta \sin(2\pi f_m t))

and apply one of our Bessel identities to expand it to

J_0(\beta) \cos(2\pi f_c t) + \sum_{k=1}^\infty J_{2k}(\beta) \left\{ \cos(2\pi (f_c - 2k f_m)t) + \cos(2\pi(f_c + 2k f_m)t) \right\}

which can be simplified to

\sum_{n \,\, \mathrm{even}} J_n(\beta) \cos(2\pi(f_c + nf_m)t)

where the sum runs over all even integers, positive and negative, using the identity J_{-n}(β) = J_n(β) for even n.

Now we do the same with the second half of the cosine sum. We expand

\sin(2\pi f_c t) \sin(\beta \sin(2\pi f_m t))

to

\sum_{k=0}^\infty J_{2k+1}(\beta) \left\{ \cos(2\pi (f_c - (2k+1) f_m)t) - \cos(2\pi(f_c + (2k+1) f_m)t) \right\}

which simplifies to

-\sum_{n \,\, \mathrm{odd}} J_n(\beta) \cos(2\pi(f_c + nf_m)t)

where the sum runs over all odd integers, positive and negative, this time using J_{-n}(β) = -J_n(β). Since this second term is subtracted in the expression we started from, combining the two halves gives our result

\cos( 2\pi f_c t + \beta \sin(2\pi f_m t) ) = \sum_{n=-\infty}^\infty J_n(\beta) \cos(2\pi(f_c + nf_m)t)

***

More on mathematics and radio.

Fourier transform of a function on a graph

What is a Fourier transform at its core? An expansion of a function in terms of eigenfunctions of the Laplacian. For a function on the real line, the Laplacian is simply the second derivative. The functions mapped to multiples of themselves by taking second derivatives are sines and cosines of various frequencies. A Fourier series is a change of basis, using as basis vectors those functions that behave most simply under the second derivative.

The Fourier transform of a function on a graph is also a change of basis, expanding a discrete function in terms of eigenvectors of the Laplacian, in this case the graph Laplacian.

The Fourier transform of a function f, evaluated at a frequency ω, is the inner product of f with the eigenfunction exp(2πiωt).

\hat{f}(\omega) = \langle f, \exp(2\pi i \omega t) \rangle = \int_{-\infty}^\infty f(t) \exp(-2\pi i \omega t) \, dt

The inner product of two complex functions f and g is the integral of the product of f and the conjugate of g. Conjugation is why exp(2πiωt) became exp(-2πiωt).

The Fourier transform of a discrete function f on a graph, evaluated at an eigenvalue λi, is the inner product of f (i.e. the vector of values of f at each node) with the eigenvector associated with λi.

\hat{f}(\lambda_i) = \langle f, v^*_i \rangle = \sum_{j=1}^N f(j) v_i^*(j)

Here the inner product is a discrete sum rather than an integral. As before, we take the complex conjugate of the second item in the product.

The eigenvectors associated with the smallest eigenvalues of the graph Laplacian are analogous to low frequency sines and cosines. The components of these eigenvectors corresponding to nearby vertices in the graph should be close together. This analogy explains why spectral coordinates work so well.
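Here’s a minimal numerical sketch (mine, not from the post) of a graph Fourier transform, using NumPy and a 5-node path graph. Since the graph Laplacian is real and symmetric, its eigenvectors form a real orthonormal basis, so the transform is just a change of basis.

    import numpy as np

    # Adjacency matrix of a path graph on 5 nodes: 0 - 1 - 2 - 3 - 4
    A = np.zeros((5, 5))
    for i in range(4):
        A[i, i + 1] = A[i + 1, i] = 1

    L = np.diag(A.sum(axis=1)) - A           # graph Laplacian L = D - A
    eigvals, eigvecs = np.linalg.eigh(L)     # columns are the eigenvectors

    f = np.array([0.0, 1.0, 2.0, 1.0, 0.0])  # a signal defined on the nodes
    f_hat = eigvecs.T @ f                    # inner product with each eigenvector
    print(np.allclose(eigvecs @ f_hat, f))   # True: inverse transform recovers f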

Discrete Laplace transform

The relationship between the discrete Laplace transform and discrete Fourier transform is not quite the same as that between their continuous counterparts.

Continuous Fourier and Laplace transforms

The continuous versions of the Fourier and Laplace transforms are given as follows.

Fourier transform:

{\cal F}(f)(\omega) = \int_{-\infty}^\infty \exp(-i\omega x) f(x)\, dx

Laplace transform:

{\cal L}(f)(s) = \int_0^\infty \exp(-s x) f(x)\, dx

The Fourier transform is defined several ways, and I actually prefer the convention that puts a factor of 2π in the exponential, but the convention above makes the analogy with the Laplace transform simpler. There are two differences between the Fourier and Laplace transforms: the domain of integration, and whether the exponent is imaginary or real. The Laplace transform integrates over only half the real line, compared to the entire real line for Fourier. But a variation on the Laplace transform, the bilateral Laplace transform, integrates over the entire real line. The bilateral Laplace transform at s is simply the Fourier transform at ω = −is. And of course the same is true for the (one-sided) Laplace transform if the function f is non-zero only for positive values.

I’ve encountered the Fourier transform more in application, and the Laplace transform more in teaching. This is not to say the Laplace transform isn’t used in practice; it certainly is used in applications. But the two transforms serve similar purposes, and the Laplace transform is easier to teach. Because the factor exp(-sx) decays rapidly, the integral defining the Laplace transform converges for functions where the integral defining the Fourier transform would not. Such functions may still have Fourier transforms, but the transforms require distribution theory whereas the Laplace transforms can be computed using basic calculus.
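To make the convergence contrast concrete, here’s a small SymPy sketch (my own example, not from the post). The Laplace integral of the constant function 1 converges because of the decaying factor exp(-sx); the corresponding Fourier integral has no such factor and diverges classically, and as a tempered distribution it is a multiple of the delta function.

    from sympy import symbols, exp, integrate, oo

    x, s = symbols('x s', positive=True)

    # Laplace transform of f(x) = 1: converges thanks to the factor exp(-s*x)
    print(integrate(exp(-s * x), (x, 0, oo)))  # 1/s

    # The Fourier integral of f(x) = 1 over the whole real line has no decaying
    # factor and does not converge in the classical sense.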

Discrete Fourier and Laplace Transforms

There’s more difference between the discrete versions of the Fourier and Laplace transforms than between the continuous versions.

The discrete Fourier transform (DFT) approximates the integral defining the (continuous) Fourier transform with a finite sum. It discretizes the integral and truncates its domain. The discrete Laplace transform is an infinite sum. It discretizes the integral defining the Laplace transform, but it does not truncate the domain. Given a step size η > 0, the discrete Laplace transform of f is

{\cal L}_\eta(f)(s) = \eta \sum_{n=0}^\infty \exp(-sn\eta) f(n\eta)

The discrete Laplace transform isn’t “as discrete” as the discrete Fourier transform. The latter takes a finite sequence and returns a finite sequence. The former evaluates a function at an infinite number of points and produces a continuous function.
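Here’s a rough numerical sketch (mine) of the definition above: approximate the discrete Laplace transform of f(x) = exp(-x) by truncating the infinite sum, and compare with the continuous Laplace transform 1/(s + 1). The step size η and the truncation point are arbitrary choices for illustration; as η shrinks, the discrete transform approaches the continuous one.

    import numpy as np

    def discrete_laplace(f, s, eta=0.01, N=100_000):
        """Truncated version of eta * sum_{n>=0} exp(-s*n*eta) * f(n*eta)."""
        n = np.arange(N)
        return eta * np.sum(np.exp(-s * n * eta) * f(n * eta))

    f = lambda x: np.exp(-x)
    s = 2.0
    print(discrete_laplace(f, s))  # about 0.338
    print(1 / (s + 1))             # continuous Laplace transform: 0.333...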

The discrete Laplace transform is used in applications such as signal processing, as well as in the theory of analytic functions.

Connection with the z-transform and generating functions

If η = 1 and z = exp(s), the discrete Laplace transform becomes the z-transform of the values of f at non-negative integers. And if we replace z with 1/z, or equivalently set z = exp(-s) instead of z = exp(s), we get the generating function of the values of f at non-negative integers.

z-transforms are common in digital signal processing, while generating functions are common in combinatorics. They are essentially the same thing.
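For a concrete example (mine, not from the post), take f(n) = 1 for all non-negative integers n and η = 1. Then

{\cal L}_1(f)(s) = \sum_{n=0}^\infty \exp(-sn) = \frac{1}{1 - \exp(-s)}

Setting z = exp(s) turns the sum into the z-transform of the constant sequence, z/(z − 1), and setting z = exp(-s) turns it into the generating function 1/(1 − z); both reduce to 1/(1 − exp(-s)) after the substitution.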

Visualizing the DFT matrix

The discrete Fourier transform (DFT) of length N multiplies a vector by a matrix whose (j, k) entry is ω^jk where ω = exp(-2πi/N), with j and k running from 0 to N − 1. Each element of the matrix is a rotation, so if N = 12, we can represent each element by an hour on a clock. The angle between the hour hand and minute hand corresponds to the phase of the matrix entry. We could also view each element as a color around a color wheel. The image below does both.

The matrix representing the inverse of the DFT is the conjugate of the DFT matrix (divided by N, but we’re only looking at phase here, so we can ignore this rescaling). The image below displays the DFT matrix on the left and its inverse on the right.

Taking the conjugate amounts to making all the clocks run backward.

The DFT is often called the FFT. Strictly speaking, the FFT is an algorithm for computing the DFT. Nobody computes a DFT by multiplying by the DFT matrix, because the FFT is faster. The DFT matrix has a lot of special structure, which the FFT takes advantage of to compute the product faster than using ordinary matrix multiplication.
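As a quick sketch (my own, not from the post), here’s how to build the DFT matrix explicitly in NumPy and confirm it computes the same transform as the FFT, along with the conjugate-and-divide-by-N inverse.

    import numpy as np

    N = 12
    omega = np.exp(-2j * np.pi / N)
    rows, cols = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    F = omega ** (rows * cols)                # (j, k) entry is omega^(jk)

    x = np.random.rand(N)
    print(np.allclose(F @ x, np.fft.fft(x)))  # True: same transform, slower method

    F_inv = np.conj(F) / N                    # inverse DFT matrix
    print(np.allclose(F_inv @ (F @ x), x))    # True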

By the way, there are Unicode characters for clock times on the hour, U+1F550 through U+1F55B. I created the image above by writing a script that put the right characters in a table. The colors have HSL values where H is proportional to the angle and S = L = 0.8.

Relating Fourier series and Fourier transforms

Fourier series and Fourier transforms may seem more different than they are because of the way they’re typically taught. Fourier series are presented more as a representation of a function, not a transformation. Here’s a function on an interval. We can write it as a sum of sines and cosines, just as we can write a function as a sum of powers in a power series. There’s not much emphasis on the coefficients per se. They appear inside a sum, but don’t get much attention on their own.

Fourier transforms, on the other hand, are presented as genuine transforms. Here’s a function, and here’s its transform, another function. One’s a function of time, the other a function of frequency. Or maybe both are presented as representations of the same function in two different domains, the time domain and the frequency domain.

You could think of the Fourier series as a kind of transform, taking a periodic function and mapping it to an infinite sequence, the Fourier series coefficients. And you could think of the Fourier transform as being a kind of continuous set of coefficients for representing a function, if you interpret the inversion theorem the right way.

Here are a couple connections between Fourier series and Fourier transforms. Start with a function f on an interval and compute its Fourier series. The Fourier series is periodic, so we could think of f as periodic, even though we only care about f on the interval. Instead, let’s think of extending f to be 0 everywhere outside the interval. Now we take the Fourier transform of f. The Fourier series coefficients are the Fourier transform of f evaluated at integer arguments.
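To spell this out, take the interval to be [0, 1] and use the exp(-2πiωt) convention from earlier. Since f vanishes outside the interval,

\hat{f}(n) = \int_{-\infty}^\infty f(t) \exp(-2\pi i n t) \, dt = \int_0^1 f(t) \exp(-2\pi i n t) \, dt

and the right-hand side is exactly the nth Fourier series coefficient of f.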

Now let’s go back to thinking of f as a periodic function. What would its Fourier transform look like? In classical analysis, you can’t do that. Periodic functions have Fourier series but they don’t have Fourier transforms because the integral defining the latter does not converge. But by the magic of tempered distributions, we can indeed take the Fourier transform of a periodic function. The result is a weighted sum of delta distributions at each integer, and the coefficient of the delta distribution at n is the nth Fourier series coefficient.

The proof of the claim in the previous paragraph is simple once you understand the sha function Ш. Start with a function f defined on a unit interval and extended to be zero outside that interval. Convolving f with Ш makes a periodic function f*Ш extending f. The Fourier transform of a convolution is the product of the Fourier transforms. The Fourier transform of f is simply its classical Fourier transform F. The Ш function is its own Fourier transform, so the transform of f*Ш is FШ. Multiplying a function by Ш samples that function, and the samples of F are the Fourier coefficients of the Fourier series of f*Ш, the periodic extension of f.

An example of coming full circle

Here’s an interesting line from Brad Osgood:

Isn’t it a little embarrassing that multibillion dollar industries seem to depend on integrals that don’t converge?

In context, he’s not saying that huge companies are blithely using bad math. Some are, but that’s not what he’s getting at here. His discussion is an example of coming full circle, where experts and novices come to the same conclusion for different reasons.

The divergent integrals Osgood refers to are Fourier transforms of certain functions. A beginner might not notice that said integrals don’t converge. An expert knows that the calculations are justified by a more sophisticated theory. Someone in-between would have objections. Experts can be casual, not because they’re ignorant of technical difficulties but because they’ve mastered these difficulties. [1]

The expert in Fourier analysis has all the technicalities in the back of his or her mind. Often these don’t need to be explicitly exercised. You can blithely go about using formal calculations that aren’t justified by the classical theory.

But the expert doesn’t entirely come full circle, not in the sense of walking in circles in the woods. It’s more like winding around a parking garage, coming back to the same (x, y) location but one level up. Sometimes the expert needs to pull out the technical machinery to avoid an error the beginner could fall into. The theory of tempered distributions, for example, doesn’t justify every calculation a novice might try.

***

[1] In a nutshell, here’s the theory that justifies apparently sloppy calculations with Fourier transforms. The key is to view the function you want to transform not as a function on the real line but as a tempered distribution, a linear functional on the space of smooth, rapidly decaying test functions. A function acts on a test function by forming their product and integrating. Then use Parseval’s theorem from the classical theory as the definition in this new context, moving the transform operation from the original function to the test function. Simple, right?

The Dirac comb or Sha function

The sha function, also known as the Dirac comb, is denoted with the Cyrillic letter sha (Ш, U+0428). This letter was chosen because it looks like how people visualize the function, a long series of vertical spikes. The function is called the Dirac comb for the same reason. This function is very important in Fourier analysis because it relates Fourier series and Fourier transforms. It relates sampling and periodization. It’s its own Fourier transform, and with a few qualifiers discussed later, the only such function.

The Ш function, really the Ш distribution, is defined as

Ш(x) = \sum_{n=-\infty}^\infty \delta(x-n)

Here δ(x − n) is the Dirac delta distribution centered at n. The action of δ(x − n) on a test function is to evaluate that function at n. You can envision Ш as an infinite sequence of spikes, one at each integer. The action of Ш on a test function is to add up its values at every integer.

Sampling

The product of Ш with a function f is a new distribution whose action on a test function φ is the sum of f φ over all integers. Or you could think of the distribution as a sort of clothesline on which to hang the sampled values of f, much the way a generating function works.
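Written out, the product is

f(x) \, Ш(x) = \sum_{n=-\infty}^\infty f(n) \, \delta(x - n)

which is precisely that clothesline of samples.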

Periodizing

Next let’s look at a function f that lives on [0, 1], i.e. is zero everywhere outside the unit interval. The convolution of f with δ(x − n) is f(x − n), i.e. a copy of f shifted over to live on the interval [n, n+1]. So by taking the convolution with Ш, we create copies of f all over the real line. We’ve made f into a periodic function. So instead of saying “the function f extended to create a periodic function” you can simply say f*Ш.

Fourier transform

Now let’s think about the Fourier transform of Ш. The Fourier transform of δ(x) is 1, i.e. the function equal to 1 everywhere [1]. (The more concentrated a function is, the more spread out its Fourier transform is. So if you have an infinitely concentrated function δ, its Fourier transform is perfectly flat, 1. You can calculate the transform rigorously; this is just the intuition.) If you shift a function by n, you rotate its Fourier transform by exp(-2πinω). So we can compute the transform of Ш:

\hat{Ш}(\omega) = \sum_{n=-\infty}^\infty \exp(-2\pi i n \omega)

This equation only makes sense in terms of distributions. The right hand side does not converge in the classical sense; the individual terms don’t even go to zero, since each term has magnitude 1. So what kind of distribution is this thing on the right side? It is in fact the Ш function again, though this is not obvious.

To see that the exponential sum is actually the Ш function, i.e. that Ш is its own Fourier transform, we need to back up a little bit and define Fourier transform of a distribution. As usual with distributions, we take a classical theorem and turn it into a definition in a broader context.

For absolutely integrable functions, we have

\int_{-\infty}^ \infty \hat{f}(x) \, \varphi(x) \, dx = \int_{-\infty}^ \infty f(x) \, \hat{ \varphi }(x) \, dx

where the hat on top of a function indicates its Fourier transform. Inspired by the theorem above, we define the Fourier transform of a distribution f to be the functional whose action on a test function φ is given below.

 \hat{f} : \varphi \mapsto \int_{-\infty}^ \infty f(x) \, \hat{ \varphi }(x) \, dx

As we noted in a previous post, the integral above can be taken literally if f is a distribution associated with an ordinary function, but in general it means the application of the linear functional to the test function.

As a distribution, exp(-2πinω) acts on a test function φ by integrating against it. From the definition of a (classical) Fourier transform, this gives the Fourier transform of φ evaluated at n. So the Fourier transform of Ш acts on φ by summing the values of φ’s Fourier transform over all integers. By the Poisson summation formula, this is the same as summing the values of φ itself over all integers. Which is the same as applying Ш. So the Fourier transform of Ш has the same effect on test functions as Ш. In other words, Ш is its own Fourier transform.
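As a concrete check of the Poisson summation step, here’s a small numerical sketch (mine, not from the post) with a Gaussian test function. Under the exp(-2πiωx) convention used here, the Fourier transform of exp(-x²) is √π exp(-π²ω²), and summing either function over the integers gives the same number.

    import numpy as np

    n = np.arange(-50, 51)

    lhs = np.sum(np.exp(-n**2))                              # sum of phi(n)
    rhs = np.sum(np.sqrt(np.pi) * np.exp(-np.pi**2 * n**2))  # sum of phi-hat(n)
    print(lhs, rhs)  # both approximately 1.7726, as Poisson summation requires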

Uniqueness

We haven’t been explicit about where our test functions come from. We require that x^n φ(x) goes to zero as x goes to ±∞ for any positive integer n. These are called functions of rapid decay. And the distributions we define as linear functionals on such test functions are called tempered distributions.

The Ш distribution is essentially unique. Any tempered distribution with period 1 that equals its own Fourier transform must be a multiple of Ш.

***

[1] All Fourier transform calculations here use the convention I call (-1, τ, 1) in these notes on various definitions. This may be the most common definition, though there are several minor variations in common use.

Restarting @DSP_fact, ending @PerlRegex

I’m making a couple changes to my Twitter accounts.

First, I’m winding down @PerlRegex. I’ll stop tweeting there when my scheduled tweets run out. I suggest that everyone who has been following @PerlRegex start following @RegexTip instead. The latter is more general, but is mostly compatible with Perl.

Second, I’m reviving my @DSP_Fact. I stopped tweeting there a couple years ago, but I’d like to start posting there again. This time it’s going to be a little broader. I intend to include some material on acoustics, Fourier analysis (continuous and discrete), and maybe some other related material.

Sinc and Jinc sums

In the previous post, we looked at an elegant equation involving integrals of the sinc function and computed the corresponding integrals for the jinc function.

\int_{-\infty}^\infty \mbox{sinc}(x) \, dx = \int_{-\infty}^\infty \mbox{sinc}^2(x) \, dx = \pi

It turns out the analogous equation holds for sums as well:

\sum_{n=-\infty}^\infty \mbox{sinc}(n) = \sum_{n=-\infty}^\infty \mbox{sinc}^2(n) = \pi

As before, we’d like to compute these two sums and see whether we can compute the corresponding sums for the jinc function.

The Poisson summation formula says that a function and its Fourier transform produce the same sums over the integers:

\sum_{n=-\infty}^\infty f(n) = \sum_{n=-\infty}^\infty \hat{f}(n)

Recall from the previous post that the Fourier transform of sinc is the function π box(π x) where the box function is 1 on [-1/2, 1/2] and zero elsewhere. The only integer n with πn inside [-1/2, 1/2] is 0, so the sum of sinc(n) over the integers equals π. A very similar argument shows that the sum of jinc(n) over the integers equals its Fourier transform at 0, which equals 2.

Let tri(x) be the triangle function, defined to be 1 − |x| for -1 < x < 1 and 0 otherwise. Then the Fourier transform of tri(x) is sinc²(π ω) and so π tri(π x) and sinc² are Fourier transform pairs. The Poisson summation formula says the sum of sinc² over the integers is the sum of π tri(π x) over the integers, which is π.

I don’t know the Fourier transform of jinc² and doubt it’s easy to compute. Maybe the sum could be computed more easily without Fourier transforms, e.g. using contour integration.
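For a quick numerical sanity check of the sinc sums above, here’s a sketch with an arbitrary truncation point. (Recall that sinc(x) = sin(x)/x here, not the normalized sin(πx)/(πx).)

    import numpy as np

    N = 1_000_000
    n = np.arange(1, N + 1)
    terms = np.sin(n) / n              # sinc(n) for positive n

    # sinc(0) = 1, and both sums are symmetric in n, so take the n = 0 term
    # once and double the sum over positive n.
    print(1 + 2 * np.sum(terms))       # approximately pi
    print(1 + 2 * np.sum(terms**2))    # approximately pi
    print(np.pi)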