Quantifying how annoying a sound is

Eberhard Zwicker proposed a model for combining several psychoacoustic metrics into one metric to quantify annoyance. It is a function of three things:

  • N5, the 95th percentile of loudness, measured in sone (confusingly, this is conventionally called the 5th percentile, because it is the loudness exceeded 5% of the time)
  • ωS, a function of sharpness (in acum) and of loudness
  • ωFR, a function of fluctuation strength (in vacil), roughness (in asper), and loudness

Specifically, Zwicker calculates PA, psychoacoustic annoyance, by

PA &=& N_5 \left( 1 + \sqrt{\omega_S^2 + \omega_{FR}^2}\right) \\
\omega_S &=& \left(\frac{S}{\mbox{acum}} - 1.75\right)^+ \log \left(\frac{N_5}{\mbox{sone}} + 10\right) \\
\omega_{FR} &=& \frac{2.18}{(N_5/\mbox{sone})^{0.4}} \left( 0.4\, \frac{F}{\mbox{vacil}} + 0.6\, \frac{R}{\mbox{asper}}\right)
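Here (x)^+ denotes the positive part of x, i.e. max(x, 0). As a minimal sketch, assuming the component metrics N5 (sone), S (acum), F (vacil), and R (asper) have already been computed by other means, the formula is straightforward to code up (the base of the logarithm follows the formula as written above):

    from math import log, sqrt

    def psychoacoustic_annoyance(N5, S, F, R):
        # N5 in sone, S in acum, F in vacil, R in asper
        w_S = max(S - 1.75, 0) * log(N5 + 10)    # log as written above
        w_FR = 2.18 / N5**0.4 * (0.4*F + 0.6*R)
        return N5 * (1 + sqrt(w_S**2 + w_FR**2))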

A geometric visualization of the formula is given below.

Geometric representation of Zwicker's annoyance formula

Here’s an example of computing annoyance using two sound files from previous posts, a leaf blower and a simulated kettledrum. I calibrated both to have sound pressure level 80 dB. But because of the different composition of the sounds, i.e. more high-frequency components in the leaf blower, the leaf blower is much louder than the kettledrum (39 sone vs 15 sone) at the same sound pressure level. The annoyance of the leaf blower works out to about 56, while that of the kettledrum was only about 19.

 

Physical models

US Army Corps of Engineers Mississippi basin model

The most recent episode of 99% Invisible tells the story of the Corps of Engineers’ enormous physical model of the Mississippi basin, a watershed covering nearly half the area of the continental US. Spanning over 200 acres, the model was built during WWII and was shut down in 1993.

Here are some of my favorite lines from the show:

The reason engineers continue to rely on [physical models] is because today, in 2016, we still do not have the computers or the science to do all the things that physical models can do. …

Hydraulic engineering gets into some of the most complicated math there is. Allegedly when Albert Einstein’s son Hans said he wanted to study how sediment moves underwater, Einstein asked him why he wanted to work on something so complicated.

The physics involved happen on such a small scale that we still haven’t built equations complex enough to capture them. And so Stanford Gibson, a world-class numerical modeler, is one of the most ardent supporters of physical models.

But then I have a quibble. The show goes on to say “a physical model doesn’t require equations at all.” That’s not true. When you build a small thing to study how a big thing works, you’ve got to have some theory relating the behavior of the two. If the real thing is 1000 times bigger than the model, does that mean you can simply multiply measurements from the model by 1000 to predict measurements of reality? Sometimes. Some effects scale linearly, but some do not.

It would have been more accurate to say a physical model doesn’t require as many equations as an entirely mathematical model. The physical model is a partial mathematical model because of the mathematics necessary to extrapolate from the model to the thing being modeled.

Another line from the show that I liked was a quote from Stanford Gibson, introduced above.

The idea that science demystifies the world, I just don’t understand that. I feel that the deeper down the scientific rabbit hole I go, the bigger and grander and more magical the world seems.

Related post: Toy problems

Another way to define fractional derivatives

There are many ways to define fractional derivatives, and in general they coincide on nice classes of functions. A long time ago I wrote about one way to define fractional derivatives using Fourier transforms. From that post:

Here’s one way fractional derivatives could be defined. Suppose the Fourier transform of f(x) is g(ξ). Then for positive integer n, the nth derivative of f(x) has Fourier transform (2π i ξ)^n g(ξ). So you could take the nth derivative of f(x) as follows: take the Fourier transform, multiply by (2π i ξ)^n, and take the inverse Fourier transform. This suggests the same procedure could be used to define the nth derivative when n is not an integer.

Here’s another way to define fractional derivatives that doesn’t use Fourier transforms, the Grünwald-Letnikov derivative. It’s a fairly direct approach.

The starting point is to note that for positive integer n, the nth derivative can be written as

f^{(n)}(x) = \lim_{h\to 0} \frac{(\Delta^n_h f)(x)}{h^n}

where

(\Delta_h f)(x) = f(x) - f(x - h)

and Δ^n_h applies Δ_h a total of n times. For example,

(\Delta_h^2 f)(x) = (\Delta_h (\Delta_h f))(x) = f(x) - 2 f(x-h) + f(x - 2h).

In general, for positive integer n, we have

(\Delta^n_h f)(x) = \sum_{k=0}^\infty (-1)^k {n \choose k} f(x - kh).

We could set the upper limit of the sum to n, but there’s no harm in setting it to ∞ because the binomial coefficients will be zero for k larger than n. And in this form, we can replace the integer n with any positive real number α to define the αth derivative. That is, the Grünwald-Letnikov derivative of f is given by

f^{(\alpha)}(x) = \lim_{h\to 0} \frac{(\Delta^\alpha_h f)(x)}{h^\alpha} = \lim_{h\to 0} h^{-\alpha} \sum_{k=0}^\infty (-1)^k {\alpha \choose k} f(x - kh).

See these notes for the definition of binomial coefficients for possibly non-integer arguments, and for an explanation of why the coefficients are eventually zero when n is an integer.
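Here’s a small numerical sketch (mine, not from the post) that approximates the Grünwald-Letnikov derivative by truncating the sum at k = (x − a)/h for a lower terminal a, so that f is only sampled on [a, x]. For f(x) = x, α = 1/2, and a = 0, the result should approach 2√(x/π), the Riemann-Liouville half-derivative of x.

    import numpy as np
    from scipy.special import binom

    def gl_derivative(f, x, alpha, a=0.0, n=10000):
        # Approximate the alpha-th derivative at x by truncating the
        # Grunwald-Letnikov sum at k = (x - a)/h.
        h = (x - a)/n
        k = np.arange(n + 1)
        weights = (-1)**k * binom(alpha, k)
        return np.sum(weights * f(x - k*h)) / h**alpha

    # Half-derivative of f(x) = x at x = 1; exact value 2/sqrt(pi) ≈ 1.1284
    print(gl_derivative(lambda t: t, 1.0, 0.5))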

Notice that fractional derivatives require non-local information. The ordinary derivative at a point is determined by the values of the function in an arbitrarily small neighborhood of that point. But the fractional derivative, as defined in this post, depends on the values of the function at an evenly spaced infinite sequence of points. If we define fractional derivatives via the Fourier transform, the non-local nature is even more apparent, since the Fourier transform at any point depends on the values of the function everywhere.

This non-local feature can be good or bad. If you want to model a phenomenon with non-local dependence, fractional differential equations might be a good way to go. But if your phenomenon is locally determined, fractional differential equations might be a poor fit.

Related post: Mittag-Leffler functions

Humble Lisp programmers

Maybe from the headline you were expecting a blank post? No, that’s not where I’m going.

Yesterday I was on Amazon.com and noticed that nearly all the books they recommended for me were either about Lisp or mountain climbing. I thought this was odd, and mentioned it on Twitter. Carl Vogel had a witty reply: “I guess they weren’t sure whether you want to figuratively or literally look down on everyone.”

The stereotypical Lisp programmer does look down on everyone. But this impression is based on a tiny, and perhaps unrepresentative, sample: the people writing about Lisp, who are far fewer than the people writing in Lisp.

Lisp has been around for over 50 years and shows no signs of going away. There are a lot of people writing Lisp in obscurity. Kate Gregory said something similar about C++ developers, calling them the dark matter of programmers because there are a lot of them but they don’t make much of a splash. They’re quietly doing their job, not speaking at conferences or writing much about their language.

I imagine there are a lot of humble Lisp programmers. It takes some humility to commit to an older technology, especially given the pervasive neomania of the programmer community. It also takes some humility to work on projects that have been around for years or that are deep within the infrastructure of more visible projects, which is where I expect a lot of Lisp lives.

You can do very clever things in Lisp, but you don’t have to. As Ed Post famously said, “The determined Real Programmer can write FORTRAN programs in any language.” There must be a lot of code out there that writes (f x) instead of f(x) but otherwise isn’t that different from FORTRAN.

Integral equation types

There are four basic types of integral equations. There are many other integral equations, but if you are familiar with these four, you have a good overview of the classical theory.

\phantom{\varphi(x) - }\int_a^x K(x, y) \varphi(y)\, dy &=& f(x) \\
\varphi(x) - \int_a^x K(x, y) \varphi(y)\, dy &=& f(x) \\
\phantom{\varphi(x) - }\int_a^b K(x, y) \varphi(y)\, dy &=& f(x) \\
\varphi(x) - \int_a^b K(x, y) \varphi(y)\, dy &=& f(x)

All four involve the unknown function φ(x) in an integral with a kernel K(x, y), and all have an input function f(x). In all four, integration starts at a fixed lower limit a. In the Volterra equations, the upper limit of integration is the variable x, while in the Fredholm equations, the upper limit is a fixed constant b.

The so-called equations of the first kind only involve the unknown function φ inside the integral. The equations of the second kind also involve φ outside the integral.

So the four equations above are

  • Volterra equation of the first kind
  • Volterra equation of the second kind
  • Fredholm equation of the first kind
  • Fredholm equation of the second kind

Here’s a diagram to make these easier to keep straight:

In general, the theory of Volterra equations is easier than that of Fredholm equations. And while equations of the first kind look simpler at first, it’s common to reduce equations of the first kind to equations of the second kind and concentrate on the latter.
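As a concrete illustration, here’s a Nyström-style discretization sketch for a Fredholm equation of the second kind, with a made-up kernel and right-hand side: replace the integral with a quadrature rule and solve the resulting linear system for the values of φ at the quadrature nodes.

    import numpy as np

    a, b, n = 0.0, 1.0, 100
    nodes, weights = np.polynomial.legendre.leggauss(n)  # Gauss-Legendre rule on [-1, 1]
    x = 0.5*(b - a)*nodes + 0.5*(b + a)                  # map nodes to [a, b]
    w = 0.5*(b - a)*weights

    K = lambda s, t: 0.5*np.exp(-np.abs(s - t))  # made-up kernel
    f = lambda s: np.ones_like(s)                # made-up right-hand side

    # Discretize phi(x) - int_a^b K(x, y) phi(y) dy = f(x) at the nodes:
    # solve (I - K W) phi = f, where W holds the quadrature weights.
    A = np.eye(n) - K(x[:, None], x[None, :]) * w[None, :]
    phi = np.linalg.solve(A, f(x))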

There are many variations on this theme. The x in Volterra equations could be a vector. The integral could be, for example, a double or triple integral. In Fredholm equations, the integration may be over a fixed, general region. Maybe you’re integrating over a watermelon, as the late William Guy would say. You could have nonlinear versions of these equations where instead of multiplying K(x, y) times φ(y) you have a kernel K(x, y, φ(y)) that is some nonlinear function of φ.

You may see references to Volterra or Fredholm equations of the third kind. These are an extension of the second kind, where a function A(x) multiplies the φ outside the integral. Equations of the second kind are the most important since the first and third kinds can often be reduced to the second kind.

What is a vacil?

Fluctuation strength is similar to roughness, though at much lower modulation frequencies. Fluctuation strength is measured in vacils (from vacilare in Latin or vacillate in English). Police sirens are a good example of sounds with high fluctuation strength.

Fluctuation strength reaches its maximum at a modulation frequency of around 4 Hz. For much higher modulation frequencies, one perceives roughness rather than fluctuation. The reference value for one vacil is a 1 kHz tone, fully modulated at 4 Hz, at a sound pressure level of 60 decibels. In other words

(1 + sin(8πt)) sin(2000πt)

where t is time in seconds.
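Here’s a short sketch (mine, analogous to the code in the asper post below) that would generate the reference signal; the file name and the scaling are my choices.

    from scipy.io.wavfile import write
    from numpy import arange, pi, sin, int16

    N = 48000                                   # samples per second
    t = arange(3*N)/N                           # three seconds of audio
    signal = (1 + sin(8*pi*t))*sin(2000*pi*t)   # 1 kHz tone, 100% modulated at 4 Hz
    write("one_vacil.wav", N, int16(signal/2*(2**15 - 1)))  # scale [-2, 2] into 16 bits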

Since the carrier frequency is 250 times the modulation frequency, you can’t see both on the same scale. In the plot below, the carrier oscillations are so dense that they appear as solid blue, and the modulation shows up as the envelope.

1000 Hz signal fully modulated at 4 Hz

Here’s what the reference for one vacil would sound like:

 

See also: What is an asper?

What is an asper?

Acoustic roughness is measured in aspers (from the Latin word for rough). An asper is the roughness of a 1 kHz tone, at 60 dB, 100% modulated at 70 Hz. That is, the signal

(1 + sin(140πt)) sin(2000πt)

where t is time in seconds.

1000 Hz carrier fully modulated at 70 Hz

Here’s what that sounds like (if you play this at 60 dB, about the loudness of a typical conversation at one meter):

 

And here’s the Python code that made the file:

    
    from scipy.io.wavfile import write
    from numpy import arange, pi, sin, int16
    
    def f(t, f_c, f_m):
        # t    = time
        # f_c  = carrier frequency
        # f_m  = modulation frequency
        return (1 + sin(2*pi*f_m*t))*sin(2*pi*f_c*t)
    
    def to_integer(signal):
        # The modulated signal takes values in [-2, 2], so scale by half
        # the 16-bit range to avoid overflowing the int16 type.
        return int16(signal/2*(2**15 - 1))
    
    N = 48000 # samples per second
    x = arange(3*N) # three seconds of audio
    
    # 1 asper corresponds to a 1 kHz tone, 100% modulated at 70 Hz, at 60 dB
    data = f(x/N, 1000, 70)
    write("one_asper.wav", N, to_integer(data))

See also: What is a vacil?

Mittag-Leffler function and probability distribution

The Mittag-Leffler function is a generalization of the exponential function. Since k! = Γ(k + 1), we can write the exponential function’s power series as

\exp(x) = \sum_{k=0}^\infty \frac{x^k}{\Gamma(k+1)}

and we can generalize this to the Mittag-Leffler function

E_{\alpha, \beta}(x) = \sum_{k=0}^\infty \frac{x^k}{\Gamma(\alpha k+\beta)}

which reduces to the exponential function when α = β = 1. There are a few other values of α and β for which the Mittag-Leffler function reduces to more familiar functions. For example,

E_{2,1}(x) = \cosh(\sqrt{x})

and

E_{1/2, 1}(x) = \exp(x^2) \, \mbox{erfc}(-x)

where erfc(x) is the complementary error function.
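For numerical experiments, a truncated power series is often good enough for moderate arguments. Here’s a small sketch (mine), with sanity checks against the two special cases above.

    import numpy as np
    from scipy.special import gamma

    def mittag_leffler(x, alpha, beta=1.0, terms=50):
        # Truncated power series; adequate for moderate |x|, not for large arguments
        k = np.arange(terms)
        return np.sum(x**k / gamma(alpha*k + beta))

    print(mittag_leffler(2.0, 1, 1), np.exp(2.0))            # E_{1,1}(x) = exp(x)
    print(mittag_leffler(2.0, 2, 1), np.cosh(np.sqrt(2.0)))  # E_{2,1}(x) = cosh(sqrt(x))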

History

Mittag-Leffler was one person, not two. When I first saw the Mittag-Leffler theorem in complex analysis, I assumed it was named after two people, Mittag and Leffler. But the theorem and the function discussed here are named after one man, the Swedish mathematician Magnus Gustaf (Gösta) Mittag-Leffler (1846–1927).

The function that Mr. Mittag-Leffler originally introduced did not have a β parameter; that generalization came later. The notation Eα is shorthand for Eα, 1.

Mittag-Leffler probability distributions

Just as you can make a couple probability distributions out of the exponential function, you can make a couple probability distributions out of the Mittag-Leffler function.

Continuous Mittag-Leffler distribution

The exponential function exp(-x) is positive over [0, ∞) and integrates to 1, so we can define a probability distribution whose density function (PDF) is f(x) = exp(-x) and whose distribution function (CDF) is F(x) = 1 – exp(-x). The Mittag-Leffler distribution has CDF 1 – E_α(-x^α) and so reduces to the exponential distribution when α = 1. For 0 < α < 1, the Mittag-Leffler distribution is a heavy-tailed generalization of the exponential. [1]
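Using the mittag_leffler series sketch from earlier in this post, the CDF is easy to evaluate for moderate x.

    def ml_cdf(x, alpha):
        # CDF of the continuous Mittag-Leffler distribution
        return 1 - mittag_leffler(-x**alpha, alpha, 1)

    print(ml_cdf(1.5, 1.0), 1 - np.exp(-1.5))  # alpha = 1 recovers the exponential CDF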

Discrete Mittag-Leffler distribution

The Poisson distribution comes from taking the power series for exp(λ), normalizing it to 1, and using the kth term as the probability mass for k. That is,

P(X = k) = \frac{1}{\exp(\lambda)} \frac{\lambda^k}{k!}

The analogous discrete Mittag-Leffler distribution [2] has probability mass function

P(X = k) = \frac{1}{E_{\alpha, \beta}(\lambda)} \frac{\lambda^k}{\Gamma(\alpha k + \beta)}.
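With the mittag_leffler function sketched above, the probability mass function can be coded directly; when α = β = 1 it reduces to the Poisson(λ) distribution.

    from scipy.stats import poisson

    def discrete_ml_pmf(k, lam, alpha, beta=1.0):
        return lam**k / gamma(alpha*k + beta) / mittag_leffler(lam, alpha, beta)

    print(discrete_ml_pmf(3, 2.0, 1, 1))  # should match the Poisson pmf below
    print(poisson.pmf(3, 2.0))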

Fractional differential equations

In addition to probability and statistics, the Mittag-Leffler function comes up in fractional calculus. It plays a role analogous to that of the exponential function in classical calculus. Just as the solution to the simple differential equation

\frac{d}{dx} f(x) = a\, f(x)

is exp(ax), for 0 < μ < 1, the solution to the fractional differential equation

\frac{d^\mu}{dx^\mu} f(x) = a\, f(x)

is x^{μ-1} E_{μ, μ}(a x^μ). Note that this reduces to exp(ax) when μ = 1. [3]
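As a quick sanity check with the series sketch above, at μ = 1 the solution is just the exponential.

    a, x, mu = 0.7, 2.0, 1.0
    print(x**(mu - 1) * mittag_leffler(a * x**mu, mu, mu))  # equals exp(a x) when mu = 1
    print(np.exp(a*x))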

References

[1] Gwo Dong Lin. On the Mittag–Leffler distributions. Journal of Statistical Planning and Inference 74 (1998) 1–9.

[2] Subrata Chakraborty, S. H. Ong. Mittag-Leffler function distribution: A new generalization of hyper-Poisson distribution. arXiv:1411.0980v1

[3] Keith Oldham, Jan Myland, Jerome Spanier. An Atlas of Functions. Springer.

Kalman filters and functional programming

A few weeks ago I started a series of posts on various things you could do with a functional fold. In the first post I mentioned that the idea came from a paper by Brian Beckman on Kalman filters and folds:

This post was inspired by a paper by Brian Beckman (in progress) that shows how a Kalman filter can be implemented as a fold. From a Bayesian perspective, the thing that makes the Kalman filter work is that a certain multivariate normal model has a conjugate prior. This post shows that conjugate models more generally can be implemented as folds over the data. That’s interesting, but what does it buy you? Brian’s paper discusses this in detail, but one advantage is that it completely separates the accumulator function from the data, so the former can be tested in isolation.

At the time Brian was working on one big paper in private. This has since been split into several papers and they’re now public.
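As a toy illustration of the fold idea (my sketch, not Brian’s Kalman filter itself), here’s a conjugate normal model updated by folding over the data: the accumulator is the posterior (mean, variance) for an unknown mean with known observation variance 1, and the update function can be tested without any reference to how the data arrive.

    from functools import reduce

    def update(prior, x):
        # prior = (mean, variance) for the unknown mean; observation variance is 1
        m, v = prior
        v_post = 1/(1/v + 1)        # precisions add
        m_post = v_post*(m/v + x)   # precision-weighted average
        return (m_post, v_post)

    data = [2.1, 1.9, 2.4, 2.0]
    print(reduce(update, data, (0.0, 100.0)))  # fold over the data from a diffuse prior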

 

Formal methods let you explore the corners

I heard someone say the other day that the advantage of formal software validation methods is that they let you explore the corners, cases where intuition doesn’t naturally take you.

This made me think of corners in the geometric sense. If you have a sphere in a box in high dimensions, nearly all the volume is in the corners, i.e. outside the sphere. This is more than a metaphor. You can think of software options geometrically, with each independent choice corresponding to a dimension. Paths through a piece of software that are individually rare may account for nearly all use when considered together.

With a circle inside a square, about 78.5% of the area is inside the circle. With a ball sitting inside a 3-D box, about 52.4% of the volume is inside the ball. As the dimension increases, the proportion of volume inside the sphere rapidly decreases. For a 10-dimensional sphere sitting in a 10-dimensional box, 0.25% of the volume is in the sphere. Said another way, 99.75% of the volume is in the corners.

When you go up to 100 dimensions, the proportion of volume inside the sphere is about 2 parts in 10^70, a 1 followed by 70 zeros [1]. If 100 dimensions sounds like pure fantasy, think about a piece of software with more than 100 features. Those feature combinations multiply like geometric dimensions [2].

Here’s a little Python code you could use to see how much volume is in a sphere as a function of dimension.

    from scipy.special import gamma
    from math import pi

    def unit_sphere_volume(n): 
        return pi**(0.5*n)/gamma(0.5*n + 1)

    def unit_cube_volume(n): 
        return 2**n

    def ratio(n):
        return unit_sphere_volume(n) / unit_cube_volume(n)

    print( [ratio(n) for n in range(1, 20)] )

* * *

[1] There are names for such extremely large numbers. These names are hardly ever used—scientific notation is much more practical—but they’re fun to say. 10^70 is ten duovigintillion in American nomenclature, ten undecilliard in European.

[2] Geometric dimensions are perfectly independent, but software feature combinations are not. In terms of logic, some combinations may not be possible. Or in terms of probability, the probability of exploring some paths is conditional on the probability of exploring other paths. Even so, there are inconceivably many paths through any large software system. And in large-scale operations, events that should “never happen” happen regularly.

Literate programming: presenting code in human order

Presentation order

People best understand computer programs in a different order than compilers do. This is a key idea of literate programming, and one that distinguishes literate programs from heavily commented programs.

Traditional source code, no matter how heavily commented, is presented in the order dictated by the compiler. The computer is the primary audience. Literate programming is more humanistic in the sense that the primary audience is a human. The computer has to go to extra effort to arrange the code for its needs. As Donald Knuth describes it in his book on literate programming,

The practitioner of literate programming … strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that nicely reinforce each other. [emphasis added]

There are two steps in processing literate programs: weaving and tangling. You take files containing prose and code, and weave them into documentation and tangle them into source code. Tools like Sweave and Pweave focus on the weave process, as their names imply. The weave side of literate programming has gotten the most attention.

A half-hearted approach to literate programming doesn’t require much of a tangle process. A well-commented program has no tangle step at all. A *weave document that follows the order of the source code has a trivial tangle step: save the code to its own file, manually or automatically, but don’t rearrange it. But a full-fledged literate program may make the tangle program work harder, rearranging code fragments from human-friendly to compiler-friendly order.
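For concreteness, here’s a tiny sketch in noweb-style notation (my example, not from Knuth). The interesting chunk is presented first for the reader; the tangle step stitches it back into the place the interpreter needs it.

    @ The interesting part is the update rule, so explain and present it first.
    <<update the running sum of squares>>=
    total += x*x
    @ The surrounding loop is routine; the tangler inserts the chunk defined above at its reference below.
    <<*>>=
    total = 0
    for x in data:
        <<update the running sum of squares>>

Running something like notangle on this file would emit the code in interpreter order, while noweave would produce the documentation.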

Careful explanation vs. unit tests

The most obvious feature of literate programming is that it requires careful explanation. Here’s more from the paragraph I quoted above, filling in the part I left out.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with explanation and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible …

The discipline of explaining every piece of code leads to better code. It serves a similar purpose to writing unit tests. I saw somewhere—I can’t remember where now—that Knuth hates unit testing and sees it as redundant effort. Presumably this is because unit testing and literate programming overlap. Unit tests are a kind of commentary on code, explaining how it is used, exploring its limitations, etc.

Knuth understands that literate programming doesn’t replace the need for testing, just unit testing. He explained somewhere—again I forget where—that he would test TeX by spending days at a time trying fiendishly to break it.

My misunderstanding and experience

When I read Knuth’s book, I could see the value of carefully explaining code. What I didn’t appreciate was the value of presenting code in a different order than the order of source code.

I’m working on a project now where a few lines of code may require a few paragraphs of explanation. That’s what got me thinking about literate programming. My intention was to write my documentation in the same order as the code. It took a while to realize I had stumbled on an ideal application of literate programming: a complicated algorithm that needs to be explained carefully, both in order to hand over to the client and to reduce the chances of errors. The best order to understand this algorithm is definitely not top-down going through the code.

Why literate programming has not taken off

I think I understand better now why literate programming hasn’t gained much of an audience. I used to think that it was because developers hate writing prose. That’s part of it. Most programmers I’ve worked with would much rather write a hundred lines of unit tests than write one complete sentence.

But that’s not the whole story. There are quite a few programmers who are willing and able to write prose. Why don’t more of them use literate programming?

I think part of the reason is that having a non-trivial tangle process is a barrier to adoption. A programmer can decide to start writing more extensive comments, gradually working up to essay-like explanations. But it’s one thing to say “I’m going to heavily comment my C code” and quite another to say “I’m not going to write C per se any more. I’m going to write CWEB files that compile to C.” Even if a programmer wants to write CWEB in secret, just checking in the tangled C files, other programmers will edit these C files and the changes won’t be reflected in the CWEB source. Also, the output of tangle is less readable than ordinary code. The programmer secretly using CWEB as a preprocessor would appear to be writing undocumented code.

Tricky code benefits from a literate presentation, but routine code does not benefit so much. You either have to have two ways of writing code—straight source for routine code and literate programs for the tricky parts—or impose the overhead of literate programming everywhere. Most code is mundane and repetitive, challenging because of its volume rather than its cleverness. Knuth doesn’t write this kind of code. He only writes code that benefits from a literate presentation.

To write a good literate program, you need to be able to program, you need to be willing and able to write good prose, and on top of that you need a good sense for storytelling, arranging the code for the benefit of other readers. If this is done poorly, the result is harder to understand than a traditional program.

I may use literate programming more now that I’m starting to understand it, at least for my own benefit and hopefully for the benefit of clients. I usually deliver algorithms or libraries, not large applications, and so it wouldn’t be too much work to create two versions of my results. I could create a literate program, then weave a report, and manually edit the tangled code into a traditional form convenient for the client.

Distribution of numbers in Pascal’s triangle

This post explores a sentence from the book Single Digits:

Any number in Pascal’s triangle that is not in the outer two layers will appear at least three times, usually four.

Pascal’s triangle contains the binomial coefficients C(n, r) where n ranges over the non-negative integers and r ranges from 0 to n. The outer layers are the elements with r equal to 0, 1, n-1, and n.

Pascal's triangle

We’ll write some Python code to explore how often the numbers up to 1,000,000 appear. How many rows of Pascal’s triangle should we compute? The smallest interior number in row n is C(n, 2). Now 1,000,000 is between C(1414, 2) and C(1415, 2), so we need rows up through row 1414. This means we need N = 1415 below because the row numbers start with 0.

I’d like to use a NumPy array for storing Pascal’s triangle. In the process of writing this code I realized that a NumPy array with dtype int doesn’t hold Python’s arbitrary-sized integers. This makes sense because NumPy is designed for efficient computation, and using a NumPy array to contain huge integers is unnatural. But I’d like to do it anyway, and the way to make it happen is to set dtype to object.

    import numpy as np
    from collections import Counter
    
    N = 1415 # Number of rows of Pascal's triangle to compute
    
    Pascal = np.zeros((N, N), dtype=object)
    Pascal[0, 0] = 1
    Pascal[1,0] = Pascal[1,1] = 1
    
    for n in range(2, N):
        for r in range(0, n+1):
            Pascal[n, r] = Pascal[n-1, r-1] + Pascal[n-1, r]
    
    c = Counter()
    for n in range(4, N):
        for r in range(2, n-1):
            p = Pascal[n, r]
            if p <= 1000000:
                c[p] += 1

When we run this code, we find that our counter contains 1732 elements. That is, of the numbers up to one million, only 1732 of them appear inside Pascal’s triangle when we disallow the outer two layers. (The second layer contains consecutive integers, so every positive integer appears in Pascal’s triangle. But most integers only appear in the second layer.)

When Single Digits speaks of “Any number in Pascal’s triangle that is not in the outer two layers” this cannot refer to numbers that are not in the outer two layers because every natural number appears in the outer two layers. Also, when it says the number “will appear at least three times, usually four” it is referring to the entire triangle, i.e. including the outer two layers. So another way to state the sentence quoted above is as follows.

Define the interior of Pascal’s triangle to be the triangle excluding the outer two layers. Every number n in the interior of Pascal’s triangle appears twice more outside of the interior, namely as C(n, 1) and C(n, n-1). Most often n appears at least twice in the interior as well.

This means that any number you find in the interior of Pascal’s triangle, interior meaning not in the two outer layers, will appear at least three times in the full triangle, usually more.

Here are the statistics from our code above regarding just the interior of Pascal’s triangle.

  • One number, 3003, appears six times.
  • Six numbers appear four times: 120, 210, 1540, 7140, 11628, and 24310.
  • Ten numbers appear only once: 6, 20, 70, 252, 924, 3432, 12870,  48620, 184756, and 705432.
  • The large majority of numbers, 1715 out of 1732, appear twice.
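These statistics can be read directly off the Counter built above. For example, this small addendum to the code (not in the original) tabulates how many interior values appear once, twice, and so on, and picks out the value that appears six times.

    # Tabulate multiplicities: should give something like {2: 1715, 1: 10, 4: 6, 6: 1}
    print(Counter(c.values()))
    # The value that appears six times: [3003]
    print([p for p, count in c.items() if count == 6])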

Agile software development and homotopy

One of the things I learned from my tenure as a software project manager was that a project is more likely to succeed if there’s a way to get where you want to go continuously. You want to move a project from A to B gradually, keeping a working code base all along the way. At the end of each day, the software may not be fully functional, but it should at least build. Anything that requires a big bang change, tearing the system apart for several days and putting it back together, is less likely to succeed.

This is very much like the idea of homotopy from topology, a continuous deformation of one thing into another. No discontinuities along the way — no ripping, no jumping suddenly from one thing to another.
