The acoustics of Hagia Sophia

Hagia Sophia

The Hagia Sophia (Greek for “Holy Wisdom”) was a Greek Orthodox cathedral from 537 to 1453. When the Ottoman Empire conquered Constantinople the church was converted into a mosque. Then in 1935 it was converted into a museum.

No musical performances are allowed in the Hagia Sophia. However, researchers from Stanford have modeled the acoustics of the space in order to simulate what worship would have sounded like when it was a medieval cathedral. The researchers recorded a virtual performance by synthesizing the acoustics of the building. Not only did they post-process the sound to give the singers the sound of being in the Hagia Sophia, they first gave the singers real-time feedback so they would sing as if they were there.

Related posts

Mathematics of Deep Note

THX deepnote logo score

I just finished listening to the latest episode of Twenty Thousand Hertz, the story behind “Deep Note,” the THX logo sound.

There are a couple mathematical details of the sound that I’d like to explore here: random number generation, and especially Pythagorean tuning.

Random number generation

First is that part of the construction of the sound depended on a random number generator. The voices start in a random configuration and slowly reach the target D major chord at the end.

Apparently the random number generator was not seeded in a reproducible way. This was only mentioned toward the end of the show, and a teaser implies that they’ll go more into this in the next episode.

Pythagorean tuning

The other thing to mention is that the final chord is based on Pythagorean tuning, not the more familiar equal temperament.

The lowest note in the final chord is D1. (Here’s an explanation of musical pitch notation.) The other notes are D2, A2, D3, A3, D4, A4, D5, A5, D6, and F#6.


Octave frequencies are a ratio of 2:1, so if D1 is tuned to 36 Hz, then D2 is 72 Hz, D3 is 144 Hz, D4 is 288 Hz, D5 is 576 Hz, and D6 is 1152 Hz.


In Pythagorean tuning, fifths are in a ratio of 3:2. In equal temperament, a fifth is a ratio of 27/12 or 1.4983 [1], a little less than 3/2. So Pythagorean fifths are slightly bigger than equal temperament fifths. (I explain all this here.)

If D2 is 72 Hz, then A2 is 108 Hz. It follows that A3 would be 216 Hz, A4 would be 432 Hz (flatter than the famous A 440), and A5 would be 864 Hz.

Major thirds

The F#6 on top is the most interesting note. Pythagorean tuning is based on fifths being a ratio of 3:2, so how do you get the major third interval for the highest note? By going up by fifths 4 times from D4, i.e. D4 -> A4 -> E5 -> B5 -> F#6.

The frequency of F#6 would be 81/16 of the frequency of D4, or 1458 Hz.

The F#6 on top has a frequency 81/64 that of the D# below it. A Pythagorean major third is a ratio of 81/64 = 1.2656, whereas an equal temperament major third is f 24/12 or 1.2599 [2]. Pythagorean tuning makes more of a difference to thirds than it does to fifths.

A Pythagorean major third is sharper than a major third in equal temperament. Some describe Pythagorean major chords as brighter or sweeter than equal temperament chords. That the effect the composer was going for and why he chose Pythagorean tuning.


Then after specifying the exact pitches for each note, the composer actually changed the pitches of the highest voices a little to make the chord sound fuller. This makes the three voices on each of the highest notes sound like three voices, not just one voice. Also, the chord shimmers a little bit because the random effects from the beginning of Deep Note never completely stop, they are just diminished over time.

Related posts


[1] The exponent is 7/12 because a half step is 1/12 of an octave, and a fifth is 7 half steps.

[2] The exponent is 4/12 because a major third is 4 half steps.

Musical score above via THX Ltd on Twitter.

Generating pink noise

Different colors of noise are named by analogy with colors of light. Pink noise is between white noise and red noise.

White noise has equal power at all frequencies, just as white light is a combination of all the frequencies of the visible spectrum. The components of red noise are weighted toward low frequencies, just as red light is at the low end of the visible spectrum. Pink noise is weighted toward low frequencies too, but not as strongly as red. Specifically, the power in red noise drops off like 1/f² where f is frequency. The power in pink noise drops off like 1/f.

Generating pink noise is more complicated than you might think. The book Creating Noise, by Stefan Hollos and J. Richard Hollos, has a good explanation and C source code for generating pink noise and variations such as 1/f α noise for 0 < α < 1. If you want even more background, check out Recursive Digital Filters by the same authors.

If you’d like to hear what pink noise sounds like, here’s a sample that was created using the software in the book with a 6th order filter.


Related posts:

Acoustic roughness examples

Amplitude modulated signals sound rough to the human ear. The perceived roughness increases with modulation frequency, then decreases, and eventually disappears. The point where roughness reaches is maximum depends on the the carrier signal, but for a 1 kHz tone roughness reaches a maximum for modulation at 70 Hz. Roughness also increases as a function of modulation depth.

Amplitude modulation multiplies a carrier signal by

1 + d sin(2π f t)

where d is the modulation depth, f is the modulation frequency, and t is time.

Here are some examples you can listen to. We use a pure 1000 Hz tone and Gaussian white noise as carriers, and vary modulation depth and frequency continuously over 10 seconds. he modulation depth example varies depth from 0 to 1. Modulation frequency varies from 0 to 120 Hz.

First, here’s a pure tone with increasing modulation depth.


Next we vary the modulation frequency.


Now we switch over to Gaussian white noise, first varying depth.


And finally white noise with varying modulation frequency. This one sounds like a prop-driven airplane taking off.


Related: Psychoacoustics consulting

Quantifying how annoying a sound is

leaf blower

Eberhard Zwicker proposed a model for combining several psychoacoustic metrics into one metric to quantify annoyance. It is a function of three things:

  • N5, the 95th percentile of loudness, measured in sone (which is confusingly called the 5th percentile)
  • ωS, a function of sharpness in asper and of loudness
  • ωFR, fluctuation strength (in vacil), roughness (in asper), and loudness.

Specifically, Zwicker calculates PA, psychoacoutic annoyance, by

PA &=&N_5 \left( 1 + \sqrt{\omega_S^2 + \omega_{RF}^2}\right) \\ \omega_S &=& \left(\frac{S}{\mbox{acum}} - 1.75\right)^+ \log \left(\frac{N_5}{\mbox{sone}} + 10\right) \\ \omega_{FR} &=& \frac{2.18}{(N_5/\mbox{sone})^{0.4}} \left( 0.4 \frac{F}{\mbox{vacil}} + 0.6 \frac{R}{\mbox{asper}}\right)

A geometric visualization of the formula is given below.

Geometric representation of Zwicker's annoyance formula

Here’s an example of computing roughness using two sound files from previous posts, a leaf blower and a simulated kettledrum. I calibrated both to have sound pressure level 80 dB. But because of the different composition of the sounds, i.e. more high frequency components in the leaf blower, the leaf blower is much louder than the kettledrum (39 sone vs 15 sone) at the same sound pressure level. The annoyance of the leaf blower works out to about 56 while the kettledrum was only about 19.


What is a vacil?

Fluctuation strength is similar to roughness, though at much lower modulation frequencies. Fluctuation strength is measured in vacils (from vacilare in Latin or vacillate in English). Police sirens are a good example of sounds with high fluctuation strength.

Fluctuation strength reaches its maximum at a modulation frequency of around 4 Hz. For much higher modulation frequencies, one perceives roughness rather than fluctuation. The reference value for one vacil is a 1 kHz tone, fully modulated at 4 Hz, at a sound pressure level of 60 decibels. In other words

(1 + sin(8πt)) sin(2000πt)

where t is time in seconds.

Since the carrier frequency is 250 times greater than the modulation frequency, you can’t see both in the same graph. In this plot, the carrier is solid blue compared to the modulation.

1000 Hz signal fully modulated at 4 Hz

Here’s what the reference for one vacil would sound like:


See also: What is an asper?

What is an asper?

Acoustic roughness is measured in aspers (from the Latin word for rough). An asper is the roughness of a 1 kHz tone, at 60 dB, 100% modulated at 70 Hz. That is, the signal

(1 + sin(140πt)) sin(2000πt)

where t is time in seconds.

1000 Hz carrier fully modulated at 70 Hz

Here’s what that sounds like (if you play this at 60 dB, about the loudness of a typical conversation at one meter):


And here’s the Python code that made the file:

    from import write
    from numpy import arange, pi, sin, int16
    def f(t, f_c, f_m):
        # t    = time
        # f_c  = carrier frequency
        # f_m  = modulation frequency
        return (1 + sin(2*pi*f_m*t))*sin(2*f_c*pi*t)
    def to_integer(signal):
        # Take samples in [-1, 1] and scale to 16-bit integers,
        # values between -2^15 and 2^15 - 1.
        return int16(signal*(2**15 - 1))
    N = 48000 # samples per second
    x = arange(3*N) # three seconds of audio
    # 1 asper corresponds to a 1 kHz tone, 100% modulated at 70 Hz, at 60 dB
    data = f(x/N, 1000, 70)
    write("one_asper.wav", N, to_integer(data))

See also: What is a vacil?

Cepstrum, quefrency, and pitch

John Tukey

John Tukey coined many terms that have passed into common use, such as bit (a shortening of binary digit) and software. Other terms he coined are well known within their niche: boxplot, ANOVA, rootogram, etc. Some of his terms, such as jackknife and vacuum cleaner, were not new words per se but common words he gave a technical meaning to.

Cepstrum is an anagram of spectrum. It involves an unusual use of power spectra, and is roughly analogous to making anagrams of a word. A related term, one we will get to shortly, is quefrency, an anagram of frequency. Some people pronounce the ‘c’ in cepstrum hard (like ‘k’) and some pronounce it soft (like ‘s’).

Let’s go back to an example from my post on guitar distortion. Here’s a note played with a fairly large amount of distortion:


And here is its power spectrum:

single note with distortion

There’s a lot going on in the spectrum, but the peaks are very regularly spaced. As I mentioned in the post on the sound of a leaf blower, this is the fingerprint of a sound with a definite pitch. Spikes in the spectrum alone don’t indicate a definite pitch if they are irregularly spaced.

The peaks are fairly periodic. How to you find periodic patterns in a signal? Fourier transform! But if you simply take the Fourier transform of a Fourier transform, you essentially get the original signal back. The key to the cepstrum is to do something else between the two Fourier transforms.

The cepstrum starts by taking the Fourier transform, then the magnitude, then the logarithm, and then the inverse Fourier transform.

When we take the magnitude, we throw away phase information, which we don’t need in this context. Taking the log of the magnitude is essentially what you do when you compute sound pressure level. Some define the cepstrum using the magnitude of the Fourier transform and some the magnitude squared. Squaring only introduces a multiple of 2 once we take logs, so it doesn’t effect the location of peaks, only their amplitude.

Taking the logarithm compresses the peaks, bringing them all into roughly the same range, making the sequence of peaks roughly periodic.

When we take the inverse Fourier transform, we now have something like a frequency, but inverted. This is what Tukey called quefrency.

Looking at the guitar power spectrum above, we see a sequence of peaks spaced 440 Hz apart. When we take the inverse Fourier transform of this, we’re looking at a sort of frequency of a frequency, what Tukey calls quefrency. The quefrency scale is inverted: sounds with a high frequency fundamental have overtones that are far apart on the frequency domain, so the sequence of the overtone peaks has low frequency.

Here’s the plot of the cepstrum for the guitar sample.

electric guitar cepstrum

There’s a big peak at 109 on the quefrency scale. The audio clip was recorded at 48000 samples per second, so the 109 on the quefrency scale corresponds to a frequency of 48000/109 = 440 Hz. The second peak is at quefrency 215, which corresponds to 48000/215 = 223 Hz. The second peak corresponds to the perceived pitch of the note, A3, and the first peak corresponds to its first harmonic, A4. (Remember the quefrency scale is inverted relative to the frequency scale.)

I cheated a little bit in the plot above. The very highest peaks are at 0. They are so large that they make it hard to see the peaks we’re most interested in. These low quefrency peaks correspond to very high frequency noise, near the edge of the audible spectrum or beyond.

Click to learn more about consulting help with signal processing