Typesetting sheet music with AI

Posted on 13 March 2026 by John

Lilypond is a TeX-like typesetting language for sheet music. I’ve had good results asking AI to generate Lilypond code, which is surprising given the obscurity of the language. There can’t be that much publicly available Lilypond code to train on.

I’ve mostly generated Lilypond code for posts related to music theory, such as the post on the James Bond chord. I was curious how well AI would work if I uploaded an image of sheet music and asked it to produce corresponding Lilypond code.

In a nutshell, the sheet music produced was hilariously bad. But Grok did a good job of recognizing the source of the clips.

Test images

Here are the two images I used, one of classical music

and one of jazz.

I used the same prompt for both images with Grok and ChatGPT: Write Lilypond code corresponding to the attached sheet music image.

Classical results

Grok

Here’s what I got when I compiled the code Grok generated for the first image.

This bears no resemblance to the original, turning one measure into eight. However, Grok correctly inferred that the excerpt was by Bach, and the music it composed (!) is in the style of Bach, though it is not at all what I asked for.

ChatGPT

Here’s the corresponding output from ChatGPT.

Not only did ChatGPT hallucinate, it hallucinated in two-part harmony!

Jazz results

One reason I wanted to try a jazz example was to see what would happen with the chord symbols.

Grok

Here’s what Grok did with the second sheet music image.

The notes are almost unrelated to the original, though the chords are correct. The only difference is that Grok uses the notation Δ for a major 7th chord; both notations are common. And Grok correctly inferred the title of the song.

I edited the image above. I didn’t change any notes, but I moved the title to center it over the music. I also cut out the music and lyrics credits to make the image fit on the page more easily. Grok correctly credited Johnny Burke and Jimmy Van Heusen for the lyrics and music.

ChatGPT

Here’s what I got when I compiled the Lilypond code that ChatGPT produced. The chords are correct, as above. The notes bear some similarity to the original, though ChatGPT took the liberty of changing the key and the time signature, and the last measure has seven and a half beats.

ChatGPT did not speculate on the origin of the clip, but when I asked “What song is this music from?” it responded with “The fragment appears to be from the jazz standard ‘Misty.’”

Giant Steps

Posted on 23 February 2026 by John

John Coltrane’s song Giant Steps is known for its unusual and difficult chord changes. Although the chord progressions are complicated, there aren’t that many unique chords, only nine. And there is a simple pattern to the chords; the difficulty comes from the giant steps between the chords.

Giant Steps chords

If you wrap the chromatic scale around a circle like a clock, there is a three-fold symmetry. There is only one type of chord for each root, and the three notes not represented are evenly spaced. And the pattern of the chord types going around the circle is

minor 7th, dominant 7th, major 7th, skip
minor 7th, dominant 7th, major 7th, skip
minor 7th, dominant 7th, major 7th, skip

To be clear, this is not the order of the chords in Giant Steps. It’s the order of the sorted list of chords.
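The clock pattern can be checked with a short Python sketch. The post does not list the nine chords, so the root-to-chord table below is an assumption taken from the standard lead sheet, with roots numbered as hours on the clock (C = 0, C♯ = 1, and so on). The names `CHORDS` and `clock_pattern` are made up for illustration.

```python
# Assumed chord table (not given in the post): root pitch class -> chord type.
# Pitch classes are hours on the clock, C = 0, C# = 1, ..., B = 11.
CHORDS = {
    1: "m7",    # C# minor 7th
    2: "7",     # D dominant 7th
    3: "maj7",  # Eb major 7th
    5: "m7",    # F minor 7th
    6: "7",     # F# dominant 7th
    7: "maj7",  # G major 7th
    9: "m7",    # A minor 7th
    10: "7",    # Bb dominant 7th
    11: "maj7", # B major 7th
}


def clock_pattern(chords, start=1):
    """Walk once around the 12-hour clock, reporting the chord type at each root."""
    return [chords.get((start + i) % 12, "skip") for i in range(12)]


pattern = clock_pattern(CHORDS)

# Roots with no chord at all; three-fold symmetry means they are 4 hours apart.
missing_roots = sorted(set(range(12)) - set(CHORDS))

print(pattern)
print(missing_roots)
```

Starting the walk at C♯ lines the output up with the three rows above; starting anywhere else gives a rotation of the same pattern.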

For more details see the video The simplest song that nobody can play.

Tritone substitution

Posted on 23 February 2026 by John

Big moves in roots can correspond to small moves in chords.

Imagine the 12 notes of a chromatic scale arranged around the hours of a clock: C at 12:00, C♯ at 1:00, D at 2:00, etc. The furthest apart two notes can be is 6 half steps, just as the furthest apart two times can be is 6 hours.

Musical clock

An interval of 6 half steps is called a tritone. That’s a common term in jazz. In classical music you’d likely say augmented fourth or diminished fifth. Same thing.

The largest possible movement in roots corresponds to almost the smallest possible movement between chords. Specifically, going from a dominant seventh chord to the dominant seventh chord whose root is a tritone away only requires moving two notes of the chord a half step each.

For example, C and F♯ are a tritone apart, but a C⁷ chord and an F♯⁷ chord are very close together. To move from the former to the latter you only need to move two notes a half step.

Musical clock

Replacing a dominant seventh chord with one a tritone away is called a tritone substitution, or just tritone sub. It’s called this for two reasons. The root moves a tritone, but also the tritone inside the chord does not move. In the example above, the third and the seventh of the C⁷ chord become the seventh and third of the F♯⁷ chord. On the diagram, the dots at 4:00 and 10:00 don’t move.
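Here is a minimal Python sketch of the C⁷ to F♯⁷ example, treating notes as hours on the clock (C = 0). The function name `dominant_seventh` and the variable names are hypothetical, chosen for this illustration only.

```python
# Intervals of a dominant seventh chord in half steps above the root:
# root, major third, perfect fifth, minor seventh.
DOMINANT_SEVENTH = (0, 4, 7, 10)


def dominant_seventh(root):
    """Return the pitch classes (hours on the clock) of the dominant 7th on root."""
    return {(root + interval) % 12 for interval in DOMINANT_SEVENTH}


c7 = dominant_seventh(0)        # C, E, G, Bb
f_sharp7 = dominant_seventh(6)  # F#, A#, C#, E

shared = c7 & f_sharp7       # notes that stay put: the tritone
moved_from = c7 - f_sharp7   # notes of C7 that have to move
moved_to = f_sharp7 - c7     # where they land in F#7

print(sorted(shared), sorted(moved_from), sorted(moved_to))
```

The shared notes are at 4:00 and 10:00, and the other two notes each move by one hour: 0 to 1 and 7 to 6.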

Tritone substitutions are a common technique for making basic chord progressions more sophisticated. A common tritone sub is to replace the V of a ii-V-I chord progression, giving a nice chromatic progression in the bass line. For example, in the key of C, a D min – G⁷ – C progression becomes D min – D♭⁷ – C.

Just change the key

Posted on 11 December 2025 by John

When I was a kid, I suppose sometime in my early teens, I was interested in music theory, but I couldn’t play piano. One time I asked a lady who played piano at our church to play a piece of sheet music for me so I could hear how it sounded. The music was in the key of A, but she played it in A♭. She didn’t say she was going to change the key, but I could tell from looking at her hands that she had.

Key signatures of A and A flat

I was shocked by the audacity of changing the music to be what you wanted it to be rather than playing what was on the page. I was in band, and there you certainly don’t decide unilaterally that you’re going to play in a different key!

In retrospect what the pianist was doing makes sense. Hymns are very often in the key of A♭. One reason is it’s a comfortable key for SATB singing. Another is that if many hymns are in the same key, that makes it easy to go from one directly into another. If a traditional hymn is not in A♭, it’s probably in a key with flats, like B♭ or D♭. (Contemporary church music is often in keys with sharps because guitarists like open strings, which leads to keys like A or E.)

The pianist wasn’t a great musician, but she was good enough. Picking her key was a coping mechanism that worked well. Unless someone in the congregation has perfect pitch, you can change a song from the key of D to the key of D♭ and nobody will know.

There’s something to be said for clever coping mechanisms, especially if they’re declared: “You asked for A. Is it OK if I give you B?” It’s better than saying “Sorry, I can’t help you.”

Overtones and Barbershop Quartets

Posted on 24 April 2025 by John

I’ve heard that barbershop quartets often sing the 7th in a dominant 7th a little flat in order to bring the note closer in tune with the overtone series. This post will quantify that assertion.

The overtones of a frequency f are 2f, 3f, 4f, 5f, etc. The overtone series is a Fourier series.

Here’s a rendering of the C below middle C and its overtones.

\score { \new Staff { \clef treble <c c' g' c'' e'' g'' bes''>1 } \layout {} \midi {} }

We perceive sound on a logarithmic scale. So although the overtone frequencies are evenly spaced, they sound like they’re getting closer together.

Overtones and equal temperament

Let’s look at the notes in the chord above and compare the frequencies between the overtone series and equal temperament tuning.

Let f be the frequency of the lowest note. The top four notes in this overtone series have frequencies 4f, 5f, 6f, and 7f. They form a C⁷ chord [1].

In equal temperament, these four notes have frequencies 2^(24/12) f, 2^(28/12) f, 2^(31/12) f, and 2^(34/12) f. This works out to 4, 5.0397, 5.9932, and 7.1272 times the fundamental frequency f.

The highest note, the B♭, is the furthest from its overtone counterpart. The frequency is higher than that of the corresponding overtone, so you’d need to perform it a little flatter to bring it in line with the overtone series. This is consistent with the claim at the top of the post.

Differences in cents

How far apart are 7f and 7.1272f in terms of cents, 100ths of a semitone?

The difference between two frequencies, f₁ and f₂, measured in cents is

1200 log₂(f₁ / f₂).

To verify this, note that this says an octave equals 1200 cents, because log₂ 2 = 1.

So the difference between the B♭ in equal temperament and in the 7th note of the overtone series is 31 cents.

The difference between the E and the 5th overtone is 14 cents, and the difference between the G and the 6th overtone is only 2 cents.
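The numbers above can be reproduced with a few lines of Python. This is only a sketch; the helper names `cents` and `equal_tempered_ratio` are invented here, and the dictionaries simply restate the exponents and multiples given in the text.

```python
import math


def cents(f1, f2):
    """Difference between two frequencies in cents: 1200 log2(f1 / f2)."""
    return 1200 * math.log2(f1 / f2)


def equal_tempered_ratio(semitones):
    """Frequency multiple for a note this many half steps above the fundamental."""
    return 2 ** (semitones / 12)


# Half steps above the lowest C for the top four notes of the chord.
semitones = {"C": 24, "E": 28, "G": 31, "B♭": 34}

# The corresponding multiples of f in the overtone series.
overtone = {"C": 4, "E": 5, "G": 6, "B♭": 7}

# Positive means the equal tempered note is sharp relative to the overtone.
deviation = {
    name: cents(equal_tempered_ratio(semitones[name]), overtone[name])
    for name in overtone
}

for name, value in deviation.items():
    print(f"{name}: {value:+.1f} cents")
```

The sign matters: the B♭ and the E come out positive, meaning equal temperament is sharp of the overtone, while the G comes out slightly negative.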

Dogecoin anthem

Posted on 29 November 2024 by John

Rocket with Dogecoin mascot

Someone sent me an AI-generated Dogecoin anthem: To Da Moon.

Here’s the audio.

And here are the lyrics:

Yo, it started as a joke, now we in the game,
Dogecoin rocket, yeah, remember the name.
Crypto vibes, makin’ history soon,
Strapped to the rocket, we’re goin’ to the moon.

Elon on the tweets, got the memes in check,
Shiba Inu power, cashin’ every check.
From the hodlers to the dreamers, we makin’ it right,
Dogecoin anthem, light up the night.

(Chorus)
Doge to da moon, ya, Doge to da moon
To da moooooooooon
Diamond hands brotha
All aboard the rocket, Doge to da moon!

(Verse 2)
Started at a penny, now it’s hittin’ big heights,
Laughin’ to the bank while we flexin’ the lights.
They said it was a phase, but we changin’ the game,
Shoutout to the legends who believed in the name.

Blockchain movin’, decentralize the power,
Crypto revolution, it’s our finest hour.
From the traders to the hodlers, shout it real loud,
Doge Army strong, and we bringin’ the crowd.

(Chorus)
Doge to da moon, ya, Doge to da moon
To da moooooooooon
Diamond hands brotha
All aboard the rocket, Doge to da moon!
All aboard the rocket, Doge to the moon.

[instrumental]

[vocoder]

The Department of Government Efficiency
is coming for your bureaucracy,
No more waste don’t ya see?

(Bridge)
It’s not just a coin, it’s a vibe, it’s a culture,
Crypto rebel spirit, yeah, we ridin’ like vultures.
From the charts to the memes, we hittin’ the tune,
Dogecoin anthem, it’s our lunar commune.

(Chorus)
Doge to da moon, ya, Doge to da moon
To da moooooooooon
Diamond hands brotha
All aboard the rocket, Doge to da moon!

(Outro)
Moonshot dreams, yeah, the future is bright,
Crypto revolution, takin’ flight tonight.
Dogecoin forever, yeah, we’ll never be through,
42 42 42 42 42 42 forty-twooooooo

Music of the spheres

Posted on 28 February 2024 by John

The idea of “music of the spheres” dates back to the Pythagoreans. They saw an analogy between orbital frequency ratios and musical frequency ratios.

HD 110067 is a star 105 light years away that has six known planets in orbital resonance. The orbital frequencies of the planets are related to each other by small integer ratios.

The planets, starting from the star, are labeled b, c, d, e, f, and g. In 9 “years”, from the perspective of g, the planets complete 54, 36, 24, 16, 12, and 9 orbits respectively. So the ratios of orbital frequencies between consecutive planets are all either 3:2 or 4:3. In musical terms, these ratios are fifths and fourths.
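The ratios are easy to check with exact fractions. The following Python sketch uses the orbit counts from the paragraph above; the variable names are illustrative.

```python
from fractions import Fraction

# Orbits completed by each planet during 9 orbits of planet g.
orbits = {"b": 54, "c": 36, "d": 24, "e": 16, "f": 12, "g": 9}

labels = list(orbits)

# Frequency ratio of each planet to the next one out from the star.
ratios = {
    f"{inner}:{outer}": Fraction(orbits[inner], orbits[outer])
    for inner, outer in zip(labels, labels[1:])
}

for pair, ratio in ratios.items():
    print(pair, ratio)
```

The three inner pairs come out to 3/2 (fifths) and the two outer pairs to 4/3 (fourths).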

In the chord below, the musical frequency ratios are the same as the orbital frequency ratios in the HD 110067 system.

Here’s what the chord sounds like on a piano:

hd11067.wav

The Real Book

Posted on 28 February 2024 by John

I listened to the 99% Invisible podcast about The Real Book this morning and thought back to my first copy.

My first year in college I had a jazz class, and I needed to get a copy of The Real Book, a book of sheet music for jazz standards. The book was illegal at the time, but there was no legal alternative, and I had no scruples about copyright back then.

When a legal version came out later I replaced my original book with the one in the photo below.

The New Real Book Legal

The podcast refers to “When Hal Leonard finally published the legal version of the Real Book in 2004 …” but my book says “Copyright 1988 Sher Music Co.” Maybe Hal Leonard published a version in 2004, but there was a version that came out years earlier.

The podcast also says “Hal Leonard actually hired a copyist to mimic the old Real Book’s iconic script and turn it into a digital font.” But my 1988 version looks not unlike the original. Maybe my version used a kind of typesetting common in jazz, but the Hal Leonard version looks even more like the original handwritten sheet music.

Tritone

Posted on 10 October 2023 by John

A few weeks ago I wrote about how the dissonance of a musical interval is related to the complexity of the frequency ratio as a fraction, where complexity is measured by the sum of the numerator and denominator. Consonant intervals have simple frequency ratios and dissonant intervals have complex frequency ratios.

By this measure, the most consonant interval, other than an octave, is a perfect fifth. And the most dissonant interval is a tritone, otherwise known as the diminished fifth or augmented fourth. So in some sense perfect fifths and tritones are opposites, but they are both ways of splitting an octave in half, just on different scales.

Linear scale versus log scale

When we say simple frequency ratios are consonant and complex frequency ratios are dissonant, we are speaking about ratios on a linear scale. But we often think of musical notes on a logarithmic scale. For example, we think of the notes in a chromatic scale as being evenly spaced, and they are evenly spaced, but on a log scale.

If we divide an octave in half on a linear scale, we get a perfect fifth. For example, if we take an A 440 and the A 880 an octave higher, their arithmetic mean, the midpoint on a linear scale, is E 660.

But if we divide an octave in half on a log scale, we get a tritone, three whole steps or six half steps out of 12 half steps in a chromatic scale. The midpoint on a log scale is the geometric mean. The geometric mean of 440 and 880 is 440 √2 = 622, which is D#.

So if we take the midpoint of an octave on a linear scale we get the most consonant interval, a perfect fifth, but if we take the midpoint of an octave on a log scale we get the most dissonant interval, a tritone.
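A quick Python sketch of the two midpoints, using the A 440 and A 880 from the example. The function names are made up for illustration, and the conversion to half steps assumes equal temperament, 12 log₂ of the frequency ratio.

```python
import math


def arithmetic_mean(a, b):
    """Midpoint on a linear scale."""
    return (a + b) / 2


def geometric_mean(a, b):
    """Midpoint on a log scale."""
    return math.sqrt(a * b)


low, high = 440.0, 880.0

linear_mid = arithmetic_mean(low, high)  # the fifth
log_mid = geometric_mean(low, high)      # the tritone

# Size of each interval above A 440, in equal tempered half steps.
fifth_half_steps = 12 * math.log2(linear_mid / low)
tritone_half_steps = 12 * math.log2(log_mid / low)

print(linear_mid, round(log_mid, 2))
print(round(fifth_half_steps, 2), round(tritone_half_steps, 2))
```

The geometric mean lands on 6 half steps, up to floating point error, while the arithmetic mean lands a hair above 7 half steps, because a 3:2 fifth is slightly wider than an equal tempered fifth.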

Tritone substitution

Intervals of a fifth are so consonant that they don’t contribute much to the character of a chord. It is common to leave out the fifth.

Tritones, however, are essential to the sound of a chord. In fact, it is common to replace a chord with a different chord that maintains the same tritone. For example, in the key of C, the G⁷ chord contains B and F, a tritone. The chord C#⁷ contains the same two notes (though the F would be written as E#), and you’ll often see a C#⁷ chord substituted for a G⁷ chord. So a song that had a Dm–G⁷–C progression might be rewritten as Dm–C#⁷–C, creating a downward chromatic motion in the bass line.

This is called a tritone substitution. You could think of the name two ways. In the discussion above we talked about preserving the tritone in a chord. But notice we also changed the root of the chord by a tritone, replacing G with C#. More generally, replacing any chord with a chord whose root is a tritone away is called a tritone substitution or simply tritone sub. For example, a D minor chord does not contain a tritone, but we could still do a tritone sub, replacing Dm with G#m because D and G# are a tritone apart.

Jaccard index and jazz albums

Posted on 26 July 2023 by John

Miles Davis Kind of Blue album cover

Jaccard index is a way of measuring the similarity of sets. The Jaccard index, or Jaccard similarity coefficient, of two sets A and B is the number of elements in their intersection, A ∩ B, divided by the number of elements in their union, A ∪ B.

$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$

Jaccard similarity is a robust way to compare things in machine learning, say in clustering algorithms, less sensitive to outliers than other similarity measures such as cosine similarity.

Miles Davis Albums

Here we’ll illustrate Jaccard similarity by looking at the personnel on albums by Miles Davis. Specifically, which pair of albums had more similar personnel: Kind of Blue and Round About Midnight, or Bitches Brew and In a Silent Way?

There were four musicians who played on both Kind of Blue and Round About Midnight: Miles Davis, Cannonball Adderley, John Coltrane, and Paul Chambers.

There were six musicians who played on both Bitches Brew and In a Silent Way: Miles Davis, Wayne Shorter, Chick Corea, Dave Holland, John McLaughlin, and Joe Zawinul.

The latter pair of albums had more personnel in common, but they also had more personnel in total.

There were 9 musicians who performed on either Kind of Blue or Round About Midnight. Since 4 played on both albums, the Jaccard index comparing the personnel on the two albums is 4/9.

In a Silent Way and especially Bitches Brew used more musicians. A total of 17 musicians performed on one of these albums, including 6 who were on both. So the Jaccard index is 6/17.

Jaccard distance

Jaccard distance is the complement of Jaccard similarity, i.e.

$d_J(A, B) = 1 - J(A, B)$

In our example, the Jaccard distance between Kind of Blue and Round About Midnight is 1 − 4/9 = 0.556. The Jaccard distance between Bitches Brew and In a Silent Way is 1 − 6/17 = 0.647.
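Here is a minimal Python sketch of both quantities. The full personnel lists are not given above, so the two sets below are stand-ins made of integers, chosen only so that they share 4 elements out of 9 in total, like the first pair of albums. The function names are illustrative.

```python
from fractions import Fraction


def jaccard_index(a, b):
    """Size of the intersection divided by size of the union, as an exact fraction."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        raise ValueError("Jaccard index is undefined for two empty sets")
    return Fraction(len(a & b), len(union))


def jaccard_distance(a, b):
    """Complement of the Jaccard index."""
    return 1 - jaccard_index(a, b)


# Placeholder sets (not real personnel): 4 elements in common, 9 in the union.
left = {1, 2, 3, 4, 5, 6}
right = {3, 4, 5, 6, 7, 8, 9}

index = jaccard_index(left, right)
distance = jaccard_distance(left, right)

print(index, distance, round(float(distance), 3))
```

Using `Fraction` keeps the results exact, so 4/9 and 5/9 print as fractions; convert to `float` only for display.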

Jaccard distance really is a distance. It is clearly a symmetric function of its arguments, unlike Kullback-Leibler divergence, which is not.

The difficulty in establishing that Jaccard distance is a distance function, i.e. a metric, is the triangle inequality. The triangle inequality does hold, though this is not simple to prove.
