Morse code and the limits of human perception

Musician Adam Neely made a video asking What is the fastest music humanly possible? He emphasizes that he means the fastest music possible to hear, not the fastest to perform.

Screenshot of Bolton paper

The video cites a psychology article [1] from 1894 that found that most people can reliably distinguish an inter-onset interval (time between notes) of 100 ms [2]. It also gives examples of music faster than this, such as a performance of Vivaldi with an inter-onset interval of 83 ms [3]. The limit seem to be greater than 50 ms because a pulse train with an inter-onset interval of 50 ms starts to sound like a 20 Hz pitch.

People are able to receive Morse code faster than this implies is possible. We will explain how this works, but first we need to convert words per minute to inter-onset interval length.

Morse code timing

Morse code is made of dots and dashes, but it is also made of spaces of three different lengths: the space between the dots and dashes representing a single letter, the space between letters, and the space between words.

According to an International Telecommunication Union standard

  • A dash is equal to three dots.
  • The space between the signals forming the same letter is equal to one dot.
  • The space between two letters is equal to three dots.
  • The space between two words is equal to seven dots.

The same timing is referred to as standard in a US Army manual from 1957,

Notice that all the numbers above are odd. Since a dot or dash is always followed by a space, the duration of a dot or dash and its trailing space is always an even multiple of the duration of a dot.

If we think of a dot as a sixteenth note, Morse code is made of notes that are either sixteenth notes or three sixteenth notes tied together. Rests are equal to one, three, or seven sixteenths, and notes and rests must alternate. All notes start on an eighth note boundary, i.e. either on a down beat or an up beat but not in between.

Words per minute

Morse code speed is measured in words per minute. But what exactly is a “word”? Words have a variable number of letters, and even words with the same number of letters can have very different durations in Morse code.

The most common standard is to use PARIS as the prototypical word. Ten words per minute, for example, means that dots and dashes are coming at you as fast as if someone were sending the word PARIS ten times per minute. Here’s a visualization of the code for PARIS with each square representing the duration of a dot.

■□■■■□■■■□■□□□■□■■■□□□■□■■■□■□□□■□■□□□■□■□■□□□□□□□□

This has the duration of 50 dots.

How does this relate to inter-onset interval? If each duration of a dot is an interval, then n words per minute corresponds to 50n intervals per minute, or 60/50n = 1.2/n seconds per interval.

This would imply that 12 wpm would correspond to an inter-onset interval of around 100 ms, pushing the limit of perception. But 12 wpm is a beginner speed. It’s not uncommon for people to receive Morse code at 30 wpm. The world record, set by Theodore Roosevelt McElroy in 1939, is 75.2 wpm.

What’s going on?

In the preceding section I assumed a dot is an interval when calculating inter-onset interval length. In musical terms, this is saying a sixteenth note is an interval. But maybe we should count eighth notes as intervals. As noted before, every “note” (dot or dash) starts on a down beat or up beat. Still, that would say 20 wpm is pushing the limit of perception, and we know people can listen faster than that.

You don’t have to hear with the precision of the diagram above in order to recognize letters. And you have context clues. Maybe you can’t hear the difference between “E E E” and “O”, but in ordinary prose the latter is far more likely.

E E E vs O

At some skill level people quit hearing individual letters and start hearing words, much like experienced readers see words rather than letters. I’ve heard that this transition happens somewhere between 20 wpm and 30 wpm. That would be consistent with the estimate above that 20 wpm is pushing the limit of perception letter by letter. But 30 words per minute is doable. It’s impressive, but not unheard of.

What I find hard to believe is that there were intelligence officers, such as Terry Jackson, who could copy encrypted text at 50 wpm. Here a “word” is a five-letter code group. There are millions of possible code groups [4], all equally likely, and so it would be impossible to become familiar with particular code groups the way one can become familiar with common words. Maybe they learned to hear pairs or triples of letters.

Related posts

[1] Thaddeus L. Bolton. Rhythm. The American Journal of Psychology. Vol. VI. No. 2. January, 1894. Available here.

[2] Interonset is not commonly hyphenated, but I’ve hyphenated it here for clarity.

[3] The movement Summer from Vivaldi’s The Four Seasons performed at 180 bpm which corresponds to 720 sixteenth notes per minute, each 83 ms long.

[4] If a code group consisted entirely of English letters, there are 265 = 11,881,376 possible groups. If a code group can contain digits as well, there would be 365 = 60,466,176 possible groups.

Russian Morse Code

I read something once about an American telegraph operator who had to switch over to using Russian Morse code during WWII. I wondered how hard that would be, but let it go. The idea came back to me and I decided to settle it.

It would be hard to switch from being able to recognize English words to being able to recognize Russian words, but that’s not the the telegrapher had to do. He had to switch from receiving code groups made of English letters to ones made of Russian letters.

Switching from receiving encrypted English to encrypted Russian is an easier task than switching from English plaintext to Russian plaintext. Code groups were transmitted at a slower speed than words because you can never learn to recognize entire code groups. Also, every letter of a code group is important; you cannot fill in anything from context.

Russian Morse code consists largely of the same sequences of dots and dashes as English Morse code, with some additions. For example, the Russian letter Д is transmitted in Morse code as -.. just like the English letter D. So our telegraph operator could hear -.. and transcribe it as D, then later change D to Д.

The Russian alphabet has 33 letters so it needs Morse codes for 7 more letters than English. Actually, it uses 6 more symbols, transmitting Е and Ё with the same code. Some of the additional codes might have been familiar to our telegrapher. For example Я is transmitted as .-.- which is the same code an American telegrapher would use for ä (if he bothered to distinguish ä from a).

All the additional codes used in Russian correspond to uncommon Latin symbols (ö, ch, ñ, é, ü, and ä) and so our telegrapher could transcribe Russian Morse code without using any Latin letters.

The next question is how the Russian Morse code symbols correspond to the English. Sometimes the correspondence is natural. For example, Д is the same consonant sound as D. But the correspondence between Я and ä is arbitrary.

I wrote about entering Russian letters in Vim a few weeks ago, and I wondered how the mapping of Russian letters to English letters implicit in Morse code corresponds to the mapping used in Vim.

Most Russian letters can be entered in Vim by typing Ctrl-k followed by the corresponding English letter and an equal sign. The question is whether Morse code and Vim have the same idea of what corresponds to what. Many are the same. For example, both agree that Д corresponds to D. But there are some exceptions.

Here’s the complete comparison.

     Vim   Morse
А    A=    A
Б    B=    B
В    V=    W
Г    G=    G
Д    D=    D
Е    E=    E
Ё    IO    E
Ж    Z%    V
З    Z=    Z
И    I=    I
Й    J=    J
К    K=    K
Л    L=    L
М    M=    M
Н    N=    N
О    O=    O
П    P=    P
Р    R=    R
С    S=    S
Т    T=    T
У    U=    U
Ф    F=    F
Х    H=    H
Ц    C=    C
Ч    C%    ö
Ш    S%    ch
Щ    Sc    Q
Ъ    ="    ñ
Ы    Y=    Y
Ь    %"    X
Э    JE    é
Ю    JU    ü
Я    JA    ä

Related posts

Partitioning dots and dashes

Given a set of dots and dashes, how many ways can they be partitioned into a set of Morse code letters?

There is at least one way, since you could take each dot to be an E and each dash to be a T.

If you have a sequence of n dots and dashes, there no more than 2n−1 ways to partition the symbols: at each of the n − 1 spaces between symbols, you either start a new partition or you don’t. This is an over-estimate for large n since a Morse code letter has at most 4 dots or dashes, and not all combinations of four dots and dashes corresponds to a letter.

Last year I wrote about the song YYZ and how it was inspired by the sound of “YYZ” in Morse code, YYZ being the designation of the Toronto airport. Here’s the song’s opening theme:

The C code given here enumerates partitions of dots and dashes, and it shows that there are 1324 ways to partition -.---.----.. into the Morse code for letters [1]. This number 1324 is closer to our upper estimate of 211 = 2048 than our lower estimate of 1.

Define a function M(n) as follows. Express n in binary, convert the 0s to dots and the 1s to dashes, and let M(n) be the number of ways this sequence of dots and dashes can be partitioned into letters. The n corresponding to the Morse code for YYZ is 101110111100two = 3004ten, so M(3004) = 1324.

I looked to see whether M(n) were in OEIS and it doesn’t appear to be, though there are several sequences in OEIS that include Morse code in their definition.

It’s easy to see that 1 ≤ M(n) ≤ n. Exercise for the reader: find sharper upper and lower bounds for M(n). For example, show that every group of three bits can be partitioned four ways, and so M(n) has a lower bound something like n2/3.

Related posts

[1] The code returns 1490 possibilities, but some of these contain one or more asterisks indicating combinations of dots and dashes that do not correspond to letters.

YYZ and Morse code

The song YYZ by Rush opens with a theme based on the rhythm of “YYZ” in Morse code:

    -.--  -.--  --..

YYZ is the designation for the Toronto Pearson International Airport, the main airport serving Toronto. The idea for the song came from hearing the airport identifier in Morse code.

However, the song puts no spaces between rhythm corresponding to each letter. Here’s what the opening riff would look like in sheet music:

Each dash is a middle C and each dot is an F# a tritone below middle C.

When I listen to the song, I don’t hear YYZ. My mind splits up the rhythm with each sequence of long notes starting a group:

    -.  ---.  ----..

So I hear the 20/8 time signature as (3 + 7 + 10)/8.

In terms of Morse code, -. is N. Interpreting the other groupings depends on what you mean by Morse code. The American amateur radio community defines Morse code as 40 characters: the 26 letters of the Latin alphabet, 10 digits, and 4 more symbols: / = , . Using that definition of Morse code, there are no symbols corresponding to ---. or ----... There is no symbol corresponding to ---- either. More on unused sequences here.

However, sometimes ---. is used for Ö and ---- for Š. So the way I hear “YYX” would be more like “NÖŠI”.

There are many other ways to parse -.---.----.. into Morse code symbols. For example, NO1I

    -.  ---  .----  ..

Enumeration

How many ways could you split -.---.----.. into valid Morse code?

Here’s an outline of a recursive algorithm to enumerate the possibilities.

Start at the beginning and list the possible symbols formed by consecutive dots and dashes. In our case the possible symbols are T, N, K, and Y. So the possibilities are

  • T (-) added to the front of all sequences that start with .---.----..
  • N (-.) added to the front of all sequences that start with ---.----..
  • K (-.-) added to the front of all sequences that start with --.----..
  • Y (-.--) added to the front of all sequences that start with -.----..

So for the first bullet point, for example, how would we find all sequences that start with .---.----..? Use the same idea.

  • E (.) added to the front of all sequences that start with ---.----..
  • A (.-) added to the front of all sequences that start with --.----..
  • W (.--) added to the front of all sequences that start with -.----..
  • J (.---) added to the front of all sequences that start with .----..

So pull off all the symbols you can from the beginning of the list of dots and dashes and in each case recurse on the rest of the list.

Related posts

French palindromes and Morse code

I got an email from a student in France who asked about a French counterpart to my post on Morse code palindromes, and this post is a response to that email.

Palindromes

A palindrome is a word that remains the same when the letters are reversed, like kayak. A Morse code palindrome is a word that remains the same when its Morse code representation is reversed.

The word kayak is not a Morse code palindrome because its Morse code representation

    -.- .- -.-- .- -.-

when reversed becomes

    -.- -. --.- -. -.-

which is the Morse code for knqnk.

The word wig is a palindrome in Morse code because

    .-- .. --.

reads the same in reverse.

French distinctives

Now what about French? I saved the script I wrote to find Morse palindromes in English, and I ran it on the French dictionary located at

    /usr/share/dict

on my Linux box.

I thought I’d have to modify the script because French uses characters in addition to the 26 letters of the Roman alphabet, such as ç, a ‘c’ with a cedilla. There is a Morse code for ç

    -.-..

but its reverse is not a letter.

It’s not clear exactly what is “French Morse code” because there are a number of code values that could be used in French (or English) to represent letters with diacritical marks.

The code for é is itself a palindrome, so I didn’t need to modify my script for it. As far as I know, there are no codes for accented letters which are valid letters when reversed, except for ü whose code is the opposite of z. But there are no Morse palindromes in French if you add ü.

Results

See this file for complete results. Some of these words remain the same when translated to Morse, reversed, and translated back, such as sans. Others, are pairs that of valid words but not the same word, such as ail and fin.

Related posts

Partitions of unity, smooth ramps, and CW clicks

Partitions of unity are a handy technical device. They’re seldom the focus of attention but rather are buried in the middle of proofs.

The name sounds odd, but it’s descriptive. A partition of unity is a set of smooth functions into the interval [0, 1] that add up to 1 at every point. The functions split up (partition) the number 1 (unity) at each point. The functions are chosen to have properties that let you glue together local results to create a global result.

Smooth ramp functions

Proving the existence of partitions of unity with the desired properties isn’t trivial. One of the steps along the way is to prove that you can create functions than ramp up smoothly between constant values. You want to show there are functions f that equal 0 on one side of a closed interval [a, b] and equal 1 on the other side. That is, you can choose f such that f(x) = 0 for xa and f(x) = 1 for xb. You can also require f to be monotone increasing over the interval [a, b].

It may seem obvious that smooth ramp functions exist, but they do not exist if you require your functions to have a power series at every point. Ramp functions can be infinitely differentiable, but they cannot be analytic.

Smooth ramp functions are used everywhere, but they’re complicated to write down explicitly.

CW clicks

Morse code is sent over a radio using CW, continuous wave. The name is historical, contrasting with an early method known as damped wave.

To send a dot or a dash, you send a short or a long pulse of a fixed pitch. If you abruptly turn this tone on and off you’ll create noisy side effects called clicks. As I wrote about in this post, an abrupt change in frequency creates broad spectrum side effects, but smoothing the transition greatly reduces the bandwidth.

The recommended rise and fall time for a CW pulse is between 2 and 4 milliseconds. So if a dot is transmitted as a 50 ms pulse, your equipment might shape the pulse to be a 42 ms pulse at full amplitude with 4 ms transitions on each side where the amplitude smoothly rises and falls. That is, you multiply your pulse by a couple of smooth ramp functions as described above.

Here’s a plot for a pulse of a 800 Hz tone.

This minor modification of pulses makes no audible difference to the desired signal but greatly reduces unwanted effects.

Related posts

Morse code numbers and abbreviations

Numbers in Morse code seem a little strange. Here they are:

    |-------+-------|
    | Digit | Code  |
    |-------+-------|
    |     1 | .---- |
    |     2 | ..--- |
    |     3 | ...-- |
    |     4 | ....- |
    |     5 | ..... |
    |     6 | -.... |
    |     7 | --... |
    |     8 | ---.. |
    |     9 | ----. |
    |     0 | ----- |
    |-------+-------|

They’re fairly regular, but not quite. That’s why a couple years ago I thought it would be an interesting exercise to write terse code to encode and decode digits in Morse code. There’s exploitable regularity, but it’s irregular enough to make the exercise challenging.

Design

As with so many things, this scheme makes more sense than it seems at first. When you ask “Why didn’t they just …” there’s often a non-obvious answer.

The letters largely exhausted the possibilities of up to 4 dots and dashes. Some digits would have to take five symbols, and it makes sense that they would all take 5 symbols. But why the ones above? This scheme uses a lot of dashes, and dashes take three times longer to transmit than dots.

A more efficient scheme would be to use binary notation, with dot for 0’s and dash for 1’s. That way the leading symbol would always be a dot and usually the second would be a dot. That’s when encoding digits 0 through 9. As a bonus you could use the same scheme to encode larger numbers in a single Morse code entity.

The problem with this scheme is that Morse code is intended for humans to decode by ear. A binary scheme would be hard to hear. The scheme actually used is easy to hear because you only change from dot to dash at most once. As Morse code entities get longer, the patterns get simpler. Punctuation marks take six or more dots and dashes, but they have simple patterns that are easy to hear.

Code golf

When I posed my coding exercise as a challenge, the winner was Carlos Luna-Mota with the following Python code.

    S="----.....-----"
    e=lambda x:S[9-x:14-x]
    d=lambda x:9-S.find(x)

Honorable mention goes to Manuel Eberl with the following code. It only does decoding, but is quite clever and short.

    d=lambda c:hash(c+'BvS+')%10

It only works in Python 2 because it depends on the specific hashing algorithm used in earlier versions of Python.

Cut numbers

If you’re mixing letters and digits, digits have to be five symbols long. But if you know that characters have to be digits in some context, this opens up the possibility of shorter encodings.

The most common abbreviations are T for 0 and N for 9. For example, a signal report is always three digits, and someone may send 5NN rather than 599 because in that context it’s clear that the N’s represent 9s.

When T abbreviates 0 it might be a “long dash,” slightly longer than a dash meant to represent a T. This is not strictly according to Hoyle but sometimes done.

There are more abbreviations, so called cut numbers, though these are much less common and therefore less likely to be understood.

    |-------+-------+-----+--------+----|
    | Digit | Code  |  T1 | Abbrev | T2 |
    |-------+-------+-----+--------+----|
    |     1 | .---- |  17 | .-     |  5 |
    |     2 | ..--- |  15 | ..-    |  7 |
    |     3 | ...-- |  13 | ...-   |  9 |
    |     4 | ....- |  11 | ....-  | 11 |
    |     5 | ..... |   9 | .      |  1 |
    |     6 | -.... |  11 | -....  | 11 |
    |     7 | --... |  13 | -...   |  9 |
    |     8 | ---.. |  15 | -..    |  7 |
    |     9 | ----. |  17 | -.     |  5 |
    |     0 | ----- |  19 | -      |  3 |
    |-------+-------+-----+--------+----|
    | Total |       | 140 |        | 68 |
    |-------+-------+-----+--------+----|

The space between dots and dashes is equal to one dot, and the length of a dash is the length of three dots. So the time required to send a sequence of dots and dashes equals

2(# dots) + 4(# dashes) – 1

In the table above, T1 is the time to transmit a digit, in units of dots, without abbreviation, and T2 is the time with abbreviation. Both the maximum time and the average time are cut approximately in half. Of course that’s ideal transmission efficiency, not psychological efficiency. If the abbreviations are not understood on the receiving end and the receiver asks for numbers to be repeated, the shortcut turns into a longcut.

Related posts

Morse code in musical notation

Maybe this has been done before, but I haven’t seen it: Morse code in musical notation.

Here’s the Morse code alphabet, one letter per measure; in practice there would be less space between letters [1]. A dash is supposed to be three times as long as a dot, so a dot is a sixteenth note and a dash is a dotted eighth note.

Morse code is often at a frequency between 600 and 800 Hz. I picked the E above middle C (660 Hz) because it’s in that range.

Rhythm

Officially a dash is three times as long as a dot. But there’s also a space equal to the length of a dot between parts of a letter. So the sheet music above would be more accurate if you imagined all the sixteenth notes are staccato and the dotted eighth notes are really eighth notes followed by a sixteenth rest.

This doesn’t make much difference because individual operators have varying “fists,” styles of sending Morse code, and won’t exactly follow the official length and spacing rules.

You could rewrite the music above as follows, but it’s all an approximation.

Tempo

According to Wikipedia, “the dit length at 20 words per minute is 50 milliseconds.” So if a sixteenth note has a duration of 50 milliseconds, this would mean five quarter notes per second, or 300 beats per minute. But according to this video, the shortest duration people can distinguish is about 50 milliseconds.

That would imply that copying Morse code at 20 wpm is pushing the limits of human hearing. But copying at 20 wpm is common. Some people can copy Morse code at more than 50 words per minute or more, but at that speed they’re not hearing individual dits and dahs. An H, for example, four dits in a row, sounds like a single rough sound. In fact, they’re not really hearing letters at all but recognizing the shape of words.

How the image was made

I made the image above with LaTeX and Lilypond.

Adding the letters above each measure was kind of a hack. I used rehearsal markings to label the measures, but there was one problem: the software skips from letter H to letter J. That meant that the labels I and all subsequent letters were one ahead of what they should be, and the final letter Z was labeled AA. I tried several tricks, and Lilypond steadfastly refused to label a measure with ‘I’ even though I’ve seen such a label in the documentation.

My way around this was to make it label two consecutive measures with H, then in image editing software I turned the second H into an I. No doubt there’s a better way, but this worked.

I may play around with this and try to improve it a bit. If you have any suggestions, particularly related to Lilypond, please let me know.

Related posts

[1] You could think of the musical score above as a sort of transcription of the Farnsworth method of teaching Morse code. Students learn the letters at full speed, but with extra space between the letters at first. The faster speed discourages consciously counting the dits and dahs, forcing the student to listen to the overall rhythm of the letters.

Q codes in Seveneves

The first time I heard of Q codes was when reading the novel Seveneves by Neal Stephenson. These are three-letter abbreviations using in Morse code that all begin with Q.

Since Q is always followed by U in native English words, Q can be used to begin a sort of escape sequence [1].

There are dozens of Q codes used in amateur radio [2], and more used in other contexts, but there are only 10 Q codes used in Seveneves [3]. All begin with Q, followed by R, S, or T.

Tree[Q, {Tree[R, {A, K, N, S, T}], Tree[S, {B, L, O}], Tree[T, {H, X}]}]

Each Q code can be used both as a question and as an answer or statement. For example, QRS can mean “Would you like me to slow down” or “Please slow down.” I’ll just give the interrogative forms below.

Here are the 10 codes that appear in Stephenson’s novel.

QRA
What is your call sign?
QRK
Is my signal intelligible?
QRN
Is static a problem?
QRS
Should I slow down?
QRT
Should I stop sending?
QSB
Is my signal fading?
QSL
Are you still there?
QSO
Could you communicate with …?
QTH
Where are you?
QTX
Will you keep your station open for talking with me?

Related posts

[1] Some Q codes have a U as the second letter. I don’t know why—there are plenty of unused TLAs that begin with Q—but it is what it is.

[2] You can find a list here.

[3] There is one non-standard code in the novel: QET for “not on planet Earth.”

Missing Morse codes

Morse codes for Latin letters are sequences of between one and four symbols, where each symbol is a dot or a dash. There are 2 possible sequences with one symbol, 4 with two symbols, 8 with three symbols, and 16 with four symbols. This makes a total of 30 sequences with up to four symbols. There are 26 letters, so what are the four missing codes?

Here they are:

    .-.- 
    ..-- 
    ---. 
    ---- 

There are various uses for these codes, such as variants of Latin letters.

The first sequence on the list, .-.- is similar to two A’s .- .- and is used for variations on A, such as ä or æ.

The sequence ..-- is like a U (..-) with an extra dash on the end, and is used for variations on U, like ü.

The sequence ---. is like O (---) with an extra dot on the end, and is used for variations on O, like ö.

The last sequence ---- is used for letters like Ch or Š. Go figure.

Sequences of length 5

Sequences of five or six symbols are used for numbers, punctuation, and a few miscellaneous tasks, but there are a few unused combinations. (“Unused” is fuzzy here. Maybe some people do or did use these sequences.)

Here are the five-symbol sequences that do not appear in the Wikipedia article on Morse code:

    ..-.-
    .-.--
    -..--
    -.-.-
    -.---
    ---.-

So our of 32 possibilities, people have found uses for 26 of them.

Sequences of length 6

Out of 64 possible sequences of six symbols, 13 have found a use.

It’s harder to distinguish longer sequences by ear, and so it’s not surprising that most sequences of six symbols are unused; the ones that are used have special patterns that are easier to hear. Here are the ones that are used.

    ..--..
    ..--.-
    .-..-.
    .-.-.-
    .--.-.
    .----.
    -....-
    -.-.-.
    -.-.--
    -.--.-
    --..-.
    --..--
    ---...

Related posts