French palindromes and Morse code

I got an email from a student in France who asked about a French counterpart to my post on Morse code palindromes, and this post is a response to that email.

Palindromes

A palindrome is a word that remains the same when the letters are reversed, like kayak. A Morse code palindrome is a word that remains the same when its Morse code representation is reversed.

The word kayak is not a Morse code palindrome because its Morse code representation

    -.- .- -.-- .- -.-

when reversed becomes

    -.- -. --.- -. -.-

which is the Morse code for knqnk.

The word wig is a palindrome in Morse code because

    .-- .. --.

reads the same in reverse.
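The script used for this post isn’t shown, but the check itself is short. Here’s a minimal Python sketch, assuming only the 26 unaccented letters (accented letters such as é would need extra dictionary entries). It writes a word in Morse code, reverses the entire dot-and-dash string while keeping the letter boundaries, and decodes the result. The names MORSE, reverse_morse, and is_morse_palindrome are illustrative.

```python
# International Morse code for the 26 Latin letters.
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}
DECODE = {code: letter for letter, code in MORSE.items()}


def to_morse(word):
    """Morse code for a word, with a space between letters."""
    return " ".join(MORSE[c] for c in word.upper())


def reverse_morse(word):
    """Reverse the word's whole dot-dash string (the spaces keep the letter
    boundaries) and decode it. Return None if a reversed piece is not A-Z."""
    pieces = to_morse(word)[::-1].split(" ")
    if not all(p in DECODE for p in pieces):
        return None
    return "".join(DECODE[p] for p in pieces)


def is_morse_palindrome(word):
    return reverse_morse(word) == word.upper()


print(to_morse("kayak"), "->", reverse_morse("kayak"))  # -.- .- -.-- .- -.- -> KNQNK
print(is_morse_palindrome("wig"))                       # True
```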

French distinctives

Now what about French? I saved the script I wrote to find Morse palindromes in English, and I ran it on the French dictionary located at

    /usr/share/dict

on my Linux box.

I thought I’d have to modify the script because French uses characters in addition to the 26 letters of the Roman alphabet, such as ç, a ‘c’ with a cedilla. There is a Morse code for ç

    -.-..

but its reverse is not a letter.

It’s not clear exactly what “French Morse code” is, because there are a number of code values that could be used in French (or English) to represent letters with diacritical marks.

The code for é (..-..) is itself a palindrome, so I didn’t need to modify my script for it. As far as I know, there are no codes for accented letters that are valid letters when reversed, except for ü, whose code (..--) is the reverse of the code for z (--..). But adding ü doesn’t produce any new Morse palindromes in French.

Results

See this file for complete results. Some of these words remain the same when translated to Morse, reversed, and translated back, such as sans. Others are pairs of valid words that are not the same word, such as ail and fin.

Partitions of unity, smooth ramps, and CW clicks

Partitions of unity are a handy technical device. They’re seldom the focus of attention but rather are buried in the middle of proofs.

The name sounds odd, but it’s descriptive. A partition of unity is a set of smooth functions into the interval [0, 1] that add up to 1 at every point. The functions split up (partition) the number 1 (unity) at each point. The functions are chosen to have properties that let you glue together local results to create a global result.

Smooth ramp functions

Proving the existence of partitions of unity with the desired properties isn’t trivial. One of the steps along the way is to prove that you can create functions that ramp up smoothly between constant values. You want to show there are functions f that equal 0 on one side of a closed interval [a, b] and equal 1 on the other side. That is, you can choose f such that f(x) = 0 for x ≤ a and f(x) = 1 for x ≥ b. You can also require f to be monotone increasing over the interval [a, b].

It may seem obvious that smooth ramp functions exist, but they do not exist if you require your functions to have a power series at every point. Ramp functions can be infinitely differentiable, but they cannot be analytic.

Smooth ramp functions are used everywhere, but they’re complicated to write down explicitly.
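To make this concrete, here is one standard textbook construction (not the only one), sketched in Python. It builds the ramp from g(t) = exp(−1/t) for t > 0 and g(t) = 0 otherwise, a function that is infinitely differentiable everywhere but not analytic at 0. It also rescales [a, b] to [0, 1] so the exponentials don’t underflow on short intervals. The names g and smooth_ramp are illustrative.

```python
import math


def g(t):
    """exp(-1/t) for t > 0 and 0 otherwise: infinitely differentiable
    everywhere, but not analytic at t = 0."""
    return math.exp(-1.0 / t) if t > 0 else 0.0


def smooth_ramp(x, a, b):
    """Infinitely differentiable ramp: 0 for x <= a, 1 for x >= b,
    and increasing on [a, b]. Requires a < b."""
    if not a < b:
        raise ValueError("need a < b")
    t = (x - a) / (b - a)               # rescale [a, b] to [0, 1]
    return g(t) / (g(t) + g(1.0 - t))   # denominator is never 0


print(smooth_ramp(0.5, 0, 1))  # 0.5
```

At the midpoint of the interval g(t) and g(1 − t) coincide, so the ramp equals exactly 1/2 there by symmetry.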

CW clicks

Morse code is sent over a radio using CW, continuous wave. The name is historical, contrasting with an early method known as damped wave.

To send a dot or a dash, you send a short or a long pulse of a fixed pitch. If you abruptly turn this tone on and off you’ll create noisy side effects called clicks. As I wrote about in this post, an abrupt change in frequency creates broad spectrum side effects, but smoothing the transition greatly reduces the bandwidth.

The recommended rise and fall time for a CW pulse is between 2 and 4 milliseconds. So if a dot is transmitted as a 50 ms pulse, your equipment might shape the pulse to be a 42 ms pulse at full amplitude with 4 ms transitions on each side where the amplitude smoothly rises and falls. That is, you multiply your pulse by a couple of smooth ramp functions as described above.

Here’s a plot for a pulse of an 800 Hz tone.
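Here’s a minimal Python sketch of that pulse shaping, using the 50 ms dot with 4 ms transitions from the example above. It assumes the exp(−1/t) ramp construction (one of many possible smooth ramps; equipment may use other shapes) and a 48 kHz sample rate. The names ramp and shaped_pulse are hypothetical.

```python
import math


def g(t):
    return math.exp(-1.0 / t) if t > 0 else 0.0


def ramp(t):
    """Smooth step: 0 for t <= 0, 1 for t >= 1, smooth in between."""
    return g(t) / (g(t) + g(1.0 - t))


def shaped_pulse(duration=0.050, edge=0.004, freq=800.0, rate=48_000):
    """Samples of a `freq` Hz tone lasting `duration` seconds whose amplitude
    rises smoothly over the first `edge` seconds and falls smoothly over the
    last `edge` seconds. `rate` is the (assumed) sample rate in Hz."""
    samples = []
    for i in range(round(duration * rate)):
        t = i / rate
        # Rising ramp times falling ramp: full amplitude from 4 ms to 46 ms.
        envelope = ramp(t / edge) * ramp((duration - t) / edge)
        samples.append(envelope * math.sin(2 * math.pi * freq * t))
    return samples


pulse = shaped_pulse()
print(len(pulse))  # 2400 samples = 50 ms at 48 kHz
```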

This minor modification of pulses makes no audible difference to the desired signal but greatly reduces unwanted effects.

Morse code numbers and abbreviations

Numbers in Morse code seem a little strange. Here they are:

    |-------+-------|
    | Digit | Code  |
    |-------+-------|
    |     1 | .---- |
    |     2 | ..--- |
    |     3 | ...-- |
    |     4 | ....- |
    |     5 | ..... |
    |     6 | -.... |
    |     7 | --... |
    |     8 | ---.. |
    |     9 | ----. |
    |     0 | ----- |
    |-------+-------|

They’re fairly regular, but not quite. That’s why a couple years ago I thought it would be an interesting exercise to write terse code to encode and decode digits in Morse code. There’s exploitable regularity, but it’s irregular enough to make the exercise challenging.

Design

As with so many things, this scheme makes more sense than it seems at first. When you ask “Why didn’t they just …” there’s often a non-obvious answer.

The letters largely exhausted the possibilities of up to 4 dots and dashes. Some digits would have to take five symbols, and it makes sense that they would all take 5 symbols. But why the ones above? This scheme uses a lot of dashes, and dashes take three times longer to transmit than dots.

A more efficient scheme would be to use binary notation, with dots for 0’s and dashes for 1’s. When encoding the digits 0 through 9 this way, the leading symbol would always be a dot, and usually the second would be a dot too. As a bonus, you could use the same scheme to encode larger numbers in a single Morse code entity.

The problem with this scheme is that Morse code is intended for humans to decode by ear. A binary scheme would be hard to hear. The scheme actually used is easy to hear because you only change from dot to dash at most once. As Morse code entities get longer, the patterns get simpler. Punctuation marks take six or more dots and dashes, but they have simple patterns that are easy to hear.

Code golf

When I posed my coding exercise as a challenge, the winner was Carlos Luna-Mota with the following Python code.

    S="----.....-----"
    e=lambda x:S[9-x:14-x]
    d=lambda x:9-S.find(x)

Honorable mention goes to Manuel Eberl with the following code. It only does decoding, but is quite clever and short.

    d=lambda c:hash(c+'BvS+')%10

It only works in Python 2 because it depends on the specific hashing algorithm used in earlier versions of Python.
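Carlos’s solution works because the ten digit codes are consecutive five-character windows of a single fourteen-character string: four dashes, five dots, five dashes. Here’s the same idea spelled out with descriptive (hypothetical) names and a little input checking.

```python
# Four dashes, five dots, five dashes: the same string as S above.
WINDOW = "----" + "....." + "-----"


def encode_digit(n):
    """Morse code for digit n (0-9): a 5-character slice of WINDOW that
    slides one position left each time n increases by 1."""
    start = 9 - n
    return WINDOW[start:start + 5]


def decode_digit(code):
    """Invert encode_digit by finding where the code starts in WINDOW."""
    if len(code) != 5:
        raise ValueError(f"not a Morse digit: {code!r}")
    start = WINDOW.find(code)
    if start < 0:
        raise ValueError(f"not a Morse digit: {code!r}")
    return 9 - start


print(encode_digit(7), decode_digit("---.."))  # --... 8
```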

Cut numbers

If you’re mixing letters and digits, digits have to be five symbols long. But if you know that characters have to be digits in some context, this opens up the possibility of shorter encodings.

The most common abbreviations are T for 0 and N for 9. For example, a signal report is always three digits, and someone may send 5NN rather than 599 because in that context it’s clear that the N’s represent 9s.

When T abbreviates 0, it might be sent as a “long dash,” slightly longer than a dash meant to represent a T. This is not strictly according to Hoyle, but it is sometimes done.

There are more abbreviations, so-called cut numbers, though these are much less common and therefore less likely to be understood.

    |-------+-------+-----+--------+----|
    | Digit | Code  |  T1 | Abbrev | T2 |
    |-------+-------+-----+--------+----|
    |     1 | .---- |  17 | .-     |  5 |
    |     2 | ..--- |  15 | ..-    |  7 |
    |     3 | ...-- |  13 | ...-   |  9 |
    |     4 | ....- |  11 | ....-  | 11 |
    |     5 | ..... |   9 | .      |  1 |
    |     6 | -.... |  11 | -....  | 11 |
    |     7 | --... |  13 | -...   |  9 |
    |     8 | ---.. |  15 | -..    |  7 |
    |     9 | ----. |  17 | -.     |  5 |
    |     0 | ----- |  19 | -      |  3 |
    |-------+-------+-----+--------+----|
    | Total |       | 140 |        | 68 |
    |-------+-------+-----+--------+----|

The space between dots and dashes is equal to one dot, and the length of a dash is the length of three dots. So the time required to send a sequence of dots and dashes equals

2(# dots) + 4(# dashes) – 1

In the table above, T1 is the time to transmit a digit, in units of dots, without abbreviation, and T2 is the time with abbreviation. Both the maximum time and the average time are cut approximately in half. Of course that’s ideal transmission efficiency, not psychological efficiency. If the abbreviations are not understood on the receiving end and the receiver asks for numbers to be repeated, the shortcut turns into a longcut.
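The formula is easy to check against the table. Here’s a short Python sketch that recomputes the T1 and T2 columns and their totals from the codes and abbreviations listed above; the dictionary and function names are illustrative.

```python
# Standard digit codes and their "cut number" abbreviations, from the table above.
DIGITS = {
    "1": ".----", "2": "..---", "3": "...--", "4": "....-", "5": ".....",
    "6": "-....", "7": "--...", "8": "---..", "9": "----.", "0": "-----",
}
CUT = {
    "1": ".-", "2": "..-", "3": "...-", "4": "....-", "5": ".",
    "6": "-....", "7": "-...", "8": "-..", "9": "-.", "0": "-",
}


def send_time(code):
    """Time in dot units: a dot is 1 unit on plus 1 unit of space, a dash is
    3 units on plus 1 unit of space, minus the space after the last symbol."""
    return 2 * code.count(".") + 4 * code.count("-") - 1


t1 = {d: send_time(c) for d, c in DIGITS.items()}
t2 = {d: send_time(c) for d, c in CUT.items()}
print(sum(t1.values()), sum(t2.values()))  # 140 68
print(max(t1.values()), max(t2.values()))  # 19 11
```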

Morse code in musical notation

Maybe this has been done before, but I haven’t seen it: Morse code in musical notation.

Here’s the Morse code alphabet, one letter per measure; in practice there would be less space between letters [1]. A dash is supposed to be three times as long as a dot, so a dot is a sixteenth note and a dash is a dotted eighth note.

Morse code is often sent at a pitch between 600 and 800 Hz. I picked E5, the E an octave above the E just above middle C (about 660 Hz), because it’s in that range.

Rhythm

Officially a dash is three times as long as a dot. But there’s also a space equal to the length of a dot between parts of a letter. So the sheet music above would be more accurate if you imagined all the sixteenth notes are staccato and the dotted eighth notes are really eighth notes followed by a sixteenth rest.

This doesn’t make much difference because individual operators have varying “fists,” styles of sending Morse code, and won’t exactly follow the official length and spacing rules.

You could rewrite the music above as follows, but it’s all an approximation.

Tempo

According to Wikipedia, “the dit length at 20 words per minute is 50 milliseconds.” So if a sixteenth note has a duration of 50 milliseconds, this would mean five quarter notes per second, or 300 beats per minute. But according to this video, the shortest duration people can distinguish is about 50 milliseconds.

That would imply that copying Morse code at 20 wpm is pushing the limits of human hearing. But copying at 20 wpm is common. Some people can copy Morse code at 50 words per minute or more, but at that speed they’re not hearing individual dits and dahs. An H, for example, four dits in a row, sounds like a single rough sound. In fact, they’re not really hearing letters at all but recognizing the shapes of words.

How the image was made

I made the image above with LaTeX and Lilypond.

Adding the letters above each measure was kind of a hack. I used rehearsal marks to label the measures, but there was one problem: the software skips from letter H to letter J. That meant the label that should have been I, and every label after it, was one letter ahead of what it should be, and the final letter Z was labeled AA. I tried several tricks, and Lilypond steadfastly refused to label a measure with ‘I’ even though I’ve seen such a label in the documentation.

My way around this was to make it label two consecutive measures with H, then in image editing software I turned the second H into an I. No doubt there’s a better way, but this worked.

I may play around with this and try to improve it a bit. If you have any suggestions, particularly related to Lilypond, please let me know.

[1] You could think of the musical score above as a sort of transcription of the Farnsworth method of teaching Morse code. Students learn the letters at full speed, but with extra space between the letters at first. The faster speed discourages consciously counting the dits and dahs, forcing the student to listen to the overall rhythm of the letters.

Q codes in Seveneves

The first time I heard of Q codes was when reading the novel Seveneves by Neal Stephenson. These are three-letter abbreviations used in Morse code that all begin with Q.

Since Q is always followed by U in native English words, Q can be used to begin a sort of escape sequence [1].

There are dozens of Q codes used in amateur radio [2], and more used in other contexts, but there are only 10 Q codes used in Seveneves [3]. All begin with Q, followed by R, S, or T.

    Tree[Q, {Tree[R, {A, K, N, S, T}], Tree[S, {B, L, O}], Tree[T, {H, X}]}]

Each Q code can be used both as a question and as an answer or statement. For example, QRS can mean “Would you like me to slow down?” or “Please slow down.” I’ll just give the interrogative forms below.

Here are the 10 codes that appear in Stephenson’s novel.

QRA
What is your call sign?
QRK
Is my signal intelligible?
QRN
Is static a problem?
QRS
Should I slow down?
QRT
Should I stop sending?
QSB
Is my signal fading?
QSL
Are you still there?
QSO
Could you communicate with …?
QTH
Where are you?
QTX
Will you keep your station open for talking with me?

[1] Some Q codes have a U as the second letter. I don’t know why—there are plenty of unused TLAs that begin with Q—but it is what it is.

[2] You can find a list here.

[3] There is one non-standard code in the novel: QET for “not on planet Earth.”

Missing Morse codes

Morse codes for Latin letters are sequences of between one and four symbols, where each symbol is a dot or a dash. There are 2 possible sequences with one symbol, 4 with two symbols, 8 with three symbols, and 16 with four symbols. This makes a total of 30 sequences with up to four symbols. There are 26 letters, so what are the four missing codes?

Here they are:

    .-.- 
    ..-- 
    ---. 
    ---- 

There are various uses for these codes, such as variants of Latin letters.

The first sequence on the list, .-.- is similar to two A’s .- .- and is used for variations on A, such as ä or æ.

The sequence ..-- is like a U (..-) with an extra dash on the end, and is used for variations on U, like ü.

The sequence ---. is like O (---) with an extra dot on the end, and is used for variations on O, like ö.

The last sequence ---- is used for letters like Ch or Š. Go figure.
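The four missing sequences can be found mechanically by generating all 30 sequences of one to four symbols and removing the 26 letter codes. A minimal Python sketch (the names are illustrative):

```python
from itertools import product

# Codes for the 26 Latin letters.
LETTER_CODES = {
    ".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....", "..", ".---",
    "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.", "...", "-",
    "..-", "...-", ".--", "-..-", "-.--", "--..",
}

# Every sequence of 1 to 4 dots and dashes: 2 + 4 + 8 + 16 = 30 of them.
all_codes = ["".join(p) for n in range(1, 5) for p in product(".-", repeat=n)]
missing = [c for c in all_codes if c not in LETTER_CODES]

print(len(all_codes))  # 30
print(missing)         # ['..--', '.-.-', '---.', '----']
```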

Sequences of length 5

Sequences of five or six symbols are used for numbers, punctuation, and a few miscellaneous tasks, but there are a few unused combinations. (“Unused” is fuzzy here. Maybe some people do or did use these sequences.)

Here are the five-symbol sequences that do not appear in the Wikipedia article on Morse code:

    ..-.-
    .-.--
    -..--
    -.-.-
    -.---
    ---.-

So out of 32 possibilities, people have found uses for 26 of them.

Sequences of length 6

Out of 64 possible sequences of six symbols, 13 have found a use.

It’s harder to distinguish longer sequences by ear, and so it’s not surprising that most sequences of six symbols are unused; the ones that are used have special patterns that are easier to hear. Here are the ones that are used.

    ..--..
    ..--.-
    .-..-.
    .-.-.-
    .--.-.
    .----.
    -....-
    -.-.-.
    -.-.--
    -.--.-
    --..-.
    --..--
    ---...

Morse code palindromes

A palindrome is a word or sentence that remains the same when its characters are reversed. For example, the word “radar” is a palindrome, as is the sentence “Madam, I’m Adam.”

I was thinking today about Morse code palindromes, sequences of Morse code that remain the same when reversed.

This post will look at what it means for a letter or a word to be a palindrome in Morse code, then look at palindrome sentences in Morse code, then finally look at a shell script to find Morse palindromes.

Letters and words

Some individual letters are palindromes in Morse code, such as I (..) and P (.--.).

Some letters change into other letters when their Morse code representation is reversed. For example B (-...) becomes V (...-) and vice versa.

The letters C (-.-.), J (.---), and Z (--..) when reversed are no longer part of the 26-letter Roman alphabet, though the reversed sequences are sometimes used for vowels with umlauts: Ä (.-.-), Ö (---.), and Ü (..--).

The sequence SOS (... --- ...) is a palindrome in English and in Morse code. But some words are palindromes in Morse code that are not palindromes in English, such as “gnaw,” which is

    --. -. .- .--

in Morse code.

The longest word I’ve found which is a palindrome in Morse code is “footstool.”

    ..-. --- --- - ... - --- --- .-..

Sentences

I wrote some code to search a dictionary and make a list of English words that remain English words when converted to Morse code, reversed, and turned back into text. There aren’t that many, around 240. Then I looked for ways to make sentences out of these words.

For example, “Trevor sees Robert” is a palindrome in Morse code:

    - .-. . ...- --- .-. ... . . ... .-. --- -... . .-. -

If you’d like to try your hand at this, you might find a couple files useful. This file gives a list of words that remain the same when their Morse code is reversed, such as “outdo” (--- ..- - -.. ---) and this file gives a list of transformation pairs, such as “sail” (... .- .. .-..) and “fins” (..-. .. -. ...).

Shell scripting

Conceptually we want to write out words in Morse code, reverse the sequence of dots and dashes, and turn the result back into English text. But we can do this without actually working with Morse code.

We can reverse the letters in the input, then replace each letter with the letter corresponding to reversing its Morse code.

I don’t know of an easy way to reverse a string in a shell script, but I do know how to do it with a Perl one-liner.

    perl -lne 'print scalar reverse'

Next we need to turn around the dots and dashes of individual letters. Most letters stay the same, but there are six pairs of letters to swap:

  • (A, N)
  • (B, V)
  • (D, U)
  • (F, L)
  • (G, W)
  • (Q, Y)

The tr (“translate”) utility was made for this kind of task, replacing all characters in one string with their counterparts in another.

    tr ABDFGQNVULWY NVULWYABDFGQ

Note that tr effectively does all the translations at the same time. For example, it replaces A’s with N’s and N’s with A’s simultaneously. If it simply marched down the two strings, replacing A’s with N’s, then replacing B’s with V’s, etc., it would not do what we want. For example, AN would first become NN and then AA.

Putting these together, the following one-liner proves that “footstool” is a palindrome in Morse code

    echo FOOTSTOOL | perl -lne 'print scalar reverse' | 
    tr ABDFGQNVULWY NVULWYABDFGQ

because the output is “FOOTSTOOL”.
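The same two steps, reversing the letters and then swapping each letter for its Morse-reversed partner, carry over directly to Python, whose str.translate also makes all replacements simultaneously, like tr. This is a sketch rather than the script behind the word lists; the name morse_reverse is illustrative, and it returns None for text containing C, J, or Z, whose reversed codes aren’t letters.

```python
# Pairs of letters whose Morse codes are reverses of each other, as in the
# tr command above. Every other letter is its own reverse, except C, J, and Z.
SWAP = str.maketrans("ABDFGQNVULWY", "NVULWYABDFGQ")


def morse_reverse(text):
    """Reverse the letters, then swap each letter for the one whose Morse
    code is its reverse. Returns None if the text contains C, J, or Z."""
    text = text.upper()
    if set(text) & set("CJZ"):
        return None  # the reversed codes of C, J, Z are not Latin letters
    return text[::-1].translate(SWAP)


print(morse_reverse("footstool"))           # FOOTSTOOL
print(morse_reverse("sail"))                # FINS
print(morse_reverse("Trevor sees Robert"))  # TREVOR SEES ROBERT
```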

Perl has a tr function very much like the shell utility, so we could do more of the work in Perl:

    echo FOOTSTOOL | 
    perl -lne "tr /ABDFGQNVULWY/NVULWYABDFGQ/; print scalar reverse"

Update: A comment from Alastair below let me know you can replace the bit of Perl in the first one-liner with a call to tac.

    echo FOOTSTOOL | tac -rs . | tr ABDFGQNVULWY NVULWYABDFGQ

By default tac lists the lines of a file in reverse order. The name comes from reversing “cat”, the name of the command that dumps a file (“concatenates” it to standard output). The extra arguments to tac cause it to change the definition of a line separator to any character, as indicated by the regular expression consisting of a single period. This effectively tells tac to treat every character as a line, so reversing the lines reverses the string.

Morse code golf

You can read the title of this post as ((Morse code) golf) or as (Morse (code golf)).

Morse code is a sort of approximate Huffman coding of letters: letters are assigned symbols so that more common letters can be transmitted more quickly. You can read about how well Morse code achieves this design objective here.

But digits in Morse code are kinda strange. I imagine they were an afterthought, tacked on after encodings had been assigned to each of the letters, and so had to avoid encodings that were already in use. Here are the assignments:

    |-------+-------|
    | Digit | Code  |
    |-------+-------|
    |     1 | .---- |
    |     2 | ..--- |
    |     3 | ...-- |
    |     4 | ....- |
    |     5 | ..... |
    |     6 | -.... |
    |     7 | --... |
    |     8 | ---.. |
    |     9 | ----. |
    |     0 | ----- |
    |-------+-------|

There’s no attempt to relate transmission length to frequency. Maybe the idea was that all digits are equally common. While in some contexts this is true, it’s not true in general for mathematical and psychological reasons.

There is a sort of mathematical pattern to the Morse code symbols for digits. For 1 ≤ n ≤ 5, the symbol for n is n dots followed by 5-n dashes. For 6 ≤ n ≤ 9, the symbol is n-5 dashes followed by 10-n dots. The same rule extends to 0 if you think of 0 as 10.
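The rule is short enough to state directly as code. A minimal Python sketch (the function name is illustrative):

```python
def digit_code(n):
    """Morse code for a digit n, following the dots-then-dashes rule
    and its mirror image, with 0 treated as 10."""
    if not 0 <= n <= 9:
        raise ValueError("n must be a single digit")
    if n == 0:
        n = 10
    if n <= 5:
        return "." * n + "-" * (5 - n)       # n dots, then 5 - n dashes
    return "-" * (n - 5) + "." * (10 - n)    # n - 5 dashes, then 10 - n dots


print([digit_code(n) for n in (1, 5, 6, 0)])  # ['.----', '.....', '-....', '-----']
```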

A more mathematically satisfying way to assign symbols would have been binary numbers padded to five places:

    0 -> .....
    1 -> ....-
    2 -> ...-.
    etc.
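For comparison, here’s a Python sketch of that hypothetical binary scheme. It isn’t a real Morse assignment; it just shows what the padded binary codes would look like and counts their dashes.

```python
def binary_code(n):
    """Hypothetical scheme: n written in 5-bit binary, dot for 0, dash for 1."""
    return format(n, "05b").replace("0", ".").replace("1", "-")


codes = [binary_code(n) for n in range(10)]
print(codes[:3])                         # ['.....', '....-', '...-.']
print(sum(c.count("-") for c in codes))  # 15
```

Across the ten digits the binary codes use 15 dashes, versus 25 dashes in the actual assignment in the table above.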

Because the Morse encoding of digits is awkward, it’s not easy to describe succinctly. And here is where golf comes in.

The idea of code golf is to write the shortest program that does some task. Fewer characters is better, just as in golf the lowest score wins.

Here’s the challenge: Write two functions as small as you can, one to encode digits as Morse code and another to decode Morse digits. Share your solutions in the comments below.

ADFGVX cipher and Morse code separation

A century ago the German army used a field cipher that transmitted messages using only six letters: A, D, F, G, V, and X. These letters were chosen because their Morse code representations were distinct, thus reducing transmission error.

The ADFGVX cipher was an extension of an earlier ADFGV cipher. The ADFGV cipher was based on a 5 by 5 grid of letters. The ADFGVX cipher extended the method to a 6 by 6 grid of letters and digits. A message was first encoded using the grid coordinates of the letters, then a transposition cipher was applied to the sequence of coordinates.
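Here’s a minimal Python sketch of the two steps: fractionation through a 6 by 6 grid, then a columnar transposition. The grid below lists letters and digits in plain order purely for illustration; the actual cipher used a secret, scrambled grid and a secret transposition key, so GRID, the key “CARGO”, and the function names are all hypothetical.

```python
LABELS = "ADFGVX"

# Hypothetical grid: letters then digits in plain order. The real cipher
# used a secret, scrambled arrangement of the 36 symbols.
GRID = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def fractionate(plaintext):
    """Replace each letter or digit by its row label and column label."""
    pairs = []
    for ch in plaintext.upper():
        if ch not in GRID:
            continue  # this sketch simply drops spaces and punctuation
        row, col = divmod(GRID.index(ch), 6)
        pairs.append(LABELS[row] + LABELS[col])
    return "".join(pairs)


def transpose(text, key):
    """Columnar transposition: write `text` in rows as wide as `key`, then
    read the columns off in alphabetical order of the key's letters."""
    width = len(key)
    order = sorted(range(width), key=lambda i: (key[i], i))
    return "".join(text[i::width] for i in order)


def encrypt(plaintext, key):
    return transpose(fractionate(plaintext), key)


print(fractionate("A9"))                  # AAXX
print(encrypt("attack at 1200", "CARGO"))
```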

This post revisits the design of the ADFGVX cipher. Not the encryption method itself, but the choice of letters used for transmission. How would you quantify the difference between two Morse code characters? Given that method of quantification, how good was the choice of ADFGV or its extension ADFGVX? Could the Germans have done better?

Quantifying separation

There are several possible ways to quantify how distinct two Morse code signals are.

Time signal difference

My first thought was to compare the signals as a function of time.

There are differing conventions for how long a dot or dash should be, and how long the space between dots and dashes should be. For this post, I will assume a dot is one unit of time, a dash is three units of time, and the space between dots or dashes is one unit of time.

The letter A is represented by a dot followed by a dash. I’ll represent this as 10111: on for one unit of time for the dot, off for one unit of time for the space between the dot and the dash, and on for three units of time for the dash. D is dash dot dot, so that would be 1110101.

We could quantify the difference between two letters in Morse code as the Hamming distance between their representations as 0s and 1s, i.e. the number of positions in which the two letters differ. To compare A and D, for example, I’ll pad the A with a couple zeros on the end to make it the same length as D.

    A: 1011100
    D: 1110101
        x x  x

The distance is 3 because the two sequences differ in three positions. (Looking back at the previous post on popcount, you could compute the distance as the popcount of the XOR of the two bit patterns.)
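Here’s a small Python sketch of this measure, assuming the timing conventions above and padding the shorter signal with trailing zeros. It computes the distance as the popcount of the XOR of the two bit patterns; the function names are illustrative.

```python
def time_signal(code):
    """On/off pattern in dot units: dot -> 1, dash -> 111, 0 between symbols."""
    return "0".join("1" if s == "." else "111" for s in code)


def time_distance(code1, code2):
    """Hamming distance between two time signals, padding the shorter one
    with trailing 0s; computed as the popcount of the XOR of the bits."""
    s1, s2 = time_signal(code1), time_signal(code2)
    n = max(len(s1), len(s2))
    diff = int(s1.ljust(n, "0"), 2) ^ int(s2.ljust(n, "0"), 2)
    return bin(diff).count("1")


print(time_signal(".-"), time_signal("-.."))  # 10111 1110101
print(time_distance(".-", "-.."))              # 3
print(time_distance("..-.", "--."))            # 1
```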

A problem with this approach is that it seems to underestimate the perceived difference between F and G.

    F: ..-. 1010111010
    G: --.  1110111010
             x

These only differ in the second bit, but they sound fairly different.

Symbolic difference

The example above suggests maybe we should compare the sequences of dots and dashes themselves rather than compare their corresponding time signals. By this measure F and G are distance 4 apart, since they differ in every position (counting G’s missing fourth symbol as a difference).
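A Python sketch of the symbolic measure, assuming the shorter sequence is padded and that a missing symbol always counts as a difference. Under that assumption it appears to reproduce the dS columns in the tables further down.

```python
from itertools import zip_longest


def symbol_distance(code1, code2):
    """Number of positions where two dot-dash sequences differ. The shorter
    sequence is padded with None, so a missing symbol always counts."""
    return sum(a != b for a, b in zip_longest(code1, code2))


print(symbol_distance("..-.", "--."))  # 4 (F vs G)
print(symbol_distance("-..", "--."))   # 1 (D vs G)
```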

Other possibilities

Comparing the symbol difference may overestimate the difference between U (..-) and V (...-). We should look at some combination of time signal difference and symbolic difference.

Or maybe the thing to do would be to look at something like the edit distance between letters. We could say that U and V are close because it only takes inserting a dot to turn a U into a V.
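Edit distance here would be the ordinary Levenshtein distance on strings of dots and dashes. A standard dynamic-programming sketch in Python (the function name is illustrative):

```python
def edit_distance(s, t):
    """Levenshtein distance: the fewest insertions, deletions, and
    substitutions needed to turn s into t."""
    prev = list(range(len(t) + 1))  # distances from "" to prefixes of t
    for i, a in enumerate(s, 1):
        curr = [i]
        for j, b in enumerate(t, 1):
            curr.append(min(prev[j] + 1,               # delete a
                            curr[j - 1] + 1,           # insert b
                            prev[j - 1] + (a != b)))   # substitute (or match)
        prev = curr
    return prev[-1]


print(edit_distance("..-", "...-"))   # 1 (U vs V)
print(edit_distance("..-.", "--."))   # 2 (F vs G)
```

By this measure U and V are distance 1 apart, and F and G are distance 2, between their time distance of 1 and their symbol distance of 4.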

Was ADFGV optimal?

There are several choices of letters that would have been better than ADFGV by either way of measuring distance. For example, CELNU has better separation and takes about 14% less time to transmit than ADFGV.

Here are a couple tables that give the time distance (dT) and the symbol distance (dS) for ADFGV and for CELNU.

    |------+----+----|
    | Pair | dT | dS |
    |------+----+----|
    | AD   |  3 |  3 |
    | AF   |  4 |  3 |
    | AG   |  5 |  2 |
    | AV   |  4 |  3 |
    | DF   |  3 |  3 |
    | DG   |  2 |  1 |
    | DV   |  3 |  2 |
    | FG   |  1 |  4 |
    | FV   |  2 |  2 |
    | GV   |  3 |  3 |
    |------+----+----|
    
    |------+----+----|
    | Pair | dT | dS |
    |------+----+----|
    | CE   |  7 |  4 |
    | CL   |  4 |  3 |
    | CN   |  4 |  2 |
    | CU   |  5 |  2 |
    | EL   |  5 |  3 |
    | EN   |  3 |  2 |
    | EU   |  4 |  2 |
    | LN   |  4 |  4 |
    | LU   |  3 |  3 |
    | NU   |  3 |  2 |
    |------+----+----|

For six letters, CELNOU is faster to transmit than ADFGVX. It has a minimum time distance of 3 and a minimum symbol distance of 2.

Time spread

This is an update in response to a comment that suggested instead of minimizing transmission time of a set of letters, you might want to pick letters that are most similar in transmission time. It takes much longer to transmit C (-.-.) than E (.), and this could make CELNU harder to transcribe than ADFGV.

So I went back to the script I’m using and added time spread, the maximum transmission time minus the minimum transmission time, as a criterion. The ADFGV set has a spread of 4 because V takes 4 time units longer to transmit than A. CELNU has a spread of 10.

There are 210 choices of 5 letters that have time distance greater than 1, symbol distance greater than 1, and spread equal to 4. That is, these candidates are more distinct than ADFGV and have the same spread.

It takes 44 time units to transmit ADFGV. Twelve of the 210 candidates identified above require 42 or 40 time units. There are five that take 40 time units:

  • ABLNU
  • ABNUV
  • AGLNU
  • AGNUV
  • ALNUV
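Here’s a Python sketch of the transmission time and spread calculations. The post doesn’t spell out how letter times are counted; charging 2 units per dot and 4 per dash (the symbols, the gaps between them, and one unit of trailing space) is an assumption, but it reproduces the 44, 40, 56, and 50 unit totals and the spreads quoted in this section. The names are illustrative.

```python
CODES = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "L": ".-..", "N": "-.", "O": "---", "U": "..-", "V": "...-",
    "X": "-..-",
}


def letter_time(letter):
    """Time units for one letter: 2 per dot and 4 per dash, i.e. the symbols,
    the gaps between them, and one trailing unit of space (assumed)."""
    code = CODES[letter]
    return 2 * code.count(".") + 4 * code.count("-")


def total_time(letters):
    return sum(letter_time(c) for c in letters)


def spread(letters):
    """Longest letter time minus shortest letter time."""
    times = [letter_time(c) for c in letters]
    return max(times) - min(times)


for s in ["ADFGV", "CELNU", "ABLNU", "ADFGVX", "ABLNUV"]:
    print(s, total_time(s), spread(s))
```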

Looking at sets of six letters, there are 464 candidates that have better separation than ADFGVX and equal time spread. One of these, AGLNUX, is an extension of one of the 5-letter candidates above.

The best 6-letter sets are ABLNUV and AGLNUV. They are better than ADFGVX by all the criteria discussed above. They both have time distance separation 2 (compared to 1), symbol distance separation 2 (compared to 1), time spread 4 (compared to 6), and transmission time 50 (compared to 56).

How efficient is Morse code?

Morse code was designed so that the most frequently used letters have the shortest codes. In general, code length increases as frequency decreases.

How efficient is Morse code? We’ll compare letter frequencies based on Google’s research with the length of each code, and make the standard assumption that a dash is three times as long as a dot.

|--------+------+--------+-----------|
| Letter | Code | Length | Frequency |
|--------+------+--------+-----------|
| E      | .    |      1 |    12.49% |
| T      | -    |      3 |     9.28% |
| A      | .-   |      4 |     8.04% |
| O      | ---  |      9 |     7.64% |
| I      | ..   |      2 |     7.57% |
| N      | -.   |      4 |     7.23% |
| S      | ...  |      3 |     6.51% |
| R      | .-.  |      5 |     6.28% |
| H      | .... |      4 |     5.05% |
| L      | .-.. |      6 |     4.07% |
| D      | -..  |      5 |     3.82% |
| C      | -.-. |      8 |     3.34% |
| U      | ..-  |      5 |     2.73% |
| M      | --   |      6 |     2.51% |
| F      | ..-. |      6 |     2.40% |
| P      | .--. |      8 |     2.14% |
| G      | --.  |      7 |     1.87% |
| W      | .--  |      7 |     1.68% |
| Y      | -.-- |     10 |     1.66% |
| B      | -... |      6 |     1.48% |
| V      | ...- |      6 |     1.05% |
| K      | -.-  |      7 |     0.54% |
| X      | -..- |      8 |     0.23% |
| J      | .--- |     10 |     0.16% |
| Q      | --.- |     10 |     0.12% |
| Z      | --.. |      8 |     0.09% |
|--------+------+--------+-----------|

There’s room for improvement. Assigning the letter O such a long code, for example, was clearly not optimal.

But how much difference does it make? If we were to rearrange the codes so that they corresponded to letter frequency, how much shorter would a typical text transmission be?

Multiplying the code lengths by their frequency, we find that an average letter, weighted by frequency, has code length 4.5268.

What if we rearranged the codes? Then we would get 4.1257 which would be about 9% more efficient. To put it another way, Morse code achieved 91% of the efficiency that it could have achieved with the same codes. This is relative to Google’s English corpus. A different corpus would give slightly different results.
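Here’s a minimal Python sketch of this calculation using the table above, with frequencies as percentages. The best possible assignment pairs the shortest codes with the most frequent letters (by the rearrangement inequality), which is what sorting the two lists in opposite orders does.

```python
# Morse code and frequency (percent) for each letter, from the table above.
TABLE = {
    "E": (".", 12.49), "T": ("-", 9.28), "A": (".-", 8.04), "O": ("---", 7.64),
    "I": ("..", 7.57), "N": ("-.", 7.23), "S": ("...", 6.51), "R": (".-.", 6.28),
    "H": ("....", 5.05), "L": (".-..", 4.07), "D": ("-..", 3.82), "C": ("-.-.", 3.34),
    "U": ("..-", 2.73), "M": ("--", 2.51), "F": ("..-.", 2.40), "P": (".--.", 2.14),
    "G": ("--.", 1.87), "W": (".--", 1.68), "Y": ("-.--", 1.66), "B": ("-...", 1.48),
    "V": ("...-", 1.05), "K": ("-.-", 0.54), "X": ("-..-", 0.23), "J": (".---", 0.16),
    "Q": ("--.-", 0.12), "Z": ("--..", 0.09),
}


def length(code):
    """Dot = 1 and dash = 3, ignoring the gaps between symbols."""
    return code.count(".") + 3 * code.count("-")


lengths = [length(code) for code, _ in TABLE.values()]
freqs = [f for _, f in TABLE.values()]
actual = sum(n * f for n, f in zip(lengths, freqs)) / 100
# Shortest lengths paired with the highest frequencies.
best = sum(n * f for n, f in zip(sorted(lengths), sorted(freqs, reverse=True))) / 100
print(f"{actual:.4f} {best:.4f} {best / actual:.3f}")  # 4.5268 4.1257 0.911
```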

Toward the bottom of the table above, letter frequencies correspond poorly to code lengths, though this hardly matters for efficiency. But some of the choices near the top of the table are puzzling. The relative frequency of the first few letters has remained stable over time and was well known long before Google. (See ETAOIN SHRDLU.) Maybe there were factors other than efficiency that influenced how the most frequently used characters were encoded.

Update: Some sources I looked at said that a dash is three times as long as a dot, including the space between dots or dashes. Others said there is a pause as long as a dot between elements. The latter is the official standard of the International Telecommunication Union.

If you use the official timing, it takes an average time equal to 6.0554 dots to transmit an English letter, and this could be improved to 5.6616. By that measure Morse code is about 93.5% efficient. (I only added time for space inside the code for a letter because the space between letters is the same no matter how they are coded.)
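Here’s a self-contained Python sketch of the official-timing calculation, using the codes and frequencies from the table above. Within a letter, a code with d dots and h dashes takes 2d + 4h − 1 dot units under the ITU convention; the space between letters is left out, as described.

```python
# Morse code and frequency (percent) for each letter, from the table above.
TABLE = {
    "E": (".", 12.49), "T": ("-", 9.28), "A": (".-", 8.04), "O": ("---", 7.64),
    "I": ("..", 7.57), "N": ("-.", 7.23), "S": ("...", 6.51), "R": (".-.", 6.28),
    "H": ("....", 5.05), "L": (".-..", 4.07), "D": ("-..", 3.82), "C": ("-.-.", 3.34),
    "U": ("..-", 2.73), "M": ("--", 2.51), "F": ("..-.", 2.40), "P": (".--.", 2.14),
    "G": ("--.", 1.87), "W": (".--", 1.68), "Y": ("-.--", 1.66), "B": ("-...", 1.48),
    "V": ("...-", 1.05), "K": ("-.-", 0.54), "X": ("-..-", 0.23), "J": (".---", 0.16),
    "Q": ("--.-", 0.12), "Z": ("--..", 0.09),
}


def official_length(code):
    """ITU timing inside a letter: dot = 1, dash = 3, and 1 unit between
    symbols, which works out to 2*(dots) + 4*(dashes) - 1."""
    return 2 * code.count(".") + 4 * code.count("-") - 1


lengths = [official_length(code) for code, _ in TABLE.values()]
freqs = [f for _, f in TABLE.values()]
actual = sum(n * f for n, f in zip(lengths, freqs)) / 100
# Best case: shortest codes assigned to the most frequent letters.
best = sum(n * f for n, f in zip(sorted(lengths), sorted(freqs, reverse=True))) / 100
print(f"{actual:.4f} {best:.4f} {best / actual:.3f}")  # 6.0554 5.6616 0.935
```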