Morse code numbers and abbreviations

Numbers in Morse code seem a little strange. Here they are:

```
|-------+-------|
| Digit | Code  |
|-------+-------|
|     1 | .---- |
|     2 | ..--- |
|     3 | ...-- |
|     4 | ....- |
|     5 | ..... |
|     6 | -.... |
|     7 | --... |
|     8 | ---.. |
|     9 | ----. |
|     0 | ----- |
|-------+-------|
```

They’re fairly regular, but not quite. That’s why a couple years ago I thought it would be an interesting exercise to write terse code to encode and decode digits in Morse code. There’s exploitable regularity, but it’s irregular enough to make the exercise challenging.

Design

As with so many things, this scheme makes more sense than it seems at first. When you ask “Why didn’t they just …” there’s often a non-obvious answer.

The letters largely exhausted the possibilities of up to 4 dots and dashes. Some digits would have to take five symbols, and it makes sense that they would all take 5 symbols. But why the ones above? This scheme uses a lot of dashes, and dashes take three times longer to transmit than dots.

A more efficient scheme would be to use binary notation, with dots for 0s and dashes for 1s. When encoding the digits 0 through 9 in five symbols, the leading symbol would always be a dot, and the second would usually be a dot as well. As a bonus you could use the same scheme to encode larger numbers in a single Morse code entity.

The problem with this scheme is that Morse code is intended for humans to decode by ear. A binary scheme would be hard to hear. The scheme actually used is easy to hear because the symbol changes between dot and dash at most once per digit. As Morse code entities get longer, the patterns get simpler. Punctuation marks take six or more dots and dashes, but they have simple patterns that are easy to hear.
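To make the comparison concrete, here is a minimal sketch that counts how often each code switches between dot and dash. The function names `binary_code` and `changes` are made up for illustration, and the five-symbol binary encoding is the hypothetical alternative described above, not anything in actual use.

```python
# Standard Morse code for the digits.
MORSE_DIGITS = {
    0: "-----", 1: ".----", 2: "..---", 3: "...--", 4: "....-",
    5: ".....", 6: "-....", 7: "--...", 8: "---..", 9: "----.",
}

def binary_code(digit, width=5):
    """Hypothetical binary scheme: dot for 0, dash for 1, padded to `width` symbols."""
    bits = format(digit, f"0{width}b")
    return bits.replace("0", ".").replace("1", "-")

def changes(code):
    """Count the positions where the symbol switches between dot and dash."""
    return sum(a != b for a, b in zip(code, code[1:]))

for digit in range(10):
    actual = MORSE_DIGITS[digit]
    binary = binary_code(digit)
    print(digit, actual, changes(actual), binary, changes(binary))
```

The actual codes never switch more than once, while the binary codes for 5 and 9 switch three times.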

Code golf

When I posed my coding exercise as a challenge, the winner was Carlos Luna-Mota with the following Python code.

```
S="----.....-----"
e=lambda x:S[9-x:14-x]
d=lambda x:9-S.find(x)
```
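The trick is that every digit's code is a five-character window into one 14-character string, so encoding is a slice and decoding is a search. The following sketch repeats the three lines with spacing added and checks that they round-trip. The check loop is my addition, not part of the winning entry.

```python
S = "----.....-----"
e = lambda x: S[9 - x:14 - x]   # encode: slide a 5-character window over S
d = lambda x: 9 - S.find(x)     # decode: locate the window, recover the digit

# Every digit should survive a round trip through encode and decode.
for n in range(10):
    assert d(e(n)) == n

print(e(3))        # ...--
print(d("--..."))  # 7
```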

Honorable mention goes to Manuel Eberl with the following code. It only does decoding, but is quite clever and short.

`d=lambda c:hash(c+'BvS+')%10`

It only works in Python 2 because it depends on the specific hashing algorithm used in earlier versions of Python.

Cut numbers

If you’re mixing letters and digits, digits have to be five symbols long. But if you know that characters have to be digits in some context, this opens up the possibility of shorter encodings.

The most common abbreviations are T for 0 and N for 9. For example, a signal report is always three digits, and someone may send 5NN rather than 599 because in that context it’s clear that the N’s represent 9s.
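As a small illustration, the two common abbreviations amount to a character substitution on a string of digits. This is only a sketch: `cut_report` is a hypothetical helper name, and it assumes the input is already known to be all digits, which is the context that makes the abbreviation safe.

```python
# Only the two most common cut numbers: 0 -> T and 9 -> N.
COMMON_CUTS = str.maketrans("09", "TN")

def cut_report(report):
    """Abbreviate a string of digits, e.g. a three-digit signal report."""
    if not report.isdigit():
        raise ValueError("cut numbers only make sense in an all-digit context")
    return report.translate(COMMON_CUTS)

print(cut_report("599"))  # 5NN
```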

When T abbreviates 0 it might be a “long dash,” slightly longer than a dash meant to represent a T. This is not strictly according to Hoyle but is sometimes done.

There are more abbreviations, so-called cut numbers, though these are much less common and therefore less likely to be understood.

```
|-------+-------+-----+--------+----|
| Digit | Code  |  T1 | Abbrev | T2 |
|-------+-------+-----+--------+----|
|     1 | .---- |  17 | .-     |  5 |
|     2 | ..--- |  15 | ..-    |  7 |
|     3 | ...-- |  13 | ...-   |  9 |
|     4 | ....- |  11 | ....-  | 11 |
|     5 | ..... |   9 | .      |  1 |
|     6 | -.... |  11 | -....  | 11 |
|     7 | --... |  13 | -...   |  9 |
|     8 | ---.. |  15 | -..    |  7 |
|     9 | ----. |  17 | -.     |  5 |
|     0 | ----- |  19 | -      |  3 |
|-------+-------+-----+--------+----|
| Total |       | 140 |        | 68 |
|-------+-------+-----+--------+----|
```

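The space between dots and dashes is equal to one dot, and the length of a dash is the length of three dots. Counting the space that follows it, each dot costs two units and each dash costs four, and the last symbol has no trailing space. So the time required to send a sequence of dots and dashes equals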

2(# dots) + 4(# dashes) – 1

In the table above, T1 is the time to transmit a digit, in units of dots, without abbreviation, and T2 is the time with abbreviation. Both the maximum time and the average time are cut approximately in half: the maximum drops from 19 to 11 and the average from 14 to 6.8. Of course that’s ideal transmission efficiency, not psychological efficiency. If the abbreviations are not understood on the receiving end and the receiver asks for numbers to be repeated, the shortcut turns into a longcut.
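The T1 and T2 columns can be reproduced directly from the timing formula. Here is a minimal sketch; the function name `transmit_time` is my own, and the two lists simply copy the Code and Abbrev columns of the table, in the order 1 through 9 and then 0.

```python
def transmit_time(code):
    """Time to send one character, in units of dots: 2(# dots) + 4(# dashes) - 1."""
    return 2 * code.count(".") + 4 * code.count("-") - 1

# Codes for digits 1, 2, ..., 9, 0, copied from the table.
FULL = [".----", "..---", "...--", "....-", ".....",
        "-....", "--...", "---..", "----.", "-----"]
CUT = [".-", "..-", "...-", "....-", ".",
       "-....", "-...", "-..", "-.", "-"]

t1 = [transmit_time(c) for c in FULL]
t2 = [transmit_time(c) for c in CUT]

print(sum(t1), sum(t2))  # 140 68
print(max(t1), max(t2))  # 19 11
```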