Q codes in Seveneves

The first time I heard of Q codes was when reading the novel Seveneves by Neal Stephenson. These are three-letter abbreviations using in Morse code that all begin with Q.

Since Q is always followed by U in native English words, Q can be used to begin a sort of escape sequence [1].

There are dozens of Q codes used in amateur radio [2], and more used in other contexts, but there are only 10 Q codes used in Seveneves [3]. All begin with Q, followed by R, S, or T.

Tree[Q, {Tree[R, {A, K, N, S, T}], Tree[S, {B, L, O}], Tree[T, {H, X}]}]

Each Q code can be used both as a question and as an answer or statement. For example, QRS can mean “Would you like me to slow down” or “Please slow down.” I’ll just give the interrogative forms below.

Here are the 10 codes that appear in Stephenson’s novel.

What is your call sign?
Is my signal intelligible?
Is static a problem?
Should I slow down?
Should I stop sending?
Is my signal fading?
Are you still there?
Could you communicate with …?
Where are you?
Will you keep your station open for talking with me?

Related posts

[1] Some Q codes have a U as the second letter. I don’t know why—there are plenty of unused TLAs that begin with Q—but it is what it is.

[2] You can find a list here.

[3] There is one non-standard code in the novel: QET for “not on planet Earth.”

Self-orthogonal vectors and coding

One of the surprising things about linear algebra over a finite field is that a non-zero vector can be orthogonal to itself.

When you take the inner product of a real vector with itself, you get a sum of squares of real numbers. If any element in the sum is positive, the whole sum is positive.

In a finite field, things don’t work that way. A list of non-zero squares can add up to zero.

In the previous post, we looked at the ternary Golay code, an error correcting code that multiplies a vector of data by a rectangular matrix mod 3. That post included the example that

[1, 0, 1, 2, 2]

was encoded as

[1 0 1 2 2 0 1 0 2 0 0].

The output is orthogonal to itself:

1² + 0² + 1² + 2² + 2² + 0² + 1² + 0² + 2² + 0² + 0² ≡ 0 (mod 3).

In fact, every output of the ternary Golay code will be self-orthogonal. To see this, let v be a 1 by 5 row vector. Then its encoding is vG where G is the 5 by 11 matrix in the previous post. The inner product of vG with itself is

vG (vG)T = vGGTvT.

We can easily show that GGT is a zero matrix using Python:

    >>> (G@G.T)%3
    [[0 0 0 0 0]
     [0 0 0 0 0]
     [0 0 0 0 0]
     [0 0 0 0 0]
     [0 0 0 0 0]]

And so

vGGTvT = vOvT = 0.

If a vector is not self-orthogonal, there has been an error. For example, if one of the 0s in our output turned into a 1 or a 2, the inner product of the corrupted vector with itself would be equal to 1 (mod 3) rather than 0.

Self-orthogonality is necessary but not sufficient for an encoded vector to be error-free. Correctly encoded vectors form a 5 dimensional subspace of GF(3)11, namely the row space of G. The set of self-orthogonal vectors in GF(3)11 is a larger space.

A sufficient condition for a row vector w to be the encoding of a data vector v is for


to be the zero vector (carrying out all calculations mod 3).

The ternary Golay code is capable of detecting and correcting corruption in up to two spots in a vector. For example, suppose we take our vector

[1 0 1 2 2 0 1 0 2 0 0]

above and corrupt it by turning the first couple 2’s to 1’s.

[1 0 1 1 1 0 1 0 2 0 0]

We’d still get a self-orthogonal vector, but the product GwT would not be zero.

    >>> v = np.array( [1, 0, 1, 2, 2] )
    >>> w = (v@G)%3
    >>> w[3] = w[4] = 1
    >>> print((G@w)%3)

This prints

    [0 0 0 2 2]

which lets us know that the modified version of w is not a valid encoding, not a part of a row space of G.

If we know (or are willing to assume) that

[1 0 1 1 1 0 1 0 2 0 0]

is the result of no more than two changes applied a valid code word vG, then we can recover vG by looking for the closest vector in the row space of G. Here closeness is measured in Hamming distance: the number of positions in which vectors differ. Every vector in GF(3)11 is within a distance of 2 from a unique code word, and so the code can correct any two errors.

Related posts

Ternary Golay code in Python

Marcel Golay discovered two “perfect” error-correcting codes: one binary and one ternary. These two codes stick out in the classification of perfect codes [1].

The ternary code is a linear code over GF(3), the field with three elements. You can encode a list of 5 base-three digits by multiplying the list as a row vector by the following generator matrix on the right:

\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 2 & 2 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 1 & 2 & 1 & 0 & 2 \\ 0 & 0 & 1 & 0 & 0 & 1 & 2 & 1 & 0 & 1 & 2 \\ 0 & 0 & 0 & 1 & 0 & 1 & 2 & 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 0 & 2 & 2 & 1 & 1 \\ \end{pmatrix}

Here the arithmetic is carried out in GF(3): every multiplication and addition is carried out mod 3.

Suppose we want to write this up in Python. How can you tell NumPy that you want to do arithmetic mod 3? I don’t believe you can directly. However, you could just multiply your row vector by the matrix using ordinary arithmetic and reducing the result mod 3 at the end. This gives the same result as if all intermediate operations had been carried out mod 3.

For example, suppose we wanted to encode the list [1, 0, 1, 2, 2].

    import numpy as np

    G = np.array([
      [1, 0, 0, 0, 0, 1, 1, 1, 2, 2, 0], 
      [0, 1, 0, 0, 0, 1, 1, 2, 1, 0, 2], 
      [0, 0, 1, 0, 0, 1, 2, 1, 0, 1, 2], 
      [0, 0, 0, 1, 0, 1, 2, 0, 1, 2, 1], 
      [0, 0, 0, 0, 1, 1, 0, 2, 2, 1, 1], 

    v = np.array( [1, 0, 1, 2, 2] )
    print ((v@G)%3)

The vector-matrix product v@G contains integers that are larger than 3:

    [1 0 1 2 2 6 7 6 8 9 6]

But when we reduce everything mod 3 we get

    [1 0 1 2 2 0 1 0 2 0 0]

This doesn’t mean that linear algebra over a finite field is trivial, though it was trivial in this case.

The same trick would work if we were to work modulo any prime. So the analogous trick would work mod 7, for example.

However, the trick would not work for the field with 8 elements, for example, because arithmetic in this field is not simply arithmetic mod 8. (You can read why here.) So our trick works for GF(p) for any prime p, but not in GF(pk) for any k > 1.

Related posts

[1] From “Sphere Packings, Lattices and Groups” by J. H. Conway and N. J. A. Sloane:

Perfect codes were essentially classified by Tietäväinen and van Lint. The list is as follows.

(i) Certain trivial codes …
(ii) Hamming codes …
(iii) Nonlinear codes with the same parameters as Hamming codes …
(iv) The binary [23, 12, 7] binary Golay code C23 and the ternary [11, 6, 5] Golay code C11.

Error correcting code from octonions

Yesterday I wrote about how to multiply octets of real numbers, the octonions. Today I’ll show how to create an error correcting code from the octonions. In fact, we’ll create a perfect code in the sense explained below.

We’re going to make a code out of octonions over a binary field. That is, we’re going to work with octets of bits rather than octets of real numbers. Our code is going to produce 8-bit numbers which we can think of as octonions with components in each dimension equal to either 0 or 1, and we’ll carry out addition mod 2.

Basis of the code

The earlier post bootstrapped the multiplication facts for octonions from the equation

e1 e2 = e4

and two symmetry rules:

  1. You can shift the subscripts by a constant amount mod 7.
  2. You can permute each rule if you multiply by the sign of the permutation.

The first rule says we have the following equations:

e1 e2 = e4
e2 e3 = e5
e3 e4 = e6
e4 e5 = e7
e5 e6 = e1
e6 e7 = e2
e7 e1 = e3

Our code is going to be based on the complements of the triples in the multiplication rules above. For example, the first rule contains subcripts 1, 2, and 4, and so its complement contains subscripts 3, 5, 6, and 7. So we want the seven octonions

e3 + e5 + e6 + e7
e1 + e4 + e6 + e7
e1 + e2 + e5 + e7
e1 + e2 + e3 + e6
e2 + e3 + e4 + e7
e1 + e3 + e4 + e5
e2 + e4 + e5 + e6

and one more octonion, the sum of all the bases:

1 + e1 + e2 + e3 + e4 + e5 + e6 + e7

Our code will be spanned by these eight octonions. Written as 8-bit numbers, or code will be spanned by


Every pair of binary numbers above differs in four positions. We can verify this with the following Python code.

    from itertools import combinations, product

    def popcount(n):
        return bin(n).count('1')

    def hd(m, n): # Hamming distance
        return popcount(m ^ n)

    gen = [

    for pair in combinations(gen, 2):
        assert(hd(pair[0], pair[1]) == 4)

We didn’t need to import product for the code above, but we’ll need it below.

Note that the popcount of the XOR of two numbers counts how many places where their bit representations differ, i.e. it computes the Hamming distance between the two bit vectors.

Hamming code

The fact that the octets above spread out well, each being a Hamming distance of 4 from all the rest, suggests they could be useful in coding, and in fact the vectors above span the Hamming code (8, 4, 4).

These octets from the previous section will be the basis of our code, “basis” in the colloquial sense of a starting point. They span our set of code words, but they’re not a basis in the linear algebra sense because they are not linearly independent. The following code shows that the octets span a 4 dimensional space, i.e. you can make 24 different bit patterns by XORing together combinations of these octets.

    codewords = set()
    n = 4
    for b in combinations(gen, n):
        for coeff in product(range(2), repeat=n):
            s = 0
            for i in range(n):
                s ^= coeff[i]*b[i]

    assert(len(codewords) == 16)

The code tells that our octets have a span of size 16, i.e. 4 dimensions over a 2-bit field.

(Normally this would be presented by doing linear algebra, but I’m doing it by brute force just for variety. I assume readers who are more comfortable with code than math will appreciate this.)

Perfect codes

The Hamming (8, 4, 4) code is “perfect” in the sense that its packing radius equals its covering radius.

The packing diameter is the minimum distance between code words, and the packing radius is half the packing diameter.

The following code shows this is 2.

    packing_radius = min(
        hd(p[0], p[1]) for p in combinations(codewords, 2))/2

Since the packing diameter is 4, we can detect if up to 3 bits are corrupted: no bit pattern that differs from a code word in three positions is a code word.

The covering radius is the maximum distance from any vector to a code word. The following Python snippet shows that no 8-bit vector is a Hamming distance of more than 2 from a code word.

    def dist_to_codeword(n):
        return min(hd(n, c) for c in codewords)

    covering_radius = max(dist_to_codeword(n) for n in range(256))

More coding theory posts

Vector spaces and subspaces over finite fields

A surprising amount of linear algebra doesn’t depend on the field you’re working over. You can implicitly assume you’re working over the real numbers R and prove all the basic theorems—say all the theorems that come before getting into eigenvalues in a typical course—and all or nearly all of the theorems remain valid if you swap out the complex numbers for the real numbers. In fact, they hold for any field, even a finite field.

Subspaces over infinite fields

Given an n-dimensional vector space V over a field F we can ask how many subspaces of V there are with dimension k. If our field F is the reals and 0 < k < n then there are infinitely many such subspaces. For example, if n = 2, every line through the origin is a subspace of dimension 1.

Now if we’re still working over R but we pick a basis

e1, e2, e3, …, en

and ask for how many subspaces of dimension k there are in V  that use the same basis elements, now we have a finite number. In fact the number is the binomial coefficient


because there are as many subspaces as there are ways to select k basis elements from a set of n.

Subspaces over finite fields

Let F be a finite field with q elements; necessarily q is a prime power. Let V be an n-dimensional vector space over F. We might need to know how many subspaces of V there are of various dimensions when developing algebraic codes, error detection and correction codes based on vector spaces over finite fields.

Then the number of subspaces of V with dimension k equals the q-binomial coefficient

\binom{n}{k}_q \equiv \frac{[n]_q!}{[k]_q! [n-k]_q!}

mentioned in the previous post. Here [n]q! is the q-factorial of n, defined by

[n]_q! \equiv [1]_q [2]_q [3]_q \cdots [n]_q

and [n]q! is the q-analog of n, defined by

[n]_q \equiv \frac{1 - q^n}{1-q} = 1 + q + q^2 + \cdots q^{n-1}

The fraction definition holds for all n, integer or not, when q ≠ 1. The fraction equals the polynomial in q when n is a non-negative integer.

You can derive the expression for the number of subspaces directly from a combinatorial argument, not using any of the q-analog notation, but this notation makes things much more succinct.

Python code

We can easily code up the definitions above in Python.

    def analog(n, q):
        return (1 - q**n)//(1 - q)

    def qfactorial(n, q):
        p = 1
        for k in range(1, n+1):
            p *= analog(k, q)
        return p

    def qbinomial(n, k, q):
        d = qfactorial(k, q)*qfactorial(n-k, q)
        return qfactorial(n, q)/d


To keep things small, let’s look at the field with q = 3 elements. Here addition and multiplication are carried out mod 3.

Let n = 3 and k = 1. So we’re looking for one-dimensional subspaces of F³ where F is the field of integers mod 3.

A one-dimensional subspace of vector space consists of all scalar multiples of a vector. We can only multiply a vector by 0, 1, or 2. Multiplying by 0 gives the zero vector, multiplying by 1 leaves the vector the same, and multiplying by 2 turns 1s into 2s and vice versa. So, for example, the vector (1, 2, 0) is a basis for the subspace consisting of

(0, 0, 0), (1, 2, 0}, (2, 1, 0).

We can find all the subspaces by finding a base for each subspace. And with a little brute force effort, here’s a list.

  • (1, 0, 0)
  • (0, 1, 0)
  • (0, 0, 1)
  • (0, 1, 1)
  • (1, 0, 1)
  • (1, 1, 0)
  • (1, 2, 0)
  • (0, 1, 2)
  • (2, 0, 1)
  • (1, 1, 1)
  • (2, 1, 1)
  • (1, 2, 1)
  • (1, 1, 2)

It’s easy to check that none of these vectors is a multiple mod 3 of another vector on the list. The theory above says we should expect to find 13 subspaces, and we have, so we must have found them all.

Related posts

Redundant Residue Number Systems

You can uniquely represent a large number by its remainders when divided by smaller numbers, provided the smaller numbers have no factors in common, and carry out arithmetic in this representation. Such a representation is called a Residue Number System (RNS).

In the 1960’s people realized RNSs could be useful in computer arithmetic. The original papers are now hard to find, but you can find a summary of some of their ideas here. We will give examples working in a particular RNS below.

When you work with remainders by more numbers than necessary, you have a Redundant Residue Number System (RRNS) that provides error detection and correction. We will also demonstrate this with an example.

RNS example

To be concrete, we’ll use the remainders by 199, 233, 194, and 239 to represent numbers, following the work of C. W. Hastings and R. W. Watson referenced in the link cited above.

Let M = 199*233*194*239. Any non-negative integer less than M can be specified by its remainders mod 199, 233, 194, and 239.

The following Python code generates a random number less than M, represents it by its remainders, and then recovers the original number from the remainders using the Chinese Remainder Theorem.

    from random import randrange
    from sympy import prod
    from sympy.ntheory.modular import crt

    moduli = [199, 233, 194, 239]
    M = prod(moduli)

    x = randrange(M)
    remainders = [x % m for m in moduli]
    # See footnote [1]
    y = crt(moduli, remainders, symmetric=False)[0]


This printed

    [166, 204, 57, 235]

You can add, subtract, and multiply numbers by carrying out the corresponding operations on their remainders. There are three advantages to this.

First, you can work with smaller numbers. In our example, all the moduli are 8-bit numbers; we can carry out arithmetic on 32-bit numbers [2] by only working directly with 8-bit numbers. We could use the same idea to represent extremely large numbers by their remainders via 64-bit integers.

Second, we can do our calculations in parallel, working with each of our remainders at the same time.

Third, there are no carries. There’s no need to keep track of whether component calculations overflow.

The following code shows how we can add two numbers via their remainders.

    a = randrange(M//2)
    b = randrange(M//2)

    arem = [a % m for m in moduli]
    brem = [b % m for m in moduli]
    crem = [z[0] + z[1] for z in zip(arem, brem)]
    c = crt(moduli, crem, symmetric=False)[0]

    print(a + b)

When I ran this code, it printed 832537447 twice.


Now we get to the redundant part. Suppose we add more numbers to our list of moduli, larger than the previous numbers and relatively prime to all of them and to each other. Now working modulo this larger list of numbers, we have more information than we need. If the results we get from using various subsets of the list of moduli are inconsistent, we’ve detected an error. And with enough extra information, we can correct the error.

Watson and Hastings added 251 and 509 in their example, and so we’ll do the same.

As before, we will generate a couple random numbers and represent them via their remainders, though now by a longer list of remainders. We will deliberately corrupt one of the component sums and then compute their sum using different choices of four moduli from the set of six.

    xmoduli = [199, 233, 194, 239, 251, 509]
    a = randrange(M//2)
    b = randrange(M//2)
    aremx = [a % m for m in xmoduli]
    bremx = [b % m for m in xmoduli]
    cremx = [z[0] + z[1] for z in zip(aremx, bremx)]
    cremx[1] += 13

    c0 = crt(xmoduli[:4], cremx[:4], symmetric=False)[0]
    c1 = crt(xmoduli[2:], cremx[2:], symmetric=False)[0]
    c2 = crt(xmoduli[:2] + xmoduli[4:], cremx[:2] + cremx[4:], symmetric=False)[0]
    print(c0, c1, c2)

This will return three different answers, so we know that there has been an error. When I ran it I got 70315884, 819846513, and 890162397. If you run the code yourself you’ll get different answers since you’ll generate different random numbers.

Now how do we correct the error? Suppose we know there has only been one error (or more realistically, we are willing to assume that the probability of two errors is tolerably small). Then one of the results above must be correct: the first sum excludes the last two moduli, the second excludes the first two, and the last excludes the middle two. One of them must exclude the error.

The first calculation based on a different subset of moduli that gives one of the results above is correct. The code

    c3 = crt(xmoduli[:1]+xmoduli[2:5], cremx[:1] + cremx[2:5], symmetric=False)[0]

produced 890162397, matching the third sum above, so we know that eliminating the second modulus gives the correct answer. We’ve found the correct answer, and we’ve discovered which component was corrupted.


[1] A couple things about the call to crt require explanation. We set symmetric to False in order to get a positive return value. Otherwise we might get a value that is correct mod M but negative. Also, crt returns not just the solution we’re after but a pair consisting of the solution and the product of the moduli. We take the first element of the pair with [0] to get the part we’re interested in.

[2] Not all 32-bit numbers. with any numbers less than M, and M is between 231 and 232.

Morse code golf

You can read the title of this post as ((Morse code) golf) or as (Morse (code golf)).

Morse code is a sort of approximate Huffman coding of letters: letters are assigned symbols so that more common letters can be transmitted more quickly. You can read about how well Morse code achieves this design objective here.

But digits in Morse code are kinda strange. I imagine they were an afterthought, tacked on after encodings had been assigned to each of the letters, and so had to avoid encodings that were already in use. Here are the assignments:

    | Digit | Code  |
    |     1 | .---- |
    |     2 | ..--- |
    |     3 | ...-- |
    |     4 | ....- |
    |     5 | ..... |
    |     6 | -.... |
    |     7 | --... |
    |     8 | ---.. |
    |     9 | ----. |
    |     0 | ----- |

There’s no attempt to relate transmission length to frequency. Maybe the idea was that all digits are equally common. While in some contexts this is true, it’s not true in general for mathematical and psychological reasons.

There is a sort of mathematical pattern to the Morse code symbols for digits. For 1 ≤ n ≤ 5, the symbol for n is n dots followed by 5-n dashes. For 6 ≤ n ≤ 9, the symbol is n-5 dashes followed by 10-n dots. The same rule extends to 0 if you think of 0 as 10.

A more mathematically satisfying way to assign symbols would have been binary numbers padded to five places:

    0 -> .....
    1 -> ....-
    2 -> ..._.

Because the Morse encoding of digits is awkward, it’s not easy to describe succinctly. And here is where golf comes in.

The idea of code golf is to write the shortest program that does some task. Fewer characters is better, just as in golf the lowest score wins.

Here’s the challenge: Write two functions as small you can, one to encode digits as Morse code and another to decode Morse digits. Share your solutions in the comments below.

Related posts

MDS codes

A maximum distance separable code, or MDS code, is a way of encoding data so that the distance between code words is as large as possible for a given data capacity. This post will explain what that means and give examples of MDS codes.


A linear block code takes a sequence of k symbols and encodes it as a sequence of n symbols. These symbols come from an alphabet of size q. For binary codes, q = 2. But for non-trivial MDS codes, q > 2. More on that below.

The purpose of these codes is to increase the ability to detect and correct transmission errors while not adding more overhead than necessary. Clearly n must be bigger than k, but the overhead n-k has to pay for itself in terms of the error detection and correction capability it provides.

The ability of a code to detect and correct errors is measured by d, the minimum distance between code words. A code has separation distance d if every pair of code words differs in at least d positions. Such a code can detect up to d errors per block and can correct ⌊(d − 1)/2⌋ errors.


The following example is not an MDS code but it illustrates the notation above.

The extended Golay code used to send back photos from the Voyager missions has q = 2 and [n, k, d] = [24, 12, 8]. That is, data is divided into segments of 12 bits and encoded as 24 bits in such a way that all code blocks differ in at least 8 positions. This allows up to 8 bit flips per block to be detected, and up to 3 bit flips per block to be corrected.

(If 4 bits were corrupted, the result could be equally distant between two valid code words, so the error could be detected but not corrected with certainty.)

Separation bound

There is a theorem that says that for any linear code

k + dn + 1.

This is known as the singleton bound. MDS codes are optimal with respect to this bound. That is,

k + d = n + 1.

So MDS codes are optimal with respect to the singleton bound, analogous to how perfect codes are optimal with respect to the Hamming bound. There is a classification theorem that says perfect codes are either Hamming codes or trivial with one exception. There is something similar for MDS codes.


MDS codes are essentially either Reed-Solomon codes or trivial. This classification is not as precise as the analogous classification of perfect codes. There are variations on Reed-Solomon codes that are also MDS codes. As far as I know, this accounts for all the known MDS codes. I don’t know that any others have been found, or that anyone has proved that there are no more.

Trivial MDS codes

What are these trivial codes? They are the codes with 0 or 1 added symbols, and the duals of these codes. (The dual of an MDS code is always an MDS code.)

If you do no encoding, i.e. take k symbols and encode them as k symbols, then d = 1 because different code words may only differ in one symbol. In this case n = k and so k + d = n + 1, i.e. the singleton bound is exact.

You could take k data symbols and add a checksum. If q = 2 this would be a parity bit. For a larger alphabet of symbols, it could be the sum of the k data symbols mod q. Then if two messages differ in 1 symbol, they also differ in added checksum symbol, so d = 2. We have n = k + 1 and so again k + d = n + 1.

The dual of the code that does no encoding is the code that transmits no information! It has only one code word of size n. You could say, vacuously, that d = n because any two different code words differ in all n positions. There’s only one code word so k = 1. And again k + d = n + 1.

The dual of the checksum code is the code that repeats a single data symbol n times. Then d = n because different code words differ in all n positions. We have k = 1 since there is only one information symbol per block, and so k + d = n + 1.

Reed Solomon codes

So the stars of the MDS show are the Reed-Solomon codes. I haven’t said how to construct these codes because that deserves a post of its own. Maybe another time. For now I’ll just say a little about how they are used in application.

As mentioned above, the Voyager probes used a Golay code to send back images. However, after leaving Saturn the onboard software was updated to use Reed-Solomon encoding. Reed-Solomon codes are used in more down-to-earth applications such as DVDs and cloud data storage.

Reed-Solomon codes are optimal block codes in terms of the singleton bound, but block codes are not optimal in terms of Shannon’s theorem. LDPC (low density parity check) codes come closer to the Shannon limit, but some forms of LDPC encoding use Reed-Solomon codes as a component. So in addition to their direct use, Reed-Solomon codes have found use as building blocks for other encoding schemes.

Perfect codes

A couple days ago I wrote about Hamming codes and said that they are so-called perfect codes, i.e. codes for which Hamming’s upper bound on the number of code words with given separation is exact.

Not only are Hamming codes perfect codes, they’re practically the only non-trivial perfect codes. Specifically, Tietavainen and van Lint proved in 1973 that there are three kinds of perfect binary codes:

  1. Hamming codes
  2. One Golay code
  3. Trivial codes

I wrote about Golay codes a few months ago. There are two binary Golay codes, one with 23 bit words and one with 24 bit words. The former is “perfect.” But odd-length words are awkward to use, and in practice you’d usually tack on one more parity bit, creating an “imperfect” but useful error-correcting code. The Voyager probes used the 24-bit Golay code to send photos of Jupiter and Saturn back to earth.

Trivial codes

Trivial codes simply repeat each word of input n times. For example, 01100 would be encoded as 0110001100 in a trivial code with n = 2. In order to be a perfect code, a trivial code must have n odd.

Trivial codes are the first and simplest thing you’d think of if you wanted to create an error correcting code. You might, for example, send a message three times. Then if a bit in a single word is corrupted in one copy of the message, but the corresponding word is the same in the latter two copies, you use their common value as the assumed correct value.

While trivial codes are obvious, they’re not efficient. For example, suppose we have message words of 5 bits, say to encode English letters and a few other symbols. If we triple each message word, it takes 15 bits of code word to transmit 5 bits of message.

But if we used a Hamming (15, 11) code, each 15-bit code word carries 11 message bits and 4 parity bits. This code has the same Hamming separation as the trivial code with n = 3, but it carries over twice as much information per code word, 11 bits versus 5 bits.

As word size increases, the efficiency of Hamming codes approaches 1. With trivial codes, the efficiency is constantly 1/n.

Trivial codes show that “perfect” does not mean optimal in a practical sense. Perfect codes are optimal with respect to the Hamming bound, but maybe not optimal in practice, depending on what criteria you’re optimizing over. Trivial codes are optimal if you’re optimizing for conceptual simplicity, but not if you’re optimizing for efficiency.

More coding theory posts

A gentle introduction to Hamming codes

The previous post looked at how to choose five or six letters so that their Morse code representations are as distinct as possible. This post will build on the previous one to introduce Hamming codes. The problem of finding Hamming codes is much simpler in some ways, but also more general.

Morse code is complicated for a several reasons. First, it seems at first blush to have an alphabet of two symbols—dot and dash—but it actually has an alphabet of three symbols—dot, dash, and space—with a complicated constraints. Second, we’re trying to optimize the perceptual separation between letters, not the objective separation between signals on a wire. Third, dots and dashes have different lengths, and we have secondary objectives related to transmission time. Given the option, we would prefer to choose letters that reduce transmission time, but we would also like to choose letters that each have similar transmission time.

We will simplify the situation in this post by using exactly two symbols, 0 and 1, and using bit sequences of fixed length. We’ll also have one simple objective, maximizing Hamming distance separation, with no secondary objectives.


We’ll need to introduce some terminology to make the rest of the post more clear.

By alphabet we will mean the choice of symbols we use. For Morse code, the “alphabet” of symbols is dot, dash, and space. For a binary sequence, the alphabet is 0 and 1.

By word we will mean a sequence of symbols from our alphabet. Using this terminology, we would call ..-. in Morse code a word, even though it represents an English letter. You will also see the term vector used instead of word; mentally substitute vector for word everywhere if you find that easier.

The set of words we will use for transmitting messages is called a code. In the previous post, the code consisted originally of A, D, F, G, and V, and we looked at codes that would have been better by several criteria.

If we represent each English letter by a sequence of five bits, we would call the 0s and 1s the elements of our alphabet and the groups of five bits our words.

Binary codes

For this post we will look at words of a fixed length n. For example, we could encode English letters into words of 5 bits each since

25 = 32 > 26

though this would only give us Hamming distance separation of 1, i.e. many of the code words would differ by only one bit. If a single bit were accidentally flipped in transmission, we would not be able to tell. But if we were to use more bits per word, we could have more separation. For example, if we used words of six bits and used the last bit as a parity bit, then we could choose 26 words that have Hamming distance at least 2 from each other.

A(n, d)

The maximum number of binary words of length n, all separated by a Hamming distance of at least d, is denoted

A2(n, d).

The subscript 2 on A says that we’re working with an alphabet of 2 symbols, i.e. 0 and 1. If we were interested in an alphabet of size q, we would replace the 2 with a q.

This notation is a little sideways from our introduction, but closely related. We would like to know, for example, how many bits n we need do produce a code that has, say, 26 symbols separated by a Hamming distance d. That is, we’re thinking of our code size, such as 26, and the Hamming distance d, as being the independent variables and the number of bits n as the dependent variable. But it’s customary in coding theory to consider the word size n and the Hamming distance separation d as independent variables, and to consider Aq(n, d) as a function of n and d.

There is no way to compute Aq(n, d) in general, even for a fixed q such as q = 2. But there are upper and lower bounds, and exact values have been computed in many particular cases.

For example, A2(10, 4) = 40, and so using sequences of 10 bits, you can find a set of 40 words that are all a Hamming distance of at least 4 from each other, enough to encode all English letters (without regard to case) and 10 digits.

A2(15, 6) = 128, so with 15 bits you could find 128 words a distance of 6 apart, enough to encode all ASCII characters.

Hamming bound and perfect codes

As mentioned above, there is no convenient formula for A(n, d) but there are bounds. The Hamming bound says

A_q(n, d) \leq \frac{q^n}{\sum_{k=0}^t {n \choose k}(q-1)^k}

where the upper limit of the sum is

t = \left\lfloor \frac{d-1}{2} \right\rfloor

A code for which the Hamming bound is exact is called a perfect code.

Hamming codes

Hamming codes are perfect binary codes where d = 3.

Note that 3 is the minimum separation for error correction. If we simply add a parity bit, as mentioned above, we can detect errors, but we cannot correct them. If code words are a distance 2 apart, a word with one corrupted bit could be equidistant between two valid code words.

For example, suppose you encode 00010 as 000101, adding a parity bit of 1 at the end because the original sequence had an odd number of 1’s. And suppose you similarly encode 00011 as 000110. Now you receive 000100. You know that something is wrong because the parity bit is inconsistent with the previous bits. But you don’t know whether 000101 was transmitted and the 6th bit was corrupted or 000110 was transmitted and the 5th bit was corrupted.

With Hamming codes, n is always one less than a power of 2, i.e

n = 2m − 1

and m is the number of added bits. That is, the code will have

2mm − 1

data bits, and the number of distinct code words will be 2 raised to that power. Each of these words is separated by a Hamming distance of at least 3.

Incidentally, note that as m increases, the number of parity bits is growing linearly, but the number of data bits is growing exponentially. That is, the overhead of the parity bits relative to the code word size is going to zero.

Hamming(7,4) code

Let’s look at one Hamming code in detail. If m = 3, we have a code with 7-bit words: 4 data bits and 3 added bits. With 4 data bits we can have 16 different words, so the Hamming code with 7-bit words contains 16 words, each separated by a Hamming distance of 3. If we compute the right side of the Hamming bound we also get 16, i.e. A2(7, 3) = 16, demonstrating that the Hamming bound is exact.

We can encode a set of 4 bits by making them into a row vector and multiplying the vector on the right by the generator matrix

    1 0 0 0 1 1 1
    0 1 0 0 0 1 1
    0 0 1 0 1 0 1
    0 0 0 1 1 1 0

The matrix multiplication is defined using the field with two elements, so multiplication stays the same but 1 + 1 = 0.

Note that the 4 by 4 block on the left side of the matrix is the identity matrix. So the encoding of a string of four bits begins with the original 4 bits. The 16 words in our code are


Unless I’ve made an error in writing this up [1], each of these code words should differ from all other code words in at least 3 positions.

More on coding theory

[1] Errors happen all the time. That’s why we need error correcting codes!