Luhn checksum algorithm

After writing the previous post on credit card numbers, I intended to link to an earlier post that discussed credit card checksums. But I couldn’t find such a post. I’ve written about other kinds of checksums, such as the checksum scheme used in Vehicle Identification Numbers, but apparently I haven’t written about credit card checksums before.

The algorithm used by credit cards, and other identification numbers, is called the Luhn algorithm. I’ll lead up to the algorithm to explain its motivation, going through a thought process Luhn might have gone through in developing his algorithm.

NB: The intermediate steps are not the Luhn algorithm. If you’re just looking for Luhn’s algorithm, skip to the end.

Step 1

A simple way to create a checksum is to add up the digits of a number and tack the sum mod 10 onto the end. For example, this process would replace 1776 with 17761 because

1 + 7 + 7 + 6 = 21

and 21 mod 10 = 1.

This simple checksum will detect if any single digit has been changed. For example, if we receive 17791 we can compute the checksum of 1779 and tell that something is wrong because the last digit should be 4.
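
Here’s a small Python sketch of this first scheme; the function name is mine, not anything standard.

    def append_simple_check(digits: str) -> str:
        # append the sum of the digits mod 10
        return digits + str(sum(int(d) for d in digits) % 10)

    print(append_simple_check("1776"))  # prints 17761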

Step 2

A common kind of error is transposing consecutive digits, and the scheme above will not detect transpositions: all digits are treated the same way, so scrambling them doesn’t change the checksum.

Luhn’s idea was to double every other digit before taking the sum. This creates an asymmetry between the contribution of consecutive digits to the check sum and makes it possible to detect all transpositions.

Which digits to double

The method described above works whether you start from the left end or the right end, and whether you start doubling with the first or second digit you encounter. But to be compatible with the Luhn algorithm used in practice, start on the right end and double the last digit.

So going back to our little example, 1776 becomes 17764 because

2*6 + 7 + 2*7 + 1 = 34.

If we accidentally transpose the first two digits, 17764 becomes 71764. But the checksum of 7176 is 2*6 + 7 + 2*1 + 7 = 28, which ends in 8. So 71768 would be valid but 71764 is not. We’ve detected an error.

The advantage of starting at the right end is that adding zeros to the front of the number doesn’t change the checksum. So, for example, 01776 becomes 017764.
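
The doubling scheme is also easy to code up. Here’s a sketch, again with a made-up function name, that starts doubling from the right end:

    def append_doubled_check(digits: str) -> str:
        total = sum(2*d if i % 2 == 0 else d
                    for i, d in enumerate(int(c) for c in reversed(digits)))
        return digits + str(total % 10)

    print(append_doubled_check("1776"))   # 17764
    print(append_doubled_check("01776"))  # 017764
    print(append_doubled_check("7176"))   # 71768, so the transposed 71764 is rejected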

Step 3

The next refinement is strange. Instead of simply doubling every other digit, we’ll double every other digit and reduce mod 9.

So we could calculate the checksum for 1776 by first doubling every other digit

1776 -> 1, 14, 7, 12

then reducing the doubled digits mod 9, i.e. replacing 14 by 5 and 12 by 3,

1, 14, 7, 12 -> 1, 5, 7, 3

and then finally taking the last digit of the sum.

1 + 5 + 7 + 3 = 16 -> 6.

What does this extra complication buy us? Nothing as far as I can tell. It weakens the algorithm. The algorithm in Step 2 could detect every transposition of adjacent digits; the algorithm here in Step 3 cannot. For example, it cannot detect transposing 19 to 91: whichever of the two digits falls in a doubled position, one order contributes 1 to the sum and the other contributes 11, and a difference of 10 is invisible mod 10.

Update: Nathan points out in the comments that without the mod 9 step some changes to a single digit could go undetected, namely changing a digit in a doubled position by 5, since doubling d and doubling d + 5 give results that differ by 10. For example, 33 and 38 would have the same checksum in Step 2 but not in Step 3.

Step 4

The final step is to take the tens complement of the checksum computed above. This adds nothing to the effectiveness of the algorithm, but it’s what Luhn did.

Luhn algorithm in practice

Let’s describe the position of digits in a number by numbering them from the right end, starting with 0: the last digit is in position 0, the one to its left is in position 1, etc.

We can summarize the Luhn algorithm in the following pseudocode.

    sum = 0
    for d in digits:
        if d is in an even position:
            sum = sum + (2*d) mod 9
        else:
            sum = sum + d
    return 10 - sum mod 10

The last line isn’t quite correct. If sum mod 10 is 0, then this returns 10 when it should return 0. A clever way around this is to change the last line to

    return (sum*9) mod 10

which will always return a single digit.

NB: In our examples above, the checksum was added to the end. Some credit cards do this, but others put the checksum somewhere in the interior of the number.
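
Putting the pieces together, here’s a short Python sketch of the whole procedure. One nuance: the Luhn algorithm as used in practice replaces a doubled digit greater than 9 by the sum of its digits, so a doubled 9 becomes 9 rather than 18 mod 9 = 0; the two rules agree for every other digit, and under the practical rule the one adjacent transposition that slips through is 09 ↔ 90. The function names below are mine.

    def luhn_check_digit(payload: str) -> int:
        total = 0
        for position, d in enumerate(int(c) for c in reversed(payload)):
            if position % 2 == 0:   # even positions, counting from the right
                d *= 2
                if d > 9:           # e.g. 9 doubles to 18, which becomes 9
                    d -= 9
            total += d
        return (total * 9) % 10

    def luhn_valid(number: str) -> bool:
        # assumes the check digit is the last digit, as in the examples above
        return luhn_check_digit(number[:-1]) == int(number[-1])

    print(luhn_check_digit("1776"))  # 4
    print(luhn_valid("17764"))       # True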

Afterword

The Luhn algorithm was developed in 1954, though it is still widely used. Since that time much more sophisticated error correcting codes have been developed. See the posts below for examples.

Tradeoff between alphabet size and word size

Literal alphabets

Natural language alphabets are all within an order of magnitude of the size of the Roman alphabet. The Hebrew alphabet has a few fewer letters and Russian has a few more. The smallest alphabet I’m aware of is Hawaiian with 13 letters.

Syllabaries are larger than alphabets, but not an order of magnitude larger. For example, the Japanese kana syllabaries use 46 basic symbols and the Cherokee syllabary uses 85.

There is a tradeoff between alphabet size and word size that constrains alphabets to be roughly the same size in every culture.

Metaphorical alphabets

Computer science generalizes the term alphabet to mean any fixed collection of symbols and generalizes word to mean any finite sequence of symbols from an alphabet. So, for example, the digits 0 through 9 form an alphabet in this sense, and the words are numbers.

Numbers are sometimes written using smaller or larger alphabets. Binary notation uses an alphabet of only 2 symbols, but numbers require significantly more symbols to express than they do in decimal. Hexadecimal uses an alphabet of 16 symbols and is more compact than decimal notation.

Tradeoffs

Many design problems can be formulated as a tradeoff between alphabet length and word length. I was working on such a problem recently, which motivated this blog post.

Larger alphabets mean shorter words, but a linear reduction in word size requires an exponential increase in alphabet size. For example, we could write numbers using half as many digits if we thought in base 100 and had symbols for numbers 0 through 99. Or we could think in base 1000, increasing the number of symbols by a factor of 100 and reducing the length of number representations by a factor of 3.

Word length is inversely proportional to the logarithm of alphabet size, and so if we have some cost proportional to alphabet size a and another cost proportional to word length, the optimal alphabet size for the sum of these costs would minimize

C1a + C2 / log a.

We could divide this objective function by C1 and set w = C2/C1 to simplify the objective to minimizing

a + w / log a

Setting the derivative with respect to a to zero shows that the optimal alphabet size a is given by solving

a (log a)² = w.

As the weight w given to the cost of word size increases, the optimal alphabet size increases as well. A wide range of weights corresponds to a narrower range of alphabet sizes.

A plot of the optimal alphabet size as a function of w could be misleading: such a graph is not showing the cost associated with various alphabet sizes but rather the optimal alphabet size for a given word size cost. The weight w is fixed in any particular application.

An alphabet of size 10 is optimal for a word length cost weight w of about 50. With w = 50, comparing the two costs shows that a large increase in alphabet size cost only buys a small decrease in word size cost.
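
Here’s a quick numerical check of that claim, a sketch using SciPy’s root finder and assuming the logarithm is a natural log, which is consistent with the w ≈ 50 figure above.

    from math import log
    from scipy.optimize import brentq

    def optimal_alphabet_size(w):
        # solve a (log a)^2 = w for a
        return brentq(lambda a: a*log(a)**2 - w, 1.001, 1e6)

    print(optimal_alphabet_size(50))  # about 9.7, close to 10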

Related post: The base with the largest decibel

Morse code numbers and abbreviations

Numbers in Morse code seem a little strange. Here they are:

    |-------+-------|
    | Digit | Code  |
    |-------+-------|
    |     1 | .---- |
    |     2 | ..--- |
    |     3 | ...-- |
    |     4 | ....- |
    |     5 | ..... |
    |     6 | -.... |
    |     7 | --... |
    |     8 | ---.. |
    |     9 | ----. |
    |     0 | ----- |
    |-------+-------|

They’re fairly regular, but not quite. That’s why a couple years ago I thought it would be an interesting exercise to write terse code to encode and decode digits in Morse code. There’s exploitable regularity, but it’s irregular enough to make the exercise challenging.

Design

As with so many things, this scheme makes more sense than it seems at first. When you ask “Why didn’t they just …” there’s often a non-obvious answer.

The letters largely exhausted the possibilities of up to 4 dots and dashes. Some digits would have to take five symbols, and it makes sense that they would all take 5 symbols. But why the ones above? This scheme uses a lot of dashes, and dashes take three times longer to transmit than dots.

A more efficient scheme would be to use binary notation, with a dot for 0s and a dash for 1s. That way the leading symbol would always be a dot and usually the second would be a dot too, at least when encoding the digits 0 through 9. As a bonus you could use the same scheme to encode larger numbers in a single Morse code entity.

The problem with this scheme is that Morse code is intended for humans to decode by ear. A binary scheme would be hard to hear. The scheme actually used is easy to hear because you only change from dot to dash at most once. As Morse code entities get longer, the patterns get simpler. Punctuation marks take six or more dots and dashes, but they have simple patterns that are easy to hear.

Code golf

When I posed my coding exercise as a challenge, the winner was Carlos Luna-Mota with the following Python code.

    S="----.....-----"
    e=lambda x:S[9-x:14-x]
    d=lambda x:9-S.find(x)

Honorable mention goes to Manuel Eberl with the following code. It only does decoding, but is quite clever and short.

    d=lambda c:hash(c+'BvS+')%10

It only works in Python 2 because it depends on the specific hashing algorithm used in earlier versions of Python.

Cut numbers

If you’re mixing letters and digits, digits have to be five symbols long. But if you know that characters have to be digits in some context, this opens up the possibility of shorter encodings.

The most common abbreviations are T for 0 and N for 9. For example, a signal report is always three digits, and someone may send 5NN rather than 599 because in that context it’s clear that the N’s represent 9s.

When T abbreviates 0 it may be sent as a “long dash,” slightly longer than an ordinary dash, to distinguish it from a literal T. This is not strictly according to Hoyle, but it is sometimes done.

There are more abbreviations, so called cut numbers, though these are much less common and therefore less likely to be understood.

    |-------+-------+-----+--------+----|
    | Digit | Code  |  T1 | Abbrev | T2 |
    |-------+-------+-----+--------+----|
    |     1 | .---- |  17 | .-     |  5 |
    |     2 | ..--- |  15 | ..-    |  7 |
    |     3 | ...-- |  13 | ...-   |  9 |
    |     4 | ....- |  11 | ....-  | 11 |
    |     5 | ..... |   9 | .      |  1 |
    |     6 | -.... |  11 | -....  | 11 |
    |     7 | --... |  13 | -...   |  9 |
    |     8 | ---.. |  15 | -..    |  7 |
    |     9 | ----. |  17 | -.     |  5 |
    |     0 | ----- |  19 | -      |  3 |
    |-------+-------+-----+--------+----|
    | Total |       | 140 |        | 68 |
    |-------+-------+-----+--------+----|

The space between dots and dashes is equal to one dot, and the length of a dash is the length of three dots. So the time required to send a sequence of dots and dashes equals

2(# dots) + 4(# dashes) – 1

In the table above, T1 is the time to transmit a digit, in units of dots, without abbreviation, and T2 is the time with abbreviation. Both the maximum time and the average time are cut approximately in half. Of course that’s ideal transmission efficiency, not psychological efficiency. If the abbreviations are not understood on the receiving end and the receiver asks for numbers to be repeated, the shortcut turns into a longcut.
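
Here’s a short sketch that recomputes the T1 and T2 columns and their totals from the timing formula above.

    def duration(code):
        # 2 units per dot, 4 per dash, minus 1 for the missing final gap
        return 2*code.count(".") + 4*code.count("-") - 1

    full = ".---- ..--- ...-- ....- ..... -.... --... ---.. ----. -----".split()
    cut  = ".- ..- ...- ....- . -.... -... -.. -. -".split()

    print(sum(duration(c) for c in full))  # 140
    print(sum(duration(c) for c in cut))   # 68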


Frequency shift keying (FSK) spectrum

This post will look at encoding digital data as an analog signal using frequency shift keying (FSK), first directly and then with windowing. We’ll look at the spectrum of the encoded signal and show that basic FSK uses much less bandwidth than direct encoding, but more bandwidth than FSK with windowing.

Square waves

The most natural way to encode binary data as an analog signal would be to represent 0s and 1s by a sequence of pulses that take on the values 0 and 1.

A problem with this approach is that it would require a lot of bandwidth.

In theory a square wave has infinite bandwidth: its Fourier series has an infinite number of non-zero coefficients. In practice, the bandwidth of a signal is determined by how many Fourier coefficients it has above some threshold. The threshold would depend on context, but let’s say we ignore Fourier components with amplitude smaller than 0.001.

As I wrote about here, the nth Fourier sine series coefficient for a square wave equals 4/(nπ) for odd n. This means we would need on the order of 1,000 terms before the coefficients drop below our threshold.
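
A couple of lines of Python make the estimate concrete; this is just a sketch of the order-of-magnitude claim above.

    from math import pi

    n = 1
    while 4/(n*pi) >= 0.001:  # sine coefficients of the square wave, odd n only
        n += 2
    print(n)  # 1275, i.e. on the order of 1,000 terms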

Frequency shift keying

The rate of convergence of the Fourier series for a function f depends on the smoothness of f. Discontinuities, like the jump in a square wave, correspond to slow convergence, i.e. high bandwidth. We can save bandwidth by encoding our data with smoother functions.

So instead of jumping from 0 to 1, we’ll encode a 0 as a signal of one frequency and a 1 as a signal with another frequency. By changing the frequency after some whole number of periods, the resulting function will be continuous, and so will have smaller bandwidth.

Suppose we have a one second signal f(t) that is made of half a second of a 4 Hz signal and half a second of a 6 Hz signal, possibly encoding a 0 followed by a 1.

What would the Fourier coefficients look like? If we just had a 4 Hz sine wave, the Fourier series would have only one component: the signal itself at 4 Hz. If we just had a 6 Hz sine wave, the only Fourier component would again be the signal itself. There would be no sine components at other frequencies, and no cosine components.

But our signal, patched together from a 4 Hz piece and a 6 Hz piece, has non-zero cosine terms for every odd n, and these coefficients decay like O(1/n²).

Our Fourier series is

f(t) = 0.25 sin 8πt + 0.25 sin 12πt + 0.0303 cos 2πt + 0.1112 cos 6πt – 0.3151 cos 10πt + 0.1083 cos 14πt + …

We need to go out to 141 Hz before the coefficients drop below 0.001. That’s a lot of coefficients, but it’s an order of magnitude fewer coefficients than we’d need for a square wave.
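
Here’s a numerical sketch that samples the signal and prints the magnitudes of its first few normalized Fourier components (bin k corresponds to k Hz); the values line up with the coefficients quoted above.

    import numpy as np

    N = 4096
    t = np.arange(N) / N
    f = np.where(t < 0.5, np.sin(8*np.pi*t), np.sin(12*np.pi*t))

    mags = np.abs(np.fft.rfft(f)) / N  # magnitudes of the Fourier components
    print(np.round(mags[:8], 4))
    # roughly [0, 0.0303, 0, 0.1112, 0.25, 0.3151, 0.25, 0.1083]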

Pulse shaping

Although our function f is continuous, it is not differentiable. The left-hand derivative at 1/2 is 8π and the right-hand derivative is 12π. If we could replace f with a related function that is differentiable at 1/2, presumably the signal would require less bandwidth.

We could do this by multiplying both halves of our signal by a windowing function. This is called pulse shaping because instead of a simple sine wave, we change the shape of the wave, tapering it at the ends.

Let’s use a cosine window because that’ll be easy; in practice you’d probably use a different window [1].

Now our function is differentiable at 1/2, and its Fourier series converges more quickly. Now we can disregard components above 40 Hz. With a smoother windowing function the windowed function would have more derivatives and we could disregard more of the high frequencies.


[1] This kind of window is called a cosine window because you multiply your signal by one lobe of a cosine function, with the peak in the middle of the signal. Since we’re doing this over [0, 1/2] and again over [1/2, 1], we’re actually multiplying by |sin 2πt|.

Varicode

Varicode is a way of encoding text and control characters into binary using code words of variable length. It was developed as part of the PSK31 protocol for digital communication over amateur radio.

In the spirit of Morse code, it uses short code words for common characters and longer code words for less common characters in the expectation that this will result in shorter encodings.

If you use variable length words, you’ve got to have some way of knowing when one word ends and the next begins. Varicode solves this problem by using only code words that begin and end with 1 and that do not contain two consecutive zeros. Then 00 is inserted between code words. Since 00 cannot appear inside a code word, these bits unambiguously mark the space between code words.
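
Here’s a sketch of the framing idea. The mini code table below is hypothetical (only the code word for a space, a single 1, comes from the actual standard); it’s just enough to show how the 00 separators make decoding unambiguous.

    code = {" ": "1", "e": "11", "t": "101"}  # hypothetical mini-table, not real Varicode
    decode = {bits: char for char, bits in code.items()}

    def encode(text):
        return "00".join(code[c] for c in text)

    def decode_stream(bits):
        return "".join(decode[w] for w in bits.split("00") if w)

    msg = encode("te t")
    print(msg)                 # 101001100100101
    print(decode_stream(msg))  # te t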

Synchronization

Varicode is self-synchronizing in the sense that if you jump into a stream of bits produced by Varicode, as soon as you see two zeros, you know you’re at a code word boundary and can start reading from there. You’ve lost any bits that were transmitted before you jumped in, but you can parse everything going forward.

ASCII doesn’t have this problem or this robustness. It doesn’t have the problem of determining code word boundaries because every code word is eight bits long. But then to read a stream of ASCII bits you need to know your position mod 8. If you jump into a stream of bits, you don’t know where the next code word boundary will be, though you may be able to infer it by trying 8 possibilities and seeing which produces the most intelligible results.

Regular expression

Another way to state the rules for forming Varicode code words is to say that 1 is a valid code (the code for a space, ASCII 0x20) and that you can form new codes by prefixing 1 or 10 to a code. In terms of regular expressions, this says a Varicode code word matches

    (1|10)*1
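
We can sanity-check this against the rules stated earlier, at least for short words, with a small brute-force sketch:

    import re

    pattern = re.compile(r"(1|10)*1")

    def valid(word):
        # begins and ends with 1, with no two consecutive zeros
        return word.startswith("1") and word.endswith("1") and "00" not in word

    words = [format(n, "b") for n in range(1, 2**12)]
    assert all(bool(pattern.fullmatch(w)) == valid(w) for w in words)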

Fibonacci numbers

How many code words are there of length n? Well, there are two ways to make a code word that long: you either put a 1 in front of a code word of length n-1, or you put a 10 in front of a code word of length n-2. So the number of code words of length n equals the number of code words of length n-1 plus the number of code words of length n-2. That is, the number of code words satisfies the same recurrence relation as the Fibonacci numbers.

It’s easy to see that there’s only one code word of length one, and only one code word of length two, so the number of code words of each length satisfies the initial conditions for the Fibonacci sequence as well, and so they are the Fibonacci numbers.
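
A brute force check of the Fibonacci claim for small lengths, just as a sketch:

    def count(n):
        # count n-bit code words: start and end with 1, no two consecutive zeros
        words = (format(i, "b") for i in range(2**(n-1), 2**n))
        return sum(1 for w in words if w.endswith("1") and "00" not in w)

    print([count(n) for n in range(1, 9)])  # [1, 1, 2, 3, 5, 8, 13, 21]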

Efficiency

Varicode encodes a lot more than lower case letters—it encodes most ASCII characters—and so it would take some work to discover the relative frequencies of the characters, and these frequencies would depend on where Varicode is used. As far as I know Varicode is used primarily (only?) in PSK31, and so the relevant frequencies would be the frequencies in messages sent over PSK31, not English more generally.

You can find the code words of each letter here.

To make things easier, let’s suppose messages are limited to lower case letters and spaces, and that the letters follow the same distribution as in English in general.

We can use the letter frequencies here, except these don’t take spaces into account. If we assume words are about 5 letters long, then the probability of a character being a space is 1/6 and the probabilities of the other characters conditional on not being a space are given by the table. This means the letter probabilities need to be multiplied by 5/6.

This gives us an expected length of 3.89 bits per letter, which is effectively 5.89 bits when you consider the 00 pattern we have to add between letters.
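
Here’s the shape of that calculation as a sketch. The two dictionaries are placeholders: letter_freq would hold English letter frequencies and varicode_len the Varicode code word length for each letter, both taken from the tables linked above; neither is reproduced here. The function returns the per-character cost including the two separator bits, i.e. the 5.89-style figure rather than the 3.89 one.

    def expected_bits(letter_freq, varicode_len):
        # a space has code word "1" and probability 1/6; letters get weight 5/6;
        # every character costs an extra 2 bits for the 00 separator
        e = (1/6) * (1 + 2)
        for c, p in letter_freq.items():
            e += (5/6) * p * (varicode_len[c] + 2)
        return e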

You could represent the 26 letters and a few more characters using 5 bits, but the result would not be self-synchronizing. One way of looking at this is to say that the compression provided by variable length encoding nearly pays for the overhead required to make the code self-synchronizing.


Q codes in Seveneves

The first time I heard of Q codes was when reading the novel Seveneves by Neal Stephenson. These are three-letter abbreviations used in Morse code that all begin with Q.

Since Q is always followed by U in native English words, Q can be used to begin a sort of escape sequence [1].

There are dozens of Q codes used in amateur radio [2], and more used in other contexts, but there are only 10 Q codes used in Seveneves [3]. All begin with Q, followed by R, S, or T.

The codes form a tree: Q is followed by R, S, or T; QR is followed by A, K, N, S, or T; QS by B, L, or O; and QT by H or X.

Each Q code can be used both as a question and as an answer or statement. For example, QRS can mean “Would you like me to slow down?” or “Please slow down.” I’ll just give the interrogative forms below.

Here are the 10 codes that appear in Stephenson’s novel.

QRA
What is your call sign?
QRK
Is my signal intelligible?
QRN
Is static a problem?
QRS
Should I slow down?
QRT
Should I stop sending?
QSB
Is my signal fading?
QSL
Are you still there?
QSO
Could you communicate with …?
QTH
Where are you?
QTX
Will you keep your station open for talking with me?


[1] Some Q codes have a U as the second letter. I don’t know why—there are plenty of unused TLAs that begin with Q—but it is what it is.

[2] You can find a list here.

[3] There is one non-standard code in the novel: QET for “not on planet Earth.”

Self-orthogonal vectors and coding

One of the surprising things about linear algebra over a finite field is that a non-zero vector can be orthogonal to itself.

When you take the inner product of a real vector with itself, you get a sum of squares of real numbers. If any element in the sum is positive, the whole sum is positive.

In a finite field, things don’t work that way. A list of non-zero squares can add up to zero.

In the previous post, we looked at the ternary Golay code, an error correcting code that multiplies a vector of data by a rectangular matrix mod 3. That post included the example that

[1, 0, 1, 2, 2]

was encoded as

[1 0 1 2 2 0 1 0 2 0 0].

The output is orthogonal to itself:

1² + 0² + 1² + 2² + 2² + 0² + 1² + 0² + 2² + 0² + 0² ≡ 0 (mod 3).

In fact, every output of the ternary Golay code will be self-orthogonal. To see this, let v be a 1 by 5 row vector. Then its encoding is vG where G is the 5 by 11 matrix in the previous post. The inner product of vG with itself is

vG(vG)^T = vGG^Tv^T.

We can easily show that GG^T is a zero matrix using Python:

    >>> (G@G.T)%3
    [[0 0 0 0 0]
     [0 0 0 0 0]
     [0 0 0 0 0]
     [0 0 0 0 0]
     [0 0 0 0 0]]

And so

vGG^Tv^T = vOv^T = 0.

If a vector is not self-orthogonal, there has been an error. For example, if one of the 0s in our output turned into a 1 or a 2, the inner product of the corrupted vector with itself would be equal to 1 (mod 3) rather than 0.

Self-orthogonality is necessary but not sufficient for an encoded vector to be error-free. Correctly encoded vectors form a 5 dimensional subspace of GF(3)^11, namely the row space of G. The set of self-orthogonal vectors in GF(3)^11 is a larger set.

A further necessary condition for a row vector w to be the encoding of a data vector v is for

Gw^T

to be the zero vector (carrying out all calculations mod 3).
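
Both checks are easy to express in code. Here’s a sketch, assuming the generator matrix G from the previous post is in scope as a NumPy array and w is a length-11 vector of integers:

    import numpy as np

    def self_orthogonal(w):
        # necessary but not sufficient for w to be a code word
        return (w @ w) % 3 == 0

    def passes_check(w):
        # G w^T = 0 mod 3, another necessary condition
        return not ((G @ w) % 3).any()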

The ternary Golay code is capable of detecting and correcting corruption in up to two spots in a vector. For example, suppose we take our vector

[1 0 1 2 2 0 1 0 2 0 0]

above and corrupt it by turning the first two 2s into 1s.

[1 0 1 1 1 0 1 0 2 0 0]

We’d still get a self-orthogonal vector, but the product Gw^T would not be zero.

    >>> v = np.array( [1, 0, 1, 2, 2] )
    >>> w = (v@G)%3
    >>> w[3] = w[4] = 1
    >>> print((G@w)%3)

This prints

    [0 0 0 2 2]

which lets us know that the modified version of w is not a valid encoding: it is not in the row space of G.

If we know (or are willing to assume) that

[1 0 1 1 1 0 1 0 2 0 0]

is the result of no more than two changes applied to a valid code word vG, then we can recover vG by looking for the closest vector in the row space of G. Here closeness is measured in Hamming distance: the number of positions in which vectors differ. Since distinct code words differ in at least five positions, any vector within Hamming distance 2 of a code word is within that distance of a unique code word, and so the code can correct any two errors.
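
Since there are only 3⁵ = 243 possible data vectors, we can do the search by brute force. Here’s a sketch, continuing with the matrix G and the corrupted vector w from the code above:

    from itertools import product
    import numpy as np

    def hamming(a, b):
        # number of positions in which two vectors differ
        return int(np.sum(a != b))

    nearest = min(((np.array(u) @ G) % 3 for u in product(range(3), repeat=5)),
                  key=lambda c: hamming(c, w))
    print(nearest)  # [1 0 1 2 2 0 1 0 2 0 0], the original code word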


Ternary Golay code in Python

Marcel Golay discovered two “perfect” error-correcting codes: one binary and one ternary. These two codes stick out in the classification of perfect codes [1].

The ternary code is a linear code over GF(3), the field with three elements. You can encode a list of 5 base-three digits by multiplying the list as a row vector by the following generator matrix on the right:

\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 2 & 2 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 1 & 2 & 1 & 0 & 2 \\ 0 & 0 & 1 & 0 & 0 & 1 & 2 & 1 & 0 & 1 & 2 \\ 0 & 0 & 0 & 1 & 0 & 1 & 2 & 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 0 & 2 & 2 & 1 & 1 \\ \end{pmatrix}

Here the arithmetic is carried out in GF(3): every multiplication and addition is carried out mod 3.

Suppose we want to write this up in Python. How can you tell NumPy that you want to do arithmetic mod 3? I don’t believe you can directly. However, you could just multiply your row vector by the matrix using ordinary arithmetic and reducing the result mod 3 at the end. This gives the same result as if all intermediate operations had been carried out mod 3.

For example, suppose we wanted to encode the list [1, 0, 1, 2, 2].

    import numpy as np

    G = np.array([
      [1, 0, 0, 0, 0, 1, 1, 1, 2, 2, 0], 
      [0, 1, 0, 0, 0, 1, 1, 2, 1, 0, 2], 
      [0, 0, 1, 0, 0, 1, 2, 1, 0, 1, 2], 
      [0, 0, 0, 1, 0, 1, 2, 0, 1, 2, 1], 
      [0, 0, 0, 0, 1, 1, 0, 2, 2, 1, 1], 
    ])

    v = np.array( [1, 0, 1, 2, 2] )
    print ((v@G)%3)

The vector-matrix product v@G contains integers that are larger than 3:

    [1 0 1 2 2 6 7 6 8 9 6]

But when we reduce everything mod 3 we get

    [1 0 1 2 2 0 1 0 2 0 0]

This doesn’t mean that linear algebra over a finite field is trivial, though it was trivial in this case.

The same trick would work if we were to work modulo any prime. So the analogous trick would work mod 7, for example.

However, the trick would not work for the field with 8 elements, for example, because arithmetic in this field is not simply arithmetic mod 8. (You can read why here.) So our trick works for GF(p) for any prime p, but not in GF(p^k) for any k > 1.


[1] From “Sphere Packings, Lattices and Groups” by J. H. Conway and N. J. A. Sloane:

Perfect codes were essentially classified by Tietäväinen and van Lint. The list is as follows.

(i) Certain trivial codes …
(ii) Hamming codes …
(iii) Nonlinear codes with the same parameters as Hamming codes …
(iv) The [23, 12, 7] binary Golay code C23 and the [11, 6, 5] ternary Golay code C11.

Error correcting code from octonions

Yesterday I wrote about how to multiply octets of real numbers, the octonions. Today I’ll show how to create an error correcting code from the octonions. In fact, we’ll create a perfect code in the sense explained below.

We’re going to make a code out of octonions over a binary field. That is, we’re going to work with octets of bits rather than octets of real numbers. Our code is going to produce 8-bit numbers which we can think of as octonions with components in each dimension equal to either 0 or 1, and we’ll carry out addition mod 2.

Basis of the code

The earlier post bootstrapped the multiplication facts for octonions from the equation

e1 e2 = e4

and two symmetry rules:

  1. You can shift the subscripts by a constant amount mod 7.
  2. You can permute the subscripts within a rule if you multiply by the sign of the permutation.

The first rule says we have the following equations:

e1 e2 = e4
e2 e3 = e5
e3 e4 = e6
e4 e5 = e7
e5 e6 = e1
e6 e7 = e2
e7 e1 = e3

Our code is going to be based on the complements of the triples in the multiplication rules above. For example, the first rule contains subscripts 1, 2, and 4, and so its complement contains subscripts 3, 5, 6, and 7. So we want the seven octonions

e3 + e5 + e6 + e7
e1 + e4 + e6 + e7
e1 + e2 + e5 + e7
e1 + e2 + e3 + e6
e2 + e3 + e4 + e7
e1 + e3 + e4 + e5
e2 + e4 + e5 + e6

and one more octonion, the sum of all eight basis elements:

1 + e1 + e2 + e3 + e4 + e5 + e6 + e7

Our code will be spanned by these eight octonions. Written as 8-bit numbers, our code will be spanned by

00010111
01001011
01100101
01110010
00111001
01011100
00101110
11111111
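
The list above can be generated in a few lines of Python. Here’s a sketch of the shift-and-complement construction, with bit 0 standing for the real unit 1 and bits 1 through 7 for e1 through e7:

    # shift the subscripts of e1 e2 = e4 by a constant amount mod 7
    triples = [tuple(sorted((x + s - 1) % 7 + 1 for x in (1, 2, 4))) for s in range(7)]

    patterns = []
    for t in triples:
        comp = set(range(1, 8)) - set(t)  # complement of the triple
        patterns.append("".join("1" if i in comp else "0" for i in range(8)))
    patterns.append("1" * 8)              # 1 + e1 + ... + e7

    for p in patterns:
        print(p)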

Every pair of binary numbers above differs in four positions. We can verify this with the following Python code.

    from itertools import combinations, product

    def popcount(n):
        return bin(n).count('1')

    def hd(m, n): # Hamming distance
        return popcount(m ^ n)

    gen = [
        0b00010111,
        0b01001011,
        0b01100101,
        0b01110010,
        0b00111001,
        0b01011100,
        0b00101110,
        0b11111111,
    ]

    for pair in combinations(gen, 2):
        assert(hd(pair[0], pair[1]) == 4)

We didn’t need to import product for the code above, but we’ll need it below.

Note that the popcount of the XOR of two numbers counts the number of places where their bit representations differ, i.e. it computes the Hamming distance between the two bit vectors.

Hamming code

The fact that the octets above spread out well, each being a Hamming distance of 4 from all the rest, suggests they could be useful in coding, and in fact the vectors above span the Hamming code (8, 4, 4).

These octets from the previous section will be the basis of our code, “basis” in the colloquial sense of a starting point. They span our set of code words, but they’re not a basis in the linear algebra sense because they are not linearly independent. The following code shows that the octets span a 4 dimensional space, i.e. you can make 2⁴ = 16 different bit patterns by XORing together combinations of these octets.

    codewords = set()
    n = 4
    for b in combinations(gen, n):
        for coeff in product(range(2), repeat=n):
            s = 0
            for i in range(n):
                s ^= coeff[i]*b[i]
            codewords.add(s)

    assert(len(codewords) == 16)

The code tells us that our octets have a span of size 16, i.e. they span a 4 dimensional space over the field with two elements.

(Normally this would be presented by doing linear algebra, but I’m doing it by brute force just for variety. I assume readers who are more comfortable with code than math will appreciate this.)

Perfect codes

The Hamming (8, 4, 4) code is “perfect” in the sense that its packing radius equals its covering radius.

The packing diameter is the minimum distance between code words, and the packing radius is half the packing diameter.

The following code shows this is 2.

    packing_radius = min(
        hd(p[0], p[1]) for p in combinations(codewords, 2))/2
    print(packing_radius)

Since the packing diameter is 4, we can detect if up to 3 bits are corrupted: no bit pattern that differs from a code word in three positions is a code word.
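
Continuing with the codewords set from above, a quick brute-force sketch confirms this: flipping one, two, or three bits of a code word never lands on another code word.

    flips = [sum(1 << i for i in bits)
             for r in (1, 2, 3)
             for bits in combinations(range(8), r)]
    assert all((c ^ m) not in codewords for c in codewords for m in flips)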

The covering radius is the maximum distance from any vector to a code word. The following Python snippet shows that no 8-bit vector is a Hamming distance of more than 2 from a code word.

    def dist_to_codeword(n):
        return min(hd(n, c) for c in codewords)

    covering_radius = max(dist_to_codeword(n) for n in range(256))
    print(covering_radius)


Vector spaces and subspaces over finite fields

A surprising amount of linear algebra doesn’t depend on the field you’re working over. You can implicitly assume you’re working over the real numbers R and prove all the basic theorems—say all the theorems that come before getting into eigenvalues in a typical course—and all or nearly all of the theorems remain valid if you swap out the real numbers for the complex numbers. In fact, they hold for any field, even a finite field.

Subspaces over infinite fields

Given an n-dimensional vector space V over a field F we can ask how many subspaces of V there are with dimension k. If our field F is the reals and 0 < k < n then there are infinitely many such subspaces. For example, if n = 2, every line through the origin is a subspace of dimension 1.

Now if we’re still working over R but we pick a basis

e1, e2, e3, …, en

and ask how many subspaces of dimension k there are in V that are spanned by some subset of these basis elements, now we have a finite number. In fact the number is the binomial coefficient

\binom{n}{k}

because there are as many subspaces as there are ways to select k basis elements from a set of n.

Subspaces over finite fields

Let F be a finite field with q elements; necessarily q is a prime power. Let V be an n-dimensional vector space over F. We might need to know how many subspaces of V there are of various dimensions when developing algebraic codes, error detection and correction codes based on vector spaces over finite fields.

Then the number of subspaces of V with dimension k equals the q-binomial coefficient

\binom{n}{k}_q \equiv \frac{[n]_q!}{[k]_q! [n-k]_q!}

mentioned in the previous post. Here [n]q! is the q-factorial of n, defined by

[n]_q! \equiv [1]_q [2]_q [3]_q \cdots [n]_q

and [n]q is the q-analog of n, defined by

[n]_q \equiv \frac{1 - q^n}{1-q} = 1 + q + q^2 + \cdots + q^{n-1}

The fraction definition holds for all n, integer or not, when q ≠ 1. The fraction equals the polynomial in q when n is a non-negative integer.

You can derive the expression for the number of subspaces directly from a combinatorial argument, not using any of the q-analog notation, but this notation makes things much more succinct.

Python code

We can easily code up the definitions above in Python.

    def analog(n, q):
        return (1 - q**n)//(1 - q)

    def qfactorial(n, q):
        p = 1
        for k in range(1, n+1):
            p *= analog(k, q)
        return p

    def qbinomial(n, k, q):
        d = qfactorial(k, q)*qfactorial(n-k, q)
        return qfactorial(n, q)//d  # the result is always an integer, so use integer division

Example

To keep things small, let’s look at the field with q = 3 elements. Here addition and multiplication are carried out mod 3.

Let n = 3 and k = 1. So we’re looking for one-dimensional subspaces of F³ where F is the field of integers mod 3.

A one-dimensional subspace of a vector space consists of all scalar multiples of a non-zero vector. We can only multiply a vector by 0, 1, or 2. Multiplying by 0 gives the zero vector, multiplying by 1 leaves the vector the same, and multiplying by 2 turns 1s into 2s and vice versa. So, for example, the vector (1, 2, 0) is a basis for the subspace consisting of

(0, 0, 0), (1, 2, 0), (2, 1, 0).

We can find all the subspaces by finding a basis vector for each subspace. And with a little brute force effort, here’s a list.

  • (1, 0, 0)
  • (0, 1, 0)
  • (0, 0, 1)
  • (0, 1, 1)
  • (1, 0, 1)
  • (1, 1, 0)
  • (1, 2, 0)
  • (0, 1, 2)
  • (2, 0, 1)
  • (1, 1, 1)
  • (2, 1, 1)
  • (1, 2, 1)
  • (1, 1, 2)

It’s easy to check that none of these vectors is a multiple mod 3 of another vector on the list. The theory above says we should expect to find 13 subspaces, and we have, so we must have found them all.
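
We can also check the count by brute force and compare it with the qbinomial function defined above. The sketch below picks a canonical basis vector for each subspace by scaling its first non-zero coordinate to 1:

    from itertools import product

    def canonical(v):
        # scale v so that its first non-zero coordinate is 1 (mod 3)
        for x in v:
            if x:
                inv = 1 if x == 1 else 2   # 2 is its own inverse mod 3
                return tuple((inv*y) % 3 for y in v)

    reps = {canonical(v) for v in product(range(3), repeat=3) if any(v)}
    print(len(reps))           # 13
    print(qbinomial(3, 1, 3))  # 13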
