# Redundant Residue Number Systems

You can uniquely represent a large number by its remainders when divided by smaller numbers, provided the smaller numbers have no factors in common, and carry out arithmetic in this representation. Such a representation is called a Residue Number System (RNS).

In the 1960’s people realized RNSs could be useful in computer arithmetic. The original papers are now hard to find, but you can find a summary of some of their ideas here. We will give examples working in a particular RNS below.

When you work with remainders by more numbers than necessary, you have a Redundant Residue Number System (RRNS) that provides error detection and correction. We will also demonstrate this with an example.

## RNS example

To be concrete, we’ll use the remainders by 199, 233, 194, and 239 to represent numbers, following the work of C. W. Hastings and R. W. Watson referenced in the link cited above.

Let M = 199*233*194*239. Any non-negative integer less than M can be specified by its remainders mod 199, 233, 194, and 239.

The following Python code generates a random number less than M, represents it by its remainders, and then recovers the original number from the remainders using the Chinese Remainder Theorem.

    from random import randrange
from sympy import prod
from sympy.ntheory.modular import crt

moduli = [199, 233, 194, 239]
M = prod(moduli)

x = randrange(M)
remainders = [x % m for m in moduli]
# See footnote [1]
y = crt(moduli, remainders, symmetric=False)[0]

print(x)
print(remainders)
print(y)


This printed

    1119075671
[166, 204, 57, 235]
1119075671


You can add, subtract, and multiply numbers by carrying out the corresponding operations on their remainders. There are three advantages to this.

First, you can work with smaller numbers. In our example, all the moduli are 8-bit numbers; we can carry out arithmetic on 32-bit numbers [2] by only working directly with 8-bit numbers. We could use the same idea to represent extremely large numbers by their remainders via 64-bit integers.

Second, we can do our calculations in parallel, working with each of our remainders at the same time.

Third, there are no carries. There’s no need to keep track of whether component calculations overflow.

The following code shows how we can add two numbers via their remainders.

    a = randrange(M//2)
b = randrange(M//2)

arem = [a % m for m in moduli]
brem = [b % m for m in moduli]
crem = [z[0] + z[1] for z in zip(arem, brem)]
c = crt(moduli, crem, symmetric=False)[0]

print(a + b)
print(c)


When I ran this code, it printed 832537447 twice.

## RRNS

Now we get to the redundant part. Suppose we add more numbers to our list of moduli, larger than the previous numbers and relatively prime to all of them and to each other. Now working modulo this larger list of numbers, we have more information than we need. If the results we get from using various subsets of the list of moduli are inconsistent, we’ve detected an error. And with enough extra information, we can correct the error.

Watson and Hastings added 251 and 509 in their example, and so we’ll do the same.

As before, we will generate a couple random numbers and represent them via their remainders, though now by a longer list of remainders. We will deliberately corrupt one of the component sums and then compute their sum using different choices of four moduli from the set of six.

    xmoduli = [199, 233, 194, 239, 251, 509]
a = randrange(M//2)
b = randrange(M//2)
aremx = [a % m for m in xmoduli]
bremx = [b % m for m in xmoduli]
cremx = [z[0] + z[1] for z in zip(aremx, bremx)]
cremx[1] += 13

c0 = crt(xmoduli[:4], cremx[:4], symmetric=False)[0]
c1 = crt(xmoduli[2:], cremx[2:], symmetric=False)[0]
c2 = crt(xmoduli[:2] + xmoduli[4:], cremx[:2] + cremx[4:], symmetric=False)[0]
print(c0, c1, c2)


This will return three different answers, so we know that there has been an error. When I ran it I got 70315884, 819846513, and 890162397. If you run the code yourself you’ll get different answers since you’ll generate different random numbers.

Now how do we correct the error? Suppose we know there has only been one error (or more realistically, we are willing to assume that the probability of two errors is tolerably small). Then one of the results above must be correct: the first sum excludes the last two moduli, the second excludes the first two, and the last excludes the middle two. One of them must exclude the error.

The first calculation based on a different subset of moduli that gives one of the results above is correct. The code

    c3 = crt(xmoduli[:1]+xmoduli[2:5], cremx[:1] + cremx[2:5], symmetric=False)[0]
print(c3)


produced 890162397, matching the third sum above, so we know that eliminating the second modulus gives the correct answer. We’ve found the correct answer, and we’ve discovered which component was corrupted.

***

[1] A couple things about the call to crt require explanation. We set symmetric to False in order to get a positive return value. Otherwise we might get a value that is correct mod M but negative. Also, crt returns not just the solution we’re after but a pair consisting of the solution and the product of the moduli. We take the first element of the pair with [0] to get the part we’re interested in.

[2] Not all 32-bit numbers. with any numbers less than M, and M is between 231 and 232.

# Morse code golf

You can read the title of this post as ((Morse code) golf) or as (Morse (code golf)).

Morse code is a sort of approximate Huffman coding of letters: letters are assigned symbols so that more common letters can be transmitted more quickly. You can read about how well Morse code achieves this design objective here.

But digits in Morse code are kinda strange. I imagine they were an afterthought, tacked on after encodings had been assigned to each of the letters, and so had to avoid encodings that were already in use. Here are the assignments:

    |-------+-------|
| Digit | Code  |
|-------+-------|
|     1 | .---- |
|     2 | ..--- |
|     3 | ...-- |
|     4 | ....- |
|     5 | ..... |
|     6 | -.... |
|     7 | --... |
|     8 | ---.. |
|     9 | ----. |
|     0 | ----- |
|-------+-------|


There’s no attempt to relate transmission length to frequency. Maybe the idea was that all digits are equally common. While in some contexts this is true, it’s not true in general for mathematical and psychological reasons.

There is a sort of mathematical pattern to the Morse code symbols for digits. For 1 ≤ n ≤ 5, the symbol for n is n dots followed by 5-n dashes. For 6 ≤ n ≤ 9, the symbol is n-5 dashes followed by 10-n dots. The same rule extends to 0 if you think of 0 as 10.

A more mathematically satisfying way to assign symbols would have been binary numbers padded to five places:

    0 -> .....
1 -> ....-
2 -> ..._.
etc.


Because the Morse encoding of digits is awkward, it’s not easy to describe succinctly. And here is where golf comes in.

The idea of code golf is to write the shortest program that does some task. Fewer characters is better, just as in golf the lowest score wins.

Here’s the challenge: Write two functions as small you can, one to encode digits as Morse code and another to decode Morse digits. Share your solutions in the comments below.

# MDS codes

A maximum distance separable code, or MDS code, is a way of encoding data so that the distance between code words is as large as possible for a given data capacity. This post will explain what that means and give examples of MDS codes.

## Notation

A linear block code takes a sequence of k symbols and encodes it as a sequence of n symbols. These symbols come from an alphabet of size q. For binary codes, q = 2. But for non-trivial MDS codes, q > 2. More on that below.

The purpose of these codes is to increase the ability to detect and correct transmission errors while not adding more overhead than necessary. Clearly n must be bigger than k, but the overhead n-k has to pay for itself in terms of the error detection and correction capability it provides.

The ability of a code to detect and correct errors is measured by d, the minimum distance between code words. A code has separation distance d if every pair of code words differs in at least d positions. Such a code can detect up to d errors per block and can correct ⌊(d-1)/2⌋ errors.

### Example

The following example is not an MDS code but it illustrates the notation above.

The extended Golay code used to send back photos from the Voyager missions has q = 2 and [n, k, d] = [24, 12, 8]. That is, data is divided into segments of 12 bits and encoded as 24 bits in such a way that all code blocks differ in at least 8 positions. This allows up to 8 bit flips per block to be detected, and up to 3 bit flips per block to be corrected.

(If 4 bits were corrupted, the result could be equally distant between two valid code words, so the error could be detected but not corrected with certainty.)

## Separation bound

There is a theorem that says that for any linear code

k + dn + 1.

This is known as the singleton bound. MDS codes are optimal with respect to this bound. That is,

k + d = n + 1.

So MDS codes are optimal with respect to the singleton bound, analogous to how perfect codes are optimal with respect to the Hamming bound. There is a classification theorem that says perfect codes are either Hamming codes or trivial with one exception. There is something similar for MDS codes.

## Classification

MDS codes are essentially either Reed-Solomon codes or trivial. This classification is not as precise as the analogous classification of perfect codes. There are variations on Reed-Solomon codes that are also MDS codes. As far as I know, this accounts for all the known MDS codes. I don’t know that any others have been found, or that anyone has proved that there are no more.

### Trivial MDS codes

What are these trivial codes? They are the codes with 0 or 1 added symbols, and the duals of these codes. (The dual of an MDS code is always an MDS code.)

If you do no encoding, i.e. take k symbols and encode them as k symbols, then d = 1 because different code words may only differ in one symbol. In this case n = k and so k + d = n + 1, i.e. the singleton bound is exact.

You could take k data symbols and add a checksum. If q = 2 this would be a parity bit. For a larger alphabet of symbols, it could be the sum of the k data symbols mod q. Then if two messages differ in 1 symbol, they also differ in added checksum symbol, so d = 2. We have n = k + 1 and so again k + d = n + 1.

The dual of the code that does no encoding is the code that transmits no information! It has only one code word of size n. You could say, vacuously, that d = n because any two different code words differ in all n positions. There’s only one code word so k = 1. And again k + d = n + 1.

The dual of the checksum code is the code that repeats a single data symbol n times. Then d = n because different code words differ in all n positions. We have k = 1 since there is only one information symbol per block, and so k + d = n + 1.

## Reed Solomon codes

So the stars of the MDS show are the Reed-Solomon codes. I haven’t said how to construct these codes because that deserves a post of its own. Maybe another time. For now I’ll just say a little about how they are used in application.

As mentioned above, the Voyager probes used a Golay code to send back images. However, after leaving Saturn the onboard software was updated to use Reed-Solomon encoding. Reed-Solomon codes are used in more down-to-earth applications such as DVDs and cloud data storage.

Reed-Solomon codes are optimal block codes in terms of the singleton bound, but block codes are not optimal in terms of Shannon’s theorem. LDPC (low density parity check) codes come closer to the Shannon limit, but some forms of LDPC encoding use Reed-Solomon codes as a component. So in addition to their direct use, Reed-Solomon codes have found use as building blocks for other encoding schemes.

# Perfect codes

A couple days ago I wrote about Hamming codes and said that they are so-called perfect codes, i.e. codes for which Hamming’s upper bound on the number of code words with given separation is exact.

Not only are Hamming codes perfect codes, they’re practically the only non-trivial perfect codes. Specifically, Tietavainen and van Lint proved in 1973 that there are three kinds of perfect binary codes:

1. Hamming codes
2. One Golay code
3. Trivial codes

I wrote about Golay codes a few months ago. There are two binary Golay codes, one with 23 bit words and one with 24 bit words. The former is “perfect.” But odd-length words are awkward to use, and in practice you’d usually tack on one more parity bit, creating an “imperfect” but useful error-correcting code. The Voyager probes used the 24-bit Golay code to send photos of Jupiter and Saturn back to earth.

## Trivial codes

Trivial codes simply repeat each word of input n times. For example, 01100 would be encoded as 0110001100 in a trivial code with n = 2. In order to be a perfect code, a trivial code must have n odd.

Trivial codes are the first and simplest thing you’d think of if you wanted to create an error correcting code. You might, for example, send a message three times. Then if a bit in a single word is corrupted in one copy of the message, but the corresponding word is the same in the latter two copies, you use their common value as the assumed correct value.

While trivial codes are obvious, they’re not efficient. For example, suppose we have message words of 5 bits, say to encode English letters and a few other symbols. If we triple each message word, it takes 15 bits of code word to transmit 5 bits of message.

But if we used a Hamming (15, 11) code, each 15-bit code word carries 11 message bits and 4 parity bits. This code has the same Hamming separation as the trivial code with n = 3, but it carries over twice as much information per code word, 11 bits versus 5 bits.

As word size increases, the efficiency of Hamming codes approaches 1. With trivial codes, the efficiency is constantly 1/n.

Trivial codes show that “perfect” does not mean optimal in a practical sense. Perfect codes are optimal with respect to the Hamming bound, but maybe not optimal in practice, depending on what criteria you’re optimizing over. Trivial codes are optimal if you’re optimizing for conceptual simplicity, but not if you’re optimizing for efficiency.

# A gentle introduction to Hamming codes

The previous post looked at how to choose five or six letters so that their Morse code representations are as distinct as possible. This post will build on the previous one to introduce Hamming codes. The problem of finding Hamming codes is much simpler in some ways, but also more general.

Morse code is complicated for a several reasons. First, it seems at first blush to have an alphabet of two symbols—dot and dash—but it actually has an alphabet of three symbols—dot, dash, and space—with a complicated constraints. Second, we’re trying to optimize the perceptual separation between letters, not the objective separation between signals on a wire. Third, dots and dashes have different lengths, and we have secondary objectives related to transmission time. Given the option, we would prefer to choose letters that reduce transmission time, but we would also like to choose letters that each have similar transmission time.

We will simplify the situation in this post by using exactly two symbols, 0 and 1, and using bit sequences of fixed length. We’ll also have one simple objective, maximizing Hamming distance separation, with no secondary objectives.

## Terminology

We’ll need to introduce some terminology to make the rest of the post more clear.

By alphabet we will mean the choice of symbols we use. For Morse code, the “alphabet” of symbols is dot, dash, and space. For a binary sequence, the alphabet is 0 and 1.

By word we will mean a sequence of symbols from our alphabet. Using this terminology, we would call ..-. in Morse code a word, even though it represents an English letter. You will also see the term vector used instead of word; mentally substitute vector for word everywhere if you find that easier.

The set of words we will use for transmitting messages is called a code. In the previous post, the code consisted originally of A, D, F, G, and V, and we looked at codes that would have been better by several criteria.

If we represent each English letter by a sequence of five bits, we would call the 0s and 1s the elements of our alphabet and the groups of five bits our words.

## Binary codes

For this post we will look at words of a fixed length n. For example, we could encode English letters into words of 5 bits each since

25 = 32 > 26

though this would only give us Hamming distance separation of 1, i.e. many of the code words would differ by only one bit. If a single bit were accidentally flipped in transmission, we would not be able to tell. But if we were to use more bits per word, we could have more separation. For example, if we used words of six bits and used the last bit as a parity bit, then we could choose 26 words that have Hamming distance at least 2 from each other.

## A(n, d)

The maximum number of binary words of length n, all separated by a Hamming distance of at least d, is denoted

A2(n, d).

The subscript 2 on A says that we’re working with an alphabet of 2 symbols, i.e. 0 and 1. If we were interested in an alphabet of size q, we would replace the 2 with a q.

This notation is a little sideways from our introduction, but closely related. We would like to know, for example, how many bits n we need do produce a code that has, say, 26 symbols separated by a Hamming distance d. That is, we’re thinking of our code size, such as 26, and the Hamming distance d, as being the independent variables and the number of bits n as the dependent variable. But it’s customary in coding theory to consider the word size n and the Hamming distance separation d as independent variables, and to consider Aq(n, d) as a function of n and d.

There is no way to compute Aq(n, d) in general, even for a fixed q such as q = 2. But there are upper and lower bounds, and exact values have been computed in many particular cases.

For example, A2(10, 4) = 40, and so using sequences of 10 bits, you can find a set of 40 words that are all a Hamming distance of at least 4 from each other, enough to encode all English letters (without regard to case) and 10 digits.

A2(15, 6) = 128, so with 15 bits you could find 128 words a distance of 6 apart, enough to encode all ASCII characters.

## Hamming bound and perfect codes

As mentioned above, there is no convenient formula for A(n, d) but there are bounds. The Hamming bound says

where the upper limit of the sum is

A code for which the Hamming bound is exact is called a perfect code.

## Hamming codes

Hamming codes are perfect binary codes where d = 3.

Note that 3 is the minimum separation for error correction. If we simply add a parity bit, as mentioned above, we can detect errors, but we cannot correct them. If code words are a distance 2 apart, a word with one corrupted bit could be equidistant between two valid code words.

For example, suppose you encode 00010 as 000101, adding a parity bit of 1 at the end because the original sequence had an odd number of 1’s. And suppose you similarly encode 00011 as 000110. Now you receive 000100. You know that something is wrong because the parity bit is inconsistent with the previous bits. But you don’t know whether 000101 was transmitted and the 6th bit was corrupted or 000110 was transmitted and the 5th bit was corrupted.

With Hamming codes, n is always one less than a power of 2, i.e

n = 2m – 1

and m is the number of added bits. That is, the code will have

2mm – 1

data bits, and the number of distinct code words will be 2 raised to that power. Each of these words is separated by a Hamming distance of at least 3.

Incidentally, note that as m increases, the number of parity bits is growing linearly, but the number of data bits is growing exponentially. That is, the overhead of the parity bits relative to the code word size is going to zero.

## Hamming(7,4) code

Let’s look at one Hamming code in detail. If m = 3, we have a code with 7-bit words: 4 data bits and 3 added bits. With 4 data bits we can have 16 different words, so the Hamming code with 7-bit words contains 16 words, each separated by a Hamming distance of 3. If we compute the right side of the Hamming bound we also get 16, i.e. A2(7, 3) = 16, demonstrating that the Hamming bound is exact.

We can encode a set of 4 bits by making them into a row vector and multiplying the vector on the right by the generator matrix


1 0 0 0 1 1 1
0 1 0 0 0 1 1
0 0 1 0 1 0 1
0 0 0 1 1 1 0


The matrix multiplication is defined using the field with two elements, so multiplication stays the same but 1 + 1 = 0.

Note that the 4 by 4 block on the left side of the matrix is the identity matrix. So the encoding of a string of four bits begins with the original 4 bits. The 16 words in our code are

    0000000
0001110
0010101
0011011
0100011
0101101
0110110
0111000
1000111
1001001
1010010
1011100
1100100
1101010
1110001
1111111


Unless I’ve made an error in writing this up [1], each of these code words should differ from all other code words in at least 3 positions.

## More on coding theory

[1] Errors happen all the time. That’s why we need error correcting codes!

# ADFGVX cipher and Morse code separation

A century ago the German army used a field cipher that transmitted messages using only six letters: A, D, F, G, V, and X. These letters were chosen because their Morse code representations were distinct, thus reducing transmission error.

The ADFGVX cipher was an extension of an earlier ADFGV cipher. The ADFGV cipher was based on a 5 by 5 grid of letters. The ADFGVX extended the method to a 6 by 6 grid of letters and digits. A message was first encoded using the grid coordinates of the letters, then a transposition cipher was applied to the sequence of coordinates.

This post revisits the design of the ADFGVX cipher. Not the encryption method itself, but the choice of letters used for transmission. How would you quantify the difference between two Morse code characters? Given that method of quantification, how good was the choice of ADFGV or its extension ADFGVX? Could the Germans have done better?

## Quantifying separation

There are several possible ways to quantify how distinct two Morse code signals are.

### Time signal difference

My first thought was to compare the signals as a function of time.

There are differing conventions for how long a dot or dash should be, and how long the space between dots and dashes should be. For this post, I will assume a dot is one unit of time, a dash is three units of time, and the space between dots or dashes is one unit of time.

The letter A is represented by a dot followed by a dash. I’ll represent this as 10111: on for one unit of time for the dot, off for one unit of time for the space between the dot and the dash, and on for three units of time for the dash. D is dash dot dot, so that would be 1110101.

We could quantify the difference between two letters in Morse code as the Hamming distance between their representations as 0s and 1s, i.e. the number of positions in which the two letters differ. To compare A and D, for example, I’ll pad the A with a couple zeros on the end to make it the same length as D.

    A: 1011100
D: 1110101
x x  x


The distance is 3 because the two sequences differ in three positions. (Looking back at the previous post on popcount, you could compute the distance as the popcount of the XOR of the two bit patterns.)

A problem with this approach is that it seems to underestimate the perceived difference between F and G.

    F: ..-. 1010111010
G: --.  1110111010
x


These only differ in the second bit, but they sound fairly different.

### Symbolic difference

The example above suggests maybe we should compare the sequence of dots and dashes themselves rather than compare their corresponding time signals. By this measure F and G are distance 4 apart since they differ in every position.

### Other possibilities

Comparing the symbol difference may over-estimate the difference between U (..-) and V (...-). We should look at some combination of time signal difference and symbolic difference.

Or maybe the thing to do would be to look at something like the edit distance between letters. We could say that U and V are close because it only takes inserting a dot to turn a U into a V.

There are several choices of letters that would have been better than ADFGV by either way of measuring distance. For example, CELNU has better separation and takes about 14% less time to transmit than ADFGV.

Here are a couple tables that give the time distance (dT) and the symbol distance (dS) for ADFGV and for CELNU.

    |------+----+----|
| Pair | dT | dS |
|------+----+----|
| AD   |  3 |  3 |
| AF   |  4 |  3 |
| AG   |  5 |  2 |
| AV   |  4 |  3 |
| DF   |  3 |  3 |
| DG   |  2 |  1 |
| DV   |  3 |  2 |
| FG   |  1 |  4 |
| FV   |  2 |  2 |
| GV   |  3 |  3 |
|------+----+----|

|------+----+----|
| Pair | dT | dS |
|------+----+----|
| CE   |  7 |  4 |
| CL   |  4 |  3 |
| CN   |  4 |  2 |
| CU   |  5 |  2 |
| EL   |  5 |  3 |
| EN   |  3 |  2 |
| EU   |  4 |  2 |
| LN   |  4 |  4 |
| LU   |  3 |  3 |
| NU   |  3 |  2 |
|------+----+----|


For six letters, CELNOU is faster to transmit than ADFGVX. It has a minimum time distance separation of 3 and a minimum symbol distance 2.

This is an update in response to a comment that suggested instead of minimizing transmission time of a set of letters, you might want to pick letters that are most similar in transmission time. It takes much longer to transmit C (-.-.) than E (.), and this could make CELNU harder to transcribe than ADFGV.

So I went back to the script I’m using and added time spread, the maximum transmission time minus the minimum transmission time, as a criterion. The ADFGV set has a spread of 4 because V takes 4 time units longer to transmit than A. CELNU has a spread of 10.

There are 210 choices of 5 letters that have time distance greater than 1, symbol distance greater than 1, and spread equal to 4. That is, these candidates are more distinct than ADFGV and have the same spread.

It takes 44 time units to transmit ADFGV. Twelve of the 210 candidates identified above require 42 or 40 time units. There are five that take 40 time units:

• ABLNU
• ABNUV
• AGLNU
• AGNUV
• ALNUV

Looking at sets of six letters, there are 464 candidates that have better separation than ADFGVX and equal time spread. One of these, AGLNUX, is an extension of one of the 5-letter candidates above.

The best 6-letter are ABLNUV and AGLNUV. They are better than ADFGVX by all the criteria discussed above. They both have time distance separation 2 (compared to 1), symbol distance separation 2 (compared to 1), time spread 4, (compared to 6) and transmission time 50 (compared to 56).

# More on attacking combination locks

A couple weeks ago I wrote about how De Bruijn sequences can be used to attack locks where there is no “enter” key, i.e. the lock will open once the right symbols have been entered.

Here’s a variation on this theme: what about locks that let you press more than one button at a time? [1]

You could just treat this as if it were a keypad with more buttons. For example, suppose you can push either one or two buttons at a time on the lock pictured above. Then you could treat this as a lock with 15 buttons: the five actual buttons and 10 virtual buttons corresponding to the ten ways you could choose 2 buttons out of 5 to press at the same time.

Maybe a lock would let you press even more buttons at once. I don’t think I’ve ever used a combination that required more than two buttons at once, but in principle you could push up to five buttons at once in the photo above, for a total of 31 possible button combinations. And since 31 is prime, you could use the algorithm here to generate the corresponding De Bruijn sequence.

If you know how long the password is, you can try the De Bruijn sequence for passwords of that length. But what if you don’t know the length of the password a priori? You could try the De Bruijn sequence for passwords of length 1, then length 2, then length 3, etc. But is there a more efficient way?

If there are k button combinations and the password has length n, then the optimal solution is to start entering a De Bruijn sequence B(k, n) until the lock opens. If you know the password has length no more than 4, then you could try a B(k, 4) sequence, and if the password is actually shorter than 4, say length 3, you’ll still open it.

But what if you’ve tried a B(k, 4) sequence and it turns out the password has length 5? You could do better than starting over with a B(k, 5) sequence, because some of the substrings in that B(k, 5) sequence will have already been tried. But how could you do this systematically? If you don’t know the length of the password, how could you do better than trying B(k, 1), then B(k, 2), then B(k, 3) etc.?

## Related posts

[1] For this post, I’m assuming the lock will open as soon as you enter the right sequence of buttons. For example, if the pass code is 345 and you enter 12345 the lock will open. I don’t know whether these locks work that way. Maybe you have to turn the handle, which would effectively act as an enter key. But maybe there’s a way to listen to the lock so that you’ll know when the combination has been entered, before you twist the handle.

Update: According to the first comment below, locks of the kind featured in the photo only allow each button to be used once. That puts severe limits on the number of possible combinations, and the approach outlined here would be unnecessary. However, the post brings up more general questions that could be useful in another setting.

# Linear feedback shift registers

The previous post looked at an algorithm for generating De Bruijn sequences B(k, n) where k is a prime number. These are optimal sequences that contain every possible consecutive sequence of n symbols from an alphabet of size k. As noted near the end of the post, the case k = 2 is especially important in application, i.e. binary sequences.

If we set k = 2, the generating algorithm is an example of a linear feedback shift register (LFSR) sequence.

Here’s the algorithm from the previous post:

1. If (x1, x2, …, xn ) = (0,0, …, 0), return c1.
2. If (x1, x2, …, xn ) = (1,0, …, 0), return 0.
3. Otherwise return c1x1 + c2x2 + … cnxn mod k.

Note that the last line says to return a linear combination of the previous symbols. That is, we operate on the latest n symbols, saving them to registers. We take the feedback from the previous outputs and compute a linear combination. We then shift the register content over by one and add the new output on the end. Hence the name “linear feedback shift register” sequence.

Note that the LFSR algorithm is a linear operator over the field GF(2), except for the special cases in steps 1 and 2. The algorithm couldn’t be entirely linear because it would get stuck; It would produce nothing but zeros forevermore once it encountered an input sequence of all zeros. So technically a LFSR is an “nearly always linear feedback shift register.” It’s linear for 2n – 2 inputs and nonlinear for 2 special inputs.

A LFSR is more general than a binary De Bruijn sequence. For some choices of linear coefficients the output is a De Bruijn sequence, but not for others. If you associate the linear coefficients with polynomial coefficient (with a sign change, as noted in the previous post) then the LFSR sequence is a De Bruijn sequence if and only if the polynomial is a primitive polynomial of degree n.

A few months ago I wrote about LFSRs in the context of random number generation. LFSRs make very efficient random number generators, but they’re not appropriate for cryptography because their linear structure makes them vulnerable to attack. The idea of shrinking generators is use one random number generator to sample another generator. The two generators can both be LFSRs, but the combined generator is nonlinear because the sampling mechanism is nonlinear. The net result is that you can combine two efficient but insecure generators to create a new generator that is efficient and secure.

# Generating De Bruijn cycles with primitive polynomials

Last week I wrote about a way to use De Bruijn cycles. Today I’ll write about a way to generate De Bruijn cycles.

## De Bruijn cycles

Start with an alphabet of k symbols. B(k, n) is the set of cycles of that contain every sequence of n symbols, and is as short as possible. Since there are kn possible sequences of n symbols, and every one corresponds to some starting position in the De Bruijn cycle, an element of B(k, n) has to have at least kn symbols. In fact, the elements of B(k, n) have exactly kn symbols.

It’s not obvious that B(k, n) should be non-empty for all k and n, but it is. And there are algorithms that will take a k and an n and return an element of B(k, n).

The post from last week gave the example of a sequence in B(4, 3) that contains all triples of DNA base pairs:

AAACAAGAATACCACGACTAGCAGGAGTATCATGATTCCCGCCTCGGCGTCTGCTTGGGTGTTT

## Generating De Bruijn cycles

When k is a prime number, i.e. we’re working over an alphabet with a prime number of symbols, it is particularly simple generate De Bruijn sequences [1]. For example, let’s work over the alphabet {0, 1, 2}. Then the following code will produce the next symbol in a sequence in B(3, 4).

    def B34(a,b,c,d):
if (a,b,c,d) == (0,0,0,0):
return 1
if (a,b,c,d) == (1,0,0,0):
return 0
return (a+b) % 3


We can initialize the sequence wherever we like since it produces a cycle, but if we start with 0000 we get the following:

000010011012110021020122101011112220112120002002202122001201021120202222111022121

You can verify that every sequence of four elements from {0, 1, 2} is in there somewhere, provided you wrap the end around. For example, 1000 can be found by starting in the last position.

Where did the algorithm above come from? How would you create an analogous algorithm for other values of k and n?

The algorithm goes back to Willem Mantel in 1894, and can be found, for example, in Knuth’s TAOCP Volume 4. Here is Mantel’s algorithm to generate an element of B(k, n) where k is prime. The function takes the latest n symbols in the De Bruijn cycle and returns the next symbol.

1. If (x1, x2, …, xn ) = (0,0, …, 0), return c1.
2. If (x1, x2, …, xn ) = (1,0, …, 0), return 0.
3. Otherwise return c1x1 + c2x2 + … cnxn mod k.

In our example above, c1 = c2 = 1 and c3 = c4 = 0, but how do you find the c‘s in general?

## Primitive polynomials

To find the c‘s, first find a primitive polynomial of degree n over the prime field with k elements. Then the c‘s are the coefficients of the polynomial, with a sign change, if you write the polynomial in the following form.

In the example above, I started with the polynomial

We can read the coefficients off this polynomial: c1 = c2 = -2 and c3 = c4 = 0. Since -1 and 2 are the same working mod 3, I used c1 = c2 = 1 above.

Backing up a bit, what is a primitive polynomial and how do you find them?

A primitive polynomial of degree n with coefficients in GF(k), the finite field with k elements, has leading coefficient 1 and has a root α that generates the multiplicative group of GF(kn). That is, every nonzero element of GF(kn) can be written as a power of α.

In my example, I found a primitive polynomial in GF(34) by typing polynomials into Mathematica until I found one that worked.

    In[1]:= PrimitivePolynomialQ[x^4 + 2 x + 2, 3]

Out[1]= True


Since coefficients can only be 0, 1, or 2 when you’re working mod 3, it only took a few guesses to find a primitive polynomial [2].

Brute force guessing works fine k and n are small, but clearly isn’t practical in general. There are algorithms for searching for primitive polynomials, but I’m not familiar with them.

The case where k = 2 and n may be large is particularly important in applications, and you can find where people have tabulated primitive binary polynomials, primitive polynomials in GF(2n). It’s especially useful to find primitive polynomials with a lot of zero coefficients because, for example, this leads to less computation when producing De Bruijn cycles.

Finding sparse primitive binary polynomials is its own field of research. See, for example, Richard Brent’s project to find primitive binary trinomials, i.e. primitive binary polynomials with only three non-zero coefficients.

More on binary sequences in the next post on linear feedback shift registers.

***

[1] The same algorithm can be used when k is not prime but a prime power because you can equate sequence of length n from an alphabet with k = pm elements with a sequence of length mn from an alphabet with p elements. For example, 4 is not prime, but we could have generated a De Bruijn sequence for DNA basepair triples by looking for binary sequences of length 6 and using 00 = A, 01 = C, 10 = G, and 11 = T.

[2] The number of primitive polynomials in GF(pn) is φ(pn – 1)/m where φ is Euler’s totient function. So in our case there were 8 polynomials we could have found.

# Golay codes

Suppose you want to sent pictures from Jupiter back to Earth. A lot could happen as a bit travels across the solar system, and so you need some way of correcting errors, or at least detecting errors.

The simplest thing to do would be to transmit photos twice. If a bit is received the same way both times, it’s likely to be correct. If a bit was received differently each time, you know one of them is wrong, but you don’t know which one. So sending the messages twice doesn’t let you correct any errors, but it does let you detect some errors.

There’s a much better solution, one that the Voyager probes actually used, and that is to use a Golay code. Take the bits of your photo in groups of 12, think of each group as a row vector, and multiply it by the following matrix, called the generator matrix:

100000000000101000111011
010000000000110100011101
001000000000011010001111
000100000000101101000111
000010000000110110100011
000001000000111011010001
000000100000011101101001
000000010000001110110101
000000001000000111011011
000000000100100011101101
000000000010010001110111
000000000001111111111110


(You might see different forms of the generator matrix in different publications.)

That’s the (extended binary) Golay code in a nutshell. It takes groups of 12 bits and returns groups of 24 bits. It doubles the size of your transmission, just like transmitting every image twice, but you get more bang for your bits. You’ll be able to correct up to 3 corrupted bits per block of 12 and detect more.

Matrix multiplication here is done over the field with two elements. This means multiplication and addition of 0’s and 1’s works as in the integers, except 1 + 1 = 0.

Each pair of rows in the matrix above differ in 8 positions. The messages produced by multiplication are linear combinations of these rows, and they each differ in at least 8 positions.

When you receive a block of 24 bits, you know that there has been some corruption if the bits you receive are not in the row space of the matrix. If three or fewer bits have been corrupted, you can correct for the errors by replacing the received bits by the closest vector in the row space of the matrix.

If four bits have been corrupted, the received bits may be equally close to two different possible corrections. In that case you’ve detected the error, but you can’t correct it with certainty. If five bits are corrupted, you’ll be able to detect that, but if you attempt to correct the errors, you could mistakenly interpret the received bits to be a corruption of three bits in another vector.

## Going deeper

In one sense, Golay codes are very simple: just multiply by a binary matrix. But there’s a lot going on beneath the surface. Where did the magic matrix above come from? It’s rows differ in eight positions, but how do you know that all linear combinations for the rows also differ in at least eight positions? Also, how would you implement it in practice? There are more efficient approaches than to directly carry out a matrix multiplication.

To give just a hint of the hidden structure in the generator matrix, split the matrix half, giving an identity matrix on the left and a matrix M on the right.

101000111011
110100011101
011010001111
101101000111
110110100011
111011010001
011101101001
001110110101
000111011011
100011101101
010001110111
111111111110


The 1’s along the last row and last column have to do with why this is technically an “extended” Golay code: the “perfect” Golay code has length 23, but an extra check sum bit was added. Here “perfect” means optimal in a sense that I may get into in another post. (Update: See this post.) Let’s strip off the last row and last column.

10100011101
11010001110
01101000111
10110100011
11011010001
11101101000
01110110100
00111011010
00011101101
10001110110
01000111011


The first row corresponds to quadratic residues mod 11. That is, if you number the columns starting from 0, the zeros are in columns 1, 3, 4, 5, and 9. These are the non-zero squares mod 11. Also, the subsequent rows are rotations of the first row.

Golay codes are practical for error correction—they were used to transmit the photo at the top of the post back to Earth—but they also have deep connections to other parts of math, including sphere packing and sporadic groups.