A gentle introduction to Hamming codes

The previous post looked at how to choose five or six letters so that their Morse code representations are as distinct as possible. This post will build on the previous one to introduce Hamming codes. The problem of finding Hamming codes is much simpler in some ways, but also more general.

Morse code is complicated for several reasons. First, it seems at first blush to have an alphabet of two symbols—dot and dash—but it actually has an alphabet of three symbols—dot, dash, and space—with complicated constraints. Second, we’re trying to optimize the perceptual separation between letters, not the objective separation between signals on a wire. Third, dots and dashes have different lengths, and we have secondary objectives related to transmission time. Given the option, we would prefer to choose letters that reduce transmission time, but we would also like to choose letters that each have similar transmission time.

We will simplify the situation in this post by using exactly two symbols, 0 and 1, and using bit sequences of fixed length. We’ll also have one simple objective, maximizing Hamming distance separation, with no secondary objectives.


We’ll need to introduce some terminology to make the rest of the post clearer.

By alphabet we will mean the choice of symbols we use. For Morse code, the “alphabet” of symbols is dot, dash, and space. For a binary sequence, the alphabet is 0 and 1.

By word we will mean a sequence of symbols from our alphabet. Using this terminology, we would call ..-. in Morse code a word, even though it represents an English letter. You will also see the term vector used instead of word; mentally substitute vector for word everywhere if you find that easier.

The set of words we will use for transmitting messages is called a code. In the previous post, the code consisted originally of A, D, F, G, and V, and we looked at codes that would have been better by several criteria.

If we represent each English letter by a sequence of five bits, we would call the 0s and 1s the elements of our alphabet and the groups of five bits our words.

Binary codes

For this post we will look at words of a fixed length n. For example, we could encode English letters into words of 5 bits each since

2^5 = 32 > 26

though this would only give us Hamming distance separation of 1, i.e. many of the code words would differ by only one bit. If a single bit were accidentally flipped in transmission, we would not be able to tell. But if we were to use more bits per word, we could have more separation. For example, if we used words of six bits and used the last bit as a parity bit, then we could choose 26 words that have Hamming distance at least 2 from each other.
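The parity-bit idea is easy to check by brute force. Here is a quick Python sketch (the helper names are my own, not from any standard library) that computes the minimum pairwise Hamming distance over all 5-bit words, before and after appending an even-parity bit:

```python
from itertools import product

def hamming_distance(u, v):
    """Number of positions at which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))

def add_parity_bit(word):
    """Append an even-parity bit so every word has an even number of 1s."""
    return word + (sum(word) % 2,)

# All 32 raw 5-bit words: some pairs differ in only one bit.
raw = list(product([0, 1], repeat=5))
print(min(hamming_distance(u, v) for u in raw for v in raw if u != v))  # 1

# With the parity bit appended, any two words differ in at least 2 positions.
coded = [add_parity_bit(w) for w in raw]
print(min(hamming_distance(u, v) for u in coded for v in coded if u != v))  # 2
```

The parity bit works because any two words that differ in exactly one data bit must have opposite parity, pushing their distance up to 2.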

A(n, d)

The maximum number of binary words of length n, all separated by a Hamming distance of at least d, is denoted

A_2(n, d).

The subscript 2 on A says that we’re working with an alphabet of 2 symbols, i.e. 0 and 1. If we were interested in an alphabet of size q, we would replace the 2 with a q.

This notation is a little sideways from our introduction, but closely related. We would like to know, for example, how many bits n we need to produce a code that has, say, 26 symbols separated by a Hamming distance d. That is, we’re thinking of our code size, such as 26, and the Hamming distance d, as being the independent variables and the number of bits n as the dependent variable. But it’s customary in coding theory to consider the word size n and the Hamming distance separation d as independent variables, and to consider A_q(n, d) as a function of n and d.

There is no way to compute A_q(n, d) in general, even for a fixed q such as q = 2. But there are upper and lower bounds, and exact values have been computed in many particular cases.

For example, A_2(10, 4) = 40, and so using sequences of 10 bits, you can find a set of 40 words that are all a Hamming distance of at least 4 from each other, enough to encode all English letters (without regard to case) and 10 digits.

A_2(15, 6) = 128, so with 15 bits you could find 128 words a distance of at least 6 apart, enough to encode all ASCII characters.

Hamming bound and perfect codes

As mentioned above, there is no convenient formula for A(n, d) but there are bounds. The Hamming bound says

A_q(n, d) \leq \frac{q^n}{\sum_{k=0}^t {n \choose k}(q-1)^k}

where the upper limit of the sum is

t = \left\lfloor \frac{d-1}{2} \right\rfloor

A code for which the Hamming bound is exact is called a perfect code.
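The bound is straightforward to compute. Here is a short Python sketch (the function name `hamming_bound` is mine); it takes the floor of the right-hand side, which is valid since A_q(n, d) is an integer:

```python
from math import comb

def hamming_bound(n, d, q=2):
    """Floor of q^n divided by the volume of a Hamming ball of radius t."""
    t = (d - 1) // 2
    ball = sum(comb(n, k) * (q - 1) ** k for k in range(t + 1))
    return q ** n // ball

print(hamming_bound(7, 3))    # 16: met exactly by the Hamming(7,4) code below
print(hamming_bound(10, 4))   # 93, while the exact value A_2(10, 4) is 40
print(hamming_bound(15, 6))   # 270, while the exact value A_2(15, 6) is 128
```

The last two lines illustrate that the bound can be far from tight: the exact values 40 and 128 quoted above are well below it.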

Hamming codes

Hamming codes are perfect binary codes where d = 3.

Note that 3 is the minimum separation for error correction. If we simply add a parity bit, as mentioned above, we can detect errors, but we cannot correct them. If code words are a distance 2 apart, a word with one corrupted bit could be equidistant from two valid code words.

For example, suppose you encode 00010 as 000101, adding a parity bit of 1 at the end because the original sequence had an odd number of 1’s. And suppose you similarly encode 00011 as 000110. Now you receive 000100. You know that something is wrong because the parity bit is inconsistent with the previous bits. But you don’t know whether 000101 was transmitted and the 6th bit was corrupted or 000110 was transmitted and the 5th bit was corrupted.
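We can verify the ambiguity numerically. This small sketch checks that the received word in the example above is the same distance from both candidate transmissions:

```python
def hamming_distance(u, v):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(a != b for a, b in zip(u, v))

received = "000100"
for sent in ("000101", "000110"):
    print(sent, hamming_distance(received, sent))  # each candidate is distance 1 away
```

Since both valid code words are exactly one bit flip away, nearest-neighbor decoding has no way to break the tie.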

With Hamming codes, n is always one less than a power of 2, i.e.

n = 2^m − 1

and m is the number of added bits. That is, the code will have

2^m − m − 1

data bits, and the number of distinct code words will be 2 raised to that power. Each of these words is separated by a Hamming distance of at least 3.

Incidentally, note that as m increases, the number of parity bits is growing linearly, but the number of data bits is growing exponentially. That is, the overhead of the parity bits relative to the code word size is going to zero.
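A few lines of Python make these growth rates concrete, tabulating the word length n, the data bits k, and the parity overhead m/n for small m:

```python
# For each m: code word length n = 2^m - 1 and data bits k = 2^m - m - 1.
params = [(m, 2 ** m - 1, 2 ** m - 1 - m) for m in range(2, 7)]
for m, n, k in params:
    print(f"m={m}: n={n:2d}, data bits k={k:2d}, parity overhead m/n = {m / n:.3f}")
```

Already at m = 6 the overhead has fallen below 10%, and it keeps shrinking toward zero.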

Hamming(7,4) code

Let’s look at one Hamming code in detail. If m = 3, we have a code with 7-bit words: 4 data bits and 3 added bits. With 4 data bits we can have 16 different words, so the Hamming code with 7-bit words contains 16 words, each separated by a Hamming distance of 3. If we compute the right side of the Hamming bound we also get 16, i.e. A_2(7, 3) = 16, demonstrating that the Hamming bound is exact.

We can encode a set of 4 bits by making them into a row vector and multiplying the vector on the right by the generator matrix

    1 0 0 0 1 1 1
    0 1 0 0 0 1 1
    0 0 1 0 1 0 1
    0 0 0 1 1 1 0

The matrix multiplication is defined using the field with two elements, so multiplication stays the same but 1 + 1 = 0.

Note that the 4 by 4 block on the left side of the matrix is the identity matrix. So the encoding of a string of four bits begins with the original 4 bits. The 16 words in our code are

    0000000  0001110  0010101  0011011
    0100011  0101101  0110110  0111000
    1000111  1001001  1010010  1011100
    1100100  1101010  1110001  1111111
Unless I’ve made an error in writing this up [1], each of these code words should differ from all other code words in at least 3 positions.
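One way to check is by brute force: generate all 16 code words from the generator matrix using arithmetic mod 2 and compute the minimum pairwise distance. A sketch:

```python
from itertools import product

# Rows of the generator matrix from the post: 4 data bits, 3 parity bits.
G = [
    [1, 0, 0, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 1, 0],
]

def encode(data):
    """Multiply a 4-bit row vector by G over the field with two elements."""
    return tuple(sum(d * g for d, g in zip(data, col)) % 2 for col in zip(*G))

codewords = [encode(bits) for bits in product([0, 1], repeat=4)]

def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))

min_dist = min(hamming_distance(u, v)
               for i, u in enumerate(codewords) for v in codewords[i + 1:])
print(len(codewords), min_dist)  # 16 code words, minimum distance 3
```

Because the code is linear, the minimum distance also equals the smallest number of 1s in any nonzero code word.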


[1] Errors happen all the time. That’s why we need error correcting codes!

One thought on “A gentle introduction to Hamming codes”

  1. Pranava K. Jha

    Just in case you didn’t know, here is a relevant paper:

P. K. Jha and G. Slutzki, A scheme to construct distance-three codes using latin squares, with applications to the n-cube, Inform. Proc. Lett., vol. 55, no. 3, pp. 123–127, Aug. 1995. (MR 1996k: 94019)
