After writing the previous post on credit card numbers, I intended to link to a previous post that discussed credit card check sums. But I couldn’t find such a post. I’ve written about other kinds of checksums, such as the checksum scheme used in Vehicle Identification Numbers, but apparently I haven’t written about credit card checksums before.
The algorithm used by credit cards, and other identification numbers, is called the Luhn algorithm. I’ll lead up to the algorithm to explain its motivation, going through a thought process Luhn might have gone through in developing his algorithm.
NB: The intermediate steps are not the Luhn algorithm. If you’re just looking for Luhn’s algorithm, skip to the end.
Step 1
A simple way to create a check sum is to add up the digits of a number and tack on the sum mod 10 on the end. For example, this process would replace 1776 with 17761 because
1 + 7 + 7 + 6 = 21
and 21 mod 10 = 1.
This simple checksum will detect if any single digit has been changed. For example, if we receive 17791 we can compute the checksum of 1779 and tell that something is wrong because the last digit should be 4.
Step 2
A common kind of error is transposing consecutive digits, and the scheme above will not detect transpositions: all digits are treated the same way, so scrambling them doesn’t change the checksum.
Luhn’s idea was to double every other digit before taking the sum. This creates an asymmetry between the contribution of consecutive digits to the check sum and makes it possible to detect all transpositions.
Which digits to double
The method described above works whether you start from the left end or the right end, and whether you start doubling with the first or second digit you encounter. But to be compatible with the Luhn algorithm used in practice, start on the right end and double the last digit.
So going back to our little example, 1776 becomes 17764 because
2*6 + 7 + 2*7 + 1 = 34.
If we accidentally transpose the first two digits, 17764 becomes 71764. But the check sum of 7176 is 0. So 71760 would be valid but 71764 is not. We’ve detected an error.
The advantage of starting at the right end is that adding zeros to the front of the number doesn’t change the checksum. So, for example, 01776 becomes 017764.
Step 3
The next refinement is strange. Instead of simply doubling every other digit, we’ll double every other digit and take the digit sum of the result. [1]
So we could calculate the checksum for 1776 by first doubling every other digit
1776 -> 1, 14, 7, 12
then reducing the doubled digits mod 9, i.e. replacing 14 by 5 and 12 by 3,
1, 14, 7, 12 -> 1, 5, 7, 3
and then finally taking the last digit of the sum.
1 + 5 + 7 + 3 = 16 -> 6.
What does this extra complication buy us? Nothing as far as I can tell. It weakens the algorithm. The algorithm in Step 2 could detect if 09 were transposed to 90; the algorithm here in Step 3 cannot.
Update: Nathan points out in the comments that without the digit sum step some changes to a single digit could go undetected, namely changing the second of a pair of doubled digits by 5. For example, 33 and 38 would have the same checksum in Step 2 but not in Step 3.
Step 4
The final step is to take the tens compliment of the checksum computed above. This adds nothing to the effectiveness of the algorithm, but it’s what Luhn did.
Luhn algorithm in practice
Let’s describe the position of digits in a number by numbering them from the right end, starting with o: the last digit is in position 0, the one to its left is in position 1, etc.
We can summarize the Luhn algorithm in the following pseudocode.
sum = 0 for d in digits: if d is in an even position: sum = sum + digit sum of 2*d else: sum = sum + d return 10 - sum mod 10
The last line isn’t quite correct. If sum mod 10
is 0, then this returns 10 when it should return 0. A clever way around this is to change the last line to
return (sum*9) mod 10
which will always return a single digit.
Update: Here is Python code for implementing the Luhn checksum.
NB: In our examples above, the checksum was added to the end. Some credit cards do this, but others put the checksum somewhere in the interior of the number.
Afterword
The Luhn algorithm was developed in 1954, though it is still widely used. Since that time much more sophisticated error correcting codes have been developed. See the links below for examples.
[1] When I first wrote this post, I thought you were supposed to take 2d mod 9. But you’re supposed to take the digit sum of 2d. These are almost the same thing, but the digit sum of 18 is 9, not 0.
Without the mod 9 step, changing any of the doubled digits by 5 would not affect the checksum.
The Luhn algorithm can be used to validate some other ID numbers as well, including the National Provider Identifier used within the US health care system. I wrote about this in Validate a National Provider ID (NPI) value. This is useful as a data quality check when working with health care data.