A VIN (vehicle identification number) is a string of 17 characters that uniquely identifies a car or motorcycle. These numbers are used around the world and have three standardized formats: one for North America, one for the EU, and one for the rest of the world.
Letters that resemble digits
The characters used in a VIN are digits and capital letters. The letters I, O, and Q are not used to avoid confusion with the numerals 0, 1, and 9. So if you’re not sure whether a character is a digit or a letter, it’s probably a digit.
It would have been better to exclude S than Q. A lower case q looks sorta like a 9, but VINs use capital letters, and an S looks like a 5.
The various parts of a VIN have particular meanings, as documented in the Wikipedia article on VINs. I want to focus on just the check sum, a character whose purpose is to help detect errors in the other characters.
Of the three standards for VINs, only the North American one requires a check sum. The check sum is in the middle of the VIN, the 9th character.
The scheme for computing the check sum is both complicated and weak. The end result is either a digit or an X. There are 33 possibilities for each character (10 digits + 23 letters) and 11 possibilities for a check sum, so the check sum cannot possibly detect all changes to even a single character.
The check sum is computed by first converting all letters to digits, computing a weighted sum of the 17 digits, and taking the remainder by 11. The weights for the 17 characters are
8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8 ,7 ,6, 5, 4, 3, 2
I don’t see any reason for these weights other than that adjacent weights are different, which is enough to detect transposition of consecutive digits (and characters might not be digits). Maybe the process was deliberately complicated in an attempt to provide a little security by obscurity.
There’s an interesting historical quirk in how the letters are converted to digits: each letter is mapped to a digit with in a somewhat complicated way. More on that here. It has to so with gaps in the EBCDIC values of letters. Letters are not mapped to their EBCDIC values per se, but there are jumps that are explained by corresponding jumps in EBCDIC.
EBCDIC?! Why not ASCII? Both EBCDIC and ASCII go back to 1963. VINs date back to 1954 in the US but were standardized in 1981. Presumably the check sum algorithm using EBCDIC digits became a de facto standard before ASCII was ubiquitous.
A better check sum
Any error detection scheme that uses 11 characters to detect changes in 33 characters is necessarily weak.
A much better approach would be to use a slight variation on the check sum algorithm Douglas Crockford recommended for base 32 encoding described here. Crockford says to take a string of characters from an alphabet of 32 characters, interpret it as a base 32 number, and take the remainder by 37 as the check sum. The same algorithm would work for an alphabet of 33 characters. All that matters is that the number of possible characters is less than 37.
Since the check sum is a number between 0 and 36 inclusive, you need 37 characters to represent it. Crockford recommended using the symbols *, ~, $, =, and U for extra symbols in his base 32 system. His system didn’t use U, and VIN numbers do. But we only need four more characters, so we could use *, ~, $, and =.
The drawback to this system is that it requires four new symbols. The advantage is that any change to a single character would be detected, as would any transposition of adjacent characters. This is proved here.