I’ve had to work a little with VIN numbers lately, and so I looked back at a post I wrote on the subject three years ago. That post goes into the details of Vehicle Identification Numbers and the quirky algorithm used to compute the check sum.
This post captures the algorithm in Python code. See the earlier post for documentation.
    import re

    def char_to_num(ch):
        "Assumes all characters are digits or capital letters."
        n = ord(ch)
        if n <= ord('9'): # digits
            return n - ord('0')
        if n < ord('I'): # A-I
            return n - ord('A') + 1
        if n <= ord('R'): # J-R
            return n - ord('J') + 1
        return n - ord('S') + 2 # S-Z

    def checksum(vin):
        w = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]
        t = 0
        for i, c in enumerate(vin):
            t += char_to_num(c)*w[i]
        t %= 11
        check = 'X' if t == 10 else str(t)
        return check

    def validate(vin):
        vinregex = "^[A-HJ-NPR-Z0-9]{17}$"
        r = re.match(vinregex, vin)
        return r and checksum(vin) == vin[8]
This code assumes the VIN number is given as ASCII or Unicode text. In particular, digits come before letters, and the numeric values of letters increase with alphabetical order.
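If the encoding assumption is a concern, the same values can be written out as an explicit lookup table. The following is a minimal sketch, not part of the original code: the names VIN_VALUES, WEIGHTS, and checksum_from_table are introduced here for illustration, and the test string is a commonly cited sample VIN rather than one taken from this post.

```python
# Explicit transliteration table, so nothing depends on code point order.
# VIN_VALUES, WEIGHTS and checksum_from_table are illustrative names only.
VIN_VALUES = {str(d): d for d in range(10)}        # digits keep their value
VIN_VALUES.update(zip("ABCDEFGH", range(1, 9)))    # A-H -> 1-8
VIN_VALUES.update(zip("JKLMN", range(1, 6)))       # J-N -> 1-5
VIN_VALUES.update({"P": 7, "R": 9})                # O and Q are not used
VIN_VALUES.update(zip("STUVWXYZ", range(2, 10)))   # S-Z -> 2-9

WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]

def checksum_from_table(vin):
    "Check character for a 17-character VIN, using the lookup table."
    t = sum(VIN_VALUES[c] * w for c, w in zip(vin, WEIGHTS)) % 11
    return "X" if t == 10 else str(t)

print(checksum_from_table("1M8GDM9AXKP042788"))  # X
```

A character outside the table raises a KeyError here, so the table also acts as a crude character check.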
The code could seem circular: the input is the full VIN, including the checksum. But the checksum goes in the 9th position, which has weight 0. So the checksum doesn't contribute to its own calculation.
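A quick way to see this is to vary the 9th character and confirm the computed check character does not change. The sketch below repeats char_to_num and checksum from the post so that it runs on its own; the sample VIN is an assumption chosen for illustration.

```python
def char_to_num(ch):
    "Assumes all characters are digits or capital letters."
    n = ord(ch)
    if n <= ord('9'):  # digits
        return n - ord('0')
    if n < ord('I'):   # A-H
        return n - ord('A') + 1
    if n <= ord('R'):  # J-R
        return n - ord('J') + 1
    return n - ord('S') + 2  # S-Z

def checksum(vin):
    w = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]
    t = sum(char_to_num(c) * w[i] for i, c in enumerate(vin)) % 11
    return 'X' if t == 10 else str(t)

vin = "1M8GDM9AXKP042788"  # sample VIN, used only for illustration

# Replace position 9 (index 8) with every possible check character.
variants = {checksum(vin[:8] + c + vin[9:]) for c in "0123456789X"}
print(variants)  # {'X'}
```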
Update: I added a regular expression to check that the VIN contains only valid characters.
Is that code right?
if n < ord('I'): # A-I
Shouldn't that be:
if n <= ord('I'): # A-I ?
Never trust comments.
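One way to examine the question is to compare the two versions of the test over all capital letters. This is a sketch only: letter_value is a hypothetical helper written for the comparison, with a flag that switches between `<` and `<=`.

```python
import re
import string

def letter_value(ch, inclusive):
    "Value of a capital letter; `inclusive` selects <= versus < in the first test."
    n = ord(ch)
    first_range = n <= ord('I') if inclusive else n < ord('I')
    if first_range:
        return n - ord('A') + 1
    if n <= ord('R'):
        return n - ord('J') + 1
    return n - ord('S') + 2

# Letters for which the two versions of the test disagree.
differs = [c for c in string.ascii_uppercase
           if letter_value(c, True) != letter_value(c, False)]
print(differs)  # ['I']

# The regular expression used in validate() rejects any VIN containing I.
vinregex = "^[A-HJ-NPR-Z0-9]{17}$"
print(bool(re.match(vinregex, "I" * 17)))  # False
```

Under these assumptions the two tests differ only for the letter I, which validate never lets through. Calling checksum directly on a string containing I would give different results depending on the test.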
It looks like this has something to do with punch card representation. Those base digits reflect the numeric hole number, and there seems to be an adjustment for the zone codes, perhaps the count of zone codes? This doesn't seem easy for a punch card machine to handle, but I'm betting a computer like the IBM 401 or 1620 could handle it pretty easily.
The older article at https://www.johndcook.com/blog/2019/09/12/vin-check-sum/ says “The letters I, O, and Q are not used to avoid confusion with the numerals 0, 1, and 9” and comments that the numeric values are related to EBCDIC encoding.
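The EBCDIC connection can be checked from Python, assuming the standard library's cp037 codec is an acceptable stand-in for EBCDIC. In this sketch, ebcdic_low_nibble is a hypothetical helper, and char_to_num is repeated from the post so the example runs on its own.

```python
def char_to_num(ch):
    "Assumes all characters are digits or capital letters."
    n = ord(ch)
    if n <= ord('9'):  # digits
        return n - ord('0')
    if n < ord('I'):   # A-H
        return n - ord('A') + 1
    if n <= ord('R'):  # J-R
        return n - ord('J') + 1
    return n - ord('S') + 2  # S-Z

def ebcdic_low_nibble(ch):
    "Low four bits of the character's EBCDIC (code page 037) byte."
    return ch.encode("cp037")[0] & 0x0F

valid_chars = "0123456789ABCDEFGHJKLMNPRSTUVWXYZ"  # no I, O, or Q

# Compare the VIN value of each valid character with its EBCDIC low nibble.
matches = all(char_to_num(c) == ebcdic_low_nibble(c) for c in valid_chars)
print(matches)  # True
```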