Country and language abbreviations

I recently had to mark a bit of German text as German in an HTML file, and I wondered whether the abbreviation might be GER for German or DEU for Deutsch.

Turns out the answer is both, almost. The language abbreviations used in HTML language tags (the lang attribute) are drawn from ISO 639, and they come in three-letter and two-letter varieties. The three-letter abbreviation for German is GER but the two-letter abbreviation is DE.

There are also standard two- and three-letter abbreviations for countries, given in ISO 3166. These are DE and DEU for Germany. I was curious how often a country abbreviation is also a language abbreviation.

I found text files giving the ISO 639 and ISO 3166 abbreviations, and used the comm utility to see what the intersection was.
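
For readers who would rather not use the command line, here is a minimal Python sketch of the same idea: treat each list of abbreviations as a set and take the intersection. The function name common_codes and the tiny sample lists are my own illustration, not the files used for the counts in this post. The samples are a handful of real codes, nowhere near the full standards.

```python
def common_codes(language_codes, country_codes):
    """Return the sorted codes that appear in both collections, ignoring case."""
    languages = {code.strip().upper() for code in language_codes}
    countries = {code.strip().upper() for code in country_codes}
    return sorted(languages & countries)


# Hand-picked samples for illustration only (assumption: not the full lists).
language_3 = ["ger", "fre", "cze", "bel"]  # ISO 639 three-letter language codes
country_3 = ["DEU", "FRA", "CZE", "BEL"]   # ISO 3166 three-letter country codes

print(common_codes(language_3, country_3))  # ['BEL', 'CZE']
```

With the complete lists read from files, one code per line, the same function would give the full intersection. Normalizing case matters because language codes are conventionally written in lowercase and country codes in uppercase.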

There are 253 languages and 252 countries in the two standards. There are 110 two-letter abbreviations common to both, and 40 three-letter abbreviations common to both.

However, an abbreviation that appears in both standards doesn’t necessarily represent the same thing in each. Sometimes the meanings coincide. For example, CZE abbreviates both the Czech Republic and the Czech language. But BEL represents the nation of Belgium and the Belarusian language.