In HTML you can mark the language of a piece of text by putting it inside span tags and setting the lang
attribute to a two-letter abbreviation. For example,
<span lang="fr">Allons enfants de la Patrie, Le jour de gloire est arrivé !</span>
indicates that the first two lines of the French national anthem are in French.
What are the two-letter codes for languages? I’ve had to look this up several times, and I’m writing the answer here for my future reference and for the benefit of anyone else with the same question.
Finding these abbreviations is a bit of a goose chase. Search for the Microdata standard and that takes you to a W3C document. Search that document for “language” and you don’t find what you’re looking for. But if you’re persistent you’ll find that you’re supposed to use BCP 47 abbreviations. Go there and you see a link for RFC 5646: Tags for Identifying Languages. Click on that and you get a link for the RFC in various formats. Click on that and you think you must have finally found it, a table of languages and abbreviations. Au contraire! This takes you to an 84-page document on how to format language abbreviations. Eventually you see something about ISO 639, and searching on that may take you to the Wikipedia page on ISO 639-1 and that has the table you’re looking for.
Short answer: Look up ISO 639.
The longer answer is that language classification has a surprising amount of detail. There are two-letter and three-letter abbreviations, four-to-eight letter abbreviations, variations, private use variations, … But as a first pass, simply use the two-letter abbreviation. There are ways to be more specific if you need to.
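As a rough sketch of what "more specific" can look like, a BCP 47 tag can append a region or script subtag to the two-letter language code. The sample phrases below are my own illustrations, and which subtags you actually need depends on your content.

```html
<!-- Language only: the two-letter ISO 639-1 code -->
<span lang="fr">Bonjour</span>

<!-- Language plus region: Canadian French, Brazilian Portuguese -->
<span lang="fr-CA">Bonjour</span>
<span lang="pt-BR">Bom dia</span>

<!-- Language plus script: Chinese in Traditional vs. Simplified characters -->
<span lang="zh-Hant">漢語</span>
<span lang="zh-Hans">汉语</span>
```

If you don't need that level of detail, the plain two-letter code is enough.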
And if you have to guess, use the first two letters of the English name, such as ar for Arabic and ru for Russian. Two major exceptions are Chinese (zh) and Spanish (es). There are many other exceptions as well, but if I wanted to remember a list of abbreviations, I’d narrow the list by first scratching off the ones that are abbreviated by their first two letters.