Unicode to LaTeX

I’ve run across a couple web sites that let you enter a LaTeX symbol and get back its Unicode value. But I didn’t find a site that does the reverse, going from Unicode to LaTeX, so I wrote my own.

Unicode / LaTeX Conversion

If you enter Unicode, it will return LaTeX. If you enter LaTeX, it will return Unicode. It interprets a string starting with “U+” as a Unicode code point, and a string starting with a backslash as a LaTeX command.

screenshot of www.johndcook.com/unicode_latex.png

For example, the screenshot above shows what happens if you enter U+221E and click “convert.” You could also enter \infty and get back U+221E.
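The dispatch rule described above could be sketched roughly as follows. This is not the converter's actual source, which isn't shown here. The function name `convert` and the two-entry lookup table are assumptions made up for illustration; a real table would have hundreds of entries.

```python
# Hypothetical lookup table for illustration only.
LATEX_TO_CODEPOINT = {
    r"\infty": 0x221E,
    r"\Omega": 0x03A9,
}
# Reverse table, built by inverting the one above.
CODEPOINT_TO_LATEX = {cp: cmd for cmd, cp in LATEX_TO_CODEPOINT.items()}


def convert(text):
    """Return LaTeX for a 'U+XXXX' input, or 'U+XXXX' for a LaTeX command.

    Returns None when the input is well formed but not in the table.
    """
    text = text.strip()
    if text.upper().startswith("U+"):
        # Everything after "U+" is read as a hexadecimal code point.
        codepoint = int(text[2:], 16)
        return CODEPOINT_TO_LATEX.get(codepoint)
    if text.startswith("\\"):
        # A leading backslash marks the input as a LaTeX command.
        codepoint = LATEX_TO_CODEPOINT.get(text)
        return None if codepoint is None else f"U+{codepoint:04X}"
    raise ValueError("expected input starting with 'U+' or a backslash")


print(convert("U+221E"))
print(convert(r"\infty"))
```

With this toy table, the first call prints `\infty` and the second prints `U+221E`, matching the example in the screenshot.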

However, if you go from Unicode to LaTeX to Unicode, you won’t always end up where you started. There may be multiple Unicode code points that map to a single LaTeX symbol. This is because Unicode is semantic and LaTeX is not. For example, Unicode distinguishes between the Greek letter Ω (U+03A9) and the symbol Ω (U+2126) for ohms, the unit of electrical resistance, but LaTeX does not.
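Here is a small sketch of why the round trip can fail. The two tables are hypothetical and assume, as in the omega example, that both code points map to `\Omega` while `\Omega` maps back only to the Greek letter. The character names come from Python's standard `unicodedata` module.

```python
import unicodedata

# Hypothetical many-to-one table: two code points, one LaTeX command.
CODEPOINT_TO_LATEX = {
    0x03A9: r"\Omega",  # GREEK CAPITAL LETTER OMEGA
    0x2126: r"\Omega",  # OHM SIGN
}
# Going back, a single command can only name one code point.
LATEX_TO_CODEPOINT = {r"\Omega": 0x03A9}


def round_trip(codepoint):
    """Unicode -> LaTeX -> Unicode, using the toy tables above."""
    return LATEX_TO_CODEPOINT[CODEPOINT_TO_LATEX[codepoint]]


for cp in (0x03A9, 0x2126):
    back = round_trip(cp)
    print(f"U+{cp:04X} ({unicodedata.name(chr(cp))}) -> U+{back:04X}")
```

The Greek letter survives the round trip, but the ohm sign comes back as U+03A9, because the distinction is lost once both are written as `\Omega`.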

11 thoughts on “Unicode to LaTeX”

  1. Can’t you just use XeTeX or LuaTeX? If you use a math-enabled font, you should be able to get the same result, and your source will be more readable if you use a Unicode text editor.

  2. Kyle: XeTeX and LuaTeX don’t run everywhere. Sometimes you need to use plain LaTeX.

    Also, LaTeX commands are more memorable than Unicode code points. So when I’m writing LaTeX, I’d rather enter LaTeX commands than Unicode values.

  3. Actually you are supposed to differentiate between ohm and omega, and there are a number of packages to do that, such as siunitx. For example, if you use \Omega for ohms it will be wrong, as you will get an italic omega instead of an upright omega.

  4. I have to say that I’m not convinced about this omega/ohm thing. Do we use a different “m” as the symbol for metres? Nope, so I don’t see why we need to be prejudiced against Greek letters, either.

  5. LaTeX was designed to put ink on paper. So if two concepts produce the same pattern of pixels on paper, there’s no need to distinguish them.

    Unicode is meant to be more semantic, and so it makes a distinction between the capital ‘A’ of the English alphabet and the capital ‘A’ of the Greek alphabet. That distinction is useful in software. Maybe the software treats Greek text differently than English text. Maybe you want to search for a capital alpha in an English document containing countless English A’s.

    Now LaTeX is being used to create online documents, usually PDFs. It would be nice if it made more semantic distinctions, but it wasn’t designed for that.

  6. John: Semantic LaTeX is a big thing these days. LaTeX had a lot of features supporting that back in the day, before anything else did (\emph instead of \textit when appropriate). My code is often criticized as being too low level and not using enough semantic macros.
