Unicode to LaTeX

I’ve run across a couple of web sites that let you enter a LaTeX symbol and get back its Unicode value. But I didn’t find a site that does the reverse, going from Unicode to LaTeX, so I wrote my own.

Unicode / LaTeX Conversion

If you enter Unicode, it will return LaTeX. If you enter LaTeX, it will return Unicode. It interprets a string starting with “U+” as a Unicode code point, and a string starting with a backslash as a LaTeX command.
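The prefix rule above is simple to sketch. The following is a minimal illustration, not the converter's actual code: the table `UNICODE_TO_LATEX` and the function `convert` are hypothetical names, and the three sample entries stand in for the much larger data set a real tool would need.

```python
# Hypothetical sample table; the converter's real data is not reproduced here.
UNICODE_TO_LATEX = {
    0x221E: r"\infty",
    0x03B1: r"\alpha",
    0x2264: r"\leq",
}
# Reverse lookup, built by inverting the sample table.
LATEX_TO_UNICODE = {cmd: cp for cp, cmd in UNICODE_TO_LATEX.items()}


def convert(query: str) -> str:
    """Dispatch on the input's prefix: "U+" means Unicode, a backslash means LaTeX."""
    query = query.strip()
    if query.upper().startswith("U+"):
        # Treat the rest of the input as a hexadecimal code point, e.g. "U+221E".
        try:
            code_point = int(query[2:], 16)
        except ValueError:
            return "not a valid code point"
        return UNICODE_TO_LATEX.get(code_point, "no LaTeX command found")
    if query.startswith("\\"):
        # Treat the input as a LaTeX command, e.g. "\infty".
        code_point = LATEX_TO_UNICODE.get(query)
        if code_point is None:
            return "no Unicode value found"
        return f"U+{code_point:04X}"
    return "input must start with U+ or a backslash"


print(convert("U+221E"))   # \infty
print(convert(r"\infty"))  # U+221E
```

Checking the prefix first keeps the two directions independent, so each lookup only has to handle one kind of input.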

screenshot of www.johndcook.com/unicode_latex.png

For example, the screenshot above shows what happens if you enter U+221E and click “convert.” You could also enter \infty and get back U+221E.

However, if you go from Unicode to LaTeX and back to Unicode, you won’t always end up where you started. Several Unicode values may map to a single LaTeX symbol. This is because Unicode is semantic and LaTeX is not. For example, Unicode distinguishes between the Greek letter Ω (U+03A9) and the symbol Ω for ohms (U+2126), the unit of electrical resistance, but LaTeX does not: both are typeset as \Omega.
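Here is a small sketch of why the round trip loses information. It assumes a hypothetical two-entry table in which both omegas map to \Omega; when the table is inverted, only one code point can win, and this sketch arbitrarily keeps the first entry.

```python
import unicodedata

# Hypothetical two-entry table: both code points typeset as \Omega in LaTeX.
UNICODE_TO_LATEX = {
    0x03A9: r"\Omega",  # GREEK CAPITAL LETTER OMEGA
    0x2126: r"\Omega",  # OHM SIGN
}

# Inverting a many-to-one map forces a choice; here the first entry wins.
LATEX_TO_UNICODE = {}
for code_point, command in UNICODE_TO_LATEX.items():
    LATEX_TO_UNICODE.setdefault(command, code_point)


def round_trip(code_point: int) -> int:
    """Go from Unicode to LaTeX and back to Unicode."""
    return LATEX_TO_UNICODE[UNICODE_TO_LATEX[code_point]]


for cp in UNICODE_TO_LATEX:
    back = round_trip(cp)
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))} -> U+{back:04X}")
# U+03A9 GREEK CAPITAL LETTER OMEGA -> U+03A9
# U+2126 OHM SIGN -> U+03A9
```

The ohm sign comes back as a Greek omega: the LaTeX command alone doesn’t carry enough information to tell the two apart.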

Posted in Typography
10 comments on “Unicode to LaTeX”
  1. Kyle says:

    Can’t you just use XeTeX or LuaTeX? If you use a math enabled font, you should be able to get the same result, and your source will be more readable if you use a Unicode text editor.

  2. Edgardo says:

    This site lets you draw a symbol and returns the LaTeX code for it:

  3. John says:

    Kyle: XeTeX and LuaTeX don’t run everywhere. Sometimes you need to use plain LaTeX.

    Also, LaTeX commands are more memorable than Unicode code points. So when I’m writing LaTeX, I’d rather enter LaTeX commands than Unicode values.

  4. Ed Davies says:

    It could also usefully allow you to type (and, particularly, paste) the actual Unicode character as well.

  5. Bart says:

    Hey, do you mind if I steal that data.js file?
    I’d like to start adding stuff like this to a Unicode lookup tool of mine…

  6. John says:

    Bart: Go ahead.

  7. Canageek says:

    Actually, you are supposed to differentiate between ohm and omega, and there are a number of packages for doing that, such as siunitx. For example, if you use \Omega for ohms it will be wrong, as you will get an italic omega instead of an upright omega.

  8. Ed Davies says:

    I have to say that I’m not convinced about this omega/ohm thing. Do we use a different “m” as the symbol for metres? Nope, so I don’t see why we need to be prejudiced against Greek letters, either.

  9. John says:

    LaTeX was designed to put ink on paper. So if two concepts produce the same pattern of pixels on paper, there’s no need to distinguish them.

    Unicode is meant to be more semantic, and so it makes a distinction between the capital ‘A’ of the English alphabet and the capital ‘A’ of the Greek alphabet. That distinction is useful in software. Maybe the software treats Greek text differently than English text. Maybe you want to search for a capital alpha in an English document containing countless English A’s.

    Now LaTeX is being used to create online documents, usually PDFs. It would be nice if it made more semantic distinctions, but it wasn’t designed for that.

  10. Canageek says:

    John: Semantic LaTeX is a big thing these days. LaTeX had a lot of features supporting that back in the day, before anything else did (\emph instead of \textit when appropriate). My code is often criticized as being too low level and not using enough semantic macros.

1 Pings/Trackbacks for "Unicode to LaTeX"
  1. [...] Unicode to LaTeX ::: The Endeavour [...]