# Five lemma, ASCII art, and Unicode

A few days ago I wrote about creating ASCII art in Emacs using ditaa. Out of curiosity, I wanted to try making the Five Lemma diagram. [1]

The examples in the ditaa site all have arrows between boxes, but you don’t have to have boxes.

Here’s the ditaa source:

A₀ ---> A₁ ---> A₂ ---> A₃ ---> A₄
|       |       |       |       |
| f₀    | f₁    | f₂    | f₃    | f₄
|       |       |       |       |
v       v       v       v       v
B₀ ---> B₁ ---> B₂ ---> B₃ ---> B₄


and here’s the image it produces:

It’s not pretty. You could make a nicer image with LaTeX. But as the old saying goes, the remarkable thing about a dancing bear is not that it dances well but that it dances at all.

The trick to getting the subscripts is to use Unicode characters 0x208n for subscript n. As I noted at the bottom of this post, ditaa isn’t strictly limited to ASCII art. You can use Unicode characters as well. You may or may not be able to see the subscripts in the source code they are not part of the most widely supported set of characters.

* * *

[1]  The Five Lemma is a diagram-chasing result from homological algebra. It lets you infer properties the middle function f from properties of the other f‘s.

# Graphemes

Here’s something amusing I ran across in the glossary of Programming Perl:

grapheme A graphene is an allotrope of carbon arranged in a hexagonal crystal lattice one atom thick. Grapheme, or more fully, a grapheme cluster string is a single user-visible character, which in turn may be several characters (codepoints) long. For example … a “ȫ” is a single grapheme but one, two, or even three characters, depending on normalization.

In case the character ȫ doesn’t display correctly for you, here it is:

First, graphene has little to do with grapheme, but it’s geeky fun to include it anyway. (Both are related to writing. A grapheme has to do with how characters are written, and the word graphene comes from graphite, the “lead” in pencils. The origin of grapheme has nothing to do with graphene but was an analogy to phoneme.)

Second, the example shows how complicated the details of Unicode can get. The Perl code below expands on the details of the comment about ways to represent ȫ.

This demonstrates that the character . in regular expressions matches any single character, but \X matches any single grapheme. (Well, almost. The character . usually matches any character except a newline, though this can be modified via optional switches. But \X matches any grapheme including newline characters.)


# U+0226, o with diaeresis and macron
my $a = "\x{22B}"; # U+00F6 U+0304, (o with diaeresis) + macron my$b = "\x{F6}\x{304}";

# o U+0308 U+0304, o + diaeresis + macron
my $c = "o\x{308}\x{304}"; my @versions = ($a, $b,$c);

# All versions display the same.
say @versions;

# The versions have length 1, 2, and 3.

TorbjoernT and scmbradley let me know there’s a better way: use Martin Hansel’s package mhchem. The package is simpler to use and it correctly handles subtle typographical details.

Using the mhchem package, sulfate would be written ce{SO4^2-}. In addition to chemical compounds, mhchem has support for bonds, arrows, and related chemical notation.

Example:

Source:

\documentclass{article}
\usepackage[version=3]{mhchem}
\parskip=0.1in
\begin{document}

\ce{SO4^2-}

\ce{^{227}_{90}Th+}

\ce{A\bond{-}B\bond{=}C\bond{#}D}

\ce{CO2 + C -> 2CO}

\ce{SO4^2- + Ba^2+ -> BaSO4 v}

\end{document}

Related posts:

# Complexity of HTML and LaTeX

Sometime around 1994, my office mate introduced me to HTML by saying it was 10 times simpler than LaTeX. At the time I thought he was right. Now I’m not so sure. Maybe he was right in 1994 when the expectations for HTML were very low.

It is easier to bang out a simple, ugly HTML page than to write your first LaTeX document. When you compare the time required to make an attractive document, the effort becomes more comparable. The more sophisticated you get, the simpler LaTeX becomes by comparison.

Of course the two languages are not exactly comparable. HTML targets a web browser while LaTeX targets paper. HTML would be much simpler if people only used it to create documents to print out on their own printer. A major challenge with HTML is not knowing how someone else will use your document. You don’t know what browser they will view it with, at what resolution, etc. For that matter, you don’t know whether they’re even going to view it at all — they may use a screen reader to listen to the document.

Writing HTML is much more complicated than writing LaTeX if you take a broad view of all that is required to do it well: learning about accessibility and internationalization, keeping track of browser capabilities and market shares, adapting to evolving standards, etc. The closer you look into it, the less HTML has in common with LaTeX. The two languages are not simply two systems of markup; they address different problems.