From the category archives:

Typography

Gutenberg + Readability

by John on December 18, 2011

Here’s a very simple idea: Use Project Gutenberg for content and Readability for style.

Project Gutenberg has a large collection of public domain books in digital form. The books are available in several formats, none of which are ideal for reading. Project Gutenberg provides text without much styling in order to make it easier for people to use the content as they please.

You can go to the HTML version of a book on Gutenberg and use Readability (or Instapaper) to format it for easier reading. Importing the HTML page to a Kindle similarly improves the formatting.

***

Has anyone made a style sheet to approximate the look of Readability or Instapaper? I’d like to use something like that to improve the appearance of the static HTML pages on my site.

{ 10 comments }

Readability

by John on November 28, 2011

The Readability bookmarklet lets you reformat any web page to make it easier to read. It strips out flashing ads and other distractions. It uses black text on a white background, wide margins, a moderate-sized font, etc. I use Readability fairly often. (Instapaper is a similar service. I discuss it at the end of this post.)

Yesterday I used it to reformat an article on literate programming. For some inexplicable reason, the author chose to use a lemon yellow background. It’s ironic that the article is about making source code easier to read. The content of the article is easy to read, but the format is not.

Readability to the rescue! Here are before and after screen shots.

Before:

After:

I recommend the article, Example of Literate Programming in HTML, and I also recommend reformatting the page unless you enjoy reading black text on a yellow background.

Readability did a good job until about halfway through the article. The article has C and HTML code examples, and perhaps these confused Readability. (Readability usually handles code samples well. It correctly formats the first few code samples in this article.) The last half of the article renders like source code, and the font gets smaller and smaller.

I ran the page through an HTML validator to see whether some malformed HTML could be the source of the problem. The validator found numerous problems, so perhaps that was the issue.

I haven’t seen Readability fail like this before. I’ve been surprised how well it has handled some pages I thought might trip it up.

I ended up saving the article and editing its source, changing the bgcolor value to white. It’s a nice article on literate programming once you get past the formatting. The best part of the article is the first section, and that much Readability formats correctly.

Instapaper

Instapaper reformats web pages similarly. It produces a narrower column of text, but otherwise the output looks quite similar.

Instapaper did not discover the title of the literate programming article. (The title of the article was not in an <h1> tag as software might expect but was only in a <title> tag in the page header.) However, it did format the entire body of the article correctly.

I find it slightly more convenient to use the Readability bookmarklet than to submit a link to Instapaper. I imagine there are browser plug-ins that make Instapaper just as easy to use, though I haven’t looked into this because I’m usually satisfied with Readability.

Related posts:

Literate programming and statistics
Tricky code

{ 11 comments }

Draw a symbol, look it up

by John on November 12, 2011

LaTeX users may know about Detexify, a web site that lets you draw a character then looks up its TeX command. Now there’s a new site Shapecatcher that does the same thing for Unicode. According to the site, “Currently, there are 10,007 Unicode character glyphs in the database.” It does not yet support Chinese, Japanese, or Korean.

For example, I drew a treble clef on the page:

The site came back with a list of possible matches, and the first one was what I was hoping for:

Interestingly, the sixth possible match on the list was a symbol for contour integration:

Notice the treble clef response has a funny little box on the right side. That’s because my browser did not have a glyph to display that Unicode character. The browser did have a glyph for the contour integration symbol and displayed it.

Another Unicode resource I recommend is this Unicode Codepoint Chart. It is organized by code point value, in blocks of 256. If you were looking for the contour integration symbol above, for example, you could click on a link “U+2200 to U+22FF: Mathematical Operators” and see a grid of 256 symbols and click on the one you’re looking for. This site gives more detail about each character than does Shapecatcher. So you might use Shapecatcher to find where to start looking, then go to the Unicode Codepoint Chart to find related symbols or more details.

Other posts on Unicode:

Why Unicode is subtle
The disappointing state of Unicode fonts
Entering Unicode characters in Windows and Linux
Inserting graphics in Twitter messages

{ 10 comments }

Typesetting “C#” in LaTeX

by John on October 18, 2011

How do you refer to the C# programming language in LaTeX? Simply typing C# doesn’t work because # is a special character in LaTeX. You could type C\#. That works, but it looks a little odd. The number sign is too big and too low.
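
One way to tame the number sign, offered as a minimal sketch and not necessarily the solution from the full post: shrink the # and raise it slightly. The macro name \CSharp and the specific sizes are assumptions chosen for illustration; adjust them to suit your font.

```latex
\documentclass{article}
% \CSharp is a hypothetical macro name; the sizes below are guesses to tune by eye.
% \nolinebreak keeps the C and the # on the same line.
\newcommand{\CSharp}{C\nolinebreak\hspace{-.05em}\raisebox{.4ex}{\footnotesize\bfseries\#}}
\begin{document}
Plain: C\#. Adjusted: \CSharp.
\end{document}
```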

{ 3 comments }

Typesetting chemistry in LaTeX

by John on December 8, 2010

Yesterday I gave the following tip on TeXtip:

Set chemical formulas with math Roman. Example: sulfate is $\mathrm{SO_4^{2-}}$

TorbjoernT and scmbradley let me know there’s a better way: use Martin Hensel’s package mhchem. The package is simpler to use and it correctly handles subtle typographical details.

Using the mhchem package, sulfate would be written \ce{SO4^2-}. In addition to chemical compounds, mhchem has support for bonds, arrows, and related chemical notation.

Example:

Source:

\documentclass{article}
\usepackage[version=3]{mhchem}
\parskip=0.1in
\begin{document}

\ce{SO4^2-}

\ce{^{227}_{90}Th+}

\ce{A\bond{-}B\bond{=}C\bond{#}D}

\ce{CO2 + C -> 2CO}

\ce{SO4^2- + Ba^2+ -> BaSO4 v}

\end{document}

For more information, see the mhchem package documentation.

Related posts:

Top four LaTeX mistakes
Hyperlinks in LaTeX-generated PDF

{ 7 comments }

Complexity of HTML and LaTeX

by John on May 12, 2010

Sometime around 1994, my office mate introduced me to HTML by saying it was 10 times simpler than LaTeX. At the time I thought he was right. Now I’m not so sure. Maybe he was right in 1994 when the expectations for HTML were very low.

It is easier to bang out a simple, ugly HTML page than to write your first LaTeX document. When you compare the time required to make an attractive document, the effort becomes more comparable. The more sophisticated you get, the simpler LaTeX becomes by comparison.

Of course the two languages are not exactly comparable. HTML targets a web browser while LaTeX targets paper. HTML would be much simpler if people only used it to create documents to print out on their own printer. A major challenge with HTML is not knowing how someone else will use your document. You don’t know what browser they will view it with, at what resolution, etc. For that matter, you don’t know whether they’re even going to view it at all — they may use a screen reader to listen to the document.

Writing HTML is much more complicated than writing LaTeX if you take a broad view of all that is required to do it well: learning about accessibility and internationalization, keeping track of browser capabilities and market shares, adapting to evolving standards, etc. The closer you look into it, the less HTML has in common with LaTeX. The two languages are not simply two systems of markup; they address different problems.

Related links:

Side benefits of accessibility
Math symbols in HTML, XML, TeX, and Unicode
Top four LaTeX mistakes

{ 9 comments }

Top four LaTeX mistakes

by John on February 15, 2010

Here are four of the most common typesetting errors I see in books and articles created with LaTeX.

1) Quotes

Quotation marks in LaTeX files begin with two back ticks, ``, and end with two single quotes, ''.

The first “Yes” was written as

``Yes.''

in LaTeX while the one with the backward opening quote was written as

"Yes."

2) Differentials

Differentials, most commonly the dx at the end of an integral, should have a little space separating them from other elements. The “dx” is a unit and so it needs a little space to keep from looking like the product of “d” and “x.” You can do this in LaTeX by inserting \, before and between differentials.

The first integral was written as

 \int_0^1 f(x) \, dx

while the second forgot the \, and was written as

 \int_0^1 f(x)  dx

The need for a little extra space around differentials becomes more obvious in multiple integrals.

The first was written as

dx \, dy = r \, dr \, d\theta

while the second was written as

dx  dy = r  dr  d\theta

3) Multi-letter function names

The LaTeX commands for typesetting functions like sin, cos, log, max, etc. begin with a backslash. The command \log keeps “log,” for example, from looking like the product of variables “l”, “o”, and “g.”

The first example above was written as

\log e^x = x

and the second as

log e^x = x

The double angle identity for sine is readable when properly typeset and a jumbled mess when the necessary backslashes are left out.

The first example was written

\sin 2u = 2 \sin u \cos u

and the second as

sin 2u = 2 sin u cos u
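
LaTeX predefines commands only for the common function names. For a name it lacks, the amsmath package provides \DeclareMathOperator and \operatorname, which give the same upright type and spacing. Here is a minimal sketch; the operator \sinc is just an illustrative choice.

```latex
\documentclass{article}
\usepackage{amsmath}
% Define \sinc so it is typeset upright with operator spacing, like \sin.
\DeclareMathOperator{\sinc}{sinc}
\begin{document}
% \operatorname does the same job for a one-off name.
\[ \sinc x = \frac{\sin x}{x}, \qquad \operatorname{erfc} x = 1 - \operatorname{erf} x \]
\end{document}
```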

4) Failure to use math mode

LaTeX uses math mode to distinguish variables from ordinary letters. Variables are typeset in math italic, a special style that is not the same as ordinary italic prose.

The first sentence was written as

Given a matrix $A$ and vector $b$, solve $Ax = b$.

and the second as

Given a matrix A and vector b, solve Ax = b.
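
The four fixes above can be collected into one small file to compile and compare against the incorrect versions. This sketch reuses only the snippets from this post; the article document class is an arbitrary choice.

```latex
\documentclass{article}
\begin{document}

% 1) Quotes: two back ticks to open, two single quotes to close.
``Yes.''

% 2) Differentials: a thin space \, before each one.
\[ \int_0^1 f(x) \, dx \qquad dx \, dy = r \, dr \, d\theta \]

% 3) Function names: use the backslash commands.
\[ \log e^x = x \qquad \sin 2u = 2 \sin u \cos u \]

% 4) Math mode for every variable, even in running prose.
Given a matrix $A$ and vector $b$, solve $Ax = b$.

\end{document}
```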

Related posts:

Microsoft equation editor
Converting Excel tables to LaTeX
Typesetting music in LaTeX
Contrasting Word and LaTeX
Things that work best when you don’t notice them

{ 25 comments }

The disappointing state of Unicode fonts

by John on January 16, 2010

Modern operating systems understand Unicode internally, but font support for Unicode is spotty. For an example of the problems this can cause, take a look at these screen shots of how the same Twitter message appears differently depending on what program is used to read it.

No font can display all Unicode characters. According to Wikipedia

… it would be impossible to create such a font in any common font format, as Unicode includes over 100,000 characters, while no widely-used font format supports more than 65,535 glyphs.

However, the biggest problem isn’t the number of characters a font can display. Most Unicode characters are quite rare. About 30,000 characters are enough to display the vast majority of characters in use in all the world’s languages as well as a generous selection of symbols. However, Unicode fonts vary greatly in their support even for the more commonly used ranges of characters. See this comparison chart. The only range completely covered by all Unicode fonts in the chart is the 128 characters of Latin Extended-A.

Unifont supports all printable characters in the basic multilingual plane, characters U+0000 through U+FFFF. This includes the 30,000 characters mentioned above plus many more. Unifont isn’t pretty, but it’s complete. As far as I know, it’s the only font that covers the characters below U+FFFF.

Related posts:

Why Unicode is subtle

Entering Unicode characters in Windows, Linux

{ 17 comments }

Free alternative to Consolas font

by John on September 21, 2009

Consolas is my favorite monospace font. It’s a good programmer’s font because it exaggerates the differences between some characters that may easily be confused. It ships with Visual Studio and with many other Microsoft products. See this post for examples.

I recently found out about Inconsolata, a free font similar to Consolas. Inconsolata is part of the OFL font collection from SIL International.

Another interesting font from SIL is Andika, mentioned previously here. The Andika home page describes this font as follows.

Andika is a sans serif, Unicode-compliant font designed especially for literacy use, taking into account the needs of beginning readers. The focus is on clear, easy-to-perceive letterforms that will not be readily confused with one another.

Related posts:

Better R console fonts
Adding fonts to the PowerShell console
Comic Sans and dyslexia

{ 6 comments }

How to write multi-part definitions in LaTeX

by John on September 14, 2009

This post explains how to typeset multi-part definitions in LaTeX.

The absolute value function is a simple example of a two-part definition.

absolute value definition

The Möbius function is a more complicated example of a three-part definition.

definition of Mobius function

Here’s how you could write LaTeX for the absolute value definition.

|x| =
\left\{
	\begin{array}{ll}
		x  & \mbox{if } x \geq 0 \\
		-x & \mbox{if } x < 0
	\end{array}
\right.

The right-hand side of the equation is an array with an opening brace sized to fit on the left. Braces are special characters and so the opening brace needs to be escaped with a backslash. LaTeX requires a \right for every \left but the dot in \right. says to make the matching container on the right side empty.

Since this pattern comes up fairly often, it’s handy to have a command to encapsulate it. We define \twopartdef as follows.

\newcommand{\twopartdef}[4]
{
	\left\{
		\begin{array}{ll}
			#1 & \mbox{if } #2 \\
			#3 & \mbox{if } #4
		\end{array}
	\right.
}

Then we could call it as follows:

|x| = \twopartdef { x } {x \geq 0} {-x} {x < 0}

The command \threepartdef is very similar to \twopartdef.

\newcommand{\threepartdef}[6]
{
	\left\{
		\begin{array}{ll}
			#1 & \mbox{if } #2 \\
			#3 & \mbox{if } #4 \\
			#5 & \mbox{if } #6
		\end{array}
	\right.
}

You could call \threepartdef for the Möbius function as follows.

\mu(n) = \threepartdef
{1}      {n=1}
{0}      {a^2 \,|\, n \mbox{ for some } a > 1}
{(-1)^r} {n \mbox{ has } r \mbox{ distinct prime factors}}
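
If you are willing to load the amsmath package, its cases environment produces the same layout without a custom macro. This is an alternative to the commands defined above, not a replacement for them. A minimal sketch:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Two-part definition: absolute value.
\[
|x| =
\begin{cases}
	x  & \text{if } x \geq 0 \\
	-x & \text{if } x < 0
\end{cases}
\]
% Three-part definition: the Moebius function.
\[
\mu(n) =
\begin{cases}
	1      & \text{if } n = 1 \\
	0      & \text{if } a^2 \mid n \text{ for some } a > 1 \\
	(-1)^r & \text{if } n \text{ has } r \text{ distinct prime factors}
\end{cases}
\]
\end{document}
```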

Related posts:

Typesetting music in LaTeX
How to display side-by-side figures in LaTeX
Including images in LaTeX
LaTeX and PowerPoint presentations
How to put PDF properties in a LaTeX file

{ 9 comments }

Adding fonts to the PowerShell console

The default font options for the PowerShell console are limited: raster fonts and Lucida Console. Raster fonts are the default, though Lucida Console is an improvement. In my opinion, Consolas is even better, but it’s not on the list of options.

Mastering PowerShell by Tobias Weltner explains how to expand the list of font options for the PowerShell console. The same trick increases the list of font options in the Windows command prompt cmd.exe as well. The book is free for download. See page 16 for details. However, I have two comments about the instructions it gives.

First, the book says “The name must be exactly the same as the official font name, just the way it’s stated under [registry key].” However, the Consolas font is listed in the registry as “Consolas (True Type)”. You should enter “Consolas” and leave out the parenthetical description.

Second, the book says “the new font will work only after you either log off at least once or restart your computer.” When I tried it, logging off was not sufficient; I had to reboot my computer before the font change would work.

Update: In order to make this post self-contained, I’ve added below the necessary information from Mastering PowerShell.

Run regedit.exe and navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont.

Right-click in the panel on the right side and create a new string value. Name that value “0” or “00” or however many zeros you need to get a name that is not already in use. That string’s value is the name of the font to add.

Update: See Necessary criteria for fonts to be available in a command window

Related posts:

Improved PowerShell prompt
A couple thoughts on typography
Better R console fonts

{ 6 comments }

I owe Microsoft Word an apology

by John on July 15, 2009

I tried to use the Equation Editor in Microsoft Word years ago and hated it. It was hard to use and produced ugly output. I tried it again recently and was pleasantly surprised. I’m using Word 2007. I don’t remember what version I’d tried before.

I’ve long said that math written in Word is ugly, and it usually is. But the fault lies with users, like myself, not with Word. I realize now that the problem is that most people writing math in Word are not using the Equation Editor. LaTeX produces ugly math too when people do not use it correctly, though this happens less often.

Math typography is subtle. For example, mathematical symbols are set in an italic font that is not quite the same as the italic font used in prose. Also, word-like symbols such as “log” or “cos” are not set in italics. I imagine most people do not consciously notice these conventions — I never noticed until I learned to use LaTeX — but subconsciously notice when the conventions are violated. The conventions of math typography give clues that help readers distinguish, for example, the English indefinite article “a” from a variable named “a” and to distinguish the symbol for maximum from the product of variables “m”, “a”, and “x.”

Microsoft’s Equation Editor typesets math correctly. Word documents usually do not, but only because folks usually do not use the Equation Editor. In the following example, I set the same equation three times: using ordinary text, using ordinary italic for the “x”, and finally using the Equation Editor.

screen shot of trig identity using MS Word

Note that the “x” in the third version is not the same as the italic “x” in the second version. The prose in this example is set in Calibri font and the Equation Editor uses Cambria Math font. Also, I did not tell Word to format “sin” and “cos” one way and “x” another or tell it what font to use; I simply typed sin^2 x + cos^2 x = 1 into the Equation Editor and it formatted the result as above. I haven’t used it much, but the Equation Editor seems to be more capable and easier to use than I thought.

Here are a few more examples of Equation Editor output.

examples of math using Word: Gaussian integral, Fourier series, quadratic equation

I still prefer using LaTeX for documents containing math symbols. I’ve used LaTeX for many years and I can typeset equations very quickly using it. But I’m glad to know that Word can typeset equations well and that the process is easier than I thought.

I tried out the Equation Editor because Bob Matthews suggested I try MathType, a third-party equation editor add-on for Microsoft Word. I haven’t tried MathType yet but from what I hear it produces even better output.

Related post: Contrasting Microsoft Word and LaTeX

{ 21 comments }

A couple thoughts on typography

by John on June 10, 2009

Font embedding not such a good idea?

The most recent Boag World podcast interviewed Mark Boulton. Boulton has a contrarian opinion on font embedding. Nearly all web designers are excited about font embedding (the ability to have fonts download on-the-fly if a page uses a font not installed on the user’s computer). Boulton’s not so sure this is a good idea. Fonts are designed for a purpose, and most fonts were designed for print. The handful of fonts that were designed first for online viewing (Verdana, Georgia, etc.) are widely installed. If font embedding were a way to broaden the palette of fonts designed for use on a computer monitor, that would be great. But the most likely use of font embedding would be to allow designers to use more fonts online that were not designed to be used online.

Comic Sans and dyslexia

Comic Sans is terribly overused. It’s not a bad font, but it’s often used in inappropriate contexts and has become a cliché for poor typographical taste.

However, I heard somewhere that people with dyslexia can read Comic Sans more easily than most other fonts. I think the explanation was that the font breaks some typical symmetries. For example, a “p” is not an exact mirror image of a “q.” (The former has a more pronounced serif on top.) On the other hand, the “b” and “d” do look like near mirror images. I wonder whether anyone has designed a font specifically to help people with dyslexia. Maybe such fonts would exaggerate the asymmetries that were accidental in the design of Comic Sans. Delivering such fonts would be a good application of font embedding.

Update: Karl Ove Hufthammer left a comment pointing out Andika, a font with “easy-to-perceive letterforms that will not be readily confused with one another.” Here’s a sample.

Related posts

Periodic table of typefaces
Things that work best when you don’t notice them
Better R console fonts

{ 1 comment }

Sharps and flats in HTML

by John on March 16, 2009

Apparently there’s no HTML entity for the flat symbol, ♭. In my previous post, I just spelled out B-flat because I thought that was safer; it’s possible not everyone would have the fonts installed to display B♭ correctly.

So how do you display music symbols for flat, sharp, and natural in HTML? You can insert any symbol if you know its Unicode value, though you run the risk that someone viewing the page may not have the necessary fonts installed to view the symbol. Here are the Unicode values for flat, natural, and sharp.

Flat: ♭ (U+266D)
Natural: ♮ (U+266E)
Sharp: ♯ (U+266F)

Since the flat sign has Unicode value U+266D, you could enter &#x266d; into HTML to display that symbol.

The sharp sign raises an interesting question. I’m sure most web pages referring to G-sharp would use the number sign # (U+0023) rather than the sharp sign ♯ (U+266F). And why not? The number sign is conveniently located on a standard keyboard and the sharp sign isn’t. It would be nice if people used sharp symbols rather than number signs. It would make it easier to search on specifically musical terms. But it’s not going to happen.

Related posts:

Entering Unicode characters in Linux
Three ways to enter Unicode characters in Windows
Greek letters and math symbols in (X)HTML

{ 2 comments }

Typesetting music in LaTeX and LilyPond

by John on March 15, 2009

I tried typesetting music in LaTeX some time ago and gave up. The packages I found were hard to install, the examples didn’t work, etc. This weekend I decided to try again. I tried plowing through the MusiXTeX documentation and got no further than I did last time.

I posted a note on StackOverflow and got some good responses. Nikhil Chelliah suggested I look at LilyPond. I had looked at LilyPond before, and @jleedev explained how to integrate LaTeX and LilyPond.

Here’s some sheet music I included in my previous post, March in 7/4 time.

sheet music example

Here’s a full-sized PDF file version of the music above. And here’s the LilyPond source code used to create the music.

\relative c' {
\time 7/4
\key f \major
\clef treble
f g f \times 2/3{ c8 c c} f4 g a
g a8. bes16 a4 g f g c,
f g f \times 2/3{ c8 c c} f4 g a
g a8. bes16 a4 g f e f
}

The notation looks cryptic at first, but it makes sense after a few minutes. The command \relative c' means that the following pitches will be relative to middle C. For example, the first note, F, is the F closest to middle C. Each note is the same length as the previous note by default, and the first note is a quarter note by default. The notation c8 means that the C is an eighth note, except it’s in the context of a triplet (\times 2/3) and so it’s an eighth note triplet. The next F is denoted f4 to indicate that we’re back to quarter notes.

The notation a8. says that the A is a dotted eighth note. For the next note, bes16 means a B-flat sixteenth note. The suffix “es” stands for “flat” and “is” stands for “sharp.” (The documentation says it’s Dutch. I’ve never seen it before.) I don’t understand why I had to tell it that the B was flat. The code specified earlier that the key was F major, which implies B’s are flat. I suppose the code for individual notes is decoupled from the code to draw the key signature. That would make entering music painful in keys that have lots of sharps or flats. Maybe there’s a way to specify default sharps or flats.

The comma in c, lowers the C by an octave. In relative mode, LilyPond assumes by default that each pitch name refers to the pitch closest to its predecessor. The C closest to the previous note, G, would have been the C up one fourth rather than down one fifth, so the comma was necessary to tell LilyPond to go down.
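
For the LaTeX integration mentioned earlier, one route is the lilypond-book program that ships with LilyPond: it scans a LaTeX file for lilypond environments, typesets each one, and writes a new LaTeX file that includes the resulting images. The sketch below assumes that workflow, which may differ from what @jleedev described; the file name march.lytex and the output directory out are placeholders.

```latex
% Save as march.lytex (hypothetical name), then run, assuming lilypond-book is installed:
%   lilypond-book --pdf --output=out march.lytex
%   cd out && pdflatex march.tex
\documentclass{article}
\begin{document}
The opening bars of the march:

% lilypond-book replaces this environment with the typeset music.
\begin{lilypond}
\relative c' {
  \time 7/4
  \key f \major
  \clef treble
  f g f \times 2/3 { c8 c c } f4 g a
  g a8. bes16 a4 g f g c,
}
\end{lilypond}
\end{document}
```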

If I were to do a lot of music processing, I’d probably look at a commercial package such as Sibelius. But for now I’m just interested in producing small excerpts like that above, and it looks like LilyPond may be fine.

Update: I double checked the rules about flats etc. Yes, I do have to specify explicitly that the B in this example is B-flat. If I just say b rather than bes, LilyPond will add a natural sign in front of the B! It’s strange. It is aware of the key signature: when I tell it the B is flat, it says “OK, then I don’t have to mark that specially since it’s implicit in the key signature.” And if I don’t tell it the B is flat, it says “Oh, that’s an exception to the key signature. Better mark it with a natural sign.”

{ 11 comments }