Posts tagged as:

Typography

Readability

by John on November 28, 2011

The Readability bookmarklet lets you reformat any web page to make it easier to read. It strips out flashing ads and other distractions. It uses black text on a white background, wide margins, a moderate-sized font, etc. I use Readability fairly often. (Instapaper is a similar service. I discuss it at the end of this post.)

Yesterday I used it to reformat an article on literate programming. For some inexplicable reason, the author chose to use a lemon yellow background. It’s ironic that the article is about making source code easier to read. The content of the article is easy to read, but the format is not.

Readability to the rescue! Here are before and after screen shots.

Before:

After:

I recommend the article, Example of Literate Programming in HTML, and I also recommend reformatting the page unless you enjoy reading black text on a yellow background.

Readability did a good job until about halfway through the article. The article has C and HTML code examples, and perhaps these confused Readability. (Readability usually handles code samples well. It correctly formats the first few code samples in this article.) The last half of the article renders like source code, and the font gets smaller and smaller.

I ran the page through an HTML validator to see whether some malformed HTML could be the source of the problem. The validator found numerous problems, so perhaps that was the issue.

I haven’t seen Readability fail like this before. I’ve been surprised how well it has handled some pages I thought might trip it up.

I ended up saving the article and editing its source, changing the bgcolor value to white. It’s a nice article on literate programming once you get past the formatting. The best part of the article is the first section, and that much Readability formats correctly.
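The manual edit described above could also be scripted. Here is a minimal Python sketch, assuming the saved page sets its color through a bgcolor attribute. The function name set_bgcolor and the sample markup (including the color value) are made up for illustration, and a regular expression is only a rough tool for editing HTML.

```python
import re

# Matches bgcolor=VALUE where VALUE is double-quoted, single-quoted, or bare.
BGCOLOR = re.compile(
    r'(\bbgcolor\s*=\s*)("[^"]*"|\'[^\']*\'|[^\s>]+)',
    re.IGNORECASE,
)

def set_bgcolor(html, color="white"):
    """Replace the value of every bgcolor attribute with `color`."""
    return BGCOLOR.sub(lambda m: f'{m.group(1)}"{color}"', html)

# Hypothetical markup standing in for the saved article.
sample = '<body bgcolor="#FFFF66"><p>Literate programming</p></body>'
print(set_bgcolor(sample))
```

For anything beyond a one-off fix, a real HTML parser would be a safer choice than a regular expression.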

Instapaper

Instapaper reformats web pages similarly. It produces a narrower column of text, but otherwise the output looks quite similar.

Instapaper did not discover the title of the literate programming article. (The title of the article was not in an <h1> tag as software might expect but was only in a <title> tag in the page header.) However, it did format the entire body of the article correctly.

I find it slightly more convenient to use the Readability bookmarklet than to submit a link to Instapaper. I imagine there are browser plug-ins that make Instapaper just as easy to use, though I haven’t looked into this because I’m usually satisfied with Readability.

Related posts:

Literate programming and statistics
Tricky code

{ 11 comments }

Draw a symbol, look it up

by John on November 12, 2011

LaTeX users may know about Detexify, a web site that lets you draw a character then looks up its TeX command. Now there’s a new site Shapecatcher that does the same thing for Unicode. According to the site, “Currently, there are 10,007 Unicode character glyphs in the database.” It does not yet support Chinese, Japanese, or Korean.

For example, I drew a treble clef on the page:

The site came back with a list of possible matches, and the first one was what I was hoping for:

Interestingly, the sixth possible match on the list was a symbol for contour integration:

Notice the treble clef response has a funny little box on the right side. That’s because my browser did not have a glyph to display that Unicode character. The browser did have a glyph for the contour integration symbol and displayed it.

Another Unicode resource I recommend is this Unicode Codepoint Chart. It is organized by code point value, in blocks of 256. If you were looking for the contour integration symbol above, for example, you could click on a link “U+2200 to U+22FF: Mathematical Operators” and see a grid of 256 symbols and click on the one you’re looking for. This site gives more detail about each character than does Shapecatcher. So you might use Shapecatcher to find where to start looking, then go to the Unicode Codepoint Chart to find related symbols or more details.
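The "blocks of 256" layout is easy to reproduce: clear the low eight bits of a code point to get the start of its chart page. The sketch below is in Python and assumes the two symbols discussed above are U+222E and U+1D11E; the post does not state the code points, and chart_block is a hypothetical helper name.

```python
import unicodedata

def chart_block(ch):
    """Return (start, end) of the 256-code-point block containing ch."""
    start = ord(ch) & ~0xFF  # clear the low 8 bits
    return start, start + 0xFF

# Assumed code points for the contour integral and the treble clef.
for ch in ("\u222E", "\U0001D11E"):
    start, end = chart_block(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"block U+{start:04X} to U+{end:04X}")
```

The first symbol lands in the U+2200 to U+22FF page mentioned above. The second lies outside the range U+0000 through U+FFFF, which may be part of why a browser font is less likely to have a glyph for it.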

Other posts on Unicode:

Why Unicode is subtle
The disappointing state of Unicode fonts
Entering Unicode characters in Windows and Linux
Inserting graphics in Twitter messages

{ 10 comments }

Typesetting “C#” in LaTeX

by John on October 18, 2011

How do you refer to the C# programming language in LaTeX? Simply typing C# doesn’t work because # is a special character in LaTeX. You could type C\#. That works, but it looks a little odd. The number sign is too big and too low. [click to continue...]

{ 3 comments }

Typesetting chemistry in LaTeX

by John on December 8, 2010

Yesterday I gave the following tip on TeXtip:

Set chemical formulas with math Roman. Example: sulfate is $\mathrm{SO_4^{2-}}$

TorbjoernT and scmbradley let me know there’s a better way: use Martin Hensel’s package mhchem. The package is simpler to use and it correctly handles subtle typographical details.

Using the mhchem package, sulfate would be written \ce{SO4^2-}. In addition to chemical compounds, mhchem has support for bonds, arrows, and related chemical notation.

Example:

Source:

\documentclass{article}
\usepackage[version=3]{mhchem}
\parskip=0.1in
\begin{document}

\ce{SO4^2-}

\ce{^{227}_{90}Th+}

\ce{A\bond{-}B\bond{=}C\bond{#}D}

\ce{CO2 + C -> 2CO}

\ce{SO4^2- + Ba^2+ -> BaSO4 v}

\end{document}

 

For more information, see the mhchem package documentation.

Related posts:

Top four LaTeX mistakes
Hyperlinks in LaTeX-generated PDF

{ 7 comments }

Google Docs OCR

by John on June 27, 2010

Google Docs now offers OCR (optical character recognition), but I’ve had little success getting it to work.

The link to upload files was flaky under Firefox 3.6.4. The underlined text that says “Select files to upload” is not clickable, but you can click the white space a few millimeters above or below what looks like a link. However, the clickable white space didn’t do anything when I clicked it. The link worked just fine in IE 8 and Safari 5.0.

screen shot of page to upload documents for OCR

I clicked the check box that says “Convert text from PDF or image files to Google Docs documents” and uploaded a PDF file. The file was a decent quality scan of a paper document.

section of text from a scanned article

I got a message back saying “Unable to convert document.”

So I tried again with a PDF file that had been created from a LaTeX file using pdflatex. The optical quality of the document was perfect since the document wasn’t a scan but rather an electronic document printed directly to PDF. Moreover, the PDF file contains the plain text. Google indexes such PDFs created with pdflatex just as easily as HTML files. However, I still got the message “Unable to convert document.”

My experience with Google OCR wasn’t a total failure. I created a Microsoft Word document with text in 12-point Times New Roman — I figured this was as commonplace as I could get — and printed it to PDF. Google Docs did successfully convert that document to text.

I imagine Google’s OCR feature will be useful once they debug it. But it doesn’t yet seem ready for prime time based on my limited experience.

{ 2 comments }

The disappointing state of Unicode fonts

by John on January 16, 2010

Modern operating systems understand Unicode internally, but font support for Unicode is spotty. For an example of the problems this can cause, take a look at these screen shots of how the same Twitter message appears differently depending on what program is used to read it.

No font can display all Unicode characters. According to Wikipedia:

… it would be impossible to create such a font in any common font format, as Unicode includes over 100,000 characters, while no widely-used font format supports more than 65,535 glyphs.

However, the biggest problem isn’t the number of characters a font can display. Most Unicode characters are quite rare. About 30,000 characters are enough to display the vast majority of characters in use in all the world’s languages as well as a generous selection of symbols. Yet Unicode fonts vary greatly in their support even for the more commonly used ranges of characters. See this comparison chart. The only range completely covered by all Unicode fonts in the chart is the 128 characters of Latin Extended-A.

Unifont supports all printable characters in the basic multilingual plane, characters U+0000 through U+FFFF. This includes the 30,000 characters mentioned above plus many more. Unifont isn’t pretty, but it’s complete. As far as I know, it’s the only font that covers the characters below U+FFFF.

Related posts:

Why Unicode is subtle

Entering Unicode characters in Windows, Linux

{ 17 comments }

Free alternative to Consolas font

by John on September 21, 2009

Consolas is my favorite monospace font. It’s a good programmer’s font because it exaggerates the differences between some characters that may easily be confused. It ships with Visual Studio and with many other Microsoft products. See this post for examples.

I recently found out about Inconsolata, a free font similar to Consolas. Inconsolata is part of the OFL font collection from SIL International.

Another interesting font from SIL is Andika, mentioned previously here. The Andika home page describes this font as follows.

Andika is a sans serif, Unicode-compliant font designed especially for literacy use, taking into account the needs of beginning readers. The focus is on clear, easy-to-perceive letterforms that will not be readily confused with one another.

Related posts:

Better R console fonts
Adding fonts to the PowerShell console
Comic Sans and dyslexia

{ 6 comments }

The default font options for the PowerShell console are limited: raster fonts and Lucida Console. Raster fonts are the default, though Lucida Console is an improvement. In my opinion, Consolas is even better, but it’s not on the list of options.

Mastering PowerShell by Tobias Weltner explains how to expand the list of font options for the PowerShell console. The same trick increases the list of font options in the Windows command prompt cmd.exe as well. The book is free for download. See page 16 for details. However, I have two comments about the instructions it gives.

First, the book says “The name must be exactly the same as the official font name, just the way it’s stated under [registry key].” However, the Consolas font is listed in the registry as “Consolas (True Type)”. You should enter “Consolas” and leave out the parenthetical description.

Second, the book says “the new font will work only after you either log off at least once or restart your computer.” When I tried it, logging off was not sufficient; I had to reboot my computer before the font change would work.

Update: In order to make this post self-contained, I’ve added below the necessary information from Mastering PowerShell.

Run regedit.exe and navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont.

Right-click in the panel on the right side and create a new string value. Name that value “0” or “00” or however many zeros you need to make the name unique. That string’s value is the name of the font to add.
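The same steps can be sketched in Python with the standard winreg module. This is an illustration rather than anything from Mastering PowerShell: next_zero_name and add_console_font are hypothetical helper names, the script assumes it is run on Windows with administrator rights, and it is worth backing up the registry before trying it.

```python
# Registry path from the instructions above (under HKEY_LOCAL_MACHINE).
KEY_PATH = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont"

def next_zero_name(existing_names):
    """Return the shortest all-zero name ("0", "00", ...) not yet in use."""
    used = set(existing_names)
    name = "0"
    while name in used:
        name += "0"
    return name

def add_console_font(font_name):
    """Add font_name as a new string value under the TrueTypeFont key."""
    import winreg  # standard library, available on Windows only

    access = winreg.KEY_READ | winreg.KEY_SET_VALUE
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, access) as key:
        value_count = winreg.QueryInfoKey(key)[1]  # number of values in the key
        names = [winreg.EnumValue(key, i)[0] for i in range(value_count)]
        name = next_zero_name(names)
        winreg.SetValueEx(key, name, 0, winreg.REG_SZ, font_name)
    return name

# The naming rule on its own, with made-up existing value names.
print(next_zero_name(["0", "00"]))
```

Calling add_console_font("Consolas") from an elevated session would correspond to the manual edit. As noted above, the font name goes in without the parenthetical description.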

Update: See Necessary criteria for fonts to be available in a command window

Related posts:

Improved PowerShell prompt
A couple thoughts on typography
Better R console fonts

{ 6 comments }

I owe Microsoft Word an apology

by John on July 15, 2009

I tried to use the Equation Editor in Microsoft Word years ago and hated it. It was hard to use and produced ugly output. I tried it again recently and was pleasantly surprised. I’m using Word 2007. I don’t remember what version I’d tried before.

I’ve long said that math written in Word is ugly, and it usually is. But the fault lies with users, like myself, not with Word. I realize now that the problem is that most people writing math in Word are not using the Equation Editor. LaTeX produces ugly math too when people do not use it correctly, though this happens less often.

Math typography is subtle. For example, mathematical symbols are set in an italic font that is not quite the same as the italic font used in prose. Also, word-like symbols such as “log” or “cos” are not set in italics. I imagine most people do not consciously notice these conventions — I never noticed until I learned to use LaTeX — but subconsciously notice when the conventions are violated. The conventions of math typography give clues that help readers distinguish, for example, the English indefinite article “a” from a variable named “a” and to distinguish the symbol for maximum from the product of variables “m”, “a”, and “x.”
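For comparison, here is how these conventions look in LaTeX source. This is a small illustrative sketch, not something from the original post: the commands \sin, \cos, and \max produce upright operator names, while bare letters in math mode are set as italic variables.

```latex
\documentclass{article}
\begin{document}

% Operator names upright, variable in math italic: the conventional form.
$\sin^2 x + \cos^2 x = 1$

% Without the backslashes, "sin" reads as the product of s, i, and n.
$sin^2 x + cos^2 x = 1$

% The maximum of a and b, versus the product of m, a, and x applied to (a, b).
$\max(a, b)$ versus $max(a, b)$

\end{document}
```

Typesetting the file shows the difference in letter shapes and spacing between the two versions of each formula.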

Microsoft’s Equation Editor typesets math correctly. Word documents usually do not, but only because folks usually do not use the Equation Editor. In the following example, I set the same equation three times: using ordinary text, using ordinary italic for the “x”, and finally using the Equation Editor.

screen shot of trig identity using MS Word

Note that the “x” in the third version is not the same as the italic “x” in the second version. The prose in this example is set in Calibri font and the Equation Editor uses Cambria Math font. Also, I did not tell Word to format “sin” and “cos” one way and “x” another or tell it what font to use; I simply typed sin^2 x + cos^2 x = 1 into the Equation Editor and it formatted the result as above. I haven’t used it much, but the Equation Editor seems to be more capable and easier to use than I thought.

Here are a few more examples of Equation Editor output.

examples of math using Word: Gaussian integral, Fourier series, quadratic equation

I still prefer using LaTeX for documents containing math symbols. I’ve used LaTeX for many years and I can typeset equations very quickly using it. But I’m glad to know that Word can typeset equations well and that the process is easier than I thought.

I tried out the Equation Editor because Bob Matthews suggested I try MathType, a third-party equation editor add-on for Microsoft Word. I haven’t tried MathType yet but from what I hear it produces even better output.

Related post: Contrasting Microsoft Word and LaTeX

{ 21 comments }

A couple thoughts on typography

by John on June 10, 2009

Font embedding not such a good idea?

The most recent Boag World podcast interviewed Mark Boulton. Boulton has a contrarian opinion on font embedding. Nearly all web designers are excited about font embedding (the ability to have fonts download on-the-fly if a page uses a font not installed on the user’s computer). Boulton’s not so sure this is a good idea. Fonts are designed for a purpose, and most fonts were designed for print. The handful of fonts that were designed first for online viewing (Verdana, Georgia, etc.) are widely installed. If font embedding were a way to broaden the palette of fonts designed for use on a computer monitor, that would be great. But the most likely use of font embedding would be to allow designers to use more fonts online that were not designed to be used online.

Comic Sans and dyslexia

Comic Sans is terribly overused. It’s not a bad font, but it’s often used in inappropriate contexts and has become a cliché for poor typographical taste.

However, I heard somewhere that people with dyslexia can read Comic Sans more easily than most other fonts. I think the explanation was that the font breaks some typical symmetries. For example, a “p” is not an exact mirror image of a “q.” (The former has a more pronounced serif on top.) On the other hand, the “b” and “d” do look like near mirror images. I wonder whether anyone has designed a font specifically to help people with dyslexia. Maybe such fonts would exaggerate the asymmetries that were accidental in the design of Comic Sans. Delivering such fonts would be a good application of font embedding.

Update: Karl Ove Hufthammer left a comment pointing out Andika, a font with “easy-to-perceive letterforms that will not be readily confused with one another.” Here’s a sample.

Related posts:

Periodic table of typefaces
Things that work best when you don’t notice them
Better R console fonts

{ 1 comment }

Typesetting music in LaTeX and LilyPond

by John on March 15, 2009

I tried typesetting music in LaTeX some time ago and gave up. The packages I found were hard to install, the examples didn’t work, etc. This weekend I decided to try again. I tried plowing through the MusiXTeX documentation and got no further than I did last time.

I posted a note on StackOverflow and got some good responses. Nikhil Chelliah suggested I look at LilyPond. I had looked at LilyPond before, and @jleedev explained how to integrate LaTeX and LilyPond.

Here’s some sheet music I included in my previous post, March in 7/4 time.

sheet music example

Here’s a full-sized PDF file version of the music above. And here’s the LilyPond source code used to create the music.

\relative c' {
\time 7/4
\key f \major
\clef treble
f g f \times 2/3{ c8 c c} f4 g a
g a8. bes16 a4 g f g c,
f g f \times 2/3{ c8 c c} f4 g a
g a8. bes16 a4 g f e f
}

The notation looks cryptic at first, but it makes sense after a few minutes. The command \relative c' means that the following pitches will be relative to middle C. For example, the first note, F, is the F closest to middle C. Each note is the same length as the previous note by default, and the first note is a quarter note by default. The notation c8 means that the C is an eighth note, except it’s in the context of a triplet (\times 2/3) and so it’s an eighth note triplet. The next F is denoted f4 to indicate that we’re back to quarter notes.

The notation a8. says that the A is a dotted eighth note. For the next note, bes16 means a B-flat sixteenth note. The suffix “es” stands for “flat” and “is” stands for “sharp.” (The documentation says it’s Dutch. I’ve never seen it before.) I don’t understand why I had to tell it that the B was flat. The code specified earlier that the key was F major, which implies B’s are flat. I suppose the code for individual notes is decoupled from the code to draw the key signature. That would make entering music painful in keys that have lots of sharps or flats. Maybe there’s a way to specify default sharps or flats.

The comma in c, lowers the C by an octave. In relative mode, LilyPond assumes by default that each pitch name refers to the pitch closest to its predecessor. The C closest to the previous note, G, would have been the C up one fourth rather than down one fifth, so the comma was necessary to tell LilyPond to go down.
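The nearest-pitch rule can be sketched in a few lines of Python. This is my own simplification, not LilyPond's code. It counts staff positions by note name only, places each note within a fourth of its predecessor, and then applies any octave marks. The names relative_pitch and describe are hypothetical, and durations are left out of the tokens.

```python
STEPS = "cdefgab"   # note names in staff order
MIDDLE_C = 7 * 4    # staff position of middle C, written c' in LilyPond

def relative_pitch(prev, token):
    """Staff position of `token` (e.g. "f", "bes", "c,") following `prev`."""
    name = token.rstrip(",'")          # note name without octave marks
    marks = token[len(name):]          # trailing commas or apostrophes
    step = STEPS.index(name[0])        # accidentals do not affect the choice
    pos = 7 * (prev // 7) + step       # same-letter note in prev's octave
    if pos - prev > 3:                 # more than a fourth above: drop an octave
        pos -= 7
    elif prev - pos > 3:               # more than a fourth below: raise an octave
        pos += 7
    return pos + 7 * (marks.count("'") - marks.count(","))

def describe(pos):
    """Name a staff position, with middle C shown as C4."""
    return f"{STEPS[pos % 7].upper()}{pos // 7}"

f = relative_pitch(MIDDLE_C, "f")      # first note of the piece
g = relative_pitch(f, "g")
print(describe(f), describe(g))
print(describe(relative_pitch(g, "c")), describe(relative_pitch(g, "c,")))
```

In the source above, the note just before c, is a G. Without the comma the sketch picks the C a fourth above that G, and with the comma it picks the C a fifth below, which is middle C.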

If I were to do a lot of music processing, I’d probably look at a commercial package such as Sibelius. But for now I’m just interested in producing small excerpts like that above, and it looks like LilyPond may be fine.

Update: I double checked the rules about flats etc. Yes, I do have to specify explicitly that the B in this example is B-flat. If I just say b rather than bes, LilyPond will add a natural sign in front of the B! It’s strange. It is aware of the key signature: when I tell it the B is flat, it says “OK, then I don’t have to mark that specially since it’s implicit in the key signature.” And if I don’t tell it the B is flat, it says “Oh, that’s an exception to the key signature. Better mark it with a natural sign.”
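The behavior described in the update amounts to comparing each note's own alteration with what the key signature implies. Here is a rough Python sketch of that comparison. It is a simplification of my own: printed_accidental is a hypothetical name, only the plain "es" and "is" suffixes are handled, and accidentals carried over from earlier in a bar are ignored.

```python
SUFFIXES = {"": 0, "es": -1, "is": 1}          # natural, flat, sharp
SIGNS = {-1: "flat", 0: "natural", 1: "sharp"}
F_MAJOR = {"b": -1}                            # the key signature flattens every B

def printed_accidental(note, key_signature):
    """Return the sign to print before `note`, or None if the key covers it."""
    step, alter = note[0], SUFFIXES[note[1:]]
    if alter == key_signature.get(step, 0):
        return None                            # implied by the key signature
    return SIGNS[alter]                        # exception to the key signature

for note in ("bes", "b", "a"):
    print(note, printed_accidental(note, F_MAJOR))
```

Under these assumptions bes needs no sign in F major, while a bare b gets a natural sign, matching what LilyPond drew.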

{ 11 comments }

Periodic table of Typefaces

by John on March 12, 2009

Squidspot.com has created an interesting periodic table of typefaces.

thumbnail of periodic table of typefaces from Squidspot.com

Related post: Periodic table of Perl operators

{ 3 comments }

Fonts, translations, and programming languages have one thing in common: they work best when you don’t notice them.

If someone says “Hey, look at this cool font I just found!” you probably wouldn’t want to read a book set in that font. At least to an untrained eye, a great font will not stand out in a list of small samples. You have to see large blocks of text set in a font to appreciate it. Even then, most people will not consciously appreciate a very readable font.

Translations are similar. If you find yourself saying “What an interesting translation!” then the translator has probably fallen down on the job. A good translation is neither archaic nor trendy. It does not draw attention to itself but allows you to focus on the original content. I believe the English Standard Version achieves that with Bible translation.

Python is like a good font or a good translation. For years I’d look into Python briefly when someone would recommend it. I’d thumb through a Python book, but it all looked rather plain. Only later did I come to appreciate that the beauty of Python is that it is rather plain. It doesn’t call attention to itself. It just gets out of your way and lets you write programs. It seems to me that compared to other programming language communities, the Python community brags less about their language per se and more about what they’re able to do with it.

{ 4 comments }

Better R console fonts

by John on October 31, 2008

The default installation of R on Windows uses Courier New for the console font. Unfortunately, this font offers low contrast between the letter ‘l’ and the number ‘1.’ There is also poor contrast between the letter ‘O’ and the number ‘0.’ The contrast between periods and commas is OK.

Lucida Console is an improvement. It has high contrast between ‘l’ and ‘1’, though ‘O’ and ‘0’ are still hard to distinguish. But my favorite console font is Consolas. It offers strong contrast between ‘l’ and ‘1’, commas and periods, and especially between lower case ‘o’, upper case ‘O’, and the number ‘0.’

Consolas is more legible while also fitting more characters into the same horizontal space. It can do this because it uses ClearType anti-aliasing while the other two fonts do not. Here is a sample of the three fonts magnified 4x to show the anti-aliasing.

I found setting the default console font in R a little tricky. Clicking on the Edit -> GUI preferences menu brings up the Rgui Configuration Editor. From there it’s obvious how to change the font. However, what I found surprising is that clicking the “OK” button only changes the font for the current session. I can’t think of another application that behaves analogously. To set your choice of font for all future sessions, click “Save” rather than “OK.”

{ 2 comments }

I frequently need to look up how to add diacritical marks to letters in HTML, TeX, and Microsoft Word, though not quite frequently enough to commit the information to my long-term memory. So today I wrote up a set of notes on adding accents for future reference. Here’s a chart summarizing the notes.

Accent        HTML                  TeX         Word
grave         grave                 \`          CTRL + `
acute         acute                 \'          CTRL + '
circumflex    circ                  \^          CTRL + ^
tilde         tilde                 \~          CTRL + SHIFT + ~
umlaut        uml                   \"          CTRL + SHIFT + :
cedilla       cedil                 \c          CTRL + ,
æ, Æ          &aelig;, &AElig;      \ae, \AE    CTRL + SHIFT + & + a or A
ø, Ø          &oslash;, &Oslash;    \o, \O      CTRL + / + o or O
å, Å          &aring;, &Aring;      \aa, \AA    CTRL + SHIFT + @ + a or A

The notes go into more details about how accents function in each environment and what limitations each has. For example, LaTeX will let you combine any accent with any letter, but MS Word and HTML only support letter/accent combinations that are common in spoken languages.
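As a quick check of the HTML column, an entity is written as an ampersand, the letter, the accent name from the chart, and a semicolon. Python's standard html module can decode these. The second half of the sketch is an addition of mine that the notes do not cover: building an accented letter from a Unicode combining mark. The helper name accent is hypothetical.

```python
import html
import unicodedata

# Entities built from a letter plus an accent name in the chart.
print(html.unescape("&egrave; &eacute; &ecirc; &ntilde; &uuml; &ccedil;"))
print(html.unescape("&aelig; &AElig; &oslash; &Oslash; &aring; &Aring;"))

# Unicode combining marks corresponding to the accents in the chart.
COMBINING = {
    "grave": "\u0300",
    "acute": "\u0301",
    "circumflex": "\u0302",
    "tilde": "\u0303",
    "umlaut": "\u0308",
    "cedilla": "\u0327",
}

def accent(letter, name):
    """Combine a letter with an accent, composing to one character if possible."""
    return unicodedata.normalize("NFC", letter + COMBINING[name])

print(accent("e", "acute"), accent("c", "cedilla"))
```

Where no precomposed character exists for a letter and accent pair, the result stays as two code points, and how well it displays depends on the font.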

{ 1 comment }