Notes on HTML, XML, TeX, and Unicode

This week’s resource post: some notes on typesetting, Unicode, etc.

See also blog posts tagged LaTeX, HTML, and Unicode.

For daily tips on LaTeX and typography, follow @TeXtip on Twitter.

TeXtip logo

Last week: C++ resources

Next week: Special functions

 

Gutenberg + Readability

Here’s a very simple idea: Use Project Gutenberg for content and Readability for style.

Project Gutenberg has a large collection of public domain books in digital form. The books are available in several formats, none of which are ideal for reading. Project Gutenberg provides text without much styling in order to make it easier for people to use the content as they please.

You can go to the HTML version of a book on Gutenberg and use Readability (or Instapaper) to format it for easier reading. Importing the HTML page to a Kindle similarly improves the formatting.

* * *

Has anyone made a style sheet to approximate the look of Readability or Instapaper? I’d like to use something like that to improve the appearance of the static HTML pages on my site.

Styling HTML for mobile devices

Yesterday I thought about adding a style sheet for mobile devices to some static HTML pages. How hard could it be? CSS has a media type. Just set the media to handheld,  specify a style sheet for mobile browsers, and you’re done.

Style sheet media types

One problem is that hand-held devices don’t always look for the handheld style sheet. According to Ben Henick, the handheld media type is “poorly supported by all but the very recently marketed devices, as of 2009.” From what I gather, most web sites try to infer the browser type on the server side and generate different HTML for mobile devices using PHP etc. Apparently static HTML markup will have its limitations at this point in time.

The iPhone doesn’t consider itself a hand-held device as far as CSS is concerned. Fair enough: perhaps the handheld designation is more for tiny screens like more traditional cell phones. But it’s not a desktop either.

You can’t target the iPhone with a simple media type, but you can use the following CSS.

<link rel="stylesheet" type="text/css"  href="iphone.css"
media="only screen and (max-device-width: 480px)" />

Of course a device other than an iPhone could grab this style sheet, and the iPhone will not grab the style sheet if they ever add one more pixel to the browser width. But this is the approach Apple gives in their online documentation.

Testing

It’s difficult to find emulators to test how pages appear on mobile devices. Someone said on Stack Overflow that Opera has a Small Screen view that is useful for emulating mobile devices. However, that recommendation was from November 2008. The current version of Opera either no longer supports that feature or has moved it somewhere else.

The Web Developer plug-in for Firefox lets you specify whether you want to display your page with the screen or handheld style sheet, but it does not emulate a hand-held device.

Apple makes an emulator for the iPhone, but only for Macintosh computers. MobiOne makes an emulator for the iPhone that runs on Windows. However, the emulator does not recognize the CSS statement above. I don’t have an iPhone, but I was able to borrow an iPod Touch, which runs the same browser as the iPhone. My pages worked correctly on the iPod when they did not work on the MobiOne emulator.

Suggestions?

Does anyone have any suggestions for making static HTML pages more friendly to mobile browsers? Any suggestions for testing?

Complexity of HTML and LaTeX

Sometime around 1994, my office mate introduced me to HTML by saying it was 10 times simpler than LaTeX. At the time I thought he was right. Now I’m not so sure. Maybe he was right in 1994 when the expectations for HTML were very low.

It is easier to bang out a simple, ugly HTML page than to write your first LaTeX document. When you compare the time required to make an attractive document, the effort becomes more comparable. The more sophisticated you get, the simpler LaTeX becomes by comparison.

Of course the two languages are not exactly comparable. HTML targets a web browser while LaTeX targets paper. HTML would be much simpler if people only used it to create documents to print out on their own printer. A major challenge with HTML is not knowing how someone else will use your document. You don’t know what browser they will view it with, at what resolution, etc. For that matter, you don’t know whether they’re even going to view it at all — they may use a screen reader to listen to the document.

Writing HTML is much more complicated than writing LaTeX if you take a broad view of all that is required to do it well: learning about accessibility and internationalization, keeping track of browser capabilities and market shares, adapting to evolving standards, etc. The closer you look into it, the less HTML has in common with LaTeX. The two languages are not simply two systems of markup; they address different problems.

Related links:

For daily tips on LaTeX and typography, follow @TeXtip on Twitter.

TeXtip logo

The good parts

I’ve written before about how I liked Douglas Crockford’s book JavaScript: The Good Parts and how I wish someone would write the corresponding book for R. I just found out this week that O’Reilly has published three more books along the lines of Crockford’s book:

I’m reading the HTML & CSS book. It’s a good read, but not quite what you might expect from the title. It’s not an introductory book on HTML or CSS. It assumes the reader is familiar with the basics of both languages. Instead it focuses on strategy for how to use the two languages.

HTML & CSS: The Good Parts reminds me of Scott Meyers’ Effective C++ books. These books assumed you knew the syntax of C++ but were looking for strategic advice for making the best use of the language. Some have argued that the fact Meyers needed to write these books is evidence that C++ is too complicated. The same could be said of HTML and especially CSS. Both C++ and web standards have evolved over time and are burdened with backward compatibility. But as Bjarne Stroustrup remarked

There are just two kinds of languages: the ones everybody complains about and the ones nobody uses.

Related post:

Sharps and flats in HTML

Apparently there’s no HTML entity for the flat symbol, ♭. In my previous post, I just spelled out B-flat because I thought that was safer; it’s possible not everyone would have the fonts installed to display B♭ correctly.

So how do you display music symbols for flat, sharp, and natural in HTML? You can insert any symbol if you know its Unicode value, though you run the risk that someone viewing the page may not have the necessary fonts installed to view the symbol. Here are the Unicode values for flat, natural, and sharp.

Since the flat sign has Unicode value U+266D, you could enter &#x266d; into HTML to display that symbol.

The sharp sign raises an interesting question. I’m sure most web pages referring to G-sharp would use the number sign # (U+0023) rather than the sharp sign ♯ (U+266F). And why not? The number sign is conveniently located on a standard keyboard and the sharp sign isn’t. It would be nice if people used sharp symbols rather than number signs. It would make it easier to search on specifically musical terms. But it’s not going to happen.

Update: See this post on font support for Unicode. Most people can see all three symbols, but some, especially Android users, might not see the natural sign.

Related posts:

Google Reader and HTML lists

Yesterday I wrote a post about how to start numbering a list in HTML at some point other than 1. Mark Reid and Thomas Guest pointed out that my example did not show up correctly in Google Reader.

Here’s how the list shows up when I browse directly to the post using Firefox 3 on Windows XP.

browser screen shot

But here’s what the same list looks like when I look at the post inside Google Reader.

Google Reader screen shot

I don’t understand why the difference. In fact, I don’t understand in general why posts often look different in Google Reader. For example, the screenshots above are centered when you visit the blog directly, but are left-aligned in Google Reader. Also, the space between the images and the text is removed.

Do other RSS readers similarly mangle HTML? Any suggestions how to fix the problem?

Update: Changed the way images are centered per Thomas Guest’s suggestion.

Starting number for HTML lists

I recently found out how to make an HTML list start numbering somewhere other than at 1. This is handy when you have a list interrupted by some text and want to continue the original numbering without starting over. I’ve only been using HTML for 15 years. Maybe one of these days I’ll really learn it.

In the <ol> tag, add the attribute start="7", for example, to make the list start numbering with 7.  The start attribute can be any integer, even negative.

For example, the seven dwarfs are

  1. Dopey
  2. Grumpy
  3. Doc
  4. Happy
  5. Bashful
  6. Sneezy

and last but not least

  1. Sleepy.

Update: As pointed out in the comments below, the example in this post may not render correctly in your reader. See this post for a discussion of the problem.

Faint praise for Expression Web

I really like Expression Web, when it doesn’t crash. It generates standard-compliant XHTML, it’s pleasant to use, etc. But I’ve had it crash many times. It crashes when I have too many files open, or when I edit too big a file for too long.

I need to find something better, but I haven’t taken the time to evaluate other HTML editing environments. I’d like to keep using it if Microsoft would fix the crashes. They have posted an update, but the description implies it doesn’t fix the problems I’ve seen.