Removing Unicode formatting

Several people responded to my previous post asserting that screen readers would not be able to read text formatted via Unicode variants. Maybe some screen readers can’t handle this, but there’s no reason they couldn’t.

Before I go any further, I’d like to repeat my disclaimer from the previous post:

It’s a dirty hack, and I’d recommend not overdoing it. But it could come in handy occasionally. On the other hand, some people may not see what you intend them to see.

This formatting is gimmicky and there are reasons to only use it sparingly or not at all. But I don’t see why screen readers need to be stumped by it.

In the example below, I format the text “The quick brown fox” by running it through unifont as in the previous post.

If we pipe the output through unidecode then we mostly recover the original text. (I wrote about unidecode here.)

    $ unifont The quick brown fox | unidecode 

            Double-Struck: The quick brown fox
                Monospace: The quick brown fox
               Sans-Serif: The quick brown fox
        Sans-Serif Italic: The quick brown fox
          Sans-Serif Bold: The quick brown fox
   Sans-Serif Bold Italic: The quick brown fox
                   Script: T he quick brown fox
                   Italic: The quick brown fox
                     Bold: The quick brown fox
              Bold Italic: The quick brown fox
                  Fraktur: T he quick brown fox
             Bold Fraktur: T he quick brown fox

The only problem is that sometimes there’s an extra space after capital letters. I don’t know whether this is a quirk of unifont or unidecode.

This isn’t perfect, but it’s a quick proof of concept that suggests this shouldn’t be a hard thing for a screen reader to do.

Maybe you don’t want to normalize Unicode characters this way all the time, but you could have some configuration option to only do this for Twitter, or to only do it for characters outside a certain character range.
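Here is a minimal Python sketch of that configuration idea. It is not what unidecode or any screen reader actually does. It swaps in the standard library's NFKC normalization (`unicodedata.normalize`) in place of unidecode, and it only touches characters in the Mathematical Alphanumeric Symbols block, U+1D400 through U+1D7FF. That range and the function name `normalize_math_letters` are assumptions for illustration.

```python
import unicodedata

# Assumed range to normalize: the Mathematical Alphanumeric Symbols block.
MATH_ALPHANUMERIC = range(0x1D400, 0x1D800)


def normalize_math_letters(text: str) -> str:
    """Replace styled math letters and digits with their plain equivalents.

    NFKC maps each styled character in this block to its compatibility
    equivalent, e.g. U+1D5EF (sans-serif bold small b) -> 'b'.
    Characters outside the block pass through unchanged.
    """
    return "".join(
        unicodedata.normalize("NFKC", ch) if ord(ch) in MATH_ALPHANUMERIC else ch
        for ch in text
    )


# Sans-serif bold 'b' + "old", then italic 'i' + "talic".
print(normalize_math_letters("\U0001D5EFold \U0001D456talic"))  # bold italic
```

Because the filter is limited to one block, letterlike characters in the Basic Multilingual Plane such as ℂ and ℝ are left alone. Several script and Fraktur capitals also live in that older Letterlike Symbols block, so a practical filter would need a wider range.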

How to format text in Twitter

Twitter does not directly provide support for formatting text in bold, italic, etc. But it does support Unicode characters [1], and so a hack to get around the formatting limitation is to replace letters with Unicode variants.

For example, you could tweet

How to include bold or italic text in a tweet.

I cheated in the line above, using bold and italic formatting rather than Unicode characters because some readers might not be able to read it.

Here’s a screenshot of the actual Unicode text in Emacs. You can see the text in the footnotes [2].

This is plain text. I have asked for the details on the ‘b’ in bold, and the bottom window shows that it is not the common U+0062 for ‘b’ down in the ASCII range, but U+1D5EF up in the Supplementary Multilingual Plane. Similarly, the ‘i’ in italic above is not U+0069 but U+1D456.
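Producing a single style comes down to simple arithmetic on code points. Below is a minimal Python sketch for sans-serif bold. The function name is hypothetical, and this is not how unifont is implemented. Sans-serif bold is convenient because its letters and digits form unbroken runs, so ‘b’ lands on U+1D5EF as described above. Some other styles have gaps; for example, italic small h is the older U+210E rather than U+1D455, so those styles would need a lookup table instead.

```python
# Starting code points of the sans-serif bold runs in the
# Mathematical Alphanumeric Symbols block.
SANS_BOLD_CAPITAL_A = 0x1D5D4
SANS_BOLD_SMALL_A = 0x1D5EE
SANS_BOLD_DIGIT_ZERO = 0x1D7EC


def to_sans_serif_bold(text: str) -> str:
    """Map ASCII letters and digits to sans-serif bold look-alikes.

    Every other character (spaces, punctuation, ...) is left unchanged.
    """
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(SANS_BOLD_CAPITAL_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(SANS_BOLD_SMALL_A + ord(ch) - ord("a")))
        elif "0" <= ch <= "9":
            out.append(chr(SANS_BOLD_DIGIT_ZERO + ord(ch) - ord("0")))
        else:
            out.append(ch)
    return "".join(out)


# Print code points rather than glyphs, which may not render everywhere.
print([f"U+{ord(c):04X}" for c in to_sans_serif_bold("bold")])
# ['U+1D5EF', 'U+1D5FC', 'U+1D5F9', 'U+1D5F1']
```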

Here’s how the text appears in Twitter:

It’s a dirty hack, and I’d recommend not overdoing it. But it could come in handy occasionally. On the other hand, some people may not see what you intend them to see. Here’s a portion of a screenshot from an Android device:

How to include XXXX or XXXXXX text

As a very rough rule of thumb, characters with smaller Unicode values are more likely to display correctly everywhere. Math symbols like ∞ (U+221E) work everywhere as far as I know. I wouldn’t depend on any Unicode character above 0xFFFF.
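To apply that rule of thumb mechanically, a short Python check can list every character above U+FFFF, that is, outside the Basic Multilingual Plane. This is only a heuristic sketch with a hypothetical function name. A character below U+FFFF can still be missing from a reader's fonts.

```python
def chars_outside_bmp(text: str) -> list[tuple[int, str]]:
    """Return (position, 'U+XXXXX') for every character above U+FFFF."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ord(ch) > 0xFFFF]


# The infinity sign U+221E passes; the sans-serif bold 'b' is flagged.
print(chars_outside_bmp("\u221e vs \U0001D5EF"))  # [(5, 'U+1D5EF')]
```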

Update: Several people have said this formatting poses a problem for speech readers. The next post explains why it shouldn’t. (Maybe it does cause a problem, but it wouldn’t have to.)

How to produce Unicode formatting

I produced the Unicode text above using the programs unifont and unisupers from the Perl module Unicode::Tussle. See this post for how to install the module. Here’s a screenshot of using these utilities from the command line.

To use unifont, type the text you’d like to format after the command. It then shows the text formatted several ways using Unicode characters. I typed “bold” and copied the bold version of the word. The text could be anything; it’s a coincidence that I gave it text that was also a format name. For example, I created the double-struck R and C above with the command

    unifont R C

The unisupers command does not take an argument but instead takes its input from standard input. So I hit return after the command name and then typed ‘n’ to get the superscript n.
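For comparison, here is a hypothetical Python sketch of the same idea, covering only the characters used in this post: digits plus n, i, and m. It assumes nothing about how unisupers is actually implemented. Superscript letters are scattered across several Unicode blocks (ⁿ is U+207F, ᵐ is U+1D50), which is why it uses a lookup table rather than offset arithmetic.

```python
# Hypothetical partial table: digits plus the few letters used above.
_SUPERSCRIPTS = str.maketrans(
    "0123456789nim",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079\u207f\u2071\u1d50",
)


def to_superscript(text: str) -> str:
    """Replace characters that have a superscript form; leave the rest unchanged."""
    return text.translate(_SUPERSCRIPTS)


# Print code points rather than glyphs.
print([f"U+{ord(c):04X}" for c in to_superscript("n2")])  # ['U+207F', 'U+00B2']
```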

[1] Twitter supports Unicode characters, but there’s a question of whether readers will have fonts installed to display the characters. I wrote eight years ago about some symbols users were and were not likely to see, but my impression is that the situation has improved quite a bit since then.

[2] Here’s the actual text of the tweet:

How to include  or  text in a tweet.
Weierstrass function.
Im: ℂ -> ℝ
ℝⁿ -> ℝᵐ

(I pasted the text into my blogging software, but it looks like it is deleting the words “bold” and “italic.”)

My densest books

I recently got a copy of Methods of Theoretical Physics by Morse and Feshbach. It’s a dense book, literally and metaphorically. I wondered whether it might be the densest book I own, so I weighed some of my weightier books.

I like big books, I cannot lie.

Morse and Feshbach has density 1.005 g/cm³, denser than water.

Gravitation by Misner, Thorne, and Wheeler is, appropriately, a massive book. It’s my weightiest paperback book, literally and perhaps metaphorically. But it’s not that dense, about 0.66 g/cm³. It would easily float.

The Mathematica Book by Wolfram (4th edition) is about the same weight as Gravitation, but denser, about 0.80 g/cm³. Still, it would float.

Physically Based Rendering by Pharr and Humphreys weighs in at 1.05 g/cm³. Like Morse and Feshbach, it would sink.

But the densest of my books is An Atlas of Functions by Oldham, Myland, and Spanier, coming in at 1.12 g/cm³.

The books that are denser than water were all printed on glossy paper. Apparently matte paper floats and glossy paper sinks.
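Each density above is simply mass divided by volume. The Python sketch below assumes each book can be treated as a rectangular block and takes water as 1 g/cm³. The function names are hypothetical, and the numbers in the usage line are made up for illustration; they are not measurements of any book mentioned here.

```python
WATER_DENSITY = 1.0  # g/cm^3, approximately, near room temperature


def density(mass_g: float, height_cm: float, width_cm: float, thickness_cm: float) -> float:
    """Density in g/cm^3, treating the book as a rectangular block."""
    return mass_g / (height_cm * width_cm * thickness_cm)


def floats(mass_g: float, height_cm: float, width_cm: float, thickness_cm: float) -> bool:
    """True if the book is less dense than water."""
    return density(mass_g, height_cm, width_cm, thickness_cm) < WATER_DENSITY


# Hypothetical book: 1500 g, 25 cm x 20 cm x 5 cm.
print(density(1500, 25, 20, 5), floats(1500, 25, 20, 5))  # 0.6 True
```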

Email subscription switchover

I’ve used Feedburner to allow people to subscribe to this blog via email. That service is going away, and so I just moved everyone over to MailerLite. I turned off Feedburner email, so nobody should get duplicate email.

Feedburner’s RSS service is still going, for now, but most RSS subscribers use my RSS feed without going through Feedburner.

If you’d like to subscribe to my monthly newsletter or to blog post notifications by email, you can do so here.


Volunteer-generated errata pages

I picked up a used copy of Quaternions and Rotation Sequences by Jack B. Kuipers for a project I’m starting to work on. The feedback I’ve seen on the book says it has good content but also has lots of typos. My copy has a fair number of corrections that someone penciled in. Someone on Amazon alluded to an errata page for the book, but I’ve been unable to find it [1].

This made me wonder more generally: Is there a project to create errata pages? I’m thinking especially of mathematical reference books. I’m not concerned with spelling errors and such, but rather errors in equations that could lead to hours of debugging.

I would be willing to curate and host errata pages for a few books I care about, but it would be better if this were its own site, maybe a Wiki.

I don’t want to duplicate someone else’s effort. So if there’s already a site for community-generated errata pages, I could add a little content there. But if there isn’t such a project out there already, maybe someone would like to start one.


[1] Update: Jan Van lent found the errata page. See the first comment. Apparently the changes that were penciled into my book were copied from the author’s errata list. Also, these changes were applied to the paperback edition of the book.

Blog email subscription

As I mentioned a couple weeks ago, Feedburner, the service I’ve been using for blog email subscriptions, is shutting down. I’m switching over to MailerLite. The new email subscription is up and running. You can sign up here if you’d like.

If you’re already subscribed via Feedburner, there’s no need to sign up again with MailerLite. Some time in the next few weeks I will import all the email addresses from Feedburner into MailerLite. There will be some formatting changes and hopefully that will be the only difference you notice.

You could also subscribe via my RSS feed or follow one of my Twitter accounts if you’d like.

Maidenhead geocode system

The Maidenhead Locator System encodes a pair of longitude and latitude coordinates in a slightly complicated but ingenious way. Amateur radio operators use this geocoding system to describe locations.

The Wikipedia article on the subject describes the what of the system, but I’d like to say more about the why of the system. I’ll also go through an example in great detail.
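As a rough illustration, here is a minimal Python sketch of the standard six-character scheme: field, square, and subsquare. It follows the encoding as commonly described and is not the worked example from the full post. Longitude comes first in each pair, and the function name `maidenhead` is hypothetical.

```python
def maidenhead(lat: float, lon: float) -> str:
    """Encode latitude/longitude in degrees as a 6-character Maidenhead locator."""
    if not (-90.0 <= lat < 90.0 and -180.0 <= lon < 180.0):
        raise ValueError("latitude must be in [-90, 90) and longitude in [-180, 180)")
    # Shift so both coordinates are non-negative: lon in [0, 360), lat in [0, 180).
    lon += 180.0
    lat += 90.0
    # Field: 18 x 18 grid of 20-degree (lon) by 10-degree (lat) cells, letters A-R.
    field = chr(ord("A") + int(lon // 20)) + chr(ord("A") + int(lat // 10))
    # Square: 10 x 10 grid inside a field, 2-degree by 1-degree cells, digits 0-9.
    square = str(int((lon % 20) // 2)) + str(int(lat % 10))
    # Subsquare: 24 x 24 grid inside a square, 5-minute by 2.5-minute cells, letters a-x.
    subsquare = chr(ord("a") + int((lon % 2) * 12)) + chr(ord("a") + int((lat % 1) * 24))
    return field + square + subsquare


print(maidenhead(0.0, 0.0))  # JJ00aa
```

Each level subdivides the previous cell: 20° × 10° fields, then 2° × 1° squares, then 5′ × 2.5′ subsquares. Alternating letters, digits, and letters keeps each pair unambiguous.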

Plastic number feels plastic

The plastic ratio is given by

\begin{align*} \rho &= \sqrt[3]{\frac{9 + \sqrt{69}}{18}} + \sqrt[3]{\frac{9 - \sqrt{69}}{18}} \\ &= 1.324717957244746025\ldots \end{align*}
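As a quick numerical check on the closed form, here is a short Python sketch. It relies on one fact not stated above: the plastic ratio is the unique real root of x³ = x + 1, so the residual rho**3 - rho - 1 should be essentially zero.

```python
import math


def plastic_ratio() -> float:
    """Evaluate the closed form for the plastic ratio given above."""
    r = math.sqrt(69)
    # Both radicands are positive (9 - sqrt(69) is about 0.69),
    # so ** (1 / 3) gives the real cube root.
    return ((9 + r) / 18) ** (1 / 3) + ((9 - r) / 18) ** (1 / 3)


rho = plastic_ratio()
print(f"{rho:.15f}")
# Residual of x^3 - x - 1 at rho; should be on the order of rounding error.
print(rho**3 - rho - 1)
```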

The Dutch architect Dom Hans van der Laan gave the number this name in 1928. He used “plastic” as an allusion to a 3D construction of the number, analogous to the 2D construction of the golden ratio.

Here’s a plastic rectangle, a rectangle whose sides have the proportions of the plastic ratio.

Plastic rectangle

@Gregoresate commented on Twitter that this ratio is aesthetically “disharmonious” and “bereft of meaning.” He could have said it looks plastic.

I doubt there was any negative connotation to the word plastic in 1928. The primary meaning at the time probably had to do with deformability, not with synthetic materials. To the extent that the word was associated with new materials, it would have had a positive connotation at the time. It’s interesting that the use of the plastic ratio has been criticized for being plastic in the contemporary sense of being inauthentic.

LaTeX and Lawyers

Lawyers write Word documents and mathematicians write LaTeX documents. Of course this makes collaboration awkward, but there are ways to make it better.

One solution is to simply use Word. People who use LaTeX probably know how to use Word, even if they’d rather not, and asking someone else to learn LaTeX is a non-starter. So if I’m coauthoring a document with a lawyer, I’ll use Word.

If I’m writing a report that a lawyer needs to review, I’ll use LaTeX. Using different programs actually helps because it makes a clear distinction between copy editing feedback and authorial responsibility.

This post will give a couple tips for writing reports in LaTeX to be delivered to a lawyer, one trivial and one not quite trivial.

The trivial tip is that \S produces the section sign § (U+00A7) common in legal documents but not so common elsewhere. The not so trivial tip is that the enumitem package lets you change the default labels that LaTeX uses with enumerated items.

Changing enumerated item labels

LaTeX was designed under the assumption that the user wants to focus on logical structure and leave the formatting up to the typesetting program. Consistent with this design philosophy, nested enumerated lists are simply wrapped in \begin{enumerate} and \end{enumerate}, and individual list items are marked with \item, regardless of the level of nesting. LaTeX takes responsibility for displaying different labels at different levels: Arabic numerals for top-level lists, lowercase letters for the next level, etc.

When you’re quoting legal documents, however, you don’t want to simply preserve the logical structure of (nested) lists; you want to preserve the labels as well.

Suppose you have the following nested list.

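    \begin{enumerate}
    \item First top-level item
    \item Second top-level item
      \begin{enumerate}
      \item A sub-item
      \item Another sub-item
        \begin{enumerate}
        \item A third-level item
        \item Another third-level item
          \begin{enumerate} \item Four levels in \end{enumerate}
        \end{enumerate}
      \end{enumerate}
    \end{enumerate}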

By default, LaTeX will format this as follows.

But suppose in order to match another document you need the labels to progress as (a), (1), (A), and (i). The following LaTeX code will accomplish this.

    % Requires \usepackage{enumitem} in the preamble.
    \begin{enumerate} [label={(\alph*)}]
    \item First top-level item
    \item Second top-level item
      \begin{enumerate} [label={(\arabic*)}]
      \item A sub-item
      \item Another sub-item
        \begin{enumerate} [label={(\Alph*)}]
        \item A third-level item
        \item Another third-level item
          \begin{enumerate} [label={(\roman*)}]
          \item Four levels in
          \end{enumerate}
        \end{enumerate}
      \end{enumerate}
    \end{enumerate}

This produces the following.

Note the parentheses in the labels above. You can remove one or both, replace them with square brackets, add periods, etc., as the following example shows.

    % Requires \usepackage{enumitem} in the preamble.
    \begin{enumerate} [label={\alph*)}]
    \item First top-level item
    \item Second top-level item
      \begin{enumerate} [label={\arabic*.}]
      \item A sub-item
      \item Another sub-item
        \begin{enumerate} [label={(\Alph*)}]
        \item A third-level item
        \item Another third-level item
          \begin{enumerate} [label={[\roman*]}]
          \item Four levels in
          \end{enumerate}
        \end{enumerate}
      \end{enumerate}
    \end{enumerate}

Here’s what this looks like when compiled.

A Pattern Language

A Pattern Language: Towns, Buildings, Construction

I first heard of the book A Pattern Language sometime in the 1990s. I had left academia, for the first time [1], and was working as a software developer. Although the book is about architecture, software developers were excited about the book because of its analogs to software development patterns. The “Gang of Four” book Design Patterns, a book about object-oriented programming, was also popular at the time.

I now see more people citing A Pattern Language for its architectural content, though it’s still cited in software circles. A lot of people quote it on Twitter. I recently discovered @apatterntolearn, a Twitter account devoted to the book.

The book is commonly attributed to Christopher Alexander alone, but the dust jacket says “Christopher Alexander, Sara Ishikawa, Murray Silverstein with Max Jacobson, Ingrid Fiksdahl-King, Shlomo Angel.” It’s easier to just say Christopher Alexander. I imagine people who are fully aware of there being more authors use Alexander as a synecdoche.

Ever since I ran into the book over 20 years ago I’ve intended to read it someday. I finally bought a copy a few weeks ago. I’ve blitzed through it, and I intend to go back through it more slowly.

Many quotes that I’d seen from Alexander have resonated with me, and I expected the book to do the same. That wasn’t my experience at first.

The book starts out abstract and becomes more concrete. I think the most quoted parts are later in the book. A few of the early patterns are quirky and controversial [2], but the book soon moves on to patterns that are more concrete and more widely accepted. For example, about midway through the book is the following observation.

If two parts of an office are too far apart, people will not move between them as often as they need to; and if they are more than one floor apart, there will be almost no communication between the two.

I noticed this early on in my career and found it bewildering. But it’s absolutely true, and in fact I might change “more than one floor apart” to “even one floor apart.”

The book talks a great deal about how to foster community, something we desperately need. Architecture is not the cause of or cure for strife, but architecture can create an environment where human interaction is likely to be more frequent and more pleasant.

[1] I left academia either once or twice, depending on whether MD Anderson Cancer Center counts as academia. MDACC is a strange mix of hospital and university. I’d say my job there was semi-academic. I did some research and teaching, but I also did some software development and project management.

[2] I can sympathize with some of the quirky and/or controversial statements.