Bidirectional text

This post takes a look at simple bidirectional text, such as a bit of English inside an Arabic document, or a few words of Hebrew inside a French document. If you want to explore the subject in all its complexity, see Unicode Standard Annex #9, the Unicode Bidirectional Algorithm.

You may not need to do anything special to display bidirectional text. For example, when I typed the following sentence, I just typed the letters in logical order.

The first letter of the Hebrew alphabet is אלף.

For the last word, I typed א, then ל, then ף. When I entered the ל, the editor put it on the left side of the א, and when I entered ף the editor put it to the left of the ל. The characters are stored in memory in the same sequence that I typed them, though they are displayed in the order appropriate for each language.
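To make the point about storage order concrete, here is a minimal Python sketch that prints the code points of the Hebrew word in the order they sit in memory. It uses only the standard `unicodedata` module; the variable name `word` is just for illustration, and the letters are written as escapes so that the editor showing the code does not reorder them.

```python
import unicodedata

# The Hebrew word from the sentence above, written with escapes so the
# source code itself is not reordered by whatever displays it.
word = "\u05D0\u05DC\u05E3"  # alef, lamed, final pe, in typing order

for ch in word:
    # bidirectional() returns the character's bidi class: "R" for
    # right-to-left letters such as Hebrew, "L" for Latin letters.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))
```

The loop visits alef first and final pe last, the same sequence in which the letters were typed, even though a renderer draws alef at the right end of the word.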

You can change the default display ordering of characters by inserting control characters. For example, I typed

The [U+202E]quick brown fox[U+202C] jumped.

and the text displays [1] as

The ‮quick brown fox‬ jumped.

The Unicode character U+202E, known as RLO for “right-to-left override,” tells the browser to display the following letters from right to left. Then the character U+202C, known as PDF for “pop directional formatting,” exits that mode, returning to left-to-right display [2]. If we copy the sentence as I typed it into a text file and open the file with a hex editor, we can see the control characters, circled in red.

hex editor screen shot

I saved the file in UTF-16 encoding to make the characters easy to see: each quartet of hex digits represents one Unicode character. UTF-8 encoding is more common and more compact.
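If you don't have a hex editor handy, something similar can be seen from Python. The sketch below rebuilds the sentence, lists its invisible formatting characters, and prints the bytes in both encodings. It assumes big-endian UTF-16 with no byte order mark, which may differ from the file in the screenshot, and the names `RLO`, `PDF`, and `sentence` are my own.

```python
import unicodedata

RLO = "\u202E"  # right-to-left override
PDF = "\u202C"  # pop directional formatting

sentence = f"The {RLO}quick brown fox{PDF} jumped."

# List the invisible formatting characters (category Cf) and their positions.
for i, ch in enumerate(sentence):
    if unicodedata.category(ch) == "Cf":
        print(i, f"U+{ord(ch):04X}", unicodedata.name(ch))

utf16 = sentence.encode("utf-16-be")  # big-endian, no byte order mark
utf8 = sentence.encode("utf-8")

# Four hex digits per character in UTF-16 ...
print(utf16.hex(" ", 2))
# ... while UTF-8 uses one byte per ASCII letter and three per control.
print(utf8.hex(" "))
print(len(utf16), len(utf8))
```

In the UTF-16 dump the two controls show up as `202e` and `202c`. In UTF-8 they become the three-byte sequences `e2 80 ae` and `e2 80 ac`, which are harder to spot by eye, but the sentence as a whole takes fewer bytes.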

If for some reason you wanted to force Hebrew to display from left to right, you could insert U+202D, known as LRO for “left-to-right override.” The character to exit this mode is PDF, U+202C, as before.

Here’s a bit of Hebrew written left-to-right:

Written left-to-right: ‭אלף.

And here’s what it looks like in a hex editor:

another hex editor screen shot
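The same left-to-right trick can be sketched in Python. The helper `force_ltr` below is a hypothetical name of my own, not part of any library; it simply wraps its argument in LRO and PDF. The hex dump again assumes big-endian UTF-16 without a byte order mark.

```python
LRO = "\u202D"  # left-to-right override
PDF = "\u202C"  # pop directional formatting

def force_ltr(text: str) -> str:
    """Wrap text so a bidi-aware renderer lays it out left to right."""
    return f"{LRO}{text}{PDF}"

alef = "\u05D0\u05DC\u05E3"  # the Hebrew word, stored in logical order
line = "Written left-to-right: " + force_ltr(alef) + "."

# The stored order of the Hebrew letters is unchanged; only the two
# control characters around them are new.
print(line.encode("utf-16-be").hex(" ", 2))
```

Note that the override changes nothing about the letters themselves. Stripping out the two control characters gives back the original string.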


[1] This should look the same as

The xof nworb kciuq jumped.

though in this footnote I typed the letters in the order they appear: xof …

If for some reason the text in the body of the post displays in normal order, not as in this note, then something has gone wrong with your browser’s rendering.
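For plain Latin letters, the effect of the override can be imitated by reversing the affected run, which gives a way to check what the body text ought to look like. This is only a rough sketch: `simulate_rlo` is a made-up helper, it assumes the overrides are not nested, and it ignores details a real renderer handles, such as mirrored punctuation.

```python
import re

RLO = "\u202E"  # right-to-left override
PDF = "\u202C"  # pop directional formatting

def simulate_rlo(text: str) -> str:
    """Reverse each RLO...PDF run and drop the two control characters."""
    pattern = re.compile(f"{RLO}(.*?){PDF}")
    return pattern.sub(lambda m: m.group(1)[::-1], text)

stored = f"The {RLO}quick brown fox{PDF} jumped."
print(simulate_rlo(stored))  # The xof nworb kciuq jumped.
```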

[2] So in addition to Portable Document Format and Probability Density Function, PDF can stand for Pop Directional Formatting. Here “pop” is being used in the computer science sense of popping a stack.
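Here is a toy illustration of that stack, tracking which override, if any, applies to each visible character. It is a simplification under my own assumptions: it knows only LRO, RLO, and PDF, it ignores a PDF with nothing to pop, and it leaves out the embeddings, isolates, and depth limits that the full algorithm in Annex #9 deals with. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

LRO, RLO, PDF = "\u202D", "\u202E", "\u202C"

def override_for_each_char(text: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each visible character with the override in force, if any."""
    stack: List[str] = []  # overrides currently open, innermost last
    result = []
    for ch in text:
        if ch == LRO:
            stack.append("LTR")   # push a left-to-right override
        elif ch == RLO:
            stack.append("RTL")   # push a right-to-left override
        elif ch == PDF:
            if stack:             # an unmatched PDF is simply ignored here
                stack.pop()       # pop back to the enclosing state
        else:
            result.append((ch, stack[-1] if stack else None))
    return result

nested = "a" + RLO + "b" + LRO + "c" + PDF + "d" + PDF + "e"
print(override_for_each_char(nested))
```

In the nested example, the first PDF returns from the inner left-to-right override to the outer right-to-left one, and the second PDF returns to no override at all.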
