The disappointing state of Unicode fonts

Modern operating systems understand Unicode internally, but font support for Unicode is spotty. For an example of the problems this can cause, take a look at these screen shots of how the same Twitter message appears differently depending on what program is used to read it.

No font can display all Unicode characters. According to Wikipedia

… it would be impossible to create such a font in any common font format, as Unicode includes over 100,000 characters, while no widely-used font format supports more than 65,535 glyphs.

However, the biggest problem isn’t the number of characters a font can display. Most Unicode characters are quite rare. About 30,000 characters are enough to display the vast majority of characters in use in all the world’s languages as well as a generous selection of symbols. Even so, Unicode fonts vary greatly in their support even for the more commonly used ranges of characters. See this comparison chart. The only range completely covered by all Unicode fonts in the chart is the 128 characters of Latin Extended-A.

Unifont supports all printable characters in the Basic Multilingual Plane, characters U+0000 through U+FFFF. This includes the 30,000 characters mentioned above plus many more. Unifont isn’t pretty, but it’s complete. As far as I know, it’s the only font that covers all of the characters below U+FFFF.
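If you want to check a font’s coverage yourself, here’s a minimal sketch using the Python fontTools package. The file name unifont.ttf is just a placeholder for whatever font file you want to inspect.

    # Count the code points a font covers in the Basic Multilingual Plane.
    # Sketch only; "unifont.ttf" is a placeholder path.
    from fontTools.ttLib import TTFont

    font = TTFont("unifont.ttf")      # load the font file
    cmap = font.getBestCmap()         # dict mapping code points to glyph names
    bmp = [cp for cp in cmap if cp <= 0xFFFF]
    print("Code points covered in the BMP:", len(bmp))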

Technology history quiz

I was skimming through Big Ideas: 100 Modern Inventions the other day and was surprised at the dates for many of the inventions. I thought it would be fun to pick a few of these and make them into a quiz, so here goes.

Match the following technologies with the year of their invention.

First the inventions:

  1. The computer mouse
  2. Radio frequency identification (RFID)
  3. Pull-top cans
  4. Bar codes
  5. Touch tone phones
  6. Cell phones
  7. Car airbags
  8. Automated teller machines (ATM)
  9. Magnetic resonance imaging (MRI)
  10. Latex paint

Now the years:

  1. 1948
  2. 1952
  3. 1953
  4. 1963
  5. 1968
  6. 1969
  7. 1973
  8. 1977

Two of the years are used twice. Quiz answers here.

All examples taken from Big Ideas: 100 Modern Inventions That Have Transformed Our World

Biostatistics software

The M. D. Anderson Cancer Center Department of Biostatistics has a software download site listing software developed by the department over many years.

The home page of the download site allows you to see all products sorted by date or by name. This page also allows search. A new page lets you see the software organized by tags.

Related: Biostatistics consultant

Soft maximum

In applications you often want to take the maximum of two numbers. But the simple function

f(x, y) = max(x, y)

can be difficult to work with because it has sharp corners. Sometimes you want an alternative that sands down the sharp edges of the maximum function. One such alternative is the soft maximum. Here are some questions this post will answer.

  • What exactly is a soft maximum and why is it called that?
  • How is it related to the maximum function?
  • In what sense does the maximum have “sharp edges”?
  • How does the soft maximum round these edges?
  • What are the advantages to the soft maximum?
  • Can you control the “softness”?

I’ll call the original maximum the “hard” maximum to make it easier to compare to the soft maximum.

The soft maximum of two variables is the function

g(x, y) = log( exp(x) + exp(y) ).

This can be extended to more than two variables by taking

g(x1, x2, …,  xn) = log( exp(x1) + exp(x2) + … + exp(xn) ).

The soft maximum approximates the hard maximum. Why? If x is a little bigger than y, exp(x) will be a lot bigger than exp(y). That is, exponentiation exaggerates the differences between x and y. If x is significantly bigger than y, exp(x) will be so much bigger than exp(y) that exp(x) + exp(y) will essentially equal exp(x) and the soft maximum will be approximately log( exp(x) ) = x, the hard maximum. Here’s an example. Suppose you have three numbers: -2, 3, 8. Obviously the maximum is 8. The soft maximum is 8.007.
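That number is easy to check with a few lines of Python, a direct transcription of the formula above:

    from math import exp, log

    def soft_maximum(*xs):
        """Soft maximum: log of the sum of exponentials."""
        return log(sum(exp(x) for x in xs))

    print(soft_maximum(-2, 3, 8))   # prints approximately 8.007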

The soft maximum approximates the hard maximum but it also rounds off the corners. Let’s look at some graphs that show what these corners are and how the soft maximum softens them.

Here are 3-D plots of the hard maximum f(x, y) and the soft maximum g(x, y). First the hard maximum:

Now the soft maximum:

Next we look at a particular slice through the graph. Here’s the view as we walk along the line y = 1. First the hard maximum:

And now the soft maximum:

Finally, here are the contour plots. First the hard maximum:

And now the soft maximum:

The soft maximum approximates the hard maximum and is a convex function just like the hard maximum. But the soft maximum is smooth. It has no sudden changes in direction and can be differentiated as many times as you like. These properties make it easy for convex optimization algorithms to work with the soft maximum. In fact, the function may have been invented for optimization; that’s where I first heard of it.

Notice that the accuracy of the soft maximum approximation depends on scale. If you multiply x and y by a large constant, the soft maximum will be closer to the hard maximum. For example, g(1, 2) = 2.31, but g(10, 20) = 20.00004. This suggests you could control the “hardness” of the approximation by generalizing the soft maximum to depend on a parameter k:

g(x, y; k) = log( exp(kx) + exp(ky) ) / k

You can make the soft maximum as close to the hard maximum as you like by making k large enough. For every value of k the soft maximum is differentiable, but the size of its derivatives increases as k increases. In the limit, the derivative becomes infinite as the soft maximum converges to the hard maximum.
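Here’s a sketch of the parameterized soft maximum in Python. Computing exp(kx) directly can overflow when k or the arguments are large, so this sketch factors out the larger argument first, which is algebraically the same thing (the update below goes into computing the soft maximum in more detail).

    from math import exp, log

    def soft_maximum(x, y, k=1.0):
        """Parameterized soft maximum; larger k hugs the hard maximum more closely.

        Factoring out the larger argument keeps exp() from overflowing.
        """
        m = max(x, y)
        return m + log(exp(k*(x - m)) + exp(k*(y - m))) / k

    print(soft_maximum(1, 2))         # about 2.31
    print(soft_maximum(10, 20))       # about 20.00004
    print(soft_maximum(1, 2, k=10))   # about 2.000005, much closer to the hard maximum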

Update: See How to compute the soft maximum.

Software sins of omission

The Book of Common Prayer contains the confession

… we have left undone those things which we ought to have done, and we have done those things which we ought not to have done.

The things left undone are called sins of omission; things which ought not to have been done are called sins of commission.

In software testing and debugging, we focus on sins of commission, code that was implemented incorrectly. But according to Robert Glass, the majority of bugs are sins of omission. In Frequently Forgotten Fundamental Facts about Software Engineering, Glass says

Roughly 35 percent of software defects emerge from missing logic paths, and another 40 percent are from the execution of a unique combination of logic paths.

If these figures are correct, three out of four software bugs are sins of omission, errors due to things left undone. These are bugs due to contingencies the developers did not think to handle. Three quarters seems like a large proportion, but it is plausible. I know I’ve written plenty of bugs that amounted to not considering enough possibilities, particularly in graphical user interface software. It’s hard to think of everything a user might do and all the ways a user might arrive at a particular place. (When I first wrote user interface applications, my reaction to a bug report would be “Why would anyone do that?!” If everyone would just use my software the way I do, everything would be OK.)

It matters whether bugs are sins of omission or sins of commission. Different kinds of bugs are caught by different means. Developers have come to appreciate the value of unit testing lately, but unit tests primarily catch sins of commission. If you didn’t think to program something in the first place, you’re not likely to think to write a test for it. Complete test coverage could only find 25% of a project’s bugs if you assume 75% of the bugs come from code that no one thought to write.

The best way to spot sins of omission is a fresh pair of eyes. As Glass says

Rigorous reviews are more effective, and more cost effective, than any other error-removal strategy, including testing. But they cannot and should not replace testing.

One way to combine the benefits of unit testing and code reviews would be to have different people write the unit tests and the production code.

Abstractions are never perfect

Better to have a simple system than a complex system with a simple abstraction on top.

Abstractions are never perfect. Every new layer creates failure points, interoperability hassles, and scalability problems. New tools can hide complexity, but they can’t justify it … The more complex the system, the more difficult it is to fix when something goes wrong.

From the preface to RESTful Web Services.

Camtasia as a software deployment tool

Last week .NET Rocks mentioned a good idea in passing: start a screencast tool like Camtasia before you do a software install. Michael Learned told the story of a client that asked him to take screen shots of every step in the installation of Microsoft’s Team Foundation Server. Carl Franklin commented, “What a great idea to throw Camtasia on there and record the whole process.”

It would be better if the installation process were scripted and not just recorded, but sometimes that’s not practical. Sometimes clicking a few buttons is absolutely necessary or at least far easier than writing a script. And even if you think your entire process is automated with a script, a screencast might be a good idea. It could record little steps you have to do in order to run your script, details that are easily forgotten.

Another way to use this idea would be to have one person do a practice install on a test server while recording the process. Then another person could document and script the process by studying the video. This would be helpful when the person who knows how to do the installation lacks either the verbal skills to explain the process or the scripting skills to automate it.

Better tools, less productivity?

Can better tools make you less productive? Here’s a quote from Frequently Forgotten Fundamental Facts about Software Engineering by Robert Glass:

Most software tool and technique improvements account for about a 5- to 30-percent increase in productivity and quality. … Learning a new tool or technique actually lowers programmer productivity and product quality initially. You achieve the eventual benefit only after overcoming this learning curve.

If you’re always learning new tools, you may be less productive than if you stuck with your old tools a little longer, even if the new tools really are better. And especially if you’re a part-time developer, you may never reach the point where a new tool pays for itself before you throw it away and pick up a new one. Kathleen Dollard wrote an editorial to this effect in 2004 entitled Save The Hobbyist Programmer.

Miners know they have a significant problem when the canary they keep with them stops singing. Hobbyist/part-time programmers are our industry’s version of the canary, and they have stopped singing. People who program four to eight hours a week are being cut out of the picture because they can’t increase their skills as fast as technology changes. That’s a danger signal for the rest of us.

So what do you do? Learn quickly or change slowly. The first option is to commit to learning a new tool quickly, invest heavily in up-front training, and use the tool as much as you can before the next one comes along. This is the favored option for ambitious programmers who want to maximize their marketability by always using the latest tools.

The second option is to develop a leapfrog strategy, letting some new things pass you by. The less time you spend per week programming, the less often you should change tools. Change occasionally, yes, but wait for big improvements.
