Entering Unicode characters in Linux

Won currency symbol

I ran across this post from Aaron Toponce explaining how to enter Unicode characters in Linux applications. Hold down the Shift and Control keys while typing “u” and the hex value of the Unicode character you wish to enter. I tried this and it worked in Firefox, GEdit, and Gnome Terminal, but not in OpenOffice. I was running Ubuntu 7.10.
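To find the hex value to type after Shift+Ctrl+u, Python's standard `unicodedata` module can look a character up by its official Unicode name, and it can also confirm what a given hex code will produce before you type it. This is a small sketch; the helper names `hex_to_type` and `describe` are illustrative, not part of any library.

```python
import unicodedata


def hex_to_type(name: str) -> str:
    """Hex digits for the character with this official Unicode name."""
    return format(ord(unicodedata.lookup(name)), "x")


def describe(code: str) -> str:
    """Show what a hex code will produce: code point, Unicode name, glyph."""
    value = int(code, 16)
    ch = chr(value)
    return f"U+{value:04X} {unicodedata.name(ch)} {ch}"


print(hex_to_type("WON SIGN"))  # 20a9
print(describe("20a9"))         # U+20A9 WON SIGN ₩
```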

See also Three ways to enter Unicode characters in Windows.

* * *

For daily tips on using Unix, follow @UnixToolTip on Twitter.


Three ways to enter Unicode characters in Windows

Won currency symbol, U+20A9

Here are three approaches to entering Unicode characters in Windows. See the next post for entering Unicode characters in Linux.

Alt – x

In Microsoft Word you can insert Unicode characters by typing the hex value of the character and then pressing Alt-x. You can also see the Unicode value of a character by placing the cursor immediately after the character and pressing Alt-x. This also works in applications that use the Windows rich edit control, such as WordPad and Outlook.
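The two directions of the Alt-x toggle can be sketched as a small text transformation. This is a hypothetical simplification, not Word's actual algorithm: it treats up to six trailing hex digits as a code point and otherwise replaces the last character with its hex value. One visible limitation is that it would also convert ordinary words that happen to end in hex letters, such as "cafe".

```python
import re

# Up to six trailing hex digits: enough for any code point up to U+10FFFF.
TRAILING_HEX = re.compile(r"[0-9A-Fa-f]{1,6}$")


def alt_x(text: str) -> str:
    """Hypothetical Alt-x toggle: hex digits -> character, or character -> hex."""
    match = TRAILING_HEX.search(text)
    if match:
        value = int(match.group(), 16)
        # Only valid, non-surrogate code points become characters.
        if value <= 0x10FFFF and not 0xD800 <= value <= 0xDFFF:
            return text[: match.start()] + chr(value)
    if not text:
        return text
    # Otherwise show the last character's code point in hex.
    return text[:-1] + format(ord(text[-1]), "04X")


print(alt_x("Price: 20a9"))
print(alt_x(alt_x("Price: 20a9")))
```

Applying the function twice round-trips between the symbol and its hex value, which mirrors the "go the opposite direction" convenience described in the pros below.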

Pros: Nothing to install or configure. You can see the numeric value before you turn it into a symbol. It’s handy to be able to go the opposite direction, looking up Unicode values for characters.

Cons: Does not work with many applications.

Alt – +

Another approach, which works with more applications, is as follows. First create a string value (type REG_SZ) named EnableHexNumpad under HKEY_CURRENT_USER\Control Panel\Input Method, set its value to 1, and reboot. Then you can enter Unicode symbols by holding down the Alt key, typing the plus sign on the numeric keypad, and then typing the character's hex value. When you release the Alt key, the symbol will appear. This approach worked with most applications I tried, including Firefox and Safari, but not with Internet Explorer.

Pros: Works with many applications. No software to install.

Cons: Requires a registry edit and a reboot. It’s awkward to hold down the Alt key while typing several other keys. You cannot see the numbers you’re typing. Doesn’t work with every application.


Another option is to install the UnicodeInput utility. This worked with every application I tried, including Internet Explorer. Once installed, the window below pops up whenever you hold down the Alt key and type the plus sign on the numeric keypad. Type the numeric value of the character in the box, click the Send button, and the character will be inserted into the window that had focus when you pressed Alt-plus.

UnicodeInput screenshot

Pros: Works everywhere (as far as I’ve tried). The software is free. Easy to use.

Cons: Requires installing software.


Improved PowerShell prompt

A while back I wrote a post on how to customize your PowerShell prompt. Last week Tomas Restrepo posted an article on a PowerShell prompt that adds color and shortens the path in a more subtle way. I haven’t tried it out yet, but his prompt looks much better than what I’ve been using.

If you’re a long-time Windows user you might be worried that all this PowerShell stuff is starting to look a lot like Unix. Well, it is. Some of the folks on the PowerShell team have a Unix background and they’re bringing some of the best of Unix to Windows. The Unix world has more experience operating from the command line and so it’s wise to learn from them.

On the other hand, PowerShell is emphatically not bash for Windows. PowerShell is thoroughly object oriented and in that respect unlike any Unix shell. Also, PowerShell is strongly tied to Microsoft libraries, particularly .NET but also COM and WMI.

Applying PageRank algorithm to biology

Scientific American’s 60 Second Science has a podcast, “Google-style rankings for ecosystems,” reporting on a presentation by Stefano Allesina suggesting that a Google-like algorithm be used to determine conservation priorities. Just as web pages rank higher when many other pages link to them, an organism would be a higher priority for conservation efforts if it is part of the food chain for many other organisms.
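As a rough illustration of the analogy, here is plain PageRank computed by power iteration on a small hypothetical food web in which each species "links to" the species it eats, so species that many others depend on collect rank. The food web and the damping factor of 0.85 are assumptions for illustration, and this is textbook PageRank, not necessarily the exact variant used in the presentation.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Plain PageRank by power iteration.

    `links` maps each node to the nodes it points to. Rank held by nodes
    with no outgoing links (here, primary producers) is spread evenly.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            for t in targets:
                new[t] += damping * rank[src] / len(targets)
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical food web: each species points to the species it eats.
food_web = {
    "hawk": ["snake", "mouse"],
    "snake": ["mouse", "frog"],
    "frog": ["insect"],
    "mouse": ["grass", "insect"],
    "insect": ["grass"],
    "grass": [],
}

ranks = pagerank(food_web)
for species, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{species:>6}: {score:.3f}")
```

In this toy web, grass, which every other species depends on directly or indirectly, comes out on top, and the hawk, which nothing eats, comes last.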

The Holy Grail of CSS

Basic tasks are simple in CSS, but even slightly harder tasks can be incredibly difficult. Controlling fonts, margins, and so forth is a piece of cake. But controlling page layout is another matter. In his book Refactoring HTML, Elliotte Rusty Harold describes a technique as

so tricky that it took many smart people quite a few years of experimentation to develop the technique shown here. In fact, so many people searched for this while believing that it didn’t actually exist that this technique goes under the name “The Holy Grail.”

What is the incredibly difficult task that took so many years to discover? Teaching a web browser to play chess using only style sheets? No, three column layout. I kid you not. He goes on to say

The goal is simple: two fixed-width columns on the left and the right and a liquid center for the content in the middle. (That something so frequently needed was so hard to invent doesn’t speak well of CSS as a language, but it’s the language we have to work with.)

You can read more about the Holy Grail of CSS in an article by Matthew Levine.

I appreciate the advantages of CSS, though I do wish it didn’t have such a hockey stick learning curve. I’ve heard people say not to bother learning overly difficult technologies, because if you find a technology too difficult, so will everyone else, and it will die off. But CSS seems to be firmly established with no competitor.

Variation in male and female Olympic performance II

In my previous post, I looked at what would happen if men and women had the same average athletic ability but men were more variable. I also looked at what would happen if men and women were equally variable but had different average abilities.

Now I want to look at something different. What if men and women have equal abilities in a given area, equal mean and variance, but more men are interested in that area? What effect does the greater competition have? In this scenario we would expect the male athletes to be better, but would the difference between men and women increase or decrease as you get to higher levels of competition?

Suppose abilities for men and women are both normally distributed with mean 0 and variance 1. Then the performance of the best person out of n who try out for a sport is the largest of n independent standard normal samples, i.e. the nth order statistic. This maximum is at most $y$ with probability $\Phi(y)^n$, so setting $\Phi(y)^n = 1/2$ shows that its median is $y(n) = \Phi^{-1}(0.5^{1/n})$. (See this paper for details.) The following table lists some values of y(n).


This means, for example, that if 100 people tried out, the best person is as likely to have ability above 2.462 as ability below that value.
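A sketch that evaluates the formula for y(n) with Python's `statistics.NormalDist` and checks it against a brute-force simulation of many tryouts. The number of simulated tryouts and the seed are arbitrary choices; the formula values should agree with the ones quoted in this post (for example 1.499 for n = 10 and 2.462 for n = 100).

```python
import random
import statistics
from statistics import NormalDist


def median_of_best_formula(n):
    """Median of the largest of n independent N(0, 1) draws: Phi^{-1}(0.5 ** (1/n))."""
    return NormalDist().inv_cdf(0.5 ** (1.0 / n))


def median_of_best_simulated(n, trials=4001, seed=2008):
    """Estimate the same median by simulating `trials` tryouts of n people each."""
    rng = random.Random(seed)
    best = [max(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(trials)]
    return statistics.median(best)


for n in (10, 100, 1_000, 100_000, 1_000_000):
    print(f"y({n}) = {median_of_best_formula(n):.3f}")
print(f"simulated y(100) = {median_of_best_simulated(100):.3f}")
```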

Suppose 10 times as many men as women are interested in a sport. If there’s little competition, say 100 men versus 10 women, we’d expect the best man to have ability somewhere around 2.462 and the best woman to have ability around 1.499, a difference of 0.963. As the competition increases, the performance of the best man and of the best woman both increase, but the gap between them decreases. If 1,000,000 men and 100,000 women are interested in a sport, the difference in their abilities would be around 4.827 − 4.346 = 0.481, about half the difference there was with less competition.
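The shrinking gap can be reproduced directly from $y(n) = \Phi^{-1}(0.5^{1/n})$. This sketch keeps the 10-to-1 ratio of men to women from the example and varies only the size of the pool; the intermediate pool size (10,000 men vs 1,000 women) is an extra illustrative point not discussed in the text.

```python
from statistics import NormalDist


def median_best(n):
    """Median ability of the best of n standard-normal competitors."""
    return NormalDist().inv_cdf(0.5 ** (1.0 / n))


def gap(men, women):
    """Difference between the typical best man and the typical best woman."""
    return median_best(men) - median_best(women)


# Ten times as many men as women try out; abilities are identically distributed.
for women in (10, 1_000, 100_000):
    men = 10 * women
    print(f"{men:>9,} men vs {women:>7,} women: gap = {gap(men, women):.3f}")
```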

So according to these estimates, if men and women have equal ability in a sport but proportionately more men are interested in that sport, the difference between the best men and the best women will decline as the competition increases.

The same reasoning could be applied to show what advantage a large country would have over a smaller country if the citizens of both countries are equally talented and equally likely to want to compete in a sport.

Variation in male and female Olympic performance

Isabel Lugo posted an interesting article today called Variance in Olympic events in which she speculates about the variance in male versus female athletic performance.

… it may be the case that the difference between the very best men and the very best women in physical feats (say, times in some sort of race, because these are the most easily quantified) is larger than the difference between the average man and the average woman, because there could be more variance among men than women.

I did a few back-of-the-envelope calculations to explore this possibility. Let X represent female athletic performance and Y male athletic performance in some context. Assume X and Y are normally distributed and that we have rescaled so that X has mean 0 and standard deviation 1. (I know nothing about the statistics of athletic performance. This is just a rough exercise inspired by Isabel Lugo’s question.) For this post, I will assume equal numbers of men and women are interested in a given sport. My next post looks at what happens when abilities are equal but more men than women are interested in a given sport.

First, suppose men and women have equal average performance but that men have standard deviation σ > 1. Then a man who just makes the cutoff of n standard deviations above the mean has performance nσ and a woman who just makes the analogous cutoff has performance n. Then the ratio of their performance is σ for any value of n. At every percentile, the ratio of male to female performance would be the same. The difference in performance, n(σ – 1), does increase as you look at more elite athletes, i.e. increasing values of n, but not by much. The difference would only be larger by 25% when looking at 5-sigma athletes rather than 4-sigma athletes, even though the former group is over 100 times more exclusive.
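A quick numerical check of the last two claims, using σ = 1.5 as a purely illustrative value (the ratio of gaps does not depend on σ): the gap n(σ − 1) grows by a factor of 5/4 from the 4-sigma to the 5-sigma cutoff, while the standard normal tail probability shrinks by a factor of about 110.

```python
import math


def upper_tail(k):
    """P(Z > k) for a standard normal Z, via erfc for accuracy in the tail."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def performance_gap(n, sigma):
    """Male minus female performance at the n-sigma cutoff when only the spread differs."""
    return n * sigma - n  # equals n * (sigma - 1)


sigma = 1.5  # illustrative value only; any sigma > 1 gives the same ratio of gaps
print(performance_gap(5, sigma) / performance_gap(4, sigma))  # 1.25
print(upper_tail(4) / upper_tail(5))  # how much rarer a 5-sigma athlete is
```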

What if in some context male and female performance both had variance 1 but had different means? Say the mean for men is μ > 0 and the mean for women is 0. Then the performance for a man n standard deviations from the mean for men would be μ + n and the performance for a woman n standard deviations away from the mean for women would be n. The difference would remain constant at all levels of performance, but the ratio of performance levels would tend toward 1 as n increases, that is, as you look at more and more elite athletes.

Next look at a different question. In either of the above situations, what proportion of the best athletes will be male? I will show that the odds of a top athlete being male increase exponentially as your definition of “top” increases.

For a given level of performance k, we will look at P(Y > k)/P(X > k), the ratio of the proportion of men at that level to the proportion of women at that level. The probability that a woman has performance greater than k is given by the approximation

$$P(X > k) \approx \frac{1}{k\sqrt{2\pi}} \exp\left( -\frac{k^2}{2} \right)$$

Now suppose Y has mean 0 but standard deviation σ > 1. Then P(Y > k) = P(X > k/σ), and applying the approximation above to both tail probabilities shows that the odds in favor of someone with performance level greater than k being male are approximately

$$\sigma \exp\left( \frac{k^2}{2} \left( 1 - \frac{1}{\sigma^2} \right) \right)$$

which increases exponentially as k increases, i.e. as we look at higher levels of performance. (By symmetry, this would also mean that the odds of a poor performer being male would increase as you looked at worse and worse performers.) To plug in some particular numbers, suppose the standard deviation for men is 1.5 and we had a group of people with performance 2 or greater. The odds in favor of someone in that group being male would be about 4.6 to 1. But if we looked at a group with performance 5 or greater, the odds in favor of someone being male would be about 1,560 to 1.

Next suppose Y has mean μ > 0 but standard deviation 1. Then the odds of a top performer being male are

$$\frac{k}{k-\mu} \exp\left( \mu k - \frac{\mu^2}{2} \right)$$

This also increases exponentially as k increases. Again to put in some specific numbers, assume μ = 0.5 and look at performance levels of 2 and 5. The odds in favor of someone with performance level at least 2 being male are about 3.2 to 1. The corresponding odds for a group with performance level at least 5 are about 12 to 1.
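To see how the large-k approximations compare with the exact values, here is a sketch that computes the ratio P(Y > k)/P(X > k) directly from normal tail probabilities for the same illustrative parameters (σ = 1.5, or μ = 0.5). It assumes equal numbers of men and women, as in the text, and uses `math.erfc` so that tiny tail probabilities keep their precision.

```python
import math


def upper_tail(x, mean=0.0, sd=1.0):
    """P(N(mean, sd**2) > x), computed with erfc to stay accurate far out in the tail."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


def odds_male(k, mu=0.0, sigma=1.0):
    """Exact P(Y > k) / P(X > k) for X ~ N(0, 1) and Y ~ N(mu, sigma**2).

    With equal numbers of men and women, this is the odds that someone
    performing above k is male.
    """
    return upper_tail(k, mu, sigma) / upper_tail(k)


for k in (2, 5):
    print(f"k = {k}: sigma = 1.5 gives {odds_male(k, sigma=1.5):,.1f} to 1; "
          f"mu = 0.5 gives {odds_male(k, mu=0.5):.2f} to 1")
```

For these parameters the exact ratios come out near 4.0 and 1,500 in the σ = 1.5 case and near 2.9 and 11.9 in the μ = 0.5 case, so the approximations are rough at k = 2 and closer at k = 5.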

Works in the field, not in the lab

I read recently that the first military radar systems worked better in the field than in the lab. Apparently the electronics needed jiggling now and then and so did better in actual use than in the protected environment of the lab.

What are some other systems that work better in the field than in the lab or systems that work better in practice than in theory?

Conflicting ideas of simplicity

Sometimes it’s simpler to compute things exactly than to use an approximation. When you work long enough on problems that cannot be computed exactly, you start to assume everything falls in that category. I posted a tech report a few days ago about a problem in studying clinical trials that could be solved exactly even though it was commonly approximated by simulation.

This is another example of trying the simplest thing that might work. But it’s also an example of conflicting ideas of simplicity. It’s simpler, in a sense, to do what you’ve always done than to do something new.

It’s also an example of a conflict between a programmer’s idea of simplicity versus a user’s idea of simplicity. For this problem, the slower and less accurate code requires less work. It’s more straightforward and more likely to be correct. The exact solution takes less code but more thought, and I didn’t get it right the first time. But from a user’s perspective, having exact results is simpler in several ways: no need to specify a number of replications, no need to wait for results, no need to argue over what’s real and what’s simulation noise, etc. In this case I’m both the programmer and the user, so I feel the tug in both directions.
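The tech report's problem isn't reproduced here, so this sketch uses a hypothetical stand-in to show the same trade-off: the probability that at least 10 of 20 patients respond when the true response rate is 0.3, computed exactly with a binomial sum and estimated by simulation. The trial size, response rate, threshold, replication count, and seed are all illustrative assumptions.

```python
import math
import random


def exact_tail(n, p, k):
    """P(at least k successes in n independent Bernoulli(p) trials), summed exactly."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def simulated_tail(n, p, k, reps=20_000, seed=42):
    """The same probability estimated by brute-force simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))
        hits += successes >= k
    return hits / reps


# Hypothetical trial: 20 patients, true response rate 0.3, threshold of 10 responses.
print(exact_tail(20, 0.3, 10))
print(simulated_tail(20, 0.3, 10))
```

The exact version is shorter but relies on recognizing that the count is binomial; the simulated version is the more obvious thing to write, yet it needs a replication count and its answer still carries noise.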

Pepsi Challenge for Windows Vista

Microsoft did an experiment similar to the Pepsi Challenge from years ago.

Pepsi challenge

Microsoft asked people their opinions of Windows Vista and then asked them to take a look at Mojave, a supposedly new version of Windows. See The Mojave Experiment. Not surprisingly, people had favorable things to say about Mojave. There wouldn’t have been a Mojave web site otherwise. To Microsoft’s credit, they do give some details of the experiment on the web site. When the participants were told that “Mojave” was really Vista, their reactions were very similar to those of the Coke fans who were told that they’d just chosen Pepsi.

There’s a deeper analogy between the Mojave Experiment and the Pepsi Challenge. One reason Coke fans often preferred Pepsi in a blind taste test is that they didn’t drink much of the samples. Pepsi is sweeter than Coke, and so people may prefer a sip of Pepsi to a sip of Coke, even if they would prefer a can of Coke to a can of Pepsi. People may be impressed with a demo of Vista but frustrated when they have to use it for a few days. On the other hand, I don’t doubt that many people have been prejudiced against Vista and would enjoy using it if they gave it a chance.

Random inequalities IV: Cauchy distributions

Two weeks ago I wrote a series of posts on random inequalities: part I, part II, part III. In the process of writing these, I found an error in a tech report I wrote five years ago. I’ve posted a corrected version and describe the changes here.

Suppose $X_1$ is a Cauchy random variable with median $m_1$ and scale $s_1$, and similarly for $X_2$, with $X_1$ and $X_2$ independent. Then $X_1 - X_2$ is a Cauchy random variable with median $m_1 - m_2$ and scale $s_1 + s_2$. Then $P(X_1 > X_2)$ equals

$$P(X_1 - X_2 > 0) = P\big(m_1 - m_2 + (s_1 + s_2)\, C > 0\big)$$

where C is a Cauchy random variable with median 0 and scale 1. By the symmetry of C, this reduces to

$$P\left(C < \frac{m_1 - m_2}{s_1 + s_2}\right) = \frac{1}{2} + \frac{1}{\pi} \arctan\left(\frac{m_1 - m_2}{s_1 + s_2}\right).$$

The original version was missing the 1/2 term. This is obviously wrong because it would say that $P(X_1 > X_2)$ is negative when $m_1 < m_2$.
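A sketch of the corrected formula with a Monte Carlo check. The sampler uses the standard inverse-CDF representation of a Cauchy variable, m + s·tan(π(U − 1/2)) with U uniform on (0, 1); the parameter values, replication count, and seed are arbitrary.

```python
import math
import random


def prob_first_larger(m1, s1, m2, s2):
    """P(X1 > X2) for independent Cauchy X1 (median m1, scale s1) and X2 (m2, s2)."""
    return 0.5 + math.atan((m1 - m2) / (s1 + s2)) / math.pi


def simulate(m1, s1, m2, s2, reps=100_000, seed=1):
    """Monte Carlo estimate of the same probability via inverse-CDF sampling."""
    rng = random.Random(seed)

    def draw(m, s):
        return m + s * math.tan(math.pi * (rng.random() - 0.5))

    wins = sum(draw(m1, s1) > draw(m2, s2) for _ in range(reps))
    return wins / reps


print(prob_first_larger(1, 1, 0, 1), simulate(1, 1, 0, 1))
print(prob_first_larger(0, 1, 1, 1))  # m1 < m2 still gives a valid probability
```

Swapping the roles of $X_1$ and $X_2$ gives complementary probabilities that sum to 1, which the version without the 1/2 term could not do.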

By the way, I was told in college that the Cauchy distribution is an impractical curiosity, something more useful for developing counterexamples than modeling real phenomena. That was an overstatement. Thick-tailed distributions like the Cauchy often arise in applications, sometimes directly (see Noise, The Black Swan) or indirectly (for example, robust or default prior distributions).

Update: See part V on beta distributions.




Black swan talk

Nassim Taleb, author of The Black Swan, was part of a panel discussion at a statistical conference in Denver yesterday. His book contains some provocative criticisms of statisticians, so I was eager to see what the discussion might be like. His rhetoric at the meeting was far more subdued than in his book, though his message was essentially the same. His main point was that there are severe limits to the ability of statistics to estimate the probabilities of rare events. Precise statements about very small probabilities are often nonsense.

Taleb argued that statisticians can make the problem of predicting rare events worse by reassuring non-statisticians that risks are under control when common sense would leave more room for doubt. (Anybody remember Long Term Capital Management?) He made an analogy to the former practice of suppressing all forest fires. The success in fighting small forest fires created a false sense of security while also creating the conditions for enormous forest fires by not clearing out underbrush. The success of statisticians in predicting the frequency of not-so-rare events lends confidence to predictions that are past the limits of their models.

The relative error in estimating the probability of rare events is only a problem when these rare events also have huge consequences. In a previous post I explained how normal distributions don’t do a good job of predicting the number of extremely tall people. When you’re predicting what proportion of the population meets the height requirements of the US Army, it makes no difference whether the probability of a woman being seven feet tall is one in a million ($10^{-6}$) or one in a billion ($10^{-9}$). But if you are insuring against a multi-billion dollar disaster, the difference between a one-in-a-million and a one-in-a-billion chance matters.

Taleb’s advice is to admit ignorance in predicting rare events and “organically” clip the tails of probability distributions by setting loss limits. This is what insurance companies do when they set caps on payoffs. By setting an upper limit on the amount they will pay, companies no longer need accurate estimates for the probabilities of rare but extremely costly events. Seems like very sensible advice to me.
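A toy calculation, with made-up numbers, of why a payout cap makes the rare-event probability matter less: a hypothetical $10 billion loss whose probability is either one in a million or one in a billion, with and without a hypothetical $1 million cap on what the insurer pays. Only the single rare event is modeled.

```python
import math


def expected_payout(prob, loss, cap=math.inf):
    """Expected payout on one rare event when the insurer pays at most `cap`."""
    return prob * min(loss, cap)


LOSS = 10e9  # hypothetical $10 billion disaster
for cap in (math.inf, 1e6):  # no limit vs a hypothetical $1 million limit
    high = expected_payout(1e-6, LOSS, cap)  # if the risk is one in a million
    low = expected_payout(1e-9, LOSS, cap)   # if the risk is one in a billion
    print(f"cap={cap:>9}: expected payout between {low:,.3f} and {high:,.3f}")
```

Without the cap, the expected payout ranges from $10 to $10,000 depending on which probability estimate is right; with the cap, the same disagreement moves it by less than $1.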

Bad user interface design: hotel showers

Every time I get into a hotel shower I think “Oh great. How does this one work?” No two are the same, and yet I’ve never seen a shower that had the simplicity and convenience of the typical residential shower with two knobs, one for hot water and one for cold. (At least that’s what’s most common in the US.)

Here’s how the shower was labeled in my hotel in Denver this week:

misleading shower label

I assumed that the off position was at 4 o’clock, the hottest water at 3 o’clock, and the coldest at 9 o’clock. So I turned the handle to the 2 o’clock position and waited for the water to warm up. Eventually I realized the shower should have been labeled something like this:

better shower label

The original label was misleading in two ways. First, it implied that you get warmer water by turning the handle clockwise. Second, it implied that the range of motion of the handle was between 9 o’clock and 4 o’clock. But to get a warm shower you have to turn the knob counterclockwise to between 5 and 6 o’clock.

Why do hotel shower designers go to great lengths to frustrate users? What’s wrong with simply having hot and cold water knobs? Would this add a few dollars to the construction cost of a room? If so, I could think of a long list of ways I’d rather they cut costs. Are they concerned about guests who don’t know English? If so, then why assume that guests know what the letters “C” and “H” stand for? How about pictures of penguins and ice cubes drawn in blue above the cold water knob, and pictures of boiling water and fire drawn in red above the hot water knob?