Variation in male and female Olympic performance II

In my previous post, I looked at what would happen if men and women had the same average athletic ability but men were more variable. I also looked at what would happen if men and women were equally variable but had different average abilities.

Now I want to look at something different. What if men and women have equal abilities in a given area, equal mean and variance, but more men are interested in that area? What effect does the greater competition have? In this scenario, we would expect the male athletes to be better, but would the difference between men and women increase or decrease as you get to higher levels of competition?

Suppose ability for both men and women is normally distributed with mean 0 and variance 1. Then the performance of the best person out of n who try out for a sport is the nth order statistic, i.e. the maximum, of n standard normal samples. The median of this random variable is y(n) = Φ^{-1}( 0.5^{1/n} ). (See this paper for details.) The following table lists some values of y(n).

n y(n)
10 1.499
100 2.462
1,000 3.198
10,000 3.811
100,000 4.346
1,000,000 4.827

This means, for example, that if 100 people tried out, the best person is as likely to have ability above 2.462 as ability below that value.
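The table can be reproduced with a few lines of Python. This is only a sketch using the standard library's `statistics.NormalDist`; the function name `median_of_best` is my own label, not something from the paper.

```python
from statistics import NormalDist

def median_of_best(n: int) -> float:
    """Median of the maximum of n independent standard normal draws."""
    # P(max <= y) = Phi(y)**n, so the median y solves Phi(y) = 0.5**(1/n).
    return NormalDist().inv_cdf(0.5 ** (1.0 / n))

# Print n alongside y(n) for the sample sizes used in the table.
for n in (10, 100, 1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,d}  {median_of_best(n):.3f}")
```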

Suppose 10 times as many men as women are interested in a sport. If there’s little competition, say 100 men versus 10 women, we’d expect the best man to have ability somewhere around 2.462 and the best woman to have ability around 1.499, a difference of 0.963. As the competition increases, the performance of the best man and the best woman both increase, but the gap between them decreases. If 1,000,000 men are interested in a sport and 100,000 women, the difference in their abilities would be around 4.827 – 4.346 = 0.481, about half as much difference as there was with less competition.

So according to these estimates, if men and women have equal ability in a sport but proportionately more men are interested in that sport, the difference between the best men and the best women will decline as the competition increases.
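To see how the gap shrinks across the whole range rather than at just two points, here is a rough sketch that evaluates the same median formula for a fixed 10-to-1 ratio of men to women. The helper names and the default ratio argument are my own choices for illustration.

```python
from statistics import NormalDist

def median_of_best(n: int) -> float:
    """Median of the maximum of n independent standard normal draws."""
    return NormalDist().inv_cdf(0.5 ** (1.0 / n))

def gap(n_women: int, ratio: int = 10) -> float:
    """Median best man minus median best woman when ratio * n_women men compete."""
    return median_of_best(ratio * n_women) - median_of_best(n_women)

# The gap falls steadily as the pool of competitors grows.
for n_women in (10, 100, 1_000, 10_000, 100_000):
    print(f"{n_women:>7,d} women, {10 * n_women:>9,d} men: gap = {gap(n_women):.3f}")
```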

The same reasoning could be applied to show what advantage a large country would have over a smaller country if the citizens of both countries are equally talented and equally likely to want to compete in a sport.

Variation in male and female Olympic performance

Isabel Lugo posted an interesting article today called Variance in Olympic events in which she speculates about the variance in male versus female athletic performance.

… it may be the case that the difference between the very best men and the very best women in physical feats (say, times in some sort of race, because these are the most easily quantified) is larger than the difference between the average man and the average woman, because there could be more variance among men than women.

I did a few back-of-the-envelope calculations to explore this possibility. Let X represent female athletic performance and Y male athletic performance in some context. Assume X and Y are normally distributed and that we have rescaled so that X has mean 0 and standard deviation 1. (I know nothing about the statistics of athletic performance. This is just a rough exercise inspired by Isabel Lugo’s question.) For this post, I will assume equal numbers of men and women are interested in a given sport. My next post looks at what happens when abilities are equal but more men than women are interested in a given sport.

First, suppose men and women have equal average performance but that men have standard deviation σ > 1. Then a man who just makes the cutoff of n standard deviations above the mean has performance nσ and a woman who just makes the analogous cutoff has performance n. Then the ratio of their performance is σ for any value of n. At every percentile, the ratio of male to female performance would be the same. The difference in performance, n(σ – 1), does increase as you look at more elite athletes, i.e. increasing values of n, but not by much. The difference would only be larger by 25% when looking at 5-sigma athletes rather than 4-sigma athletes even though the former is over 100 times more exclusive.
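The two numbers in that last sentence are easy to check. A small sketch, where the value σ = 1.5 is a hypothetical choice for illustration only (the 25% figure does not depend on it):

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

SIGMA = 1.5  # hypothetical male standard deviation, for illustration only

def performance_gap(n: float, sigma: float = SIGMA) -> float:
    """Male minus female performance at a cutoff n standard deviations above the mean."""
    return n * (sigma - 1.0)

# Ratio of the gaps at 5 sigma and 4 sigma; sigma cancels, leaving 5/4.
gap_growth = performance_gap(5) / performance_gap(4)

# How many times rarer a 5-sigma performer is than a 4-sigma performer.
exclusivity = upper_tail(4) / upper_tail(5)

print(f"gap grows by a factor of {gap_growth:.2f}")
print(f"5-sigma is about {exclusivity:.0f} times more exclusive than 4-sigma")
```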

What if in some context male and female performance both had variance 1 but had different means? Say the mean for men is μ > 0 and the mean for women is 0. Then the performance for a man n standard deviations from the mean for men would be μ + n and the performance for a woman n standard deviations away from the mean for women would be n. The difference would remain constant at all levels of performance, but the ratio of performance levels would tend toward 1 as n increases, that is, as you look at more and more elite athletes.

Next look at a different question. In either of the above situations, what proportion of the best athletes will be male? I will show that the odds of a top athlete being male increase exponentially as your definition of “top” increases.

For a given level of performance k, we will look at P(Y > k)/P(X > k), the ratio of the proportion of men at that level to the proportion of women at that level. The probability that a woman has performance greater than k is given by the approximation

P(X > k) \approx \frac{1}{ k \sqrt{2\pi}} \exp\left( -\frac{k^2}{2} \right)
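It is worth knowing how rough this approximation is for moderate k. The sketch below compares it with the exact tail probability computed from `math.erfc`; the function names are mine.

```python
import math

def upper_tail_exact(k: float) -> float:
    """Exact P(X > k) for a standard normal X."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

def upper_tail_approx(k: float) -> float:
    """Asymptotic approximation exp(-k^2/2) / (k sqrt(2 pi)), valid for k > 0."""
    return math.exp(-k * k / 2.0) / (k * math.sqrt(2.0 * math.pi))

def relative_error(k: float) -> float:
    """Relative overshoot of the approximation compared with the exact tail."""
    return upper_tail_approx(k) / upper_tail_exact(k) - 1.0

# The approximation overestimates the tail, and the error shrinks as k grows.
for k in (1, 2, 3, 4, 5):
    print(f"k = {k}: exact = {upper_tail_exact(k):.3e}, "
          f"approx = {upper_tail_approx(k):.3e}, "
          f"relative error = {relative_error(k):.1%}")
```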

Now suppose Y has mean 0 but standard deviation σ > 1. Then P(Y > k) = P(X > k/σ), and applying the approximation above to both tails, the odds in favor of someone with performance level greater than k being male are approximately

\sigma \exp\left( \frac{k^2}{2} \left( 1 - \frac{1}{\sigma^2}\right)\right)

which increases exponentially as k increases, i.e. as we look at higher levels of performance. (By symmetry, this would also mean that the odds of a poor performer being male would increase as you looked at worse and worse performers.) To plug in some particular numbers, suppose the standard deviation for men is 1.5 and we had a group of people with performance 2 or greater. The odds in favor of someone in that group being male would be a little over 4 to 1. But if we looked in a group with performance 5 or greater, the odds in favor of someone being male would be roughly 1,500 to 1.
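As a cross-check that does not rely on the asymptotic approximation, the odds can be computed directly from exact normal tail probabilities. A minimal sketch, assuming equal numbers of men and women and using function names of my own:

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def odds_male_wider_spread(k: float, sigma: float) -> float:
    """P(Y > k) / P(X > k) with X ~ N(0, 1) and Y ~ N(0, sigma^2)."""
    # Y > k is the same event as a standard normal exceeding k / sigma.
    return upper_tail(k / sigma) / upper_tail(k)

# Exact odds for the two cutoffs discussed in the text, with sigma = 1.5.
for k in (2, 5):
    print(f"k = {k}: odds of being male = {odds_male_wider_spread(k, 1.5):.1f} to 1")
```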

Next suppose Y has mean μ > 0 but standard deviation 1. Then the odds of a top performer being male are

\frac{k}{k-\mu} \exp\left( \mu k - \frac{\mu^2}{2}\right)

This also increases exponentially as k increases. Again to put in some specific numbers, assume μ = 0.5 and look at performance levels of 2 and 5. The odds in favor of someone with performance level at least 2 being male are about 3.2 to 1. The corresponding odds for a group with performance level at least 5 are about 12 to 1.
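Here is a sketch that evaluates this formula next to the exact ratio of tail probabilities, so you can see how close the approximation is at each cutoff. The function names are mine, and equal numbers of men and women are assumed.

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def odds_shifted_mean_approx(k: float, mu: float) -> float:
    """Asymptotic odds k/(k - mu) * exp(mu*k - mu^2/2), valid for k > mu."""
    return k / (k - mu) * math.exp(mu * k - mu * mu / 2.0)

def odds_shifted_mean_exact(k: float, mu: float) -> float:
    """P(Y > k) / P(X > k) with X ~ N(0, 1) and Y ~ N(mu, 1)."""
    return upper_tail(k - mu) / upper_tail(k)

# Compare approximate and exact odds for mu = 0.5 at the cutoffs in the text.
for k in (2, 5):
    print(f"k = {k}: approx = {odds_shifted_mean_approx(k, 0.5):.2f}, "
          f"exact = {odds_shifted_mean_exact(k, 0.5):.2f}")
```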

Works in the field, not in the lab

I read recently that the first military radar systems worked better in the field than in the lab. Apparently the electronics needed jiggling now and then and so did better in actual use than in the protected environment of the lab.

What are some other systems that work better in the field than in the lab or systems that work better in practice than in theory?

Conflicting ideas of simplicity

Sometimes it’s simpler to compute things exactly than to use an approximation. When you work long enough on problems that cannot be computed exactly, you start to assume everything falls in that category. I posted a tech report a few days ago about a problem in studying clinical trials that could be solved exactly even though it was commonly approximated by simulation.

This is another example of trying the simplest thing that might work. But it’s also an example of conflicting ideas of simplicity. It’s simpler, in a sense, to do what you’ve always done than to do something new.

It’s also an example of a conflict between a programmer’s idea of simplicity versus a user’s idea of simplicity. For this problem, the slower and less accurate code requires less work. It’s more straightforward and more likely to be correct. The exact solution takes less code but more thought, and I didn’t get it right the first time. But from a user’s perspective, having exact results is simpler in several ways: no need to specify a number of replications, no need to wait for results, no need to argue over what’s real and what’s simulation noise, etc. In this case I’m the programmer and the user so I feel the tug in both directions.

Pepsi Challenge for Windows Vista

Microsoft did an experiment similar to the Pepsi Challenge from years ago.

[Image: Pepsi Challenge]

Microsoft asked people their opinions of Windows Vista then asked them to take a look at Mojave, a supposedly new version of Windows. See The Mojave Experiment. Not surprisingly, people had favorable things to say about Mojave. There wouldn’t have been a Mojave website otherwise. To Microsoft’s credit, they do give some details of the experiment on the website. When the participants were told that “Mojave” is really Vista, their reactions were very similar to those of the Coke fans who were told that they’d just chosen Pepsi.

There’s a deeper analogy between the Mojave Experiment and the Pepsi Challenge. One reason Coke fans often preferred Pepsi in a blind taste test is that they didn’t drink much of the samples. Pepsi is sweeter than Coke, and so people may prefer a sip of Pepsi to a sip of Coke, even if they would prefer a can of Coke to a can of Pepsi. People may be impressed with a demo of Vista but frustrated when they have to use it for a few days. On the other hand, I don’t doubt that many people have been prejudiced against Vista and would enjoy using it if they gave it a chance.

Random inequalities IV: Cauchy distributions

Two weeks ago I wrote a series of posts on random inequalities: part I, part II, part III. In the process of writing these, I found an error in a tech report I wrote five years ago. I’ve posted a corrected version and describe the changes here.

Suppose X1 is a Cauchy random variable with median m1 and scale s1 and similarly for X2. Then X1 − X2 is a Cauchy random variable with median m1 − m2 and scale s1 + s2. Then P(X1 > X2) equals

P(X1 − X2 > 0) = P(m1 − m2 + (s1 + s2) C > 0)

where C is a Cauchy random variable with median 0 and scale 1. This reduces to

P(C < (m1 − m2)/(s1 + s2)) = 1/2 + atan( (m1 − m2)/(s1 + s2) )/π.

The original version was missing the term 1/2. This is obviously wrong because it would say that P(X1 > X2) is negative when m1 < m2.
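A quick simulation is a good guard against this kind of slip. The sketch below reads the formula as involving the difference m1 − m2, assumes X1 and X2 are independent, and draws Cauchy samples by inverting the CDF. The function names, trial count, and seed are arbitrary choices of mine.

```python
import math
import random

def prob_x1_greater(m1: float, s1: float, m2: float, s2: float) -> float:
    """Closed form for P(X1 > X2) with independent Cauchy X1 and X2."""
    return 0.5 + math.atan((m1 - m2) / (s1 + s2)) / math.pi

def cauchy_sample(rng: random.Random, m: float, s: float) -> float:
    """Draw from a Cauchy distribution with median m and scale s by CDF inversion."""
    return m + s * math.tan(math.pi * (rng.random() - 0.5))

def simulate(m1: float, s1: float, m2: float, s2: float,
             trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(X1 > X2)."""
    rng = random.Random(seed)
    hits = sum(cauchy_sample(rng, m1, s1) > cauchy_sample(rng, m2, s2)
               for _ in range(trials))
    return hits / trials

# Compare the closed form with the simulation for one set of parameters.
params = (1.0, 2.0, 0.0, 0.5)
print(f"closed form: {prob_x1_greater(*params):.4f}")
print(f"simulation:  {simulate(*params):.4f}")
```

Without the 1/2, the closed form would return a negative number whenever m1 < m2, which a comparison like this would catch immediately.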

By the way, I was told in college that the Cauchy distribution is an impractical curiosity, something more useful for developing counterexamples than modeling real phenomena. That was an overstatement. Thick-tailed distributions like the Cauchy often arise in applications, sometimes directly (see Noise, The Black Swan) or indirectly (for example, robust or default prior distributions).

Update: See part V on beta distributions.

Black swan talk

Nassim Taleb, author of The Black Swan, was part of a panel discussion at a statistical conference in Denver yesterday. His book contains some provocative criticisms of statisticians, so I was eager to see what the discussion might be like. His rhetoric at the meeting was far more subdued than in his book though his message was essentially the same. His main point was that there are severe limits to the ability of statistics to estimate the probabilities of rare events. Precise statements about very small probabilities are often nonsense.

Taleb argued that statisticians can make the problem of predicting rare events worse by reassuring non-statisticians that risks are under control when common sense would leave more room for doubt. (Anybody remember Long Term Capital Management?) He made an analogy to the former practice of suppressing all forest fires. The success in fighting small forest fires created a false sense of security while also creating the conditions for enormous forest fires by not clearing out underbrush. The success of statisticians in predicting the frequency of not-so-rare events lends confidence to predictions that are past the limits of their models.

The relative error in estimating the probability of rare events is only a problem when these rare events also have huge consequences. In a previous post I explained how normal distributions don’t do a good job of predicting the number of extremely tall people. When you’re predicting what proportion of the population meets the height requirements of the US Army, it makes no difference whether the probability of a woman being seven feet tall is one in a million (10^6) or one in a billion (10^9). But if you are insuring against a multi-billion dollar disaster, the difference between one in a million or one in a billion chance matters.

Taleb’s advice is to admit ignorance in predicting rare events and “organically” clip the tails of probability distributions by setting loss limits. This is what insurance companies do when they set caps on payoffs. By setting an upper limit on the amount they will pay, companies no longer need accurate estimates for the probabilities of rare but extremely costly events. Seems like very sensible advice to me.

Bad user interface design: hotel showers

Every time I get into a hotel shower I think “Oh great. How does this one work?” No two are the same, and yet I’ve never seen a shower that had the simplicity and convenience of the typical residential shower with two knobs, one for hot water and one for cold. (At least that’s what’s most common in the US.)

Here’s how the shower was labeled in my hotel in Denver this week:

[Image: misleading shower label]

I assumed that the off position was at 4 o’clock, the hottest water at 3 o’clock, and the coldest at 9 o’clock. So I turned the handle to the 2 o’clock position and waited for the water to warm up. Eventually I realized the shower should have been labeled something like this:

[Image: better shower label]

The original label was misleading in two ways. First, it implied that you get warmer water by turning the handle clockwise. Second, it implied that the range of motion of the handle was between 9 o’clock and 4 o’clock. But to get a warm shower you have to turn the knob counterclockwise to between 5 and 6 o’clock.

Why do hotel shower designers go to great lengths to frustrate users? What’s wrong with simply having hot and cold water knobs? Would this add a few dollars to the construction cost of a room? If so, I could think of a long list of ways I’d rather they cut costs. Are they concerned about guests who don’t know English? If so, then why assume that guests know what the letters “C” and “H” stand for? How about pictures of penguins and ice cubes drawn in blue above the cold water knob, and pictures of boiling water and fire drawn in red above the hot water knob?