Just an approximation

I find it amusing when I hear someone say something is “just an approximation” because their “exact” answer is invariably “just an approximation” from someone else’s perspective. When someone says “mere approximation” they often mean “that’s not the kind of approximation my colleagues and I usually make” or “that’s not an approximation I understand.”

For example, I once audited a class in celestial mechanics. I was surprised when the professor spoke with disdain about some analytical technique as a “mere approximation” since his idea of “exact” only extended to Newtonian physics. I don’t recall the details, but it’s possible that the disreputable approximation introduced no more error than the decision to only consider point masses or to ignore relativity. In any case, the approximation violated the rules of the game.

Statisticians can get awfully uptight about numerical approximations. They’ll wring their hands over a numerical routine that’s only good to five or six significant figures but not even blush when they approximate some quantity by averaging a few hundred random samples. Or they’ll make a dozen gross simplifications in modeling and then squint over whether a p-value is 0.04 or 0.06.

The problem is not accuracy but familiarity. We all like to draw a circle around our approximation of reality and distrust anything outside that circle. After a while we forget that our approximations are even approximations.

This applies to professions as well as individuals. All is well until two professional cultures clash. Then one tribe will be horrified by an approximation another tribe takes for granted. These conflicts can be a great reminder of the difference between trying to understand reality and playing by the rules of a professional game.

C programs and reading rooms

This evening I ran across Laurence Tratt’s article How can C Programs be so Reliable? Tratt argues that one reason is that C’s lack of a safety net makes developers more careful.

Because software written in C can fail in so many ways, I was much more careful than normal when writing it. In particular, anything involved in manipulating chunks of memory raises the prospect of off-by-one type errors – which are particularly dangerous in C. Whereas in a higher-level language I might be lazy and think hmm, do I need to subtract 1 from this value when I index into the array? Let’s run it and find out, in C I thought OK, let’s sit down and reason about this. Ironically, the time taken to run-and-discover often seems not to be much different to sit-down-and-think – except the latter is a lot more mentally draining.

I don’t know what I think of this, but it’s interesting. And it reminded me of something I’d written about this summer, how an acoustically live room can be quieter than a room that absorbs sound because people are more careful to be quiet in a live room. See How to design a quiet room.

Related post: Dynamic typing and anti-lock brakes

A Bayesian view of Amazon Resellers

I was buying a used book through Amazon this evening. Three resellers offered the book at essentially the same price. Here were their ratings:

  • 94% positive out of 85,193 reviews
  • 98% positive out of 20,785 reviews
  • 99% positive out of 840 reviews

Which reseller is likely to give the best service? Before you assume it’s the seller with the highest percentage of positive reviews, consider the following simpler scenario.

Suppose one reseller has 90 positive reviews out of 100. The other reseller has two reviews, both positive. You could say one has 90% approval and the other has 100% approval, so the one with 100% approval is better. But this doesn’t take into consideration that there’s much more data on one than the other. You can have some confidence that 90% of the first reseller’s customers are satisfied. You don’t really know about the other because you have only two data points.

A Bayesian view of the problem naturally incorporates the amount of data as well as its average. Let θA be the probability of a customer being satisfied with company A's service. Let θB be the corresponding probability for company B. Suppose before we see any reviews we think all ratings are equally likely. That is, we start with a uniform prior distribution on θA and θB. A uniform distribution is the same as a beta(1, 1) distribution.

After observing 90 positive reviews and 10 negative reviews, our posterior estimate on θA has a beta(91, 11) distribution. After observing 2 positive reviews, our posterior estimate on θB has a beta(3, 1) distribution. The probability that a sample from θA is bigger than a sample from θB is 0.713. That is, there’s a good chance you’d get better service from the reseller with the lower average approval rating.
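The 0.713 figure is easy to check by simulation. Here's a minimal sketch using only the Python standard library; the function name `prob_greater` and the sample size are my own choices, not from the original post:

```python
import random

random.seed(0)

def prob_greater(a1, b1, a2, b2, n=200_000):
    """Estimate P(X > Y) for X ~ beta(a1, b1), Y ~ beta(a2, b2) by sampling."""
    wins = sum(
        random.betavariate(a1, b1) > random.betavariate(a2, b2)
        for _ in range(n)
    )
    return wins / n

# Posterior for A: 90 positive, 10 negative reviews -> beta(91, 11)
# Posterior for B:  2 positive,  0 negative reviews -> beta(3, 1)
p = prob_greater(91, 11, 3, 1)
```

In this particular case the answer can also be found in closed form: since P(θB ≤ a) = a³ for a beta(3, 1) variable, the probability is E[θA³] = (91·92·93)/(102·103·104) ≈ 0.7126, matching the simulation.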

[Plot: beta(91, 11) versus beta(3, 1) probability density functions]

Now back to our original question. Which of the three resellers is most likely to satisfy a customer?

Assume a uniform prior on θX, θY, and θZ, the probabilities of good service for each reseller. The posterior distributions are then beta(80082, 5113), beta(20370, 417), and beta(833, 9) respectively.

These beta distributions have such large parameters that we can approximate them by normal distributions with the same mean and variance. (A beta(a, b) random variable has mean a/(a+b) and variance ab/((a+b)²(a+b+1)).) The variable with the most variance, θZ, has standard deviation 0.003. The other variables have even smaller standard deviation. So the three distributions are highly concentrated at their mean values with practically non-overlapping support. And so a sample from θX or θY is unlikely to be higher than a sample from θZ.
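A short script, assuming the posterior parameters above, confirms these numbers (the helper `beta_mean_std` is mine, not from the post):

```python
from math import sqrt

def beta_mean_std(a, b):
    """Mean and standard deviation of a beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)

# Posteriors for the three resellers (uniform prior plus observed reviews)
posteriors = {
    "X": (80082, 5113),   # 94% positive out of 85,193
    "Y": (20370, 417),    # 98% positive out of 20,785
    "Z": (833, 9),        # 99% positive out of 840
}

stats = {name: beta_mean_std(a, b) for name, (a, b) in posteriors.items()}
```

The means come out around 0.940, 0.980, and 0.989, and even the widest distribution (θZ) has standard deviation only about 0.0035, so the three posteriors barely overlap.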

In general, going by averages alone works when you have a lot of customer reviews. But when you have a small number of reviews, going by averages alone could be misleading.

Thanks to Charles McCreary for suggesting the xkcd comic.


Sed one-liners

A few weeks ago I reviewed Peteris Krumins’ book Awk One-Liners Explained. This post looks at his sequel, Sed One-Liners Explained.

The format of both books is the same: one-line scripts followed by detailed commentary. However, the sed book takes more effort to read because the content is more subtle. The awk book covers the most basic features of awk, but the sed book goes into the more advanced features of sed.

Sed One-Liners Explained provides clear explanations of features I found hard to understand from reading the sed documentation. If you want to learn sed in depth, this is a great book. But you may not want to learn sed in depth; the oldest and simplest parts of sed offer the greatest return on time invested. Since the book is organized by task — line numbering, selective printing, etc. — rather than by language feature, the advanced and basic features are mingled.

On the other hand, there are two appendices organized by language feature. Depending on your learning style, you may want to read the appendices first or jump into the examples and refer to the appendices only as needed.

For a sample of the book, see the table of contents, preface, and first chapter here.

Related links:

For daily tips on using Unix, follow @UnixToolTip on Twitter.


California knows cancer

Last week I stayed in a hotel where I noticed this sign:

This building contains chemicals, including tobacco smoke, known to the State of California to cause cancer, birth defects, or other reproductive harm.

I saw similar signs elsewhere during my visit to California, though without the tobacco phrase.

The most amusing part of the sign to me was “known to the State of California.” In other words, the jury may still be out elsewhere, but the State of California knows what does and does not cause cancer, birth defects, and other reproductive harm.

Now this sign was not on the front of the hotel. You’d think that if the State of California knew that I faced certain and grievous harm from entering this hotel, they might have required the sign to be prominently displayed at the entrance. Instead, the sign was an afterthought, inconspicuously posted outside a restroom. “By the way, staying here will give you cancer and curse your offspring. Have a nice day.”

As far as the building containing tobacco smoke, you couldn’t prove it by me. I had a non-smoking room. I never saw anyone smoke in the common areas and assumed smoking was not allowed. But perhaps someone had once smoked in the hotel and therefore the public should be warned.

Related post: Smoking

New tech reports

Soft maximum

I had a request to turn my blog posts on the soft maximum into a tech report, so here it is:

Basic properties of the soft maximum

There’s no new content here, just a little editing and more formal language. But now it can be referenced in a scholarly publication.

More random inequalities

I recently had a project that needed to compute random inequalities comparing common survival distributions (gamma, inverse gamma, Weibull, log normal) to uniform distributions. Here’s a report of the results.

Random inequalities between survival and uniform distributions

This tech report develops analytical solutions for computing Prob(X > Y) where X and Y are independent, X has one of the distributions mentioned above, and Y is uniform over some interval. The report includes R code to evaluate the analytic expressions. It also includes R code to estimate the same inequalities by sampling, as a complementary validation.

Here are some other tech reports and blog posts on random inequalities.

Professional volunteers

This afternoon I saw a fire truck with the following written on the side:

Staffed by professional volunteers

Of course this is an oxymoron if you take the words literally. A more accurate slogan would be

Staffed by well-qualified amateurs

A professional is someone who does a thing for money, and an amateur is someone who does it for love. Volunteer fire fighters are amateurs in the best sense, doing what they do out of love of the work and love for the community they serve.

Unfortunately professional implies someone is good at what they do, and amateur implies they are not. Maybe skill and compensation were more strongly correlated in the past. When most people had less leisure a century or two ago, few had the time to become highly proficient at something they were not paid for. Now the distinction is more fuzzy.

Because more people work for large organizations, public and private, it is easier to hide incompetence; market forces act more directly on the self-employed. It’s not uncommon to find people in large organizations who are professional only in the pecuniary sense.

It’s also more common now to find people who are quite good at something they choose not to practice for a living. I could imagine three ways the Internet may contribute to this.

  1. It makes highly skilled amateurs more visible by giving them an inexpensive forum to show their work.
  2. It gives amateurs access to information that would have once been readily available only to professionals.
  3. It has reduced the opportunities to make money in some professions. Some people give away their work because they can no longer sell it.

Big data and humility

One of the challenges with big data is to properly estimate your uncertainty. Often “big data” means a huge amount of data that isn’t exactly what you want.

As an example, suppose you have data on how a drug acts in monkeys and you want to infer how the drug acts in humans. There are two sources of uncertainty:

  1. How well do we really know the effects in monkeys?
  2. How well do these results translate to humans?

The former can be quantified, and so we focus on that, but the latter may be more important. There’s a strong temptation to believe that big data regarding one situation tells us more than it does about an analogous situation.

I’ve seen people reason as follows. We don’t really know how results translate from monkeys to humans (or from one chemical to a related chemical, from one market to an analogous market, etc.). We have a moderate amount of data on monkeys and we’ll decimate it and use that as if it were human data, say in order to come up with a prior distribution.

Down-weighting by a fixed ratio, such as 10 to 1, is misleading. If you had 10x as much data on monkeys, would you know as much about effects in humans as if the original smaller data set had been collected on people? What if you suddenly had “big data” involving every monkey on the planet? More data on monkeys drives down your uncertainty about monkeys, but does nothing to lower your uncertainty regarding how monkey results translate to humans.
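To make the flaw concrete, here is a hypothetical sketch of the down-weighting approach: monkey successes and failures, divided by 10, become pseudo-counts in a beta prior for the human response rate. The function name, the counts, and the 10-to-1 ratio are invented for illustration. Note how the implied certainty about humans grows without bound as monkey data grows:

```python
from math import sqrt

def implied_human_std(monkey_successes, monkey_failures, downweight=10):
    """Standard deviation of a beta prior on the human response rate,
    built by down-weighting monkey data by a fixed ratio
    (the approach criticized in the text)."""
    a = 1 + monkey_successes / downweight
    b = 1 + monkey_failures / downweight
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# 100x more monkey data makes the "human" prior look far more
# certain, even though nothing new was learned about humans.
std_small = implied_human_std(600, 400)        # 1,000 monkeys
std_big   = implied_human_std(60_000, 40_000)  # 100,000 monkeys
```

Under this scheme the prior's standard deviation shrinks roughly like 1/sqrt(n), so enough monkeys will make you arbitrarily (and unjustifiably) confident about humans.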

At some point, more data about analogous cases reaches diminishing returns and you can't go further without data about what you really want to know. Collecting more and more data about how a drug works in adults won't help you learn how it works in children. At some point, you need to treat children. Terabytes of analogous data may not be as valuable as kilobytes of highly relevant data.

Thomas Hardy and Harry Potter

Emily Willingham mentioned on Twitter that the names of the Harry Potter characters Dumbledore and Hagrid come from Thomas Hardy’s 1886 novel The Mayor of Casterbridge. Both appear in this passage:

One grievous failing of Elizabeth’s was her occasional pretty and picturesque use of dialect words …

… in time it came to pass that for “fay” she said “succeed”; that she no longer spoke of “dumbledores” but of “humble bees”; no longer said of young men and women that they “walked together,” but that they were “engaged”; that she grew to talk of “greggles” as “wild hyacinths”; that when she had not slept she did not quaintly tell the servants next morning that she had been “hag-rid,” but that she had “suffered from indigestion.”

Apparently dumbledore is a dialect variation on bumblebee and hagrid is a variation on haggard. I don’t know whether this is actually where Rowling drew her character names but it seems plausible.

Two suns in the sunset

NASA’s Kepler mission has discovered a planet orbiting two stars, something like Tatooine in Star Wars. However, unlike Tatooine, this planet is a gas giant about the size and mass of Saturn. But if you had a place to stand near the surface of this planet, you might see a sunset something like the one Luke Skywalker saw.

Source: Science Daily.

Latitude doesn’t exactly mean what I thought

Don Fredkin left a comment on my previous blog post that surprised me. I found out that latitude doesn’t exactly mean what I thought.

Imagine a line connecting your location with the center of the Earth. I thought that your latitude would be the angle that that line makes with the plane of the equator. And that’s almost true, but not quite.

Instead, you should imagine a line perpendicular to the Earth’s surface at your location and take the angle that that line makes with the plane of the equator.

If the Earth were perfectly spherical, the two lines would be identical and so the two angles would be identical. But since the Earth is an oblate spheroid (i.e. its cross-section is an ellipse) the two are not quite the same.

The angle I had in mind is the geocentric latitude ψ. The angle made by a perpendicular line and the plane of the equator is the geographic latitude φ. The following drawing from Wikipedia illustrates the difference, exaggerating the eccentricity of the ellipse.

How do these two ideas of latitude compare? I’ll sketch a derivation for equations relating geographic latitude φ and geocentric latitude ψ.

Let f(x, y) = (x/a)² + (y/b)² where a = 6378.1 km is the equatorial radius and b = 6356.8 km is the polar radius. The gradient of f is perpendicular to the ellipse given by the level set f(x, y) = 1. At geocentric latitude ψ we have y = tan(ψ) x, and so the gradient is proportional to (1/a², tan(ψ)/b²). Taking the dot product with (1, 0) shows that the cosine of φ is given by

(1 + (a/b)⁴ tan² ψ)^(-1/2).

It follows that

φ = tan⁻¹( (a/b)² tan ψ )

and

ψ = tan⁻¹( (b/a)² tan φ ).

The geocentric and geographic latitudes are equal at the poles and at the equator. Between these extremes, geographic latitude is larger than geocentric latitude, but never by more than about 0.2 degrees. The maximum difference, as you might guess, occurs near 45 degrees.

Here’s a graph of φ – ψ in degrees.

The maximum occurs at 44.9 degrees and equals 0.1917.
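These numbers are easy to verify with a brute-force scan over geocentric latitudes. This is a sketch in Python; the 0.01-degree grid is an arbitrary choice:

```python
from math import atan, degrees, radians, tan

A = 6378.1  # equatorial radius, km
B = 6356.8  # polar radius, km

def geographic_from_geocentric(psi_deg):
    """Convert geocentric latitude psi to geographic latitude phi (degrees)."""
    return degrees(atan((A / B) ** 2 * tan(radians(psi_deg))))

# Scan (0, 90) degrees for the largest difference phi - psi
best_psi, best_diff = max(
    ((psi / 100, geographic_from_geocentric(psi / 100) - psi / 100)
     for psi in range(1, 9000)),
    key=lambda t: t[1],
)
```

The scan puts the maximum near 44.9 degrees with a difference of about 0.1917 degrees, matching the graph.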

The curve looks very much like a parabola, and indeed it is. The approximation

φ = ψ + 0.186 − 0.0000946667 (ψ − 45)²

is very accurate, within about 0.005 degrees.

Related post: Journey away from the center of the Earth

Journey away from the center of the Earth

What point on Earth is farthest from its center? Mt. Everest comes to mind, but its summit is the point highest above sea level, not the point farthest from the center. These are not the same because the Earth is not perfectly spherical.

Our planet bulges slightly at the equator due to its rotation. The equatorial diameter is about 0.3% larger than the polar diameter. Sea level at the equator is about 20 kilometers farther from the center of the Earth than sea level at the poles.

[Photo: Chimborazo in Ecuador, via Wikipedia]

Mt. Everest is about nine kilometers above sea level and located about 28 degrees north of the equator. Chimborazo, the highest point in Ecuador, is about six kilometers above sea level and 1.5 degrees south of the equator.

So how far are Mt. Everest and Chimborazo from the center of the Earth? To answer that, we first need to know how far sea level at latitude θ is from the center of the Earth.

Imagine slicing the Earth with a plane containing its polar diameter. To a good approximation (within 100 meters) the resulting shape would be an ellipse. The equation of this ellipse is

(x / a)² + (y / b)² = 1

where a = 6378.1 km is the equatorial radius and b = 6356.8 km is the polar radius. A line from the center of the ellipse to a point at latitude θ has equation y = tan(θ) x. Solving the pair of equations for x shows that the distance from the center to the point at latitude θ is

d = sqrt( a² b² sec² θ / (a² tan² θ + b²) )

For Mt. Everest, θ = 27.99 degrees and so d = 6373.4. For Chimborazo, θ = -1.486 degrees and so d = 6378.1. So sea level is 4.7 km higher at Chimborazo. Mt. Everest is 2.6 km taller, but the summit of Chimborazo is about 2.1 km farther away from the center of the Earth.
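The arithmetic above can be checked with a short script. The distance function below is the same formula rewritten as d = ab / sqrt(a² sin²θ + b² cos²θ), which avoids sec and tan; the summit heights of 8.85 km and 6.26 km are assumed values consistent with the rounded figures in the text:

```python
from math import cos, radians, sin, sqrt

A = 6378.1  # equatorial radius, km
B = 6356.8  # polar radius, km

def sea_level_distance(lat_deg):
    """Distance (km) from Earth's center to sea level at a given latitude,
    treating the polar cross-section as an ellipse."""
    t = radians(lat_deg)
    return (A * B) / sqrt((A * sin(t)) ** 2 + (B * cos(t)) ** 2)

# Summit distances from the center: sea level plus height above sea level.
# Heights are assumed: 8.85 km (Everest), 6.26 km (Chimborazo).
everest    = sea_level_distance(27.99) + 8.85
chimborazo = sea_level_distance(-1.486) + 6.26
```

Running this gives sea-level distances of about 6373.4 km and 6378.1 km, and puts the summit of Chimborazo about 2.1 km farther from the center than the summit of Everest.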

Update: See my next post for a slight correction. A more accurate calculation shows sea level is about 4.65 km higher at Chimborazo than at Mt. Everest.

Compound complexity

I’ve started to read through Michael Fogus’ list of recommended technical papers and ran across this gem from Out of the Tar Pit:

Complexity breeds complexity. There are a whole set of secondary causes of complexity. This covers all complexity introduced as a result of not being able to clearly understand a system. Duplication is a prime example of this — if (due to state, control or code volume) it is not clear that functionality already exists, or it is too complex to understand whether what exists does exactly what is required, there will be a strong tendency to duplicate. This is particularly true in the presence of time pressures.

Within a few hours of reading that quote I had a conversation illustrating it. I talked with someone who needed to make a small change to a complex section of code. He said the code had three minor variations on the same chunk of functionality. He could get his job done much faster in the short term if he simply added a fourth variation to the code base. He refused to do that, but many developers would not refuse.

Suppose the rate of growth in complexity of a project is proportional to how complex the project is. And suppose, as the quote above suggests, that the proportionality constant is the time pressure. Then the complexity over time is given by

y′(t) = a y(t)

where y(t) is complexity at time t and a is the time pressure. Then complexity grows exponentially. The solution to the equation is

y(t) = y₀ e^(at)

where y₀ is the initial complexity. This isn't meant to be an exact model, just a back-of-the-envelope illustration. On the other hand, I've seen situations where it gives a fairly good description of a project for a while. Complexity can grow exponentially like compound interest, and the greater the pressure, the greater the rate of compounding.
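A few lines of Python make the back-of-the-envelope model concrete; the coefficient values here are arbitrary illustrative choices:

```python
from math import exp

def complexity(t, y0=1.0, a=0.5):
    """Complexity at time t under the model y'(t) = a * y(t),
    whose solution is y(t) = y0 * exp(a * t)."""
    return y0 * exp(a * t)

# Positive time pressure (a > 0) compounds complexity...
growing  = [complexity(t) for t in range(5)]
# ...while a negative coefficient (pressure to simplify) decays it.
decaying = [complexity(t, a=-0.5) for t in range(5)]
```

In each unit of time the complexity is multiplied by the same factor e^a, which is exactly the compound-interest behavior described above.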

Now suppose there’s a different kind of time pressure, a pressure to simplify a project. This would correspond to a negative value of the proportionality constant a. If there were such pressure, this would mean that complexity would decrease exponentially.

I don’t think this kind of negative pressure on complexity is as realistic as positive pressure, but it’s not entirely unrealistic either. In the rare case of pressure to simplify, removing one source of complexity could lead to cascading effects. Because we don’t need this one thing any more, we don’t need these other things that were only there to prop it up, etc. There could be a sustained decrease in complexity, though it probably would not be exponential.

Related post: A little simplification goes a long way

Advanced or just obscure?

Sometimes it’s clear what’s meant by one topic being more advanced than another. For example, algebra is more advanced than arithmetic because you need to know arithmetic before you can do algebra. If you can’t learn A until you’ve learned B, then A is more advanced. But often advanced is used in a looser sense.

When I became a software developer, I was surprised how loosely developers use the word advanced. For example, one function might be called more “advanced” than another, even though there was no connection between the two. The supposedly more advanced function might be more specialized or harder to use. In other words, advanced was being used as a synonym for obscure. This is curious since advanced has a positive connotation but obscure has a negative connotation.

I resisted this terminology at first, but eventually I gave in. I’ll say advanced when I’m sure people will understand my meaning, even if I cringe a little inside. For example, I have had a Twitter account SansMouse that posts one keyboard shortcut a day [1]. These are in a cycle, starting with the most well-known and generally useful shortcuts. When I say the shortcuts progress from basic to advanced, people know what I mean and they’re happy with that. But it might be more accurate to say the shortcuts regress from most useful to least useful!

I’m not writing this just to pick at how people use words. My point is that the classification of some things as more advanced than others, particularly in technology, is largely arbitrary. The application of this: don’t assume that ‘advanced’ necessarily comes after ‘basic’.

Maybe A is called more advanced than B because most people find B more accessible. That doesn’t necessarily mean that you will find B more accessible. For example, I’ve often found supposedly advanced books easier to read than introductory books. Whether the author’s style resonates with you may be more important than the level of the book.

Maybe A is called more advanced than B because most people learn B first. That could be a historical accident. Maybe A is actually easier to learn from scratch, but B came first. Teachers and authors tend to present material in the order in which they learned it. They may think of newer material as being more difficult, but a new generation may disagree.

Finally, whether one thing is more advanced than another may depend on how far you intend to pursue it. It may be harder to master A than B, but that doesn’t mean it’s harder to dabble in A than B.

In short, you need to decide for yourself what order to learn things in. Of course if you’re learning something really new, you’re in no position to say what that order should be. The best thing is to start with the conventional order. But experiment with variations. Try getting ahead of the usual path now and then. You may find a different sequence that better fits your ways of thinking and your experience.

* * *

[1] Sometime after this post was written I renamed SansMouse to ShortcutKeyTip. I stopped posting to that account in September 2013, but the account is still online.
