From the monthly archives:

March 2008

CEO compensation

by John on March 19, 2008

In a recent interview, author and economist Tim Harford argued that CEO compensation is not based on the economic value of the CEO’s leadership. Instead, companies compensate their CEOs generously in order to provide an incentive to aspiring CEOs. In this view, an executive compensation package is a sort of lottery prize designed to motivate those further down the corporate ladder.


Second homes

by John on March 18, 2008

Designing good surveys is hard work. Andrew Gelman posted an example of unintended consequences in survey design yesterday. A survey question asked “How many people do you know who have a second home?” Apparently some respondents thought the question was asking about folks who own vacation homes while others thought the question referred to immigrants.


Conceptual integrity

by John on March 18, 2008

How do you maintain conceptual integrity when multiple people contribute to a project?

Fred Brooks, author of the software engineering classic The Mythical Man-Month, gave a talk at OOPSLA 2007 entitled Collaboration and Telecollaboration in Design (audio here). In his talk, Brooks discusses the importance of conceptual integrity. Great products have conceptual integrity and are nearly always the fruit of one or at most two minds. Products reflect their creators, and products designed by committees have multiple personalities. How do you maintain conceptual integrity when scale and complexity demand the participation of many people? Listen to Fred Brooks for some ideas.


Going overboard with nouns

by John on March 15, 2008

I ran across an article by Steve Yegge called Execution in the Kingdom of Nouns that explains what can happen when object-oriented programming goes bad.

In programming, objects are like nouns and functions are like verbs. You can go off into the weeds by placing too much emphasis on nouns, and Yegge makes a good case that Java has done just that.


Error function and the normal distribution

by John on March 15, 2008

The error function erf(x) and the standard normal distribution function Φ(x) are essentially the same function. The former is more common in math, the latter in statistics. I often have to convert between the two.

It’s a simple exercise to move between erf(x) and Φ(x), but it’s tedious and error-prone, especially when you throw in variations on these two functions such as their complements and inverses. Some time ago I got sufficiently frustrated to write up the various relationships in a LaTeX file for future reference. I was using this file yesterday and thought I should post it as a PDF file in case it could save someone else time and errors.
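As a quick illustration of the two basic relationships, Φ(x) = (1 + erf(x/√2))/2 and erf(x) = 2Φ(x√2) − 1, here is a minimal Python sketch using only the standard library. The helper names (Phi, erf_from_Phi, Phi_complement, erf_inverse) are hypothetical and are not taken from the PDF; the inverse assumes Python 3.8+ for statistics.NormalDist.inv_cdf.

```python
import math
from statistics import NormalDist

SQRT2 = math.sqrt(2.0)

def Phi(x):
    """Standard normal CDF expressed through erf."""
    return 0.5 * (1.0 + math.erf(x / SQRT2))

def erf_from_Phi(x):
    """erf expressed through the standard normal CDF."""
    return 2.0 * Phi(x * SQRT2) - 1.0

def Phi_complement(x):
    """1 - Phi(x), computed with erfc to avoid cancellation in the upper tail."""
    return 0.5 * math.erfc(x / SQRT2)

def erf_inverse(y):
    """Inverse of erf on (-1, 1): erf(x) = y means Phi(x*sqrt(2)) = (y + 1)/2."""
    return NormalDist().inv_cdf((y + 1.0) / 2.0) / SQRT2

print(Phi(1.96))  # about 0.975
```

Using erfc for the complement, rather than 1 − Φ(x), is the kind of variation that is easy to get wrong by hand and matters in the tails.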


Wagner’s tuba

by John on March 15, 2008

A few days ago Engines of Our Ingenuity aired a piece on Wagner’s tuba. It’s an interesting story of how Richard Wagner invented a new musical instrument for his Ring of the Nibelung cycle.


What is the cosine of a matrix?

by John on March 14, 2008

How would you define the cosine of a matrix? If you’re trying to think of a triangle whose sides are matrices, you’re not going to get there. Think of power series. If a matrix A is square, you can stick it into the power series for cosine and call the sum the cosine of A.

$$\cos A = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} A^{2n} = I - \frac{A^2}{2!} + \frac{A^4}{4!} - \cdots$$

For example,

[Figure: the cosine of a 2×2 matrix]

This only works for square matrices. Otherwise the powers of A are not defined.
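Here is a minimal pure-Python sketch of the definition. It sums a fixed number of terms of the series (40, an arbitrary choice that is plenty for matrices with modest entries) and stores matrices as lists of rows; the helper names are hypothetical. In practice a library routine such as SciPy's scipy.linalg.cosm would be the usual choice.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_cos(A, terms=40):
    """Cosine of a square matrix via the truncated series
    sum over k < terms of (-1)^k A^(2k) / (2k)!."""
    n = len(A)
    if any(len(row) != n for row in A):
        raise ValueError("the cosine series needs a square matrix")
    A2 = matmul(A, A)
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # k = 0 term: I
    total = [row[:] for row in term]
    for k in range(1, terms):
        # each term is the previous one times -A^2 / ((2k-1)(2k))
        scale = -1.0 / ((2 * k - 1) * (2 * k))
        term = [[scale * x for x in row] for row in matmul(term, A2)]
        total = [[t + x for t, x in zip(rt, rx)] for rt, rx in zip(total, term)]
    return total

# Diagonal matrix: the cosine acts entry by entry on the diagonal.
D = mat_cos([[1.0, 0.0], [0.0, 2.0]])
print(D[0][0] - math.cos(1.0), D[1][1] - math.cos(2.0))  # both tiny
# Nilpotent matrix (N^2 = 0): only the identity term survives.
print(mat_cos([[0.0, 1.0], [0.0, 0.0]]))
```

Passing a non-square matrix raises an error, since A² is not defined.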

The power series converges and has many of the properties you’d expect. However, the usual trig identities may or may not apply. For example,

$$\cos(A+B) = \cos A \cos B - \sin A \sin B$$

holds when the matrices A and B commute, i.e. AB = BA, but it can fail otherwise. To see why commutativity matters, imagine trying to prove the sum identity above. You’d stick A+B into the power series and do some algebra to rearrange terms into the terms on the right side of the equation. Along the way you’ll expand powers like (A+B)² = A² + AB + BA + B², and you’d like to combine the middle terms into 2AB, as in the scalar binomial theorem, but you can’t justify that unless A and B commute.

Is cosine still periodic in this context? Yes, in the sense that cos(A + 2πI) = cos(A). This is because the diagonal matrix 2πI commutes with every matrix A, so the sum identity applies: cos(A + 2πI) = cos(A)cos(2πI) − sin(A)sin(2πI), and cos(2πI) = I while sin(2πI) = 0.
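The following self-contained sketch checks both claims numerically with hypothetical pure-Python helpers that sum the cosine and sine series. The non-commuting pair A = [[0, 1], [0, 0]] and B = [[0, 0], [1, 0]] is an illustrative choice: A² = B² = 0 and (A+B)² = I, so cos(A+B) = cos(1)I, while cos A cos B − sin A sin B = I − AB has a 0 in its top-left corner. The matrix M used for the periodicity check is arbitrary.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B, s=1.0):
    """Return A + s*B."""
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def trig_series(A, first, offset, terms=60):
    """Sum of (-1)^k A^(2k+offset) / (2k+offset)! for k < terms; `first` is the k = 0 term."""
    A2 = matmul(A, A)
    term, total = first, first
    for k in range(1, terms):
        scale = -1.0 / ((2 * k - 1 + offset) * (2 * k + offset))
        term = [[scale * x for x in row] for row in matmul(term, A2)]
        total = add(total, term)
    return total

def mat_cos(A):
    return trig_series(A, identity(len(A)), 0)

def mat_sin(A):
    return trig_series(A, [[float(x) for x in row] for row in A], 1)

def max_gap(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

# Sum identity with a non-commuting pair (AB != BA)
A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]
lhs = mat_cos(add(A, B))
rhs = add(matmul(mat_cos(A), mat_cos(B)), matmul(mat_sin(A), mat_sin(B)), -1.0)
gap_noncommuting = max_gap(lhs, rhs)

# Periodicity: 2*pi*I commutes with every matrix
M = [[0.3, 1.0], [-2.0, 0.5]]
two_pi_I = [[2 * math.pi * x for x in row] for row in identity(2)]
gap_periodic = max_gap(mat_cos(add(M, two_pi_I)), mat_cos(M))

print("sum identity, non-commuting:", gap_noncommuting)  # clearly nonzero
print("periodicity:", gap_periodic)                      # rounding error only
```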

Why would you want to define the cosine of a matrix? One application of analytic functions of a matrix is solving systems of differential equations. Any homogeneous linear system of ODEs with constant coefficients, of any order, can be rewritten in the form x′ = Ax, where x is a vector of functions and A is a square matrix. Then the solution is x(t) = exp(tA) x(0). And cos(At) x(0) is a solution to x″ + A²x = 0, just as cos(at) solves x″ + a²x = 0 in calculus.
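A quick numerical check of the last claim, under illustrative assumptions: the matrix A, the initial vector x(0), the time t, and the step h below are arbitrary, and x″ is approximated with a central finite difference. The helpers sum the cosine series directly and are hypothetical. Note that cos(At) x(0) is the solution whose initial velocity x′(0) is zero.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def mat_cos(A, terms=40):
    """Truncated series: sum over k < terms of (-1)^k A^(2k) / (2k)!."""
    n = len(A)
    A2 = matmul(A, A)
    term = [[float(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        scale = -1.0 / ((2 * k - 1) * (2 * k))
        term = [[scale * x for x in row] for row in matmul(term, A2)]
        total = [[t + x for t, x in zip(rt, rx)] for rt, rx in zip(total, term)]
    return total

def x(t, A, x0):
    """Candidate solution x(t) = cos(At) x(0)."""
    return matvec(mat_cos([[t * a for a in row] for row in A]), x0)

A = [[0.0, 1.0], [-2.0, 0.5]]
x0 = [1.0, 0.0]
t, h = 0.7, 1e-3

# Central difference: x''(t) is approximately (x(t+h) - 2x(t) + x(t-h)) / h^2
xpp = [(p - 2 * c + m) / h**2
       for p, c, m in zip(x(t + h, A, x0), x(t, A, x0), x(t - h, A, x0))]
A2x = matvec(matmul(A, A), x(t, A, x0))
residual = max(abs(a + b) for a, b in zip(xpp, A2x))
print("max |x'' + A^2 x|:", residual)  # small, limited by the step h
```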


The normal distribution pops up everywhere in statistics. Contrary to popular belief, the name does not come from “normal” as in “conventional.” Instead the term comes from a detail in a proof by Gauss discussed below where he showed that two things were perpendicular in a sense.

(The word “normal” originally meant “at a right angle,” going back to the Latin word normalis for a carpenter’s square. Later the word took on the metaphorical meaning of something in line with custom. Mathematicians sometimes use “normal” in the original sense of being orthogonal.)

The mistaken etymology persists because the normal distribution is conventional. Statisticians often assume anything random has a normal distribution by default. While this assumption is not always justified, it often works remarkably well. This post gives four lines of reasoning that lead naturally to the normal distribution.

Abraham de Moivre

1) The earliest characterization of the normal distribution is the central limit theorem, going back to Abraham de Moivre. Roughly speaking, this theorem says that if you average enough independent random variables with finite variance, even if they’re not normal, their suitably standardized average is approximately normal, and exactly normal in the limit. But this justification for assuming normal distributions everywhere has a couple of problems. First, the convergence in the central limit theorem may be slow, depending on what is being averaged. Second, if you relax the hypotheses of the central limit theorem, for example by dropping the finite variance assumption, other stable distributions with thicker tails also satisfy a sort of central limit theorem. The characterizations given below are more satisfying because they do not rely on limit theorems.

William Herschel

2) The astronomer William Herschel discovered the simplest characterization of the normal. He wanted to characterize the errors in astronomical measurements. He assumed (1) the distribution of errors in the x and y directions must be independent, and (2) the distribution of errors must be independent of angle when expressed in polar coordinates. These are very natural assumptions for an astronomer, and the only solution is a product of the same normal distribution in x and y. James Clerk Maxwell came up with an analogous derivation in three dimensions when modelling gas dynamics.
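Herschel’s argument can be sketched in a few lines. The sketch assumes, beyond what is stated above, that the error density f is continuous, and writes g for the joint density as a function of the squared radius; continuity is what rules out pathological solutions of Cauchy’s functional equation.

$$
\begin{aligned}
f(x)\,f(y) &= g(x^2 + y^2) && \text{independence plus rotational symmetry} \\
f(x)\,f(0) &= g(x^2) && \text{set } y = 0 \text{ (so } f \text{ is even)} \\
f(x)\,f(y) &= f(0)\, f\big(\sqrt{x^2 + y^2}\big) && \text{combine the two lines} \\
k(x^2) + k(y^2) &= k(x^2 + y^2), \quad k(u) = \log\frac{f(\sqrt{u})}{f(0)} && \text{divide by } f(0)^2 \text{, take logs} \\
k(u) &= -\frac{u}{2\sigma^2} && \text{continuous solution of Cauchy's equation} \\
f(x) &= f(0)\, e^{-x^2/(2\sigma^2)}
\end{aligned}
$$

The linear coefficient must be negative, written here as −1/(2σ²), for f to integrate to 1, which leaves exactly the mean-zero normal density in each coordinate.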

Carl Friedrich Gauss

3) Carl Friedrich Gauss came up with the characterization of the normal distribution that caused it to be called the “Gaussian” distribution. There are two natural strategies for estimating the mean of a random variable from a sample: the arithmetic mean of the samples, and the maximum likelihood value. Among families of distributions that differ only by a shift in location, these two estimates coincide only for the normal distribution.
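To see the contrast, here is a small Python sketch that maximizes the likelihood of a location parameter by brute-force grid search, once under a normal model and once under a Laplace (double exponential) model. The Laplace comparison, the data, and the grid are illustrative assumptions, not part of Gauss’s argument. The normal fit lands on the sample mean; the Laplace fit lands on the sample median instead.

```python
data = [1.0, 2.0, 3.0, 4.0, 15.0]       # illustrative sample with one outlier
grid = [i / 100 for i in range(2001)]    # candidate locations 0.00, 0.01, ..., 20.00

def normal_nll(m):
    """Normal negative log-likelihood of location m, up to a positive factor and a constant."""
    return sum((x - m) ** 2 for x in data)

def laplace_nll(m):
    """Laplace negative log-likelihood of location m, up to a positive factor and a constant."""
    return sum(abs(x - m) for x in data)

mle_normal = min(grid, key=normal_nll)
mle_laplace = min(grid, key=laplace_nll)
sample_mean = sum(data) / len(data)
sample_median = sorted(data)[len(data) // 2]

print(sample_mean, mle_normal)     # 5.0 5.0
print(sample_median, mle_laplace)  # 3.0 3.0
```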

4) The final characterization listed here is in terms of entropy. For a specified mean and variance, the probability density with the greatest entropy (least information) is the normal distribution. I don’t know who discovered this result, but I read it in C. R. Rao’s book. Perhaps it’s his result. If anyone knows, please let me know and I’ll update this post. For advocates of maximum entropy this is the most important characterization of the normal distribution.
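For a concrete comparison, this sketch evaluates the closed-form differential entropies (in nats) of three densities with the same variance σ²: normal, Laplace, and uniform. The choice of competitors is illustrative; the mean does not affect differential entropy, so only σ matters. The normal value ½ log(2πeσ²) should come out largest, as the maximum entropy characterization says.

```python
import math

def normal_entropy(sigma):
    """Differential entropy (nats) of a normal with standard deviation sigma."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

def laplace_entropy(sigma):
    """A Laplace with scale b has variance 2 b^2 and entropy 1 + log(2b)."""
    b = sigma / math.sqrt(2)
    return 1 + math.log(2 * b)

def uniform_entropy(sigma):
    """A uniform on an interval of width w has variance w^2 / 12 and entropy log(w)."""
    w = sigma * math.sqrt(12)
    return math.log(w)

for sigma in (0.5, 1.0, 3.0):
    print(sigma, normal_entropy(sigma), laplace_entropy(sigma), uniform_entropy(sigma))
```

Changing σ shifts all three entropies by the same log σ, so the ordering does not depend on the scale.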

Related post:

How the Central Limit Theorem began


Alphabetical order is wrong

by John on March 12, 2008

Seth Godin posted an article today entitled Alphabetical order is obsolete. He makes a good argument that alphabetically sorted lists are often not the best user interface. I particularly liked his idea that an email client should sort junk mail according to the probability that each message is spam.


In praise of tedious proofs

by John on March 11, 2008

The book Out of Their Minds quotes Leslie Lamport on proofs:

The proofs have been carried out to an excruciating level of detail … The reader may feel that we have given long, tedious proofs of obvious assertions. However, what he has not seen are the many equally obvious assertions that we discovered to be wrong only by trying to write similarly long, tedious proofs.                   

See Lamport’s paper How to Write a Proof. See also Complementary validation.


How loud is the evidence?

by John on March 11, 2008

We sometimes speak of data as if data could talk. For example, we say such things as “What do the data say?” and “Let the data speak for themselves.” It turns out there’s a way to take this figure of speech seriously: Evidence can be meaningfully measured in decibels.

In acoustics, the intensity of a sound in decibels is given by

$$10 \log_{10}\left(\frac{P_1}{P_0}\right)$$

where P1 is the power of the sound and P0 is a reference value, the power in a sound at the threshold of human hearing.

In Bayesian statistics, the level of evidence in favor of a hypothesis H1 compared to a null hypothesis H0 can be measured in the same way as sound intensity if we take P0 and P1 to be the posterior probabilities of hypotheses H0 and H1 respectively.

Measuring statistical evidence in decibels provides a visceral interpretation. Psychologists have found that human perception of stimulus intensity is, in general, roughly logarithmic. And while natural logarithms are more mathematically convenient, logarithms base 10 are easier to interpret.

A 50-50 toss-up corresponds to 0 dB of evidence. Belief corresponds to positive decibels, disbelief to negative decibels. If an experiment shows H1 to be 100 times more likely than H0, then the experiment increased the evidence in favor of H1 by 20 dB, since 10 log₁₀ 100 = 20.

A normal conversation is about 60 acoustic dB. Sixty dB of evidence corresponds to million-to-one odds. A train whistle at 500 feet produces 90 acoustic dB. Ninety dB of evidence corresponds to billion-to-one odds: data speaking loudly indeed.
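A short Python sketch of the conversion, with hypothetical helper names evidence_db and odds_from_db: evidence in decibels is 10 log₁₀ of the ratio of posterior probabilities, so every factor of 10 in the odds adds 10 dB.

```python
import math

def evidence_db(p1, p0):
    """Evidence for H1 over H0 in decibels, given posterior probabilities p1 and p0."""
    return 10 * math.log10(p1 / p0)

def odds_from_db(db):
    """Invert evidence_db: the odds p1/p0 corresponding to db decibels."""
    return 10 ** (db / 10)

# A toss-up, 100 to 1, a million to one, and a billion to one.
for odds in (1, 100, 10**6, 10**9):
    print(f"{odds:>13,} to 1 -> {evidence_db(odds, 1):5.1f} dB")

# Posterior probabilities work too: P(H1) = 0.99, P(H0) = 0.01 is 99-to-1 odds.
print(round(evidence_db(0.99, 0.01), 2))
```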

To read more about evidence in decibels, see Chapter 4 of Probability Theory: The Logic of Science.


There’s a saying that clients can have good, fast, or cheap. Pick two, but then the third will be whatever it has to be based on the other two choices. You can have good and fast if you’re willing to spend a lot of money. You can have fast and cheap, but the quality will be poor. You might even be able to get good and cheap, if you’re willing to wait a long time.

A variation on this theme is the iron triangle. You draw a triangle with vertices labeled “features,” “time,” and “resources.” If you make two of the sides longer, the third has to become longer too. Here goodness is defined as a feature set rather than quality, but the same principle applies.

There’s a problem with this line of reasoning: no matter what clients say, they want quality. They may say they want fast and cheap, and if you tell them you’ll sacrifice quality to deliver fast and cheap, you’ll be a hero — until you deliver. Then they want quality. As Howard Newton put it

People forget how fast you did a job, but they remember how well you did it.

Sometimes you can cut features as long as you do a good job on the features that remain, but only to a point. Clients are not going to be happy unless you meet their expectations, even if those expectations are explicitly contradicted in a contract. You can tell a client you’ll cut out frills to give them something fast and cheap, and they’ll gladly agree. But they still want their frills, or they will want them. The client may be silently disappointed. Or they may be vocally disappointed, demanding excluded features for free and complaining about your work. Eventually you learn what features to insist on including, even if a client says they can live without them.


Math and stat posts classified

by John on March 9, 2008

I’ve added a page to my web site where I classified my blog posts and informal articles on math and statistics into six categories:

  • Elementary
  • Preventing and detecting errors
  • Interpreting and misinterpreting probabilities
  • Mathematical statistics
  • Practicalities
  • Pure math

I’ve put a link to this page on the side of my blog and intend to keep it up to date as I add posts related to math and statistics.


Tukey tallying

by John on March 8, 2008

John Tukey was amazingly talented. He would have been remembered for his achievements in pure mathematics had he not gone on to have an even more remarkable career in statistics. He is also remembered for some of the words he coined, such as “software” and “vacuum cleaner.”

In his book Exploratory Data Analysis, affectionately known as EDA, Tukey gives advice on collecting and analyzing data, even down to how to count observations. Rather than the usual slash tallying, Tukey recommended his own method of tallying.

[Figure: Tukey’s tally system]

Tukey’s system is easier to scan and may be less error-prone. For example, compare a count of 36 in both systems.

[Figure: 36 in slash tally]

[Figure: 36 in Tukey tally]
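To make the bookkeeping concrete, here is a Python sketch that decomposes a count into tally marks. It assumes Tukey’s scheme as it is commonly described, with each box of 10 drawn as four corner dots, then four sides, then two diagonals; the function names are hypothetical. For 36 it gives three full boxes plus a partial box of four dots and two sides, versus seven slashed groups of five plus one stroke in the usual system.

```python
def tukey_tally(n):
    """Return (full_boxes, dots, sides, diagonals) for a count n.

    Assumes each full box of 10 is 4 corner dots + 4 sides + 2 diagonals;
    dots, sides, and diagonals describe the final partial box.
    """
    if n < 0:
        raise ValueError("counts must be non-negative")
    boxes, r = divmod(n, 10)
    dots = min(r, 4)
    sides = min(max(r - 4, 0), 4)
    diagonals = max(r - 8, 0)
    return boxes, dots, sides, diagonals

def slash_tally(n):
    """Return (groups_of_five, leftover_strokes) for the usual slash tally."""
    return divmod(n, 5)

print(tukey_tally(36))  # (3, 4, 2, 0)
print(slash_tally(36))  # (7, 1)
```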


How to linearize data for regression

by John on March 7, 2008

Linear regression books usually include a footnote that you might have to transform your data before you can apply regression. However, they seldom give any guidance on how to pick a transformation. Just try something until your scatterplots look linear.

John Tukey gave a nice heuristic for linearizing data in his 1977 book Exploratory Data Analysis. Tukey gives what he calls a ladder of transformations.

y³
y²
y
√y
log y
−y⁻¹
−y⁻²
−y⁻³

Try transformations in the direction of the bulge in the plot. If the plot bulges up (say your plot looks something like y = √x), then move up the ladder from the identity: try squaring or cubing the data. Or if you’re going to transform x, think of the ladder as horizontal, running from −x⁻³ on the left to x³ on the right. If the bulge is down and to the right, either move down the y-ladder or to the right on the x-ladder.
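Tukey’s advice is to look at the plot. As a rough numerical stand-in (an assumption of this sketch, not Tukey’s procedure), the Python code below tries each rung of the y-ladder and keeps the one whose transformed values have the highest correlation with x. Negative powers are negated so that every rung preserves order, and the data are synthetic, following y = √x, which bulges up.

```python
import math

LADDER = [3, 2, 1, 0.5, 0, -1, -2, -3]  # powers of y; 0 stands for log y

def rung(values, p):
    """Apply one rung of the ladder to positive values."""
    if p == 0:
        return [math.log(v) for v in values]
    if p < 0:
        return [-(v ** p) for v in values]  # negate to keep increasing order
    return [v ** p for v in values]

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def best_y_rung(xs, ys):
    """Pick the y-rung whose transformed values are most linearly related to x."""
    return max(LADDER, key=lambda p: correlation(xs, rung(ys, p)))

xs = [float(i) for i in range(1, 21)]
ys = [math.sqrt(x) for x in xs]
print(best_y_rung(xs, ys))  # 2: squaring y straightens y = sqrt(x)
```

A correlation score is only a crude proxy; a scatterplot of the transformed data remains the real check.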

(If you know of a good presentation of this topic online, something with good illustrations, please let me know and I’ll link to it. I did a quick search and found several hits, but the ones I looked at lacked clear pictures.)
