I’m using the term “sleeper” here for a theorem that is far more important than it seems, something that you may not appreciate for years after you first see it.

The first such theorem that comes to mind is **Bayes theorem**. I remember being unsettled by this theorem when I took my first probability course. I found it easy to prove but hard to understand. I couldn’t decide whether it was trivial or profound. Then years later I found myself using Bayes theorem routinely.

The key insight of Bayes theorem is that it gives you a way to turn probabilities around. That is, it lets you compute the probability of *A* given *B* from the probability of *B* given *A*. That may not seem so important, but it’s vital in applications. It’s often easy to compute the probability of data given a hypothesis, but we need to know the probability of a hypothesis given data. Those unfamiliar with Bayes theorem often get probabilities backward.
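To make "turning probabilities around" concrete, here is a minimal Python sketch. The function name `posterior` and the screening-test numbers (1% prevalence, 95% sensitivity, 10% false positive rate) are hypothetical, chosen only for illustration.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Turn P(data | H) around into P(H | data) with Bayes theorem."""
    # Total probability of seeing the data, whether or not H is true.
    p_data = p_data_given_h * prior + p_data_given_not_h * (1 - prior)
    return p_data_given_h * prior / p_data

# Hypothetical screening test: 1% prevalence, 95% sensitivity, 10% false positives.
p = posterior(prior=0.01, p_data_given_h=0.95, p_data_given_not_h=0.10)
print(f"P(condition | positive test) = {p:.3f}")  # 0.088
```

Although the test detects the condition 95% of the time, a positive result here means only about a 9% chance of having it. Reading P(positive | condition) as P(condition | positive) is exactly the backward mistake.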

Another sleeper theorem is **Jensen’s inequality**: If φ is a convex function and *X* is a random variable, φ( E(*X*) ) ≤ E( φ(*X*) ). In words, φ at the expected value of *X* is less than or equal to the expected value of φ of *X*. Like Bayes’ theorem, it’s a way of turning things around. If the function φ represents your gain from some investment, Jensen’s inequality says that randomness is good for you; variability in *X* is to your advantage on average. But if φ is concave, variability works against you.
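A quick numerical check of the inequality, as a minimal sketch: *X* is taken (hypothetically) to be uniform on {0, 1, 2, 3, 4}, with φ(x) = x² as the convex case and -x² as the concave case. Exact fractions avoid any rounding noise.

```python
from fractions import Fraction

def expect(values, f=lambda x: x):
    """Expected value of f(X) for X uniform on a finite list of values."""
    return sum(Fraction(f(v)) for v in values) / len(values)

xs = [0, 1, 2, 3, 4]          # X uniform on {0, ..., 4}, so E(X) = 2
convex = lambda x: x * x      # phi(x) = x^2 is convex
concave = lambda x: -x * x    # -x^2 is concave

mean = expect(xs)
print(convex(mean), expect(xs, convex))    # 4 6: phi(E X) <= E phi(X)
print(concave(mean), expect(xs, concave))  # -4 -6: the inequality flips
```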

Sam Savage’s book The Flaw of Averages is all about the difference between φ( E(*X*) ) and E( φ(*X*) ). When φ is linear, they’re equal. But in general they’re different and there’s not much you can say about the relation of the two. However, when φ is convex or concave, you can say what the direction of the difference is.
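The kind of gap Savage describes can be sketched with a concave φ. Assume, purely for illustration, a plant that can sell at most 100 units and demand that is either 50 or 150 with equal probability; the names and numbers are hypothetical, not taken from the book.

```python
def expected(outcomes, f):
    """E f(D) for a finite distribution given as (value, probability) pairs."""
    return sum(p * f(d) for d, p in outcomes)

CAPACITY = 100                     # hypothetical capacity, in units
demand = [(50, 0.5), (150, 0.5)]   # hypothetical demand distribution

sales = lambda d: min(d, CAPACITY)  # units sold is concave in demand
avg_demand = expected(demand, lambda d: d)
print(sales(avg_demand), expected(demand, sales))  # 100.0 75.0
```

Planning with average demand predicts 100 units sold, but the expected number sold is only 75, because min(d, capacity) is concave.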

I’ve just started reading Nassim Taleb’s new book Antifragile, and it seems to be an extended meditation on Jensen’s inequality. Systems with concave returns are fragile; they are harmed by variability. Systems with convex returns are antifragile; they benefit from variability.

What are some more examples of sleeper theorems?


A strong contender is Ito’s lemma: It was (re-)discovered by Merton, Black and Scholes for the derivation of their (in-)famous option pricing formula. The rest is history…
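What Ito's lemma adds to ordinary calculus is a second-order term. A minimal simulation sketch, assuming standard Brownian motion *W* on [0, T] approximated by n Gaussian increments: for f(w) = w², Ito's lemma gives W_T² = 2∫W dW + T, and the extra T shows up as the sum of squared increments. The function name and parameter values are illustrative.

```python
import math
import random

def ito_check(T=1.0, n=100_000, seed=0):
    """Simulate Brownian increments and split W_T^2 as Ito's lemma predicts."""
    rng = random.Random(seed)
    w = 0.0
    stochastic_integral = 0.0  # left-point (Ito) sum approximating 2 * integral W dW
    quadratic_variation = 0.0  # sum of (dW)^2, which tends to T
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(T / n))
        stochastic_integral += 2 * w * dw
        quadratic_variation += dw * dw
        w += dw
    return w, stochastic_integral, quadratic_variation

w, integral, qv = ito_check()
# W_T^2 equals the Ito sum plus the squared increments; the latter is close to T = 1.
print(w * w, integral + qv, qv)
```

Ordinary calculus would predict W_T² = 2∫W dW with nothing left over; the simulation shows the sum of (dW)² staying near T instead of vanishing as n grows.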

Complex Analysis – when complex numbers were discovered in the 16th century, their eventual applications were beyond the comprehension of the time: electromagnetism, signal analysis, fluid dynamics, relativity, quantum mechanics.

Isn’t that an obvious consequence of the definition of convex and concave?

If the function “bulges” above linear, there are more places along the function that are higher than linear. If the function “sags” below linear, there are more places where the function is lower than linear.

Sounds like what I call a “No duh” theorem.

Marc: Jensen’s inequality is a direct consequence of convexity, essentially extending the definition of convexity from sums to integrals. But convexity is subtle.

It’s amazing how often nonlinear functions satisfy convexity or some related property. You can have some awful-looking function, and someone will say “This is convex, ergo by Jensen’s inequality we’re done” and it feels like they just pulled a rabbit out of a hat. I suppose it’s a consequence of the strong composition rules for convex functions, but things can be convex in ways that are not at all obvious.

It’s also interesting, a la Taleb, how you can spot convexity in ordinary situations that don’t have precise formulas. For example, options lead to convex returns. That’s a simple observation, but it has profound consequences.
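To see the convexity in the option example, a minimal sketch assuming a plain call option held to expiry, with a hypothetical strike and two hypothetical price scenarios that share the same mean but differ in spread.

```python
def call_payoff(s, strike):
    """Payoff of a call option at expiry: convex in the underlying price s."""
    return max(s - strike, 0.0)

def expected_payoff(prices, strike):
    """Average payoff over equally likely terminal prices."""
    return sum(call_payoff(s, strike) for s in prices) / len(prices)

STRIKE = 100.0                   # hypothetical strike
calm = [95.0, 100.0, 105.0]      # mean 100, small spread
volatile = [70.0, 100.0, 130.0]  # mean 100, large spread

print(call_payoff(100.0, STRIKE))         # 0.0: payoff at the average price
print(expected_payoff(calm, STRIKE))      # about 1.67: some variability helps
print(expected_payoff(volatile, STRIKE))  # 10.0: more variability helps more
```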

Bayes’ theorem is also a “no duh” result, falling directly out of the definition of conditional probability. But it can be subtle to apply correctly, and the results often go against intuition, even for people trained in probability and statistics.
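The "no duh" derivation takes one line: write the joint probability P(A ∩ B) two ways using the definition of conditional probability, then divide (assuming P(A) and P(B) are positive).

```latex
P(A \mid B)\,P(B) = P(A \cap B) = P(B \mid A)\,P(A)
\quad\Longrightarrow\quad
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
```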

I’d mention the standard error of the mean. Bad economic consequences of ignoring it are well documented here: http://press.princeton.edu/chapters/s8863.pdf
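For reference, the standard error of the mean is the sample standard deviation divided by √n. A minimal sketch with hypothetical data, using only the standard library.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print(statistics.mean(data), standard_error(data))  # mean 5, SE about 0.756
```

Because of the √n, quadrupling the sample size only halves the standard error, so averages over small samples are much noisier than averages over large ones.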

The Jordan curve theorem comes to mind: that a simple closed curve in the plane has an inside and an outside. My first thought when I encountered it was along the lines of ‘does that even need proving?’, but the more you think about it the more profound it becomes, and the proof is far from trivial. I guess Fermat’s last theorem is an obvious candidate as well.
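For polygons, the Jordan curve theorem is what justifies the classic even-odd (ray casting) point-in-polygon test: a ray from an inside point crosses the boundary an odd number of times. A minimal sketch, assuming a simple polygon given as a list of vertices and ignoring points that lie exactly on the boundary.

```python
def inside(point, polygon):
    """Even-odd ray casting: is point inside a simple polygon?

    Casts a horizontal ray to the right and counts edge crossings;
    boundary points are not handled carefully in this sketch.
    """
    x, y = point
    crossings = 0
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                crossings += 1
    return crossings % 2 == 1

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(inside((2, 2), square), inside((5, 2), square))  # True False
```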

@John: Indeed, options are all about convexity. Taleb, a former derivatives trader like myself, had to feel the convexity. That’s how you spot ‘oddities’, a.k.a. unpriced risk, and this extends to higher-order parameters, not just spot price (like ‘vomma’, etc.).

It is unsurprising to me that he tries to generalize this ‘pricing principle’ to an entire business setup or environment: as a good trader, he was trained to smell this.

From a scientific perspective, convexity seems to be the deep and strong notion that drives so many modern applications (SVMs, compressed sensing, etc.). I find it absolutely deep and very rich.

PS: in the vein of your illustration, “This is convex, ergo by Jensen’s inequality we’re done”, Kullback-Leibler positivity is quite an illustration of this. Is there any way it can be trivial without Jensen?
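The Jensen route to KL positivity is one line: since log is concave, E_p log(q/p) ≤ log E_p(q/p) = log Σ q_i ≤ 0, where the sum runs over the support of p. A small numerical check, as a sketch assuming discrete distributions given as lists of probabilities (the function name is illustrative): the divergence is zero when p = q and positive otherwise.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum of p_i * log(p_i / q_i) for discrete distributions.

    Terms with p_i = 0 contribute 0; q must be positive wherever p is.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
print(kl_divergence(p, p))                # 0.0 when the distributions agree
print(kl_divergence(p, [0.2, 0.3, 0.5]))  # positive otherwise
```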

Dunno about trivial, but it’s pretty quick even without Jensen. KL >= 0 says that if you have two probability distributions p, q then E_p(log p) >= E_p(log q). So suppose you have any choice of q, and you tweak it by increasing q infinitesimally in one place and decreasing it in another; then d E_p(log q) / d(change) = p1/q1 - p2/q2 because d/dq log q = 1/q, so for the optimal q this must be zero for every such pair, which means p and q are proportional, which means p = q, and we’re done modulo a few technical details.

Wasn’t it Lars Hörmander who said the basis of his Fields Medal was integration by parts? And that’s basically the product rule for differentiation, so maybe that’s a good candidate.

The Hahn-Banach theorem also comes to mind as subtle and very powerful.

The pigeonhole principle maybe?

EWD980
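A sketch of one classic use of the pigeonhole principle suggested above: among any n + 1 integers, two have a difference divisible by n, since there are only n possible remainders. The function name is hypothetical.

```python
def shared_remainder(numbers, n):
    """Return the first pair of numbers with equal remainders mod n, or None.

    By the pigeonhole principle, a pair always exists if len(numbers) > n.
    """
    seen = {}  # remainder -> first number seen with that remainder
    for x in numbers:
        r = x % n
        if r in seen:
            return seen[r], x
        seen[r] = x
    return None

print(shared_remainder([3, 14, 15, 92, 65, 35], 5))  # (15, 65)
```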

The one that jumps to mind for me is Taylor’s Theorem. When I first saw it, it seemed like a neat trick, but not terribly useful. But it involves so many important and useful ideas:

* Approximating a sufficiently smooth function to nth-order accuracy with a polynomial

* Bounding the error term of an approximation

* Decomposing functions into linear combinations of other functions

Powerful stuff!
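The first two bullet points can be seen together in a short sketch: approximate exp near 0 by its degree-n Taylor polynomial, then compare the actual error with the bound from the Lagrange form of the remainder. The choice of exp and the crude estimate e < 3 are illustrative assumptions.

```python
import math

def taylor_exp(x, n):
    """Degree-n Taylor polynomial of exp about 0, evaluated at x."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

def remainder_bound(x, n):
    """Lagrange-form bound on |exp(x) - taylor_exp(x, n)|.

    The remainder is exp(c) * x**(n+1) / (n+1)! for some c between 0 and x.
    Using e < 3, exp(c) <= max(1, 3**x), so the bound never calls exp itself.
    """
    return max(1.0, 3.0**x) * abs(x) ** (n + 1) / math.factorial(n + 1)

for n in (2, 4, 8):
    error = abs(math.exp(1.0) - taylor_exp(1.0, n))
    print(n, error, remainder_bound(1.0, n))  # the error stays below the bound
```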

I’ll “ditto” Will Daly’s suggestion of Taylor’s Theorem.

My suggestion: Kolmogorov’s Inequality for the maximum absolute value of the partial sums of a sequence of IID random variables. At first it looks like a cute but harmless sidebar… and then it turns out to be the basis of martingale theory, nonparametric statistics, and probably a few other things I haven’t even heard of.
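Kolmogorov's inequality says that for independent mean-zero random variables with finite variance, P(max_k |S_k| ≥ λ) ≤ Var(S_n)/λ². A Monte Carlo sketch, assuming fair ±1 steps (so Var(S_n) = n) and illustrative values of n, λ, and the random seed.

```python
import random

def path_exceeds(n, lam, rng):
    """Simulate n fair +/-1 steps; report whether max_k |S_k| >= lam."""
    s = 0
    for _ in range(n):
        s += rng.choice((-1, 1))
        if abs(s) >= lam:
            return True
    return False

def kolmogorov_check(n=100, lam=20, trials=10_000, seed=1):
    """Estimate P(max_k |S_k| >= lam) and compare with Var(S_n) / lam^2."""
    rng = random.Random(seed)
    hits = sum(path_exceeds(n, lam, rng) for _ in range(trials))
    empirical = hits / trials
    bound = n / lam**2  # Var(S_n) / lambda^2, since each step has variance 1
    return empirical, bound

print(kolmogorov_check())  # empirical frequency, then the bound 0.25
```

The bound is crude, but it controls the entire path of partial sums using only the variance of the last one.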

Another suggestion: the Karush-Kuhn-Tucker optimality conditions for nonlinear programming, and the related Hugh Everett generalized Lagrange multiplier result [http://or.journal.informs.org/content/11/3/399.full.pdf+html]
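The KKT conditions can be checked by hand on a toy problem. A sketch, using a hypothetical example chosen so the answer is known: the point of the half-plane x + y ≥ 1 closest to the origin is (1/2, 1/2), with multiplier μ = 1.

```python
def kkt_residuals(x, y, mu):
    """KKT conditions for: minimize x^2 + y^2 subject to x + y >= 1.

    Written in standard form with g(x, y) = 1 - x - y <= 0.
    """
    g = 1 - x - y
    grad_f = (2 * x, 2 * y)
    grad_g = (-1, -1)
    stationarity = tuple(df + mu * dg for df, dg in zip(grad_f, grad_g))
    return {
        "stationarity": stationarity,       # should be (0, 0)
        "primal_feasible": g <= 0,
        "dual_feasible": mu >= 0,
        "complementary_slackness": mu * g,  # should be 0
    }

print(kkt_residuals(0.5, 0.5, 1.0))
```

Because this problem is convex, satisfying all four conditions also certifies that (1/2, 1/2) is the global minimum.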

In economics, the Envelope Theorem is far more important than it seems. It’s an incredibly useful tool to reason about optimization problems, and it’s as perplexing as Bayes Theorem.
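The envelope theorem says that if V(a) = max_x f(x, a), then dV/da equals ∂f/∂a evaluated at the optimizer x*(a): the indirect effect of re-optimizing x drops out to first order. A numerical sketch, assuming a hypothetical smooth objective f(x, a) = ax - x² (so x* = a/2 and ∂f/∂a = x), with a grid search standing in for the optimizer.

```python
def value(a, f, xs):
    """V(a) = max over the grid xs of f(x, a), returned with the maximizer."""
    best = max(xs, key=lambda x: f(x, a))
    return f(best, a), best

f = lambda x, a: a * x - x * x               # hypothetical objective
xs = [i / 10_000 for i in range(0, 50_001)]  # grid on [0, 5]

a, h = 2.0, 1e-3
v_plus, _ = value(a + h, f, xs)
v_minus, _ = value(a - h, f, xs)
_, x_star = value(a, f, xs)

dV_da = (v_plus - v_minus) / (2 * h)  # total derivative, re-optimizing x
df_da = x_star                        # partial of f in a, holding x at x*(a)
print(dV_da, df_da)                   # both close to 1.0
```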

Any recommendations for a good book on the Bayes theorem for someone whose last stats class was in high school?

I only know of one Bayesian stat text with only high school math prerequisites: Statistics: A Bayesian Perspective by Don Berry. If you’ve had college math, but just not college statistics, you could start with a more sophisticated book.

Zorn’s Lemma. (Surpassed only by the Axiom of choice, which is a lemma if Zorn’s Lemma is an axiom.)

The Hahn-Banach theorem (& friends).

Any theorem whose statement includes the word “arbitrary.”