Distribution of a range

Suppose you’re drawing random samples uniformly from some interval. How likely are you to see a new value outside the range of values you’ve already seen?

The problem is more interesting when the interval is unknown. You may be trying to estimate the end points of the interval by taking the max and min of the samples you’ve drawn. But in fact we might as well assume the interval is [0, 1] because the probability of a new sample falling within the previous sample range does not depend on the interval. The location and scale of the interval cancel out when calculating the probability.

Suppose we’ve taken n samples so far. The range of these samples is the difference between the 1st and the nth order statistics, and for a uniform distribution this difference has a beta(n-1, 2) distribution. Since a beta(a, b) distribution has mean a/(a+b), the expected value of the sample range from n samples is (n-1)/(n+1). This is also the probability that the next sample, or any particular future sample, will lie within the range of the samples seen so far.

If you’re trying to estimate the size of the total interval, this says that after n samples, the probability that the next sample will give you any new information is 2/(n+1). This is because we only learn something when a sample is less than the minimum so far or greater than the maximum so far.
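A quick Monte Carlo check of the (n-1)/(n+1) formula follows. It also lets you move the interval away from [0, 1], which shows that location and scale don't matter. The function name, trial count, and seed are arbitrary choices for illustration.

```python
import random

def prob_next_in_range(n, trials=20_000, lo=0.0, hi=1.0, seed=1):
    """Monte Carlo estimate of the probability that sample n+1 lands strictly
    between the min and max of the first n uniform(lo, hi) samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.uniform(lo, hi) for _ in range(n)]
        new = rng.uniform(lo, hi)
        if min(xs) < new < max(xs):
            hits += 1
    return hits / trials

# Compare the simulation with (n-1)/(n+1); the interval [-3, 7] should give
# the same answer as [0, 1].
for n in (2, 5, 10):
    exact = (n - 1) / (n + 1)
    print(n, exact,
          prob_next_in_range(n),
          prob_next_in_range(n, lo=-3.0, hi=7.0, seed=2))
```

The complement of each estimate approximates 2/(n+1), the chance that the next sample moves the min or the max.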



Posted in Math, Statistics

New Twitter account: UnitFact

I’ve started a new Twitter account @UnitFact for tweets about units of measurement, constants, dimensional analysis, etc.

Posted in Science

Elementary vs Foundational

Euclid’s proof that there are infinitely many primes is simple and ancient. This proof is given early in any course on number theory, and even then most students would have seen it before taking such a course.

There are also many other proofs of the infinitude of primes that use more sophisticated arguments. For example, here is such a proof by Paul Erdős. Another proof shows that there must be infinitely many primes because the sum of the reciprocals of the primes diverges. There’s even a proof that uses topology.

When I first saw one of these proofs, I wondered whether they were circular. When you use advanced math to prove something elementary, there’s a chance you could use a result that depends on the very thing you’re trying to prove. The proofs are not circular as far as I know, and this is curious: the fact that there are infinitely many primes is elementary but not foundational. It’s elementary in that it is presented early on and it builds on very little. But it is not foundational. You don’t continue to use it to prove more things, at least not right away. You can develop a great deal of number theory without using the fact that there are infinitely many primes.

The Fundamental Theorem of Algebra is an example in the other direction, something that is foundational but not elementary. It’s stated and used in high school algebra texts but the usual proof depends on Liouville’s theorem from complex analysis.

It’s helpful to distinguish which things are elementary and which are foundational when you’re learning something new so you can emphasize the most important things. But without some guidance, you can’t know what will be foundational until later.

The notion of what is foundational, however, is conventional. It has to do with the order in which things are presented and proved, and sometimes this changes. Sometimes in hindsight we realize that the development could be simplified by changing the order, considering something foundational that wasn’t before. One example is Cauchy’s theorem. It’s now foundational in complex analysis: textbooks prove it as soon as possible and then use it to prove things for the rest of the course. But historically, Cauchy’s theorem came after many of the results it is now used to prove.

Related: Advanced or just obscure?

Posted in Math

Rudyard Kipling and applied math

This evening something reminded me of the following line from Rudyard Kipling’s famous poem If:

… If all men count with you, but none too much …

It would be good career advice for a mathematician to say “Let all areas of math count with you, but none too much.” This warns against dismissing something offhand because you’re sure you’ll never use it, and against becoming so fond of something that it becomes a solution in search of a problem.

The same applies to technology: Let all technologies count with you, but none too much.

Related posts:

An array of hammers

A couple definitions of applied math


Posted in Math

Timid medical research

Cancer research is sometimes criticized for being timid. Drug companies run enormous trials looking for small improvements. Critics say they should run smaller trials and more of them.

Which side is correct depends on what’s out there waiting to be discovered, which of course we don’t know. We can only guess. Timid research is rational if you believe there are only marginal improvements that are likely to be discovered.

Sample size increases quickly as the size of the effect you’re trying to find decreases. To establish small differences in effect, you need very large trials.
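One way to see how fast this grows is the usual normal-approximation sample-size formula for comparing two response rates. The sketch below is only that textbook approximation. It assumes a two-sided test at α = 0.05 with 80% power, which are conventional defaults and not anything specified in this post, and the function name is my own.

```python
from math import ceil
from statistics import NormalDist

def patients_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to distinguish response rates
    p1 and p2 with a two-sided two-proportion z-test (normal approximation).
    alpha and power are assumed conventional defaults."""
    z = NormalDist()                      # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for the test
    z_beta = z.inv_cdf(power)             # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

for p2 in (0.50, 0.20, 0.11):
    print(f"10% vs {p2:.0%}: {patients_per_arm(0.10, p2)} patients per arm")
```

The required size scales like 1/(p1 - p2)^2, so detecting 10% vs 11% takes roughly 15,000 patients per arm, while 10% vs 20% takes a couple hundred.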

If you think there are only small improvements on the status quo available to explore, you’ll explore each of the possibilities very carefully. On the other hand, if you think there’s a miracle drug in the pipeline waiting to be discovered, you’ll be willing to risk falsely rejecting small improvements along the way in order to get to the big improvement.

Suppose there are 500 drugs waiting to be tested. All of these are only 10% effective except for one that is 100% effective. You could quickly find the winner by giving each candidate to one patient. For every drug whose patient responded, repeat the process until only one drug is left. One strike and you’re out. You’re likely to find the winner in three rounds, treating fewer than 600 patients. But if all the drugs are 10% effective except one that’s 11% effective, you’d need hundreds of trials with thousands of patients each.
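A short simulation of the one-strike screen checks these numbers. In this sketch drug 0 plays the role of the hypothetical 100% effective drug, and the function name, seeds, and number of simulated screens are arbitrary choices.

```python
import random

def one_strike_screen(n_drugs=500, weak_rate=0.10, strong_rate=1.0, seed=None):
    """Simulate the 'one strike and you're out' screen.

    Drug 0 is the (hypothetical) strong drug; the other n_drugs - 1 drugs
    respond with probability weak_rate. Each round, every surviving drug is
    given to one patient, and only drugs whose patient responded survive.
    Returns (survivors, rounds, patients_treated).
    """
    rng = random.Random(seed)
    candidates = list(range(n_drugs))
    rounds = patients = 0
    while len(candidates) > 1:
        rounds += 1
        patients += len(candidates)       # one patient per surviving drug
        candidates = [
            d for d in candidates
            if rng.random() < (strong_rate if d == 0 else weak_rate)
        ]
    return candidates, rounds, patients

results = [one_strike_screen(seed=s) for s in range(1000)]
mean_patients = sum(p for _, _, p in results) / len(results)
share_within_3 = sum(r <= 3 for _, r, _ in results) / len(results)
print(mean_patients, share_within_3)
```

Each ineffective drug survives three rounds with probability 0.001, and (0.999)^499 ≈ 0.61, so about 60% of simulated screens should finish within three rounds, treating a little over 550 patients on average.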

The best research strategy depends on what you believe is out there to be found. People who know nothing about cancer often believe we could find a cure soon if we just spend a little more money on research. Experts are less sanguine, except when they’re asking for money.

Posted in Science, Statistics

Commutative diagrams in LaTeX

There are numerous packages for creating commutative diagrams in LaTeX. My favorite, based on my limited experience, is Paul Taylor’s package. Another popular package is tikz-cd.

To install Paul Taylor’s package on Windows, I created a directory called localtexmf, set the environment variable TEXINPUTS to its location, and copied the diagrams.sty file into that directory.

Here are a couple of examples: the diagrams used in the definitions of product and coproduct.

And here’s the LaTeX to produce the diagrams.

% Requires \usepackage{diagrams} in the preamble.
\begin{diagram}
& & X & & \\
& \ldTo^{f_1} & \dDashto_f & \rdTo^{f_2} & \\
A & \lTo_{\pi_1} & A\times B & \rTo_{\pi_2} & B
\end{diagram}

\begin{diagram}
& & X & & \\
& \ruTo^{f_1} & \uDashto_f & \luTo^{f_2} & \\
A & \rTo_{i_1} & A\oplus B & \lTo_{i_2} & B
\end{diagram}

For much more information, see the package page.

Posted in Math

The mean of the mean is the mean

There’s a theorem in statistics that says

E( \bar{X} ) = \mu

You could read this aloud as “the mean of the mean is the mean.” More explicitly, it says that the expected value of the average of some number of samples from some distribution is equal to the expected value of the distribution itself. The shorter reading is confusing since “mean” refers to three different things in the same sentence. In reverse order, these are:

  1. The mean of the distribution, defined by an integral.
  2. The sample mean, calculated by averaging samples from the distribution.
  3. The mean of the sample mean as a random variable.
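For samples X_1, ..., X_n each drawn from a distribution with mean μ, the theorem is one line of linearity of expectation. The middle step is exactly where the hypothesis that each E(X_i) exists is used.

```latex
E(\bar{X}) = E\left( \frac{1}{n} \sum_{i=1}^n X_i \right)
           = \frac{1}{n} \sum_{i=1}^n E(X_i)
           = \frac{1}{n} \cdot n\mu = \mu
```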

The hypothesis of this theorem is that the underlying distribution has a mean. Let’s see where things break down if the distribution does not have a mean.

It’s tempting to say that the Cauchy distribution has mean 0. Or some might want to say that the mean is infinite. But if we take any value to be the mean of a Cauchy distribution — 0, ∞, 42, etc. — then the theorem above would be false. The mean of n samples from a Cauchy has the same distribution as the original Cauchy! The variability does not decrease with n, as it would with samples from a normal, for example. The sample mean doesn’t converge to any value as n increases. It just keeps wandering around with the same distribution, no matter how large the sample. That’s because the mean of the Cauchy distribution simply doesn’t exist.
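A small simulation makes the contrast concrete. It compares the spread of sample means for standard Cauchy and standard normal samples, measuring spread by the interquartile range because the Cauchy has no variance. Cauchy samples come from the standard inverse-CDF method tan(π(U − 1/2)). The function names, trial count, and seed are arbitrary choices for this sketch. The quartiles of a standard Cauchy are ±1, so its IQR should stay near 2 for every n, while for the normal the IQR shrinks like 1.349/√n.

```python
import random
import statistics
from math import pi, tan

def iqr_of_sample_means(draw, n, trials=2000, seed=0):
    """Interquartile range of `trials` sample means, each averaging n draws."""
    rng = random.Random(seed)
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

def cauchy(rng):
    # Inverse-CDF sampling of a standard Cauchy
    return tan(pi * (rng.random() - 0.5))

def normal(rng):
    return rng.gauss(0.0, 1.0)

for n in (1, 10, 100):
    print(n, iqr_of_sample_means(cauchy, n), iqr_of_sample_means(normal, n))
```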

Posted in Statistics

Patches and specs

From Leslie Lamport:

Every time code is patched, it becomes a little uglier, harder to understand, harder to maintain, bugs get introduced.

If you don’t start with a spec, every piece of code you write is a patch.

Which means the program starts out from Day One being ugly, hard to understand, and hard to maintain.

Posted in Software development

Quintic root

Here’s a curious result I ran across the other day. Suppose you have a quintic equation of the form z x^5 - x + 1 = 0. (It’s possible to reduce a general quintic equation to this form, known as Bring-Jerrard normal form.) There is no elementary formula for the roots of this equation, but the following infinite series does give a root as a function of the leading coefficient z:

\sum_{n=0}^\infty {5n \choose n} \frac{z^n}{4n+1}

One reason this is interesting is that the series above has a special form that makes it a hypergeometric function of z. You can read more about it here.

I could imagine situations where having such an expression for a root is useful, though I doubt the series would be much use if you just wanted to find the roots of a fifth degree polynomial numerically. Direct application of something like Newton’s method would be much simpler.
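A sketch comparing a partial sum of the series with Newton’s method follows. It assumes z is small enough for the series to converge. Since C(5n, n) grows like (5^5/4^4)^n up to a power of n, the radius of convergence is 4^4/5^5 = 256/3125 ≈ 0.082. The number of terms, tolerance, and starting point x = 1 (the n = 0 term of the series) are my own choices.

```python
from math import comb

def series_root(z, terms=150):
    """Partial sum of sum_{n>=0} C(5n, n) z^n / (4n + 1).
    Assumes |z| < 4**4 / 5**5 (about 0.082), where the series converges."""
    return sum(comb(5 * n, n) * z**n / (4 * n + 1) for n in range(terms))

def newton_root(z, x=1.0, tol=1e-15, max_iter=50):
    """Newton's method for f(x) = z x^5 - x + 1, starting at x = 1."""
    for _ in range(max_iter):
        f = z * x**5 - x + 1
        df = 5 * z * x**4 - 1          # f'(x)
        step = f / df
        x -= step
        if abs(step) < tol:
            break
    return x

print(series_root(0.05), newton_root(0.05))
```

Both approaches land on the same root. Newton’s method needs only a handful of iterations, while the series needs many terms when z is near the edge of its radius of convergence.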

Posted in Math

The most fearless and the most fearful people

While I was in Europe, someone commented to me that Americans are the most fearless and the most fearful people on Earth. We put men on the moon, and we walk around with hand sanitizer. We start bold business ventures and have ridiculously cautious safety regulations. We’re the home of cowboys and helicopter parents.

One response I had was that it’s not necessarily the same people who are being so bold and so timid. There’s a tension between the risk-tolerant and the risk-averse in America. The former are free to be bold in the private sector while the latter outvote them in the public sector.

Another explanation might be that an individual can be fearless and fearful about different things. Someone may be willing to risk millions of dollars but not be willing to risk eating unpasteurized food. There may be some sort of general risk homeostasis, though I imagine people willing to take risks in one area are often more willing to take risks in another area.

Posted in Uncategorized

Amazing approximation to e

Here’s an approximation to e by Richard Sabey that uses the digits 1 through 9 and is accurate to over a septillion digits. (A septillion is 10^24.)

e \approx \left( 1 + 9^{-4^{7 \cdot 6}} \right)^{3^{2^{85}}}

MathWorld says that this approximation is accurate to 18457734525360901453873570 decimal digits. How could you get an idea whether this claim is correct? We could show that the approximation is near e by showing that its logarithm is near 1. That is, we want to show

3^{2^{85}} \log \left( 1 + 9^{-4^{42}} \right) \approx 1.

Define k to be 3^(2^85) and notice that k also equals 9^(4^42). From the power series for log(1 + x) and the fact that the series alternates, we have

3^{2^{85}} \log \left( 1 + 9^{-4^{42}} \right) = k \left( \frac{1}{k} - \frac{\eta^2}{2} \right)

where η is some number between 0 and 1/k. This tells us that the error is extremely small because 1/k is extremely small. It also tells us that the approximation underestimates e because its logarithm is slightly less than 1.

Just how small is 1/k? Its log base 10 is around -1.8 × 10^25, so it’s plausible that the approximation is accurate to about 1.8 × 10^25 decimal digits, consistent with the MathWorld figure. You could tighten this argument up a little and get the exact number of correct digits.
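You can’t evaluate Sabey’s expression directly, since k itself has about 1.8 × 10^25 digits. You can, however, check the argument on a small analogue with the same structure. Because 3^(2^(2m+1)) = 9^(4^m), taking m = 1 gives k = 3^8 = 9^4 = 6561. The sketch below uses Python’s decimal module at 60 significant digits, which is an arbitrary choice. It checks that (1 + 1/k)^k falls below e by about e/(2k), then computes log10(k) = 2^85 log10(3) for the real k.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60

# Small analogue of the construction: k = 3^(2^(2m+1)) = 9^(4^m) with m = 1.
k = 3 ** 2 ** 3                 # 3^8 = 6561
assert k == 9 ** 4 ** 1         # same number written as 9^(4^1)

e = Decimal(1).exp()
approx = (1 + 1 / Decimal(k)) ** k
error = e - approx

print(error > 0)                # the approximation underestimates e
print(error * 2 * k / e)        # close to 1, so the error is about e/(2k)

# For the real approximation, k = 3^(2^85); the number of correct digits
# is roughly log10(k) = 2^85 * log10(3).
log10_k = Decimal(2 ** 85) * Decimal(3).log10()
print(log10_k)
```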

Posted in Math

Looking like you know what you’re doing

I’ve been in The Netherlands this week for a conference where I gave a talk on erasure coding. Last night after the conference, my host drove me and another speaker to Schiphol Airport. I’m staying in Amsterdam, but it was easier to drop us both at the airport because it’s a short train ride from there into the city.

After wandering around for a bit, I found where I believed I should wait for the train, though I wasn’t entirely sure. While I was standing there a group of half-drunk young men from Scotland walked to the platform and asked me questions about the train. One of the group thought they were on the wrong platform, but I heard their leader say “He’s got glasses and a beard. He’s obviously more intelligent than us.” Apparently they found this argument convincing and they stayed.

Neither my nearsightedness nor my facial hair made me an expert on Dutch trains. This was my first time catching a train in a new country where most of the signs were written in a language I do not know. I imagine they’ve ridden more trains than I have. The only advantage I had over them was my sobriety. Maybe my experience as a consultant has enabled me to give confidence-inspiring advice on subjects I know less about than I’d like.

Central Station in Amsterdam

Posted in Business

Independent decision making

Suppose a large number of people each have a slightly better than 50% chance of correctly answering a yes/no question. If they answered independently, the majority would very likely be correct.

For example, suppose there are 10,000 people, each with a 51% chance of answering a question correctly. The probability that more than 5,000 people will be right is about 98%. [1]

The key assumption here is independence, which is not realistic in most cases. But as people move in the direction of independence, the quality of the majority vote improves. Another assumption is that people are what machine learning calls “weak learners,” i.e. that they perform slightly better than chance. This holds more often than independence, but on some subjects people tend to do worse than chance, particularly experts.

You could call this the wisdom of crowds, but it’s closer to the wisdom of markets. As James Surowiecki points out in his book The Wisdom of Crowds, crowds (as in mobs) aren’t wise; large groups of independent decision makers are wise. Markets are wiser than crowds because they aggregate more independent opinions. Markets are subject to group-think as well, but not to the same extent as mobs.


[1] Suppose there are N people, each with independent probability p of being correct. Suppose N is large and p is near 1/2. Then the probability of a majority answering correctly is approximately

Prob( Z > (1 – 2p) sqrt(N) )

where Z is a standard normal random variable. You could calculate this in Python by

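from scipy.stats import norm
from math import sqrt

N, p = 10000, 0.51  # the example above: 10,000 people, 51% chance each
# norm.sf(z) is the survival function, Prob(Z > z)
print( norm.sf( (1 - 2*p)*sqrt(N) ) )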
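If you’d rather not rely on the normal approximation, you can sum the binomial probabilities directly. The sketch below uses only the standard library. It works in log space via math.lgamma to avoid overflowing binomial coefficients, and it computes the normal tail with math.erf instead of SciPy. The function names are my own.

```python
from math import erf, exp, lgamma, log, sqrt

def majority_correct_exact(N, p):
    """P(more than N/2 of N independent people are right), each right w.p. p.
    Sums binomial probabilities in log space to avoid overflow."""
    log_p, log_q = log(p), log(1 - p)
    log_n_fact = lgamma(N + 1)
    total = 0.0
    for k in range(N // 2 + 1, N + 1):
        log_pmf = (log_n_fact - lgamma(k + 1) - lgamma(N - k + 1)
                   + k * log_p + (N - k) * log_q)
        total += exp(log_pmf)
    return total

def majority_correct_normal(N, p):
    """The approximation Prob(Z > (1 - 2p) sqrt(N)), using erf."""
    z = (1 - 2 * p) * sqrt(N)
    return 0.5 * (1 - erf(z / sqrt(2)))

print(majority_correct_exact(10_000, 0.51), majority_correct_normal(10_000, 0.51))
```

For 10,000 people at 51%, both come out at about 98%, differing only in the third decimal place.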

This post is an elaboration of something I first posted on Google+.

Posted in Math

Making definitions

“The essential virtue of category theory is as a discipline for making definitions, and making definitions is the programmer’s main task in life.”

From Computational Category Theory


Posted in Math, Software development

Where else you can find me

In addition to this blog, you can find me on Twitter and Google+. I occasionally blog at Symbolism, though that site is mostly a paste bin for the DailySymbol Twitter account.

I’ll be speaking at the Snow Unix Event in The Netherlands in a couple weeks and I plan to go to Germany in September. I’ve made a couple trips to California this year and it looks like I’ll be flying out there more often. And of course you can always find me in Houston. If you’d like to meet in person, please let me know.


Posted in Uncategorized