Absence of evidence

Here’s a little saying that irritates me:

Absence of evidence is not evidence of absence.

It’s the kind of thing a Sherlock Holmes-like character might say in a detective novel. The idea is that we can’t be sure something doesn’t exist just because we haven’t seen it yet.

What bothers me is that the statement misuses the word “evidence.” The statement would be correct if we substituted “proof” for “evidence.” We can’t conclude with absolute certainty that something doesn’t exist just because we haven’t yet proved that it does. But evidence is not the same as proof.

Why do we believe that dodo birds are extinct? Because no one has seen one in three centuries. That is, there is an absence of evidence that they exist. That is tantamount to evidence that they do not exist. It’s logically possible that a dodo bird is alive and well somewhere, but there is overwhelming evidence to suggest this is not the case.

Evidence can lead to the wrong conclusion. Why did scientists believe that the coelacanth was extinct? Because no one had seen one except in fossils. The species was believed to have gone extinct 65 million years ago. But in 1938 a fisherman caught one. Absence of evidence is not proof of absence.

coelacanth, a fish once thought to be extinct

Though it is not proof, absence of evidence is unusually strong evidence due to a subtle statistical result. Compare the following two scenarios.

Scenario 1: You’ve sequenced the DNA of a large number of prostate tumors and found that not one had a particular genetic mutation. How confident can you be that prostate tumors never have this mutation?

Scenario 2: You’ve found that 40% of prostate tumors in your sample have a particular mutation. How confident can you be that 40% of all prostate tumors have this mutation?

It turns out you can have more confidence in the first scenario than the second. If you’ve tested N subjects and not found the mutation, the length of your confidence interval around zero is proportional to 1/N. But if you’ve tested N subjects and found the mutation in 40% of subjects, the length of your confidence interval around 0.40 is proportional to 1/√N. So, for example, if N = 10,000 then the former interval has length on the order of 1/10,000 while the latter interval has length on the order of 1/100. The first result is known as the rule of three: if an event has not occurred in N trials, an approximate 95% confidence interval for its probability is (0, 3/N). You can find both a frequentist and a Bayesian justification of the rule here.
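To make the comparison concrete, here is a minimal Python sketch. It assumes a 95% confidence level, uses the exact one-sided bound 1 - 0.05^(1/N) for the zero-event case, and uses a normal-approximation (Wald) interval for the 40% case. The function names are illustrative and are not part of the original argument.

```python
import math

def zero_event_upper_bound(n, alpha=0.05):
    """Exact upper limit for p after 0 events in n trials (assumed level 1 - alpha)."""
    # Solve (1 - p)^n = alpha for p.
    return 1.0 - alpha ** (1.0 / n)

def rule_of_three(n):
    """Rule-of-three approximation to the 95% upper limit."""
    return 3.0 / n

def wald_interval_length(p_hat, n, z=1.96):
    """Length of the normal-approximation 95% interval around an observed proportion."""
    return 2.0 * z * math.sqrt(p_hat * (1.0 - p_hat) / n)

# Interval lengths for Scenario 1 (no events) and Scenario 2 (40% observed).
for n in (100, 10_000, 1_000_000):
    print(n, zero_event_upper_bound(n), rule_of_three(n), wald_interval_length(0.40, n))
```

Multiplying N by 100 shrinks the zero-event interval by roughly a factor of 100, but shrinks the interval around 0.40 by only a factor of 10.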

Absence of evidence is unusually strong evidence that something is at least rare, though it’s not proof. Sometimes you catch a coelacanth.

Related posts:

Estimating the chances of something that hasn’t happened
Complementary validation

30 thoughts on “Absence of evidence”

  1. John,
    this is one time I have to say you are wrong (with qualifications).
    The phrase “Absence of evidence is not evidence of absence” was used quite often by my former partner, a Ph.D. geneticist, and a world authority on evidence-based medicine.
    In a nutshell, your final point “Absence of evidence is unusually strong evidence that something is at least rare, though it’s not proof” is only valid if the “absence” has come through active looking.
    That assumption is usually, at least in lay medical discussions, not valid: people assume because they have not seen something, even though they have not been looking, that the absence of “evidence” is indeed evidence of absence.
    I agree with you that when people are actively and intelligently looking for something in a sustained way, then absence of evidence is indeed evidence of absence.
    The problem comes when they haven’t bothered to look, or only looked briefly or casually.

  2. Your mathematical analysis ignores Taleb’s Turkey problem. If the turkey is fed every day for many days, there is absence of evidence that human beings are anything but nice. Then, one day, crack! The turkey is eaten.

    Just a few years ago, my mother insisted that house prices never ever went down. She justified her belief by the fact that she had never, ever, heard about anyone’s house going down in value. It appears that the banks (until recently) agreed with her. Of course, everyone was wrong and house prices do go down, sometimes drastically, in value.

    Planets are another example. It appears that there is a planet in the solar system which might be 3 times as large as Jupiter. Moreover, there are many new “planets” that have been found, at least one of them being much more massive than Pluto. Just because centuries, or millennia, went by without us observing any new planets in our solar system, does not mean that they are not there.

    I could also take the evidence of extra-terrestrial life. We have none right now, and we’ve had none for a long, long time (sorry, but I dismiss the flying saucers sightings). Does that mean that there is no extra-terrestrial life? I tend to believe the opposite: it is *highly* likely that there is extra-terrestrial life, despite your rule of three.

    [I am not saying your math. is wrong. It is obviously correct though I did not check it. But it can lead to wrong conclusions.]

  3. Daniel: Extrasolar planets help clarify the discussion. Until the 20th century we had no firm evidence of any extrasolar planets. Astronomers knew very well that such planets were simply beyond their ability to observe. We didn’t have evidence that large planets did or did not exist.

    Implicit in the “rule of three” derivation is the assumption that your data are not biased by your methodology. The first extrasolar planets discovered were all huge. But nobody concluded that all extrasolar planets are large. Everyone realized that Earth-sized planets could be common and we wouldn’t know because our observation techniques were only capable of detecting large planets.

    But suppose the first thousand planets discovered had all been smaller than Earth. Then we’d have good evidence that planets larger than Jupiter are rare.

  4. @Daniel:
    I don’t think either of those examples is convincing–it’s only a problem if you’re committed to the idea that inductive reasoning is infallible. But, as the classic turkey example (among many others) shows, that’s obviously not the case.
    That doesn’t necessarily mean that knowledge of prior events is absolutely useless, however. Philosophers have argued about this for ages, and while there are still a lot of skeptics about the validity of induction, there are a lot of philosophers these days that believe that–yes–there are some grounds for defending induction.
    It’s not a binary sort of reasoning process, where you can say that “X doesn’t exist!” after Y observations, but it’s the sort of thing where that first statement can become more and more likely as Y increases.

  5. Ahhh, John. You’ve gone and posted a pic of one of my favorite friends, the coelacanth – what was once heralded as THE grand example of fish-to-amphibian evolution.

    What this not-so-handsome fish ended up proving was that species thought to have existed 138-65 million years ago can, and do, exist alongside “modern” man.
    Not only do they continue to live today, they do so completely unchanged over time. To the dismay of Darwinian evolutionists everywhere, their lobed fins are still just that – lobed fins.

  6. Hi John,

    When Carl Sagan coined this pithy little aphorism, by “absence of evidence” he meant a sample size N=0. Read like this the quote is a truism.

    Unfortunately, many folks have a rather one-sided (legalistic?) view of “evidence” where only confirmation counts. And then the nonsense begins.


  7. I am wondering if “Ignorance of Evidence” needs to be considered? On reflection, I think it strays too far from John’s point. Is it useful to look at how we define evidence, or does that dive too far into epistemology?

    More questions than answers = good post John.

  8. @John

    I was thinking about the Solar System, where only a minority of astronomers conjectured that there may be other planets. Most astronomers until about a decade ago thought we had found all planets in the Solar System. And the data regarding many of the new dwarf planets was not acquired using new technology. It was out there in the data. But you had to actively look for it. And even then, it was very hard to see.

    Implicit in the “rule of three” derivation is the assumption that your data are not biased by your methodology.

    I think that the implicit assumption is that you are somehow assuming that you know the data distribution.

    This is a very strong requirement. It can work if what you are doing is routine work, and you have lots and lots of experience with the exact same setup. But there are many surprises or change of regimes in the real world.

    Consider that until recently, it was believed that Arab countries were averse to democracy and that the Arab population was unable to rally against their governments. And indeed, we had not seen a revolution in decades… from this observation, we could predict that such uprisings were very unlikely. And this prediction would have been deeply flawed.

    This belief that we know the data distribution is what led to the collapse of our financial system. Sometimes we do know the data distributions… but in many important cases, we do not.

  9. You sample, with replacement, 1000 marbles from a box we know contains 1,000,000 marbles, each of which is black or white. You find no black balls in your sample. You do your statistical analysis to test for the non-existence of black marbles in the box, citing the statistic (1-3/1000)=99.7%. As devil’s advocate, I offer an alternative to “H0:no black marbles,” namely “H1:one black marble.” I see that P(data|H0)=1, while P(data|H1)=(999999/1000000)^1000 = 0.999, essentially the same. Without any a priori reason to favor H0 over H1, I conclude that your evidence is woefully insufficient, i.e. very close to no evidence at all.
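The likelihoods in this comment are easy to reproduce. The sketch below is only an illustration: the helper name is made up, and the third case (3,000 black marbles, which is the rule-of-three bound of 3/1000 applied to a box of 1,000,000) is added here for contrast and does not come from the comment.

```python
def likelihood_no_black(n_draws, n_black, box_size):
    """P(no black marble in n_draws draws with replacement | n_black black marbles in the box)."""
    return (1.0 - n_black / box_size) ** n_draws

box_size, n_draws = 1_000_000, 1000

# H0: no black marbles, H1: one black marble, plus the rule-of-three bound for contrast.
for n_black in (0, 1, 3000):
    print(n_black, likelihood_no_black(n_draws, n_black, box_size))
```

The data barely distinguish zero black marbles from one, but they do make 3,000 or more black marbles look unlikely.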

  10. SteveBrooklineMA: The rule of three gives a confidence interval, not a hypothesis test. It says that (0, 3/1000) is a 95% confidence interval for the proportion of black marbles. It doesn’t say that black marbles don’t exist, only that we estimate that the number of black marbles in the box of 1,000,000 is probably less than 3,000.

    (I’m being a little sloppy here interpreting a frequentist confidence interval as a Bayesian credibility interval. But that’s OK because I link to a post that shows that in this case the Bayesian credible interval is the same.)

  11. I enjoyed this post, especially the examples. In my blog post on the same topic a while back, I used the example of Saddam Hussein’s much-sought-after-but-apparently-nonexistent Weapons of Mass Destruction.

    @mat roberts: The originator of the expression was apparently cosmologist Martin Rees, not Carl Sagan; see my blog post for details.

  12. Agreed, but the question is can we conclude that the number of black marbles in the box of 1,000,000 is probably less than 1. Three thousand is a long way from 1 in this sense.

    When you say “nobody has this disease,” Average Joe thinks “fewer than one person out of 6 billion.” Stats Guy thinks “fewer than a few percent of 6 billion.” For large populations, this difference is huge. So huge, it seems, that Stats Guy has little to say in answer to Joe.

  13. There’s an interesting contrast between material things like marbles and animals like dodo birds and coelacanths. If you have evidence that an animal population is sufficiently small, you also have evidence that it is actually zero because a species needs a minimum population size to survive.

  14. SteveBrooklineMA: So what happens if you’re lucky, and find that one black marble in your sample of n=1000. Do you then conclude that you found the only one, or that there are about 1000 black marbles in the box? Using John’s example, the 95% Bayesian credible interval is about (25, 3677), so the One Black Marble Hypothesis still doesn’t look too good. Even using a 99% credible interval, the lower limit is about 5. Confidence and credible intervals are statements about the limits of evidence, and should be interpreted as limits on the confidence* we place in the evidence. (Jeez! I hate that word.)

    John: Thanks for the post about the Rule of Three. I teach the shrinkage estimator and the binomial exact interval for 0/n samples to my undergrads, but had not seen the Rule of Three approximation. Every day I’m amazed at how much simple statistics got overlooked in my graduate education.

  15. @Jeff Farmer
    Great point, Jeff. I think there are many similar instances that are overlooked like this every day. We have so much noise in our scientific measurements that I doubt the credibility of everything nowadays.
    This was a great post. I like the commenter who mentioned that a blog with more questions than answers is a great blog. I’m enjoying this blog more and more every day! Keep up the great posts.

  16. Mike- If I got exactly one black ball out of 1000 trials, I would simply compare
    (1-1/1000000)^999*(1/1000000) = 9.9900e-007 with
    (1-1/1000)^999*(1/1000) = 3.6806e-004
    and conclude that p=1/1000 is a much more reasonable conclusion than p=1/1000000.
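The two numbers above can be checked with a few lines of Python. This is a sketch with an illustrative function name. Like the comment, it uses the probability of one particular sequence containing a single black draw; the omitted binomial coefficient is the same for both hypotheses and cancels in the ratio.

```python
def one_black_likelihood(p, n_draws=1000):
    """Likelihood of one black and n_draws - 1 white draws in a fixed order, given black probability p."""
    return (1.0 - p) ** (n_draws - 1) * p

rare = one_black_likelihood(1 / 1_000_000)   # exactly one black marble in the box
common = one_black_likelihood(1 / 1000)      # about 1,000 black marbles in the box

# The likelihood ratio shows how strongly the data favor p = 1/1000.
print(rare, common, common / rare)
```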

    I don’t think I am saying much different from John, really. If you have a population of size P, then establishing “no existence” means roughly establishing a rate p<1/P, which means our sample size must be on the order of P, e.g. John's P/3. This is what people are thinking when they say "absence of evidence… etc," that you pretty much have to sample the entire population to rule out such a singular event. Sampling 1000 gives you very little help.

  17. Now we’re all talking on the same wavelength!

    For my students, I try to emphasize that the Rule of Three gives them a quick way to assess what looks like “perfect” results–absence and ubiquity are just complements. Example: a new diagnostic technique for melanoma correctly identified all 274 of its test subjects who were known to have melanoma, which suggests a test sensitivity of 100%. Fabulous! The Rule of Three suggests a lower limit on the sensitivity of 98.9%, which is merely good, these days.
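The melanoma figure can be reproduced as follows. This is a hedged sketch: the function names are invented for illustration, and the exact bound assumes a one-sided 95% interval, which is what the Rule of Three approximates.

```python
def sensitivity_lower_bound_rule_of_three(n):
    """Rule-of-three lower limit when all n known positives are detected."""
    return 1.0 - 3.0 / n

def sensitivity_lower_bound_exact(n, alpha=0.05):
    """Exact one-sided lower limit: the sensitivity s that solves s^n = alpha."""
    return alpha ** (1.0 / n)

n = 274  # test subjects known to have melanoma, all correctly identified
print(sensitivity_lower_bound_rule_of_three(n), sensitivity_lower_bound_exact(n))
```

Both bounds round to 98.9%, so the approximation loses very little at this sample size.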

  18. The ‘absence of evidence is not evidence of absence’ is useful and should usually be adhered to for the same reason one usually avoids argument from authority and ad hominem.

    They *are* powerful and general tools for an inductive logic and have plenty of justification statistically – but people are idiots and will cut themselves with said tools. There are some things not safe in the hands of the unwashed masses.

  19. The expression “Absence of evidence is not evidence of absence” is also used in the context of statistical inference testing. Hence if we have a p = 0.06 for the effect of an intervention, this is not statistically significant – but does not necessarily mean there is no effect (perhaps the power of our study was too weak).
    See: http://www.bmj.com/content/311/7003/485.full

    John, you say “the length of your confidence interval around zero is proportional to 1/N,” but surely the CI cannot go below zero?

    Thanks for the post and the rule of 3

  20. No non-trivial zero of the Riemann zeta function having real part not equal to 1/2 has ever been found. Is this evidence that no such zero exists? It depends on whom you ask!

    John’s point about dodo birds vs marbles is a good one. Though to be scientifically useful I think there would have to be general agreement on an appropriate, objective prior. It might be difficult to overcome the criticism that one is assuming one’s conclusion.

  21. The problem is often that no consideration is given to how much evidence was likely. Too many people assume that not having seen a black marble implies no black marbles, even if you haven’t looked at any marbles.

  22. Gary: I agree with that. I implicitly had in mind observations of a negative outcome, not a lack of observations.

    There are many examples in math where an unexpected thing occurs with high probability, but it was unexpected because nobody looked.

  23. Let’s be Bayesians.

    We have two binary variables:
    Ev = Has evidence for the existence of _ been found?
    Ex = Does _ exist?

    What we’re trying to evaluate is the probability of Ex being false given that Ev is false. Bayes’ theorem tells us:

    P(Ex = False | Ev = False) = P(Ev = False | Ex = False) P(Ex = False) / P(Ev = False)

    Let’s make some assumptions:

    - P(Ev = False | Ex = False) = 1
    This says that if something doesn’t exist, there is zero probability of finding evidence supporting its existence. We could relax this; I’m just trying to simplify the calculations.

    - P(Ev = True | Ex = True) = p
    This is the main control parameter: the probability of finding evidence for the existence of _ if it does in fact exist. The smaller p is, the more difficult it is to find evidence for the existence of _.

    - P(Ex = True) = q
    This is how likely it is a priori that _ exists.

    If you substitute all of the variables, using P(Ev = False) = (1 – q) + (1 – p)q = 1 – pq, you’ll conclude that:
    P(Ex = False | Ev = False) = (1 – q) / (1 – pq)

    Let’s suppose q = 1/2 a priori, that is, I have no information at all about whether _ exists. Then:
    P(Ex = False | Ev = False) = 1/(2 – p)

    If p = 0.001, that is, if finding the evidence is very rare even if _ exists, then P(Ex = False | Ev = False) is very close to 0.5, that is, nothing can really be said.

    The question of whether you’re actively searching for _ or not is governed by p, that is, by how likely it is to find the evidence if _ actually exists. Active search makes finding it more likely (greater p), so failing to find evidence has a bigger impact on the posterior probability of existence.
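The formula in this comment is simple enough to tabulate. The sketch below uses the comment's own symbols p and q; the function name and the particular values of p are chosen here for illustration.

```python
def posterior_absent(p, q):
    """P(Ex = False | Ev = False) = (1 - q) / (1 - p*q), assuming no false-positive evidence."""
    return (1.0 - q) / (1.0 - p * q)

# With a 50/50 prior, see how the posterior moves as the search becomes more thorough.
for p in (0.001, 0.5, 0.9, 0.999):
    print(p, posterior_absent(p, q=0.5))
```

With a casual search (small p) the posterior stays near the prior of 0.5, while a near-exhaustive search (p close to 1) pushes it toward 1.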
