Why microarray study conclusions are so often wrong

Microarray technology makes it possible to examine the expression levels of thousands of genes at once. So one way to do cancer research is to run microarray analyses on cancer and normal tissue samples, hoping to discover genes that are more highly expressed in one or the other. If, for example, a few genes are highly expressed in cancer samples, the proteins these genes code for may be targets for new therapies.

For numerous reasons, cancer research is more complicated than simply running millions of microarray experiments and looking for differences. One complication is that false positives are very likely.

A previous post gives a formula, due to Ioannidis, for the probability that a reported result is true: PPV = (1 − β)R / (R − βR + α), where R is the pre-study odds that a hypothesis in a given context is correct, α is the significance level, and β is the type II error rate. The most important term in that formula is R. John Ioannidis gives a hypothetical but realistic example in the paper mentioned earlier (*). In his example, he supposes that 100,000 gene polymorphisms are being tested for association with schizophrenia. If 10 polymorphisms truly are associated with schizophrenia, the pre-study probability that a given polymorphism is associated is 0.0001. If a study has 60% power (β = 0.4) and significance level α = 0.05, the post-study probability that a polymorphism reported as associated really is associated is 0.0012. That is, a polymorphism reported to be associated with schizophrenia is 12 times more likely to actually be associated with the disease than one chosen at random. The bad news is that 12 times 0.0001 is still only 0.0012: there is a 99.88% chance that the reported result is false.
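Here is a minimal Python sketch of that calculation. The function name `ppv` is my own; the formula is Ioannidis's, with R taken as the pre-study odds (10 true associations among 100,000 candidates).

```python
def ppv(R, alpha=0.05, beta=0.4):
    """Post-study probability that a statistically significant finding is true,
    per Ioannidis: PPV = (1 - beta) * R / (R - beta*R + alpha)."""
    return (1 - beta) * R / (R - beta * R + alpha)

# Pre-study odds: 10 true associations among 100,000 candidate polymorphisms
R = 10 / (100_000 - 10)
print(ppv(R))  # ~0.0012, i.e. ~99.88% of reported associations are false
```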

The example above is extreme, but it shows that a purely brute-force approach isn't going to get you very far. Nobody actually believes that all 100,000 polymorphisms are equally likely to be associated with the disease. Biological information makes it possible to narrow down the list of candidates to test, increasing the value of R. Suppose it were possible to narrow the list down to 1,000 polymorphisms, but a couple of the truly associated ones were left out, leaving 8 of the original 10. Then R increases to about 0.008, and the probability of a reported association being correct increases to 0.088. This is a great improvement, though reported results still have more than a 90% chance of being wrong.
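Continuing the sketch above with the narrowed candidate list (again my own illustration, reusing the `ppv` function defined earlier):

```python
# 1,000 candidates that still include 8 of the 10 true associations
R = 8 / (1_000 - 8)  # pre-study odds, roughly 0.008
print(ppv(R))        # ~0.088: over 90% of reported associations still false
```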

(*) John P. A. Ioannidis, Why most published research findings are false. CHANCE volume 18, number 4, 2005.

5 comments on “Why microarray study conclusions are so often wrong”
  1. Dave Bridges says:

I’m new at this, but isn’t this sort of problem the thing that false discovery rate adjusted p-values (q-values) are supposed to account for?

  2. Abhijit says:

You might also want to look at Wacholder et al.’s False Positive Report Probability (FPRP) (J Natl Cancer Inst. 2004 Mar 17;96(6):434-42), which is an attempt to directly address some of the issues you raise in a quasi-Bayesian framework.

  3. Brian H says:

Why wouldn’t repeat runs, with some modification of sampling and procedure, weed out false positives?

  4. John says:

    The advantage of microarrays is that they’re cheap. If you do enough independent runs to make quality inferences, they’re no longer cheap.

  5. Andrea says:

I know this is a rather old blog entry, but since Abhijit cited the paper by Wacholder, I also wanted to cite this strong criticism of Wacholder’s paper: Lucke JF, A critique of the false-positive report probability, Genet Epidemiol. 2009 Feb;33(2):145-50 (http://www.ncbi.nlm.nih.gov/pubmed/18720477).
