Numerical integration article posted

This weekend CodeProject posted an article I wrote entitled Fast numerical integration.

The algorithm in the article, introduced in the 1970s by Masatake Mori and Hidetosi Takahasi, is indeed fast. It integrates analytic functions over bounded intervals with the greatest accuracy for a fixed number of integration points.
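
To give the flavor of the method, here is a minimal Python sketch of the double exponential (tanh-sinh) transformation behind it: substitute x = tanh((π/2) sinh t) and apply the trapezoid rule in t. The step size h and truncation point n below are illustrative defaults, not tuned values from the article.

    import math

    def tanh_sinh(f, a, b, n=30, h=0.1):
        # Integrate f over [a, b]: map to [-1, 1], substitute
        # x = tanh((pi/2) sinh t), then apply the trapezoid rule
        # with step h, truncating the sum at |t| = n*h.
        c, m = (a + b) / 2, (b - a) / 2
        total = 0.0
        for k in range(-n, n + 1):
            t = k * h
            u = math.tanh(0.5 * math.pi * math.sinh(t))  # abscissa in (-1, 1)
            w = 0.5 * math.pi * math.cosh(t) / math.cosh(0.5 * math.pi * math.sinh(t)) ** 2
            total += f(c + m * u) * w
        return m * h * total

    # Example: the integral of exp(x) over [0, 1] is e - 1 = 1.718281828...
    print(tanh_sinh(math.exp, 0.0, 1.0))

The weights fall off doubly exponentially in t, which is why so few points buy so much accuracy.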

The CodeProject article includes source code and is mostly about the code. See this page for mathematical details.

An albatross can be a good thing

The most frequent use of the word “albatross” in ordinary conversation is in the metaphor of an albatross about someone’s neck, an allusion to The Rime of the Ancient Mariner. An albatross around your neck is a terrible burden that won’t go away.

So when I saw that Seth Godin had posted an article Building an albatross, I thought it would be about how to avoid creating a high-maintenance business. But that’s not it at all. He’s created a new metaphor about these sea birds. They have a hard time taking off. They have to wait for just the right wind. But once they are airborne, they can fly for days or weeks without stopping. Seth compares his Squidoo venture to an albatross and gives tips for starting an albatross business.

Related post: Coping with exponential growth.

Three trigonometry topics

I’ve run into three topics related to trigonometry lately.

First, as I mentioned a few days ago, I ran into an identity on Mathematics Diary: for the interior angles of a triangle, the sum of the tangents equals the product of the tangents. I mentioned a converse to this identity here.
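
The identity is easy to spot-check numerically. A quick sketch with arbitrarily chosen angles:

    import math

    # For interior angles A + B + C = pi,
    # tan A + tan B + tan C = tan A * tan B * tan C.
    A, B = 1.0, 0.7
    C = math.pi - A - B

    print(math.tan(A) + math.tan(B) + math.tan(C))  # both lines print
    print(math.tan(A) * math.tan(B) * math.tan(C))  # the same number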

Next, a discussion came up on StackOverflow regarding how trigonometric functions are calculated. I happened to know something about this because of a consulting job I did a few years ago where I helped design the transcendental function algorithms for a microchip. Calculus instructors speculate, or even assert, that computers use Taylor series to compute trig functions. Maybe some do, but it’s not the most efficient or most common way.
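
A more common approach in software math libraries is to reduce the argument to a small interval and then evaluate a polynomial chosen to minimize the worst-case error over that interval, rather than a truncated Taylor series, which is accurate only near a single point. Here is a rough illustration, using a Chebyshev interpolant as a stand-in for a true minimax polynomial:

    import numpy as np
    from numpy.polynomial.chebyshev import Chebyshev

    # Degree-7 approximations to sin on [-pi/4, pi/4], a typical
    # interval after range reduction.
    x = np.linspace(-np.pi / 4, np.pi / 4, 10001)

    # Truncated Taylor series: x - x^3/6 + x^5/120 - x^7/5040
    taylor = x - x**3 / 6 + x**5 / 120 - x**7 / 5040

    # Chebyshev interpolant of the same degree, close to the minimax fit
    cheb = Chebyshev.interpolate(np.sin, 7, domain=[-np.pi / 4, np.pi / 4])

    print(np.max(np.abs(taylor - np.sin(x))))   # error piles up at the endpoints
    print(np.max(np.abs(cheb(x) - np.sin(x))))  # much smaller, same evaluation cost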

Finally, this morning Brent Yorgey at The Math Less Traveled discusses a neglected aspect of the law of sines. The way I learned this theorem, and apparently the way Brent learned it as well, was

a/sin(A) = b/sin(B) = c/sin(C)

where a, b, and c are the lengths of the sides of a triangle and A, B, and C are the angles opposite those sides. But there’s more. Not only do the three ratios equal each other, they also equal something interesting: the diameter of the circle that circumscribes the triangle.
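
The extended statement is easy to check numerically. For a 3-4-5 right triangle the hypotenuse is a diameter of the circumscribed circle (Thales’ theorem), so all three ratios should come out to 5:

    import math

    a, b, c = 3.0, 4.0, 5.0

    # Recover the angles from the law of cosines
    A = math.acos((b**2 + c**2 - a**2) / (2 * b * c))
    B = math.acos((a**2 + c**2 - b**2) / (2 * a * c))
    C = math.acos((a**2 + b**2 - c**2) / (2 * a * b))

    # Each ratio equals the diameter of the circumscribed circle, 2R = 5
    print(a / math.sin(A), b / math.sin(B), c / math.sin(C))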

See Brent Yorgey’s post for a proof.

Why microarray study conclusions are so often wrong

Microarray technology makes it possible to examine the expression levels of thousands of genes at once. So one way to do cancer research is to run microarray analyses on cancer and normal tissue samples, hoping to discover genes that are more highly expressed in one or the other. If, for example, a few genes are highly expressed in cancer samples, the proteins these genes code for may be targets for new therapies.

For numerous reasons, cancer research is more complicated than simply running millions of microarray experiments and looking for differences. One complication is that false positives are very likely.

A previous post gives a formula for the probability of a reported result being true. The most important term in that formula is the prior odds R that a hypothesis in a given context is correct. John Ioannidis gives a hypothetical but realistic example in the paper mentioned earlier (*). In his example, he supposes that 100,000 gene polymorphisms are being tested for association with schizophrenia. If 10 polymorphisms truly are associated with schizophrenia, the pre-study probability that a given polymorphism is associated is 0.0001. If a study has 60% power (β = 0.4) and significance level α = 0.05, the post-study probability that a polymorphism reported to be associated really is associated is 0.0012. That is, a gene reported to be associated with schizophrenia is 12 times more likely to actually be associated with the disease than a gene chosen at random. The bad news is that 12 times 0.0001 is still only 0.0012: there is a 99.88% chance that the reported result is false.

The example above is extreme, but it shows that a completely brute-force approach isn’t going to get you very far. Nobody actually believes that 100,000 polymorphisms are equally likely to be associated with any disease. Biological information makes it possible to narrow down the list of things to test, increasing the value of R. Suppose it were possible to narrow the list down to 1,000 polymorphisms, but a couple of the important genes were left out, leaving 8 true associations among the candidates. Then R increases to 8/1,000 = 0.008, and the probability of a reported association being correct increases to 0.088. That is a great improvement, though reported results still have more than a 90% chance of being wrong.
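
Ioannidis’s formula is simple enough to compute directly. Here is a small sketch reproducing both numbers above; the function and argument names are mine:

    def ppv(R, alpha, beta):
        # Probability that a reported positive finding is true (Ioannidis 2005):
        # R = prior odds that the hypothesis is true, alpha = significance
        # level, beta = type II error rate (power = 1 - beta).
        return (1 - beta) * R / (R - beta * R + alpha)

    print(ppv(R=0.0001, alpha=0.05, beta=0.4))  # 0.0012: 10 true among 100,000
    print(ppv(R=0.008, alpha=0.05, beta=0.4))   # 0.088: 8 true among 1,000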

(*) John P. A. Ioannidis, Why most published research findings are false. CHANCE volume 18, number 4, 2005.

The cult of significance testing

I recently found out about a book published earlier this year, The Cult of Statistical Significance by Stephen Ziliak and Deirdre McCloskey. The subtitle is sure to stir up controversy: How the Standard Error Costs Us Jobs, Justice, and Lives.

From the parts I’ve read, it sounds like the central criticism in the book is that statistical significance is not the same as scientific significance: significance testing asks whether an effect exists at all and is unconcerned with the size or importance of the effect.

Significance testing errs in two directions. First, in practice many people believe that any hypothesis with a p-value less than 0.05 is very likely true and important, though often such hypotheses are untrue and unimportant. Second, many act as if a hypothesis with a p-value greater than 0.05 is “insignificant” regardless of context. Not only is the 0.05 cutoff arbitrary, it is treated as a sharp boundary: it is quite common to say there is evidence if p = 0.049 and no evidence if p = 0.051. Common sense says that if 0.049 provides evidence, then 0.051 provides slightly less evidence, not none at all.

The book gives the example of Merck saying there was “no evidence” that Vioxx had a higher probability of causing heart attacks than naproxen because their study did not achieve the magical 0.05 significance level. The book argues that “significance” should depend on context. When the stakes are higher, such as people suffering heart attacks, it should take less evidence before we declare an effect significant. Also, if you don’t want to find significance, you can always run a smaller study: less data means less power, and hence less chance of crossing the threshold. [I have not followed the Vioxx case and have no opinion on its specifics.] In addition to the Vioxx case, Ziliak and McCloskey provide case studies in economics, psychology, and medicine.
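
That point about study size is just arithmetic about statistical power. A rough sketch using the normal approximation for a two-sided, two-sample test of a modest standardized effect; the numbers are illustrative and not from the book:

    import math

    def power(n, effect, z_crit=1.96):
        # Approximate power of a two-sided two-sample z-test with n subjects
        # per group to detect a standardized effect size `effect`.
        Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
        return Phi(effect * math.sqrt(n / 2) - z_crit)

    for n in (400, 100, 25):
        print(n, round(power(n, effect=0.2), 2))
    # 400 -> 0.81, 100 -> 0.29, 25 -> 0.11: shrink the study enough and
    # "no significant difference" is almost guaranteed.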

Whenever someone raises objections to significance testing, the reaction is always “Yes, everyone knows that.” Everyone agrees that the 0.05 cutoff is arbitrary, everyone agrees that effect sizes matter, etc. And yet nearly everyone continues to play the p < 0.05 game.

A stimulating work environment

Andy Hunt posted an article this morning entitled Science Failure and Cubicle Brain Death. He explains that one reason it took so long to discover that adult animals could grow new brain cells was that such growth doesn’t happen in laboratory conditions. To grow new brain cells, animals need stimulation that a sterile lab environment does not provide. People need stimulating environments too. Little things matter.

… things like the pen and paper you use, the decorations at your desk, the lighting and ceiling height of your cubicle all have a measurable effect on your cognitive processes.

Joel Spolsky talked about this in the latest StackOverflow podcast. His company often faces criticism for spending so much money on office space for developers. But as he put it, the difference between depressing and stimulating office space may amount to whether you devote 4% or 6% of your total budget to rent. The extra investment in office space allows you to recruit more competitively for top talent and makes the people you hire more productive.
