From the monthly archives:

December 2008

Numerical integration article posted

by John on December 8, 2008

This weekend CodeProject posted an article I wrote entitled Fast numerical integration.

The algorithm in the article, introduced in the 1970s by Masatake Mori and Hidetosi Takahasi and now usually called double exponential (or tanh-sinh) integration, is indeed fast. It integrates analytic functions over bounded intervals with the most accuracy for a fixed number of integration points.

The CodeProject article includes source code and is mostly about the code. See this page for mathematical details.
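For readers who want to see the idea without opening the article, here is a minimal Python sketch of the tanh-sinh rule. It is not the CodeProject code. The function name tanh_sinh is hypothetical, and the step size h and the cutoff t_max are illustrative assumptions rather than tuned values. The substitution x = tanh((π/2) sinh t) maps the whole real line onto (-1, 1) and makes the transformed integrand decay double exponentially, so a plain trapezoid rule in t converges very quickly.

```python
import math

def tanh_sinh(f, a, b, h=0.05, t_max=3.0):
    """Approximate the integral of f over [a, b] by the tanh-sinh rule.

    h (step in the transformed variable t) and t_max (where the infinite
    sum is cut off) are illustrative defaults, not tuned values.
    """
    center = 0.5 * (a + b)
    half_width = 0.5 * (b - a)
    n = round(t_max / h)
    total = 0.0
    for k in range(-n, n + 1):
        t = k * h
        u = 0.5 * math.pi * math.sinh(t)
        x = math.tanh(u)  # node in (-1, 1), clustering near the endpoints
        w = 0.5 * math.pi * math.cosh(t) / math.cosh(u) ** 2  # dx/dt
        total += w * f(center + half_width * x)
    # Trapezoid rule in t, rescaled from [-1, 1] to [a, b].
    return h * half_width * total

print(tanh_sinh(math.exp, 0.0, 1.0), math.e - 1)
print(tanh_sinh(math.sin, 0.0, math.pi))  # exact value is 2
```

Because the nodes crowd toward the endpoints and f is never evaluated exactly at a or b, the same rule also copes with many endpoint singularities, although the fixed cutoff used here limits the accuracy in that case.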

An albatross can be a good thing

by John on December 6, 2008

The most frequent use of the word “albatross” in ordinary conversation is in the metaphor of an albatross about someone’s neck, an allusion to The Rime of the Ancient Mariner. An albatross around your neck is a terrible burden that won’t go away.

So when I saw that Seth Godin had posted an article entitled Building an albatross, I thought it would be about how to avoid creating a high-maintenance business. But that’s not it at all. He’s created a new metaphor based on these sea birds. They have a hard time taking off. They have to wait for just the right wind. But once they are airborne, they can fly for days or weeks without stopping. Seth compares his Squidoo venture to an albatross and gives tips for starting an albatross business.

Related post: Coping with exponential growth.

Three trigonometry topics

by John on December 6, 2008

I’ve run into three topics related to trigonometry lately.

First, as I’d mentioned a few days ago, I ran into the identity on Mathematics Diary that for the interior angles of a triangle, the sum of the tangents equals the product of the tangents. I mentioned a converse to this identity here.
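A quick numerical check of the identity: pick two angles, let the third be π minus their sum, and compare the sum and the product of the three tangents. The helper name is hypothetical, angles are in radians, and none of the angles may be a right angle, where the tangent is undefined.

```python
import math
import random

def tangent_sum_and_product(a, b):
    """Complete a triangle with interior angles a and b (radians) and return
    (tan A + tan B + tan C, tan A * tan B * tan C)."""
    c = math.pi - a - b  # interior angles sum to pi
    ta, tb, tc = math.tan(a), math.tan(b), math.tan(c)
    return ta + tb + tc, ta * tb * tc

random.seed(1)
for _ in range(5):
    # Random triangle: split pi into three angles, each at least 0.1.
    a = random.uniform(0.1, 2.5)
    b = random.uniform(0.1, math.pi - a - 0.1)
    s, p = tangent_sum_and_product(a, b)
    print(f"sum = {s:.12f}, product = {p:.12f}")
```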

Next, a discussion came up on StackOverflow regarding how trigonometric functions are calculated. I happened to know something about this because of a consulting job I did a few years ago where I helped design the transcendental function algorithms for a microchip. Calculus instructors speculate, or even assert, that computers use Taylor series to compute trig functions. Maybe some do, but it’s not the most efficient or most common way.
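As one concrete example of a non-Taylor approach, here is a sketch of CORDIC, an algorithm long used in calculators and hardware that builds sine and cosine from additions, subtractions, shifts, and a small table of arctangents. This is an illustration under stated assumptions (40 iterations, floating point standing in for fixed-point arithmetic, hypothetical names), not the design from the consulting job mentioned above. Software math libraries more often use range reduction followed by a carefully fitted polynomial.

```python
import math

ITERATIONS = 40
# Table of elementary rotation angles atan(2**-i). Computed with math here
# for convenience; hardware would store these as constants.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
# Each pseudo-rotation stretches the vector by sqrt(1 + 2**-2i);
# starting from x = K pre-compensates for the total stretch.
K = 1.0
for i in range(ITERATIONS):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))

def _cordic(theta):
    """Rotate (K, 0) by theta; valid for |theta| <= pi/2."""
    x, y, z = K, 0.0, theta
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0 else -1.0
        # Multiplying by 2**-i is a bit shift in fixed-point hardware.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y, x  # (sin theta, cos theta)

def cordic_sin_cos(theta):
    """Return (sin theta, cos theta) after reducing theta to [-pi/2, pi/2]."""
    r = math.remainder(theta, 2 * math.pi)  # r in [-pi, pi]
    if r > math.pi / 2:
        s, c = _cordic(math.pi - r)   # sin(pi - r) = sin r, cos(pi - r) = -cos r
        return s, -c
    if r < -math.pi / 2:
        s, c = _cordic(-math.pi - r)  # sin(-pi - r) = sin r, cos(-pi - r) = -cos r
        return s, -c
    return _cordic(r)

print(cordic_sin_cos(1.0), (math.sin(1.0), math.cos(1.0)))
```

The loop never evaluates a power series; accuracy comes from the number of iterations, each of which adds roughly one bit.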

Finally, this morning Brent Yorgey at The Math Less Traveled discusses a neglected aspect of the law of sines. The way I learned this theorem, and apparently the way Brent learned it as well, was

a/sin(A) = b/sin(B) = c/sin(C)

where a, b, and c are the lengths of the sides of a triangle and A, B, and C are the interior angles opposite those sides. But there’s more. Not only do the three ratios equal each other, they also equal something interesting: the diameter of the circle that circumscribes the triangle.

See Brent Yorgey’s post for a proof.
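The extended law of sines is easy to check numerically for a triangle given by its vertices. This sketch (function name hypothetical) gets the angles from the law of cosines and the circumradius from the standard formula R = abc/(4K), where K is the area, so the diameter is abc/(2K).

```python
import math

def law_of_sines_ratios(p, q, r):
    """Vertices p, q, r as (x, y) pairs. Return the three ratios
    side / sin(opposite angle) and the diameter of the circumscribed circle."""
    a = math.dist(q, r)  # side opposite vertex p
    b = math.dist(p, r)  # side opposite vertex q
    c = math.dist(p, q)  # side opposite vertex r
    # Interior angles from the law of cosines.
    A = math.acos((b * b + c * c - a * a) / (2 * b * c))
    B = math.acos((a * a + c * c - b * b) / (2 * a * c))
    C = math.pi - A - B
    # Area from the cross product; circumradius R = abc / (4 * area).
    area = 0.5 * abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))
    diameter = a * b * c / (2 * area)
    return (a / math.sin(A), b / math.sin(B), c / math.sin(C)), diameter

print(law_of_sines_ratios((0, 0), (5, 1), (2, 4)))
```

For the 3-4-5 right triangle, the hypotenuse is itself a diameter of the circumscribed circle, so all three ratios come out to 5.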

Microarray technology makes it possible to examine the expression levels of thousands of genes at once. So one way to do cancer research is to run microarray analyses on cancer and normal tissue samples, hoping to discover genes that are more highly expressed in one or the other. If, for example, a few genes are highly expressed in cancer samples, the proteins these genes code for may be targets for new therapies.

For numerous reasons, cancer research is more complicated than simply running millions of microarray experiments and looking for differences. One complication is that false positives are very likely.

A previous post gives a formula for the probability of a reported result being true. The most important term in that formula is the prior odds R that a hypothesis in a certain context is correct. John Ioannidis gives a hypothetical but realistic example in the paper mentioned earlier (*). In his example, he supposes that 100,000 gene polymorphisms are being tested for association with schizophrenia. If 10 polymorphisms truly are associated with schizophrenia, the pre-study probability that a given polymorphism is associated is about 0.0001. If a study has 60% power (β = 0.4) and significance level α = 0.05, the post-study probability that a polymorphism determined to be associated really is associated is 0.0012. That is, a polymorphism reported to be associated with schizophrenia is 12 times more likely to actually be associated with the disease than one chosen at random. However, the bad news is that 12 times 0.0001 is only 0.0012. There’s more than a 99.8% chance that the result is false.

The example above is extreme, but it shows that a completely brute-force approach isn’t going to get you very far. Nobody actually believes that 100,000 polymorphisms are equally likely to be associated with any disease. Biological information makes it possible to narrow down the list of things to test, increasing the value of R. Suppose it were possible to narrow the list down to 1,000 polymorphisms to test, but a couple of the truly associated polymorphisms were left out, leaving 8. Then R increases to about 0.008. Now the probability of a reported association being correct increases to 0.088. This is a great improvement, though reported results still have more than a 90% chance of being wrong.
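The two probabilities above can be reproduced from the formula in Ioannidis's paper for the post-study probability (positive predictive value) when no bias term is included: PPV = (1 − β)R / (R − βR + α). A short sketch, with a hypothetical function name:

```python
def ppv(R, alpha=0.05, beta=0.4):
    """Post-study probability that a claimed association is true.

    R is the prior odds that a tested relationship is true; power is 1 - beta.
    This is the no-bias version of the formula in Ioannidis (2005).
    """
    return (1 - beta) * R / (R - beta * R + alpha)

for R in (0.0001, 0.008):
    p = ppv(R)
    print(f"R = {R}: PPV = {p:.4f}, chance the result is false = {1 - p:.2%}")
```

With R = 0.0001 this gives about 0.0012, and with R = 0.008 about 0.088, matching the figures in the text. Since PPV grows almost linearly in R when R is much smaller than α, narrowing the candidate list is the most effective lever.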

(*) John P. A. Ioannidis, Why most published research findings are false. CHANCE volume 18, number 4, 2005.

Back in February I wrote a post How to avoid being outsourced or open sourced. That post synthesizes points made by Kevin Kelly, Daniel Pink, and Thomas Friedman about how to thrive in an era when many things that once were expensive are now free or cheap.

Kevin Kelly has taken the post I commented on and turned it into a PDF manifesto called Better Than Free available here.

XML database state of the union

by John on December 4, 2008

Daniel Lemire gives a sort of state-of-the-union report for XML databases in this post: Native XML databases: have they taken over the world yet?

Trying out Twitter

by John on December 3, 2008

I finally gave in and started using Twitter, user name @johndcook. I’m not sold on it yet, but I’m going to try it for a while.

Machine learning

by John on December 3, 2008

Brendan O’Connor has a thoughtful comparison of machine learning and statistics this morning.

The cult of significance testing

by John on December 3, 2008

I recently found out about a book that was published earlier this year, The Cult of Statistical Significance by Stephen Ziliak and Deirdre McCloskey. The subtitle is sure to stir up controversy: How the Standard Error Costs Us Jobs, Justice, and Lives.

From the parts I’ve read, it sounds like the central criticism of the book is that statistical significance is not necessarily scientific significance. A significance test asks whether an effect exists; it says nothing about the size or importance of the effect.

Significance testing errs in two directions. First, in practice many people believe that any hypothesis with a p-value less than 0.05 is very likely true and important, though often such hypotheses are untrue and unimportant. Second, many act as if a hypothesis with a p-value greater than 0.05 is “insignificant” regardless of context. Not only is the 0.05 cutoff arbitrary, it is quite common to say there is evidence if p = 0.049 and to say there is no evidence if p = 0.051. Common sense tells you that if 0.049 provides evidence then 0.051 provides slightly less evidence rather than no evidence.
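To see how little separates p = 0.049 from p = 0.051, convert each p-value back to the test statistic it implies. The sketch below assumes a two-sided z-test (a normally distributed test statistic), an illustrative assumption, and uses only Python's standard statistics module.

```python
from statistics import NormalDist

def z_from_two_sided_p(p):
    """|z| that yields the two-sided p-value p under a standard normal null."""
    return NormalDist().inv_cdf(1 - p / 2)

for p in (0.049, 0.05, 0.051):
    print(f"p = {p}: |z| = {z_from_two_sided_p(p):.3f}")
```

The statistics for p = 0.049 and p = 0.051 are about 1.97 and 1.95, a difference of less than 0.02 standard errors, yet one side of that gap is routinely reported as evidence and the other as no evidence.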

The book gives the example of Merck saying there is “no evidence” that Vioxx has a higher probability of causing heart attacks than naproxen because their study did not achieve the magical 0.05 significance level. The book argues that “significance” should depend on context. When the stakes are higher, such as people suffering heart attacks, it should take less evidence before we declare an effect significant. Also, if you don’t want to find significance, you can always reduce the size of your study to decrease your chances of finding significance. [I have not followed the Vioxx case and have no opinion on its specifics.] In addition to the Vioxx case, Ziliak and McCloskey provide case studies in economics, psychology, and medicine.
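The point about study size can be made quantitative with a power calculation. As an assumption for illustration only, take a two-sided one-sample z-test at α = 0.05 and a hypothetical true effect of 0.2 standard deviations; the function name is also hypothetical.

```python
from statistics import NormalDist

def power_two_sided_z(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test for a standardized effect size
    (true mean shift in units of the standard deviation) with n observations."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5  # expected value of the z statistic
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

for n in (25, 100, 400):
    print(n, round(power_two_sided_z(0.2, n), 3))
```

Under these assumptions the chance of detecting the same real effect is roughly 17% with 25 observations, about 52% with 100, and about 98% with 400, so a small enough study makes "no significant difference" the likely outcome even when the effect is real.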

Whenever someone raises objections to significance testing, the reaction is always “Yes, everyone knows that.” Everyone agrees that the 0.05 cutoff is arbitrary, everyone agrees that effect sizes matter, etc. And yet nearly everyone continues to play the p < 0.05 game.

Related posts:
Origin of “statistically significant”
Five criticisms of significance testing

A stimulating work environment

by John on December 2, 2008

Andy Hunt posted an article this morning entitled Science Failure and Cubicle Brain Death. He explains that one reason it took so long to discover that adult animals could grow new brain cells was that such growth doesn’t happen in laboratory conditions. To grow new brain cells, animals need stimulation that a sterile lab environment does not provide. People need stimulating environments too. Little things matter.

… things like the pen and paper you use, the decorations at your desk, the lighting and ceiling height of your cubicle all have a measurable effect on your cognitive processes.

Joel Spolsky talked about this in the latest StackOverflow podcast. His company often faces criticism for spending so much money on office space for developers. But as he put it, the difference between depressing and stimulating office space may amount to whether you devote 4% or 6% of your total budget to rent. The extra investment in office space allows you to recruit more competitively for top talent and makes the people you hire more productive.

Related posts:

Selective use of technology
Brain plasticity
Getting to the bottom of things

Matrix cookbook

by John on December 2, 2008

Here’s something that looks like it could be handy: The Matrix Cookbook.

(I just realized that the title might bring something other than mathematics to mind. The Matrix Cookbook is a summary of facts about mathematical matrices. Feel free to leave your own joke about sci-fi movies and food preparation in the comments.)

Michael Feathers on refactoring

by John on December 1, 2008

Michael Feathers wrote one of my favorite books on unit testing: Working Effectively with Legacy Code. Some books on unit testing just give abstract platitudes. Feathers’ book wrestles with the hard, messy problem of retrofitting unit tests to existing code.

The .NET Rocks podcast had an interview with Michael Feathers recently. The whole interview is worth listening to, but here I’ll just recap a couple of things he said about refactoring that I thought were insightful. First, most people agree that you need to have unit tests in place before you can do much refactoring. The unit tests give you the confidence to refactor without worrying that you’ll break something in the process and not know that you broke it. But Feathers adds that you might have to do some light refactoring before you can put the unit tests in place to allow more aggressive refactoring.

The second thing he mentioned about refactoring was the technique called “scratch refactoring.” With this approach, you refactor quickly, without worrying about whether you are introducing bugs, just to see where you want to go. Then you throw away those changes completely and refactor carefully. Sometimes you need to do a dry run first to see what patterns emerge and determine where you want to go.

Both of these observations are ways to break out of a chicken-and-egg cycle: needing to refactor before you can refactor.
