American creativity is declining according to a recent Newsweek article. The article says that America is embracing rote learning just as China is embracing creativity.

In China there has been widespread education reform to extinguish the drill-and-kill teaching style. … When faculty of a major Chinese university asked [Jonathan] Plucker to identify trends in American education, he described our focus on standardized curriculum, rote memorization, and nationalized testing. “After my answer was translated, they just started laughing out loud,” Plucker says. “They said, ‘You’re racing toward our old model. But we’re racing toward your model, as fast as we can.’ ”

Ken Robinson argues in his TED Talk that rather than encourage creativity, schools kill it.

There’s a simple statistic called the Lincoln Index that lets you estimate the total number of errors based on the number of errors found. I’ll explain what the Lincoln Index is, why it works, give some code for playing with it, and discuss how it applies to software testing.

What is the Lincoln Index?

Suppose you have a tester who finds 20 bugs in your program. You want to estimate how many bugs are really in the program. You know there are at least 20 bugs, and if you have supreme confidence in your tester, you may suppose there are around 20 bugs. But maybe your tester isn’t very good. Maybe there are hundreds of bugs. How can you have any idea how many bugs there are? There’s no way to know with one tester. But if you have two testers, you can get a good idea, even if you don’t know how skilled the testers are.

Suppose two testers independently search for bugs. Let E_{1} be the number of errors the first tester finds and E_{2} the number of errors the second tester finds. Let S be the number of errors both testers find. The Lincoln Index estimates the total number of errors as E_{1}E_{2}/S. You can find historical background on the Lincoln Index here.

How does the index work?

Suppose there are n bugs and the two testers find bugs with probability p_{1} and p_{2} respectively. You’d expect the two testers to find around np_{1} and np_{2} bugs. If you assume the probabilities of each tester finding a bug are independent, you’d expect the testers to find around np_{1}p_{2} bugs in common. That says E_{1}*E2/S would be around

(n^{2}p_{1}p_{2}) / (np_{1}p_{2}) = n.

The probabilities of each tester finding a bug cancel out leaving only n, the total number of bugs.

Simulation code

Here’s some Python code for simulating estimates using the Lincoln Index.

from random import random
def find_error(p):
"Find an error with probability p"
if random() < p:
return 1
return 0
def simulate(true_error_count, p1, p2, reps=10000):
"""Simulate Lincoln's method for estimating errors
given the true number of errors, each person's probability
of finding an error, and the number of simulations to run."""
estimation_error_sum = 0
for rep in xrange(reps):
caught1 = 0
caught2 = 0
caught_both = 0
for error in xrange(true_error_count):
found1 = find_error(p1)
found2 = find_error(p2)
caught1 += found1
caught2 += found2
caught_both += found1*found2
estimate = caught1*caught2 / float(caught_both)
estimation_error_sum += abs(estimate - true_error_count)
return estimation_error_sum / float(reps)

I used this to simulate the case of two testers, one with a 30% chance of finding a bug and the other with a 40% chance, and a total of 100 bugs. I simulated the Lincoln Index 1,000 times, keeping track of the absolute error in the estimates. The code to do this was simulate(100, 0.30, 0.40, 1000). On average, the Lincoln index over- or under-estimated the number of bugs by about 16. This is a good estimate considering each tester greatly under-estimated the number of bugs.

If you didn’t think about using something like the Lincoln Index, in the previous example one tester would find around 30 bugs and the other around 40. The two lists might have 10 bugs in common, so you’d estimate the total number at 60, far short of 100. But the Lincoln index would often find estimates between 84 and 116.

Note that it is possible that the testers won’t find any of the same bugs. In that case the Lincoln Index cannot be computed and the code will divide by zero. But this is unlikely unless the p‘s are small and n is small.

Software testing

Does the Lincoln Index actually provide a good bug count estimate? That depends on how well the assumptions are met. The index assumes all bugs are equally hard for a given tester to find. It does not assume that both testers are equally skilled, but it does assume that their chances of finding a bug are independent. In other words, tester A is no more or less likely to find a bug just because tester B found it.

The most questionable assumption is that all bugs are equally hard to find. That’s usually not true. But it may be true that all bugs of a certain kind are equally hard to find. For example, spelling errors may be easier to find than validation oversights, but the Lincoln Index might be good for estimating separately how many spelling errors or validation errors there are.

The index might provide a rough rule of thumb even if the assumptions it that go into it are violated. For example, suppose one tester found 15 bugs and another found 20. But only 3 of the bugs were the same. A naive estimate would say since there are 32 unique bugs found, there must be around that many in total. But the Lincoln Index would estimate 100 bugs. Maybe the Lincoln estimate is not at all accurate, but it does tell you to be worried that there may be a lot more bugs to find since the overlap between the two bug lists was so small.

Everything I do regularly in Mathematica can be done in Python. Even though Mathematica has a mind-boggling amount of functionality, I only use a tiny proportion of it. I skimmed through some of my Mathematica files to see what functions I use and then looked for Python counterparts. I found I use less of Mathematica than I imagined.

The core mathematical functions I need are in SciPy. The plotting features are in matplotlib. The SymPy library appears to have the symbolic functionality I need, though I’m as not sure about this one.

I don’t have much experience with the Python libraries listed above. I haven’t used SymPy at all; I’ve only browsed its web site. Maybe I’ll find I’d rather work in Mathematica, particularly when I’m just trying out ideas. But I want to experiment with using Python for more tasks.

As I’ve blogged about before, I’d like to consolidate my tools. I started using Emacs again because I was frustrated with using a different editor for every kind of file. One of the things I find promising about Python is that I may be able to do more in Python and reduce the number of programming languages I use regularly.

A decade ago, commercial software vendors would claim that their products were cheaper than open source alternatives when you considered the total cost of ownership. Free software was free to obtain, but difficult to install, configure, maintain, and support.

A lot has changed in the last decade. Open source software has improved a great deal. It would be interesting to revisit the debate over total cost of ownership. Software vendors are right to point out the indirect costs of free software. But there are indirect costs to commercial software too: transaction costs of purchasing the software, upgrades, maintenance agreements, license management, etc.

Suppose you want to buy WinZip. It’s a mature and inexpensive piece of software, selling for $29.95. What will it cost you and your company to buy it? Obviously at least $29.95. But how much paperwork will you have to fill out? How long will it take someone to process your order? How long will you have to wait? If you have a desktop and laptop computer, will you be licensed for both? Can you install it at home? At minimum you’ll have to read enough fine print to find out. Now suppose you get a new PC. Did you remember to save your WinZip installer before they took away your old PC? Do you have your license key? The more you think about it, the better the free alternative 7-zip looks.

I heard a great term the other day: geek fatigue. Being a geek often means doing things the hard way, at least in the short term. There’s usually some long-term advantage, real or imagined, to justify doing things the hard way. But even a die-hard geek gets tired and wants to take the easy way out.

Thomas Gideon — a self-described “die-hard technology geek” — used the term geek fatigue on his podcast to describe why he bought a Mac a few years ago. He was tired of using Linux and fighting driver issues. (Thomas has recently decided to move back to Linux.)

If geek fatigue is exhaustion from doing things the hard way, there needs to be a corresponding term for the relief that comes from joining the mainstream. Any suggestions?

Sometimes the geek approach is just extra work. There’s no advantage other than the personal satisfaction of doing something within self-imposed limitations. But sometimes the geek approach pays off, especially in the longer term. What has your experience been?

Years ago, Dentyne chewing gum ran an advertising campaign with the line “four out of five dentists surveyed recommend sugarless gum for their patients who chew gum.” Of course there’s no mention of sample size. Maybe “four out of five” meant 80% of a large survey, or maybe they literally surveyed five dentists.

Even if they only talked to five dentists, you’d think that if four dentists out of five came to the same conclusion, it is quite likely that they have good advice. Individuals have their biases, but if a large majority comes to the same conclusion independently, maybe some underlying truth is responsible for the consensus rather than a coincidence of prejudices.

However, there is a fallacy in the preceding argument. It implicitly assumes that professionals make up their minds independently and that their prejudices are independent. That may be true on some small objective problem. Several scientists may conduct independent experiments and have independent errors. In that case, if most agree on a measurement, that measurement is likely to be accurate. But ask a group of scientists working in the same area if their area deserves more funding. Of course they’ll agree. Their financial interests are highly correlated.

James Surowiecki’s book The Wisdom of Crowds argues that crowds can be amazingly intelligent. Crowds can also be incredibly foolish. One of the necessary conditions for crowd wisdom is independence. The book gives examples of experiments in which the average independent estimates, such as the weight of a cow or the number of jelly beans in a jar, surprisingly accurate. But if there were an open debate rather than an anonymous poll, the estimates would no longer be independent. If one influential persons offers a guess, other estimates will be anchored by that guess and tend to confirm it.

William Briggs has an excellent article this morning on scientific consensus. The context of his article is climate change, though I don’t want to open a debate here on climate change. For that matter, I don’t want to open a debate on the merits of sugarless chewing gum. I’m more interested in what the article says about how a consensus becomes self-reinforcing.

In his book Made by Hand, Mark Frauenfelder quotes Peter Gray on what’s wrong with contemporary education. Gray says that school is about

always preparing for some future time when you will know enough to actually do something, instead of doing things now. And that’s such a tedious approach for anybody to take to life—always preparing.

There have been several blog posts lately on the volume of balls in higher dimensions that correspond to the case p = 2. The formula above is valid for all p > 0.

Note that as p goes to ∞ the volume goes to 2^{n} because the terms involving gamma functions go to 1. This is as we’d expect since the unit “ball” in the infinity norm is a cube, two units wide on each side.

Travis Oliphant announced this morning at the SciPy 2010 conference that Microsoft is partnering with Enthought to produce a version of NumPy and SciPy for .NET. NumPy and SciPy are Python libraries for scientific computing. Oliphant is the president of Enthought and the original developer of NumPy.

It is possible to call NumPy and SciPy from IronPython now by using IronClad. However, going through IronClad can be inefficient. The new libraries will enable efficient access to NumPy and SciPy from .NET languages and in particular from IronPython.

Here is the official press release from Enthought. [Update: press release no longer available.]