Too easy

When people sneer at a technology for being too easy to use, it’s worth trying out.

If the only criticism is that something is too easy or “OK for beginners” then maybe it’s a threat to people who invested a lot of work learning to do things the old way.

The problem with the “OK for beginners” put-down is that everyone is a beginner sometimes. Professionals are often beginners because they’re routinely trying out new things. And being easier for beginners doesn’t exclude the possibility of being easier for professionals too.

Sometimes we assume that harder must be better. I know I do. For example, when I first used Windows, it was so much easier than Unix that I assumed Unix must be better for reasons I couldn’t articulate. I had invested so much work learning to use the Unix command line, it must have been worth it. (There are indeed advantages to doing some things from the command line, but not the work I was doing at the time.)

There often are advantages to doing things the hard way, but something isn’t necessarily better because it’s hard. The easiest tool to pick up may not be the best tool for long-term use, but then again it might be.

Most of the time you want to add the easy tool to your toolbox, not take the old one out. Just because you can use specialized professional tools doesn’t mean that you always have to.

Related post: Don’t be a technical masochist

Clinical trial software

This week’s resource post lists some of the projects I managed or contributed to while working at MD Anderson Cancer Center.

Last week’s resource post: Stand-alone numerical code

 

Finding the best dose

In a dose-finding clinical trial, you have a small number of doses to test, and you hope to find the one with the best response. Here “best” may mean most effective, least toxic, closest to a target toxicity, some combination of criteria, etc.

Since your goal is to find the best dose, it seems natural to compare dose-finding methods by how often they find the best dose. This is what is most often done in the clinical trial literature. But this seemingly natural criterion is actually artificial.

Suppose a trial is testing doses of 100, 200, 300, and 400 milligrams of some new drug. Suppose further that on some scale of goodness, these doses rank 0.1, 0.2, 0.5, and 0.51. (Of course these goodness scores are unknown; the point of the trial is to estimate them. But you might make up some values for simulation, pretending with half your brain that these are the true values and pretending with the other half that you don’t know what they are.)

Now suppose you’re evaluating two clinical trial designs, running simulations to see how each performs. The first design picks the 400 mg dose, the best dose, 20% of the time and picks the 300 mg dose, the second best dose, 50% of the time. The second design picks each dose with equal probability. The latter design picks the best dose more often, but it picks a good dose less often.

In this scenario, the two largest doses are essentially equally good; it hardly matters how often a method distinguishes between them. The first method picks one of the two good doses 70% of the time while the second method picks one of the two good doses only 50% of the time.

This example was exaggerated to make a point: obviously it doesn’t matter how often a method can pick the better of two very similar doses, not when it very often picks a bad dose. But there are less obvious situations that are quantitatively different but qualitatively the same.

The goal is actually to find a good dose. Finding the absolute best dose is impossible. The most you could hope for is that a method finds with high probability the best of the four arbitrarily chosen doses under consideration. Maybe the best dose is 350 mg, 843 mg, or some other dose not under consideration.

A simple way to make evaluating dose-finding methods less arbitrary would be to estimate the benefit to patients. Finding the best dose is only a matter of curiosity in itself unless you consider how that information is used. Knowing the best dose is important because you want to treat future patients as effectively as you can. (And patients in the trial itself as well, if it is an adaptive trial.)

Suppose the measure of goodness in the scenario above is probability of successful treatment and that 1,000 patients will be treated at the dose level picked by the trial. Under the first design, there’s a 20% chance that 51% of the future patients will be treated successfully, and a 50% chance that 50% will be. The expected number of successful treatments from the two best doses is 352. Under the second design, the corresponding number is 252.5.

(To simplify the example above, I didn’t say how often the first design picks each of the two lowest doses. But the first design will result in at least 382 expected successes and the second design 327.5.)
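Here’s a quick Python sketch of the arithmetic above, taking the goodness scores as success probabilities and 1,000 future patients as the horizon. The worst case for the first design assumes its unspecified remaining 30% all lands on the lowest dose.

# Success probability at 100, 200, 300, and 400 mg
p = [0.10, 0.20, 0.50, 0.51]
horizon = 1000  # future patients treated at the dose the trial picks

# First design: picks 400 mg 20% of the time, 300 mg 50% of the time.
d1_top_two = horizon * (0.20 * p[3] + 0.50 * p[2])  # 352.0
# Worst case: the remaining 30% of picks all go to the 100 mg dose.
d1_at_least = d1_top_two + horizon * 0.30 * p[0]    # 382.0

# Second design: picks each dose with probability 1/4.
d2_top_two = horizon * 0.25 * (p[3] + p[2])         # 252.5
d2_total = horizon * 0.25 * sum(p)                  # 327.5

print(d1_top_two, d1_at_least, d2_top_two, d2_total)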

You never know how many future patients will be treated according to the outcome of a clinical trial, but there must be some implicit estimate. If this estimate is zero, the trial is not worth conducting. In the example given here, the estimate of 1,000 future patients is irrelevant: the future patient horizon cancels out in a comparison of the two methods. The patient horizon matters when you want to include the benefit to patients in the trial itself. The patient horizon serves as a way to weigh the interests of current versus future patients, an ethically difficult comparison usually left implicit.

Career advice from Einstein

“If I would be a young man again and had to decide how to make my living, I would not try to become a scientist or scholar or teacher. I would rather choose to be a plumber or a peddler, in the hope to find that modest degree of independence still available under present circumstances.”

Albert Einstein, 1954

The opposite of an idiot

The origin of the word idiot is “one’s own,” the same root as idiom. So originally an idiot was someone in his own world, someone who takes no outside input. The historical meaning carries over to some degree: When you see a smart person do something idiotic, it’s usually because he’s acting alone.

The opposite of an idiot would not be someone who is wise, but someone who takes too much outside input, someone who passively lets others make all his decisions or who puts every decision to a vote.

An idiot lives only in his own world; the opposite of an idiot has no world of his own. Both are foolish, but I think the Internet encourages more of the latter kind of foolishness. It’s not turning people into idiots, it’s turning them into the opposite.

Successful companies with incompetent employees

It’s not hard to imagine how a company filled with great people can thrive. More intriguing are the companies that inspire Dilbert cartoons and yet manage to succeed. When a company thrives despite bad service and incompetent employees, they’re doing something right that isn’t obvious. Not everyone can be incompetent. Someone somewhere in the company must be very competent to keep it alive despite liabilities. Or at least there used to be someone very competent who set things in motion.

Maybe the company is good at attracting investors. Maybe they have enormous economies of scale that overcome their diseconomies of scale. Maybe they’ve lobbied politicians to protect the company from competition. (That’s a form of success, though not an honorable one.) Maybe they serve a market well, but you don’t see it because you’re not part of that market.

Stand-alone code for numerical computing

For this week’s resource post, see the page Stand-alone code for numerical computing. It points to small, self-contained bits of code for special functions (log gamma, erf, etc.) and for random number generation (normal, Poisson, gamma, etc.).

The code is available in Python, C++, and C# versions. It could easily be translated into other languages since it hardly uses any language-specific features.

I wrote these functions for projects where you don’t have a numerical library available or would like to minimize dependencies. If you have access to a numerical library, such as SciPy in Python, then by all means use it (although SciPy is missing some of the random number generators provided here). In C++ and especially C#, it’s harder to find some of this functionality.

Last week: Code Project articles

Next week: Clinical trial software

Random walks and the arcsine law

Suppose you stand at 0 and flip a fair coin. If the coin comes up heads, you take a step to the right. Otherwise you take a step to the left. How much of the time will you spend to the right of where you started?

As the number of steps N goes to infinity, the probability that the proportion of your time in positive territory is less than x approaches 2 arcsin(√x)/π. The arcsine term gives this rule its name, the arcsine law.

Here’s a little Python script to illustrate the arcsine law.

import random
from numpy import arcsin, pi, sqrt

def step():
    u = random.random()
    return 1 if u < 0.5 else -1

M = 1000 # number of random walks to simulate
N = 1000 # steps per walk

x = 0.3 # Use any 0 < x < 1 you'd like.

outer_count = 0
for _ in range(M):
    position = 0
    inner_count = 0  # steps spent in positive territory
    for __ in range(N):
        position += step()
        if position > 0:
            inner_count += 1
    if inner_count/N < x:
        outer_count += 1

print(outer_count/M)         # simulated probability
print(2*arcsin(sqrt(x))/pi)  # arcsine law prediction

Playing with continued fractions and Khinchin’s constant

Take a real number x and expand it as a continued fraction. Compute the geometric mean of the first n coefficients.

Aleksandr Khinchin proved that for almost all real numbers x, as n → ∞ the geometric means converge. Not only that, they converge to the same constant, known as Khinchin’s constant, 2.685452001…. (“Almost all” here means in the sense of measure theory: the set of real numbers that are exceptions to Khinchin’s theorem has measure zero.)

To get an idea how fast this convergence is, let’s start by looking at the continued fraction expansion of π. In Sage, we can type

continued_fraction(RealField(100)(pi))

to get the continued fraction coefficients

[3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2, 1, 1, 15, 3]

for π to 100 decimal places. The geometric mean of these coefficients is 2.84777288486, which only matches Khinchin’s constant to 1 significant figure.

Let’s try choosing random numbers and working with more decimal places.

There may be a more direct way to find geometric means in Sage, but here’s a function I wrote. It leaves off any leading zeros that would cause the geometric mean to be zero.

from numpy import exp, mean, log
def geometric_mean(x):
    return exp( mean([log(k) for k in x if k > 0]) )

Now let’s find 10 random numbers to 1,000 decimal places.

for _ in range(10):
    r = RealField(1000).random_element(0,1)
    print(geometric_mean(continued_fraction(r)))

This produced

2.66169890535
2.62280675227
2.61146463641
2.58515620064
2.58396664032
2.78152297661
2.55950338205
2.86878898900
2.70852612496
2.52689450535

Three of these agree with Khinchin’s constant to two significant figures but the rest agree only to one. Apparently the convergence is very slow.

If we go back to π, this time looking out 10,000 decimal places, we get a little closer:

print(geometric_mean(continued_fraction(RealField(10000)(pi))))

produces 2.67104567579, which differs from Khinchin’s constant by about 0.5%.

Grand unification of mathematics

Greg Egan’s short story Glory features a “xenomathematician” who discovers that an ancient civilization had produced a sort of grand unification of their various branches of mathematics.

It was not a matter of everything in mathematics collapsing in on itself, with one branch turning out to have been merely a recapitulation of another under a different guise. Rather, the principle was that every sufficiently beautiful mathematical system was rich enough to mirror in part — and sometimes in a complex and distorted fashion — every other sufficiently beautiful system. Nothing became sterile and redundant, nothing proved to have been a waste of time, but everything was shown to be magnificently intertwined.

Natural optima occur in the middle

Akin’s eighth law of spacecraft design says

In nature, the optimum is almost always in the middle somewhere. Distrust assertions that the optimum is at an extreme point.

When I first read this I immediately thought of several examples where theory said that an optimum was at an extreme, but experience said otherwise.

Linear programming (LP) says the opposite of Akin’s law. The optimal point for a linear objective function subject to linear constraints is always at an extreme point. The constraints form a many-sided shape—you could think of it as something like a diamond—and the optimal point will always be at one of the corners.
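To make this concrete, here’s a small Python sketch using SciPy’s linprog (my illustration, not something from Akin): maximize 2x + y over the diamond |x| + |y| ≤ 1, and the optimum lands on a corner.

from scipy.optimize import linprog

# Maximize 2x + y over the diamond |x| + |y| <= 1.
# linprog minimizes, so negate the objective.
c = [-2, -1]
A_ub = [[ 1,  1],   # the four faces of the diamond
        [ 1, -1],
        [-1,  1],
        [-1, -1]]
b_ub = [1, 1, 1, 1]

# Default bounds restrict variables to be nonnegative; remove that.
result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2)
print(result.x)  # [1. 0.], a corner of the diamond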

Nature is not linear, though it is often approximately linear over some useful range. One way to read Akin’s law is to say that even when something is approximately linear in the middle, there’s enough non-linearity at the extremes to pull the optimal points back from the edges. Or said another way, when you have an optimal value at an extreme point, there may be some unrealistic aspect of your model that pushed the optimal point out to the boundary.

Related post: Data calls the model’s bluff

“MTD” is misleading

Dose-finding trials of chemotherapy agents look for the MTD: maximum tolerated dose. The idea is to give patients as much chemotherapy as they can tolerate, hoping to do maximum damage to tumors without doing too much damage to patients.

But “maximum tolerated dose” implies a degree of personalization that rarely exists in clinical trials. Phase I chemotherapy trials don’t try to find the maximum dose that any particular patient can tolerate. They try to find a dose that is toxic to a certain percentage of the trial participants, say 30%. (This rate may seem high, but it’s typical. It’s not far from the toxicity rate implicit in the so-called 3+3 rule or from the explicit rate given in many CRM (“continual reassessment method”) designs.)

It’s tempting to think of “30% toxicity rate” as meaning that each patient experiences a 30% toxic reaction. But that’s not what it means. It means that each patient has a 30% chance of a toxicity, however toxicity is defined in a particular trial. If toxicity were defined as kidney failure, for example, then 30% toxicity rate means that each patient has a 30% probability of kidney failure, not that they should expect a 30% reduction in kidney function.
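A short simulation makes the distinction concrete. In this sketch (with illustrative numbers), each patient’s outcome is all-or-nothing, and the 30% emerges as the fraction of patients affected.

import random

p_tox = 0.30  # each patient's probability of a dose-limiting toxicity
n = 10000     # illustrative number of patients

# Each outcome is binary: a patient either has a toxicity or doesn't.
toxicities = sum(random.random() < p_tox for _ in range(n))

# Prints roughly 0.30: the fraction of patients with a toxicity,
# not a 30% partial reaction in each patient.
print(toxicities / n)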


Decide what to abandon

Sometimes it’s rational to walk away from something you’ve invested a great deal in.

It’s hard to imagine how investors could abandon something as large and expensive as a shopping mall. And yet it must have been a sensible decision. If anyone disagreed, they could buy the abandoned mall on the belief that they could make a profit.

The idea that you should stick to something just because you’ve invested in it goes by many names: sunk cost fallacy, escalation of commitment, gambler’s ruin, etc. If further investment will simply lose more money, there’s no economic reason to continue investing, regardless of how much money you’ve spent. (There may be non-economic reasons. You may have a moral obligation to fulfill a commitment or to clean up a mess you’ve made.)

Most of us have not faced the temptation to keep investing in an unprofitable shopping mall, but everyone is tempted by the sunk cost fallacy in other forms: finishing a novel you don’t enjoy reading, holding on to hopelessly tangled software, trying to earn a living with a skill people no longer wish to pay for, etc.

According to Peter Drucker, “It cannot be said often enough that one should not postpone; one abandons.”

The first step in a growth policy is not to decide where and how to grow. It is to decide what to abandon. In order to grow, a business must have a systematic policy to get rid of the outgrown, the obsolete, the unproductive.

It’s usually more obvious what someone else should abandon than what we should abandon. Smart businesses turn to outside consultants for such advice. Smart individuals turn to trusted friends. An objective observer without our emotional investment can see things more clearly than we can.

***

This post started out as a shorter post on Google+.


Code Project articles

This week’s resource post lists some articles along with source code I’ve posted on CodeProject.

Probability

Pitfalls in Random Number Generation includes several lessons learned the hard way.

Simple Random Number Generation is a random number generator written in C# based on George Marsaglia’s MWC (multiply-with-carry) algorithm.

Finding probability distribution parameters from percentiles

Numerical computing

Avoiding Overflow, Underflow, and Loss of Precision explains why the most obvious method for evaluating mathematical functions may not work. The article includes C++ source code for evaluating some functions that come up in statistics (particularly logistic regression) that could have problems if naïvely implemented.

An introduction to numerical programming in C#

Five tips for floating point programming gives five of the most important things someone needs to know when working with floating point numbers.

Optimizing a function of one variable with Brent’s method.

Fast numerical integration using the double-exponential transform method. Optimally efficient numerical integration for analytic functions over a finite interval.

Three methods for root-finding in C#

Getting started with the SciPy (Scientific Python) library

Numerical integration with Simpson’s rule

Filling in the gaps: simple interpolation discusses linear interpolation and inverse interpolation, and gives some suggestions for what to do next if linear interpolation isn’t adequate.

PowerShell

Automated Extract and Build from Team System using PowerShell explains a PowerShell script to automatically extract and build Visual Studio projects from Visual Studio Team System (VSTS) version control.

PowerShell Script for Reviewing Text Shown to Users is a tool for finding errors in prose displayed to users that might not be exposed during testing.

Monitoring unreliable scheduled tasks describes a simple program for monitoring legacy processes.

C++

Calculating percentiles in memory-bound applications gives an algorithm and C++ code for calculating percentiles of a list too large to fit into memory.

Quick Start for C++ TR1 Regular Expressions answers 10 of the first questions that are likely to come to mind when someone wants to use the new regular expression support in C++.

Resource series

Last week: Miscellaneous math notes

Next week: Stand-alone numerical code

Perl regex twitter account

I’ve started a new Twitter account @PerlRegex for Perl regular expressions. My original account, @RegexTip, is for regular expressions in general and doesn’t go into much detail regarding any particular implementation. @PerlRegex goes into the specifics of regular expressions in Perl.

Why specifically Perl regular expressions? Because Perl has the most powerful support for regular expressions (strictly speaking, “pattern matching”). Other languages offer “Perl compatible” regular expressions, though the degree of compatibility varies and is always less than complete.

I imagine more people have ruled England than have mastered the whole of the Perl language. But it’s possible to use Perl for regular expression processing without learning too much of the wider language.
