Big data is getting a lot of buzz lately, but small data is interesting too. In some ways it’s more interesting. Because of limit theorems, a lot of things become dull in the large that are more interesting in the small.
When working with small data sets you have to accept that you will very often draw the wrong conclusion. You just can’t have high confidence in inference drawn from a small amount of data, unless you can do magic. But you do the best you can with what you have. You have to be content with the accuracy of your method relative to the amount of data available.
For example, a clinical trial may try to find the optimal dose of some new drug by giving the drug to only 30 patients. When you have five doses to test and only 30 patients, you’re just not going to find the right dose very often. You might want to assign 6 patients to each dose, but you can’t count on that. For safety reasons, you have to start at the lowest dose and work your way up cautiously, and that usually results in uneven allocation to doses, and thus less statistical power. And you might not treat all 30 patients. You might decide — possibly incorrectly — to stop the trial early because it appears that all doses are too toxic or ineffective. (This gives a glimpse of why testing drugs on people is a harder statistical problem than testing fertilizers on crops.)
Maybe your method finds the right answer 60% of the time, hardly a satisfying performance. But if alternative methods find the right answer 50% of the time under the same circumstances, your 60% looks great by comparison.
Related post: The law of medium numbers
While I was looking up the Tukey quote in my earlier post, I ran another of his quotes:
The test of a good procedure is how well it works, not how well it is understood.
At some level, it’s hard to argue against this. Statistical procedures operate on empirical data, so it makes sense that the procedures themselves be evaluated empirically.
But I question whether we really know that a statistical procedure works well if it isn’t well understood. Specifically, I’m skeptical of complex statistical methods whose only credentials are a handful of simulations. “We don’t have any theoretical results, buy hey, it works well in practice. Just look at the simulations.”
Every method works well on the scenarios its author publishes, almost by definition. If the method didn’t handle a scenario well, the author would publish a different scenario. Even if the author didn’t select the most flattering scenarios, he or she may simply not have considered unflattering scenarios. The latter is particularly understandable, almost inevitable.
Simulation results would have more credibility if an adversary rather than an advocate chose the scenarios. Even so, an adversary and an advocate may share the same blind spots and not explore certain situations. Unless there’s a way to argue that a set of scenarios adequately samples the space of possible inputs, it’s hard to have a great deal of confidence in a method based on simulation results alone.
Here are a couple new preprints.
A proposed method for limiting the size of runs in a response-adaptive clinical trial.
Skeptical and optimistic robust priors for clinical trials.
Joint work with Jairo Fúquene and Luis Pericchi from University of Puerto Rico.
In an interview on Biotech Nation, Gary Cupit made an offhand remark about why so many drugs list headache as a possible side-effect: clinical trial participants are often asked to abstain from coffee during the trial. That also explains why those who receive placebo often complain of headache as well.
Cupit’s company Somnus Therapeutics makes a sleep medication for people who have no trouble going to sleep but do have trouble staying asleep. The medication has a timed-release so that it is active only in the middle of the night when needed. One of the criteria by which the drug is evaluated is whether there is a lingering effect the next morning. Obviously researchers would like to eliminate coffee consumption as a confounding variable. But this contributes to the litany of side-effects that announcers must mumble in television commercials.
Back in March I wrote a blog post asking whether gaining weight makes you taller. Weight and height are clearly associated, and from that data alone one might speculate that gaining weight could make you taller. Of course causation is in the other direction: becoming taller generally makes you gain weight.
In the 1980’s, cardiologists discovered that patients with irregular heart beats for the first 12 days following a heart attack were much more likely to die. Antiarrythmic drugs became standard therapy. But in the next decade cardiologist discovered this was a bad idea. According to Philip Devereaux, “The trial didn’t just show that the drugs weren’t saving lives, it showed they were actually killing people.”
David Freedman relates the story above in his book Wrong. Freedman says
In fact, notes Devereaux, the drugs killed more Americans than the Vietnam War did—roughly an average of forty thousand a year died from the drugs in the United States alone.
Cardiologists had good reason to suspect that antiarrythmic drugs would save lives. In retrospect, it may be that heart-attack patients with poor prognosis have arrhythmia rather than arrhythmia causing poor prognosis. Or it may be that the association is more complicated than either explanation.
Medical experiments come under greater scrutiny than ordinary medical practice. There are good reasons for such precautions, but this leads to a sort of paradox. As Frederick Mosteller observed
We have a strange double standard now. As long as a physician treats a patient intending to cure, the treatment is admissible. When the object is to find out whether the treatment has value, the physician is immediately subject to many constraints.
If a physician has two treatment options, A and B, he can assign either treatment as long as he believes that one is best. But if he admits that he doesn’t know which is better and says he wants to treat some patients each way in order to get a better idea how they compare, then he has to propose a study and go through a long review processes.
I agree with Mosteller that we have a strange double standard, that a doctor is free to do what he wants as long as he doesn’t try to learn anything. On the other hand, review boards reduce the chances that patients will be asked to participate in ill-conceived experiments by looking for possible conflicts of interest, weaknesses in statistical design, etc. And such precautions are more necessary in experimental medicine than in more routine medicine. Still, there is more uncertainty in medicine than we may like to admit, and the line between “experimental” and “routine” can be fuzzy.
When people ask for a random sequence, they’re often disappointed with what they get.
Random sequences clump more than most folks expect. For graphical applications, quasi-random sequence may be more appropriate.These sequences are “more random than random” in the sense that they behave more like what some folks expect from randomness. They jitter around like a random sequence, but they don’t clump as much.
Researchers conducting clinical trials are dismayed when a randomized trial puts several patients in a row on the same treatment. They want to assign patients one at a time to one of two treatments with equal probability, but they also want the allocation to work out evenly. This is like saying you want to flip a coin 100 times, and you also want to get exactly 50 heads and 50 tails. You can’t guarantee both, but there are effective compromises.
One approach is to randomize in blocks. For example, you could randomize in blocks of 10 patients by taking a sequence of 5 A‘s and 5 B‘s and randomly permuting the 10 letters. This guarantees that the allocations will be balanced, but some outcomes will be predictable. At a minimum, the last assignment in each block is always predictable: you assign whatever is left. Assignments could be even more predictable: if you give n A‘s in a row in a block of 2n, you know the last n assignments will be all B‘s.
Another approach is to “encourage” balance rather than enforce it. When you’ve given more A‘s than B‘s you could increase the probability of assigning a B. The greater the imbalance, the more heavily you bias the randomization probability in favor of the treatment that has been assigned less. This is a sort of compromise between equal randomization and block randomization. All assignments are random, though some assignments may be more predictable than others. Large imbalances are less likely than with equal randomization, but more likely than with block randomization. You can tune how aggressively the method responds to imbalances in order to make the method more like equal randomization or more like block randomization.
No approach to randomization will satisfy everyone because there are conflicting requirements. Randomization is a dilemma to be managed rather than a problem to be solved.
My family loves the Little House on the Prairie books. We read them aloud to our three oldest children and we’re in the process of reading them with our fourth child. We just read the chapter describing when the entire Ingalls family came down with malaria, or “fever ‘n’ ague” as they called it.
The family had settled near a creek that was infested with mosquitoes. All the settlers around the creek bottoms came down with malaria, though at the time (circa 1870) they did not know the disease was transmitted by mosquitoes. One of the settlers, Mrs. Scott, believed that malaria was caused by eating the watermelons that grew in the creek bottoms. She had empirical evidence: everyone who had eaten the melons contracted malaria. Charles Ingalls thought that was ridiculous. After he recovered from his attack of malaria, he went down to the creek and brought back a huge watermelon and ate it. His reasoning was that “Everybody knows that fever ‘n’ ague comes from breathing the night air.”
It’s easy to laugh at Mrs. Scott and Mr. Ingalls. What ignorant, superstitious people. But they were no more ignorant than their contemporaries, and both had good reasons for their beliefs. Mrs. Scott had observational data on her side. Ingalls was relying on the accepted wisdom of his day. (After all, “malaria” means “bad air.”)
People used to believe all kinds of things that are absurd now, particularly in regard to medicine. But they were also right about many things that are hard to enumerate now because we take them for granted. Stories of conventional wisdom being correct are not interesting, unless there was some challenge to that wisdom. The easiest examples of folk wisdom to recall may be the instances in which science initially contradicted folk wisdom but later confirmed it. For example, we have come back to believing that breast milk is best for babies and that a moderate amount of sunshine is good for you.
The M. D. Anderson Cancer Center Department of Biostatistics has a software download site listing software developed by the department over many years.
The home page of the download site allows you to see all products sorted by date or by name. This page also allows search. A new page lets you see the software organized by tags.
Jon Udell’s latest Interviews with Innovators podcast features Randall Julian of Indigo BioSystems. I found this episode particularly interesting because it deals with issues I have some experience with.
The problems in managing biological data begin with how to store the raw experiment data. As Julian says
… without buying into all the hype around semantic web and so on, you would argue that a flexible schema makes more sense in a knowledge gathering or knowledge generation context than a fixed schema does.
So you need something less rigid than a relational database and something with more structure than a set of Excel spreadsheets. That’s not easy, and I don’t know whether anyone has come up with an optimal solution yet. Julian said that he has seen many attempts to put vast amounts of biological data into a rigid relational database schema but hasn’t seen this approach succeed yet. My experience has been similar.
Representing raw experimental data isn’t enough. In fact, that’s the easy part. As Jon Udell comments during the interview
It’s easy to represent data. It’s hard to represent the experiment.
That is, the data must come with ample context to make sense of the data. Julian comments that without this context, the data may as well be a list of zip codes. And not only must you capture experimental context, you must describe the analysis done to the data. (See, for example, this post about researchers making up their own rules of probability.)
Julian comments on how electronic data management is not nearly as common as someone unfamiliar with medical informatics might expect.
So right now maybe 50% of the clinical trials in the world are done using electronic data capture technology. … that’s the thing that maybe people don’t understand about health care and the life sciences in general is that there is still a huge amount of paper out there.
Part of the reason for so much paper goes back to the belief that one must choose between highly normalized relational data stores and unstructured files. Given a choice between inflexible bureaucracy and chaos, many people choose chaos. It may work about as well, and it’s much cheaper to implement. I’ve seen both extremes. I’ve also been part of a project using a flexible but structured approach that worked quite well.
A paper I wrote with Jairo Fúquene and Luis Pericchi is now available online.
A Case for Robust Bayesian Priors with Applications to Clinical Trials
Jairo Fúquene, John Cook, and Luis Pericchi
Bayesian Analysis (2009) 4, Number 4, pp. 817–846.
I recently ran across this quote from Mithat Gönen of Memorial Sloan-Kettering Cancer Center:
While there are certainly some at other centers, the bulk of applied Bayesian clinical trial design in this country is largely confined to a single zip code.
from “Bayesian clinical trials: no more excuses,” Clinical Trials 2009; 6; 203.
The zip code Gönen alludes to is 77030, the zip code of M. D. Anderson Cancer Center. I can’t say how much activity there is elsewhere, but certainly we design and conduct a lot of Bayesian clinical trials at MDACC.
Update: After over a decade working at MDACC, I left to start my own consulting business. If you’d like help with adaptive clinical trials please let me know.
I’m leaving today for San Juan. I’m giving a couple talks at a conference on clinical trials.
Puerto Rico is beautiful. (I want to say a “lovely island,” but then the song America from West Side Story gets stuck in my head.) Here are a couple photos from my last visit.
Jairo Fuquene has released an R package on CRAN to accompany our paper
A Case for Robust Bayesian priors with Applications to Binary Clinical Trials
Jairo A. Fuquene P., John D. Cook, Luis Raul Pericchi
Before I started working for a cancer center, I was not aware of the tension between science and medicine. Popular perception is that the two go together hand and glove, but that’s not always true.
Physicians are trained to use their subjective judgment and to be decisive. And for good reason: making a fairly good decision quickly is often better than making the best decision eventually. But scientists must be tentative, withhold judgment, and follow protocols.
Sometimes physician-scientists can reconcile their two roles, but sometimes they have to choose to wear one hat or the other at different times.
The physician-scientist tension is just one facet of the constant tension between treating each patient effectively and learning how to treat future patients more effectively. Sometimes the interests of current patients and future patients coincide completely, but not always.
This ethical tension is part of what makes biostatistics a separate field of statistics. In manufacturing, for example, you don’t need to balance the interests of current light bulbs and future light bulbs. If you need to destroy 1,000 light bulbs to find out how to make better bulbs in the future, no big deal. But different rules apply when experimenting on people. Clinical trials will often use statistical designs that sacrifice some statistical power in order to protect the people participating in the trial. Ethical constraints make biostatistics interesting.