From the category archives:

Clinical trials

Something like a random sequence but …

by John on February 24, 2010

When people ask for a random sequence, they’re often disappointed with what they get.

Random sequences clump more than most folks expect. For graphical applications, quasi-random sequences may be more appropriate. These sequences are “more random than random” in the sense that they behave more like what some folks expect from randomness. They jitter around like a random sequence, but they don’t clump as much.

Researchers conducting clinical trials are dismayed when a randomized trial puts several patients in a row on the same treatment. They want to assign patients one at a time to one of two treatments with equal probability, but they also want the allocation to work out evenly. This is like saying you want to flip a coin 100 times, and you also want to get exactly 50 heads and 50 tails. You can’t guarantee both, but there are effective compromises.
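To put rough numbers on the coin analogy, here is a minimal Python sketch. It computes the exact chance of a perfectly even split and estimates, by simulation, how long the longest run of identical assignments tends to be under simple equal randomization. The function names and the simulation settings are my own choices for illustration, not part of any trial software.

```python
import math
import random


def prob_exact_balance(n_flips: int) -> float:
    """Probability that n_flips fair coin flips give exactly n_flips/2 heads."""
    if n_flips % 2:
        return 0.0  # an odd number of flips can never split evenly
    return math.comb(n_flips, n_flips // 2) / 2 ** n_flips


def longest_run(seq) -> int:
    """Length of the longest run of identical consecutive items."""
    best = current = 0
    previous = object()  # sentinel that compares unequal to every item
    for item in seq:
        current = current + 1 if item == previous else 1
        best = max(best, current)
        previous = item
    return best


def average_longest_run(n_patients: int, n_trials: int, seed: int = 0) -> float:
    """Average longest run of one treatment under equal randomization."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        arms = [rng.choice("AB") for _ in range(n_patients)]
        total += longest_run(arms)
    return total / n_trials


if __name__ == "__main__":
    print(f"P(exactly 50 heads in 100 flips) = {prob_exact_balance(100):.4f}")
    print(f"Average longest run in 100 assignments = {average_longest_run(100, 2000):.2f}")
```

The exact-balance probability for 100 flips comes out to roughly 8%, so a perfectly even allocation is the exception rather than the rule.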

One approach is to randomize in blocks. For example, you could randomize in blocks of 10 patients by taking a sequence of 5 A’s and 5 B’s and randomly permuting the 10 letters. This guarantees that the allocations will be balanced, but some outcomes will be predictable. At a minimum, the last assignment in each block is always predictable: you assign whatever is left. Assignments could be even more predictable: if the first n assignments in a block of 2n are all A’s, you know the last n assignments will all be B’s.
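Block randomization as described above can be sketched in a few lines of Python. The function name, the default block size, and the choice to truncate the final block when the number of patients is not a multiple of the block size are assumptions for illustration.

```python
import random


def block_randomize(n_patients: int, block_size: int = 10, seed=None) -> list:
    """Assign patients to arms A and B in randomly permuted, balanced blocks."""
    if block_size <= 0 or block_size % 2:
        raise ValueError("block_size must be a positive even number")
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_patients:
        # Each block holds equal numbers of A's and B's in random order.
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        assignments.extend(block)
    # Truncating the final block can leave a small imbalance at the end.
    return assignments[:n_patients]


if __name__ == "__main__":
    print("".join(block_randomize(30, block_size=10, seed=1)))
```

Every complete block is balanced by construction, which is also why the tail of each block can be guessed by anyone who has kept count.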

Another approach is to “encourage” balance rather than enforce it. When you’ve given more A’s than B’s you could increase the probability of assigning a B. The greater the imbalance, the more heavily you bias the randomization probability in favor of the treatment that has been assigned less. This is a sort of compromise between equal randomization and block randomization. All assignments are random, though some assignments may be more predictable than others. Large imbalances are less likely than with equal randomization, but more likely than with block randomization. You can tune how aggressively the method responds to imbalances in order to make the method more like equal randomization or more like block randomization.
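One way to make this concrete is the sketch below. The post does not give a formula, so the logistic rule here is purely an assumption for illustration: the probability of assigning A shrinks as A gets ahead of B, and a hypothetical `strength` parameter controls how aggressively. With `strength = 0` you get equal randomization. As `strength` grows, the scheme approaches strict alternation whenever there is an imbalance.

```python
import math
import random


def prob_assign_a(n_a: int, n_b: int, strength: float) -> float:
    """Probability of assigning A given counts so far (assumed logistic rule)."""
    # Clamp the exponent so math.exp cannot overflow for extreme inputs.
    z = max(-700.0, min(700.0, strength * (n_a - n_b)))
    return 1.0 / (1.0 + math.exp(z))


def encouraged_balance(n_patients: int, strength: float, seed=None) -> list:
    """Randomize patients one at a time, favoring the arm that is behind."""
    rng = random.Random(seed)
    n_a = n_b = 0
    assignments = []
    for _ in range(n_patients):
        if rng.random() < prob_assign_a(n_a, n_b, strength):
            assignments.append("A")
            n_a += 1
        else:
            assignments.append("B")
            n_b += 1
    return assignments


def final_imbalance(assignments) -> int:
    """Number of A's minus number of B's."""
    return assignments.count("A") - assignments.count("B")


if __name__ == "__main__":
    for strength in (0.0, 0.2, 1.0):
        arms = encouraged_balance(100, strength, seed=42)
        print(strength, final_imbalance(arms))
```

Every assignment remains random, but when the imbalance is large the next assignment becomes easier to guess, which is the trade-off described above.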

No approach to randomization will satisfy everyone because there are conflicting requirements. Randomization is a dilemma to be managed rather than a problem to be solved.

Related posts:

Quasi-random sequences in art and integration
Three ways of tuning an adaptively randomized trial
Population drift
Galen and clinical trials

{ 0 comments }

Malaria on the prairie

by John on February 9, 2010

My family loves the Little House on the Prairie books. We read them aloud to our three oldest children and we’re in the process of reading them with our fourth child. We just read the chapter describing when the entire Ingalls family came down with malaria, or “fever ‘n’ ague” as they called it.

The family had settled near a creek that was infested with mosquitoes. All the settlers around the creek bottoms came down with malaria, though at the time (circa 1870) they did not know the disease was transmitted by mosquitoes. One of the settlers, Mrs. Scott, believed that malaria was caused by eating the watermelons that grew in the creek bottoms. She had empirical evidence: everyone who had eaten the melons contracted malaria. Charles Ingalls thought that was ridiculous. After he recovered from his attack of malaria, he went down to the creek and brought back a huge watermelon and ate it. His reasoning was that “Everybody knows that fever ‘n’ ague comes from breathing the night air.”

It’s easy to laugh at Mrs. Scott and Mr. Ingalls. What ignorant, superstitious people. But they were no more ignorant than their contemporaries, and both had good reasons for their beliefs. Mrs. Scott had observational data on her side. Ingalls was relying on the accepted wisdom of his day. (After all, “malaria” means “bad air.”)

People used to believe all kinds of things that are absurd now, particularly in regard to medicine. But they were also right about many things that are hard to enumerate now because we take them for granted. Stories of conventional wisdom being correct are not interesting, unless there was some challenge to that wisdom. The easiest examples of folk wisdom to recall may be the instances in which science initially contradicted folk wisdom but later confirmed it. For example, we have come back to believing that breast milk is best for babies and that a moderate amount of sunshine is good for you.

Related posts:

A little coffee on the prairie
Galen and clinical trials
Randomized trials of parachute use

{ 3 comments }

Biostatistics software

by John on January 13, 2010

The M. D. Anderson Cancer Center Department of Biostatistics has a software download site listing software developed by the department over many years.

The home page of the download site allows you to see all products sorted by date or by name. This page also allows search. A new page lets you see the software organized by tags.

{ 1 comment }

Managing biological data

by John on December 14, 2009

Jon Udell’s latest Interviews with Innovators podcast features Randall Julian of Indigo BioSystems. I found this episode particularly interesting because it deals with issues I have some experience with.

The problems in managing biological data begin with how to store the raw experiment data. As Julian says:

… without buying into all the hype around semantic web and so on, you would argue that a flexible schema makes more sense in a knowledge gathering or knowledge generation context than a fixed schema does.

So you need something less rigid than a relational database and something with more structure than a set of Excel spreadsheets. That’s not easy, and I don’t know whether anyone has come up with an optimal solution yet. Julian said that he has seen many attempts to put vast amounts of biological data into a rigid relational database schema but hasn’t seen this approach succeed yet. My experience has been similar.

Representing raw experimental data isn’t enough. In fact, that’s the easy part. As Jon Udell comments during the interview:

It’s easy to represent data. It’s hard to represent the experiment.

That is, the data must come with ample context to make sense of the data. Julian comments that without this context, the data may as well be a list of zip codes. And not only must you capture experimental context, you must describe the analysis done to the data. (See, for example, this post about researchers making up their own rules of probability.)

Julian comments on how electronic data management is not nearly as common as someone unfamiliar with medical informatics might expect.

So right now maybe 50% of the clinical trials in the world are done using electronic data capture technology. … that’s the thing that maybe people don’t understand about health care and the life sciences in general is that there is still a huge amount of paper out there.

Part of the reason for so much paper goes back to the belief that one must choose between highly normalized relational data stores and unstructured files. Given a choice between inflexible bureaucracy and chaos, many people choose chaos. It may work about as well, and it’s much cheaper to implement. I’ve seen both extremes. I’ve also been part of a project using a flexible but structured approach that worked quite well.

Related posts:

Posts on reproducibility
Problems versus dilemmas
Blogging about reproducible research

{ 9 comments }

A case for robust Bayesian priors

by John on November 30, 2009

A paper I wrote with Jairo Fúquene and Luis Pericchi is now available online.

A Case for Robust Bayesian Priors with Applications to Clinical Trials
Jairo Fúquene, John Cook, and Luis Pericchi
Bayesian Analysis (2009) 4, Number 4, pp. 817–846.

{ 0 comments }

Bayesian clinical trials in one zip code

by John on October 27, 2009

I recently ran across this quote from Mithat Gönen of Memorial Sloan-Kettering Cancer Center:

While there are certainly some at other centers, the bulk of applied Bayesian clinical trial design in this country is largely confined to a single zip code.

from “Bayesian clinical trials: no more excuses,” Clinical Trials 2009; 6; 203.

The zip code Gönen alludes to is 77030, the zip code of M. D. Anderson Cancer Center. I can’t say how much activity there is elsewhere, but certainly we design and conduct a lot of Bayesian clinical trials at MDACC.

Related posts:

Cartoon guide to cancer research
Stopping trials of ineffective drugs sooner
Three ways of tuning an adaptively randomized clinical trial
Population drift

{ 1 comment }

Off to Puerto Rico

by John on May 24, 2009

I’m leaving today for San Juan. I’m giving a couple talks at a conference on clinical trials.

Puerto Rico is beautiful. (I want to say a “lovely island,” but then the song America from West Side Story gets stuck in my head.) Here are a couple photos from my last visit.

{ 1 comment }

R package for robust priors

by John on May 11, 2009

Jairo Fuquene has released an R package on CRAN to accompany our paper

A Case for Robust Bayesian priors with Applications to Binary Clinical Trials
Jairo A. Fuquene P., John D. Cook, Luis Raul Pericchi

{ 2 comments }

Science versus medicine

by John on April 8, 2009

Before I started working for a cancer center, I was not aware of the tension between science and medicine. Popular perception is that the two go together hand in glove, but that’s not always true.

Physicians are trained to use their subjective judgment and to be decisive. And for good reason: making a fairly good decision quickly is often better than making the best decision eventually. But scientists must be tentative, withhold judgment, and follow protocols.

Sometimes physician-scientists can reconcile their two roles, but sometimes they have to choose to wear one hat or the other at different times.

The physician-scientist tension is just one facet of the constant tension between treating each patient effectively and learning how to treat future patients more effectively. Sometimes the interests of current patients and future patients coincide completely, but not always.

This ethical tension is part of what makes biostatistics a separate field of statistics. In manufacturing, for example, you don’t need to balance the interests of current light bulbs and future light bulbs. If you need to destroy 1,000 light bulbs to find out how to make better bulbs in the future, no big deal. But different rules apply when experimenting on people. Clinical trials will often use statistical designs that sacrifice some statistical power in order to protect the people participating in the trial. Ethical constraints make biostatistics interesting.

{ 2 comments }

Probability that a study result is true

by John on November 24, 2008

Suppose a new study comes out saying a drug or a food or a habit lowers your risk of some disease. What is the probability that the study’s result is correct? Obviously this is a very important question, but one that is not raised often enough.


{ 3 comments }

Sometimes it’s right under your nose

by John on October 7, 2008

Neptune was discovered in 1846. But Galileo’s notebooks describe a “star” he saw on 28 December 1612 and 2 January 1613 that we now know was Neptune. Galileo even noticed that his star was in a slightly different location for his two observations, but he chalked the difference up to observational error.

The men who discovered Neptune were not the first to see it; they were the first to realize what they were looking at.

Voyager 2 photo of Neptune via Wikipedia

{ 0 comments }

How to pick simulation scenarios

by John on October 6, 2008

People new to simulation start by picking scenarios based on what they hope will happen. That’s OK, but it’s more important to pick scenarios that you expect are likely to happen or fear might happen.

{ 0 comments }

Drug looks promising, come back in 30 years

by John on September 7, 2008

The most recent 60-Second Science podcast summarizes a paper in Science magazine reporting that the average interval between a drug being deemed “promising” and the first paper appearing showing clinical effectiveness is 24 years.

Note that the publication of a paper saying a drug is clinically effective is a far cry from regulatory approval. Many new drugs that look like an improvement after a phase II trial turn out to be no better than existing treatments, and those that really are better take years to achieve regulatory approval.

See also

False positives for medical papers
Most published research results are false

{ 0 comments }

The previous posts in this series have looked at P(X > Y), the probability that a sample from a random variable X is greater than a sample from an independent random variable Y. In applications, X and Y have different distributions but come from the same distribution family.

Sometimes applications require computing P(X > max(Y, Z)). For example, an adaptively randomized trial of three treatments may be designed to assign a treatment with probability equal to the probability that that treatment has the best response. In a trial with a binary outcome, the variables X, Y, and Z may be beta random variables representing the probability of response. In a trial with a time-to-event outcome, the variables might be gamma random variables representing survival time.
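For the binary-outcome case, P(X > max(Y, Z)) can be estimated by plain simulation. The sketch below assumes each arm's response probability has a beta distribution with known parameters and counts how often each arm produces the largest draw. The function name, the example parameters, and the number of draws are hypothetical, and this is not how any particular trial software computes the quantity.

```python
import random


def prob_each_arm_best(beta_params, n_draws: int = 100_000, seed=None) -> list:
    """Monte Carlo estimate of the probability that each arm has the best response.

    beta_params is a list of (a, b) pairs, one beta(a, b) distribution per arm.
    """
    rng = random.Random(seed)
    wins = [0] * len(beta_params)
    for _ in range(n_draws):
        draws = [rng.betavariate(a, b) for a, b in beta_params]
        wins[draws.index(max(draws))] += 1
    return [w / n_draws for w in wins]


if __name__ == "__main__":
    # Hypothetical example: three arms with different beta distributions.
    arms = [(8, 4), (5, 7), (6, 6)]
    probs = prob_each_arm_best(arms, seed=0)
    print([round(p, 3) for p in probs])
```

The returned values sum to 1, so they could be used directly as randomization probabilities for the next patient. The same function works for any number of arms.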

Sometimes we’re interested in the opposite inequality, P(X < min(Y,Z)). This would be the case if we thought in terms of failures rather than responses, or wanted to minimize the time to a desirable event rather than maximizing the time to an undesirable event.

The maximum and minimum inequalities are related by the following equation:

P(X < min(Y,Z)) = P(X > max(Y, Z)) + 1 – P(X > Y) – P(X > Z).
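The identity follows from inclusion-exclusion. Here is the derivation written out, assuming X, Y, and Z are continuous so that ties have probability zero (true for the beta and gamma cases above):

```latex
\begin{align*}
P(X < \min(Y, Z)) &= P(X < Y \text{ and } X < Z) \\
                  &= 1 - P(X > Y \text{ or } X > Z) \\
                  &= 1 - P(X > Y) - P(X > Z) + P(X > Y \text{ and } X > Z) \\
                  &= P(X > \max(Y, Z)) + 1 - P(X > Y) - P(X > Z).
\end{align*}
```

The last step uses the fact that X exceeds both Y and Z exactly when X exceeds max(Y, Z).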

These inequalities are used for safety monitoring rules as well as to determine randomization probabilities. In a trial seeking to maximize responses, a treatment arm X might be dropped if P(X > max(Y,Z)) becomes too small.

In principle one could design an adaptively randomized trial with n treatment arms for any n ≥ 2 based on P(X1 > max(X2, …, Xn)). In practice, the most common value of n by far is 2. Sometimes n is 3. I’m not familiar with an adaptively randomized trial with more than three arms. I’ve heard of an adaptively randomized trial that was designed with five arms, but I don’t believe the trial ran.

Computing P(X1 > max(X2, …, Xn)) by numerical integration becomes more difficult as n increases. For large n, simulation may be more efficient than integration. Computing P(X1 > max(X2, …, Xn)) for gamma random variables with n=3 was unacceptably slow in a previous version of our adaptive randomization software. The search for a faster algorithm led to this paper: Numerical Evaluation of Gamma Inequalities.

Previous posts on random inequalities:

Introduction
Analytical results
Numerical results
Cauchy distributions
Beta distributions
Gamma distributions

{ 0 comments }

Random inequalities VI: gamma distributions

by John on August 30, 2008

This post looks at computing P(X > Y) where X and Y are gamma random variables. These inequalities are central to the Thall-Wooten method of monitoring single-arm clinical trials with time-to-event outcomes. They also are central to adaptively randomized clinical trials with time-to-event outcomes.

When X and Y are gamma random variables, P(X > Y) can be computed in terms of the incomplete beta function. Suppose X has shape α_X and scale β_X, and Y has shape α_Y and scale β_Y. Then β_X Y/(β_X Y + β_Y X) has a beta(α_Y, α_X) distribution. (This result is well-known in the special case of the scale parameters both equal to 1. I wrote up the more general result here but I don’t imagine I was the first to stumble on the generalization.) Since X > Y exactly when this ratio falls below β_X/(β_X + β_Y), it follows that

P(X > Y) = P(B < β_X/(β_X + β_Y))

where B is a beta(α_Y, α_X) random variable.
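As a sanity check, the sketch below compares two Monte Carlo estimates: P(X > Y) from gamma draws directly, and P(B < β_X/(β_X + β_Y)) from draws of B ~ beta(α_Y, α_X). It assumes the shape and scale parameterization, which is also what Python's `random.gammavariate` uses. The function names and parameter values are made up for illustration. A production implementation would evaluate the incomplete beta function rather than simulate.

```python
import random


def gamma_inequality_mc(shape_x, scale_x, shape_y, scale_y,
                        n_draws: int = 200_000, seed=None) -> float:
    """Estimate P(X > Y) directly by drawing gamma variates."""
    rng = random.Random(seed)
    hits = sum(
        rng.gammavariate(shape_x, scale_x) > rng.gammavariate(shape_y, scale_y)
        for _ in range(n_draws)
    )
    return hits / n_draws


def gamma_inequality_via_beta(shape_x, scale_x, shape_y, scale_y,
                              n_draws: int = 200_000, seed=None) -> float:
    """Estimate P(B < scale_x / (scale_x + scale_y)), B ~ beta(shape_y, shape_x)."""
    rng = random.Random(seed)
    threshold = scale_x / (scale_x + scale_y)
    hits = sum(rng.betavariate(shape_y, shape_x) < threshold for _ in range(n_draws))
    return hits / n_draws


if __name__ == "__main__":
    params = (2.0, 3.0, 4.0, 1.0)  # shape_x, scale_x, shape_y, scale_y
    print(gamma_inequality_mc(*params, seed=1))
    print(gamma_inequality_via_beta(*params, seed=2))
```

When both shapes equal 1, X and Y are exponential with means β_X and β_Y, and B is uniform, so both sides reduce to β_X/(β_X + β_Y). That gives a quick closed-form check on the direction of the inequality.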

For more details, see Numerical evaluation of gamma inequalities.

Previous posts on random inequalities:

Introduction
Analytical results
Numerical results
Cauchy distributions
Beta distributions

{ 0 comments }