Biostatistics software

The M. D. Anderson Cancer Center Department of Biostatistics has a software download site listing software developed by the department over many years.

The home page of the download site allows you to see all products sorted by date or by name. This page also allows search. A new page lets you see the software organized by tags.

Related: Biostatistics consultant

Managing biological data

Jon Udell’s latest Interviews with Innovators podcast features Randall Julian of Indigo BioSystems. I found this episode particularly interesting because it deals with issues I have some experience with.

The problems in managing biological data begin with how to store the raw experiment data. As Julian says:

… without buying into all the hype around semantic web and so on, you would argue that a flexible schema makes more sense in a knowledge gathering or knowledge generation context than a fixed schema does.

So you need something less rigid than a relational database and something with more structure than a set of Excel spreadsheets. That’s not easy, and I don’t know whether anyone has come up with an optimal solution yet. Julian said that he has seen many attempts to put vast amounts of biological data into a rigid relational database schema but hasn’t seen this approach succeed yet. My experience has been similar.

Representing raw experimental data isn’t enough. In fact, that’s the easy part. As Jon Udell comments during the interview:

It’s easy to represent data. It’s hard to represent the experiment.

That is, the data must come with ample context to make sense of it. Julian comments that without this context, the data may as well be a list of zip codes. And not only must you capture experimental context, you must describe the analysis done to the data. (See, for example, this post about researchers making up their own rules of probability.)

Julian comments on how electronic data management is not nearly as common as someone unfamiliar with medical informatics might expect.

So right now maybe 50% of the clinical trials in the world are done using electronic data capture technology. … that’s the thing that maybe people don’t understand about health care and the life sciences in general is that there is still a huge amount of paper out there.

Part of the reason for so much paper goes back to the belief that one must choose between highly normalized relational data stores and unstructured files. Given a choice between inflexible bureaucracy and chaos, many people choose chaos. It may work about as well, and it’s much cheaper to implement. I’ve seen both extremes. I’ve also been part of a project using a flexible but structured approach that worked quite well.

Off to Puerto Rico

I’m leaving today for San Juan. I’m giving a couple talks at a conference on clinical trials.

Puerto Rico is beautiful. (I want to say a “lovely island,” but then the song America from West Side Story gets stuck in my head.) Here are a couple photos from my last visit.

Science versus medicine

Before I started working for a cancer center, I was not aware of the tension between science and medicine. Popular perception is that the two go together hand in glove, but that’s not always true.

Physicians are trained to use their subjective judgment and to be decisive. And for good reason: making a fairly good decision quickly is often better than making the best decision eventually. But scientists must be tentative, withhold judgment, and follow protocols.

Sometimes physician-scientists can reconcile their two roles, but sometimes they have to choose to wear one hat or the other at different times.

The physician-scientist tension is just one facet of the constant tension between treating each patient effectively and learning how to treat future patients more effectively. Sometimes the interests of current patients and future patients coincide completely, but not always.

This ethical tension is part of what makes biostatistics a separate field of statistics. In manufacturing, for example, you don’t need to balance the interests of current light bulbs and future light bulbs. If you need to destroy 1,000 light bulbs to find out how to make better bulbs in the future, no big deal. But different rules apply when experimenting on people. Clinical trials will often use statistical designs that sacrifice some statistical power in order to protect the people participating in the trial. Ethical constraints make biostatistics interesting.

Probability that a study result is true

Suppose a new study comes out saying a drug or a food or a habit lowers your risk of some disease. What is the probability that the study’s result is correct? Obviously this is a very important question, but one that is not raised often enough.

I’ve referred to a paper by John Ioannidis (*) several times before, but I haven’t gone over the model he uses to support his claim that most study results are false. This post will look at some equations he derives for estimating the probability that a claimed positive result is correct.

First of all, let R be the ratio of true hypotheses to false hypotheses among those being investigated in a particular area. Of course we never know exactly what R is, but let’s pretend that somehow we knew that out of 1000 hypotheses being investigated in some area, 200 are correct. Then R would be 200/800 = 0.25. The value of R varies quite a bit, being relatively large in some fields of study and quite small in others. Imagine researchers pulling hypotheses to investigate from a hat. The probability of selecting a hypothesis that really is true would be R/(R+1) and the probability of selecting a false hypothesis is 1/(R+1).

Let α be the probability of incorrectly declaring a false hypothesis to be true. Studies are often designed with the goal that α would be 0.05. Let β be the probability that a study would incorrectly conclude that a true hypothesis is false. In practice, β is far more variable than α. You might find study designs with β anywhere from 0.5 down to 0.01. The design choice β = 0.20 is common in some contexts.

There are two ways to publish a study claiming a new result: you could have selected a true hypothesis and correctly concluded that it was true, or you could have selected a false hypothesis but incorrectly concluded it was true. The former has probability (1-β)R/(R+1) and the latter has probability α/(R+1). The total probability of concluding a hypothesis is true, correctly or incorrectly, is the sum of these probabilities, i.e. ((1-β)R + α)/(R+1). The probability that a hypothesis is true given that the study concluded it was true, the positive predictive value or PPV, is the ratio of (1-β)R/(R+1) to ((1-β)R + α)/(R+1). In summary, under the assumptions above, the probability of a claimed result being true is (1-β)R/((1-β)R + α).
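The final formula is easy to check numerically. Here is a minimal Python sketch; the function name `ppv` and the default values α = 0.05 and β = 0.20 are illustrative choices, not something taken from Ioannidis’s paper.

```python
def ppv(R, alpha=0.05, beta=0.20):
    """Positive predictive value under the basic model: (1-β)R / ((1-β)R + α)."""
    true_positive = (1 - beta) * R   # proportional to P(true hypothesis, declared true)
    false_positive = alpha           # proportional to P(false hypothesis, declared true)
    return true_positive / (true_positive + false_positive)


# The example from the text: 200 true hypotheses out of 1000, so R = 200/800.
print(ppv(200 / 800))

# A field where only 1 hypothesis in 100 is true (hypothetical numbers).
print(ppv(1 / 99))
```

With R = 0.25 the claimed result is true about 80% of the time. With the second, much smaller R, the same α and β give a PPV well under one half.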

If (1 – β)R < α then the model says that a claim is more likely to be false than true. This can happen if R is small, i.e. there is not a large proportion of true results under investigation, and if β is large, i.e. if studies are small. If R is smaller than α, most positive claims will be false no matter how small you make β, i.e. no matter how large the study. This says that in a challenging area, where few of the ideas being investigated lead to progress, there will be a large proportion of false results published, even if the individual researchers are honest and careful.
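To see the last point concretely, the sketch below fixes a hypothetical R = 0.04 with α = 0.05 and shrinks β toward zero. The values are made up for illustration. Since 1 – β can never exceed 1, (1 – β)R stays below α, so the PPV stays below one half even for a study with perfect power.

```python
def ppv(R, alpha, beta):
    """Positive predictive value under the basic model."""
    return (1 - beta) * R / ((1 - beta) * R + alpha)


alpha = 0.05
R = 0.04  # hypothetical field where true hypotheses are rare (R < alpha)

# Larger studies mean smaller beta; beta = 0 is the unreachable best case.
for beta in (0.5, 0.2, 0.01, 0.0):
    print(f"beta = {beta:4.2f}   PPV = {ppv(R, alpha, beta):.3f}")
```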

Ioannidis develops two other models refining the model above. Suppose that because of bias, some proportion of results that would otherwise have been reported as negative are reported as positive. Call this proportion u. The derivation of the positive predictive value is similar to that in the previous model, but messier. The final result is R(1-β + uβ)/(R(1-β + uβ) + α + u – αu). If 1 – β > α, which is nearly always the case, then the probability of a reported result being correct decreases as bias increases.
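Here is a short Python sketch of the bias formula, useful for seeing how quickly the PPV falls as u grows. The function name and the particular values of R, α, β, and u are assumptions chosen for illustration.

```python
def ppv_with_bias(R, alpha, beta, u):
    """PPV when a proportion u of would-be negative results are reported as positive."""
    numerator = R * (1 - beta + u * beta)
    return numerator / (numerator + alpha + u - alpha * u)


R, alpha, beta = 0.25, 0.05, 0.20  # illustrative values only

for u in (0.0, 0.1, 0.3, 0.5):
    print(f"u = {u:.1f}   PPV = {ppv_with_bias(R, alpha, beta, u):.3f}")
```

Setting u = 0 recovers the basic model, which is a quick sanity check on the formula.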

The final model considers the impact of multiple investigators testing the same hypothesis. If more people try to prove the same thing, it’s more likely that someone will get lucky and “prove” it, whether or not the thing to be proven is true. Leaving aside bias, if n investigators are testing each hypothesis, the probability that a positive claim is true is given by R(1 – β^n)/(R + 1 – (1 – α)^n – Rβ^n). As n increases, the probability of a positive claim being true decreases.
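The multiple-investigator model can be sketched the same way. Assuming the n studies are independent, a true hypothesis yields at least one positive claim with probability 1 – β^n, and a false hypothesis yields at least one with probability 1 – (1 – α)^n. The code below writes the PPV in terms of those two quantities; the function name and the parameter values are illustrative assumptions.

```python
def ppv_multiple(R, alpha, beta, n):
    """PPV when n independent teams test each hypothesis and any positive is reported."""
    true_claim = R * (1 - beta ** n)      # true hypothesis, at least one positive
    false_claim = 1 - (1 - alpha) ** n    # false hypothesis, at least one positive
    return true_claim / (true_claim + false_claim)


R, alpha, beta = 0.25, 0.05, 0.20  # illustrative values only

for n in (1, 2, 5, 10):
    print(f"n = {n:2d}   PPV = {ppv_multiple(R, alpha, beta, n):.3f}")
```

With n = 1 this reduces to the basic model, and for these values the PPV drops steadily as more teams chase the same hypothesis.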

The probability of a result being true is often much lower than is commonly believed. One reason is that hypothesis testing focuses on the probability of the data given a hypothesis rather than the probability of a hypothesis given the data. Calculating the probability of a hypothesis given data relies on prior probabilities, such as the factors R/(R+1) and 1/(R+1) above. These prior probabilities are elusive and controversial, but they are critical in evaluating how likely it is that claimed results are true.

Related: Adaptive clinical trial design

 

(*) John P. A. Ioannidis, Why most published research findings are false. CHANCE volume 18, number 4, 2005.

Drug looks promising, come back in 30 years

The most recent 60-Second Science podcast summarizes a paper in Science magazine reporting that the average interval between a drug being deemed “promising” and the first paper appearing showing clinical effectiveness is 24 years.

Note that the publication of a paper saying a drug is clinically effective is a far cry from regulatory approval. Many new drugs that look like an improvement after a phase II trial turn out to be no better than existing treatments, and those that really are better take years to achieve regulatory approval.

* * *

Consulting in clinical trial design

Random inequalities VII: three or more variables

The previous posts in this series have looked at P(X > Y), the probability that a sample from a random variable X is greater than a sample from an independent random variable Y. In applications, X and Y have different distributions but come from the same distribution family.

Sometimes applications require computing P(X > max(Y, Z)). For example, an adaptively randomized trial of three treatments may be designed to assign a treatment with probability equal to the probability that that treatment has the best response. In a trial with a binary outcome, the variables X, Y, and Z may be beta random variables representing the probability of response. In a trial with a time-to-event outcome, the variables might be gamma random variables representing survival time.
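As a rough illustration of the binary-outcome case, here is a Monte Carlo sketch in Python that estimates the probability that each arm is best. Everything specific in it is an assumption made for the example: the function name `prob_best`, the uniform Beta(1, 1) priors, and the response counts. Simulation is used only because it is short; it is not meant as a description of how any particular trial software does the calculation.

```python
import random


def prob_best(params, draws=50_000, seed=0):
    """Estimate P(arm i has the largest value) for independent Beta(a, b) variables."""
    rng = random.Random(seed)
    wins = [0] * len(params)
    for _ in range(draws):
        samples = [rng.betavariate(a, b) for a, b in params]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]


# Hypothetical data: (responses, failures) on three arms, each with a Beta(1, 1) prior.
arms = [(1 + 8, 1 + 4), (1 + 5, 1 + 7), (1 + 6, 1 + 6)]
probs = prob_best(arms)
print(probs)  # the first entry estimates P(X > max(Y, Z))
```

In an adaptively randomized design of the kind described above, these three estimates would serve as the assignment probabilities for the next patient.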

Sometimes we’re interested in the opposite inequality, P(X < min(Y,Z)). This would be the case if we thought in terms of failures rather than responses, or wanted to minimize the time to a desirable event rather than maximizing the time to an undesirable event.

The maximum and minimum inequalities are related by the following equation:

P(X < min(Y,Z)) = P(X > max(Y, Z)) + 1 – P(X > Y) – P(X > Z).
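The equation is an inclusion-exclusion argument, and it holds draw by draw. If X is the smallest of the three, both sides are 1. If X is the largest or in the middle, both sides are 0. The sketch below checks this by simulation with gamma random variables; the shape and scale parameters are arbitrary values chosen for the example.

```python
import random

rng = random.Random(1)
n = 50_000
lt_min = gt_max = gt_y = gt_z = 0

for _ in range(n):
    # gammavariate takes (shape, scale); these parameters are made up.
    x = rng.gammavariate(2.0, 1.0)
    y = rng.gammavariate(3.0, 1.0)
    z = rng.gammavariate(2.5, 1.5)
    lt_min += x < min(y, z)
    gt_max += x > max(y, z)
    gt_y += x > y
    gt_z += x > z

lhs = lt_min / n
rhs = gt_max / n + 1 - gt_y / n - gt_z / n
print(lhs, rhs)
```

Because the identity holds for every draw without ties, the two estimates agree up to floating point rounding rather than merely up to sampling error.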

These inequalities are used for safety monitoring rules as well as to determine randomization probabilities. In a trial seeking to maximize responses, a treatment arm X might be dropped if P(X > max(Y,Z)) becomes too small.

In principle one could design an adaptively randomized trial with n treatment arms for any n ≥ 2 based on P(X1 > max(X2, …, Xn)). In practice, the most common value of n by far is 2. Sometimes n is 3. I’m not familiar with an adaptively randomized trial with more than three arms. I’ve heard of an adaptively randomized trial that was designed with five arms, but I don’t believe the trial ran.

Computing P(X1 > max(X2, …, Xn)) by numerical integration becomes more difficult as n increases. For large n, simulation may be more efficient than integration. Computing P(X1 > max(X2, …, Xn)) for gamma random variables with n=3 was unacceptably slow in a previous version of our adaptive randomization software. The search for a faster algorithm led to this paper: Numerical Evaluation of Gamma Inequalities.
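For independent variables, the probability can be written as the integral of f1(x) F2(x) … Fn(x), where f1 is the density of X1 and the Fi are the distribution functions of the others. The Python sketch below compares a plain trapezoid-rule evaluation of that integral with simulation. It is not the algorithm from the paper mentioned above. To stay within the standard library it assumes integer shape parameters, for which the gamma distribution function has a closed form, and all function names and parameter values are hypothetical.

```python
import math
import random


def gamma_pdf(x, k, theta):
    """Density of a gamma variable with integer shape k and scale theta."""
    return x ** (k - 1) * math.exp(-x / theta) / (math.factorial(k - 1) * theta ** k)


def gamma_cdf(x, k, theta):
    """Distribution function for integer shape k, which has a closed form."""
    t = x / theta
    return 1.0 - math.exp(-t) * sum(t ** j / math.factorial(j) for j in range(k))


def prob_first_is_max_integral(params, upper=200.0, steps=20_000):
    """Trapezoid rule for the integral of f_1(x) * prod_{i>1} F_i(x) over [0, upper]."""
    (k1, th1), others = params[0], params[1:]
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        value = gamma_pdf(x, k1, th1) * math.prod(gamma_cdf(x, k, th) for k, th in others)
        total += value * (0.5 if i in (0, steps) else 1.0)
    return total * h


def prob_first_is_max_simulation(params, draws=100_000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        first, *rest = (rng.gammavariate(k, th) for k, th in params)
        hits += first > max(rest)
    return hits / draws


# Hypothetical (shape, scale) pairs for three arms.
arms = [(3, 1.0), (2, 1.5), (2, 1.0)]
print(prob_first_is_max_integral(arms))
print(prob_first_is_max_simulation(arms))
```

The integrand costs more to evaluate with each added arm, while the simulation only needs one more draw per iteration, which is one way to see why simulation becomes attractive for large n.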

Previous posts on random inequalities: