Good news from Pfizer and Moderna

Both Pfizer and Moderna have announced recently that their SARS-CoV-2 vaccine candidates reduce the rate of infection by over 90% in the active group compared to the control (placebo) group.

That’s great news. The vaccines may turn out to be less than 90% effective when all is said and done, but even so they’re likely to be far more effective than expected.

But there’s other good news that might be overlooked: the subjects in the control groups did well too, though not as well as in the active groups.

The infection rate was around 0.4% in the Pfizer control group and around 0.6% in the Moderna control group.

There were 11 severe cases of COVID in the Moderna trial, out of 30,000 subjects, all in the control group.

There were 0 severe cases of COVID in the Pfizer trial in either group, out of 43,000 subjects.


Real-time analytics

There’s an ancient saying “Whom the gods would destroy they first make mad.” (Mad as in crazy, not mad as in angry.) I wrote a variation of this on Twitter:

Whom the gods would destroy, they first give real-time analytics.

Having more up-to-date information is only valuable up to a point. Past that point, you’re more likely to be distracted by noise. The closer you look at anything, the more irregularities you see, and the more likely you are to over-steer [1].

I don’t mean to imply that the noise isn’t real. (More on that here.) But there’s a temptation to pay more attention to the small variations you don’t understand than the larger trends you believe you do understand.

I became aware of this effect when simulating Bayesian clinical trial designs. The more often you check your stopping rule, the more often you will stop [2]. You want to monitor a trial often enough to shut it down, or at least pause it, if things change for the worse. But monitoring too often can cause you to stop when you don’t want to.
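A quick simulation makes this concrete. The stopping rule below (stop when the observed rate drifts about two standard errors from its null value) is a made-up illustration, not a real trial design, but it shows how checking more often inflates the chance of stopping even when nothing has changed:

```python
import random

def trial_stops(n_patients, check_every, z=1.96, p_true=0.5):
    """Simulate one trial under the null (true response rate 0.5).
    Apply a naive stopping rule every `check_every` patients: stop if
    the observed rate is about two standard errors away from 0.5."""
    successes = 0
    for i in range(1, n_patients + 1):
        successes += random.random() < p_true
        if i % check_every == 0:
            se = (0.25 / i) ** 0.5
            if abs(successes / i - p_true) > z * se:
                return True
    return False

def stop_rate(check_every, n_sims=2000, n_patients=400):
    random.seed(42)
    return sum(trial_stops(n_patients, check_every) for _ in range(n_sims)) / n_sims

# Checking more often raises the chance of stopping, even under the null.
for k in (200, 50, 10, 1):
    print(f"check every {k:3d} patients: stop rate ~ {stop_rate(k):.2f}")
```

The trials are identical in every run; only the monitoring frequency changes, and the stopping rate climbs with it.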

Flatter than glass

A long time ago I wrote about the graph below.

The graph looks awfully jagged, until you look at the vertical scale. The curve represents the numerical difference between two functions that are exactly equal in theory. As I explain in that post, the curve is literally smoother than glass, and certainly flatter than a pancake.
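For a feel of the scale involved, here is a generic example (not the functions from the original post): two expressions that are algebraically identical differ only by floating point rounding, on the order of 10^-16.

```python
import numpy as np

# Two expressions that are exactly equal in theory.
x = np.linspace(0, 1, 1000)
f = (1 + x) * (1 - x)
g = 1 - x**2

# The difference is pure floating point rounding noise.
diff = f - g
print(f"max |difference| = {np.max(np.abs(diff)):.2e}")
```

Plotted on its own, that difference looks jagged, but the vertical scale is around machine epsilon.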

Notes

[1] See The Logic of Failure for a discussion of how over-steering is a common factor in disasters such as the Chernobyl nuclear failure.

[2] Bayesians are loath to talk about things like α-spending, but when you’re looking at stopping frequencies, frequentist phenomena pop up.

Understanding statistical error

A simple linear regression model has the form

y = μ + βx + ε.

This means that the output variable y is a linear function of the input variable x, plus some error term ε that is randomly distributed.

There’s a common misunderstanding over whose error the error term is. A naive view is that the world really is linear, that

y = μ + βx

is some underlying Platonic reality, and that the only reason we don’t measure exactly that linear part is that we as observers have made some sort of error, that the fault lies in the real world rather than in the model.

No, reality is what it is, and it’s our model that is in error. Some aspect of reality may indeed have a structure that is approximately linear (over some range, under some conditions), but when we truncate reality to only that linear approximation, we introduce some error. This error may be tolerable—and thankfully it often is—but the error is ours, not the world’s.
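A small simulation illustrates the point. The data below are generated with no measurement noise at all, yet a fitted line still leaves nonzero residuals; that error belongs entirely to the linear model. (The curve sin(1.5x) is just an arbitrary, mildly nonlinear stand-in for reality.)

```python
import numpy as np

# The "world": a deterministic, slightly nonlinear relationship.
# There is no measurement noise anywhere.
x = np.linspace(0, 1, 50)
y = np.sin(1.5 * x)          # roughly linear on [0, 1], but not exactly

# Truncate reality to a linear model y = mu + beta * x.
beta, mu = np.polyfit(x, y, 1)
residuals = y - (mu + beta * x)

# Nonzero residuals despite perfect measurements:
# the "error term" here is the model's, not the world's.
print(f"max |residual| = {np.max(np.abs(residuals)):.4f}")
```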

COVID-19 mortality per capita by state

Here’s a silly graph by Richard West with a serious point. States with longer names tend to have higher COVID-19 mortality. Of course no one believes there’s anything about the length of a state’s name that should impact the health of its residents. The correlation is real, but it’s a coincidence.

The variation between mortality in different states is really large. Something caused that, though not the length of the names. But here’s the kicker: you may come up with an explanation that’s much more plausible than length of name, and be just as wrong. Discovering causation is hard work, much harder than looking for correlations.
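You can watch this happen with pure noise. In the sketch below (simulated data, nothing to do with actual states), the mortality figures and all 100 candidate “explanations” are independent random numbers, yet the best candidate still correlates noticeably with mortality:

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 "states" whose mortality is pure noise, and 100 candidate
# explanatory variables that are also pure noise.
mortality = rng.normal(size=50)
candidates = rng.normal(size=(100, 50))

# Correlate each candidate with mortality and keep the strongest.
corrs = [np.corrcoef(c, mortality)[0, 1] for c in candidates]
best = max(abs(r) for r in corrs)
print(f"best |correlation| among 100 noise variables: {best:.2f}")
```

Screen enough candidate explanations and one of them will look compelling by chance alone.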

Randomization audit


“How would you go about drawing a random sample?”

I thought that was kind of a silly question. I was in my first probability class in college, and the professor started the course with this. You just take a sample, right?

As with many things in life, it gets more complicated when you get down to business. Random sample of what? Sampled according to what distribution? What kind of constraints are there?

How would you select a random sample of your employees in a way that you could demonstrate was not designed to include any particular employee? How can you not only be fair, but convince people who may not understand probability that you were fair?

When there appear to be patterns in your random selections, how can you decide whether there has been an error or whether the results that you didn’t expect actually should have been expected?

How do you sample data that isn’t simply a list but instead has some sort of structure? Maybe you have data split into tables in a relational database. Maybe you have data that naturally forms a network. How can you sample from a relational database or a network in a way that respects the underlying structure?

How would you decide whether the software you’re using for randomization is adequate for your purposes? If it is adequate, are you using it the right way?
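One building block for a defensible procedure is making the selection reproducible: sort the population into a canonical order, derive the random seed from a phrase you commit to in advance, and let any auditor re-run the drawing. The sketch below is a minimal illustration of that idea (the seed phrase and helper name are my own), not a complete audit procedure:

```python
import hashlib
import random

def audit_sample(names, k, seed_phrase):
    """Draw a reproducible random sample of k names.

    The seed phrase can be committed to before the drawing (e.g.
    published in advance), so an auditor can re-run the selection
    and verify they get the same result."""
    pool = sorted(names)  # canonical order: result independent of input order
    seed = int(hashlib.sha256(seed_phrase.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return rng.sample(pool, k)

employees = ["Ana", "Bob", "Cruz", "Dee", "Eli", "Fay"]
picked = audit_sample(employees, 2, "board meeting 2020-11-30")
print(picked)
```

Anyone given the roster and the seed phrase can reproduce the drawing exactly, which goes a long way toward demonstrating that no particular person was singled out.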

These are all questions that I’ve answered for clients. Sometimes they want me to develop randomization procedures. Sometimes they want me to audit the procedures they’re already using. They not only want good procedures, they want procedures that can stand up to critical review.

If you’d like me to design or audit randomization procedures for your company, let’s talk. You will get the assurance that you’re doing the right thing, and the ability to demonstrate that you’re doing the right thing.

CRM consulting gig

This morning I had someone from a pharmaceutical company call me with questions about conducting a CRM dose-finding trial and I mentioned it to my wife.

Then this afternoon she was reading a book in which there was a dialog between husband and wife including this sentence:

He launched into a technical explanation of his current consulting gig—something about a CRM implementation.

You can’t make this kind of thing up. A few hours before reading this line, my wife had exactly this conversation. However, I doubt the author and I had the same thing in mind.

In my mind, CRM stands for Continual Reassessment Method, a Bayesian method for Phase I clinical trials, especially in oncology. We ran a lot of CRM trials while I worked in biostatistics at MD Anderson Cancer Center.

For most people, presumably including the author of the book quoted above, CRM stands for Customer Relationship Management software.

Like my fictional counterpart, I know a few things about CRM implementation, but it’s a different kind of CRM.

More clinical trial posts

Simple clinical trial of four COVID-19 treatments

A story came out in Science yesterday saying the World Health Organization is launching a trial of what it believes are the four most promising treatments for COVID-19, the disease caused by the novel coronavirus SARS-CoV-2.

The four treatment arms will be

  • Remdesivir
  • Chloroquine and hydroxychloroquine
  • Ritonavir + lopinavir
  • Ritonavir + lopinavir + interferon beta

plus standard of care as a control arm.

I find the design of this trial interesting. Clinical trials are often complex and slow. Given a choice in a crisis between ponderously designing the perfect clinical trial and flying by the seat of their pants, health officials would rightly choose the latter. On the other hand, it would obviously be good to know which of the proposed treatments is most effective. So this trial has to be a compromise.

The WHO realizes that the last thing front-line healthcare workers want right now is the added workload of conducting a typical clinical trial. So this trial, named SOLIDARITY, will be very simple to run. According to the Science article,

When a person with a confirmed case of COVID-19 is deemed eligible, the physician can enter the patient’s data into a WHO website, including any underlying condition that could change the course of the disease, such as diabetes or HIV infection. The participant has to sign an informed consent form that is scanned and sent to WHO electronically. After the physician states which drugs are available at his or her hospital, the website will randomize the patient to one of the drugs available or to the local standard care for COVID-19.

… Physicians will record the day the patient left the hospital or died, the duration of the hospital stay, and whether the patient required oxygen or ventilation, she says. “That’s all.”

That may sound a little complicated, but by clinical trial standards the SOLIDARITY trial is shockingly simple. Normally you would have countless detailed case report forms, adverse event reporting, etc.

The statistics of the trial will be simple on the front end but complicated on the back end. There’s no sophisticated algorithm assigning treatments, just a randomization between available treatment options, including standard of care. I don’t see how you could do anything else, but this will create headaches for the analysis.

Patients are randomized to available treatments—what else could you do? [1]—which means the treatment options vary by site and over time. The control arm, standard of care, also varies by site and could change over time as well.  Also, this trial is not double-blind. This is a trial optimized for the convenience of frontline workers, not for the convenience of statisticians.
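A toy sketch of this kind of assignment (purely illustrative, not the WHO’s actual mechanism): each site randomizes among standard of care plus whatever arms it has on hand, so the effective randomization varies by site.

```python
import random

ARMS = ["remdesivir",
        "chloroquine/hydroxychloroquine",
        "ritonavir + lopinavir",
        "ritonavir + lopinavir + interferon beta"]

def assign(available_at_site, rng=random):
    """Randomize a patient among standard of care and whichever
    treatment arms the local site actually has available."""
    options = ["standard of care"] + [a for a in ARMS if a in available_at_site]
    return rng.choice(options)

# A site that only stocks two of the four treatments:
print(assign({"remdesivir", "ritonavir + lopinavir"}, random.Random(1)))
```

The simplicity for the site hides the statistical mess: treatment probabilities differ from site to site, which the eventual analysis has to account for.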

The SOLIDARITY trial will be adaptive in the sense that a DSMB will look at interim results and decide whether to drop treatment arms that appear to be under-performing. Ideally there would be objective algorithms for making these decisions, carefully designed and simulated in advance, but there’s no time for that. Better to start learning immediately than to spend six months painstakingly designing a trial. Even if we could somehow go back in time and start the design process six months ago, there could very well be contingencies that the designers couldn’t anticipate.

The SOLIDARITY trial is an expedient compromise, introducing a measure of scientific rigor when there isn’t time to be as rigorous as we’d like.

Update: The hydroxychloroquine arm was dropped from the trial because a paper in the Lancet reported that the drug was neither safe nor effective. However, it now appears that the data used in the Lancet paper were fraudulent.

Update: The Lancet retracted the paper in question on June 4, 2020.

More clinical trial posts

[1] You could limit the trial to sites that have all four treatment options available, cutting off most potential sources of data. The data would not be representative of the world at large and accrual would be slow. Or you could wait until all four treatments were distributed to clinics around the world, but there’s no telling how long that would take.

Automatic data reweighting

Suppose you are designing an autonomous system that will gather data and adapt its behavior to that data.

At first you face the so-called cold-start problem. You don’t have any data when you first turn the system on, and yet the system needs to do something before it has accumulated data. So you prime the pump by having the system act at first on what you believe to be true prior to seeing any data.

Now you face a problem. You initially let the system operate on assumptions rather than data out of necessity, but you’d like to go by data rather than assumptions once you have enough data. Not just some data, but enough data. Once you have a single data point, you have some data, but you can hardly expect a system to act reasonably based on one datum.

Rather than abruptly moving from prior assumptions to empirical data, you’d like the system to gradually transition from reliance on the former to reliance on the latter, weaning the system off initial assumptions as it becomes more reliant on new data.

The delicate part is how to manage this transition. How often should you adjust the relative weight of prior assumptions and empirical data? And how should you determine what weights to use? Should you set the weight given to the prior assumptions to zero at some point, or should you let the weight asymptotically approach zero?

Fortunately, there is a general theory of how to design such systems. It’s called Bayesian statistics. The design issues raised above are all handled automatically by Bayes theorem. You start with a prior distribution summarizing what you believe to be true before collecting new data. Then as new data arrive, the magic of Bayes theorem adjusts this distribution, putting more emphasis on the empirical data and less on the prior. The effect of the prior gradually fades away as more data become available.
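A beta-binomial model shows this fading in one line of arithmetic: with a Beta(a, b) prior on a success probability, the posterior mean after seeing `heads` successes in `flips` trials is (a + heads)/(a + b + flips), a weighted average that shifts from the prior mean toward the data as flips grows. The specific numbers below are an illustration, not taken from any particular system.

```python
def posterior_mean(a, b, heads, flips):
    # Beta(a, b) prior + binomial data => Beta(a + heads, b + flips - heads)
    # posterior, whose mean blends prior mean and sample mean.
    return (a + heads) / (a + b + flips)

a, b = 30.0, 70.0        # prior: "about a 30% rate, worth 100 observations"
true_rate = 0.6

for flips in (0, 10, 100, 1000, 10000):
    heads = int(true_rate * flips)   # idealized data for illustration
    print(f"n = {flips:5d}: posterior mean = {posterior_mean(a, b, heads, flips):.3f}")
# The posterior mean climbs from the prior's 0.300 toward the true rate 0.6.
```

No schedule of weights had to be chosen; the reweighting falls out of the posterior update itself.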

Related posts

Sufficient statistic paradox

A sufficient statistic summarizes a set of data. If the data come from a known distribution with an unknown parameter, then the sufficient statistic carries as much information as the full data [0]. For example, given a set of n coin flips, the number of heads and the number of flips are sufficient statistics. More on sufficient statistics here.
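Here is a small check of that claim for coin flips: the likelihood of an ordered sequence depends on the data only through the number of heads, so two different sequences with the same head count are indistinguishable as evidence about p.

```python
from math import isclose, prod

def likelihood(p, flips):
    # Probability of an ordered 0/1 sequence under Bernoulli(p).
    return prod(p if f else 1 - p for f in flips)

seq1 = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]   # 4 heads in 10 flips
seq2 = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]   # also 4 heads in 10 flips

# Same head count => same likelihood p^4 (1-p)^6, whatever the order.
for p in (0.2, 0.5, 0.8):
    print(p, isclose(likelihood(p, seq1), likelihood(p, seq2)))
```

Once you know (heads, flips), the rest of the sequence carries no further information about p.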

There is a theorem by Koopman, Pitman, and Darmois that essentially says that useful sufficient statistics exist only if the data come from a class of probability distributions known as exponential families [1]. This leads to what Persi Diaconis [2] labeled a paradox.

Exponential families are quite a restricted family of measures. Modern statistics deals with far richer classes of probabilities. This suggests a kind of paradox. If statistics is to be of any real use it must provide ways of boiling down great masses of data to a few humanly interpretable numbers. The Koopman-Pitman-Darmois theorem suggests this is impossible unless nature follows highly specialized laws which no one really believes.

Diaconis suggests two ways around the paradox. First, the theorem he refers to concerns sufficient statistics of a fixed size; it doesn’t apply if the summary size varies with the data size. Second, and more importantly, the theorem says nothing about a summary containing approximately as much information as the full data.

***

[0] Students may naively take this to mean that all you need is sufficient statistics. No, it says that if you know the distribution the data come from then all you need is the sufficient statistics. You cannot test whether a model fits given sufficient statistics that assume the model. For example, mean and variance are sufficient statistics assuming data come from a normal distribution. But knowing only the mean and variance, you can’t assess whether a normal distribution model fits the data.

[1] This does not mean the data have to come from the exponential distribution. The exponential family of distributions includes the exponential distribution, but many other distributions as well.

[2] Persi Diaconis. Sufficiency as Statistical Symmetry. Proceedings of the AMS Centennial Symposium. August 8–12, 1988.

What is a privacy budget?


The idea behind differential privacy is that it doesn’t make much difference whether your data is in a data set or not. How much difference your participation makes is made precise in terms of probability statements. The exact definition doesn’t matter for this post, but it matters that there is an exact definition.

Someone designing a differentially private system sets an upper limit on the amount of difference anyone’s participation can make. That’s the privacy budget. The system will allow someone to ask one question that uses the whole privacy budget, or a series of questions whose total impact is no more than that one question.

If you think of a privacy budget in terms of money, maybe your privacy budget is $1.00. You could ask a single $1 question if you’d like, but you couldn’t ask any more questions after that. Or you could ask one $0.30 question and seven $0.10 questions.

Some metaphors are dangerous, but the idea of comparing cumulative privacy impact to a financial budget is a good one. You have a total amount you can spend, and you can choose how you spend it.

The only problem with privacy budgets is that they tend to be overly cautious because they’re based on worst-case estimates. There are several ways to mitigate this. A simple way to stretch privacy budgets is to cache query results. If you ask a question twice, you get the same answer both times, and you’re only charged once.

(Recall that differential privacy adds a little random noise to query results to protect privacy. If you could ask the same question over and over, you could average your answers, reducing the level of added noise, and so a differentially private system will rightly charge you repeatedly for repeated queries. But if the system adds noise once and remembers the result, there’s no harm in giving you back that same answer as often as you ask the question.)
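A toy version of budget accounting with caching might look like the following. The class and method names are my own invention, not any real library’s API; it is a sketch of the bookkeeping, not a production mechanism.

```python
import math
import random

def laplace_noise(scale, rng):
    # Inverse-CDF sampling from the Laplace distribution.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

class PrivateCounts:
    """Toy differentially private counting queries with a privacy
    budget and a cache of past answers."""

    def __init__(self, data, budget):
        self.data = data
        self.remaining = budget
        self.cache = {}
        self.rng = random.Random(0)

    def count(self, name, predicate, epsilon):
        if name in self.cache:              # repeated question: no extra charge
            return self.cache[name]
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon
        true_count = sum(1 for x in self.data if predicate(x))
        # A count has sensitivity 1, so Laplace noise with scale 1/epsilon
        # gives epsilon-differential privacy for this one query.
        answer = true_count + laplace_noise(1 / epsilon, self.rng)
        self.cache[name] = answer
        return answer

db = PrivateCounts(range(100), budget=1.0)
a1 = db.count("evens", lambda x: x % 2 == 0, 0.3)
a2 = db.count("evens", lambda x: x % 2 == 0, 0.3)   # cached, budget unchanged
print(a1 == a2, round(db.remaining, 2))             # True 0.7
```

The cache is what keeps repeated questions from draining the budget: the noise is added once and the same noisy answer is replayed thereafter.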

A more technical way to get more from a privacy budget is to use Rényi differential privacy (RDP) rather than the original ε-differential privacy. The former simplifies privacy budget accounting due to simple composition rules, and makes privacy budgets stretch further by leaning away from worst-case analysis a bit and leaning toward average-case analysis. RDP depends on a tuning parameter that includes ε-differential privacy, so one can control how much RDP acts like ε-differential privacy by adjusting that parameter.

There are other ways to stretch privacy budgets as well. The net effect is that when querying a large database, you can often ask all the questions you like, and get sufficiently accurate answers, without worrying about the privacy budget.

More mathematical privacy posts