“Reproducible” and “randomized” don’t seem to go together. If something was unpredictable the first time, shouldn’t it be unpredictable if you start over and run it again? As is often the case, we want incompatible things.

But the combination of reproducible and random can be reconciled. Why would we want a randomized controlled trial (RCT) to be random, and why would we want it to be reproducible?

**One purpose** of randomization in experiments is to scatter complicating factors evenly between two groups. For example, one way to test two drugs on 1000 people would be to give the first drug to all the men and the second to all the women. But maybe a person’s sex has something to do with how the drug acts. If we randomize between two groups, it’s likely that about the same number of men and women will be in each group.

The example of sex as a factor is oversimplified because there’s reason to suspect *a priori* that sex might make a difference in how a drug performs. The bigger problem is that factors we can’t anticipate or control may matter, and we’d like them scattered evenly between the two treatment groups. If we knew what the factors were, we could ensure that they’re evenly split between the groups. The hope is that randomization will do that for us with things we’re unaware of. For this purpose we don’t need a process that is “truly random,” whatever that means, but a process that matches our expectations of how randomness should behave. So a pseudorandom number generator (PRNG) is fine. No need, for example, to randomize using some physical source of randomness like radioactive decay.

**Another purpose** of randomization is for the assignments to be unpredictable. We want a physician, for example, to enroll patients in a clinical trial without knowing what treatment they will receive. Otherwise there could be a bias, presumably unconscious, against assigning patients with poor prognosis if the physicians know the next treatment will be the one they hope or believe is better. Note here that the randomization only has to be unpredictable from the perspective of the people participating in and conducting the trial. The assignments could be predictable, in principle, by someone *not* involved in the study.

And why would you want randomization assignments to be **reproducible**? One reason would be to test whether randomization software is working correctly. Another might be to satisfy a regulatory agency or some other oversight group. Still another reason might be to defend your randomization in a lawsuit. A physical random number generator, such as using the time down to the millisecond at which the randomization is conducted, would achieve random assignments and unpredictability, but not reproducibility.

Computer algorithms for generating random numbers (technically pseudo-random numbers) can achieve reproducibility, practically random allocation, and unpredictability. The randomization outcomes are predictable, and hence reproducible, to someone with access to the random number generator and its state, but unpredictable in practice to those involved in the trial. The internal state of the random number generator has to be saved between assignments and passed back into the randomization software each time.
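Here is a minimal sketch of saving the generator state between assignments and passing it back in. It uses Python's standard `random.Random` (a Mersenne Twister) purely for convenience, not the small-state generator discussed in this post. The names `assign_next` and `run_trial`, the seed, and the arm labels are made up for illustration.

```python
import random

def assign_next(state=None, seed=2024, arms=("A", "B")):
    """Return (assignment, new_state) for the next patient.

    `state` is the generator state saved after the previous assignment,
    or None for the first patient, in which case `seed` starts the sequence.
    """
    rng = random.Random(seed)
    if state is not None:
        rng.setstate(state)  # resume exactly where the previous call stopped
    arm = rng.choice(arms)
    return arm, rng.getstate()

def run_trial(n, seed=2024):
    """Randomize n patients, passing the saved state back in each time."""
    state, assignments = None, []
    for _ in range(n):
        arm, state = assign_next(state, seed)
        assignments.append(arm)
    return assignments

first = run_trial(20)
replay = run_trial(20)
print(first == replay)  # the whole sequence can be regenerated on demand
```

Anyone holding the seed and state can replay the sequence, which is the reproducibility an auditor would want, while someone enrolling patients without that access has no practical way to guess the next arm.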

Random number generators such as the Mersenne Twister have good statistical properties, but they also carry a large amount of state. The random number generator described here has very small state, 64 bits, and so storing and returning the state is simple. If you needed to generate a trillion random samples, the Mersenne Twister would be preferable, but since RCTs usually have fewer than a trillion subjects, the RNG in the article is perfectly fine. I have run the Dieharder random number generator quality tests on this generator and it performs quite well.

Image by Ilmicrofono Oggiono, licensed under Creative Commons

H. and B. S. Jeffreys, Methods of Mathematical Physics, 2nd ed., Cambridge University Press, 1950, p. 8.

**Related post**: Just an approximation

The planets have elliptical orbits with the sun at one focus, but these ellipses are nearly circles centered at the sun. We’ll assume the orbits are perfectly circular and lie in the same plane. (Now that Pluto is not classified as a planet, we can say without qualification that the planets have nearly circular orbits. Pluto’s orbit is much more elliptical than that of any of the planets.)

We can work in astronomical units (AUs) so that the distance from the Earth to the sun is 1. We can also work in units of years so that the period is also 1. Then we could describe the position of the Earth at time *t* as exp(2π*it*).

Mars has a larger orbit and a longer period. By Kepler’s third law, the size of the orbit and the period are related: the square of the period is proportional to the cube of the radius. Because we’re working in AUs and years, the proportionality constant is 1. If we denote the radius of Mars’ orbit by *r*, then its orbit can be described by

*r* exp(2π*i* (*r*^{-3/2} *t* ))

Here we pick our initial time so that at *t* = 0 the two planets are aligned.

The distance between the planets is just the absolute value of the difference between their positions:

| exp(2π*it*) – *r* exp(2π*i* (*r*^{-3/2} *t*)) |

The following code computes and plots the distance from Earth to Mars over time.

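    import matplotlib.pyplot as plt
    # exp, pi, absolute and linspace are imported from NumPy;
    # recent SciPy releases no longer export them at the top level
    from numpy import exp, pi, absolute, linspace

    def earth(t):
        # Earth: unit circle, one revolution per year
        return exp(2*pi*1j*t)

    def mars(t):
        r = 1.524 # semi-major axis of Mars orbit in AU
        # Kepler's third law: frequency is r**-1.5 revolutions per year
        return r*exp(2*pi*1j*(r**-1.5*t))

    def distance(t):
        return absolute(earth(t) - mars(t))

    x = linspace(0, 20, 1000)
    plt.plot(x, distance(x))
    plt.xlabel("Time in years")
    plt.ylabel("Distance in AU")
    plt.ylim(0, 3)
    plt.show()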

And the output looks like this:

Notice that the distance varies from about 0.5 to about 2.5. That’s because the radius of Mars’ orbit is about 1.5 AU. So when the planets are exactly in phase, they are 0.5 AU apart and when they’re exactly out of phase they are 2.5 AU apart. In other words the distance ranges from 1.5 – 1 to 1.5 + 1.

The distance function seems to be periodic with period about 2 years. We can do a little calculation by hand to show that is the case and find the period exactly.

The distance squared is the distance times its complex conjugate. If we let ω = *r*^{-3/2} then the distance squared is

*d*^{2}(*t*) = (exp(2π*it*) – *r* exp(2π*i*ω*t*)) (exp(-2π*it*) – *r* exp(-2π*i*ω*t*))

which simplifies to

1 + *r*^{2} – 2*r* cos(2π(1 – ω)*t*)

and so the (squared) distance is periodic with period 1/(1 – ω) ≈ 2.13 years.
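As a quick numerical sanity check on the algebra, the sketch below compares the complex-number distance with the closed form and confirms that shifting time by 1/(1 – ω) leaves the distance unchanged. It reuses the value *r* = 1.524 from the plotting code; the function names are my own.

```python
import numpy as np

r = 1.524                 # semi-major axis of Mars' orbit in AU
omega = r**-1.5           # Mars' frequency in revolutions per year
period = 1 / (1 - omega)  # predicted period of the distance, in years

def distance(t):
    # |Earth position - Mars position| using complex exponentials
    return np.abs(np.exp(2j*np.pi*t) - r*np.exp(2j*np.pi*omega*t))

def distance_closed_form(t):
    # square root of 1 + r^2 - 2 r cos(2 pi (1 - omega) t)
    return np.sqrt(1 + r**2 - 2*r*np.cos(2*np.pi*(1 - omega)*t))

t = np.linspace(0, 20, 1000)
print(round(period, 2))
print(np.allclose(distance(t), distance_closed_form(t)))
print(np.allclose(distance(t), distance(t + period)))
```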

Notice that the plot of distance looks more angular at the minima and more rounded near the maxima. Said another way, the distance changes more rapidly when the planets leave their nearest approach than when they leave their furthest approach. You can prove this by taking the square root of *d*^{2}(*t*) and computing its derivative.

Let *f*(*t*) = 1 + *r*^{2} – 2*r* cos(2π(1 – ω)*t*). By the chain rule, the derivative of the square root of *f*(*t*) is 1/2 *f*(*t*)^{-1/2} *f*′(*t*). At equal distances in time from a maximum or a minimum, *f*′(*t*) has the same magnitude. But the term *f*(*t*)^{-1/2} is largest when *f*(*t*) is smallest and vice versa because of the negative exponent.

Or maybe not. A new study of three contemporary hunter-gatherer tribes found that they stay awake long after dark and sleep an average of 6.5 hours a night. They also don’t nap much [1]. This suggests the way we sleep may not be that different from our ancient forebears.

Historian A. Roger Ekirch suggested that before electric lighting it was common to sleep in two four-hour segments with an hour or so of wakefulness in between. His theory was based primarily on medieval English texts that refer to “first sleep” and “second sleep” and has other literary support as well. A small study found that subjects settled into the sleep pattern Ekirch predicted when they were in a dark room for 14 hours each night for a month. But the hunter-gatherers don’t sleep this way.

Maybe latitude is an important factor. The hunter-gatherers mentioned above live between 2 and 20 degrees south of the equator whereas England is 52 degrees north of the equator. Maybe two-phase sleep was more common at high latitudes with long winter nights. Of course there are many differences between modern/ancient [2] hunter-gatherers and medieval Western Europeans besides latitude.

Two studies have found two patterns of how people sleep without electric lights. Maybe electric lights don’t have as much impact on how people sleep as other factors.

**Related post**: Paleolithic nonsense

* * *

[1] The study participants were given something like a Fitbit to wear. The article said that naps less than 15 minutes would be below the resolution of the monitors, so we don’t know how often the participants took cat naps. We only know that they rarely took longer naps.

[2] There is an implicit assumption that the contemporary hunter-gatherers live and, in particular, sleep like their ancient ancestors. This seems reasonable, though we can’t be certain. There is also the bigger assumption that the tribesmen represent not only *their* ancestors but all paleolithic humans. Maybe they do, and we don’t have much else to go on, but we don’t know. I suspect there was more diversity in the paleolithic era than we assume.

After the alphabet and the tables of multiplication, nothing has proved quite so useful in my professional life as these six little expressions.

The six expressions he refers to are nicknamed the *vergeet-me-nietjes* in Dutch, which translates to forget-me-nots in English. They are also known as Dr. Myosotis’s equations because myosotis is the genus for forget-me-nots. The equations give the angular and linear deflections of a cantilever beam.

Imagine a beam anchored at one end and free at the other, subject to one of three kinds of load: a bending moment *M* at the free end, a point force *P* at the free end, or a force *w* per unit length distributed over the length of the beam. The equations below give the rotation (angular deflection) and displacement (linear deflection) of the free end of the beam.

| | Rotation | Displacement |
|---|---|---|
| Bending moment | ML/(EI) | ML^{2}/(2EI) |
| Point load | PL^{2}/(2EI) | PL^{3}/(3EI) |
| Distributed load | wL^{3}/(6EI) | wL^{4}/(8EI) |

Here *E* is the modulus of elasticity, *L* is the length of the beam, and *I* is the area moment of inertia.
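The six expressions are easy to wrap in a function. The sketch below is only an illustration: the function name and the sample beam are made up, and adding the three load cases together assumes the usual small-deflection, linear-elastic setting in which superposition holds.

```python
def cantilever_deflections(L, E, I, M=0.0, P=0.0, w=0.0):
    """Rotation (radians) and displacement of the free end of a cantilever.

    L: beam length, E: modulus of elasticity, I: area moment of inertia,
    M: end moment, P: end point load, w: distributed load per unit length.
    Units must be consistent (for example m, Pa, m^4, N*m, N, N/m).
    """
    EI = E * I
    rotation = M*L/EI + P*L**2/(2*EI) + w*L**3/(6*EI)
    displacement = M*L**2/(2*EI) + P*L**3/(3*EI) + w*L**4/(8*EI)
    return rotation, displacement

# Hypothetical example: 2 m steel beam, 1 kN point load at the free end
rotation, displacement = cantilever_deflections(L=2.0, E=200e9, I=8e-6, P=1000.0)
print(rotation, displacement)
```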

… I said that if science could come up with something like the Jump it could surely solve a problem like that. Severin seized hold of that word, “science.” Science, he said, is not some mysterious larger-than-life force, it’s just the name we give to bright ideas that individual guys have when they’re lying in bed at night, and that if the fuel thing bothered me so much, there was nothing stopping me from having a bright idea to solve it …

This is a thumbnail version of a large, high-resolution image by Ulysse Carion. Thanks to Aleksey Shipilëv (@shipilev) for pointing it out.

It’s hard to see in the thumbnail, but the map gives the change in velocity needed at each branch point. You can find the full 2239 x 2725 pixel image here or click on the thumbnail above.

It looks like the story is a matter of fraud rather than sloppiness. This is unfortunate because sloppiness is much more pervasive than fraud, and this could have made a great case study of bad analysis. However, one could look at it as a case study in how *good* analysis (by the folks at MD Anderson) can uncover fraud.

Now there’s a new development in the Potti saga. The latest issue of The Cancer Letter contains letters by whistle-blower Bradford Perez who warned officials at Duke about problems with Potti’s research.

Eroom’s law (that’s Moore’s law backward) observes that the number of new drugs approved per billion dollars spent on R&D has halved every nine years since 1950.

**Update**: Here’s an article from Nature that gives more details. The trend is pretty close to a straight line on a log scale, i.e. exponentially declining efficiency.
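To get a feel for what halving every nine years means, here is a small sketch. It takes the statement of Eroom's law at face value as an exact exponential, which is an idealization; the function name and the sample years are my own choices.

```python
def relative_efficiency(year, base_year=1950, halving_time=9.0):
    """Drugs approved per R&D dollar relative to base_year,
    assuming efficiency halves every `halving_time` years."""
    return 0.5 ** ((year - base_year) / halving_time)

for year in (1950, 1959, 1980, 2010):
    print(year, relative_efficiency(year))
```

Taking the logarithm of this function gives a straight line in `year`, which is why the trend looks linear on a log scale.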

**Related post**: Take chances, make mistakes, and get messy

Suppose you have a space ship that could accelerate at 1 g for as long as you like. Inside the ship you would feel the same gravity as on earth. You could travel wherever you like by accelerating at 1 g for the first half of the flight then reversing acceleration for the second half of the flight. This approach could take you to Mars in three days.
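The travel time is easy to work out classically. Covering half the distance *d* from rest takes time √(*d*/*a*), so the whole trip takes 2√(*d*/*a*). The sketch below applies this to a few Earth-Mars distances; the distances chosen and the function name are my own, and the calculation ignores the motion of the planets during the trip.

```python
import math

G = 9.80665            # standard gravity, m/s^2
AU = 1.495978707e11    # astronomical unit, m

def flip_and_burn_time(distance_m, accel=G):
    """Seconds to cover distance_m starting and ending at rest:
    accelerate for the first half, decelerate for the second half."""
    return 2 * math.sqrt(distance_m / accel)

for d_au in (0.5, 1.5, 2.5):
    days = flip_and_burn_time(d_au * AU) / 86400
    print(f"{d_au} AU: {days:.1f} days")
```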

If you could accelerate at 1 g for a year you could reach the speed of light, and travel half a light year. So you could reverse your acceleration and reach a destination a light year away in two years. But this ignores relativity. Once you’re traveling at near the speed of light, time practically stops for you, so you could keep going as far as you like without taking any more time from your perspective. So you could travel **anywhere** in the universe in two years!

Of course there are a few problems. We have no way to sustain such acceleration. Or to build a ship that could sustain an impact with a speck of dust when traveling at relativistic speed. And the calculation ignores relativity until it throws it in at the end. Still, it’s fun to think about.

**Update**: Dan Piponi gives a calculation on G+ that addresses the last of the problems I mentioned above, sticking relativity on to the end of a classical calculation. He does a proper relativistic calculation from the beginning.

If you take the radius of the observable universe to be 45 billion light years, then I think you need about 12.5 g to get anywhere in it in 2 years. (Both those quantities as measured in the frame of reference of the traveler.)

If you travel at constant acceleration a for time t then the distance covered is (c^2/a) (cosh(a t/c) - 1). (Note that this gives the usual a t^2/2 for small t.)
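Here is a sketch of the distance formula from the comment, along with a check that it reduces to the classical *a t*²/2 for small *t*. I've written cosh(*x*) – 1 as 2 sinh²(*x*/2), which is algebraically the same but avoids cancellation when *x* is small. The constants and function names are mine, and *t* is time as measured by the traveler.

```python
import math

C = 299_792_458.0          # speed of light, m/s
G = 9.80665                # standard gravity, m/s^2
YEAR = 365.25 * 86400      # Julian year, s
LIGHT_YEAR = C * YEAR      # m

def distance_covered(a, t):
    """Distance (m) covered from rest at constant acceleration a (m/s^2)
    after time t (s) on the traveler's clock: (c^2/a)(cosh(a t/c) - 1)."""
    x = a * t / C
    return C**2 / a * 2 * math.sinh(x / 2) ** 2  # equals cosh(x) - 1

def classical_distance(a, t):
    return 0.5 * a * t**2

# One hour at 1 g: relativistic and classical answers nearly agree
print(distance_covered(G, 3600) / classical_distance(G, 3600))

# One year at 1 g, in light years
print(distance_covered(G, YEAR) / LIGHT_YEAR)
```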

Which side is correct depends on what’s out there waiting to be discovered, which of course we don’t know. We can only guess. Timid research is rational if you believe there are only marginal improvements that are likely to be discovered.

Sample size increases quickly as the size of the effect you’re trying to find decreases. To establish small differences in effect, you need very large trials.
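To make "quickly" concrete, here is a sketch using the textbook normal-approximation sample size for comparing two means with a two-sided test. This formula is my choice for illustration, not something from a particular trial; the default standard deviation, significance level, and power are conventional placeholder values.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(delta, sigma=1.0, alpha=0.05, power=0.8):
    """Approximate patients per arm needed to detect a difference in means
    of `delta`, given outcome standard deviation `sigma`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

for delta in (1.0, 0.5, 0.25, 0.1):
    print(delta, sample_size_per_arm(delta))
```

The required size grows like 1/δ², so halving the effect you want to detect roughly quadruples the number of patients.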

If you think there are only small improvements on the status quo available to explore, you’ll explore each of the possibilities very carefully. On the other hand, if you think there’s a miracle drug in the pipeline waiting to be discovered, you’ll be willing to risk falsely rejecting small improvements along the way in order to get to the big improvement.

Suppose there are 500 drugs waiting to be tested. All of these are only 10% effective except for one that is 100% effective. You could quickly find the winner by giving each candidate to one patient. For every drug whose patient responded, repeat the process until only one drug is left. One strike and you’re out. You’re likely to find the winner in three rounds, treating fewer than 600 patients. But if all the drugs are 10% effective except one that’s 11% effective, you’d need hundreds of trials with thousands of patients each.
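The one-strike-and-you’re-out scheme is simple to simulate. The sketch below assumes each patient responds independently with the drug’s stated effectiveness; the function name, the seed, and the safety cap on the number of rounds are my own.

```python
import random

def screen(n_drugs=500, weak_rate=0.10, seed=0, max_rounds=1000):
    """Give each surviving drug to one patient per round and drop every
    drug whose patient fails to respond. Drug 0 is the one that always works.
    Returns (surviving drugs, rounds used, patients treated)."""
    rng = random.Random(seed)
    winner = 0
    candidates = list(range(n_drugs))
    rounds = patients = 0
    while len(candidates) > 1 and rounds < max_rounds:
        rounds += 1
        patients += len(candidates)  # one patient per remaining drug
        candidates = [d for d in candidates
                      if d == winner or rng.random() < weak_rate]
    return candidates, rounds, patients

survivors, rounds, patients = screen()
print(survivors, rounds, patients)
```

Running it with different seeds gives a sense of how often the winner emerges within three rounds.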

The best research strategy depends on what you believe is out there to be found. People who know nothing about cancer often believe we could find a cure soon if we just spend a little more money on research. Experts are less sanguine, except when they’re asking for money.

However, a more fundamental point has been lost. At the core of Ioannidis’ paper is the assertion that **the proportion of true hypotheses under investigation matters**. In terms of Bayes’ theorem, the *posterior* probability of a result being correct depends on the *prior* probability of the result being correct. This prior probability is vitally important, and it varies from field to field.

In a field where it is hard to come up with good hypotheses to investigate, most researchers will be testing false hypotheses, and most of their positive results will be coincidences. In another field where people have a good idea what ought to be true before doing an experiment, most researchers will be testing true hypotheses and most positive results will be correct.
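A direct application of Bayes’ theorem shows how strongly the prior matters. The sketch below is a simplified model: it assumes every study has the same power and the same false positive rate and ignores bias, and the particular numbers are illustrative choices of mine rather than figures from the paper.

```python
def prob_true_given_positive(prior, power=0.8, alpha=0.05):
    """Posterior probability that a hypothesis is true given a positive result.

    prior: fraction of tested hypotheses that are true
    power: probability a true hypothesis yields a positive result
    alpha: probability a false hypothesis yields a positive result
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

for prior in (0.5, 0.1, 0.01):
    print(prior, round(prob_true_given_positive(prior), 3))
```

With identical statistical standards, a field testing mostly true hypotheses ends up with mostly correct positive results, while a field testing mostly false hypotheses ends up with mostly coincidences.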

For example, it’s very difficult to come up with a better cancer treatment. Drugs that kill cancer in a petri dish or in animal models usually don’t work in humans. One reason is that these drugs may cause too much collateral damage to healthy tissue. Another reason is that treating human tumors is more complex than treating artificially induced tumors in lab animals. Of all cancer treatments that appear to be an improvement in early trials, very few end up receiving regulatory approval and changing clinical practice.

A greater proportion of physics hypotheses are correct because physics has powerful theories to guide the selection of experiments. Experimental physics often succeeds because it has good support from theoretical physics. Cancer research is more empirical because there is little reliable predictive theory. This means that a published result in physics is more likely to be true than a published result in oncology.

Whether “most” published results are false depends on context. The proportion of false results varies across fields. It is high in some areas and low in others.

I’m not sure whether I agree with Brenner’s quote, but I find it interesting. You could argue that techniques are most important because they have the most leverage. A new technique may lead to many new discoveries and new ideas.

“Oh, the intellectual freedom of academia” he thought while filling out a time sheet which checks that he does not work on non-grant science.

When Coleridge, the most famous poet of the day, wrote his tract on scientific method in 1817 it was not considered an oddity; by 1833, the time of the third meeting of the British Association for the Advancement of Science, it was already remarkable, and in the years that followed it was almost inconceivable.

**Related post**: How the term “scientist” came to be

To me, the subject of “information theory” is badly named. That discipline is devoted to finding ideal compression schemes for messages to be sent quickly and accurately across a noisy channel. It deliberately does not pay any attention to what the messages mean. To my mind this should be called compression theory or redundancy theory. Information is inherently meaningful—that is its purpose—any theory that is unconcerned with the meaning is not really studying information per se. The people who decide on speed limits for roads and highways may care about human health, but a study limited to deciding ideal speed limits should not be called “human health theory”.

Despite what was said above, Information theory has been extremely important in a diverse array of fields, including computer science but also in neuroscience and physics. I’m not trying to denigrate the field; I am only frustrated with its name.

From David Spivak, footnotes 13 and 14 here.

I was surprised by the articles on the bombing of Hiroshima and Nagasaki. New York Times reporter William Laurence was allowed to go on the mission over Nagasaki. He was not on the plane that dropped the bomb, but was in one of the other B-29 Superfortresses that were part of the mission. Laurence’s story was published September 9, 1945, exactly one month after the bombing. Laurence was also allowed to tour the ruins of Hiroshima. His article on the experience was published September 5, 1945. I was surprised how candid these articles were and how quickly they were published. Apparently military secrecy evaporated rapidly once WWII was over.

Another thing that surprised me was that some stories were newsworthy more recently than I would have thought. I suppose I underestimated how long it took to work out the consequences of a major discovery. I think we’re also biased to think that whatever we learned as children must have been known for generations, even though the dust may have only settled shortly before we were born.

When you see something that is technically sweet, you go ahead and do it and argue about what to do about it only after you’ve had your technical success. That is the way it was with the atomic bomb.

Like all the books in the series, The Drug Book is a collection of alternating one-page articles and full-page color photographs, arranged chronologically. These books make great coffee table books because they’re colorful and easy to dip in and out of. The other books in the series are The Space Book, The Physics Book, and The Medical Book.

The book’s definition of “drug” is a little broad. In addition to medicines, it also includes related chemicals such as recreational drugs and poisons. It also includes articles on drug-related reference works and legislation.

In other words, integers are not inputs of the theory, as Bohr thought. They are outputs. The integers are an example of what physicists call an emergent quantity. In this view, the term “quantum mechanics” is a misnomer. Deep down, the theory is not quantum. In systems such as the hydrogen atom, the processes described by the theory mold discreteness from underlying continuity. … The building blocks of our theories are not particles but fields: continuous, fluid-like objects spread throughout space. … The objects we call fundamental particles are not fundamental. Instead they are ripples of continuous fields.

Source: The Unquantum Quantum, Scientific American, December 2012.

Pure mathematics and physics are becoming ever more closely connected, though their methods remain different. One may describe the situation by saying that the mathematician plays a game in which he himself invents the rules while the physicist plays a game in which the rules are provided by Nature, but as time goes on it becomes increasingly evident that the rules which the mathematician finds interesting are the same as those which Nature has chosen.

Here’s something I learned while skimming through the book: Asteroids can have moons. (That’s the title of the article on page 414.) This has been known since the early 1990s, but it’s news to me.

The first example discovered was a satellite now named Dactyl orbiting the asteroid 243 Ida. The Space Book says Dactyl was discovered in 1992. Wikipedia says Dactyl was photographed by the Galileo spacecraft in 1993 and discovered by examining the photos in February of 1994. Since that time, “more than 220 minor planet moons have been found.”

… applied science, purposeful and determined, and pure science, playful and freely curious, continuously support and stimulate each other. The great nation of the future will be the one which protects the freedom of pure science as much as it encourages applied science.

If universities simply paid their faculty a salary rather than giving them a hunting license for grants, the faculty could spend 80% of their time on research rather than 40%. Of course the numbers wouldn’t actually work out so simply. But it is safe to say that if you remove something that takes 40% of their time, researchers could spend more time doing research. (Researchers working in the private sector are often paid by grants too, so to some extent this applies to them as well.)

Universities depend on grant money to pay faculty. But if the money allocated for research were given to universities instead of individuals, universities could afford to pay their faculty.

Not only that, universities could reduce the enormous bureaucracies created to manage grants. This isn’t purely hypothetical. When Hillsdale College decided to refuse all federal grant money, they found that the loss wasn’t nearly as large as it seemed because so much of the grant money had been going to administering grants.

In addition to presenting the advanced physics, which mathematicians find so easy, I also want to explore the workings of elementary physics, and mysterious maneuvers, which physicists seem to find so natural, by which one reduces a complicated physical problem to a simple mathematical question, which I have always found so hard to fathom.

That’s exactly how I feel about physics. I’m comfortable with differential equations and manifolds. It’s blocks and pulleys that kick my butt.

The subtitle may be a little misleading. There is a fair amount of math in the book, but the ratio of history to math is pretty high. You might say the book is more about the role of mathematicians than the role of mathematics. As Roger Penrose says on the back cover, the book has “illuminating descriptions and minimal technicality.”

Someone interested in weather prediction but without a strong math background would enjoy reading the book, though someone who knows more math will recognize some familiar names and theorems and will better appreciate how they fit into the narrative.

All medicine is personalized. If you are in an emergency room with a broken leg and the person next to you is lapsing into a diabetic coma, the two of you will be treated differently.

The aim of personalized medicine is to increase the *degree* of personalization, not to introduce personalization. In particular, there is the popular notion that it will become routine to sequence your DNA any time you receive medical attention, and that this sequence data will enable treatment uniquely customized for you. All we have to do is collect a lot of data and let computers sift through it. There are numerous reasons why this is incredibly naive. Here are three to start with.

- Maybe the information relevant to treating your malady is in how DNA is expressed, not in the DNA per se, in which case a sequence of your genome would be useless. Or maybe the most important information is not genetic at all. The data may not contain the answer.

- Maybe the information a doctor needs is not in one gene but in the interaction of 50 genes or 100 genes. Unless a small number of genes are involved, there is no way to explore the combinations by brute force. For example, the number of ways to select 5 genes out of 20,000 is 26,653,335,666,500,004,000. The number of ways to select 32 genes is over a googol, and there isn’t a googol of anything in the universe. Moore’s law will not get us around this impasse.
- Most clinical trials use no biomarker information at all. It is exceptional to incorporate information from one biomarker. Investigating a handful of biomarkers in a single trial is statistically dubious. Blindly exploring tens of thousands of biomarkers is out of the question, at least with current approaches.
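The counts in the second point are easy to reproduce with Python’s exact integer arithmetic. The sketch below takes the round figure of 20,000 genes used above and finds the smallest subset size for which the number of combinations exceeds a googol.

```python
import math

GENES = 20_000
GOOGOL = 10**100

# Number of ways to choose 5 genes out of 20,000
print(f"{math.comb(GENES, 5):,}")

# Smallest subset size whose number of combinations exceeds a googol
smallest = next(k for k in range(1, GENES) if math.comb(GENES, k) > GOOGOL)
print(smallest)
```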

Genetic technology has the potential to incrementally increase the degree of personalization in medicine. But these discoveries will require new insight, and not simply more data and more computing power.

- Acute myeloid leukemia and myelodysplastic syndrome (AML and MDS)
- Chronic lymphocytic leukemia (CLL)
- Lung cancer
- Melanoma
- Prostate cancer
- Triple negative breast and ovarian cancer

These special areas of research are being called “moon shots” by analogy with John F. Kennedy’s challenge to put a man on the moon. This isn’t a new idea. In fact, a few months after the first moon landing, there was a full-page ad in the Washington Post that began “Mr. Nixon: You can cure cancer.” The thinking was the familiar refrain “If we can put a man on the moon, we can …” President Nixon and other politicians were excited about the idea and announced a “war on cancer.” Scientists, however, were more skeptical. Sol Spiegelman said at the time:

An all-out effort at this time would be like trying to land a man on the moon without knowing Newton’s laws of gravity.

The new moon shots are not a national attempt to “cure cancer” in the abstract. They are six initiatives at one institution to focus research on specific kinds of cancer. And while we do not yet know the analog of Newton’s laws for cancer, we do know far more about the basic biology of cancer than we did in the 1970s.

There are results that suggest that there is some unity beyond the diversity of cancer, that ultimately there are a few common biological pathways involved in all cancers. Maybe some day we will be able to treat cancer in general, but for now it looks like the road forward is specialization. Perhaps specialized research programs will uncover some of these common patterns in all cancers.

**Related links**: