Are tweets more accurate than science papers?

John Myles White brings up an interesting question on Twitter:

Ioannidis thinks most published biological research findings are false. Do you think >50% of tweets are false?

I’m inclined to think tweets may be more accurate than research papers, mostly because people tweet about mundane things that they understand. If someone says that there’s a long line at the Apple store, I believe them. When someone says that a food increases or decreases your risk of some malady, I’m more skeptical. I’ll wait to see such a result replicated before I put much faith in it. A lot of tweets are jokes or opinions, but those that are factual statements are often true.

Tweets are not subject to publication pressure; few people risk losing their job if they don’t tweet. There’s also not a positive publication bias: people can tweet positive or negative conclusions. There is a bias toward tweeting what makes you look good, but that’s not limited to Twitter.

Errors are corrected quickly on Twitter. When I make factual errors on Twitter, I usually hear about it within minutes. As the saga of Anil Potti illustrates, errors or fraud in scientific papers can take years to retract.

(My experience with Twitter may be atypical. I follow people with a relatively high signal to noise ratio, and among those I have a shorter list that I keep up with.)


Euler characteristic with dice

For any convex solid, V – E + F = 2 where V is the number of vertices, E the number of edges, and F the number of faces. The number 2 in this formula is a topological invariant of a sphere, called its Euler characteristic. But if you compute the Euler characteristic for a figure with a hole in it, you get a different value. For a torus (the surface of a doughnut) we get V – E + F = 0.

You can demonstrate this with eight 6-sided dice. A single die has 8 vertices, 12 edges, and 6 faces, and so V – E + F = 2. Next join two dice together along one face.

two dice joined together

Before joining, the two dice separately have 16 vertices, 24 edges, and 12 faces. But when we join them together, we have 4 fewer vertices since 4 pairs of vertices are identified together. Similarly, 4 pairs of edges are identified, and the 2 faces that are glued together disappear from the surface. So the joined pair now has 12 vertices, 20 edges, and 10 faces, and once again V – E + F = 2.

We can keep on adding dice this way, and each time the Euler characteristic doesn’t change. Each new die adds 4 vertices, 8 edges, and 4 faces, so V – E + F doesn’t change.

seven dice in a U-shape

But when we join the dice into a circle, the Euler characteristic changes when we put the last die in place.

eight dice in a torus shape

The last die doesn’t change the total number of vertices, since all its vertices are identified with previous vertices. The last die adds 4 edges. It adds a net of 2 faces: it adds 4 new faces, but it removes 2 existing faces. So the net change to the Euler characteristic is 0 – 4 + 2 = -2. The last die lowers the Euler characteristic from 2 to 0.
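The counts above can be checked by brute force. The sketch below is an illustration rather than part of the original demonstration: it models each die as a unit cube on an integer grid, keeps only the faces that are not glued to a neighboring die, and counts the distinct vertices, edges, and faces of the resulting surface. The name `surface_counts` and the grid layouts are assumptions made for the example, and the code assumes dice meet only along whole faces, so the surface has no pinch points.

```python
def surface_counts(cubes):
    """Return (V, E, F) for the surface of a union of unit cubes.

    Each cube is given by the integer coordinates of its lowest corner.
    """
    cubes = set(cubes)
    vertices, edges, faces = set(), set(), set()
    for (x, y, z) in cubes:
        for axis in range(3):
            for side in (0, 1):
                # The cell on the other side of this face.
                step = [0, 0, 0]
                step[axis] = 1 if side else -1
                neighbor = (x + step[0], y + step[1], z + step[2])
                if neighbor in cubes:
                    continue  # glued to another die, so not on the surface
                # Corners of the face, listed in cyclic order.
                corners = []
                for a, b in ((0, 0), (0, 1), (1, 1), (1, 0)):
                    offset = [a, b]
                    offset.insert(axis, side)
                    corners.append((x + offset[0], y + offset[1], z + offset[2]))
                faces.add(frozenset(corners))
                vertices.update(corners)
                for i in range(4):
                    edges.add(frozenset((corners[i], corners[(i + 1) % 4])))
    return len(vertices), len(edges), len(faces)


def euler_characteristic(cubes):
    v, e, f = surface_counts(cubes)
    return v - e + f


# Eight dice in a ring: a 3-by-3 square of cells with the center removed.
ring = [(i, j, 0) for i in range(3) for j in range(3) if (i, j) != (1, 1)]
# Seven dice in a U-shape: the ring with one non-corner die left out.
u_shape = [c for c in ring if c != (1, 0, 0)]

for name, dice in [("one die", [(0, 0, 0)]),
                   ("two dice", [(0, 0, 0), (1, 0, 0)]),
                   ("U-shape", u_shape),
                   ("ring", ring)]:
    print(name, surface_counts(dice), euler_characteristic(dice))
```

The U-shape leaves out a die from the middle of a side. Leaving out a corner die instead would make two dice touch along a single edge, which is not the configuration described above.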

Exercise: Use a similar procedure to find the Euler characteristic of a two-holed torus.

Related: Make your own buckyball

For daily tweets on topology and geometry, follow @TopologyFact on Twitter.


Product of normal PDFs

The product of two normal PDFs is proportional to a normal PDF. This is well known in Bayesian statistics because a normal likelihood times a normal prior gives a normal posterior. But because Bayesian applications don’t usually need to know the proportionality constant, it’s a little hard to find. I needed to calculate this constant, so I’m recording the result here for my future reference and for anyone else who might find it useful.

Denote the normal PDF by

\phi(x; m, s) = \frac{1}{\sqrt{2\pi} s} \exp\left(-\frac{(x-m)^2}{2s^2}\right)

Then the product of two normal PDFs is given by the equation

\phi(x; \mu_1, \sigma_1) \, \phi(x; \mu_2, \sigma_2) = \phi\left(\mu_1; \mu_2, \sqrt{\sigma_1^2 + \sigma_2^2}\right) \,\phi(x; \mu, \sigma)

where

 \mu = \frac{ \sigma_1^{-2} \mu_1 + \sigma_2^{-2} \mu_2}{\sigma_1^{-2} + \sigma_2^{-2} }

and

 \sigma^2 = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}

Note that the product of two normal random variables is not normal, but the product of their PDFs is proportional to the PDF of another normal.
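A quick numerical check of the identity may be reassuring. The following is a minimal sketch using only the standard library; the function names and the test parameters are made up for illustration.

```python
import math


def normal_pdf(x, m, s):
    """Normal density with mean m and standard deviation s."""
    return math.exp(-(x - m) ** 2 / (2 * s ** 2)) / (math.sqrt(2 * math.pi) * s)


def product_parameters(mu1, sigma1, mu2, sigma2):
    """Return (c, mu, sigma) with pdf1(x) * pdf2(x) = c * normal_pdf(x, mu, sigma)."""
    w1, w2 = sigma1 ** -2, sigma2 ** -2  # precisions
    mu = (w1 * mu1 + w2 * mu2) / (w1 + w2)
    sigma = math.sqrt(sigma1 ** 2 * sigma2 ** 2 / (sigma1 ** 2 + sigma2 ** 2))
    c = normal_pdf(mu1, mu2, math.sqrt(sigma1 ** 2 + sigma2 ** 2))
    return c, mu, sigma


# Arbitrary example values.
mu1, sigma1, mu2, sigma2 = 1.0, 2.0, -0.5, 0.7
c, mu, sigma = product_parameters(mu1, sigma1, mu2, sigma2)
for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
    lhs = normal_pdf(x, mu1, sigma1) * normal_pdf(x, mu2, sigma2)
    rhs = c * normal_pdf(x, mu, sigma)
    print(x, lhs, rhs)
```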



Make your own buckyball

This weekend a couple of my daughters and I put together a buckyball from a Zometool kit. The shape is named for Buckminster Fuller of geodesic dome fame. Two years after Fuller’s death, scientists discovered that the shape appears naturally in the form of a C60 molecule, named Buckminsterfullerene in his honor. In geometric lingo, the shape is a truncated icosahedron. It’s also the shape of many soccer balls.

I used the buckyball to introduce Euler’s formula:  V – E + F = 2. (The number of vertices (black balls) minus the number of edges (blue sticks) plus the number of faces (in this case, pentagons and hexagons) always equals 2 for a shape that can be deformed into a sphere.) Being able to physically add and remove vertices or nodes makes the induction proof of Euler’s formula really tangible. Then we looked at 6- and 12-sided dice to show that V – E + F = 2 for these shapes as well.

Thanks to Zometool for sending me the kit.

Update: How to show that the Euler characteristic of a torus is zero

Related post: Platonic solids


Ramanujan’s most beautiful identity

G. H. Hardy called the following equation Ramanujan’s “most beautiful identity.” For |q| < 1,

\sum_{n=0}^\infty p(5n+4) q^n = 5 \prod_{n=1}^\infty \frac{(1 - q^{5n})^5}{(1 - q^n)^6}

If I understood it, I might say it’s beautiful, but for now I can only say it’s mysterious. Still, I’ll explain what I can.

The function p on the left side is the partition function. For a positive integer argument n, p(n) is the number of ways one can write n as the sum of a non-decreasing sequence of positive integers.

The right side of the equation is an example of a q-series. Strictly speaking it’s a product, not a series, but it’s the kind of thing that goes under the general heading of q-series.

I hardly know anything about q-series, and they don’t seem very motivated. However, I keep running into them in unexpected places. They seem to be a common thread running through several things I’m vaguely familiar with and would like to understand better.

As mysterious as Ramanujan’s identity is, it’s not entirely unprecedented. In the eighteenth century, Euler proved that the generating function for partition numbers is a q-product:

\sum_{n=0}^\infty p(n) q^n = \prod_{n=1}^\infty \frac{1}{(1 - q^n)}

So in discovering his most beautiful identity (and others) Ramanujan followed in Euler’s footsteps.
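Both identities can be checked term by term with integer arithmetic on truncated power series. The sketch below is only a sanity check, not a proof, and the function names are invented for the example. Dividing a series by (1 - q^n) amounts to a running sum with stride n, and multiplying by (1 - q^m) amounts to subtracting a shifted copy.

```python
def partitions(N):
    """Return [p(0), ..., p(N)] by expanding Euler's product up to q^N."""
    p = [1] + [0] * N
    for k in range(1, N + 1):
        # Divide by (1 - q^k): running sum with stride k.
        for i in range(k, N + 1):
            p[i] += p[i - k]
    return p


def ramanujan_rhs(N):
    """Coefficients of q^0..q^N in 5 * prod (1 - q^(5n))^5 / (1 - q^n)^6."""
    c = [5] + [0] * N
    for n in range(1, N + 1):
        for _ in range(6):
            # Divide by (1 - q^n).
            for i in range(n, N + 1):
                c[i] += c[i - n]
        m = 5 * n
        if m <= N:
            for _ in range(5):
                # Multiply by (1 - q^m), working downward so that
                # c[i - m] still holds the old coefficient.
                for i in range(N, m - 1, -1):
                    c[i] -= c[i - m]
    return c


N = 20
p = partitions(5 * N + 4)
lhs = [p[5 * n + 4] for n in range(N + 1)]
print(lhs == ramanujan_rhs(N))
```

One consequence visible in the output is that every p(5n + 4) is divisible by 5, since the right side has an overall factor of 5 and integer coefficients.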

Reference: An Invitation to q-series

For daily posts on analysis, follow @AnalysisFact on Twitter.


Sun, milk, red meat, and least-squares

I thought this tweet from @WoodyOsher was pretty funny.

Everything our parents said was good is bad. Sun, milk, red meat … the least-squares method.

I wouldn’t say these things are bad, but they are now viewed more critically than they were a generation ago.

Sun exposure may be an apt example since it has alternately been seen as good or bad throughout history. The latest I’ve heard is that moderate sun exposure may lower your risk of cancer, even skin cancer, presumably because of vitamin D production. And sunlight appears to reduce your risk of multiple sclerosis since MS is more prevalent at higher latitudes. But like milk, red meat, or the least squares method, you can overdo it.

More on least squares: When it works, it works really well

Python for data analysis

I recommend using Python for data analysis, and I recommend Wes McKinney’s book Python for Data Analysis.

I prefer Python to R for mathematical computing because mathematical computing doesn’t exist in a vacuum; there’s always other stuff to do. I find doing mathematical programming in a general-purpose language is easier than doing general-purpose programming in a mathematical language. Also, general-purpose languages like Python have larger user bases, are better designed, have better tool support, etc.

Python per se doesn’t have everything you need for mathematical computing. You need to combine several tools and libraries, typically at least SciPy, matplotlib, and IPython. Because there are different pieces involved, it’s hard to find one source to explain using them all together. Also, even with the three additional components mentioned before, there is a need for additional software for working with structured data.

Wes McKinney developed the pandas library to give Python “rich data structures and functions designed to make working with structured data fast, easy, and expressive.” And now he has addressed the need for unified exposition by writing a single book that describes how to use the Python mathematical computing stack. Importantly, the book covers two recent developments that make Python more competitive with other environments for data analysis: enhancements to IPython and Wes’ own pandas project.

Python for Data Analysis is available for pre-order. I don’t know when the book will be available but Amazon lists the publication date as October 29. My review copy was a PDF, but at least one paper copy has been spotted in the wild:

Wes McKinney holding his book at O’Reilly’s Strata Conference. Photo posted on Twitter yesterday.

For daily tips on Python and scientific computing, follow @SciPyTip on Twitter.


Dead man writing

Paul Erdős was an extraordinary mathematical collaborator. He traveled constantly, cross-pollinating the mathematical community. He wrote about 1500 papers and had around 500 coauthors. According to Ron Graham,

He’s still writing papers, actually. He’s slowed down. Because many people started a paper with Erdős and have let it lay in a stack some place and didn’t quite get around to it … In the last couple years he’s published three or four papers. Of course he’s been dead almost 15 years, so he’s slowed a bit.

For more on Erdős, listen to Samuel Hansen’s excellent podcast.


Dimension 5 isn’t so special

Lately I’ve been reading The Best Writing on Mathematics 2012. I’d like to present an alternative perspective on one of the articles.

In his article “An Adventure in the Nth Dimension,” Brian Hayes explores how in high dimensions, balls have surprisingly little volume. As the dimension n increases, the volume of a ball of radius 1 increases until n = 5. Then for larger n the volume steadily decreases. Hayes asks

What is it about five-dimensional space that allows a unit 5-ball to spread out more expansively than any other n-ball?

He says that it all has to do with the value of π and that if π were different, the unit ball would have its maximum volume for a different dimension n. While that is true, it seems odd to speculate about changing the value of π. It seems much more natural to speculate about changing the radius of the balls.

The volume of a ball of radius r in dimension n is

V = \frac{\pi^{\frac{n}{2}} r^n}{\Gamma\left(\frac{n}{2} + 1\right)}

If we fix r at 1 and let n vary, we get a curve like this:

But for different values of r, the plot will have its maximum at different values of n. For example, here is the curve for balls of radius 2:

Let’s think of n in our volume formula as a continuous variable so we can differentiate with respect to n. It turns out to be more convenient to work with the logarithm of the volume. This makes no difference: the logarithm of a function takes on its maximum exactly where the original function does since log is an increasing function.

\frac{d}{dn} \log V = \frac{1}{2} \log \pi + \log r - \frac{1}{2} \psi\left(\frac{n}{2} + 1\right)

Here ψ is the digamma function, the derivative of log Γ. We can tell from this equation that volume (eventually) decreases as a function of n because ψ is an unbounded increasing function. The derivative has a unique zero, and we can move the location of that zero out by increasing r. So for any dimension n, we can solve for a value of r such that a ball of radius r has its maximum volume in that dimension:

r = \exp\left( \frac{1}{2}\left( \psi\left(\frac{n}{2}+1\right) - \log \pi \right)\right)
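Here is a small sketch for experimenting with these formulas. Setting the derivative of log V above to zero gives log r = (ψ(n/2 + 1) - log π)/2, which is what `radius_for_peak` computes. The function names are invented for the example, and because the standard library has no digamma function, the code approximates ψ with a central difference of `math.lgamma`; a library routine would be the better choice in real work.

```python
import math


def log_volume(n, r=1.0):
    """Log of the volume of a ball of radius r in dimension n."""
    return 0.5 * n * math.log(math.pi) + n * math.log(r) - math.lgamma(0.5 * n + 1)


def peak_dimension(r, max_dim=200):
    """Integer dimension (up to max_dim) in which a ball of radius r has the most volume."""
    return max(range(1, max_dim + 1), key=lambda n: log_volume(n, r))


def digamma(x, h=1e-5):
    """Approximate psi(x) = d/dx log Gamma(x) by a central difference."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)


def radius_for_peak(n):
    """Radius at which the (continuous) volume curve peaks at dimension n."""
    return math.exp(0.5 * (digamma(0.5 * n + 1) - math.log(math.pi)))


print(peak_dimension(1.0))                  # unit balls
print(peak_dimension(2.0))                  # balls of radius 2
print(peak_dimension(radius_for_peak(10)))  # radius chosen so the peak is at n = 10
```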

Related: High dimensional integration

Fixing computers

When I was growing up and ordinary people were becoming aware of computers, my father told me that he thought there would be good money in fixing computers when they break down.

Looking back on this, it’s obvious why he would say that: he fixed things. I could just imagine a salesman saying at the same time “Son, there’s going to be good money in selling computers.” Maybe a policeman was telling his son that computer crime was going to be a big problem some day. And maybe a politician was telling his son that we’ve got to find a way to tax computers.

* * *

I first posted this on Google+ a few days ago.

Volatility in adaptive randomization

Randomized clinical trials essentially flip a coin to assign patients to treatment arms. Outcome-adaptive randomization “bends” the coin to favor what appears to be the better treatment at the time each randomized assignment is made. The method aims to treat more patients in the trial effectively, and on average it succeeds.

However, looking only at the average number of patients assigned to each treatment arm conceals the fact that the number of patients assigned to each arm can be surprisingly variable compared to equal randomization.

Suppose we have 100 patients to enroll in a clinical trial. If we assign each patient to a treatment arm with probability 1/2, there will be about 50 patients on each treatment. The following histogram shows the number of patients assigned to the first treatment arm in 1000 simulations. The standard deviation is about 5.

Next we let the randomization probability vary. Suppose the true probability of response is 50% on one arm and 70% on the other. We model the probability of response on each arm as a beta distribution, starting from a uniform prior. We randomize to an arm with probability equal to the posterior probability that that arm has higher response. The histogram below shows the number of patients assigned to the better treatment in 1000 simulations.

The standard deviation in the number of patients is now about 17. Note that while most trials assign 50 or more patients to the better treatment, some trials in this simulation put fewer than 20 patients on this treatment. Not only will these trials treat patients less effectively, they will also have low statistical power (as will the trials that put nearly all the patients on the better arm).

The reason for this volatility is that the method can easily be misled by early outcomes. With one or two early failures on an arm, the method could assign more patients to the other arm and not give the first arm a chance to redeem itself.

Because of this dynamic, various methods have been proposed to add “ballast” to adaptive randomization. See a comparison of three such methods here. These methods reduce the volatility in adaptive randomization, but do not eliminate it. For example, the following histogram shows the effect of adding a burn-in period to the example above, randomizing the first 20 patients equally.

The standard deviation is now 13.8, less than without the burn-in period, but still large compared to a standard deviation of 5 for equal randomization.

Another approach is to transform the randomization probability. If we use an exponential tuning parameter of 0.5, the sample standard deviation of the number of patients on the better arm is essentially the same, 13.4. If we combine a burn-in period of 20 and an exponential parameter of 0.5, the sample standard deviation is 11.7, still more than twice that of equal randomization.
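The simulations described above can be approximated with a short script. This is a sketch under stated assumptions, not the code behind the histograms: it assumes each patient’s outcome is known before the next patient is randomized, and the function names are made up. Rather than computing the posterior probability that one arm is better, it draws one sample from each arm’s beta posterior and assigns the patient to the arm with the larger draw. That picks each arm with exactly the posterior probability that it has the higher response rate, so the assignment rule is the same. The exponential tuning parameter is not covered here.

```python
import random
import statistics


def simulate_trial(rng, p=(0.5, 0.7), n_patients=100, adaptive=True, burn_in=0):
    """Simulate one trial; return the number of patients on the second arm.

    p holds the true response probabilities of the two arms.
    """
    successes = [0, 0]
    failures = [0, 0]
    assigned = [0, 0]
    for i in range(n_patients):
        if adaptive and i >= burn_in:
            # One draw from each Beta posterior (uniform prior = Beta(1, 1)).
            draws = [rng.betavariate(1 + successes[k], 1 + failures[k])
                     for k in (0, 1)]
            arm = 0 if draws[0] > draws[1] else 1
        else:
            arm = rng.randrange(2)  # fair coin
        assigned[arm] += 1
        if rng.random() < p[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return assigned[1]


def sd_on_better_arm(n_trials=1000, seed=0, **kwargs):
    """Sample standard deviation of the count on the second (better) arm."""
    rng = random.Random(seed)
    counts = [simulate_trial(rng, **kwargs) for _ in range(n_trials)]
    return statistics.stdev(counts)


print("equal randomization:", sd_on_better_arm(adaptive=False))
print("adaptive:           ", sd_on_better_arm(adaptive=True))
print("adaptive, burn-in 20:", sd_on_better_arm(adaptive=True, burn_in=20))
```

The exact numbers depend on the seed and on the assumptions above, so they should be compared with the figures in the text only roughly.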


Competence and prestige

The phrase “downward nobility” is a pun on “upward mobility.” It usually refers to taking a less lucrative but more admired position. For example, it might be used to describe a stock broker who becomes a teacher in a poor school. (I don’t believe that being a teacher is necessarily more noble than being a stock broker, but many people would think so.)

Daniel Lemire looks at a variation on downward nobility in his blog post Why you may not like your job, even though everyone envies you. He comments on Matt Welsh’s decision to leave a position as a tenured professor at Harvard to develop software for Google. Welsh may not have taken a pay cut — he may well have gotten a raise — but he took a cut in prestige in order to do work that he found more fulfilling.

The Peter Principle describes how people take more prestigious positions as they become less competent. The kind of downward nobility Daniel describes is a sort of anti-Peter Principle, taking a step down in prestige to move deeper into your area of competence.

Paul Graham touches on this disregard for prestige in his essay How to do what you love.

If you admire two kinds of work equally, but one is more prestigious, you should probably choose the other. Your opinions about what’s admirable are always going to be slightly influenced by prestige, so if the two seem equal to you, you probably have more genuine admiration for the less prestigious one.

Matt Welsh now has a less prestigious position in the assessment of the general public. But in a sense he didn’t give up prestige for competence. Instead, he chose a new environment in which his area of competence carries more prestige.


Seven John McCarthy papers in seven weeks

I recently ran across a series of articles from Carin Meier going through seven papers by the late computer science pioneer John McCarthy in seven weeks. Published so far:

#1: Ascribing Mental Qualities to Machines
#2: Towards a Mathematical Science of Computation

Carin has announced that the next paper will be “First Order Theories of Individual Concepts and Propositions” but she hasn’t posted a commentary on it yet.

Shifting probability distributions

One reason the normal distribution is easy to work with is that you can vary the mean and variance independently. With other distribution families, the mean and variance may be linked in some nonlinear way.

I was looking for a faster way to compute Prob(X > Y + δ) where X and Y are independent inverse gamma random variables. If δ were zero, the probability could be computed analytically. But when δ is positive, the calculation requires numerical integration. When the calculation is in the inner loop of a simulation, most of the simulation’s time is spent doing the integration.

Let Z = Y + δ. If Z were another inverse gamma random variable, we could compute Prob(X > Z) quickly and accurately without integration. Unfortunately, Z is not an inverse gamma. But it is approximately an inverse gamma, at least if Y has a moderately large shape parameter, which it always does in my applications. So let Z be inverse gamma with parameters to match the mean and variance of Y + δ. Then Prob(X > Z) is a good approximation to Prob(X > Y + δ).
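The moment-matching step can be sketched in a few lines. This is only an illustration of the idea, not the implementation from the report linked below. It assumes the shape/scale parameterization of the inverse gamma, for which the mean is scale/(shape - 1) and the variance is scale²/((shape - 1)²(shape - 2)), and it requires shape > 2 so that the variance exists. The function names are hypothetical.

```python
def inverse_gamma_moments(shape, scale):
    """Mean and variance of an inverse gamma (shape/scale parameterization)."""
    if shape <= 2:
        raise ValueError("variance is finite only for shape > 2")
    mean = scale / (shape - 1)
    var = scale ** 2 / ((shape - 1) ** 2 * (shape - 2))
    return mean, var


def match_shifted_inverse_gamma(shape, scale, delta):
    """Parameters of the inverse gamma Z matching the mean and variance of Y + delta.

    Y is inverse gamma with the given shape and scale. Shifting by delta
    moves the mean by delta and leaves the variance unchanged.
    """
    mean, var = inverse_gamma_moments(shape, scale)
    mean += delta
    new_shape = mean ** 2 / var + 2
    new_scale = mean * (new_shape - 1)
    return new_shape, new_scale


# Arbitrary example: Y has shape 10 and scale 9, shifted by delta = 0.3.
shape_z, scale_z = match_shifted_inverse_gamma(10.0, 9.0, 0.3)
print(shape_z, scale_z)
print(inverse_gamma_moments(shape_z, scale_z))
```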

For more details, see Fast approximation of inverse gamma inequalities.


For daily posts on probability, follow @ProbFact on Twitter.


Being useful

Chuck Bearden posted this quote from Steve Holmes on his blog the other day:

Usefulness comes not from pursuing it, but from patiently gathering enough of a reservoir of material so that one has the quirky bit of knowledge … that turns out to be the key to unlocking the problem which someone offers.

Holmes was speaking specifically of theology. I edited out some of the particulars of his quote to emphasize that his idea applies more generally.

Obviously usefulness can come from pursuing it. But there’s a special pleasure in applying some “quirky bit of knowledge” that you acquired for its own sake. It can feel like simply walking up to a gate and unlocking it after unsuccessful attempts to storm the gate by force.