Are tweets more accurate than science papers?

John Myles White brings up an interesting question on Twitter:

Ioannidis thinks most published biological research findings are false. Do you think >50% of tweets are false?

I’m inclined to think tweets may be more accurate than research papers, mostly because people tweet about mundane things that they understand. If someone says that there’s a long line at the Apple store, I believe them. When someone says that a food increases or decreases your risk of some malady, I’m more skeptical. I’ll wait to see such a result replicated before I put much faith in it. A lot of tweets are jokes or opinions, but those that make factual statements are usually true.

Tweets are not subject to publication pressure; few people risk losing their job if they don’t tweet. There’s also not a positive publication bias: people can tweet positive or negative conclusions. There is a bias toward tweeting what makes you look good, but that’s not limited to Twitter.

Errors are corrected quickly on Twitter. When I make factual errors on Twitter, I usually hear about it within minutes. As the saga of Anil Potti illustrates, errors or fraud in scientific papers can take years to retract.

(My experience with Twitter may be atypical. I follow people with a relatively high signal to noise ratio, and among those I have a shorter list that I keep up with.)


Euler characteristic with dice

For any convex solid, V – E + F = 2, where V is the number of vertices, E the number of edges, and F the number of faces. The number 2 in this formula is a topological invariant of a sphere, called its Euler characteristic. But if you compute the Euler characteristic for a figure with a hole in it, you get a different value. For a torus (the surface of a doughnut) we get V – E + F = 0.

You can demonstrate this with eight 6-sided dice. A single die has 8 vertices, 12 edges, and 6 faces, and so V – E + F = 2. Next join two dice together along one face.

two dice joined together

Before joining, the two dice separately have 16 vertices, 24 edges, and 12 faces. But when we join them together, we have 4 fewer vertices since 4 pairs of vertices are identified. Similarly, 4 pairs of edges are identified, and the 2 glued faces disappear into the interior. So the joined pair now has 12 vertices, 20 edges, and 10 faces, and once again V – E + F = 2.

We can keep on adding dice this way, and each time the Euler characteristic doesn’t change. Each new die adds 4 vertices, 8 edges, and 4 faces, so V – E + F doesn’t change.

seven dice in a U-shape

But when we join the dice into a circle, the Euler characteristic changes when we put the last die in place.

eight dice in a torus shape

The last die doesn’t change the total number of vertices, since all its vertices are identified with previous vertices. The last die adds 4 edges. It adds a net of 2 faces: it adds 4 new faces, but it removes 2 existing faces. So the net change to the Euler characteristic is 0 – 4 + 2 = -2. The last die lowers the Euler characteristic from 2 to 0.
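The bookkeeping above is easy to check with a short script. This is just a sketch that tracks the per-die increments worked out in the text, not a general polyhedron library:

```python
# Start with one die: 8 vertices, 12 edges, 6 faces.
V, E, F = 8, 12, 6
assert V - E + F == 2

# Glue six more dice on in a row; each adds 4 vertices, 8 edges, 4 faces.
for _ in range(6):
    V, E, F = V + 4, E + 8, F + 4
assert V - E + F == 2  # still topologically a sphere

# Close the ring with the eighth die: 0 new vertices, 4 new edges,
# and a net of 2 new faces (4 added, 2 removed).
V, E, F = V, E + 4, F + 2
euler_characteristic = V - E + F  # drops from 2 to 0, the torus value
```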

Exercise: Use a similar procedure to find the Euler characteristic of a two-holed torus.

Related: Make your own buckyball

Product of normal PDFs

The product of two normal PDFs is proportional to a normal PDF. This is well known in Bayesian statistics because a normal likelihood times a normal prior gives a normal posterior. But because Bayesian applications don’t usually need to know the proportionality constant, it’s a little hard to find. I needed to calculate this constant, so I’m recording the result here for my future reference and for anyone else who might find it useful.

Denote the normal PDF by

\phi(x; m, s) = \frac{1}{\sqrt{2\pi} s} \exp\left(-\frac{(x-m)^2}{2s^2}\right)

Then the product of two normal PDFs is given by the equation

\phi(x; \mu_1, \sigma_1) \, \phi(x; \mu_2, \sigma_2) = \phi\left(\mu_1; \mu_2, \sqrt{\sigma_1^2 + \sigma_2^2}\right) \,\phi(x; \mu, \sigma)

where

 \mu = \frac{ \sigma_1^{-2} \mu_1 + \sigma_2^{-2} \mu_2}{\sigma_1^{-2} + \sigma_2^{-2} }

and

 \sigma^2 = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}

Note that the product of two normal random variables is not normal, but the product of their PDFs is proportional to the PDF of another normal.
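Here is a quick numerical check of the identity. The means and standard deviations are arbitrary test values, not anything from the derivation:

```python
import math

def phi(x, m, s):
    """Normal PDF with mean m and standard deviation s."""
    return math.exp(-(x - m)**2 / (2 * s**2)) / (math.sqrt(2 * math.pi) * s)

# Arbitrary parameters for the two factors
mu1, sigma1, mu2, sigma2 = 1.0, 2.0, 3.0, 0.5

# Parameters of the product, per the formulas above
mu = (mu1 / sigma1**2 + mu2 / sigma2**2) / (1 / sigma1**2 + 1 / sigma2**2)
sigma = math.sqrt(sigma1**2 * sigma2**2 / (sigma1**2 + sigma2**2))

# The proportionality constant is itself a normal PDF evaluated at mu1
c = phi(mu1, mu2, math.sqrt(sigma1**2 + sigma2**2))

# Compare both sides of the identity at several points
max_err = max(abs(phi(x, mu1, sigma1) * phi(x, mu2, sigma2) - c * phi(x, mu, sigma))
              for x in (-1.0, 0.0, 2.5, 4.0))
```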

Make your own buckyball

This weekend a couple of my daughters and I put together a buckyball from a Zometool kit. The shape is named for Buckminster Fuller of geodesic dome fame. Two years after Fuller’s death, scientists discovered that the shape appears naturally in the form of a C60 molecule, named Buckminsterfullerene in his honor. In geometric lingo, the shape is a truncated icosahedron. It’s also the shape of many soccer balls.

I used the buckyball to introduce Euler’s formula: V – E + F = 2. (The number of vertices (black balls) minus the number of edges (blue sticks) plus the number of faces (in this case, pentagons and hexagons) always equals 2 for a shape that can be deformed into a sphere.) Being able to physically add and remove vertices and edges makes the induction proof of Euler’s formula really tangible. Then we looked at 6- and 12-sided dice to show that V – E + F = 2 for these shapes as well.
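You can verify Euler’s formula for the buckyball from its face counts alone. This sketch uses two facts about the truncated icosahedron: every edge borders exactly two faces, and every vertex touches exactly three:

```python
pentagons, hexagons = 12, 20           # faces of a truncated icosahedron
F = pentagons + hexagons               # 32 faces
incidences = 5 * pentagons + 6 * hexagons  # face-edge incidences
E = incidences // 2                    # each edge shared by 2 faces
V = incidences // 3                    # each vertex touched by 3 faces
euler = V - E + F                      # Euler's formula
```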

Thanks to Zometool for sending me the kit.

Update: How to show that the Euler characteristic of a torus is zero

Related post: Platonic solids

Ramanujan’s most beautiful identity

G. H. Hardy called the following equation Ramanujan’s “most beautiful identity.”

For |q| < 1,

\sum_{n=0}^\infty p(5n+4) q^n = 5 \prod_{n=1}^\infty \frac{(1 - q^{5n})^5}{(1 - q^n)^6}

If I understood it, I might say it’s beautiful, but for now I can only say it’s mysterious. Still, I explain what I can.

The function p on the left side is the partition function. For a positive integer argument n, p(n) is the number of ways one can write n as the sum of a non-decreasing sequence of positive integers.

The right side of the equation is an example of a q-series. Strictly speaking it’s a product, not a series, but it’s the kind of thing that goes under the general heading of q-series.

I hardly know anything about q-series, and on first encounter they don’t seem well motivated. However, I keep running into them in unexpected places. They seem to be a common thread running through several things I’m vaguely familiar with and would like to understand better.

As mysterious as Ramanujan’s identity is, it’s not entirely unprecedented. In the eighteenth century, Euler proved that the generating function for partition numbers is a q-product:

\sum_{n=0}^\infty p(n) q^n = \prod_{n=1}^\infty \frac{1}{(1 - q^n)}

So in discovering his most beautiful identity (and others) Ramanujan followed in Euler’s footsteps.
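One consequence of the identity is easy to check by computer: since every coefficient on the right side is a multiple of 5, p(5n + 4) is always divisible by 5 (Ramanujan’s congruence). Here is a sketch using the standard dynamic-programming computation of partition numbers:

```python
def partition_counts(N):
    """Return [p(0), p(1), ..., p(N)], building up one allowed part at a time."""
    p = [1] + [0] * N
    for part in range(1, N + 1):
        for n in range(part, N + 1):
            p[n] += p[n - part]
    return p

p = partition_counts(100)
# p(4) = 5, p(9) = 30, p(14) = 135, ... all divisible by 5
divisible = all(p[5 * n + 4] % 5 == 0 for n in range(20))
```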

Reference: An Invitation to q-series

Sun, milk, red meat, and least-squares

I thought this tweet from @WoodyOsher was pretty funny.

Everything our parents said was good is bad. Sun, milk, red meat … the least-squares method.

I wouldn’t say these things are bad, but they are now viewed more critically than they were a generation ago.

Sun exposure may be an apt example since it has alternately been seen as good or bad throughout history. The latest I’ve heard is that moderate sun exposure may lower your risk of cancer, even skin cancer, presumably because of vitamin D production. And sunlight appears to reduce your risk of multiple sclerosis, since MS is more prevalent at higher latitudes. But as with milk, red meat, or the least squares method, you can overdo it.

More on least squares: When it works, it works really well

Python for data analysis

I recommend using Python for data analysis, and I recommend Wes McKinney’s book Python for Data Analysis.

I prefer Python to R for mathematical computing because mathematical computing doesn’t exist in a vacuum; there’s always other stuff to do. I find doing mathematical programming in a general-purpose language is easier than doing general-purpose programming in a mathematical language. Also, general-purpose languages like Python have larger user bases, are better designed, have better tool support, etc.

Python per se doesn’t have everything you need for mathematical computing. You need to combine several tools and libraries, typically at least SciPy, matplotlib, and IPython. Because several pieces are involved, it’s hard to find one source that explains how to use them all together. Also, even with the three components just mentioned, you need additional software for working with structured data.

Wes McKinney developed the pandas library to give Python “rich data structures and functions designed to make working with structured data fast, easy, and expressive.” And now he has addressed the need for unified exposition by writing a single book that describes how to use the Python mathematical computing stack. Importantly, the book covers two recent developments that make Python more competitive with other environments for data analysis: enhancements to IPython and Wes’ own pandas project.
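As a taste of what pandas provides, here is a toy example with made-up data (not from the book): the “split-apply-combine” pattern that takes several lines of bookkeeping in plain Python becomes a one-liner.

```python
import pandas as pd

# A small table of made-up sales records
df = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "sales": [10, 20, 30, 40],
})

# Group by store and total the sales in one expression
totals = df.groupby("store")["sales"].sum()
```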

Python for Data Analysis is available for pre-order. I don’t know when it will be released, but Amazon lists the publication date as October 29. My review copy was a PDF, but at least one paper copy has been spotted in the wild:

Wes McKinney holding his book at O’Reilly’s Strata Conference. Photo posted on Twitter yesterday.

Dead man writing

Paul Erdős was an extraordinary mathematical collaborator. He traveled constantly, cross-pollinating the mathematical community. He wrote about 1500 papers and had around 500 coauthors. According to Ron Graham,

He’s still writing papers, actually. He’s slowed down. Because many people started a paper with Erdős and have let it lay in a stack some place and didn’t quite get around to it … In the last couple years he’s published three or four papers. Of course he’s been dead almost 15 years, so he’s slowed a bit.

For more on Erdős, listen to Samuel Hansen’s excellent podcast.

More Paul Erdős posts

Dimension 5 isn’t so special

Lately I’ve been reading The Best Writing on Mathematics 2012. I’d like to present an alternative perspective on one of the articles.

In his article “An Adventure in the Nth Dimension,” Brian Hayes explores how in high dimensions, balls have surprisingly little volume. As the dimension n increases, the volume of a ball of radius 1 increases until n = 5. Then for larger n the volume steadily decreases. Hayes asks

What is it about five-dimensional space that allows a unit 5-ball to spread out more expansively than any other n-ball?

He says that it all has to do with the value of π and that if π were different, the unit ball would attain its maximum volume in a different dimension n. While that is true, it seems odd to speculate about changing the value of π. It seems much more natural to speculate about changing the radius of the balls.

The volume of a ball of radius r in dimension n is

V = \frac{\pi^{\frac{n}{2}} r^n}{\Gamma\left(\frac{n}{2} + 1\right)}

If we fix r at 1 and let n vary, we get a curve like this:

But for different values of r, the plot will have its maximum at different values of n. For example, here is the curve for balls of radius 2:

Let’s think of n in our volume formula as a continuous variable so we can differentiate with respect to n. It turns out to be more convenient to work with the logarithm of the volume. This makes no difference: the logarithm of a function takes on its maximum exactly where the original function does since log is an increasing function.

\frac{d}{dn} \log V = \frac{1}{2} \log \pi + \log r - \frac{1}{2} \psi\left(\frac{n}{2} + 1\right)

We can tell from this equation that volume (eventually) decreases as a function of n because ψ is an unbounded increasing function. The derivative has a unique zero, and we can move the location of that zero out by increasing r. So for any dimension n, we can solve for a value of r such that a ball of radius r has its maximum volume in that dimension:

r = \exp\left( \frac{1}{2}\left( \psi\left(\frac{n}{2}+1\right) - \log \pi \right)\right)
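Here is a numerical check of these claims. It’s a sketch that uses math.lgamma for the log of the volume formula and a crude finite-difference approximation to the digamma function ψ:

```python
import math

def log_volume(n, r):
    """Log of the volume of an n-ball of radius r."""
    return 0.5 * n * math.log(math.pi) + n * math.log(r) - math.lgamma(0.5 * n + 1)

def digamma(x, h=1e-6):
    """Finite-difference approximation to psi(x) = d/dx log Gamma(x)."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

# For radius 1, volume peaks at dimension 5 among integers
best_r1 = max(range(1, 51), key=lambda n: log_volume(n, 1.0))

# For radius 2, the peak moves out to a much higher dimension
best_r2 = max(range(1, 51), key=lambda n: log_volume(n, 2.0))

# Radius at which the continuous maximum falls exactly at n = 5;
# it is slightly less than 1, since for r = 1 the continuous peak is near 5.26
r5 = math.exp(0.5 * (digamma(5 / 2 + 1) - math.log(math.pi)))
```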

Related: High dimensional integration

Fixing computers

When I was growing up and ordinary people were becoming aware of computers, my father told me that he thought there would be good money in fixing computers when they break down.

Looking back on this, it’s obvious why he would say that: he fixed things. I could just imagine a salesman saying at the same time “Son, there’s going to be good money in selling computers.” Maybe a policeman was telling his son that computer crime was going to be a big problem some day. And maybe a politician was telling his son that we’ve got to find a way to tax computers.

* * *

I first posted this on Google+ a few days ago.