When does the sum of three numbers equal their product?

Mathematics Diary posted the following identity this morning. If a + b + c = π then

tan(a) + tan(b) + tan(c) = tan(a) tan(b) tan(c).

I’d never seen that before. It’s striking that the sum of three numbers would also equal their product. In fact, the only way for the product of three numbers to be equal to their sum is for the three numbers to be tangents of angles that add up to π radians. I’ll explain below why that’s true.

First, we can generalize the identity above slightly by saying it holds if a + b + c is a multiple of π. In fact, it can be shown that

tan(a) + tan(b) + tan(c) = tan(a) tan(b) tan(c)

if and only if a + b + c is a multiple of π. (Here’s a sketch of a proof.)

Now suppose x + y + z = x y z. We can find numbers a, b, and c such that x = tan(a), y = tan(b), and z = tan(c), and it follows that a + b + c must be a multiple of π. But can we choose a, b, and c so that their sum is not just a multiple of π but exactly π? Yes. If a + b + c equals kπ for some integer k, pick a new value of c equal to the original c minus (k-1)π. This leaves the value of tan(c) unchanged since the tangent function has period π.
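The construction is easy to check numerically. Here is a minimal sketch; the helper name angles_summing_to_pi is made up for illustration, and taking arctangents is just one convenient way to pick the starting a, b, and c.

```python
import math

def angles_summing_to_pi(x, y, z):
    """Given x + y + z = x*y*z, return angles a, b, c whose tangents are
    x, y, z and whose sum is pi. (Illustrative helper, not a library call.)"""
    a, b, c = math.atan(x), math.atan(y), math.atan(z)
    # Up to rounding error, the sum is k*pi for some integer k.
    k = round((a + b + c) / math.pi)
    # Shift c by a multiple of pi; this leaves tan(c) unchanged.
    c -= (k - 1) * math.pi
    return a, b, c

# 1 + 2 + 3 = 1 * 2 * 3, and the same holds for the negated values.
for x, y, z in [(1.0, 2.0, 3.0), (-1.0, -2.0, -3.0)]:
    a, b, c = angles_summing_to_pi(x, y, z)
    print(a + b + c, math.tan(c))
```

In both cases the printed sum should agree with π and the printed tangent with z, up to floating point error.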

Spell checking from Python

I needed to find a spell checker I could call from Python, so I did a Google search and ran across GNU aspell. I tried installing it but got contradictory warning messages: aspell not installed, aspell already installed, etc. Then I remembered what an awful time I’d had before when I’d tried to use aspell and gave up.

Next I tried Ryan Kelly’s PyEnchant and it worked like a charm. I downloaded the installer for Windows and ran it. Then I opened up a Python console and typed an example following the online tutorial.

>>> import enchant
>>> d = enchant.Dict("en_US")
>>> d.check("Potatoe")
False
>>> d.check("Potato")
True

It just works.

Checking your BlackBerry at a funeral

In a recent survey, 16% of those surveyed admitted to checking their BlackBerry at a funeral or memorial service. If even a funeral doesn’t make you ignore the ephemera of life for a few minutes to think about what’s important, something is deeply wrong.

Web 2.0 over dial-up

I’m borrowing an old Pentium III computer with a dial-up Internet connection. I haven’t used dial-up in a long time and was surprised at what a difference bandwidth makes. Many “Web 2.0” sites are just painful to use. Some simply do not work. One site gave me a message essentially saying to go away and come back with a better connection.

However, StackOverflow was a pleasant surprise. I thought that since it uses a lot of client-side JavaScript, the site would be just as sluggish as the others I tried. It takes a while to log in, but after that the site is easy to use over a slow connection. The developers of the site must have thought about conserving bandwidth; I find it hard to believe that the excellent low-bandwidth performance was an accident. Very impressive.

Distribution of adult heights

It is well known that adult male heights follow a normal (Gaussian) distribution. The same is true of adult female heights. But what does the distribution of heights look like for adults in general? You might be surprised.

Assume heights for women follow a normal distribution with mean of 64 inches and standard deviation 3 inches.

Assume men’s heights follow the same distribution but with an average of 70 inches.

Finally, assume men and women each make up 50% of the population. Then you get the following distribution for the heights of adults in general.

The mixture is surprisingly flat on top. Minor variations on the assumptions above can change the shape, making it more rounded at the top, making it dip in the middle, or making it tip to one side.
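To see the flat top in numbers, here is a small sketch that evaluates the mixture density under the assumptions above. It uses NormalDist from Python's standard statistics module (Python 3.8 or later); the function name mixture_pdf and the grid of heights are my own choices for illustration.

```python
from statistics import NormalDist

# Assumptions from the text: same standard deviation, means 6 inches apart.
women = NormalDist(mu=64, sigma=3)
men = NormalDist(mu=70, sigma=3)

def mixture_pdf(x, weight_women=0.5):
    """Density of adult heights as a weighted mix of the two normals."""
    return weight_women * women.pdf(x) + (1 - weight_women) * men.pdf(x)

# Tabulate the density every 3 inches from 58 to 76.
for height in range(58, 77, 3):
    print(height, round(mixture_pdf(height), 4))
```

With these numbers the two means are exactly two standard deviations apart, which is the borderline between one peak and two for an equal mixture of normals with a common standard deviation. Changing weight_women or either sigma is a quick way to explore the other shapes mentioned above.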

See Adult heights and mixture distributions for mathematical details.

See also Why heights are normally distributed.

How to put PDF properties in a LaTeX file

My previous post described how to put links in a PDF file generated from LaTeX. The hyperref package that lets you include links also lets you set PDF document properties. I’ve been using Adobe Acrobat to do this after creating my PDF file with pdflatex, but that’s unnecessary. Here’s how to put the PDF properties directly in the LaTeX file. Add something like this

\hypersetup
{
    pdfauthor={John Hancock},
    pdfsubject={Some subject},
    pdftitle={Sample document},
    pdfkeywords={LaTeX, PDF, hyperlinks}
}

after the \usepackage{hyperref} instruction at the top of your file.

How to link to web pages from LaTeX-generated PDF

This has been on my to-do list for a while, but I finally found out how to embed hyperlinks in a PDF file generated from LaTeX.

Short answer: put \usepackage{hyperref} in your header, and when you want to link to a page, use the command \href{URL}{anchor text}. For example,

\documentclass{article}
\usepackage{hyperref}
\begin{document}

Here's a link to \href{http://twitter.com/home}{Twitter}.

\end{document}

For much more detail on links in LaTeX documents, see Patrick Jöckel’s LaTeX-PDF page and the hyperref package documentation. [Update: looks like these links went away.]

Probability that a study result is true

Suppose a new study comes out saying a drug or a food or a habit lowers your risk of some disease. What is the probability that the study’s result is correct? Obviously this is a very important question, but one that is not raised often enough.

I’ve referred to a paper by John Ioannidis (*) several times before, but I haven’t gone over the model he uses to support his claim that most study results are false. This post will look at some equations he derives for estimating the probability that a claimed positive result is correct.

First of all, let R be the ratio of true hypotheses to false hypotheses being investigated in a particular area. Of course we never know exactly what R is, but let’s pretend that somehow we knew that out of 1000 hypotheses being investigated in some area, 200 are correct. Then R would be 200/800 = 0.25. The value of R varies quite a bit, being relatively large in some fields of study and quite small in others. Imagine researchers pulling hypotheses to investigate from a hat. The probability of selecting a hypothesis that really is true would be R/(R+1) and the probability of selecting a false hypothesis is 1/(R+1).

Let α be the probability of incorrectly declaring a false hypothesis to be true. Studies are often designed with the goal that α would be 0.05. Let β be the probability that a study would incorrectly conclude that a true hypothesis is false. In practice, β is far more variable than α. You might find study designs with β anywhere from 0.5 down to 0.01. The design choice β = 0.20 is common in some contexts.

There are two ways to publish a study claiming a new result: you could have selected a true hypothesis and correctly concluded that it was true, or you could have selected a false hypothesis but incorrectly concluded it was true. The former has probability (1-β)R/(R+1) and the latter has probability α/(R+1). The total probability of concluding a hypothesis is true, correctly or incorrectly, is the sum of these probabilities, i.e. ((1-β)R + α)/(R+1). The probability that a study conclusion is true given that you concluded it was true, the positive predictive value or PPV, is the ratio of (1-β)R/(R+1) to ((1-β)R + α)/(R+1). In summary, under the assumptions above, the probability of a claimed result being true is (1-β)R/((1-β)R + α).
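As a quick sanity check on the algebra, here is a sketch that computes the PPV from the two probabilities in the paragraph above. The function name ppv is mine, and the inputs are the illustrative values already mentioned (R = 0.25, α = 0.05, β = 0.20), not figures from any particular study.

```python
def ppv(R, alpha, beta):
    """Probability that a claimed positive result is true, given the ratio R
    of true to false hypotheses, false positive rate alpha, and false
    negative rate beta."""
    # Selected a true hypothesis and correctly declared it true.
    true_positive = (1 - beta) * R / (R + 1)
    # Selected a false hypothesis and incorrectly declared it true.
    false_positive = alpha / (R + 1)
    return true_positive / (true_positive + false_positive)

print(ppv(R=0.25, alpha=0.05, beta=0.20))
```

By hand, (1-β)R = 0.2 and (1-β)R + α = 0.25, so the PPV for these inputs is 0.2/0.25 = 0.8.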

If (1 – β)R < α then the model says that a claim is more likely to be false than true. This can happen if R is small, i.e. there is not a large proportion of true results under investigation, and if β is large, i.e. if studies are small. If R is smaller than α, most claimed results will be false no matter how small you make β, i.e. no matter how large the study. This says that in a challenging area, where few of the ideas being investigated lead to progress, there will be a large proportion of false results published, even if the individual researchers are honest and careful.
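The claim about R < α can be illustrated with a few lines of code. This is only a sketch: the value R = 0.04 is a made-up example of a difficult field, and ppv is my own name for the formula (1-β)R/((1-β)R + α).

```python
def ppv(R, alpha, beta):
    """PPV under the basic model: (1 - beta) R / ((1 - beta) R + alpha)."""
    power = 1 - beta
    return power * R / (power * R + alpha)

alpha = 0.05
R = 0.04  # hypothetical field where true hypotheses are rare (R < alpha)

# Even as beta shrinks to zero, the PPV stays below one half.
for beta in (0.5, 0.2, 0.01, 0.0):
    print(beta, ppv(R, alpha, beta))
```

The best case is β = 0, where the PPV is R/(R + α) = 0.04/0.09, still less than 1/2.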

Ioannidis develops two other models refining the model above. Suppose that because of bias, some proportion of results that would otherwise have been reported as negative are reported as positive. Call this proportion u. The derivation of the positive predictive value is similar to that in the previous model, but messier. The final result is R(1-β + uβ)/(R(1-β + uβ) + α + u – αu). If 1 – β > α, which is nearly always the case, then the probability of a reported result being correct decreases as bias increases.
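Here is a sketch of the bias model as stated above, using the same illustrative inputs as before (R = 0.25, α = 0.05, β = 0.20). The function name ppv_with_bias and the particular values of u are my own choices.

```python
def ppv_with_bias(R, alpha, beta, u):
    """PPV when a proportion u of would-be negative results are reported
    as positive."""
    # True hypotheses reported positive: detected, or missed but biased.
    true_positive = R * (1 - beta + u * beta)
    # False hypotheses reported positive: false alarm, or biased.
    false_positive = alpha + u - alpha * u
    return true_positive / (true_positive + false_positive)

for u in (0.0, 0.1, 0.3, 0.5):
    print(u, ppv_with_bias(R=0.25, alpha=0.05, beta=0.20, u=u))
```

With u = 0 this reduces to the basic model, and for these inputs the PPV falls as u grows.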

The final model considers the impact of multiple investigators testing the same hypothesis. If more people try to prove the same thing, it’s more likely that someone will get lucky and “prove” it, whether or not the thing to be proven is true. Leaving aside bias, if n investigators are testing each hypothesis, the probability that a positive claim is true is given by R(1 – β^n)/(R + 1 – (1 – α)^n – Rβ^n). As n increases, the probability of a positive claim being true decreases.
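The multiple-investigator model can be sketched the same way. The code below assumes the n studies are independent and that a positive claim is made if at least one of them finds a positive result: that happens with probability 1 – β^n for a true hypothesis and 1 – (1 – α)^n for a false one. The function name ppv_multiple and the inputs are illustrative.

```python
def ppv_multiple(R, alpha, beta, n):
    """PPV when n independent investigators test each hypothesis and any
    single positive finding is reported as a positive claim."""
    # At least one of n studies detects a true effect.
    true_positive = R * (1 - beta**n)
    # At least one of n studies raises a false alarm.
    false_positive = 1 - (1 - alpha)**n
    return true_positive / (true_positive + false_positive)

for n in (1, 2, 5, 10):
    print(n, ppv_multiple(R=0.25, alpha=0.05, beta=0.20, n=n))
```

The denominator here expands to R + 1 – (1 – α)^n – Rβ^n. With n = 1 the result matches the basic model, and for these inputs it shrinks steadily as n increases.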

The probability of a result being true is often much lower than is commonly believed. One reason is that hypothesis testing focuses on the probability of the data given a hypothesis rather than the probability of a hypothesis given the data. Calculating the probability of a hypothesis given data relies on prior probabilities, such as the factors R/(R+1) and 1/(R+1) above. These prior probabilities are elusive and controversial, but they are critical in evaluating how likely it is that claimed results are true.

Related: Adaptive clinical trial design

 

(*) John P. A. Ioannidis, Why most published research findings are false. CHANCE volume 18, number 4, 2005.