Blog Archives

Sun, milk, red meat, and least-squares

I thought this tweet from @WoodyOsher was pretty funny. Everything our parents said was good is bad. Sun, milk, red meat … the least-squares method. I wouldn’t say these things are bad, but they are now viewed more critically than

Posted in Statistics

Shifting probability distributions

One reason the normal distribution is easy to work with is that you can vary the mean and variance independently. With other distribution families, the mean and variance may be linked in some nonlinear way. I was looking for a

Posted in Statistics

Beta inequalities in R

Someone asked me yesterday for R code to compute the probability P(X > Y + δ) where X and Y are independent beta random variables. I’m posting the solution here in case it benefits anyone else. For an example of

Posted in Statistics
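
The post's R code is not reproduced in this excerpt. As a rough illustration of the quantity in question, here is a minimal Python sketch that estimates P(X > Y + δ) by plain Monte Carlo simulation. This is a swapped-in technique, not necessarily the method used in the post, and the function name, parameter names, sample size, and seed are all assumptions made for the example.

```python
import random


def prob_x_exceeds_y_plus_delta(a_x, b_x, a_y, b_y, delta, n=200_000, seed=0):
    """Monte Carlo estimate of P(X > Y + delta).

    X ~ Beta(a_x, b_x) and Y ~ Beta(a_y, b_y) are drawn independently.
    The function name and defaults are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n):
        x = rng.betavariate(a_x, b_x)
        y = rng.betavariate(a_y, b_y)
        if x > y + delta:
            hits += 1
    return hits / n


# Sanity check with two uniform variables (Beta(1, 1)): the exact answer
# for delta = 0.5 is the area of a triangle, 0.5 * 0.5 * 0.5 = 0.125.
estimate = prob_x_exceeds_y_plus_delta(1, 1, 1, 1, 0.5)
```

The estimate carries sampling error on the order of 1/sqrt(n), so numerical integration is the better choice when several digits of accuracy matter.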

Personalized medicine

When I hear someone say “personalized medicine” I want to ask “as opposed to what?” All medicine is personalized. If you are in an emergency room with a broken leg and the person next to you is lapsing into a

Posted in Clinical trials, Science, Statistics

How do you justify that distribution?

Someone asked me yesterday how people justify probability distribution assumptions. Sometimes the most mystifying assumption is the first one: “Assume X is normally distributed …” Here are a few answers. Sometimes distribution assumptions are not justified. Sometimes distributions can be

Posted in Clinical trials, Statistics

Accuracy versus perceived accuracy

Commercial weather forecasters need to be accurate, but they also need to be perceived as being accurate, and sometimes the latter trumps the former. For instance, the for-profit weather forecasters rarely predict exactly a 50% chance of rain, which might

Posted in Statistics

Robustness of simple rules

In his speech The dog and the frisbee, Andrew Haldane argues that simple models often outperform complex models in complex situations. He cites as examples sports prediction, diagnosing heart attacks, locating serial criminals, picking stocks, and understanding spending patterns. The

Posted in Statistics

True versus Publishable

This weekend John Myles White and I discussed true versus publishable results in the comments to an earlier post. Methods that make stronger modeling assumptions lead to more statistical confidence, but less actual confidence. That is, they are more likely

Posted in Science, Statistics

Limits of statistics

When statisticians analyze data, they don’t just look at the data you bring to them. They also consider hypothetical data that you could have brought. In other words, they consider what could have happened as well as what actually

Posted in Statistics

Unprincipled analysis

The other day I started to call someone’s data analysis “unprincipled” until I realized how harsh that sounds. I wanted to convey that an analysis seemed ad hoc, not based on general principles. Then I realized that “unprincipled” implies someone

Posted in Statistics

Vague priors are informative

Data analysis has to start from some set of assumptions. Bayesian prior distributions drive some people crazy because they make assumptions explicit that people prefer to leave implicit. But there’s no escaping the need to make some sort of prior

Posted in Statistics

Avoiding underflow in Bayesian computations

Here’s a common problem that arises in Bayesian computation. Everything works just fine until you have more data than you’ve seen before. Then suddenly you start getting infinite, NaN, or otherwise strange results. This post explains what might be wrong

Posted in Statistics
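
The excerpt stops before naming the cause. One common culprit, assumed here rather than taken from the post, is that a product of many small likelihood terms underflows to zero, so normalizing divides zero by zero. A minimal Python sketch of the usual workaround is to keep everything on the log scale and subtract the largest value before exponentiating. The helper names and the example numbers are hypothetical.

```python
import math


def log_sum_exp(log_values):
    """Return log(sum(exp(v))) without exponentiating large-magnitude values."""
    m = max(log_values)
    if m == -math.inf:
        return -math.inf  # every term is exactly zero
    # After subtracting the maximum, the largest term is exp(0) = 1,
    # so the sum cannot underflow to zero.
    return m + math.log(sum(math.exp(v - m) for v in log_values))


def normalize_log_weights(log_weights):
    """Turn unnormalized log weights into probabilities that sum to 1."""
    total = log_sum_exp(log_weights)
    return [math.exp(v - total) for v in log_weights]


# Hypothetical unnormalized log posterior weights for three models.
log_weights = [-1000.0, -1001.0, -1002.0]

# Exponentiating directly underflows: every entry becomes 0.0.
naive = [math.exp(v) for v in log_weights]

# Working on the log scale recovers usable weights.
weights = normalize_log_weights(log_weights)
```

Only the differences between log weights matter for the normalized result, which is why shifting by the maximum is safe.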

Computing log gamma differences

Statistical computing often involves working with ratios of factorials. These factorials are often too big to fit in a floating point number, and so we work with logarithms. So if we need to compute log(a! / b!), we call software

Posted in Computing, Python, Statistics
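
As a small illustration of the idea in the excerpt, log(a! / b!) can be computed as a difference of log gamma values, since n! = Γ(n + 1). The sketch below uses `math.lgamma` from Python's standard library; the function name `log_factorial_ratio` is hypothetical, and the post itself may go on to use different software or a more careful method.

```python
import math


def log_factorial_ratio(a, b):
    """Return log(a! / b!) using log gamma, since n! = gamma(n + 1)."""
    return math.lgamma(a + 1) - math.lgamma(b + 1)


# Small case that can be checked by hand: 5! / 3! = 20.
small = log_factorial_ratio(5, 3)

# 200! is far too large for a floating point number, but the log of the
# ratio 200! / 198! = 200 * 199 is perfectly ordinary.
large = log_factorial_ratio(200, 198)
```

When a and b are both large and close together, the two log gamma values are nearly equal, so their difference can lose relative precision to cancellation.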

Wrong and unnecessary

David Hogg on linear regression: … in almost all cases in which scientists fit a straight line to their data, they are doing something that is simultaneously wrong and unnecessary. It is wrong because … linear relationship is exceedingly rare.

Posted in Statistics

Responsible data analysis

David Hogg on responsible data analysis: The key idea is that the result of responsible data analysis is not an answer but a distribution over answers. Data are inherently noisy and incomplete; they never answer your question precisely. So no

Posted in Statistics