I thought this tweet from @WoodyOsher was pretty funny. Everything our parents said was good is bad. Sun, milk, red meat … the least-squares method. I wouldn’t say these things are bad, but they are now viewed more critically than…
One reason the normal distribution is easy to work with is that you can vary the mean and variance independently. With other distribution families, the mean and variance may be linked in some nonlinear way. I was looking for a…
Someone asked me yesterday for R code to compute the probability P(X > Y + δ) where X and Y are independent beta random variables. I’m posting the solution here in case it benefits anyone else. For an example of…
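As a rough illustration of the quantity in question, here is a minimal Monte Carlo sketch in Python rather than R. It is not the code from the post: the function name `prob_x_exceeds_y`, its parameter names, and the default sample size are all assumptions made for this example, and simulation is only one of several ways to estimate such an inequality probability.

```python
import random


def prob_x_exceeds_y(a_x, b_x, a_y, b_y, delta=0.0, n_samples=200_000, seed=0):
    """Estimate P(X > Y + delta) for independent X ~ Beta(a_x, b_x), Y ~ Beta(a_y, b_y).

    Plain Monte Carlo: draw independent pairs and count how often the
    inequality holds. The names and defaults here are illustrative only.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    hits = 0
    for _ in range(n_samples):
        x = rng.betavariate(a_x, b_x)
        y = rng.betavariate(a_y, b_y)
        if x > y + delta:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise the function.
    print(prob_x_exceeds_y(3, 2, 2, 3, delta=0.1))
```

The standard error of the estimate shrinks like one over the square root of `n_samples`, so numerical integration would likely be a better choice when several digits of accuracy are needed.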
When I hear someone say “personalized medicine” I want to ask “as opposed to what?” All medicine is personalized. If you are in an emergency room with a broken leg and the person next to you is lapsing into a…
Someone asked me yesterday how people justify probability distribution assumptions. Sometimes the most mystifying assumption is the first one: “Assume X is normally distributed …” Here are a few answers. Sometimes distribution assumptions are not justified. Sometimes distributions can be…
Commercial weather forecasters need to be accurate, but they also need to be perceived as being accurate, and sometimes the latter trumps the former. For instance, the for-profit weather forecasters rarely predict exactly a 50% chance of rain, which might…
In his speech The dog and the frisbee, Andrew Haldane argues that simple models often outperform complex models in complex situations. He cites as examples sports prediction, diagnosing heart attacks, locating serial criminals, picking stocks, and understanding spending patterns. The…
This weekend John Myles White and I discussed true versus publishable results in the comments to an earlier post. Methods that make stronger modeling assumptions lead to more statistical confidence, but less actual confidence. That is, they are more likely…
When statisticians analyze data, they don’t just look at the data you bring to them. They also consider hypothetical data that you could have brought. In other words, they consider what could have happened as well as what actually…
The other day I started to call someone’s data analysis “unprincipled” until I realized how harsh that sounds. I wanted to convey that an analysis seemed ad hoc, not based on general principles. Then I realized that “unprincipled” implies someone…
Data analysis has to start from some set of assumptions. Bayesian prior distributions drive some people crazy because they make assumptions explicit that people prefer to leave implicit. But there’s no escaping the need to make some sort of prior…
Here’s a common problem that arises in Bayesian computation. Everything works just fine until you have more data than you’ve seen before. Then suddenly you start getting infinite, NaN, or otherwise strange results. This post explains what might be wrong…
Statistical computing often involves working with ratios of factorials. These factorials are often too big to fit in a floating point number, and so we work with logarithms. So if we need to compute log(a! / b!), we call software…
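The idea of working with logarithms can be sketched in Python as follows. This is a minimal example under the assumption that a log-gamma function is available (here `math.lgamma` from the standard library), using the identity log(n!) = lgamma(n + 1). The helper name `log_factorial_ratio` is made up for this illustration and is not taken from the post.

```python
import math


def log_factorial_ratio(a, b):
    """Return log(a! / b!) without ever forming a! or b! directly.

    Uses log(n!) = lgamma(n + 1), so the huge factorials never have to
    fit in a floating point number.
    """
    return math.lgamma(a + 1) - math.lgamma(b + 1)


if __name__ == "__main__":
    # 5! / 3! = 20, so this should be close to log(20).
    print(log_factorial_ratio(5, 3), math.log(20))
    # 1000! is far too large for a float, but the log of the ratio is modest.
    print(log_factorial_ratio(1000, 998), math.log(1000 * 999))
```

Note that when a and b are both very large and close together, the result is a small difference of two large numbers, so some relative precision may be lost in the subtraction.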
David Hogg on linear regression: … in almost all cases in which scientists fit a straight line to their data, they are doing something that is simultaneously wrong and unnecessary. It is wrong because … linear relationship is exceedingly rare.…
David Hogg on responsible data analysis: The key idea is that the result of responsible data analysis is not an answer but a distribution over answers. Data are inherently noisy and incomplete; they never answer your question precisely. So no…