Suppose you’re drawing random samples uniformly from some interval. How likely are you to see a new value outside the range of values you’ve already seen? The problem is more interesting when the interval is unknown. You may be trying…
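One way to get a feel for the question is to simulate it. The sketch below is not taken from the post; it assumes the samples are independent and uses hypothetical names (`prob_outside_range`, `trials`, `seed`). It draws `n` values, then checks whether one more draw lands below the minimum or above the maximum seen so far. For independent draws from any continuous distribution, a standard symmetry argument gives a probability of 2/(n + 1), and rescaling the interval does not change it, so the unit interval is used here.

```python
import random


def prob_outside_range(n, trials=100_000, seed=0):
    """Estimate P(next draw falls outside the range of the first n draws).

    Assumes independent uniform draws; the unit interval stands in for
    any interval, since rescaling does not change the answer.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seen = [rng.random() for _ in range(n)]
        new_value = rng.random()
        # A "hit" is a new value below the old minimum or above the old maximum.
        if new_value < min(seen) or new_value > max(seen):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (1, 4, 9, 19):
        estimate = prob_outside_range(n)
        print(f"n={n:2d}  simulated={estimate:.3f}  2/(n+1)={2 / (n + 1):.3f}")
```

The simulated frequencies should sit close to 2/(n + 1), whatever the endpoints of the interval are.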

Cancer research is sometimes criticized for being timid. Drug companies run enormous trials looking for small improvements. Critics say they should run smaller trials and more of them. Which side is correct depends on what’s out there waiting to be…

There’s a theorem in statistics that says E(X̄) = E(X). You could read this aloud as “the mean of the mean is the mean.” More explicitly, it says that the expected value of the average of some number of samples from some distribution…
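A quick numerical check can make “the mean of the mean is the mean” concrete. This is a minimal sketch, not the post's own example: the exponential distribution, the sample size, and the names `mean_of_sample_means`, `dist_mean`, and `repetitions` are all assumptions chosen for illustration. It averages many sample means and compares the result with the mean of the underlying distribution.

```python
import random
import statistics


def mean_of_sample_means(dist_mean=2.0, sample_size=5, repetitions=20_000, seed=0):
    """Average the sample means of many small samples.

    Samples come from an exponential distribution with mean `dist_mean`,
    an arbitrary choice made only for illustration.
    """
    rng = random.Random(seed)
    rate = 1.0 / dist_mean  # expovariate takes the rate, i.e. 1 / mean
    sample_means = [
        statistics.fmean(rng.expovariate(rate) for _ in range(sample_size))
        for _ in range(repetitions)
    ]
    return statistics.fmean(sample_means)


if __name__ == "__main__":
    print("distribution mean:   ", 2.0)
    print("mean of sample means:", mean_of_sample_means())
```

The average of the sample means should land near the distribution mean even though each sample has only five points. Only the spread of the sample mean shrinks as the sample grows; its expected value stays put.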

Russ Roberts had this to say about the proposal to replace the calculus requirement with statistics for students. Statistics is in many ways much more useful for most students than calculus. The problem is, to teach it well is extraordinarily…

David Hogg calls conventional statistical notation a “nomenclatural abomination”: The terminology used throughout this document enormously overloads the symbol p(). That is, we are using, in each line of this discussion, the function p() to mean something different; its meaning…

Why would anyone care about what the weather was predicted to be once you know what the weather actually was? Because people make decisions based in part on weather predictions, not just weather. Eric Floehr of ForecastWatch told me that…

Posted in Statistics

I have a quibble with the following paragraph from Introducing Windows Azure for IT Professionals: The problem with big data is that it’s difficult to analyze it when the data is stored in many different ways. How do you analyze…

John Ioannidis stirred up a healthy debate when he published Why Most Published Research Findings Are False. Unfortunately, most of the discussion has been over whether the word “most” is correct, i.e. whether the proportion of false results is more…

Many people have drawn Venn diagrams to locate machine learning and related ideas in the intellectual landscape. Drew Conway’s diagram may have been the first. It has at least been frequently referenced. By this classification, Hector Cuesta’s new book Practical…

Andrew Gelman has some interesting comments on non-informative priors this morning. Rather than thinking of the prior as a static thing, think of it as a way to prime the pump. … a non-informative prior is a placeholder: you can…

From Controversies in the Foundations of Statistics by Bradley Efron: Statistics seems to be a difficult subject for mathematicians, perhaps because its elusive and wide-ranging character mitigates against the traditional theorem-proof method of presentation. It may come as some comfort…

Sometimes you can derive a probability distribution from a list of properties it must have. For example, there are several properties that lead inevitably to the normal distribution or the Poisson distribution. Although such derivations are attractive, they don’t apply…

The preface to Elements of Statistical Learning opens with the popular quote In God we trust, all others bring data. — William Edwards Deming The footnote to the quote is better than the quote: On the Web, this quote has…

Bayesian statistics is to Python as frequentist statistics is to Perl. Perl has the slogan “There’s more than one way to do it,” abbreviated TMTOWTDI and pronounced “tim toady.” Perl prides itself on variety. Python takes the opposite approach. The…

I caught a glimpse of a book in a library this morning and thought the title was “Statistics for People Who Think.” Sounds like a great book! But the title was actually “Statistics for People Who (Think They) Hate Statistics”…