Elementary statistics book recommendation

I’ve thought about making a personal FAQ page. If I do, one of the questions would be what elementary statistics book I recommend. Unfortunately, I don’t have an answer for that one. I haven’t seen such a book I’d recommend enthusiastically.

When asked for book recommendations, people will often recommend the textbook used in a course they had. But I never had an elementary statistics course. I had a PhD in math before I became interested in statistics, so I learned statistics from more advanced books. I’ve looked at a number of elementary books, but I haven’t found one I’m excited about.

Elementary statistics books may do more harm than good. They often brush difficulties under the rug. They avoid mathematical and philosophical details. They don’t define terms carefully, and even say things that are false. And they imply that statistical analysis is a matter of applying a set of rules by rote. (And it is, for many statisticians. But that’s a topic for another time.) If a statistics book doesn’t have fairly steep prerequisites, it will be hard for it not to be misleading.

This leads to another frequently asked question: Do I intend to write my own elementary statistics book? No. I don’t know whether I could write such a book that I’d be proud of. And if I could, it would take more time than I could afford to devote to it at this point in my life.

(I’ll write soon about what “this point in my life” is. If you don’t want to wait, here’s the news in a nutshell.)

30 thoughts on “Elementary statistics book recommendation”

  1. I would suggest “Statistics” by Freedman, Pisani, and Purves, and also “The Basic Practice of Statistics” by David S. Moore, or any of his similar books.

    George W. Cobb has written an article (JASA, vol. 82, 1987) about introductory statistics textbooks.

  2. I’ve been thinking about this recently too. Here’s my idea, which I hope someone steals (I’m probably not qualified to attempt it myself):

    I don’t think a traditional textbook is the way to go. Ideally, I think you’d want some web application (or other software) that allows you to dig deeper into the material as needed. For instance, the top layer would just be conceptual. All text and graphs, with math no more complicated than some simple arithmetic and probability for demonstrations. The goal at this layer would be to develop an intuition for the material. Optionally (depending on a student’s goals), the student could drill down on a topic into the underlying math, which is the next layer. This would include all of the formal mathematical definitions of whatever topic is at hand. The focus here is to translate the concepts to math. Finally, the last layer would be a programming layer. This would give examples of how to actually perform the analyses/tests/modeling with Python or R or whatever.

    There would be quizzes/homeworks to test the student’s understanding on each topic at each layer. The top layer would ask conceptual questions, the next layer would ask the student to solve problems mathematically by hand, and the last layer would have the student solve more involved problems with software.

    If a manager needed to brush up on stats in order to better communicate with his/her analytics team, he/she could just read through the top layer. A mathematician wanting to learn the concepts more deeply could stick to the first two layers. Someone wanting to become a practitioner would go through all three layers. Thinking about it this way would force an author to make sure that each layer is complete and complementary to the others.

  3. John, I’ve been in the same boat. If you ever do find such a book I hope you’ll share it with us.

    Đani, thanks for suggesting the Cobb article. There’s good advice already on the first page (“judge a book by its exercises, and you cannot go far wrong”).
    Here’s a JSTOR link in case other readers are interested too:
    http://www.jstor.org/stable/10.2307/2289170

  4. The issue with most introductory statistics books is that they give you formulas without any theoretical background, so people tend to misuse the techniques. The biggest offenders are books marketed to other disciplines, like “Statistics for Computer Scientists,” or “Statistics for Biologists.” I really like Mathematical Statistics With Applications by Wackerly, Mendenhall, and Scheaffer because it introduces probability and statistics with enough theoretical background that you can take a more advanced follow-on course if desired. If you don’t have to take a follow-on course, then hopefully the basic theory will keep you from making the simple mistakes that most scientists make.

  5. Then write a FAQ, and put that in the section about elementary statistics books. It will be disappointing to some, but helpful to others who will be guided to not waste their time.

  6. I smiled when I read that you do not have an enthusiastic recommendation – mainly because I feel like I (many times) almost emailed to ask you to write a blog posting on a series of intro/elementary stats textbooks appropriate for personal study. I have come to the conclusion that having a fairly “mature” mathematical background (I’m planning and guessing on my own that this may be analysis and linear algebra at the beginning grad level) is step one, then step two is to plow through a number of stats books, harvesting the good while having the taste to throw out the bad.

    Sadly, my somewhat traditional CS education never required more than a “stats for engineers” type course. Even though I’m not feeling enamored with traditional credentials and programs of study these days, I am kind of on the fence about whether, if one is going to claim any mastery of statistics, one ought to pursue an MS in either stats or biostats.

  7. I think your sentiment can be summarized as: “if you think you know statistics, you don’t know statistics”. Elementary books try to give the feeling you know statistics, and that’s where the danger is …

    PS: I think I know statistics :-)

  8. I enthusiastically second the recommendation for Freedman, Pisani, and Purves’s book (I read this blog from an RSS reader, so “enthusiastically” means, “came to the webpage to comment”). Howard Wainer’s books on statistical graphics (e.g., Visual Revelations) might do a better job of addressing your specific concerns, though.

  9. My list would be:

    • Farnsworth cran.r-project.org/doc/contrib/Farnsworth-EconometricsInR.pdf
    • Faraway Linear Models with R http://books.google.com/books?id=fvenzpofkagC&lpg=PP1&dq=%22applied%20linear%20models%20in%20r%22&pg=PP1#v=onepage&q=%22applied%20linear%20models%20in%20r%22&f=false
    • Some kind of exploratory data analysis eg, http://www.indiana.edu/~wim/docs/9_8_2010_presentation.pdf (cf, http://twitter.com/EdwardTufte/status/284036109426630656)
    • Angrist & Pischke’s book looks interesting. http://www.mostlyharmlesseconometrics.com http://www.stat.columbia.edu/~gelman/research/published/angristpischke2.pdf
    • Richard Jeffrey http://www.princeton.edu/~bayesway/Book*.pdf for the probability theory

    If I were recommending just one book it would be Jeffrey’s. But somehow fit in http://twitter.com/EdwardTufte/status/284036109426630656.

    Toward the modelling + residuals end: I think http://en.wikipedia.org/wiki/Income_inequality_in_the_United_States#Race_and_gender_disparities (+ down 1 page) provides a nice microcosm of the reason one wants to do statistical modelling and the pitfalls of trying to do so. Start with the overall disparities, then note things like Blacks are on average younger; see how the disparities change at different education levels (then the questions shift toward why are blacks getting fewer doctorates; does the women/men doctoral pay divide refer to subjects studied; and what’s the difference between running an experiment and a “statistical control” where we “subtract off” some model?).

  10. I like Bock, Velleman, and De Veaux, which is a high school (AP) text. They definitely do not reduce statistics to formulas to be applied by rote, but may not have the mathematical rigor you are looking for.

  11. I wrote my first book “Statistics for Ecologists Using R and Excel” because I also hadn’t found any really good books: http://www.amazon.com/Statistics-Ecologists-Using-Excel-Presentation/dp/1907807128/ref=ntt_at_ep_dpt_3

    I made plenty of mistakes whilst writing it – especially the non-inclusion of exercises, something that will be rectified in the future. However, I tried to give more than just formulae and included some background information as to “why this works”. I also tried to cover the idea of statistical analysis from beginning to end: planning what you want to do, collecting the data and writing it in a coherent fashion, carrying out the appropriate tests and then reporting the results (including graphs).

    Mark Gardener
    http://www.dataanalytics.org.uk

  12. Nicola Ward Petty

    The question is also, for whom is the book destined? Is it a terminal course in applying statistics or a first year course in the mathematics of statistics? Either way there needs to be a lot of real data, something that most books don’t have. I’m glad you mentioned Cobb’s paper, which was a life-changing find for me. You may also be interested in my post on how textbooks suck the fun out of statistics.
    http://learnandteachstatistics.wordpress.com/2012/04/10/statistics-textbooks/
    I also think that textbooks are on their way out, and that online learning or apps are preferable.
    http://learnandteachstatistics.wordpress.com/2012/01/20/textbooks/
    Or try my app AtMyPace: Statistics! Much cheaper than a textbook and far from complete, but fun.

  13. I would almost certainly use, in addition, the very short book “How to Lie with Statistics.” You might want something else to provide all the formulas, but that book excels at getting across the basic concepts and the common mistakes.

  14. Hey, man, I love your blog. I’ve been reading it for years. I have also lately come to an amateur study of statistics and I was also searching for a good introductory book. I settled on “The Complete Idiot’s Guide to Statistics”, which was fine for my purposes. I’ll supplement with the other titles mentioned above. Personally, I’ve come to believe that part of the reason we struggle to find a satisfactory “introductory stats” book is that the topic of statistics is simply too huge: broader, deeper, and more sub-specialized than might at first be assumed, and the field is expanding rapidly. It’s less like looking for an introductory book on “algebra” and more like looking for an introductory book on “architecture” or “engineering”.

  15. I gave my nephew a copy of “The Manga Guide to Statistics” at Christmas, since his JC isn’t giving a course in it. As a real text it wouldn’t work, but for a self-taught introduction it seemed as good as anything else (and got an amused laugh upon unwrapping).

  16. Depends on what you mean by “elementary”. If you have little or no background in stats and are interested in learning about the history of the field, you might enjoy “The Lady Tasting Tea” by David Salsburg. I also just stumbled upon “Naked Statistics” by Charles Wheelan, which looks promising… I learn better when the subject is presented in an interesting fashion. Textbook authors seem to struggle with that concept.

  17. David: I enjoyed reading “The Lady Tasting Tea.” I like history, but I learned much of my statistics before knowing any historical context. When I read that book, I wished I had read it earlier.

  18. As a non-math, non-stats person, I would say that for most people, the most important thing is understanding the real-world situation, and how to apply simple tests.
    There are times you need heteroscedastic nonlinear least squares, but 99% of the time, the problem is understanding what the relevant comparison group is, or something really basic like that.

    If I may – politely – stats courses are taught by stats guys (or gals) who tend to be math-oriented; also math is an easy way to spread out the class and find winners and losers.
    But that doesn’t mean a lot of math is necessary; I think stats courses should spend a lot more time on experimental design and understanding how to put the data into really simple bins.

  19. I’d recommend starting statistics from a real-world perspective, dealing with measurements of real systems. Learning about the characterization of, and detection of, measurement errors and biases. Learning when to discard data, and how to improve how data is taken. Only after the data itself has been well described can the next step of performing higher-level analyses be applied.

    It’s engineering statistics, for which there are many excellent freshman-level texts. It starts with error estimation, analysis, and propagation. Seeing how errors in different domains (time, magnitude, etc.), and their correlations, affect analytical results.

    For me, my first single semester course illuminated the spectrum of lies, damned lies, and statistics. It opened the door to understanding control systems and feedback at an intuitive level, guiding me toward the math I needed for a particular situation. Best of all, it let me quickly validate hunches by determining the characteristics of the data I’d need to obtain (quality, quantity, etc.) in order to confirm or refute it.

    That wedge of statistics also provided access to other knowledge domains, where I became able to critique papers with significant statistical content, such as in economics, AI, and particularly medicine. There are lots of medical researchers who don’t know their stats from a hole in the ground!

    I’m not much of a stats freak, knowing little more than the basics. But I learned them well, and use them regularly. Though doing stats occupies about 3% of my work (in embedded software), it consistently provides the most usable results for the time spent. It especially helps me clarify fuzzy thinking, to craft mini-experiments to help me determine if I’m on the right track.

    I believe all stats beginners should start with Engineering Statistics, and expand from there.

    For me, it provides clear guidance for deciding what tools and techniques may be used under what conditions, and how to validate the results. Dealing with real-world data obtained using a stopwatch or multimeter or o’scope or logic analyzer provides great clarity: Does the instrument measure the world correctly? What are its limitations? Can those limitations be compensated or allowed for? How can data from different instruments be combined? If I split my data into two sets, and repeat my analysis on each set, how can I expect the results to behave?

    Statistics, as a concept, can be baffling to beginners: Anchoring it to the real world (not “soft” data such as surveys) quickly makes the concepts concrete.

  20. So what “more advanced books” did you learn from? I have a pretty good grasp of real analysis, but I’ve never managed to get as comfortable with statistics as I would like.

  21. So what should I do? I currently only know the difference between mean, mode, median. I really want to understand statistics; mainly for linguistic research, but I also want to avoid all those pitfalls you’ve written about.

  22. My suggestion is to focus on learning probability first. It’s easier to present probability correctly, and most books do a fairly good job. Once you know probability well, you’ll be sensitive to the false statements in many statistics books if you pay attention.

Comments are closed.