It’s engineering statistics, for which there are many excellent freshman-level texts. It starts with error estimation, analysis, and propagation: seeing how errors in different domains (time, magnitude, etc.), and their correlations, affect analytical results.
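
To make the propagation idea concrete, here is a minimal sketch of first-order error propagation for a speed computed as distance over time. The function name, the example numbers, and the choice of a ratio are my own assumptions for illustration; the formula itself is the usual linearized approximation, so it only holds when the errors are small relative to the values.

```python
import math

def ratio_uncertainty(d, sigma_d, t, sigma_t, rho=0.0):
    """First-order standard uncertainty of v = d / t.

    rho is the (assumed known) correlation between the errors in d and t.
    """
    dv_dd = 1.0 / t          # partial derivative of v with respect to d
    dv_dt = -d / t ** 2      # partial derivative of v with respect to t
    var = ((dv_dd * sigma_d) ** 2
           + (dv_dt * sigma_t) ** 2
           + 2.0 * dv_dd * dv_dt * rho * sigma_d * sigma_t)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(var, 0.0))

# Hypothetical measurement: 100 m +/- 0.5 m covered in 10 s +/- 0.1 s.
for rho in (-1.0, 0.0, 1.0):
    print(rho, ratio_uncertainty(100.0, 0.5, 10.0, 0.1, rho))
```

For a ratio, positively correlated errors partly cancel and negatively correlated errors add up, which is the kind of effect the correlation term captures.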

For me, my first single-semester course illuminated the spectrum of lies, damned lies, and statistics. It opened the door to understanding control systems and feedback at an intuitive level, guiding me toward the math I needed for a particular situation. Best of all, it let me quickly validate hunches by determining the characteristics of the data I’d need to obtain (quality, quantity, etc.) in order to confirm or refute them.
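
As one example of working out the quantity of data a hunch needs, here is a rough sketch of the textbook sample-size formula for comparing two means. It assumes roughly normal noise, a known and equal standard deviation in both groups, and a two-sided test; the function name and the default alpha and power are my own choices.

```python
import math
from statistics import NormalDist

def samples_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Approximate samples needed per group to detect a mean shift of delta.

    Normal approximation: n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Shift equal to one standard deviation of the noise, then half of that.
print(samples_per_group(delta=1.0, sigma=1.0))
print(samples_per_group(delta=0.5, sigma=1.0))
```

With these defaults, a shift of one standard deviation needs about 16 samples per group, and halving the shift roughly quadruples the requirement (63), which is usually the first thing worth knowing before collecting anything.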

That wedge of statistics also provided access to other knowledge domains, where I became able to critique papers with significant statistical content, such as in economics, AI, and particularly medicine. There are lots of medical researchers who don’t know their stats from a hole in the ground!

I’m not much of a stats freak, knowing little more than the basics. But I learned them well, and use them regularly. Though doing stats occupies about 3% of my work (in embedded software), it consistently provides the most usable results for the time spent. It especially helps me clarify fuzzy thinking and craft mini-experiments that show whether I’m on the right track.

I believe all stats beginners should start with Engineering Statistics, and expand from there.

For me, it provides clear guidance for deciding what tools and techniques may be used under what conditions, and how to validate the results. Dealing with real-world data obtained using a stopwatch or multimeter or o’scope or logic analyzer provides great clarity: Does the instrument measure the world correctly? What are its limitations? Can those limitations be compensated or allowed for? How can data from different instruments be combined? If I split my data into two sets, and repeat my analysis on each set, how can I expect the results to behave?
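
Two of those questions can be sketched in a few lines. The first function combines readings from instruments of different quality by inverse-variance weighting, which assumes independent, unbiased readings with known standard deviations. The second compares the first and second halves of a series of samples, a crude drift check that assumes the samples are in time order. Both function names are hypothetical.

```python
import math
import statistics

def combine(readings):
    """Inverse-variance weighted mean of (value, sigma) pairs, with its sigma."""
    weights = [1.0 / sigma ** 2 for _, sigma in readings]
    total = sum(weights)
    mean = sum(w * value for w, (value, _) in zip(weights, readings)) / total
    return mean, math.sqrt(1.0 / total)

def split_half_z(samples):
    """Difference between the two half-means, in units of its standard error."""
    half = len(samples) // 2
    a, b = samples[:half], samples[half:]
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

# Hypothetical readings: a precise instrument and a much noisier one.
print(combine([(10.0, 1.0), (20.0, 3.0)]))
# A series with an obvious step halfway through.
print(split_half_z([1, 2, 3, 11, 12, 13]))
```

If nothing changed during the measurement, the split-half value should usually land within about ±2; a much larger magnitude suggests drift or a step change between the halves.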

Statistics, as a concept, can be baffling to beginners: Anchoring it to the real world (not “soft” data such as surveys) quickly makes the concepts concrete.

There are times you need heteroscedastic nonlinear least squares, but 99% of the time, the problem is understanding what the relevant comparison group is, or something really basic like that.

If I may, politely: stats courses are taught by stats guys (or gals) who tend to be math oriented; also, math is an easy way to spread out the class and find winners and losers.

But that doesn’t mean a lot of math is necessary; I think stats courses should spend a lot more time on experimental design and on understanding how to put the data into really simple bins.

http://learnandteachstatistics.wordpress.com/2012/04/10/statistics-textbooks/

I also think that textbooks are on their way out, and that online learning or apps are preferable.

http://learnandteachstatistics.wordpress.com/2012/01/20/textbooks/

Or try my app AtMyPace: Statistics! Much cheaper than a textbook and far from complete, but fun.

I made plenty of mistakes whilst writing it – especially the non-inclusion of exercises, something that will be rectified in the future. However, I tried to give more than just formulae and included some background information as to “why this works”. I also tried to cover the idea of statistical analysis from beginning to end: planning what you want to do, collecting the data and writing it in a coherent fashion, carrying out the appropriate tests and then reporting the results (including graphs).

Mark Gardener

http://www.dataanalytics.org.uk

• Farnsworth cran.r-project.org/doc/contrib/Farnsworth-EconometricsInR.pdf

• Faraway Linear Models with R http://books.google.com/books?id=fvenzpofkagC&lpg=PP1&dq=%22applied%20linear%20models%20in%20r%22&pg=PP1#v=onepage&q=%22applied%20linear%20models%20in%20r%22&f=false

• Some kind of exploratory data analysis, e.g., http://www.indiana.edu/~wim/docs/9_8_2010_presentation.pdf (cf. http://twitter.com/EdwardTufte/status/284036109426630656)

• Angrist & Pischke’s book looks interesting. http://www.mostlyharmlesseconometrics.com http://www.stat.columbia.edu/~gelman/research/published/angristpischke2.pdf

• Richard Jeffrey http://www.princeton.edu/~bayesway/Book*.pdf for the probability theory

If I were recommending just one book it would be Jeffrey’s. But somehow fit in http://twitter.com/EdwardTufte/status/284036109426630656.

Toward the modelling + residuals end: I think http://en.wikipedia.org/wiki/Income_inequality_in_the_United_States#Race_and_gender_disparities (+ down 1 page) provides a nice microcosm of the reason one wants to do statistical modelling and the pitfalls of trying to do so. Start with the overall disparities, then note things like Blacks being on average younger, and see how the disparities change at different education levels. Then the questions shift: why are Blacks getting fewer doctorates? Does the women/men doctoral pay divide reflect subject of study? And what’s the difference between running an experiment and a “statistical control” where we “subtract off” some model?
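
A toy sketch of that "overall versus within-level" comparison, using made-up numbers and generic labels (groups "A" and "B", levels "junior" and "senior"), none of which come from the linked page. It only shows how a raw gap can come entirely from group composition; it says nothing about why the composition differs, which is the harder question.

```python
from collections import defaultdict
from statistics import mean

def gap(records, group_a, group_b):
    """Difference in mean value between two groups: mean(A) - mean(B)."""
    a = [value for group, _, value in records if group == group_a]
    b = [value for group, _, value in records if group == group_b]
    return mean(a) - mean(b)

def gap_by_stratum(records, group_a, group_b):
    """The same gap, computed separately within each stratum."""
    strata = defaultdict(list)
    for record in records:
        strata[record[1]].append(record)
    return {name: gap(rows, group_a, group_b) for name, rows in strata.items()}

# Invented records: (group, level, pay). Pay depends only on level here,
# but group A has more seniors than group B.
records = (
    [("A", "junior", 40)] + [("A", "senior", 80)] * 3
    + [("B", "junior", 40)] * 3 + [("B", "senior", 80)]
)
print(gap(records, "A", "B"))             # raw gap between the groups
print(gap_by_stratum(records, "A", "B"))  # gap within each level
```

Here the raw gap is 20 while the gap inside each level is 0. "Controlling for" level removes the gap on paper, yet unlike an experiment it cannot tell you whether level is itself a result of whatever you are trying to study.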

PS: I think I know statistics

Sadly, my somewhat traditional CS education never required more than a “stats for engineers” type course. Even though I’m not feeling enamored with traditional credentials and programs of study these days, I’m on the fence about whether anyone who claims mastery of statistics ought to pursue an MS in either stats or biostats.

Đani, thanks for suggesting the Cobb article. There’s good advice already on the first page (“judge a book by its exercises, and you cannot go far wrong”).

Here’s a JSTOR link in case other readers are interested too:

http://www.jstor.org/stable/10.2307/2289170

I don’t think a traditional textbook is the way to go. Ideally, I think you’d want some web application (or other software) that allows you to dig deeper into the material as needed. For instance, the top layer would just be conceptual. All text and graphs, with math no more complicated than some simple arithmetic and probability for demonstrations. The goal at this layer would be to develop an intuition for the material. Optionally (depending on a student’s goals), the student could drill down on a topic into the underlying math, which is the next layer. This would include all of the formal mathematical definitions of whatever topic is at hand. The focus here is to translate the concepts to math. Finally, the last layer would be a programming layer. This would give examples of how to actually perform the analyses/tests/modeling with Python or R or whatever.

There would be quizzes/homeworks to test the student’s understanding on each topic at each layer. The top layer would ask conceptual questions, the next layer would ask the student to solve problems mathematically by hand, and the last layer would have the student solve more involved problems with software.

If a manager needed to brush up on stats in order to better communicate with his/her analytics team, he/she could just read through the top layer. A mathematician wanting to learn the concepts more deeply could stick to the first two layers. Someone wanting to become a practitioner would go through all three layers. Thinking about it this way would force an author to make sure that each layer is complete and complementary to the others.

http://tamino.wordpress.com/2012/12/30/hiatus/

I know nothing other than that I’ve found his blog posts very educational (though I’ve skipped some of the heavier maths bits).

George W. Cobb has written an article (JASA, vol. 82, 1987) about introductory statistics textbooks.
