I help people make decisions in the face of uncertainty. Sounds interesting.
I’m a data scientist. Not sure what that means, but it sounds cool.
I study machine learning. Hmm. Maybe interesting, maybe a little ominous.
I’m into big data. Exciting or passé, depending on how many times you’ve heard the term.
Even though each of these descriptions makes a different impression, they’re all essentially the same thing. You could throw in a few more terms too, like artificial intelligence, inferential science, decision theory, or inverse probability.
There are distinctions. These terms don’t entirely overlap, but the overlap is huge. They all have to do with taking data and making an inference.
“Decision-making under uncertainty” emphasizes that you never have complete data, and yet you need to make decisions anyway. “Decision theory” emphasizes that the whole point of analyzing data is to do something as a result, and suggests that focusing directly on the decision itself, rather than proxies along the way, is the best way to do this.
“Data science” stresses that there is more to the process of making inferences than what falls under the traditional heading of “statistics.” Statistics has never been only about “the grotesque phenomenon generally known as mathematical statistics,” as Francis Anscombe described it. Things like data cleaning and visualization have always been part of the practice of statistics, though not the theory of statistics. Data science also emphasizes the role of computation. Some say a data scientist is a statistician who can program. Some say data science is statistics on a Mac.
Despite the hype around the term data science, it’s growing on me. It has its drawbacks, but so does every other name.
Machine learning, like decision theory, emphasizes the ultimate goal of doing something with data rather than creating an accurate model of the process that generates the data. If you can create such a model, so much the better. But it may not be necessary to have a great model in order to accomplish what you originally set out to do. “Naive Bayes,” for example, is a classification algorithm that is admittedly naive. It knowingly makes a gross simplification, assuming events are independent that we know are certainly not independent, and yet it often works well enough.
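To make the naive Bayes point concrete, here is a minimal sketch of a word-count classifier. The toy documents, the labels "spam" and "ham", and the function names are all made up for illustration; this is not taken from any particular library. The independence assumption shows up in the inner loop, where each word's log probability is simply added, as if words occurred independently given the label.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs is a list of (words, label) pairs. Returns counts used by classify()."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab

def classify(model, words):
    """Return the label with the highest log posterior score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total_docs)  # log prior for the label
        total_words = sum(word_counts[label].values())
        for w in words:
            # The "naive" step: treat words as independent given the label,
            # so log-likelihoods add. Add-one smoothing avoids log(0).
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data, for illustration only.
toy_docs = [
    ("win money now".split(), "spam"),
    ("free money offer".split(), "spam"),
    ("meeting agenda attached".split(), "ham"),
    ("lunch meeting tomorrow".split(), "ham"),
]
model = train_naive_bayes(toy_docs)
print(classify(model, "free money".split()))
print(classify(model, "meeting tomorrow".split()))
```

Words in real messages are plainly not independent of one another, yet on this toy data the classifier still separates the two kinds of message, which is the sense in which the method can work well enough.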
“Big data” is a big can of worms. It is often concerned with data sets that are indeed big, but it also implies other things, such as the way the data become available: as a real-time stream rather than as a complete static set. See Erik Meijer’s Big data cube. And that’s just when the term “big data” is used in some fairly meaningful way. It’s also used so broadly as to be meaningless.
The term “statistics” literally means the mathematics of the interests of states, as in governments, because these were the first applications of statistics. So while “statistics” may be the most established and perhaps most respectable term discussed here, it’s not great. As I remarked here, “The term statistics would be equivalent to governmentistics, a historically accurate but otherwise useless term.” Statistics emphasizes probability models and mathematical rigor more than other variations on data analysis do. Statisticians criticize machine learning folks for being sloppy. Machine learning folks criticize statisticians for being too conservative, or for being too focused on description and not focused enough on prediction.
Bayesian statistics is much older than what is now sometimes called “classical” statistics. It was essentially dormant during the first half of the 20th century before experiencing a renaissance in the second half of the century. Bayesian statistics was originally called “inverse probability” for good reason. Probability theory takes the probabilities of events as given and makes inferences about possible outcomes. Bayesian statistics does the inverse, taking data as given and inferring the probabilities that lead to the data. All statistics does something like this, but Bayesian statistics is consistent in forming all inference directly as probabilities. Frequentist (“classical”) statistics also infers probabilities, but the results, things like p-values and confidence intervals, are not probabilities of what most people think they are. See Anthony O’Hagan’s description here.
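The forward and inverse directions can be sketched with a coin. The numbers here (7 heads in 10 flips), the uniform prior, and the grid approximation are assumptions chosen for illustration. The forward question takes the probability of heads as given and asks how likely the data are. The inverse question takes the data as given and asks what they say about the probability of heads.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Forward direction: probability of k heads in n flips, given p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def posterior_on_grid(k, n, grid_size=1001):
    """Inverse direction: posterior weights for p on an evenly spaced grid,
    assuming a uniform prior (an illustrative choice, not the only one)."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    unnormalized = [binomial_pmf(k, n, p) for p in grid]  # likelihood times flat prior
    total = sum(unnormalized)
    weights = [u / total for u in unnormalized]
    return grid, weights

# Forward: if the coin is fair, how likely are exactly 7 heads in 10 flips?
forward = binomial_pmf(7, 10, 0.5)

# Inverse: having seen 7 heads in 10 flips, what do we believe about p?
grid, weights = posterior_on_grid(7, 10)
posterior_mean = sum(p * w for p, w in zip(grid, weights))
prob_p_above_half = sum(w for p, w in zip(grid, weights) if p > 0.5)

print(forward, posterior_mean, prob_p_above_half)
```

With a uniform prior the exact posterior is a Beta(8, 4) distribution, so the grid estimate of the posterior mean should land close to 8/12. Note that `prob_p_above_half` is a direct probability statement about p, which is the kind of statement a p-value is often mistaken for.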
Data analysis has gone by many names over time, sometimes with meaningful distinctions and sometimes not. Often people make a distinction without a difference.
I think the first one, “I help people make decisions in the face of uncertainty”, is not as closely equivalent, because there is no implication of quantitative methods. Political pundits and other “experts” would fall into this category, but not into the others.
Good point. Data analysis emphasizes quantitative methods for dealing with uncertainty. But it should also take into account things known outside the data, and tread lightly where things are not readily quantified.
Very glad to see your quotes around the term “classical” as applied to frequentist statistics. I don’t get why people use this term – perhaps as a way of appearing respectful? But Bayesian statistics has a much longer history, and classical is already applied to other areas (music, art, etc), where it has connotations of oldness and quality that are lacking in statistics.
The popular impression is that statistics started in the late 19th or early 20th century. Then when Bayesian statistics experienced a renaissance around 1980, people saw it as new.
You could call frequentist statistics “classical” for two reasons. One would just be ignorance of history. The other would be to say that in a sense, statistics really did start in the early 20th century because only then was it widely adopted in science. And when that happened, it was frequentist statistics that was adopted.
I think there is a fundamental distinction between “data analysis” (including design of experiments, business intelligence, hypothesis testing, regression models, etc) and “artificial intelligence” (including computer vision, natural language processing, game playing, optimal control, etc).
Of course, those two disciplines have much in common, but they differ greatly in philosophy, methodology, and engineering practice.
I think it’s rather common (but not common enough to avoid confusion) to assign the term “machine learning” to the second, and “data science” to the first.
I recently got a promotion at work, and my title went from “Manager, Decision Research and Analytics” to “Senior Manager, Data Science” while remaining with the same team. I like “Analytics” better, so it’s still how I describe my role at work. The only downside is that there are other teams within the company with the term “analytics” in their name that don’t really use programming, statistics, or machine learning (they use things like Cognos and Excel). I see some people at other companies using “advanced analytics” to distinguish, but that sounds pretentious to me.
When I hear someone say they use “advanced” analytics, I sometimes tell them that I am glad that they are not using “basic” analytics or — even worse — “remedial” analytics.
The omnipresent nomenclature debate in this area reminds me to treasure both my Finance/Decision Sciences degree from 30+ years ago (truly a marriage between OR, Stat & Behavioral Economics/Finance) as well as a later econometrics/finance degree, where the subtle distinction between the econometric and statistical approach was always noted & highlighted. The key (regardless of the name) is to use tools wisely (and I’d add, create robust models).
Mind you, “machine learning” has a similar issue. Roughly speaking:
1950s: Electronic brains
1960s: Perceptrons
1970s: Artificial intelligence
1980s: Expert systems
1990s: Knowledge-based systems
2000s: Intelligent agents
2010s: Machine learning