I learned a useful new phrase today: numerator-only data. This is data without anything to compare it to, no denominator. I ran across the term in Frederick Mosteller’s autobiography. He illustrates the problem with the following old joke.
“Why do the white horses eat more than the black horses?”
“Don’t know. Why?”
“Because we have ten times as many white horses as black horses.”
Numerator-only data is data that leaves you asking “compared to what?” If I tell you the NASDAQ stock index closed at 2368 today, is that good or bad? The number by itself means nothing. Is that up or down compared to last week? Last year? If I tell you, for example, that the record high value was 5047, that gives you a denominator to compare it to.
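As a minimal sketch of the idea, here is the comparison worked out in Python. The two index values come from the paragraph above. The helper name `relative_to` and the horse feed figures are hypothetical, made up purely for illustration.

```python
def relative_to(value, baseline):
    """Express value as a fraction of baseline, i.e. supply the denominator."""
    if baseline == 0:
        raise ValueError("baseline must be nonzero")
    return value / baseline


# Figures quoted in the post.
close = 2368
record_high = 5047

share_of_high = relative_to(close, record_high)
print(f"Close is {share_of_high:.1%} of the record high")

# Hypothetical numbers for the horse joke: the white horses eat ten times
# as much in total, but there are ten times as many of them.
feed_kg = {"white": 1000, "black": 100}   # assumed totals
head_count = {"white": 50, "black": 5}    # assumed herd sizes

per_horse = {color: relative_to(feed_kg[color], head_count[color])
             for color in feed_kg}
print(per_horse)
```

With the denominator in place, 2368 turns out to be a bit less than half of the record high, and in the made-up horse data the per-horse consumption is identical for both colors.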
8 thoughts on “Numerator-only data”
Good point. I think it also depends on the reader. Readers may or may not possess knowledge of the “denominator”. For example, I have a rich understanding of weather expressed in Celsius. I know the difference between 23 and 25 degrees Celsius. The points on the scale are richly connected with a set of memories and beliefs. However, if a temperature is given in Fahrenheit, I’m pretty lost. I either have to convert to Celsius or rely on a rough intuition that’s built up over time.
This in turn raises the issue of how best to give people perspective on a metric with which they are unfamiliar. This is a big issue in many fields: health indicators; economic variables; psychological scales; etc.
I’m always amused by reporting of stock index changes….
I keep thinking that the small random fluctuations in the index from day to day are hardly news at all. They might as well flip a coin and say “today it came up heads”.
Of course when there is a big change in the index, that is news.
Beyond this, I would say that “dividing by the denominator” is a simple (often useful, but still simple) example of “controlling for a confounder.” One of the errors commonly made by stat wannabes such as Steven Levitt is to think that there is some correct denominator that, if we divide by it, will solve all our problems. (Recall his notorious discussion of the per-mile risks of drunk driving.) Per-mile (or per-person or per-horse, etc.) is a good start but ultimately no substitute for a fuller formulation of one’s problem.
This is not to disagree with your point but rather to indicate that, once you have a “denominator,” that’s the beginning of a solution but typically not the end.
P.S. I didn’t know that Fred Mosteller had any autobiographical writings. I was his last T.A. but have been out of the Mosteller loop for a while, I guess.
Numerator-only data can be appropriate in certain circumstances, for example when looking at defects or adverse events. In hospitals, for example, it is common to record the number of falls in a specified period of time.
But of course I agree with you that too often absolute figures are given where a relative measure such as a percentage would be more appropriate.
This often occurs when talking about bad things globally – number of deaths from rare illnesses or accidents. Clearly the lower the better but it can give a distorted picture of how unsafe things are.
“Likes” (popularized by Facebook) always struck me as having this fundamental flaw — that it was a numerator-only signal (particularly once recommendation algorithms are put into the loop, increasing or decreasing the denominator). Now everybody is used to likes, and nobody thinks of them as unusual or problematic. But this problem remains.
Could this be similar to “vanity metrics”? (Reference: Troy Magennis @t_magennis)