Is the woman in this photograph more likely to be named Esther or Caitlin?
Yesterday Mark Jason Dominus published wrote about statistics on first names in the US from 1960 to 2021. For each year and state, the data tell how many boys and girls were given each name.
Reading the data “forward” you could ask, for example, how common was it for girls born in 1960 to be named Paula. Reading the data “backward” you could ask how likely it is that a woman named Paula was born in 1960.
We do this intuitively. When you hear a name like Blanche you may think of an elderly woman because the name is uncommon now (in the US) but was more common in the past. Sometimes we get a bimodal distribution. Olivia, for example, has made a comeback. If you hear that a female is named Olivia, she’s probably not middle aged. She’s more likely to be in school or retired than to be a soccer mom.
Bayes’ theorem tells us how to turn probabilities around. We could go through Mark’s data and compute the probabilities in reverse. We could quantify the probability that a woman named Paula was born in the 1960s, for example, by adding up the probabilities that she was born in 1960, 1961, …, 1969.
Bayes theorem says
P(age | name) = P(name | age) P(age) / P(name).
Here the vertical bar separates the thing we want the probability of from the thing we assume. P(age | name), for example, is the probability of a particular age, given a particular name.
There is no bar in the probability in the denominator above. P(name) is the overall probability of a particular name, regardless of age.
People very often get probabilities backward; they need Bayes theorem to turn them around. A particular case of this is the prosecutor’s fallacy. In a court of law, the probability of a bit of evidence given that someone is guilty is irrelevant. What matters is the probability that they are guilty given the evidence.
In a paternity case, we don’t need to know the probability of someone having a particular genetic marker given that a certain man is or is not their father. We want to know the probability that a certain man is the father, given that someone has a particular genetic marker. The former probability is not what we’re after, but it is useful information to stick into Bayes’ theorem.
Photo by Todd Cravens on Unsplash.