Is the woman in this photograph more likely to be named Esther or Caitlin?
Yesterday Mark Jason Dominus published a post about statistics on first names in the US from 1960 to 2021. For each year and state, the data tell how many boys and girls were given each name.
Reading the data “forward” you could ask, for example, how common it was for girls born in 1960 to be named Paula. Reading the data “backward” you could ask how likely it is that a woman named Paula was born in 1960.
We do this intuitively. When you hear a name like Blanche you may think of an elderly woman because the name is uncommon now (in the US) but was more common in the past. Sometimes we get a bimodal distribution. Olivia, for example, has made a comeback. If you hear that a female is named Olivia, she’s probably not middle aged. She’s more likely to be in school or retired than to be a soccer mom.
Bayes’ theorem tells us how to turn probabilities around. We could go through Mark’s data and compute the probabilities in reverse. We could quantify the probability that a woman named Paula was born in the 1960s, for example, by adding up the probabilities that she was born in 1960, 1961, …, 1969.
Bayes’ theorem says
P(age | name) = P(name | age) P(age) / P(name).
Here the vertical bar separates the thing we want the probability of from the thing we assume. P(age | name), for example, is the probability of a particular age, given a particular name.
There is no bar in the probability in the denominator above. P(name) is the overall probability of a particular name, regardless of age.
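The reversal can be sketched in a few lines of code. The counts below are made-up illustrative numbers, not Mark’s actual data, and the years are a coarse sample rather than a full table:

```python
# Hypothetical counts of girls named Paula by birth year (illustrative only).
paula_counts = {1960: 9000, 1970: 4000, 1980: 1200, 1990: 300}
# Hypothetical total number of girls born in each of those years.
total_births = {1960: 2_000_000, 1970: 1_800_000, 1980: 1_700_000, 1990: 1_900_000}

# P(name | year): chance a girl born in a given year was named Paula.
p_name_given_year = {y: paula_counts[y] / total_births[y] for y in paula_counts}

# P(year): chance a girl in this toy population was born in a given year.
n = sum(total_births.values())
p_year = {y: total_births[y] / n for y in total_births}

# P(name): overall chance a girl is named Paula, by total probability.
p_name = sum(p_name_given_year[y] * p_year[y] for y in paula_counts)

# Bayes' theorem: P(year | name) = P(name | year) P(year) / P(name).
p_year_given_name = {y: p_name_given_year[y] * p_year[y] / p_name
                     for y in paula_counts}
```

The posterior probabilities sum to 1, and with these toy numbers a woman named Paula is far more likely to have been born in 1960 than in 1990.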
People very often get probabilities backward; they need Bayes’ theorem to turn them around. A particular case of this is the prosecutor’s fallacy. In a court of law, the probability of a bit of evidence given that someone is guilty is irrelevant. What matters is the probability that they are guilty given the evidence.
In a paternity case, we don’t need to know the probability of someone having a particular genetic marker given that a certain man is or is not their father. We want to know the probability that a certain man is the father, given that someone has a particular genetic marker. The former probability is not what we’re after, but it is useful information to stick into Bayes’ theorem.
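A toy calculation shows how the former probability feeds into the latter. All the numbers here are assumptions chosen for illustration, including the prior:

```python
# Illustrative numbers only: a genetic marker the child is certain to have
# if the man is the father, and that 1% of the population carries anyway.
p_marker_given_father = 1.0    # P(marker | father)
p_marker_given_not = 0.01      # P(marker | not father)
p_father = 0.5                 # assumed prior probability of paternity

# P(marker) by the law of total probability.
p_marker = (p_marker_given_father * p_father
            + p_marker_given_not * (1 - p_father))

# Bayes' theorem gives the probability we actually care about.
p_father_given_marker = p_marker_given_father * p_father / p_marker
```

With these assumptions the posterior probability of paternity is about 0.99: strong, but not the certainty a naive reading of the marker evidence might suggest.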
- Bits of information in age and birthday
- Expected length of DNA matches
- Explaining probability to a jury
Photo by Todd Cravens on Unsplash.
8 thoughts on “First names and Bayes’ theorem”
This is how I think about Bayes’ Theorem: it’s an algebraic trick to turn conditional probabilities around. What I can’t get through my head is this notion that Bayes’ Theorem is how you adjust your beliefs in the presence of new evidence.
I’ve seen people comment on that before, that Bayes theorem is not intrinsically sequential.
I like the way E. T. Jaynes presents it in terms of order of learning. He would state the theorem as
P(A | B) P(B) = P(B | A) P(A)
and then read this as saying you could either learn about A, then learn about B given A, or you could learn about B, then learn about A given B.
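The symmetry is easy to check numerically on any joint distribution; the numbers below are arbitrary:

```python
# A toy joint distribution over two binary events A and B (made-up numbers).
joint = {(True, True): 0.2, (True, False): 0.3,
         (False, True): 0.1, (False, False): 0.4}

p_A = sum(p for (a, b), p in joint.items() if a)
p_B = sum(p for (a, b), p in joint.items() if b)
p_A_given_B = joint[(True, True)] / p_B
p_B_given_A = joint[(True, True)] / p_A

# Either order of learning recovers the same joint probability P(A and B).
assert abs(p_A_given_B * p_B - p_B_given_A * p_A) < 1e-12
```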
I have an amusing anecdote regarding this. Many years ago a colleague of mine insisted that it was true that 90% of men were colorblind.
I had to explain to her that she had it backward: the true statistic is that 90% of color-blind people are men. Yes, order is important.
Vernor Vinge’s short story “True Names” makes use of the age distribution of names. “I’m looking for Deborah Charteris.” “My granddaughter. She’s out shopping. … ” “Deborah, Debby. It suddenly struck him what an old-fashioned name that was, more the name of a grandmother than a granddaughter.” That’s when he infers the old woman he’s talking to is Deborah. https://archive.org/details/truenamesotherda00ving/page/134/mode/2up?q=grandmother
That’s great. If I’d known that story before writing the post I would have worked it in.
“Bayes theorem is not intrinsically sequential.”
That’s my problem, I guess. It seems to me that Bayes’ Theorem is too sequential. Given the commutativity of multiplication, set intersection and equality, what Bayes delivers is sequence after sequence of structurally different terms related by equality. What escapes my grasp is how I’m supposed to manipulate sequences of equalities into a change of something.
On the other hand, what should you do when you obtain a new piece of data? Update your prior probability.
And the next time? Update it again, with your new prior distribution being the previous posterior distribution.
“what should you do when you obtain a new piece of data? Update your prior probability.”
Absolutely, but how? In particular, how do I update using Bayes’ Theorem?
If I know the likelihood that an old person is named Agatha, and I know the other necessary likelihoods, I can use Bayes’ Theorem to find the likelihood that a person named Agatha is old. But how do I update when I find a bunch of teenagers named Agatha? I can put on my frequentist hat and recalculate all relevant likelihoods to include the new data, then rerun Bayes’ Theorem, but that’s merely Bayes being used as an algebraic trick; the updating took place in the likelihood recalculations. My interpretation (which may be wrong) of Bayes’ Theorem as an updater is that the theorem accepts as input the new and existing data and somehow directly produces as output the likelihood revisions. It’s that “somehow directly” that has me stumped.