Offended by conditional probability

It’s a simple rule of probability that if A makes B more likely, B makes A more likely. That is, if the conditional probability of A given B is larger than the probability of A alone, the conditional probability of B given A is larger than the probability of B alone. In symbols,

Prob( A | B ) > Prob( A ) ⇒ Prob( B | A ) > Prob( B ).

The proof is trivial: Apply the definition of conditional probability and observe that if Prob( AB ) / Prob( B ) > Prob( A ), then Prob( AB ) / Prob( A ) > Prob( B ).
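Written out step by step, the argument looks like this. It is a sketch that assumes Prob( A ) and Prob( B ) are both positive so that the conditional probabilities are defined, a condition the statement above leaves implicit.

```latex
\begin{align*}
P(A \mid B) > P(A)
  &\iff \frac{P(AB)}{P(B)} > P(A) \\
  &\iff P(AB) > P(A)\,P(B) \\
  &\iff \frac{P(AB)}{P(A)} > P(B) \\
  &\iff P(B \mid A) > P(B).
\end{align*}
```

The middle inequality is symmetric in A and B, which is why the implication runs in both directions.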

Let A be the event that someone was born in Arkansas and let B be the event that this person has been president of the United States. There are five living current and former US presidents, and one of them, Bill Clinton, was born in Arkansas, a state with about 1% of the US population. Knowing that someone has been president increases your estimate of the probability that this person is from Arkansas. Similarly, knowing that someone is from Arkansas should increase your estimate of the chances that this person has been president.

The chances that an American selected at random has been president are very small, but as small as this probability is, it goes up if you know the person is from Arkansas. In fact, it goes up by the same proportion as the opposite probability. Knowing that someone has been president increases their probability of being from Arkansas by a factor of 20, so knowing that someone is from Arkansas increases the probability that they have been president by a factor of 20 as well. This is because

Prob( A | B ) / Prob( A ) = Prob( B | A ) / Prob( B ).
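The factor of 20 can be checked with exact arithmetic. This is only a rough sketch: the population total is an assumed round number, not a figure from the text, and it treats "about 1%" as exactly 1%. The variable names are made up for illustration.

```python
from fractions import Fraction

# Illustrative figures. The population total is an ASSUMED round number;
# the other counts come from the example in the text.
population = 330_000_000           # assumed US population
arkansans = population // 100      # about 1% of the population (event A)
presidents = 5                     # living current and former presidents (event B)
arkansan_presidents = 1            # Bill Clinton (both A and B)

p_a = Fraction(arkansans, population)             # Prob(A)
p_b = Fraction(presidents, population)            # Prob(B)
p_ab = Fraction(arkansan_presidents, population)  # Prob(AB)

p_a_given_b = p_ab / p_b   # Prob(A | B)
p_b_given_a = p_ab / p_a   # Prob(B | A)

lift_a = p_a_given_b / p_a  # how much learning B multiplies Prob(A)
lift_b = p_b_given_a / p_b  # how much learning A multiplies Prob(B)

print(lift_a, lift_b)
```

Both ratios reduce to Prob( AB ) / ( Prob( A ) Prob( B ) ), so they agree whatever population total is assumed.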

This isn’t controversial when we’re talking about presidents and where they were born. But it becomes more controversial when we apply the same reasoning, for example, to deciding who should be screened at airports.

When I jokingly said that being an Emacs user makes you a better programmer, it appears a few Vim users got upset. Whether they were serious or not, it does seem that they thought “Hey, what does that say about me? I use Vim. Does that mean I’m a bad programmer?”

Assume for the sake of argument that Emacs users are better programmers, i.e.

Prob( good programmer | Emacs user )  >  Prob( good programmer ).

We’re not assuming that Emacs users are necessarily better programmers, only that a larger proportion of Emacs users are good programmers. And we’re not saying anything about causality, only probability.

Does this imply that being a Vim user lowers your chance of being a good programmer? i.e.

Prob( good programmer | Vim user )  <  Prob( good programmer )?

No, because being a Vim user is a specific alternative to being an Emacs user, and there are programmers who use neither Emacs nor Vim. What the above statement about Emacs would imply is that

Prob( good programmer | not an Emacs user )  <  Prob( good programmer ).

That is, if knowing that someone uses Emacs increases the chances that they are a good programmer, then knowing that they are not an Emacs user does indeed lower the chances that they are a good programmer, if we have no other information. In general

Prob( A | B ) > Prob( A ) ⇒ Prob( A | not B ) < Prob( A ).
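The reason is that Prob( A ) is a weighted average of Prob( A | B ) and Prob( A | not B ), so if one is above the average the other must be below it (assuming Prob( B ) is strictly between 0 and 1). Here is a small sketch with invented counts, not real data, in which Emacs users and Vim users are both more likely than average to be good programmers while non-Emacs users as a group are less likely than average.

```python
from fractions import Fraction

# HYPOTHETICAL counts, invented purely for illustration:
# editor -> (number of programmers, number of good programmers)
counts = {
    "emacs": (100, 60),
    "vim": (100, 55),
    "other": (800, 320),
}

total = sum(n for n, _ in counts.values())
good = sum(g for _, g in counts.values())
p_good = Fraction(good, total)  # Prob(good programmer)


def p_good_given(editors):
    """Prob(good programmer | uses one of the given editors)."""
    n = sum(counts[e][0] for e in editors)
    g = sum(counts[e][1] for e in editors)
    return Fraction(g, n)


p_good_emacs = p_good_given(["emacs"])
p_good_vim = p_good_given(["vim"])
p_good_not_emacs = p_good_given(["vim", "other"])

# Prob(good) as a weighted average of the two conditional probabilities.
p_emacs = Fraction(counts["emacs"][0], total)
weighted = p_good_emacs * p_emacs + p_good_not_emacs * (1 - p_emacs)

print(float(p_good), float(p_good_emacs), float(p_good_vim), float(p_good_not_emacs))
```

With these made-up numbers the Emacs and Vim conditionals both exceed the overall rate, and only the "not Emacs" group falls below it.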

To take a more plausible example, suppose that spending four years at MIT obtaining a computer science degree makes you a better programmer. Then knowing that someone has a CS degree from MIT increases the probability that this person is a good programmer. But if that’s true, it must also be true that absent any other information, knowing that someone does not have a CS degree from MIT decreases the probability that this person is a good programmer. If the proportion of good programmers among MIT graduates is above the overall proportion, then the proportion among everyone else must be below it.

* * *

This post uses the ideas of information and conditional probability interchangeably. If you’d like to read more on that perspective, I recommend Probability Theory: The Logic of Science by E. T. Jaynes.

12 thoughts on “Offended by conditional probability”

  1. I think it’s a mistake, though, to assume that the rules of English grammar correspond to the rules of probability.

    One issue here is that contexts shift between different parts of an English phrase or discussion while probabilistic inference (such as you are describing, above) is only valid within a fixed context.

    So, while it’s possible to use English to discuss probability (if we are careful) it’s easy to fall into problems if we do not carefully address contextual issues as perceived by our audiences.

  2. Perhaps the complaint of the above Vim users is better formalized like so:

    Prob( good programmer | Emacs user ) > Prob( good programmer | Vim user )

  3. Peter: Maybe so. I never said that, but someone may have heard that.

    It’s amazing how touchy the Emacs/Vim thing is. Ironic too since both Emacs and Vim are obscure corners of the universe as far as Visual Studio and Eclipse users are concerned. I guess that’s how it always goes: sibling rivalries are the most bitter.

  4. You have fallen into the trap of thinking that human communication is governed by such a simple Bayesian rule.

    Check out the works of Paul Grice and what he had to say about conversational implicature. Sperber & Wilson’s “Relevance: Communication and Cognition” is worth a look.

  5. I agree with the comments above regarding the translation from everyday speech to probability and math. Many of the probability problems I’ve seen floating around the net (e.g. “Tuesday boy”) seem to boil down to differences in translation.

  6. Dave Backus @ NYU

    When you say “being an Emacs user makes you a better programmer,” I think you want to avoid the word “makes.”

  7. People who are offended by conditional probability are often also offended by symbolic logic. In the same way that logic presents itself as a distillation of patterns of valid argument, probability presents itself as being about patterns of plausible inference. Yet, for logic there is always the option of giving up on argument and building up a model-theoretic account, and for probability you can give up on inference and spend your time reducing things to measure theory and sigma algebras. In neither case does the ensuing activity feel right as a way of capturing what goes on when people construct arguments or make inferences.

  8. “If A makes B more likely then B makes A more likely” is only true when the “makes” is read as being about how your subjective degree of belief changes on learning one of the events. If it’s read as something akin to “causes” then you get obvious absurdities: Going to MIT causes you to be better at programming, therefore being good at programming causes you to have gone to MIT. Clearly crazy.

    On a related topic, here’s an interesting paper: http://www.colyvan.com/papers/shonubi.pdf

  9. Dave and Seamus: Good points. When I say “makes,” I mean “causes the rational person to increase their estimated probability of.” I agree with Jaynes’ view that probability is not a statement about nature but rather about our understanding of nature.
