It’s a simple rule of probability that if *A* makes *B* more likely, *B* makes *A* more likely. That is, if the conditional probability of *A* given *B* is larger than the probability of *A* alone, then the conditional probability of *B* given *A* is larger than the probability of *B* alone. In symbols,

Prob( *A* | *B* ) > Prob( *A* ) ⇒ Prob( *B* | *A* ) > Prob( *B* ).

The proof is trivial, assuming *A* and *B* both have positive probability: Apply the definition of conditional probability and observe that if Prob( *A* ∩ *B* ) / Prob( *B* ) > Prob( *A* ), then Prob( *A* ∩ *B* ) / Prob( *A* ) > Prob( *B* ).
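Spelled out, the step is to multiply both sides by Prob( *B* ) and then divide by Prob( *A* ). The derivation below is a sketch that assumes both probabilities are strictly positive and uses `align*` from the `amsmath` package; it shows that each inequality is equivalent to the same symmetric condition.

```latex
\begin{align*}
\Pr(A \mid B) > \Pr(A)
  &\iff \frac{\Pr(A \cap B)}{\Pr(B)} > \Pr(A) \\
  &\iff \Pr(A \cap B) > \Pr(A)\,\Pr(B) \\
  &\iff \frac{\Pr(A \cap B)}{\Pr(A)} > \Pr(B) \\
  &\iff \Pr(B \mid A) > \Pr(B).
\end{align*}
```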

Let *A* be the event that someone was born in Arkansas and let *B* be the event that this person has been president of the United States. There are five living current and former US presidents, and one of them, Bill Clinton, was born in Arkansas, a state with about 1% of the US population. Knowing that someone has been president increases your estimate of the probability that this person is from Arkansas. Similarly, knowing that someone is from Arkansas should increase your estimate of the chances that this person has been president.

The chances that an American selected at random has been president are very small, but as small as this probability is, it goes up if you know the person is from Arkansas. In fact, it goes up by the same factor as the probability in the other direction. Knowing that someone has been president increases their probability of being from Arkansas by a factor of 20 (from about 1% to one in five, or 20%), so knowing that someone is from Arkansas increases the probability that they have been president by a factor of 20 as well. This is because

Prob( *A* | *B* ) / Prob( *A* ) = Prob( *B* | *A* ) / Prob( *B* ).
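As a rough check of the factor of 20, here is a small Python sketch using exact fractions. The population figure is a round number assumed purely for illustration, and the 1% share is the approximate figure quoted above; the variable names are hypothetical.

```python
from fractions import Fraction

# Assumed round numbers, for illustration only.
population = 330_000_000   # hypothetical US population
presidents = 5             # living current and former presidents (from the text)
arkansan_presidents = 1    # Bill Clinton
p_a = Fraction(1, 100)     # Prob(A): about 1% of the population

p_b = Fraction(presidents, population)                 # Prob(B)
p_a_and_b = Fraction(arkansan_presidents, population)  # Prob(A and B)

p_a_given_b = p_a_and_b / p_b   # Prob(A | B)
p_b_given_a = p_a_and_b / p_a   # Prob(B | A)

lift_a = p_a_given_b / p_a      # how much B raises the probability of A
lift_b = p_b_given_a / p_b      # how much A raises the probability of B

print(lift_a, lift_b)  # 20 20
```

The population size cancels out of both ratios, so the result does not depend on the assumed figure.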

This isn’t controversial when we’re talking about presidents and where they were born. But it becomes more controversial when we apply the same reasoning, for example, to deciding who should be screened at airports.

When I jokingly said that being an Emacs user makes you a better programmer, it appears a few Vim users got upset. Whether they were serious or not, it does seem that they thought “Hey, what does that say about me? I use Vim. Does that mean I’m a bad programmer?”

Assume for the sake of argument that Emacs users are better programmers, i.e.

Prob( good programmer | Emacs user ) > Prob( good programmer ).

We’re not assuming that Emacs users are necessarily better programmers, only that a larger proportion of Emacs users are good programmers. And we’re not saying anything about causality, only probability.

Does this imply that being a Vim user lowers your chance of being a good programmer, i.e. that

Prob( good programmer | Vim user ) < Prob( good programmer )?

No, because being a Vim user is a specific alternative to being an Emacs user, and there are programmers who use neither Emacs nor Vim. What the above statement about Emacs *would* imply is that

Prob( good programmer | not an Emacs user ) < Prob( good programmer ).

That is, if knowing that someone uses Emacs increases the chances that they are a good programmer, then knowing that they are not an Emacs user does indeed lower the chances that they are a good programmer, *if we have no other information*. In general

Prob( *A* | *B* ) > Prob( *A* ) ⇒ Prob( *A* | not *B* ) < Prob( *A* ).
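One way to see why is the law of total probability, which writes Prob( *A* ) as a weighted average of the two conditional probabilities. The sketch below assumes Prob( *B* ) is strictly between 0 and 1, so that both conditional probabilities are defined, and uses `align*` and `\text` from `amsmath`.

```latex
\begin{align*}
\Pr(A) &= \Pr(A \mid B)\,\Pr(B)
        + \Pr(A \mid \text{not } B)\,\bigl(1 - \Pr(B)\bigr) \\
\bigl(1 - \Pr(B)\bigr)\,\bigl(\Pr(A) - \Pr(A \mid \text{not } B)\bigr)
       &= \Pr(B)\,\bigl(\Pr(A \mid B) - \Pr(A)\bigr)
\end{align*}
```

The second line is a rearrangement of the first. If Prob( *A* | *B* ) > Prob( *A* ), its right side is positive, so the left side is positive too, which forces Prob( *A* | not *B* ) < Prob( *A* ).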

To take a more plausible example, suppose that spending four years at MIT obtaining a computer science degree makes you a better programmer. Then knowing that someone has a CS degree from MIT increases the probability that this person is a good programmer. But if that’s true, it must also be true that **absent any other information**, knowing that someone does not have a CS degree from MIT decreases the probability that this person is a good programmer. If the proportion of good programmers among MIT CS graduates is above the overall proportion, then the proportion among everyone else must be below it.
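Returning to the editor example, a small Python sketch with made-up numbers shows how all of this fits together. The counts below are entirely hypothetical, chosen only so that Emacs users and Vim users are both above average while non-Emacs users as a group are below it.

```python
from fractions import Fraction

# Hypothetical counts: editor -> (programmers, good programmers).
counts = {
    "Emacs": (100, 60),
    "Vim": (100, 55),
    "other": (300, 85),
}

def prob_good(editors):
    """Prob(good programmer | editor is one of `editors`)."""
    total = sum(counts[e][0] for e in editors)
    good = sum(counts[e][1] for e in editors)
    return Fraction(good, total)

p_good = prob_good(counts)                       # everyone: 200/500
p_good_emacs = prob_good(["Emacs"])              # 60/100
p_good_vim = prob_good(["Vim"])                  # 55/100
p_good_not_emacs = prob_good(["Vim", "other"])   # 140/400

print(p_good, p_good_emacs, p_good_vim, p_good_not_emacs)
# 2/5 3/5 11/20 7/20
```

With these numbers, Emacs users and Vim users both beat the overall rate of 2/5, while the group of everyone who does not use Emacs falls below it at 7/20.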

***

This post uses the ideas of information and conditional probability interchangeably. If you’d like to read more on that perspective, I recommend *Probability Theory: The Logic of Science* by E. T. Jaynes.