Suppose in a company of *N* employees, *m* are chosen randomly for drug screening. In two independent screenings, what is the probability that someone will be picked both times? It may be unlikely that any given *individual* will be picked twice, while being very likely that *someone* will be picked twice.

Imagine *m* employees being given a red ticket representing the first screening, and *m* being given a blue ticket representing the second screening. The tickets can be passed out in

$$\binom{N}{m}^2$$

different ways. Of these, the number of ways the tickets could be passed out so that no one has both a red and a blue ticket is

$$\binom{N}{m}\binom{N-m}{m}$$

because you can first pass out the red tickets, then choose *m* of the employees who did not get a red ticket for the blue tickets. And so the probability that no one will be picked twice, that is, the probability that nobody holds both a red and a blue ticket, is

$$\frac{\binom{N-m}{m}}{\binom{N}{m}}.$$

Now let’s plug in some numbers. Suppose a company has 100 employees and 20 are randomly screened each time. In two screenings, there is only a 0.7% chance that no one will be tested twice. Said another way, there’s a 99.3% chance that at least one person will be screened twice. Any given individual has a 20% chance of being selected each time, and so a 4% chance of being picked twice.
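The numbers above can be checked with a few lines of Python. This is a minimal sketch, not part of the original post: the function name `prob_no_overlap` is made up for illustration, and it assumes Python 3.8 or later (for `math.comb`) and 0 ≤ *m* ≤ *N*.

```python
from math import comb

def prob_no_overlap(N: int, m: int) -> float:
    """Probability that two independent random samples of m out of N share no one.

    Hypothetical helper; assumes 0 <= m <= N.
    """
    # Ways to pick the second group from the N - m people missed the first
    # time, divided by all ways to pick the second group.
    return comb(N - m, m) / comb(N, m)

p_none = prob_no_overlap(100, 20)
print(f"P(no one screened twice)  = {p_none:.3f}")
print(f"P(someone screened twice) = {1 - p_none:.3f}")
print(f"P(a given person twice)   = {(20 / 100) ** 2:.3f}")
```

The first two printed values should agree with the 0.7% and 99.3% figures quoted above, and the last with the 4% chance for a given individual.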

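A variation on this problem is to compute the expected number of overlaps between two tests. Each employee has probability (*m*/*N*)² of being selected both times, so by linearity of expectation the expected number of overlaps is *m*²/*N*. With *N* = 100 and *m* = 20, we expect four people to be tested twice.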
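One way to confirm the expected overlap is to sum over the exact distribution of the overlap size. The sketch below assumes the standard hypergeometric form for that distribution, which the post does not spell out: the chance that exactly *k* people are picked both times is C(*m*, *k*)·C(*N* − *m*, *m* − *k*) / C(*N*, *m*). The function name `expected_overlap` is hypothetical.

```python
from fractions import Fraction
from math import comb

def expected_overlap(N: int, m: int) -> Fraction:
    """Exact expected number of people chosen in both screenings.

    Hypothetical helper; assumes 0 <= m <= N.
    """
    total = comb(N, m)  # ways to choose the second group
    # k of the second group come from the m people already screened,
    # and the remaining m - k come from the N - m who were not.
    return sum(
        Fraction(k * comb(m, k) * comb(N - m, m - k), total)
        for k in range(m + 1)
    )

print(expected_overlap(100, 20))  # 4
```

Exact rational arithmetic is used so the result can be compared directly with *m*²/*N* rather than with a rounded float.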

By the way, what if over half the employees are tested each time? For example, if a company of 100 people tests 60 people each time, it’s *certain* to test somebody twice. But the derivation above still works. The general definition of binomial coefficients takes care of this because the numerator will be zero if *m* is larger than half *N*. The number of ways to choose 60 things from a set of 40 things, for instance, is zero.
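Python's `math.comb` follows the same convention and returns zero when asked to choose more items than are available, so a direct translation of the formula handles this case without special treatment. A small check, with variable names chosen only for illustration:

```python
from math import comb

# Choosing 60 things from a set of 40 is impossible, so the count is zero.
ways = comb(40, 60)
print(ways)  # 0

# With N = 100 and m = 60 the numerator vanishes, so the probability
# that no one is screened twice is zero.
N, m = 100, 60
p_none_60 = comb(N - m, m) / comb(N, m)
print(p_none_60)  # 0.0
```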

Might this map pretty cleanly onto the well-known “share the same birthday in a room” problem?

@Ross: It’s similar but not the same. To correspond cleanly to the birthday problem, you would have to repeatedly choose one person at random for screening, rather than choosing m people at a time. Then the screenings correspond to people, and the people correspond to days.

Or, you would have to consider the following variant of the birthday problem: given two rooms of m people each, where no two people in the same room share a birthday, what is the probability that someone in one room shares a birthday with someone in the other?