Vitalii Tymchyshyn and Andrii Khlevniuk posted a new paper here entitled “On the average number of birthdays in birthday paradox setup.” This paper gives a different perspective on the birthday problem and a new proof.
In the usual formation of the birthday problem, one asks the probability that everyone in a group of size n has a different birthday, or one asks how big n needs to be for the probability to be approximately 1/2 that two people share a birthday. Tymchyshyn and Khlevniuk ask about the expected number of unique birthdays in a group of size n and show that it equals
(1 – αn) / (1 – α)
where α = 364/365.
Variations on the birthday problem often come up in practice. For example, you often need to consider how likely it is that a hash function will map set of inputs to unique values. There’s a rule of thumb that if there are N possible hash values, then there’s a 50-50 chance of a collision if you hash √N values.
Let α = (N-1)/N and p = 1/N. If we take the first three terms in the Taylor series for (1 – p)n we see that the expected number of unique hash values after hashing n inputs is approximately
n – n(n – 1) p/2.
If we choose n = √N, then the expected number of unique hash values is approximately
n – 1/2,
which is consistent with having about a 0.5 probability of one collision.