A new take on the birthday problem

Vitalii Tymchyshyn and Andrii Khlevniuk posted a new paper here entitled “On the average number of birthdays in birthday paradox setup.” This paper gives a different perspective on the birthday problem and a new proof.

In the usual formation of the birthday problem, one asks the probability that everyone in a group of size n has a different birthday, or one asks how big n needs to be for the probability to be approximately 1/2 that two people share a birthday. Tymchyshyn and Khlevniuk ask about the expected number of unique birthdays in a group of size n and show that it equals

(1 − αⁿ) / (1 − α)

where α = 364/365.

Variations on the birthday problem often come up in practice. For example, you often need to consider how likely it is that a hash function will map set of inputs to unique values. There’s a rule of thumb that if there are N possible hash values, then there’s a 50-50 chance of a collision if you hash √N values.

Let α = (N − 1)/N and p = 1/N. If we take the first three terms in the Taylor series for (1 − p)ⁿ we see that the expected number of unique hash values after hashing n inputs is approximately

n − n(n − 1) p/2.

If we choose n = √N, then the expected number of unique hash values is approximately

n − 1/2,

which is consistent with having about a 0.5 probability of one collision.

Related probability posts

5 thoughts on “A new take on the birthday problem”

Mike Anderson

29 December 2019 at 10:03

So about 19 individuals, rather than the 23 we get by assuming that birthdays are uniformly distributed and every year has 365 days. Since this is one of these “as N gets large” things, it seems the year is not quite large enough.

Mike G.

30 December 2019 at 08:28

It feels like that should be α = 364.25/365.25 to account for the few individuals with Feb 29th birthdays?

But that won’t make much difference, I guess…

Mike G.

30 December 2019 at 08:30

>>> (364.25/365.25)-(364.0/365.0)
1.875240265181155e-06

Heh – I guess it only matters for REALLY large n :)

Hannes Planatscher

12 February 2020 at 11:06

Very interesting. I came across this under the name “frequency of frequencies”: How many groups of size m share a birthday? I am baffled that this is new, as I am sure to have read about similar findings years ago. e.g. https://en.wikipedia.org/wiki/Good–Turing_frequency_estimation

Andriy

31 March 2020 at 14:36

@Hannes Planatscher I am one of the authors of this work. At the time of writing, I was not aware of the work of Good and Turing. Thanks to your comment I’ve read the paper “The Population Frequencies of Species and the Estimation of Population Parameters” (Biometrika
Vol. 40, No. 3/4 (Dec., 1953), pp. 237-264) and I agree that my result is a special case of a more general result proved in this paper. Thank you for clarification. I am sorry for misinformation.

Comments are closed.