The birthday paradox is an interesting problem, mainly because of its somehow “unexpected” results. The problem usually is stated as the following:

*What is the minimum number of people in a room, in order to have a probability higher than 1/2 (or 50% in other terms), that 2 people in the room are born in the same day and month (not taking the year into account) ?*

Equalizing all conditions, e.g. no leap years, no seasonality in birthdays (each day is equally possible) and no twins in the room, the result is pretty strange for those, who are not familiar with probability calculations. It is **23**. And if we increase the number up to **55**, the probability is higher than 99/100 (or 99%). Somehow unexpected.

The way to calculate it is rather logical – we start by taking 2 people in the room and we write their birthdays on a list. The probability, that the second has the birthday of the first is **1/365**. Then we add a third person, and the chance that their birthday matches the one of the two previous people in the room is a union of two probabilities – 1/365 and 1/364. The days become one less, because the second person in the room has already “taken” a second date in the 365 day calendar.

In order to simplify the calculations a bit, it is a good idea to calculate the probability of **n** people **not** sharing the same birthday, and then remove this probability from 1. In the python code below, this probability is called
not_sharing. Once this probability is calculated,
1 - not_sharing is our expected result.

The algorithm looks like this in Wikipedia:

The representation in Python could be done through a simple loop with printing the results:

1 2 3 4 5 6 7 8 9 10 11 12 |
def main(): days = 365 not_sharing = 1 for i in range(1, days): probability = i / days not_sharing *= (1 - probability) result = 1 - not_sharing print ("{0} - {1:.16f}".format(i+1, result)) if __name__== "__main__": main() |

The representation in Excel is also quite easy, using the same logic from Python, with 3 columns and incrementing variable **i **(the 50% line is highlighted in yellow):

The formulas, used in the Excel sheet are presented in the GIF below:

Cheers!