Birthday problem

In probability theory, the birthday problem or birthday paradox concerns the probability that, in a set of $n$ randomly chosen people, some pair of them will have the same birthday. By the pigeonhole principle, the probability reaches 100% when the number of people reaches 367 (since there are only 366 possible birthdays, including February 29). However, 99.9% probability is reached with just 70 people, and 50% probability with 23 people. These conclusions are based on the assumption that each day of the year (excluding February 29) is equally probable for a birthday.

Actual birth records show that different numbers of people are born on different days. In this case, it can be shown that the number of people required to reach the 50% threshold is 23 or fewer.^[1] For example, if half the people were born on one day and the other half on another day, then any two people would have a 50% chance of sharing a birthday.

It may seem surprising that a group of just 23 individuals suffices to reach a probability of 50% that at least two individuals in the group share a birthday. The result becomes more plausible once one notes that birthdays are compared between every possible pair of individuals, giving 23 × 22/2 = 253 comparisons, which is well over half the number of days in a year (183 at most), rather than between one fixed individual and everyone else. The birthday problem is not a "paradox" in the literal logical sense of being self-contradictory, but is merely unintuitive at first glance.

Real-world applications for the birthday problem include a cryptographic attack called the birthday attack, which uses this probabilistic model to reduce the complexity of finding a collision for a hash function, as well as calculating the approximate risk of a hash collision existing within the hashes of a given size of population.

The history of the problem is obscure. The result has been attributed to Harold Davenport;^[2] however, a version of what is considered today to be the birthday problem was proposed earlier by Richard von Mises.^[3]

Calculating the probability

The problem is to compute an approximate probability that in a group of $n$ people at least two have the same birthday. For simplicity, variations in the distribution, such as leap years, twins, seasonal, or weekday variations are disregarded, and it is assumed that all 365 possible birthdays are equally likely. (Real-life birthday distributions are not uniform, since not all dates are equally likely, but these irregularities have little effect on the analysis.^{[nb 1]} Actually, a uniform distribution of birth dates is the worst case.^[5])

The goal is to compute $P (A)$ , the probability that at least two people in the room have the same birthday. However, it is simpler to calculate $P (A')$ , the probability that no two people in the room have the same birthday. Then, because $A$ and $A'$ are the only two possibilities and are also mutually exclusive, $P (A) = 1 - P (A').$

In deference to widely published solutions concluding that 23 is the minimum number of people necessary to have a $P (A)$ that is greater than 50%, the following calculation of $P (A)$ will use 23 people as an example. If one numbers the 23 people from 1 to 23, the event that all 23 people have different birthdays is the same as the event that person 2 does not have the same birthday as person 1, and that person 3 does not have the same birthday as either person 1 or person 2, and so on, and finally that person 23 does not have the same birthday as any of persons 1 through 22. Let these events respectively be called "Event 2", "Event 3", and so on. One may also add an "Event 1", corresponding to the event of person 1 having a birthday, which occurs with probability 1. This conjunction of events may be computed using conditional probability: the probability of Event 2 is 364/365, as person 2 may have any birthday other than the birthday of person 1. Similarly, the probability of Event 3 given that Event 2 occurred is 363/365, as person 3 may have any of the birthdays not already taken by persons 1 and 2. This continues until finally the probability of Event 23 given that all preceding events occurred is 343/365. Finally, the principle of conditional probability implies that $P (A')$ is equal to the product of these individual probabilities:

P(A')={\frac {365}{365}}\times {\frac {364}{365}}\times {\frac {363}{365}}\times {\frac {362}{365}}\times \cdots \times {\frac {343}{365}}

(1)

The terms of equation (1) can be collected to arrive at:

P(A')=\left({\frac {1}{365}}\right)^{23}\times (365\times 364\times 363\times \cdots \times 343)

(2)

Evaluating equation (2) gives $P (A') \approx 0.492703$

Therefore, $P (A) \approx 1 - 0.492703 = 0.507297$ (50.7297%).

This process can be generalized to a group of $n$ people, where $p (n)$ is the probability of at least two of the $n$ people sharing a birthday. It is easier to first calculate the probability ${\bar {p}}(n)$ that all $n$ birthdays are different. According to the pigeonhole principle, ${\bar {p}}(n)$ is zero when $n > 365$ . When $n \leq 365$ :

{\begin{aligned}{\bar {p}}(n)&=1\times \left(1-{\frac {1}{365}}\right)\times \left(1-{\frac {2}{365}}\right)\times \cdots \times \left(1-{\frac {n-1}{365}}\right)\\[6pt]&={\frac {365\times 364\times \cdots \times (365-n+1)}{365^{n}}}\\[6pt]&={\frac {365!}{365^{n}(365-n)!}}={\frac {n!\cdot {\binom {365}{n}}}{365^{n}}}={\frac {_{365}P_{n}}{365^{n}}}\end{aligned}}

where $!$ is the factorial operator, ${\binom {365}{n}}$ is the binomial coefficient and ${}_{k}P_{r}$ denotes permutation.

The equation expresses the fact that the first person has no one to share a birthday, the second person cannot have the same birthday as the first (⁠364/365⁠), the third cannot have the same birthday as either of the first two (⁠363/365⁠), and in general the $n$ th birthday cannot be the same as any of the $n - 1$ preceding birthdays.

The event of at least two of the $n$ persons having the same birthday is complementary to all $n$ birthdays being different. Therefore, its probability $p (n)$ is

p(n)=1-{\bar {p}}(n).
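The product formula above can be evaluated directly. The following is a minimal sketch in Python, assuming uniformly distributed birthdays; the function names `prob_all_distinct` and `prob_shared` are illustrative choices and do not come from the sources cited here. Exact rational arithmetic is used so that no rounding enters before the final conversion to a decimal.

```python
from fractions import Fraction


def prob_all_distinct(n: int, d: int = 365) -> Fraction:
    """Exact probability that n birthdays, uniform over d days, are all different."""
    if n > d:
        return Fraction(0)  # pigeonhole principle: a repeat is certain
    result = Fraction(1)
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        result *= Fraction(d - k, d)
    return result


def prob_shared(n: int, d: int = 365) -> Fraction:
    """Exact probability that at least two of n people share a birthday."""
    return 1 - prob_all_distinct(n, d)


if __name__ == "__main__":
    for n in (10, 22, 23, 50):
        print(n, float(prob_shared(n)))
```

For 23 people this gives a value just above one half, and for 22 people a value just below it, in line with the worked example.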

The following table shows the probability for some other values of $n$ (for this table, the existence of leap years is ignored, and each birthday is assumed to be equally likely):

[Figure: the probability that no two people share a birthday in a group of $n$ people, plotted on a logarithmic vertical scale (each step down is 10²⁰ times less likely).]

$n$	$p (n)$
1	0.0%
5	2.7%
10	11.7%
20	41.1%
23	50.7%
30	70.6%
40	89.1%
50	97.0%
60	99.4%
70	99.9%
75	99.97%
100	99.99997%
200	99.9999999999999999999999999998%
300	(100 − 6×10⁻⁸⁰)%
350	(100 − 3×10⁻¹²⁹)%
365	(100 − 1.45×10⁻¹⁵⁵)%
≥ 366	100%

Leap years. If we substitute 366 for 365 in the formula for ${\bar {p}}(n)$ , a similar calculation shows that for leap years, the number of people required for the probability of a match to be more than 50% is also 23; the probability of a match in this case is 50.6%.

Approximations

The Taylor series expansion of the exponential function (the constant $e \approx 2.718281828$ )

e^{x}=1+x+{\frac {x^{2}}{2!}}+\cdots

provides a first-order approximation for $e^{x}$ for $|x|\ll 1$ :

e^{x}\approx 1+x.

To apply this approximation to the first expression derived for ${\bar {p}}(n)$ , set $x = -a/365$ . Thus,

e^{-a/365}\approx 1-{\frac {a}{365}}.

Then, replace $a$ with each non-negative integer up to $a = n - 1$ , one for each factor in the formula for ${\bar {p}}(n)$ . For example, when $a = 1$ ,

e^{-1/365}\approx 1-{\frac {1}{365}}.

The first expression derived for ${\bar {p}}(n)$ can be approximated as

{\begin{aligned}{\bar {p}}(n)&\approx 1\cdot e^{-1/365}\cdot e^{-2/365}\cdots e^{-(n-1)/365}\\[6pt]&=e^{-\left.{\big (}1+2+\,\cdots \,+(n-1){\big )}\right/365}\\[6pt]&=e^{-(n(n-1)/2)/365}=e^{-n(n-1)/730}.\end{aligned}}

Therefore,

p(n)=1-{\bar {p}}(n)\approx 1-e^{-n(n-1)/730}.

An even coarser approximation is given by

p(n)\approx 1-e^{-n^{2}/730},

which, as the graph illustrates, is still fairly accurate.

The same approach can be applied to any number of "people" and "days". If rather than 365 days there are $d$ , if there are $n$ persons, and if $n \ll d$ , then the same argument shows that the probability $p (n, d)$ that at least two out of $n$ people share the same birthday from a set of $d$ available days satisfies:

{\begin{aligned}p(n,d)&\approx 1-e^{-n(n-1)/(2d)}\\[6pt]&\approx 1-e^{-n^{2}/(2d)}.\end{aligned}}
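To see how close the two exponential approximations come to the exact value, they can be compared numerically. This is an illustrative sketch only; the function names `exact`, `approx_fine` and `approx_coarse` are hypothetical labels for the exact product, the $n(n-1)/(2d)$ form and the $n^{2}/(2d)$ form respectively.

```python
import math


def exact(n: int, d: int = 365) -> float:
    """Exact probability of a shared birthday, from the product formula."""
    distinct = 1.0
    for k in range(1, n):
        distinct *= 1 - k / d
    return 1 - distinct


def approx_fine(n: int, d: int = 365) -> float:
    """Approximation 1 - exp(-n(n-1)/(2d))."""
    return 1 - math.exp(-n * (n - 1) / (2 * d))


def approx_coarse(n: int, d: int = 365) -> float:
    """Coarser approximation 1 - exp(-n^2/(2d))."""
    return 1 - math.exp(-n * n / (2 * d))


if __name__ == "__main__":
    for n in (10, 23, 40, 60):
        print(n, round(exact(n), 4), round(approx_fine(n), 4), round(approx_coarse(n), 4))
```

For $n = 23$ the finer form slightly underestimates the exact probability and the coarser form slightly overestimates it.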

A simple exponentiation

The probability of any two people not having the same birthday is 364/365. In a room containing $n$ people, there are ${\binom {n}{2}} = n (n - 1) / 2$ pairs of people, i.e. ${\binom {n}{2}}$ events. The probability of no two people sharing the same birthday can be approximated by assuming that these events are independent and hence by multiplying their probabilities together. In short, 364/365 can be multiplied by itself ${\binom {n}{2}}$ times, which gives us

{\bar {p}}(n)\approx \left({\frac {364}{365}}\right)^{\binom {n}{2}}.

Since this is the probability of no one having the same birthday, then the probability of someone sharing a birthday is

p(n)\approx 1-\left({\frac {364}{365}}\right)^{\binom {n}{2}}.
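The pairwise estimate is a one-line computation. The sketch below assumes the independence simplification described above; `pairwise_approx` is an illustrative name.

```python
from math import comb


def pairwise_approx(n: int, d: int = 365) -> float:
    """Approximate P(shared birthday), treating all C(n, 2) pairs as independent."""
    pairs = comb(n, 2)  # number of distinct pairs among n people
    return 1 - ((d - 1) / d) ** pairs


if __name__ == "__main__":
    print(pairwise_approx(23))
```

The pairs are not truly independent, so this is an approximation rather than the exact value, but for 23 people it lands very close to one half.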

Poisson approximation

Applying the Poisson approximation for the binomial on the group of 23 people,

\operatorname {Poi} \left({\frac {\binom {23}{2}}{365}}\right)=\operatorname {Poi} \left({\frac {253}{365}}\right)\approx \operatorname {Poi} (0.6932)

so

\Pr(X>0)=1-\Pr(X=0)\approx 1-e^{-0.6932}\approx 1-0.499998=0.500002.

As with the exact calculation, the result is just over 50%. This approximation is the same as the one above based on the Taylor expansion that uses $e^{x}\approx 1+x$ .
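The Poisson calculation can be reproduced for any group size. In this sketch the rate is taken to be the number of pairs divided by the number of days, as in the 23-person example; `poisson_approx` is an illustrative name.

```python
import math


def poisson_approx(n: int, d: int = 365) -> float:
    """P(at least one matching pair) under a Poisson model with rate C(n, 2) / d."""
    rate = math.comb(n, 2) / d
    return 1 - math.exp(-rate)  # 1 - P(X = 0)


if __name__ == "__main__":
    print(poisson_approx(23))
```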

Square approximation

A good rule of thumb which can be used for mental calculation is the relation

p(n)\approx {\frac {n^{2}}{2m}}

which can also be written as

n\approx {\sqrt {2m\times p(n)}}

which works well for probabilities less than or equal to ⁠1/2⁠. In these equations, $m$ is the number of days in a year.

For instance, to estimate the number of people required for a ⁠1/2⁠ chance of a shared birthday, we get

n\approx {\sqrt {2\times 365\times {\tfrac {1}{2}}}}={\sqrt {365}}\approx 19

which is not too far from the correct answer of 23.
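The rule of thumb is simple enough to check by hand, but a short sketch makes the two directions explicit. The function names are illustrative, and $m$ is the number of days as in the text.

```python
import math


def square_rule_probability(n: int, m: int = 365) -> float:
    """Rule-of-thumb probability n^2 / (2m); only sensible while the result is at most 1/2."""
    return n * n / (2 * m)


def square_rule_group_size(p: float, m: int = 365) -> float:
    """Rule-of-thumb group size sqrt(2 m p) for a target probability p."""
    return math.sqrt(2 * m * p)


if __name__ == "__main__":
    print(square_rule_group_size(0.5))   # square root of 365
    print(square_rule_probability(10))   # compare with the exact 11.7% in the table
```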

Approximation of number of people

This can also be approximated using the following formula for the number of people necessary to have at least a ⁠1/2⁠ chance of matching:

n\approx {\tfrac {1}{2}}+{\sqrt {{\tfrac {1}{4}}+2\times \ln(2)\times 365}}=22.999943.

This is a result of the good approximation that an event with probability $1 / k$ will have a 1/2 chance of occurring at least once if it is repeated $k \ln 2$ times.^[6]
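Both statements can be checked numerically. The sketch below evaluates the closed-form estimate for the group size and the "repeat $k \ln 2$ times" rule; the function names are hypothetical.

```python
import math


def people_for_even_chance(d: int = 365) -> float:
    """Estimate 1/2 + sqrt(1/4 + 2 ln(2) d) of the group size giving a 1/2 chance of a match."""
    return 0.5 + math.sqrt(0.25 + 2 * math.log(2) * d)


def chance_after_repeats(k: int) -> float:
    """Chance that an event of probability 1/k occurs at least once in k ln 2 independent trials."""
    repeats = k * math.log(2)
    return 1 - (1 - 1 / k) ** repeats


if __name__ == "__main__":
    print(people_for_even_chance())    # close to 23
    print(chance_after_repeats(365))   # close to 1/2
```

With $k = 365$ the number of repeats $k \ln 2$ is almost exactly 253, the number of pairs among 23 people.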

Probability table

Number of hashed elements such that the probability of at least one hash collision is ≥ $p$ :
length of hex string	no. of bits ( $b$ )	hash space size ( $2^{b}$ )	$p$ = 10⁻¹⁸	$p$ = 10⁻¹⁵	$p$ = 10⁻¹²	$p$ = 10⁻⁹	$p$ = 10⁻⁶	$p$ = 0.001	$p$ = 0.01	$p$ = 0.25	$p$ = 0.50	$p$ = 0.75
8	32	4.3×10⁹	2	2	2	2.9	93	2.9×10³	9.3×10³	5.0×10⁴	7.7×10⁴	1.1×10⁵
(10)	(40)	(1.1×10¹²)	2	2	2	47	1.5×10³	4.7×10⁴	1.5×10⁵	8.0×10⁵	1.2×10⁶	1.7×10⁶
(12)	(48)	(2.8×10¹⁴)	2	2	24	7.5×10²	2.4×10⁴	7.5×10⁵	2.4×10⁶	1.3×10⁷	2.0×10⁷	2.8×10⁷
16	64	1.8×10¹⁹	6.1	1.9×10²	6.1×10³	1.9×10⁵	6.1×10⁶	1.9×10⁸	6.1×10⁸	3.3×10⁹	5.1×10⁹	7.2×10⁹
(24)	(96)	(7.9×10²⁸)	4.0×10⁵	1.3×10⁷	4.0×10⁸	1.3×10¹⁰	4.0×10¹¹	1.3×10¹³	4.0×10¹³	2.1×10¹⁴	3.3×10¹⁴	4.7×10¹⁴
32	128	3.4×10³⁸	2.6×10¹⁰	8.2×10¹¹	2.6×10¹³	8.2×10¹⁴	2.6×10¹⁶	8.3×10¹⁷	2.6×10¹⁸	1.4×10¹⁹	2.2×10¹⁹	3.1×10¹⁹
(48)	(192)	(6.3×10⁵⁷)	1.1×10²⁰	3.5×10²¹	1.1×10²³	3.5×10²⁴	1.1×10²⁶	3.5×10²⁷	1.1×10²⁸	6.0×10²⁸	9.3×10²⁸	1.3×10²⁹
64	256	1.2×10⁷⁷	4.8×10²⁹	1.5×10³¹	4.8×10³²	1.5×10³⁴	4.8×10³⁵	1.5×10³⁷	4.8×10³⁷	2.6×10³⁸	4.0×10³⁸	5.7×10³⁸
(96)	(384)	(3.9×10¹¹⁵)	8.9×10⁴⁸	2.8×10⁵⁰	8.9×10⁵¹	2.8×10⁵³	8.9×10⁵⁴	2.8×10⁵⁶	8.9×10⁵⁶	4.8×10⁵⁷	7.4×10⁵⁷	1.0×10⁵⁸
128	512	1.3×10¹⁵⁴	1.6×10⁶⁸	5.2×10⁶⁹	1.6×10⁷¹	5.2×10⁷²	1.6×10⁷⁴	5.2×10⁷⁵	1.6×10⁷⁶	8.8×10⁷⁶	1.4×10⁷⁷	1.9×10⁷⁷

The lighter fields in this table show the number of hashes needed to achieve the given probability of collision (column) given a hash space of a certain size in bits (row). Using the birthday analogy: the "hash space size" resembles the "available days", the "probability of collision" resembles the "probability of shared birthday", and the "required number of hashed elements" resembles the "required number of people in a group". One could also use this chart to determine the minimum hash size required (given upper bounds on the hashes and probability of error), or the probability of collision (for fixed number of hashes and probability of error).

For comparison, 10⁻¹⁸ to 10⁻¹⁵ is the uncorrectable bit error rate of a typical hard disk.^[7] In theory, 128-bit hash functions, such as MD5, should stay within that range until about 8.2×10¹¹ documents, even though their possible outputs are far more numerous.
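Entries of this kind can be estimated from the approximation $p \approx 1 - e^{-n^{2}/(2d)}$ with $d = 2^{b}$, solved for $n$. The sketch below is one way to do that and is not necessarily how the table was produced; `hashes_needed` is a hypothetical name, and the small entries that the table shows as 2 are not reproduced by this formula.

```python
import math


def hashes_needed(bits: int, p: float) -> float:
    """Approximate number of hashed elements giving collision probability p in a 2**bits space."""
    space = 2.0 ** bits
    # -log1p(-p) equals ln(1 / (1 - p)) and stays accurate for very small p.
    return math.sqrt(2 * space * -math.log1p(-p))


if __name__ == "__main__":
    print(f"{hashes_needed(128, 1e-15):.2e}")
    print(f"{hashes_needed(64, 0.5):.2e}")
```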

An upper bound on the probability and a lower bound on the number of people

The argument below is adapted from an argument of Paul Halmos.^{[nb 2]}

As stated above, the probability that no two birthdays coincide is

1-p(n)={\bar {p}}(n)=\prod _{k=1}^{n-1}\left(1-{\frac {k}{365}}\right).

As in earlier paragraphs, interest lies in the smallest $n$ such that $p (n) > 1 / 2$ ; or equivalently, the smallest $n$ such that ${\bar {p}}(n) < 1 / 2$ .

Using the inequality $1 - x < e^{-x}$ in the above expression, we replace $1 - k / 365$ with $e^{-k/365}$ . This yields

{\bar {p}}(n)=\prod _{k=1}^{n-1}\left(1-{\frac {k}{365}}\right)<\prod _{k=1}^{n-1}\left(e^{-k/365}\right)=e^{-n(n-1)/730}.

Therefore, the expression above is not only an approximation, but also an upper bound of ${\bar {p}}(n)$ . The inequality

e^{-n(n-1)/730}<{\frac {1}{2}}

implies ${\bar {p}}(n) < 1 / 2$ . Solving for $n$ gives

n^{2}-n>730\ln 2.

Now, $730 \ln 2$ is approximately 505.997, which is barely below 506, the value of $n^{2} - n$ attained when $n = 23$ . Therefore, 23 people suffice. Incidentally, solving $n^{2} - n = 730 \ln 2$ for $n$ gives the approximate formula of Frank H. Mathis cited above.

This derivation only shows that at most 23 people are needed to ensure a birthday match with even chance; it leaves open the possibility that $n = 22$ or fewer could also work.
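The bound and the resulting threshold can be checked numerically. This is a small sketch with illustrative function names: it compares the exact probability of all-distinct birthdays with the exponential bound, then finds the smallest $n$ for which the bound drops below one half.

```python
import math


def exact_distinct(n: int, d: int = 365) -> float:
    """Exact probability that n birthdays are all different."""
    prob = 1.0
    for k in range(1, n):
        prob *= 1 - k / d
    return prob


def exponential_bound(n: int, d: int = 365) -> float:
    """Upper bound exp(-n(n-1)/(2d)) on the probability of all-distinct birthdays."""
    return math.exp(-n * (n - 1) / (2 * d))


def smallest_n_by_bound(d: int = 365) -> int:
    """Smallest n with n^2 - n > 2 d ln 2, i.e. where the bound falls below 1/2."""
    n = 1
    while n * (n - 1) <= 2 * d * math.log(2):
        n += 1
    return n


if __name__ == "__main__":
    print(smallest_n_by_bound())
    print(exact_distinct(22), exponential_bound(22))
```

Showing that 22 people are not enough requires the exact value for $n = 22$, which the bound alone does not supply.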

Generalizations

The generalized birthday problem

Given a year with $d$ days, the generalized birthday problem asks for the minimal number $n (d)$ such that, in a set of $n$ randomly chosen people, the probability of a birthday coincidence is at least 50%. In other words, $n (d)$ is the minimal integer $n$ such that

1-\left(1-{\frac {1}{d}}\right)\left(1-{\frac {2}{d}}\right)\cdots \left(1-{\frac {n-1}{d}}\right)\geq {\frac {1}{2}}.

The classical birthday problem thus corresponds to determining $n (365)$ . The first 99 values of $n (d)$ are given here (sequence A033810 in the OEIS):

$d$	1–2	3–5	6–9	10–16	17–23	24–32	33–42	43–54	55–68	69–82	83–99
$n (d)$	2	3	4	5	6	7	8	9	10	11	12

A similar calculation shows that $n (d)$ = 23 when $d$ is in the range 341–372.
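The values of $n (d)$ can be computed directly from the defining inequality. The following sketch uses exact fractions so that borderline cases such as $d = 16$ are decided without rounding error; `minimal_group_size` is an illustrative name.

```python
from fractions import Fraction


def minimal_group_size(d: int) -> int:
    """Smallest n such that n people, with d equally likely birthdays, match with probability >= 1/2."""
    distinct = Fraction(1)  # probability that the first n birthdays are all different
    n = 1
    while 1 - distinct < Fraction(1, 2):
        distinct *= Fraction(d - n, d)  # person n+1 must avoid the n birthdays already taken
        n += 1
    return n


if __name__ == "__main__":
    print([minimal_group_size(d) for d in range(1, 21)])
    print(minimal_group_size(365))
```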

A number of bounds and formulas for $n (d)$ have been published.^[8] For any $d \geq 1$ , the number $n (d)$ satisfies^[9]

{\frac {3-2\ln 2}{6}}<n(d)-{\sqrt {2d\ln 2}}\leq 9-{\sqrt {86\ln 2}}.

These bounds are optimal in the sense that the sequence $n (d) - {\sqrt {2d\ln 2}}$ gets arbitrarily close to

{\frac {3-2\ln 2}{6}}\approx 0.27,

while it has

9-{\sqrt {86\ln 2}}\approx 1.28

as its maximum, taken for $d = 43$ .

The bounds are sufficiently tight to give the exact value of $n (d)$ in 99% of all cases, for example $n (365) = 23$ . In general, it follows from these bounds that $n (d)$ always equals either

\left\lceil {\sqrt {2d\ln 2}}\,\right\rceil \quad {\text{or}}\quad \left\lceil {\sqrt {2d\ln 2}}\,\right\rceil +1

where $\lceil \cdot \rceil$ denotes the ceiling function. The formula

n(d)=\left\lceil {\sqrt {2d\ln 2}}\,\right\rceil

holds for 73% of all integers $d$ .^[10] The formula

n(d)=\left\lceil {\sqrt {2d\ln 2}}+{\frac {3-2\ln 2}{6}}\right\rceil

holds for almost all $d$ , i.e., for a set of integers $d$ with asymptotic density 1.^[10]

The formula

n(d)=\left\lceil {\sqrt {2d\ln 2}}+{\frac {3-2\ln 2}{6}}+{\frac {9-4(\ln 2)^{2}}{72{\sqrt {2d\ln 2}}}}\right\rceil

holds for all $d \leq 10^{18}$ , but it is conjectured that there are infinitely many counterexamples to this formula.^[11]

The formula

n(d)=\left\lceil {\sqrt {2d\ln 2}}+{\frac {3-2\ln 2}{6}}+{\frac {9-4(\ln 2)^{2}}{72{\sqrt {2d\ln 2}}}}-{\frac {2(\ln 2)^{2}}{135d}}\right\rceil

holds for all $d \leq 10^{18}$ , and it is conjectured that this formula holds for all $d$ .^[11]
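The ceiling formulas can be compared against the exact value for small $d$. The sketch below is illustrative only: the names `exact_n` and `estimate_n` and the `terms` parameter are assumptions of this example, and the formulas are evaluated in floating point, which is adequate for the small range checked here but would not be for very large $d$.

```python
import math
from fractions import Fraction

LN2 = math.log(2)


def exact_n(d: int) -> int:
    """Smallest n with probability of a shared birthday >= 1/2, computed exactly."""
    distinct = Fraction(1)
    n = 1
    while 1 - distinct < Fraction(1, 2):
        distinct *= Fraction(d - n, d)
        n += 1
    return n


def estimate_n(d: int, terms: int = 3) -> int:
    """Ceiling formula for n(d) using the first `terms` terms (1 to 4) given in the text."""
    root = math.sqrt(2 * d * LN2)
    value = root
    if terms >= 2:
        value += (3 - 2 * LN2) / 6
    if terms >= 3:
        value += (9 - 4 * LN2 ** 2) / (72 * root)
    if terms >= 4:
        value -= 2 * LN2 ** 2 / (135 * d)
    return math.ceil(value)


if __name__ == "__main__":
    limit = 1000
    for terms in (1, 2, 3, 4):
        hits = sum(exact_n(d) == estimate_n(d, terms) for d in range(1, limit + 1))
        print(terms, hits / limit)
```

The printed fractions show how often each formula agrees with the exact value for $d$ up to the chosen limit.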

More than 2 people

It is possible to extend the problem to ask how many people in a group are necessary for there to be a greater than 50% probability that at least 3, 4, 5, etc. members of the group share the same birthday.

The first few values are as follows: a greater than 50% probability of 3 people sharing a birthday requires 88 people, and a greater than 50% probability of 4 people sharing a birthday requires 187 people. The full list can be found as sequence A014088 of the On-Line Encyclopedia of Integer Sequences.^[12]
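The three-person case can be checked by exact counting. The sketch below is one possible approach, not taken from the cited sources: it counts the assignments of $n$ people to $d$ days in which every day holds at most two birthdays, by summing over the number $k$ of days that hold exactly two. The name `prob_no_triple` is hypothetical.

```python
from fractions import Fraction
from math import factorial


def prob_no_triple(n: int, d: int = 365) -> Fraction:
    """Exact probability that no day is the birthday of three or more of n people."""
    total = 0
    for k in range(n // 2 + 1):      # k = number of days shared by exactly two people
        singles = n - 2 * k          # days that hold exactly one birthday
        used = k + singles           # distinct days in use
        if used > d:
            continue
        # Ways to choose which days are doubles, singles and unused.
        day_choices = factorial(d) // (factorial(k) * factorial(singles) * factorial(d - used))
        # Ways to place the n people on those days (the order within a pair is irrelevant).
        people_choices = factorial(n) // (2 ** k)
        total += day_choices * people_choices
    return Fraction(total, d ** n)


if __name__ == "__main__":
    for n in (87, 88):
        print(n, float(1 - prob_no_triple(n)))
```

The probability of a triple match is one minus this value; it crosses one half between 87 and 88 people.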

Cast as a collision problem

The birthday problem can be generalized as follows:

Given $n$ random integers drawn from a discrete uniform distribution with range $[1, d]$ , what is the probability $p (n; d)$ that at least two numbers are the same? ( $d = 365$ gives the usual birthday problem.)^[13]

The generic results can be derived using the same arguments given above.

{\begin{aligned}p(n;d)&={\begin{cases}1-\displaystyle \prod _{k=1}^{n-1}\left(1-{\frac {k}{d}}\right)&n\leq d\\1&n>d\end{cases}}\\[8px]&\approx 1-e^{-{\frac {n(n-1)}{2d}}}\\&\approx 1-\left({\frac {d-1}{d}}\right)^{\frac {n(n-1)}{2}}\end{aligned}}

Conversely, if $n (p; d)$ denotes the number of random integers drawn from $[1, d]$ to obtain a probability $p$ that at least two numbers are the same, then

n(p;d)\approx {\sqrt {2d\cdot \ln \left({\frac {1}{1-p}}\right)}}.
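The inverse formula is easy to evaluate for any target probability. A minimal sketch, with `draws_for_probability` as an illustrative name:

```python
import math


def draws_for_probability(p: float, d: int = 365) -> float:
    """Approximate number of uniform draws from d values needed for collision probability p."""
    return math.sqrt(2 * d * math.log(1 / (1 - p)))


if __name__ == "__main__":
    for p in (0.01, 0.5, 0.99):
        print(p, round(draws_for_probability(p), 5))
```

The result is generally not an integer; rounding it down or up gives the neighbouring group sizes, whose exact probabilities bracket the target $p$ in most cases.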

The birthday problem in this more generic sense applies to hash functions: the expected number of $N$ -bit hashes that can be generated before getting a collision is not $2^{N}$ , but rather only $2^{N/2}$ . This is exploited by birthday attacks on cryptographic hash functions and is the reason why a small number of collisions in a hash table are, for all practical purposes, inevitable.

The theory behind the birthday problem was used by Zoe Schnabel^[14] under the name of capture-recapture statistics to estimate the size of fish population in lakes.

Generalization to multiple types

The basic problem considers all trials to be of one "type". The birthday problem has been generalized to consider an arbitrary number of types.^[15] In the simplest extension there are two types of people, say $m$ men and $n$ women, and the problem becomes characterizing the probability of a shared birthday between at least one man and one woman. (Shared birthdays between two men or two women do not count.) The probability of no shared birthdays here is

p_{0}={\frac {1}{d^{m+n}}}\sum _{i=1}^{m}\sum _{j=1}^{n}S_{2}(m,i)S_{2}(n,j)\prod _{k=0}^{i+j-1}(d-k)

where $d = 365$ and $S_{2}$ are Stirling numbers of the second kind. Consequently, the desired probability is $1 - p_{0}$ .

This variation of the birthday problem is interesting because there is not a unique solution for the total number of people $m + n$ . For example, the usual 50% probability value is realized for both a 32-member group of 16 men and 16 women and a 49-member group of 43 women and 6 men.
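The double sum can be evaluated exactly with a small table of Stirling numbers. The sketch below follows the formula as written above, building the Stirling numbers from the standard recurrence $S_{2}(n,k) = k\,S_{2}(n-1,k) + S_{2}(n-1,k-1)$; the function names are hypothetical and chosen for this example.

```python
from fractions import Fraction


def stirling2_table(size: int) -> list[list[int]]:
    """Table S[n][k] of Stirling numbers of the second kind for 0 <= k <= n <= size."""
    table = [[0] * (size + 1) for _ in range(size + 1)]
    table[0][0] = 1
    for n in range(1, size + 1):
        for k in range(1, n + 1):
            table[n][k] = k * table[n - 1][k] + table[n - 1][k - 1]
    return table


def prob_no_cross_match(m: int, n: int, d: int = 365) -> Fraction:
    """Exact probability that no man shares a birthday with any woman (m men, n women)."""
    s2 = stirling2_table(max(m, n))
    # falling[t] = d (d - 1) ... (d - t + 1), the product over k = 0 .. t - 1 of (d - k)
    falling = [1]
    for k in range(m + n):
        falling.append(falling[-1] * (d - k))
    total = 0
    for i in range(1, m + 1):        # i distinct birthdays among the men
        for j in range(1, n + 1):    # j distinct birthdays among the women
            total += s2[m][i] * s2[n][j] * falling[i + j]
    return Fraction(total, d ** (m + n))


if __name__ == "__main__":
    print(float(1 - prob_no_cross_match(16, 16)))
    print(float(1 - prob_no_cross_match(6, 43)))
```

Both example groups from the text give a probability close to one half.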

Partition problem

A related problem is the partition problem, a variant of the knapsack problem from operations research. Some weights are put on a balance scale; each weight is an integer number of grams randomly chosen between one gram and one million grams (one tonne). The question is whether one can usually (that is, with probability close to 1) transfer the weights between the left and right arms to balance the scale. (In case the sum of all the weights is an odd number of grams, a discrepancy of one gram is allowed.) If there are only two or three weights, the answer is very clearly no; although there are some combinations which work, the majority of randomly selected combinations of three weights do not. If there are very many weights, the answer is clearly yes. The question is, how many are just sufficient? That is, what is the number of weights such that it is equally likely for it to be possible to balance them as it is to be impossible?

Some people's intuition is that the answer is above 100000. Most people's intuition is that it is in the thousands or tens of thousands, while others feel it should at least be in the hundreds. The correct answer is 23.^{[citation needed]}

The reason is that the correct comparison is to the number of partitions of the weights into left and right. There are $2^{N-1}$ different partitions for $N$ weights, and the left sum minus the right sum can be thought of as a new random quantity for each partition. The distribution of the sum of weights is approximately Gaussian, with a peak at $500000 N$ and width $1000000 {\sqrt {N}}$ , so that when $2^{N-1}$ is approximately equal to $1000000 {\sqrt {N}}$ the transition occurs. $2^{23-1}$ is about 4 million, while the width of the distribution is only 5 million.^[23]

In fiction

Arthur C. Clarke's novel A Fall of Moondust, published in 1961, contains a section where the main characters, trapped underground for an indefinite amount of time, are celebrating a birthday and find themselves discussing the validity of the birthday problem. As stated by a physicist passenger: "If you have a group of more than twenty-four people, the odds are better than even that two of them have the same birthday." Eventually, out of 22 present, it is revealed that two characters share the same birthday, May 23.

Notes

^ In reality, birthdays are not evenly distributed throughout the year; there are more births per day in some seasons than in others, but for the purposes of this problem the distribution is treated as uniform. In particular, many children are born in the summer, especially the months of August and September (for the northern hemisphere),^[1] and in the U.S. it has been noted that many children are conceived around the holidays of Christmas and New Year's Day.^[1] Also, because hospitals rarely schedule caesarian sections and induced labor on the weekend, more people are born between Tuesday and Friday than on weekends;^[1] where many of the people share a birth year (e.g. a class in a school), this creates a tendency toward particular dates. In Sweden 9.3% of the population is born in March and 7.3% in November, when a uniform distribution would give 8.3% (Swedish statistics board). See also:
- Murphy, Ron. "An Analysis of the Distribution of Birthdays in a Calendar Year". Retrieved 2011-12-27.
- Mathers, C D; R S Harris (1983). "Seasonal Distribution of Births in Australia". International Journal of Epidemiology. 12 (3): 326–331. doi:10.1093/ije/12.3.326. PMID 6629621. Retrieved 2011-12-27.
These factors tend to increase the chance of identical birth dates, since a denser subset has more possible pairs (in the extreme case when everyone was born on three days, there would obviously be many identical birthdays). The problem of a non-uniform number of births occurring during each day of the year was first understood by Murray Klamkin in 1967.^[4] A formal proof that the probability of two matching birthdays is least for a uniform distribution of birthdays was given by Bloom (Bloom 1973).
^ In his autobiography, Halmos criticized the form in which the birthday paradox is often presented, in terms of numerical computation. He believed that it should be used as an example in the use of more abstract mathematical concepts. He wrote:
The reasoning is based on important tools that all students of mathematics should have ready access to. The birthday problem used to be a splendid illustration of the advantages of pure thought over mechanical manipulation; the inequalities can be obtained in a minute or two, whereas the multiplications would take much longer, and be much more subject to error, whether the instrument is a pencil or an old-fashioned desk computer. What calculators do not yield is understanding, or mathematical facility, or a solid basis for more advanced, generalized theories.

References

^ ^a ^b ^c Mario Cortina Borja; John Haigh (September 2007). "The Birthday Problem". Significance. 4 (3). Royal Statistical Society: 124–127. doi:10.1111/j.1740-9713.2007.00246.x.
^ W. W. Rouse Ball and H.S.M. Coxeter, "Mathematical Recreations and Essays, 13th edition", Dover Publications, New York, 1987, p 45.
^ Frank, P.; Goldstein, S.; Kac, M.; Prager, W.; Szegö, G.; Birkhoff, G., eds. (1964). Selected Papers of Richard von Mises. Vol. 2. Providence, Rhode Island: Amer. Math. Soc. pp. 313–334.
^ Klamkin & Newman 1967.
^ Steele, J. Michael (2004). The Cauchy‑Schwarz Master Class. Cambridge: Cambridge University Press. pp. 206, 277. ISBN 9780521546775.
^ Mathis, Frank H. (June 1991). "A Generalized Birthday Problem". SIAM Review. 33 (2): 265–270. doi:10.1137/1033051. ISSN 0036-1445. JSTOR 2031144. OCLC 37699182.
^ Jim Gray, Catharine van Ingen. Empirical Measurements of Disk Failure Rates and Error Rates
^ D. Brink, A (probably) exact solution to the Birthday Problem, Ramanujan Journal, 2012.
^ Brink 2012, Theorem 2
^ ^a ^b Brink 2012, Theorem 3
^ ^a ^b Brink 2012, Table 3, Conjecture 1
^ "Minimal number of people to give a 50% probability of having at least n coincident birthdays in one year". The On-line Encyclopedia of Integer Sequences. OEIS. Retrieved 17 February 2020.
^ Suzuki, K.; Tonien, D.; et al. (2006). "Birthday Paradox for Multi-collisions". In Rhee M.S., Lee B. (ed.). Lecture Notes in Computer Science, vol 4296. Berlin: Springer. doi:10.1007/11927587_5. Information Security and Cryptology – ICISC 2006.
^ Z. E. Schnabel (1938) The Estimation of the Total Fish Population of a Lake, American Mathematical Monthly 45, 348–352.
^ M. C. Wendl (2003) Collision Probability Between Sets of Random Variables, Statistics and Probability Letters 64(3), 249–254.
^ ^a ^b M. Abramson and W. O. J. Moser (1970) More Birthday Surprises, American Mathematical Monthly 77, 856–858
^ Might, Matt. "Collision hash collisions with the birthday paradox". Matt Might's blog. Retrieved 17 July 2015.
^ Knuth, D. E. (1973). The Art of Computer Programming. Vol. 3, Sorting and Searching. Reading, Massachusetts: Addison-Wesley. ISBN 978-0-201-03803-3.
^ Flajolet, P.; Grabner, P. J.; Kirschenhofer, P.; Prodinger, H. (1995). "On Ramanujan's Q-Function". Journal of Computational and Applied Mathematics. 58: 103–116. doi:10.1016/0377-0427(93)E0258-N.
^ Cormen; et al. Introduction to Algorithms.
^ Fletcher, James (16 June 2014). "The birthday paradox at the World Cup". bbc.com. BBC. Retrieved 27 August 2015.
^ Voracek, M.; Tran, U. S.; Formann, A. K. (2008). "Birthday and birthmate problems: Misconceptions of probability among psychology undergraduates and casino visitors and personnel". Perceptual and Motor Skills. 106 (1): 91–103. doi:10.2466/pms.106.1.91-103. PMID 18459359. S2CID 22046399.
^ Borgs, C.; Chayes, J.; Pittel, B. (2001). "Phase Transition and Finite Size Scaling in the Integer Partition Problem". Random Structures and Algorithms. 19 (3–4): 247–288. doi:10.1002/rsa.10004. S2CID 6819493.

Bibliography

Abramson, M.; Moser, W. O. J. (1970). "More Birthday Surprises". American Mathematical Monthly. 77 (8): 856–858. doi:10.2307/2317022. JSTOR 2317022.
Bloom, D. (1973). "A Birthday Problem". American Mathematical Monthly. 80 (10): 1141–1142. doi:10.2307/2318556. JSTOR 2318556.
Kemeny, John G.; Snell, J. Laurie; Thompson, Gerald (1957). Introduction to Finite Mathematics (First ed.).
Klamkin, M.; Newman, D. (1967). "Extensions of the Birthday Surprise". Journal of Combinatorial Theory. 3 (3): 279–282. doi:10.1016/s0021-9800(67)80075-9.
McKinney, E. H. (1966). "Generalized Birthday Problem". American Mathematical Monthly. 73 (4): 385–387. doi:10.2307/2315408. JSTOR 2315408.
Schneps, Leila; Colmez, Coralie (2013). "Math error number 5. The case of Diana Sylvester: cold hit analysis". Math on Trial. How Numbers Get Used and Abused in the Courtroom. Basic Books. ISBN 978-0-465-03292-1.
Sy M. Blinder (2013). Guide to Essential Math: A Review for Physics, Chemistry and Engineering Students. Elsevier. pp. 5–6. ISBN 978-0-12-407163-6.

External links

The Birthday Paradox accounting for leap year birthdays
Weisstein, Eric W. "Birthday Problem". MathWorld.
A humorous article explaining the paradox
SOCR EduMaterials activities birthday experiment
Understanding the Birthday Problem (Better Explained)
Eurobirthdays 2012. A birthday problem. A practical football example of the birthday paradox.
Grime, James. "23: Birthday Probability". Numberphile. Brady Haran. Archived from the original on 2017-02-25. Retrieved 2013-04-02.
Computing the probabilities of the Birthday Problem at WolframAlpha


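Sample values of $n (p; 365) \approx {\sqrt {2\cdot 365\ln(1/(1-p))}}$ for selected $p$ , where $n\downarrow$ and $n\uparrow$ are the estimate rounded down and up, and $p (n\downarrow)$ and $p (n\uparrow)$ are the exact probabilities for those group sizes:
$p$	$n$	$n\downarrow$	$p (n\downarrow)$	$n\uparrow$	$p (n\uparrow)$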
0.01	0.14178√365 = 2.70864	2	0.00274	3	0.00820
0.05	0.32029√365 = 6.11916	6	0.04046	7	0.05624
0.1	0.45904√365 = 8.77002	8	0.07434	9	0.09462
0.2	0.66805√365 = 12.76302	12	0.16702	13	0.19441
0.3	0.84460√365 = 16.13607	16	0.28360	17	0.31501
0.5	1.17741√365 = 22.49439	22	0.47570	23	0.50730
0.7	1.55176√365 = 29.64625	29	0.68097	30	0.70632
0.8	1.79412√365 = 34.27666	34	0.79532	35	0.81438
0.9	2.14597√365 = 40.99862	40	0.89123	41	0.90315
0.95	2.44775√365 = 46.76414	46	0.94825	47	0.95477
0.99	3.03485√365 = 57.98081	57	0.99012	58	0.99166