# Bernoulli distribution

Probability mass function: three examples of the Bernoulli distribution, with ${\displaystyle P(x=0)=0{.}2}$ and ${\displaystyle P(x=1)=0{.}8}$; ${\displaystyle P(x=0)=0{.}8}$ and ${\displaystyle P(x=1)=0{.}2}$; ${\displaystyle P(x=0)=0{.}5}$ and ${\displaystyle P(x=1)=0{.}5}$.

| Property | Value |
| --- | --- |
| Parameters | ${\displaystyle 0\leq p\leq 1}$, ${\displaystyle q=1-p}$ |
| Support | ${\displaystyle k\in \{0,1\}}$ |
| Probability mass function | ${\displaystyle {\begin{cases}q=1-p&{\text{if }}k=0\\p&{\text{if }}k=1\end{cases}}}$ |
| Cumulative distribution function | ${\displaystyle {\begin{cases}0&{\text{if }}k<0\\1-p&{\text{if }}0\leq k<1\\1&{\text{if }}k\geq 1\end{cases}}}$ |
| Mean | ${\displaystyle p}$ |
| Median | ${\displaystyle {\begin{cases}0&{\text{if }}p<1/2\\\left[0,1\right]&{\text{if }}p=1/2\\1&{\text{if }}p>1/2\end{cases}}}$ |
| Mode | ${\displaystyle {\begin{cases}0&{\text{if }}p<1/2\\0,1&{\text{if }}p=1/2\\1&{\text{if }}p>1/2\end{cases}}}$ |
| Variance | ${\displaystyle p(1-p)=pq}$ |
| Mean absolute deviation | ${\displaystyle 2p(1-p)=2pq}$ |
| Skewness | ${\displaystyle {\frac {q-p}{\sqrt {pq}}}}$ |
| Excess kurtosis | ${\displaystyle {\frac {1-6pq}{pq}}}$ |
| Entropy | ${\displaystyle -q\ln q-p\ln p}$ |
| Moment-generating function | ${\displaystyle q+pe^{t}}$ |
| Characteristic function | ${\displaystyle q+pe^{it}}$ |
| Probability-generating function | ${\displaystyle q+pz}$ |
| Fisher information | ${\displaystyle {\frac {1}{pq}}}$ |

In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,[1] is the discrete probability distribution of a random variable which takes the value 1 with probability ${\displaystyle p}$ and the value 0 with probability ${\displaystyle q=1-p}$. Less formally, it can be thought of as a model for the set of possible outcomes of any single experiment that asks a yes–no question. Such questions lead to outcomes that are Boolean-valued: a single bit whose value is success/yes/true/one with probability p and failure/no/false/zero with probability q. It can be used to represent a (possibly biased) coin toss where 1 and 0 would represent "heads" and "tails", respectively, and p would be the probability of the coin landing on heads (or vice versa where 1 would represent tails and p would be the probability of tails). In particular, unfair coins would have ${\displaystyle p\neq 1/2.}$

The Bernoulli distribution is a special case of the binomial distribution where a single trial is conducted (so n would be 1 for such a binomial distribution). It is also a special case of the two-point distribution, for which the possible outcomes need not be 0 and 1.[2]
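The coin-toss model described above can be simulated in a few lines. The following is a minimal sketch, assuming a uniform pseudo-random generator from Python's standard library; the function name `sample_bernoulli`, the seed, and the choice of p = 0.8 are illustrative and not part of the definition.

```python
import random


def sample_bernoulli(p, rng):
    """Draw one value: 1 with probability p, 0 with probability 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # rng.random() is uniform on [0, 1), so this comparison is true
    # with probability p.
    return 1 if rng.random() < p else 0


rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatability
tosses = [sample_bernoulli(0.8, rng) for _ in range(100_000)]

# Relative frequency of ones; it should be close to p = 0.8.
frequency_of_ones = sum(tosses) / len(tosses)
```

With many draws the relative frequency of ones settles near p, which is the informal reading of "takes the value 1 with probability p".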

## Properties

If ${\displaystyle X}$ is a random variable with a Bernoulli distribution, then:

${\displaystyle \Pr(X=1)=p=1-\Pr(X=0)=1-q.}$

The probability mass function ${\displaystyle f}$ of this distribution, over possible outcomes k, is

${\displaystyle f(k;p)={\begin{cases}p&{\text{if }}k=1,\\q=1-p&{\text{if }}k=0.\end{cases}}}$[3]

This can also be expressed as

${\displaystyle f(k;p)=p^{k}(1-p)^{1-k}\quad {\text{for }}k\in \{0,1\}}$

or as

${\displaystyle f(k;p)=pk+(1-p)(1-k)\quad {\text{for }}k\in \{0,1\}.}$
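The three expressions for the probability mass function agree on the support {0, 1}. The sketch below checks this with exact rational arithmetic; the function names are hypothetical labels chosen for this illustration.

```python
from fractions import Fraction


def pmf_cases(k, p):
    """Piecewise definition of f(k; p)."""
    if k == 1:
        return p
    if k == 0:
        return 1 - p
    raise ValueError("k must be 0 or 1")


def pmf_power(k, p):
    """Product form p^k (1 - p)^(1 - k)."""
    return p**k * (1 - p) ** (1 - k)


def pmf_linear(k, p):
    """Linear form p k + (1 - p)(1 - k)."""
    return p * k + (1 - p) * (1 - k)


# Compare the three forms on a grid of exact probabilities, endpoints included.
probabilities = [Fraction(i, 10) for i in range(11)]
forms_agree = all(
    pmf_cases(k, p) == pmf_power(k, p) == pmf_linear(k, p)
    for p in probabilities
    for k in (0, 1)
)
```

The product form is the one usually used when writing likelihoods, since its logarithm is linear in k.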

The Bernoulli distribution is a special case of the binomial distribution with ${\displaystyle n=1.}$[4]

The kurtosis goes to infinity as ${\displaystyle p}$ approaches 0 or 1, but for ${\displaystyle p=1/2}$ the two-point distributions, including the Bernoulli distribution, have a lower excess kurtosis, namely −2, than any other probability distribution.
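The behaviour of the kurtosis can be checked numerically from the closed form ${\displaystyle (1-6pq)/(pq)}$ for the excess kurtosis. This is a small sketch with exact fractions; `excess_kurtosis` is a name introduced here for illustration.

```python
from fractions import Fraction


def excess_kurtosis(p):
    """Excess kurtosis (1 - 6pq) / (pq) of a Bernoulli(p) variable."""
    q = 1 - p
    if p * q == 0:
        raise ValueError("undefined for p = 0 or p = 1 (zero variance)")
    return (1 - 6 * p * q) / (p * q)


at_half = excess_kurtosis(Fraction(1, 2))        # the minimum, attained at p = 1/2
near_zero = excess_kurtosis(Fraction(1, 100))    # large for p close to 0
near_one = excess_kurtosis(Fraction(99, 100))    # same value by symmetry in p and q
```

Because the formula depends on p only through the product pq, it is symmetric about p = 1/2.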

The Bernoulli distributions for ${\displaystyle 0\leq p\leq 1}$ form an exponential family.

The maximum likelihood estimator of ${\displaystyle p}$ based on a random sample is the sample mean.
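As an illustration of the last statement, the sketch below maximizes the Bernoulli log-likelihood over a coarse grid and compares the maximizer with the sample mean. The data set and the grid resolution are made up for this example, and the grid search is only a stand-in for the analytic maximization.

```python
import math


def log_likelihood(p, data):
    """Log-likelihood of i.i.d. Bernoulli(p) observations (each 0 or 1)."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in data)


data = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # hypothetical sample: 3 ones out of 10
sample_mean = sum(data) / len(data)

# Candidate values 0.01, 0.02, ..., 0.99 (endpoints excluded to avoid log(0)).
grid = [i / 100 for i in range(1, 100)]
p_hat = max(grid, key=lambda p: log_likelihood(p, data))
```

For this sample the grid maximizer coincides with the sample mean because the mean happens to lie on the grid; in general the grid only approximates it.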

## Mean

The expected value of a Bernoulli random variable ${\displaystyle X}$ is

${\displaystyle \operatorname {E} [X]=p}$

To see this, note that for a Bernoulli-distributed random variable ${\displaystyle X}$ with ${\displaystyle \Pr(X=1)=p}$ and ${\displaystyle \Pr(X=0)=q}$ we find

${\displaystyle \operatorname {E} [X]=\Pr(X=1)\cdot 1+\Pr(X=0)\cdot 0=p\cdot 1+q\cdot 0=p.}$[3]

## Variance

The variance of a Bernoulli distributed ${\displaystyle X}$ is

${\displaystyle \operatorname {Var} [X]=pq=p(1-p)}$

We first find

${\displaystyle \operatorname {E} [X^{2}]=\Pr(X=1)\cdot 1^{2}+\Pr(X=0)\cdot 0^{2}=p\cdot 1^{2}+q\cdot 0^{2}=p=\operatorname {E} [X]}$

From this follows

${\displaystyle \operatorname {Var} [X]=\operatorname {E} [X^{2}]-\operatorname {E} [X]^{2}=\operatorname {E} [X]-\operatorname {E} [X]^{2}=p-p^{2}=p(1-p)=pq}$[3]

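Since ${\displaystyle p(1-p)}$ vanishes at ${\displaystyle p=0}$ and ${\displaystyle p=1}$ and attains its maximum ${\displaystyle 1/4}$ at ${\displaystyle p=1/2}$, the variance of any Bernoulli distribution lies in ${\displaystyle [0,1/4]}$.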
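The mean and variance derivations can be reproduced directly from the probability mass function. The following sketch uses exact fractions on a grid of p values (the grid itself is an arbitrary choice) and also locates the largest variance on that grid.

```python
from fractions import Fraction


def mean_and_variance(p):
    """Compute E[X] and Var[X] = E[X^2] - E[X]^2 from the PMF of Bernoulli(p)."""
    pmf = {0: 1 - p, 1: p}
    mean = sum(k * prob for k, prob in pmf.items())
    second_moment = sum(k**2 * prob for k, prob in pmf.items())
    return mean, second_moment - mean**2


grid = [Fraction(i, 20) for i in range(21)]  # p = 0, 1/20, ..., 1
variances = {p: mean_and_variance(p)[1] for p in grid}

# The value of p on the grid with the largest variance.
p_with_largest_variance = max(variances, key=variances.get)
```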

## Skewness

The skewness is ${\displaystyle {\frac {q-p}{\sqrt {pq}}}={\frac {1-2p}{\sqrt {pq}}}}$. When we take the standardized Bernoulli distributed random variable ${\displaystyle {\frac {X-\operatorname {E} [X]}{\sqrt {\operatorname {Var} [X]}}}}$ we find that this random variable attains ${\displaystyle {\frac {q}{\sqrt {pq}}}}$ with probability ${\displaystyle p}$ and attains ${\displaystyle -{\frac {p}{\sqrt {pq}}}}$ with probability ${\displaystyle q}$. Thus we get

${\displaystyle {\begin{aligned}\gamma _{1}&=\operatorname {E} \left[\left({\frac {X-\operatorname {E} [X]}{\sqrt {\operatorname {Var} [X]}}}\right)^{3}\right]\\&=p\cdot \left({\frac {q}{\sqrt {pq}}}\right)^{3}+q\cdot \left(-{\frac {p}{\sqrt {pq}}}\right)^{3}\\&={\frac {1}{{\sqrt {pq}}^{3}}}\left(pq^{3}-qp^{3}\right)\\&={\frac {pq}{{\sqrt {pq}}^{3}}}(q-p)\\&={\frac {q-p}{\sqrt {pq}}}.\end{aligned}}}$
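The derivation above can be checked numerically by computing the third moment of the standardized variable and comparing it with the closed form. This is a floating-point sketch; the function names are illustrative.

```python
import math


def skewness_closed_form(p):
    """Closed form (q - p) / sqrt(pq)."""
    q = 1 - p
    return (q - p) / math.sqrt(p * q)


def skewness_from_definition(p):
    """E[Z^3] for the standardized variable Z = (X - p) / sqrt(pq)."""
    q = 1 - p
    sd = math.sqrt(p * q)
    z_when_one = (1 - p) / sd   # value of Z when X = 1 (probability p)
    z_when_zero = (0 - p) / sd  # value of Z when X = 0 (probability q)
    return p * z_when_one**3 + q * z_when_zero**3


test_points = [0.1, 0.2, 0.5, 0.8, 0.9]
both_agree = all(
    math.isclose(
        skewness_closed_form(p),
        skewness_from_definition(p),
        rel_tol=1e-9,
        abs_tol=1e-12,
    )
    for p in test_points
)
```

The sign of the result reflects which outcome is rarer: the skewness is positive for p < 1/2 and negative for p > 1/2.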

## Higher moments and cumulants

The raw moments of every order ${\displaystyle k\geq 1}$ are equal, because ${\displaystyle 1^{k}=1}$ and ${\displaystyle 0^{k}=0}$:

${\displaystyle \operatorname {E} [X^{k}]=\Pr(X=1)\cdot 1^{k}+\Pr(X=0)\cdot 0^{k}=p\cdot 1+q\cdot 0=p=\operatorname {E} [X].}$

The central moment of order ${\displaystyle k}$ is given by

${\displaystyle \mu _{k}=(1-p)(-p)^{k}+p(1-p)^{k}.}$

The first six central moments are

${\displaystyle {\begin{aligned}\mu _{1}&=0,\\\mu _{2}&=p(1-p),\\\mu _{3}&=p(1-p)(1-2p),\\\mu _{4}&=p(1-p)(1-3p(1-p)),\\\mu _{5}&=p(1-p)(1-2p)(1-2p(1-p)),\\\mu _{6}&=p(1-p)(1-5p(1-p)(1-p(1-p))).\end{aligned}}}$

The higher central moments can be expressed more compactly in terms of ${\displaystyle \mu _{2}}$ and ${\displaystyle \mu _{3}}$

${\displaystyle {\begin{aligned}\mu _{4}&=\mu _{2}(1-3\mu _{2}),\\\mu _{5}&=\mu _{3}(1-2\mu _{2}),\\\mu _{6}&=\mu _{2}(1-5\mu _{2}(1-\mu _{2})).\end{aligned}}}$
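The general formula for the central moment of order k can be evaluated exactly and compared with the compact expressions in terms of the second and third central moments. The sketch below does this with rational arithmetic on an arbitrary grid of p values.

```python
from fractions import Fraction


def central_moment(k, p):
    """mu_k = (1 - p)(-p)^k + p(1 - p)^k for a Bernoulli(p) variable."""
    return (1 - p) * (-p) ** k + p * (1 - p) ** k


def compact_forms(p):
    """mu_4, mu_5 and mu_6 written in terms of mu_2 and mu_3."""
    mu2 = central_moment(2, p)
    mu3 = central_moment(3, p)
    return {
        4: mu2 * (1 - 3 * mu2),
        5: mu3 * (1 - 2 * mu2),
        6: mu2 * (1 - 5 * mu2 * (1 - mu2)),
    }


grid = [Fraction(i, 12) for i in range(13)]
compact_forms_match = all(
    central_moment(k, p) == value
    for p in grid
    for k, value in compact_forms(p).items()
)
```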

The first six cumulants are

${\displaystyle {\begin{aligned}\kappa _{1}&=p,\\\kappa _{2}&=\mu _{2},\\\kappa _{3}&=\mu _{3},\\\kappa _{4}&=\mu _{2}(1-6\mu _{2}),\\\kappa _{5}&=\mu _{3}(1-12\mu _{2}),\\\kappa _{6}&=\mu _{2}(1-30\mu _{2}(1-4\mu _{2})).\end{aligned}}}$
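One way to check the cumulants listed above is the standard recursion between raw moments and cumulants, ${\displaystyle \kappa _{n}=m_{n}-\sum _{j=1}^{n-1}{\binom {n-1}{j-1}}\kappa _{j}m_{n-j}}$, together with the fact that every raw moment of order at least 1 equals p. The recursion is brought in here as an assumption of this sketch and is not taken from the text above; the function name is illustrative.

```python
from fractions import Fraction
from math import comb


def bernoulli_cumulants(p, order):
    """Return [kappa_1, ..., kappa_order] via the moment-cumulant recursion."""
    # raw[n] = E[X^n] = p for every n >= 1; index 0 is unused.
    raw = [None] + [p] * order
    kappa = [None] * (order + 1)
    for n in range(1, order + 1):
        correction = sum(
            comb(n - 1, j - 1) * kappa[j] * raw[n - j] for j in range(1, n)
        )
        kappa[n] = raw[n] - correction
    return kappa[1:]


p = Fraction(3, 10)  # arbitrary example value
k1, k2, k3, k4, k5, k6 = bernoulli_cumulants(p, 6)

# Second and third central moments, used in the closed forms.
mu2 = p * (1 - p)
mu3 = mu2 * (1 - 2 * p)
```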

## Related distributions

• The Bernoulli distribution is the binomial distribution ${\displaystyle \operatorname {B} (1,p)}$, also written as ${\textstyle \mathrm {Bernoulli} (p).}$
• The categorical distribution is the generalization of the Bernoulli distribution for variables with any constant number of discrete values.
• The Beta distribution is the conjugate prior of the Bernoulli distribution.[5]
• The geometric distribution models the number of independent and identical Bernoulli trials needed to get one success.
• If ${\textstyle Y\sim \mathrm {Bernoulli} \left({\frac {1}{2}}\right)}$, then ${\textstyle 2Y-1}$ has a Rademacher distribution.
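The link to the binomial distribution can be made concrete: the sum of n independent Bernoulli(p) variables has the binomial distribution with parameters n and p, and n = 1 recovers the Bernoulli distribution itself. The sketch below builds the distribution of the sum by repeated convolution and compares it with the binomial formula; the helper names and the value p = 3/10 are chosen for illustration.

```python
from fractions import Fraction
from math import comb


def convolve(a, b):
    """PMF of the sum of two independent variables with PMFs a and b on 0, 1, 2, ..."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def sum_of_bernoullis_pmf(n, p):
    """PMF of X_1 + ... + X_n for independent Bernoulli(p) variables."""
    pmf = [Fraction(1)]  # the empty sum equals 0 with probability 1
    for _ in range(n):
        pmf = convolve(pmf, [1 - p, p])
    return pmf


def binomial_pmf(n, p):
    """Binomial PMF: C(n, k) p^k (1 - p)^(n - k) for k = 0, ..., n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


p = Fraction(3, 10)
matches = all(sum_of_bernoullis_pmf(n, p) == binomial_pmf(n, p) for n in range(1, 7))

# The Rademacher variable 2Y - 1 takes the values -1 and +1.
rademacher_support = sorted(2 * y - 1 for y in (0, 1))
```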