= Bernoulli distribution =

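Key properties, with $q = 1 - p$:
- Excess kurtosis: $\frac{1 - 6pq}{pq}$
- Entropy: $-q\ln q - p\ln p$
- Moment-generating function: $q+pe^t$
- Characteristic function: $q+pe^{it}$
- Probability-generating function: $q+pz$
- Fisher information: $\frac{1}{pq}$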

In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli, is the discrete probability distribution of a random variable which takes the value 1 with probability $p$ and the value 0 with probability $q = 1-p$. Less formally, it can be thought of as a model for the set of possible outcomes of any single experiment that asks a yes–no question. Such questions lead to outcomes that are Boolean-valued: a single bit whose value is success/yes/true/one with probability $p$ and failure/no/false/zero with probability $q$. It can be used to represent a (possibly biased) coin toss where 1 and 0 represent "heads" and "tails", respectively, and $p$ is the probability of the coin landing on heads (or vice versa, where 1 represents tails and $p$ is the probability of tails). In particular, unfair coins have $p \neq 1/2.$
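To make the coin-toss interpretation concrete, a Bernoulli outcome can be simulated by comparing a uniform random number on $[0, 1)$ with $p$ (inverse-transform sampling). The following is a minimal sketch using Python's standard `random` module; the function name `sample_bernoulli`, the seed, and the choice $p = 0.3$ are illustrative assumptions.

```python
import random

def sample_bernoulli(p: float, rng: random.Random) -> int:
    """Draw one Bernoulli(p) outcome: 1 with probability p, otherwise 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # rng.random() is uniform on [0, 1), so P(U < p) = p.
    return 1 if rng.random() < p else 0

rng = random.Random(12345)  # fixed seed so the run is reproducible
tosses = [sample_bernoulli(0.3, rng) for _ in range(100_000)]
print(sum(tosses) / len(tosses))  # close to 0.3
```

Because the uniform draw never equals 1, the edge cases behave as expected: $p = 1$ always yields 1 and $p = 0$ always yields 0.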

The Bernoulli distribution is a special case of the binomial distribution in which a single trial is conducted (that is, $n = 1$). It is also a special case of the two-point distribution, for which the possible outcomes need not be 0 and 1.

==Properties==
If $X$ is a random variable with a Bernoulli distribution, then:

$\begin{align}
\Pr(X{=}1) &= p, \\
\Pr(X{=}0) &= q =1 - p.
\end{align}$

The probability mass function $f$ of this distribution, over possible outcomes $k$, is

$f(k;p) = \begin{cases}
   p & \text{if }k=1, \\
   q = 1-p & \text {if } k = 0.
 \end{cases}$

This can also be expressed as

$f(k;p) = p^k (1-p)^{1-k} \quad \text{for } k\in\{0,1\}$

or as

$f(k;p)=pk+(1-p)(1-k) \quad \text{for } k\in\{0,1\}.$
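The three expressions for $f(k;p)$ can be checked against each other with exact rational arithmetic. This is a minimal sketch using Python's `fractions.Fraction`; the three function names are hypothetical.

```python
from fractions import Fraction

def pmf_cases(k: int, p: Fraction) -> Fraction:
    # Piecewise form: p if k == 1, q = 1 - p if k == 0.
    if k == 1:
        return p
    if k == 0:
        return 1 - p
    raise ValueError("k must be 0 or 1")

def pmf_power(k: int, p: Fraction) -> Fraction:
    # p^k (1 - p)^(1 - k); Python evaluates 0 ** 0 as 1, as this form requires.
    return p**k * (1 - p) ** (1 - k)

def pmf_linear(k: int, p: Fraction) -> Fraction:
    # p k + (1 - p)(1 - k)
    return p * k + (1 - p) * (1 - k)

for p in (Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)):
    for k in (0, 1):
        assert pmf_cases(k, p) == pmf_power(k, p) == pmf_linear(k, p)
    # The two probabilities always sum to one.
    assert pmf_cases(0, p) + pmf_cases(1, p) == 1
print("all three forms agree")
```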

The Bernoulli distribution is a special case of the binomial distribution with $n = 1.$

The excess kurtosis, $\frac{1 - 6pq}{pq}$, tends to infinity as $p$ approaches 0 or 1. At $p = 1/2$ it equals −2, and the two-point distributions with equal probabilities, including the Bernoulli distribution with $p = 1/2$, have a lower excess kurtosis than any other probability distribution.
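The behaviour of the excess kurtosis can be reproduced from the central moments $\mu_k = (1-p)(-p)^k + p(1-p)^k$ (given under Higher moments and cumulants) as $\mu_4/\mu_2^2 - 3$. A minimal sketch with exact fractions; `excess_kurtosis` is a hypothetical helper name.

```python
from fractions import Fraction

def excess_kurtosis(p: Fraction) -> Fraction:
    """Excess kurtosis mu_4 / mu_2^2 - 3 of Bernoulli(p), from central moments."""
    if not 0 < p < 1:
        raise ValueError("kurtosis is undefined for p = 0 or p = 1")
    q = 1 - p
    # Central moments mu_k = q(-p)^k + p q^k
    mu2 = q * (-p) ** 2 + p * q**2
    mu4 = q * (-p) ** 4 + p * q**4
    return mu4 / mu2**2 - 3

print(excess_kurtosis(Fraction(1, 2)))   # -2
print(excess_kurtosis(Fraction(1, 10)))  # 46/9, and it keeps growing as p -> 0
```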

For $0 < p < 1$, the Bernoulli distributions form an exponential family.
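For $0 < p < 1$, the exponential-family structure can be made explicit by rewriting the probability mass function. The following is a short worked sketch; $\eta$, $T$ and $A$ are introduced here as the usual names for the natural parameter, sufficient statistic and log-partition function:

$\begin{align}
f(k;p) &= p^k (1-p)^{1-k} \\
&= \exp\left(k \ln p + (1-k) \ln(1-p)\right) \\
&= \exp\left(k \ln\frac{p}{1-p} + \ln(1-p)\right) \\
&= \exp\left(\eta\, T(k) - A(\eta)\right),
\end{align}$

where $\eta = \ln\frac{p}{1-p}$ (the log-odds), $T(k) = k$ and $A(\eta) = -\ln(1-p) = \ln\left(1 + e^{\eta}\right)$. The log-odds are undefined at $p = 0$ and $p = 1$, so the two degenerate cases fall outside this parametrization.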

The maximum likelihood estimator of $p$ based on a random sample is the sample mean.
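A quick numerical check of this statement: the log-likelihood of an i.i.d. sample, $\sum_i \left[x_i \ln p + (1 - x_i)\ln(1-p)\right]$, is maximized on a fine grid exactly at the sample mean. This is a sketch, not a general-purpose estimator; the data and function names are hypothetical.

```python
import math

def bernoulli_log_likelihood(p: float, data: list[int]) -> float:
    # Sum over observations of x ln p + (1 - x) ln(1 - p), for 0 < p < 1.
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in data)

def bernoulli_mle(data: list[int]) -> float:
    # The maximizer of the log-likelihood is the sample mean.
    if not data:
        raise ValueError("need at least one observation")
    return sum(data) / len(data)

data = [1, 0, 1, 1, 0, 1, 0, 1]  # hypothetical sample: 5 ones out of 8
p_hat = bernoulli_mle(data)
print(p_hat)  # 0.625

# Crude check: no point on a fine grid in (0, 1) beats the sample mean.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda p: bernoulli_log_likelihood(p, data))
print(best)  # 0.625
```

The grid search is only a sanity check; setting the score $\sum_i x_i/p - \sum_i (1-x_i)/(1-p)$ to zero gives the sample mean directly.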

==Mean==
The expected value of a Bernoulli random variable $X$ is

$\operatorname{E}[X]=p$

This is because for a Bernoulli distributed random variable $X$ with $\Pr(X{=}1) = p$ and $\Pr(X{=}0) = q$ we find

$\begin{align}
\operatorname{E}[X] &= \Pr(X{=}1) \cdot 1 + \Pr(X{=}0) \cdot 0 \\[1ex]
&= p \cdot 1 + q \cdot 0 \\[1ex]
&= p.
\end{align}$

== Variance ==
The variance of a Bernoulli distributed $X$ is

$\operatorname{Var}[X] = pq = p(1-p)$

We first find

$\begin{align}
\operatorname{E}[X^2] &= \Pr(X{=}1) \cdot 1^2 + \Pr(X{=}0) \cdot 0^2 \\
 &= p \cdot 1^2 + q\cdot 0^2 \\
 &= p = \operatorname{E}[X]
\end{align}$

From this follows

$\begin{align}
\operatorname{Var}[X] &= \operatorname{E}[X^2]-\operatorname{E}[X]^2
 = \operatorname{E}[X]-\operatorname{E}[X]^2 \\[1ex]
 &= p-p^2
 = p(1-p) = pq
\end{align}$

Since $pq = p(1-p) = \tfrac{1}{4} - \left(p - \tfrac{1}{2}\right)^2$, the variance of every Bernoulli distribution lies in $[0,1/4]$: it is $0$ for $p = 0$ or $p = 1$ and reaches its maximum $1/4$ at $p = 1/2$.
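The mean and variance derivations can be replayed by summing over the two outcomes with exact fractions, which also confirms the bound on the variance over a grid of $p$ values. A minimal sketch; `bernoulli_moments` is a hypothetical name.

```python
from fractions import Fraction

def bernoulli_moments(p: Fraction) -> tuple[Fraction, Fraction, Fraction]:
    """Return (E[X], E[X^2], Var[X]) by summing over the outcomes 0 and 1."""
    pmf = {0: 1 - p, 1: p}
    mean = sum(k * prob for k, prob in pmf.items())
    second = sum(k**2 * prob for k, prob in pmf.items())
    return mean, second, second - mean**2

for i in range(0, 101):
    p = Fraction(i, 100)
    mean, second, var = bernoulli_moments(p)
    assert mean == p and second == p   # E[X^2] = E[X] = p
    assert var == p * (1 - p)          # Var[X] = pq
    assert 0 <= var <= Fraction(1, 4)  # pq = 1/4 - (p - 1/2)^2
print(bernoulli_moments(Fraction(1, 2)))  # (Fraction(1, 2), Fraction(1, 2), Fraction(1, 4))
```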

==Skewness==
The skewness is $\frac{q-p}{\sqrt{pq}}=\frac{1-2p}{\sqrt{pq}}$. When we take the standardized Bernoulli distributed random variable $\frac{X-\operatorname{E}[X]}{\sqrt{\operatorname{Var}[X]}}$ we find that this random variable attains $\frac{q}{\sqrt{pq}}$ with probability $p$ and attains $-\frac{p}{\sqrt{pq}}$ with probability $q$. Thus we get

$\begin{align}
\gamma_1 &= \operatorname{E} \left[\left(\frac{X-\operatorname{E}[X]}{\sqrt{\operatorname{Var}[X]}}\right)^3\right] \\
&= p \cdot \left(\frac{q}{\sqrt{pq}}\right)^3 + q \cdot \left(-\frac{p}{\sqrt{pq}}\right)^3 \\
&= \frac{1}{\sqrt{pq}^3} \left(pq^3-qp^3\right) \\
&= \frac{pq}{\sqrt{pq}^3} (q^2-p^2) \\
&= \frac{(1-p)^2-p^2}{\sqrt{pq}} \\
&= \frac{1-2p}{\sqrt{pq}} = \frac{q-p}{\sqrt{pq}}.
\end{align}$
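The derivation can be checked numerically by computing $\operatorname{E}[Z^3]$ directly from the two values of the standardized variable $Z$ and comparing with the closed form. A floating-point sketch; the function names are hypothetical.

```python
import math

def skewness_direct(p: float) -> float:
    """E[Z^3] for the standardized variable Z = (X - p) / sqrt(pq), 0 < p < 1."""
    q = 1.0 - p
    s = math.sqrt(p * q)
    z1, z0 = q / s, -p / s  # Z takes z1 with probability p and z0 with probability q
    return p * z1**3 + q * z0**3

def skewness_formula(p: float) -> float:
    # (1 - 2p) / sqrt(pq)
    return (1.0 - 2.0 * p) / math.sqrt(p * (1.0 - p))

for p in (0.1, 0.25, 0.5, 0.9):
    assert math.isclose(skewness_direct(p), skewness_formula(p), abs_tol=1e-12)
print("direct and closed-form skewness agree")
```

The sign of the skewness follows $q - p$: positive for $p < 1/2$, zero at $p = 1/2$, negative for $p > 1/2$.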

==Higher moments and cumulants==
The raw moments of order $k \ge 1$ are all equal to $p$, because $1^k=1$ and $0^k=0$ for $k \ge 1$:

$\operatorname{E}[X^k] = \Pr(X{=}1) \cdot 1^k + \Pr(X{=}0) \cdot 0^k = p \cdot 1 + q\cdot 0 = p = \operatorname{E}[X].$

The central moment of order $k$ is given by
$\mu_k =(1-p)(-p)^k +p(1-p)^k.$
The first six central moments are
$\begin{align}
\mu_1 &= 0, \\
\mu_2 &= p(1-p), \\
\mu_3 &= p(1-p)(1-2p), \\
\mu_4 &= p(1-p)(1-3p(1-p)), \\
\mu_5 &= p(1-p)(1-2p)(1-2p(1-p)), \\
\mu_6 &= p(1-p)(1-5p(1-p)(1-p(1-p))).
\end{align}$
The higher central moments can be expressed more compactly in terms of $\mu_2$ and $\mu_3$:
$\begin{align}
\mu_4 &= \mu_2 (1-3\mu_2 ), \\
\mu_5 &= \mu_3 (1-2\mu_2 ), \\
\mu_6 &= \mu_2 (1-5\mu_2 (1-\mu_2 )).
\end{align}$
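Both lists of central moments follow from $\mu_k = (1-p)(-p)^k + p(1-p)^k$. The sketch below checks every listed identity exactly on a grid of rational $p$ values; it is a verification aid, and `central_moment` is a hypothetical name.

```python
from fractions import Fraction

def central_moment(k: int, p: Fraction) -> Fraction:
    # mu_k = (1 - p)(-p)^k + p(1 - p)^k
    return (1 - p) * (-p) ** k + p * (1 - p) ** k

for i in range(0, 21):
    p = Fraction(i, 20)
    e = p * (1 - p)  # e = pq = mu_2
    mu = {k: central_moment(k, p) for k in range(1, 7)}
    assert mu[1] == 0
    assert mu[2] == e
    assert mu[3] == e * (1 - 2 * p)
    assert mu[4] == e * (1 - 3 * e) == mu[2] * (1 - 3 * mu[2])
    assert mu[5] == e * (1 - 2 * p) * (1 - 2 * e) == mu[3] * (1 - 2 * mu[2])
    assert mu[6] == e * (1 - 5 * e * (1 - e)) == mu[2] * (1 - 5 * mu[2] * (1 - mu[2]))
print("central-moment formulas verified on a grid of p values")
```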
The first six cumulants are
$\begin{align}
\kappa_1 &= p, \\
\kappa_2 &= \mu_2 , \\
\kappa_3 &= \mu_3 , \\
\kappa_4 &= \mu_2 (1-6\mu_2 ), \\
\kappa_5 &= \mu_3 (1-12\mu_2 ), \\
\kappa_6 &= \mu_2 (1-30\mu_2 (1-4\mu_2 )).
\end{align}$
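The cumulants can be obtained from the raw moments $m_n = \operatorname{E}[X^n]$ with the standard recursion $\kappa_n = m_n - \sum_{k=1}^{n-1} \binom{n-1}{k-1} \kappa_k m_{n-k}$. Since every raw moment of order $n \ge 1$ equals $p$, the listed formulas can be checked exactly. A sketch; `cumulants_from_raw` is a hypothetical name.

```python
from fractions import Fraction
from math import comb

def cumulants_from_raw(raw: list[Fraction]) -> list[Fraction]:
    """Convert raw moments m_1..m_N (raw[0] = m_1) into cumulants kappa_1..kappa_N.

    Uses kappa_n = m_n - sum_{k=1}^{n-1} C(n-1, k-1) kappa_k m_{n-k}.
    """
    kappa: list[Fraction] = []
    for n in range(1, len(raw) + 1):
        total = raw[n - 1]
        for k in range(1, n):
            total -= comb(n - 1, k - 1) * kappa[k - 1] * raw[n - k - 1]
        kappa.append(total)
    return kappa

for i in range(0, 21):
    p = Fraction(i, 20)
    mu2 = p * (1 - p)
    mu3 = mu2 * (1 - 2 * p)
    kap = cumulants_from_raw([p] * 6)  # every raw moment of order >= 1 equals p
    assert kap[0] == p
    assert kap[1] == mu2
    assert kap[2] == mu3
    assert kap[3] == mu2 * (1 - 6 * mu2)
    assert kap[4] == mu3 * (1 - 12 * mu2)
    assert kap[5] == mu2 * (1 - 30 * mu2 * (1 - 4 * mu2))
print("cumulant formulas verified on a grid of p values")
```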

==Entropy and Fisher's Information==

===Entropy===
Entropy is a measure of uncertainty or randomness in a probability distribution. For a Bernoulli random variable $X$ with success probability $p$ and failure probability $q = 1 - p$, the entropy $H(X)$ is defined as:

$\begin{align}
H(X) &= \mathbb{E}_p \ln \frac{1}{\Pr(X)} \\[1ex]
&= - \Pr(X{=}0) \ln \Pr(X{=}0) - \Pr(X{=}1) \ln \Pr(X{=}1) \\[1ex]
&= - (q \ln q + p \ln p).
\end{align}$

The entropy is maximized at $p = 1/2$, where $H(X) = \ln 2$ (one bit), reflecting the highest uncertainty when both outcomes are equally likely. With the convention $0 \ln 0 = 0$, the entropy is zero when $p = 0$ or $p = 1$, where the outcome is certain.
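A small sketch of the entropy in nats, implementing the convention $0 \ln 0 = 0$ by skipping zero-probability terms; `bernoulli_entropy` is a hypothetical name, and the grid search merely illustrates where the maximum lies.

```python
import math

def bernoulli_entropy(p: float) -> float:
    """Entropy H(X) in nats for Bernoulli(p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    h = 0.0
    for x in (p, 1.0 - p):
        if x > 0.0:  # convention: 0 * ln 0 = 0
            h -= x * math.log(x)
    return h

print(round(bernoulli_entropy(0.5), 4))  # 0.6931, i.e. ln 2
print(bernoulli_entropy(0.0), bernoulli_entropy(1.0))  # 0.0 0.0

# On a grid of p values the maximum sits at p = 1/2.
grid = [i / 100 for i in range(101)]
print(max(grid, key=bernoulli_entropy))  # 0.5
```

Dividing by $\ln 2$ converts the result to bits, which gives the binary entropy function.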

===Fisher's Information===
Fisher information measures the amount of information that an observable random variable $X$ carries about an unknown parameter $p$ upon which the probability of $X$ depends. For the Bernoulli distribution, the Fisher information with respect to the parameter $p$ is given by:

$I(p) = \frac{1}{pq}$

Proof:

- The likelihood function for a Bernoulli random variable $X$ is $L(p; X) = p^X (1 - p)^{1 - X}$. This is the probability of observing $X$ given the parameter $p$.
- The log-likelihood function is $\ln L(p; X) = X \ln p + (1 - X) \ln (1 - p)$.
- The score function (the first derivative of the log-likelihood with respect to $p$) is $\frac{\partial}{\partial p} \ln L(p; X) = \frac{X}{p} - \frac{1 - X}{1 - p}$.
- The second derivative of the log-likelihood function is $\frac{\partial^2}{\partial p^2} \ln L(p; X) = -\frac{X}{p^2} - \frac{1 - X}{(1 - p)^2}$.
- Fisher information is the negative expected value of the second derivative of the log-likelihood. Using $\operatorname{E}[X] = p$: $\begin{align}
I(p) &= -\operatorname{E}\left[\frac{\partial^2}{\partial p^2} \ln L(p; X)\right] \\
&= \frac{p}{p^2} + \frac{1 - p}{(1 - p)^2} \\
&= \frac{1}{p} + \frac{1}{1 - p} = \frac{1}{p(1-p)} = \frac{1}{pq}.
\end{align}$
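It is minimized at $p = 1/2$, where $I(1/2) = 4$, and grows without bound as $p$ approaches 0 or 1. Since $I(p)$ is the reciprocal of the variance $pq$, a single observation carries the least information about $p$ when the outcome is most uncertain.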
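The proof can be replayed with exact fractions: averaging the squared score and the negated second derivative over the two outcomes gives the same value $1/(pq)$. A minimal sketch; the function names are hypothetical.

```python
from fractions import Fraction

def score(x: int, p: Fraction) -> Fraction:
    # d/dp ln L(p; x) = x/p - (1 - x)/(1 - p)
    return Fraction(x) / p - Fraction(1 - x) / (1 - p)

def second_derivative(x: int, p: Fraction) -> Fraction:
    # d^2/dp^2 ln L(p; x) = -x/p^2 - (1 - x)/(1 - p)^2
    return -Fraction(x) / p**2 - Fraction(1 - x) / (1 - p) ** 2

def fisher_information(p: Fraction) -> tuple[Fraction, Fraction]:
    """Return (E[score^2], -E[second derivative]) under Bernoulli(p), 0 < p < 1."""
    pmf = {0: 1 - p, 1: p}
    via_score = sum(prob * score(x, p) ** 2 for x, prob in pmf.items())
    via_hessian = -sum(prob * second_derivative(x, p) for x, prob in pmf.items())
    return via_score, via_hessian

for p in (Fraction(1, 10), Fraction(1, 4), Fraction(1, 2), Fraction(9, 10)):
    a, b = fisher_information(p)
    assert a == b == 1 / (p * (1 - p))
print(fisher_information(Fraction(1, 2)))  # (Fraction(4, 1), Fraction(4, 1))
```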

==Related distributions==
- If $X_1,\dots,X_n$ are independent, identically distributed (i.i.d.) random variables, all Bernoulli trials with success probability $p$, then their sum is distributed according to a binomial distribution with parameters $n$ and $p$: $\sum_{k=1}^n X_k \sim \operatorname{B}(n,p)$.

The Bernoulli distribution is simply $\operatorname{B}(1, p)$, also written as $\mathrm{Bernoulli} (p).$
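The relationship with the binomial distribution can be checked by convolving $n$ copies of the Bernoulli probability mass function and comparing with $\binom{n}{k} p^k (1-p)^{n-k}$. A sketch with exact fractions; both function names and the choice $p = 1/3$ are illustrative.

```python
from fractions import Fraction
from math import comb

def sum_of_bernoullis_pmf(n: int, p: Fraction) -> list[Fraction]:
    """PMF of X_1 + ... + X_n for i.i.d. Bernoulli(p), by repeated convolution."""
    dist = [Fraction(1)]  # the empty sum equals 0 with probability 1
    for _ in range(n):
        new = [Fraction(0)] * (len(dist) + 1)
        for s, prob in enumerate(dist):
            new[s] += prob * (1 - p)  # next trial is 0
            new[s + 1] += prob * p    # next trial is 1
        dist = new
    return dist

def binomial_pmf(n: int, p: Fraction) -> list[Fraction]:
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

p = Fraction(1, 3)
for n in range(1, 8):
    assert sum_of_bernoullis_pmf(n, p) == binomial_pmf(n, p)
assert binomial_pmf(1, p) == [1 - p, p]  # B(1, p) is Bernoulli(p)
print("sums of i.i.d. Bernoulli variables match the binomial PMF")
```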

- The categorical distribution is the generalization of the Bernoulli distribution to variables with any fixed finite number of possible values.
- The Beta distribution is the conjugate prior of the Bernoulli distribution.
- The geometric distribution models the number of independent and identical Bernoulli trials needed to get one success.
- If $Y \sim \mathrm{Bernoulli}\left(\frac{1}{2}\right)$, then $2Y - 1$ has a Rademacher distribution.
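Conjugacy means that a $\mathrm{Beta}(\alpha, \beta)$ prior on $p$, updated with Bernoulli observations containing $s$ ones and $f$ zeros, yields a $\mathrm{Beta}(\alpha + s, \beta + f)$ posterior. The sketch below applies this update; the function name, the uniform prior $\mathrm{Beta}(1, 1)$ and the data are hypothetical choices for illustration.

```python
from fractions import Fraction

def beta_bernoulli_update(alpha: Fraction, beta: Fraction, data: list[int]) -> tuple[Fraction, Fraction]:
    """Posterior Beta parameters after observing 0/1 outcomes under a Beta(alpha, beta) prior."""
    successes = sum(data)
    failures = len(data) - successes
    return alpha + successes, beta + failures

prior = (Fraction(1), Fraction(1))  # Beta(1, 1), i.e. uniform on [0, 1]
data = [1, 0, 1, 1, 0, 1, 0, 1]     # hypothetical observations
a, b = beta_bernoulli_update(*prior, data)
print(a, b)         # 6 4
print(a / (a + b))  # posterior mean 3/5
```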

==See also==
- Bernoulli process, a random process consisting of a sequence of independent Bernoulli trials
- Bernoulli sampling
- Binary entropy function
- Binary decision diagram

