= Continuous Bernoulli distribution =

Continuous Bernoulli distribution
- Type: density
- Parameters: $\lambda = 1/(1+e^{-\theta}) \in (0,1)$
- Support: $x \in [0, 1]$
- Pdf: $C(\lambda) \lambda^x (1-\lambda)^{1-x}\!$, where $C(\lambda) = \begin{cases} 2 &\text{if } \lambda = \frac{1}{2}\\ \frac{2 \tanh^{-1}(1-2\lambda)}{1-2\lambda} &\text{ otherwise} \end{cases}$
- Cdf: $F(x \mid \lambda)= \begin{cases} x, & \lambda=\tfrac12\\[6pt] \dfrac{\lambda^x(1-\lambda)^{1-x} + \lambda - 1}{2\lambda - 1}, & \text{otherwise} \end{cases}$
- Mean: $\operatorname{E}[X]= \begin{cases} \tfrac12 & \lambda=\tfrac12\\[6pt] \dfrac{\lambda}{2\lambda-1}+\dfrac{1}{2\tanh^{-1}(1-2\lambda)}, & \text{otherwise} \end{cases}$
- Variance: $\operatorname{Var}(X)= \begin{cases} \tfrac{1}{12}, & \lambda=\tfrac12\\[6pt] -\dfrac{\lambda(1-\lambda)}{(1-2\lambda)^2}+\dfrac{1}{(2\tanh^{-1}(1-2\lambda))^2}, & \text{otherwise} \end{cases}$
- Parameters2: $\theta \in \mathbb{R}$, natural parameter
- Support2: $x \in [0, 1]$
- Pdf2: $f(x \mid \theta)= \begin{cases} 1 & \theta = 0\\ \exp(x \theta - \log\{(e^\theta-1)/\theta\} ) & \theta \neq 0 \end{cases}$
- Cdf2: $F(x \mid \theta) = \begin{cases} x & \theta = 0\\ (e^{\theta x}-1)/(e^\theta-1) & \theta \neq 0 \end{cases}$
- Mean2: $\operatorname{E}[X]= \begin{cases} 1/2 & \theta = 0\\ e^\theta/(e^\theta-1)-\theta^{-1} & \theta \neq 0 \end{cases}$
- Variance2: $\operatorname{Var}(X)= \begin{cases} 1/12 & \theta = 0\\ (2-e^\theta-e^{-\theta})^{-1}+\theta^{-2} & \theta \neq 0 \end{cases}$

In probability theory, statistics, and machine learning, the continuous Bernoulli distribution is a family of continuous probability distributions parameterized by a single shape parameter $\lambda \in (0, 1)$, defined on the unit interval $x \in [0, 1]$, by:
$p(x | \lambda) \propto \lambda^x (1-\lambda)^{1-x}.$

The continuous Bernoulli distribution arises in deep learning and computer vision, specifically in the context of variational autoencoders, for modeling the pixel intensities of natural images. As such, it defines a proper probabilistic counterpart for the commonly used binary cross entropy loss, which is often applied to continuous, $[0,1]$-valued data. This practice amounts to ignoring the normalizing constant of the continuous Bernoulli distribution, since the binary cross entropy loss only defines a true log-likelihood for discrete, $\{0,1\}$-valued data.
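
The relationship between the binary cross entropy and the continuous Bernoulli log-likelihood can be illustrated with a minimal Python sketch. The function names (`cb_log_norm_const`, `binary_cross_entropy`, `cb_log_pdf`, `integrate_unit_interval`) are hypothetical and chosen only for this illustration, and the series cutoff near $\lambda = 1/2$ is an assumption made for numerical stability.

```python
import math

def cb_log_norm_const(lam):
    """log C(lam) for the continuous Bernoulli, with lam in (0, 1)."""
    u = 1.0 - 2.0 * lam
    if abs(u) < 1e-3:
        # Near lam = 1/2 the closed form is 0/0, so use the series
        # atanh(u)/u = 1 + u^2/3 + u^4/5 + ... (cutoff is an assumption).
        return math.log(2.0) + math.log1p(u * u / 3.0 + u ** 4 / 5.0)
    return math.log(2.0 * math.atanh(u) / u)

def binary_cross_entropy(x, lam):
    """Binary cross entropy of a target x in [0, 1] against lam."""
    return -(x * math.log(lam) + (1.0 - x) * math.log(1.0 - lam))

def cb_log_pdf(x, lam):
    """Continuous Bernoulli log-density: negative BCE plus log C(lam)."""
    return cb_log_norm_const(lam) - binary_cross_entropy(x, lam)

def integrate_unit_interval(f, n=10_000):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))
```

For $\lambda = 0.2$, integrating `exp(cb_log_pdf(x, 0.2))` over $[0,1]$ gives approximately 1, whereas `exp(-binary_cross_entropy(x, 0.2))` integrates to about 0.43, which is the mass that the missing normalizing constant accounts for.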

The continuous Bernoulli also defines an exponential family of distributions. Writing $\theta = \log\left(\lambda/(1-\lambda)\right)$ for the natural parameter, the density can be rewritten in canonical form:
$p(x | \theta) \propto \exp (\theta x)$.
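
The omitted constant follows from a direct integration, shown here as a short worked step. For $\theta \neq 0$,
$\int_0^1 e^{\theta x}\,dx = \frac{e^\theta - 1}{\theta},$
so the normalized density is $p(x | \theta) = \frac{\theta}{e^\theta - 1}\, e^{\theta x}$, while at $\theta = 0$ the integral equals 1 and the density is uniform. In terms of $\lambda$, the identities $\lambda^x (1-\lambda)^{1-x} = (1-\lambda)\, e^{\theta x}$ and $(1-\lambda)(e^\theta - 1) = 2\lambda - 1$ give
$C(\lambda) = \frac{\theta}{2\lambda - 1} = \frac{2 \tanh^{-1}(1-2\lambda)}{1-2\lambda},$
where the last equality uses $\theta = -2\tanh^{-1}(1-2\lambda)$.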

== Statistical inference ==

Given an independent sample of $n$ points $x_1,\dots,x_n$, with $x_i\in[0,1]$ for all $i$, from a continuous Bernoulli distribution, the log-likelihood of the natural parameter $\theta$ is
$\mathcal{L}(\theta) = \theta \sum_{i=1}^n x_i - n \log \{(e^\theta - 1)/\theta\}$
and the maximum likelihood estimator of the natural parameter $\theta$ is the solution of $\mathcal{L}'(\theta) =0$, that is, $\hat{\theta}$ satisfies
$\frac{e^{\hat{\theta}}}{e^{\hat{\theta}} - 1} - \frac{1}{\hat{\theta}}= \frac{1}{n}\sum_{i=1}^n x_i$
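where the left-hand side $e^{\hat{\theta}}/(e^{\hat{\theta}} - 1) - \hat{\theta}^{-1}$ is the expected value of a continuous Bernoulli distribution with parameter $\hat\theta$. Although $\hat{\theta}$ does not admit a closed-form expression, it can be computed by numerical inversion (for example, bisection or Newton's method): the left-hand side is strictly increasing in $\hat\theta$, so the equation has a unique solution whenever the sample mean lies strictly between 0 and 1.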
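
The numerical inversion can be sketched in Python as follows. This is one possible approach, not a reference implementation: bisection is used here only because the mean is monotone in $\theta$, and the names `cb_mean` and `cb_mle_theta`, the Taylor cutoff, and the iteration count are all assumptions made for the illustration.

```python
import math

def cb_mean(theta):
    """Mean of the continuous Bernoulli with natural parameter theta."""
    if abs(theta) < 1e-4:
        # First-order Taylor expansion around theta = 0 avoids
        # catastrophic cancellation (cutoff is an assumption).
        return 0.5 + theta / 12.0
    if theta < 0:
        # The reflection x -> 1 - x maps theta to -theta.
        return 1.0 - cb_mean(-theta)
    # e^theta / (e^theta - 1) rewritten as -1 / expm1(-theta) to avoid overflow.
    return -1.0 / math.expm1(-theta) - 1.0 / theta

def cb_mle_theta(xs, iterations=200):
    """Solve cb_mean(theta) = mean(xs) by bisection (hypothetical helper)."""
    xbar = sum(xs) / len(xs)
    if not 0.0 < xbar < 1.0:
        raise ValueError("the sample mean must lie strictly between 0 and 1")
    lo, hi = -1.0, 1.0
    # cb_mean is increasing in theta, so widen the bracket until it contains xbar.
    while cb_mean(lo) > xbar:
        lo *= 2.0
    while cb_mean(hi) < xbar:
        hi *= 2.0
    # A fixed number of halvings keeps the loop finite for any bracket width.
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if cb_mean(mid) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A call such as `cb_mle_theta([0.9, 0.8, 0.95])` returns a positive estimate, since the sample mean exceeds $1/2$; the estimate of $\lambda$ then follows from $\hat\lambda = 1/(1+e^{-\hat\theta})$.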

== Further properties ==

The differential entropy of a continuous Bernoulli distribution is
$\operatorname{H}[X] = \begin{cases} 0 &\text{ if } \lambda = \frac{1}{2} \\ \frac{\lambda\log\left(\lambda\right) - \left(1 - \lambda\right)\log\left(1 - \lambda\right)}{1 - 2\lambda} - \log\left(\frac{2 \tanh^{-1}\left(1 - 2\lambda\right)}{e\left(1 - 2\lambda\right)}\right) &\text{ otherwise} \end{cases}\!$
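
As a sanity check on this expression, the closed form can be compared against a direct numerical evaluation of $-\int_0^1 p(x)\log p(x)\,dx$. The Python sketch below is illustrative only; the function names `cb_entropy` and `cb_entropy_numeric`, the cutoff near $\lambda = 1/2$, and the midpoint-rule grid size are assumptions.

```python
import math

def cb_entropy(lam):
    """Closed-form entropy of the continuous Bernoulli for lam in (0, 1)."""
    u = 1.0 - 2.0 * lam
    if abs(u) < 1e-6:
        return 0.0  # uniform limit; the closed form is 0/0 here
    log_c = math.log(2.0 * math.atanh(u) / u)
    # The term -log(C / e) is written as 1 - log C.
    return (lam * math.log(lam) - (1.0 - lam) * math.log(1.0 - lam)) / u + 1.0 - log_c

def cb_entropy_numeric(lam, n=20_000):
    """Midpoint-rule estimate of -integral of p(x) log p(x) over [0, 1]."""
    u = 1.0 - 2.0 * lam
    log_c = math.log(2.0) if u == 0.0 else math.log(2.0 * math.atanh(u) / u)
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        log_p = log_c + x * math.log(lam) + (1.0 - x) * math.log(1.0 - lam)
        total -= math.exp(log_p) * log_p * h
    return total
```

The two functions should agree closely for any $\lambda \in (0,1)$. The value is negative for $\lambda \neq 1/2$, consistent with the uniform distribution having the largest entropy among densities on $[0,1]$.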

== Related distributions ==

=== Bernoulli distribution ===
The continuous Bernoulli can be thought of as a continuous relaxation of the Bernoulli distribution, which is defined on the discrete set $\{0,1\}$ by the probability mass function:
$p(x) = p^x (1-p)^{1-x},$
where $p$ is a scalar parameter between 0 and 1. Applying this same functional form on the continuous interval $[0,1]$ results in the continuous Bernoulli probability density function, up to a normalizing constant.

=== Uniform distribution ===
The uniform distribution on the unit interval $[0,1]$ is the special case of the continuous Bernoulli with $\lambda = 1/2$, or equivalently $\theta = 0$.

=== Exponential distribution ===
An exponential distribution with rate $\Lambda$ restricted to the unit interval [0,1] corresponds to a continuous Bernoulli distribution with natural parameter $\theta = -\Lambda < 0$.
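
Because the cumulative distribution function $F(x \mid \theta) = (e^{\theta x}-1)/(e^\theta-1)$ can be inverted in closed form, this truncated-exponential view suggests a simple inverse-transform sampler. The Python sketch below is a minimal illustration under that assumption; the names `cb_cdf`, `cb_inverse_cdf`, and `cb_sample` are hypothetical, and the small-$\theta$ cutoff is chosen arbitrarily.

```python
import math
import random

def cb_cdf(x, theta):
    """CDF F(x | theta) of the continuous Bernoulli on [0, 1]."""
    if abs(theta) < 1e-8:
        return x  # uniform limit
    return math.expm1(theta * x) / math.expm1(theta)

def cb_inverse_cdf(u, theta):
    """Solve F(x | theta) = u for x, with u in [0, 1]."""
    if abs(theta) < 1e-8:
        return u
    # From 1 + u (e^theta - 1) = e^(theta x).
    # Note: math.expm1 overflows for theta above roughly 709.
    return math.log1p(u * math.expm1(theta)) / theta

def cb_sample(theta, n, seed=0):
    """Draw n samples by inverse-transform sampling."""
    rng = random.Random(seed)
    return [cb_inverse_cdf(rng.random(), theta) for _ in range(n)]
```

For example, `cb_sample(-3.0, 50_000)` draws from an exponential distribution with rate $\Lambda = 3$ restricted to $[0,1]$, and the sample mean should be close to $e^\theta/(e^\theta-1)-\theta^{-1}$ evaluated at $\theta = -3$.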

=== Continuous categorical distribution ===
The multivariate generalization of the continuous Bernoulli is the continuous categorical distribution.
