In probability theory, the Borel–Kolmogorov paradox (sometimes known as Borel's paradox) is a paradox relating to conditional probability with respect to an event of probability zero (also known as a null set). It is named after Émile Borel and Andrey Kolmogorov.

## A great circle puzzle

Suppose that a random variable has a uniform distribution on a unit sphere. What is its conditional distribution on a great circle? Because of the symmetry of the sphere, one might expect that the distribution is uniform and independent of the choice of coordinates. However, two analyses give contradictory results. First, note that choosing a point uniformly on the sphere is equivalent to choosing the longitude ${\displaystyle \lambda }$ uniformly from ${\displaystyle [-\pi ,\pi ]}$ and choosing the latitude ${\displaystyle \varphi }$ from ${\textstyle [-{\frac {\pi }{2}},{\frac {\pi }{2}}]}$ with density ${\textstyle {\frac {1}{2}}\cos \varphi }$.[1] Then we can look at two different great circles:

1. If the coordinates are chosen so that the great circle is an equator (latitude ${\displaystyle \varphi =0}$), the conditional density for a longitude ${\displaystyle \lambda }$ defined on the interval ${\displaystyle [-\pi ,\pi ]}$ is
${\displaystyle f(\lambda \mid \varphi =0)={\frac {1}{2\pi }}.}$
2. If the great circle is a line of longitude with ${\displaystyle \lambda =0}$, the conditional density for ${\displaystyle \varphi }$ on the interval ${\textstyle [-{\frac {\pi }{2}},{\frac {\pi }{2}}]}$ is
${\displaystyle f(\varphi \mid \lambda =0)={\frac {1}{2}}\cos \varphi .}$

One distribution is uniform on the circle, the other is not. Yet both seem to be referring to the same great circle in different coordinate systems.
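Both claims can be checked numerically. The following sketch (an illustration added for concreteness, not part of the original argument; `eps` is an arbitrary band half-width) samples uniform points on the sphere and conditions on a thin ring around the equator and on a thin lune around the zero meridian:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample points uniformly on the unit sphere by normalizing
# standard Gaussian vectors (rotation invariance makes this uniform).
v = rng.normal(size=(2_000_000, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)
lon = np.arctan2(v[:, 1], v[:, 0])   # longitude in [-pi, pi]
lat = np.arcsin(v[:, 2])             # latitude in [-pi/2, pi/2]

# Unconditionally, latitude has density (1/2)cos(phi), so
# P(|lat| < pi/6) should come out close to sin(pi/6) = 1/2.
p_lat = np.mean(np.abs(lat) < np.pi / 6)

eps = 0.01  # half-width of the conditioning bands (arbitrary small value)

# Case 1: thin ring around the equator. Longitude stays uniform,
# so P(|lon| < pi/6) should be close to (pi/3) / (2*pi) = 1/6.
ring = np.abs(lat) < eps
p_ring = np.mean(np.abs(lon[ring]) < np.pi / 6)

# Case 2: thin lune around the lon = 0 meridian. Latitude keeps the
# (1/2)cos(phi) shape, so P(|lat| < pi/6) should be close to 1/2
# rather than the arc-length-uniform value (pi/3) / pi = 1/3.
lune = np.abs(lon) < eps
p_lune = np.mean(np.abs(lat[lune]) < np.pi / 6)
```

The ring-conditioned longitudes behave uniformly (`p_ring` ≈ 1/6), while the lune-conditioned latitudes follow the cosine density (`p_lune` ≈ 1/2, not 1/3), reproducing the two conditional densities above.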

As E. T. Jaynes put it:[1]

> Many quite futile arguments have raged — between otherwise competent probabilists — over which of these results is 'correct'.

## Explanation and implications

In case (1) above, the conditional probability that the longitude λ lies in a set E given that φ = 0 can be written P(λ ∈ E | φ = 0). Elementary probability theory suggests this can be computed as P(λ ∈ E and φ = 0)/P(φ = 0), but that expression is not well-defined since P(φ = 0) = 0. Measure theory provides a way to define a conditional probability, using the family of events R_ab = {φ : a < φ < b}, which are horizontal rings consisting of all points with latitude between a and b.

The resolution of the paradox is to notice that in case (2), P(φ ∈ F | λ = 0) is defined using the events L_ab = {λ : a < λ < b}, which are lunes (vertical wedges), consisting of all points whose longitude lies between a and b. So although P(λ ∈ E | φ = 0) and P(φ ∈ F | λ = 0) each provide a probability distribution on a great circle, one of them is defined using rings, and the other using lunes. Thus it is not surprising after all that P(λ ∈ E | φ = 0) and P(φ ∈ F | λ = 0) have different distributions.

Kolmogorov himself wrote:[2]

> The concept of a conditional probability with regard to an isolated hypothesis whose probability equals 0 is inadmissible. For we can obtain a probability distribution for [the latitude] on the meridian circle only if we regard this circle as an element of the decomposition of the entire spherical surface onto meridian circles with the given poles.

Jaynes adds:[1]

> … the term 'great circle' is ambiguous until we specify what limiting operation is to produce it. The intuitive symmetry argument presupposes the equatorial limit; yet one eating slices of an orange might presuppose the other.

## Mathematical explication

### Using calculus

Consider a random vector ${\displaystyle (X,Y,Z)}$ that is uniformly distributed on the unit sphere ${\displaystyle S^{2}}$. The uniform distribution is defined in terms of surface integrals: for a given region ${\displaystyle D\subset S^{2}}$,

${\displaystyle \mathbb {P} ((X,Y,Z)\in D)={\frac {1}{4\pi }}\int _{D}\mathrm {d} S.}$

This means that for any parametrization ${\displaystyle p(u,v)}$ that covers D, we have

${\displaystyle \mathbb {P} ((X,Y,Z)\in D)={\frac {1}{4\pi }}\int _{p^{-1}(D)}\left|\left|{\frac {\partial p}{\partial u}}\times {\frac {\partial p}{\partial v}}\right|\right|\mathrm {d} u\mathrm {d} v}$

For simplicity, we won't calculate the full conditional distribution on a great circle, only the probability that it lies in the first quadrant. That is to say, we will attempt to calculate the conditional probability ${\displaystyle \mathbb {P} (A|B)}$ with

${\displaystyle {\begin{aligned}A&=\{(x,y,z)\in S^{2}:x>0,\ y>0\}\\B&=\{(x,y,z)\in S^{2}:z=0\}\end{aligned}}}$

We begin by parametrizing the sphere with the usual spherical polar coordinates:

{\displaystyle {\begin{aligned}x&=\cos(\varphi )\cos(\theta )\\y&=\cos(\varphi )\sin(\theta )\\z&=\sin(\varphi )\end{aligned}}}

where ${\displaystyle -{\frac {\pi }{2}}\leq \varphi \leq {\frac {\pi }{2}}}$ and ${\displaystyle -\pi \leq \theta \leq \pi }$.

We can define random variables ${\displaystyle \Phi }$, ${\displaystyle \Theta }$ as the values of ${\displaystyle (X,Y,Z)}$ under the inverse of this parametrization, or more formally using the arctan2 function:

{\displaystyle {\begin{aligned}\Phi &=\arcsin(Z)\\\Theta &=\arctan _{2}\left({\frac {Y}{\sqrt {1-Z^{2}}}},{\frac {X}{\sqrt {1-Z^{2}}}}\right)\end{aligned}}}
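As a quick sanity check (illustrative only, for an arbitrary pair of angles), the parametrization and this inverse round-trip correctly:

```python
import numpy as np

# Round-trip check: parametrize, then invert with arcsin / arctan2.
phi, theta = 0.7, -2.1   # arbitrary test angles within range
x = np.cos(phi) * np.cos(theta)
y = np.cos(phi) * np.sin(theta)
z = np.sin(phi)

phi_back = np.arcsin(z)
theta_back = np.arctan2(y / np.sqrt(1 - z**2), x / np.sqrt(1 - z**2))
```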

Via the formula above, the probability distribution on the sphere is

${\displaystyle \mathbb {P} ((X,Y,Z)\in D)={\frac {1}{4\pi }}\int _{p^{-1}(D)}\cos(\varphi )\mathrm {d} \varphi \mathrm {d} \theta }$

This equation in effect defines the joint density ${\displaystyle f_{\Phi ,\Theta }(\varphi ,\theta )}$ of ${\displaystyle \Phi }$ and ${\displaystyle \Theta }$:

${\displaystyle f_{\Phi ,\Theta }(\varphi ,\theta )={\frac {1}{4\pi }}\cos(\varphi )}$.
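As an intermediate step (spelled out here for clarity), integrating out either variable gives the marginal densities

${\displaystyle f_{\Phi }(\varphi )=\int _{-\pi }^{\pi }{\frac {\cos \varphi }{4\pi }}\,\mathrm {d} \theta ={\frac {\cos \varphi }{2}},\qquad f_{\Theta }(\theta )=\int _{-\pi /2}^{\pi /2}{\frac {\cos \varphi }{4\pi }}\,\mathrm {d} \varphi ={\frac {1}{2\pi }},}$

so the conditional density used in the next step is ${\displaystyle f_{\Theta \mid \Phi }(\theta \mid 0)=f_{\Phi ,\Theta }(0,\theta )/f_{\Phi }(0)=({\tfrac {1}{4\pi }})/({\tfrac {1}{2}})={\tfrac {1}{2\pi }}}$.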

We can rewrite A and B in polar coordinates:

${\displaystyle {\begin{aligned}A&=\left\{0<\Theta <{\frac {\pi }{2}}\right\}\\B&=\{\Phi =0\}\end{aligned}}}$

We attempt to evaluate the conditional probability using the density

${\displaystyle \mathbb {P} (A\mid B){\stackrel {?}{=}}\int _{0}^{\frac {\pi }{2}}f_{\Theta \mid \Phi }(\theta \mid 0)\,\mathrm {d} \theta =\int _{0}^{\frac {\pi }{2}}{\frac {1}{2\pi }}\,\mathrm {d} \theta ={\frac {1}{4}}}$

Now we repeat the process with a different parametrization of the sphere:

{\displaystyle {\begin{aligned}x&=\cos(\varphi )\cos(\theta )\\y&=\sin(\varphi )\\z&=-\cos(\varphi )\sin(\theta )\end{aligned}}}

This is equivalent to the previous parametrization rotated by 90 degrees around the x axis.

Define new random variables

{\displaystyle {\begin{aligned}\Phi '&=\arcsin(Y)\\\Theta '&=\arctan _{2}\left({\frac {-Z}{\sqrt {1-Y^{2}}}},{\frac {X}{\sqrt {1-Y^{2}}}}\right).\end{aligned}}}

Rotation is measure preserving so the density of ${\displaystyle \Phi '}$ and ${\displaystyle \Theta '}$ is the same:

${\displaystyle f_{\Phi ',\Theta '}(\varphi ,\theta )={\frac {1}{4\pi }}\cos(\varphi )}$.

The expressions for A and B are:

${\displaystyle {\begin{aligned}A&=\left\{0<\Phi '<{\frac {\pi }{2}},\ -{\frac {\pi }{2}}<\Theta '<{\frac {\pi }{2}}\right\}\\B&=\{\Theta '=0\}\end{aligned}}}$

Attempting again to evaluate the conditional probability using the density

${\displaystyle \mathbb {P} (A\mid B){\stackrel {?}{=}}\int _{0}^{\frac {\pi }{2}}f_{\Phi '\mid \Theta '}(\varphi \mid 0)\,\mathrm {d} \varphi =\int _{0}^{\frac {\pi }{2}}{\frac {1}{2}}\cos(\varphi )\,\mathrm {d} \varphi ={\frac {1}{2}}.}$
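Both attempted answers can be reproduced in simulation by conditioning on shrinking neighbourhoods of the respective coordinate events. The sketch below (illustrative; `eps` is an arbitrary band half-width) estimates the probability of the first-quadrant event A both ways:

```python
import numpy as np

rng = np.random.default_rng(2)
v = rng.normal(size=(2_000_000, 3))   # uniform points on the sphere
v /= np.linalg.norm(v, axis=1, keepdims=True)
x, y, z = v.T

in_A = (x > 0) & (y > 0)   # first quadrant of the great circle z = 0

eps = 0.01  # band half-width (arbitrary small value)

# First parametrization: condition on a thin horizontal ring |Phi| < eps.
ring = np.abs(np.arcsin(z)) < eps
p1 = in_A[ring].mean()     # should come out near 1/4

# Second parametrization: condition on a thin lune |Theta'| < eps.
theta2 = np.arctan2(-z / np.sqrt(1 - y**2), x / np.sqrt(1 - y**2))
lune = np.abs(theta2) < eps
p2 = in_A[lune].mean()     # should come out near 1/2
```

The two estimates disagree even as `eps` shrinks, which is exactly the paradox: the answer depends on the family of conditioning events, not only on the limiting great circle.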

The two computations give different answers, which shows that a conditional density with respect to an event of probability zero is not well defined on its own: the result depends on the parametrization, that is, on the family of events used to approach the conditioning event (see conditioning on an event of probability zero).

### Using measure theory

To understand the problem, we need to recognize that the distribution of a continuous random variable is described by a density f only with respect to some measure μ. Both are needed for a full description of the probability distribution. Equivalently, we need to fully specify the space on which f is defined.

Let Φ and Λ denote two random variables taking values in Ω1 = [−π/2, π/2] and Ω2 = [−π, π], respectively. An event {Φ = φ, Λ = λ} gives a point on the sphere S(r) with radius r. We define the coordinate transform

{\displaystyle {\begin{aligned}x&=r\cos \varphi \cos \lambda \\y&=r\cos \varphi \sin \lambda \\z&=r\sin \varphi \end{aligned}}}

for which we obtain the volume element

${\displaystyle \omega _{r}(\varphi ,\lambda )=\left|\left|{\partial (x,y,z) \over \partial \varphi }\times {\partial (x,y,z) \over \partial \lambda }\right|\right|=r^{2}\cos \varphi \ .}$

Furthermore, if either φ or λ is fixed, we get the volume elements

{\displaystyle {\begin{aligned}\omega _{r}(\lambda )&=\left|\left|{\partial (x,y,z) \over \partial \varphi }\right|\right|=r\ ,\quad \mathrm {respectively} \\\omega _{r}(\varphi )&=\left|\left|{\partial (x,y,z) \over \partial \lambda }\right|\right|=r\cos \varphi \ .\end{aligned}}}
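These volume elements can be verified numerically with central differences (an illustrative check at an arbitrary point, with r = 1):

```python
import numpy as np

r, phi, lam = 1.0, 0.4, 1.3   # arbitrary test point

def p(ph, la):
    """The coordinate transform (x, y, z) for radius r."""
    return np.array([r * np.cos(ph) * np.cos(la),
                     r * np.cos(ph) * np.sin(la),
                     r * np.sin(ph)])

h = 1e-6  # step for central differences
d_phi = (p(phi + h, lam) - p(phi - h, lam)) / (2 * h)   # d(x,y,z)/d(phi)
d_lam = (p(phi, lam + h) - p(phi, lam - h)) / (2 * h)   # d(x,y,z)/d(lambda)

ok_lam = np.isclose(np.linalg.norm(d_phi), r)                  # omega_r(lambda) = r
ok_phi = np.isclose(np.linalg.norm(d_lam), r * np.cos(phi))    # omega_r(phi) = r cos(phi)
ok_joint = np.isclose(np.linalg.norm(np.cross(d_phi, d_lam)),
                      r**2 * np.cos(phi))                      # omega_r(phi, lambda)
```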

Let

${\displaystyle \mu _{\Phi ,\Lambda }(d\varphi ,d\lambda )=f_{\Phi ,\Lambda }(\varphi ,\lambda )\omega _{r}(\varphi ,\lambda )\,d\varphi \,d\lambda }$

denote the joint measure on ${\displaystyle {\mathcal {B}}(\Omega _{1}\times \Omega _{2})}$, which has a density ${\displaystyle f_{\Phi ,\Lambda }}$ with respect to ${\displaystyle \omega _{r}(\varphi ,\lambda )\,d\varphi \,d\lambda }$ and let

{\displaystyle {\begin{aligned}\mu _{\Phi }(d\varphi )&=\int _{\lambda \in \Omega _{2}}\mu _{\Phi ,\Lambda }(d\varphi ,d\lambda )\ ,\\\mu _{\Lambda }(d\lambda )&=\int _{\varphi \in \Omega _{1}}\mu _{\Phi ,\Lambda }(d\varphi ,d\lambda )\ .\end{aligned}}}

If we assume that the density ${\displaystyle f_{\Phi ,\Lambda }}$ is uniform, then

{\displaystyle {\begin{aligned}\mu _{\Phi \mid \Lambda }(d\varphi \mid \lambda )&={\mu _{\Phi ,\Lambda }(d\varphi ,d\lambda ) \over \mu _{\Lambda }(d\lambda )}={\frac {1}{2r}}\omega _{r}(\varphi )\,d\varphi \ ,\quad {\text{and}}\\\mu _{\Lambda \mid \Phi }(d\lambda \mid \varphi )&={\mu _{\Phi ,\Lambda }(d\varphi ,d\lambda ) \over \mu _{\Phi }(d\varphi )}={\frac {1}{2r\pi }}\omega _{r}(\lambda )\,d\lambda \ .\end{aligned}}}
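As a consistency check (added for clarity), each conditional measure integrates to one over its domain:

${\displaystyle \int _{-\pi /2}^{\pi /2}{\frac {1}{2r}}\,r\cos \varphi \,d\varphi ={\frac {1}{2}}{\Big [}\sin \varphi {\Big ]}_{-\pi /2}^{\pi /2}=1\ ,\qquad \int _{-\pi }^{\pi }{\frac {1}{2r\pi }}\,r\,d\lambda =1\ .}$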

Hence, ${\displaystyle \mu _{\Phi \mid \Lambda }}$ has a uniform density with respect to ${\displaystyle \omega _{r}(\varphi )\,d\varphi }$ but not with respect to the Lebesgue measure. On the other hand, ${\displaystyle \mu _{\Lambda \mid \Phi }}$ has a uniform density with respect to ${\displaystyle \omega _{r}(\lambda )\,d\lambda }$ and the Lebesgue measure.

## References

### Citations

1. Jaynes 2003, pp. 467–470.
2. Originally Kolmogorov (1933), translated in Kolmogorov (1956). Sourced from Pollard (2002).

### Sources

• Jaynes, E. T. (2003). "15.7 The Borel-Kolmogorov paradox". Probability Theory: The Logic of Science. Cambridge University Press. pp. 467–470. ISBN 0-521-59271-2. MR 1992316.
• Kolmogorov, Andrey (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung (in German). Berlin: Julius Springer.
• Kolmogorov, Andrey (1956). Foundations of the Theory of Probability (2nd English ed.). New York: Chelsea. Translation of Kolmogorov (1933).
• Pollard, David (2002). "Chapter 5. Conditioning, Example 17.". A User's Guide to Measure Theoretic Probability. Cambridge University Press. pp. 122–123. ISBN 0-521-00289-3. MR 1873379.
• Mosegaard, K.; Tarantola, A. (2002). "Probabilistic approach to inverse problems". International Geophysics. 81: 237–265.