= Kramers–Moyal expansion =

In stochastic processes, the Kramers–Moyal expansion is a Taylor series expansion of the master equation, named after Hans Kramers and José Enrique Moyal. In many textbooks, the expansion is used only to derive the Fokker–Planck equation and is never used again. For Markov processes with continuous sample paths, the Fokker–Planck equation is generally sufficient. The higher-order terms of the Kramers–Moyal expansion only come into play when the process has jumps, which usually means it is a Poisson-like process.

For a real stochastic process, one can estimate its central-moment functions from experimental data, compute its Kramers–Moyal coefficients from them, and thus empirically determine its Kolmogorov forward and backward equations.

== Statement ==
Start with the master equation in integral (Chapman–Kolmogorov) form

$p(x,t) =\int p(x,t|x_0, t_0)p(x_0, t_0) dx_0$

where $p(x, t|x_0, t_0)$ is the transition probability function, and $p(x,t)$ is the probability density at time $t$. The Kramers–Moyal expansion transforms the above to an infinite order partial differential equation

$\partial_t p(x,t) = \sum_{n=1}^\infty (-\partial_x)^n[D_n(x,t) p(x,t)]$

and also

$\partial_t p(x, t|x_0, t_0) = \sum_{n=1}^\infty (-\partial_x)^n [D_n(x, t) p(x, t|x_0, t_0) ]$

where $D_n(x, t)$ are the Kramers–Moyal coefficients, defined by

$D_n(x, t) = \frac{1}{n!}\lim_{\tau\to 0} \frac{1}{\tau} \mu_n(t|x, t-\tau)$

and $\mu_n$ are the central moment functions, defined by

$\mu_n(t' | x, t) = \int_{-\infty}^\infty (x'-x)^n p(x', t'\mid x, t) \ dx'.$

The Fokker–Planck equation is obtained by keeping only the first two terms of the series in which $D_1$ is the drift and $D_2$ is the diffusion coefficient.

The moments, assuming they exist, evolve as

$\frac{\partial}{\partial t}\left\langle x^n\right\rangle=\sum_{k=1}^n \frac{n !}{(n-k) !}\left\langle x^{n-k} D_k(x, t)\right\rangle$

where angled brackets denote the expectation: $\left\langle f\right\rangle = \int f(x) p(x, t)dx$.
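As a short worked example, the first two cases of the sum can be written out explicitly. The Ornstein–Uhlenbeck choice of coefficients used below ($D_1 = -\theta x$, $D_2 = \sigma^2/2$, with constants $\theta, \sigma > 0$) is an illustrative assumption, not part of the statement above. For $n=1$ and $n=2$ the general formula reads

$\partial_t \langle x\rangle = \langle D_1\rangle, \qquad \partial_t\langle x^2\rangle = 2\langle x D_1\rangle + 2\langle D_2\rangle$

With the assumed coefficients this becomes

$\partial_t\langle x\rangle = -\theta\langle x\rangle, \qquad \partial_t\langle x^2\rangle = -2\theta\langle x^2\rangle + \sigma^2$

so the mean decays exponentially and the second moment relaxes to $\sigma^2/(2\theta)$.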

=== n-dimensional version ===
The above is the one-dimensional version. It generalizes to $n$ dimensions (Section 4.7).

=== Proof ===
In ordinary probability theory, where the probability density does not depend on time, the moments of a probability density function determine the density itself through a Fourier transform (details may be found on the characteristic function page):

$\tilde p(k) = \int e^{ikx} p(x) dx = \sum_{n=0}^\infty\frac{(ik)^n}{n!} \mu_n$

$p(x) = \frac{1}{2\pi} \int e^{-ikx}\tilde p(k)dk = \sum_{n=0}^\infty \frac{(-1)^n}{n!}\delta^{(n)}(x)\mu_n$

Similarly,

$p(x, t| x_0, t_0 ) = \sum_{n=0}^\infty \frac{(-1)^n}{n!}\delta^{(n)}(x-x_0) \mu_n(t|x_0, t_0)$
Now we need to integrate away the Dirac delta function. Fixing a small $\tau > 0$, we have by the Chapman–Kolmogorov equation,

$\begin{align}
p(x, t) &= \int p(x,t|x', t-\tau) p(x', t-\tau) dx' \\
&= \sum_{n=0}^\infty \frac{(-1)^n}{n!}\int p(x', t-\tau) \delta^{(n)}(x-x') \mu_n(t|x', t-\tau) dx' \\
&= \sum_{n=0}^\infty \frac{(-1)^n}{n!} \partial_x^n (p(x, t-\tau) \mu_n(t|x, t-\tau))
\end{align}$

The $n=0$ term is just $p(x, t-\tau)$, so moving it to the left-hand side, dividing by $\tau$, and taking the limit gives the time derivative:

$\partial_t p(x, t) = \lim_{\tau \to 0^+}\frac 1\tau \sum_{n=1}^\infty \frac{(-1)^n}{n!} \partial_x^n (p(x, t-\tau) \mu_n(t|x, t-\tau)) = \sum_{n=1}^\infty (-\partial_x)^n (p(x, t) D_n(x, t))$

The same computation with $p(x, t|x_0, t_0)$ gives the other equation.

== Forward and backward equations ==
The equation can be recast into a linear operator form, using the idea of an infinitesimal generator. Define the linear operator

$\mathcal A f := \sum_{n=1}^\infty (-\partial_x)^n[D_n(x,t) f(x,t)]$

Then the equations above state that

$\begin{align}
\partial_t p(x, t) &= \mathcal{A} p(x, t) \\
\partial_t p(x, t|x_0, t_0) &= \mathcal{A} p(x, t|x_0, t_0)
\end{align}$

In this form, the equations are precisely in the form of a general Kolmogorov forward equation. The backward equation then states that

$\partial_t p(x_1, t_1|x, t) = -\mathcal{A}^\dagger p(x_1, t_1|x, t)$

where

$\mathcal A^\dagger f := \sum_{n=1}^\infty D_n(x,t) \partial_x^n[f(x,t)]$

is the Hermitian adjoint of $\mathcal A$.
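The adjoint relation between the forward and backward operators can be checked numerically on a grid. The following is a minimal sketch, assuming a truncation at second order, a central-difference derivative with zero boundary values, and hypothetical coefficient functions `D1` and `D2` chosen only for illustration. In this discretization the backward operator is the matrix transpose of the forward one.

```python
import numpy as np

def km_operators(x, D1, D2):
    """Second-order truncations of A and its adjoint as matrices on the grid x.

    The derivative matrix P is a central difference with zero boundary values,
    so P is antisymmetric (P.T == -P). D1 and D2 are arrays of coefficient
    values on the grid (illustrative assumptions, not taken from the article).
    """
    n = x.size
    h = x[1] - x[0]
    P = (np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)) / (2 * h)
    # Forward operator: A f = -d/dx (D1 f) + d^2/dx^2 (D2 f)
    A = -P @ np.diag(D1) + P @ P @ np.diag(D2)
    # Backward operator: A_dag g = D1 dg/dx + D2 d^2g/dx^2
    A_dag = np.diag(D1) @ P + np.diag(D2) @ P @ P
    return A, A_dag, h

x = np.linspace(-5.0, 5.0, 201)
D1 = -x                      # hypothetical linear drift
D2 = 0.5 + 0.1 * x**2        # hypothetical state-dependent diffusion
A, A_dag, h = km_operators(x, D1, D2)

# Two smooth test functions that are negligible at the boundaries
f = np.exp(-x**2)
g = np.exp(-(x - 1.0)**2 / 2)

lhs = h * g @ (A @ f)        # discrete version of the integral of g * (A f)
rhs = h * (A_dag @ g) @ f    # discrete version of the integral of (A_dag g) * f
print("inner products:", lhs, rhs)
```

The design choice here is to use the same derivative matrix in both operators, which makes the discrete adjoint property exact rather than approximate.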

== Computing the Kramers–Moyal coefficients ==
By definition,

$D_n(x, t) = \frac{1}{n!}\lim_{\tau\to 0} \frac{1}{\tau} \mu_n(t|x, t-\tau)$

This definition works because $\mu_n(t|x, t) = 0$ for $n \geq 1$, as those are the central moments of the Dirac delta function. Since the even central moments are nonnegative, we have $D_{2n} \geq 0$ for all $n\geq 1$.

When the stochastic process is the Markov process $dX = b\,dt + \sigma\,dW_t$, we can directly solve for $p(x', t|x, t-\tau)$, which for small $\tau$ is approximated by a normal distribution with mean $x + b(x)\tau$ and variance $\sigma^2\tau$. This allows us to compute the central moments, and so

$D_1 = b, \quad D_2 = \frac 12 \sigma^2, \quad D_3=D_4=\cdots = 0$

This then gives us the 1-dimensional Fokker–Planck equation:

$\partial_t p = -\partial_x(bp) + \frac 12 \partial_x^2(\sigma^2 p)$
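The definition also suggests a simple way to estimate the coefficients from a sampled time series: bin the data by the current value $x$, and average powers of the increment over one time step within each bin. The following is a rough sketch of that idea, not a reference implementation. The Ornstein–Uhlenbeck test process, the function names `simulate_ou` and `estimate_km`, and all parameter values are assumptions made for illustration. The finite time step stands in for the limit $\tau \to 0$, so the estimates carry a small bias.

```python
import numpy as np
from math import factorial

def simulate_ou(theta, sigma, dt, n_steps, seed=0):
    """Euler–Maruyama path of dX = -theta*X dt + sigma dW (illustrative test process)."""
    rng = np.random.default_rng(seed)
    noise = sigma * np.sqrt(dt) * rng.standard_normal(n_steps - 1)
    x = np.zeros(n_steps)
    for i in range(n_steps - 1):
        x[i + 1] = x[i] - theta * x[i] * dt + noise[i]
    return x

def estimate_km(x, dt, edges, orders=(1, 2, 4)):
    """Estimate D_n in each bin as mean(increment**n) / (n! * dt)."""
    dx = np.diff(x)                              # increments over one time step
    which = np.digitize(x[:-1], edges) - 1       # bin index of the starting point
    centers = 0.5 * (edges[:-1] + edges[1:])
    est = {n: np.full(centers.size, np.nan) for n in orders}
    for b in range(centers.size):
        inc = dx[which == b]                     # increments that start in bin b
        if inc.size:
            for n in orders:
                est[n][b] = np.mean(inc ** n) / (factorial(n) * dt)
    return centers, est

theta, sigma, dt = 1.0, 1.0, 0.01
path = simulate_ou(theta, sigma, dt, n_steps=200_000)
centers, est = estimate_km(path, dt, edges=np.linspace(-1.0, 1.0, 11))

slope = np.polyfit(centers, est[1], 1)[0]        # D_1(x) should be roughly -theta * x
print("fitted drift slope (expect about -theta):", slope)
print("mean D_2 (expect about sigma**2 / 2):", est[2].mean())
print("max |D_4| (expect about 0):", np.abs(est[4]).max())
```

Samples that fall outside the bin edges are ignored, and bins with no samples are left as `nan`.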

==Pawula theorem==
The Pawula theorem states that either the sequence $D_1, D_2, D_3, \dots$ vanishes from the third term onward (that is, $D_n = 0$ for all $n \geq 3$), or all its even terms are positive.

=== Proof ===
By the Cauchy–Schwarz inequality, the central moment functions satisfy $\mu_{n+m}^2 \leq \mu_{2n}\mu_{2m}$. Dividing by $\tau^2$ and taking the limit $\tau \to 0$, we have $D_{n+m}^2 \leq \frac{(2n)!(2m)!}{(n+m)!^2}D_{2n}D_{2m}$. If $D_{2+n} \neq 0$ for some $n \geq 1$, then applying the inequality to the pair $(1, 1+n)$ gives $D_2 D_{2+2n}> 0$, and since the even coefficients are nonnegative, $D_2 > 0$ and $D_{2+2n} > 0$. Repeating the argument with $D_{2+2n}$ in place of $D_{2+n}$ gives $D_{2+4n} > 0$, and so on. So the existence of any nonzero coefficient of order $\geq 3$ implies the existence of nonzero coefficients of arbitrarily large order. Also, if $D_n \neq 0$, then $D_2D_{2n-2} > 0, D_4D_{2n-4} > 0, \dots$. So the existence of any nonzero coefficient of order $n$ implies all coefficients of order $2, 4, \dots, 2n-2$ are positive.
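A Poisson counting process with unit jumps is a simple case where the second alternative of the theorem occurs: its increment over a time $\tau$ is Poisson distributed with mean $\lambda\tau$, so every moment of the increment is $\lambda\tau + O(\tau^2)$ and $D_n = \lambda / n!$ for all $n \geq 1$. The sketch below checks this numerically and also checks the inequality used in the proof. The rate `lam`, the small step `tau`, and the function names are illustrative assumptions.

```python
from math import exp, factorial

def poisson_increment_moment(n, lam, tau, k_max=40):
    """n-th moment of the increment of a rate-lam Poisson process over time tau.

    Computed by summing k**n against the Poisson(lam*tau) probabilities; the
    sum is cut off at k_max, which is ample when lam*tau is small.
    """
    rate = lam * tau
    return sum(k**n * exp(-rate) * rate**k / factorial(k) for k in range(1, k_max))

def km_coefficient(n, lam, tau=1e-6):
    """Finite-tau approximation of D_n = lim mu_n / (n! * tau)."""
    return poisson_increment_moment(n, lam, tau) / (factorial(n) * tau)

def pawula_bound_holds(D, n, m, rel_tol=1e-9):
    """Check D_{n+m}^2 <= (2n)! (2m)! / ((n+m)!)^2 * D_{2n} * D_{2m}."""
    lhs = D[n + m] ** 2
    rhs = factorial(2 * n) * factorial(2 * m) / factorial(n + m) ** 2 * D[2 * n] * D[2 * m]
    return lhs <= rhs * (1 + rel_tol)   # small slack for floating-point rounding

lam = 2.0
D = {n: km_coefficient(n, lam) for n in range(1, 9)}

for n in range(1, 9):
    print(n, D[n], lam / factorial(n))  # numerical estimate next to lam / n!

checks = [pawula_bound_holds(D, n, m) for n in range(1, 5) for m in range(1, 5)]
print("inequality holds for all tested pairs:", all(checks))
```

For the exact coefficients $D_n = \lambda/n!$ the inequality holds with equality, so this process sits on the boundary of the bound.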

=== Interpretation ===
Let the operator $\mathcal A_m$ be defined such that $\mathcal A_m f := \sum_{n=1}^m (-\partial_x)^n[D_n(x,t) f(x,t)]$. The probability density evolves by $\partial_t\rho \approx \mathcal A_m \rho$. Different values of $m$ give different levels of approximation.
- $m=0$: the probability density does not evolve.
- $m=1$: it evolves by deterministic drift only.
- $m=2$: it evolves by drift and Brownian motion (Fokker–Planck equation).
- $m=\infty$: the fully exact equation.

The Pawula theorem means that if truncating at second order is not exact, that is, if $\mathcal A_2 \neq \mathcal A$, then truncating at any finite order is not exact either. Usually, this means that for any truncation $\mathcal A_m$ with $m \geq 3$, there exists a probability density function $\rho$ that becomes negative during its evolution $\partial_t\rho \approx\mathcal A_m \rho$ (and thus fails to be a probability density function). However, this does not mean that Kramers–Moyal expansions truncated at other values of $m$ are useless. Though the solution must have negative values at least for sufficiently small times, the resulting approximate probability density may still be better than the $m=2$ approximation.
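The loss of positivity can be seen in a small numerical experiment. The sketch below assumes the constant coefficients $D_n = \lambda/n!$ of a unit-jump Poisson process, for which the truncated equation can be solved exactly in Fourier space, and compares $m=2$ with $m=3$ starting from a narrow Gaussian. The grid size, the initial width `s0`, and the function name are illustrative assumptions. NumPy's FFT uses the kernel $e^{-i\omega x}$, so $-\partial_x$ becomes multiplication by $-i\omega$.

```python
import numpy as np
from math import factorial

def truncated_poisson_density(m, lam=1.0, t=0.1, s0=0.05, n=4096, half_width=20.0):
    """Solve d/dt p = sum_{k=1}^m (-d/dx)^k [D_k p] with D_k = lam / k! by FFT.

    Only exercised here for m = 2 and m = 3. Some other finite orders give a
    growing Fourier multiplier and are not handled by this sketch.
    """
    x = np.linspace(-half_width, half_width, n, endpoint=False)
    h = x[1] - x[0]
    # Narrow Gaussian initial density centered at 0
    p0 = np.exp(-x**2 / (2 * s0**2)) / np.sqrt(2 * np.pi * s0**2)
    omega = 2 * np.pi * np.fft.fftfreq(n, d=h)
    # Fourier symbol of the truncated operator with constant coefficients
    symbol = sum(lam / factorial(k) * (-1j * omega) ** k for k in range(1, m + 1))
    p_hat = np.fft.fft(p0) * np.exp(t * symbol)
    return x, h, np.fft.ifft(p_hat).real

x, h, p2 = truncated_poisson_density(m=2)
_, _, p3 = truncated_poisson_density(m=3)

print("m=2: total mass", p2.sum() * h, "minimum", p2.min())
print("m=3: total mass", p3.sum() * h, "minimum", p3.min())
```

Both truncations conserve total probability, since the Fourier symbol vanishes at zero frequency. The $m=2$ solution is a drifting, spreading Gaussian, while the $m=3$ solution develops an oscillating tail that dips below zero.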
