Filtering problem (stochastic processes)

In the theory of stochastic processes, the filtering problem is a mathematical model for a number of state estimation problems in signal processing and related fields. The general idea is to establish a "best estimate" for the true value of some system from an incomplete, potentially noisy set of observations on that system. The problem of optimal non-linear filtering (even for the non-stationary case) was solved by Ruslan L. Stratonovich (1959,[1] 1960[2]); see also the work of Harold J. Kushner[3] and of Moshe Zakai, who introduced a simplified dynamics for the unnormalized conditional law of the filter,[4] known as the Zakai equation. The solution, however, is infinite-dimensional in the general case.[5] Certain approximations and special cases are well understood: for example, linear filters are optimal for Gaussian random variables, and are known as the Wiener filter and the Kalman-Bucy filter. More generally, because the solution is infinite-dimensional, it must be replaced by a finite-dimensional approximation before it can be implemented on a computer with finite memory. A finite-dimensional approximate nonlinear filter may be based more on heuristics, such as the extended Kalman filter or the assumed density filters,[6] or be more methodologically oriented, such as the projection filters,[7] some sub-families of which are shown to coincide with the assumed density filters.[8]

In general, if the separation principle applies, then filtering also arises as part of the solution of an optimal control problem. For example, the Kalman filter is the estimation part of the optimal control solution to the linear-quadratic-Gaussian control problem.

The mathematical formalism

Consider a probability space (Ω, Σ, P) and suppose that the (random) state Y_t in n-dimensional Euclidean space R^n of a system of interest at time t is a random variable Y_t : Ω → R^n given by the solution to an Itō stochastic differential equation of the form

${\displaystyle \mathrm {d} Y_{t}=b(t,Y_{t})\,\mathrm {d} t+\sigma (t,Y_{t})\,\mathrm {d} B_{t},}$

where B denotes standard p-dimensional Brownian motion, b : [0, +∞) × R^n → R^n is the drift field, and σ : [0, +∞) × R^n → R^(n×p) is the diffusion field. It is assumed that observations H_t in R^m (note that m and n may, in general, be unequal) are taken for each time t according to

${\displaystyle H_{t}=c(t,Y_{t})+\gamma (t,Y_{t})\cdot {\mbox{noise}}.}$

Adopting the Itō interpretation of the stochastic differential and setting

${\displaystyle Z_{t}=\int _{0}^{t}H_{s}\,\mathrm {d} s,}$

this gives the following stochastic integral representation for the observations Z_t:

${\displaystyle \mathrm {d} Z_{t}=c(t,Y_{t})\,\mathrm {d} t+\gamma (t,Y_{t})\,\mathrm {d} W_{t},}$

where W denotes standard r-dimensional Brownian motion, independent of B and the initial condition Y_0, and c : [0, +∞) × R^n → R^m and γ : [0, +∞) × R^n → R^(m×r) satisfy

${\displaystyle {\big |}c(t,x){\big |}+{\big |}\gamma (t,x){\big |}\leq C{\big (}1+|x|{\big )}}$

for all t and x and some constant C.
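The pair of equations for the state and the observations can be simulated numerically. The following is a minimal sketch using the Euler-Maruyama discretisation in the scalar case (n = m = p = r = 1). The function name `simulate_state_and_observation`, the step count and the particular coefficients in the usage lines are illustrative assumptions, not part of the formalism above.

```python
import numpy as np

def simulate_state_and_observation(b, sigma, c, gamma, y0,
                                   T=1.0, n_steps=1000, seed=0):
    """Euler-Maruyama sketch of the scalar system
        dY = b(t, Y) dt + sigma(t, Y) dB,
        dZ = c(t, Y) dt + gamma(t, Y) dW,   Z_0 = 0,
    where B and W are independent Brownian motions."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    y = np.empty(n_steps + 1)
    z = np.empty(n_steps + 1)
    y[0], z[0] = y0, 0.0
    for k in range(n_steps):
        # Independent Brownian increments, each with variance dt.
        dB = rng.normal(0.0, np.sqrt(dt))
        dW = rng.normal(0.0, np.sqrt(dt))
        y[k + 1] = y[k] + b(t[k], y[k]) * dt + sigma(t[k], y[k]) * dB
        z[k + 1] = z[k] + c(t[k], y[k]) * dt + gamma(t[k], y[k]) * dW
    return t, y, z

# Hypothetical coefficients: a mean-reverting state observed linearly.
t, y, z = simulate_state_and_observation(
    b=lambda t, y: -y, sigma=lambda t, y: 0.5,
    c=lambda t, y: y, gamma=lambda t, y: 0.2, y0=1.0)
```

Only the path `z` would be available to an observer; the path `y` is the hidden quantity that a filter tries to estimate.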

The filtering problem is the following: given observations Z_s for 0 ≤ s ≤ t, what is the best estimate Ŷ_t of the true state Y_t of the system based on those observations?

By "based on those observations" it is meant that Ŷ_t is measurable with respect to the σ-algebra G_t generated by the observations Z_s, 0 ≤ s ≤ t. Denote by K = K(Z, t) the collection of all R^n-valued random variables Y that are square-integrable and G_t-measurable:

${\displaystyle K=K(Z,t)=L^{2}(\Omega ,G_{t},\mathbf {P} ;\mathbf {R} ^{n}).}$

By "best estimate", it is meant that Ŷ_t minimizes the mean-square distance to Y_t among all candidates in K:

${\displaystyle \mathbf {E} \left[{\big |}Y_{t}-{\hat {Y}}_{t}{\big |}^{2}\right]=\inf _{Y\in K}\mathbf {E} \left[{\big |}Y_{t}-Y{\big |}^{2}\right].\qquad {\mbox{(M)}}}$
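The minimization (M) can be illustrated in a static toy setting rather than the continuous-time one above. The sketch below assumes a scalar state Y ~ N(0, 1) and a single observation Z = Y + noise with noise variance 0.5, and searches only over the linear candidates aZ. That family is a strict subset of K, but in this jointly Gaussian case the minimizer over all of K is known to be linear. All names and numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000
noise_var = 0.5                                   # assumed noise variance

y = rng.normal(0.0, 1.0, n_samples)               # hidden state Y ~ N(0, 1)
z = y + rng.normal(0.0, np.sqrt(noise_var), n_samples)   # observation Z

# Monte Carlo estimate of E|Y - a Z|^2 for each candidate gain a.
gains = np.linspace(0.0, 1.0, 101)
mse = np.array([np.mean((y - a * z) ** 2) for a in gains])
best_gain = gains[np.argmin(mse)]

# Exact minimiser of (1 - a)^2 + a^2 * noise_var.
theory_gain = 1.0 / (1.0 + noise_var)
```

Up to sampling error and the grid spacing, `best_gain` agrees with `theory_gain`, and the smallest entry of `mse` is close to noise_var / (1 + noise_var).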

Basic result: orthogonal projection

The space K(Z, t) of candidates is a Hilbert space, and the general theory of Hilbert spaces implies that the solution Ŷ_t of the minimization problem (M) is given by

${\displaystyle {\hat {Y}}_{t}=P_{K(Z,t)}{\big (}Y_{t}{\big )},}$

where P_K(Z,t) denotes the orthogonal projection of L^2(Ω, Σ, P; R^n) onto the linear subspace K(Z, t) = L^2(Ω, G_t, P; R^n). Furthermore, it is a general fact about conditional expectations that if F is any sub-σ-algebra of Σ then the orthogonal projection

${\displaystyle P_{K}:L^{2}(\Omega ,\Sigma ,\mathbf {P} ;\mathbf {R} ^{n})\to L^{2}(\Omega ,F,\mathbf {P} ;\mathbf {R} ^{n})}$

is exactly the conditional expectation operator E[·|F], i.e.,

${\displaystyle P_{K}(X)=\mathbf {E} {\big [}X{\big |}F{\big ]}.}$

Hence,

${\displaystyle {\hat {Y}}_{t}=P_{K(Z,t)}{\big (}Y_{t}{\big )}=\mathbf {E} {\big [}Y_{t}{\big |}G_{t}{\big ]}.}$
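The identification of the orthogonal projection with the conditional expectation can be checked exactly on a finite probability space, where a sub-σ-algebra F is generated by a partition and the F-measurable random variables are those that are constant on each cell. The sketch below uses a hypothetical six-point sample space. The probabilities, the values of X and the partition are arbitrary illustrative choices.

```python
import numpy as np

# Hypothetical finite probability space with six outcomes.
p = np.array([0.1, 0.2, 0.1, 0.25, 0.15, 0.2])    # P({omega_i}), sums to 1
x = np.array([1.0, 4.0, -2.0, 0.5, 3.0, -1.0])    # X(omega_i)
cell = np.array([0, 0, 1, 1, 1, 2])               # partition generating F

# Conditional expectation E[X | F]: probability-weighted mean on each cell.
cond_exp = np.empty_like(x)
for k in np.unique(cell):
    mask = cell == k
    cond_exp[mask] = np.sum(p[mask] * x[mask]) / np.sum(p[mask])

# Orthogonal projection in L^2(P) onto the span of the cell indicators,
# computed as a weighted least-squares problem with weights P({omega_i}).
indicators = (cell[:, None] == np.unique(cell)[None, :]).astype(float)
w = np.sqrt(p)
coef, *_ = np.linalg.lstsq(indicators * w[:, None], x * w, rcond=None)
projection = indicators @ coef

# The residual X - E[X | F] is orthogonal to every cell indicator:
# each entry is E[(X - E[X | F]) 1_A] for one cell A.
inner_products = indicators.T @ (p * (x - cond_exp))
```

Here `cond_exp` and `projection` coincide up to floating-point error, and `inner_products` vanishes, which is the orthogonality property that characterises the projection.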

This elementary result is the basis for the general Fujisaki-Kallianpur-Kunita equation of filtering theory.
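In the linear Gaussian special case mentioned in the introduction, the conditional expectation E[Y_t | G_t] is finite-dimensional and is given by the Kalman-Bucy filter. The following is a minimal sketch for the scalar model dY = aY dt + σ dB, dZ = cY dt + γ dW with constant coefficients and a Gaussian initial condition of mean m0 and variance p0. It uses the standard scalar Kalman-Bucy equations, which are not derived in this article, and discretises them with a simple Euler scheme. The function name and all parameter values are illustrative assumptions.

```python
import numpy as np

def kalman_bucy_scalar(a, sigma, c, gamma, m0, p0,
                       T=5.0, n_steps=5000, seed=0):
    """Euler sketch of the scalar Kalman-Bucy filter
        dY_hat = a Y_hat dt + (c P / gamma^2) (dZ - c Y_hat dt),
        dP/dt  = 2 a P + sigma^2 - c^2 P^2 / gamma^2,
    run on a simulated path of the state Y and the observation Z."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    y = m0 + np.sqrt(p0) * rng.normal()   # Y_0 ~ N(m0, p0)
    y_hat, p = m0, p0
    ys = np.empty(n_steps + 1)
    y_hats = np.empty(n_steps + 1)
    ps = np.empty(n_steps + 1)
    ys[0], y_hats[0], ps[0] = y, y_hat, p
    for k in range(n_steps):
        dB, dW = rng.normal(0.0, np.sqrt(dt), 2)
        dZ = c * y * dt + gamma * dW              # observation increment
        y = y + a * y * dt + sigma * dB           # hidden state update
        gain = c * p / gamma**2
        y_hat = y_hat + a * y_hat * dt + gain * (dZ - c * y_hat * dt)
        p = p + (2.0 * a * p + sigma**2 - (c * p / gamma) ** 2) * dt
        ys[k + 1], y_hats[k + 1], ps[k + 1] = y, y_hat, p
    return ys, y_hats, ps

# Hypothetical parameter values, chosen only for illustration.
ys, y_hats, ps = kalman_bucy_scalar(a=-1.0, sigma=0.5, c=1.0, gamma=0.2,
                                    m0=0.0, p0=1.0)
```

The filter only ever sees the increments `dZ`. The error variance `ps` evolves deterministically and, for these parameters, settles at the positive root of the quadratic 2aP + σ² − c²P²/γ² = 0.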