= Projection filters =

Projection filters are a set of algorithms based on stochastic analysis and information geometry, or the differential geometric approach to statistics, used to find approximate solutions for filtering problems for nonlinear state-space systems.
The filtering problem consists of estimating the unobserved signal of a random dynamical system from partial noisy observations of the signal. The objective is to compute the probability distribution of the signal conditional on the history of the noise-perturbed observations. This distribution allows all statistics of the signal given the history of observations to be calculated. If this distribution has a density, the density satisfies a specific stochastic partial differential equation (SPDE) called the Kushner-Stratonovich equation or, for an unnormalized version of the density, the Zakai equation.
It is known that the nonlinear filter density evolves in an infinite dimensional function space.

One can choose a finite dimensional family of probability densities, for example Gaussian densities, Gaussian mixtures, or exponential families, on which the infinite-dimensional filter density can be approximated. The basic idea of the projection filter is to use a geometric structure in the chosen spaces of densities to project the infinite dimensional SPDE of the optimal filter onto the chosen finite dimensional family, obtaining a finite dimensional stochastic differential equation (SDE) for the parameter of the density in the finite dimensional family that approximates the full filter evolution. To do this, the chosen finite dimensional family is equipped with a manifold structure as in information geometry.
The projection filter was tested against the optimal filter for the cubic sensor problem. The projection filter could effectively track bimodal densities of the optimal filter that would have been difficult to approximate with standard algorithms like the extended Kalman filter.
Projection filters are well suited to online estimation: they are quick to implement and reduce the filter to a finite dimensional SDE for the parameter, which can be solved numerically at low cost.
Projection filters are also flexible, as they allow fine-tuning the precision of the approximation by choosing richer approximating families, and some exponential families make the correction step in the projection filtering algorithm exact. Some formulations coincide with heuristic-based assumed density filters or with Galerkin methods. Projection filters can also approximate the full infinite-dimensional filter in an optimal way, beyond the optimal approximation of the SPDE coefficients alone, according to precise criteria such as mean square minimization. Projection filters have been studied by the Swedish Defense Research Agency and have also been applied in a variety of fields including navigation, ocean dynamics, quantum optics and quantum systems, estimation of fiber diameters, estimation of chaotic time series, change point detection and other areas.

== History and development ==

The term "projection filter" was first coined in 1987 by Bernard Hanzon, and the related theory and numerical examples were fully developed, expanded and made rigorous during the Ph.D. work of Damiano Brigo, in collaboration with Bernard Hanzon and Francois LeGland.
These works dealt with the projection filters in Hellinger distance and Fisher information metric, which were used to project the infinite-dimensional SPDE of the optimal filter onto a chosen exponential family. The exponential family can be chosen so as to make the correction step of the filtering algorithm exact.
A different type of projection filter, based on an alternative projection metric, the direct $L^2$ metric, was introduced in Armstrong and Brigo (2016). With this metric, the projection filters on families of mixture distributions coincide with filters based on Galerkin methods. Later, Armstrong, Brigo and Rossi Ferrucci (2021) derived optimal projection filters that satisfy specific optimality criteria in approximating the infinite dimensional optimal filter. Indeed, the Stratonovich-based projection filters optimize the approximation of the separate coefficients of the SPDE on the chosen manifold, but not of the SPDE solution as a whole. The optimal projection filters address this. The innovation is to work directly with Ito calculus, instead of resorting to the Stratonovich calculus version of the filter equation. This builds on research on the geometry of Ito stochastic differential equations on manifolds based on the jet bundle, the so-called 2-jet interpretation of Ito stochastic differential equations on manifolds.

== Projection filters derivation ==
Here the derivation of the different projection filters is sketched.

=== Stratonovich-based projection filters ===
This is a derivation of both the initial filter in Hellinger/Fisher metric sketched by Hanzon and fully developed by Brigo, Hanzon and LeGland, and the later projection filter in direct $L^2$ metric by Armstrong and Brigo (2016).

It is assumed that the unobserved random signal $X_t \in \R^m$ is modelled by the Ito stochastic differential equation:
$d X_t = f(X_t,t) \, d t + \sigma(X_t,t) \, d W_t$
where $f$ and $\sigma\, dW$ are $\R^m$ valued and $W_t$ is a Brownian motion. Validity of all regularity conditions necessary for the results to hold will be assumed, with details given in the references. The associated noisy observation process $Y_t \in \R^d$ is modelled by
$d Y_t = b(X_t,t) \, d t + d V_t$
where $b$ is $\R^d$ valued and $V_t$ is a Brownian motion independent of $W_t$. As hinted above, the full filter is the conditional distribution of $X_t$ given a prior for $X_0$ and the history of $Y$ up to time $t$. If this distribution has a density, described informally as
$p_t(x)\,dx = \text{Prob}\{X_t \in dx \mid \sigma(Y_s, s\leq t)\}$
where $\sigma(Y_s, s\leq t)$ is the sigma-field generated by the history of noisy observations $Y$ up to time $t$, then under suitable technical conditions the density $p_t$ satisfies the Kushner-Stratonovich SPDE:
$d p_t = {\cal L}^*_t p_t \ d t
+ p_t[b(\cdot,t) - E_{p_t}(b(\cdot,t))]^T [ d Y_t - E_{p_t}(b(\cdot,t)) dt]$

where $E_p$ is the expectation
$E_p[h] = \int h(x) p(x) dx,$
and the forward diffusion operator ${\cal L}^*_t$ is
${\cal L}_t^* p = - \sum_{i=1}^m \frac{\partial}{\partial x_i} [ f_i(x,t) p(x) ] + \frac{1}{2} \sum_{i,j=1}^m \frac{\partial^2}{\partial x_i \partial x_j} [a_{ij}(x,t) p(x)]$

where $a=\sigma \sigma^T$ and $T$ denotes transposition.
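
The forward operator can be approximated on a grid. The following is a minimal sketch for the scalar case $m=1$, assuming a uniform grid and central finite differences; the helper name `forward_operator` and the example coefficients ($f(x)=-x$, $a(x)=1$) are illustrative choices, not taken from the references. Since ${\cal L}^*_t$ is in divergence form, $\int {\cal L}^*_t p \, dx = 0$ for densities that vanish at the boundary, which gives a cheap sanity check.

```python
import numpy as np

def forward_operator(p, x, f, a):
    """Finite-difference approximation of L* p = -(f p)' + 0.5 (a p)''
    on a uniform 1-D grid x (illustrative helper, scalar state only)."""
    dx = x[1] - x[0]
    drift_part = -np.gradient(f(x) * p, dx)
    diffusion_part = 0.5 * np.gradient(np.gradient(a(x) * p, dx), dx)
    return drift_part + diffusion_part

# Example: Gaussian density with mean 0.3 and variance 0.5,
# signal dX = -X dt + dW (assumed here purely for illustration).
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
p = np.exp(-0.5 * (x - 0.3) ** 2 / 0.5) / np.sqrt(2.0 * np.pi * 0.5)

Lp = forward_operator(p, x, f=lambda x: -x, a=lambda x: np.ones_like(x))

# L* preserves total probability mass, so this should be close to zero.
mass_rate = np.sum(Lp) * dx
```

For multidimensional signals the same idea applies with partial derivatives in each coordinate, although grid methods become expensive as $m$ grows.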
To derive the first version of the projection filters, one needs to put the $p_t$ SPDE in Stratonovich form. One obtains
$d p_t = {\cal L}^\ast_t\, p_t\,dt
   - \frac{1}{2}\, p_t\, [\vert b(\cdot,t) \vert^2 - E_{p_t}\{\vert b(\cdot,t) \vert^2\}] \,dt
   + p_t\, [b(\cdot,t)-E_{p_t}\{b(\cdot,t)\}]^T \circ dY_t\ .$
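
The extra drift term can be checked with a short formal computation. This is a sketch only, assuming that $Y$ has the quadratic variation of a standard $d$-dimensional Brownian motion and writing $G_k(p) = p\,[b_k(\cdot,t) - E_p\{b_k(\cdot,t)\}]$ for the $k$-th Ito diffusion coefficient. The Ito and Stratonovich integrals differ by half the derivative of each $G_k$ in its own direction:
$p_t\,[b - E_{p_t}\{b\}]^T dY_t = p_t\,[b - E_{p_t}\{b\}]^T \circ dY_t - \frac{1}{2} \sum_{k=1}^d DG_k(p_t)[G_k(p_t)]\, dt ,$
where
$DG_k(p)[G_k(p)] = p\, \left[ (b_k - E_p\{b_k\})^2 - E_p\{ (b_k - E_p\{b_k\})^2 \} \right] .$
Adding this correction to the Ito drift term $- p_t\,[b - E_{p_t}\{b\}]^T E_{p_t}\{b\}\, dt$ and expanding the squares, the terms containing $E_{p_t}\{b\}$ cancel and one is left with
$- \frac{1}{2}\, p_t\, [\vert b \vert^2 - E_{p_t}\{\vert b \vert^2\}] \,dt ,$
which is the second term of the Stratonovich equation above.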

Through the chain rule, the SPDE for $\sqrt{p_t}$ follows immediately.
To shorten notation one may rewrite the Stratonovich SPDE for $p_t$ as
$dp = F(p) \,dt + G^T(p) \circ dY\ ,$

where the operators $F(p)$ and $G^T(p)$ are defined as
$F(p) = {\cal L}^\ast_t\, p\,
   - \frac{1}{2}\, p\, [\vert b(\cdot,t) \vert^2 - E_{p}\{\vert b(\cdot,t) \vert^2\}],$

$G^T(p) = p\, [b(\cdot,t)-E_{p}\{b(\cdot,t)\}]^T.$

The square root version is
$d \sqrt{p} = \frac{1}{2 \sqrt{p}}[ F(p) \,dt + G^T(p) \circ dY]\ .$

These are Stratonovich SPDEs whose solutions evolve in infinite dimensional function spaces. For example $p_t$ may evolve in $L^2$ (direct metric $d_2$)
$d_2(p_1,p_2)= \Vert p_1- p_2 \Vert\ , \ \ p_{1,2}\in L^2$

or $\sqrt{p_t}$ may evolve in $L^2$ (Hellinger metric $d_H$)
$d_H(\sqrt{p_1},\sqrt{p_2})= \Vert \sqrt{p_1}-\sqrt{p_2} \Vert , \ \ \ p_{1,2}\in L^1$

where $\Vert\cdot\Vert$ is the norm of Hilbert space $L^2$.
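
As a numerical illustration of the two distances, the sketch below evaluates $d_2$ and $d_H$ between two Gaussian densities by a Riemann sum on a grid. The function names and the example parameters are hypothetical. Note that $d_H$ as defined here carries no factor $1/\sqrt{2}$, so it takes values in $[0,\sqrt{2}]$.

```python
import numpy as np

def gaussian(x, mean, var):
    """Gaussian density with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def direct_distance(p1, p2, dx):
    """d_2: L2 norm of the difference of the densities."""
    return np.sqrt(np.sum((p1 - p2) ** 2) * dx)

def hellinger_distance(p1, p2, dx):
    """d_H: L2 norm of the difference of the square roots of the densities."""
    return np.sqrt(np.sum((np.sqrt(p1) - np.sqrt(p2)) ** 2) * dx)

x = np.linspace(-12.0, 12.0, 8001)
dx = x[1] - x[0]
p1 = gaussian(x, 0.0, 1.0)
p2 = gaussian(x, 1.0, 2.0)

d2 = direct_distance(p1, p2, dx)
dH = hellinger_distance(p1, p2, dx)
```

The Hellinger distance is always finite for probability densities, whereas the direct distance requires the densities themselves to be square integrable.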
In any case, $p_t$ (or $\sqrt{p_t}$) will not evolve inside any finite dimensional family of densities,
$S_\Theta=\{p(\cdot, \theta), \ \theta \in \Theta \subset \R^n\} \ (\text{or} \ S_\Theta^{1/2}=\{\sqrt{p(\cdot, \theta)}, \ \theta \in \Theta \subset \R^n\}).$

The projection filter idea is approximating $p_t(x)$ (or $\sqrt{p_t(x)}$) via a finite dimensional density $p(x,\theta_t)$ (or $\sqrt{p(x,\theta_t)}$).

The fact that the filter SPDE is in Stratonovich form allows for the following. As Stratonovich SPDEs satisfy the chain rule, $F$ and $G$ behave as vector fields. Thus, the equation is characterized by a $dt$ vector field $F$ and a $dY_t$ vector field $G$. For this version of the projection filter one is satisfied with dealing with the two vector fields separately.
One may project $F$ and $G$ on the tangent space of the densities in $S_\Theta$ (direct metric) or of their square roots (Hellinger metric). The direct metric case yields
$dp(\cdot,\theta_t) = \Pi_{p(\cdot,\theta_t)}[F(p(\cdot,\theta_t))] \,dt + \Pi_{p(\cdot,\theta_t)}[G^T(p(\cdot,\theta_t))] \circ dY_t\ ,$

where $\Pi_{p(\cdot,\theta)}$ is the tangent space projection at the point $p(\cdot,\theta)$ for the manifold $S_\Theta$, and where, when applied to a vector such as $G^T$, it is assumed to act component-wise by projecting each of $G^T$'s components. As a basis of this tangent space is
$\left\{ \frac{\partial p(\cdot,\theta)}{\partial \theta_1},\cdots,
   \frac{\partial p(\cdot,\theta)}{\partial \theta_n} \right\},$
by denoting the inner product of $L^2$ with $\langle \cdot, \cdot \rangle$, one defines the metric
$\gamma_{ij}(\theta) = \left\langle \frac{\partial {p(\cdot,\theta)}}{
   \partial \theta_i}\, , \frac{\partial {p(\cdot,\theta)}}{
   \partial \theta_j} \right\rangle =
    \int \frac{\partial p(x,\theta)}{\partial \theta_i}\,
   \frac{\partial p(x,\theta)}{\partial \theta_j}\, d x$

and the projection is thus
$\Pi^\gamma_{p(\cdot,\theta)} [v] = \sum_{i=1}^n \left[ \sum_{j=1}^n
   \gamma^{ij}(\theta)\; \left\langle v,\,
   \frac{\partial {p(\cdot,\theta)}}{\partial \theta_j} \right\rangle \right]\;
   \frac{\partial {p(\cdot,\theta)}}{\partial \theta_i}$
where $\gamma^{ij}$ is the inverse of $\gamma_{ij}$.
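
The projection formula amounts to solving a small linear system. The sketch below illustrates this for the Gaussian family with $\theta = (\text{mean}, \text{variance})$ in one dimension, using grid quadrature for the inner products; the family, the grid and the function names are assumptions made for the example.

```python
import numpy as np

def gaussian(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def tangent_basis(x, theta):
    """Derivatives of the Gaussian density w.r.t. theta = (mean, variance)."""
    mean, var = theta
    p = gaussian(x, mean, var)
    d_mean = p * (x - mean) / var
    d_var = p * ((x - mean) ** 2 - var) / (2.0 * var ** 2)
    return np.stack([d_mean, d_var])

def project_direct(v, basis, dx):
    """Coefficients sum_j gamma^{ij} <v, d_j p> of the L2 projection of v."""
    gamma = basis @ basis.T * dx      # gamma_ij = <d_i p, d_j p>
    rhs = basis @ v * dx              # <v, d_j p>
    return np.linalg.solve(gamma, rhs)

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
theta = (0.3, 0.5)
basis = tangent_basis(x, theta)

# A vector already in the tangent space is reproduced exactly.
coeffs = project_direct(2.0 * basis[0] - 3.0 * basis[1], basis, dx)

# For a general v, the residual is orthogonal to the tangent space.
v = x * gaussian(x, *theta)
c = project_direct(v, basis, dx)
residual = v - c @ basis
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse matrix $\gamma^{ij}$ explicitly, which is usually preferable numerically.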
The projected equation thus reads
$d p(\cdot, \theta_t) = \Pi_{p(\cdot,\theta_t)}[F(p(\cdot, \theta_t))] dt + \Pi_{p(\cdot,\theta_t)}[G^T(p(\cdot, \theta_t))] \circ dY_t$

which can be written as
$\sum_{i=1}^n \frac{\partial p(\cdot, \theta_t)}{\partial \theta_i}\circ d \theta_i =
\sum_{i=1}^n \left[ \sum_{j=1}^n
   \gamma^{ij}(\theta)\; \left\langle F(p(\cdot, \theta_t)),\,
   \frac{\partial {p(\cdot,\theta)}}{\partial \theta_j} \right\rangle \right]\;
   \frac{\partial {p(\cdot,\theta)}}{\partial \theta_i} dt +
\sum_{i=1}^n \left[ \sum_{j=1}^n
   \gamma^{ij}(\theta)\; \left\langle G^T(p(\cdot, \theta_t)),\,
   \frac{\partial {p(\cdot,\theta)}}{\partial \theta_j} \right\rangle \right]\;
   \frac{\partial {p(\cdot,\theta)}}{\partial \theta_i} \circ dY_t ,$

where it has been crucial that Stratonovich calculus obeys the chain rule. From the above equation, the final projection filter SDE is
$d \theta_i =
    \left[\sum_{j=1}^n \gamma^{ij}(\theta_t)\;
   \int F(p(x, \theta_t)) \;
   \frac{\partial p(x,\theta_t)}{\partial \theta_j} dx \right] dt +
   \sum_{k=1}^d\; \left[ \sum_{j=1}^n \gamma^{ij}(\theta_t)\;
   \int G_k(p(x, \theta_t)) \;
   \frac{\partial p(x,\theta_t)}{\partial \theta_j}\; d x \right]
   \circ dY_t^k$

with initial condition a chosen $\theta_0$.
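
The coefficients of this SDE can be evaluated numerically for a given $\theta$. The following is a minimal sketch for a scalar signal and scalar observation, assuming the Gaussian family with $\theta = (\text{mean}, \text{variance})$, grid quadrature and finite differences; the function name and the test system are illustrative. For the linear system $dX = -X\,dt + dW$, $dY = X\,dt + dV$ the true filter is Gaussian, so the projection is exact and the coefficients should reproduce the Kalman-Bucy equations $d\mu = -(1+v)\mu\,dt + v \circ dY$, $dv = (1 - 2v - v^2)\,dt$.

```python
import numpy as np

def gaussian(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def direct_metric_coefficients(theta, x, f, a, b):
    """dt and dY coefficients of the Stratonovich SDE for theta = (mean, variance),
    direct L2 metric, scalar signal and observation (illustrative sketch)."""
    mean, var = theta
    dx = x[1] - x[0]
    p = gaussian(x, mean, var)
    # Tangent basis: derivatives of p with respect to mean and variance.
    basis = np.stack([p * (x - mean) / var,
                      p * ((x - mean) ** 2 - var) / (2.0 * var ** 2)])
    # F(p) = L* p - 0.5 p (|b|^2 - E_p|b|^2) and G(p) = p (b - E_p b).
    Lp = -np.gradient(f(x) * p, dx) + 0.5 * np.gradient(np.gradient(a(x) * p, dx), dx)
    b2 = b(x) ** 2
    F = Lp - 0.5 * p * (b2 - np.sum(b2 * p) * dx)
    G = p * (b(x) - np.sum(b(x) * p) * dx)
    gamma = basis @ basis.T * dx
    drift = np.linalg.solve(gamma, basis @ F * dx)
    diffusion = np.linalg.solve(gamma, basis @ G * dx)
    return drift, diffusion

x = np.linspace(-10.0, 10.0, 4001)
drift, diffusion = direct_metric_coefficients(
    (0.3, 0.5), x,
    f=lambda x: -x, a=lambda x: np.ones_like(x), b=lambda x: x)
```

Replacing `b` with `lambda x: x ** 3` gives the coefficients for a cubic sensor, where the Gaussian family is only an approximation. The resulting SDE is in Stratonovich form, so a time discretization should use a Stratonovich-consistent scheme or be converted to Ito form first.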

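By substituting the definition of the operators $F$ and $G$ one obtains the fully explicit projection filter equation in direct metric:

$d \theta_i(t)
   =
    \left[\sum_{j=1}^n \gamma^{ij}(\theta_t)\;
   \int \left( {\cal L}^\ast_t\, p(x,\theta_t)
   - \frac{1}{2}\, p(x,\theta_t)\, [\vert b(x,t) \vert^2 - E_{p(\cdot,\theta_t)}\{\vert b(\cdot,t) \vert^2\}] \right)
   \frac{\partial p(x,\theta_t)}{\partial \theta_j}\, dx \right] dt
   + \sum_{k=1}^d\; \left[ \sum_{j=1}^n \gamma^{ij}(\theta_t)\;
   \int p(x,\theta_t)\, [b_k(x,t) - E_{p(\cdot,\theta_t)}\{b_k(\cdot,t)\}]\;
   \frac{\partial p(x,\theta_t)}{\partial \theta_j}\; dx \right]
   \circ dY_t^k\ .$

For the Hellinger distance, square roots of densities are needed. A basis of the tangent space of $S_\Theta^{1/2}$ is
$\left\{ \frac{\partial \sqrt{p(\cdot,\theta)}}{\partial \theta_1},\cdots,
   \frac{\partial \sqrt{p(\cdot,\theta)}}{\partial \theta_n} \right\},$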

and one defines the metric
$\frac{1}{4} g_{ij}(\theta) =
 \left \langle \frac{\partial \sqrt{p}}{
   \partial \theta_i}\, , \frac{\partial \sqrt{p}}{
   \partial \theta_j}\right \rangle
  = \frac{1}{4} \int \frac{1}{p(x,\theta)}\,
   \frac{\partial p(x,\theta)}{\partial \theta_i}\,
   \frac{\partial p(x,\theta)}{\partial \theta_j}\,
   d x .$

The metric $g$ is the Fisher information metric. One follows steps completely analogous to the direct metric case and the filter equation in Hellinger/Fisher metric is
$d \theta_i = \left[ \sum_{j=1}^n g^{ij}(\theta_t)\;
   \int \frac{F(p(x,\theta_t))}{p(x,\theta_t)}\;
   \frac{\partial p(x,\theta_t)}{\partial \theta_j}\;
   dx \right] dt + \sum_{k=1}^d\; \left[ \sum_{j=1}^n g^{ij}(\theta_t)\;
   \int \frac{G_k(p(x,\theta_t))}{p(x,\theta_t)}\;
   \frac{\partial p(x,\theta_t)}{\partial \theta_j}\; dx \right]
   \circ dY_t^k\ ,$
again with initial condition a chosen $\theta_0$.
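
A numerical sketch of the Hellinger/Fisher version follows, again assuming a scalar signal and observation and the Gaussian family with $\theta = (\text{mean}, \text{variance})$; names and test system are illustrative. Since $\frac{1}{p}\frac{\partial p}{\partial \theta_j} = \frac{\partial \log p}{\partial \theta_j}$, the integrals are written in terms of the score functions, which avoids dividing by very small density values in the tails.

```python
import numpy as np

def gaussian(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def fisher_metric_coefficients(theta, x, f, a, b):
    """Fisher matrix g and the dt / dY coefficients of the Stratonovich SDE
    for theta = (mean, variance) in Hellinger/Fisher metric (sketch)."""
    mean, var = theta
    dx = x[1] - x[0]
    p = gaussian(x, mean, var)
    # Score functions d log p / d theta_j for the Gaussian family.
    score = np.stack([(x - mean) / var,
                      ((x - mean) ** 2 - var) / (2.0 * var ** 2)])
    g = (score * p) @ score.T * dx    # g_ij = int score_i score_j p dx
    Lp = -np.gradient(f(x) * p, dx) + 0.5 * np.gradient(np.gradient(a(x) * p, dx), dx)
    b2 = b(x) ** 2
    F = Lp - 0.5 * p * (b2 - np.sum(b2 * p) * dx)
    G = p * (b(x) - np.sum(b(x) * p) * dx)
    # int (F / p) dp/dtheta_j dx = int F score_j dx, and likewise for G.
    drift = np.linalg.solve(g, score @ F * dx)
    diffusion = np.linalg.solve(g, score @ G * dx)
    return g, drift, diffusion

x = np.linspace(-10.0, 10.0, 4001)
g, drift, diffusion = fisher_metric_coefficients(
    (0.3, 0.5), x,
    f=lambda x: -x, a=lambda x: np.ones_like(x), b=lambda x: x)
```

For this linear test system the Fisher matrix should be close to $\mathrm{diag}(1/v, 1/(2v^2))$ and the coefficients should agree with those of the direct metric, because the exact filter stays inside the Gaussian family. For nonlinear systems the two metrics generally give different approximations.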

Substituting F and G one obtains
<math> d \theta_i(t)
   =
   \left[ \sum_{j=1}^n g^{ij}(\theta_t)\;
   \int \frac
