Iterated filtering

Iterated filtering algorithms are a tool for maximum likelihood inference on partially observed dynamical systems. Stochastic perturbations to the unknown parameters are used to explore the parameter space. Applying sequential Monte Carlo (the particle filter) to this extended model results in the selection of the parameter values that are more consistent with the data. Appropriately constructed procedures, iterating with successively diminished perturbations, converge to the maximum likelihood estimate. Iterated filtering methods have so far been used most extensively to study infectious disease transmission dynamics. Case studies include cholera, Ebola virus, influenza, malaria, HIV, pertussis, poliovirus and measles. Other areas which have been proposed to be suitable for these methods include ecological dynamics and finance.

The perturbations to the parameter space play several different roles. Firstly, they smooth out the likelihood surface, enabling the algorithm to overcome small-scale features of the likelihood during early stages of the global search. Secondly, Monte Carlo variation allows the search to escape from local maxima of the likelihood. Thirdly, the iterated filtering update uses the perturbed parameter values to construct an approximation to the derivative of the log likelihood even though this quantity is not typically available in closed form. Fourthly, the parameter perturbations help to overcome numerical difficulties that can arise during sequential Monte Carlo.

Overview

The data are a time series $y_{1},\dots ,y_{N}$ collected at times $t_{1}<t_{2}<\dots <t_{N}$. The dynamic system is modeled by a Markov process $X(t)$ which is generated by a function $f(x,s,t,\theta ,W)$ in the sense that

$X(t_{n})=f(X(t_{n-1}),t_{n-1},t_{n},\theta ,W)$

where $\theta$ is a vector of unknown parameters and $W$ is some random quantity that is drawn independently each time $f(\cdot )$ is evaluated. An initial condition $X(t_{0})$ at some time $t_{0}<t_{1}$ is specified by an initialization function, $X(t_{0})=h(\theta )$. A measurement density $g(y_{n}|X_{n},t_{n},\theta )$ completes the specification of a partially observed Markov process. We present a basic iterated filtering algorithm (IF1) followed by an iterated filtering algorithm implementing an iterated, perturbed Bayes map (IF2).
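As an illustration of how the three ingredients $f$, $h$ and $g$ fit together, the following is a minimal Python sketch of a toy partially observed Markov process. The model itself (a Gaussian AR(1) state with unit-variance Gaussian measurement noise), the parameter layout `theta = (phi, x0)`, and the helper `simulate` are assumptions made for this example only; they are not part of the specification above.

```python
import numpy as np

# Toy model, assumed for illustration: theta = (phi, x0), where phi is the
# autoregression coefficient and x0 is the initial state.

def h(theta):
    """Initialization function: X(t_0) = h(theta)."""
    return theta[1]

def f(x, s, t, theta, w):
    """One-step simulator giving X(t) from X(s) = x and a fresh noise draw w.
    The times s and t are unused because this toy model assumes equally
    spaced observations."""
    return theta[0] * x + w

def g(y, x, t, theta):
    """Measurement density g(y | x, t, theta): Normal with mean x, variance 1."""
    return np.exp(-0.5 * (y - x) ** 2) / np.sqrt(2.0 * np.pi)

def simulate(theta, t0, times, rng):
    """Simulate latent states and observations at the given times."""
    x, s = h(theta), t0
    states, obs = [], []
    for t in times:
        x = f(x, s, t, theta, rng.normal())  # W is drawn anew at every call
        states.append(x)
        obs.append(x + rng.normal())         # draw y from g(. | x, t, theta)
        s = t
    return np.array(states), np.array(obs)

theta_true = np.array([0.8, 0.0])
times = np.arange(1, 51)
states, y = simulate(theta_true, 0, times, np.random.default_rng(0))
```

Only `y` would be available to an inference algorithm; `states` stands for the unobserved process $X(t_{n})$.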

Procedure: Iterated filtering (IF1)

Input: A partially observed Markov model specified as above; Monte Carlo sample size $J$; number of iterations $M$; cooling parameters $0<a<1$ and $b$; covariance matrix $\Phi$; initial parameter vector $\theta ^{(1)}$

for $m=1$ to $M$
    draw $\Theta _{F}(t_{0},j)\sim \mathrm {Normal} (\theta ^{(m)},ba^{m-1}\Phi )$ for $j=1,\dots ,J$
    set $X_{F}(t_{0},j)=h{\big (}\Theta _{F}(t_{0},j){\big )}$ for $j=1,\dots ,J$
    set ${\bar {\theta }}(t_{0})=\theta ^{(m)}$
    for $n=1$ to $N$
        draw $\Theta _{P}(t_{n},j)\sim \mathrm {Normal} (\Theta _{F}(t_{n-1},j),a^{m-1}\Phi )$ for $j=1,\dots ,J$
        set $X_{P}(t_{n},j)=f(X_{F}(t_{n-1},j),t_{n-1},t_{n},\Theta _{P}(t_{n},j),W)$ for $j=1,\dots ,J$
        set $w(n,j)=g(y_{n}|X_{P}(t_{n},j),t_{n},\Theta _{P}(t_{n},j))$ for $j=1,\dots ,J$
        draw $k_{1},\dots ,k_{J}$ such that $P(k_{j}=i)=w(n,i){\big /}\sum _{\ell }w(n,\ell )$
        set $X_{F}(t_{n},j)=X_{P}(t_{n},k_{j})$ and $\Theta _{F}(t_{n},j)=\Theta _{P}(t_{n},k_{j})$ for $j=1,\dots ,J$
        set ${\bar {\theta }}_{i}(t_{n})$ to the sample mean of $\{\Theta _{F,i}(t_{n},j),j=1,\dots ,J\}$, where the vector $\Theta _{F}$ has components $\{\Theta _{F,i}\}$
        set $V_{i}(t_{n})$ to the sample variance of $\{\Theta _{P,i}(t_{n},j),j=1,\dots ,J\}$
    set $\theta _{i}^{(m+1)}=\theta _{i}^{(m)}+V_{i}(t_{1})\sum _{n=1}^{N}V_{i}^{-1}(t_{n})({\bar {\theta }}_{i}(t_{n})-{\bar {\theta }}_{i}(t_{n-1}))$

Output: Maximum likelihood estimate ${\hat {\theta }}=\theta ^{(M+1)}$

Variations

1. For IF1, parameters which enter the model only in the specification of the initial condition, $X(t_{0})$ , warrant some special algorithmic attention since information about them in the data may be concentrated in a small part of the time series.
2. Theoretically, any distribution with the requisite mean and variance could be used in place of the normal distribution. It is standard to use the normal distribution and to reparameterise to remove constraints on the possible values of the parameters.
3. Modifications to the IF1 algorithm have been proposed to give superior asymptotic performance.
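The basic IF1 procedure can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `if1`, the toy AR(1) model functions, the restriction to a diagonal $\Phi$ (passed as `Phi_diag`), and the tiny floor added to the weights are all choices made for this example.

```python
import numpy as np

# Toy model, assumed for illustration: one parameter phi, a Gaussian AR(1)
# state started at 0, and unit-variance Gaussian measurement noise.
# All three functions are vectorized over the J particles.
def h(Theta):
    return np.zeros(Theta.shape[0])

def f(X, s, t, Theta, W):
    return Theta[:, 0] * X + W      # s, t unused: equally spaced times assumed

def g(y, X, t, Theta):
    return np.exp(-0.5 * (y - X) ** 2) / np.sqrt(2.0 * np.pi)

def if1(y, times, t0, theta1, Phi_diag, J, M, a, b, rng):
    """IF1 sketch. Phi is assumed diagonal, with diagonal Phi_diag."""
    theta = np.asarray(theta1, dtype=float).copy()
    d = theta.size
    for m in range(1, M + 1):
        sd0 = np.sqrt(b * a ** (m - 1) * Phi_diag)   # initial perturbation sd
        sd = np.sqrt(a ** (m - 1) * Phi_diag)        # per-step perturbation sd
        Theta_F = theta + sd0 * rng.normal(size=(J, d))
        X_F = h(Theta_F)
        theta_bar_prev = theta.copy()                # theta_bar(t_0) = theta^(m)
        total = np.zeros(d)
        V1 = None
        s = t0
        for n in range(len(y)):
            Theta_P = Theta_F + sd * rng.normal(size=(J, d))
            X_P = f(X_F, s, times[n], Theta_P, rng.normal(size=J))
            # The 1e-300 floor is a numerical safeguard added here; it is
            # not part of the procedure itself.
            w = g(y[n], X_P, times[n], Theta_P) + 1e-300
            k = rng.choice(J, size=J, p=w / w.sum())  # multinomial resampling
            X_F, Theta_F = X_P[k], Theta_P[k]
            theta_bar = Theta_F.mean(axis=0)          # filtering mean
            V = Theta_P.var(axis=0, ddof=1)           # prediction variance
            if n == 0:
                V1 = V
            total += (theta_bar - theta_bar_prev) / V
            theta_bar_prev = theta_bar
            s = times[n]
        theta = theta + V1 * total                    # IF1 parameter update
    return theta

# Usage on simulated data (true phi = 0.8 is an arbitrary choice).
rng = np.random.default_rng(1)
times = np.arange(1, 41)
x, obs = 0.0, []
for _ in times:
    x = 0.8 * x + rng.normal()
    obs.append(x + rng.normal())
y = np.array(obs)
theta_hat = if1(y, times, 0, [0.5], np.array([0.02 ** 2]),
                J=200, M=10, a=0.9, b=1.0, rng=rng)
```

The sketch leaves `phi` unconstrained, so no reparameterisation is needed here; a constrained parameter would typically be transformed before the normal perturbations are applied, as noted in the second variation.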

Procedure: Iterated filtering (IF2)

Input: A partially observed Markov model specified as above; Monte Carlo sample size $J$; number of iterations $M$; cooling parameter $0<a<1$; covariance matrix $\Phi$; initial parameter vectors $\{\Theta _{j},j=1,\dots ,J\}$

for $m=1$ to $M$
    draw $\Theta _{F}(t_{0},j)\sim \mathrm {Normal} (\Theta _{j},a^{m-1}\Phi )$ for $j=1,\dots ,J$
    set $X_{F}(t_{0},j)=h{\big (}\Theta _{F}(t_{0},j){\big )}$ for $j=1,\dots ,J$
    for $n=1$ to $N$
        draw $\Theta _{P}(t_{n},j)\sim \mathrm {Normal} (\Theta _{F}(t_{n-1},j),a^{m-1}\Phi )$ for $j=1,\dots ,J$
        set $X_{P}(t_{n},j)=f(X_{F}(t_{n-1},j),t_{n-1},t_{n},\Theta _{P}(t_{n},j),W)$ for $j=1,\dots ,J$
        set $w(n,j)=g(y_{n}|X_{P}(t_{n},j),t_{n},\Theta _{P}(t_{n},j))$ for $j=1,\dots ,J$
        draw $k_{1},\dots ,k_{J}$ such that $P(k_{j}=i)=w(n,i){\big /}\sum _{\ell }w(n,\ell )$
        set $X_{F}(t_{n},j)=X_{P}(t_{n},k_{j})$ and $\Theta _{F}(t_{n},j)=\Theta _{P}(t_{n},k_{j})$ for $j=1,\dots ,J$
    set $\Theta _{j}=\Theta _{F}(t_{N},j)$ for $j=1,\dots ,J$

Output: Parameter vectors approximating the maximum likelihood estimate, $\{\Theta _{j},j=1,\dots ,J\}$
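The IF2 procedure can likewise be sketched in Python with NumPy. As before, this is only an illustration under assumptions: the function name `if2`, the toy AR(1) model, the diagonal $\Phi$ (passed as `Phi_diag`), and the small floor on the weights are introduced for this example and are not prescribed by the procedure.

```python
import numpy as np

# Toy model, assumed for illustration: one parameter phi, a Gaussian AR(1)
# state started at 0, and unit-variance Gaussian measurement noise.
def h(Theta):
    return np.zeros(Theta.shape[0])

def f(X, s, t, Theta, W):
    return Theta[:, 0] * X + W      # s, t unused: equally spaced times assumed

def g(y, X, t, Theta):
    return np.exp(-0.5 * (y - X) ** 2) / np.sqrt(2.0 * np.pi)

def if2(y, times, t0, Theta0, Phi_diag, M, a, rng):
    """IF2 sketch. Theta0 has shape (J, d); Phi is assumed diagonal."""
    Theta = np.array(Theta0, dtype=float)
    J, d = Theta.shape
    for m in range(1, M + 1):
        sd = np.sqrt(a ** (m - 1) * Phi_diag)   # perturbation sd, cooled with m
        Theta_F = Theta + sd * rng.normal(size=(J, d))
        X_F = h(Theta_F)
        s = t0
        for n in range(len(y)):
            Theta_P = Theta_F + sd * rng.normal(size=(J, d))
            X_P = f(X_F, s, times[n], Theta_P, rng.normal(size=J))
            # The 1e-300 floor is a numerical safeguard added here; it is
            # not part of the procedure itself.
            w = g(y[n], X_P, times[n], Theta_P) + 1e-300
            k = rng.choice(J, size=J, p=w / w.sum())  # multinomial resampling
            X_F, Theta_F = X_P[k], Theta_P[k]
            s = times[n]
        Theta = Theta_F       # the swarm at t_N seeds the next iteration
    return Theta

# Usage on simulated data (true phi = 0.8 is an arbitrary choice).
rng = np.random.default_rng(2)
times = np.arange(1, 41)
x, obs = 0.0, []
for _ in times:
    x = 0.8 * x + rng.normal()
    obs.append(x + rng.normal())
y = np.array(obs)
Theta0 = rng.uniform(0.0, 1.0, size=(200, 1))   # initial parameter swarm
Theta_hat = if2(y, times, 0, Theta0, np.array([0.02 ** 2]), M=10, a=0.9, rng=rng)
```

Unlike the IF1 sketch, no explicit update formula appears: the output is the whole swarm `Theta_hat`, whose rows can be summarized (for example by their mean) to obtain a point estimate.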