Particle Filters for Partially Observed Diffusions

Paul Fearnhead, Omiros Papaspiliopoulos, Gareth Roberts

Introduction

There is considerable interest in using diffusion processes to model continuous-time phenomena in many diverse scientific disciplines. These processes can be used to model directly the observed data and/or to describe unobserved processes in a hierarchical model. This paper focuses on estimating the path of the diffusion given partial information about it. We develop novel particle filters for analysing a class of multivariate diffusions which are partially observed at a set of discrete time-points.

Particle filtering methods are standard Monte-Carlo methods for analysing partially-observed discrete-time dynamic models Doucet et al. (2001). They involve estimating the filtering densities of interest by a swarm of weighted particles. The approximation error decreases as the number of particles, $N$ , increases. However, filtering for diffusion processes is significantly harder than for discrete-time Markov models since the transition density of the diffusion is unavailable in all but a few special cases. In many contexts even the observation density is intractable. Therefore, the standard propagation/weighting/re-sampling steps in the particle filter algorithm cannot be routinely applied.

To circumvent these complications, a further approximation, based on a time-discretisation of the diffusion, has been suggested (see for example Crisan et al., 1999; Del Moral et al., 2001). The propagation of each particle from one observation time to the next is done by splitting the time increment into $M$ , say, pieces and performing $M$ intermediate simulations according to an appropriate Gaussian distribution. As $M$ gets large this Gaussian approximation converges to the true diffusion dynamics. In this framework the computational cost of the algorithm is of order $M\times N$ , and the true filtering distributions are obtained as both $M$ and $N$ increase.

Our approach does not rely on time-discretisation, but builds on recent work on the Exact Algorithm for the simulation of diffusions (Beskos and Roberts, 2005; Beskos et al., 2006a, 2005b) and on the unbiased estimation of the diffusion transition density (Beskos et al., 2006b, 2005a). This algorithm can be used in a variety of ways to avoid time discretisations in the filtering problem. The potential of the Exact Algorithm in the filtering problem was brought up in the discussion of Beskos et al. (2006b), see the contributions by Chopin, Künsch, and in particular Rousset and Doucet who also suggest the use of a random weight particle filter in this context.

One possibility is simply to use the Exact Algorithm to propagate the particles in the implementation of the Gordon et al. (1993) bootstrap particle filter, thus avoiding entirely the $M$ intermediate approximate simulations between each pair of observation times. We call this the Exact Propagation Particle Filter (EPPF). Where possible, a better approach is to adapt the Exact Algorithm to simulate directly from (a particle approximation to) the filtering density using rejection sampling; we term this the Exact Simulation Particle Filter (ESPF).

However, our favoured method goes in a different direction. We work in the framework of the auxiliary particle filter of Pitt and Shephard (1999), where particles are propagated from each observation time to the next according to a user-specified density and then are appropriately weighted to provide a consistent estimator of the new filtering distribution. Due to the transition density being unavailable, the weights associated with each particle are intractable. However, our approach is to assign to each particle a random positive weight which is an unbiased estimator of the true weight. We call this the Random Weight Particle Filter (RWPF). Our algorithm yields consistent estimates of the filtering distributions. The replacement of the weights in a particle filter by positive unbiased estimators is an interesting possibility in more general contexts than the one considered in this paper. Indeed, in Section 3.2 we show that this approach amounts to a convenient augmentation of the state with auxiliary variables.

The construction of the unbiased estimators of the weights is one of the main contributions of this paper, and it is of independent interest. This is based on an extension of the Poisson Estimator of Beskos et al. (2006b), which we call the Generalised Poisson Estimator. This estimator is guaranteed to return positive estimates (unlike the Poisson Estimator) and its efficiency (in terms of variance and computational cost) can be up to orders of magnitude better than the Poisson Estimator. Optimal implementation of the Poisson and the Generalised Poisson estimators is thoroughly investigated theoretically and via simulation.

All three time-discretisation-free particle filters we introduce are easy to implement, with the RWPF being the easiest and the most flexible to adapt to contexts more general than those considered here. A simulation study is carried out which shows that the RWPF is considerably more efficient than the ESPF which is more efficient than the EPPF. We also provide a theoretical result which shows that our filters can have significant computational advantages over time-discretisation methods. We establish a Central Limit Theorem (CLT) for the estimation of expectations of the filtering distributions using either of the EPPF, ESPF and the RWPF. This is an extension of the results of Chopin (2004). The CLT shows that, for a fixed computational cost $K$ , the errors in the particle approximation of the filtering distributions decrease as $K^{-1/2}$ in our methods, whereas it is known that the rate is $K^{-1/3}$ or slower in time-discretisation methods.

The main limitation of the methodology presented here is the requirement that the stochastic differential equation specifying the underlying diffusion process can be transformed to one with orthogonal diffusion matrix, and gradient drift. Although this framework excludes some important model types (such as stochastic volatility models) it incorporates a wide range of processes which can model successfully many physical processes. On the other hand, our methods can handle a variety of discrete-time observation schemes. In this paper we consider three schemes: noisy observations of a diffusion process, observation of a subset of the components of a multivariate diffusion, and arrival times of a Poisson process whose intensity is stochastic and it is given by a known function of a diffusion.

The paper is organised as follows. Section 2 introduces the model for the underlying diffusion and the necessary notation, the observation schemes we consider and the simulated data sets on which we test our proposed methods. Section 3 introduces the RWPF and states the CLT. Section 4 introduces the main tool required in constructing the RWPF, the Generalised Poisson Estimator (GPE). Several theoretical results are established for the GPE, and a simulation study is performed to assess its performance. Section 5 is devoted in the empirical investigation of the performance of the different particle filters we introduce. Several implementation issues are also discussed. Section 6 closes with a discussion on extensions of the methodology and the appendices contain technical results and proofs.

Signal, data and assumptions

Let the signal be modelled by a $d$ -dimensional diffusion process

We assume throughout the paper that the drift is known. Our approach requires some assumptions which we summarize in this paragraph: i) $\alpha$ is continuously differentiable in all its arguments, ii) there exists a function $A:{\mathbf{R}}^{d}\to{\mathbf{R}}$ such that $\mbox{\boldmath$ \alpha $}({\bf u})=\nabla A({\bf u})$ , and iii) there exists $l>-\infty$ such that $\phi({\bf u}):=\bigl{(}\|\mbox{\boldmath$ \alpha $}({\bf u})\|^{2}+\nabla^{2}A({\bf u})\bigr{)}/2-l\geq 0$ . Among these last three conditions i) and iii) are weak and the strictest is ii), which in the ergodic case corresponds to ${\bf X}$ being a time-reversible diffusion.

The transition density of (1) is typically intractable but a useful expression is available (see for example Beskos et al., 2006b; Dacunha-Castelle and Florens-Zmirou, 1986)

In this expression $\mathcal{N}_{t}({\bf u})$ denotes the density of the $d$ -dimensional normal distribution with mean ${\bf 0}$ and variance ${t}\mbox{\boldmath$ I $}_{d}$ evaluated at ${{\bf u}}\in\mathbf{R}^{d}$ , and the expectation is taken w.r.t. a Brownian bridge, ${\bf W}_{s},s\in[0,t]$ , with ${\bf W}_{0}={\bf x}_{0}$ and ${\bf W}_{t}={\bf x}_{t}$ . Note that the expectation in this formula typically cannot be evaluated.

The data consist of partial observations $y_{1},y_{2},\ldots,y_{n}$ , at discrete time-points $0\leq t_{1}<t_{2}<\cdots<t_{n}$ . We consider three possible observation regimes:

Diffusion observed with error. The observation $y_{i}$ , is related to the signal at time $t_{i}$ via a known density function $f(y_{i}|{\bf x}_{t_{i}})$ . This model extends the general state-space model by allowing the signal to evolve continuously in time. There is a wide range of applications which fit in this framework, see Doucet et al. (2001) for references.

Partial Information. At time $t_{i}$ we observe $y_{i}=\zeta({\bf X}_{t_{i}})$ for some non-invertible known function $\zeta(\cdot)$ . For example we may observe a single component of the $d$ -dimensional diffusion. In this model type $f(y_{i}|{\bf x}_{t_{i}})=1$ for all ${\bf x}_{t_{i}}$ for which $\zeta({\bf x}_{t_{i}})=y_{i}$ .

Cox Process. In this regime the data consist of the observation times $t_{i}$ which are random and are assumed to be the arrivals of a Poisson process of rate $\nu({\bf X}_{s})$ , for some known function $\nu$ . Such models are popular in insurance (Dassios and Jang, 2005) and finance (Engel, 2000; Duffie and Singleton, 1999), and they have recently been used to analyse data from single molecule experiments (Kou et al., 2005).There is a significant difference between this observation regime and the two previous ones. To have notation consistent with (A) and (B) we let $y_{i}=t_{i}$ denote the time of the $i$ th observation; and define the likelihood $f(y_{i}\mid{\bf x}_{t_{i-1}},{\bf x}_{t_{i}})$ to be the probability density that the next observation after $t_{i-1}$ is at time $t_{i}$ . This density can be obtained by integrating

w.r.t. the distribution of $({\bf X}_{s},s\in(t_{i-1},t_{i}))$ conditionally on ${\bf X}_{t_{i-1}}={\bf x}_{t_{i-1}},{\bf X}_{t_{i}}={\bf x}_{t_{i}}$ . The distribution of this conditioned process has a known density w.r.t. the Brownian bridge measure and it is given in Lemma 1 of Beskos et al. (2006b). We can thus show that the density of interest is

where expectation is with respect to the law of a Brownian Bridge from ${\bf x}_{t_{i-1}}$ to ${\bf x}_{t_{i}}$ .

We take a Bayesian approach, and assume a prior distribution for ${\bf X}_{0}$ . Our interest lies in the online calculation of the filtering densities, the posterior densities of the signal at time $t_{i}$ given the observations up to time $t_{i}$ , for each $1\leq i\leq n$ . While these densities are intractable, we propose a particle filter scheme to estimate recursively these densities at each observation time-point. As we point out in Section 6, our approach allows the estimation of the filtering distribution of the continuous time path $({\bf X}_{s},t_{i-1}<s<t_{i})$ .

A more flexible model for the signal is a diffusion process ${\bf Z}$ which solves a more general SDE than the one we have assumed in (1):

Our particle filtering methods will be illustrated on two sets of simulated data:

Example 1: Sine diffusion observed with error. The signal satisfies

and the data consist of noisy observations, $y_{i}\sim\mbox{N}(X_{t_{i}},\sigma^{2})$ . Figure 1(top) shows a simulation of this model with $\sigma=0.2$ . In this case

This process is closely related to Brownian motion on a circle. It is convenient as an illustrative example since discrete-time skeletons can be easily simulated from this process using the most basic form of the Exact Algorithm (EA1 in Beskos et al., 2006a, R-code is available on request by the authors).

Example 2: OU-driven Cox Process. The second data set consists of the arrival times of a Poisson process, $y_{i}=t_{i},$ whose intensity is given by $\nu(X_{s}),s\geq 0$ , where

and $X$ is an Ornstein-Uhlenbeck (OU) process,

The OU process is stationary with Gaussian marginal distribution, $\mbox{N}(0,1/(2\rho))$ . Thus, an interpretation for this model is that the excursions of $X$ increase the Poisson intensity, whereas $a$ corresponds to the intensity when $X$ is at its mean level. An example data set is shown in Figure 3; where we have taken $a=0$ , $\beta=20$ , $\rho=1/2$ . Although the transition density of the OU process is well-known,

the observation density $f(y_{i+1}\mid x_{t_{i}},x_{t_{i+1}})$ is intractable.

Examples 1 and 2 are examples of observation regimes (A) and (C) respectively. We will show that observation regime (B) can be handled in a similar fashion as (A), so we have not included an accompanying example.

Random weight particle filter

Our aim is to recursively calculate the filtering densities $\pi_{i}({\bf x}_{t_{i}})$ . Basic probability calculations yield the following standard filtering recursion for these densities

Particle filters approximate $\pi_{i}({\bf x}_{t_{i}})$ by a discrete distribution, denoted by $\hat{\pi}_{i}({\bf x}_{t_{i}})$ , whose support is a set of $N$ particles, $\{{\bf x}_{i}^{(j)}\}_{j=1}^{N}$ , with associated probability weight $\{w_{i}^{(j)}\}_{j=1}^{N}$ . Substituting $\hat{\pi}_{i}({\bf x}_{t_{i}})$ for $\pi_{i}({\bf x}_{t_{i}})$ in (8), yields a (continuous density) approximation to $\pi_{i+1}({\bf x}_{t_{i+1}})$ ,

We can obtain such a particle approximation via importance sampling, and a general framework for achieving this is given by the auxiliary particle filter of Pitt and Shephard (1999). We choose a proposal density of the form

Choice of suitable proposals, i.e. choice of the $\beta_{i}^{(j)}$ s and $q$ , is discussed in the analysis of our specific applications in Section 5.

To simulate a new particle at time $t_{i+1}$ we (a) simulate a particle ${\bf x}_{i}^{(k)}$ at time $i$ , where $k$ is a realisation of a discrete random variable which takes the value $j\in\{1,2,\ldots,N\}$ with probability $\beta_{i}^{(j)}$ ; and (b) simulate a new particle at time $t_{i+1}$ from $q({\bf x}_{t_{i+1}}|{\bf x}^{(k)}_{i},y_{i+1})$ . The weight assigned to this pair of particles $({\bf x}_{i}^{(k)},{\bf x}_{t_{i+1}})$ is proportional to

This is repeated $N$ times to produce the set of weighted particles at time $t_{i+1}$ , $\left\{({\bf x}_{{i+1}}^{(j)},w_{{i+1}}^{(j)})\right\}_{j=1}^{N}$ , which gives an importance sampling approximation to $\pi_{i+1}({\bf x}_{t_{i+1}})$ . Renormalising the weights is possible but does not materially affect the methodology or its accuracy. Improvements on independent sampling in step (a) can be made: see the stratified sampling ideas of Carpenter et al. (1999). The resulting particle filter has good theoretical properties including consistency Crisan (2001) and central limit theorems for estimates of posterior moments Del Moral and Miclo (2000); Chopin (2004); Künsch (2005), as $N\rightarrow\infty$ . Under conditions relating to exponential forgetting of initial conditions, particle filter errors stabilise as $n\rightarrow\infty$ Del Moral and Guionnet (2001); Künsch (2005).

The difficulty with implementing such a particle filter when the signal ${\bf X}$ is a diffusion process is that the transition density $p_{\Delta_{i}}({\bf x}_{t_{i+1}}|{\bf x}^{(k)}_{i})$ which appears in (11) is intractable for most diffusions of interest, due to the expectation term in (2). Furthermore, for observation model (C) (but also for more general models), the likelihood term $f(y_{i+1}|{\bf x}^{(k)}_{i},{\bf x}_{t_{i+1}})$ given in (4) cannot be calculated analytically.

We circumvent these problems by assigning each new particle a random weight which is a realisation of a random variable whose mean is (11). The construction and simulation of this random variable is developed in Section 4, and it is based on the particular expression for the transition density in (2). The replacement of the weights by positive unbiased estimators is an interesting possibility in more general contexts than the one considered in this paper. Indeed, in Section 3.2 we show that this approach amounts to a convenient augmentation of the state with auxiliary variables.

In all models the weight associated with the pair $({\bf x}_{i}^{(k)},{\bf x}_{t_{i+1}})$ equals

where $h_{i+1}$ is a known function, and for $0<u<t$ ,

where the expectation is taken w.r.t. a $d$ -dimensional Brownian bridge ${\bf W}$ , starting at time $u$ from ${\bf W}_{u}={\bf x}$ and finishing at time $t$ at ${\bf W}_{t}={\bf z}$ .

Models (A) and (B) : For these model types

and $g=\phi$ . In model type (B) the proposal distribution $q({\bf x}_{t_{i+1}}|{\bf x}_{i}^{(k)},y_{i+1})$ should be chosen to propose only values of ${\bf x}_{t_{i+1}}$ such that $\zeta({\bf x}_{t_{i+1}})=y_{i+1}$ ; then $f(y_{i+1}|{\bf x}_{t_{i+1}})=1$ .

Model (C): A synthesis of (2), (4) and (11), with $g=\phi+\nu$ gives

Random Weight Particle Filter (RWPF) PF0 Simulate a sample ${\bf x}_{0}^{(1)},\ldots,{\bf x}_{0}^{(N)}$ from $p({\bf x}_{0})$ , and set $w_{0}^{(j)}=1/N$ . For $i=0,\ldots,n-1$ , for $j=1,\ldots,N$ :

calculate the effective sample size of the $\{\beta_{i}^{(k)}\}$ , $ESS=(\sum_{k=1}^{N}(\beta_{i}^{(k)})^{2})^{-1}$ ; if $ESS<C$ , for some fixed constant $C$ , simulate $k_{i,j}$ from $p(k)=\beta_{i}^{(k)}$ , $k=1,\ldots,N$ and set $\delta_{i+1}^{(j)}=1$ ; otherwise set $k_{i,j}=j$ and $\delta_{i+1}^{(j)}=\beta_{i}^{(j)}$ ;

simulate ${\bf x}_{i+1}^{(j)}$ from $q({\bf x}_{t_{i+1}}|{\bf x}_{i}^{(k_{i,j})},y_{i+1})$ ;

simulate ${\bf v}_{i+1}\sim Q_{g}(\,\cdot\mid{\bf x}_{i}^{(k_{i,j})},{\bf x}_{t_{i+1}},t_{i},t_{i+1})$ ;

assign particle ${\bf x}_{i+1}^{(j)}$ a weight

Notice that this algorithm contains a decision as to whether or not resample particles before propagation in step PF1, with decision being based on the ESS of the $\beta_{i}^{(j)}$ . The constant $C$ can be interpreted as the minimum acceptable effective sample size. (See Liu and Chen 1998 for the rationale of basing resampling on such a condition.) Whether or not resampling occurs will affect the weight given to the new sets of particles, and this is accounted for by different values of $\delta_{i+1}^{(j)}$ in PF1. Optimally, the resampling for step PF1 will incorporate dependence across the $N$ samples; for example the stratified sampling scheme of Carpenter et al. (1999) or the residual sampling of Liu and Chen (1998).

2 An equivalent formulation via an augmentation of the state

In the previous section we described a generic sequential Monte Carlo scheme where the exact weights in the importance sampling approximation of the filtering distributions are replaced by positive unbiased estimators. We now show that this scheme is equivalent to applying an ordinary auxiliary particle filter to a model with richer latent structure. We demonstrate this equivalent representation for model types (A) and (B), since an obvious modification of the argument establishes the equivalence for model type (C).

According to our construction, conditionally on ${\bf X}_{t_{i}}$ , ${\bf X}_{t_{i+1}}$ , $t_{i}$ and $t_{i+1}$ , ${\bf V}_{i+1}$ is independent of ${\bf V}_{j}$ and ${\bf X}_{t_{j}}$ for any $j$ different from $i,i+1$ . Additionally, it follows easily from the unbiasedness and positivity of $r$ that, conditionally on ${\bf X}_{t_{i}}={\bf x}$ , $r({\bf v}_{i+1},{\bf x},{\bf x}_{t_{i+1}},t_{i},t_{i+1})$ is a probability density function for $({\bf X}_{t_{i+1}},{\bf V}_{i+1})$ with respect to the product measure $Leb(\mbox{d}{\bf z})\times Q_{g}(\mbox{d}{\bf v}\mid{\bf x},{\bf z},t_{i},t_{i+1})$ , where $Leb$ denotes the Lebesgue measure.

Consider now an alternative discrete-time model with unobserved states $({\bf Z}_{i},{\bf V}_{i}),i=1,\ldots,n$ , ${\bf Z}_{i}\in{\mathbf{R}}^{d}$ , with a non-homogeneous Markov transition density

(this density is with respect to $Leb\times Q_{g}$ ) and observed data $y_{i}$ with observation density $f(y_{i+1}\mid{\bf z}_{i},{\bf z}_{i+1})$ . By construction the marginal filtering distributions of ${\bf Z}_{i}$ in this model are precisely $\pi_{i}({\bf x}_{t_{i}})$ , i.e. the filtering densities in (8). Consider an auxiliary particle filter applied to this model where we choose with probability $\beta_{i}^{(j)}$ each of the existing particles $({\bf z}_{i}^{(j)},{\bf v}_{i}^{(j)})$ , and generate new particles according to the following proposal

where $q$ is the same proposal density as in (10). The weights associated with each particle in this discrete-time model are tractable and are given by (13). Therefore, the weighted sample $\left\{({\bf z}_{{i+1}}^{(j)},w_{{i+1}}^{(j)})\right\}_{j=1}^{N}$ is precisely a particle approximation to $\pi_{i+1}({\bf x}_{t_{i+1}})$ , and RWPF is equivalent to an auxiliary particle filter on this discrete-time model whose latent structure has been augmented with the auxiliary variables ${\bf V}_{i}$ .

This equivalent representation sheds light on many aspects of our method. Firstly, it makes it obvious that it is inefficient to average more than one realization of the positive unbiased estimator of $\mu_{g}$ per particle. Instead it is more efficient to generate more particles with only one realization of the estimator simulated for each pair of particles.

Secondly, it illustrates that RWPF combines the advantages of the bootstrap and the auxiliary particle filter. Although it is easy to simulate from the probability distribution $Q_{g}$ (as described in Section 4), it is very difficult to derive its density. (with respect to an appropriate reference measure). Since the ${\bf V}_{i}$ s are propagated according to this measure, its calculation is avoided. This is an appealing feature of the bootstrap filter which propagates particles without requiring analytically the system transition density. On the other hand the propagation of the ${\bf Z}_{i}$ s is done via a user-specified density which incorporates the information in the data.

Thirdly, it suggests that the RWPF will have similar theoretical properties with auxiliary particle filters applied to discrete-time models. This is explored in Section 3.3.

3 Theoretical properties

Generalised Poisson Estimators

We have already motivated the need for the simulation of a positive unbiased estimator of

where the expectation is taken w.r.t. a $d$ -dimensional Brownian bridge ${\bf W}$ . In this section we introduce a methodology for deriving such estimators, and provide theoretical and simulation results regarding the variance of the suggested estimators. These results are of independent interest beyond particle filtering, so we present our methodology in a general way, where $g$ is an arbitrary function assumed only to be continuous on $\mathbf{R}^{d}$ . We assume that ${\bf W}_{0}={\bf x}$ and ${\bf W}_{t}={\bf z}$ , for arbitrary ${\bf x},{\bf z}\in\mathbf{R}^{d}$ and $t>0$ . By the time-homogeneity property of the Brownian bridge our methodology extends to the case where the integration limits change to $u$ and $u+t$ , for any $u>0$ .

Beskos et al. (2006b) proposed an unbiased estimator of (14), the Poisson Estimator:

$\kappa$ is a Poisson random variable with mean $\lambda t$ , the $\psi_{j}$ s are uniformly distributed on $[0,t]$ , and $c\in\mathbf{R}$ , $\lambda>0$ are arbitrary constants. (Here and below we assume that the empty product, i.e. when $\kappa=0$ , takes the value 1.) The two main weaknesses of the PE are that it may return negative estimates and that its variance is not guaranteed to be finite. Both of these problems are alleviated when $g$ is bounded. However this is a very restrictive assumption in our context. Therefore, here, we introduce a collection of unbiased and positive estimators of (14) which generalise the PE. The methods we consider allow $c$ and $\lambda$ depend on ${\bf W}$ , and permit $\kappa$ to have a general discrete distribution. Firstly, we need to be able to simulate random variables $L_{\bf W}$ and $U_{\bf W}$ with

and to be able to simulate ${\bf W}_{s}$ at any $s$ , given the condition implied by (16). For unbounded $g$ this is non-trivial. However, both of these simulations have become feasible since the introduction of an efficient algorithm in Beskos et al. (2005b). An outline of the construction is given in Appendix A.

Let $U_{\bf W}$ and $L_{\bf W}$ satisfy (16) and $\psi_{j},j\geq 1$ , be a sequence of independent uniform random variables on $[0,t]$ . Then, (14) can be re-expressed as follows,

We can derive various estimators of (14) by specifying $p(\cdot\mid U_{\bf W},L_{\bf W})$ . The family of all such estimators will be called the Generalised Poisson Estimator (GPE):

The following Theorem (proved in Appendix B) gives the optimal choice for $p(\cdot\mid U_{\bf W},L_{\bf W})$ .

The conditional second moment of the Generalised Poisson Estimator given $U_{\bf W}$ and $L_{\bf W}$ , is:

then the second moment is minimised by the choice

Whilst the right-hand side of (21) cannot be evaluated analytically, it can guide a suitable choice of $p(\cdot\mid U_{\bf W},L_{\bf W})$ . If ${\bf W}$ were known, the optimal proposal is Poisson with mean

We will discuss two possible ways that (23) can be used to choose a good proposal.

A sufficient condition for GPE-1 to have finite variance is that

Since $\lambda_{\bf W}$ is stochastic, an alternative approach is to introduce a (exogenous) random mean and assume that $p(\cdot|U_{\bf W},L_{\bf W})$ is Poisson with this random mean. For tractability we choose the random mean to have a Gamma distribution, when $p(\cdot\mid U_{\bf W},L_{\bf W})$ becomes a negative-binomial distribution:

A simulation study (part of which is presented in Section 4.1 below) reveals that this choice works very well in practice and the GPE-2 has up to several orders of magnitude smaller variance than the PE or the GPE-1. The integral can usually be easily evaluated, otherwise a crude approximation can be used.

We have confined our presentation to the case where the expectation in (14) is w.r.t. the Brownian bridge measure. Nevertheless, as pointed out in Beskos et al. (2006b) the PE can be constructed in exactly the same way when the expectation is taken w.r.t. an arbitrary diffusion bridge measure, as long as exact skeletons can be simulated from this measure. The GPE can also be implemented in this wider framework, provided that the process ${\bf W}$ can be constructed to satisfy (16).

We consider a smooth bounded test function $g(u)=(\sin(u)^{2}+\cos(u)+1)/2$ . This has been chosen in view of Example 1. The function $g$ is periodic, with period $2\pi$ . In $[0,2\pi]$ it has local minima at 0 and $2\pi$ , global minimum at $\pi$ and maxima at $\pi/3$ and $5\pi/3$ . Since $g$ is bounded by $9/8$ we can construct a PE which returns positive estimates by setting $c\geq 9/8$ . Under this constraint, Beskos et al. (2006b) argued that a good choice is $c=\lambda=9/8$ . Simulation experiments suggested that the performance of the GPE-2 is quite robust to the choice of the dispersion parameter $\beta$ . We have fixed it in our examples to $\beta=10$ . Table 1 summarizes estimates of the variance of the estimators based on $10^{4}$ simulated values.

We have also investigated how the efficiency of the PE and GPE-2 varies with the time increment $t$ and in particular for small $t$ (results not shown). These empirical results suggest that the coefficient of variation of the errors of both PE and GPE-2 are $O(t^{\delta})$ for some $\delta>0$ ; but that the value of $\delta$ differs for the two estimators. In the cases that we investigated, the GPE-2 appears to have a faster rate of convergence than PE.

The results of this simulation study have been verified for other functions $g$ (results not shown). We have experimented with differentiable (e.g. $g(u)=u$ ) and non-differentiable (e.g. $g(u)=|u|$ ) unbounded functions. In these cases it is impossible to design a PE which returns positive estimates w.p.1. Again, we have found that the GPE-2 performs significantly better than the PE.

It is important to mention that alternative Monte Carlo methods exist which yield consistent but biased estimates of (14). One such estimator is obtained by replacing the time-integral in (14) with a Riemann approximation based on a number, $M$ say, of intermediate points. This technique is used to construct a transition density estimator in Nicolau (2002) and effectively underlies the transition density estimator of Durham and Gallant (2002) (when the diffusion process has constant diffusion coefficient). The approach of Durham and Gallant (2002) has been used in MCMC and filtering applications Golightly and Wilkinson (2006); Chib et al. (2006); Ionides (2003). In the filtering context it provides an alternative to RWPF, where the weights are approximated. It is not the purpose of this paper to carry out a careful comparison of RWPF with such variants. However, as an illustration we present a very small scale comparison in the context of estimating the transition density, $p_{t}(z\mid x)$ , of (6) for $t=1$ and $x,z$ as in Table 1. We compare 4 methods. Two are based on (2) and use the PE and the GPE-2 to generate estimators of the expectation. The other two, DG-1 and DG-5 are two implementation of the Durham and Gallant (2002) estimator, with 1 and 5 respectively intermediate points. We compare the methods in terms of their root mean square error divided by the true value (i.e. the coefficient of variation). As the true value we used the estimate of the GPE-2. The results of the comparison are presented in Table 2. Notice that DG-1 and DG-5 simulate many more variables than GPE-2 to construct their estimates.

Comparison of particle filters on the simulated data

We now demonstrate the performance of the different particle filters we have presented on the two examples introduced in Section 2.

We first consider analysing the sine diffusion of Example 1. The simulated data is shown in Figure 1(top).

We compare four implementations of the particle filter each of which avoids time-discretisations by using methodology based on the Exact Algorithm (EA) for simulating diffusions: i) EPPF, which uses EA for implementing a bootstrap filter, ii) ESPF, which adapts EA to simulate by rejection sampling from the filtering densities, iii) RWPF1, an implementation of RWPF using PE (see Table 1) to simulate the weights, iv) RWPF2, an implementation of RWPF using GPE-2 to simulate the weights. Details on the implementation of EPPF and ESPF are given in Appendix D.

In this simple example ESPF is more efficient than EPPF, since it has the same computational cost, but it is proposing from the optimal proposal distribution. However, we have efficiently implemented ESPF exploiting several niceties of this simple model, in particular the Gaussian likelihood and the fact that the drift is bounded. In more general models implementation of ESPF can be considerably harder and its comparison with EPPF less favorable due to smaller acceptance probabilities.

In this context where $\phi$ is bounded one can speed up the implementation of GPE-2 with practically no loss of efficiency by replacing $U_{\bf W}$ in (24) and (25) by $9/8$ which is the upper bound of $\phi$ . In this case, there is no need to simulate $U_{\bf W}$ and $L_{\bf W}$ . We have implemented this simplification in the RWPF2.

Algorithms EPPF, RWPF1-2 used the stratified re-sampling algorithm of Carpenter et al. (1999), with re-sampling at every iteration. For RWPF1-2 we chose the proposal distribution for the new particles based on the optimal proposal distribution obtained if the sine diffusion is approximated by the Ozaki discretisation scheme (details in Appendix E). For EPPF we chose the $\beta_{i}^{(k)}$ s to be those obtained from this approximation.

The number of particles used in each algorithm was set so that each filter had comparable CPU cost, which resulted in 500, 500, 910 and 1000 particles used respectively for each algorithm. For these numbers of particles, EPPF and ESPF on average required the proposal of 1360 particles and required 675 Brownian bridge simulations within the accept-reject step (iii) at each iteration of the algorithm. By comparison RWPF1 and RWPF2 simulated respectively 910 and 1000 particles and required on average 1025 and 850 Brownian bridge simulations to generate the random weights at each iteration.

Note that the comparative CPU cost of the four algorithms, and in particular that of EPPF and ESPF as compared to RWPF1-2 depends on the underlying diffusion path. The acceptance probabilities within EPPF and ESPF depend on the values of $x_{i}^{k_{i,j}}$ and $x_{t_{i+1}}$ , and get small when both these values are close to $0(\mbox{mod }2\pi)$ . (in the long run the diffusion will visit these regions infrequently and will stay there for short periods.) Thus, simulated paths which spent more (or less) time in this region of the state-space would result in EPPF and ESPF having a larger (respectively smaller) CPU cost.

We compared the four filters based on the variability of estimates of the mean of the filtering distribution of the state across 500 independent runs of each filter. Results are given in Figure 2, while output from one run of RWPF2 is shown in Figure 1. The comparative results in Figure 2 are for estimating the mean of the filtering distribution at each iteration (similar results were obtained for various quantiles of the filtering distribution). They show RWPF2 performing best with an average efficiency gain of $15\%$ over RWPF1, $50\%$ over ESPF and $200\%$ over EPPF. Interpretation of these results suggest that (for example) ESPF would be required to run with $N=750$ (taking $1.5$ times the CPU cost for this data set) to obtain comparable accuracy with RWPF2.

Varying the parameters of the model and implementation of the algorithms will affect the relative performance of the algorithms. In particular increasing (or decreasing) $\sigma^{2}$ , the variance of the measurement error, will increase (respectively decrease) the relative efficiency of EPPF relative to the other filters. Similar results occur as $\Delta_{i}$ is decreased (respectively increased). The relative performance of the other three algorithms appears to be more robust to such changes. We considered implementing EPPF with $\beta_{i}^{(k)}=w_{i}^{(k)}$ ; and also using an Euler rather than an Ozaki approximation of the sine diffusion to construct the proposal distribution for RWPF1-2, but neither of these changes had any noticeable effect on the performance of the methods. We also considered re-sampling less often, setting $C=N/4$ in step PF1 of the RWPF algorithm (so re-sampling when the effective sample size of the $\beta_{i}^{(j)}$ s was less than $N/4$ ) and this reduced the performance of the algorithms substantially (by a factor of 2 for RWPF1-2).

We also investigated the effect of increasing the amount of time, $\Delta$ , between observations. To do this we used the above data taking (i) every 10th; or (ii) every 20th time-point.

Table 3 gives ESS values for the different values of $\Delta$ . We see that the ESS values drops dramatically as $\Delta$ increases, and the filter is inefficient for $\Delta=20$ . This drop in performance is due to the large variability of the random weights in this case. The variability of these weights is due to (a) the variability of

across different diffusion paths; and (b) the Monte Carlo variability in estimating this for a given path. To evaluate what amount is due to (a), we tried a particle filter that estimates (26) numerically by simulating the Brownian Bridge at a set of discrete time points (for this example we sampled values every 1/2 time unit) and then using these to numerically evaluate the integral. This approach is closely related to the importance sampling approach of Durham and Gallant (2002); Nicolau (2002), see Section 4.1. The results for this filter are also given in Table 3 (note the ESS values ignore any bias introduced through this numerical approximation), and we again see small ESS values, particularly for $\Delta=20$ . This filter’s performance is very similar to the RWPF, which suggests that the Monte Carlo variability in (b) is a small contributor to the poor performance of the RWPF in this case.

Finally we tried introducing pseudo observations at all integer time-intervals where currently no observation is made. The RWPF is then run as above, but with no likelihood contribution to the weight at the time-points where there are these uninformative observations. The idea is that now $\Delta=1$ , so that the variance of the random weights is well-behaved, but we still have adaptation of the path of the diffusion in the unit time-interval prior to an observation to take account of the information in that observation. Results are again shown in Table 3, and the ESS values are very high (and close to the optimal value, that of the number of particles, 1000). Note that the computational cost is only roughly doubled by adding these extra pseudo observations; as the total computational cost for the simulation of the Brownian bridge is unchanged. These results are reasonably robust to the choice of how frequently to introduce these uninformative observations (results not shown).

2 Analysis of the Cox process

We now consider applying the random weight particle filter (RWPF) to Example 2 from Section 2, the OU-driven Cox process. The data we analysed is given in Figure 3(top). It is either impossible or difficult to adapt the other two EA-based particle filters (the EPPF and the ASPF) to this problem. For instance we cannot implement EPPF as the likelihood function is not tractable. As such we just focus on the efficiency of the RWPF in estimating the filtering distribution of $|X_{t}|$ .

Our implementation of the RWPF was based on proposing particles from the prior distribution, so $\beta_{i}^{(k)}=w_{i}^{(k)}$ and $q(x_{t_{i+1}}|x_{i}^{j},y_{i+1})$ is just the OU transition density $p(x_{t_{i+1}}|x_{i}^{j})$ . We simulated the random weights by GPE-2. We calculated the filtering density at each observation time, and also at 56 pseudo-observation times chosen so that the maximum time difference between two consecutive times for which we calculated the filtering density was 0.1. This was necessary to avoid the number of Brownian bridge simulations required to simulate the weights being too large for long inter-observation times, and also to control the variance of the random weights (see above). The likelihood function for these non-observation times is obtained by removing $\nu(x_{t_{i}})$ from (4).

We set the number of particles to $1,000$ and resampled when the ESS of the $\beta_{i}^{(j)}$ s was less then 100 ( $C=N/10$ in step PF1 of the algorithm in Section 3). Whilst results for the sine diffusion suggest that this will result in an algorithm that re-samples too infrequently, we chose to have a low threshold so that we could monitor the performance of the particle filter by how the ESS of the particle filter weights decay over time. The results of one run of this filter are shown in Figure 3(top). The computational efficiency of this method can be gauged by Figure 3 (bottom) where the ESS of the $w_{i}^{(j)}$ s is plotted over time.

Discussion

We have described how recent methods for the exact simulation of diffusions and the unbiased estimation of diffusion exponential functionals can be used within particle filters, so that the resulting particle filters avoid the need for time-discretisation. Among the approaches we have introduced special attention was given to RWPF which implements an auxiliary particle filter, but simulates the weights that are allocated to each particle. We showed that this methodology is equivalent to an auxiliary particle filter applied to appropriately expanded model. We expect that this methodology will have interesting applications to different models than those considered in this paper, which however involve intractable dynamics or likelihoods.

We have focused on the filtering problem, estimating the current state given observations to date. However, extensions to prediction are trivial – merely requiring the ability to simulate from the state equation, which is possible via the EA algorithms. It is also straightforward to use the idea of Kitagawa (1996), where each particle stores the history of its trajectory, to get approximations of the smoothing density (the density of the state at some time in the past given the observations to date).

Note that while particles store values of the state only for each observation time, it is straightforward to fill in the diffusion paths between these times to produce inferences about the state at any time. A particle approximation to the distribution of $({\bf X}_{s},t_{i-1}<s<t_{i})$ , conditionally on the data $y_{1:i}$ can be constructed using the current set of weighted particles $\{({\bf x}_{i-1}^{(j)},{\bf x}_{i}^{(j)})\}_{j=1}^{N}$ with weights $\{w_{i}^{(j)}\}$ , as follows. Firstly we need to introduce some notation; we denote by ${\bf x}_{i-1|i}^{(j)}$ the value of the particle at time $t_{i-1}$ from which the $j$ th particle at time $t_{i}$ is descended. The particle approximation is given by a set of weighted paths $\{({\bf x}_{s},t_{i-1}<s<t_{i})^{(j)}\}_{j=1}^{N}$ with weights $\{w_{i}^{(j)}\}$ . Each path is a diffusion bridge starting from ${{\bf x}}_{i-1|i}^{(j)}$ and finishing at ${\bf x}_{i}^{(j)}$ and it can be simulated using EA, as described in Beskos et al. (2006a) and Beskos et al. (2005b). In observation regimes (A) and (B) the EA is applied to simulate a diffusion bridge with density w.r.t. the Brownian bridge measure given by $\exp\{-\int_{t_{i-1}}^{t_{i}}\phi({\bf X}_{s})\mbox{d}s\}$ , whereas in regime (C) the corresponding density is $\exp\{-\int_{t_{i-1}}^{t_{i}}(\phi({\bf X}_{s})+\nu({\bf X}_{s}))\mbox{d}s\}$ . This representation can be directly exploited to draw inferences for any function of a finite skeleton of ${\bf X}$ in-between observation times.

The algorithm proposed in Beskos et al. (2005b) starts by creating a partition of the sample space of ${\bf W}$ for the given ${\bf W}_{0}={\bf x}$ and ${\bf W}_{t}={\bf y}$ . Writing ${\bf x}=(x_{1},\ldots,x_{d})$ , for a user-specified constant $a>\sqrt{t/3}$ , a sequence of subsets of $\mathbf{R}^{d}$ is formed as $A_{j}=\{{\bf u}=(u_{1},\ldots,u_{d}):\min(x_{t_{i}},y_{i})-ja<u_{i}\leq\max(x_{t_{i}},y_{i})+ja\},~{}j\geq 0$ , where $\cup_{j}A_{j}=\mathbf{R}^{d}$ . This sequence defines a partition of the sample space of the form $\cup_{j=1}^{\infty}D_{j}$ , where a path belongs to $D_{j}$ if and only if the path has exceeded the bounds determined by $A_{j-1}$ but not the bounds determined by $A_{j}$ . In Beskos et al. (2005b) it is shown how to simulate the random variable which determines which of the $D_{j}$ s ${\bf W}$ belongs to, and how to simulate ${\bf W}$ at any collection of times conditional on this random variable, the layered Brownian bridge construction. Since $g$ is assumed continuous, knowing $W\in D_{j}$ is sufficient to determine $U_{\bf W}$ and $L_{\bf W}$ which satisfy (16). In fact, in the simplified setting where $g$ is bounded, as in the sine diffusion of Example 1, the layered Brownian bridge construction can be avoided since it is easy to choose $U_{\bf W}$ and $L_{\bf W}$ independently of ${\bf W}$ .

Appendix B: Proof of Theorem 1

Fubini’s theorem and dominated convergence are used above (valid since the integrands are positive a.s.). (22) is obtained using the following result (which can be easily proved using Jensen’s inequality). Let $f_{i}>0$ for $i=1,2,\ldots$ . Then the sequence of $p_{i}$ s which minimize $\sum_{i=0}^{\infty}f_{i}/p_{i}$ under the constraint $\sum p_{i}=1$ is given by $p_{i}=\sqrt{f_{i}}/\sum\sqrt{f_{i}}$ .

Appendix C: Proof of Theorem 2

where $M_{i}=\sup_{0\leq s\leq t}|W_{i}|$ using the the growth bound in Theorem 2. Furthermore,

It remains therefore to bound the $d$ integrals on the right hand side of this expression. However from the Bachelier-Levy formula for hitting times for Brownian motion and bridges,

which recedes like $w^{-k\log w}$ as $w\to\infty$ thus concluding the proof.

Appendix D: EPPF and ESPF for Example 1

EPPF generates the new particles according to the following procedure:

choose one of the current particles $x_{i}^{(k_{i,j})}$ , where particle $j$ is chosen w.p. $\beta_{i}^{(j)}$ ;

propose $x_{t_{i+1}}$ from Normal with mean $x_{i}^{(k_{i,j})}$ , and variance $\Delta_{i}$ ;

accept this proposal w.p. $\exp(-\cos(x_{t_{i+1}})-1)$ ; if proposal is rejected return to (i).

where expectation is with respect to the law of a Brownian Bridge from $W_{0}=x_{i}^{(k_{i,j})}$ and $W_{\Delta_{i}}=x_{t_{i+1}}$ , and $\phi$ is given in (7). If the proposal is rejected return to (i), otherwise $x_{t_{i+1}}$ is the new particle at time $t_{i+1}$ with weight $w_{i+1}=f(y_{i+1}|x_{i+1})$ .

(iv) is performed using retrospective sampling as described in Beskos et al. (2006a).

ESPF proceeds as above but with steps (i) and (ii) replaced by the step

propose $(x_{i}^{(k_{i,j})},x_{t_{i+1}})$ according to the density proportional to

where $\eta=({\sigma^{2}+\Delta_{i}})^{-1}({x_{i}^{(k_{i,j})}\sigma^{2}+\Delta_{i}y_{i+1}})$ , and $\tau=\sigma^{2}\Delta_{i}/(\sigma^{2}+\Delta_{i}).$

The algorithm is repeated until $N$ values for $x_{t_{i+1}}$ are accepted, each with weight $1/N$ .

Appendix E: Proposal Distribution for Example 1

Consider a diffusion satisfying SDE (1), with $d=1$ for simplicity. The Ozaki approximation of this SDE is based on a first order Taylor expansion of the drift about some value $x$ . For the sine diffusion of Example 1, we get the following approximating SDE

Appendix F: Central Limit Theorem

For notational simplicity, we consider a special case of our particle filter, chosen to resemble those considered in Chopin (2004). We choose our proposal density for time $t_{i+1}$ to have $\beta_{j}=w^{(j)}_{i}$ ; and we assume iid sampling of $X_{t_{i}}^{(j)}$ in step PF1. The particle filter of Chopin (2004) splits up simulating particles at time $t_{i+1}$ into (i) a resampling of particles at time $t_{i}$ ; and (ii) a propagation of each of these particles to time $t_{i+1}$ . Our assumption of iid sampling is equivalent to the multinomial resampling case of Chopin (2004). (The conditions for the central limit theorem are the same if the residual sampling methods of Liu and Chen (1998), but the variances differ.) For simplicity we consider observation model (A) or (B), though the result extends easily to observation model (C).

Comment Equations (27)–(29) refer to the changes in variance of the weighted particles due to the propagation, weighting and resampling stages at iteration $i$ . Only (28) differs from the respective result in Chopin (2004), and this is due to the second term on the right-hand side, which represents the increase in variance due to the randomness of the weights. Condition C1 is taken from Chopin (2004) and applies to standard particle filters; conditions C2 and C3 are new and are conditions bounding the variance of the random weights which ensures that $V_{i}(\varphi)$ is finite.

(Here the expectation and variance in (30) are w.r.t. the auxiliary variables). Combining these results gives (28). The regularity conditions (C1) - (C3) translate directly also. ∎