On the relation between Gaussian process quadratures and sigma-point methods

Simo Särkkä, Jouni Hartikainen, Lennart Svensson, Fredrik Sandblom

I Introduction

Gaussian process quadratures are methods to numerically compute integrals of the form

Sigma-point methods can be seen as methods which approximate the above integrals via

where WiW_{i} are some predefined weights and xi\mathbf{x}_{i} are the sigma-points (classically called abscissas). Typically the evaluation points and weights are selected such that when g\mathbf{g} is a multivariate polynomial up to a certain order, the approximation is exact.

A particularly useful class of methods is obtained when the weight function is selected to be a multivariate Gaussian density w(x)=N(xm,P)w(\mathbf{x})=\operatorname{N}(\mathbf{x}\mid\mathbf{m},\mathbf{P}). In the context of Gaussian process quadratures it then turns out that the integral of the Gaussian process regressor can be computed in closed form provided that the covariance function of the process is chosen to be a squared exponential (i.e., exponentiated quadratic). This kind of quadrature methods are also often referred to as Bayesian or Bayes–Hermite quadratures. They are closely related to Gauss–Hermite quadratures in the sense that as Gaussian quadratures can be seen to form a polynomial approximation to the integrand via point-evaluations, Gaussian process quadratures use a Gaussian process regression approximation instead . Because Gaussian process regressors can be used to approximate much larger class of functions than polynomial approximations , they can be expected to perform much better also in numerical integration.

The selection of a Gaussian weight function is also particularly useful in non-linear filtering and smoothing, because the equations of non-linear Gaussian (Kalman) filters and smoothers consist of Gaussian integrals of the above form and linear operations on vectors and matrices. The selection of different weights and sigma-points leads to different brands of approximate filters and smoothers . For example, the multidimensional Gaussian type of Gauss–Hermite quadrature and cubature based filters and smoothers are based on explicit numerical integration of the Gaussian integrals. The unscented transform based methods as well as other sigma-point methods can also be retrospectively interpreted to belong to the class of Gaussian numerical integration based methods . Conversely, Gaussian type of quadrature or cubature based methods can also be interpreted to be special cases of sigma-point methods. Furthermore, the classical Taylor series based methods and Stirling’s interpolation based methods can be seen as ways to approximate the integrand such that the Gaussian integral becomes tractable (cf. ). The recent Fourier–Hermite series , Hermite polynomial methods are also based on numerical approximation of the integrands.

The aim of this article is to present new Gaussian process quadrature based methods for non-linear filtering and smoothing, and to analyze their connection with sigma-point methods and multivariate numerical integration methods. We show that many sigma-point filtering and smoothing algorithms such as unscented Kalman filters and smoothers, cubature Kalman filters and smoothers, and Gauss–Hermite Kalman filters and smoothers can be seen as special cases of the proposed methods with suitably chosen covariance functions. More generally, we show that many classical multivariate Gaussian quadrature methods, including Gauss–Hermite rules , and symmetric integration formulas are special cases of the present methodology. We also discuss different criteria for selecting the sigma-point (abscissa) locations: exactness for multivariate polynomials up to a given order, minimum average error, and quasi-random point sets.

This article is an extended version of the conference article , where we analyzed the use of Gaussian process quadratures in non-linear filtering and smoothing as well as their connection to the unscented transform and Gauss–Hermite quadratures. In this article, we deepen and sharpen the analysis of those connections and extend our analysis to a more general class of spherically symmetric integration rules. We also analyze different sigma-point selection schemes as well as provide more extensive set of numerical experiments.

II Background

Non-linear Gaussian (Kalman) filters and smoothers are methods which can be used to approximate the filtering distributions p(xky1,,yk)p(\mathbf{x}_{k}\mid\mathbf{y}_{1},\ldots,\mathbf{y}_{k}) and smoothing distributions p(xky1,,yT)p(\mathbf{x}_{k}\mid\mathbf{y}_{1},\ldots,\mathbf{y}_{T}) of non-linear state-space models of the form

Non-linear Gaussian filters (see, e.g., , page 98) are general methods to produce Gaussian approximations to the filtering distributions:

Non-linear Gaussian smoothers (see, e.g., , page 154) are the corresponding methods to produce approximations to the smoothing distributions:

Both Gaussian filters and smoothers above can be easily generalized to state-space models with non-additive noises (see ), but here we only consider the additive noise case.

A general additive noise one-step moment-matching-based Gaussian filter algorithm can be written in the following form.

The prediction and update steps of the non-linear Gaussian (Kalman) filter are :

The filtering is started from initial mean and covariance, m0\mathbf{m}_{0} and P0\mathbf{P}_{0}, respectively, such that x0N(m0,P0)\mathbf{x}_{0}\sim\operatorname{N}(\mathbf{m}_{0},\mathbf{P}_{0}). Then the prediction and update steps are applied for k=1,2,3,,Tk=1,2,3,\ldots,T.

The result of the filter is a sequence of approximations

The corresponding smoothing algorithm can be written in the following form.

The equations of the non-linear Gaussian (Rauch–Tung–Striebel, RTS) smoother are the following :

The smoothing recursion is started from the filtering result of the last time step k=Tk=T, that is, mTs=mT\mathbf{m}^{s}_{T}=\mathbf{m}_{T}, PTs=PT\mathbf{P}^{s}_{T}=\mathbf{P}_{T} and proceeded backwards for k=T1,T2,,1k=T-1,T-2,\ldots,1.

The approximations produced by the smoother are

Both the filter and smoother above can be derived from the following Gaussian moment matching ”transform” (the terminology comes from unscented transform).

The moment matching based Gaussian approximation to the joint distribution of x\mathbf{x} and the transformed random variable y=g(x)+q\mathbf{y}=\mathbf{g}(\mathbf{x})+\mathbf{q}, where xN(m,P)\mathbf{x}\sim\operatorname{N}(\mathbf{m},\mathbf{P}) and qN(0,Q)\mathbf{q}\sim\operatorname{N}(\mathbf{0},\mathbf{Q}), is given by

II-B Gaussian Integration and Sigma-Point Methods

Sigma-point filtering and smoothing methods can generally be described as methods which approximate the Gaussian integrals in the Gaussian filtering and smoothing equations (and in the Gaussian moment matching transform) as

where WiW_{i} are some predefined weights and xi\mathbf{x}_{i} are the sigma-points. Typically, the sigma-point methods use so called stochastic decoupling which refers to the idea that we do a change of variables

where P=PPT\mathbf{P}=\sqrt{\mathbf{P}}\,\sqrt{\mathbf{P}}^{\mathsf{T}}. This implies that we only need to design weights WiW_{i} and unit sigma-points ξi\boldsymbol{\xi}_{i} for integrating against unit Gaussian distributions:

thus leading to approximations of the form

Different sigma-point methods correspond to different choices of weights WiW_{i} and unit sigma-points ξi\boldsymbol{\xi}_{i}. For example, the canonical unscented transform uses the following set of 2n+12n+1 weights (recall that nn is the dimensionality of the state) and sigma-points:

Note that sigma-point methods sometimes use different weights for the integrals appearing in the mean and covariance computations of Gaussian filters and smoothers. However, here we will only concentrate on the methods which use the same weights for both in order to derive more direct connections between the methods. For example, the above unscented transform weights are just a special case of more general unscented transforms (see, e.g., ).

II-C Gaussian Process Regression

Gaussian process quadrature is based on forming a Gaussian process (GP) regression approximation to the integrand using pointwise evaluations and then integrating the approximation. In GP regression the purpose is to predict the value of an unknown function

In practice, the observations are often assumed to contain noise and hence a typical model setting is:

where the first line above means that the random function gKg_{K} has a zero mean Gaussian process prior with the given covariance function K(x,x)K(\mathbf{x},\mathbf{x}^{\prime}). A commonly used covariance function is the exponentiated quadratic (also called squared exponential) covariance function

The GP regression equations can be derived as follows. Assume that we want to estimate the value of the noise-free function g(x)g(\mathbf{x}) based on its Gaussian process approximation gK(x)g_{K}(\mathbf{x}) at a test point x\mathbf{x} given the vector of observed values o=(o1,,oN)\mathbf{o}=(o_{1},\ldots,o_{N}). Due to the Gaussian process assumption we now get

where K=[K(xi,xj)]\mathbf{K}=[K(\mathbf{x}_{i},\mathbf{x}_{j})] is the joint covariance of observed points, K(x,x)K(\mathbf{x},\mathbf{x}) is the (co)variance of the test point, k(x)=[K(x,xi)]\mathbf{k}(\mathbf{x})=[K(\mathbf{x},\mathbf{x}_{i})] is the vector cross covariances with the test point.

The Bayesian estimate of the unknown value of gK(x)g_{K}(\mathbf{x}) is now given by its posterior mean, given the training data. Because everything is Gaussian, the posterior distribution is Gaussian and hence described by the posterior mean and (auto)covariance functions:

These are the Gaussian process regression equations in their typical form , in the special case where gg is scalar. The extension to multiple output dimensions is conceptually straightforward (see, e.g., ), but construction of the covariance functions as well as the practical computational methods tend to be complicated . However, a typical easy approach to the multivariate case is to treat each of the dimensions independently.

II-D Gaussian Process Quadrature

In Gaussian process quadrature the basic idea is to approximate the integral of a given function gg against a weight function w(x)w(\mathbf{x}), that is,

by evaluating the function gg at a finite number of points and then by forming a Gaussian process approximation gKg_{K} to the function. The integral is then approximated by integrating the Gaussian process approximation (or its posterior mean) which is conditioned on the evaluation points instead of the function itself. Here we assume that gg is scalar for simplicity as we can always take a vector function elementwise.

Gaussian process quadratures are related to a regression interpretation of classical Gaussian quadrature integration, that is, we can interpret these integration methods as orthogonal polynomial approximations of the integrand evaluated at certain finite number of points . The integral is then approximated by integrating the polynomial instead of the original function. However, the aim of Gaussian process quadrature is to get a good performance in average, whereas in classical polynomial quadratures the integration rule is designed to be exact for a limited class of (polynomial) functions. Still, these approaches are very much linked together .

Due to linearity of integration, the posterior mean of the integral of the Gaussian process regressor is given as

where the “training set” o=(g(x1),,g(xN))\mathbf{o}=\begin{pmatrix}g(\mathbf{x}_{1}),\ldots,g(\mathbf{x}_{N})\end{pmatrix} now contains the values of the function gg evaluated at certain selected inputs.

The posterior variance of the integral can be evaluated in an analogous manner, and it is sometimes used to optimize the evaluation points of the function gNg_{N} . The posterior covariance of the approximation is

That is, when we approximate the integral (23) with the posterior mean we have

The posterior variance of the (scalar) integral is

In this article we are specifically interested in the case of Gaussian weight function, which then reduces the integral appearing in the above expressions (26) and (27) to

III Gaussian process quadratures for sigma-point filtering and smoothing

In this section we start by showing how Gaussian process quadratures (GPQ) can be seen as sigma-point methods and then introduce the Gaussian process transform. The Gaussian process transform then enables us to construct GPQ-based non-linear filters and smoothers analogously to .

In this section the aim is to shown how Gaussian process quadratures (GPQ) can be seen as sigma-point methods.

The Gaussian process quadrature (or Bayes–Hermite/Bayesian quadrature) can be seen is a sigma-point-type of integral approximation

where xi=m+Pξi\mathbf{x}_{i}=\mathbf{m}+\sqrt{\mathbf{P}}\,\boldsymbol{\xi}_{i} with the unit sigma-points ξi\boldsymbol{\xi}_{i} are selected according to a predefined criterion, and the weights are determined by

where K=[K(ξi,ξj)]\mathbf{K}=[K(\boldsymbol{\xi}_{i},\boldsymbol{\xi}_{j})] is the matrix of unit sigma-point covariances and k(ξ)=[K(ξ,ξi)]\mathbf{k}(\boldsymbol{\xi})=[K(\boldsymbol{\xi},\boldsymbol{\xi}_{i})] is the vector cross covariances. In principle, the choice of unit sigma-points above is completely free, but good choices of them are discussed in the following sections.

Let us first use stochastic decoupling (14) which enables us to only consider unit-Gaussian integration formulas of the form (15). Because we can integrate vector functions element-by-element, without loss of generality we can assume that g(x)g(\mathbf{x}) is single-dimensional. Let us now model the function ξg(m+Pξ)\boldsymbol{\xi}\mapsto g(\mathbf{m}+\sqrt{\mathbf{P}}\,\boldsymbol{\xi}) as a Gaussian process gKg_{K} with a given covariance function K(ξ,ξ)K(\boldsymbol{\xi},\boldsymbol{\xi}^{\prime}) and fix the training set for the GP regressor by selecting the points ξi\boldsymbol{\xi}_{i}, i=1,,Ni=1,\ldots,N, which also determines the corresponding points xi=m+Pξi\mathbf{x}_{i}=\mathbf{m}+\sqrt{\mathbf{P}}\,\boldsymbol{\xi}_{i} such that the training set is o=(g(x1),,g(xN))\mathbf{o}=\begin{pmatrix}g(\mathbf{x}_{1}),\ldots,g(\mathbf{x}_{N})\end{pmatrix}. The GP approximation to the integral now follows from (26):

which when simplified and applied to all the dimensions of g\mathbf{g} gives the result. ∎

Note that above we actually assume that the stochastically-decoupled-function ξg(m+Pξ)\boldsymbol{\xi}\mapsto g(\mathbf{m}+\sqrt{\mathbf{P}}\,\boldsymbol{\xi}) instead of the original integrand g(x)g(\mathbf{x}) has the given covariance function. The reason for this modeling choice is that it enables us to decouple the mean and covariance from the integration formula and hence is computationally beneficial. This also makes the result invariant to affine transformations of the state and it also has a property that the variability of the functions corresponds to the scale of the problem. However, on the other hand, one might argue that it is the function g(x)g(\mathbf{x}) which should actually model and using the stochastically-decoupled-function is “wrong”.

From Equation (27) we get that the component-wise variances of the Gaussian process quadrature approximation can be expressed as

Using the above integration approximations we can also define a general Gaussian process transform as follows. The reason for introducing the transform is that the corresponding approximate filters and smoothers can be readily constructed in terms of the transform (cf. ), which we will do in the next section.

The Gaussian process quadrature based Gaussian approximation to the joint distribution of x\mathbf{x} and the transformed random variable y=g(x)+q\mathbf{y}=\mathbf{g}(\mathbf{x})+\mathbf{q}, where xN(m,P)\mathbf{x}\sim\operatorname{N}(\mathbf{m},\mathbf{P}) and qN(0,Q)\mathbf{q}\sim\operatorname{N}(\mathbf{0},\mathbf{Q}), is given by

where ξi\boldsymbol{\xi}_{i} is some fixed set of sigma/training points and the weights are given by Equation (30) with some selected covariance function K(ξ,ξ)K(\boldsymbol{\xi},\boldsymbol{\xi}^{\prime}).

In this article, at least in the analytical results, we usually assume that the measurements are noise-free, that is, σ2=0\sigma^{2}=0. This enables us to obtain analytically exact relationships with the classical quadrature methods. However, when using Gaussian process quadratures as numerical integration method, it is often beneficial to have at least a small non-zero value for σ2\sigma^{2} in (30). This kind of “jitter” stabilizes numerics and can even be sometimes used to compensate for inaccuracies in modeling.

which are the unscented transform weights. We return to this relationship in Section IV-D.

III-B GPQs in filtering and smoothing

In this section we show how to construct filters and smoothers using the Gaussian process quadrature approximations. Because Algorithm III.1 can be seen as a sigma-point method, analogously to other sigma-point filters considered, for example, in , we can now formulate the following sigma-point filter for model (3), which uses the the unit sigma-points ξi\boldsymbol{\xi}_{i} and weights WiW_{i} defined by Algorithm III.1.

The filtering is started from initial mean and covariance, m0\mathbf{m}_{0} and P0\mathbf{P}_{0}, respectively, such that x0N(m0,P0)\mathbf{x}_{0}\sim\operatorname{N}(\mathbf{m}_{0},\mathbf{P}_{0}). Then the following prediction and update steps are applied for k=1,2,3,,Tk=1,2,3,\ldots,T.

Form the sigma points as follows: Xk1(i)=mk1+Pk1ξi,i=1,,N\mathcal{X}^{(i)}_{k-1}=\mathbf{m}_{k-1}+\sqrt{\mathbf{P}_{k-1}}\,\boldsymbol{\xi}_{i},i=1,\ldots,N.

Propagate the sigma points through the dynamic model: X^k(i)=f(Xk1(i)),i=1,,N\hat{\mathcal{X}}^{(i)}_{k}=\mathbf{f}(\mathcal{X}^{(i)}_{k-1}),i=1,\ldots,N.

Compute the predicted mean mk\mathbf{m}^{-}_{k} and the predicted covariance Pk\mathbf{P}^{-}_{k}:

Form the sigma points: Xk(i)=mk+Pkξi,i=1,,N\mathcal{X}^{-(i)}_{k}=\mathbf{m}^{-}_{k}+\sqrt{\mathbf{P}^{-}_{k}}\,\boldsymbol{\xi}_{i},i=1,\ldots,N.

Propagate sigma points through the measurement model: Y^k(i)=h(Xk(i)),i=1N\hat{\mathcal{Y}}^{(i)}_{k}=\mathbf{h}(\mathcal{X}^{-(i)}_{k}),i=1\ldots N.

Compute the predicted mean μk\boldsymbol{\mu}_{k}, the predicted covariance of the measurement Sk\mathbf{S}_{k}, and the cross-covariance of the state and the measurement Ck\mathbf{C}_{k}:

Compute the filter gain Kk\mathbf{K}_{k} and the filtered state mean mk\mathbf{m}_{k} and covariance Pk\mathbf{P}_{k}, conditional on the measurement yk\mathbf{y}_{k}:

Further following the line of thought in we can formulate a sigma-point smoother using the unit sigma-points and weights from Algorithm III.1.

The smoothing recursion is started from the filtering result of the last time step k=Tk=T, that is, mTs=mT\mathbf{m}^{s}_{T}=\mathbf{m}_{T}, PTs=PT\mathbf{P}^{s}_{T}=\mathbf{P}_{T} and proceeded backwards for k=T1,T2,,1k=T-1,T-2,\ldots,1 as follows.

Form the sigma points: Xk(i)=mk+Pkξi,i=1,,N\mathcal{X}^{(i)}_{k}=\mathbf{m}_{k}+\sqrt{\mathbf{P}_{k}}\,\boldsymbol{\xi}_{i},i=1,\ldots,N.

Propagate the sigma points through the dynamic model: X^k+1(i)=f(Xk(i)),i=1,,N\hat{\mathcal{X}}^{(i)}_{k+1}=\mathbf{f}(\mathcal{X}^{(i)}_{k}),i=1,\ldots,N.

Compute the predicted mean mk+1\mathbf{m}^{-}_{k+1}, the predicted covariance Pk+1\mathbf{P}^{-}_{k+1}, and the cross-covariance Dk+1\mathbf{D}_{k+1}:

Compute the gain Gk\mathbf{G}_{k}, mean mks\mathbf{m}^{s}_{k} and covariance Pks\mathbf{P}^{s}_{k} as follows:

Note that we could cope with non-additive noises in the model by using augmented forms of the above filters and smoothers as in . The fixed-point and fixed-lag smoothers can also be derived analogously as was done in the same reference.

IV Selection of covariance functions and sigma-point locations

The accuracy of the Gaussian process quadrature method and hence the accuracy of the filtering and smoothing methods using it is affected by

the covariance function K(ξ,ξ)K(\boldsymbol{\xi},\boldsymbol{\xi}^{\prime}) used and

the sigma-point locations ξi\boldsymbol{\xi}_{i}.

Once both of the above are fixed, the weights are determined by Equation (30). In this section we discuss certain useful choices of covariance functions as well as ”optimal” choices of sigma-point locations for them. We also discuss the connection of the resulting methods with sigma-point methods such as unscented transforms and Gauss–Hermite quadratures.

In a machine learning context the default choice for a covariance function of a Gaussian process is the squared exponential covariance function in Equation (20). What makes it convenient in Gaussian process quadrature context is that the integral required for computing the weights in Equation (30) can be evaluated in closed form (cf. ). It turns out that the posterior variance can be computed in closed form as well which is useful because for a given set of sigma-points we can immediately compute the expected error in the integral approximation (assuming that the integrand is indeed a GP) – this is possible because the variance does not depend on the observations at all.

One way to determine the sigma-point locations is to select them to minimize the posterior variance of the integral approximation . In our case this corresponds to minization of the variance in Equation (32) with respect to the points ξ1:N\boldsymbol{\xi}_{1:N}. Although the minimization is not possible in closed form, with a moderate NN this optimization can be done numerically. Figure 1 shows examples of minimum variance point sets optimized by using the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm .

The squared exponential covariance function is not the only possible choice for a covariance function. From the machine learning context we could, for example, choose a Matérn covariance function or some of the scale-mixture-based covariance functions . In that case the weight integral (30) becomes less trivial, but at least we always have a chance to precompute the weights using some (other) multivariate quadrature method. The sigma-point optimization could also be done similarly as for the squared exponential covariance function.

IV-B UT and spherical cubature rules

In addition to the squared exponential covariance funtion, another useful class of covariance function are polynomial covariance functions. They correspond to linear-in-parameters regression using polynomials as the regressor functions. It turns out that also for polynomial covariance functions we can compute the weights (30) in closed form. What is even more interesting is that the Gaussian process quadratures reduce to classical numerical integration methods. In this section we show that with certain selections of symmetric evaluation points we get a classical family of spherically symmetric integration methods of McNamee and Stenger of which the unscented transform can be (retrospectively) seen as a special case . More detailed information on the multivariate Hermite polynomials used below can be found in Appendix A.

where λI,J\lambda_{\mathcal{I},\mathcal{J}}’s form a positive definite covariance matrix and HI(ξ)H_{\mathcal{I}}(\boldsymbol{\xi}) are multivariate Hermite polynomials (see Appendix A). If we now select the evaluation points as in UT (17), then the GPQ weights WiW_{i} become the UT weights. Furthermore, the posterior variance of the integral approximation is exactly zero.

where cIc_{\mathcal{I}} are zero mean Gaussian random variables with the covariances λI,J=E[cIcJ]\lambda_{\mathcal{I},\mathcal{J}}=\operatorname{E}\left[c_{\mathcal{I}}\,c_{\mathcal{J}}\right]. When the joint covariance matrix Λ=[λI,J]\Lambda=[\lambda_{\mathcal{I},\mathcal{J}}] is non-singular, the posterior covariance of the integral being zero is equivalent to that the integral rule is exact for all functions of the form (38) with arbitrary coefficients. Clearly with the UT evaluations points, the UT weights are the unique ones that have this property (see, e.g., ) and hence the result follows. ∎

Note that the above result also covers the cubature transform (CT), that is, the moment matching rule used in the cubature Kalman filter (CKF) and the smoother, because the transform is a special case of UT .

If we select the evaluation points according to order P=5,7,9,P=5,7,9,\ldots rules in , we obtain the higher order integration formulas in , which are often referred to as fifth order, seventh order, ninth order and higher order UTs.

The result follows analogously to the 3rd order case above. ∎

IV-C Multivariate Gauss–Hermite point sets

The multivariate Gauss–Hermite point sets (see, e.g., ) of order PP are exact for monomials of of the form x1p1××xnpnx_{1}^{p_{1}}\times\cdots\times x_{n}^{p_{n}}, where pi2P1p_{i}\leq 2P-1 for i=1,,ni=1,\ldots,n. This implies the following covariance function class.

where λI,J\lambda_{\mathcal{I},\mathcal{J}}’s form a positive definite covariance matrix and HI(ξ)H_{\mathcal{I}}(\boldsymbol{\xi}) are multivariate Hermite polynomials. If we now select the evaluation points to form a cartesian product of roots of Hermite polynomials of order PP, then the GPQ weights WiW_{i} become the multivariate Gauss–Hermite quadrature weights. The posterior variance of the integral approximation is again exactly zero.

Again the result follows from the equivalence of the polynomial approximations and polynomial covariance functions together with the uniqueness of the Gauss–Hermite rule for exact integration of this same function class. ∎

Even when we are using polynomial covariance functions, we are by no means restricted to using the specific points sets corresponding to the classical integration rules. However, obviously, given the order of the polynomial kernel and number of sigma-points they are also minimum variance points sets and hence good choices also in average – provided that the integrand is indeed a polynomial. In any case, for an arbitrary set of sigma-points we can use Equation (30) to give the corresponding minimum variance weights.

IV-D Connection between squared exponential and polynomial Gaussian process quadratures

where cjN(0,(j!l2j)1)c_{j}\sim\operatorname{N}(0,(j!\,l^{2j})^{-1}). The covariance function of this class is now given as

which is the squared exponential covariance function.

Based on the above, Minka argued (although did not formally prove) that GPQs with the squared exponential covariance functions should converge to the classical quadratures. This argument is indeed backed up by our analytical example in Example III.1 where this covergence indeed happens.

IV-E Random and quasi-random point sets

Recall that one way to approximate the expectation of g(ξ)\mathbf{g}(\boldsymbol{\xi}) over a Gaussian distribution N(0,I)\operatorname{N}(\mathbf{0},\mathbf{I}) is to use Monte Carlo integration. In that method we simply draw NN samples from the Gaussian distribution ξiN(0,I)\xi_{i}\sim\operatorname{N}(\mathbf{0},\mathbf{I}) and use them as sigma-points. The classical Monte Carlo approximation to the integral would now correspond to setting Wi=1/NW_{i}=1/N. Alternatively, we could use these random points as sigma-points and evaluate their weights by Equation (30). This leads to an approximation, which is sometimes called the Bayesian Monte Carlo approximation .

Instead of sampling from the normal distribution, we can also use quasi-random points sets such as the Hammersley point sets . These are points sets which are designed to give a smaller error in average than random points. The classical method would correspond to setting all weights to Wi=1/NW_{i}=1/N, but again, we can also use Equation (30) to evaluate the weights for the GP quadrature. This corresponds to a ”Bayesian quasi Monte Carlo” approximation to the integral. Some examples of Hammersley point sets are shown in Figure 4.

V Numerical Results

The unscented transform covariance functions of orders 3–7 (see Theorems IV.1 and IV.2) and the exponentiated quadratic (i.e., the squared exponential, SE) covariance function (Eq. (20)) are illustrated in Fig. 5. The polynomial nature of the unscented transform (UT) covariance function can be clearly seen in the figures – the UT covariance function as such does not have such a simple local-correlation-interpretation as the SE covariance function has as the UT covariance functions simply blow up polynomially when moving away from the diagonal.

The corresponding Gaussian process regression results on random data are illustrated in Fig. 6. The polynomial nature of the unscented transform can be clearly seen in the figures. The Gaussian process prediction with the unscented transform covariance function has a clear polynomial shape as expected. Clearly the polynomial fit has less flexibility to explain the data than the exponentiated quadratic fit although the flexibility certainly grows with the polynomial (and thus UT) order.

V-B Illustrative high-dimensional example

We use the same test case as in Section VIII.A. of , that is, the computation of the first two moments of the function y(x)=(1+xTx)py(\mathbf{x})=(\sqrt{1+\mathbf{x}^{\mathsf{T}}\mathbf{x}})^{p} for p=1,2,3,5p=1,-2,-3,-5. We thus aim to approximate the following integrals:

Figure 7 shows the result of using the following methods as function of the state-dimensionality:

Cubature: The 3rd order spherical cubature sigma-points (2n2n points) with the standard integration weights.

GPQ-Cubature: The Gaussian process quadrature with SE covariance function and the 3rd spherical cubature sigma-points above.

GPQ-Hammersley: The Gaussian process quadrature with SE covariance and 2n2n Hammersley points.

The 3rd spherical cubature points refer to the integration rule proposed in , which was also used in the cubature Kalman filter (CKF) in . In the rule, the sigma-points of placed to the intersections coordinate axes with the origin-centered nn-dimensional hypersphere of radius n\sqrt{n}.

The results in Figure 7 show that the GPQ quite consistently gives a bit lower KL-divergence and hence better result than the plain cubature when the cubature points are used. When Hammersley point sets are used, the results vary a bit more: with small state dimensions the results are slightly worse than with the cubature points. When p1p\neq 1, the Hammersley results are much better in high dimensions whereas with p=1p=1 the results are worse than with the cubature point sets.

V-C Univariate non-linear growth model

In this section we compare the performance of the different methods in the following univariate non-linear growth model (UNGM) which is often used for benchmarking non-linear estimation methods:

where x0N(0,5)x_{0}\sim\operatorname{N}(0,5), qk1N(0,10)q_{k-1}\sim\operatorname{N}(0,10), and rkN(0,1)r_{k}\sim\operatorname{N}(0,1).

V-D Bearings only target tracking

In this section we evaluate the methods in the bearings only target tracking problem with a coordinated-turn dynamic model, which was also considered in Section III.A of the article . The non-linear dynamic model is

where the state of the target is x=(x1,x˙1,x2,x˙2,ω)\mathbf{x}=(x_{1},\dot{x}_{1},x_{2},\dot{x}_{2},\omega), and x1,x2x_{1},x_{2} are the coordinates and x˙1,x˙2\dot{x}_{1},\dot{x}_{2} are the velocities in two dimensional space. The time step size is set to Δt=1  s\Delta t=1\;\text{s} and the covariance of the process noise qkN(0,Q)q_{k}\sim N(0,Q) is

where we used q1=0.1m2s3q_{1}=0.1\text{m}2\text{s}^{-3} and q2=1.75×104s3q_{2}=1.75\times 10^{-4}\text{s}^{-3}.

In the simulation setup we have four sensors measuring the angles θ\theta between the target and the sensors. The non-linear measurement model for sensor ii can be written as

where (s1i,s2i)(s^{i}_{1},s^{i}_{2}) is the position of the sensor ii in two dimensions, and riN(0,σθ2)r^{i}\sim N(0,\sigma^{2}_{\theta}) is the measurement noise. The used parameters were the same as in the article .

The RMSE results for the position errors are shown in Figures 10 and 11. Clearly all of the sigma-point methods outperform the Taylor series based methods (EKF/EKS). However, the performances of all the sigma-point methods are very similar: also the Gaussian process quadrature methods give very similar results to the other sigma-point methods. There is a small dip in the errors at the Gauss–Hermite based methods as well as in the highest order Hammersley GPQ method, but practically the performance of all the sigma-point methods is the same.

VI Conclusion

In this article we have proposed new Gaussian process quadrature based non-linear Kalman filtering and smoothing methods and analyzed their relationship with other sigma-point filters and smoothers. We have also discussed the selection of the evaluation points for the quadratures with respect to different criteria: exactness for multivariate polynomials up to a given order, minimum average error, and quasi-random point sets. We have shown that with suitable selections of (polynomial) covariance functions for the Gaussian processes the filters and smoothers reduce to unscented Kalman filters of different orders as well as to Gauss–Hermite Kalman filters and smoothers. By numerical experiments we have also shown that the Gaussian process quadrature rules as well as the corresponding filters and smoothers often outperform previously proposed (polynomial) integration rules and sigma-point filters and smoothers.

Appendix A Fourier–Hermite series

Fourier–Hermite series (see, e.g., ) are orthogonal polynomial series in a Hilbert space, where the inner product is defined via an expectation of a product over a Gaussian distributions. These series are also inherently related to non-linear Gaussian filtering as they can be seen as generalizations of statistical linearization and they also have a deep connection with unscented transforms, Gaussian quadrature integration, and Gaussian process regression .

We can define an inner product of multivariate scalar functions f(x)f(\mathbf{x}) and g(x)g(\mathbf{x}) as follows:

If we now define a norm via fH2=f,f||f||_{\mathcal{H}}^{2}=\langle f,f\rangle, and the corresponding distance function d(f,g)=fgHd(f,g)=||f-g||_{\mathcal{H}}, then the functions fH<||f||_{\mathcal{H}}<\infty form a Hilbert space H\mathcal{H}. It now turns out that the multivariate Hermite polynomials form a complete orthogonal basis of the resulting Hilbert space .

A multivariate Hermite polynomial with multi-index I={i1,,in}\mathcal{I}=\{i_{1},\ldots,i_{n}\} can be defined as

which is a product of univariate Hermite polynomials

The orthogonality property can now be expressed as

where we have denoted I!=i1!in!\mathcal{I}!=i_{1}!\cdots i_{n}! and I=J\mathcal{I}=\mathcal{J} means that each of the elements in the multi-indices I={i1,,in}\mathcal{I}=\{i_{1},\ldots,i_{n}\} and J={j1,,jn}\mathcal{J}=\{j_{1},\ldots,j_{n}\} are equal. We will also denote the sum of indices as I=i1++in|\mathcal{I}|=i_{1}+\cdots+i_{n}.

A function g(x)g(\mathbf{x}) with g,g<\langle g,g\rangle<\infty can be expanded into Fourier–Hermite series

where HI(x)H_{\mathcal{I}}(\mathbf{x}) are multivariate Hermite polynomials and the series coefficients are given by the inner products cI=HI,gc_{\mathcal{I}}=\langle H_{\mathcal{I}},g\rangle.

Consider a Gaussian process gG(x)g_{G}(\mathbf{x}) which has zero mean and a covariance function K(x,x)K(\mathbf{x},\mathbf{x}^{\prime}). In the same way as deterministic functions, Gaussian processes can also be expanded into Fourier–Hermite series:

References