Convergence of Numerical Time-Averaging and Stationary Measures via Poisson Equations

Jonathan C. Mattingly, Andrew M. Stuart, M. V. Tretyakov

Introduction

In many application one is interested in estimating the invariant measure of stochastic differential equation (SDE) by running a numerical scheme which approximates the time dynamics of the SDE. Two common approaches are to use the numerical trajectories to construct an empirical measure by the time averaging trajectories or by averaging many different realizations to obtain a finite ensemble average (see for example ). In either case one produces an approximation to the numerical method’s invariant measure and is immediately presented with a number of questions including: (i) Does the numerical scheme have a stationary measure to which it converges quickly as time goes to infinity? (ii) How close is the numerical method’s stationary measure to the stationary measure of the underlying SDE ? (iii) How close is the time-averaging estimator to the stationary measure of the underlying SDE? This article mainly focuses on the second two questions.

The first question is addressed in many papers, including . There are also a number of works where the second question has been considered. In it was shown that several numerical methods for SDEs, including the forward Euler-Maruyama scheme, have unique stationary measures which converge at the expected rate to the unique stationary measure of the SDE. Moreover, in an expansion of the numerical integration error in powers of time step was obtained which allows one to use the Richardson-Romberg extrapolation to improve the accuracy. These works discuss the convergence in the case of relatively smooth tests functions. In , the authors use techniques from Malliavin calculus to establish smoothing properties of the discretization scheme. This allows one to prove the convergence of the averages of test functions which are only bounded. In turn, this shows that the distance between the numerical method’s stationary measure is close to the stationary measure of the SDE in the total-variation norm. In , a general method of deducing closeness of the stationary measure from an error estimate on a finite time interval is proposed, though in general it does not provide optimal estimates of the convergence rate. In the earlier works a global Lipschitz assumption was imposed on the coefficients of the SDE which was lifted in . The papers deal with elliptic SDEs, while the hypoelliptic case is treated in .

In this paper we obtain error estimates for time-averaging estimators. The error estimates are given in terms of a term of the same order of magnitude as the weak finite time error of the numerical integrator used and the length of the simulation. Control of the time-averages is then used to prove closeness of the numerical method’s stationary measure to the stationary measure of the SDE. The convergence rate obtained is the same as the weak convergence rate obtained on a finite time interval. This highlights the general principle that, for time-dependent problems, finite time approximation properties can be transferred to infinite time approximation properties in the presence of suitable stability. Here the underlying stability is the geometric ergodicity of the SDE.

The main tool in our analysis is an associated Poisson equation for the underlying SDE. In this error analysis, the Markov chain generated by the numerical method need not be uniquely ergodic. It is shown that any stationary measure of the numerical method will be close to the unique stationary measure of the underlying SDE. We see the main advantage of the current analysis being its simplicity and universality. In particular, we obtain the error estimates in the case of numerical schemes with any reasonable simple random variables (including the discrete random variables used in weak Euler schemes ). Furthermore, our approach works equally well for a range of explicit and implicit schemes, and, perhaps more importantly, it works for hypoelliptic SDEs. It is comparatively short and the analysis leverages classical results on PDEs. It should be possible to carry over this approach to methods with adaptive step-sizes (see ). It should also be adaptable to the SPDE setting, using the Poisson equation in infinite dimensions (see related applications of Poisson equations in ).

Our goal here has not been to give the most general results. We have picked the simplifying setting of a compact phase space, namely the dd-dimensional torus, and relatively smooth test functions. The extension to the whole space requires control of the time spent outside of the center of the phase space. Under additional assumptions in the spirit of , we believe that versions of the present results can be proven in Rd\mathbf{R}^{d}.

In Section 2, we introduce the basic setting for the underlying SDE and discuss the hypoellipticity assumptions. In Section 3, we describe the class of numerical approximations we consider. We give examples of explicit and implicit methods which fit our framework. In Section 4.1, we discuss the auxiliary Poisson equation which will be used to prove the main results, and we give properties of its solution. In Section 4.2, we warm up by showing how to prove a law of large numbers, for the SDE, using an auxiliary Poisson equation. Section 5.1 contains the main results of the article which give a number of senses in which the numerical time-averaging estimators are close to the corresponding stationary time average of the SDE. In Section 5.2 we extend the results of Section 5.1 to numerical methods of higher orders and in Section 5.3 the Richardson-Romberg (Talay-Tubaro) error expansion is considered. Section 6 uses the results of Section 5 to give a quantitative estimate on the distance of the numerical stationary measures from the underlying stationary measure for the SDE. In Section 6.2 we make some general comments about analogies with Stein’s method for proving the convergence of distributions to a limiting law. Some practical implications of the results of Section 5 are discussed in Section 7, where we classify the errors of the numerical time-averaging estimators and, in particular, pay attention to the statistical error.

SDE Setting

Let (Ω,F,Ft,P),(\Omega,\mathcal{F},\mathcal{F}_{t},\mathbf{P}), t0,t\geq 0, be a filtered probability space and W(t)=(W1(t),,Wm(t))W(t)=(W_{1}(t),\ldots,W_{m}(t))^{\top} be an mm-dimensional {Ft}t0\{\mathcal{F}_{t}\}_{t\geq 0}-adapted standard Wiener process. Consider the Markov process X(t)X(t) on the torusIn physical applications a process on the torus can be of interest when periodic boundary conditions imposed on the problem. A typical example is a noisy gradient system which may be used to sample from the Gibbs’ distribution with a periodic potential . We also note that our results can potentially be applied to approximating Lyapunov exponents on compact smooth manifolds . Td\mathbf{T}^{d} whose time evolution is governed by the Ito SDE

where f=(f1,,fd) ⁣:f=(f_{1},\ldots,f_{d})^{\top}\colon TdRd\mathbf{T}^{d}\rightarrow\mathbf{R}^{d} and g ⁣:g\colon TdRd×m\mathbf{T}^{d}\rightarrow\mathbf{R}^{d\times m}. We assume that ff and gg are Lipschitz continuous (and hence uniformly bounded) functions on Td\mathbf{T}^{d}. Under these assumptions, equation (2.1) has a unique pathwise global solution. The generator of the Markov process X(t)X(t) is

where a(x)=g(x)g(x)a(x)=g(x)g^{\top}(x). For a twice differentiable function ϕ ⁣:TdR\phi\colon\mathbf{T}^{d}\rightarrow\mathbf{R} and xTdx\in\mathbf{T}^{d} we have

The operator L\mathcal{L} generates a strong Markov semigroup PtP_{t}, for t0t\geq 0, which maps smooth bounded functions to smooth bounded functions. It can also be defined by (Ptϕ)(x)=Exϕ(Xt)(P_{t}\phi)(x)=\mathbf{E}_{x}\phi(X_{t}) where X0=xX_{0}=x. By duality, PtP_{t} also induces a semigroup which acts on probability measures. In the name of notational economy, we will also denote this semigroup by PtP_{t}.

The kk-th column of gg, which we denote by g(k)g^{(k)}, can be viewed as a function from TdRd\mathbf{T}^{d}\rightarrow\mathbf{R}^{d}. Hence {f,g(1),,g(m)}\{f,g^{(1)},\dots,g^{(m)}\} is a collection of vector fields on Td\mathbf{T}^{d} and (2.1) can be rewritten as

This form of equation (2.1) makes it clear that there is a deterministic drift in the direction of ff and independent random kicks in each of mm directions {g(1),,g(m)}\{g^{(1)},\dots,g^{(m)}\}. We will require that the randomness injected into the dynamics in the g(k)g^{(k)} directions effects all directions sufficiently to produce a “smoothing” effect at the level of probability densities. Uniform ellipticity is the simplest assumption which ensures “sufficient spreading” of the randomness. However, it is far from necessary and many important examples are not uniformly elliptic. For our purposes, it is sufficient that the system is hypoelliptic.

The following is our basic assumption governing the stationary measure of the system, and its smoothness.

We assume that one of the following two assumptions hold:

(Elliptic Setting) The matrix-valued function a(x)=g(x)g(x)a(x)=g(x)g^{\top}(x) is uniformly positive definite: there exists α>0\alpha>0 so that, for all zRdz\in\mathbf{R}^{d} and xTdx\in\mathbf{T}^{d}, αz2a(x)z.z .\alpha|z|^{2}\leq a(x)z.z\ .

(Hypoelliptic setting) The functions ff and gg are C(Td,Rd)C^{\infty}(\mathbf{T}^{d},\mathbf{R}^{d}) and such that, for some nn and all xTdx\in\mathbf{T}^{d}, Λn=Λn(x)=Rd\Lambda_{n}=\Lambda_{n}(x)=\mathbf{R}^{d} and (2.1) possesses a unique stationary measure.

In either case in Assumption 1, we have that (2.1) has a unique stationary measure, henceforth denoted by μ\mu, which has a density with respect to Lebesgue measure on Td{\mathbf{T}}^{d}.

As already mentioned, the goal of this paper is to give a simple, yet robust, proof that time averages obtained using a class of numerical methods for simulating (2.1) is close to the corresponding ergodic limits of (2.1). To this end, we now describe the class of numerical methods we will consider.

Numerical approximations

We will begin by considering a class of simple numerical approximation of (2.1) given by the generalized Euler-Maruyama method on the torus Td:\mathbf{T}^{d}:

where Δ\Delta is the time increment, F ⁣:Td×(0,1)RdF\colon\mathbf{T}^{d}\times(0,1)\rightarrow\mathbf{R}^{d}, G ⁣:Td×(0,1)Rd×mG\colon\mathbf{T}^{d}\times(0,1)\rightarrow\mathbf{R}^{d\times m}, and ηn=(ηn,1,,ηn,m)\eta_{n}=(\eta_{n,1},\ldots,\eta_{n,m})^{\top} is an Rm\mathbf{R}^{m}-valued random variable and {ηn,i:nN,i{1,,m}}\{\eta_{n,i}:n\in\mathbf{N},i\in\{1,\ldots,m\}\} is a collection of i.i.d. real-valued random variables satisfying

for a sufficiently large r2.r\geq 2.We do not specify rr in the statements of the forthcoming theorems since common numerical schemes use random variables with bounded moments up to any order. At the same time, in each proof of Section 5.1 it is not difficult to recover the number of bounded moments required. More general methods will be considered in Section 5.2.

For example, η1,1N(0,1)\eta_{1,1}\sim\mathcal{N}(0,1) satisfies these assumptions as well as η1,1\eta_{1,1} with the law

We also note that the above two choices of η1,1\eta_{1,1} guarantee existence of finite moments of η1,1\eta_{1,1} of any order. Further, we assume that ηn,\eta_{n}, nN,n\in\mathbf{N}, is defined on the probability space (Ω,F,P)(\Omega,\mathcal{F},\mathbf{P}) and is {Ftn}\{\mathcal{F}_{t_{n}}\}-adapted, for tn=nΔ.t_{n}=n\Delta.

Since we want the dynamics of (3.1) to be “close” to those of (2.1), we make the following assumption on the “local error” of (3.1).

For some C>0C>0, all xTdx\in\mathbf{T}^{d}, and all Δ\Delta sufficiently small

Under this assumptions we expect that XnX(tn)X_{n}\approx X(t_{n}) where as before tn=nΔt_{n}=n\Delta. In fact it follows from the general theory that such numerical schemes have first-order weak convergence on finite time intervals.

In analogy to (2.2), we define an operator associated to (3.1) by

where A(x,Δ)=G(x,Δ)G(x,Δ)A(x,\Delta)=G(x,\Delta)G^{\top}(x,\Delta). While not the generator of the Markov process (3.1), it is the generator’s leading order part since

where D2ϕD^{2}\phi is the second derivative, we deduce that

It is reasonable to expect that the distribution of the dynamics of the SDE and its approximation will be close to each other since the leading part of the generator of the approximation (3.1) is close to that of the original process (2.1).

We close this section by giving two approximation methods which satisfy Assumption 2.

and hence F(x,Δ)=f(x)F(x,\Delta)=f(x) and G(x,Δ)=g(x)G(x,\Delta)=g(x).

and hence F(x,Δ)=f(y)F(x,\Delta)=f(y) and G(x,Δ)=g(y)G(x,\Delta)=g(y) where yy is and element of {yTd:y=x+f(y)Δ}\{y\in\mathbf{T}^{d}:y=x+f(y)\Delta\} which is closest to xx.

From (3.6) and the backwards Kolmogorov equation for (2.1), Assumption 2 is equivalent to

for all sufficiently smooth ϕ\phi, provided that X0=X(0)X_{0}=X(0). In the language of consistency and stability of a numerical method used in the introduction, (3.6) and (3.8) provide an appealing way to characterize the degree of local consistency of a numerical method. This is will be the starting point for our discussion of higher order methods in Section 5.2.

Poisson equation

It is a “meta-theorem” in averaging, homogenization and ergodic theory that if one can solve the relevant Poisson equation then one can prove results about the desired limit of a time-average. A central theme of this paper is to show how an appropriate Poisson equation can be used to analyze the long time average of a given function ϕ ⁣:TdR\phi\colon\mathbf{T}^{d}\rightarrow\mathbf{R}, evaluated along a numerical approximation of (2.1) and obtain information about its closeness to the corresponding ergodic limits.

In this section, for motivation, we illustrate key ideas related to the Poisson equation, purely in the continuous time setting. To this end, recalling that μ\mu is the unique stationary measure of (2.1) and L\mathcal{L} its generator, given ϕ ⁣:TdR\phi\colon\mathbf{T}^{d}\rightarrow\mathbf{R} we define the stationary average of ϕ\phi by

and let ψ\psi solve the Poisson equation

Under Assumption 1, (4.2) possesses a unique solution which is at least as smooth as ϕ\phi. For kNk\in\mathbf{N}, we denote by Wk,W^{k,\infty} the space of functions from Td\mathbf{T}^{d} to R\mathbf{R} such that the function and all of its partial derivatives up to order kk are essentially bounded. We then have the following result whose proof we sketch.

Under Assumption 1-i), given any ϕWk,\phi\in W^{k,\infty}, with kN{0}k\in\mathbf{N}\cup\{0\} there exists a unique solution ψWk+2,\psi\in W^{k+2,\infty} to (4.2). Under Assumption 1-ii) there exists a δ>0\delta>0 so that given any ϕWk,\phi\in W^{k,\infty}, with kN{n2}k\in\mathbf{N}\cup\{n\geq 2\}, there exists a unique solution ψWk+δ,\psi\in W^{k+\delta,\infty} to (4.2).

Let \triangle denote the Laplacian on the dd dimensional torus. In either the hypoelliptic or elliptic case, one knows that for some positive α\alpha and CC

for all smooth ϕ\phi. (See and recall that our space is compact.) This implies that L\mathcal{L} has compact resolvent and discrete spectrum (consisting of isolated points). Thus L\mathcal{L} has a spectral gap and is invertible on the complement of the span of the eigenfunctions with zero eigenvalue. The space is nothing other than the functions which have zero mean with respect to an invariant measure for the corresponding semigroup PtP_{t} (i.e. μ\mu such that Ptμ=μP_{t}\mu=\mu or in terms of the generator Lμ=0\mathcal{L}^{\ast}\mu=0). Since the system has a unique invariant measure due to Assumption 1, we see that ϕϕˉ\phi-\bar{\phi} has zero projection onto the space spanned by eigenfunctions with eigenvalue zero. Hence we know there exists a function uu which solves (4.2) weakly, by the Fredholm alternative.

This leaves the regularity. In the uniformly elliptic case, see . In the hypoelliptic case the results follow from Theorem 7.3.4 of or . This states that there is δ>0\delta>0 such that, if ϕWk,p,\phi\in W^{k,p}, then uWk+δ,pu\in W^{k+\delta,p} for p[2,)p\in[2,\infty) and kNk\in\mathbf{N}. Since our ϕWk,p\phi\in W^{k,p} for all p2p\geq 2, we know that up1Wk+δ,pu\in\bigcap_{p\geq 1}W^{k+\delta,p}. Since our space has finite measure, we know that Wk+δ,=p2Wk+δ,pW^{k+\delta,\infty}=\bigcap_{p\geq 2}W^{k+\delta,p}.

One of the principle technical issues that must be addressed in order to extend the results in this paper from the Td\mathbf{T}^{d} case to general ergodic SDE on Rd\mathbf{R}^{d} is proof of the existence of nice, well-controlled solutions to the above Poisson equation. Many of the needed results can be found in .

2 A strong law of large numbers: an illustrative example

We now show how to use the Poisson equation to prove a law of large numbers for (2.1). This idea appears frequently in the literature . Nonetheless since this technique is central to our investigation of discrete time approximations to diffusions, we first highlight the main ideas in continuous time. However, before we begin it is worth noting that, at least formally, the solution to Poisson equation can be written as

This can be interpreted as the total fluctuation over time of PtϕP_{t}\phi from ϕˉ\bar{\phi}; and hence, it is not surprising that ψ\psi can be used to control convergence to equilibrium.

As stated in Theorem 4 when Assumption 1 holds, given any ϕ ⁣:\phi\colon TdR\mathbf{T}^{d}\rightarrow\mathbf{R} with ϕW2,\phi\in W^{2,\infty} there exists a unique ψ ⁣:\psi\colon TdR\mathbf{T}^{d}\rightarrow\mathbf{R} which solves (4.2). Furthermore ψW2,\psi\in W^{2,\infty} and hence Itô’s formula then tells us that

where M(t)=0t(ψ)(X(s)).g(X(s))dW(s)M(t)=\int_{0}^{t}(\nabla\psi)(X(s))\,.\,g(X(s))\,dW(s). Since ψ\psi is bounded, the first term on the right-hand side goes to zero as tt\rightarrow\infty. To see that the last term also goes to zero as tt\rightarrow\infty observe that

for some K>0K>0 independent of time since  ψ\ \nabla\psi and gg are both bounded on Td\mathbf{T}^{d}. More precisely, we have shown that for any initial x0x_{0}

which is a quantitative version of what is often called the mean ergodic theorem.

We can view 1T0Tϕ(X(s))ds\frac{1}{T}\int_{0}^{T}\phi(X(s))\,ds as an estimator for ϕˉ\bar{\phi} and it follows from (4.5) that

The estimate (4.8) easily follows from (4.6) and (4.7) but can also be obtained directly from (4.5) using an additional mixing condition.

To obtain an almost sure (a.s.) resultThis result can also be obtained using Kronecker’s lemma (see, e.g. ). Though the approach we follow gives explicit error estimates which can be useful. , for any ε>0\varepsilon>0 we introduce A(T;ε)={1TsuptTM(t)>Tε12}.A({T;}\varepsilon)=\{\frac{1}{T}\sup_{t\leq T}|M(t)|>T^{\varepsilon-\frac{1}{2}}\}. By the Doob inequality for continuous martingales and the fact that EM(T)2KT\mathbf{E}|M(T)|^{2}\leq KT, we get

and the Borel-Cantelli lemma implies that there exists an a.s. bounded random variable C(ω)>0C(\omega)>0 and n0n_{0} so that for every nNn\in\mathbf{N} with nn0n\geq n_{0} one has

Hence with probability one, for every t[2n,2n+1)t\in[2^{n},2^{n+1}) with nn0n\geq n_{0} one has

By combining this estimate with (4.5) and the fact that ψ\psi is bounded, we see that for any ε>0\varepsilon>0 and for every t2n0t\geq 2^{n_{0}} one has

for some a.s. bounded C(ω)>0.C(\omega)>0. We note that (4.9) implies that for any initial x0x_{0}

In other words, the strong law of large number holds starting from any initial x0x_{0}. Our assumptions are sufficient to ensure that (2.1) has a unique stationary measure μ\mu. Hence, it follows from Birkoff’s ergodic theorem that (4.10) holds for μ\mu-a.e. initial x0x_{0}. The above argument not only shows that the result holds for all x0,x_{0}, but also gives quantitative estimates on the rate of convergence.

Error analysis for the numerical time-average

We now wish to generalize the calculations in the previous section to prove the closeness of the time averages obtained via the Euler approximation (3.1) to the stationary averages of (2.1). We consider sample path estimates in this section and then convert these to distance estimates in an appropriate metric, in Section 6.

Let T=NΔ.T=N\Delta. Introduce the estimator (discrete time-average) ϕ^N\hat{\phi}_{N} for the stationary average ϕˉ\bar{\phi}:

where XnX_{n} is the SDE approximation defined in (3.1).

We start by proving a mean convergence result, followed by L2L^{2} and almost sure theorems.

Let Assumptions 1 and 2 hold. Let HH denote W2,W^{2,\infty} in the elliptic setting and W4,W^{4,\infty} in the hypoelliptic setting. Then for any ϕH\phi\in H, one has

where T=NΔT=N\Delta and CC is some positive constant independent of Δ\Delta and TT. Furthermore, the constant CC is a linear function of ϕH|\phi|_{H} and otherwise independent of ϕ\phi.We indicate the dependence of constants on ϕ\phi since this dependence is used in Section 6. Obviously, all the constants appearing in the statements of this and forthcoming theorems also depend on the coefficients of the SDE (2.1) and on the numerical method used.

Under either of the conditions from Assumption 1, we know from Theorem 4 that the solution ψ\psi of (4.2) is in W4,W^{4,\infty}. In the interest of clarity and definiteness, we will proceed under the first condition in Assumption 1 (the elliptic setting) where the D4ψCD2ϕ|D^{4}\psi|_{\infty}\leq C|D^{2}\phi|_{\infty}, D3ψCDϕ|D^{3}\psi|_{\infty}\leq C|D\phi|_{\infty}, D2ψDψψCϕ|D^{2}\psi|_{\infty}\vee|D\psi|_{\infty}\vee|\psi|_{\infty}\leq C|\phi|_{\infty}. In the hypoelliptic setting, the kthk^{th} derivative of ψ\psi is bounded by the same derivative of ϕ\phi, necessitating the change in definition of HH between the elliptic and hypoelliptic cases in the statement of the theorem.

For brevity, we write ϕn=ϕ(Xn)\phi_{n}=\phi(X_{n}), Fn=F(Xn,Δ)F_{n}=F(X_{n},\Delta), Gn=G(Xn,Δ)G_{n}=G(X_{n},\Delta), ψn=ψ(Xn)\psi_{n}=\psi(X_{n}) and Dkψn=(Dkψ)(Xn)D^{k}\psi_{n}=(D^{k}\psi)(X_{n}) where (Dkψ)(z)(D^{k}\psi)(z) is the kkth derivative. We write (Dkψ)(z)[h1,,hk](D^{k}\psi)(z)[h_{1},\ldots,h_{k}] for the derivative evaluated in the directions hjh_{j}. Defining δˉn=\mboxdefXn+1Xn=ΔFn+ΔGnηn+1\bar{\delta}_{n}\overset{\mbox{\tiny def}}{=}X_{n+1}-X_{n}=\Delta F_{n}+\sqrt{\Delta}G_{n}\eta_{n+1}, we have

is the remainder given by the Taylor theorem. Hence,

Summing (5.4) over the first NN terms, dividing by NΔN\Delta and using (4.2), produces

We will find it convenient to further decompose

Notice that E[rn+1Ftn]=0\mathbf{E}[r_{n+1}|\mathcal{F}_{t_{n}}]=0 and E[ηn+1Ftn]=0\mathbf{E}[\eta_{n+1}|\mathcal{F}_{t_{n}}]=0 and E[ηn+1,i ηn+1,j ηn+1,kFtn]=0\mathbf{E}[\eta_{n+1,i}\ \eta_{n+1,j}\ \eta_{n+1,k}|\mathcal{F}_{t_{n}}]=0. Then it is not difficult to see that Mi,k,M_{i,k}, i=0,,3,i=0,\ldots,3, are martingales with respect to {Ftk}\{\mathcal{F}_{t_{k}}\} and, in particular, EMi,k=0\mathbf{E}M_{i,k}=0 for any kk.

Since ff and gg are uniformly bounded, (3.3) implies that FnF_{n} and GnG_{n} are uniformly bounded in nn. Recall that we also know that ψ\psi and its first four derivatives are uniformly bounded. Hence the Si,NS_{i,N} are bounded as follows:

where the CiC_{i} are positive constants which we have labeled for future reference. Similarly, we have (cf. (3.5)):

Notice that this bound and the above bound in S1,NS_{1,N} are a.s. bounds with deterministic constants. Lastly observe that ψNψ02ϕ|\psi_{N}-\psi_{0}|\leq 2|\phi|_{\infty}. Applying all of the preceding estimates to (5.5) produces the quoted result. ∎

We now consider an L2L^{2} convergence result, related to the mean ergodic theorem (4.6) in the continuous case.

In the setting of Theorem 6, for any ϕH\phi\in H, one has

where T=NΔT=N\Delta and CC is some positive constant independent of Δ\Delta and TT. Furthermore, CC depends on ϕ\phi only through ϕH|\phi|_{H} and does so linearly.

As in the proof of Theorem 6, we provide details in the elliptic case; in the hypoelliptic case the only change is that higher derivatives of ϕ\phi appear in the constants. We begin with (5.5) from the proof of Theorem 6 and obtain

where CC is an ever changing constant. Observe that it follows from (5.6) that S1,N2C12ϕ2Δ2T2.|S_{1,N}|^{2}\leq C_{1}^{2}|\phi|_{\infty}^{2}\Delta^{2}T^{2}. Further, by reasoning similar to that used in getting (5.6) we have

Similar reasoning and the bound on Eηi,n8\mathbf{E}\eta_{i,n}^{8} give

Using the bounds on n=0N1(LΔL)ψn|\sum_{n=0}^{N-1}(\mathcal{L}^{\Delta}-\mathcal{L})\psi_{n}| and ψNψ0|\psi_{N}-\psi_{0}| from the proof of Theorem 6 and all the above inequalities, we estimate the right-hand side of (5.9) and arrive at the quoted result. ∎

We now prove an almost sure version of the preceding two results. Its relation to Theorem 7 is the same as the relation of (4.9) to (4.6).

In the setting of Theorem 6, fixing an L>0L>0 there exists a deterministic constant KK (depending lineally on LL) so that for all ϕ\phi with ϕHL\|\phi\|_{H}\leq L, Δ\Delta sufficiently small, positive ε>0\varepsilon>0, and TT sufficiently large one has:

where T=NΔT=N\Delta and C(ω)>0C(\omega)>0 is an a.s. bounded random variable depending on ε\varepsilon and the particular ϕ\phi.

As in the proof of Theorem 6, we provide details in the elliptic case; in the hypoelliptic case the only change is that higher derivatives of ϕ\phi appear in the constants. Starting from (5.5), we have

where KK is an ever changing positive deterministic constant, independent of Δ\Delta and N.N. Due to the strong law of large numbers, the sum 1Nn=0N1ηn+14\frac{1}{N}\sum_{n=0}^{N-1}\left|\eta_{n+1}\right|^{4} a.s. convergesNote that in the case of ηn,i\eta_{n,i} from (3.2) this sum is equal to m2m^{2}. to Eη14\mathbf{E}\left|\eta_{1}\right|^{4} and thus for almost every sequence η1,\eta_{1}, η2,\eta_{2}, \ldots and all sufficiently large NN we have 1Nn=0N1ηn+14<K\frac{1}{N}\sum_{n=0}^{N-1}\left|\eta_{n+1}\right|^{4}<K for some deterministic K>0.K>0. Hence

where ΘN=\mboxdef1Ti=03Mi,N.\Theta_{N}\overset{\mbox{\tiny def}}{=}\frac{1}{T}\sum_{i=0}^{3}|M_{i,N}|.

Now we analyze ΘN.\Theta_{N}. We have for r1r\geq 1

(here KK depends on rr). Recall that Mi,kM_{i,k} are martingales. If we can prove that EMi,N2rCNr\mathbf{E}|M_{i,N}|^{2r}\leq CN^{r} for all rr sufficiently large then the proof can be completed with the aid of the Borel-Cantelli lemma as in Section 4.2.

We will the provide argument for estimating EM2,N2r,\mathbf{E}|M_{2,N}|^{2r}, the other terms are estimated analogously. We re-write

Note that the second term is equal to zero. Indeed

Note that by Jensen’s inequality this inequality holds for non-integer r1r\geq 1 as well. Therefore,

The Markov inequality together with (5.15) implies

Then for any γ=1/2ε\gamma=1/2-\varepsilon, with ε>0\varepsilon>0, there is a sufficiently large r1r\geq 1 such that (recall that Δ\Delta is fixed here and T=ΔN)T=\Delta N)

Hence, by the Borel-Cantelli lemma, the random variable \varsigma\overset{\mbox{\tiny def}}{=}\sup_{T>0}T^{\gamma}\Theta_{N}\ is a.s. finite which together with (5.11) implies (5.10). ∎

We note that Theorems 6-8 do not require the Markov chain XnX_{n} to be ergodic. It is, of course, possible under some additional conditions on the numerical method to prove its ergodicity (see for example ). If the limit limNϕ^N\lim_{N\rightarrow\infty}\hat{\phi}_{N} exists and is independent of X0X_{0} (i.e., if the Markov chain XnX_{n} is ergodic) then it follows from Theorem 8 that \lim_{N\rightarrow\infty}\big{|}\hat{\phi}_{N}-\bar{\phi}\big{|}\leq K\Delta\ \a.s., which is consistent with the results of, for example, .

In the series of papers the authors consider the following weighted estimator for the stationary average ϕˉ:\bar{\phi}:

where XnX_{n} are obtained by an Euler-type scheme with a decreasing time step Δn\Delta_{n} such that Δn0\Delta_{n}\rightarrow 0 as nn\rightarrow\infty while positive weights wnw_{n} are such that n=1Nwn\sum_{n=1}^{N}w_{n}\rightarrow\infty as N.N\rightarrow\infty. In particular, they proved that for some Euler-type schemes and wn=Δn=Δ0nα,w_{n}=\Delta_{n}=\Delta_{0}n^{-\alpha}, α[1/3,1),\alpha\in[1/3,1),

We also note that the authors of proved a.s. convergence of weighted empirical measures based on Euler-type schemes with decreasing step to the invariant measure of the corresponding SDE exploiting the Echeverria-Weiss theorem, which differs from the approach used in this paper.

If we fix the computational costs (i.e., NN and an Euler-type scheme) then the asymptotically optimal choice of Δ\Delta for ϕ^N\hat{\phi}_{N} is N1/3N^{-1/3} in which case O(Δ+1/N1/2Δ1/2)=O(1/N1/3).O\left(\Delta+1/N^{1/2}\Delta^{1/2}\right)=O\left(1/N^{1/3}\right). It is interesting to note that these optimal orders of errors of both estimators are the same. At the same time, we should emphasize that the term with Δ\Delta in (5.16) (see also KΔK\Delta (5.7) and (5.10)) is related to the numerical integration error while the term with 1/T=1/N1/2Δ1/21/\sqrt{T}=1/N^{1/2}\Delta^{1/2} is related to the statistical error (see also Section 7). In the case of large scale simulations (like those in molecular dynamics) the numerical error is usually relatively small thanks to the existing state of the art numerical integrators and the statistical error prevails. In such common in practice situations one chooses Δ\Delta and NN to appropriately control the corresponding errors (see further discussion in Section 7).

2 Higher order schemes

We now consider a more general approximation than (3.1). In addition to the assumptions made in Section 2 we assume in this section that the coefficients of the SDE (2.1) and the function ϕ\phi are sufficiently smooth.

Given a function δˉ:Td×(0,1)×RmTd\bar{\delta}:\mathbf{T}^{d}\times(0,1)\times\mathbf{R}^{m}\rightarrow\mathbf{T}^{d} and a sequence of Rm\mathbf{R}^{m}-valued i.i.d. random variables {ξn=(ξn,1,,ξn,m):nN}\{\xi_{n}=(\xi_{n,1},\ldots,\xi_{n,m}):n\in\mathbf{N}\}, we define a general numerical method

We assume that the ξn\xi_{n} have sufficiently high moments finite. Clearly our previous class of methods fits in to this framework as well as a number of new methods such as implicit Euler. Guided by (3.8) we make the following general assumption about (5.17) after which we will state some easier to verify conditions which are equivalent.

For all Δ(0,1)\Delta\in(0,1) sufficiently small, and all ϕW2(p+1),\phi\in W^{2(p+1),\infty} if X0=X(0)X_{0}=X(0) then

where the constant in the error term is uniform over all ϕ\phi with ϕW2(p+1),1\|\phi\|_{W^{2(p+1),\infty}}\leq 1.

To complement the increments of the numerical method δˉ\bar{\delta} defined above we now define the δ\delta increments of the SDE. Namely for any xTdx\in\mathbf{T}^{d}, we define δ:Td×(0,1)×ΩTd\delta:\mathbf{T}^{d}\times(0,1)\times\Omega\rightarrow\mathbf{T}^{d} by

where X(0;ω)=xX(0;\omega)=x and X(t;ω)X(t;\omega) solves (2.1). The following proposition, whose proof is given at the end of the section, enables Assumption 3 to be verified.

Assume that for some pNp\in\mathbf{N}, there exists a positive constant KK so that for all Δ(0,1)\Delta\in(0,1) sufficiently small:

Then the method satisfies Assumption 3 with the same pp.

We note that examples of second-order weak schemes (p=2p=2) which satisfy Assumption 3 can be found in many places including [29, p. 103] and . Higher order methods also exist . We also note that it follows from the general theory [29, p. 100] that the numerical schemes considered in Proposition 10 have weak convergence of order pp on finite time intervals. Now we prove a mean convergence result, which is analogous to Theorem 6 in the case of Euler-type methods (3.1).

Let Assumptions 1 and 3 hold. Let HH denote W2p,W^{2p,\infty} in the elliptic setting and W2(p+1),W^{2(p+1),\infty} in the hypoelliptic setting. Then for any ϕH\phi\in H with ϕH1|\phi|_{H}\leq 1, consider ϕ^N\hat{\phi}_{N} defined by (5.1) where XnX_{n} is generated by the numerical method from (5.17) rather than (3.1) as previously. Then

where T=NΔT=N\Delta and CC is some positive constant independent of Δ\Delta and TT. Furthermore, the constant CC is a linear function of ϕH|\phi|_{H} and otherwise independent of ϕ\phi. In other words,

(of Theorem 11) As before, let ψ\psi be the solution to the Poisson equation associated to ϕ\phi given in (4.2). Define ψn=ψ(Xn)\psi_{n}=\psi(X_{n}) and let X(x,t)X(x,t) be the solution to (2.1) at time tt with initial condition xx. From our assumptions, we have that

Rewriting Eψ(X(Xn,Δ))\mathbf{E}\psi(X(X_{n},\Delta)) in (5.24) via the Taylor expansion of expectations of SDE solution ([29, Lemma 2.1.9, p. 99] or ), we arrive at

Summing (5.25) over the first NN terms and dividing by NΔ,N\Delta, we obtain

where Qk=1Nn=0N1E(Lkψ)(Xn).Q_{k}=\frac{1}{N}\sum_{n=0}^{N-1}\mathbf{E}\left(\mathcal{L}^{k}\psi\right)(X_{n}). Using (4.2) and boundedness of ψ\psi, we have after rearrangement of (5.26):

Applying analogous arguments to Lk1ψ,\mathcal{L}^{k-1}\psi, k=2,,p,k=2,\ldots,p, as we did for ψ\psi in (5.26), we have

and, in particular, QpKΔ+K/T.|Q_{p}|\leq K\Delta+K/T. Hence QkKΔp+1k+K/T|Q_{k}|\leq K\Delta^{p+1-k}+K/T which together with (5.27) implies (5.22).To help with intuitive understanding of the proof, we remark that Lkψ(x)dμ(x)=0\int\mathcal{L}^{k}\psi(x)\,d\mu(x)=0 which is approximated by QkQ_{k} with sufficient accuracy. ∎

If we substitute XnX_{n} from the method (5.17) in ϕ^N\hat{\phi}_{N} from (5.1), it is also possible to prove (analogous to Theorem 8) that

(of Proposition 10) We note that under the assumptions made the solution ψ\psi of the Poisson equation (4.2) is sufficiently smooth. Expanding ψn+1=ψ(Xn+1)\psi_{n+1}=\psi(X_{n+1}) in powers of δˉn=\mboxdefXn+1Xn\bar{\delta}_{n}\overset{\mbox{\tiny def}}{=}X_{n+1}-X_{n} and taking expectation, we get

where the remainder O(Δp+1)KΔp+1|O(\Delta^{p+1})|\leq K\Delta^{p+1} with KK independent of Δ\Delta and nn. If we replace δˉn\bar{\delta}_{n} by δn\delta_{n} defined in (5.19) in (5.29) then, using the conditional version of (5.20), we obtain (with a different remainder O(Δp+1)O(\Delta^{p+1}) than in (5.29)):

It is not difficult to see that the right-hand side of (5.30) coincide with expectation of the Taylor expansion of ψ(X(Xn,Δ))\psi(X(X_{n},\Delta)) around XnX_{n} up to a reminder of order Δp+1.\Delta^{p+1}. Hence

3 The Richardson-Romberg (Talay-Tubaro) error expansion

In this section we consider an expansion of the global error Eϕ^Nϕˉ\mathbf{E}\hat{\phi}_{N}-\bar{\phi} in powers of the time step Δ\Delta analogous to the Talay-Tubaro result . As in the previous section, we assume here that the coefficients of the SDE (2.1) and the function ϕ\phi are sufficiently smooth. Note that we obtain the expansion both in the elliptic and hypoelliptic setting.

Let Assumptions 1 and 3 hold. Let qq be a positive integer and HH denote W2(p+q+1),W^{2(p+q+1),\infty} in the elliptic setting and W2(p+q+2),W^{2(p+q+2),\infty} in the hypoelliptic setting. For any ϕH\phi\in H with ϕH1|\phi|_{H}\leq 1, consider ϕ^N\hat{\phi}_{N} defined by (5.1) where XnX_{n} is generated by the numerical method from (5.17). Then

and the constants C0,,Cq,C_{0},\ldots,C_{q}, KK are independent of Δ\Delta and TT and they are linear function of ϕH|\phi|_{H} and otherwise independent of ϕ\phi. In other words,

Here we make use of the Poisson equation again and also exploit an idea used in the proof of the Talay-Tubaro expansion in the finite time case from [29, pp. 106-108]. We prove the theorem in the case of p=1p=1 (i.e., for Euler-type schemes) and q=0q=0 for the clarity of the exposition. It is not difficult to extend the proof to arbitrary p,q>0.p,q>0.

In the case of p=1p=1 for a particular Euler-type scheme, we can write (cf. (5.25)):

where A(x)A(x) is the coefficient at Δ2\Delta^{2} in the corresponding expansion of Eψ1\mathbf{E}\psi_{1} at X0=x.X_{0}=x. For instance, in the case of the explicit Euler-Maruyama scheme (3.7)

where all the coefficients and the derivatives of the function ψ\psi are evaluated at x.x. We do not need the explicit form of A(x)A(x) for the proof.

Summing (5.32) over the first NN terms, dividing by NΔ,N\Delta, using (4.2) and rearranging the terms, we obtain

where the constant Aˉ\bar{A} is the stationary average of AA (see (4.1)). The expansion (5.31) with p=1,p=1, q=0q=0 and C0=AˉC_{0}=-\bar{A} follows from (5.33) and (5.34). ∎

Error analysis for the numerical stationary measures

We now use the results of the previous section to prove that any stationary measure of the numerical method is close to that of the underlining SDE. The existence of stationary measures for the numerical method follows by the Krylov-Bogoliubov construction.The fact that our state space is compact ensures that the time-averaged transition measure forms a tight family of probability measures . We have assumed in (3.4) and (3.6) or Assumption 3 that the finite time dynamics of our method and SDE are close. This can be seen as a form of “consistency”. It is reasonable to expect that the longtime behavior will be close since our setting of the torus provides the necessary “stability”through ergodicity.

We begin by giving a metric in which we will measure the distance between measures. If μΔ\mu^{\Delta} is a stationary measure of the numerical method, then for any bounded function ϕ:TdR\phi:\mathbf{T}^{d}\rightarrow\mathbf{R} and nN:n\in\mathbf{N:}

where Ez\mathbf{E}_{z} denotes the expectation conditional on X(0)=zX(0)=z. Fixing an integer p1p\geq 1, we define the metric ρ\rho between two probability measures on Td\mathbf{T}^{d} by

where H={ϕ:TdR:ϕH1}\mathcal{H}=\{\phi:\mathbf{T}^{d}\rightarrow\mathbf{R}:|\phi|_{H}\leq 1\} and H=W2p,H=W^{2p,\infty} in the elliptic setting and H=W2(p+1),H=W^{2(p+1),\infty} in the hypoelliptic setting. Observe that since ϕH\phi\in\mathcal{H} implies ϕH-\phi\in\mathcal{H}, one also has the equivalent, and sightly more standard, characterization of ρ\rho given by

Defining the metric ρ\rho as above for some integer p1p\geq 1 and assume that either:

then there exists a positive CC so that if μΔ\mu^{\Delta} is any stationary measure of the numerical method (3.1) then

Since μΔ\mu^{\Delta} is stationary, we have that ϕ(z)μΔ(dz)=Ezϕ(Xn)μΔ(dz)\int\phi(z)\mu^{\Delta}(dz)=\int\mathbf{E}_{z}\phi(X_{n})\mu^{\Delta}(dz) for any n0n\geq 0 and hence

where in the last estimate we have invoked either Theorem 6 or Theorem 11 depending on which assumptions hold. Now taking NN\rightarrow\infty, proves the result since the right-hand side is uniform for any ϕH\phi\in\mathcal{H}. ∎

We reemphasize that we have not assumed in this section that the numerical method is uniquely ergodic. Rather we have shown that any stationary measure of the numerical system is close to the true stationary measure in the ρ\rho-distance. It is, of course, possible in many settings to show that the numerical method is itself uniquely ergodic (cf. ).

It follows from Theorem 13 that under its assumptions the error ρ(μΔ,μ)\rho(\mu^{\Delta},\mu) can be expanded in powers of the time step:

It is also worth contrasting the result of Theorem 14 with a short alternative proof. Let PtP_{t} denote the Markov semigroup associated with the SDE and PnΔP_{n}^{\Delta} with nn steps of a numerical method with step size Δ\Delta. Suppose one knows for some metric dd on probability measures that d(Ptμ1,Ptμ2)Keγtd(μ1,μ2)d(P_{t}\mu_{1},P_{t}\mu_{2})\leq Ke^{-\gamma t}d(\mu_{1},\mu_{2}) holds for all probability measures μi\mu_{i} and that for any fixed nn, d(PnΔμΔ,PnΔμΔ)CnΔpd(P_{n}^{\Delta}\mu^{\Delta},P_{n\Delta}\mu^{\Delta})\leq C_{n}\Delta^{p} for some constant Cn,C_{n}, all Δ\Delta sufficiently small and any measure μΔ\mu^{\Delta} invariant for PΔP^{\Delta}. Then if one fixes an nn so that Keγn1/2Ke^{-\gamma n}\leq 1/2 then

and collecting the d(μ,μΔ)d(\mu,\mu^{\Delta}) produces the estimate d(μ,μΔ)2CnΔpd(\mu,\mu^{\Delta})\leq 2C_{n}\Delta^{p} for all Δ\Delta sufficiently small. The challenge in implementing such a seemingly simple program is obtaining the two estimates in the same metric dd. Typically, the first estimate is available in the total variation distance. On the other hand, the second estimate is usually estimated in distances requiring test functions with a number of derivatives. The recent works allow one to obtain the first estimate in the 1-Wasserstein distance which simplifies matching the two norms. Using this strategy on can obtain a bound in a stronger norm (such as total variation or 1-Wasserstein metric), but the d(PnΔμ,PΔnμ)d(P_{n}^{\Delta}\mu,P_{\Delta n}\mu) convergence rate will often be sub-optimal in these metrics. On the other hand, one can obtain d(Ptμ1,Ptμ2)Keγtd(μ1,μ2)d(P_{t}\mu_{1},P_{t}\mu_{2})\leq Ke^{-\gamma t}d(\mu_{1},\mu_{2}) with dd defined as in the metric ρ\rho above with H=W2(p+1),H=W^{2(p+1),\infty} by using that fact that P1ϕHCϕ\|P_{1}\phi\|_{H}\leq C\|\phi\|_{\infty} from some CC and hence Pt+1ϕϕˉHCPtϕϕˉCKeγtϕϕˉCKeγtϕϕˉH\|P_{t+1}\phi-\bar{\phi}\|_{H}\leq C\|P_{t}\phi-\bar{\phi}\|_{\infty}\leq CKe^{-\gamma t}\|\phi-\bar{\phi}\|_{\infty}\leq CKe^{-\gamma t}\|\phi-\bar{\phi}\|_{H}. This gives a result comparable to the hypoelliptic result in Theorem 14 though misses the smoothing in the elliptic result which is embodied in the fact one can use H=W2p,H=W^{2p,\infty}. (Some partial smoothing could have been extracted in the hypoelliptic case in Theorem 14 with more care). See for this program executed in a particular setting.

2 Relationship to Stein’s method

This section is devoted to outlining the similarity between the current setting and Stein’s method.When this work was nearing completion, a conversation with between JCM and Sourav Chatterjee prompted the authors to write this section reflecting on the relation between their approach and Stein’s method. This connection is not fully explored in this work but we believe that it is insightful to highlight the main idea. Though our goals are different, there are some passing similarities between the details in this paper and .

Let us recall that Stein’s method is a generic tool for finding bounds on a distance between two distributions which then can give quantitative convergence results. Denoting the target distribution as π\pi, the idea is to find an operator A\mathcal{A} and a determining class of functions G\mathcal{G} so that if, for all gGg\in\mathcal{G} one has Ag(x)dπ~(x)=0\int\mathcal{A}g(x)d\widetilde{\pi}(x)=0, then this implies that π~=π\widetilde{\pi}=\pi.As a referee correctly observed, the Echeverria-Weiss theorem is useful in identifying when such a condition characterizes an invariant measure and identifying the limiting invariant measure for a sequence measures with Ag(x)dπn~(x)0\int\mathcal{A}g(x)d\widetilde{\pi_{n}}(x)\rightarrow 0 as nn\rightarrow\infty. However, here we are really interested in the next order question. How to use the degree to which the characterizing equation is not satisfied to obtain quantitative estimates of the convergence rate.Given any hh sufficiently nice, one next solves Stein’s equation

with the tacit assumption that the solution gg exists and lies in G\mathcal{G}. Observe that a basic solvability condition on the above equation requires that hdπ=0\int hd\pi=0 when Aπ=0\mathcal{A}^{\ast}\pi=0. In this setting we can consider the equation Ag=hhdπ\mathcal{A}g=h-\int hd\pi for more general, uncentered hh. In order to quantify the distance of a given measure π~\widetilde{\pi} from π\pi, one tries to control (Ag)(z)dπ~(z)\int(\mathcal{A}g)(z)d\widetilde{\pi}(z) by some norm of gg which can in turn be controlled by an appropriate norm of hh. For definiteness let us assume that

By taking the supremum over all hh in a given class with h1,\|h\|\leq 1, one obtains control over the distance of π~\widetilde{\pi} from π\pi. In particular, if ϵ\epsilon is small then π~\widetilde{\pi} is close to π\pi.

This is essentially the methodology we have followed. Taking A\mathcal{A} to be the generator L\mathcal{L} of the Markov process, we are ensured that Aπ=0\mathcal{A}^{\ast}\pi=0 if π\pi is the Markov process’ stationary measure.This is done in some versions of Stein’s method. See . The basic idea of this note is to show that if L~\mathcal{\widetilde{L}} is a generator of another Markov process so that L~L\mathcal{\widetilde{L}}-\mathcal{L} is small then any stationary measure of the second Markov process will be close to π\pi. We will further assume that π~\widetilde{\pi} is ergodic.

for any gGg\in\mathcal{G}. Further assume that for some class of bounded, real valued functions H\mathcal{H} on Rd\mathbf{R}^{d} one can solve Lg=hhdπ\mathcal{L}g=h-\int hd\pi for hHh\in\mathcal{H} with a solution gGg\in\mathcal{G} and gGKhH\|g\|_{\mathcal{G}}\leq K\|h\|_{\mathcal{H}} for a fixed constant KK. Then for all hHh\in\mathcal{H},

where in moving from the penultimate expression to the last, we have used L~π~=0\mathcal{\widetilde{L}}^{\ast}\widetilde{\pi}=0. Since the right-hand side is uniform in hh we can take the supremum over hh. If H\mathcal{H} was a rich-enough class to define a metric, we obtain some estimate of the distance between the two stationary measures in that metric.

If one does not have good control over π~\widetilde{\pi} then (6.3) can be difficult to obtain. Instead it is often easier to replace (6.3) with

for any gGg\in\mathcal{G} and π~\widetilde{\pi}-a.e. xx. As before, we assume that for some class of functions H\mathcal{H} from RdR\mathbf{R}^{d}\rightarrow\mathbf{R}, one can solve Lg=hhdπ\mathcal{L}g=h-\int hd\pi for hHh\in\mathcal{H} with a solution gGg\in\mathcal{G} such that gGKhH\|g\|_{\mathcal{G}}\leq K\|h\|_{\mathcal{H}}. Observe that if G\mathcal{G} is a class of bounded functions then the second assumption is always satisfied.

Continuing since P~tLg(x)=P~th(x)hdπ\widetilde{P}_{t}\mathcal{L}g(x)=\widetilde{P}_{t}h(x)-\int hd\pi,

Rearranging, dividing by TT, and using P~Tg(x)g|\widetilde{P}_{T}g(x)|\leq|g|_{\infty}, produces

By Birkoff’s ergodic theorem, we know that for π~\widetilde{\pi}-almost every xx

so from (6.4) and gGKhH\|g\|_{\mathcal{G}}\leq K\|h\|_{\mathcal{H}} we obtain

This argument can be modified in many ways, the restriction that G\mathcal{G} and H\mathcal{H} are classes of bounded functions can be removed by assuming some control over P~tg(x)\widetilde{P}_{t}g(x) uniform in time. If that control can be maintained using a function which is integrable with respect to π~\widetilde{\pi} then the requirement that π~\widetilde{\pi} be ergodic can be removed. Although we do not fully explore these issues here, we believe that the connections made in this subsection are useful.

Variance of the Empirical Time Average

Theorems 6 and 8 are important for implementing time-averaging in computational practice. There are three types of errors arising in computing stationary averages: (i) numerical integration error (estimated by KΔK\Delta in (5.7) and (5.10) and by KΔpK\Delta^{p} in (5.23) and (5.28)); (ii) the error due to the distance from the stationary distribution (i.e., the error due to the finite time of integration TT estimated by K/TK/T in (4.7), (5.7) and (5.23)); (iii) the statistical error. The first two errors contribute to the bias of the estimator ϕ^N\hat{\phi}_{N} (see (5.7) and (5.23)). The error of numerical integration is controlled by the time step and the choice of a method. It can be estimated in practice using the Talay-Tubaro expansion (see Section 5.3 and ) in the usual fashion. The statistical error is contained in the second term of (5.10) and related to the variance of the estimator ϕ^N\hat{\phi}_{N} (see the details below and also (4.8)). Both the error due to the finite time of integration TT and the statistical error are controlled by the choice of the integration time TT (or what is the same, the choice of the number of steps NN under fixed time step Δ\Delta). They correspond to properties of the continuous dynamics of the SDE (2.1) and they are almost independent (assuming a sufficiently small Δ)\Delta) of a method or time step Δ\Delta used.

The Markov process X(t)X(t) defined on the torus Td\mathbf{T}^{d} by the SDE (2.1) is uniformly ergodic, it has exponential mixing rates (see, e.g., ), and there are some positive CC and γ\gamma such that for any t>0t>0 and θ>0\theta>0 the inequality

holds. Further, as it has been already mentioned before, it is possible in many settings to show that the Markov chain XnX_{n} generated by a numerical method is also geometrically ergodic (cf. ). Then, due to the compactness of the phase space, the chain is uniformly ergodic and has an exponential mixing like in (7.1) [28, Chapter 16]. We note that in the cited papers one of the conditions to ensure the ergodicity of XnX_{n} is the requirement that the numerical method uses random variables with densities positive everywhere. We do not address here the question about mixing rate for the Markov chains XnX_{n} generated by more general numerical methods treated in this paper leaving it for further study but instead we assume that the following relaxed mixing condition is satisfied for XnX_{n} and a ϕH\phi\in H and l>0:l>0:

where K>0K>0 is a constant independent of Δ,\Delta, k,k, l.l. This condition is much weaker than (7.1) but it is sufficient for the proof of the following proposition. At the same time, a faster decorrelation than (7.2) does not improve the estimate (7.3).

In the setting of Theorem 6 and under the condition (7.2), one has

where T=NΔT=N\Delta and K>0K>0 is a constant independent of Δ\Delta and TT.

We have used here that Mi,NM_{i,N} are martingales. Using the estimates for EMi,N2\mathbf{E}|M_{i,N}|^{2} from Theorem 6, we get that i=03EMi,N2KT.\sum_{i=0}^{3}\mathbf{E}|M_{i,N}|^{2}\leq KT.

To estimate the terms with Si,NS_{i,N} and (LΔL)ψ(\mathcal{L}^{\Delta}-\mathcal{L})\psi in (7.4), we exploit the condition (7.2). For example, consider the term with S1,N.S_{1,N}. We have

Analyzing analogously the other terms in (7.4), we arrive at the stated result. ∎

Although we proved Proposition 17 for the method (3.1), it is also valid for a more general class of numerical methods from Section 5.2.

In practice one usually estimates the statistical error of ϕ^N\hat{\phi}_{N} as followsOf course, better statistical estimators can be used to quantify the statistical error of the time averaging but we restrict ourselves here to a simple one.. We run a long trajectory MTMT split into MM blocks of a large length T=ΔNT=\Delta N each. We evaluate the estimators mϕ^N,{}_{m}\hat{\phi}_{N}, m=1,,M,m=1,\ldots,M, for each block. Since TT is big and a time decay of correlations is usually fast, mϕ^N{}_{m}{\hat{\phi}}_{N} can be considered as almost uncorrelated. We compute the sampled variance

For a sufficiently large TT and MM, Eϕ^N\mathbf{E}\,\hat{\phi}_{N} belongs to the confidence interval

with probability, for example 0.950.95 for c=2c=2 and 0.9970.997 for c=3.c=3. Note that Eϕ^N\mathbf{E}\,\hat{\phi}_{N} contains the two errors forming the bias as explained at the beginning of this section. We also pay attention to the fact that D^1/T\hat{D}\sim 1/T (cf. (7.3)), i.e., it is inverse proportional to the product ΔN.\Delta N.

Instead of time averaging considered in this paper, stationary averages can be computed using ensemble averaging, i.e., by the following estimate for the stationary average ϕˉ\bar{\phi}:

where tt is a sufficiently large time, Xˉ\bar{X} is an approximation of X,X, and LL is the number of independent approximate realizations. The total error Rϕˇ=\mboxdefϕˇϕˉR_{\check{\phi}}\overset{\mbox{\tiny def}}{=}\check{\phi}-\bar{\phi} consists of three parts: the error ε\varepsilon of the approximation ϕˉ\bar{\phi} by Eϕ(X(t))\mathbf{E}\phi(X(t)); the error of numerical integration CΔpC\Delta^{p} (here pp is the weak order of the method); and the Monte Carlo error which is proportional to 1/L.1/\sqrt{L}. More specifically

Here, in comparison with the time-averaging approach, each error is controlled by its own parameter: sufficiently large tt ensures smallness of ε;\varepsilon; time step Δ\Delta (as well as the choice of a numerical method) controls the numerical integration error; the statistical error is regulated by choosing an appropriate number of independent trajectories L.L. For further details see .

Acknowledgments

JCM thanks the NSF for its support, through DMS-0616710 and DMS-0449910, as well as the support of a Sloan Fellowship. AMS is grateful to the ERC and EPSRC for financial support. MVT was partially supported by the EPSRC Research Grant EP/D049792/1. JCM thanks Martin Harier and Sourav Chatterjee for useful discussions and the hospitality of the Warwick Mathematics Institute where the core of this work was done. The authors thank an anonymous referee and Denis Talay for useful suggestions which improved the manuscript.

References