Pathwise Accuracy and Ergodicity of Metropolized Integrators for SDEs

Nawaf Bou-Rabee, Eric Vanden-Eijnden

Introduction

This paper considers SDEs whose solution possesses a probability transition density that satisfies a detailed balance-type condition with respect to a known probability distribution. In this context the paper analyzes integrators for such SDEs that

are ergodic with respect to the exact equilibrium distribution of the SDE on infinite time intervals; and,

strongly converge to the solutions of the SDE on finite time intervals.

Pure sampling methods can accomplish (i), but they typically do not approximate the solution to the SDE. Integrators for SDEs certainly satisfy (ii), but they often are divergent on infinite time intervals or ergodic with respect to a different equilibrium distribution \citet*TaTu1990, Ta1995, MaStHi2002, MiTr2004. This paper shows that a Metropolized integrator can simultaneously accomplish these goals.

Motivation.

This paper is motivated by SDEs that arise in molecular dynamics (MD). Although the numerical analysis of general SDEs is well-studied (see, e.g., \citet*Ta1995, MiTr2004), the relevance of this analysis to SDEs that arise in MD is limited. Indeed such SDEs are characterized by complex drift vector fields that have limited regularity, and one is typically interested in their solutions over long time intervals. Let us elaborate briefly on the consequences of these two observations.

The drift vector field of SDEs that arise in MD involve the gradient of an empirically derived potential energy function. Due to singularities in the potential energy, this drift vector field possesses limited regularity, e.g., it is nonglobally Lipschitz. The drift vector field is also computationally costly to evaluate because the potential force involves intricate short and long-range interactions between atoms. As a result implicit methods are typically too costly in this application area.

Moreover, the lack of regularity in the drift of SDEs that arise in MD causes explicit discretizations to be stochastically unstable in general. This implies that, in principle, such schemes cannot be used to sample the equilibrium distribution of the SDE. Some approaches to stabilize explicit discretizations of SDEs include

adaptive time-stepping \citet*LaMaSt2007;

method of rejecting exploding trajectories \citet*MiTr2005; and

using bounded random numbers instead of Gaussian ones.

These methods are often ergodic but typically introduce discretization errors in the sampling (i.e. the equilibrium distribution of the numerical scheme is not exactly the same as the distribution of the SDE). A standard way to remove these errors is to resort to a Monte-Carlo method designed for sampling, e.g., hybrid Monte-Carlo algorithms \citet*DuKePeRo1987, Ho1991, AkRe2008. This procedure, however, has the effect that the dynamics of the method becomes unrelated to that of the SDE. Indeed, in this context, the main question often becomes how to make the sampling as efficient as possible by accelerating the rate of convergence of the method.

Here we show that a Metropolis-adjusted explicit integrator can both be ergodic with respect to the known equilibrium distribution of the SDE and captures the dynamical behavior of the solutions of this SDE. To establish this second property, one needs to estimate the effect on the dynamics of the rejections in the Metropolis-Hasting method. The analysis of this effect in the context of a Metropolized integrator for SDEs with the special structure relevant to MD is one of the main objectives of this paper.

Organization of the Paper.

In §2, the main results of the paper are presented without proofs and put into context. In §3, numerical validation of the main results is provided. The proofs of the main results are organized according to the type of dynamic and discretization considered. In §4, the proofs pertaining to a forward Euler-Maruyama discretization of overdamped Langevin dynamics and its Metropolis-adjustment can be found. In §5, the proofs pertaining to a Störmer-Verlet based discretization of inertial Langevin dynamics and its Metropolis-adjustment can be found. We wrap up the paper with a conclusion in §6.

Main Results

Let NN and hh be given, set T=NhT=Nh and tk=hkt_{k}=hk for k=0,...,Nk=0,...,N, and consider the following explicit Euler-Maruyama discretization of (2.1):

Hence, the chain is irreducible with respect to Lebesgue measure. To be consistent with the literature, the discrete-time Markov chain generated by Euler-Maruyama is called the unadjusted Langevin algorithm (ULA)\citet*RoTw1996A.

If U\nabla U is globally Lipschitz and hh is small enough, ULA (2.3) can often be shown to be:

first-order strongly convergent to the solution orbits of (2.1) on finite time intervals;

geometrically ergodic on infinite-time intervals with respect to an invariant measure that is a first-order approximant to the invariant measure μ\mu of (2.1).

The second property is typically established using a Talay-Tubaro expansion of the global weak error of ULA\citet*TaTu1990.

Metropolized Forward Euler-Maruyama.

A Metropolis-Hastings method is a quite general Monte-Carlo method for sampling from a known probability distribution \citet*MeRoRoTeTe1953,Ha1970. The method generates a Markov chain from a given proposal Markov chain as follows. The algorithm computes a proposal move according to the proposal chain and accepts this proposal with a probability that ensures the Metropolized chain is ergodic with respect to the given probability distribution. Here we shall focus on the Metropolized Euler-Maruyama integrator defined in terms of the equilibrium density π\pi (2.2) and the transition density qhq_{h} (2.4).

Let ζkU(0,1)\zeta_{k}\sim U(0,1) for k=0,...,N1k=0,...,N-1. Given hh and Xk\boldsymbol{X}_{k} the algorithm calculates a proposal move using the Euler-Maruyama updating scheme in (2.3):

and accepts this proposal with a probability

The identity (2.9) is obviously a significant improvement to (2.5).

When U\nabla U is nonglobally Lipschitz, MALA is not geometrically ergodic because ULA is transient (see Theorem 4.2 of\citet*RoTw1996A). To correct this problem, Roberts and Tweedie proposed a modification of MALA which truncates the drift in regions where the underlying Euler method can be explosive. Specifically the proposal move (2.6) is modified to a MALA algorithm with bounded drift:

When U(Zk)<1/h|\nabla U(\boldsymbol{Z}_{k})|<1/h, (2.10) is the Euler-Maruyama update, and otherwise, the proposal drift in (2.10) preserves the direction of the drift in ULA, but normalizes the amplitude in all degrees of freedom. The Metropolis-Hastings method with this modified proposal move will be referred to as the Metropolis-adjusted Langevin truncated algorithm (MALTA)\citet*RoTw1996A. By comparison to a random-walk-based Metropolis algorithm, one can often show that MALTA is geometrically ergodic\citet*RoTw1996A, At2005, JaHa2000. While this ad hoc correction to MALA makes MALTA geometrically ergodic, the relation of MALTA to the original diffusion (2.1) remains to be established.

Main Result I: Strong Convergence of MALA & MALTA.

In the paper we shall consider the situation where U\nabla U is nonglobally Lipschitz. Roughly speaking, MALTA corrupts Euler-Maruyama orbit by the random rejections in (2.8) with the modified proposal (2.10) and the ad hoc truncation of the drift. Yet, we show in the paper that MALTA still approximates pathwise the solution of (2.1). The precise statement is:

Assume 4.1. Then for every E0>0E_{0}>0 and T>0T>0, there exists hc(E0)>0h_{c}(E_{0})>0 and C(T,E0)>0C(T,E_{0})>0 such that for all h<hch<h_{c}, for all x:U(x)E0\boldsymbol{x}:U(\boldsymbol{x})\leq E_{0}, and for all t[0,T]t\in[0,T],

Assumption 4.1 can be found in §4.1. This assumption remains valid for SDEs of the type (2.1) in which the drift is nonglobally Lipschitz. A key ingredient in the proof is geometric ergodicity of MALTA. This property implies bounds on moments of MALTA at finite times. The precise bounds can be found in Lemma 4.9.

Since MALA is not geometrically ergodic when U\nabla U is nonglobally Lipschitz, we are unable to obtain such bounds on moments of MALA at finite times. However, such bounds can be derived if MALA’s initial condition is restricted to the equilibrium distribution of (2.1). This is because MALA by design preserves this distribution. Hence, for MALA we are able to prove the following theorem. We stress that MALTA’s strong accuracy does not require this restriction on initial conditions.

Assume 4.1. For all T>0T>0 there exists hc>0h_{c}>0 and C(T)>0C(T)>0 such that for all positive h<hch<h_{c} and for all t[0,T]t\in[0,T],

The proofs of Theorems 2.1 and 2.2 as well as the precise statement of Assumption 4.1 on which they rely are presented in §4. The proofs build on results of global accuracy for numerical methods applied to SDEs with uniformly Lipschitz drift\citet*MiTr2004 and with one-sided Lipschitz drift\citet*HiMaSt2002. §3 will illustrate these results on a simple but representative example with a nonglobally Lipschitz drift.

2 Inertial Langevin

where the following matrices have been introduced:

Here W\boldsymbol{W} is a standard 2n2n-dimensional Wiener process, or Brownian motion, β>0\beta>0 is a parameter referred to as the inverse temperature, and γ>0\gamma>0 is referred to as the friction coefficient. Write Y(t)=(Q(t),P(t))\mathbf{Y}(t)=(\boldsymbol{Q}(t),\boldsymbol{P}(t)) where Q(t)\boldsymbol{Q}(t) and P(t)\boldsymbol{P}(t) represent the instantaneous configuration and momentum of the system, respectively. We shall assume:

where M\boldsymbol{M} is a symmetric positive definite mass matrix and UU is a potential energy function. There are two main differences between the overdamped and inertial Langevin dynamics. First, the solution of (2.1) is a reversible stochastic process, whereas the solution of (2.11) is not. Second, the diffusion in (2.11) is only applied to momentum degrees of freedom, whereas the diffusion in (2.1) is applied to all degrees of freedom. The consequences of these differences to the discretization of (2.11) and its Metropolis-adjustment will be a main focus of §5.

Despite the degenerate diffusion in (2.11), under certain regularity conditions on UU, the solution to this SDE is geometrically ergodic with respect to an invariant probability measure μ\mu with the following density\citet*Ta2002:

Let NN and hh be given, set T=NhT=Nh and tk=hkt_{k}=hk for k=0,...,Nk=0,...,N. We shall consider an integrator for (2.11) based on splitting the Langevin equations into Hamilton’s equations for the Hamiltonian HH:

The solution of Hamilton’s equations will be approximated by a symplectic integrator; to be specific, the discrete Hamiltonian map of a self-adjoint variational integrator\citet*MaWe2001. While the exact flow will be used for the Ornstein-Uhlenbeck equations. These flows will be composed in a Strang-type splitting to obtain a pathwise approximant to the solution of inertial Langevin which we will refer to as the Geometric Langevin Algorithm (GLA).

This type of splitting of inertial Langevin equations is quite natural and has been recently used in simulations of molecular dynamics (see \citet*VaCi2006, BuPa2007), dissipative particle dynamics (see \citet*Sh2003, SeFaEsCo2006), and inertial particles (see \citet*PaStZy2008). This paper is geared towards applications in molecular dynamics where Langevin integrators (including the ones cited above) have been based on generalizations of the widely used Störmer-Verlet integrator. The Störmer-Verlet integrator is attractive for molecular dynamics because it is an explicit, symmetric, second-order accurate, variational integrator for Hamilton’s equations. In molecular dynamics it was popularized by Loup Verlet in 1967. Other popular generalizations of the Störmer-Verlet integrator to Langevin equations include Brünger-Brooks-Karplus (BBK) \citet*BrBrKa1984, van Gunsteren and Berendsen (vGB) \citet*GuBe1982, and the Langevin-Impulse (LI) methods \citet*SkIz2002. The LI method is also based on a splitting of Langevin equations, but it is different from the splitting considered here. The long-time properties of GLA were recently analyzed in the context of uniformly Lipschitz potential forces in \citet*BoOw2009.

An example of a GLA that will be analyzed in §5 is the Störmer-Verlet based scheme:

It is straightforward to show (2.15) possesses a smooth probability transition function with respect to Lebesgue measure even though the noise is only applied to momenta in (2.14). For globally Lipschitz potential forces, it is possible to show that (2.15) is a first-order strongly accurate integrator for (2.11), and can be shown to be geometrically ergodic with respect to an invariant measure that is a first-order accurate approximation to the exact invariant measure of (2.11) \citet*BoOw2009. However, if the potential force is nonglobally Lipschitz, then this discretization is plagued with the same transient behavior as forward Euler-Maruyama applied to the reversible Langevin diffusion process. As before, a Metropolis-Hastings method is proposed to correct the discretization error in the invariant measure and stochastically stabilize this discretization.

MAGLA.

This involution is introduced since the stochastic process defined by composing the solution of (2.11) with φ\varphi is reversible with respect to μ\mu.

The Metropolis-Adjusted Geometric Langevin Algorithm (MAGLA) will be based on the probability transition density of GLA which we will refer to as qhq_{h}. An explicit formula for this transition density is derived in the body of the paper (See (5.6).). Given hh and (Qk,Pk)(\boldsymbol{Q}_{k},\boldsymbol{P}_{k}), MAGLA computes a proposal move (Qk+1,Pk+1)(\boldsymbol{Q}_{k+1}^{*},\boldsymbol{P}_{k+1}^{*}) according to a step of GLA composed with a momentum flip. For example, for the Störmer-Verlet based GLA this proposal move is given explicitly by:

The algorithm accepts this proposal move with probability:

In other words, the MAGLA update is defined as:

for k=0,...,N1k=0,...,N-1. Observe, when a proposal move is rejected, the momentum is “flipped”; and MAGLA involves a modified detailed balance condition (2.17) as opposed to the usual detailed balance condition used with MALA (2.7). The reason MAGLA uses a modified detailed balance is related to the nonreversibility of the solution to (2.11). In particular, the transition density of the solution to (2.11) does not satisfy detailed balance, but does satisfy a modified detailed balance condition. It is straightforward to show that MAGLA preserves μ\mu, and hence, is not transient. In fact, it is often easy to classify MAGLA as an ergodic Markov chain even if the potential force is nonglobally Lipschitz. In the next paragraph, we quantify the strong convergence of MAGLA.

Main Result II: Strong Convergence of MAGLA.

It is straightforward to show GLA provides a first-order globally accurate integrator for (2.11) provided the potential force is globally Lipschitz \citet*BoOw2009. As discussed MAGLA involves a momentum flip whenever a proposal move is rejected. This momentum flip was introduced since the stochastic process defined by composing the solution of (2.11) with a momentum flip satisfies detailed balance with respect to π\pi. Despite this momentum flip, MAGLA provides a pathwise approximant to (2.11) as summarized by the following theorem. As in Theorem 2.2, we will restrict the initial conditions of MAGLA to be distributed according to the equilibrium distribution of (2.11). As before this assumption implies bounds on relevant moments of MAGLA which follow from the fact that MAGLA preserves the measure μ\mu.

Assumptions 4.1 (C) and 4.1 (E) on the potential energy hold even if potential force is nonglobally Lipschitz. The assumption 5.1 on (2.11) ensures that the continuous process itself is globally Lipschitz. We stress that this regularity of the continuous process is different from assuming U\nabla U is globally Lipschitz, and is made for convenience.

Unlike the rejections in (2.8), a rejection in (2.2) leads to a momentum flip, and hence, a loss of local accuracy. Even though, MAGLA still approximates pathwise the solution to (2.11) because the probability of these rejections as a function of time-step size is small. These issues including a proof of Theorem 2.3 and the lemmas on which it is based can be found in §5.

Numerical illustration

Here we illustrate the results above on the simple example of a particle diffusing in a quartic potential U(x)=x4/4U(x)=x^{4}/4 with inverse temperature β>0\beta>0. The overdamped dynamics of this system is given by:

This SDE admits the following unique invariant measure

Let NN and hh be given. Set T=NhT=Nh and tk=hkt_{k}=hk for k=0,...,Nk=0,...,N. In terms of which consider the following forward Euler approximation to (3.1):

The drift in (3.2) is destabilizing in the region:

This property of Euler-Maruyama’s discrete drift is the essential reason why Euler-Maruyama defines a transient Markov chain. Indeed, using this property (2.5) is proved in Lemma 6.3 of \citet*MaStHi2002. Despite this shortcoming Euler-Maruyama can be used as candidate dynamics in a Metropolis-Hastings method designed to sample from μ\mu.

Figure 3.1 illustrates the difference between a given realization of the exact solution to (3.1), the Euler-Maruyama integrator (3.2), and the Metropolized integrator when initiated at the edge of BhB_{h} and driven by the same realization of the Wiener process WW. We empasize that in this figure the time-step is held fixed and chosen to be large. The figure shows the Euler-Maruyama orbit explodes, the solution orbit rapidly contracts to the origin, and the Metropolis-Hastings method initially stagnating, but eventually tracking the solution apparently better over time. We remark that the ability of the Metropolized orbit to track the solution better over time is not a generic property of Metropolized integrators. In this case it is a consequence of (3.1) satisfying a “one force, one solution” principle (See \citet*EKhMaSi2000.). That is, for every realization of the Wiener process, the solution possesses a random attractor that the Metropolized orbit is drawn to.

Figure 3.2 shows a longer-time realization of the Metropolized integrator and the solution. The solution at this temperature frequently visits higher energy values. The proposal moves to higher energy values are less likely to be accepted by the Metropolized integrator. At the same time, ergodicity implies the Metropolized integrator must visit these higher energy values, and as often as the exact solution along this long time-interval, since both chains preserve π\pi. Consequently, the Metropolized integrator intermittently stagnates as shown in the figure, and of course, loses pathwise accuracy.

This observation does not contradict Theorems 2.1 and 2.2. Keep in mind that the time-step size for this particular orbit is held fixed and chosen to be large. Such stagnations at high energy become less likely to occur along orbits on finite-time intervals as the time-step becomes smaller. Figures 3.4 and 3.3 confirms this pathwise convergence. In particular, the figures show an O(h3/4)\mathcal{O}(h^{3/4}) rate of strong convergence of the Metropolized integrator for this example using MALTA and an out of equilibrium initial condition, and MALA with an initial condition restricted to the equilibrium distribution of (3.1).

2 Inertial Langevin

Consider a simple particle with Hamiltonian given by:

where U(q)=q4/4U(q)=q^{4}/4. The inertial Langevin dynamics of this system is given by:

with initial condition (Q(0),P(0))=(q0,p0)(Q(0),P(0))=(q_{0},p_{0}). This SDE admits the following unique invariant measure

Let NN and hh be given. Set T=NhT=Nh and tk=hkt_{k}=hk for k=0,...,Nk=0,...,N. In terms of which consider the following Störmer-Verlet based GLA discretization of (3.3):

A Metropolis-Hastings method can stochastically stabilize (3.4). At the same time, as stated in Theorem 2.3, the method is also pathwise accurate with respect to the solution of (3.3) when initiated from equilibrium. This accuracy is achieved despite the loss of local accuracy in momentum that occurs at every rejection.

Figure 3.5 is consistent with the strong O(h)\mathcal{O}(h) rate of convergence of MAGLA for this example. Assumption 5.1 is hard to check for this example, but we emphasize the assumption is not equivalent to the potential force being globally Lipschitz.

Overdamped Langevin

The following structural assumptions will be invoked in this paper regarding (2.1):

There exists a real constant K>0K>0 such that

There exists a real constant K>0K>0 such that

There exists a real constant K>0K>0 such that

There exists a real constant K>0K>0 such that

We stress these structural assumptions hold even if the drift in (2.1) is nonglobally Lipschitz. These structural assumptions imply that the Markov process defined by the solution to (2.1) possesses a unique invariant probability measure explicitly given by:

The proof of this lemma is presented in §7.

The structural assumptions on (2.1) also imply a Lipschitz condition on its solution which we state as another lemma.

Assume 4.1. For sts\leq t, let Yt,s(x)\boldsymbol{Y}_{t,s}(\boldsymbol{x}) denote the evolution operator of the solution to (2.1): with Ys,s(x)=x\boldsymbol{Y}_{s,s}(\boldsymbol{x})=\boldsymbol{x} and for rstr\leq s\leq t recall the Chapman-Kolmogorov identity Yt,sYs,r(x)=Yt,r(x)\boldsymbol{Y}_{t,s}\circ\boldsymbol{Y}_{s,r}(\boldsymbol{x})=\boldsymbol{Y}_{t,r}(\boldsymbol{x}). Set

The proof of this lemma is also given below in section 7.

Explicit Integrator for the SDE.

Recall that the transition kernel for ULA (cf. (2.3)) has a density qh(x,y)q_{h}(\boldsymbol{x},\boldsymbol{y}) with respect to the Lebesgue measure given by

It is derived by a simple change of variables relating the probability density of the Euler-Maruyama update to the probability density of a multivariate Gaussian with zero mean and variance 2β1h2\beta^{-1}h. ULA is irreducible with respect to the Lebesgue measure because its transition density is smooth and strictly positive. However, as mentioned in the introduction, in Langevin diffusions with nonglobally Lipschitz drift, ULA is transient, i.e., stochastically unstable. Nevertheless under the assumptions on the drift, it is straightforward to obtain the following single-step accuracy for the forward Euler-Maruyama scheme.

the local mean-squared error of (2.3) satisfies

the local mean deviation of (2.3) satisfies

The above lemmas are proven in section 7.

2 MALA and its Properties

MALA randomly rejects integration steps produced by Euler-Maruyama with a probability designed to ensure the composite chain preserves the equilibrium measure of (2.1). Recall that this probability is given by:

The off-diagonal transition probability kernel of the composite chain has density:

The probability of remaining at the same point, or stagnation probability, is given by

In sum, the Metropolis-Hastings transition kernel is

Under assumption 4.1 on the potential energy, the kk-step transition probability of MALA converges to μ\mu in the following sense

However, since ph(x,y)p_{h}(\boldsymbol{x},\boldsymbol{y}) satisfies detailed balance the first term in the above vanishes, and hence,

According to Corollary 2 of \citet*Ti1994, a Metropolis-Hastings algorithm that is irreducible with respect to the same measure it is designed to preserve is positive Harris recurrent. Consequently, MALA is irreducible, aperiodic, and positive Harris recurrent. According to Proposition 6.3 of \citet*Nu1984, these properties are equivalent to ergodicity of the chain. ∎

This theorem shows that MALA is ergodic with respect to the equilibrium measure of (2.1). However, the effect of rejections on the pathwise approximation of the solution remains to be quantified.

Equilibrium Strong Accuracy.

In order to prove equilibrium strong accuracy of MALA the following lemmas will be helpful.

and ηN(0,1)n\boldsymbol{\eta}\sim\mathcal{N}(0,1)^{n}.

Using the expressions for qhq_{h} and π\pi (cf. (4.3) and (4.2), resp.) one can show

For hh sufficiently small the latter integral can be well-approximated by a Taylor expansion as follows. A Taylor expansion of the function GG about h=0h=0 yields,

Substitute this expansion into (4.8) to obtain,

An application of Laplace method to approximately evaluate integrals yields,

The proof is completed by invoking Assumption 4.1 (E). ∎

Assume 4.1. For all h>0h>0 there exists a C>0C>0 such that

the local mean-squared error of MALA satisfies

the local mean deviation of MALA satisfies

It is clear from this expression that the local mean-squared error of the Metropolis-Hastings integrator is due to the local mean-squared error of forward Euler-Maruyama and a term due to probable rejections in the Metropolis-Hastings step. By the Cauchy-Schwarz inequality this latter term can be bounded as follows,

Lemma 4.1 implies there exists a constant K>0K>0 such that

While Lemma 4.5 implies that there exists a constant K>0K>0 such that

Thus, there exists a constant K>0K>0 such that

Observe that (4.2) is dominated by the error incurred when a proposal move is rejected, and not the O(h3)O(h^{3}) error incurred when a proposal move is accepted (cf. Lemma 4.3). Hence, the local mean-squared accuracy of MALA is O(h5/4)O(h^{5/4}).

It is clear from this expression that the local mean deviation of the Metropolis-Hastings integrator is due to the local mean-deviation of forward Euler-Maruyama and a term due to probable rejections in the Metropolis-Hastings step. By the Cauchy-Schwarz and Jensen inequalities,

As in part (A) of this proof, the local mean deviation of forward Euler-Maruyama Lemma 4.3 together with Lemma 4.1 and Lemma 4.5 imply that,

Assume 4.1. Then for all h>0h>0 and for all t>0t>0, there exists a real constant C>0C>0 such that

This bound is a consequence of ergodicity of the solution to (2.1) and MALA with respect to the probability measure μ\mu. That is, the ergodic property of the integrator and the solution of (2.1) imply that

We are now in measure to prove Theorem 2.2 which we restate.

Assume 4.1. For all T>0T>0 there exist hc>0h_{c}>0 and C(T)>0C(T)>0, such that for all positive h<hch<h_{c} and for all t[0,T]t\in[0,T],

Discretize the interval [0,T][0,T] using NN equally spaced points in such a fashion that tk=kht_{k}=kh for k=0,...,Nk=0,...,N with N=T/hN=\lfloor T/h\rfloor. Assume h>0h>0 but otherwise arbitrary for now. The goal in this proof is to obtain a global error estimate for the mean-squared error of MALA by writing the error at the (k+1)(k+1)th step in terms of the error at the kkth step. For clarity of presentation we will use a rolling constant KK throughout this proof. Set

for k=0,...,N1k=0,...,N-1. Specifically, the proof shows that there exists a constant K>0K>0 such that

By unraveling this recursive inequality, the order h3/4h^{3/4} mean-squared accuracy becomes apparent. To do this the expectation of the error at the (k+1)th(k+1)th step is conditioned on knowing all events up to the kkth step. For this purpose let Fk\mathcal{F}_{k} denote the sigma-algebra of events up to and including the kthkth-iteration. Also write,

It follows from the local mean-squared accuracy of MALA (cf. Lemma 4.6),

Since MALA preserves μ\mu by design, and by the law of total expectation,

In terms of which, rewrite the third term in (4.2) as:

It follows from Lemma 4.6 and invariance of μ\mu under MALA that,

In sum, there exists a real constant K>0K>0 such that the following term bounds from above the third term in (4.2):

Using elementary inequalities it is clear that,

Combining the bounds on the iterated expectations of (4.2) yields (4.10). ∎

3 MALTA and its Properties

A necessary condition for geometric ergodicity of a Metropolis-Hastings method is that the rejection or stagnation probability is bounded away from unity (see proposition 5 of \citet*RoTw1996B). When the drift in (2.3) is nonglobally Lipschitz, ULA is transient and MALA’s stagnation probability cannot be globally bounded away from unity. Hence, MALA is not geometrically ergodic. MALTA corrects this problem by truncating the drift in the candidate dynamics in regions where the Euler-Maruyama integrator can be explosive. This modification enables a precise control of the moments of the integrator initiated from a nonequilibrium initial condition. With this control we are able to prove pathwise accuracy of MALTA on finite time-intervals.

The geometric ergodicity of MALTA for super-exponential target densities with non-degenerate level-sets was intuitively clear to the investigators who proposed the algorithm in\citet*RoTw1996A. Subsequently, it was proven in proposition 2.1 of\citet*At2005 using techniques to prove geometric ergodicity for random walk Metropolis candidate dynamics given in\citet*JaHa2000. For the convenience of the reader, the theorem is restated.

where we have introduced a TV norm M\|\cdot\|_{M} which is defined for a signed measure ν\nu as:

The main implication of geometric ergodicity for our purpose is the following crucial estimate on moments of MALTA. The proof of this lemma is a straightforward, but tedious consequence of the results in Proposition 2.1 and Lemma 6.2 of\citet*At2005.

We are now in measure to prove Theorem 2.1.

Assume 4.1. Then, for every E0>0E_{0}>0 and T>0T>0, there exists a hc(E0)>0h_{c}(E_{0})>0 and a C(T,E0)>0C(T,E_{0})>0 such that for all positive h<hch<h_{c}, for all x:U(x)E0\boldsymbol{x}:U(\boldsymbol{x})\leq E_{0}, and for all t[0,T]t\in[0,T],

The proof of this theorem is similar to the proof of Theorem 2.2 with the following main differences.

The preservation of the moments of the solution to (2.1) and the Metropolis-Hastings method due to ergodicity are replaced by bounds on moments implied by Lemmas 4.1 and 4.9.

The k+1k+1-step error conditioned on knowing Fk\mathcal{F}_{k} is split into two parts:

For the first part, the results of Theorem 2.2 apply with the provision given in the first difference above. For the second term, Chebyshev’s inequality implies:

This inequality, Assumption 4.1 (E), Lemma 4.1, and Lemma 4.9 imply that there exists a constant C(E0)>0C(E_{0})>0 such that,

Inertial Langevin

Next we shall consider a Metropolized version of a discretization of (2.11) that extends variational integrators to inertial Langevin dynamics. We begin by introducing the so-called geometric Langevin algorithm and discuss its properties. The section then examines the properties of the Metropolis-Hastings adjusted geometric Langevin algorithm, including quantifying its accuracy in approximating inertial Langevin dynamics.

The arguments in this section are very closely related to those in §4. However, there are two key differences. First, the solution to (2.11) is no longer a reversible stochastic process. Yet, the transition kernel of the solution process composed with a momentum flip does satisfy detailed balance. Second, the diffusion in (2.11) is only applied to momentum and not position. These differences motivate this section on Metropolizing discretizations of nonreversible processes. However, we will omit analysis that is redundant.

Let NN and hh be given, set T=NhT=Nh and tk=hkt_{k}=hk for k=0,...,Nk=0,...,N. We shall consider an integrator for (2.11) based on splitting the Langevin equations into Hamilton’s equations for the Hamiltonian HH (2.13) and and Ornstein-Uhlenbeck equations (2.14). The solution of Hamilton’s equations will be approximated by the discrete Hamiltonian map of a variational integrator\citet*MaWe2001. While the exact flow will be used for the Ornstein-Uhlenbeck equations. These flows will be composed in a Strang-type fashion to obtain a pathwise approximant to Langevin’s equations which we will call the Geometric Langevin Algorithm (GLA).

where Q(t)\boldsymbol{Q}(t) solves the Euler-Lagrange equations for the Lagrangian LL with endpoint conditions Q(0)=q0\boldsymbol{Q}(0)=\boldsymbol{q}_{0} and Q(h)=q1\boldsymbol{Q}(h)=\boldsymbol{q}_{1}.

A discrete Lagrangian is self-adjoint if:

Some of the results that follow will be specific to the Störmer-Verlet integrator which can be derived from the following discrete Lagrangian:

This discrete Lagrangian is clearly self-adjoint.

Ornstein-Uhlenbeck Equations.

For the distribution of the solution, the stochastic flow will be denoted simply by ψh\psi_{h}. Let oho_{h} denote the transition probability density of ψtk+h,tk\psi_{t_{k}+h,t_{k}}. By a change of variables, it’s transition density is given explicitly by:

Observe that this transition density does not depend on the initial or terminal position, since (5.4) fixes position and the Hamiltonian is separable.

Strang-type Splitting.

GLA is defined as the following Strang-type splitting pathwise approximant to (2.11):

Observe that the zero of the Dirac-delta measure is implicitly defined by

To make it explicit, we perform a change of variables,

From which it follows the transition density qhq_{h} of GLA is given by:

Observe that an explicit characterization of qhq_{h} is available even when the variational integrator is implicit.

Störmer-Verlet Based GLA.

The proof of the following lemma is similar to Lemma 4.3, and hence, is omitted.

the local mean-squared error of (2.15) satisfies

the local mean deviation of (2.15) satisfies

As has been demonstrated, the Markov chain defined by sampling (2.15) every time-step possesses a smooth probability transition density with respect to Lebesgue measure. For globally Lipschitz potential forces, this discretization is a first-order strongly accurate integrator for (2.11). However, if the potential force is nonglobally Lipschitz, the Störmer-Verlet based GLA is plagued with the same transient behavior as Euler-Maruyama for overdamped Langevin. As before, a Metropolis-Hastings method is proposed to stochastically stabilize this Markov chain.

2 MAGLA and its Properties

MAGLA is the Metropolis-adjusted GLA. Given hh and (Qk,Pk)(\boldsymbol{Q}_{k},\boldsymbol{P}_{k}), MAGLA computes a proposal move according to a step of GLA:

MAGLA accepts this proposal move with probability:

Altogether, the MAGLA update is given by:

for k=0,...,N1k=0,...,N-1. We stress the momentum flip in the acceptance probability and upon rejection is introduced because inertial Langevin dynamics is nonreversible. Yet, the exact transition density of the solution to inertial Langevin composed with a momentum is reversible, i.e., does satisfy detailed balance. We will see in the proof of Lemma 5.3 that without this momentum flip, pathwise accuracy cannot be achieved with MAGLA.

MAGLA preserves μ\mu, and hence, is not transient even when its underlying variational integrator is explicit. In fact, it is often straightforward to classify MAGLA as an ergodic Markov chain even if the potential force is nonglobally Lipschitz. However, with an explicit variational integrator MAGLA is often no longer geometrically ergodic even with momentum flips. To correct this problem, a modification of MAGLA can be implemented where the drift is truncated in regions where an explicit discretization of the drift causes high rejection rates. This modification which would be analogous to MALTA is not implemented here. Instead, we will concentrate on proving pathwise accuracy from an initial condition restricted to the equlibrium measure. This restriction enables us to obtain bounds on relevant moments of the Metropolized integrator. To prove pathwise accuracy the following lemmas will be useful.

The following lemma shows for an arbitrary symmetric variational integrator, the acceptance probability of MAGLA is related to the energy change in its variational integrator. This implies that the acceptance probability of MAGLA has mechanical and intrinsic meaning. It also simplifies the subsequent analysis of MAGLA.

Let qhq_{h} denote the transition probability density of GLA. Let LdL_{d} denote the discrete Lagrangian associated with the variational integrator θh\theta_{h}. Assume that LdL_{d} is self-adjoint (cf. 5.2). Then, the acceptance probability of MAGLA satisfies:

Introduce the abbreviations: DiLd=DiLd(q0,q1,h)D_{i}L_{d}=D_{i}L_{d}(\boldsymbol{q}_{0},\boldsymbol{q}_{1},h) for i=1,2i=1,2, and Bh/2=exp(γM1h/2)\boldsymbol{B}_{h/2}=\exp(-\gamma\boldsymbol{M}^{-1}h/2). Expanding (5.6) yields,

The self-adjoint property of the discrete Lagrangian implies that:

The following lemma is analogous to Lemma 4.5, and roughly speaking, quantifies how often rejections occur in MAGLA.

Substitute qhq_{h} from (5.6) into the above to obtain,

Integrate with respect to p1\boldsymbol{p}_{1} to obtain the following simplified expression,

Up to this point the argument has been for a general GLA. Now, the proof is specialized to the Störmer-Verlet based GLA. For the Störmer-Verlet discrete Lagrangian (5.3):

Substitute these expressions into (5.2) gives:

Substitute the change of variables implicitly defined by (5.11) into (5.2) and then Taylor expand ΔE\Delta E about h=0h=0 to obtain,

As in Lemma 4.5, a standard application of Laplace’s method together with Assumption 4.1 (E) gives the desired result. ∎

Using Lemma 5.3, we are now in position to quantify the local accuracy of MAGLA.

the local mean-squared error of MAGLA satisfies

the local mean deviation of MAGLA satisfies

A detailed proof is omitted since similar steps are taken in the proof of Lemma 4.6. We remark that it follows from the definition of MAGLA (2.2),

This estimate reveals that the local mean-squared error of MAGLA is a sum of the local mean-squared error of GLA (cf. Lemma 5.1) and the error that arises from a potential rejection weighted by the probability of rejection. This error is O(1)O(1) because when a rejection happens in (2.2), the momentum is flipped. However, the probability of rejection is within the order of accuracy of the method according to Lemma 5.3.

For sts\leq t, let Yt,s(x)\boldsymbol{Y}_{t,s}(\boldsymbol{x}) denote the evolution operator of the solution to (2.11): with Ys,s(x)=x\boldsymbol{Y}_{s,s}(\boldsymbol{x})=\boldsymbol{x} and for rstr\leq s\leq t recall the Chapman-Kolmogorov identity Yt,sYs,r(x)=Yt,r(x)\boldsymbol{Y}_{t,s}\circ\boldsymbol{Y}_{s,r}(\boldsymbol{x})=\boldsymbol{Y}_{t,r}(\boldsymbol{x}). Set

As discussed MAGLA involves a momentum flip whenever a proposal move is rejected. Despite this MAGLA provides a pathwise approximant to (2.11) as summarized by the following theorem.

A detailed proof is omitted as it is similar to the proof of Theorem 2.2. Recall, the main tools needed for that proof are local accuracy of MAGLA which follows from Lemma 5.4, and the invariance of μ\mu under both the transition kernel of the solution to (2.11) and the Markov chain defined by Störmer-Verlet-based MAGLA.

Conclusion

This paper examined the pathwise accuracy of Metropolized integrators applied to overdamped and inertial Langevin dynamics. The paper primarily analyzed explicit discretizations of these ergodic SDEs. For explicit discretizations if the drift is nonglobally Lipschitz, discretization errors often lead to stochastically unstable Markov chains. A Metropolis-Hastings method was shown to stochastically stabilize these discretizations. The paper used this stochastic stability to quantify the error induced by the Metropolis-Hastings methods in approximating the solution to the SDE.

Although the paper restricted to specific discretizations of these SDEs as proposal moves, the strategy to prove the results is rather general. The strategy can be used to prove pathwise accuracy of Metropolized versions of higher order accurate, adaptive and multistep discretizations of overdamped or inertial Langevin dynamics (See, e.g., \citet*MiTr2004, VaCi2006.). In fact, this strategy applies to a larger class of SDEs, namely those which possess a transition density that satisfies detailed balance-type condition.

Our strategy relies on two main ingredients. First, that the local error induced by the Monte-Carlo method depends upon the time-step size in an algebraic fashion. For example, often we found that the local mean-squared error of the Metropolized Integrator can be decomposed as:

The second ingredient involves bounds on moments of the integrator that are uniform in the time-step size. These bounds enabled the local error estimates to be extended to global error estimates, and hence, pathwise convergence on finite time-intervals. These bounds are automatic if initial conditions are restricted to the known equilibrium distribution the Metropolis-adjusted integrator is designed to sample. To relax this restriction on initial conditions, geometric ergodicity played a key role to obtain such bounds. MALA is often ergodic, but because its proposal chain is often transient, is not geometrically ergodic. This transience is due to a numerical instability in forward Euler for large energy values. By suitably truncating the drift at high energy values, one can ensure the proposal dynamics is not transient. MALA with truncated drift, or MALTA is known to often be geometrically ergodic on infinite time-intervals where MALA is not. The paper used this property to prove MALTA is pathwise convergent on finite time-intervals without a restriction on its initial condition. We remark that MALTA can be generalized to explicit higher-order, multistep and adaptive integrators for overdamped and inertial Langevin equations.

Finally, we emphasize that GLA can be extended to non-flat configuration spaces and multiple time-steps. Indeed, its underlying variational integrator can incorporate holonomic constraints (via, e.g., Lagrange multipliers) \citet*WeMa1997 and multiple time steps to obtain so-called asynchronous variational integrators \citet*LeMaOrWe2003. A natural choice for the variational integrator would be a SHAKE or RATTLE algorithm \citet*WeMa1997, MaWe2001. The Ornstein-Uhlenbeck equations (2.14) are defined on a flat vector space since the configuration is held fixed. Moreover, a review of the proof of Lemma 5.2 shows that the probability transition density of GLA can be explicitly characterized even if the configuration space is not flat. By inspection, this density is absolutely continuous with respect to the standard volume measure on that non-flat space. Consequently, GLA on manifolds can be used as proposal dynamics in a Metropolis-Hastings context, and MAGLA can be extended to non-flat configuration spaces to treat, e.g., inertial Langevin equations with holonomic constraints as in \citet*VaCi2006. Moreover, MAGLA on manifolds can readily classified as an ergodic Markov chain on that non-flat phase space.

We thank Christof Schütte for stimulating discussions. We thank Tony Lelievre, Sebastian Reich, and Gabriel Stoltz for suggestions on an earlier version of this paper.

Proof of Lemmas 4.1, 4.2 and 4.3

Assumption 4.1 (B) on the generator of the (SDE) implies

From the solution to (2.1) it is apparent

Recall, that the higher order central moments of a normal random variable ξN(0,t)\xi\sim\mathcal{N}(0,t) are given by:

An application of Cauchy-Schwarz inequality yields,

According to Assumption 4.1 (E), there exists a constant K>0K>0 such that

An application of Hölder’s inequality and Lemma 4.1 (A) imply there exists a possibly different constant K>0K>0 such that

Proof of Lemma 4.2.

To prove the second inequality, observe that:

An application of the Cauchy-Schwarz inequality implies,

Assumption 4.1 (A) and the triangle inequality imply there exists a constant K>0K>0 such that

The Cauchy-Schwarz inequality implies that

Lemma 4.1 implies with a possibly different constant K>0K>0

While the first part of this lemma implies, again with a possibly different constant K>0K>0 that,

Proof of Lemma 4.3

According to (2.1) and (2.3), the difference between the Euler-Maruyama discretization and the solution to the SDE after a single step takes the form:

Assumption 4.1 (C) implies that there is a constant K>0K>0 such that,

A second application of Cauchy-Schwarz yields,

Lemma 4.1 implies that there exists a constant K>0K>0 such that

Applying the Cauchy-Schwarz inequality twice gives,

Assumption 4.1 (E) implies the existence of a K>0K>0 such that

An application of Lemma 4.1 gives the required result.

References