Exponential Convergence in $L^p$-Wasserstein Distance for Diffusion Processes without Uniformly Dissipative Drift

Dejun Luo, Jian Wang

Introduction

In this paper we consider the following Itô stochastic differential equation

Yet there is another approach for studying exponential convergence of the semigroup $(P_t)_{t\geq 0}$ corresponding to the SDE (1.1) considered in this paper, that is, the coupling method. If the drift vector field $b$ fulfills certain dissipative properties, this latter method provides an explicit rate of convergence to equilibrium in a straightforward way, see e.g. and the preprint . The present work is motivated by where the author obtained exponential decay of $(P_t)_{t\geq 0}$ when the drift $b$ is assumed to be only dissipative at infinity, see the introduction below for more details.

In this paper, we shall establish the exponential convergence of the map $\mu\mapsto\mu P_t$ with respect to the $L^p$-Wasserstein distance $W_p$ for all $p\geq 1$. We first recall some known results.
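To make the distance $W_p$ concrete, the following minimal Python sketch computes it between two empirical measures on the real line with the same number of atoms. It relies on the standard fact that, in dimension one and for $p\geq 1$, the optimal coupling pairs the sorted samples. The function name `wasserstein_p_1d` and the sample data are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def wasserstein_p_1d(xs, ys, p=1.0):
    """W_p between two equal-size empirical measures on the real line.

    In one dimension the monotone (sorted) pairing is optimal for the
    cost |x - y|^p with p >= 1, so W_p^p is the mean of |x_(i) - y_(i)|^p.
    """
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    if xs.shape != ys.shape:
        raise ValueError("samples must have the same size")
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))

rng = np.random.default_rng(0)
a = rng.normal(size=1000)               # illustrative sample from N(0, 1)
b = 3.0 * rng.normal(size=1000) + 0.5   # illustrative sample from N(0.5, 9)

w1 = wasserstein_p_1d(a, b, p=1)
w2 = wasserstein_p_1d(a, b, p=2)
shift = wasserstein_p_1d(a, a + 2.0, p=3)  # a pure translation by 2
```

Since $W_1\leq W_2\leq W_p$ for $p\geq 2$, a bound in $W_p$ with large $p$ is the stronger statement, which is why estimates restricted to $W_1$ leave a gap.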

Suppose that $\sigma=\textup{Id}$ and there exists a constant $K>0$ such that

The proof of this result is quite straightforward, by simply using the synchronous coupling (also called the coupling of marching soldiers in [7, Example 2.16]), see e.g. [2, p.2432] for a short proof.
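The synchronous coupling can be illustrated numerically. The sketch below drives two Euler-Maruyama discretizations with the same Brownian increments, assuming $\sigma=\textup{Id}$ and the hypothetical linear drift $b(x)=-Kx$, a standard uniformly dissipative example. The function name `synchronous_coupling` and all parameter values are illustrative and not taken from the paper.

```python
import numpy as np

def synchronous_coupling(x0, y0, K=1.0, dt=1e-3, n_steps=2000, seed=0):
    """Simulate dX = b(X) dt + dB and dY = b(Y) dt + dB with the SAME noise.

    Assumption (illustrative only): sigma = Id and b(x) = -K x.
    Returns the distances |X_t - Y_t| on the time grid.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    y = np.asarray(y0, dtype=float).copy()
    dist = np.empty(n_steps + 1)
    dist[0] = np.linalg.norm(x - y)
    for k in range(n_steps):
        # One shared Brownian increment for both processes.
        dB = rng.normal(scale=np.sqrt(dt), size=x.shape)
        x = x + (-K * x) * dt + dB
        y = y + (-K * y) * dt + dB
        dist[k + 1] = np.linalg.norm(x - y)
    return dist

K, dt, n_steps = 1.0, 1e-3, 2000
dist = synchronous_coupling([2.0, -1.0], [-1.0, 3.0], K=K, dt=dt, n_steps=n_steps)
times = dt * np.arange(n_steps + 1)
bound = dist[0] * np.exp(-K * times)  # the contraction rate e^{-Kt}
```

Because the noise cancels in the difference $X_t-Y_t$, the distance in this scheme equals $(1-K\,\textup{d}t)^k|x-y|$ after $k$ steps, which stays below $e^{-Kt}|x-y|$. This pathwise contraction is what yields a bound in $W_p$ for every $p\geq 1$ at once.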

As in [13, (2.3)], we shall assume throughout the paper that

This technical condition will be used in Section 2 to construct the auxiliary function.

Suppose that the vector field $b$ is locally Lipschitz continuous, and there is a constant $c>0$ such that

In particular, when $\sigma=\textup{Id}$, the condition (1.5) holds true if (1.2) is satisfied only for large $|x-y|$, that is,

for some constant $L>0$ large enough. Therefore, [13, Corollary 2.3] implies that the map $\mu\mapsto\mu P_t$ converges exponentially with respect to the standard $L^1$-Wasserstein distance $W_1$ under locally non-dissipative drift, see [13, Example 1.1] for more details. The proof of [13, Corollary 2.3] is based on the coupling by reflection of diffusion processes and a carefully constructed concave function, cf. [13, Section 2]. A number of direct consequences are presented in [13, Section 2.2], which indicate that a convergence result such as [13, Corollary 2.3] is extremely useful.

However, [13, Corollary 2.3] is not satisfactory in the sense that it provides no information on the $L^2$-Wasserstein distance $W_2$. This fact has also been noted in [5, Section 7.1, Remark 19], which states that “the reflection coupling cannot furnish some information on $W_2$”. The main result of this paper shows that this is not the case.

Assume that there are constants $c>0$ and $\eta\geq 0$ such that for all $r\geq\eta$, one has

where $C:=C(c,\eta,\varepsilon,p)>0$ is a positive constant.

Theorem 1.3 above does provide new conditions on the drift term $b$ such that the associated semigroup $(P_t)_{t\geq 0}$ is exponentially convergent with respect to the $L^p$-Wasserstein distance $W_p$ for all $p\geq 1$. The reason that we can obtain the exponential convergence in $W_p$ for all $p\geq 1$, not only $W_1$, is due to our particular choice of the auxiliary function which is convex near infinity. It is designed by using Chen–Wang’s famous variational formula for the principal eigenvalue of one-dimensional diffusion operator, see for instance or [7, Section 3.4]. The reader can refer to and the references therein for recent studies on this topic.

The assertion of Theorem 1.3 can be slightly strengthened if (1.6) is replaced by a stronger condition.

Assume that there are constants $c>0$, $\eta>0$ and $\theta>1$ such that for all $r\geq\eta$, one has

The idea of the proof is to use synchronous coupling for large $|x-y|$ and the coupling by reflection for small $|x-y|$. For the latter part, we can directly use the result of Theorem 1.3, since (1.8) implies that (1.6) holds with $c\eta^{\theta-1}$ if $\eta>0$.

Before presenting the consequences of Theorem 1.3, let us first make some comments and give some examples. In the beginning, we intended to generalize Eberle’s results by assuming that there is a constant $c>0$ such that

It turns out that under mild conditions on $\kappa$, the two conditions (1.5) and (1.10) are equivalent, up to changing the constants. More explicitly, we have

Assume that there are constants $c,r_0>0$ such that $\kappa(r_0)\leq -c$ and $\delta_0:=\sup_{0\leq r\leq r_0}\kappa(r)<+\infty$. Then, the condition (1.5) holds with some other positive constant.

This result indicates that if the function $\kappa$ is locally bounded from above, then the following statements are equivalent:

there exist constants $c,r_0>0$ such that $\kappa(r_0)\leq -c$;

there exist constants $c>0$ and $\theta\leq 1$ such that $\kappa(r)\leq -cr^{\theta}$ for $r>0$ large enough;

there exists a constant $c>0$ such that $\kappa(r)\leq -cr$ for $r>0$ large enough.

The equivalence stated above also indicates that Theorem 1.3 is sharp in some situations, as shown by the next example.

Assume that $\sigma=\textup{Id}$ and $b(x)=\nabla V(x)$ with $V(x)=-(1+|x|^{2})^{\delta/2}$ for some $\delta\in(0,2)$. Then, we have the following statements.

If $\delta\in(0,1)$, then $\kappa(r)\geq 0$ for all $r$ large enough, and the inequality (1.7) does not hold for any positive constants $C$ and $\lambda$ with $p=1$.

where $d_{TV}$ is the total variation distance between probability measures.

To show the power of Theorem 1.4, we consider the following example which yields the exponential convergence of the semigroup $(P_t)_{t\geq 0}$ with respect to the $L^p$-Wasserstein distance $W_p$ $(p>2)$ for super-convex potentials. The assertion below improves the results mentioned in [5, Section 6.1].

Let $\sigma=\textup{Id}$ and $b(x)=\nabla V(x)$ with $V(x)=-|x|^{2\alpha}$ and $\alpha>1$. It follows from [5, Section 6, Example 1] that there is a constant $c>0$ such that for all $r>0$,

holds with some constant $C:=C(\alpha,p)>0$.

As applications of Theorem 1.3, we consider the existence and uniqueness of the invariant probability measure, and also the exponential convergence of the semigroup with respect to the $L^p$-Wasserstein distance $W_p$. For $p\in(1,\infty)$, we define

holds with some positive constant $C:=C(c,p)$.

Under the assumptions of Corollary 1.8, it is easy to establish the following Foster-Lyapunov type conditions:

where $P(t,\cdot)$ is the associated transition probability.

As was pointed out by the referee, the conclusion of Corollary 1.8 for $p=1$ could also be deduced from Theorem 1.4 and the Foster-Lyapunov type condition above, by Harris’s theorem for the exponential convergence to the invariant measure in the Wasserstein metric. See [17, Theorem 4.8] and [4, Theorem 2.4] for more details.

The following statement is concerned with symmetric diffusion processes. Though we believe the assertion below is known (see e.g. [25, Corollary 1.4]), we stress the relation between the exponential convergence with respect to the $L^1$-Wasserstein distance $W_1$ and that with respect to the $L^2$-norm, which is equivalent to the Poincaré inequality.

then $\mu$ satisfies the Poincaré inequality, i.e.

Preliminaries

Similar to the main result in , the proof of Theorem 1.3 is based on the reflection coupling of Brownian motion, which was introduced by Lindvall and Rogers and developed by Chen and Li . First, we give a brief introduction of the coupling by reflection. Together with (1.1), we also consider

where $(W_t)_{0\leq t<T}$ is a one-dimensional Brownian motion given by $W_t=\int_0^t\langle e_s,\textup{d}B_s\rangle$.
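The mechanics of the coupling by reflection can be sketched in a few lines of Python. The sketch assumes $\sigma=\textup{Id}$ and takes $e$ to be the unit vector in the direction of $x-y$, which is the usual Lindvall-Rogers construction; the second process is then driven by the reflected increment $(I-2ee^{\top})\,\textup{d}B$. The function name `reflection_step` and the sample points are hypothetical.

```python
import numpy as np

def reflection_step(x, y, dB):
    """Reflect a Brownian increment across the hyperplane orthogonal to x - y.

    Assumption (illustrative only): sigma = Id.
    Returns (dB_reflected, dW, H, e), where e is the unit vector along x - y,
    H = I - 2 e e^T is the reflection matrix, dB_reflected = H dB drives the
    second process, and dW = <e, dB> is the one-dimensional increment.
    """
    z = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    e = z / np.linalg.norm(z)
    H = np.eye(len(e)) - 2.0 * np.outer(e, e)
    return H @ dB, float(e @ dB), H, e

rng = np.random.default_rng(1)
x = np.array([1.0, 2.0, 0.5])
y = np.array([-0.5, 0.0, 1.0])
dB = rng.normal(size=3)
dB_ref, dW, H, e = reflection_step(x, y, dB)
```

The matrix $H$ is orthogonal, so the reflected increment is again a Brownian increment. The noise in the difference process is $\textup{d}B-H\,\textup{d}B=2e\,\langle e,\textup{d}B\rangle$, so only the one-dimensional component along $e$ survives, with a doubled amplitude.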

Next, by Itô’s formula and (2.2), for $t<T$,

Denote $r_t=|\sigma^{-1}Z_t|$. Then

2 Auxiliary Function

For any $\varepsilon\in(0,c)$, let $\psi\in C^{2}([0,\infty))$ be the following strictly increasing function

The definition of $\psi$ may seem a little strange at first sight; in fact, it is motivated by Chen–Wang’s variational formula for the principal eigenvalue of the one-dimensional diffusion operator

on $[0,\infty)$. One key to this famous formula is the following elegant mimic eigenfunction $g$ associated with the first eigenvalue (see [7, pp. 52–53] for a heuristic argument):

Now, let $b(r)=-cr$ in the definition of the function $g$ above. Then we have

for some proper choice of the constant $\varepsilon$ in the definition of $\psi$.

On the one hand, it is easy to see that for any $r>0$,

We first show that $\psi_1(r)$ and $\psi_2(r)$ are comparable.

There exist two constants $C_0:=C_0(\varepsilon)$ and $\hat{C}_0:=\hat{C}_0(\varepsilon)>0$ such that for all $r\geq 0$,

Note that $r(1+r)\sim r$ as $r\to 0$ and $r(1+r)\sim r^{2}$ as $r\to\infty$, where $\sim$ means the two quantities are of the same order. By L’Hôpital’s rule,

Therefore, the required assertion follows from the two limits above. ∎

Roughly speaking, the above two inequalities imply that the auxiliary function $\psi$ behaves like $c'r$ for small $r$, and grows exponentially fast as $e^{c''r^{2}}$ for large $r$. Hence, the function $\psi(r)$ can be used to control the function $r^{p}$ with $p\geq 1$. More explicitly, we have

There is a constant $C_1>1$ such that for all $r\geq 0$,

Consequently, for any $p\geq 1$, there is a constant $C_2=C_2(p,\varepsilon)>0$ such that for all $r\geq 0$,

Furthermore, we can give an explicit estimate of the constant $C_0$ in Lemma 2.2, which will be used in the exponential convergence rate.

The constant $C_0$ in Lemma 2.2 has the following expression:

In order to estimate $\psi_1(r)$, we need the following inequality on the tail of standard Gaussian distribution (e.g. see [11, (3)]):

where $\Phi(r)$ and $\phi(r)$ are respectively the distribution and density function of the standard Gaussian distribution $N(0,1)$. Consequently, for $s>0$,

Substituting this estimate into the expression of $\psi_1$ leads to

Next, we consider two cases. (i) If $r\leq 2/\sqrt{\varepsilon}$, then

Substituting this estimate into (2.10) yields

Summarizing the above two cases and using (2.9), we obtain

it is easy to show that for all $r\in[0,2/\sqrt{\varepsilon}\,]$, it holds

On the other hand, for $r>2/\sqrt{\varepsilon}$, we have

Combining the above two inequalities, we deduce that for all $r>2/\sqrt{\varepsilon}$,

Having the two inequalities (2.12) and (2.13) in hand, and using (2.11), we complete the proof. ∎

We also need the following simple result.

If $0\leq r\leq 1$, then, using the fact that

The required assertion follows immediately from these two conclusions. ∎

Finally, we present a consequence of all the previous results in this part.

Let $\psi$ be the function defined by (2.4), and $\lambda$ be the constant in Theorem 1.3. Then, for all $r>0$,

By (2.5), we deduce from Lemmas 2.4 and 2.5 that

where the last inequality follows from (2.6). This, along with the definition of the constant $\lambda$ in Theorem 1.3, yields the required assertion. ∎

Proofs of Theorems and Proposition

Let $\psi$ be the function defined by (2.4). According to (2.3) and Itô’s formula, it holds that

Combining this inequality with (2.15), we obtain

where $\lambda$ is the constant in Theorem 1.3.

Then, for $t\leq T_n$, the inequality (3.1) yields

Taking expectations on both sides of the inequality above leads to

Since the coupling process $(X_t,Y_t)_{t\geq 0}$ is non-explosive, we have $T_n\uparrow T$ a.s. as $n\to\infty$, where $T$ is the coupling time. Thus by Fatou’s lemma, letting $n\to\infty$ in the above inequality gives us

Thanks to our convention that $Y_t=X_t$ for $t\geq T$, we have $r_t=0$ for all $t\geq T$. Therefore,

If $|\sigma^{-1}(x-y)|\leq\eta$, then for any $p\geq 1$ and any $t>0$, we deduce from (2.8), (3.3) and (2.7) that

for some constant $C_6>1$. Therefore, if $|x-y|\leq\eta/C_6$, then for any $p\geq 1$ there is a constant $C_7>0$ such that

Set $x_i=x+i(y-x)/n$ for $i=0,1,\ldots,n$. Then $x_0=x$ and $x_n=y$; moreover, (3.6) implies $|x_{i-1}-x_i|=|x-y|/n\leq\eta/C_6$ for all $i=1,2,\ldots,n$. Therefore, for all $p\geq 1$, by (3.5),

where in the last inequality we have used $n\leq 2C_6|x-y|/\eta$. The proof of Theorem 1.3 is completed. ∎
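The chaining step for distant starting points can be made concrete. The sketch below splits the segment from $x$ to $y$ into $n$ equal pieces, assuming the choice $n=\lfloor C_6|x-y|/\eta\rfloor+1$ for $|x-y|>\eta/C_6$. This is one natural choice consistent with the two properties used above, not necessarily the definition in (3.6). The function name `chain_points` and the numerical values of $\eta$ and $C_6$ are illustrative.

```python
import math
import numpy as np

def chain_points(x, y, eta, C6):
    """Split the segment [x, y] into n equal pieces of length at most eta / C6.

    Assumption (illustrative only): n = floor(C6 * |x - y| / eta) + 1.
    Returns n and the list of points x_0 = x, x_1, ..., x_n = y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.linalg.norm(x - y)
    n = math.floor(C6 * d / eta) + 1
    pts = [x + i * (y - x) / n for i in range(n + 1)]
    return n, pts

eta, C6 = 1.0, 2.0                      # hypothetical constants
x = np.array([0.0, 0.0])
y = np.array([3.0, 4.0])                # |x - y| = 5 > eta / C6
n, pts = chain_points(x, y, eta, C6)
steps = [np.linalg.norm(pts[i] - pts[i - 1]) for i in range(1, n + 1)]
```

Each consecutive pair $(x_{i-1},x_i)$ is close enough for the local estimate to apply, and the triangle inequality for $W_p$ sums the $n$ local bounds. Since $n$ grows at most linearly in $|x-y|$, the resulting bound is linear in the initial distance.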

and $T=\inf\{t>0:X_t=Y_t\}$ is the coupling time. For $t\geq T$, we still set $Y_t=X_t$. Therefore, the difference process $(Z_t)_{t\geq 0}=(X_t-Y_t)_{t\geq 0}$ satisfies

Still denoting by $r_t=|\sigma^{-1}Z_t|$, we get

where in the first inequality we have used (3.4), and the last inequality follows from (3.8). In particular, we have for all $|\sigma^{-1}(x-y)|>\eta$ and $t>t_0$,

Combining all the conclusions above, we complete the proof of Theorem 1.4. ∎

Next, since $r/r_0\leq n_0+1$, we have

which implies $-2cn_0\leq 2c-2cr/r_0$. Therefore

for all $r\geq r_0(\delta_0+2c)/c$, and so (1.5) holds with the new constant $c/2r_0$. ∎

Proofs of Examples and Corollaries

Now the result follows immediately from the fact that $V'(x)$ is strictly increasing when $|x|$ is large enough. Indeed, we have

which is positive if $|x|\geq(1-\delta)^{-1/2}$.
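The sign change of $V''$ can be checked numerically. In dimension one, direct differentiation of $V(x)=-(1+x^{2})^{\delta/2}$ gives $V''(x)=\delta(1+x^{2})^{\delta/2-2}\big((1-\delta)x^{2}-1\big)$. The Python sketch below compares this closed form with a central finite difference. The function names and the value $\delta=0.5$ are illustrative choices, not taken from the paper.

```python
import numpy as np

def V_second(x, delta):
    """Closed-form second derivative of V(x) = -(1 + x^2)^(delta/2) in dimension one."""
    return delta * (1.0 + x**2) ** (delta / 2.0 - 2.0) * ((1.0 - delta) * x**2 - 1.0)

def V_second_fd(x, delta, h=1e-4):
    """Central finite-difference approximation of V'', used as a cross-check."""
    V = lambda s: -(1.0 + s**2) ** (delta / 2.0)
    return (V(x + h) - 2.0 * V(x) + V(x - h)) / h**2

delta = 0.5                               # any value in (0, 1) would do
threshold = (1.0 - delta) ** -0.5         # the radius (1 - delta)^(-1/2)
xs = np.array([0.0, 0.5 * threshold, 2.0 * threshold, 10.0 * threshold])
closed = V_second(xs, delta)
approx = V_second_fd(xs, delta)
```

Inside the threshold radius the potential is concave ($V''<0$), while beyond it $V''>0$, so the drift $b=V'$ is increasing there and cannot be dissipative at infinity.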

On the other hand, it is easy to see that with the choices of $\sigma$ and $b$ above, the semigroup $(P_t)_{t\geq 0}$ is symmetric with respect to the probability measure $\mu(\textup{d}x)=\frac{1}{Z_V}e^{V(x)}\,\textup{d}x$. Then, according to (the proof of) Corollary 1.10 below, we know that $\mu(\textup{d}x)$ fulfills the Poincaré inequality (1.15) if (1.7) is satisfied with $p=1$; however, this is impossible, see e.g. [27, Example 4.3.1 (3)].

which implies $V(x)$ is strictly concave. Hence $\kappa(r)\leq 0$ for all $r\geq 0$. On the other hand, to show that $\kappa(r)\geq 0$ for all $r\geq 0$, as in the proof of (1), we simply look at the one dimensional case:

Then, the assertion (1.11) immediately follows from (the proof of) Theorem 1.1, by simply using the synchronous coupling, see e.g. [2, p.2432].

Finally we prove the algebraic convergence rate (1.12). For this, we mainly follow [9, Section 5] or [5, Section 7.2] (see also [21, Theorem 1.1]). By (2.3),

where $T$ is the coupling time of the coupling process $(X_t,Y_t)_{t\geq 0}$, and $(W_t)_{t\geq 0}$ is the same one-dimensional Brownian motion as in (2.2). Hence,

Denote by $W_t^{\ast}=\inf_{0\leq s\leq t}W_s$, which has the same law as $-|W_t|$. Thus for any $t>0$,

In particular, by the definition of total variation distance,

Finally we present the proofs of the two corollaries of Theorem 1.3.

The inequality above also yields the uniqueness of the invariant measure.

This implies that (e.g. see [6, Theorem 5.10] or [24, Theorem 5.10])

holds for any $t>0$ and any Lipschitz continuous function $f$, where $\|f\|_{\textrm{Lip}}$ denotes the Lipschitz semi-norm with respect to the Euclidean norm $|\cdot|$.

Replacing $f$ with $P_tf$ in the equality above, we arrive at

Next, we follow the proof of [23, Lemma 2.2] to show that the inequality above yields the desired Poincaré inequality. Indeed, let $f$ satisfy $\mu(f)=0$ and $\mu(f^{2})=1$. By the spectral representation theorem, we have

where in the inequality above we have used Jensen’s inequality. Thus,

which is equivalent to the desired Poincaré inequality, see e.g. [27, Theorem 1.1.1]. ∎

Acknowledgements. The authors are grateful to the referee for his valuable comments which helped to improve the quality of the paper.

References