Exponential approximation and Stein's method of exchangeable pairs

Jason Fulman, Nathan Ross

Introduction

The purpose of this paper is to further develop exponential approximation, using Stein’s method of exchangeable pairs. The first use of exchangeable pairs in exponential approximation was in the paper , which studied the spectrum of the Bernoulli-Laplace Markov chain. Unfortunately the results in , which use the Kolmogorov metric, are very complicated and seem quite hard to apply in other examples. We provide approximation results similar to those in , but which are significantly easier to compute. In particular, we do not see how to apply the results of to the example in this paper.

We work in a “smooth” test function metric, but also provide bounds in the Kolmogorov metric, which is defined for random variables WW and ZZ to be

Our main theoretical result is the following theorem (see also Theorem 2.1 below).

Let ZZ be a mean one exponential random variable. If W0W\geq 0 is a random variable with finite second moment and (W,W)(W,W^{\prime}) is an exchangeable pair such that for some a>0a>0 and sigma-field Fσ(W)\mathcal{F}\supseteq\sigma(W),

Remark: The theorem is stated with a choice of δ\delta in order to simplify the error bound, but it is obvious that in applications δ\delta should be chosen to minimize the bound.

One of the attractive points about this result is that the terms are very similar to those which one encounters in normal approximation. To see the parallels, here is a normal approximation theorem of Rinott and Rotar (in the Kolmogorov metric).

The authors suggest that this should follow from methods in Johansson’s remarkable paper . This seems challenging to make rigorous, particularly if one wants to make c,δc,\delta explicit. In 3, we give the first rigorous and explicit error term for this problem, proving that the Kolmogorov distance between Tr(U)2|Tr(U)|^{2} and a standard mean 1 exponential is at most 29/4/n2^{9/4}/\sqrt{n}. Another approach to this result might be to use the multivariate central limit theorems in ; we do not pursue this here.

To close the introduction, we mention some related results using Stein’s method for exponential approximation. Aside from which we already mentioned, a recent paper of Chatterjee and Shao (and similar results in Section 13.4 of the text ), use Stein’s method of exchangeable pairs for exponential approximation. However our approach is quite different than theirs since they assume the exchangeable pair (W,W)(W,W^{\prime}) satisfies

with c0c_{0} a positive constant, rather than the linearity condition (1) assumed here. Another approach to Stein’s method for exponential approximation is the “equilibrium distribution” coupling for which we refer the reader to the papers by Peköz and Röllin , , and to the references therein. For the generator method (in the more general context of chi-squared approximation), one can consult or . We note that the examples in and are about independent random variables, whereas the example in the current paper involves dependence. Finally, the introductory survey has some discussion of these approaches in the wider context of Stein’s method.

General theorem

The purpose of this section is to prove Theorem 1.1 from the introduction. We first prove an intermediate result which can be thought of as an approximation result for “smooth” test functions.

Let ZZ be a mean one exponential random variable. If W0W\geq 0 is a random variable with finite second moment and (W,W)(W,W^{\prime}) is an exchangeable pair such that for some a>0a>0 and sigma-field Fσ(W)\mathcal{F}\supseteq\sigma(W),

then for all twice differentiable functions hh with h,h<\|h^{\prime}\|,\|h^{\prime\prime}\|<\infty,

Define a characterizing operator A\mathcal{A} for the exponential distribution which has the property that

for all ff in a large enough class of functions if and only if ZExp(1)Z\sim Exp(1).

For functions hh in the class of interest, define fhf_{h} to solve

Now use properties of the solutions fhf_{h} and the auxiliary randomization of exchangeable pairs to bound this last term.

The next lemma takes care of Items 1 and 2 and also provides the bounds on the solutions fhf_{h} needed for Item 3 in the program above. The proof of Theorem 2.1 is immediately after the proof of the lemma.

Let ZZ be a mean one exponential random variable. If hh is a function such that the following integrals are well defined, then

If hh is absolutely continuous with h<\|h^{\prime}\|<\infty, then

If in addition, h(0)=0h^{\prime}(0)=0 and hh^{\prime} is absolutely continuous with h<\|h^{\prime\prime}\|<\infty, then

To prove (8), first note that since translating hh by a constant leaves ff unchanged, we may (and do) assume without loss of generality that h(0)=0h(0)=0, so that h(x)hxh(x)\leq\|h^{\prime}\||x|. Using this fact and also that xexdx=ex(x+1)\int xe^{-x}dx=-e^{-x}(x+1) in the equality below, we find

To bound this last expression, we compare derivatives to find

For the second assertion note that (7) implies that

Since h(x)hx|h(x)|\leq\|h^{\prime}\||x|, the first term of (10) is bounded in absolute value by h\|h^{\prime}\|, so we only need to find an appropriate bound on the term of (10) that is in parentheses. We have by (9) that

Now taking the absolute value of (11), using the triangle inequality, bounding h(x)hx|h(x)|\leq\|h^{\prime}\||x|, and using xexdx=ex(x+1)\int xe^{-x}dx=-e^{-x}(x+1) , we find

which now yields the second assertion of (8).

To prove the final statement of the lemma, take the derivative of (10) using the expression (11) to find

To bound these expressions we first note that since h(0)=h(0)=0h^{\prime}(0)=h(0)=0,

and in particular, h(x)|h(x)| is bounded above by both terms appearing in the minimum. Thus, the absolute value of the right hand side of (12) is bounded above by

where we have used that min{w/21,12/w}1\min\{|w/2-1|,|1-2/w|\}\leq 1.

We bound the second term (13) differently according to w1w\geq 1 or w<1w<1. Suppose that w1w\geq 1. Then note that (w22w+2)ewew2(w^{2}-2w+2)e^{w}\geq e^{w}\geq 2. Using that h(x)hx|h(x)|\leq\|h^{\prime}\||x| and xexdx=ex(x+1)\int xe^{-x}dx=-e^{-x}(x+1), we find the absolute value of (13) is bounded above by

where we have used that 1w201-w^{2}\leq 0. By comparing derivatives we find

so that (14) (and hence (13)) is bounded above by (46/e)h(4-6/e)\|h^{\prime}\| for w1w\geq 1.

Using that h(x)hx2/2|h(x)|\leq\|h^{\prime\prime}\|x^{2}/2 and x2exdx=ex(2+2x+x2)\int x^{2}e^{-x}dx=-e^{-x}(2+2x+x^{2}), we find the absolute value of (13) is bounded above by

so that (15) (and hence (13)) is bounded above by h\|h^{\prime\prime}\| for 0<w10<w\leq 1. ∎

Using exchangeability and the linearity condition (2), we observe that

Now taking the absolute value of this last expression, we find that (17) in absolute value is bounded above by

We are now in a position to prove Theorem 1.1. First define the function for t,x0t,x\geq 0, and δ>0\delta>0,

The next lemma states some important facts regarding the use of ht,δh_{t,\delta} in our framework.

If t0t\geq 0, δ>0\delta>0, and ht,δh_{t,\delta} is defined by (18), then

If W0W\geq 0 is a random variable and ZZ has the exponential distribution with mean one, then

The first assertion follows from direct computation. For the second, note that

Since ex1e^{-x}\leq 1 for x>0x>0, we find by direct computation that

establishes (20) with the left hand side replaced by

The main purpose of this section is to prove the following result.

The heat kernel on a compact Lie group GG is defined by setting for x,yGx,y\in G and t0t\geq 0,

where the λn\lambda_{n} are the eigenvalues of the Laplacian repeated according to multiplicity, and the ϕn\phi_{n} are an orthonormal basis of eigenfunctions of L2(G)L^{2}(G); these can be taken to be the irreducible characters of GG.

We use the following properties of the heat kernel, where Δ\Delta denotes the Laplacian of GG. Part 1 of Lemma 3.2 is from page 198 of . Part 2 of Lemma 3.2 is immediate from the expansion (21). Part 3 of Lemma 3.2 is Lemma 2.5 of .

Let GG be a compact Lie group, x,yGx,y\in G, and t0t\geq 0.

K(t,x,y)K(t,x,y) converges and is non-negative for all x,y,tx,y,t.

yGK(t,x,y)dy=1\int_{y\in G}K(t,x,y)dy=1, where the integration is with respect to Haar measure of GG.

For smooth ϕ\phi, as t0t\rightarrow 0, one has that

The symmetry in xx and yy of K(t,x,y)K(t,x,y) shows that the heat kernel is a reversible Markov process with respect to the Haar measure of GG. It is a standard fact , that reversible Markov processes lead to exchangeable pairs (W,W)(W,W^{\prime}). Namely suppose one has a Markov chain with transition probabilities K(x,y)K(x,y) on a state space XX, and that the Markov chain is reversible with respect to a probability distribution π\pi on XX. Then given a function ff on XX, if one lets W=f(x)W=f(x) where xx is chosen from π\pi and W=f(x)W^{\prime}=f(x^{\prime}) where xx^{\prime} is obtained by moving from xx according to K(x,y)K(x,y), then (W,W)(W,W^{\prime}) is an exchangeable pair. In the special case of the heat kernel on a compact Lie group GG, given a function ff on GG, one can construct an exchangeable pair (W,W)(W,W^{\prime}) by letting W=f(U)W=f(U) where UU is chosen from Haar measure, and W=f(U)W^{\prime}=f(U^{\prime}), where UU^{\prime} is obtained by moving time tt from UU via the heat kernel. To define the exchangeable pair (W,W)(W,W^{\prime}) used in this paper, we further specialize by setting f(U)=Tr(U)2f(U)=|Tr(U)|^{2}.

If λ\lambda is an integer partition, and mjm_{j} denotes the multiplicity of part jj in λ\lambda, we define pλ(U)=jTr(Uj)mjp_{\lambda}(U)=\prod_{j}Tr(U^{j})^{m_{j}}. For example, p5,3,3(U)=Tr(U5)Tr(U3)2p_{5,3,3}(U)=Tr(U^{5})Tr(U^{3})^{2}. Typically we suppress the UU and use the notation pλp_{\lambda}.

The next three lemmas are from Rains ; here fg\nabla f\cdot\nabla g is defined by

For all integers kk and ll (not necessarily positive), and unitary UU,

For all unitary matrices UU and class functions f1,,fkf_{1},\cdots,f_{k}

The final lemma is a moment computation from .

Throughout we let W(U)=Tr(U)2=p1(U)p1(U)W(U)=|Tr(U)|^{2}=p_{1}(U)\overline{p_{1}(U)}.

The second equality was Lemma 3.5, and the final equality used Lemmas 3.3 and 3.4. ∎

Using Lemma 3.5, and then Lemmas 3.3 and 3.4, one computes that

Next we compute expected values of low order moments of WWW^{\prime}-W.

exchangeability of (W,W)(W,W^{\prime}) gives that

where the penultimate equality used Lemma 3.6.

By the proof of Lemma 3.8, and then Lemma 3.6,

For part 3 of the lemma, one uses the Cauchy-Schwarz inequality to obtain that

Part 3 then follows from parts 1 and 2 of the lemma. ∎

Now we proceed to the proof of Theorem 3.1.

By Lemma 3.7, one can apply Theorem 1.1 with a=2nta=2nt. By Lemma 3.8, and the triangle inequality,

By Lemma 3.6, this is 2n+O(t)\frac{\sqrt{2}}{n}+O(t); letting t0t\rightarrow 0 gives an upper bound

Finally, note from Lemma 3.7 that R=O(t2)R=O(t^{2}). Since a=4nta=4nt, it follows that

Summarizing, by letting t0t\rightarrow 0, Theorem 1.1 implies that for δ>0\delta>0,

Choosing δ=421/4/n\delta=4\cdot 2^{1/4}/\sqrt{n}, yields the claimed result. ∎

The moments of the random variable Tr(U)2|Tr(U)|^{2} have a combinatorial interpretation. Indeed, from one has for all positive integers l,nl,n that

Here LnL_{n} is the length of the longest subsequence of a random permutation on nn symbols.

Acknowledgements

Fulman was partially supported by a Simons Foundation Fellowship. We thank Eric Rains for helpful correspondence.

References