Exponential approximation and Stein's method of exchangeable pairs

Jason Fulman, Nathan Ross

Introduction

The purpose of this paper is to further develop exponential approximation, using Stein’s method of exchangeable pairs. The first use of exchangeable pairs in exponential approximation was in the paper , which studied the spectrum of the Bernoulli-Laplace Markov chain. Unfortunately the results in , which use the Kolmogorov metric, are very complicated and seem quite hard to apply in other examples. We provide approximation results similar to those in , but which are significantly easier to compute. In particular, we do not see how to apply the results of to the example in this paper.

We work in a “smooth” test function metric, but also provide bounds in the Kolmogorov metric, which is defined for random variables $W$ and $Z$ to be

Our main theoretical result is the following theorem (see also Theorem 2.1 below).

Let $Z$ be a mean one exponential random variable. If $W\geq 0$ is a random variable with finite second moment and $(W,W^{\prime})$ is an exchangeable pair such that for some $a>0$ and sigma-field $\mathcal{F}\supseteq\sigma(W)$ ,

Remark: The theorem is stated with a choice of $\delta$ in order to simplify the error bound, but it is obvious that in applications $\delta$ should be chosen to minimize the bound.

One of the attractive points about this result is that the terms are very similar to those which one encounters in normal approximation. To see the parallels, here is a normal approximation theorem of Rinott and Rotar (in the Kolmogorov metric).

The authors suggest that this should follow from methods in Johansson’s remarkable paper . This seems challenging to make rigorous, particularly if one wants to make $c,\delta$ explicit. In 3, we give the first rigorous and explicit error term for this problem, proving that the Kolmogorov distance between $|Tr(U)|^{2}$ and a standard mean 1 exponential is at most $2^{9/4}/\sqrt{n}$ . Another approach to this result might be to use the multivariate central limit theorems in ; we do not pursue this here.

To close the introduction, we mention some related results using Stein’s method for exponential approximation. Aside from which we already mentioned, a recent paper of Chatterjee and Shao (and similar results in Section 13.4 of the text ), use Stein’s method of exchangeable pairs for exponential approximation. However our approach is quite different than theirs since they assume the exchangeable pair $(W,W^{\prime})$ satisfies

with $c_{0}$ a positive constant, rather than the linearity condition (1) assumed here. Another approach to Stein’s method for exponential approximation is the “equilibrium distribution” coupling for which we refer the reader to the papers by Peköz and Röllin , , and to the references therein. For the generator method (in the more general context of chi-squared approximation), one can consult or . We note that the examples in and are about independent random variables, whereas the example in the current paper involves dependence. Finally, the introductory survey has some discussion of these approaches in the wider context of Stein’s method.

General theorem

The purpose of this section is to prove Theorem 1.1 from the introduction. We first prove an intermediate result which can be thought of as an approximation result for “smooth” test functions.

then for all twice differentiable functions $h$ with $\|h^{\prime}\|,\|h^{\prime\prime}\|<\infty$ ,

Define a characterizing operator $\mathcal{A}$ for the exponential distribution which has the property that

for all $f$ in a large enough class of functions if and only if $Z\sim Exp(1)$ .

For functions $h$ in the class of interest, define $f_{h}$ to solve

Now use properties of the solutions $f_{h}$ and the auxiliary randomization of exchangeable pairs to bound this last term.

The next lemma takes care of Items 1 and 2 and also provides the bounds on the solutions $f_{h}$ needed for Item 3 in the program above. The proof of Theorem 2.1 is immediately after the proof of the lemma.

Let $Z$ be a mean one exponential random variable. If $h$ is a function such that the following integrals are well defined, then

If $h$ is absolutely continuous with $\|h^{\prime}\|<\infty$ , then

If in addition, $h^{\prime}(0)=0$ and $h^{\prime}$ is absolutely continuous with $\|h^{\prime\prime}\|<\infty$ , then

To prove (8), first note that since translating $h$ by a constant leaves $f$ unchanged, we may (and do) assume without loss of generality that $h(0)=0$ , so that $h(x)\leq\|h^{\prime}\||x|$ . Using this fact and also that $\int xe^{-x}dx=-e^{-x}(x+1)$ in the equality below, we find

To bound this last expression, we compare derivatives to find

For the second assertion note that (7) implies that

Since $|h(x)|\leq\|h^{\prime}\||x|$ , the first term of (10) is bounded in absolute value by $\|h^{\prime}\|$ , so we only need to find an appropriate bound on the term of (10) that is in parentheses. We have by (9) that

Now taking the absolute value of (11), using the triangle inequality, bounding $|h(x)|\leq\|h^{\prime}\||x|$ , and using $\int xe^{-x}dx=-e^{-x}(x+1)$ , we find

which now yields the second assertion of (8).

To prove the final statement of the lemma, take the derivative of (10) using the expression (11) to find

To bound these expressions we first note that since $h^{\prime}(0)=h(0)=0$ ,

and in particular, $|h(x)|$ is bounded above by both terms appearing in the minimum. Thus, the absolute value of the right hand side of (12) is bounded above by

where we have used that $\min\{|w/2-1|,|1-2/w|\}\leq 1$ .

We bound the second term (13) differently according to $w\geq 1$ or $w<1$ . Suppose that $w\geq 1$ . Then note that $(w^{2}-2w+2)e^{w}\geq e^{w}\geq 2$ . Using that $|h(x)|\leq\|h^{\prime}\||x|$ and $\int xe^{-x}dx=-e^{-x}(x+1)$ , we find the absolute value of (13) is bounded above by

where we have used that $1-w^{2}\leq 0$ . By comparing derivatives we find

so that (14) (and hence (13)) is bounded above by $(4-6/e)\|h^{\prime}\|$ for $w\geq 1$ .

Using that $|h(x)|\leq\|h^{\prime\prime}\|x^{2}/2$ and $\int x^{2}e^{-x}dx=-e^{-x}(2+2x+x^{2})$ , we find the absolute value of (13) is bounded above by

so that (15) (and hence (13)) is bounded above by $\|h^{\prime\prime}\|$ for $0<w\leq 1$ . ∎

Using exchangeability and the linearity condition (2), we observe that

Now taking the absolute value of this last expression, we find that (17) in absolute value is bounded above by

We are now in a position to prove Theorem 1.1. First define the function for $t,x\geq 0$ , and $\delta>0$ ,

The next lemma states some important facts regarding the use of $h_{t,\delta}$ in our framework.

If $t\geq 0$ , $\delta>0$ , and $h_{t,\delta}$ is defined by (18), then

If $W\geq 0$ is a random variable and $Z$ has the exponential distribution with mean one, then

The first assertion follows from direct computation. For the second, note that

Since $e^{-x}\leq 1$ for $x>0$ , we find by direct computation that

establishes (20) with the left hand side replaced by

The main purpose of this section is to prove the following result.

The heat kernel on a compact Lie group $G$ is defined by setting for $x,y\in G$ and $t\geq 0$ ,

where the $\lambda_{n}$ are the eigenvalues of the Laplacian repeated according to multiplicity, and the $\phi_{n}$ are an orthonormal basis of eigenfunctions of $L^{2}(G)$ ; these can be taken to be the irreducible characters of $G$ .

We use the following properties of the heat kernel, where $\Delta$ denotes the Laplacian of $G$ . Part 1 of Lemma 3.2 is from page 198 of . Part 2 of Lemma 3.2 is immediate from the expansion (21). Part 3 of Lemma 3.2 is Lemma 2.5 of .

Let $G$ be a compact Lie group, $x,y\in G$ , and $t\geq 0$ .

$K(t,x,y)$ converges and is non-negative for all $x,y,t$ .

$\int_{y\in G}K(t,x,y)dy=1$ , where the integration is with respect to Haar measure of $G$ .

For smooth $\phi$ , as $t\rightarrow 0$ , one has that

The symmetry in $x$ and $y$ of $K(t,x,y)$ shows that the heat kernel is a reversible Markov process with respect to the Haar measure of $G$ . It is a standard fact , that reversible Markov processes lead to exchangeable pairs $(W,W^{\prime})$ . Namely suppose one has a Markov chain with transition probabilities $K(x,y)$ on a state space $X$ , and that the Markov chain is reversible with respect to a probability distribution $\pi$ on $X$ . Then given a function $f$ on $X$ , if one lets $W=f(x)$ where $x$ is chosen from $\pi$ and $W^{\prime}=f(x^{\prime})$ where $x^{\prime}$ is obtained by moving from $x$ according to $K(x,y)$ , then $(W,W^{\prime})$ is an exchangeable pair. In the special case of the heat kernel on a compact Lie group $G$ , given a function $f$ on $G$ , one can construct an exchangeable pair $(W,W^{\prime})$ by letting $W=f(U)$ where $U$ is chosen from Haar measure, and $W^{\prime}=f(U^{\prime})$ , where $U^{\prime}$ is obtained by moving time $t$ from $U$ via the heat kernel. To define the exchangeable pair $(W,W^{\prime})$ used in this paper, we further specialize by setting $f(U)=|Tr(U)|^{2}$ .

If $\lambda$ is an integer partition, and $m_{j}$ denotes the multiplicity of part $j$ in $\lambda$ , we define $p_{\lambda}(U)=\prod_{j}Tr(U^{j})^{m_{j}}$ . For example, $p_{5,3,3}(U)=Tr(U^{5})Tr(U^{3})^{2}$ . Typically we suppress the $U$ and use the notation $p_{\lambda}$ .

The next three lemmas are from Rains ; here $\nabla f\cdot\nabla g$ is defined by

For all integers $k$ and $l$ (not necessarily positive), and unitary $U$ ,

For all unitary matrices $U$ and class functions $f_{1},\cdots,f_{k}$

The final lemma is a moment computation from .

Throughout we let $W(U)=|Tr(U)|^{2}=p_{1}(U)\overline{p_{1}(U)}$ .

The second equality was Lemma 3.5, and the final equality used Lemmas 3.3 and 3.4. ∎

Using Lemma 3.5, and then Lemmas 3.3 and 3.4, one computes that

Next we compute expected values of low order moments of $W^{\prime}-W$ .

exchangeability of $(W,W^{\prime})$ gives that

where the penultimate equality used Lemma 3.6.

By the proof of Lemma 3.8, and then Lemma 3.6,

For part 3 of the lemma, one uses the Cauchy-Schwarz inequality to obtain that

Part 3 then follows from parts 1 and 2 of the lemma. ∎

Now we proceed to the proof of Theorem 3.1.

By Lemma 3.7, one can apply Theorem 1.1 with $a=2nt$ . By Lemma 3.8, and the triangle inequality,

By Lemma 3.6, this is $\frac{\sqrt{2}}{n}+O(t)$ ; letting $t\rightarrow 0$ gives an upper bound

Finally, note from Lemma 3.7 that $R=O(t^{2})$ . Since $a=4nt$ , it follows that

Summarizing, by letting $t\rightarrow 0$ , Theorem 1.1 implies that for $\delta>0$ ,

Choosing $\delta=4\cdot 2^{1/4}/\sqrt{n}$ , yields the claimed result. ∎

The moments of the random variable $|Tr(U)|^{2}$ have a combinatorial interpretation. Indeed, from one has for all positive integers $l,n$ that

Here $L_{n}$ is the length of the longest subsequence of a random permutation on $n$ symbols.

Acknowledgements

Fulman was partially supported by a Simons Foundation Fellowship. We thank Eric Rains for helpful correspondence.

Introduction

General theorem

Acknowledgements

References