Exponential approximation and Stein's method of exchangeable pairs
Jason Fulman, Nathan Ross
Introduction
The purpose of this paper is to further develop exponential approximation, using Stein’s method of exchangeable pairs. The first use of exchangeable pairs in exponential approximation was in the paper , which studied the spectrum of the Bernoulli-Laplace Markov chain. Unfortunately the results in , which use the Kolmogorov metric, are very complicated and seem quite hard to apply in other examples. We provide approximation results similar to those in , but which are significantly easier to compute. In particular, we do not see how to apply the results of to the example in this paper.
We work in a “smooth” test function metric, but also provide bounds in the Kolmogorov metric, which is defined for random variables and to be
Our main theoretical result is the following theorem (see also Theorem 2.1 below).
Let be a mean one exponential random variable. If is a random variable with finite second moment and is an exchangeable pair such that for some and sigma-field ,
Remark: The theorem is stated with a choice of in order to simplify the error bound, but it is obvious that in applications should be chosen to minimize the bound.
One of the attractive points about this result is that the terms are very similar to those which one encounters in normal approximation. To see the parallels, here is a normal approximation theorem of Rinott and Rotar (in the Kolmogorov metric).
The authors suggest that this should follow from methods in Johansson’s remarkable paper . This seems challenging to make rigorous, particularly if one wants to make explicit. In 3, we give the first rigorous and explicit error term for this problem, proving that the Kolmogorov distance between and a standard mean 1 exponential is at most . Another approach to this result might be to use the multivariate central limit theorems in ; we do not pursue this here.
To close the introduction, we mention some related results using Stein’s method for exponential approximation. Aside from which we already mentioned, a recent paper of Chatterjee and Shao (and similar results in Section 13.4 of the text ), use Stein’s method of exchangeable pairs for exponential approximation. However our approach is quite different than theirs since they assume the exchangeable pair satisfies
with a positive constant, rather than the linearity condition (1) assumed here. Another approach to Stein’s method for exponential approximation is the “equilibrium distribution” coupling for which we refer the reader to the papers by Peköz and Röllin , , and to the references therein. For the generator method (in the more general context of chi-squared approximation), one can consult or . We note that the examples in and are about independent random variables, whereas the example in the current paper involves dependence. Finally, the introductory survey has some discussion of these approaches in the wider context of Stein’s method.
General theorem
The purpose of this section is to prove Theorem 1.1 from the introduction. We first prove an intermediate result which can be thought of as an approximation result for “smooth” test functions.
Let be a mean one exponential random variable. If is a random variable with finite second moment and is an exchangeable pair such that for some and sigma-field ,
then for all twice differentiable functions with ,
Define a characterizing operator for the exponential distribution which has the property that
for all in a large enough class of functions if and only if .
For functions in the class of interest, define to solve
Now use properties of the solutions and the auxiliary randomization of exchangeable pairs to bound this last term.
The next lemma takes care of Items 1 and 2 and also provides the bounds on the solutions needed for Item 3 in the program above. The proof of Theorem 2.1 is immediately after the proof of the lemma.
Let be a mean one exponential random variable. If is a function such that the following integrals are well defined, then
If is absolutely continuous with , then
If in addition, and is absolutely continuous with , then
To prove (8), first note that since translating by a constant leaves unchanged, we may (and do) assume without loss of generality that , so that . Using this fact and also that in the equality below, we find
To bound this last expression, we compare derivatives to find
For the second assertion note that (7) implies that
Since , the first term of (10) is bounded in absolute value by , so we only need to find an appropriate bound on the term of (10) that is in parentheses. We have by (9) that
Now taking the absolute value of (11), using the triangle inequality, bounding , and using , we find
which now yields the second assertion of (8).
To prove the final statement of the lemma, take the derivative of (10) using the expression (11) to find
To bound these expressions we first note that since ,
and in particular, is bounded above by both terms appearing in the minimum. Thus, the absolute value of the right hand side of (12) is bounded above by
where we have used that .
We bound the second term (13) differently according to or . Suppose that . Then note that . Using that and , we find the absolute value of (13) is bounded above by
where we have used that . By comparing derivatives we find
so that (14) (and hence (13)) is bounded above by for .
Using that and , we find the absolute value of (13) is bounded above by
so that (15) (and hence (13)) is bounded above by for . ∎
Using exchangeability and the linearity condition (2), we observe that
Now taking the absolute value of this last expression, we find that (17) in absolute value is bounded above by
We are now in a position to prove Theorem 1.1. First define the function for , and ,
The next lemma states some important facts regarding the use of in our framework.
If , , and is defined by (18), then
If is a random variable and has the exponential distribution with mean one, then
The first assertion follows from direct computation. For the second, note that
Since for , we find by direct computation that
establishes (20) with the left hand side replaced by
The main purpose of this section is to prove the following result.
The heat kernel on a compact Lie group is defined by setting for and ,
where the are the eigenvalues of the Laplacian repeated according to multiplicity, and the are an orthonormal basis of eigenfunctions of ; these can be taken to be the irreducible characters of .
We use the following properties of the heat kernel, where denotes the Laplacian of . Part 1 of Lemma 3.2 is from page 198 of . Part 2 of Lemma 3.2 is immediate from the expansion (21). Part 3 of Lemma 3.2 is Lemma 2.5 of .
Let be a compact Lie group, , and .
converges and is non-negative for all .
, where the integration is with respect to Haar measure of .
For smooth , as , one has that
The symmetry in and of shows that the heat kernel is a reversible Markov process with respect to the Haar measure of . It is a standard fact , that reversible Markov processes lead to exchangeable pairs . Namely suppose one has a Markov chain with transition probabilities on a state space , and that the Markov chain is reversible with respect to a probability distribution on . Then given a function on , if one lets where is chosen from and where is obtained by moving from according to , then is an exchangeable pair. In the special case of the heat kernel on a compact Lie group , given a function on , one can construct an exchangeable pair by letting where is chosen from Haar measure, and , where is obtained by moving time from via the heat kernel. To define the exchangeable pair used in this paper, we further specialize by setting .
If is an integer partition, and denotes the multiplicity of part in , we define . For example, . Typically we suppress the and use the notation .
The next three lemmas are from Rains ; here is defined by
For all integers and (not necessarily positive), and unitary ,
For all unitary matrices and class functions
The final lemma is a moment computation from .
Throughout we let .
The second equality was Lemma 3.5, and the final equality used Lemmas 3.3 and 3.4. ∎
Using Lemma 3.5, and then Lemmas 3.3 and 3.4, one computes that
Next we compute expected values of low order moments of .
exchangeability of gives that
where the penultimate equality used Lemma 3.6.
By the proof of Lemma 3.8, and then Lemma 3.6,
For part 3 of the lemma, one uses the Cauchy-Schwarz inequality to obtain that
Part 3 then follows from parts 1 and 2 of the lemma. ∎
Now we proceed to the proof of Theorem 3.1.
By Lemma 3.7, one can apply Theorem 1.1 with . By Lemma 3.8, and the triangle inequality,
By Lemma 3.6, this is ; letting gives an upper bound
Finally, note from Lemma 3.7 that . Since , it follows that
Summarizing, by letting , Theorem 1.1 implies that for ,
Choosing , yields the claimed result. ∎
The moments of the random variable have a combinatorial interpretation. Indeed, from one has for all positive integers that
Here is the length of the longest subsequence of a random permutation on symbols.
Acknowledgements
Fulman was partially supported by a Simons Foundation Fellowship. We thank Eric Rains for helpful correspondence.