Stein's method of exchangeable pairs for absolutely continuous, univariate distributions with applications to the Polya urn model

Christian Döbler

Introduction

and that, if fhf_{h} is bounded and 1p\frac{1}{p} is unbounded on (a,b)(a,b), then fhf_{h} is the only bounded solution on (a,b)(a,b). For general properties of the solutions fhf_{h} see [CS11] or [CGS11]. Note that for general Borel-measurable hh it cannot be expected that there exists a solution ff which is differentiable on all of (a,b)(a,b) and satisfies (2) pointwise. Thus, a solution is understood to be an almost everywhere differentiable and Borel-measurable function which satisfies (2) at all points x(a,b)x\in(a,b) where it is in fact differentiable and contrary to the usual convention, at the remaining points x(a,b)x\in(a,b) one defines f(x):=ψ(x)+h(x)μ(h)f^{\prime}(x):=-\psi(x)+h(x)-\mu(h). This yields a Borel-measurable function ff^{\prime} on (a,b)(a,b) such that (2) holds for each x(a,b)x\in(a,b). In order to understand the exchangeable pairs technique in the framework of the density approach it might be helpful to recall the exchangeable pairs method in the situation of normal approximation. This method, which was first presented in Stein’s monograph [Ste86], is a cornerstone of Stein’s method of normal approximation and is still the most frequently used coupling. This is due to the wide applicability of standard couplings like the Gibbs sampler or making one time step in a reversible Markov chain, which generally yield exchangeable pairs. By definition, an exchangeable pair is a pair (W,W)(W,W^{\prime}) of random variables, defined on a common probability space, such that their joint distribution is symmetric, i.e. such that (W,W)=D(W,W)(W,W^{\prime})\stackrel{{\scriptstyle\mathcal{D}}}{{=}}(W^{\prime},W). In [Ste86], in order to show that a given real-valued random variable WW is approximately standard normally distributed, Stein proposes the construction of another random variable WW^{\prime}, a small random perturbation of WW, on the same space as WW such that (W,W)(W,W^{\prime}) forms an exchangeable pair and additionally the following linear regression property holds:

Here, λ(0,1)\lambda\in(0,1) is a constant which is typically close to zero for conveniently chosen WW^{\prime}. If this condition is satisfied, then the distributional distance of L(W)\mathcal{L}(W) to N(0,1)N(0,1) can be efficiently bounded in various metrics, including the Kolmogorov and Wasserstein metrics (see, e.g. [Ste86], [CS05] or [CGS11] for the common “plug-in theorems”). The range of examples to which this technique could be applied was considerably extended by the work [RR97] of Rinott and Rotar who proved normal approximation theorems allowing the linear regression property to be satisfied only approximately. Specifically, they assumed the existence of a random quantity RR, which is dominated by λW\lambda W in size, such that

Note, that necessarily RR is σ(W)\sigma(W)-measurable and that unlike condition (4), condition (5) is not a true condition on the pair (W,W)(W,W^{\prime}) since we can always define R:=E[WWW]+λWR:=E[W^{\prime}-W|W]+\lambda W for each given constant λ>0\lambda>0. However, the “plug-in theorems” in [RR97], [SS06] or [CGS11] clarify that RR has to be of smaller order than λ\lambda in order to yield useful bounds. Since WW is supposed to have a “true” distributional limit, it follows that both, λ\lambda and RR, are at least asymptotically unique (see also the introduction of [RR09] for the discussion of this topic). When dealing with our possibly non-normal distribution μ\mu, the question is what condition to substitute for the linear regression property (4) or (5). This question was succesfully answered independently by Eichelsbacher and Löwe in [EL10] and by Chatterjee and Shao in [CS11]. They pointed out, that in this more general setting the appropriate regression property is

where, again λ>0\lambda>0 is constant and RR is of smaller order than λψ(W)\lambda\psi(W). In order to give a flavour of the resulting “plug-in theorems”, we present parts of Theorem 2.4 from [EL10].

The third term of the bound (1.1) reveals that, in fact, RR must be of smaller order than λ\lambda in order for the bound to be useful and from the second term we conclude that λ\lambda should such that E[(WW)2]2λE\left[(W^{\prime}-W)^{2}\right]\approx 2\lambda. The first term appearing in the bound on the right hand side of (1.1) is usually interpreted such, that the random variable E\bigl{[}(W-W^{\prime})^{2}|W\bigr{]}/2\lambda must “obey a law of large numbers” to obtain decreasing bounds. Bounding this term is often decisive for the success of applying Theorem 1.1.

Having discussed the method of exchangeable pairs within the density approach, we now address the problem, that condition (6) with negligible remainder term RR is in some examples not satisfied by an exchangeable pair, which, however, appears natural to us for our approximation problem. For example, in many situations where the exchangeable pair (W,W)(W,W^{\prime}) is constructed via the Gibbs sampler, we have a regression property of the form

where λ,c>0\lambda,c>0 are constants (the reason why λ\lambda and cc are not subsumed into a single constant will become clear later on) and where, again, RR is a negligible remainder. Here, again ZμZ\sim\mu. Following the theory of the paper [CS11], condition (8) suggests approximating WW with a normal distribution with mean E[Z]E[Z] and variance 1c\frac{1}{c}. But there are situations, where the exchangeable pair (W,W)(W,W^{\prime}) is good, meaning that the difference WW|W^{\prime}-W| is “small”, condition (8) is satisfied and where we know that WW is approximately distributed as a non-normal random variable ZμZ\sim\mu and so the normal approximation is inappropriate. In general, this either means that the above discussed law of large numbers cannot hold or that the resulting error term RR in (6) is not negligible. These observations motivate a new version of Stein’s method, that allows for a more general regression property.

Suppose, that an appropriately chosen exchangeable pair (W,W)(W,W^{\prime}) satisfies the following general regression property:

where λ>0\lambda>0 is constant, γ\gamma is a measurable function, whose domain contains (a,b)\overline{(a,b)} and will be discussed further below and where RR is a negligible remainder term. We will see, that it will be advantageous if the term γ(x)g(x)\gamma(x)\cdot g(x) appears in the “new” Stein equation. So we make the following ansatz for the Stein identity:

where η\eta is another function which still has to be found.

Starting from the Stein identity (10) our aim is to identify the function η\eta. If this approach is succesful, the Stein equation corrersponding to a meaurable function hh will be

For this identity to hold, irrespective of the test function hh, it must be the case that α(x)=1η(x)\alpha(x)=\frac{1}{\eta(x)} (particularly η\eta must be differentiable at least almost everywhere) and hence

This is a first order linear differential equation, which can of course be solved explicitly by the method of variation of the constant. It turns out, that the right solution is given by

at least, if abγ(t)p(t)dt=E[γ(Z)]<\int_{a}^{b}|\gamma(t)|p(t)dt=E[|\gamma(Z)|]<\infty. But this is a very natural condition to hold, since the function γ\gamma was motivated by the regression property (9) and so, if the random variables WW and WW^{\prime} are integrable we obtain that both sides of (9) must be PP-integrable and, in fact

Neglecting the remainder term RR we thus see that E[γ(W)]E[\gamma(W)] should exist and, in fact, be close to zero. So since WDZW\stackrel{{\scriptstyle\mathcal{D}}}{{\approx}}Z we find it reasonable that E[γ(Z)]E[\gamma(Z)] exists and even equals zero. Furthermore, it is a matter of routine to check, that η\eta as given in (15) indeed still satisfies (14). The above calculations starting with (10) were rather formal but crucial for the motivation and understanding of our approach. The paper is organized in the following way. Rigorous results and the abstract theory for general μ\mu are presented in Section 2. These results are then further spezialized to the Beta distributions in Section 3. In Section 4 the theory combined with a suitable exchangeable pairs coupling is used to prove a rate of convergence of order 1n\frac{1}{n} in a Polya urn model (see Theorem 4.3). In Section 5 some rather lengthy or technical proofs can be found and a sufficiently general version of de l’ Hôpital’s rule for merely absolutely continuous functions is provided. This result justifies all the invocations of this famous rule in the present work.

Acknowledgements

A few days after this work was on the arXiv, G. Reinert and L. Goldstein posted a preprint (see [GR12]), which also develops Stein’s method for the Beta distributions and uses a comparison technique to prove error bounds of order n1n^{-1} in the Wasserstein distance for the more special Polya urn model, where the drawn ball is replaced to the urn together with only one extra ball of the same colour.

The general theory

The density pp is positive on the interval (a,b)(a,b) and absolutely continuous on every compact interval [c,d](a,b)[c,d]\subseteq(a,b).

γ\gamma is continuous on (a,b)\overline{(a,b)}

γ\gamma is strictly decreasing on (a,b)\overline{(a,b)}

abγ(t)p(t)dt<\int_{a}^{b}|\gamma(t)|p(t)dt<\infty and in fact E[γ(Z)]=abγ(t)p(t)dt=0E[\gamma(Z)]=\int_{a}^{b}\gamma(t)p(t)dt=0

There is a unique x0(a,b)x_{0}\in\overline{(a,b)} with γ(x0)=0\gamma(x_{0})=0.

Note that in Condition 2.2 (iv) is actually implied by (i),(ii) and (iii) and the intermediate value theorem. Furthermore, (iv) implies that γ\gamma is positive on (a,x0)(a,x_{0}) and is negative on (x0,b)(x_{0},b).

Under Conditions 2.1 and 2.2 the function II has the following properties:

II is strictly increasing on (a,x0)(a,x_{0}) and strictly decreasing on (x0,b)(x_{0},b) and hence attains its global maximum at x0x_{0}.

Of course, (a) is immediately implied by item (iii) from Condition 2.2. To prove (b) and (c) first observe that by (iii) we have limxaI(x)=0=limxbI(x)\lim_{x\searrow a}I(x)=0=\lim_{x\nearrow b}I(x). Furthermore, I(x)=γ(x)p(x)I^{\prime}(x)=\gamma(x)p(x) is postive on (a,x0)(a,x_{0}) and negative on (x0,b)(x_{0},b) implying the results. ∎

Under Conditions 2.1 and 2.2 the function η\eta has the following properties:

η\eta is positive on (a,b)(a,b), absolutely continuous on every compact subinterval [c,d](a,b)[c,d]\subseteq(a,b) and η(x)=γ(x)ψ(x)η(x)\eta^{\prime}(x)=\gamma(x)-\psi(x)\eta(x) for λ\lambda-almost all x(a,b)x\in(a,b).

limxaη(x)p(x)=limxbη(x)p(x)=0\lim_{x\searrow a}\eta(x)p(x)=\lim_{x\nearrow b}\eta(x)p(x)=0

If limxap(x)=0\lim_{x\searrow a}p(x)=0, then limxaη(x)=γ(a)limxaψ(x)\lim_{x\searrow a}\eta(x)=\frac{\gamma(a)}{\lim_{x\searrow a}\psi(x)}, if this limit exists.

If lim infxap(x)(0,){}\liminf_{x\searrow a}p(x)\in(0,\infty)\cup\{\infty\} then limxaη(x)=0\lim_{x\searrow a}\eta(x)=0

If limxbp(x)=0\lim_{x\nearrow b}p(x)=0, then limxbη(x)=γ(b)limxbψ(x)\lim_{x\nearrow b}\eta(x)=\frac{\gamma(b)}{\lim_{x\nearrow b}\psi(x)}, if this limit exists.

If lim infxbp(x)(0,){}\liminf_{x\nearrow b}p(x)\in(0,\infty)\cup\{\infty\} then limxbη(x)=0\lim_{x\nearrow b}\eta(x)=0

The first part of (a) follows from the fact, that II is positive on (a,b)(a,b) and C1C^{1} on (a,b)\overline{(a,b)} and hence absolutely continuous on [c,d][c,d] and that pp is also absolutely continuous and bounded below by a positive constant on [c,d][c,d]. The rest of (a) has already been observed. Items (b), (d) and (f) follow immediately from the properties of the function II in Proposition 2.4. To prove (c), we use de l’Hôpital’s rule (see Theorem 5.1) to derive

The following “Mill’s ratio” condition on the density pp and the corresponding distribution function FF is often satisfied and will yield limxaη(x)=limxbη(x)=0\lim_{x\searrow a}\eta(x)=\lim_{x\nearrow b}\eta(x)=0.

The density pp of μ\mu satisfies all the properties from Condition 2.1 and also the following:

If a>a>-\infty, then limxaF(x)p(x)=0\lim_{x\searrow a}\frac{F(x)}{p(x)}=0.

If b<b<\infty, then limxb1F(x)p(x)=0\lim_{x\nearrow b}\frac{1-F(x)}{p(x)}=0.

Condition 2.6 is always satisfied if the density pp is bounded away from zero in suitable neighbourhoods of aa and bb.

Assume that both, a>a>-\infty and b<b<\infty and that limxap(x)=limxbp(x)=0\lim_{x\searrow a}p(x)=\lim_{x\nearrow b}p(x)=0. Then Condition 2.6 is satisfied, if there is a δ>0\delta>0 such that pp is increasing on (a,a+δ)(a,a+\delta) and decreasing on (bδ,b)(b-\delta,b). This is easily seen by the inequality

valid for x(a,a+δ)x\in(a,a+\delta) and a similar one for the right end point bb.

Suppose that a>a>-\infty and b<b<\infty, that limxap(x)=limxbp(x)=0\lim_{x\searrow a}p(x)=\lim_{x\nearrow b}p(x)=0 and that there is a δ>0\delta>0 such that pp is convex on (a,a+δ)(a,a+\delta) and on (bδ,b)(b-\delta,b). Then the assumptions of (b) and hence Condition 2.6 is satisfied. In fact, first we can extend pp to a continuous and convex function on [a,b)[a,b) by setting p(a):=0p(a):=0. Now, let a<x<y<a+δa<x<y<a+\delta. Then, there exists a λ(0,1)\lambda\in(0,1) with x=λa+(1λ)yx=\lambda a+(1-\lambda)y and by convexity we have:

Thus, pp is strictly increasing on (a,a+δ)(a,a+\delta). Similarly, one shows, that pp is strictly decreasing on (bδ,b)(b-\delta,b), if pp is convex there.

The following proposition provides the announced result.

Assume Condition 2.6. Then the function η\eta vanishes at the finite end points of the support (a,b)\overline{(a,b)} of μ\mu, i.e. if a>a>-\infty, then limxaη(x)=0\lim_{x\searrow a}\eta(x)=0 and if b<b<\infty, then limxbη(x)=0\lim_{x\nearrow b}\eta(x)=0. Hence, we may extend η\eta to a continuous function on (a,b)\overline{(a,b)} by letting η(a):=η(b):=0\eta(a):=\eta(b):=0.

Suppose, that a>a>-\infty. Then, by the positivity of II and the monotonicity of γ\gamma, for a<x<ba<x<b:

so that limxaη(x)=0\lim_{x\searrow a}\eta(x)=0. The proof of limxbη(x)=0\lim_{x\nearrow b}\eta(x)=0 for finite bb is similar by using the representation I(x)=xbγ(t)p(t)dtI(x)=-\int_{x}^{b}\gamma(t)p(t)dt and is therefore omitted. ∎

solves the Stein equation (11) for x(a,b)x\in(a,b). This can also be proved directly by differentiation and the formula for ghg_{h} could also be derived by the method of variation of the constant using the fact that log(pη)\log(p\cdot\eta) is a primitive function of γη\frac{\gamma}{\eta}, which follows from (14). If we can show that ghg_{h} is bounded, then it will immediately follow from Proposition 2.5 (a) that ghg_{h} is the only bounded solution of (11), since the solutions of the corresponding homogeneous equation are constant multiples of 1pη\frac{1}{p\cdot\eta}. Since we do not exclude approximating random variables which take on the values aa or bb, we show that the solution ghg_{h} can be extended continuously to aa and bb, if hh is continuous there. By the properties of the function I=pηI=p\eta on (a,b)(a,b), from Proposition 2.5, the continuity of γ\gamma and by de l’Hôpital’s rule (see Theorem 5.1) we have

As is typical for Stein’s method, its success within the applications considerably depends on good bounds on the solutions ghg_{h} and their derivative(s), generally uniformly over some given class H\mathcal{H} of test functions hh. The next step will be to prove such bounds. It has to be mentioned that we cannot expect to derive concrete good bounds in full generality, but that sometimes further conditions have to be imposed either on the distribution μ\mu (e.g. through the density pp) or on the coefficient γ\gamma. Nevertheless, we will derive bounds involving functional expressions which can a posteriori be simplified, computed or further bounded for concrete distributions. So our abstract viewpoint will pay off. Moreover, some of our bounds will actually hold in complete generality.

The next Proposition contains a bound for the solutions ghg_{h} for bounded and Borel-measurable test functions hh.

The proof is deferred to the appendix. The following corollary specializes this result to the case that γ(x)=c(xE[Z])\gamma(x)=-c(x-E[Z]) and that μ\mu is symmetric with respect to its median, which is then equal to its expected value E[Z]E[Z], that is Zm=DmZZ-m\stackrel{{\scriptstyle\mathcal{D}}}{{=}}m-Z.

In this case we clearly have I(m)=c2E[Zm]I(m)=\frac{c}{2}E[\lvert Z-m\rvert] which implies the result by Proposition 2.10. ∎

In the case that μ=N(0,1)\mu=N(0,1) and c=1c=1 this result specializes to the well known bound π2hμ(h)\sqrt{\frac{\pi}{2}}\lVert h-\mu(h)\rVert_{\infty} (see [CGS11] or [CS05], e.g.).

In the formulation of Proposition 2.10 it might suprise that there is no bound mentioned for gh\lVert g_{h}^{\prime}\rVert_{\infty}. This is because, in general a bound of the form ghCh\lVert g_{h}^{\prime}\rVert_{\infty}\leq C\lVert h\rVert_{\infty} does not exist with a finite constant CC in this setup. Note that this is contrary to the density approach, where one generally has such a bound (see [CS11] or [CGS11]).

Next, we will turn to Lipschitz continuous test functions hh. In contrast to bounded measurable test functions, there we will also be able to prove useful bounds for gh\lVert g_{h}^{\prime}\rVert_{\infty}.

In order to obtain bounds for Lipschitz continuous test functions we need a further condition on the distribution μ\mu which guarantees that its expected value exists.

The density pp is positive on the interval (a,b)(a,b) and absolutely continuous on every compact interval [c,d](a,b)[c,d]\subseteq(a,b). Furthermore, E[Z]=(a,b)xp(x)dx<E[|Z|]=\int_{\overline{(a,b)}}\lvert x\rvert p(x)dx<\infty.

The following proposition, which is also proved in the appendix, includes bounds for both, ghg_{h} and ghg_{h}^{\prime}, when hh is Lipschitz.

gh(x)hF(x)E[Z]axyp(y)dyI(x)\lvert g_{h}(x)\rvert\leq\lVert h^{\prime}\rVert_{\infty}\frac{F(x)E[Z]-\int_{a}^{x}yp(y)dy}{I(x)}

gh(x)haxF(s)dsG(x)+xb(1F(s))dsH(x)p(x)η(x)2\lvert g_{h}^{\prime}(x)\rvert\leq\lVert h^{\prime}\rVert_{\infty}\frac{\int_{a}^{x}F(s)dsG(x)+\int_{x}^{b}(1-F(s))dsH(x)}{p(x)\eta(x)^{2}}

Here, for x(a,b)x\in\overline{(a,b)} the positive functions H(x)H(x) and G(x)G(x) are defined by

In general, the term S(x):=F(x)E[Z]axyp(y)dyI(x)S(x):=\frac{F(x)E[Z]-\int_{a}^{x}yp(y)dy}{I(x)} cannot be bounded uniformly in x(a,b)x\in(a,b) unless γ\lvert\gamma\rvert grows at least linearly in xx.

ghhc\lVert g_{h}\rVert_{\infty}\leq\frac{\lVert h^{\prime}\rVert_{\infty}}{c}

\lvert g_{h}^{\prime}(x)\rvert\leq\frac{2\lVert h^{\prime}\rVert_{\infty}}{c}\frac{H(x)G(x)}{I(x)\eta(x)}=2\lVert h^{\prime}\rVert_{\infty}\frac{\int_{a}^{x}F(s)ds\int_{x}^{b}(1-F(t))dt}{\eta(x)\bigl{(}E[Z]F(x)-\int_{a}^{x}yp(y)dy\bigr{)}}

Claim (a) follows from Proposition 2.14 (a) and the observation that in this case we have

Part (b) follows from Proposition 2.14 (b) and Lemma 5.2 by observing that in this case

and, similarly, G(x)=cxb(1F(s))dsG(x)=c\int_{x}^{b}(1-F(s))ds. ∎

It is quite remarkable that in the case of normal approximation (via its classical Stein equation) the bound given in Corollary 2.16 (a) even improves on the best bound 2h2\lVert h^{\prime}\rVert_{\infty} currently mentioned in the literature (see, e.g. [CGS11] or [CS05]). In fact, in this case c=1c=1 and thus our bound reduces to h\lVert h^{\prime}\rVert_{\infty}.

For concrete distributions the ratio appearing in the bound for gh(x)g_{h}^{\prime}(x) may be bounded uniformly in xx by some constant which can sometimes also be computed explicitely. Nevertheless, in [EV12] the authors give mild conditions for the existence of a finite constant kk such that ghkh\lVert g_{h}^{\prime}\rVert_{\infty}\leq k\lVert h^{\prime}\rVert_{\infty} for any Lipschitz-continuous hh. In practice, these conditions are usually met. However, there is no hope of estimating the constant kk by their method of proof. Thus, for concrete distributions and explicit constants it might therefore by useful to work with our bound from Corollary 2.16 (b).

Next, we will discuss, how we can express the density pp of μ\mu in terms of γ\gamma and η\eta. This will be useful to bound the second derivative of ghg_{h} in some special cases. Let x0x_{0} be as in Condition 2.2. Since η=γηψ\eta^{\prime}=\gamma-\eta\psi and hence ψ=γηη\psi=\frac{\gamma-\eta^{\prime}}{\eta}, we have

Formula (20) is a more general version of formula (3.14) in [NV09] and is also derived in [KT12]. Now, differentiating Stein’s equation (11), we obtain for hh Lipschitz

We already know, how to solve (24) for x(a,b)x\in(a,b). So now, we will assume that at least one of aa and bb is finite and try to solve the equation outside (a,b)(a,b). Furthermore, we will discuss conditions that ensure that the composed solution ghg_{h} behaves nicely at the edges aa and/or bb. We will henceforth assume Condition 2.18. For xa,bx\not=a,b equation (24) is clearly equivalent to

Let us assume that both a>a>-\infty and b<b<\infty (the other cases are of course included) and let FlF_{l} be any primitive function of γη\frac{\gamma}{\eta} on (,a)(-\infty,a). Such a function exists by continuity and is hence continuously differentiable. By the method of variation of the constant one may derive the following formula for x(,a)x\in(-\infty,a):

if this integral exists. Note that this property does not depend on the particular choice of the primitive function FlF_{l}. For a fixed primitive function FlF_{l} of γη\frac{\gamma}{\eta} on (,a)(-\infty,a) we define the function

Analogously, for a given primitive function FrF_{r} of γη\frac{\gamma}{\eta} on (b,)(b,\infty) we define the function

Note that inside the interval (a,b)(a,b) we have that log(ηp)\log(\eta p) is a primitive of γη\frac{\gamma}{\eta} and hence qlq_{l} plays the role of pp on (,a)(-\infty,a) (and similarly for qrq_{r}). As we have observed, we will need the following Condition:

Similarly, for x(b,)x\in(b,\infty) we arrive at the definition

Note that the definition of the solution ghg_{h} does not depend on the choice of the primitive functions FlF_{l} and FrF_{r} since two such functions may only differ by an additive constant.

Next, we prove that the above constructed solution ghg_{h} is continuous as long as hh is continuous at aa and bb. To deal with the limits limxagh(x)\lim_{x\nearrow a}g_{h}(x) and limxbgh(x)\lim_{x\searrow b}g_{h}(x) we first formulate a condition which will usually be satisfied in practice.

The functions FlF_{l} and FrF_{r} satisfy limxaFl(x)=±\lim_{x\nearrow a}F_{l}(x)=\pm\infty and limxbFr(x)=±\lim_{x\searrow b}F_{r}(x)=\pm\infty.

Again, the validity of this condition does not depend on the choice of the functions FlF_{l} and FrF_{r}. By Condition 2.21 we may again apply de l’Hôpital’s rule to compute

Next, we want to present bounds on gh(x)g_{h}(x) and gh(x)g_{h}^{\prime}(x) for x(a,b)x\notin\overline{(a,b)}. But first we will show that our conditions already imply that η(x)<0\eta(x)<0 if x<ax<a or x>bx>b. Since Fl(x)=γ(x)η(x)F_{l}^{\prime}(x)=\frac{\gamma(x)}{\eta(x)} is then negative on (,a)(-\infty,a) this also ensues that limxaFl(x)=\lim_{x\nearrow a}F_{l}(x)=-\infty. Similarly, limxbFr(x)=\lim_{x\searrow b}F_{r}(x)=-\infty.

Assume Conditions 2.18, 2.19 and 2.21. Then the functions η\eta, FlF_{l} and FrF_{r} have the following properties:

We have limxaFl(x)=limxbFr(x)=\lim_{x\nearrow a}F_{l}(x)=\lim_{x\searrow b}F_{r}(x)=-\infty.

To prove (a), first note, that by Condition 2.18 η\eta has no sign changes on (,a)(-\infty,a). Suppose contrarily to the assertion, that η(x)>0\eta(x)>0 for all x(,a)x\in(-\infty,a). Then, the function qlq_{l} is also positive and hence 0xaql(t)dt=Ql(x)0\leq\int_{x}^{a}q_{l}(t)dt=-Q_{l}(x) for each x(,a)x\in(-\infty,a). By Conditions 2.19 and 2.21 we may apply de l’Hôpital’s rule to conclude

by Condition 2.18. This is a contradiction and hence we must have η(x)<0\eta(x)<0 for x(,a)x\in(-\infty,a). Similarly, one shows that also η(x)<0\eta(x)<0 for x(b,)x\in(b,\infty). To prove (b), note that Fl(x)=γ(x)η(x)<0F_{l}^{\prime}(x)=\frac{\gamma(x)}{\eta(x)}<0 for x<ax<a. By Condition 2.21 this necessarily implies that limxaFl(x)=\lim_{x\nearrow a}F_{l}(x)=-\infty. Analogously, we have Fr(x)>0F_{r}^{\prime}(x)>0 for x>bx>b (since γ(x)<0\gamma(x)<0 there) which by Condition 2.21 implies that limxbFr(x)=\lim_{x\searrow b}F_{r}(x)=-\infty. ∎

Proposition 2.23 particularly implies the conclusion of Proposition 2.8 and hence makes Condition 2.6 redundant, at least as far as the assertion of this proposition is concerned. In order to get general bounds, we will need yet another condition on the functions FlF_{l} and FrF_{r}

The functions FlF_{l} and FrF_{r} satisfy limxFl(x)=limxFr(x)=+\lim_{x\searrow-\infty}F_{l}(x)=\lim_{x\nearrow\infty}F_{r}(x)=+\infty.

The next result gives bounds on ghg_{h} for bounded, Borel-measurable functions hh. As usual, the proof is in the appendix.

Before turning to Lipschitz test functions, we dicuss properties of the functions exp(Fl)\exp(F_{l}) and exp(Fr)\exp(F_{r}), respectively. In particular, we will show, that they correspond to the function II on (a,b)(a,b) and have similar integral representations.

For each x(,a)x\in(-\infty,a) we have Il(x)=axγ(t)ql(t)dtI_{l}(x)=\int_{a}^{x}\gamma(t)q_{l}(t)dt.

For each x(b,)x\in(b,\infty) we have Ir(x)=bxγ(t)qr(t)dtI_{r}(x)=\int_{b}^{x}\gamma(t)q_{r}(t)dt.

by Proposition 2.23. Hence axγ(t)ql(t)dt\int_{a}^{x}\gamma(t)q_{l}(t)dt exists and equals Il(x)I_{l}(x). ∎

For each x(,a)x\in(-\infty,a) we have gh(x)hQl(x)E[Z]axsql(s)dsIl(x)\lvert g_{h}(x)\rvert\leq\lVert h^{\prime}\rVert_{\infty}\frac{Q_{l}(x)E[Z]-\int_{a}^{x}sq_{l}(s)ds}{I_{l}(x)}.

For each x(b,)x\in(b,\infty) we have gh(x)hQr(x)E[Z]bxsqr(s)dsIr(x)\lvert g_{h}(x)\rvert\leq\lVert h^{\prime}\rVert_{\infty}\frac{Q_{r}(x)E[Z]-\int_{b}^{x}sq_{r}(s)ds}{I_{r}(x)}.

ghhc\lVert g_{h}\rVert_{\infty}\leq\frac{\lVert h^{\prime}\rVert_{\infty}}{c}

These probabilities have nothing to do with the distribution μ\mu of ZZ and hence, one should be able to bound them as well directly for a given WW. Thus, we will focus on z(a,b)z\in(a,b).

Let z(a,b)z\in(a,b) be given. Then, under the Conditions 2.1, 2.18, 2.19 and 2.21 we have:

It is clear, that a similar discussion of the solutions gzg_{z} is possible, if a=a=-\infty or b=b=\infty.

Note, that we can write gz(x)=(1F(z))p(x)H(x)I(x)2g_{z}^{\prime}(x)=\frac{(1-F(z))p(x)H(x)}{I(x)^{2}} for x(a,z)x\in(a,z) and gz(x)=F(z)p(x)G(x)I(x)2g_{z}^{\prime}(x)=\frac{-F(z)p(x)G(x)}{I(x)^{2}} for x(z,b)x\in(z,b), with the functions HH and GG from Proposition 2.14. For concrete distributions one may often prove, that gzg_{z}^{\prime} is increasing on (a,z)(a,z) and decreasing on (z,b)(z,b), but this seems to be hard to prove in generality, if it is true at all.

Finally, in our general setting, we will prove suitable “plug-in theorems” for exchangeable pairs satisfying our general regression property (9). As was observed in [Röl08] for the normal distribution, in case of univariate distributional approximations, one does not need the full strength of exchangeability, but equality in distribution of the random variables WW and WW^{\prime} is sufficient. This may allow for a greater choice of admissible couplings in several situations, or at least, relaxes the verification of asserted properties.

In the following, let (Ω,A,P)(\Omega,\mathcal{A},P) be a probability space and let W,WW,W^{\prime} be real-valued random variables defined on this space such that W=DWW\stackrel{{\scriptstyle\mathcal{D}}}{{=}}W^{\prime}. Let, as before, μ\mu be our target distribution with support (a,b)\overline{(a,b)} fulfilling Condition 2.1 . From now on we will assume, that the random variables WW and WW^{\prime} only have values in an interval I(a,b)I\supseteq(a,b) where both functions η\eta and γ\gamma are defined (recall that it might be the case that η\eta can only be defined on (a,b)(a,b)).

If ff^{\prime} is also absolutely continuous and f<\lVert f^{\prime\prime}\rVert_{\infty}<\infty for some Borel-measurable version ff^{\prime\prime} of the second derivative, then we also have the bound

Hence, by distributional equality, we obtain

From (32) and the assumptions on ff the bound (29) now easily follows. To prove (30) it suffices to observe that

and 01s(1s)ds=16\int_{0}^{1}s(1-s)ds=\frac{1}{6}. ∎

From the first term on the right hand side of (29) we see, that the bound can only be useful, if E\bigl{[}(W^{\prime}-W)^{2}|W\bigr{]}\approx 2\lambda\eta(W). Similarly, the third term reveals, that, indeed, RR should be of smaller order than λ\lambda.

The proof shows, that Proposition 2.31 can easily be generalized to the situation, where there is a σ\sigma-algebra F\mathcal{F} with σ(W)FA\sigma(W)\subseteq\mathcal{F}\subseteq\mathcal{A} and the more general regression property

with some F\mathcal{F}-measurable remainder term RR is satisfied.

If H\mathcal{H} is some class of test functions, such that there are finite, positive constants c0c_{0}, c1c_{1} and c2c_{2} with ghc0\lVert g_{h}\rVert_{\infty}\leq c_{0}, ghc1\lVert g_{h}^{\prime}\rVert_{\infty}\leq c_{1} and ghc2\lVert g_{h}^{\prime\prime}\rVert_{\infty}\leq c_{2} for each hHh\in\mathcal{H}, then (30) immediately yields a bound on the distance

Stein’s method for Beta distributions

From now on, fix α,β>1\alpha,\beta>-1. Now, we introduce a Stein identity for the Beta distribution μα,β\mu_{\alpha,\beta}. It is easily checked, that its density pα,βp_{\alpha,\beta} satisfies the ordinary differential equation

Integrating by parts one obtains the conjugate operator L:=AL:=A^{*}, which is defined by the equation <Af,g>L2=<f,Ag>L2<Af,g>_{L^{2}}=<f,A^{*}g>_{L^{2}}, and which is known to serve as a characterizing operator for the distribution μα,β\mu_{\alpha,\beta}. To be concrete, in our case we have

for smooth enough functions gg, yielding the Stein identity

A real-valued random variable is distributed according to μα,β\mu_{\alpha,\beta} if and only if for all functions gKα,βg\in\mathcal{K}_{\alpha,\beta} the expected values E[(1X2)g(X)]E[(1-X^{2})g^{\prime}(X)] and E[(α+β+2)Xg(X)+(αβ)g(X)]E[(\alpha+\beta+2)Xg(X)+(\alpha-\beta)g(X)] exist and coincide.

First, let L(X)=μα,β\mathcal{L}(X)=\mu_{\alpha,\beta} and let gKα,βg\in\mathcal{K}_{\alpha,\beta}. By the hypothesis and the transformation formula, we have

Hence the expexted value E[(1X2)g(X)]E[(1-X^{2})g^{\prime}(X)] exists. Since gg is continuous, it is bounded on $andsotheexpectedvalueand so the expected valueE[(\alpha+\beta+2)Xg(X)+(\alpha-\beta)g(X)]exists,too.Again,bythetransformationruleandsinceexists, too. Again, by the transformation rule and sincegandand\varrho_{\alpha,\beta}areabsolutelycontinuousonare absolutely continuous on$ we can use integration by parts and have

where hz:=1(,z]h_{z}:=1_{(-\infty,z]}. It will be shown in Proposition 3.2 below that gzKα,βg_{z}\in\mathcal{K}_{\alpha,\beta}, so that by hypothesis we have

Since we have fixed the parameters α\alpha and β\beta, henceforth we may and will suppress them as sub-indices at objects which might well depend on them (for example we will simply write pp for pα,βp_{\alpha,\beta} and so on). As we would like to use the theory from section 2 we have to make sure, that our Stein identity for the Beta distribution fits into this framework, i.e. that relation (15) is satisfied with η(x)=1x2\eta(x)=1-x^{2} and

where we have used, that E[Z]=βαα+β+2E[Z]=\frac{\beta-\alpha}{\alpha+\beta+2}. In principle, this is clear, because we have just established a Stein characterization for μ\mu and given the density pp and the function γ\gamma, the corresponding η\eta is, of course, unique. However, we give a formal proof.

holds for all x(1,1)x\in(-1,1). First note, that

Differentiating the left hand side of (36), we obtain

which is of course the derivative of the right hand side, too. Since

Condition 2.6 is also satisfied but need not be proved, because its most important conclusion, namely that η(1)=η(1)=0\eta(1)=\eta(-1)=0 is clear from the above discussion. To verify Conditions 2.19, 2.21 and 2.24, we must first define the functions FlF_{l} on (,1)(-\infty,-1) and FrF_{r} on (1,)(1,\infty). We claim, that the functions

These two functions are of course locally integrable and hence, Condition 2.19 is satisfied. Since

Conditions 2.21 and 2.24 also hold. Consequently, all results from section 2 are valid in particular for the case of Beta distributions.

Here, the values of ghg_{h} at ±1\pm 1 are arbitrary, but they are chosen such that ghg_{h} is continuous, whenever hh is continuous at ±1\pm 1. This follows immediately from Proposition 2.22.

The next result, which is also proved in the appendix, completes the proof of Proposition 3.1 by showing that the solution gzg_{z} is in the class Kα,β\mathcal{K}_{{}_{\alpha,\beta}} whenever z±1z\not=\pm 1.

Next, we will derive some results for the solutions ghg_{h} from corresponding results in Section 2.

Since γ(1)=2β+2\gamma(-1)=2\beta+2 and γ(1)=2α2\gamma(1)=-2\alpha-2, this immediately follows from Proposition 2.25. ∎

ghhα+β+2\lVert g_{h}\rVert_{\infty}\leq\frac{\lVert h^{\prime}\rVert_{\infty}}{\alpha+\beta+2}

There exists a constant K1K_{1}, only depending on α\alpha and β\beta such that ghK1h\lVert g_{h}^{\prime}\rVert_{\infty}\leq K_{1}\lVert h^{\prime}\rVert_{\infty}.

The following lemma, which is proved in the appendix, will be useful.

has a bounded solution ff on (1,1)(-1,1) if and only if E[u(Z)]=0E[u(Z)]=0.

and from Proposition 3.4 we see that h2h_{2} is Lipschitz with minimal Lipschitz constant

Hence, there is a constant K2K_{2} depending only on α\alpha and β\beta such that

for all twice differentiable functions hh with bounded first and second derivative. We have thus proved the following proposition.

Now we are in the position to provide a “plug-in theorem” for the Beta approximation using exchangeable pairs.

Let W,WW,W^{\prime} be identically distributed, real-valued random variables on a common probabilty space (Ω,A,P)(\Omega,\mathcal{A},P) satisfying the regression property

for some constant λ>0\lambda>0 and a random variable RR. Then for each twice differentiable function hh with bounded first and second derivative and with E\bigl{[}\lvert h(W)\rvert\bigr{]}<\infty we have the bound

where the constants K1K_{1} and K2K_{2} are from Propositions 3.4 and 3.6, respectively.

This immediately follows from Propositions 2.31, 3.4, 3.6 and since ghg_{h} is a solution to Stein’s equation (11). ∎

In the following we will transfer the developed theory to the Beta distributions νa,b\nu_{a,b} on $.WestartwiththeSteinidentityfor. We start with the Stein identity for\nu_{a,b},where, wherea,b>0arefixedparameters.Letare fixed parameters. LetX\sim\nu_{a,b},then, thenY:=2X-1\sim\mu_{b-1,a-1}andhenceforeachsmoothenoughfunctionand hence for each smooth enough functionf$ we have

So, a Stein identity for Xνa,bX\sim\nu_{a,b} is given by

If hh is bounded, then \lVert f_{h}\rVert_{\infty}\leq\lVert h-\nu_{a,b}(h)\rVert_{\infty}\max\Bigl{(}\frac{1}{2m(1-m)q_{a,b}(m)},\,\frac{1}{a},\,\frac{1}{b}\Bigr{)}, where mm is a median for νa,b\nu_{a,b}.

If hh is Lipschitz, then fh2a+bh\lVert f_{h}\rVert_{\infty}\leq\frac{2}{a+b}\lVert h^{\prime}\rVert_{\infty} and fhC1h\lVert f_{h}^{\prime}\rVert_{\infty}\leq C_{1}\lVert h^{\prime}\rVert_{\infty}, where C1C_{1} only depends on aa and bb.

If hh is twice differentiable with bounded first and second derivative, then \lVert f_{h}^{\prime\prime}\rVert_{\infty}\leq C_{2}\bigl{(}\lVert h^{\prime}\rVert_{\infty}+\lVert h^{\prime\prime}\rVert_{\infty}\bigr{)}, where C2C_{2} only depends on aa and bb.

Now, let V,VV,V^{\prime} be identically distributed, real-valued random variables on a common probability space (Ω,A,P)(\Omega,\mathcal{A},P). For the approximation of L(V)\mathcal{L}(V) by νa,b\nu_{a,b} the general regression property from Section 2 is

where, again, λ>0\lambda>0 is constant and RR is a hopefully small remainder term. For the distribution νa,b\nu_{a,b} Theorem 3.7 becomes the following:

where the constants C1C_{1} and C2C_{2} are those from Proposition 3.8.

The assertion is clear from Propositions 2.31 and 3.8 and since fhf_{h} is a solution to Stein’s equation (41). ∎

Application to the Polya urn model

In this section we prove a quantitative version of the fact that the relative number of drawn red balls in a Polya urn model converges in distribution to a suitable Beta distribution, if the number of total drawings tends to infinity. This model will serve as an application of our Stein method for the Beta distribution, as developed in section 3. We start by introducing the stochastic model:

It now follows, that for each k=0,,nk=0,\ldots,n we have

or, with a:=rca:=\frac{r}{c} and b:=wcb:=\frac{w}{c},

The distribution of SnS_{n} is usually referred to as the Polya distribution with parameters nn, aa and bb. It is a well-known fact that the distribution of 1nSn\frac{1}{n}S_{n} converges weakly to the distribution νa,b\nu_{a,b} as nn goes to infinity, where the Beta distribution νa,b\nu_{a,b} was defined in section 3. A convenient way to prove this weak convergence result is to use the formula

together with the weak law of large numbers for Bernoulli random variables to deal with the binomial probabilities b(k;n,p)=(nk)pk(1p)nkb(k;n,p)=\binom{n}{k}p^{k}(1-p)^{n-k}.

Formula (43) can be proved by a straight-forward computation using the relations B(a+1,b)=aa+bB(a,b)B(a+1,b)=\frac{a}{a+b}B(a,b) and B(a,b)=B(b,a)B(a,b)=B(b,a) for the Beta function, where a,b>0a,b>0, and can also be viewed as a consequence of a special instance of de Finetti’s representation theorem for infinite exchangeable sequences. Note, however, that one generally does not know the corresponding mixing measure from de Finetti’s theorem and hence, identity (43) is not a direct consequence of this theorem.

From now on, we will present a Stein’s method proof of the above distributional convergence result and, as usual, also derive a rate of convergence. We will usually suppress the time index nn and let V:=Vn:=1nSnV:=V_{n}:=\frac{1}{n}S_{n} denote the random variable of interest. For the construction of the exchangeable pair, we use the well-known Gibbs sampling procedure with the slight simplification, that due to exchangeability of X1,,XnX_{1},\ldots,X_{n} we need not choose at random the index of the summand from SnS_{n}, which has to be replaced. Instead, we will always replace XnX_{n} by XnX_{n}^{\prime}, which is constructed as follows:

Observe X1=x1,,Xn=xnX_{1}=x_{1},\ldots,X_{n}=x_{n} and construct XnX_{n}^{\prime} according to the distribution L(XnX1=x1,,Xn1=xn1)\mathcal{L}(X_{n}|X_{1}=x_{1},\ldots,X_{n-1}=x_{n-1}). Then, letting V:=Vn:=V1nXn+1nXnV^{\prime}:=V_{n}^{\prime}:=V-\frac{1}{n}X_{n}+\frac{1}{n}X_{n}^{\prime}, the pair (V,V)(V,V^{\prime}) is exchangeable. In order to use Stein’s method of exchangeable pairs, we need to establish a suitable regression property. This is the content of the following proposition.

The exchangeable pair (V,V)(V,V^{\prime}) satisfies the regression property

where \gamma_{a,b}(x)=(a+b)\bigl{(}\frac{a}{a+b}-x\bigr{)} and λ=λn=1n(a+b+n1)\lambda=\lambda_{n}=\frac{1}{n(a+b+n-1)}.

We have VV=XnnXnnV^{\prime}-V=\frac{X_{n}^{\prime}}{n}-\frac{X_{n}}{n} and by exchangeability of X1,,XnX_{1},\ldots,X_{n} it clearly holds that E[XnV]=E[XnSn]=1nSn=VE[X_{n}|V]=E[X_{n}|S_{n}]=\frac{1}{n}S_{n}=V. Also, by the definition of XnX_{n}^{\prime} and since XnX_{n}^{\prime} only assumes the values and 11 we have for any x1,,xn1{0,1}x_{1},\ldots,x_{n-1}\in\{0,1\}

Thus, since σ(V)σ(X1,,Xn)\sigma(V)\subseteq\sigma(X_{1},\ldots,X_{n}), we obtain

Next, we will compute the quantity E\bigl{[}(V^{\prime}-V)^{2}|V\bigr{]}.

We have for the above constructed exchangeable pair (V,V)(V,V^{\prime})

From the general theory of Gibbs sampling (see the author’s PhD thesis, to appear) it is known, that

Since Xn2=XnX_{n}^{2}=X_{n} we have from the proof of Proposition 4.1 that

where we have used E[XnV]=VE[X_{n}|V]=V again. Finally, we compute

Putting pieces together, we eventually obtain

The last assertion easily follows from this and from λ=1n(a+b+n1)\lambda=\frac{1}{n(a+b+n-1)}. ∎

Recall that for the distribution νa,b\nu_{a,b} we have η(x):=ηa,b(x)=x(1x)\eta(x):=\eta_{a,b}(x)=x(1-x) and hence, we obtain from Proposition 4.2 that

since V1\lvert V\rvert\leq 1. Similarly, since VV=1nXnXn1n\lvert V^{\prime}-V\rvert=\frac{1}{n}\lvert X_{n}^{\prime}-X_{n}\rvert\leq\frac{1}{n} we have

From Theorem 3.9 we can now conclude the following result.

with the constants C1C_{1} and C2C_{2} from Proposition 3.8.

Since VV assumes only values in $,thecondition, the conditionE\bigl{[}\lvert h(V)\rvert\bigr{]}<\infty$ from Theorem 3.9 is trivially met. The assertion now follows immediately from Theorem 3.9, (44) and (45). ∎

In [GR12] the authors use a different technique within Stein’s method for the Beta distributions, which compares the Stein characterization of the target distribution with that of the approximating discrete distribution, to prove that, in the special case c=1c=1, the convergence rate of order n1n^{-1} from Theorem 4.3 even holds in the Wasserstein distance and they compute an explicit constant in the bound. They also show that the rate of convergence is optimal. Using their technique and the bounds from Proposition 3.8 one can easily see that the rate of order n1n^{-1} in the Wasserstein distance also holds in the case c2c\geq 2. However, in order to obtain an explicit constant, some further work has to be done to bound the constant C1C_{1} from Proposition 3.8 in the case that one of the values a,ba,b is strictly smaller than one.

In [FG] the authors use the zero bias coupling within Stein’s method for normal approximation to prove bounds on the distance of a normalized version of the quantity VV to the standard normal distribution. In particular, they show that a CLT holds whenever the parameters nn, aa and bb tend to infinity in a suitable fashion.

Appendix

In this section we provide the proofs of some of the results from Sections 2 and 3 and state and prove some further auxiliary results, which are only used within proofs.

The following result justifies all our calculations, which invoke de l’Hôpital’s rule. Its proof is suppressed for reasons of space, but will be given in the author’s PhD thesis.

If a<a<b<ba<a^{\prime}<b^{\prime}<b, then both, ff and gg, are absolutely continuous on [a,b][a^{\prime},b^{\prime}].

If limxaf(x)=limxag(x)=0\lim_{x\searrow a}f(x)=\lim_{x\searrow a}g(x)=0, then g(x)0g(x)\not=0 for all x(a,b)x\in(a,b) and

The same conclusion holds if g(x)<0g^{\prime}(x)<0 for almost all x(a,b)x\in(a,b) and an analogous result is true for limxb\lim_{x\nearrow b}.

2. Proofs from Section 2

which exists in [0,)[0,\infty) by Condition 2.2. Here, we used the convention 1=0\frac{1}{\infty}=0.

again by Condition 2.2 and by Proposition 2.4. Furthermore, we have

for each x(a,b)x\in(a,b) since by the positivity of pp and because γ\gamma is strictly decreasing

Hence, MM is strictly increasing and thus for each x(a,m]x\in(a,m]:

The same bound can be proved for x(m,b)x\in(m,b) by using the representation

and the fact that also 1F(m)=121-F(m)=\frac{1}{2}.

The following two well-known lemmas will be needed for the proof of Proposition 2.14. Their proofs are included only for reasons of completeness.

Let a<b-\infty\leq a<b\leq\infty and let μ\mu be a probability measure (not necessarily absolutely continuous with respect to λ\lambda) with supp(μ)(a,b)\operatorname{supp}(\mu)\subseteq\overline{(a,b)}. Let FF be the distribution function corresponding to μ\mu and suppose that abxdμ(x)<\int_{a}^{b}\lvert x\rvert d\mu(x)<\infty. Then, for each x(a,b)x\in\overline{(a,b)} we have

axF(t)dt=xF(x)(a,x]sdμ(s)\int_{a}^{x}F(t)dt=xF(x)-\int_{\overline{(a,x]}}sd\mu(s)

xb(1F(t))dt=(x,)sdμ(s)x(1F(x))\int_{x}^{b}(1-F(t))dt=\int_{(x,\infty)}sd\mu(s)-x(1-F(x))

For each x(a,b)x\in\overline{(a,b)} we have (a,x](h(y)μ(h))dμ(y)=(1F(x))axF(s)h(s)dsF(x)xb(1F(s))h(s)ds\int_{(a,x]}(h(y)-\mu(h))d\mu(y)=-(1-F(x))\int_{a}^{x}F(s)h^{\prime}(s)ds-F(x)\int_{x}^{b}(1-F(s))h^{\prime}(s)ds.

Since μ\mu is a probability measure we have by the fundamental theorem of calculus for Lebesgue integration and by Fubini’s theorem

This proves (a). As to (b), we have using (a) and its proof

First, we prove (a). Recall the representation

By Lemmas 5.3 and 5.2 we thus obtain that

implying (a). Now, we turn to the proof of (b). By Stein’s equation (11) we obtain for x(a,b)x\in(a,b)

which reduces to the bound asserted in (b). ∎

The bound on gh(x)\lvert g_{h}(x)\rvertfor x(a,b)x\in(a,b) has already been proved in Proposition 2.10. Let x(,a)x\in(-\infty,a). Then we have by the negativity of qlq_{l} which follows from Proposition 2.23:

Next, we will state a lemma, which replaces Lemma 5.2 outside the support (a,b)\overline{(a,b)}.

For each x(,a)x\in(-\infty,a) we have xaQl(t)dt=xQl(x)+axtql(t)dt\int_{x}^{a}Q_{l}(t)dt=-xQ_{l}(x)+\int_{a}^{x}tq_{l}(t)dt and Il(x)<γ(x)Ql(x)I_{l}(x)<\gamma(x)Q_{l}(x).

For each x(b,)x\in(b,\infty) we have bxQr(t)dt=xQr(x)bxtqr(t)dt\int_{b}^{x}Q_{r}(t)dt=xQ_{r}(x)-\int_{b}^{x}tq_{r}(t)dt and Ir(x)<γ(x)Qr(x)I_{r}(x)<\gamma(x)Q_{r}(x).

which proves the first part of (a). The second claim of (a) follows from (a), the positivity of ql-q_{l} on (,a)(-\infty,a) and from the monotonicity of γ\gamma:

The proof of (b) is similar but easier, and is therefore omitted. ∎

The next lemma replaces Lemma 5.3 outside of the support of μ\mu.

For each x(,a)x\in(-\infty,a) we have h(x)-\mu(h)=-\int_{x}^{b}\bigl{(}1-F(s)\bigr{)}h^{\prime}(s)ds=-\int_{x}^{a}h^{\prime}(s)ds-\int_{a}^{b}\bigl{(}1-F(s)\bigr{)}h^{\prime}(s)ds and

For each x(b,)x\in(b,\infty) we have h(x)μ(h)=axF(s)h(s)dsh(x)-\mu(h)=\int_{a}^{x}F(s)h^{\prime}(s)ds and

We only prove (a) since the proof of (b) is very similar. The first claim follows from Lemma 5.3 (a) since F(s)=0F(s)=0 for s<as<a and F(s)=1F(s)=1 for xbx\geq b. The second claim follows from the first one and from Fubini’s theorem by

This is the first representation for gh(x)g_{h}(x) in the assertion. The second one follows, since 1F(s)=11-F(s)=1 for s<as<a and hence

We only prove (a) and (c), since the proofs of (b) and (d) are similar. To prove (a), we observe that by Lemma 5.5 we have

Since QlQ_{l} is decreasing (Ql=ql<0Q_{l}^{\prime}=q_{l}<0) and positive on (,a)(-\infty,a) this implies

By Lemma 5.2 and Lemma 5.4 (a) the right hand side equals

which is the claimed bound. Now, we turn to the proof of (c). By Stein’s equation (24) and Lemma 5.5 (a) we have for each x(,a)x\in(-\infty,a):

These together with Proposition 2.27 (a), (b) and Corollary 2.16 immediately imply (a). As to (b), by (49) we have

This and Proposition 2.27 (c) imply claim (b). Assertion (c) may be proved similarly. ∎

proving the desired representation of gzg_{z} inside the interval (a,b)(a,b). Now, for x(a,b)x\in(a,b) let M(x):=F(x)I(x)M(x):=\frac{F(x)}{I(x)} and N(x):=1F(x)I(x)N(x):=\frac{1-F(x)}{I(x)}. Then we have

since G(x)=(I(x)+(1F(x))γ(x))G(x)=(I(x)+(1-F(x))\gamma(x)) is positive by Proposition 2.14. Thus, MM is strictly increasing and NN is strictly decreasing on (a,b)(a,b). Since gz(x)=(1F(z))M(x)g_{z}(x)=(1-F(z))M(x) for x(a,z]x\in(a,z] and gz(x)=F(z)N(x)g_{z}(x)=F(z)N(x) for x(z,b)x\in(z,b), this implies, that supx(a,b)gz(x)=gz(z)=F(z)(1F(z)I(z)\sup_{x\in(a,b)}g_{z}(x)=g_{z}(z)=\frac{F(z)(1-F(z)}{I(z)}. It also implies the claimed representation of gz(x)g_{z}^{\prime}(x) for x(a,b){z}x\in(a,b)\setminus\{z\}. Furthermore, by de l’Hôpital’s rule, we have

Note, that these limits could also be derived from Proposition 2.22. Next, consider x(,a)x\in(-\infty,a). For such an xx we have

by Lemma 5.4. Thus, gzg_{z} is increasing on (,a)(-\infty,a) and hence, again by de l’Hôpital’s rule,

Since gz(x)=(1F(z))ddxQl(x)Il(x)g_{z}^{\prime}(x)=(1-F(z))\frac{d}{dx}\frac{Q_{l}(x)}{I_{l}(x)} we have also derived the desired formula for gz(x)g_{z}^{\prime}(x) for x(,a)x\in(-\infty,a). The calculations for x(b,)x\in(b,\infty) are completely analogous and therefore omitted. From our computations we can already infer, that gz=F(z)(1F(z)I(z)\lVert g_{z}\rVert_{\infty}=\frac{F(z)(1-F(z)}{I(z)}. So, it remains to show that this quantity is bounded in z(a,b)z\in(a,b). Since it is a continuous function of zz, we only have to show that it has finite limits on the edge of the interval (a,b)(a,b). But, of course,

and, similarly, limzbF(z)(1F(z)I(z)=limzbN(z)=1γ(b)\lim_{z\nearrow b}\frac{F(z)(1-F(z)}{I(z)}=\lim_{z\nearrow b}N(z)=-\frac{1}{\gamma(b)}. This concludes the proof. ∎

3. Proofs from Section 3

Using de l’Hôpital’s rule 5.1 and Lemma 5.6 (a) below, we obtain

Hence, limxgh(x)=0\lim_{x\to\infty}g_{h}(x)=0. Similarly, one can prove that limxgh(x)=0\lim_{x\to-\infty}g_{h}(x)=0. It remains to show that

To this end, it suffices to see that the function gz(x)(1x2)p(x)\lvert g_{z}^{\prime}(x)\rvert(1-x^{2})p(x) is bounded on (1,z)(-1,z) and on (z,1)(z,1). Since it is continuous on (1,z](-1,z] and on (z,1)(z,1) (where gz(z):=limxzgz(x)g_{z}^{\prime}(z):=\lim_{x\nearrow z}g_{z}^{\prime}(x) for definiteness), this claim will follow if we have proved that limx±1gz(x)(1x2)p(x)=0\lim_{x\to\pm 1}g_{z}^{\prime}(x)(1-x^{2})p(x)=0. For x(1,z)x\in(-1,z) we have

proving the claim for 1-1. Since it may be proved analogously for +1+1 the proof is complete. ∎

The following lemma will be useful for the proof of Proposition 3.4.

The functions pp, qlq_{l}, qrq_{r} and η\eta satisfy the following equations for each integer k1k\geq 1:

\frac{d}{dx}\bigl{(}\eta(x)^{k}p(x)\bigr{)}=\eta(x)^{k-1}p(x)\bigl{[}(k-1)\eta^{\prime}(x)+\gamma(x)\bigr{]} for each x(1,1)x\in(-1,1)

\frac{d}{dx}\bigl{(}\eta(x)^{k}q_{l}(x)\bigr{)}=\eta(x)^{k-1}q_{l}(x)\bigl{[}(k-1)\eta^{\prime}(x)+\gamma(x)\bigr{]} for each x(,1)x\in(-\infty,-1)

\frac{d}{dx}\bigl{(}\eta(x)^{k}q_{r}(x)\bigr{)}=\eta(x)^{k-1}q_{r}(x)\bigl{[}(k-1)\eta^{\prime}(x)+\gamma(x)\bigr{]} for each x(1,)x\in(1,\infty)

First we prove (a). By (14) we have, multiplying by p(x)p(x),

proving (a). As to (b), we observe that by the definition of ql=expFlηq_{l}=\frac{\exp\circ F_{l}}{\eta} we have on the one hand

and on the other hand, by the product rule,

Now the proof follows the lines of proof for pp as above. Similarly one may prove (c). ∎

Assertion (a) immediately follows from Corollary 2.28 (a) since in this case c=α+β+2c=\alpha+\beta+2. Now, we turn to the proof of (b). First, consider x(1,1)x\in(-1,1). By Corollary 2.16 (b) we have for x(1,1)x\in(-1,1):

Then SS is continuous on (1,1)(-1,1) and hence, to show that SS is bounded on (1,1)(-1,1), it suffices to prove that SS has finite limits at ±1\pm 1. Since limx1G(x)=γ(1)=2β+2\lim_{x\searrow-1}G(x)=\gamma(-1)=2\beta+2 we obtain, using Lemma 5.6 and de l’Hôpital’s rule:

Here we have used, that H(x)=γ(x)F(x)=(α+β+2)F(x)H^{\prime}(x)=-\gamma^{\prime}(x)F(x)=(\alpha+\beta+2)F(x). Similarly one shows that

Thus, we have shown that supx(1,1)S(x)<\sup_{x\in(-1,1)}S(x)<\infty. Now, we consider x(,1)x\in(-\infty,-1). From Corollary 2.28 (b) we have

For x(,1)x\in(-\infty,-1) we consider the function

Clearly, SlS_{l} is a continuous function on (,1)(-\infty,-1). To show that it is bounded, it thus suffices to prove that limx1Sl(x)<\lim_{x\nearrow-1}S_{l}(x)<\infty and limxSl(x)<\lim_{x\to-\infty}S_{l}(x)<\infty. Using Lemma 5.6 and de l’Hôpital’s rule, we obtain

Next, we will show that limxSl(x)=0\lim_{x\to-\infty}S_{l}(x)=0. We actually have

Here we have used that η(x)ql(x)=(1x)α+1(1x)β+1\eta(x)q_{l}(x)=-(1-x)^{\alpha+1}(-1-x)^{\beta+1}\rightarrow-\infty as xx\to-\infty. Hence, limxSl(x)=0\lim_{x\to-\infty}S_{l}(x)=0 and supx(,1)Sl(x)<\sup_{x\in(-\infty,-1)}S_{l}(x)<\infty. Since we can show in a similar manner that supx(1,)Sr(x)<\sup_{x\in(1,\infty)}S_{r}(x)<\infty, where SrS_{r} is defined in the obvious way, the proof is complete. ∎

If E[u(Z)]=0E[u(Z)]=0, then the usual Stein solution is bounded on (1,1)(-1,1) by Proposition 3.3. For the converse, let us assume that E[u(Z)]0E[u(Z)]\not=0. As was already noted in Section 2, the solutions of the homogeneous equation corresponding to (40) are exactly the multiples of 1ηp\frac{1}{\eta p}. Thus, every solution ff of (40) has the form

since E[u(Z)]0E[u(Z)]\not=0. Hence, ff is unbounded near 11. If c0c\not=0 we have

So, ff is unbounded near 1-1. Hence, in any case ff is unbounded on (1,1)(-1,1).

References