Stein's method of exchangeable pairs for absolutely continuous, univariate distributions with applications to the Polya urn model

Christian Döbler

Introduction

and that, if $f_{h}$ is bounded and $\frac{1}{p}$ is unbounded on $(a,b)$ , then $f_{h}$ is the only bounded solution on $(a,b)$ . For general properties of the solutions $f_{h}$ see [CS11] or [CGS11]. Note that for general Borel-measurable $h$ it cannot be expected that there exists a solution $f$ which is differentiable on all of $(a,b)$ and satisfies (2) pointwise. Thus, a solution is understood to be an almost everywhere differentiable and Borel-measurable function which satisfies (2) at all points $x\in(a,b)$ where it is in fact differentiable and contrary to the usual convention, at the remaining points $x\in(a,b)$ one defines $f^{\prime}(x):=-\psi(x)+h(x)-\mu(h)$ . This yields a Borel-measurable function $f^{\prime}$ on $(a,b)$ such that (2) holds for each $x\in(a,b)$ . In order to understand the exchangeable pairs technique in the framework of the density approach it might be helpful to recall the exchangeable pairs method in the situation of normal approximation. This method, which was first presented in Stein’s monograph [Ste86], is a cornerstone of Stein’s method of normal approximation and is still the most frequently used coupling. This is due to the wide applicability of standard couplings like the Gibbs sampler or making one time step in a reversible Markov chain, which generally yield exchangeable pairs. By definition, an exchangeable pair is a pair $(W,W^{\prime})$ of random variables, defined on a common probability space, such that their joint distribution is symmetric, i.e. such that $(W,W^{\prime})\stackrel{{\scriptstyle\mathcal{D}}}{{=}}(W^{\prime},W)$ . In [Ste86], in order to show that a given real-valued random variable $W$ is approximately standard normally distributed, Stein proposes the construction of another random variable $W^{\prime}$ , a small random perturbation of $W$ , on the same space as $W$ such that $(W,W^{\prime})$ forms an exchangeable pair and additionally the following linear regression property holds:

$$E[W^{\prime}-W|W]=-\lambda W.\qquad(4)$$

Here, $\lambda\in(0,1)$ is a constant which is typically close to zero for conveniently chosen $W^{\prime}$ . If this condition is satisfied, then the distributional distance of $\mathcal{L}(W)$ to $N(0,1)$ can be efficiently bounded in various metrics, including the Kolmogorov and Wasserstein metrics (see, e.g. [Ste86], [CS05] or [CGS11] for the common “plug-in theorems”). The range of examples to which this technique could be applied was considerably extended by the work [RR97] of Rinott and Rotar who proved normal approximation theorems allowing the linear regression property to be satisfied only approximately. Specifically, they assumed the existence of a random quantity $R$ , which is dominated by $\lambda W$ in size, such that

$$E[W^{\prime}-W|W]=-\lambda W+R.\qquad(5)$$

Note that $R$ is necessarily $\sigma(W)$ -measurable and that, unlike condition (4), condition (5) is not a true condition on the pair $(W,W^{\prime})$ , since we can always define $R:=E[W^{\prime}-W|W]+\lambda W$ for each given constant $\lambda>0$ . However, the “plug-in theorems” in [RR97], [SS06] or [CGS11] clarify that $R$ has to be of smaller order than $\lambda$ in order to yield useful bounds. Since $W$ is supposed to have a “true” distributional limit, it follows that both $\lambda$ and $R$ are at least asymptotically unique (see also the introduction of [RR09] for a discussion of this topic). When dealing with our possibly non-normal distribution $\mu$ , the question is what condition to substitute for the linear regression property (4) or (5). This question was successfully answered independently by Eichelsbacher and Löwe in [EL10] and by Chatterjee and Shao in [CS11]. They pointed out that in this more general setting the appropriate regression property is

$$E[W^{\prime}-W|W]=\lambda\psi(W)+R,\qquad(6)$$

where, again, $\lambda>0$ is a constant and $R$ is of smaller order than $\lambda\psi(W)$ . In order to give a flavour of the resulting “plug-in theorems”, we present parts of Theorem 2.4 from [EL10].

The third term of the bound (1.1) reveals that, in fact, $R$ must be of smaller order than $\lambda$ in order for the bound to be useful, and from the second term we conclude that $\lambda$ should be such that $E\left[(W^{\prime}-W)^{2}\right]\approx 2\lambda$ . The first term appearing in the bound on the right hand side of (1.1) is usually interpreted as saying that the random variable $E\bigl{[}(W-W^{\prime})^{2}|W\bigr{]}/2\lambda$ must “obey a law of large numbers” to obtain decreasing bounds. Bounding this term is often decisive for the success of applying Theorem 1.1.
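
To illustrate the regression property and the relation $E\left[(W^{\prime}-W)^{2}\right]\approx 2\lambda$ in the classical normal case, the following Python sketch treats the hypothetical example $W=n^{-1/2}\sum_{i}X_{i}$ with independent symmetric signs $X_{i}\in\{-1,1\}$ , where $W^{\prime}$ is obtained by replacing a uniformly chosen summand by an independent copy. This example and all names in the code are our own assumptions and are not taken from the text. The conditional moments are computed exactly given the whole vector of signs; since they turn out to depend on the signs only through $W$ , they coincide with the moments conditional on $W$ . One should find $E[W^{\prime}-W|W]=-\lambda W$ and $E[(W^{\prime}-W)^{2}|W]=2\lambda$ with $\lambda=\frac{1}{n}$ .

```python
import math
from itertools import product


def conditional_moments(x):
    """Exact first and second moment of W' - W given the sign vector x."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    first, second = 0.0, 0.0
    for i in range(n):          # replaced index, uniform on {0, ..., n-1}
        for new in (-1, 1):     # fresh sign, uniform on {-1, +1}
            diff = (new - x[i]) * scale
            first += diff / (2 * n)
            second += diff ** 2 / (2 * n)
    return first, second


n = 6                 # illustrative choice, small enough to enumerate all 2^n sign vectors
lam = 1.0 / n
max_regression_error = 0.0
max_second_moment_error = 0.0
for x in product((-1, 1), repeat=n):
    w = sum(x) / math.sqrt(n)
    first, second = conditional_moments(x)
    # Deviation from E[W' - W | W] = -lam * W
    max_regression_error = max(max_regression_error, abs(first + lam * w))
    # Deviation from E[(W' - W)^2 | W] = 2 * lam
    max_second_moment_error = max(max_second_moment_error, abs(second - 2 * lam))

print(max_regression_error, max_second_moment_error)
```

In this toy example both deviations vanish up to floating point error, so the remainder $R$ is zero and the “law of large numbers” term is degenerate; in less symmetric examples neither is to be expected.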

Having discussed the method of exchangeable pairs within the density approach, we now address the problem that, in some examples, condition (6) with negligible remainder term $R$ is not satisfied by an exchangeable pair which nevertheless appears natural for the approximation problem at hand. For example, in many situations where the exchangeable pair $(W,W^{\prime})$ is constructed via the Gibbs sampler, we have a regression property of the form

where $\lambda,c>0$ are constants (the reason why $\lambda$ and $c$ are not subsumed into a single constant will become clear later on) and where, again, $R$ is a negligible remainder. Here, again, $Z\sim\mu$ . Following the theory of the paper [CS11], condition (8) suggests approximating $W$ with a normal distribution with mean $E[Z]$ and variance $\frac{1}{c}$ . But there are situations where the exchangeable pair $(W,W^{\prime})$ is good, meaning that the difference $|W^{\prime}-W|$ is “small”, where condition (8) is satisfied and where we know that $W$ is approximately distributed as a non-normal random variable $Z\sim\mu$ , so that the normal approximation is inappropriate. In general, this either means that the law of large numbers discussed above cannot hold or that the resulting error term $R$ in (6) is not negligible. These observations motivate a new version of Stein’s method that allows for a more general regression property.

Suppose that an appropriately chosen exchangeable pair $(W,W^{\prime})$ satisfies the following general regression property:

$$E[W^{\prime}-W|W]=\lambda\gamma(W)+R,\qquad(9)$$

where $\lambda>0$ is constant, $\gamma$ is a measurable function whose domain contains $\overline{(a,b)}$ and which will be discussed further below, and $R$ is a negligible remainder term. We will see that it is advantageous if the term $\gamma(x)\cdot g(x)$ appears in the “new” Stein equation. So we make the following ansatz for the Stein identity:

where $\eta$ is another function which still has to be found.

Starting from the Stein identity (10), our aim is to identify the function $\eta$ . If this approach is successful, the Stein equation corresponding to a measurable function $h$ will be

For this identity to hold, irrespective of the test function $h$ , it must be the case that $\alpha(x)=\frac{1}{\eta(x)}$ (particularly $\eta$ must be differentiable at least almost everywhere) and hence

This is a first order linear differential equation, which can of course be solved explicitly by the method of variation of the constant. It turns out that the right solution is given by

at least if $\int_{a}^{b}|\gamma(t)|p(t)dt=E[|\gamma(Z)|]<\infty$ . But this is a very natural condition, since the function $\gamma$ was motivated by the regression property (9): if the random variables $W$ and $W^{\prime}$ are integrable, then both sides of (9) must be $P$ -integrable and, in fact,

Neglecting the remainder term $R$ we thus see that $E[\gamma(W)]$ should exist and, in fact, be close to zero. So since $W\stackrel{{\scriptstyle\mathcal{D}}}{{\approx}}Z$ we find it reasonable that $E[\gamma(Z)]$ exists and even equals zero. Furthermore, it is a matter of routine to check that $\eta$ as given in (15) indeed still satisfies (14). The above calculations starting with (10) were rather formal but crucial for the motivation and understanding of our approach.

The paper is organized in the following way. Rigorous results and the abstract theory for general $\mu$ are presented in Section 2. These results are then further specialized to the Beta distributions in Section 3. In Section 4 the theory combined with a suitable exchangeable pairs coupling is used to prove a rate of convergence of order $\frac{1}{n}$ in a Polya urn model (see Theorem 4.3). In Section 5 some rather lengthy or technical proofs can be found and a sufficiently general version of de l’Hôpital’s rule for merely absolutely continuous functions is provided. This result justifies all the invocations of this famous rule in the present work.

Acknowledgements

A few days after this work was on the arXiv, G. Reinert and L. Goldstein posted a preprint (see [GR12]), which also develops Stein’s method for the Beta distributions and uses a comparison technique to prove error bounds of order $n^{-1}$ in the Wasserstein distance for the more special Polya urn model, where the drawn ball is returned to the urn together with only one extra ball of the same colour.

The general theory

The density $p$ is positive on the interval $(a,b)$ and absolutely continuous on every compact interval $[c,d]\subseteq(a,b)$ .

$\gamma$ is continuous on $\overline{(a,b)}$

$\gamma$ is strictly decreasing on $\overline{(a,b)}$

$\int_{a}^{b}|\gamma(t)|p(t)dt<\infty$ and in fact $E[\gamma(Z)]=\int_{a}^{b}\gamma(t)p(t)dt=0$

There is a unique $x_{0}\in\overline{(a,b)}$ with $\gamma(x_{0})=0$ .

Note that, in Condition 2.2, item (iv) is actually implied by (i), (ii) and (iii) together with the intermediate value theorem. Furthermore, (iv) implies that $\gamma$ is positive on $(a,x_{0})$ and negative on $(x_{0},b)$ .

Under Conditions 2.1 and 2.2 the function $I$ has the following properties:

$I$ is strictly increasing on $(a,x_{0})$ and strictly decreasing on $(x_{0},b)$ and hence attains its global maximum at $x_{0}$ .

Of course, (a) is immediately implied by item (iii) from Condition 2.2. To prove (b) and (c), first observe that by (iii) we have $\lim_{x\searrow a}I(x)=0=\lim_{x\nearrow b}I(x)$ . Furthermore, $I^{\prime}(x)=\gamma(x)p(x)$ is positive on $(a,x_{0})$ and negative on $(x_{0},b)$ , implying the results. ∎

Under Conditions 2.1 and 2.2 the function $\eta$ has the following properties:

$\eta$ is positive on $(a,b)$ , absolutely continuous on every compact subinterval $[c,d]\subseteq(a,b)$ and $\eta^{\prime}(x)=\gamma(x)-\psi(x)\eta(x)$ for $\lambda$ -almost all $x\in(a,b)$ .

$\lim_{x\searrow a}\eta(x)p(x)=\lim_{x\nearrow b}\eta(x)p(x)=0$

If $\lim_{x\searrow a}p(x)=0$ , then $\lim_{x\searrow a}\eta(x)=\frac{\gamma(a)}{\lim_{x\searrow a}\psi(x)}$ , if this limit exists.

If $\liminf_{x\searrow a}p(x)\in(0,\infty)\cup\{\infty\}$ then $\lim_{x\searrow a}\eta(x)=0$

If $\lim_{x\nearrow b}p(x)=0$ , then $\lim_{x\nearrow b}\eta(x)=\frac{\gamma(b)}{\lim_{x\nearrow b}\psi(x)}$ , if this limit exists.

If $\liminf_{x\nearrow b}p(x)\in(0,\infty)\cup\{\infty\}$ then $\lim_{x\nearrow b}\eta(x)=0$

The first part of (a) follows from the fact that $I$ is positive on $(a,b)$ and $C^{1}$ on $\overline{(a,b)}$ , hence absolutely continuous on $[c,d]$ , and that $p$ is also absolutely continuous and bounded below by a positive constant on $[c,d]$ . The rest of (a) has already been observed. Items (b), (d) and (f) follow immediately from the properties of the function $I$ in Proposition 2.4. To prove (c), we use de l’Hôpital’s rule (see Theorem 5.1) to derive

The following “Mills ratio” condition on the density $p$ and the corresponding distribution function $F$ is often satisfied and will yield $\lim_{x\searrow a}\eta(x)=\lim_{x\nearrow b}\eta(x)=0$ .

The density $p$ of $\mu$ satisfies all the properties from Condition 2.1 and also the following:

If $a>-\infty$ , then $\lim_{x\searrow a}\frac{F(x)}{p(x)}=0$ .

If $b<\infty$ , then $\lim_{x\nearrow b}\frac{1-F(x)}{p(x)}=0$ .

Condition 2.6 is always satisfied if the density $p$ is bounded away from zero in suitable neighbourhoods of $a$ and $b$ .

Assume that both $a>-\infty$ and $b<\infty$ , and that $\lim_{x\searrow a}p(x)=\lim_{x\nearrow b}p(x)=0$ . Then Condition 2.6 is satisfied if there is a $\delta>0$ such that $p$ is increasing on $(a,a+\delta)$ and decreasing on $(b-\delta,b)$ . This is easily seen from the inequality

$$\frac{F(x)}{p(x)}=\frac{1}{p(x)}\int_{a}^{x}p(t)dt\leq x-a,$$

valid for $x\in(a,a+\delta)$ and a similar one for the right end point $b$ .

Suppose that $a>-\infty$ and $b<\infty$ , that $\lim_{x\searrow a}p(x)=\lim_{x\nearrow b}p(x)=0$ and that there is a $\delta>0$ such that $p$ is convex on $(a,a+\delta)$ and on $(b-\delta,b)$ . Then the assumptions of (b), and hence Condition 2.6, are satisfied. In fact, first we can extend $p$ to a continuous and convex function on $[a,b)$ by setting $p(a):=0$ . Now, let $a<x<y<a+\delta$ . Then there exists a $\lambda\in(0,1)$ with $x=\lambda a+(1-\lambda)y$ and by convexity we have:

Thus, $p$ is strictly increasing on $(a,a+\delta)$ . Similarly, one shows that $p$ is strictly decreasing on $(b-\delta,b)$ if $p$ is convex there.
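
The monotonicity criterion in (b) can be checked on a concrete density. The following Python sketch uses the illustrative choice $p(t)=\frac{3}{4}(1-t^{2})$ on $(-1,1)$ , which is our own assumption and does not appear in the text; it vanishes at both end points, is increasing on $(-1,0)$ and decreasing on $(0,1)$ . The code verifies the elementary estimate $F(x)\leq(x-a)p(x)$ near $a=-1$ , which is what monotonicity of $p$ gives, together with its mirror image near $b=1$ .

```python
def p(t):
    """Density 3/4 * (1 - t^2) on (-1, 1); increasing on (-1, 0), decreasing on (0, 1)."""
    return 0.75 * (1.0 - t * t)


def F(x):
    """Distribution function of p in closed form."""
    return 0.5 + 0.75 * (x - x ** 3 / 3.0)


left_ok, right_ok = True, True
for k in range(1, 1000):
    eps = k / 1000.0                                   # distance to the end point
    left_ratio = F(-1.0 + eps) / p(-1.0 + eps)         # F / p near a = -1
    right_ratio = (1.0 - F(1.0 - eps)) / p(1.0 - eps)  # (1 - F) / p near b = 1
    left_ok = left_ok and left_ratio <= eps + 1e-9
    right_ok = right_ok and right_ratio <= eps + 1e-9

smallest_left_ratio = F(-0.999) / p(-0.999)
print(left_ok, right_ok, smallest_left_ratio)
```

Both ratios are dominated by the distance to the respective end point and hence tend to zero, as required in Condition 2.6.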

The following proposition provides the announced result.

Assume Condition 2.6. Then the function $\eta$ vanishes at the finite end points of the support $\overline{(a,b)}$ of $\mu$ , i.e. if $a>-\infty$ , then $\lim_{x\searrow a}\eta(x)=0$ and if $b<\infty$ , then $\lim_{x\nearrow b}\eta(x)=0$ . Hence, we may extend $\eta$ to a continuous function on $\overline{(a,b)}$ by letting $\eta(a):=\eta(b):=0$ .

Suppose that $a>-\infty$ . Then, by the positivity of $I$ and the monotonicity of $\gamma$ , for $a<x<b$ :

so that $\lim_{x\searrow a}\eta(x)=0$ . The proof of $\lim_{x\nearrow b}\eta(x)=0$ for finite $b$ is similar by using the representation $I(x)=-\int_{x}^{b}\gamma(t)p(t)dt$ and is therefore omitted. ∎

solves the Stein equation (11) for $x\in(a,b)$ . This can also be proved directly by differentiation and the formula for $g_{h}$ could also be derived by the method of variation of the constant using the fact that $\log(p\cdot\eta)$ is a primitive function of $\frac{\gamma}{\eta}$ , which follows from (14). If we can show that $g_{h}$ is bounded, then it will immediately follow from Proposition 2.5 (a) that $g_{h}$ is the only bounded solution of (11), since the solutions of the corresponding homogeneous equation are constant multiples of $\frac{1}{p\cdot\eta}$ . Since we do not exclude approximating random variables which take on the values $a$ or $b$ , we show that the solution $g_{h}$ can be extended continuously to $a$ and $b$ , if $h$ is continuous there. By the properties of the function $I=p\eta$ on $(a,b)$ , from Proposition 2.5, the continuity of $\gamma$ and by de l’Hôpital’s rule (see Theorem 5.1) we have

As is typical for Stein’s method, its success within the applications considerably depends on good bounds on the solutions $g_{h}$ and their derivative(s), generally uniformly over some given class $\mathcal{H}$ of test functions $h$ . The next step will be to prove such bounds. It has to be mentioned that we cannot expect to derive concrete good bounds in full generality, but that sometimes further conditions have to be imposed either on the distribution $\mu$ (e.g. through the density $p$ ) or on the coefficient $\gamma$ . Nevertheless, we will derive bounds involving functional expressions which can a posteriori be simplified, computed or further bounded for concrete distributions. So our abstract viewpoint will pay off. Moreover, some of our bounds will actually hold in complete generality.

The next Proposition contains a bound for the solutions $g_{h}$ for bounded and Borel-measurable test functions $h$ .

The proof is deferred to the appendix. The following corollary specializes this result to the case that $\gamma(x)=-c(x-E[Z])$ and that $\mu$ is symmetric with respect to its median, which is then equal to its expected value $E[Z]$ , that is $Z-m\stackrel{{\scriptstyle\mathcal{D}}}{{=}}m-Z$ .

In this case we clearly have $I(m)=\frac{c}{2}E[\lvert Z-m\rvert]$ which implies the result by Proposition 2.10. ∎

In the case that $\mu=N(0,1)$ and $c=1$ this result specializes to the well known bound $\sqrt{\frac{\pi}{2}}\lVert h-\mu(h)\rVert_{\infty}$ (see [CGS11] or [CS05], e.g.).
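
The constant $\sqrt{\frac{\pi}{2}}$ can be reproduced numerically. The Python sketch below assumes that the constant in the corollary is $\frac{1}{2I(m)}=\frac{1}{cE[\lvert Z-m\rvert]}$ , which is our reading of the identity $I(m)=\frac{c}{2}E[\lvert Z-m\rvert]$ and of the stated normal special case, and that $I(x)=\int_{a}^{x}\gamma(t)p(t)dt$ . The helper `simpson` and all variable names are ours.

```python
import math


def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


c = 1.0                                   # gamma(x) = -c * x for N(0, 1) with median m = 0
# I(0) = integral of gamma * p over (-infinity, 0); the tail below -12 is negligible.
I_at_median = simpson(lambda t: -c * t * phi(t), -12.0, 0.0)
# E|Z| = 2 * integral of t * phi(t) over (0, infinity)
mean_abs = 2.0 * simpson(lambda t: t * phi(t), 0.0, 12.0)
constant = 1.0 / (2.0 * I_at_median)      # assumed form of the constant, see lead-in

print(I_at_median, 0.5 * c * mean_abs, constant, math.sqrt(math.pi / 2.0))
```

The first two printed numbers agree, confirming $I(m)=\frac{c}{2}E[\lvert Z-m\rvert]$ in this case, and the last two agree with each other as well.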

In the formulation of Proposition 2.10 it might be surprising that no bound is given for $\lVert g_{h}^{\prime}\rVert_{\infty}$ . This is because, in general, a bound of the form $\lVert g_{h}^{\prime}\rVert_{\infty}\leq C\lVert h\rVert_{\infty}$ does not exist with a finite constant $C$ in this setup. Note that this is in contrast to the density approach, where one generally has such a bound (see [CS11] or [CGS11]).

Next, we will turn to Lipschitz continuous test functions $h$ . In contrast to bounded measurable test functions, there we will also be able to prove useful bounds for $\lVert g_{h}^{\prime}\rVert_{\infty}$ .

In order to obtain bounds for Lipschitz continuous test functions we need a further condition on the distribution $\mu$ which guarantees that its expected value exists.

The density $p$ is positive on the interval $(a,b)$ and absolutely continuous on every compact interval $[c,d]\subseteq(a,b)$ . Furthermore, $E[|Z|]=\int_{\overline{(a,b)}}\lvert x\rvert p(x)dx<\infty$ .

The following proposition, which is also proved in the appendix, includes bounds for both $g_{h}$ and $g_{h}^{\prime}$ when $h$ is Lipschitz.

$\lvert g_{h}(x)\rvert\leq\lVert h^{\prime}\rVert_{\infty}\frac{F(x)E[Z]-\int_{a}^{x}yp(y)dy}{I(x)}$

$\lvert g_{h}^{\prime}(x)\rvert\leq\lVert h^{\prime}\rVert_{\infty}\frac{\int_{a}^{x}F(s)dsG(x)+\int_{x}^{b}(1-F(s))dsH(x)}{p(x)\eta(x)^{2}}$

Here, for $x\in\overline{(a,b)}$ the positive functions $H(x)$ and $G(x)$ are defined by

In general, the term $S(x):=\frac{F(x)E[Z]-\int_{a}^{x}yp(y)dy}{I(x)}$ cannot be bounded uniformly in $x\in(a,b)$ unless $\lvert\gamma\rvert$ grows at least linearly in $x$ .

$\lVert g_{h}\rVert_{\infty}\leq\frac{\lVert h^{\prime}\rVert_{\infty}}{c}$

$\lvert g_{h}^{\prime}(x)\rvert\leq\frac{2\lVert h^{\prime}\rVert_{\infty}}{c}\frac{H(x)G(x)}{I(x)\eta(x)}=2\lVert h^{\prime}\rVert_{\infty}\frac{\int_{a}^{x}F(s)ds\int_{x}^{b}(1-F(t))dt}{\eta(x)\bigl{(}E[Z]F(x)-\int_{a}^{x}yp(y)dy\bigr{)}}$

Claim (a) follows from Proposition 2.14 (a) and the observation that in this case we have

Part (b) follows from Proposition 2.14 (b) and Lemma 5.2 by observing that in this case

and, similarly, $G(x)=c\int_{x}^{b}(1-F(s))ds$ . ∎

It is quite remarkable that in the case of normal approximation (via its classical Stein equation) the bound given in Corollary 2.16 (a) even improves on the best bound $2\lVert h^{\prime}\rVert_{\infty}$ currently mentioned in the literature (see, e.g. [CGS11] or [CS05]). In fact, in this case $c=1$ and thus our bound reduces to $\lVert h^{\prime}\rVert_{\infty}$ .
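
The bound $\lVert g_{h}\rVert_{\infty}\leq\frac{\lVert h^{\prime}\rVert_{\infty}}{c}$ cannot be improved in general, as the test function $h(x)=x$ shows. The Python sketch below assumes that the Stein equation (11) reads $\eta(x)g^{\prime}(x)+\gamma(x)g(x)=h(x)-\mu(h)$ , which is the form compatible with the homogeneous solutions $\frac{1}{p\cdot\eta}$ mentioned above, and checks that the constant function $g\equiv-\frac{1}{c}$ solves it when $\gamma(x)=-c(x-E[Z])$ . The function names and the parameter values for the Beta case (with $\eta(x)=1-x^{2}$ as in Section 3) are illustrative assumptions.

```python
def max_residual(eta, gamma, mean, c, grid):
    """Largest |eta*g' + gamma*g - (h - mu(h))| for h(x) = x and g = -1/c."""
    worst = 0.0
    for x in grid:
        g, g_prime = -1.0 / c, 0.0    # constant candidate solution
        lhs = eta(x) * g_prime + gamma(x) * g
        rhs = x - mean                # h(x) - mu(h) with h(x) = x
        worst = max(worst, abs(lhs - rhs))
    return worst


# Standard normal case: eta = 1, gamma(x) = -x, c = 1, E[Z] = 0.
normal_grid = [-5.0 + 0.01 * k for k in range(1001)]
normal_residual = max_residual(lambda x: 1.0, lambda x: -x, 0.0, 1.0, normal_grid)

# Beta case on (-1, 1) with illustrative parameters alpha = 1, beta = 2.
alpha, beta = 1.0, 2.0
c = alpha + beta + 2.0
mean = (beta - alpha) / c
beta_grid = [-1.0 + 0.002 * k for k in range(1001)]
beta_residual = max_residual(
    lambda x: 1.0 - x * x, lambda x: -c * (x - mean), mean, c, beta_grid
)

print(normal_residual, beta_residual)
```

Since $\lVert h^{\prime}\rVert_{\infty}=1$ and $\lVert g\rVert_{\infty}=\frac{1}{c}$ here, the bound is attained under the stated assumption on the form of (11).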

For concrete distributions the ratio appearing in the bound for $g_{h}^{\prime}(x)$ may be bounded uniformly in $x$ by some constant which can sometimes also be computed explicitly. Nevertheless, in [EV12] the authors give mild conditions for the existence of a finite constant $k$ such that $\lVert g_{h}^{\prime}\rVert_{\infty}\leq k\lVert h^{\prime}\rVert_{\infty}$ for any Lipschitz-continuous $h$ . In practice, these conditions are usually met. However, there is no hope of estimating the constant $k$ by their method of proof. Thus, for concrete distributions and explicit constants it might be useful to work with our bound from Corollary 2.16 (b).

Next, we will discuss how the density $p$ of $\mu$ can be expressed in terms of $\gamma$ and $\eta$ . This will be useful to bound the second derivative of $g_{h}$ in some special cases. Let $x_{0}$ be as in Condition 2.2. Since $\eta^{\prime}=\gamma-\eta\psi$ and hence $\psi=\frac{\gamma-\eta^{\prime}}{\eta}$ , we have

Formula (20) is a more general version of formula (3.14) in [NV09] and is also derived in [KT12]. Now, differentiating Stein’s equation (11), we obtain for $h$ Lipschitz

We already know how to solve (24) for $x\in(a,b)$ . So now we will assume that at least one of $a$ and $b$ is finite and try to solve the equation outside $(a,b)$ . Furthermore, we will discuss conditions that ensure that the composed solution $g_{h}$ behaves nicely at the edges $a$ and/or $b$ . We will henceforth assume Condition 2.18. For $x\not=a,b$ equation (24) is clearly equivalent to

Let us assume that both $a>-\infty$ and $b<\infty$ (the other cases are of course included) and let $F_{l}$ be any primitive function of $\frac{\gamma}{\eta}$ on $(-\infty,a)$ . Such a function exists by continuity and is hence continuously differentiable. By the method of variation of the constant one may derive the following formula for $x\in(-\infty,a)$ :

if this integral exists. Note that this property does not depend on the particular choice of the primitive function $F_{l}$ . For a fixed primitive function $F_{l}$ of $\frac{\gamma}{\eta}$ on $(-\infty,a)$ we define the function

Analogously, for a given primitive function $F_{r}$ of $\frac{\gamma}{\eta}$ on $(b,\infty)$ we define the function

Note that inside the interval $(a,b)$ we have that $\log(\eta p)$ is a primitive of $\frac{\gamma}{\eta}$ and hence $q_{l}$ plays the role of $p$ on $(-\infty,a)$ (and similarly for $q_{r}$ ). As we have observed, we will need the following Condition:

Similarly, for $x\in(b,\infty)$ we arrive at the definition

Note that the definition of the solution $g_{h}$ does not depend on the choice of the primitive functions $F_{l}$ and $F_{r}$ since two such functions may only differ by an additive constant.

Next, we prove that the above constructed solution $g_{h}$ is continuous as long as $h$ is continuous at $a$ and $b$ . To deal with the limits $\lim_{x\nearrow a}g_{h}(x)$ and $\lim_{x\searrow b}g_{h}(x)$ we first formulate a condition which will usually be satisfied in practice.

The functions $F_{l}$ and $F_{r}$ satisfy $\lim_{x\nearrow a}F_{l}(x)=\pm\infty$ and $\lim_{x\searrow b}F_{r}(x)=\pm\infty$ .

Again, the validity of this condition does not depend on the choice of the functions $F_{l}$ and $F_{r}$ . By Condition 2.21 we may again apply de l’Hôpital’s rule to compute

Next, we want to present bounds on $g_{h}(x)$ and $g_{h}^{\prime}(x)$ for $x\notin\overline{(a,b)}$ . But first we will show that our conditions already imply that $\eta(x)<0$ if $x<a$ or $x>b$ . Since $F_{l}^{\prime}(x)=\frac{\gamma(x)}{\eta(x)}$ is then negative on $(-\infty,a)$ , this also implies that $\lim_{x\nearrow a}F_{l}(x)=-\infty$ . Similarly, $\lim_{x\searrow b}F_{r}(x)=-\infty$ .

Assume Conditions 2.18, 2.19 and 2.21. Then the functions $\eta$ , $F_{l}$ and $F_{r}$ have the following properties:

We have $\lim_{x\nearrow a}F_{l}(x)=\lim_{x\searrow b}F_{r}(x)=-\infty$ .

To prove (a), first note that by Condition 2.18 $\eta$ has no sign changes on $(-\infty,a)$ . Suppose, contrary to the assertion, that $\eta(x)>0$ for all $x\in(-\infty,a)$ . Then, the function $q_{l}$ is also positive and hence $0\leq\int_{x}^{a}q_{l}(t)dt=-Q_{l}(x)$ for each $x\in(-\infty,a)$ . By Conditions 2.19 and 2.21 we may apply de l’Hôpital’s rule to conclude

by Condition 2.18. This is a contradiction and hence we must have $\eta(x)<0$ for $x\in(-\infty,a)$ . Similarly, one shows that also $\eta(x)<0$ for $x\in(b,\infty)$ . To prove (b), note that $F_{l}^{\prime}(x)=\frac{\gamma(x)}{\eta(x)}<0$ for $x<a$ . By Condition 2.21 this necessarily implies that $\lim_{x\nearrow a}F_{l}(x)=-\infty$ . Analogously, we have $F_{r}^{\prime}(x)>0$ for $x>b$ (since $\gamma(x)<0$ there) which by Condition 2.21 implies that $\lim_{x\searrow b}F_{r}(x)=-\infty$ . ∎

In particular, Proposition 2.23 implies the conclusion of Proposition 2.8 and hence makes Condition 2.6 redundant, at least as far as the assertion of this proposition is concerned. In order to get general bounds, we will need yet another condition on the functions $F_{l}$ and $F_{r}$ .

The functions $F_{l}$ and $F_{r}$ satisfy $\lim_{x\searrow-\infty}F_{l}(x)=\lim_{x\nearrow\infty}F_{r}(x)=+\infty$ .

The next result gives bounds on $g_{h}$ for bounded, Borel-measurable functions $h$ . As usual, the proof is in the appendix.

Before turning to Lipschitz test functions, we discuss properties of the functions $\exp(F_{l})$ and $\exp(F_{r})$ , respectively. In particular, we will show that they correspond to the function $I$ on $(a,b)$ and have similar integral representations.

For each $x\in(-\infty,a)$ we have $I_{l}(x)=\int_{a}^{x}\gamma(t)q_{l}(t)dt$ .

For each $x\in(b,\infty)$ we have $I_{r}(x)=\int_{b}^{x}\gamma(t)q_{r}(t)dt$ .

by Proposition 2.23. Hence $\int_{a}^{x}\gamma(t)q_{l}(t)dt$ exists and equals $I_{l}(x)$ . ∎

For each $x\in(-\infty,a)$ we have $\lvert g_{h}(x)\rvert\leq\lVert h^{\prime}\rVert_{\infty}\frac{Q_{l}(x)E[Z]-\int_{a}^{x}sq_{l}(s)ds}{I_{l}(x)}$ .

For each $x\in(b,\infty)$ we have $\lvert g_{h}(x)\rvert\leq\lVert h^{\prime}\rVert_{\infty}\frac{Q_{r}(x)E[Z]-\int_{b}^{x}sq_{r}(s)ds}{I_{r}(x)}$ .

$\lVert g_{h}\rVert_{\infty}\leq\frac{\lVert h^{\prime}\rVert_{\infty}}{c}$

These probabilities have nothing to do with the distribution $\mu$ of $Z$ and hence, one should be able to bound them as well directly for a given $W$ . Thus, we will focus on $z\in(a,b)$ .

Let $z\in(a,b)$ be given. Then, under the Conditions 2.1, 2.18, 2.19 and 2.21 we have:

It is clear that a similar discussion of the solutions $g_{z}$ is possible if $a=-\infty$ or $b=\infty$ .

Note that we can write $g_{z}^{\prime}(x)=\frac{(1-F(z))p(x)H(x)}{I(x)^{2}}$ for $x\in(a,z)$ and $g_{z}^{\prime}(x)=\frac{-F(z)p(x)G(x)}{I(x)^{2}}$ for $x\in(z,b)$ , with the functions $H$ and $G$ from Proposition 2.14. For concrete distributions one may often prove that $g_{z}^{\prime}$ is increasing on $(a,z)$ and decreasing on $(z,b)$ , but this seems to be hard to prove in generality, if it is true at all.

Finally, in our general setting, we will prove suitable “plug-in theorems” for exchangeable pairs satisfying our general regression property (9). As was observed in [Röl08] for the normal distribution, in the case of univariate distributional approximations one does not need the full strength of exchangeability; equality in distribution of the random variables $W$ and $W^{\prime}$ is sufficient. This may allow for a greater choice of admissible couplings in several situations or, at least, simplify the verification of the required properties.

In the following, let $(\Omega,\mathcal{A},P)$ be a probability space and let $W,W^{\prime}$ be real-valued random variables defined on this space such that $W\stackrel{{\scriptstyle\mathcal{D}}}{{=}}W^{\prime}$ . Let, as before, $\mu$ be our target distribution with support $\overline{(a,b)}$ fulfilling Condition 2.1. From now on we will assume that the random variables $W$ and $W^{\prime}$ only have values in an interval $I\supseteq(a,b)$ where both functions $\eta$ and $\gamma$ are defined (recall that it might be the case that $\eta$ can only be defined on $(a,b)$ ).

If $f^{\prime}$ is also absolutely continuous and $\lVert f^{\prime\prime}\rVert_{\infty}<\infty$ for some Borel-measurable version $f^{\prime\prime}$ of the second derivative, then we also have the bound

Hence, by distributional equality, we obtain

From (32) and the assumptions on $f$ the bound (29) now easily follows. To prove (30) it suffices to observe that

and $\int_{0}^{1}s(1-s)ds=\frac{1}{6}$ . ∎

From the first term on the right hand side of (29) we see that the bound can only be useful if $E\bigl{[}(W^{\prime}-W)^{2}|W\bigr{]}\approx 2\lambda\eta(W)$ . Similarly, the third term reveals that, indeed, $R$ should be of smaller order than $\lambda$ .

The proof shows that Proposition 2.31 can easily be generalized to the situation where there is a $\sigma$ -algebra $\mathcal{F}$ with $\sigma(W)\subseteq\mathcal{F}\subseteq\mathcal{A}$ and the more general regression property

with some $\mathcal{F}$ -measurable remainder term $R$ is satisfied.

If $\mathcal{H}$ is some class of test functions such that there are finite, positive constants $c_{0}$ , $c_{1}$ and $c_{2}$ with $\lVert g_{h}\rVert_{\infty}\leq c_{0}$ , $\lVert g_{h}^{\prime}\rVert_{\infty}\leq c_{1}$ and $\lVert g_{h}^{\prime\prime}\rVert_{\infty}\leq c_{2}$ for each $h\in\mathcal{H}$ , then (30) immediately yields a bound on the distance

Stein’s method for Beta distributions

From now on, fix $\alpha,\beta>-1$ . Now, we introduce a Stein identity for the Beta distribution $\mu_{\alpha,\beta}$ . It is easily checked that its density $p_{\alpha,\beta}$ satisfies the ordinary differential equation

Integrating by parts one obtains the conjugate operator $L:=A^{*}$ , which is defined by the equation $\langle Af,g\rangle_{L^{2}}=\langle f,A^{*}g\rangle_{L^{2}}$ , and which is known to serve as a characterizing operator for the distribution $\mu_{\alpha,\beta}$ . To be concrete, in our case we have

for smooth enough functions $g$ , yielding the Stein identity

A real-valued random variable is distributed according to $\mu_{\alpha,\beta}$ if and only if for all functions $g\in\mathcal{K}_{\alpha,\beta}$ the expected values $E[(1-X^{2})g^{\prime}(X)]$ and $E[(\alpha+\beta+2)Xg(X)+(\alpha-\beta)g(X)]$ exist and coincide.

First, let $\mathcal{L}(X)=\mu_{\alpha,\beta}$ and let $g\in\mathcal{K}_{\alpha,\beta}$ . By the hypothesis and the transformation formula, we have

Hence the expected value $E[(1-X^{2})g^{\prime}(X)]$ exists. Since $g$ is continuous, it is bounded on $[-1,1]$ and so the expected value $E[(\alpha+\beta+2)Xg(X)+(\alpha-\beta)g(X)]$ exists, too. Again, by the transformation rule and since $g$ and $\varrho_{\alpha,\beta}$ are absolutely continuous on $[-1,1]$ we can use integration by parts and have

where $h_{z}:=1_{(-\infty,z]}$ . It will be shown in Proposition 3.2 below that $g_{z}\in\mathcal{K}_{\alpha,\beta}$ , so that by hypothesis we have

Since we have fixed the parameters $\alpha$ and $\beta$ , henceforth we may and will suppress them as sub-indices at objects which might well depend on them (for example we will simply write $p$ for $p_{\alpha,\beta}$ and so on). As we would like to use the theory from Section 2, we have to make sure that our Stein identity for the Beta distribution fits into this framework, i.e. that relation (15) is satisfied with $\eta(x)=1-x^{2}$ and

where we have used that $E[Z]=\frac{\beta-\alpha}{\alpha+\beta+2}$ . In principle, this is clear, because we have just established a Stein characterization for $\mu$ and, given the density $p$ and the function $\gamma$ , the corresponding $\eta$ is, of course, unique. However, we give a formal proof.

holds for all $x\in(-1,1)$ . First note that

Differentiating the left hand side of (36), we obtain

which is of course the derivative of the right hand side, too. Since

Condition 2.6 is also satisfied but need not be proved, because its most important conclusion, namely that $\eta(1)=\eta(-1)=0$ , is clear from the above discussion. To verify Conditions 2.19, 2.21 and 2.24, we must first define the functions $F_{l}$ on $(-\infty,-1)$ and $F_{r}$ on $(1,\infty)$ . We claim that the functions

These two functions are of course locally integrable and hence, Condition 2.19 is satisfied. Since

Conditions 2.21 and 2.24 also hold. Consequently, all results from Section 2 are valid in particular for the case of Beta distributions.
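
The relation between $p$ , $\gamma$ and $\eta=1-x^{2}$ can also be checked numerically. The following Python sketch assumes that $\eta=\frac{I}{p}$ with $I(x)=\int_{-1}^{x}\gamma(t)p(t)dt$ , that $p(t)$ is proportional to $(1-t)^{\alpha}(1+t)^{\beta}$ on $(-1,1)$ and that $\gamma(t)=(\beta-\alpha)-(\alpha+\beta+2)t$ , which matches the values $\gamma(-1)=2\beta+2$ and $\gamma(1)=-2\alpha-2$ used in this section. The normalizing constant of $p$ cancels in the ratio and is therefore omitted. The parameter values and the helper `simpson` are illustrative choices of ours.

```python
def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


alpha, beta = 1.0, 2.0   # illustrative parameters


def p(t):
    """Unnormalized density of the Beta distribution on (-1, 1) (assumed form)."""
    return (1.0 - t) ** alpha * (1.0 + t) ** beta


def gamma(t):
    """Coefficient gamma(t) = (beta - alpha) - (alpha + beta + 2) * t."""
    return (beta - alpha) - (alpha + beta + 2.0) * t


def eta_numeric(x):
    """eta(x) = I(x) / p(x) with I(x) the integral of gamma * p over (-1, x)."""
    return simpson(lambda t: gamma(t) * p(t), -1.0, x) / p(x)


grid = [-0.9 + 0.1 * k for k in range(19)]
max_err = max(abs(eta_numeric(x) - (1.0 - x * x)) for x in grid)
# Unnormalized version of E[gamma(Z)], which should vanish
total = simpson(lambda t: gamma(t) * p(t), -1.0, 1.0)

print(max_err, total)
```

Both printed numbers are zero up to quadrature error, in agreement with $\eta(x)=1-x^{2}$ and $E[\gamma(Z)]=0$ .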

Here, the values of $g_{h}$ at $\pm 1$ are arbitrary, but they are chosen such that $g_{h}$ is continuous whenever $h$ is continuous at $\pm 1$ . This follows immediately from Proposition 2.22.

The next result, which is also proved in the appendix, completes the proof of Proposition 3.1 by showing that the solution $g_{z}$ is in the class $\mathcal{K}_{{}_{\alpha,\beta}}$ whenever $z\not=\pm 1$ .

Next, we will derive some results for the solutions $g_{h}$ from corresponding results in Section 2.

Since $\gamma(-1)=2\beta+2$ and $\gamma(1)=-2\alpha-2$ , this immediately follows from Proposition 2.25. ∎

$\lVert g_{h}\rVert_{\infty}\leq\frac{\lVert h^{\prime}\rVert_{\infty}}{\alpha+\beta+2}$

There exists a constant $K_{1}$ , only depending on $\alpha$ and $\beta$ such that $\lVert g_{h}^{\prime}\rVert_{\infty}\leq K_{1}\lVert h^{\prime}\rVert_{\infty}$ .

The following lemma, which is proved in the appendix, will be useful.

has a bounded solution $f$ on $(-1,1)$ if and only if $E[u(Z)]=0$ .

and from Proposition 3.4 we see that $h_{2}$ is Lipschitz with minimal Lipschitz constant

Hence, there is a constant $K_{2}$ depending only on $\alpha$ and $\beta$ such that

for all twice differentiable functions $h$ with bounded first and second derivative. We have thus proved the following proposition.

Now we are in the position to provide a “plug-in theorem” for the Beta approximation using exchangeable pairs.

Let $W,W^{\prime}$ be identically distributed, real-valued random variables on a common probability space $(\Omega,\mathcal{A},P)$ satisfying the regression property

for some constant $\lambda>0$ and a random variable $R$ . Then for each twice differentiable function $h$ with bounded first and second derivative and with $E\bigl{[}\lvert h(W)\rvert\bigr{]}<\infty$ we have the bound

where the constants $K_{1}$ and $K_{2}$ are from Propositions 3.4 and 3.6, respectively.

This immediately follows from Propositions 2.31, 3.4, 3.6 and since $g_{h}$ is a solution to Stein’s equation (11). ∎

In the following we will transfer the developed theory to the Beta distributions $\nu_{a,b}$ on $[0,1]$ . We start with the Stein identity for $\nu_{a,b}$ , where $a,b>0$ are fixed parameters. Let $X\sim\nu_{a,b}$ , then $Y:=2X-1\sim\mu_{b-1,a-1}$ and hence for each smooth enough function $f$ we have

So, a Stein identity for $X\sim\nu_{a,b}$ is given by

If $h$ is bounded, then $\lVert f_{h}\rVert_{\infty}\leq\lVert h-\nu_{a,b}(h)\rVert_{\infty}\max\Bigl{(}\frac{1}{2m(1-m)q_{a,b}(m)},\,\frac{1}{a},\,\frac{1}{b}\Bigr{)}$ , where $m$ is a median for $\nu_{a,b}$ .

If $h$ is Lipschitz, then $\lVert f_{h}\rVert_{\infty}\leq\frac{2}{a+b}\lVert h^{\prime}\rVert_{\infty}$ and $\lVert f_{h}^{\prime}\rVert_{\infty}\leq C_{1}\lVert h^{\prime}\rVert_{\infty}$ , where $C_{1}$ only depends on $a$ and $b$ .

If $h$ is twice differentiable with bounded first and second derivative, then $\lVert f_{h}^{\prime\prime}\rVert_{\infty}\leq C_{2}\bigl{(}\lVert h^{\prime}\rVert_{\infty}+\lVert h^{\prime\prime}\rVert_{\infty}\bigr{)}$ , where $C_{2}$ only depends on $a$ and $b$ .

Now, let $V,V^{\prime}$ be identically distributed, real-valued random variables on a common probability space $(\Omega,\mathcal{A},P)$ . For the approximation of $\mathcal{L}(V)$ by $\nu_{a,b}$ the general regression property from Section 2 is

where, again, $\lambda>0$ is constant and $R$ is a hopefully small remainder term. For the distribution $\nu_{a,b}$ Theorem 3.7 becomes the following:

where the constants $C_{1}$ and $C_{2}$ are those from Proposition 3.8.

The assertion is clear from Propositions 2.31 and 3.8 and since $f_{h}$ is a solution to Stein’s equation (41). ∎
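The Stein identity for $\nu_{a,b}$ can be sanity-checked exactly on polynomial test functions. The sketch below assumes that the identity takes the form $E\bigl[X(1-X)f^{\prime}(X)+\bigl(a-(a+b)X\bigr)f(X)\bigr]=0$ , i.e. $\eta_{a,b}(x)=x(1-x)$ and $\gamma_{a,b}(x)=(a+b)\bigl(\frac{a}{a+b}-x\bigr)$ , and it uses the standard moment formula $E[X^{k}]=\prod_{j=0}^{k-1}\frac{a+j}{a+b+j}$ of the Beta distribution. The helper names `beta_moment` and `stein_expectation` are hypothetical and only serve this illustration.

```python
from fractions import Fraction


def beta_moment(k, a, b):
    """E[X^k] for X ~ Beta(a, b), computed as prod_{j<k} (a+j)/(a+b+j)."""
    m = Fraction(1)
    for j in range(k):
        m *= Fraction(a + j, a + b + j)
    return m


def stein_expectation(coeffs, a, b):
    """E[X(1-X) f'(X) + (a - (a+b) X) f(X)] for f(x) = sum_i coeffs[i] * x**i.

    Assumption: this is the Stein operator of Beta(a, b) on [0, 1].
    a and b must be positive integers or Fractions so that arithmetic is exact.
    """
    total = Fraction(0)
    for i, c in enumerate(coeffs):
        c = Fraction(c)
        m_i = beta_moment(i, a, b)
        m_next = beta_moment(i + 1, a, b)
        # x(1-x) * d/dx x^i = i x^i - i x^(i+1)
        total += c * i * (m_i - m_next)
        # (a - (a+b) x) * x^i = a x^i - (a+b) x^(i+1)
        total += c * (a * m_i - (a + b) * m_next)
    return total


if __name__ == "__main__":
    # f(x) = 1 - 2x + 5x^3 under Beta(2, 3)
    print(stein_expectation([1, -2, 0, 5], 2, 3))
```

Each monomial contributes $(i+a)E[X^{i}]-(i+a+b)E[X^{i+1}]$ , which vanishes by the moment recursion, so the expectation is zero for every polynomial $f$ .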

Application to the Polya urn model

In this section we prove a quantitative version of the fact that the relative number of drawn red balls in a Polya urn model converges in distribution to a suitable Beta distribution, if the number of total drawings tends to infinity. This model will serve as an application of our Stein method for the Beta distribution, as developed in section 3. We start by introducing the stochastic model:

It now follows that for each $k=0,\ldots,n$ we have

or, with $a:=\frac{r}{c}$ and $b:=\frac{w}{c}$ ,

The distribution of $S_{n}$ is usually referred to as the Polya distribution with parameters $n$ , $a$ and $b$ . It is a well-known fact that the distribution of $\frac{1}{n}S_{n}$ converges weakly to the distribution $\nu_{a,b}$ as $n$ goes to infinity, where the Beta distribution $\nu_{a,b}$ was defined in section 3. A convenient way to prove this weak convergence result is to use the formula

together with the weak law of large numbers for Bernoulli random variables to deal with the binomial probabilities $b(k;n,p)=\binom{n}{k}p^{k}(1-p)^{n-k}$ .

Formula (43) can be proved by a straightforward computation using the relations $B(a+1,b)=\frac{a}{a+b}B(a,b)$ and $B(a,b)=B(b,a)$ for the Beta function, where $a,b>0$ , and can also be viewed as a consequence of a special instance of de Finetti’s representation theorem for infinite exchangeable sequences. Note, however, that one generally does not know the corresponding mixing measure from de Finetti’s theorem and hence, identity (43) is not a direct consequence of this theorem.
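For small $n$ the Polya distribution can be tabulated exactly. The following sketch assumes the usual urn dynamics (initially $r$ red and $w$ white balls; after each draw the ball is returned together with $c$ further balls of the same colour), under which every colour sequence with $k$ red draws has probability $\prod_{i<k}(a+i)\prod_{j<n-k}(b+j)\big/\prod_{l<n}(a+b+l)$ with $a=\frac{r}{c}$ and $b=\frac{w}{c}$ . The function name `polya_pmf` is hypothetical.

```python
from fractions import Fraction
from math import comb


def polya_pmf(n, r, w, c):
    """Return [P(S_n = 0), ..., P(S_n = n)] as exact fractions.

    Assumption: standard Polya urn with r red, w white balls and
    reinforcement c, so that a = r/c and b = w/c.
    """
    a, b = Fraction(r, c), Fraction(w, c)
    den = Fraction(1)
    for l in range(n):
        den *= a + b + l
    pmf = []
    for k in range(n + 1):
        num = Fraction(1)
        for i in range(k):          # k red draws
            num *= a + i
        for j in range(n - k):      # n - k white draws
            num *= b + j
        # comb(n, k) sequences share the same probability (exchangeability)
        pmf.append(comb(n, k) * num / den)
    return pmf


if __name__ == "__main__":
    pmf = polya_pmf(5, 2, 3, 1)
    print(pmf, sum(pmf))
```

The probabilities sum to one and the mean of $S_{n}$ equals $\frac{na}{a+b}$ , in line with the limiting Beta distribution $\nu_{a,b}$ having mean $\frac{a}{a+b}$ .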

From now on, we will present a Stein’s method proof of the above distributional convergence result and, as usual, also derive a rate of convergence. We will usually suppress the time index $n$ and let $V:=V_{n}:=\frac{1}{n}S_{n}$ denote the random variable of interest. For the construction of the exchangeable pair, we use the well-known Gibbs sampling procedure with the slight simplification that, due to exchangeability of $X_{1},\ldots,X_{n}$ , we need not choose at random the index of the summand of $S_{n}$ that has to be replaced. Instead, we will always replace $X_{n}$ by $X_{n}^{\prime}$ , which is constructed as follows:

Observe $X_{1}=x_{1},\ldots,X_{n}=x_{n}$ and construct $X_{n}^{\prime}$ according to the distribution $\mathcal{L}(X_{n}|X_{1}=x_{1},\ldots,X_{n-1}=x_{n-1})$ . Then, letting $V^{\prime}:=V_{n}^{\prime}:=V-\frac{1}{n}X_{n}+\frac{1}{n}X_{n}^{\prime}$ , the pair $(V,V^{\prime})$ is exchangeable. In order to use Stein’s method of exchangeable pairs, we need to establish a suitable regression property. This is the content of the following proposition.

The exchangeable pair $(V,V^{\prime})$ satisfies the regression property

where $\gamma_{a,b}(x)=(a+b)\bigl{(}\frac{a}{a+b}-x\bigr{)}$ and $\lambda=\lambda_{n}=\frac{1}{n(a+b+n-1)}$ .
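The regression property can be confirmed by brute-force enumeration for small $n$ . The sketch below assumes the regression takes the exact form $E[V^{\prime}-V|V]=\lambda\gamma_{a,b}(V)$ with no remainder, and it assumes the standard urn dynamics $P(X_{t+1}=1|X_{1},\ldots,X_{t})=\frac{a+S_{t}}{a+b+t}$ ; the function name `regression_check` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def regression_check(n, r, w, c):
    """Return pairs (E[V'-V | S_n = k], lambda * gamma(k/n)) for k = 0..n."""
    a, b = Fraction(r, c), Fraction(w, c)
    lam = 1 / (n * (a + b + n - 1))
    prob = [Fraction(0)] * (n + 1)    # P(S_n = k)
    drift = [Fraction(0)] * (n + 1)   # E[(V' - V) * 1{S_n = k}]
    for xs in product((0, 1), repeat=n):
        p, reds = Fraction(1), 0
        for t, x in enumerate(xs):
            p_red = (a + reds) / (a + b + t)   # assumed urn dynamics
            p *= p_red if x else 1 - p_red
            reds += x
        # X_n' is redrawn from L(X_n | X_1, ..., X_{n-1});
        # reds - xs[-1] is the number of reds among the first n-1 draws.
        p_red_new = (a + reds - xs[-1]) / (a + b + n - 1)
        prob[reds] += p
        drift[reds] += p * (p_red_new - xs[-1]) / n
    pairs = []
    for k in range(n + 1):
        v = Fraction(k, n)
        pairs.append((drift[k] / prob[k], lam * (a - (a + b) * v)))
    return pairs


if __name__ == "__main__":
    for lhs, rhs in regression_check(4, 2, 3, 2):
        print(lhs, rhs)
```

Because all arithmetic is carried out with exact fractions, the two entries of each pair agree exactly rather than up to rounding.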

We have $V^{\prime}-V=\frac{X_{n}^{\prime}}{n}-\frac{X_{n}}{n}$ and by exchangeability of $X_{1},\ldots,X_{n}$ it clearly holds that $E[X_{n}|V]=E[X_{n}|S_{n}]=\frac{1}{n}S_{n}=V$ . Also, by the definition of $X_{n}^{\prime}$ and since $X_{n}^{\prime}$ only assumes the values $0$ and $1$ we have for any $x_{1},\ldots,x_{n-1}\in\{0,1\}$

Thus, since $\sigma(V)\subseteq\sigma(X_{1},\ldots,X_{n})$ , we obtain

Next, we will compute the quantity $E\bigl{[}(V^{\prime}-V)^{2}|V\bigr{]}$ .

We have for the above constructed exchangeable pair $(V,V^{\prime})$

From the general theory of Gibbs sampling (see the author’s PhD thesis, to appear) it is known that

Since $X_{n}^{2}=X_{n}$ we have from the proof of Proposition 4.1 that

where we have used $E[X_{n}|V]=V$ again. Finally, we compute

Putting pieces together, we eventually obtain

The last assertion easily follows from this and from $\lambda=\frac{1}{n(a+b+n-1)}$ . ∎
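The conditional second moment of $V^{\prime}-V$ can be cross-checked by enumeration as well. The closed form used for comparison, $E\bigl[(V^{\prime}-V)^{2}|V\bigr]=\lambda\bigl(2V(1-V)+\frac{(b-a)V+a}{n}\bigr)$ , is a direct computation under the urn dynamics assumed in the code comments and should be read as an assumption of this sketch, not as a quotation of Proposition 4.2. The function name `second_moment_check` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def second_moment_check(n, r, w, c):
    """Return pairs (E[(V'-V)^2 | S_n = k], closed form at k/n) for k = 0..n."""
    a, b = Fraction(r, c), Fraction(w, c)
    lam = 1 / (n * (a + b + n - 1))
    prob = [Fraction(0)] * (n + 1)   # P(S_n = k)
    sq = [Fraction(0)] * (n + 1)     # E[(V' - V)^2 * 1{S_n = k}]
    for xs in product((0, 1), repeat=n):
        p, reds = Fraction(1), 0
        for t, x in enumerate(xs):
            p_red = (a + reds) / (a + b + t)   # assumed urn dynamics
            p *= p_red if x else 1 - p_red
            reds += x
        # probability that the redrawn X_n' equals 1
        q = (a + reds - xs[-1]) / (a + b + n - 1)
        # (X_n' - X_n)^2 equals 1 exactly when the redrawn colour differs
        p_diff = 1 - q if xs[-1] else q
        prob[reds] += p
        sq[reds] += p * p_diff / n**2
    pairs = []
    for k in range(n + 1):
        v = Fraction(k, n)
        closed = lam * (2 * v * (1 - v) + ((b - a) * v + a) / n)
        pairs.append((sq[k] / prob[k], closed))
    return pairs


if __name__ == "__main__":
    for lhs, rhs in second_moment_check(5, 1, 2, 3):
        print(lhs, rhs)
```

Under this closed form, $\frac{1}{2\lambda}E\bigl[(V^{\prime}-V)^{2}|V\bigr]-V(1-V)=\frac{(b-a)V+a}{2n}$ , which is of order $n^{-1}$ uniformly in $V\in[0,1]$ .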

Recall that for the distribution $\nu_{a,b}$ we have $\eta(x):=\eta_{a,b}(x)=x(1-x)$ and hence, we obtain from Proposition 4.2 that

since $\lvert V\rvert\leq 1$ . Similarly, since $\lvert V^{\prime}-V\rvert=\frac{1}{n}\lvert X_{n}^{\prime}-X_{n}\rvert\leq\frac{1}{n}$ we have

From Theorem 3.9 we can now conclude the following result.

with the constants $C_{1}$ and $C_{2}$ from Proposition 3.8.

Since $V$ assumes only values in $[0,1]$ , the condition $E\bigl{[}\lvert h(V)\rvert\bigr{]}<\infty$ from Theorem 3.9 is trivially met. The assertion now follows immediately from Theorem 3.9, (44) and (45). ∎

In [GR12] the authors use a different technique within Stein’s method for the Beta distributions, which compares the Stein characterization of the target distribution with that of the approximating discrete distribution, to prove that, in the special case $c=1$ , the convergence rate of order $n^{-1}$ from Theorem 4.3 even holds in the Wasserstein distance and they compute an explicit constant in the bound. They also show that the rate of convergence is optimal. Using their technique and the bounds from Proposition 3.8 one can easily see that the rate of order $n^{-1}$ in the Wasserstein distance also holds in the case $c\geq 2$ . However, in order to obtain an explicit constant, some further work has to be done to bound the constant $C_{1}$ from Proposition 3.8 in the case that one of the values $a,b$ is strictly smaller than one.

In [FG] the authors use the zero bias coupling within Stein’s method for normal approximation to prove bounds on the distance of a normalized version of the quantity $V$ to the standard normal distribution. In particular, they show that a CLT holds whenever the parameters $n$ , $a$ and $b$ tend to infinity in a suitable fashion.

Appendix

In this section we provide the proofs of some of the results from Sections 2 and 3 and state and prove some further auxiliary results, which are only used within proofs.

The following result justifies all our calculations, which invoke de l’Hôpital’s rule. Its proof is suppressed for reasons of space, but will be given in the author’s PhD thesis.

If $a<a^{\prime}<b^{\prime}<b$ , then both $f$ and $g$ are absolutely continuous on $[a^{\prime},b^{\prime}]$ .

If $\lim_{x\searrow a}f(x)=\lim_{x\searrow a}g(x)=0$ , then $g(x)\not=0$ for all $x\in(a,b)$ and

The same conclusion holds if $g^{\prime}(x)<0$ for almost all $x\in(a,b)$ and an analogous result is true for $\lim_{x\nearrow b}$ .

2. Proofs from Section 2

which exists in $[0,\infty)$ by Condition 2.2. Here, we used the convention $\frac{1}{\infty}=0$ .

again by Condition 2.2 and by Proposition 2.4. Furthermore, we have

for each $x\in(a,b)$ since by the positivity of $p$ and because $\gamma$ is strictly decreasing

Hence, $M$ is strictly increasing and thus for each $x\in(a,m]$ :

The same bound can be proved for $x\in(m,b)$ by using the representation

and the fact that also $1-F(m)=\frac{1}{2}$ .

The following two well-known lemmas will be needed for the proof of Proposition 2.14. Their proofs are included only for reasons of completeness.

Let $-\infty\leq a<b\leq\infty$ and let $\mu$ be a probability measure (not necessarily absolutely continuous with respect to $\lambda$ ) with $\operatorname{supp}(\mu)\subseteq\overline{(a,b)}$ . Let $F$ be the distribution function corresponding to $\mu$ and suppose that $\int_{a}^{b}\lvert x\rvert d\mu(x)<\infty$ . Then, for each $x\in\overline{(a,b)}$ we have

$\int_{a}^{x}F(t)dt=xF(x)-\int_{\overline{(a,x]}}sd\mu(s)$

$\int_{x}^{b}(1-F(t))dt=\int_{(x,\infty)}sd\mu(s)-x(1-F(x))$

For each $x\in\overline{(a,b)}$ we have $\int_{(a,x]}(h(y)-\mu(h))d\mu(y)=-(1-F(x))\int_{a}^{x}F(s)h^{\prime}(s)ds-F(x)\int_{x}^{b}(1-F(s))h^{\prime}(s)ds$ .

Since $\mu$ is a probability measure we have by the fundamental theorem of calculus for Lebesgue integration and by Fubini’s theorem

This proves (a). As to (b), we have using (a) and its proof

First, we prove (a). Recall the representation

By Lemmas 5.3 and 5.2 we thus obtain that

implying (a). Now, we turn to the proof of (b). By Stein’s equation (11) we obtain for $x\in(a,b)$

which reduces to the bound asserted in (b). ∎

The bound on $\lvert g_{h}(x)\rvert$ for $x\in(a,b)$ has already been proved in Proposition 2.10. Let $x\in(-\infty,a)$ . Then we have by the negativity of $q_{l}$ which follows from Proposition 2.23:

Next, we will state a lemma, which replaces Lemma 5.2 outside the support $\overline{(a,b)}$ .

For each $x\in(-\infty,a)$ we have $\int_{x}^{a}Q_{l}(t)dt=-xQ_{l}(x)+\int_{a}^{x}tq_{l}(t)dt$ and $I_{l}(x)<\gamma(x)Q_{l}(x)$ .

For each $x\in(b,\infty)$ we have $\int_{b}^{x}Q_{r}(t)dt=xQ_{r}(x)-\int_{b}^{x}tq_{r}(t)dt$ and $I_{r}(x)<\gamma(x)Q_{r}(x)$ .

which proves the first part of (a). The second claim of (a) follows from (a), the positivity of $-q_{l}$ on $(-\infty,a)$ and from the monotonicity of $\gamma$ :

The proof of (b) is similar but easier, and is therefore omitted. ∎

The next lemma replaces Lemma 5.3 outside of the support of $\mu$ .

For each $x\in(-\infty,a)$ we have $h(x)-\mu(h)=-\int_{x}^{b}\bigl{(}1-F(s)\bigr{)}h^{\prime}(s)ds=-\int_{x}^{a}h^{\prime}(s)ds-\int_{a}^{b}\bigl{(}1-F(s)\bigr{)}h^{\prime}(s)ds$ and

For each $x\in(b,\infty)$ we have $h(x)-\mu(h)=\int_{a}^{x}F(s)h^{\prime}(s)ds$ and

We only prove (a) since the proof of (b) is very similar. The first claim follows from Lemma 5.3 (a) since $F(s)=0$ for $s<a$ and $F(s)=1$ for $s\geq b$ . The second claim follows from the first one and from Fubini’s theorem by

This is the first representation for $g_{h}(x)$ in the assertion. The second one follows, since $1-F(s)=1$ for $s<a$ and hence

We only prove (a) and (c), since the proofs of (b) and (d) are similar. To prove (a), we observe that by Lemma 5.5 we have

Since $Q_{l}$ is decreasing ( $Q_{l}^{\prime}=q_{l}<0$ ) and positive on $(-\infty,a)$ this implies

By Lemma 5.2 and Lemma 5.4 (a) the right hand side equals

which is the claimed bound. Now, we turn to the proof of (c). By Stein’s equation (24) and Lemma 5.5 (a) we have for each $x\in(-\infty,a)$ :

These together with Proposition 2.27 (a), (b) and Corollary 2.16 immediately imply (a). As to (b), by (49) we have

This and Proposition 2.27 (c) imply claim (b). Assertion (c) may be proved similarly. ∎

proving the desired representation of $g_{z}$ inside the interval $(a,b)$ . Now, for $x\in(a,b)$ let $M(x):=\frac{F(x)}{I(x)}$ and $N(x):=\frac{1-F(x)}{I(x)}$ . Then we have

since $G(x)=I(x)+(1-F(x))\gamma(x)$ is positive by Proposition 2.14. Thus, $M$ is strictly increasing and $N$ is strictly decreasing on $(a,b)$ . Since $g_{z}(x)=(1-F(z))M(x)$ for $x\in(a,z]$ and $g_{z}(x)=F(z)N(x)$ for $x\in(z,b)$ , this implies that $\sup_{x\in(a,b)}g_{z}(x)=g_{z}(z)=\frac{F(z)(1-F(z))}{I(z)}$ . It also implies the claimed representation of $g_{z}^{\prime}(x)$ for $x\in(a,b)\setminus\{z\}$ . Furthermore, by de l’Hôpital’s rule, we have

Note that these limits could also be derived from Proposition 2.22. Next, consider $x\in(-\infty,a)$ . For such an $x$ we have

by Lemma 5.4. Thus, $g_{z}$ is increasing on $(-\infty,a)$ and hence, again by de l’Hôpital’s rule,

Since $g_{z}^{\prime}(x)=(1-F(z))\frac{d}{dx}\frac{Q_{l}(x)}{I_{l}(x)}$ we have also derived the desired formula for $g_{z}^{\prime}(x)$ for $x\in(-\infty,a)$ . The calculations for $x\in(b,\infty)$ are completely analogous and therefore omitted. From our computations we can already infer that $\lVert g_{z}\rVert_{\infty}=\frac{F(z)(1-F(z))}{I(z)}$ . So, it remains to show that this quantity is bounded in $z\in(a,b)$ . Since it is a continuous function of $z$ , we only have to show that it has finite limits at the endpoints of the interval $(a,b)$ . But, of course,

and, similarly, $\lim_{z\nearrow b}\frac{F(z)(1-F(z))}{I(z)}=\lim_{z\nearrow b}N(z)=-\frac{1}{\gamma(b)}$ . This concludes the proof. ∎

3. Proofs from Section 3

Using de l’Hôpital’s rule 5.1 and Lemma 5.6 (a) below, we obtain

Hence, $\lim_{x\to\infty}g_{h}(x)=0$ . Similarly, one can prove that $\lim_{x\to-\infty}g_{h}(x)=0$ . It remains to show that

To this end, it suffices to see that the function $\lvert g_{z}^{\prime}(x)\rvert(1-x^{2})p(x)$ is bounded on $(-1,z)$ and on $(z,1)$ . Since it is continuous on $(-1,z]$ and on $(z,1)$ (where $g_{z}^{\prime}(z):=\lim_{x\nearrow z}g_{z}^{\prime}(x)$ for definiteness), this claim will follow if we have proved that $\lim_{x\to\pm 1}g_{z}^{\prime}(x)(1-x^{2})p(x)=0$ . For $x\in(-1,z)$ we have

proving the claim for $-1$ . Since it may be proved analogously for $+1$ the proof is complete. ∎

The following lemma will be useful for the proof of Proposition 3.4.

The functions $p$ , $q_{l}$ , $q_{r}$ and $\eta$ satisfy the following equations for each integer $k\geq 1$ :

$\frac{d}{dx}\bigl{(}\eta(x)^{k}p(x)\bigr{)}=\eta(x)^{k-1}p(x)\bigl{[}(k-1)\eta^{\prime}(x)+\gamma(x)\bigr{]}$ for each $x\in(-1,1)$

$\frac{d}{dx}\bigl{(}\eta(x)^{k}q_{l}(x)\bigr{)}=\eta(x)^{k-1}q_{l}(x)\bigl{[}(k-1)\eta^{\prime}(x)+\gamma(x)\bigr{]}$ for each $x\in(-\infty,-1)$

$\frac{d}{dx}\bigl{(}\eta(x)^{k}q_{r}(x)\bigr{)}=\eta(x)^{k-1}q_{r}(x)\bigl{[}(k-1)\eta^{\prime}(x)+\gamma(x)\bigr{]}$ for each $x\in(1,\infty)$
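Part (a) of the lemma can be checked numerically with a central finite difference. The sketch assumes that the density of $\mu_{\alpha,\beta}$ is proportional to $(1-x)^{\alpha}(1+x)^{\beta}$ on $(-1,1)$ and that $\gamma(x)=(\beta-\alpha)-(\alpha+\beta+2)x$ , which is consistent with $\gamma(-1)=2\beta+2$ and $\gamma(1)=-2\alpha-2$ ; the normalising constant is dropped because both sides of the identity are linear in $p$ . All function names are hypothetical.

```python
def eta(x):
    return 1.0 - x * x


def gamma(x, alpha, beta):
    # assumed form, consistent with gamma(-1) = 2*beta + 2, gamma(1) = -2*alpha - 2
    return (beta - alpha) - (alpha + beta + 2.0) * x


def p(x, alpha, beta):
    # unnormalised density of mu_{alpha,beta} on (-1, 1) (assumption)
    return (1.0 - x) ** alpha * (1.0 + x) ** beta


def lhs(x, k, alpha, beta, h=1e-6):
    """Central difference approximation of d/dx (eta^k * p) at x."""
    f = lambda t: eta(t) ** k * p(t, alpha, beta)
    return (f(x + h) - f(x - h)) / (2.0 * h)


def rhs(x, k, alpha, beta):
    """eta^(k-1) * p * [(k-1) * eta' + gamma], with eta'(x) = -2x."""
    return eta(x) ** (k - 1) * p(x, alpha, beta) * (
        (k - 1) * (-2.0 * x) + gamma(x, alpha, beta)
    )


def max_gap(alpha, beta, ks=(1, 2, 3), xs=(-0.8, -0.3, 0.0, 0.4, 0.8)):
    """Largest absolute difference between both sides on a small grid."""
    return max(
        abs(lhs(x, k, alpha, beta) - rhs(x, k, alpha, beta))
        for k in ks
        for x in xs
    )


if __name__ == "__main__":
    print(max_gap(1.5, 0.5))
```

The gap stays at the level of the finite-difference error, which reflects the case $k=1$ , $(\eta p)^{\prime}=\gamma p$ , combined with the product rule for the higher powers of $\eta$ .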

First we prove (a). By (14) we have, multiplying by $p(x)$ ,

proving (a). As to (b), we observe that by the definition of $q_{l}=\frac{\exp\circ F_{l}}{\eta}$ we have on the one hand

and on the other hand, by the product rule,

Now the proof follows the lines of proof for $p$ as above. Similarly one may prove (c). ∎

Assertion (a) immediately follows from Corollary 2.28 (a) since in this case $c=\alpha+\beta+2$ . Now, we turn to the proof of (b). First, consider $x\in(-1,1)$ . By Corollary 2.16 (b) we have for $x\in(-1,1)$ :

Then $S$ is continuous on $(-1,1)$ and hence, to show that $S$ is bounded on $(-1,1)$ , it suffices to prove that $S$ has finite limits at $\pm 1$ . Since $\lim_{x\searrow-1}G(x)=\gamma(-1)=2\beta+2$ we obtain, using Lemma 5.6 and de l’Hôpital’s rule:

Here we have used that $H^{\prime}(x)=-\gamma^{\prime}(x)F(x)=(\alpha+\beta+2)F(x)$ . Similarly one shows that

Thus, we have shown that $\sup_{x\in(-1,1)}S(x)<\infty$ . Now, we consider $x\in(-\infty,-1)$ . From Corollary 2.28 (b) we have

For $x\in(-\infty,-1)$ we consider the function

Clearly, $S_{l}$ is a continuous function on $(-\infty,-1)$ . To show that it is bounded, it thus suffices to prove that $\lim_{x\nearrow-1}S_{l}(x)<\infty$ and $\lim_{x\to-\infty}S_{l}(x)<\infty$ . Using Lemma 5.6 and de l’Hôpital’s rule, we obtain

Next, we will show that $\lim_{x\to-\infty}S_{l}(x)=0$ . We actually have

Here we have used that $\eta(x)q_{l}(x)=-(1-x)^{\alpha+1}(-1-x)^{\beta+1}\rightarrow-\infty$ as $x\to-\infty$ . Hence, $\lim_{x\to-\infty}S_{l}(x)=0$ and $\sup_{x\in(-\infty,-1)}S_{l}(x)<\infty$ . Since we can show in a similar manner that $\sup_{x\in(1,\infty)}S_{r}(x)<\infty$ , where $S_{r}$ is defined in the obvious way, the proof is complete. ∎

If $E[u(Z)]=0$ , then the usual Stein solution is bounded on $(-1,1)$ by Proposition 3.3. For the converse, let us assume that $E[u(Z)]\not=0$ . As was already noted in Section 2, the solutions of the homogeneous equation corresponding to (40) are exactly the multiples of $\frac{1}{\eta p}$ . Thus, every solution $f$ of (40) has the form

since $E[u(Z)]\not=0$ . Hence, $f$ is unbounded near $1$ . If $c\not=0$ we have

So, $f$ is unbounded near $-1$ . Hence, in any case $f$ is unbounded on $(-1,1)$ .