Stein's method of exchangeable pairs for the Beta distribution and generalizations

Christian Döbler

Introduction

Since its introduction in 1972, Stein’s method has become a famous and useful tool for proving distributional convergence. One of its main advantages over other techniques is that it automatically yields concrete error bounds on various distributional distances. Although it was first developed only for normal approximation, several authors observed that Stein’s idea of linking a characterizing operator for the target distribution to a differential equation, the Stein equation, carries over to many other absolutely continuous and discrete distributions, where, in the discrete case, the differential equation has to be replaced by a suitable difference equation. Among those other distributions, to which Stein’s method has been successfully extended, are the Poisson distribution (see e.g. , or ), the Gamma distribution (see or ), the exponential distribution (see e.g. , and ), the Laplace distribution and, more generally, the class of Variance-Gamma distributions . Stein’s method for the Beta distribution has been developed independently in the paper as well as in the preprint .

Although in both works and a rate of convergence for the relative number of drawn red balls in a Pólya urn model was derived using Stein’s method for the Beta distribution, the actual approaches were quite different. In the authors developed a useful and widely applicable technique to find a whole class of characterizing operators for a discrete distribution, whose probability mass function is known explicitly, and compared one of these operators to the Stein operator of the limiting Beta distribution. In contrast, the preprint built on a coupling approach by developing a new version of the exchangeable pairs approach of Stein’s method for a rather large class of absolutely continuous distributions on the real line. This new version of the exchangeable pairs approach differs from that in the framework of the density method as developed in and , since it allows for a modification of the Stein equation, which is adapted to a given exchangeable pair and does not necessarily rely on the characterization by the density method. Recently, in , a nice generalization of the density method, which does not necessarily assume absolute continuity of the given distribution, was given and, as an application, it was shown, how the situation of the Pólya urn example from may be fitted into this framework.

The main purpose of the present paper is to give a more easily readable account of the method and ideas from by keeping the class of Beta distributions on $[0,1]$ and the Pólya urn model as a running example. In addition, we derive new numerical bounds on the solution to the Stein equation for the Beta distribution and, for smooth test functions, also on its first order derivative. For Lipschitz-continuous test functions, these bounds complement those given in in the sense that they are neither uniformly worse nor uniformly better in the parameters of the Beta distribution. Furthermore, we use a new iterative procedure to obtain uniform bounds for derivatives of any order of the solution to the Beta Stein equation with sufficiently smooth right hand side. Incidentally, this is the first paper to give bounds on higher order derivatives of the solution to the Beta Stein equation. It should be mentioned that, generally, obtaining bounds on higher order derivatives of the solution to the Stein equation is quite a difficult problem, because the explicit representations of those derivatives become more and more complicated. Hence, to date bounds on higher order derivatives of the solution are still quite rare in Stein’s method. For instance, the paper obtains sharp bounds on higher order derivatives in the context of the normal and exponential distributions by exploiting very peculiar identities and facts about these distributions, which are not available for more general absolutely continuous distributions. Also, if one succeeds in deriving a tractable generator representation of the solution to the Stein equation as suggested in , one can usually use this form of the solution to obtain bounds on higher order derivatives. This has been used for the multivariate normal and for the Gamma distribution . However, in contrast to the bounds from , these bounds usually do not exhibit the smoothness property of the inverse of the corresponding Stein operator.
In the case of the multivariate normal distribution with non-singular covariance matrix, one can combine the generator representation with a partial integration to obtain bounds on higher order derivatives, which demand one fewer order of smoothness from the test function than the bounds from . This has been accomplished independently in and . The recent paper combines bounds obtained from the generator representation with the iterative method from the present article in order to obtain new bounds on derivatives of arbitrary order of the solution to the Gamma Stein equation, whose dependence on the shape parameter of the Gamma distribution is superior to previous bounds.

We also indicate how our iterative method can be applied to obtain bounds for the solution to a Stein equation for the exponential distribution, which are better than those previously obtained. We thus suggest that exploiting this iterative procedure can become a fruitful technique for a larger class of distributions.

The remainder of this paper is structured as follows: In Section 2 the general approach is motivated by means of a natural exchangeable pair in the context of the Pólya urn model and it is stressed by means of this example that the framework of exchangeable pairs within the density approach as developed in and is not always suitable and why one might want to use a different Stein characterization. Furthermore, our main application, Theorem 2.1, a quantitative distributional limit theorem for the relative number of drawn red balls is stated. Then, motivated by this example, in Section 3 a general version of Stein’s method for a large class of absolutely continuous distributions adapted to a given exchangeable pair is developed. In Section 4 the theory from Section 3 is specialized to the class of Beta distributions and Theorem 2.1 is proved. Finally, in Section 5 several proofs for statements from Sections 3 and 4 are given.

Acknowledgements

Most parts of the research which led to this article have been accomplished during the author’s PhD studies and, hence, there is a certain overlap with the author’s PhD thesis , see also the unpublished paper . The author was supported by the DFG via SFB/TR 12 during this time. We also refer to Appendix A of for a version of de l’Hôpital’s rule which covers (locally) absolutely continuous functions and which is general enough to justify all invocations of this famous tool within this article. Finally, Appendix B of contains some identities about the Gibbs sampling procedure, which are generally useful in the exchangeable pairs version of Stein’s method and which will be used in the present paper. I am grateful to an anonymous referee whose detailed and valuable comments and suggestions helped me improve the presentation of my results.

The Pólya urn model and motivation of our general approach

or, with $a:=\frac{r}{c}$ and $b:=\frac{w}{c}$,

where, for a real number $x$ and a nonnegative integer $m$, we define the generalized binomial coefficient by

$$\binom{x}{m}:=\frac{x(x-1)\cdots(x-m+1)}{m!}.$$

Here, $B(a,b)$ denotes the Euler Beta function $B(a,b)=\int_{0}^{1}x^{a-1}(1-x)^{b-1}dx$, which is related to the Gamma function $\Gamma(t)=\int_{0}^{\infty}x^{t-1}e^{-x}dx$ via

$$B(a,b)=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}.$$

where the constants $C(\cdot,\cdot)$ are defined in (47) and (48) below and $\lVert h^{\prime\prime}\rVert_{\infty}$ denotes the minimum Lipschitz constant of $h^{\prime}$.

Recall that a pair $(X,X^{\prime})$ of random elements on a common probability space is called exchangeable if

$$(X,X^{\prime})\stackrel{\mathcal{D}}{=}(X^{\prime},X).$$

Representation (6) for $W$ suggests constructing another random variable $W^{\prime}$ such that $W$ and $W^{\prime}$ make up an exchangeable pair using a Gibbs sampling procedure. Noticing that the random variables $X_{1},\dotsc,X_{n}$ are also exchangeable, the construction of $W^{\prime}$ can be simplified to the following: Observe $X_{1}=x_{1},\dotsc,X_{n}=x_{n}$ and construct $X_{n}^{\prime}$ according to the distribution $\mathcal{L}(X_{n}|X_{1}=x_{1},\dotsc,X_{n-1}=x_{n-1})$. Then, letting

the pair $(W,W^{\prime})$ is exchangeable. Note that $\lvert W-W^{\prime}\rvert\leq\frac{1}{n}$ is small, which suggests that the exchangeable pair $(W,W^{\prime})$ is beneficial for a Stein’s method approach to the proof of weak convergence of $\mathcal{L}(W_{n})$ to $Beta(a,b)$. From the exchangeable pairs approach within normal approximation (see e.g. , or ) and for non-normal approximation (see and ) we know that exchangeability of $(W,W^{\prime})$ is not enough to guarantee distributional closeness of $W$ and of $Z\sim Beta(a,b)$ but that a further regression property has to be satisfied.

The exchangeable pair $(W,W^{\prime})$ satisfies the regression property

$$E[W^{\prime}-W\,|\,W]=\lambda\gamma_{a,b}(W),$$

where $\gamma_{a,b}(x)=(a+b)\bigl(\frac{a}{a+b}-x\bigr)$ and $\lambda=\lambda_{n}=\frac{1}{n(a+b+n-1)}$.
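The regression property can be checked exactly for small urns. The following is a minimal sketch, not part of the original argument: it reads the property as $E[W^{\prime}-W|W]=\lambda\gamma_{a,b}(W)$, assumes integer urn parameters $r$, $w$, $c$ (initial red balls, initial white balls, balls added per draw), and uses the hypothetical helper name `regression_table`. It enumerates all draw sequences with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def regression_table(r, w, c, n):
    """Map k -> (E[W'-W | S_n = k], lambda * gamma_{a,b}(k/n)), computed exactly.

    r, w, c are positive integers (assumption of this sketch); n is the number of draws.
    """
    a, b = Fraction(r, c), Fraction(w, c)
    lam = 1 / (n * (a + b + n - 1))
    prob, drift = {}, {}
    for xs in product((0, 1), repeat=n):
        p_seq, s = Fraction(1), 0
        for j, x in enumerate(xs):
            # P(draw j+1 is red | s red balls among the first j draws)
            p_red = Fraction(r + c * s, r + w + c * j)
            p_seq *= p_red if x else 1 - p_red
            if j < n - 1:
                s += x
        # Now s = S_{n-1} and p_red = P(X_n' = 1 | X_1, ..., X_{n-1}).
        k = s + xs[-1]
        prob[k] = prob.get(k, 0) + p_seq
        # E[W' - W | X_1, ..., X_n] = (p_red - X_n) / n
        drift[k] = drift.get(k, 0) + p_seq * (p_red - xs[-1]) / n
    table = {}
    for k in sorted(prob):
        W = Fraction(k, n)
        gamma = (a + b) * (a / (a + b) - W)
        table[k] = (drift[k] / prob[k], lam * gamma)
    return table
```

Calling `regression_table(2, 3, 1, 5)` returns one pair of rationals per value of $S_{n}$; the two entries of each pair should coincide if the property holds as read here. The enumeration is exponential in $n$, so it is only meant for small cases.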

We have $W^{\prime}-W=\frac{X_{n}^{\prime}}{n}-\frac{X_{n}}{n}$ and by exchangeability of $X_{1},\ldots,X_{n}$ it clearly holds that $E[X_{n}|W]=E[X_{n}|S_{n}]=\frac{1}{n}S_{n}=W$. Also, by the definition of $X_{n}^{\prime}$ and since $X_{n}^{\prime}$ only assumes the values $0$ and $1$, we have for any $x_{1},\ldots,x_{n-1}\in\{0,1\}$

Thus, since $\sigma(W)\subseteq\sigma(X_{1},\ldots,X_{n})$, we obtain

From the theory developed in and in we know that if a given exchangeable pair $(W,W^{\prime})$ satisfies a regression property of the form

where $\lambda>0$ is a typically small constant and $R$ is negligible in size, then $\mathcal{L}(W)$ can be approximated by the absolutely continuous distribution whose density has logarithmic derivative $\psi$, if and only if the following additional condition is satisfied: It must be the case that

which is often paraphrased by saying that the term on the left hand side of (9) must satisfy a law of large numbers in order for the approximation to be accurate. Comparing (8) to the statement of Proposition 2.2 we see that according to the theory from or the only possibility would be to approximate the distribution of $W$ by a distribution whose density has logarithmic derivative equal to (a constant multiple of)

for $x$ in the support of this density, which should be equal to $[0,1]$ in this case. Since the logarithmic derivative $\psi_{a,b}$ of the density $p_{a,b}$ of $Beta(a,b)$ is given by

we conclude by way of contradiction that the law of large numbers (9) cannot hold. Indeed, we will see in Proposition 2.3 below that the term on the left hand side of (9) is close to the non-constant random quantity $W(1-W)$ rather than to the constant $1$. From Proposition 2.2 and some experience with the exchangeable pairs approach within Stein’s method we conclude that it would be desirable to have a Stein operator $L$ of the form

for the Beta distribution $Beta(a,b)$. Indeed, in Section 4 we will see that a random variable $Z\sim Beta(a,b)$ satisfies the Stein identity

for all $g$ in a suitable class of functions, i.e. we can let $\eta_{a,b}(x)=\eta(x)=x(1-x)$. Evidently, the Stein identity (12) was first found in and it was also used in . The statement of the following Proposition will make it possible to exploit the above constructed exchangeable pair $(W,W^{\prime})$ in connection with the Stein identity (12) in Section 4.
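For polynomial test functions the Stein identity can be verified in exact arithmetic. The sketch below is an illustration only: it assumes that the identity reads $E[Z(1-Z)g^{\prime}(Z)+\gamma_{a,b}(Z)g(Z)]=0$ with $\gamma_{a,b}(x)=a-(a+b)x$, takes $g(x)=x^{k}$, and uses the standard moment formula $E[Z^{k}]=\prod_{j=0}^{k-1}\frac{a+j}{a+b+j}$ for $Z\sim Beta(a,b)$. The function names are hypothetical.

```python
from fractions import Fraction


def beta_moment(a, b, k):
    """E[Z^k] for Z ~ Beta(a, b), as an exact fraction (a, b are Fractions)."""
    m = Fraction(1)
    for j in range(k):
        m *= (a + j) / (a + b + j)
    return m


def stein_functional(a, b, k):
    """E[Z(1-Z) g'(Z) + (a - (a+b) Z) g(Z)] for the monomial g(x) = x^k."""
    a, b = Fraction(a), Fraction(b)
    m_k = beta_moment(a, b, k)
    m_k1 = beta_moment(a, b, k + 1)
    # E[Z(1-Z) * k Z^(k-1)] = k (E[Z^k] - E[Z^(k+1)])
    derivative_part = k * (m_k - m_k1)
    # E[(a - (a+b) Z) Z^k] = a E[Z^k] - (a+b) E[Z^(k+1)]
    drift_part = a * m_k - (a + b) * m_k1
    return derivative_part + drift_part
```

Under the stated assumption, `stein_functional(a, b, k)` should vanish for every nonnegative integer `k`, because the two parts combine to $(a+k)E[Z^{k}]-(a+b+k)E[Z^{k+1}]$. Rational parameters can be passed as strings such as `"1/2"`.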

For the above constructed exchangeable pair $(W,W^{\prime})$ we have

From general facts about Gibbs sampling (see e.g. Appendix B in ) it is known that

Since $X_{n}^{2}=X_{n}$ we have from the proof of Proposition 2.2 that

where we have used $E[X_{n}|W]=W$ again. Finally, we compute

Putting pieces together, we eventually obtain

The last assertion easily follows from (2) and from $\lambda=\frac{1}{n(a+b+n-1)}$. ∎
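The conditional second moment of $W^{\prime}-W$ can be examined in the same brute-force way. This sketch is an illustration under explicit assumptions: the closed form $W(1-W)+\frac{a+(b-a)W}{2n}$ used for comparison is our own computation from the construction of $W^{\prime}$, not a quotation of the proposition, and the helper name `second_moment_table` as well as the integer urn parameters $r$, $w$, $c$ are hypothetical choices.

```python
from fractions import Fraction
from itertools import product


def second_moment_table(r, w, c, n):
    """Map k -> (E[(W'-W)^2 | S_n = k] / (2*lambda), W(1-W) + (a+(b-a)W)/(2n))."""
    a, b = Fraction(r, c), Fraction(w, c)
    lam = 1 / (n * (a + b + n - 1))
    prob, sq = {}, {}
    for xs in product((0, 1), repeat=n):
        p_seq, s = Fraction(1), 0
        for j, x in enumerate(xs):
            p_red = Fraction(r + c * s, r + w + c * j)
            p_seq *= p_red if x else 1 - p_red
            if j < n - 1:
                s += x
        # p_red is now P(X_n' = 1 | X_1, ..., X_{n-1}); (X_n' - X_n)^2 is 1
        # exactly when the resampled colour differs from the observed one.
        p_flip = (1 - p_red) if xs[-1] else p_red
        k = s + xs[-1]
        prob[k] = prob.get(k, 0) + p_seq
        sq[k] = sq.get(k, 0) + p_seq * p_flip / n**2
    table = {}
    for k in sorted(prob):
        W = Fraction(k, n)
        lhs = sq[k] / prob[k] / (2 * lam)
        rhs = W * (1 - W) + (a + (b - a) * W) / (2 * n)
        table[k] = (lhs, rhs)
    return table
```

For fixed $a$ and $b$ the correction term is of order $\frac{1}{n}$, which is consistent with the statement that the normalized conditional second moment is close to $W(1-W)$ rather than to a constant.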

One main aspect of the theoretical contribution of this article is to emphasize that it is no coincidence that

but that this is a natural replacement of condition (9) from the density approach to our class of Stein operators of the form (15) below. We end this motivational section by an abstraction of the ideas in the context of the Pólya urn model and the limiting Beta distribution above. Suppose we are given a sequence of random variables $W=W_{n}$ of which we know that, as $n\to\infty$, it converges in distribution to a random variable $Z$ with an absolutely continuous distribution and density $p$ with respect to the Lebesgue measure. We will also assume that $p$ itself is absolutely continuous (on each compact subinterval of its support $\overline{(a,b)}$, where $-\infty\leq a<b\leq\infty$ are extended real numbers). Suppose also that we can naturally construct a random variable $W^{\prime}$, a small random perturbation of $W$, such that $(W,W^{\prime})$ is an exchangeable pair, $\lvert W-W^{\prime}\rvert$ is small in a certain sense and that a regression property of the form

holds, where $\gamma$ is a certain function on the support of $\mathcal{L}(Z)$, $\lambda>0$ is constant and $R$ is a negligible remainder term. The goal is to compute a rate of convergence for the distributional convergence $W\rightarrow Z$ by Stein’s method of exchangeable pairs for $\mathcal{L}(Z)$. By the above reasoning it would be beneficial to have a characterizing Stein operator $L$ for $Z$ of the form

where $\eta$ is a function that still has to be found. One might suppose that, in order that $L$ characterizes $\mathcal{L}(Z)$, given the density $p$ of $Z$ and the function $\gamma$, the function $\eta$ is unique, but we will see that this is only so up to a constant multiple of $p^{-1}$. Note that by exchangeability

where the first approximation is by the assumption that $R$ is of negligible order and the second is by the fact that $W$ converges to $Z$ in distribution. Hence, it is natural to assume from the outset that $E[\gamma(Z)]=0$. In particular, we should assume that $E\lvert\gamma(Z)\rvert<\infty$. A natural question is, given $p$ and $\gamma$, whether there is a general formula for the function $\eta$. In the preprint the first order linear differential equation

was found by making, for a given test function $h$, the ansatz $g_{h}(x)=\alpha(x)f_{h}(x)$ for the solutions $g_{h}$ of the Stein equation

belonging to the operator (15) and $f_{h}$ of the Stein equation

corresponding to the density approach. Here, again $\psi$ denotes the logarithmic derivative of $p$. In this paper we follow a different, more direct reasoning. If $\eta$ is such that (15) is characterizing $\mathcal{L}(Z)$, then, for suitable functions $g$ by partial integration:

Thus, if we want this expression to equal

then from (2) we conclude that $\eta$ must satisfy (17). Of course, (17) can be solved by the method of variation of the constant and it turns out that

is a particular solution which even satisfies $(\eta p)(a+)=(\eta p)(b-)=0$ whenever $E[\gamma(Z)]=0$ and, hence, the boundary conditions

hold for each regular enough, say e.g. bounded, function $g$. Also note that every other solution to (17) has the form

Hence, in all these cases, using the density approach implicitly entails choosing $\eta_{\kappa}$ with $\kappa=p(a+)$. When developing the general theory in Section 3 we restrict ourselves to the solution $\eta$ given by (21), i.e. to $\kappa=0$. We thus already mention at this point that the density approach for $p$ is included in the theory presented in Section 3 if and only if

However, at least if $\gamma(x)=c(E[Z]-x)$, it turns out that in many cases $\eta$ given by (21) has a neat analytical representation, e.g. it is given by a polynomial of degree at most $2$, whereas the choice $\kappa\not=0$ would introduce a complicated coefficient into (18) originating from the term $p(x)^{-1}$. For instance, if $Z\sim N(0,1)$ is standard normally distributed and $\gamma(x)=-x$, then (21) yields $\eta\equiv 1$, whereas the general expression is $\eta_{\kappa}(x)=1+\kappa e^{x^{2}/2}$, which is difficult to handle in practice. Furthermore, if $p$ is not bounded away from zero, then $\kappa\not=0$ gives an unbounded function $\eta_{\kappa}$, whereas $\eta$ given by (21) usually is bounded, at least if $a>-\infty$ and $b<\infty$ (see, e.g. Proposition 3.5 below). In the next section we will see that under certain mild conditions on the density $p$ of $Z$ and on the coefficient $\gamma$, which, of course, need not originate from an exchangeable pair, the operator $L$ given by (15) is indeed characterizing $\mathcal{L}(Z)$, and we will prove bounds on the solution to the corresponding Stein equation (18) for suitable test functions $h$. Finally, we want to propose a strategy for how to proceed if, contrary to the above reasoning, we do not know the limiting density $p$ from the outset but are only given an exchangeable pair $(W,W^{\prime})$ such that (14) holds and also

and, hence, for $x\in J$ and $x_{0}\in J$ an arbitrary point, we have

Here, of course, $K=p(x_{0})\eta(x_{0})$ is the normalization constant. Formula (2) shows that $p$ is uniquely determined by $\gamma$ and $\eta$. Furthermore, in Theorem 3.22 we will give precise criteria for $\gamma$ and $p$ defined by (2) to satisfy

and for $\eta$ to satisfy (21) so that the results of the theory developed in Section 3 can in fact be applied. This, together with Proposition 3.19 and Remark 3.20 (iii), suggests the approximation of $\mathcal{L}(W)$ by the distribution with density $p$, if the exchangeable pair $(W,W^{\prime})$ satisfies (14) and (23). Note that this idea yields a certain extension of the methodology proposed in , where only Stein characterizations from the density approach are put to use.
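As a worked instance of this strategy, consider the coefficients found for the Pólya urn, $\gamma(x)=a-(a+b)x$ and $\eta(x)=x(1-x)$ on $J=(0,1)$, where $a$ and $b$ now denote the Beta parameters and not the end points of the support. The computation below is a sketch under the assumption that the density formula takes the form $p(x)=\frac{K}{\eta(x)}\exp\bigl(\int_{x_{0}}^{x}\frac{\gamma(t)}{\eta(t)}dt\bigr)$, which is what one obtains by integrating $(\eta p)^{\prime}=\gamma p$.

```latex
\begin{align*}
\frac{\gamma(t)}{\eta(t)}
  &= \frac{a-(a+b)t}{t(1-t)}
   = \frac{a}{t}-\frac{b}{1-t},\\
\int_{x_{0}}^{x}\frac{\gamma(t)}{\eta(t)}\,dt
  &= a\log\frac{x}{x_{0}}+b\log\frac{1-x}{1-x_{0}},\\
p(x)
  &= \frac{K}{x(1-x)}\Bigl(\frac{x}{x_{0}}\Bigr)^{a}\Bigl(\frac{1-x}{1-x_{0}}\Bigr)^{b}
   \;\propto\; x^{a-1}(1-x)^{b-1}.
\end{align*}
```

Up to normalization this is the density of $Beta(a,b)$, so in this example the pair $(\gamma,\eta)$ alone already identifies the limiting distribution.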

The general approach

Motivated by Section 2, in this section we develop a general version of Stein’s method for a random variable $Z$ with an absolutely continuous distribution with respect to the Lebesgue measure. This version is useful for those distributions, which allow for a tractable first order linear Stein operator. This class covers many of the standard absolutely continuous distributions. However, it should not be left unmentioned that certain distributions, like the Laplace , the Variance-Gamma and the PRR distribution fall outside the scope of this approach, as they only possess a second order linear Stein operator with tractable coefficients.

For some extended real numbers $-\infty\leq a<b\leq\infty$ the density $p$ is positive and locally absolutely continuous on the interval $(a,b)$.

$\gamma$ is Borel-measurable and not identically equal to zero,

$\gamma$ is decreasing on $\overline{(a,b)}$,

$E\lvert\gamma(Z)\rvert=\int_{a}^{b}|\gamma(t)|p(t)dt<\infty$ and in fact $E[\gamma(Z)]=\int_{a}^{b}\gamma(t)p(t)dt=0$.

Henceforth, we will always assume that Conditions 3.1 and 3.2 are satisfied. Note that by Condition 3.2 there exists a point $x_{0}\in(a,b)$ such that

though it might not be unique. For definiteness, we choose

and by the positivity of $p$ on $(a,b)$ we can define the function $\eta$ on $(a,b)$ by

The following proposition lists some properties of the function $I$.

Under Conditions 3.1 and 3.2 the function $I$ has the following properties:

$I$ is locally absolutely continuous on $\overline{(a,b)}$.

$I$ is increasing on $\overline{(a,x_{0}]}$ and decreasing on $\overline{[x_{0},b)}$ and, hence, attains its global maximum at $x_{0}$.

Of course, (a) follows from the fundamental theorem of calculus for Lebesgue integration and the second part of (b) is immediate from item (iii) of Condition 3.2. Finally, (c) and the first part of (b) follow from the second part of (b) and (25). ∎
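To make the objects $I$ and $\eta$ concrete, the following sketch evaluates them numerically for a Beta density. It is an illustration only and assumes the definitions $I(x)=\int_{a}^{x}\gamma(t)p(t)dt$ and $\eta=I/p$; the coefficient $\gamma(t)=a-(a+b)t$ is the one from the Pólya urn example, the composite Simpson rule is our choice of quadrature, and all function names are hypothetical. Beta parameters are restricted to values at least $1$ so that the density is finite at the left end point.

```python
import math


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def beta_pdf(x, a, b):
    """Density of Beta(a, b) on [0, 1]; this sketch assumes a, b >= 1."""
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / norm


def eta(x, a, b):
    """eta(x) = I(x) / p(x) with I(x) = int_0^x gamma(t) p(t) dt."""
    gamma = lambda t: a - (a + b) * t
    big_i = simpson(lambda t: gamma(t) * beta_pdf(t, a, b), 0.0, x)
    return big_i / beta_pdf(x, a, b)
```

For example, `eta(0.3, 2, 3)` can be compared with $x(1-x)=0.21$, the closed form stated for the Beta distribution in Section 2.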

If $a>-\infty$ and/or $b<\infty$, then it is of interest to know under what circumstances it is possible to extend $\eta$ to a continuous function on $\overline{(a,b)}$ because we would like to have $\eta(W)$ make sense, even if $W$ assumes one of the boundary values $a$ and $b$ with positive probability. We will see that in most cases we indeed have $\eta(a+)=0$ or $\eta(b-)=0$ if $a>-\infty$ or if $b<\infty$, respectively. The following Mills ratio condition is satisfied by most absolutely continuous distributions and will in fact turn out to be equivalent to the asserted boundary behaviour of $\eta$. From now on, we will denote by $F$ the distribution function corresponding to the density $p$.

The density $p$ of $Z$ satisfies all the properties from Condition 3.1 and also the following:

If $a>-\infty$, then $\lim_{x\downarrow a}\frac{F(x)}{p(x)}=0$.

If $b<\infty$, then $\lim_{x\uparrow b}\frac{1-F(x)}{p(x)}=0$.

Assume that Conditions 3.1 and 3.2 hold for $p$ and $\gamma$, respectively. Then, the function $\eta$ vanishes at the finite end points of the support $\overline{(a,b)}$ of $\mathcal{L}(Z)$, i.e. $\eta(a+)=0$ whenever $a>-\infty$ and $\eta(b-)=0$ whenever $b<\infty$, if and only if Condition 3.4 is satisfied. Thus, in this case we can extend $\eta$ to a continuous function on $\overline{(a,b)}$ vanishing at the finite end points of this interval.

Not every density $p$ satisfies Condition 3.4 as is clarified by the following example.

Let $\delta_{n}\in(0,1)$, $n\geq 1$, be such that $\sum_{n\geq 1}\delta_{n}=1$ and define $x_{n}:=1-\sum_{j=1}^{n-1}\delta_{j}=\sum_{j=n}^{\infty}\delta_{j}$ and $I_{n}:=[x_{n+1},x_{n}]$, $n\geq 1$. Furthermore let $q$ be the unique continuous function, which is linear on each interval $I_{n}$ and such that $q(x_{2n})=\delta_{2n}^{2}$ and $q(x_{2n+1})=\delta_{2n}$ for $n\geq 1$ and $q(1):=\delta_{1}$. Define $p$ to be the probability density which is a constant multiple of $q$. Then, $p$ satisfies Condition 3.1 with $a=0$ and $b=1$ but Condition 3.4 does not hold: We have $\lim_{n\to\infty}x_{2n}=0$ but

Note that $p$ satisfies $\lim_{x\to 0}p(x)=0$, so that this does not only happen because $p(0+)$ might not exist.

The counterexample given in Example 3.6 is quite artificial. Indeed, the following proposition lists mild assumptions on the density $p$ which guarantee that Condition 3.4 is satisfied. In practice, at least one of these assumptions is usually met. In particular, note that by part (f) of Proposition 3.7 the Mills ratio limits from Condition 3.4 at finite boundary points $a$ or $b$ are always zero, whenever they exist.

Assume $a>-\infty$. In each of the following cases $\lim_{x\downarrow a}\frac{F(x)}{p(x)}=0$.

The density $p$ is bounded away from zero in a suitable neighbourhood of $a$.

We have $p(a+)=0$ and there is a $\delta>0$ such that $p$ is increasing on $(a,a+\delta)$.

We have $p(a+)=0$ and there is a $\delta>0$ such that $p$ is convex on $(a,a+\delta)$.

We have $p(a+)=0$ and there is a $\delta>0$ such that $p$ is concave on $(a,a+\delta)$.

The limit $\lim_{x\downarrow a}\frac{F(x)}{p(x)}$ exists.

Of course, similar conditions guarantee that $\lim_{x\uparrow b}\frac{1-F(x)}{p(x)}=0$ if $b<\infty$.

if $h$ has a right limit at $a$. Note that $\gamma$ has a right limit at $a$ since it is decreasing. Similarly,

If $a>-\infty$ and $h$ has a right limit at $a$, then $g_{h}$ can be extended continuously to $a$ by letting $\displaystyle g_{h}(a):=\frac{h(a+)-E[h(Z)]}{\gamma(a+)}$.

If $b<\infty$ and $h$ has a left limit at $b$, then $g_{h}$ can be extended continuously to $b$ by letting $\displaystyle g_{h}(b):=\frac{h(b-)-E[h(Z)]}{\gamma(b-)}$.

The success of Stein’s method within applications considerably depends on good bounds on the solutions $g_{h}$ and their lower order derivatives, generally uniformly over some given class of test functions $h$. The next step will be to prove such bounds. It has to be mentioned that we cannot expect to derive concrete good bounds in full generality, but that sometimes further conditions have to be imposed either on the density $p$ or on the coefficient $\gamma$. Nevertheless, we will derive bounds involving functional expressions which can be simplified, computed or further bounded a posteriori for concrete distributions. So our abstract viewpoint will pay off. Moreover, some of our general bounds will already be explicit. In what follows, we denote by $g_{h}$ the standard solution to Stein’s equation (18) on $\overline{(a,b)}$, implicitly assuming that $h$ satisfies the assumptions of Proposition 3.8. Furthermore, for a function $f$ we denote by $\lVert f\rVert_{\infty}$ its essential supremum norm on $\overline{(a,b)}$. Note that this implies for $f$ a Lipschitz-continuous function on $\overline{(a,b)}$ that $\lVert f^{\prime}\rVert_{\infty}$ is just its minimum Lipschitz constant. First we give bounds for bounded and measurable test functions $h$.

The proof is deferred to Section 5. The following corollary specializes this result to the case that $\gamma(x)=-c(x-E[Z])$ and that $\mathcal{L}(Z)$ is symmetric with respect to its mean $E[Z]$, i.e. $Z-E[Z]\stackrel{\mathcal{D}}{=}E[Z]-Z$. Then, it is also clear that $m=E[Z]$.

In this case we clearly have $I(m)=\frac{c}{2}E[\lvert Z-m\rvert]$ which implies the result by Proposition 3.9. ∎

In the case that $Z\sim N(0,1)$ and $c=1$ this result specializes to the well known bound $\lVert g_{h}\rVert_{\infty}\leq\sqrt{\frac{\pi}{2}}\lVert h-E[h(Z)]\rVert_{\infty}$ (see or , e.g.).
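The constant $\sqrt{\frac{\pi}{2}}$ in the normal case can be traced through the quantity $I(m)$. The following lines are a sketch under the assumption, inferred from the proof of the corollary and not quoted from its statement, that the general bound has the form $\lVert g_{h}\rVert_{\infty}\leq\frac{\lVert h-E[h(Z)]\rVert_{\infty}}{2I(m)}$ with $I(x)=\int_{a}^{x}\gamma(t)p(t)dt$; here $m=0$, $\gamma(t)=-t$ and $p$ is the standard normal density.

```latex
\begin{align*}
I(0) &= \int_{-\infty}^{0}(-t)\,\frac{e^{-t^{2}/2}}{\sqrt{2\pi}}\,dt
      = \frac{1}{\sqrt{2\pi}}
      = \frac{1}{2}E\lvert Z\rvert,
      \qquad E\lvert Z\rvert=\sqrt{\frac{2}{\pi}},\\
\frac{1}{2I(0)} &= \frac{\sqrt{2\pi}}{2}=\sqrt{\frac{\pi}{2}}.
\end{align*}
```

The first equality uses that $\frac{d}{dt}e^{-t^{2}/2}=-te^{-t^{2}/2}$, so the integrand is an exact derivative.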

In the statement of Proposition 3.9 it might surprise that there is no bound mentioned for $\lVert g_{h}^{\prime}\rVert_{\infty}$. This is because, in general, a bound of the form $\lVert g_{h}^{\prime}\rVert_{\infty}\leq C\lVert h-E[h(Z)]\rVert_{\infty}$ with a finite constant $C$ does not exist in this setup. For instance, for $z>0$ and $Z$ having the exponential distribution with mean one, consider the Stein equation

Identity (3.3) from shows that for $x>z$ the solution $g_{z}$ to (33) satisfies

proving that such a constant $C$ in general cannot exist. Note also that this is contrary to the density approach, where one usually has such a bound (see or ).

The Kolmogorov distance between a given random variable $W$ and $Z$ is induced by the class of test functions $h_{z}:=1_{(-\infty,z]}$, where $z\in(a,b)$. In this situation it is easy to verify that the standard solution $g_{z}:=g_{h_{z}}$ to (18) is given by

By using de l’Hôpital’s rule it is not hard to check that always $\sup_{z\in(a,b)}S(z)<\infty$. Furthermore, $g_{z}$ is Lipschitz-continuous and on $(a,b)\setminus\{z\}$ it is infinitely often continuously differentiable with

where the functions $H$ and $G$ are defined in Proposition 3.13. From the negative example of (a) we already know that, in general, there is no finite constant $C$ such that

Nevertheless, even in such a situation, one may use the uniform bound on $S$ and a $z$-dependent bound on $\lVert g_{z}^{\prime}\rVert_{\infty}$ as well as particular properties of $W$ to prove accurate bounds on the Kolmogorov distance. This was done in for the exponential distribution. Incidentally, in the case of the Beta distribution, the function $S$ will be bounded for a different purpose in the proof of Proposition 4.2.

Proposition 3.9 is already sufficient to prove that the operator $L$ given by (15) characterizes the distribution of $Z$. The proof is given in Section 5.

A random variable $X$ with values in $\overline{(a,b)}$ has the same distribution as $Z$ if and only if for each continuous function $f$ on $\overline{(a,b)}$, which is locally absolutely continuous on $(a,b)$ and which satisfies $E\lvert\eta(Z)f^{\prime}(Z)\rvert=\int_{a}^{b}\lvert f^{\prime}(x)\rvert I(x)dx<\infty$ we have

In particular, in this case both expected values exist.

Next, we will turn to Lipschitz continuous test functions $h$. In contrast to bounded measurable test functions, there we will also be able to prove useful bounds for $g_{h}^{\prime}$. In order that $E[h(Z)]$ exists for Lipschitz continuous test functions $h$ we need to assume that $E\lvert Z\rvert<\infty$. The following two results, which are also proved in Section 5, include optimal bounds for both $g_{h}$ and $g_{h}^{\prime}$, when $h$ is Lipschitz.

$\displaystyle\lvert g_{h}(x)\rvert\leq\lVert h^{\prime}\rVert_{\infty}\frac{F(x)E[Z]-\int_{a}^{x}yp(y)dy}{I(x)}=\lVert h^{\prime}\rVert_{\infty}\frac{\int_{a}^{x}(E[Z]-y)p(y)dy}{I(x)}\,;$

$\displaystyle\lvert g_{h}^{\prime}(x)\rvert\leq\lVert h^{\prime}\rVert_{\infty}\frac{\int_{a}^{x}F(s)ds\,G(x)+\int_{x}^{b}(1-F(s))ds\,H(x)}{p(x)\eta(x)^{2}}$.

Here, for $x\in\overline{(a,b)}$, the positive functions $H(x)$ and $G(x)$ are defined by

Moreover, these bounds are optimal among all bounds involving the factor $\lVert h^{\prime}\rVert_{\infty}$.

If $a>-\infty$ and $b<\infty$, then it follows by an application of de l’Hôpital’s rule that the function $S(x):=\frac{\int_{a}^{x}(E[Z]-y)p(y)dy}{I(x)}$ is bounded on $(a,b)$. Indeed, if $a>-\infty$, for instance, we have that

However, in general $S(x)$ is unbounded, if $\lvert\gamma(x)\rvert$ does not grow at least linearly with $x$. For instance, if $Z\sim N(0,1)$ and $\gamma(t)=-\operatorname{sign}(t)$, then we have for positive $x$ that

The bound for $\lvert g_{h}(x)\rvert$ in part (a) of Proposition 3.13 can be written as

where $\tau$ is the so-called Stein factor or Stein kernel of $Z$ given by

i.e. $\tau$ is the function $\eta$ which belongs to the choice $\gamma(x)=E[Z]-x$. The Stein kernel $\tau$ appeared first in Lecture $6$ of and it has turned out to be a fundamental object in Stein’s method for one-dimensional absolutely continuous distributions (see, e.g. , and ).
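A quick numerical illustration of the Stein kernel: the sketch below assumes the representation $\tau(x)=\frac{1}{p(x)}\int_{a}^{x}(E[Z]-y)p(y)dy$, i.e. the function $\eta$ for $\gamma(x)=E[Z]-x$, and evaluates it for an exponential distribution. The parametrization by a rate (so that the mean is the reciprocal of the rate), the composite Simpson quadrature and the function names are assumptions made for this example.

```python
import math


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def stein_kernel_exp(x, rate=1.0):
    """tau(x) = (1/p(x)) * int_0^x (E[Z] - y) p(y) dy for p(y) = rate * exp(-rate*y)."""
    mean = 1.0 / rate
    p = lambda y: rate * math.exp(-rate * y)
    return simpson(lambda y: (mean - y) * p(y), 0.0, x) / p(x)
```

Since $\frac{d}{dy}\bigl(ye^{-\alpha y}\bigr)=\alpha\bigl(\frac{1}{\alpha}-y\bigr)e^{-\alpha y}$, the integral equals $xe^{-\alpha x}$ for rate $\alpha$, so the numerical values can be compared with the linear function $\frac{x}{\alpha}$.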

ghhc\displaystyle\lVert g_{h}\rVert_{\infty}\leq\frac{\lVert h^{\prime}\rVert_{\infty}}{c}\,;

gh(x)2hcH(x)G(x)η(x)2p(x)=2chaxF(s)dsxb(1F(t))dtη(x)2p(x)\displaystyle\lvert g_{h}^{\prime}(x)\rvert\leq\frac{2\lVert h^{\prime}\rVert_{\infty}}{c}\frac{H(x)G(x)}{\eta(x)^{2}p(x)}=2c\lVert h^{\prime}\rVert_{\infty}\frac{\int_{a}^{x}F(s)ds\int_{x}^{b}(1-F(t))dt}{\eta(x)^{2}p(x)}.

In the case of the normal distribution (via its classical Stein equation) the bound given in Corollary 3.15 (a) reduces to ghh\lVert g_{h}\rVert_{\infty}\leq\lVert h^{\prime}\rVert_{\infty}. Formally, this bound is a special instance of a general bound given in Lemma 3.1 of for the multivariate standard normal distribution (see also Lemma 2.6 in ). However, this lemma is stated under the additional assumption that hh has three bounded derivatives, which is stronger than being Lipschitz-continuous. Yet, as has been pointed out to me by the referee, one can use the generator representation of the solution to the Stein equation to obtain the same bound as in Corollary 3.15 (a) for once differentiable test functions hh with bounded first derivative by applying the well-known consequences of the dominated convergence theorem on differentiating under the integral sign. Then, using smoothing techniques, this result could be extended to the class of Lipschitz-continuous test functions, yielding an alternative proof of this bound. Nevertheless, in the context of Stein’s method for the univariate normal distribution, the best bound mentioned on ghg_{h} for a Lipschitz test function hh is gh2h\lVert g_{h}\rVert_{\infty}\leq 2\lVert h^{\prime}\rVert_{\infty} (see, e.g. or ). Hence, we believe that Corollary 3.15 (a) is the first result that rigorously proves the aforementioned bound, although, as described above, it can also be proved by means of existing techniques from the generator framework.

For concrete distributions the ratio appearing in the bounds for $g_{h}^{\prime}(x)$ may be bounded uniformly in $x$ by some constant, which can sometimes also be computed explicitly. For instance, this is done for the Beta distribution in Section 4. Furthermore, for the situation of Corollary 3.15, in the authors give mild conditions for the existence of a finite constant $k$ such that $\lVert g_{h}^{\prime}\rVert_{\infty}\leq k\lVert h^{\prime}\rVert_{\infty}$ for any Lipschitz-continuous $h$. In practice, these conditions are usually met. However, their method of proof gives no way of estimating the constant $k$. Thus, for concrete distributions and explicit constants it might be useful to work with our bounds from Corollary 3.15 (b) or from Proposition 3.13.

Now, we show how we can use the above results and the density formula (2) to give bounds on higher order derivatives of $g_{h}$, if $h$ itself is smooth enough. First note that the constant $K$ from (2) is given by

Formula (35) is a more general version of formula (3.14) in and is also derived in . Now, if the coefficient $\gamma$ is also absolutely continuous, by differentiating Stein’s equation (18), we obtain for $h$ Lipschitz

for the distribution $\operatorname{Exp}(\alpha)$, if $h$ is continuously differentiable on $[0,\infty)$ and both $h$ and $h^{\prime}$ are Lipschitz:

These bounds are better than those derived in and, additionally, since we do not have to assume that $h^{\prime}(0)=0$ for the bound on $\lVert g_{h}^{\prime\prime}\rVert_{\infty}$ to be valid, one term in the bounds of Theorems 1.1 and 1.2 from would drop out if our bounds were used instead.

Next, we introduce the approach of exchangeable pairs satisfying the regression properties (14) and (23) in our general framework. As was observed in for the normal distribution, in case of univariate distributional approximations, one does not need the full strength of exchangeability, but equality in distribution of the random variables $W$ and $W^{\prime}$ is sufficient. This may allow for a greater choice of admissible couplings in several situations or, at least, ease the verification of asserted properties. Thus, let $W,W^{\prime}$ be real-valued random variables defined on the same probability space such that $W\stackrel{\mathcal{D}}{=}W^{\prime}$. We will assume that the random variables $W$ and $W^{\prime}$ only take values in an interval $(a,b)\subseteq J\subseteq\overline{(a,b)}$ where both functions $\eta$ and $\gamma$ are defined (recall that it might be the case that $\eta$ can only be defined on $(a,b)$). However, from Proposition 3.5 we know that we can let $J=\overline{(a,b)}$ if Condition 3.4 holds.

where $\lVert f^{\prime\prime}\rVert_{\infty}$ denotes the minimum Lipschitz constant of $f^{\prime}$.

The bound (3.19) can only be small if $S$ and $R$ are of negligible order.

The proof shows that Proposition 3.19 can easily be generalized to the situation where there is a sub-$\sigma$-algebra $\mathcal{F}$ with $\sigma(W)\subseteq\mathcal{F}$ and the more general regression properties

hold for some $\mathcal{F}$-measurable remainder terms $R$ and $S$.

If $\mathcal{H}$ is some class of test functions such that there are finite, positive constants $c_{0}$, $c_{1}$ and $c_{2}$ with $\lVert g_{h}\rVert_{\infty}\leq c_{0}$, $\lVert g_{h}^{\prime}\rVert_{\infty}\leq c_{1}$ and $\lVert g_{h}^{\prime\prime}\rVert_{\infty}\leq c_{2}$ for each $h\in\mathcal{H}$, then (3.19) immediately yields a bound on the distance

Finally, in our general framework, we readdress the last issue discussed in Section 2. Namely, we suppose that we are given two functions $\gamma$ and $\eta$ such that for some $-\infty\leq a<b\leq\infty$ the function $\gamma$ is defined on $\overline{(a,b)}$, $\eta$ is defined at least on $(a,b)$ and the following properties hold.

The function $\gamma$ is decreasing and such that $0<\gamma(a+)\leq\infty$ and $-\infty\leq\gamma(b-)<0$. Again, we define $x_{0}\in(a,b)$ by $x_{0}:=\sup\{x\in(a,b)\,:\,\gamma(x)>0\}$.

The function $\eta$ is positive and locally absolutely continuous on $(a,b)$.

The function $\gamma/\eta$ is locally integrable on $(a,b)$ and, if we define

Note that by definition we have $Q(x)\leq 0$ for all $x\in(a,b)$ if Condition 3.21 is satisfied. Now, we define the density $p$ by relation (2) with $K$ being a suitable normalizing constant. The existence of $K$ follows from the fact that, by Condition 3.21, for each $c\in(a,x_{0})$ there is a finite constant $L>0$ such that $L\gamma(x)\geq 1$ for each $x\in(a,c)$. Thus,

for each $d\in(x_{0},b)$. Hence, $p$ can be suitably normalized. Now, let $Z$ be a random variable with probability density function $p$. The next result is a generalization of Lemma 3, Lecture 6 in .

If Condition 3.21 is satisfied, then the density $p$ defined by (2) is such that

In particular, the theory developed in this section can be applied in this framework.

Thus, $E[\gamma(Z)]=0$. The second claim follows from

Stein’s method for the Beta distribution

In this section we specialize the theory from Section 3 to the family $Beta(a,b)$, $a,b>0$, of Beta distributions as defined in Section 2. Let us fix $a,b>0$ and from now on assume that $Z\sim Beta(a,b)$. Motivated by the Pólya urn example, the above constructed exchangeable pair $(W,W^{\prime})$ and by Proposition 2.2 we define the function $\gamma:=\gamma_{a,b}$ as in Proposition 2.2 and observe that

It is thus easy to see that $\gamma$ satisfies all assumptions of Condition 3.2 and also that the Beta density $p:=p_{a,b}$ given by (4) satisfies Conditions 3.1 and 3.4, the latter either directly or by Proposition 3.7. We claim that the function $\eta$ defined by (21) is given by

which easily follows from differentiating both sides of (43) and using (10). Thus, from Proposition 3.12 we immediately obtain the following Stein characterization for the Beta distribution. This result substantially extends Theorem 1 in in the case of the Beta distribution, which is weaker as it only characterizes the Beta distribution among the class of absolutely continuous distributions with finite second moment.

A random variable $X$ with values in $[0,1]$ has the distribution $Beta(a,b)$ if and only if for each continuous function $f$ on $[0,1]$, which is locally absolutely continuous on $(0,1)$ such that $E\lvert Z(1-Z)f^{\prime}(Z)\rvert<\infty$, we have

For the Beta distribution and a measurable function $h$ with $E\lvert h(Z)\rvert<\infty$, the Stein equation (18) is given by

and the standard solution (3) has the form

if $h$ has a right limit at $0$ and a left limit at $1$ by Proposition 3.8. We mention that the same Stein equation (44) has already been considered in , and in .
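The Stein characterization above can be checked numerically. The sketch below assumes that the characterizing identity takes the form $E[Z(1-Z)f^{\prime}(Z)+\gamma(Z)f(Z)]=0$ with $\eta(x)=x(1-x)$ and $\gamma(x)=a-(a+b)x$. This form of $\gamma$ is inferred from the later remark that $\gamma(x)=a(1-2x)$ for $a=b$ and from the constant $a+b$ in the bounds, not quoted from Proposition 2.2. The helper names `beta_pdf` and `stein_expectation` are hypothetical.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), computed via log-gamma."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def stein_expectation(f, f_prime, a, b, n=20_000):
    """Midpoint-rule approximation of E[Z(1-Z) f'(Z) + (a - (a+b) Z) f(Z)].

    The operator form is an assumption inferred from the surrounding text.
    """
    step = 1.0 / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * step  # midpoints never hit the endpoints 0 and 1
        operator_value = x * (1 - x) * f_prime(x) + (a - (a + b) * x) * f(x)
        total += operator_value * beta_pdf(x, a, b)
    return total * step

# For Z ~ Beta(2, 3) both expectations should vanish up to quadrature error.
print(stein_expectation(lambda x: x * x, lambda x: 2 * x, 2.0, 3.0))
print(stein_expectation(math.sin, math.cos, 2.0, 3.0))
```

For $f(x)=x^{2}$ the identity can also be confirmed by hand from the moments $E[Z^{2}]$ and $E[Z^{3}]$ of the Beta distribution, which is why that test function is used here.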

From Proposition 3.9 and Corollary 3.15 we can derive the following bounds for the solution (45) to (44). The proof is given in Section 5.

If $h$ is bounded, then $\displaystyle\lVert g_{h}\rVert_{\infty}\leq\frac{\lVert h-E[h(Z)]\rVert_{\infty}}{2m(1-m)p(m)}$, where $m$ is the median of $Beta(a,b)$.

If $h$ is Lipschitz, then $\displaystyle\lVert g_{h}\rVert_{\infty}\leq\frac{\lVert h^{\prime}\rVert_{\infty}}{a+b}$ and $\lVert g_{h}^{\prime}\rVert_{\infty}\leq C(a,b)\lVert h^{\prime}\rVert_{\infty}$, where $C(a,b)$ is given by (47) and (48).

If $h$ is continuously differentiable with Lipschitz derivative $h^{\prime}$, then $g_{h}^{\prime}$ is Lipschitz and $\displaystyle\lVert g_{h}^{\prime\prime}\rVert_{\infty}\leq C(a+1,b+1)\lVert h^{\prime\prime}\rVert_{\infty}+(a+b)C(a+1,b+1)C(a,b)\lVert h^{\prime}\rVert_{\infty}$.

More generally, if $m\geq 1$ is an integer and $h$ is at least $(m-1)$-times differentiable such that $h^{(j)}$ is Lipschitz-continuous for $j=0,\dotsc,m-1$, then $g_{h}^{(m-1)}$ is Lipschitz and

where we define an empty product to be equal to $1$.

It is worthwhile to compare our bound for $\lVert g_{h}^{\prime}\rVert_{\infty}$ from Proposition 4.2 (b) to the bound $\lVert g_{h}^{\prime}\rVert_{\infty}\leq(b_{0}+b_{1})\lVert h^{\prime}\rVert_{\infty}$ given in . One can show that if $a=b$, then our bound is uniformly better than theirs. However, if $a\not=b$, then there are regions for $(a,b)$ where our constant $C(a,b)$ is smaller and others where their $b_{0}+b_{1}$ is smaller. For instance, if $0<a,b\leq 1$, then, again, $C(a,b)\leq b_{0}+b_{1}$. But, if $1<b<2$ is fixed and $a$ tends to zero, then $C(a,b)$ goes to infinity while their $b_{0}+b_{1}$ tends to $12$. In any case, neither our bound nor the bound from seems to be optimal for $\lVert g_{h}^{\prime}\rVert_{\infty}$.

From Corollary 3.15 (b) we know that for Lipschitz $h$ and $x\in(0,1)$

By an application of de l’Hôpital’s rule, one can show that

We conjecture that if $\min(a,b)<1$, then

i.e. that $B$ assumes its maximum value at the boundary of $(0,1)$. However, if $\min(a,b)>1$, then we believe that there is always an $x_{1}\in(0,1)$ such that

If $a=b$, then the median of $Beta(a,a)$ equals $1/2$ and the bound in (a) has the explicit form $\lVert g_{h}\rVert_{\infty}\leq B(a,a)2^{a+b-1}\lVert h-E[h(Z)]\rVert_{\infty}$. Unfortunately, for $a\not=b$ there is no closed-form expression for the median of $Beta(a,b)$. In such a case one could use known inequalities for the median $m$ in order to get bounds on $\lVert g_{h}\rVert_{\infty}$. Since one would have to distinguish several cases according to the values of $a$ and $b$ and, hence, to the shape of the density $p$, we omit the details here.
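The explicit constant in the symmetric case can be checked directly. The sketch below compares $1/(2m(1-m)p(m))$ at $m=1/2$ with $B(a,a)2^{2a-1}$, i.e. $B(a,a)2^{a+b-1}$ for $b=a$, assuming the standard Beta density $p(x)=x^{a-1}(1-x)^{b-1}/B(a,b)$. The function names are illustrative only.

```python
import math

def beta_function(a, b):
    """Euler Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def median_bound_constant(a):
    """1 / (2 m (1 - m) p(m)) for Beta(a, a), evaluated at the median m = 1/2."""
    m = 0.5
    density_at_median = m ** (a - 1) * (1 - m) ** (a - 1) / beta_function(a, a)
    return 1.0 / (2 * m * (1 - m) * density_at_median)

def closed_form_constant(a):
    """B(a, a) * 2^(2a - 1), the explicit form of the constant for a = b."""
    return beta_function(a, a) * 2 ** (2 * a - 1)

for a in (0.5, 1.0, 2.5):
    print(a, median_bound_constant(a), closed_form_constant(a))
```

For $a=1$ (the uniform distribution) both expressions equal $2$, which gives a quick sanity check of the formula.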

From Proposition 3.19, Remark 3.20 (ii) and the bounds from Proposition 4.2 we obtain the following plug-in result, which bounds a certain distance to the Beta distribution by terms related to a given exchangeable pair.

Let $W$ and $W^{\prime}$ be identically distributed random variables on a common probability space $(\Omega,\mathcal{A},P)$ and let $\mathcal{F}\subseteq\mathcal{A}$ be a sub-$\sigma$-algebra of $\mathcal{A}$ such that $\sigma(W)\subseteq\mathcal{F}$ and

where the constants $C(\cdot,\cdot)$ are defined by (47) and (48).

Now we are in a position to prove Theorem 2.1.

The claim immediately follows from Theorem 4.4, Propositions 2.2, 2.3 and the fact that in this case

Proofs

Suppose that $a>-\infty$ and choose $y\in(a,x_{0})$. Then $\gamma(y)>0$ and, by the nonnegativity of $I$ and the monotonicity of $\gamma$, for $a<x<y$ we have

so that $\lim_{x\downarrow a}\eta(x)=0$. Conversely, if $\eta(a+)=0$, then, again by (5),

The calculation for finite $b$ is similar by using the representation $I(x)=-\int_{x}^{b}\gamma(t)p(t)dt$ and is therefore omitted. ∎

That item (a) is sufficient is clear. If (b) holds, then the claim follows from the inequality

valid for $x\in(a,a+\delta)$. Under Condition (c) we obtain a continuous and convex function on $[a,a+\delta)$ by letting $p(a):=0$. Now, let $a<x<y<a+\delta$. Then, there exists a $\lambda\in(0,1)$ with $x=\lambda a+(1-\lambda)y$ and by convexity we have:

Thus, the assumptions of (b) are satisfied. If (d) holds, then again letting $p(a):=0$ we obtain a continuous and concave function on $[a,a+\delta)$. Thus, there exists a decreasing function $f$ on $[a,a+\delta)$ such that

If there were a sequence $(x_{n})_{n\geq 1}$ in $[a,a+\delta)$ such that $x_{n}\downarrow a$ and $f(x_{n})\leq 0$ for each $n\geq 1$, then for each $x\in(a,a+\delta)$ and large enough $n$ we would have

for all $x\in(a,a+r)$ and hence, by de l’Hôpital’s rule,

In order to prove (f) we show that always

if $p$ satisfies Condition 3.1. To show this, define the function $G(x):=\log F(x)$ for $x\in(a,b)$. Then, $G$ is increasing and continuously differentiable on $(a,b)$ and satisfies $G(a+)=-\infty$ and $G(b-)=0$. If (50) did not hold, then

Hence, choosing $\delta>0$ such that $G^{\prime}(x)\leq c+1$ for all $x\in(a,a+\delta]$ we would obtain

which would contradict $G(a+)=-\infty$. ∎

which exists in $[0,\infty)$ by Condition 3.2. Here, we used the convention $\frac{1}{\infty}=0$. Moreover,

again by Condition 3.2 and by Proposition 3.3. Furthermore, we have

for each $x\in(a,b)$ since by the positivity of $p$ and because $\gamma$ is decreasing

Hence, $M$ is increasing and, thus, for each $x\in(a,m]$ we have

The same bound can be proved for $x\in(m,b)$ by using the representation

and the fact that also $1-F(m)=\frac{1}{2}$. ∎

The following two lemmas, which are quite standard in Stein’s method, will be needed for the proof of Proposition 3.13. For proofs we refer to , for instance.

Suppose that $p$ satisfies Condition 3.1 and that $\int_{a}^{b}\lvert x\rvert p(x)dx<\infty$. Then, for each $x\in\overline{(a,b)}$ we have:

$\int_{a}^{x}F(t)dt=xF(x)-\int_{a}^{x}sp(s)ds\,$;

$\int_{x}^{b}(1-F(t))dt=\int_{x}^{b}sp(s)ds-x(1-F(x))$.
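The two identities of this lemma can be checked numerically for a concrete density. The sketch below uses the standard exponential distribution on $(0,\infty)$, so that $a=0$ and $b=\infty$, with the upper limit truncated at a large finite value. The choice of distribution, the truncation point and the helper names are assumptions made for illustration.

```python
import math

def integrate(func, lower, upper, n=20_000):
    """Composite midpoint rule for the integral of func over [lower, upper]."""
    step = (upper - lower) / n
    return step * sum(func(lower + (k + 0.5) * step) for k in range(n))

def density(s):
    """Density of the standard exponential distribution on (0, infinity)."""
    return math.exp(-s)

def cdf(t):
    """Distribution function F of the standard exponential distribution."""
    return 1.0 - math.exp(-t)

def check_identities(x, upper=60.0):
    """Return both sides of the two identities; `upper` truncates b = infinity."""
    lhs_first = integrate(cdf, 0.0, x)
    rhs_first = x * cdf(x) - integrate(lambda s: s * density(s), 0.0, x)
    lhs_second = integrate(lambda t: 1.0 - cdf(t), x, upper)
    rhs_second = integrate(lambda s: s * density(s), x, upper) - x * (1.0 - cdf(x))
    return lhs_first, rhs_first, lhs_second, rhs_second

print(check_identities(1.5))
```

In this example both sides of the first identity equal $x-1+e^{-x}$ and both sides of the second equal $e^{-x}$, so the printed pairs should agree up to quadrature error.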

Suppose that $p$ satisfies Condition 3.1 and that $E\lvert Z\rvert=\int_{a}^{b}\lvert x\rvert p(x)dx<\infty$. Then, for each Lipschitz function $h$, the following assertions hold true:

For each $x\in\overline{(a,b)}$ we have $\int_{a}^{x}(h(y)-E[h(Z)])p(y)dy=-(1-F(x))\int_{a}^{x}F(s)h^{\prime}(s)ds-F(x)\int_{x}^{b}(1-F(s))h^{\prime}(s)ds$.

First, we prove (a). Recall the representation

By Lemmas 5.2 and 5.1 we thus obtain that

implying (a). Now, we turn to the proof of (b). By Stein’s equation (18) we obtain for $x\in(a,b)$

From (52) we already know that $H$ is nonnegative on $(a,b)$. Similarly we prove the nonnegativity of $G$ on $(a,b)$: Since $p$ is positive and $\gamma$ is decreasing, for $x$ in $(a,b)$ we have

which reduces to the bound asserted in (b). Optimality of the bound in (a) follows from choosing $h(x)=x$ and observing that the above inequalities are in fact equalities in this case. To see that the bound in (b) is also optimal, for given $x\in(a,b)$ choose a $1$-Lipschitz function $h$ such that $h^{\prime}(s)=1$ for all $s\in(a,x)$ and $h^{\prime}(s)=-1$ for all $s\in(x,b)$. Then, from (5) and the nonnegativity of $H$ and $G$, we see that equality holds in (5). ∎

Claim (a) follows from Proposition 3.13 (a) and the observation that in this case we have

Part (b) follows from Proposition 3.13 (b) and Lemma 5.1 by observing that in this case

and, similarly, $G(x)=c\int_{x}^{b}(1-F(s))ds$. ∎
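A short worked instance shows why the constant $1/c$ in Corollary 3.15 (a) cannot be improved. It assumes, as the constant $c$ suggests, that $\gamma(x)=c\,(E[Z]-x)$ and that the Stein equation (18) reads $\eta(x)g^{\prime}(x)+\gamma(x)g(x)=h(x)-E[h(Z)]$. Both forms are inferred from context, not quoted.

```latex
% Test function h(x) = x, so that h(x) - E[h(Z)] = x - E[Z] = -\gamma(x)/c.
\begin{align*}
g_h(x) &\equiv -\frac{1}{c}, \qquad g_h^{\prime}(x) = 0,\\
\eta(x)\,g_h^{\prime}(x)+\gamma(x)\,g_h(x)
  &= -\frac{c\,(E[Z]-x)}{c} = x-E[Z] = h(x)-E[h(Z)],\\
\lVert g_h\rVert_\infty &= \frac{1}{c} = \frac{\lVert h^{\prime}\rVert_\infty}{c}.
\end{align*}
```

The constant function $-1/c$ also agrees with the standard solution, since $\int_{a}^{x}(y-E[Z])p(y)dy=-I(x)/c$ when $I(x)=\int_{a}^{x}\gamma(y)p(y)dy=\eta(x)p(x)$. Hence the bound in (a) holds with equality for $h(x)=x$.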

We first prove necessity. Let $f$ be given as in the proposition. First we show that $E\lvert\gamma(Z)f(Z)\rvert<\infty$. We have

Repeating essentially the same calculation without absolute value signs and using $E[\gamma(Z)]=0$ yields

To prove sufficiency it is clearly enough to show that

holds for each bounded and continuous function $h$. Let $g_{h}$ be the standard solution of the Stein equation (18) corresponding to $h$. Then, from Proposition 3.9 we know that $\lVert g_{h}\rVert_{\infty}<\infty$. Also, $g_{h}$ is continuous on $\overline{(a,b)}$ and continuously differentiable on each compact subinterval of $\overline{(a,b)}$. Furthermore, since $I(x)=\eta(x)p(x)$ and $g_{h}$ solves (18) we have

By the hypothesis of Proposition 3.12 we can thus conclude that

To show finiteness of the first integral in (56) note that since $\gamma$ is decreasing, by Fubini’s theorem

since $h$ is Lipschitz. Similarly, one shows that

Since $h^{\prime}$ is bounded, to show that the second integral in (56) is finite, it suffices to prove that

since $E\lvert\gamma(Z)\rvert<\infty$ and $E\lvert Z\gamma(Z)\rvert<\infty$ and similarly one shows that

Using $\eta p=I$, $I^{\prime}=\gamma p$ and $I(a+)=I(b-)=0$, from Fubini’s theorem we obtain that the left hand side of (59) equals

Similarly, using $\gamma(x_{0})=0$, the definition of $g_{h}$ in (3) and Fubini’s theorem again, we have that the right hand side of (59) equals

where we have used $E[\gamma(Z)]=0$ for the last equality. Thus, from (5) and (5) we conclude that (59) holds. Thus, the standard solution $f=f_{h_{2}}$ to (38) is well-defined and given by

Now, first suppose that $a>-\infty$. Since $g_{h}$ solves the Stein equation (18) we know that

Hence, from (63), (66) and (64) we conclude that $g_{h}^{\prime}=f$ is the standard solution to (38). Similarly, one obtains this result if $b<\infty$. Finally assume that $g_{h}^{\prime}$ is bounded. Since $\lim_{x\downarrow a}\eta(x)^{2}p(x)=0$ we conclude from (64) and (63) that

Hence, by distributional equality, we obtain

and $\int_{0}^{1}s(1-s)ds=\frac{1}{6}$ the bound (3.19) now easily follows from (5) and the properties of $f$. ∎

Claim (a) immediately follows from Proposition 3.9. Similarly, the first part of claim (b) immediately follows from Corollary 3.15 (a). For the second part of (b) we note that by Corollary 3.15 (b) we have for $x\in(0,1)$:

Since $F$ is increasing and $1-F$ is decreasing, we have

for each xx\in. Plugging this into (69) yields

By de l’Hôpital’s rule, one can easily show that $S(0+)=a^{-1}$ and $S(1-)=b^{-1}$. Thus, it suffices to bound $\lVert S\rVert_{\infty}$. For general $a,b>0$ we write

For $a\not=b$ we bound the functions $f_{1}$ and $f_{2}$ separately. By de l’Hôpital’s rule we have

This implies that $N_{1}$ is nonnegative and, hence, $f_{1}$ is increasing for $b\leq 1$ and that $N_{1}$ is nonpositive and, hence, $f_{1}$ is decreasing for $b>1$. Thus,

Since $f_{2}(x;a,b)=f_{1}(1-x;b,a)$ we have

Thus, from (70), (71), (72) and (73) we have

where $C(a,b)$ is given by (47) and (48). In the case $a=b$ we can provide better bounds. First note that in this case the Beta distribution $Beta(a,a)$ is symmetric with respect to $1/2$. This easily implies that

holds for each $0\leq x\leq 1/2$. Thus it suffices to bound $S$ on $[1/2,1)$. Note that

Thus, SS is increasing (decreasing) on [1/2,1)[1/2,1), if and only if TT is nonnegative (nonpositive) there. In the case a=ba=b we have γ(x)=a(12x)\gamma(x)=a(1-2x) and, hence, γ(1/2)=F(1/2)=0\gamma(1/2)=F(1/2)=0. Thus, recalling that I(1)=(ηp)(1)=0I(1)=(\eta p)(1-)=0 we have

By (76) the nonnegativity (nonpositivity) of $T$ on $[1/2,1)$ follows, if $T(y)\geq 0$ ($\leq 0$) for every locally extremal point $y\in(1/2,1)$. We have

and, hence, if $y\in(1/2,1)$ is a locally extremal point of $T$, we have $T^{\prime}(y)=0$ and

Now, for $x\in[1/2,1)$, consider the function

and note that $U(1/2)=0$. For $1/2\leq x<1$ we have

and, hence, $U$ is increasing for $a\leq 1$ and is decreasing for $a\geq 1$. Since $U(1/2)=0$ it thus follows from (5) that if $y\in(1/2,1)$ is a locally extremal point of $T$, then $T(y)$ is nonnegative for $a<1$ and nonpositive for $a\geq 1$. From (74) and (76) it thus follows that $S$ is decreasing on $[1/2,1)$ if $a\geq 1$ and increasing if $a<1$. Hence, we can conclude that

Note that by the duplication formula for the Gamma function we have

Now, we turn to the proof of (c). From Proposition 3.17 we know that $g_{h}^{\prime}$ is the standard solution to the Stein equation

corresponding to the distribution $Beta(a+1,b+1)$. Thus, since $h_{2}$ is Lipschitz by part (b), applying (b) for $Beta(a+1,b+1)$ and for $Beta(a,b)$ yields

one can see by induction that for all $k=2,\dotsc,m$

Hence, by (b) and from Proposition 3.17 similarly to (5) we can prove that

The bound now follows from an easy induction on $m$. ∎

References