Stein's method of exchangeable pairs for absolutely continuous, univariate distributions with applications to the Polya urn model
Christian Döbler
Introduction
and that, if is bounded and is unbounded on , then is the only bounded solution on . For general properties of the solutions see [CS11] or [CGS11]. Note that for general Borel-measurable it cannot be expected that there exists a solution which is differentiable on all of and satisfies (2) pointwise. Thus, a solution is understood to be an almost everywhere differentiable and Borel-measurable function which satisfies (2) at every point where it is differentiable; contrary to the usual convention, at the remaining points one defines . This yields a Borel-measurable function on such that (2) holds for each . To understand the exchangeable pairs technique in the framework of the density approach, it may help to recall the exchangeable pairs method in the setting of normal approximation. This method, first presented in Stein's monograph [Ste86], is a cornerstone of Stein's method of normal approximation and is still the most frequently used coupling. This is due to the wide applicability of standard couplings such as the Gibbs sampler or a single time step of a reversible Markov chain, which generally yield exchangeable pairs. By definition, an exchangeable pair is a pair of random variables, defined on a common probability space, whose joint distribution is symmetric, i.e. swapping the two components does not change the joint distribution. In [Ste86], in order to show that a given real-valued random variable is approximately standard normally distributed, Stein proposes constructing another random variable , a small random perturbation of , on the same space as , such that forms an exchangeable pair and, additionally, the following linear regression property holds:
Here, is a constant which is typically close to zero for conveniently chosen . If this condition is satisfied, then the distributional distance of to can be efficiently bounded in various metrics, including the Kolmogorov and Wasserstein metrics (see, e.g. [Ste86], [CS05] or [CGS11] for the common “plug-in theorems”). The range of examples to which this technique could be applied was considerably extended by the work [RR97] of Rinott and Rotar who proved normal approximation theorems allowing the linear regression property to be satisfied only approximately. Specifically, they assumed the existence of a random quantity , which is dominated by in size, such that
Note that necessarily is -measurable and that, unlike condition (4), condition (5) is not a true condition on the pair, since we can always define for each given constant . However, the "plug-in theorems" in [RR97], [SS06] or [CGS11] make clear that has to be of smaller order than in order to yield useful bounds. Since is supposed to have a "true" distributional limit, it follows that both and are at least asymptotically unique (see also the introduction of [RR09] for a discussion of this topic). When dealing with our possibly non-normal distribution , the question is what condition to substitute for the linear regression property (4) or (5). This question was successfully answered independently by Eichelsbacher and Löwe in [EL10] and by Chatterjee and Shao in [CS11]. They pointed out that in this more general setting the appropriate regression property is
where, again, is constant and is of smaller order than . To give a flavour of the resulting "plug-in theorems", we present parts of Theorem 2.4 from [EL10].
The third term of the bound (1.1) reveals that, in fact, must be of smaller order than for the bound to be useful, and from the second term we conclude that should be chosen such that . The first term on the right-hand side of (1.1) is usually interpreted as saying that the random variable $E[(W-W')^{2}\mid W]/(2\lambda)$ must "obey a law of large numbers" in order to obtain decreasing bounds. Bounding this term is often decisive for the success of applying Theorem 1.1.
Having discussed the method of exchangeable pairs within the density approach, we now address the problem that, in some examples, condition (6) with a negligible remainder term is not satisfied by an exchangeable pair which nevertheless appears natural for the approximation problem at hand. For example, in many situations where the exchangeable pair is constructed via the Gibbs sampler, we have a regression property of the form
where are constants (the reason why and are not subsumed into a single constant will become clear later on) and where, again, is a negligible remainder. Here, again . Following the theory of [CS11], condition (8) suggests approximating with a normal distribution with mean and variance . But there are situations where the exchangeable pair is good, meaning that the difference is "small", condition (8) is satisfied, and we know that is approximately distributed as a non-normal random variable, so that normal approximation is inappropriate. In general, this means either that the law of large numbers discussed above cannot hold or that the resulting error term in (6) is not negligible. These observations motivate a new version of Stein's method that allows for a more general regression property.
Suppose that an appropriately chosen exchangeable pair satisfies the following general regression property:
where is constant, is a measurable function whose domain contains and which will be discussed further below, and where is a negligible remainder term. We will see that it is advantageous if the term appears in the "new" Stein equation. So we make the following ansatz for the Stein identity:
where is another function which still has to be found.
Starting from the Stein identity (10), our aim is to identify the function . If this approach is successful, the Stein equation corresponding to a measurable function will be
For this identity to hold irrespective of the test function , it must be the case that (in particular, must be differentiable at least almost everywhere) and hence
This is a first-order linear differential equation, which can of course be solved explicitly by the method of variation of constants. It turns out that the right solution is given by
at least if . This is a very natural condition, since the function was motivated by the regression property (9): if the random variables and are integrable, then both sides of (9) must be -integrable and, in fact,
Neglecting the remainder term, we thus see that should exist and, in fact, be close to zero. So, since , we find it reasonable that exists and even equals zero. Furthermore, it is a matter of routine to check that as given in (15) indeed still satisfies (14). The above calculations starting with (10) were rather formal but crucial for the motivation and understanding of our approach.
The paper is organized as follows. Rigorous results and the abstract theory for general are presented in Section 2. These results are then specialized to the Beta distributions in Section 3. In Section 4 the theory, combined with a suitable exchangeable pairs coupling, is used to prove a rate of convergence of order in a Polya urn model (see Theorem 4.3). Section 5 contains some rather lengthy or technical proofs, together with a sufficiently general version of de l'Hôpital's rule for merely absolutely continuous functions. This result justifies all invocations of this famous rule in the present work.
Acknowledgements
A few days after this work appeared on the arXiv, G. Reinert and L. Goldstein posted a preprint (see [GR12]) which also develops Stein's method for the Beta distributions and uses a comparison technique to prove error bounds of order in the Wasserstein distance for the more special Polya urn model in which the drawn ball is returned to the urn together with only one extra ball of the same colour.
The general theory
The density is positive on the interval and absolutely continuous on every compact interval .
is continuous on
is strictly decreasing on
and in fact
There is a unique with .
Note that item (iv) of Condition 2.2 is actually implied by (i), (ii) and (iii) together with the intermediate value theorem. Furthermore, (iv) implies that is positive on and negative on .
Under Conditions 2.1 and 2.2 the function has the following properties:
is strictly increasing on and strictly decreasing on and hence attains its global maximum at .
Of course, (a) is immediately implied by item (iii) of Condition 2.2. To prove (b) and (c), first observe that by (iii) we have . Furthermore, is positive on and negative on , implying the results. ∎
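The displayed formula (15) does not survive in this copy. Reading the construction in the introduction as $(\eta p)' = \gamma p$ with $\eta p$ vanishing at $a$, one obtains $\eta(x) = I(x)/p(x)$ with $I(x) = \int_a^x \gamma(t)p(t)\,dt$, the function whose monotonicity Proposition 2.4 describes. The following is a minimal numerical sketch under that reading; the helper names, the Simpson quadrature and the truncation of the normal's lower limit at $-12$ are our own choices. For the standard normal with $\gamma(x) = -x$ it should return $\eta \equiv 1$, and for Beta(2, 3) with $\gamma(x) = a - (a+b)x$ it should return $x(1-x)$ and an $I$ that peaks at $x_0 = a/(a+b) = 0.4$.

```python
import math

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule for int_lo^hi f(t) dt with n (even) panels."""
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * step)
    return total * step / 3

def I_of(p, gamma, a, x):
    """I(x) = int_a^x gamma(t) p(t) dt (our name for this primitive)."""
    return simpson(lambda t: gamma(t) * p(t), a, x)

def eta_of(p, gamma, a, x):
    """eta(x) = I(x) / p(x), the coefficient of g' in the Stein operator."""
    return I_of(p, gamma, a, x) / p(x)

# Standard normal, gamma(x) = -x; the lower end point -inf is truncated at -12.
def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def normal_gamma(x):
    return -x

# Beta(2, 3) on [0, 1] with gamma(x) = a - (a + b) x.
A, B = 2.0, 3.0
BETA_FN = math.exp(math.lgamma(A) + math.lgamma(B) - math.lgamma(A + B))

def beta_pdf(x):
    return x ** (A - 1) * (1 - x) ** (B - 1) / BETA_FN

def beta_gamma(x):
    return A - (A + B) * x

if __name__ == "__main__":
    print(eta_of(phi, normal_gamma, -12.0, 1.5))    # close to 1
    print(eta_of(beta_pdf, beta_gamma, 0.0, 0.3))   # close to 0.3 * 0.7
```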
Under Conditions 2.1 and 2.2 the function has the following properties:
is positive on , absolutely continuous on every compact subinterval and for -almost all .
If , then , if this limit exists.
If then
If , then , if this limit exists.
If then
The first part of (a) follows from the fact that is positive on and on and hence absolutely continuous on , and that is also absolutely continuous and bounded below by a positive constant on . The rest of (a) has already been observed. Items (b), (d) and (f) follow immediately from the properties of the function in Proposition 2.4. To prove (c), we use de l'Hôpital's rule (see Theorem 5.1) to derive
The following “Mill’s ratio” condition on the density and the corresponding distribution function is often satisfied and will yield .
The density of satisfies all the properties from Condition 2.1 and also the following:
If , then .
If , then .
Condition 2.6 is always satisfied if the density is bounded away from zero in suitable neighbourhoods of and .
Assume that both and , and that . Then Condition 2.6 is satisfied if there is a such that is increasing on and decreasing on . This is easily seen from the inequality
valid for and a similar one for the right end point .
Suppose that and , that and that there is a such that is convex on and on . Then the assumptions of (b) are satisfied, and hence so is Condition 2.6. In fact, we can first extend to a continuous and convex function on by setting . Now, let . Then there exists a with and by convexity we have:
Thus, is strictly increasing on . Similarly, one shows that is strictly decreasing on , if is convex there.
The following proposition provides the announced result.
Assume Condition 2.6. Then the function vanishes at the finite end points of the support of , i.e. if , then and if , then . Hence, we may extend to a continuous function on by letting .
Suppose that . Then, by the positivity of and the monotonicity of , for :
so that . The proof of for finite is similar by using the representation and is therefore omitted. ∎
solves the Stein equation (11) for . This can also be proved directly by differentiation and the formula for could also be derived by the method of variation of the constant using the fact that is a primitive function of , which follows from (14). If we can show that is bounded, then it will immediately follow from Proposition 2.5 (a) that is the only bounded solution of (11), since the solutions of the corresponding homogeneous equation are constant multiples of . Since we do not exclude approximating random variables which take on the values or , we show that the solution can be extended continuously to and , if is continuous there. By the properties of the function on , from Proposition 2.5, the continuity of and by de l’Hôpital’s rule (see Theorem 5.1) we have
As is typical for Stein’s method, its success within the applications considerably depends on good bounds on the solutions and their derivative(s), generally uniformly over some given class of test functions . The next step will be to prove such bounds. It has to be mentioned that we cannot expect to derive concrete good bounds in full generality, but that sometimes further conditions have to be imposed either on the distribution (e.g. through the density ) or on the coefficient . Nevertheless, we will derive bounds involving functional expressions which can a posteriori be simplified, computed or further bounded for concrete distributions. So our abstract viewpoint will pay off. Moreover, some of our bounds will actually hold in complete generality.
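Before turning to the abstract bounds, it may help to see one solution explicitly. The sketch below assumes that Stein's equation (11) reads $\eta(x)g'(x) + \gamma(x)g(x) = h(x) - E[h(Z)]$ and that its solution is $g_h(x) = \frac{1}{\eta(x)p(x)}\int_a^x (h(t) - E[h(Z)])\,p(t)\,dt$, which is how we read the construction above; neither display survives in this copy. We take Beta(2, 3) with $\eta(x) = x(1-x)$, $\gamma(x) = 2 - 5x$ and the illustrative test function $h(x) = x^2$. For this choice one can check by hand that $g(x) = -x/6 - 1/10$ solves the equation, which gives an independent check of the quadrature, and its values $-1/10$ and $-4/15$ at the end points agree with the de l'Hôpital limits $(h(a) - E[h(Z)])/\gamma(a)$ and $(h(b) - E[h(Z)])/\gamma(b)$.

```python
import math

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule for int_lo^hi f(t) dt with n (even) panels."""
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * step)
    return total * step / 3

A, B = 2.0, 3.0                                   # Beta(A, B) target on [0, 1]
BETA_FN = math.exp(math.lgamma(A) + math.lgamma(B) - math.lgamma(A + B))

def p(x):
    return x ** (A - 1) * (1 - x) ** (B - 1) / BETA_FN

def eta(x):
    return x * (1 - x)

def gamma(x):
    return A - (A + B) * x

def h(x):
    return x * x                                  # illustrative test function

MU_H = A * (A + 1) / ((A + B) * (A + B + 1))      # E[Z^2] for Z ~ Beta(A, B)

def g_h(x):
    """Integral-formula solution of eta g' + gamma g = h - E[h(Z)].

    We integrate from the nearer end point; both forms agree because
    int_0^1 (h - E h(Z)) p = 0, and this avoids cancellation near x = 1.
    """
    weight = eta(x) * p(x)
    if x <= 0.5:
        return simpson(lambda t: (h(t) - MU_H) * p(t), 0.0, x) / weight
    return -simpson(lambda t: (h(t) - MU_H) * p(t), x, 1.0) / weight

def stein_residual(x, dx=1e-4):
    """eta g' + gamma g - (h - E h(Z)), with g' from a central difference."""
    dg = (g_h(x + dx) - g_h(x - dx)) / (2 * dx)
    return eta(x) * dg + gamma(x) * g_h(x) - (h(x) - MU_H)
```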
The next Proposition contains a bound for the solutions for bounded and Borel-measurable test functions .
The proof is deferred to the appendix. The following corollary specializes this result to the case that and that is symmetric with respect to its median, which is then equal to its expected value , that is .
In this case we clearly have which implies the result by Proposition 2.10. ∎
In the case that and this result specializes to the well known bound (see [CGS11] or [CS05], e.g.).
In the formulation of Proposition 2.10 it may come as a surprise that no bound is given for . This is because, in general, a bound of the form with a finite constant does not exist in this setup. Note that this is contrary to the density approach, where one generally has such a bound (see [CS11] or [CGS11]).
Next, we turn to Lipschitz continuous test functions . In contrast to bounded measurable test functions, for these we will also be able to prove useful bounds for .
In order to obtain bounds for Lipschitz continuous test functions we need a further condition on the distribution which guarantees that its expected value exists.
The density is positive on the interval and absolutely continuous on every compact interval . Furthermore, .
The following proposition, which is also proved in the appendix, includes bounds for both and when is Lipschitz.
Here, for the positive functions and are defined by
In general, the term cannot be bounded uniformly in unless grows at least linearly in .
$$\lvert g_{h}'(x)\rvert\leq\frac{2\lVert h'\rVert_{\infty}}{c}\,\frac{H(x)G(x)}{I(x)\eta(x)}=2\lVert h'\rVert_{\infty}\,\frac{\int_{a}^{x}F(s)\,ds\,\int_{x}^{b}\bigl(1-F(t)\bigr)\,dt}{\eta(x)\bigl(E[Z]F(x)-\int_{a}^{x}y\,p(y)\,dy\bigr)}$$
Claim (a) follows from Proposition 2.14 (a) and the observation that in this case we have
Part (b) follows from Proposition 2.14 (b) and Lemma 5.2 by observing that in this case
and, similarly, . ∎
It is quite remarkable that in the case of normal approximation (via its classical Stein equation) the bound given in Corollary 2.16 (a) even improves on the best bound currently mentioned in the literature (see, e.g. [CGS11] or [CS05]). In fact, in this case and thus our bound reduces to .
For concrete distributions, the ratio appearing in the bound for may be bounded uniformly in by some constant, which can sometimes also be computed explicitly. Moreover, in [EV12] the authors give mild conditions for the existence of a finite constant such that for any Lipschitz-continuous . In practice, these conditions are usually met. However, their method of proof offers no way of estimating the constant. Thus, for concrete distributions and explicit constants, it may be useful to work with our bound from Corollary 2.16 (b).
Next, we discuss how the density of can be expressed in terms of and . This will be useful for bounding the second derivative of in some special cases. Let be as in Condition 2.2. Since and hence , we have
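To see what the displayed bound on $\lvert g_h'\rvert$ gives in the simplest case, one can evaluate the ratio on its right-hand side for the standard normal distribution. This is a sketch under our own assumptions: $Z \sim N(0,1)$, $\gamma(x) = -x$ (so $c = 1$ and $E[Z] = 0$) and $\eta \equiv 1$. It uses the standard closed forms $\int_{-\infty}^x \Phi = x\Phi(x) + \varphi(x)$, $\int_x^\infty (1-\Phi) = \varphi(x) - x(1-\Phi(x))$ and $-\int_{-\infty}^x y\varphi(y)\,dy = \varphi(x)$.

```python
import math

def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2))

def upper_tail(x):
    return 0.5 * math.erfc(x / math.sqrt(2))    # 1 - Phi(x) without cancellation

def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def ratio(x):
    """Normal case of the ratio in the bound on |g_h'| (eta = 1, E[Z] = 0).

    Numerator:   int_{-inf}^x Phi(s) ds * int_x^inf (1 - Phi(t)) dt
    Denominator: E[Z] F(x) - int_{-inf}^x y phi(y) dy = phi(x)
    """
    H = x * Phi(x) + phi(x)
    G = phi(x) - x * upper_tail(x)
    return H * G / phi(x)

grid = [k / 100 for k in range(-500, 501)]
sup_ratio = max(ratio(x) for x in grid)
```

On this grid the maximum is attained at $x = 0$, where the ratio equals $\varphi(0) = 1/\sqrt{2\pi}$, so under these assumptions the displayed bound evaluates to $2\varphi(0)\lVert h'\rVert_\infty = \sqrt{2/\pi}\,\lVert h'\rVert_\infty$.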
Formula (20) is a more general version of formula (3.14) in [NV09] and is also derived in [KT12]. Now, differentiating Stein’s equation (11), we obtain for Lipschitz
We already know how to solve (24) for . Now we assume that at least one of and is finite and try to solve the equation outside . Furthermore, we discuss conditions ensuring that the composed solution behaves nicely at the edges and/or . Henceforth we assume Condition 2.18. For , equation (24) is clearly equivalent to
Let us assume that both and (the other cases are of course included) and let be any primitive function of on . Such a function exists by continuity and is hence continuously differentiable. By the method of variation of the constant one may derive the following formula for :
if this integral exists. Note that this property does not depend on the particular choice of the primitive function . For a fixed primitive function of on we define the function
Analogously, for a given primitive function of on we define the function
Note that inside the interval we have that is a primitive of and hence plays the role of on (and similarly for ). As we have observed, we will need the following condition:
Similarly, for we arrive at the definition
Note that the definition of the solution does not depend on the choice of the primitive functions and since two such functions may only differ by an additive constant.
Next, we prove that the above constructed solution is continuous as long as is continuous at and . To deal with the limits and we first formulate a condition which will usually be satisfied in practice.
The functions and satisfy and .
Again, the validity of this condition does not depend on the choice of the functions and . By Condition 2.21 we may again apply de l’Hôpital’s rule to compute
Next, we want to present bounds on and for . But first we show that our conditions already imply that if or . Since is then negative on , this also implies that . Similarly, .
Assume Conditions 2.18, 2.19 and 2.21. Then the functions , and have the following properties:
We have .
To prove (a), first note that by Condition 2.18 has no sign changes on . Suppose, contrary to the assertion, that for all . Then the function is also positive and hence for each . By Conditions 2.19 and 2.21 we may apply de l'Hôpital's rule to conclude
by Condition 2.18. This is a contradiction and hence we must have for . Similarly, one shows that also for . To prove (b), note that for . By Condition 2.21 this necessarily implies that . Analogously, we have for (since there) which by Condition 2.21 implies that . ∎
Proposition 2.23 in particular implies the conclusion of Proposition 2.8 and hence makes Condition 2.6 redundant, at least as far as the assertion of that proposition is concerned. In order to get general bounds, we will need yet another condition on the functions and :
The functions and satisfy .
The next result gives bounds on for bounded, Borel-measurable functions . As usual, the proof is in the appendix.
Before turning to Lipschitz test functions, we discuss properties of the functions and , respectively. In particular, we show that they correspond to the function on and have similar integral representations.
For each we have .
For each we have .
by Proposition 2.23. Hence exists and equals . ∎
For each we have .
For each we have .
These probabilities have nothing to do with the distribution of , and hence one should be able to bound them directly for a given . Thus, we will focus on .
Let be given. Then, under the Conditions 2.1, 2.18, 2.19 and 2.21 we have:
It is clear that a similar discussion of the solutions is possible if or .
Note that we can write for and for , with the functions and from Proposition 2.14. For concrete distributions one can often prove that is increasing on and decreasing on , but this seems hard to prove in general, if it is true at all.
Finally, in our general setting, we prove suitable "plug-in theorems" for exchangeable pairs satisfying our general regression property (9). As was observed in [Röl08] for the normal distribution, for univariate distributional approximations one does not need the full strength of exchangeability: equality in distribution of the random variables and is sufficient. This may allow for a greater choice of admissible couplings in several situations, or at least simplify the verification of the required properties.
In the following, let be a probability space and let be real-valued random variables defined on this space such that . Let, as before, be our target distribution with support fulfilling Condition 2.1. From now on we assume that the random variables and only take values in an interval where both functions and are defined (recall that it might be the case that can only be defined on ).
If is also absolutely continuous and for some Borel-measurable version of the second derivative, then we also have the bound
Hence, by distributional equality, we obtain
From (32) and the assumptions on the bound (29) now easily follows. To prove (30) it suffices to observe that
and . ∎
From the first term on the right-hand side of (29) we see that the bound can only be useful if $E[(W'-W)^{2}\mid W]\approx 2\lambda\eta(W)$. Similarly, the third term reveals that, indeed, should be of smaller order than .
The proof shows that Proposition 2.31 can easily be generalized to the situation where there is a -algebra with and the more general regression property
with some -measurable remainder term is satisfied.
If is some class of test functions, such that there are finite, positive constants , and with , and for each , then (30) immediately yields a bound on the distance
Stein’s method for Beta distributions
From now on, fix . We now introduce a Stein identity for the Beta distribution . It is easily checked that its density satisfies the ordinary differential equation
Integrating by parts one obtains the conjugate operator , which is defined by the equation , and which is known to serve as a characterizing operator for the distribution . To be concrete, in our case we have
for smooth enough functions , yielding the Stein identity
A real-valued random variable is distributed according to if and only if for all functions the expected values and exist and coincide.
First, let and let . By the hypothesis and the transformation formula, we have
Hence the expected value exists. Since $g$ is continuous, it is bounded on $[-1,1]$; hence $E[(\alpha+\beta+2)Xg(X)+(\alpha-\beta)g(X)]$ also exists. For $g\varrho_{\alpha,\beta}$ we can use integration by parts and have
where . It will be shown in Proposition 3.2 below that , so that by hypothesis we have
Since we have fixed the parameters and , we may and will henceforth suppress them as subscripts of objects which might well depend on them (for example, we simply write for , and so on). Since we would like to use the theory from Section 2, we have to make sure that our Stein identity for the Beta distribution fits into this framework, i.e. that relation (15) is satisfied with and
where we have used that . In principle, this is clear, because we have just established a Stein characterization for , and given the density and the function , the corresponding is, of course, unique. However, we give a formal proof.
holds for all . First note that
Differentiating the left hand side of (36), we obtain
which is of course the derivative of the right hand side, too. Since
Condition 2.6 is also satisfied but need not be proved, because its most important conclusion, namely that , is clear from the above discussion. To verify Conditions 2.19, 2.21 and 2.24, we must first define the functions on and on . We claim that the functions
These two functions are of course locally integrable and hence, Condition 2.19 is satisfied. Since
Conditions 2.21 and 2.24 also hold. Consequently, all results from Section 2 are valid, in particular, for the Beta distributions.
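The Stein identity and the pair $(\eta, \gamma)$ for the Beta distribution on $[-1,1]$ can be sanity-checked numerically. This sketch assumes that $\mu_{\alpha,\beta}$ has density proportional to $(1-x)^\alpha(1+x)^\beta$ on $[-1,1]$ (which is what the relation $2X-1 \sim \mu_{b-1,a-1}$ for $X \sim \nu_{a,b}$ later in this section indicates) and reads the identity as $E[(1-X^2)g'(X)] = E[((\alpha+\beta+2)X + \alpha - \beta)g(X)]$, matching the expression that survives in the proof of the characterization. Under that reading $\eta(x) = 1 - x^2$ and $\gamma(x) = \beta - \alpha - (\alpha+\beta+2)x$; these closed forms are our own derivation. The parameters $\alpha = 1$, $\beta = 2$ are illustrative.

```python
import math

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule for int_lo^hi f(t) dt with n (even) panels."""
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * step)
    return total * step / 3

ALPHA, BETA = 1.0, 2.0   # illustrative parameters, both >= 0

def weight(x):
    """Unnormalised density of mu_{alpha,beta} on [-1, 1] (assumed form)."""
    return (1 - x) ** ALPHA * (1 + x) ** BETA

Z = simpson(weight, -1.0, 1.0)          # normalising constant

def expect(f):
    return simpson(lambda x: f(x) * weight(x), -1.0, 1.0) / Z

def stein_sides(g, dg):
    """Both sides of E[(1 - X^2) g'(X)] = E[((a + b + 2) X + a - b) g(X)]."""
    lhs = expect(lambda x: (1 - x * x) * dg(x))
    rhs = expect(lambda x: ((ALPHA + BETA + 2) * x + (ALPHA - BETA)) * g(x))
    return lhs, rhs

def eta_num(x):
    """(1 / weight(x)) * int_{-1}^x gamma(t) weight(t) dt; expected 1 - x^2."""
    gam = lambda t: (BETA - ALPHA) - (ALPHA + BETA + 2) * t
    return simpson(lambda t: gam(t) * weight(t), -1.0, x) / weight(x)
```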
Here, the values of at are arbitrary, but they are chosen such that is continuous, whenever is continuous at . This follows immediately from Proposition 2.22.
The next result, which is also proved in the appendix, completes the proof of Proposition 3.1 by showing that the solution is in the class whenever .
Next, we will derive some results for the solutions from corresponding results in Section 2.
Since and , this immediately follows from Proposition 2.25. ∎
There exists a constant , only depending on and such that .
The following lemma, which is proved in the appendix, will be useful.
has a bounded solution on if and only if .
and from Proposition 3.4 we see that is Lipschitz with minimal Lipschitz constant
Hence, there is a constant depending only on and such that
for all twice differentiable functions with bounded first and second derivative. We have thus proved the following proposition.
Now we are in the position to provide a “plug-in theorem” for the Beta approximation using exchangeable pairs.
Let be identically distributed, real-valued random variables on a common probability space satisfying the regression property
for some constant and a random variable . Then for each twice differentiable function with bounded first and second derivative and with $E[\lvert h(W)\rvert]<\infty$ we have the bound
where the constants and are from Propositions 3.4 and 3.6, respectively.
This immediately follows from Propositions 2.31, 3.4, 3.6 and since is a solution to Stein’s equation (11). ∎
In the following we transfer the developed theory to the Beta distributions $\nu_{a,b}$, $a,b>0$, on $[0,1]$. If $X\sim\nu_{a,b}$, then $Y:=2X-1\sim\mu_{b-1,a-1}$, and for suitable functions $f$ we have
So, a Stein identity for is given by
If is bounded, then $\lVert f_{h}\rVert_{\infty}\leq\lVert h-\nu_{a,b}(h)\rVert_{\infty}\max\Bigl(\frac{1}{2m(1-m)q_{a,b}(m)},\,\frac{1}{a},\,\frac{1}{b}\Bigr)$, where is a median for .
If is Lipschitz, then and , where only depends on and .
If is twice differentiable with bounded first and second derivative, then $\lVert f_{h}''\rVert_{\infty}\leq C_{2}\bigl(\lVert h'\rVert_{\infty}+\lVert h''\rVert_{\infty}\bigr)$, where only depends on and .
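The constant in the first bound above can be evaluated numerically for given parameters. This sketch assumes that $q_{a,b}$ denotes the density of $\nu_{a,b}$ (the symbol is not defined in the surviving text) and restricts to $a, b \ge 1$ so that the density is bounded on $[0,1]$; the median is found by bisection on a Simpson-rule CDF. All helper names are our own.

```python
import math

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule for int_lo^hi f(t) dt with n (even) panels."""
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * step)
    return total * step / 3

def beta_pdf(x, a, b):
    """Density of Beta(a, b) on [0, 1]; assumes a, b >= 1."""
    norm = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))
    return x ** (a - 1) * (1 - x) ** (b - 1) / norm

def beta_cdf(x, a, b):
    return simpson(lambda t: beta_pdf(t, a, b), 0.0, x)

def beta_median(a, b, tol=1e-13):
    """Bisection for the point m with beta_cdf(m) = 1/2."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sup_norm_constant(a, b):
    """max(1 / (2 m (1 - m) q(m)), 1/a, 1/b) with m the median of Beta(a, b)."""
    m = beta_median(a, b)
    return max(1 / (2 * m * (1 - m) * beta_pdf(m, a, b)), 1 / a, 1 / b)
```

For $a = b = 2$ the median is $1/2$ and $q_{2,2}(1/2) = 3/2$, so the constant is $4/3$.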
Now, let be identically distributed, real-valued random variables on a common probability space . For the approximation of by the general regression property from Section 2 is
where, again, is constant and is a hopefully small remainder term. For the distribution Theorem 3.7 becomes the following:
where the constants and are those from Proposition 3.8.
The assertion is clear from Propositions 2.31 and 3.8 and since is a solution to Stein’s equation (41). ∎
Application to the Polya urn model
In this section we prove a quantitative version of the fact that the relative number of drawn red balls in a Polya urn model converges in distribution to a suitable Beta distribution as the total number of drawings tends to infinity. This model serves as an application of our Stein's method for the Beta distribution, as developed in Section 3. We start by introducing the stochastic model:
It now follows that for each we have
or, with and ,
The distribution of is usually referred to as the Polya distribution with parameters , and . It is a well-known fact that the distribution of converges weakly to the distribution as goes to infinity, where the Beta distribution was defined in section 3. A convenient way to prove this weak convergence result is to use the formula
together with the weak law of large numbers for Bernoulli random variables to deal with the binomial probabilities .
Formula (43) can be proved by a straightforward computation using the relations and for the Beta function, where , and can also be viewed as a consequence of a special instance of de Finetti's representation theorem for infinite exchangeable sequences. Note, however, that one generally does not know the corresponding mixing measure from de Finetti's theorem, and hence identity (43) is not a direct consequence of this theorem.
From now on, we present a Stein's method proof of the above distributional convergence result and, as usual, also derive a rate of convergence. We will usually suppress the time index and let denote the random variable of interest. For the construction of the exchangeable pair, we use the well-known Gibbs sampling procedure with the slight simplification that, due to exchangeability of , we need not choose at random the index of the summand from that has to be replaced. Instead, we always replace by , which is constructed as follows:
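The identity behind (43) can be checked exactly with rational arithmetic. Formula (43) itself is not legible in this copy; the sketch checks the beta-binomial form $P(S_n = k) = \binom{n}{k} B(a+k, b+n-k)/B(a,b)$, which is what integrating the binomial probabilities against the Beta$(a,b)$ density gives. The urn parametrisation (initially $r$ red and $s$ white balls, each drawn ball returned together with $c$ extra balls of its colour, $a = r/c$, $b = s/c$) and the name $S_n$ for the number of red draws are hypothetical labels chosen for the illustration.

```python
from fractions import Fraction
from math import comb

def polya_pmf_sequential(n, r, s, c):
    """P(S_n = k), k = 0..n, by running the urn draw by draw.

    Hypothetical parametrisation: r red and s white balls initially; each
    drawn ball is returned together with c extra balls of its colour.
    """
    dist = {0: Fraction(1)}
    for j in range(n):                       # j draws made so far
        new = {}
        for k, prob in dist.items():         # k red balls drawn so far
            p_red = Fraction(r + c * k, r + s + c * j)
            new[k + 1] = new.get(k + 1, 0) + prob * p_red
            new[k] = new.get(k, 0) + prob * (1 - p_red)
        dist = new
    return [dist[k] for k in range(n + 1)]

def rising(x, m):
    """Rising factorial x (x + 1) ... (x + m - 1)."""
    out = Fraction(1)
    for i in range(m):
        out *= x + i
    return out

def polya_pmf_beta_formula(n, r, s, c):
    """binom(n, k) B(a + k, b + n - k) / B(a, b) with a = r / c, b = s / c."""
    a, b = Fraction(r, c), Fraction(s, c)
    return [comb(n, k) * rising(a, k) * rising(b, n - k) / rising(a + b, n)
            for k in range(n + 1)]
```

The ratio of Beta functions is written with rising factorials, $B(a+k, b+n-k)/B(a,b) = (a)_k (b)_{n-k}/(a+b)_n$, so that everything stays exact.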
Observe and construct according to the distribution . Then, letting , the pair is exchangeable. In order to use Stein’s method of exchangeable pairs, we need to establish a suitable regression property. This is the content of the following proposition.
The exchangeable pair satisfies the regression property
where $\gamma_{a,b}(x)=(a+b)\bigl(\frac{a}{a+b}-x\bigr)$ and .
We have and by exchangeability of it clearly holds that . Also, by the definition of and since only assumes the values and we have for any
Thus, since , we obtain
Next, we compute the quantity $E[(V'-V)^{2}\mid V]$.
We have for the above constructed exchangeable pair
From the general theory of Gibbs sampling (see the author's PhD thesis, to appear) it is known that
Since we have from the proof of Proposition 4.1 that
where we have used again. Finally, we compute
Putting pieces together, we eventually obtain
The last assertion easily follows from this and from . ∎
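Both conditional moments of the Gibbs-step pair can be computed exactly given $S_n = k$, which gives an independent check of the regression property. The sketch uses the same hypothetical urn parametrisation as before ($r$ red, $s$ white, $c$ extra balls per draw, $a = r/c$, $b = s/c$), $V = S_n/n$, and the fact that, by exchangeability, $P(X_n = 1 \mid S_n = k) = k/n$. The value of $\lambda$ and the exact statements of Propositions 4.1 and 4.2 are not legible in this copy, so the formulas in `predicted` are what our own computation yields, not a transcription: $E[V'-V \mid V] = \lambda\gamma_{a,b}(V)$ with $\lambda = 1/(n(n+a+b-1))$ and no remainder, and $E[(V'-V)^2 \mid V] = 2\lambda V(1-V) + \lambda(a + (b-a)V)/n$.

```python
from fractions import Fraction

def conditional_moments(n, k, r, s, c):
    """E[V' - V | S_n = k] and E[(V' - V)^2 | S_n = k] for the Gibbs-step pair.

    Hypothetical notation: S_n red draws among n, V = S_n / n, and V' obtained
    by redrawing X_n from its conditional law given X_1, ..., X_{n-1}.
    """
    p_last_red = Fraction(k, n)                       # P(X_n = 1 | S_n = k)
    denom = r + s + c * (n - 1)
    p_new_if_red = Fraction(r + c * (k - 1), denom)   # P(X_n' = 1) if X_n = 1
    p_new_if_white = Fraction(r + c * k, denom)       # P(X_n' = 1) if X_n = 0
    mean_new = p_last_red * p_new_if_red + (1 - p_last_red) * p_new_if_white
    first = (mean_new - p_last_red) / n
    # V' - V = (X_n' - X_n) / n, and (X_n' - X_n)^2 = 1 exactly when they differ
    differ = p_last_red * (1 - p_new_if_red) + (1 - p_last_red) * p_new_if_white
    second = differ / n ** 2
    return first, second

def predicted(n, k, r, s, c):
    """lambda * gamma_{a,b}(v) and 2 lambda eta(v) + lambda (a + (b - a) v) / n."""
    a, b = Fraction(r, c), Fraction(s, c)
    v = Fraction(k, n)
    lam = 1 / (n * (n + a + b - 1))
    gamma = (a + b) * (a / (a + b) - v)
    eta = v * (1 - v)
    return lam * gamma, 2 * lam * eta + lam * (a + (b - a) * v) / n
```

Since the correction term $\lambda(a + (b-a)V)/n$ is a factor $1/n$ smaller than $2\lambda\eta(V)$, the "law of large numbers" requirement $E[(V'-V)^2 \mid V] \approx 2\lambda\eta(V)$ is met in this example.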
Recall that for the distribution we have and hence, we obtain from Proposition 4.2 that
since . Similarly, since we have
From Theorem 3.9 we can now conclude the following result.
with the constants and from Proposition 3.8.
Since $V$ only takes values in $[0,1]$, the condition $E[\lvert h(V)\rvert]<\infty$ from Theorem 3.9 is trivially met. The assertion now follows immediately from Theorem 3.9, (44) and (45). ∎
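For a single illustrative test function the approximation error can be computed in closed form. Taking $h(x) = x^2$ and using the beta-binomial law of $S_n$, one finds $E[V^2] - E[Z^2] = ab/(n(a+b)(a+b+1))$ for $Z \sim$ Beta$(a,b)$, so for this $h$ the gap decays like $1/n$. The sketch below checks this identity exactly with rational arithmetic; it is only a consistency check on one function and says nothing about the constants from Proposition 3.8. The parameters are passed as `Fraction`s so that $a = r/c$ and $b = s/c$ may be non-integers.

```python
from fractions import Fraction
from math import comb

def rising(x, m):
    """Rising factorial x (x + 1) ... (x + m - 1)."""
    out = Fraction(1)
    for i in range(m):
        out *= x + i
    return out

def second_moment_gap(n, a, b):
    """E[V^2] - E[Z^2] for V = S_n / n (beta-binomial S_n) and Z ~ Beta(a, b)."""
    pmf = [comb(n, k) * rising(a, k) * rising(b, n - k) / rising(a + b, n)
           for k in range(n + 1)]
    ev2 = sum(prob * Fraction(k, n) ** 2 for k, prob in enumerate(pmf))
    ez2 = a * (a + 1) / ((a + b) * (a + b + 1))
    return ev2 - ez2
```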
In [GR12] the authors use a different technique within Stein’s method for the Beta distributions, which compares the Stein characterization of the target distribution with that of the approximating discrete distribution, to prove that, in the special case , the convergence rate of order from Theorem 4.3 even holds in the Wasserstein distance and they compute an explicit constant in the bound. They also show that the rate of convergence is optimal. Using their technique and the bounds from Proposition 3.8 one can easily see that the rate of order in the Wasserstein distance also holds in the case . However, in order to obtain an explicit constant, some further work has to be done to bound the constant from Proposition 3.8 in the case that one of the values is strictly smaller than one.
In [FG] the authors use the zero bias coupling within Stein’s method for normal approximation to prove bounds on the distance of a normalized version of the quantity to the standard normal distribution. In particular, they show that a CLT holds whenever the parameters , and tend to infinity in a suitable fashion.
Appendix
In this section we provide the proofs of some of the results from Sections 2 and 3 and state and prove some further auxiliary results, which are only used within proofs.
The following result justifies all our calculations, which invoke de l’Hôpital’s rule. Its proof is suppressed for reasons of space, but will be given in the author’s PhD thesis.
If , then both, and , are absolutely continuous on .
If , then for all and
The same conclusion holds if for almost all and an analogous result is true for .
2. Proofs from Section 2
which exists in by Condition 2.2. Here, we used the convention .
again by Condition 2.2 and by Proposition 2.4. Furthermore, we have
for each since by the positivity of and because is strictly decreasing
Hence, is strictly increasing and thus for each :
The same bound can be proved for by using the representation
and the fact that also .
The following two well-known lemmas will be needed for the proof of Proposition 2.14. Their proofs are included only for reasons of completeness.
Let and let be a probability measure (not necessarily absolutely continuous with respect to ) with . Let be the distribution function corresponding to and suppose that . Then, for each we have
For each we have .
Since is a probability measure we have by the fundamental theorem of calculus for Lebesgue integration and by Fubini’s theorem
This proves (a). As to (b), we have using (a) and its proof
First, we prove (a). Recall the representation
By Lemmas 5.3 and 5.2 we thus obtain that
implying (a). Now, we turn to the proof of (b). By Stein’s equation (11) we obtain for
which reduces to the bound asserted in (b). ∎
The bound on for has already been proved in Proposition 2.10. Let . Then we have by the negativity of which follows from Proposition 2.23:
Next, we state a lemma that replaces Lemma 5.2 outside the support .
For each we have and .
For each we have and .
which proves the first part of (a). The second claim of (a) follows from the first one, from the positivity of on and from the monotonicity of :
The proof of (b) is similar but easier, and is therefore omitted. ∎
The next lemma replaces Lemma 5.3 outside the support of .
For each we have $h(x)-\mu(h) = -\int_{x}^{b}\bigl(1-F(s)\bigr)h'(s)\,ds = -\int_{x}^{a}h'(s)\,ds - \int_{a}^{b}\bigl(1-F(s)\bigr)h'(s)\,ds$ and
For each we have and
We prove only (a), since the proof of (b) is very similar. The first claim follows from Lemma 5.3 (a), since for and for . The second claim follows from the first one and from Fubini’s theorem by
This is the first representation for in the assertion. The second one follows since for and hence
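As a hedged sanity check of the first identity in Lemma 5.5 (a), the following Python sketch evaluates both sides numerically in one concrete case. The choices below are assumptions made only for illustration and are not taken from the text: μ is the uniform distribution on (0, 1), so that a = 0, b = 1 and F(s) = s on [0, 1]; the test function is h(s) = s²; and x = −1 lies to the left of the support. The integrals are computed with a small hand-written composite Simpson rule (the helper `simpson` is hypothetical and not part of any library).

```python
# Numerical sanity check (illustrative sketch) of
#   h(x) - mu(h) = -int_x^b (1 - F(s)) h'(s) ds
#                = -int_x^a h'(s) ds - int_a^b (1 - F(s)) h'(s) ds
# for a point x <= a to the left of the support (a, b).
# Assumptions (not from the text): mu = Uniform(0, 1), so a = 0, b = 1,
# and the test function is h(s) = s**2.


def F(s):
    """Distribution function of Uniform(0, 1)."""
    return min(max(s, 0.0), 1.0)


def h(s):
    return s * s


def h_prime(s):
    return 2.0 * s


def simpson(f, lo, hi, n=1000):
    """Composite Simpson rule on [lo, hi] with n (forced even) subintervals."""
    if n % 2:
        n += 1
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * step)
    return total * step / 3.0


a, b = 0.0, 1.0
mu_h = 1.0 / 3.0  # E[U**2] for U ~ Uniform(0, 1)
x = -1.0          # a point to the left of the support

lhs = h(x) - mu_h
rhs_full = -simpson(lambda s: (1.0 - F(s)) * h_prime(s), x, b)
rhs_split = (-simpson(h_prime, x, a)
             - simpson(lambda s: (1.0 - F(s)) * h_prime(s), a, b))

print(lhs, rhs_full, rhs_split)
```

Since every integrand here is a piecewise polynomial of degree at most two, and the kink at s = 0 falls on a Simpson panel boundary, the quadrature is exact up to rounding, so all three printed numbers should agree with 1 − 1/3 = 2/3.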
We only prove (a) and (c), since the proofs of (b) and (d) are similar. To prove (a), we observe that by Lemma 5.5 we have
Since is decreasing () and positive on , this implies
By Lemma 5.2 and Lemma 5.4 (a) the right hand side equals
which is the claimed bound. Now, we turn to the proof of (c). By Stein’s equation (24) and Lemma 5.5 (a) we have for each :
These, together with Proposition 2.27 (a), (b) and Corollary 2.16, immediately imply (a). As to (b), by (49) we have
This and Proposition 2.27 (c) imply claim (b). Assertion (c) may be proved similarly. ∎
proving the desired representation of inside the interval . Now, for let and . Then we have
since is positive by Proposition 2.14. Thus, is strictly increasing and is strictly decreasing on . Since for and for , this implies that . It also implies the claimed representation of for . Furthermore, by de l’Hôpital’s rule, we have
Note that these limits could also be derived from Proposition 2.22. Next, consider . For such an we have
by Lemma 5.4. Thus, is increasing on and hence, again by de l’Hôpital’s rule,
Since , we have also derived the desired formula for for . The calculations for are completely analogous and are therefore omitted. From our computations we can already infer that . So it remains to show that this quantity is bounded in . Since it is a continuous function of , we only have to show that it has finite limits at the endpoints of the interval . But, of course,
and, similarly, . This concludes the proof. ∎
3. Proofs from Section 3
Using de l’Hôpital’s rule 5.1 and Lemma 5.6 (a) below, we obtain
Hence, . Similarly, one can prove that . It remains to show that
To this end, it suffices to show that the function is bounded on and on . Since it is continuous on and on (where for definiteness), this claim will follow once we have proved that . For we have
proving the claim for . Since it may be proved analogously for , the proof is complete. ∎
The following lemma will be useful for the proof of Proposition 3.4.
The functions , , and satisfy the following equations for each integer :
(a) $\frac{d}{dx}\bigl(\eta(x)^{k}p(x)\bigr)=\eta(x)^{k-1}p(x)\bigl[(k-1)\eta'(x)+\gamma(x)\bigr]$ for each
(b) $\frac{d}{dx}\bigl(\eta(x)^{k}q_{l}(x)\bigr)=\eta(x)^{k-1}q_{l}(x)\bigl[(k-1)\eta'(x)+\gamma(x)\bigr]$ for each
(c) $\frac{d}{dx}\bigl(\eta(x)^{k}q_{r}(x)\bigr)=\eta(x)^{k-1}q_{r}(x)\bigl[(k-1)\eta'(x)+\gamma(x)\bigr]$ for each
First we prove (a). By (14) we have, multiplying by ,
proving (a). As to (b), we observe that by the definition of we have on the one hand
and on the other hand, by the product rule,
Now the proof proceeds along the lines of the proof for above. Assertion (c) may be proved similarly. ∎
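The case k = 1 of identity (a) reads (ηp)′ = γp, and the general case then follows from the product rule applied to η^{k−1}·(ηp). The following Python sketch checks (a) numerically by central finite differences in one hypothetical example that is not taken from the text: p is the Beta(2, 3) density on (0, 1), η(x) = x(1 − x) and γ(x) = 2 − 5x, a choice for which (ηp)′ = γp holds by direct differentiation. The functions q_l and q_r from (b) and (c) are not covered by this check.

```python
# Finite-difference check (illustrative sketch) of
#   d/dx (eta(x)**k * p(x)) = eta(x)**(k-1) * p(x) * ((k-1)*eta'(x) + gamma(x))
# Assumptions (not from the text): p is the Beta(2, 3) density on (0, 1),
# eta(x) = x*(1 - x) and gamma(x) = 2 - 5*x, so that (eta*p)' = gamma*p,
# which is exactly the case k = 1 of the identity.

ALPHA, BETA = 2.0, 3.0
NORM = 12.0  # 1 / B(2, 3) = Gamma(5) / (Gamma(2) * Gamma(3)) = 24 / 2


def p(x):
    """Beta(ALPHA, BETA) density on (0, 1)."""
    return NORM * x ** (ALPHA - 1.0) * (1.0 - x) ** (BETA - 1.0)


def eta(x):
    return x * (1.0 - x)


def eta_prime(x):
    return 1.0 - 2.0 * x


def gamma(x):
    # chosen so that (eta * p)' = gamma * p for the Beta(ALPHA, BETA) density
    return ALPHA - (ALPHA + BETA) * x


def lhs_fd(k, x, step=1e-5):
    """Central finite-difference approximation of (eta**k * p)'(x)."""
    f = lambda t: eta(t) ** k * p(t)
    return (f(x + step) - f(x - step)) / (2.0 * step)


def rhs_exact(k, x):
    """Right-hand side of identity (a)."""
    return eta(x) ** (k - 1) * p(x) * ((k - 1) * eta_prime(x) + gamma(x))


errors = {
    (k, x): abs(lhs_fd(k, x) - rhs_exact(k, x))
    for k in (1, 2, 3)
    for x in (0.1, 0.3, 0.5, 0.7, 0.9)
}
max_err = max(errors.values())
print(f"largest discrepancy: {max_err:.2e}")
```

The reported discrepancy only reflects the truncation and rounding error of the difference quotient, so it should stay far below 10⁻⁶ for every k and x tested.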
Assertion (a) immediately follows from Corollary 2.28 (a) since in this case . Now, we turn to the proof of (b). First, consider . By Corollary 2.16 (b) we have for :
Then is continuous on and hence, to show that is bounded on , it suffices to prove that has finite limits at . Since we obtain, using Lemma 5.6 and de l’Hôpital’s rule:
Here we have used that . Similarly one shows that
Thus, we have shown that . Now, we consider . From Corollary 2.28 (b) we have
For we consider the function
Clearly, is a continuous function on . To show that it is bounded, it thus suffices to prove that and . Using Lemma 5.6 and de l’Hôpital’s rule, we obtain
Next, we will show that . In fact, we have
Here we have used that as . Hence, and . Since we can show in a similar manner that , where is defined in the obvious way, the proof is complete. ∎
If , then the usual Stein solution is bounded on by Proposition 3.3. For the converse, let us assume that . As was already noted in Section 2, the solutions of the homogeneous equation corresponding to (40) are exactly the multiples of . Thus, every solution of (40) has the form
since . Hence, is unbounded near . If we have
So, is unbounded near . Hence, in any case is unbounded on .