Distributional transformations without orthogonality relations

Christian Döbler

Introduction

Main abstract results and discussion

In this Subsection we give a proof of the following theorem, which is a generalization of Theorem 2.1 in [GR05].

Then, $\alpha$ is necessarily positive and there exists a unique distribution for a random variable $X^{(B)}$ such that for all $F\in\mathcal{F}^{m}$ we have

with $L_{F}$ as defined in (2). Furthermore, if $m\geq 1$, then the distribution of $X^{(B)}$ is absolutely continuous with respect to the Lebesgue measure.

If $X$ and $B$ additionally satisfy the orthogonality conditions $E[X^{j}B(X)]=0$ for all $j=0,1,\dotsc,m-1$, then the distribution $\mathcal{L}(X^{(B)})$ of $X^{(B)}$ reduces to the $X-B$ biased distribution from [GR05], as is easily seen by writing the polynomial $L_{F}$ in terms of the monomials $1,X,\dotsc,X^{m-1}$. Also, in this case for the same reason we have $\alpha=(m!)^{-1}E[X^{m}B(X)]$. So it is justified to call the distribution of $X^{(B)}$ the generalized $X-B$ biased distribution.
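The interpolation polynomial $L_{F}$ can be made concrete with a few lines of Python. The following is a minimal sketch, assuming that $L_{F}$ is the polynomial of degree at most $m-1$ that agrees with $F$ at the sign change points; the nodes, the evaluation point and the function names are hypothetical choices made for illustration only. It uses exact rational arithmetic to check that the Lagrange basis sums to one, that polynomials of degree at most $m-1$ are reproduced, and that $F-L_{F}$ vanishes at the nodes for a monic polynomial $F$ of degree $m$.

```python
from fractions import Fraction

def lagrange_basis(nodes, i, x):
    """Value at x of the i-th Lagrange basis polynomial for the given nodes."""
    value = Fraction(1)
    for l, node in enumerate(nodes):
        if l != i:
            value *= Fraction(x - node) / Fraction(nodes[i] - node)
    return value

def interpolate(F, nodes, x):
    """Evaluate at x the polynomial of degree <= m-1 that matches F at the m nodes."""
    return sum(F(node) * lagrange_basis(nodes, i, x) for i, node in enumerate(nodes))

# Hypothetical sign change points (m = 3) and evaluation point.
nodes = [Fraction(-1), Fraction(0), Fraction(2)]
x = Fraction(5, 3)

# The basis polynomials sum to one (interpolation of the constant function 1).
basis_sum = sum(lagrange_basis(nodes, i, x) for i in range(len(nodes)))

quadratic = lambda t: 3 * t**2 - t + 1   # degree m - 1: reproduced exactly
cubic = lambda t: t**3                   # degree m: F - L_F is a monic cubic

remainder = cubic(x) - interpolate(cubic, nodes, x)
product = (x - nodes[0]) * (x - nodes[1]) * (x - nodes[2])

print(basis_sum, interpolate(quadratic, nodes, x) == quadratic(x), remainder == product)
```

The last comparison reflects the fact that a monic polynomial of degree $m$ minus its interpolation polynomial is the product of the factors $(x-x_{k})$.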

Note that if, according to our definition of sign changes, $B$ has both $m$ and $m^{\prime}$ sign changes for $m\not=m^{\prime}$, then we see from (3) that these two points of view lead to different distributions for $X^{(B)}$. Also, if we may consider $B$ to have sign changes at $x_{1}<\ldots<x_{m}$ as well as at $y_{1}<\ldots<y_{m}$, then the resulting $\alpha$'s and, again, the distributions of the $X^{(B)}$'s are different, in general. This is in contrast to the theory from [GR05], where such ambiguities are ruled out by their orthogonality assumptions on $X$ with respect to $B$. Thus, one should actually denote the variable $X^{(B)}$ by $X^{(B;x_{1},\dotsc,x_{m})}$ to prevent these ambiguities. We will, however, not do so but rather assume that it is understood, or mention explicitly, how many sign changes at what exact points the function $B$ is supposed to have. We illustrate this phenomenon for the case $m=1$ in Example 3.6 below.

For the existence part of Theorem 2.1 we give two different proofs: an analytical proof, which uses the Riesz representation theorem, and a probabilistic proof, which relies on an explicit construction of the random variable $X^{(B)}$. Remarkably, the same construction of $X^{(B)}$ as in [GR05] is still valid in this more general setting. However, we were not able to generalize the proof of Theorem 2.1 in [GR05] to a proof of our Theorem 2.1.

In the case $m=1$, one can easily show that the function $p$ given by

Note that if $F\in\mathcal{F}^{m}$, then one can easily show by induction on $k=0,1,\dotsc,m$ that there exist finite constants $c_{k}>0$ such that

The assumption $E\lvert X^{j}B(X)\rvert<\infty$ for $j=0,1,\dotsc,m$ is easily seen to be equivalent to $E\lvert B(X)\rvert<\infty$ and $E\lvert X^{m}B(X)\rvert<\infty$.

From the nonnegativity of $B$ on $J_{m+1}$ we know that

Thus, if $\alpha\not=0$, it is necessarily positive. Now, we give the explicit construction of the random variable $X^{(B)}$ from [GR05]. Let $Y,U_{1},U_{2},\dotsc,U_{m}$ be independent random variables such that $U_{j}$ has the density $p_{j}(u):=ju^{j-1}1_{(0,1)}(u)$ ($1\leq j\leq m$) and $Y$ has distribution $\nu$ given by

where $\mu$ is the distribution of $X$. Note that, by (5) and the definition and positivity of $\alpha$, $\nu$ is indeed a probability measure and, hence, such a $Y$ exists. Now, we define the random variable

where $x_{0}:=Y$. We claim that $X^{(B)}$ satisfies (3). This claim will be proved by induction on $m=0,1,\dotsc$. If $m=0$, then the claim reduces to

we conclude from the induction hypothesis that

we can thus conclude from the induction hypothesis that

which is clear from the Lagrange form of the interpolation polynomial corresponding to the constant function $1$ and the nodes $x_{1},\dotsc,x_{m}$. Using this, we obtain that

To prove this claim, we use the explicit construction of $X^{(B)}$ given in (7). Thus, we have that

By the properties of the Lebesgue measure it follows that

Thus, from (11) we infer that $P(X^{(B)}\in N)=0$. Hence, the distribution of $X^{(B)}$ is absolutely continuous with respect to $\lambda$. ∎
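The auxiliary variables $U_{j}$ used in the construction have density $ju^{j-1}$ on $(0,1)$, hence distribution function $u^{j}$, so they can be simulated by inverse transform sampling. The following is a minimal sketch in Python; the helper name `sample_u`, the seed and the sample size are illustrative assumptions, and the sketch only samples the $U_{j}$, it does not reproduce the formula (7) for $X^{(B)}$.

```python
import random

def sample_u(j, rng):
    """Draw U_j with density j * u**(j-1) on (0, 1) by inverting its CDF u**j."""
    return rng.random() ** (1.0 / j)

rng = random.Random(0)   # fixed seed, chosen only to make the run reproducible
n = 100_000

# Empirical means; the exact value is E[U_j] = j / (j + 1).
means = {j: sum(sample_u(j, rng) for _ in range(n)) / n for j in (1, 2, 3)}
for j, mean in means.items():
    print(j, round(mean, 3), j / (j + 1))
```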

With the notation of the above existence proof, from the identity

valid for bounded and measurable $f$, and an easy change of variable one can easily deduce that for $m\geq 2$ the ($\lambda$-a.e. unique) density $p$ of $X^{(B)}$ is given by

From (2.1) and Remark 2.2 (f) we conclude that

Further, for each $s\in S$ let $X_{s}^{(B)}$ have the generalized $X_{s}-B$ biased distribution. Let $J$ be independent of the family $(X_{s}^{(B)})_{s\in S}$ having distribution $P(J\in A):=\int_{A}\frac{\alpha_{s}}{\alpha}d\gamma(s)$, $A\in\mathcal{S}$.

Under the above assumptions the variable $X^{(B)}:=X_{J}^{(B)}$ has the generalized $X-B$ biased distribution.

The easy proof is quite standard: For $F\in\mathcal{F}^{m}$ we have by Fubini's theorem

It is actually not strictly necessary to assume that $X_{s}$ satisfies the assumptions of Theorem 2.1 for each $s\in S$. In fact, assuming (2.1) it follows from Remark 2.2 (f) that $\alpha_{s}$ exists for $\gamma$-a.e. $s\in S$ but it might be zero for certain values of $s$. Assuming additionally that $\alpha>0$ for $X=X_{I}$ and letting $X^{(B)}_{s}$ have any fixed distribution if $\alpha_{s}=0$, the proof goes through as before, since the distribution of the index $J$ puts no mass on values of $s$ such that $\alpha_{s}=0$.

2. Biasing functions with fewer than $m$ sign changes

Although Theorem 2.1 is already quite general, in practice it might happen that one would like the order $m$ of the derivative on the right hand side of (3) to be larger than the number, say $k$, of sign changes of the function $B$ on the left hand side of (3). For example, if $X$ is a nonnegative random variable with finite and non-zero expectation, then $X^{e}$ is said to have the equilibrium distribution with respect to $X$, if

holds for all continuously differentiable functions $f$ with a Lipschitz derivative. In the final version of [PR14] the authors prove this by giving an explicit construction of the random variable $X^{(L)}$. In the first arXiv version, however, they applied Theorem 2.1 of [GR05] with the distributional transformation given by $B(x)=\operatorname{sign}(x)$ twice in a row, and, in order to do so, they had to make sure that the orthogonality assumptions of that theorem were satisfied. This is why they first had to assume that not only $E[X]=0$ but also $P(X<0)=P(X>0)=1/2$ be satisfied. Invoking Theorem 2.1 instead, we are able to prove the following statement, which even generalizes (15) to the class of all $X$ with finite second moment. This result is the main building block of a generalization of Theorem 2.1 to cases where the number of sign changes of $B$ might disagree with the order of the derivative of the test function $F$.

holds for all continuously differentiable functions $f$ with a Lipschitz derivative. Further, the distribution of $\hat{X}_{a}$ is always absolutely continuous with respect to the Lebesgue measure.

Using the transformation from Proposition 2.5, one could easily generalize the results from [PR14] to random sums with general mean zero summands and even to summands with small, non-zero means.

holds for all Lipschitz functions $h$. Since $X$ has finite second moment, one can easily see that (18) also holds for absolutely continuous functions $g$ such that $\lvert g^{\prime}(x)\rvert$ is $O(x)$ as $\lvert x\rvert\to\infty$. In particular this holds for $g(x):=\operatorname{sign}(x-a)\bigl(f(x)-f(a)\bigr)$ with $g(a)=0$ and $g^{\prime}(x)=\operatorname{sign}(x-a)f^{\prime}(x)$ for $x\not=a$. Thus, from (18), (2.2) and (2.2) we conclude that

proving (16). Absolute continuity of $\mathcal{L}(\hat{X}_{a})$ follows immediately from Theorem 2.1. ∎

Next, we will use the result of Proposition 2.5 to give a generalization of Theorem 2.1 to cases where the number $k$ of sign changes of $B$ may be smaller than the order $m$ of the derivative we would like to have in the defining identity for the biased distribution. However, we will have to assume that $k\equiv m\mod 2$, i.e. that $k$ and $m$ have the same parity. In what follows, for nonnegative integers $n,j$ we denote by $(n)_{j}$ the falling factorial, i.e. $(n)_{0}:=1$ and $(n)_{j}:=n(n-1)\cdot\ldots\cdot(n-j+1)$ if $j\geq 1$.
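As a small illustration of the falling factorial notation, here is a sketch in Python; the function name `falling_factorial` is an illustrative choice.

```python
def falling_factorial(n, j):
    """Return (n)_j = n * (n - 1) * ... * (n - j + 1), with (n)_0 = 1."""
    result = 1
    for i in range(j):
        result *= n - i
    return result

print(falling_factorial(5, 0), falling_factorial(5, 2), falling_factorial(5, 5))
```

Since the $k$-th derivative of $x^{k+j}$ equals $(k+j)_{k}\,x^{j}$, dividing a monomial $x^{k+j}$ by $(k+j)_{k}$ yields a function whose $k$-th derivative is exactly $x^{j}$.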

If $k=0$, assume further that the generalized $X-B$ biased distribution from Theorem 2.1 is not the Dirac measure at $0$. Then, there exists a unique distribution for a random variable $X^{(B,m)}$ such that

holds for each $F\in\mathcal{F}^{m}$, where, with

if $k=0$. Then, $R_{F}$ is equal to zero whenever $k=m$, and has degree at most $m-1$ if $k<m$. Furthermore, $L_{F}$ still denotes the interpolation polynomial for $F$ corresponding to the nodes $x_{1},\ldots,x_{k}$ given by (2) but with $m$ replaced by $k$. Additionally, $\beta$ is always positive and is given by

if $k\geq 1$ and by $\beta=(m!)^{-1}E[B(X)X^{m}]$, if $k=0$. Also, the distribution of $X^{(B,m)}$ is always absolutely continuous with respect to the Lebesgue measure unless $k=m=0$.

From Theorem 2.1 we know that $\alpha>0$. Let $F\in\mathcal{F}^{m}$ be given. By the assumptions on $X$ one can conclude again from Theorem 2.1 that $E[B(X)(F(X)-L_{F}(X))]$ exists and that there is a random variable $Y$ having the generalized $X-B$ biased distribution, so that

From our assumption in the case $k=0$ and from Theorem 2.1 for $k\geq 1$, we know that $Y$ is not almost surely equal to zero. Thus, if $m\geq k+2$, by Proposition 2.5 (with $a=0$) we know that there is a random variable $Y_{1}$ satisfying

where $\beta_{1}=\frac{1}{2}E[Y^{2}]$. Now, if $m\geq k+4$, then again by Proposition 2.5 we can find a random variable $Y_{2}$ such that

since $E[Y_{1}]=\frac{1}{6\beta_{1}}E[Y^{3}]=\frac{1}{3!\beta_{1}}E[Y^{3}]$ and with

Inductively, for $l=1,\dotsc,\frac{m-k}{2}$ we find that there exists $Y_{l}$ such that, with $Y_{0}:=Y$, we have

Again by induction we find the following analog of (2.2):

Now note that for $j=0,1,\dotsc,m-k$ with the function $F_{j}(x):=\frac{x^{k+j}}{(k+j)_{k}}$ we have from (28) that

Clearly, $Q_{j}(x):=F_{j}(x)-L_{F_{j}}(x)$ is a polynomial of degree $k+j$ having the zeroes $x_{1}<\ldots<x_{k}$. Thus, there exists a polynomial $q_{j}$ of degree $j$ such that $Q_{j}(x)=q_{j}(x)\prod_{l=1}^{k}(x-x_{l})$. Now, first suppose that $k=0$. Then, we have $F_{j}(x)=Q_{j}(x)=q_{j}(x)=x^{j}$. Thus, from (33) and (34) we can conclude that

Letting $X^{(B,m)}:=Y_{\frac{m}{2}}$, the claim follows in the case $k=0$ from (28) and (2.2). From now on, we will assume that $k\geq 1$. In order to find $q_{j}$ in this case, we write

the last identity because the left hand side is a polynomial of degree $j+k$ and, hence, the right hand side must also be. Thus, as a neat by-product we have proved that

From (2.2) we conclude that $q_{j}$ is given by

Hence, from (34) and (38) we find for $j=0,1,\dotsc,m-k$ that

Now, from reading (2.2) backwards (with $m=k+j$) we obtain

Letting $X^{(B,m)}:=Y_{\frac{m-k}{2}}$, (23) now follows from (28) and (42). To see that $\beta>0$, note that we know from our assumption in the case $k=0$ and from Theorem 2.1 in the case $k\geq 1$ that $Y$ cannot almost surely be equal to zero. Thus, the even moments of $Y$ are also non-zero. Since we know from (33) that $\beta=\frac{\alpha}{(m-k)!}E[Y^{m-k}]$ with $\alpha>0$ and as $m-k$ is even, it follows that also $\beta>0$. Knowing that $\beta$ is necessarily positive, uniqueness of the distribution for $X^{(B,m)}$ can be proved as for $X^{(B)}$ in the proof of Theorem 2.1. Absolute continuity of $\mathcal{L}(X^{(B,m)})$ in the case that not both $m$ and $k$ are equal to zero now follows from Theorem 2.1 and Proposition 2.5. It remains to show the alternative representation for the numbers $a_{i}^{(j)}$ in (24). This is given by Lemma 2.8. ∎

For $k\geq 1$ let $x_{1},\dotsc,x_{k}$ be distinct real (or complex) numbers. Then, for each nonnegative integer $n$ we have the identity

We prove the claim by induction on $k$, simultaneously for all $n\geq 0$. If $k=1$, then it is clearly true. Now assume that $k\geq 1$ and that $x_{1},\dotsc,x_{k},x_{k+1}$ are distinct numbers. Then, we can write

we conclude from the induction hypothesis that

Thus, it only remains to show that $S_{2}=0$. But this follows from (37), completing the proof. ∎

We may call the distribution of $X^{(B,m)}$ the $X-(B,m)$ biased distribution. Note, however, that, as for $X^{(B)}$, the distribution of $X^{(B,m)}$ is sensitive to the number $k$ and the choice of the sign change points $x_{1}<\ldots<x_{k}$, if these are ambiguous (see Remark 2.2 (b)).

It is easy to see that an analog of Proposition 2.4 also exists for the $X-(B,m)$ biased distribution.

Examples and Applications

In this subsection we give some examples of first-order distributional transformations whose existence is guaranteed by Theorem 2.1, and demonstrate how this theory may be applied to prove certain Stein type characterizations without using the solution of the corresponding Stein equation. We also show how one can use a coupling of $X$ and $X^{(B)}$ to estimate the distance of $\mathcal{L}(X)$ to a fixed point of the distributional transformation induced by $B$. Finally, we show by example that the distribution of $X^{(B)}$ in general depends on the choice of the zeroes of $B$, if these are ambiguous.

Let $X$ be a real-valued random variable with $0<E[X^{2}]<\infty$. Choosing $B(x)=x$ with a single sign change at $0$, we conclude from Theorem 2.1 that there exists a random variable $X^{gz}$ such that

Under the same assumptions on $X$ as in (a) we now choose $B(x):=x-E[X]$. Then,

and, again by Theorem 2.1, we find that there is a random variable $X^{nz}$ such that

where we have used that $E[B(X)]=0$ in this case. Again, whenever $X$ has mean zero, the distribution of $X^{nz}$ reduces to the $X$-zero biased distribution. In general, we call it the $X$-non-zero biased distribution. Note that the existence of this distribution already follows from Theorem 2.1 in [GR05], as $B$ satisfies their orthogonality relation in this case.

Next, we show by example how the existence of such distributional transformations may be used to prove a Stein type characterization of a given distribution, which is a fixed point of the distributional transformation. We first need the following definition.

Let $\sigma>0$ and $Z_{\sigma}\sim N(0,\sigma^{2})$. Then, the distribution of $Y_{\sigma}:=\lvert Z_{\sigma}\rvert$ is called the half-normal distribution or modulus normal distribution with parameter $\sigma^{2}=E[Y_{\sigma}^{2}]$. Further, we say that $W_{\sigma}$ has the negative half-normal distribution with parameter $\sigma^{2}$, if $-W_{\sigma}$ has the half-normal distribution with parameter $\sigma^{2}$.

Let $X$ be a real-valued random variable such that $0<E[X^{2}]<\infty$. Then $\mathcal{L}(X)$ is a fixed point of the generalized zero bias transformation if and only if it is a mixture of a half-normal and a negative half-normal distribution with the same parameter.

Let the distribution of $X$ be a fixed point of the generalized zero-bias transformation. Then, from Remark 2.2 (d) we know that $X$ has an absolutely continuous distribution with density $p$ given by

From (3.1) and (48) we conclude that $p$ is continuously differentiable on $(0,\infty)$ and on $(-\infty,0)$ and that

for each $t\not=0$. From (49) we see that

for $t<0$. Here, we used the shorthands $p(0+):=\lim_{t\downarrow 0}p(t)$ and $p(0-):=\lim_{t\uparrow 0}p(t)$. The claim now follows from (50) and (51). Conversely, if the distribution of $X$ is such a mixture, then, by a standard computation involving Fubini's theorem, one easily verifies that $X$ satisfies

and, hence, that $\mathcal{L}(X)$ is a fixed point of the generalized zero bias transformation. We omit the details. ∎
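The converse direction can also be checked numerically. The following Python sketch assumes that the fixed point property reads $E[X^{2}]\,E[f^{\prime}(X)]=E[X(f(X)-f(0))]$, which is what Theorem 2.1 gives for $B(x)=x$, $m=1$ and the sign change point $0$; this identity is an assumption of the sketch, since the displayed formula is not reproduced here. The mixture weight `w`, the value of `sigma`, the test function and the midpoint-rule quadrature are all illustrative choices.

```python
import math

def mixture_density(t, sigma, w):
    """Weight w on the half-normal law, 1 - w on the negative half-normal law."""
    half_normal = 2.0 * math.exp(-t * t / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))
    return w * half_normal if t > 0 else (1.0 - w) * half_normal

def expectation(g, sigma, w, steps=50_000):
    """Midpoint-rule approximation of E[g(X)]; tails beyond 12 * sigma are negligible."""
    upper = 12.0 * sigma
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += g(t) * mixture_density(t, sigma, w) + g(-t) * mixture_density(-t, sigma, w)
    return total * h

sigma, w = 1.5, 0.3                        # hypothetical parameter and mixture weight
f = lambda t: math.sin(t) + math.cos(t)    # test function with Lipschitz derivative
f_prime = lambda t: math.cos(t) - math.sin(t)

second_moment = expectation(lambda t: t * t, sigma, w)
lhs = second_moment * expectation(f_prime, sigma, w)
rhs = expectation(lambda t: t * (f(t) - f(0.0)), sigma, w)
print(second_moment, lhs, rhs)
```

Both sides agree up to quadrature error for any weight `w` in $[0,1]$, in line with the statement that every such mixture is a fixed point.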

From Proposition 3.3 we directly infer the following Stein characterization of the class of half-normal distributions, whose derivation does not make use of the solution to any Stein equation.

A nonnegative random variable $X$ with $0<E[X^{2}]<\infty$ has the half-normal distribution with parameter $\sigma^{2}=E[X^{2}]$ if and only if

for all $t\not=x_{1}$, which is analogous to (49) and which implies that the $\log$-derivative of $p$ is given by $-B/\alpha$. Hence, the family of densities $p$ giving rise to fixed points of the distributional transformation $\mathcal{L}(X)\mapsto\mathcal{L}(X^{(B)})$ can be reconstructed as before.

Suppose that the distribution of $Z$ is a fixed point of the distributional transformation in (a). Up to dividing $B$ by a constant, which does not change the distributional transformation, we can assume that

i.e. $-B$ is the $\log$-derivative of the density $p$ of $Z$. Then, the Stein equation from the density approach (see e.g. [CGS11]) for $Z$ corresponding to a test function $h$ such that $E[h(Z)]$ exists, reads

where we suppose that the support of $\mathcal{L}(Z)$ is given by the interval $\overline{(a,b)}$ for some $-\infty\leq a<b\leq\infty$. The law of $Z$ is then usually characterized by the identity

valid for all functions $f$ from some large function class $\mathcal{F}$. If $h$ is Lipschitz-continuous, one typically has bounds for $f_{h}$ of the form

for some finite constants $c_{0},c_{1}$ and $c_{2}$ (see [CGS11], again). Now, suppose that $X$ is given and that $X^{(B)}$ has the generalized $X-B$ biased distribution and is constructed on the same space as $X$. Then, for a $1$-Lipschitz function $h$, we can estimate

From (53) with $f(x)=x-x_{1}$ we presume that $\alpha$ should be close to one, if $\mathcal{L}(X)\approx\mathcal{L}(Z)$. Thus, the second term in (55) (or (54)) should be close to zero. Also, if we can couple $X^{(B)}$ close to $X$, then the first term should be small, too. In many cases, we have that $E[B(Z)]=0$, as is suggested by taking $f(x)\equiv 1$ in (53), and from which we conclude that the third term in (55) is also close to zero and, hence, that (55) gives a good estimate of the Wasserstein distance

and, hence, (54) might still give a useful estimate. In a nutshell, if the distribution of $Z$ is a fixed point of the distributional transformation induced by $B$ and we somehow conjecture that $\mathcal{L}(X)\approx\mathcal{L}(Z)$ and if we can couple $X$ and $X^{(B)}$ sufficiently closely, then we should be able to accurately estimate the (Wasserstein) distance between $\mathcal{L}(X)$ and $\mathcal{L}(Z)$ by the above procedure.

The following example illustrates the dependence of the distribution of $X^{(B)}$ on the choice of the sign change points, if there are non-trivial intervals where $B$ vanishes identically and if the orthogonality relations from [GR05] do not hold.

From Remark 2.2 (d) we know that a density $p$ for the distribution of $X^{(B;a)}$ is given by

and that a density $q$ for the distribution of $X^{(B;b)}$ is given by

We immediately see that, if the orthogonality relation $E[B(X)]=0$ is satisfied, then $\alpha=\beta$ and $p=q$. This is in accordance with the fact that under this condition the $X^{(B)}$ distribution is the same for all choices of the zero point of $B$ as stated in [GR05]. If, however, $E[B(X)]\not=0$, then we see from (3.6) that $p$ and $q$ are generally different and, hence, that the distribution of $X^{(B)}$ actually depends on the choice of the zeroes of $B$. For a concrete example, let $X$ be uniformly distributed on $[-1,1]$ and let $B(x)=\max(x,0)=:x^{+}$. Then, with the notation of the situation above, we can let $a=-1$ and $b=0$ and obtain $E[B(X)]=E[X^{+}]=1/4$ as well as

Obviously, $p$ and $q$ give rise to two different distributions.
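The difference between the two choices of the sign change point can be made explicit with exact arithmetic. The sketch below rests on two assumptions that should be read as such: that $X$ is uniform on $[-1,1]$ (which is consistent with the stated value $E[X^{+}]=1/4$), and that for $m=1$ the defining identity reads $\alpha\,E[F^{\prime}(X^{(B)})]=E[B(X)(F(X)-F(x_{1}))]$. Taking $F(x)=x$ gives the normalising constant and $F(x)=x^{2}$ gives the mean of the biased variable; the function names are illustrative.

```python
from fractions import Fraction

def moment_of_positive_part(n):
    """E[(X^+)^n] for X uniform on [-1, 1]: the integral of x**n / 2 over [0, 1]."""
    return Fraction(1, 2 * (n + 1))

def alpha(x1):
    """Normalising constant E[B(X)(X - x1)] for B(x) = x^+ and sign change point x1."""
    return moment_of_positive_part(2) - x1 * moment_of_positive_part(1)

def biased_mean(x1):
    """Mean of the biased variable, from the assumed identity with F(x) = x**2."""
    numerator = moment_of_positive_part(3) - x1**2 * moment_of_positive_part(1)
    return numerator / (2 * alpha(x1))

a, b = Fraction(-1), Fraction(0)
print(moment_of_positive_part(1))     # E[X^+]
print(alpha(a), alpha(b))             # constants for the two sign change points
print(biased_mean(a), biased_mean(b)) # the two biased laws have different means
```

Under these assumptions the constants are $5/12$ and $1/6$, and the means are $-3/20$ and $3/8$, so the two biased distributions cannot coincide.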

2. Higher order Stein operators

The purpose of this subsection is to show how the existence of certain couplings guaranteed by Theorem 2.7 can be used to assess the distance of the distribution of a given random variable $X$ to the distribution of a random variable $Z$, which is characterized by some higher order linear Stein operator $L$ of the form

are all finite and such that $\alpha:=\alpha_{1}+\alpha_{2}>0$, where

Then, there exists a unique distribution for a random variable $X^{*}$ such that for all $f\in\mathcal{F}^{2}$ we have

The law of $X^{*}$ is always absolutely continuous with respect to the Lebesgue measure.

Uniqueness is proved in the same way as in the proof of Theorem 2.1. So let us just prove the existence of $X^{*}$. First, choose $\hat{X}_{a}$ as in Proposition 2.5. Let $\nu:=\mathcal{L}(\hat{X}_{a})$ and, if $E[B_{0}(\hat{X}_{a})]\not=0$, define $\mu$ by

whereas, if $E[B_{0}(\hat{X}_{a})]=0$, let $\mu:=\delta_{0}$. Finally, let $Y_{1}\sim\mu$ and construct $Y_{2}$ and a random index $I\in\{1,2\}$ on the same probability space as $Y_{1}$ such that $I$ is independent of $Y_{1},Y_{2}$, and $Y_{2}$ has the generalized $X-B_{1}$-biased distribution and

hold for all sufficiently smooth functions $g$ and $f$, respectively. Hence, letting $X^{*}:=Y_{I}$ we have for all $f\in\mathcal{F}^{2}$ that

as claimed. Also, note that the distribution of $X^{*}$, being a mixture of absolutely continuous distributions, is itself absolutely continuous. ∎

is the ($\lambda$-a.e. unique) probability density function of $X^{*}$.

If the operator $L$ in (57) with $m=2$ is characterizing for the distribution of $Z$, then the Stein equation corresponding to a test function $h$ with $E\lvert Z\rvert<\infty$ is given by

and, often, it has a solution $f_{h}$ such that the lower order derivatives can be uniformly bounded by constants, i.e. $\lVert f_{h}^{(i)}\rVert_{\infty}\leq c_{i}$ uniformly over $h$ in some class $\mathcal{H}$ of test functions. Then, if one can couple the given random variable $X$ to an $X^{*}$ as in Proposition 3.7, one can easily show that

Now, in typical cases one either has that the quantities $f_{h}(a)$ and $f_{h}^{\prime}(a)$ are equal to zero (as is the case for the operator used in [PRR13]), or the expressions

are close to zero. The latter could be guessed from choosing $f(x)=x-a$ and $f(x)=1$, respectively, together with the assumption that $\mathcal{L}(X)\approx\mathcal{L}(Z)$. The same heuristic applied to $f(x)=\frac{1}{2}(x-a)^{2}$ suggests that $\alpha$ should be close to $1$. Thus, the right hand side of (58) should be close to zero, if $X$ and $X^{*}$ are coupled close to each other.

As in the first-order case (see Remark 3.5) one can show that if $\mathcal{L}(Z)$ is a fixed point of the distributional transformation from Proposition 3.7, then its density $p$ satisfies the second order linear differential equation

from which one should be able to reconstruct the class of fixed points in practice by exploiting boundary conditions like $\int p(t)dt=1$.

Now, we return to the case of a general $m\geq 1$. Henceforth, we denote by $R_{j,f}$ and $L_{j,f}$, respectively, the polynomials from the statement of Theorem 2.7 for $B=B_{j}$, $j=0,1,\dotsc,m-1$, and define $Q_{j,f}:=L_{j,f}+R_{j,f}$. In Theorem 3.9 below, we make the assumption that $B_{j}$ has $0\leq k_{j}\leq m-j$ sign changes and that $k_{j}\equiv m-j\mod 2$. Then, by Theorem 2.7, $Q_{j,f}$ is a polynomial of degree $\leq m-j-1$, $j=0,1,\dotsc,m-1$. Also, assume that $X$ is a real random variable such that $E\lvert B_{j}(X)X^{l}\rvert<\infty$ for each $0\leq l\leq m-j$ and $0\leq j\leq m-1$. Then, for $j=0,1,\dotsc,m-1$, we define

which is always nonnegative by Theorem 2.7.

With the above notation and assumptions, suppose that for each $j=0,1,\dotsc,m-1$ the function $B_{j}$ has $0\leq k_{j}\leq m-j$ sign changes, where $k_{j}\equiv m-j\mod 2$. Furthermore, assume that there is some $j\in\{0,1,\dotsc,m-1\}$ such that $\beta_{j}>0$ and let $\beta:=\sum_{j=0}^{m-1}\beta_{j}>0$. Then, there exists a unique distribution for a random variable $X^{*}$ such that for all $f\in\mathcal{F}^{m}$ we have

The law of $X^{*}$ is always absolutely continuous with respect to the Lebesgue measure.

Again, we only prove the existence part. For each $j=0,1,\dotsc,m-1$ let $Y_{j}$ have the $X-(B_{j},m-j)$ biased distribution whenever $\beta_{j}\not=0$, and let $Y_{j}:=0$ otherwise. Also, let $I\in\{0,1,\dotsc,m-1\}$ be a random index, which is independent of $Y_{0},Y_{1},\dotsc,Y_{m-1}$ such that

and define $X^{*}:=Y_{I}$. Then, with the notation $f_{j}:=f^{(j)}$, $j=0,1,\dotsc,m-1$, by Theorem 2.7 we have

It is possible that a coupling of $X$ and $X^{*}$ as in Theorem 3.9 will be useful to bound the distance of the distribution of $X$ to that of $Z$ also in the case $m>3$, once such Stein operators are used in practice. Maybe it would first be necessary to adjust this distributional transformation slightly by introducing additional location parameters $a_{j}$ related to the functions $B_{j}$, as discussed in Remark 2.9 (c).

Analytical proof of Theorem 2.1

We prove the claim by induction on $m$. Since $F^{(0)}=f>0$ has no zeroes if $m=0$, the assertion is clear in this case. Now, let $m\geq 1$ and assume that the claim is true for $(m-1)$-times differentiable functions. Suppose, to the contrary, that $F$ has $m+1$ distinct zeroes $y_{1}<\ldots<y_{m}<y_{m+1}$. Then, by Rolle's theorem there exist points $z_{k}\in(y_{k},y_{k+1})$ such that $F^{\prime}(z_{k})=0$ for $k=1,\dotsc,m$. Since the points $z_{1},\dotsc,z_{m}$ are necessarily pairwise distinct zeroes of the $(m-1)$-times differentiable function $G:=F^{\prime}$ with $G^{(m-1)}=F^{(m)}>0$, this contradicts the induction hypothesis. ∎

Hence, for each polynomial $Q$ of degree at most $m-1$ it follows that $\lim_{x\to\infty}(G(x)+Q(x))=+\infty$ if $m\geq 1$ and that $\liminf_{x\to\infty}G(x)\geq\varepsilon$ if $m=0$.

Recall that for real numbers $x_{1}<x_{2}<\ldots<x_{m}$ we let $J_{1}:=(-\infty,x_{1}]$, $J_{k}:=(x_{k-1},x_{k}]$ for $2\leq k\leq m$ and $J_{m+1}:=(x_{m},\infty)$.
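Because the intervals $J_{k}$ are closed on the right, the index $k$ with $x\in J_{k}$ equals one plus the number of nodes strictly smaller than $x$. A minimal sketch in Python, where the function name `interval_index` and the sample nodes are hypothetical:

```python
from bisect import bisect_left

def interval_index(nodes, x):
    """Return k such that x lies in J_k, for strictly increasing nodes x_1 < ... < x_m."""
    # bisect_left counts the nodes that are strictly smaller than x.
    return bisect_left(nodes, x) + 1

nodes = [-1.0, 0.5, 2.0]  # hypothetical nodes, m = 3
print([interval_index(nodes, x) for x in (-3.0, -1.0, 0.0, 0.5, 2.0, 2.5)])
```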

This follows immediately from Lemma 4.3 and its proof. ∎

Note that by construction $Q(x):=G_{1}(x)-L_{G_{1}}(x)$ is a polynomial of degree $m$ such that $Q(x_{k})=0$ for $k=1,\dotsc,m$. Hence, there exists $c\not=0$ such that $Q(x)=c\prod_{k=1}^{m}(x-x_{k})$. Since $c=Q^{(m)}=G_{1}^{(m)}=1$, we conclude from (4) that

On the other hand, by the monotone convergence theorem and (62) we have

Now, it is easily seen by successive differentiation that $F=G_{f}+T_{m-1,x_{m}}F$, where $T_{m-1,x_{m}}F$ is the Taylor polynomial of order $m-1$ around $x_{m}$ corresponding to $F$. Since the interpolation polynomial of degree $\leq m-1$ corresponding to $T_{m-1,x_{m}}F$ is still $T_{m-1,x_{m}}F$, this implies that

From (4) and (67) it finally follows that

Major parts of this work were carried out while I was a postdoc at TU München, Germany. I would like to thank Professor Gesine Reinert for inviting me to visit Oxford in September 2013 and for giving me the opportunity to present parts of this work during my stay there. I am also grateful to an anonymous referee whose comments helped me improve the presentation and exposition of the above results.

References