Distributional transformations without orthogonality relations

Christian Döbler

Introduction

Main abstract results and discussion

In this Subsection we give a proof of the following theorem, which is a generalization of Theorem 2.1 in [GR05].

Then, $\alpha$ is necessarily positive and there exists a unique distribution for a random variable $X^{(B)}$ such that for all $F\in\mathcal{F}^{m}$ we have

with $L_{F}$ as defined in (2). Furthermore, if $m\geq 1$, then the distribution of $X^{(B)}$ is absolutely continuous with respect to the Lebesgue measure.

If $X$ and $B$ additionally satisfy the orthogonality conditions $E[X^{j}B(X)]=0$ for all $j=0,1,\dotsc,m-1$ , then the distribution $\mathcal{L}(X^{(B)})$ of $X^{(B)}$ reduces to the $X-B$ biased distribution from [GR05] as is easily seen by writing the polynomial $L_{F}$ in terms of the monomials $1,X,\dotsc,X^{m-1}$ . Also, in this case for the same reason we have $\alpha=(m!)^{-1}E[X^{m}B(X)]$ . So it is justified to call the distribution of $X^{(B)}$ the generalized $X-B$ biased distribution.
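The formula for $\alpha$ under the orthogonality conditions can be spelled out in one line. This sketch assumes that (3), which is not reproduced above, has the form $E[B(X)(F(X)-L_{F}(X))]=\alpha\,E[F^{(m)}(X^{(B)})]$ (the form in which it is used in the proof of Theorem 2.7) and that $F(x)=x^{m}/m!$ belongs to $\mathcal{F}^{m}$; the $c_{j}$ denote the coefficients of the interpolation polynomial $L_{F}$, which has degree at most $m-1$.

```latex
\begin{gather*}
F(x):=\frac{x^{m}}{m!},\qquad F^{(m)}\equiv 1,\qquad L_{F}(x)=\sum_{j=0}^{m-1}c_{j}x^{j},\\
\alpha=\alpha\,E\bigl[F^{(m)}(X^{(B)})\bigr]
  =E\bigl[B(X)F(X)\bigr]-\sum_{j=0}^{m-1}c_{j}\,E\bigl[X^{j}B(X)\bigr]
  =\frac{1}{m!}E\bigl[X^{m}B(X)\bigr].
\end{gather*}
```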

Note that if, according to our definition of sign changes, $B$ has both $m$ and $m^{\prime}$ sign changes for $m\not=m^{\prime}$, then we see from (3) that these two points of view lead to different distributions for $X^{(B)}$. Also, if we may consider $B$ to have sign changes at $x_{1}<\ldots<x_{m}$ as well as at $y_{1}<\ldots<y_{m}$, then the resulting $\alpha$’s and, again, the distributions of the $X^{(B)}$’s are, in general, different. This is in contrast to the theory from [GR05], where such ambiguities are ruled out by the orthogonality assumptions on $X$ with respect to $B$. Thus, to prevent these ambiguities, one should actually denote the variable $X^{(B)}$ by $X^{(B;x_{1},\dotsc,x_{m})}$. We will, however, not do so but rather assume that it is understood, or mention explicitly, how many sign changes at which points the function $B$ is supposed to have. We illustrate this phenomenon for the case $m=1$ in Example 3.6 below.

For the existence part of Theorem 2.1 we give two different proofs: an analytical proof, which uses the Riesz representation theorem, and a probabilistic proof, which relies on an explicit construction of the random variable $X^{(B)}$. Remarkably, the same construction of $X^{(B)}$ as in [GR05] is still valid in this more general setting. However, we were not able to generalize the proof of Theorem 2.1 in [GR05] to a proof of our Theorem 2.1.

In the case $m=1$ , one can easily show that the function $p$ given by

Note that if $F\in\mathcal{F}^{m}$ , then one can easily show by induction on $k=0,1,\dotsc,m$ that there exist finite constants $c_{k}>0$ such that

The assumption $E\lvert X^{j}B(X)\rvert<\infty$ for $j=0,1,\dotsc,m$ is easily seen to be equivalent to $E\lvert B(X)\rvert<\infty$ and $E\lvert X^{m}B(X)\rvert<\infty$ .
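The equivalence rests on an elementary bound: for $0\leq j\leq m$ one has $\lvert x\rvert^{j}\leq 1$ if $\lvert x\rvert\leq 1$ and $\lvert x\rvert^{j}\leq\lvert x\rvert^{m}$ otherwise, so that

```latex
\[
\lvert X^{j}B(X)\rvert\;\leq\;\bigl(1+\lvert X\rvert^{m}\bigr)\lvert B(X)\rvert
  \;=\;\lvert B(X)\rvert+\lvert X^{m}B(X)\rvert,\qquad j=0,1,\dotsc,m,
\]
```

while the cases $j=0$ and $j=m$ give the converse implication.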

From the nonnegativity of $B$ on $J_{m+1}$ we know that

Thus, if $\alpha\not=0$ , it is necessarily positive. Now, we give the explicit construction of the random variable $X^{(B)}$ from [GR05]. Let $Y,U_{1},U_{2},\dotsc,U_{m}$ be independent random variables such that $U_{j}$ has the density $p_{j}(u):=ju^{j-1}1_{(0,1)}(u)$ ( $1\leq j\leq m$ ) and $Y$ has distribution $\nu$ given by

where $\mu$ is the distribution of $X$ . Note that, by (5) and the definition and positivity of $\alpha$ , $\nu$ is indeed a probability measure and, hence, such a $Y$ exists. Now, we define the random variable

where $x_{0}:=Y$ . We claim that $X^{(B)}$ satisfies (3). This claim will be proved by induction on $m=0,1,\dotsc$ . If $m=0$ , then the claim reduces to

we conclude from the induction hypothesis that

we can thus conclude from the induction hypothesis that

which is clear from the Lagrange form of the interpolation polynomial corresponding to the constant function $1$ and the nodes $x_{1},\dotsc,x_{m}$ . Using this, we obtain that

To prove this claim, we use the explicit construction of $X^{(B)}$ given in (7). Thus, we have that

By the properties of the Lebesgue measure it follows that

Thus, from (11) we infer that $P(X^{(B)}\in N)=0$ . Hence, the distribution of $X^{(B)}$ is absolutely continuous with respect to $\lambda$ . ∎
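For $m=1$ the construction can be simulated directly. The sketch below rests on our reading of the $m=1$ case (the displays defining $\nu$ and $X^{(B)}$ are not reproduced above): $\nu$ has density $B(y)(y-x_{1})/\alpha$ with respect to $\mu$, and $X^{(B)}=x_{1}+U_{1}(Y-x_{1})$ with $U_{1}$ uniform on $(0,1)$. It takes $X$ discrete so that $Y\sim\nu$ can be drawn exactly with `random.choices`, and then checks (3) for $F(x)=x^{2}/2$ by Monte Carlo. The function name `sample_biased_m1` is hypothetical.

```python
import random


def sample_biased_m1(xs, probs, B, x1, n, seed=0):
    """Draw n samples of X^(B) for m = 1 and a discrete X with P(X = xs[i]) = probs[i].

    Assumed m = 1 construction (our reading, not quoted from the text):
    Y ~ nu with d(nu)/d(mu)(y) = B(y)(y - x1) / alpha, U ~ Uniform(0, 1)
    independent of Y, and X^(B) = x1 + U * (Y - x1).
    """
    weights = [p * B(x) * (x - x1) for x, p in zip(xs, probs)]
    if min(weights) < 0:
        raise ValueError("B(x)(x - x1) must be nonnegative on the support of X")
    alpha = sum(weights)  # alpha = E[B(X)(X - x1)]
    rng = random.Random(seed)
    ys = rng.choices(xs, weights=weights, k=n)  # Y ~ nu; choices normalizes the weights
    return alpha, [x1 + rng.random() * (y - x1) for y in ys]


# Check E[B(X)(F(X) - F(x1))] = alpha * E[F'(X^(B))] for F(x) = x^2 / 2, B(x) = x, x1 = 0.
xs, probs = [-2, -1, 1, 3], [0.25] * 4
alpha, sample = sample_biased_m1(xs, probs, lambda x: x, 0.0, 200_000, seed=1)
lhs = sum(p * x * x**2 / 2 for x, p in zip(xs, probs))
print(alpha, lhs / alpha, sum(sample) / len(sample))
```

The last two printed numbers should agree up to Monte Carlo error.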

With the notation of the above existence proof, from the identity

valid for bounded and measurable $f$, and a simple change of variables, one can deduce that for $m\geq 2$ the ($\lambda$-a.e. unique) density $p$ of $X^{(B)}$ is given by
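For comparison, the case $m=1$ can be handled in exact arithmetic when $X$ is discrete. The density used below, $p(t)=E[B(X)1\{X>t\}]/\alpha$ for $t>x_{1}$ and $p(t)=-E[B(X)1\{X<t\}]/\alpha$ for $t<x_{1}$ with $\alpha=E[B(X)(X-x_{1})]$, is our own reconstruction from (3) with $m=1$ (write $F(X)-F(x_{1})$ as an integral of $F^{\prime}$ and apply Fubini's theorem); the corresponding display of the paper is not reproduced here. The helper names are hypothetical.

```python
from fractions import Fraction


def density_m1(xs, probs, B, x1):
    """alpha and density p of X^(B) for m = 1 and a discrete X (exact arithmetic).

    Reconstructed from E[B(X)(F(X) - F(x1))] = alpha * E[F'(X^(B))]:
    p(t) = E[B(X) 1{X > t}] / alpha for t > x1, p(t) = -E[B(X) 1{X < t}] / alpha for t < x1.
    """
    alpha = sum(Fraction(q) * B(x) * (x - x1) for x, q in zip(xs, probs))

    def p(t):
        if t > x1:
            s = sum(Fraction(q) * B(x) for x, q in zip(xs, probs) if x > t)
        else:
            s = -sum(Fraction(q) * B(x) for x, q in zip(xs, probs) if x < t)
        return s / alpha

    return alpha, p


def moment(xs, x1, p, k):
    """Exact k-th moment of the piecewise-constant density p (breakpoints: xs and x1)."""
    pts = sorted({Fraction(v) for v in xs} | {Fraction(x1)})
    return sum(p((a + b) / 2) * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
               for a, b in zip(pts, pts[1:]))


xs, probs = [-2, -1, 1, 3], [Fraction(1, 4)] * 4
alpha, p = density_m1(xs, probs, lambda x: x, 0)
print(alpha, moment(xs, 0, p, 0), moment(xs, 0, p, 1))  # 15/4 1 19/30
```

The mean $19/30$ equals $E[X^{3}]/(2\alpha)$, as (3) with $F(x)=x^{2}/2$ requires.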

From (2.1) and Remark 2.2 (f) we conclude that

Further, for each $s\in S$ let $X_{s}^{(B)}$ have the generalized $X_{s}-B$ biased distribution. Let $J$ be independent of the family $(X_{s}^{(B)})_{s\in S}$ having distribution $P(J\in A):=\int_{A}\frac{\alpha_{s}}{\alpha}d\gamma(s)$ , $A\in\mathcal{S}$ .

Under the above assumptions the variable $X^{(B)}:=X_{J}^{(B)}$ has the generalized $X-B$ biased distribution.

The easy proof is quite standard: For $F\in\mathcal{F}^{m}$ we have by Fubini’s theorem

It is actually not strictly necessary to assume that $X_{s}$ satisfies the assumptions of Theorem 2.1 for each $s\in S$. In fact, assuming (2.1) it follows from Remark 2.2 (f) that $\alpha_{s}$ exists for $\gamma$-a.e. $s\in S$, but it might be zero for certain values of $s$. Assuming additionally that $\alpha>0$ for $X=X_{I}$ and letting $X^{(B)}_{s}$ have any fixed distribution if $\alpha_{s}=0$, the proof goes through as before, since the distribution of the index $J$ puts no mass on the set of those $s$ with $\alpha_{s}=0$.
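Proposition 2.4 can be checked in exact arithmetic for $m=1$, a finite index set $S$ with weights $\gamma_{s}$, and discrete $X_{s}$. The sketch uses the $m=1$ density $p(t)=E[B(X)1\{X>t\}]/\alpha$ for $t>x_{1}$ and $p(t)=-E[B(X)1\{X<t\}]/\alpha$ for $t<x_{1}$, which we derived from (3) ourselves, and compares the biased density of the mixture $X=X_{I}$ with the density of $X_{J}^{(B)}$, where $P(J=s)=\gamma_{s}\alpha_{s}/\alpha$. Function names are hypothetical, and each $\alpha_{s}$ is assumed to be nonzero.

```python
from fractions import Fraction


def biased_m1(xs, probs, B, x1):
    """(alpha, density) of the m = 1 generalized X-B biased law for a discrete X."""
    alpha = sum(q * B(x) * (x - x1) for x, q in zip(xs, probs))

    def p(t):
        if t > x1:
            return sum(q * B(x) for x, q in zip(xs, probs) if x > t) / alpha
        return -sum(q * B(x) for x, q in zip(xs, probs) if x < t) / alpha

    return alpha, p


def mixture_matches(components, gamma, B, x1, ts):
    """Compare the biased law of the mixture with the mixture of biased laws at the points ts."""
    # Law of X = X_I with P(I = s) = gamma[s]; duplicate atoms are harmless.
    xs = [x for cxs, _ in components for x in cxs]
    probs = [g * q for (_, cps), g in zip(components, gamma) for q in cps]
    alpha, p = biased_m1(xs, probs, B, x1)
    parts = [biased_m1(cxs, cps, B, x1) for cxs, cps in components]  # (alpha_s, p_s)

    def mixed(t):  # density of X_J^(B) with P(J = s) = gamma_s * alpha_s / alpha
        return sum(g * a / alpha * ps(t) for g, (a, ps) in zip(gamma, parts))

    return all(p(t) == mixed(t) for t in ts)


comps = [([-1, 2], [Fraction(1, 2)] * 2), ([1, 3], [Fraction(1, 3), Fraction(2, 3)])]
gamma = [Fraction(1, 4), Fraction(3, 4)]
ts = [Fraction(k, 3) for k in range(-5, 11)]
print(mixture_matches(comps, gamma, lambda x: x, 0, ts))  # True
```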

2. Biasing functions with fewer than $m$ sign changes

Although Theorem 2.1 is already quite general, in practice it might happen that one would like the order $m$ of the derivative on the right hand side of (3) to be larger than the number, say $k$ , of sign changes of the function $B$ on the left hand side of (3). For example, if $X$ is a nonnegative random variable with finite and non-zero expectation, then $X^{e}$ is said to have the equilibrium distribution with respect to $X$ , if

holds for all continuously differentiable functions $f$ with a Lipschitz derivative. In their final version [PR14] they prove this by giving an explicit construction of the random variable $X^{(L)}$ . In the first arXiv version, however, they applied Theorem 2.1 of [GR05] with the distributional transformation given by $B(x)=\operatorname{sign}(x)$ twice in a row, and, in order to do so, they had to make sure that the orthogonality assumptions of that theorem were satisfied. This is why they first had to assume that not only $E[X]=0$ but also $P(X<0)=P(X>0)=1/2$ be satisfied. Invoking Theorem 2.1 instead, we are able to prove the following statement, which even generalizes (15) to the class of all $X$ with finite second moment. This result is the main building block of a generalization of Theorem 2.1 to cases, where the number of sign changes of $B$ might disagree with the order of the derivative of the test function $F$ .

holds for all continuously differentiable functions $f$ with a Lipschitz derivative. Further, the distribution of $\hat{X}_{a}$ is always absolutely continuous with respect to the Lebesgue measure.

Using the transformation from Proposition 2.5, one could easily generalize the results from [PR14] to random sums with general mean zero summands and even to summands with small, non-zero means.

holds for all Lipschitz functions $h$. Since $X$ has finite second moment, one can easily see that (18) also holds for absolutely continuous functions $g$ such that $\lvert g^{\prime}(x)\rvert$ is $O(\lvert x\rvert)$ as $\lvert x\rvert\to\infty$. In particular, this holds for $g(x):=\operatorname{sign}(x-a)\bigl(f(x)-f(a)\bigr)$ with $g(a)=0$ and $g^{\prime}(x)=\operatorname{sign}(x-a)f^{\prime}(x)$ for $x\not=a$. Thus, from (18), (2.2) and (2.2) we conclude that

proving (16). Absolute continuity of $\mathcal{L}(\hat{X}_{a})$ follows immediately from Theorem 2.1. ∎
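Since (16) is not reproduced above, the following sketch assumes it reads $E[f(X)]-f(a)-f^{\prime}(a)E[X-a]=\beta\,E[f^{\prime\prime}(\hat{X}_{a})]$ with $\beta=\frac{1}{2}E[(X-a)^{2}]$, the form consistent with $\beta_{1}=\frac{1}{2}E[Y^{2}]$ in the proof of Theorem 2.7. Taylor's formula with integral remainder then yields the density $t\mapsto E[(X-t)^{+}]/\beta$ for $t>a$ and $t\mapsto E[(t-X)^{+}]/\beta$ for $t<a$. The code evaluates it for a discrete $X$ and checks total mass and mean exactly; the names are hypothetical.

```python
from fractions import Fraction


def hat_density(xs, probs, a):
    """beta and density h(t, right) of X-hat_a for discrete X, under the assumed form of (16)."""
    xs, probs, a = [Fraction(x) for x in xs], [Fraction(q) for q in probs], Fraction(a)
    beta = sum(q * (x - a) ** 2 for x, q in zip(xs, probs)) / 2

    def h(t, right):
        # right=True is used for t > a: E[(X - t)^+] / beta
        # right=False is used for t < a: E[(t - X)^+] / beta
        if right:
            return sum(q * max(x - t, 0) for x, q in zip(xs, probs)) / beta
        return sum(q * max(t - x, 0) for x, q in zip(xs, probs)) / beta

    return beta, h


def integrate(xs, a, h, g):
    """Exact integral of g * h for a polynomial g of degree <= 2.

    h is linear between consecutive points of xs and a, so g * h has degree <= 3
    on each piece and one Simpson panel per piece is exact.
    """
    pts = sorted({Fraction(x) for x in xs} | {Fraction(a)})
    total = Fraction(0)
    for lo, hi in zip(pts, pts[1:]):
        mid = (lo + hi) / 2
        right = mid > a  # each piece lies on one side of a
        total += (hi - lo) / 6 * (g(lo) * h(lo, right) + 4 * g(mid) * h(mid, right)
                                  + g(hi) * h(hi, right))
    return total


xs, probs = [-1, 0, 2], [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
beta, h = hat_density(xs, probs, 0)
# Mass 1 (f = x^2/2) and mean E[X^3] / (6 beta) = 5/9 (f = x^3/6, a = 0).
print(beta, integrate(xs, 0, h, lambda t: 1), integrate(xs, 0, h, lambda t: t))  # 9/8 1 5/9
```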

Next, we will use the result of Proposition 2.5 to give a generalization of Theorem 2.1 to cases, where the number $k$ of sign changes of $B$ may be smaller than the order $m$ of the derivative we would like to have in the defining identity for the biased distribution. However, we will have to assume that $k\equiv m\mod 2$ , i.e. that $k$ and $m$ have the same parity. In what follows, for nonnegative integers $n,j$ we denote by $(n)_{j}$ the falling factorial, i.e. $(n)_{0}:=1$ and $(n)_{j}:=n(n-1)\cdot\ldots\cdot(n-j+1)$ if $j\geq 1$ .
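The falling factorial is a one-liner. It is also the normalizing constant that makes $F_{j}(x)=x^{k+j}/(k+j)_{k}$ in the proof below satisfy $F_{j}^{(k)}(x)=x^{j}$, since $\frac{d^{k}}{dx^{k}}x^{k+j}=(k+j)_{k}\,x^{j}$. The sketch cross-checks it against `math.perm` (Python 3.8+), which computes $n!/(n-j)!$ and returns $0$ for $j>n$; the name `falling` is hypothetical.

```python
from math import perm, prod


def falling(n, j):
    """Falling factorial (n)_j = n (n-1) ... (n-j+1) for nonnegative integers, (n)_0 = 1."""
    return prod(range(n - j + 1, n + 1))  # empty product is 1; contains a 0 factor if j > n


print(falling(5, 2), falling(5, 0), falling(3, 4))  # 20 1 0
```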

If $k=0$, assume further that the generalized $X-B$ biased distribution from Theorem 2.1 is not the Dirac measure at $0$. Then, there exists a unique distribution for a random variable $X^{(B,m)}$ such that

holds for each $F\in\mathcal{F}^{m}$ , where, with

if $k=0$. Then, $R_{F}$ is equal to zero whenever $k=m$, and it has degree at most $m-1$ if $k<m$. Furthermore, $L_{F}$ still denotes the interpolation polynomial for $F$ corresponding to the nodes $x_{1},\ldots,x_{k}$, given by (2) with $m$ replaced by $k$. Additionally, $\beta$ is always positive and is given by

if $k\geq 1$ and by $\beta=(m!)^{-1}E[B(X)X^{m}]$ , if $k=0$ . Also, the distribution of $X^{(B,m)}$ is always absolutely continuous with respect to the Lebesgue measure unless $k=m=0$ .

From Theorem 2.1 we know that $\alpha>0$ . Let $F\in\mathcal{F}^{m}$ be given. By the assumptions on $X$ one can conclude again from Theorem 2.1 that $E[B(X)(F(X)-L_{F}(X))]$ exists and that there is a random variable $Y$ having the generalized $X-B$ biased distribution, so that

From our assumption in the case $k=0$ and from Theorem 2.1 for $k\geq 1$ , we know that $Y$ is not almost surely equal to zero. Thus, if $m\geq k+2$ , by Proposition 2.5 (with $a=0$ ) we know that there is a random variable $Y_{1}$ satisfying

where $\beta_{1}=\frac{1}{2}E[Y^{2}]$ . Now, if $m\geq k+4$ , then again by Proposition 2.5 we can find a random variable $Y_{2}$ such that

since $E[Y_{1}]=\frac{1}{6\beta_{1}}E[Y^{3}]=\frac{1}{3!\beta_{1}}E[Y^{3}]$ and with

Inductively, for $l=1,\dotsc,\frac{m-k}{2}$ we find that there exists $Y_{l}$ such that, with $Y_{0}:=Y$ we have

Again by induction we find the following analog of (2.2):

Now note that for $j=0,1,\dotsc,m-k$ with the function $F_{j}(x):=\frac{x^{k+j}}{(k+j)_{k}}$ we have from (28) that

Clearly, $Q_{j}(x):=F_{j}(x)-L_{F_{j}}(x)$ is a polynomial of degree $k+j$ having the zeroes $x_{1}<\ldots<x_{k}$ . Thus, there exists a polynomial $q_{j}$ of degree $j$ such that $Q_{j}(x)=q_{j}(x)\prod_{l=1}^{k}(x-x_{l})$ . Now, first suppose that $k=0$ . Then, we have $F_{j}(x)=Q_{j}(x)=q_{j}(x)=x^{j}$ . Thus, from (33) and (34) we can conclude that

Letting $X^{(B,m)}:=Y_{\frac{m}{2}}$ the claim follows in the case $k=0$ from (28) and (2.2). From now on, we will assume that $k\geq 1$ . In order to find $q_{j}$ in this case, we write

the last identity because the left hand side is a polynomial of degree $j+k$ and, hence, the right hand side must also be. Thus, as a neat by-product we have proved that

From (2.2) we conclude that $q_{j}$ is given by

Hence, from (34) and (38) we find for $j=0,1,\dotsc,m-k$ that

Now, from reading (2.2) backwards (with $m=k+j$ ) we obtain

Letting $X^{(B,m)}:=Y_{\frac{m-k}{2}}$, (23) now follows from (28) and (42). To see that $\beta>0$, note that we know from our assumption in the case $k=0$ and from Theorem 2.1 in the case $k\geq 1$ that $Y$ is not almost surely equal to zero. Thus, the even moments of $Y$ are strictly positive. Since we know from (33) that $\beta=\frac{\alpha}{(m-k)!}E[Y^{m-k}]$ with $\alpha>0$ and as $m-k$ is even, it follows that also $\beta>0$. Knowing that $\beta$ is necessarily positive, uniqueness of the distribution for $X^{(B,m)}$ can be proved as for $X^{(B)}$ in the proof of Theorem 2.1. Absolute continuity of $\mathcal{L}(X^{(B,m)})$, unless $m=k=0$, now follows from Theorem 2.1 and Proposition 2.5. It remains to show the alternative representation for the numbers $a_{i}^{(j)}$ in (24). This is given by Lemma 2.8. ∎

For $k\geq 1$ let $x_{1},\dotsc,x_{k}$ be distinct real (or complex) numbers. Then, for each nonnegative integer $n$ we have the identity

We prove the claim by induction on $k$ , simultaneously for all $n\geq 0$ . If $k=1$ , then it is clearly true. Now assume that $k\geq 1$ and that $x_{1},\dotsc,x_{k},x_{k+1}$ are distinct numbers. Then, we can write

we conclude from the induction hypothesis that

Thus, it only remains to show that $S_{2}=0$ . But this follows from (37), completing the proof. ∎
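The display stating Lemma 2.8 did not survive extraction. A standard identity of exactly this shape (distinct nodes, every $n\geq 0$, trivially true for $k=1$) is the divided-difference formula $\sum_{i=1}^{k}x_{i}^{n}/\prod_{l\neq i}(x_{i}-x_{l})=h_{n-k+1}(x_{1},\dotsc,x_{k})$, with $h_{d}:=0$ for $d<0$; we cannot confirm that it is the identity meant here. The sketch checks it in exact arithmetic; the names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import prod


def complete_h(d, variables):
    """h_d(variables); zero for negative degree and one for degree zero."""
    if d < 0:
        return 0
    return sum(prod(c) for c in combinations_with_replacement(variables, d))


def divided_difference_of_power(nodes, n):
    """sum_i x_i^n / prod_{l != i} (x_i - x_l) for distinct nodes."""
    nodes = [Fraction(v) for v in nodes]
    return sum(xi ** n / prod(xi - xl for l, xl in enumerate(nodes) if l != i)
               for i, xi in enumerate(nodes))


nodes = [-2, Fraction(1, 2), 1, 3]
print(all(divided_difference_of_power(nodes, n) == complete_h(n - len(nodes) + 1, nodes)
          for n in range(8)))  # True
```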

We may call the distribution of $X^{(B,m)}$ the $X-(B,m)$ biased distribution. Note, however, that, as for $X^{(B)}$ , the distribution of $X^{(B,m)}$ is sensitive to the number $k$ and the choice of the sign change points $x_{1}<\ldots<x_{k}$ , if these are ambiguous (see Remark 2.2 (b)).

It is easy to see that an analog of Proposition 2.4 also exists for the $X-(B,m)$ biased distribution.

Examples and Applications

In this Subsection we give some examples of first-order distributional transformations whose existence is guaranteed by Theorem 2.1, and we demonstrate how this theory may be applied to prove certain Stein type characterizations without using the solution of the corresponding Stein equation. We also show how one can use a coupling of $X$ and $X^{(B)}$ to estimate the distance of $\mathcal{L}(X)$ to a fixed point of the distributional transformation induced by $B$. Finally, we show by example that the distribution of $X^{(B)}$ in general depends on the choice of the zeroes of $B$, if these are ambiguous.

Let $X$ be a real-valued random variable with $0<E[X^{2}]<\infty$. Choosing $B(x)=x$ with a single sign change at $0$, we conclude from Theorem 2.1 that there exists a random variable $X^{gz}$ such that

Under the same assumptions on $X$ as in (a) we now choose $B(x):=x-E[X]$ . Then,

and, again by Theorem 2.1, we find that there is a random variable $X^{nz}$ such that

where we have used that $E[B(X)]=0$ in this case. Again, whenever $X$ has mean zero, the distribution of $X^{nz}$ reduces to the $X$ -zero biased distribution. In general, we call it the $X$ -non-zero biased distribution. Note that the existence of this distribution already follows from Theorem 2.1 in [GR05], as $B$ satisfies their orthogonality relation in this case.
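For a discrete $X$ with $E[X]\neq 0$ the two transformations can be compared directly. The sketch uses the $m=1$ density $p(t)=E[B(X)1\{X>t\}]/\alpha$ for $t>x_{1}$ and $p(t)=-E[B(X)1\{X<t\}]/\alpha$ for $t<x_{1}$, which we derived from (3) ourselves. It also checks the easily verified fact that $X^{nz}$ has the law of $E[X]+(X-E[X])^{z}$, where $W^{z}$ denotes the zero-biased version of the centered variable $W$. The helper name is hypothetical.

```python
from fractions import Fraction


def biased_m1(xs, probs, B, x1):
    """(alpha, density) of the m = 1 generalized X-B biased law for a discrete X."""
    alpha = sum(q * B(x) * (x - x1) for x, q in zip(xs, probs))

    def p(t):
        if t > x1:
            return sum(q * B(x) for x, q in zip(xs, probs) if x > t) / alpha
        return -sum(q * B(x) for x, q in zip(xs, probs) if x < t) / alpha

    return alpha, p


xs = [Fraction(-1), Fraction(0), Fraction(3)]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
mu = sum(q * x for x, q in zip(xs, probs))                       # E[X] = 1/4
_, p_gz = biased_m1(xs, probs, lambda x: x, 0)                   # B(x) = x, sign change at 0
_, p_nz = biased_m1(xs, probs, lambda x: x - mu, mu)             # B(x) = x - E[X]
_, p_z = biased_m1([x - mu for x in xs], probs, lambda w: w, 0)  # zero bias of X - E[X]

print(p_gz(Fraction(1)), p_nz(Fraction(1)))  # 3/11 11/43
print(all(p_nz(t) == p_z(t - mu) for t in (Fraction(k, 7) for k in range(-10, 25))))  # True
```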

Next, we show by example how the existence of such distributional transformations may be used to prove a Stein type characterization of a given distribution, which is a fixed point of the distributional transformation. We first need the following definition.

Let $\sigma>0$ and $Z_{\sigma}\sim N(0,\sigma^{2})$ . Then, the distribution of $Y_{\sigma}:=\lvert Z_{\sigma}\rvert$ is called the half-normal distribution or modulus normal distribution with parameter $\sigma^{2}=E[Y_{\sigma}^{2}]$ . Further, we say that $W_{\sigma}$ has the negative half-normal distribution with parameter $\sigma^{2}$ , if $-W_{\sigma}$ has the half-normal distribution with parameter $\sigma^{2}$ .

Let $X$ be a real-valued random variable such that $0<E[X^{2}]<\infty$ . Then $\mathcal{L}(X)$ is a fixed point of the generalized zero bias transformation if and only if it is a mixture of a half-normal and a negative half-normal distribution with the same parameter.

Let the distribution of $X$ be a fixed point of the generalized zero-bias transformation. Then, from Remark 2.2 (d) we know that $X$ has an absolutely continuous distribution with density $p$ given by

From (3.1) and (48) we conclude that $p$ is continuously differentiable on $(0,\infty)$ and on $(-\infty,0)$ and that

for each $t\not=0$. From (49) we see that

for $t<0$ . Here, we used the shorthands $p(0+):=\lim_{t\downarrow 0}p(t)$ and $p(0-):=\lim_{t\uparrow 0}p(t)$ . The claim now follows from (50) and (51). Conversely, if the distribution of $X$ is such a mixture, then, by a standard computation involving Fubini’s theorem, one easily verifies that $X$ satisfies

and, hence, that $\mathcal{L}(X)$ is a fixed point of the generalized zero bias transformation. We omit the details. ∎
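The converse direction can be checked numerically. For the mixture with weight $w$ on the half-normal and $1-w$ on the negative half-normal distribution (common parameter $\sigma^{2}$, so $E[X^{2}]=\sigma^{2}$), the density of $X^{gz}$, namely $E[X1\{X>t\}]/E[X^{2}]$ for $t>0$ and $-E[X1\{X<t\}]/E[X^{2}]$ for $t<0$ (our own $m=1$ reconstruction of the density formula), should coincide with the density of $X$. Behind this is the identity $\int_{t}^{\infty}x\varphi_{\sigma}(x)\,dx=\sigma^{2}\varphi_{\sigma}(t)$ for the $N(0,\sigma^{2})$ density $\varphi_{\sigma}$. The integrals are computed with Simpson's rule; the names are hypothetical.

```python
import math


def phi(x, sigma):
    """N(0, sigma^2) density."""
    return math.exp(-x * x / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def mixture_density(x, w, sigma):
    """Density of w * half-normal + (1 - w) * negative half-normal, both with parameter sigma^2."""
    return 2 * (w if x > 0 else 1 - w) * phi(x, sigma)


def simpson(g, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of panels."""
    step = (hi - lo) / n
    s = g(lo) + g(hi) + sum((4 if i % 2 else 2) * g(lo + i * step) for i in range(1, n))
    return s * step / 3


def gz_density(t, w, sigma):
    """Density at t != 0 of X^gz (B(x) = x, sign change at 0) for X with the mixture law."""
    integrand = lambda x: x * mixture_density(x, w, sigma)
    if t > 0:
        return simpson(integrand, t, t + 12 * sigma) / sigma**2  # E[X 1{X > t}] / E[X^2]
    return -simpson(integrand, t - 12 * sigma, t) / sigma**2     # -E[X 1{X < t}] / E[X^2]


for w, sigma in [(0.3, 0.5), (1.0, 2.0)]:
    err = max(abs(gz_density(t, w, sigma) - mixture_density(t, w, sigma))
              for t in (-1.5, -0.2, 0.4, 2.0))
    print(w, sigma, err)
```

The printed errors reflect only quadrature and tail truncation, independently of $w$.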

From Proposition 3.3 we directly infer the following Stein characterization of the class of half-normal distributions, whose derivation does not make use of the solution to any Stein equation.

A nonnegative random variable $X$ with $0<E[X^{2}]<\infty$ has the half-normal distribution with parameter $\sigma^{2}=E[X^{2}]$ , if and only if

for all $t\not=x_{1}$, which is analogous to (49) and which implies that the $\log$-derivative of $p$ is given by $-B/\alpha$. Hence, the family of densities $p$ giving rise to fixed points of the distributional transformation $\mathcal{L}(X)\mapsto\mathcal{L}(X^{(B)})$ can be reconstructed as before.

Suppose that the distribution of $Z$ is a fixed point of the distributional transformation in (a). Up to dividing $B$ by a constant, which does not change the distributional transformation, we can assume that

i.e. $-B$ is the $\log$ -derivative of the density $p$ of $Z$ . Then, the Stein equation from the density approach (see e.g. [CGS11]) for $Z$ corresponding to a test function $h$ such that $E[h(Z)]$ exists, reads

where we suppose that the support of $\mathcal{L}(Z)$ is given by the interval $\overline{(a,b)}$ for some $-\infty\leq a<b\leq\infty$. The law of $Z$ is then usually characterized by the identity

valid for all functions $f$ from some large function class $\mathcal{F}$ . If $h$ is Lipschitz-continuous, one typically has bounds for $f_{h}$ of the form

for some finite constants $c_{0},c_{1}$ and $c_{2}$ (see [CGS11], again). Now, suppose that $X$ is given and that $X^{(B)}$ has the generalized $X-B$ biased distribution and is constructed on the same space as $X$ . Then, for a $1$ -Lipschitz function $h$ , we can estimate

From (53) with $f(x)=x-x_{1}$ we presume that $\alpha$ should be close to one if $\mathcal{L}(X)\approx\mathcal{L}(Z)$. Thus, the second term in (55) (or (54)) should be close to zero. Also, if we can couple $X^{(B)}$ close to $X$, then the first term should be small, too. In many cases we have $E[B(Z)]=0$, as is suggested by taking $f(x)\equiv 1$ in (53); from this we conclude that the third term in (55) is also close to zero and, hence, that (55) gives a good estimate of the Wasserstein distance

and, hence, (54) might still give a useful estimate. In a nutshell, if the distribution of $Z$ is a fixed point of the distributional transformation induced by $B$, if we conjecture that $\mathcal{L}(X)\approx\mathcal{L}(Z)$, and if we can couple $X$ and $X^{(B)}$ sufficiently closely, then we should be able to accurately estimate the (Wasserstein) distance between $\mathcal{L}(X)$ and $\mathcal{L}(Z)$ by the above procedure.

The following example illustrates how the distribution of $X^{(B)}$ depends on the choice of the sign change points if there are non-trivial intervals on which $B$ vanishes identically and the orthogonality relations from [GR05] do not hold.

From Remark 2.2 (d) we know that a density $p$ for the distribution of $X^{(B;a)}$ is given by

and that a density $q$ for the distribution of $X^{(B;b)}$ is given by

We immediately see that, if the orthogonality relation $E[B(X)]=0$ is satisfied, then $\alpha=\beta$ and $p=q$. This is in accordance with the fact, stated in [GR05], that under this condition the $X^{(B)}$ distribution is the same for all choices of the zero point of $B$. If, however, $E[B(X)]\not=0$, then we see from (3.6) that $p$ and $q$ are generally different and, hence, that the distribution of $X^{(B)}$ actually depends on the choice of the zeroes of $B$. For a concrete example, let $X$ be uniformly distributed on $[-1,1]$ and let $B(x)=\max(x,0)=:x^{+}$. Then, with the notation of the situation above, we can let $a=-1$ and $b=0$ and obtain $E[B(X)]=E[X^{+}]=1/4$ as well as

Obviously, $p$ and $q$ give rise to two different distributions.
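The two densities can be computed in exact arithmetic. The sketch uses the $m=1$ density $p(t)=E[B(X)1\{X>t\}]/\alpha$ for $t$ above the chosen zero $x_{0}$ and $p(t)=-E[B(X)1\{X<t\}]/\alpha$ below it, with $\alpha=E[B(X)(X-x_{0})]$; this formula is our own reconstruction from (3), so the values below are our computation rather than a quotation of the display above. It gives $\alpha=5/12$ and $p=3/5$ on $(-1,0]$, $p(t)=\frac{3}{5}(1-t^{2})$ on $(0,1)$ for $a=-1$, versus $\beta=1/6$ and $q(t)=\frac{3}{2}(1-t^{2})$ on $(0,1)$ for $b=0$. Function names are hypothetical.

```python
from fractions import Fraction as Fr


def B(x):
    return max(x, 0)


def integral(g, lo, hi):
    """Exact integral of g over [lo, hi] when g is a polynomial of degree <= 3 on each side of 0."""
    if lo >= hi:
        return Fr(0)
    pts = [lo, Fr(0), hi] if lo < 0 < hi else [lo, hi]
    return sum((b - a) / 6 * (g(a) + 4 * g((a + b) / 2) + g(b)) for a, b in zip(pts, pts[1:]))


def biased(x0):
    """(alpha, density) of X^(B; x0) for X ~ Uniform[-1, 1] and B(x) = x^+ (m = 1)."""
    dens = Fr(1, 2)  # density of X on [-1, 1]
    alpha = integral(lambda x: B(x) * (x - x0) * dens, Fr(-1), Fr(1))

    def p(t):
        if t > x0:
            return integral(lambda x: B(x) * dens, max(t, Fr(-1)), Fr(1)) / alpha
        return -integral(lambda x: B(x) * dens, Fr(-1), min(t, Fr(1))) / alpha

    return alpha, p


alpha, p = biased(Fr(-1))  # zero point a = -1
beta, q = biased(Fr(0))    # zero point b = 0
print(alpha, beta)                 # 5/12 1/6
print(p(Fr(-1, 2)), q(Fr(-1, 2)))  # 3/5 0
print(p(Fr(1, 2)), q(Fr(1, 2)))    # 9/20 9/8
```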

2. Higher order Stein operators

The purpose of this Subsection is to show how the existence of certain couplings guaranteed by Theorem 2.7 can be used to assess the distance of the distribution of a given random variable $X$ to the distribution of a random variable $Z$ that is characterized by some higher order linear Stein operator $L$ of the form

are all finite and such that $\alpha:=\alpha_{1}+\alpha_{2}>0$ , where

Then, there exists a unique distribution for a random variable $X^{*}$ such that for all $f\in\mathcal{F}^{2}$ we have

The law of $X^{*}$ is always absolutely continuous with respect to the Lebesgue measure.

Uniqueness is proved in the same way as in the proof of Theorem 2.1. So let us just prove the existence of $X^{*}$ . First, choose $\hat{X}_{a}$ as in Proposition 2.5. Let $\nu:=\mathcal{L}(\hat{X}_{a})$ and, if $E[B_{0}(\hat{X}_{a})]\not=0$ , define $\mu$ by

whereas, if $E[B_{0}(\hat{X}_{a})]=0$ , let $\mu:=\delta_{0}$ . Finally, let $Y_{1}\sim\mu$ and construct $Y_{2}$ and a random index $I\in\{1,2\}$ on the same probability space as $Y_{1}$ such that $I$ is independent of $Y_{1},Y_{2}$ , and $Y_{2}$ has the generalized $X-B_{1}$ -biased distribution and

hold for all sufficiently smooth functions $g$ and $f$ , respectively. Hence, letting $X^{*}:=Y_{I}$ we have for all $f\in\mathcal{F}^{2}$ that

as claimed. Also, note that the distribution of $X^{*}$ , being a mixture of absolutely continuous distributions, is itself absolutely continuous. ∎

is the ( $\lambda$ -a.e. unique) probability density function of $X^{*}$ .

If the operator $L$ in (57) with $m=2$ is characterizing for the distribution of $Z$, then the Stein equation corresponding to a test function $h$ with $E\lvert h(Z)\rvert<\infty$ is given by

and, often, it has a solution $f_{h}$ such that the lower order derivatives can be uniformly bounded by constants, i.e. $\lVert f_{h}^{(i)}\rVert_{\infty}\leq c_{i}$ uniformly over $h$ in some class $\mathcal{H}$ of test functions. Then, if one can couple the given random variable $X$ to an $X^{*}$ as in Proposition 3.7, one can easily show that

Now, in typical cases one either has that the quantities $f_{h}(a)$ and $f_{h}^{\prime}(a)$ are equal to zero (as is the case for the operator used in [PRR13]), or the expressions

are close to zero. The latter could be guessed from choosing $f(x)=x-a$ and $f(x)=1$ , respectively, together with the assumption that $\mathcal{L}(X)\approx\mathcal{L}(Z)$ . The same heuristic applied to $f(x)=\frac{1}{2}(x-a)^{2}$ suggests that $\alpha$ should be close to $1$ . Thus, the right hand side of (58) should be close to zero, if $X$ and $X^{*}$ are coupled close to each other.

As in the first-order case (see Remark 3.5) one can show that if $\mathcal{L}(Z)$ is a fixed point of the distributional transformation from Proposition 3.7, then its density $p$ satisfies the second order linear differential equation

from which one should be able to reconstruct the class of fixed points in practice by exploiting boundary conditions like $\int p(t)dt=1$ .

Now, we return to the case of a general $m\geq 1$ . Henceforth, we denote by $R_{j,f}$ and $L_{j,f}$ , respectively, the polynomials from the statement of Theorem 2.7 for $B=B_{j}$ , $j=0,1,\dotsc,m-1$ and define $Q_{j,f}:=L_{j,f}+R_{j,f}$ . In Theorem 3.9 below, we make the assumption that $B_{j}$ has $0\leq k_{j}\leq m-j$ sign changes and that $k_{j}\equiv m-j\mod 2$ . Then, by Theorem 2.7, $Q_{j,f}$ is a polynomial of degree $\leq m-j-1$ , $j=0,1,\dotsc,m-1$ . Also, assume that $X$ is a real random variable such that $E\lvert B_{j}(X)X^{l}\rvert<\infty$ for each $0\leq l\leq m-j$ and $0\leq j\leq m-1$ . Then, for $j=0,1,\dotsc,m-1$ , we define

which is always nonnegative by Theorem 2.7.

With the above notation and assumptions, suppose that for each $j=0,1,\dotsc,m-1$ the function $B_{j}$ has $0\leq k_{j}\leq m-j$ sign changes, where $k_{j}\equiv m-j\mod 2$ . Furthermore, assume that there is some $j\in\{0,1,\dotsc,m-1\}$ such that $\beta_{j}>0$ and let $\beta:=\sum_{j=0}^{m-1}\beta_{j}>0$ . Then, there exists a unique distribution for a random variable $X^{*}$ such that for all $f\in\mathcal{F}^{m}$ we have

The law of $X^{*}$ is always absolutely continuous with respect to the Lebesgue measure.

Again, we only prove the existence part. For each $j=0,1,\dotsc,m-1$ let $Y_{j}$ have the $X-(B_{j},m-j)$ biased distribution, whenever $\beta_{j}\not=0$ and let $Y_{j}:=0$ , otherwise. Also, let $I\in\{0,1,\dotsc,m-1\}$ be a random index, which is independent of $Y_{0},Y_{1},\dotsc,Y_{m-1}$ such that

and define $X^{*}:=Y_{I}$ . Then, with the notation $f_{j}:=f^{(j)}$ , $j=0,1,\dotsc,m-1$ , by Theorem 2.7 we have

It is possible that a coupling of $X$ and $X^{*}$ as in Theorem 3.9 will be useful to bound the distance of the distribution of $X$ to that of $Z$ also in the case $m>3$, once such Stein operators are used in practice. It may first be necessary, however, to adjust this distributional transformation slightly by introducing additional location parameters $a_{j}$ related to the functions $B_{j}$, as discussed in Remark 2.9 (c).

Analytical proof of Theorem 2.1

We prove the claim by induction on $m$. Since $F^{(0)}=f>0$ has no zeroes if $m=0$, the assertion is clear in this case. Now, let $m\geq 1$ and assume that the claim is true for $(m-1)$-times differentiable functions. Suppose, to the contrary, that $F$ has $m+1$ distinct zeroes $y_{1}<\ldots<y_{m}<y_{m+1}$. Then, by Rolle’s theorem there exist points $z_{k}\in(y_{k},y_{k+1})$ such that $F^{\prime}(z_{k})=0$ for $k=1,\dotsc,m$. Since the points $z_{1},\dotsc,z_{m}$ are necessarily pairwise distinct zeroes of the $(m-1)$-times differentiable function $G:=F^{\prime}$ with $G^{(m-1)}=F^{(m)}>0$, this contradicts the induction hypothesis. ∎

Hence, for each polynomial $Q$ of degree at most $m-1$ it follows that $\lim_{x\to\infty}(G(x)+Q(x))=+\infty$ if $m\geq 1$ and that $\liminf_{x\to\infty}G(x)\geq\varepsilon$ if $m=0$ .

Recall that for real numbers $x_{1}<x_{2}<\ldots<x_{m}$ we let $J_{1}:=(-\infty,x_{1}]$ , $J_{k}:=(x_{k-1},x_{k}]$ for $2\leq k\leq m$ and $J_{m+1}:=(x_{m},\infty)$ .

This follows immediately from Lemma 4.3 and its proof. ∎
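The intervals $J_{1},\dotsc,J_{m+1}$ and the sign pattern of $B$ are easy to encode. The sketch assumes, consistently with the nonnegativity of $B$ on $J_{m+1}$ used in the proof of Theorem 2.1, that "$B$ has sign changes at $x_{1}<\ldots<x_{m}$" means $(-1)^{m+1-k}B\geq 0$ on $J_{k}$; under this convention $B(x)\prod_{i=1}^{m}(x-x_{i})\geq 0$ for all $x$. Testing finitely many points is a sanity check, not a proof; the names are hypothetical.

```python
from bisect import bisect_left
from math import prod


def interval_index(x, nodes):
    """Index k with x in J_k: J_1 = (-inf, x_1], J_k = (x_{k-1}, x_k], J_{m+1} = (x_m, inf)."""
    return bisect_left(nodes, x) + 1


def has_sign_pattern(B, nodes, points):
    """Check (-1)^(m+1-k) * B(x) >= 0 for x in J_k at the given points (assumed convention)."""
    m = len(nodes)
    return all((-1) ** (m + 1 - interval_index(x, nodes)) * B(x) >= 0 for x in points)


nodes = [0, 1, 3]
grid = [i / 4 for i in range(-20, 30)]
B = lambda x: x * (x - 1) * (x - 3)
print([interval_index(x, nodes) for x in (-5, 0, 0.5, 1, 3, 4)])    # [1, 1, 2, 2, 3, 4]
print(has_sign_pattern(B, nodes, grid))                             # True
print(all(B(x) * prod(x - xi for xi in nodes) >= 0 for x in grid))  # True
print(has_sign_pattern(lambda x: -x, [0], grid))                    # False
```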

Note that by construction $Q(x):=G_{1}(x)-L_{G_{1}}(x)$ is a polynomial of degree $m$ such that $Q(x_{k})=0$ for $k=1,\dotsc,m$. Hence, there exists $c\not=0$ such that $Q(x)=c\prod_{k=1}^{m}(x-x_{k})$. Since $m!\,c=Q^{(m)}=G_{1}^{(m)}=1$, i.e. $c=1/m!$, we conclude from (4) that

On the other hand, by the monotone convergence theorem and (62) we have

Now, it is easily seen by successive differentiation that $F=G_{f}+T_{m-1,x_{m}}F$ , where $T_{m-1,x_{m}}F$ is the Taylor polynomial of order $m-1$ around $x_{m}$ corresponding to $F$ . Since the interpolation polynomial of degree $\leq m-1$ corresponding to $T_{m-1,x_{m}}F$ is still $T_{m-1,x_{m}}F$ , this implies that

From (4) and (67) it finally follows that

Major parts of this work were carried out while I was a postdoc at TU München, Germany. I would like to thank Professor Gesine Reinert for inviting me to Oxford in September 2013 and for giving me the opportunity to present parts of this work during my stay there. I am also grateful to an anonymous referee whose comments helped me improve the presentation and exposition of the above results.