Distributional transformations without orthogonality relations
Christian Döbler
Introduction
Main abstract results and discussion
In this Subsection we give a proof of the following theorem, which is a generalization of Theorem 2.1 in [GR05].
Then, is necessarily positive and there exists a unique distribution for a random variable such that for all we have
with as defined in (2). Furthermore, if , then the distribution of is absolutely continuous with respect to the Lebesgue measure.
If and additionally satisfy the orthogonality conditions for all , then the distribution of reduces to the biased distribution from [GR05] as is easily seen by writing the polynomial in terms of the monomials . Also, in this case for the same reason we have . So it is justified to call the distribution of the generalized biased distribution.
Note that if, according to our definition of sign changes, has both and sign changes for , then we see from (3) that these two points of view lead to different distributions for . Also, if we may consider to have sign changes at as well as at , then the resulting ’s and, again, the distributions of ’s are different, in general, which is in contrast to the theory from [GR05], where such ambiguities are ruled out by their orthogonality assumptions on with respect to . Thus, one should actually denote the variable by to prevent these ambiguities. We illustrate this phenomenon for the case in Example 3.6 below. We will, however, not do so but rather assume that it is understood or mention how many sign changes at what exact points the function is supposed to have.
For the existence part of Theorem 2.1 we give two different proofs: An analytical proof, which uses the Riesz representation theorem, and a probabilistic proof, which relies on an explicit construction of the random variable . Remarkably, the same construction of as in [GR05] is still valid in this more general setting. However, we were not able to generalize the proof of Theorem 2.1 in [GR05] to a proof of our Theorem 2.1.
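For a single sign change the probabilistic construction can be illustrated numerically. The following is a minimal sketch, not the construction of the proof in full generality. It assumes that for $m=1$ the defining identity reads $E[B(X)(f(X)-f(x_1))]=\alpha\,E[f'(X^{(B)})]$ with $\alpha=E[B(X)(X-x_1)]$, and that $X^{(B)}=x_1+U(Y-x_1)$, where $U$ is uniform on $[0,1]$ and $Y$ is independent of $U$ with law proportional to $B(y)(y-x_1)\,dP_X(y)$. The discrete law of $X$, the choice $B(x)=x$, the test function and all function names are hypothetical choices made for illustration.

```python
import math
import random


def sample_biased_m1(values, probs, B, x1, rng):
    """Draw one sample of x1 + U * (Y - x1) (assumed m = 1 construction)."""
    # Y is X reweighted by B(x) * (x - x1). These weights are nonnegative
    # when B changes sign exactly once, at x1, from negative to positive.
    weights = [p * B(x) * (x - x1) for x, p in zip(values, probs)]
    y = rng.choices(values, weights=weights, k=1)[0]
    u = rng.random()  # uniform on [0, 1), drawn independently of Y
    return x1 + u * (y - x1)


def check(n=200_000, seed=0):
    """Return (E[B(X)(f(X) - f(x1))], alpha * Monte Carlo mean of f'(X^(B)))."""
    rng = random.Random(seed)
    values = [-2.0, -0.5, 1.0, 3.0]   # hypothetical support of X
    probs = [0.1, 0.4, 0.3, 0.2]      # hypothetical probabilities
    B = lambda x: x                   # single sign change at x1 = 0
    x1 = 0.0
    f, f_prime = math.sin, math.cos   # smooth Lipschitz test function

    alpha = sum(p * B(x) * (x - x1) for x, p in zip(values, probs))
    lhs = sum(p * B(x) * (f(x) - f(x1)) for x, p in zip(values, probs))
    mc = sum(
        f_prime(sample_biased_m1(values, probs, B, x1, rng)) for _ in range(n)
    ) / n
    return lhs, alpha * mc


if __name__ == "__main__":
    lhs, rhs = check()
    print(f"exact left-hand side: {lhs:.4f}, simulated right-hand side: {rhs:.4f}")
```

The two numbers agree up to Monte Carlo error, because $E[f'(x_1+U(Y-x_1))]=E[(f(Y)-f(x_1))/(Y-x_1)]$ for uniform $U$.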
In the case , one can easily show that the function given by
Note that if , then one can easily show by induction on that there exist finite constants such that
The assumption for is easily seen to be equivalent to and .
From the nonnegativity of on we know that
Thus, if , it is necessarily positive. Now, we give the explicit construction of the random variable from [GR05]. Let be independent random variables such that has the density () and has distribution given by
where is the distribution of . Note that, by (5) and the definition and positivity of , is indeed a probability measure and, hence, such a exists. Now, we define the random variable
where . We claim that satisfies (3). This claim will be proved by induction on . If , then the claim reduces to
we conclude from the induction hypothesis that
we can thus conclude from the induction hypothesis that
which is clear from the Lagrange form of the interpolation polynomial corresponding to the constant function and the nodes . Using this, we obtain that
To prove this claim, we use the explicit construction of given in (7). Thus, we have that
By the properties of the Lebesgue measure it follows that
Thus, from (11) we infer that . Hence, the distribution of is absolutely continuous with respect to . ∎
With the notation of the above existence proof, from the identity
valid for bounded and measurable , and an easy change of variable one can easily deduce that for the (-a.e. unique) density of is given by
From (2.1) and Remark 2.2 (f) we conclude that
Further, for each let have the generalized biased distribution. Let be independent of the family having distribution , .
Under the above assumptions the variable has the generalized biased distribution.
The easy proof is quite standard: For we have by Fubini’s theorem
It is actually not strictly necessary to assume that satisfies the assumptions of Theorem 2.1 for each . In fact, assuming (2.1) it follows from Remark 2.2 (f) that exists for -a.e. but it might be zero for certain values of . Assuming additionally that for and letting have any fixed distribution if , then the proof goes through as before, since the distribution of the index puts mass to values of such that .
2. Biasing functions with fewer than $m$ sign changes
Although Theorem 2.1 is already quite general, in practice it might happen that one would like the order of the derivative on the right hand side of (3) to be larger than the number, say , of sign changes of the function on the left hand side of (3). For example, if is a nonnegative random variable with finite and non-zero expectation, then is said to have the equilibrium distribution with respect to , if
holds for all continuously differentiable functions with a Lipschitz derivative. In their final version [PR14] they prove this by giving an explicit construction of the random variable . In the first arXiv version, however, they applied Theorem 2.1 of [GR05] with the distributional transformation given by twice in a row, and, in order to do so, they had to make sure that the orthogonality assumptions of that theorem were satisfied. This is why they first had to assume that not only but also be satisfied. Invoking Theorem 2.1 instead, we are able to prove the following statement, which even generalizes (15) to the class of all with finite second moment. This result is the main building block of a generalization of Theorem 2.1 to cases, where the number of sign changes of might disagree with the order of the derivative of the test function .
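The first-order equilibrium transformation mentioned at the beginning of this discussion can be checked numerically. The sketch below assumes the standard defining identity $E[f(X)]-f(0)=E[X]\,E[f'(X^{e})]$ and the well-known fact that $X^{e}$ has density $P(X>t)/E[X]$ on $[0,\infty)$. The choice of $X$ as uniform on $[0,1]$, the test function and the helper names are hypothetical.

```python
import math


def integrate(g, a, b, n=20_000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def equilibrium_check(f, f_prime):
    """Return both sides of E[f(X)] - f(0) = E[X] E[f'(X^e)], X ~ U[0, 1]."""
    mean_x = integrate(lambda x: x, 0.0, 1.0)  # E[X] = 1/2
    lhs = integrate(f, 0.0, 1.0) - f(0.0)      # X has density 1 on [0, 1]
    # X^e has density P(X > t) / E[X] = (1 - t) / E[X] on [0, 1].
    expect_fprime = integrate(
        lambda t: f_prime(t) * (1.0 - t) / mean_x, 0.0, 1.0
    )
    return lhs, mean_x * expect_fprime


if __name__ == "__main__":
    lhs, rhs = equilibrium_check(math.sin, math.cos)
    print(lhs, rhs)
```

For $f=\sin$ both sides equal $1-\cos(1)$, which follows from one integration by parts.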
holds for all continuously differentiable functions with a Lipschitz derivative. Further, the distribution of is always absolutely continuous with respect to the Lebesgue measure.
Using the transformation from Proposition 2.5, one could easily generalize the results from [PR14] to random sums with general mean zero summands and even to summands with small, non-zero means.
holds for all Lipschitz functions . Since has finite second moment, one can easily see that (18) also holds for absolutely continuous functions such that is as . In particular this holds for $g(x):=\operatorname{sign}(x-a)\bigl(f(x)-f(a)\bigr)$ with and for . Thus, from (18), (2.2) and (2.2) we conclude that
proving (16). Absolute continuity of follows immediately from Theorem 2.1. ∎
Next, we will use the result of Proposition 2.5 to give a generalization of Theorem 2.1 to cases, where the number of sign changes of may be smaller than the order of the derivative we would like to have in the defining identity for the biased distribution. However, we will have to assume that , i.e. that and have the same parity. In what follows, for nonnegative integers we denote by the falling factorial, i.e. and if .
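The falling factorial used below is easy to compute directly. This is a small sketch assuming the usual convention $(n)_0=1$ and $(n)_k=n(n-1)\cdots(n-k+1)$ for $k\geq 1$; the function name is hypothetical.

```python
import math


def falling_factorial(n: int, k: int) -> int:
    """Return (n)_k = n (n - 1) ... (n - k + 1), with (n)_0 = 1."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be nonnegative integers")
    result = 1
    for j in range(k):
        result *= n - j  # the product contains the factor 0 once k > n
    return result


if __name__ == "__main__":
    print([falling_factorial(5, k) for k in range(7)])
```

For $k\leq n$ the value coincides with the number of $k$-permutations of $n$ objects, `math.perm(n, k)`, and it vanishes for $k>n$.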
If , assume further that the generalized biased distribution from Theorem 2.1 is not the Dirac measure at . Then, there exists a unique distribution for a random variable such that
holds for each , where, with
if . Then, is equal to zero, whenever and has degree at most , if . Furthermore, still denotes the interpolation polynomial for corresponding to the nodes given by (2) but with replaced by . Additionally, is always positive and is given by
if and by , if . Also, the distribution of is always absolutely continuous with respect to the Lebesgue measure unless .
From Theorem 2.1 we know that . Let be given. By the assumptions on one can conclude again from Theorem 2.1 that exists and that there is a random variable having the generalized biased distribution, so that
From our assumption in the case and from Theorem 2.1 for , we know that is not almost surely equal to zero. Thus, if , by Proposition 2.5 (with ) we know that there is a random variable satisfying
where . Now, if , then again by Proposition 2.5 we can find a random variable such that
since and with
Inductively, for we find that there exists such that, with we have
Again by induction we find the following analog of (2.2):
Now note that for with the function we have from (28) that
Clearly, is a polynomial of degree having the zeroes . Thus, there exists a polynomial of degree such that . Now, first suppose that . Then, we have . Thus, from (33) and (34) we can conclude that
Letting the claim follows in the case from (28) and (2.2). From now on, we will assume that . In order to find in this case, we write
the last identity because the left hand side is a polynomial of degree and, hence, the right hand side must also be. Thus, as a neat by-product we have proved that
From (2.2) we conclude that is given by
Hence, from (34) and (38) we find for that
Now, from reading (2.2) backwards (with ) we obtain
Letting (23) now follows from (28) and (42). To see that , note that we know from our assumption in the case and from Theorem 2.1 in the case that cannot almost surely be equal to zero. Thus, the even moments of are also non-zero. Since we know from (33) that with and as is even, it follows that also . Knowing that is necessarily positive, uniqueness of the distribution for can be proved as for in the proof of Theorem 2.1. Absolute continuity of in the case that not both, and are equal to zero, now follows from Theorem 2.1 and Proposition 2.5. It remains to show the alternative representation for the numbers in (24). This is given by Lemma 2.8. ∎
For let be distinct real (or complex) numbers. Then, for each nonnegative integer we have the identity
We prove the claim by induction on , simultaneously for all . If , then it is clearly true. Now assume that and that are distinct numbers. Then, we can write
we conclude from the induction hypothesis that
Thus, it only remains to show that . But this follows from (37), completing the proof. ∎
We may call the distribution of the biased distribution. Note, however, that, as for , the distribution of is sensitive to the number and the choice of the sign change points , if these are ambiguous (see Remark 2.2 (b)).
It is easy to see that an analog of Proposition 2.4 also exists for the biased distribution.
Examples and Applications
In this Subsection we give some examples of first-order distributional transformations, whose existence is guaranteed by Theorem 2.1, and demonstrate how this theory may be applied to prove certain Stein type characterizations without using the solution of the corresponding Stein equation. We also show how one can use a coupling of and to estimate the distance of to a fixed point of the distributional transformation induced by . Finally, we show by example that the distribution of in general depends on the choice of the zeroes of , if these are ambiguous.
Let be a real-valued random variable with . Choosing with a single sign change at , we conclude from Theorem 2.1 that there exists a random variable such that
Under the same assumptions on as in (a) we now choose . Then,
and, again by Theorem 2.1, we find that there is a random variable such that
where we have used that in this case. Again, whenever has mean zero, the distribution of reduces to the -zero biased distribution. In general, we call it the -non-zero biased distribution. Note that the existence of this distribution already follows from Theorem 2.1 in [GR05], as satisfies their orthogonality relation in this case.
Next, we show by example how the existence of such distributional transformations may be used to prove a Stein type characterization of a given distribution, which is a fixed point of the distributional transformation. We first need the following definition.
Let and . Then, the distribution of is called the half-normal distribution or modulus normal distribution with parameter . Further, we say that has the negative half-normal distribution with parameter , if has the half-normal distribution with parameter .
Let be a real-valued random variable such that . Then is a fixed point of the generalized zero bias transformation if and only if it is a mixture of a half-normal and a negative half-normal distribution with the same parameter.
Let the distribution of be a fixed point of the generalized zero-bias transformation. Then, from Remark 2.2 (d) we know that has an absolutely continuous distribution with density given by
From (3.1) and (48) we conclude that is continuously differentiable on and on and that
for each . From (49) we see that
for . Here, we used the shorthands and . The claim now follows from (50) and (51). Conversely, if the distribution of is such a mixture, then, by a standard computation involving Fubini’s theorem, one easily verifies that satisfies
and, hence, that is a fixed point of the generalized zero bias transformation. We omit the details. ∎
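The omitted computation can at least be checked numerically. The sketch below assumes that the fixed point property of the generalized zero bias transformation amounts to the identity $E[X(f(X)-f(0))]=E[X^2]\,E[f'(X)]$ for Lipschitz $f$, and takes $X$ to be a mixture with weight $p$ on the half-normal and weight $1-p$ on the negative half-normal distribution, both built from a centered normal variable with variance $\sigma^2$. The parameter values, the test function and the helper names are hypothetical.

```python
import math


def integrate(g, a, b, n=40_000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def mixture_density(x, sigma, p):
    """Density of p * half-normal + (1 - p) * negative half-normal."""
    phi = math.exp(-x * x / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))
    return 2.0 * phi * (p if x > 0 else 1.0 - p)


def fixed_point_check(sigma, p, f, f_prime):
    """Return (E[X (f(X) - f(0))], E[X^2] E[f'(X)], E[X^2])."""
    cutoff = 12.0 * sigma  # the Gaussian tails beyond this point are negligible

    def expect(g):
        # Integrate each half-line separately since the density jumps at 0.
        h = lambda x: g(x) * mixture_density(x, sigma, p)
        return integrate(h, -cutoff, 0.0) + integrate(h, 0.0, cutoff)

    second_moment = expect(lambda x: x * x)
    lhs = expect(lambda x: x * (f(x) - f(0.0)))
    rhs = second_moment * expect(f_prime)
    return lhs, rhs, second_moment


if __name__ == "__main__":
    f = lambda x: math.sin(x) + math.cos(x)
    f_prime = lambda x: math.cos(x) - math.sin(x)
    print(fixed_point_check(1.5, 0.3, f, f_prime))
```

Both sides agree for every mixing weight $p$, since the mixture density $q$ satisfies $x\,q(x)=-\sigma^2 q'(x)$ separately on $(-\infty,0)$ and on $(0,\infty)$.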
From Proposition 3.3 we directly infer the following Stein characterization of the class of half-normal distributions, whose derivation does not make use of the solution to any Stein equation.
A nonnegative random variable with has the half-normal distribution with parameter , if and only if
for all , which is analogous to (49) and which implies that the -derivative of is given by . Hence, the family of densities giving rise to fixed points of the distributional transformation can be reconstructed as before.
Suppose that the distribution of is a fixed point of the distributional transformation in (a). Up to dividing by a constant, which does not change the distributional transformation, we can assume that
i.e. is the -derivative of the density of . Then, the Stein equation from the density approach (see e.g. [CGS11]) for corresponding to a test function such that exists, reads
where we suppose that the support of is given by the interval for some . The law of is then usually characterized by the identity
valid for all functions from some large function class . If is Lipschitz-continuous, one typically has bounds for of the form
for some finite constants and (see [CGS11], again). Now, suppose that is given and that has the generalized biased distribution and is constructed on the same space as . Then, for a -Lipschitz function , we can estimate
From (53) with we presume that should be close to one, if . Thus, the second term in (55) (or (54)) should be close to zero. Also, if we can couple close to , then the first term should be small, too. In many cases, we have that , as is suggested by taking in (53), and from which we conclude that the third term in (55) is also close to zero and, hence that (55) gives a good estimate of the Wasserstein distance
and, hence, (54) might still give a useful estimate. In a nutshell, if the distribution of is a fixed point of the distributional transformation induced by and we somehow conjecture that and if we can couple and sufficiently close, then we should be able to accurately estimate the (Wasserstein) distance between and by the above procedure.
The following example illustrates the dependence of the distribution of on the choice of the sign change points, if there are non-trivial intervals, where vanishes identically and, if the orthogonality relations from [GR05] do not hold.
From Remark 2.2 (d) we know that a density for the distribution of is given by
and that a density for the distribution of is given by
We immediately see that, if the orthogonality relation is satisfied, then and . This is in accordance with the fact that under this condition the distribution is the same for all choices of the zero point of as stated in [GR05]. If, however, , then we see from (3.6) that and are generally different and, hence, that the distribution of actually depends on the choice of the zeroes of . For a concrete example, let be uniformly distributed on , $B(x)=\max(x,0)=:x^{+}$, $a=-1$, $b=0$, $E[B(X)]=E[X^{+}]=1/4$ as well as
Obviously, and give rise to two different distributions.
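The dependence on the sign change point can be made explicit numerically. The following sketch assumes that $X$ is uniform on $[-1,1]$ (which is consistent with $E[X^{+}]=1/4$), that $\alpha=E[B(X)(X-x_1)]$ for the chosen sign change point $x_1$, and that the biased density is $t\mapsto E[B(X)1_{\{X\geq t\}}]/\alpha$ for $t>x_1$ and zero otherwise. All function names are hypothetical.

```python
import math


def integrate(g, a, b, n=20_000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def B(x):
    """Biasing function B(x) = max(x, 0); it vanishes on [-1, 0]."""
    return max(x, 0.0)


def alpha(node):
    """alpha = E[B(X) (X - node)] for X uniform on [-1, 1] (assumed law)."""
    return integrate(lambda x: 0.5 * B(x) * (x - node), -1.0, 1.0)


def tail(t):
    """E[B(X) 1{X >= t}] = (1 - max(t, 0)^2) / 4 for t < 1, else 0."""
    s = max(t, 0.0)
    return (1.0 - s * s) / 4.0 if s < 1.0 else 0.0


def biased_density(node):
    """Assumed density of the biased variable for sign change point `node`."""
    a = alpha(node)
    return lambda t: tail(t) / a if node < t < 1.0 else 0.0


if __name__ == "__main__":
    for node in (-1.0, 0.0):
        p = biased_density(node)
        print(node, alpha(node), p(-0.5), p(0.5), integrate(p, -1.0, 1.0))
```

With the sign change placed at $-1$ one obtains $\alpha=5/12$ and a density equal to $3/5$ on $(-1,0]$ and $3(1-t^2)/5$ on $(0,1)$; with the sign change placed at $0$ one obtains $\alpha=1/6$ and the density $3(1-t^2)/2$ on $(0,1)$. Both integrate to one, yet they are clearly different.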
2. Higher order Stein operators
The purpose of this Subsection is to show, how the existence of certain couplings guaranteed by Theorem 2.7 can be used to assess the distance of the distribution of a given random variable to the distribution of a random variable , which is characterized by some higher order linear Stein operator of the form
are all finite and such that , where
Then, there exists a unique distribution for a random variable such that for all we have
The law of is always absolutely continuous with respect to the Lebesgue measure.
Uniqueness is proved in the same way as in the proof of Theorem 2.1. So let us just prove the existence of . First, choose as in Proposition 2.5. Let and, if , define by
whereas, if , let . Finally, let and construct and a random index on the same probability space as such that is independent of , and has the generalized -biased distribution and
hold for all sufficiently smooth functions and , respectively. Hence, letting we have for all that
as claimed. Also, note that the distribution of , being a mixture of absolutely continuous distributions, is itself absolutely continuous. ∎
is the (-a.e. unique) probability density function of .
If the operator in (57) with is characterizing for the distribution of , then the Stein equation corresponding to a test function with is given by
and, often, it has a solution such that the lower order derivatives can be uniformly bounded by constants, i.e. uniformly over in some class of test functions. Then, if one can couple the given random variable to a such as in Proposition 3.7, then one can easily show that
Now, in typical cases one either has that the quantities and are equal to zero (as is the case for the operator used in [PRR13]), or the expressions
are close to zero. The latter could be guessed from choosing and , respectively, together with the assumption that . The same heuristic applied to suggests that should be close to . Thus, the right hand side of (58) should be close to zero, if and are coupled close to each other.
As in the first-order case (see Remark 3.5) one can show that if is a fixed point of the distributional transformation from Proposition 3.7, then its density satisfies the second order linear differential equation
from which one should be able to reconstruct the class of fixed points in practice by exploiting boundary conditions like .
Now, we return to the case of a general . Henceforth, we denote by and , respectively, the polynomials from the statement of Theorem 2.7 for , and define . In Theorem 3.9 below, we make the assumption that has sign changes and that . Then, by Theorem 2.7, is a polynomial of degree , . Also, assume that is a real random variable such that for each and . Then, for , we define
which is always nonnegative by Theorem 2.7.
With the above notation and assumptions, suppose that for each the function has sign changes, where . Furthermore, assume that there is some such that and let . Then, there exists a unique distribution for a random variable such that for all we have
The law of is always absolutely continuous with respect to the Lebesgue measure.
Again, we only prove the existence part. For each let have the biased distribution, whenever and let , otherwise. Also, let be a random index, which is independent of such that
and define . Then, with the notation , , by Theorem 2.7 we have
It is possible that a coupling of and as in Theorem 3.9 will be useful to bound the distance of the distribution of to that of also in the case , once such Stein operators are used in practice. Maybe it would first be necessary to adjust this distributional transformation slightly by introducing additional location parameters related to the functions , as discussed in Remark 2.9 (c).
Analytical proof of Theorem 2.1
We prove the claim by induction on . Since has no zeroes if , the assertion is clear in this case. Now, let and assume that the claim is true for -times differentiable functions. Suppose, contrarily, that has distinct zeroes . Then, by Rolle’s theorem there exist points such that for . Since the points are necessarily pairwise distinct zeroes of the -times differentiable function with , this contradicts the induction hypothesis. ∎
Hence, for each polynomial of degree at most it follows that if and that if .
Recall that for real numbers we let , for and .
This follows immediately from Lemma 4.3 and its proof. ∎
Note that by construction is a polynomial of degree such that for . Hence, there exists such that . Since , we conclude from (4) that
On the other hand, by the monotone convergence theorem and (62) we have
Now, it is easily seen by successive differentiation that , where is the Taylor polynomial of order around corresponding to . Since the interpolation polynomial of degree corresponding to is still , this implies that
From (4) and (67) it finally follows that
Major parts of this work were carried out while I was a postdoc at TU München, Germany. I would like to thank Professor Gesine Reinert for inviting me to visit Oxford in September 2013 and for giving me the opportunity to present parts of this work during my stay there. I am also grateful to an anonymous referee whose comments helped me improve the presentation and exposition of the above results.