Small ball probabilities for linear images of high dimensional distributions
Mark Rudelson, Roman Vershynin
Introduction
Concentration properties of high dimensional distributions have been extensively studied in probability theory. In this paper we are interested in small ball probabilities, which describe the spread of a distribution in space. Small ball probabilities have been extensively studied for stochastic processes (see ), sums of independent random variables (see ) and log-concave measures (see [1, Chapter 5]). Nevertheless, there remain surprisingly basic questions that have not been previously addressed.
If the distributions of are well spread on the line, then the distribution of is well spread in space.
Special cases of interest are marginals of which arise when is an orthogonal projection, and sums of independent random variables which correspond to . The problem of describing small ball probabilities even in these two special cases is nontrivial and useful for applications. In particular, a recent interest in this problem was spurred by applications in random matrix theory; see for sums of random variables and for higher dimensional marginals.
This discussion is non-trivial even for continuous distributions, and we shall start from this special case.
Our first main result is about distributions with independent continuous coordinates. It states that any -dimensional marginal of such distribution has density bounded by , as long as the densities of the coordinates are bounded by .
Here and throughout the paper, denote positive absolute constants.
Theorem 1.1 is trivial in dimension , since the product density is bounded by . A remarkable non-trivial case of Theorem 1.1 is in dimension , where it holds with optimal constant . This partial case is worth to be stated separately.
Let be real-valued independent random variables whose densities are bounded by almost everywhere. Let be real numbers with . Then the density of is bounded by almost everywhere.
Moreover, this estimate is optimal. The density is achieved by the sum where are uniformly distributed in .
Theorem 1.2 follows immediately from a combination of two known results, a theorem of B. Rogozin stated below and a theorem of K. Ball .
Let be real-valued independent random variables whose densities are bounded by respectively. Then the density of is uniformly bounded by the value of the density of at the origin, where are independent random variables uniformly distributed in .
For simplicity, by rescaling we can assume that . Rogozin’s Theorem 1.3 implies that the density of is uniformly bounded by the density of at the origin, where are independent random variables uniformly distributed in .
The random vector is uniformly distributed in the cube . Thus the density of at the origin is the volume of the section of the cube by the hyperplane that contains the origin and is orthogonal to the vector . Now, a theorem of Ball states that the maximal volume of such section equals . Therefore, the density of is bounded by . ∎
2. General distributions
Thus the concentration function controls the small ball probabilities of the distribution of . The study of concentration functions of sums of independent random variables originates from the works of Lévy , Kolmogorov , Rogozin , Esseen and Halasz . Recent developments in this area highlighted connections with Littlewood-Offord problem and applications to random matrix theory, see .
Consider a random vector where are real-valued independent random variables. Let be such that
This result can be regarded as a tensorization property of the concentration function. It will be deduced from Theorem 1.1 in Section 2.
3. Anisotropic distributions
Finally, we study concentration of anisotropic high-dimensional distributions, which take the form for a fixed matrix . The key exponent that controls the behavior of the concentration function of is the stable rank of . We define it as
Consider a random vector where are real-valued independent random variables. Let be such that
Let be an matrix and . Then
where depends only on .
A more precise version of this result is Theorem 8.1 and Corollary 8.5 below. It will be deduced from Corollary 1.4 by replacing by a dyadic sum of spectral projections.
In the particular case of continuous distributions, Theorem 1.5 states the following. Suppose the densities of are bounded by . Then obviously for any , so Theorem 1.5 yields
Here is an absolute constant and depends only on the bound on the sub-gaussian norms. The distributions for which Paouris’ inequality (1.4) applies are not required to have independent coordinates. On the other hand, the log-concavity assumption for (1.4) is much stronger than a uniform bound on the coordinate densities in (1.3).
It is worthwhile to state here a related large deviation bound for from . If are independent, uniformly sub-gaussian, and have zero means and unit variances, then
Here depends only on the bound on the sub-gaussian norms of .
4. The method
Let us outline the proof of the key Theorem 1.1, which implies all other results in this paper.
A natural strategy would be to extend to higher dimensions the simple one-dimensional argument leading to Theorem 1.2, which was a combination of Ball’s and Rogozin’s theorems. A higher-dimensional version of Ball’s theorem is indeed available ; it states that the maximal volume of a section of the cube by a subspace of codimension is . However, we are unaware of any higher-dimensional versions of Rogozin’s theorem .
An alternative approach to the special case of Theorem 1.1 in dimension (and, as a consequence, to Corollary 1.4 in dimension ) was developed in an unpublished manuscript of Ball and Nazarov . Although it does not achieve the optimal constant that appears in Theorem 1.2, this approach avoids the delicate combinatorial arguments that appear in the proof of Rogozin’s theorem. The method of Ball and Nazarov is Fourier-theoretic; its crucial steps go back to Halasz and Ball .
In this paper, we prove Theorem 1.1 by generalizing the method of Ball and Nazarov to higher dimensions using Brascamp-Lieb inequality. For educational purposes, we will start by presenting a version of Ball-Nazarov’s argument in dimension in Sections 3 and 4. The higher-dimensional argument will be presented in Sections 5–7.
There turned out to be an unexpected difference between dimension and higher dimensions, which presents us with an an extra challenge. The one-dimensional method works well under assumption that all coefficients are small, e.g. . The opposite case where there is a large coefficient , is trivial; it can be treated by conditioning on all except .
In higher dimensions, this latter case is no longer trivial. It corresponds to the situation where some is large (here denote the coordinate basis vectors). The power of one random variable is not enough to yield Theorem 1.1; such argument would lead to a weaker bound instead of .
In Section 6 we develop an alternative way to remove the terms with large from the sum. It is based on a careful tensorization argument for small ball probabilities.
Deduction of Corollary 1.4 from Theorem 1.1
We begin by recording a couple of elementary properties of concentration functions.
The following elementary observation connects densities and concentration functions.
The density of is bounded by almost everywhere;
The concentration function of satisfies
Here and depend only on each other. In the implication (i) (ii), we have where is an absolute constant. In the implication (ii) (i), we have .
This proposition follows from the known bound (see e.g. formula (1.18) in ). ∎
Now we are ready to deduce Corollary 1.4 from Theorem 1.1. The proof is a higher-dimensional version of the smoothing argument of Ball and Nazarov .
Note that has independent coordinates whose densities can be computed as follows:
Applying Theorem 1.1, we find that the density of is bounded by , where . Then Proposition 2.2 yields that
Substituting this into (2.2), we complete the proof. ∎
Using regularity of concentration function described in Proposition 2.1, one can state the conclusion of Corollary 1.4 in a more flexible way:
Decay of characteristic functions
We will now begin preparing our way for the proof of Theorem 1.1. Our argument will use the following tail bound for the characteristic function
of a random variable with bounded density. The estimate and its proof below are essentially due to Ball and Nazarov . The reader can refer to [12, Chapter 3] for the definition and basic properties of the decreasing rearrangements of functions.
Let be a random variable whose density is bounded by . Then the non-increasing rearrangement of the characteristic function of satisfies
The estimate for large will follow from Plancherel’s identity. The estimate for small will be based on a regularity argument going back to Halasz .
1. Plancherel. By replacing with we can assume that . Let denote the density of . Thus , according to the standard definition of the Fourier transform
By Plancherel’s identity and using that and , we obtain
This proves the second part of the claimed estimate. It remains to prove the first part.
2. Symmetrization. Let denote an independent copy of . Then
We are going to prove a bound of the form
Substituting , we would obtain the desired estimate
which would conclude the proof (provided is chosen large enough so that ).
3. Regularity. First we observe that (3.3) holds for some fixed constant value of . This follows from the identity and inequality (3.2):
Theorem 1.1 in dimension one. Densities of sums.
Now we are going to give a “soft” proof of a version of Theorem 1.2 due to Ball and Nazarov . Their argument establishes Theorem 1.1 in dimension . Let us state this result separately.
Let be real-valued independent random variables whose densities are bounded by almost everywhere. Let be real numbers with . Then the density of is bounded by almost everywhere.
By replacing with we can assume that . By replacing with when necessary we can assume that all . We can further assume that by dropping all zero terms from the sum. If there exists with , then the conclusion follows by conditioning on all except . Thus we can assume that
Finally, by translating if necessary we reduce the problem to bounding the density of at the origin.
We may assume that by adding to an independent normal random variable with an arbitrarily small variance. Fourier inversion formula associated with the Fourier transform (3.1) yields that the density of at the origin (defined using (2.1)) can be reconstructed from its Fourier transform as
By independence, we have , so
We use the generalized Hölder’s inequality with exponents whose reciprocals sum to by assumption. It yields
The value of the integrals will not change if we replace the functions by their non-increasing rearrangements . After change of variable, we obtain
Bounding by , we see that the first integral (over ) is bounded by . The second integral (over ) is bounded by
where we used that . Therefore
provided that constant is chosen large enough. Hence
Substituting this into (4.1) completes the proof. ∎
Our proof of Theorem 1.1 will go differently depending on whether all vectors are small or some are large. In the first case, we proceed with a high-dimensional version of the argument from Section 4, where Hölder’s inequality will be replaced by Brascamp-Lieb’s inequality. In the second case, we will remove the large vectors one by one, using a new precise tensorization property of concentration functions.
In this section, we treat the case where all vectors are small. Theorem 1.1 can be formulated in this case as follows.
Let be a random vector and be a projection which satisfy the assumptions of Theorem 1.1. Assume that
Then the density of the random vector is bounded by almost everywhere.
The proof will be based on Brascamp-Lieb’s inequality.
The singular value decomposition of yields the existence of a matrix satisfying
As in the proof of Theorem 4.1, Fourier inversion formula associated with the Fourier transform in dimensions yields that the density of at the origin (defined using (2.1)) can be reconstructed from its Fourier transform as
is the characteristic function of . Therefore, to complete the proof, it suffices to bound the integral in the right hand side of (5.1) by .
In order to represent more conveniently for application of Brascamp-Lieb inequality, we denote
Then , so the identity can be written as
Moreover, we have . Substituting this into (5.2) and using independence, we obtain
Recalling (5.3), we apply Brascamp-Lieb inequality for these functions and obtain
We arrived at the same quantity as we encountered in one-dimensional argument in (4.2). Following that argument, which uses the assumption that all , we bound the product the quantity above by
Recalling that and , we find that . Thus the right hand side of (5) is bounded by . The proof of Proposition 5.1 is complete. ∎
Next we turn to the case where not all vectors are small. In this case, we will remove the large vectors one by one. The non-trivial task is how not to lose power at each step. This will be achieved with the help of the following precise tensorization property of small ball probabilities.
Let be random variables and be real numbers. Assume that
This lemma will be used later in an inductive argument. To make the inductive step, two features will be critical: (a) the term of order in the denominator of the probability estimate; (b) the possibility of choosing different values for the parameters and .
Denoting , we compute the probability by iterative integration in plane:
where the last equation follows by integration by parts. Hypothesis (ii) of the lemma says that , so the expression above is bounded by
where the last equation follows by substitution .
The integral in the right hand side is the value of beta function
Bounding this value is standard. One can use the fact that
which follows from the identity and Stirling’s approximation. (Here means that .) It follows that
Therefore the ratio of the left and right hand sides is bounded in . Hence there exists an absolute constant such that
Substituting finishes the proof. ∎
Let be random variables and , be real numbers. Assume that
provided that with a suitably small absolute constant .
Random variables , satisfy the assumptions of Lemma 6.1 with
Using the hypothesis that , we bound the right hand side by
If we choose then the right hand side gets bounded by , as claimed. ∎
Let where is an absolute constant. If
Without loss of generality, we can assume that . Let us record a few basic properties of . A straightforward check shows that
It follows that , since the orthogonal projection of onto equals . Canceling on both sides, we have
It follows from (6.3) that has the form
where are fixed numbers (independent of ). Substituting , we obtain using (6.4) that . Thus
since and by (6.4). Finally, since is orthogonal to the image of , the two vectors in the right side of (6.5) are orthogonal. Thus
Now let us estimate for a random vector . We express using (6.8) and (6.6) as
and try to apply Corollary 6.3. Let first us check the hypotheses of that corollary. Since by (6.7) is determined by (and is independent of ), and by a hypothesis of the lemma, we have
The last inequality follows since the density of is bounded by . This verifies hypothesis (i) of Corollary 6.3 with . Hypothesis (ii) follows immediately from (6.2), with . If then as required in Corollary 6.3. It yields
Theorem 1.1 in higher dimensions: completion of the proof
Now we are ready to prove Theorem 1.1. Replacing with we can assume that . By Proposition 2.2, it suffices to bound the concentration function as follows:
where is an absolute constant. Translating if necessary, we can reduce the problem to showing that
We will prove this by induction on . Choose a sufficiently large constant, depending on the value of the constants denoted by in Theorem 4.1, Proposition 2.2, Proposition 5.1, and the constant denoted by in Proposition 6.4.
For the base of the induction, inequality (7.1) for follows from Theorem 4.1.
If for all , then (7.1) follows from Proposition 5.1 together with Proposition 2.2. Alternatively, if there exists such that , we can apply Proposition 6.4 with . Moreover, since the rank of is , the assumption (6.2) is also satisfied due to the induction hypothesis (7.2) and the choice of . Then an application of Proposition 6.4 yields (7.1). This completes the proof of Theorem 1.1. ∎
Theorem 1.5. Concentration of anisotropic distributions.
In this section we prove a more precise version of Theorem 1.5 for random vectors of the form , where is a fixed matrix.
The singular values of arranged in a non-increasing order are denoted by , . To simplify the notation, we set for , and we will do the same for singular vectors of .
The definition of the stable rank of from (1.2) reads as
To emphasize the difference between the essential rank and the rank, we set
Consider a random vector where are real-valued independent random variables. Let be such that
Then for every matrix and for every we have
provided . Moreover, if , then
where .
For orthogonal projections we have , , so the second part of Theorem 8.1 recovers Corollary 1.4.
Note the instead of in the definition of . This offers some stability of the concentration function. Indeed, a small perturbation of a -dimensional orthogonal projection will not change the exponent in the small ball probability.
We will first prove an inequality that is more general than (8.2). Denote
Before we prove (8.4), let us make some helpful reductions. First, by replacing with we can assume that . We can also assume that the vector appearing the definition (1.1) of the concentration function equals zero; this is obvious by first projecting onto the image of and then appropriately translating . With these reductions, the claim (8.4) becomes
Let be the singular value decomposition of . For , consider the spectral projections defined as
Note that and for
We shall bound below and above and then compare the two estimates. First, using the monotonicity of the singular values, we have
Comparing these two estimates term by term, we obtain
Applying Corollary 1.4 (see Remark 2.3) and noting that , we find that
where is an absolute constant. We will shortly conclude that (8.5) holds with . Without loss of generality we can assume that , so . Thus upon substituting (8.7) into (8.6) we obtain a convergent series whose sum is bounded by . This proves (8.5).
We now turn to proving (8.3). By replacing with and with we can assume that and , and reduce our task to showing that
Assume the contrary. By definition of and , we have
On the other hand, by our assumption and monotonicity, are bounded by for all , and by for . Thus
Since the expression in the right hand side increases in and , we can further bound the sum in (8.11) by . But this is smaller than , the lower bound for the same sum in (8.10). This contradiction establishes our claim (8.9).
Similarly to the first part of the proof, we shall bound below and above and then compare the two estimates. For the lower bound, we consider the spectral projection
Applying Corollary 1.4 (see Remark 2.3) and recalling that , we conclude that
It remains to note that by definition. This establishes (8.8) whenever . The case when is trivial. Theorem 8.1 is proved. ∎
Theorem 8.1 implies the following more precise version of Theorem 1.5.
Consider a random vector where are real-valued independent random variables. Let be such that
Then for every matrix , every and every we have
Here and is an absolute constant.
The result follows from Theorem 8.1 where we apply estimate (8.2) whenever and (8.3) otherwise. ∎
In the case when the densities of are uniformly bounded by , Corollary 8.5 yields
Applying this estimate with and , we derive a bound on the Levy concentration function, which is lossless in terms of power of the small ball radius. Such bound may be useful for estimating the negative moments of the norm of .
Let be a vector with independent random coordinates. Assume that the densities on are bounded by . Let be an matrix. Then for any ,