Small ball probabilities for linear images of high dimensional distributions

Mark Rudelson, Roman Vershynin

Introduction

Concentration properties of high dimensional distributions have been extensively studied in probability theory. In this paper we are interested in small ball probabilities, which describe the spread of a distribution in space. Small ball probabilities have been extensively studied for stochastic processes (see ), sums of independent random variables (see ) and log-concave measures (see [1, Chapter 5]). Nevertheless, there remain surprisingly basic questions that have not been previously addressed.

If the distributions of $X_{i}$ are well spread on the line, then the distribution of $AX$ is well spread in space.

Special cases of interest are marginals of $X$ which arise when $A$ is an orthogonal projection, and sums of independent random variables which correspond to $m=1$ . The problem of describing small ball probabilities even in these two special cases is nontrivial and useful for applications. In particular, a recent interest in this problem was spurred by applications in random matrix theory; see for sums of random variables and for higher dimensional marginals.

This discussion is non-trivial even for continuous distributions, and we shall start from this special case.

Our first main result is about distributions with independent continuous coordinates. It states that any $d$ -dimensional marginal of such distribution has density bounded by $O(1)^{d}$ , as long as the densities of the coordinates are bounded by $O(1)$ .

Here and throughout the paper, $C,C_{1},c,c_{1},\ldots$ denote positive absolute constants.

Theorem 1.1 is trivial in dimension $d=n$ , since the product density is bounded by $K^{n}$ . A remarkable non-trivial case of Theorem 1.1 is in dimension $d=1$ , where it holds with optimal constant $C=\sqrt{2}$ . This partial case is worth to be stated separately.

Let $X_{1},\ldots,X_{n}$ be real-valued independent random variables whose densities are bounded by $K$ almost everywhere. Let $a_{1},\ldots,a_{n}$ be real numbers with $\sum_{j=1}^{n}a_{j}^{2}=1$ . Then the density of $\sum_{j=1}^{n}a_{j}X_{j}$ is bounded by $\sqrt{2}\,K$ almost everywhere.

Moreover, this estimate is optimal. The density $\sqrt{2}\,K$ is achieved by the sum $\frac{1}{\sqrt{2}}(X_{1}+X_{2})$ where $X_{1},X_{2}$ are uniformly distributed in $[-\frac{1}{2K},\frac{1}{2K}]$ .

Theorem 1.2 follows immediately from a combination of two known results, a theorem of B. Rogozin stated below and a theorem of K. Ball .

Let $Z_{1},\ldots,Z_{n}$ be real-valued independent random variables whose densities are bounded by $d_{1},\ldots,d_{n}$ respectively. Then the density of $\sum_{j=1}^{n}Z_{j}$ is uniformly bounded by the value of the density of $\sum_{j=1}^{n}U_{j}$ at the origin, where $U_{j}$ are independent random variables uniformly distributed in $[-\frac{1}{2d_{j}},\frac{1}{2d_{j}}]$ .

For simplicity, by rescaling we can assume that $K=1$ . Rogozin’s Theorem 1.3 implies that the density of $\sum_{j=1}^{n}a_{j}X_{j}$ is uniformly bounded by the density of $\sum_{j=1}^{n}a_{j}Y_{j}$ at the origin, where $Y_{j}$ are independent random variables uniformly distributed in $[-\frac{1}{2},\frac{1}{2}]$ .

The random vector $Y=(Y_{1},\ldots,Y_{n})$ is uniformly distributed in the cube $[-\frac{1}{2},\frac{1}{2}]^{n}$ . Thus the density of $\sum_{j=1}^{n}a_{j}Y_{j}$ at the origin is the volume of the section of the cube by the hyperplane that contains the origin and is orthogonal to the vector $a=(a_{1},\ldots,a_{n})$ . Now, a theorem of Ball states that the maximal volume of such section equals $\sqrt{2}$ . Therefore, the density of $\sum_{j=1}^{n}a_{j}X_{j}$ is bounded by $\sqrt{2}$ . ∎

2. General distributions

Thus the concentration function controls the small ball probabilities of the distribution of $Z$ . The study of concentration functions of sums of independent random variables originates from the works of Lévy , Kolmogorov , Rogozin , Esseen and Halasz . Recent developments in this area highlighted connections with Littlewood-Offord problem and applications to random matrix theory, see .

Consider a random vector $X=(X_{1},\ldots,X_{n})$ where $X_{i}$ are real-valued independent random variables. Let $t,p\geq 0$ be such that

This result can be regarded as a tensorization property of the concentration function. It will be deduced from Theorem 1.1 in Section 2.

3. Anisotropic distributions

Finally, we study concentration of anisotropic high-dimensional distributions, which take the form $AX$ for a fixed matrix $A$ . The key exponent that controls the behavior of the concentration function of $AX$ is the stable rank of $A$ . We define it as

Consider a random vector $X=(X_{1},\ldots,X_{n})$ where $X_{i}$ are real-valued independent random variables. Let $t,p\geq 0$ be such that

Let $A$ be an $m\times n$ matrix and $\varepsilon\in(0,1)$ . Then

where $C_{\varepsilon}$ depends only on $\varepsilon$ .

A more precise version of this result is Theorem 8.1 and Corollary 8.5 below. It will be deduced from Corollary 1.4 by replacing $A$ by a dyadic sum of spectral projections.

In the particular case of continuous distributions, Theorem 1.5 states the following. Suppose the densities of $X_{i}$ are bounded by $K$ . Then obviously $\mathcal{L}(X_{i},t)\leq Kt$ for any $t\geq 0$ , so Theorem 1.5 yields

Here $C$ is an absolute constant and $c\in(0,1)$ depends only on the bound on the sub-gaussian norms. The distributions for which Paouris’ inequality (1.4) applies are not required to have independent coordinates. On the other hand, the log-concavity assumption for (1.4) is much stronger than a uniform bound on the coordinate densities in (1.3).

It is worthwhile to state here a related large deviation bound for $AX$ from . If $X_{i}$ are independent, uniformly sub-gaussian, and have zero means and unit variances, then

Here $c>0$ depends only on the bound on the sub-gaussian norms of $X_{i}$ .

4. The method

Let us outline the proof of the key Theorem 1.1, which implies all other results in this paper.

A natural strategy would be to extend to higher dimensions the simple one-dimensional argument leading to Theorem 1.2, which was a combination of Ball’s and Rogozin’s theorems. A higher-dimensional version of Ball’s theorem is indeed available ; it states that the maximal volume of a section of the cube by a subspace of codimension $d$ is $(\sqrt{2})^{d}$ . However, we are unaware of any higher-dimensional versions of Rogozin’s theorem .

An alternative approach to the special case of Theorem 1.1 in dimension $d=1$ (and, as a consequence, to Corollary 1.4 in dimension $d=1$ ) was developed in an unpublished manuscript of Ball and Nazarov . Although it does not achieve the optimal constant $\sqrt{2}$ that appears in Theorem 1.2, this approach avoids the delicate combinatorial arguments that appear in the proof of Rogozin’s theorem. The method of Ball and Nazarov is Fourier-theoretic; its crucial steps go back to Halasz and Ball .

In this paper, we prove Theorem 1.1 by generalizing the method of Ball and Nazarov to higher dimensions using Brascamp-Lieb inequality. For educational purposes, we will start by presenting a version of Ball-Nazarov’s argument in dimension $d=1$ in Sections 3 and 4. The higher-dimensional argument will be presented in Sections 5–7.

There turned out to be an unexpected difference between dimension $d=1$ and higher dimensions, which presents us with an an extra challenge. The one-dimensional method works well under assumption that all coefficients $a_{i}$ are small, e.g. $|a_{i}|\leq 1/2$ . The opposite case where there is a large coefficient $a_{i_{0}}$ , is trivial; it can be treated by conditioning on all $X_{i}$ except $X_{i_{0}}$ .

In higher dimensions, this latter case is no longer trivial. It corresponds to the situation where some $\|Pe_{i_{0}}\|_{2}$ is large (here $e_{i}$ denote the coordinate basis vectors). The power of one random variable $X_{i_{0}}$ is not enough to yield Theorem 1.1; such argument would lead to a weaker bound $(CK\sqrt{d})^{d}$ instead of $(CK)^{d}$ .

In Section 6 we develop an alternative way to remove the terms with large $\|Pe_{i}\|_{2}$ from the sum. It is based on a careful tensorization argument for small ball probabilities.

Deduction of Corollary 1.4 from Theorem 1.1

We begin by recording a couple of elementary properties of concentration functions.

The following elementary observation connects densities and concentration functions.

The density of $Z$ is bounded by $K^{d}$ almost everywhere;

The concentration function of $Z$ satisfies

Here $K$ and $M$ depend only on each other. In the implication (i) $\Rightarrow$ (ii), we have $M\leq CK$ where $C$ is an absolute constant. In the implication (ii) $\Rightarrow$ (i), we have $K\leq M$ .

This proposition follows from the known bound $t^{d}\leq|B(t\sqrt{d})|\leq(Ct)^{d}$ (see e.g. formula (1.18) in ). ∎

Now we are ready to deduce Corollary 1.4 from Theorem 1.1. The proof is a higher-dimensional version of the smoothing argument of Ball and Nazarov .

Note that $X^{\prime}=X+Y$ has independent coordinates whose densities can be computed as follows:

Applying Theorem 1.1, we find that the density of $PX^{\prime}$ is bounded by $(CL)^{d}$ , where $L=\max_{i}\mathcal{L}(X_{i},1/2)$ . Then Proposition 2.2 yields that

Substituting this into (2.2), we complete the proof. ∎

Using regularity of concentration function described in Proposition 2.1, one can state the conclusion of Corollary 1.4 in a more flexible way:

Decay of characteristic functions

We will now begin preparing our way for the proof of Theorem 1.1. Our argument will use the following tail bound for the characteristic function

of a random variable $X$ with bounded density. The estimate and its proof below are essentially due to Ball and Nazarov . The reader can refer to [12, Chapter 3] for the definition and basic properties of the decreasing rearrangements of functions.

Let $X$ be a random variable whose density is bounded by $K$ . Then the non-increasing rearrangement of the characteristic function of $X$ satisfies

The estimate for large $t$ will follow from Plancherel’s identity. The estimate for small $t$ will be based on a regularity argument going back to Halasz .

1. Plancherel. By replacing $X$ with $KX$ we can assume that $K=1$ . Let $f_{X}(\cdot)$ denote the density of $X$ . Thus $\phi_{X}(t)=\int_{-\infty}^{\infty}f_{X}(x)e^{itx}\,dx=\widehat{f_{X}}(-t/2\pi)$ , according to the standard definition of the Fourier transform

By Plancherel’s identity and using that $\|p_{X}\|_{L^{1}}=1$ and $\|p_{X}\|_{L^{\infty}}\leq K=1$ , we obtain

This proves the second part of the claimed estimate. It remains to prove the first part.

2. Symmetrization. Let $X^{\prime}$ denote an independent copy of $X$ . Then

We are going to prove a bound of the form

Substituting $t=Cs$ , we would obtain the desired estimate

which would conclude the proof (provided $C$ is chosen large enough so that $C/2\geq 2\pi$ ).

3. Regularity. First we observe that (3.3) holds for some fixed constant value of $s$ . This follows from the identity $|\phi_{X}(\tau)|^{2}=1-\psi(\tau)$ and inequality (3.2):

Theorem 1.1 in dimension one. Densities of sums.

Now we are going to give a “soft” proof of a version of Theorem 1.2 due to Ball and Nazarov . Their argument establishes Theorem 1.1 in dimension $d=1$ . Let us state this result separately.

By replacing $X_{j}$ with $KX_{j}$ we can assume that $K=1$ . By replacing $X_{j}$ with $-X_{j}$ when necessary we can assume that all $a_{j}\geq 0$ . We can further assume that $a_{j}>0$ by dropping all zero terms from the sum. If there exists $j_{0}$ with $a_{j_{0}}>1/2$ , then the conclusion follows by conditioning on all $X_{j}$ except $X_{j_{0}}$ . Thus we can assume that

Finally, by translating $X_{j}$ if necessary we reduce the problem to bounding the density of $S=\sum_{j}a_{j}X_{j}$ at the origin.

We may assume that $\phi_{X_{j}}\in L_{1}$ by adding to $X_{j}$ an independent normal random variable with an arbitrarily small variance. Fourier inversion formula associated with the Fourier transform (3.1) yields that the density of $S$ at the origin (defined using (2.1)) can be reconstructed from its Fourier transform as

By independence, we have $\phi_{S}(x)=\prod_{j}\phi_{X_{j}}(a_{j}t)$ , so

We use the generalized Hölder’s inequality with exponents $1/a_{i}^{2}$ whose reciprocals sum to $1$ by assumption. It yields

The value of the integrals will not change if we replace the functions $|\phi_{X_{j}}|$ by their non-increasing rearrangements $|\phi_{X_{j}}|^{*}$ . After change of variable, we obtain

Bounding $1-cx^{2}$ by $e^{-cx^{2}}$ , we see that the first integral (over $[0,2\pi]$ ) is bounded by $Ca_{j}$ . The second integral (over $[2\pi,\infty)$ ) is bounded by

where we used that $a_{j}\leq 1/2$ . Therefore

provided that constant $C$ is chosen large enough. Hence

Substituting this into (4.1) completes the proof. ∎

Our proof of Theorem 1.1 will go differently depending on whether all vectors $Pe_{j}$ are small or some $Pe_{j}$ are large. In the first case, we proceed with a high-dimensional version of the argument from Section 4, where Hölder’s inequality will be replaced by Brascamp-Lieb’s inequality. In the second case, we will remove the large vectors $Pe_{j}$ one by one, using a new precise tensorization property of concentration functions.

In this section, we treat the case where all vectors $Pe_{j}$ are small. Theorem 1.1 can be formulated in this case as follows.

Let $X$ be a random vector and $P$ be a projection which satisfy the assumptions of Theorem 1.1. Assume that

Then the density of the random vector $PX$ is bounded by $(CK)^{d}$ almost everywhere.

The proof will be based on Brascamp-Lieb’s inequality.

The singular value decomposition of $P$ yields the existence of a $d\times n$ matrix $R$ satisfying

As in the proof of Theorem 4.1, Fourier inversion formula associated with the Fourier transform in $n$ dimensions yields that the density of $RX$ at the origin (defined using (2.1)) can be reconstructed from its Fourier transform as

is the characteristic function of $RX$ . Therefore, to complete the proof, it suffices to bound the integral in the right hand side of (5.1) by $C^{d}$ .

In order to represent $\phi_{RX}(x)$ more conveniently for application of Brascamp-Lieb inequality, we denote

Then $R=\sum_{j=1}^{n}a_{j}u_{j}e_{j}^{\mathsf{T}}$ , so the identity $RR^{\mathsf{T}}=I_{d}$ can be written as

Moreover, we have $\langle x,RX\rangle=\sum_{i=1}^{n}a_{j}\langle x,u_{j}\rangle X_{j}$ . Substituting this into (5.2) and using independence, we obtain

Recalling (5.3), we apply Brascamp-Lieb inequality for these functions and obtain

We arrived at the same quantity as we encountered in one-dimensional argument in (4.2). Following that argument, which uses the assumption that all $a_{j}\leq 1/2$ , we bound the product the quantity above by

Recalling that $a_{j}=\|Re_{j}\|_{2}$ and , we find that $\sum_{j=1}^{n}a_{j}^{2}=\sum_{j=1}^{n}\|Re_{j}\|_{2}^{2}=\operatorname*{tr}(RR^{\mathsf{T}})=\operatorname*{tr}(I_{d})=d$ . Thus the right hand side of (5) is bounded by $(2C)^{d}$ . The proof of Proposition 5.1 is complete. ∎

Next we turn to the case where not all vectors $Pe_{i}$ are small. In this case, we will remove the large vectors $Pe_{i}$ one by one. The non-trivial task is how not to lose power at each step. This will be achieved with the help of the following precise tensorization property of small ball probabilities.

Let $Z_{1},Z_{2}\geq 0$ be random variables and $M_{1},M_{2},p\geq 0$ be real numbers. Assume that

This lemma will be used later in an inductive argument. To make the inductive step, two features will be critical: (a) the term of order $\sqrt{p}$ in the denominator of the probability estimate; (b) the possibility of choosing different values for the parameters $M_{1}$ and $M_{2}$ .

Denoting $s=t^{2}$ , we compute the probability by iterative integration in $(Z_{1}^{2},Z_{2}^{2})$ plane:

where the last equation follows by integration by parts. Hypothesis (ii) of the lemma says that $F_{2}(x)\leq M_{2}\,x^{p/2}$ , so the expression above is bounded by

where the last equation follows by substitution $x=su$ .

The integral in the right hand side is the value of beta function

Bounding this value is standard. One can use the fact that

which follows from the identity $B(x,y)=\Gamma(x)\Gamma(y)/\Gamma(x+y)$ and Stirling’s approximation. (Here $f(x)\sim g(x)$ means that $f(x)/g(x)\to 1$ .) It follows that

Therefore the ratio of the left and right hand sides is bounded in $p$ . Hence there exists an absolute constant $C$ such that

Substituting $s=t^{2}$ finishes the proof. ∎

Let $Z_{1},Z_{2}\geq 0$ be random variables and $K_{1},K_{2}\geq 0$ , $d>1$ be real numbers. Assume that

provided that $K_{1}\leq cK_{2}$ with a suitably small absolute constant $c$ .

Random variables $Z_{1}$ , $Z_{2}$ satisfy the assumptions of Lemma 6.1 with

Using the hypothesis that $K_{1}\leq cK_{2}$ , we bound the right hand side by

If we choose $c=1/(3C)$ then the right hand side gets bounded by $(K_{2}t)^{d}$ , as claimed. ∎

Let $M\geq C_{0}$ where $C_{0}$ is an absolute constant. If

Without loss of generality, we can assume that $i=1$ . Let us record a few basic properties of $Q$ . A straightforward check shows that

It follows that $(P-Q)e_{1}=Pe_{1}$ , since the orthogonal projection of $e_{1}$ onto $\operatorname*{span}(Pe_{1})$ equals $Pe_{1}$ . Canceling $Pe_{1}$ on both sides, we have

It follows from (6.3) that $P$ has the form

where $a_{j}$ are fixed numbers (independent of $x$ ). Substituting $x=e_{1}$ , we obtain using (6.4) that $Pe_{1}=a_{1}Pe_{1}+Qe_{1}=a_{1}Pe_{1}$ . Thus

since $Qx=Q(\sum_{i=1}^{n}x_{j}e_{j})=\sum_{i=1}^{n}x_{j}Qe_{j}$ and $Qe_{1}=0$ by (6.4). Finally, since $Pe_{1}$ is orthogonal to the image of $Q$ , the two vectors in the right side of (6.5) are orthogonal. Thus

Now let us estimate $\|PX\|_{2}$ for a random vector $X$ . We express $\|PX\|_{2}^{2}$ using (6.8) and (6.6) as

and try to apply Corollary 6.3. Let first us check the hypotheses of that corollary. Since by (6.7) $Z_{2}$ is determined by $X_{2},\ldots,X_{n}$ (and is independent of $X_{1}$ ), and $\|Pe_{i}\|_{2}\geq\nu$ by a hypothesis of the lemma, we have

The last inequality follows since the density of $X_{1}$ is bounded by $K$ . This verifies hypothesis (i) of Corollary 6.3 with $K_{1}=K/\nu$ . Hypothesis (ii) follows immediately from (6.2), with $K_{2}=MK/\nu$ . If $M\geq 1/c=:C_{0}$ then $K_{1}\leq cK_{2}$ as required in Corollary 6.3. It yields

Theorem 1.1 in higher dimensions: completion of the proof

Now we are ready to prove Theorem 1.1. Replacing $X_{j}$ with $KX_{j}$ we can assume that $K=1$ . By Proposition 2.2, it suffices to bound the concentration function as follows:

where $C_{1}$ is an absolute constant. Translating $X$ if necessary, we can reduce the problem to showing that

We will prove this by induction on $d$ . Choose $C_{1}$ a sufficiently large constant, depending on the value of the constants denoted by $C$ in Theorem 4.1, Proposition 2.2, Proposition 5.1, and the constant denoted by $C_{0}$ in Proposition 6.4.

For the base of the induction, inequality (7.1) for $d=1$ follows from Theorem 4.1.

If $\|Pe_{i}\|_{2}<1/2$ for all $i\in[n]$ , then (7.1) follows from Proposition 5.1 together with Proposition 2.2. Alternatively, if there exists $i\in[n]$ such that $\|Pe_{i}\|_{2}\geq 1/2$ , we can apply Proposition 6.4 with $M=C_{1}/2$ . Moreover, since the rank of $Q$ is $d-1$ , the assumption (6.2) is also satisfied due to the induction hypothesis (7.2) and the choice of $M$ . Then an application of Proposition 6.4 yields (7.1). This completes the proof of Theorem 1.1. ∎

Theorem 1.5. Concentration of anisotropic distributions.

In this section we prove a more precise version of Theorem 1.5 for random vectors of the form $AX$ , where $A$ is a fixed $m\times n$ matrix.

The singular values of $A$ arranged in a non-increasing order are denoted by $s_{j}(A)$ , $j=1,\ldots,m\wedge n$ . To simplify the notation, we set $s_{j}(A)=0$ for $j>m\wedge n$ , and we will do the same for singular vectors of $A$ .

The definition of the stable rank of $A$ from (1.2) reads as

To emphasize the difference between the essential rank and the rank, we set

Consider a random vector $X=(X_{1},\ldots,X_{n})$ where $X_{i}$ are real-valued independent random variables. Let $t,p\geq 0$ be such that

Then for every $m\times n$ matrix $A$ and for every $M\geq 1$ we have

provided $\delta(A)>0$ . Moreover, if $\delta(A)<0.4$ , then

where $r_{0}(A)=\lceil(1-2\delta(A))r(A)\rceil$ .

For orthogonal projections we have $\delta(A)=0$ , $r_{0}(A)=r(A)=\operatorname*{rank}(A)$ , so the second part of Theorem 8.1 recovers Corollary 1.4.

Note the $\lceil\cdot\rceil$ instead of $\lfloor\cdot\rfloor$ in the definition of $r_{0}(A)$ . This offers some stability of the concentration function. Indeed, a small perturbation of a $d$ -dimensional orthogonal projection will not change the exponent $r_{0}(A)=r(A)=d$ in the small ball probability.

We will first prove an inequality that is more general than (8.2). Denote

Before we prove (8.4), let us make some helpful reductions. First, by replacing $X$ with $X/t$ we can assume that $t=1$ . We can also assume that the vector $u$ appearing the definition (1.1) of the concentration function $\mathcal{L}(AX,MtS_{r})$ equals zero; this is obvious by first projecting $u$ onto the image of $A$ and then appropriately translating $X$ . With these reductions, the claim (8.4) becomes

Let $A=\sum_{j=1}^{\infty}s_{j}(A)u_{j}v_{j}^{\mathsf{T}}$ be the singular value decomposition of $A$ . For $l=0,1,2,\ldots$ , consider the spectral projections $P_{l}$ defined as

Note that $\text{rank}(P_{0})=r$ and $\text{rank}(P_{l})=2^{l-1}r$ for $l=1,2,\ldots$

We shall bound $\|AX\|_{2}$ below and $S_{r}(A)$ above and then compare the two estimates. First, using the monotonicity of the singular values, we have

Comparing these two estimates term by term, we obtain

Applying Corollary 1.4 (see Remark 2.3) and noting that $2^{l}r\leq 2\operatorname*{rank}(P_{l})$ , we find that

where $C_{0}$ is an absolute constant. We will shortly conclude that (8.5) holds with $C=10C_{0}$ . Without loss of generality we can assume that $CMp\leq 1$ , so $C_{0}Mp\leq 1/10$ . Thus upon substituting (8.7) into (8.6) we obtain a convergent series whose sum is bounded by $(10C_{0}Mp)^{r}$ . This proves (8.5).

We now turn to proving (8.3). By replacing $X$ with $X/t$ and $A$ with $A/\|A\|$ we can assume that $t=1$ and $\|A\|=1$ , and reduce our task to showing that

Assume the contrary. By definition of $\delta$ and $r$ , we have

On the other hand, by our assumption and monotonicity, $s_{j}(A)$ are bounded by $\|A\|=1$ for all $j$ , and by $1/2$ for $j\geq k+1$ . Thus

Since the expression in the right hand side increases in $k$ and $k\leq(1-2\delta)r$ , we can further bound the sum in (8.11) by $(1-2\delta)r+2\delta r\cdot\frac{1}{4}$ . But this is smaller than $(1-\delta)r$ , the lower bound for the same sum in (8.10). This contradiction establishes our claim (8.9).

Similarly to the first part of the proof, we shall bound $\|AX\|_{2}$ below and $S_{r}(A)$ above and then compare the two estimates. For the lower bound, we consider the spectral projection

Applying Corollary 1.4 (see Remark 2.3) and recalling that $\operatorname*{rank}(Q_{k})=k+1$ , we conclude that

It remains to note that $k+1\geq r_{0}(A)$ by definition. This establishes (8.8) whenever $CMp\leq 1$ . The case when $CMp>1$ is trivial. Theorem 8.1 is proved. ∎

Theorem 8.1 implies the following more precise version of Theorem 1.5.

Consider a random vector $X=(X_{1},\ldots,X_{n})$ where $X_{i}$ are real-valued independent random variables. Let $t,p\geq 0$ be such that

Then for every $m\times n$ matrix $A$ , every $M\geq 1$ and every $\varepsilon\in(0,1)$ we have

Here $C_{\varepsilon}=C/\sqrt{\varepsilon}$ and $C$ is an absolute constant.

The result follows from Theorem 8.1 where we apply estimate (8.2) whenever $\delta(A)\geq\varepsilon/2$ and (8.3) otherwise. ∎

In the case when the densities of $X_{i}$ are uniformly bounded by $K>0$ , Corollary 8.5 yields

Applying this estimate with $\varepsilon=\frac{1}{2r(A)}$ and $\tau=t\sqrt{r(A)}$ , we derive a bound on the Levy concentration function, which is lossless in terms of power of the small ball radius. Such bound may be useful for estimating the negative moments of the norm of $AX$ .

Let $X=(X_{1},\ldots,X_{n})$ be a vector with independent random coordinates. Assume that the densities on $X_{1},\ldots,X_{n}$ are bounded by $K$ . Let $A$ be an $m\times n$ matrix. Then for any $\tau>0$ ,

Introduction

2. General distributions

3. Anisotropic distributions

4. The method

Deduction of Corollary 1.4 from Theorem 1.1

Decay of characteristic functions

Theorem 1.1 in dimension one. Densities of sums.

Theorem 1.1 in higher dimensions: completion of the proof

Theorem 1.5. Concentration of anisotropic distributions.

References