Small ball probabilities for linear images of high dimensional distributions

Mark Rudelson, Roman Vershynin

Introduction

Concentration properties of high dimensional distributions have been extensively studied in probability theory. In this paper we are interested in small ball probabilities, which describe the spread of a distribution in space. Small ball probabilities have been extensively studied for stochastic processes (see ), sums of independent random variables (see ) and log-concave measures (see [1, Chapter 5]). Nevertheless, there remain surprisingly basic questions that have not been previously addressed.

If the distributions of XiX_{i} are well spread on the line, then the distribution of AXAX is well spread in space.

Special cases of interest are marginals of XX which arise when AA is an orthogonal projection, and sums of independent random variables which correspond to m=1m=1. The problem of describing small ball probabilities even in these two special cases is nontrivial and useful for applications. In particular, a recent interest in this problem was spurred by applications in random matrix theory; see for sums of random variables and for higher dimensional marginals.

This discussion is non-trivial even for continuous distributions, and we shall start from this special case.

Our first main result is about distributions with independent continuous coordinates. It states that any dd-dimensional marginal of such distribution has density bounded by O(1)dO(1)^{d}, as long as the densities of the coordinates are bounded by O(1)O(1).

Here and throughout the paper, C,C1,c,c1,C,C_{1},c,c_{1},\ldots denote positive absolute constants.

Theorem 1.1 is trivial in dimension d=nd=n, since the product density is bounded by KnK^{n}. A remarkable non-trivial case of Theorem 1.1 is in dimension d=1d=1, where it holds with optimal constant C=2C=\sqrt{2}. This partial case is worth to be stated separately.

Let X1,,XnX_{1},\ldots,X_{n} be real-valued independent random variables whose densities are bounded by KK almost everywhere. Let a1,,ana_{1},\ldots,a_{n} be real numbers with j=1naj2=1\sum_{j=1}^{n}a_{j}^{2}=1. Then the density of j=1najXj\sum_{j=1}^{n}a_{j}X_{j} is bounded by 2K\sqrt{2}\,K almost everywhere.

Moreover, this estimate is optimal. The density 2K\sqrt{2}\,K is achieved by the sum 12(X1+X2)\frac{1}{\sqrt{2}}(X_{1}+X_{2}) where X1,X2X_{1},X_{2} are uniformly distributed in [12K,12K][-\frac{1}{2K},\frac{1}{2K}].

Theorem 1.2 follows immediately from a combination of two known results, a theorem of B. Rogozin stated below and a theorem of K. Ball .

Let Z1,,ZnZ_{1},\ldots,Z_{n} be real-valued independent random variables whose densities are bounded by d1,,dnd_{1},\ldots,d_{n} respectively. Then the density of j=1nZj\sum_{j=1}^{n}Z_{j} is uniformly bounded by the value of the density of j=1nUj\sum_{j=1}^{n}U_{j} at the origin, where UjU_{j} are independent random variables uniformly distributed in [12dj,12dj][-\frac{1}{2d_{j}},\frac{1}{2d_{j}}].

For simplicity, by rescaling we can assume that K=1K=1. Rogozin’s Theorem 1.3 implies that the density of j=1najXj\sum_{j=1}^{n}a_{j}X_{j} is uniformly bounded by the density of j=1najYj\sum_{j=1}^{n}a_{j}Y_{j} at the origin, where YjY_{j} are independent random variables uniformly distributed in [12,12][-\frac{1}{2},\frac{1}{2}].

The random vector Y=(Y1,,Yn)Y=(Y_{1},\ldots,Y_{n}) is uniformly distributed in the cube [12,12]n[-\frac{1}{2},\frac{1}{2}]^{n}. Thus the density of j=1najYj\sum_{j=1}^{n}a_{j}Y_{j} at the origin is the volume of the section of the cube by the hyperplane that contains the origin and is orthogonal to the vector a=(a1,,an)a=(a_{1},\ldots,a_{n}). Now, a theorem of Ball states that the maximal volume of such section equals 2\sqrt{2}. Therefore, the density of j=1najXj\sum_{j=1}^{n}a_{j}X_{j} is bounded by 2\sqrt{2}. ∎

2. General distributions

Thus the concentration function controls the small ball probabilities of the distribution of ZZ. The study of concentration functions of sums of independent random variables originates from the works of Lévy , Kolmogorov , Rogozin , Esseen and Halasz . Recent developments in this area highlighted connections with Littlewood-Offord problem and applications to random matrix theory, see .

Consider a random vector X=(X1,,Xn)X=(X_{1},\ldots,X_{n}) where XiX_{i} are real-valued independent random variables. Let t,p0t,p\geq 0 be such that

This result can be regarded as a tensorization property of the concentration function. It will be deduced from Theorem 1.1 in Section 2.

3. Anisotropic distributions

Finally, we study concentration of anisotropic high-dimensional distributions, which take the form AXAX for a fixed matrix AA. The key exponent that controls the behavior of the concentration function of AXAX is the stable rank of AA. We define it as

Consider a random vector X=(X1,,Xn)X=(X_{1},\ldots,X_{n}) where XiX_{i} are real-valued independent random variables. Let t,p0t,p\geq 0 be such that

Let AA be an m×nm\times n matrix and ε(0,1)\varepsilon\in(0,1). Then

where CεC_{\varepsilon} depends only on ε\varepsilon.

A more precise version of this result is Theorem 8.1 and Corollary 8.5 below. It will be deduced from Corollary 1.4 by replacing AA by a dyadic sum of spectral projections.

In the particular case of continuous distributions, Theorem 1.5 states the following. Suppose the densities of XiX_{i} are bounded by KK. Then obviously L(Xi,t)Kt\mathcal{L}(X_{i},t)\leq Kt for any t0t\geq 0, so Theorem 1.5 yields

Here CC is an absolute constant and c(0,1)c\in(0,1) depends only on the bound on the sub-gaussian norms. The distributions for which Paouris’ inequality (1.4) applies are not required to have independent coordinates. On the other hand, the log-concavity assumption for (1.4) is much stronger than a uniform bound on the coordinate densities in (1.3).

It is worthwhile to state here a related large deviation bound for AXAX from . If XiX_{i} are independent, uniformly sub-gaussian, and have zero means and unit variances, then

Here c>0c>0 depends only on the bound on the sub-gaussian norms of XiX_{i}.

4. The method

Let us outline the proof of the key Theorem 1.1, which implies all other results in this paper.

A natural strategy would be to extend to higher dimensions the simple one-dimensional argument leading to Theorem 1.2, which was a combination of Ball’s and Rogozin’s theorems. A higher-dimensional version of Ball’s theorem is indeed available ; it states that the maximal volume of a section of the cube by a subspace of codimension dd is (2)d(\sqrt{2})^{d}. However, we are unaware of any higher-dimensional versions of Rogozin’s theorem .

An alternative approach to the special case of Theorem 1.1 in dimension d=1d=1 (and, as a consequence, to Corollary 1.4 in dimension d=1d=1) was developed in an unpublished manuscript of Ball and Nazarov . Although it does not achieve the optimal constant 2\sqrt{2} that appears in Theorem 1.2, this approach avoids the delicate combinatorial arguments that appear in the proof of Rogozin’s theorem. The method of Ball and Nazarov is Fourier-theoretic; its crucial steps go back to Halasz and Ball .

In this paper, we prove Theorem 1.1 by generalizing the method of Ball and Nazarov to higher dimensions using Brascamp-Lieb inequality. For educational purposes, we will start by presenting a version of Ball-Nazarov’s argument in dimension d=1d=1 in Sections 3 and 4. The higher-dimensional argument will be presented in Sections 5–7.

There turned out to be an unexpected difference between dimension d=1d=1 and higher dimensions, which presents us with an an extra challenge. The one-dimensional method works well under assumption that all coefficients aia_{i} are small, e.g. ai1/2|a_{i}|\leq 1/2. The opposite case where there is a large coefficient ai0a_{i_{0}}, is trivial; it can be treated by conditioning on all XiX_{i} except Xi0X_{i_{0}}.

In higher dimensions, this latter case is no longer trivial. It corresponds to the situation where some Pei02\|Pe_{i_{0}}\|_{2} is large (here eie_{i} denote the coordinate basis vectors). The power of one random variable Xi0X_{i_{0}} is not enough to yield Theorem 1.1; such argument would lead to a weaker bound (CKd)d(CK\sqrt{d})^{d} instead of (CK)d(CK)^{d}.

In Section 6 we develop an alternative way to remove the terms with large Pei2\|Pe_{i}\|_{2} from the sum. It is based on a careful tensorization argument for small ball probabilities.

Deduction of Corollary 1.4 from Theorem 1.1

We begin by recording a couple of elementary properties of concentration functions.

The following elementary observation connects densities and concentration functions.

The density of ZZ is bounded by KdK^{d} almost everywhere;

The concentration function of ZZ satisfies

Here KK and MM depend only on each other. In the implication (i) \Rightarrow (ii), we have MCKM\leq CK where CC is an absolute constant. In the implication (ii) \Rightarrow (i), we have KMK\leq M.

This proposition follows from the known bound tdB(td)(Ct)dt^{d}\leq|B(t\sqrt{d})|\leq(Ct)^{d} (see e.g. formula (1.18) in ). ∎

Now we are ready to deduce Corollary 1.4 from Theorem 1.1. The proof is a higher-dimensional version of the smoothing argument of Ball and Nazarov .

Note that X=X+YX^{\prime}=X+Y has independent coordinates whose densities can be computed as follows:

Applying Theorem 1.1, we find that the density of PXPX^{\prime} is bounded by (CL)d(CL)^{d}, where L=maxiL(Xi,1/2)L=\max_{i}\mathcal{L}(X_{i},1/2). Then Proposition 2.2 yields that

Substituting this into (2.2), we complete the proof. ∎

Using regularity of concentration function described in Proposition 2.1, one can state the conclusion of Corollary 1.4 in a more flexible way:

Decay of characteristic functions

We will now begin preparing our way for the proof of Theorem 1.1. Our argument will use the following tail bound for the characteristic function

of a random variable XX with bounded density. The estimate and its proof below are essentially due to Ball and Nazarov . The reader can refer to [12, Chapter 3] for the definition and basic properties of the decreasing rearrangements of functions.

Let XX be a random variable whose density is bounded by KK. Then the non-increasing rearrangement of the characteristic function of XX satisfies

The estimate for large tt will follow from Plancherel’s identity. The estimate for small tt will be based on a regularity argument going back to Halasz .

1. Plancherel. By replacing XX with KXKX we can assume that K=1K=1. Let fX()f_{X}(\cdot) denote the density of XX. Thus ϕX(t)=fX(x)eitxdx=fX^(t/2π)\phi_{X}(t)=\int_{-\infty}^{\infty}f_{X}(x)e^{itx}\,dx=\widehat{f_{X}}(-t/2\pi), according to the standard definition of the Fourier transform

By Plancherel’s identity and using that pXL1=1\|p_{X}\|_{L^{1}}=1 and pXLK=1\|p_{X}\|_{L^{\infty}}\leq K=1, we obtain

This proves the second part of the claimed estimate. It remains to prove the first part.

2. Symmetrization. Let XX^{\prime} denote an independent copy of XX. Then

We are going to prove a bound of the form

Substituting t=Cst=Cs, we would obtain the desired estimate

which would conclude the proof (provided CC is chosen large enough so that C/22πC/2\geq 2\pi).

3. Regularity. First we observe that (3.3) holds for some fixed constant value of ss. This follows from the identity ϕX(τ)2=1ψ(τ)|\phi_{X}(\tau)|^{2}=1-\psi(\tau) and inequality (3.2):

Theorem 1.1 in dimension one. Densities of sums.

Now we are going to give a “soft” proof of a version of Theorem 1.2 due to Ball and Nazarov . Their argument establishes Theorem 1.1 in dimension d=1d=1. Let us state this result separately.

Let X1,,XnX_{1},\ldots,X_{n} be real-valued independent random variables whose densities are bounded by KK almost everywhere. Let a1,,ana_{1},\ldots,a_{n} be real numbers with j=1naj2=1\sum_{j=1}^{n}a_{j}^{2}=1. Then the density of j=1najXj\sum_{j=1}^{n}a_{j}X_{j} is bounded by CKCK almost everywhere.

By replacing XjX_{j} with KXjKX_{j} we can assume that K=1K=1. By replacing XjX_{j} with Xj-X_{j} when necessary we can assume that all aj0a_{j}\geq 0. We can further assume that aj>0a_{j}>0 by dropping all zero terms from the sum. If there exists j0j_{0} with aj0>1/2a_{j_{0}}>1/2, then the conclusion follows by conditioning on all XjX_{j} except Xj0X_{j_{0}}. Thus we can assume that

Finally, by translating XjX_{j} if necessary we reduce the problem to bounding the density of S=jajXjS=\sum_{j}a_{j}X_{j} at the origin.

We may assume that ϕXjL1\phi_{X_{j}}\in L_{1} by adding to XjX_{j} an independent normal random variable with an arbitrarily small variance. Fourier inversion formula associated with the Fourier transform (3.1) yields that the density of SS at the origin (defined using (2.1)) can be reconstructed from its Fourier transform as

By independence, we have ϕS(x)=jϕXj(ajt)\phi_{S}(x)=\prod_{j}\phi_{X_{j}}(a_{j}t), so

We use the generalized Hölder’s inequality with exponents 1/ai21/a_{i}^{2} whose reciprocals sum to 11 by assumption. It yields

The value of the integrals will not change if we replace the functions ϕXj|\phi_{X_{j}}| by their non-increasing rearrangements ϕXj|\phi_{X_{j}}|^{*}. After change of variable, we obtain

Bounding 1cx21-cx^{2} by ecx2e^{-cx^{2}}, we see that the first integral (over [0,2π][0,2\pi]) is bounded by CajCa_{j}. The second integral (over [2π,)[2\pi,\infty)) is bounded by

where we used that aj1/2a_{j}\leq 1/2. Therefore

provided that constant CC is chosen large enough. Hence

Substituting this into (4.1) completes the proof. ∎

Our proof of Theorem 1.1 will go differently depending on whether all vectors PejPe_{j} are small or some PejPe_{j} are large. In the first case, we proceed with a high-dimensional version of the argument from Section 4, where Hölder’s inequality will be replaced by Brascamp-Lieb’s inequality. In the second case, we will remove the large vectors PejPe_{j} one by one, using a new precise tensorization property of concentration functions.

In this section, we treat the case where all vectors PejPe_{j} are small. Theorem 1.1 can be formulated in this case as follows.

Let XX be a random vector and PP be a projection which satisfy the assumptions of Theorem 1.1. Assume that

Then the density of the random vector PXPX is bounded by (CK)d(CK)^{d} almost everywhere.

The proof will be based on Brascamp-Lieb’s inequality.

The singular value decomposition of PP yields the existence of a d×nd\times n matrix RR satisfying

As in the proof of Theorem 4.1, Fourier inversion formula associated with the Fourier transform in nn dimensions yields that the density of RXRX at the origin (defined using (2.1)) can be reconstructed from its Fourier transform as

is the characteristic function of RXRX. Therefore, to complete the proof, it suffices to bound the integral in the right hand side of (5.1) by CdC^{d}.

In order to represent ϕRX(x)\phi_{RX}(x) more conveniently for application of Brascamp-Lieb inequality, we denote

Then R=j=1najujejTR=\sum_{j=1}^{n}a_{j}u_{j}e_{j}^{\mathsf{T}}, so the identity RRT=IdRR^{\mathsf{T}}=I_{d} can be written as

Moreover, we have x,RX=i=1najx,ujXj\langle x,RX\rangle=\sum_{i=1}^{n}a_{j}\langle x,u_{j}\rangle X_{j}. Substituting this into (5.2) and using independence, we obtain

Recalling (5.3), we apply Brascamp-Lieb inequality for these functions and obtain

We arrived at the same quantity as we encountered in one-dimensional argument in (4.2). Following that argument, which uses the assumption that all aj1/2a_{j}\leq 1/2, we bound the product the quantity above by

Recalling that aj=Rej2a_{j}=\|Re_{j}\|_{2} and , we find that j=1naj2=j=1nRej22=tr(RRT)=tr(Id)=d\sum_{j=1}^{n}a_{j}^{2}=\sum_{j=1}^{n}\|Re_{j}\|_{2}^{2}=\operatorname*{tr}(RR^{\mathsf{T}})=\operatorname*{tr}(I_{d})=d. Thus the right hand side of (5) is bounded by (2C)d(2C)^{d}. The proof of Proposition 5.1 is complete. ∎

Next we turn to the case where not all vectors PeiPe_{i} are small. In this case, we will remove the large vectors PeiPe_{i} one by one. The non-trivial task is how not to lose power at each step. This will be achieved with the help of the following precise tensorization property of small ball probabilities.

Let Z1,Z20Z_{1},Z_{2}\geq 0 be random variables and M1,M2,p0M_{1},M_{2},p\geq 0 be real numbers. Assume that

This lemma will be used later in an inductive argument. To make the inductive step, two features will be critical: (a) the term of order p\sqrt{p} in the denominator of the probability estimate; (b) the possibility of choosing different values for the parameters M1M_{1} and M2M_{2}.

Denoting s=t2s=t^{2}, we compute the probability by iterative integration in (Z12,Z22)(Z_{1}^{2},Z_{2}^{2}) plane:

where the last equation follows by integration by parts. Hypothesis (ii) of the lemma says that F2(x)M2xp/2F_{2}(x)\leq M_{2}\,x^{p/2}, so the expression above is bounded by

where the last equation follows by substitution x=sux=su.

The integral in the right hand side is the value of beta function

Bounding this value is standard. One can use the fact that

which follows from the identity B(x,y)=Γ(x)Γ(y)/Γ(x+y)B(x,y)=\Gamma(x)\Gamma(y)/\Gamma(x+y) and Stirling’s approximation. (Here f(x)g(x)f(x)\sim g(x) means that f(x)/g(x)1f(x)/g(x)\to 1.) It follows that

Therefore the ratio of the left and right hand sides is bounded in pp. Hence there exists an absolute constant CC such that

Substituting s=t2s=t^{2} finishes the proof. ∎

Let Z1,Z20Z_{1},Z_{2}\geq 0 be random variables and K1,K20K_{1},K_{2}\geq 0, d>1d>1 be real numbers. Assume that

provided that K1cK2K_{1}\leq cK_{2} with a suitably small absolute constant cc.

Random variables Z1Z_{1}, Z2Z_{2} satisfy the assumptions of Lemma 6.1 with

Using the hypothesis that K1cK2K_{1}\leq cK_{2}, we bound the right hand side by

If we choose c=1/(3C)c=1/(3C) then the right hand side gets bounded by (K2t)d(K_{2}t)^{d}, as claimed. ∎

Let MC0M\geq C_{0} where C0C_{0} is an absolute constant. If

Without loss of generality, we can assume that i=1i=1. Let us record a few basic properties of QQ. A straightforward check shows that

It follows that (PQ)e1=Pe1(P-Q)e_{1}=Pe_{1}, since the orthogonal projection of e1e_{1} onto span(Pe1)\operatorname*{span}(Pe_{1}) equals Pe1Pe_{1}. Canceling Pe1Pe_{1} on both sides, we have

It follows from (6.3) that PP has the form

where aja_{j} are fixed numbers (independent of xx). Substituting x=e1x=e_{1}, we obtain using (6.4) that Pe1=a1Pe1+Qe1=a1Pe1Pe_{1}=a_{1}Pe_{1}+Qe_{1}=a_{1}Pe_{1}. Thus

since Qx=Q(i=1nxjej)=i=1nxjQejQx=Q(\sum_{i=1}^{n}x_{j}e_{j})=\sum_{i=1}^{n}x_{j}Qe_{j} and Qe1=0Qe_{1}=0 by (6.4). Finally, since Pe1Pe_{1} is orthogonal to the image of QQ, the two vectors in the right side of (6.5) are orthogonal. Thus

Now let us estimate PX2\|PX\|_{2} for a random vector XX. We express PX22\|PX\|_{2}^{2} using (6.8) and (6.6) as

and try to apply Corollary 6.3. Let first us check the hypotheses of that corollary. Since by (6.7) Z2Z_{2} is determined by X2,,XnX_{2},\ldots,X_{n} (and is independent of X1X_{1}), and Pei2ν\|Pe_{i}\|_{2}\geq\nu by a hypothesis of the lemma, we have

The last inequality follows since the density of X1X_{1} is bounded by KK. This verifies hypothesis (i) of Corollary 6.3 with K1=K/νK_{1}=K/\nu. Hypothesis (ii) follows immediately from (6.2), with K2=MK/νK_{2}=MK/\nu. If M1/c=:C0M\geq 1/c=:C_{0} then K1cK2K_{1}\leq cK_{2} as required in Corollary 6.3. It yields

Theorem 1.1 in higher dimensions: completion of the proof

Now we are ready to prove Theorem 1.1. Replacing XjX_{j} with KXjKX_{j} we can assume that K=1K=1. By Proposition 2.2, it suffices to bound the concentration function as follows:

where C1C_{1} is an absolute constant. Translating XX if necessary, we can reduce the problem to showing that

We will prove this by induction on dd. Choose C1C_{1} a sufficiently large constant, depending on the value of the constants denoted by CC in Theorem 4.1, Proposition 2.2, Proposition 5.1, and the constant denoted by C0C_{0} in Proposition 6.4.

For the base of the induction, inequality (7.1) for d=1d=1 follows from Theorem 4.1.

If Pei2<1/2\|Pe_{i}\|_{2}<1/2 for all i[n]i\in[n], then (7.1) follows from Proposition 5.1 together with Proposition 2.2. Alternatively, if there exists i[n]i\in[n] such that Pei21/2\|Pe_{i}\|_{2}\geq 1/2, we can apply Proposition 6.4 with M=C1/2M=C_{1}/2. Moreover, since the rank of QQ is d1d-1, the assumption (6.2) is also satisfied due to the induction hypothesis (7.2) and the choice of MM. Then an application of Proposition 6.4 yields (7.1). This completes the proof of Theorem 1.1. ∎

Theorem 1.5. Concentration of anisotropic distributions.

In this section we prove a more precise version of Theorem 1.5 for random vectors of the form AXAX, where AA is a fixed m×nm\times n matrix.

The singular values of AA arranged in a non-increasing order are denoted by sj(A)s_{j}(A), j=1,,mnj=1,\ldots,m\wedge n. To simplify the notation, we set sj(A)=0s_{j}(A)=0 for j>mnj>m\wedge n, and we will do the same for singular vectors of AA.

The definition of the stable rank of AA from (1.2) reads as

To emphasize the difference between the essential rank and the rank, we set

Consider a random vector X=(X1,,Xn)X=(X_{1},\ldots,X_{n}) where XiX_{i} are real-valued independent random variables. Let t,p0t,p\geq 0 be such that

Then for every m×nm\times n matrix AA and for every M1M\geq 1 we have

provided δ(A)>0\delta(A)>0. Moreover, if δ(A)<0.4\delta(A)<0.4, then

where r0(A)=(12δ(A))r(A)r_{0}(A)=\lceil(1-2\delta(A))r(A)\rceil.

For orthogonal projections we have δ(A)=0\delta(A)=0, r0(A)=r(A)=rank(A)r_{0}(A)=r(A)=\operatorname*{rank}(A), so the second part of Theorem 8.1 recovers Corollary 1.4.

Note the \lceil\cdot\rceil instead of \lfloor\cdot\rfloor in the definition of r0(A)r_{0}(A). This offers some stability of the concentration function. Indeed, a small perturbation of a dd-dimensional orthogonal projection will not change the exponent r0(A)=r(A)=dr_{0}(A)=r(A)=d in the small ball probability.

We will first prove an inequality that is more general than (8.2). Denote

Before we prove (8.4), let us make some helpful reductions. First, by replacing XX with X/tX/t we can assume that t=1t=1. We can also assume that the vector uu appearing the definition (1.1) of the concentration function L(AX,MtSr)\mathcal{L}(AX,MtS_{r}) equals zero; this is obvious by first projecting uu onto the image of AA and then appropriately translating XX. With these reductions, the claim (8.4) becomes

Let A=j=1sj(A)ujvjTA=\sum_{j=1}^{\infty}s_{j}(A)u_{j}v_{j}^{\mathsf{T}} be the singular value decomposition of AA. For l=0,1,2,l=0,1,2,\ldots, consider the spectral projections PlP_{l} defined as

Note that rank(P0)=r\text{rank}(P_{0})=r and rank(Pl)=2l1r\text{rank}(P_{l})=2^{l-1}r for l=1,2,l=1,2,\ldots

We shall bound AX2\|AX\|_{2} below and Sr(A)S_{r}(A) above and then compare the two estimates. First, using the monotonicity of the singular values, we have

Comparing these two estimates term by term, we obtain

Applying Corollary 1.4 (see Remark 2.3) and noting that 2lr2rank(Pl)2^{l}r\leq 2\operatorname*{rank}(P_{l}), we find that

where C0C_{0} is an absolute constant. We will shortly conclude that (8.5) holds with C=10C0C=10C_{0}. Without loss of generality we can assume that CMp1CMp\leq 1, so C0Mp1/10C_{0}Mp\leq 1/10. Thus upon substituting (8.7) into (8.6) we obtain a convergent series whose sum is bounded by (10C0Mp)r(10C_{0}Mp)^{r}. This proves (8.5).

We now turn to proving (8.3). By replacing XX with X/tX/t and AA with A/AA/\|A\| we can assume that t=1t=1 and A=1\|A\|=1, and reduce our task to showing that

Assume the contrary. By definition of δ\delta and rr, we have

On the other hand, by our assumption and monotonicity, sj(A)s_{j}(A) are bounded by A=1\|A\|=1 for all jj, and by 1/21/2 for jk+1j\geq k+1. Thus

Since the expression in the right hand side increases in kk and k(12δ)rk\leq(1-2\delta)r, we can further bound the sum in (8.11) by (12δ)r+2δr14(1-2\delta)r+2\delta r\cdot\frac{1}{4}. But this is smaller than (1δ)r(1-\delta)r, the lower bound for the same sum in (8.10). This contradiction establishes our claim (8.9).

Similarly to the first part of the proof, we shall bound AX2\|AX\|_{2} below and Sr(A)S_{r}(A) above and then compare the two estimates. For the lower bound, we consider the spectral projection

Applying Corollary 1.4 (see Remark 2.3) and recalling that rank(Qk)=k+1\operatorname*{rank}(Q_{k})=k+1, we conclude that

It remains to note that k+1r0(A)k+1\geq r_{0}(A) by definition. This establishes (8.8) whenever CMp1CMp\leq 1. The case when CMp>1CMp>1 is trivial. Theorem 8.1 is proved. ∎

Theorem 8.1 implies the following more precise version of Theorem 1.5.

Consider a random vector X=(X1,,Xn)X=(X_{1},\ldots,X_{n}) where XiX_{i} are real-valued independent random variables. Let t,p0t,p\geq 0 be such that

Then for every m×nm\times n matrix AA, every M1M\geq 1 and every ε(0,1)\varepsilon\in(0,1) we have

Here Cε=C/εC_{\varepsilon}=C/\sqrt{\varepsilon} and CC is an absolute constant.

The result follows from Theorem 8.1 where we apply estimate (8.2) whenever δ(A)ε/2\delta(A)\geq\varepsilon/2 and (8.3) otherwise. ∎

In the case when the densities of XiX_{i} are uniformly bounded by K>0K>0, Corollary 8.5 yields

Applying this estimate with ε=12r(A)\varepsilon=\frac{1}{2r(A)} and τ=tr(A)\tau=t\sqrt{r(A)}, we derive a bound on the Levy concentration function, which is lossless in terms of power of the small ball radius. Such bound may be useful for estimating the negative moments of the norm of AXAX.

Let X=(X1,,Xn)X=(X_{1},\ldots,X_{n}) be a vector with independent random coordinates. Assume that the densities on X1,,XnX_{1},\ldots,X_{n} are bounded by KK. Let AA be an m×nm\times n matrix. Then for any τ>0\tau>0,

References