The smallest singular value of a random rectangular matrix

Mark Rudelson, Roman Vershynin

Introduction

Extreme singular values of random matrices has been of considerable interest in mathematical physics, geometric functional analysis, numerical analysis and other fields. Consider an N×nN\times n real matrix AA with NnN\geq n. The singular values sk(A)s_{k}(A) of AA are the eigenvalues of A=AtA|A|=\sqrt{A^{t}A} arranged in nonincreasing order. Of particular significance are the largest and the smallest singular values

A natural matrix model is given by matrices whose entries are independent real random variables with certain moment assumptions. In this paper, we shall consider subgaussian random variables ξ\xi – those whose tails are dominated by that of the standard normal random variable. Namely, a random variable ξ\xi is called subgaussian if there exists B>0B>0 such that

The minimal BB in this inequality is called the subgaussian moment of ξ\xi. Inequality (1.2) is often equivalently formulated as the moment condition

where CC is an absolute constant. The class of subgaussian random variables includes many random variables that arise naturally in applications, such as normal, symmetric ±1\pm 1 and general bounded random variables.

In this paper, we study N×nN\times n real random matrices AA whose entries are independent and identically distributed mean zero subgaussian random variables. The asymptotic behavior of the extreme singular values of AA is well understood. If the entries have unit variance and the dimension nn grows to infinity while the aspect ratio n/Nn/N converges to a constant λ(0,1)\lambda\in(0,1), then

almost surely. This result was proved in for Gaussian matrices, and in for matrices with independent and identically distributed entries with finite fourth moment. In other words, we have asymptotically

Considerable efforts were made recently to establish non-asymptotic estimates similar to (1.4), which would hold for arbitrary fixed dimensions NN and nn; see the survey on the largest singular value, and the discussion below on the smallest singular value.

The largest singular value is relatively easy to bound above, up to a constant factor. Indeed, a standard covering argument shows that s1(A)s_{1}(A) is at most of the optimal order N\sqrt{N} for all fixed dimensions, see Proposition 2.3 below. The smallest singular value is significantly harder to control. The efforts to prove optimal bounds on sn(A)s_{n}(A) have a long history, which we shall now outline.

2. Tall matrices

A result of provides an optimal bound for tall matrices, those with aspect ratio λ=n/N\lambda=n/N satisfies λ<λ0\lambda<\lambda_{0} for some sufficiently small constant λ0>0\lambda_{0}>0. Recalling (1.4), one should expect that tall matrices satisfy

It was indeed proved in that for tall ±1\pm 1 matrices one has

where λ0>0\lambda_{0}>0 and c>0c>0 are absolute constants.

3. Almost square matrices

As we move toward square matrices, thus making the aspect ratio λ=n/N\lambda=n/N arbitrarily close to 11, the problem of estimating the smallest singular value becomes harder. One still expects (1.5) to be true as long as λ<1\lambda<1 is any constant. Indeed, this was proved in for arbitrary aspect ratios λ<1c/logn\lambda<1-c/\log n and for general random matrices with independent subgaussian entries. One has

where cλ>0c_{\lambda}>0 depends only on λ\lambda and the maximal subgaussian moment of the entries.

In subsequent work , the dependence of cλc_{\lambda} on the aspect ratio in (1.7) was improved for random ±1\pm 1 matrices; however the probability estimate there was weaker than in (1.7). An estimate for subgaussian random matrices of all dimensions was obtained in . For any εCN1/2\varepsilon\geq CN^{-1/2}, it was shown that

However, because of the factor (1λ)(1-\lambda), this estimate is suboptimal and does not correspond to the expected asymptotic behavior (1.4).

4. Square matrices

The extreme case for the problem of estimating the singular value is for the square matrices, where N=nN=n. Asymptotic (1.4) is useless for square matrices. However, for “almost” square matrices, those with constant defect Nn=O(1)N-n=O(1), the quantity Nn\sqrt{N}-\sqrt{n} is of order 1/N1/\sqrt{N}, so asymptotics (1.4) heuristically suggests that these matrices should satisfy

This conjecture was proved recently in for all square subgaussian matrices:

5. New result: bridging all classes of matrices

In this paper, we prove the conjectural bound for sn(A)s_{n}(A) valid for all subgaussian matrices in all fixed dimensions N,nN,n. The bound is optimal for matrices with all aspect ratios we encountered above.

Let AA be an N×nN\times n random matrix, NnN\geq n, whose elements are independent copies of a mean zero subgaussian random variable with unit variance. Then, for every ε>0\varepsilon>0, we have

where C,c>0C,c>0 depend (polynomially) only on the subgaussian moment BB.

For tall matrices, Theorem 1.1 clearly amounts to the known estimates (1.5), (1.6). For square matrices (N=nN=n), the quantity NN1\sqrt{N}-\sqrt{N-1} is of order 1/N1/\sqrt{N}, so Theorem 1.1 amounts to the known estimates (1.8), (1.9). Finally, for matrices that are arbitrarily close to square, Theorem 1.1 yields the new optimal estimate

This is a version of the asymptotics (1.4), now valid for all fixed dimensions. This bound was explicitly conjectured e.g. in .

However, this bound is not optimal, and it becomes useless for matrices that are close to square, when Nn=o(n)N-n=o(\sqrt{n}).

The form of estimate (1.10) may be expected if one recalls the classical ε\varepsilon-net argument, which underlies many proofs in geometric functional analysis. By (1.1), we are looking for a lower bound on Ax\|Ax\| that would hold uniformly for all vectors xx on the unit Euclidean sphere Sn1S^{n-1}. For every fixed xSn1x\in S^{n-1}, the quantity Ax22\|Ax\|_{2}^{2} is the sum of NN independent random variables (the squares of the coordinates of AxAx). Therefore, the deviation inequalities make us to expect that Ax2\|Ax\|_{2} is of the order N\sqrt{N} with probability exponential in NN, i.e. 1ecN1-e^{-cN}. We can run this argument separately for each vector xx in a small net N\mathcal{N} of the sphere Sn1S^{n-1}, and then take the union bound to make the estimate uniform over xNx\in\mathcal{N}. It is known how to choose a net N\mathcal{N} of cardinality exponential in the dimension n1n-1 of the sphere, i.e. NeC(n1)|\mathcal{N}|\leq e^{C(n-1)}. Therefore, with probability 1eC(n1)ecN1-e^{C(n-1)}e^{-cN}, we have a good lower bound on Ax2N\|Ax\|_{2}\sim\sqrt{N} for all vectors xx in the net N\mathcal{N}. Finally, one transfers this estimate from the net to the whole sphere Sn1S^{n-1} by approximation.

The problem with this argument is that the constants CC and cc are not the same. Therefore, our estimate on the probability 1eC(n1)ecN1-e^{C(n-1)}e^{-cN} is positive only for tall matrices, when N(C/c)nN\geq(C/c)n. To reach out to matrices of arbitrary dimensions, one needs to develop much more sensitive versions of the ε\varepsilon-net arguments. Nevertheless, the end result stated in Theorem 1.1 exhibits the same two forces played against one another – the probability quantified by the dimension NN and the complexity of the sphere Sn1S^{n-1} quantified by its dimension n1n-1.

6. Small ball probabilities, distance problems, and additive structure

Our proof of Theorem 1.1 is a development of our method in for square matrices. Dealing with rectangular matrices is in several ways considerably harder. Several new tools are developed in this paper, which may be of independent interest.

To prove (1.12), we first use the small ball probability inequalities to compute the distance to an arbitrary subspace HH. This estimate necessarily depends on the additive structure of the subspace HH; the less structure, the better is our estimate, see Theorem 4.2. We then prove the intuitively plausible fact that random subspaces have no arithmetic structure, see Theorem 4.3. This together leads to the desired distance estimate (1.12).

The distance bound is then used to prove our main result, Theorem 1.1. Let XX be some column of the random matrix AA and HH be the span of the other columns. The simple rank argument shows that the smallest singular value sn(A)=0s_{n}(A)=0 if and only if XHX\in H for some column. A simple quantitative version of this argument is that a lower estimate on sn(A)s_{n}(A) yields a lower bound on dist(X,H){\rm dist}(X,H).

In Section 6, we show how to reverse this argument for random matrices – deduce a lower bound on the smallest singular value sn(A)s_{n}(A) from lower bound (1.12) on the distance dist(X,H){\rm dist}(X,H). Our reverse argument is harder than its version for square matrices from , where we had m=1m=1. First, instead of one column XX we now have to consider all linear combinations of dm/2d\sim m/2 columns; see Lemma 6.2. To obtain a distance bound that would be uniformly good for all such linear combinations, one would normally use an ε\varepsilon-net argument. However, the distance to the (Nm)(N-m)-dimensional subspace HH is not sufficiently stable for this argument to be useful for small mm (for matrices close to square). We therefore develop a decoupling argument in Section 7 to bypass this difficulty.

Once this is done, the proof is quickly completed in Section 8.

Acknowledgement

We are grateful to Shuheng Zhou, Nicole Tomczak-Jaegermann, Radoslaw Adamczak, and the anonymous referee for pointing out several inaccuracies in our argument. The second named author is grateful for his wife Lilia for her love and patience during the years this paper was being written.

Notation and preliminaries

Throughout the paper, positive constants are denoted C,C1,C2,c,c1,c2,C,C_{1},C_{2},c,c_{1},c_{2},\ldots Unless otherwise stated, these are absolute constants. In some of our arguments they may depend (polynomially) on specified parameters, such as the subgaussian moment BB.

The following Lemma is a variant of the well known volumetric estimate.

Let SS be a subset of Sn1S^{n-1}, and let ε>0\varepsilon>0. Then there exists an ε\varepsilon-net of SS of cardinality at most

The published variants of his lemma (e.g. , Lemma 2.6) have exponent nn rather than n1n-1. Since the latter exponent will be crucial for our purposes, we include the proof of this lemma for the reader’s convenience.

Without loss of generality we can assume that ε<2\varepsilon<2, otherwise any single point forms a desired net. Let N\mathcal{N} be an ε\varepsilon-separated subset of SS of maximal cardinality. By maximality, N\mathcal{N} is an ε\varepsilon-net of SS. Since N\mathcal{N} is ε\varepsilon-separated, the balls B(x,ε/2)B(x,\varepsilon/2) with centers xNx\in\mathcal{N} are disjoint. All these balls have the same volume, and they are contained in the spherical shell B(0,1+ε/2)B(0,1ε/2)B(0,1+\varepsilon/2)\setminus B(0,1-\varepsilon/2). Therefore, comparing the volumes, we have

Dividing both sides of this inequality by vol(B(0,1)){\rm vol}(B(0,1)), we obtain

Using the inequality (1+x)n(1x)n2nx(1+x)n1(1+x)^{n}-(1-x)^{n}\leq 2nx(1+x)^{n-1} valid for x(0,1)x\in(0,1), we conclude that N|\mathcal{N}| is bounded as desired. This completes the proof. ∎

The following well known argument allows one to compute the norm of a linear operator using nets. We have not found a published reference to this argument, so we include it for the reader’s convenience.

Every zSn1z\in S^{n-1} has the form z=x+hz=x+h, where xNx\in\mathcal{N} and h2ε\|h\|_{2}\leq\varepsilon. Since A=supzSn1Az2\|A\|=\sup_{z\in S^{n-1}}\|Az\|_{2}, the triangle inequality yields

The last term in the right hand side is bounded by εA\varepsilon\|A\|. Therefore we have shown that

Fix xNx\in\mathcal{N}. Repeating the above argument for Ax2=supySm1Ax,y\|Ax\|_{2}=\sup_{y\in S^{m-1}}|\langle Ax,y\rangle| yields the bound

The two previous estimates complete the proof. ∎

Using nets, one easily proves the well known basic bound O(N)O(\sqrt{N}) on the norm of a random subgaussian matrix:

Let AA be an N×nN\times n random matrix, NnN\geq n, whose elements are independent copies of a subgaussian random variable. Then

where C0,c0>0C_{0},c_{0}>0 depend only on the subgaussian moment BB.

Let N\mathcal{N} be a (1/2)(1/2)-net of SN1S^{N-1} and MM be a (1/2)(1/2)-net of Sn1S^{n-1}. By Proposition 2.1, we can choose these nets such that

For every xNx\in\mathcal{N} and yMy\in\mathcal{M}, the random variable Ax,y\langle Ax,y\rangle is subgaussian (see Fact 2.1 in ), thus

where C1,c1>0C_{1},c_{1}>0 depend only on the subgaussian moment BB. Using Lemma 2.2 and taking the union bound, we obtain

2. Compressible and incompressible vectors

In our proof of Theorem 1.1, we will make use of a partition of the unit sphere Sn1S^{n-1} into two sets of compressible and incompressible vectors. These sets were first defined in as follows.

We now recall without proof two simple results. The first is Lemma 3.4 from :

Let xIncomp(δ,ρ)x\in{\mathit{Incomp}}(\delta,\rho). Then there exists a set σ=σ(x){1,,n}\sigma=\sigma(x)\subseteq\{1,\ldots,n\} of cardinality σ12ρ2δn|\sigma|\geq\frac{1}{2}\rho^{2}\delta n and such that

The other result is a variant of Lemma 3.3 from , which establishes the invertibility on compressible vectors, and allows us to focus on incompressible vectors in our proof of Theorem 1.1. While Lemma 3.3 was formulated in for a square matrix, the same proof applies to N×nN\times n matrices, provided that Nn/2N\geq n/2.

Let AA be an N×nN\times n random matrix, Nn/2N\geq n/2, whose elements are independent copies of a subgaussian random variable. There exist δ,ρ,c3>0\delta,\rho,c_{3}>0 depending only on the subgaussian moment BB such that

Small ball probability and the arithmetic structure

Starting from the works of Lévy , Kolmogorov and Esséen , a number of results in probability theory was concerned with the question how spread the sums of independent random variables are. It is convenient to quantify the spread of a random variable in the following way.

An equivalent way of looking at the Lévy concentration function is that it measures the small ball probabilities – the likelihood that the random vector SS enters a small ball in the space. An exposition of the theory of small ball probabilities can be found in .

One can derive a simple but rather weak bound on Lévy concentration function from Paley-Zygmund inequality.

Let ξ\xi be a random variable with mean zero, unit variance, and finite fourth moment. Then for every ε(0,1)\varepsilon\in(0,1) there exists p(0,1)p\in(0,1) which depends only on ε\varepsilon and on the fourth moment, and such that

In particular, this bound holds for subgaussian random variables, and with pp that depends only on ε\varepsilon and the subgaussian moment.

We use Paley-Zygmund inequality, which states for a random variable ZZ that

so, using Minkowski inequality, we obtain

We will need a much stronger bound on the concentration function for sums of independent random variables. Here we present a multi-dimensional version of the inverse Littlewood-Offord inequality from . While this paper was in preparation, Friedland and Sodin proposed two different ways to simplify and improve our argument in . We shall therefore present here a multi-dimensional version of one of arguments of Friedland and Sodin , which is considerably simpler than our original proof.

In the scalar case, when m=1m=1, the additive structure of a sequence a=(a1,,aN)a=(a_{1},\ldots,a_{N}) of real numbers aka_{k} can be described in terms of the shortest arithmetic progression into which it (essentially) embeds. This length is conveniently expressed as the essential least common denominator of aa, defined as follows. We fix parameters α,γ(0,1)\alpha,\gamma\in(0,1), and define

A more traditional way of looking at θa\theta\cdot a is to regard it as the product of the matrix aa with rows aka_{k} and the vector θ\theta.

Then we define, for α>0\alpha>0 and γ(0,1)\gamma\in(0,1),

The following theorem gives a bound on the small ball probability for a random sum S=k=1NakξkS=\sum_{k=1}^{N}a_{k}\xi_{k} in terms of the additive structure of the coefficient sequence aa. The less structure in aa, the bigger its least common denominator is, and the smaller is the small ball probability for SS.

Let ξ1,,ξN\xi_{1},\ldots,\xi_{N} be independent and identically distributed, mean zero random variables, such that L(ξk,1)1b\mathcal{L}(\xi_{k},1)\leq 1-b for some b>0b>0. Consider the random sum S=k=1NakξkS=\sum_{k=1}^{N}a_{k}\xi_{k}. Then, for every α>0\alpha>0 and γ(0,1)\gamma\in(0,1), and for

Halász developed a powerful approach to bounding concentration function; his approach influenced our arguments below. Halász operated under a similar non-degeneracy condition on the vectors aka_{k}: for every xSm1x\in S^{m-1}, at least cNcN terms satisfy ak,x1|\langle{a_{k}},{x}\rangle|\geq 1. After properly rescaling aka_{k} by the factor c/N\sqrt{c/N}, Halász’s condition is seen to be more restrictive than (3.2).

To estimate the Lévy concentration function we apply the Esséen Lemma, see e.g. , p. 290.

Applying Lemma 3.4 to the vector Y=S/εY=S/\varepsilon and using the independence of random variables ξ1,,ξN\xi_{1},\ldots,\xi_{N}, we obtain

Substituting of this into (3.3) and using Jensen’s inequality, we get

The next and major step is to bound the size of the recurrence set

Recall that by the assumption of the theorem,

Therefore, by the definition of the least common denominator, we have that either

In the latter case, since 2t<α2t<\alpha, inequalities (3.4) and (3.5) together yield

where the last inequality follows from condition (3.2).

Recalling the definition of τ\tau, we have proved that every pair of points θ,θI(t)\theta^{\prime},\theta^{\prime\prime}\in I(t) satisfies:

It follows that I(t)I(t) can be covered by Euclidean balls of radii rr, whose centers are RR-separated in the Euclidean distance. Since I(t)B(0,m)I(t)\subset B(0,\sqrt{m}), the number of such balls is at most

which completes the proof of the lemma. ∎

We decompose the domain into two parts. First, by the definition of I(t)I(t), we have

In the last line, we used the estimate vol(B(0,m)Cm|{\rm vol}(B(0,\sqrt{m})|\leq C^{m}.

Second, by the integral distribution formula and using Lemma 3.5, we have

Combining (3.1) and (3.1) completes the proof of Theorem 3.3. ∎

2. Least common denominator of incompressible vectors

The proof gives c1(δ,ρ)=12ρ2δc_{1}(\delta,\rho)=\frac{1}{2}\rho^{2}\sqrt{\delta} and c2(δ)=12δc_{2}(\delta)=\frac{1}{2}\sqrt{\delta}.

By Lemma 2.5, there exists a set σ1{1,,N}\sigma_{1}\subseteq\{1,\ldots,N\} of size

This shows in particular that θ>0\theta>0; dividing by θ\theta gives

Then by Chebychev inequality, there exists a set σ2{1,,N}\sigma_{2}\subseteq\{1,\ldots,N\} of size

Since σ1+σ2>N|\sigma_{1}|+|\sigma_{2}|>N, there exists kσ1σ2k\in\sigma_{1}\cap\sigma_{2}. Fix this kk. By the left hand side of (3.8), by (3.9) and the assumption on γ\gamma we have:

Thus pk>0|p_{k}|>0; since pkp_{k} is an integer, this yields pk1|p_{k}|\geq 1. Similarly, using the right hand side of (3.8), (3.9) and the assumption on γ\gamma, we get

The distance problem and arithmetic structure

More precisely, standard computations give for every ε>0\varepsilon>0 that

However, if XX has a more general distribution with independent coordinates, the distance dist(X,H){\rm dist}(X,H) may strongly depend on the subspace HH. For example, if the coordinates of XX are ±1\pm 1 symmetric random variables. then for H={x:  x1+x2=0}H=\{x:\;x_{1}+x_{2}=0\} the distance equals with probability 1/21/2, while for H={x:  x1++xN=0}H=\{x:\;x_{1}+\cdots+x_{N}=0\} the distance equals with probability 1/N\sim 1/\sqrt{N}.

Nevertheless, a version of the distance bound (4.1) remains true for general distributions if HH is a random subspace. For spaces of codimension m=1m=1, this result was proved in . In this paper, we prove an optimal distance bound for general dimensions.

To explain the term ecNe^{-cN}, consider ±1\pm 1 symmetric random variables. Then with probability at least 2n2^{-n} the random vector XX coincides with one of the random vectors that span HH, which makes the distance equal zero.

We will deduce Theorem 4.1 from a more general inequality that holds for arbitrary fixed subspace HH. This bound will depend on the arithmetic structure of the subspace HH, which we express using the least common denominator.

Then Theorem 3.3 quickly leads to the following general distance bound:

where C,c>0C,c>0 depend only on the subgaussian moment.

Let us write XX in coordinates, X=(ξ1,,ξN)X=(\xi_{1},\ldots,\xi_{N}). By Lemma 3.2 and the remark below it, all coordinates of XX satisfy the inequality L(ξk,1/2)1b\mathcal{L}(\xi_{k},1/2)\leq 1-b for some b>0b>0 that depends only on the subgaussian moment of ξk\xi_{k}. Hence the random variables ξk/2\xi_{k}/2 satisfy the assumption in Theorem 3.3.

Next, we connect the distance to a sum of independent random vectors:

For every θ=(θ1,,θN)H\theta=(\theta_{1},\ldots,\theta_{N})\in H^{\perp} and every kk we have θ,ak=PHθ,ek=θ,ek=θk,\langle\theta,a_{k}\rangle=\langle P_{H^{\perp}}\theta,e_{k}\rangle=\langle\theta,e_{k}\rangle=\theta_{k}, so

The theorem now follows directly from Theorem 3.3. ∎

In order to deduce the Distance Theorem 4.1, it will now suffice to bound below the least common denominator of a random subspace HH^{\perp}. Heuristically, the randomness should remove any arithmetic structure from the subspace, thus making the least common denominator exponentially large. Our next results shows that this is indeed true.

Assuming that this result holds, we can complete the proof of the Distance Theorem 4.1.

Let us condition on a realization of HH in E\mathcal{E}. By the independence of XX and HH, Theorem 4.2 used with α=cN\alpha=c\sqrt{N} and γ=c\gamma=c gives

By the estimate on the probability of Ec\mathcal{E}^{c}, this completes the proof. ∎

Let X1,,XNmX_{1},\ldots,X_{N-m} denote the independent random vectors that span the subspace HH. Consider an (Nm)×N(N-m)\times N random matrix BB with rows XkX_{k}. Then

This observation will help us to “navigate” the random subspace HH^{\perp} away from undesired sets SS on the unit sphere.

There exist δ,ρ(0,1)\delta,\rho\in(0,1) such that

By (4.3), HComp(δ,ρ)=H^{\perp}\cap{\mathit{Comp}}(\delta,\rho)=\emptyset with probability at least 1ec3N1-e^{-c_{3}N}. ∎

Fix the values of δ\delta and ρ\rho given by Lemma 4.4 for the rest of this section. We will further decompose the set of incompressible vectors into level sets SDS_{D} according to the value of the least common denominator DD. We shall prove a nontrivial lower bound on infxSDBx2>0\inf_{x\in S_{D}}\|Bx\|_{2}>0 for each level set up to DD of the exponential order. By (4.3), this will mean that HH^{\perp} is disjoint from every such level set. Therefore, all vectors in HH^{\perp} must have exponentially large least common denominators DD. This is Theorem 4.3.

Let α=μN\alpha=\mu\sqrt{N}, where μ>0\mu>0 is a small number to be chosen later, which depends only on the subgaussian moment. By Lemma 3.6,

Let Dc0ND\geq c_{0}\sqrt{N}. Define SDSN1S_{D}\subseteq S^{N-1} as

To obtain a lower bound for Bx2\|Bx\|_{2} on the level set, we proceed by an ε\varepsilon-net argument. To this end, we first need such a bound for a single vector xx.

Let xSDx\in S_{D}. Then for every t>0t>0 we have

Denoting the elements of BB by ξjk\xi_{jk}, we can write the jj-th coordinate of BxBx as

Now we can use the Small Ball Probability Theorem 3.3 in dimension m=1m=1 for each of these random sums. By Lemma 3.2 and the remark below it, L(ξjk,1/2)1b\mathcal{L}(\xi_{jk},1/2)\leq 1-b for some b>0b>0 that depends only on the subgaussian moment of ξjk\xi_{jk}. Hence the random variables ξjk/2\xi_{jk}/2 satisfy the assumption in Theorem 3.3. This gives for every jj and every t>0t>0:

Since ζj\zeta_{j} are independent random variables, we can use Tensorization Lemma 2.2 of to conclude that for every t>0t>0,

This completes the proof, because Bx22=j=1Nmζj2\|Bx\|_{2}^{2}=\sum_{j=1}^{N-m}|\zeta_{j}|^{2} and N2(Nm)N\leq 2(N-m) by the assumption. ∎

Next, we construct a small ε\varepsilon-net of the level set SDS_{D}. Since this set lies in SN1S^{N-1}, Lemma 2.1 yields the existence of an (N/D)(\sqrt{N}/D)-net of cardinality at most (CD/N)N(CD/\sqrt{N})^{N}. This simple volumetric bound is not sufficient for our purposes, and this is the crucial step where we explore the additive structure of SDS_{D} to construct a smaller net.

There exists a (4α/D)(4\alpha/D)-net of SDS_{D} of cardinality at most (C0D/N)N(C_{0}D/\sqrt{N})^{N}.

Recall that α\alpha is chosen as a small proportion of N\sqrt{N}. Hence Lemma 4.7 gives a better bound than the standard volumetric bound in Lemma 2.1.

We can assume that 4α/D14\alpha/D\leq 1, otherwise the conclusion is trivial. For xSDx\in S_{D}, denote

On the other hand, by (4.5) and using that x2=1\|x\|_{2}=1, D(x)2DD(x)\leq 2D and 4α/D14\alpha/D\leq 1, we obtain

Inequalities (4.6) and (4.7) show that every point xSDx\in S_{D} is within Euclidean distance 2α/D2\alpha/D from the set

A known volumetric argument gives a bound on the number of integer points in B(0,3D)B(0,3D):

(where in the last inequality we used that by Definition 4.5 of the level sets, D>c0ND>c_{0}\sqrt{N}). Finally, there exists a (4α/D)(4\alpha/D)-net of SDS_{D} with the same cardinality as N\mathcal{N}, and which lies in SDS_{D}. Indeed, to obtain such a net, one selects one (arbitrary) point from the intersection of SDS_{D} with a ball of radius 2α/D2\alpha/D centered at each point from N\mathcal{N}. This completes the proof. ∎

There exist c1,c2,μ(0,1)c_{1},c_{2},\mu\in(0,1) such that the following holds. Let α=μN1\alpha=\mu\sqrt{N}\geq 1 and Dc1Nec1N/mD\leq c_{1}\sqrt{N}e^{c_{1}N/m}. Then

By Lemma 2.3, there exists K1K\geq 1 that depends only on the subgaussian moment and such that

Therefore, in order to complete the proof, it is enough to find ν>0\nu>0 which depends only on the subgaussian moment, and such that the event

We claim that this holds with the following choice of parameters:

where C1C\geq 1 and c(0,1)c\in(0,1) are the constants from Lemma 4.6 and C01C_{0}\geq 1 is the constant from Lemma 4.7.

This gives for arbitrary x0SDx_{0}\in S_{D}:

Now we use Lemma 4.7, which yields a small (4α/D)(4\alpha/D)-net N\mathcal{N} of SDS_{D}. Taking the union bound, we get

Denote C1:=3CC0C_{1}:=3CC_{0}. Using the fact that c1νc_{1}\leq\nu and our assumption on DD, we have:

Assume E\mathcal{E} occurs. Fix xSDx\in S_{D} for which Bx2<νN2D\|Bx\|_{2}<\frac{\nu N}{2D}; it can be approximated by some element x0Nx_{0}\in\mathcal{N} as

Therefore, by the triangle inequality we have

where in the last inequality we used our choice of μ\mu.

We have shown that the event E\mathcal{E} implies the event that

whose probability is at most eNe^{-N} by (4.8). The proof is complete. ∎

where c1c_{1} is the constant from Lemma 4.8. Then, by the Definition 4.5 of the level sets, either xx is compressible or xSDx\in S_{D} for some DDD\in\mathcal{D}, where

Therefore, recalling the definition of the least common denominator of the subspace

we can decompose the desired probability as follows:

By Lemma 4.4, the first term in the right hand side is bounded by ecNe^{-cN}. Further terms can be bonded using (4.3) and Lemma 4.8:

Since there are DCN|\mathcal{D}|\leq C^{\prime}N terms in the sum, we conclude that

Decomposition of the sphere

Now we begin the proof of Theorem 1.1. We will make several useful reductions first.

Without loss of generality, we can assume that the entries of AA have a an absolutely continuous distribution. Indeed, we can add to each entry an independent Gaussian random variable with small variance σ\sigma, and later let σ0\sigma\to 0.

Similarly, we can assume that nn0n\geq n_{0}, where n0n_{0} is a suitably large number that depends only on the subgaussian moment BB.

with suitably small constant c0>0c_{0}>0 that depends only on the subgaussian moment BB. Indeed, as we remarked in the Introduction, for the values of dd above a constant proportion of nn, Theorem 1.1 follows from (1.7). Note that

Using the decomposition of the sphere Sn1=CompIncompS^{n-1}={\mathit{Comp}}\cup{\mathit{Incomp}}, we break the invertibility problem into two subproblems, for compressible and incompressible vectors:

A bound for the compressible vectors follows from Lemma 2.6. Using (5.1) we get

It remains to find a lower bound on Ax\|Ax\| for the incompressible vectors xx.

Invertibility via uniform distance bounds

In this section, we reduce the problem of bounding Ax2\|Ax\|_{2} for incompressible vectors xx to the distance problem that we addressed in Section 4.

For levels K1,K2>0K_{1},K_{2}>0 that will only depend on δ,ρ\delta,\rho, we define the set of totally spread vectors

For every δ,ρ(0,1)\delta,\rho\in(0,1), there exist K1,K2,c0>0K_{1},K_{2},c_{0}>0 which depend only on δ,ρ\delta,\rho, and such that the following holds. For every xIncomp(δ,ρ)x\in{\mathit{Incomp}}(\delta,\rho), the event

The proof gives K1=ρδ/2K_{1}=\rho\sqrt{\delta/2}, K2=1/K1K_{2}=1/K_{1}, c0=ρ2δ/2ec_{0}=\rho^{2}\delta/2e. In the rest of the proof, we shall use definition (6.1) of SpreadJ\operatorname{Spread}_{J} with these values of the levels K1K_{1}, K2K_{2}.

Let σ{1,,n}\sigma\subset\{1,\ldots,n\} be the subset from Lemma 2.5. Recall that the parameters δ\delta and ρ\rho depend only on the subgaussian moment BB (see Lemma 2.6). By choosing the constant c0c_{0} in (5.1) appropriately small, we may assume that dσ/2d\leq|\sigma|/2. Then, using Stirling’s approximation we have

If JσJ\subset\sigma, then summing (2.1) over kJk\in J, we obtain the required two-sided bound for PJx2\|P_{J}x\|_{2}. This and (2.1) yields PJxPJx2SpreadJ\frac{P_{J}x}{\left\|P_{J}x\right\|_{2}}\in\operatorname{Spread}_{J}. Hence E(x)\mathcal{E}(x) holds. ∎

Let δ,ρ(0,1)\delta,\rho\in(0,1). There exist C1,c1>0C_{1},c_{1}>0 which depend only on δ,ρ\delta,\rho, and such that the following holds. Let JJ be any dd-element subset of {1,,n}\{1,\ldots,n\}. Then for every ε>0\varepsilon>0

The proof gives K1=ρδ/2K_{1}=\rho\sqrt{\delta/2}, K2=1/K1K_{2}=1/K_{1}, c1=ρ/2c_{1}=\rho/\sqrt{2}, C1=2e/ρ2δC_{1}=2e/\rho^{2}\delta.

Let xIncomp(δ,ρ)x\in{\mathit{Incomp}}(\delta,\rho). For every subset JJ of {1,,n}\{1,\ldots,n\} we have

In case the event E(x)\mathcal{E}(x) of Lemma 6.1 holds, we use the vector z=PJxPJx2SpreadJz=\frac{P_{J}x}{\left\|P_{J}x\right\|_{2}}\in\operatorname{Spread}_{J} to check that

is independent of xx. Moreover, using the estimate on PJx2\|P_{J}x\|_{2} in the definition of the event E(x)\mathcal{E}(x), we conclude that

where c0c_{0} is the constant from Lemma 6.1. Chebychev inequality and Fubini theorem then yield

Fix any realization of AA for which F\mathcal{F} occurs, and fix any xIncomp(δ,ρ)x\in{\mathit{Incomp}}(\delta,\rho). Then

We have proved that for every xIncomp(δ,ρ)x\in{\mathit{Incomp}}(\delta,\rho) there exists a subset J=J(x)J=J(x) that satisfies both E(x)\mathcal{E}(x) and D(A,J)εD(A,J)\geq\varepsilon. Using this JJ in (6.3), we conclude that every matrix AA for which the event F\mathcal{F} occurs satisfies

The uniform distance bound

In this section, we shall estimate the distance between a random ellipsoid and a random independent subspace. This is the distance that we need to bound in the right hand side of (6.2).

Throughout this section, we let JJ be a fixed subset of {1,,n}\{1,\ldots,n\}, J=d|J|=d. We shall use the notation introduced in the beginning of Section 6. Thus, HJH_{J} denotes a random subspace, and SpreadJ\operatorname{Spread}_{J} denotes the totally spread set whose levels K1K_{1}, K2K_{2} depend only on δ\delta, ρ\rho in the definition of incompressibility.

We will denote by K,K0,C,c,C1,c1,K,K_{0},C,c,C_{1},c_{1},\ldots positive numbers that depend only on δ\delta, ρ\rho and the subgaussian moment BB.

Recall that HJcH_{J^{c}} is the span of ndn-d independent random vectors. Since their distribution is absolutely continuous (see the beginning of Section 5), these vectors are almost surely in general position, so

Without loss of generality, in the proof of Theorem 7.1 we can assume that

We would like to prove Theorem 7.1 by a typical ε\varepsilon-net argument. Theorem 4.1 will give a useful probability bound for an individual zSn1z\in S^{n-1}. We might then take a union bound over all zz in an ε\varepsilon-net of SpreadJ\operatorname{Spread}_{J} and complete by approximation. However, the standard approximation argument will leave us with a larger error ecde^{-cd} on the probability, which is unsatisfactory for small dd. To improve upon this step, we shall improve upon this approach using decoupling in Section 7.2.

For now, we start with a bound for an individual zSn1z\in S^{n-1}.

Denote the entries of matrix AA by ξij\xi_{ij}. Then the entries of the random vector AzAz,

are independent and identically distributed mean zero random variables. Moreover, since the random variables ξij\xi_{ij} are subgaussian and j=1nzj2=1\sum_{j=1}^{n}z_{j}^{2}=1, the random variables ζi\zeta_{i} are also subgaussian (see Fact 2.1 in ).

Therefore the random vector X=AzX=Az and the random subspace H=HJcH=H_{J^{c}} satisfy the assumptions of Theorem 4.1 with m=N(nd)=2d1m=N-(n-d)=2d-1 (we used (7.1) here). An application of Theorem 4.1 completes the proof. ∎

Since J=d|J|=d and almost surely dim(HJc)=N(nd)=2d1\dim(H_{J^{c}})^{\perp}=N-(n-d)=2d-1, the random matrix WW acts as an operator from a dd-dimensional subspace into a (2d1)(2d-1)-dimensional subspace. Although the entries of WW are not necessarily independent, we expect WW to behave as if this was the case. To this end, we condition on the realization of the subspace (HJc)(H_{J^{c}}). Now the operator PP becomes a fixed projection, and the columns of WW become independent random vectors. Then WW satisfies a version of Proposition 2.3:

Using Proposition7.3, we can choose a constant K0K_{0} that depends only on the subgaussian moment, and such that

With this bound on the norm of WW, we can run the approximation argument and prove the distance bound in Lemma 7.2 uniformly over all zSpreadJz\in\operatorname{Spread}_{J}.

Let WW be a random matrix as in Proposition 7.3. Then for every tt that satisfies (7.2) we have

Taking the union bound and using the representation (7.4) in Lemma 7.2, we obtain

Now, suppose the event in (7.6) holds, i.e. there exists zSpreadJz^{\prime}\in\operatorname{Spread}_{J} such that

Choose zNz\in\mathcal{N} such that zz2ε\|z-z^{\prime}\|_{2}\leq\varepsilon. Then by the triangle inequality

Therefore, E\mathcal{E} holds. The bound on the probability of E\mathcal{E} completes the proof. ∎

By representation (7.4), this is a weaker version of Theorem 7.2, with ede^{-d} instead of ecNe^{-cN}. Unfortunately, this bound is too weak for small dd. In particular, for square matrices we have d=1d=1, and the bound is useless.

In the next section, we will refine our current approach using decoupling.

2. Refinement: decoupling

Our problem is that the probability bound in (7.5) is too weak. We will bypass this by decomposing our event according to all possible values of W\|W\|, and by decoupling the information about Wz2\|Wz\|_{2} from the information about W\|W\|.

Let WW be an N×dN\times d matrix whose columns are independent random vectors. Let β>0\beta>0 and let zSd1z\in S^{d-1} be a vector satisfying zkβd|z_{k}|\geq\frac{\beta}{\sqrt{d}} for all k{1,,d}k\in\{1,\ldots,d\}. Then for every 0<a<b0<a<b, we have

If d=1d=1 then W=Wz2\|W\|=\|Wz\|_{2}, so the probability in the left hand side is zero. So, let d2d\geq 2. Then we can decompose the index set {1,,n}\{1,\ldots,n\} into two disjoint subsets II and HH whose cardinalities differ by at most 11, say with I=d/2|I|=\lceil d/2\rceil.

We write W=WI+WHW=W_{I}+W_{H} where WIW_{I} and WHW_{H} are the submatrices of WW with columns in II and HH respectively. Similarly, for zSpreadJz\in\operatorname{Spread}_{J}, we write z=zI+zHz=z_{I}+z_{H}.

Since W2WI2+WH2\|W\|^{2}\leq\|W_{I}\|^{2}+\|W_{H}\|^{2}, we have

and similarly for pHp_{H}. It suffices to bound pIp_{I}; the argument for pHp_{H} is similar.

Writing Wz=WIzI+WHzHWz=W_{I}z_{I}+W_{H}z_{H} and using the independence of the matrices WIW_{I} and WHW_{H}, we conclude that

(In the last line we used WIzI=WzIW_{I}z_{I}=Wz_{I} and WHW\|W_{H}\|\leq\|W\|).

By the assumption on zz and since Id/2|I|\geq d/2, we have

Hence for x:=zI/zI2x:=z_{I}/\|z_{I}\|_{2} and u:=w/zI2u:=w/\|z_{I}\|_{2}, we obtain

Together with (7.7), this completes the proof. ∎

We use this decoupling in the following refinement of Lemma 7.4.

We condition on the realization of the subspace HJcH_{J^{c}} as above to make the columns of WW independent. By the definition (6.1) of SpreadJ\operatorname{Spread}_{J}, any zNz\in\mathcal{N} satisfies the condition of the Decoupling Proposition 7.5 with β=K1\beta=K_{1}. Taking the union bound and then using Proposition 7.5, we obtain

Assume now that LCDα,c(HJc)cNecN/m\operatorname{LCD}_{\alpha,c}(H_{J^{c}}^{\perp})\geq c\sqrt{N}e^{cN/m}, where α\alpha and cc are as in Theorem 4.3. Then using Proposition 7.3 and representation (7.4), we conclude as in the proof of Theorem 4.1 that

for any tt satisfying (7.2). Since s1s\geq 1 and d1d\geq 1, we can bound this as

Now, suppose the event in (7.8) holds, i.e. there exists zSpreadJz^{\prime}\in\operatorname{Spread}_{J} such that

Choose zNz\in\mathcal{N} such that zz2ε\|z-z^{\prime}\|_{2}\leq\varepsilon. Then by the triangle inequality

Therefore, E\mathcal{E} holds. The bound on the probability of E\mathcal{E} completes the proof. ∎

Recall that, without loss of generality, we assumed that (7.2) held. Let k1k_{1} be the smallest natural number such that

where C0C_{0} and K0K_{0} are constants from Lemma 2.3 and Lemma 7.6 respectively. Summing the probability estimates of Proposition 7.4 and Lemma 7.6 for s=2ks=2^{k}, k=1,,k1k=1,\ldots,k_{1}, we conclude that

By (7.9) and Proposition 2.3, the last expression does not exceed (Ct)d+ecN(Ct)^{d}+e^{-cN}. In view of representation (7.4), this completes the proof. ∎

Completion of the proof

In Section 6, we reduced the invertibility problem for incompressible vectors to computing the distance between a random ellipsoid and a random subspace. This distance was estimated in Section 7. These together lead to the following invertibility bound:

Let δ,ρ(0,1)\delta,\rho\in(0,1). There exist C,c>0C,c>0 which depend only on δ,ρ\delta,\rho, and such that the following holds. For every t>0t>0,

Without loss of generality, we can assume that (7.2) holds. We use Lemma 6.2 with ε=td\varepsilon=t\sqrt{d} and then Theorem 7.1 to get the bound (Ct)d(C^{\prime}t)^{d} on the desired probability. This completes the proof. ∎

This follows directly from (5.2), (5.3), and Theorem 8.1. ∎

References