The smallest singular value of a random rectangular matrix

Mark Rudelson, Roman Vershynin

Introduction

Extreme singular values of random matrices have been of considerable interest in mathematical physics, geometric functional analysis, numerical analysis and other fields. Consider an $N\times n$ real matrix $A$ with $N\geq n$. The singular values $s_{k}(A)$ of $A$ are the eigenvalues of $|A|=\sqrt{A^{t}A}$ arranged in nonincreasing order. Of particular significance are the largest and the smallest singular values
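For reference, the extreme singular values admit the standard variational description below. This is a textbook fact, and it is presumably what the later reference to (1.1) points to.

```latex
\[
  s_1(A) = \sup_{x:\ \|x\|_2 = 1} \|Ax\|_2,
  \qquad
  s_n(A) = \inf_{x:\ \|x\|_2 = 1} \|Ax\|_2 .
\]
```

In particular, a lower bound on $s_n(A)$ is the same as a lower bound on $\|Ax\|_2$ that holds uniformly over the unit sphere $S^{n-1}$.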

A natural matrix model is given by matrices whose entries are independent real random variables with certain moment assumptions. In this paper, we shall consider subgaussian random variables $\xi$ – those whose tails are dominated by that of the standard normal random variable. Namely, a random variable $\xi$ is called subgaussian if there exists $B>0$ such that

The minimal $B$ in this inequality is called the subgaussian moment of $\xi$ . Inequality (1.2) is often equivalently formulated as the moment condition

where $C$ is an absolute constant. The class of subgaussian random variables includes many random variables that arise naturally in applications, such as normal, symmetric $\pm 1$ and general bounded random variables.
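A sketch of the two conditions referred to above, in the normalization commonly used for subgaussian variables. The exact constants (the factor $2$ and the exponent $t^2/B^2$) are recalled from the standard definition and should be treated as assumptions.

```latex
% Tail condition (the role of (1.2)):
\[ \mathbb{P}(|\xi| > t) \le 2\exp(-t^2/B^2) \quad \text{for all } t > 0. \]
% Equivalent moment condition, with an absolute constant C:
\[ (\mathbb{E}|\xi|^p)^{1/p} \le C B \sqrt{p} \quad \text{for all } p \ge 1. \]
% Example: a symmetric \pm 1 variable has P(|\xi| > t) = 1 for t < 1 and 0 for t >= 1.
% With B = 1/\sqrt{\ln 2} the right-hand side equals 2^{1-t^2} >= 1 for t <= 1,
% so the tail condition holds with this B.
```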

In this paper, we study $N\times n$ real random matrices $A$ whose entries are independent and identically distributed mean zero subgaussian random variables. The asymptotic behavior of the extreme singular values of $A$ is well understood. If the entries have unit variance and the dimension $n$ grows to infinity while the aspect ratio $n/N$ converges to a constant $\lambda\in(0,1)$ , then

almost surely. This result was proved in for Gaussian matrices, and in for matrices with independent and identically distributed entries with finite fourth moment. In other words, we have asymptotically

Considerable efforts were made recently to establish non-asymptotic estimates similar to (1.4), which would hold for arbitrary fixed dimensions $N$ and $n$ ; see the survey on the largest singular value, and the discussion below on the smallest singular value.
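The passage from the limit statement to the dimension-free form (1.4) is a one-line rescaling. The display below is a sketch that assumes the classical limits $1\mp\sqrt{\lambda}$ for the normalized extreme singular values.

```latex
% Almost sure limits as n -> infinity with n/N -> \lambda:
\[ \frac{s_n(A)}{\sqrt{N}} \to 1 - \sqrt{\lambda},
   \qquad
   \frac{s_1(A)}{\sqrt{N}} \to 1 + \sqrt{\lambda}. \]
% Substituting \lambda \approx n/N and multiplying by \sqrt{N}:
\[ s_n(A) \approx \sqrt{N} - \sqrt{n},
   \qquad
   s_1(A) \approx \sqrt{N} + \sqrt{n}. \]
```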

The largest singular value is relatively easy to bound above, up to a constant factor. Indeed, a standard covering argument shows that $s_{1}(A)$ is at most of the optimal order $\sqrt{N}$ for all fixed dimensions, see Proposition 2.3 below. The smallest singular value is significantly harder to control. The efforts to prove optimal bounds on $s_{n}(A)$ have a long history, which we shall now outline.

2. Tall matrices

A result of provides an optimal bound for tall matrices, those whose aspect ratio $\lambda=n/N$ satisfies $\lambda<\lambda_{0}$ for some sufficiently small constant $\lambda_{0}>0$. Recalling (1.4), one should expect that tall matrices satisfy

It was indeed proved in that for tall $\pm 1$ matrices one has

where $\lambda_{0}>0$ and $c>0$ are absolute constants.

3. Almost square matrices

As we move toward square matrices, thus making the aspect ratio $\lambda=n/N$ arbitrarily close to $1$ , the problem of estimating the smallest singular value becomes harder. One still expects (1.5) to be true as long as $\lambda<1$ is any constant. Indeed, this was proved in for arbitrary aspect ratios $\lambda<1-c/\log n$ and for general random matrices with independent subgaussian entries. One has

where $c_{\lambda}>0$ depends only on $\lambda$ and the maximal subgaussian moment of the entries.

In subsequent work , the dependence of $c_{\lambda}$ on the aspect ratio in (1.7) was improved for random $\pm 1$ matrices; however the probability estimate there was weaker than in (1.7). An estimate for subgaussian random matrices of all dimensions was obtained in . For any $\varepsilon\geq CN^{-1/2}$ , it was shown that

However, because of the factor $(1-\lambda)$ , this estimate is suboptimal and does not correspond to the expected asymptotic behavior (1.4).

4. Square matrices

The extreme case of the problem of estimating the smallest singular value is that of square matrices, where $N=n$. Asymptotics (1.4) is useless for square matrices. However, for “almost” square matrices, those with constant defect $N-n=O(1)$, the quantity $\sqrt{N}-\sqrt{n}$ is of order $1/\sqrt{N}$, so asymptotics (1.4) heuristically suggests that these matrices should satisfy

This conjecture was proved recently in for all square subgaussian matrices:

5. New result: bridging all classes of matrices

In this paper, we prove the conjectural bound for $s_{n}(A)$ valid for all subgaussian matrices in all fixed dimensions $N,n$ . The bound is optimal for matrices with all aspect ratios we encountered above.

Let $A$ be an $N\times n$ random matrix, $N\geq n$ , whose elements are independent copies of a mean zero subgaussian random variable with unit variance. Then, for every $\varepsilon>0$ , we have

where $C,c>0$ depend (polynomially) only on the subgaussian moment $B$ .
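For orientation, the estimate of Theorem 1.1 is usually quoted in the form below. It is recalled here from the standard statement of the result, so the exact form should be treated as an assumption. The second display specializes it to square matrices.

```latex
% Assumed form of the main estimate:
\[ \mathbb{P}\Big( s_n(A) \le \varepsilon\big(\sqrt{N} - \sqrt{n-1}\big) \Big)
   \le (C\varepsilon)^{N-n+1} + e^{-cN}. \]
% Square case N = n: the exponent N-n+1 equals 1, and
\[ \sqrt{n} - \sqrt{n-1} = \frac{1}{\sqrt{n} + \sqrt{n-1}}
   \in \Big[ \frac{1}{2\sqrt{n}},\ \frac{1}{\sqrt{n}} \Big], \]
% so the bound controls P( s_n(A) <= \varepsilon / \sqrt{n} ) by C\varepsilon + e^{-cn},
% up to adjusting the constant C.
```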

For tall matrices, Theorem 1.1 clearly amounts to the known estimates (1.5), (1.6). For square matrices ( $N=n$ ), the quantity $\sqrt{N}-\sqrt{N-1}$ is of order $1/\sqrt{N}$ , so Theorem 1.1 amounts to the known estimates (1.8), (1.9). Finally, for matrices that are arbitrarily close to square, Theorem 1.1 yields the new optimal estimate

This is a version of the asymptotics (1.4), now valid for all fixed dimensions. This bound was explicitly conjectured e.g. in .

However, this bound is not optimal, and it becomes useless for matrices that are close to square, when $N-n=o(\sqrt{n})$ .

The form of estimate (1.10) may be expected if one recalls the classical $\varepsilon$-net argument, which underlies many proofs in geometric functional analysis. By (1.1), we are looking for a lower bound on $\|Ax\|_{2}$ that would hold uniformly for all vectors $x$ on the unit Euclidean sphere $S^{n-1}$. For every fixed $x\in S^{n-1}$, the quantity $\|Ax\|_{2}^{2}$ is the sum of $N$ independent random variables (the squares of the coordinates of $Ax$). Therefore, the deviation inequalities lead us to expect that $\|Ax\|_{2}$ is of the order $\sqrt{N}$ with probability exponentially close to one in $N$, i.e. $1-e^{-cN}$. We can run this argument separately for each vector $x$ in a small net $\mathcal{N}$ of the sphere $S^{n-1}$, and then take the union bound to make the estimate uniform over $x\in\mathcal{N}$. It is known how to choose a net $\mathcal{N}$ of cardinality exponential in the dimension $n-1$ of the sphere, i.e. $|\mathcal{N}|\leq e^{C(n-1)}$. Therefore, with probability $1-e^{C(n-1)}e^{-cN}$, we have a good lower bound on $\|Ax\|_{2}\sim\sqrt{N}$ for all vectors $x$ in the net $\mathcal{N}$. Finally, one transfers this estimate from the net to the whole sphere $S^{n-1}$ by approximation.

The problem with this argument is that the constants $C$ and $c$ are not the same. Therefore, our estimate on the probability $1-e^{C(n-1)}e^{-cN}$ is positive only for tall matrices, when $N\geq(C/c)n$ . To reach out to matrices of arbitrary dimensions, one needs to develop much more sensitive versions of the $\varepsilon$ -net arguments. Nevertheless, the end result stated in Theorem 1.1 exhibits the same two forces played against one another – the probability quantified by the dimension $N$ and the complexity of the sphere $S^{n-1}$ quantified by its dimension $n-1$ .
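The obstruction described above is visible from the exponents alone. A minimal worked version, using only the net size $e^{C(n-1)}$ and the single-vector failure probability $e^{-cN}$ quoted in the text:

```latex
% Union bound over the net:
\[ \mathbb{P}\big( \exists\, x \in \mathcal{N}:\ \|Ax\|_2 \text{ is small} \big)
   \le |\mathcal{N}|\, e^{-cN}
   \le e^{C(n-1) - cN}. \]
% The bound is nontrivial only when the exponent is negative:
\[ C(n-1) - cN < 0 \iff N > \frac{C}{c}\,(n-1). \]
```

When $C>c$, this restricts the argument to aspect ratios $n/N$ below roughly $c/C<1$, which is the tall regime.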

6. Small ball probabilities, distance problems, and additive structure

Our proof of Theorem 1.1 is a development of our method in for square matrices. Dealing with rectangular matrices is in several ways considerably harder. Several new tools are developed in this paper, which may be of independent interest.

To prove (1.12), we first use the small ball probability inequalities to compute the distance to an arbitrary subspace $H$ . This estimate necessarily depends on the additive structure of the subspace $H$ ; the less structure, the better is our estimate, see Theorem 4.2. We then prove the intuitively plausible fact that random subspaces have no arithmetic structure, see Theorem 4.3. This together leads to the desired distance estimate (1.12).

The distance bound is then used to prove our main result, Theorem 1.1. Let $X$ be some column of the random matrix $A$ and $H$ be the span of the other columns. The simple rank argument shows that the smallest singular value $s_{n}(A)=0$ if and only if $X\in H$ for some column. A simple quantitative version of this argument is that a lower estimate on $s_{n}(A)$ yields a lower bound on ${\rm dist}(X,H)$ .

In Section 6, we show how to reverse this argument for random matrices – deduce a lower bound on the smallest singular value $s_{n}(A)$ from lower bound (1.12) on the distance ${\rm dist}(X,H)$ . Our reverse argument is harder than its version for square matrices from , where we had $m=1$ . First, instead of one column $X$ we now have to consider all linear combinations of $d\sim m/2$ columns; see Lemma 6.2. To obtain a distance bound that would be uniformly good for all such linear combinations, one would normally use an $\varepsilon$ -net argument. However, the distance to the $(N-m)$ -dimensional subspace $H$ is not sufficiently stable for this argument to be useful for small $m$ (for matrices close to square). We therefore develop a decoupling argument in Section 7 to bypass this difficulty.

Once this is done, the proof is quickly completed in Section 8.

Acknowledgement

We are grateful to Shuheng Zhou, Nicole Tomczak-Jaegermann, Radoslaw Adamczak, and the anonymous referee for pointing out several inaccuracies in our argument. The second named author is grateful to his wife Lilia for her love and patience during the years this paper was being written.

Notation and preliminaries

Throughout the paper, positive constants are denoted $C,C_{1},C_{2},c,c_{1},c_{2},\ldots$ Unless otherwise stated, these are absolute constants. In some of our arguments they may depend (polynomially) on specified parameters, such as the subgaussian moment $B$ .

The following Lemma is a variant of the well known volumetric estimate.

Let $S$ be a subset of $S^{n-1}$ , and let $\varepsilon>0$ . Then there exists an $\varepsilon$ -net of $S$ of cardinality at most

The published variants of this lemma (e.g. , Lemma 2.6) have exponent $n$ rather than $n-1$. Since the latter exponent will be crucial for our purposes, we include the proof of this lemma for the reader’s convenience.

Without loss of generality we can assume that $\varepsilon<2$ , otherwise any single point forms a desired net. Let $\mathcal{N}$ be an $\varepsilon$ -separated subset of $S$ of maximal cardinality. By maximality, $\mathcal{N}$ is an $\varepsilon$ -net of $S$ . Since $\mathcal{N}$ is $\varepsilon$ -separated, the balls $B(x,\varepsilon/2)$ with centers $x\in\mathcal{N}$ are disjoint. All these balls have the same volume, and they are contained in the spherical shell $B(0,1+\varepsilon/2)\setminus B(0,1-\varepsilon/2)$ . Therefore, comparing the volumes, we have

Dividing both sides of this inequality by ${\rm vol}(B(0,1))$ , we obtain

Using the inequality $(1+x)^{n}-(1-x)^{n}\leq 2nx(1+x)^{n-1}$ valid for $x\in(0,1)$ , we conclude that $|\mathcal{N}|$ is bounded as desired. This completes the proof. ∎
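The volume comparison in this proof can be written out as follows. This is a sketch reconstructed from the steps described in the text; the final form $2n(1+2/\varepsilon)^{n-1}$ is what those steps yield, and it is assumed to match the stated bound.

```latex
% Disjoint balls B(x, \varepsilon/2), x in \mathcal{N}, all lie in the spherical shell:
\[ |\mathcal{N}|\, \mathrm{vol}\big(B(0,\varepsilon/2)\big)
   \le \mathrm{vol}\big(B(0,1+\varepsilon/2)\big) - \mathrm{vol}\big(B(0,1-\varepsilon/2)\big). \]
% Divide by vol(B(0,1)); volumes in R^n scale as r^n:
\[ |\mathcal{N}|\, (\varepsilon/2)^n \le (1+\varepsilon/2)^n - (1-\varepsilon/2)^n. \]
% Apply (1+x)^n - (1-x)^n <= 2nx(1+x)^{n-1} with x = \varepsilon/2 in (0,1):
\[ |\mathcal{N}| \le 2n\, (\varepsilon/2)^{-(n-1)} (1+\varepsilon/2)^{n-1}
   = 2n \Big( 1 + \frac{2}{\varepsilon} \Big)^{n-1}. \]
```

The elementary inequality used in the last step follows from the mean value theorem applied to $u\mapsto u^{n}$ on the interval $[1-x,1+x]$.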

The following well known argument allows one to compute the norm of a linear operator using nets. We have not found a published reference to this argument, so we include it for the reader’s convenience.

Every $z\in S^{n-1}$ has the form $z=x+h$ , where $x\in\mathcal{N}$ and $\|h\|_{2}\leq\varepsilon$ . Since $\|A\|=\sup_{z\in S^{n-1}}\|Az\|_{2}$ , the triangle inequality yields

The last term in the right hand side is bounded by $\varepsilon\|A\|$ . Therefore we have shown that

Fix $x\in\mathcal{N}$ . Repeating the above argument for $\|Ax\|_{2}=\sup_{y\in S^{m-1}}|\langle Ax,y\rangle|$ yields the bound

The two previous estimates complete the proof. ∎
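The two estimates combine as shown below. The sketch assumes that $\mathcal{N}$ is an $\varepsilon$-net of $S^{n-1}$ and that $\mathcal{M}$ is a $\delta$-net of $S^{m-1}$ with $\varepsilon,\delta\in(0,1)$; the names $\delta$ and $\mathcal{M}$ are introduced here for illustration.

```latex
% Net on the domain sphere: absorb the term \varepsilon \|A\| into the left side.
\[ \|A\| \le \sup_{x \in \mathcal{N}} \|Ax\|_2 + \varepsilon \|A\|
   \implies
   \|A\| \le \frac{1}{1-\varepsilon} \sup_{x \in \mathcal{N}} \|Ax\|_2. \]
% Same argument for the functional y -> <Ax, y> on S^{m-1}:
\[ \|Ax\|_2 \le \frac{1}{1-\delta} \sup_{y \in \mathcal{M}} |\langle Ax, y \rangle|. \]
% Combining the two:
\[ \|A\| \le \frac{1}{(1-\varepsilon)(1-\delta)}
   \sup_{x \in \mathcal{N},\ y \in \mathcal{M}} |\langle Ax, y \rangle|. \]
```

With $\varepsilon=\delta=1/2$ the prefactor equals $4$.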

Using nets, one easily proves the well known basic bound $O(\sqrt{N})$ on the norm of a random subgaussian matrix:

Let $A$ be an $N\times n$ random matrix, $N\geq n$ , whose elements are independent copies of a subgaussian random variable. Then

where $C_{0},c_{0}>0$ depend only on the subgaussian moment $B$ .

Let $\mathcal{N}$ be a $(1/2)$-net of $S^{n-1}$ and $\mathcal{M}$ be a $(1/2)$-net of $S^{N-1}$. By Proposition 2.1, we can choose these nets such that

For every $x\in\mathcal{N}$ and $y\in\mathcal{M}$ , the random variable $\langle Ax,y\rangle$ is subgaussian (see Fact 2.1 in ), thus

where $C_{1},c_{1}>0$ depend only on the subgaussian moment $B$ . Using Lemma 2.2 and taking the union bound, we obtain
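A sketch of how the union bound can be carried out. It assumes a single-pair tail bound of the form $C_{1}e^{-c_{1}t^{2}N}$ with $C_{1}\geq 1$, net cardinalities at most $6^{n}$ and $6^{N}$, and the factor $4$ from the net bound on the norm with both nets at scale $1/2$; the explicit choices of $c_{0}$ and $C_{0}$ are illustrative only.

```latex
% Single pair (x, y): <Ax, y> is subgaussian, so for every t > 0
\[ \mathbb{P}\big( |\langle Ax, y \rangle| > t\sqrt{N} \big) \le C_1 e^{-c_1 t^2 N}. \]
% If \|A\| > t\sqrt{N}, then some pair in the nets has |<Ax,y>| > t\sqrt{N}/4.
% Union over at most 6^n \cdot 6^N \le 36^N pairs:
\[ \mathbb{P}\big( \|A\| > t\sqrt{N} \big)
   \le 36^{N}\, C_1\, e^{-c_1 t^2 N / 16}. \]
% With c_0 = c_1/32 and C_0^2 = 32(\ln 36 + \ln C_1)/c_1, for all t \ge C_0:
\[ 36^{N}\, C_1\, e^{-c_1 t^2 N / 16} \le e^{-c_0 t^2 N}. \]
```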

2. Compressible and incompressible vectors

In our proof of Theorem 1.1, we will make use of a partition of the unit sphere $S^{n-1}$ into two sets of compressible and incompressible vectors. These sets were first defined in as follows.

We now recall without proof two simple results. The first is Lemma 3.4 from :

Let $x\in{\mathit{Incomp}}(\delta,\rho)$ . Then there exists a set $\sigma=\sigma(x)\subseteq\{1,\ldots,n\}$ of cardinality $|\sigma|\geq\frac{1}{2}\rho^{2}\delta n$ and such that

The other result is a variant of Lemma 3.3 from , which establishes the invertibility on compressible vectors, and allows us to focus on incompressible vectors in our proof of Theorem 1.1. While Lemma 3.3 was formulated in for a square matrix, the same proof applies to $N\times n$ matrices, provided that $N\geq n/2$ .

Let $A$ be an $N\times n$ random matrix, $N\geq n/2$ , whose elements are independent copies of a subgaussian random variable. There exist $\delta,\rho,c_{3}>0$ depending only on the subgaussian moment $B$ such that

Small ball probability and the arithmetic structure

Starting with the works of Lévy, Kolmogorov and Esséen, a number of results in probability theory have been concerned with the question of how spread out the sums of independent random variables are. It is convenient to quantify the spread of a random variable in the following way.

An equivalent way of looking at the Lévy concentration function is that it measures the small ball probabilities – the likelihood that the random vector $S$ enters a small ball in the space. An exposition of the theory of small ball probabilities can be found in .

One can derive a simple but rather weak bound on the Lévy concentration function from the Paley-Zygmund inequality.

Let $\xi$ be a random variable with mean zero, unit variance, and finite fourth moment. Then for every $\varepsilon\in(0,1)$ there exists $p\in(0,1)$ which depends only on $\varepsilon$ and on the fourth moment, and such that

In particular, this bound holds for subgaussian random variables, and with $p$ that depends only on $\varepsilon$ and the subgaussian moment.

We use the Paley-Zygmund inequality, which states for a random variable $Z$ that

so, using the Minkowski inequality, we obtain
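One way to carry out the computation indicated in this proof is sketched below. It assumes the Lévy concentration function is $\mathcal{L}(\xi,\varepsilon)=\sup_{v\in\mathbb{R}}\mathbb{P}(|\xi-v|\leq\varepsilon)$, applies the Paley-Zygmund inequality to $Z=\xi-v$, and writes $M_{4}=(\mathbb{E}\xi^{4})^{1/4}\geq 1$. The symbol $M_{4}$ and the explicit value of $p$ are introduced here for illustration and need not match the authors' constants.

```latex
% Paley-Zygmund in the form used here (valid for 0 < \varepsilon^2 \le E Z^2):
\[ \mathbb{P}(|Z| > \varepsilon) \ge \frac{(\mathbb{E}Z^2 - \varepsilon^2)^2}{\mathbb{E}Z^4}. \]
% Z = \xi - v: E Z^2 = 1 + v^2, and by Minkowski (E Z^4)^{1/4} \le M_4 + |v|.
% Numerator: 1 + v^2 - \varepsilon^2 \ge (1-\varepsilon^2)(1+v^2).
% Denominator: (M_4 + |v|)^2 \le 2(M_4^2 + v^2) \le 2 M_4^2 (1+v^2), since M_4 \ge 1.
\[ \mathbb{P}(|\xi - v| > \varepsilon)
   \ge \frac{(1 + v^2 - \varepsilon^2)^2}{(M_4 + |v|)^4}
   \ge \frac{(1-\varepsilon^2)^2 (1+v^2)^2}{4 M_4^4 (1+v^2)^2}
   = \frac{(1-\varepsilon^2)^2}{4 M_4^4}. \]
% The bound is uniform in v, hence
\[ \mathcal{L}(\xi, \varepsilon) \le 1 - \frac{(1-\varepsilon^2)^2}{4 M_4^4} =: p < 1. \]
```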

We will need a much stronger bound on the concentration function for sums of independent random variables. Here we present a multi-dimensional version of the inverse Littlewood-Offord inequality from . While this paper was in preparation, Friedland and Sodin proposed two different ways to simplify and improve our argument in . We shall therefore present here a multi-dimensional version of one of the arguments of Friedland and Sodin , which is considerably simpler than our original proof.

In the scalar case, when $m=1$ , the additive structure of a sequence $a=(a_{1},\ldots,a_{N})$ of real numbers $a_{k}$ can be described in terms of the shortest arithmetic progression into which it (essentially) embeds. This length is conveniently expressed as the essential least common denominator of $a$ , defined as follows. We fix parameters $\alpha,\gamma\in(0,1)$ , and define

A more traditional way of looking at $\theta\cdot a$ is to regard it as the product of the matrix $a$ with rows $a_{k}$ and the vector $\theta$ .

Then we define, for $\alpha>0$ and $\gamma\in(0,1)$ ,

The following theorem gives a bound on the small ball probability for a random sum $S=\sum_{k=1}^{N}a_{k}\xi_{k}$ in terms of the additive structure of the coefficient sequence $a$ . The less structure in $a$ , the bigger its least common denominator is, and the smaller is the small ball probability for $S$ .
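A small worked example may help calibrate the scalar notion. It assumes the definition $\mathrm{LCD}_{\alpha,\gamma}(a)=\inf\{\theta>0:\ \mathrm{dist}(\theta a,\mathbb{Z}^{N})<\min(\gamma\|\theta a\|_{2},\alpha)\}$, which is the form this quantity usually takes; treat that formula as an assumption here. The example uses the highly structured unit vector $a=N^{-1/2}(1,\ldots,1)$.

```latex
% Write \theta a = s(1,\dots,1) with s = \theta/\sqrt{N}. For 0 < s < 1:
\[ \mathrm{dist}(\theta a, \mathbb{Z}^N) = \sqrt{N}\, \min(s, 1-s),
   \qquad \|\theta a\|_2 = s\sqrt{N}. \]
% Case s \le 1/2: the condition requires s\sqrt{N} < \gamma s\sqrt{N},
% which is impossible because \gamma < 1.
% Case 1/2 < s < 1: the condition reads
\[ (1-s)\sqrt{N} < \min\big( \gamma s \sqrt{N},\ \alpha \big)
   \iff
   s > \max\Big( \frac{1}{1+\gamma},\ 1 - \frac{\alpha}{\sqrt{N}} \Big). \]
% Taking the infimum over admissible \theta = s\sqrt{N}:
\[ \mathrm{LCD}_{\alpha,\gamma}(a)
   = \sqrt{N}\, \max\Big( \frac{1}{1+\gamma},\ 1 - \frac{\alpha}{\sqrt{N}} \Big)
   \in \Big( \tfrac{1}{2}\sqrt{N},\ \sqrt{N} \Big). \]
```

So, under the assumed definition, this fully structured unit vector has least common denominator of order $\sqrt{N}$.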

Let $\xi_{1},\ldots,\xi_{N}$ be independent and identically distributed, mean zero random variables, such that $\mathcal{L}(\xi_{k},1)\leq 1-b$ for some $b>0$ . Consider the random sum $S=\sum_{k=1}^{N}a_{k}\xi_{k}$ . Then, for every $\alpha>0$ and $\gamma\in(0,1)$ , and for

Halász developed a powerful approach to bounding concentration function; his approach influenced our arguments below. Halász operated under a similar non-degeneracy condition on the vectors $a_{k}$ : for every $x\in S^{m-1}$ , at least $cN$ terms satisfy $|\langle{a_{k}},{x}\rangle|\geq 1$ . After properly rescaling $a_{k}$ by the factor $\sqrt{c/N}$ , Halász’s condition is seen to be more restrictive than (3.2).

To estimate the Lévy concentration function we apply the Esséen Lemma, see e.g. , p. 290.

Applying Lemma 3.4 to the vector $Y=S/\varepsilon$ and using the independence of random variables $\xi_{1},\ldots,\xi_{N}$ , we obtain

Substituting this into (3.3) and using Jensen’s inequality, we get

The next and major step is to bound the size of the recurrence set

Recall that by the assumption of the theorem,

Therefore, by the definition of the least common denominator, we have that either

In the latter case, since $2t<\alpha$ , inequalities (3.4) and (3.5) together yield

where the last inequality follows from condition (3.2).

Recalling the definition of $\tau$ , we have proved that every pair of points $\theta^{\prime},\theta^{\prime\prime}\in I(t)$ satisfies:

It follows that $I(t)$ can be covered by Euclidean balls of radii $r$ , whose centers are $R$ -separated in the Euclidean distance. Since $I(t)\subset B(0,\sqrt{m})$ , the number of such balls is at most

which completes the proof of the lemma. ∎

We decompose the domain into two parts. First, by the definition of $I(t)$ , we have

In the last line, we used the estimate ${\rm vol}(B(0,\sqrt{m}))\leq C^{m}$.

Second, by the integral distribution formula and using Lemma 3.5, we have

Combining (3.1) and (3.1) completes the proof of Theorem 3.3. ∎

2. Least common denominator of incompressible vectors

The proof gives $c_{1}(\delta,\rho)=\frac{1}{2}\rho^{2}\sqrt{\delta}$ and $c_{2}(\delta)=\frac{1}{2}\sqrt{\delta}$ .

By Lemma 2.5, there exists a set $\sigma_{1}\subseteq\{1,\ldots,N\}$ of size

This shows in particular that $\theta>0$ ; dividing by $\theta$ gives

Then, by the Chebychev inequality, there exists a set $\sigma_{2}\subseteq\{1,\ldots,N\}$ of size

Since $|\sigma_{1}|+|\sigma_{2}|>N$ , there exists $k\in\sigma_{1}\cap\sigma_{2}$ . Fix this $k$ . By the left hand side of (3.8), by (3.9) and the assumption on $\gamma$ we have:

Thus $|p_{k}|>0$ ; since $p_{k}$ is an integer, this yields $|p_{k}|\geq 1$ . Similarly, using the right hand side of (3.8), (3.9) and the assumption on $\gamma$ , we get

The distance problem and arithmetic structure

More precisely, standard computations give for every $\varepsilon>0$ that

However, if $X$ has a more general distribution with independent coordinates, the distance ${\rm dist}(X,H)$ may strongly depend on the subspace $H$. For example, if the coordinates of $X$ are $\pm 1$ symmetric random variables, then for $H=\{x:\;x_{1}+x_{2}=0\}$ the distance equals $0$ with probability $1/2$, while for $H=\{x:\;x_{1}+\cdots+x_{N}=0\}$ the distance equals $0$ with probability $\sim 1/\sqrt{N}$.
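The two examples can be made explicit. The computation below is a sketch for symmetric $\pm 1$ coordinates, with $N$ even in the second case.

```latex
% H = {x_1 + x_2 = 0} has unit normal (e_1 + e_2)/\sqrt{2}:
\[ \mathrm{dist}(X, H) = \frac{|\xi_1 + \xi_2|}{\sqrt{2}},
   \qquad
   \mathbb{P}\big( \mathrm{dist}(X,H) = 0 \big) = \mathbb{P}(\xi_1 = -\xi_2) = \frac{1}{2}. \]
% H = {x_1 + ... + x_N = 0} has unit normal (1,\dots,1)/\sqrt{N}:
\[ \mathrm{dist}(X, H) = \frac{|\xi_1 + \cdots + \xi_N|}{\sqrt{N}},
   \qquad
   \mathbb{P}\big( \mathrm{dist}(X,H) = 0 \big) = \binom{N}{N/2} 2^{-N}
   \sim \sqrt{\frac{2}{\pi N}}. \]
```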

Nevertheless, a version of the distance bound (4.1) remains true for general distributions if $H$ is a random subspace. For spaces of codimension $m=1$ , this result was proved in . In this paper, we prove an optimal distance bound for general dimensions.

To explain the term $e^{-cN}$, consider $\pm 1$ symmetric random variables. Then with probability at least $2^{-N}$ the random vector $X$ coincides with one of the random vectors that span $H$, which makes the distance equal zero.

We will deduce Theorem 4.1 from a more general inequality that holds for arbitrary fixed subspace $H$ . This bound will depend on the arithmetic structure of the subspace $H$ , which we express using the least common denominator.

Then Theorem 3.3 quickly leads to the following general distance bound:

where $C,c>0$ depend only on the subgaussian moment.

Let us write $X$ in coordinates, $X=(\xi_{1},\ldots,\xi_{N})$ . By Lemma 3.2 and the remark below it, all coordinates of $X$ satisfy the inequality $\mathcal{L}(\xi_{k},1/2)\leq 1-b$ for some $b>0$ that depends only on the subgaussian moment of $\xi_{k}$ . Hence the random variables $\xi_{k}/2$ satisfy the assumption in Theorem 3.3.

Next, we connect the distance to a sum of independent random vectors:

For every $\theta=(\theta_{1},\ldots,\theta_{N})\in H^{\perp}$ and every $k$ we have $\langle\theta,a_{k}\rangle=\langle P_{H^{\perp}}\theta,e_{k}\rangle=\langle\theta,e_{k}\rangle=\theta_{k},$ so

The theorem now follows directly from Theorem 3.3. ∎
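The reduction in this proof can be summarized in a few lines. The sketch assumes $a_{k}=P_{H^{\perp}}e_{k}$, as the identity $\langle\theta,a_{k}\rangle=\langle P_{H^{\perp}}\theta,e_{k}\rangle$ in the text indicates, and that $H$ has codimension $m$.

```latex
% Distance as the norm of a random sum with fixed vector coefficients:
\[ \mathrm{dist}(X, H) = \|P_{H^\perp} X\|_2
   = \Big\| \sum_{k=1}^{N} \xi_k\, a_k \Big\|_2,
   \qquad a_k = P_{H^\perp} e_k. \]
% The coefficients are normalized by the trace of the projection:
\[ \sum_{k=1}^{N} \|a_k\|_2^2 = \|P_{H^\perp}\|_{\mathrm{HS}}^2 = \dim H^\perp = m. \]
% For \theta in H^\perp, <\theta, a_k> = \theta_k, so
\[ \theta \cdot a = \big( \langle \theta, a_k \rangle \big)_{k=1}^{N} = \theta. \]
```

The last identity says that the arithmetic structure of the coefficient sequence $(a_{k})$ is exactly the arithmetic structure of the vectors of $H^{\perp}$.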

In order to deduce the Distance Theorem 4.1, it will now suffice to bound below the least common denominator of a random subspace $H^{\perp}$. Heuristically, the randomness should remove any arithmetic structure from the subspace, thus making the least common denominator exponentially large. Our next result shows that this is indeed true.

Assuming that this result holds, we can complete the proof of the Distance Theorem 4.1.

Let us condition on a realization of $H$ in $\mathcal{E}$ . By the independence of $X$ and $H$ , Theorem 4.2 used with $\alpha=c\sqrt{N}$ and $\gamma=c$ gives

By the estimate on the probability of $\mathcal{E}^{c}$ , this completes the proof. ∎

Let $X_{1},\ldots,X_{N-m}$ denote the independent random vectors that span the subspace $H$ . Consider an $(N-m)\times N$ random matrix $B$ with rows $X_{k}$ . Then

This observation will help us to “navigate” the random subspace $H^{\perp}$ away from undesired sets $S$ on the unit sphere.

There exist $\delta,\rho\in(0,1)$ such that

By (4.3), $H^{\perp}\cap{\mathit{Comp}}(\delta,\rho)=\emptyset$ with probability at least $1-e^{-c_{3}N}$ . ∎

Fix the values of $\delta$ and $\rho$ given by Lemma 4.4 for the rest of this section. We will further decompose the set of incompressible vectors into level sets $S_{D}$ according to the value of the least common denominator $D$ . We shall prove a nontrivial lower bound on $\inf_{x\in S_{D}}\|Bx\|_{2}>0$ for each level set up to $D$ of the exponential order. By (4.3), this will mean that $H^{\perp}$ is disjoint from every such level set. Therefore, all vectors in $H^{\perp}$ must have exponentially large least common denominators $D$ . This is Theorem 4.3.

Let $\alpha=\mu\sqrt{N}$ , where $\mu>0$ is a small number to be chosen later, which depends only on the subgaussian moment. By Lemma 3.6,

Let $D\geq c_{0}\sqrt{N}$ . Define $S_{D}\subseteq S^{N-1}$ as

To obtain a lower bound for $\|Bx\|_{2}$ on the level set, we proceed by an $\varepsilon$ -net argument. To this end, we first need such a bound for a single vector $x$ .

Let $x\in S_{D}$ . Then for every $t>0$ we have

Denoting the elements of $B$ by $\xi_{jk}$ , we can write the $j$ -th coordinate of $Bx$ as

Now we can use the Small Ball Probability Theorem 3.3 in dimension $m=1$ for each of these random sums. By Lemma 3.2 and the remark below it, $\mathcal{L}(\xi_{jk},1/2)\leq 1-b$ for some $b>0$ that depends only on the subgaussian moment of $\xi_{jk}$ . Hence the random variables $\xi_{jk}/2$ satisfy the assumption in Theorem 3.3. This gives for every $j$ and every $t>0$ :

Since $\zeta_{j}$ are independent random variables, we can use Tensorization Lemma 2.2 of to conclude that for every $t>0$ ,

This completes the proof, because $\|Bx\|_{2}^{2}=\sum_{j=1}^{N-m}|\zeta_{j}|^{2}$ and $N\leq 2(N-m)$ by the assumption. ∎

Next, we construct a small $\varepsilon$ -net of the level set $S_{D}$ . Since this set lies in $S^{N-1}$ , Lemma 2.1 yields the existence of an $(\sqrt{N}/D)$ -net of cardinality at most $(CD/\sqrt{N})^{N}$ . This simple volumetric bound is not sufficient for our purposes, and this is the crucial step where we explore the additive structure of $S_{D}$ to construct a smaller net.

There exists a $(4\alpha/D)$ -net of $S_{D}$ of cardinality at most $(C_{0}D/\sqrt{N})^{N}$ .

Recall that $\alpha$ is chosen as a small proportion of $\sqrt{N}$ . Hence Lemma 4.7 gives a better bound than the standard volumetric bound in Lemma 2.1.
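To quantify the gain, one can compare the two nets at the same scale. The sketch assumes that the volumetric bound scales as $(C/\varepsilon)^{N}$ for an $\varepsilon$-net, which is the form behind the count $(CD/\sqrt{N})^{N}$ quoted above, and that $C_{0}$ does not depend on $\mu$.

```latex
% Volumetric bound at the scale \varepsilon = 4\alpha/D = 4\mu\sqrt{N}/D:
\[ \Big( \frac{C}{\varepsilon} \Big)^{N} = \Big( \frac{C D}{4\mu\sqrt{N}} \Big)^{N}. \]
% Lemma 4.7 at the same scale:
\[ \Big( \frac{C_0 D}{\sqrt{N}} \Big)^{N}. \]
% Ratio of the structured count to the volumetric count:
\[ \Big( \frac{C_0 D}{\sqrt{N}} \Big)^{N} \Big/ \Big( \frac{C D}{4\mu\sqrt{N}} \Big)^{N}
   = \Big( \frac{4\mu C_0}{C} \Big)^{N}. \]
```

The ratio is exponentially small in $N$ once $\mu<C/(4C_{0})$.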

We can assume that $4\alpha/D\leq 1$ , otherwise the conclusion is trivial. For $x\in S_{D}$ , denote

On the other hand, by (4.5) and using that $\|x\|_{2}=1$ , $D(x)\leq 2D$ and $4\alpha/D\leq 1$ , we obtain

Inequalities (4.6) and (4.7) show that every point $x\in S_{D}$ is within Euclidean distance $2\alpha/D$ from the set

A known volumetric argument gives a bound on the number of integer points in $B(0,3D)$ :

(where in the last inequality we used that by Definition 4.5 of the level sets, $D>c_{0}\sqrt{N}$ ). Finally, there exists a $(4\alpha/D)$ -net of $S_{D}$ with the same cardinality as $\mathcal{N}$ , and which lies in $S_{D}$ . Indeed, to obtain such a net, one selects one (arbitrary) point from the intersection of $S_{D}$ with a ball of radius $2\alpha/D$ centered at each point from $\mathcal{N}$ . This completes the proof. ∎

There exist $c_{1},c_{2},\mu\in(0,1)$ such that the following holds. Let $\alpha=\mu\sqrt{N}\geq 1$ and $D\leq c_{1}\sqrt{N}e^{c_{1}N/m}$ . Then

By Lemma 2.3, there exists $K\geq 1$ that depends only on the subgaussian moment and such that

Therefore, in order to complete the proof, it is enough to find $\nu>0$ which depends only on the subgaussian moment, and such that the event

We claim that this holds with the following choice of parameters:

where $C\geq 1$ and $c\in(0,1)$ are the constants from Lemma 4.6 and $C_{0}\geq 1$ is the constant from Lemma 4.7.

This gives for arbitrary $x_{0}\in S_{D}$ :

Now we use Lemma 4.7, which yields a small $(4\alpha/D)$ -net $\mathcal{N}$ of $S_{D}$ . Taking the union bound, we get

Denote $C_{1}:=3CC_{0}$ . Using the fact that $c_{1}\leq\nu$ and our assumption on $D$ , we have:

Assume $\mathcal{E}$ occurs. Fix $x\in S_{D}$ for which $\|Bx\|_{2}<\frac{\nu N}{2D}$ ; it can be approximated by some element $x_{0}\in\mathcal{N}$ as

Therefore, by the triangle inequality we have

where in the last inequality we used our choice of $\mu$ .

We have shown that the event $\mathcal{E}$ implies the event that

whose probability is at most $e^{-N}$ by (4.8). The proof is complete. ∎

where $c_{1}$ is the constant from Lemma 4.8. Then, by the Definition 4.5 of the level sets, either $x$ is compressible or $x\in S_{D}$ for some $D\in\mathcal{D}$ , where

Therefore, recalling the definition of the least common denominator of the subspace

we can decompose the desired probability as follows:

By Lemma 4.4, the first term in the right hand side is bounded by $e^{-cN}$. Further terms can be bounded using (4.3) and Lemma 4.8:

Since there are $|\mathcal{D}|\leq C^{\prime}N$ terms in the sum, we conclude that

Decomposition of the sphere

Now we begin the proof of Theorem 1.1. We will make several useful reductions first.

Without loss of generality, we can assume that the entries of $A$ have an absolutely continuous distribution. Indeed, we can add to each entry an independent Gaussian random variable with small variance $\sigma$, and later let $\sigma\to 0$.

Similarly, we can assume that $n\geq n_{0}$ , where $n_{0}$ is a suitably large number that depends only on the subgaussian moment $B$ .

with suitably small constant $c_{0}>0$ that depends only on the subgaussian moment $B$ . Indeed, as we remarked in the Introduction, for the values of $d$ above a constant proportion of $n$ , Theorem 1.1 follows from (1.7). Note that
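The quantities compared here are related by an elementary identity. The sketch assumes $d=N-n+1$, which is consistent with the relation $N-(n-d)=2d-1$ used in Section 7.

```latex
% Rationalize the difference of square roots:
\[ \sqrt{N} - \sqrt{n-1}
   = \frac{N - (n-1)}{\sqrt{N} + \sqrt{n-1}}
   = \frac{d}{\sqrt{N} + \sqrt{n-1}}. \]
% Since 0 \le n-1 \le N, the denominator lies between \sqrt{N} and 2\sqrt{N}:
\[ \frac{d}{2\sqrt{N}} \le \sqrt{N} - \sqrt{n-1} \le \frac{d}{\sqrt{N}}. \]
```

Hence a lower bound on $s_{n}(A)$ at the scale $\varepsilon d/\sqrt{N}$ is equivalent, up to a factor $2$ in $\varepsilon$, to a lower bound at the scale $\varepsilon(\sqrt{N}-\sqrt{n-1})$.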

Using the decomposition of the sphere $S^{n-1}={\mathit{Comp}}\cup{\mathit{Incomp}}$ , we break the invertibility problem into two subproblems, for compressible and incompressible vectors:

A bound for the compressible vectors follows from Lemma 2.6. Using (5.1) we get

It remains to find a lower bound on $\|Ax\|$ for the incompressible vectors $x$ .

Invertibility via uniform distance bounds

In this section, we reduce the problem of bounding $\|Ax\|_{2}$ for incompressible vectors $x$ to the distance problem that we addressed in Section 4.

For levels $K_{1},K_{2}>0$ that will only depend on $\delta,\rho$ , we define the set of totally spread vectors

For every $\delta,\rho\in(0,1)$ , there exist $K_{1},K_{2},c_{0}>0$ which depend only on $\delta,\rho$ , and such that the following holds. For every $x\in{\mathit{Incomp}}(\delta,\rho)$ , the event

The proof gives $K_{1}=\rho\sqrt{\delta/2}$ , $K_{2}=1/K_{1}$ , $c_{0}=\rho^{2}\delta/2e$ . In the rest of the proof, we shall use definition (6.1) of $\operatorname{Spread}_{J}$ with these values of the levels $K_{1}$ , $K_{2}$ .

Let $\sigma\subset\{1,\ldots,n\}$ be the subset from Lemma 2.5. Recall that the parameters $\delta$ and $\rho$ depend only on the subgaussian moment $B$ (see Lemma 2.6). By choosing the constant $c_{0}$ in (5.1) appropriately small, we may assume that $d\leq|\sigma|/2$ . Then, using Stirling’s approximation we have

If $J\subset\sigma$ , then summing (2.1) over $k\in J$ , we obtain the required two-sided bound for $\|P_{J}x\|_{2}$ . This and (2.1) yields $\frac{P_{J}x}{\left\|P_{J}x\right\|_{2}}\in\operatorname{Spread}_{J}$ . Hence $\mathcal{E}(x)$ holds. ∎
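The counting step in this proof can be made explicit. The sketch assumes that $J$ is a uniformly random $d$-element subset of $\{1,\ldots,n\}$, and it uses the standard bounds $(k/d)^{d}\leq\binom{k}{d}\leq(ek/d)^{d}$ together with $|\sigma|\geq\frac{1}{2}\rho^{2}\delta n$ from Lemma 2.5.

```latex
\[ \mathbb{P}_J( J \subset \sigma )
   = \binom{|\sigma|}{d} \Big/ \binom{n}{d}
   \ge \frac{(|\sigma|/d)^d}{(en/d)^d}
   = \Big( \frac{|\sigma|}{en} \Big)^{d}
   \ge \Big( \frac{\rho^2 \delta}{2e} \Big)^{d}
   = c_0^{\,d}. \]
```

This reproduces the value $c_{0}=\rho^{2}\delta/2e$ recorded after Lemma 6.1.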

Let $\delta,\rho\in(0,1)$ . There exist $C_{1},c_{1}>0$ which depend only on $\delta,\rho$ , and such that the following holds. Let $J$ be any $d$ -element subset of $\{1,\ldots,n\}$ . Then for every $\varepsilon>0$

The proof gives $K_{1}=\rho\sqrt{\delta/2}$ , $K_{2}=1/K_{1}$ , $c_{1}=\rho/\sqrt{2}$ , $C_{1}=2e/\rho^{2}\delta$ .

Let $x\in{\mathit{Incomp}}(\delta,\rho)$ . For every subset $J$ of $\{1,\ldots,n\}$ we have

In case the event $\mathcal{E}(x)$ of Lemma 6.1 holds, we use the vector $z=\frac{P_{J}x}{\left\|P_{J}x\right\|_{2}}\in\operatorname{Spread}_{J}$ to check that

is independent of $x$ . Moreover, using the estimate on $\|P_{J}x\|_{2}$ in the definition of the event $\mathcal{E}(x)$ , we conclude that

where $c_{0}$ is the constant from Lemma 6.1. The Chebychev inequality and the Fubini theorem then yield

Fix any realization of $A$ for which $\mathcal{F}$ occurs, and fix any $x\in{\mathit{Incomp}}(\delta,\rho)$ . Then

We have proved that for every $x\in{\mathit{Incomp}}(\delta,\rho)$ there exists a subset $J=J(x)$ that satisfies both $\mathcal{E}(x)$ and $D(A,J)\geq\varepsilon$ . Using this $J$ in (6.3), we conclude that every matrix $A$ for which the event $\mathcal{F}$ occurs satisfies

The uniform distance bound

In this section, we shall estimate the distance between a random ellipsoid and a random independent subspace. This is the distance that we need to bound in the right hand side of (6.2).

Throughout this section, we let $J$ be a fixed subset of $\{1,\ldots,n\}$ , $|J|=d$ . We shall use the notation introduced in the beginning of Section 6. Thus, $H_{J}$ denotes a random subspace, and $\operatorname{Spread}_{J}$ denotes the totally spread set whose levels $K_{1}$ , $K_{2}$ depend only on $\delta$ , $\rho$ in the definition of incompressibility.

We will denote by $K,K_{0},C,c,C_{1},c_{1},\ldots$ positive numbers that depend only on $\delta$ , $\rho$ and the subgaussian moment $B$ .

Recall that $H_{J^{c}}$ is the span of $n-d$ independent random vectors. Since their distribution is absolutely continuous (see the beginning of Section 5), these vectors are almost surely in general position, so

Without loss of generality, in the proof of Theorem 7.1 we can assume that

We would like to prove Theorem 7.1 by a typical $\varepsilon$-net argument. Theorem 4.1 will give a useful probability bound for an individual $z\in S^{n-1}$. We might then take a union bound over all $z$ in an $\varepsilon$-net of $\operatorname{Spread}_{J}$ and complete the argument by approximation. However, the standard approximation argument will leave us with a larger error $e^{-cd}$ in the probability, which is unsatisfactory for small $d$. To overcome this difficulty, we shall refine this approach using decoupling in Section 7.2.

For now, we start with a bound for an individual $z\in S^{n-1}$ .

Denote the entries of matrix $A$ by $\xi_{ij}$ . Then the entries of the random vector $Az$ ,

are independent and identically distributed mean zero random variables. Moreover, since the random variables $\xi_{ij}$ are subgaussian and $\sum_{j=1}^{n}z_{j}^{2}=1$ , the random variables $\zeta_{i}$ are also subgaussian (see Fact 2.1 in ).
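The claim about the coordinates of $Az$ can be checked numerically. The following is a minimal sketch, assuming symmetric $\pm 1$ entries (one of the subgaussian examples mentioned in the introduction) and hypothetical sizes `N`, `n`, `trials`. It estimates the mean and variance of the coordinates $\zeta_{i}$ and compares one empirical tail probability with the Hoeffding bound $2e^{-t^{2}/2}$, which holds for sums of $\pm 1$ variables with unit coefficient vector.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, trials = 50, 40, 2000      # hypothetical sizes, chosen only for illustration

# A fixed unit vector z in S^{n-1}.
z = rng.standard_normal(n)
z /= np.linalg.norm(z)

# Each trial draws a fresh N x n matrix with symmetric +-1 entries.
A = rng.choice([-1.0, 1.0], size=(trials, N, n))
zeta = A @ z                     # shape (trials, N): the coordinates zeta_i of Az

sample_mean = zeta.mean()        # should be close to 0
sample_var = zeta.var()          # should be close to sum_j z_j^2 = 1

# Hoeffding's inequality for +-1 sums: P(|zeta_i| > t) <= 2 exp(-t^2 / 2).
t = 2.0
empirical_tail = np.mean(np.abs(zeta) > t)
hoeffding_bound = 2 * np.exp(-t**2 / 2)

print(sample_mean, sample_var, empirical_tail, hoeffding_bound)
```

The normalization $\sum_{j}z_{j}^{2}=1$ is what keeps the variance, and the subgaussian tail, of $\zeta_{i}$ independent of $n$.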

Therefore the random vector $X=Az$ and the random subspace $H=H_{J^{c}}$ satisfy the assumptions of Theorem 4.1 with $m=N-(n-d)=2d-1$ (we used (7.1) here). An application of Theorem 4.1 completes the proof. ∎

Since $|J|=d$ and almost surely $\dim(H_{J^{c}})^{\perp}=N-(n-d)=2d-1$, the random matrix $W$ acts as an operator from a $d$-dimensional subspace into a $(2d-1)$-dimensional subspace. Although the entries of $W$ are not necessarily independent, we expect $W$ to behave as if this were the case. To this end, we condition on the realization of the subspace $H_{J^{c}}$. Now the operator $P$ becomes a fixed projection, and the columns of $W$ become independent random vectors. Then $W$ satisfies a version of Proposition 2.3:

Using Proposition 7.3, we can choose a constant $K_{0}$ that depends only on the subgaussian moment, and such that

With this bound on the norm of $W$ , we can run the approximation argument and prove the distance bound in Lemma 7.2 uniformly over all $z\in\operatorname{Spread}_{J}$ .
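The role of $W$ can be illustrated numerically. The sketch below rests on assumptions that the surrounding text only describes in words: $H_{J^{c}}$ is taken to be the span of the columns of $A$ indexed by $J^{c}$, $P$ is the orthogonal projection onto $(H_{J^{c}})^{\perp}$, and $W$ is $PA$ restricted to the coordinates in $J$. The sizes, the Gaussian entries and the choice of $J$ are hypothetical. Under these assumptions, the distance from $Az$ to $H_{J^{c}}$ equals $\|Wz\|_{2}$ for $z$ supported on $J$, and the range of $P$ has dimension $2d-1$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 20
N = n - 1 + d                       # so that N - (n - d) = 2d - 1
J = np.arange(d)                    # hypothetical choice of the index set J
Jc = np.arange(d, n)

A = rng.standard_normal((N, n))     # Gaussian entries as one subgaussian example

# Orthonormal basis of H_{J^c}, assumed to be the span of the columns in J^c.
Q, _ = np.linalg.qr(A[:, Jc])       # Q has shape (N, n - d)
P = np.eye(N) - Q @ Q.T             # orthogonal projection onto (H_{J^c})^perp

W = P @ A[:, J]                     # N x d matrix with range in (H_{J^c})^perp

# A unit vector z supported on J.
z = np.zeros(n)
z[J] = rng.standard_normal(d)
z /= np.linalg.norm(z)

# Distance from Az to H_{J^c}, computed independently by least squares.
coeffs, *_ = np.linalg.lstsq(A[:, Jc], A @ z, rcond=None)
dist = np.linalg.norm(A @ z - A[:, Jc] @ coeffs)

codim = int(round(np.trace(P)))     # dimension of (H_{J^c})^perp
norm_ratio = np.linalg.norm(W, 2) / np.sqrt(d)   # ||W|| measured in units of sqrt(d)

print(codim, dist, np.linalg.norm(W @ z[J]), norm_ratio)
```

In this picture, the approximation argument only needs two quantities: the values $\|Wz\|_{2}$ on a net and the operator norm $\|W\|$, which is expected to be of order $\sqrt{d}$.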

Let $W$ be a random matrix as in Proposition 7.3. Then for every $t$ that satisfies (7.2) we have

Taking the union bound and using the representation (7.4) in Lemma 7.2, we obtain

Now, suppose the event in (7.6) holds, i.e. there exists $z^{\prime}\in\operatorname{Spread}_{J}$ such that

Choose $z\in\mathcal{N}$ such that $\|z-z^{\prime}\|_{2}\leq\varepsilon$ . Then by the triangle inequality

Therefore, $\mathcal{E}$ holds. The bound on the probability of $\mathcal{E}$ completes the proof. ∎
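The approximation step can be written out as a short worked inequality. This is a sketch under assumptions that are not visible in the surrounding text: the event in question is taken to consist of $\|Wz^{\prime}\|_{2}<t\sqrt{d}$ together with $\|W\|\leq K_{0}\sqrt{d}$, and the mesh of the net $\mathcal{N}$ is taken to be $\varepsilon=t/K_{0}$, a hypothetical choice made so that the two terms balance.

```latex
\begin{align*}
\|Wz\|_{2}
  &\le \|Wz'\|_{2} + \|W\|\,\|z - z'\|_{2} \\
  &<   t\sqrt{d} + K_{0}\sqrt{d}\cdot\varepsilon \\
  &=   2t\sqrt{d}
  \qquad \text{for } \varepsilon = t/K_{0}.
\end{align*}
```

Under these assumptions, passing from $\operatorname{Spread}_{J}$ to the net costs only a factor of $2$ in $t$, at the price of controlling $\|W\|$ on the same event.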

By representation (7.4), this is a weaker version of Theorem 7.1, with $e^{-d}$ instead of $e^{-cN}$. Unfortunately, this bound is too weak for small $d$. In particular, for square matrices we have $d=1$, and the bound is useless.

In the next section, we will refine our current approach using decoupling.

7.2. Refinement: decoupling

Our problem is that the probability bound in (7.5) is too weak. We will bypass this by decomposing our event according to all possible values of $\|W\|$ , and by decoupling the information about $\|Wz\|_{2}$ from the information about $\|W\|$ .

Let $W$ be an $N\times d$ matrix whose columns are independent random vectors. Let $\beta>0$ and let $z\in S^{d-1}$ be a vector satisfying $|z_{k}|\geq\frac{\beta}{\sqrt{d}}$ for all $k\in\{1,\ldots,d\}$ . Then for every $0<a<b$ , we have

If $d=1$ then $\|W\|=\|Wz\|_{2}$, so the probability in the left-hand side is zero. So, let $d\geq 2$. Then we can decompose the index set $\{1,\ldots,d\}$ into two disjoint subsets $I$ and $H$ whose cardinalities differ by at most $1$, say with $|I|=\lceil d/2\rceil$.

We write $W=W_{I}+W_{H}$ where $W_{I}$ and $W_{H}$ are the submatrices of $W$ with columns in $I$ and $H$ respectively (the remaining columns being set to zero). Similarly, we write $z=z_{I}+z_{H}$, where $z_{I}$ and $z_{H}$ are the restrictions of $z$ to $I$ and $H$.

Since $\|W\|^{2}\leq\|W_{I}\|^{2}+\|W_{H}\|^{2}$ , we have

and similarly for $p_{H}$ . It suffices to bound $p_{I}$ ; the argument for $p_{H}$ is similar.

Writing $Wz=W_{I}z_{I}+W_{H}z_{H}$ and using the independence of the matrices $W_{I}$ and $W_{H}$ , we conclude that

(In the last line we used $W_{I}z_{I}=Wz_{I}$ and $\|W_{H}\|\leq\|W\|$ ).

By the assumption on $z$ and since $|I|\geq d/2$ , we have

Hence for $x:=z_{I}/\|z_{I}\|_{2}$ and $u:=w/\|z_{I}\|_{2}$ , we obtain

Together with (7.7), this completes the proof. ∎
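The deterministic facts used in this proof can be checked on a random instance. The sketch below is only an illustration, with hypothetical sizes `N`, `d` and a hypothetical value of `beta`. It splits the index set into $I$ and $H$ with $|I|=\lceil d/2\rceil$, forms the zero-padded pieces, and verifies $\|W\|^{2}\leq\|W_{I}\|^{2}+\|W_{H}\|^{2}$, $W_{I}z_{I}=Wz_{I}$, $\|W_{H}\|\leq\|W\|$, and the lower bound $\|z_{I}\|_{2}^{2}\geq\beta^{2}/2$ that follows from $|z_{k}|\geq\beta/\sqrt{d}$ and $|I|\geq d/2$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 12, 7                      # hypothetical sizes
beta = 0.5                        # hypothetical spread parameter

W = rng.standard_normal((N, d))

# A unit vector with |z_k| >= beta / sqrt(d): magnitudes in [1, 2] before normalizing.
z = rng.uniform(1.0, 2.0, size=d) * rng.choice([-1.0, 1.0], size=d)
z /= np.linalg.norm(z)

# Split {0, ..., d-1} into I and H with |I| = ceil(d / 2).
I = np.arange((d + 1) // 2)
H = np.arange((d + 1) // 2, d)

# Zero-padded pieces, so that W = W_I + W_H and z = z_I + z_H.
W_I, W_H = np.zeros_like(W), np.zeros_like(W)
W_I[:, I], W_H[:, H] = W[:, I], W[:, H]
z_I, z_H = np.zeros(d), np.zeros(d)
z_I[I], z_H[H] = z[I], z[H]

def op_norm(M):
    """Operator (spectral) norm of a matrix."""
    return np.linalg.norm(M, 2)

norm_sq_gap = op_norm(W_I) ** 2 + op_norm(W_H) ** 2 - op_norm(W) ** 2  # >= 0
same_action = np.allclose(W_I @ z_I, W @ z_I)                          # W_I z_I = W z_I
mass_on_I = np.linalg.norm(z_I) ** 2                                   # >= beta^2 / 2

print(norm_sq_gap, same_action, mass_on_I, beta ** 2 / 2)
```

The point of the split is that $W_{I}z_{I}$ depends only on the columns in $I$, while $\|W_{H}\|$ depends only on the columns in $H$, so the two pieces of information are independent.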

We use this decoupling in the following refinement of Lemma 7.4.

We condition on the realization of the subspace $H_{J^{c}}$ as above to make the columns of $W$ independent. By the definition (6.1) of $\operatorname{Spread}_{J}$ , any $z\in\mathcal{N}$ satisfies the condition of the Decoupling Proposition 7.5 with $\beta=K_{1}$ . Taking the union bound and then using Proposition 7.5, we obtain

Assume now that $\operatorname{LCD}_{\alpha,c}(H_{J^{c}}^{\perp})\geq c\sqrt{N}e^{cN/m}$ , where $\alpha$ and $c$ are as in Theorem 4.3. Then using Proposition 7.3 and representation (7.4), we conclude as in the proof of Theorem 4.1 that

for any $t$ satisfying (7.2). Since $s\geq 1$ and $d\geq 1$ , we can bound this as

Now, suppose the event in (7.8) holds, i.e. there exists $z^{\prime}\in\operatorname{Spread}_{J}$ such that

Choose $z\in\mathcal{N}$ such that $\|z-z^{\prime}\|_{2}\leq\varepsilon$ . Then by the triangle inequality

Therefore, $\mathcal{E}$ holds. The bound on the probability of $\mathcal{E}$ completes the proof. ∎

Recall that, without loss of generality, we assumed that (7.2) held. Let $k_{1}$ be the smallest natural number such that

where $C_{0}$ and $K_{0}$ are constants from Proposition 2.3 and Lemma 7.6 respectively. Summing the probability estimates of Lemma 7.4 and Lemma 7.6 for $s=2^{k}$, $k=1,\ldots,k_{1}$, we conclude that

By (7.9) and Proposition 2.3, the last expression does not exceed $(Ct)^{d}+e^{-cN}$ . In view of representation (7.4), this completes the proof. ∎

Completion of the proof

In Section 6, we reduced the invertibility problem for incompressible vectors to computing the distance between a random ellipsoid and a random subspace. This distance was estimated in Section 7. These together lead to the following invertibility bound:

Let $\delta,\rho\in(0,1)$ . There exist $C,c>0$ which depend only on $\delta,\rho$ , and such that the following holds. For every $t>0$ ,

Without loss of generality, we can assume that (7.2) holds. We use Lemma 6.2 with $\varepsilon=t\sqrt{d}$ and then Theorem 7.1 to get the bound $(C^{\prime}t)^{d}$ on the desired probability. This completes the proof. ∎

This follows directly from (5.2), (5.3), and Theorem 8.1. ∎