The smallest singular value of a random rectangular matrix
Mark Rudelson, Roman Vershynin
Introduction
Extreme singular values of random matrices has been of considerable interest in mathematical physics, geometric functional analysis, numerical analysis and other fields. Consider an real matrix with . The singular values of are the eigenvalues of arranged in nonincreasing order. Of particular significance are the largest and the smallest singular values
A natural matrix model is given by matrices whose entries are independent real random variables with certain moment assumptions. In this paper, we shall consider subgaussian random variables – those whose tails are dominated by that of the standard normal random variable. Namely, a random variable is called subgaussian if there exists such that
The minimal in this inequality is called the subgaussian moment of . Inequality (1.2) is often equivalently formulated as the moment condition
where is an absolute constant. The class of subgaussian random variables includes many random variables that arise naturally in applications, such as normal, symmetric and general bounded random variables.
In this paper, we study real random matrices whose entries are independent and identically distributed mean zero subgaussian random variables. The asymptotic behavior of the extreme singular values of is well understood. If the entries have unit variance and the dimension grows to infinity while the aspect ratio converges to a constant , then
almost surely. This result was proved in for Gaussian matrices, and in for matrices with independent and identically distributed entries with finite fourth moment. In other words, we have asymptotically
Considerable efforts were made recently to establish non-asymptotic estimates similar to (1.4), which would hold for arbitrary fixed dimensions and ; see the survey on the largest singular value, and the discussion below on the smallest singular value.
The largest singular value is relatively easy to bound above, up to a constant factor. Indeed, a standard covering argument shows that is at most of the optimal order for all fixed dimensions, see Proposition 2.3 below. The smallest singular value is significantly harder to control. The efforts to prove optimal bounds on have a long history, which we shall now outline.
2. Tall matrices
A result of provides an optimal bound for tall matrices, those with aspect ratio satisfies for some sufficiently small constant . Recalling (1.4), one should expect that tall matrices satisfy
It was indeed proved in that for tall matrices one has
where and are absolute constants.
3. Almost square matrices
As we move toward square matrices, thus making the aspect ratio arbitrarily close to , the problem of estimating the smallest singular value becomes harder. One still expects (1.5) to be true as long as is any constant. Indeed, this was proved in for arbitrary aspect ratios and for general random matrices with independent subgaussian entries. One has
where depends only on and the maximal subgaussian moment of the entries.
In subsequent work , the dependence of on the aspect ratio in (1.7) was improved for random matrices; however the probability estimate there was weaker than in (1.7). An estimate for subgaussian random matrices of all dimensions was obtained in . For any , it was shown that
However, because of the factor , this estimate is suboptimal and does not correspond to the expected asymptotic behavior (1.4).
4. Square matrices
The extreme case for the problem of estimating the singular value is for the square matrices, where . Asymptotic (1.4) is useless for square matrices. However, for “almost” square matrices, those with constant defect , the quantity is of order , so asymptotics (1.4) heuristically suggests that these matrices should satisfy
This conjecture was proved recently in for all square subgaussian matrices:
5. New result: bridging all classes of matrices
In this paper, we prove the conjectural bound for valid for all subgaussian matrices in all fixed dimensions . The bound is optimal for matrices with all aspect ratios we encountered above.
Let be an random matrix, , whose elements are independent copies of a mean zero subgaussian random variable with unit variance. Then, for every , we have
where depend (polynomially) only on the subgaussian moment .
For tall matrices, Theorem 1.1 clearly amounts to the known estimates (1.5), (1.6). For square matrices (), the quantity is of order , so Theorem 1.1 amounts to the known estimates (1.8), (1.9). Finally, for matrices that are arbitrarily close to square, Theorem 1.1 yields the new optimal estimate
This is a version of the asymptotics (1.4), now valid for all fixed dimensions. This bound was explicitly conjectured e.g. in .
However, this bound is not optimal, and it becomes useless for matrices that are close to square, when .
The form of estimate (1.10) may be expected if one recalls the classical -net argument, which underlies many proofs in geometric functional analysis. By (1.1), we are looking for a lower bound on that would hold uniformly for all vectors on the unit Euclidean sphere . For every fixed , the quantity is the sum of independent random variables (the squares of the coordinates of ). Therefore, the deviation inequalities make us to expect that is of the order with probability exponential in , i.e. . We can run this argument separately for each vector in a small net of the sphere , and then take the union bound to make the estimate uniform over . It is known how to choose a net of cardinality exponential in the dimension of the sphere, i.e. . Therefore, with probability , we have a good lower bound on for all vectors in the net . Finally, one transfers this estimate from the net to the whole sphere by approximation.
The problem with this argument is that the constants and are not the same. Therefore, our estimate on the probability is positive only for tall matrices, when . To reach out to matrices of arbitrary dimensions, one needs to develop much more sensitive versions of the -net arguments. Nevertheless, the end result stated in Theorem 1.1 exhibits the same two forces played against one another – the probability quantified by the dimension and the complexity of the sphere quantified by its dimension .
6. Small ball probabilities, distance problems, and additive structure
Our proof of Theorem 1.1 is a development of our method in for square matrices. Dealing with rectangular matrices is in several ways considerably harder. Several new tools are developed in this paper, which may be of independent interest.
To prove (1.12), we first use the small ball probability inequalities to compute the distance to an arbitrary subspace . This estimate necessarily depends on the additive structure of the subspace ; the less structure, the better is our estimate, see Theorem 4.2. We then prove the intuitively plausible fact that random subspaces have no arithmetic structure, see Theorem 4.3. This together leads to the desired distance estimate (1.12).
The distance bound is then used to prove our main result, Theorem 1.1. Let be some column of the random matrix and be the span of the other columns. The simple rank argument shows that the smallest singular value if and only if for some column. A simple quantitative version of this argument is that a lower estimate on yields a lower bound on .
In Section 6, we show how to reverse this argument for random matrices – deduce a lower bound on the smallest singular value from lower bound (1.12) on the distance . Our reverse argument is harder than its version for square matrices from , where we had . First, instead of one column we now have to consider all linear combinations of columns; see Lemma 6.2. To obtain a distance bound that would be uniformly good for all such linear combinations, one would normally use an -net argument. However, the distance to the -dimensional subspace is not sufficiently stable for this argument to be useful for small (for matrices close to square). We therefore develop a decoupling argument in Section 7 to bypass this difficulty.
Once this is done, the proof is quickly completed in Section 8.
Acknowledgement
We are grateful to Shuheng Zhou, Nicole Tomczak-Jaegermann, Radoslaw Adamczak, and the anonymous referee for pointing out several inaccuracies in our argument. The second named author is grateful for his wife Lilia for her love and patience during the years this paper was being written.
Notation and preliminaries
Throughout the paper, positive constants are denoted Unless otherwise stated, these are absolute constants. In some of our arguments they may depend (polynomially) on specified parameters, such as the subgaussian moment .
The following Lemma is a variant of the well known volumetric estimate.
Let be a subset of , and let . Then there exists an -net of of cardinality at most
The published variants of his lemma (e.g. , Lemma 2.6) have exponent rather than . Since the latter exponent will be crucial for our purposes, we include the proof of this lemma for the reader’s convenience.
Without loss of generality we can assume that , otherwise any single point forms a desired net. Let be an -separated subset of of maximal cardinality. By maximality, is an -net of . Since is -separated, the balls with centers are disjoint. All these balls have the same volume, and they are contained in the spherical shell . Therefore, comparing the volumes, we have
Dividing both sides of this inequality by , we obtain
Using the inequality valid for , we conclude that is bounded as desired. This completes the proof. ∎
The following well known argument allows one to compute the norm of a linear operator using nets. We have not found a published reference to this argument, so we include it for the reader’s convenience.
Every has the form , where and . Since , the triangle inequality yields
The last term in the right hand side is bounded by . Therefore we have shown that
Fix . Repeating the above argument for yields the bound
The two previous estimates complete the proof. ∎
Using nets, one easily proves the well known basic bound on the norm of a random subgaussian matrix:
Let be an random matrix, , whose elements are independent copies of a subgaussian random variable. Then
where depend only on the subgaussian moment .
Let be a -net of and be a -net of . By Proposition 2.1, we can choose these nets such that
For every and , the random variable is subgaussian (see Fact 2.1 in ), thus
where depend only on the subgaussian moment . Using Lemma 2.2 and taking the union bound, we obtain
2. Compressible and incompressible vectors
In our proof of Theorem 1.1, we will make use of a partition of the unit sphere into two sets of compressible and incompressible vectors. These sets were first defined in as follows.
We now recall without proof two simple results. The first is Lemma 3.4 from :
Let . Then there exists a set of cardinality and such that
The other result is a variant of Lemma 3.3 from , which establishes the invertibility on compressible vectors, and allows us to focus on incompressible vectors in our proof of Theorem 1.1. While Lemma 3.3 was formulated in for a square matrix, the same proof applies to matrices, provided that .
Let be an random matrix, , whose elements are independent copies of a subgaussian random variable. There exist depending only on the subgaussian moment such that
Small ball probability and the arithmetic structure
Starting from the works of Lévy , Kolmogorov and Esséen , a number of results in probability theory was concerned with the question how spread the sums of independent random variables are. It is convenient to quantify the spread of a random variable in the following way.
An equivalent way of looking at the Lévy concentration function is that it measures the small ball probabilities – the likelihood that the random vector enters a small ball in the space. An exposition of the theory of small ball probabilities can be found in .
One can derive a simple but rather weak bound on Lévy concentration function from Paley-Zygmund inequality.
Let be a random variable with mean zero, unit variance, and finite fourth moment. Then for every there exists which depends only on and on the fourth moment, and such that
In particular, this bound holds for subgaussian random variables, and with that depends only on and the subgaussian moment.
We use Paley-Zygmund inequality, which states for a random variable that
so, using Minkowski inequality, we obtain
We will need a much stronger bound on the concentration function for sums of independent random variables. Here we present a multi-dimensional version of the inverse Littlewood-Offord inequality from . While this paper was in preparation, Friedland and Sodin proposed two different ways to simplify and improve our argument in . We shall therefore present here a multi-dimensional version of one of arguments of Friedland and Sodin , which is considerably simpler than our original proof.
In the scalar case, when , the additive structure of a sequence of real numbers can be described in terms of the shortest arithmetic progression into which it (essentially) embeds. This length is conveniently expressed as the essential least common denominator of , defined as follows. We fix parameters , and define
A more traditional way of looking at is to regard it as the product of the matrix with rows and the vector .
Then we define, for and ,
The following theorem gives a bound on the small ball probability for a random sum in terms of the additive structure of the coefficient sequence . The less structure in , the bigger its least common denominator is, and the smaller is the small ball probability for .
Let be independent and identically distributed, mean zero random variables, such that for some . Consider the random sum . Then, for every and , and for
Halász developed a powerful approach to bounding concentration function; his approach influenced our arguments below. Halász operated under a similar non-degeneracy condition on the vectors : for every , at least terms satisfy . After properly rescaling by the factor , Halász’s condition is seen to be more restrictive than (3.2).
To estimate the Lévy concentration function we apply the Esséen Lemma, see e.g. , p. 290.
Applying Lemma 3.4 to the vector and using the independence of random variables , we obtain
Substituting of this into (3.3) and using Jensen’s inequality, we get
The next and major step is to bound the size of the recurrence set
Recall that by the assumption of the theorem,
Therefore, by the definition of the least common denominator, we have that either
In the latter case, since , inequalities (3.4) and (3.5) together yield
where the last inequality follows from condition (3.2).
Recalling the definition of , we have proved that every pair of points satisfies:
It follows that can be covered by Euclidean balls of radii , whose centers are -separated in the Euclidean distance. Since , the number of such balls is at most
which completes the proof of the lemma. ∎
We decompose the domain into two parts. First, by the definition of , we have
In the last line, we used the estimate .
Second, by the integral distribution formula and using Lemma 3.5, we have
Combining (3.1) and (3.1) completes the proof of Theorem 3.3. ∎
2. Least common denominator of incompressible vectors
The proof gives and .
By Lemma 2.5, there exists a set of size
This shows in particular that ; dividing by gives
Then by Chebychev inequality, there exists a set of size
Since , there exists . Fix this . By the left hand side of (3.8), by (3.9) and the assumption on we have:
Thus ; since is an integer, this yields . Similarly, using the right hand side of (3.8), (3.9) and the assumption on , we get
The distance problem and arithmetic structure
More precisely, standard computations give for every that
However, if has a more general distribution with independent coordinates, the distance may strongly depend on the subspace . For example, if the coordinates of are symmetric random variables. then for the distance equals with probability , while for the distance equals with probability .
Nevertheless, a version of the distance bound (4.1) remains true for general distributions if is a random subspace. For spaces of codimension , this result was proved in . In this paper, we prove an optimal distance bound for general dimensions.
To explain the term , consider symmetric random variables. Then with probability at least the random vector coincides with one of the random vectors that span , which makes the distance equal zero.
We will deduce Theorem 4.1 from a more general inequality that holds for arbitrary fixed subspace . This bound will depend on the arithmetic structure of the subspace , which we express using the least common denominator.
Then Theorem 3.3 quickly leads to the following general distance bound:
where depend only on the subgaussian moment.
Let us write in coordinates, . By Lemma 3.2 and the remark below it, all coordinates of satisfy the inequality for some that depends only on the subgaussian moment of . Hence the random variables satisfy the assumption in Theorem 3.3.
Next, we connect the distance to a sum of independent random vectors:
For every and every we have so
The theorem now follows directly from Theorem 3.3. ∎
In order to deduce the Distance Theorem 4.1, it will now suffice to bound below the least common denominator of a random subspace . Heuristically, the randomness should remove any arithmetic structure from the subspace, thus making the least common denominator exponentially large. Our next results shows that this is indeed true.
Assuming that this result holds, we can complete the proof of the Distance Theorem 4.1.
Let us condition on a realization of in . By the independence of and , Theorem 4.2 used with and gives
By the estimate on the probability of , this completes the proof. ∎
Let denote the independent random vectors that span the subspace . Consider an random matrix with rows . Then
This observation will help us to “navigate” the random subspace away from undesired sets on the unit sphere.
There exist such that
By (4.3), with probability at least . ∎
Fix the values of and given by Lemma 4.4 for the rest of this section. We will further decompose the set of incompressible vectors into level sets according to the value of the least common denominator . We shall prove a nontrivial lower bound on for each level set up to of the exponential order. By (4.3), this will mean that is disjoint from every such level set. Therefore, all vectors in must have exponentially large least common denominators . This is Theorem 4.3.
Let , where is a small number to be chosen later, which depends only on the subgaussian moment. By Lemma 3.6,
Let . Define as
To obtain a lower bound for on the level set, we proceed by an -net argument. To this end, we first need such a bound for a single vector .
Let . Then for every we have
Denoting the elements of by , we can write the -th coordinate of as
Now we can use the Small Ball Probability Theorem 3.3 in dimension for each of these random sums. By Lemma 3.2 and the remark below it, for some that depends only on the subgaussian moment of . Hence the random variables satisfy the assumption in Theorem 3.3. This gives for every and every :
Since are independent random variables, we can use Tensorization Lemma 2.2 of to conclude that for every ,
This completes the proof, because and by the assumption. ∎
Next, we construct a small -net of the level set . Since this set lies in , Lemma 2.1 yields the existence of an -net of cardinality at most . This simple volumetric bound is not sufficient for our purposes, and this is the crucial step where we explore the additive structure of to construct a smaller net.
There exists a -net of of cardinality at most .
Recall that is chosen as a small proportion of . Hence Lemma 4.7 gives a better bound than the standard volumetric bound in Lemma 2.1.
We can assume that , otherwise the conclusion is trivial. For , denote
On the other hand, by (4.5) and using that , and , we obtain
Inequalities (4.6) and (4.7) show that every point is within Euclidean distance from the set
A known volumetric argument gives a bound on the number of integer points in :
(where in the last inequality we used that by Definition 4.5 of the level sets, ). Finally, there exists a -net of with the same cardinality as , and which lies in . Indeed, to obtain such a net, one selects one (arbitrary) point from the intersection of with a ball of radius centered at each point from . This completes the proof. ∎
There exist such that the following holds. Let and . Then
By Lemma 2.3, there exists that depends only on the subgaussian moment and such that
Therefore, in order to complete the proof, it is enough to find which depends only on the subgaussian moment, and such that the event
We claim that this holds with the following choice of parameters:
where and are the constants from Lemma 4.6 and is the constant from Lemma 4.7.
This gives for arbitrary :
Now we use Lemma 4.7, which yields a small -net of . Taking the union bound, we get
Denote . Using the fact that and our assumption on , we have:
Assume occurs. Fix for which ; it can be approximated by some element as
Therefore, by the triangle inequality we have
where in the last inequality we used our choice of .
We have shown that the event implies the event that
whose probability is at most by (4.8). The proof is complete. ∎
where is the constant from Lemma 4.8. Then, by the Definition 4.5 of the level sets, either is compressible or for some , where
Therefore, recalling the definition of the least common denominator of the subspace
we can decompose the desired probability as follows:
By Lemma 4.4, the first term in the right hand side is bounded by . Further terms can be bonded using (4.3) and Lemma 4.8:
Since there are terms in the sum, we conclude that
Decomposition of the sphere
Now we begin the proof of Theorem 1.1. We will make several useful reductions first.
Without loss of generality, we can assume that the entries of have a an absolutely continuous distribution. Indeed, we can add to each entry an independent Gaussian random variable with small variance , and later let .
Similarly, we can assume that , where is a suitably large number that depends only on the subgaussian moment .
with suitably small constant that depends only on the subgaussian moment . Indeed, as we remarked in the Introduction, for the values of above a constant proportion of , Theorem 1.1 follows from (1.7). Note that
Using the decomposition of the sphere , we break the invertibility problem into two subproblems, for compressible and incompressible vectors:
A bound for the compressible vectors follows from Lemma 2.6. Using (5.1) we get
It remains to find a lower bound on for the incompressible vectors .
Invertibility via uniform distance bounds
In this section, we reduce the problem of bounding for incompressible vectors to the distance problem that we addressed in Section 4.
For levels that will only depend on , we define the set of totally spread vectors
For every , there exist which depend only on , and such that the following holds. For every , the event
The proof gives , , . In the rest of the proof, we shall use definition (6.1) of with these values of the levels , .
Let be the subset from Lemma 2.5. Recall that the parameters and depend only on the subgaussian moment (see Lemma 2.6). By choosing the constant in (5.1) appropriately small, we may assume that . Then, using Stirling’s approximation we have
If , then summing (2.1) over , we obtain the required two-sided bound for . This and (2.1) yields . Hence holds. ∎
Let . There exist which depend only on , and such that the following holds. Let be any -element subset of . Then for every
The proof gives , , , .
Let . For every subset of we have
In case the event of Lemma 6.1 holds, we use the vector to check that
is independent of . Moreover, using the estimate on in the definition of the event , we conclude that
where is the constant from Lemma 6.1. Chebychev inequality and Fubini theorem then yield
Fix any realization of for which occurs, and fix any . Then
We have proved that for every there exists a subset that satisfies both and . Using this in (6.3), we conclude that every matrix for which the event occurs satisfies
The uniform distance bound
In this section, we shall estimate the distance between a random ellipsoid and a random independent subspace. This is the distance that we need to bound in the right hand side of (6.2).
Throughout this section, we let be a fixed subset of , . We shall use the notation introduced in the beginning of Section 6. Thus, denotes a random subspace, and denotes the totally spread set whose levels , depend only on , in the definition of incompressibility.
We will denote by positive numbers that depend only on , and the subgaussian moment .
Recall that is the span of independent random vectors. Since their distribution is absolutely continuous (see the beginning of Section 5), these vectors are almost surely in general position, so
Without loss of generality, in the proof of Theorem 7.1 we can assume that
We would like to prove Theorem 7.1 by a typical -net argument. Theorem 4.1 will give a useful probability bound for an individual . We might then take a union bound over all in an -net of and complete by approximation. However, the standard approximation argument will leave us with a larger error on the probability, which is unsatisfactory for small . To improve upon this step, we shall improve upon this approach using decoupling in Section 7.2.
For now, we start with a bound for an individual .
Denote the entries of matrix by . Then the entries of the random vector ,
are independent and identically distributed mean zero random variables. Moreover, since the random variables are subgaussian and , the random variables are also subgaussian (see Fact 2.1 in ).
Therefore the random vector and the random subspace satisfy the assumptions of Theorem 4.1 with (we used (7.1) here). An application of Theorem 4.1 completes the proof. ∎
Since and almost surely , the random matrix acts as an operator from a -dimensional subspace into a -dimensional subspace. Although the entries of are not necessarily independent, we expect to behave as if this was the case. To this end, we condition on the realization of the subspace . Now the operator becomes a fixed projection, and the columns of become independent random vectors. Then satisfies a version of Proposition 2.3:
Using Proposition7.3, we can choose a constant that depends only on the subgaussian moment, and such that
With this bound on the norm of , we can run the approximation argument and prove the distance bound in Lemma 7.2 uniformly over all .
Let be a random matrix as in Proposition 7.3. Then for every that satisfies (7.2) we have
Taking the union bound and using the representation (7.4) in Lemma 7.2, we obtain
Now, suppose the event in (7.6) holds, i.e. there exists such that
Choose such that . Then by the triangle inequality
Therefore, holds. The bound on the probability of completes the proof. ∎
By representation (7.4), this is a weaker version of Theorem 7.2, with instead of . Unfortunately, this bound is too weak for small . In particular, for square matrices we have , and the bound is useless.
In the next section, we will refine our current approach using decoupling.
2. Refinement: decoupling
Our problem is that the probability bound in (7.5) is too weak. We will bypass this by decomposing our event according to all possible values of , and by decoupling the information about from the information about .
Let be an matrix whose columns are independent random vectors. Let and let be a vector satisfying for all . Then for every , we have
If then , so the probability in the left hand side is zero. So, let . Then we can decompose the index set into two disjoint subsets and whose cardinalities differ by at most , say with .
We write where and are the submatrices of with columns in and respectively. Similarly, for , we write .
Since , we have
and similarly for . It suffices to bound ; the argument for is similar.
Writing and using the independence of the matrices and , we conclude that
(In the last line we used and ).
By the assumption on and since , we have
Hence for and , we obtain
Together with (7.7), this completes the proof. ∎
We use this decoupling in the following refinement of Lemma 7.4.
We condition on the realization of the subspace as above to make the columns of independent. By the definition (6.1) of , any satisfies the condition of the Decoupling Proposition 7.5 with . Taking the union bound and then using Proposition 7.5, we obtain
Assume now that , where and are as in Theorem 4.3. Then using Proposition 7.3 and representation (7.4), we conclude as in the proof of Theorem 4.1 that
for any satisfying (7.2). Since and , we can bound this as
Now, suppose the event in (7.8) holds, i.e. there exists such that
Choose such that . Then by the triangle inequality
Therefore, holds. The bound on the probability of completes the proof. ∎
Recall that, without loss of generality, we assumed that (7.2) held. Let be the smallest natural number such that
where and are constants from Lemma 2.3 and Lemma 7.6 respectively. Summing the probability estimates of Proposition 7.4 and Lemma 7.6 for , , we conclude that
By (7.9) and Proposition 2.3, the last expression does not exceed . In view of representation (7.4), this completes the proof. ∎
Completion of the proof
In Section 6, we reduced the invertibility problem for incompressible vectors to computing the distance between a random ellipsoid and a random subspace. This distance was estimated in Section 7. These together lead to the following invertibility bound:
Let . There exist which depend only on , and such that the following holds. For every ,
Without loss of generality, we can assume that (7.2) holds. We use Lemma 6.2 with and then Theorem 7.1 to get the bound on the desired probability. This completes the proof. ∎
This follows directly from (5.2), (5.3), and Theorem 8.1. ∎