The limit of the smallest singular value of random matrices with i.i.d. entries
Konstantin Tikhomirov
Introduction
For and an real-valued matrix , its singular values , , are the eigenvalues of the matrix arranged in non-increasing order, where multiplicities are counted. In particular, the largest and the smallest singular values are given by
In this paper, we establish convergence of the smallest singular values of a sequence random matrices with i.i.d. entries under minimal moment assumptions.
The extreme singular values of random matrices attract considerable attention of researchers both in limiting and non-limiting settings. We refer the reader to surveys and monographs , , , for extensive information on the spectral theory of random matrices. Here, we shall focus on the following specific question: for matrices with i.i.d. entries, what are the weakest possible assumptions on the entries which are sufficient for the smallest singular value to “concentrate”?
We note that a corresponding problem for the largest singular value (i.e. the operator norm) was essentially resolved in the i.i.d. case, where finiteness of the fourth moment of the entries turns out to be crucial both in limiting and non-limiting settings. We refer the reader to and for results on a.s. convergence of the largest singular value, and for the non-limiting case (see also , for some negative results on concentration of the operator norm).
Further, it is proved in , that for square matrices with i.i.d. centered entries with unit variance and a bounded fourth moment, one has with a large probability.
A natural question in connection with the mentioned results is whether the assumption on the fourth moment is necessary for the least singular value to “concentrate”; in particular, whether any assumptions on moments of ’s higher than the -nd are required for the a.s. convergence in the Bai–Yin theorem. This question is discussed in on p. 6. Solving the problem was a motivation for our work.
A considerable progress has been made recently in the direction of weakening the moment assumptions on matrix entries. For square matrices, given a sufficiently large and an matrix with i.i.d. entries with zero mean and unit variance, its smallest singular value is bounded from below by a constant (negative) power of with probability close to one [19, Theorem 2.1] (see also [5, Theorem 4.1] for sparse matrices).
The result of can be used to show that in the limiting setup of the Bai–Yin theorem but without the assumptions on moments higher than the -nd, the sequence \bigl{(}{N_{m}}^{-1/2}s_{m}(A_{m})\bigr{)}_{m=1}^{\infty} satisfies
where is a certain function of and the distribution of ’s. The same conclusion can be derived from [6, Theorem 1.4], if we additionally assume that the limiting aspect ratio is bounded from above by a sufficiently small positive quantity (i.e. the matrices are tall). However, both [20, Theorem 1] and [6, Theorem 1.4] do not give the precise asymptotics.
This problem is resolved in our paper. The main result is the following
Theorem 1 in a strong form establishes the asymmetry of the limiting behaviour of the extreme singular values: whereas the fourth moment is necessary for the operator norm, the second moment is sufficient for the convergence of the smallest singular value.
which implies the result. Thus, the argument of the paper remains the crucial element of the proof, although we apply it only to the truncated variables, for which all positive moments are bounded. Let us emphasize that, whereas a truncation procedure for matrices also appears as a technical step in , in our approach the truncation level is not a function of .
Preliminaries
In this section, we introduce notation and present some classical or elementary facts, which we include for an easier referencing.
The next statement, which is sometimes called the Bernstein (or Hoeffding’s) inequality, can be derived from classical Khintchine’s inequality for the sum of weighted independent signs by a symmetrization procedure:
where is a universal constant.
The lemma below is a law of large numbers, where instead of the arithmetic mean of a collection of random variables we consider more general weighted sums. As in the case of the classical weak LLN, the statement can be proved by applying Levy’s continuity theorem for characteristic functions.
Let be i.i.d. random variables with zero mean. Then for any there is depending only on and the distribution of ’s with the following property: whenever is a sequence of non-negative real numbers such that and , we have
where and .
Note that the above theorem does not require any assumptions on moments higher than the nd, and so can be applied in our setting. For our proof, we will actually need a much weaker result than Theorem 6, namely, that almost surely. The latter can be immediately verified with help of Theorem 6: for every fixed , we have with probability one, hence the smallest non-zero eigenvalues of matrices satisfy a.s. This implies a.s., which gives the required estimate by letting .
Norms of coordinate projections of random vectors
The goal of this section is to show that, given a sufficiently large random matrix with i.i.d. entries with zero mean and unit variance, the quantity
is of order with a very large probability (the probability shall depend on ). It shall act as a “replacement” of the matrix norm which in our setting may be greater than by the order of magnitude with probability close to one. We remark here that a quantity
where and is an random matrix with i.i.d. isotropic log-concave rows, played a crucial role in the paper by Adamczak, Litvak, Pajor and Tomczak-Jaegermann, dealing with the problem of approximating covariance matrix of a log-concave random vector by the sample covariance matrix. In our case, however, the latter quantity is inapplicable as it may not concentrate near (even for small ).
First, we prove the required estimate for (1) under the additional assumption that the entries of are symmetrically distributed (Lemma 12). Then we generalize the result to non-symmetric distributions in Proposition 13. Lemmas 7–11 given below build the framework of the proof.
For each there is depending only on with the following property: let and let be a random vector of independent variables, each having zero mean and unit variance. Then
with probability at least , where are universal constants.
Fix any and define as the smallest positive integer such that
for all . Choose any and let be as stated above. Set . In view of Markov’s inequality,
Finally, using the definition of , we get
Fix any and let and be as stated above. Let be Rademacher variables jointly independent with , and let denote the random matrix . Then, since ’s are symmetrically distributed, for any fixed vector the distribution of is the same as that of . Define a subset of (non-random) matrices:
and for every denote by the random matrix . Note that at every point of the probability space the matrix belongs to . Then, conditioning on ’s, we get for every :
Note that for each and , the -th coordinate of the vector satisfies in view of Lemma 4:
A standard application of the Laplace transform then yields
for some depending only on . This, together with (2), proves the result. ∎
As an elementary consequence of Lemmas 8 and 9 we get
Fix any and and define , where is taken from Lemma 9. Now, choose any and let be an random matrix with i.i.d. entries distributed as . Let be the set of vertices of the cube . In view of Lemma 9, any satisfies
Next, by Lemma 8, for we have
for all . Note that for any the random sets and coincide everywhere on . Hence, together with the above estimates, we get
It remains to note that for any and we have
In the following statement, we bound the quantity (1) assuming that the matrix entries are symmetrically distributed. The lemmas above provide estimates for for individual vectors on the sphere as well as an upper bound on the cube . To derive an estimate for the supremum over the sphere, we shall embed into Minkowski sum of a multiple of and two specially chosen finite sets (see (3) in the proof below). This way each vector can be “decomposed” as a sum of three vectors with particular characterestics. This approach is similar to splitting the unit sphere into sets of “close to sparse” and “far from sparse” vectors introduced in and subsequently used in , .
where is a universal constant.
Fix and let be the smallest integer such that
;
N_{\ref{weak lsv sym}}\geq\max\bigl{(}N_{\ref{single vector est}}(\varepsilon/3),n_{\ref{cube bound}}(\varepsilon/3,1)\bigr{)};
Choose . Without loss of generality, we can assume that . Let be as stated above.
By Lemma 3, there is a finite subset of cardinality at most such that for any there is with .
so . This proves (3).
For each , in view of Lemma 7 and the condition , we have
Next, for every , Lemma 10 together with the inequality and implies that
for some constant . Finally, by Lemma 11 and in view of the condition we have
where is a universal constant. Let denote the event
Then from the above probability estimates and the definition of we obtain
where w_{\ref{weak lsv sym}}=\min\bigl{(}\frac{c_{\ref{single vector est}}\varepsilon}{6},\frac{C_{\ref{cubic net}}}{2},\frac{1}{2}\bigr{)}.
Note that the intersection necessarily satisfies , and from the last inequalities we get . Since our choice of and was arbitrary, we get
Finally, we can state the main result of the section.
where is a universal constant.
Hence, taking into consideration that the entries of are distributed as and using Lemma 12, we get
Matrix truncation and proof of Theorem 1
In the next statement, we compare the -th largest singular value of a random matrix with bounded entries to . Obviously,
We will need an inequality in the opposite direction when . A theorem of Litvak, Pajor, Rudelson and Tomczak-Jaegermann from implies that for any and there are and depending only on and with the following property: whenever and is an random matrix with i.i.d. entries with mean zero, variance one and a.s. bounded by , we have
This, together with an upper bound for , gives an estimate
with a large probability, where depends only on and . However, such an estimate would be insufficient for our needs, and we shall apply a more direct argument to get a stronger relation.
Fix any , let be the largest number in satisfying
Let , and be an random matrix defined as above. We shall prove the statement by contradiction. Let us assume that
Cardinality of the set T=\bigl{\{}I\subset\{1,2,\dots,N\}:\,|I|=\lceil N-\varepsilon N\rceil\bigr{\}} can be estimated as
Hence, our assumption implies that there is a set such that
Now, for every , Lemma 4 and the standard procedure with the Laplace transform give for :
Together with (4), the last estimate implies
However, this contradicts to our choice of . Thus, the initial assumption was wrong, and the statement is proved. ∎
Let be a random variable with zero mean. Then for any we call the variable
the centered -truncation of . Here, is the indicator of the event \bigl{\{}\omega\in\Omega:\,|\xi(\omega)|\leq M\bigr{\}}.
Thus, it suffices to prove the lower estimate
Now, choose arbitrary and let be such that
where the quantity on the right-hand side goes to as tends to infinity. Hence, we obtain
On the other hand, the theorem of Bai and Yin implies that
Since was arbitrary, this proves the result. ∎
Acknowledgement. I would like to thank my supervisor Dr. Nicole Tomczak-Jaegermann for support and for valuable suggestions on the text.