The limit of the smallest singular value of random matrices with i.i.d. entries

Konstantin Tikhomirov

Introduction

For NmN\geq m and an N×mN\times m real-valued matrix BB, its singular values s1(B)s_{1}(B), s2(B),s_{2}(B),\dots, sm(B)s_{m}(B) are the eigenvalues of the matrix BTB\sqrt{B^{T}B} arranged in non-increasing order, where multiplicities are counted. In particular, the largest and the smallest singular values are given by

In this paper, we establish convergence of the smallest singular values of a sequence random matrices with i.i.d. entries under minimal moment assumptions.

The extreme singular values of random matrices attract considerable attention of researchers both in limiting and non-limiting settings. We refer the reader to surveys and monographs , , , for extensive information on the spectral theory of random matrices. Here, we shall focus on the following specific question: for matrices with i.i.d. entries, what are the weakest possible assumptions on the entries which are sufficient for the smallest singular value to “concentrate”?

We note that a corresponding problem for the largest singular value (i.e. the operator norm) was essentially resolved in the i.i.d. case, where finiteness of the fourth moment of the entries turns out to be crucial both in limiting and non-limiting settings. We refer the reader to and for results on a.s. convergence of the largest singular value, and for the non-limiting case (see also , for some negative results on concentration of the operator norm).

Further, it is proved in , that for square m×mm\times m matrices with i.i.d. centered entries with unit variance and a bounded fourth moment, one has sm(A)m1/2s_{m}(A)\approx m^{-1/2} with a large probability.

A natural question in connection with the mentioned results is whether the assumption on the fourth moment is necessary for the least singular value to “concentrate”; in particular, whether any assumptions on moments of aija_{ij}’s higher than the 22-nd are required for the a.s. convergence in the Bai–Yin theorem. This question is discussed in on p. 6. Solving the problem was a motivation for our work.

A considerable progress has been made recently in the direction of weakening the moment assumptions on matrix entries. For square matrices, given a sufficiently large mm and an m×mm\times m matrix with i.i.d. entries with zero mean and unit variance, its smallest singular value is bounded from below by a constant (negative) power of mm with probability close to one [19, Theorem 2.1] (see also [5, Theorem 4.1] for sparse matrices).

The result of can be used to show that in the limiting setup of the Bai–Yin theorem but without the assumptions on moments higher than the 22-nd, the sequence \bigl{(}{N_{m}}^{-1/2}s_{m}(A_{m})\bigr{)}_{m=1}^{\infty} satisfies

where rr is a certain function of z=limm/Nmz=\lim m/N_{m} and the distribution of aija_{ij}’s. The same conclusion can be derived from [6, Theorem 1.4], if we additionally assume that the limiting aspect ratio zz is bounded from above by a sufficiently small positive quantity (i.e. the matrices are tall). However, both [20, Theorem 1] and [6, Theorem 1.4] do not give the precise asymptotics.

This problem is resolved in our paper. The main result is the following

Theorem 1 in a strong form establishes the asymmetry of the limiting behaviour of the extreme singular values: whereas the fourth moment is necessary for the operator norm, the second moment is sufficient for the convergence of the smallest singular value.

which implies the result. Thus, the argument of the paper remains the crucial element of the proof, although we apply it only to the truncated variables, for which all positive moments are bounded. Let us emphasize that, whereas a truncation procedure for matrices also appears as a technical step in , in our approach the truncation level MM is not a function of mm.

Preliminaries

In this section, we introduce notation and present some classical or elementary facts, which we include for an easier referencing.

The next statement, which is sometimes called the Bernstein (or Hoeffding’s) inequality, can be derived from classical Khintchine’s inequality for the sum of weighted independent signs by a symmetrization procedure:

where c\refKhintchinemod>0c_{\ref{Khintchine mod}}>0 is a universal constant.

The lemma below is a law of large numbers, where instead of the arithmetic mean of a collection of random variables we consider more general weighted sums. As in the case of the classical weak LLN, the statement can be proved by applying Levy’s continuity theorem for characteristic functions.

Let a1,a2,a_{1},a_{2},\dots be i.i.d. random variables with zero mean. Then for any ε>0\varepsilon>0 there is δ>0\delta>0 depending only on ε\varepsilon and the distribution of aja_{j}’s with the following property: whenever (tj)j=1(t_{j})_{j=1}^{\infty} is a sequence of non-negative real numbers such that j=1tj=1\sum_{j=1}^{\infty}t_{j}=1 and maxtjδ\max t_{j}\leq\delta, we have

where r=(1z)2r=(1-\sqrt{z})^{2} and R=(1+z)2R=(1+\sqrt{z})^{2}.

Note that the above theorem does not require any assumptions on moments higher than the 22nd, and so can be applied in our setting. For our proof, we will actually need a much weaker result than Theorem 6, namely, that lim supmsm(Am)Nm1z\limsup\limits_{m\to\infty}\frac{s_{m}(A_{m})}{\sqrt{N_{m}}}\leq 1-\sqrt{z} almost surely. The latter can be immediately verified with help of Theorem 6: for every fixed t>(1z)2t>(1-\sqrt{z})^{2}, we have limmFTm(t)=FMP(t)>0\lim\limits_{m\to\infty}F^{T_{m}}(t)=F_{MP}(t)>0 with probability one, hence the smallest non-zero eigenvalues λmin(Tm)\lambda_{\min}(T_{m}) of matrices TmT_{m} satisfy lim supmλmin(Tm)t\limsup\limits_{m\to\infty}\lambda_{\min}(T_{m})\leq t a.s. This implies lim supmsm(Am)Nmt\limsup\limits_{m\to\infty}\frac{s_{m}(A_{m})}{\sqrt{N_{m}}}\leq\sqrt{t} a.s., which gives the required estimate by letting t(1z)2t\to(1-\sqrt{z})^{2}.

Norms of coordinate projections of random vectors

The goal of this section is to show that, given a sufficiently large random N×nN\times n matrix AA with i.i.d. entries with zero mean and unit variance, the quantity

is of order N\sqrt{N} with a very large probability (the probability shall depend on ε>0\varepsilon>0). It shall act as a “replacement” of the matrix norm A\|A\| which in our setting may be greater than N\sqrt{N} by the order of magnitude with probability close to one. We remark here that a quantity

where mNm\leq N and DD is an N×nN\times n random matrix with i.i.d. isotropic log-concave rows, played a crucial role in the paper by Adamczak, Litvak, Pajor and Tomczak-Jaegermann, dealing with the problem of approximating covariance matrix of a log-concave random vector by the sample covariance matrix. In our case, however, the latter quantity is inapplicable as it may not concentrate near N\sqrt{N} (even for small mm).

First, we prove the required estimate for (1) under the additional assumption that the entries of AA are symmetrically distributed (Lemma 12). Then we generalize the result to non-symmetric distributions in Proposition 13. Lemmas 7–11 given below build the framework of the proof.

For each ε(0,1]\varepsilon\in(0,1] there is N\refsinglevectorest=N\refsinglevectorest(ε)>0N_{\ref{single vector est}}=N_{\ref{single vector est}}(\varepsilon)>0 depending only on ε\varepsilon with the following property: let NN\refsinglevectorestN\geq N_{\ref{single vector est}} and let X=(X1,X2,,XN)X=(X_{1},X_{2},\dots,X_{N}) be a random vector of independent variables, each XiX_{i} having zero mean and unit variance. Then

with probability at least 1exp(c\refsinglevectorestεN)1-\exp(-c_{\ref{single vector est}}\varepsilon N), where C\refsinglevectorest,c\refsinglevectorest>0C_{\ref{single vector est}},c_{\ref{single vector est}}>0 are universal constants.

Fix any ε(0,1]\varepsilon\in(0,1] and define N\refsinglevectorestN_{\ref{single vector est}} as the smallest positive integer such that

for all NN\refsinglevectorestN\geq N_{\ref{single vector est}}. Choose any NN\refsinglevectorestN\geq N_{\ref{single vector est}} and let XX be as stated above. Set M=4εM=\frac{4}{\varepsilon}. In view of Markov’s inequality,

Finally, using the definition of N\refsinglevectorestN_{\ref{single vector est}}, we get

Fix any K>0K>0 and let N,nN,n and A=(aij)A=(a_{ij}) be as stated above. Let rijr_{ij} (1iN,  1jn)(1\leq i\leq N,\;1\leq j\leq n) be Rademacher variables jointly independent with AA, and let Aˉ\bar{A} denote the random N×nN\times n matrix (rijaij)(r_{ij}a_{ij}). Then, since aija_{ij}’s are symmetrically distributed, for any fixed vector y=(y1,y2,,yn)Sn1y=(y_{1},y_{2},\dots,y_{n})\in S^{n-1} the distribution of ProjIyAy\|{\rm Proj}_{I_{y}}Ay\| is the same as that of ProjIyAˉy\|{\rm Proj}_{I_{y}}\bar{A}y\|. Define a subset of (non-random) N×nN\times n matrices:

and for every B=(bij)MyB=(b_{ij})\in{\mathcal{M}}_{y} denote by Bˉ\bar{B} the random matrix (rijbij)(r_{ij}b_{ij}). Note that at every point ω\omega of the probability space the matrix ProjIy(ω)Aˉ(ω){\rm Proj}_{I_{y}(\omega)}\bar{A}(\omega) belongs to My{\mathcal{M}}_{y}. Then, conditioning on aija_{ij}’s, we get for every τ>0\tau>0:

Note that for each BMyB\in{\mathcal{M}}_{y} and iNi\leq N, the ii-th coordinate of the vector Bˉy\bar{B}y satisfies in view of Lemma 4:

A standard application of the Laplace transform then yields

for some L\refvectorboundsym>0L_{\ref{vector bound sym}}>0 depending only on KK. This, together with (2), proves the result. ∎

As an elementary consequence of Lemmas 8 and 9 we get

Fix any K>0K>0 and ε>0\varepsilon>0 and define n\refcubebound=δ\refcardofIy(ε,K+1)2n_{\ref{cube bound}}=\lceil\delta_{\ref{card of Iy}}(\varepsilon,K+1)^{-2}\rceil, where δ\refcardofIy>0\delta_{\ref{card of Iy}}>0 is taken from Lemma 9. Now, choose any Nnn\refcubeboundN\geq n\geq n_{\ref{cube bound}} and let A=(aij)A=(a_{ij}) be an N×nN\times n random matrix with i.i.d. entries distributed as ξ\xi. Let VV be the set of vertices of the cube 1nBn=[1n,1n]n\frac{1}{\sqrt{n}}B_{\infty}^{n}=[-\frac{1}{\sqrt{n}},\frac{1}{\sqrt{n}}]^{n}. In view of Lemma 9, any vVv\in V satisfies

Next, by Lemma 8, for L=L\refvectorboundsym(K+2)>0L=L_{\ref{vector bound sym}}(K+2)>0 we have

for all vVv\in V. Note that for any u,vVu,v\in V the random sets IuI_{u} and IvI_{v} coincide everywhere on Ω\Omega. Hence, together with the above estimates, we get

It remains to note that for any I{1,2,,N}I\subset\{1,2,\dots,N\} and yBny\in B_{\infty}^{n} we have

In the following statement, we bound the quantity (1) assuming that the matrix entries are symmetrically distributed. The lemmas above provide estimates for minINεNProjIAy\min\limits_{|I|\geq N-\varepsilon N}\|{\rm Proj}_{I}Ay\| for individual vectors on the sphere as well as an upper bound on the cube 1nBn\frac{1}{\sqrt{n}}B_{\infty}^{n}. To derive an estimate for the supremum over the sphere, we shall embed Sn1S^{n-1} into Minkowski sum of a multiple of BnB_{\infty}^{n} and two specially chosen finite sets (see (3) in the proof below). This way each vector ySn1y\in S^{n-1} can be “decomposed” as a sum of three vectors with particular characterestics. This approach is similar to splitting the unit sphere into sets of “close to sparse” and “far from sparse” vectors introduced in and subsequently used in , .

where C\refweaklsvsym>0C_{\ref{weak lsv sym}}>0 is a universal constant.

Fix ε(0,1]\varepsilon\in(0,1] and let N\refweaklsvsymN_{\ref{weak lsv sym}} be the smallest integer such that

N\refweaklsvsym1/4δ\refsinglevbound(ε/3,2C\refcubicnet)1\lfloor N_{\ref{weak lsv sym}}^{1/4}\rfloor\delta_{\ref{single v bound}}(\varepsilon/3,2C_{\ref{cubic net}})\geq 1;

N_{\ref{weak lsv sym}}\geq\max\bigl{(}N_{\ref{single vector est}}(\varepsilon/3),n_{\ref{cube bound}}(\varepsilon/3,1)\bigr{)};

Choose NN\refweaklsvsymN\geq N_{\ref{weak lsv sym}}. Without loss of generality, we can assume that n=Nn=N. Let AA be as stated above.

By Lemma 3, there is a finite subset N2T{\mathcal{N}}_{2}\subset T of cardinality at most exp(C\refcubicnetN)\exp(C_{\ref{cubic net}}N) such that for any yTy\in T there is yN2y^{\prime}\in{\mathcal{N}}_{2} with yyN1/2\|y-y^{\prime}\|_{\infty}\leq N^{-1/2}.

so y32NBNy^{3}\in\frac{2}{\sqrt{N}}B_{\infty}^{N}. This proves (3).

For each y1N1y^{1}\in{\mathcal{N}}_{1}, in view of Lemma 7 and the condition NN\refsinglevectorest(ε/3)N\geq N_{\ref{single vector est}}(\varepsilon/3), we have

Next, for every y2N2y^{2}\in{\mathcal{N}}_{2}, Lemma 10 together with the inequality N1/4δ\refsinglevbound(ε/3,2C\refcubicnet)1\lfloor N^{1/4}\rfloor\delta_{\ref{single v bound}}(\varepsilon/3,2C_{\ref{cubic net}})\geq 1 and y21/N1/4\|y^{2}\|_{\infty}\leq 1/\lfloor N^{1/4}\rfloor implies that

for some constant L\refsinglevbound>0L_{\ref{single v bound}}>0. Finally, by Lemma 11 and in view of the condition Nn\refcubebound(ε/3,1)N\geq n_{\ref{cube bound}}(\varepsilon/3,1) we have

where L\refcubebound>0L_{\ref{cube bound}}>0 is a universal constant. Let E\mathcal{E} denote the event

Then from the above probability estimates and the definition of N\refweaklsvsymN_{\ref{weak lsv sym}} we obtain

where w_{\ref{weak lsv sym}}=\min\bigl{(}\frac{c_{\ref{single vector est}}\varepsilon}{6},\frac{C_{\ref{cubic net}}}{2},\frac{1}{2}\bigr{)}.

Note that the intersection I=I1I2I3I=I_{1}\cap I_{2}\cap I_{3} necessarily satisfies INεN|I|\geq N-\varepsilon N, and from the last inequalities we get ProjIA(ω)y(2C\refsinglevectorest+L\refsinglevbound+2L\refcubebound)N\|{\rm Proj}_{I}A(\omega)y\|\leq(2C_{\ref{single vector est}}+L_{\ref{single v bound}}+2L_{\ref{cube bound}})\sqrt{N}. Since our choice of ySN1y\in S^{N-1} and ωE\omega\in\mathcal{E} was arbitrary, we get

Finally, we can state the main result of the section.

where C\refweaklsvnonsym>0C_{\ref{weak lsv nonsym}}>0 is a universal constant.

Hence, taking into consideration that the entries of AAA-A^{\prime} are distributed as ξξ\xi-\xi^{\prime} and using Lemma 12, we get

Matrix truncation and proof of Theorem 1

In the next statement, we compare the nn-th largest singular value of a random N×nN\times n matrix AA with bounded entries to sn(ProjIA)s_{n}({\rm Proj}_{I}A). Obviously,

We will need an inequality in the opposite direction when I/N1|I|/N\approx 1. A theorem of Litvak, Pajor, Rudelson and Tomczak-Jaegermann from implies that for any δ>1\delta>1 and M>0M>0 there are h>0h>0 and ε>0\varepsilon>0 depending only on δ\delta and MM with the following property: whenever NδnN\geq\delta n and AA is an N×nN\times n random matrix with i.i.d. entries with mean zero, variance one and a.s. bounded by MM, we have

This, together with an upper bound for sn(A)s_{n}(A), gives an estimate

with a large probability, where L>0L>0 depends only on δ\delta and MM. However, such an estimate would be insufficient for our needs, and we shall apply a more direct argument to get a stronger relation.

Fix any η>0\eta>0, let ε=ε\refssvofsubmatr(η,M)\varepsilon=\varepsilon_{\ref{ssv of submatr}}(\eta,M) be the largest number in (0,1](0,1] satisfying

Let NN\refssvofsubmatrN\geq N_{\ref{ssv of submatr}}, nNn\leq N and AA be an N×nN\times n random matrix defined as above. We shall prove the statement by contradiction. Let us assume that

Cardinality of the set T=\bigl{\{}I\subset\{1,2,\dots,N\}:\,|I|=\lceil N-\varepsilon N\rceil\bigr{\}} can be estimated as

Hence, our assumption implies that there is a set I0TI_{0}\in T such that

Now, for every y=(y1,y2,,yn)Sn1y=(y_{1},y_{2},\dots,y_{n})\in S^{n-1}, Lemma 4 and the standard procedure with the Laplace transform give for λ=c\refKhintchinemod2M2\lambda=\frac{c_{\ref{Khintchine mod}}}{2M^{2}}:

Together with (4), the last estimate implies

However, this contradicts to our choice of ε\varepsilon. Thus, the initial assumption was wrong, and the statement is proved. ∎

Let ξ\xi be a random variable with zero mean. Then for any M>0M>0 we call the variable

the centered MM-truncation of ξ\xi. Here, χ{ξM}\chi_{\{|\xi|\leq M\}} is the indicator of the event \bigl{\{}\omega\in\Omega:\,|\xi(\omega)|\leq M\bigr{\}}.

Thus, it suffices to prove the lower estimate

Now, choose arbitrary η>0\eta>0 and let M>0M>0 be such that

where the quantity on the right-hand side goes to 11 as kk tends to infinity. Hence, we obtain

On the other hand, the theorem of Bai and Yin implies that

Since η>0\eta>0 was arbitrary, this proves the result. ∎

Acknowledgement. I would like to thank my supervisor Dr. Nicole Tomczak-Jaegermann for support and for valuable suggestions on the text.

References