Delocalization of eigenvectors of random matrices with independent entries

Mark Rudelson, Roman Vershynin

Introduction

This paper establishes a complete delocalization of random matrices with independent entries haying variances of the same order of magnitude. For an n×nn\times n matrix GG, complete delocalization refers to the situation where all unit eigenvectors vv of GG have all coordinates of the smallest possible magnitude n1/2n^{-1/2}, up to logarithmic corrections. For example, a random matrix GG with independent standard normal entries is completely delocalized with high probability. Indeed, by rotation invariance the unit eigenvectors vv are uniformly distributed on the sphere Sn1S^{n-1}, so with high probability one has v=maxinvi=O(log(n)/n)\|v\|_{\infty}=\max_{i\leq n}|v_{i}|=O(\sqrt{\log(n)/n}) for all vv.

Rotation-invariant ensembles seem to be the only example where delocalization can be obtained easily. Only recently was it proved by L. Erdös et al. that general symmetric and Hermitian random matrices HH with independent entries are completely delocalized . These results were later extended by L. Erdös et al. and by Tao and Vu , see also surveys . Very recently, the optimal bound O(logn/n)O(\sqrt{\log n/n}) was obtained by Vu and Wang for the “bulk” eigenvectors of Hermitian matrices. Delocalization properties with varying degrees of strength and generality were then established for several other symmetric and Hermitian ensembles – band matrices , sparse matrices (adjacency matrices of Erdös-Renyi graphs) , heavy-tailed matrices , and sample covariance matrices .

Despite this recent progress, no delocalization results were available for non-Hermitian random matrices prior to the present work. Similar to the Hermitian case, non-Hermitian random matrices have been successful in describing various physical phenomena, see and the references therein. The distribution of eigenvectors of non-Hermitian random matrices has been studied in physics literature, mostly focusing on correlations of certain eigenvector entries, see .

All previous approaches to delocalization in random matrices were spectral. Delocalization was obtained as a byproduct of local limit laws, which determine eigenvalue distribution on microscopic scales. For example, delocalization for symmetric random matrices was deduced from a local version of Wigner’s semicircle law which controls the number of eigenvalues of HH falling in short intervals, even down to intervals where the average number of eigenvalues is logarithmic in the dimension .

In this paper we develop a new approach to delocalization of random matrices, which is geometric rather than spectral. The only spectral properties we rely on are crude bounds on the extreme singular values of random matrices. As a result, the new approach can work smoothly in situations where limit spectral laws are unknown or even impossible. In particular, one does not need to require that the variances of all entries be the same, or even that the matrix of variances be doubly-stochastic (as e.g. ).

The case α=2\alpha=2 corresponds to sub-gaussian random variablesStandard properties of sub-gaussian random variables can be found in [38, Section 5.2.3].. It is convenient to state and prove the main result for sub-gaussian random variables, and then deduce a similar result for general α>0\alpha>0 using a standard truncation argument.

The same conclusion as in Theorem 1.1 holds for a complex matrix GG. One just needs to require that both real and imaginary parts of all entries are independent and satisfy the three conditions in Theorem 1.1.

The exponent 9/29/2 of the logarithm in Theorem 1.1 is suboptimal, and there are several points in the proof that can be improved. We believe that by taking care of these points, it is possible to improve the exponent to the optimal value 1/21/2. However, such improvements would come at the expense of simplicity of the argument, while in this paper we aim at presenting the most transparent proof. The exponents 3/23/2 is probably suboptimal as well.

The proof of Theorem 1.1 shows that CC depends polynomially on KK, i.e., C2KC0C\leq 2K^{C_{0}} for some absolute constant C0C_{0}. This observation allows one to extend Theorem 1.1 to the situation where the entries GijG_{ij} of GG have uniformly bounded ψα\psi_{\alpha}-norms, for any fixed α>0\alpha>0.

Here C,β,γC,\beta,\gamma depend only on α>0\alpha>0 and MM.

Let GG be a random matrix as in Theorem 1.1, and let t2t\geq 2 and s0s\geq 0. Then, with probability at least 1n1t(s+1)1-n^{1-t(s+1)}, all (s/n)(s/\sqrt{n})-approximate eigenvectors vv of GG satisfy

The results in of this paper could be extended in several other ways. For instance, it is possible to drop the assumption that all variances of the entries are of the same order and prove a similar theorem for sparse random matrices. One can establish the isotropic delocalization in the sense of . We did not pursue these directions since it would have made the presentation heavier. It is also possible that a version of Theorems 1.1 and 1.6 can be proved for Hermitian matrices. We leave this direction for the future.

Our approach to Theorem 1.6 is based on a dimension reduction argument. If the matrix GG has a localized approximate eigenvector, it will be detected from an imbalance of a suitable projection of GG. As we shall see, this argument yields a lower bound on (GzI)v2\|(G-zI)v\|_{2} that is uniform over all unit vectors vv such that v1/n\|v\|_{\infty}\gg 1/\sqrt{n}.

Let us describe this strategy in loose terms. We fix zz and consider the random matrix A=GzIA=G-zI; denote its columns by AjA_{j}. Consider a projection PP whose kernel contains all AjA_{j} except for j{j0}J0j\in\{j_{0}\}\cup J_{0}, where j0[n]j_{0}\in[n] and J[n]{j0}J\subset[n]\setminus\{j_{0}\} are a random uniform index and subset respectively, with J0=llog10n|J_{0}|=l\sim\log^{10}n. We call such PP a test projection. By its definition, triangle inequality and Cachy-Schwarz inequality, we have for any vector vv that

Suppose that vv is localized, say v>l2/n\|v\|_{\infty}>l^{2}/\sqrt{n}. Using the randomness of j0j_{0} and JJ, with non-negligible probability (around 1/n1/n) we have vj0=v>l2/n|v_{j_{0}}|=\|v\|_{\infty}>l^{2}/\sqrt{n} and (jJ0vj2)1/2l/n(\sum_{j\in J_{0}}|v_{j}|^{2})^{1/2}\lesssim\sqrt{l/n}. On this event, we have shown that

Since the right hand side of (1.2) does not depend on vv, we obtained a uniform lower estimate for all localized vectors vv.

It remains to estimate the magnitudes of PAj2\|PA_{j}\|_{2} for j{j0}J0j\in\{j_{0}\}\cup J_{0}. What helps us is that the test projection PP can be made independent of the random vectors AjA_{j} appearing in (1.2). Since A=GzIA=G-zI, we can represent Aj=GjzejA_{j}=G_{j}-ze_{j} where eje_{j} denote the standard basis vectors. Assume first that zz is very close to zero, so AjGjA_{j}\approx G_{j}. Then, using concentration of measure we can argue that PAj2PGj2l\|PA_{j}\|_{2}\approx\|PG_{j}\|_{2}\sim\sqrt{l} with high probability (and thus for all j{j0}J0j\in\{j_{0}\}\cup J_{0} simultaneously). Substituting this into (1.2) we conclude that the nice lower bound

holds for all localized approximate eigenvectors vv corresponding to (approximate) eigenvalues zz that are very close to zero.

The challenging part of our argument is for zz not close to zero, namely when the diagonal parts PejPe_{j} dominates in the representation PAj=PGjzPejPA_{j}=PG_{j}-zPe_{j}. Estimating the magnitudes of Pej2\|Pe_{j}\|_{2} might be as difficult as the original delocalization problem. However, it turns out that using concentration, it is possible to compare the terms Pej2\|Pe_{j}\|_{2} with each other without knowing their magnitudes. This will require a careful construction of a test projection PP.

The rest of the paper is organized as follows. In Section 2, we recall some known linear algebraic and probabilistic facts. In Section 3, we rigorously develop the argument that was informally described above. It reduces the delocalization problem to finding a test projection PP for which the norms of the columns PejPe_{j} have similar magnitudes. In Section 4, we shall develop a helpful tool for estimating Pej2\|Pe_{j}\|_{2}, an estimate of the distance between anisotropic random vectors and subspaces. In Section 5, we shall express Pej2\|Pe_{j}\|_{2} in terms of such distances, and thus will be able to compare these terms with each other. In Section 6 we deduce Theorem 1.6. Finally, Appendix contains auxiliary results on the smallest singular values of random matrices.

Notation and preliminaries

We shall work with random variables ξ\xi which satisfy the following assumption.

ξ\xi is either real valued and satisfies

or ξ\xi is complex valued, where Reξ\operatorname*{Re}\xi and Imξ\operatorname*{Im}\xi are independent random variables each satisfying the three conditions in (2.1).

We will establish the conclusion of Theorem 1.6 for random matrices GG with independent entries that satisfy Assumption 2.1. Thus we will simultaneously treat the real case and the complex case discussed in Remark 1.2.

We will regard the parameter KK in Assumption 2.1 as a constant, thus C,C1,c,c1,C,C_{1},c,c_{1},\ldots will denote positive numbers that may depend on KK only; their values may change from line to line.

Without loss of generality, we can assume that GG, as well as various other matrices that we will encounter in the proof, have full rank. This can be achieved by a perturbation argument, where one adds to GG an independent Gaussian random matrix GG^{\prime} whose all entries are independent N(0,σ2)N(0,\sigma^{2}) random variables with sufficiently small σ>0\sigma>0. Such perturbation will not affect the proof of Theorem 1.6 since any ε\varepsilon-approximate eigenvector of GG will be a (2ε)(2\varepsilon)-approximate eigenvector of GG^{\prime} whenever GG<ε\left\|G-G^{\prime}\right\|<\varepsilon.

Here AA^{\dagger} denotes the Moore-Penrose pseudoinverse of AA, see e.g. . We will need a few elementary properties of singular values.

Let AA be an m×nm\times n matrix and r=rank(A)r=\operatorname*{rank}(A).

Appendix A contains estimates of the smallest singular values of random matrices.

Next, we state a concentration property of sub-gaussian random vectors.

Let AA be a fixed m×nm\times n matrix. Consider a random vector X=(X1,,Xn)X=(X_{1},\ldots,X_{n}) with independent components XiX_{i} which satisfy Assumption 2.1.

(Concentration) For any t0t\geq 0, we have

In both parts, c=c(K)>0c=c(K)>0 is polynomial in KK.

This result can be deduced from Hanson-Wright inequality. For part (ii), this was done in . A modern proof of Hanson-Wright inequality and deduction of both parts of Theorem 2.3 are discussed in . There XiX_{i} were assumed to have unit variances; the general case follows by a standard normalization step.

Sub-gaussian concentration paired with a standard covering argument yields the following result on norms of random matrices, see .

The same holds if B=PB=P is an r×Nr\times N matrix such that PP=IrPP^{*}=I_{r}.

Reducing delocalization to the existence of a test projection

We will first try to bound the probability of the following localization event for a random matrix AA and parameters l,W,w>0l,W,w>0:

In this section, we reduce our task to the existence of a certain linear map PP which reduces dimension from nn to l\sim l, and which we call a test projection.

To this end, given an m×nm\times n matrix BB, we shall denote by BjB_{j} the jj-th column of BB, and for a subset J[n]J\subseteq[n], we denote by BJB_{J} the submatrix of BB formed by the columns indexed by JJ. Fix nn and lnl\leq n, and define the set of pairs

We equip Λ\Lambda with the uniform probability measure.

Let lnl\leq n. Consider an n×nn\times n random matrix AA with an arbitrary distribution. Suppose that to each (j0,J0)Λ(j_{0},J_{0})\in\Lambda corresponds a number lnl^{\prime}\leq n and an l×nl^{\prime}\times n matrix P=P(n,l,A,j0,J0)P=P(n,l,A,j_{0},J_{0}) with the following properties:

ker(P){Aj}j∉{j0}J0\ker(P)\supseteq\{A_{j}\}_{j\not\in\{j_{0}\}\cup J_{0}}.

Let α,κ>0\alpha,\kappa>0. Let w>0w>0 and W=wκl+2αW=\frac{w}{\kappa l}+\frac{\sqrt{2}}{\alpha}. Then we can bound the probability of the localization event (3.1) as follows:

where Bα,κ\mathcal{B}_{\alpha,\kappa} denotes the following balancing event:

Proposition 3.1 states that in order to establish delocalization (as encoded by the complement of the event LW,w\mathcal{L}_{W,w}), it is enough to find a test projection PP which satisfies the balancing property Bα,κ\mathcal{B}_{\alpha,\kappa}.

Let vSn1v\in S^{n-1}, (j0,J0)Λ(j_{0},J_{0})\in\Lambda let PP be as in the statement. Using the properties (i) and (ii) of PP, we have

The event Bα,κ\mathcal{B}_{\alpha,\kappa} will help us balance the norms PAj02\|PA_{j_{0}}\|_{2} and PAJ0\|PA_{J_{0}}\|, while the following elementary lemma will help us balance the coefficients viv_{i}.

For a given vSn1v\in S^{n-1} and for random (j0,J0)Λ(j_{0},J_{0})\in\Lambda, define the event

Let k0[n]k_{0}\in[n] denote a coordinate for which vk0=v|v_{k_{0}}|=\|v\|_{\infty}. Then

Conditionally on j0=k0j_{0}=k_{0}, the distribution of J0J_{0} is uniform in the set {J[n]{k0},J=l1}\{J\subseteq[n]\setminus\{k_{0}\},\,|J|=l-1\}. Thus using Chebyshev’s inequality we obtain

Assume that a realization of the random matrix AA satisfies

(We will analyze when this event occurs later.) Combining with the conclusion of Lemma 3.2, we see that there exists (j0,J0)Λ(j_{0},J_{0})\in\Lambda such that both events Vv\mathcal{V}_{v} and Bα,κ\mathcal{B}_{\alpha,\kappa} hold. Then we can continue estimating Av2\|Av\|_{2} in (3.3) using Vv\mathcal{V}_{v} and Bα,κ\mathcal{B}_{\alpha,\kappa} as follows:

provided the right hand side is non-negative. In particular, if v>Wl/n\|v\|_{\infty}>W\sqrt{l/n} where W=wκl+2αW=\frac{w}{\kappa l}+\frac{\sqrt{2}}{\alpha}, then Av2>w/n\|Av\|_{2}>w/\sqrt{n}. Thus the localization event LW,w\mathcal{L}_{W,w} must fail.

Let us summarize. We have shown that the localization event LW,w\mathcal{L}_{W,w} implies the failure of the event (3.5). The probability of this failure can be estimated using Chebyshev’s inequality and Fubini theorem as follows:

This completes the proof of Proposition 3.1. ∎

We might choose PP to be the orthogonal projection with

In reality, PP will be a bit more adapted to AA. Let us see what it will take to prove the two inequalities defining the balancing event Bα,κ\mathcal{B}_{\alpha,\kappa} in (3.2). The second inequality can be deduced from the small ball probability estimate, Theorem 2.3(ii). Turning to the first inequality, note that

up to a polynomial factor in J0=l1|J_{0}|=l-1 (thus logarithmic in nn). So we need to show that

Since A=GzInA=G-zI_{n}, the columns AiA_{i} of AA can be expressed as Ai=GizeiA_{i}=G_{i}-ze_{i}. Thus, informally speaking, our task is to show that with high probability,

The first inequality can be deduced from sub-gaussian concentration, Theorem 2.3. The second inequality in (3.6) is challenging, and most of the remaining work is devoted to validating it. Instead of estimating Pej2\|Pe_{j}\|_{2}, we will compare these terms with each other.

Later, in Proposition 5.3, we will relate Pej2\|Pe_{j}\|_{2} to distances between anisotropic random vectors and subspaces. We will now digress to develop a general bound on such distances, which may be interesting on its own.

Distances between anisotropic random vectors and subspaces

Let us start with the isotropic case, where the random vectors in question have all independent coordinates. Here one can use Theorem 2.3 to control the distances.

Let 1kn1\leq k\leq n. Consider independent random vectors X,X1,X2,,XkX,X_{1},X_{2},\ldots,X_{k} with independent coordinates satisfying Assumption 2.1. Consider the subspace Ek=span(Xi)i=1kE_{k}=\operatorname*{span}(X_{i})_{i=1}^{k}. Then

By adding small independent Gaussian perturbations to the vectors XjX_{j}, we can assume that dim(Ek)=k\dim(E_{k})=k almost surely. We can represent the distance as

In this paper, we will need to control the distances in the more difficult anisotropic case, where all random vectors are transformed by a fixed linear map DD. In other words, we will be interested in distances of the form d(DX,Ek)d(DX,E_{k}) where EkE_{k} is the span of the vectors DX1,,DXkDX_{1},\ldots,DX_{k}. An ideal estimate should look like

where si(D)s_{i}(D) are the singular values of DD arranged in the non-increasing order. To see why such estimate would make sense, note that in the isotropic case where D=InD=I_{n} the distance is of order nk\sqrt{n-k}, while for DD of rank kk or lower, the distance is zero.

The following result, based again on Theorem 2.3, establishes a somewhat weaker form of (4.1) with exponentially high probability.

Let DD be an n×nn\times n matrix with singular valuesAs usual, we arrange the singular values in a non-increasing order. si=si(D)s_{i}=s_{i}(D), and define Sˉm2=i>msi2\bar{S}_{m}^{2}=\sum_{i>m}s_{i}^{2} for m0m\geq 0. Let 1kn1\leq k\leq n. Consider independent random vectors X,X1,X2,,XkX,X_{1},X_{2},\ldots,X_{k} with independent coordinates satisfying Assumption 2.1. Consider the subspace Ek=span(DXi)i=1kE_{k}=\operatorname*{span}(DX_{i})_{i=1}^{k}. Then for every k/2k0<kk/2\leq k_{0}<k and k<k1nk<k_{1}\leq n, one has

Here M=Ckk0/(kk0)M=Ck\sqrt{k_{0}}/(k-k_{0}) and C=C(K)C=C(K), c=c(K)>0c=c(K)>0.

It is important that the probability bounds in Theorem 4.2 are exponential in k1kk_{1}-k and kk0k-k_{0}. We will later choose kllog2nk\sim l\sim\log^{2}n and k0(1δ)kk_{0}\approx(1-\delta)k, k1(1+δ)kk_{1}\approx(1+\delta)k, where δ1/logn\delta\sim 1/\log n. This will allow us to make the exceptional probabilities Theorem 4.2 smaller than, say, n10n^{-10}.

As will be clear from the proof, one can replace the distance d(DX,Ek)d(DX,E_{k}) in part (ii) of Theorem 4.2 by the following bigger quantity:

We truncate the singular values of BB by defining an n×nn\times n matrix Bˉ\bar{B} with the same left and right singular vectors as BB, and with singular values

Since si(Bˉ)si(B)s_{i}(\bar{B})\leq s_{i}(B) for all ii, we have BˉBˉBB\bar{B}\bar{B}^{*}\preceq BB^{*} in the p.s.d. order, which implies

It remains to bound BˉX2\|\bar{B}X\|_{2} below. This can be done using Theorem 2.3(ii):

For i>k1ki>k_{1}-k, Cauchy interlacing theorem yields si(Bˉ)=si(B)si+k(D)s_{i}(\bar{B})=s_{i}(B)\geq s_{i+k}(D), thus

(ii) We truncate the singular value decomposition D=i=1nsiuiviD=\sum_{i=1}^{n}s_{i}u_{i}v_{i}^{*} by defining

We will estimate these two terms separately.

Using this for t=ksk0+1t=\sqrt{k}s_{k_{0}+1}, we obtain that with probability at least 12exp(ck)1-2\exp(-ck),

Next, we estimate the first term in (4.4), d(D0X,Ek)d(D_{0}X,E_{k}). Our immediate goal is to represent D0XD_{0}X as a linear combination

with some control of the norm of the coefficient vector a=(a1,,ak)a=(a_{1},\ldots,a_{k}). To this end, let us consider the singular value decomposition

Thus P0P_{0} is a k0×nk_{0}\times n matrix satisfying P0P0=Ik0P_{0}P_{0}^{*}=I_{k_{0}}. Let GG denote the n×kn\times k with columns X1,,XkX_{1},\ldots,X_{k}.

We apply Theorem A.3 for the k0×kk_{0}\times k matrix P0GP_{0}G. It states that with probability at least 12kexp(c(kk0))1-2k\exp(-c(k-k_{0})), we have

Using Lemma 2.2(ii) we can find a coefficient vector a=(a1,,ak)a=(a_{1},\ldots,a_{k}) such that

Multiplying both sides of (4.8) by U0Σ0U_{0}\Sigma_{0} and recalling that D0=U0Σ0V0=U0Σ0P0D_{0}=U_{0}\Sigma_{0}V_{0}^{*}=U_{0}\Sigma_{0}P_{0}, we obtain the desired identity (4.6).

Now we have representation (4.6) with a good control of a2\|a\|_{2}. Then we can estimate the distance as follows:

(Recall that GG denotes the n×kn\times k with matrix columns X1,,XkX_{1},\ldots,X_{k}.) Applying Theorem 2.4, we have with probability at least 12exp(k)1-2\exp(-k) that

Intersecting this with the event (4.10), we obtain with probability at least 16kexp(c(kk0))1-6k\exp(-c(k-k_{0})) that

Finally, we combine this with the event (4.5) and put into the estimate (4.4). It follows that with probability at least 18kexp(c(kk0))1-8k\exp(-c(k-k_{0})), one has

Due to our choice of MM (in (4.10) and (4.7)), the theorem is proved.The factor 88 in the probability estimate can be reduced to 22 by adjusting cc. We will use the same step in later arguments. ∎

Construction of a test projection

We are now ready to construct a test projection PP, which will be used later in Proposition 3.1.

with probability at least 12n2exp(cl/logn)1-2n^{2}\exp(-cl/\log n), one has

In the rest of this section we prove Theorem 5.1.

Consider the n×nn\times n random matrix AA with columns AjA_{j}. Let Aˉ\bar{A} denote the (nl)×(nl)(n-l)\times(n-l) minor of AA obtained by removing the first ll rows and columns. By known invertibility results for random matrices, we will see that most singular values of Aˉ\bar{A}, and thus also of Aˉ1\bar{A}^{-1}, are within a factor nO(1)n^{O(1)} from each other. Then we will find a somewhat smaller interval (a “spectral window”) in which the singular values of Aˉ1\bar{A}^{-1} are within constant factor from each other. This is a consequence of the following elementary lemma.

Let s1s2sns_{1}\geq s_{2}\geq\cdots\geq s_{n}, and define Sˉk2=j>ksj2\bar{S}_{k}^{2}=\sum_{j>k}s_{j}^{2} for k0k\geq 0. Assume that for some lnl\leq n and R1R\geq 1, one has

Set δ=c/logR\delta=c/\log R. Then there exists l[l/2,l]l^{\prime}\in[l/2,l] such that

Let us divide the interval [l/2,l][l/2,l] into 1/(8δ)1/(8\delta) intervals of length 4δl4\delta l. Then for at least one of these intervals, the sequence si2s_{i}^{2} decreases over it by a factor at most 22. Indeed, if this were not true, the sequence would decrease by a factor at least 21/(8δ)>R2^{1/(8\delta)}>R over [l/2,l][l/2,l], which would contradict the assumption (5.1). Set ll^{\prime} to be the midpoint of the interval we just found, thus

By monotonicity of si2s_{i}^{2}, this implies the first part of the conclusion (5.2). To see this, note that since lll^{\prime}\leq l, we have l2δl(1δ)l(1+δ)ll+2δll^{\prime}-2\delta l\leq(1-\delta)l^{\prime}\leq(1+\delta)l^{\prime}\leq l^{\prime}+2\delta l.

To deduce the second part of (5.2), note that by monotonicity we have

where the very last inequality follows from (5.3). Estimates (5.4) and (5.5) together imply that Sˉlδl25Sˉl+δl2\bar{S}_{l^{\prime}-\delta l}^{2}\leq 5\bar{S}_{l^{\prime}+\delta l}^{2}. Like in the first part, we finish by monotonicity. ∎

We shall apply Lemma 5.2 to the singular values of Aˉ1\bar{A}^{-1}, i.e. for

To verify the assumptions of the lemma, we can use known estimates of the extreme singular values of random matrices. By Theorem 2.4 (see Remark 2.5), with probability at least 1exp(n)1-\exp(-n), we have GCn\|G\|\leq C\sqrt{n}, and thus

Further, by Theorem A.2, with probability at least 12lexp(c(n2l))1-2l\exp(-c(n-2l)), one has

(Here we used that ln/4l\leq n/4.) Summarizing, with probability at least 12nexp(cl)1-2n\exp(-cl),

Let us condition on Aˉ\bar{A} for which event (5.6) holds. We apply Lemma 5.2 with R=(C1/c1)nR=(C_{1}/c_{1})n and thus for

We find l[l/2,l]l^{\prime}\in[l/2,l] such that (5.2) holds. Note that the value of ll^{\prime} depends only on the minor Aˉ\bar{A}, thus only on {Aj}j>l\{A_{j}\}_{j>l}, as claimed in Theorem 5.1. Since we have conditioned on Aˉ\bar{A}, the value of the “spectral window” ll^{\prime} is now fixed.

2. Construction of P𝑃P

We construct PP in two steps. First we define a matrix QQ of the same dimensions that satisfies (ii) of the Theorem, and then obtain PP by orthogonalization of the rows of QQ.

Thus we shall look for an l×nl^{\prime}\times n matrix QQ that consists of three blocks of columns:

We require that QQ satisfy condition (ii) in Theorem 5.1, i.e. that

We explore this requirement in Section 5.4; for now let us assume that it holds.

Choose PP to be an l×nl^{\prime}\times n matrix that satisfies the following two defining properties:

the span of the rows of PP is the same as the span of the rows of QQ.

One can construct PP by Gram-Schmidt orhtogonalization of the rows of QQ.

Note that the construction of PP along with (5.8) implies (i) and (ii) of Theorem 5.1. It remains to estimate Pej2\|Pe_{j}\|_{2} thereby proving (iii) of Theorem 5.1.

Let qiq_{i} denote the rows of QQ and qijq_{ij} denote the entries of QQ. Then:

The values of Pei2\|Pe_{i}\|_{2}, ini\leq n, are determined by QQ, and they do not depend on a particular choice of PP satisfying its defining properties (a), (b).

For every l<ill^{\prime}<i\leq l, Pei2=0\|Pe_{i}\|_{2}=0.

(i) Any P,PP,P^{\prime} that satisfy the defining properties (a), (b) must satisfy P=UPP^{\prime}=UP for some l×ll^{\prime}\times l^{\prime} unitary matrix UU. It follows that Pei2=Pei2\|P^{\prime}e_{i}\|_{2}=\|Pe_{i}\|_{2} for all ii.

(ii) Let us assume that i=1i=1; the argument for general ii is similar. By part (i), we can construct the rows of PP by performing Gram-Schmidt procedure on the rows of QQ in any order. We choose the following order: ql,ql1,,q1q_{l^{\prime}},q_{l^{\prime}-1},\ldots,q_{1}, and thus construct the rows pl,pl1,,p1p_{l^{\prime}},p_{l^{\prime}-1},\ldots,p_{1} of PP. This yields

where pijp_{ij} denote the entries of PP.

Next, for each 2jl2\leq j\leq l^{\prime}, (5.11) implies that pjspan(qk)k2=E1p_{j}\in\operatorname*{span}(q_{k})_{k\geq 2}=E_{1}, and thus the first coordinate of pjp_{j} equal zero. Using this in (5.12), we conclude that

(iii) is trivial since Qei=0Qe_{i}=0 for all l<ill^{\prime}<i\leq l by the construction of QQ, while the rows of PP are the linear combination of the rows of QQ. ∎

4. The kernel requirement (5.8)

In order to estimate the distances d(qi,Ei)d(q_{i},E_{i}) defined by the rows of QQ, let us explore the condition (5.8) for QQ. To express this condition algebraically, let us consider the n×(nl)n\times(n-l) matrix A(l)A^{(l)} obtained by removing the first ll columns from AA. Then (5.8) can be written as

Let us denote the first ll rows of A(l)A^{(l)} by BiTB_{i}^{\mathsf{T}}, thus

Without loss of generality, we can assume that the matrix Aˉ\bar{A} is almost surely invertible (see Section 2 for a perturbation argument achieving this). Multiplying both sides of the previous equations by Aˉ1\bar{A}^{-1}, we further rewrite them as

Thus we can choose QQ to satisfy the requirement (5.8) by choosing qii>0q_{ii}>0 arbitrarily and defining qˉi\bar{q}_{i} as in (5.15).

5. Estimating the distances, and completion of proof of Theorem 5.1

We shall now estimate Pei2\|Pe_{i}\|_{2}, 1il1\leq i\leq l^{\prime}, using identities (5.9) and (5.15). By the construction of QQ and (5.15) we have

Let us estimate Pe12\|Pe_{1}\|_{2}; the argument for general Pei2\|Pe_{i}\|_{2} is similar. By (5.9),

We will use Theorem 4.2 to obtain lower and upper bounds on d1d_{1}.

We apply Theorem 4.2 in dimension nln-l instead of nn, and with

Recall here that in (5.7) we selected δ=c/logn\delta=c/\log n. Note that by construction (5.14), the vectors BiB_{i} do not contain the diagonal elements of AA, and so their entries have mean zero as required in Theorem 4.2. Applying part (i) of that theorem, we obtain with probability at least 12exp(cδl)1-2\exp(-c\delta l^{\prime}) that

Now we apply part (ii) of Theorem 4.2. This time we shall use a sharper bound stated in Remark 4.4. It yields that with probability at least 12lexp(cδl)1-2l^{\prime}\exp(-c\delta l^{\prime}), the following holds. There exists a=(a2,,al)a=(a_{2},\ldots,a_{l^{\prime}}) such that

We can simplify (5.18). Using (5.2) and monotonicity, we have

Recall that this holds with probability at least 12lexp(cδl)1-2l^{\prime}\exp(-c\delta l^{\prime}). On this event, by the construction of rir_{i} and using the bound on aa in (5.19), we have

5.3. Completion of the proof of Theorem 5.1

Combining the events (5.20) and (5.17), we have shown the following. With probability at least 14lexp(cδl)1-4l^{\prime}\exp(-c\delta l^{\prime}), the following two-sided estimate holds:

A similar statement can be proved for general did_{i}, 1il1\leq i\leq l^{\prime}. By intersecting these events, we obtain that with probability at least 14(l)2exp(cδl)1-4(l^{\prime})^{2}\exp(-c\delta l^{\prime}), all such bounds for did_{i} hold simultaneously. Suppose this indeed occurs. Then by (5.16), we have

We have calculated the conditional probability of (5.21); recall that we conditioned on Aˉ\bar{A} which satisfies the event (5.6), which itself holds with probability 12nexp(cl)1-2n\exp(-cl). Thus the unconditional probability of the event (5.21) is at least 12nexp(cl)C1(l)2exp(cδl)1-2n\exp(-cl)-C_{1}(l^{\prime})^{2}\exp(-c\delta l^{\prime}). Recalling that l/2ln/4l/2\leq l\leq n/4 and δ=c/logn\delta=c/\log n, and simplifying this expression, we arrive at the probability bound claimed in Theorem 5.1. Since M2l/δM\leq 2\sqrt{l}/\delta according to (5.19), the estimate (5.21) yields the first part of (iii) in Theorem 5.1. The second part, stating that Pei=0Pe_{i}=0 for l<ill^{\prime}<i\leq l, was already noted in (iii) or Proposition 5.3. Thus Theorem 5.1 is proved. ∎

Proof of Theorem 1.6 and Corollary 1.5

Let GG be a random matrix from Theorem 1.6. We shall apply Proposition 3.1 for

Let α=c/(llog3/2n)\alpha=c/(l\log^{3/2}n) and κ=c\kappa=c. Then, for every fixed (j0,J0)Λ(j_{0},J_{0})\in\Lambda, one can find a test projection as required in Proposition 3.1. Moreover,

Without loss of generality, we assume that j0=1j_{0}=1 and J0={2,,n}J_{0}=\{2,\ldots,n\}. We apply Theorem 5.1, and choose l[l/2,l]l^{\prime}\in[l/2,l] and PP determined by {Aj}j>l\{A_{j}\}_{j>l} guaranteed by that theorem. The test projeciton PP automatically satisfies the conditions of Proposition 3.1. Moreover, with probability at least 12n2exp(cl/logn)1-2n^{2}\exp(-cl/\log n), one has

Let us condition on {Aj}j>l\{A_{j}\}_{j>l} for which the event (6.2) holds; this fixes ll^{\prime} and PP but leaves {Aj}jl\{A_{j}\}_{j\leq l} random as before.

The definition (3.2) of balancing event Bα,κ\mathcal{B}_{\alpha,\kappa} requires us to estimate the norms of

Hence, estimates (6.3) and (6.5) hold simultaneously with probability at least 14exp(cl)1-4\exp(-cl). Recall that this concerns conditional probability, where we conditioned on the event (6.2), which itself holds with probability at least 12n2exp(cl/logn)1-2n^{2}\exp(-cl/\log n). Therefore, estimates (6.3) and (6.5) hold simultaneously with (unconditional) probability at least 14exp(cl)2n2exp(cl/logn)16n2exp(cl/logn)1-4\exp(-cl)-2n^{2}\exp(-cl/\log n)\geq 1-6n^{2}\exp(-cl/\log n). Together they yield

This is the first part of the event Bα,κ\mathcal{B}_{\alpha,\kappa}. Finally, (6.3) implies that PA12cl\|PA_{1}\|_{2}\geq c\sqrt{l}, which is the second part of the event Bα,κ\mathcal{B}_{\alpha,\kappa} for κ=c\kappa=c. The proof is complete. ∎

Substituting the conclusion of Proposition 6.1 into Proposition 3.1, we obtain:

From this we can readily deduce a slightly stronger version of Theorem 1.6.

Consider a random matrix GG as in Theorem 1.6. Let 0sn0\leq s\leq n, s+2ln/4s+2\leq l\leq n/4 and W=Cllog3/2nW=Cl\log^{3/2}n. Then the event

Recall that GG is nicely bounded with high probability. Indeed, Theorem 2.4 (see Remark 2.5) states that the event

Assume that Enorm\mathcal{E}_{\textrm{norm}} holds. Then all (s/n)(s/\sqrt{n})-approximate eigenvalues of GG are contained in the complex disc centered at the origin and with radius G+s/n2C1n\|G\|+s/\sqrt{n}\leq 2C_{1}\sqrt{n}. Let {z1,,zN}\{z_{1},\ldots,z_{N}\} be a (1/n)(1/\sqrt{n})-net of this disc such that NC2n2N\leq C_{2}n^{2}.

Assume LW\mathcal{L}_{W} holds, so there exists an (s/n)(s/\sqrt{n})-approximate eigenvector vv of GG such that v2=1\|v\|_{2}=1 and v>Wl/n\|v\|_{\infty}>W\sqrt{l/n}. Choose a point ziz_{i} in the net closest to zz, so zzi1/n|z-z_{i}|\leq 1/\sqrt{n}. Then

This argument shows that LWEnormi=1NLW(i)\mathcal{L}_{W}\cap\mathcal{E}_{\textrm{norm}}\subseteq\bigcup_{i=1}^{N}\mathcal{L}_{W}^{(i)}, where

Recall that the probability of Enorm\mathcal{E}_{\textrm{norm}} is estimated in (6.6), and the probabilities of the events LW(i)\mathcal{L}_{W}^{(i)} can be bounded using Proposition 6.2 with w=s+1w=s+1. (Our assumption that ls+2l\geq s+2 enforces the bound w<lw<l that is needed in Proposition 6.2.) It follows that

Simplifying this bound we complete the proof. ∎

We are going to apply Corollary 6.3 for l=Ct(s+1)log2nl=Ct(s+1)\log^{2}n. This is possible as long as sns\leq n and t(s+1)<cn/log2nt(s+1)<cn/\log^{2}n, since the latter restriction enforces the bound ln/4l\leq n/4. In this regime, the conclusion of Theorem 1.6 follows directly from Corollary 6.3.

In the remaining case, where either sns\geq n or t(εn+1)>cn/log2nt(\varepsilon\sqrt{n}+1)>cn/\log^{2}n, the right hand side of (1.1) is greater than v2\|v\|_{2} for an appropriate choice of the constant CC. Thus, in this case, the bound (1.1) holds trivially since one always has vv2\|v\|_{\infty}\leq\|v\|_{2}. Theorem 1.6 is proved. ∎

Using a standard truncation argument, we will now deduce Corollary 1.5 for general exponential tail decay. We will first prove the following relaxation of Proposition 6.2.

Here β,γ,C,c>0\beta,\gamma,C,c>0 depend only on α\alpha and MM.

Appendix A Invertibility of random matrices

Our delocalization method relied on estimates of the smallest singular values of rectangular random matrices. The method works well provided one has access to estimates that are polynomial in the dimension of the matrix (which sometimes was of order nn, and other times of order llog2nl\sim\log^{2}n), and provided the probability of having these estimates is, say, at least 1n101-n^{-10}.

In the recent years, significantly sharper bounds were proved than those required in our delocalization method, see survey . We chose to include weaker bounds in this appendix for two reasons. First, they hold in somewhat more generality than those recorded in the literature, and also their proofs are significantly simpler.

Let NnN\geq n, and let A=D+GA=D+G where DD is an arbitrary N×nN\times n fixed matrix and GG is an N×nN\times n random matrix with independent entries satisfying Assumption 2.1. Then

Using the negative second moment identity (see Lemma A.4]), we have

Union bound yields that with probability at least 12nexp(c(Nn))1-2n\exp(-c(N-n)), we have d(Ai,Ei)cNnd(A_{i},E_{i})\geq c\sqrt{N-n} for all ini\leq n. Plugging this into (A.2), we conclude that with the same probability, sn(A)2c2n/(Nn)s_{n}(A)^{-2}\leq c^{-2}n/(N-n). This completes the proof. ∎

Let A=D+GA=D+G where DD is an arbitrary N×MN\times M fixed matrix and GG is an N×MN\times M random matrix with independent entries satisfying Assumption 2.1. Then all singular values sn(A)s_{n}(A) for 1nmin(N,M)1\leq n\leq\min(N,M) satisfy the estimate (A.1) with c=c(K)>0c=c(K)>0.

Recall that sn(A)sn(A0)s_{n}(A)\geq s_{n}(A_{0}) where A0A_{0} is formed by the first nn columns of AA. The conclusion follows from Theorem A.1 applied to A0A_{0}. ∎

Let us explain the idea of the proof of Theorem A.3. We need a lower bound for

where GiG_{i} denote the columns of GG. The bound has to be uniform over xSm1x\in S^{m-1}. Let m=(1δ)km=(1-\delta)k and set m0=(1ρ)mm_{0}=(1-\rho)m for a suitably chosen ρδ\rho\ll\delta.

The vectors xx that lie near the subspace EE^{\perp}, which has dimension mm0=ρmm-m_{0}=\rho m, can be controlled by the remaining km0k-m_{0} vectors PGiPG_{i}, since km0mm0k-m_{0}\gg m-m_{0}. Indeed, this is equivalent to controlling the smallest singular value of a (mm0)×(km0)(m-m_{0})\times(k-m_{0}) random matrix whose columns are QGiQG_{i}, where QQ projects onto EE^{\perp}. This is a version of Theorem A.3 for very fat matrices, and it can be proved in a standard way by using ε\varepsilon-nets.

Let m0mm_{0}\leq m. Consider the m×m0m\times m_{0} matrix T0T_{0} formed by the first m0m_{0} columns of matrix T=PGT=PG. Then

This is a minor variant of Theorem A.1; its proof is very similar and is omitted. ∎

There exist C=C(K)C=C(K), c=c(K)>0c=c(K)>0 such that the following holds. Consider the same situation as in Theorem A.3, except that we assume that kCmk\geq Cm. Then

Lemma A.5 is a minor variation of [38, Theorem 5.39] for kCmk\geq Cm independent sub-gaussian columns, and it can be proved in a similar way (using a standard concentration and covering argument). ∎

Denote T:=PGT:=PG; our goal is to bound below the quantity

Let ε,ρ(0,1/2)\varepsilon,\rho\in(0,1/2) be parameters, and set m0=(1ρ)mm_{0}=(1-\rho)m. We decompose

where T0T_{0} is the m×m0m\times m_{0} matrix that consists of the first m0m_{0} columns of TT, and Tˉ\bar{T} is the (km0)×m(k-m_{0})\times m matrix that consists of the last km0k-m_{0} columns of TT. Let xSm1x\in S^{m-1}. Then

Assume that sm0(T0)>0s_{m_{0}}(T_{0})>0 (which will be seen to be a likely event), so dim(E)=m0\dim(E)=m_{0}.

The argument now splits according to the position of xx relative to EE. Assume first that PEx2ε\|P_{E}x\|_{2}\geq\varepsilon. Since rank(T0)=m0\operatorname*{rank}(T_{0})=m_{0}, using Lemma 2.2(i) we have

We will later apply Lemma A.4 to bound sm0(T0)s_{m_{0}}(T_{0}) below.

Consider now the opposite case, where PEx2<ε\|P_{E}x\|_{2}<\varepsilon. There exists yEy\in E^{\perp} such that xy2ε\|x-y\|_{2}\leq\varepsilon, and in particular y2x2ε1ε>1/2\|y\|_{2}\geq\|x\|_{2}-\varepsilon\geq 1-\varepsilon>1/2. Thus

is an (mm0)×(km0)(m-m_{0})\times(k-m_{0}) matrix. Since z21/2\|z\|_{2}\geq 1/2, we have Tˉy212smm0(B)\|\bar{T}^{*}y\|_{2}\geq\frac{1}{2}s_{m-m_{0}}(B), which together with (A.3) yields

A bit later, we will use Lemma A.5 to bound smm0(Bˉ)s_{m-m_{0}}(\bar{B}) below.

Putting the two cases together, we have shown that

It remains to estimate sm0(T0)s_{m_{0}}(T_{0}), smm0(Bˉ)s_{m-m_{0}}(\bar{B}) and Tˉ\|\bar{T}\|.

Since m0=(1ρ)mm_{0}=(1-\rho)m and ρ(0,1/2)\rho\in(0,1/2), Lemma A.4 yields that with probability at least 12mexp(cρm)1-2m\exp(-c\rho m), we have

Next, we use Lemma A.5 for the (mm0)×(km0)(m-m_{0})\times(k-m_{0}) matrix Bˉ=RGˉ\bar{B}=R\bar{G}. Let δ(0,1)\delta\in(0,1) be such that m=(1δ)km=(1-\delta)k. Since m0=(1ρ)mm_{0}=(1-\rho)m, by choosing ρ=c0δ\rho=c_{0}\delta with a suitable c0>0c_{0}>0 we can achieve that km0C(mm0)k-m_{0}\geq C(m-m_{0}) to satisfy the dimension requirement in Lemma A.5. Then, with probability at least 12exp(cδk)1-2\exp(-c\delta k) we have

Further, by Theorem 2.4, with probability at least 12exp(k)1-2\exp(-k) we have

Putting all these estimates in (A.4), we find that with probability at least 12mexp(cρm)2exp(cδk)2exp(k)1-2m\exp(-c\rho m)-2\exp(-c\delta k)-2\exp(-k), one has

Now we choose ε=c1δ\varepsilon=c_{1}\sqrt{\delta} with a suitable c1>0c_{1}>0, and recall that we have chosen ρ=c0δ\rho=c_{0}\delta. We conclude that sm(T)cmin{δ,δk}=cδs_{m}(T)\geq c\min\{\delta,\,\sqrt{\delta k}\}=c\delta. Since m=(1δ)km=(1-\delta)k, the proof of Theorem A.3 is complete. ∎

References