Delocalization of eigenvectors of random matrices with independent entries
Mark Rudelson, Roman Vershynin
Introduction
This paper establishes a complete delocalization of random matrices with independent entries haying variances of the same order of magnitude. For an matrix , complete delocalization refers to the situation where all unit eigenvectors of have all coordinates of the smallest possible magnitude , up to logarithmic corrections. For example, a random matrix with independent standard normal entries is completely delocalized with high probability. Indeed, by rotation invariance the unit eigenvectors are uniformly distributed on the sphere , so with high probability one has for all .
Rotation-invariant ensembles seem to be the only example where delocalization can be obtained easily. Only recently was it proved by L. Erdös et al. that general symmetric and Hermitian random matrices with independent entries are completely delocalized . These results were later extended by L. Erdös et al. and by Tao and Vu , see also surveys . Very recently, the optimal bound was obtained by Vu and Wang for the “bulk” eigenvectors of Hermitian matrices. Delocalization properties with varying degrees of strength and generality were then established for several other symmetric and Hermitian ensembles – band matrices , sparse matrices (adjacency matrices of Erdös-Renyi graphs) , heavy-tailed matrices , and sample covariance matrices .
Despite this recent progress, no delocalization results were available for non-Hermitian random matrices prior to the present work. Similar to the Hermitian case, non-Hermitian random matrices have been successful in describing various physical phenomena, see and the references therein. The distribution of eigenvectors of non-Hermitian random matrices has been studied in physics literature, mostly focusing on correlations of certain eigenvector entries, see .
All previous approaches to delocalization in random matrices were spectral. Delocalization was obtained as a byproduct of local limit laws, which determine eigenvalue distribution on microscopic scales. For example, delocalization for symmetric random matrices was deduced from a local version of Wigner’s semicircle law which controls the number of eigenvalues of falling in short intervals, even down to intervals where the average number of eigenvalues is logarithmic in the dimension .
In this paper we develop a new approach to delocalization of random matrices, which is geometric rather than spectral. The only spectral properties we rely on are crude bounds on the extreme singular values of random matrices. As a result, the new approach can work smoothly in situations where limit spectral laws are unknown or even impossible. In particular, one does not need to require that the variances of all entries be the same, or even that the matrix of variances be doubly-stochastic (as e.g. ).
The case corresponds to sub-gaussian random variablesStandard properties of sub-gaussian random variables can be found in [38, Section 5.2.3].. It is convenient to state and prove the main result for sub-gaussian random variables, and then deduce a similar result for general using a standard truncation argument.
The same conclusion as in Theorem 1.1 holds for a complex matrix . One just needs to require that both real and imaginary parts of all entries are independent and satisfy the three conditions in Theorem 1.1.
The exponent of the logarithm in Theorem 1.1 is suboptimal, and there are several points in the proof that can be improved. We believe that by taking care of these points, it is possible to improve the exponent to the optimal value . However, such improvements would come at the expense of simplicity of the argument, while in this paper we aim at presenting the most transparent proof. The exponents is probably suboptimal as well.
The proof of Theorem 1.1 shows that depends polynomially on , i.e., for some absolute constant . This observation allows one to extend Theorem 1.1 to the situation where the entries of have uniformly bounded -norms, for any fixed .
Here depend only on and .
Let be a random matrix as in Theorem 1.1, and let and . Then, with probability at least , all -approximate eigenvectors of satisfy
The results in of this paper could be extended in several other ways. For instance, it is possible to drop the assumption that all variances of the entries are of the same order and prove a similar theorem for sparse random matrices. One can establish the isotropic delocalization in the sense of . We did not pursue these directions since it would have made the presentation heavier. It is also possible that a version of Theorems 1.1 and 1.6 can be proved for Hermitian matrices. We leave this direction for the future.
Our approach to Theorem 1.6 is based on a dimension reduction argument. If the matrix has a localized approximate eigenvector, it will be detected from an imbalance of a suitable projection of . As we shall see, this argument yields a lower bound on that is uniform over all unit vectors such that .
Let us describe this strategy in loose terms. We fix and consider the random matrix ; denote its columns by . Consider a projection whose kernel contains all except for , where and are a random uniform index and subset respectively, with . We call such a test projection. By its definition, triangle inequality and Cachy-Schwarz inequality, we have for any vector that
Suppose that is localized, say . Using the randomness of and , with non-negligible probability (around ) we have and . On this event, we have shown that
Since the right hand side of (1.2) does not depend on , we obtained a uniform lower estimate for all localized vectors .
It remains to estimate the magnitudes of for . What helps us is that the test projection can be made independent of the random vectors appearing in (1.2). Since , we can represent where denote the standard basis vectors. Assume first that is very close to zero, so . Then, using concentration of measure we can argue that with high probability (and thus for all simultaneously). Substituting this into (1.2) we conclude that the nice lower bound
holds for all localized approximate eigenvectors corresponding to (approximate) eigenvalues that are very close to zero.
The challenging part of our argument is for not close to zero, namely when the diagonal parts dominates in the representation . Estimating the magnitudes of might be as difficult as the original delocalization problem. However, it turns out that using concentration, it is possible to compare the terms with each other without knowing their magnitudes. This will require a careful construction of a test projection .
The rest of the paper is organized as follows. In Section 2, we recall some known linear algebraic and probabilistic facts. In Section 3, we rigorously develop the argument that was informally described above. It reduces the delocalization problem to finding a test projection for which the norms of the columns have similar magnitudes. In Section 4, we shall develop a helpful tool for estimating , an estimate of the distance between anisotropic random vectors and subspaces. In Section 5, we shall express in terms of such distances, and thus will be able to compare these terms with each other. In Section 6 we deduce Theorem 1.6. Finally, Appendix contains auxiliary results on the smallest singular values of random matrices.
Notation and preliminaries
We shall work with random variables which satisfy the following assumption.
is either real valued and satisfies
or is complex valued, where and are independent random variables each satisfying the three conditions in (2.1).
We will establish the conclusion of Theorem 1.6 for random matrices with independent entries that satisfy Assumption 2.1. Thus we will simultaneously treat the real case and the complex case discussed in Remark 1.2.
We will regard the parameter in Assumption 2.1 as a constant, thus will denote positive numbers that may depend on only; their values may change from line to line.
Without loss of generality, we can assume that , as well as various other matrices that we will encounter in the proof, have full rank. This can be achieved by a perturbation argument, where one adds to an independent Gaussian random matrix whose all entries are independent random variables with sufficiently small . Such perturbation will not affect the proof of Theorem 1.6 since any -approximate eigenvector of will be a -approximate eigenvector of whenever .
Here denotes the Moore-Penrose pseudoinverse of , see e.g. . We will need a few elementary properties of singular values.
Let be an matrix and .
Appendix A contains estimates of the smallest singular values of random matrices.
Next, we state a concentration property of sub-gaussian random vectors.
Let be a fixed matrix. Consider a random vector with independent components which satisfy Assumption 2.1.
(Concentration) For any , we have
In both parts, is polynomial in .
This result can be deduced from Hanson-Wright inequality. For part (ii), this was done in . A modern proof of Hanson-Wright inequality and deduction of both parts of Theorem 2.3 are discussed in . There were assumed to have unit variances; the general case follows by a standard normalization step.
Sub-gaussian concentration paired with a standard covering argument yields the following result on norms of random matrices, see .
The same holds if is an matrix such that .
Reducing delocalization to the existence of a test projection
We will first try to bound the probability of the following localization event for a random matrix and parameters :
In this section, we reduce our task to the existence of a certain linear map which reduces dimension from to , and which we call a test projection.
To this end, given an matrix , we shall denote by the -th column of , and for a subset , we denote by the submatrix of formed by the columns indexed by . Fix and , and define the set of pairs
We equip with the uniform probability measure.
Let . Consider an random matrix with an arbitrary distribution. Suppose that to each corresponds a number and an matrix with the following properties:
.
Let . Let and . Then we can bound the probability of the localization event (3.1) as follows:
where denotes the following balancing event:
Proposition 3.1 states that in order to establish delocalization (as encoded by the complement of the event ), it is enough to find a test projection which satisfies the balancing property .
Let , let be as in the statement. Using the properties (i) and (ii) of , we have
The event will help us balance the norms and , while the following elementary lemma will help us balance the coefficients .
For a given and for random , define the event
Let denote a coordinate for which . Then
Conditionally on , the distribution of is uniform in the set . Thus using Chebyshev’s inequality we obtain
Assume that a realization of the random matrix satisfies
(We will analyze when this event occurs later.) Combining with the conclusion of Lemma 3.2, we see that there exists such that both events and hold. Then we can continue estimating in (3.3) using and as follows:
provided the right hand side is non-negative. In particular, if where , then . Thus the localization event must fail.
Let us summarize. We have shown that the localization event implies the failure of the event (3.5). The probability of this failure can be estimated using Chebyshev’s inequality and Fubini theorem as follows:
This completes the proof of Proposition 3.1. ∎
We might choose to be the orthogonal projection with
In reality, will be a bit more adapted to . Let us see what it will take to prove the two inequalities defining the balancing event in (3.2). The second inequality can be deduced from the small ball probability estimate, Theorem 2.3(ii). Turning to the first inequality, note that
up to a polynomial factor in (thus logarithmic in ). So we need to show that
Since , the columns of can be expressed as . Thus, informally speaking, our task is to show that with high probability,
The first inequality can be deduced from sub-gaussian concentration, Theorem 2.3. The second inequality in (3.6) is challenging, and most of the remaining work is devoted to validating it. Instead of estimating , we will compare these terms with each other.
Later, in Proposition 5.3, we will relate to distances between anisotropic random vectors and subspaces. We will now digress to develop a general bound on such distances, which may be interesting on its own.
Distances between anisotropic random vectors and subspaces
Let us start with the isotropic case, where the random vectors in question have all independent coordinates. Here one can use Theorem 2.3 to control the distances.
Let . Consider independent random vectors with independent coordinates satisfying Assumption 2.1. Consider the subspace . Then
By adding small independent Gaussian perturbations to the vectors , we can assume that almost surely. We can represent the distance as
In this paper, we will need to control the distances in the more difficult anisotropic case, where all random vectors are transformed by a fixed linear map . In other words, we will be interested in distances of the form where is the span of the vectors . An ideal estimate should look like
where are the singular values of arranged in the non-increasing order. To see why such estimate would make sense, note that in the isotropic case where the distance is of order , while for of rank or lower, the distance is zero.
The following result, based again on Theorem 2.3, establishes a somewhat weaker form of (4.1) with exponentially high probability.
Let be an matrix with singular valuesAs usual, we arrange the singular values in a non-increasing order. , and define for . Let . Consider independent random vectors with independent coordinates satisfying Assumption 2.1. Consider the subspace . Then for every and , one has
Here and , .
It is important that the probability bounds in Theorem 4.2 are exponential in and . We will later choose and , , where . This will allow us to make the exceptional probabilities Theorem 4.2 smaller than, say, .
As will be clear from the proof, one can replace the distance in part (ii) of Theorem 4.2 by the following bigger quantity:
We truncate the singular values of by defining an matrix with the same left and right singular vectors as , and with singular values
Since for all , we have in the p.s.d. order, which implies
It remains to bound below. This can be done using Theorem 2.3(ii):
For , Cauchy interlacing theorem yields , thus
(ii) We truncate the singular value decomposition by defining
We will estimate these two terms separately.
Using this for , we obtain that with probability at least ,
Next, we estimate the first term in (4.4), . Our immediate goal is to represent as a linear combination
with some control of the norm of the coefficient vector . To this end, let us consider the singular value decomposition
Thus is a matrix satisfying . Let denote the with columns .
We apply Theorem A.3 for the matrix . It states that with probability at least , we have
Using Lemma 2.2(ii) we can find a coefficient vector such that
Multiplying both sides of (4.8) by and recalling that , we obtain the desired identity (4.6).
Now we have representation (4.6) with a good control of . Then we can estimate the distance as follows:
(Recall that denotes the with matrix columns .) Applying Theorem 2.4, we have with probability at least that
Intersecting this with the event (4.10), we obtain with probability at least that
Finally, we combine this with the event (4.5) and put into the estimate (4.4). It follows that with probability at least , one has
Due to our choice of (in (4.10) and (4.7)), the theorem is proved.The factor in the probability estimate can be reduced to by adjusting . We will use the same step in later arguments. ∎
Construction of a test projection
We are now ready to construct a test projection , which will be used later in Proposition 3.1.
with probability at least , one has
In the rest of this section we prove Theorem 5.1.
Consider the random matrix with columns . Let denote the minor of obtained by removing the first rows and columns. By known invertibility results for random matrices, we will see that most singular values of , and thus also of , are within a factor from each other. Then we will find a somewhat smaller interval (a “spectral window”) in which the singular values of are within constant factor from each other. This is a consequence of the following elementary lemma.
Let , and define for . Assume that for some and , one has
Set . Then there exists such that
Let us divide the interval into intervals of length . Then for at least one of these intervals, the sequence decreases over it by a factor at most . Indeed, if this were not true, the sequence would decrease by a factor at least over , which would contradict the assumption (5.1). Set to be the midpoint of the interval we just found, thus
By monotonicity of , this implies the first part of the conclusion (5.2). To see this, note that since , we have .
To deduce the second part of (5.2), note that by monotonicity we have
where the very last inequality follows from (5.3). Estimates (5.4) and (5.5) together imply that . Like in the first part, we finish by monotonicity. ∎
We shall apply Lemma 5.2 to the singular values of , i.e. for
To verify the assumptions of the lemma, we can use known estimates of the extreme singular values of random matrices. By Theorem 2.4 (see Remark 2.5), with probability at least , we have , and thus
Further, by Theorem A.2, with probability at least , one has
(Here we used that .) Summarizing, with probability at least ,
Let us condition on for which event (5.6) holds. We apply Lemma 5.2 with and thus for
We find such that (5.2) holds. Note that the value of depends only on the minor , thus only on , as claimed in Theorem 5.1. Since we have conditioned on , the value of the “spectral window” is now fixed.
2. Construction of P𝑃P
We construct in two steps. First we define a matrix of the same dimensions that satisfies (ii) of the Theorem, and then obtain by orthogonalization of the rows of .
Thus we shall look for an matrix that consists of three blocks of columns:
We require that satisfy condition (ii) in Theorem 5.1, i.e. that
We explore this requirement in Section 5.4; for now let us assume that it holds.
Choose to be an matrix that satisfies the following two defining properties:
the span of the rows of is the same as the span of the rows of .
One can construct by Gram-Schmidt orhtogonalization of the rows of .
Note that the construction of along with (5.8) implies (i) and (ii) of Theorem 5.1. It remains to estimate thereby proving (iii) of Theorem 5.1.
Let denote the rows of and denote the entries of . Then:
The values of , , are determined by , and they do not depend on a particular choice of satisfying its defining properties (a), (b).
For every , .
(i) Any that satisfy the defining properties (a), (b) must satisfy for some unitary matrix . It follows that for all .
(ii) Let us assume that ; the argument for general is similar. By part (i), we can construct the rows of by performing Gram-Schmidt procedure on the rows of in any order. We choose the following order: , and thus construct the rows of . This yields
where denote the entries of .
Next, for each , (5.11) implies that , and thus the first coordinate of equal zero. Using this in (5.12), we conclude that
(iii) is trivial since for all by the construction of , while the rows of are the linear combination of the rows of . ∎
4. The kernel requirement (5.8)
In order to estimate the distances defined by the rows of , let us explore the condition (5.8) for . To express this condition algebraically, let us consider the matrix obtained by removing the first columns from . Then (5.8) can be written as
Let us denote the first rows of by , thus
Without loss of generality, we can assume that the matrix is almost surely invertible (see Section 2 for a perturbation argument achieving this). Multiplying both sides of the previous equations by , we further rewrite them as
Thus we can choose to satisfy the requirement (5.8) by choosing arbitrarily and defining as in (5.15).
5. Estimating the distances, and completion of proof of Theorem 5.1
We shall now estimate , , using identities (5.9) and (5.15). By the construction of and (5.15) we have
Let us estimate ; the argument for general is similar. By (5.9),
We will use Theorem 4.2 to obtain lower and upper bounds on .
We apply Theorem 4.2 in dimension instead of , and with
Recall here that in (5.7) we selected . Note that by construction (5.14), the vectors do not contain the diagonal elements of , and so their entries have mean zero as required in Theorem 4.2. Applying part (i) of that theorem, we obtain with probability at least that
Now we apply part (ii) of Theorem 4.2. This time we shall use a sharper bound stated in Remark 4.4. It yields that with probability at least , the following holds. There exists such that
We can simplify (5.18). Using (5.2) and monotonicity, we have
Recall that this holds with probability at least . On this event, by the construction of and using the bound on in (5.19), we have
5.3. Completion of the proof of Theorem 5.1
Combining the events (5.20) and (5.17), we have shown the following. With probability at least , the following two-sided estimate holds:
A similar statement can be proved for general , . By intersecting these events, we obtain that with probability at least , all such bounds for hold simultaneously. Suppose this indeed occurs. Then by (5.16), we have
We have calculated the conditional probability of (5.21); recall that we conditioned on which satisfies the event (5.6), which itself holds with probability . Thus the unconditional probability of the event (5.21) is at least . Recalling that and , and simplifying this expression, we arrive at the probability bound claimed in Theorem 5.1. Since according to (5.19), the estimate (5.21) yields the first part of (iii) in Theorem 5.1. The second part, stating that for , was already noted in (iii) or Proposition 5.3. Thus Theorem 5.1 is proved. ∎
Proof of Theorem 1.6 and Corollary 1.5
Let be a random matrix from Theorem 1.6. We shall apply Proposition 3.1 for
Let and . Then, for every fixed , one can find a test projection as required in Proposition 3.1. Moreover,
Without loss of generality, we assume that and . We apply Theorem 5.1, and choose and determined by guaranteed by that theorem. The test projeciton automatically satisfies the conditions of Proposition 3.1. Moreover, with probability at least , one has
Let us condition on for which the event (6.2) holds; this fixes and but leaves random as before.
The definition (3.2) of balancing event requires us to estimate the norms of
Hence, estimates (6.3) and (6.5) hold simultaneously with probability at least . Recall that this concerns conditional probability, where we conditioned on the event (6.2), which itself holds with probability at least . Therefore, estimates (6.3) and (6.5) hold simultaneously with (unconditional) probability at least . Together they yield
This is the first part of the event . Finally, (6.3) implies that , which is the second part of the event for . The proof is complete. ∎
Substituting the conclusion of Proposition 6.1 into Proposition 3.1, we obtain:
From this we can readily deduce a slightly stronger version of Theorem 1.6.
Consider a random matrix as in Theorem 1.6. Let , and . Then the event
Recall that is nicely bounded with high probability. Indeed, Theorem 2.4 (see Remark 2.5) states that the event
Assume that holds. Then all -approximate eigenvalues of are contained in the complex disc centered at the origin and with radius . Let be a -net of this disc such that .
Assume holds, so there exists an -approximate eigenvector of such that and . Choose a point in the net closest to , so . Then
This argument shows that , where
Recall that the probability of is estimated in (6.6), and the probabilities of the events can be bounded using Proposition 6.2 with . (Our assumption that enforces the bound that is needed in Proposition 6.2.) It follows that
Simplifying this bound we complete the proof. ∎
We are going to apply Corollary 6.3 for . This is possible as long as and , since the latter restriction enforces the bound . In this regime, the conclusion of Theorem 1.6 follows directly from Corollary 6.3.
In the remaining case, where either or , the right hand side of (1.1) is greater than for an appropriate choice of the constant . Thus, in this case, the bound (1.1) holds trivially since one always has . Theorem 1.6 is proved. ∎
Using a standard truncation argument, we will now deduce Corollary 1.5 for general exponential tail decay. We will first prove the following relaxation of Proposition 6.2.
Here depend only on and .
Appendix A Invertibility of random matrices
Our delocalization method relied on estimates of the smallest singular values of rectangular random matrices. The method works well provided one has access to estimates that are polynomial in the dimension of the matrix (which sometimes was of order , and other times of order ), and provided the probability of having these estimates is, say, at least .
In the recent years, significantly sharper bounds were proved than those required in our delocalization method, see survey . We chose to include weaker bounds in this appendix for two reasons. First, they hold in somewhat more generality than those recorded in the literature, and also their proofs are significantly simpler.
Let , and let where is an arbitrary fixed matrix and is an random matrix with independent entries satisfying Assumption 2.1. Then
Using the negative second moment identity (see Lemma A.4]), we have
Union bound yields that with probability at least , we have for all . Plugging this into (A.2), we conclude that with the same probability, . This completes the proof. ∎
Let where is an arbitrary fixed matrix and is an random matrix with independent entries satisfying Assumption 2.1. Then all singular values for satisfy the estimate (A.1) with .
Recall that where is formed by the first columns of . The conclusion follows from Theorem A.1 applied to . ∎
Let us explain the idea of the proof of Theorem A.3. We need a lower bound for
where denote the columns of . The bound has to be uniform over . Let and set for a suitably chosen .
The vectors that lie near the subspace , which has dimension , can be controlled by the remaining vectors , since . Indeed, this is equivalent to controlling the smallest singular value of a random matrix whose columns are , where projects onto . This is a version of Theorem A.3 for very fat matrices, and it can be proved in a standard way by using -nets.
Let . Consider the matrix formed by the first columns of matrix . Then
This is a minor variant of Theorem A.1; its proof is very similar and is omitted. ∎
There exist , such that the following holds. Consider the same situation as in Theorem A.3, except that we assume that . Then
Lemma A.5 is a minor variation of [38, Theorem 5.39] for independent sub-gaussian columns, and it can be proved in a similar way (using a standard concentration and covering argument). ∎
Denote ; our goal is to bound below the quantity
Let be parameters, and set . We decompose
where is the matrix that consists of the first columns of , and is the matrix that consists of the last columns of . Let . Then
Assume that (which will be seen to be a likely event), so .
The argument now splits according to the position of relative to . Assume first that . Since , using Lemma 2.2(i) we have
We will later apply Lemma A.4 to bound below.
Consider now the opposite case, where . There exists such that , and in particular . Thus
is an matrix. Since , we have , which together with (A.3) yields
A bit later, we will use Lemma A.5 to bound below.
Putting the two cases together, we have shown that
It remains to estimate , and .
Since and , Lemma A.4 yields that with probability at least , we have
Next, we use Lemma A.5 for the matrix . Let be such that . Since , by choosing with a suitable we can achieve that to satisfy the dimension requirement in Lemma A.5. Then, with probability at least we have
Further, by Theorem 2.4, with probability at least we have
Putting all these estimates in (A.4), we find that with probability at least , one has
Now we choose with a suitable , and recall that we have chosen . We conclude that . Since , the proof of Theorem A.3 is complete. ∎