The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices
Florent Benaych-Georges, Raj Rao Nadakuditi
Introduction
Let be an symmetric (or Hermitian) matrix with eigenvalues and be an symmetric (or Hermitian) matrix with rank and non-zero eigenvalues . A fundamental question in matrix analysis is the following :
How are the eigenvalues and eigenvectors of related to the eigenvalues and eigenvectors of and ?
When and are diagonalized by the same eigenvectors, we have for appropriate choice of indices . In the general setting, however, the answer is complicated by the fact that the eigenvalues and eigenvectors of their sum depend on the relationship between the eigenspaces of the individual matrices.
In this scenario, one can use Weyl’s interlacing inequalities and Horn inequalities to obtain coarse bounds for the eigenvalues of the sum in terms of the eigenvalues of . When the norm of is small relative to the norm of , tools from perturbation theory (see [22, Chapter 6] or ) can be employed to improve the characterization of the bounded set in which the eigenvalues of the sum must lie. Exploiting any special structure in the matrices allows us to refine these bounds but this is pretty much as far as the theory goes. Instead of exact answers we must resort to a system of coupled inequalities. Describing the behavior of the eigenvectors of the sum is even more complicated.
Surprisingly, adding some randomness to the eigenspaces permits further analytical progress. Specifically, if the eigenspaces are assumed to be “in generic position with respect to each other”, then in place of eigenvalue bounds we have simple, exact answers that are to be interpreted probabilistically. These results bring into focus a phase transition phenomenon of the kind illustrated in Figure 1 for the eigenvalues and eigenvectors of and . A precise statement of the results may be found in Section 2.
The development of the eigenvector aspect is another contribution that we would like to highlight. Generally speaking, the eigenvector question has received less attention in random matrix theory and in free probability theory. A notable exception is the recent body of work on the eigenvectors of spiked Wishart matrices which corresponds to being the Marčenko-Pastur measure. In this paper, we extend their results for multiplicative models of the kind to the setting where is an arbitrary probability measure and obtain new results for the eigenvectors for additive models of the form .
Our proofs rely on the derivation of master equation representations of the eigenvalues and eigenvectors of the perturbed matrix and the subsequent application of concentration inequalities for random vectors uniformly distributed on high dimensional unit spheres (such as the ones appearing in ) to these implicit master equation representations. Consequently, our technique is simpler, more general and brings into focus the source of the phase transition phenomenon. The underlying methods can and have been adapted to study the extreme singular values and singular vectors of deformations of rectangular random matrices, as well as the fluctuations and the large deviations of our model.
The paper is organized as follows. In Section 2, we state the main results and present the integral transforms alluded to above. Section 3 presents some examples. An outline of the proofs is presented in Section 4. Exact master equation representations of the eigenvalues and the eigenvectors of the perturbed matrices are derived in Section 5 and utilized in Section 6 to prove the main results. Technical results needed in these proofs have been relegated to the Appendix.
Main results
Let be an symmetric (or Hermitian) random matrix whose ordered eigenvalues we denote by . Let be the empirical eigenvalue distribution, i.e., the probability measure defined as
Assume that the probability measure converges almost surely weakly, as , to a non-random compactly supported probability measure . Let and be, respectively, the infimum and supremum of the support of . We suppose the smallest and largest eigenvalue of converge almost surely to and .
For a given , let be deterministic non-zero real numbers, chosen independently of . For every , let be an symmetric (or Hermitian) random matrix having rank with its non-zero eigenvalues equal to . Let the index be defined such that .
Recall that a symmetric (or Hermitian) random matrix is said to be orthogonally invariant (or unitarily invariant) if its distribution is invariant under the action of the orthogonal (or unitary) group under conjugation.
We suppose that and are independent and that either or is orthogonally (or unitarily) invariant.
2. Notation
we also let denote almost sure convergence. The ordered eigenvalues of an Hermitian matrix will be denoted by . Lastly, for a subspace of a Euclidian space and a vector , we denote the norm of the orthogonal projection of onto by .
3. Extreme eigenvalues and eigenvectors under additive perturbations
Consider the rank additive perturbation of the random matrix given by
The extreme eigenvalues of exhibit the following behavior as . We have that for each ,
while for each fixed , .
Similarly, for the smallest eigenvalues, we have that for each ,
while for each fixed , .
is the Cauchy transform of , is its functional inverse so that stands for .
Consider such that . For each , define
and let be a unit-norm eigenvector of associated with the eigenvalue . Then we have, as ,
When , let the sole non-zero eigenvalue of be denoted by . Suppose that
For each , let be a unit-norm eigenvector of associated with either the largest or smallest eigenvalue depending on whether or , respectively. Then we have
The following proposition allows to assert that in many classical matrix models, such as Wigner or Wishart matrices, the above phase transitions actually occur with a finite threshold. The proposition is phrased in terms of , the supremum of the support of , but also applies for , the infimum of the support of . The proof relies on a straightforward computation which we omit.
Assume that the limiting eigenvalue distribution has a density with a power decay at , i.e., that, as with , for some exponent and some constant . Then:
so that the phase transitions in Theorems 2.1 and 2.3 manifest for .
Under additional hypotheses on the manner in which the empirical eigenvalue distribution of as , Theorem 2.2 can be generalized to any eigenvalue with limit equal either to or such that is finite. In the same way, Theorem 2.3 can be generalized for any value of . The specific hypothesis has to do with requiring the spacings between the ’s to be more “random matrix like” and exhibit repulsion instead of being “independent sample like” with possible clumping. We plan to develop this line of inquiry in a separate paper.
4. Extreme eigenvalues and eigenvectors under multiplicative perturbations
We maintain the same hypotheses as before so that the limiting probability measure , the index and the rank matrix are defined as in Section 2.1. In addition, we assume that for every , is a non-negative definite matrix and that the limiting probability measure is not the Dirac mass at zero.
Consider the rank multiplicative perturbation of the random matrix given by
The extreme eigenvalues of exhibit the following behavior as . We have that for ,
while for each fixed , .
In the same way, for the smallest eigenvalues, for each ,
while for each fixed , .
is the T-transform of , is its functional inverse and stands for .
Consider such that . For each , define
and let be a unit-norm eigenvector of associated with the eigenvalue . Then we have, as ,
When , let the sole non-zero eigenvalue of be denoted by . Suppose that
For each , let be the unit-norm eigenvector of associated with either the largest or smallest eigenvalue depending on whether or , respectively. Then, we have
Assume that the limiting eigenvalue distribution has a density with a power decay at (or or both), i.e., that, as with , for some exponent and some constant . Then:
so that the phase transitions in Theorems 2.6 and 2.8 manifest for .
The analogue of Remark 2.5 also applies here.
Consider the matrix . The matrices and are related by a similarity transformation and hence share the same eigenvalues and consequently the same limiting eigenvalue behavior in Theorem 2.6. Additionally, if is a unit-norm eigenvector of then is an eigenvector of and the unit-norm eigenvector satisfies
It follows that we obtain the same phase transition behavior and that when ,
so that the analogue of Theorems 2.7 and 2.8 for the eigenvectors of holds.
5. The Cauchy and T transforms in free probability theory
The Cauchy transform of a compactly supported probability measure on the real line is defined as:
If denotes the convex hull of the support of , then
exist in and , respectively and realizes decreasing homeomorphisms from onto and from onto . Throughout this paper, we shall denote by the inverses of these homeomorphisms, even though can also define other homeomorphisms on the holes of the support of .
is the analogue of the logarithm of the Fourier transform for free additive convolution. The free additive convolution of probability measures on the real line is denoted by the symbol and can be characterized as follows.
Let and be independent symmetric (or Hermitian) random matrices that are invariant, in law, by conjugation by any orthogonal (or unitary) matrix. Suppose that, as , and . Then, free probability theory states that , a probability measure which can be characterized in terms of the -transform as
The connection between free additive convolution and (via the -transform) and the appearance of in Theorem 2.1 could be of independent interest to free probabilists.
5.2. The T𝑇T-transform and its relation to multiplicative free convolution
In the case where and the support of is contained in , one also defines its -transform
which realizes decreasing homeomorphisms from onto and from onto . Throughout this paper, we shall denote by the inverses of these homeomorphisms, even though can also define other homeomorphisms on the holes of the support of .
is the analogue of the Fourier transform for free multiplicative convolution . The free multiplicative convolution of two probability measures and is denoted by the symbols and can be characterized as follows.
Let and be independent symmetric (or Hermitian) positive-definite random matrices that are invariant, in law, by conjugation by any orthogonal (or unitary) matrix. Suppose that, as , and . Then, free probability theory states that , a probability measure which can be characterized in terms of the -transform as
The connection between free multiplicative convolution and (via the -transform) and the appearance of in Theorem 2.6 could be of independent interest to free probabilists.
6. Extensions
Theorem 2.1 can easily be adapted to describe the phase transition in the eigenvalues of which fall in the “holes” of the support of . Consider such that almost surely, for large enough, has no eigenvalue in the interval . It implies that induces a decreasing homeomorphism, that we shall denote by , from the interval onto the interval . Then it can be proved that almost surely, for large enough, has no eigenvalue in the interval , except if some of the ’s are in the interval , in which case for each such index , one eigenvalue of has limit as .
Theorem 2.1 can also easily be adapted to the case where itself has isolated eigenvalues in the sense that some of its eigenvalues have limits out of the support of . More formally, let us replace the assumption that the smallest and largest eigenvalues of tend to the infimum and the supremum of the support of by the following one.
Moreover, and .
The previous remark forms the basis for an iterative application of our theorems to other perturbational models, such as for example. Another way to deal with such perturbations is to first derive the corresponding master equations representations that describe how the eigenvalues and eigenvectors of are related to the eigenvalues and eigenvectors of and the perturbing matrices, along the lines of Proposition 5.1 for additive or multiplicative perturbations of Hermitian matrices.
Let be an Gaussian random matrix with independent real (or complex) entries that are normally distributed with mean and variance . Then the matrix is orthogonally (or unitarily) invariant. Hence one can choose an orthonormal basis of eigenvectors of such that the matrix with columns is Haar-distributed. When is a Gaussian-like matrix, in the sense that its entries are i.i.d. with mean zero and variance one, then upon placing adequate restrictions on the higher order moments, for non-random unit norm vector , the vector will be close to uniformly distributed on the unit real (or complex) sphere . Since our proofs rely heavily on the properties of unit norm vectors uniformly distributed on the -sphere, they could possibly be adapted to the setting where the unit norm vectors are close to uniformly distributed.
Suppose that is a random matrix independent of , with exactly non-zero eigenvalues given by . Let as . Using [22, Cor. 6.3.8] as in Section 6.2.3, one can easily see that our results will also apply in this case.
The analogues of Remarks 2.11, 2.12, 2.14 and 2.15 for the multiplicative setting also hold here. In particular, Wishart matrices with (cf Section 3.2) gives an illustration of the case where there is a hole in the support of .
Examples
We now illustrate our results with some concrete computations. The key to applying our results lies in being able to compute the Cauchy or transforms of the probability measure and their associated functional inverses. In what follows, we focus on settings where the transforms and their inverses can be expressed in closed form. In settings where the transforms are algebraic so that they can be represented as solutions of polynomial equations, the techniques and software developed in can be utilized. In more complicated settings, one will have to resort to numerical techniques.
Let be an symmetric (or Hermitian) matrix with independent, zero mean, normally distributed entries with variance on the diagonal and on the off diagonal. It is known that the spectral measure of converges almost surely to the famous semi-circle distribution with density
It is known that the extreme eigenvalues converge almost surely to the endpoints of the support . Associated with the spectral measure, we have
and .
Thus for a with non-zero eigenvalues , by Theorem 2.1, we have for ,
as . This result has already been established in for the symmetric case and in for the Hermitian case. Remark 2.14 explains why our results should hold for Wigner matrices of the sort considered in .
In the setting where and , let be a unit-norm eigenvector of associated with its largest eigenvalue. By Theorems 2.2 and 2.3, we have
2. Multiplicative perturbation of a random Wishart matrix
Let be an real (or complex) matrix with independent, zero mean, normally distributed entries with variance . Let . It is known that, as with , the spectral measure of converges almost surely to the famous Marčenko-Pastur distribution with density
where and . It is known that the extreme eigenvalues converge almost surely to the endpoints of this support.
Associated with this spectral measure, we have
, and
When , there is an atom at zero so that the smallest eigenvalue of is identically zero. For simplicity, let us consider the setting when so that the extreme eigenvalues of converge almost surely to and . Thus for with non-zero eigenvalues , with , for , by Theorem 2.6, we have for ,
as . An analogous result for the smallest eigenvalue may be similarly derived by making the appropriate substitution for in Theorem 2.6. Consider the matrix . The matrix may be interpreted as a Wishart distributed sample covariance matrix with “spiked” covariance . By Remark 2.10, the above result applies for the eigenvalues of as well. This result for the largest eigenvalue of spiked sample covariance matrices was established in and for the extreme eigenvalues in .
In the setting where and , let and let be a unit-norm eigenvector of associated with its largest (or smallest, depending on whether or ) eigenvalue. By Theorem 2.8, we have
Let be a unit eigenvector of associated with its largest (or smallest, depending on whether or ) eigenvalue. Then, by Theorem 2.8 and Remark 2.10, we have
The result has been established in for the eigenvector associated with the largest eigenvalue. We generalize it to the eigenvector associated with the smallest one.
We note that symmetry considerations imply that when is a Wigner matrix then is a Wigner matrix as well. Thus an analytical characterization of the largest eigenvalue of a Wigner matrix directly yields a characterization of the smallest eigenvalue as well. This trick cannot be applied for Wishart matrices since Wishart matrices do not exhibit the symmetries of Wigner matrices. Consequently, the smallest and largest eigenvalues and their associated eigenvectors of Wishart matrices have to be treated separately. Our results facilitate such a characterization.
Outline of the proofs
We now provide an outline of the proofs. We focus on Theorems 2.1, 2.2 and 2.3, which describe the phase transition in the extreme eigenvalues and associated eigenvectors of (the index in and has been suppressed for brevity). An analogous argument applies for the multiplicative perturbation setting.
Consider the setting where , so that , with being a unit norm column vector. Since either or is assumed to be invariant, in law, under orthogonal (or unitary) conjugation, one can, without loss of generality, suppose that and that is uniformly distributed on the unit -sphere.
The eigenvalues of are the solutions of the equation
Equivalently, for so that is invertible, we have
Consequently, a simple argument reveals that the is an eigenvalue of and not an eigenvalue of if and only if is an eigenvalue of the matrix . But has rank one, so its only non-zero eigenvalue will equal its trace, which in turn is equal to , where is a “weighted” spectral measure of , defined by
Thus any outside the spectrum of is an eigenvalue of if and only if
Equation (1) describes the relationship between the eigenvalues of and the eigenvalues of and the dependence on the coordinates of the vector (via the measure ).
This is where randomization simplifies analysis. Since is a random vector with uniform distribution on the unit -sphere, we have that for large , with high probability. Consequently, we have so that . Inverting equation (1) after substituting these approximations yields the location of the largest eigenvalue to be as in Theorem 2.1.
The phase transition for the extreme eigenvalues emerges because under our assumption that the limiting probability measure is compactly supported on , the Cauchy transform is defined outside and unlike what happens for , we do not always have . Consequently, when , we have that as before. However, when , the phase transition manifests and .
An extension of these arguments for fixed yields the general result and constitutes the most transparent justification, as sought by the authors in , for the emergence of this phase transition phenomenon in such perturbed random matrix models. We rely on concentration inequalities to make the arguments rigorous.
2. Eigenvectors phase transition
Let be a unit eigenvector of associated with the eigenvalue that satisfies (1). From the relationship , we deduce that, for ,
implying that is proportional to .
Equation (2) describes the relationship between the eigenvectors of and the eigenvalues of and the dependence on the coordinates of the vector (via the measure ).
Here too, randomization simplifies analysis since for large , we have and . Consequently,
so that when , which implies that , we have
whereas when and has infinite derivative at , we have
An extension of these arguments for fixed yields the general result and brings into focus the connection between the eigenvalue phase transition and the associated eigenvector phase transition. As before, concentration inequalities allow us to make these arguments rigorous.
The exact master equations for the perturbed eigenvalues and eigenvectors
In this section, we provide the -dimensional analogues of the master equations (1) and (2) employed in our outline of the proof.
Let us fix some positive integers . Let be a diagonal matrix and , with an diagonal matrix and an matrix with orthonormal columns, i.e., ).
a) Then any is an eigenvalue of if and only if the matrix
and for all , we have and
b) Let denote the th element of the matrix for and . Then for all , the -th entry of the matrix can be expressed as
where is the complex measure defined by
and is the Cauchy transform of .
c) In the setting where and as before, we obtain the analog of a) by replacing every occurrence, in (4) and (6), of with . We obtain the analog of b) by replacing the Cauchy transform in (7) with the -transform.
Proof. Part a) is proved, for example, in [2, Th. 2.3]. Part b) follows from a straightforward computation of the -th entry of . Part c) can be proved in the same way.
Proof of Theorem 2.1
The sequence of steps described below yields the desired proof:
Then, we utilize the “master equations” of Section 5 to express the extreme eigenvalues of as the ’s such that a certain random matrix is singular.
We then exploit convergence properties of certain analytical functions (derived in the appendix) to prove that almost surely, converges to a certain diagonal matrix , uniformly in .
We then invoke a continuity lemma (see Lemma 6.1 - derived next) to claim that almost surely, the ’s such that is singular (i.e. the extreme eigenvalues of ) converge to the ’s such that is singular.
We conclude the proof by noting that, for our setting, the ’s such that is singular are precisely the ’s such that for some , . Part (ii) of Lemma 6.1, about the rank of , will be useful to assert that when the ’s are pairwise distinct, the multiplicities of the isolated eigenvalues are all equal to one.
We now prove a continuity lemma that will be used in the proof of Theorem 2.1. We note that nothing in its hypotheses is random. As hinted earlier, we will invoke it to localize the extreme eigenvalues of .
as .
and denote by the ’s such that is singular, where is identically equal to the number of ’s such that .
for large enough, for each , has rank .
Proof. Note firstly that the ’s such that is singular are the ’s such that for a certain ,
Since the ’s are pairwise distinct, for any , there cannot exist more than one such that (9) holds. As a consequence, for all , the rank of is either or . Since the set of matrices with rank at least is open in the set of matrices, once (i) will be proved, (ii) will follow.
Let us now prove (i). Note firstly that by c), there exists such that for such that , . For any such , . By e), it follows that for large enough, the ’s such that is singular satisfy . By d), it even follows that the ’s such that is singular satisfy .
the number of ’s in such that , denoted by tends to , the cardinality of the ’s in such that .
To prove (H), by additivity, one can suppose that and are close enough to have or . Let us define to be the circle with diameter . By a) and since , does not vanish on , thus
the last equality following from e). It follows that for large enough, (note that since or , no ambiguity due to the orders of the zeros has to be taken into account here).
2. Proof of Theorem 2.1
Note that Weyl’s interlacing inequalities imply that for all ,
where we employ the convention that is and if . It follows that the empirical spectral measure of because the empirical spectral measure of does as well.
Since and belong to the support of , we have, for all fixed,
it follows that for all fixed, and .
By (10), we deduce both following relation (11) and (12): for all fixed, we have
and for all (resp. ) fixed, we have
In this section, we assume that the eigenvalues of the perturbing matrix to be pairwise distinct. In the next section, we shall remove this hypothesis by an approximation process.
For a momentarily fixed , let the eigenvalues of be denoted by . Consider orthogonal (or unitary) matrices , that diagonalize and , respectively, such that
The spectrum of is identical to the spectrum of the matrix
Since we have assumed that or is orthogonally (or unitarily) invariant and that they are independent, this implies that is a Haar-distributed orthogonal (or unitary) matrix that is also independent of (see the first paragraph of the proof of [21, Th. 4.3.5] for additional details).
Recall that the largest eigenvalue , while the smallest eigenvalue . Let us now consider the eigenvalues of which are out of . By Proposition 5.1-a) and an application of the identity in Proposition 5.1-b) these eigenvalues are precisely the numbers such that the matrix
is singular. Recall that in (14), , for is the Cauchy transform of the random complex measure defined by
where and are the th and -th entries of the orthogonal (or unitary) matrix in (13) and is the -th largest eigenvalue of as in the first term in (13).
We now note that Hypotheses a), b) and c) of Lemma 6.1 are satisfied and follow from the definition of the Cauchy transform . Hypothesis d) of Lemma 6.1 follows from the fact that is Hermitian while hypothesis e) has been established in (17).
Let us recall that the eigenvalues of which are out of are precisely those values where the matrix is singular. As a consequence, we are now in a position where Theorem 2.1 follows by invoking Lemma 6.1. Indeed, by Lemma 6.1, if
then their exists some sequences ,…, converging respectively to such that for any small enough, for large enough, the eigenvalues of that are out of are exactly . Moreover, (5) and Lemma 6.1-(ii) ensure that for large enough, these eigenvalues have multiplicity one.
We now treat the case where the ’s are not supposed to be pairwise distinct.
We want to prove that for all , and that for all , .
We shall treat only the case of largest eigenvalues (the case of smallest ones can be treated in the same way). So let us fix and .
There is such that whenever . Consider pairwise distinct non zero real numbers such that for all , and have the same sign and
It implies that . With the notation in Section 6.2.2, for each , we define
Note that by [22, Cor. 6.3.8], we have, for all ,
Theorem 2.1 can applied to (because the are pairwise distinct). It follows that almost surely, for large enough,
By the triangular inequality, almost surely, for large enough,
so that .
Proof of Theorem 2.2
The eigenvectors of , are precisely times the eigenvectors of
Consequently, we have proved Theorem 2.2 by proving the result in the setting where and , where is a Haar-distributed orthogonal (or unitary) matrix.
Let be the number of ’s such that . Up to a reindexing of the ’s (which are then no longer decreasing - this fact does not affect our proof), one can suppose that , . This choice implies that, for each , is the linear span of the first columns , …, of . By construction, these columns are orthonormal. Hence, we will have proved Theorem 2.2 if we can prove that as ,
As before, for every and for all outside the spectrum of , define the random matrix:
where, for all , is the random complex measure defined by (15).
We have established in Theorem 2.1 that because , as . It follows that:
Proposition 5.1 a) states that for large enough so that such that is not an eigenvalue of , the vector
is in the kernel of the matrix with .
Thus by (20), any limit point of is in the kernel of the matrix on the right hand side of (21), i.e. has its last coordinates equal to zero.
Thus (19) holds and we have proved Theorem 2.2-b). We now establish (18).
By (6), one has that for all , the eigenvector of associated with the eigenvalue can be expressed as:
As , the sequence is bounded in operator norm so that by (19), . Since , this implies that .
Since we assumed that , we must have that:
By Proposition 9.3, we have that for all , while for all , . Thus, since we have that , we have that for all ,
Combining the relationship in (22) with the fact that , yields (18) and we have proved Theorem 2.2-a).
Proof of Theorem 2.3
Let us assume that . The proof supplied below can be easily ported to the setting where .
We denote the coordinates of by and define, for each , the random probability measure
The setting of Proposition 5.1-b) states that the eigenvalues of which are not eigenvalue of are the solutions of
Since decreases from to for increasing values of , we have that . Reproducing the arguments leading to (3) in Section (4.2), yields the relationship:
By Theorem 2.1, we have that so that
so that by (23), thereby proving Theorem 2.3.
We omit the details of the proofs of Theorems 2.6–2.8, since these are straightforward adaptations of the proofs of Theorems 2.1-Theorem 2.3 that can obtained by following the prescription in Proposition 5.1-c).
Appendix: convergence of weighted spectral measures
We now establish a lemma on the weak convergence of complex measures that will be useful in proving Proposition 9.3. We note that the counterpart of this lemma for probability measures is well known. We did not find any reference in standard literature to the “complex measures version” stated next, so we provide a short proof.
which can be made arbitrarily small by appropriately choosing . The tightness hypothesis ensures that such a can always be found. This proves that converges weakly to . The uniform convergence follows from a straightforward application of Ascoli’s Theorem.
2. Convergence of weighted spectral measures
b) Suppose that converges almost surely to a deterministic limit . Then
Proof. We use Lemma 9.1. Note first that almost surely, since , both sequences are tight. Moreover, we have