Random covariance matrices: Universality of local statistics of eigenvalues up to the edge

Ke Wang

Introduction

The goal of this paper is to extend the Four Moment theorem, established by Tao and Vu for i.i.d. covariance matrices, from the bulk of the spectrum to the edge. Let us first specify the matrix ensembles that will be studied.

Consider a random $p\times n$ matrix $M_{n,p}=(\zeta_{ij})_{1\le i\le p,1\le j\le n}$, where $p=p(n)$ is an integer parameter such that $p\le n$ and $\lim_{n\to\infty}p/n=y$ for some $0<y\le1$. We say that the matrix ensemble $M$ obeys condition $\mathbf{C1}$ if the random variables $\zeta_{ij}$ are jointly independent, have mean zero and variance $1$, and obey the moment condition $\sup_{i,j}\mathbf{E}|\zeta_{ij}|^{C_0}\le C$ for some constant $C$ independent of $n,p$.

Given such a $p\times n$ random matrix $M$, we form the $n\times n$ covariance matrix $W=W_{n,p}=\frac{1}{n}M^*M$. This (non-negative definite) matrix has rank at most $p$, so its first $n-p$ eigenvalues are zero. We order its remaining eigenvalues as $0\le\lambda_1(W)\le\lambda_2(W)\le\dots\le\lambda_p(W)$.

The empirical spectral distribution (ESD) of the matrix $W$ (which is Hermitian and thus has real eigenvalues) is the one-dimensional function $\mu_W(x):=\frac{1}{p}\left|\{1\le j\le p:\lambda_j(W)\le x\}\right|$,

where we use $|\mathbf{I}|$ to denote the cardinality of a set $\mathbf{I}$.

The first fundamental result concerning the asymptotic limiting behavior of the ESD of large covariance matrices is the Marchenko-Pastur law, due to (see also ).

(Marchenko-Pastur law) Assume a $p\times n$ random matrix $M$ obeys condition $\mathbf{C1}$ with $C_0\ge4$, and $p,n\to\infty$ such that $\lim_{n\to\infty}p/n=y\in(0,1]$. Then the empirical spectral distribution of the matrix $W_{n,p}=\frac{1}{n}M^*M$ converges in distribution to the Marchenko-Pastur law with density function $\rho_{MP,y}(x)=\frac{1}{2\pi xy}\sqrt{(b-x)(x-a)}\,\mathbf{1}_{[a,b]}(x)$, where $a:=(1-\sqrt y)^2$ and $b:=(1+\sqrt y)^2$.

We introduce the following notation for frequent events, depending on $n$, in increasing order of likelihood.

$E$ holds asymptotically almost surely if $\mathbf{P}(E)=1-o(1)$.

$E$ holds with high probability if $\mathbf{P}(E)\ge1-O(n^{-c})$ for some constant $c>0$ (independent of $n$).

$E$ holds with overwhelming probability if $\mathbf{P}(E)\ge1-O_C(n^{-C})$ for every constant $C>0$ (or equivalently, $\mathbf{P}(E)\ge1-\exp(-\omega(\log n))$).

$E$ holds almost surely if $\mathbf{P}(E)=1$.

We say that two complex random variables $\zeta,\zeta'$ match to order $k$ for some integer $k\ge1$ if one has $\mathbf{E}\,\mathrm{Re}(\zeta)^m\mathrm{Im}(\zeta)^l=\mathbf{E}\,\mathrm{Re}(\zeta')^m\mathrm{Im}(\zeta')^l$ for all $m,l\ge0$ with $m+l\le k$.
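For real-valued entries, matching to order $k$ reduces to $\mathbf{E}\zeta^m=\mathbf{E}\zeta'^m$ for $1\le m\le k$. The minimal Python sketch below (our own illustration, not taken from the paper; the helpers `moment`, `gaussian_moment` and `matching_order` are hypothetical names) checks this with exact rational arithmetic for finitely supported laws. A Rademacher variable matches a standard Gaussian to order 3 but not to order 4, since their fourth moments are 1 and 3; the second example is a three-point law with mean zero and variance one that also matches Rademacher only to order 3.

```python
from fractions import Fraction
from math import prod

def moment(dist, k):
    """E[zeta^k] for a finitely supported real law given as {value: probability}."""
    return sum(prob * Fraction(value) ** k for value, prob in dist.items())

def gaussian_moment(k):
    """E[g^k] for g ~ N(0, 1): zero for odd k, (k - 1)!! for even k."""
    return 0 if k % 2 else prod(range(k - 1, 0, -2))

def matching_order(m1, m2, kmax=10):
    """Largest k <= kmax such that m1(j) == m2(j) for every 1 <= j <= k."""
    k = 0
    while k < kmax and m1(k + 1) == m2(k + 1):
        k += 1
    return k

rademacher = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
sparse = {-2: Fraction(1, 8), 0: Fraction(3, 4), 2: Fraction(1, 8)}  # mean 0, variance 1

print(matching_order(lambda k: moment(rademacher, k), gaussian_moment))  # 3
print(matching_order(lambda k: moment(rademacher, k),
                     lambda k: moment(sparse, k)))                      # 3
```

Exact fractions are used so that equality of moments is tested exactly rather than up to floating-point error.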

Our main result is the following Four Moment theorem, which extends the result (Theorem 6) in to the edge of the spectrum. The proof is analogous to the proofs in , and and will be presented in Section 5.

For sufficiently small $c_0>0$ and sufficiently large $C_0>0$ ($C_0=10^4$ will suffice) the following holds for every $k\ge1$. Let $M=(\zeta_{ij})_{1\le i\le p,1\le j\le n}$ and $M'=(\zeta'_{ij})_{1\le i\le p,1\le j\le n}$ be two random matrices satisfying condition C1 with the indicated constant $C_0$, and assume that for each $i,j$, $\zeta_{ij}$ and $\zeta'_{ij}$ match to order 4. Let $W,W'$ be the associated covariance matrices. Assume also that $p/n\to y$ for some $0<y\le1$.

Then for any $1\le i_1<i_2<\dots<i_k\le n$, and for $n$ sufficiently large depending on $k,c_0$, we have

If $\zeta_{ij}$ and $\zeta'_{ij}$ only match to order 3 rather than 4, then the conclusion (1.2) still holds provided that one strengthens (1.1) to

The next theorem is an extension of Theorem 17 in , which is used in the proof of Theorem 1.5 and is of independent interest as well. The proof is delayed to Section 5.

Let $M$ be a random matrix obeying condition C1. We say that $M$ obeys the gap property if for every $c>0$ and every $1\le i\le p$, one has $|\lambda_{i+1}(W)-\lambda_i(W)|\ge n^{-1-c}$ with high probability.

Let $M$ be a random matrix satisfying condition C1. Then $M$ obeys the gap property.

When $y=1$, the singular value statistics around $a=0$ turn out to be different, since the density function $\rho_{MP,y}(x)$ has a singularity at $x=0$. This hard edge is not really an edge, which makes it easier to deal with. In this paper, we focus on the (soft) edge case $a>0$.
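As a numerical illustration of the Marchenko-Pastur law with a soft lower edge $a>0$, the following Python/NumPy sketch assumes the standard form of the density, $\rho_{MP,y}(x)=\frac{1}{2\pi xy}\sqrt{(b-x)(x-a)}$ on $[a,b]$ with $a=(1-\sqrt y)^2$, $b=(1+\sqrt y)^2$, and compares it with the nonzero eigenvalues of $W=\frac1n M^*M$ for a Rademacher matrix with $y=1/2$. It uses the fact that $M^*M$ and $MM^*$ share their nonzero eigenvalues, so only a $p\times p$ eigenproblem is solved. All function names and parameters are our own choices.

```python
import numpy as np

def mp_density(x, y):
    """Assumed standard Marchenko-Pastur density rho_{MP,y}(x) for 0 < y <= 1:
    sqrt((b - x)(x - a)) / (2 pi x y) on (a, b), zero elsewhere."""
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = (x > a) & (x < b)
    xi = x[inside]
    out[inside] = np.sqrt((b - xi) * (xi - a)) / (2 * np.pi * xi * y)
    return out

def trapezoid(f, x):
    """Plain trapezoidal rule, written out to avoid NumPy-version differences."""
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2)

def nonzero_eigenvalues(M):
    """Eigenvalues of the p x p matrix M M^* / n, i.e. the p largest
    eigenvalues of W = M^* M / n (the other n - p are zero)."""
    n = M.shape[1]
    return np.linalg.eigvalsh(M @ M.conj().T / n)  # ascending order

rng = np.random.default_rng(0)
p, n = 400, 800                              # y = p / n = 0.5, so the lower edge a > 0
y = p / n
M = rng.choice([-1.0, 1.0], size=(p, n))     # Rademacher entries: mean 0, variance 1
lam = nonzero_eigenvalues(M)

a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
edges = np.linspace(a, b, 6)                 # five equal bins covering [a, b]
for lo, hi in zip(edges[:-1], edges[1:]):
    grid = np.linspace(lo, hi, 4001)
    empirical = np.mean((lam >= lo) & (lam < hi))
    predicted = trapezoid(mp_density(grid, y), grid)
    print(f"[{lo:.3f}, {hi:.3f}): empirical {empirical:.3f}, MP {predicted:.3f}")
```

The empirical bin masses should be close to the integrated density, including in the bins touching $a$ and $b$; the edge results of this paper concern much finer scales than such a histogram can resolve.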

We consider $n$ as an asymptotic parameter tending to infinity. We use $X\ll Y$, $Y\gg X$, $Y=\Omega(X)$, or $X=O(Y)$ to denote the bound $X\le CY$ for all sufficiently large $n$ and for some constant $C$. Notations such as $X\ll_kY$, $X=O_k(Y)$ mean that the hidden constant $C$ depends on another constant $k$. $X=o(Y)$ or $Y=\omega(X)$ means that $X/Y\to0$ as $n\to\infty$; the rate of decay here is allowed to depend on other parameters. We write $X=\Theta(Y)$ for $Y\ll X\ll Y$. We view vectors $x\in\mathbf{C}^n$ as column vectors. The Euclidean norm of a vector $x\in\mathbf{C}^n$ is defined as $\|x\|:=(x^*x)^{1/2}$.

This paper is organized as follows: in Section 2, we prove a variant of a universality result regarding the smallest singular value as an application of the Four Moment theorem. In Section 3, we recall a few basic results from linear algebra and probability. In Section 4, we provide the proofs of two technical lemmas, which form the major content of this paper. Finally, in Section 5, we give the proofs of the Gap theorem (Theorem 1.7) and the Four Moment theorem (Theorem 1.5). The arguments draw heavily on those in , and , so we focus only on the changes needed to complete the proofs.

Acknowledgments: The author would like to thank Van H. Vu for useful discussion and his guidance through to the completion of this paper.

Applications

In a similar way as in (Section 1.3), equipped with the Four Moment theorem, we can obtain universality results for large classes of random matrices. Let us demonstrate this through some examples, focusing on results for the lower edge of the spectrum. Recall that $\sigma_1(M_{p,n})$ denotes the smallest singular value of $M_{p,n}$.

For the case $p=n$, the limiting distribution for Gaussian models was computed by Edelman . Recently, a universality result has been established by Tao and Vu for entries whose sufficiently high moments are bounded.

Let $M_{p,n}=(\zeta_{ij})_{1\le i\le p,1\le j\le n}$ be a random covariance matrix, where $p=p(n)\le n$ tends to infinity as $n\to\infty$ and $\limsup_{n\to\infty}p/n<1$. Let the $\zeta_{ij}$ be independent for all $i,j$.

where $\mathrm{TW}_1,\mathrm{TW}_2$ denote the Tracy-Widom distributions.

In the same way as the authors prove (, Theorem 9) and (, Theorem 11), one obtains the following (see also Figure 1 for numerical simulations):

The conclusions of Theorem 2.1 can be extended to the case when $p=p(n)\le n$ tends to infinity as $n\to\infty$ with $\lim_{n\to\infty}p/n=y\in(0,1]$, $M_{p,n}=(\zeta_{ij})_{1\le i\le p,1\le j\le n}$ obeys condition C1 with a sufficiently large constant $C_0$, and the $\zeta_{ij}$ have vanishing third moment.

Recently, Ben Arous and Péché proved universality at the edge for random matrices $M_{p,n}(\zeta)$ with i.i.d. entries of Gaussian divisible distribution. Combining this with the matching theorem (Corollary 30, ), we can drop the third moment condition, provided that $\zeta$ is supported on at least three points.

The conclusion (2.2) of Theorem 2.1 can be extended to the case when $p=p(n)\le n$ tends to infinity as $n\to\infty$ with $\lim_{n\to\infty}p/n=y\in(0,1]$, $M_{p,n}=(\zeta_{ij})_{1\le i\le p,1\le j\le n}$ obeys condition C1 with a sufficiently large constant $C_0$, and the $\zeta_{ij}$ are supported on at least three points.

General Tools

In this section, we collect some basic tools from linear algebra and probability that will be used repeatedly in the sequel.

We start with the Cauchy interlacing law and the Weyl inequalities.

If $A_n$ is an $n\times n$ Hermitian matrix, and $A_{n-1}$ is an $(n-1)\times(n-1)$ principal minor, then $\lambda_i(A_n)\le\lambda_i(A_{n-1})\le\lambda_{i+1}(A_n)$ for all $1\le i<n$.

If $M_{p,n}$ is a $p\times n$ matrix, and $M_{p-1,n}$ is a $(p-1)\times n$ minor, then $\sigma_i(M_{p,n})\le\sigma_i(M_{p-1,n})\le\sigma_{i+1}(M_{p,n})$ for all $1\le i<p$.

If $p<n$, $M_{p,n}$ is a $p\times n$ matrix, and $M_{p,n-1}$ is a $p\times(n-1)$ minor, then $\sigma_{i-1}(M_{p,n})\le\sigma_i(M_{p,n-1})\le\sigma_i(M_{p,n})$ for all $1\le i\le p$, with the understanding that $\sigma_0(M_{p,n})=0$. (For $p=n$, one can of course use the transpose of (ii) instead.)

If $A,B$ are $n\times n$ Hermitian matrices, then $|\lambda_i(A)-\lambda_i(B)|\le\|A-B\|_{op}$ for all $1\le i\le n$.

If $M,N$ are $p\times n$ matrices, then $|\sigma_i(M)-\sigma_i(N)|\le\|M-N\|_{op}$ for all $1\le i\le p$.
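The interlacing and Weyl inequalities above are easy to sanity-check numerically. The following Python/NumPy sketch (the helper `interlacing_checks` is a hypothetical name) uses the increasing ordering of the text, removes the last row, the last column, or the last row and column of a random matrix, and returns one boolean per inequality, up to a floating-point tolerance.

```python
import numpy as np

def sv(A):
    """Singular values in increasing order, matching sigma_1 <= ... <= sigma_p."""
    return np.sort(np.linalg.svd(A, compute_uv=False))

def interlacing_checks(M, N, tol=1e-10):
    """Test parts (i)-(iii) of the interlacing law and both Weyl inequalities
    for a p x n matrix M with p < n and a second p x n matrix N."""
    A, B = M.conj().T @ M, N.conj().T @ N          # n x n Hermitian matrices
    lam = np.linalg.eigvalsh(A)                    # ascending
    lam_minor = np.linalg.eigvalsh(A[:-1, :-1])    # principal (n-1) x (n-1) minor
    s = sv(M)
    s_row = sv(M[:-1, :])                          # (p-1) x n minor: drop a row
    s_col = sv(M[:, :-1])                          # p x (n-1) minor: drop a column
    return {
        "(i) Hermitian interlacing": bool(np.all(lam[:-1] <= lam_minor + tol)
                                          and np.all(lam_minor <= lam[1:] + tol)),
        "(ii) row removal": bool(np.all(s[:-1] <= s_row + tol)
                                 and np.all(s_row <= s[1:] + tol)),
        "(iii) column removal": bool(np.all(s_col <= s + tol)
                                     and np.all(s[:-1] <= s_col[1:] + tol)),
        "Weyl, eigenvalues": bool(np.all(np.abs(lam - np.linalg.eigvalsh(B))
                                         <= np.linalg.norm(A - B, 2) + tol)),
        "Weyl, singular values": bool(np.all(np.abs(s - sv(N))
                                             <= np.linalg.norm(M - N, 2) + tol)),
    }

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 9))
N = M + 0.1 * rng.standard_normal((6, 9))
for name, ok in interlacing_checks(M, N).items():
    print(f"{name}: {ok}")
```

`np.linalg.norm(X, 2)` returns the largest singular value of a 2-D array, i.e. the operator norm used in the Weyl inequalities.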

The following formula for an entry of a singular vector, in terms of the singular values and singular vectors of a minor, is very useful:

The next lemma states the well-known Cauchy interlacing identity:

Let $A_n$ be an $n\times n$ Hermitian matrix, and

Let $\lambda_i(A_n)$, $1\le i\le n$, be the eigenvalues of $A_n$ and $\lambda_j(A_{n-1})$, $1\le j\le n-1$, be the eigenvalues of $A_{n-1}$. Suppose that $X$ is not orthogonal to any of the unit eigenvectors $u_j(A_{n-1})$ of $A_{n-1}$. Then we have

From this lemma, one immediately gets an interlacing identity for singular values:

Under the notation of Lemma 3.3, for every $i$,

with eigenvalue $\sigma_i(M_{p,n})^2$.

Since we have $\lambda_j(M_{p,n-1}^*M_{p,n-1})=\sigma_j(M_{p,n-1})^2$ and

(3.1) follows. Similarly, to show (3.2), apply Lemma 3.5 to the matrix

By Schur’s complement, it has the following alternate representation:

Let $W=(\zeta_{ij})_{1\le i,j\le n}$ be a Hermitian matrix, and let $z$ be a complex number not in the spectrum of $W$. Then we have

2. Tools from probability theory

We will rely frequently on the next concentration of measure result for projections of random vectors.

for some $\sigma>0$. Let $\zeta_1,\dots,\zeta_n$ be independent complex random variables with mean zero, variance one, and $\mathbf{E}|\zeta_i|^3\le C$ for some $C\ge1$. For each $1\le i\le N$, let $S_i$ be the complex random variable

(Upper tail bound on $S_i$) For $t\ge1$, we have $\mathbf{P}(|S_i|\ge t)\ll\exp(-ct^2)+C\sigma$ for some absolute constant $c>0$.

(Lower tail bound on $\vec S$) For any $t\le\sqrt N$, one has $\mathbf{P}(|\vec S|\le t)\ll O(t/\sqrt N)^{\lfloor N/4\rfloor}+CN^4t^{-3}\sigma$.

The same claim holds if one of the $\zeta_i$ is assumed to have variance $c$ instead of $1$ for some absolute constant $c>0$.

Main technical lemmas

Recall that in the proofs of the Four Moment theorem and the Gap theorem in , and , a crucial input was the Delocalization Theorem of Erdős, Schlein, and Yau (, and ). The material in this section is analogous to Section 3 of . We first extend the concentration result for the ESD to the edge of the spectrum, and then use this concentration theorem to show the delocalization of singular vectors. The proof of the delocalization result at the edge of the spectrum is significantly different from that in the bulk as in . However, as in , the Cauchy interlacing identities for singular values in Lemma 3.5 help us deal with this problem.

First observe that if $M=(\zeta_{ij})$ obeys condition C1 for some constant $C_0>0$, then by Markov's inequality and the union bound, one has $|\zeta_{ij}|\le n^{10/C_0}$ for all $i,j$ with probability $1-O(n^{-8})$. By a truncation technique (see  for details) and Lemma 3.2, one may assume that

We now derive the following eigenvalue concentration theorem (up to the edge), which is an analogue of Theorem 19 in :

Suppose that $p/n\to y$ for some $0<y\le1$. Assume $a>0$. Let $M=(\zeta_{ij})_{1\le i\le p,1\le j\le n}$ obey condition $\mathbf{C1}$ for some $C_0\ge2$, and let the probability distribution of each $\zeta_{ij}$ be continuous. Assume further that $|\zeta_{ij}|\le K$ almost surely for all $i,j$, for some $K=o(n^{1/2}\delta^2\log^{-1}n)$, where $0<\delta<1/2$ (which can depend on $n$). Then for any interval $I\subset[a,b]$ of length $|I|\ge\frac{K^2\log^{4.5}n}{\delta^9n}$, one has with overwhelming probability (uniformly in $I$) that the number of eigenvalues $N_I$ of $W$ in $I$ obeys the concentration estimate

As a consequence of Theorem 4.1, one can deduce the following delocalization theorem:

Let the hypotheses be as in Theorem 4.1. Then with overwhelming probability, all the unit left and right singular vectors of $M$ have all coefficients uniformly of size at most $K^2n^{-1/2}\log^{O(1)}n$.

The continuity hypothesis in the above theorems, which guarantees that the singular values are almost surely simple, is only a technical one; it can be removed by a limiting argument using Lemma 3.2.

with overwhelming probability. The delocalization of left singular vectors can be proved similarly.

The “bulk” case is treated in . We now consider the edge case $1\le i\le0.001n$ or $0.999n\le i\le n$ (say). Using the Marchenko-Pastur law, we have with overwhelming probability that

By Lemma 3.3, it suffices to show with overwhelming probability that

From Lemma 3.7, we conclude that $|v_j(M_{p,n-1})^*X|\ll K\log n$ with overwhelming probability for each $j$ (and hence for all $j$, by the union bound). Then it is enough to show that with overwhelming probability

By the Cauchy-Schwarz inequality, it thus suffices to show that

with overwhelming probability for some $1\le T_-<T_+\ll K^2\log^{O(1)}n$. Noting that $\sigma_j(M_{p,n-1})^2=\lambda_j(M_{p,n-1}^*M_{p,n-1})=\Theta(n)$, we thus need to show

with overwhelming probability for some $1\le T_-<T_+\ll K^2\log^{O(1)}n$, which is equivalent to proving that

with overwhelming probability for some $1\le T_-<T_+\ll K^2\log^{O(1)}n$.

In the interlacing identity in Lemma 3.5, we have

By Lemma 3.7, one gets $\frac{1}{p}\|X\|^2=1+o(1)$ with overwhelming probability. Since $p/n=y+o(1)$, one has

with overwhelming probability. In order to show (4.2), we will evaluate

for some $T_-,T_+=K^2\log^{O(1)}n$, where $W_{p,n-1}=\frac{1}{n-1}M_{p,n-1}^*M_{p,n-1}$.

The Marchenko-Pastur law implies $\lambda_j(W_{p,n-1})=\Theta(1)$ for every $1\le j\le\min(p,n-1)$.

Let $A>100$ be a large constant to be chosen later. From Theorem 4.1 (taking $\delta=\log^{-A/20}n$), we have

with overwhelming probability for any interval $I$ of length $|I|=K^2\log^An/n$, where $\alpha_I:=\frac{1}{|I|}\int_I\rho_{MP,y}(x)\,dx$. For such an interval, we see from Lemma 3.7 that with overwhelming probability

Set $d_I:=\frac{\mathrm{dist}(\lambda_i(W_{p,n}),I)}{|I|}$. If $d_I\ge\log n$ (say), then

for all $j$ in the above sum, and since $\lambda_i(W_{p,n})=\Theta(1)$, we get

We now partition the real line into intervals $I$ of length $K^2\log^An/n$, and sum (4.7) over all intervals $I$ with $d_I\ge\log n$. Bounding $\alpha_I$ crudely by $O(1)$, we see that $\sum_IO(\frac{\alpha_I}{d_I^2})=O(\frac{1}{\log n})=o(1)$. Similarly, one has

Finally, Riemann integration of the principal value integral

If $|\lambda_i(W_{p,n})-a|\le o(1)$, using the formula for the Stieltjes transform, one obtains from residue calculus that

When $0<y<1$, we have $\sqrt y>2\sqrt y-1$, and (4.2) follows by comparing (4.8) and (4.9).

If $|\lambda_i(W_{p,n})-b|\le o(1)$, we have

When $0<y\le1$, we have $-\sqrt y>-2\sqrt y-1$, and (4.2) follows by comparing (4.10) and (4.11).

By the concentration theorem (Theorem 4.1) and the Cauchy interlacing law, the intervals $I$ with $d_I<\log n$ contribute at most $K^2\log^{O(1)}n$ eigenvalues, and we can set $T_-,T_+$ accordingly. This completes the proof.
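Two numerical steps in the proof above can be made explicit (a short worked check, not part of the original argument). First, with $\delta=\log^{-A/20}n$, intervals of length $|I|=K^2\log^An/n$ meet the length requirement of Theorem 4.1 as soon as $A\ge90/11$, so $A>100$ leaves plenty of room. Second, since the intervals $I$ have equal length and partition the line, at most two of them (one on each side of $\lambda_i(W_{p,n})$) have $d_I\in[k,k+1)$ for each integer $k\ge0$; bounding $\alpha_I$ by $O(1)$ then gives the $O(1/\log n)$ bound:

```latex
\begin{align*}
\frac{K^2\log^{4.5}n}{\delta^{9}n}
  &= \frac{K^2\log^{4.5+9A/20}n}{n}
  \le \frac{K^2\log^{A}n}{n}
  \iff 4.5+\frac{9A}{20}\le A
  \iff A\ge\frac{90}{11}\approx 8.2,\\[4pt]
\sum_{I:\,d_I\ge\log n}\frac{\alpha_I}{d_I^{2}}
  &\ll \sum_{k\ge\lfloor\log n\rfloor}\frac{1}{k^{2}}
  \le \int_{\lfloor\log n\rfloor-1}^{\infty}\frac{dx}{x^{2}}
  = \frac{1}{\lfloor\log n\rfloor-1}
  = O\Big(\frac{1}{\log n}\Big).
\end{align*}
```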

2. Proof of Theorem 4.1:

We first give a crude upper bound on the number of eigenvalues of $W$ in an interval. The proof can be found in Section 5.2, .

with overwhelming probability, where $N_I$ is the number of eigenvalues in the interval $I$.

The strategy is to compare the Stieltjes transform of the ESD of the matrix $W$

with the Stieltjes transform of the Marchenko-Pastur law

Thanks to the next proposition, one can control the ESD by controlling the Stieltjes transforms.

(Lemma 29, ) Let $1/10\ge\eta\ge1/n$, and $a,b,\varepsilon,\delta>0$. Suppose that one has the bound

with (uniformly) overwhelming probability for all $z$ with $a\le\mathrm{Re}(z)\le b$ and $\mathrm{Im}(z)\ge\eta$. Then for any interval $I$ in $[a-\varepsilon,b+\varepsilon]$ with $|I|\ge\max(2\eta,\frac{\eta}{\delta}\log\frac{1}{\delta})$, one has

By Proposition 4.5, our objective is to show

with (uniformly) overwhelming probability for all $z$ with $a\le\mathrm{Re}(z)\le b$ and $\mathrm{Im}(z)\ge\eta:=\frac{K^2\log^6n}{n\delta^8}$.

Since $s_{MP,y}(z)$ is the unique solution to the equation

in the upper half-plane (see ), we investigate a similar equation for $s(z)$.

The entries of $X_k$ are independent of each other and of $W_k$, and have mean zero and variance $1$. Note that $u_j(M_k)$ is a unit vector. By linearity of expectation we have

is the Stieltjes transform of the ESD of $W_k$. From the Cauchy interlacing law, we get

In fact a similar estimate holds for YkY_{k} itself:

For $1\le k\le n$, $Y_k=\mathbf{E}(Y_k|W_k)+o(\delta^2)$ holds with (uniformly) overwhelming probability for all $z$ with $a\le\mathrm{Re}(z)\le b$ and $\mathrm{Im}(z)\ge\eta$.

Let $T\subset\{1,\dots,n-1\}$. Let $H$ be the space spanned by $\{u_j(W_k)\}_{j\in T}$ and $P_H$ the orthogonal projection onto $H$. Thus $\sum_{j\in T}R_j=\|P_H(X_k)\|^2-\dim(H)$.

By Lemma 3.7, we conclude with overwhelming probability

Let $z=x+\sqrt{-1}\eta$, where $\eta=\frac{K^2\log^6n}{n\delta^8}$ and $a\le x\le b$. In the estimates below we use two auxiliary parameters, $\alpha=\delta^2\log^{-1.1}n$ and $\delta'=\delta^2\log^{-0.1}n$.

First, for those $j\in T$ such that $|\lambda_j(W_k)-x|\le\delta'\eta$, the function $\frac{\lambda_j(W_k)}{\lambda_j(W_k)-x-\sqrt{-1}\eta}$ has magnitude $O(\frac{1}{\eta})$. From Proposition 4.4, $|T|\ll n\delta'\eta$, and the contribution of these $j\in T$ is

For the contribution of the remaining indices, we subdivide them as

for $0\le l\ll\log n/\alpha$, and then sum over $l$.

For each such interval, the function $\frac{\lambda_j(W_k)}{\lambda_j(W_k)-x-\sqrt{-1}\eta}$ has magnitude $O(\frac{1}{(1+\alpha)^l\delta'\eta})$ and fluctuates by at most $O(\frac{\alpha}{(1+\alpha)^l\delta'\eta})$. Let $T(l)$ be the set of all $j$ in this interval; by Proposition 4.4, $|T(l)|\ll n\alpha(1+\alpha)^l\delta'\eta$. Together with the bounds (4.14) and (4.15), the contribution of these $j$ on such an interval is

Summing over $l$ (taking into account that $l\ll\log n/\alpha$), we get

Recall that $s_{MP,y}(z)$ has the explicit expression

where we take the branch of $\sqrt{(y+z-1)^2-4yz}$ with cut at $[a,b]$ that is asymptotically $y+z-1$ as $z\to\infty$.
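The branch choice can be checked numerically. The Python/NumPy sketch below assumes the normalization $s_{MP,y}(z)=\int\frac{\rho_{MP,y}(x)}{x-z}\,dx$ and writes $s_{MP,y}$ as the root of $yz\,s^2+(y+z-1)s+1=0$, which is the relation $s_{MP,y}(s_{MP,y}+\frac{y+z-1}{yz})=-\frac{1}{yz}$ used later in this section. Since $(y+z-1)^2-4yz=(z-a)(z-b)$, the product of principal square roots $\sqrt{z-a}\sqrt{z-b}$ realizes exactly the branch described above: its cut is $[a,b]$ and it behaves like $y+z-1$ at infinity. The sketch then compares $s_{MP,y}$ with the empirical Stieltjes transform $\frac1p\sum_j(\lambda_j-z)^{-1}$ of a Gaussian sample; the names `s_mp` and `s_empirical` are our own.

```python
import numpy as np

def s_mp(z, y):
    """Stieltjes transform of the Marchenko-Pastur law (assumed normalization
    int rho(x) / (x - z) dx): the root of y z s^2 + (y + z - 1) s + 1 = 0 taken
    with sqrt(z - a) * sqrt(z - b), whose cut is [a, b] and which ~ y + z - 1."""
    z = np.asarray(z, dtype=complex)
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
    root = np.sqrt(z - a) * np.sqrt(z - b)  # = sqrt((y+z-1)^2 - 4yz) on this branch
    return (-(y + z - 1) + root) / (2 * y * z)

def s_empirical(eigs, z):
    """Stieltjes transform of the ESD of the given eigenvalues: mean of 1/(lambda - z)."""
    diff = np.asarray(eigs)[:, None] - np.atleast_1d(z)[None, :]
    return np.mean(1.0 / diff, axis=0)

rng = np.random.default_rng(2)
p, n = 400, 1000
y = p / n
M = rng.standard_normal((p, n))
eigs = np.linalg.eigvalsh(M @ M.T / n)       # nonzero spectrum of W = M^* M / n
z = np.array([0.5 + 0.2j, 1.0 + 0.2j, 2.0 + 0.2j])
print(np.round(s_empirical(eigs, z), 3))
print(np.round(s_mp(z, y), 3))
```

As $\mathrm{Im}\,z\downarrow0$ with $\mathrm{Re}\,z\in(a,b)$, $\mathrm{Im}\,s_{MP,y}(z)$ tends to $\pi\rho_{MP,y}(\mathrm{Re}\,z)$, which is a convenient check that the correct branch has been selected.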

From (4.13) and Proposition 4.6, we have with overwhelming probability that

where we used Lemma 3.7 to obtain that $\xi_{kk}=\|X_k\|^2/n=1+o(\delta^2)$ with overwhelming probability.

By the assumption $p/n\to y$, when $n$ is large enough,

In (4.16), for the error term $o(\delta^2)$, one has either $\frac{o(\delta^2)}{y+z-1+yzs(z)}=o(\delta^2)$ or $y+z-1+yzs(z)=o(1)$. In the latter case, we get $s(z)=-\frac{y+z-1}{yz}+o(1)$. In the former case, we apply a Taylor expansion to (4.16),

Completing the square for $s(z)$ in the above identity, one can solve the equation for $s(z)$,

If $\frac{o(\delta^2)}{\sqrt{\frac{(y+z-1)^2}{4yz}-1}}=o(\delta)$, then by a Taylor expansion of the right-hand side of (4.17), we have $\sqrt{yz}\left(s(z)+\frac{y+z-1}{2yz}\right)=\pm\sqrt{\frac{(y+z-1)^2}{4yz}-1}+o(\delta)$. Therefore, $s(z)=s_{MP,y}(z)+o(\delta)$ or $s(z)=s_{MP,y}(z)-\frac{\sqrt{(y+z-1)^2-4yz}}{yz}+o(\delta)=-s_{MP,y}(z)-\frac{y+z-1}{yz}+o(\delta)$. If $\frac{(y+z-1)^2}{4yz}-1=o(\delta^2)$, then from (4.17) and the explicit formula for $s_{MP,y}(z)$, we still have $s(z)=s_{MP,y}(z)+o(\delta)$.
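Ignoring the $o(\delta^2)$ error terms, the algebra behind (4.17) and the two alternatives above is the following completion of the square (a sketch for orientation only; the error analysis is as in the text):

```latex
\begin{align*}
yz\,s^2+(y+z-1)\,s+1=0
  &\iff yz\Big(s+\frac{y+z-1}{2yz}\Big)^2=\frac{(y+z-1)^2}{4yz}-1\\
  &\iff s=\frac{-(y+z-1)\pm\sqrt{(y+z-1)^2-4yz}}{2yz},\\[4pt]
s_{MP,y}(z)-\frac{\sqrt{(y+z-1)^2-4yz}}{yz}
  &= -s_{MP,y}(z)-\frac{y+z-1}{yz}.
\end{align*}
```

With the branch of the square root fixed as after the explicit formula for $s_{MP,y}$, the root with the plus sign is $s_{MP,y}(z)$. The last line holds because the two roots sum to $-\frac{y+z-1}{yz}$; their product is $\frac{1}{yz}$, which is the identity $s_{MP,y}(s_{MP,y}+\frac{y+z-1}{yz})=-\frac{1}{yz}$ used below.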

To summarize the above discussion, one has, with overwhelming probability, either

We may assume the above trichotomy holds for all $z=x+\sqrt{-1}\eta$ with $a\le x\le b$ and $\eta_0\le\eta\le n^{10}/\delta$, where $\eta_0=\frac{K^2\log^6n}{n\delta^8}$.

When $\eta=n^{10}/\delta$, from $|s(z)|\le1/\eta$ and $|s_{MP,y}(z)|\le1/\eta$, both $s(z)$ and $s_{MP,y}(z)$ are $o(\delta)$, and therefore (4.18) holds in this case. By continuity, we conclude that either (4.18) holds in the whole domain of interest, or there exists some $z$ in the domain at which (4.18) and (4.19), or (4.18) and (4.20), hold simultaneously.

On the other hand, (4.18) and (4.20) cannot hold at the same time. Otherwise, $s_{MP,y}(z)+\frac{y+z-1}{yz}=o(1)$. However, from $s_{MP,y}(z)\left(s_{MP,y}(z)+\frac{y+z-1}{yz}\right)=-\frac{1}{yz}$ and $|s_{MP,y}(z)|\le\frac{\sqrt2}{\sqrt y(1-\sqrt y+\sqrt{\eta_0})}$, one sees that $|s_{MP,y}(z)+\frac{y+z-1}{yz}|$ is bounded from below, a contradiction.

Similarly, (4.18) and (4.19) cannot both hold unless $(y+z-1)^2-4yz=o(\delta^2)$. Otherwise, we could conclude that $2s_{MP,y}(z)+\frac{y+z-1}{yz}=o(\delta)$. From the explicit formula for $s_{MP,y}$,

One can conclude that $|2s_{MP,y}(z)+\frac{y+z-1}{yz}|\ge C\delta$, which is a contradiction. In fact, if $(y+z-1)^2-4yz=o(\delta^2)$, then (4.18) and (4.19) are equivalent.

In conclusion, (4.18) holds with overwhelming probability in the domain of interest.
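The identity $\sum_{j\in T}R_j=\|P_H(X_k)\|^2-\dim(H)$ is where the projection estimate of Lemma 3.7 enters: when the entries of $X$ have mean zero and variance one, $\|P_HX\|^2$ has expectation exactly $\dim H$ and concentrates around it. The Python/NumPy sketch below is only a qualitative illustration and does not reproduce the quantitative bounds used in the proof. It fixes a random $d$-dimensional subspace $H\subset\mathbf{R}^m$, samples Rademacher vectors (so $K=1$), and reports how far $\|P_HX\|$ strays from $\sqrt d$; the function name and parameters are our own.

```python
import numpy as np

def projection_deviation(m, d, trials, rng):
    """Fix a random d-dimensional subspace H of R^m, draw `trials` Rademacher
    vectors X in R^m, and return ||P_H X||^2 - d for each draw."""
    Q, _ = np.linalg.qr(rng.standard_normal((m, d)))  # m x d, orthonormal columns spanning H
    X = rng.choice([-1.0, 1.0], size=(trials, m))     # entries: mean 0, variance 1, bounded by K = 1
    proj_sq = np.sum((X @ Q) ** 2, axis=1)            # ||P_H X||^2 = sum_j |q_j^T X|^2
    return proj_sq - d

rng = np.random.default_rng(4)
m, d = 1000, 200
dev = projection_deviation(m, d, trials=500, rng=rng)
print(f"average of ||P_H X||^2 - d over 500 draws: {dev.mean():.2f} (expectation is exactly 0)")
print(f"largest | ||P_H X|| - sqrt(d) |: {np.max(np.abs(np.sqrt(dev + d) - np.sqrt(d))):.2f}")
```

Working with $\|P_HX\|-\sqrt d$ rather than $\|P_HX\|^2-d$ shows the $O(1)$ scale of the fluctuation when $K=1$, even though $\|P_HX\|^2-d$ itself fluctuates on the scale $\sqrt d$.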

Gap theorem and Four Moment theorem

In this section, we complete the proofs of the main results, Theorem 1.5 and Theorem 1.7. The proofs follow closely those in (as well as in , ), so we shall focus on the changes needed to that argument. We assume substantial familiarity with the materials in , , and will cite from them repeatedly.

It is convenient to use the augmented matrix

which is a $(p+n)\times(p+n)$ Hermitian matrix with eigenvalues $\pm\sigma_1(M),\dots,\pm\sigma_p(M)$ and $n-p$ zeros. In this way, we can import the results obtained in , and  to the model discussed in this paper.
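To see concretely how the augmented matrix turns singular values into eigenvalues, the following Python/NumPy sketch (the helper `augmented` is a hypothetical name) assumes the standard block form $\begin{pmatrix}0&M\\M^*&0\end{pmatrix}$ and checks that its spectrum consists of $\pm\sigma_1(M),\dots,\pm\sigma_p(M)$ together with $n-p$ zeros. The arrangement with $M^*$ in the upper right block has the same spectrum.

```python
import numpy as np

def augmented(M):
    """(p+n) x (p+n) Hermitian matrix [[0, M], [M^*, 0]] built from a p x n matrix M."""
    p, n = M.shape
    return np.block([[np.zeros((p, p)), M],
                     [M.conj().T, np.zeros((n, n))]])

rng = np.random.default_rng(3)
p, n = 3, 5
M = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))
sigma = np.linalg.svd(M, compute_uv=False)
expected = np.sort(np.concatenate([sigma, -sigma, np.zeros(n - p)]))
observed = np.linalg.eigvalsh(augmented(M))     # real eigenvalues, ascending
print(np.allclose(observed, expected))          # True
```

If $Mu=\sigma v$ and $M^*v=\sigma u$, then $(v,u)$ and $(v,-u)$ are eigenvectors with eigenvalues $\sigma$ and $-\sigma$, which is why each singular value appears with both signs.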

As mentioned at the beginning of Section 4, one can assume that

almost surely for all $i,j$. We also assume that the distributions of $M,M'$ are continuous, to ensure that the singular values are almost surely simple.
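For completeness, here is the Markov-plus-union-bound computation behind the truncation $|\zeta_{ij}|\le n^{10/C_0}$ stated at the beginning of Section 4. It uses only the moment bound $\sup_{i,j}\mathbf{E}|\zeta_{ij}|^{C_0}\le C$ from condition C1 and $p\le n$ (a worked check, not a new ingredient):

```latex
\begin{align*}
\mathbf{P}\big(\exists\, i,j:\ |\zeta_{ij}|>n^{10/C_0}\big)
  &\le \sum_{i=1}^{p}\sum_{j=1}^{n}\mathbf{P}\big(|\zeta_{ij}|^{C_0}>n^{10}\big)
  \le \sum_{i=1}^{p}\sum_{j=1}^{n}\frac{\mathbf{E}|\zeta_{ij}|^{C_0}}{n^{10}}\\
  &\le pn\cdot\frac{C}{n^{10}}
  \le C\,n^{-8}.
\end{align*}
```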

Let us first state a weaker version of the Four Moment theorem, in which we assume the gap property for the matrices considered:

For sufficiently small $c_0>0$ and sufficiently large $C_0>0$ ($C_0=10^4$ will suffice) the following holds for every $k\ge1$. Let $M=(\zeta_{ij})_{1\le i\le p,1\le j\le n}$ and $M'=(\zeta'_{ij})_{1\le i\le p,1\le j\le n}$ be two random matrices satisfying condition C1 with the indicated constant $C_0$, and assume that for each $i,j$, $\zeta_{ij}$ and $\zeta'_{ij}$ match to order 4. Let $W,W'$ be the associated covariance matrices. Assume also that $M,M'$ obey the gap property and $p/n\to y$ for some $0<y\le1$.

Then for any $1\le i_1<i_2<\dots<i_k\le n$, and for $n$ sufficiently large depending on $k,c_0$, we have

If $\zeta_{ij}$ and $\zeta'_{ij}$ only match to order 3 rather than 4, then the conclusion (5.3) still holds provided that one strengthens (5.2) to

The Four Moment theorem follows directly from Theorem 1.7 and Theorem 5.1. The next two sections are devoted to the proofs of Theorem 1.7 and Theorem 5.1.

The key technical step (also used in proving Theorem 1.7) is the following truncated Four Moment theorem, which follows by applying [12, Proposition 6.1 and Proposition 6.2] (or see [10, Proposition 35]) to the augmented matrix $\mathbf{M}$. The proof is omitted here.

For sufficiently small $c_0>0$ and sufficiently large $C_0>0$ ($C_0=10^4$ will suffice) the following holds for every $k\ge1$. Let $M=(\zeta_{ij})_{1\le i\le p,1\le j\le n}$ and $M'=(\zeta'_{ij})_{1\le i\le p,1\le j\le n}$ be two random matrices satisfying condition C1 with the indicated constant $C_0$, and assume that for each $i,j$, $\zeta_{ij}$ and $\zeta'_{ij}$ match to order 4. Assume also that $|\zeta_{ij}|,|\zeta'_{ij}|\le n^{10/C_0}$ and $p/n\to y$ for some $0<y\le1$.

Then for any $1\le i_1<i_2<\dots<i_k\le n$, and for $n$ sufficiently large depending on $\varepsilon,k,c_0$, we have

If $\zeta_{ij}$ and $\zeta'_{ij}$ only match to order 3 rather than 4, then the conclusion (5.5) still holds provided that one strengthens (5.4) to

for any $c_1>0$, provided that $c_0$ is sufficiently small depending on $c_1$.

As in the arguments in Section 6 of , we use the quantities, for $1\le i\le p$,

The gap property (up to the edge) for $\mathbf{M}$ ensures an upper bound on $Q_i(\mathbf{M})$. The proof repeats exactly the proof of Lemma 32 in .

If $\mathbf{M}$ satisfies the gap property, then for any $c_0>0$ (independent of $n$) and any $1\le i\le p$, one has $Q_i(\mathbf{M})\le n^{c_0}$ with high probability.

where $\eta(x)$ is a smooth cutoff to the region $x\le n^{c_0}$ which equals $1$ on $x\le n^{c_0}/2$. From Proposition 5.3, we have

for some $c>0$, and a similar relation holds for $M'$. The proof is completed by combining these relations with Theorem 5.2.

2. Proof of the Gap Theorem

We first prove a gap theorem under an additional exponential decay hypothesis on the entries of $M$. The proof is presented in Section 5.3.

Let $M=(\zeta_{ij})_{1\le i\le p,1\le j\le n}$ be a random matrix obeying condition C1, whose entries $\zeta_{ij}$ satisfy exponential decay in the sense that $\mathbf{P}(|\zeta_{ij}|\ge t^C)\le\exp(-t)$ for all $t\ge C'$, for all $i,j$ and some constants $C,C'>0$. Then $M$ obeys the gap property.

The next observation is the following matching lemma (see Lemma 33 in ), which, together with Theorem 5.2, allows us to remove the exponential decay hypothesis in Theorem 5.4.

Now consider the matrix $M=(\zeta_{ij})_{1\le i\le p,1\le j\le n}$ in Theorem 1.7. By the matching lemma, we can find a random matrix $M'=(\zeta'_{ij})_{1\le i\le p,1\le j\le n}$ such that each $\zeta'_{ij}$ satisfies the exponential decay hypothesis and matches $\zeta_{ij}$ to third order. By Theorem 5.4, the matrix $M'$ obeys the gap property. As in Section 6, , let $\eta(x)$ be a smooth cutoff to the region $x\le n^{c_0}$. Then by Proposition 5.3, $\mathbf{E}\eta(Q_i(M'))=1-O(n^{-c_1})$, which, by Theorem 5.2, implies that $\mathbf{E}\eta(Q_i(M))=1-O(n^{-c_2})$ for some $c_2>0$ independent of $n$. Hence $M$ also satisfies the gap property.

3. Proof of Theorem 5.4:

The proof closely follows that in Section 5, , so we mainly indicate the necessary changes. Interested readers can find the detailed proofs in . First, in order to run an induction argument, we need to treat the edge cases $i=1$ and $i=p$ separately.

By symmetry, it suffices to treat the case $i=p$. In the interlacing identity (Lemma 3.5),

From Theorem 4.2 and Lemma 3.8, one can conclude that $|u_{p-1}(M_{p-1,n})^*Y|^2\ge n^{-c/10}$ with high probability. Therefore, $|\sigma_j(M_{p-1,n})^2-\sigma_p(M_{p,n})^2|\ge n^{-c}$ with high probability. The conclusion follows from the Cauchy interlacing law. ∎

For the general case of the gap theorem, we write $i_0,p_0$ instead of $i,p$ and define $N_0:=p_0+n$; as in , we introduce the regularized gap

where $C_1>1$ is a large constant to be determined later. To prove Theorem 5.4, it is enough to show that

Repeating the arguments in Section 3.5, , the proof relies on the following two key propositions. The idea is to propagate a narrow gap for $M_{p,n}$ backwards in $p$ until one can use Theorem 4.1 to control the occurrence of the gap.

Suppose $p_0/2\le p<p_0$ and $l\le p/10$ is such that

for some $0<\delta\le1$ (which can depend on $n$), and that

Let $X_{p+1}$ be the $(p+1)$-th row of $M_{p_0,n}$, and let $u_1(M_{p,n}),\dots,u_p(M_{p,n})$ be an orthonormal system of right singular vectors of $M_{p,n}$ associated to $\sigma_1(M_{p,n}),\dots,\sigma_p(M_{p,n})$. Then one of the following statements holds:

(Macroscopic spectral concentration) There exist $1\le i_-<i_+\le p+1$ with $i_+-i_-\ge\log^{C_1/2}n$ such that $|\sqrt n\,\sigma_{i_+}(M_{p+1,n})-\sqrt n\,\sigma_{i_-}(M_{p+1,n})|\le\delta^{1/4}\exp(\log^{0.95}n)(i_+-i_-)$.

(Small inner products) There exist $1\le i_-\le i_0-l<i_0\le i_+\le p$ with $i_+-i_-\le\log^{C_1/2}n$ such that

(Large singular value) For some $1\le i\le p+1$ one has

(Large inner product) There exists $1\le i\le p$ such that

(Large inner product near $i_0$) There exists $1\le i\le p$ with $|i-i_0|\le\log^{C_1}n$ such that

Apply Lemma 5.3 in to the augmented matrix

Note that $A_{p+n}$ is $A_{p+n+1}$ with the rightmost column and the bottom row (which consist of $X_{p+1}$ and $p+1$ zeros) removed. The eigenvalues of $A_{p+n}$ are $\pm\sqrt n\,\sigma_1(M_{p,n}),\dots,\pm\sqrt n\,\sigma_p(M_{p,n})$ and $n-p$ zeros, and an orthonormal eigenbasis includes the vectors $\begin{pmatrix}u_j(M_{p,n})\\ v_j(M_{p,n})\end{pmatrix}$ for $1\le j\le p$. (The “Large coefficient” event in Lemma 5.3 of  cannot occur, as $A_{p+n+1}$ has zero diagonal entries.) ∎

The next proposition asserts that the events (i)-(vi) occur with small probability.

Suppose that $p_0/2\le p<p_0$ and $l\le p/10$, and set $\delta:=n_0^{-\kappa}$ for some sufficiently small fixed $\kappa>0$. Then

The events (i), (iii), (iv), (v) in Proposition 5.7 all fail with high probability.

There is a constant $C'$ such that all the coefficients of the right singular vectors $u_j(M_{p,n})$, $1\le j\le p$, are of magnitude at most $n^{-1/2}\log^{C'}n$ with overwhelming probability. Conditioning on $M_{p,n}$ being a matrix with this property, the events (ii) and (vi) occur with conditional probability at most $2^{-\kappa m}+n^{-\kappa}$.

Furthermore, there is a constant $C_2$ (depending on $C',\kappa,C_1$) such that if $l\ge C_2$ and $M_{p,n}$ is conditioned as in (b), then (ii) and (vi) in fact occur with conditional probability at most $2^{-\kappa m}\log^{-2C_1}n+n^{-\kappa}$.

The proof of the above proposition repeats the proof of Proposition 53 in with the major difference being that Theorem 4.1 and Theorem 4.2 are applied instead of Theorem 60 and Proposition 62 in .

References