Random covariance matrices: Universality of local statistics of eigenvalues up to the edge

Ke Wang

Introduction

The goal of this paper is to extend the Four Moment theorem established by Tao and Vu for iid covariance matrices from the bulk of the spectrum to the edge. Let us first specify the matrix ensembles that will be studied.

Consider a random $p\times n$ matrix $M_{n,p}=(\zeta_{ij})_{1\leq i\leq p,1\leq j\leq n}$ , where $p=p(n)$ is an integer parameter such that $p\leq n$ and $\lim_{n\rightarrow\infty}p/n=y$ for some $0<y\leq 1$ . We say that the matrix ensemble $M$ obeys condition $\bf{C1}$ if the random variables $\zeta_{ij}$ are jointly independent, have mean zero and variance $1$ , and obey the moment condition $\sup_{i,j}{\bf E}|\zeta_{ij}|^{C_{0}}\leq C$ for some constant $C$ independent of $n,p$ .

Given such a $p\times n$ random matrix $M$ , we form the $n\times n$ covariance matrix $W=W_{n,p}=\frac{1}{n}M^{*}M$ . This (non-negative) matrix has rank $p$ and the first $n-p$ eigenvalues are $0$ . We order its remaining eigenvalues as

$$0\leq\lambda_{1}(W)\leq\lambda_{2}(W)\leq\cdots\leq\lambda_{p}(W).$$

The empirical spectral distribution (ESD) of the matrix $W$ (which is Hermitian and thus has real eigenvalues) is the one-dimensional function

$$F^{W}(x):=\frac{1}{p}\left|\{1\leq j\leq p:\lambda_{j}(W)\leq x\}\right|,$$

where we use $|\mathbf{I}|$ to denote the cardinality of a set $\mathbf{I}$ .

The first fundamental result concerning the asymptotic limiting behavior of the ESD for large covariance matrices is the Marchenko-Pastur law due to (see also ).

(Marchenko-Pastur Law) Assume a $p\times n$ random matrix $M$ obeys condition $\bf{C1}$ with $C_{0}\geq 4$ , and $p,n\rightarrow\infty$ such that $\lim_{n\rightarrow\infty}p/n=y\in(0,1]$ . Then the empirical spectral distribution of the matrix $W_{n,p}=\frac{1}{n}M^{*}M$ converges in distribution to the Marchenko-Pastur Law with density function

$$\rho_{MP,y}(x):=\frac{1}{2\pi xy}\sqrt{(b-x)(x-a)}\,\mathbf{1}_{[a,b]}(x),\qquad a:=(1-\sqrt{y})^{2},\quad b:=(1+\sqrt{y})^{2}.$$
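As a sanity check on the normalization, the following sketch integrates the Marchenko-Pastur density numerically. It assumes the standard form $\rho_{MP,y}(x)=\frac{1}{2\pi xy}\sqrt{(b-x)(x-a)}$ on $[a,b]$ with $a=(1-\sqrt{y})^{2}$ and $b=(1+\sqrt{y})^{2}$ ; the helper names `mp_edges`, `mp_density`, `integrate` and the value $y=0.5$ are illustrative choices, not taken from the paper.

```python
from math import sqrt, pi

def mp_edges(y):
    """Spectral edges a = (1 - sqrt(y))^2 and b = (1 + sqrt(y))^2 (assumed standard form)."""
    return (1 - sqrt(y)) ** 2, (1 + sqrt(y)) ** 2

def mp_density(x, y):
    """Marchenko-Pastur density for ratio y in (0, 1], zero outside (a, b)."""
    a, b = mp_edges(y)
    if x <= a or x >= b:
        return 0.0
    return sqrt((b - x) * (x - a)) / (2 * pi * x * y)

def integrate(f, lo, hi, steps=100_000):
    """Plain midpoint rule; adequate here because the integrand is bounded."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

y = 0.5  # illustrative value of lim p/n
a, b = mp_edges(y)
mass = integrate(lambda x: mp_density(x, y), a, b)      # total mass, should be close to 1
mean = integrate(lambda x: x * mp_density(x, y), a, b)  # first moment, should be close to 1
print(f"edges=({a:.4f}, {b:.4f})  mass={mass:.6f}  mean={mean:.6f}")
```

Both integrals are close to $1$ , which is consistent with entries of variance $1$ and with an ESD normalized by the number $p$ of nontrivial eigenvalues.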

We introduce the following notions of frequent events, depending on $n$ , listed in increasing order of likelihood.

$E$ holds asymptotically almost surely if ${\hbox{\bf P}}(E)=1-o(1)$ .

$E$ holds with high probability if ${\hbox{\bf P}}(E)\geq 1-O(n^{-c})$ for some constant $c>0$ (independent of $n$ ).

$E$ holds with overwhelming probability if ${\hbox{\bf P}}(E)\geq 1-O_{C}(n^{-C})$ for every constant $C>0$ (or equivalently, that ${\hbox{\bf P}}(E)\geq 1-\exp(-\omega(\log n))$ ).

$E$ holds almost surely if ${\hbox{\bf P}}(E)=1$ .

We say that two complex random variables $\zeta,\zeta^{\prime}$ match to order $k$ for some integer $k\geq 1$ if one has ${\hbox{\bf E}}\text{Re}(\zeta)^{m}\text{Im}(\zeta)^{l}={\hbox{\bf E}}\text{Re}(\zeta^{\prime})^{m}\text{Im}(\zeta^{\prime})^{l}$ for all $m,l\geq 0$ with $m+l\leq k$ .
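To make the matching condition concrete, here is a minimal sketch that compares the mixed moments ${\bf E}\,\text{Re}(\zeta)^{m}\text{Im}(\zeta)^{l}$ of two discrete distributions. The function names and the two example distributions (a symmetric $\pm 1$ variable and a three-point variable on $\{-\sqrt{3},0,\sqrt{3}\}$ ) are illustrative assumptions, not objects from the paper.

```python
from itertools import product
from math import sqrt

def mixed_moments(dist, k):
    """Return {(m, l): E[Re(z)^m Im(z)^l]} for all m, l >= 0 with m + l <= k.

    `dist` is a list of (value, probability) pairs for a discrete complex variable.
    """
    out = {}
    for m, l in product(range(k + 1), repeat=2):
        if m + l <= k:
            out[(m, l)] = sum(
                pr * complex(z).real ** m * complex(z).imag ** l for z, pr in dist
            )
    return out

def match_to_order(dist_a, dist_b, k, tol=1e-12):
    """True if all mixed moments of total degree <= k agree up to `tol`."""
    ma, mb = mixed_moments(dist_a, k), mixed_moments(dist_b, k)
    return all(abs(ma[key] - mb[key]) <= tol for key in ma)

# Two mean-zero, variance-one examples (hypothetical choices for illustration).
rademacher = [(-1.0, 0.5), (1.0, 0.5)]
three_point = [(-sqrt(3), 1 / 6), (0.0, 2 / 3), (sqrt(3), 1 / 6)]

print(match_to_order(rademacher, three_point, 3))  # moments up to order 3 agree
print(match_to_order(rademacher, three_point, 4))  # fourth moments differ (1 versus 3)
```

The three-point variable has moments $0,1,0,3$ of orders one through four, the same as a standard real Gaussian, so it matches the Gaussian to order 4 while the $\pm 1$ variable matches it only to order 3.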

Our main result is the following Four Moment theorem, which extends the result (Theorem 6) in to the edge of the spectrum. The proof is analogous to the proofs in , and and will be presented in Section 5.

For sufficiently small $c_{0}>0$ and sufficiently large $C_{0}>0$ ( $C_{0}=10^{4}$ will suffice) the following holds for every $k\geq 1$ . Let $M=(\zeta_{ij})_{1\leq i\leq p,1\leq j\leq n}$ and $M^{\prime}=(\zeta^{\prime}_{ij})_{1\leq i\leq p,1\leq j\leq n}$ be two random matrices satisfying condition C1 with the indicated constant $C_{0}$ , and assume that for each $i,j$ , $\zeta_{ij}$ and $\zeta^{\prime}_{ij}$ match to order 4. Let $W,W^{\prime}$ be the associated covariance matrices. Assume also that $p/n\rightarrow y$ for some $0<y\leq 1$ .

Then for any $1\leq i_{1}<i_{2}<\ldots<i_{k}\leq n$ , and for $n$ sufficiently large depending on $k,c_{0}$ , we have

If $\zeta_{ij}$ and $\zeta^{\prime}_{ij}$ only match to order 3 rather than 4, then the conclusion (1.2) still holds provided that one strengthens (1.1) to

The next theorem is an extension of Theorem 17 in , which is used in the proof of Theorem 1.5 and is of independent interest as well. The proof is delayed to Section 5.

Let $M$ be a random matrix obeying condition C1. We say that $M$ obeys the gap property if for every $c>0$ and every $1\leq i\leq p$ , one has $|\lambda_{i+1}(W)-\lambda_{i}(W)|\geq n^{-1-c}$ with high probability.

Let $M$ be a random matrix satisfying condition C1. Then $M$ obeys the gap property.

When $y=1$ , the singular value statistics around $a=0$ turn out to be different since the density function $\rho_{\text{MP},y}(x)$ has a singularity at $x=0$ . The hard edge is not really an edge, which makes it easier to deal with. In this paper, we will focus on the edge case when $a>0$ .

We consider $n$ as an asymptotic parameter tending to infinity. We use $X\ll Y$ , $Y\gg X$ , $Y=\Omega(X)$ , or $X=O(Y)$ to denote the bound $X\leq CY$ for all sufficiently large $n$ and for some constant $C$ . Notations such as $X\ll_{k}Y,X=O_{k}(Y)$ mean that the hidden constant $C$ depends on another constant $k$ . $X=o(Y)$ or $Y=\omega(X)$ means that $X/Y\rightarrow 0$ as $n\rightarrow\infty$ ; the rate of decay here will be allowed to depend on other parameters. We write $X=\Theta(Y)$ for $Y\ll X\ll Y$ . We view vectors $x\in{\hbox{\bf C}}^{n}$ as column vectors. The Euclidean norm of a vector $x\in{\hbox{\bf C}}^{n}$ is defined as $\|x\|:=(x^{*}x)^{1/2}$ .

This paper is organized as follows. In Section 2, we prove a variant of the universality result for the smallest singular value as an application of the Four Moment theorem. In Section 3, we mention a few basic results from linear algebra and probability. In Section 4, we provide the proofs of two technical lemmas, which are the major content of this paper. Finally, in Section 5, we give the proofs of the Gap theorem (Theorem 1.7) and the Four Moment theorem (Theorem 1.5). The argument draws heavily from those in , and , so we focus only on the changes needed to complete the proofs.

Acknowledgments: The author would like to thank Van H. Vu for useful discussion and his guidance through to the completion of this paper.

Applications

In a similar way as (Section 1.3), equipped with the Four Moment theorem, we can obtain universality results for large classes of random matrices. Let us demonstrate through some examples, focusing on the results for the lower edge of the spectrum. Recall $\sigma_{1}(M_{p,n})$ denotes the smallest singular value of $M_{p,n}$ .

For the case when $p=n$ , the limiting distribution for Gaussian models was computed by Edelman . Recently a universality result has been established by Tao and Vu for entries with bounded moments of sufficiently high order.

Let $M_{p,n}=(\zeta_{ij})_{1\leq i\leq p,1\leq j\leq n}$ be a random covariance matrix, where $p=p(n)\leq n$ tends to infinity as $n\rightarrow\infty$ and $\limsup_{n\rightarrow\infty}p/n<1$ . Let $\zeta_{ij}$ be independent for all $i,j$ .

where $\text{TW}_{1},\text{TW}_{2}$ denote the Tracy-Widom distributions.

In the same way as in the proofs of (, Theorem 9) and (, Theorem 11), one obtains the following (see also Figure 1 for numerical simulations):

The conclusions of Theorem 2.1 can be extended to the case when $p=p(n)\leq n$ tends to infinity as $n\rightarrow\infty$ and $\lim_{n\rightarrow\infty}p/n=y\in(0,1]$ , provided that $M_{p,n}=(\zeta_{ij})_{1\leq i\leq p,1\leq j\leq n}$ obeys condition C1 with sufficiently large constant $C_{0}$ and the $\zeta_{ij}$ have vanishing third moment.

Recently, Ben Arous and Péché proved universality at the edge for random matrices $M_{p,n}(\zeta)$ with i.i.d. entries of Gaussian divisible distribution. Combined with the matching theorem (Corollary 30, ), this allows us to drop the vanishing third moment condition, provided that $\zeta$ is supported on at least three points.

The conclusion (2.2) of Theorem 2.1 can be extended to the case when $p=p(n)\leq n$ tends to infinity as $n\rightarrow\infty$ and $\lim_{n\rightarrow\infty}p/n=y\in(0,1]$ , provided that $M_{p,n}=(\zeta_{ij})_{1\leq i\leq p,1\leq j\leq n}$ obeys condition C1 with sufficiently large constant $C_{0}$ and the $\zeta_{ij}$ are supported on at least three points.

General Tools

In this section, we collect some basic tools from linear algebra and probability that will be used repeatedly in the sequel.

We start with the Cauchy interlacing law and the Weyl inequalities.

If $A_{n}$ is an $n\times n$ Hermitian matrix, and $A_{n-1}$ is an $(n-1)\times(n-1)$ minor, then $\lambda_{i}(A_{n})\leq\lambda_{i}(A_{n-1})\leq\lambda_{i+1}(A_{n})$ for all $1\leq i<n$ .

If $M_{p,n}$ is a $p\times n$ matrix, and $M_{p-1,n}$ is a $(p-1)\times n$ minor, then $\sigma_{i}(M_{p,n})\leq\sigma_{i}(M_{p-1,n})\leq\sigma_{i+1}(M_{p,n})$ for all $1\leq i<p$ .

If $p<n$ , $M_{p,n}$ is a $p\times n$ matrix, and $M_{p,n-1}$ is a $p\times(n-1)$ minor, then $\sigma_{i-1}(M_{p,n})\leq\sigma_{i}(M_{p,n-1})\leq\sigma_{i}(M_{p,n})$ for all $1\leq i\leq p$ , with the understanding that $\sigma_{0}(M_{p,n})=0$ . (For $p=n$ , one can of course use the transpose of (ii) instead.)

If $A,B$ are $n\times n$ Hermitian matrices, then $|\lambda_{i}(A)-\lambda_{i}(B)|\leq\|A-B\|_{op}$ for all $1\leq i\leq n$ .

If $M,N$ are $p\times n$ matrices, then $|\sigma_{i}(M)-\sigma_{i}(N)|\leq\|M-N\|_{op}$ for all $1\leq i\leq p$ .
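The Hermitian versions of the interlacing law and the Weyl inequality can be spot-checked numerically. The sketch below does so for small random real symmetric matrices; the cyclic Jacobi eigenvalue routine is a standard textbook method used here only to keep the example dependency-free, and the matrix size, seed, and function names are illustrative assumptions.

```python
import random
from math import sqrt

def jacobi_eigenvalues(sym, sweeps=50, eps=1e-24):
    """Eigenvalues (ascending) of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in sym]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < eps:  # off-diagonal part is negligible: diagonal holds the spectrum
            break
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = 1 / (abs(theta) + sqrt(theta * theta + 1))
                if theta < 0:
                    t = -t
                c = 1 / sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # update columns p, q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p, q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def random_symmetric(n, rng):
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            m[i][j] = m[j][i] = rng.gauss(0, 1)
    return m

rng = random.Random(0)
n = 6
A, B = random_symmetric(n, rng), random_symmetric(n, rng)
lam_A = jacobi_eigenvalues(A)
lam_B = jacobi_eigenvalues(B)
lam_minor = jacobi_eigenvalues([row[:-1] for row in A[:-1]])  # delete last row and column
diff = [[A[i][j] - B[i][j] for j in range(n)] for i in range(n)]
op_norm = max(abs(x) for x in jacobi_eigenvalues(diff))  # operator norm of symmetric A - B

tol = 1e-9
interlaces = all(lam_A[i] - tol <= lam_minor[i] <= lam_A[i + 1] + tol for i in range(n - 1))
weyl_holds = all(abs(lam_A[i] - lam_B[i]) <= op_norm + tol for i in range(n))
print(interlaces, weyl_holds)
```

The singular value versions follow the same pattern after passing to $M^{*}M$ or to a Hermitian dilation of $M$ .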

The following formula for an entry of a singular vector, in terms of the singular values and singular vectors of a minor, is very useful:

The next lemma states the well-known Cauchy interlacing identity:

Let $A_{n}$ be a $n\times n$ Hermitian matrix, and

Let $\lambda_{i}(A_{n}),1\leq i\leq n$ be the eigenvalues of $A_{n}$ and $\lambda_{j}(A_{n-1}),1\leq j\leq n-1$ be the eigenvalues of $A_{n-1}$ . Suppose that $X$ is not orthogonal to any of the unit eigenvectors $u_{j}(A_{n-1})$ of $A_{n-1}$ . Then we have

From this lemma, one immediately gets an interlacing identity for singular values:

Assume the notations in Lemma 3.3, then for every $i$ ,

with eigenvalue $\sigma_{i}(M_{p,n})^{2}$ .

Since we have $\lambda_{j}(M_{p,n-1}^{*}M_{p,n-1})=\sigma_{j}(M_{p,n-1})^{2}$ and

(3.1) follows. Similarly, to show (3.2), apply Lemma 3.5 to the matrix

By Schur’s complement, it has the following alternate representation:

Let $W=(\zeta_{ij})_{1\leq i,j\leq n}$ be a Hermitian matrix, and let $z$ be a complex number not in the spectrum of $W$ . Then we have

3.2. Tools from probability theory

We will rely frequently on the next concentration of measure result for projections of random vectors.

for some $\sigma>0$ . Let $\zeta_{1},\ldots,\zeta_{n}$ be independent complex random variables with mean zero, variance one, and ${\hbox{\bf E}}|\zeta_{i}|^{3}\leq C$ for some $C\geq 1$ . For each $1\leq i\leq N$ , let $S_{i}$ be the complex random variable

(Upper tail bound on $S_{i}$ ) For $t\geq 1$ , we have ${\hbox{\bf P}}(|S_{i}|\geq t)\ll\exp(-ct^{2})+C\sigma$ for some absolute constant $c>0$ .

(Lower tail bound on $\vec{S}$ ) For any $t\leq\sqrt{N}$ , one has ${\hbox{\bf P}}(|\vec{S}|\leq t)\ll O(t/\sqrt{N})^{\lfloor N/4\rfloor}+CN^{4}t^{-3}\sigma$ .

The same claim holds if one of the $\zeta_{i}$ is assumed to have variance $c$ instead of $1$ for some absolute constant $c>0$ .

Main technical lemmas

Recall that in the proofs of the Four Moment Theorem and the Gap Theorem in , and , a crucial input was the Delocalization Theorem of Erdős, Schlein, and Yau (, and ). The material in this section is analogous to Section 3 of . We will first extend the concentration of ESD result to the edge of the spectrum and use this concentration theorem to show the delocalization of singular vectors. The proof of the delocalization result at the edge of the spectrum is significantly different from that in the bulk of the spectrum as in . However, similar to , the Cauchy interlacing identities for singular values in Theorem 3.5 will help us to deal with this problem.

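First observe that if $M=(\zeta_{ij})$ obeys condition C1 for some constant $C_{0}>0$ , then by Markov’s inequality and the union bound, one has $|\zeta_{ij}|\leq n^{10/C_{0}}$ for all $i,j$ with probability $1-O(n^{-8})$ . Indeed, Markov’s inequality gives ${\hbox{\bf P}}(|\zeta_{ij}|>n^{10/C_{0}})\leq{\hbox{\bf E}}|\zeta_{ij}|^{C_{0}}/n^{10}\leq Cn^{-10}$ for each entry, and there are $pn\leq n^{2}$ entries. By a truncation technique (see for details) and Lemma 3.2, one may assume that

$$|\zeta_{ij}|\leq n^{10/C_{0}}$$

almost surely for all $i,j$ .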

We will derive the eigenvalue concentration theorem (up to the edge) which is an analogue of Theorem 19 in :

Suppose that $p/n\rightarrow y$ for some $0<y\leq 1$ . Assume $a>0$ . Let $M=(\zeta_{ij})_{1\leq i\leq p,1\leq j\leq n}$ obey condition ${\bf C1}$ for some $C_{0}\geq 2$ and let the probability distribution of $\zeta_{ij}$ be continuous. Assume further $|\zeta_{ij}|\leq K$ almost surely for some $K=o({n^{1/2}\delta^{2}}{\log^{-1}n})$ for all $i,j$ , where $0<\delta<1/2$ (which can depend on $n$ ). Then for any interval $I\subset[a,b]$ of length $|I|\geq\frac{K^{2}\log^{4.5}n}{\delta^{9}n}$ , one has with overwhelming probability (uniformly in $I$ ) that the number of eigenvalues $N_{I}$ of $W$ in $I$ obeys the concentration estimate

As a consequence of Theorem 4.1, one can deduce the following delocalization theorem:

Under the hypotheses of Theorem 4.1, with overwhelming probability, all the unit left and right singular vectors of $M$ have all coefficients uniformly of size at most ${K^{2}{n^{-1/2}\log^{O(1)}n}}$ .

The continuity hypothesis in the above theorems, which guarantees the singular values are almost surely simple, is only a technical one. In practice we are able to eliminate this hypothesis by a limiting argument using Lemma 3.2.

with overwhelming probability. The delocalization of left singular vectors can be proved similarly.

The “bulk” case is treated in . Now we consider the edge case when $1\leq i\leq 0.001n$ or $0.999n\leq i\leq n$ (say). Using the Marchenko-Pastur law, we have with overwhelming probability that

By Lemma 3.3, it suffices to show with overwhelming probability that

From Lemma 3.7, we conclude that $|v_{j}(M_{p,n-1})^{*}X|\ll K\log n$ with overwhelming probability for each $j$ (and hence for all $j$ , by the union bound). Then it is enough to show that with overwhelming probability

By the Cauchy-Schwarz inequality, it thus suffices to show that

with overwhelming probability for some $1\leq T_{-}<T_{+}\ll K^{2}\log^{O(1)}n$ . Noting that $\sigma_{j}(M_{p,n-1})^{2}=\lambda_{j}(M_{p,n-1}^{*}M_{p,n-1})=\Theta(n)$ , we thus need to show

with overwhelming probability for some $1\leq T_{-}<T_{+}\ll K^{2}\log^{O(1)}n$ , which is equivalent to proving that

with overwhelming probability for some $1\leq T_{-}<T_{+}\ll K^{2}\log^{O(1)}n$ .

In the interlacing identity in Lemma 3.5, we have

By Lemma 3.7, one gets $\frac{1}{p}||X||^{2}=1+o(1)$ with overwhelming probability. And since $p/n=y+o(1)$ , one has

with overwhelming probability. In order to show (4.2), we will evaluate

for some $T_{-},T_{+}=K^{2}\log^{O(1)}n$ , where $W_{p,n-1}=\frac{1}{n-1}M_{p,n-1}^{*}M_{p,n-1}$ .

The Marchenko-Pastur law implies $\lambda_{j}(W_{p,n-1})=\Theta(1)$ for every $1\leq j\leq\min(p,n-1)$ .

Let $A>100$ be a large constant to be chosen later. From Theorem 4.1, we have that (by taking $\delta=\log^{-A/20}n$ )

with overwhelming probability for any interval $I$ of length $|I|=K^{2}\log^{A}n/n$ , where $\alpha_{I}:=\frac{1}{|I|}\int_{I}\rho_{MP,y}(x)\ dx$ . For such an interval, we see from Lemma 3.7 that with overwhelming probability

Set $d_{I}:=\frac{\text{dist}(\lambda_{i}(W_{p,n}),I)}{|I|}$ . If $d_{I}\geq\log n$ (say), then

for all $j$ in the above sum, and since $\lambda_{i}(W_{p,n})=\Theta(1)$ , we get

We now partition the real line into intervals $I$ of length $K^{2}\log^{A}n/n$ , and sum (4.7) over all intervals $I$ with $d_{I}\geq\log n$ . Bounding $\alpha_{I}$ crudely by $O(1)$ , we see that $\sum_{I}O(\frac{\alpha_{I}}{d_{I}^{2}})=O(\frac{1}{\log n})=o(1)$ . Similarly, one has

Finally, Riemann integration of the principal value integral

If $|\lambda_{i}(W_{p,n})-a|\leq o(1)$ , using the formula for the Stieltjes transform, one obtains from residue calculus that

When $0<y<1$ , $\sqrt{y}>2\sqrt{y}-1$ . (4.2) follows by comparing (4.8) and (4.9).

If $|\lambda_{i}(W_{p,n})-b|\leq o(1)$ , we have

When $0<y\leq 1$ , $-\sqrt{y}>-2\sqrt{y}-1$ . Then (4.2) follows by comparing (4.10) and (4.11).

By Theorem 4.1 and the Cauchy interlacing law, the intervals $I$ with $d_{I}<\log n$ contribute at most $K^{2}\log^{O(1)}n$ eigenvalues, and we can set $T_{-},T_{+}$ accordingly. The proof is now complete.

4.2. Proof of Theorem 4.1:

We first have a crude upper bound on the number of eigenvalues of $W$ in an interval. The proof can be found in Section 5.2, .

with overwhelming probability, where $N_{I}$ is the number of eigenvalues in the interval $I$ .

The strategy is to compare the Stieltjes transform of the ESD of matrix $W$

with the Stieltjes transform of Marchenko-Pastur Law

And thanks to the next proposition, one gets control on ESD through control on the Stieltjes transforms.

(Lemma 29, ) Let $1/10\geq\eta\geq 1/n$ , and $a,b,\varepsilon,\delta>0$ . Suppose that one has the bound

with (uniformly) overwhelming probability for all $z$ with $a\leq\text{Re}(z)\leq b$ and $\text{Im}(z)\geq\eta$ . Then for any interval $I$ in $[a-\varepsilon,b+\varepsilon]$ with $|I|\geq\text{max}(2\eta,\frac{\eta}{\delta}\log\frac{1}{\delta})$ , one has

By Proposition 4.5, our objective is to show

with (uniformly) overwhelming probability for all $z$ with $a\leq\text{Re}(z)\leq b$ and $\text{Im}(z)\geq{\eta}:=\frac{K^{2}\log^{6}n}{n\delta^{8}}.$

Since $s_{MP,y}(z)$ is the unique solution to the equation

$$s_{MP,y}(z)+\frac{1}{y+z-1+yzs_{MP,y}(z)}=0$$

in the upper half plane (see ), we investigate a similar equation for $s(z)$ .

The entries of $X_{k}$ are independent of each other and of $W_{k}$ , and have mean zero and variance $1$ . Note that $u_{j}(M_{k})$ is a unit vector. By linearity of expectation we have

is the Stieltjes transform for the ESD of $W_{k}$ . From the Cauchy interlacing law, we can get

In fact a similar estimate holds for $Y_{k}$ itself:

For $1\leq k\leq n$ , $Y_{k}=\mathbf{E}(Y_{k}|W_{k})+o(\delta^{2})$ holds with (uniformly) overwhelming probability for all $z$ with $a\leq\text{Re}(z)\leq b$ and $\text{Im}(z)\geq{\eta}$ .

Let $T\subset\{1,\ldots,n-1\}$ . Let $H$ be the space spanned by $\{u_{j}(W_{k})\}$ for $j\in T$ and $P_{H}$ be the orthogonal projection onto $H$ . Thus $\displaystyle\sum_{j\in T}R_{j}=||P_{H}(X_{k})||^{2}-{\text{dim}(H)}.$

By Lemma 3.7, we conclude with overwhelming probability

Let $z=x+\sqrt{-1}{}\eta$ , where $\eta=\frac{K^{2}\log^{6}n}{n\delta^{8}}$ and $a\leq x\leq b$ . We will use two auxiliary parameters $\alpha=\delta^{2}\log^{-1.1}n$ , $\delta^{\prime}=\delta^{2}\log^{-0.1}n$ in later estimation.

First, for those $j\in T$ such that $|\lambda_{j}(W_{k})-x|\leq\delta^{\prime}\eta$ , the function $\frac{\lambda_{j}(W_{k})}{\lambda_{j}(W_{k})-x-\sqrt{-1}{}\eta}$ has magnitude $O(\frac{1}{{}\eta})$ . From Proposition 4.4, $|T|\ll n\delta^{\prime}\eta$ , the contribution for these $j\in T$ ,

For the contribution of the remaining indices, we subdivide them as

for $0\leq l\ll\log n/\alpha$ , and then sum over $l$ .

For each such interval, the function $\frac{\lambda_{j}(W_{k})}{\lambda_{j}(W_{k})-x-\sqrt{-1}{}\eta}$ has magnitude $O(\frac{1}{(1+\alpha)^{l}\delta^{\prime}\eta})$ and fluctuates by at most $O(\frac{\alpha}{(1+\alpha)^{l}\delta^{\prime}\eta})$ . Say $T(l)$ is the set of all $j$ ’s in this interval, by Proposition 4.4, $|T(l)|\ll n\alpha(1+\alpha)^{l}\delta^{\prime}\eta$ . Together with bounds (4.14), (4.15), the contribution for these $j$ on such an interval,

Summing over $l$ (taking into account that $l\ll\log n/\alpha$ ), we will get

Recall $s_{MP,y}(z)$ has an explicit expression

$$s_{MP,y}(z)=-\frac{y+z-1-\sqrt{(y+z-1)^{2}-4yz}}{2yz},$$

where we take the branch of $\sqrt{(y+z-1)^{2}-4yz}$ with cut at $[a,b]$ that is asymptotically $y+z-1$ as $z\rightarrow\infty$ .
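The relation between the explicit expression and the self-consistent equation can be checked numerically. The sketch below assumes the explicit form $s_{MP,y}(z)=-\frac{y+z-1-\sqrt{(y+z-1)^{2}-4yz}}{2yz}$ and the equation $s+\frac{1}{y+z-1+yzs}=0$ , and realizes the branch as $\sqrt{z-a}\sqrt{z-b}$ with principal square roots, which for $\text{Im}(z)>0$ has its cut on $[a,b]$ and grows like $z$ at infinity. The function names and test points are illustrative assumptions.

```python
import cmath
from math import sqrt

def mp_stieltjes(z, y):
    """Assumed explicit formula for s_{MP,y}(z), valid for Im(z) > 0."""
    a, b = (1 - sqrt(y)) ** 2, (1 + sqrt(y)) ** 2
    # (z - a)(z - b) = (y + z - 1)^2 - 4yz; the product of principal roots
    # selects the branch with cut [a, b] that behaves like z at infinity.
    root = cmath.sqrt(z - a) * cmath.sqrt(z - b)
    return -(y + z - 1 - root) / (2 * y * z)

def self_consistent_residual(s, z, y):
    """Left-hand side of s + 1/(y + z - 1 + y z s) = 0."""
    return s + 1 / (y + z - 1 + y * z * s)

def second_root(s, z, y):
    """The other solution of y z s^2 + (y + z - 1) s + 1 = 0, by Vieta's formulas."""
    return -s - (y + z - 1) / (y * z)

y = 0.5  # illustrative value of lim p/n
for z in (1 + 0.1j, 0.2 + 0.01j, 2.5 + 1e-3j):
    s = mp_stieltjes(z, y)
    t = second_root(s, z, y)
    print(z, abs(self_consistent_residual(s, z, y)), s.imag > 0, abs(s * t - 1 / (y * z)))
```

The second root $-s_{MP,y}(z)-\frac{y+z-1}{yz}$ also solves the quadratic, which is why the analysis has to rule it out separately.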

From (4.13) and Proposition 4.6, we have with overwhelming probability that

where we used Lemma 3.7 to obtain that $\xi_{kk}=||X_{k}||^{2}/n=1+o(\delta^{2})$ with overwhelming probability.

By assumption $p/n\rightarrow y$ , when $n$ is large enough,

In (4.16), for the error term $o(\delta^{2})$ , one has either $\frac{o(\delta^{2})}{y+z-1+yzs(z)}=o(\delta^{2})$ or $y+z-1+yzs(z)=o(1)$ . In the latter case, we get $s(z)=-\frac{y+z-1}{yz}+o(1)$ . In the first case, we apply a Taylor expansion to (4.16),

Completing a perfect square for $s(z)$ in the above identity, one can solve the equation for $s(z)$ ,

If $\frac{o(\delta^{2})}{\sqrt{\frac{(y+z-1)^{2}}{4yz}-1}}=o(\delta)$ , by a Taylor expansion on the right hand side of (4.17), we have $\sqrt{yz}(s(z)+\frac{y+z-1}{2yz})=\pm\sqrt{\frac{(y+z-1)^{2}}{4yz}-1}+o(\delta)$ . Therefore, $s(z)=s_{MP,y}(z)+o(\delta)$ or $s(z)=s_{MP,y}(z)-\frac{\sqrt{(y+z-1)^{2}-4yz}}{yz}+o(\delta)=-s_{MP,y}(z)-\frac{y+z-1}{yz}+o(\delta)$ . If $\frac{(y+z-1)^{2}}{4yz}-1=o(\delta^{2})$ , from (4.17) and the explicit formula for $s_{MP,y}(z)$ , we still have $s(z)=s_{MP,y}(z)+o(\delta)$ .

To summarize the above discussion, one has, with overwhelming probability, either

We may assume the above trichotomy holds for all $z=x+\sqrt{-1}\eta$ with $a\leq x\leq b$ and $\eta_{0}\leq\eta\leq n^{10}/\delta$ where $\eta_{0}=\frac{K^{2}\log^{6}n}{n\delta^{8}}$ .

When $\eta=n^{10}/\delta$ , from $|s(z)|\leq 1/\eta$ and $|s_{MP,y}(z)|\leq 1/\eta$ , we have $s(z)$ and $s_{MP,y}(z)$ are both $o(\delta)$ and therefore (4.18) holds in this case. By continuity, we conclude that either (4.18) holds in the domain of interest or there exists some $z$ in the domain such that (4.18) and (4.19) or (4.18) and (4.20) hold together.

On the other hand, (4.18) and (4.20) cannot hold at the same time. Otherwise, $s_{MP,y}(z)+\frac{y+z-1}{yz}=o(1)$ . However, from $s_{MP,y}(z)(s_{MP,y}(z)+\frac{y+z-1}{yz})=-\frac{1}{yz}$ and $|s_{MP,y}(z)|\leq\frac{\sqrt{2}}{\sqrt{y}(1-\sqrt{y}+\sqrt{\eta_{0}})}$ , one can see that $|s_{MP,y}(z)+\frac{y+z-1}{yz}|$ is bounded from below, which implies a contradiction.

Similarly, (4.18) and (4.19) cannot both hold except when $(y+z-1)^{2}-4yz=o(\delta^{2})$ . Otherwise, we can conclude that $2s_{MP,y}(z)+\frac{y+z-1}{yz}=o(\delta)$ . From the explicit formula of $s_{MP,y}$ ,

One can conclude $|2s_{MP,y}(z)+\frac{y+z-1}{yz}|\geq C\delta$ , which contradicts our assertion. Actually, if $(y+z-1)^{2}-4yz=o(\delta^{2})$ , (4.18) and (4.19) are equivalent.

In conclusion, (4.18) holds with overwhelming probability in the domain of interest.

Gap theorem and Four Moment theorem

In this section, we complete the proofs of the main results, Theorem 1.5 and Theorem 1.7. The proofs follow closely those in (as well as in , ), so we shall focus on the changes needed to that argument. We assume substantial familiarity with the materials in , , and will cite from them repeatedly.

It is convenient to use the augmented matrix

which is a $(p+n)\times(p+n)$ Hermitian matrix with eigenvalues $\pm\sigma_{1}(M),\ldots,\pm\sigma_{p}(M)$ and $n-p$ zeros. In this way, we can import the results obtained in , and to the model discussed in this paper.
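The spectral description of the augmented matrix can be verified by hand in the smallest nontrivial case $p=1$ , $n=2$ . The worked example below assumes the standard block form with $M$ in the upper right corner and $M^{*}$ in the lower left; the block ordering and the specific matrix are assumptions made for illustration.

```latex
M=\begin{pmatrix}3 & 4\end{pmatrix},\qquad
\sigma_{1}(M)=\sqrt{3^{2}+4^{2}}=5,\qquad
\mathbf{M}=\begin{pmatrix}0 & M\\ M^{*} & 0\end{pmatrix}
=\begin{pmatrix}0 & 3 & 4\\ 3 & 0 & 0\\ 4 & 0 & 0\end{pmatrix}.

\mathbf{M}\begin{pmatrix}5\\3\\4\end{pmatrix}
=\begin{pmatrix}25\\15\\20\end{pmatrix}
=5\begin{pmatrix}5\\3\\4\end{pmatrix},\qquad
\mathbf{M}\begin{pmatrix}-5\\3\\4\end{pmatrix}
=\begin{pmatrix}25\\-15\\-20\end{pmatrix}
=-5\begin{pmatrix}-5\\3\\4\end{pmatrix},\qquad
\mathbf{M}\begin{pmatrix}0\\4\\-3\end{pmatrix}
=\begin{pmatrix}0\\0\\0\end{pmatrix}.
```

So the $3\times 3$ matrix has eigenvalues $\pm\sigma_{1}(M)=\pm 5$ together with $n-p=1$ zero, as described.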

As mentioned at the beginning of Section 4, one can assume that

almost surely for all $i,j$ . We also assume that the distributions of $M,M^{\prime}$ are continuous to ensure the singular values are almost surely simple.

Let us first state a weaker version of the Four Moment Theorem, in which we assume the gap property for the matrices considered:

Then for any $1\leq i_{1}<i_{2}<\ldots<i_{k}\leq n$ , and for $n$ sufficiently large depending on $k,c_{0}$ , we have

If $\zeta_{ij}$ and $\zeta^{\prime}_{ij}$ only match to order 3 rather than 4, then the conclusion (5.3) still holds provided that one strengthens (5.2) to

The Four Moment theorem follows directly from Theorem 1.7 and Theorem 5.1. The next two sections are devoted to the proofs of Theorem 1.7 and Theorem 5.1.

The key technical step (also used in proving Theorem 1.7) is the truncated Four Moment Theorem, which follows by applying [12, Proposition 6.1 and Proposition 6.2] (or see [10, Proposition 35]) to the augmented matrix ${\bf M}$ . The proof is omitted here.

For sufficiently small $c_{0}>0$ and sufficiently large $C_{0}>0$ ( $C_{0}=10^{4}$ will suffice) the following holds for every $k\geq 1$ . Let $M=(\zeta_{ij})_{1\leq i\leq p,1\leq j\leq n}$ and $M^{\prime}=(\zeta^{\prime}_{ij})_{1\leq i\leq p,1\leq j\leq n}$ be two random matrices satisfying condition C1 with the indicated constant $C_{0}$ , and assume that for each $i,j$ , $\zeta_{ij}$ and $\zeta^{\prime}_{ij}$ match to order 4. Assume also that $|\zeta_{ij}|,|\zeta^{\prime}_{ij}|\leq n^{10/C_{0}}$ and $p/n\rightarrow y$ for some $0<y\leq 1$ .

Then for any $1\leq i_{1}<i_{2}<\ldots<i_{k}\leq n$ , and for $n$ sufficiently large depending on $\varepsilon,k,c_{0}$ , we have

If $\zeta_{ij}$ and $\zeta^{\prime}_{ij}$ only match to order 3 rather than 4, then the conclusion (5.5) still holds provided that one strengthens (5.4) to

for any $c_{1}>0$ , provided that $c_{0}$ is sufficiently small depending on $c_{1}$ .

As in the arguments in Section 6 in , we use the quantities for $1\leq i\leq p$ ,

The gap property (up to the edge) on M ensures an upper bound on $Q_{i}({\bf M})$ . The proof repeats exactly the proof of Lemma 32 in .

If M satisfies the gap property, then for any $c_{0}>0$ (independent of n), and any $1\leq i\leq p$ , one has $Q_{i}({\bf M})\leq n^{c_{0}}$ with high probability.

where $\eta(x)$ is a smooth cutoff to the region $x\leq n^{c_{0}}$ which equals $1$ on $x\leq n^{c_{0}}/2$ . From Proposition 5.3, we have

for some $c>0$ , and a similar relation holds for $M^{\prime}$ . The proof is complete by using the above relations and Theorem 5.2.

5.2. Proof of the Gap Theorem

We first have a gap theorem under an additional exponential decay hypothesis on the entries of $M$ . The proof is presented in Section 5.3.

Let $M=(\zeta_{ij})_{1\leq i\leq p,1\leq j\leq n}$ be a random matrix obeying condition C1, and the entries $\zeta_{ij}$ satisfy exponential decay in the sense that ${\hbox{\bf P}}(|\zeta_{ij}|\geq t^{C})\leq\exp(-t)$ for all $t\geq C^{\prime}$ for all $i,j$ and some constants $C,C^{\prime}>0$ . Then $M$ obeys the gap property.

The next observation is the following matching lemma (see Lemma 33 in ), which, together with Theorem 5.2, allows us to remove the exponential decay hypothesis in Theorem 5.4.

Now consider the matrix $M=(\zeta_{ij})_{1\leq i\leq p,1\leq j\leq n}$ in Theorem 1.7. By the matching lemma, we can find a random matrix $M^{\prime}=(\zeta^{\prime}_{ij})_{1\leq i\leq p,1\leq j\leq n}$ such that $\zeta^{\prime}_{ij}$ satisfies the exponential decay hypothesis and $\zeta^{\prime}_{ij}$ matches $\zeta_{ij}$ to third order for each $i,j$ . By Theorem 5.4, the matrix $M^{\prime}$ obeys the gap property. As in Section 6, , let $\eta(x)$ be a smooth cutoff to the region $x\leq n^{c_{0}}$ . Then by Proposition 5.3, ${\hbox{\bf E}}\eta(Q_{i}(M^{\prime}))=1-O(n^{-c_{1}})$ , which, by Theorem 5.2, implies that ${\hbox{\bf E}}\eta(Q_{i}(M))=1-O(n^{-c_{2}})$ for some $c_{2}>0$ independent of $n$ . Hence, $M$ also satisfies the gap property.

5.3. Proof of Theorem 5.4:

The proof closely follows that in Section 5, . We shall mainly mention the corresponding changes. Interested readers can find the detailed proofs in . First, in order to run an induction argument, we need to treat the edge cases $i=1,p$ separately.

By symmetry, it suffices to treat the case $i=p$ . In the interlacing identity (Lemma 3.5),

From Theorem 4.2 and Lemma 3.8, one can conclude that $|u_{p-1}(M_{p-1,n})^{*}Y|^{2}\geq n^{-c/10}$ with high probability. Therefore, $|\sigma_{j}(M_{p-1,n})^{2}-\sigma_{p}(M_{p,n})^{2}|\geq n^{-c}$ with high probability. The conclusion follows from the Cauchy interlacing law. ∎

For the general case of the gap theorem, we write $i_{0},p_{0}$ instead of $i,p$ and define $N_{0}:=p_{0}+n$ . As in , we introduce the regularized gap

where $C_{1}>1$ is a large constant to be determined later. To show Theorem 5.4, it is enough to show that

By repeating the arguments in Section 3.5, , the proof relies on the following two key propositions. The idea is to propagate a narrow gap for $M_{p,n}$ backwards in $p$ until one can use Theorem 4.1 to control the occurrence of the gap.

Suppose $p_{0}/2\leq p<p_{0}$ and $l\leq p/10$ is such that

for some $0<\delta\leq 1$ (which can depend on $n$ ), and that

Let $X_{p+1}$ be the $(p+1)^{\text{th}}$ row of $M_{p_{0},n}$ , and let $u_{1}(M_{p,n}),\ldots,u_{p}(M_{p,n})$ be an orthonormal system of right singular vectors of $M_{p,n}$ associated to $\sigma_{1}(M_{p,n}),\ldots,\sigma_{p}(M_{p,n})$ . Then one of the following statements holds:

(Macroscopic spectral concentration) There exists $1\leq i_{-}<i_{+}\leq p+1$ with $i_{+}-i_{-}\geq\log^{C_{1}/2}n$ such that $|\sqrt{n}\sigma_{i_{+}}(M_{p+1,n})-\sqrt{n}\sigma_{i_{-}}(M_{p+1,n})|\leq\delta^{1/4}\exp(\log^{0.95}n)(i_{+}-i_{-}).$

(Small inner products) There exists $1\leq i_{-}\leq i_{0}-l<i_{0}\leq i_{+}\leq p$ with $i_{+}-i_{-}\leq\log^{C_{1}/2}n$ such that

(Large singular value) For some $1\leq i\leq p+1$ one has

(Large inner product) There exists $1\leq i\leq p$ such that

(Large inner product near $i_{0}$ ) There exists $1\leq i\leq p$ with $|i-i_{0}|\leq\log^{C_{1}}n$ such that

Apply Lemma 5.3 in to the augmented matrix

Note that $A_{p+n}$ is $A_{p+n+1}$ with the rightmost column and bottom row (which consist of $X_{p+1}$ and $p+1$ zeros) removed. The eigenvalues of $A_{p+n}$ are $\pm\sqrt{n}\sigma_{1}(M_{p,n}),\ldots,\pm\sqrt{n}\sigma_{p}(M_{p,n})$ and $0$ , and an orthonormal eigenbasis includes the vectors $\left(\begin{array}[]{c}u_{j}(M_{p,n})\\ v_{j}(M_{p,n})\end{array}\right)$ for $1\leq j\leq p$ . (The “Large coefficient” event in Lemma 5.3, , cannot occur as $A_{p+n+1}$ has zero diagonal.) ∎

The next proposition claims that the events (i)-(vi) occur with small probability.

Suppose that $p_{0}/2\leq p<p_{0}$ and $l\leq p/10$ and set $\delta:=n_{0}^{-\kappa}$ for some sufficiently small fixed $\kappa>0$ . Then

The events (i), (iii), (iv), (v) in Proposition 5.7 all fail with high probability.

There is a constant $C^{\prime}$ such that all the coefficients of the right singular vectors $u_{j}(M_{p,n})$ for $1\leq j\leq p$ are of magnitude at most $n^{-1/2}\log^{C^{\prime}}n$ with overwhelming probability. Conditioning $M_{p,n}$ to be a matrix with this property, the events (ii) and (vi) occur with a conditional probability of at most $2^{-\kappa m}+n^{-\kappa}$ .

Furthermore, there is a constant $C_{2}$ (depending on $C^{\prime},\kappa,C_{1}$ ) such that if $l\geq C_{2}$ and $M_{p,n}$ is conditioned as in (b), then (ii) and (vi) in fact occur with a conditional probability of at most $2^{-\kappa m}\log^{-2C_{1}}n+n^{-\kappa}$ .

The proof of the above proposition repeats the proof of Proposition 53 in with the major difference being that Theorem 4.1 and Theorem 4.2 are applied instead of Theorem 60 and Proposition 62 in .