Random weighted projections, random quadratic forms and random eigenvectors

Van Vu, Ke Wang

Introduction

In , Tao and the first author showed that (under certain conditions) this length is strongly concentrated. In other words, the projection of XX onto HH lies essentially on a circle centered at the origin. This fact played a crucial role in the studies of the determinant of a random matrix with independent entries (see ). We say that ξ\xi is KK-bounded if ξK|\xi|\leq K with probability 1.

The projection lemma follows from the Talagrand’s inequality ([17, Chapter 4]). The constants 1010 and 2020 are rather arbitrary. We make no attempt to optimize the constants in this paper.

2. Weighted projection lemmas

Let us fix an orthonormal basis {u1,,ud}\{u_{1},\dots,u_{d}\} of HH. We can express ΠHX\Pi_{H}X as

In recent studies, we came up with situations when the roles of the axes are not compatible. Formally speaking, one is required to consider a weighted version of (1) where (i=1duiX2)1/2(\sum_{i=1}^{d}|u_{i}^{*}X|^{2})^{1/2} is replaced by (i=1dciuiX2)1/2(\sum_{i=1}^{d}c_{i}|u_{i}^{*}X|^{2})^{1/2} with cic_{i} being non-negative numbers (weights).

We are able to prove a variant of Lemma 1.1 for this more general problem, which turns out to be useful in a number of applications, some of which will be discussed in the paper. Furthermore, we can also weaken the assumption on random vector XX in various ways.

where M(Y)M(Y) denotes the median of a random variable YY (choose an arbitrary one as there are many).

Notice that the notion of KK-concentrated is somewhat similar to the notion of threshold in random graph theory in the sense that if XX is KK-concentrated then it is cKcK-concentrated for any constant c>0c>0. (Similarly, if p(n)p(n) is a threshold for a property P{\mathcal{P}} (say, containing a triangle) then cp(n)cp(n) is also a threshold.) One can also replace the median by the expectation (see Lemma 2.1). The dependence on KK on the RHS of (2) is flexible (one can replace K2K^{2} by any function f(K))f(K)); however, the quality of the concentration bound will depend on f(K)f(K) and we leave it as an exercise for the reader to work out this dependence.

Examples of KK-concentrated random variables

If the coordinates of XX are iid standard gaussian (real or complex), then XX is 11-concentrated (see ).

If ξi\xi_{i} are independent and ξi\xi_{i} are KK-bounded for all ii, then XX is KK-concentrated (this is a corollary of Talagrand’s inequality; see [17, Chapter 4] or [27, Theorem F.5]).

If XX satisfies the log-Sobolev inequality with parameter K2K^{2}, then it is KK-concentrated (see [17, Theorem 5.3]).

The coordinates ξi\xi_{i} of XX come from a random walk satisfying certain mixing properties (see [25, Corollary 4]; in this corollary Γ\|\Gamma\| plays the role of KK). In this and the previous example, the coordinates of XX are not necessarily indepedent.

In particular, by squaring, it follows that

Another way to weaken the KK-bounded assumption is to consider truncation. If ξ\xi is not bounded, but has light tail, then by setting KK appropriately, we can show that P(ξK){\mathbf{P}}(|\xi|\geq K) is negligible with respect to the probability bound we want to prove.

Assume that the ξi\xi_{i} are independent with mean zero and variance one. Choose a number K>1K>1 and let ε1:=max1inP(ξi>K){\varepsilon}_{1}:=\max_{1\leq i\leq n}{\mathbf{P}}(|\xi_{i}|>K). Set ξi:=ξiIξiK\xi_{i}^{\prime}:=\xi_{i}{\mathbf{I}}_{|\xi_{i}|\leq K} and let μi\mu_{i} and σi2\sigma_{i}^{2} denote its mean and variance. Set ε2:=max1inμi{\varepsilon}_{2}:=\max_{1\leq i\leq n}|\mu_{i}| and ε3:=max1inσi21{\varepsilon}_{3}:=\max_{1\leq i\leq n}|\sigma_{i}^{2}-1|. Assume all εj1/2{\varepsilon}_{j}\leq 1/2 (in practice this assumption is satisfied easily).

3. Concentration of random quadratic forms

Consider a quadratic form Y:=XAXY:=X^{*}AX where X=(ξ1,,ξn)X=(\xi_{1},\dots,\xi_{n}) is, as usual, a random vector and A=(aij)1i,jnA=(a_{ij})_{1\leq i,j\leq n} a deterministic matrix. As application of the new projection lemmas, we prove a large deviation result for YY, which can be seen as the quadratic version of the standard Chernoff bound.

Let XX and ε1,ε2,ε3{\varepsilon}_{1},{\varepsilon}_{2},{\varepsilon}_{3} be as in Lemma 1.3. There are constants C,C>0C,C^{\prime}>0 such that the following holds. Assume n2K2(ε2+ε3)=o(1)n^{2}K^{2}({\varepsilon}_{2}+{\varepsilon}_{3})=o(1), then

As an illustration, let us consider the case when ξi\xi_{i} are sub-exponential. We say that ξ\xi is sub-exponential with exponent α>0\alpha>0 (with accompanying positive constants aa and bb) if for any t>0t>0

Assume that ξi\xi_{i} are independent and sub-exponential (with exponent α>0\alpha>0) random variables with mean 0 and variance 1. Then there are constants C,C>0C,C^{\prime}>0 such that for any tt satisfying

Quadratic forms of random variables appear frequently in probability and the large deviation problem has been considered by several researchers, with first and perhaps most famous result being by Hanson-Wright inequality . In many cases, our results improve earlier results significantly; see Section 3 for more details.

4. Norm of random eigenvectors

Let MnM_{n} be a symmetric ±1\pm 1 matrix (the upper diagonal and diagonal entries are iid Bernoulli random variables taking values ±1\pm 1 with probability 1/21/2). This is an important object in both probabilistic combinatorics and the theory of random matrices. Let uu be an arbitrary unit eigenvector of MnM_{n}. We investigate the following natural question,

A good bound on the infinity norm of the eigenvectors is important in spectral analysis of graphs and many other applications, such as the studies of nodal domains (see for instance and the references therein). Recently, it plays a crucial role in breakthrough works concerning local statistics of random matrices (see Section 1.5 and also for surverys).

Set Wn=1nMnW_{n}=\frac{1}{\sqrt{n}}M_{n}. Thanks to the classical Wigner’s semi-circle law (see Section 1.5), we know that most of the eigenvalues of WnW_{n} belong to the interval (2+ϵ,2ϵ)(-2+\epsilon,2-\epsilon). Using our new concentration results, we are able to obtain (what we believe to be) the optimal bound for the eigenvectors corresponding to these eigenvalues.

Let MnM_{n} be a n×nn\times n symmetric matrix whose upper diagonal entries are iid random variables that takes values ±1\pm 1 with the same probability. Let Wn=1nMnW_{n}=\frac{1}{\sqrt{n}}M_{n}. For any constant C1>0C_{1}>0, there is a constant C2>0C_{2}>0 such that the following holds.

(Bulk case) With probability at least 1nC11-n^{-C_{1}}, for any fixed ϵ>0\epsilon>0 and any 1in1\leq i\leq n with λi(Wn)[2+ϵ,2ϵ]\lambda_{i}(W_{n})\in[-2+\epsilon,2-\epsilon], there is a unit eigenvector ui(Wn)u_{i}(W_{n}) of λi(Wn)\lambda_{i}(W_{n}) satisfying

(Edge case) With probability at least 1nC11-n^{-C_{1}}, for any ϵ>0\epsilon>0 and any 1in1\leq i\leq n with λi(Wn)[2ϵ,2+ϵ][2ϵ,2+ϵ]\lambda_{i}(W_{n})\in[-2-\epsilon,-2+\epsilon]\cup[2-\epsilon,2+\epsilon], there is a unit eigenvector ui(Wn)u_{i}(W_{n}) of λi(Wn)\lambda_{i}(W_{n}) satisfying

The best previous bound was of the form logCnn1/2\frac{\log^{C}n}{n^{1/2}} for some large (usually not explicit) constant CC . We conjecture that the bound O(logn/n)O(\sqrt{\log n/n}) is sharp (it is easy to see that this is the case if the entries are standard gaussian) and also that it holds for all eigenvectors. Very recently, Rudelson and Vershynin also studied the norm of random eigenvectors using a geometric method, which is different from our approach discussed in Section 1.5.

5. The local semi-circle law

Denote by ρsc\rho_{sc} the semi-circle density function with support on $$,

Let us recall the classical Wigner’s semi-circle law:

Let MnM_{n} be a random Hermitian matrix whose entries on and above the diagonal are iid bounded random variables with zero mean and unit variance and Wn=1nMnW_{n}=\frac{1}{\sqrt{n}}M_{n}. Then for any real number xx,

in the sense of probability, where we use I|I| to denote the cardinality of a finite set II.

The key tool for bounding the infinity norm of an eigenvector is a statement of the following type: any interval of length at least TT (which tends to zero with nn) in the spectrum $containsaneigenvalue,withhighprobability.Thequalityoftheboundwilldependonhowsmallcontains an eigenvalue, with high probability. The quality of the bound will depend on how smallTis.ThisapproachwasdevelopedbyErdo\Hs,SchleinandYauin,leadingtoeigenvectornormboundsoforderis. This approach was developed by Erdős, Schlein and Yau in , leading to eigenvector norm bounds of ordern^{-2/3},n^{-3/4}andfinallyand finallyn^{-1+o(1)}$. A simpler argument, following the same approach, was developed by Tao and the first author in (see [27, Section 4] for a problem concerning random non-hermitian matrices).

One way to attack the above problem is to show that the semi-circle law holds for small intervals (or at small scale). Intuitively, we would like to have with high probability that

for any interval II and fixed δ>0\delta>0, where NIN_{I} denotes the number of eigenvalues of Wn:=1nMnW_{n}:=\frac{1}{\sqrt{n}}M_{n} in the interval II. Of course, the reader can easily see that II cannot be arbitrarily short (since NIN_{I} is an integer). Following , we call a statement of this kind a local semi-circle law (LSCL).

A natural question arises: how short can II be ? Formally, we say that the LSCL holds at a scale f(n)f(n) if with probability 1o(1)1-o(1)

for any interval II in the bulk of length ω(f(n))\omega(f(n)) and any fixed δ>0\delta>0. Furthermore, we say that f(n)f(n) is a threshold scale if the LSCL holds at scale f(n)f(n) but does not holds at scale g(n)g(n) for any function g(n)=o(f(n))g(n)=o(f(n)). (The reader may notice a similarity between this definition and the definition of threshold functions for random graphs.) We would like to raise the following problem.

Determine the threshold scale (if exists).

We do not know a sharp estimate for the threshold for any matrix ensembles, even in the basic GUE (random matrix with complex gaussian entries) and GOE (random matrix with real gaussian entries) cases. A recent result by Ben Arous and Bourgade shows that the maximum gap between two consecutive (bulk) eigenvalues of GUE is of order Θ(logn/n)\Theta(\sqrt{\log n}/n), with high probability. Thus, if we partition the bulk into intervals of length αlogn/n\alpha\sqrt{\log n}/n for a sufficiently small α\alpha, one of these intervals contains at most one eigenvalue. Therefore, we expect that in natural ensembles, the LSCL does not hold below the logn/n\sqrt{\log n}/n scale. In , upper bound of the form logCn/n\log^{C}n/n was proved for some large value of CC. Here we are going to show

Let MnM_{n} be a Hermitian matrix whose upper diagonal entries are independent random variables with mean and variance 11. Assume furthermore that for 1in1\leq i\leq n, the vectors XiX_{i}, obtained by deleting the ii-th entry of the ii-th row vector of MM, are KK-concentrated. Then the threshold scale for LSCL is bounded from above by K2logn/nK^{2}\log n/n.

In the GUE case, the gap between the upper and lower bound is only O(logn)O(\sqrt{\log n}) and it is an intriguing problem to remove this factor. We also conjecture that Ben Arous and Bourgade’s result on the largest gap holds for ±1\pm 1 random matrices.

The results of Section 1.4 and Section 1.5 also hold for random sample covariance matrices. We sketch the results and proofs in the appendices.

Structure of the paper. In the next section, we prove the new projection lemmas. In Section 3, we prove the new concentration inequalities for quadratic forms and make a comparison with prior results. The next section, Section 4 can be seen as a preparation step in which we recall facts about random matrices. We prove the new threshold for the local semi-circle law in Section 5, and the bound on the infinity norm of eigenvectors in Section 6. The appendices contain proofs concerning random sample covariance matrices.

Acknowledgement. The authors would like to thank the anonymous referees for their careful reading and constructive suggestions.

Proofs of Lemmas 1.2 and 1.3

Next, we show that f(X)f(X) is 1-Lipschitz. Notice that f(X)j=1dujX2Xf(X)\leq\sqrt{\sum_{j=1}^{d}|u_{j}^{*}X|^{2}}\leq\|X\|. Since f(X)f(X) is convex, one has

Thus f(X)f(Y)f(XY)f(X)-f(Y)\leq f(X-Y) and f(Y)f(X)f(YX)=f(XY)f(Y)-f(X)\leq f(Y-X)=f(X-Y), which imply

Thus, by the definition of KK-concentrated property,

To conclude the proof, it suffices to show M(f(X))j=1dcj=O(K).|M(f(X))-\sqrt{\sum_{j=1}^{d}c_{j}}|=O(K). We use the following lemma. The proof of the lemma is classical and thus omitted.

Let YY be a real random variable. Assume P(Yμt)f(t){\mathbf{P}}(|Y-\mu|\geq t)\leq f(t), where 0f(x)dx=O(1)\int_{0}^{\infty}f(x)dx=O(1), then EYμ=O(1)|{\mathbf{E}}Y-\mu|=O(1). Assume furthermore that YY is non-negative, μ0\mu\geq 0, σ=EY2\sigma=\sqrt{{\mathbf{E}}Y^{2}}, and 0xf(x)dx=O(1)\int_{0}^{\infty}xf(x)dx=O(1), then EYσ=O(1).|{\mathbf{E}}Y-\sigma|=O(1).

To apply this lemma, set ci:=cimax1idcic_{i}^{\prime}:=\frac{c_{i}}{\max_{1\leq i\leq d}c_{i}}, Y:=1Ki=1dciuiX2Y:=\frac{1}{K}\sqrt{\sum_{i=1}^{d}c^{\prime}_{i}|u_{i}^{*}X|^{2}} and μ:=M(Y)\mu:=M(Y). We have, by the KK-concentration property

Set f(x)=Cexp(Cx2)f(x)=C\exp(-C^{\prime}x^{2}). The assumptions on f(x)f(x) in Lemma 2.1 are trivially satisfied. As XX is isotropic, σ2=EY2=1K2i=1dci\sigma^{2}={\mathbf{E}}Y^{2}=\frac{1}{K^{2}}\sum_{i=1}^{d}c_{i}^{\prime}. It follows from Lemma 2.1 that

As uiu_{i} are unit vectors, uiXiXinK|u_{i}^{*}X_{i}^{\prime}|\leq\|X_{i}^{\prime}\|\leq\sqrt{n}K and uiDiDi2n(Kϵ2+ϵ3)|u_{i}^{*}D_{i}|\leq\|D_{i}\|\leq 2\sqrt{n}(K\epsilon_{2}+\epsilon_{3}) (these bounds are generous and can be improved by a polynomial factor in certain cases, but in applications such improvement rarely matters). It follows, again rather generously,

In practice, εj{\varepsilon}_{j} are typically super-polynomially small, i.e. nω(1)n^{-\omega(1)}, which yields 4n2K2(ε2+ε3)=o(1)4n^{2}K^{2}({\varepsilon}_{2}+{\varepsilon}_{3})=o(1). This term can be ignored (by slightly changing the values of C,CC,C^{\prime} if necessary) and we end up with a more friendly inequality

In the sub-exponential case, for a sufficiently large KK (compared to aa and bb), εjexp(b2K1/α){\varepsilon}_{j}\leq\exp(-\frac{b}{2}K^{1/\alpha}) for j=1,2,3j=1,2,3. For K=ω(logαn)K=\omega(\log^{\alpha}n), n2K2exp(b2K1/α)=o(1)n^{2}K^{2}\exp(-\frac{b}{2}K^{1/\alpha})=o(1) and (10) yield

Random Quadratic Forms

Let us first prove Theorem 1.4. Notice that if Y=XAXY=X^{*}AX, then Y+Yˉ=X(A+A)XY+\bar{Y}=X^{*}(A+A^{*})X and YYˉ=X(AA)XY-\bar{Y}=X^{*}(A-A^{*})X. Since

Moreover, as A+AF,AAF=O(AF)\|A+A^{*}\|_{F},\|A-A^{*}\|_{F}=O(\|A\|_{F}) and A+A2,AA2=O(A2)\|A+A^{*}\|_{2},\|A-A^{*}\|_{2}=O(\|A\|_{2}), it suffices to prove the theorem in the case AA is Hermitian.

Next, we observe that any Hermitian matrix AA can be written as A:=A1A2A:=A_{1}-A_{2} where AiA_{i} are positive semi-definite and maxi=1,2Ai2A2,maxi=1,2AiFAF\max_{i=1,2}\|A_{i}\|_{2}\leq\|A\|_{2},\max_{i=1,2}\|A_{i}\|_{F}\leq\|A\|_{F}. (In fact, the positive eigenvalues of A1A_{1} are the positive eigenvalues of AA and the positive eigenvalues of A2A_{2} are the absolute values of the negative eigenvalues of AA.) This enables us to further reduce the problem to the case when AA is positive semi-definite.

Finally, as the content of the theorem is invariant under scaling, we can assume that A2=1\|A\|_{2}=1. Let 1=c1c2,,cn01=c_{1}\geq c_{2},\dots,c_{n}\geq 0 be the eigenvalues of AA together with corresponding orthonormal eigenvectors {u1,,un}\{u_{1},\ldots,u_{n}\}. We have

This is precisely the setting of the projection lemmas. By Lemma 1.2, we know that for any numbers 0dj10\leq d_{j}\leq 1, jJj\in J,

However, it is somewhat wasteful to apply this directly to (12). We will perform an extra partition step. Set

and let Jk0+1J_{k_{0}+1} be the collection of the remaining indices.

For each 0kk0+10\leq k\leq k_{0}+1, apply Lemma 1.2 to di:=4kci,ciJkd_{i}:=4^{k}c_{i},c_{i}\in J_{k}, we have, for any s0s\geq 0

Set s:=tAFs:=\frac{t}{\|A\|_{F}} and simplify by 4k4^{k}, the above inequality becomes

Apparently, k=0k0+1t24kAF22t2AF2\sum_{k=0}^{k_{0}+1}\frac{t^{2}}{4^{k}\|A\|_{F}^{2}}\leq 2\frac{t^{2}}{\|A\|_{F}^{2}}. Moreover, iJk0+1cin×n5=n4\sum_{i\in J_{k_{0}+1}}c_{i}\leq n\times n^{-5}=n^{-4} and

Putting the above estimates together and using the union bound, we obtain

We can ignore the small term n2n^{-2} (by slightly adjusting the constant 16), the desired bound follows.

If we have more information about AA, the logn\log n term can be improved. For instance, if all eigenvalues of AA are comparable, then we do not need this term.

The proof of Theorem 1.5 uses Lemma 1.3 and is left as an exercise. To prove Corollary 1.6, notice that we can obtain an analogue of (11)

under the assumption that K=ω(logαn)K=\omega(\log^{\alpha}n).

To optimize the bound, we choose KK such that K2min{t2AF2logn,tA2}=K1/αK^{-2}\min\{\frac{t^{2}}{\|A\|_{F}^{2}\log n},\frac{t}{\|A\|_{2}}\}=K^{1/\alpha}. This leads to setting K:=min{(tAFlogn)22+1/α,(tA2)12+1/α}K:=\min\{(\frac{t}{\|A\|_{F}\sqrt{\log n}})^{\frac{2}{2+1/\alpha}},(\frac{t}{\|A\|_{2}})^{\frac{1}{2+1/\alpha}}\}. Assume

This assumption guarantees K=ω(logαn)K=\omega(\log^{\alpha}n). It also implies nexp(b2K1/α)exp(b3K1/α)n\exp(-\frac{b}{2}K^{1/\alpha})\leq\exp(-\frac{b}{3}K^{1/\alpha}), proving Corollary 1.6.

2. Comparison to earlier results

In 1971, Hanson and Wright obtained the first important inequality for sub-gaussian random variables.

Later, Wright extended Theorem 3.2 to non-symmetric random variables. Recently, Hsu, Kakade and Zhang showed that one can obtain a better upper tail (notice that B2\|B\|_{2} is replaced by A2\|A\|_{2})

under a considerably weaker assumption (which, in particular, does not require the ξi\xi_{i} to be independent). On the other hand, their method does not cover the lower tail. Let us pause here to point out a strong distinction from the linear case and the quadratic case: In the linear case (Chernoff type bounds), the lower tail follows from the upper tail by simply switching ξi\xi_{i} to ξi-\xi_{i}, but this trick is useless in the quadratic case. Recently, Rudelson and Vershynin proved the Hanson-Wright inequality

In the previous papers, the random variables ξi\xi_{i} are required to be real. Few years ago, motivated by the delocalization problem for random matrices, Erdős, Schlein and Yau considered the complex case. By assuming either both the real and imaginary parts of ξi\xi_{i} are iid sub-gaussian or the distribution of ξi\xi_{i} is rotationally symmetric (real and imaginary parts still sub-gaussian), they proved

Later, Erdős, Yau and Yin showed that if ξi\xi_{i} are independent sub-exponential random variables with exponent α>0\alpha>0, having mean 0 and variance 1, then

To simplify the comparison, let us ignore the logn\log n terms in our theorems (which play little role in practice). If K=O(1)K=O(1), then the main difference between Theorem 3.2 of Hanson and Wright and Theorem 1.4 is that the term B2\|B\|_{2} in Theorem 3.2 is now replaced by A2\|A\|_{2}. It is easy to see that B2A2\|B\|_{2}\geq\|A\|_{2} for any real matrix AA. In fact, in many cases, B2\|B\|_{2} is significantly larger than A2\|A\|_{2}. For instance, a random matrix AA with entries of order 1 typically has spectral norm of order n\sqrt{n}, but in this case it is clear that B2\|B\|_{2} has spectral norm of order nn (as all row sums are of this order). The same holds for several classical explicit matrices, such as the Hadamard matrix. In these cases, our bound improves Hanson-Wright’s significantly. Furthermore, our result applies in the complex case while the approach used by Hanson and Wright is restricted to the real case.

Comparing to (19), we do not need the fairly restricted assumption that either both the real and imaginary parts of ξi\xi_{i} are iid sub-gaussian or the distribution of ξi\xi_{i} is rotationally symmetric. In the case K=O(1)K=O(1), both terms t2AF2logn\frac{t^{2}}{\|A\|_{F}^{2}\log n} and tA2\frac{t}{\|A\|_{2}} in our bound can be considerably larger than tAF\frac{t}{\|A\|_{F}}. For instance, tA2\frac{t}{\|A\|_{2}} and tAF\frac{t}{\|A\|_{F}} differ by a factor n\sqrt{n} in both the random and Hadamard cases.

In order to make a Hanson-Wright type bound non-trivial, we need to assume tAF+A2t\geq\|A\|_{F}+\|A\|_{2}. In many applications, we want the probability bound to be polynomially or even super-polynomially small, i.e. nO(1)n^{-O(1)} or nω(1)n^{-\omega(1)}. This requires a lower bound logΩ(1)n(AF+A2)\log^{\Omega(1)}n(\|A\|_{F}+\|A\|_{2}) on tt, which is consistent with the assumption (6) in Corollary 1.6.

Notice that (7) compares favorably to (20). For the term tAF\frac{t}{\|A\|_{F}}, the exponent 1α+1/2\frac{1}{\alpha+1/2} is superior to 12α+2\frac{1}{2\alpha+2} (notice that we are talking about a double exponent, so an improvement here could improve the quality of the bound quite a lot). For the term tA2\frac{t}{\|A\|_{2}}, the exponent 12α+1\frac{1}{2\alpha+1} is still better than 12α+2\frac{1}{2\alpha+2}. Furthermore, A2\|A\|_{2} can be significantly smaller than AF\|A\|_{F}, as discussed earlier.

Random matrices and the Stieltjes transform

This section serves as a preparation, in which we recall several facts about random matrices. The empirical spectral distribution (ESD) function of the n×nn\times n Hermitian matrix Wn:=1nMn=1n(ζij)1i,jnW_{n}:=\frac{1}{\sqrt{n}}M_{n}=\frac{1}{\sqrt{n}}(\zeta_{ij})_{1\leq i,j\leq n} is a one-dimensional function

where I|\mathbf{I}| denotes the cardinality of a set I\mathbf{I}. We are going to focus on the case when the entries of MnM_{n} are KK-bounded; it is easy to extend this assumption to KK-concentrated.

The Stieltjes transform of a real measure μ(x)\mu(x) is defined for any complex number zz not in the support of μ\mu as

Thus, the Stieltjes transform sn(z)s_{n}(z) of WnW_{n} is

Furthermore, the Stieltjes transform ssc(z)s_{sc}(z) of the semi-circle distribution is

where z24\sqrt{z^{2}-4} is the branch of square root with a branch cut in $andasymptoticallyequalsand asymptotically equalsz$ at infinity .

The beauty (and power) of the Stieltjes transform lies in the fact that it has a clear linear algebra content; sn(z)s_{n}(z) of WnW_{n} is exactly the trace of the matrix (WnzI)1(W_{n}-zI)^{-1}. This allows us to compute the Stieltjes transform by looking at the diagonal entries of (WnzI)1(W_{n}-zI)^{-1}. In matrix theory, Stieltjes transform plays the role Fourier transform in analysis. If the Stieltjes transforms of two spectral measures are close to each other (for all zz), then the two measures are more or less the same. In particular, if sn(z)s_{n}(z) is close to ssc(z)s_{sc}(z), then the spectral distribution of WnW_{n} is close to the semi-circle distribution (see for instance [4, Chapter 11], ). We are going to use the following lemma.

Let MnM_{n} be a random Hermitian matrix with independent KK-bounded entries with mean 0 and variance 1. Let 1/n<η<1/101/n<\eta<1/10 and L,ε,δ>0L,\varepsilon,\delta>0. For any constant C1>0C_{1}>0, there exists a constant C>0C>0 such that if one has the bound

with probability at least 1nC1-n^{-C} uniformly for all zz with Re(z)L|\text{Re}(z)|\leq L and Im(z)η\text{Im}(z)\geq\eta, then for any interval II in [L+ε,Lε][-L+\varepsilon,L-\varepsilon] with Imax(2η,ηδlog1δ)|I|\geq\text{max}(2\eta,\frac{\eta}{\delta}\log\frac{1}{\delta}), one has

with probability at least 1nC11-n^{-C_{1}}.

This is [29, Lemma 64], which, in turn, is a variant of [10, Corollary 4.3].

An appropriate application of Lemma 4.1 will imply Theorem 1.10. (As a matter of fact, we a going to prove a little bit more.) In order to use this lemma, we set L=4,ε=1L=4,\varepsilon=1, and critically

where C=C1+104C=C_{1}+10^{4}. We are going to show that

In order to show that sn(z)s_{n}(z) is close to ssc(z)s_{sc}(z), the key observation is that ssc(z)s_{sc}(z) can also be defined by the equation

This equation is stable, so if we can show sn(z)1z+sn(z)s_{n}(z)\approx-\frac{1}{z+s_{n}(z)} then it follows that sn(z)ssc(z)s_{n}(z)\approx s_{sc}(z). This observation was due to Bai et al. , who used it to prove the n1/2n^{-1/2} rate of convergence of sn(z)s_{n}(z) to ssc(z)s_{sc}(z). In , Erdős et al. refined Bai’s approach to prove local semi-circle law at scales finer than n1/2n^{-1/2}, ultimately to n1logCnn^{-1}\log^{C}n . Our main contribution here is to push the scale further down to n1lognn^{-1}\log n, which we believe is (at most) a factor logn\sqrt{\log n} from the truth.

Recall that sn(z)s_{n}(z) is the trace of (WnzI)1(W_{n}-zI)^{-1}. By computing the diagonal entires, one can show (see [4, Chapter 11], or [29, Lemma 39])

and Wn,kW_{n,k} is the matrix WnW_{n} with the kk-th row and column removed, and aka_{k} is the kk-th row of WnW_{n} with the kk-th element removed.

The entries of aka_{k} are independent of each other and of Wn,kW_{n,k}, and have mean zero and variance 1/n1/n. By linearity of expectation we have

is the Stieltjes transform of Wn,kW_{n,k}. From the Cauchy interlacing law, we can get

The heart of the matter now is the following concentration result.

Let MnM_{n} be as in Lemma 4.1. For 1kn1\leq k\leq n, Yk=E(YkWn,k)+o(δ2)Y_{k}=\mathbf{E}(Y_{k}|W_{n,k})+o(\delta^{2}) holds with probability at least 1O(nC)1-O(n^{-C}) for any zz with Re(z)4|\text{Re}(z)|\leq 4 and Im(z)η\text{Im}(z)\geq{\eta}.

To prove this lemma, we are going to make an essential use of the weighted projection lemma, as showed in the next section.

Proof of Lemma 4.2 and Threshold of the Local Law

We are going to prove Lemma 4.2 and the following more quantitative version of Theorem 1.10.

For any constants ϵ,δ,C1>0\epsilon,\delta,C_{1}>0, there is a constant C2>0C_{2}>0 such that the following holds. Let MnM_{n} be a Hermitian matrix whose upper diagonal entries are independent random variables with mean and variance 11. Assume furthermore that for 1in1\leq i\leq n, the vectors XiX_{i}, obtained by deleting the ii-th entry of the ii-th row vector of MnM_{n}, are KK-concentrated. Then with probability at least 1nC11-n^{-C_{1}}, we have

for all interval I(2+ϵ,2ϵ)I\subset(-2+\epsilon,2-\epsilon) of length at least C2K2logn/nC_{2}K^{2}\log n/n.

First, we record a lemma that provides a crude upper bound on the number of eigenvalues in short intervals.

with probability at least 1nC11-n^{-C_{1}}.

This lemma is Proposition 66 in , which is a variant of [11, Theorem 5.1]. Notice that

where Xk=nakX_{k}=\sqrt{n}a_{k} is the kk-th row of MnM_{n} with the kk-th element removed. Note that the entries of XkX_{k} are independent with mean 0 and variance 1. Therefore,

where Rj:=uj(Wn,k)Xk21R_{j}:=|u_{j}(W_{n,k})^{*}X_{k}|^{2}-1. By symmetry, we can restrict the sum to those indices jj where λj(Wn,k)x0\lambda_{j}(W_{n,k})-x\geq 0.

Let JJ be the set of indices jj such that 0λj(Wn,k)xη0\leq\lambda_{j}(W_{n,k})-x\leq\eta. Since x=Rez,η=Imzx={\operatorname{Re}}z,\eta={\operatorname{Im}}z, we have

Consider the sum S1:=jJ(λj(Wn,k)x)η(λj(Wn,k)x)2+η2RjS_{1}:=|\sum_{j\in J}\frac{(\lambda_{j}(W_{n,k})-x)\eta}{(\lambda_{j}(W_{n,k})-x)^{2}+\eta^{2}}R_{j}|. As 0(λj(Wn,k)x)η(λj(Wn,k)x)2+η210\leq\frac{(\lambda_{j}(W_{n,k})-x)\eta}{(\lambda_{j}(W_{n,k})-x)^{2}+\eta^{2}}\leq 1, we are in position to apply Lemma 1.2. Taking t=C4Klognt=C_{4}K\sqrt{\log n} with a sufficiently large constant C4C_{4}, by (3) we have

with probability at least 1Cexp(CC42logn)1nC4/21-C\exp(-C^{\prime}C_{4}^{2}\log n)\geq 1-n^{-C_{4}/2}. By Lemma 5.2, JBnη|J|\leq Bn\eta with probability at least 1nC41-n^{-C_{4}}, for some sufficiently large constant B>0B>0. Recall η:=K2C32lognnδ6\eta:=\frac{K^{2}C_{3}^{2}\log n}{n\delta^{6}}; it follows that with probability at least 12nC4/21-2n^{-C_{4}/2} we have

Thus, for C3C_{3} sufficiently large compared to C4C_{4} and BB, we have S1δ3S_{1}\leq\delta^{3}. Similarly, we can prove the same bound for S2:=1nηjJη2(λj(Wn,k)x)2+η2RjS_{2}:=\frac{1}{n\eta}|\sum_{j\in J}\frac{\eta^{2}}{(\lambda_{j}(W_{n,k})-x)^{2}+\eta^{2}}R_{j}|.

For the other eigenvalues, we divide the real line into small intervals. For integer l0l\geq 0, let JlJ_{l} be the set of eigenvalues λj(Wn,k)\lambda_{j}(W_{n,k}) such that 10lη<λj(Wn,k)x10l+1η10^{l}\eta<\lambda_{j}(W_{n,k})-x\leq 10^{l+1}\eta. The number of such JlJ_{l} is at most 20logn20\log n. By Lemma 5.2 one has, Jl9B10lnη|J_{l}|\leq 9B10^{l}n\eta with probability at least 1nC41-n^{-C_{4}}, for some sufficiently large constant B>0B>0. Again by Lemma 1.2 (taking t=KC4lognt=KC_{4}\sqrt{\log n}),

with probability at least 12Cexp(CC42logn)nC41nC4/21-2C\exp(-C^{\prime}C_{4}^{2}\log n)-n^{-C_{4}}\geq 1-n^{-C_{4}/2}.

with probability at least 1nC4/2+11-n^{-C_{4}/2+1}, for C3C_{3} sufficiently large. This completes the proof of Lemma 4.2.

with probability at least 1O(nC)1-O(n^{-C}). The term ζkk/n=o(δ2)|\zeta_{kk}/\sqrt{n}|=o(\delta^{2}) as ζkkK|\zeta_{kk}|\leq K by assumption. Comparing this equation with (22), one can use a continuity argument (see for details) to obtain sn(z)s(z)δ|s_{n}(z)-s(z)|\leq\delta with probability at least 1O(nC+100)1-O(n^{-C+100}).

By Lemma 4.1, it follows that for random matrices MnM_{n} with KK-bounded entries, for any constant C1>0C_{1}>0, there exists a constant C2>0C_{2}>0 such that for 0δ1/20\leq\delta\leq 1/2 and any interval I(3,3)I\subset(-3,3) of length at least C2K2logn/nδ8C_{2}K^{2}\log n/{n\delta^{8}},

holds with probability at least 1nC11-n^{-C_{1}}. In particular, Theorem 5.1 follows.

The infinity norm of eigenvectors

We prove Theorem 1.7 in the following more general form.

Let MnM_{n} be a Hermitian matrix whose upper diagonal entries are independent random variables with mean 0 and variance 1. Further assume that for any index 1in1\leq i\leq n, the vector XiX_{i}, obtained by deleting the ii-th entry of the ii-th row vector of MnM_{n}, is KK-concentrated. Let Wn=1nMnW_{n}=\frac{1}{\sqrt{n}}M_{n}. Then for any constant C1>0C_{1}>0, there is a constant C2>0C_{2}>0 such that the following holds.

(Bulk case) With probability at least 1nC11-n^{-C_{1}}, for any ϵ>0\epsilon>0 and any 1in1\leq i\leq n with λi(Wn)[2+ϵ,2ϵ]\lambda_{i}(W_{n})\in[-2+\epsilon,2-\epsilon] there is a unit eigenvector ui(Wn)u_{i}(W_{n}) of λi(Wn)\lambda_{i}(W_{n}) satisfying

(Edge case) With probability at least 1nC11-n^{-C_{1}}, for any ϵ>0\epsilon>0 and any 1in1\leq i\leq n with λi(Wn)[2ϵ,2+ϵ][2ϵ,2+ϵ]\lambda_{i}(W_{n})\in[-2-\epsilon,-2+\epsilon]\cup[2-\epsilon,2+\epsilon], there is a unit eigenvector ui(Wn)u_{i}(W_{n}) of λi(Wn)\lambda_{i}(W_{n}) satisfying

We give here the proof of the first part of Theorem 6.1. The proof of the second part is somewhat different and deferred to the appendix. With the threshold for local semi-circle law, we are able to derive the eigenvector delocalization results thanks to the next lemma.

where uj(Wn1)u_{j}(W_{n-1}) is a unit eigenvector corresponding to the eigenvalue λj(Wn1).\lambda_{j}(W_{n-1}).

The assumption that the eigenvalues of WnW_{n} and Wn1W_{n-1} do not collide was taken care of in [32, Section 3.1], so we can assume that the above formula makes sense in applications.

First, for the bulk case, for any λi(Wn)(2+ε,2ε)\lambda_{i}(W_{n})\in(-2+\varepsilon,2-\varepsilon), by Theorem 5.1, one can find an interval I(2+ε,2ε)I\subset(-2+\varepsilon,2-\varepsilon), centered at λi(Wn)\lambda_{i}(W_{n}) and with length I=K2Clogn/n|I|={K^{2}C\log n}/{n}, such that NIδ1nIN_{I}\geq\delta_{1}n|I| (δ1>0\delta_{1}>0 small enough) with probability at least 1nC1101-n^{-C_{1}-10}. By Cauchy interlacing law, we can find a set J{1,,n1}J\subset\{1,\ldots,n-1\} with JNI/2|J|\geq N_{I}/2 such that λj(Wn1)λi(Wn)I|\lambda_{j}(W_{n-1})-\lambda_{i}(W_{n})|\leq|I| for all jJj\in J. Let XX be the first column of MnM_{n} with the first entry removed. Then X=nYX=\sqrt{n}Y.

for some constant C2C_{2} with probability at least 1nC1101-n^{-C_{1}-10}. The third inequality follows from (3) by taking t=δ1KClognt=\delta_{1}K\sqrt{C\log n} (say).

Thus, by union bound and symmetry, ui(Wn)C2Klog1/2nn\|u_{i}(W_{n})\|_{\infty}\leq\frac{C_{2}K\log^{1/2}n}{\sqrt{n}} holds with probability at least 1nC11-n^{-C_{1}}.

Appendix A Proof for the Edge case of Theorem 6.1

For the edge case in Theorem 6.1, we use a different approach based on the next lemma.

Let Wn1W_{n-1} be the matrix WnW_{n} with the nn-th row and nn-th column removed and YY is the nn-th column of WnW_{n} with the nn-th element ζnn/n\zeta_{nn}/\sqrt{n} removed. If none of the eigenvalues of Wn1W_{n-1} equals λi(Wn)\lambda_{i}(W_{n}), then

By symmetry, it suffices to consider the case λi(Wn)[2ϵ,2+ϵ]\lambda_{i}(W_{n})\in[2-\epsilon,2+\epsilon] for ϵ>0\epsilon>0 very small. Denote XX the nn-th column of MnM_{n} with the nn-th element removed. Thus Y=nY=\sqrt{n}X. By Lemma 6.2, in order to show x2C4K4log2n/n|x|^{2}\leq C^{4}K^{4}\log^{2}n/n (for constant C>C1+100C>C_{1}+100) with probability at least 1nC1101-n^{-C_{1}-10}, it is enough to show

By the projection lemma, uj(Wn1)X10KClogn|u_{j}(W_{n-1})^{*}X|\leq 10K\sqrt{C\log n} with probability at least 110nC1-10n^{-C}. It suffices to show that with probability at least 1nC1101-n^{-C_{1}-10},

By Cauchy-Schwardz inequality, it is enough to show for some integers 1T<T+n11\leq T_{-}<T_{+}\leq n-1 that

By Lemma A.1, we are going to show for some integers T+,TT_{+},T_{-} satisfying T+T=O(logn)T_{+}-T_{-}=O(\log n) (the choice of T+,TT_{+},T_{-} will be given later) that

with probability at least 1nC1101-n^{-C_{1}-10}.

Let η=K2Clognnδ8\eta=\frac{K^{2}C\log n}{n\delta^{8}} with constant δ=ϵ/1000\delta=\epsilon/1000. Divide the real line into disjoint intervals IkI_{k} for k0k\geq 0 where I0=(λi(Wn)η,λi(Wn)+η)I_{0}=(\lambda_{i}(W_{n})-\eta,\lambda_{i}(W_{n})+\eta). For 1kk0=log0.9n1\leq k\leq k_{0}=\log^{0.9}n (say), Ik|I_{k}| has length 2ηδ8k=o(1)2\eta\delta^{-8k}=o(1) and

where we denote by βk=s=0kδ8s\beta_{k}=\sum_{s=0}^{k}\delta^{-8s}. The distance from λi(Wn)\lambda_{i}(W_{n}) to the interval IkI_{k} satisfies

For each such interval, by (26), for sufficiently large constant C>0C>0, the number of eigenvalues Jk=NIknαIkIk+δk+1nIk|J_{k}|=N_{I_{k}}\leq n\alpha_{I_{k}}|I_{k}|+\delta^{k+1}n|I_{k}| with probability at least 1nC11001-n^{-C_{1}-100}, where αIk=Ikρsc(x)dx/Ik\alpha_{I_{k}}=\int_{I_{k}}\rho_{sc}(x)dx/|I_{k}|.

For the kk-th interval, by (3) taking t=KClognt=K\sqrt{C\log n}, we have that, with probability at least 1Cexp(CClogn)1nC11001-C^{\prime\prime}\exp(-C^{\prime}C\log n)\geq 1-n^{-C_{1}-100} for sufficiently large CC,

For kk0+1k\geq k_{0}+1, let the interval IkI_{k}’s have the same length of Ik0=2δ8k0η|I_{k_{0}}|=2\delta^{-8k_{0}}\eta. Note that the number of such intervals is bounded crudely by o(n)o(n). By (26), the number of eigenvalues JknαIkIk+δk0+1nIk|J_{k}|\leq n\alpha_{I_{k}}|I_{k}|+\delta^{k_{0}+1}n|I_{k}| with probability at least 1nC11001-n^{-C_{1}-100}. And the distance from λi(Wn)\lambda_{i}(W_{n}) to the interval IkI_{k} satisfies

The contribution of such intervals can be computed similarly

with probability at least 1nC11001-n^{-C_{1}-100}.

Summing over all intervals for k10k\geq 10 (say), we obtain

On the other hand, it follows from Riemann integration of the principal value integral that

From the explicit formula for the Stieltjes transform and from residue calculus, one obtains

for λi(Wn)2|\lambda_{i}(W_{n})|\leq 2, and with the right-hand side replaced by λi(Wn)/2+λi(Wn)24/2-\lambda_{i}(W_{n})/2+\sqrt{\lambda_{i}(W_{n})^{2}-4}/2 for λi(Wn)>2|\lambda_{i}(W_{n})|>2. Finally, we always have

Now for the rest of eigenvalues that satisfy λi(Wn)λj(Wn1)I0+I1++I104η/δ80|\lambda_{i}(W_{n})-\lambda_{j}(W_{n-1})|\leq|I_{0}|+|I_{1}|+\ldots+|I_{10}|\leq 4\eta/\delta^{80}, by Theorem 5.1 and Cauchy interlacing law, the number of eigenvalues is at most T+T8nη/δ80=8CK2logn/δ88{T_{+}-T_{-}}\leq 8n\eta/\delta^{80}=8CK^{2}\log n/\delta^{88} with probability at least 1nC11001-n^{-C_{1}-100} for sufficiently large constant C>0C>0. Thus

by choosing CC sufficiently large compared to δ44\delta^{-44}. Thus, from (29), (30), (31) and (32), we have proved that there exits a constant C>0C>0 such that with probability at least 1nC1101-n^{-C_{1}-10},

The conclusion of the second part of Theorem 1.7 follows from symmetry and union bounds.

Appendix B Local Marchenko-Pastur law for random covariance matrix and delocalization of singular vectors

In this appendix, we extend the results obtained for random Hermitian matrices discussed in the previous sections to random covariance matrices, focusing on the changes needed for the proofs. Interested reader can refer to closely related papers and (see also ).

Let M=Mp,n=(ζij)1ip,1jnM=M_{p,n}=(\zeta_{ij})_{1\leq i\leq p,1\leq j\leq n} be a p×np\times n matrix, where p=p(n)p=p(n) is an integer such that pnp\leq n and limnp/n=y(0,1]\lim_{n\rightarrow\infty}p/n=y\in(0,1]. Assume the entries of Mn,pM_{n,p} are independent random variables with mean zero and variance one. For such a p×np\times n random matrix MM, we form the n×nn\times n (sample) covariance matrix W=Wp,n=1nMMW=W_{p,n}=\frac{1}{n}M^{*}M. This (non-negative definite) matrix has at most pp non-zero eigenvalues which are ordered as

A fundamental result concerning the asymptotic limiting behavior of ESD for large covariance matrices is the Marchenko–Pastur Law (see and ).

The hard edge of the limiting support of spectrum refers to the left edge aa when y=1y=1 where it gives rise to a singularity of x1/2x^{-1/2}. The cases of left edge aa when y<1y<1 and the right edge bb regardless of the value of yy are usually called the soft edge. Recent progress on studying the local convergence to Marchenko–Pastur Law includes for the soft edge and for the hard edge. We focus on improving the previous results for the soft edge in this appendix.

Our main results for the random covariance matrices are the following local Marchenko–Pastur law (LMPL) and the delocalization property of singular vectors.

For any constants ϵ,δ,C1>0\epsilon,\delta,C_{1}>0, there exists a constant C2>0C_{2}>0 such that the following holds. Assume limnp/n=y\lim_{n\rightarrow\infty}p/n=y for some 0<y10<y\leq 1. Let M=Mp,n=(ζij)1ip,1jnM=M_{p,n}=(\zeta_{ij})_{1\leq i\leq p,1\leq j\leq n} be a random matrix whose entries are independent KK-bounded random variables with mean 0 and variance 1. Consider the covariance matrix W=1nMMW=\frac{1}{n}M^{*}M. Then with probability at least 1nC11-n^{-C_{1}}, one has

for any interval I(a+ϵ,bϵ)I\subset(a+\epsilon,b-\epsilon) of length at least C2K2logn/n.C_{2}K^{2}\log n/n.

Let Mp,nM_{p,n} be as in Theorem B.2. For any constant C1>0C_{1}>0, there is a constant C2>0C_{2}>0 such that the following holds.

(Bulk case) With probability at least 1nC11-n^{-C_{1}}, for any ϵ>0\epsilon>0 and any 1ip1\leq i\leq p such that σi(Mp,n)2/n[a+ϵ,bϵ]\sigma_{i}(M_{p,n})^{2}/n\in[a+\epsilon,b-\epsilon], there is a left singular vector uiu_{i} corresponding to σi(Mp,n)\sigma_{i}(M_{p,n}) such that

The same holds for right singular vectors.

(Edge case) With probability at least 1nC11-n^{-C_{1}}, for any ϵ>0\epsilon>0 and any 1ip1\leq i\leq p such that σi(Mp,n)2/n[aϵ,a+ϵ][bϵ,b+ϵ]\sigma_{i}(M_{p,n})^{2}/n\in[a-\epsilon,a+\epsilon]\cup[b-\epsilon,b+\epsilon] if a0a\neq 0 and σi(Mp,n)2/n[4ϵ,4]\sigma_{i}(M_{p,n})^{2}/n\in[4-\epsilon,4] if a=0a=0, there is a left singular vector uiu_{i} corresponding to σi(Mn,p)\sigma_{i}(M_{n,p}) such that

The same holds for right singular vectors.

Theorem B.2 and Theorem B.3 actually hold for a larger class of matrices, using the KK-concentration introduced in the previous sections. For instance, Theorem B.2 holds for random matrices Mp,n=(ζij)M_{p,n}=(\zeta_{ij}) whose entries are independent random variables with mean 0 and variance 1, and the row vectors are KK-concentrated. And Theorem B.3 holds if we further assume the column vectors of Mp,nM_{p,n} are also KK-concentrated. Indeed, the KK-bounded assumption is only used to guarantee KK-concentration.

Similarly to the Hermitian case, we compare the Stieltjes transform of WW

The explicit expression of sMP,y(z)s_{MP,y}(z) is given by (see )

where we take the branch of (y+z1)24yz\sqrt{(y+z-1)^{2}-4yz} with cut at [a,b][a,b] that is asymptotically y+z1y+z-1 as zz tends to infinity. Note that it is uniquely defined by the equation

We will show that s(z)s(z) satisfies a similar equation.

The analogue of Lemma 4.1 is the following lemma.

(Lemma 29, ) Let Mn,pM_{n,p} be a random matrix with independent KK-bounded entries with mean 0 and variance 1. Assume limn+p/n=y(0,1]\lim_{n\rightarrow+\infty}p/n=y\in(0,1]. Let 1/n<η<1/101/n<\eta<1/10, and L1,L2,ε,δ>0L_{1},L_{2},\varepsilon,\delta>0. For any constant C1>0C_{1}>0, there exists a constant C>0C>0 such that if one has the bound

with (uniformly) probability at least 1nC1-n^{-C} for all zz with L1Re(z)L2L_{1}\leq\text{Re}(z)\leq L_{2} and Im(z)η\text{Im}(z)\geq\eta. Then for any interval II in [L1ε,L2+ε][L_{1}-\varepsilon,L_{2}+\varepsilon] with Imax(2η,ηδlog1δ)|I|\geq\text{max}(2\eta,\frac{\eta}{\delta}\log\frac{1}{\delta}), one has

with probability at least 1nC11-n^{-C_{1}}.

with probability at least 1nC1-n^{-C} for any zz in the region RyR_{y}, where

where C=C1+104C=C_{1}+10^{4}. Note that in the defined region RyR_{y}, sMP,y(z)=O(1)|s_{MP,y}(z)|=O(1).

First, by Schur’s complement, one can rewrite

The entries of XkX_{k} are independent of each other and of WkW_{k}, and have mean and variance 11. Since uj(Mk)u_{j}(M_{k}) are unit vectors, by linearity of expectation we have

is the Stieltjes transform of WkW_{k}. By Cauchy interlacing law, we have

On the other hand, YkY_{k} is concentrated about E(YkWk){\mathbf{E}}(Y_{k}|W_{k}) with high probability:

Let Mn,pM_{n,p} be as in Lemma B.5. For 1kp1\leq k\leq p, Yk=E(YkWk)+o(δ2)Y_{k}=\mathbf{E}(Y_{k}|W_{k})+o(\delta^{2}) holds with probability at least 1O(nC)1-O(n^{-C}) for any zz in the region RyR_{y}.

where Rj=Xkuj(Mk)21R_{j}=|X_{k}^{*}u_{j}(M_{k})|^{2}-1. Note that λj(Wk)=O(1)\lambda_{j}(W_{k})=O(1). The estimation of (35) is a repetition of the calculation in (25). Interested reader are encouraged to work out the details. Inserting the bounds to (34), we have

with probability at least 1O(nC)1-O(n^{-C}). By a continuity argument (see for instance ), one has s(z)sMP,y(z)=o(δ)|s(z)-s_{MP,y}(z)|=o(\delta) with probability at least 1nC+1001-n^{-C+100} (say). By Lemma B.5, we have showed that for any constants ϵ,C1>0\epsilon,C_{1}>0, there exists a constant C2>0C_{2}>0 such that for 0<δ<1/20<\delta<1/2 and any interval I(aϵ,b+ϵ)I\subset(a-\epsilon,b+\epsilon) if a0a\neq 0 or I(ϵ,4+ϵ)I\subset(\epsilon,4+\epsilon) if a=0a=0 of length at least C2K2logn/nδ8,C_{2}K^{2}\log n/n\delta^{8}, with probability at least 1nC11-n^{-C_{1}},

B.2. Proof of Theorem B.3

To prove the delocalization of singular vectors, we need the following formula to express the entries of a singular vector in terms of the singular values and singular vectors of a minor. It is enough to prove the delocalization for the right (unit) singular vectors.

First, if λi(Wp,n)\lambda_{i}(W_{p,n}) lies within the bulk of spectrum, by Theorem B.2, one can find an interval I(a+ε,bε)I\subset(a+\varepsilon,b-\varepsilon), centered at λi(Wp,n)\lambda_{i}(W_{p,n}) and with length I=K2Clogn/n|I|={K^{2}C\log n}/n such that NIδ1nIN_{I}\geq\delta_{1}n|I| (δ1>0\delta_{1}>0 small constant) with probability at least 1nC1101-n^{-C_{1}-10}. By Cauchy interlacing law, we can find a set J{1,,p}J\subset\{1,\ldots,p\} with JNI/2|J|\geq N_{I}/2 such that λj(Wp,n1)λi(Wp,n)I|\lambda_{j}(W_{p,n-1})-\lambda_{i}(W_{p,n})|\leq|I| for all jJj\in J. Thus

with probability at least 1nC1101-n^{-C_{1}-10} for some constant C2>0C_{2}>0. The fourth inequality follows from (3) by taking t=δ1KClognt=\delta_{1}K\sqrt{C}\log n.

Thus, by Lemma B.7 and the union bound, xC2Klog1/2nn|x|\leq\frac{C_{2}K\log^{1/2}n}{\sqrt{n}} with probability at least 1nC1101-n^{-C_{1}-10}. By symmetry and union bounds, ui(Mp,n)C2Klog1/2nn\|u_{i}(M_{p,n})\|_{\infty}\leq\frac{C_{2}K\log^{1/2}n}{\sqrt{n}} holds with probability at least 1nC11-n^{-C_{1}}.

For the edge case, we consider λi(Wp,n)a=o(1)|\lambda_{i}(W_{p,n})-a|=o(1) (a0)(a\neq 0) or λi(Wp,n)b=o(1)|\lambda_{i}(W_{p,n})-b|=o(1). We first record an analogue of Lemma A.1.

Assume the notations in Lemma B.7, then for every ii,

By the union bound and Lemma B.7, in order to show x2C4K4log2n/n|x|^{2}\leq C^{4}K^{4}\log^{2}n/n with probability at least 1nC1101-n^{-C_{1}-10} for some large constant C>C1+100C>C_{1}+100, it is enough to show

By the projection lemma, vj(Mp,n1)X10KClogn|v_{j}(M_{p,n-1})^{*}X|\leq 10K\sqrt{C\log n} with probability at least 110nC1-10n^{-C}. It suffices to show that with probability at least 1nC1101-n^{-C_{1}-10},

By Cauchy-Schwardz inequality and note that σi(Mp,n1)10n|\sigma_{i}(M_{p,n-1})|\leq 10\sqrt{n} almost surely (See ), it is enough to show for some integers 1T<T+min(p,n1)1\leq T_{-}<T_{+}\leq\text{min}(p,n-1) (the choice of T,T+T_{-},T_{+} will be given later),

On the other hand, by the projection lemma, with probability at least 1nC11001-n^{-C_{1}-100}, X2/n=y+o(1)\|X\|^{2}/n=y+o(1). By (37) in Lemma B.8,

The estimation of (40) is similar to that of (29). We divide the real line into disjoint intervals IkI_{k} for k0k\geq 0. Let η=K2Clognnδ8\eta=\frac{K^{2}C\log n}{n\delta^{8}} with small constant δ0.01\delta\leq 0.01. Denote βk=s=0kδ8s\beta_{k}=\sum_{s=0}^{k}\delta^{-8s}. Let I0=(λi(Wp,n)η,λi(Wp,n)+η)I_{0}=(\lambda_{i}(W_{p,n})-\eta,\lambda_{i}(W_{p,n})+\eta). For 1kk0=log0.9n1\leq k\leq k_{0}=\log^{0.9}n (say),

thus Ik=2δ8kη=o(1)|I_{k}|=2\delta^{-8k}\eta=o(1) and the distance from λi(Wp,n)\lambda_{i}(W_{p,n}) to the interval IkI_{k} satisfies

For each such interval, by (36), for sufficiently large constant C>0C>0, the number of eigenvalues Jk=NIkpαIkIk+δk+1pIk|J_{k}|=N_{I_{k}}\leq p\alpha_{I_{k}}|I_{k}|+\delta^{k+1}p|I_{k}| with probability at least 1nC11001-n^{-C_{1}-100}, where αIk=IkρMP,y(x)dx/Ik\alpha_{I_{k}}=\int_{I_{k}}\rho_{MP,y}(x)dx/|I_{k}|.

Taking t=KClognt=K\sqrt{C\log n} in (3) for CC sufficiently large, it follows that with probability at least 1Cexp(CClogn)1nC11001-C^{\prime\prime}\exp(-C^{\prime}C\log n)\geq 1-n^{-C_{1}-100},

For kk0+1k\geq k_{0}+1, let the intervals IkI_{k}’s have the same length of Ik0=2δ8k0η|I_{k_{0}}|=2\delta^{-8k_{0}}\eta. Note that the number of such intervals is bounded crudely by o(n)o(n). The distance from λi(Wp,n)\lambda_{i}(W_{p,n}) to the interval IkI_{k} satisfies

The contribution of such intervals can be estimated similarly by

with probability at least 1nC11001-n^{-C_{1}-100}.

Summing over all intervals for k10k\geq 10 (say), we have

Using Riemann integration of the principal value integral, we obtain

follows from the explicit formula for the Stieltjes transform and from residue calculus.

Now for the rest of eigenvalues such that λi(Wp,n)λj(Wp,n1)I0+I1++I104η/δ80|\lambda_{i}(W_{p,n})-\lambda_{j}(W_{p,n-1})|\leq|I_{0}|+|I_{1}|+\ldots+|I_{10}|\leq 4\eta/\delta^{80}. By Theorem B.2 and Cauchy interlacing law, the number of eigenvalues is at most T+T8nη/δ80=8CK2logn/δ88{T_{+}-T_{-}}\leq 8n\eta/\delta^{80}=8CK^{2}\log n/\delta^{88} with probability at least 1nC11001-n^{-C_{1}-100} for constant C>0C>0 sufficiently large. Thus

again by choosing CC sufficiently large. From Lemma B.7, by comparing (39), (40) and (42), one can conclude with probability at least 1nC1101-n^{-C_{1}-10}, xC2K2lognn.|x|\leq\frac{C_{2}K^{2}\log n}{\sqrt{n}}. The conclusion of Theorem B.3 follows from symmetry and union bounds.

References