The Isotropic Semicircle Law and Deformation of Wigner Matrices
Antti Knowles, Jun Yin
Introduction
Random matrices were introduced by Wigner Wig in the 1950s to model the excitation spectra of large atomic nuclei, and have since been the subject of intense mathematical investigation. In this paper we study Wigner matrices – random matrices whose entries are independent up to symmetry constraints – that have been deformed by a finite-rank perturbation. By Weyl’s eigenvalue interlacing inequalities, such a deformation does not influence the global statistics of the eigenvalues. Thus, the empirical eigenvalue densities of deformed and undeformed Wigner matrices have the same large-scale asymptotics, and are governed by Wigner’s famous semicircle law. However, the behaviour of individual eigenvalues may change dramatically under a deformation. In particular, deformed Wigner matrices may exhibit outliers, eigenvalues located away from the bulk spectrum. Such models were first investigated by Füredi and Komlós FKoml . Subsequently, much progress SoshPert ; FP ; CDMF1 ; CDMF2 ; CDMF3 ; BGN ; BGGM1 ; BGGM2 has been made in the analysis of the spectrum of such deformed matrix models. See e.g. SoshPert for a review of recent developments. Analogous deformations of covariance matrices, so-called spiked population models, as well as generalizations thereof, were studied in BY1 ; BY2 ; BS .
The phase transition takes place on the scale where is of order one. This may be heuristically understood as follows. The largest eigenvalues of are known to fluctuate on the scale around . The critical scale for , i.e. the scale on which the outlier is separated from by a gap of order , is therefore (since in that case ). In BBP ; Pec ; BV1 ; BV2 , the authors established the weak convergence as
where denotes the largest eigenvalue of . Moreover, the asymptotics in of the law was analysed in BBP ; Pec ; BV1 ; BV2 ; Bthesis : as , the law converges to a Gaussian; as , the law converges to the Tracy-Widom- distribution (where for GOE and for GUE). As mentioned above, the results of BBP ; Pec ; BV1 ; BV2 also apply to rank- deformations, where the picture is similar; each eigenvalue gives rise to an outlier located around , while eigenvalues do not change the statistics of the extremal eigenvalues of .
The proofs of BBP ; Pec use an asymptotic analysis of Fredholm determinants, while those of BV1 ; BV2 use an explicit tridiagonal representation of ; both of these approaches rely heavily on the Gaussian nature of . In order to study the phase transition for non-Gaussian matrix ensembles, and in particular address the question of spectral universality, a different approach is needed. Interestingly, it was observed in CDMF1 ; CDMF2 ; CDMF3 that the distribution of the outliers is not universal, and may depend on the geometry of the eigenvectors of . The non-universality of the outliers was further investigated in SoshPert .
In the present paper we take to be a real symmetric or complex Hermitian Wigner matrix, and to be a rank- deterministic matrix whose symmetry class (real symmetric or complex Hermitian) coincides with that of . We make the following assumptions on the perturbation .
The eigenvalues of may depend on ; they satisfy \bigl{\lvert}\lvert d_{i}\rvert-1\bigr{\rvert}\geqslant(\log N)^{C\log\log N}N^{-1/3}, i.e., on the scale of the phase transition, the eigenvalues of are separated from the transition points by at least a logarithmic factor.
The eigenvectors of are arbitrary orthonormal vectors.
Our main results on the spectrum of may be informally summarized as follows.
The non-outliers “stick” to eigenvalues of the undeformed matrix (Theorem 2.7). In particular, the extremal bulk eigenvalues of are universal.
We identify the distribution of the outliers of (Theorem 2.14).
A key ingredient in our proof is a generalization of the local semicircle law. The study of the local semicircle law was initiated in ESY1 ; ESY3 ; it provides a key step towards establishing universality for Wigner matrices ESY4 ; ESY6 ; EYY2 ; EYY3 ; TV1 ; TV2 . The strongest versions of the local semicircle law, proved in EYY3 ; EKYY1 ; EKYY2 , give precise estimates on the local eigenvalue density, down to scales containing eigenvalues. In fact, as formulated in EYY3 , the local semicircle law gives optimal high-probability estimates on the quantity
where denotes the Stieltjes transform of Wigner’s semicircle law and is the resolvent of . Starting from such estimates on (1.1), the two following facts are established in EYY3 .
The eigenvalue density is governed by Wigner’s semicircle law down to scales containing eigenvalues.
Eigenvalue rigidity: optimal high-probability bounds on the eigenvalue locations.
Another key ingredient in the proof of universality of random matrices is the Green function comparison method introduced in EYY2 . It uses a Lindeberg replacement strategy, which previously appeared in the context of random matrix theory in Chat ; TV1 ; TV2 . A fundamental input in the Green function comparison method is a precise control on the matrix entries of , which is provided by the local semicircle law. The Green function comparison method has subsequently been applied to proving the spectral universality of adjacency matrices of random graphs EKYY1 ; EKYY2 as well as the universality of eigenvectors of Wigner matrices KY1 .
In this paper, we extend the local semicircle law to the isotropic local semicircle law, which gives optimal high-probability estimates on the quantity
In Section 2, we introduce basic definitions and state our results. In a first part, we state the isotropic semicircle law (Theorem 2.2) and some important corollaries, such as the isotropic delocalization estimate (Theorem 2.5). The second part of Section 2 is devoted to the spectra of deformed Wigner matrices. Our main results are deviation estimates on the eigenvalue locations (Theorem 2.7) and the distribution of the outliers (Theorem 2.14). In subsequent remarks we discuss some special cases of interest, in particular making the link to the previous results of CDMF1 ; CDMF2 ; CDMF3 ; SoshPert .
The remainder of this paper is devoted to proofs. As it turns out, the proof of the isotropic local semicircle law is considerably simpler if the third moments of the matrix entries of vanish. This case is dealt with in Section 3. The proof is based on the Green function comparison method and the local semicircle law of EYY3 . In Section 4, we give the additional arguments needed to extend the isotropic local semicircle law to arbitrary matrix entries. We remark that the Green function comparison method has been traditionally EYY2 ; EKYY2 ; KY1 used to obtain limiting distributions of smooth, bounded, observables that depend on the resolvent . In this paper we use it in a novel setting: to obtain high-probability bounds on a fluctuating error.
In Section 5 we use the isotropic semicircle law to obtain an improved estimate outside of the classical spectrum $H$ onto arbitrary deterministic vectors.
Finally, Section 7 contains the proof of Theorem 2.14, the distribution of the outliers. The proof consists of four main steps.
Let be the Wigner matrix we are interested in. We introduce a cutoff (equal to in the notation of Section 7.3). We define as the Wigner matrix obtained from by replacing the -th entry of with a Gaussian whenever and . We choose large enough that most entries of are Gaussian. We shall compare with a Gaussian matrix via the intermediate matrix . In this step, (iii), we compare with .
Our proof relies on a block expansion of , which expresses the distribution of the difference
in terms of a sum of independent random variables ( in the notation of Section 7.3) whose laws may be explicitly computed.
In the final step, we use the Green function comparison method to analyse the difference
By definition of , whenever the entry of differs from that of , we have and . As a consequence, as it turns out, the Green function comparison method is applicable. Of special note in this comparison argument is a shift in the mean of the outlier (arising from the second term on the right-hand side of (7.50)), depending on the third moments of the entries of .
Acknowledgements
We are grateful to Alex Bloemendal, Paul Bourgade, László Erdős, and Horng-Tzer Yau for helpful comments.
Results
We use the abbreviation GOE/GUE to mean GOE if is a real symmetric Wigner matrix with Gaussian entries and GUE if is a complex Hermitian Wigner matrix with Gaussian entries. We assume that the entries of have uniformly subexponential decay, i.e. that there exists a constant such that
for all . Note that we do not assume the entries of to be identically distributed.
The following quantities will appear throughout this paper. We choose a fixed but arbitrary constant . We define the logarithmic control parameter
The parameter will play the role of a fixed positive constant, which simultaneously dictates the power of in large deviations estimates and characterizes the decay of probability of exceptional events, according to the following definition.
Let . We say that an -dependent event holds with -high probability if there is some constant such that
which will be used as the argument of Stieltjes transforms and resolvents. In the following we shall often use the notation and without further comment. Let
denote the density of the local semicircle law, and
its Stieltjes transform. To avoid confusion, we remark that the Stieltjes transform was denoted by in the papers ESY1 ; ESY2 ; ESY3 ; ESY4 ; ESY5 ; ESY6 ; ESY7 ; ESYY ; EYY1 ; EYY2 ; EYY3 ; EKYY1 ; EKYY2 , in which had a different meaning from (2.4). It is well known that the Stieltjes transform satisfies the identity
For we define the resolvent of through
We denote by a generic positive large constant, whose value may change from one expression to the next. If this constant depends on some parameters , we indicate this by writing . Finally, for two positive quantities and we use the notation to mean for some positive constant .
2 The isotropic local semicircle law
Fix . Then there exists a constant such that
for some large enough constant depending on .
Away from the asymptotic spectrum $$, Theorem 2.2 can be strengthened as follows.
Fix and . Then there exist constants and such that for any
Using a simple lattice argument combined with the Lipschitz continuity of , one can easily strengthen the statement (2.7) of Theorem 2.2 to a simultaneous high probability statement for all , as in (3.16) below. For more details, see e.g. Corollary 3.19 in EKYY1 .
Similarly, mimicking the proof of Lemma 7.2 below, we find
For an matrix we denote by the nondecreasing sequence of eigenvalues of . Moreover, we denote by the spectrum of . It is convenient to abbreviate the (random) eigenvalues of by
For any integers and satisfying and
with -high probability. Here is the constant from Theorem 2.2. By symmetry, a similar result holds for the eigenvectors .
If the third moments of the entries of vanish in the sense of (2.8), then we have the stronger statement
with -high probability. The second inequality implies
with -high probability. Compare this with the first inequality of (2.15).
3 Finite-rank deformation of Wigner matrices
We shall study the spectrum of the deformed matrix
We abbreviate the eigenvalues of by
In order to state our results, we order the eigenvalues of , i.e. we assume that . Define the numbers
As we shall see, is the number of outliers to the left of the bulk and the number of outliers to the right of the bulk. We shall always assume that and are independent of .
denote the indices associated with the outliers. For abbreviate the associated eigenvalue index by
Choose a sequence satisfying . Suppose that
for all . Then for we have
In CDMF1 , Capitaine, Donati-Martin, and Féral proved that almost surely for all , under the assumptions that (i) does not depend on and (ii) the law of the entries of is symmetric and satisfies a Poincaré inequality. Subsequently, the assumption (ii) was relaxed by Pizzo, Renfrew, and Soshnikov SoshPert . In fact, in SoshPert the authors proved, assuming (i), that the sequence is bounded in probability for all .
In BGGM1 ; BGGM2 , Benaych-Georges, Guionnet, and Maïda considered deformations of Wigner matrices by finite-rank random matrices whose eigenvalues are independent of and whose eigenvectors are either independent copies of a random vector with i.i.d. centred components satisfying a log-Sobolev inequality or are obtained by Gram-Schmidt orthonormalization of such independent copies. For these random perturbation models, they established eigenvalue sticking estimates similar to (2.21).
Provided one is only interested in the locations of the outliers, i.e. (2.20), one can set in Theorem 2.7.
We shall refer to the eigenvalues in (2.20), i.e. , as the outliers, and to the eigenvalues in (2.21), i.e. , as the extremal bulk eigenvalues.
The phase transition associated with happens on the scale where is of order one. The condition (2.19) is optimal (up to powers of ) in the sense that the power of in (2.19) cannot be reduced. Indeed, in BBP ; Pec ; BV1 ; BV2 it is established that, for rank-oneFor simplicity of presentation, we consider rank-one deformations, although the results of BBP ; Pec ; BV1 ; BV2 hold for rank- deformations. deformations of GOE/GUE with and of order one, fluctuates on the scale and its distribution differs from that of . Hence in that case (2.21) cannot hold for . See also Remark 2.13 below for a more detailed discussion of the qualitative behaviour of eigenvalues of as crosses a transition point.
Note that the location of the outlier associated with satisfies . In comparison, the largest eigenvalue of fluctuates on a scale around .
An immediate corollary of Theorem 2.7 is the universality of the extremal bulk eigenvalues of . In other words, under the assumption for all , the statistics of the extremal bulk eigenvalues of coincide with those of GOE/GUE.
The parameter describes how strongly the extremal bulk eigenvalues of stick to extremal eigenvalues of . If is within distance of a transition point , one does not expect the eigenvalues of to stick to the eigenvalues of . For very weak sticking on the scale , corresponding to , the eigenvalues have to satisfy \bigl{\lvert}\lvert d_{i}\rvert-1\bigr{\rvert}\geqslant\varphi^{C_{2}+1}N^{-1/3}. In particular, we may allow outliers at a distance from the spectral edge.
On the other hand, in order to obtain strong sticking on the scale , corresponding to , the eigenvalues have to satisfy \bigl{\lvert}\lvert d_{i}\rvert-1\bigr{\rvert}\geqslant\varphi^{C_{2}}N^{-\varepsilon}. Now the outliers have to lie at a distance of at least from the spectral edge.
Thus, Theorem 2.7 gives a clear picture of what happens to the extremal bulk eigenvalues as passes a transition point . For definiteness, consider the case where is varied from to for some small , and all other eigenvalues of are kept constant. Consider an extremal bulk eigenvalue near , say . By Theorem 2.7, for , sticks to where . As approaches , the eigenvalue progressively detaches itself from . Theorem 2.7 allows one to follow this behaviour down to . Below this scale, as passes , the eigenvalue “jumps” from from the vicinity of to the vicinity of . This jump happens in the range . After the jump, i.e. for , the eigenvalue sticks to instead of , provided that . If , then escapes from the bulk spectrum and becomes an outlier. This jump happens simultaneously for all extremal bulk eigenvalues near , and is accompanied by the creation of an outlier. This may be expressed as . Meanwhile, the extremal bulk eigenvalues on the other side of the spectrum, i.e. near , remain unaffected by the transition, and continue sticking to the same eigenvalues of they stuck to before the transition.
Next, we identify the distribution of the outliers. We introduce the customary symmetry index , by definition equal to if is real symmetric and if is complex Hermitian. In order to state our result, we define the moment matrices and of through
There is a constant such that the following holds. Suppose that
for all . Suppose moreover that for all we have
and , a random variable independent of with law
Then we have, for all and all bounded and continuous ,
The condition (2.24) has the following interpretation. Let and assume for definiteness that . If is not associated with an outlier on the right-hand side of the bulk, i.e. if , then is bounded from below by the right-hand side of (2.24), as follows from (2.23). Hence the condition (2.24) is only needed to ensure that the outliers are not to close too each other; in fact, this condition is optimal (up to the factor ) in guaranteeing that the distributions of the outliers have essentially no overlap. Indeed, by Theorem 2.7 we know that lies with -high probability in an interval of length centred around . Moreover, differentiating (2.18) yields
Imposing the condition leads to (2.24) (with increased if necessary so that ). In fact, in BBP ; Pec ; SoshPert it was proved (for independent of ) that the distribution associated with degenerate outliers is not Gaussian.
and is a centred Gaussian, independent of , with variance
Proof of Theorem 2.2, Case A
In this section we prove Theorem 2.2 in the case A, i.e. where the first three moments of the entries of coincide with those of GOE/GUE.
For definiteness, we consider the case where is a complex Hermitian Wigner matrix; the proof for real symmetric Wigner matrices is the same. By Markov’s inequality, in order to prove Theorem 2.2 it suffices to prove the following result.
The rest of this section is devoted to the proof of Proposition 3.1.
the distance from to the spectral edges . In the following we use the notations
without further comment. The following lemma collects some useful properties of , the Stieltjes transform of the semicircle law.
For we have
(Here the implicit constants depend on .)
The proof is an elementary calculation; see Lemma 4.2 in EYY2 . ∎
In addition to , we shall make use of a larger control parameter , defined as
From Lemma 3.2 we find, for any satisfying ,
where means for some constant .
We shall often need to consider minors of , which are the content of the following definition.
We shall also need the following resolvent identities, proved in Lemma 4.2 of EYY1 and Lemma 6.10 of EKYY2 .
It is an immediate consequence of (3.6) that
Next, we record some basic large deviations estimates.
Let be independent random variables with zero mean and unit variance. Assume that there is a constant such that
Then there exists a constant such that, for any and any deterministic complex numbers and , we have with -high probability
The estimates (3.12) – (3.14) we proved in Appendix B of EYY1 . The estimate (3.15) follows easily from (3.12) in two steps. Defining , (3.12) yields \lvert A_{i}\rvert\leqslant\varphi^{\rho\zeta}\bigl{(}{\sum_{j}\lvert B_{ij}\rvert^{2}}\bigr{)}^{1/2} with -high probability. Since the families and are independent, (3.15) follows by using (3.12) again. ∎
Finally, we quote the following results which are proved in Theorems 2.1 and 2.2 of EYY3 . (Recall that we use the notation for the quantity denoted by in EYY3 .)
Fix . Then there exists a constant such that the event
Denote by the classical locations of the eigenvalues of , defined through
Fix . Then there exists a constant such that
for all with -high probability.
After these preparations, we may prove the key tool behind the proof of Proposition 3.1. It will be used as input in the Green function comparison method, throughout Sections 3.3, 3.4, and 4. Let us sketch its importance in the Green function comparison method. Anticipating the notation from the proof of Lemma 3.9, we shall have to estimate quantities of the form
where the right-hand side is a resolvent expansion of the left-hand side. The first matrix product on the right-hand side may be written as
For any there exists a constant such that
with -high probability for some constant . By spectral decomposition one easily finds that
with -high probability provided that for some large enough . Setting
we therefore conclude, using first (3.8) and then (3.10), that
with -high probability. Thus we find for
where in the last inequality we used the rough bound . Thus (3.20) for GOE/GUE follows from (3.5) and the estimate
From now on we work on the product space generated by the Wigner matrix and the GOE/GUE matrix . We fix a bijective ordering map on the index set of the independent matrix elements,
and denote by , , the Wigner matrix whose upper-triangular entries are defined by
In particular, is a GOE/GUE matrix and .
Let denote the matrix whose matrix elements are given by . Fix and let be determined by . We shall compare with for each and then sum up the differences. Note that the matrices and differ only in the entries and , and they can be written as
here the matrix satisfies .
which are well-defined for since and are self-adjoint. Using the notation , we have the telescopic sum
Now we choose in (3.26). Applying Theorem 3.6 to the Wigner matrix , using the rough bound to estimate the rest term in (3.26), and recalling (2.1), we find
with -high probability. Here we also used (3.5). Throughout the proof we shall tacitly make use of the bound with -high probability, as follows from (3.27).
Next, setting in (3.25), recalling (2.1), and using Lemma 3.8, we find
with -high probability. Now (3.28), (3.5), and Lemma 3.8 yield
After these preparations, we may start to estimate
We choose in (3.25) and introduce the notation , whereby has factors . We write
where depends on the randomness only through and the first three moments of .
Abbreviating we therefore find
Since (3.20) holds for GOE/GUE, we have the initial estimate X_{0}\leqslant\bigl{(}{\varphi^{C_{\zeta}}\Phi}\bigr{)}^{n}. Iteration therefore yields
Next, we observe that . Since , we find . This implies
Using Lemma 3.8 and (3.29), we get the bound
with -high probability, where in the second step we used Lemma 3.10 below and , and in the third step the inequality . Here is some constant to be chosen later, and denotes a constant depending on and . For the following it will be convenient to abbreviate
Next, we observe that (3.30) and (3.5) imply
for all and large enough. Therefore choosing large enough we get from (3.40)
Therefore (3.33) follows using (3.35) if we can prove that
for all . We check that all terms on the left-hand side of (3.41) are bounded, for all , by the right-hand side of (3.41). The first term is trivial: . The second term is bounded by
The third term is bounded similarly. Finally, the last term is bounded by
where denotes a quantity bounded by the three previous terms. This completes the proof of (3.41), and hence of (3.33). ∎
What remains is to prove the following elementary result.
By convexity of the function we have, for any ,
Choosing yields the claim. ∎
We now conclude the proof of Proposition 3.1. By polarization and linearity, it is enough to prove the following result.
For the GOE/GUE matrix we get from Theorem 3.6, as in the proof of Lemma 3.9, that
In order to perform the comparison step, we write, similarly to (3.32),
where depends on the randomness only through and the first three moments of , and
Using Lemma 3.9, (3.4), and (3.5) we find that the right-hand side of (3.44) is bounded by
Therefore (3.43) and (3.44) yield (3.42), exactly as in the paragraph following (3.34).
What remains therefore is to prove (3.44). Using (3.37), (3.5), and Lemma 3.10 we get, for arbitrary ,
with -high probability. Therefore we get, similarly to (3.40),
with -high probability, where we used (3.30), , and Lemma 3.10. Choosing large enough and recalling (3.41) yields (3.44). (We omit the details of the analysis on the low-probability event, which are similar to those following (3.40).) This concludes the proof of Lemma 3.11. ∎
Proof of Theorem 2.2, Case B
In this section we prove Theorem 2.2 in the case B, i.e. we impose no condition on the third moments of the entries of , and satisfies (2.9). By Markov’s inequality, it suffices to prove the following result.
The rest of this section is devoted to the proof of Proposition 4.1. We take over the notation of Section 3, which we use throughout this section without further comment.
The following (trivial) observation will be needed in the next section: The constant may be increased at will without changing in (4.2).
The main technical estimate behind the proof of Lemma 4.2 is the following lemma. Recall the setup (3.21) of the Green function comparison, and in particular the definitions (3.23).
Fix . Then there are constants and , both depending on , such that if (2.9) holds with constant then we have the following. For any we have
Before proving Lemma 4.3, we use it to complete the proof of Lemma 4.2.
Let denote the subset
Now (4.2) follows from (4.3) and (4.5), by repeating the argument after (3.34). ∎
Before proving Lemma 4.3, we record the following lower bound on .
The claim follows immediately from . ∎
Note that the proof of (3.33) did not use the assumption (2.8). In particular, all statements in the proof of Lemma 3.9 after (3.35) remain true in the case B. By (3.33), it is enough to prove
for as well as, assuming (4.4),
for . In order to prove (4.7) and (4.8), we distinguish four cases depending on and whether . Recall from (3.35) that
Case (i): and . Similarly to (3.37), we find
with -high probability, where we used that . Therefore Lemma 3.10 yields
which is (4.8). In particular, we have also proved (4.7). Here we omit the details of the estimate on the event of low probability, which are analogous to those following (3.40).
Case (ii): and . By (4.9), we have . From (3.37) we get
with -high probability. Together with (3.4) and (3.39), this yields
with -high probability and for any . Choosing and in (2.9) large enough, we get from (2.1), (4.6), Lemma 3.10, and that
with -high probability. Now (4.8), and hence also (4.7), follows easily (we omit the details of the analysis on the low-probability event).
Case (iii): and . Consider first the case . Then (see (3.36) and (3.31)) is a finite sum of terms of the form
(The other terms can be obtained from (4.13) by permutation of indices and complex conjugation of factors.) We shall estimate the contribution of ; the other terms are dealt with in exactly the same way. Note the presence of an off-diagonal resolvent matrix element , as required by the condition . From (3.27) and (4.12) we get, with , that
with -high probability. Note the factor arising from the estimate of . Choosing and large enough, and recalling (2.9), we find using Lemma 3.10 that
with -high probability. This yields (4.8) and hence also (4.7).
Let us therefore consider the case and . (The case and is estimated in the same way.) Using the bounds and , we find
with -high probability, for and large enough. This yields (4.7) in the case and .
In order to prove the stronger bound (4.8) in the case and , we note that (3.29), (3.4), (3.5), and the assumption (4.4) yield
(Again, the other terms can be obtained from by permutation of indices and complex conjugation of factors.) We shall show that
We split in the definition of . The first resulting term is estimated, using (3.27), by
Now we observe that, using the bound (3.27), we may repeat the proof of Lemma 3.8 to the letter to find that its statement holds with replaced with . Thus we find
with -high probability, where in the second step we used (3.39) and (4.4), and in the last step (3.5). Using (3.27), (4.4), and , we therefore find
with -high probability, for any . Therefore (3.39) and (4.16) yield
with -high probability. Using (2.9), (4.6), and Lemma 3.10, we find that the right-hand side is bounded by
In order to compare the quantities in the brackets, we use (3.6), (3.27), and (4.16) to get
with -high probability. In particular, we get from (3.39) and (4.16) that
with -high probability, where in the last step we used (4.21) and . Now (4.23) follows easily for large enough in (2.9), using (2.9) and (4.6). This concludes the proof of (4.18) and hence of (4.17).
Case (iv): and . Similarly to (4.15), one easily finds the weak bound (4.7). Let us therefore assume (4.4) and prove (4.8). It suffices to prove that
where stands for any of the following expressions:
Here we used that and are independent of . (Up to an immaterial renaming of indices and complex conjugation, all terms in are covered by one of these three cases.) Applying the splittings and , we find that it suffices to prove (4.27) for being any of
Next, applying the splitting (4.19) to the last line, we find that it suffices to prove (4.27) for being any of
For in (4.28a), we find from (3.27), (4.16), and (4.22) that
with -high probability, from which (4.27) easily follows using (2.9), (4.6), (3.39), and Lemma 3.10, having chosen and in (2.9) large enough.
Using (3.12), (3.4), (3.6), and (3.27), we find
with -high probability. For the second part of resulting from the splitting of , we therefore get the estimate
with -high probability, where we used (4.26), (2.1), (3.12), (3.6), and (4.24). Together with (3.6), (3.27), (4.16), and (4.4), we therefore find
with -high probability. Recalling (4.21), (4.25), Lemma 3.10, and the usual rough estimate on the complementary low-probability event, a telescopic estimate in (4.30) therefore gives
We deal with the last term by applying (3.6) twice, followed by
itself an immediate consequence of (3.6). This gives the graded expansion
with -high probability. Thus we write
From (4.31), (4.33), (3.39), and Lemma 3.10 we therefore get
The second summand of (4.35) consists of terms of the form
Recalling (4.33), we estimate this as above by
What remains is to estimate the third summand in (4.35). From (4.33) and (4.31) we get
We now conclude the proof of Proposition 4.1. By polarization and linearity, it is enough to prove the following result.
for . Here is a large enough constant depending on .
Assuming that (4.37) and (4.39) have been proved, we get the claim (4.36) from (3.44) and Lemma 4.3 applied to ; the detains are identical to those of the proof of Lemma 4.2 and the argument following (3.34).
The proof of (4.37) and (4.39) is similar to the proof of (4.7) and (4.8). The key input is the apriori bound
with -high probability, which follows from (4.2) and Markov’s inequality. Throughout the proof, we shall consistently (and without further mention) make use of the inequality
which follows from the elementary inequality for , Lemma 3.10, and the estimate
with -high probability (as follows from (3.30)). Moreover, as in (4.16), we find that (4.38) implies
As in the proof of Lemma 4.3, we consider four cases.
Case (i): and . This is easily dealt with using (3.45); we omit further details.
Case (ii): and . Recall that in this case we have . From (4.11) we get
with -high probability. Therefore using (4.40), (3.4), and we get
with -high probability, where in the last step we used (2.9). Choosing large enough yields (4.39), and hence also (4.37).
Case (iii): and . In the case , the estimate is similar to the estimate of in (4.13). Using (4.40), (3.4), and we get
with -high probability, from which (4.39), and hence also (4.37), easily follows.
Next, consider the case and . In order to prove (4.37), we estimate using (4.40) and (3.29), similarly to (4.15),
with -high probability from which (4.37) follows. Let us therefore prove (4.39), assuming (4.38). Using (4.41) and (4.40), we find
with -high probability. We need to prove that
As for (4.18), by splitting and using (3.27), we find that it is enough to prove
As for (4.18), we use the splitting (4.19). Using (3.27), (4.40), and (4.6), we find that the bounds
Case (iv): and . In order to prove (4.37), we use (4.40) to get
with -high probability, from which (4.37) easily follows using .
As for (4.27), in order to prove (4.37) and (4.39) it suffices to prove the following claim. For being any expression in (4.28a) – (4.28c), we have
Note that from (4.20) and (4.40) we get that
If is any expression in (4.28a), we get from Lemma 3.8, (3.27), (4.40), and (4.48) that
with -high probability. Now (4.47), and in particular (4.46), follows easily (note that we did not assume (4.38)).
Next, let be an expression in (4.28b). From Lemma 3.8, (3.27), (4.40), and (4.48) we get
with -high probability. Then the argument from the proof of Lemma 4.2 can be applied almost unchanged, and we get (4.47) assuming (4.38). ∎
Proof of Theorems 2.3 and 2.5
By Lemma 3.2, if and then the control parameter on the right-hand side of (2.10) can also be expressed as
where was defined in (3.2).
By polarization and linearity, it is enough to prove that
It remains therefore to establish (5.2) when . Define
By (5.1) and (5.2) at , it is enough to prove that
which, by Lemma 3.2, implies that . Therefore we get
Next, by Theorem 3.7 we have with -high probability provided is large enough. Therefore, since with -high probability for all , we get
with -high probability, by (5.2) at and the estimate . Finally, we estimate the real part from
with -high probability, where in the last step we used that . Combining (5.6) and (5.7) completes the proof of (5.4). ∎
We begin with (2.14), whose proof is immediate. Using Theorem 2.2 with Condition A and Remark 2.4, we find
with -high probability, where we used Theorem 3.7 to ensure that with -high probability. Choosing yields (2.14).
where is the classical location of the -th eigenvalue defined in (3.17). Then we get
where in the first step we used Theorem 3.7 to conclude that for . In order to invoke Theorem 2.2 with Condition B, we have to satisfy (2.9). Recalling Lemma 3.2, we find that (2.9) holds provided that
where we abbreviated . From (3.17) we get
for , from which we deduce, recalling ,
Since , we find that (5.9), and hence (2.9), holds under the condition (2.12).
Therefore we may apply Theorem 2.2 to the right-hand side of (5.8) to get
with -high probability, where we used Lemma 3.2. The claim now follows from the elementary inequalities
For future use, we record the following consequence of Theorem 2.5 which is useful in combination with dyadic decompositions. For any integer we have
Eigenvalue locations: proof of Theorem 2.7
We begin by collecting a few well-known tools from linear algebra, on which our analysis of the deformed spectrum relies.
We use the following representation of the eigenvalues of , which was already used in several papers on finite-rank deformations of random matrices SoshPert ; BGGM1 ; BGGM2 ; BGN .
For the convenience of the reader, we give the simple proof. The claim follows from the computation
We shall also make use of the well-known Weyl’s interlacing property, summarized in the following lemma.
2 Warmup: the rank-one case
For the following we note the elementary estimate
Fix . Then there is a constant such that the following holds. For we have
with -high probability. For we have
By symmetry, an analogous result holds for .
The key identityHere we ignore the possibility that . Since the law of is absolutely continuous, it is easy to check that the interlacing inequalities in Lemma 6.2 are strict with probability one; see e.g. the proof of Lemma 6.7. for the proof is
where in the last step we used . In order to prove the second relation of (6.4), we differentiate (5.5) and use Lemma 3.2 to get
Therefore we get from (6.5) and the mean value theorem applied to that
Therefore (6.4) follows from .
Now choose large enough that for . Thus (6.3) and (6.4) yield
What remains is the case . Choose where is a large constant to be chosen later. For large enough we find from Theorem 2.3
with -high probability. From (3.3) we find
with -high probability. (The first inequality follows from Lemma 6.2.)
Next, abbreviate for some large constant to be chosen later. Using Theorem 3.7 we estimate, for and large enough ,
with -high probability. In the second inequality we estimated the contribution of the eigenvalues using the dyadic decomposition
and the delocalization estimate (5.11). A similar (in fact easier) dyadic decomposition works for the remaining eigenvalues and yields the last term of the second line. Moreover, we have
with -high probability, by Theorems 3.7 and 2.5. Recalling (6.7) and (6.8), we have therefore proved that
3 The permissible region
The rest of this section is devoted to the proof of Theorem 2.7.
We choose an event, denoted by , of -high probability on which the following statements hold.
All statements of Theorems 2.2, 2.3, 2.5, and 3.7 hold.
We note that such a exists. As explained in Section 6.1, we assume without loss of generality that the law of is absolutely continuous. Then conditions (i) and (ii) hold almost surely; we omit the standard proof. That condition (iii) holds with -high probability is a consequence of Theorems 2.2, 2.3, 2.5, and 3.7 (see also Remark 2.4).
For the whole remainder of the proof of Theorem 2.7, we choose and fix an arbitrary realization with . Thus, the randomness of only comes into play in ensuring that is of -high probability. The rest of the argument is entirely deterministic.
For define the sets
Let denote a constant to be chosen later, and define
We shall only consider eigenvalues of in for some large but fixed .
Let denote some large constant to be chosen later. Define the intervals
First we prove (6.11). By definition of (see Theorem 3.7), we find that (6.11) holds if
First we consider the case . On we have
provided is large enough (see Theorem 3.7). In particular, by (6.15) and the definition of , we have . By increasing if necessary we may assume that , where is the constant from Theorem 2.3. Therefore we get from Theorem 2.3 and Lemma 3.2 that
for all . (We include an imaginary part for later applications of (6.16); for the purposes of this proof we set .)
Let . Then we may repeat to the letter the argument in the proof of Theorem 6.3 leading to (6.4). Provided that , where is the constant in (6.16), we therefore get that
for some depending on . It is now easy to put all the estimates associated with together. Recalling (6.16) and choosing large enough yields, for denoting the constant from (6.16),
We concludeHere we use the well-known fact that if then . from (6.16) that is regular if (6.17) holds.
Our aim is to prove that is regular for any satisfying (6.19). Once this is done, the regularity of for satisfying (6.18) or (6.19) will imply (6.12). Choose and estimate
where in the second step we used (6.19). Therefore, by definition of (See also Theorem 2.2) and Lemma 3.2, we get (recall that )
This implies, for any satisfying (6.19), that
for all , we find that is regular provided is chosen large enough that
This completes the analysis of the case (6.19). The case
is handled similarly. This completes the proof. ∎
4 The initial configuration
The functions and are holomorphic on and inside (for large enough ). Moreover, by construction of , the function has precisely one zero inside , namely at . Next, we have
where the second inequality follows from (6.16). The claim now follows from Rouché’s theorem. The eigenvalues near , , are handled similarly. ∎
Before moving on, we record the following result on rank-one deformations.
Now the claim follows by approximating an arbitrary matrix by matrices in , and by using the Lipschitz continuity of the map . ∎
We now deal with the extremal bulk eigenvalues.
Similarly, we have for all satisfying that
We only prove the first statement; the proof of the second one is almost identical. Abbreviate .
Before sketching the proof of the above claim, we show how to use it to conclude the argument. By Proposition 6.6, there are at least eigenvalues in . Recall that by assumption , i.e. for all . Therefore using interlacing, i.e. a repeated application of Lemma 6.2, we conclude that there are exactly eigenvalues in . From the above claim we find that there is at least one eigenvalue in . Using interlacing we find that there are at most eigenvalues in . We conclude that there is exactly one eigenvalue in . We may move on to the -th eigenvalue: we have proved that there are (i) at least eigenvalues in (from the previous step), (ii) at least one eigenvalue in (from the claim), and (iii) at most eigenvalues in (from interlacing); we conclude that there is exactly one eigenvalue in . Continuing in this fashion concludes the proof.
Let us now complete the sketch of the proof of the above claim. Assume for simplicity that and have no common eigenvalues. From Lemma 6.1 we find that is an eigenvalue of if and only if the matrix , defined in (6.14), is singular. Thus, we have to prove that there is an such that is singular. The idea of the argument is to do a spectral decomposition of , and resum all terms not associated with to get something close to . More precisely, we write
Now we turn towards the detailed proof in the general case. Since eigenvalues of may be separated by less than , we begin by clumping together eigenvalues of which are separated by less than . More precisely, we construct a partition of , defined as the finest partition in which and belong to the same block if . Thus, each block consists of a sequence of consecutive integers. We order the blocks of in a “decreasing” fashion, in such a way that if then for all and .
We now derive a bound on the size of the blocks near the edge. Roughly, we shall show that if and then . Let be a large constant to be chosen later. Now choose and satisfying such that and belong to the same block. Then by definition of and we have
where we used the statement of Theorem 3.7 and the definition (3.17). Thus we get the condition
We conclude that if and satisfy and and belong to the same block, then
Let denote the largest integer such that . In particular, by definition of (see Theorem 3.7) we have
Now we choose large enough that
Next, define through . Therefore we get from (6.23) and (6.24) that any such that satisfies
Since blocks are contiguous, we conclude that
for each . Moreover, by definition of (see Theorem 3.7), we find
for all and all such that .
Now we are ready for the main argument. Pick and abbreviate
which will serve to count eigenvalues. (Note that and .) The interval contains precisely those eigenvalues of that are in , and its endpoints and are at a distance greater than from any eigenvalue of . Thus, is the correct generalization of the interval from the sketch given at the beginning of this proof.
In order to avoid problems with exceptional events, we add some randomness to . Recall that satisfies (6.21). Let be a Hermitian random matrix whose upper triangular entries are independent and have an absolutely continuous law supported in the unit disk. For define
From now on we use “almost surely” to mean almost surely with respect to the randomness of . Our main goal is to prove that for each , almost surely, there are at least eigenvalues of in . (Having done this, we shall deduce, by taking , that has at least eigenvalues in .)
Then (assuming ) we know that if and only if is singular. Split
Let . Similarly to the proof of (6.20), we choose and estimate
where we used that for . Moreover,
where is continuous in and independent of . Compare this to (6.22) in the sketch given at the beginning of the proof. By Theorem 2.5, for we have
Now we give the full proof. Recall that is independent of for all . Thus we get from (6.26) and (6.27) that, for large enough and small enough , all eigenvalues of are negative. (Here we used that for .) We shall vary continuously from to and count the number of eigenvalues crossing the origin. Let and denote by
the values of at which . (Recall that the eigenvalues of are distinct.) It is also convenient to write and . For , let
denote the ordered eigenvalues of . We record the following fundamental properties of .
For all , we have for large enough and small enough (depending on ).
(In particular, both one-sided limits exist.)
Property (i) was proved after (6.27). Property (ii) follows from (6.26). Property (iii) follows from Lemma 6.7, using (6.26) and the fact that is continuous.
Moreover, the two following claims are true almost surely.
If for some then for all .
From we conclude that, almost surely, is singular in at least points in the set . Therefore has almost surely at least eigenvalues in . Taking , we find that has at least eigenvalues in .
What remains is to prove that has at most eigenvalues in . We prove this using interlacing, similarly to the corresponding argument given in the sketch at the beginning of the proof. Together with Proposition 6.6, we have proved that there are at least eigenvalues of in . By interlacing (i.e. a repeated application of Lemma 6.2), we find that there are at most eigenvalues of in . We deduce, again using Proposition 6.6, that there are exactly eigenvalues of in .
We have proved that there are at least eigenvalues of in . Using eigenvalue interlacing, we find that there are at most eigenvalues of in . We conclude that there are exactly eigenvalues of in .
We may now repeat this argument for , to get that has exactly eigenvalues in , for . Moreover, by (6.25), we find for any that
5 Bootstrapping and conclusion of the proof of Theorem 2.7
It is easy to see that such a path exists. Informally, condition (ii) states that if the allowed regions for the outliers and do not over lap at time (i.e. the outliers can be distinguished), then they may not overlap at any earlier time.
We continue to work at fixed and with a fixed realization with . Let and be the constants from Proposition 6.5, and choose such that . Define
and abbreviate . By Propositions 6.6 and 6.8, we have that
In order to invoke a continuity argument, we note that Proposition 6.5 yields
for all . Moreover, since is continuous, we find that is continuous in for all .
for all , and in particular for .
where the second inequality follows from the definition of . This yields
where the constant depends only on . Thus we get
where the last inequality follows from (6.13). Repeating this estimate of for the remaining , we find
What remains is the analysis of the extremal bulk eigenvalues. Once again, we make use of a continuity argument. As before, we only consider positive eigenvalues, for some to be chosen below. Note that by interlacing, Lemma 6.2, we have
(using the convention that for ). Recall the role of from the assumptions of Theorem 2.7. Therefore using the definition of (see Theorem 3.7), we find that there is a such that if then
Let now satisfy . Using (6.30), (6.31), and Proposition 6.5, we find
for all . In addition, we know the two following facts about , for all .
satisfies the interlacing bound (6.34) for all .
Let be the set of such that and are in the same connected component of . Thus we conclude from (i) and (ii) that
completes the proof of Theorem 2.7 (recall the definition (6.10)).
Distribution of the outliers: proof of Theorem 2.14
The following proposition reduces the problem to analysing a single explicit random variable.
There is a constant , depending on , such that the following holds. Suppose that
for all . Suppose moreover that for all (2.24) holds. Recall the definitions (2.16) and (2.17). Then we have for all
Before proving Proposition 7.1, we record the following auxiliary result.
Let denote the constant from Theorem 2.3. For any
By symmetry, we may assume that . Moreover, (7.2) follows from (7.1) and polarization.
We therefore prove (7.1) for . We have
Choose and abbreviate . Thus we get, for ,
with -high probability, where in the last step we used Theorem 3.7. (In the proof of Theorem 2.3, the constant was chosen large enough for this application of Theorem 3.7; see (5.6).) A similar calculation using the definition (2.4) yields
Therefore we get, using Theorem 2.3 and Lemma 3.2,
with -high probability. Choosing yields the claim. ∎
We only prove the claim for the case ; the case is handled similarly.
For , where is the constant from Theorem 2.3, we define the Hermitian matrices and through
For the rest of the proof we fix satisfying . We abbreviate . We begin by comparing the eigenvalues of and . Define the eigenvalue index through
with -high probability for . In particular,
with -high probability. Moreover, (7.4) and the condition (2.24) yield, for ,
with -high probability, provided is chosen large enough. We therefore conclude that
with -high probability, provided is large enough.
Next, we compare the eigenvalues of and using second-order perturbation theory (the first-order correction vanishes by definition of and ). Theorem 2.3 yields
with -high probability. Therefore (7.6) and nondegenerate second-order perturbation theory yield, for large enough ,
Next, we analyse and make the link to . From Lemma 7.2 we find
with -high probability. In particular, we have for all that
with -high probability, provided that .
Recall the definition (2.17) of . From Lemma 6.1 and Theorem 3.7, we know that is characterized by the property that there is a such that
with -high probability. Provided is large enough (depending on ), it is easy to see from (7.9) that
with -high probability. Thus we find, using (7.8), (7.9), and (7.10), that for large enough we have
with -high probability. (Here we absorbed the constant into .)
We now prove that with -high probability provided is large enough. Assume by contradiction that . Then we get, using Theorem 2.3 and the condition (2.24), that
with -high probability. Moreover, (7.8), (7.9), and (7.10) yield
with -high probability, where in the last step we used (6.5). Together with (7.12), this yields the desired contradiction provided is large enough. Hence .
Putting (7.3), (7.11), and (7.7) together, we get
with -high probability. Thus we find that, for all between and , we have
with -high probability, where we used (6.5) and (7.9). Using (6.2), (7.10), and (6.5), we conclude that
with -high probability. The claim now follows for large enough , using the identity (6.2). ∎
2 The GOE/GUE case
By Proposition 7.1, it is enough to analyse the random variable
and we abbreviated . For definiteness, we choose in the following.
The following notion of convergence of random variables is convenient for our needs.
Two sequences of random variables, and , are asymptotically equal in distribution, denoted , if they are tight and satisfy
Clearly, if for all .
where in the last step we used the boundedness of . ∎
Let , , , and be sequences of random variables. Suppose that , , and are independent, and and are independent. Then
Next, we observe that and are tight. Therefore, recalling Remark 7.5, we find that it suffices to prove
. Denoting by the Fourier transform of , we find
Let be a GOE/GUE matrix. Assume that satisfies (7.14). Then for large enough we have
In order to estimate the error term in (7.16), we write
Using (3.6) to estimate , as well as Theorem 2.3, Lemma 3.5, and Lemma 3.2, we therefore find that
From (7.16), (7.17), (7.18), and (7.19), we conclude that there exist random variables and satisfying
with -high probability, the rough bound
In order to infer the distribution of from (7.22), we observe that the random variables and are independent. Also, . Recalling Theorem 2.3 and (3.6), we find the bounds
with -high probability, and the rough bounds
Next, let and be independent random variables whose laws are given by
we find that . Moreover, a standard moment calculation and the definition of yield
provided in (7.14) is large enough. Here we used that
For the induction step, we assume that (7.28) holds for all . From (7.22) we find
We estimate the summands on the left-hand side by
In order to conclude the proof of (7.28), we deduce from (7.26) that
Using the induction assumption (7.28) for , (7.29), and the condition , we get from (7.31), (7.32), and (7.27) that
for large enough . This concludes the proof of (7.28).
for any continuous bounded function . Next, we estimate
with -high probability, where in the second step we used Lemma 7.2, (5.5), and Lemma 3.2 to estimate the first term, and Theorem 2.3 and (6.1) to estimate the second term. Therefore
with -high probability, where in the second step we used (7.29). Therefore (7.33), the fact that , and dominated convergence yield
The claim now follows from Lemma 7.10 below. ∎
Let be a bounded deterministic sequence. Let be random variables such that converges weakly to . Then we have for any bounded continuous function
The claim now follows by dominated convergence. ∎
3 The almost-GOE/GUE case
We compare the original Wigner matrix with , a Wigner matrix obtained from by replacing the -th entry of with a Gaussian whenever and .
We compare the matrix to a Gaussian matrix.
The step (ii) is performed in this section. To simplify notation, we write instead of throughout this section. The step (i) is performed using Green function comparison in Section 7.4 below.
The following shorthand will prove useful.
Let be a bounded positive sequence. If and are independent random variables with , and if , then we write
As before, we consistently drop the spectral parameter from our notation.
where we used that and the fact that , , and are independent.
with -high probability. In order to prove (7.36), write
We consider four cases. First, if we find using (3.15) that
with -high probability. Second, if we find using (3.13) and (3.14) that
with -high probability. Third, for we have by (3.13)
with -high probability. Finally, for we have by (3.15)
with -high probability. This completes the proof of (7.36).
Next, abbreviate . Since is an GOE/GUE matrix, we find from (7.36), Theorem 2.3, and Lemma 3.2 that
with -high probability. Therefore Schur’s formula yields
Similarly, using (3.13) and (3.14) we find that
with -high probability, using (3.15) that
with -high probability, and using (3.13) that
with -high probability. Using Theorem 2.3 applied to (recall that and are independent), we therefore get from (7.39) that
with -high probability. We write this as
Next, we identify the asymptotic laws of . There is nothing to be done with . By definition,
The variance of the term in parentheses is
Since , we get from the Central Limit Theorem and Lemma 7.10 that
we find from the Central Limit Theorem and Lemma 7.10 that
Thus we conclude from the Central Limit Theorem and Lemma 7.10 that
Next, (7.43) – (7.47) imply that are tight (as -dependent random variables). Moreover, an easy variance calculation shows that is also tight. Therefore we get from (7.35), (7.42), (7.43) – (7.47), Lemma 7.7, and Lemma 7.8 that (recall the notation from Definition 7.11)
the Central Limit Theorem, Lemma 7.10, and Lemma 7.8 we find
4 Conclusion of the proof of Theorem 2.14
Define a new Wigner matrix through
Thus, satisfies the assumptions of Proposition 7.12. Let
be the set of matrix indices to be replaced. Similarly to (3.21), we choose a bijective map and denote by the matrix defined by
In particular, and . Let now satisfy . Similarly to (3.22), we write
Thus we have the rough bound which we shall tacitly use in the following. We use the notation (3.23), which gives rise to the quantities defined through (7.49) with replaced by respectively. We may now state the main comparison estimate.
where satisfies ,
Before proving Lemma 7.13, we show how it implies Theorem 2.14.
Applying (7.50) and (7.51) with replaced by yields
Subtracting this from (7.50) and using yields
We now iterate (7.53), starting at and . Using that and , we find after iterations of (7.53)
Moreover, using and , we find that
Using Lemma 7.2, it is now easy to remove the imaginary part of to get
Since satisfies the assumptions of Proposition 7.12, we find
using the notation of Definition 7.11. Now Theorem 2.14 follows from Proposition 7.1 and Lemma 7.7. ∎
with -high probability. This yields
with -high probability for some constant . Now choose . By definition of , we have that and . Therefore
with -high probability. This yields
Using (7.54), it is easy to check that is bounded by the right-hand side of (7.55), and that
with -high probability. In particular,
with -high probability. Moreover, using, , , (7.57) for , and the fact that is bounded by the right-hand side of (7.55), we find that
with -high probability, provided is chosen large enough. Similarly, using (7.57) we find that for for large enough . Thus we conclude from (7.56) that
depends on the randomness only through and the first two moments of . Moreover, from (7.57) and the fact that is bounded by the right-hand side of (7.55), we conclude that .
If , it is easy to see from (7.57) and the definition of that
with -high probability . Therefore it suffices to prove that
with -high probability. We only deal with the first term of ; the second one is dealt with analogously. Recalling the definition of , we conclude that, in order to establish (7.59), it suffices to prove
with -high probability; here we used that is independent of .
see (4.20). The second and third terms are estimated using (7.54) and Theorem 2.3:
with -high probability. Moreover, since , we find from Lemma (3.12), Theorem 2.3, and (3.8) that
What remains is to estimate the right-hand side of (7.64). Defining
with -high probability. Using (7.63) and using that the derivative of is bounded, we may estimate the first term of (7.64) as
with -high probability. Thus we may estimate the third term of (7.64) by