Bulk Universality for Wigner Matrices
László Erdős, Sandrine Péché, José A. Ramírez, Benjamin Schlein, Horng-Tzer Yau
Introduction
The fundamental reason why random matrices have been used to model many large systems is the belief that their local eigenvalue statistics are universal. This is generally referred to as the universality of random matrices. It is well known that the local behavior of eigenvalues near the spectral edge and in the bulk is governed by the Tracy-Widom law and by the Dyson sine kernel, respectively. Since the seminal work of Dyson for the Gaussian Unitary Ensemble (GUE), universality both at the edge and in the bulk has been proven for very general classes of unitary invariant ensembles in the past two decades (see, e.g. and references therein). For non-unitary ensembles, the most natural examples are the Wigner matrix ensembles, i.e., random matrices with independent identically distributed entries. The edge universality for these ensembles was proved by Soshnikov using the moment method; the bulk universality remained unknown for lack of a method to analyze local spectral properties of large matrices inside the spectrum. For ensembles of the form
where is a Wigner matrix, is an independent standard GUE matrix and is a positive constant of order one (independent of ), the bulk universality was proved by Johansson . (Strictly speaking, the range of the parameter in depends on the energy . This restriction was later removed by Ben Arous and Péché , who also extended this approach to Wishart ensembles).
The approach of is partly based on the asymptotic analysis of an explicit formula by Brézin-Hikami for the correlation functions of the eigenvalues of . This matrix can also be generated by a stochastic flow
and the evolution of the eigenvalues is given by the Dyson Brownian motion . The result of thus states that the bulk universality holds for times of order one. The eigenvalue distribution of GUE is in fact the invariant measure of Dyson Brownian motion. (Rigorously speaking, the Brownian motion has to be replaced by an Ornstein-Uhlenbeck process, but we will neglect this subtlety.) It is thus tempting to derive the universality of via the convergence to equilibrium. We have recently carried out this approach and the key observation is that the sine kernel, as a property of local statistics, depends almost exclusively on the convergence to local equilibrium. With this method we have reduced the necessary time to , for any in . Note that the relaxation time to local equilibrium is ; the additional exponent is due to technical reasons.
From the stochastic calculus, one can see that the typical distance between the corresponding eigenvalues of and is of order . Thus the bulk universality of would hold if we could prove the Dyson sine kernel for time . On the other hand, for time smaller than , the eigenvalues do not move on the scale and the dynamical consideration seems to be pointless. In this paper, we provide an approach to address the comparison of eigenvalues between and . To describe the idea, we now introduce some notation.
Suppose the real and imaginary parts of the offdiagonal matrix elements evolve according to the Ornstein-Uhlenbeck (OU) process
with the reversible measure and initial distribution (strictly speaking, a differently normalized OU process is used for the diagonal elements but we omit this detail here). Under this process, the matrix evolves as
and the expectation and variance of the matrix entries remain constant. Notice for time small, when compared with (1.1), after a trivial rescaling.
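As a numerical illustration of the claim that the OU flow keeps the second moment of the matrix entries constant, the following minimal sketch evolves a Hermitian matrix by the closed-form solution of the matrix OU process. The explicit form `exp(-t/2) * H0 + sqrt(1 - exp(-t)) * V`, the entry normalization (real and imaginary parts of variance 1/2, scaled by `N^{-1/2}`), the two-bump entry distribution and all function names are assumptions made for this illustration; they are not taken from the displayed formulas of the text.

```python
import numpy as np

def sample_entries(rng, shape):
    """Smooth non-Gaussian real entries with mean 0 and variance 1/2
    (an equal mixture of two Gaussian bumps; an illustrative choice)."""
    signs = rng.choice([-1.0, 1.0], size=shape)
    return 0.5 * signs + 0.5 * rng.standard_normal(shape)

def hermitian_from_entries(x, y, d):
    """Assemble (x + i y) above the diagonal, d on the diagonal, scale by N^{-1/2}."""
    n = x.shape[0]
    upper = np.triu(x + 1j * y, k=1)
    return (upper + upper.conj().T + np.diag(d)) / np.sqrt(n)

def wigner_matrix(n, rng):
    """Hermitian Wigner matrix with the hypothetical entry law above."""
    return hermitian_from_entries(
        sample_entries(rng, (n, n)),
        sample_entries(rng, (n, n)),
        np.sqrt(2.0) * sample_entries(rng, n),  # assumed diagonal normalization
    )

def gue_matrix(n, rng):
    """GUE matrix in the same normalization (Gaussian entries of variance 1/2)."""
    def gauss(shape):
        return rng.standard_normal(shape) / np.sqrt(2.0)
    return hermitian_from_entries(gauss((n, n)), gauss((n, n)), np.sqrt(2.0) * gauss(n))

def ou_flow(h0, v, t):
    """Assumed closed form of the matrix OU evolution at time t."""
    return np.exp(-t / 2.0) * h0 + np.sqrt(1.0 - np.exp(-t)) * v

def offdiag_second_moment(h):
    """N times the empirical mean of |h_ij|^2 over i < j (close to 1 here)."""
    n = h.shape[0]
    iu = np.triu_indices(n, k=1)
    return n * np.mean(np.abs(h[iu]) ** 2)

rng = np.random.default_rng(0)
h0 = wigner_matrix(200, rng)
ht = ou_flow(h0, gue_matrix(200, rng), t=0.1)
print(offdiag_second_moment(h0), offdiag_second_moment(ht))
```

Both printed values should be close to 1, since the weights `exp(-t)` and `1 - exp(-t)` of the two independent contributions add up to one.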
The initial distribution of all the matrix elements is with . Let be the generator on the product space and be the dynamics of the OU process for all the matrix elements. The joint probability distribution of the matrix elements at time is then given by
Suppose that for some small, say, with , we know the local eigenvalue correlation function w.r.t. . Let
be the total variation norm between to . In order to approximate the correlation functions of by in a weak sense (tested against bounded observables), we need . Heuristically, and this requires that which is far from the time scale for which the sine kernel has been proven in . For observables on short scales, an effective speed of convergence for the total variation is needed. For example, to test a local observable with two variables in scale , as in the case of the Dyson sine kernel, one has to prove .
Although the heuristic bound can be improved to , further improvement seems to be impossible. Thus we are unable to obtain even the weaker bound for . The main observation in the current paper is that, while we cannot compare with , it suffices to prove the existence of some function for which the correlation functions with respect to can be computed for and . Since the necessary input to compute the correlation functions is the validity of the semicircle law on short scales, which we have proved for a wide class of distributions in , the choice of is essentially dominated by the condition . Note that itself may depend on . Since , we could, in principle, choose . But the diffusive dynamics cannot be reversed except for a very special class of initial data . However, we only have to approximately reverse the dynamics and the choice $G_{t}=\big[1-tL+\frac{1}{2}t^{2}L^{2}\big]^{\otimes n}F$ turns out to be sufficient. In this case, and we will show that
Furthermore, under some mild regularity condition on , is in the class for which we can establish the local semicircle law . We will call this argument the method of time reversal.
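The mechanism behind the approximate time reversal can be checked on a toy model. The sketch below replaces the OU generator by the generator of a random walk on a cycle (a symmetric finite matrix, chosen purely for illustration), applies the truncated backward series with `order` terms to a positive vector `f`, and then runs the forward semigroup. With three terms the forward evolution of the approximate preimage differs from `f` by a quantity of order `t^3`, so halving `t` should reduce the error by a factor close to 8. The function names and the choice of generator are hypothetical.

```python
import numpy as np

def cycle_generator(n):
    """Symmetric generator of a nearest-neighbour random walk on a cycle of n sites."""
    shift = np.roll(np.eye(n), 1, axis=1)
    return shift + shift.T - 2.0 * np.eye(n)

def semigroup(gen, t):
    """exp(t * gen) for a symmetric matrix gen, via its eigendecomposition."""
    w, q = np.linalg.eigh(gen)
    return (q * np.exp(t * w)) @ q.T

def approx_backward(gen, f, t, order=3):
    """Truncated series sum_{k < order} (-t gen)^k f / k!, an approximate preimage of f."""
    g = np.zeros_like(f)
    term = f.copy()
    for k in range(order):
        g = g + term
        term = (-t) * (gen @ term) / (k + 1)
    return g

def reversal_error(gen, f, t, order=3):
    """Sup-norm distance between exp(t gen) G_t and the target f."""
    g = approx_backward(gen, f, t, order)
    return np.max(np.abs(semigroup(gen, t) @ g - f))

gen = cycle_generator(8)
f = np.array([1.0, 2.0, 0.5, 1.5, 1.0, 0.7, 1.2, 0.9])
for t in (0.02, 0.01, 0.005):
    print(t, reversal_error(gen, f, t), approx_backward(gen, f, t).min())
```

The last printed column shows that the approximate preimage stays positive for small `t` in this toy model; in the setting of the paper, positivity of the corresponding density is what the regularity conditions on the initial distribution are used for. Raising `order` mimics the higher order backward approximations.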
We now summarize the assumptions on the initial distribution. Let the probability measure of the real and imaginary parts of the off-diagonal matrix elements be of the form
with some constants , and . In Section 5 we explain how to relax this latter condition to exponential decay,
with some constants (in fact, some high power law decay is sufficient). We assume that the first moment of is zero and the variance is
We assume the conditions (1.5), (1.6) and (1.8) for as well with the variance changed to 1.
Let denote the probability density of eigenvalues and for any , let
be the -point correlation function. With our choice of the variance of , the density is supported in and in the limit it converges to the Wigner semicircle law given by the density
With similar methods we can also prove that the higher order rescaled correlation functions,
converge in the weak sense to $\det\big(f(a_{i}-a_{j})\big)_{1\leq i,j\leq k}$ where ; however, this statement requires more regularity conditions on . The proof of the sine kernel for immediately implies the convergence of the higher order correlation functions with respect to the evolved measure. To conclude for the higher order correlation functions with respect to , however, one needs to improve the accuracy in (1.4). This can be achieved by approximating the backward evolution to a higher order. For example, using $G_{t}=\big[1-tL+\frac{1}{2!}(-tL)^{2}+\ldots+\frac{1}{(m-1)!}(-tL)^{m-1}\big]^{\otimes n}F$ will improve the bound (1.4) to , modulo corrections, if is -times differentiable with bounds similar to (1.5).
We now state our result concerning the eigenvalue gap distribution. For any and we define the density of eigenvalue pairs with distance less than in the vicinity of by
where for some .
Suppose the probability measure of the matrix elements satisfies conditions (1.5), (1.6) and (1.8). Let be the operator acting on with kernel . Then for any with and for any we have
where denotes the Fredholm determinant of the compact operator .
As a corollary of Theorem 1.2, one can easily show that the probability to find no eigenvalue in the interval , after averaging in an interval of size around , is given by , the same as in the case of GUE (see, e.g., ). Note that assuming more regularity on the exponent of the density , we can get a better bound on the convergence rate (by approximating the backwards evolution to a higher order) and therefore avoid the averaging over .
The proofs of Theorems 1.1 and 1.2 consist of two main parts. In Section 2 we prove the approximation (1.4) under precise conditions on the initial distribution . In Section 3 we prove the sine kernel for the distribution with for any , which is the optimal time scale for such a result. Our approach is to recast the formula for the correlation function in , which becomes unstable for , into a more symmetric form (Proposition 3.2) so that it is stable for all times up to . The saddle point analysis can then be achieved with the local semicircle law from . Finally, we complete the proofs of the main theorems in Section 4.
The method of time reversal described previously is very general and should be applicable to a wide range of models. More significantly, it explains the origin of the universality, i.e., the universality comes from the “time reversal”. To summarize, the universality consists of the following observations: (1) The local statistics are determined by the local equilibrium measures. (2) The relaxation to local equilibria takes place in a short time. (3) The original distribution can be well-approximated by the distribution of the Dyson Brownian motion for a short time with initial data given by an approximate inverse flow. To implement this scheme, a key input is to estimate the fluctuations of the empirical density of eigenvalues in short scales.
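The key input mentioned above, control of the empirical eigenvalue density on short scales, can be probed numerically. The sketch below samples one Hermitian Wigner matrix and compares the number of eigenvalues in a window around an energy `E` with the prediction of the semicircle law. We assume the standard normalization in which the semicircle density is `sqrt(4 - x^2) / (2 pi)` on `[-2, 2]`; this normalization, the two-bump entry distribution and the function names are assumptions of the illustration, and a single sample is of course no substitute for the probabilistic estimates used in the proofs.

```python
import numpy as np

def semicircle_cdf(x):
    """Distribution function of the density sqrt(4 - x^2) / (2 pi) on [-2, 2]."""
    x = np.clip(x, -2.0, 2.0)
    return 0.5 + x * np.sqrt(4.0 - x * x) / (4.0 * np.pi) + np.arcsin(x / 2.0) / np.pi

def sample_wigner(n, rng):
    """Hermitian matrix whose off-diagonal entries have E|h_ij|^2 = 1/n (hypothetical entry law)."""
    def entries(shape):
        # mean 0, variance 1/2: equal mixture of two Gaussian bumps
        return 0.5 * rng.choice([-1.0, 1.0], size=shape) + 0.5 * rng.standard_normal(shape)
    upper = np.triu(entries((n, n)) + 1j * entries((n, n)), k=1)
    diag = np.diag(np.sqrt(2.0) * entries(n))
    return (upper + upper.conj().T + diag) / np.sqrt(n)

def window_counts(eigs, center, half_width):
    """Observed and semicircle-predicted number of eigenvalues in [center - hw, center + hw]."""
    lo, hi = center - half_width, center + half_width
    observed = int(np.sum((eigs >= lo) & (eigs <= hi)))
    predicted = len(eigs) * (semicircle_cdf(hi) - semicircle_cdf(lo))
    return observed, predicted

rng = np.random.default_rng(0)
eigs = np.linalg.eigvalsh(sample_wigner(400, rng))
for half_width in (0.5, 0.25, 0.1):
    print(half_width, window_counts(eigs, 0.0, half_width))
```

Shrinking `half_width` towards the scale of the mean eigenvalue spacing probes the regime in which the local semicircle law is needed.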
Shortly after this manuscript appeared on the arXiv, we learned that our main result was also obtained by Tao and Vu in under essentially no regularity conditions on the initial distribution provided the third moment of vanishes. Some partial results for the Gaussian orthogonal ensembles are also obtained and we refer the reader to the preprint for more details.
Conventions. We will use the letters and to denote general constants whose precise values are irrelevant and they may change from line to line. These constants may depend on the constants in (1.5)–(1.8).
Method of Time Reversal
Recall the Ornstein-Uhlenbeck process from (1.3) with the reversible measure . Let be a positive density with respect to , i.e. and we write .
Let satisfy the conditions (1.5), (1.6) with some and (1.8). Let be sufficiently small and . Define a cutoff initial density as
where is a smooth cutoff function satisfying for and for and and are chosen such that is a probability density with zero expectation. Denote , and with .
with some depending on and .
(ii) is a probability measure with respect to and for we have
where depends on and on the constants in (1.5), (1.6).
In the formulation of this proposition we have not taken into account that in our application the diagonal elements of the matrix evolve under a differently normalized OU process with generator with invariant measure . This modification is only notational and does not affect the validity of the estimates (2.1) and (2.2).
Proof. From condition (1.6) the estimate (2.1) follows directly by noting that the constants and are subexponentially small in . For the proof of (2.2), we first control the evolution of each matrix element under the OU process (1.3). We assume that for the initial density
hold with some constants positive and . Set for some and note that is a probability density with respect to if
Note that by the monotonicity preserving property of the Ornstein-Uhlenbeck kernel and by (2.3), we have
Here we used the fact that under the first condition in (2.3), which follows from integrating the inequality
where we used (2.6), (2.5) and finally (2.4).
Now we consider the evolution of the product density , note that . Applying the same procedure to each variable, we have
as long as is bounded. In our application , thus (2.9) will imply (2.2) provided that
which will also guarantee (2.7). It is straightforward to check that the density satisfies (2.3) with constants subject to (2.4) and (2.10). This completes the proof.
Sine kernel for the time evolved measure
We use the contour integral representation for the correlation functions of the eigenvalues of a matrix of the form , where is a GUE matrix . We will apply this result for the matrix
where, apart from a trivial prefactor , plays the role of and . In order to be able to use the formula given in Proposition 1.1 of to analyze , we rescale the variance of from to which changes the semicircle law for to
In particular, the support changes from $[-\sqrt{1+4a^{2}},\sqrt{1+4a^{2}}]a|u|<2|u|<1\widehat{H}$ will also change from the one given in (1.10) to
In the rest of this Section we will use (3.3). The main result of this section is
Proof. Using Proposition 1.1 of , the (symmetrized) distribution of the eigenvalues of for any fixed is given by
where are the eigenvalues of the Wigner matrix with the choice of . Note that
where are the eigenvalues of the Wigner matrix . We will choose to be the event that the points follow the semicircle law (3.3). The limit of the correlation functions of will be computed starting from the next section in Proposition 3.3.
with some sufficiently small and we set
(after taking the supremum over all energies, which can be controlled by taking energies on a grid of spacing ). Note that the variance of the matrix elements in was different (see the remark at the beginning of Section 3.1) but this does not change the estimates. The condition C1) of on the Gaussian decay for the initial density is clearly satisfied by (2.3) and (1.6). Combining the estimate (3.9) with Proposition 3.3 and with the argument after (3.6), we have proved Proposition 3.1.
We compute the correlation functions of in , for any fixed :
Note that this definition of the correlation functions differs from the definition of given in ; the relation being
The following representation is based on the formula in , but it is more stable and suitable for analysis for very short time.
The correlation functions can be represented as
Proof of Proposition 3.2. From Eq. (2.18) in , we have
The change of variables , leads to
for every . Taking the derivative in at , and removing the primes from the new integration variables, we find the identity
Using that , we find
The second term on the r.h.s. is just . Therefore
Integrating back over , starting from , we find that
At this point the contours of integration can be modified; since the singularity has been removed, they are now allowed to cross. This completes the proof of the proposition.
Let . For any sequence with the choice we have
uniformly for and for in a compact set. Moreover, the correlation functions satisfy
uniformly for and for in a compact set.
Proof. The statement in (3.15) follows directly from (3.14) and (3.11), so it is sufficient to prove (3.14). We will prove (3.14) in the form
for any sequence with and for every fixed with . In order to get (3.14), we take with .
Saddle points
This is equivalent to finding the zeros of a polynomial of degree . There are real roots and two complex roots, called , that are complex conjugates of each other
We will work with , the analysis of the other saddle is analogous. Clearly for some large .
The solutions of this latter equation (for small ) are given by
where we also used the equation (3.23) for . We set .
We need to know that at the saddle.
It follows from (3.8) that for we have
We compare and . We have from (3.22)
First we show that for the only solution to (3.26) with positive imaginary part we have . This is a fixed point argument.
for some large constant . Since , we know that
with if . Thus for , so is a contraction on and thus (3.26) has a unique solution, which is .
Evaluating the integrals
Using Laplace asymptotics, we compute the integrals in (3.17). We choose the horizontal curves to pass through the two saddles of (see (3.24)), i.e. we set (see the definition of after (3.12)). The vertical line is shifted to pass through the saddles, i.e. . Moreover, if necessary, we deform in a -neighborhood of so that and ; this is always possible.
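The principle behind such a saddle point evaluation can be illustrated on a real one-dimensional toy integral; the contour integrals treated here are complex and two-dimensional, so the sketch below is only an analogy. For the hypothetical phase `cosh(x) - 1`, which has a nondegenerate minimum at `0` with second derivative `1`, the integral of `exp(-n * phase)` is compared with the Gaussian prediction `sqrt(2 pi / n)`, and the integral is split into an "inside" part within radius `n^{-1/4}` of the saddle and an "outside" part. The phase, the radius and the function names are assumptions of the illustration.

```python
import numpy as np

def phase(x):
    """Toy phase function with a nondegenerate minimum at x = 0."""
    return np.cosh(x) - 1.0

def integral_parts(n, radius, half_width=6.0, points=400001):
    """Riemann sums of exp(-n * phase) over |x| <= radius and over radius < |x| <= half_width."""
    x = np.linspace(-half_width, half_width, points)
    dx = x[1] - x[0]
    vals = np.exp(-n * phase(x))
    inside = np.abs(x) <= radius
    return dx * vals[inside].sum(), dx * vals[~inside].sum()

def saddle_point_value(n):
    """Leading Laplace asymptotics sqrt(2 pi / (n * phase''(0))) with phase''(0) = 1."""
    return np.sqrt(2.0 * np.pi / n)

for n in (50, 200, 800):
    inside, outside = integral_parts(n, radius=n ** (-0.25))
    total = inside + outside
    print(n, total / saddle_point_value(n), outside / total)
```

The first printed ratio approaches 1 as `n` grows, and the relative weight of the outside regime becomes negligible, which is the one-dimensional analogue of restricting to a small neighborhood of the saddle.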
according to whether and are positive or negative, e.g.
where and . We will work on , the other three integrals are treated similarly.
The main contribution to the integral will come from an -neighborhood in and of the saddle point . The radius will be chosen such that after a local change of variable and become quadratic near the saddle. We now explain the local change of variable.
with , such that
we also assume that . We will choose with a small , depending on . We have
from the explicit formula (3.23), so (3.32) is satisfied. Note that .
We have a similar change of variables for , i.e. with the properties that
For , we have f^{\prime\prime}_{N}(q_{N})=t^{-1}\big{[}1+O(N^{-\lambda/4})\big{]} and by (3.25) and (3.33), thus we can choose for some small constant .
Moreover we have for , so by the Cauchy formula and for (possibly after reducing ). The same formulas hold for as well. We also have
where in the first term we used (3.25) and in the second we used .
for any with . Therefore the maps and are -close within and both of them are -close to the shift map .
We first consider the integration. Recall that from (3.24). We fix a small positive constant and we define the domains
where . Recall that was the horizontal line going through , the saddle of . We will deform to so that it passes through and it matches with at the points . Within the regime , we define by the requirement that along . Since is close to the map by (3.36), clearly is an almost horizontal curve in a small neighborhood of , so it remains in until it reaches the vertical lines . In the regime , we require that matches with at the points and it remains in the wedge . In the outside regime, we set , in particular (see Fig. 1).
Proof. The second statement (3.40) follows from the normal form (3.35) and the fact that for we have , i.e. , and is close to the map in , so for .
For the first statement, we assume , the case is analogous. We get by explicit calculation
(the error is absorbed since for ). Since on the vertical lines , , we can integrate the inequality (3.41) to obtain (3.39).
which holds for . By explicit computation, and using ,
if , for some large . Thus we have
where with a small as before and a similar lower bound holds for . Defining
analogously to before, we easily obtain
The regimes and are treated directly. We use
from (3.42), if is sufficiently small, see (3.7).
If , then
and thus in this regime. Summarizing these results, we have
We can define a new contour similar to the . It follows the path where has zero imaginary part when and then it returns to when . We recall that and by the choice of .
With the paths and defined, we can now do the integration
if , . In order to make sure that these bounds are satisfied, we fix the constant in (3.17). Here is the unique solution with positive imaginary part of the saddle point equation (3.22), with (which is actually a short hand notation for ) replaced by the fixed . Note that, since , we find that the real part of the exponent of (see (3.20)) is bounded, , as runs through .
This choice also guarantees that, away from the saddle,
that hold for , . These bounds follow from (3.19), (3.20) and (3.21); when is near the real axis, we also used that is away from the ’s.
The integration in (see (3.47)) will be divided into regimes near the saddle (“inside”) or away from the saddle (“outside”):
Recall that and (see (3.24)). For example
where is the characteristic function of the interval . The other ’s are defined analogously.
The integral of the exponential term is bounded by
Taking into account (3.48) and (3.49), we see that since . Similarly we can bound all other terms with an outside part. When , then the exponential growth of in (3.49) will be controlled by the Gaussian decay of
Finally, we have to compute the contribution of the saddle, i.e. the term . We let be the part of with and similarly defined . Recall that on . From a standard Laplace asymptotics calculation, we have
while the main term in the bracket on the r.h.s. of (3.51) is of order . Analogously performing the integration, we obtain that
where we also used following from (3.21). So far we considered the saddle with positive imaginary part for both the and integrals. The same calculation can be performed at the saddle . The mixed case, when is integrated near one of the saddles and is near the other one, gives zero contribution, since by (3.21). Adding up the contributions of the two relevant saddles, and , taking into account the opposite orientations of the two pieces of , one obtains
where we used the choice (see after (3.48)), which guarantees that as , and the equations (3.16), (3.24), and (3.28). This completes the proof of Proposition 3.3.
Proof of the main theorems
Proof of Theorem 1.1. We follow the notation of Proposition 2.1. In Proposition 3.1 we have shown that the sine kernel holds for the measure if . More precisely, let , denote the density function of the eigenvalues w.r.t. and let be the two point correlation function, defined analogously to (1.9). Similarly, we define and for the eigenvalue density and two point correlation function w.r.t. the truncated measure .
for any and with the notation . (We remark that was denoted by in Proposition 3.1 and the condition is translated into after rescaling.)
To prove (1.11), we thus only need to control the difference as follows
with some as . To estimate , we have
Using (4.1) for the observable instead of , the second factor on the r.h.s. of (4.2) is bounded. Since is bounded, the first factor is smaller than
Here in the first step we used that the quantity for two probability measures and decreases when taking marginals. In the second step, we used that decreases when passing the probability laws from matrix elements to the induced probability laws for the eigenvalues. Finally, we used the estimate (2.2). This completes the proof of Theorem 1.1.
by recalling (3.5). The second term can be estimated by using and (3.9) as
For the first term in (4.4), we use the exclusion-inclusion principle to compute
with (see (3.2)) and recall that denote the correlation functions of (see (3.10)). After a change of variables,
where the factor comes from considering the integration sector , . Taking and using Proposition 3.3, we get
where in the last determinant term we set . The interchange of the limit and the summation can be justified by noting that the exclusion-inclusion principle guarantees that (4.6) is an alternating series where the difference between the sum and its -term truncation can be controlled by the -th term for any . We note that the left hand side of (4.8) is , where is the second derivative of the Fredholm determinant (see (1.13)). Combining (4.8) with the estimate (4.5), we have
After rescaling (3.1), we also conclude that the limit of the expectation of with respect to the time evolved ensemble (see Proposition 2.1) is given by the right hand side of (4.9).
Finally, the difference of the expectation of with respect to the measure and w.r.t. the initial ensemble vanishes since and (see (2.1) and (2.2)). This completes the proof of Theorem 1.2.
Some extensions and comments
In this section we explain how to relax some of the conditions on the initial distribution .
We first explain how to extend our proof to include distributions with compact support. Take for example a density w.r.t. the Gaussian measure that is given by a nice bump function supported in $(1\pm x)^{m}x=\pm 1mm$ large enough, it is still possible to prove the universality. Define a new distribution with density
with a small parameter to be determined later. Near the edge we have for with some -dependent constant . We thus need the condition
to guarantee that is a probability density. This inequality holds if
The other conditions concerning and (see (2.3)) can be handled similarly. Choosing , the total variation norm is bounded by
Since and , we have
Let, say, , then the error term will be smaller than with some and this will imply Theorem 1.1 for the initial distribution . The modification of in (5.1) can certainly be more sophisticated to reduce the exponent .
Therefore, Theorem 4.1 of holds with the estimates taking the form