Bulk Universality for Wigner Matrices

László Erdős, Sandrine Péché, José A. Ramírez, Benjamin Schlein, Horng-Tzer Yau

Introduction

Random matrices have been used to model many large systems chiefly because of the belief that their local eigenvalue statistics are universal. This is generally referred to as the universality of random matrices. It is well known that the local behavior of the eigenvalues near the spectral edge and in the bulk is governed by the Tracy-Widom law and by the Dyson sine kernel, respectively. Since the seminal work of Dyson for the Gaussian Unitary Ensemble (GUE), universality both at the edge and in the bulk has been proven for very general classes of unitary invariant ensembles over the past two decades. For non-unitary ensembles, the most natural examples are the Wigner matrix ensembles, i.e., random matrices with independent identically distributed entries. The edge universality for these ensembles was proved by Soshnikov using the moment method; the bulk universality remained unknown for lack of a method to analyze local spectral properties of large matrices inside the spectrum. For ensembles of the form

$$H=\widehat{H}+aV, \qquad (1.1)$$

where $\widehat{H}$ is a Wigner matrix, $V$ is an independent standard GUE matrix and $a$ is a positive constant of order one (independent of $N$), the bulk universality was proved by Johansson. (Strictly speaking, the admissible range of the parameter $a$ in that work depends on the energy $E$. This restriction was later removed by Ben Arous and Péché, who also extended this approach to Wishart ensembles.)

This approach is partly based on the asymptotic analysis of an explicit formula by Brézin and Hikami for the correlation functions of the eigenvalues of $\widehat{H}+aV$. This matrix can also be generated by a stochastic flow

$$s\mapsto\widehat{H}+\sqrt{s}\,V,\qquad s>0, \qquad (1.2)$$

and the evolution of the eigenvalues is given by the Dyson Brownian motion. Johansson's result thus states that the bulk universality holds for times of order one. The eigenvalue distribution of GUE is in fact the invariant measure of Dyson Brownian motion. (Rigorously speaking, the Brownian motion has to be replaced by an Ornstein-Uhlenbeck process, but we will neglect this subtlety.) It is thus tempting to derive the universality of $\widehat{H}+\sqrt{s}V$ via the convergence to equilibrium. We have recently carried out this approach, and the key observation is that the sine kernel, as a property of local statistics, depends almost exclusively on the convergence to local equilibrium. With this method we have reduced the necessary time to $N^{-1+\xi}$, for any $\xi>1/4$. Note that the relaxation time to local equilibrium is $N^{-1}$; the additional exponent $\xi$ is due to technical reasons.

From stochastic calculus one can see that the typical distance between the corresponding eigenvalues of $\widehat{H}+\sqrt{s}V$ and $\widehat{H}$ is of order $(s/N)^{1/2}$. Thus the bulk universality of $\widehat{H}$ would hold if we could prove the Dyson sine kernel for time $s\ll 1/N$. On the other hand, for times smaller than $1/N$ the eigenvalues do not move on the scale $1/N$, and the dynamical consideration seems to be pointless. In this paper, we provide an approach to address the comparison of eigenvalues between $\widehat{H}+\sqrt{s}V$ and $\widehat{H}$. To describe the idea, we now introduce some notation.

Suppose the real and imaginary parts of the offdiagonal matrix elements evolve according to the Ornstein-Uhlenbeck (OU) process

with the reversible measure $\mu({\rm d}x)=e^{-x^{2}}{\rm d}x$ and initial distribution $u_{0}=u$ (strictly speaking, a differently normalized OU process is used for the diagonal elements but we omit this detail here). Under this process, the matrix evolves as

and the expectation and variance of the matrix entries remain constant. Note that for small times $t$ we have $t\approx a^{2}$ when compared with (1.1), after a trivial rescaling.
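The conservation of the variance can be checked by elementary arithmetic. The following is a minimal sketch, assuming that each entry evolves as $e^{-t/2}h+(1-e^{-t})^{1/2}g$ with $g$ an independent centered Gaussian of the same variance as $h$; this normalization is an assumption chosen to match the prefactor $e^{-t/2}$ and the value $a=(e^{t}-1)^{1/2}$ used in Section 3, and the function names are hypothetical.

```python
import math


def ou_coefficients(t):
    """Weights of the initial entry and of the Gaussian part at time t.

    Assumed normalization: entry_t = exp(-t/2) * h + sqrt(1 - exp(-t)) * g.
    """
    return math.exp(-t / 2), math.sqrt(-math.expm1(-t))


def evolved_variance(t, var0):
    """Variance at time t when h and g are independent with variance var0."""
    c0, c1 = ou_coefficients(t)
    return c0 ** 2 * var0 + c1 ** 2 * var0


def effective_a(t):
    """The constant a in exp(-t/2) * (h + a * g), i.e. a = sqrt(exp(t) - 1)."""
    return math.sqrt(math.expm1(t))
```

For small $t$ the value `effective_a(t) ** 2` is close to $t$, which is the relation $t\approx a^{2}$ noted above.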

The initial distribution of all the matrix elements is $F\,{\rm d}\mu^{\otimes n}=(u\;{\rm d}\mu)^{\otimes n}$ with $n=N^{2}$. Let ${\mathcal{L}}$ be the generator on the product space and $e^{t{\mathcal{L}}}:=(e^{tL})^{\otimes n}$ be the dynamics of the OU process for all the matrix elements. The joint probability distribution of the matrix elements at time $t$ is then given by

$$F_{t}\,{\rm d}\mu^{\otimes n}:=(e^{t{\mathcal{L}}}F)\,{\rm d}\mu^{\otimes n}.$$

Suppose that for some small $t$, say $t=N^{-1+\lambda}$ with $\lambda>0$, we know the local eigenvalue correlation functions w.r.t. $F_{t}$. Let

$$Var(F,F_{t}):=\int|F-F_{t}|\,{\rm d}\mu^{\otimes n}$$

be the total variation norm of the difference between $F_{t}$ and $F$. In order to approximate the correlation functions of $F$ by those of $F_{t}$ in a weak sense (tested against bounded observables), we need $Var(F,F_{t})\to 0$. Heuristically, $Var(F,F_{t})\sim tN^{2}$, and this requires that $t\ll N^{-2}$, which is far from the time scale $t\geq N^{-1+\xi}$ for which the sine kernel has been proven. For observables on short scales, an effective speed of convergence for the total variation is needed. For example, to test a local observable with two variables on the scale $1/N$, as in the case of the Dyson sine kernel, one has to prove $Var(F,F_{t})=o(N^{-2})$.

Although the heuristic bound $Var(F,F_{t})\sim tN^{2}$ can be improved to $Var(F,F_{t})\sim tN$, further improvement seems to be impossible. Thus we are unable to obtain even the weaker bound $Var(F,F_{t})=o(1)$ for $t>1/N$. The main observation in the current paper is that, while we cannot compare $F$ with $F_{t}$, it suffices to prove the existence of some function $G$ for which the correlation functions with respect to $e^{t{\mathcal{L}}}G$ can be computed for $t\geq N^{-1+\lambda}$ and $Var(F,e^{t{\mathcal{L}}}G)=o(N^{-2})$. Since the necessary input to compute the correlation functions is the validity of the semicircle law on short scales, which we have proved for a wide class of distributions $\nu$, the choice of $G$ is essentially determined by the condition $Var(F,e^{t{\mathcal{L}}}G)=o(N^{-2})$. Note that $G$ itself may depend on $t$. Since $F=e^{t{\mathcal{L}}}(e^{-t{\mathcal{L}}}F)$, we could, in principle, choose $G=e^{-t{\mathcal{L}}}F=[e^{-tL}]^{\otimes n}F$. But the diffusive dynamics cannot be reversed except for a very special class of initial data. However, we only have to reverse the dynamics approximately, and the choice $G_{t}=\big{[}1-tL+\frac{1}{2}t^{2}L^{2}\big{]}^{\otimes n}F$ turns out to be sufficient. In this case, $e^{t{\mathcal{L}}}G_{t}-F=O(N^{2}t^{3})$ and we will show that

Furthermore, under some mild regularity condition on $F$, $G_{t}$ is in the class for which we can establish the local semicircle law. We will call this argument the method of time reversal.
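The gain from the approximate reversal can be seen on a single eigenmode. The following is a minimal sketch, assuming that $L$ acts on the $k$-th Hermite mode as multiplication by $-k/2$ (the factor $1/2$ is an assumption about the normalization of the OU generator; the cubic scaling does not depend on it). On such a mode, $e^{tL}(1-tL+\frac{1}{2}t^{2}L^{2})-1$ is a scalar of order $t^{3}$, whereas $e^{tL}-1$ is only of order $t$. The function names are hypothetical.

```python
import math


def reversal_residual(k, t, rate=0.5):
    """Scalar value of e^{tL}(1 - tL + t^2 L^2 / 2) - 1 on a mode with L h = -rate*k*h."""
    s = rate * k * t
    return math.exp(-s) * (1.0 + s + 0.5 * s * s) - 1.0


def naive_residual(k, t, rate=0.5):
    """Scalar value of e^{tL} - 1 on the same mode, without any reversal."""
    return math.expm1(-rate * k * t)


def scaling_ratio(k, t):
    """Ratio of residuals at times t and t/2; close to 8 if the residual is cubic in t."""
    return reversal_residual(k, t) / reversal_residual(k, t / 2)
```

Halving $t$ reduces the residual by a factor close to $2^{3}$, consistent with the per-coordinate error $O(t^{3})$ behind the estimate $e^{t{\mathcal{L}}}G_{t}-F=O(N^{2}t^{3})$.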

We now summarize the assumptions on the initial distribution. Let the probability measure of the real and imaginary parts of the off-diagonal matrix elements be of the form

with some constants $\delta>0$ , $C$ and $C^{\prime}$ . In Section 5 we explain how to relax this latter condition to exponential decay,

with some constants $C,C^{\prime}$ (in fact, some high power law decay is sufficient). We assume that the first moment of $\nu$ is zero and the variance is $\frac{1}{2}$

We assume the conditions (1.5), (1.6) and (1.8) for $\widetilde{V}$ as well with the variance changed to 1.

Let $p_{N}(x_{1},x_{2},\ldots,x_{N})$ denote the probability density of eigenvalues and for any $k=1,2,\ldots,N$ , let

be the $k$-point correlation function. With our choice of the variance of $\nu$, the density $p^{(1)}_{N}(x)$ is supported in $[-2,2]+o(1)$ and in the $N\to\infty$ limit it converges to the Wigner semicircle law given by the density

$$\varrho_{sc}(x)=\frac{1}{2\pi}\sqrt{(4-x^{2})_{+}}. \qquad (1.10)$$

With similar methods we can also prove that the higher order rescaled correlation functions,

converge in the weak sense to $\mbox{det}\big{(}f(a_{i}-a_{j})\big{)}_{1\leq i,j\leq k}$, where $f(\tau)=\frac{\sin\pi\tau}{\pi\tau}$; however, this statement requires more regularity conditions on $V$. The proof of the sine kernel for $e^{t{\mathcal{L}}}G_{t}$ immediately implies the convergence of the higher order correlation functions with respect to the evolved measure. To draw the same conclusion for the higher order correlation functions with respect to $F$, however, one needs to improve the accuracy in (1.4). This can be achieved by approximating the backward evolution $e^{-t{\mathcal{L}}}$ to a higher order. For example, using $G_{t}=\big{[}1-tL+\frac{1}{2!}(-tL)^{2}+\cdots+\frac{1}{(m-1)!}(-tL)^{m-1}\big{]}^{\otimes n}F$ will improve the bound (1.4) to $t^{2m}N^{2}$, modulo $N^{\varepsilon}$ corrections, if $V$ is $2m$-times differentiable with bounds similar to (1.5).
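To make the limiting object concrete, the following minimal sketch evaluates $\det\big(f(a_{i}-a_{j})\big)_{1\leq i,j\leq k}$ with $f(\tau)=\frac{\sin\pi\tau}{\pi\tau}$ for a given list of points. The helper names are hypothetical, and the determinant is computed by plain Gaussian elimination, which is adequate only for small $k$.

```python
import math


def sine_kernel(tau):
    """f(tau) = sin(pi tau) / (pi tau), with f(0) = 1."""
    if abs(tau) < 1e-12:
        return 1.0
    return math.sin(math.pi * tau) / (math.pi * tau)


def determinant(rows):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(r) for r in rows]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def limiting_correlation(points):
    """det( f(a_i - a_j) ) for the rescaled positions a_1, ..., a_k."""
    return determinant([[sine_kernel(a - b) for b in points] for a in points])
```

For $k=2$ this reduces to $1-f(a_{1}-a_{2})^{2}$, which vanishes as $a_{1}\to a_{2}$ and thus exhibits the level repulsion encoded in the sine kernel.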

We now state our result concerning the eigenvalue gap distribution. For any $s>0$ and $|u|<2$ we define the density of eigenvalue pairs with distance less than $s/N\varrho_{sc}(u)$ in the vicinity of $u$ by

where $t_{N}=N^{-1+\delta}$ for some $0<\delta<1$ .

Suppose the probability measure of the matrix elements satisfies conditions (1.5), (1.6) and (1.8). Let ${\mathcal{K}}_{\alpha}$ be the operator acting on $L^{2}((0,\alpha))$ with kernel $\frac{\sin\pi(x-y)}{\pi(x-y)}$ . Then for any $u$ with $|u|<2$ and for any $s>0$ we have

where $\det$ denotes the Fredholm determinant of the compact operator $1-{\mathcal{K}}_{\alpha}$ .
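The Fredholm determinant $\det(1-{\mathcal{K}}_{\alpha})$ can be approximated numerically by discretizing the kernel with a quadrature rule. The following is a minimal sketch using Gauss-Legendre quadrature on $(0,\alpha)$; this Nyström-type discretization is a standard numerical technique that we swap in for illustration, not part of the proofs in this paper, and all function names and the default number of nodes are hypothetical choices.

```python
import math


def gauss_legendre(m):
    """Nodes and weights of the m-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, m + 1):
        x = math.cos(math.pi * (i - 0.25) / (m + 0.5))  # initial guess for a root
        for _ in range(100):  # Newton iteration on the Legendre polynomial P_m
            p_prev, p = 1.0, x
            for k in range(2, m + 1):
                p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
            dp = m * (x * p - p_prev) / (x * x - 1.0)
            step = p / dp
            x -= step
            if abs(step) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def sine_kernel(x, y):
    """Kernel sin(pi (x - y)) / (pi (x - y)), equal to 1 on the diagonal."""
    d = x - y
    return 1.0 if abs(d) < 1e-12 else math.sin(math.pi * d) / (math.pi * d)


def determinant(rows):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(r) for r in rows]
    n, det = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def gap_probability(alpha, m=20):
    """Quadrature approximation of det(1 - K_alpha) on L^2((0, alpha))."""
    xi, w = gauss_legendre(m)
    x = [0.5 * alpha * (s + 1.0) for s in xi]   # nodes mapped to (0, alpha)
    wt = [0.5 * alpha * v for v in w]           # weights rescaled accordingly
    a = [[(1.0 if i == j else 0.0)
          - math.sqrt(wt[i] * wt[j]) * sine_kernel(x[i], x[j])
          for j in range(m)] for i in range(m)]
    return determinant(a)
```

For small $\alpha$ the result can be compared with the first terms of the Fredholm expansion, $1-\alpha+\pi^{2}\alpha^{4}/36$.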

As a corollary of Theorem 1.2, one can easily show that the probability to find no eigenvalue in the interval $[u,u+\alpha/(\varrho_{sc}(u_{0})N)]$, after averaging $u$ over an interval of size $N^{-1+\delta}$ around $u_{0}\in(-2,2)$, is given by $\det(1-{\mathcal{K}}_{\alpha})$, the same as in the case of GUE. Note that by assuming more regularity on the exponent of the density $u(x)=e^{-V(x)}$, we can get a better bound on the convergence rate (by approximating the backwards evolution $e^{-t{\mathcal{L}}}$ to a higher order) and therefore avoid the averaging over $u$.

The proofs of Theorems 1.1 and 1.2 consist of two main parts. In Section 2 we prove the approximation (1.4) under precise conditions on the initial distribution $u=e^{-V}$. In Section 3 we prove the sine kernel for the distribution $e^{t{\mathcal{L}}}G_{t}$ with $t=N^{-1+\lambda}$ for any $\lambda>0$, which is the optimal time scale for such a result. Our approach is to recast the known formula for the correlation function, which becomes unstable for $t\ll 1$, into a more symmetric form (Proposition 3.2) so that it is stable for all times down to $t=N^{-1+\lambda}$. The saddle point analysis can then be carried out with the help of the local semicircle law. Finally, we complete the proofs of the main theorems in Section 4.

The method of time reversal described previously is very general and should be applicable to a wide range of models. More significantly, it explains the origin of the universality, i.e., the universality comes from the “time reversal”. To summarize, the universality consists of the following observations: (1) The local statistics are determined by the local equilibrium measures. (2) The relaxation to local equilibria takes place in a short time. (3) The original distribution can be well-approximated by the distribution of the Dyson Brownian motion for a short time with initial data given by an approximate inverse flow. To implement this scheme, a key input is to estimate the fluctuations of the empirical density of eigenvalues in short scales.

Shortly after this manuscript appeared on the arXiv, we learned that our main result was also obtained by Tao and Vu under essentially no regularity conditions on the initial distribution $\nu$, provided the third moment of $\nu$ vanishes. Some partial results for the Gaussian orthogonal ensembles were also obtained there, and we refer the reader to their preprint for more details.

Conventions. We will use the letters $C$ and $c$ to denote general constants whose precise values are irrelevant and they may change from line to line. These constants may depend on the constants in (1.5)–(1.8).

Method of Time Reversal

Recall the Ornstein-Uhlenbeck process from (1.3) with the reversible measure $\mu({\rm d}x)=\mu(x){\rm d}x=e^{-x^{2}}{\rm d}x$ . Let $u$ be a positive density with respect to $\mu$ , i.e. $\int u{\rm d}\mu=1$ and we write $u(x)=\exp(-V(x))$ .

Let $V$ satisfy the conditions (1.5), (1.6) with some $k$ and (1.8). Let $\lambda>0$ be sufficiently small and $t=N^{-1+\lambda}$ . Define a cutoff initial density as

where $\theta$ is a smooth cutoff function satisfying $\theta(x)=1$ for $|x|\leq 1$ and $\theta(x)=0$ for $|x|\geq 2$ and $c_{N}$ and $d_{N}$ are chosen such that $v(x){\rm d}\mu(x)$ is a probability density with zero expectation. Denote ${\mathcal{L}}=L^{\otimes n}$ , $F=u^{\otimes n}$ and $F_{c}=v^{\otimes n}$ with $n=N^{2}$ .

with some $c>0$ depending on $k$ and $\lambda$ .

(ii) $g_{t}:=(1-tL+\frac{1}{2}t^{2}L^{2})v$ is a probability density with respect to ${\rm d}\mu$ and for $G_{t}:=[g_{t}]^{\otimes n}$ we have

where $C$ depends on $\lambda$ and on the constants in (1.5), (1.6).

In the formulation of this proposition we have not taken into account that in our application the diagonal elements of the matrix evolve under a differently normalized OU process with generator $\widetilde{L}=\frac{1}{2}\partial_{x}^{2}-\frac{x}{2}\partial_{x}$ with invariant measure $e^{-x^{2}/2}{\rm d}x$ . This modification is only notational and does not affect the validity of the estimates (2.1) and (2.2).

Proof. From condition (1.6) the estimate (2.1) follows directly by noting that the constants $c_{N}$ and $d_{N}$ are subexponentially small in $N$ . For the proof of (2.2), we first control the evolution of each matrix element under the OU process (1.3). We assume that for the initial density $v$

hold with some positive constants $A_{1},A_{2}$ and $A_{3}$. Set $g_{t}=(1-tL+\frac{1}{2}t^{2}L^{2})v$ for some $t>0$ and note that $g_{t}$ is a probability density with respect to $\mu$ if

Note that by the monotonicity preserving property of the Ornstein-Uhlenbeck kernel and by (2.3), we have

Here we used the fact that $e^{sL}v\leq e^{sA_{1}}v$ under the first condition in (2.3), which follows from integrating the inequality

where we used (2.6), (2.5) and finally (2.4).

Now we consider the evolution of the product density $F_{c}=v^{\otimes n}$; note that $\int F_{c}\;{\rm d}\mu^{\otimes n}=1$. Applying the same procedure to each variable, we have

as long as $A_{3}^{2}t^{6}n$ is bounded. In our application $n=N^{2}$ , thus (2.9) will imply (2.2) provided that

which will also guarantee (2.7). It is straightforward to check that the density $v(x)$ satisfies (2.3) with constants $A_{j}$ subject to (2.4) and (2.10). This completes the proof.
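The role of the quantity $t^{6}n$ can be made explicit by tracking powers of $N$. The following is a minimal sketch of this bookkeeping, assuming $n=N^{2}$ and $t=N^{-1+\lambda}$ as in the text and ignoring the constants $A_{j}$, which may themselves depend on $N$; the parameter `m` covers the order-$m$ generalization $t^{2m}N^{2}$ mentioned in the introduction, and the function name is hypothetical.

```python
from fractions import Fraction


def error_exponent(m, lam):
    """Exponent e with n * t^(2m) = N^e, for n = N^2 and t = N^(-1 + lam).

    m = 3 corresponds to the second-order reversal 1 - tL + t^2 L^2 / 2,
    for which the relevant quantity is t^6 * n.
    """
    return 2 + 2 * m * (Fraction(lam) - 1)
```

For instance, `error_exponent(3, Fraction(1, 10))` returns `Fraction(-17, 5)`, i.e. $t^{6}n=N^{-4+6\lambda}=N^{-3.4}$ for $\lambda=1/10$.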

Sine kernel for the time evolved measure

We use the contour integral representation for the correlation functions of the eigenvalues of a matrix of the form $H=\widehat{H}+aV$, where $V$ is a GUE matrix. We will apply this result for the matrix

where, apart from a trivial prefactor $e^{-t/2}$ , $G_{t}$ plays the role of $\widehat{H}$ and $a=(e^{t}-1)^{1/2}\approx t^{1/2}$ . In order to be able to use the formula given in Proposition 1.1 of to analyze $H=\widehat{H}+aV$ , we rescale the variance of ${\rm d}\nu$ from $\frac{1}{2}$ to $\frac{1}{8}+\frac{1}{2}a^{2}$ which changes the semicircle law for $H=\widehat{H}+aV$ to

In particular, the support changes from $[-2,2]$ to $[-\sqrt{1+4a^{2}},\sqrt{1+4a^{2}}]$. Since eventually $a$ will go to zero, the condition $|u|<2$ in Theorem 1.1 to be away from the spectral edge changes to the condition $|u|<1$, which we assume in the sequel. The semicircle law for $\widehat{H}$ will also change from the one given in (1.10) to

In the rest of this Section we will use (3.3). The main result of this section is

Proof. Using Proposition 1.1 of , the (symmetrized) distribution of the eigenvalues $x=(x_{1},\ldots,x_{N})$ of $H=\widehat{H}+aV$ for any fixed $\widehat{H}$ is given by

where $y=(y_{1},\ldots,y_{N})$ are the eigenvalues of the Wigner matrix $\widehat{H}$, with the choice of $S=a^{2}/N$. Note that

where $y(\widehat{H})=(y_{1}(\widehat{H}),\ldots,y_{N}(\widehat{H}))$ are the eigenvalues of the Wigner matrix $\widehat{H}$ . We will choose ${\mathcal{Y}}_{N}$ to be the event that the points $y=(y_{1},\ldots,y_{N})$ follow the semicircle law (3.3). The limit of the correlation functions of $q_{S}(x,y)$ will be computed starting from the next section in Proposition 3.3.

with some sufficiently small $\eta_{0}<1$ and we set

(after taking the supremum over all energies, which can be controlled taking energies on a grid of spacing $\eta$ ). Note that the variance of the matrix elements in was different (see remark at the beginning of Section 3.1) but this does not change the estimates. The condition C1) of on the Gaussian decay for the initial density $g_{t}\mu=(1-tL+\frac{1}{2}t^{2}L^{2})v\mu$ is clearly satisfied by (2.3) and (1.6). Combining the estimate (3.9) with Proposition 3.3 and with the argument after (3.6), we have proved Proposition 3.1.

We compute the correlation functions of $q_{S}(x;y)$ in $x$ , for any fixed $y\in{\mathcal{Y}}_{N}$ :

Note that this definition of the correlation functions differs from the definition of $R_{m}^{N}$ given in ; the relation being

The following representation is based on the formula in , but it is more stable and suitable for analysis for very short time.

The correlation functions can be represented as

Proof of Proposition 3.2. From Eq. (2.18) in , we have

The change of variables $w=(1-\beta)r+\beta w^{\prime}$ , $z=(1-\beta)r+\beta z^{\prime}$ leads to

for every $\beta$ . Taking the derivative in $\beta$ at $\beta=1$ , and removing the primes from the new integration variables, we find the identity

Using that $H^{\prime}_{v}(w)=w-v+S\sum_{j=1}^{N}1/(w-y_{j})$ , we find

The second term on the r.h.s. is just $(v-u)\frac{\partial}{\partial v}{\mathcal{K}}_{N}(u,v)$ . Therefore

Integrating back over $v$ , starting from $u$ , we find that

At this point the contours of integration can be modified; since the singularity $1/(w-z)$ has been removed, they are now allowed to cross. This completes the proof of the proposition.

Let $\kappa>0$ . For any sequence $y=y^{(N)}\in{\mathcal{Y}}_{N}$ with the choice $S=N^{-2+\lambda}$ we have

uniformly for $|u|\leq 1-\kappa$ and for $\alpha,\beta$ in a compact set. Moreover, the correlation functions satisfy

uniformly for $|u|\leq 1-\kappa$ and for $\alpha_{1},\ldots,\alpha_{m}$ in a compact set.

Proof. The statement in (3.15) follows directly from (3.14) and (3.11), so it is sufficient to prove (3.14). We will prove (3.14) in the form

for any sequence $u^{(N)}$ with $|u^{(N)}-u_{*}|\leq C/N$ and for every fixed $u_{*}$ with $|u_{*}|<1-\kappa$ . In order to get (3.14), we take $u^{(N)}=u_{*}+\alpha/N\varrho(u_{*})$ with $u_{*}=u$ .

Saddle points

This is equivalent to finding the zeros of a polynomial of degree $N+1$ . There are $N-1$ real roots and two complex roots, called $q_{N}^{\pm}$ , that are complex conjugates of each other

We will work with $q_{N}:=q_{N}^{+}$; the analysis of the other saddle is analogous. Clearly $|Re\;q_{N}|\leq K$ for some large $K$.

The solutions of this latter equation (for small $t$ ) are given by

where we also used the equation (3.23) for $q^{\pm}$ . We set $q=q^{+}$ .

We need to know that $f_{N}^{\prime\prime}\neq 0$ at the $q_{N}$ saddle.

It follows from (3.8) that for $y\in{\mathcal{Y}}$ we have

We compare $q$ and $q_{N}$ . We have from (3.22)

First we show that for the only solution to (3.26) with positive imaginary part we have $Im\;q_{N}\geq\eta$ . This is a fixed point argument.

for some large constant $C$ . Since $y\in{\mathcal{Y}}$ , we know that

with $F^{\prime}(z)=O(t)$ if $z\in\Xi$ . Thus $|F_{N}^{\prime}(z)|\leq 1/2$ for $z\in\Xi$ , so $F_{N}$ is a contraction on $\Xi$ and thus (3.26) has a unique solution, which is $q_{N}$ .
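The contraction argument can be imitated numerically. The following is a minimal sketch, assuming that the saddle point equation has the form $z=u-S\sum_{j}(z-y_{j})^{-1}$ with $S=t/N$; this form is read off from the expression for $H^{\prime}_{v}(w)$ in the proof of Proposition 3.2 and is an assumption, since equation (3.22) is not reproduced here. As a hypothetical stand-in for a configuration $y\in{\mathcal{Y}}$ we use the quantiles of the semicircle law on $[-1,1]$; all function names are illustrative.

```python
import math


def semicircle_cdf(x):
    """CDF of the semicircle law with density (2/pi) sqrt(1 - x^2) on [-1, 1]."""
    return 0.5 + (x * math.sqrt(1.0 - x * x) + math.asin(x)) / math.pi


def semicircle_quantiles(n):
    """Points y_j with CDF(y_j) = (j - 1/2) / n, found by bisection."""
    points = []
    for j in range(1, n + 1):
        target = (j - 0.5) / n
        lo, hi = -1.0, 1.0
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if semicircle_cdf(mid) < target:
                lo = mid
            else:
                hi = mid
        points.append(0.5 * (lo + hi))
    return points


def saddle_map(z, u, t, ys):
    """One application of the assumed fixed point map z -> u - (t/N) sum_j 1/(z - y_j)."""
    return u - (t / len(ys)) * sum(1.0 / (z - y) for y in ys)


def map_derivative(z, t, ys):
    """Derivative of the map in z; the iteration contracts where its modulus is below 1."""
    return (t / len(ys)) * sum(1.0 / (z - y) ** 2 for y in ys)


def saddle_point(u, t, ys, iterations=200):
    """Iterate the map starting from u + i t, in the upper half plane."""
    z = complex(u, t)
    for _ in range(iterations):
        z = saddle_map(z, u, t, ys)
    return z
```

With $N=400$, $t=N^{-1/2}$ and $u=0$, the iteration settles at a point whose imaginary part is of order $t$, and the modulus of the derivative of the map at that point is small, in line with the bound $|F_{N}^{\prime}(z)|\leq 1/2$.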

Evaluating the integrals

Using Laplace asymptotics, we compute the integrals in (3.17). We choose the horizontal curves $\gamma^{\pm}$ to pass through the two saddles $q^{\pm}=a\pm ib$ of $f$ (see (3.24)), i.e. we set ${\omega}=b$ (see the definition of $\gamma^{\pm}$ after (3.12)). The vertical line $\Gamma$ is shifted to pass through the saddles, i.e. $Re\;\Gamma=a$. Moreover, if necessary, we deform $\Gamma$ in an $O(N^{-1})$-neighborhood of $a$ so that $\min_{j}\mbox{dist}(\Gamma,y_{j})\geq N^{-2}$ and $\mbox{dist}(\Gamma,a_{N})\geq N^{-2}$; this is always possible.

according to whether $Im\;z$ and $Im\;w$ are positive or negative, e.g.

where $\Gamma_{+}=\Gamma\cap\{w\;:\;Im\,w\geq 0\}$ and $\Gamma_{-}=\Gamma\cap\{w\;:\;Im\,w\leq 0\}$. We will work on $A^{++}$; the other three integrals are treated similarly.

The main contribution to the integral $A^{++}$ will come from an $\varepsilon$ -neighborhood in $z$ and $w$ of the saddle point $q_{N}=q_{N}^{+}$ . The radius $\varepsilon$ will be chosen such that after a local change of variable $f$ and $f_{N}$ become quadratic near the saddle. We now explain the local change of variable.

with $\phi(q)=0$ , $\phi^{\prime}(q)=\sqrt{tf^{\prime\prime}(q)}$ such that

we also assume that $\varepsilon\leq\eta$ . We will choose $\varepsilon=ct$ with a small $c$ , depending on $u$ . We have

from the explicit formula (3.23), so (3.32) is satisfied. Note that $\phi^{\prime}(q)=\sqrt{tf^{\prime\prime}(q)}=1+O(t)$ .

We have a similar change of variables for $f_{N}$ , i.e. $\phi_{N}$ with the properties that

For $y\in{\mathcal{Y}}$ , we have $f^{\prime\prime}_{N}(q_{N})=t^{-1}\big{[}1+O(N^{-\lambda/4})\big{]}$ and $|f^{\prime\prime\prime}_{N}(z)|\leq Ct^{-2}N^{-\lambda/4}$ by (3.25) and (3.33), thus we can choose $\varepsilon=ct$ for some small constant $c\leq\sqrt{1-u^{2}}$ .

Moreover we have $|\phi_{N}(z)|\leq C|z-q_{N}|$ for $|z-q|\leq ct$ , so by Cauchy formula $|\phi^{\prime}_{N}(z)|\leq C$ and $|\phi^{\prime\prime}_{N}(z)|\leq Ct^{-1}$ for $|z-q|\leq ct$ (maybe after reducing $c$ ). The same formulas hold for $\phi$ as well. We also have

where in the first term we used (3.25) and in the second we used $|f^{\prime\prime\prime}_{N}(z)|\leq Ct^{-2}$ .

for any $z$ with $|z-q|\leq ct$ . Therefore the maps $\phi$ and $\phi_{N}$ are $C^{1}$ -close within $D_{\varepsilon}$ and both of them are $C^{1}$ -close to the shift map $z\to z-q$ .

We first consider the $z$ integration. Recall that $q_{N}=q^{+}_{N}=a_{N}+ib_{N}$ from (3.24). We fix a small positive constant $c_{1}\ll 1$ and we define the domains

where $\varepsilon=ct$. Recall that $\gamma^{+}$ was the horizontal line going through $q=a+ib$, the saddle of $f$. We will deform $\gamma^{+}$ to $\gamma_{N}^{+}$ so that it passes through $q_{N}$ and matches $\gamma^{+}$ at the points $a_{N}\pm 2\varepsilon+ib$. Within the regime $|Re\;z-a_{N}|\leq\varepsilon$, we define $\gamma_{N}^{+}$ by the requirement that $Im\;\phi_{N}=0$ along $\gamma_{N}^{+}$. Since $\phi_{N}(z)$ is close to the map $z\to z-q_{N}$ by (3.36), clearly $\gamma_{N}^{+}$ is an almost horizontal curve in a small neighborhood of $q_{N}$, so it remains in $W$ until it reaches the vertical lines $|Re\;z-a_{N}|=\varepsilon$. In the regime $\varepsilon\leq|Re\;z-a_{N}|\leq 2\varepsilon$, we require that $\gamma_{N}^{+}$ matches $\gamma^{+}$ at the points $a_{N}\pm 2\varepsilon+ib$ and that it remains in the wedge $W$. In the outside regime, $|Re\;z-a_{N}|\geq 2\varepsilon$, we set $\gamma_{N}^{+}=\gamma^{+}$; in particular $\gamma_{N}^{+}\subset W\cup\Omega$ (see Fig. 1).

Proof. The second statement (3.40) follows from the normal form (3.35) and the fact that for $z\in W$ we have $|Im\;(z-q_{N})|\leq c_{1}|Re(z-q_{N})|$ , i.e. $Re(z-q_{N})^{2}\geq 0$ , and $\phi_{N}$ is close to the map $z\to z-q_{N}$ in $W$ , so $Re[\phi_{N}(z)]^{2}\geq 0$ for $z\in W$ .

For the first statement, we assume $x\geq a$; the case $x\leq a$ is analogous. We get by explicit calculation

(the error is absorbed since $|x-a|\geq ct/2$ for $x+iy\in\Omega^{*}$ ). Since $Re[f_{N}(z)-f_{N}(q_{N})]\geq 0$ on the vertical lines $|x-a|=\varepsilon/2$ , $|y-b|\leq c_{1}\varepsilon/2$ , we can integrate the inequality (3.41) to obtain (3.39).

which holds for $|y|\geq\eta$ . By explicit computation, and using $f^{\prime}(a+ib)=0$ ,

if $|y|\leq\frac{1}{2}\sqrt{1-u^{2}}$ , $|x-a|\leq Ct$ for some large $C$ . Thus we have

where $\varepsilon=ct$ with a small $c$ as before and a similar lower bound holds for $y-b\leq-\varepsilon/2$ . Defining

analogously to $W$ before, we easily obtain

The regimes $0\leq y\leq\eta$ and $y\geq\frac{1}{2}\sqrt{1-u^{2}}$ are treated directly. We use

from (3.42), if $\eta_{0}$ is sufficiently small, see (3.7).

If $y\geq\frac{1}{2}\sqrt{1-u^{2}}$ , then

and thus $Re\;f_{N}(x+iy)\leq-y^{2}/4t$ in this regime. Summarizing these results, we have

We can define a new contour $\Gamma_{N}^{+}$ similar to the $\gamma_{N}^{+}$ . It follows the path where $\phi_{N}$ has zero imaginary part when $|Im\;w-b|\leq\varepsilon/2$ and then it returns to $\Gamma^{+}$ when $|Im\;w-b|\geq\varepsilon$ . We recall that $\min_{j}\mbox{dist}(\Gamma_{N}^{+},y_{j})\geq N^{-2}$ and $\mbox{dist}(\Gamma,a_{N})\geq N^{-2}$ by the choice of $\Gamma$ .

With the paths $\gamma_{N}^{+}$ and $\Gamma_{N}^{+}$ defined, we can now do the integration

if $|z-(a+ib)|\leq\varepsilon$, $|w-(a+ib)|\leq\varepsilon$. In order to make sure that these bounds are satisfied, we fix the constant $r=\text{Re }q_{N}(u_{*})$ in (3.17). Here $q_{N}(u_{*})$ is the unique solution with positive imaginary part of the saddle point equation (3.22), with $u$ (which is actually shorthand for $u^{(N)}$) replaced by the fixed $u_{*}$. Note that, since $|u^{(N)}-u_{*}|\leq C/N$, we find that the real part of the exponent of $h_{N}(w)$ (see (3.20)) is bounded, $|r-\text{Re}w|/t\rho\leq C$, as $w$ runs through $\Gamma$.

This choice also guarantees that, away from the saddle,

that hold for $|Im\;z|\geq\eta$, $Im\;w\geq 0$. These bounds follow from (3.19), (3.20) and (3.21); when $w$ is near the real axis, we also used that $\Gamma_{N}$ is away from the $y_{j}$’s.

The integration in $A^{++}$ (see (3.47)) will be divided into regimes near the saddle $q_{N}$ (“inside”) or away from the saddle (“outside”):

Recall that $|q_{N}-q|=o(t)$ and $q=q^{+}=a+ib$ (see (3.24)). For example

where $\chi$ is the characteristic function of the interval $[-\varepsilon,\varepsilon]$ . The other $A$ ’s are defined analogously.

The integral of the exponential term is bounded by

Taking into account (3.48) and (3.49), we see that $|A_{io}|\leq e^{-cNt}$ since $t=N^{-1+\lambda}$ . Similarly we can bound all other terms with an outside part. When $|Re\,z-a|\geq ct\gg N^{-1}$ , then the exponential growth of $h_{N}$ in (3.49) will be controlled by the Gaussian decay of

Finally, we have to compute the contribution of the saddle, i.e. the term $A_{ii}$. We let $\widetilde{\gamma}$ be the part of $\gamma_{N}^{+}$ with $|Re\;\gamma_{N}-a|\leq\varepsilon$ and define $\widetilde{\Gamma}$ similarly. Recall that $Im\;\phi_{N}=0$ on $\widetilde{\gamma}$. By a standard Laplace asymptotics calculation, we have

while the main term in the bracket on the r.h.s. of (3.51) is of order $t^{-1}$ . Analogously performing the ${\rm d}w$ integration, we obtain that

where we also used $g_{N}(q_{N},q_{N})=f_{N}^{\prime\prime}(q_{N})$ following from (3.21). So far we considered the saddle $q_{N}=q_{N}^{+}$ with positive imaginary part for both the $z$ and $w$ integrals. The same calculation can be performed at the saddle $z=w=q_{N}^{-}$ . The mixed case, when $z$ is integrated near one of the saddles and $w$ is near the other one, gives zero contribution, since $g_{N}(q_{N}^{-},q_{N}^{+})=g_{N}(q_{N}^{+},q_{N}^{-})=0$ by (3.21). Adding up the contributions of the two relevant saddles, $z=w=q_{N}^{+}$ and $z=w=q_{N}^{-}$ , taking into account the opposite orientations of the two pieces of $\gamma_{N}$ , one obtains

where we used the choice $r=\text{Re }q^{\pm}_{N}(u_{*})$ (see after (3.48)), which guarantees that $|r-\text{Re}q_{N}^{\pm}|\to 0$ as $N\to\infty$, and the equations (3.16), (3.24), and (3.28). This completes the proof of Proposition 3.3.

Proof of the main theorems

Proof of Theorem 1.1. We follow the notations of Proposition 2.1. In Proposition 3.1 we have shown that the sine kernel holds for the measure $e^{t{\mathcal{L}}}G_{t}$ if $t=N^{-1+\lambda}$. More precisely, let $p_{N,t}(x)$ denote the density function of the eigenvalues $x=(x_{1},\ldots,x_{N})$ w.r.t. $e^{t{\mathcal{L}}}G_{t}$ and let $p_{N,t}^{(2)}$ be the two point correlation function, defined analogously to (1.9). Similarly, we define $p_{N,c}(x)$ and $p_{N,c}^{(2)}$ for the eigenvalue density and two point correlation function w.r.t. the truncated measure $F_{c}=v^{\otimes n}$.

for any $|u|<2$ and with the notation $\varrho=\varrho_{sc}(u)$ . (We remark that $p_{N,t}^{(2)}$ was denoted by $\widetilde{p}_{N}^{(2)}$ in Proposition 3.1 and the condition $|u|<2$ is translated into $|u|<1$ after rescaling.)

To prove (1.11), we thus only need to control the difference as follows

with some $c>0$ as $N\to\infty$ . To estimate $(II)$ , we have

Using (4.1) for the observable $|O|$ instead of $O$ , the second factor on the r.h.s. of (4.2) is bounded. Since $O$ is bounded, the first factor is smaller than

Here in the first step we used that the quantity $D(f,g)=\int|f/g-1|^{2}g$ for two probability measures $f$ and $g$ decreases when taking marginals. In the second step, we used that $D(f,g)$ decreases when passing the probability laws from matrix elements to the induced probability laws for the eigenvalues. Finally, we used the estimate (2.2). This completes the proof of Theorem 1.1.
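The monotonicity of $D(f,g)=\int|f/g-1|^{2}g$ under taking marginals, and the fact that it controls the total variation through the Cauchy-Schwarz inequality, can be illustrated on a finite state space. The following is a minimal sketch with two probability tables on a $2\times 2$ grid; the tables and the function names are hypothetical.

```python
import math


def chi_square(f, g):
    """D(f, g) = sum |f/g - 1|^2 g for probability vectors f and g."""
    return sum((fi - gi) ** 2 / gi for fi, gi in zip(f, g))


def total_variation(f, g):
    """Sum of |f - g| over all states."""
    return sum(abs(fi - gi) for fi, gi in zip(f, g))


def flatten(table):
    """List all entries of a two-dimensional probability table."""
    return [p for row in table for p in row]


def first_marginal(table):
    """Marginal law of the first coordinate (row sums)."""
    return [sum(row) for row in table]


# Hypothetical joint laws of two coordinates.
F_TABLE = [[0.1, 0.3], [0.4, 0.2]]
G_TABLE = [[0.25, 0.25], [0.25, 0.25]]

D_JOINT = chi_square(flatten(F_TABLE), flatten(G_TABLE))
D_MARGINAL = chi_square(first_marginal(F_TABLE), first_marginal(G_TABLE))
TV_JOINT = total_variation(flatten(F_TABLE), flatten(G_TABLE))
```

In this example `D_MARGINAL` does not exceed `D_JOINT`, and `TV_JOINT` is bounded by `math.sqrt(D_JOINT)`, mirroring the two steps used in the estimate above.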

by recalling (3.5). The second term can be estimated by using $|\Lambda|\leq N$ and (3.9) as

For the first term in (4.4), we use the exclusion-inclusion principle to compute

with $\varrho=\varrho(u)$ (see (3.2)) and recall that $\widetilde{p}^{(m)}_{N,y,S}$ denotes the correlation functions of $q_{S}(x,y)$ (see (3.10)). After a change of variables,

where the factor $m$ comes from considering the integration sector $z_{1}\leq z_{j}$ , $j\geq 2$ . Taking $N\to\infty$ and using Proposition 3.3, we get

where in the last determinant term we set $a_{1}=0$ . The interchange of the limit and the summation can be justified by noting that the exclusion-inclusion principle guarantees that (4.6) is an alternating series where the difference between the sum and its $M$ -term truncation can be controlled by the $(M+1)$ -th term for any $M$ . We note that the left hand side of (4.8) is $\int_{0}^{s}p(\alpha){\rm d}\alpha$ , where $p(\alpha)$ is the second derivative of the Fredholm determinant $\det(1-{\mathcal{K}}_{\alpha})$ (see (1.13)). Combining (4.8) with the estimate (4.5), we have

After rescaling (3.1), we also conclude that the limit of the expectation of $\Lambda$ with respect to the time evolved ensemble $e^{t{\mathcal{L}}}G_{t}$ (see Proposition 2.1) is given by the right hand side of (4.9).

Finally, the difference of the expectation of $\Lambda$ with respect to the measure $e^{t{\mathcal{L}}}G_{t}$ and w.r.t. the initial ensemble $F$ vanishes since $|\Lambda|\leq N$ and $\mbox{Var}(e^{t{\mathcal{L}}}G_{t},F)\leq CN^{-2+4\lambda}$ (see (2.1) and (2.2)). This completes the proof of Theorem 1.2.

Some extensions and comments

In this section we explain how to relax some of the conditions on the initial distribution $\nu$ .

We first explain how to extend our proof to include distributions $\nu$ with compact support. Take for example a density w.r.t. the Gaussian measure ${\rm d}\mu(x)=e^{-x^{2}}{\rm d}x$ that is given by a nice bump function $u(x)$ supported in $[-1,1]$ and decaying like $(1\pm x)^{m}$ near the boundary $x=\pm 1$. Clearly, for any fixed $m$ this distribution violates the assumptions of Theorem 1.1. We now show that for $m$ large enough it is still possible to prove the universality. Define a new distribution with density

with a small parameter $\tau>0$ to be determined later. Near the edge $1$ we have $Lq(1-y)\lesssim Cy^{m-2}$ for $0\leq y\ll 1$ with some $m$ -dependent constant $C$ . We thus need the condition

to guarantee that $(1-tL)q$ is a probability density. This inequality holds if

The other conditions concerning $L^{2}$ and $L^{3}$ (see (2.3)) can be handled similarly. Choosing $\tau=Ct^{1/2}$ , the total variation norm is bounded by

Since $n=N^{2}$ and $t=N^{-1+\varepsilon}$ , we have

Let, say, $m\geq 9$ , then the error term will be smaller than $N^{-2-\delta}$ with some $\delta>0$ and this will imply Theorem 1.1 for the initial distribution $u$ . The modification of $u$ in (5.1) can certainly be more sophisticated to reduce the exponent $m$ .

Therefore, Theorem 4.1 of holds with the estimates taking the form