Random matrices: The Universality phenomenon for Wigner ensembles

Terence Tao, Van Vu

Introduction

Random matrix theory is a central topic in probability and mathematical physics, with many connections to various areas such as statistics, number theory, combinatorics, numerical analysis and theoretical computer science.

One of the primary goal of random matrix theory is to derive limiting laws for the eigenvalues and eigenvectors of ensembles of large (n×nn\times n) Hermitian random matricesOne can of course also study non-Hermitian or non-square random matrices, though in the latter case the concept of an eigenvalue needs to be replaced with that of a singular value. In this survey we will focus almost exclusively on the square Hermitian case., in the asymptotic limit nn\to\infty. There are many random matrix ensembles of interest, but to focus the discussion and to simplify the exposition we shall restrict attention to an important model class of ensembles, the Wigner matrix ensembles.

Let n1n\geq 1 be an integer (which we view as a parameter going off to infinity). An n×nn\times n Wigner Hermitian matrix MnM_{n} is defined to be a random Hermitian n×nn\times n matrix Mn=(ξij)1i,jnM_{n}=(\xi_{ij})_{1\leq i,j\leq n}, in which the ξij\xi_{ij} for 1ijn1\leq i\leq j\leq n are jointly independent with ξji=ξij\xi_{ji}=\overline{\xi_{ij}} (in particular, the ξii\xi_{ii} are real-valued). For 1i<jn1\leq i<j\leq n, we require that the ξij\xi_{ij} have mean zero and variance one, while for 1i=jn1\leq i=j\leq n we require that the ξij\xi_{ij} (which are necessarily real) have mean zero and variance σ2\sigma^{2} for some σ2>0\sigma^{2}>0 independent of i,j,ni,j,n. To simplify some of the statements of the results here, we will also assume that the ξijξ\xi_{ij}\equiv\xi are identically distributed for i<ji<j, and the ξiiξ\xi_{ii}\equiv\xi^{\prime} are also identically distributed for i=ji=j, and furthermore that the real and imaginary parts of ξ\xi are independent. We refer to the distributions Reξ{\operatorname{Re}}\xi, Imξ{\operatorname{Im}}\xi, and ξ\xi^{\prime} as the atom distributions of MnM_{n}.

We say that the Wigner matrix ensemble obeys Condition C0 if we have the exponential decay condition

for all 1i,jn1\leq i,j\leq n and tCt\geq C^{\prime}, and some constants C,CC,C^{\prime} (independent of i,j,ni,j,n). We say that the Wigner matrix ensemble obeys condition C1 with constant C0C_{0} if one has

for some constant CC (independent of nn).

Of course, Condition C0 implies Condition C1 for any C0C_{0}, but not conversely.

We refer to the matrix Wn:=1nMnW_{n}:=\frac{1}{\sqrt{n}}M_{n} as the coarse-scale normalised Wigner Hermitian matrix, and An:=nMnA_{n}:=\sqrt{n}M_{n} as the fine-scale normalised Wigner Hermitian matrix.

for some quantity Zn>0Z_{n}>0 depending only on nn, where dMndM_{n} is Haar measure on the vector space of n×nn\times n Hermitian matrices (in the case of GUE) or real symmetric matrices (in the case of GOE), and cc is equal to 1/21/2 (for GUE) or 1/41/4 (for GOE). From (1) we easily conclude that the probability distribution of GUE is invariant with respect to conjugations by unitary matrices, and similarly the probability distribution of GOE is invariant with respect to conjugations by orthogonal matrices. However, a general Wigner matrix ensemble will not enjoy invariances with respect to such large classes of matrices. (For instance, the Bernoulli ensembles described below are only invariant with respect to conjugation by a discrete group of matrices, which include permutation matrices and reflections around the coordinate axes.)

At the opposite extreme from the invariant ensembles are the Bernoulli ensembles, which are discrete instead of continuous. In the real Bernoulli ensemble (also known as symmetric random sign matrices), each of the ξij\xi_{ij} are equal to +1+1 with probability 1/21/2 and 1-1 with probability 1/21/2. In the complex Bernoulli ensemble, the diagonal entries ξii\xi_{ii} still have this distribution, but the off-diagonal entries now take valuesWe use 1\sqrt{-1} to denote the imaginary unit, in order to free up the symbol ii as an index variable. ±12±12\pm\frac{1}{\sqrt{2}}\pm\frac{\sqrt{-1}}{\sqrt{2}}, with each of these four complex numbers occuring with probability 1/41/4.

Many of the results given here have been extended to somewhat broader classes of matrices. For instance, one can consider generalised Wigner ensembles in which entries are not identically distributed; for instance, one can allow the variances σij2\sigma_{ij}^{2} of each entry ξij\xi_{ij} to vary in i,ji,j, and even vanish for some i,ji,j; the latter situation occurs for instance in band-limited random matrices; one can also consider allowing the mean μij\mu_{ij} of the entries ξij\xi_{ij} to be non-zero (this is for instance the situation with the adjacency matrices of Erdős-Renyi random graphs). One can also consider sparse Wigner random matrices, in which only a small (randomly selected) number of entries are non-zero. For simplicity, though, we shall mostly restrict attention in this survey to ordinary Wigner ensembles. We do remark, however, that in all of these generalisations, it remains crucial that the entries ξij\xi_{ij} for 1ijn1\leq i\leq j\leq n are jointly independent, as many of the techniques currently available to control the fine-scale spectral structure of Wigner matrices rely heavily on joint independence.

Given an n×nn\times n Hermitian matrix AA, we denote its nn eigenvalues in increasing orderIt is also common in the literature to arrange eigenvalues in decreasing order instead of increasing. Of course, the results remain the same under this convention except for minor notational changes. as

We also introduce the eigenvalue counting function

Global and local semi-circular laws

We first discuss the coarse-scale spectral structure of Wigner matrices, that is to say the structure of the eigenvalues of WnW_{n} at unit scales (or equivalently, the eigenvalues of MnM_{n} at scale n\sqrt{n}, or AnA_{n} at scale nn). The fundamental result in this topic is the (global) Wigner semi-circular law. Denote by ρsc\rho_{sc} the semi-circle density function with support on $$,

Let MnM_{n} be a Wigner Hermitian matrix. Then for any interval II (independent of nn), one has

in the sense of probability (and also in the almost sure sense, if the MnM_{n} are all minors of the same infinite Wigner Hermitian matrix).

Wigner proved this theorem for special ensembles. The general version above is due to Pastur (see for a detailed discussion). The semi-circular law in fact holds under substantially more general hypotheses than those given in Definition 1, but we will not discuss this matter further here. One consequence of Theorem 5 is that we expect most of the eigenvalues of WnW_{n} to lie in the interval (2+ε,2+ε)(-2+{\varepsilon},2+{\varepsilon}) for ε>0{\varepsilon}>0 small; we shall thus informally refer to this region as the bulk of the spectrum.

An essentially equivalentThis formulation is slightly stronger because it also incorporates the upper bound Wnop2+o(1)\|W_{n}\|_{op}\leq 2+o(1) on the operator norm of WnW_{n}, which is consistent with, but not implied by, the semi-circular law, and follows from the work of Bai and Yin . formulation of the semi-circular law is as follows: if 1in1\leq i\leq n, then one hasWe use the asymptotic notation o(1)o(1) to denote any quantity that goes to zero as nn\to\infty, and O(X)O(X) to denote any quantity bounded in magnitude by CXCX, where CC is a constant independent of nn.

with probability 1o(1)1-o(1) (and also almost surely, if the MnM_{n} are minors of an infinite matrix), where the classical location λicl(Wn)\lambda_{i}^{{\operatorname{cl}}}(W_{n}) of the ithi^{th} eigenvalue is the element of $$ defined by the formula

A remarkable feature of the semi-circular law is its universality: the precise distribution of the atom variables ξij\xi_{ij} are irrelevant for the conclusion of the law, so long as they are normalised to have mean zero and variance one (or variance σ2\sigma^{2}, on the diagonal), and are jointly independent on the upper-triangular portion of the matrix. In particular, continuous matrix ensembles such as GUE or GUE, and discrete matrix ensembles such as the Bernoulli ensembles, have the same asymptotic spectral distribution when viewed at the coarse scale (i.e. by considering eigenvalue ranges of size 1\sim 1 for WnW_{n}, or equivalently of size n\sim\sqrt{n} for MnM_{n} or n\sim n for AnA_{n}).

However, as stated, the semi-circular law does not give good control on the fine-scale behaviour of the eigenvalues, for instance in controlling NI(Wn)N_{I}(W_{n}) when II is a very short interval (of length closer to 1/n1/n than to 11). The fine-scale theory for Wigner matrices is much more recent than the coarse-scale theory given by the semi-circular law, and is the main focus of this survey.

There are several ways to establish the semi-circular law. For invariant ensembles such as GUE or GOE, one can use explicit formulae for the probability distribution of NIN_{I} coming from the theory of determinantal processes, giving precise estimates all the way down to infinitesimally small scales; see e.g. or Section 3 below. However, these techniques rely heavily on the invariance of the ensemble, and do not directly extend to more general Wigner ensembles. Another popular technique is the moment method, based on the basic moment identities

for all k0k\geq 0. The moment method already is instructive for revealing at least one explanation for the universality phenomenon. If one takes expectations in the above formula, one obtains

For those terms for which each edge {ij,ij+1}\{i_{j},i_{j+1}\} appears at most twice, the summand can be explicitly computed purely in terms of the mean and variances of the ξij\xi_{ij}, and are thus universal. Terms for which an edge appears three or more times are sensitive to higher moments of the atom distribution, but can be computed to give a contribution of o(1)o(1) (at least assuming a decay condition such as Condition C0). This already explains universality for quantities such as ENI(Wn){\mathbf{E}}N_{I}(W_{n}) for intervals II of fixed size (independent of nn), at least if one assumes a suitable decay condition on the entries (and one can use standard truncation arguments to relax such hypotheses substantially).

At the edges ±2\pm 2 of the spectrum, the moment method can be pushed further, to give quite precise control on the most extreme eigenvalues of WnW_{n} (which dominate the sum in (5)) if kk is sufficiently large; see and the references therein. However, the moment method is quite poor at controlling the spectrum in the bulk. To improve the understanding of the bulk spectrum of Wigner matrices, Bai (see also [6, Chapter 8]) used the Stieltjes transform method (building upon the earlier work of Pastur ) to show that the speed of convergence to the semi-circle was O(n1/2)O(n^{-1/2}). Instead of working with moments, one instead studied the Stieltjes transform

is the Stieltjes transform of the semi-circular distribution ρsc\rho_{\operatorname{sc}}. The key to the argument is to establish a self-consistent equation for sns_{n}, which roughly speaking takes the form

One can explicitly compute by contour integration that

for zz\neq, where z24\sqrt{z^{2}-4} is the branch of the square root that is asymptotic to zz at infinity, and in particular that

From a stability analysis of the elementary equation (7) and a continuity argument, one can then use (6) to show that

which then implies convergence to the semi-circular law. If one can obtain quantitative control on the approximation in (6), one can then deduce quantitative versions of the semi-circular law that are valid for certain short intervals.

We briefly sketch why one expects the self-consistent equation to hold. One can expand

where ((WnzI)1)ii((W_{n}-zI)^{-1})_{ii} denotes the iithii^{{\operatorname{th}}} entry of the matrix (WnzI)1(W_{n}-zI)^{-1}. Let us consider the i=ni=n term for sake of concreteness. If one expands WnW_{n} as a block matrix

property between the eigenvalues of Mn1M_{n-1} and the eigenvalues of MnM_{n} (which easily follows from the Courant-Fisher minimax formula

for the eigenvalues), one easily obtains an approximation of the form

Similarly for the other diagonal entries ((WnzI)1)ii((W_{n}-zI)^{-1})_{ii}. Inserting this approximation back into (9) gives the desired approximation (6), heuristically at least.

The above argument was optimizedThere are further refinements to this method in Erdős, Yau, and Yin , and Erdős-Knowles-Yau-Yin , that took advantage of some additional cancellation between the error terms in (11) (generalised to indices i=1,,ni=1,\ldots,n) that could be obtained (via the moment method in , , and via decoupling arguments in , ), to improve the error estimates further. These refinements are not needed for the application to Wigner matrices assuming a strong decay condition such as Condition C0, but is useful in generalised Wigner matrix models, such as sparse Wigner matrices in which most of the entries are zero, or in models where one only has a weak amount of decay (e.g. Condition C1 with C0=4+εC_{0}=4+{\varepsilon}). These refinements also lead to the very useful eigenvalue rigidity bound (15). in a sequence of papers by Erdős, Schlein, and Yau (see also [104, Section 5.2] for a slightly simplified proof). As a consequence, one was able to obtain good estimates of the form (8) even when zz was quite close to the spectrum $(e.g.atdistance(e.g. at distanceO(n^{-1+{\varepsilon}})forsomesmallfor some small{\varepsilon}>0,whichinturnleads(bystandardarguments)togoodcontrolontheeigenvaluecountingfunction, which in turn leads (by standard arguments) to good control on the eigenvalue counting functionN_{I}(W_{n})forintervalsfor intervalsIoflengthasshortasof length as short asn^{-1+{\varepsilon}}.(Notethatasthereareonly. (Note that as there are onlyneigenvaluesinall,suchintervalsareexpectedtoonlyhaveabouteigenvalues in all, such intervals are expected to only have aboutO(n^{\varepsilon})$ eigenvalues in them.) Such results are known as local semi-circular laws. A typical such law (though not the strongest such law known) is as follows:

See e.g. [98, Theorem 1.10]. For the most precise estimates currently known of this type (and with the weakest decay hypotheses on the entries), see . ∎

A variant of Theorem 7, which was establishedThe result in actually proves a more precise result that also gives sharp results in the edge of the spectrum, though due to the sparser nature of the λicl(Wn)\lambda_{i}^{\operatorname{cl}}(W_{n}) in that case, the error term Oε(n1+ε)O_{\varepsilon}(n^{-1+{\varepsilon}}) must be enlarged. in the subsequent paper , is the extremely useful eigenvalue rigidity property

valid with overwhelming probability in the bulk range δni(1δ)n\delta n\leq i\leq(1-\delta)n for any fixed δ>0\delta>0 (and assuming Condition C0), and which significantly improves upon (4). This result is key in some of the strongest applications of the theory. See Section 7.7 for the precise form of this result and recent developments.

Roughly speaking, results such as Theorem 7 and (15) control the spectrum of WnW_{n} at scales n1+εn^{-1+{\varepsilon}} and above. However, they break down at the fine scale n1n^{-1}; indeed, for intervals II of length I=O(1/n)|I|=O(1/n), one has nIρsc(x) dx=O(1)n\int_{I}\rho_{{\operatorname{sc}}}(x)\ dx=O(1), while NI(Wn)N_{I}(W_{n}) is clearly a natural number, so that one can no longer expect an asymptotic of the form (14). Nevertheless, local semicircle laws are an essential part of the fine-scale theory. One particularly useful consequence of these laws is that of eigenvector delocalisation:

Let MnM_{n} be a Wigner matrix obeying Condition C0, and let ε>0{\varepsilon}>0. Then with overwhelming probability, one has ui(Wn)ej=O(n1/2+ε)u_{i}(W_{n})^{*}e_{j}=O(n^{-1/2+{\varepsilon}}) for all 1i,jn1\leq i,j\leq n.

Note from Pythagoras’ theorem that j=1nui(Mn)ej2=ui(Mn)2=1\sum_{j=1}^{n}|u_{i}(M_{n})^{*}e_{j}|^{2}=\|u_{i}(M_{n})\|^{2}=1; thus Corollary 8 asserts, roughly speaking, that the coefficients of each eigenvector are as spread out (or delocalised) as possible.

(Sketch) By symmetry we may take ej=ne_{j}=n. Fix ii, and set λ:=λi(Wn)\lambda:=\lambda_{i}(W_{n}); then the eigenvector equation (Wnλ)ui(Wn)=0(W_{n}-\lambda)u_{i}(W_{n})=0 can be expressed using (10) as

and by using concentration of measure tools such as Talagrand’s concentration inequality (see e.g. ), one can then conclude that πV(Xn)1\|\pi_{V}(X_{n})\|\gg 1 with overwhelming probability. This concludes the claim of the Corollary in the bulk case.

The edge case is more delicate, due to the sparser spectrum near λ\lambda. Here, one takes advantage of the identity

In the edge case, the right-hand side is close to ±2\pm 2, and this can be used (together with the local semicircle law) to obtain the lower bound

with overwhelming probability, which gives the claim. See for details. ∎

A slicker approach to eigenvalue delocalisation proceeds via control of the resolvent (or Green’s function) (WnzI)1(W_{n}-zI)^{-1}, taking advantage of the identity

for z=E+iηz=E+i\eta; see for instance for details of this approach. Note from (11) that the Stieltjes transform arguments used to establish the local semicircle law already yield control on quantities such as ((WnzI)1)jj((W_{n}-zI)^{-1})_{jj} as a byproduct.

Fine-scale spectral statistics: the case of GUE

We now turn to the question of the fine-scale behavior of eigenvalues of Wigner matrices, starting with the model case of GUE. Here, it is convenient to work with the fine-scale normalisation An:=nMnA_{n}:=\sqrt{n}M_{n}. For simplicity we will restrict attention to the bulk region of the spectrum, which in the fine-scale normalisation corresponds to eigenvalues λi(An)\lambda_{i}(A_{n}) of AnA_{n} that are near nunu for some fixed 2<u<2-2<u<2 independent of nn.

There are several quantities at the fine scale that are of interest to study. For instance, one can directly study the distribution of individual (fine-scale normalised) eigenvalues λi(An)\lambda_{i}(A_{n}) for a single index 1in1\leq i\leq n, or more generally study the joint distribution of a kk-tuple λi1(An),,λik(An)\lambda_{i_{1}}(A_{n}),\ldots,\lambda_{i_{k}}(A_{n}) of such eigenvalues for some 1i1<<ikn1\leq i_{1}<\ldots<i_{k}\leq n. Equivalently, one can obtain estimates for expressions of the form

where Eε,x1,,xkE_{{\varepsilon},x_{1},\ldots,x_{k}} is the event that there is an eigenvalue of AnA_{n} in each of the intervals [xi,xi+ε][x_{i},x_{i}+{\varepsilon}] for i=1,,ki=1,\ldots,k. Alternatively, one can write

where ρn:=1n!Rn(n)\rho_{n}:=\frac{1}{n!}R_{n}^{(n)} is the symmetrized joint probability distribution of all nn eigenvalues of AnA_{n}.

Note from (18) that control on the expressions (17) implies (in principle, at least) control on the kk-point correlation function by summing over the relevant indices i1,,iki_{1},\ldots,i_{k}; and from eigenvalue rigidity estimates such as (17) we see that (for fixed kk, at least) there are only no(1)n^{o(1)} choices for the kk-tuple (i1,,ik)(i_{1},\ldots,i_{k}) that contribute to this sum.

Informally, for infinitesimal ε>0{\varepsilon}>0, εkρn,u(k)(x1,,xk){\varepsilon}^{k}\rho^{(k)}_{n,u}(x_{1},\ldots,x_{k}) is approximately equal to the probability that there is an eigenvalue in each of the intervals [nu+xiρsc(u),nu+xi+ερsc(u)][nu+\frac{x_{i}}{\rho_{\operatorname{sc}}(u)},nu+\frac{x_{i}+{\varepsilon}}{\rho_{\operatorname{sc}}(u)}] for

The Stieltjes transform sn(z)=trace((Wnz)1)s_{n}(z)=\operatorname{trace}((W_{n}-z)^{-1}) introduced previously is related to the fine-scale normalised eigenvalues by the formula

More generally, one can consider the random variables

The joint distribution of such random variables can be expressed in terms of the kk-point correlations; for instance, one has

Conversely, it is possible (with some combinatorial effort) to control the kk-point correlation function in terms of the joint distribution of such random variables; see [41, §8].

The distribution of the eigenvalue counting functions NI(An)=NI/n(Wn)N_{I}(A_{n})=N_{I/n}(W_{n}) can be expressed in terms of the distribution of the individual eigenvalues or from the correlation function. For instance, one has (for continuous ensembles, at least) the formula

for any k0k\geq 0 (with the convention that (nk)=0\binom{n}{k}=0 whenever n<kn<k).

Finally, to close the circle of relationships, by coupling the eigenvalue counting function NI(Wn)N_{I}(W_{n}) for intervals such as I=[2,x]I=[-2,x] with previously mentioned quantities such as the Stieltjes transforms, as well as additional level repulsion estimates that prevent two consecutive eigenvalues from getting too close to each other too often, one can recover control of individual eigenvalues; see .

It has been generally believed (and in many cases explicitly conjectured; see e.g. [77, page 9]) that the asymptotic statistics for the quantities mentioned above are universal, in the sense that the limiting laws do not depend on the distribution of the atom variables (assuming of course that they have been normalised as stated in Definition 1). This phenomenon was motivated by examples of similarly universal laws in physics, such as the laws of thermodynamics or of critical percolation; see e.g. for further discussion.

It is clear that if one is able to prove the universality of a limiting law, then it suffices to compute this law for one specific model in order to describe the asymptotic behaviour for all other models. A natural choice for the specific model is GUE, as for this model, many limiting laws can be computed directly thanks to the availability of an explicit formula for the joint distribution of the eigenvalues, as well as the useful identities of determinantal processes. For instance, one has Ginibre’s formula

for the joint eigenvalue distribution, as can be verified from (1) and a change of variables; see . From (21) and a downwards induction on kk one can then obtain the Gaudin-Mehta formula

for all 0kn0\leq k\leq n, where KnK_{n} is the kernel

and P0,P1,P_{0},P_{1},\ldots are the Hermite polynomials (thus each PnP_{n} is a degree nn polynomial, with the PnP_{n} being orthonormal with respect to the measure ex2/2 dxe^{-x^{2}/2}\ dx). This formula, combined with the classical Plancherel-Rotach asymptotics for Hermite polynomials, gives the limiting law

locally uniformly in x1,,xkx_{1},\ldots,x_{k} where

and KSineK_{{\operatorname{Sine}}} is the Dyson sine kernel

(with the usual convention that sinxx\frac{\sin x}{x} equals 11 at the origin); see . Standard determinantal process identities then give an asymptotic

for any fixed real numbers a<ba<b and 2<u<2-2<u<2, where the ξj{0,1}\xi_{j}\in\{0,1\} are independent Bernoulli variables with E(ξj)=λj{\mathbf{E}}(\xi_{j})=\lambda_{j}, the λj\lambda_{j} are the eigenvalues of the integral operator T[a,b]:L2([a,b])L2([a,b])T_{[a,b]}:L^{2}([a,b])\to L^{2}([a,b]) defined by

and the convergence is in the sense of probability distributions; see e.g. . Thus, for instance, the probability that the interval [nu+a/ρsc(u),nu+b/ρsc(u)][nu+a/\rho_{\operatorname{sc}}(u),nu+b/\rho_{\operatorname{sc}}(u)] is devoid of eigenvalues converges as nn\to\infty to the Fredholm determinantSee for a more explicit description of this determinant in terms of a solution to an ODE.

Using this formula one can obtain a limiting law for an (averaged) eigenvalue spacing. More precisely, given an intermediate scale parameter tnt_{n} such that tn,n/tnt_{n},n/t_{n}\to\infty as nn\to\infty, define the quantity

Then one can establish for fixed 2<u<2-2<u<2 that

see . Informally, ρ\rho is the asymptotic distribution of the normalised gap ρsc(u)(λi+1(An)λi(An))\rho_{\operatorname{sc}}(u)(\lambda_{i+1}(A_{n})-\lambda_{i}(A_{n})) for typical λi(An)\lambda_{i}(A_{n}) near nunu.

A variant of the computations that lead to (24) (and more precisely, a general central limit theorem for determinantal processes due to Costin-Leibowitz and Soshnikov ) can give a limiting law for NI(An)N_{I}(A_{n}) in the case of the macroscopic intervals I=[nu,+)I=[nu,+\infty). More precisely, one has the central limit theorem

in the sense of probability distributions, for any 2<u<2-2<u<2; see . By using the counting functions N[nu,+)N_{[nu,+\infty)} to solve for the location of individual eigenvalues λi(An)\lambda_{i}(A_{n}), one can then conclude the central limit theorem

whenever λicl(An):=nλicl(Wn)\lambda_{i}^{\operatorname{cl}}(A_{n}):=n\lambda_{i}^{\operatorname{cl}}(W_{n}) is equal to n(u+o(1))n(u+o(1)) for some fixed 2<u<2-2<u<2; see . Informally, this asserts (in the GUE case, at least) that each eigenvalue λi(An)\lambda_{i}(A_{n}) typically deviates by O(logn/ρsc(u))O(\sqrt{\log n}/\rho_{\operatorname{sc}}(u)) around its classical location; this result should be compared with (15), which has a slightly worse bound on the deviation (of shape Oε(nε)O_{\varepsilon}(n^{\varepsilon}) instead of O(logn)O(\sqrt{\log n})) but which holds with overwhelming probability (and for general Wigner ensembles).

The above analysis extends to many other classes of invariant ensembles (such as GOEThere are some further direct relationships between the GOE and GUE eigenvalue distributions that can be used to deduce control of the former from that of the latter; see ., for which the joint eigenvalue distribution has a form similar to (21) (namely, an exponential factor and a power of a Vandermonde determinant). However, the Hermite polynomials are usually replaced by some other family of orthogonal polynomials, and one needs additional tools (such as the theory of Riemann-Hilbert problems) to obtain enough asymptotic control on those polynomials to recover the other results of the type given here. See for further discussion. However, we will not pursue this important aspect of the universality phenomenon for random matrices here, as our focus is instead on the Wigner matrix models.

Extending beyond the GUE case I. Heat flow methods

The arguments used to establish the results in the previous section relied heavily on the special structure of the GUE ensemble, and in particular the fact that the joint probability distribution had a determinantal structure (cf. (22)). To go significantly beyond the GUE case, there are two families of techniques used. One family are the heat flow methods, based on applying an Ornstein-Uhlenbeck process to a Wigner ensemble Mn0M_{n}^{0} to obtain a gauss divisible ensemble MntM_{n}^{t} that is closer to the GUE, and showing that the latter obeys approximately the same statistics as the GUE. The other family of methods are the swapping methods, in which one replaces the entries of one Wigner ensemble MnM_{n} with another ensemble MnM^{\prime}_{n} which are close in some suitable sense (e.g. in the sense of matching moments), and shows that the statistics for both ensembles are close to each other. The two methods are complementary, and many of the strongest results known about universality for Wigner matrices use a combination of both methods.

Our focus shall largely be on the swapping methods (and in particular on the four moment theorem), but in this section we will briefly survey the heat flow techniques. (For a more detailed survey of these methods, see or .)

Let Mn0M_{n}^{0} be a Wigner matrix. One can then define the matrix Ornstein-Uhlenbeck process MntM_{n}^{t} for times t[0,+)t\in[0,+\infty) by the stochastic differential equationOne can omit the normalising term 12Mnt dt-\frac{1}{2}M_{n}^{t}\ dt in this process to obtain a Brownian process rather than an Ornstein-Uhlenbeck process, but we retain the normalising term in order to keep the variance of each (off-diagonal) entry of the matrix MntM_{n}^{t} fixed.

with initial data Mntt=0=Mn0M_{n}^{t}|_{t=0}=M_{n}^{0}, where βt\beta_{t} is a Hermitian matrix process whose entries are standard real Brownian motions on the diagonal and standard complex Brownian motions off the diagonal, with the βt\beta_{t} being independent of Mn0M_{n}^{0}, and with the upper-triangular entries of βt\beta_{t} being jointly independent. A standard stochastic calculus computation shows that MntM_{n}^{t} is distributed according to the law

where GnG_{n} is a GUE matrix independent of Mn0M_{n}^{0}. In particular, the random matrix MntM_{n}^{t} is distributed as Mn0M_{n}^{0} for t=0t=0 and then continuously deforms towards the GUE distribution as t+t\to+\infty. We say that a Wigner matrix is gauss divisible (also known as a Johansson matrix) with parameter tt if it has the distribution of et/2Mn0+(1et)1/2Gne^{-t/2}M_{n}^{0}+(1-e^{-t})^{1/2}G_{n} for some Wigner matrix Mn0M_{n}^{0}; thus MntM_{n}^{t} is gauss divisible with parameter tt. Note that not every Wigner matrix is gauss divisible; among other things, gauss divisible matrices necessarily have a continuous (and in fact smooth) distribution rather than a discrete one. The larger one makes tt, the more restrictive the requirement of gauss divisibility becomes, until in the asymptotic limit t=+t=+\infty the only gauss divisible ensemble remaining is GUE.

The dynamics of the (fine-scale normalised) eigenvalues λi(Ant)=nλi(Mnt)\lambda_{i}(A_{n}^{t})=\sqrt{n}\lambda_{i}(M_{n}^{t}) were famously established by Dyson to be governed by the (normalised) Dyson Brownian motion equations

where B1,,BnB_{1},\ldots,B_{n} are independent standard real Brownian motions.

and we write j\partial_{j} as shorthand for the partial derivative xj\frac{\partial}{\partial x_{j}}. Observe that the Ginibre distribution ρn\rho_{n}^{\infty} defined by (21) is annihilated by DD and is thus an equilibrium state of the Dyson Fokker-Planck equation; this is of course consistent with the convergence of the distribution of MntM_{n}^{t} to the distribution of GUE.

The Dyson Fokker-Planck equation (28) can in fact be solved explicitly by observing the identity

where Δn(x):=1i<jn(xixj)\Delta_{n}(x):=\prod_{1\leq i<j\leq n}(x_{i}-x_{j}) is the Vandermonde determinant, and LL is the Ornstein-Uhlenbeck operator

This allows one to reduce the Dyson Fokker-Planck equation by a change of variables to the Ornstein-Uhlenbeck Fokker-Planck equation tu=Lu\frac{\partial}{\partial t}u=Lu, which has an explicit fundamental solution. Using thisThe derivation of these formulae in is somewhat different, proceeding via the Harish-Chandra/Itzykson-Zuber formula ; however, as noted in that paper, one can also use Dyson Brownian motion to derive the formula, which is the approach taken here. one can obtain an explicit formula for ρnt\rho_{n}^{t} in terms of ρn0\rho_{n}^{0}, and with a bit more effort one can also obtain a (slightly messy) determinantal formula for the associated correlation functions (Rn(k))t(R_{n}^{(k)})^{t}; see , . By exploiting these explicit formulae, Johansson was ableSome additional technical hypotheses were assumed in , namely that the diagonal variance σ2\sigma^{2} was equal to 11, that the real and imaginary parts of each entry of MnM^{\prime}_{n} were independent, and that Condition C1held for some C0>6C_{0}>6. to extend the asymptotic (23) for the kk-point correlation function from GUE to the more general class of gauss divisible matrices with fixed parameter t>0t>0 (independent of nn).

It is of interest to extend this analysis to as small a value of tt as possible, since if one could set t=0t=0 then one would obtain universality for all Wigner ensembles. By optimising Johansson’s method (and taking advantage of the local semi-circle law), Erdős, Peche, Ramirez, Schlein, and Yau were able to extend the universality of (23) (interpreted in a suitably weak convergence topology, such as vague convergence) to gauss divisible ensembles for tt as small as n1+εn^{-1+{\varepsilon}} for any fixed ε>0{\varepsilon}>0.

An alternate approach to these results was developed by Erdős, Ramirez, Schlein, Yau, and Yin , , . The details are too technical to be given here, but the main idea is to use standard tools such as log-Sobolev inequalities to control the rate of convergence of the Dyson Fokker-Planck equation to the equilibrium measure ρn\rho_{n}^{\infty}, starting from the initial data ρn0\rho_{n}^{0}; informally, if one has a good convergence to this equilibrium measure by time tt, then one can obtain universality results for gauss divisible ensembles with this parameter tt. A simple model to gain heuristic intuition on the time needed to converge to equilibrium is given by the one-dimensional Ornstein-Uhlenbeck process

As was already observed implicity by Dyson, the difficulty with the Dyson Fokker-Planck equation (28) (or equivalently, the Dyson Brownian motion (27)) is that different components of the evolution converge to equilibrium at different rates. Consider for instance the trace variable T:=λ1(Ant)++λn(Ant)T:=\lambda_{1}(A_{n}^{t})+\ldots+\lambda_{n}(A_{n}^{t}). Summing up (27) we see that this variable evolves by the Ornstein-Uhlenbeck process

for some standard Brownian motion βt\beta_{t}, and so we expect convergence to equilibrium for this variable in time O(1)O(1). At the other extreme, consider an eigenvalue gap si:=λi+1(Ant)λi(Ant)s_{i}:=\lambda_{i+1}(A_{n}^{t})-\lambda_{i}(A_{n}^{t}) somewhere in the bulk of the spectrum. Subtracting two consecutive cases of (27), we see that

Using the heuristic λj(An)λjcl(An)\lambda_{j}(A_{n})\approx\lambda_{j}^{\operatorname{cl}}(A_{n}), we expect θi\theta_{i} to be of size comparable to nn and sis_{i} to be of size comparable to 11; comparing (30) with (29) we thus expect sis_{i} to converge to equilibrium in time O(1/n)O(1/n).

One can use a standard log-Sobolev argument of Bakry and Emery (exploiting the fact that the equilibrium measure ρn\rho_{n}^{\infty} is the negative exponential of a strictly convex function HH) to show (roughly speaking) that the Dyson Brownian motion converges to global equilibrium in time O(1)O(1); see e.g. . Thus the trace variable TT is among the slowest of the components of the motion to converge to equilibrium. However, for the purposes of controlling local statistics such as the normalised kk-point correlation function

and particularly the averaged normalised kk-point correlation function

Extending beyond the GUE case II. Swapping and the Four Moment Theorem

The heat flow methods discussed in the previous section enlarge the class of Wigner matrices to which GUE-type statistics are known to hold, but do not cover all such matrices, and in particular leave out discrete ensembles such as the Bernoulli ensembles, which are not gauss divisible for any t>0t>0. To complement these methods, we have a family of swapping methods to extend spectral asymptotics from one ensemble to another, based on individual replacement of each coefficient of a Wigner matrix, as opposed to deforming all of the coefficients simultaneously via heat flow.

The simplest (but rather crude) example of a swapping method is based on the total variation distance d(X,Y)d(X,Y) between two random variables X,YX,Y taking values in the same range RR, defined by the formula

where the supremum is over all measurable subsets EE of RR Clearly one has

and thus from (18) we have the somewhat crude bound

for any test function FF. On the other hand, by swapping the entries of MnM_{n} with MnM^{\prime}_{n} one at a time, we see that

We thus see that if d(ξij,ξij)nCd(\xi_{ij},\xi^{\prime}_{ij})\leq n^{-C} for a sufficiently large constant CC (depending on kk), then the kk-point correlation functions of MnM_{n} and MnM^{\prime}_{n} are asymptotically equivalent. This argument was quite crude, costing many more powers of nn than is strictly necessary, and by arguing more carefully one can reduce this power; see . However, it does not seem possible to eliminate the factors of nn entirely from this type of argument.

Another fundamental example of a swapping method is the Lindeberg exchange strategy We would like to thank S. Chatterjee and M. Krisnapur for introducing this method to us, introduced in Lindeberg’s classic proof of the central limit theorem, and first applied to Wigner ensembles in . We quickly sketch that proof here. Suppose that X1,,XnX_{1},\ldots,X_{n} are iid real random variables with mean zero and variance one, and let Y1,,YnY_{1},\ldots,Y_{n} be another set of iid real random variables and mean zero and variance one (which we may assume to be independent of X1,,XnX_{1},\ldots,X_{n}). We would like to show that the averages X1++Xnn\frac{X_{1}+\ldots+X_{n}}{\sqrt{n}} and Y1++Ynn\frac{Y_{1}+\ldots+Y_{n}}{\sqrt{n}} have asymptotically the same distribution, thus

for any smooth, compactly supported function FF. The idea is to swap the entries X1,,XnX_{1},\ldots,X_{n} with Y1,,YnY_{1},\ldots,Y_{n} one at a time and obtain an error of o(1/n)o(1/n) on each such swap. For sake of illustration we shall just establish this for the first swap:

We write X1++Xnn=S+n1/2Xn\frac{X_{1}+\ldots+X_{n}}{\sqrt{n}}=S+n^{-1/2}X_{n}, where S:=X1++Xn1nS:=\frac{X_{1}+\ldots+X_{n-1}}{\sqrt{n}}. From Taylor expansion we see (for fixed smooth, compactly supported FF) that

We then make the crucial observation that SS and XnX_{n} are independent. On taking expectations (and assuming that XnX_{n} has a bounded third moment) we conclude that

But by hypothesis, XnX_{n} and YnY_{n} have matching moments to second order, in the sense that EXni=EYni{\mathbf{E}}X_{n}^{i}={\mathbf{E}}Y_{n}^{i} for i=0,1,2i=0,1,2. Thus, on subtracting, we obtain (31) (with about a factor of n1/2n^{-1/2} to spare; cf. the Berry-Esséen theorem , ).

Note how the argument relied on the matching moments of the two atom variables Xi,YiX_{i},Y_{i}; if one had more matching moments, one could continue the Taylor expansion and obtain further improvements to the error term in (31), with an additional gain of n1/2n^{-1/2} for each further matching moment.

We can apply the same strategy to control expressions such as EF(Mn)F(Mn){\mathbf{E}}F(M_{n})-F(M^{\prime}_{n}), where Mn,MnM_{n},M^{\prime}_{n} are two (independent) Wigner matrices. If one can obtain bounds such as

As it turns out, the numerology of swapping for matrices is similar to that for the central limit theorem, in that each matching moment leads to an additional factor of O(n1/2)O(n^{-1/2}) in the error estimates. From this, one can expect to obtain asymptotics of the form (32) when the entries of Mn,MnM_{n},M^{\prime}_{n} match to second order on the diagonal and to fourth order off the diagonal; informally, this would mean that EF(Mn){\mathbf{E}}F(M_{n}) depends only on the first four moments of the entries (and the first two moments of the diagonal entries). In the case of statistics arising from eigenvalues or eigenvectors, this is indeed the case, and the precise statement is known as the Four Moment Theorem.

We first state the Four Moment Theorem for eigenvalues.

Let k1k\geq 1. Two complex random variables ξ,ξ\xi,\xi^{\prime} are said to match to order kk if one has ERe(ξ)aIm(ξ)b=ERe(ξ)aIm(ξ)b{\mathbf{E}}{\operatorname{Re}}(\xi)^{a}{\operatorname{Im}}(\xi)^{b}={\mathbf{E}}{\operatorname{Re}}(\xi^{\prime})^{a}{\operatorname{Im}}(\xi^{\prime})^{b} whenever a,b0a,b\geq 0 are integers such that a+bka+b\leq k.

In the model case when the real and imaginary parts of ξ\xi or of ξ\xi^{\prime} are independent, the matching moment condition simplifies to the assertion that ERe(ξ)a=ERe(ξ)a{\mathbf{E}}{\operatorname{Re}}(\xi)^{a}={\mathbf{E}}{\operatorname{Re}}(\xi^{\prime})^{a} and EIm(ξ)b=EIm(ξ)b{\mathbf{E}}{\operatorname{Im}}(\xi)^{b}={\mathbf{E}}{\operatorname{Im}}(\xi^{\prime})^{b} for all 0a,bk0\leq a,b\leq k.

We remark that in the papers , , , a variant of the above result, which we called the three moment theorem, was asserted, in which the hypothesis of four matching moments off the diagonal was relaxed to three matching moments (and no moment matching was required on the diagonal), but for which the bound (33) was improved to jG(x)nCjc0|\nabla^{j}G(x)|\leq n^{-Cjc_{0}} for some sufficiently large absolute constant C>0C>0. Unfortunately, the proof given of the three moment theorem in these papers was not correct as stated, although the claim can still be proven in most cases by other means; see Appendix A.

A preliminary version of Theorem 11 was first established by the authors in , in the caseIn the paper, kk was held fixed, but an inspection of the argument reveals that it extends without difficulty to the case when kk is as large as nc0n^{c_{0}}, for c0c_{0} small enough. of bulk eigenvalues (thus δni1,,ik(1δ)n\delta n\leq i_{1},\ldots,i_{k}\leq(1-\delta)n for some absolute constant δ>0\delta>0) and assuming Condition C0 instead of Condition C1. In , the restriction to the bulk was removed; and in , Condition C0 was relaxed to Condition C1 for a sufficiently large value of C0C_{0}. We will discuss the proof of this theorem in Section 6.

The following technical generalization of the Four Moment Theorem, in which the entries of Mn,MnM_{n},M^{\prime}_{n} only match approximately rather than exactly, is useful for some applications.

The conclusions of Theorem 11 continue to hold if the requirement that ξij\xi_{ij} and ξij\xi^{\prime}_{ij} match to order 44 is relaxed to the conditions

whenever a,b0a,b\geq 0 and a+b4a+b\leq 4, where

This proposition follows from an inspection of the proof of Theorem 11: see Section 6.

A key technical result used in the proof of the Four Moment Theorem, which is also of independent interest, is the gap theorem:

Let MnM_{n} be a Wigner matrix obeying Condition C1 for a sufficiently large absolute constant C0C_{0}. Then for every c0>0c_{0}>0 there exists a c1>0c_{1}>0 (depending only on c0c_{0}) such that

We discuss this theorem in Section 6. Among other things, the gap theorem tells us that eigenvalues of a Wigner matrix are usually simple. Closely related level repulsion estimates were established (under an additional smoothness hypothesis on the atom distributions) in .

Another variant of the Four Moment Theorem was subsequently introduced in , in which the eigenvalues λij(An)\lambda_{i_{j}}(A_{n}) appearing in Theorem 11 were replaced by expressions such as (20) that are derived from the resolvent (or Green’s function) (Wnz)1(W_{n}-z)^{-1}, but with slightly different technical hypotheses on the matrices Mn,MnM_{n},M^{\prime}_{n}; see for full details. As the resolvent-based quantities (20) are averaged statistics that sum over many eigenvalues, they are far less sensitive to the eigenvalue repulsion phenomenon than the individual eigenvalues, and as such the version of the Four Moment Theorem for Green’s function has a somewhat simpler proof (based on resolvent expansions rather than the Hadamard variation formulae and Taylor expansion). Conversely, though, to use the Four Moment Theorem for Green’s function to control individual eigenvalues, while possible, requires a significant amount of additional argument; see .

Unfortunately, the eigenvectors ui(Mn)u_{i}(M_{n}) are not unique in either the Hermitian or real symmetric cases; even if one assumes that the spectrum of MnM_{n} is simple, in the sense that

one has the freedom to rotate each ui(Mn)u_{i}(M_{n}) by a unit phase e1θU(1)e^{\sqrt{-1}\theta}\in U(1). In the real symmetric case, in which we force the eigenvectors to have real coefficients, one only has the freedom to multiply each ui(Mn)u_{i}(M_{n}) by a sign ±O(1)\pm\in O(1). There are a variety of ways to eliminate this ambiguity. For sake of concreteness we will remove the ambiguity by working with the orthogonal projections Pi(Mn)P_{i}(M_{n}) to the eigenspace at eigenvalue λi(Mn)\lambda_{i}(M_{n}); if this eigenvalue is simple, we simply have Pi(Mn):=ui(Mn)ui(Mn)P_{i}(M_{n}):=u_{i}(M_{n})u_{i}(M_{n})^{*}.

and Pi,p,q(M)P_{i,p,q}(M) is the pqpq coefficient of the projection Pi(M)P_{i}(M). The bounds are uniform in the choice of i1,,ik,p1,,pk,q1,,qki_{1},\ldots,i_{k},p_{1},\ldots,p_{k},q_{1},\ldots,q_{k}.

We also remark that the Four Moment Theorem for eigenvectors (in conjunction with the eigenvalue rigidity bound (15)) can be used to establish a variant of the Four Moment Theorem for Green’s function, which has the advantage of being applicable all the way up to the real axis (assuming a level repulsion hypothesis); see .

It is a natural question to ask whether the requirement of four matching moments (or four approximately matching moments, as in Proposition 12) is genuinely necessary. As far as the distribution of individual eigenvalues λi(An)\lambda_{i}(A_{n}) are concerned, the answer is essentially “yes”, even in the identically distributed case, as the following result from shows.

Let Mn,MnM_{n},M^{\prime}_{n} be real symmetric Wigner matrices whose atom variables ξ,ξ\xi,\xi^{\prime} have vanishing third moment Eξ3=E(ξ)3=0{\mathbf{E}}\xi^{3}={\mathbf{E}}(\xi^{\prime})^{3}=0 but with distinct fourth moments Eξ4E(ξ)4{\mathbf{E}}\xi^{4}\neq{\mathbf{E}}(\xi^{\prime})^{4}. Then for all sufficiently large nn, one has

for some κ\kappa depending only on the atom distributions.

This result is established by combining a computation of the fourth moment i=1nλi(An)4\sum_{i=1}^{n}\lambda_{i}(A_{n})^{4} with eigenvalue rigidity estimates such as (15). Informally, it asserts that on average, the mean value of λi(An)\lambda_{i}(A_{n}) is sensitive to the fourth moment of the atom distributions at the scale of the mean eigenvalue spacing (which is comparable to 11 in the bulk at least). In contrast, the Four Moment Theorem morallyThis is not quite true as stated, because of the various error terms in the Four Moment Theorem, and the requirement that the function GG in that theorem is smooth. A more accurate statement (cf. the proof of Theorem 18 below) is that if the median of λi(An)\lambda_{i}(A_{n}) is MM (thus P(λi(An)M)=1/2{\mathbf{P}}(\lambda_{i}(A_{n})\leq M)=1/2, in the continuous case at least), then one has P(λi(An)M+nc0),P(λi(An)Mnc0)1/2nc0,{\mathbf{P}}(\lambda_{i}(A^{\prime}_{n})\leq M+n^{-c_{0}}),{\mathbf{P}}(\lambda_{i}(A^{\prime}_{n})\geq M-n^{-c_{0}})\geq 1/2-n^{-c_{0}}, which almost places the median of λi(An)\lambda_{i}(A^{\prime}_{n}) within O(nc0)O(n^{-c_{0}}) of MM. asserts that when the atom variables of MnM_{n} and MnM^{\prime}_{n} match to fourth order, then the median of λi(An)\lambda_{i}(A_{n}) and of λi(An)\lambda_{i}(A^{\prime}_{n}) only differ by O(nc0)O(n^{-c_{0}}). Thus, Theorem 15 and Theorem 11 are not directly comparable to each other. Nevertheless it is expected that the mean and median of λi(An)\lambda_{i}(A_{n}) should be asymptotically equal at the level of the mean eigenvalue spacing, although this is just beyond the known results (such as (15)) on the distribution of these eigenvalues. As such, Theorem 15 provides substantial evidence that the Four Moment Theorem breaks down if one does not have any sort of matching at the fourth moment.

By computing higher moments of λi(An)\lambda_{i}(A_{n}), it was conjectured in that one has an asymptotic of the form

for all ii in the bulk region δni(1δ)n\delta n\leq i\leq(1-\delta)n, where Ci,nC_{i,n} is a quantity independent of the atom distribution ξ\xi. (At the edge, the dependence on the fourth moment is weaker, at least when compared against the (now much wider) mean eigenvalue spacing; see Section 7.1.)

We remark that while the statistics of individual eigenvalues are sensitive to the fourth moment, averaged statistics such as the kk-point correlation functions ρn,u(k)\rho_{n,u}^{(k)} are much less sensitive to this moment (or the third moment). Indeed, this is already visible from the results in Section 4, as gauss divisible matrices can have a variety of possible third or fourth moments for their atom distributions (see Lemma 23).

Sketch of proof of four moment theorem

In this section we discuss the proof of Theorem 11 and Theorem 13, following the arguments that originated in and refined in . To simplify the exposition, we will just discuss the four moment theorem; the proof of the approximate four moment theorem in Proposition 12 is established by a routine modification of the argument.

For technical reasons, the two theorems need to be proven together. Let us say that a Wigner matrix MnM_{n} has the gap property if it obeys the conclusion of Theorem 13; thus Theorem 13 asserts that all Wigner matrices obeying Condition C1 for sufficiently large C0C_{0} have the gap property. We do not know of a direct proof of this result that does not go through the Four Moment Theorem; however, it is possible to establish an independent proof of a more restrictive result:

Any Wigner matrix obeying Condition C0 has the gap property.

We discuss this theorem (which is [104, Theorem 19]) later in this section. Another key ingredient is the following truncated version of the Four Moment Theorem, in which one removes the event that two consecutive eigenvalues are too close to each other. For technical reasons, we need to introduce quantities

for i=1,,ni=1,\ldots,n, which is a regularised measure of extent to which λi(An)\lambda_{i}(A_{n}) is close to any other eigenvalue of AnA_{n}.

Let c0>0c_{0}>0 be a sufficiently small constant. Let Mn=(ξij)1i,jnM_{n}=(\xi_{ij})_{1\leq i,j\leq n} and Mn=(ξij)1i,jnM^{\prime}_{n}=(\xi^{\prime}_{ij})_{1\leq i,j\leq n} be two Wigner matrices obeying Condition C1for some sufficiently large absolute constant C0C_{0}. Assume furthermore that for any 1i<jn1\leq i<j\leq n, ξij\xi_{ij} and ξij\xi^{\prime}_{ij} match to order 44 and for any 1in1\leq i\leq n, ξii\xi_{ii} and ξii\xi^{\prime}_{ii} match to order 22. Set An:=nMnA_{n}:=\sqrt{n}M_{n} and An:=nMnA^{\prime}_{n}:=\sqrt{n}M^{\prime}_{n}, let 1knc01\leq k\leq n^{c_{0}} be an integer, and let

We will discuss the proof of this theorem shortly. Applying Theorem 17 with k=1k=1 and a function GG that depends only a single variable QiQ_{i}, and using the gap property to bound QiQ_{i} (cf. [104, Lemma 49]), one can show a four moment property for the gap theorem: if Mn,MnM_{n},M^{\prime}_{n} are Wigner matrices obeying Condition C1 for a sufficiently large C0C_{0} which match to fourth order, and MnM_{n} obeys the gap property, then MnM^{\prime}_{n} also obeys the gap property. Using this and Theorem 16, one can then obtain Theorem 13 in full generality. Using Theorem 13, one can then deduce Theorem 11 from Theorem 17 by smoothly truncating in the QQ variables: see [104, §3.3].

It remains to establish Theorem 16 and Theorem 17. We begin with Theorem 17. To simplify the exposition slightly, let us assume that the matrices Mn,MnM_{n},M^{\prime}_{n} are real symmetric rather than Hermitian. To reduce the number of parameters, we will also set C0:=1/c0C_{0}:=1/c_{0}.

by telescoping together O(n2)O(n^{2}) estimates of this sort one can establish (40). (For swaps on the diagonal, one only needs an error term of O(n3/2+O(c0))O(n^{-3/2+O(c_{0})}), since there are only O(n)O(n) swaps to be made here rather than O(n2)O(n^{2}). This is ultimately why there are two fewer moment conditions on the diagonal than off it.)

is a (random) Hermitian matrix depending linearlyIf we were working with Hermitian matrices rather than real symmetric matrices, then one could either swap the real and imaginary parts of the ξij\xi_{ij} separately (exploiting the hypotheses that these parts were independent), or else repeat the above analysis with tt now being a complex parameter (or equivalently, two real parameters) rather than a real one. In the latter case, one needs to replace all instances of single variable calculus below (such as Taylor expansion) with double variable calculus, but aside from notational difficulties, it is a routine matter to perform this modification. on a real parameter tt, with A(0)A(0) being a Wigner matrix with one entry (and its adjoint) zeroed out, and A(0)A^{\prime}(0) is the explicit elementary Hermitian matrix

We note the crucial fact that the random matrix A(0)A(0) is independent of both ξpq\xi_{pq} and ξpq\xi^{\prime}_{pq}. Note from Condition C1 that we expect ξpq,ξpq\xi_{pq},\xi^{\prime}_{pq} to have size O(nO(c0))O(n^{O(c_{0})}) most of the time, so we should (heuristically at least) be able to restrict attention to the regime t=O(nO(c0))t=O(n^{O(c_{0})}). If we then set

Suppose that we have Taylor expansions of the form

for all t=O(nO(c0))t=O(n^{O(c_{0})}) and l=1,,kl=1,\ldots,k, where the Taylor coefficients cl,jc_{l,j} have size cl,j=O(nj/2+O(c0)c_{l,j}=O(n^{-j/2+O(c_{0})}, and similarly for the quantities Qil(A(t))Q_{i_{l}}(A(t)). Then by using the hypothesis (39) and further Taylor expansion, we can obtain a Taylor expansion

for the function F(t)F(t) defined in (42), where the Taylor coefficients fjf_{j} have size fj=O(nj/2+O(c0))f_{j}=O(n^{-j/2+O(c_{0})}). Setting tt equal to ξpq\xi_{pq} and taking expectations, and noting that the Taylor coefficients fjf_{j} depend only on FF and A(0)A(0) and is thus independent of ξij\xi_{ij}, we conclude that

and similarly for EF(ξpq){\mathbf{E}}F(\xi^{\prime}_{pq}). If ξpq\xi_{pq} and ξpq\xi^{\prime}_{pq} have matching moments to fourth order, this gives (43). (Note that a similar argument also would give the Three Moment Theorem, as well as Proposition 12.)

It remains to establish (44) (as well as the analogue for Qil(A(t))Q_{i_{l}}(A(t)), which turns out to be analogous). We abbreviate ili_{l} simply as ii. By Taylor’s theorem with remainder, it would suffice to show that

In particular, the eigenvalue λi(A(t))\lambda_{i}(A(t)) is simple, which ensures that all quantities depend smoothly on tt (locally, at least).

To prove (45), one can use the classical Hadamard variation formulae for the derivatives of λi(A(t))\lambda_{i}(A(t)), which can be derived for instance by repeatedly differentiating the eigenvector equation A(t)ui(A(t))=λi(A(t))ui(A(t))A(t)u_{i}(A(t))=\lambda_{i}(A(t))u_{i}(A(t)). The formula for the first derivative is

But recall from eigenvalue delocalisation (Corollary 8) that with overwhelming probability, all coefficients of ui(A(t))u_{i}(A(t)) have size O(n1/2+o(1))O(n^{-1/2+o(1)}); given the nature of the matrix (41), we can then obtain (45) in the j=1j=1 case.

Now consider the j=2j=2 case. The second derivative formula reads

(compare with the formula (27) for Dyson Brownian motion). Using eigenvalue delocalisation as before, we see with overwhelming probability that the numerator is O(n1+o(1))O(n^{-1+o(1)}). To deal with the denominator, one has to exploit the hypothesis (46) and the local semicircle law (Theorem 7). Using these tools, one can conclude (45) in the j=2j=2 case with overwhelming probability.

It turns out that one can continue this process for higher values of jj, although the formulae for the derivatives for λi(A(t))\lambda_{i}(A(t)) (and related quantities, such as Pi(A(t))P_{i}(A(t)) and Qi(A(t))Q_{i}(A(t))) become increasingly complicated, being given by a certain recursive formula in jj. See for details.

Now we briefly discuss the proofThe argument here is taken from . In the case when the atom distributions are sufficiently smooth, one can also deduce this result from the level repulsion argument in [37, Theorem 3.5] and the eigenvalue rigidity estimate (15), and by using Theorem 17 one can extend the gap property to several other Wigner ensembles. However, this argument does not cover the case of Bernoulli ensembles, which is perhaps the most difficult case of Theorem 13 or Theorem 16. of Theorem 16. For sake of discussion we restrict attention to the bulk case εni(1ε)n{\varepsilon}n\leq i\leq(1-{\varepsilon})n; the changes needed to deal with the edge case are relatively minor and are discussed in . The objective here is to limit the probability of the event that the quantity λi+1(An)λi(An)\lambda_{i+1}(A_{n})-\lambda_{i}(A_{n}) is unexpectedly small. The main difficulty here is the fact that one is comparing two adjacent eigenvalues. If instead one was bounding λi+k(An)λi(An)\lambda_{i+k}(A_{n})-\lambda_{i}(A_{n}) for a larger value of kk, say klogCnk\geq\log^{C}n for a large value of CC, then one could obtain such a bound from the local semicircle law (Theorem 7) without much difficulty. To reduce kk all the way down to 11, the idea is to exploit the following phenomenon:

If λi+1(An)λi(An)\lambda_{i+1}(A_{n})-\lambda_{i}(A_{n}) is small, then λi+1(An1)λi1(An1)\lambda_{i+1}(A_{n-1})-\lambda_{i-1}(A_{n-1}) is also likely to be small.

Here An1A_{n-1} denotesStrictly speaking, one has to multiply An1A_{n-1} also by n1n\frac{\sqrt{n-1}}{\sqrt{n}} to be consistent with our conventions for MnM_{n} and Mn1M_{n-1}, although this factor turns out to make very little difference to the analysis. the top left n1×n1n-1\times n-1 minor of AnA_{n}. This phenomenon can be viewed as a sort of converse to the classical Cauchy interlacing law

(cf. (12)), since this law clearly shows that λi+1(An)λi(An)\lambda_{i+1}(A_{n})-\lambda_{i}(A_{n}) will be small whenever λi+1(An1)λi1(An1)\lambda_{i+1}(A_{n-1})-\lambda_{i-1}(A_{n-1}) is. In principle, if one iterates (generalisations of) the above principle k=logCnk=\log^{C}n times, one eventually reaches an event that can be shown to be highly unlikely by the local semicircle law.

To explain why we expect such a phenomenon to be true, let us expand AnA_{n} as

that relates λi(An)\lambda_{i}(A_{n}) to the various eigenvalues λj(An1)\lambda_{j}(A_{n-1}) of An1A_{n-1} (ignoring for sake of discussion the non-generic case when one or more of the denominators in (48) vanish); compare with (16). Using concentration of measure tools (such as Talagrand’s inequality, see e.g. ), one expects uj(An1)X2|u_{j}(A_{n-1})^{*}X|^{2} to concentrate around its mean, which can be computed to be n(n1)n(n-1). In view of this and, one expects the largest (and thus, presumably, the most dominant) terms in (48) to be the summands on the left-hand side when jj is equal to either i1i-1 or ii. In particular, if λi+1(An)λi(An)\lambda_{i+1}(A_{n})-\lambda_{i}(A_{n}) is unexpectedly small (e.g. smaller than nc0n^{-c_{0}}), then by (47) λi(An1)λi(An)\lambda_{i}(A_{n-1})-\lambda_{i}(A_{n}) is also small. This causes the j=ij=i summand in (48) to (usually) be large and positive; to counterbalance this, one then typically expects the j=i1j=i-1 summand to be large and negative, so that λi(An)λi1(An1)\lambda_{i}(A_{n})-\lambda_{i-1}(A_{n-1}) is small; in particular, λi(An1)λi1(An1)\lambda_{i}(A_{n-1})-\lambda_{i-1}(A_{n-1}) is small. A similar heuristic argument (based on (48) but with λi(An)\lambda_{i}(A_{n}) replaced by λi+1(An)\lambda_{i+1}(A_{n}) predicts that λi+1(An1)λi(An1)\lambda_{i+1}(A_{n-1})-\lambda_{i}(A_{n-1}) is also small; summing, we conclude that λi+1(An1)λi1(An1)\lambda_{i+1}(A_{n-1})-\lambda_{i-1}(A_{n-1}) is also small, thus giving heuristic support to the above phenomenon.

One can make the above arguments more rigorous, but the details are rather complicated. One of the complications arises from the slow decay of the term 1λj(An1)λi(An)\frac{1}{\lambda_{j}(A_{n-1})-\lambda_{i}(A_{n})} as ii moves away from jj. Because of this, a large positive term (such as the j=ij=i summand) in (48) need not be balanced primarily by the negative j=i1j=i-1 summand, but instead by a dyadic block i2kj<i2k1i-2^{k}\leq j<i-2^{k-1} of such summands; but this can be addressed by replacing the gap λi+1(An)λi(An)\lambda_{i+1}(A_{n})-\lambda_{i}(A_{n}) by a more complicated quantity (called the regularized gap in ) that is an infimum of a moderately large number of (normalised) gaps λi+(An)λi(An)\lambda_{i_{+}}(A_{n})-\lambda_{i_{-}}(A_{n}). A more serious issue is that the numerators uj(An1)X|u_{j}(A_{n-1})^{*}X| can sometimes be much smaller than their expected value of n\sim n, which can cause the gap at An1A_{n-1} to be significantly larger than that at AnA_{n}. By carefully counting all the possible cases and estimating all the error probabilities, one can still keep the net error of this situation to be of the form O(nc)O(n^{-c}) for some c>0c>0. It is in this delicate analysis that one must rely rather heavily on the exponential decay hypothesis in Condition C0, as opposed to the polynomial decay hypothesis in Condition C1.

This concludes the sketch of Theorem 16. We remarked earlier that the extension to the edge case is fairly routine. In part, this is because the expected eigenvalue gap λi+1(An)λi(An)\lambda_{i+1}(A_{n})-\lambda_{i}(A_{n}) becomes much wider at the edge (as large as n1/3n^{1/3}, for instance, when i=1i=1 or i=n1i=n-1), and so Theorem 16 and Theorem 13 becomes a weaker statement. There is however an interesting “bias” phenomenon that is worth pointing out at the edge, for instance with regard with the interlacing

of the very largest eigenvalues. On the one hand, the gap λn(An)λn1(An)\lambda_{n}(A_{n})-\lambda_{n-1}(A_{n}) between the top two eigenvalues of AnA_{n} is expected (and known, in many cases) to be comparable to n1/3n^{1/3} on the average; see (51) below. On the other hand, from the semi-circular law one expects λn(An)\lambda_{n}(A_{n}) to grow like 2n2n, which suggests that λn(An)λn1(An1)\lambda_{n}(A_{n})-\lambda_{n-1}(A_{n-1}) should be comparable to 11, rather than to n1/3n^{1/3}. In other words, the interlacing (49) is biased; the intermediate quantity λn1(An1)\lambda_{n-1}(A_{n-1}) should be far closer to the right-most quantity λn(An)\lambda_{n}(A_{n}) than the left-most quantity λn1(An)\lambda_{n-1}(A_{n}). This bias can in fact be demonstrated by using the fundamental equation (48); the point is that in the edge case (when ii is close to nn) the term λi(An)-\lambda_{i}(A_{n}) on the right-hand side plays a major role, and has to be balanced by λi(An)λi(An1)\lambda_{i}(A_{n})-\lambda_{i}(A_{n-1}) being as small as O(1)O(1).

This bias phenomenon is not purely of academic interest; it turns out to be an essential ingredient in the proof of eigenvalue delocalisation (Corollary 8) in the edge case, as discussed in Section 2. See for more discussion. It would be of interest to understand the precise relationship between the various eigenvalues in (47) or (49); the asymptotic joint distribution for, say, λi(An)\lambda_{i}(A_{n}) and λi(An1)\lambda_{i}(A_{n-1}) is currently not known, even in the GUE case.

Applications

By combining the heat flow methods with swapping tools such as the Four Moment Theorem, one can extend a variety of results from the GUE (or gauss divisible) regime to wider classes of Wigner ensembles. We now give some examples of such extensions.

One of the simplest instances of the method arises when extending the central limit theorem (26) of Gustavsson for eigenvalues λi(An)\lambda_{i}(A_{n}) in the bulk from GUE to more general ensembles:

The gaussian fluctuation law (26) continues to hold for Wigner matrices obeying Condition C1 for a sufficiently large C0C_{0}, and whose atom distributions match that of GUE to second order on the diagonal and fourth order off the diagonal; thus, one has

whenever λicl(An)=n(u+o(1))\lambda_{i}^{\operatorname{cl}}(A_{n})=n(u+o(1)) for some fixed 2<u<2-2<u<2.

Let MnM^{\prime}_{n} be drawn from GUE, thus by (26) one already has

(note that λicl(An)=λicl(An)\lambda_{i}^{\operatorname{cl}}(A_{n})=\lambda_{i}^{\operatorname{cl}}(A^{\prime}_{n}). To conclude the analogous claim for AnA_{n}, it suffices to show that

for all intervals I=[a,b]I=[a,b], and nn sufficiently large, where I+:=[anc0/10,b+nc0/10]I_{+}:=[a-n^{-c_{0}/10},b+n^{-c_{0}/10}] and I:=[a+nc0/10,bnc0/10]I_{-}:=[a+n^{-c_{0}/10},b-n^{-c_{0}/10}].

On the other hand, one can choose GG to obey (33). Thus by Theorem 11 we have

and the second inequality in (50) follows from the triangle inequality. The first inequality is similarly proven using a smooth function that equals 11 on II_{-} and vanishes outside of II. ∎

In the asymptotic joint distribution of kk distinct eigenvalues λi1(Mn),,λik(Mn)\lambda_{i_{1}}(M_{n}),\ldots,\lambda_{i_{k}}(M_{n}) in the bulk of a GUE matrix MnM_{n} was computed (it is a gaussian kk-tuple with an explicit covariance matrix). By using the above argument, one can extend that asymptotic for any fixed kk to other Wigner matrices, so long as they match GUE to fourth order off the diagonal and to second order on the diagonal.

If one could extend the results in to broader ensembles of matrices, such as gauss divisible matrices, then the above argument would allow some of the moment matching hypotheses to be dropped, using tools such as Lemma 23.

Recently in , a moderate deviations property of the distribution of the eigenvalues λi(An)\lambda_{i}(A_{n}) was established first for GUE, and then extended to the same class of matrices considered in Theorem 18 by using the Four Moment Theorem. An analogue of Theorem 18 for real symmetric matrices (using GOE instead of GUE) was established in .

A similar argument to the one given in Theorem 18 also applies at the edge of the spectrum. For sake of discussion we shall just discuss the distribution of the largest eigenvalue λn(An)\lambda_{n}(A_{n}). In the case of a GUE ensemble, this largest eigenvalue is famously governed by the Tracy-Widom law , which asserts that

Interestingly, the limiting distribution in (51) also occurs in many other seemingly unrelated contexts, such as the longest increasing subsequence in a random permutation .

It is conjectured that the Tracy-Widom law in fact holds for all Wigner matrices obeying Condition C1 with C0=4C_{0}=4; this value of C0C_{0} is optimal, as one does not expect λ1(An)\lambda_{1}(A_{n}) to stay near 2n2\sqrt{n} without this hypothesis (see ). While this conjecture is not yet fully resolved, there has now been a substantial amount of partial progress on the problem . Soshnikov was the first to obtain the Tracy-Widom law for a large class of Wigner matrices; thanks to subsequent refinements in , we know that (51) holds for all Wigner matrices whose entries are iid with symmetric distribution and obeying Condition C1 with C0=12C_{0}=12. On the other hand, by using the argument used to prove Theorem 18, one can also obtain the asymptotic (51) for Wigner matrices obeying Condition C1 for a sufficiently large C0C_{0}, provided that the entries match that of GUE to fourth order. Actually, since the asymptotic (51) applies at scale n1/3n^{1/3} rather than at scale 11, it is possible to modify the arguments to reduce the amount of moment matching required, that one only needs the entries to match GUE to third order (which in particular subsumes the case when the distribution is symmetric); see Appendix A.

More recently, Johansson established (51) for gauss divisible Wigner ensembles obeying Condition C1with the optimal decay condition C0=4C_{0}=4. Combining this result with the three moment theorem (and noting that any Wigner matrix can be matched up to order three with a gauss divisible matrix, see Lemma 23 below), one can then obtain (51) for any Wigner matrix obeying Condition C1for sufficiently large C0C_{0}. An independent proof of this claim (which also applied to generalized Wigner matrix models in which the variance of the entries was non-constant) was also established in . Finally, it was shown very recently in that there is a version of the Four Moment Theorem for the edge that only requires two matching moments, which allows one to establish the Tracy-Widom law for all Wigner matrices obeying Condition C0.

2. Universality of the sine kernel for Hermitian Wigner matrices

We now turn to the question of the extent to which the asymptotic (23), which asserts that the normalised kk-point correlation functions ρn,u(k)\rho_{n,u}^{(k)} converge to the universal limit ρSine(k)\rho^{(k)}_{\operatorname{Sine}}, can be extended to more general Wigner ensembles. A long-standing conjecture of Wigner, Dyson, and Mehta (see e.g. ) asserts (informally speaking) that (23) is valid for all fixed kk, all Wigner matrices and all fixed energy levels 2<u<2-2<u<2 in the bulk. (They also make the same conjecture for random symmetric and random symplectic matrices.) However, to make this conjecture precise one has to specify the nature of convergence in (23). For GUE, the convergence is quite strong (in the local uniform sense), but one cannot expect such strong convergence in general, particularly in the case of discrete ensembles in which ρn,u(k)\rho_{n,u}^{(k)} is a discrete probability distribution (i.e. a linear combination of Dirac masses) and thus is unable to converge uniformly or pointwise to the continuous limiting distribution ρSine(k)\rho^{(k)}_{\operatorname{Sine}}. We will thus instead settle for the weaker notion of vague convergence. More precisely, we say that (23) holds in the vague sense if one has

Vague convergence is not the only notion of convergence studied in the literature. Another commonly studied notion of convergence is averaged vague convergence, in which one averages over the energy parameter uu as well, thus replacing (52) with the weaker claim that

for all 2<E<2-2<E<2. It can be argued (as was done in ) that as the original conjectures of Wigner, Dyson, and Mehta did not precisely specify the nature of convergence, that any one of these notions of convergence would be equally valid for the purposes of claiming a proof of the conjecture. However, the distinction between such notions is not purely a technical one, as certain applications of the Wigner-Dyson-Mehta conjecture are only available if the convergence notion is strong enough.

Consider for instance the gap problem of determining the probability that there is no eigenvalue in the interval [t/2n,t/2n][-t/2n,t/2n] (in other words the distribution of the least singular value). This is an important problem which in the GUE case was studied in and discussed in length in a number of important books in the field, including Mehta’s (see [77, Chapter 5]), Deift’s (see [20, Section 5.4]), Deift-Gioev’s (see [21, Section 4.2]) and Anderson-Guionnet-Zeitouni (see [1, Section 3.5]), Forrester ’s (see [46, Section 9.6]) and the Oxford handbook of matrix theory edited by Akemann et. al. (see [56, Section 4.6] by Anderson). This distribution can be determined from the correlation functions ρn,0(k)\rho^{(k)}_{n,0} at the energy level u=0u=0 by a standard combinatorial argument. In particular, if one has the Wigner-Dyson-Mehta conjecture in the sense of vague convergence, one can make the limiting law in universal over the class of matrices for which that conjecture is verified; see Corollary 29 below. However, if one only knows the Wigner-Dyson-Mehta conjecture in the sense of averaged vague convergence (53) instead of vague convergence (52), one cannot make this conclusion, because the distributional information at u=0u=0 is lost in the averaging process. See also Theorem 28 for a further example of a statistic which can be controlled by the vague convergence version of the Wigner-Dyson-Mehta conjecture, but not the averaged vague convergence version.

For this reason, it is our opinion that a solution of Wigner-Dyson-Mehta conjecture in the averaged vague convergence sense should be viewed as an important partial resolution to that conjecture, but one that falls short of a complete solution to that conjectureIn particular, we view this conjecture as only partially resolved in the real symmetric and symplectic cases, as opposed to the Hermitian cases, because the available results in that setting are either only in the averaged vague convergence sense, or require some additional moment matching hypotheses on the coefficients. Completing the proof of the Wigner-Dyson-Mehta conjecture in these categories remains an interesting future direction of research..

As a consequence of the heat flow and swapping techniques, the Wigner-Dyson-Mehta conjecture for (Hermitian) Wigner matrices is largely resolved in the vague convergence category:

Let MnM_{n} be a Wigner matrix obeying Condition C1 for a sufficiently large absolute constant C0C_{0} which matches moments with GUE to second order, and let 2<u<2-2<u<2 and k1k\geq 1 be fixed. Then (23) holds in the vague sense.

This theorem has been established as a result of a long sequence of partial results towards the Wigner-Dyson-Mehta conjecture , which we will summarise (in a slightly non-chronological order) below; see Remark 24 for a more detailed discussion of the chronology. The precise result stated above was first proven explicitly in , but relies heavily on the previous works just cited. As recalled in Section 3, the asymptotic (23) for GUE (in the sense of locally uniform convergence, which is far stronger than vague convergence) follows as a consequence of the Gaudin-Mehta formula and the Plancherel-Rotach asymptotics for Hermite polynomialsAnalogous results are known for much wider classes of invariant random matrix ensembles, see e.g. , , . However, we will not discuss these results further here, as they do not directly impact on the case of Wigner ensembles..

The next major breakthrough was by Johansson , who, as discussed in Section 4, establshed (23) for gauss divisible ensembles at some fixed time parameter t>0t>0 independent of nn, obtained (23) in the vague sense (in fact, the slightly stronger convergence of weak convergence was established in that paper, in which the function FF in (52) was allowed to merely be LL^{\infty} and compactly supported, rather than continuous and compactly supported). The main tool used in was an explicit determinantal formula for the correlation functions in the gauss divisible case, essentially due to Brézin and Hikami .

In Johansson’s result, the time parameter t>0t>0 had to be independent of nn. It was realized by Erdős, Ramirez, Schlein, and Yau that one could obtain many further cases of the Wigner-Dyson-Mehta conjecture if one could extend Johansson’s result to much shorter times tt that decayed at a polynomial rate in nn. This was first achieved (again in the context of weak convergence) for t>n3/4+εt>n^{-3/4+{\varepsilon}} for an arbitrary fixed ε>0{\varepsilon}>0 in , and then to the essentially optimal case t>n1+εt>n^{-1+{\varepsilon}} (for weak convergence, and (implicitly) in the local L1L^{1} sense as well) in . By combining this with the method of reverse heat flow discussed in Section 5, the asymptotic (23) (again in the sense of weak convergence) was established for all Wigner matrices whose distribution obeyed certain smoothness conditions (e.g. when k=2k=2 one needs a C6C^{6} type condition), and also decayed exponentially. The methods used in were an extension of those in , combined with an approximation argument (the “method of time reversal”) that approximated a continuous distribution by a gauss divisible one (with a small value of tt); the arguments in are based instead on an analysis of the Dyson Brownian motion.

Note from the eigenvalue rigidity property (15) that only a small number of eigenvalues (at most no(1)n^{o(1)} or so) makeStrictly speaking, the results in only establish the eigenvalue rigidity property (15) assuming Condition C0. However, by using the Four Moment Theorem one can relax this to Condition C1 for a sufficiently large C0C_{0}, at the cost of making (15) hold only with high probability rather than overwhelming probability. a significant contribution to the normalised correlation function ρn,u(k)\rho^{(k)}_{n,u} on any fixed compact set, and any fixed uu. Because of this, the Four Moment Theorem (Theorem 11) can be used to show thatVery recently, it was observed in that if one uses the Green’s function Four Moment Theorem from in place of the earlier eigenvalue Four Moment Theorem from at this juncture, then one can reach the same conclusion here without the need to invoke the eigenvalue rigidity theorem, thus providing a further simplification to the argument. if one Wigner matrix MnM_{n} obeyed the asymptotics (23) in the vague sense, then any other Wigner matrix MnM^{\prime}_{n} that matched MnM_{n} to fourth order would also obey (23) in the vague sense, assuming that Mn,MnM_{n},M^{\prime}_{n} both obeyed Condition C1 for a sufficiently large C0C_{0} (so that the eigenvalue rigidity and four moment theorems are applicable).

By combining the above observation with the moment matching lemma presented below, one immediately concludes Theorem 22 assuming that the off-diagonal atom distributions are supported on at least three points.

Let ξ\xi be a real random variable with mean zero, variance one, finite fourth moment, and which is supported on at least three points. Then there exists a gauss divisible, exponentially decaying real random variable ξ\xi^{\prime} that matches ξ\xi to fourth order.

For a proof of this elementary lemma, see [104, Lemma 28]. The requirement of support on at least three points is necessary; indeed, if ξ\xi is supported in just two points a,ba,b, then E(ξa)2(ξb)2=0{\mathbf{E}}(\xi-a)^{2}(\xi-b)^{2}=0, and so any other distribution that matches ξ\xi to fourth order must also be supported on a,ba,b and thus cannot be gauss divisible.

To remove the requirement that the atom distributions be supported on at least three points, one can use the observation from Proposition 12 that one only needs the moments of MnM_{n} and MnM^{\prime}_{n} to approximately match to fourth order in order to be able to transfer results on the distribution of spectra of MnM_{n} to that of MnM^{\prime}_{n}. In particular, if t=n1+εt=n^{-1+{\varepsilon}} for some small ε>0{\varepsilon}>0, then the Ornstein-Uhlenbeck flow MntM^{t}_{n} of MnM_{n} by time tt is already close enough to matching the first four moments of MnM_{n} to apply Proposition 12. The results of give the asymptotic (23) for MntM^{t}_{n}, and the eigenvalue rigidity property (15) then allows one to transfer this property to MnM_{n}, giving Theorem 22.

The above presentation (drawn from the most recent paper ) is somewhat ahistorical, as the arguments used above eemerged from a sequence of papers, which obtained partial results using the best technology available at the time. In the paper , where the first version of the Four Moment Theorem was introduced, the asymptotic (23) was established under the additional assumptions of Condition C0, and matching the GUE to fourth orderIn it was claimed that one only needed a matching condition of GUE to third order, but this was not rigorously proven in that paper due to an issue with the Three Moment Theorem that we discuss in Appendix A.; the former hypothesis was due to the weaker form of the four moment theorem known at the time, and the latter was due to the fact that the eigenvalue rigidity result (15) was not yet established (and was instead deduced from the results of Gustavsson combined with the Four Moment Theorem, thus necessitating the matching moment hypothesis). For related reasons, the paper in (which first introduced the use of Proposition 12) was only able to establish (23) after an additional averaging in the energy parameter uu (and with Condition C0). The subsequent progress in via heat flow methods gave an alternate approach to establishing (23), but also required an averaging in the energy and a hypothesis that the atom distributions be supported on at least three points, although the latter condition was then removed in . In a very recent paper , the exponent C0C_{0} in Condition C1 has been relaxed to as low as 4+ε4+{\varepsilon} for any fixed ε>0{\varepsilon}>0, though still at the cost of averaging in the energy parameter. Some generalisations in other directions (e.g. to covariance matrices, or to generalised Wigner ensembles with non-constant variances) were also established in , , , , , , , , . For instance, it was very recently observed in that by using a variant of the argument from (replacing the asymptotic four moment theorem for eigenvalues by an asymptotic four moment theorem for the Green function) together with a careful inspection of the arguments in and invoking some results from , one can extend Theorem 22 to generalised Wigner ensembles in which the entries are allowed to have variable variance (subject to some additional hypotheses); see for details.

To close this section, we remark that while Theorem 22 is the “right” result for discrete Wigner ensembles (except for the large value of C0C_{0} in Condition C1, which in view of the results in should be reducible to 4+ε4+{\varepsilon}), one expects stronger notions of convergence when one has more smoothness hypotheses on the atom distribution; in particular, one should have local uniform convergence of the correlation functions when the distribution is smooth enough. Some very recent progress in this direction in the k=1k=1 case was obtained by Maltsev and Schlein , .

For results concerning symmetricWith our conventions, the symmetric case is not a sub-case of the Hermitian case, because the matrix would now be required to match GOE to second order, rather than GUE to second order. and symplectic random matrices, we refer to and the references therein. In these cases, the Wigner-Dyson-Mehta conjecture is established in the context of averaged vague convergence; the case of vague convergence remains open even for gauss divisible distributions (although in the case when four moments agree with GOE, one can recover the vague convergence version of the conjecture thanks to the Four Moment Theorem). This appears to be an inherent limitation to the relaxation flow method, at least with the current state of technology. In the Hermitian case, this limitation is overcome by using the explicit formulae of Brezis-Hikami and Johansson (see ), but no such tool appears to be available at present in the symmetric and symplectic cases. It remains an interesting future direction of research to find a way to overcome this limitation and obtain a complete solution to the Wigner-Dyson-Mehta conjecture in these cases, as this could lead to a number of interesting applications, such as the determination of the asymptotic distribution of the least singular value of the adjacency matrix of an Erdős-Renyi graph, which remains open currently despite the significant partial progress (see ) on the Wigner-Dyson-Mehta conjecture for such matrices (though see the recent papers , for some lower bounds relating to this problem).

3. Distribution of the gaps

In Section 2 the averaged gap distribution

was defined for a given energy level 2<u<2-2<u<2 and a scale window tnt_{n} with 1/tn,tn/n1/t_{n},t_{n}/n both going to zero as nn\to\infty. For GUE, it is known that the expected value ESn(s,u,tn){\mathbf{E}}S_{n}(s,u,t_{n}) of this distribution converges as nn\to\infty (keeping uu, ss fixed) to the (cumulative) Gaudin distribution (25); see . Informally, this result asserts that a typical normalised gap ρsc(u)(λi+1(An)λi(An))\rho_{\operatorname{sc}}(u)(\lambda_{i+1}(A_{n})-\lambda_{i}(A_{n})), where λisc(Wn)=u+o(1)\lambda_{i}^{\operatorname{sc}}(W_{n})=u+o(1), is asymptotically distributed on the average according to the Gaudin distribution.

The eigenvalue gap distribution has received much attention in the mathematics community, partially thanks to the fascinating (numerical) coincidence with the gap distribution of the zeros of the zeta functions. For more discussions, we refer to and the references therein. We would like to thank P. Sarnak for enlightening conversations regarding the gap distribution and for constantly encouraging us to work on the universality problem.

It is possible to use an inclusion-exclusion argument to deduce information about ESn(s,u,tn){\mathbf{E}}S_{n}(s,u,t_{n}) from information on the kk-point correlation functions; see e.g. , , , , . In particular, one can establish universality of the Gaudin distribution for the Wigner matrix ensembles considered in those papers, sometimes assuming additional hypotheses due to the averaging of the correlation function in the uu parameter; for instance, in universality is shown for all Wigner matrices obeying Condition C0, assuming that tn/nt_{n}/n decays very slowly to zero, by utilising the universality of the averaged kk-point correlation function established in that paper.

A slightly different approach proceeds by expressing the moments of Sn(s,u,tn)S_{n}(s,u,t_{n}) in terms of the joint distribution of the eigenvalues. Observe that

Replacing the sharp cutoffs in the expectation by smoothed out versions, and applying Theorem 11 (and also using (15) to effectively localise the ii summation to about O(tn)+O(no(1))O(t_{n})+O(n^{o(1)}) indices), we see that if Mn,MnM_{n},M^{\prime}_{n} have four matching moments and both obey Condition C1 for a sufficiently large C0C_{0}, then one has

for suitable choices of decaying quantities o(1)o(1). In particular, if MnM^{\prime}_{n} is known to exhibit Gaudin asymptotics (25) for any 2<u<2-2<u<2 and any tnt_{n} with 1/tn,tn/n=o(1)1/t_{n},t_{n}/n=o(1), then MnM_{n} does as well. In , the Gaudin asymptotics (25) were established for gauss divisible ensembles (with fixed time parameter tt) and any tnt_{n} with 1/tn,tn/n=o(1)1/t_{n},t_{n}/n=o(1), and thus by Lemma 23 and the above argument we conclude that (25) also holds for Wigner matrices obeying Condition C1 for a sufficiently large C0C_{0} whose off-diagonal atom distributions are supported on at least three points. This last condition can be removed by using Proposition 12 as in the previous section (using the results in instead of ), thus giving (25) with no hypotheses on the Wigner matrix other than Condition C1 for a sufficiently large C0C_{0}.

This argument appeared previously in , but at the time the eigenvalue rigidity result (15) was not available, and the Four Moment Theorem required Condition C0 instead of Condition C1, so the statement was weaker, requiring both Condition C0 and a vanishing third moment hypothesis (for the same reason as in Remark 24). By averaging over all uu (effectively setting tn=nt_{n}=n), the need for eigenvalue rigidity could be avoided in (but note that the formulation of the averaged universality for the eigenvalue gap in [104, (5)] is not quite correctMore precisely, instead of Sn(s;x):=1n{1in:xi+1xis},S_{n}(s;x):=\frac{1}{n}|\{1\leq i\leq n:x_{i+1}-x_{i}\leq s\}|, the gap distribution should instead be expressed as Sn(s;x):=1n{1in:xi+1xisρsc(ti)},S_{n}(s;x):=\frac{1}{n}|\{1\leq i\leq n:x_{i+1}-x_{i}\leq\frac{s}{\rho_{\operatorname{sc}}(t_{i})}\}|, where tit_{i}\in is the classical location of the ithi^{\operatorname{th}} eigenvalue: 2tiρsc(x) dx:=in.\int_{-2}^{t_{i}}\rho_{\operatorname{sc}}(x)\ dx:=\frac{i}{n}. as stated, due to the absence of the normalisation by ρsc(u)\rho_{\operatorname{sc}}(u)).

A similar argument also applies to higher moments ESn(s,u,tn)k{\mathbf{E}}S_{n}(s,u,t_{n})^{k} of Sn(s,u,tn)S_{n}(s,u,t_{n}), and so in principle one can also use the moment method to obtain universal statistics for the full distribution of Sn(s,u,tn)S_{n}(s,u,t_{n}), and not just the expectation. However, this argument would require extending the results in , , or to control the distribution (and not just the expectation) of Sn(s,u,tn)S_{n}(s,u,t_{n}) for GUE, gauss divisible matrices (with fixed time parameter), or gauss divisible matrices (with t=n1+εt=n^{-1+{\varepsilon}}) respectively.

It is natural to ask whether the averaging over the window tnt_{n} can be dispensed with entirely. Indeed, one expects that the distribution of the individual normalised eigenvalue gaps ρsc(u)(λi+1(An)λi(An))\rho_{\operatorname{sc}}(u)(\lambda_{i+1}(A_{n})-\lambda_{i}(A_{n})), where 2<u<2-2<u<2 is fixed and λisc(Wn)=u+o(1)\lambda^{\operatorname{sc}}_{i}(W_{n})=u+o(1), should asymptotically converge to the Gaudin distribution in the vague topology, without any averaging in ii. The Four moment theorem allows one to readily deduce such a fact for general Wigner ensembles once one has established it for special ensembles such as GUE or gauss divisible ensembles (and to remove all moment matching and support hypotheses on the Wigner ensemble, one would need to treat gauss divisible ensembles with time parameter equal to a negative power of nn). However, control of the individual normalised eigenvalue gaps in the bulk are not presently in the literature, even for GUE, though they are in principle obtainable from determinantal process methods.

4. Universality of the counting function and gap probability

Using the four moment theorem, one can extend this asymptotic to more general Wigner ensembles.

Consider a Wigner matrix satisfying Condition C1 for a sufficiently large constant C0C_{0}. Let ε>0{\varepsilon}>0 and a<ba<b be independent of nn. For any nn, let u=unu=u_{n} be an element of [2+ε,2ε][-2+{\varepsilon},2-{\varepsilon}]. Then the asymptotic (24) (in the sense of convergence in distribution) holds.

See [106, Theorem 8]. The basic idea is to use the moment method, combining Theorem 22 with the identity

for any k1k\geq 1. We remark that the method also allows one to control the asymptotic joint distribution of several intervals NI1(An),,NIk(An)N_{I_{1}}(A_{n}),\ldots,N_{I_{k}}(A_{n}); for instance, one can show that NI(An)N_{I}(A_{n}) and NJ(An)N_{J}(A_{n}) are asymptotically uncorrelated if I,JI,J have bounded lengths, lie in the bulk, and have separation tending to infinity as nn\to\infty. We omit the details. ∎

Specialising to the case u=0u=0, one can obtain universal behaviour for the probability that MnM_{n} has no eigenvalue in an interval of the form (t/2n,t/2n)(-t/2\sqrt{n},t/2\sqrt{n}) for any fixed tt:

For any fixed t>0t>0, and MnM_{n} satisfy the conditions of Theorem 28, one has

with the asymptotics f(t)=tπt2π2t3π3+O(t4)f(t)=\frac{-t}{\pi}-\frac{t^{2}}{\pi^{2}}-\frac{t^{3}}{\pi^{3}}+O(t^{4}) as t0t\to 0.

In the case of GUE, this was established in [1, Theorem 3.1.2], . The general case then follows from Theorem 28. A weaker extension was established in , assuming matching moments with GUE to fourth order as well as Condition C0. ∎

This corollary is the Hermitian version of the Goldstine-Von Neumann least singular value problem (see for more details). Recently, some additional bounds on the least singular value of Wigner matrices were established in , .

The above universality results concerned intervals whose length was comparable to the fine-scale eigenvalue spacing (which, using the fine-scale normalisation AnA_{n}, is 1/ρsc(u)1/\rho_{\operatorname{sc}}(u)). One can also use the Four Moment Theorem to obtain similar universality results for the counting function on much larger scales, such as NI(Wn)N_{I}(W_{n}), where I:=[y,+)I:=[y,+\infty) with y(2,2)y\in(-2,2) in the bulk. For instance, in the mean and variance of this statistic for GUE matrices was computed as

Combining this with a general central limit theorem for determinantal processes due to Costin and Lebowitz , one obtains

This result was extended to more general Wigner ensembles:

Let MnM_{n} be a Wigner matrix obeying Condition C0 which matches the corresponding entries of GUE up to order 44. Let y(2,2)y\in(-2,2) be fixed, and set I:=[y,+)I:=[y,+\infty). Then the asymptotic (56) hold (in the sense of probability distributions), as do the mean and variance bounds (54).

See [18, Theorem 2]. The first component of this theorem is established using the Four Moment Theorem; the second part also uses the eigenvalue rigidity estimate (15). ∎

5. Distribution of the eigenvectors

By combining the results of , with the four moment theorem for eigenvectors (Theorem 14), one can extend this fact to other Wigner ensembles in . Here is a typical result:

As an example to illustrate Theorem 31, we can take a=an:=1n(1,,1)Sn1a=a_{n}:=\frac{1}{\sqrt{n}}(1,\ldots,1)\in S^{n-1}, and i:=n/2i:=\lfloor n/2\rfloor. Then Theorem 31 asserts that the sum of the entries of the middle eigenvector un/2(Mn)u_{\lfloor n/2\rfloor}(M_{n}) is gaussian in the limit.

6. Central limit theorem for log-determinant

One of most natural and important matrix functionals is the determinant. As such, the study of determinants of random matrices has a long and rich history. The earlier papers on this study focused on the determinant detAn\det A_{n} of the non-Hermitian iid model AnA_{n}, where the entries ζij\zeta_{ij} of the matrix were independent random variables with mean and variance 11 (see for a brief survey of these results).

We turn now to real iid matrices, in which the ζij\zeta_{ij} are jointly independent and real with mean zero and variance one. In , Girko stated that (57) holds for such random matrices under the additional assumption that the fourth moment of the atom variables is 33. Twenty years later, he claimed a much stronger result which replaced the above assumption by the assumption that the atom variables have bounded (4+δ)(4+\delta)-th moment . However, there are several points which are not clear in these papers. Recently, Nguyen and the second author gave a new proof for (57). Their approach also results in an estimate for the rate of convergence and is easily extended to handle to complex case.

The analysis of the above random determinants relies crucially on the fact that the rows of the matrix are jointly independent. This independence no longer holds for Hermitian random matrix models, which makes the analysis of determinants of Hermitian random matrices more challenging. Even showing that the determinant of a random symmetric Bernoulli matrix is non-zero (almost surely) was a long standing open question by Weiss in the 1980s and solved only five years ago ; for more recent developments see [104, Theorem 31], , , . These results give good lower bound for the absolute value of the determinant or the least singular number, but do not reveal any information about the distribution.

Even in the GUE case, it is highly non-trivial to prove an analogue of the central limit theorem (58). Notice that the observation of Goodman does not apply due to the dependence between the rows and so it is not even clear why a central limit theorem must hold for the log-determinant. In , Delannay and Le Caer made use of the explicit distribution of GUE and GOE to prove the central limit theorem for these cases.

While it does not seem to be possible to express the log-determinant of GUE as a sum of independent random variables, in , the authors found a way to approximate the log-determinant as a sum of weakly dependent terms, based on analysing a tridiagonal form of GUE due to Trotter We would like to thank R. Killip for suggesting the use of Trotter’s form.. Using stochastic calculus and the martingale central limit theorem, we gave another proof (see ) for the central limit theorem for GUE and GOE:

Similarly, if MnM_{n} is drawn from GOE rather than GUE, one has

The next task is to extend beyond the GUE or GOE case. Our main tool for this is a four moment theorem for log-determinants of Wigner matrices, analogous to Theorem 11.

for 0j50\leq j\leq 5. Let z0=E+1η0z_{0}=E+\sqrt{-1}\eta_{0} be a complex number with E2δ|E|\leq 2-\delta for some fixed δ>0\delta>0. Then

for some fixed c>0c>0, adopting the convention that G()=0G(-\infty)=0.

The requirements that Mn,MnM_{n},M^{\prime}_{n} be supported on at least three points, and that EE lie in the bulk region E<2δ|E|<2-\delta are artificial, due to the state of current literature on level repulsion estimates. It is likely that with further progress on those estimates that these hypotheses can be removed. The hypothesis that the atom distributions have independent real and imaginary parts is mostly for notational convenience and can also be removed with some additional effort.

By combining Theorem 33 with Theorem 32 we obtain

Let MnM_{n} be a Wigner matrix whose atom distributions ζij\zeta_{ij} are independent of nn, have real and imaginary parts that are independent and match GUE to fourth order, and obey Condition C1 for some sufficiently large C0C_{0}. Then

If MnM_{n} matches GOE instead of GUE, then one instead has

The deduction of this proposition from Theorem 33 and Theorem 32 is standard (closely analogous, for instance, to the proof the central limit theorem for individual eigenvalues) and is omitted. (Notice that in order for the atom variables of MnM_{n} match those of GUE to fourth order, these variables most have at least three points in their supports.)

7. Concentration of eigenvalues

We first discuss the case of the Gaussian Unitary Ensemble (GUE), which is the most well-understood case, as the joint distribution of the eigenvalues is given by a determinantal point process. Because of this, it is known that for any interval II, the random variable NI(Wn)N_{I}(W_{n}) in the GUE case obeys a law of the form

where the ηi=ηi,n,I\eta_{i}=\eta_{i,n,I} are jointly independent indicator random variables (i.e. they take values in {0,1}\{0,1\}); see e.g. [1, Corollary 4.2.24]. The mean and variance of NI(Wn)N_{I}(W_{n}) can also be computed in the GUE case with a high degree of accuracy:

Let MnM_{n} be drawn from GUE, let Wn:=1nMnW_{n}:=\frac{1}{\sqrt{n}}M_{n}, and let I=[,x]I=[-\infty,x] for some real number xx (which may depend on nn). Let ε>0{\varepsilon}>0 be independent of nn.

(Bulk case) If x[2+ε,2ε]x\in[-2+{\varepsilon},2-{\varepsilon}], then

(Variance bound) If one has x[2,2ε]x\in[-2,2-{\varepsilon}] and n2/3(2+x)n^{2/3}(2+x)\to\infty as nn\to\infty, one has

In particular, one has VarNI(Wn)=O(logn)\mathbf{Var}N_{I}(W_{n})=O(\log n) in this regime.

By combining these estimates with a well-known inequality of Bennett (see for details) we obtain a concentration estimate for NI(Wn)N_{I}(W_{n}) in the GUE case:

Let MnM_{n} be drawn from GUE, let Wn:=1nMnW_{n}:=\frac{1}{\sqrt{n}}M_{n}, and let II be an interval. Then one has

From the above corollary we see in particular that in the GUE case, one has

with overwhelming probability for each fixed II, and an easy union bound argument (ranging over all intervals II in, say, $whoseendpointsareamultipleofwhose endpoints are a multiple ofn^{-100}(say))thenshowsthatthisisalsotrueuniformlyin(say)) then shows that this is also true uniformly inI$ as well.

Now we turn from the GUE case to more general Wigner ensembles. As already mentioned, there has been much interest in recent years in obtaining concentration results for NI(Wn)N_{I}(W_{n}) (and for closely related objects, such as the Stieltjes transform sWn(z):=1ntrace(Wnz)1s_{W_{n}}(z):=\frac{1}{n}\operatorname{trace}(W_{n}-z)^{-1} of WnW_{n}) for short intervals II, due to the applicability of such results to establishing various universality properties of such matrices; see . The previous best result in this direction was by Erdős, Yau, and Yin (see also for a variant):

Let MnM_{n} be a Wigner matrix obeying Condition C0, and let Wn:=1nMnW_{n}:=\frac{1}{\sqrt{n}}M_{n}. Then, for any interval II, one has

for all TlogAloglognnT\geq\log^{A\log\log n}n, and some constant A>0A>0.

One can reformulate (61) equivalently as the assertion that

In particular, this theorem asserts that with overwhelming probability one has

for all intervals II. The proof of the above theorem is somewhat lengthy, requiring a delicate analysis of the self-consistent equation of the Stieltjes transform of WnW_{n}.

Comparing this result with the previous results for the GUE case, we see that there is a loss of a double logarithm loglogn\log\log n in the exponent. It has turned out that using the swapping method one can remove this double logarithmic loss, at least under an additional vanishing moment assumptionWe thank M. Ledoux for a conversation leading to this study.

Let MnM_{n} be a Wigner matrix obeying Condition C0, and let Wn:=1nMnW_{n}:=\frac{1}{\sqrt{n}}M_{n}. Assume that MnM_{n} matches moments with GUE to third order off the diagonal (i.e. Reξij,Imξij{\operatorname{Re}}\xi_{ij},{\operatorname{Im}}\xi_{ij} have variance 1/21/2 and third moment zero). Then, for any interval II, one has

This estimate is phrased for any TT, but the bound only becomes non-trivial when TlogCnT\gg\log^{C}n for some sufficiently large CC. In that regime, we see that this result removes the double-logarithmic factor from Theorem 37. In particular, this theorem implies that with overwhelming probability one has

for all intervals II; in particular, for any II, NI(Wn)N_{I}(W_{n}) has variance O(logO(1)n)O(\log^{O(1)}n).

As we are assuming Re(ξij){\operatorname{Re}}(\xi_{ij}) and Im(ξij){\operatorname{Im}}(\xi_{ij}) to be independent, the moment matching condition simplifies to the constraints that ERe(ξij)2=EIm(ξij)2=12{\mathbf{E}}{\operatorname{Re}}(\xi_{ij})^{2}={\mathbf{E}}{\operatorname{Im}}(\xi_{ij})^{2}=\frac{1}{2} and ERe(ξij)3=EIm(ξij)3=0{\mathbf{E}}{\operatorname{Re}}(\xi_{ij})^{3}={\mathbf{E}}{\operatorname{Im}}(\xi_{ij})^{3}=0. However, it is possible to extend this theorem to the case when the real and imaginary parts of ξij\xi_{ij} are not independent.

The constant cc in the bound in Theorem 38 is quite decent in several cases. For instance, if the atom variables of MnM_{n} are Bernoulli or have sub-gaussian tail, then we can set c=2/5o(1)c=2/5-o(1) by optimizing our arguments (details omitted). If we assume 4 matching moments rather than 3, then we can set c=1c=1, matching the bound in Corollary 36. It is an interesting question to determine the best value of cc. The value of cc in is implicit and rather small.

The proof of the above theorem is different from that in in that it only uses a relatively crude analysis of the self-consistent equation to obtain some preliminary bounds on the Stieltjes transform and on NIN_{I} (which were also essentially implicit in previous literature). Instead, the bulk of the argument relies on using the Lindeberg swapping strategy to deduce concentration of NI(Wn)N_{I}(W_{n}) in the non-GUE case from the concentration results in the GUE case provided by Corollary 36. In order to keep the error terms in this swapping under control, three matching momentsCompare with Theorem 11.. We need one less moment here because we are working at “mesoscopic” scales (in which the number of eigenvalues involved is much larger than 11) rather than at “microscopic” scales.

Very roughly speaking, the main idea of the argument is to show that high moments such as

are quite stable (in a multiplicative sense) if one swaps (the real or imaginary part of) one of the entries of WnW_{n} (and its adjoint) with another random variable that matches the moments of the original entry to third order. For technical reasons, however, we do not quite manipulate NI(Wn)N_{I}(W_{n}) directly, but instead work with a proxy for this quantity, namely a certain integral of the Stieltjes transform of WnW_{n}. As observed in , the Lindeberg swapping argument is quite simple to implement at the level of the Stieltjes transform (due to the simplicity of the resolvent identities, when compared against the rather complicated Taylor expansions of individual eigenvalues used in ).

As a corollary, we obtain the following rigidity of eigenvalues result, improving upon (15) when one has a matching moment hypothesis:

Let MnM_{n} be a Wigner matrix obeying Condition C0, and let Wn:=1nMnW_{n}:=\frac{1}{\sqrt{n}}M_{n}. Assume that MnM_{n} matches moments with GUE to three order off the diagonal and second order on the diagonal. Then for any ii in the bulk

for any T>0T>0, where the classical location γi\gamma_{i}\in is defined by the formula

This corollary improves [42, Theorem 2.2] as it allows TT to be as small as logO(1)n\log^{O(1)}n, instead of logO(loglogn)n\log^{O(\log\log n)}n, under the extra third moment assumption. In particular, in the Bernoulli case, this shows that the variance of the bulk eigenvalues is of order logO(1)n/n\log^{O(1)}n/n. We believe that this is sharp, up to the hidden constant in O(1)O(1).

This corollary also significantly improves [104, Theorem 29]. (As a matter of fact, the original proof of this theorem has a gap in it; see Appendix A for a further discussion.) One can have analogous results for the edge case, under four moment assumption; see for details.

Open questions

While the universality of many spectral statistics of Wigner matrices have now been established, there are still several open questions remaining. Some of these have already been raised in earlier sections; we collect some further such questions in this section.

In one direction, one can continue generalising the class of matrices for which the universality result holds, for instance by lowering the exponent C0C_{0} in Condition C1, and allowing the entries ξij\xi_{ij} of the Wigner matrix to have different variances σij2\sigma^{2}_{ij}, or for the Wigner matrix to be quite sparse. For recent work in these directions, see , , , . With regards to the different variances case, one key assumption that is still needed for existing arguments to work is a spectral gap hypothesis, namely that the matrix of variances (σij2)1i,jn(\sigma_{ij}^{2})_{1\leq i,j\leq n} has a significant gap between its largest eigenvalue and its second largest one; in addition, for the most complete results one also needs the variances to be bounded away from zero. This omits some interesting classes of Wigner-type matrices, such as those with large blocks of zeroes. However, the spectral statistics of p+n×p+np+n\times p+n matrices of the form

for rectangular (p×np\times n) matrices with iid matrices MM are well understood, as the problem is equivalent to that of understanding the singular values of MM (or the eigenvalues of the covariance matrix MMMM^{*}); in particular, analogues of the key tools discussed here (such as the four moment theorem, the local semicircle law, and heat flow methods) are known , ; this suggests that other block-type variants of Wigner matrices could be analysed by these methods. A related problem would be to understand the spectral properties of various self-adjoint polynomial combinations of random matrices, e.g. the commutator ABBAAB-BA of two Wigner matrices A,BA,B. The global coarse-scale nature of the spectrum for such matrices can be analysed by the tools of free probability , but there are still very few rigorous results for the local theory.

Another direction of generalisation is to consider generalised Wigner matrices whose entries have non-zero mean, or equivalently to consider the spectral properties of a random matrix Mn+DnM_{n}+D_{n} that is the sum of an ordinary Wigner matrix MnM_{n} and a deterministic Hermitian matrix DnD_{n}. Large portions of the theory seem amenable to extension in this direction, although the global and local semicircular law would need to be replaced by a more complicated variant (in particular, the semicircular distribution ρsc\rho_{\operatorname{sc}} should be replaced by the free convolution of ρsc\rho_{\operatorname{sc}} with the empirical spectral distribution of DnD_{n}, see ).

In yet another direction, one could consider non-Hermitian analogues of these problems, for instance by considering the statistics of eigenvalues of iid random matrices (in which the entries are not constrained to be Hermitian, but are instead independent and identically distributed). The analogue of the semicircular law in this setting is the circular law, which has been analysed intensively in recent years (see for a survey). There are in fact a number of close connections between the Hermitian and non-Hermitian ensembles, and so it is likely that the progress in the former can be applied to some extent to the latter.

Another natural question to ask is to see if the universality theory for Wigner ensembles can be unified in some way with the older, but very extensively developed, universality theory for invariant ensembles (as covered for instance in ). Significant progress in this direction has recently been achieved in , in which heat flow methods are adapted to show that the local spectral statistics of β\beta-ensembles are asymptotically independent of the choice of potential function (assuming some analyticity conditions on the potential). This reduces the problem to the gaussian case when the potential is quadratic, which can be handled by existing methods, similarly to how the methods discussed here reduce the statistics of general Wigner matrices to those of invariant ensembles such as GUE or GOE. Note though that these techniques do not provide an independent explanation as to why these invariant ensembles have the limiting statistics they do (e.g. governed by the sine determinantal process in the bulk, and the Airy determinantal process in the edge, in the case of GUE); for that, one still needs to rely on the theory of determinantal processes.

Returning now to Wigner matrices, one of the major limitations of the methods discussed here is the heavy reliance on the hypothesis that the (upper-triangular) entries are jointly independent; even weak coupling between entries makes many of the existing methods (such as the swapping technique used in the Four Moment Theorem, or the use of identities such as (11)) break down. A good test case would be the asymptotic statistics of the adjacency matrices of random regular graphs, where the fixed degree dd is a constant multiple of nn, such as n/2n/2. This is essentially equivalent to a Wigner matrix model (such as the real symmetric Bernoulli matrix ensemble) in which the row and column sums have been constrained to be zero. For this model, the global semicircular law and eigenvector delocalisation has recently been established for such matrices; see , .

Recall that the Central limit theorem for the log-determinant of non-Hermitian matrices requires only two moment matching. However, in the Hermitian case, Corollary 34 requires four matching moments. We believe that this requirement can be weakened. For instance, the central limit theorem must hold for random Bernoulli matrices.

Another interesting problem is to determine the distribution of bulk eigenvalues of a random Bernoulli matrix. This matrix has only three matching moments, but perhaps a central limit theorem like Theorem 18 also holds here. We have proved that the variance of any bulk eigenvalue is logO(1)n/n\log^{O(1)}n/n. A good first step would be to determine the right value of the hidden constant in O(1)O(1).

Appendix A Some errata

In this appendix we report an issue with the Three Moment Theorem, which first appeared as the second conclusion of [104, Theorem 15]. We extract this theorem from that paper for reference:

Unfortunately, the proof given of the Three Moment Theorem in [104, §3.3] is not correct, although as we shall see shortly, the theorem can be proven by other means, and the proof itself can be repaired in some other instances. The closely related Four Moment Theorem, which is of more importance in most applications, is not affected by this problem.

The issue affects some subsequent papers , , , , where the Three Moment Theorem (or a variant thereof) was used, and we will discuss the alterations needed to correct those papers below.

In the proof of the Four Moment Theorem in [104, §3.3], the analogue of the function GG that appears in Theorem 42 is replaced with a truncated variant

for j=0,,5j=0,\ldots,5. These bounds are sufficient to establish the Four Moment Theorem (the first part of [104, Theorem 15]), by invoking the first part of [104, Proposition 46]. However, for the purposes of establishing the Three Moment Theorem (the second part of [104, Theorem 15]), these bounds are not sufficient to be able to invoke the second part of [104, Proposition 46], and so the proof breaks down in this case.

A similar issue also affects the Three Moment Theorem in the bulk of the spectrum for covariance matrices ([105, Theorem 6]), the Three Moment Theorem at the edge of the spectrum of Wigner matrices [98, Theorem 1.5], and the Three Moment Theorem at the edge of the spectrum for covariance matrices [121, Theorem 1.5].

A.2. First fix: strengthen the gap property

The three moment theorem is most useful when applied to eigenvalues at the edge. In this case, we can reprove this theorem by modifying the argument in as follows.

The proof of the Four Moment Theorem relies on the gap property (Theorem 13). Suppose temporarily that one could improve this property to the bound

The same statement holds at the other edge of the spectrum.

This method allows one reprove one of the main applications of the three moment theorem, [99, Theorem 1.13], which establishes the universality of spectral statistics at the edge of a Wigner matrix with vanishing third moment.

In Theorem 43, the vanishing third moment assumption was needed only to guarantee (65), since GUE satisfies this bound. In a recent paper Johansson considered gauss divisible matrices. While not stated explicitly, his analysis seems to show that the convergence rate of the density of states of gauss divisible matrices at the edge is polynomial, and this would imply that (65) also holds for this larger class of matrices. As a consequence, one would be able to remove the vanishing third moment assumption and Theorem 43 and also obtain [63, Theorem 1.4].

In a similar manner, one can recover the covariance matrix universality in (after one verifies the required polynomial convergence rate of the density of states at the edge for Wishart ensembles (or for gauss divisible covariance matrices), which can be extracted from the analysis in ).

A.3. Second fix: use the Knowles-Yin Two Moment Theorem

Another way to handle the above applications is to use the very recent Two Moment Theorem at the edge of Knowles and Yin [66, Theorem 1.1]. The conclusion of this theorem is stronger than that of the Three Moment Theorem for the edge, for it only requires two matching moments rather than three (and it can also control the eigenvectors as well as the eigenvalues, cf. ). However, its assumption is more restricted, as the hypotheses on GG are stronger than (66), being adapted to the scale n1/3n^{1/3} of the mean eigenvalue spacing at the edge, rather than to the scale nCc0n^{Cc_{0}}. As such, the Two Moment Theorem at the edge can be used as a substitute for the Three Moment Theorem to establish universality for the distribution of kk eigenvalues at the edge for any fixed k=O(1)k=O(1), as was done in and . In principle, this also recovers the analogous universality results for covariance matrices in , although this would require extending the Knowles-Yin Two Moment Theorem in to the covariance matrix case; this is almost certainly a routine matter, but is not currently in the literature.

A.4. Third fix: use of eigenvalue rigidity

This fix applies to the original Three Moment Theorem in the bulk for Wigner matrices (Theorem 42). One can reprove this theorem using the powerful eigenvalue rigidity result (15). From this bound and (62) (with sufficiently large CC) we conclude that

with overwhelming probability, and similarly for AnA^{\prime}_{n}. Since AnA_{n} and AnA^{\prime}_{n} have the same classical locations, the three moment theorem follows.

This fix can be used to resolve all the places where the Three Moment Theorem from are invoked. Of course, in all such cases stronger results can be deduced directly from the subsequent results of .

In the case when the third moment is zero, one can also use Corollary 41 instead of (15). The point is that the proof of this corollary (which relies on the swapping method) is much simpler than that of (15).

In principle, this fix should also repair the Three Moment Theorem in the bulk for covariance matrices [105, Theorem 6]. However, the required eigenvalue rigidity result for covariance matrices in the bulk is not yet in the literature, though it can almost certainly be establishedThere is a technical issue, namely that the augmented matrix associated to a covariance matrix does not quite obey the spectral gap hypothesis from [42, Assumption (B)] due to a double eigenvalue at 11, but this is likely to be an artificial obstruction, given that much of the rest of the universality theory for Wigner matrices is already known to extend to the covariance matrix case (see e.g. , , ). by modifying the arguments in . In any event, this case is not of particular importance because, to the authors knowledge, there are no applications of the Three Moment Theorem in the bulk for covariance matrices that are explicitly stated in the literature at this time.

In , it was observed that the exponential decay condition C0{\bf C0} in the Four Moment Theorem could be relaxed to a finite moment condition C1{\bf C1} for a sufficiently large moment exponent. If one could obtain a similar relaxation for the eigenvalue rigidity result in , then the above argument could then be used to recover the Three Moment Theorem in this case also. In fact, one can achieve this using the Four Moment Theorem. If MnM_{n} is a matrix obeying C1, then it can be matched to fourth order (or to approximate fourth order, as in ) to a matrix MnM^{\prime}_{n} obeying C0. The latter matrix obeys (15) with overwhelming probability; applying the Four Moment Theorem (cf. the proof of [104, Theorem 32]) we conclude that the former matrix also obeys (15) with high probability, and one can then adapt the preceding arguments.

At the edge of the spectrum, the eigenvalue rigidity property is adapted to a broader scale than that used in the Three Moment Theorem; with respect to the fine-scale normalisation AnA_{n}, the latter is adapted to the scale n1/3n^{1/3} at the edge whilst the former is adapted to the scale nCc0n^{Cc_{0}}. As such, the eigenvalue rigidity property cannot be directly used as a fix for applications at the edge.

A.5. Corrections to specific papers

In this section we record specific statements in a number of papers that need repairing, and which of the above fixes are applicable to these papers.

The second part of [104, Theorem 15] (i.e. the Three Moment Theorem) is true, but the proof requires a result that occurred after the publication of this paper, namely the eigenvalue rigidity result in ). References to this part of the theorem in [104, §3.3] should be deleted.

Similarly for [104, Theorem 32]. This theorem is true, but its proof requires using the rigidity result in or Corollary 41.

Similarly for the asymptotic for the determinant in [104, Theorem 34]. In fact, by using the results in , one can even drop the third moment assumption.

The second part of [105, Theorem 6] (i.e. the Three Moment Theorem) should be used with care. This statement would hold if one has a rigidity result for eigenvalues of Wishart matrices (similar to those in ). We believe the proof of such a result would be a routine, but rather tedious, modification of the proof in . If we assume that the third moment vanishes (which is the case in many application), then it suffices to obtain an analogue of Corollary 41 and the proof of this would be simpler.

The eigenvalue localisation result in [102, Theorem 1.4] now requires MnM_{n} to match GUE to fourth order rather than third order. Again, the moment matching hypotheses can be dropped entirely if one is willing to use the results in .

The second part of [98, Theorem 1.13] should add the additional hypothesis that Mn,MnM_{n},M^{\prime}_{n} match GUE to third order (or at least obey the improved gap condition (65) with high probability). The proof of this theorem then needs to be modified as per Section A.2.

In the proof of [63, Theorem 1.4], one needs to replace the invocation of the Three Moment Theorem with the more complicated argument indicated in Section A.2. Alternatively, one can use here the Two Moment Theorem of Knowles and Yin .

The second part of [121, Theorem 1.5] should add the additional hypothesis that Mn,MnM_{n},M^{\prime}_{n} match the Wishart ensemble to third order (or at least obey the improved gap condition (65) with high probability). The proof of this theorem then needs to be modified as per Section A.2.

References