Delocalization and Diffusion Profile for Random Band Matrices

Laszlo Erdos, Antti Knowles, Horng-Tzer Yau, Jun Yin

Introduction

Typically, $W$ is a mesoscopic scale, larger than the lattice spacing but smaller than the diameter $L$ of the system: $1\ll W\ll L$ . These models are natural interpolations between random Schrödinger operators with short range quantum transitions such as the Anderson model And and mean-field random matrices such as Wigner matrices Wig . In particular, random band matrices may be used to model the Anderson metal-insulator phase transition, which we briefly outline.

The analysis of the trace of a single Green function yields the limiting spectral density of $H$, which is the Wigner semicircle law provided the band width $W$ diverges as $L\to\infty$. For band matrices, the semicircle law on large scales, corresponding to a spectral parameter $\eta>0$ independent of $N$, was established in MPK. More recently, a semicircle law on small scales, in which $\eta\ll 1$, was derived in EYY1 and generalized in EKYY4. The results of EYY1; EKYY4 are summarized in Lemma 3.4 below. As an application of our method, we prove a further improvement of the semicircle law in Theorem 2.2 below.

The main new ingredient in this paper is the self-consistent equation for the matrix $T$, whose entries

$$T_{xy}\mathrel{\mathop{:}}=\sum_{i}s_{xi}\lvert G_{iy}\rvert^{2}$$

are local averages of $|G_{xy}|^{2}$. We show in Theorem 4.1 below that $T$ satisfies a self-consistent equation of the form

where $D$ is the matrix of second moments of $f$ (see (8.1) below). In order to give the leading-order behaviour of $T$ , we use $|m|^{2}=1-\alpha\eta+O(\eta^{2})$ (see (3.5) below), where

Therefore the Fourier transform of $T$ is approximately given by

in the regime $\lvert p\rvert\ll W^{-1}$ and $\eta\ll 1$ . This corresponds to the diffusion approximation on scales larger than $W$ with an effective diffusion constant $D_{\rm eff}$ . In the language of diagrammatic perturbation theory, the change from $D$ to $D_{\rm eff}$ has the interpretation of a self-energy renormalization. This result coincides with Equation (1.5.5) of Sp , which was obtained by computing the sum of ladder diagrams in a high-moment expansion.

The main result of this paper is a justification of this heuristic argument in a certain range of parameters. The error term ${\mathcal{E}}$ contains fluctuations of local averages. Roughly speaking, we need to control the size of $\sum_{x}\big{[}|G_{xy}|^{2}-P_{x}|G_{xy}|^{2}\big{]}$, where $P_{x}$ denotes partial expectation with respect to the matrix entries in the $x$-th row (see $\widetilde{T}_{xy}$ in (4.5) below). Unfortunately, $|G_{xy}|^{2}$ and $|G_{x^{\prime}y}|^{2}$ for $x\neq x^{\prime}$ are not independent; in fact, they are strongly correlated for small $\eta$. Estimating high moments of these averages requires unwrapping the hierarchical correlation structure among several resolvent matrix entries. The necessary estimates are quite involved. They are a special case of the more general Fluctuation Averaging Theorem, which is published separately in EKY2 and was originally developed for application in the present paper. There have been several previous results in this direction; see (EYY2, Lemma 5.2), (EYY3, Lemma 4.1), (EKYY1, Theorem 5.6), and (PY, Theorem 3.2). The Fluctuation Averaging Theorem generalizes these ideas to arbitrary monomials of $G$ and exploits an additional cancellation mechanism in averages of $|G_{xy}|^{2}$ that is not present in averages of $G_{xx}$. For more details, see EKY2.

Formulation of the results

for some fixed $\delta>0$ . The parameter $L$ is the fundamental large quantity of our model. Define the $d$ -dimensional discrete torus

where $Z_{L,W}$ is a normalization constant chosen so that $S$ is a stochastic matrix:

In particular, we may consider the two classical symmetry classes of random matrices: real symmetric and complex Hermitian. For real symmetric band matrices we assume

For complex Hermitian band matrices we assume

in addition to (2.5). A common way to satisfy (2.9) is to choose the real and imaginary parts of $\zeta_{ij}$ to be independent with identical variance. As in EKY2 , our results also hold without this assumption, but we omit the details of this generalization to avoid needless complications.

From the definition of $S$ it is easy to see that $Z_{L,W}=W^{d}+O(W^{d-1})$. In particular,

The following definition introduces a notion of a high-probability bound that is suited for our purposes.

for large enough $N\geqslant N_{0}(\varepsilon,D)$ . Unless stated otherwise, throughout this paper the stochastic domination will always be uniform in all parameters apart from the parameter $\delta$ in (2.1) and the sequence of constants $\mu_{p}$ in (2.11); thus, $N_{0}(\varepsilon,D)$ also depends on $\delta$ and $\mu_{p}$ . If $X$ is stochastically dominated by $\Psi$ , uniformly in $u$ , we use the equivalent notations

For example, using Chebyshev’s inequality and (2.11) one easily finds that

so that we may also write $h_{ij}=O_{\prec}((s_{ij})^{1/2})$ . The relation $\prec$ satisfies the familiar algebraic rules of order relations. The general statements are formulated later in Lemma 3.3.
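For concreteness, the following minimal Python sketch builds the variance matrix $S$ and samples a real symmetric band matrix in $d=1$. The flat profile $f(u)=\tfrac12\mathbf{1}\{|u|\leqslant 1\}$ (chosen so that $\int f=1$), the Gaussian choice of $\zeta_{ij}$, the equal diagonal variance, and the helper names `band_profile` and `sample_real_band` are illustrative assumptions, not taken from the text. It checks that $S$ is stochastic and symmetric, that $Z_{L,W}=W+O(1)$ for this $f$, and that the ratio $|h_{ij}|/s_{ij}^{1/2}$ stays of order one over the band, in line with $h_{ij}=O_{\prec}((s_{ij})^{1/2})$.

```python
import numpy as np


def band_profile(N: int, W: int) -> tuple[np.ndarray, float]:
    """Return (S, Z) with s_xy = f([x - y]_N / W) / Z on the torus {0, ..., N-1}.

    Hypothetical profile: f(u) = 1/2 for |u| <= 1 and 0 otherwise (so that
    the integral of f is 1). Requires 2W + 1 <= N.
    """
    x = np.arange(N)
    d = np.abs(x[:, None] - x[None, :])
    d = np.minimum(d, N - d)               # periodic distance |[x - y]_N|
    F = np.where(d <= W, 0.5, 0.0)         # f([x - y]_N / W)
    Z = float(F[0].sum())                  # identical for every row by translation invariance
    return F / Z, Z


def sample_real_band(S: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Real symmetric h_ij = s_ij^{1/2} zeta_ij with i.i.d. standard Gaussian zeta_ij (i <= j).

    For simplicity the diagonal uses the same variance s_ii (an assumption).
    """
    zeta = rng.standard_normal(S.shape)
    zeta = np.triu(zeta) + np.triu(zeta, 1).T    # enforce zeta_ji = zeta_ij
    return np.sqrt(S) * zeta


N, W = 200, 10
S, Z = band_profile(N, W)
H = sample_real_band(S, np.random.default_rng(0))
band = S > 0
ratio = np.abs(H[band]) / np.sqrt(S[band])

print("Z - W =", Z - W)                          # 0.5 for this profile
print("row sums of S in", S.sum(axis=1).min(), S.sum(axis=1).max())
print("max |h_ij| / s_ij^(1/2) over the band:", ratio.max())
```

Any other bounded, symmetric profile with $\int f=1$ could be substituted in `band_profile`; only the constant in $Z_{L,W}-W$ would change.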

We remark that Definition 2.1 is tailored to the assumption that (2.11) holds for any $p$ . If (2.11) only holds for some large but fixed $p$ then all of our results still hold, but in a somewhat weaker sense. Indeed, the control of the exceptional events in our theorems is expressed via the relation $\prec$ . If only finitely many moments are assumed to be finite in (2.11), then the exponents $\varepsilon$ and $D$ in the definition of $\prec$ cannot be chosen to be arbitrary, and will in fact depend on $p$ . Repeating our arguments under this weaker assumption would require us to follow all of these exponents through the entire proof. Our assumption that (2.11) holds for any $p$ streamlines our statements and proofs, by avoiding the need to keep track of the precise values of these parameters.

Throughout the following we make use of a spectral parameter

We choose and fix two arbitrary (small) global constants $\gamma>0$ and $\kappa>0$ . All of our estimates will depend on $\kappa$ and $\gamma$ , and we shall often omit the explicit mention of this dependence. Set

We introduce the Stieltjes transform of Wigner’s semicircle law, defined by

It is well known that the Stieltjes transform $m$ is characterized as the unique solution of

with $\operatorname{Im}m(z)>0$ for $\operatorname{Im}z>0$ . Thus we have

To avoid confusion, we remark that the Stieltjes transform $m$ was denoted by $m_{sc}$ in the papers ESY1 ; ESY2 ; ESY3 ; ESY4 ; ESY5 ; ESY6 ; ESY7 ; ESYY ; EYY1 ; EYY2 ; EYY3 ; EKYY1 ; EKYY2 , in which $m$ had a different meaning from (2.14).
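Numerically, $m$ can be evaluated in closed form. The sketch below assumes that (2.15) is the standard quadratic relation $m+1/m+z=0$ and picks the branch of $\sqrt{z^{2}-4}$ that behaves like $z$ at infinity, so that $m(z)\sim -1/z$ and $\operatorname{Im}m>0$; the name `m_sc` is only a label. It also checks the identity $\operatorname{Im}m=\lvert m\rvert^{2}(\eta+\operatorname{Im}m)$, obtained by taking the imaginary part of $1/m=-z-m$, and the expansion $\lvert m\rvert^{2}=1-\alpha\eta+O(\eta^{2})$ with $\alpha=1/\operatorname{Im}m(E+\mathrm{i}0)=2/\sqrt{4-E^{2}}$, consistent with the statement $\operatorname{Im}m=1/\alpha+O(\eta)$ used in Section 3.

```python
import cmath
import math


def m_sc(z: complex) -> complex:
    """Stieltjes transform of the semicircle law, for Im z > 0.

    cmath.sqrt(z - 2) * cmath.sqrt(z + 2) is the branch of sqrt(z**2 - 4) that is
    analytic off [-2, 2] and behaves like z at infinity, so m(z) ~ -1/z.
    """
    return (-z + cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2


E, eta = 0.5, 1e-6
z = complex(E, eta)
m = m_sc(z)

residual = m + 1 / m + z                          # assumed form of (2.15)
identity_gap = m.imag - abs(m) ** 2 * (eta + m.imag)
alpha_est = (1 - abs(m) ** 2) / eta               # from |m|^2 = 1 - alpha*eta + O(eta^2)
alpha_limit = 2 / math.sqrt(4 - E ** 2)           # 1 / Im m(E + i0)

print("m =", m)
print("|residual| =", abs(residual), " identity gap =", abs(identity_gap))
print("alpha estimate =", alpha_est, " limit =", alpha_limit)
```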

and denote its entries by $G_{ij}(z)$ . In the following sections we list our main results on the resolvent matrix entries.

We conclude this section by introducing some notation that will be used throughout the paper. We use $C$ to denote a generic large positive constant, which may depend on some fixed parameters and whose value may change from one expression to the next. Similarly, we use $c$ to denote a generic small positive constant. For two positive quantities $A_{N}$ and $B_{N}$ we sometimes use the notation $A_{N}\asymp B_{N}$ to mean $cA_{N}\leqslant B_{N}\leqslant CA_{N}$ . Moreover, we use $A_{N}\ll B_{N}$ to mean that there exists a constant $c>0$ such that $A_{N}\leqslant N^{-c}B_{N}$ ; we also use $A_{N}\gg B_{N}$ to denote $B_{N}\ll A_{N}$ . (Note that these latter conventions are nonstandard.) Finally, we introduce the Japanese bracket $\langle x\rangle\mathrel{\mathop{:}}=\sqrt{1+\lvert x\rvert^{2}}$ . Most quantities in this paper depend on the spectral parameter $z$ , which we however mostly omit from the notation.

For simplicity, here we state our main results assuming that $d=1$ and that $f$ satisfies the decay condition

The generalization of our results to $d>1$ and slowly decaying $f$ is straightforward, and will be given in Section 8. We emphasize that the core of our argument, given in Sections 3–5, is valid in general, independent of the dimension.

2.2 Improved local semicircle law for resolvent entries and delocalization

Throughout this section we assume $d=1$ and (2.17). The Wigner semicircle law states that the normalized trace, $\frac{1}{N}\operatorname{Tr}G(z)$ , is asymptotically given by $m(z)$ . In fact, this asymptotics holds even for individual matrix entries. Our first theorem controls the ( $z$ -dependent) random variable

For the following we introduce the deterministic control parameter $\Phi\equiv\Phi^{(N)}(z)$ through

Assume $d=1$ and (2.17). Suppose moreover that

Clearly, the assumption $\eta\gg N^{2}/W^{3}$ can be replaced with the stronger assumption $\eta\gg W^{-1/2}$ . The assumption $N\ll W^{5/4}$ is technical; to see why it is needed, see (6.3) in the proof of Theorem 2.2 below. In the regime (2.19), Theorem 2.2 improves the earlier result

proved in EYY1 (see Lemma 3.4 below). In fact, the estimate (2.20) is optimal, as may be seen from (2.38) and the first estimate of (2.30) below. By spectral decomposition of $G$ one easily finds that

Thus, in the regime where $\Lambda$ is bounded, the average of $\lvert G_{xy}\rvert^{2}$ is of order $(N\eta)^{-1}$ . Here we introduced the notation $G^{*}(z)\mathrel{\mathop{:}}=(G(z))^{*}=(H-\bar{z})^{-1}$ , which we shall use throughout the following.
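The spectral-decomposition identity behind this statement is the Ward identity $\sum_{y}\lvert G_{xy}\rvert^{2}=(GG^{*})_{xx}=\operatorname{Im}G_{xx}/\eta$, which follows from $G-G^{*}=2\mathrm{i}\eta\,GG^{*}$. The following sketch checks it numerically on a sampled band matrix; the flat profile $f(u)=\tfrac12\mathbf{1}\{|u|\leqslant 1\}$, the Gaussian entries, and the parameter values are illustrative assumptions.

```python
import numpy as np


def band_profile(N: int, W: int) -> np.ndarray:
    """s_xy for the hypothetical flat profile f(u) = 1/2 on |u| <= 1 (torus of N sites)."""
    x = np.arange(N)
    d = np.abs(x[:, None] - x[None, :])
    d = np.minimum(d, N - d)
    F = np.where(d <= W, 0.5, 0.0)
    return F / F[0].sum()


N, W = 150, 8
S = band_profile(N, W)
zeta = np.random.default_rng(1).standard_normal((N, N))
H = np.sqrt(S) * (np.triu(zeta) + np.triu(zeta, 1).T)   # real symmetric band matrix

z = 0.3 + 0.02j
eta = z.imag
I = np.eye(N)
G = np.linalg.inv(H - z * I)                 # G(z) = (H - z)^{-1}
G_star = np.linalg.inv(H - np.conj(z) * I)   # G^*(z) = (H - conj(z))^{-1}

row_mass = np.sum(np.abs(G) ** 2, axis=1)    # sum_y |G_xy|^2
ward = G.diagonal().imag / eta               # Im G_xx / eta

print("max Ward defect:", np.max(np.abs(row_mass - ward)))
print("mean |G_xy|^2 =", np.mean(np.abs(G) ** 2), " vs 1/(N eta) =", 1 / (N * eta))
```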

The bound (2.20) implies an estimate on the Stieltjes transform of the empirical spectral density, $m_{N}(z)\mathrel{\mathop{:}}=N^{-1}\operatorname{Tr}G(z)$ . Under the assumptions of Theorem 2.2 and the conditions (2.19), we have

we leave the details to the reader. We remark that (2.23) is the simplest form of the fluctuation averaging mechanism (see Section 3.1). A concise proof of (2.23) can be found in (EKYY4, Theorem 4.6).

For $\eta\leqslant(W/N)^{2}$ we have $\Phi^{2}=(N\eta)^{-1}$ , and the bound (2.20) therefore shows that all off-diagonal entries of $G$ have a magnitude comparable with the average of their magnitudes. We say that the resolvent is completely delocalized. Complete delocalization of the resolvent implies that the eigenvectors are completely delocalized in a weak sense. The precise formulation is given in Proposition 7.1 below. By choosing $\eta$ such that $W^{-1/2}\leqslant\eta\leqslant(W/N)^{2}$ and invoking Proposition 7.1 we obtain the following corollary.

Assume $d=1$ and (2.17). If $N\ll W^{5/4}$ then the eigenvectors of $H$ are completely delocalized in the sense of Proposition 7.1 below.
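The exponent bookkeeping behind the corollary can be checked with plain arithmetic. The sketch below ignores the $N^{c}$ margins hidden in $\ll$ (a simplification) and only tests whether the window $W^{-1/2}\leqslant\eta\leqslant(W/N)^{2}$ used above is nonempty, which happens exactly when $N\leqslant W^{5/4}$. In that case also $N^{2}/W^{3}\leqslant W^{-1/2}$, which is why the assumption $\eta\gg N^{2}/W^{3}$ may be replaced with $\eta\gg W^{-1/2}$. The function names are hypothetical.

```python
def delocalization_window(N: float, W: float) -> tuple[float, float]:
    """Endpoints of the window W**(-1/2) <= eta <= (W/N)**2 (constants and N^c margins dropped)."""
    return W ** -0.5, (W / N) ** 2


def window_nonempty(N: float, W: float) -> bool:
    lower, upper = delocalization_window(N, W)
    return lower <= upper                    # equivalent to N <= W**(5/4)


W = 1.0e4                                    # W**(5/4) = 1e5
for N in [5.0e4, 2.0e5]:
    lower, upper = delocalization_window(N, W)
    print(f"N={N:.0e}: window [{lower:.2e}, {upper:.2e}], nonempty={window_nonempty(N, W)}, "
          f"N^2/W^3={N ** 2 / W ** 3:.2e}")
```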

This corollary improves the result in EK1; EK2, where complete eigenvector delocalization (in a slightly weaker sense; see Remark 2.7 below) was proved under the condition $N\ll W^{7/6}$. It was observed in Section 11 of EK1 that the graphical perturbative renormalization scheme of EK1; EK2 faces a fundamental barrier at $N=W^{6/5}$. The reason for this barrier is that a large family of graphs whose contribution is subleading for $N\ll W^{6/5}$ in fact yields a leading-order contribution for $N\geqslant W^{6/5}$ if the graphs are estimated individually. The cancellation mechanism among these graphs has so far not been identified. As evidenced by Corollary 2.4, our present approach goes beyond this barrier.

2.3 Diffusion profile

Note that the matrix $\Theta=(\Theta_{xy})$ solves the equation

which is obtained from (1.1) by dropping the error term ${\mathcal{E}}$ . Clearly, $\Theta_{xy}$ is translation invariant, i.e. $\Theta_{xy}=\Theta_{u0}$ with $u=[x-y]_{N}$ . Moreover, $\Theta_{xy}>0$ for all $x,y$ . Indeed, this follows immediately from the geometric series representation

which converges since $|m|<1$ (see (3.6) below), using the trivial bound $0\leqslant(S^{n})_{xy}\leqslant 1$ that follows from (2.3).
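The representation can be checked numerically. The sketch below assumes that the omitted series is $\Theta=\sum_{n\geqslant 1}\lvert m\rvert^{2n}S^{n}$ (consistent with the positivity argument above), so that $\Theta=(1-\lvert m\rvert^{2}S)^{-1}\lvert m\rvert^{2}S$ and $\Theta=\lvert m\rvert^{2}S+\lvert m\rvert^{2}S\Theta$. With the flat profile $f(u)=\tfrac12\mathbf{1}\{|u|\leqslant 1\}$ as an illustrative assumption, it confirms positivity, translation invariance, and the total mass $\sum_{x}\Theta_{x0}=\lvert m\rvert^{2}/(1-\lvert m\rvert^{2})=\operatorname{Im}m/\eta$, where the last step uses $\operatorname{Im}m=\lvert m\rvert^{2}(\eta+\operatorname{Im}m)$.

```python
import cmath
import numpy as np


def band_profile(N: int, W: int) -> np.ndarray:
    """s_xy for the hypothetical flat profile f(u) = 1/2 on |u| <= 1 (torus of N sites)."""
    x = np.arange(N)
    d = np.abs(x[:, None] - x[None, :])
    d = np.minimum(d, N - d)
    F = np.where(d <= W, 0.5, 0.0)
    return F / F[0].sum()


def m_sc(z: complex) -> complex:
    """Semicircle Stieltjes transform, branch with Im m > 0 for Im z > 0."""
    return (-z + cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2


def theta(S: np.ndarray, m: complex) -> np.ndarray:
    """Sum of |m|^{2n} S^n over n >= 1, computed as (1 - |m|^2 S)^{-1} |m|^2 S."""
    q = abs(m) ** 2
    I = np.eye(S.shape[0])
    return np.linalg.solve(I - q * S, q * S)


N, W = 100, 5
z = complex(0.5, 0.05)
m = m_sc(z)
q = abs(m) ** 2
S = band_profile(N, W)
Theta = theta(S, m)

print("smallest entry of Theta:", Theta.min())
print("sum_x Theta_x0 =", Theta[:, 0].sum(), " Im m / eta =", m.imag / z.imag)
```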

(We normalize by $W^{-2}$ to account for the fact that the distribution $s_{u0}$ has variance $O(W^{2})$ .) It is easy to see that

a precise computation is given in (5.2) below.

Note that $T$ is not symmetric, but our results also hold for $T_{xy}$ replaced with the quantities $\sum_{j}s_{yj}\lvert G_{xj}\rvert^{2}$ or $\sum_{i,j}s_{xi}s_{yj}\lvert G_{ij}\rvert^{2}$ .
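In matrix form, with $A_{iy}\mathrel{\mathop{:}}=\lvert G_{iy}\rvert^{2}$ and $S$ symmetric, the three quantities are $T=SA$, $T^{\prime}=AS$, and $SAS$. The sketch below (the flat profile and Gaussian entries are illustrative assumptions) computes them for a sampled real symmetric band matrix. Since $G^{T}=G$ in the real symmetric case, $A$ is symmetric, so $T^{\prime}=T^{T}$ and the two-sided average is symmetric, whereas $T$ itself generally is not. By the Ward identity $\sum_{y}\lvert G_{iy}\rvert^{2}=\operatorname{Im}G_{ii}/\eta$, the row sums of $T$ equal $\sum_{i}s_{xi}\operatorname{Im}G_{ii}/\eta$.

```python
import numpy as np


def band_profile(N: int, W: int) -> np.ndarray:
    """s_xy for the hypothetical flat profile f(u) = 1/2 on |u| <= 1 (torus of N sites)."""
    x = np.arange(N)
    d = np.abs(x[:, None] - x[None, :])
    d = np.minimum(d, N - d)
    F = np.where(d <= W, 0.5, 0.0)
    return F / F[0].sum()


N, W = 150, 8
S = band_profile(N, W)
zeta = np.random.default_rng(1).standard_normal((N, N))
H = np.sqrt(S) * (np.triu(zeta) + np.triu(zeta, 1).T)   # real symmetric band matrix

z = 0.3 + 0.02j
G = np.linalg.inv(H - z * np.eye(N))
A = np.abs(G) ** 2                 # A_iy = |G_iy|^2

T = S @ A                          # T_xy = sum_i s_xi |G_iy|^2
T_prime = A @ S                    # sum_j |G_xj|^2 s_jy = sum_j s_yj |G_xj|^2
T_two_sided = S @ A @ S            # sum_{i,j} s_xi s_yj |G_ij|^2

print("asymmetry of T:", np.abs(T - T.T).max())
```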

Assume $d=1$ and (2.17). Suppose that $N\ll W^{5/4}$ and $(W/N)^{2}\leqslant\eta\leqslant 1$ . Then

Note that the total mass of the distribution $\lvert G_{x0}\rvert^{2}$ may be computed explicitly by spectral decomposition of $G$ : assuming $\Lambda\prec\Psi$ we have

in agreement with the corresponding statement (2.28) for the deterministic limiting profile.

We expect that (2.30) should in fact hold under the weaker conditions $\eta\gg\frac{1}{N}$ and $N\ll W^{2}$ . The improved local semicircle law (2.20) should also hold under these weaker conditions. In particular, this would imply complete delocalization of the eigenvectors for all $N\ll W^{2}$ . One obstacle is that a non-trivial control on $\Lambda$ in the regime $\eta\leqslant\frac{1}{W}$ is difficult to obtain.

The resolvent is controlled for $\eta\geqslant W^{-1/2}$ (instead of $\eta\gg W^{-1/3}$ ).

The control on the profile is pointwise in $x$ and $y$ (instead of in a weak sense on the scale $W\eta^{-1/2}$ ).

The estimates hold with high probability (instead of in expectation).

However, the result in the current paper is not uniform in $N$ , unlike that of EK1 ; EK2 .

We conclude this section with an asymptotic result on the deterministic profile $\Theta_{x0}$ . Since we are interested in large values of $x$ , we need to consider the small-momentum behaviour of the Fourier transform of $\Theta_{x0}$ . Using the small- $p$ expansion (1.2) and (1.4), we therefore find that $\Theta_{x0}\approx\theta_{x}$ , where we defined the $N$ -periodic function

Moreover, if $(W/N)^{2}\leqslant\eta\leqslant 1$ and $N\leqslant W^{2}$ , we have the sharp upper bound $\Theta_{xy}\leqslant C\Upsilon_{xy}$ .

where in the last step we used the elementary identities (3.3) and (3.5) below. In fact, the calculation (2.39) is a mere consistency check (to leading order) since $\sum_{x}\Theta_{x0}=\frac{\operatorname{Im}m}{\eta}$; see (5.2) below. We conclude that the average height of the profile is of order $(N\eta)^{-1}$. The peak of the exponential profile has height of order $(W\sqrt{\eta})^{-1}$, which dominates the average height if and only if $\eta\gg(W/N)^{2}$. This is the regime in which $\eta$ is so large that complete delocalization has not yet taken place, and the profile is mostly concentrated in the region $|x-y|\leqslant W\eta^{-1/2}\ll N$.
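The comparison of heights is simple arithmetic. The sketch below keeps only orders of magnitude (all constants dropped, $\operatorname{Im}m$ treated as order one) and evaluates the peak height $(W\sqrt{\eta})^{-1}$, the average height $(N\eta)^{-1}$, and the width $W\eta^{-1/2}$; their ratio $N\sqrt{\eta}/W$ exceeds one exactly when $\eta>(W/N)^{2}$. The function name is hypothetical.

```python
def profile_scales(N: float, W: float, eta: float) -> dict:
    """Order-of-magnitude scales of the diffusion profile (constants dropped)."""
    return {
        "peak": 1.0 / (W * eta ** 0.5),      # height of the exponential peak
        "average": 1.0 / (N * eta),          # average height, since sum_x Theta_x0 ~ 1/eta
        "width": W * eta ** -0.5,            # spatial extent W * eta^{-1/2}
    }


N, W = 1000, 10
for eta in [0.01, (W / N) ** 2, 1e-6]:
    s = profile_scales(N, W, eta)
    print(f"eta={eta:.0e}  peak/average={s['peak'] / s['average']:.3g}  width/N={s['width'] / N:.3g}")
```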

These scenarios are best understood in a dynamical picture in which $\eta$ is decreased from $1$. The ensuing dynamics of $\theta$ corresponds to the diffusion approximation, where the quantum problem is replaced with a random walk with step size of order $W$. On a configuration space consisting of $N$ sites, such a random walk reaches equilibrium on time scales beyond $(N/W)^{2}$. As observed in Remark 2.7, $\eta^{-1}$ plays the role of time $t$, so that in this dynamical picture equilibrium is reached for $t\sim\eta^{-1}\gg(N/W)^{2}$. Figure 1 illustrates this diffusive spreading of the profile for different values of $\eta$.
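The diffusive time scale can be illustrated with the random walk itself. The sketch below is a heuristic illustration only, using the hypothetical flat profile $f(u)=\tfrac12\mathbf{1}\{|u|\leqslant 1\}$: a walker on the torus of $N$ sites jumps from $x$ to $y$ with probability $s_{xy}$, and we measure the total-variation distance of its law from the uniform distribution after $t$ steps. The distance is still large for $t\ll(N/W)^{2}$ and small once $t$ is of order $(N/W)^{2}$.

```python
import numpy as np


def band_profile(N: int, W: int) -> np.ndarray:
    """s_xy for the hypothetical flat profile f(u) = 1/2 on |u| <= 1 (torus of N sites)."""
    x = np.arange(N)
    d = np.abs(x[:, None] - x[None, :])
    d = np.minimum(d, N - d)
    F = np.where(d <= W, 0.5, 0.0)
    return F / F[0].sum()


def tv_to_uniform(N: int, W: int, steps: int) -> float:
    """Total-variation distance between the law of the walk after `steps` steps and uniform."""
    S = band_profile(N, W)
    p = np.zeros(N)
    p[0] = 1.0                               # walker starts at site 0
    for _ in range(steps):
        p = S @ p                            # S is symmetric, so S^T p = S p
    return 0.5 * float(np.abs(p - 1.0 / N).sum())


N, W = 200, 5
t_eq = (N // W) ** 2                         # diffusive scale (N/W)^2 = 1600
for t in [t_eq // 100, t_eq // 10, t_eq]:
    print(f"t = {t:5d}: TV distance to uniform = {tv_to_uniform(N, W, t):.2e}")
```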

2.4 Delocalization with a small mean-field component

In this section we continue to assume $d=1$ and (2.17). We now consider a related model

The effect of adding a small Wigner component of size $\varepsilon$ is that the imaginary part of the spectral parameter effectively increases from $\eta$ to $\eta+\varepsilon$ in the local semicircle law and in the diffusion approximation. In particular, we can eliminate the condition $N\leqslant W^{5/4}$ and still obtain delocalization for $H_{\varepsilon}$ provided $\varepsilon$ is not too small. These results are summarized in the following theorem. In order to state it, we introduce the control parameter

which is analogous to $\Phi$ defined in (2.18).

Suppose that $\eta(\eta+\varepsilon)\gg W^{-1}$ . Moreover, suppose that $N\ll W^{5/4}$ or $\eta+\varepsilon\gg W^{-1/2}$ . Then

Suppose that $\varepsilon+\eta\gg W^{-1/2}$ and

Then the resolvent is completely delocalized:

If $\varepsilon\gg(N/W^{2})^{2/3}$ then the eigenvectors of $H_{\varepsilon}$ are completely delocalized in the sense of Proposition 7.1.

This theorem formulates only the bounds concerning delocalization, i.e. the counterparts of Theorem 2.2 and Corollary 2.4. Similarly to Theorem 2.5, a non-trivial profile can be proved for the average of $|G_{xy}|^{2}$ . The profile is visible in the regime $N\eta\geqslant W\sqrt{\varepsilon+\eta}$ , and it is given by

where the approximation is valid in the regime $|x-y|\ll N$ . The details of the precise formulation and the proof are left to the reader.

Preliminaries

In this subsection we introduce some further notations and collect some basic facts that will be used throughout the paper. Throughout this section we work in the general $d$ -dimensional setting of Section 2.1.

For $T\subset\{1,\dots,N\}$ we define $H^{(T)}$ by

Moreover, we define the resolvent of $H^{(T)}$ through

Let $X\equiv X(H)$ be a random variable. For $i\in\{1,\dots,N\}$ define the operations $P_{i}$ and $Q_{i}$ through

We call $P_{i}$ partial expectation in the index $i$ . Moreover, we say that $X$ is independent of $T\subset\{1,\dots,N\}$ if $X=P_{i}X$ for all $i\in T$ .

Suppose that $X(u,v)\prec\Psi(u,v)$ uniformly in $u\in U$ and $v\in V$ . If $\lvert V\rvert\leqslant N^{C}$ for some constant $C$ then

Suppose that $X_{1}(u)\prec\Psi_{1}(u)$ uniformly in $u$ and $X_{2}(u)\prec\Psi_{2}(u)$ uniformly in $u$ . Then

The claims (i) and (ii) follow from a simple union bound. The claim (iii) follows from Chebyshev’s inequality, using a high-moment estimate combined with Jensen’s inequality for partial expectation. We omit the details. ∎

Note that if for any $\varepsilon>0$ and $p\geqslant 1$ we have

for large enough $N$ (depending on $\varepsilon$ and $p$ ) then $X\prec\Psi$ by Chebyshev’s inequality. Moreover, if $X\leqslant\Psi$ almost surely, then $X\prec\Psi$ . Hence $O_{\prec}(\Psi)$ describes a larger class of random variables than $O(\Psi)$ .

We need the following bound on $\Lambda$ .

Away from the spectral edges, i.e. for $z\in\bf{S}$ , this bound was proved in Proposition 3.3 of EYY1 . In EYY1 , the matrix entries $x_{ij}$ were assumed to have at most subexponential tails (a stronger assumption than (2.11) for all $p$ ), but the proof of EYY1 extends trivially to our case. See EKYY4 for a simplified and generalized alternative proof.

The following result collects some elementary facts about $m$ .

The identity (3.3) follows by taking the imaginary part of (2.15). The estimate (3.4) was proved in EYY2 , Lemma 4.2. From (2.16) we find $\operatorname{Im}m=1/\alpha+O(\eta)$ , from which (3.5) follows easily using (3.3). Finally, (3.6) follows from Lemma 4.2 in EYY2 combined with (3.3) and (3.4). ∎

The following resolvent identities form the backbone of all of our proofs. They first appeared in (EYY1, Lemmas 4.1 and 4.2) and (EKYY2, Lemma 6.10). The idea behind them is that a resolvent entry $G_{ij}$ depends strongly on the $i$-th and $j$-th columns of $H$, but weakly on all other columns. The first set of identities (Family A) determines how to make a resolvent entry $G_{ij}$ independent of an additional index $k\neq i,j$. The second set of identities (Family B) expresses the dependence of a resolvent entry $G_{ij}$ on the entries in the $i$-th or in the $j$-th column of $H$.

For any Hermitian matrix $H$ and $T\subset\{1,\dots,N\}$ the following identities hold.

For $i,j,k\notin T$ and $k\neq i,j,$ we have

For $i,j\notin T$ satisfying $i\neq j$ we have

The deterministic control parameter $\Psi$ is admissible if

A typical example of an admissible control parameter is

If $\Psi$ is admissible then the lower bound in (3.9) together with (2.12) ensure that $h_{ij}\prec\Psi$ .

The following lemma gives an expansion formula for the diagonal entries of $G$ .

Suppose that $\Lambda\prec\Psi$ for some admissible $\Psi$ . Defining

The claim is an immediate consequence of Equations (9.1) and (9.2) in EKY2. (Related but less explicit formulas were also obtained in EYY3.) ∎

In this section we collect the necessary results from EKY2 . The following proposition is a special case of the Fluctuation Averaging Theorem of EKY2 .

Suppose that $\Lambda\prec\Psi$ for some admissible control parameter $\Psi$ . Then

All of these estimates follow immediately from Theorem 4.8, Lemma B.1, and Proposition B.2 of EKY2, recalling that $\operatorname{Im}m\geqslant c_{\kappa}$ by (3.6). ∎

The important quantity on the right-hand sides of (3.12), (3.13) and (3.14) is $\Psi$. The additional factors $M^{-1/4}$ are a technical nuisance, but their precise form will play some role in the large-$\eta$ regime, where $M^{-1/4}$ is not negligible compared to $\Psi$. (This nuisance is necessary, however: Proposition 3.9 would be false without the factors of $M^{-1/4}$; see (EKY2, Remark 4.10).)

To interpret these estimates, we note that each summand in (3.12), (3.13), and (3.14) has a naive size given by $\Psi^{k}$ , where $k$ is the number of off-diagonal resolvent entries in the summand. Without averaging, this naive size would be a sharp upper bound. In the second estimate in (3.12) the averaging does not improve the bound since $G_{\mu a}G_{a\mu}^{*}=|G_{\mu a}|^{2}$ is positive. In all other estimates, the monomial on the left-hand side either has a nontrivial phase or its expectation is zero thanks to $Q_{a}$ . Proposition 3.9 asserts that in these cases the averaged quantity is smaller than its individual summands. Note that this averaging of fluctuations is effective even though the entries of $G$ may be strongly correlated. How many additional factors of $\Psi$ one gains depends on the structure of the left-hand side in a subtle way; see Theorem 4.8 of EKY2 for the precise statement. For the applications in this paper the second bound in (3.13) is especially important; here the averaging yields a gain of two extra factors of $\Psi$ .

We remark that all these bounds also hold if the weight functions $s_{\rho a}$ are replaced with a more general weight function. The precise definition is given in Definition 4.4 of EKY2 . All the weights used in this paper satisfy Definition 4.4 of EKY2 .

We also note that averaging in indices can be replaced by expectations. We shall need the following special case of Theorem 4.15 of EKY2 .

Suppose that $\Lambda\prec\Psi$ for some admissible control parameter $\Psi$ . Then for $a\neq\mu,\nu$

Self-consistent equation for $T$

After these preparations, we now move on to the main arguments of this paper. Throughout this section we work in the general $d$ -dimensional setting of Section 2.1. In this section we derive a self-consistent equation for $T$ , given in Theorem 4.1, whose error terms are controlled precisely using the fluctuation averaging from Proposition 3.9. In Section 5 we solve this self-consistent equation; the result is given in Proposition 5.1.

Suppose that $\Lambda\prec\Psi$ for some admissible control parameter $\Psi$ . Then we have

where the matrix entries of the error satisfy

The naive size of $T_{xy}$ is of order $\Psi^{2}$. Notice that the error term in the self-consistent equation (4.2) is smaller by two orders. This improvement is essentially due to the second estimate of (3.13).

Instead of averaging in the first index of the resolvent in the definition of $T$ (2.29), we could have averaged in the second, resulting in the quantity $T_{xy}^{\prime}\mathrel{\mathop{:}}=\sum_{j}\lvert G_{xj}\rvert^{2}s_{jy}$ . Then $T^{\prime}$ satisfies the self-consistent equation

where ${\mathcal{E}}^{\prime}$ also satisfies (4.3).

Before the proof we mention that this result also gives a self-consistent equation for the two-sided averaged quantity

Taking the average $\sum_{y}s_{yz}$ of (4.1), we get the following corollary.

Suppose that $\Lambda\prec\Psi$ for some admissible control parameter $\Psi$ . Then we have

where ${\mathcal{E}}$ and $\widetilde{{\mathcal{E}}}$ each satisfy (4.3).

The rest of this section is devoted to the proof of Theorem 4.1. We begin by writing

Then by the second formula in (3.13), we have

Notice that (3.13) applies only to the summands $i\neq y$ in (4.5). The estimate for the summand $i=y$ follows from

where we used that $(G_{yy}-m)\prec\Psi$ (see (3.11)) and that $\Psi$ is admissible, and in particular $\Psi M^{-1}\leqslant\Psi^{2}M^{-1/2}$ .

We shall compute $\sum_{i}s_{xi}P_{i}|G_{iy}|^{2}$ up to error terms of order $\Psi^{4}$ . We have the following result.

Suppose that $\Lambda\prec\Psi$ for some admissible control parameter $\Psi$ . Then

(It is possible to improve the last error term in (4.7), but we shall not need this.) Before proving Lemma 4.4, we show how it implies Theorem 4.1.

Equation (4.1) is an immediate consequence of (4.8), (4.6), and (4.5). Hence Theorem 4.1 follows from Lemma 4.4. ∎

Throughout the following we shall repeatedly need the simple estimate

where the first step follows from $\Lambda\prec\Psi$ , the second from the fact that $\Psi$ is admissible, and the last from (3.4). In particular, for $k\neq i,j$ , from (3.7) we get the estimate

We start the proof of Lemma 4.4 with the case $i\neq y$ . Using (3.11) we get

where in the last step we used $Z_{i}\prec\Psi$ and the large deviation bound (see Lemma B.2)

We may now compute the contribution of the main term in (4.11) to $P_{i}\lvert G_{iy}\rvert^{2}$ . Still assuming $i\neq y$ , we find

In the third step we used (4.9), and in the last step we added the missing term $k=i$ to obtain $T_{iy}$ ; the resulting error term is $O_{\prec}(\Psi^{2}M^{-1})$ since $i\neq y$ . Next, using (4.9) we get

In the second step, using (3.7) and (4.9), we inserted an upper index $i$ as a preparation to taking the partial expectation $P_{i}$ . We obtain

Now we take the partial expectation in $i$ in (4.15). Using that

by Proposition 3.10 and (4.10), we find that $P_{i}$ applied to the second term in (4.15) results in a quantity $O_{\prec}\big{(}\Psi^{2}(\Psi+M^{-1/4})^{2}\big{)}$. Thus the contribution of the main term in (4.11) to $P_{i}|G_{iy}|^{2}$ is

Next, we look at the contribution of the term with a $Z$ in (4.11):

Now we remove the upper indices at the expense of an error of size $O_{\prec}(\Psi^{4})$ , and then add back the exceptional summation index $i$ as before. This gives

where in the second step we used (3.14); the various cases of coinciding indices $k,l,y$ are easily dealt with using the bound $M^{-1/2}\leqslant\Psi$ .

As remarked above, in the real symmetric case (2.8) the pairing $c=k$ , $d=l$ is also possible. This gives rise to the additional error term

Combining (4.11), (4.16) and (4.17) yields

for $i\neq y$ . This proves (4.7) for the case $i\neq y$ .

Here we used that $G_{yy}-m\prec\Psi$ and that $P_{y}(G_{yy}-m)\prec\Psi^{2}+M^{-1/2}$ by (3.11). It is possible to compute this term to high order in $\Psi$ , but we shall not need this.

For the proof of (4.8) we run almost the same argument as above but now we aim at removing all upper indices $i$ . We first consider the summands $i\neq y$ . From (4.15) we get

where we removed the upper index $i$ using (3.7), and included the summand $k=i$ at the expense of a negligible error term. Taking the average $\sum_{i}^{(y)}s_{xi}$ of the second term on the right-hand side yields

In the first step we just added the exceptional index $i=y$ , and estimated the additional terms with $i=y$ using $s_{xy}s_{yk}\leqslant M^{-1}s_{yk}\leqslant M^{-2}$ as well as $G_{ky}G_{yy}G_{yk}^{*}\prec\delta_{ky}+\Psi^{2}$ . In the second step we used (3.14). Note that the gain comes from the summation index $i$ .

Thus the contribution of the main term of (4.11) to $\sum_{i}^{(y)}s_{xi}P_{i}|G_{iy}|^{2}$ is

The contributions of the error terms in (4.11) to $\sum_{i\neq y}s_{xi}P_{i}|G_{iy}|^{2}$ are of order $O_{\prec}\bigl{(}{\Psi^{4}+\Psi^{2}M^{-1/2}}\bigr{)}$ ; this is true even without averaging (see (4.17)). Thus we have

Finally, we consider the case $i=y$ . From (4.20) we get

This formula provides the missing summands $i=y$ in (4.24) and hence yields (4.8). ∎

Solving the equation for $T$

Thus, the entries $\Pi_{ij}$ of $\Pi$ are all equal to $1/N$ , and $S\Pi=\Pi S=\Pi$ since $S$ is stochastic by (2.3). The complementary projection is denoted by $\overline{\Pi}\!\,\mathrel{\mathop{:}}=1-\Pi$ .
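These projection identities are easy to confirm numerically. In the sketch below, $\overline{T}_{y}$ is read as the column average $N^{-1}\sum_{x}T_{xy}$, which is what the identity $(\overline{\Pi}T)_{xy}=T_{xy}-\overline{T}_{y}$ used later in this section requires; $T$ is replaced with an arbitrary placeholder matrix, and the flat profile for $S$ is an illustrative assumption. Note that $\Pi S=\Pi$ uses that the columns of $S$ also sum to one, which holds here because $S$ is symmetric.

```python
import numpy as np


def band_profile(N: int, W: int) -> np.ndarray:
    """s_xy for the hypothetical flat profile f(u) = 1/2 on |u| <= 1 (torus of N sites)."""
    x = np.arange(N)
    d = np.abs(x[:, None] - x[None, :])
    d = np.minimum(d, N - d)
    F = np.where(d <= W, 0.5, 0.0)
    return F / F[0].sum()


N, W = 60, 4
S = band_profile(N, W)
Pi = np.full((N, N), 1.0 / N)                # orthogonal projection onto constant vectors
Pi_bar = np.eye(N) - Pi                      # complementary projection

T = np.random.default_rng(2).random((N, N))  # placeholder for T_xy; any matrix works here
T_bar = T.mean(axis=0)                       # T-bar_y = N^{-1} sum_x T_xy

print(np.abs(S @ Pi - Pi).max(), np.abs(Pi @ S - Pi).max())
```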

We perform this splitting on $T_{xy}$ only in the $x$ coordinate, regarding $y$ as fixed. Thus, we split

here the last step follows easily by spectral decomposition of $G$ . We can use the local semicircle law, Lemma 3.4, to get

It is instructive to perform the same averaging with the deterministic profile $\Theta$ :

Having dealt with the component $\Pi T$ in (5.1), we devote the rest of this section to the component $\overline{\Pi}\!\,T$ . The following proposition contains the main result of this section.

Suppose that $\Lambda\prec\Psi$ for some admissible control parameter $\Psi$ . Then we have for all $y$

Multiplying (4.2) by $\overline{\Pi}\!\,$ from the left yields

where we used that $S\Pi=\Pi S=\Pi$ . Therefore

Note that $(\overline{\Pi}\!\,T)_{xy}=T_{xy}-\overline{T}\!\,_{y}$ . Using (5.6) we therefore get (5.3) whose error term satisfies

This completes the proof of (5.3) and (5.4). ∎

Next, we estimate $|G_{ij}-\delta_{ij}m|^{2}$ in terms of $T_{ij}$ . In other words, we derive pointwise estimates on $G_{ij}$ from estimates on the averaged quantity $T_{xy}$ . This gives rise to an improved bound on $\Lambda$ , which we may plug back into Proposition 5.1. Thus we get a self-improving scheme which may be iterated.

Suppose that $\Lambda\prec\Psi$ with some admissible control parameter $\Psi$ and $T_{ij}\prec\Omega_{ij}^{2}$ for a family of admissible control parameters $\Omega_{ij}$ indexed by a pair $(i,j)$ (see Definition 3.7). Then

(Here we write $\Omega_{ij}^{2}\mathrel{\mathop{:}}=(\Omega_{ij})^{2}$ .)

We fix the index $j$ throughout the proof. Let first $i\neq j$ . Then (3.8) gives

We shall use the large deviation bounds from Theorem B.1 to estimate the sum. For that we shall need a bound on

where in the first step we used (3.7) and (4.9). Since $G_{ii}\prec 1$ , we get from (5.9) and Theorem B.1 (i) that

To estimate $G_{ii}-m$ , we use (3.11) to get

where in the second step we used Theorem B.1 (i) and (ii), with the bounds $G_{kk}^{(i)}\prec 1$ and

This last estimate follows along the lines of (5.10), whereby the error terms resulting from the removal of the upper indices are estimated by Cauchy-Schwarz; we omit the details. Finally $M^{-1}$ can be absorbed into $\sum_{k}\Omega_{ik}^{2}s_{ki}$ by admissibility of $\Omega_{ij}$ . ∎

We may now combine Proposition 5.1 and Lemma 5.3 in an iterative self-improving scheme, which results in an improved bound on $\Lambda$ .

Suppose that $\Lambda\prec\Psi$ and $T_{ij}\prec\Omega^{2}$ for all $i$ and $j$ , where $\Psi$ and $\Omega$ are admissible control parameters. Then

We apply Lemma 5.3 to the constant control parameter $\Omega_{ij}=\Omega$ for each $i,j$. Since $T_{ij}\prec\Omega^{2}$ for all $i,j$, Lemma 5.3 yields

Now we can iterate this estimate, with $\Omega^{2}+\Psi^{4}$ taking the role of $\Psi^{2}$ in controlling $\Lambda^{2}$. Thus after one iteration we get

After $k$ iterations we get $\Lambda^{2}\prec\Omega^{2}+\Psi^{2^{k}}$ . Since $\Omega$ and $\Psi$ are admissible, we have $\Psi^{2^{k}}\prec\Omega^{2}$ for $k\sim|\log\gamma|$ . This completes the proof. ∎
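The number of iterations can be made explicit by tracking exponents. Assuming (since (3.9) is not reproduced here) that admissibility means $M^{-1/2}\leqslant\Psi\leqslant M^{-\gamma}$, write $\Psi=M^{-a}$ and $\Omega=M^{-b}$ with $\gamma\leqslant a\leqslant 1/2$ and $b\leqslant 1/2$. Then $\Psi^{2^{k}}\leqslant\Omega^{2}$ as soon as $2^{k}a\geqslant 2b$, which in the worst case $a=\gamma$, $b=1/2$ gives $k=\lceil\log_{2}(1/\gamma)\rceil$, in line with $k\sim\lvert\log\gamma\rvert$. The helper below is hypothetical.

```python
import math


def iterations_needed(a: float, b: float) -> int:
    """Smallest k >= 0 with Psi**(2**k) <= Omega**2, where Psi = M**(-a) and Omega = M**(-b).

    Works with exponents only: Psi**(2**k) = M**(-a * 2**k) and Omega**2 = M**(-2b).
    Requires a > 0 so that the loop terminates.
    """
    if a <= 0:
        raise ValueError("need a > 0, i.e. Psi <= M**(-gamma) for some gamma > 0")
    k = 0
    while a * 2 ** k < 2 * b:
        k += 1
    return k


for gamma in [0.1, 0.01, 0.001]:
    print(gamma, iterations_needed(gamma, 0.5), math.ceil(math.log2(1 / gamma)))
```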

First we show that for large enough $L$ the Euclidean matrix norm satisfies

with some positive constant $c_{1}$ depending on the profile $f$ . Since the matrix entries $s_{ij}$ are translation invariant (see (2.2)), it is sufficient to compute its Fourier transform as defined in (5.13). Using the property $\widehat{Su}(p)=\widehat{S}(p)\widehat{u}(p)$ , the fact that $\widehat{\Pi}(p)=\delta_{p0}$ , and Plancherel’s identity, we find

The last step follows easily from $\widehat{S}(p)\geqslant-1+\delta$ (recall (2.4)) and the representation

where in the second step we used $\lvert p\cdot x\rvert\leqslant\pi$ (on this range $1-\cos t\geqslant 2t^{2}/\pi^{2}$), and in the last step $\lvert p\rvert\geqslant 2\pi/L$.
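The bound on the inverse of $1-S$ away from the zero mode can be checked numerically for a concrete profile. The sketch below assumes a Gaussian profile $f(u)=\mathrm{e}^{-u^{2}/2}$, which is a hypothetical choice. It builds the first row of the circulant matrix $S$ on $\mathbb{Z}_{L}$ and computes $\widehat{S}(p)$ for $p\in\frac{2\pi}{L}\mathbb{Z}_{L}$ with a discrete Fourier transform. The inverse of $\min_{p\neq0}(1-\widehat{S}(p))$ is the Euclidean norm of $(1-S)^{-1}$ restricted to the orthogonal complement of the constant vector. This minimum should scale like $(W/L)^{2}$. The function names are ours.

```python
import numpy as np


def band_symbol(L: int, W: float) -> np.ndarray:
    """Return S^(p) for p = 2*pi*k/L, k = 0..L-1, where s_xy = f([x-y]_L / W) / Z.

    f(u) = exp(-u**2 / 2) is a hypothetical Gaussian profile chosen for illustration.
    """
    x = np.arange(L)
    x = np.where(x <= L // 2, x, x - L)   # representative of x in (-L/2, L/2]
    row = np.exp(-((x / W) ** 2) / 2)
    row /= row.sum()                       # rows of S sum to one, so S^(0) = 1
    return np.fft.fft(row).real           # symmetric profile, so the symbol is real


def gap_without_zero_mode(L: int, W: float) -> float:
    """min over p != 0 of 1 - S^(p); its inverse is the norm of (1-S)^{-1} off the constants."""
    return float(np.min(1.0 - band_symbol(L, W)[1:]))


W = 10.0
for L in (500, 1000, 2000):
    gap = gap_without_zero_mode(L, W)
    print(L, gap, gap * (L / W) ** 2)     # last column stays roughly constant
```

For this profile the rescaled gap in the last column is close to $2\pi^{2}\approx 19.7$. The reason is that $1-\widehat{S}(p)\approx\frac{1}{2}W^{2}p^{2}$ at $p=2\pi/L$.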

From (3.6) we get $1-|m|^{2}\geqslant c\eta$ , which, combined with (5.15), yields

In order to prove (5.6), we first observe that $\|S\|_{\infty}\leqslant 1$ as follows from the estimate

Delocalization bounds

In this section we prove our main results – Theorems 2.2, 2.5, and 2.11. We return to the one-dimensional case, $d=1$ , and continue to assume (2.17). In particular, we write $N$ instead of $L$ . The simple extension to higher dimensions is given in Section 8.

Suppose that $\Lambda\prec\Psi$ for some admissible control parameter $\Psi$. Then (5.3) together with (5.4), (5.1), and (2.38) yields

Recalling Corollary 5.4, we have therefore proved

i.e. the upper bound $\Lambda^{2}\prec\Psi^{2}$ can be replaced with the stronger bound (6.2).

We can now iterate (6.2), exactly as in the proof of Corollary 5.4. We start the iteration with $\Psi_{0}\mathrel{\mathop{:}}=(W\eta)^{-1/2}$ ; see Lemma 3.4. Explicitly, the iteration reads

From (6.2) and Lemma 3.4 we get that $\Lambda\prec\Psi_{k}$ for any fixed $k$.

In order to perform the iteration, we require

Thus we get the conditions $N\ll W^{5/4}$ and $\eta\gg N^{2}/W^{3}$ (here we used (2.1)). These two conditions are the reason for the restriction on $W$ in Theorem 2.2, Corollary 2.4, and Theorem 2.5. Using (6.3) and the fact that $\Phi$ is by definition admissible, it is now easy to see that there is a finite constant $k$, depending on the implicit constants $c$ in $\ll$ and $\gg$ above, such that $\Psi_{k}^{2}\leqslant C\Phi^{2}$. This concludes the proof of Theorem 2.2.

2. Delocalization with profile: proof of Theorem 2.5

By assumption we have $(W/N)^{2}\leqslant\eta\leqslant 1$ , so that in particular $\Phi^{2}=W^{-1}\eta^{-1/2}=\mathrel{\mathop{:}}\Psi^{2}$ . Note that this $\Psi$ is admissible. From (2.20) we get $\Lambda\prec\Psi$ . Now observe that $\frac{\operatorname{Im}m}{N\eta}=\Pi_{xy}\frac{\operatorname{Im}m}{\eta}$ for all $x$ and $y$ , as well as

by (3.3) and the property $\Pi S=S\Pi=\Pi$ . Thus (5.3) together with (5.4) and (5.1) implies the first estimate of (2.30), since in the regime $\eta\geqslant(W/N)^{2}$ and $W^{5/4}\gg N$ the error term (5.4) is bounded by

The second estimate of (2.30) follows from the first one and (4.7).

Next, (2.31) follows by using (2.33) in (2.30).

Finally, using Lemma 5.3 with $\Omega_{ij}^{2}=\Upsilon_{ij}$ and $\Psi\mathrel{\mathop{:}}=W^{-1/2}\eta^{-1/4}$ , we obtain

Here we used that $\Psi^{4}=(W^{2}\eta)^{-1}$ can be absorbed into $(N\eta)^{-1}\leqslant\Upsilon_{ij}$ (as $N\ll W^{5/4}\leqslant W^{2}$), and that in the last summation $\sum_{k}\Upsilon_{ik}s_{ki}$ can be absorbed into $\Upsilon_{ii}\geqslant\frac{C}{W\sqrt{\eta}}$. This proves (2.32), and hence concludes the proof of Theorem 2.5.

3. Delocalization with a small mean-field component: proof of Theorem 2.11

with some positive constant $c$ . This implies that (5.5) and (5.6) can be improved to

Suppose now that $\Lambda\prec\Psi$ for some admissible control parameter $\Psi$ . Then the statement of Proposition 5.1 is modified to

Notice that the Fourier transforms of $S$ and $S_{\varepsilon}$ (defined by (5.13)) satisfy

Here we treated the zero mode $p=0$ separately; it is given by

where in the last step we used (3.3). The error term in (6.9) is estimated using a similar calculation.

Notice that the coefficient of $\widehat{S}(p)$ in the denominator of (6.9) is now $|m|^{2}(1-\varepsilon)=1-\varepsilon-(1-\varepsilon)\alpha\eta+O(\eta^{2})$ , where we used (3.5). The results and the proof of Proposition 2.8 remain unchanged when $S$ is replaced with $S_{\varepsilon}$ , except that $\alpha\eta$ must be replaced with $(1-\varepsilon)\alpha\eta+\varepsilon$ on the right-hand side of (2.36), and the whole expression is multiplied by an additional factor $(1-\varepsilon)$ . Moreover, instead of (2.38), we now have
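A small computation shows the effect of the mean-field component on the Fourier symbol. We assume here that $S_{\varepsilon}=(1-\varepsilon)S+\varepsilon\Pi$, with $\Pi_{xy}=1/N$. This form is our reading and is not restated in this section. It is consistent with the factor $1-\varepsilon$ above and with $M\sim(\varepsilon N^{-1}+W^{-1})^{-1}$. Under this assumption the symbol of $S_{\varepsilon}$ is $(1-\varepsilon)\widehat{S}(p)+\varepsilon\delta_{p0}$. Every nonzero mode then satisfies $1-\widehat{S}_{\varepsilon}(p)\geqslant\varepsilon$, which is far above the gap of order $(W/N)^{2}$ of $S$ alone. The Gaussian band profile below is a hypothetical choice.

```python
import numpy as np


def gaussian_band_row(N: int, W: float) -> np.ndarray:
    """First row of a hypothetical circulant band matrix S (Gaussian profile, row sum 1)."""
    x = np.arange(N)
    x = np.where(x <= N // 2, x, x - N)
    row = np.exp(-((x / W) ** 2) / 2)
    return row / row.sum()


N, W, eps = 400, 8.0, 0.05
row = gaussian_band_row(N, W)
row_eps = (1 - eps) * row + eps / N        # assumed S_eps = (1 - eps) S + eps Pi, Pi_xy = 1/N

symbol = np.fft.fft(row).real
symbol_eps = np.fft.fft(row_eps).real

expected = (1 - eps) * symbol
expected[0] += eps                         # Pi has symbol delta_{p0}

gap = np.min(1 - symbol[1:])               # of order (W/N)^2 without the mean-field part
gap_eps = np.min(1 - symbol_eps[1:])       # at least eps with it
print(np.allclose(symbol_eps, expected), gap, gap_eps)
```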

Recall the definition (2.20) of $\Phi_{\varepsilon}$ . Following the proof of Theorem 2.2, instead of (6.2) we now obtain

As in Section 6.1, we can iterate (6.11) under the conditions

(Note that the a priori estimate $(W\eta)^{-1/2}$ is still determined by $W$ despite the small mean-field component. In Lemma 3.4 it is given by $(M\eta)^{-1/2}$, where $M=(\max_{ij}s_{ij})^{-1}\sim(\varepsilon N^{-1}+W^{-1})^{-1}\sim W$ since $\varepsilon N^{-1}\leqslant W^{-1}$.) The first condition of (6.12) holds if

In order to get complete delocalization of the resolvent, i.e. $\Lambda^{2}\prec(N\eta)^{-1}$, we require $\Lambda^{2}\prec\Phi_{\varepsilon}^{2}$ as well as

which ensures that $\Phi_{\varepsilon}^{2}=(N\eta)^{-1}$. Hence we get complete delocalization of the resolvent provided that (6.13), (6.14), and (6.16) hold. This concludes the proof of part (ii).

If $\varepsilon\gg(N/W^{2})^{2/3}$, then there exists an $\eta$ such that the assumptions of part (ii) are met. Hence part (ii) and Proposition 7.1 yield part (iii). This concludes the proof of Theorem 2.11.

Complete delocalization of eigenvectors

Let $\varepsilon>0$ and define the random subset of eigenvector indices through

Suppose that $\Lambda\prec\Psi$ for some admissible control parameter $\Psi$ . Let $\eta\equiv\eta_{N}$ be a sequence satisfying $M^{-1+\gamma}\leqslant\eta\ll 1$ . Suppose that

uniformly in $E\in I$ , where in the first step we used the spectral decomposition of $G$ . Thus, for all $x$ , the map $y\mapsto\frac{\eta}{\operatorname{Im}m}\lvert G_{yx}\rvert^{2}$ is approximately a probability distribution on $\{1,\dots,N\}$ . Roughly, (7.1) states that this probability distribution is supported on the order of $N$ sites of $\{1,\dots,N\}$ . More precisely, (7.1) yields (introducing the standard basis vector $\delta_{x}$ defined by $(\delta_{x})(y)\mathrel{\mathop{:}}=\delta_{xy}$ ), for any fixed $x$ ,

uniformly in $E\in I$ . Here in the third step we used (7.2) and (7.1), and in the last step the upper bound $\eta\leqslant M^{-c}$ and the fact that $\Psi$ is admissible.
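The map $y\mapsto\frac{\eta}{\operatorname{Im}m}\lvert G_{yx}\rvert^{2}$ is approximately a probability distribution because of the Ward identity $\sum_{y}\lvert G_{yx}\rvert^{2}=\frac{\operatorname{Im}G_{xx}}{\eta}$, which follows from the spectral decomposition of $G$. The sketch below checks this identity for one sample of a real symmetric band matrix. The following are all hypothetical choices for illustration:

- the Gaussian entries,
- the periodic Gaussian variance profile,
- the parameter values.

For $m$ we use the standard semicircle Stieltjes transform $m(z)=\frac{-z+\sqrt{z^{2}-4}}{2}$, with the branch satisfying $\operatorname{Im}m>0$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, W, E, eta = 200, 20.0, 0.3, 0.1

# Hypothetical variance profile: periodic Gaussian band with rows summing to one.
idx = np.arange(N)
diff = np.abs(idx[:, None] - idx[None, :])
dist = np.minimum(diff, N - diff)
S = np.exp(-((dist / W) ** 2) / 2)
S /= S.sum(axis=1, keepdims=True)

# Real symmetric H with Var(H_xy) = s_xy; Gaussian entries are an assumption.
A = rng.normal(size=(N, N)) * np.sqrt(S)
H = np.triu(A) + np.triu(A, 1).T

z = E + 1j * eta
G = np.linalg.inv(H - z * np.eye(N))

col_mass = np.sum(np.abs(G) ** 2, axis=0)  # sum_y |G_yx|^2, one entry per x
ward = G.diagonal().imag / eta             # Im G_xx / eta
print(np.max(np.abs(col_mass - ward) / ward))

# Semicircle Stieltjes transform m(z) = (-z + sqrt(z^2 - 4)) / 2, branch with Im m > 0.
m = (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2
weights = eta * np.abs(G) ** 2 / m.imag    # column x holds y -> eta |G_yx|^2 / Im m
print(weights.sum(axis=0).mean())
```

The first printed number is at rounding level, since the identity is exact. The second is close to one only to the extent that $G_{xx}\approx m$.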

Therefore we may estimate the left-hand side by its square root to get the bound

Similarly, we may estimate the second term of (7.4) using

Combining (7.4) with (7.3), (7.5), and (7.6), we get

Setting $\zeta=\sqrt{\varepsilon}$ in (7.8) therefore yields

Extension to higher dimensions and a slowly decaying band

In this section we extend Theorem 2.2, Corollary 2.4, and Theorem 2.5 in two directions: higher dimensions $d$ and a slowly decaying band.

The multidimensional analogues of the slowly decaying profile are left to the reader, as is the formulation of these extensions if a small mean-field component is added to the band matrix. All these results can be obtained in a straightforward manner following the proofs for the one-dimensional case with a rapidly decaying $f$ .

Let $d=2,3,\dots$ and assume (2.17). Then there is a constant $C$ such that

In order to state the precise form of the profile $\Theta_{xy}$ , we define the covariance matrix $D\equiv D_{W}$ through

Since $D_{\infty}>0$ we get $D\geqslant c>0$ uniformly in $W$ .

Next, we define the $d$ -dimensional Yukawa potential

where $\chi$ is a smooth function satisfying $\chi(q)=1$ for $\lvert q\rvert\leqslant 1/2$ and $\chi(q)=0$ for $\lvert q\rvert\geqslant 1$ , $\varphi$ is a Schwartz function satisfying $\int\varphi=1$ , and $\varphi_{t}(x)\mathrel{\mathop{:}}=t^{-d}\varphi(x/t)$ . (In fact, $\varphi(D^{-1/2}x)$ is the Fourier transform of $\chi(q)$ .) The second step of (8.2) follows by Poisson summation; see Appendix A and in particular (A.14) for more details. The following lemma gives the precise error bounds in the approximation (8.2).

Let $d=1,2,3,\dots$ and assume (2.17). Then

The convolution in (8.2) smooths out the Yukawa potential on the scale $\lvert x\rvert\approx W$. The error terms in (8.3) are negligible compared to the main term $\theta_{x}$ in the regime $W\ll|x|\leqslant CW\eta^{-1/2}$. Therefore the approximation $\theta$ is meaningful from the profile scale $W\eta^{-1/2}$ down to the band scale $W$. The actual choice of the function $\chi$ in (8.2) is immaterial in the relevant regime $|x|\gg W$, as long as $\chi$ is equal to one in a neighbourhood of the origin.

Next, we state the counterparts of Theorem 2.2, Corollary 2.4, and Theorem 2.5 in the higher-dimensional setting. Their proofs are trivial modifications of the proofs of their one-dimensional counterparts, using Lemmas 8.1 and 8.2.

Let $d=2,3,\dots$ and assume (2.17). Suppose moreover that $L\ll W^{1+d/4}$ and $\eta\gg L^{2}/W^{d+2}$ . Then we have

Let $d=2,3,\dots$ and assume (2.17). If $L\ll W^{1+d/4}$ then the eigenvectors of $H$ are completely delocalized in the sense of Proposition 7.1.

Let $d=2,3,\dots$ and assume (2.17). Suppose that $L\ll W^{1+d/4}$ and $(W/L)^{2}\leqslant\eta\leqslant 1$ . Then

Moreover, the analogues of (2.31) and (2.32) hold with

where $K$ is an arbitrary, fixed, positive integer.

2. Slowly decaying band

In this section we make the following assumption on the band shape. Suppose that $d=1$ and $f$ is smooth and symmetric, and satisfies

for some fixed $\beta\in(0,2)$ . Here $h$ is a symmetric function satisfying

for some fixed $h_{0}>0$. Note that by assumption $f$ is smooth and symmetric, so that $h(x)=O(\lvert x\rvert^{1+\beta})$ near the origin.

In order to avoid technical issues arising from the periodicity of $S$, we cut off the tail of $f$ at scales $\lvert x\rvert\approx N$. Thus we set

here $\sigma$ is a smooth, symmetric bump function satisfying $\sigma(x)=1$ for $\lvert x\rvert\leqslant a$ and $\sigma(x)=0$ for $\lvert x\rvert\geqslant b$ , where $0<a<b<1/2$ . As usual, $Z$ is a normalization constant.

The following lemma is the analogue of Lemma 5.2. Its proof is similar to that of Lemma 5.2; the key input is Lemma A.2 (iii).

Suppose that $d=1$ and that (8.6) and (8.7) hold. Then

Next, we give the sharp upper bound on the peak of the profile.

Suppose that $d=1$ and that (8.6) and (8.7) hold. Then

In order to describe the asymptotic shape of the profile, we define

which plays a role similar to the unrenormalized diffusion constant $D$ from (2.26). Moreover, define the function

which is bounded for $\beta>1$ . It is easy to check that for $\beta>1$ and $\lvert x\rvert\geqslant 1$ we have

with an explicitly computable constant $C_{\beta}>0$ .

Suppose that $d=1$ and that (8.6) and (8.7) hold for some $\beta>1$ . Suppose moreover that

The matrix $\Theta$ is the resolvent of a superdiffusive operator, whose symbol in Fourier space is $B\lvert Wp\rvert^{\beta}$ . Thus, under the identification $t=\eta^{-1}$ from Remark 2.7, we find that the associated dynamics scales according to $x\sim Wt^{1/\beta}$ instead of the diffusive scaling $x\sim Wt^{1/2}$ .
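The scaling $x\sim Wt^{1/\beta}$ can be read off from the symbol. The resolvent $(\eta+B\lvert Wp\rvert^{\beta})^{-1}$ varies on the momentum scale $\lvert p\rvert\asymp W^{-1}(\eta/B)^{1/\beta}$, and hence on the spatial scale $W(B/\eta)^{1/\beta}=W(Bt)^{1/\beta}$.

The sketch below checks this numerically for the simplified symbol $1/(\eta+\lvert Wp\rvert^{\beta})$. This is our simplification: we set $B=1$ and ignore lattice corrections to $\widehat{S}$. The sketch measures the half-width of the profile for two values of $\eta$ and fits the exponent, which should come out close to $1/\beta$. The function names and parameter values are hypothetical.

```python
import numpy as np


def model_profile(N: int, W: float, eta: float, beta: float) -> np.ndarray:
    """theta_x = (1/N) sum_p e^{ipx} / (eta + |W p|^beta) on Z_N (simplified symbol, B = 1)."""
    k = np.arange(N)
    p = 2 * np.pi * np.where(k <= N // 2, k, k - N) / N   # momenta in (-pi, pi]
    return np.fft.ifft(1.0 / (eta + np.abs(W * p) ** beta)).real


def half_width(theta: np.ndarray) -> int:
    """Smallest x >= 0 with theta_x < theta_0 / 2."""
    return int(np.argmax(theta[: theta.size // 2] < theta[0] / 2))


def scaling_exponent(beta: float, N: int = 2**16, W: float = 4.0) -> float:
    """Fit x_half ~ eta^(-exponent) from two values of eta."""
    eta1, eta2 = 1e-3, 1e-4
    w1 = half_width(model_profile(N, W, eta1, beta))
    w2 = half_width(model_profile(N, W, eta2, beta))
    return float(np.log(w2 / w1) / np.log(eta1 / eta2))


for beta in (2.0, 1.5):
    print(beta, scaling_exponent(beta), 1 / beta)
```

The case $\beta=2$ reproduces the diffusive exponent $1/2$.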

We may now state the counterparts of Theorem 2.2, Corollary 2.4, and Theorem 2.5 for the slowly decaying band. Their proofs are trivial modifications of those for the strongly decaying band, using Lemmas 8.7 and 8.8.

Suppose that $d=1$ and that (8.6) and (8.7) hold. Suppose moreover that $N\ll W^{1+1/(2\beta)}$ and $\eta\gg(N/W)^{\beta}/W$. Then we have

Suppose that $d=1$ and that (8.6) and (8.7) hold. If $N\ll W^{1+1/(2\beta)}$, then the eigenvectors of $H$ are completely delocalized in the sense of Proposition 7.1.

Suppose that $d=1$ and that (8.6) and (8.7) hold for some $\beta\geqslant 1$. Suppose that $N\ll W^{1+1/(2\beta)}$ and $(W/N)^{\beta}\leqslant\eta\leqslant 1$. Then

Moreover, the analogues of (2.31) and (2.32) hold.

Appendix A The deterministic profile

In this appendix we establish bounds and asymptotics for the deterministic profile $\Theta_{xy}$ .

where $g$ is a bounded smooth function satisfying $g(q)=g(-q)$ . Clearly, $\widehat{f}$ is real and $\|\widehat{f}\|_{\infty}\leqslant 1$ . Moreover, we claim that for any $\varepsilon>0$ there exists an $\varepsilon^{\prime}>0$ such that

indeed, this follows easily from the identity

As a guide for intuition, we have $\widehat{S}_{W}(q)\approx\widehat{f}(q)$ , as can be seen from

Thus, our proof consists in controlling the error in the approximation

As a first step, we establish basic properties of $\widehat{S}_{W}$ that are analogous to (A.2) and (A.1).

The function $\widehat{S}_{W}$ is smooth with uniformly bounded derivatives, real, and symmetric with $\lvert\widehat{S}_{W}(q)\rvert\leqslant 1$ and $\widehat{S}_{W}(0)=1$ . Moreover, it has the following properties.

For any $\varepsilon>0$ there exists an $\varepsilon^{\prime}>0$ such that

for large enough $W$ (depending on $\varepsilon$ ).

There exists a smooth function $g_{W}$ whose derivatives are bounded uniformly in $W$ such that

The proof of (i) is similar to that of (A.2).

(recall that $\lvert q\rvert\leqslant\pi W$ ) and that

to estimate the main term, and (2.1) to estimate the error term with $K\sim 1/\delta$. We also have the trivial bound $|\widehat{S}_{W}|\leqslant 1$. Thus we have

We can iterate the above argument for the main term in (A.8), thus obtaining higher-order divided differences of $f$. Since $f$ is smooth and decays rapidly, (A.6) follows for $k=0$. The proof for $k>0$ is analogous.

Now (iii) follows from the fact that the function $h(q)\mathrel{\mathop{:}}=\bigl{(}{1-q^{2}/2-\cos(q)}\bigr{)}q^{-4}$ is smooth and its derivatives are bounded. ∎
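The smoothness of $h$ at the origin comes from the Taylor expansion $\cos q=1-\frac{q^{2}}{2}+\frac{q^{4}}{24}-\frac{q^{6}}{720}+\dots$. This gives $h(q)=-\frac{1}{24}+\frac{q^{2}}{720}-\dots$ near $q=0$, while $h(q)\approx-\frac{1}{2q^{2}}$ for large $\lvert q\rvert$. The hypothetical helper below evaluates $h$ and switches to the Taylor polynomial near the origin, where the direct formula suffers from cancellation.

```python
import math


def h(q: float) -> float:
    """h(q) = (1 - q**2/2 - cos q) / q**4, evaluated without cancellation near q = 0."""
    if abs(q) < 1e-2:
        q2 = q * q
        # Taylor series of h: -1/24 + q^2/720 - q^4/40320 + O(q^6)
        return -1.0 / 24 + q2 / 720 - q2 * q2 / 40320
    return (1 - q * q / 2 - math.cos(q)) / q**4


print(h(0.0), h(1e-3), h(1.0), h(100.0) * 100.0**2)   # last value approaches -1/2
```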

with some positive constant $c$. By (A.5) we have on the support of $\overline{\chi}$

for some $\varepsilon^{\prime}$ depending on $\varepsilon$ . Then from (A.3) we have

extended to the whole real line, is smooth and its derivatives are bounded uniformly in $N$ and $W$ (by (A.9)). (These bounds may of course depend on $\varepsilon$ ). Moreover, $R(q)=O_{K}(\langle q\rangle^{-K})$ for any $K$ ; see (A.6). By summation by parts, as in (A.8), we find that for such a function we have

Now we consider the first term in (A.10). For the following we use $A_{i}(q,\eta,N,W)$ with $i=1,2,3,\dots$ to denote functions that are smooth in $q$ and whose $q$ -derivatives are uniformly bounded in $q$ , $\eta$ , $W$ , and $N$ . Using the Taylor expansion (A.7) and (3.5), we have (omitting the arguments for brevity)

This gives (again omitting the arguments)

where we introduced the new variable $r\mathrel{\mathop{:}}=\eta^{-1/2}q$ . By definition, $A_{1},\dots,A_{6}$ and their $q$ -derivatives are uniformly bounded. Since $D\geqslant c>0$ and $r\leqslant\varepsilon\eta^{-1/2}$ on the support of $\chi$ , we find that for small enough $\varepsilon$ the denominator of the second line of (A.12) is bounded away from zero, uniformly in $r$ , $\eta$ , $W$ , and $N$ . We therefore conclude that $F_{N,W,\eta}$ is smooth and its derivatives (in the variable $r$ ) are uniformly bounded.

Using summation by parts, exactly as in (A.8), we get

Here we used that the sum on the left-hand side ranges over a set of size $O(N/W)$ due to the factor $\chi$ in the definition of $F_{N,W,\eta}$ . Therefore (A.12) and (A.13) imply that the first term of (A.10) is given by

Notice that the error term in (A.11) is smaller than in (A.14). Next, we remove the factor $\chi$ from the main term, exactly as in (A.11). Plugging this into (A.10) yields

We can extend the summation in the main term

where the error term on the right-hand side is of order $O(W^{-2})$ . Thus we have

The main term can be computed by the Poisson summation formula

In order to prove (2.38), it suffices to analyse the asymptotics of the expression

We consider two cases. If $\eta\geqslant\bigl{(}{\frac{W}{N}}\bigr{)}^{2}$ then $R\asymp\frac{1}{W\sqrt{\eta}}$ . On the other hand, if $\eta\leqslant\bigl{(}{\frac{W}{N}}\bigr{)}^{2}$ we use an integral approximation to get

This concludes the proof of (2.38), and hence of Proposition 2.8.
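A simplified model sum makes the two cases visible. We take the hypothetical model $\frac{1}{N}\sum_{p}(\eta+W^{2}p^{2})^{-1}$ over $p\in\frac{2\pi}{N}\mathbb{Z}_{N}$, with all constants set to one. This is our simplification, not the exact expression analysed above. Extend the sum to all of $\frac{2\pi}{N}\mathbb{Z}$ and use $\sum_{k\in\mathbb{Z}}(k^{2}+a^{2})^{-1}=\frac{\pi}{a}\coth(\pi a)$. This gives the closed form $\frac{1}{2W\sqrt{\eta}}\coth\frac{N\sqrt{\eta}}{2W}$, which is

- $\asymp\frac{1}{W\sqrt{\eta}}$ for $\eta\gg(W/N)^{2}$,
- $\asymp\frac{1}{N\eta}$ for $\eta\ll(W/N)^{2}$.

```python
import numpy as np


def model_sum(N: int, W: float, eta: float) -> float:
    """(1/N) * sum over p in (2*pi/N) Z_N, p in (-pi, pi], of 1 / (eta + (W p)^2)."""
    k = np.arange(N)
    p = 2 * np.pi * np.where(k <= N // 2, k, k - N) / N
    return float(np.mean(1.0 / (eta + (W * p) ** 2)))


def coth_formula(N: int, W: float, eta: float) -> float:
    """Same sum extended to all p in (2*pi/N) Z: coth(N sqrt(eta) / (2W)) / (2 W sqrt(eta))."""
    y = N * np.sqrt(eta) / (2 * W)
    return float(1.0 / (2 * W * np.sqrt(eta) * np.tanh(y)))


N, W = 10_000, 10.0                          # crossover at eta = (W/N)^2 = 1e-6
for eta in (1e-8, 1e-6, 1e-4, 1e-2):
    s = model_sum(N, W, eta)
    bound = 1 / (W * np.sqrt(eta)) + 1 / (N * eta)
    print(f"eta={eta:.0e}  sum={s:.4g}  coth={coth_formula(N, W, eta):.4g}  ratio={s / bound:.3f}")
```

In all four cases the ratio to $\frac{1}{W\sqrt{\eta}}+\frac{1}{N\eta}$ stays of order one.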

A.2. Higher dimensions: proofs of Lemmas 8.1 and 8.2

We follow the argument from the proof of Proposition 2.8 in the previous section, and merely sketch the differences. We use the $d$ -dimensional lattices

Note that, unlike in the proof of Proposition 2.8, we keep the cutoff function $\chi$ in the main term. The reason is that the function $(\eta+q\cdot Dq)^{-1}$ decays only like $\lvert q\rvert^{-2}$ and is therefore not integrable in dimensions $d\geqslant 2$.

The main term of (A.17) can be computed using Poisson summation:

Using that $V(x)\asymp\lvert x\rvert^{2-d}$ near the origin, we find $\lVert V*\varphi_{\sqrt{\alpha\eta}}\rVert_{\infty}\leqslant CW^{-d}$ . By treating the two cases $\eta\leqslant\bigl{(}{\frac{W}{N}}\bigr{)}^{2}$ and $\eta\geqslant\bigl{(}{\frac{W}{N}}\bigr{)}^{2}$ separately, we find exactly as in the last paragraph of the proof of Proposition 2.8 that (A.18) is bounded by $CW^{-d}+C(N\eta)^{-1}$ .

What remains is therefore the estimate of the error term containing $R$ in (A.17). To that end, we write

We need a more precise bound on the error term of (A.17) than the bound $CW^{-d}$ from the proof of Lemma 8.1. In fact, we claim that

The proof of (A.20) is a rather laborious exercise in Taylor expansion whose details we omit. The basic strategy is similar to the analysis of (A.12), except that we expand $\widehat{S}_{W}$ up to order $d/2+2$ (instead of $4$ ). This completes the proof of Lemma 8.2. ∎

A.3. Slowly decaying band: proof of Lemma 8.8 and Proposition 8.9

We begin by proving the following auxiliary result, which gives the relevant asymptotics of $\widehat{S}_{W}$ . For $q\neq 0$ define

We also set $b(0)\mathrel{\mathop{:}}=0$ , so that $b$ is continuous.

Suppose that $d=1$ and that (8.6) and (8.7) hold. Then the following are true.

Part (i) is proved similarly to (A.6), using summation by parts.

Let $\chi$ be a smooth, symmetric bump function satisfying $\chi(x)=1$ for $\lvert x\rvert\leqslant 1$ and $\chi(x)=0$ for $\lvert x\rvert\geqslant 2$. Write $\overline{\chi}\mathrel{\mathop{:}}=1-\chi$. We introduce the splitting

on the right-hand side of (A.23). It is easy to check that the first two terms give a contribution of order $O(q^{2})$. The last term of the splitting gives rise to

where the last step follows from a mid-point Riemann sum approximation. Now a change of variables $u=qx$ easily yields (A.22).

Part (iii) follows from part (ii) using an argument similar to (5.15). ∎

where the first term is the contribution of the low modes $\lvert q\rvert\leqslant\eta^{1/\beta}$ and the second term the contribution of the high modes $\lvert q\rvert\geqslant\eta^{1/\beta}$ , which may be replaced with an integral and estimated using Lemma A.2. We omit the details. ∎

We proceed similarly to the proof of Proposition 2.8. We choose a cutoff scale $\varepsilon$ , and denote by $\chi$ the bump function from the proof of Lemma A.2. The scale $\varepsilon$ satisfies $\eta^{1/\beta}\ll\varepsilon\ll 1$ , and will be chosen by optimizing at the end of the proof.

We use the expansions (A.22) and (3.5). Thus we find, as in the proof of Proposition 2.8,

where $\chi$ is a smooth bump function as in the proof of Proposition 2.8 and

Note that for $q\in Q\setminus\{0\}$ we have $b(q)\geqslant c$. Using (A.21), we may therefore estimate as in (A.12) to get

for some $c>0$, where we used (8.10). Setting $\varepsilon\mathrel{\mathop{:}}=\eta^{1-1/\beta}$ and using Poisson summation yields

Now (8.11) follows by noting that by (8.9), under the assumption (8.10), only the term $k=0$ is of leading order. ∎

Appendix B Multilinear large deviation estimates

In this appendix we give a generalization of the large deviation estimate of Corollary B.3 in EYY1. The proof is simpler, and the statement is formulated under the assumption (2.11) instead of the stronger subexponential decay assumption. Moreover, since the current proof does not rely on the Burkholder inequality, it is trivially generalizable to arbitrary multilinear estimates.

Throughout the following we consider random variables $X$ satisfying

Suppose that $\bigl{(}{\sum_{i}\lvert b_{i}\rvert^{2}}\bigr{)}^{1/2}\prec\Psi$ . Then $\sum_{i}b_{i}X_{i}\prec\Psi$ .

Suppose that $\bigl{(}{\sum_{i\neq j}\lvert a_{ij}\rvert^{2}}\bigr{)}^{1/2}\prec\Psi$ . Then $\sum_{i\neq j}a_{ij}X_{i}X_{j}\prec\Psi$ .

Suppose that $\bigl{(}{\sum_{i,j}\lvert a_{ij}\rvert^{2}}\bigr{)}^{1/2}\prec\Psi$ . Then $\sum_{i,j}a_{ij}X_{i}Y_{j}\prec\Psi$ .

If all of the above random variables depend on an index $u$ and the hypotheses of (i) – (iii) are uniform in $u$ , then so are the conclusions.
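Theorem B.1 controls linear and quadratic sums by the $\ell^{2}$ (Hilbert-Schmidt) norm of their coefficients. The Monte Carlo sketch below checks only the second moment ($p=2$) in parts (i) and (ii), whereas the proof controls all moments. Condition (B.1) is not reproduced here. We assume centred, unit-variance Rademacher variables, and the coefficients and sample sizes are hypothetical. For such variables and a symmetric $a$ with zero diagonal, $\mathbb{E}\bigl(\sum_{i\neq j}a_{ij}X_{i}X_{j}\bigr)^{2}=2\sum_{i\neq j}a_{ij}^{2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, samples = 200, 20_000

# Deterministic coefficients, drawn once and then held fixed.
b = rng.normal(size=N)
a = rng.normal(size=(N, N))
a = (a + a.T) / 2
np.fill_diagonal(a, 0.0)                     # only pairs i != j enter the quadratic form

# Centred, unit-variance entries; Rademacher signs are an illustrative assumption.
X = rng.choice([-1.0, 1.0], size=(samples, N))

linear = X @ b                               # sum_i b_i X_i, one value per sample
quadratic = np.sum((X @ a) * X, axis=1)      # sum_{i != j} a_ij X_i X_j

print(linear.std() / np.linalg.norm(b))                    # about 1
print(quadratic.std() / (np.sqrt(2) * np.linalg.norm(a)))  # about 1 for symmetric a
```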

The rest of this appendix is devoted to the proof of Theorem B.1. Our proof in fact generalizes trivially to arbitrary multilinear estimates for quantities of the form $\sum_{i_{1},\dots,i_{k}}^{*}a_{i_{1}\dots i_{k}}(u)X_{i_{1}}(u)\cdots X_{i_{k}}(u)$ , where the star indicates that the summation indices are constrained to be distinct.

We first recall the following version of the Marcinkiewicz-Zygmund inequality.

Let $X_{1},\dots,X_{N}$ be a family of independent random variables each satisfying (B.1) and suppose that the family $(b_{i})$ is deterministic. Then

The proof is a simple application of Jensen’s inequality. Writing $B^{2}\mathrel{\mathop{:}}=\sum_{i}\lvert b_{i}\rvert^{2}$, we get, by the classical Marcinkiewicz-Zygmund inequality stroock in the first line, that

Next, we prove the following intermediate result.

Let $X_{1},\dots,X_{N},Y_{1},\dots,Y_{N}$ be independent random variables each satisfying (B.1), and suppose that the family $(a_{ij})$ is deterministic. Then for all $p\geqslant 2$ we have

Note that $(b_{j})$ and $(Y_{j})$ are independent families. By conditioning on the family $(b_{j})$ , we therefore get from Lemma B.2 and the triangle inequality that

Let $X_{1},\dots,X_{N}$ be independent random variables each satisfying (B.1), and suppose that the family $(a_{ij})$ is deterministic. Then we have

The proof relies on the identity (valid for $i\neq j$ )

where the sum ranges over nonempty subsets $I$ and $J$ . Now we may estimate

As remarked above, the proof of Lemma B.4 may be easily extended to multilinear expressions of the form $\sum_{i_{1},\dots,i_{k}}^{*}a_{i_{1}\dots i_{k}}X_{i_{1}}\cdots X_{i_{k}}$ .

We may now complete the proof of Theorem B.1.

The proof is a simple application of Chebyshev’s inequality. Part (i) follows from Lemma B.2, part (ii) from Lemma B.4, and part (iii) from Lemma B.3. We give the details for part (ii).

for arbitrary $D$ . In the second step we used the definition of $\bigl{(}{\sum_{i\neq j}|a_{ij}|^{2}}\bigr{)}^{1/2}\prec\Psi$ with parameters $\varepsilon/2$ and $D+1$ . In the last step we used Lemma B.4 by conditioning on $(a_{ij})$ . Given $\varepsilon$ and $D$ , there is a large enough $p$ such that the first term on the last line is bounded by $N^{-D-1}$ . Since $\varepsilon$ and $D$ were arbitrary, the proof is complete.

The claimed uniformity in $u$ in the case that $a_{ij}$ and $X_{i}$ depend on an index $u$ also follows from the above estimate. ∎