Random Matrices: The circular Law

Terence Tao, Van Vu

Introduction

Let x{x} be a complex random variable with finite non-zero variance 0<σ2<0<\sigma^{2}<\infty and NnN_{n} be the random matrix of order nn with entries being i.i.d. copies of x{x}. Let λ1,,λn\lambda_{1},\dots,\lambda_{n} be the eigenvalues of 1σnNn\frac{1}{\sigma\sqrt{n}}N_{n}. Define the empirical spectral distribution (ESD) μn\mu_{n} of NnN_{n} by the formula

We say that the (strong) circular law holds for x{x} if, with probability 11, the spectral distribution μn\mu_{n} converges (uniformly) to the uniform distribution

over the unit disk as nn tends to infinity. In the literature one also sees the weak circular law, which asserts that for any fixed ss and tt, that μn(s,t)\mu_{n}(s,t) converges to μ(s,t)\mu_{\infty}(s,t) in probability.

As the name suggests, the weak circular law is easier to prove than the strong one. Using the approach in , the proofs of both types of convergence boil down to controlling the least singular value of 1σnNnzI\frac{1}{\sigma\sqrt{n}}N_{n}-zI. For the weak convergence, one needs a bound with failure probability tending to zero with nn tends to infinity (this is the approach taken in , for example). On the other hand, for the strong convergence one needs the failure probability be summable in nn. This appears much more difficult and we will discuss it in more detail in Section 2 (see the paragraph following Theorem 2.1).

In this paper we shall be concerned exclusively with the strong circular law, and in particular with regard to the following well-known conjecture:

Circular law conjecture. The strong circular law holds for any complex variable x{x} with zero mean and finite non-zero variance.

The circular law conjecture was formulated in the early 1950s, as a natural (non-hermitian) counterpart of Wigner’s semi-circle law. Since then, several partial results have been obtained, at the cost of extra assumptions on the distribution of the basic variable x{x}. In the next few paragraphs, we give a brief survey of these results.

If x{x} is complex Gaussian, the conjecture was proved by Mehta in 1967, using the joint density function of the eigenvalues λi\lambda_{i} which was discovered by Ginibre few years earlier . An important breakthrough was made by Bai , following an earlier work of Girko . (Bai’s paper discussed Girko’s paper carefully and pointed out some gaps in that paper.) In , Bai proved the claim under the assumption that x{x} has finite sixth moment (Ex6<{\mathbf{E}}|{x}|^{6}<\infty) and that the joint distribution of the real and imaginary parts of x{x} has a bounded density. Recently, in [2, Chapter 10], a finer result was obtained showing that the sixth moment hypothesis can be weakened to Ex2+η<{\mathbf{E}}|{x}|^{2+\eta}<\infty for any specified η>0\eta>0. However, the bounded density assumption remains critical. This assumption, unfortunately, excludes several important distributions, for instance discrete distributions such as Bernoulli random variables x{1,+1}x\in\{-1,+1\}.

[2, Theorem 10.3] Assume that the complex random variable x{x} has zero mean and finite (2+η)th(2+\eta)^{\operatorname{th}} moment for some η>0\eta>0 and also that the joint distribution of the real and imaginary part has a bounded density. Then the circular law holds for x{x}.

A key idea in is to analyze the ESD μn\mu_{n} through its Stieltjes transformation sn:CCs_{n}:{\mathbf{C}}\to{\mathbf{C}}, defined by the formulaWe are using 1\sqrt{-1} for the imaginary unit, as we wish to reserve ii as an index of summation.

As sn(z)s_{n}(z) is analytic everywhere except the poles, the real part already determines the eigenvalues λk\lambda_{k}. If write sn(z)=snr(z)+1sni(z)s_{n}(z)=s_{nr}(z)+\sqrt{-1}s_{ni}(z), λk=λkr+1λki\lambda_{k}=\lambda_{kr}+\sqrt{-1}\lambda_{ki} and z=s+1tz=s+\sqrt{-1}t, we have the important identity

where νn(.,z)\nu_{n}(.,z) is the ESD of the Hermitian matrix Hn:=(1nNnzI)(1nNnzI)H_{n}:=(\frac{1}{\sqrt{n}}N_{n}-zI)(\frac{1}{\sqrt{n}}N_{n}-zI)^{\ast}. The task then reduces (at least in principle) to controlling the distributions νn\nu_{n}.

The log\log function has two poles, at \infty and . The first one is easy to deal with, as one can bound the largest singular value by a polynomial in nn. The pole at poses a much more serious obstacle, since the smallest eigenvalue of HnH_{n} (or the least singular value of NnzIN_{n}-zI) can be arbitrary close to . (In fact, if the matrix is singular, which happens with positive probability in discrete models, then the least singular value is .) The bounded density assumption in Theorem 1.1 was introduced primarily in order to handle this obstacle.

In the last few years, the least singular value problem has become better understood in the discrete case, thanks to a series of papers . In these papers, strong lower bounds for the least singular value of a random matrix or a random perturbation of a fixed matrix were obtained. As a consequence, the circular law has recently been established for various new classes of distributions. For instance, Götze and Tikhomirov proved the weak circular law for any sub-GaussianA variable is sub-Gaussian if it has exponential tail; in particular all of its moments are bounded. distribution x{x}, using the arguments from . In , Girko established the weak circular law assuming bounded 4+δ4+\delta moment for some δ>0\delta>0. Relying on , Pan and Zhou were recently able to verify the strong circular law for any distribution with a bounded fourth moment. This assumption is needed for a number of reasons, in particular allowing one to bound the operator norm of NnN_{n} by O(n)O(\sqrt{n}) with high probability. Very recently (a few months after the current paper was first posted on the arXiv), Götze and Tikhomirov proved the weak circular law under an assumption similar to our main theorem below.

In this paper, we prove the circular law only assuming a bounded (2+η)th(2+\eta)^{\operatorname{th}} moment, for any fixed η>0\eta>0. In particular, we have completely removed the bounded density function assumption in Theorem 1.1.

Assume that x{x} is a complex random variable with zero mean and finite (2+η)th(2+\eta)^{\operatorname{th}} moment for some η>0\eta>0, with strictly positive variance. Then the strong circular law holds for x{x}.

This result can be further strengthened in several directions:

We can further relax the condition Ex2+η<{\mathbf{E}}|{x}|^{2+\eta}<\infty to Ex2logC(2+x)<{\mathbf{E}}|{x}|^{2}\log^{C}(2+|{x}|)<\infty, where CC is a sufficiently large absolute constant. (For instance, C=16C=16 is sufficient; see Section 13 for details.)

It is not necessary to assume that the entries have identical distributions. It suffices to assume that they are independent, have mean zero with uniformly bounded (2+η)th(2+\eta)^{\operatorname{th}} moments, and that they are all dominated (in a Fourier-analytic sense) by a single random variable with finite non-zero variance and bounded (2+η)th(2+\eta)^{{\operatorname{th}}} moment; see Remark 2.8 below. (See also [2, p. 326-327] in which the extension of the circular law to the case of non-identical distributions is discussed.)

One can obtain some quantitative estimates on the rate of convergence as well. For example, under the (2+η)th(2+\eta)^{\operatorname{th}} moment assumption, we can show that almost surely, the distance sups,tμn(s,t)μ(s,t)\sup_{s,t}|\mu_{n}(s,t)-\mu_{\infty}(s,t)| between μn\mu_{n} and the limiting distribution μ\mu_{\infty} in the uniform metric is at most nηn^{-\eta^{\prime}} for some constant η>0\eta^{\prime}>0 and all sufficiently large nn.

It is too technical to address these points in the main proof, so we are going to first prove Theorem 1.2 and sketch out the necessary modifications to obtain these refinements in Sections 13, 14.

The circular law also holds for sparse random matrices. For 0<μ10<\mu\leq 1, let Iμ{\mathbf{I}}_{\mu} be the boolean random variable which takes value 1 with probability μ\mu and with probability 1μ1-\mu. Let ρ=n1+α\rho=n^{-1+\alpha}, for a positive constant α\alpha. Let Nn,ρN_{n,\rho} be the random matrix with the ijij entry being Ii,j,ρxi,j{\mathbf{I}}_{i,j,\rho}{x}_{i,j}, where the Ii,j,ρ{\mathbf{I}}_{i,j,\rho} and xij{x}_{ij} are jointly independent iid copies of Iρ{\mathbf{I}}_{\rho} and x{x} respectively. Götze and Tikhomirov proved that if x{x} is sub-Gaussian and α>3/4\alpha>3/4, then Nn,ρN_{n,\rho} admits the circular law. We can prove the following strengthening of this result:

Let α>0\alpha>0 and η>0\eta>0 be arbitrary positive constants. Assume that x{x} is a complex random variable with zero mean and finite (2+η)th(2+\eta)^{\operatorname{th}} moment. Set ρ=n1+α\rho=n^{-1+\alpha} and let μn,ρ\mu_{n,\rho} be the ESD of 1σnρNn,ρ\frac{1}{\sigma\sqrt{n\rho}}N_{n,\rho}, where σ2\sigma^{2}, as usual, is the variance of x{x}. Then μn,ρ\mu_{n,\rho} tends to the uniform distribution μ\mu_{\infty} over the unit disk as nn tends to infinity.

If one takes α=0\alpha=0, the circular law no longer holds. In this case, ρ=n1\rho=n^{-1} and each entry equals with probability 11/n1-1/n. Thus, a row is all-zero with probability (11/n)ne1(1-1/n)^{n}\approx e^{-1}. Since the rows are independent, it is easy to show that with high probability one has Θ(n)\Theta(n) all-zero rows. But this means that the ESD, with high probability, has positive constant mass at the origin.

We shall prove this theorem in parallel with Theorem 1.2, by indicating at various junctures what the “sparse” version of certain key lemmas are.

The key ingredient in our proof of the circular law is a new lower bound for the least singular value of the matrix M+NnM+N_{n}, where MM is an arbitrary matrix with complex entries having absolute values bounded from above by a polynomial in nn. For the circular law, we only need to consider the case M=zIM=-zI, where II is the identity matrix. On the other hand, the general case is interesting on its own right and proves useful in other areas of mathematics (see, for example, ). Our arguments permit the coefficients of MM or NnN_{n} to be as large as nCn^{C} for any fixed constant CC, which is the main reason why we do not need any stronger moment control on x{x} beyond the (2+η)th(2+\eta)^{\operatorname{th}} moment.

The rest of the paper is organized as follows. In the next section, we present the above mentioned result on the least singular value. The key tool for proving this result is a so-called Inverse Littlewood-Offord theorem, discussed in Section 3. This theorem is motivated by several previous results of the same spirit from . On the other hand, the bound in Section 3 is nearly optimal and is sharper than one that can be deduced from . This improvement is critical to us.

The proof of the Inverse Littlewood-Offord theorem is technical and requires several lemmas, developed in Sections 4-9. In particular, we prove a forward Littlewood-Offord theorem (Theorem 6.6), which seems to be of interest on its own right. The proof of the Inverse Theorem follows in Section 10. Next, we prove the desired bound on the least singular value in Section 11. The proof of the circular law follows in Section 12. The rest of the paper is devoted to various refinements of the circular law; for instance, Theorem 1.3 is discussed in Section 15.

In order to handle the sparse case, we need sparse versions of all the tools mentioned above. These results can be proved using the same argument with some modifications. We will only sketch these proofs in the paper.

Let us conclude this section with our notation.

In the whole paper we assume that nn is sufficiently large, whenever needed. Asymptotic notation is used under the assumption that nn\rightarrow\infty. Let XX and YY be non-negative quantities. X=O(Y)X=O(Y), XYX\ll Y, YXY\gg X and Y=Ω(X)Y=\Omega(X) all mean that XCYX\leq CY for some positive constant CC and X=Θ(Y)X=\Theta(Y) means XYXX\ll Y\ll X; X=o(Y)X=o(Y) means that Xc(n)Y|X|\leq c(n)Y where c(n)c(n) goes to zero as nn\to\infty.

Throughout the paper, letters A,B,C,c,α,ε,η,δ,κA,B,C,c,\alpha,\varepsilon,\eta,\delta,\kappa are used to denote constants. Letters μ,ρ,β\mu,\rho,\beta denote quantities that may depend on nn.

We use P{\mathbf{P}} to denote probability, E{\mathbf{E}} to denote expectation, and Iρ{\mathbf{I}}_{\rho} to denote indicator functions of expectation ρ\rho as used earlier in this section. If EE is an event, we use I(E){\mathbf{I}}(E) to denote the indicator of EE, which equals 11 when EE is true and otherwise. The cardinality of a finite set SS will be denoted #S\#S, and the Lebesgue measure of a set ACA\subset{\mathbf{C}} will be denoted mes(A){\operatorname{mes}}(A).

Least singular value bound

Let MM be a matrix of order nn. We use M\|M\| to denote the spectral norm of MM (i.e. the largest singular value of MM)

As discussed in the previous section, a key point in Bai’s approach is to obtain control on the lower tail distribution for the least singular value of 1nNnzIn\frac{1}{\sqrt{n}}N_{n}-zI_{n}, or equivalently to obtain control on the upper tail distribution of the norm of the inverse (1nNnzIn)1\|(\frac{1}{\sqrt{n}}N_{n}-zI_{n})^{-1}\|.

This will be achieved in Theorems 2.1 below. The strength of this theorem is that it requires a very weak assumption on the distribution of the entries. All we need is a finite second moment. Several results of this type were obtained recently, under stronger assumptions of x{x}. For example, addressed the case when x{x} was real Gaussian; addressed the case when x{x} has support on the integers and MM has integer entries. This was done building upon the M=0M=0 case discussed in . The case when x{x} has finite third moment and that M\|M\| is bounded by O(n1/2)O(n^{1/2}) was addressed in (building upon the M=0M=0 real-valued case proven in ). In this result, the assumption on the norm of MM is important and the constant 1/21/2 (in the exponent of nn) cannot be replaced by any other constant. Furthermore, in the complex-valued case, the bounds in depended on the entire covariance matrix of x{x} and not just on the variance.

Let A,C1A,C_{1} be positive constants, and let x{x} be a complex-valued random variable with non-zero finite variance (in particular, the second moment is finite). Then there are positive constants BB and C2C_{2} such that the following holds: if NnN_{n} is the random matrix of order nn whose entries are iid copies of x{x}, and MM is a deterministic matrix of order nn with spectral norm at most nC1n^{C_{1}}, then,

It is very important that we can have any constant AA in the bound. If A>1A>1, then the right hand side is summable in nn and this is critical to the strong circular law. In order to prove the weak law, any AA suffices. The difficulty between getting any AA and getting A>1A>1 can be illustrated by the following simplified case. Take MM be the zero matrix and NN be the random Bernoulli matrix (whose entries take value ±1\pm 1 with probability 1/21/2). To make the situation even simpler, assume that we only want to bound the probability that N1N^{-1} does not exists (namely that NN is singular). Already in the 70s, Komlós proved that this probability is O(n1/2)O(n^{-1/2}). However, the first proof for a bound of the type O(n1ε)O(n^{-1-\varepsilon}) was obtained only almost twenty years later by Kahn, Komlós and Szemerédi , using a much more complex argument.

Let us now go back to Theorem 2.1. In fact, we have a more precise statement involving a seemingly stronger (but actually equivalent) assumption on x{x}. More precisely, we introduce the following technical definition.

Let κ1\kappa\geq 1. A complex random variable x{x} is said to have κ\kappa-controlled second moment if one has the upper bound

(in particular, Exκ1/2|{\mathbf{E}}{x}|\leq\kappa^{1/2}), and the lower bound

The Bernoulli random variable (P(x=+1)=P(x=1)=1/2{\mathbf{P}}({x}=+1)={\mathbf{P}}({x}=-1)=1/2) has 11-controlled second moment. The condition (1) asserts in particular that x{x} has variance at least 1κ\frac{1}{\kappa}, but also asserts that a significant portion of this variance occurs inside the event xκ|{x}|\leq\kappa, and also contains some more technical phase information about the covariance matrix of Re(x){\operatorname{Re}}({x}) and Im(x){\operatorname{Im}}({x}).

To show that this condition is not significantly stronger than bounded second moment, we prove that any complex random variable of finite non-zero variance has controlled second moment after a (harmless) phase rotation:

Let x{x} be a complex random variable with finite non-zero variance. Then there exists a phase e1θe^{\sqrt{-1}\theta} and a κ1\kappa\geq 1 such that e1θxe^{\sqrt{-1}\theta}{x} has κ\kappa-controlled second moment.

For κ\kappa sufficiently large, we have Ex2κ{\mathbf{E}}|{x}|^{2}\leq\kappa, and the event xκ|{x}|\leq\kappa has probability at least 1/κ1/\sqrt{\kappa}. Let xκ{x}_{\kappa} be the variable x{x} conditioned on the event xκ|{x}|\leq\kappa. Since x{x} has non-zero variance, we see that xκ{x}_{\kappa} will also have non-zero variance for κ\kappa large enough. It will then suffice to show that

after rotating x{x} by a phase if necessary. If we write yκ:=xκE(xκ){y}_{\kappa}:={x}_{\kappa}-{\mathbf{E}}({x}_{\kappa}), then we easily compute

so it suffices to show that for κ\kappa sufficiently large we have

Now set y:=xE(x){y}:={x}-{\mathbf{E}}({x}) and consider the covariance matrix

Since x{x} has finite non-zero variance, we see that this matrix is finite, non-zero, and positive semi-definite. In particular its largest eigenvalue is at least δ\delta for some δ>0\delta>0. By monotone convergence we then conclude that the covariance matrix

has largest eigenvalue at least δ/2\delta/2 for κ\kappa sufficiently large.

Now fix κ\kappa large enough so that all the above statements hold, and also so that 1κδ2\frac{1}{\sqrt{\kappa}}\leq\frac{\delta}{2}. The null space of (3) is at most one-dimensional. By rotating x{x} by a phase we may then assume that the null space is contained in the imaginary axis {(0w):wR}\{\left(\begin{matrix}0\\ w\end{matrix}\right):w\in{\mathbf{R}}\}. Since covariance matrices are positive semi-definite, we thus have the quadratic form estimate

and (2) follows by setting u=Re(z)u=Re(z) and v=Im(z)v=Im(z). ∎

Since rotating all entries by the same phase does not change the norm of the inverse, Theorem 2.1 follows from the following theorem.

Let A,C1,C2A,C_{1},C_{2} be positive constants. There are positive constants BB and C3C_{3} such that the following holds. Let x{x} be a random variable with C1C_{1}-controlled second moment and NnN_{n} be the random matrix of order nn whose entries are i.i.d copies of x{x}. Let MM be a deterministic matrix of order nn with spectral norm at most nC2n^{C_{2}}. Then,

Our arguments give an explicit dependence of BB in terms of AA and C2C_{2}. One can set BB to be roughly 2AC22AC_{2}. A more exact dependence can be obtained with considerably more technical details. Since for the proof of the circular law, any constant BB suffices, we do not go into this matter here and will discuss it elsewhere.

Notice that the assumptions in Theorem 2.5 are weaker than the assumption of Theorem 1.2. We do not require x{x} to have mean and bounded (2+η)th(2+\eta)^{\operatorname{th}} moment. In the proof of Theorem 1.2, these extra assumptions are needed in order to repeat the approach of Bai, and are unrelated to the pole problem or Theorem 2.5.

One can relax somewhat the hypothesis that the entries xij{x}_{ij} of NnN_{n} are i.i.d copies of x{x}. It is sufficient to assume the following

xij{x}_{ij} are dominated by a single distribution x{x} in the Fourier-analytic sense that E(e2π1Re(ξxij))E(e2π1Re(ξx))|{\mathbf{E}}(e^{2\pi\sqrt{-1}{\operatorname{Re}}(\xi{x}_{ij})})|\leq{\mathbf{E}}(e^{2\pi\sqrt{-1}{\operatorname{Re}}(\xi{x})}) for all complex numbers ξ\xi.

x{x} has κ\kappa-controlled second moment for some fixed κ\kappa.

This refinement can be extracted without too much difficulty from the proof in this paper, which ultimately relies on Fourier-analytic methods. Using this refinement and following [2, Chapter 10.8.2], we can extend Theorem 1.2 for the case the the entries of NnN_{n} are independent, but not necessarily identically distributed, as mentioned in the introduction.

In order to deal with sparse random matrices, we prove the following variant of Theorem 2.1.

Let A>1,C1,C2,αA>1,C_{1},C_{2},\alpha be positive constants. There are positive constants BB and C3C_{3} depending on A,C1,C2,αA,C_{1},C_{2},\alpha such that the following holds. Let x{x} be a random variable with C1C_{1}-controlled second moment and let Nn,ρN_{n,\rho} be the random matrix of order nn defined as in Theorem 1.3. Let MM be a deterministic matrix of order nn with spectral norm at most nC2n^{C_{2}}. Then,

To conclude this section, let us derive a simple corollary of Theorem 2.9.

Let A,C1,C2,αA,C_{1},C_{2},\alpha be positive constants. There are positive constants BB and C3C_{3} such that the following holds. Let x{x} be a random variable with C1C_{1}-controlled second moment and Nn,ρN_{n,\rho} be the random matrix of order nn defined as in Theorem 1.3. Let MM be a deterministic matrix of order nn with spectral norm at most nC2n^{C_{2}}. Then,

A simple application of Chebyshev’s inequality shows that

Since Nn,ρ\|N_{n,\rho}\| is bounded from above by maxij=1nxij\max_{i}\sum_{j=1}^{n}|{x}_{ij}|, we have that

by the union bound. Combining this with the polynomial bound on M\|M\| and with Theorem 2.9, the claim follows by choosing BB sufficiently large. ∎

The condition number MM1\|M\|\|M^{-1}\| of a matrix MM plays a crucial role in numerical linear algebra (see , for instance). The above corollary implies that if one perturbes a fixed matrix MM by a (very general) sparse random matrix NnN_{n}, the condition number of the resulting matrix will be relatively small with high probability. This fact has some nice applications in theoretical computer science (see or , for example).

Inverse Littlewood-Offord theorems

Let us consider a toy case in order to illustrate the ideas behind the proof of Theorem 2.5. Assume, for a moment, that M=0M=0 and xN(0,1){x}\equiv N(0,1) is real Gaussian. In this case, we talk about the least singular value of the random matrix NnN_{n} whose entries are i.i.d real Gaussian. Let XiX_{i} be the row vectors of NnN_{n} and did_{i} be the distance from XiX_{i} to the hyperplane HiH_{i} spanned by XjX_{j}, jij\neq i. The least singular value of NnN_{n} is close (up to factors of nO(1)n^{O(1)}) to min1indi\min_{1\leq i\leq n}d_{i}. Thus, our goal is to prove that with high probability, each of the did_{i} is bounded away from 0.

In this Gaussian case, the task is simple since, thanks to symmetry, the distribution of did_{i} does not depend on the vectors XjX_{j}, jij\neq i. Indeed, did_{i} has the same distribution as the distance from a Gaussian vector to a fixed hyperplane. This variable is well understood and satisfies the inequality

for any fixed positive constant AA. This leads to the conclusion of Theorem 2.5 in this simple case.

However, the general case is much more difficult. For example, if the entries of NN are iid Bernoulli, it is already non-trivial to prove NnN_{n} is asymptotically almost surely non-singular (i.e. that with probability 1o(1)1-o(1), one has di0d_{i}\neq 0 for all ii). The point here is that one can no longer fix Xj,jiX_{j},j\neq i. As a matter of fact, the distribution of the distance did_{i} depends heavily on the position of the hyperplane HiH_{i} spanned by the Xj,jiX_{j},j\neq i. For example, let x{x} be Bernoulli and consider the following two situations

HiH_{i} has normal vector (1n,,1n)(\frac{1}{\sqrt{n}},\cdots,\frac{1}{\sqrt{n}}). In this case, di=0d_{i}=0 with probability O(1n)O(\frac{1}{\sqrt{n}}).

HiH_{i} has normal vector (12,12,0,,0)(\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}},0,\dots,0). In this case, di=0d_{i}=0 with probability 12\frac{1}{2}.

A hyperplane HH is, in some sense, bad for us if the distance from a random (row) vector to HH is small with non-negligible probability. It is important to understand the bad hyperplanes. Notice that if v=(v1,,vn){\mathbf{v}}=(v_{1},\dots,v_{n}) is the unit normal vector of HH, then the distance in question is exactly the random variable

where xi{x}_{i} are i.i.d. copies of x{x}.

This naturally leads to introducing the following concept.

Let x{x} be a complex random variable, and let v=(v1,,vn){\mathbf{v}}=(v_{1},\ldots,v_{n}) be a tuple of complex numbers. We define the random walk Wx(v)W_{{x}}({\mathbf{v}}) to be the complex random variable

where x1,,xnx{x}_{1},\ldots,{x}_{n}\equiv{x} are iid copies of x{x}. For any zCz\in{\mathbf{C}} and r>0r>0, we let B(z,r)B(z,r) denote the closed disk of radius rr centered at zz. For any r0r\geq 0, we define the small ball probability

Intuitively, we expect the small ball probability pr,x(v)p_{r,{x}}({\mathbf{v}}) to be quite small for “most” tuples v{\mathbf{v}}. The question, of course, is to quantify “most”.

A classical theorem of Littlewood and Offord (see also ) shows that if x{x} is Bernoulli, and all vi1|v_{i}|\geq 1, then p1,x(v)=O(n1/2)p_{1,{x}}({\mathbf{v}})=O(n^{-1/2}). There are several extensions of this result. They, typically, improve upon the bound O(n1/2)O(-n^{1/2}), under extra assumptions on the viv_{i}. We are going to refer to results in this spirit as forward Littlewood-Offord theorems.

For our purposes, we need inverse Littlewood-Offord theorems. Such a theorem is supposed to give a characterization of those vectors v{\mathbf{v}}, where pr,vp_{r,{\mathbf{v}}} is larger than some lower bound. The study of inverse Littlewood-Offord theorems was started in , where we investigated the case when x{x} has discrete support. A new result in this spirit was recently obtained in , where the authors investigated sub-Gaussian distributions, as well as distributions with bounded third or fourth moments.

In the current situation, we only assume that x{x} has O(1)O(1)-controlled second moment. The weakness of this assumption is a major obstacle and makes the proof much more complicated. It is still possible to obtain a reasonably strong characterization of v{\mathbf{v}}, given that pr,x(v)p_{r,{x}}({\mathbf{v}}) is large. However, this characterization is somewhat technical to state and so we will only explicitly state here a corollary of it, which will be sufficient for our purpose of proving the least singular value bound and the circular law.

Let x{x} be a complex random variable. Let nn be a positive integer and β,p\beta,p be positive numbers that may depend on nn. Let Sn,x,β,pS_{n,{x},\beta,p} be the set of all unit vectors v=(v1,,vn)Cn{\mathbf{v}}=(v_{1},\ldots,v_{n})\in{\mathbf{C}}^{n} such that one has the concentration bound

We give Cn{\mathbf{C}}^{n} the ll^{\infty} norm

Let x{x} be a complex random variable which has κ\kappa-controlled second moment for some κ>0\kappa>0. Let 0<ε10<\varepsilon\leq 1. Then, for all nn which are sufficiently large depending on κ,ε\kappa,\varepsilon and βexp(nε/2)\beta\geq\exp(-n^{\varepsilon/2}) and p=nO(1)p=n^{-O(1)}, there is a set SCnS^{\prime}\subset{\mathbf{C}}^{n} of size at most n(1/2+ε)npn+exp(o(n))n^{(-1/2+\varepsilon)n}p^{-n}+\exp(o(n)) such that for any vSn,x,β,p{\mathbf{v}}\in S_{n,{x},\beta,p} there is vS{\mathbf{v}}^{\prime}\in S^{\prime} such that vvβ\|{\mathbf{v}}-{\mathbf{v}}^{\prime}\|_{\infty}\leq\beta. In other words, Sn,x,β,pS_{n,{x},\beta,p} has a maximal β\beta-net in the ll^{\infty} norm of size at most n(1/2+ε)npn+exp(o(n))n^{(-1/2+\varepsilon)n}p^{-n}+\exp(o(n)).

Let x{x} be a complex random variable which has κ\kappa-controlled second moment for some κ>0\kappa>0. Let 0<ε10<\varepsilon\leq 1. Then, for all nn which are sufficiently large depending on κ,ε\kappa,\varepsilon and βexp(nε/2)\beta\geq\exp(-n^{\varepsilon/2}) and p=nO(1)p=n^{-O(1)}, all 1/n<μ11/n<\mu\leq 1, and all mm between nεn^{\varepsilon} and n1εμn^{1-\varepsilon}\mu there is a set SCnS^{\prime}\subset{\mathbf{C}}^{n} of size at most nO(ε)n(pm)n+exp(nO(ε)m/μ)n^{O(\varepsilon)n}(p\sqrt{m})^{-n}+\exp(n^{O(\varepsilon)}m/\mu) such that for any vSn,xIμ,β,p{\mathbf{v}}\in S_{n,{x}{\mathbf{I}}_{\mu},\beta,p} there is vS{\mathbf{v}}^{\prime}\in S^{\prime} such that such that vvβ\|{\mathbf{v}}-{\mathbf{v}}^{\prime}\|_{\infty}\leq\beta. In other words, Sn,xIμ,β,pS_{n,{x}{\mathbf{I}}_{\mu},\beta,p} has a maximal β\beta-net in the ll^{\infty} norm of size at most nO(ε)n(pm)n+exp(nO(ε)m/μ)n^{O(\varepsilon)n}(p\sqrt{m})^{-n}+\exp(n^{O(\varepsilon)}m/\mu).

If one sets m=n1Cεμm=n^{1-C\varepsilon}\mu for some absolute constant CC then the conclusion of Theorem 3.3 is similar to that in Theorem 3.2 except for the extra term μ\sqrt{\mu} in Theorem 3.3. However, for our applications it will be slightly more convenient to choose mm at the other extreme, thus m=nεm=n^{\varepsilon}. The main point here is that the size of Sn,x,β,pS_{n,{x},\beta,p} (or Sn,xIμ,β,pS_{n,{x}{\mathbf{I}}_{\mu},\beta,p}) tends to be much smaller than pnp^{-n}.

Let AA be a precompact subset of a metric space XX, and let ε>0\varepsilon>0. We define the internal metric entropy Nε(A){\mathcal{N}}_{\varepsilon}(A) to be the cardinality of the largest ε\varepsilon-net in AA (i.e. a set BAB\subset A where any two elements in BB are separated by distance ε\varepsilon). We define the external metric entropy Nε(A){\mathcal{N}}^{\prime}_{\varepsilon}(A) to be the least number of closed ε\varepsilon-balls in XX needed to cover AA.

and furthermore in the complex plane X=CX={\mathbf{C}} we have N2ε(A)=Θ(Nε(A)){\mathcal{N}}_{2\varepsilon}(A)=\Theta({\mathcal{N}}_{\varepsilon}(A)). As constant factors will not play any important role, the two notions of entropy will be essentially equivalent for our purposes.

Since vn1/2v\|{\mathbf{v}}\|_{\infty}\geq n^{-1/2}\|{\mathbf{v}}\|, we have the following corollary.

Let x{x} be a complex random variable which has κ\kappa-controlled second moment, for some constant κ>0\kappa>0. Let ε\varepsilon be an arbitrary positive constant. Then for any positive numbers μ,β,p1\mu,\beta,p\leq 1 and all sufficiently large nn we have

In fact, the proof of Theorem 3.2 gives a fairly precise description of the set Sn,x,β,pS_{n,{x},\beta,p}, as is the case with other inverse Littlewood-Offord theorems in the literature. However, this description is somewhat technical to state and we only need the entropy bound on Sn,x,β,pS_{n,{x},\beta,p} in our application, so we have presented Theorem 3.2 in the above short (but less explicit) form.

Concentration probabilities and Fourier analysis

Throughout this section x{x} will be a fixed complex random variable with O(1)O(1)-controlled second moment. For any 0<μ10<\mu\leq 1, let x(μ){x}^{(\mu)} be the random variable

where x1,x2{x}_{1},{x}_{2} are iid copies of x{x} and Iμ2{\mathbf{I}}_{\frac{\mu}{2}} is independent from x1,x2{x}_{1},{x}_{2}.

If x{x} is the Bernoulli random variable P(x=+1)=P(x=1)=1/2{\mathbf{P}}({x}=+1)={\mathbf{P}}({x}=-1)=1/2, then x(μ){0,+2,2}{x}^{(\mu)}\in\{0,+2,-2\} with P(x(μ)=+2)=P(x(μ)=2)=μ/8{\mathbf{P}}({x}^{(\mu)}=+2)={\mathbf{P}}({x}^{(\mu)}=-2)=\mu/8.

For any 0<μ10<\mu\leq 1 and any tuple v=(v1,,vn){\mathbf{v}}=(v_{1},\ldots,v_{n}) of complex numbers, define the concentration probability

This quantity will turn out to be very convenient for controlling the small ball probabilities of Wx(μ)(v)W_{{x}^{(\mu)}}({\mathbf{v}}) (see Lemma 4.3 below). To do that, we first need a Fourier-analytic representation of Pμ(v){\mathbf{P}}_{\mu}({\mathbf{v}}). We introduce the characteristic function f:CRf:{\mathbf{C}}\to{\mathbf{R}}, defined by

For any tuple v=(v1,,vn){\mathbf{v}}=(v_{1},\ldots,v_{n}) of complex numbers and any 0<μ10<\mu\leq 1, we have

Here of course dξd\xi is Lebesgue measure on the complex plane C{\mathbf{C}}.

On the other hand, from (4), (5), (7) and independence we see that

The relevance of concentration probability to the small ball probability is provided by the following lemma:

For any tuple v{\mathbf{v}} and any r>0r>0, we have

In applications, rr will be very close to 0 and so the term eπr2e^{\pi r^{2}} can be ignored.

From Definition 3.1, it suffices to show that

Applying (9) as in the proof of the preceding lemma, we have

The quantity Ee(Re(ξWx(v)))|{\mathbf{E}}e({\operatorname{Re}}(\xi W_{{x}}({\mathbf{v}})))| can be expanded, using (4) and (7), as i=1nf(ξvi)1/2\prod_{i=1}^{n}f(\xi v_{i})^{1/2}. Since f(ξvi)1/212+12f(ξvi)f(\xi v_{i})^{1/2}\leq\frac{1}{2}+\frac{1}{2}f(\xi v_{i}), it follows that

The claim of the lemma follows from the triangle inequality and Lemma 4.2. ∎

We now generalize the above lemma to the sparse case:

For any tuple v{\mathbf{v}} and any r>0,1μ>0r>0,1\geq\mu>0, we have

The proof is almost identical as the previous one. The only difference here is that we have Ee(Re(ξWxIμ(v)))|{\mathbf{E}}e({\operatorname{Re}}(\xi W_{{x}{\mathbf{I}}_{\mu}}({\mathbf{v}})))| instead of Ee(Re(ξWx(v)))|{\mathbf{E}}e({\operatorname{Re}}(\xi W_{{x}}({\mathbf{v}})))|. Notice that Ee(Re(ξWxIμ(v)))|{\mathbf{E}}e({\operatorname{Re}}(\xi W_{{x}{\mathbf{I}}_{\mu}}({\mathbf{v}})))| can be expanded as i=1n((1μ)+μf(ξvi)1/2)\prod_{i=1}^{n}((1-\mu)+\mu f(\xi v_{i})^{1/2}). Since f(ξvi)1/212+12f(ξvi)f(\xi v_{i})^{1/2}\leq\frac{1}{2}+\frac{1}{2}f(\xi v_{i}), it follows that

The concentration probability has several pleasant properties (cf. [27, Lemma 5.1]):

Let 0<μ10<\mu\leq 1. Then the following properties hold.

The quantity Pμ(w){\mathbf{P}}_{\mu}({\mathbf{w}}) is monotone decreasing in μ\mu and permutation invariant in w{\mathbf{w}}.

For any tuples v,w{\mathbf{v}},{\mathbf{w}} we have

where vw{\mathbf{v}}{\mathbf{w}} is the concatenation of v{\mathbf{v}} and w{\mathbf{w}}.

For any tuples v,w{\mathbf{v}},{\mathbf{w}} we have

For any k1k\geq 1 and tuple v{\mathbf{v}} we have

where vk{\mathbf{v}}^{k} is the concatenation of kk copies of v{\mathbf{v}}.

For any tuples v,w1,,wm{\mathbf{v}},{\mathbf{w}}_{1},\ldots,{\mathbf{w}}_{m} we have

In particular, by the pigeonhole principle, there exists 1im1\leq i\leq m such that

Properties (i), (ii) are immediate from (8). To prove (iii), observe from (6) that

where we require the walks Wx(μ)(v),Wx(μ)(w)W_{{x}^{(\mu)}}({\mathbf{v}}),W_{{x}^{(\mu)}}({\mathbf{w}}) to be independent. Using the arithmetic mean-geometric mean inequality

Comparing this with (10) we obtain the claim.

The inequality (iv) follows easily from (8) and the elementary inequality 1t(1t/k)k1-t\leq(1-t/k)^{k} for all 0t10\leq t\leq 1, which follows from the convexity of log(1t)\log(1-t). Finally, the inequality (v) follows from (8) and Hölder’s inequality. ∎

The x𝑥{x}-norm of a complex number

In this section, we present a way to estimate the characteristic function ff (and hence the concentration probabilities Pμ(w){\mathbf{P}}_{\mu}({\mathbf{w}})) in terms of a more convenient expression. Define the x{x}-norm of a complex number wCw\in{\mathbf{C}} by the formula

where x1,x2{x}_{1},{x}_{2} are iid copies of x{x}, and tR/Z\|t\|_{{\mathbf{R}}/{\mathbf{Z}}} denotes the distance from tt to the nearest integer.

If x{x} is Bernoulli, then wx=12Re(2w)R/Z\|w\|_{{x}}=\frac{1}{\sqrt{2}}\|{\operatorname{Re}}(2w)\|_{{\mathbf{R}}/{\mathbf{Z}}}. So in this case the x{x}-norm of ww is basically the size of the fractional part of Re(2w){\operatorname{Re}}(2w).

For any wCw\in{\mathbf{C}} and 0<μ10<\mu\leq 1 we have

for any tuple w=(w1,,wk){\mathbf{w}}=(w_{1},\ldots,w_{k}).

In view of the elementary inequality 1texp(t)1-t\leq\exp(-t) for t0t\geq 0, it will suffice to show that

and the claim follows from the elementary inequality cos(2πθ)1Ω(θR/Z2)\cos(2\pi\theta)\leq 1-\Omega(\|\theta\|_{{\mathbf{R}}/{\mathbf{Z}}}^{2}). ∎

We now record some useful properties of the x{x}-norm, which may help explain why we call it a “norm”:

For any wCw\in{\mathbf{C}}, 0wx10\leq\|w\|_{x}\leq 1 and wx=wx\|-w\|_{x}=\|w\|_{x}.

For any z,wCz,w\in{\mathbf{C}}, z+wxzx+wx\|z+w\|_{x}\leq\|z\|_{x}+\|w\|_{x}.

If x{x} has κ\kappa-controlled second moment for some positive constant κ\kappa, then there exists a positive constant cc depending on κ\kappa such that zxRe(z)\|z\|_{x}\gg|{\operatorname{Re}}(z)| for all zB(0,c)z\in B(0,c).

Property (i) is obvious. Property (ii) follows from the triangle inequality for L2L^{2} and the elementary observation that x+yR/ZxR/Z+yR/Z\|x+y\|_{{\mathbf{R}}/{\mathbf{Z}}}\leq\|x\|_{{\mathbf{R}}/{\mathbf{Z}}}+\|y\|_{{\mathbf{R}}/{\mathbf{Z}}}.

Now we prove (iii). Let zB(0,c)z\in B(0,c) for some small cc. From (11) it suffices to show that

for some K=O(1)K=O(1). In particular P(xK)1{\mathbf{P}}(|{x}|\leq K)\gg 1. So if we let yi{y}_{i} for i=1,2i=1,2 be xi{x}_{i} conditioned on the event xiK|{x}_{i}|\leq K, it suffices to show that

If cc is small enough depending on KK, then z(y1y2)12|z({y}_{1}-{y}_{2})|\leq\frac{1}{2}, so it suffices to show that

But this follows by conditioning on y2{y}_{2} and then using (1). ∎

Generalized arithmetic progressions and the forward Littlewood-Offord theorem

As in previous literature, our Littlewood-Offord theorems shall involve generalized arithmetic progressions (GAPs), which we now define.

If v1,,vrv_{1},\ldots,v_{r} are complex numbers and L1,,LrL_{1},\ldots,L_{r} are positive numbers, we define the symmetric generalized arithmetic progression (or symmetric GAP for short)

We refer to rr as the rank of QQ, v1,,vrv_{1},\ldots,v_{r} as the generators, and L1,,LrL_{1},\ldots,L_{r} as the dimensions.

If all the sums n1v1++nrvrn_{1}v_{1}+\ldots+n_{r}v_{r} are distinct, we say that QQ is proper. For t>0t>0, we define the dilate tQtQ of QQ as

Finally, if L1==Lr=LL_{1}=\ldots=L_{r}=L, we abbreviate Q((v1,,vr),(L,,L)){\operatorname{Q}}((v_{1},\ldots,v_{r}),(L,\ldots,L)) as Q((v1,,vr),L){\operatorname{Q}}((v_{1},\ldots,v_{r}),L).

GAPs are a fundamental object in additive combinatorics and they have played a crucial role in our earlier papers on Inverse Littlewood-Offord theorems and least singular values . For a detailed discussion about these objects, we refer to .

It is helpful to view QQ as the image of the integral box

under the linear map Φ\Phi that sends the point (n1,,nr)(n_{1},\dots,n_{r}) to n1v1++nrvrn_{1}v_{1}+\dots+n_{r}v_{r}. QQ is proper if Φ\Phi is one-to-one.

We use the following two simple lemmas frequently:

Let QQ be a symmetric GAP of rank rr and t1t\geq 1. Then

One can cover tQtQ by O(tr)O(t^{r}) translates of QQ. ∎

Let QCQ\subset{\mathbf{C}} be a finite set, and let ΩC\Omega\subset{\mathbf{C}} be a set which can be covered by at most MM balls of radius r/2r/2. Then we have

We can of course assume that QΩQ\cap\Omega is non-empty. By the pigeonhole principle, we can find a ball B(z,r/2)B(z,r/2) of radius r/2r/2 which contains at least #(QΩ)/M\#(Q\cap\Omega)/M elements of QΩQ\cap\Omega; in particular it contains at least one element z0z_{0} of QQ. Since (QΩB(z,r/2))z0(Q\cap\Omega\cap B(z,r/2))-z_{0} is contained in (QQ)B(0,r)(Q-Q)\cap B(0,r), the claim follows. ∎

For a GAP Q=Q((v1,,vr),(L1,,Lr))Q={\operatorname{Q}}((v_{1},\dots,v_{r}),(L_{1},\dots,L_{r})), define the dispersion D(Q){\mathbf{D}}(Q) to be the quantity

The quantity D(Q){\mathbf{D}}(Q) is very close to the metric entropy N1(Q){\mathcal{N}}_{1}(Q) of QQ, indeed simple volume packing arguments (cf. Lemma 6.4) show that D(Q)=Θr(N1(Q)){\mathbf{D}}(Q)=\Theta_{r}({\mathcal{N}}_{1}(Q)). We will however not use that fact here.

This quantity turns out to control the concentration probability of certain random walks associated with QQ:

For any 0<μ10<\mu\leq 1, ε>0\varepsilon>0, and complex numbers v1,,vrv_{1},\dots,v_{r}, we have

This “forward Littlewood-Offord theorem” will be crucial in establishing Theorem 3.2. To give the reader some feeling about this estimate, let us first consider a toy case when x{x} is Bernoulli and μ=1\mu=1. The adjusted random variable x(μ){x}^{(\mu)} equals with probability 3/43/4 and ±2\pm 2 with probability 1/81/8.

Assume furthermore that the viv_{i} are non-zero integers and QQ is proper. Thus QB(0,1){1,0,1}Q\cap B(0,1)\subset\{-1,0,1\} and the desired bound becomes

Consider a (lazy) random walk WW starting at . At step jj, stay with probability 1/21/2 and move to right or left by an amount vjv_{j} with probability 1/81/8. The terminal point after nn step is exactly the random variable

Since Pμ(v):=Eexp(πWx(μ)(v)2){\mathbf{P}}_{\mu}({\mathbf{v}}):={\mathbf{E}}\exp(-\pi|W_{{x}^{(\mu)}}({\mathbf{v}})|^{2}), the quantity P1(v1L12vrLr2){\mathbf{P}}_{1}(v_{1}^{L_{1}^{2}}\ldots v_{r}^{L_{r}^{2}}) can be bounded from above by the sum of the probability that the lazy random walk with L12L_{1}^{2} steps of size v1v_{1}, …, Lr2L_{r}^{2} steps of size vrv_{r} ends up on a point with absolute value at most 10log(#Q)10\log(\#Q) and a negligible term which is much smaller than (#Q)1(\#Q)^{-1}.

Notice that the coefficient of vjv_{j} is the sum of Lj2L_{j}^{2} iid copies of x(1){x}^{(1)}. It is well known that the distribution of this sum is roughly uniform on the interval [Lj,Lj][-L_{j},L_{j}]. (By roughly uniform, we mean that for any two integers in this interval, the ratio of their masses is bounded from above by a positive constant.) Thus, the main observation here (and somehow the essence of the theorem) is that the end point of the walk (conditioned on the fact that the coefficient of vjv_{j} belongs to [Lj,Lj][-L_{j},L_{j}] ) is roughly uniform in QQ. It follows that the probability that it has absolute value O(log#Q)O(\log\#Q) can be bounded from above by O(log#Q#Q)#Q1+εO(\frac{\log\#Q}{\#Q})\leq\#Q^{-1+\varepsilon}, giving the desired bound.

This argument can be made rigorous for random variables x{x} with discrete supports, even when QQ is not proper. However, the proof for the general case is more complicated. The main technical tool needed is the following level set estimate:

Given a GAP Q=Q(v1,,vr,L1,,Lr)Q={\operatorname{Q}}(v_{1},\dots,v_{r},L_{1},\dots,L_{r}), a complex number ξ0\xi_{0}, and ε>0\varepsilon>0, let ΣC\Sigma\subset{\mathbf{C}} be the set

We will prove Lemma 6.7 in Sections 7-8 below. For now, let us show how it implies Theorem 6.6.

We abbreviate D:=D(Q){\mathbf{D}}:={\mathbf{D}}(Q). In view of (12), it suffices to show that

Covering C{\mathbf{C}} by balls of radius 11, it thus suffices to show that

Now we fix ξ0\xi_{0}. Let cc be a small positive constant to be determined. It is clear that if D{\mathbf{D}} is sufficiently large, then

(In fact, Dcε{\mathbf{D}}^{c\varepsilon} can be replaced by ClogDC\log{\mathbf{D}} for some large constant CC.) By Lemma 6.7,

We choose cc equal half of the reciprocal of the hidden constant in OO. It follows that

To conclude the proof of Theorem 6.6, we need to establish Lemma 6.7. This is the purpose of the next two sections.

Lacunary sets inside GAPs

Let SS be a set. We shall informally call a sequence w1,,wdw_{1},\dots,w_{d} of elements of SS lacunary if the ratio wi1wi\frac{|w_{i-1}|}{|w_{i}|} is large for all 1<id1<i\leq d. The goal of this section is to show that a large GAP always contains a large lacunary subset with some prescribed properties. This fact will be a key ingredient in the proof of Lemma 6.7 (and hence Theorem 6.6), which we give in the next section.

To give the reader some motivation, let us consider the toy case when QQ is an interval, say {s,s+1,,s1,s}\{-s,-s+1,\dots,s-1,s\}. Given a ratio K>2K>2 and a constant R>1R>1 (say), we can easily find dd elements w1,,wdw_{1},\dots,w_{d} such that wdR|w_{d}|\geq R and wiwi+1K\frac{|w_{i}|}{|w_{i+1}|}\geq K where dd satisfies

The main result of this section is a generalization of the above observation for general GAPs.

Let K1K\geq 1, let QQ be a symmetric GAP of rank rr, and let R0R\geq 0 be a radius. Then there exists, for some d0d\geq 0, “primary vectors” w1,,wdQw_{1},\ldots,w_{d}\in Q, and “secondary vectors” w1,,wdQw^{\prime}_{1},\ldots,w^{\prime}_{d}\in Q with the following properties:

(Lacunarity) We have wiKwi+1|w_{i}|\geq K|w_{i+1}| for all 1id11\leq i\leq d-1.

(Secondary bounds) We have wi>R|w_{i}|>R and wiwi|w^{\prime}_{i}|\leq|w_{i}| for all 1id1\leq i\leq d.

where 1Ki1+K1\leq K_{i}\leq 1+K is the quantity

The secondary vectors are necessary here because QQ is taking values in the complex numbers; if QRQ\subset{\mathbf{R}} then we could simply take wi=0w^{\prime}_{i}=0 (and thus Ki=1K_{i}=1) for all ii. The reader may wish to follow the argument below in the real case (and for R=0R=0), as it is somewhat simpler in that case. The bound (16) may seem strange, but it is best possible except for the Or()O_{r}(\cdot) factor, and we will need such a tight estimate in our applications. The vectors w1,,wd,w1,,wdw_{1},\ldots,w_{d},w^{\prime}_{1},\ldots,w^{\prime}_{d} are somewhat analogous to the Minkowski basis of a lattice with respect to a convex body, thus (16) can be viewed as a variant of Minkowski’s second theorem.

By increasing KK if necessary we may assume KK to be larger than any given constant depending on rr. We can also assume that QQ is not contained in B(0,R)B(0,R), as the claim is obvious otherwise.

We perform the following algorithm. We set d0:=Cr(1+log#Q#(QB(0,R))logK)d_{0}:=C_{r}\left(1+\frac{\log\frac{\#Q}{\#(Q\cap B(0,R))}}{\log K}\right) for some sufficiently large constant CrC_{r} depending only on rr.

Initialize i=1i=1. We also adopt the convention that w0=w_{0}=\infty.

Let Qi:=2d0+iQB(0,wi1/K)Q_{i}:=2^{-d_{0}+i}Q\cap B(0,|w_{i-1}|/K). If QiB(0,R)Q_{i}\subset B(0,R) then set d:=i1d:=i-1 and STOP. Otherwise, let wiQiw_{i}\in Q_{i} be chosen such that wi|w_{i}| is maximal; thus wiwi1/K|w_{i}|\leq|w_{i-1}|/K, wi>R|w_{i}|>R and QB(0,wi)Q\subset B(0,|w_{i}|).

Let wiQiw^{\prime}_{i}\in Q_{i} be chosen to maximize the quantity KiK_{i} defined in (17). Observe that wiwi|w^{\prime}_{i}|\leq|w_{i}|.

From elementary complex geometry we see that QiQ_{i} is now contained in a rectangle of dimensions O(wi)×O(KiKwi)O(|w_{i}|)\times O(\frac{K_{i}}{K}|w_{i}|). This rectangle can be covered by O(KKi)O(KK_{i}) disks of radius wi/2K|w_{i}|/2K. Applying Lemma 6.4, we conclude that the set

Increment ii to i+1i+1 and return to Step 1.

Since w1,w2,w_{1},w_{2},\ldots have decreasing magnitude and lie in the finite set QQ we see that this algorithm terminates in finite time. In fact we claim that this algorithm terminates before step d0d_{0}. For if the algorithm reaches stage d0d_{0}, we have obtained w1,,wd0Qw_{1},\ldots,w_{d_{0}}\in Q obeying the lacunarity condition wiwi1/K|w_{i}|\leq|w_{i-1}|/K. This implies that the GAP Q((w1,,wd0),K/10){\operatorname{Q}}((w_{1},\ldots,w_{d_{0}}),K/10) is proper, and that the pairwise sums between Q((w1,,wd0),K/10){\operatorname{Q}}((w_{1},\ldots,w_{d_{0}}),K/10) and 2QB(0,R/10)2Q\cap B(0,R/10) are distinct and contained in (d0K+1)2Q(d_{0}K+1)2Q. But this implies that

Also, since B(0,R)B(0,R) can be covered by O(1)O(1) balls of radius R/20R/20, we see from Lemma 6.4 that

But from definition of d0d_{0}, we see that this is impossible if CrC_{r} is chosen sufficiently large (recall we are taking KK large compared to rr). Thus we have dd0d\leq d_{0}, which in particular implies that w1,,wdw_{1},\ldots,w_{d} and w1,,wdw^{\prime}_{1},\ldots,w^{\prime}_{d} lie in QQ. Since QQ is not contained in B(0,R)B(0,R) we also have d1d\geq 1.

Now we can cover QQ by Or(1)d0O_{r}(1)^{d_{0}} copies of Q1=2d0+1QQ_{1}=2^{-d_{0}+1}Q, and thus

In particular, since K+1K1K+1\geq K_{1} and d0dd_{0}\geq d we have

using the definition of d0d_{0} and recalling that KK is large compared to rr we conclude that drd0d\gg_{r}d_{0}. The claim (16) now follows from (20). The remaining claims are easily verified from the construction. ∎

Proof of Lemma 6.7

We are now ready to prove Lemma 6.7. In the following, QQ is fixed and we write D{\mathbf{D}} instead of D(Q){\mathbf{D}}(Q). We also fix rr and allow all implied constants to depend on rr. We may assume without loss of generality that D{\mathbf{D}} is large compared with ε\varepsilon, since the claim is trivial otherwise.

Let K:=DεK:={\mathbf{D}}^{\varepsilon}; since D{\mathbf{D}} is assumed large compared to ε\varepsilon, we see that KK is also. We apply Lemma 7.1 (with R=1R=1, and to the GAP 1K4Q\frac{1}{K^{4}}Q) to obtain vectors

for some d=O(1/ε)d=O(1/\varepsilon) such that wiKwi+1|w_{i}|\geq K|w_{i+1}| for all 1id11\leq i\leq d-1, wi>1|w_{i}|>1 and wiwi|w^{\prime}_{i}|\leq|w_{i}| for all 1id1\leq i\leq d, and

where KiK_{i} is defined in (17). Since QQ has rank O(1)O(1), we have

and thus (since d=O(1/ε)d=O(1/\varepsilon) and D{\mathbf{D}} is large compared with ε\varepsilon)

From (14), Lemma 5.3, and (21) we see that

for all 1id1\leq i\leq d and ξΣ\xi\in\Sigma.

For 1id1\leq i\leq d, define ζi:=1K2wi\zeta_{i}:=\frac{1}{K^{2}w_{i}} and ζi:=11KKiwi\zeta^{\prime}_{i}:=\sqrt{-1}\frac{1}{KK_{i}w_{i}}. Let PP be the GAP

One should view PP as a kind of “dual” to QQ. It has the following properties:

#PD1O(ε)\#P\geq{\mathbf{D}}^{1-O(\varepsilon)}.

If z,zPz,z^{\prime}\in P are distinct, then z+Σz+\Sigma and z+Σz^{\prime}+\Sigma are disjoint.

We first verify (i). If PP is not proper, then we have a linear relation

for some integers n1,,nr,m1,,mrn_{1},\ldots,n_{r},m_{1},\ldots,m_{r}, not all zero, with niK/50|n_{i}|\leq K/50 and miKi/50|m_{i}|\leq K_{i}/50 for 1ir1\leq i\leq r. Let jj be the largest index such that (nj,mj)(n_{j},m_{j}) is non-zero. If 1i<j1\leq i<j, then from the properties of wiw_{i} we have

From the triangle inequality we then have

On the other hand, since nj,mjn_{j},m_{j} are integers which are not both zero, and ζj=1KKjζj\zeta^{\prime}_{j}=\sqrt{-1}\frac{K}{K_{j}}\zeta_{j}, and K/Kj1/2K/K_{j}\geq 1/2, we see that

and so (ii) now follows from (22) (recalling that d=O(1/ε)d=O(1/\varepsilon) and D{\mathbf{D}} is large compared to ε\varepsilon).

Now we prove (iii). If zPz\in P, then we see from the triangle inequality that

by lacunarity. But by construction wd1|w_{d}|\geq 1, and the claim follows.

Now we prove (iv). If the claim was false, then we could find distinct z,zPz,z^{\prime}\in P and ξ,ξΣ\xi,\xi^{\prime}\in\Sigma such that zz=ξξz-z^{\prime}=\xi-\xi^{\prime}. We can then write

for some integers n1,,nr,m1,,mrn_{1},\ldots,n_{r},m_{1},\ldots,m_{r}, not all zero, with niK/50|n_{i}|\leq K/50 and miKi/50|m_{i}|\leq K_{i}/50.

Let jj be the largest index such that (nj,mj)(n_{j},m_{j}) is non-zero. From (23) and Lemma 5.3 we have

On the other hand, from the triangle inequality we have

If KK is large enough, then we can apply Lemma 5.3 to conclude from (24) that

if KK is large enough. On the other hand, by construction of ζj,ζj\zeta_{j},\zeta^{\prime}_{j} we have

Since njn_{j} is an integer, we conclude nj=0n_{j}=0. In that case we have

if Kj2K_{j}\geq 2, by construction of ζj\zeta^{\prime}_{j} and KjK_{j}. Since mjm_{j} is an integer, we conclude mj=0m_{j}=0. On the other hand, if Kj<2K_{j}<2, then we have mj=0m_{j}=0 as well, since mjKj/50|m_{j}|\leq K_{j}/50. But (nj,mj)(n_{j},m_{j}) is non-zero, a contradiction. ∎

From properties (ii), (iii), (iv) we see that

Structure of weak elements

Let QQ be a GAP. Extend QQ by a new dimension generated by a new element zz; Q=Q+{kz,,kz}Q^{\prime}=Q+\{-kz,\cdots,kz\}. We call zz weak if #Q\#Q^{\prime} is only slightly more than #Q\#Q. The goal of this section is to quantify (and generalize) the following phenomenon:

The reader may find the following simple example illustrative. Assume that QQ is the interval [s,,s][-s,\dots,s]. Assume that Q:=Q+{kz,,kz}Q^{\prime}:=Q+\{-kz,\cdots,kz\} has cardinality at most lsls, where l=kδl=k^{\delta} for some small positive δ\delta.

Consider the interval Q1:={xZxsl/k}Q_{1}:=\{x\in{\mathbf{Z}}||x|\leq sl/k\}. The sets x+{z,,kz},xQ1x+\{z,\cdots,kz\},x\in Q_{1} are subsets of QQ^{\prime}. Since #Q1>ls/k\#Q_{1}>ls/k, these sets are not disjoint. Thus, we have x+jz=x+jzx+jz=x^{\prime}+j^{\prime}z for some distinct x,xQ1x,x^{\prime}\in Q_{1} and 1jjk1\leq j\neq j^{\prime}\leq k. This implies that

This already gives a bound k#(Q1Q1)=O(l#Q)=O(ls)k\#(Q_{1}-Q_{1})=O(l\#Q)=O(ls) on the cardinality of the possible zz. But this bound can be improved further (this improvement is critical later on). Consider the set x+{0,,lz}x+\{0,\cdots,lz\} with xQx\in Q. By the same argument as before, these sets are not disjoint, and we can conclude that

for xQ1Q1,1τkx\in Q_{1}-Q_{1},1\leq\tau\leq k and xQQ,1τlx^{\prime}\in Q-Q,1\leq\tau^{\prime}\leq l. If xτ\frac{x}{\tau} is irreducible, then τl\tau\leq l and the number of zz’s of this form is only at most l#(Q1Q1)=O(l2ks)l\#(Q_{1}-Q_{1})=O(\frac{l^{2}}{k}s). If it is not, then gcd(x,τ)τl\operatorname{gcd}(x,\tau)\geq\frac{\tau}{l}. The number of xx satisfying this condition in Q1Q1Q_{1}-Q_{1} is at most O(lτ#Q1)O(\frac{l}{\tau}\#Q_{1}). Thus, the number of zz’s is at most τ=lkO(lτ#Q1)=O(l2ks)\sum_{\tau=l}^{k}O(\frac{l}{\tau}\#Q_{1})=O(\frac{l^{2}}{k}s), using the bound on #Q1\#Q_{1} and the fact that l=kΩ(1)l=k^{\Omega(1)}. Thus, altogether we obtain the bound

which is much better than the previous bound O(l#Q)O(l\#Q). The term k1k^{-1} will play a critical role in later proofs.

The main result of this section is a generalization of this very special case.

Let w1,,wrw_{1},\ldots,w_{r} be complex numbers and Q=Q((w1,,wr),(L1,,Lr))Q={\operatorname{Q}}((w_{1},\dots,w_{r}),(L_{1},\dots,L_{r})). Let zz be a complex number and kk a positive integer. Define

Let ZZ denote the set of all complex numbers zz such that

Then ZZ has a 2424-net of size at most 1+Or(l4k1D(Q))1+O_{r}(l^{4}k^{-1}{\mathbf{D}}(Q)).

The 2424-net can be replaced by an 11-net if we replace the bound 1+Or(l4k1D(Q))1+O_{r}(l^{4}k^{-1}{\mathbf{D}}(Q)) by Or(1+l4k1D(Q))O_{r}(1+l^{4}k^{-1}{\mathbf{D}}(Q)). However, it is important to us to have the current formulation, as in the case when l4k1D(Q)=o(1)l^{4}k^{-1}{\mathbf{D}}(Q)=o(1) the net will have size exactly 11. The power of l4l^{4} might be improvable, but we will not need this improvement here, as ll will always be relatively small for us compared to other parameters such as kk and D(Q){\mathbf{D}}(Q).

Let zZz\in Z. By definition of ZZ, we have

Let W12QW\subset\frac{1}{2}Q be a maximal 11-net of 12Q\frac{1}{2}Q, then we see that the sets w+(QB(0,1))w+(Q\cap B(0,1)) for wWw\in W cover 12Q\frac{1}{2}Q, and thus

thanks to the easily verified fact that #(12Q)r#Q\#(\frac{1}{2}Q)\gg_{r}\#Q.

Refine the 11-net WW to a maximal 22-net WW^{\prime}. We have #Wr#W\#W^{\prime}\gg_{r}\#W and thus

A simple greedy algorithm argument (using the symmetry of LL) shows that we can find a set J{k,,k}J\subset\{-k,\ldots,k\} of cardinality #Jk#L\#J\gg\frac{k}{\#L} such that j1j2∉Lj_{1}-j_{2}\not\in L for any distinct j1,j2Jj_{1},j_{2}\in J. Consider the sets jz+W+((Q+Q(z,k))B(0,1))jz+W^{\prime}+((Q+{\operatorname{Q}}(z,k))\cap B(0,1)) for jJj\in J. By the construction, we can verify that

These sets are disjoint (thanks to the definition of JJ and LL).

Every set lies in 2(Q+Q(z,k))2(Q+{\operatorname{Q}}(z,k)) (since jk|j|\leq k and W12Q).W^{\prime}\subset\frac{1}{2}Q).

Each set has cardinality (#W)#((Q+Q(z,k))B(0,1))(\#W^{\prime})\#((Q+{\operatorname{Q}}(z,k))\cap B(0,1)) (since WW^{\prime} is a 22-net).

Combining this with (25) we conclude that

On the other hand, #Jk#L\#J\gg\frac{k}{\#L}, so

which asserts that many multiples of zz are close to 2Q2Q.

Let R0R_{0} be the smallest radius such that

where CrC_{r} is a sufficiently large constant depending on rr.

Assume, for a moment, that z2R0+4|z|\geq 2R_{0}+4. By the definition of LL, we can find, for each jLj\in L, an element ζj2Q\zeta_{j}\in 2Q such that jzζj2|jz-\zeta_{j}|\leq 2. (If there are many ζj\zeta_{j}, fix one arbitrarily.) Let jj and jj^{\prime} be two different indices, then

This implies that the sets ζj+(10QB(0,R0))\zeta_{j}+(10Q\cap B(0,R_{0})) are disjoint. Furthermore, as ζj2Q\zeta_{j}\in 2Q, they all lie in 12Q12Q. Therefore,

But this contradicts (27) if we choose CrC_{r} sufficiently large. Thus we have

If R0<10R_{0}<10, then z<24z<24 and ZZ has a maximal 2424-net of cardinality 11 and we are done.

From now on, we assume R010R_{0}\geq 10. Thus z<3R0|z|<3R_{0}.

From (26) and the pigeonhole principle we can find j,jLj,j^{\prime}\in L such that 0<jjrl0<|j-j^{\prime}|\ll_{r}l. Thus there exists an integer 0<irl0<i\ll_{r}l such that iz4Q+B(0,4)iz\in 4Q+B(0,4). Since z3R0|z|\leq 3R_{0}, we have izrlR0|iz|\ll_{r}lR_{0} and thus in fact iz(4QB(0,Or(lR0)))+B(0,4)iz\in(4Q\cap B(0,O_{r}(lR_{0})))+B(0,4). Thus, to obtain the desired bound on N1(Z){\mathcal{N}}_{1}(Z), it will suffice to show that

Let ZZ^{\prime} be any 44-net of 4QB(0,Or(lR0))4Q\cap B(0,O_{r}(lR_{0})). Observe that the sets ζ+(QB(0,1))\zeta^{\prime}+(Q\cap B(0,1)) for ζZ\zeta^{\prime}\in Z^{\prime} are disjoint and lie in 5QB(0,Or(lR0))5Q\cap B(0,O_{r}(lR_{0})). Thus we have

Since D(Q)=#Q#(QB(0,1)){\mathbf{D}}(Q)=\frac{\#Q}{\#(Q\cap B(0,1))}, it suffices to show that

But (as we are working on the plane) we can cover B(0,Or(lR0))B(0,O_{r}(lR_{0})) by Or(l2)O_{r}(l^{2}) balls of radius R0/4R_{0}/4, so by Lemma 6.4 we have

Comparing this with (28) we obtain the claim. ∎

Proof of the inverse theorems

We first prove Theorem 3.2. The proof of Theorem 3.3 can be obtained with some minor modifications.

Let us begin with a simple reduction. Since x{x} has O(1)O(1)-controlled second moment, from Chebyshev’s inequality we see that xnA+10|{x}|\geq n^{A+10} with probability O(n2A20)O(n^{-2A-20}). Thus if we let x{x}^{\prime} be x{x} conditioned on the event xnA+10|{x}|\leq n^{A+10}, we see from the union bound that pβ,x(v)p_{\beta,{x}}({\mathbf{v}}) and pβ,x(v)p_{\beta,{x}^{\prime}}({\mathbf{v}}) differ by at most O(n2A19)O(n^{-2A-19}). Thus (modifying pp slightly if necessary) we may replace x{x} by x{x}^{\prime}, and so we may assume for the rest of the proof that

Consider a point v{\mathbf{v}} in Sn,x,β,pS_{n,{x},\beta,p}. Let V=(V1,,Vn){\mathbf{V}}=(V_{1},\ldots,V_{n}) be the vector obtained from β1v/2\beta^{-1}{\mathbf{v}}/2 by rounding the coordinates to the nearest Gaussian integer multiple of nA20n^{-A-20}. Clearly thus V=Θ(β1)|{\mathbf{V}}|=\Theta(\beta^{-1}). Furthermore, by (29)

We are going to find a small O(1)O(1)-net (in the ll^{\infty} norm) for the set of all possible V{\mathbf{V}} satisfying the last inequality. Set k:=n1/2εk:=n^{1/2-\varepsilon}, and let d1d\geq 1 be an integer to be chosen later (dd will be bounded by a constant.)

Now we perform the following algorithm (following the proof of [27, Theorem 2.4]) to construct some elements w1,,wrw_{1},\ldots,w_{r} in V{\mathbf{V}} for some 0rd0\leq r\leq d.

Initialize r=0r=0. Set V=V{\mathbf{V}}^{}={\mathbf{V}}.

Count how many VjV[r]V_{j}\in{\mathbf{V}}^{[r]} there are such that

If this number is less than k2k^{2} then STOP. Otherwise, move on to Step 2.

Applying Lemma 4.5(v), we can find some VjV[r]V_{j}\in V^{[r]} such that

where V[r+1]{\mathbf{V}}^{[r+1]} is obtained from V[r]{\mathbf{V}}^{[r]} by deleting a set of k2k^{2} elements. We then set wr+1:=Vjw_{r+1}:=V_{j} and then increment rr to r+1r+1. If r=dr=d then STOP (with an error); otherwise return to Step 1.

By induction, at each stage in this algorithm we have

and hence by Theorem 6.6 and Lemma 4.5(ii)

On the other hand, by construction we have

Thus, the algorithm must terminate in Step 1 for some r=Oε(1)r=O_{\varepsilon}(1). At this point, we have obtained a tuple (w1,,wr)(w_{1},\ldots,w_{r}) of elements in V{\mathbf{V}} with r=Oε(1)r=O_{\varepsilon}(1) such that

for all but at most rk2=Oε(n12ε)n1εrk^{2}=O_{\varepsilon}(n^{1-2\varepsilon})\leq n^{1-\varepsilon} values of jj.

Now we have enough information to construct the net. First we show that it costs a relatively small factor to take care of the exceptional coordinates. There are at most Oε(k2)n1εO_{\varepsilon}(k^{2})\leq n^{1-\varepsilon} exceptional values of jj; we can fix the values of the exceptional jj by paying a factor of

For each exceptional jj, VjV_{j} is a Gaussian integer multiple of O(nA20)O(n^{-A-20}) of magnitude O(β1)O(\beta^{-1}). Thus, the number of possible choices for VjV_{j} is β1nO(1)\beta^{-1}n^{O(1)}. So, after we fix the exceptional coordinates jj, there are at most

ways to specify the values of these coordinates.

As for the remaining (non-exceptional) coordinates VjV_{j}, Lemma 9.1 (along with (30), the definition of kk, and the bound r=Oε(1)r=O_{\varepsilon}(1)) shows that each such VjV_{j} lies within distance O(1)O(1) of a set of cardinality 1+Oε(n1/2+O(ε)p1)1+O_{\varepsilon}(n^{-1/2+O(\varepsilon)}p^{-1}). The set of all vectors VV has a O(1)O(1)-net in the ll^{\infty} norm of size at most

assuming nn sufficiently large depending on p,εp,\varepsilon.

Changing a O(1)O(1)-net to a 11-net costs only a O(1)O(1) factor. Thus, we can conclude that there is an 11-net of size at most O(n(1/2+O(ε))npn)+exp(o(n))O(n^{(-1/2+O(\varepsilon))n}p^{-n})+\exp(o(n)). As we can choose ε\varepsilon arbitrarily small, the proof of Theorem 3.2 is complete.

To prove Theorem 3.3, we just use the sparse version of all lemmas used in the previous proof, except that we take kk equal to m/μ\sqrt{m/\mu} rather than n1εn^{1-\varepsilon}. The starting point is

Instead of D(Q((w1,,wr),k)){\mathbf{D}}({\operatorname{Q}}((w_{1},\ldots,w_{r}),k)), we will consider D(Q((w1,,wr),μk)){\mathbf{D}}({\operatorname{Q}}((w_{1},\ldots,w_{r}),\sqrt{\mu}k)). Thus, the gain from Lemma 9.1 is no longer k1k^{-1} (which used to lead to the term n1/2+O(ε)n^{-1/2+O(\varepsilon)} in the final bound), but instead (kμ)1(k\sqrt{\mu})^{-1} (which leads to the term nO(ε)(m)1n^{O(\varepsilon)}(\sqrt{m})^{-1}). Meanwhile, the exp(o(n))\exp(o(n)) factor is replaced with exp(nO(ε)k2)=exp(nO(ε)m/μ)\exp(n^{O(\varepsilon)}k^{2})=\exp(n^{O(\varepsilon)}m/\mu). The reader is invited to work out the simple details.

Proof of Theorems 2.5 and 2.9

Theorem 2.5 follows from the following. Let σn(M)\sigma_{n}(M) denote the least singular value of a matrix MM of order nn. We shall abbreviate N=Nn,ρN=N_{n,\rho}.

M+Nnγ\|M+N\|\leq n^{\gamma} with probability one.

x1++xnnγ|{x}_{1}|+\dots+|{x}_{n}|\leq n^{\gamma} with probability one.

Indeed, as x{x} has finite second moment, we can assume that xnA+10|{x}|\leq n^{A+10}, at the cost of a (negligible) additional term o(nA)o(n^{-A}) in probability. Thus, by restricting x{x} to the event xnA+10|{x}|\leq n^{A+10} and using the assumption about MM in Theorem 2.5, we can satisfy both assumptions in Theorem 11.1, for γ\gamma large enough.

We can have a more efficient form of the theorem by bounding the probability that the two assumptions on M+N\|M+N\| and x1++xn|{x}_{1}|+\dots+|{x}_{n}| fail (rather than assuming that they hold with probability one). The relation between BB and γ,A\gamma,A can be strengthened and we will do that in another paper.

We now prove Theorem 11.1. We suppress all dependence of the implied constants on A,γ,B,κA,\gamma,B,\kappa.

Let us call a unit vector v=(v1,,vn){\mathbf{v}}=(v_{1},\ldots,v_{n}) poor if we have

and rich otherwise. Theorem 11.1 follows directly from the following two lemmas and the fact that

(Proof of Lemma 11.3) We repeat the argument from . Let EE be the event that (M+N)vnB\|(M+N)v\|\leq n^{-B} for some poor unit vector v{\mathbf{v}}. If EE holds, then the least singular value of M+NM+N is at most nBn^{-B}, and so the same is true for the adjoint (M+N)(M+N)^{\dagger}. Thus there exists a row vector w{\mathbf{w}}^{\dagger} such that

Write w=(w1,,wn){\mathbf{w}}^{\dagger}=(w_{1},\ldots,w_{n}). By paying a factor of nn and using symmetry we may assume that the last coefficient of w{\mathbf{w}}^{\dagger} has the largest magnitude, thus

Thus, if we let FF be the event that there exists a unit vector w{\mathbf{w}} obeying both (32) and (33), we have

Let X1,,XnX_{1},\ldots,X_{n} be the rows of M+NM+N. We shall condition on the first n1n-1 rows X1,,Xn1X_{1},\ldots,X_{n-1}. Observe that if EE holds, then there exists a poor unit vector v{\mathbf{v}} such that

Thus, if P(EX1,,Xn1){\mathbf{P}}(E|X_{1},\ldots,X_{n-1}) is non-zero, then there exists a poor unit vector u{\mathbf{u}} such that

On the other hand, if FF holds, and w=(w1,,wn){\mathbf{w}}^{\dagger}=(w_{1},\ldots,w_{n}) is as above, then by (32)

taking inner products with the unit vector u{\mathbf{u}} and using the triangle inequality, we conclude

Using (34), Cauchy-Schwarz, and (36) we conclude

On the other hand, since u{\mathbf{u}} is poor, and XnX_{n} is independent of X1,,Xn1X_{1},\ldots,X_{n-1} (and hence independent of u{\mathbf{u}} also), we have

Putting all this together, we conclude that

uniformly in the choice of X1,,Xn1X_{1},\ldots,X_{n-1}. Integrating over X1,,Xn1X_{1},\ldots,X_{n-1} and using (35) we obtain P(E)nA{\mathbf{P}}(E)\leq n^{-A}, as desired. ∎

(Proof of Lemma 11.4) Let ε>0\varepsilon>0 be a sufficiently small constant (in particular, smaller than the constant in Theorem 3.2); we allow all implied constants to depend on ε\varepsilon. We may also assume that nn is sufficiently large depending on ε\varepsilon.

Let JJ be the smallest integer strictly larger than 2A+22A+2, thus 2A+2<J2A+32A+2<J\leq 2A+3. Thus, if we set δ:=(A+1)/J\delta:=(A+1)/J, then (using (31)) we have 0<δ<1/20<\delta<1/2 and B>JγB>J\gamma. If ε\varepsilon is sufficiently small, we thus have

Let v{\mathbf{v}} be a rich unit vector. For j=0,1,,Jj=0,1,\ldots,J, consider the quantities

These quantities are increasing in jj, and range between nA1n^{-A-1} and 11 since v{\mathbf{v}} is rich. Applying the pigeonhole principle and using the definition of δ\delta, we can thus find a positive 0jJ10\leq j\leq J-1 such that

Define, for any 0jJ10\leq j\leq J-1 and 1k(A+1)/ε1\leq k\leq\lceil(A+1)/\varepsilon\rceil, the set Ωj,k\Omega_{j,k} as

Since the number of pairs j,kj,k is O(1)O(1), it suffices by the union bound to show that for each fixed j,kj,k

In fact, we are going to show that this probability is exponentially small.

Let p:=nkεp:=n^{-k\varepsilon}. In the notation of Theorem 3.2, v{\mathbf{v}} lies in Sn,x,nB+Cj+1/2,pS_{n,{x},n^{-B+Cj+1/2},p}. Thus by this theorem, there is a set VV of cardinality at most

such that for each vΩj,k{\mathbf{v}}\in\Omega_{j,k} there is vV{\mathbf{v}}^{\prime}\in V such that vvnB+Cj+1/2\|{\mathbf{v}}-{\mathbf{v}}^{\prime}\|_{\infty}\leq n^{-B+Cj+1/2}.

Consider vΩj,kv\in\Omega_{{j,k}} and vV{\mathbf{v}}^{\prime}\in V as above. Recall that M+Nnγ\|M+N\|\leq n^{\gamma} almost surely. Thus with probability 11 we have

As usual, let XiX_{i} be the iith row of M+NM+N. It follows that there are at least n:=nn1εn^{\prime}:=n-n^{1-\varepsilon} coordinates 1in1\leq i\leq n such that

Now we relate the probability that XvnB+Cj+1/2+γ+ε|X\cdot{\mathbf{v}}^{\prime}|\leq n^{-B+Cj+1/2+\gamma+\varepsilon} with pp, where X:=(x1,,xn)X:=({x}_{1},\dots,{x}_{n}). Consider the quantity

Notice that vivinB+Cj+1/2|v_{i}-v_{i}^{\prime}|\leq n^{-B+Cj+1/2} and also ixinγ\sum_{i}|{x}_{i}|\leq n^{\gamma} with probability one. Thus

which implies, through the triangle inequality, that

where in the last inequality we used (38).

where in the last inequality we used the definition of Ωj,k\Omega_{j,k}.

Also, a very crude second moment argument, using the fact that x{x} has κ\kappa-controlled second moment, gives

if δ>0\delta^{\prime}>0 is small enough depending on κ\kappa. Thus

Again by the union bound, the left-hand side of (39) is at most

It is routine to verify that the last quantity is o(nA)o(n^{-A}) (indeed, we obtain a bound of the form O(exp(σn))O(\exp(-\sigma n)) for some σ>0\sigma>0). Our proof is complete. ∎

Now we sketch the proof of Theorem 2.9. We repeat the above arguments with the following changes. We will of course replace Theorem 3.2 by Theorem 3.3, with μ:=ρ\mu:=\rho. Due to the presence of the additional factors of μ\mu in that theorem, we can no longer afford to choose δ\delta close to 1/21/2, so we instead choose δ\delta to be very small, say δ=ε\delta=\varepsilon, where ε\varepsilon is very small compared to 1α1-\alpha. In order to take δ\delta this small, we will need BB to be much larger than what (31) requires, but this is not a problem. For our applications, all we need is that BB does not depend on nn.

The treatment of the poor vectors (Lemma 11.3) in the sparse case is the same as in the non-sparse case. The treatment of the rich vectors (Lemma 11.4) is also essentially the same, except for the fact that we no longer have (40). To be more precise, 1δ1-\delta^{\prime} needs to be replaced by 1δρ1-\delta^{\prime}\rho. In the cases when kk is larger than some absolute constant (say 5), (40) is not needed, since in this case pp is sufficiently small and

and the above argument goes through without difficulty, so long as one applies Theorem 3.3 with m:=nC0εm:=n^{C_{0}\varepsilon} for some sufficiently large absolute constant C0C_{0}.

In the remaining case where kk is at most 5, the replacement of 1δ1-\delta^{\prime} by 1δρ1-\delta^{\prime}\rho becomes too expensive and we will avoid it by a rescaling argument, using the pigeonhole principle.

To start, notice that from the definition of Ωj,k\Omega_{j,k} and the fact k5k\leq 5, we have

for some fixed BOε(1)BBB-O_{\varepsilon}(1)\leq B^{\prime}\leq B. Since the left-hand side is p1,xIρ(nBv)p_{1,{x}{\mathbf{I}}_{\rho}}(n^{B^{\prime}}{\mathbf{v}}), we also see from Lemma 4.4 that

We observe that this implies that v{\mathbf{v}} is “compressed” in the sense that at most n100ε/ρn^{100\varepsilon}/\rho of the coefficients of v=(v1,,vn)v=(v_{1},\ldots,v_{n}) can exceed nB+10n^{-B^{\prime}+10} in magnitude. (Of course, instead of 100100, on can use any large constant.) Indeed, if instead we had at least n100ε/ρn^{100\varepsilon}/\rho coefficients viv_{i} of magnitude at least nB+10n^{-B^{\prime}+10} for some large absolute constant AA, we see from Lemma 4.5 that

for one of these viv_{i}, but one can show that this is not the case by a direct computation using the κ\kappa-controlled second moment hypothesis, or else by an appeal to Theorem 6.6. (Here we used the notation of Lemma 4.5: (z)s(z)^{s} denotes a vector of length ss whose every coordinate equals zz.)

Next, we apply the pigeonhole principle to conclude the existence of a BB^{\prime\prime} with BOε(1)BB10B^{\prime}-O_{\varepsilon}(1)\leq B^{\prime\prime}\leq B^{\prime}-10 and an integer m=Oε,γ(1)m=O_{\varepsilon,\gamma}(1) with

By paying a factor of Oε,γ(1)O_{\varepsilon,\gamma}(1) in our final probability bound we may fix BB^{\prime\prime} and mm. If we define the vector w{\mathbf{w}} by setting wiw_{i} to be the nearest (Gaussian integer) multiple of nB+1n^{-B^{\prime\prime}+1} to viv_{i}, we see that wiw_{i} is non-zero for at most nmεn^{m\varepsilon} coordinates ii, and has magnitude nB+10+γ\gg n^{-B^{\prime\prime}+10+\gamma} for at least n(m1)εn^{(m-1)\varepsilon} of these coordinates. Also, if (M+N)vnB\|(M+N){\mathbf{v}}\|\leq n^{-B}, we see from the triangle inequality and crude computations that (M+N)wnB+5+γ\|(M+N){\mathbf{w}}\|\leq n^{-B^{\prime\prime}+5+\gamma} (say), recalling that B<B+10B^{\prime\prime}<B+10.

On the other hand, note that if we let Ii,ρ{\mathbf{I}}_{i,\rho} be independent samples of Iρ{\mathbf{I}}_{\rho}, then with probability Ω(n(m1)ερ)\Omega(n^{(m-1)\varepsilon}\rho), there is at least one ii with Ii,ρ=1{\mathbf{I}}_{i,\rho}=1 and wi=Ω(nB+10+γ)|w_{i}|=\Omega(n^{-B^{\prime\prime}+10+\gamma}). From this we conclude that

for some absolute constant δ>0\delta^{\prime}>0 (cf. (40)), and thus for each fixed w{\mathbf{w}} we have

On the other hand, a direct counting argument shows that the number of possible w{\mathbf{w}} is at most exp(O(n(m+1)ε))\exp(O(n^{(m+1)\varepsilon})). Recall that ρn1+α\rho\geq n^{-1+\alpha} and α\alpha is much larger than ε\varepsilon. It follows that

for any mm. Applying the union bound we obtain a suitably small contribution to the sparse analogue of Lemma 11.4, as required.

Proof of the circular law

We now use Theorem 2.5 to derive Theorem 1.2.

By Lemma 2.4 and rotating x{x} by a constant phase if necessaryHere of course we use the obvious fact that the circular law is invariant under phase rotation of the underlying random variable x{x}., we may assume that x{x} has κ\kappa-controlled second moment for some κ\kappa. Allowing implied constants to depend on this κ\kappa, we thus have that x{x} has O(1)O(1)-controlled second moment, which will allow us to apply Theorem 2.5 later.

We closely follow the (now standard) argumentsOne could also follow the approach of Götze and Tikhomirov , as was done in . in [2, Chapter 10] (which are in turn based on the earlier work of Girko and Bai ), which we briefly review here.

Let cn:R×RCc_{n}:{\mathbf{R}}\times{\mathbf{R}}\to{\mathbf{C}} be the characteristic function

of the uniform measure μ\mu_{\infty} on the disk. The sequence of empirical measures μn\mu_{n} can be shown to be a.s. tight just from the assumption that x{x} has finite second moment (see [2, Lemma 10.5] and [2, Theorem 3.6], and also the discussion in [2, p. 295]), and so by standard arguments it suffices to show, for almost every u,vu,v, that

Henceforth we fix u,vu,v. We can take uv0uv\neq 0 since we only need the claim for almost every u,vu,v. From [2, Lemma 10.2], we have the Stieltjes transform identity (first observed by Girko )

and νn\nu_{n} is the empirical distribution of the positive-definite Hermitian matrix

The expression gn(s,t)g_{n}(s,t) is absolutely integrable in s,ts,t, however because of the unboundedness of logx\log x, Fubini’s theorem is not currently applicable, and one must take some care with interchanging integrals or derivatives in this expression. In [2, Lemma 10.4], the analogous identity

is derived, where gg is a function whose explicit form (given in [2, p. 296]) we will not review here. The task is then to show that

The next steps in are to perform some truncations in the region of integration. Let S>2S>2 be any integer. In [2, Lemma 10.6] (see also the discussion in [2, p. 299]), it was shown that

Fix SS. For any ε>0\varepsilon>0, let TR2T\subset{\mathbf{R}}^{2} denote the set

(recall that z:=s+1tz:=s+\sqrt{-1}t). In [2, Lemma 10.7] it is shown that

and similarly with gn(s,t)g_{n}(s,t) replaced by g(s,t)g(s,t); thus it suffices to show that

where ν\nu is an explicit probability measure which we will not review here; in particular, the inner integral is absolutely convergent. Set εn:=n2B\varepsilon_{n}:=n^{-2B} for some large absolute constant BB (independent of nn) to be chosen later. Using the integration by parts argument given in [2, §10.7], it suffices to show that

and similarly with the two-dimensional integral on TT replaced by one-dimensional integrals on the boundary of TT. We shall only estimate the two-dimensional integrals, as the treatment of the one-dimensional ones are similarActually, by employing a smooth cutoff to TT rather than a rough one, one can dispense with the need to consider boundary integrals..

We first prove (44). Since x{x} has finite second moment, a simple application of Chebyshev’s inequality and the Borel-Cantelli lemma, and crude bounds on the spectral norm of NnN_{n} shows that almost surely νn\nu_{n} is supported on the interval [0,n100][0,n^{100}]. Thus it suffices to show that

Observe that logx\log x has total variation bounded by a finite multiple of logn\log n on the xx region of integration, thus it will suffice to show that

For this, it is convenient to perform some truncation, following [2, §10.5.1]. Let 0<δ<1/40<\delta<1/4 be arbitrary, and define the truncated random variables a^ij\hat{a}_{ij} (depending on nn) by

almost surely for some absolute constant c>0c>0, where LL denotes the Levi distance. Next, from [2, Lemma 10.15] we have

almost surely for another absolute constant c>0c^{\prime}>0 (this is where we use the hypothesis δ<1/4\delta<1/4). Applying [2, Lemma 12.18] we conclude

and hence by the triangle inequality for Levi distance

for some c>0c^{\prime\prime}>0. Applying [2, Lemma 12.18] and [2, Lemma 10.8] we obtain

for some c>0c^{\prime\prime\prime}>0, which yields (46) (with some room to spare). This proves (44).

The only remaining task is to prove (45). We would like to reduce matters to establishing that almost surely we have

for almost every zz. The Lebesgue dominated convergence theorem does not apply directly. However, observe from the triangle inequality in L2L^{2} that

since logz\log|z| is locally square-integrable. From bounds on ν\nu (e.g. [2, Lemma 10.8] and the estimates used to prove [2, Lemma 10.10]), we also have

is bounded uniformly in nn, which implies that the sequence of functions 0εnlogx νn(dx,z)\int_{0}^{\varepsilon_{n}}\log x\ \nu_{n}(dx,z) is uniformly integrable on TT.

Now we can deduce (45) from (49). To see this, let M>1M>1 be a large parameter, and let TM,nT_{M,n} be the set of all zz such that 0εnlogx νn(dx,z)M|\int_{0}^{\varepsilon_{n}}\log x\ \nu_{n}(dx,z)|\leq M. From (49) and the Lebesgue dominated convergence theorem we have

On the other hand, from the uniform boundedness of (50) we see that

Adding these two estimates, and then letting MM\to\infty, we obtain (45).

It remains to prove (49). By Fubini’s theorem, it suffices to show for every zz that (49) holds almost surely. But observe that the integrand in (49) vanishes whenever 1nNnzIn\frac{1}{\sqrt{n}}N_{n}-zI_{n} has least singular value at least nBn^{-B}. By Theorem 2.5, this holds with probability at least 1O(n100)1-O(n^{-100}), if BB is sufficiently large. The claim then follows from the Borel-Cantelli lemma. This concludes the proof of Theorem 1.2.

Relaxation of the moment condition

We observe that the bound (46) was established with some room to spare. In fact, the arguments in allow one to relax the condition Ex2+δ<{\mathbf{E}}|{x}|^{2+\delta}<\infty to the slightly weaker condition

for any sufficiently large constant CC. By inspecting the arguments in , we see that any C>16C>16 will work. Perhaps a better constant can be obtained by tightening some calculations, but we do not try to pursue this direction. It seems to us that in the current approach, the extra log term cannot be removed completely in order to establish the full conjecture.

Using the moment method, we obtain the bound

for any integer k1k\geq 1. If we choose k:=Klognloglognk:=\lfloor\frac{K\log n}{\log\log n}\rfloor for some sufficiently large absolute constant KK, then the factor (14nδη)2k(\frac{1}{4}n^{\delta\eta})^{-2k} becomes O(n100)O(n^{-100}).

To conclude the argument, it suffices to show that

This type of bound was established for bounded kk in [2, Lemma 10.11] using the moment method. But it is well-known that the method extends to much higher value of kk, in particular k=OK(lognloglogn)k=O_{K}(\frac{\log n}{\log\log n}). Indeed, the left-hand side of (52) can be expanded as

Thus the total contribution to (53) can be bounded by

The last (l=kl=k) term (which is the dominating term) is of order O(1)knk+1O(1)^{k}n^{k+1}, which is acceptable. As for the l<kl<k terms, we can bound their contribution crudely by

using the definition of kk and the fact that δ\delta is small. Thus, this contribution is negligible compared to the main term. This proves (52), and completes the derivation of the circular law under the hypothesis (51).

Rate of Convergence

Let us return to the original hypothesis of bounded (2+η)th(2+\eta)^{\operatorname{th}} moment for some fixed η>0\eta>0. The above arguments can be pursued in more detail to obtain the more quantitative result that with probability 11, we have

for some η>0\eta^{\prime}>0 depending on η\eta, and all sufficiently large nn.

A full exposition of this improvement would be very tedious, so we only give a brief sketch of how the argument proceeds. We first make some Fourier-analytic reductions, analogous to the proof of Weyl’s equidistribution theorem, to reduce matters to controlling the characteristic function cn(u,v)c_{n}(u,v).

Applying the Kolmogorov law of large numbers, we conclude that with probability 11

Let φ\varphi be a bump function adapted to the ball B(0,n10η)B(0,n^{10\eta^{\prime}}), and let φ^:(R/nη/2Z)2C\hat{\varphi}:({\mathbf{R}}/n^{\eta^{\prime}/2}{\mathbf{Z}})^{2}\to{\mathbf{C}} be the Fourier series

This is an approximation to the identity, and one can then verify the pointwise bounds

Taking Fourier transforms and using the triangle inequality, we can bound the left-hand side by

Thus it will suffice (by the union bound and the Borel-Cantelli lemma) to show that for any fixed u,vu,v with u,vnη|u|,|v|\leq n^{\eta^{\prime}}, one has

To prove (55) one repeats the proof of (43), which requires going through all the relevant arguments in and noting that all the almost sure convergence results can be replaced instead with more quantitative polynomial convergence results (similar to (55)). We perform only one of these steps in detail, namely the proof of the quantitative analogue of (45),

Inspecting the proof of (49), we see that for each fixed zz, 0εnlogx νn(dx,z)\int_{0}^{\varepsilon_{n}}\log x\ \nu_{n}(dx,z) vanishes with probability O(n100)O(n^{-100}). By Fubini’s theorem and Markov’s inequality, we thus see that with probability 1O(n50)1-O(n^{-50}), the set {zT:0εnlogx νn(dx,z)0}\{z\in T:\int_{0}^{\varepsilon_{n}}\log x\ \nu_{n}(dx,z)\neq 0\} has measure at most n50n^{-50}. Since (50) is bounded uniformly in nn, the claim now follows from the Cauchy-Schwarz inequality.

It is quite likely that one can make the convergence even more quantitative, establishing a bound of the form

for all n1n\geq 1; note that the claim (54) is a corollary of this bound and the Borel-Cantelli lemma. This requires replacing the Kolmogorov law of large numbers with a more quantitative law of large numbers which takes advantage of the fact that the random variable x2|{x}|^{2} does not merely have finite first moment, but in fact has finite (1+η2)th(1+\frac{\eta}{2})^{\operatorname{th}} moment. We omit the details.

The sparse case

In this section we sketch how one can modify the arguments in Section 12 to obtain the circular law for sparse matrices (i.e. Theorem 1.3). The proof shall be a modification of thatIt is also likely that the arguments in (see also ) could also be adapted to handle this case, at least if one assumes additional moment conditions on x{x}, since the lower bound α>3/4\alpha>3/4 required in that paper was only needed to obtain an analogue of Theorem 2.9. of Theorem 1.2. In that theorem, one first needed the convergence

which was a consequence of the Kolmogorov law of large numbers, in order to obtain tightness of the μn\mu_{n}. In the sparse case, the analogous convergence result one needs is

But one easily computes that with probability 11, Ij,k,ρ{\mathbf{I}}_{j,k,\rho} is equal to 11 for (1+o(1))ρn2(1+o(1))\rho n^{2} values of j,kj,k, and so this claim also follows from the Kolmogorov law of large numbers.

We are greatly indebted to Manjunath Krishnapur for pointing out the connection between the least singular value bounds and the circular law problem and for many useful discussions, and to Dmitry Timushev for corrections. We also thank P. Wood and an anonymous referee for their careful reading. The first author also thanks Mark Rudelson for some helpful conversations. The first author is supported by a grant from the Macarthur Foundation and by NSF grant CCF-0649473. The second author is supported by NSF Grant 06355606.

References