Random Matrices: The Circular Law

Terence Tao, Van Vu

Introduction

Let ${x}$ be a complex random variable with finite non-zero variance $0<\sigma^{2}<\infty$ and $N_{n}$ be the random matrix of order $n$ with entries being i.i.d. copies of ${x}$. Let $\lambda_{1},\dots,\lambda_{n}$ be the eigenvalues of $\frac{1}{\sigma\sqrt{n}}N_{n}$. Define the empirical spectral distribution (ESD) $\mu_{n}$ of $N_{n}$ by the formula

$$\mu_{n}(s,t):=\frac{1}{n}\#\{k\leq n:{\operatorname{Re}}(\lambda_{k})\leq s,\ {\operatorname{Im}}(\lambda_{k})\leq t\}.$$

We say that the (strong) circular law holds for ${x}$ if, with probability $1$, the spectral distribution $\mu_{n}$ converges (uniformly) to the uniform distribution

$$\mu_{\infty}(s,t):=\frac{1}{\pi}{\operatorname{mes}}\left(\{z\in{\mathbf{C}}:|z|\leq 1,\ {\operatorname{Re}}(z)\leq s,\ {\operatorname{Im}}(z)\leq t\}\right)$$

over the unit disk as $n$ tends to infinity. In the literature one also sees the weak circular law, which asserts that for any fixed $s$ and $t$, $\mu_{n}(s,t)$ converges to $\mu_{\infty}(s,t)$ in probability.
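As a quick sanity check on the limiting object, the following Python sketch evaluates the distribution function of the uniform measure on the unit disk. It assumes the usual two-dimensional convention $\mu_{\infty}(s,t)=\frac{1}{\pi}{\operatorname{mes}}\{|z|\leq 1,{\operatorname{Re}}(z)\leq s,{\operatorname{Im}}(z)\leq t\}$. The function name `mu_inf` and the midpoint-rule resolution are illustrative choices, not part of the paper.

```python
import math

def mu_inf(s, t, steps=20000):
    """Approximate (1/pi) * area of {|z| <= 1, Re z <= s, Im z <= t} (midpoint rule)."""
    right = min(s, 1.0)
    if right <= -1.0:
        return 0.0
    width = (right + 1.0) / steps
    area = 0.0
    for j in range(steps):
        x = -1.0 + (j + 0.5) * width
        # The vertical slice of the disk at abscissa x is the interval [-half, half].
        half = math.sqrt(max(0.0, 1.0 - x * x))
        top = min(t, half)
        if top > -half:
            area += (top + half) * width
    return area / math.pi
```

For instance, `mu_inf(0, 0)` should be close to $1/4$ (a quarter of the disk) and `mu_inf(1, 1)` close to $1$.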

As the name suggests, the weak circular law is easier to prove than the strong one. Using the approach in , the proofs of both types of convergence boil down to controlling the least singular value of $\frac{1}{\sigma\sqrt{n}}N_{n}-zI$. For the weak convergence, one needs a bound with failure probability tending to zero as $n$ tends to infinity (this is the approach taken in , for example). On the other hand, for the strong convergence one needs the failure probability to be summable in $n$. This appears much more difficult, and we will discuss it in more detail in Section 2 (see the paragraph following Theorem 2.1).

In this paper we shall be concerned exclusively with the strong circular law, and in particular with regard to the following well-known conjecture:

Circular law conjecture. The strong circular law holds for any complex variable ${x}$ with zero mean and finite non-zero variance.

The circular law conjecture was formulated in the early 1950s, as a natural (non-hermitian) counterpart of Wigner’s semi-circle law. Since then, several partial results have been obtained, at the cost of extra assumptions on the distribution of the basic variable ${x}$ . In the next few paragraphs, we give a brief survey of these results.

If ${x}$ is complex Gaussian, the conjecture was proved by Mehta in 1967, using the joint density function of the eigenvalues $\lambda_{i}$ which was discovered by Ginibre a few years earlier . An important breakthrough was made by Bai , following an earlier work of Girko . (Bai’s paper discussed Girko’s paper carefully and pointed out some gaps in that paper.) In , Bai proved the claim under the assumption that ${x}$ has finite sixth moment (${\mathbf{E}}|{x}|^{6}<\infty$) and that the joint distribution of the real and imaginary parts of ${x}$ has a bounded density. Recently, in [2, Chapter 10], a finer result was obtained showing that the sixth moment hypothesis can be weakened to ${\mathbf{E}}|{x}|^{2+\eta}<\infty$ for any specified $\eta>0$. However, the bounded density assumption remains critical. This assumption, unfortunately, excludes several important distributions, for instance discrete distributions such as Bernoulli random variables $x\in\{-1,+1\}$.

Theorem 1.1 ([2, Theorem 10.3]). Assume that the complex random variable ${x}$ has zero mean and finite $(2+\eta)^{\operatorname{th}}$ moment for some $\eta>0$, and also that the joint distribution of the real and imaginary parts has a bounded density. Then the circular law holds for ${x}$.

A key idea in is to analyze the ESD $\mu_{n}$ through its Stieltjes transform $s_{n}:{\mathbf{C}}\to{\mathbf{C}}$, defined by the formula

$$s_{n}(z):=\frac{1}{n}\sum_{k=1}^{n}\frac{1}{\lambda_{k}-z}.$$

(We are using $\sqrt{-1}$ for the imaginary unit, as we wish to reserve $i$ as an index of summation.)

As $s_{n}(z)$ is analytic everywhere except at the poles, the real part already determines the eigenvalues $\lambda_{k}$. If we write $s_{n}(z)=s_{nr}(z)+\sqrt{-1}s_{ni}(z)$, $\lambda_{k}=\lambda_{kr}+\sqrt{-1}\lambda_{ki}$ and $z=s+\sqrt{-1}t$, we have the important identity

$$s_{nr}(z)=\frac{1}{n}\sum_{k=1}^{n}\frac{\lambda_{kr}-s}{|\lambda_{k}-z|^{2}}=-\frac{1}{2n}\sum_{k=1}^{n}\frac{\partial}{\partial s}\log|\lambda_{k}-z|^{2}=-\frac{1}{2}\frac{\partial}{\partial s}\int_{0}^{\infty}\log x\,\nu_{n}(dx,z),$$

where $\nu_{n}(.,z)$ is the ESD of the Hermitian matrix $H_{n}:=(\frac{1}{\sqrt{n}}N_{n}-zI)(\frac{1}{\sqrt{n}}N_{n}-zI)^{\ast}$. The task then reduces (at least in principle) to controlling the distributions $\nu_{n}$.
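The link between the real part of the Stieltjes transform and the logarithmic potential can be checked numerically. The sketch below assumes the convention $s_{n}(z)=\frac{1}{n}\sum_{k}\frac{1}{\lambda_{k}-z}$ and compares ${\operatorname{Re}}\,s_{n}(z)$ with $-\frac{1}{2}\frac{\partial}{\partial s}\frac{1}{n}\sum_{k}\log|\lambda_{k}-z|^{2}$, the derivative being approximated by a central difference. The eigenvalue list and all function names are hypothetical.

```python
import math

def stieltjes(eigs, z):
    """s_n(z) = (1/n) * sum_k 1 / (lambda_k - z)."""
    return sum(1.0 / (lam - z) for lam in eigs) / len(eigs)

def log_potential(eigs, z):
    """(1/n) * sum_k log |lambda_k - z|^2."""
    return sum(math.log(abs(lam - z) ** 2) for lam in eigs) / len(eigs)

def real_part_via_log(eigs, z, h=1e-6):
    """-(1/2) d/ds of the log potential, by a central difference in s = Re z."""
    return -0.5 * (log_potential(eigs, z + h) - log_potential(eigs, z - h)) / (2 * h)

eigs = [0.3 + 0.1j, -0.5 + 0.4j, 0.2 - 0.7j]  # hypothetical eigenvalues
z = 1.5 + 0.25j                               # evaluation point away from the poles
lhs = stieltjes(eigs, z).real
rhs = real_part_via_log(eigs, z)
```

The two quantities `lhs` and `rhs` should agree up to the finite-difference error.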

The $\log$ function has two poles, at $\infty$ and $0$. The first one is easy to deal with, as one can bound the largest singular value by a polynomial in $n$. The pole at $0$ poses a much more serious obstacle, since the smallest eigenvalue of $H_{n}$ (or the least singular value of $N_{n}-zI$) can be arbitrarily close to $0$. (In fact, if the matrix is singular, which happens with positive probability in discrete models, then the least singular value is $0$.) The bounded density assumption in Theorem 1.1 was introduced primarily in order to handle this obstacle.

In the last few years, the least singular value problem has become better understood in the discrete case, thanks to a series of papers . In these papers, strong lower bounds for the least singular value of a random matrix or a random perturbation of a fixed matrix were obtained. As a consequence, the circular law has recently been established for various new classes of distributions. For instance, Götze and Tikhomirov proved the weak circular law for any sub-Gaussian distribution ${x}$, using the arguments from . (A variable is sub-Gaussian if it has exponential tail; in particular all of its moments are bounded.) In , Girko established the weak circular law assuming bounded $4+\delta$ moment for some $\delta>0$. Relying on , Pan and Zhou were recently able to verify the strong circular law for any distribution with a bounded fourth moment. This assumption is needed for a number of reasons, in particular allowing one to bound the operator norm of $N_{n}$ by $O(\sqrt{n})$ with high probability. Very recently (a few months after the current paper was first posted on the arXiv), Götze and Tikhomirov proved the weak circular law under an assumption similar to our main theorem below.

In this paper, we prove the circular law only assuming a bounded $(2+\eta)^{\operatorname{th}}$ moment, for any fixed $\eta>0$ . In particular, we have completely removed the bounded density function assumption in Theorem 1.1.

Theorem 1.2. Assume that ${x}$ is a complex random variable with zero mean and finite $(2+\eta)^{\operatorname{th}}$ moment for some $\eta>0$, with strictly positive variance. Then the strong circular law holds for ${x}$.

This result can be further strengthened in several directions:

We can further relax the condition ${\mathbf{E}}|{x}|^{2+\eta}<\infty$ to ${\mathbf{E}}|{x}|^{2}\log^{C}(2+|{x}|)<\infty$ , where $C$ is a sufficiently large absolute constant. (For instance, $C=16$ is sufficient; see Section 13 for details.)

It is not necessary to assume that the entries have identical distributions. It suffices to assume that they are independent, have mean zero with uniformly bounded $(2+\eta)^{\operatorname{th}}$ moments, and that they are all dominated (in a Fourier-analytic sense) by a single random variable with finite non-zero variance and bounded $(2+\eta)^{{\operatorname{th}}}$ moment; see Remark 2.8 below. (See also [2, p. 326-327] in which the extension of the circular law to the case of non-identical distributions is discussed.)

One can obtain some quantitative estimates on the rate of convergence as well. For example, under the $(2+\eta)^{\operatorname{th}}$ moment assumption, we can show that almost surely, the distance $\sup_{s,t}|\mu_{n}(s,t)-\mu_{\infty}(s,t)|$ between $\mu_{n}$ and the limiting distribution $\mu_{\infty}$ in the uniform metric is at most $n^{-\eta^{\prime}}$ for some constant $\eta^{\prime}>0$ and all sufficiently large $n$ .

It is too technical to address these points in the main proof, so we are going to first prove Theorem 1.2 and sketch out the necessary modifications to obtain these refinements in Sections 13, 14.

The circular law also holds for sparse random matrices. For $0<\mu\leq 1$, let ${\mathbf{I}}_{\mu}$ be the boolean random variable which takes value $1$ with probability $\mu$ and $0$ with probability $1-\mu$. Let $\rho=n^{-1+\alpha}$, for a positive constant $\alpha$. Let $N_{n,\rho}$ be the random matrix with the $ij$ entry being ${\mathbf{I}}_{i,j,\rho}{x}_{i,j}$, where the ${\mathbf{I}}_{i,j,\rho}$ and ${x}_{ij}$ are jointly independent iid copies of ${\mathbf{I}}_{\rho}$ and ${x}$ respectively. Götze and Tikhomirov proved that if ${x}$ is sub-Gaussian and $\alpha>3/4$, then $N_{n,\rho}$ admits the circular law. We can prove the following strengthening of this result:

Theorem 1.3. Let $\alpha>0$ and $\eta>0$ be arbitrary positive constants. Assume that ${x}$ is a complex random variable with zero mean and finite $(2+\eta)^{\operatorname{th}}$ moment. Set $\rho=n^{-1+\alpha}$ and let $\mu_{n,\rho}$ be the ESD of $\frac{1}{\sigma\sqrt{n\rho}}N_{n,\rho}$, where $\sigma^{2}$, as usual, is the variance of ${x}$. Then $\mu_{n,\rho}$ tends to the uniform distribution $\mu_{\infty}$ over the unit disk as $n$ tends to infinity.

If one takes $\alpha=0$, the circular law no longer holds. In this case, $\rho=n^{-1}$ and each entry equals $0$ with probability at least $1-1/n$. Thus, a row is all-zero with probability at least $(1-1/n)^{n}\approx e^{-1}$. Since the rows are independent, it is easy to show that with high probability one has $\Theta(n)$ all-zero rows. But this means that the ESD, with high probability, has positive constant mass at the origin.
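The arithmetic behind this obstruction is easy to reproduce. The following sketch, with hypothetical function names, evaluates $(1-1/n)^{n}$ and the resulting expected number of rows in which every indicator ${\mathbf{I}}_{i,j,\rho}$ vanishes, under the assumption $\rho=1/n$.

```python
import math

def prob_zero_row(n):
    """Probability that all n indicators in a fixed row vanish when rho = 1/n."""
    return (1.0 - 1.0 / n) ** n

def expected_zero_rows(n):
    """Expected number of such rows; linear in n because the rows are independent."""
    return n * prob_zero_row(n)

# The probability increases towards exp(-1) as n grows.
limit = math.exp(-1.0)
```

For large $n$ the expected count is roughly $n/e$, which is the $\Theta(n)$ behaviour used in the argument above.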

We shall prove this theorem in parallel with Theorem 1.2, by indicating at various junctures what the “sparse” version of certain key lemmas are.

The key ingredient in our proof of the circular law is a new lower bound for the least singular value of the matrix $M+N_{n}$, where $M$ is an arbitrary matrix with complex entries having absolute values bounded from above by a polynomial in $n$. For the circular law, we only need to consider the case $M=-zI$, where $I$ is the identity matrix. On the other hand, the general case is interesting in its own right and proves useful in other areas of mathematics (see, for example, ). Our arguments permit the coefficients of $M$ or $N_{n}$ to be as large as $n^{C}$ for any fixed constant $C$, which is the main reason why we do not need any stronger moment control on ${x}$ beyond the $(2+\eta)^{\operatorname{th}}$ moment.

The rest of the paper is organized as follows. In the next section, we present the above mentioned result on the least singular value. The key tool for proving this result is a so-called Inverse Littlewood-Offord theorem, discussed in Section 3. This theorem is motivated by several previous results of the same spirit from . On the other hand, the bound in Section 3 is nearly optimal and is sharper than one that can be deduced from . This improvement is critical to us.

The proof of the Inverse Littlewood-Offord theorem is technical and requires several lemmas, developed in Sections 4-9. In particular, we prove a forward Littlewood-Offord theorem (Theorem 6.6), which seems to be of interest in its own right. The proof of the Inverse Theorem follows in Section 10. Next, we prove the desired bound on the least singular value in Section 11. The proof of the circular law follows in Section 12. The rest of the paper is devoted to various refinements of the circular law; for instance, Theorem 1.3 is discussed in Section 15.

In order to handle the sparse case, we need sparse versions of all the tools mentioned above. These results can be proved using the same argument with some modifications. We will only sketch these proofs in the paper.

Let us conclude this section with our notation.

In the whole paper we assume that $n$ is sufficiently large, whenever needed. Asymptotic notation is used under the assumption that $n\rightarrow\infty$ . Let $X$ and $Y$ be non-negative quantities. $X=O(Y)$ , $X\ll Y$ , $Y\gg X$ and $Y=\Omega(X)$ all mean that $X\leq CY$ for some positive constant $C$ and $X=\Theta(Y)$ means $X\ll Y\ll X$ ; $X=o(Y)$ means that $|X|\leq c(n)Y$ where $c(n)$ goes to zero as $n\to\infty$ .

Throughout the paper, letters $A,B,C,c,\alpha,\varepsilon,\eta,\delta,\kappa$ are used to denote constants. Letters $\mu,\rho,\beta$ denote quantities that may depend on $n$ .

We use ${\mathbf{P}}$ to denote probability, ${\mathbf{E}}$ to denote expectation, and ${\mathbf{I}}_{\rho}$ to denote indicator functions of expectation $\rho$ as used earlier in this section. If $E$ is an event, we use ${\mathbf{I}}(E)$ to denote the indicator of $E$, which equals $1$ when $E$ is true and $0$ otherwise. The cardinality of a finite set $S$ will be denoted $\#S$, and the Lebesgue measure of a set $A\subset{\mathbf{C}}$ will be denoted ${\operatorname{mes}}(A)$.

Least singular value bound

Let $M$ be a matrix of order $n$. We use $\|M\|$ to denote the spectral norm of $M$ (i.e. the largest singular value of $M$).

As discussed in the previous section, a key point in Bai’s approach is to obtain control on the lower tail distribution for the least singular value of $\frac{1}{\sqrt{n}}N_{n}-zI_{n}$ , or equivalently to obtain control on the upper tail distribution of the norm of the inverse $\|(\frac{1}{\sqrt{n}}N_{n}-zI_{n})^{-1}\|$ .

This will be achieved in Theorem 2.1 below. The strength of this theorem is that it requires a very weak assumption on the distribution of the entries. All we need is a finite second moment. Several results of this type were obtained recently, under stronger assumptions on ${x}$. For example, addressed the case when ${x}$ was real Gaussian; addressed the case when ${x}$ has support on the integers and $M$ has integer entries. This was done building upon the $M=0$ case discussed in . The case when ${x}$ has finite third moment and $\|M\|$ is bounded by $O(n^{1/2})$ was addressed in (building upon the $M=0$ real-valued case proven in ). In this result, the assumption on the norm of $M$ is important and the constant $1/2$ (in the exponent of $n$) cannot be replaced by any other constant. Furthermore, in the complex-valued case, the bounds in depended on the entire covariance matrix of ${x}$ and not just on the variance.

Theorem 2.1. Let $A,C_{1}$ be positive constants, and let ${x}$ be a complex-valued random variable with non-zero finite variance (in particular, the second moment is finite). Then there are positive constants $B$ and $C_{2}$ such that the following holds: if $N_{n}$ is the random matrix of order $n$ whose entries are iid copies of ${x}$, and $M$ is a deterministic matrix of order $n$ with spectral norm at most $n^{C_{1}}$, then

$${\mathbf{P}}(\|(M+N_{n})^{-1}\|\geq n^{B})\leq C_{2}n^{-A}.$$

It is very important that we can have any constant $A$ in the bound. If $A>1$, then the right hand side is summable in $n$ and this is critical to the strong circular law. In order to prove the weak law, any $A>0$ suffices. The gap in difficulty between getting some $A$ and getting $A>1$ can be illustrated by the following simplified case. Take $M$ to be the zero matrix and $N$ to be the random Bernoulli matrix (whose entries take value $\pm 1$ with probability $1/2$). To make the situation even simpler, assume that we only want to bound the probability that $N^{-1}$ does not exist (namely that $N$ is singular). Already in the 70s, Komlós proved that this probability is $O(n^{-1/2})$. However, the first proof for a bound of the type $O(n^{-1-\varepsilon})$ was obtained only almost twenty years later by Kahn, Komlós and Szemerédi , using a much more complex argument.
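For very small $n$ the singularity probability of a random Bernoulli matrix can be computed exactly by brute force. The sketch below is only an illustration of the quantity under discussion (it says nothing about the asymptotic bounds); the function names are hypothetical and the enumeration is feasible only for $n\leq 4$ or so.

```python
from fractions import Fraction
from itertools import product

def det(m):
    """Integer determinant by Laplace expansion along the first row (tiny n only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def singular_probability(n):
    """Exact P(N is singular) for an n x n matrix of iid uniform +-1 entries."""
    singular = 0
    for entries in product((-1, 1), repeat=n * n):
        m = [list(entries[i * n:(i + 1) * n]) for i in range(n)]
        if det(m) == 0:
            singular += 1
    return Fraction(singular, 2 ** (n * n))
```

For example, `singular_probability(2)` enumerates the 16 sign matrices of order 2 and counts those with $ad=bc$.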

Let us now go back to Theorem 2.1. In fact, we have a more precise statement involving a seemingly stronger (but actually equivalent) assumption on ${x}$ . More precisely, we introduce the following technical definition.

Let $\kappa\geq 1$ . A complex random variable ${x}$ is said to have $\kappa$ -controlled second moment if one has the upper bound

(in particular, $|{\mathbf{E}}{x}|\leq\kappa^{1/2}$ ), and the lower bound

The Bernoulli random variable ( ${\mathbf{P}}({x}=+1)={\mathbf{P}}({x}=-1)=1/2$ ) has $1$ -controlled second moment. The condition (1) asserts in particular that ${x}$ has variance at least $\frac{1}{\kappa}$ , but also asserts that a significant portion of this variance occurs inside the event $|{x}|\leq\kappa$ , and also contains some more technical phase information about the covariance matrix of ${\operatorname{Re}}({x})$ and ${\operatorname{Im}}({x})$ .

To show that this condition is not significantly stronger than bounded second moment, we prove that any complex random variable of finite non-zero variance has controlled second moment after a (harmless) phase rotation:

Let ${x}$ be a complex random variable with finite non-zero variance. Then there exists a phase $e^{\sqrt{-1}\theta}$ and a $\kappa\geq 1$ such that $e^{\sqrt{-1}\theta}{x}$ has $\kappa$ -controlled second moment.

For $\kappa$ sufficiently large, we have ${\mathbf{E}}|{x}|^{2}\leq\kappa$ , and the event $|{x}|\leq\kappa$ has probability at least $1/\sqrt{\kappa}$ . Let ${x}_{\kappa}$ be the variable ${x}$ conditioned on the event $|{x}|\leq\kappa$ . Since ${x}$ has non-zero variance, we see that ${x}_{\kappa}$ will also have non-zero variance for $\kappa$ large enough. It will then suffice to show that

after rotating ${x}$ by a phase if necessary. If we write ${y}_{\kappa}:={x}_{\kappa}-{\mathbf{E}}({x}_{\kappa})$ , then we easily compute

so it suffices to show that for $\kappa$ sufficiently large we have

Now set ${y}:={x}-{\mathbf{E}}({x})$ and consider the covariance matrix

Since ${x}$ has finite non-zero variance, we see that this matrix is finite, non-zero, and positive semi-definite. In particular its largest eigenvalue is at least $\delta$ for some $\delta>0$ . By monotone convergence we then conclude that the covariance matrix

has largest eigenvalue at least $\delta/2$ for $\kappa$ sufficiently large.

Now fix $\kappa$ large enough so that all the above statements hold, and also so that $\frac{1}{\sqrt{\kappa}}\leq\frac{\delta}{2}$ . The null space of (3) is at most one-dimensional. By rotating ${x}$ by a phase we may then assume that the null space is contained in the imaginary axis $\{\left(\begin{matrix}0\\ w\end{matrix}\right):w\in{\mathbf{R}}\}$ . Since covariance matrices are positive semi-definite, we thus have the quadratic form estimate

and (2) follows by setting $u={\operatorname{Re}}(z)$ and $v={\operatorname{Im}}(z)$. ∎

Since rotating all entries by the same phase does not change the norm of the inverse, Theorem 2.1 follows from the following theorem.

Theorem 2.5. Let $A,C_{1},C_{2}$ be positive constants. There are positive constants $B$ and $C_{3}$ such that the following holds. Let ${x}$ be a random variable with $C_{1}$-controlled second moment and $N_{n}$ be the random matrix of order $n$ whose entries are i.i.d. copies of ${x}$. Let $M$ be a deterministic matrix of order $n$ with spectral norm at most $n^{C_{2}}$. Then

$${\mathbf{P}}(\|(M+N_{n})^{-1}\|\geq n^{B})\leq C_{3}n^{-A}.$$

Our arguments give an explicit dependence of $B$ in terms of $A$ and $C_{2}$ . One can set $B$ to be roughly $2AC_{2}$ . A more exact dependence can be obtained with considerably more technical details. Since for the proof of the circular law, any constant $B$ suffices, we do not go into this matter here and will discuss it elsewhere.

Notice that the assumptions in Theorem 2.5 are weaker than the assumption of Theorem 1.2. We do not require ${x}$ to have mean zero and bounded $(2+\eta)^{\operatorname{th}}$ moment. In the proof of Theorem 1.2, these extra assumptions are needed in order to repeat the approach of Bai, and are unrelated to the pole problem or Theorem 2.5.

One can relax somewhat the hypothesis that the entries ${x}_{ij}$ of $N_{n}$ are i.i.d. copies of ${x}$. It is sufficient to assume the following:

${x}_{ij}$ are dominated by a single distribution ${x}$ in the Fourier-analytic sense that $|{\mathbf{E}}(e^{2\pi\sqrt{-1}{\operatorname{Re}}(\xi{x}_{ij})})|\leq{\mathbf{E}}(e^{2\pi\sqrt{-1}{\operatorname{Re}}(\xi{x})})$ for all complex numbers $\xi$ .

${x}$ has $\kappa$ -controlled second moment for some fixed $\kappa$ .

This refinement can be extracted without too much difficulty from the proof in this paper, which ultimately relies on Fourier-analytic methods. Using this refinement and following [2, Chapter 10.8.2], we can extend Theorem 1.2 to the case where the entries of $N_{n}$ are independent, but not necessarily identically distributed, as mentioned in the introduction.

In order to deal with sparse random matrices, we prove the following variant of Theorem 2.1.

Theorem 2.9. Let $A>1,C_{1},C_{2},\alpha$ be positive constants. There are positive constants $B$ and $C_{3}$ depending on $A,C_{1},C_{2},\alpha$ such that the following holds. Let ${x}$ be a random variable with $C_{1}$-controlled second moment and let $N_{n,\rho}$ be the random matrix of order $n$ defined as in Theorem 1.3. Let $M$ be a deterministic matrix of order $n$ with spectral norm at most $n^{C_{2}}$. Then,

To conclude this section, let us derive a simple corollary of Theorem 2.9.

Let $A,C_{1},C_{2},\alpha$ be positive constants. There are positive constants $B$ and $C_{3}$ such that the following holds. Let ${x}$ be a random variable with $C_{1}$ -controlled second moment and $N_{n,\rho}$ be the random matrix of order $n$ defined as in Theorem 1.3. Let $M$ be a deterministic matrix of order $n$ with spectral norm at most $n^{C_{2}}$ . Then,

A simple application of Chebyshev’s inequality shows that

Since $\|N_{n,\rho}\|$ is bounded from above by $\max_{i}\sum_{j=1}^{n}|{x}_{ij}|$ , we have that

by the union bound. Combining this with the polynomial bound on $\|M\|$ and with Theorem 2.9, the claim follows by choosing $B$ sufficiently large. ∎

The condition number $\|M\|\|M^{-1}\|$ of a matrix $M$ plays a crucial role in numerical linear algebra (see , for instance). The above corollary implies that if one perturbs a fixed matrix $M$ by a (very general) sparse random matrix $N_{n}$, the condition number of the resulting matrix will be relatively small with high probability. This fact has some nice applications in theoretical computer science (see or , for example).

Inverse Littlewood-Offord theorems

Let us consider a toy case in order to illustrate the ideas behind the proof of Theorem 2.5. Assume, for a moment, that $M=0$ and ${x}\equiv N(0,1)$ is real Gaussian. In this case, we talk about the least singular value of the random matrix $N_{n}$ whose entries are i.i.d real Gaussian. Let $X_{i}$ be the row vectors of $N_{n}$ and $d_{i}$ be the distance from $X_{i}$ to the hyperplane $H_{i}$ spanned by $X_{j}$ , $j\neq i$ . The least singular value of $N_{n}$ is close (up to factors of $n^{O(1)}$ ) to $\min_{1\leq i\leq n}d_{i}$ . Thus, our goal is to prove that with high probability, each of the $d_{i}$ is bounded away from 0.
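The comparison between the least singular value and the distances $d_{i}$ rests on the standard two-sided bound $n^{-1/2}\min_{i}d_{i}\leq\sigma_{\min}\leq\min_{i}d_{i}$, which follows from the fact that $d_{i}$ is the reciprocal of the norm of the $i$-th column of the inverse matrix. The sketch below checks it for a real $2\times 2$ matrix using closed-form expressions; the function names are hypothetical and the restriction to order $2$ is only to keep the example free of external libraries.

```python
import math

def least_singular_value_2x2(a, b, c, d):
    """Smallest singular value of [[a, b], [c, d]] from the eigenvalues of N N^T."""
    s = a * a + b * b + c * c + d * d   # trace of N N^T
    det = a * d - b * c                  # det(N N^T) equals det^2
    disc = max(0.0, s * s - 4.0 * det * det)
    return math.sqrt(max(0.0, (s - math.sqrt(disc)) / 2.0))

def row_distances_2x2(a, b, c, d):
    """(d_1, d_2): distance from each row to the line spanned by the other row."""
    det = abs(a * d - b * c)
    return det / math.hypot(c, d), det / math.hypot(a, b)

sigma = least_singular_value_2x2(1.0, 2.0, 3.0, 4.0)
d1, d2 = row_distances_2x2(1.0, 2.0, 3.0, 4.0)
```

Here `sigma` lies between `min(d1, d2) / math.sqrt(2)` and `min(d1, d2)`, as the bound predicts for $n=2$.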

In this Gaussian case, the task is simple since, thanks to symmetry, the distribution of $d_{i}$ does not depend on the vectors $X_{j}$ , $j\neq i$ . Indeed, $d_{i}$ has the same distribution as the distance from a Gaussian vector to a fixed hyperplane. This variable is well understood and satisfies the inequality

for any fixed positive constant $A$ . This leads to the conclusion of Theorem 2.5 in this simple case.

However, the general case is much more difficult. For example, if the entries of $N$ are iid Bernoulli, it is already non-trivial to prove $N_{n}$ is asymptotically almost surely non-singular (i.e. that with probability $1-o(1)$ , one has $d_{i}\neq 0$ for all $i$ ). The point here is that one can no longer fix $X_{j},j\neq i$ . As a matter of fact, the distribution of the distance $d_{i}$ depends heavily on the position of the hyperplane $H_{i}$ spanned by the $X_{j},j\neq i$ . For example, let ${x}$ be Bernoulli and consider the following two situations

$H_{i}$ has normal vector $(\frac{1}{\sqrt{n}},\cdots,\frac{1}{\sqrt{n}})$ . In this case, $d_{i}=0$ with probability $O(\frac{1}{\sqrt{n}})$ .

$H_{i}$ has normal vector $(\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}},0,\dots,0)$ . In this case, $d_{i}=0$ with probability $\frac{1}{2}$ .
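The two situations above can be compared exactly. The following sketch computes ${\mathbf{P}}(\sum_{i}w_{i}{x}_{i}=0)$ for iid uniform signs ${x}_{i}$ and an integer vector $w$ proportional to the normal vector (the normalisation does not affect the event). The function name `prob_orthogonal` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

def prob_orthogonal(weights):
    """P(sum_i w_i x_i = 0) for iid uniform signs x_i and integer weights w_i."""
    dist = Counter({0: 1})          # number of sign patterns reaching each partial sum
    for w in weights:
        nxt = Counter()
        for s, count in dist.items():
            nxt[s + w] += count     # x_i = +1
            nxt[s - w] += count     # x_i = -1
        dist = nxt
    return Fraction(dist[0], 2 ** len(weights))

spread_out = prob_orthogonal([1] * 100)               # normal proportional to (1, ..., 1)
concentrated = prob_orthogonal([1, 1] + [0] * 98)     # normal proportional to (1, 1, 0, ..., 0)
```

With $n=100$, `concentrated` equals $1/2$, while `spread_out` equals $\binom{100}{50}2^{-100}$, which is close to $\sqrt{2/(\pi n)}$.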

A hyperplane $H$ is, in some sense, bad for us if the distance from a random (row) vector to $H$ is small with non-negligible probability. It is important to understand the bad hyperplanes. Notice that if ${\mathbf{v}}=(v_{1},\dots,v_{n})$ is the unit normal vector of $H$ , then the distance in question is exactly the random variable

where ${x}_{i}$ are i.i.d. copies of ${x}$ .

This naturally leads to introducing the following concept.

Definition 3.1. Let ${x}$ be a complex random variable, and let ${\mathbf{v}}=(v_{1},\ldots,v_{n})$ be a tuple of complex numbers. We define the random walk $W_{{x}}({\mathbf{v}})$ to be the complex random variable

$$W_{{x}}({\mathbf{v}}):={x}_{1}v_{1}+\cdots+{x}_{n}v_{n},$$

where ${x}_{1},\ldots,{x}_{n}\equiv{x}$ are iid copies of ${x}$. For any $z\in{\mathbf{C}}$ and $r>0$, we let $B(z,r)$ denote the closed disk of radius $r$ centered at $z$. For any $r\geq 0$, we define the small ball probability

$$p_{r,{x}}({\mathbf{v}}):=\sup_{z\in{\mathbf{C}}}{\mathbf{P}}(W_{{x}}({\mathbf{v}})\in B(z,r)).$$

Intuitively, we expect the small ball probability $p_{r,{x}}({\mathbf{v}})$ to be quite small for “most” tuples ${\mathbf{v}}$ . The question, of course, is to quantify “most”.

A classical theorem of Littlewood and Offord (see also ) shows that if ${x}$ is Bernoulli, and all $|v_{i}|\geq 1$, then $p_{1,{x}}({\mathbf{v}})=O(n^{-1/2})$. There are several extensions of this result. Typically, they improve upon the bound $O(n^{-1/2})$ under extra assumptions on the $v_{i}$. We are going to refer to results in this spirit as forward Littlewood-Offord theorems.
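For Bernoulli ${x}$ and real coefficients, the small ball probability can be computed exactly for moderate $n$. The sketch below assumes the definition $p_{r,{x}}({\mathbf{v}})=\sup_{z}{\mathbf{P}}(W_{{x}}({\mathbf{v}})\in B(z,r))$ with closed disks; since the walk is real here, the supremum over $z\in{\mathbf{C}}$ reduces to a supremum over closed intervals of length $2r$. The function names are hypothetical, and the weights should be integers or `Fraction`s so that the arithmetic is exact.

```python
from collections import Counter
from fractions import Fraction

def walk_distribution(weights):
    """Exact law of sum_i w_i x_i for iid uniform signs x_i (real weights)."""
    dist = Counter({0: Fraction(1)})
    for w in weights:
        nxt = Counter()
        for s, pr in dist.items():
            nxt[s + w] += pr / 2
            nxt[s - w] += pr / 2
        dist = nxt
    return dist

def small_ball(weights, r):
    """sup over real z of P(|W - z| <= r); an optimal window can start at an atom."""
    dist = walk_distribution(weights)
    atoms = sorted(dist)
    return max(sum(dist[b] for b in atoms if a <= b <= a + 2 * r) for a in atoms)
```

With all weights equal to $1$ the atoms are spaced by $2$, so a closed window of radius $1$ captures two adjacent atoms; with weights $1,2,4,\ldots$ all sums are distinct and the probability is much smaller, which illustrates how arithmetic structure in ${\mathbf{v}}$ drives concentration.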

For our purposes, we need inverse Littlewood-Offord theorems. Such a theorem is supposed to give a characterization of those vectors ${\mathbf{v}}$ for which $p_{r,{x}}({\mathbf{v}})$ is larger than some lower bound. The study of inverse Littlewood-Offord theorems was started in , where we investigated the case when ${x}$ has discrete support. A new result in this spirit was recently obtained in , where the authors investigated sub-Gaussian distributions, as well as distributions with bounded third or fourth moments.

In the current situation, we only assume that ${x}$ has $O(1)$ -controlled second moment. The weakness of this assumption is a major obstacle and makes the proof much more complicated. It is still possible to obtain a reasonably strong characterization of ${\mathbf{v}}$ , given that $p_{r,{x}}({\mathbf{v}})$ is large. However, this characterization is somewhat technical to state and so we will only explicitly state here a corollary of it, which will be sufficient for our purpose of proving the least singular value bound and the circular law.

Let ${x}$ be a complex random variable. Let $n$ be a positive integer and $\beta,p$ be positive numbers that may depend on $n$. Let $S_{n,{x},\beta,p}$ be the set of all unit vectors ${\mathbf{v}}=(v_{1},\ldots,v_{n})\in{\mathbf{C}}^{n}$ such that one has the concentration bound

$$p_{\beta,{x}}({\mathbf{v}})\geq p.$$

We give ${\mathbf{C}}^{n}$ the $l^{\infty}$ norm

$$\|(v_{1},\ldots,v_{n})\|_{\infty}:=\sup_{1\leq i\leq n}|v_{i}|.$$

Theorem 3.2. Let ${x}$ be a complex random variable which has $\kappa$-controlled second moment for some $\kappa>0$. Let $0<\varepsilon\leq 1$. Then, for all $n$ which are sufficiently large depending on $\kappa,\varepsilon$ and $\beta\geq\exp(-n^{\varepsilon/2})$ and $p=n^{-O(1)}$, there is a set $S^{\prime}\subset{\mathbf{C}}^{n}$ of size at most $n^{(-1/2+\varepsilon)n}p^{-n}+\exp(o(n))$ such that for any ${\mathbf{v}}\in S_{n,{x},\beta,p}$ there is ${\mathbf{v}}^{\prime}\in S^{\prime}$ such that $\|{\mathbf{v}}-{\mathbf{v}}^{\prime}\|_{\infty}\leq\beta$. In other words, $S_{n,{x},\beta,p}$ has a maximal $\beta$-net in the $l^{\infty}$ norm of size at most $n^{(-1/2+\varepsilon)n}p^{-n}+\exp(o(n))$.

Theorem 3.3. Let ${x}$ be a complex random variable which has $\kappa$-controlled second moment for some $\kappa>0$. Let $0<\varepsilon\leq 1$. Then, for all $n$ which are sufficiently large depending on $\kappa,\varepsilon$ and $\beta\geq\exp(-n^{\varepsilon/2})$ and $p=n^{-O(1)}$, all $1/n<\mu\leq 1$, and all $m$ between $n^{\varepsilon}$ and $n^{1-\varepsilon}\mu$, there is a set $S^{\prime}\subset{\mathbf{C}}^{n}$ of size at most $n^{O(\varepsilon)n}(p\sqrt{m})^{-n}+\exp(n^{O(\varepsilon)}m/\mu)$ such that for any ${\mathbf{v}}\in S_{n,{x}{\mathbf{I}}_{\mu},\beta,p}$ there is ${\mathbf{v}}^{\prime}\in S^{\prime}$ such that $\|{\mathbf{v}}-{\mathbf{v}}^{\prime}\|_{\infty}\leq\beta$. In other words, $S_{n,{x}{\mathbf{I}}_{\mu},\beta,p}$ has a maximal $\beta$-net in the $l^{\infty}$ norm of size at most $n^{O(\varepsilon)n}(p\sqrt{m})^{-n}+\exp(n^{O(\varepsilon)}m/\mu)$.

If one sets $m=n^{1-C\varepsilon}\mu$ for some absolute constant $C$ then the conclusion of Theorem 3.3 is similar to that in Theorem 3.2 except for the extra term $\sqrt{\mu}$ in Theorem 3.3. However, for our applications it will be slightly more convenient to choose $m$ at the other extreme, thus $m=n^{\varepsilon}$ . The main point here is that the size of $S_{n,{x},\beta,p}$ (or $S_{n,{x}{\mathbf{I}}_{\mu},\beta,p}$ ) tends to be much smaller than $p^{-n}$ .

Let $A$ be a precompact subset of a metric space $X$ , and let $\varepsilon>0$ . We define the internal metric entropy ${\mathcal{N}}_{\varepsilon}(A)$ to be the cardinality of the largest $\varepsilon$ -net in $A$ (i.e. a set $B\subset A$ where any two elements in $B$ are separated by distance $\varepsilon$ ). We define the external metric entropy ${\mathcal{N}}^{\prime}_{\varepsilon}(A)$ to be the least number of closed $\varepsilon$ -balls in $X$ needed to cover $A$ .

and furthermore in the complex plane $X={\mathbf{C}}$ we have ${\mathcal{N}}_{2\varepsilon}(A)=\Theta({\mathcal{N}}_{\varepsilon}(A))$ . As constant factors will not play any important role, the two notions of entropy will be essentially equivalent for our purposes.

Since $\|{\mathbf{v}}\|_{\infty}\geq n^{-1/2}\|{\mathbf{v}}\|$ , we have the following corollary.

Let ${x}$ be a complex random variable which has $\kappa$ -controlled second moment, for some constant $\kappa>0$ . Let $\varepsilon$ be an arbitrary positive constant. Then for any positive numbers $\mu,\beta,p\leq 1$ and all sufficiently large $n$ we have

In fact, the proof of Theorem 3.2 gives a fairly precise description of the set $S_{n,{x},\beta,p}$ , as is the case with other inverse Littlewood-Offord theorems in the literature. However, this description is somewhat technical to state and we only need the entropy bound on $S_{n,{x},\beta,p}$ in our application, so we have presented Theorem 3.2 in the above short (but less explicit) form.

Concentration probabilities and Fourier analysis

Throughout this section ${x}$ will be a fixed complex random variable with $O(1)$-controlled second moment. For any $0<\mu\leq 1$, let ${x}^{(\mu)}$ be the random variable

$${x}^{(\mu)}:={\mathbf{I}}_{\frac{\mu}{2}}({x}_{1}-{x}_{2}),$$

where ${x}_{1},{x}_{2}$ are iid copies of ${x}$ and ${\mathbf{I}}_{\frac{\mu}{2}}$ is independent of ${x}_{1},{x}_{2}$.

If ${x}$ is the Bernoulli random variable ${\mathbf{P}}({x}=+1)={\mathbf{P}}({x}=-1)=1/2$ , then ${x}^{(\mu)}\in\{0,+2,-2\}$ with ${\mathbf{P}}({x}^{(\mu)}=+2)={\mathbf{P}}({x}^{(\mu)}=-2)=\mu/8$ .
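The probabilities in this example can be reproduced by direct enumeration. The sketch below assumes that ${x}^{(\mu)}$ is the symmetrized, thinned variable ${\mathbf{I}}_{\mu/2}({x}_{1}-{x}_{2})$, which is consistent with the values stated above; the function name is hypothetical and `mu` should be passed as a `Fraction` to keep the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

def symmetrized_law(mu):
    """Law of I * (x1 - x2): x1, x2 iid uniform signs, I ~ Bernoulli(mu/2), independent."""
    law = {}
    for x1, x2, ind in product((-1, 1), (-1, 1), (0, 1)):
        p_ind = mu / 2 if ind else 1 - mu / 2
        value = ind * (x1 - x2)
        law[value] = law.get(value, 0) + Fraction(1, 4) * p_ind
    return law

law = symmetrized_law(Fraction(1, 2))
```

For $\mu=1/2$ this gives mass $\mu/8=1/16$ at each of $\pm 2$ and the remaining mass $1-\mu/4$ at $0$.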

For any $0<\mu\leq 1$ and any tuple ${\mathbf{v}}=(v_{1},\ldots,v_{n})$ of complex numbers, define the concentration probability

This quantity will turn out to be very convenient for controlling the small ball probabilities of $W_{{x}^{(\mu)}}({\mathbf{v}})$ (see Lemma 4.3 below). To do that, we first need a Fourier-analytic representation of ${\mathbf{P}}_{\mu}({\mathbf{v}})$. We introduce the characteristic function $f:{\mathbf{C}}\to{\mathbf{R}}$, defined by

$$f(\xi):=|{\mathbf{E}}(e({\operatorname{Re}}(\xi{x})))|^{2},$$

where $e(t):=e^{2\pi\sqrt{-1}t}$.

For any tuple ${\mathbf{v}}=(v_{1},\ldots,v_{n})$ of complex numbers and any $0<\mu\leq 1$ , we have

Here of course $d\xi$ is Lebesgue measure on the complex plane ${\mathbf{C}}$ .

On the other hand, from (4), (5), (7) and independence we see that

The relevance of concentration probability to the small ball probability is provided by the following lemma:

For any tuple ${\mathbf{v}}$ and any $r>0$ , we have

In applications, $r$ will be very close to 0 and so the term $e^{\pi r^{2}}$ can be ignored.

From Definition 3.1, it suffices to show that

Applying (9) as in the proof of the preceding lemma, we have

The quantity $|{\mathbf{E}}e({\operatorname{Re}}(\xi W_{{x}}({\mathbf{v}})))|$ can be expanded, using (4) and (7), as $\prod_{i=1}^{n}f(\xi v_{i})^{1/2}$ . Since $f(\xi v_{i})^{1/2}\leq\frac{1}{2}+\frac{1}{2}f(\xi v_{i})$ , it follows that

The claim of the lemma follows from the triangle inequality and Lemma 4.2. ∎

We now generalize the above lemma to the sparse case:

For any tuple ${\mathbf{v}}$ and any $r>0,1\geq\mu>0$ , we have

The proof is almost identical to the previous one. The only difference here is that we have $|{\mathbf{E}}e({\operatorname{Re}}(\xi W_{{x}{\mathbf{I}}_{\mu}}({\mathbf{v}})))|$ instead of $|{\mathbf{E}}e({\operatorname{Re}}(\xi W_{{x}}({\mathbf{v}})))|$ . Notice that, by independence and the triangle inequality, $|{\mathbf{E}}e({\operatorname{Re}}(\xi W_{{x}{\mathbf{I}}_{\mu}}({\mathbf{v}})))|$ is at most $\prod_{i=1}^{n}((1-\mu)+\mu f(\xi v_{i})^{1/2})$ . Since $f(\xi v_{i})^{1/2}\leq\frac{1}{2}+\frac{1}{2}f(\xi v_{i})$ , it follows that

The concentration probability has several pleasant properties (cf. [27, Lemma 5.1]):

Let $0<\mu\leq 1$ . Then the following properties hold.

(i) The quantity ${\mathbf{P}}_{\mu}({\mathbf{w}})$ is monotone decreasing in $\mu$ and permutation invariant in ${\mathbf{w}}$ .

(ii) For any tuples ${\mathbf{v}},{\mathbf{w}}$ we have

where ${\mathbf{v}}{\mathbf{w}}$ is the concatenation of ${\mathbf{v}}$ and ${\mathbf{w}}$ .

(iii) For any tuples ${\mathbf{v}},{\mathbf{w}}$ we have

(iv) For any $k\geq 1$ and tuple ${\mathbf{v}}$ we have

where ${\mathbf{v}}^{k}$ is the concatenation of $k$ copies of ${\mathbf{v}}$ .

(v) For any tuples ${\mathbf{v}},{\mathbf{w}}_{1},\ldots,{\mathbf{w}}_{m}$ we have

In particular, by the pigeonhole principle, there exists $1\leq i\leq m$ such that

Properties (i), (ii) are immediate from (8). To prove (iii), observe from (6) that

where we require the walks $W_{{x}^{(\mu)}}({\mathbf{v}}),W_{{x}^{(\mu)}}({\mathbf{w}})$ to be independent. Using the arithmetic mean-geometric mean inequality

Comparing this with (10) we obtain the claim.

The inequality (iv) follows easily from (8) and the elementary inequality $1-t\leq(1-t/k)^{k}$ for all $0\leq t\leq 1$ , which follows from the concavity of $\log(1-t)$ . Finally, the inequality (v) follows from (8) and Hölder’s inequality. ∎
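The elementary inequality used for (iv) is easy to test numerically. The sketch below (the helper name `gap` and the grid of test points are our own choices) evaluates $(1-t/k)^{k}-(1-t)$ on a grid and confirms that it is non-negative up to rounding error. It is a sanity check, not a proof.

```python
def gap(t, k):
    """(1 - t/k)^k - (1 - t); expected to be >= 0 for 0 <= t <= 1, k >= 1."""
    return (1.0 - t / k) ** k - (1.0 - t)

# Smallest value of the gap over a grid of t in [0, 1] and k = 1, ..., 50.
smallest_gap = min(gap(i / 1000, k) for i in range(1001) for k in range(1, 51))
```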

The ${x}$ -norm of a complex number

In this section, we present a way to estimate the characteristic function $f$ (and hence the concentration probabilities ${\mathbf{P}}_{\mu}({\mathbf{w}})$ ) in terms of a more convenient expression. Define the ${x}$ -norm of a complex number $w\in{\mathbf{C}}$ by the formula

where ${x}_{1},{x}_{2}$ are iid copies of ${x}$ , and $\|t\|_{{\mathbf{R}}/{\mathbf{Z}}}$ denotes the distance from $t$ to the nearest integer.

If ${x}$ is Bernoulli, then $\|w\|_{{x}}=\frac{1}{\sqrt{2}}\|{\operatorname{Re}}(2w)\|_{{\mathbf{R}}/{\mathbf{Z}}}$ . So in this case the ${x}$ -norm of $w$ is basically the size of the fractional part of ${\operatorname{Re}}(2w)$ .
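The Bernoulli formula can be illustrated numerically. The sketch assumes that the ${x}$-norm is $\|w\|_{{x}}:=({\mathbf{E}}\|{\operatorname{Re}}(w({x}_{1}-{x}_{2}))\|_{{\mathbf{R}}/{\mathbf{Z}}}^{2})^{1/2}$, which is a reading of the definition that reproduces the formula above; the function names are hypothetical.

```python
import math
from itertools import product

def dist_to_nearest_integer(t):
    """||t||_{R/Z}: distance from the real number t to the nearest integer."""
    return abs(t - round(t))

def x_norm(w, x_law):
    """Assumed x-norm: sqrt(E ||Re(w (x1 - x2))||^2) for iid x1, x2 with
    law `x_law` (a dict value -> probability)."""
    second_moment = 0.0
    for (a, pa), (b, pb) in product(x_law.items(), repeat=2):
        second_moment += pa * pb * dist_to_nearest_integer((w * (a - b)).real) ** 2
    return math.sqrt(second_moment)

bernoulli = {+1: 0.5, -1: 0.5}
w = 0.3 + 0.7j
lhs = x_norm(w, bernoulli)                                      # the x-norm of w
rhs = dist_to_nearest_integer((2 * w).real) / math.sqrt(2)      # closed formula
```

Here `lhs` and `rhs` agree up to rounding, since ${x}_{1}-{x}_{2}$ equals $\pm 2$ with total probability $1/2$ and $0$ otherwise.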

For any $w\in{\mathbf{C}}$ and $0<\mu\leq 1$ we have

for any tuple ${\mathbf{w}}=(w_{1},\ldots,w_{k})$ .

In view of the elementary inequality $1-t\leq\exp(-t)$ for $t\geq 0$ , it will suffice to show that

and the claim follows from the elementary inequality $\cos(2\pi\theta)\leq 1-\Omega(\|\theta\|_{{\mathbf{R}}/{\mathbf{Z}}}^{2})$ . ∎
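The last inequality can be made explicit. Since $1-\cos(2\pi\theta)=2\sin^{2}(\pi\theta)$ and $\sin(\pi t)\geq 2t$ for $0\leq t\leq 1/2$, the constant $8$ is one admissible value for the implied constant in $\Omega(\cdot)$. The sketch below (names are ours) checks this particular constant on a grid; the argument itself only needs some positive constant.

```python
import math

def dist_to_nearest_integer(t):
    """||t||_{R/Z}: distance from t to the nearest integer."""
    return abs(t - round(t))

def slack(theta):
    """(1 - 8 * ||theta||^2) - cos(2*pi*theta); expected to be >= 0."""
    bound = 1.0 - 8.0 * dist_to_nearest_integer(theta) ** 2
    return bound - math.cos(2 * math.pi * theta)

# Smallest slack over a grid of theta in [-2, 2]; equality occurs at
# integers and half-integers, so the minimum is 0 up to rounding.
smallest_slack = min(slack(i / 1000) for i in range(-2000, 2001))
```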

We now record some useful properties of the ${x}$ -norm, which may help explain why we call it a “norm”:

(i) For any $w\in{\mathbf{C}}$ , $0\leq\|w\|_{x}\leq 1$ and $\|-w\|_{x}=\|w\|_{x}$ .

(ii) For any $z,w\in{\mathbf{C}}$ , $\|z+w\|_{x}\leq\|z\|_{x}+\|w\|_{x}$ .

(iii) If ${x}$ has $\kappa$ -controlled second moment for some positive constant $\kappa$ , then there exists a positive constant $c$ depending on $\kappa$ such that $\|z\|_{x}\gg|{\operatorname{Re}}(z)|$ for all $z\in B(0,c)$ .

Property (i) is obvious. Property (ii) follows from the triangle inequality for $L^{2}$ and the elementary observation that $\|x+y\|_{{\mathbf{R}}/{\mathbf{Z}}}\leq\|x\|_{{\mathbf{R}}/{\mathbf{Z}}}+\|y\|_{{\mathbf{R}}/{\mathbf{Z}}}$ .

Now we prove (iii). Let $z\in B(0,c)$ for some small $c$ . From (11) it suffices to show that

for some $K=O(1)$ . In particular ${\mathbf{P}}(|{x}|\leq K)\gg 1$ . So if we let ${y}_{i}$ for $i=1,2$ be ${x}_{i}$ conditioned on the event $|{x}_{i}|\leq K$ , it suffices to show that

If $c$ is small enough depending on $K$ , then $|z({y}_{1}-{y}_{2})|\leq\frac{1}{2}$ , so it suffices to show that

But this follows by conditioning on ${y}_{2}$ and then using (1). ∎

Generalized arithmetic progressions and the forward Littlewood-Offord theorem

As in previous literature, our Littlewood-Offord theorems shall involve generalized arithmetic progressions (GAPs), which we now define.

If $v_{1},\ldots,v_{r}$ are complex numbers and $L_{1},\ldots,L_{r}$ are positive numbers, we define the symmetric generalized arithmetic progression (or symmetric GAP for short)

We refer to $r$ as the rank of $Q$ , $v_{1},\ldots,v_{r}$ as the generators, and $L_{1},\ldots,L_{r}$ as the dimensions.

If all the sums $n_{1}v_{1}+\ldots+n_{r}v_{r}$ are distinct, we say that $Q$ is proper. For $t>0$ , we define the dilate $tQ$ of $Q$ as

Finally, if $L_{1}=\ldots=L_{r}=L$ , we abbreviate ${\operatorname{Q}}((v_{1},\ldots,v_{r}),(L,\ldots,L))$ as ${\operatorname{Q}}((v_{1},\ldots,v_{r}),L)$ .

GAPs are fundamental objects in additive combinatorics, and they have played a crucial role in our earlier papers on inverse Littlewood-Offord theorems and least singular values. For a detailed discussion about these objects, we refer to .

It is helpful to view $Q$ as the image of the integral box

under the linear map $\Phi$ that sends the point $(n_{1},\dots,n_{r})$ to $n_{1}v_{1}+\dots+n_{r}v_{r}$ . $Q$ is proper if $\Phi$ is one-to-one.
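For small parameters, a symmetric GAP and its properness can be examined by brute force. The sketch below assumes the usual definition ${\operatorname{Q}}((v_{1},\ldots,v_{r}),(L_{1},\ldots,L_{r}))=\{n_{1}v_{1}+\ldots+n_{r}v_{r}:n_{i}\in{\mathbf{Z}},|n_{i}|\leq L_{i}\}$ (so the integral box consists of the integer points with $|n_{i}|\leq L_{i}$); the function names are hypothetical.

```python
from itertools import product

def symmetric_gap(generators, dimensions):
    """Map each sum n_1 v_1 + ... + n_r v_r (integers |n_i| <= L_i) to the
    list of coefficient tuples in the integral box that produce it."""
    ranges = [range(-int(L), int(L) + 1) for L in dimensions]
    preimages = {}
    for coeffs in product(*ranges):
        value = sum(n * v for n, v in zip(coeffs, generators))
        preimages.setdefault(value, []).append(coeffs)
    return preimages

def is_proper(generators, dimensions):
    """True when the map Phi is one-to-one on the integral box."""
    return all(len(c) == 1 for c in symmetric_gap(generators, dimensions).values())

gaussian = symmetric_gap((1, 1j), (2, 3))     # generators 1 and i
collapsed = symmetric_gap((1, 2), (2, 2))     # generators 1 and 2
```

The first GAP is proper with $5\cdot 7=35$ elements. The second is not: its $25$ coefficient tuples produce only the $13$ integers in $[-6,6]$.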

We use the following two simple lemmas frequently:

Let $Q$ be a symmetric GAP of rank $r$ and $t\geq 1$ . Then

One can cover $tQ$ by $O(t^{r})$ translates of $Q$ . ∎

Let $Q\subset{\mathbf{C}}$ be a finite set, and let $\Omega\subset{\mathbf{C}}$ be a set which can be covered by at most $M$ balls of radius $r/2$ . Then we have

We can of course assume that $Q\cap\Omega$ is non-empty. By the pigeonhole principle, we can find a ball $B(z,r/2)$ of radius $r/2$ which contains at least $\#(Q\cap\Omega)/M$ elements of $Q\cap\Omega$ ; in particular it contains at least one element $z_{0}$ of $Q$ . Since $(Q\cap\Omega\cap B(z,r/2))-z_{0}$ is contained in $(Q-Q)\cap B(0,r)$ , the claim follows. ∎
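The pigeonhole argument can be tested on a small example. The sketch below takes the conclusion of the lemma to be $\#(Q\cap\Omega)\leq M\,\#((Q-Q)\cap B(0,r))$, which is what the proof above establishes, and uses closed balls throughout; the set $Q$, the centers, and the function names are illustrative choices of ours.

```python
def points_covered(Q, centers, radius):
    """Elements of Q lying in the union of the closed balls B(c, radius)."""
    return [q for q in Q if any(abs(q - c) <= radius for c in centers)]

def small_differences(Q, r):
    """The set (Q - Q) intersected with the closed ball B(0, r)."""
    return {a - b for a in Q for b in Q if abs(a - b) <= r}

# Q: Gaussian integers with real and imaginary parts in [-5, 5].
Q = [complex(a, b) for a in range(-5, 6) for b in range(-5, 6)]
centers = [0, 4 + 4j]          # Omega is covered by M = 2 balls of radius r/2
r = 3.0
covered = points_covered(Q, centers, r / 2)
diffs = small_differences(Q, r)
```

In this example `covered` has $18$ elements and `diffs` has $29$, so the inequality $18\leq 2\cdot 29$ holds with room to spare.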

For a GAP $Q={\operatorname{Q}}((v_{1},\dots,v_{r}),(L_{1},\dots,L_{r}))$ , define the dispersion ${\mathbf{D}}(Q)$ to be the quantity

The quantity ${\mathbf{D}}(Q)$ is very close to the metric entropy ${\mathcal{N}}_{1}(Q)$ of $Q$ , indeed simple volume packing arguments (cf. Lemma 6.4) show that ${\mathbf{D}}(Q)=\Theta_{r}({\mathcal{N}}_{1}(Q))$ . We will however not use that fact here.
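For concreteness, the dispersion can be computed by enumeration, using the formula ${\mathbf{D}}(Q)=\frac{\#Q}{\#(Q\cap B(0,1))}$ that is recalled later in the paper. The sketch assumes the usual definition of a symmetric GAP (integer coefficients with $|n_{i}|\leq L_{i}$) and takes $B(0,1)$ to be the closed unit ball; both conventions, and the function names, are assumptions.

```python
from itertools import product

def gap_elements(generators, dimensions):
    """Distinct sums n_1 v_1 + ... + n_r v_r with integers |n_i| <= L_i."""
    ranges = [range(-int(L), int(L) + 1) for L in dimensions]
    return {sum(n * v for n, v in zip(c, generators)) for c in product(*ranges)}

def dispersion(generators, dimensions):
    """#Q divided by the number of elements of Q in the closed unit ball."""
    Q = gap_elements(generators, dimensions)
    return len(Q) / sum(1 for q in Q if abs(q) <= 1)

d_square = dispersion((1, 1j), (3, 3))   # 49 Gaussian integers, 5 of them in the ball
d_fine = dispersion((0.25,), (8,))       # 17 multiples of 1/4, 9 of them in the ball
```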

This quantity turns out to control the concentration probability of certain random walks associated with $Q$ :

For any $0<\mu\leq 1$ , $\varepsilon>0$ , and complex numbers $v_{1},\dots,v_{r}$ , we have

This “forward Littlewood-Offord theorem” will be crucial in establishing Theorem 3.2. To give the reader some feeling about this estimate, let us first consider a toy case when ${x}$ is Bernoulli and $\mu=1$ . The adjusted random variable ${x}^{(\mu)}$ equals $0$ with probability $3/4$ and $\pm 2$ with probability $1/8$ each.

Assume furthermore that the $v_{i}$ are non-zero integers and $Q$ is proper. Thus $Q\cap B(0,1)\subset\{-1,0,1\}$ and the desired bound becomes

Consider a (lazy) random walk $W$ starting at $0$ . At step $j$ , stay with probability $3/4$ and move to the right or left by an amount $2v_{j}$ with probability $1/8$ each. The terminal point after $n$ steps is exactly the random variable

Since ${\mathbf{P}}_{\mu}({\mathbf{v}}):={\mathbf{E}}\exp(-\pi|W_{{x}^{(\mu)}}({\mathbf{v}})|^{2})$ , the quantity ${\mathbf{P}}_{1}(v_{1}^{L_{1}^{2}}\ldots v_{r}^{L_{r}^{2}})$ can be bounded from above by the sum of the probability that the lazy random walk with $L_{1}^{2}$ steps of size $v_{1}$ , …, $L_{r}^{2}$ steps of size $v_{r}$ ends up on a point with absolute value at most $10\log(\#Q)$ and a negligible term which is much smaller than $(\#Q)^{-1}$ .

Notice that the coefficient of $v_{j}$ is the sum of $L_{j}^{2}$ iid copies of ${x}^{(1)}$ . It is well known that the distribution of this sum is roughly uniform on the interval $[-L_{j},L_{j}]$ . (By roughly uniform, we mean that for any two integers in this interval, the ratio of their masses is bounded from above by a positive constant.) Thus, the main observation here (and in some sense the essence of the theorem) is that the end point of the walk (conditioned on the event that the coefficient of each $v_{j}$ belongs to $[-L_{j},L_{j}]$ ) is roughly uniform in $Q$ . It follows that the probability that it has absolute value $O(\log\#Q)$ can be bounded from above by $O(\frac{\log\#Q}{\#Q})\leq(\#Q)^{-1+\varepsilon}$ , giving the desired bound.
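The "roughly uniform" claim can be observed numerically in the Bernoulli toy case. The sketch below computes, by repeated convolution, the law of the sum of $L^{2}$ iid copies of ${x}^{(1)}$ (mass $3/4$ at $0$ and $1/8$ at each of $\pm 2$) and compares the largest and smallest masses on $[-L,L]$. Since the steps are even, only even integers are reachable, so the comparison is restricted to those; the value $L=10$ and the names are our own choices.

```python
def sum_law(num_steps, step_law):
    """Law of the sum of `num_steps` iid copies of a variable with law `step_law`."""
    law = {0: 1.0}
    for _ in range(num_steps):
        new_law = {}
        for s, p in law.items():
            for d, q in step_law.items():
                new_law[s + d] = new_law.get(s + d, 0.0) + p * q
        law = new_law
    return law

step_law = {0: 0.75, 2: 0.125, -2: 0.125}     # x^(1) for Bernoulli x
L = 10
law = sum_law(L * L, step_law)
reachable = [s for s in range(-L, L + 1) if s % 2 == 0]
mass_ratio = max(law[s] for s in reachable) / min(law[s] for s in reachable)
```

The sum has variance $L^{2}$, so a Gaussian heuristic suggests that `mass_ratio` should be close to $e^{1/2}$, a bounded constant as claimed.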

This argument can be made rigorous for random variables ${x}$ with discrete supports, even when $Q$ is not proper. However, the proof for the general case is more complicated. The main technical tool needed is the following level set estimate:

Given a GAP $Q={\operatorname{Q}}((v_{1},\dots,v_{r}),(L_{1},\dots,L_{r}))$ , a complex number $\xi_{0}$ , and $\varepsilon>0$ , let $\Sigma\subset{\mathbf{C}}$ be the set

We will prove Lemma 6.7 in Sections 7-8 below. For now, let us show how it implies Theorem 6.6.

We abbreviate ${\mathbf{D}}:={\mathbf{D}}(Q)$ . In view of (12), it suffices to show that

Covering ${\mathbf{C}}$ by balls of radius $1$ , it thus suffices to show that

Now we fix $\xi_{0}$ . Let $c$ be a small positive constant to be determined. It is clear that if ${\mathbf{D}}$ is sufficiently large, then

(In fact, ${\mathbf{D}}^{c\varepsilon}$ can be replaced by $C\log{\mathbf{D}}$ for some large constant $C$ .) By Lemma 6.7,

We choose $c$ to be half of the reciprocal of the implied constant in the $O(\cdot)$ notation. It follows that

To conclude the proof of Theorem 6.6, we need to establish Lemma 6.7. This is the purpose of the next two sections.

Lacunary sets inside GAPs

Let $S$ be a set. We shall informally call a sequence $w_{1},\dots,w_{d}$ of elements of $S$ lacunary if the ratio $\frac{|w_{i-1}|}{|w_{i}|}$ is large for all $1<i\leq d$ . The goal of this section is to show that a large GAP always contains a large lacunary subset with some prescribed properties. This fact will be a key ingredient in the proof of Lemma 6.7 (and hence Theorem 6.6), which we give in the next section.

To give the reader some motivation, let us consider the toy case when $Q$ is an interval, say $\{-s,-s+1,\dots,s-1,s\}$ . Given a ratio $K>2$ and a constant $R>1$ (say), we can easily find $d$ elements $w_{1},\dots,w_{d}$ such that $|w_{d}|\geq R$ and $\frac{|w_{i}|}{|w_{i+1}|}\geq K$ where $d$ satisfies
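In the interval case the lacunary elements can be produced greedily: start from $s$ and repeatedly divide by $K$, rounding down, until the value drops below $R$. The sketch below is an illustration of ours (the function name is hypothetical); it yields a sequence whose length is on the order of $\log(s/R)/\log K$.

```python
def lacunary_sequence(s, K, R):
    """Elements w_1 > w_2 > ... of {-s, ..., s} with w_i >= K * w_{i+1}
    and every w_i >= R, obtained by repeated division by K."""
    sequence = []
    w = s
    while w >= R:
        sequence.append(w)
        w = int(w // K)          # floor, so that w_i / w_{i+1} >= K
    return sequence

toy = lacunary_sequence(10**6, 10, 5)
```

With $s=10^{6}$, $K=10$ and $R=5$ this returns the six elements $10^{6},10^{5},\ldots,10$.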

The main result of this section is a generalization of the above observation for general GAPs.

Let $K\geq 1$ , let $Q$ be a symmetric GAP of rank $r$ , and let $R\geq 0$ be a radius. Then there exist, for some $d\geq 0$ , “primary vectors” $w_{1},\ldots,w_{d}\in Q$ , and “secondary vectors” $w^{\prime}_{1},\ldots,w^{\prime}_{d}\in Q$ with the following properties:

(Lacunarity) We have $|w_{i}|\geq K|w_{i+1}|$ for all $1\leq i\leq d-1$ .

(Secondary bounds) We have $|w_{i}|>R$ and $|w^{\prime}_{i}|\leq|w_{i}|$ for all $1\leq i\leq d$ .

where $1\leq K_{i}\leq 1+K$ is the quantity

The secondary vectors are necessary here because $Q$ is taking values in the complex numbers; if $Q\subset{\mathbf{R}}$ then we could simply take $w^{\prime}_{i}=0$ (and thus $K_{i}=1$ ) for all $i$ . The reader may wish to follow the argument below in the real case (and for $R=0$ ), as it is somewhat simpler in that case. The bound (16) may seem strange, but it is best possible except for the $O_{r}(\cdot)$ factor, and we will need such a tight estimate in our applications. The vectors $w_{1},\ldots,w_{d},w^{\prime}_{1},\ldots,w^{\prime}_{d}$ are somewhat analogous to the Minkowski basis of a lattice with respect to a convex body, thus (16) can be viewed as a variant of Minkowski’s second theorem.

By increasing $K$ if necessary we may assume $K$ to be larger than any given constant depending on $r$ . We can also assume that $Q$ is not contained in $B(0,R)$ , as the claim is obvious otherwise.

We perform the following algorithm. We set $d_{0}:=C_{r}\left(1+\frac{\log\frac{\#Q}{\#(Q\cap B(0,R))}}{\log K}\right)$ for some sufficiently large constant $C_{r}$ depending only on $r$ .

Step 0. Initialize $i=1$ . We also adopt the convention that $w_{0}=\infty$ .

Step 1. Let $Q_{i}:=2^{-d_{0}+i}Q\cap B(0,|w_{i-1}|/K)$ . If $Q_{i}\subset B(0,R)$ then set $d:=i-1$ and STOP. Otherwise, let $w_{i}\in Q_{i}$ be chosen such that $|w_{i}|$ is maximal; thus $|w_{i}|\leq|w_{i-1}|/K$ , $|w_{i}|>R$ and $Q_{i}\subset B(0,|w_{i}|)$ .

Step 2. Let $w^{\prime}_{i}\in Q_{i}$ be chosen to maximize the quantity $K_{i}$ defined in (17). Observe that $|w^{\prime}_{i}|\leq|w_{i}|$ .

Step 3. From elementary complex geometry we see that $Q_{i}$ is now contained in a rectangle of dimensions $O(|w_{i}|)\times O(\frac{K_{i}}{K}|w_{i}|)$ . This rectangle can be covered by $O(KK_{i})$ disks of radius $|w_{i}|/2K$ . Applying Lemma 6.4, we conclude that the set

Step 4. Increment $i$ to $i+1$ and return to Step 1.

Since $w_{1},w_{2},\ldots$ have decreasing magnitude and lie in the finite set $Q$ we see that this algorithm terminates in finite time. In fact we claim that this algorithm terminates before step $d_{0}$ . For if the algorithm reaches stage $d_{0}$ , we have obtained $w_{1},\ldots,w_{d_{0}}\in Q$ obeying the lacunarity condition $|w_{i}|\leq|w_{i-1}|/K$ . This implies that the GAP ${\operatorname{Q}}((w_{1},\ldots,w_{d_{0}}),K/10)$ is proper, and that the pairwise sums between ${\operatorname{Q}}((w_{1},\ldots,w_{d_{0}}),K/10)$ and $2Q\cap B(0,R/10)$ are distinct and contained in $(d_{0}K+1)2Q$ . But this implies that

Also, since $B(0,R)$ can be covered by $O(1)$ balls of radius $R/20$ , we see from Lemma 6.4 that

But from the definition of $d_{0}$ , we see that this is impossible if $C_{r}$ is chosen sufficiently large (recall we are taking $K$ large compared to $r$ ). Thus we have $d\leq d_{0}$ , which in particular implies that $w_{1},\ldots,w_{d}$ and $w^{\prime}_{1},\ldots,w^{\prime}_{d}$ lie in $Q$ . Since $Q$ is not contained in $B(0,R)$ we also have $d\geq 1$ .

Now we can cover $Q$ by $O_{r}(1)^{d_{0}}$ copies of $Q_{1}=2^{-d_{0}+1}Q$ , and thus

In particular, since $K+1\geq K_{1}$ and $d_{0}\geq d$ we have

using the definition of $d_{0}$ and recalling that $K$ is large compared to $r$ we conclude that $d\gg_{r}d_{0}$ . The claim (16) now follows from (20). The remaining claims are easily verified from the construction. ∎
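The selection of the primary vectors can be traced on a small real example, where (as noted after the lemma) the secondary vectors may be taken to be $0$. The sketch below is a simplified illustration under several assumptions: the GAP is real-valued, the dilate $tQ$ is taken to be ${\operatorname{Q}}((v_{1},\ldots,v_{r}),(tL_{1},\ldots,tL_{r}))$, all balls are closed, $K>1$ and $R>0$ (so the loop terminates), and $d_{0}$ is supplied by the caller because the constant $C_{r}$ is not specified. The function names are hypothetical.

```python
import math
from itertools import product

def gap_elements(generators, dimensions):
    """Distinct sums n_1 v_1 + ... + n_r v_r with integers |n_i| <= L_i."""
    ranges = [range(-math.floor(L), math.floor(L) + 1) for L in dimensions]
    return {sum(n * v for n, v in zip(c, generators)) for c in product(*ranges)}

def primary_vectors_real(generators, dimensions, K, R, d0):
    """Primary vectors w_1, w_2, ... for a real GAP (secondary vectors omitted)."""
    primary = []
    bound = math.inf                      # plays the role of |w_0| = infinity
    i = 1
    while True:
        scale = 2.0 ** (i - d0)           # Q_i lives in the dilate 2^(i - d0) Q
        dilate = gap_elements(generators, [scale * L for L in dimensions])
        Q_i = [q for q in dilate if abs(q) <= bound / K]
        if all(abs(q) <= R for q in Q_i):
            return primary                # Q_i lies in B(0, R): stop with d = i - 1
        w = max(Q_i, key=lambda q: (abs(q), q))   # element of maximal magnitude
        primary.append(w)
        bound = abs(w)
        i += 1

vectors = primary_vectors_real((1,), (1024,), K=4, R=1, d0=6)
```

For the interval $\{-1024,\ldots,1024\}$ with $K=4$, $R=1$ and $d_{0}=6$ the loop selects $32$, $8$, $2$ and then stops, since the next candidate set is $\{0\}$.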

Proof of Lemma 6.7

We are now ready to prove Lemma 6.7. In the following, $Q$ is fixed and we write ${\mathbf{D}}$ instead of ${\mathbf{D}}(Q)$ . We also fix $r$ and allow all implied constants to depend on $r$ . We may assume without loss of generality that ${\mathbf{D}}$ is large compared with $\varepsilon$ , since the claim is trivial otherwise.

Let $K:={\mathbf{D}}^{\varepsilon}$ ; since ${\mathbf{D}}$ is assumed large compared to $\varepsilon$ , we see that $K$ is also large. We apply Lemma 7.1 (with $R=1$ , and to the GAP $\frac{1}{K^{4}}Q$ ) to obtain vectors

for some $d=O(1/\varepsilon)$ such that $|w_{i}|\geq K|w_{i+1}|$ for all $1\leq i\leq d-1$ , $|w_{i}|>1$ and $|w^{\prime}_{i}|\leq|w_{i}|$ for all $1\leq i\leq d$ , and

where $K_{i}$ is defined in (17). Since $Q$ has rank $O(1)$ , we have

and thus (since $d=O(1/\varepsilon)$ and ${\mathbf{D}}$ is large compared with $\varepsilon$ )

From (14), Lemma 5.3, and (21) we see that

for all $1\leq i\leq d$ and $\xi\in\Sigma$ .

For $1\leq i\leq d$ , define $\zeta_{i}:=\frac{1}{K^{2}w_{i}}$ and $\zeta^{\prime}_{i}:=\sqrt{-1}\frac{1}{KK_{i}w_{i}}$ . Let $P$ be the GAP

One should view $P$ as a kind of “dual” to $Q$ . It has the following properties:

(i) $P$ is proper.

(ii) $\#P\geq{\mathbf{D}}^{1-O(\varepsilon)}$ .

(iv) If $z,z^{\prime}\in P$ are distinct, then $z+\Sigma$ and $z^{\prime}+\Sigma$ are disjoint.

We first verify (i). If $P$ is not proper, then we have a linear relation

for some integers $n_{1},\ldots,n_{r},m_{1},\ldots,m_{r}$ , not all zero, with $|n_{i}|\leq K/50$ and $|m_{i}|\leq K_{i}/50$ for $1\leq i\leq r$ . Let $j$ be the largest index such that $(n_{j},m_{j})$ is non-zero. If $1\leq i<j$ , then from the properties of $w_{i}$ we have

From the triangle inequality we then have

On the other hand, since $n_{j},m_{j}$ are integers which are not both zero, and $\zeta^{\prime}_{j}=\sqrt{-1}\frac{K}{K_{j}}\zeta_{j}$ , and $K/K_{j}\geq 1/2$ , we see that

and so (ii) now follows from (22) (recalling that $d=O(1/\varepsilon)$ and ${\mathbf{D}}$ is large compared to $\varepsilon$ ).

Now we prove (iii). If $z\in P$ , then we see from the triangle inequality that

by lacunarity. But by construction $|w_{d}|\geq 1$ , and the claim follows.

Now we prove (iv). If the claim were false, then we could find distinct $z,z^{\prime}\in P$ and $\xi,\xi^{\prime}\in\Sigma$ such that $z-z^{\prime}=\xi-\xi^{\prime}$ . We can then write

for some integers $n_{1},\ldots,n_{r},m_{1},\ldots,m_{r}$ , not all zero, with $|n_{i}|\leq K/50$ and $|m_{i}|\leq K_{i}/50$ .

Let $j$ be the largest index such that $(n_{j},m_{j})$ is non-zero. From (23) and Lemma 5.3 we have

On the other hand, from the triangle inequality we have

If $K$ is large enough, then we can apply Lemma 5.3 to conclude from (24) that

On the other hand, by construction of $\zeta_{j},\zeta^{\prime}_{j}$ we have

Since $n_{j}$ is an integer, we conclude $n_{j}=0$ . In that case we have

if $K_{j}\geq 2$ , by construction of $\zeta^{\prime}_{j}$ and $K_{j}$ . Since $m_{j}$ is an integer, we conclude $m_{j}=0$ . On the other hand, if $K_{j}<2$ , then we have $m_{j}=0$ as well, since $|m_{j}|\leq K_{j}/50$ . But $(n_{j},m_{j})$ is non-zero, a contradiction. ∎

From properties (ii), (iii), (iv) we see that

Structure of weak elements

Let $Q$ be a GAP. Extend $Q$ by a new dimension generated by a new element $z$ ; $Q^{\prime}=Q+\{-kz,\cdots,kz\}$ . We call $z$ weak if $\#Q^{\prime}$ is only slightly more than $\#Q$ . The goal of this section is to quantify (and generalize) the following phenomenon:

The reader may find the following simple example illustrative. Assume that $Q$ is the interval $\{-s,\dots,s\}$ . Assume that $Q^{\prime}:=Q+\{-kz,\cdots,kz\}$ has cardinality at most $ls$ , where $l=k^{\delta}$ for some small positive $\delta$ .

Consider the interval $Q_{1}:=\{x\in{\mathbf{Z}}:|x|\leq sl/k\}$ . The sets $x+\{z,\cdots,kz\}$ , $x\in Q_{1}$ , are subsets of $Q^{\prime}$ . Since $\#Q_{1}>ls/k$ , these sets are not disjoint. Thus, we have $x+jz=x^{\prime}+j^{\prime}z$ for some distinct $x,x^{\prime}\in Q_{1}$ and $1\leq j\neq j^{\prime}\leq k$ . This implies that

This already gives a bound $k\#(Q_{1}-Q_{1})=O(l\#Q)=O(ls)$ on the cardinality of the possible $z$ . But this bound can be improved further (this improvement is critical later on). Consider the set $x+\{0,\cdots,lz\}$ with $x\in Q$ . By the same argument as before, these sets are not disjoint, and we can conclude that

for $x\in Q_{1}-Q_{1},1\leq\tau\leq k$ and $x^{\prime}\in Q-Q,1\leq\tau^{\prime}\leq l$ . If $\frac{x}{\tau}$ is irreducible, then $\tau\leq l$ and the number of $z$ ’s of this form is only at most $l\#(Q_{1}-Q_{1})=O(\frac{l^{2}}{k}s)$ . If it is not, then $\operatorname{gcd}(x,\tau)\geq\frac{\tau}{l}$ . The number of $x$ satisfying this condition in $Q_{1}-Q_{1}$ is at most $O(\frac{l}{\tau}\#Q_{1})$ . Thus, the number of $z$ ’s is at most $\sum_{\tau=l}^{k}O(\frac{l}{\tau}\#Q_{1})=O(\frac{l^{2}}{k}s)$ , using the bound on $\#Q_{1}$ and the fact that $l=k^{\Omega(1)}$ . Thus, altogether we obtain the bound

which is much better than the previous bound $O(l\#Q)$ . The term $k^{-1}$ will play a critical role in later proofs.

The main result of this section is a generalization of this very special case.

Let $w_{1},\ldots,w_{r}$ be complex numbers and $Q={\operatorname{Q}}((w_{1},\dots,w_{r}),(L_{1},\dots,L_{r}))$ . Let $z$ be a complex number and $k$ a positive integer. Define

Let $Z$ denote the set of all complex numbers $z$ such that

Then $Z$ has a $24$ -net of size at most $1+O_{r}(l^{4}k^{-1}{\mathbf{D}}(Q))$ .

The $24$ -net can be replaced by a $1$ -net if we replace the bound $1+O_{r}(l^{4}k^{-1}{\mathbf{D}}(Q))$ by $O_{r}(1+l^{4}k^{-1}{\mathbf{D}}(Q))$ . However, it is important to us to have the current formulation, as in the case when $l^{4}k^{-1}{\mathbf{D}}(Q)=o(1)$ the net will have size exactly $1$ . The power of $l^{4}$ might be improvable, but we will not need this improvement here, as $l$ will always be relatively small for us compared to other parameters such as $k$ and ${\mathbf{D}}(Q)$ .

Let $z\in Z$ . By definition of $Z$ , we have

Let $W\subset\frac{1}{2}Q$ be a maximal $1$ -net of $\frac{1}{2}Q$ . Then the sets $w+(Q\cap B(0,1))$ for $w\in W$ cover $\frac{1}{2}Q$ , and thus

thanks to the easily verified fact that $\#(\frac{1}{2}Q)\gg_{r}\#Q$ .

Refine the $1$ -net $W$ to a maximal $2$ -net $W^{\prime}$ . We have $\#W^{\prime}\gg_{r}\#W$ and thus

A simple greedy algorithm argument (using the symmetry of $L$ ) shows that we can find a set $J\subset\{-k,\ldots,k\}$ of cardinality $\#J\gg\frac{k}{\#L}$ such that $j_{1}-j_{2}\not\in L$ for any distinct $j_{1},j_{2}\in J$ . Consider the sets $jz+W^{\prime}+((Q+{\operatorname{Q}}(z,k))\cap B(0,1))$ for $j\in J$ . By the construction, we can verify that

These sets are disjoint (thanks to the definition of $J$ and $L$ ).

Every set lies in $2(Q+{\operatorname{Q}}(z,k))$ (since $|j|\leq k$ and $W^{\prime}\subset\frac{1}{2}Q$ ).

Each set has cardinality $(\#W^{\prime})\#((Q+{\operatorname{Q}}(z,k))\cap B(0,1))$ (since $W^{\prime}$ is a $2$ -net).
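The greedy selection of $J$ mentioned above can be sketched as follows. The code treats $L$ as an arbitrary symmetric set of non-zero integers (the example set is ours, chosen only for illustration) and scans $\{-k,\ldots,k\}$ from left to right, keeping $j$ whenever its difference with every previously kept element avoids $L$. Each kept element rules out at most $\#L$ others, which gives $\#J\geq(2k+1)/(\#L+1)$.

```python
def greedy_separated(k, L):
    """Greedily pick J inside {-k, ..., k} so that j1 - j2 is never in L
    for distinct j1, j2 in J.  L is assumed symmetric (L = -L)."""
    L = set(L)
    chosen = []
    for j in range(-k, k + 1):
        if all((j - other) not in L for other in chosen):
            chosen.append(j)
    return chosen

L_example = {-2, -1, 1, 2}      # an arbitrary symmetric set, for illustration only
J = greedy_separated(50, L_example)
```

For this example the procedure keeps every third integer starting from $-50$, for a total of $34$ elements.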

Combining this with (25) we conclude that

On the other hand, $\#J\gg\frac{k}{\#L}$ , so

which asserts that many multiples of $z$ are close to $2Q$ .

Let $R_{0}$ be the smallest radius such that

where $C_{r}$ is a sufficiently large constant depending on $r$ .

Assume, for a moment, that $|z|\geq 2R_{0}+4$ . By the definition of $L$ , we can find, for each $j\in L$ , an element $\zeta_{j}\in 2Q$ such that $|jz-\zeta_{j}|\leq 2$ . (If there are many $\zeta_{j}$ , fix one arbitrarily.) Let $j$ and $j^{\prime}$ be two different indices, then

This implies that the sets $\zeta_{j}+(10Q\cap B(0,R_{0}))$ are disjoint. Furthermore, as $\zeta_{j}\in 2Q$ , they all lie in $12Q$ . Therefore,

But this contradicts (27) if we choose $C_{r}$ sufficiently large. Thus we have

If $R_{0}<10$ , then $|z|<24$ , so $Z$ has a maximal $24$ -net of cardinality $1$ and we are done.

From now on, we assume $R_{0}\geq 10$ . Thus $|z|<3R_{0}$ .

From (26) and the pigeonhole principle we can find $j,j^{\prime}\in L$ such that $0<|j-j^{\prime}|\ll_{r}l$ . Thus there exists an integer $0<i\ll_{r}l$ such that $iz\in 4Q+B(0,4)$ . Since $|z|\leq 3R_{0}$ , we have $|iz|\ll_{r}lR_{0}$ and thus in fact $iz\in(4Q\cap B(0,O_{r}(lR_{0})))+B(0,4)$ . Thus, to obtain the desired bound on ${\mathcal{N}}_{1}(Z)$ , it will suffice to show that

Let $Z^{\prime}$ be any $4$ -net of $4Q\cap B(0,O_{r}(lR_{0}))$ . Observe that the sets $\zeta^{\prime}+(Q\cap B(0,1))$ for $\zeta^{\prime}\in Z^{\prime}$ are disjoint and lie in $5Q\cap B(0,O_{r}(lR_{0}))$ . Thus we have

Since ${\mathbf{D}}(Q)=\frac{\#Q}{\#(Q\cap B(0,1))}$ , it suffices to show that

But (as we are working on the plane) we can cover $B(0,O_{r}(lR_{0}))$ by $O_{r}(l^{2})$ balls of radius $R_{0}/4$ , so by Lemma 6.4 we have

Comparing this with (28) we obtain the claim. ∎

Proof of the inverse theorems

We first prove Theorem 3.2. The proof of Theorem 3.3 can be obtained with some minor modifications.

Let us begin with a simple reduction. Since ${x}$ has $O(1)$ -controlled second moment, from Chebyshev’s inequality we see that $|{x}|\geq n^{A+10}$ with probability $O(n^{-2A-20})$ . Thus if we let ${x}^{\prime}$ be ${x}$ conditioned on the event $|{x}|\leq n^{A+10}$ , we see from the union bound that $p_{\beta,{x}}({\mathbf{v}})$ and $p_{\beta,{x}^{\prime}}({\mathbf{v}})$ differ by at most $O(n^{-2A-19})$ . Thus (modifying $p$ slightly if necessary) we may replace ${x}$ by ${x}^{\prime}$ , and so we may assume for the rest of the proof that

Consider a point ${\mathbf{v}}$ in $S_{n,{x},\beta,p}$ . Let ${\mathbf{V}}=(V_{1},\ldots,V_{n})$ be the vector obtained from $\beta^{-1}{\mathbf{v}}/2$ by rounding the coordinates to the nearest Gaussian integer multiple of $n^{-A-20}$ . Clearly $|{\mathbf{V}}|=\Theta(\beta^{-1})$ . Furthermore, by (29)

We are going to find a small $O(1)$ -net (in the $l^{\infty}$ norm) for the set of all possible ${\mathbf{V}}$ satisfying the last inequality. Set $k:=n^{1/2-\varepsilon}$ , and let $d\geq 1$ be an integer to be chosen later ($d$ will be bounded by a constant).

Now we perform the following algorithm (following the proof of [27, Theorem 2.4]) to construct some elements $w_{1},\ldots,w_{r}$ in ${\mathbf{V}}$ for some $0\leq r\leq d$ .

Step 0. Initialize $r=0$ . Set ${\mathbf{V}}^{[0]}:={\mathbf{V}}$ .

Step 1. Count how many $V_{j}\in{\mathbf{V}}^{[r]}$ there are such that

If this number is less than $k^{2}$ then STOP. Otherwise, move on to Step 2.

Step 2. Applying Lemma 4.5(v), we can find some $V_{j}\in{\mathbf{V}}^{[r]}$ such that

where ${\mathbf{V}}^{[r+1]}$ is obtained from ${\mathbf{V}}^{[r]}$ by deleting a set of $k^{2}$ elements. We then set $w_{r+1}:=V_{j}$ and then increment $r$ to $r+1$ . If $r=d$ then STOP (with an error); otherwise return to Step 1.

By induction, at each stage in this algorithm we have

and hence by Theorem 6.6 and Lemma 4.5(ii)

On the other hand, by construction we have

Thus, the algorithm must terminate in Step 1 for some $r=O_{\varepsilon}(1)$ . At this point, we have obtained a tuple $(w_{1},\ldots,w_{r})$ of elements in ${\mathbf{V}}$ with $r=O_{\varepsilon}(1)$ such that

for all but at most $rk^{2}=O_{\varepsilon}(n^{1-2\varepsilon})\leq n^{1-\varepsilon}$ values of $j$ .

Now we have enough information to construct the net. First we show that it costs a relatively small factor to take care of the exceptional coordinates. There are at most $O_{\varepsilon}(k^{2})\leq n^{1-\varepsilon}$ exceptional values of $j$ ; we can fix the values of the exceptional $j$ by paying a factor of

For each exceptional $j$ , $V_{j}$ is a Gaussian integer multiple of $n^{-A-20}$ of magnitude $O(\beta^{-1})$ . Thus, the number of possible choices for $V_{j}$ is $\beta^{-1}n^{O(1)}$ . So, after we fix the exceptional coordinates $j$ , there are at most

ways to specify the values of these coordinates.

As for the remaining (non-exceptional) coordinates $V_{j}$ , Lemma 9.1 (along with (30), the definition of $k$ , and the bound $r=O_{\varepsilon}(1)$ ) shows that each such $V_{j}$ lies within distance $O(1)$ of a set of cardinality $1+O_{\varepsilon}(n^{-1/2+O(\varepsilon)}p^{-1})$ . The set of all vectors ${\mathbf{V}}$ has an $O(1)$ -net in the $l^{\infty}$ norm of size at most

assuming $n$ sufficiently large depending on $p,\varepsilon$ .

Changing an $O(1)$ -net to a $1$ -net costs only an $O(1)$ factor. Thus, we can conclude that there is a $1$ -net of size at most $O(n^{(-1/2+O(\varepsilon))n}p^{-n})+\exp(o(n))$ . As we can choose $\varepsilon$ arbitrarily small, the proof of Theorem 3.2 is complete.

To prove Theorem 3.3, we just use the sparse version of all lemmas used in the previous proof, except that we take $k$ equal to $\sqrt{m/\mu}$ rather than $n^{1/2-\varepsilon}$ . The starting point is

Instead of ${\mathbf{D}}({\operatorname{Q}}((w_{1},\ldots,w_{r}),k))$ , we will consider ${\mathbf{D}}({\operatorname{Q}}((w_{1},\ldots,w_{r}),\sqrt{\mu}k))$ . Thus, the gain from Lemma 9.1 is no longer $k^{-1}$ (which used to lead to the term $n^{-1/2+O(\varepsilon)}$ in the final bound), but instead $(k\sqrt{\mu})^{-1}$ (which leads to the term $n^{O(\varepsilon)}(\sqrt{m})^{-1}$ ). Meanwhile, the $\exp(o(n))$ factor is replaced with $\exp(n^{O(\varepsilon)}k^{2})=\exp(n^{O(\varepsilon)}m/\mu)$ . The reader is invited to work out the simple details.

Proof of Theorems 2.5 and 2.9

Theorem 2.5 follows from the following. Let $\sigma_{n}(M)$ denote the least singular value of a matrix $M$ of order $n$ . We shall abbreviate $N=N_{n,\rho}$ .

$\|M+N\|\leq n^{\gamma}$ with probability one.

$|{x}_{1}|+\dots+|{x}_{n}|\leq n^{\gamma}$ with probability one.

Indeed, as ${x}$ has finite second moment, we can assume that $|{x}|\leq n^{A+10}$ , at the cost of a (negligible) additional term $o(n^{-A})$ in probability. Thus, by restricting ${x}$ to the event $|{x}|\leq n^{A+10}$ and using the assumption about $M$ in Theorem 2.5, we can satisfy both assumptions in Theorem 11.1, for $\gamma$ large enough.

We can have a more efficient form of the theorem by bounding the probability that the two assumptions on $\|M+N\|$ and $|{x}_{1}|+\dots+|{x}_{n}|$ fail (rather than assuming that they hold with probability one). The relation between $B$ and $\gamma,A$ can be strengthened and we will do that in another paper.

We now prove Theorem 11.1. We suppress all dependence of the implied constants on $A,\gamma,B,\kappa$ .

Let us call a unit vector ${\mathbf{v}}=(v_{1},\ldots,v_{n})$ poor if we have

and rich otherwise. Theorem 11.1 follows directly from the following two lemmas and the fact that

(Proof of Lemma 11.3) We repeat the argument from . Let $E$ be the event that $\|(M+N)v\|\leq n^{-B}$ for some poor unit vector ${\mathbf{v}}$ . If $E$ holds, then the least singular value of $M+N$ is at most $n^{-B}$ , and so the same is true for the adjoint $(M+N)^{\dagger}$ . Thus there exists a row vector ${\mathbf{w}}^{\dagger}$ such that

Write ${\mathbf{w}}^{\dagger}=(w_{1},\ldots,w_{n})$ . By paying a factor of $n$ and using symmetry we may assume that the last coefficient of ${\mathbf{w}}^{\dagger}$ has the largest magnitude, thus

Thus, if we let $F$ be the event that there exists a unit vector ${\mathbf{w}}$ obeying both (32) and (33), we have

Let $X_{1},\ldots,X_{n}$ be the rows of $M+N$ . We shall condition on the first $n-1$ rows $X_{1},\ldots,X_{n-1}$ . Observe that if $E$ holds, then there exists a poor unit vector ${\mathbf{v}}$ such that

Thus, if ${\mathbf{P}}(E|X_{1},\ldots,X_{n-1})$ is non-zero, then there exists a poor unit vector ${\mathbf{u}}$ such that

On the other hand, if $F$ holds, and ${\mathbf{w}}^{\dagger}=(w_{1},\ldots,w_{n})$ is as above, then by (32)

taking inner products with the unit vector ${\mathbf{u}}$ and using the triangle inequality, we conclude

Using (34), Cauchy-Schwarz, and (36) we conclude

On the other hand, since ${\mathbf{u}}$ is poor, and $X_{n}$ is independent of $X_{1},\ldots,X_{n-1}$ (and hence independent of ${\mathbf{u}}$ also), we have

Putting all this together, we conclude that

uniformly in the choice of $X_{1},\ldots,X_{n-1}$ . Integrating over $X_{1},\ldots,X_{n-1}$ and using (35) we obtain ${\mathbf{P}}(E)\leq n^{-A}$ , as desired. ∎

(Proof of Lemma 11.4) Let $\varepsilon>0$ be a sufficiently small constant (in particular, smaller than the constant in Theorem 3.2); we allow all implied constants to depend on $\varepsilon$ . We may also assume that $n$ is sufficiently large depending on $\varepsilon$ .

Let $J$ be the smallest integer strictly larger than $2A+2$ , thus $2A+2<J\leq 2A+3$ . Thus, if we set $\delta:=(A+1)/J$ , then (using (31)) we have $0<\delta<1/2$ and $B>J\gamma$ . If $\varepsilon$ is sufficiently small, we thus have
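The choice of $J$ and $\delta$ is simple arithmetic and can be checked directly. The sketch below (the function name is ours) computes $J$ as the smallest integer strictly larger than $2A+2$ and $\delta=(A+1)/J$; the condition $B>J\gamma$ depends on (31) and is not checked here.

```python
import math

def choose_J_and_delta(A):
    """Smallest integer J strictly larger than 2A + 2, and delta = (A + 1) / J."""
    J = math.floor(2 * A + 2) + 1
    delta = (A + 1) / J
    return J, delta

examples = {A: choose_J_and_delta(A) for A in (0.5, 1, 2.3, 10)}
```

For instance $A=1$ gives $J=5$ and $\delta=0.4$. In every case $2\delta=(2A+2)/J<1$, which is the reason for taking $J$ strictly larger than $2A+2$.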

Let ${\mathbf{v}}$ be a rich unit vector. For $j=0,1,\ldots,J$ , consider the quantities

These quantities are increasing in $j$ , and range between $n^{-A-1}$ and $1$ since ${\mathbf{v}}$ is rich. Applying the pigeonhole principle and using the definition of $\delta$ , we can thus find an index $0\leq j\leq J-1$ such that

Define, for any $0\leq j\leq J-1$ and $1\leq k\leq\lceil(A+1)/\varepsilon\rceil$ , the set $\Omega_{j,k}$ as

Since the number of pairs $j,k$ is $O(1)$ , it suffices by the union bound to show that for each fixed $j,k$

In fact, we are going to show that this probability is exponentially small.

Let $p:=n^{-k\varepsilon}$ . In the notation of Theorem 3.2, ${\mathbf{v}}$ lies in $S_{n,{x},n^{-B+Cj+1/2},p}$ . Thus by this theorem, there is a set $V$ of cardinality at most

such that for each ${\mathbf{v}}\in\Omega_{j,k}$ there is ${\mathbf{v}}^{\prime}\in V$ such that $\|{\mathbf{v}}-{\mathbf{v}}^{\prime}\|_{\infty}\leq n^{-B+Cj+1/2}$ .

Consider ${\mathbf{v}}\in\Omega_{j,k}$ and ${\mathbf{v}}^{\prime}\in V$ as above. Recall that $\|M+N\|\leq n^{\gamma}$ almost surely. Thus with probability $1$ we have

As usual, let $X_{i}$ be the $i$ th row of $M+N$ . It follows that there are at least $n^{\prime}:=n-n^{1-\varepsilon}$ coordinates $1\leq i\leq n$ such that

Now we relate the probability that $|X\cdot{\mathbf{v}}^{\prime}|\leq n^{-B+Cj+1/2+\gamma+\varepsilon}$ with $p$ , where $X:=({x}_{1},\dots,{x}_{n})$ . Consider the quantity

Notice that $|v_{i}-v_{i}^{\prime}|\leq n^{-B+Cj+1/2}$ and also $\sum_{i}|{x}_{i}|\leq n^{\gamma}$ with probability one. Thus

which implies, through the triangle inequality, that

where in the last inequality we used (38).

where in the last inequality we used the definition of $\Omega_{j,k}$ .

Also, a very crude second moment argument, using the fact that ${x}$ has $\kappa$ -controlled second moment, gives

if $\delta^{\prime}>0$ is small enough depending on $\kappa$ . Thus

Again by the union bound, the left-hand side of (39) is at most

It is routine to verify that the last quantity is $o(n^{-A})$ (indeed, we obtain a bound of the form $O(\exp(-\sigma n))$ for some $\sigma>0$ ). Our proof is complete. ∎

Now we sketch the proof of Theorem 2.9. We repeat the above arguments with the following changes. We will of course replace Theorem 3.2 by Theorem 3.3, with $\mu:=\rho$ . Due to the presence of the additional factors of $\mu$ in that theorem, we can no longer afford to choose $\delta$ close to $1/2$ , so we instead choose $\delta$ to be very small, say $\delta=\varepsilon$ , where $\varepsilon$ is very small compared to $1-\alpha$ . In order to take $\delta$ this small, we will need $B$ to be much larger than what (31) requires, but this is not a problem. For our applications, all we need is that $B$ does not depend on $n$ .

The treatment of the poor vectors (Lemma 11.3) in the sparse case is the same as in the non-sparse case. The treatment of the rich vectors (Lemma 11.4) is also essentially the same, except for the fact that we no longer have (40). To be more precise, $1-\delta^{\prime}$ needs to be replaced by $1-\delta^{\prime}\rho$ . In the cases when $k$ is larger than some absolute constant (say 5), (40) is not needed, since in this case $p$ is sufficiently small and

and the above argument goes through without difficulty, so long as one applies Theorem 3.3 with $m:=n^{C_{0}\varepsilon}$ for some sufficiently large absolute constant $C_{0}$ .

In the remaining case where $k$ is at most 5, the replacement of $1-\delta^{\prime}$ by $1-\delta^{\prime}\rho$ becomes too expensive and we will avoid it by a rescaling argument, using the pigeonhole principle.

To start, notice that from the definition of $\Omega_{j,k}$ and the fact $k\leq 5$ , we have

for some fixed $B-O_{\varepsilon}(1)\leq B^{\prime}\leq B$ . Since the left-hand side is $p_{1,{x}{\mathbf{I}}_{\rho}}(n^{B^{\prime}}{\mathbf{v}})$ , we also see from Lemma 4.4 that

We observe that this implies that ${\mathbf{v}}$ is “compressed” in the sense that at most $n^{100\varepsilon}/\rho$ of the coefficients of ${\mathbf{v}}=(v_{1},\ldots,v_{n})$ can exceed $n^{-B^{\prime}+10}$ in magnitude. (Of course, instead of $100$ , one can use any large constant.) Indeed, if instead we had at least $n^{100\varepsilon}/\rho$ coefficients $v_{i}$ of magnitude at least $n^{-B^{\prime}+10}$ for some large absolute constant $A$ , we see from Lemma 4.5 that

for one of these $v_{i}$ , but one can show that this is not the case by a direct computation using the $\kappa$ -controlled second moment hypothesis, or else by an appeal to Theorem 6.6. (Here we used the notation of Lemma 4.5: $(z)^{s}$ denotes a vector of length $s$ whose every coordinate equals $z$ .)

Next, we apply the pigeonhole principle to conclude the existence of a $B^{\prime\prime}$ with $B^{\prime}-O_{\varepsilon}(1)\leq B^{\prime\prime}\leq B^{\prime}-10$ and an integer $m=O_{\varepsilon,\gamma}(1)$ with

By paying a factor of $O_{\varepsilon,\gamma}(1)$ in our final probability bound we may fix $B^{\prime\prime}$ and $m$ . If we define the vector ${\mathbf{w}}$ by setting $w_{i}$ to be the nearest (Gaussian integer) multiple of $n^{-B^{\prime\prime}+1}$ to $v_{i}$ , we see that $w_{i}$ is non-zero for at most $n^{m\varepsilon}$ coordinates $i$ , and has magnitude $\gg n^{-B^{\prime\prime}+10+\gamma}$ for at least $n^{(m-1)\varepsilon}$ of these coordinates. Also, if $\|(M+N){\mathbf{v}}\|\leq n^{-B}$ , we see from the triangle inequality and crude computations that $\|(M+N){\mathbf{w}}\|\leq n^{-B^{\prime\prime}+5+\gamma}$ (say), recalling that $B^{\prime\prime}<B+10$ .

On the other hand, note that if we let ${\mathbf{I}}_{i,\rho}$ be independent samples of ${\mathbf{I}}_{\rho}$ , then with probability $\Omega(n^{(m-1)\varepsilon}\rho)$ , there is at least one $i$ with ${\mathbf{I}}_{i,\rho}=1$ and $|w_{i}|=\Omega(n^{-B^{\prime\prime}+10+\gamma})$ . From this we conclude that

for some absolute constant $\delta^{\prime}>0$ (cf. (40)), and thus for each fixed ${\mathbf{w}}$ we have

On the other hand, a direct counting argument shows that the number of possible ${\mathbf{w}}$ is at most $\exp(O(n^{(m+1)\varepsilon}))$ . Recall that $\rho\geq n^{-1+\alpha}$ and $\alpha$ is much larger than $\varepsilon$ . It follows that

for any $m$ . Applying the union bound we obtain a suitably small contribution to the sparse analogue of Lemma 11.4, as required.
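The discretization used in this sketch, rounding each coefficient $v_{i}$ to the nearest Gaussian integer multiple of a fixed step, can be illustrated as follows. This is a minimal Python sketch under stated assumptions: the name `round_to_gaussian_lattice` is hypothetical, and the numerical step `1e-3` merely stands in for $n^{-B^{\prime\prime}+1}$. It shows the two features the argument relies on: each coordinate moves by a controlled amount, and coordinates that are small compared with the step are rounded to zero.

```python
import numpy as np


def round_to_gaussian_lattice(v, step):
    """Round each complex entry of v to the nearest point of step * Z[i]."""
    v = np.asarray(v, dtype=complex)
    return step * (np.round(v.real / step) + 1j * np.round(v.imag / step))


rng = np.random.default_rng(0)
step = 1e-3  # stands in for n^{-B''+1}; the value is illustrative only
v = rng.normal(size=50) + 1j * rng.normal(size=50)
v[:40] *= 1e-5  # make most coordinates tiny, mimicking a compressed vector
w = round_to_gaussian_lattice(v, step)

# Each coordinate moves by at most step / sqrt(2), and tiny coordinates become 0.
max_error = float(np.max(np.abs(w - v)))
num_nonzero = int(np.count_nonzero(w))
```

Because the rounded vector has few non-zero coordinates, each taking values in a discrete set, the number of possible rounded vectors can be counted directly, which is what makes a union bound over them feasible.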

Proof of the circular law

We now use Theorem 2.5 to derive Theorem 1.2.

By Lemma 2.4 and rotating ${x}$ by a constant phase if necessary (here of course we use the obvious fact that the circular law is invariant under phase rotation of the underlying random variable ${x}$ ), we may assume that ${x}$ has $\kappa$ -controlled second moment for some $\kappa$ . Allowing implied constants to depend on this $\kappa$ , we thus have that ${x}$ has $O(1)$ -controlled second moment, which will allow us to apply Theorem 2.5 later.

We closely follow the (now standard) arguments in [2, Chapter 10] (which are in turn based on the earlier work of Girko and Bai ), which we briefly review here. (One could also follow the approach of Götze and Tikhomirov , as was done in .)
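Before reviewing the argument, it may help to see the statement of the circular law numerically. The following Python sketch is an illustration only and plays no role in the proof. The function names are hypothetical, the choice of random sign entries (mean zero, $\sigma=1$) and of $n=300$ is ours, and we compare the fraction of eigenvalues in a disk of radius $r$ with $r^{2}$, the mass that the uniform measure on the unit disk assigns to that disk, rather than the two-parameter quantity $\mu_{n}(s,t)$.

```python
import numpy as np


def normalized_eigenvalues(n, rng):
    """Eigenvalues of N_n / sqrt(n) for i.i.d. random sign entries (sigma = 1)."""
    N = rng.choice([-1.0, 1.0], size=(n, n))
    return np.linalg.eigvals(N / np.sqrt(n))


def fraction_in_disk(eigs, r):
    """Fraction of eigenvalues with modulus at most r."""
    return float(np.mean(np.abs(eigs) <= r))


rng = np.random.default_rng(0)
eigs = normalized_eigenvalues(300, rng)
for r in (0.5, 0.8, 1.0):
    # Under the uniform law on the unit disk, the disk of radius r has mass r**2.
    print(r, fraction_in_disk(eigs, r), r ** 2)
```

For moderate $n$ the two columns should be close but not equal; the theorem concerns the limit $n\to\infty$.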

Let $c_{n}:{\mathbf{R}}\times{\mathbf{R}}\to{\mathbf{C}}$ be the characteristic function

of the uniform measure $\mu_{\infty}$ on the disk. The sequence of empirical measures $\mu_{n}$ can be shown to be a.s. tight just from the assumption that ${x}$ has finite second moment (see [2, Lemma 10.5] and [2, Theorem 3.6], and also the discussion in [2, p. 295]), and so by standard arguments it suffices to show, for almost every $u,v$ , that

Henceforth we fix $u,v$ . We can take $uv\neq 0$ since we only need the claim for almost every $u,v$ . From [2, Lemma 10.2], we have the Stieltjes transform identity (first observed by Girko )

and $\nu_{n}$ is the empirical distribution of the positive-definite Hermitian matrix

The expression $g_{n}(s,t)$ is absolutely integrable in $s,t$ ; however, because of the unboundedness of $\log x$ , Fubini’s theorem is not currently applicable, and one must take some care with interchanging integrals or derivatives in this expression. In [2, Lemma 10.4], the analogous identity

is derived, where $g$ is a function whose explicit form (given in [2, p. 296]) we will not review here. The task is then to show that

The next steps in are to perform some truncations in the region of integration. Let $S>2$ be any integer. In [2, Lemma 10.6] (see also the discussion in [2, p. 299]), it was shown that

Fix $S$ . For any $\varepsilon>0$ , let $T\subset{\mathbf{R}}^{2}$ denote the set

(recall that $z:=s+\sqrt{-1}t$ ). In [2, Lemma 10.7] it is shown that

and similarly with $g_{n}(s,t)$ replaced by $g(s,t)$ ; thus it suffices to show that

where $\nu$ is an explicit probability measure which we will not review here; in particular, the inner integral is absolutely convergent. Set $\varepsilon_{n}:=n^{-2B}$ for some large absolute constant $B$ (independent of $n$ ) to be chosen later. Using the integration by parts argument given in [2, §10.7], it suffices to show that

and similarly with the two-dimensional integral on $T$ replaced by one-dimensional integrals on the boundary of $T$ . We shall only estimate the two-dimensional integrals, as the treatment of the one-dimensional ones is similar. (Actually, by employing a smooth cutoff to $T$ rather than a rough one, one can dispense with the need to consider boundary integrals.)

We first prove (44). Since ${x}$ has finite second moment, a simple application of Chebyshev’s inequality and the Borel-Cantelli lemma, together with crude bounds on the spectral norm of $N_{n}$ , shows that almost surely $\nu_{n}$ is supported on the interval $[0,n^{100}]$ . Thus it suffices to show that

Observe that $\log x$ has total variation bounded by a finite multiple of $\log n$ on the $x$ region of integration, thus it will suffice to show that

For this, it is convenient to perform some truncation, following [2, §10.5.1]. Let $0<\delta<1/4$ be arbitrary, and define the truncated random variables $\hat{a}_{ij}$ (depending on $n$ ) by

almost surely for some absolute constant $c>0$ , where $L$ denotes the Lévy distance. Next, from [2, Lemma 10.15] we have

almost surely for another absolute constant $c^{\prime}>0$ (this is where we use the hypothesis $\delta<1/4$ ). Applying [2, Lemma 12.18] we conclude

and hence by the triangle inequality for the Lévy distance

for some $c^{\prime\prime}>0$ . Applying [2, Lemma 12.18] and [2, Lemma 10.8] we obtain

for some $c^{\prime\prime\prime}>0$ , which yields (46) (with some room to spare). This proves (44).

The only remaining task is to prove (45). We would like to reduce matters to establishing that almost surely we have

for almost every $z$ . The Lebesgue dominated convergence theorem does not apply directly. However, observe from the triangle inequality in $L^{2}$ that

since $\log|z|$ is locally square-integrable. From bounds on $\nu$ (e.g. [2, Lemma 10.8] and the estimates used to prove [2, Lemma 10.10]), we also have

is bounded uniformly in $n$ , which implies that the sequence of functions $\int_{0}^{\varepsilon_{n}}\log x\ \nu_{n}(dx,z)$ is uniformly integrable on $T$ .

Now we can deduce (45) from (49). To see this, let $M>1$ be a large parameter, and let $T_{M,n}$ be the set of all $z$ such that $|\int_{0}^{\varepsilon_{n}}\log x\ \nu_{n}(dx,z)|\leq M$ . From (49) and the Lebesgue dominated convergence theorem we have

On the other hand, from the uniform boundedness of (50) we see that

Adding these two estimates, and then letting $M\to\infty$ , we obtain (45).

It remains to prove (49). By Fubini’s theorem, it suffices to show for every $z$ that (49) holds almost surely. But observe that the integrand in (49) vanishes whenever $\frac{1}{\sqrt{n}}N_{n}-zI_{n}$ has least singular value at least $n^{-B}$ . By Theorem 2.5, this holds with probability at least $1-O(n^{-100})$ , if $B$ is sufficiently large. The claim then follows from the Borel-Cantelli lemma. This concludes the proof of Theorem 1.2.
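The mechanism in this last step can be observed numerically. The Python sketch below is an illustration under stated assumptions: we take $\nu_{n}(\cdot,z)$ to be the empirical distribution of the eigenvalues of $A^{*}A$ with $A=\frac{1}{\sqrt{n}}N_{n}-zI_{n}$ (so its atoms are the squared singular values of $A$), we use random sign entries, and the values of $n$, $B$ and $z$ as well as all function names are hypothetical choices. It checks that the truncated integral $\int_{0}^{\varepsilon_{n}}\log x\ \nu_{n}(dx,z)$ with $\varepsilon_{n}=n^{-2B}$ is exactly zero once the least singular value exceeds $n^{-B}$, and that the logarithmic potential of the eigenvalues can be read off from the singular values.

```python
import numpy as np


def sample_normalized_matrix(n, rng):
    """Return N_n / sqrt(n) with i.i.d. random sign entries (an illustrative choice)."""
    return rng.choice([-1.0, 1.0], size=(n, n)) / np.sqrt(n)


def truncated_log_integral(sing_vals, eps):
    """Integral of log x over [0, eps] against the ESD of A^* A.

    The eigenvalues of A^* A are the squared singular values of A.
    """
    x = sing_vals ** 2
    return float(np.sum(np.log(x[x <= eps])) / x.size)


n, B, z = 100, 3, 0.3 + 0.4j  # illustrative parameters
rng = np.random.default_rng(1)
M = sample_normalized_matrix(n, rng)
sing_vals = np.linalg.svd(M - z * np.eye(n), compute_uv=False)
eps_n = float(n) ** (-2 * B)

least = float(sing_vals.min())
tail = truncated_log_integral(sing_vals, eps_n)  # 0.0 when least > n**(-B)

# |det(M - z)| equals both the product of |lambda_i - z| and the product of
# singular values, so the two averaged logarithms below agree up to rounding.
log_potential_eigs = float(np.mean(np.log(np.abs(np.linalg.eigvals(M) - z))))
log_potential_svd = float(np.mean(np.log(sing_vals)))
```

A single sample of course says nothing about the failure probability; the content of Theorem 2.5 is that the event `least <= n**(-B)` has probability $O(n^{-100})$, which is what makes the Borel-Cantelli lemma applicable.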

Relaxation of the moment condition

We observe that the bound (46) was established with some room to spare. In fact, the arguments in allow one to relax the condition ${\mathbf{E}}|{x}|^{2+\delta}<\infty$ to the slightly weaker condition

for any sufficiently large constant $C$ . By inspecting the arguments in , we see that any $C>16$ will work. Perhaps a better constant can be obtained by tightening some calculations, but we do not try to pursue this direction. It seems to us that in the current approach, the extra log term cannot be removed completely in order to establish the full conjecture.

Using the moment method, we obtain the bound

for any integer $k\geq 1$ . If we choose $k:=\lfloor\frac{K\log n}{\log\log n}\rfloor$ for some sufficiently large absolute constant $K$ , then the factor $(\frac{1}{4}n^{\delta\eta})^{-2k}$ becomes $O(n^{-100})$ .
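The arithmetic behind this choice of $k$ can be checked directly. The Python sketch below is only a sanity check of the stated factor $(\frac{1}{4}n^{\delta\eta})^{-2k}$ in logarithmic form. The function names are hypothetical, and the sample values of $n$, $K$ and the product $\delta\eta$ are our assumptions. Note that the factor only decays when $n^{\delta\eta}>4$, so $n$ must be large depending on $\delta\eta$.

```python
import math


def moment_order(n, K):
    """k := floor(K log n / log log n), as in the text."""
    return math.floor(K * math.log(n) / math.log(math.log(n)))


def log_factor(n, k, delta_eta):
    """Natural logarithm of (n**delta_eta / 4) ** (-2k)."""
    return -2 * k * (delta_eta * math.log(n) - math.log(4))


# Illustrative values (assumptions): n = 10^8, delta * eta = 0.2.
n, delta_eta = 1e8, 0.2
k_large = moment_order(n, K=100)
k_small = moment_order(n, K=1)
target = -100 * math.log(n)  # logarithm of n^{-100}

large_K_suffices = log_factor(n, k_large, delta_eta) <= target
small_K_suffices = log_factor(n, k_small, delta_eta) <= target
```

For these sample values the bound $n^{-100}$ is reached with $K=100$ but not with $K=1$, which illustrates why $K$ is taken sufficiently large.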

To conclude the argument, it suffices to show that

This type of bound was established for bounded $k$ in [2, Lemma 10.11] using the moment method. But it is well-known that the method extends to much higher values of $k$ , in particular $k=O_{K}(\frac{\log n}{\log\log n})$ . Indeed, the left-hand side of (52) can be expanded as

Thus the total contribution to (53) can be bounded by

The last ( $l=k$ ) term (which is the dominating term) is of order $O(1)^{k}n^{k+1}$ , which is acceptable. As for the $l<k$ terms, we can bound their contribution crudely by

using the definition of $k$ and the fact that $\delta$ is small. Thus, this contribution is negligible compared to the main term. This proves (52), and completes the derivation of the circular law under the hypothesis (51).

Rate of Convergence

Let us return to the original hypothesis of bounded $(2+\eta)^{\operatorname{th}}$ moment for some fixed $\eta>0$ . The above arguments can be pursued in more detail to obtain the more quantitative result that with probability $1$ , we have

for some $\eta^{\prime}>0$ depending on $\eta$ , and all sufficiently large $n$ .

A full exposition of this improvement would be very tedious, so we only give a brief sketch of how the argument proceeds. We first make some Fourier-analytic reductions, analogous to the proof of Weyl’s equidistribution theorem, to reduce matters to controlling the characteristic function $c_{n}(u,v)$ .

Applying the Kolmogorov law of large numbers, we conclude that with probability $1$

Let $\varphi$ be a bump function adapted to the ball $B(0,n^{10\eta^{\prime}})$ , and let $\hat{\varphi}:({\mathbf{R}}/n^{\eta^{\prime}/2}{\mathbf{Z}})^{2}\to{\mathbf{C}}$ be the Fourier series

This is an approximation to the identity, and one can then verify the pointwise bounds

Taking Fourier transforms and using the triangle inequality, we can bound the left-hand side by

Thus it will suffice (by the union bound and the Borel-Cantelli lemma) to show that for any fixed $u,v$ with $|u|,|v|\leq n^{\eta^{\prime}}$ , one has

To prove (55) one repeats the proof of (43), which requires going through all the relevant arguments in and noting that all the almost sure convergence results can be replaced instead with more quantitative polynomial convergence results (similar to (55)). We perform only one of these steps in detail, namely the proof of the quantitative analogue of (45),

Inspecting the proof of (49), we see that for each fixed $z$ , $\int_{0}^{\varepsilon_{n}}\log x\ \nu_{n}(dx,z)$ vanishes with probability $1-O(n^{-100})$ . By Fubini’s theorem and Markov’s inequality, we thus see that with probability $1-O(n^{-50})$ , the set $\{z\in T:\int_{0}^{\varepsilon_{n}}\log x\ \nu_{n}(dx,z)\neq 0\}$ has measure at most $n^{-50}$ . Since (50) is bounded uniformly in $n$ , the claim now follows from the Cauchy-Schwarz inequality.

It is quite likely that one can make the convergence even more quantitative, establishing a bound of the form

for all $n\geq 1$ ; note that the claim (54) is a corollary of this bound and the Borel-Cantelli lemma. This requires replacing the Kolmogorov law of large numbers with a more quantitative law of large numbers which takes advantage of the fact that the random variable $|{x}|^{2}$ does not merely have finite first moment, but in fact has finite $(1+\frac{\eta}{2})^{\operatorname{th}}$ moment. We omit the details.

The sparse case

In this section we sketch how one can modify the arguments in Section 12 to obtain the circular law for sparse matrices (i.e. Theorem 1.3). The proof shall be a modification of that of Theorem 1.2. (It is also likely that the arguments in (see also ) could also be adapted to handle this case, at least if one assumes additional moment conditions on ${x}$ , since the lower bound $\alpha>3/4$ required in that paper was only needed to obtain an analogue of Theorem 2.9.) In that theorem, one first needed the convergence

which was a consequence of the Kolmogorov law of large numbers, in order to obtain tightness of the $\mu_{n}$ . In the sparse case, the analogous convergence result one needs is

But one easily computes that with probability $1$ , ${\mathbf{I}}_{j,k,\rho}$ is equal to $1$ for $(1+o(1))\rho n^{2}$ values of $j,k$ , and so this claim also follows from the Kolmogorov law of large numbers.
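The count of non-zero indicators is easy to observe in simulation. The following Python sketch is illustrative only: the function name is hypothetical, and we assume the indicators ${\mathbf{I}}_{j,k,\rho}$ are independent Bernoulli variables with mean $\rho$, with the sample choices $n=1000$ and $\rho=n^{-1+\alpha}$ for $\alpha=1/2$.

```python
import numpy as np


def count_nonzero_indicators(n, alpha, rng):
    """Sample n*n independent Bernoulli(rho) indicators with rho = n**(-1+alpha).

    Returns the number of indicators equal to 1, together with rho.
    """
    rho = float(n) ** (-1.0 + alpha)
    indicators = rng.random((n, n)) < rho
    return int(indicators.sum()), rho


rng = np.random.default_rng(0)
n, alpha = 1000, 0.5  # illustrative values
count, rho = count_nonzero_indicators(n, alpha, rng)
ratio = count / (rho * n ** 2)  # should be close to 1
```

Here the expected count is $\rho n^{2}=n^{1+\alpha}$, and the fluctuations are of order $\sqrt{\rho n^{2}}$, which is of lower order, consistent with the $(1+o(1))\rho n^{2}$ claim.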

We are greatly indebted to Manjunath Krishnapur for pointing out the connection between the least singular value bounds and the circular law problem and for many useful discussions, and to Dmitry Timushev for corrections. We also thank P. Wood and an anonymous referee for their careful reading. The first author also thanks Mark Rudelson for some helpful conversations. The first author is supported by a grant from the MacArthur Foundation and by NSF grant CCF-0649473. The second author is supported by NSF Grant 06355606.