Random Matrices: The circular Law
Terence Tao, Van Vu
Introduction
Let be a complex random variable with finite non-zero variance and be the random matrix of order with entries being i.i.d. copies of . Let be the eigenvalues of . Define the empirical spectral distribution (ESD) of by the formula
We say that the (strong) circular law holds for if, with probability , the spectral distribution converges (uniformly) to the uniform distribution
over the unit disk as tends to infinity. In the literature one also sees the weak circular law, which asserts that for any fixed and , that converges to in probability.
As the name suggests, the weak circular law is easier to prove than the strong one. Using the approach in , the proofs of both types of convergence boil down to controlling the least singular value of . For the weak convergence, one needs a bound with failure probability tending to zero with tends to infinity (this is the approach taken in , for example). On the other hand, for the strong convergence one needs the failure probability be summable in . This appears much more difficult and we will discuss it in more detail in Section 2 (see the paragraph following Theorem 2.1).
In this paper we shall be concerned exclusively with the strong circular law, and in particular with regard to the following well-known conjecture:
Circular law conjecture. The strong circular law holds for any complex variable with zero mean and finite non-zero variance.
The circular law conjecture was formulated in the early 1950s, as a natural (non-hermitian) counterpart of Wigner’s semi-circle law. Since then, several partial results have been obtained, at the cost of extra assumptions on the distribution of the basic variable . In the next few paragraphs, we give a brief survey of these results.
If is complex Gaussian, the conjecture was proved by Mehta in 1967, using the joint density function of the eigenvalues which was discovered by Ginibre few years earlier . An important breakthrough was made by Bai , following an earlier work of Girko . (Bai’s paper discussed Girko’s paper carefully and pointed out some gaps in that paper.) In , Bai proved the claim under the assumption that has finite sixth moment () and that the joint distribution of the real and imaginary parts of has a bounded density. Recently, in [2, Chapter 10], a finer result was obtained showing that the sixth moment hypothesis can be weakened to for any specified . However, the bounded density assumption remains critical. This assumption, unfortunately, excludes several important distributions, for instance discrete distributions such as Bernoulli random variables .
[2, Theorem 10.3] Assume that the complex random variable has zero mean and finite moment for some and also that the joint distribution of the real and imaginary part has a bounded density. Then the circular law holds for .
A key idea in is to analyze the ESD through its Stieltjes transformation , defined by the formulaWe are using for the imaginary unit, as we wish to reserve as an index of summation.
As is analytic everywhere except the poles, the real part already determines the eigenvalues . If write , and , we have the important identity
where is the ESD of the Hermitian matrix . The task then reduces (at least in principle) to controlling the distributions .
The function has two poles, at and . The first one is easy to deal with, as one can bound the largest singular value by a polynomial in . The pole at poses a much more serious obstacle, since the smallest eigenvalue of (or the least singular value of ) can be arbitrary close to . (In fact, if the matrix is singular, which happens with positive probability in discrete models, then the least singular value is .) The bounded density assumption in Theorem 1.1 was introduced primarily in order to handle this obstacle.
In the last few years, the least singular value problem has become better understood in the discrete case, thanks to a series of papers . In these papers, strong lower bounds for the least singular value of a random matrix or a random perturbation of a fixed matrix were obtained. As a consequence, the circular law has recently been established for various new classes of distributions. For instance, Götze and Tikhomirov proved the weak circular law for any sub-GaussianA variable is sub-Gaussian if it has exponential tail; in particular all of its moments are bounded. distribution , using the arguments from . In , Girko established the weak circular law assuming bounded moment for some . Relying on , Pan and Zhou were recently able to verify the strong circular law for any distribution with a bounded fourth moment. This assumption is needed for a number of reasons, in particular allowing one to bound the operator norm of by with high probability. Very recently (a few months after the current paper was first posted on the arXiv), Götze and Tikhomirov proved the weak circular law under an assumption similar to our main theorem below.
In this paper, we prove the circular law only assuming a bounded moment, for any fixed . In particular, we have completely removed the bounded density function assumption in Theorem 1.1.
Assume that is a complex random variable with zero mean and finite moment for some , with strictly positive variance. Then the strong circular law holds for .
This result can be further strengthened in several directions:
We can further relax the condition to , where is a sufficiently large absolute constant. (For instance, is sufficient; see Section 13 for details.)
It is not necessary to assume that the entries have identical distributions. It suffices to assume that they are independent, have mean zero with uniformly bounded moments, and that they are all dominated (in a Fourier-analytic sense) by a single random variable with finite non-zero variance and bounded moment; see Remark 2.8 below. (See also [2, p. 326-327] in which the extension of the circular law to the case of non-identical distributions is discussed.)
One can obtain some quantitative estimates on the rate of convergence as well. For example, under the moment assumption, we can show that almost surely, the distance between and the limiting distribution in the uniform metric is at most for some constant and all sufficiently large .
It is too technical to address these points in the main proof, so we are going to first prove Theorem 1.2 and sketch out the necessary modifications to obtain these refinements in Sections 13, 14.
The circular law also holds for sparse random matrices. For , let be the boolean random variable which takes value 1 with probability and with probability . Let , for a positive constant . Let be the random matrix with the entry being , where the and are jointly independent iid copies of and respectively. Götze and Tikhomirov proved that if is sub-Gaussian and , then admits the circular law. We can prove the following strengthening of this result:
Let and be arbitrary positive constants. Assume that is a complex random variable with zero mean and finite moment. Set and let be the ESD of , where , as usual, is the variance of . Then tends to the uniform distribution over the unit disk as tends to infinity.
If one takes , the circular law no longer holds. In this case, and each entry equals with probability . Thus, a row is all-zero with probability . Since the rows are independent, it is easy to show that with high probability one has all-zero rows. But this means that the ESD, with high probability, has positive constant mass at the origin.
We shall prove this theorem in parallel with Theorem 1.2, by indicating at various junctures what the “sparse” version of certain key lemmas are.
The key ingredient in our proof of the circular law is a new lower bound for the least singular value of the matrix , where is an arbitrary matrix with complex entries having absolute values bounded from above by a polynomial in . For the circular law, we only need to consider the case , where is the identity matrix. On the other hand, the general case is interesting on its own right and proves useful in other areas of mathematics (see, for example, ). Our arguments permit the coefficients of or to be as large as for any fixed constant , which is the main reason why we do not need any stronger moment control on beyond the moment.
The rest of the paper is organized as follows. In the next section, we present the above mentioned result on the least singular value. The key tool for proving this result is a so-called Inverse Littlewood-Offord theorem, discussed in Section 3. This theorem is motivated by several previous results of the same spirit from . On the other hand, the bound in Section 3 is nearly optimal and is sharper than one that can be deduced from . This improvement is critical to us.
The proof of the Inverse Littlewood-Offord theorem is technical and requires several lemmas, developed in Sections 4-9. In particular, we prove a forward Littlewood-Offord theorem (Theorem 6.6), which seems to be of interest on its own right. The proof of the Inverse Theorem follows in Section 10. Next, we prove the desired bound on the least singular value in Section 11. The proof of the circular law follows in Section 12. The rest of the paper is devoted to various refinements of the circular law; for instance, Theorem 1.3 is discussed in Section 15.
In order to handle the sparse case, we need sparse versions of all the tools mentioned above. These results can be proved using the same argument with some modifications. We will only sketch these proofs in the paper.
Let us conclude this section with our notation.
In the whole paper we assume that is sufficiently large, whenever needed. Asymptotic notation is used under the assumption that . Let and be non-negative quantities. , , and all mean that for some positive constant and means ; means that where goes to zero as .
Throughout the paper, letters are used to denote constants. Letters denote quantities that may depend on .
We use to denote probability, to denote expectation, and to denote indicator functions of expectation as used earlier in this section. If is an event, we use to denote the indicator of , which equals when is true and otherwise. The cardinality of a finite set will be denoted , and the Lebesgue measure of a set will be denoted .
Least singular value bound
Let be a matrix of order . We use to denote the spectral norm of (i.e. the largest singular value of )
As discussed in the previous section, a key point in Bai’s approach is to obtain control on the lower tail distribution for the least singular value of , or equivalently to obtain control on the upper tail distribution of the norm of the inverse .
This will be achieved in Theorems 2.1 below. The strength of this theorem is that it requires a very weak assumption on the distribution of the entries. All we need is a finite second moment. Several results of this type were obtained recently, under stronger assumptions of . For example, addressed the case when was real Gaussian; addressed the case when has support on the integers and has integer entries. This was done building upon the case discussed in . The case when has finite third moment and that is bounded by was addressed in (building upon the real-valued case proven in ). In this result, the assumption on the norm of is important and the constant (in the exponent of ) cannot be replaced by any other constant. Furthermore, in the complex-valued case, the bounds in depended on the entire covariance matrix of and not just on the variance.
Let be positive constants, and let be a complex-valued random variable with non-zero finite variance (in particular, the second moment is finite). Then there are positive constants and such that the following holds: if is the random matrix of order whose entries are iid copies of , and is a deterministic matrix of order with spectral norm at most , then,
It is very important that we can have any constant in the bound. If , then the right hand side is summable in and this is critical to the strong circular law. In order to prove the weak law, any suffices. The difficulty between getting any and getting can be illustrated by the following simplified case. Take be the zero matrix and be the random Bernoulli matrix (whose entries take value with probability ). To make the situation even simpler, assume that we only want to bound the probability that does not exists (namely that is singular). Already in the 70s, Komlós proved that this probability is . However, the first proof for a bound of the type was obtained only almost twenty years later by Kahn, Komlós and Szemerédi , using a much more complex argument.
Let us now go back to Theorem 2.1. In fact, we have a more precise statement involving a seemingly stronger (but actually equivalent) assumption on . More precisely, we introduce the following technical definition.
Let . A complex random variable is said to have -controlled second moment if one has the upper bound
(in particular, ), and the lower bound
The Bernoulli random variable () has -controlled second moment. The condition (1) asserts in particular that has variance at least , but also asserts that a significant portion of this variance occurs inside the event , and also contains some more technical phase information about the covariance matrix of and .
To show that this condition is not significantly stronger than bounded second moment, we prove that any complex random variable of finite non-zero variance has controlled second moment after a (harmless) phase rotation:
Let be a complex random variable with finite non-zero variance. Then there exists a phase and a such that has -controlled second moment.
For sufficiently large, we have , and the event has probability at least . Let be the variable conditioned on the event . Since has non-zero variance, we see that will also have non-zero variance for large enough. It will then suffice to show that
after rotating by a phase if necessary. If we write , then we easily compute
so it suffices to show that for sufficiently large we have
Now set and consider the covariance matrix
Since has finite non-zero variance, we see that this matrix is finite, non-zero, and positive semi-definite. In particular its largest eigenvalue is at least for some . By monotone convergence we then conclude that the covariance matrix
has largest eigenvalue at least for sufficiently large.
Now fix large enough so that all the above statements hold, and also so that . The null space of (3) is at most one-dimensional. By rotating by a phase we may then assume that the null space is contained in the imaginary axis . Since covariance matrices are positive semi-definite, we thus have the quadratic form estimate
and (2) follows by setting and . ∎
Since rotating all entries by the same phase does not change the norm of the inverse, Theorem 2.1 follows from the following theorem.
Let be positive constants. There are positive constants and such that the following holds. Let be a random variable with -controlled second moment and be the random matrix of order whose entries are i.i.d copies of . Let be a deterministic matrix of order with spectral norm at most . Then,
Our arguments give an explicit dependence of in terms of and . One can set to be roughly . A more exact dependence can be obtained with considerably more technical details. Since for the proof of the circular law, any constant suffices, we do not go into this matter here and will discuss it elsewhere.
Notice that the assumptions in Theorem 2.5 are weaker than the assumption of Theorem 1.2. We do not require to have mean and bounded moment. In the proof of Theorem 1.2, these extra assumptions are needed in order to repeat the approach of Bai, and are unrelated to the pole problem or Theorem 2.5.
One can relax somewhat the hypothesis that the entries of are i.i.d copies of . It is sufficient to assume the following
are dominated by a single distribution in the Fourier-analytic sense that for all complex numbers .
has -controlled second moment for some fixed .
This refinement can be extracted without too much difficulty from the proof in this paper, which ultimately relies on Fourier-analytic methods. Using this refinement and following [2, Chapter 10.8.2], we can extend Theorem 1.2 for the case the the entries of are independent, but not necessarily identically distributed, as mentioned in the introduction.
In order to deal with sparse random matrices, we prove the following variant of Theorem 2.1.
Let be positive constants. There are positive constants and depending on such that the following holds. Let be a random variable with -controlled second moment and let be the random matrix of order defined as in Theorem 1.3. Let be a deterministic matrix of order with spectral norm at most . Then,
To conclude this section, let us derive a simple corollary of Theorem 2.9.
Let be positive constants. There are positive constants and such that the following holds. Let be a random variable with -controlled second moment and be the random matrix of order defined as in Theorem 1.3. Let be a deterministic matrix of order with spectral norm at most . Then,
A simple application of Chebyshev’s inequality shows that
Since is bounded from above by , we have that
by the union bound. Combining this with the polynomial bound on and with Theorem 2.9, the claim follows by choosing sufficiently large. ∎
The condition number of a matrix plays a crucial role in numerical linear algebra (see , for instance). The above corollary implies that if one perturbes a fixed matrix by a (very general) sparse random matrix , the condition number of the resulting matrix will be relatively small with high probability. This fact has some nice applications in theoretical computer science (see or , for example).
Inverse Littlewood-Offord theorems
Let us consider a toy case in order to illustrate the ideas behind the proof of Theorem 2.5. Assume, for a moment, that and is real Gaussian. In this case, we talk about the least singular value of the random matrix whose entries are i.i.d real Gaussian. Let be the row vectors of and be the distance from to the hyperplane spanned by , . The least singular value of is close (up to factors of ) to . Thus, our goal is to prove that with high probability, each of the is bounded away from 0.
In this Gaussian case, the task is simple since, thanks to symmetry, the distribution of does not depend on the vectors , . Indeed, has the same distribution as the distance from a Gaussian vector to a fixed hyperplane. This variable is well understood and satisfies the inequality
for any fixed positive constant . This leads to the conclusion of Theorem 2.5 in this simple case.
However, the general case is much more difficult. For example, if the entries of are iid Bernoulli, it is already non-trivial to prove is asymptotically almost surely non-singular (i.e. that with probability , one has for all ). The point here is that one can no longer fix . As a matter of fact, the distribution of the distance depends heavily on the position of the hyperplane spanned by the . For example, let be Bernoulli and consider the following two situations
has normal vector . In this case, with probability .
has normal vector . In this case, with probability .
A hyperplane is, in some sense, bad for us if the distance from a random (row) vector to is small with non-negligible probability. It is important to understand the bad hyperplanes. Notice that if is the unit normal vector of , then the distance in question is exactly the random variable
where are i.i.d. copies of .
This naturally leads to introducing the following concept.
Let be a complex random variable, and let be a tuple of complex numbers. We define the random walk to be the complex random variable
where are iid copies of . For any and , we let denote the closed disk of radius centered at . For any , we define the small ball probability
Intuitively, we expect the small ball probability to be quite small for “most” tuples . The question, of course, is to quantify “most”.
A classical theorem of Littlewood and Offord (see also ) shows that if is Bernoulli, and all , then . There are several extensions of this result. They, typically, improve upon the bound , under extra assumptions on the . We are going to refer to results in this spirit as forward Littlewood-Offord theorems.
For our purposes, we need inverse Littlewood-Offord theorems. Such a theorem is supposed to give a characterization of those vectors , where is larger than some lower bound. The study of inverse Littlewood-Offord theorems was started in , where we investigated the case when has discrete support. A new result in this spirit was recently obtained in , where the authors investigated sub-Gaussian distributions, as well as distributions with bounded third or fourth moments.
In the current situation, we only assume that has -controlled second moment. The weakness of this assumption is a major obstacle and makes the proof much more complicated. It is still possible to obtain a reasonably strong characterization of , given that is large. However, this characterization is somewhat technical to state and so we will only explicitly state here a corollary of it, which will be sufficient for our purpose of proving the least singular value bound and the circular law.
Let be a complex random variable. Let be a positive integer and be positive numbers that may depend on . Let be the set of all unit vectors such that one has the concentration bound
We give the norm
Let be a complex random variable which has -controlled second moment for some . Let . Then, for all which are sufficiently large depending on and and , there is a set of size at most such that for any there is such that . In other words, has a maximal -net in the norm of size at most .
Let be a complex random variable which has -controlled second moment for some . Let . Then, for all which are sufficiently large depending on and and , all , and all between and there is a set of size at most such that for any there is such that such that . In other words, has a maximal -net in the norm of size at most .
If one sets for some absolute constant then the conclusion of Theorem 3.3 is similar to that in Theorem 3.2 except for the extra term in Theorem 3.3. However, for our applications it will be slightly more convenient to choose at the other extreme, thus . The main point here is that the size of (or ) tends to be much smaller than .
Let be a precompact subset of a metric space , and let . We define the internal metric entropy to be the cardinality of the largest -net in (i.e. a set where any two elements in are separated by distance ). We define the external metric entropy to be the least number of closed -balls in needed to cover .
and furthermore in the complex plane we have . As constant factors will not play any important role, the two notions of entropy will be essentially equivalent for our purposes.
Since , we have the following corollary.
Let be a complex random variable which has -controlled second moment, for some constant . Let be an arbitrary positive constant. Then for any positive numbers and all sufficiently large we have
In fact, the proof of Theorem 3.2 gives a fairly precise description of the set , as is the case with other inverse Littlewood-Offord theorems in the literature. However, this description is somewhat technical to state and we only need the entropy bound on in our application, so we have presented Theorem 3.2 in the above short (but less explicit) form.
Concentration probabilities and Fourier analysis
Throughout this section will be a fixed complex random variable with -controlled second moment. For any , let be the random variable
where are iid copies of and is independent from .
If is the Bernoulli random variable , then with .
For any and any tuple of complex numbers, define the concentration probability
This quantity will turn out to be very convenient for controlling the small ball probabilities of (see Lemma 4.3 below). To do that, we first need a Fourier-analytic representation of . We introduce the characteristic function , defined by
For any tuple of complex numbers and any , we have
Here of course is Lebesgue measure on the complex plane .
On the other hand, from (4), (5), (7) and independence we see that
The relevance of concentration probability to the small ball probability is provided by the following lemma:
For any tuple and any , we have
In applications, will be very close to 0 and so the term can be ignored.
From Definition 3.1, it suffices to show that
Applying (9) as in the proof of the preceding lemma, we have
The quantity can be expanded, using (4) and (7), as . Since , it follows that
The claim of the lemma follows from the triangle inequality and Lemma 4.2. ∎
We now generalize the above lemma to the sparse case:
For any tuple and any , we have
The proof is almost identical as the previous one. The only difference here is that we have instead of . Notice that can be expanded as . Since , it follows that
The concentration probability has several pleasant properties (cf. [27, Lemma 5.1]):
Let . Then the following properties hold.
The quantity is monotone decreasing in and permutation invariant in .
For any tuples we have
where is the concatenation of and .
For any tuples we have
For any and tuple we have
where is the concatenation of copies of .
For any tuples we have
In particular, by the pigeonhole principle, there exists such that
Properties (i), (ii) are immediate from (8). To prove (iii), observe from (6) that
where we require the walks to be independent. Using the arithmetic mean-geometric mean inequality
Comparing this with (10) we obtain the claim.
The inequality (iv) follows easily from (8) and the elementary inequality for all , which follows from the convexity of . Finally, the inequality (v) follows from (8) and Hölder’s inequality. ∎
The x𝑥{x}-norm of a complex number
In this section, we present a way to estimate the characteristic function (and hence the concentration probabilities ) in terms of a more convenient expression. Define the -norm of a complex number by the formula
where are iid copies of , and denotes the distance from to the nearest integer.
If is Bernoulli, then . So in this case the -norm of is basically the size of the fractional part of .
For any and we have
for any tuple .
In view of the elementary inequality for , it will suffice to show that
and the claim follows from the elementary inequality . ∎
We now record some useful properties of the -norm, which may help explain why we call it a “norm”:
For any , and .
For any , .
If has -controlled second moment for some positive constant , then there exists a positive constant depending on such that for all .
Property (i) is obvious. Property (ii) follows from the triangle inequality for and the elementary observation that .
Now we prove (iii). Let for some small . From (11) it suffices to show that
for some . In particular . So if we let for be conditioned on the event , it suffices to show that
If is small enough depending on , then , so it suffices to show that
But this follows by conditioning on and then using (1). ∎
Generalized arithmetic progressions and the forward Littlewood-Offord theorem
As in previous literature, our Littlewood-Offord theorems shall involve generalized arithmetic progressions (GAPs), which we now define.
If are complex numbers and are positive numbers, we define the symmetric generalized arithmetic progression (or symmetric GAP for short)
We refer to as the rank of , as the generators, and as the dimensions.
If all the sums are distinct, we say that is proper. For , we define the dilate of as
Finally, if , we abbreviate as .
GAPs are a fundamental object in additive combinatorics and they have played a crucial role in our earlier papers on Inverse Littlewood-Offord theorems and least singular values . For a detailed discussion about these objects, we refer to .
It is helpful to view as the image of the integral box
under the linear map that sends the point to . is proper if is one-to-one.
We use the following two simple lemmas frequently:
Let be a symmetric GAP of rank and . Then
One can cover by translates of . ∎
Let be a finite set, and let be a set which can be covered by at most balls of radius . Then we have
We can of course assume that is non-empty. By the pigeonhole principle, we can find a ball of radius which contains at least elements of ; in particular it contains at least one element of . Since is contained in , the claim follows. ∎
For a GAP , define the dispersion to be the quantity
The quantity is very close to the metric entropy of , indeed simple volume packing arguments (cf. Lemma 6.4) show that . We will however not use that fact here.
This quantity turns out to control the concentration probability of certain random walks associated with :
For any , , and complex numbers , we have
This “forward Littlewood-Offord theorem” will be crucial in establishing Theorem 3.2. To give the reader some feeling about this estimate, let us first consider a toy case when is Bernoulli and . The adjusted random variable equals with probability and with probability .
Assume furthermore that the are non-zero integers and is proper. Thus and the desired bound becomes
Consider a (lazy) random walk starting at . At step , stay with probability and move to right or left by an amount with probability . The terminal point after step is exactly the random variable
Since , the quantity can be bounded from above by the sum of the probability that the lazy random walk with steps of size , …, steps of size ends up on a point with absolute value at most and a negligible term which is much smaller than .
Notice that the coefficient of is the sum of iid copies of . It is well known that the distribution of this sum is roughly uniform on the interval . (By roughly uniform, we mean that for any two integers in this interval, the ratio of their masses is bounded from above by a positive constant.) Thus, the main observation here (and somehow the essence of the theorem) is that the end point of the walk (conditioned on the fact that the coefficient of belongs to ) is roughly uniform in . It follows that the probability that it has absolute value can be bounded from above by , giving the desired bound.
This argument can be made rigorous for random variables with discrete supports, even when is not proper. However, the proof for the general case is more complicated. The main technical tool needed is the following level set estimate:
Given a GAP , a complex number , and , let be the set
We will prove Lemma 6.7 in Sections 7-8 below. For now, let us show how it implies Theorem 6.6.
We abbreviate . In view of (12), it suffices to show that
Covering by balls of radius , it thus suffices to show that
Now we fix . Let be a small positive constant to be determined. It is clear that if is sufficiently large, then
(In fact, can be replaced by for some large constant .) By Lemma 6.7,
We choose equal half of the reciprocal of the hidden constant in . It follows that
To conclude the proof of Theorem 6.6, we need to establish Lemma 6.7. This is the purpose of the next two sections.
Lacunary sets inside GAPs
Let be a set. We shall informally call a sequence of elements of lacunary if the ratio is large for all . The goal of this section is to show that a large GAP always contains a large lacunary subset with some prescribed properties. This fact will be a key ingredient in the proof of Lemma 6.7 (and hence Theorem 6.6), which we give in the next section.
To give the reader some motivation, let us consider the toy case when is an interval, say . Given a ratio and a constant (say), we can easily find elements such that and where satisfies
The main result of this section is a generalization of the above observation for general GAPs.
Let , let be a symmetric GAP of rank , and let be a radius. Then there exists, for some , “primary vectors” , and “secondary vectors” with the following properties:
(Lacunarity) We have for all .
(Secondary bounds) We have and for all .
where is the quantity
The secondary vectors are necessary here because is taking values in the complex numbers; if then we could simply take (and thus ) for all . The reader may wish to follow the argument below in the real case (and for ), as it is somewhat simpler in that case. The bound (16) may seem strange, but it is best possible except for the factor, and we will need such a tight estimate in our applications. The vectors are somewhat analogous to the Minkowski basis of a lattice with respect to a convex body, thus (16) can be viewed as a variant of Minkowski’s second theorem.
By increasing if necessary we may assume to be larger than any given constant depending on . We can also assume that is not contained in , as the claim is obvious otherwise.
We perform the following algorithm. We set for some sufficiently large constant depending only on .
Initialize . We also adopt the convention that .
Let . If then set and STOP. Otherwise, let be chosen such that is maximal; thus , and .
Let be chosen to maximize the quantity defined in (17). Observe that .
From elementary complex geometry we see that is now contained in a rectangle of dimensions . This rectangle can be covered by disks of radius . Applying Lemma 6.4, we conclude that the set
Increment to and return to Step 1.
Since have decreasing magnitude and lie in the finite set we see that this algorithm terminates in finite time. In fact we claim that this algorithm terminates before step . For if the algorithm reaches stage , we have obtained obeying the lacunarity condition . This implies that the GAP is proper, and that the pairwise sums between and are distinct and contained in . But this implies that
Also, since can be covered by balls of radius , we see from Lemma 6.4 that
But from definition of , we see that this is impossible if is chosen sufficiently large (recall we are taking large compared to ). Thus we have , which in particular implies that and lie in . Since is not contained in we also have .
Now we can cover by copies of , and thus
In particular, since and we have
using the definition of and recalling that is large compared to we conclude that . The claim (16) now follows from (20). The remaining claims are easily verified from the construction. ∎
Proof of Lemma 6.7
We are now ready to prove Lemma 6.7. In the following, is fixed and we write instead of . We also fix and allow all implied constants to depend on . We may assume without loss of generality that is large compared with , since the claim is trivial otherwise.
Let ; since is assumed large compared to , we see that is also. We apply Lemma 7.1 (with , and to the GAP ) to obtain vectors
for some such that for all , and for all , and
where is defined in (17). Since has rank , we have
and thus (since and is large compared with )
From (14), Lemma 5.3, and (21) we see that
for all and .
For , define and . Let be the GAP
One should view as a kind of “dual” to . It has the following properties:
.
If are distinct, then and are disjoint.
We first verify (i). If is not proper, then we have a linear relation
for some integers , not all zero, with and for . Let be the largest index such that is non-zero. If , then from the properties of we have
From the triangle inequality we then have
On the other hand, since are integers which are not both zero, and , and , we see that
and so (ii) now follows from (22) (recalling that and is large compared to ).
Now we prove (iii). If , then we see from the triangle inequality that
by lacunarity. But by construction , and the claim follows.
Now we prove (iv). If the claim was false, then we could find distinct and such that . We can then write
for some integers , not all zero, with and .
Let be the largest index such that is non-zero. From (23) and Lemma 5.3 we have
On the other hand, from the triangle inequality we have
If is large enough, then we can apply Lemma 5.3 to conclude from (24) that
if is large enough. On the other hand, by construction of we have
Since is an integer, we conclude . In that case we have
if , by construction of and . Since is an integer, we conclude . On the other hand, if , then we have as well, since . But is non-zero, a contradiction. ∎
From properties (ii), (iii), (iv) we see that
Structure of weak elements
Let be a GAP. Extend by a new dimension generated by a new element ; . We call weak if is only slightly more than . The goal of this section is to quantify (and generalize) the following phenomenon:
The reader may find the following simple example illustrative. Assume that is the interval . Assume that has cardinality at most , where for some small positive .
Consider the interval . The sets are subsets of . Since , these sets are not disjoint. Thus, we have for some distinct and . This implies that
This already gives a bound on the cardinality of the possible . But this bound can be improved further (this improvement is critical later on). Consider the set with . By the same argument as before, these sets are not disjoint, and we can conclude that
for and . If is irreducible, then and the number of ’s of this form is only at most . If it is not, then . The number of satisfying this condition in is at most . Thus, the number of ’s is at most , using the bound on and the fact that . Thus, altogether we obtain the bound
which is much better than the previous bound . The term will play a critical role in later proofs.
The main result of this section is a generalization of this very special case.
Let be complex numbers and . Let be a complex number and a positive integer. Define
Let denote the set of all complex numbers such that
Then has a -net of size at most .
The -net can be replaced by an -net if we replace the bound by . However, it is important to us to have the current formulation, as in the case when the net will have size exactly . The power of might be improvable, but we will not need this improvement here, as will always be relatively small for us compared to other parameters such as and .
Let . By definition of , we have
Let be a maximal -net of , then we see that the sets for cover , and thus
thanks to the easily verified fact that .
Refine the -net to a maximal -net . We have and thus
A simple greedy algorithm argument (using the symmetry of ) shows that we can find a set of cardinality such that for any distinct . Consider the sets for . By the construction, we can verify that
These sets are disjoint (thanks to the definition of and ).
Every set lies in (since and
Each set has cardinality (since is a -net).
Combining this with (25) we conclude that
On the other hand, , so
which asserts that many multiples of are close to .
Let be the smallest radius such that
where is a sufficiently large constant depending on .
Assume, for a moment, that . By the definition of , we can find, for each , an element such that . (If there are many , fix one arbitrarily.) Let and be two different indices, then
This implies that the sets are disjoint. Furthermore, as , they all lie in . Therefore,
But this contradicts (27) if we choose sufficiently large. Thus we have
If , then and has a maximal -net of cardinality and we are done.
From now on, we assume . Thus .
From (26) and the pigeonhole principle we can find such that . Thus there exists an integer such that . Since , we have and thus in fact . Thus, to obtain the desired bound on , it will suffice to show that
Let be any -net of . Observe that the sets for are disjoint and lie in . Thus we have
Since , it suffices to show that
But (as we are working on the plane) we can cover by balls of radius , so by Lemma 6.4 we have
Comparing this with (28) we obtain the claim. ∎
Proof of the inverse theorems
We first prove Theorem 3.2. The proof of Theorem 3.3 can be obtained with some minor modifications.
Let us begin with a simple reduction. Since has -controlled second moment, from Chebyshev’s inequality we see that with probability . Thus if we let be conditioned on the event , we see from the union bound that and differ by at most . Thus (modifying slightly if necessary) we may replace by , and so we may assume for the rest of the proof that
Consider a point in . Let be the vector obtained from by rounding the coordinates to the nearest Gaussian integer multiple of . Clearly thus . Furthermore, by (29)
We are going to find a small -net (in the norm) for the set of all possible satisfying the last inequality. Set , and let be an integer to be chosen later ( will be bounded by a constant.)
Now we perform the following algorithm (following the proof of [27, Theorem 2.4]) to construct some elements in for some .
Initialize . Set .
Count how many there are such that
If this number is less than then STOP. Otherwise, move on to Step 2.
Applying Lemma 4.5(v), we can find some such that
where is obtained from by deleting a set of elements. We then set and then increment to . If then STOP (with an error); otherwise return to Step 1.
By induction, at each stage in this algorithm we have
and hence by Theorem 6.6 and Lemma 4.5(ii)
On the other hand, by construction we have
Thus, the algorithm must terminate in Step 1 for some . At this point, we have obtained a tuple of elements in with such that
for all but at most values of .
Now we have enough information to construct the net. First we show that it costs a relatively small factor to take care of the exceptional coordinates. There are at most exceptional values of ; we can fix the values of the exceptional by paying a factor of
For each exceptional , is a Gaussian integer multiple of of magnitude . Thus, the number of possible choices for is . So, after we fix the exceptional coordinates , there are at most
ways to specify the values of these coordinates.
As for the remaining (non-exceptional) coordinates , Lemma 9.1 (along with (30), the definition of , and the bound ) shows that each such lies within distance of a set of cardinality . The set of all vectors has a -net in the norm of size at most
assuming sufficiently large depending on .
Changing a -net to a -net costs only a factor. Thus, we can conclude that there is an -net of size at most . As we can choose arbitrarily small, the proof of Theorem 3.2 is complete.
To prove Theorem 3.3, we just use the sparse version of all lemmas used in the previous proof, except that we take equal to rather than . The starting point is
Instead of , we will consider . Thus, the gain from Lemma 9.1 is no longer (which used to lead to the term in the final bound), but instead (which leads to the term ). Meanwhile, the factor is replaced with . The reader is invited to work out the simple details.
Proof of Theorems 2.5 and 2.9
Theorem 2.5 follows from the following. Let denote the least singular value of a matrix of order . We shall abbreviate .
with probability one.
with probability one.
Indeed, as has finite second moment, we can assume that , at the cost of a (negligible) additional term in probability. Thus, by restricting to the event and using the assumption about in Theorem 2.5, we can satisfy both assumptions in Theorem 11.1, for large enough.
We can have a more efficient form of the theorem by bounding the probability that the two assumptions on and fail (rather than assuming that they hold with probability one). The relation between and can be strengthened and we will do that in another paper.
We now prove Theorem 11.1. We suppress all dependence of the implied constants on .
Let us call a unit vector poor if we have
and rich otherwise. Theorem 11.1 follows directly from the following two lemmas and the fact that
(Proof of Lemma 11.3) We repeat the argument from . Let be the event that for some poor unit vector . If holds, then the least singular value of is at most , and so the same is true for the adjoint . Thus there exists a row vector such that
Write . By paying a factor of and using symmetry we may assume that the last coefficient of has the largest magnitude, thus
Thus, if we let be the event that there exists a unit vector obeying both (32) and (33), we have
Let be the rows of . We shall condition on the first rows . Observe that if holds, then there exists a poor unit vector such that
Thus, if is non-zero, then there exists a poor unit vector such that
On the other hand, if holds, and is as above, then by (32)
taking inner products with the unit vector and using the triangle inequality, we conclude
Using (34), Cauchy-Schwarz, and (36) we conclude
On the other hand, since is poor, and is independent of (and hence independent of also), we have
Putting all this together, we conclude that
uniformly in the choice of . Integrating over and using (35) we obtain , as desired. ∎
(Proof of Lemma 11.4) Let be a sufficiently small constant (in particular, smaller than the constant in Theorem 3.2); we allow all implied constants to depend on . We may also assume that is sufficiently large depending on .
Let be the smallest integer strictly larger than , thus . Thus, if we set , then (using (31)) we have and . If is sufficiently small, we thus have
Let be a rich unit vector. For , consider the quantities
These quantities are increasing in , and range between and since is rich. Applying the pigeonhole principle and using the definition of , we can thus find a positive such that
Define, for any and , the set as
Since the number of pairs is , it suffices by the union bound to show that for each fixed
In fact, we are going to show that this probability is exponentially small.
Let . In the notation of Theorem 3.2, lies in . Thus by this theorem, there is a set of cardinality at most
such that for each there is such that .
Consider and as above. Recall that almost surely. Thus with probability we have
As usual, let be the th row of . It follows that there are at least coordinates such that
Now we relate the probability that with , where . Consider the quantity
Notice that and also with probability one. Thus
which implies, through the triangle inequality, that
where in the last inequality we used (38).
where in the last inequality we used the definition of .
Also, a very crude second moment argument, using the fact that has -controlled second moment, gives
if is small enough depending on . Thus
Again by the union bound, the left-hand side of (39) is at most
It is routine to verify that the last quantity is (indeed, we obtain a bound of the form for some ). Our proof is complete. ∎
Now we sketch the proof of Theorem 2.9. We repeat the above arguments with the following changes. We will of course replace Theorem 3.2 by Theorem 3.3, with . Due to the presence of the additional factors of in that theorem, we can no longer afford to choose close to , so we instead choose to be very small, say , where is very small compared to . In order to take this small, we will need to be much larger than what (31) requires, but this is not a problem. For our applications, all we need is that does not depend on .
The treatment of the poor vectors (Lemma 11.3) in the sparse case is the same as in the non-sparse case. The treatment of the rich vectors (Lemma 11.4) is also essentially the same, except for the fact that we no longer have (40). To be more precise, needs to be replaced by . In the cases when is larger than some absolute constant (say 5), (40) is not needed, since in this case is sufficiently small and
and the above argument goes through without difficulty, so long as one applies Theorem 3.3 with for some sufficiently large absolute constant .
In the remaining case where is at most 5, the replacement of by becomes too expensive and we will avoid it by a rescaling argument, using the pigeonhole principle.
To start, notice that from the definition of and the fact , we have
for some fixed . Since the left-hand side is , we also see from Lemma 4.4 that
We observe that this implies that is “compressed” in the sense that at most of the coefficients of can exceed in magnitude. (Of course, instead of , on can use any large constant.) Indeed, if instead we had at least coefficients of magnitude at least for some large absolute constant , we see from Lemma 4.5 that
for one of these , but one can show that this is not the case by a direct computation using the -controlled second moment hypothesis, or else by an appeal to Theorem 6.6. (Here we used the notation of Lemma 4.5: denotes a vector of length whose every coordinate equals .)
Next, we apply the pigeonhole principle to conclude the existence of a with and an integer with
By paying a factor of in our final probability bound we may fix and . If we define the vector by setting to be the nearest (Gaussian integer) multiple of to , we see that is non-zero for at most coordinates , and has magnitude for at least of these coordinates. Also, if , we see from the triangle inequality and crude computations that (say), recalling that .
On the other hand, note that if we let be independent samples of , then with probability , there is at least one with and . From this we conclude that
for some absolute constant (cf. (40)), and thus for each fixed we have
On the other hand, a direct counting argument shows that the number of possible is at most . Recall that and is much larger than . It follows that
for any . Applying the union bound we obtain a suitably small contribution to the sparse analogue of Lemma 11.4, as required.
Proof of the circular law
We now use Theorem 2.5 to derive Theorem 1.2.
By Lemma 2.4 and rotating by a constant phase if necessaryHere of course we use the obvious fact that the circular law is invariant under phase rotation of the underlying random variable ., we may assume that has -controlled second moment for some . Allowing implied constants to depend on this , we thus have that has -controlled second moment, which will allow us to apply Theorem 2.5 later.
We closely follow the (now standard) argumentsOne could also follow the approach of Götze and Tikhomirov , as was done in . in [2, Chapter 10] (which are in turn based on the earlier work of Girko and Bai ), which we briefly review here.
Let be the characteristic function
of the uniform measure on the disk. The sequence of empirical measures can be shown to be a.s. tight just from the assumption that has finite second moment (see [2, Lemma 10.5] and [2, Theorem 3.6], and also the discussion in [2, p. 295]), and so by standard arguments it suffices to show, for almost every , that
Henceforth we fix . We can take since we only need the claim for almost every . From [2, Lemma 10.2], we have the Stieltjes transform identity (first observed by Girko )
and is the empirical distribution of the positive-definite Hermitian matrix
The expression is absolutely integrable in , however because of the unboundedness of , Fubini’s theorem is not currently applicable, and one must take some care with interchanging integrals or derivatives in this expression. In [2, Lemma 10.4], the analogous identity
is derived, where is a function whose explicit form (given in [2, p. 296]) we will not review here. The task is then to show that
The next steps in are to perform some truncations in the region of integration. Let be any integer. In [2, Lemma 10.6] (see also the discussion in [2, p. 299]), it was shown that
Fix . For any , let denote the set
(recall that ). In [2, Lemma 10.7] it is shown that
and similarly with replaced by ; thus it suffices to show that
where is an explicit probability measure which we will not review here; in particular, the inner integral is absolutely convergent. Set for some large absolute constant (independent of ) to be chosen later. Using the integration by parts argument given in [2, §10.7], it suffices to show that
and similarly with the two-dimensional integral on replaced by one-dimensional integrals on the boundary of . We shall only estimate the two-dimensional integrals, as the treatment of the one-dimensional ones are similarActually, by employing a smooth cutoff to rather than a rough one, one can dispense with the need to consider boundary integrals..
We first prove (44). Since has finite second moment, a simple application of Chebyshev’s inequality and the Borel-Cantelli lemma, and crude bounds on the spectral norm of shows that almost surely is supported on the interval . Thus it suffices to show that
Observe that has total variation bounded by a finite multiple of on the region of integration, thus it will suffice to show that
For this, it is convenient to perform some truncation, following [2, §10.5.1]. Let be arbitrary, and define the truncated random variables (depending on ) by
almost surely for some absolute constant , where denotes the Levi distance. Next, from [2, Lemma 10.15] we have
almost surely for another absolute constant (this is where we use the hypothesis ). Applying [2, Lemma 12.18] we conclude
and hence by the triangle inequality for Levi distance
for some . Applying [2, Lemma 12.18] and [2, Lemma 10.8] we obtain
for some , which yields (46) (with some room to spare). This proves (44).
The only remaining task is to prove (45). We would like to reduce matters to establishing that almost surely we have
for almost every . The Lebesgue dominated convergence theorem does not apply directly. However, observe from the triangle inequality in that
since is locally square-integrable. From bounds on (e.g. [2, Lemma 10.8] and the estimates used to prove [2, Lemma 10.10]), we also have
is bounded uniformly in , which implies that the sequence of functions is uniformly integrable on .
Now we can deduce (45) from (49). To see this, let be a large parameter, and let be the set of all such that . From (49) and the Lebesgue dominated convergence theorem we have
On the other hand, from the uniform boundedness of (50) we see that
Adding these two estimates, and then letting , we obtain (45).
It remains to prove (49). By Fubini’s theorem, it suffices to show for every that (49) holds almost surely. But observe that the integrand in (49) vanishes whenever has least singular value at least . By Theorem 2.5, this holds with probability at least , if is sufficiently large. The claim then follows from the Borel-Cantelli lemma. This concludes the proof of Theorem 1.2.
Relaxation of the moment condition
We observe that the bound (46) was established with some room to spare. In fact, the arguments in allow one to relax the condition to the slightly weaker condition
for any sufficiently large constant . By inspecting the arguments in , we see that any will work. Perhaps a better constant can be obtained by tightening some calculations, but we do not try to pursue this direction. It seems to us that in the current approach, the extra log term cannot be removed completely in order to establish the full conjecture.
Using the moment method, we obtain the bound
for any integer . If we choose for some sufficiently large absolute constant , then the factor becomes .
To conclude the argument, it suffices to show that
This type of bound was established for bounded in [2, Lemma 10.11] using the moment method. But it is well-known that the method extends to much higher value of , in particular . Indeed, the left-hand side of (52) can be expanded as
Thus the total contribution to (53) can be bounded by
The last () term (which is the dominating term) is of order , which is acceptable. As for the terms, we can bound their contribution crudely by
using the definition of and the fact that is small. Thus, this contribution is negligible compared to the main term. This proves (52), and completes the derivation of the circular law under the hypothesis (51).
Rate of Convergence
Let us return to the original hypothesis of bounded moment for some fixed . The above arguments can be pursued in more detail to obtain the more quantitative result that with probability , we have
for some depending on , and all sufficiently large .
A full exposition of this improvement would be very tedious, so we only give a brief sketch of how the argument proceeds. We first make some Fourier-analytic reductions, analogous to the proof of Weyl’s equidistribution theorem, to reduce matters to controlling the characteristic function .
Applying the Kolmogorov law of large numbers, we conclude that with probability
Let be a bump function adapted to the ball , and let be the Fourier series
This is an approximation to the identity, and one can then verify the pointwise bounds
Taking Fourier transforms and using the triangle inequality, we can bound the left-hand side by
Thus it will suffice (by the union bound and the Borel-Cantelli lemma) to show that for any fixed with , one has
To prove (55) one repeats the proof of (43), which requires going through all the relevant arguments in and noting that all the almost sure convergence results can be replaced instead with more quantitative polynomial convergence results (similar to (55)). We perform only one of these steps in detail, namely the proof of the quantitative analogue of (45),
Inspecting the proof of (49), we see that for each fixed , vanishes with probability . By Fubini’s theorem and Markov’s inequality, we thus see that with probability , the set has measure at most . Since (50) is bounded uniformly in , the claim now follows from the Cauchy-Schwarz inequality.
It is quite likely that one can make the convergence even more quantitative, establishing a bound of the form
for all ; note that the claim (54) is a corollary of this bound and the Borel-Cantelli lemma. This requires replacing the Kolmogorov law of large numbers with a more quantitative law of large numbers which takes advantage of the fact that the random variable does not merely have finite first moment, but in fact has finite moment. We omit the details.
The sparse case
In this section we sketch how one can modify the arguments in Section 12 to obtain the circular law for sparse matrices (i.e. Theorem 1.3). The proof shall be a modification of thatIt is also likely that the arguments in (see also ) could also be adapted to handle this case, at least if one assumes additional moment conditions on , since the lower bound required in that paper was only needed to obtain an analogue of Theorem 2.9. of Theorem 1.2. In that theorem, one first needed the convergence
which was a consequence of the Kolmogorov law of large numbers, in order to obtain tightness of the . In the sparse case, the analogous convergence result one needs is
But one easily computes that with probability , is equal to for values of , and so this claim also follows from the Kolmogorov law of large numbers.
We are greatly indebted to Manjunath Krishnapur for pointing out the connection between the least singular value bounds and the circular law problem and for many useful discussions, and to Dmitry Timushev for corrections. We also thank P. Wood and an anonymous referee for their careful reading. The first author also thanks Mark Rudelson for some helpful conversations. The first author is supported by a grant from the Macarthur Foundation and by NSF grant CCF-0649473. The second author is supported by NSF Grant 06355606.