Random matrices: Law of the determinant

Hoi H. Nguyen, Van Vu

Introduction

Let AnA_{n} be an nn by nn random matrix whose entries aij,1i,jna_{ij},1\leq i,j\leq n, are independent real random variables of zero mean and unit variance. We will refer to the entries aija_{ij} as the atom variables.

As determinant is one of the most fundamental matrix functions, it is a basic problem in the theory of random matrices to study the distribution of detAn\det A_{n} and indeed this study has a long and rich history. The earliest paper we find on the subject is a paper of Szekeres and Turán SzT from 1937, in which they studied an extremal problem. In the 1950s, there is a series of papers FT , NRR , Turan , Pre devoted to the computation of moments of fixed orders of detAn\det A_{n} (see also Gbook ). The explicit formula for higher moments gets very complicated and is in general not available, except in the case when the atom variables have some special distribution (see, e.g., Dembo ).

One can use the estimate for the moments and Markov inequality to obtain an upper bound on detAn|\det A_{n}|. However, no lower bound was known for a long time. In particular, Erdős asked whether detAn\det A_{n} is nonzero with probability tending to one. In 1967, Komlós Kom , Kom1 addressed this question, proving that almost surely detAn>0|\det A_{n}|>0 for random Bernoulli matrices (where the atom variables are i.i.d. Bernoulli, taking values ±1\pm 1 with probability 1/21/2). His method also works for much more general models. Following Kom , the upper bound on the probability that detAn=0\det A_{n}=0 has been improved in KKS , TVdet , TVsing , BVW . However, these results do not say much about the value of detAn|\det A_{n}| itself.

In a recent paper TVdet , Tao and the second author proved that for Bernoulli random matrices, with probability tending to one (as nn tends to infinity),

for any function ω(n)\omega(n) tending to infinity with nn. This shows that almost surely, logdetAn\log|\det A_{n}| is (12+o(1))nlogn(\frac{1}{2}+o(1))n\log n, but does not provide any distributional information. For related works concerning other models of random matrices, we refer to Ro .

In Goodman , Goodman considered random Gaussian matrices where the atom variables are i.i.d. standard Gaussian variables. He noticed that in this case the determinant is a product of independent Chi-square variables. Therefore, its logarithm is the sum of independent variables and, thus, one expects a central limit theorem to hold. In fact, using properties of Chi-square distribution, it is not very hard to prove

We refer the reader to RW , Section 4, for further discussion on this model.

In G2 , Girko stated that (2) holds for general random matrices under the additional assumption that the fourth moment of the atom variables is 3. Twenty years later, he claimed a much stronger result which replaced the above assumption by the assumption that the atom variables have bounded (4+δ)(4+\delta)th moment G . However, there are points which are not clear in these papers and we have not found any researcher who can explain the whole proof to us. In our own attempt, we could not pass the proof of Theorem 2 in G . In particular, definition (3.7) of this paper requires the matrix \Xi\bigl{(}{1\atop k}\bigr{)} to be invertible, but this assumption can easily fail.

In this paper, we provide a transparent proof for the central limit theorem of the log-determinant. The next question to consider, naturally, is the rate of convergence. We are able to obtain a rate which we believe to be near optimal.

We say that a random variable ξ\xi satisfies condition C0 (with positive constants C1,C2C_{1},C_{2}) if

Assume that all atom variables aija_{ij} satisfy condition C0 with some positive constants C1,C2C_{1},C_{2}. Then

Here and later, Φ(x)=P(N(0,1)<x)=12πxexp(t2/2)dt\Phi(x)={\mathbf{P}}({\mathbf{N}}(0,1)<x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}\exp(-t^{2}/2)\,dt. In the remaining part of the paper, we will actually prove the following equivalent form:

The reader is invited to consult Figure 1 for our simulation. To give some feeling about (5), let us consider the case when aija_{ij} are i.i.d. standard Gaussian. For 0in10\leq i\leq n-1, let ViV_{i} be the subspace generated by the first ii rows of AnA_{n}. Let Δi+1\Delta_{i+1} denote the distance from ai+1\mathbf{a}_{i+1} to ViV_{i}, where ai+1=(ai+1,1,,ai+1,n)\mathbf{a}_{i+1}=(a_{i+1,1},\ldots,a_{i+1,n}) is the (i+1)(i+1)th row vector of AnA_{n}. Then, by the “base times height” formula, we have

As the aija_{ij} are i.i.d. standard Gaussian, Δi+12\Delta_{i+1}^{2} are independent Chi-square random variables of degree nin-i. Thus, the right-hand side of (7) is a sum of independent random variables. Notice that Δi+12\Delta^{2}_{i+1} has mean nin-i and variance O(ni)O(n-i) and is very strongly concentrated. Thus, with high probability logΔi+12\log\Delta_{i+1}^{2} is roughly log((ni)+O(ni))\log((n-i)+O(\sqrt{n-i})) and so it is easy to show that logΔi+12\log\Delta_{i+1}^{2} has mean close to log(ni)\log(n-i) and variance O(1ni)O(\frac{1}{n-i}). So the variance of i=0n1logΔi+12\sum_{i=0}^{n-1}\log\Delta_{i+1}^{2} is O(logn)O(\log n). To get the precise value 2logn\sqrt{2\log n}, one needs to carry out some careful (but rather routine) calculation, which we leave as an exercise.

The reason for which we think that the rate log1/3+o(1)n\log^{-1/3+o(1)}n might be near optimal is that (as the reader will see though the proofs) 2logn2\log n is only an asymptotic value of the variance of logdetAn\log|\det A_{n}|. This approximation has an error term of order at least Ω(1)\Omega(1) and since 2logn+Ω(1)\sqrt{2\log n+\Omega(1)} 2logn=Ω(log1/2n)-\sqrt{2\log n}=\Omega(\log^{-1/2}n), it seems that one cannot have rate of convergence better than log1/2+o(1)n\log^{-1/2+o(1)}n. It is a quite interesting question whether one can obtain a polynomial rate by replacing log(n1)!\log(n-1)! and 2logn2\log n by other, relatively simple, functions of nn.

Our arguments rely on recent developments in random matrix theory and look quite different from those in Girko’s papers. In particular, we benefit from the arguments developed in TVdet , TVhard , TVlocal . We also use Talagrand’s famous concentration inequality frequently to obtain most of the large deviation results needed in this paper.

Our approach and main lemmas

We first make two extra assumptions about AnA_{n}. We assume that the entries aija_{ij} are bounded in absolute value by logβn\log^{\beta}n for some constant β>0\beta>0 and AnA_{n} has full rank with probability one. We will prove Theorem 1.1 under these two extra assumptions. In Appendix, we will explain why we can implement these assumptions without violating the generality of Theorem 1.1.

Assume that all atom variables aija_{ij} satisfy condition C0 and are bounded in absolute value by logβn\log^{\beta}n for some constant β\beta. Assume furthermore that AnA_{n} has full rank with probability one. Then

In the first, and main, step of the proof, we prove the claim of Theorem 2.1 but with the last logαn\log^{\alpha}n rows being replaced by Gaussian rows (for some properly chosen constant α\alpha). We remark that the replacement trick was also used in G , but for an entirely different reason. Our reason here is that for the last few rows, Lemma 2.4 is not very effective.

For any constant β>1\beta>1 the following holds for any sufficiently large constant α>0\alpha>0. Let AnA_{n} be an nn by nn matrix whose entries aij,1in0,1jna_{ij},1\leq i\leq n_{0},1\leq j\leq n, are independent real random variables of zero mean, unit variance and absolute values at most logβn\log^{\beta}n. Assume furthermore that AnA_{n} has full rank with probability one and the components of the last logαn\log^{\alpha}n rows of AA are independent standard Gaussian random variables. Then

In the second (and simpler) step of the proof, we carry out a replacement procedure, replacing the Gaussian rows by the original rows one at a time,j and show that the replacement does not effect the central limit theorem. This step is motivated by the Lindeberg replacement method used in TVlocal .

We present the verification of Theorem 2.1 using Theorem 2.2 in Section 8. In the rest of this section, we focus on the proof of Theorem 2.2.

Notice that in the setting of this theorem, the variables Δi\Delta_{i} are no longer independent. However, with some work, we can make the RHS of (7) into a sum of martingale differences plus a negligible error, which lays ground for an application of a central limit theorem of martingales. (In G , Girko also used the CLT for martingales via the base times height formula, but his analysis looks very different from ours.) We are going to use the following theorem, due to Machkouri and Ouchti MO .

There exists an absolute constant LL such that the following holds. Assume that X1,,XmX_{1},\ldots,X_{m} are martingale differences with respect to the nested σ\sigma-algebras E0,E1,,Em1\mathcal{E}_{0},\mathcal{E}_{1},\ldots,\mathcal{E}_{m-1}. Let vm2:=i=0m1E(Xi+12Ei)v_{m}^{2}:=\sum_{i=0}^{m-1}{\mathbf{E}}(X_{i+1}^{2}|\mathcal{E}_{i}), and sm2:=i=1mE(Xi2)s_{m}^{2}:=\sum_{i=1}^{m}{\mathbf{E}}(X_{i}^{2}). Assume that E(Xi+13Ei)γiE(Xi+12Ei){\mathbf{E}}(|X_{i+1}^{3}||\mathcal{E}_{i})\leq\gamma_{i}{\mathbf{E}}(X_{i+1}^{2}|\mathcal{E}_{i}) with probability one for all ii, where (γi)1m(\gamma_{i})_{1}^{m} is a sequence of positive real numbers. Then we have

To make use of this theorem, we need some preparation. Conditioning on the first ii rows a1,,ai\mathbf{a}_{1},\ldots,\mathbf{a}_{i}, we can view Δi+1\Delta_{i+1} as the distance from a random vector to Vi:=Span(a1,,ai)V_{i}:=\operatorname{Span}(\mathbf{a}_{1},\ldots,\mathbf{a}_{i}). Since AnA_{n} has full rank with probability one, dimVi=i\dim V_{i}=i with probability one for all ii. The following is a direct corollary of TVlocal , Lemma 43.

For any constant β>0\beta>0 there is a constant C3>0C_{3}>0 depending on β\beta such that the following holds. Assume that VRnV\subset{\mathbf{R}}^{n} is a subspace of dimension dim(V)n4\dim(V)\leq n-4. Let a\mathbf{a} be a random vector whose components are independent variables of zero mean and unit variance and absolute values at most logβn\log^{\beta}n. Denote by Δ\Delta the distance from a\mathbf{a} to VV. Then we have

where α\alpha is a sufficiently large constant (which may depend on β\beta). We will use shorthand kik_{i} to denote nin-i, the co-dimension of ViV_{i} (and the expected value of Δi2\Delta_{i}^{2}),

We next consider each term of the right-hand side of (7) where 0i<n00\leq i<n_{0}. Using the Taylor expansion, we write

By applying Lemma 2.4 with t=ki1/8logα/8nt=k_{i}^{1/8}\geq\log^{\alpha/8}n and by choosing α\alpha sufficiently large, we have with probability at least 1O(exp(log2n))1-O(\exp(-\log^{2}n)) [the probability here is with respect to the random (i+1)(i+1)th row, fixing the first ii rows arbitrarily]

Thus, with probability at least 1O(exp(log2n))1-O(\exp(-\log^{2}n))

Hence, by a uniform bound, the following holds with probability at least 1nO(exp(log2n))=1O(exp(log2n/2))1-n\cdot O(\exp(-\log^{2}n))=1-O(\exp(-\log^{2}n/2)):

again by having α\alpha sufficiently large.

With probability at least 1O(exp(log2n/2))1-O(\exp(-\log^{2}n/2))

Theorem 2.2 follows from the above four lemmas and the following trivial fact (used repeatedly and with proper scaling):

The reader is invited to fill in the simple details using the following observation:

We will prove Lemma 2.6 using Theorem 2.3. Lemma 2.7 will be verified by the moment method and Lemma 2.8 by elementary properties of Chi-square variables. The key to the proof of Lemmas 2.6 and 2.7 is an estimate on the entries of the projection matrix onto the space ViV_{i}^{\bot}, presented in Section 4.

Proof of Lemmas 2.6 and 2.7: Opening

We recall from the previous section that Xi+1=Δi+12kikiX_{i+1}=\frac{\Delta_{i+1}^{2}-k_{i}}{k_{i}}. Denote by Pi=(pst(i))s,tP_{i}=(p_{st}(i))_{s,t} the projection matrix onto the orthogonal complement ViV_{i}^{\bot}. A standard fact in linear algebra is

where a1=ai+1,1,,an=ai+1,na_{1}=a_{i+1,1},\ldots,a_{n}=a_{i+1,n} are the coordinates of the vector ai+1\mathbf{a}_{i+1} and

By (11) we have sqss(i)=1\sum_{s}q_{ss}(i)=1 and s,tqst(i)2=1ki.\sum_{s,t}q_{st}(i)^{2}=\frac{1}{k_{i}}.

Because Eas=0{\mathbf{E}}a_{s}=0 and Eas2=1{\mathbf{E}}a_{s}^{2}=1, and the asa_{s} are mutually independent, we can show by using a routine calculation that [see (6) from Section 6]

where Ei\mathcal{E}_{i} is the σ\sigma-algebra generated by the first ii rows of AnA_{n}.

The reason we split Xi+122+1ki-\frac{X_{i+1}^{2}}{2}+\frac{1}{k_{i}} into the sum of Yi+1Y_{i+1} and Zi+1Z_{i+1} is that E(Yi+1Ei)=0{\mathbf{E}}(Y_{i+1}|\mathcal{E}_{i})=0 and its variance can be easily computed.

To complete the proof of Lemma 2.7 from Lemma 3.1, it suffices to show that the sum of the ZiZ_{i} is negligible,

Our main technical tool will be the following lemma.

Noticing that Eas4{\mathbf{E}}a_{s}^{4} is uniformly bounded (by condition C0), it follows that with probability 1O(n100)1-O(n^{-100}),

Proof of Lemmas 2.6 and 2.7: Mid game

The key idea for proving Lemma 3.2 is to establish a good upper bound for qss(i)|q_{ss}(i)|. For this, we need some new tools. Our main ingredient is the following delocalization result, which is a variant of a result from TVhard (see also E and TVsurvey for recent surveys), asserting that with high probability all unit vectors in the orthogonal complement of a random subspace with high dimension have small infinity norm.

For any constant β>0\beta>0 the following holds for all sufficiently large constant α>0\alpha>0. Assume that the components of a1,,an1\mathbf{a}_{1},\ldots,\mathbf{a}_{n_{1}}, where n1:=nnlog4αnn_{1}:=n-n\log^{-4\alpha}n, are independent random variables of mean zero, variance one and bounded in absolute value by logβn\log^{\beta}n. Then with probability 1O(n100)1-O(n^{-100}), the following holds for all unit vectors v\mathbf{v} of the space Vn1V_{n_{1}}^{\bot}:

Proof of Lemma 3.2 assuming Lemma 4.1 Write

Note that as qst(i)=pst(i)/kiq_{st}(i)=p_{st}(i)/k_{i},

for some unit vector vVi\mathbf{v}\in V_{i}^{\bot}.

Thus, if i>n1i>n_{1}, then ViVn1V_{i}^{\bot}\subset V_{n_{1}}^{\bot} and, hence, by Lemma 4.1

We now focus on the infinity norm of v\mathbf{v} and follow an argument from TVhard .

Proof of Lemma 4.1 By the union bound, it suffices to show that v1=O(log2αn)|v_{1}|=O(\log^{-2\alpha}n) with probability at least 1O(n101)1-O(n^{-101}), where v1v_{1} is the first coordinate of v\mathbf{v}.

Let BB be the matrix formed by the first n1n_{1} rows a1,,an1\mathbf{a}_{1},\ldots,\mathbf{a}_{n_{1}} of AA. Assume that vVn1\mathbf{v}\in V_{n_{1}}^{\bot} is a unit vector, then

Let w\mathbf{w} be the first column of BB, and BB^{\prime} be the matrix obtained by deleting w\mathbf{w} from BB. Clearly,

where v\mathbf{v}^{\prime} is the vector obtained from v\mathbf{v} by deleting v1v_{1}.

We next invoke the following result, which is a variant of TVhard , Lemma 4.1. This lemma was proved using a method of Guionet and Zeitouni GZ , based on Talagrand’s inequality.

For any constant β>0\beta>0 the following holds for all sufficiently large constant α>0\alpha>0. Let AnA_{n} be a random matrix of size nn by nn, where the entries aija_{ij} are independent random variables of mean zero, variance one and bounded in absolute value by logβn\log^{\beta}n. Then for any n/logαnkn/2n/\log^{\alpha}n\leq k\leq n/2, there exist 2k2k singular values of AnA_{n} in the interval [0,ck/n][0,ck/\sqrt{n}], for some absolute constant cc, with probability at least 1O(n101)1-O(n^{-101}).

We can prove Lemma 4.2 by following the arguments in TVhard , Lemma 4.1, almost word by word.

By the interlacing law and Lemma 4.2, we conclude that BB^{\prime} has nn1n-n_{1} singular values in the interval [0,c(nn1)/n][0,c(n-n_{1})/\sqrt{n}] with probability 1O(n101)1-O(n^{-101}).

Let HH be the space spanned by the left singular vectors of these singular values, and let π\pi be the orthogonal projection onto HH. By definition, the spectral norm of πB\pi B^{\prime} is bounded,

here we used the fact that w\mathbf{w} is independent from BB^{\prime}, and thus from π\pi.

On the other hand, since the dimension of HH is nn1n-n_{1}, Lemma 2.4 implies that πwnn1/2\|\pi\mathbf{w}\|\geq\sqrt{n-n_{1}}/2 with probability 14exp((nn1)/16)=1O(nω(1))1-4\exp(-(n-n_{1})/16)=1-O(n^{-\omega(1)}).

Proof of Lemma 2.6: End game

Recall from (10) that conditioned on any first ii rows, Xi=O(ki3/8)|X_{i}|=O(k_{i}^{-3/8}) with probability 1O(exp(log2n/2))1-O(\exp(-\log^{2}n/2)). So, by paying an extra term of O(exp(log2n/2))O(\exp(-\log^{2}n/2)) in probability, it suffices to justify Lemma 2.6 for the sequence Xi:=XiIXi=O(ki3/8)X_{i}^{\prime}:=X_{i}\cdot\mathbf{I}_{|X_{i}|=O(k_{i}^{-3/8})}.

On the other hand, the sequence Xi+1X_{i+1}^{\prime} is not a martingale difference sequence, so we slightly modify Xi+1X_{i+1}^{\prime} to Xi+1:=Xi+1E(Xi+1Ei)X_{i+1}^{\prime\prime}:=X_{i+1}^{\prime}-{\mathbf{E}}(X_{i+1}^{\prime}|\mathcal{E}_{i}) and prove the claim for the sequence Xi+1X_{i+1}^{\prime}, here we recall that Ei\mathcal{E}_{i} is the σ\sigma-algebra generated by the first ii rows of AnA_{n}. In order to show that this modification has no effect whatsoever, we first demonstrate that E(Xi+1Ei){\mathbf{E}}(X_{i+1}^{\prime}|\mathcal{E}_{i}) is extremely small.

Recall from (12) that Xi+1=s,tqst(i)asat1X_{i+1}=\sum_{s,t}q_{st}(i)a_{s}a_{t}-1. By the Cauchy–Schwarz inequality and the assumption that asa_{s} are bounded in absolute value by logO(1)n\log^{O(1)}n, we have with probability one

To justify Lemma 2.6 for the sequence Xi+1X_{i+1}^{\prime\prime}, we apply Theorem 2.3.

The key point here is that thanks to the indicator function in the definition of Xi+1X_{i+1}^{\prime} and the fact that the difference between Xi+1X_{i+1}^{\prime\prime} and Xi+1X_{i+1}^{\prime} is negligible, Xi+1X_{i+1}^{\prime\prime} is bounded by O(ki3/8)O(k_{i}^{-3/8}) with probability one, so the conditions E(Xi+13Ei)γiE(Xi+12Ei){\mathbf{E}}(|X_{i+1}^{\prime\prime}|^{3}|\mathcal{E}_{i})\leq\gamma_{i}{\mathbf{E}}({X_{i+1}^{\prime\prime}}^{2}|\mathcal{E}_{i}) in Theorem 2.3 are satisfied with

We need to estimate sn0,vn0s_{n_{0}},v_{n_{0}} with respect to the sequence Xi+1X_{i+1}^{\prime\prime}. However, thanks to the observations above, Xi+1X_{i+1} and Xi+1X_{i+1}^{\prime\prime} are very close, and so it suffices to compute these values with respect to the sequence Xi+1X_{i+1}.

Also, recall from Section 4 that with probability 1O(n100)1-O(n^{-100}),

This bound, together with (13) and (5), imply that with probability one

which in turn implies that vn02=2logn+O(loglogn)v_{n_{0}}^{2}=2\log n+O(\log\log n) with probability 1O(n100)1-O(n^{-100}).

Using (5) again, because n100n2logO(1)n=o(1)n^{-100}n^{2}\log^{O(1)}n=o(1), we deduce that

With another application of (5), we obtain

By the conclusion of Theorem 2.3 and setting α\alpha sufficiently large, we conclude

Proof of Lemma 2.7: End game

Our goal is to justify Lemma 3.1, which together with (14) verify Lemma 2.7.

We will show that the variance Var(i<n0Yi+1)\operatorname{Var}(\sum_{i<n_{0}}Y_{i+1}) is small and then use Chebyshev’s inequality. The proof is based on a series of routine, but somewhat tedious calculations. We first show that the expectations of the Yi+1Y_{i+1}’s are zero, and so are the covariances E(Yi+1Yj+1){\mathbf{E}}(Y_{i+1}Y_{j+1}) by an elementary manipulation. The variances Var(Yi+1)\operatorname{Var}(Y_{i+1}) will be bounded from above by the Cauchy–Schwarz inequality.

We start with the formula Xi+12=(s,tqst(i)asat)22s,tqst(i)asat+1X_{i+1}^{2}=(\sum_{s,t}q_{st}(i)a_{s}a_{t})^{2}-2\sum_{s,t}q_{st}(i)a_{s}a_{t}+1. Observe that

Expanding each term, using the fact that sqss(i)=1\sum_{s}q_{ss}(i)=1 and s,tqst(i)2=1ki\sum_{s,t}q_{st}(i)^{2}=\frac{1}{k_{i}}, we have

As Eas=0,Eas2=1{\mathbf{E}}a_{s}=0,{\mathbf{E}}a_{s}^{2}=1, and the asa_{s}’s are mutually independent with each other and with every row of index at most ii [and in particular with qst(i)q_{st}(i)’s], every term in the last formula is zero, and so we infer that E(Yi+1)=0{\mathbf{E}}(Y_{i+1})=0 and E(Yi+1Ei)=0{\mathbf{E}}(Y_{i+1}|\mathcal{E}_{i})=0, confirming (13). With the same reasoning, we can also infer that the covariance E(Yi+1Yj+1)=0{\mathbf{E}}(Y_{i+1}Y_{j+1})=0 for all j<ij<i.

It is thus enough to work with the diagonal terms Var(Yi+1)\operatorname{Var}(Y_{i+1}). We have

After a series of cancellations, and because of condition C0, we have

where the first two rows consist of the squares of the terms appearing in Yi+1Y_{i+1} (after deleting several sums of zero expected value), and each of the following rows was obtained by expanding the product of each term with the rest in the order of their appearance.

Because s,tqst(i)2=1ki\sum_{s,t}q_{st}(i)^{2}=\frac{1}{k_{i}}, one has maxs,tqst(i)1ki\max_{s,t}|q_{st}(i)|\leq\frac{1}{\sqrt{k_{i}}} for all s,ts,t. Recall furthermore that sqss(i)=1\sum_{s}q_{ss}(i)=1 and 0qss(i)0\leq q_{ss}(i) for all ss. We next estimate the terms under consideration one by one as follows.

First, the sums sqss3(i),sqss(i)4\sum_{s}q_{ss}^{3}(i),\sum_{s}q_{ss}(i)^{4}, s,tqss(i)qst(i)2,s,tqss(i)2qst(i)2\sum_{s,t}q_{ss}(i)q_{st}(i)^{2},\sum_{s,t}q_{ss}(i)^{2}q_{st}(i)^{2}, s,tqss(i)qtt(i)qst(i)2\sum_{s,t}q_{ss}(i)q_{tt}(i)q_{st}(i)^{2}, and s,tqst(i)3qss(i)\sum_{s,t}|q_{st}(i)^{3}q_{ss}(i)| can be bounded bymaxs,tqst(i)s,tqst2(i)\max_{s,t}|q_{st}(i)|\sum_{s,t}q_{st}^{2}(i), and so by ki3/2k_{i}^{-3/2}.

Second, by applying the Cauchy–Schwarz inequality if needed, one can bound the sums s,t1,t2qst1(i)2qst2(i)2\sum_{s,t_{1},t_{2}}q_{st_{1}}(i)^{2}q_{st_{2}}(i)^{2}, s1,t1,s2,t2qs1t1(i)qs1t2(i)qs2t1(i)qs2t2(i)\sum_{s_{1},t_{1},s_{2},t_{2}}|q_{s_{1}t_{1}}(i)q_{s_{1}t_{2}}(i)q_{s_{2}t_{1}}(i)q_{s_{2}t_{2}}(i)|, and s,t1,t2qss(i)qst1(i)qst2(i)qt1t2(i)\sum_{s,t_{1},t_{2}}|q_{ss}(i)q_{st_{1}}(i)q_{st_{2}}(i)q_{t_{1}t_{2}}(i)| by 2(s,tqst2(i))22(\sum_{s,t}q_{st}^{2}(i))^{2}, and so by 2ki22k_{i}^{-2}.

s,t1,t2qss(i)2qt1t1(i)qt2t2(i)=(sqss(i)2)(tqtt(i))2=sqss(i)2.\sum_{s,t_{1},t_{2}}q_{ss}(i)^{2}q_{t_{1}t_{1}}(i)q_{t_{2}t_{2}}(i)=(\sum_{s}q_{ss}(i)^{2})(\sum_{t}q_{tt}(i))^{2}=\sum_{s}q_{ss}(i)^{2}.

s,tqss(i)2qtt(i)+s,tqss(i)3qtt(i)2(sqss(i)2)(tqtt(i))=2sqss(i)2.\sum_{s,t}q_{ss}(i)^{2}q_{tt}(i)+\sum_{s,t}q_{ss}(i)^{3}q_{tt}(i)\leq 2(\sum_{s}q_{ss}(i)^{2})(\sum_{t}q_{tt}(i))=2\sum_{s}q_{ss}(i)^{2}.

s,tqss(i)qtt(i)qst(i)s,tqss(i)(qtt(i)2+qst(i)2)tqtt(i)2+\breakmaxsqss(i)s,tqst(i)2\sum_{s,t}|q_{ss}(i)q_{tt}(i)q_{st}(i)|\leq\sum_{s,t}q_{ss}(i)(q_{tt}(i)^{2}+q_{st}(i)^{2})\leq\sum_{t}q_{tt}(i)^{2}+\break\max_{s}q_{ss}(i)\sum_{s,t}q_{st}(i)^{2} tqtt(i)2+ki3/2.\leq\sum_{t}q_{tt}(i)^{2}+k_{i}^{-3/2}.

s,tqss(i)2qtt(i)qst(i)sups,tqst(i)s,tqss(i)2qtt(i)sqss(i)2/ki\sum_{s,t}|q_{ss}(i)^{2}q_{tt}(i)q_{st}(i)|\leq\sup_{s,t}|q_{st}(i)|\sum_{s,t}q_{ss}(i)^{2}q_{tt}(i)\leq\sum_{s}q_{ss}(i)^{2}/\sqrt{k_{i}}.

where we applied Lemma 3.2 in the last estimate.

To complete the proof, we note from the estimate of sn02s_{n_{0}}^{2} of Section 5 and from Lemma 3.2 that i<n0EYi+1=O(loglogn)|\sum_{i<n_{0}}{\mathbf{E}}Y_{i+1}|=O(\log\log n). Thus, by Chebyshev’s inequality

Proof of Lemma 2.8

We recall that, with in0i\geq n_{0}, Δi+12\Delta_{i+1}^{2} is a Chi-square random variable of degree nin-i. Let us first consider the lower tail; it suffices to show

By properties of the normal distribution, it is easy to show that Δn2\Delta_{n}^{2} and Δn12\Delta_{n-1}^{2} are at least exp(24logcn)\exp(-\frac{\sqrt{2}}{4}\log^{c}n) with probability 1exp(Ω(logcn))1-\exp(-\Omega(\log^{c}n)), so we can omit these terms from the sum. It now suffices to show that

Flipping the inequality inside the probability (by changing the sign of the RHS and swapping the denominators and numerators in the logarithms of the LHS) and using the Laplace transform trick (based on the fact that the Δi2\Delta^{2}_{i} are independent), we see that the probability in question is at most

Recall that Δi+12\Delta_{i+1}^{2} is a Chi-square random variable with degree of freedom nin-i, so E1Δi+12=1ni2{\mathbf{E}}\frac{1}{\Delta^{2}_{i+1}}=\frac{1}{n-i-2}. Therefore, the numerator in the previous formula is (nn0)(nn01)2log2αn\frac{(n-n_{0})(n-n_{0}-1)}{2}\leq\log^{2\alpha}n.

The proof for the upper tail is similar (in fact simpler as we do not need to treat the first two terms separately) and we omit the details.

Deduction of Theorem 2.1 from Theorem 2.2

Our plan is to replace one by one the last nn0n-n_{0} Gaussian rows of AnA_{n} by vectors of components having zero mean, unit variance and satisfying condition C0. Our key tool here is the classical Berry–Eseen inequality. In order to apply this lemma, we will make a crucial use of Lemma 4.1.

Assume that v=(v1,,vn)\mathbf{v}=(v_{1},\ldots,v_{n}) is a unit vector. Assume that b1,,bnb_{1},\ldots,b_{n} are independent random variables of mean zero, variance one and satisfying condition C0. Then we have

where cc is an absolute constant depending on the parameters appearing in (3).

We remark that in the original setting of Berry and Esseen, it suffices to assume the finite third moment.

In application, v\mathbf{v} plays the role of the normal vector of the hyperplane spanned by the remaining n1n-1 rows of AA, and Δn=v1b1++vnbn\Delta_{n}=|v_{1}b_{1}+\cdots+v_{n}b_{n}|, where (b1,,bn)=b(b_{1},\ldots,b_{n})=\mathbf{b} is the vector to be replaced.

For the deduction, it is enough to show the following.

Let AnA_{n} be a random matrix with atom variables satisfying condition C0 and nonsingular with probability one. Assume furthermore that AnA_{n} has at least one and at most logαn\log^{\alpha}n Gaussian rows. Let BnB_{n} be the random matrix obtained from AnA_{n} by replacing a Gaussian row vector a\mathbf{a} of AnA_{n} by a random vector b=(b1,,bn)\mathbf{b}=(b_{1},\ldots,b_{n}) whose coordinates are independent atom variables satisfying condition C0 such that the resulting matrix is nonsingular with probability one. Then

Clearly, Theorem 1.1 follows from Theorem 2.2 by applying Lemma 8.2 logαn\log^{\alpha}n times.

Proof of Lemma 8.2 Without loss of generality, we can assume that BnB_{n} is obtained from AnA_{n} by replacing the last row an\mathbf{a}_{n}. As AnA_{n} is nonsingular, dim(Vn1)=n1\dim(V_{n-1})=n-1.

By Lemma 4.1, by paying an extra term of O(n100)O(n^{-100}) in probability (which will be absorbed by the eventual bound log2αn\log^{-2\alpha}n), we may also assume that the normal vector v\mathbf{v} of Vn1V_{n-1} satisfies

where Δn\Delta_{n} and Δn\Delta^{\prime}_{n} are the distance from an\mathbf{a}_{n} and bn\mathbf{b}_{n} to Vn1V_{n-1}, respectively.

Appendix: Simplifying the model: Deducing Theorem 1.1 from Theorem 2.1

In this section we show that the two extra assumptions that aijlogβn|a_{ij}|\leq\log^{\beta}n and AnA_{n} has full rank with probability one do not violate the generality of Theorem 1.1.

To start with, we need a very weak lower bound on detAn|\det A_{n}|.

It follows from TVcir , Theorem 2.1, that there is a constant CC such that P(σn(An)nC)n1{\mathbf{P}}(\sigma_{n}(A_{n})\leq n^{-C})\leq n^{-1}. Since detAn|\det A_{n}| is the product of its singular values, the bound follows.

The above bound is extremely weak. By modifying the proof in TVdet , one can actually prove the Tao–Vu lower bound (1) for random matrices satisfying C0. Also, sharper bounds on the least singular value are obtained in TVsmooth , RV . However, for the arguments in this section, we only need the bound on Lemma .3.

Let us start with the assumption aijlogβn|a_{ij}|\leq\log^{\beta}n. We can achieve this assumption using the standard truncation method (see BS or TVlocal ). In what follows, we sketch the idea.

Notice that by condition C0, we have, with probability at least 1exp×(log10n)1-\exp\times(-\log^{10}n), that all entries of AnA_{n} have absolute value at most logβn\log^{\beta}n, for some constant β>0\beta>0 which may depend on the constants in C0.

We replace the variable aija_{ij} by the variable aij:=aijIaijlogβna_{ij}^{\prime}:=a_{ij}{\mathbf{I}}_{|a_{ij}|\leq\log^{\beta}n}, for all 1i,jn1\leq i,j\leq n and let AnA_{n}^{\prime} be the random matrix formed by aija_{ij}^{\prime}. Since with probability at least 1exp(log10n)1-\exp(-\log^{10}n), An=AnA_{n}=A_{n}^{\prime}, it is easy to show that if AnA_{n}^{\prime} satisfies the claim of Theorem 1.1, then so does AnA_{n}.

While the entries of AnA_{n}^{\prime} are bounded by logβn\log^{\beta}n, there is still one problem we need to address, namely, that the new variables aija_{ij}^{\prime} do not have mean 0 and variance one. We can achieve this by a simple normalization trick. First observe that by property C0, taking β\beta sufficiently large, it is easy to show that μij=Eaij\mu_{ij}={\mathbf{E}}a_{ij}^{\prime} has absolute value at most nω(1)n^{-\omega(1)} and 1σijnω(1)|1-\sigma_{ij}|\leq n^{-\omega(1)}, where σij\sigma_{ij} is the standard deviation of aija^{\prime}_{ij}. Now define

Note that aija_{ij}^{\prime\prime\prime} now does have mean zero and variance one. Let AnA_{n}^{\prime\prime} and AnA_{n}^{\prime\prime\prime} be the corresponding matrices of aija_{ij}^{\prime\prime} and aija_{ij}^{\prime\prime\prime}, respectively.

where NnN_{n} is the matrix formed by μij\mu_{ij}.

Since μij=nω(1)|\mu_{ij}|=n^{-\omega(1)}, by Hadamard’s bound detNn1/nnω(1)|\det N_{n}|^{1/n}\leq n^{-\omega(1)}. On the other hand, we have by Lemma .3 that P(detAn1/nnC)1n1{\mathbf{P}}(|\det A_{n}^{\prime\prime}|^{1/n}\geq n^{-C})\geq 1-n^{-1}. It thus follows that

We can prove a matching lower bound by the same argument. From here, we conclude that if detAn|\det A_{n}^{\prime\prime}| satisfies the conclusion of Theorem 1.1, then so does detAn|\det A_{n}^{\prime}|.

To pass from det(An)\det(A_{n}^{\prime\prime}) to det(An)\det(A_{n}^{\prime\prime\prime}), we apply the Brunn–Minkowski inequality again,

where NnN_{n}^{\prime} is the matrix form by aij(1σij1)a_{ij}^{\prime\prime}(1-\sigma_{ij}^{-1}). Noting that 1σij1nω(1)|1-\sigma_{ij}^{-1}|\leq n^{-\omega(1)} and aij=logO(1)n|a_{ij}^{\prime\prime}|=\log^{O(1)}n, we infer that det(An)|\det(A_{n}^{\prime\prime})| and det(An)|\det(A_{n}^{\prime\prime\prime})| are comparable with high probability

Now we address the assumption that AnA_{n} has full rank with probability one. Notice that this is usually not true when the aija_{ij} have discrete distribution (such as Bernoulli). However, we find the following simple trick that makes the assumption valid for our study.

Instead of the entry aija_{ij}, consider aij:=(1ε2)1/2aij+εξ0a_{ij}^{\prime}:=(1-\varepsilon^{2})^{1/2}a_{ij}+\varepsilon\xi_{0} where ξ0\xi_{0} is uniform on the interval $andand\varepsilonisverysmall,say,is very small, say,n^{-1000n}.Itisclearthatthematrix. It is clear that the matrixA_{n}^{\prime}formedbytheformed by thea_{ij}^{\prime}$ has full rank with probability one. On the other hand, it is easy to show that by the Brunn–Minkowski inequality and Hadamard’s bound

Furthermore, by Lemma .3, detAnnCn|\det A_{n}|\geq n^{-Cn} with probability 1n11-n^{-1}, and so we can conclude as in the previous argument.

References