Quantitative estimates of the convergence of the empirical covariance matrix in Log-concave Ensembles
Radosław Adamczak, Alexander E. Litvak, Alain Pajor, Nicole Tomczak-Jaegermann
Introduction
This question was investigated in motivated by a problem of complexity in computing volume in high dimension. In particular the authors proved that
whenever .
When random vectors are standard Gaussian, the covariance matrix is the identity and it is known (see the survey ) that (1.1) holds with high probability whenever . This raises the question about the order of the best . In particular can it be proportional to , under reasonable assumptions? More precisely, the question in was phrased in the following setting.
Since for a symmetric matrix , one has , (1.1) is implied by
In the case when the covariance matrix is the identity, it is equivalent to
Because of the linear invariance, there is no loss of generality to consider just this case when the covariance matrix is the identity.
In this framework, a breakthrough was achieved in where it was proved that for any , there exists such that if a body is isotropic then i.i.d. uniformly distributed points on satisfy (1.2). This estimate was further improved to in and to in and ; the former paper treated the case when is invariant under every reflection with respect to coordinate subspaces and the latter proved the estimate in full generality
One should note that in all these results, the probability in (1.2) does not go to 1 as goes to infinity, as one expects in this type of high dimensional phenomena. This probability, , is given by a parameter and depends on it. Thus letting tend to zero may destroy the estimate on . To emphasize this important feature we will talk about overwhelming probability if the probability goes to 1 as goes to infinity.
The first result establishing (1.1) with overwhelming probability was given in . When a body is invariant under every reflection with respect to coordinate subspaces, it is proved in that for any there exist such that (1.5) holds whenever and with probability going to 1 as goes to infinity. Finally, the present paper shows, as a consequence of our main results (Theorems 4.1 and 4.2), that the same is true for an arbitrary body (in the isotropic position).
To observe a still one more point of view, for arbitrary and , consider again . The set of matrices may be equipped with the distribution of to be a matrix probability space and because of the analogy with Random Matrix Theory, in particular with Wishart Ensemble, let us call it a Log-concave Ensemble.
In the last decades, in Asymptotic Geometric Analysis, considerable work and progress have been achieved in understanding the properties of random vectors with log-concave distribution, and more recently, in understanding spectral properties of random matrices with independent rows (or columns) with log-concave distribution. It appears that in high dimension they behave somewhat similarly as if the coordinate would be independent. This leads by analogy with Random Matrix Theory to questions on the spectrum of similar to those of the Wishart Ensemble. One important difference is that now the entries are dependent but strongly structured by the log-concavity hypothesis.
Denote by the eigenvalues of (the squares of the singular values of ). It was proved in that when goes to as , then the empirical measures of the eigenvalues have a limit. It is the so-called Marchenko-Pastur distribution, as for the Wishart Ensemble when all entries of the matrix are i.i.d. It is also known () in the case when all the entries of are i.i.d. (with a finite fourth moment) and that and . One could conjecture that such results are also valid in the log-concave setting. Nevertheless, these results are asymptotic and not quantitative (given fixed dimension).
Problem (1.5) is of course equivalent to quantitative estimates for and , that is of the support of the spectrum of . An answer is given by Proposition 4.4 where it is shown that for ,
holds with probability larger than , where are numerical constants. Thus, putting , we get
with overwhelming probability. As a consequence already mentioned earlier, with overwhelming probability, where is a numerical constant (Corollary 4.12).
Our general method follows an approach that can be traced back to Bourgain (cf. also ). It relies upon a crucial new ingredient of a novel chaining argument that in an essential way depends on the distribution of coordinates of a point on the unit sphere. What makes this approach work, by rather subtle estimates, is a special structure of the sets used for the chaining.
The paper is organized as follows. In the next Section 2 we present some definitions and preliminary tools. In Section 3 we study the norm of a restriction of the matrix defined by
We show in Theorem 3.6 that with overwhelming probability,
In Section 4.1 we prove the result announced in the abstract, answering a question from . This theorem appears as a particular case of a more general study of
defined for any . Such processes have been studied in , and .
Notation and preliminaries
in other words, if is centered and its covariance matrix is the identity:
A straightforward computation shows that for every integer ,
We can now state the sub-exponential decay of linear functionals in terms of norm :
where is universal constant. Moreover, if has a symmetric distribution then .
The moreover part easily follows by a direct calculation (see ).
Putting together (2.2) and Lemma 2.3, we get that for every ,
where is an absolute positive constant.
Norm of a random matrix
with probability at least .
where and are absolute positive constants. The result follows by the union bound (and adjusting absolute constants).
Let , and . Then
Proof Denote the underlying probability space by . For with , , and , define the subset of by
Fix , and as above and set . Clearly, is independent of vectors ’s, , and . Note that on (otherwise for all and the sharp inequality defining would be violated). Thus, using the fact that , we obtain
on . Since on , this implies
On the other hand, by Chebyshev’s inequality and the assumption on the -norms of linear functionals, the latter probability is less than
We will also need another lemma of a similar type. We provide the proof for sake of completeness.
Let , , , and . Let denote the set of vectors with and let be a subset of of cardinality . Then
Proof The proof is analogous to the argument in Lemma 3.3. For with , , and consider
Fix , , as above and set . Clearly, is independent of the vectors ’s, , moreover, , and, similarly as in before, on . Thus, using the fact that , we obtain
on . Therefore, again as in Lemma 3.3, we have
This shows that the probability estimate in Theorem 3.6 is optimal up to numerical constants. The analysis of this example shows that up to numerical constants the logarithmic term in the estimate of in Theorem 3.6 is also optimal (for the details see ).
Letting we get a clearly optimal estimate for the operator norm , valid with overwhelming probability.
In the setting of Theorem 3.6 we get, for every ,
with probability at least , where are absolute constants.
By Lemmas 2.3 and 3.1, and taking into account the normalization, this would imply a version of (3.1) with and probability .
Note that in the formula in Theorem 3.6 can be substituted with
Indeed, if there is nothing to prove, otherwise
There are absolute positive constants and such that for every , , , and ’s as in Theorem 3.6 one has
Proof Given set . Consider vector defined by if and otherwise. We have
Therefore Theorem 3.6 and Remark 3.7 imply the result.
Proof of Theorem 3.6. As , it is easy to see, by applying the union bound and adjusting absolute constants, that it is sufficient to prove that for sufficiently large and every fixed , one has
We shall define a set of vectors with a special structure and supports less than or equal to which serves simultaneously two purposes: we will be able to estimate with large probability , and we will use to approximate an arbitrary vector from of support less than or equal to . Then a standard argument will lead to the required estimate for .
Otherwise, let be the smallest integer such that
and fix positive integers such that for and , and . (We shall later set for and .)
Then set , where consists of all vectors of the form , where ’s have disjoint supports and
Note that for every vector we have and .
We shall consider the details of the case (the other case, when (3.2) holds, can be treated similarly, actually, it is even simpler, since the construction of is simpler). Fix of the form and let be the support of (if there are more than one such representations, we fix one of them). Denote the coordinates of by , , then
Note that by Lemma 3.1, with probability larger than , and we would like to get a similar estimate for .
To this aim we split according to the structure of . Namely we let
where . Note that
We first estimate . By Lemma 3.2 we obtain that for every there exists a subset of such that
We now apply Lemma 3.3 to each summand in the sum above with , , for the first summand (note that such an satisfies the condition) and with , , for . By the union bound we obtain
where is the absolute constant from Lemma 2.3.
Therefore, the choice of implies the following bound, with some absolute positive constant ,
(We also used the estimate , valid when .)
The estimate for essentially follows the same lines. In a sense it is simpler, since we don’t need to apply Lemma 3.2. For every we consider , where consists of all vectors of the form , where ’s () have pairwise disjoint supports and
Then and
Now we apply Lemma 3.4 to each summand with
As in the case for it follows that
where is the same absolute constant as above. Since , then
Passing now to the approximation argument, pick an arbitrary with . Define the following subsets of depending on . Denote the coordinates of by (). Let be such that , so that for (since ). If condition (3.2) holds we denote the support of by and consider only this . Otherwise we set
where is the smallest integer satisfying (3.3) (as before). (For small values of it can happen that is empty, but it does not create any difficulty in the proof below.) Clearly, we have
and . Note that the numbers ’s do not depend on , although the sets ’s do. Finally, since , we also observe that for every ,
Note that for every the vector can be approximated by a vector from and the vector can be approximated by a vector from . Thus there exists , with a suitable representation , such that
Moreover, is chosen to have the same support as , and thus has the support .
Considering all with it follows that
Recall that by (3.4) for every we have
where is an absolute positive constant. (In fact this estimate for probability requires that is sufficiently large, but, as was arbitrary, we can adjust the constants.) This concludes the proof.
where (note that ).
We conclude this section with a more technical variant of Theorem 3.6. Note that in particular it requires weaker conditions on ’s and does not require any bounds on .
Let be a random matrix whose columns are ’s, and , , is defined as before. Then for every , every , and every one has
where is an absolute constant. In particular, choosing to be the largest integer satisfying
Note that from the definitions we immediately have
For completeness we outline a proof of Theorem 3.13.
Proof (Sketch.) We proceed as in the proof of Theorem 3.6. So first we construct . If we define exactly as after formula (3.2), otherwise it will be constructed in the same way as it was constructed after formula (3.3) (note that now is a fixed number). Then we estimate . As before we use Lemmas 3.3 and 3.4.
The only difference is that for the first summand in the formula for we use Lemma 3.3 with instead of . It will give us that
Thus, with another absolute positive constant we have
Finally we apply the same approximation procedure. By (3.4) and approximation we get formula (3.6)
which implies the result, by adjusting constants, if necessary. The “in particular” part of the Theorem is trivial.
It is possible to extend Theorem 3.13 to a -setting, similar to the one considered in . Let and let be a random vector such that for some one has
for every . Then, adjusting Lemmas 3.3 and 3.4, and repeating the proof of Theorem 3.13 we can get
However we will not pursue this direction here.
Kannan-Lovász-Simonovits question
First note that because of the linear invariance, (1.5) implies
Therefore without loss of generality we restrict ourselves to the case when the covariance matrix is the identity.
where is an absolute constant. Moreover, one can take , where is an absolute constant.
This way approximating the covariance matrix becomes a special case of a more general problem, concerning the uniform approximation of the moments of one dimensional marginals of an isotropic log-concave measure by their empirical counterparts. In particular, Theorem 4.1 is implied by the following result.
Moreover, one can take , where depends only on .
with probability even higher than claimed. Thus in the proofs of both theorems we may assume without loss of generality that .
In the first step of the proof of Theorem 4.2 we shall use some tools from the probability in Banach spaces, in particular classical symmetrization and contraction methods as in and . These tools work for general empirical processes and are not necessary in our setting since we are dealing more specifically with powers of linear forms. We choose this approach, though, as it requires less computations and leads to a unified, simpler and more transparent presentation.
Theorem 4.2 is an easy consequence of the following technical proposition applied with .
In the setting of Theorem 4.2, if , then for any , the estimate
where , , are absolute constants and depends on only.
The two parameters and play different role in the proof and reflect different asymptotic behavior of the probability with which (4.4) holds. The first parameter is related to a level of truncation of linear forms whereas the second is a factor in the deviation when one deals only with the truncated part. For instance, by taking , it allows us to get a probability converging to one as , if both dimensions are fixed.
Before we proceed to the proof of the above proposition, let us introduce some tools from the classical theory of probability in Banach spaces. Below, will always denote a sequence of independent Rademacher variables, independent of the sequence .
Let be a family of functions, uniformly bounded by . Then for any independent random variables and any , we have
We will also use the celebrated Talagrand’s concentration inequality for suprema of bounded empirical processes . The version from presented below, provides the best known constants in this inequality (we will however not take advantage of explicit constants). For a simple proof (with worse constants) we refer the reader to
Proof of Proposition 4.4 For simplicity, throughout this proof we will use the letter to denote absolute constants, whose values may change from line to line.
For (to be specified later) consider
where the last line follows from Corollary 4.7. The function is a contraction, so
Each of the obtained three terms is estimated separately, with the first term already discussed in (4.5) and (4.1). By (2.3) and Chebyshev’s inequality we have
Together with the previous inequalities this implies that
Thus it remains to estimate . To this end we use Theorem 3.6 and Remark 3.10. It follows that for , with probability at least , we have, for all and all with ,
For an arbitrary let . Then, by (4.8),
we obtain (for a different absolute constant ),
This combined with (4.8) implies, after taking the ’th powers and again adjusting constants, that with probability at least , for all ,
Setting , so that (4.9) is satisfied, and combining the resulting estimate with (4.1), we get
This completes the proof of Proposition 4.4,
where is a numerical constant. Thus . This example shows that the sub-exponential decay of linear forms ( norm bounded) is not sufficient for our problem.
2 Additional observations
For let be a random matrix with rows . Then for , with probability at least (where depends only on ),
with depending only on . Moreover
On the other hand, a single row of has expected Euclidean norm of the order of and a single column of has expected norm of the order of , so the left hand side of (4.11) follows trivially.
For let be a random matrix with rows . Then for , with probability at least (where is an absolute constant),
for some absolute constant . Moreover
Proof Inequality (4.12) and the right-hand side of (4.13) follow from the corresponding results for , since
To prove the left-hand side of (4.13), it is enough to notice that if , then
One can also obtain an almost-isometric result for .
Moreover, one can take , where is an absolute constant.
Proof Since the proof differs only by technical details from the corresponding argument for , we will just indicate the necessary changes. We will use the notation from the proof of Proposition 4.4.
(the constants in the exponents can be made independent of , since now runs over a bounded interval). This allows us to finish the proof.
The isomorphic result for was proven in . The same paper also considers .
3 Elementary approach for p=2𝑝2p=2
As announced earlier we will now briefly describe a more elementary proof of Theorem 4.1 and Theorem 4.2 for . In this case, the classical Bernstein inequality and a net argument on the sphere may replace the contraction principle and concentration of measure for empirical processes, that have been used – via Lemma 4.8 – to prove (4.5). The remaining part of the proof is left unchanged.
The key point is the following well known observation:
We postpone the proof of this Lemma and pass to the proof of Theorems 4.1 and 4.2.
Fix a -net of of cardinality at most , and to be determined later. Pick an arbitrary .
For the reader’s convenience recall Bernstein’s inequality.
Let be independent random variables, centered and such that for all . Put . Then for all ,
Setting we infer that
Using this estimate with and handling the unbounded part the same way as in Proposition 4.4 (see the argument that follows (4.5)) we obtain
This corresponds to the estimates in Proposition 4.4 (for ).
Now, for , and sufficiently large, the right hand side of (4.3) is at most and which leads to the probability above to be at least . So with the same probability we get
The triangle inequality and homogeneity of imply, by a standard argument, that
To get a lower estimate, write an arbitrary in the form , with and . Then , where
Thus for all , for some depending only on . In particular . Using the fact that the function is Lipschitz with constant on the interval , we conclude that
where depends only on .