Quantitative estimates of the convergence of the empirical covariance matrix in Log-concave Ensembles

Radosław Adamczak, Alexander E. Litvak, Alain Pajor, Nicole Tomczak-Jaegermann

Introduction

This question was investigated in motivated by a problem of complexity in computing volume in high dimension. In particular the authors proved that

whenever NCεδn2N\geq{\frac{C}{\varepsilon\delta}}n^{2}.

When random vectors are standard Gaussian, the covariance matrix is the identity and it is known (see the survey ) that (1.1) holds with high probability whenever N4n/ε2N\geq 4n/\varepsilon^{2}. This raises the question about the order of the best NN. In particular can it be proportional to nn, under reasonable assumptions? More precisely, the question in was phrased in the following setting.

Since for a symmetric matrix MM, one has M=supySn1My,y\|M\|=\sup_{y\in S^{n-1}}\langle My,y\rangle, (1.1) is implied by

In the case when the covariance matrix is the identity, it is equivalent to

Because of the linear invariance, there is no loss of generality to consider just this case when the covariance matrix is the identity.

In this framework, a breakthrough was achieved in where it was proved that for any ε,δ(0,1)\varepsilon,\delta\in(0,1), there exists C(ε,δ)>0C(\varepsilon,\delta)>0 such that if a body KK is isotropic then N=C(ε,δ)nlog3nN=C(\varepsilon,\delta)n\log^{3}n i.i.d. uniformly distributed points on KK satisfy (1.2). This estimate was further improved to N=C(ε,δ)nlog2nN=C(\varepsilon,\delta)n\log^{2}n in and to N=C(ε,δ)nlognN=C(\varepsilon,\delta)n\log n in and ; the former paper treated the case when KK is invariant under every reflection with respect to coordinate subspaces and the latter proved the estimate in full generality

One should note that in all these results, the probability in (1.2) does not go to 1 as nn goes to infinity, as one expects in this type of high dimensional phenomena. This probability, 1δ1-\delta, is given by a parameter δ\delta and C(ε,δ)C(\varepsilon,\delta) depends on it. Thus letting δ\delta tend to zero may destroy the estimate on NN. To emphasize this important feature we will talk about overwhelming probability if the probability goes to 1 as nn goes to infinity.

The first result establishing (1.1) with overwhelming probability was given in . When a body KK is invariant under every reflection with respect to coordinate subspaces, it is proved in that for any ε(0,1)\varepsilon\in(0,1) there exist C(ε)>0C(\varepsilon)>0 such that (1.5) holds whenever NC(ε)nN\geq C(\varepsilon)\,n and with probability going to 1 as nn goes to infinity. Finally, the present paper shows, as a consequence of our main results (Theorems 4.1 and 4.2), that the same is true for an arbitrary body KK (in the isotropic position).

To observe a still one more point of view, for arbitrary nn and NN, consider again A=A(N)A=A^{(N)}. The set of n×nn\times n matrices may be equipped with the distribution of AAAA^{*} to be a matrix probability space and because of the analogy with Random Matrix Theory, in particular with Wishart Ensemble, let us call it a Log-concave Ensemble.

In the last decades, in Asymptotic Geometric Analysis, considerable work and progress have been achieved in understanding the properties of random vectors with log-concave distribution, and more recently, in understanding spectral properties of random matrices with independent rows (or columns) with log-concave distribution. It appears that in high dimension they behave somewhat similarly as if the coordinate would be independent. This leads by analogy with Random Matrix Theory to questions on the spectrum of AAAA^{*} similar to those of the Wishart Ensemble. One important difference is that now the entries are dependent but strongly structured by the log-concavity hypothesis.

Denote by λ1=λ1(A(N))λn=λn(A(N))\lambda_{1}=\lambda_{1}(A^{(N)})\leq\cdots\leq\lambda_{n}=\lambda_{n}(A^{(N)}) the eigenvalues of AAAA^{*} (the squares of the singular values of AA). It was proved in that when n/Nn/N goes to β(0,1)\beta\in(0,1) as n,Nn,N\to\infty, then the empirical measures of the eigenvalues have a limit. It is the so-called Marchenko-Pastur distribution, as for the Wishart Ensemble when all entries of the matrix AA are i.i.d. It is also known () in the case when all the entries of AA are i.i.d. (with a finite fourth moment) and limn+nN=β(0,1)\lim_{n\to+\infty}{\frac{n}{N}}=\beta\in(0,1) that limλ1/N=(1β)2\lim\lambda_{1}/N=(1-\sqrt{\beta})^{2} and limλn/N=(1+β)2\lim\lambda_{n}/N=(1+\sqrt{\beta})^{2}. One could conjecture that such results are also valid in the log-concave setting. Nevertheless, these results are asymptotic and not quantitative (given fixed dimension).

Problem (1.5) is of course equivalent to quantitative estimates for λ1(A(N))\lambda_{1}(A^{(N)}) and λn(A(N))\lambda_{n}(A^{(N)}), that is of the support of the spectrum of AA. An answer is given by Proposition 4.4 where it is shown that for nNexp(n)n\leq N\leq\exp(\sqrt{n}),

holds with probability larger than 1exp(cn)1-\exp(-c\sqrt{n}), where C,c>0C,c>0 are numerical constants. Thus, putting β=nN(0,1)\beta={\frac{n}{N}}\in(0,1), we get

with overwhelming probability. As a consequence already mentioned earlier, AC(N+n)\|A\|\leq C(\sqrt{N}+\sqrt{n}) with overwhelming probability, where C>0C>0 is a numerical constant (Corollary 4.12).

Our general method follows an approach that can be traced back to Bourgain (cf. also ). It relies upon a crucial new ingredient of a novel chaining argument that in an essential way depends on the distribution of coordinates of a point on the unit sphere. What makes this approach work, by rather subtle estimates, is a special structure of the sets used for the chaining.

The paper is organized as follows. In the next Section 2 we present some definitions and preliminary tools. In Section 3 we study the norm of a restriction of the matrix A=A(N)A=A^{(N)} defined by

We show in Theorem 3.6 that with overwhelming probability,

In Section 4.1 we prove the result announced in the abstract, answering a question from . This theorem appears as a particular case of a more general study of

defined for any p1p\geq 1. Such processes have been studied in , and .

Notation and preliminaries

in other words, if XX is centered and its covariance matrix is the identity:

A straightforward computation shows that for every integer p1p\geq 1,

We can now state the sub-exponential decay of linear functionals in terms of ψ1\psi_{1} norm :

where ψ>0\psi>0 is universal constant. Moreover, if XX has a symmetric distribution then ψ=2\psi=2.

The moreover part easily follows by a direct calculation (see ).

Putting together (2.2) and Lemma 2.3, we get that for every ySn1y\in S^{n-1},

where CC is an absolute positive constant.

Norm of a random matrix

with probability at least 1exp(Kn)1-\exp(-K\sqrt{n}).

where CC and cc are absolute positive constants. The result follows by the union bound (and adjusting absolute constants).

Let mNm\leq N, ε,α(0,1]\varepsilon,\alpha\in(0,1] and L2mlog12eNmεL\geq 2m\log\frac{12eN}{m\varepsilon}. Then

Proof Denote the underlying probability space by Ω\Omega. For F{1,...,N}F\subset\{1,...,N\} with Fm|F|\leq m, EFE\subset F, and zN(F,ε,α)z\in{\cal{N}}(F,\varepsilon,\alpha), define the subset Ω(F,E,z)\Omega(F,E,z) of Ω\Omega by

Fix FF, EE and zz as above and set y=jFEzjXjy=\sum_{j\in F\setminus E}z_{j}X_{j}. Clearly, yy is independent of vectors XiX_{i}’s, iEi\in E, and yAm|y|\leq A_{m}. Note that y>0|y|>0 on Ω(F,E,z)\Omega(F,E,z) (otherwise ziXi,y=0\left\langle z_{i}X_{i},y\right\rangle=0 for all iEi\in E and the sharp inequality defining Ω(F,E,z)\Omega(F,E,z) would be violated). Thus, using the fact that zα\|z\|_{\infty}\leq\alpha, we obtain

on Ω(F,E,z)\Omega(F,E,z). Since Am>0A_{m}>0 on Ω(F,E,z)\Omega(F,E,z), this implies

On the other hand, by Chebyshev’s inequality and the assumption on the ψ1\psi_{1}-norms of linear functionals, the latter probability is less than

We will also need another lemma of a similar type. We provide the proof for sake of completeness.

Let 1k,mN1\leq k,m\leq N, ε,α(0,1]\varepsilon,\alpha\in(0,1], β>0\beta>0, and L>0L>0. Let B(m,β)B(m,\beta) denote the set of vectors xβB2Nx\in\beta B_{2}^{N} with suppxm|\mathop{\rm supp}x|\leq m and let B\cal{B} be a subset of B(m,β)B(m,\beta) of cardinality MM. Then

Proof The proof is analogous to the argument in Lemma 3.3. For F{1,...,N}F\subset\{1,...,N\} with Fk|F|\leq k, xBx\in{\cal{B}}, and zN(F,ε,α)z\in{\cal{N}}(F,\varepsilon,\alpha) consider

Fix FF, xx, zz as above and set y=j∉FxjXjy=\sum_{j\not\in F}x_{j}X_{j}. Clearly, yy is independent of the vectors XiX_{i}’s, iFi\in F, moreover, yβAm|y|\leq\beta A_{m}, and, similarly as in before, y>0|y|>0 on Ω(F,x,z)\Omega(F,x,z). Thus, using the fact that zα\|z\|_{\infty}\leq\alpha, we obtain

on Ω(F,x,z)\Omega(F,x,z). Therefore, again as in Lemma 3.3, we have

This shows that the probability estimate in Theorem 3.6 is optimal up to numerical constants. The analysis of this example shows that up to numerical constants the logarithmic term in the estimate of AmA_{m} in Theorem 3.6 is also optimal (for the details see ).

Letting m=Nm=N we get a clearly optimal estimate for the operator norm A\|A\|, valid with overwhelming probability.

In the setting of Theorem 3.6 we get, for every K1K\geq 1,

with probability at least 1ecKn1-e^{-cK\sqrt{n}}, where C,c>0C,c>0 are absolute constants.

By Lemmas 2.3 and 3.1, and taking into account the normalization, this would imply a version of (3.1) with N=nN=n and probability 1δ1-\delta.

Note that n+mlog2Nm\sqrt{n}+\sqrt{m}\log\frac{2N}{m} in the formula in Theorem 3.6 can be substituted with

Indeed, if mnm\geq n there is nothing to prove, otherwise

There are absolute positive constants CC and cc such that for every n1n\geq 1, 1Nen1\leq N\leq e^{\sqrt{n}}, K1K\geq 1, and XiX_{i}’s as in Theorem 3.6 one has

Proof Given EE set m=Em=|E|. Consider vector zSN1z\in S^{N-1} defined by zi=1/mz_{i}=1/\sqrt{m} if iEi\in E and zi=0z_{i}=0 otherwise. We have

Therefore Theorem 3.6 and Remark 3.7 imply the result.

Proof of Theorem 3.6. As NenN\leq e^{\sqrt{n}}, it is easy to see, by applying the union bound and adjusting absolute constants, that it is sufficient to prove that for KK sufficiently large and every fixed mNm\leq N, one has

We shall define a set M\cal{M} of vectors with a special structure and supports less than or equal to mm which serves simultaneously two purposes: we will be able to estimate with large probability supxMAx\sup_{x\in\cal{M}}|Ax|, and we will use M\cal{M} to approximate an arbitrary vector from B2NB_{2}^{N} of support less than or equal to mm. Then a standard argument will lead to the required estimate for AmA_{m}.

Otherwise, let ll be the smallest integer such that

and fix positive integers a0,a1,,ala_{0},a_{1},\ldots,a_{l} such that akm2k+1a_{k}\leq m\,2^{-k+1} for 1kl1\leq k\leq l and a0m2la_{0}\leq m\,2^{-l}, and k=0lak=m\sum_{k=0}^{l}a_{k}=m. (We shall later set ak:=[m2k+1][m2k]a_{k}:=[m\,2^{-k+1}]-[m\,2^{-k}] for 1kl1\leq k\leq l and a0:=[m2l]a_{0}:=[m\,2^{-l}].)

Then set M=M02B2N{\cal{M}}={\cal{M}}_{0}\cap 2B_{2}^{N}, where M0{\cal{M}}_{0} consists of all vectors of the form x=k=0lxkx=\sum_{k=0}^{l}x_{k}, where xix_{i}’s have disjoint supports and

Note that for every vector xMx\in{\cal{M}} we have suppx0lak=m|\mathop{\rm supp}x|\leq\sum_{0}^{l}a_{k}=m and x2|x|\leq 2.

We shall consider the details of the case mlog(48eN/m)>nm\log(48eN/m)>\sqrt{n} (the other case, when (3.2) holds, can be treated similarly, actually, it is even simpler, since the construction of M\cal{M} is simpler). Fix xMx\in\cal{M} of the form x=k=0lxkx=\sum_{k=0}^{l}x_{k} and let FkF_{k} be the support of xkx_{k} (if there are more than one such representations, we fix one of them). Denote the coordinates of xx by x(i)x(i), iNi\leq N, then

Note that by Lemma 3.1, maxiXiC0Kn\max_{i}|X_{i}|\leq C_{0}K\sqrt{n} with probability larger than 1eKn1-e^{-K\sqrt{n}}, and we would like to get a similar estimate for DxD_{x}.

To this aim we split DxD_{x} according to the structure of xx. Namely we let

where Gk={0,k+1,k+2,,l}G_{k}=\{0,k+1,k+2,\ldots,l\}. Note that

We first estimate DxD^{\prime}_{x}. By Lemma 3.2 we obtain that for every kk there exists a subset Fˉk\bar{F}_{k} of FkF_{k} such that

We now apply Lemma 3.3 to each summand in the sum above with L=2KnL=2K\sqrt{n}, ε=1/4\varepsilon=1/4, α=1\alpha=1 for the first summand (note that such an LL satisfies the condition) and with L=4m2kKlog12eN4kmL=\frac{4m}{2^{k}}K\log\frac{12eN4^{k}}{m}, ε=2k\varepsilon=2^{-k}, α=2km\alpha=\sqrt{\frac{2^{k}}{m}} for k1k\geq 1. By the union bound we obtain

where ψ\psi is the absolute constant from Lemma 2.3.

Therefore, the choice of ll implies the following bound, with some absolute positive constant CC,

(We also used the estimate l2nl\leq 2\sqrt{n}, valid when mNenm\leq N\leq e^{\sqrt{n}}.)

The estimate for DxD^{\prime\prime}_{x} essentially follows the same lines. In a sense it is simpler, since we don’t need to apply Lemma 3.2. For every 1kl1\leq k\leq l we consider Mk=Mk2B2N{\cal{M}}_{k}={\cal{M}}_{k}^{\prime}\cap 2B_{2}^{N}, where Mk{\cal{M}}_{k}^{\prime} consists of all vectors of the form x=x0+s=k+1lxsx=x_{0}+\sum_{s=k+1}^{l}x_{s}, where xix_{i}’s (i=0,k=1,,li=0,k=1,\ldots,l) have pairwise disjoint supports and

Then Mk2B2N{\cal{M}}_{k}\subset 2B_{2}^{N} and

Now we apply Lemma 3.4 to each summand with

As in the case for DxD_{x}^{\prime} it follows that

where CC is the same absolute constant as above. Since Dx=Dx+DxD_{x}=D^{\prime}_{x}+D^{\prime\prime}_{x}, then

Passing now to the approximation argument, pick an arbitrary zSN1z\in S^{N-1} with suppzm|\mathop{\rm supp}z|\leq m. Define the following subsets of {1,,N}\{1,\ldots,N\} depending on zz. Denote the coordinates of zz by ziz_{i} (i=1,,Ni=1,\ldots,N). Let n1,,nNn_{1},\ldots,n_{N} be such that zn1zn2znN|z_{n_{1}}|\geq|z_{n_{2}}|\geq\ldots\geq|z_{n_{N}}|, so that zni=0z_{n_{i}}=0 for i>mi>m (since suppzm|\mathop{\rm supp}z|\leq m). If condition (3.2) holds we denote the support of zz by E0E_{0} and consider only this E0E_{0}. Otherwise we set

where ll is the smallest integer satisfying (3.3) (as before). (For small values of nn it can happen that E0E_{0} is empty, but it does not create any difficulty in the proof below.) Clearly, we have

and i=0lai=m\sum_{i=0}^{l}a_{i}=m. Note that the numbers aka_{k}’s do not depend on zz, although the sets EkE_{k}’s do. Finally, since zSN1z\in S^{N-1}, we also observe that for every k1k\geq 1,

Note that for every k1k\geq 1 the vector PEkzP_{E_{k}}z can be approximated by a vector from N(Ek,2k,2km){\cal{N}}\left(E_{k},2^{-k},\sqrt{\frac{2^{k}}{m}}\right) and the vector PE0zP_{E_{0}}z can be approximated by a vector from N(E0,1/4,1){\cal{N}}(E_{0},1/4,1). Thus there exists xMx\in{\cal{M}}, with a suitable representation x=k=0lxkx=\sum_{k=0}^{l}x_{k}, such that

Moreover, xx is chosen to have the same support as zz, and thus w=zxw=z-x has the support suppwm|\mathop{\rm supp}w|\leq m.

Considering all zSN1z\in S^{N-1} with suppzm|\mathop{\rm supp}z|\leq m it follows that

Recall that by (3.4) for every xMx\in{\cal M} we have

where cc is an absolute positive constant. (In fact this estimate for probability requires that nn is sufficiently large, but, as K1K\geq 1 was arbitrary, we can adjust the constants.) This concludes the proof.

where κ=T=Σ\kappa=\|T^{\ast}\|=\sqrt{\|\Sigma\|} (note that Σ=TT\Sigma=TT^{\ast}).

We conclude this section with a more technical variant of Theorem 3.6. Note that in particular it requires weaker conditions on XiX_{i}’s and does not require any bounds on NN.

Let AA be a random n×Nn\times N matrix whose columns are XiX_{i}’s, and AmA_{m}, mNm\leq N, is defined as before. Then for every 1mN1\leq m\leq N, every 0llogm0\leq l\leq\log m, and every K1K\geq 1 one has

where CC is an absolute constant. In particular, choosing 0llogm0\leq l\leq\log m to be the largest integer satisfying

Note that from the definitions we immediately have

For completeness we outline a proof of Theorem 3.13.

Proof (Sketch.) We proceed as in the proof of Theorem 3.6. So first we construct M\cal{M}. If l=0l=0 we define M\cal{M} exactly as after formula (3.2), otherwise it will be constructed in the same way as it was constructed after formula (3.3) (note that now ll is a fixed number). Then we estimate Dx=Dx+DxD_{x}=D_{x}^{\prime}+D_{x}^{\prime\prime}. As before we use Lemmas 3.3 and 3.4.

The only difference is that for the first summand in the formula for DxD_{x}^{\prime} we use Lemma 3.3 with L=4Km2llog48eN2lmL=4K\frac{m}{2^{l}}\log\frac{48eN2^{l}}{m} instead of L=2KnL=2K\sqrt{n}. It will give us that

Thus, with another absolute positive constant CC we have

Finally we apply the same approximation procedure. By (3.4) and approximation we get formula (3.6)

which implies the result, by adjusting constants, if necessary. The “in particular” part of the Theorem is trivial.

It is possible to extend Theorem 3.13 to a ψp\psi_{p}-setting, similar to the one considered in . Let pp\in and let XX be a random vector such that for some ψp>0\psi_{p}>0 one has

for every ySn1y\in S^{n-1}. Then, adjusting Lemmas 3.3 and 3.4, and repeating the proof of Theorem 3.13 we can get

However we will not pursue this direction here.

Kannan-Lovász-Simonovits question

First note that because of the linear invariance, (1.5) implies

Therefore without loss of generality we restrict ourselves to the case when the covariance matrix is the identity.

where c>0c>0 is an absolute constant. Moreover, one can take C(ε,t)=Ct4ε2log2(2t2ε2)C(\varepsilon,t)=Ct^{4}\varepsilon^{-2}\log^{2}(2t^{2}\varepsilon^{-2}), where C>0C>0 is an absolute constant.

This way approximating the covariance matrix becomes a special case of a more general problem, concerning the uniform approximation of the moments of one dimensional marginals of an isotropic log-concave measure by their empirical counterparts. In particular, Theorem 4.1 is implied by the following result.

Moreover, one can take C(ε,t,p)=Cpt2pε2log2p2(2t2ε2)C(\varepsilon,t,p)=C_{p}t^{2p}\varepsilon^{-2}\log^{2p-2}(2t^{2}\varepsilon^{-2}), where CpC_{p} depends only on pp.

with probability even higher than claimed. Thus in the proofs of both theorems we may assume without loss of generality that Nexp(n)N\leq\exp{(\sqrt{n})}.

In the first step of the proof of Theorem 4.2 we shall use some tools from the probability in Banach spaces, in particular classical symmetrization and contraction methods as in and . These tools work for general empirical processes and are not necessary in our setting since we are dealing more specifically with powers of linear forms. We choose this approach, though, as it requires less computations and leads to a unified, simpler and more transparent presentation.

Theorem 4.2 is an easy consequence of the following technical proposition applied with s=ts=t.

In the setting of Theorem 4.2, if nNenn\leq N\leq e^{\sqrt{n}}, then for any s,t1s,t\geq 1, the estimate

where u=t2s2p2nlog2p2(2N/n)u=t^{2}s^{2p-2}n\log^{2p-2}(2N/n), v=ts1Nn/log(2N/n)v=ts^{-1}\sqrt{Nn}/\log(2N/n), C,c>0C,c>0 are absolute constants and cp>0c_{p}>0 depends on pp only.

The two parameters ss and tt play different role in the proof and reflect different asymptotic behavior of the probability with which (4.4) holds. The first parameter ss is related to a level of truncation of linear forms whereas the second is a factor in the deviation when one deals only with the truncated part. For instance, by taking s=t1/2s=t^{1/2}, it allows us to get a probability converging to one as tt\to\infty, if both dimensions are fixed.

Before we proceed to the proof of the above proposition, let us introduce some tools from the classical theory of probability in Banach spaces. Below, ε1,,εN\varepsilon_{1},\ldots,\varepsilon_{N} will always denote a sequence of independent Rademacher variables, independent of the sequence X1,,XNX_{1},\ldots,X_{N}.

Let F\mathcal{F} be a family of functions, uniformly bounded by B>0B>0. Then for any independent random variables X1,,XNX_{1},\ldots,X_{N} and any p1p\geq 1, we have

We will also use the celebrated Talagrand’s concentration inequality for suprema of bounded empirical processes . The version from presented below, provides the best known constants in this inequality (we will however not take advantage of explicit constants). For a simple proof (with worse constants) we refer the reader to

Proof of Proposition 4.4 For simplicity, throughout this proof we will use the letter CC to denote absolute constants, whose values may change from line to line.

For B>1B>1 (to be specified later) consider

where the last line follows from Corollary 4.7. The function ttBt\mapsto|t|\wedge B is a contraction, so

Each of the obtained three terms is estimated separately, with the first term already discussed in (4.5) and (4.1). By (2.3) and Chebyshev’s inequality we have

Together with the previous inequalities this implies that

Thus it remains to estimate supySn1i=1NXi,yp1{Xi,yB}\sup_{y\in S^{n-1}}\sum_{i=1}^{N}|\langle X_{i},y\rangle|^{p}\mathbf{1}_{\{|\langle X_{i},y\rangle|\geq B\}}. To this end we use Theorem 3.6 and Remark 3.10. It follows that for s1s\geq 1, with probability at least 1ecsn1-e^{-cs\sqrt{n}}, we have, for all mNm\leq N and all zSN1z\in S^{N-1} with suppz=m|\mathop{\rm supp}z|=m,

For an arbitrary ySn1{y\in S^{n-1}} let EB=EB(y):={iN ⁣:Xi,yB}E_{B}=E_{B}(y):=\{i\leq N\colon|\langle X_{i},y\rangle|\geq B\}. Then, by (4.8),

we obtain (for a different absolute constant CC),

This combined with (4.8) implies, after taking the pp’th powers and again adjusting constants, that with probability at least 1ecsn1-e^{-cs\sqrt{n}}, for all ySn1y\in S^{n-1},

Setting B=2Cslog(2N/n)B=2Cs\log(2N/n), so that (4.9) is satisfied, and combining the resulting estimate with (4.1), we get

This completes the proof of Proposition 4.4,

where c>0c>0 is a numerical constant. Thus AcnlogN\|A\|\geq\sqrt{c{n}\log N}. This example shows that the sub-exponential decay of linear forms (ψ1\psi_{1} norm bounded) is not sufficient for our problem.

2 Additional observations

For 1Nen1\leq N\leq e^{\sqrt{n}} let Γ\Gamma be a random N×nN\times n matrix with rows X1,,XNX_{1},\ldots,X_{N}. Then for p2p\geq 2, with probability at least 1ecpn1-e^{-c_{p}\sqrt{n}} (where cp>0c_{p}>0 depends only on pp),

with Cp>0C_{p}>0 depending only on pp. Moreover

On the other hand, a single row of Γ\Gamma has expected Euclidean norm of the order of n\sqrt{n} and a single column of Γ\Gamma has expected p\|\cdot\|_{p} norm of the order of c(p)N1/pc(p)N^{1/p}, so the left hand side of (4.11) follows trivially.

For 1Nen1\leq N\leq e^{\sqrt{n}} let Γ\Gamma be a random N×nN\times n matrix with rows X1,,XNX_{1},\ldots,X_{N}. Then for p[1,2)p\in[1,2), with probability at least 1ecn1-e^{-c\sqrt{n}} (where c>0c>0 is an absolute constant),

for some absolute constant C>0C>0. Moreover

Proof Inequality (4.12) and the right-hand side of (4.13) follow from the corresponding results for p=2p=2, since

To prove the left-hand side of (4.13), it is enough to notice that if 1/p+1/p=11/p^{\ast}+1/p=1, then

One can also obtain an almost-isometric result for p[1,2)p\in[1,2).

Moreover, one can take C(ε,t)=Ct2pε2log2p2(2t2pε2)C(\varepsilon,t)=Ct^{2p}\varepsilon^{-2}\log^{2p-2}(2t^{2p}\varepsilon^{-2}), where C>0C>0 is an absolute constant.

Proof Since the proof differs only by technical details from the corresponding argument for p2p\geq 2, we will just indicate the necessary changes. We will use the notation from the proof of Proposition 4.4.

(the constants in the exponents can be made independent of pp, since now pp runs over a bounded interval). This allows us to finish the proof.

The isomorphic result for p=1p=1 was proven in . The same paper also considers p(0,1)p\in(0,1).

3 Elementary approach for p=2𝑝2p=2

As announced earlier we will now briefly describe a more elementary proof of Theorem 4.1 and Theorem 4.2 for p=2p=2. In this case, the classical Bernstein inequality and a net argument on the sphere may replace the contraction principle and concentration of measure for empirical processes, that have been used – via Lemma 4.8 – to prove (4.5). The remaining part of the proof is left unchanged.

The key point is the following well known observation:

We postpone the proof of this Lemma and pass to the proof of Theorems 4.1 and 4.2.

Fix a cεc\varepsilon-net N\mathcal{N} of Sn1S^{n-1} of cardinality at most (3/cε)n(3/c\varepsilon)^{n}, and B>0B>0 to be determined later. Pick an arbitrary ySn1y\in S^{n-1}.

For the reader’s convenience recall Bernstein’s inequality.

Let ZiZ_{i} be independent random variables, centered and such that Zia|Z_{i}|\leq a for all 1iN1\leq i\leq N. Put Z=1Ni=1NZiZ={\frac{1}{N}}\sum_{i=1}^{N}Z_{i}. Then for all τ0\tau\geq 0,

Setting τ=tBn/N\tau=tB\sqrt{n/N} we infer that

Using this estimate with B=Ctlog(2N/n)B=Ct\log(2N/n) and handling the unbounded part the same way as in Proposition 4.4 (see the argument that follows (4.5)) we obtain

This corresponds to the estimates in Proposition 4.4 (for s=ts=t).

Now, for NC(ε,t)nN\geq C(\varepsilon,t)n, and C(ε,t)C(\varepsilon,t) sufficiently large, the right hand side of (4.3) is at most ε\varepsilon and 5/cε2N/n5/c\varepsilon\leq 2N/n which leads to the probability above to be at least 1exp(ctn)1-\exp(-ct\sqrt{n}). So with the same probability we get

The triangle inequality and homogeneity of \|\cdot\| imply, by a standard argument, that

To get a lower estimate, write an arbitrary ySn1y\in S^{n-1} in the form y=y1+cεy2y=y_{1}+c\varepsilon y_{2}, with y1Ny_{1}\in\mathcal{N} and y2Sn1y_{2}\in S^{n-1}. Then yy1cεy2(1ε)cε(1+δ)1δ1\|y\|\geq\|y_{1}\|-c\varepsilon\|y_{2}\|\geq(1-\varepsilon)-c\varepsilon(1+\delta)\geq 1-\delta_{1}, where

Thus for all ySn1y\in S^{n-1}, y1c1ε|\|y\|-1|\leq c_{1}\varepsilon for some c1c_{1} depending only on cc. In particular y[0,1+c1]\|y\|\in[0,1+c_{1}]. Using the fact that the function tt2t\mapsto t^{2} is Lipschitz with constant 2(1+c1)2(1+c_{1}) on the interval [0,1+c1][0,1+c_{1}], we conclude that

where c=2c1(1+c1)c^{\prime}=2c_{1}(1+c_{1}) depends only on cc.

References