Quantitative estimates of the convergence of the empirical covariance matrix in Log-concave Ensembles

Radosław Adamczak, Alexander E. Litvak, Alain Pajor, Nicole Tomczak-Jaegermann

Introduction

This question was investigated in motivated by a problem of complexity in computing volume in high dimension. In particular the authors proved that

whenever $N\geq{\frac{C}{\varepsilon\delta}}n^{2}$ .

When random vectors are standard Gaussian, the covariance matrix is the identity and it is known (see the survey ) that (1.1) holds with high probability whenever $N\geq 4n/\varepsilon^{2}$ . This raises the question about the order of the best $N$ . In particular can it be proportional to $n$ , under reasonable assumptions? More precisely, the question in was phrased in the following setting.

Since for a symmetric matrix $M$ , one has $\|M\|=\sup_{y\in S^{n-1}}|\langle My,y\rangle|$ , (1.1) is implied by

In the case when the covariance matrix is the identity, it is equivalent to

Because of the linear invariance, there is no loss of generality in considering only the case when the covariance matrix is the identity.

In this framework, a breakthrough was achieved in where it was proved that for any $\varepsilon,\delta\in(0,1)$ , there exists $C(\varepsilon,\delta)>0$ such that if a body $K$ is isotropic then $N=C(\varepsilon,\delta)n\log^{3}n$ i.i.d. uniformly distributed points on $K$ satisfy (1.2). This estimate was further improved to $N=C(\varepsilon,\delta)n\log^{2}n$ in and to $N=C(\varepsilon,\delta)n\log n$ in and ; the former paper treated the case when $K$ is invariant under every reflection with respect to coordinate subspaces and the latter proved the estimate in full generality.

One should note that in all these results, the probability in (1.2) does not go to 1 as $n$ goes to infinity, as one expects in this type of high dimensional phenomena. This probability, $1-\delta$ , is given by a parameter $\delta$ and $C(\varepsilon,\delta)$ depends on it. Thus letting $\delta$ tend to zero may destroy the estimate on $N$ . To emphasize this important feature we will talk about overwhelming probability if the probability goes to 1 as $n$ goes to infinity.

The first result establishing (1.1) with overwhelming probability was given in . When a body $K$ is invariant under every reflection with respect to coordinate subspaces, it is proved in that for any $\varepsilon\in(0,1)$ there exists $C(\varepsilon)>0$ such that (1.5) holds whenever $N\geq C(\varepsilon)\,n$ , with probability going to 1 as $n$ goes to infinity. Finally, the present paper shows, as a consequence of our main results (Theorems 4.1 and 4.2), that the same is true for an arbitrary body $K$ (in the isotropic position).

To take still one more point of view, for arbitrary $n$ and $N$ , consider again $A=A^{(N)}$ . The set of $n\times n$ matrices may be equipped with the distribution of $AA^{*}$ to become a matrix probability space and, because of the analogy with Random Matrix Theory, in particular with the Wishart Ensemble, let us call it a Log-concave Ensemble.

In the last decades, in Asymptotic Geometric Analysis, considerable work and progress have been achieved in understanding the properties of random vectors with log-concave distribution, and more recently, in understanding spectral properties of random matrices with independent rows (or columns) with log-concave distribution. It appears that in high dimension they behave somewhat as if the coordinates were independent. This leads, by analogy with Random Matrix Theory, to questions on the spectrum of $AA^{*}$ similar to those for the Wishart Ensemble. One important difference is that now the entries are dependent but strongly structured by the log-concavity hypothesis.

Denote by $\lambda_{1}=\lambda_{1}(A^{(N)})\leq\cdots\leq\lambda_{n}=\lambda_{n}(A^{(N)})$ the eigenvalues of $AA^{*}$ (the squares of the singular values of $A$ ). It was proved in that when $n/N$ goes to $\beta\in(0,1)$ as $n,N\to\infty$ , then the empirical measures of the eigenvalues have a limit. It is the so-called Marchenko-Pastur distribution, as for the Wishart Ensemble when all entries of the matrix $A$ are i.i.d. It is also known () in the case when all the entries of $A$ are i.i.d. (with a finite fourth moment) and $\lim_{n\to+\infty}{\frac{n}{N}}=\beta\in(0,1)$ that $\lim\lambda_{1}/N=(1-\sqrt{\beta})^{2}$ and $\lim\lambda_{n}/N=(1+\sqrt{\beta})^{2}$ . One could conjecture that such results are also valid in the log-concave setting. Nevertheless, these results are asymptotic and not quantitative (given fixed dimension).
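For orientation, the two limiting edge values quoted above are easy to tabulate. The following is a minimal sketch in Python; the function name `marchenko_pastur_edges` is ours (hypothetical, not taken from the cited works), and the code only evaluates the limits $(1\pm\sqrt{\beta})^{2}$ stated for the i.i.d. case. It makes no claim about the log-concave setting.

```python
import math

def marchenko_pastur_edges(beta):
    """Return ((1 - sqrt(beta))**2, (1 + sqrt(beta))**2) for 0 < beta < 1.

    These are the limits of lambda_1 / N and lambda_n / N quoted in the
    text for matrices with i.i.d. entries, where beta = lim n / N.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    root = math.sqrt(beta)
    return (1.0 - root) ** 2, (1.0 + root) ** 2

if __name__ == "__main__":
    for beta in (0.01, 0.25, 0.5):
        low, high = marchenko_pastur_edges(beta)
        print(f"beta={beta}: lower edge {low:.4f}, upper edge {high:.4f}")
```

As $\beta$ decreases (that is, as $N$ grows relative to $n$), both edges approach 1, which is the asymptotic counterpart of the quantitative statements discussed next.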

Problem (1.5) is of course equivalent to quantitative estimates for $\lambda_{1}(A^{(N)})$ and $\lambda_{n}(A^{(N)})$ , that is of the support of the spectrum of $A$ . An answer is given by Proposition 4.4 where it is shown that for $n\leq N\leq\exp(\sqrt{n})$ ,

holds with probability larger than $1-\exp(-c\sqrt{n})$ , where $C,c>0$ are numerical constants. Thus, putting $\beta={\frac{n}{N}}\in(0,1)$ , we get

with overwhelming probability. As a consequence already mentioned earlier, $\|A\|\leq C(\sqrt{N}+\sqrt{n})$ with overwhelming probability, where $C>0$ is a numerical constant (Corollary 4.12).

Our general method follows an approach that can be traced back to Bourgain (cf. also ). It relies upon a crucial new ingredient of a novel chaining argument that in an essential way depends on the distribution of coordinates of a point on the unit sphere. What makes this approach work, by rather subtle estimates, is a special structure of the sets used for the chaining.

The paper is organized as follows. In the next Section 2 we present some definitions and preliminary tools. In Section 3 we study the norm of a restriction of the matrix $A=A^{(N)}$ defined by

We show in Theorem 3.6 that with overwhelming probability,

In Section 4.1 we prove the result announced in the abstract, answering a question from . This theorem appears as a particular case of a more general study of

defined for any $p\geq 1$ . Such processes have been studied in , and .

Notation and preliminaries

in other words, if $X$ is centered and its covariance matrix is the identity:

A straightforward computation shows that for every integer $p\geq 1$ ,

We can now state the sub-exponential decay of linear functionals in terms of $\psi_{1}$ norm :

where $\psi>0$ is a universal constant. Moreover, if $X$ has a symmetric distribution then $\psi=2$ .

The moreover part easily follows by a direct calculation (see ).

Putting together (2.2) and Lemma 2.3, we get that for every $y\in S^{n-1}$ ,

where $C$ is an absolute positive constant.

Norm of a random matrix

with probability at least $1-\exp(-K\sqrt{n})$ .

where $C$ and $c$ are absolute positive constants. The result follows by the union bound (and adjusting absolute constants).

Let $m\leq N$ , $\varepsilon,\alpha\in(0,1]$ and $L\geq 2m\log\frac{12eN}{m\varepsilon}$ . Then

Proof Denote the underlying probability space by $\Omega$ . For $F\subset\{1,...,N\}$ with $|F|\leq m$ , $E\subset F$ , and $z\in{\cal{N}}(F,\varepsilon,\alpha)$ , define the subset $\Omega(F,E,z)$ of $\Omega$ by

Fix $F$ , $E$ and $z$ as above and set $y=\sum_{j\in F\setminus E}z_{j}X_{j}$ . Clearly, $y$ is independent of vectors $X_{i}$ ’s, $i\in E$ , and $|y|\leq A_{m}$ . Note that $|y|>0$ on $\Omega(F,E,z)$ (otherwise $\left\langle z_{i}X_{i},y\right\rangle=0$ for all $i\in E$ and the sharp inequality defining $\Omega(F,E,z)$ would be violated). Thus, using the fact that $\|z\|_{\infty}\leq\alpha$ , we obtain

on $\Omega(F,E,z)$ . Since $A_{m}>0$ on $\Omega(F,E,z)$ , this implies

On the other hand, by Chebyshev’s inequality and the assumption on the $\psi_{1}$ -norms of linear functionals, the latter probability is less than

We will also need another lemma of a similar type. We provide the proof for the sake of completeness.

Let $1\leq k,m\leq N$ , $\varepsilon,\alpha\in(0,1]$ , $\beta>0$ , and $L>0$ . Let $B(m,\beta)$ denote the set of vectors $x\in\beta B_{2}^{N}$ with $|\mathop{\rm supp}x|\leq m$ and let $\cal{B}$ be a subset of $B(m,\beta)$ of cardinality $M$ . Then

Proof The proof is analogous to the argument in Lemma 3.3. For $F\subset\{1,...,N\}$ with $|F|\leq k$ , $x\in{\cal{B}}$ , and $z\in{\cal{N}}(F,\varepsilon,\alpha)$ consider

Fix $F$ , $x$ , $z$ as above and set $y=\sum_{j\not\in F}x_{j}X_{j}$ . Clearly, $y$ is independent of the vectors $X_{i}$ ’s, $i\in F$ ; moreover, $|y|\leq\beta A_{m}$ , and, similarly as before, $|y|>0$ on $\Omega(F,x,z)$ . Thus, using the fact that $\|z\|_{\infty}\leq\alpha$ , we obtain

on $\Omega(F,x,z)$ . Therefore, again as in Lemma 3.3, we have

This shows that the probability estimate in Theorem 3.6 is optimal up to numerical constants. The analysis of this example shows that up to numerical constants the logarithmic term in the estimate of $A_{m}$ in Theorem 3.6 is also optimal (for the details see ).

Letting $m=N$ we get a clearly optimal estimate for the operator norm $\|A\|$ , valid with overwhelming probability.

In the setting of Theorem 3.6 we get, for every $K\geq 1$ ,

with probability at least $1-e^{-cK\sqrt{n}}$ , where $C,c>0$ are absolute constants.

By Lemmas 2.3 and 3.1, and taking into account the normalization, this would imply a version of (3.1) with $N=n$ and probability $1-\delta$ .

Note that $\sqrt{n}+\sqrt{m}\log\frac{2N}{m}$ in the formula in Theorem 3.6 can be substituted with

Indeed, if $m\geq n$ there is nothing to prove, otherwise

There are absolute positive constants $C$ and $c$ such that for every $n\geq 1$ , $1\leq N\leq e^{\sqrt{n}}$ , $K\geq 1$ , and $X_{i}$ ’s as in Theorem 3.6 one has

Proof Given $E$ set $m=|E|$ . Consider vector $z\in S^{N-1}$ defined by $z_{i}=1/\sqrt{m}$ if $i\in E$ and $z_{i}=0$ otherwise. We have

Therefore Theorem 3.6 and Remark 3.7 imply the result.
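The reduction in this proof rests on the identity $|\sum_{i\in E}X_{i}|=\sqrt{m}\,|Az|$ for the normalized indicator vector $z$ of $E$ . The sketch below checks it on a toy example; the helper names and the four sample columns are hypothetical and chosen only for illustration, and indices are 0-based in the code.

```python
import math

def column_combination(columns, z):
    """Compute A z = sum_i z[i] * columns[i], with columns given as lists."""
    n = len(columns[0])
    out = [0.0] * n
    for zi, col in zip(z, columns):
        for r in range(n):
            out[r] += zi * col[r]
    return out

def euclid(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in v))

def normalized_indicator(N, E):
    """Vector z with z_i = 1/sqrt(|E|) for i in E and z_i = 0 otherwise."""
    m = len(E)
    return [1.0 / math.sqrt(m) if i in E else 0.0 for i in range(N)]

# Toy data: four columns X_1..X_4 in R^2 (illustrative values only).
columns = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [-1.0, 1.0]]
E = {0, 2, 3}
z = normalized_indicator(len(columns), E)

# Left side: |sum_{i in E} X_i|.  Right side: sqrt(m) * |A z|.
lhs = euclid([sum(columns[i][r] for i in E) for r in range(2)])
rhs = math.sqrt(len(E)) * euclid(column_combination(columns, z))
```

Since $z$ is a unit vector supported on $m$ coordinates, the right-hand side is controlled by $\sqrt{m}$ times the restricted norm studied in Theorem 3.6, which is how the bound on $A_{m}$ transfers to sums over $E$ .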

Proof of Theorem 3.6. As $N\leq e^{\sqrt{n}}$ , it is easy to see, by applying the union bound and adjusting absolute constants, that it is sufficient to prove that for $K$ sufficiently large and every fixed $m\leq N$ , one has

We shall define a set $\cal{M}$ of vectors with a special structure and supports of cardinality at most $m$ , which serves two purposes simultaneously: we will be able to estimate $\sup_{x\in\cal{M}}|Ax|$ with large probability, and we will use $\cal{M}$ to approximate an arbitrary vector from $B_{2}^{N}$ with support of cardinality at most $m$ . Then a standard argument will lead to the required estimate for $A_{m}$ .

Otherwise, let $l$ be the smallest integer such that

and fix positive integers $a_{0},a_{1},\ldots,a_{l}$ such that $a_{k}\leq m\,2^{-k+1}$ for $1\leq k\leq l$ and $a_{0}\leq m\,2^{-l}$ , and $\sum_{k=0}^{l}a_{k}=m$ . (We shall later set $a_{k}:=[m\,2^{-k+1}]-[m\,2^{-k}]$ for $1\leq k\leq l$ and $a_{0}:=[m\,2^{-l}]$ .)
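The explicit choice of the $a_{k}$ in parentheses can be checked mechanically. The sketch below, with the hypothetical helper name `dyadic_block_sizes`, computes $a_{0},\ldots,a_{l}$ from $m$ and a given $l$ (the value of $l$ is treated as an input here, not derived from condition (3.3)). The sum telescopes to $m$ , and each $a_{k}$ with $k\geq 1$ is at most $m\,2^{-k+1}$ .

```python
def dyadic_block_sizes(m, l):
    """Return [a_0, a_1, ..., a_l] for the dyadic splitting of m.

    a_0 = [m 2^{-l}] and a_k = [m 2^{-k+1}] - [m 2^{-k}] for 1 <= k <= l,
    where [.] is the integer part. Right shifts implement the integer
    parts exactly, since m >> k == floor(m / 2**k) for integers m >= 0.
    """
    if m < 1 or l < 0:
        raise ValueError("need m >= 1 and l >= 0")
    sizes = [m >> l]
    for k in range(1, l + 1):
        sizes.append((m >> (k - 1)) - (m >> k))
    return sizes

if __name__ == "__main__":
    print(dyadic_block_sizes(100, 3))
```

Note that $a_{0}$ computed this way is zero when $2^{l}>m$ , so the positivity of $a_{0}$ requires $l\leq\log_{2}m$ .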

Then set ${\cal{M}}={\cal{M}}_{0}\cap 2B_{2}^{N}$ , where ${\cal{M}}_{0}$ consists of all vectors of the form $x=\sum_{k=0}^{l}x_{k}$ , where $x_{i}$ ’s have disjoint supports and

Note that for every vector $x\in{\cal{M}}$ we have $|\mathop{\rm supp}x|\leq\sum_{0}^{l}a_{k}=m$ and $|x|\leq 2$ .

We shall consider the details of the case $m\log(48eN/m)>\sqrt{n}$ (the other case, when (3.2) holds, can be treated similarly; in fact it is even simpler, since the construction of $\cal{M}$ is simpler). Fix $x\in\cal{M}$ of the form $x=\sum_{k=0}^{l}x_{k}$ and let $F_{k}$ be the support of $x_{k}$ (if there is more than one such representation, we fix one of them). Denote the coordinates of $x$ by $x(i)$ , $i\leq N$ , then

Note that by Lemma 3.1, $\max_{i}|X_{i}|\leq C_{0}K\sqrt{n}$ with probability larger than $1-e^{-K\sqrt{n}}$ , and we would like to get a similar estimate for $D_{x}$ .

To this aim we split $D_{x}$ according to the structure of $x$ . Namely we let

where $G_{k}=\{0,k+1,k+2,\ldots,l\}$ . Note that

We first estimate $D^{\prime}_{x}$ . By Lemma 3.2 we obtain that for every $k$ there exists a subset $\bar{F}_{k}$ of $F_{k}$ such that

We now apply Lemma 3.3 to each summand in the sum above with $L=2K\sqrt{n}$ , $\varepsilon=1/4$ , $\alpha=1$ for the first summand (note that such an $L$ satisfies the condition) and with $L=\frac{4m}{2^{k}}K\log\frac{12eN4^{k}}{m}$ , $\varepsilon=2^{-k}$ , $\alpha=\sqrt{\frac{2^{k}}{m}}$ for $k\geq 1$ . By the union bound we obtain

where $\psi$ is the absolute constant from Lemma 2.3.

Therefore, the choice of $l$ implies the following bound, with some absolute positive constant $C$ ,

(We also used the estimate $l\leq 2\sqrt{n}$ , valid when $m\leq N\leq e^{\sqrt{n}}$ .)

The estimate for $D^{\prime\prime}_{x}$ essentially follows the same lines. In a sense it is simpler, since we don’t need to apply Lemma 3.2. For every $1\leq k\leq l$ we consider ${\cal{M}}_{k}={\cal{M}}_{k}^{\prime}\cap 2B_{2}^{N}$ , where ${\cal{M}}_{k}^{\prime}$ consists of all vectors of the form $x=x_{0}+\sum_{s=k+1}^{l}x_{s}$ , where $x_{i}$ ’s ( $i=0,k+1,\ldots,l$ ) have pairwise disjoint supports and

Then ${\cal{M}}_{k}\subset 2B_{2}^{N}$ and

Now we apply Lemma 3.4 to each summand with

As in the case for $D_{x}^{\prime}$ it follows that

where $C$ is the same absolute constant as above. Since $D_{x}=D^{\prime}_{x}+D^{\prime\prime}_{x}$ , then

Passing now to the approximation argument, pick an arbitrary $z\in S^{N-1}$ with $|\mathop{\rm supp}z|\leq m$ . Define the following subsets of $\{1,\ldots,N\}$ depending on $z$ . Denote the coordinates of $z$ by $z_{i}$ ( $i=1,\ldots,N$ ). Let $n_{1},\ldots,n_{N}$ be such that $|z_{n_{1}}|\geq|z_{n_{2}}|\geq\ldots\geq|z_{n_{N}}|$ , so that $z_{n_{i}}=0$ for $i>m$ (since $|\mathop{\rm supp}z|\leq m$ ). If condition (3.2) holds we denote the support of $z$ by $E_{0}$ and consider only this $E_{0}$ . Otherwise we set

where $l$ is the smallest integer satisfying (3.3) (as before). (For small values of $n$ it can happen that $E_{0}$ is empty, but it does not create any difficulty in the proof below.) Clearly, we have

and $\sum_{i=0}^{l}a_{i}=m$ . Note that the numbers $a_{k}$ ’s do not depend on $z$ , although the sets $E_{k}$ ’s do. Finally, since $z\in S^{N-1}$ , we also observe that for every $k\geq 1$ ,

Note that for every $k\geq 1$ the vector $P_{E_{k}}z$ can be approximated by a vector from ${\cal{N}}\left(E_{k},2^{-k},\sqrt{\frac{2^{k}}{m}}\right)$ and the vector $P_{E_{0}}z$ can be approximated by a vector from ${\cal{N}}(E_{0},1/4,1)$ . Thus there exists $x\in{\cal{M}}$ , with a suitable representation $x=\sum_{k=0}^{l}x_{k}$ , such that

Moreover, $x$ is chosen to have the same support as $z$ , and thus $w=z-x$ has the support $|\mathop{\rm supp}w|\leq m$ .
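The splitting of $z$ into blocks can be illustrated numerically. The sketch below assumes, as we read the construction, that $E_{0}$ collects the $[m2^{-l}]$ largest coordinates of $z$ in absolute value and that $E_{k}$ , $k\geq 1$ , collects those ranked between $[m2^{-k}]$ (exclusive) and $[m2^{-k+1}]$ (inclusive). The helper names are hypothetical. Under this reading, a coordinate of rank $i>m2^{-k}$ of a unit vector satisfies $i\,|z_{n_{i}}|^{2}\leq 1$ , which gives the sup-norm bound $\sqrt{2^{k}/m}$ on $E_{k}$ used to choose the nets.

```python
import math

def dyadic_blocks(z, m, l):
    """Split the indices of the m largest |z_i| into blocks E_0, ..., E_l.

    E_0 holds the [m 2^{-l}] largest coordinates. For k >= 1, E_k holds
    the indices whose rank i (1-based, decreasing |z_i|) satisfies
    [m 2^{-k}] < i <= [m 2^{-k+1}].
    """
    order = sorted(range(len(z)), key=lambda i: -abs(z[i]))[:m]
    blocks = [order[: m >> l]]
    for k in range(1, l + 1):
        blocks.append(order[m >> k : m >> (k - 1)])
    return blocks

def sup_norm_on(z, block):
    """Largest |z_i| over the given block (0.0 for an empty block)."""
    return max((abs(z[i]) for i in block), default=0.0)

def unit_vector(raw):
    """Normalize a nonzero list of floats to Euclidean norm one."""
    norm = math.sqrt(sum(t * t for t in raw))
    return [t / norm for t in raw]

if __name__ == "__main__":
    z = unit_vector([8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.0, 0.0])
    m, l = 8, 2
    for k, block in enumerate(dyadic_blocks(z, m, l)):
        print(k, block, sup_norm_on(z, block))
```

The block sizes produced this way coincide with the numbers $a_{k}$ , which is why they do not depend on $z$ even though the blocks themselves do.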

Considering all $z\in S^{N-1}$ with $|\mathop{\rm supp}z|\leq m$ it follows that

Recall that by (3.4) for every $x\in{\cal M}$ we have

where $c$ is an absolute positive constant. (In fact this estimate for probability requires that $n$ is sufficiently large, but, as $K\geq 1$ was arbitrary, we can adjust the constants.) This concludes the proof.

where $\kappa=\|T^{\ast}\|=\sqrt{\|\Sigma\|}$ (note that $\Sigma=TT^{\ast}$ ).

We conclude this section with a more technical variant of Theorem 3.6. Note that in particular it requires weaker conditions on $X_{i}$ ’s and does not require any bounds on $N$ .

Let $A$ be a random $n\times N$ matrix whose columns are $X_{i}$ ’s, and $A_{m}$ , $m\leq N$ , is defined as before. Then for every $1\leq m\leq N$ , every $0\leq l\leq\log m$ , and every $K\geq 1$ one has

where $C$ is an absolute constant. In particular, choosing $0\leq l\leq\log m$ to be the largest integer satisfying

Note that from the definitions we immediately have

For completeness we outline a proof of Theorem 3.13.

Proof (Sketch.) We proceed as in the proof of Theorem 3.6. So first we construct $\cal{M}$ . If $l=0$ we define $\cal{M}$ exactly as after formula (3.2), otherwise it will be constructed in the same way as it was constructed after formula (3.3) (note that now $l$ is a fixed number). Then we estimate $D_{x}=D_{x}^{\prime}+D_{x}^{\prime\prime}$ . As before we use Lemmas 3.3 and 3.4.

The only difference is that for the first summand in the formula for $D_{x}^{\prime}$ we use Lemma 3.3 with $L=4K\frac{m}{2^{l}}\log\frac{48eN2^{l}}{m}$ instead of $L=2K\sqrt{n}$ . It will give us that

Thus, with another absolute positive constant $C$ we have

Finally we apply the same approximation procedure. By (3.4) and approximation we get formula (3.6)

which implies the result, by adjusting constants, if necessary. The “in particular” part of the Theorem is trivial.

It is possible to extend Theorem 3.13 to a $\psi_{p}$ -setting, similar to the one considered in . Let $p\in$ and let $X$ be a random vector such that for some $\psi_{p}>0$ one has

for every $y\in S^{n-1}$ . Then, adjusting Lemmas 3.3 and 3.4, and repeating the proof of Theorem 3.13 we can get

However we will not pursue this direction here.

Kannan-Lovász-Simonovits question

First note that because of the linear invariance, (1.5) implies

Therefore without loss of generality we restrict ourselves to the case when the covariance matrix is the identity.

where $c>0$ is an absolute constant. Moreover, one can take $C(\varepsilon,t)=Ct^{4}\varepsilon^{-2}\log^{2}(2t^{2}\varepsilon^{-2})$ , where $C>0$ is an absolute constant.

This way approximating the covariance matrix becomes a special case of a more general problem, concerning the uniform approximation of the moments of one dimensional marginals of an isotropic log-concave measure by their empirical counterparts. In particular, Theorem 4.1 is implied by the following result.

Moreover, one can take $C(\varepsilon,t,p)=C_{p}t^{2p}\varepsilon^{-2}\log^{2p-2}(2t^{2}\varepsilon^{-2})$ , where $C_{p}$ depends only on $p$ .
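To see how the sample-size factor scales, the expression above can be evaluated directly. This is a minimal sketch: the function name `sample_factor` is hypothetical, and the constant $C_{p}$ is not specified in the text, so the default value 1.0 below is a placeholder rather than a claimed value. For $p=2$ the formula reduces to the form $t^{4}\varepsilon^{-2}\log^{2}(2t^{2}\varepsilon^{-2})$ given for Theorem 4.1.

```python
import math

def sample_factor(eps, t, p, C_p=1.0):
    """Evaluate C_p * t^(2p) * eps^(-2) * log^(2p-2)(2 t^2 eps^(-2)).

    C_p is an unspecified constant depending only on p; the default 1.0
    is a placeholder used only to display the dependence on eps and t.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    if t < 1.0:
        raise ValueError("t must be at least 1")
    log_term = math.log(2.0 * t * t / (eps * eps))
    return C_p * t ** (2 * p) / (eps * eps) * log_term ** (2 * p - 2)

if __name__ == "__main__":
    for eps in (0.5, 0.1, 0.01):
        print(eps, sample_factor(eps, t=1.0, p=2))
```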

with probability even higher than claimed. Thus in the proofs of both theorems we may assume without loss of generality that $N\leq\exp{(\sqrt{n})}$ .

In the first step of the proof of Theorem 4.2 we shall use some tools from probability in Banach spaces, in particular classical symmetrization and contraction methods as in and . These tools work for general empirical processes and are not necessary in our setting since we are dealing more specifically with powers of linear forms. We choose this approach, though, as it requires fewer computations and leads to a unified, simpler and more transparent presentation.

Theorem 4.2 is an easy consequence of the following technical proposition applied with $s=t$ .

In the setting of Theorem 4.2, if $n\leq N\leq e^{\sqrt{n}}$ , then for any $s,t\geq 1$ , the estimate

where $u=t^{2}s^{2p-2}n\log^{2p-2}(2N/n)$ , $v=ts^{-1}\sqrt{Nn}/\log(2N/n)$ , $C,c>0$ are absolute constants and $c_{p}>0$ depends on $p$ only.

The two parameters $s$ and $t$ play different roles in the proof and reflect different asymptotic behavior of the probability with which (4.4) holds. The first parameter $s$ is related to a level of truncation of linear forms whereas the second is a factor in the deviation when one deals only with the truncated part. For instance, by taking $s=t^{1/2}$ , it allows us to get a probability converging to one as $t\to\infty$ , if both dimensions are fixed.

Before we proceed to the proof of the above proposition, let us introduce some tools from the classical theory of probability in Banach spaces. Below, $\varepsilon_{1},\ldots,\varepsilon_{N}$ will always denote a sequence of independent Rademacher variables, independent of the sequence $X_{1},\ldots,X_{N}$ .

Let $\mathcal{F}$ be a family of functions, uniformly bounded by $B>0$ . Then for any independent random variables $X_{1},\ldots,X_{N}$ and any $p\geq 1$ , we have

We will also use the celebrated Talagrand’s concentration inequality for suprema of bounded empirical processes . The version from presented below, provides the best known constants in this inequality (we will however not take advantage of explicit constants). For a simple proof (with worse constants) we refer the reader to

Proof of Proposition 4.4 For simplicity, throughout this proof we will use the letter $C$ to denote absolute constants, whose values may change from line to line.

For $B>1$ (to be specified later) consider

where the last line follows from Corollary 4.7. The function $t\mapsto|t|\wedge B$ is a contraction, so

Each of the obtained three terms is estimated separately, with the first term already discussed in (4.5) and (4.1). By (2.3) and Chebyshev’s inequality we have

Together with the previous inequalities this implies that

Thus it remains to estimate $\sup_{y\in S^{n-1}}\sum_{i=1}^{N}|\langle X_{i},y\rangle|^{p}\mathbf{1}_{\{|\langle X_{i},y\rangle|\geq B\}}$ . To this end we use Theorem 3.6 and Remark 3.10. It follows that for $s\geq 1$ , with probability at least $1-e^{-cs\sqrt{n}}$ , we have, for all $m\leq N$ and all $z\in S^{N-1}$ with $|\mathop{\rm supp}z|=m$ ,

For an arbitrary ${y\in S^{n-1}}$ let $E_{B}=E_{B}(y):=\{i\leq N\colon|\langle X_{i},y\rangle|\geq B\}$ . Then, by (4.8),

we obtain (for a different absolute constant $C$ ),

This combined with (4.8) implies, after taking the $p$ ’th powers and again adjusting constants, that with probability at least $1-e^{-cs\sqrt{n}}$ , for all $y\in S^{n-1}$ ,

Setting $B=2Cs\log(2N/n)$ , so that (4.9) is satisfied, and combining the resulting estimate with (4.1), we get

This completes the proof of Proposition 4.4.

where $c>0$ is a numerical constant. Thus $\|A\|\geq\sqrt{c{n}\log N}$ . This example shows that the sub-exponential decay of linear forms ( $\psi_{1}$ norm bounded) is not sufficient for our problem.

Additional observations

For $1\leq N\leq e^{\sqrt{n}}$ let $\Gamma$ be a random $N\times n$ matrix with rows $X_{1},\ldots,X_{N}$ . Then for $p\geq 2$ , with probability at least $1-e^{-c_{p}\sqrt{n}}$ (where $c_{p}>0$ depends only on $p$ ),

with $C_{p}>0$ depending only on $p$ . Moreover

On the other hand, a single row of $\Gamma$ has expected Euclidean norm of the order of $\sqrt{n}$ and a single column of $\Gamma$ has expected $\|\cdot\|_{p}$ norm of the order of $c(p)N^{1/p}$ , so the left hand side of (4.11) follows trivially.

For $1\leq N\leq e^{\sqrt{n}}$ let $\Gamma$ be a random $N\times n$ matrix with rows $X_{1},\ldots,X_{N}$ . Then for $p\in[1,2)$ , with probability at least $1-e^{-c\sqrt{n}}$ (where $c>0$ is an absolute constant),

for some absolute constant $C>0$ . Moreover

Proof Inequality (4.12) and the right-hand side of (4.13) follow from the corresponding results for $p=2$ , since

To prove the left-hand side of (4.13), it is enough to notice that if $1/p^{\ast}+1/p=1$ , then

One can also obtain an almost-isometric result for $p\in[1,2)$ .

Moreover, one can take $C(\varepsilon,t)=Ct^{2p}\varepsilon^{-2}\log^{2p-2}(2t^{2p}\varepsilon^{-2})$ , where $C>0$ is an absolute constant.

Proof Since the proof differs only by technical details from the corresponding argument for $p\geq 2$ , we will just indicate the necessary changes. We will use the notation from the proof of Proposition 4.4.

(the constants in the exponents can be made independent of $p$ , since now $p$ runs over a bounded interval). This allows us to finish the proof.

The isomorphic result for $p=1$ was proven in . The same paper also considers $p\in(0,1)$ .

Elementary approach for $p=2$

As announced earlier we will now briefly describe a more elementary proof of Theorem 4.1 and Theorem 4.2 for $p=2$ . In this case, the classical Bernstein inequality and a net argument on the sphere may replace the contraction principle and concentration of measure for empirical processes, that have been used – via Lemma 4.8 – to prove (4.5). The remaining part of the proof is left unchanged.

The key point is the following well known observation:

We postpone the proof of this Lemma and pass to the proof of Theorems 4.1 and 4.2.

Fix a $c\varepsilon$ -net $\mathcal{N}$ of $S^{n-1}$ of cardinality at most $(3/c\varepsilon)^{n}$ , and $B>0$ to be determined later. Pick an arbitrary $y\in S^{n-1}$ .

For the reader’s convenience recall Bernstein’s inequality.

Let $Z_{i}$ be independent random variables, centered and such that $|Z_{i}|\leq a$ for all $1\leq i\leq N$ . Put $Z={\frac{1}{N}}\sum_{i=1}^{N}Z_{i}$ . Then for all $\tau\geq 0$ ,

Setting $\tau=tB\sqrt{n/N}$ we infer that

Using this estimate with $B=Ct\log(2N/n)$ and handling the unbounded part the same way as in Proposition 4.4 (see the argument that follows (4.5)) we obtain

This corresponds to the estimates in Proposition 4.4 (for $s=t$ ).
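The role of Bernstein's inequality in this step can be made concrete with a small calculator. The displayed form of the inequality is not reproduced here, so the sketch assumes the standard textbook version $\mathbb{P}(|Z|\geq\tau)\leq 2\exp\bigl(-N\tau^{2}/(2(\sigma^{2}+a\tau/3))\bigr)$ , where $\sigma^{2}$ bounds the average variance of the $Z_{i}$ ; the constants in the version used by the authors may differ. The function name `bernstein_tail` is hypothetical.

```python
import math

def bernstein_tail(N, sigma2, a, tau):
    """Standard Bernstein bound for P(|Z| >= tau), Z = (1/N) * sum of Z_i.

    Assumes the Z_i are independent and centered, |Z_i| <= a, and
    (1/N) * sum Var(Z_i) <= sigma2. The result is capped at 1, since the
    raw bound exceeds 1 for small tau.
    """
    if N < 1 or sigma2 < 0.0 or a <= 0.0:
        raise ValueError("need N >= 1, sigma2 >= 0 and a > 0")
    if tau <= 0.0:
        return 1.0
    exponent = -N * tau * tau / (2.0 * (sigma2 + a * tau / 3.0))
    return min(1.0, 2.0 * math.exp(exponent))

if __name__ == "__main__":
    for tau in (0.1, 0.5, 1.0):
        print(tau, bernstein_tail(N=1000, sigma2=1.0, a=10.0, tau=tau))
```

For small $\tau$ the variance term dominates and the tail is Gaussian-like in $\tau$ , while for large $\tau$ the bound $a$ takes over; this is why the truncation level $B$ enters the choice of $\tau$ above.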

Now, for $N\geq C(\varepsilon,t)n$ , and $C(\varepsilon,t)$ sufficiently large, the right hand side of (4.3) is at most $\varepsilon$ and $5/c\varepsilon\leq 2N/n$ , so that the probability above is at least $1-\exp(-ct\sqrt{n})$ . So with the same probability we get

The triangle inequality and homogeneity of $\|\cdot\|$ imply, by a standard argument, that

To get a lower estimate, write an arbitrary $y\in S^{n-1}$ in the form $y=y_{1}+c\varepsilon y_{2}$ , with $y_{1}\in\mathcal{N}$ and $y_{2}\in S^{n-1}$ . Then $\|y\|\geq\|y_{1}\|-c\varepsilon\|y_{2}\|\geq(1-\varepsilon)-c\varepsilon(1+\delta)\geq 1-\delta_{1}$ , where

Thus for all $y\in S^{n-1}$ , $|\|y\|-1|\leq c_{1}\varepsilon$ for some $c_{1}$ depending only on $c$ . In particular $\|y\|\in[0,1+c_{1}]$ . Using the fact that the function $t\mapsto t^{2}$ is Lipschitz with constant $2(1+c_{1})$ on the interval $[0,1+c_{1}]$ , we conclude that