Lower bounds on the smallest eigenvalue of a sample covariance matrix

Pavel Yaskov

Introduction

Lower bounds on the smallest eigenvalue of a sample covariance matrix (or a Gram matrix) play a crucial role in the least squares problems in high-dimensional statistics (see, for example, ). These problems motivate the present work.

In this paper we derive sharp lower bounds for λp(n1XpnXpn)\lambda_{p}(n^{-1}\mathbf{X}_{pn}\mathbf{X}_{pn}^{\top}), where λp(A)\lambda_{p}(A) is the smallest eigenvalue of a p×pp\times p matrix AA. We try to impose as few restrictions on the components of XpX_{p} as possible. In proofs we use the same strategy as in .

Main results

whenever yC2Kp2y\leqslant C_{2}K_{p}^{2} and Z=Z(p,n)Z=Z(p,n) as above.

Useful bounds for cp(a)c_{p}(a) and Cp(a)C_{p}(a) in terms of Lp(α)L_{p}(\alpha) and Mp(α)M_{p}(\alpha) are given in the following proposition.

In addition, for all α(0,2]\alpha\in(0,2] and each a>0a>0, Cp(a)C_{p}(a) is bounded from above by

Applications

We now describe different corollaries of Theorem 2.1 and Theorem 2.2. The next corollary extends Theorem 1.3 in and Theorem 3.1 in (for Ai=XpiXpiA_{i}=X_{pi}X_{pi}^{\top}).

One may further weaken assumptions in Corollary 3.1. Namely, one may assume that Mp(α)<M_{p}(\alpha)<\infty for some α(0,2).\alpha\in(0,2). The conclusion of Corollary 3.1 will still hold with some Cα>0C_{\alpha}>0 that depends only on α\alpha and Mp(α)M_{p}(\alpha). In the case α=2\alpha=2, one would have a lower bound of the form 1C2ylog(e/y)1-C_{2}\sqrt{y\log(e/y)} with C2>0C_{2}>0 depending only on Mp(2).M_{p}(2).

Theorems 2.1 and 2.2 improve Theorem 2.1 in as the next corollary shows.

The same conclusion holds if Lp(2)<L_{p}(2)<\infty and n16Lp(2)ε2pn\geqslant 16L_{p}(2)\varepsilon^{-2}p.

Let us formulate the final corollary that improves Theorem 3.1 in for small KpK_{p}.

The range of applicability of Corollary 3.4 is very wide. Namely, there exist some universal constant K>0K>0 such that KpKK_{p}\geqslant K for a very large class of isotropic random vectors XpX_{p}. By Corollary 3.4, this means that λp(n1XpnXpn)\lambda_{p}(n^{-1}\mathbf{X}_{pn}\mathbf{X}_{pn}^{\top}) is separated from zero by an universal constant.

The existence of KK follows from results related to Kashin’s decomposition theorem. The infinite dimensional version of this theorem is given in Kashin (for a proof, see ). It states the following.

There is an universal constant K>0K>0 such that L2(0,1)=H1H2L_{2}(0,1)=H_{1}\oplus H_{2} for some linear subspaces of HiL2(0,1),H_{i}\subset L_{2}(0,1), i=1,2,i=1,2, such that x1Kx2\|x\|_{1}\geqslant K\|x\|_{2} for all xH1H2,x\in H_{1}\cup H_{2}, where xd\|x\|_{d} is the standard norm in Ld(0,1)L_{d}(0,1), d=1,2d=1,2.

Proofs.

In proofs of Theorem 2.1 and Theorem 2.2, we follow the strategy of Srivastava and Vershynin . The key step is the following lemma.

The proof of Lemma 4.1 is given in Appendix.

The strategy itself consists in the following. Let A0A_{0} be a p×pp\times p zero matrix and

Put lk=lk1+Δkl_{k}=l_{k-1}+\Delta_{k} for 1kn1\leqslant k\leqslant n, where

Let UU and VV be non-negative random variables. Then, for all a>0a>0,

Proof of Theorem 2.1. Take in Lemma 4.3 Xp=Xpk,X_{p}=X_{pk},

hereinafter all inequalities with conditional mathematical expectations hold almost surely. By (2), the latter implies that

Assume first that C2=Lp(2)<C^{2}=L_{p}(2)<\infty and p/nyp/n\leqslant y for some y>0y>0. Define XpAXpX_{p}^{\top}AX_{p} and XpBXpX_{p}^{\top}BX_{p} in the same way as in (3). Then, by Lemma 4.3 (with φ=1/(3b)\varphi=1/(3b)),

Taking φ=y/(2C)\varphi=\sqrt{y}/(2C) in (2), we get p/(nφ)y/φ=2Cyp/(n\varphi)\leqslant y/\varphi=2C\sqrt{y} and

Finally, consider the case with Kp>0K_{p}>0 ( the case with Kp=0K_{p}=0 is trivial). By Lemma 4.3 with b=(3φ)1b=(3\varphi)^{-1} and φ=1/4\varphi=1/4,

Taking p/ny=Kp2/16p/n\leqslant y=K_{p}^{2}/16 in (2), we get

Taking y=La1α/2y=La^{-1-\alpha/2}, we get the desired inequality.

Consider the case α=2.\alpha=2. By Theorem 2.2 with y=p/ny=p/n and C=Lp(2)C=\sqrt{L_{p}(2)},

Proof of Corollary 3.3. Set L=Lp(α)L=L_{p}(\alpha) for given α(0,2)\alpha\in(0,2). By Proposition 2.3,

Similarly, taking y=ε2/(16C2)y=\varepsilon^{2}/(16C^{2}) for C=Lp(2)C=\sqrt{L_{p}(2)} in Theorem 2.2, we get that

Proof of Corollary 3.4. Let C0,C1,C2>0C_{0},C_{1},C_{2}>0 be such that the second bound in Theorem 2.2 holds. Then, for p/nC2Kp2,p/n\leqslant C_{2}K_{p}^{2},

Putting C0=C0/2,C_{0}^{*}=C_{0}/2, C1=C02/(8C12)C_{1}^{*}=C_{0}^{2}/(8C_{1}^{2}) and C2=C2C_{2}^{*}=C_{2}, we finish the proof.

Appendix

Proof of Lemma 4.1. By Lemma 2.2 in Srivastava and Vershynin , if A(l+Δ)Ip0A-(l+\Delta)I_{p}\succ 0 and q(l+Δ,v)/[1+Q(l+Δ,v)]Δq(l+\Delta,v)/[1+Q(l+\Delta,v)]\geqslant\Delta, then

since Δ1/(3φ)\Delta\leqslant 1/(3\varphi) by construction.

By Bernoulli’s inequality, (1x)313x(1-x)^{3}\geqslant 1-3x whenever xx\in. Hence,

where the last equality holds by the definition of Δ.\Delta. Proof of Lemma 4.2. We have

for all a>0.a>0. By the Cauchy-Schwartz inequality,

This gives the first inequality. Tending aa to infinity, we get the second inequality.

The last inequality also follows from the Cauchy-Schwartz inequality. Namely,

Fix j{1,,p}j\in\{1,\ldots,p\} and b>0.b>0. By Lemma 4.2,

We have x2/(x+c)xcx^{2}/(x+c)\geqslant x-c for all x,c0x,c\geqslant 0. This yields that

Hence, C5Cp(a)/3.C\leqslant 5C_{p}(a)/3. Combining all estimates together yields

Consider the function f(x)=x2/(1+b1x)f(x)=x^{2}/(1+b^{-1}x), x0x\geqslant 0. Its derivative

Now consider the case with Lp(2)<L_{p}(2)<\infty. By Lemma 4.2,

To finish the proof, we only need to note that

Proof of Lemma 4.4. Since ex1x+x2/2e^{-x}\leqslant 1-x+x^{2}/2 for all x0,x\geqslant 0, we have

References