Bounding the smallest singular value of a random matrix without concentration

Vladimir Koltchinskii, Shahar Mendelson

Introduction

The non-asymptotic theory of random matrices has attracted much attention in the recent years. One of the main questions studied in this area has to do with the behaviour of the singular values of various random matrix ensembles.

Various attempts have been made to find conditions on XX that ensure that the extremal singular values of Γ\Gamma satisfy a non-asymptotic version of the classical Bai-Yin Theorem , formulated here.

Let A=AN,nA=A_{N,n} be an N×nN\times n random matrix with independent entries, distributed according to a random variable ξ\xi, for which

If N,nN,n\to\infty and the aspect ratio n/Nn/N converges to β(0,1]\beta\in(0,1], then

almost surely. Also, without the fourth moment assumption, λmax(A)/N\lambda_{\max}(A)/\sqrt{N} is almost surely unbounded.

In the non-asymptotic theory of random matrices, one is interested in a quantitative Bai-Yin type estimate: that for Nc1nN\geq c_{1}n, with high probability, the extremal singular values of Γ\Gamma satisfy that

for suitable absolute constants c1c_{1} and c2.c_{2}.

The main motivation for this note was the conjecture that a simultaneous estimate on the largest and smallest singular values is misleading, because their roles are very different. More precisely, that while any nontrivial estimate on the largest singular value does require concentration of some kind, usually given via an assumption on the Euclidean norm of XX, the smallest singular value should be bounded away from zero with almost no assumptions on the random vector.

A result in this direction is due to Srivastava and Vershynin . We state it in a slightly modified form consistent with the notations of Theorem 1.1:

Suppose that n/Nβn/N\leq\beta for some β(0,1].\beta\in(0,1]. Then,

for some constant c1>0c_{1}>0 depending only on η,L.\eta,L.

Theorem 1.2 implies that a slightly stronger condition than isotropicity suffices to ensure that for NN that is proportional to nn, the smallest singular value is bounded away from zero, with constant probability. However, when η>2\eta>2, which is the ‘Bai-Yin’ range, or even when η\eta is arbitrarily large, Theorem 1.2 does not lead to a Bai-Yin type of lower bound 1c1β.1-c_{1}\sqrt{\beta}.

Here, we will show that Theorem 1.2 can be improved in the entire range of η\eta. In particular, one has the following non-asymptotic lower bound:

Suppose that n/Nβn/N\leq\beta for some β(0,1].\beta\in(0,1]. Then:

1. For η>2,\eta>2, with probability at least 1c0log(e/β)exp(c1Nβ)1-c_{0}\log(e/\beta)\exp(-c_{1}N\beta),

2. For η=2,\eta=2, with probability at least 1-\exp\Bigl{(}-c_{3}N\beta\log(1/\beta)\Bigr{)},

3. For 0<η<2,0<\eta<2, with probability at least 1exp(c5Nβlog(1/β))1-\exp(-c_{5}N\beta\log(1/\beta)),

In particular, in the ‘Bai-Yin’ range of η>2\eta>2, one recovers the optimal behaviour (up to the multiplicative constants) of the smallest singular value of Γ\Gamma, and with a high probability (not only in expectation), for every Nc(β)N\geq c(\beta).

Somewhat surprisingly, the proof of this stronger result is much easier than the proof of Theorem 1.2 from .

The second main result has to do with the case η=0\eta=0, in which there is no additional moment information on linear forms.

It turns out that under rather minimal assumptions, one may bound the smallest singular value well away from zero.

In fact, Theorem 1.4 can be improved even further. It is reasonable to expect that there are only two situations in which the smallest singular value of Γ\Gamma is ‘too close’ to zero. Firstly, when the distribution of XX is very ‘peaky’ and almost atomic, which causes degeneracy. And secondly, when the covariance operator of XX is degenerate in some sense.

A more general version of Theorem 1.4 shows that this belief is indeed true. The main condition used to quantify the non-degeneracy of XX is that the following ‘small-ball’ function (for one-dimensional projections of XX)

is bounded away from zero for some u>0u>0.

It turns out that having Q(u)>0Q(u)>0 for some u>0u>0, combined with an additional minimal condition on the covariance operator of X,X, suffices to ensure that λmin(Γ)c\lambda_{\min}(\Gamma)\geq c, for a constant cc that depends on uu, Q(u)Q(u) and the covariance, with probability that grows as 1exp(c1N)1-\exp(-c_{1}N). And, the isotropicity of XX and the equivalence between the L1L_{1} and L2L_{2} norms in Theorem 1.4 are only there to ensure that Q(u)Q(u) is bounded from below at some level, that depends only on LL.

We end this introduction with a word about notation. Throughout this note, all absolute constants are denoted by c,c1,...c,c_{1},... or κ,κ1,...\kappa,\kappa_{1},.... Their value may change from line to line. We write c(a)c(a) if the constant depends only on the parameter aa; ABA\lesssim B means that there is an absolute constant cc for which AcBA\leq cB, and AaBA\lesssim_{a}B if cc depends only on aa. A similar convention is used for ABA\sim B and AaBA\sim_{a}B, in which one has a two-sided inequality.

Two main estimates

In this section, we study the problem in a more abstract setting. Let F{\cal F} be a class of functions on a probability space (Ω,A,P)(\Omega,{\cal A},P) and set PNP_{N} to be the empirical measure N1i=1NδXiN^{-1}\sum_{i=1}^{N}\delta_{X_{i}}, based on a sample (X1,,XN)(X_{1},\dots,X_{N}) of independent random variables with a common distribution PP.

We will derive lower bounds on inffFPNf2\inf_{f\in{\cal F}}P_{N}f^{2} that, when {\cal F}=\{\bigl{<}t,\cdot\bigr{>},t\in S^{n-1}\}, imply the results on the smallest singular value of matrix Γ\Gamma stated in the introduction.

Although the estimates presented here are one-sided, they will be referred to as ‘isomorphic’ when there is a constant 0<c<10<c<1 for which PNf2cfL2(P)2P_{N}f^{2}\geq c\|f\|_{L_{2}(P)}^{2} for every fFf\in{\cal F}, and as ‘almost isometric’ when cc can be made arbitrarily close to 11.

We begin with the main estimate needed to prove an ‘isomorphic’ bound - when η=0\eta=0.

where (εi)i=1N(\varepsilon_{i})_{i=1}^{N} are independent, symmetric {1,1}\{-1,1\}-valued random variables.

Given a class F{\cal F}, let Q=QFQ=Q_{{\cal F}} be as above and assume that there is some τ>0\tau>0 for which Q(2τ)>0Q(2\tau)>0. If

Proof. First, note that by Markov’s inequality for the empirical measure PNP_{N}, fL2(PN)2u2PN{fu}\|f\|_{L_{2}(P_{N})}^{2}\geq u^{2}P_{N}\{|f|\geq u\} for every fFf\in{\cal F} and every u>0u>0. Therefore,

Note that {ϕu(f):fF}\{\phi_{u}(f):f\in F\} is a class of functions that is bounded by 11. Thus, by the bounded differences inequality for

(see, for example, ) and the Giné-Zinn symmetrization inequality , it follows that with probability at least 12e2t21-2e^{-2t^{2}},

Moreover, ϕ\phi is a Lipschitz function with constant 1u\frac{1}{u}; therefore, by the contraction inequality for Rademacher sums (see, e.g. ),

Thus, with probability at least 12e2t21-2e^{-2t^{2}},

Finally, if Q(2τ)>0Q(2\tau)>0, set u=τu=\tau and put t=Q(2τ)N4t=\frac{Q(2\tau)\sqrt{N}}{4}. If

then with probability at least 12exp(Q2(2τ)N/8)1-2\exp\left(-Q^{2}(2\tau)N/8\right),

If one wishes to apply Theorem 2.1, one has to bound QF(2τ)Q_{\cal F}(2\tau) from below. One possibility is to use the following version of the Paley-Zygmund inequality (see, e.g. ).

Let ff be a function on the probability space (Ω,P)(\Omega,P). For every p,q>1p,q>1 for which 1p+1q=1\frac{1}{p}+\frac{1}{q}=1 and every 0<u<fL1(P)0<u<\|f\|_{L_{1}(P)},

If α(F)>0\alpha({\cal F})>0 and βp(F)<\beta_{p}({\cal F})<\infty for some p>1p>1, then for every 0<u<α(F)0<u<\alpha({\cal F}),

Theorem 2.1 can only lead to an ‘isomorphic’ type of lower bound, since τ2Q(2τ)/2\tau^{2}Q(2\tau)/2 is always smaller than inffFfL2(P)2\inf_{f\in F}\|f\|_{L_{2}(P)}^{2}. However, we will show in the next section that in some cases it is possible to obtain an ‘almost isometric’ bound as well.

2 A sharp lower bound

Here, we will prove a high probability ‘almost isometric’ lower bound, of the form inffFPNf21δ\inf_{f\in{\cal F}}P_{N}f^{2}\geq 1-\delta, for certain subsets of the unit sphere in L2(P)L_{2}(P).

Let C{\cal C} be a class C{\cal C} of subsets of Ω\Omega. A set {x1,...,xn}\{x_{1},...,x_{n}\} is shattered by C{\cal C} if for every I{1,...,n}I\subset\{1,...,n\} there is some CCC\in{\cal C} for which xiCx_{i}\in C if iIi\in I, and xi∉Cx_{i}\not\in C otherwise.

The maximal cardinality of a subset of Ω\Omega that is shattered by C{\cal C} is called the VC dimension of C{\cal C}, and is denoted by VC(C)VC({\cal C}).

We refer the reader to for basic facts on VC classes and on the VC-dimension.

Let F{\cal F} be a subset of the L2(P)L_{2}(P) unit sphere. Assume that

1. there is a constant L>1L>1 for which, for every u>0u>0,

2. The collection of sets C={{f>u}:fF,u>0}{{\cal C}}=\{\{|f|>u\}:f\in{{\cal F}},u>0\} is a VC class of sets of VC-dimension at most dd.

For every η>0\eta>0 and L1,L\geq 1, there exist constants κi\kappa_{i}, i=0,1,...,6i=0,1,...,6 that depend only on LL and η\eta for which the following holds. Assume that F{\cal F} satisfies Assumption 2.1 with constants η\eta, LL and dd, and that d/Nβd/N\leq\beta for some β(0,1]\beta\in(0,1]. Then, the following statements hold:

1. If η>2,\eta>2, then with probability at least 1κ0log(e/β)exp(κ1Nβ),1-\kappa_{0}\log(e/\beta)\exp(-\kappa_{1}N\beta),

2. If η=2,\eta=2, then with probability at least 1-\exp\Bigl{(}-\kappa_{3}N\beta\log(1/\beta)\Bigr{)},

3. If 0<η<2,0<\eta<2, then with probability at least 1-\exp\Bigl{(}-\kappa_{5}N\beta\log(1/\beta)\Bigr{)},

We will present a detailed proof of the first part of Theorem 2.5. Since the other two parts are almost identical to the first one, we will only outline the minor differences in their proofs.

The following well-known fact regarding VC classes has a key role in the proof of Theorem 2.5.

There exists an absolute constant κ\kappa for which the following holds. Let C{\cal C} be a class of sets, put d=VC(C)d={\rm VC}({\cal C}) and set σ2=supCCP(C)\sigma^{2}=\sup_{C\in{\cal C}}P(C). For every t>0t>0, with probability at least 12exp(t)1-2\exp(-t),

Proof of Theorem 2.5. For j1j\geq 1, let Cj={{f>u}:fF, 2ju2j+1}{\cal C}_{j}=\{\{|f|>u\}:f\in{\cal F},\ 2^{j}\leq u\leq 2^{j+1}\} and denote C0={{f>u} : fF, 0<u1}{\cal C}_{0}=\{\{|f|>u\}\ :\ f\in{\cal F},\ 0<u\leq 1\}.

Recall that by Assumption 2.1, VC(C)dVC({\cal C})\leq d. Since CjC{\cal C}_{j}\subset{\cal C}, it follows that VC(Cj)d{\rm VC}({\cal C}_{j})\leq d for j=0,1,..j=0,1,... Moreover, setting σj=supCCj(P1/2(C))\sigma_{j}=\sup_{C\in{\cal C}_{j}}(P^{1/2}(C)), it is evident from the tail estimate in Assumption 2.1 that for j1j\geq 1,

and trivially, that σ01\sigma_{0}\leq 1. Therefore, by Theorem 2.6, for every j1j\geq 1, with probability at least 12exp(t)1-2\exp(-t),

for suitable absolute constants κ0\kappa_{0} and κ1\kappa_{1}. A similar estimate holds for j=0j=0.

Fix δ\delta to be specified later and set A=max{(L/ηδ)1/η,1}A=\max\{(L/\eta\delta)^{1/\eta},1\}. We will assume that A>1A>1, as the case A=1A=1 is considerably simpler and is omitted.

Recall that fL2(P)=1\|f\|_{L_{2}(P)}=1 for every fFf\in{\cal F}, and thus

Observe that for every fFf\in{\cal F} and u[2j,2j+1]u\in[2^{j},2^{j+1}],

Hence, if j0j_{0} is the smallest integer for which 2j0A2^{j_{0}}\geq A,

Applying (2.2) and summing the probabilities, it is evident that with probability at least 12(j0+1)exp(t)1-2(j_{0}+1)\exp(-t), for every 0jj00\leq j\leq j_{0}

Under the assumption that d/Nβd/N\leq\beta for β(0,1],\beta\in(0,1], set t=κ2Nβt=\kappa_{2}N\beta for a constant κ2\kappa_{2} to be selected later, and that depends only on LL and on η\eta, and let δ=β\delta=\sqrt{\beta}; hence,

By a straightforward computation, combined with a proper choice of the constant κ2\kappa_{2} in the definition of tt,

To estimate the probability of that event, note that

Thus, 2(j0+1)exp(t)κ3log(e/β)exp(κ2Nβ)2(j_{0}+1)\exp(-t)\leq\kappa_{3}\log(e/\beta)\exp(-\kappa_{2}N\beta), and with probability at least 1κ3log(e/β)exp(κ2Nβ)1-\kappa_{3}\log(e/\beta)\exp\left(-\kappa_{2}N\beta\right),

One may take δ=βlog3/2(1/β)\delta=\sqrt{\beta}\log^{3/2}(1/\beta) and t=κ5Nβlog(1/β)t=\kappa_{5}N\beta\log(1/\beta) and repeat the argument used in the case η>2\eta>2.

3. Finally, if η<2\eta<2, then (2.4) turns out to be

and one can again repeat the same argument used in the case η>2\eta>2, this time for the choice

for a constant κ6\kappa_{6} that depends only on LL and η\eta.

The smallest singular value of a random matrix

Next, let us consider the case η=0\eta=0. The next result shows that the smallest singular value of Γ\Gamma is bounded from below in a more general setup than the one formulated in Theorem 1.4.

1. There exist constants 0<a<A0<a<A for which a\leq\|\bigl{<}X,t\bigr{>}\|_{L_{2}}\leq A for every tSn1t\in S^{n-1}.

2. There exists a constant BB that satisfies that \|\bigl{<}X,t\bigr{>}\|_{L_{2}}\leq B\|\bigl{<}X,t\bigr{>}\|_{L_{1}} for every tSn1t\in S^{n-1}.

The first part of Assumption 3.1 simply states that the covariance operator of XX is non-degenerate. The second is the additional component that allows one to bound the ‘small-ball’ function QQ from below.

There exist absolute constants c0c_{0}, c1c_{1} and c2c_{2} for which the following holds. If XX satisfies Assumption 3.1 and Nc0B4(A/a)2nN\geq c_{0}B^{4}(A/a)^{2}n, then with probability at least 1exp(c1B4N)1-\exp(-c_{1}B^{4}N), λmin(Γ)c2a/B2\lambda_{\min}(\Gamma)\geq c_{2}a/B^{2}.

Proof. Let {\cal F}=\left\{\bigl{<}t,\cdot\bigr{>}:t\in S^{n-1}\right\} and let PP be the distribution of X.X. Using the notation of Corollary 2.3,

and setting τ=a/4B\tau=a/4B, it follows that Q(2τ)1/4B2Q(2\tau)\geq{1}/{4B^{2}}.

These estimates, combined with Theorem 2.1, show that if RN(F)τQ(2τ)/16R_{N}({\cal F})\leq\tau Q(2\tau)/16, there are absolute constants c1c_{1} and c2c_{2}, for which, with probability at least 1exp(c1N/B4)1-\exp(-c_{1}N/B^{4}),

References