Bounding the smallest singular value of a random matrix without concentration

Vladimir Koltchinskii, Shahar Mendelson

Introduction

The non-asymptotic theory of random matrices has attracted much attention in the recent years. One of the main questions studied in this area has to do with the behaviour of the singular values of various random matrix ensembles.

Various attempts have been made to find conditions on $X$ that ensure that the extremal singular values of $\Gamma$ satisfy a non-asymptotic version of the classical Bai-Yin Theorem , formulated here.

Let $A=A_{N,n}$ be an $N\times n$ random matrix with independent entries, distributed according to a random variable $\xi$ , for which

If $N,n\to\infty$ and the aspect ratio $n/N$ converges to $\beta\in(0,1]$ , then

almost surely. Also, without the fourth moment assumption, $\lambda_{\max}(A)/\sqrt{N}$ is almost surely unbounded.

In the non-asymptotic theory of random matrices, one is interested in a quantitative Bai-Yin type estimate: that for $N\geq c_{1}n$ , with high probability, the extremal singular values of $\Gamma$ satisfy that

for suitable absolute constants $c_{1}$ and $c_{2}.$

The main motivation for this note was the conjecture that a simultaneous estimate on the largest and smallest singular values is misleading, because their roles are very different. More precisely, that while any nontrivial estimate on the largest singular value does require concentration of some kind, usually given via an assumption on the Euclidean norm of $X$ , the smallest singular value should be bounded away from zero with almost no assumptions on the random vector.

A result in this direction is due to Srivastava and Vershynin . We state it in a slightly modified form consistent with the notations of Theorem 1.1:

Suppose that $n/N\leq\beta$ for some $\beta\in(0,1].$ Then,

for some constant $c_{1}>0$ depending only on $\eta,L.$

Theorem 1.2 implies that a slightly stronger condition than isotropicity suffices to ensure that for $N$ that is proportional to $n$ , the smallest singular value is bounded away from zero, with constant probability. However, when $\eta>2$ , which is the ‘Bai-Yin’ range, or even when $\eta$ is arbitrarily large, Theorem 1.2 does not lead to a Bai-Yin type of lower bound $1-c_{1}\sqrt{\beta}.$

Here, we will show that Theorem 1.2 can be improved in the entire range of $\eta$ . In particular, one has the following non-asymptotic lower bound:

Suppose that $n/N\leq\beta$ for some $\beta\in(0,1].$ Then:

1. For $\eta>2,$ with probability at least $1-c_{0}\log(e/\beta)\exp(-c_{1}N\beta)$ ,

2. For $\eta=2,$ with probability at least $1-\exp\Bigl{(}-c_{3}N\beta\log(1/\beta)\Bigr{)}$ ,

3. For $0<\eta<2,$ with probability at least $1-\exp(-c_{5}N\beta\log(1/\beta))$ ,

In particular, in the ‘Bai-Yin’ range of $\eta>2$ , one recovers the optimal behaviour (up to the multiplicative constants) of the smallest singular value of $\Gamma$ , and with a high probability (not only in expectation), for every $N\geq c(\beta)$ .

Somewhat surprisingly, the proof of this stronger result is much easier than the proof of Theorem 1.2 from .

The second main result has to do with the case $\eta=0$ , in which there is no additional moment information on linear forms.

It turns out that under rather minimal assumptions, one may bound the smallest singular value well away from zero.

In fact, Theorem 1.4 can be improved even further. It is reasonable to expect that there are only two situations in which the smallest singular value of $\Gamma$ is ‘too close’ to zero. Firstly, when the distribution of $X$ is very ‘peaky’ and almost atomic, which causes degeneracy. And secondly, when the covariance operator of $X$ is degenerate in some sense.

A more general version of Theorem 1.4 shows that this belief is indeed true. The main condition used to quantify the non-degeneracy of $X$ is that the following ‘small-ball’ function (for one-dimensional projections of $X$ )

is bounded away from zero for some $u>0$ .

It turns out that having $Q(u)>0$ for some $u>0$ , combined with an additional minimal condition on the covariance operator of $X,$ suffices to ensure that $\lambda_{\min}(\Gamma)\geq c$ , for a constant $c$ that depends on $u$ , $Q(u)$ and the covariance, with probability that grows as $1-\exp(-c_{1}N)$ . And, the isotropicity of $X$ and the equivalence between the $L_{1}$ and $L_{2}$ norms in Theorem 1.4 are only there to ensure that $Q(u)$ is bounded from below at some level, that depends only on $L$ .

We end this introduction with a word about notation. Throughout this note, all absolute constants are denoted by $c,c_{1},...$ or $\kappa,\kappa_{1},...$ . Their value may change from line to line. We write $c(a)$ if the constant depends only on the parameter $a$ ; $A\lesssim B$ means that there is an absolute constant $c$ for which $A\leq cB$ , and $A\lesssim_{a}B$ if $c$ depends only on $a$ . A similar convention is used for $A\sim B$ and $A\sim_{a}B$ , in which one has a two-sided inequality.

Two main estimates

In this section, we study the problem in a more abstract setting. Let ${\cal F}$ be a class of functions on a probability space $(\Omega,{\cal A},P)$ and set $P_{N}$ to be the empirical measure $N^{-1}\sum_{i=1}^{N}\delta_{X_{i}}$ , based on a sample $(X_{1},\dots,X_{N})$ of independent random variables with a common distribution $P$ .

We will derive lower bounds on $\inf_{f\in{\cal F}}P_{N}f^{2}$ that, when ${\cal F}=\{\bigl{<}t,\cdot\bigr{>},t\in S^{n-1}\}$ , imply the results on the smallest singular value of matrix $\Gamma$ stated in the introduction.

Although the estimates presented here are one-sided, they will be referred to as ‘isomorphic’ when there is a constant $0<c<1$ for which $P_{N}f^{2}\geq c\|f\|_{L_{2}(P)}^{2}$ for every $f\in{\cal F}$ , and as ‘almost isometric’ when $c$ can be made arbitrarily close to $1$ .

We begin with the main estimate needed to prove an ‘isomorphic’ bound - when $\eta=0$ .

where $(\varepsilon_{i})_{i=1}^{N}$ are independent, symmetric $\{-1,1\}$ -valued random variables.

Given a class ${\cal F}$ , let $Q=Q_{{\cal F}}$ be as above and assume that there is some $\tau>0$ for which $Q(2\tau)>0$ . If

Proof. First, note that by Markov’s inequality for the empirical measure $P_{N}$ , $\|f\|_{L_{2}(P_{N})}^{2}\geq u^{2}P_{N}\{|f|\geq u\}$ for every $f\in{\cal F}$ and every $u>0$ . Therefore,

Note that $\{\phi_{u}(f):f\in F\}$ is a class of functions that is bounded by $1$ . Thus, by the bounded differences inequality for

(see, for example, ) and the Giné-Zinn symmetrization inequality , it follows that with probability at least $1-2e^{-2t^{2}}$ ,

Moreover, $\phi$ is a Lipschitz function with constant $\frac{1}{u}$ ; therefore, by the contraction inequality for Rademacher sums (see, e.g. ),

Thus, with probability at least $1-2e^{-2t^{2}}$ ,

Finally, if $Q(2\tau)>0$ , set $u=\tau$ and put $t=\frac{Q(2\tau)\sqrt{N}}{4}$ . If

then with probability at least $1-2\exp\left(-Q^{2}(2\tau)N/8\right)$ ,

If one wishes to apply Theorem 2.1, one has to bound $Q_{\cal F}(2\tau)$ from below. One possibility is to use the following version of the Paley-Zygmund inequality (see, e.g. ).

Let $f$ be a function on the probability space $(\Omega,P)$ . For every $p,q>1$ for which $\frac{1}{p}+\frac{1}{q}=1$ and every $0<u<\|f\|_{L_{1}(P)}$ ,

If $\alpha({\cal F})>0$ and $\beta_{p}({\cal F})<\infty$ for some $p>1$ , then for every $0<u<\alpha({\cal F})$ ,

Theorem 2.1 can only lead to an ‘isomorphic’ type of lower bound, since $\tau^{2}Q(2\tau)/2$ is always smaller than $\inf_{f\in F}\|f\|_{L_{2}(P)}^{2}$ . However, we will show in the next section that in some cases it is possible to obtain an ‘almost isometric’ bound as well.

2 A sharp lower bound

Here, we will prove a high probability ‘almost isometric’ lower bound, of the form $\inf_{f\in{\cal F}}P_{N}f^{2}\geq 1-\delta$ , for certain subsets of the unit sphere in $L_{2}(P)$ .

Let ${\cal C}$ be a class ${\cal C}$ of subsets of $\Omega$ . A set $\{x_{1},...,x_{n}\}$ is shattered by ${\cal C}$ if for every $I\subset\{1,...,n\}$ there is some $C\in{\cal C}$ for which $x_{i}\in C$ if $i\in I$ , and $x_{i}\not\in C$ otherwise.

The maximal cardinality of a subset of $\Omega$ that is shattered by ${\cal C}$ is called the VC dimension of ${\cal C}$ , and is denoted by $VC({\cal C})$ .

We refer the reader to for basic facts on VC classes and on the VC-dimension.

Let ${\cal F}$ be a subset of the $L_{2}(P)$ unit sphere. Assume that

1. there is a constant $L>1$ for which, for every $u>0$ ,

2. The collection of sets ${{\cal C}}=\{\{|f|>u\}:f\in{{\cal F}},u>0\}$ is a VC class of sets of VC-dimension at most $d$ .

For every $\eta>0$ and $L\geq 1,$ there exist constants $\kappa_{i}$ , $i=0,1,...,6$ that depend only on $L$ and $\eta$ for which the following holds. Assume that ${\cal F}$ satisfies Assumption 2.1 with constants $\eta$ , $L$ and $d$ , and that $d/N\leq\beta$ for some $\beta\in(0,1]$ . Then, the following statements hold:

1. If $\eta>2,$ then with probability at least $1-\kappa_{0}\log(e/\beta)\exp(-\kappa_{1}N\beta),$

2. If $\eta=2,$ then with probability at least $1-\exp\Bigl{(}-\kappa_{3}N\beta\log(1/\beta)\Bigr{)},$

3. If $0<\eta<2,$ then with probability at least $1-\exp\Bigl{(}-\kappa_{5}N\beta\log(1/\beta)\Bigr{)}$ ,

We will present a detailed proof of the first part of Theorem 2.5. Since the other two parts are almost identical to the first one, we will only outline the minor differences in their proofs.

The following well-known fact regarding VC classes has a key role in the proof of Theorem 2.5.

There exists an absolute constant $\kappa$ for which the following holds. Let ${\cal C}$ be a class of sets, put $d={\rm VC}({\cal C})$ and set $\sigma^{2}=\sup_{C\in{\cal C}}P(C)$ . For every $t>0$ , with probability at least $1-2\exp(-t)$ ,

Proof of Theorem 2.5. For $j\geq 1$ , let ${\cal C}_{j}=\{\{|f|>u\}:f\in{\cal F},\ 2^{j}\leq u\leq 2^{j+1}\}$ and denote ${\cal C}_{0}=\{\{|f|>u\}\ :\ f\in{\cal F},\ 0<u\leq 1\}$ .

Recall that by Assumption 2.1, $VC({\cal C})\leq d$ . Since ${\cal C}_{j}\subset{\cal C}$ , it follows that ${\rm VC}({\cal C}_{j})\leq d$ for $j=0,1,..$ . Moreover, setting $\sigma_{j}=\sup_{C\in{\cal C}_{j}}(P^{1/2}(C))$ , it is evident from the tail estimate in Assumption 2.1 that for $j\geq 1$ ,

and trivially, that $\sigma_{0}\leq 1$ . Therefore, by Theorem 2.6, for every $j\geq 1$ , with probability at least $1-2\exp(-t)$ ,

for suitable absolute constants $\kappa_{0}$ and $\kappa_{1}$ . A similar estimate holds for $j=0$ .

Fix $\delta$ to be specified later and set $A=\max\{(L/\eta\delta)^{1/\eta},1\}$ . We will assume that $A>1$ , as the case $A=1$ is considerably simpler and is omitted.

Recall that $\|f\|_{L_{2}(P)}=1$ for every $f\in{\cal F}$ , and thus

Observe that for every $f\in{\cal F}$ and $u\in[2^{j},2^{j+1}]$ ,

Hence, if $j_{0}$ is the smallest integer for which $2^{j_{0}}\geq A$ ,

Applying (2.2) and summing the probabilities, it is evident that with probability at least $1-2(j_{0}+1)\exp(-t)$ , for every $0\leq j\leq j_{0}$

Under the assumption that $d/N\leq\beta$ for $\beta\in(0,1],$ set $t=\kappa_{2}N\beta$ for a constant $\kappa_{2}$ to be selected later, and that depends only on $L$ and on $\eta$ , and let $\delta=\sqrt{\beta}$ ; hence,

By a straightforward computation, combined with a proper choice of the constant $\kappa_{2}$ in the definition of $t$ ,

To estimate the probability of that event, note that

Thus, $2(j_{0}+1)\exp(-t)\leq\kappa_{3}\log(e/\beta)\exp(-\kappa_{2}N\beta)$ , and with probability at least $1-\kappa_{3}\log(e/\beta)\exp\left(-\kappa_{2}N\beta\right)$ ,

One may take $\delta=\sqrt{\beta}\log^{3/2}(1/\beta)$ and $t=\kappa_{5}N\beta\log(1/\beta)$ and repeat the argument used in the case $\eta>2$ .

3. Finally, if $\eta<2$ , then (2.4) turns out to be

and one can again repeat the same argument used in the case $\eta>2$ , this time for the choice

for a constant $\kappa_{6}$ that depends only on $L$ and $\eta$ .

The smallest singular value of a random matrix

Next, let us consider the case $\eta=0$ . The next result shows that the smallest singular value of $\Gamma$ is bounded from below in a more general setup than the one formulated in Theorem 1.4.

1. There exist constants $0<a<A$ for which $a\leq\|\bigl{<}X,t\bigr{>}\|_{L_{2}}\leq A$ for every $t\in S^{n-1}$ .

2. There exists a constant $B$ that satisfies that $\|\bigl{<}X,t\bigr{>}\|_{L_{2}}\leq B\|\bigl{<}X,t\bigr{>}\|_{L_{1}}$ for every $t\in S^{n-1}$ .

The first part of Assumption 3.1 simply states that the covariance operator of $X$ is non-degenerate. The second is the additional component that allows one to bound the ‘small-ball’ function $Q$ from below.

There exist absolute constants $c_{0}$ , $c_{1}$ and $c_{2}$ for which the following holds. If $X$ satisfies Assumption 3.1 and $N\geq c_{0}B^{4}(A/a)^{2}n$ , then with probability at least $1-\exp(-c_{1}B^{4}N)$ , $\lambda_{\min}(\Gamma)\geq c_{2}a/B^{2}$ .

Proof. Let ${\cal F}=\left\{\bigl{<}t,\cdot\bigr{>}:t\in S^{n-1}\right\}$ and let $P$ be the distribution of $X.$ Using the notation of Corollary 2.3,

and setting $\tau=a/4B$ , it follows that $Q(2\tau)\geq{1}/{4B^{2}}$ .

These estimates, combined with Theorem 2.1, show that if $R_{N}({\cal F})\leq\tau Q(2\tau)/16$ , there are absolute constants $c_{1}$ and $c_{2}$ , for which, with probability at least $1-\exp(-c_{1}N/B^{4})$ ,

Introduction

Two main estimates

2 A sharp lower bound

The smallest singular value of a random matrix

References