Bounding the smallest singular value of a random matrix without concentration
Vladimir Koltchinskii, Shahar Mendelson
Introduction
The non-asymptotic theory of random matrices has attracted much attention in the recent years. One of the main questions studied in this area has to do with the behaviour of the singular values of various random matrix ensembles.
Various attempts have been made to find conditions on that ensure that the extremal singular values of satisfy a non-asymptotic version of the classical Bai-Yin Theorem , formulated here.
Let be an random matrix with independent entries, distributed according to a random variable , for which
If and the aspect ratio converges to , then
almost surely. Also, without the fourth moment assumption, is almost surely unbounded.
In the non-asymptotic theory of random matrices, one is interested in a quantitative Bai-Yin type estimate: that for , with high probability, the extremal singular values of satisfy that
for suitable absolute constants and
The main motivation for this note was the conjecture that a simultaneous estimate on the largest and smallest singular values is misleading, because their roles are very different. More precisely, that while any nontrivial estimate on the largest singular value does require concentration of some kind, usually given via an assumption on the Euclidean norm of , the smallest singular value should be bounded away from zero with almost no assumptions on the random vector.
A result in this direction is due to Srivastava and Vershynin . We state it in a slightly modified form consistent with the notations of Theorem 1.1:
Suppose that for some Then,
for some constant depending only on
Theorem 1.2 implies that a slightly stronger condition than isotropicity suffices to ensure that for that is proportional to , the smallest singular value is bounded away from zero, with constant probability. However, when , which is the ‘Bai-Yin’ range, or even when is arbitrarily large, Theorem 1.2 does not lead to a Bai-Yin type of lower bound
Here, we will show that Theorem 1.2 can be improved in the entire range of . In particular, one has the following non-asymptotic lower bound:
Suppose that for some Then:
1. For with probability at least ,
2. For with probability at least 1-\exp\Bigl{(}-c_{3}N\beta\log(1/\beta)\Bigr{)},
3. For with probability at least ,
In particular, in the ‘Bai-Yin’ range of , one recovers the optimal behaviour (up to the multiplicative constants) of the smallest singular value of , and with a high probability (not only in expectation), for every .
Somewhat surprisingly, the proof of this stronger result is much easier than the proof of Theorem 1.2 from .
The second main result has to do with the case , in which there is no additional moment information on linear forms.
It turns out that under rather minimal assumptions, one may bound the smallest singular value well away from zero.
In fact, Theorem 1.4 can be improved even further. It is reasonable to expect that there are only two situations in which the smallest singular value of is ‘too close’ to zero. Firstly, when the distribution of is very ‘peaky’ and almost atomic, which causes degeneracy. And secondly, when the covariance operator of is degenerate in some sense.
A more general version of Theorem 1.4 shows that this belief is indeed true. The main condition used to quantify the non-degeneracy of is that the following ‘small-ball’ function (for one-dimensional projections of )
is bounded away from zero for some .
It turns out that having for some , combined with an additional minimal condition on the covariance operator of suffices to ensure that , for a constant that depends on , and the covariance, with probability that grows as . And, the isotropicity of and the equivalence between the and norms in Theorem 1.4 are only there to ensure that is bounded from below at some level, that depends only on .
We end this introduction with a word about notation. Throughout this note, all absolute constants are denoted by or . Their value may change from line to line. We write if the constant depends only on the parameter ; means that there is an absolute constant for which , and if depends only on . A similar convention is used for and , in which one has a two-sided inequality.
Two main estimates
In this section, we study the problem in a more abstract setting. Let be a class of functions on a probability space and set to be the empirical measure , based on a sample of independent random variables with a common distribution .
We will derive lower bounds on that, when {\cal F}=\{\bigl{<}t,\cdot\bigr{>},t\in S^{n-1}\}, imply the results on the smallest singular value of matrix stated in the introduction.
Although the estimates presented here are one-sided, they will be referred to as ‘isomorphic’ when there is a constant for which for every , and as ‘almost isometric’ when can be made arbitrarily close to .
We begin with the main estimate needed to prove an ‘isomorphic’ bound - when .
where are independent, symmetric -valued random variables.
Given a class , let be as above and assume that there is some for which . If
Proof. First, note that by Markov’s inequality for the empirical measure , for every and every . Therefore,
Note that is a class of functions that is bounded by . Thus, by the bounded differences inequality for
(see, for example, ) and the Giné-Zinn symmetrization inequality , it follows that with probability at least ,
Moreover, is a Lipschitz function with constant ; therefore, by the contraction inequality for Rademacher sums (see, e.g. ),
Thus, with probability at least ,
Finally, if , set and put . If
then with probability at least ,
If one wishes to apply Theorem 2.1, one has to bound from below. One possibility is to use the following version of the Paley-Zygmund inequality (see, e.g. ).
Let be a function on the probability space . For every for which and every ,
If and for some , then for every ,
Theorem 2.1 can only lead to an ‘isomorphic’ type of lower bound, since is always smaller than . However, we will show in the next section that in some cases it is possible to obtain an ‘almost isometric’ bound as well.
2 A sharp lower bound
Here, we will prove a high probability ‘almost isometric’ lower bound, of the form , for certain subsets of the unit sphere in .
Let be a class of subsets of . A set is shattered by if for every there is some for which if , and otherwise.
The maximal cardinality of a subset of that is shattered by is called the VC dimension of , and is denoted by .
We refer the reader to for basic facts on VC classes and on the VC-dimension.
Let be a subset of the unit sphere. Assume that
1. there is a constant for which, for every ,
2. The collection of sets is a VC class of sets of VC-dimension at most .
For every and there exist constants , that depend only on and for which the following holds. Assume that satisfies Assumption 2.1 with constants , and , and that for some . Then, the following statements hold:
1. If then with probability at least
2. If then with probability at least 1-\exp\Bigl{(}-\kappa_{3}N\beta\log(1/\beta)\Bigr{)},
3. If then with probability at least 1-\exp\Bigl{(}-\kappa_{5}N\beta\log(1/\beta)\Bigr{)},
We will present a detailed proof of the first part of Theorem 2.5. Since the other two parts are almost identical to the first one, we will only outline the minor differences in their proofs.
The following well-known fact regarding VC classes has a key role in the proof of Theorem 2.5.
There exists an absolute constant for which the following holds. Let be a class of sets, put and set . For every , with probability at least ,
Proof of Theorem 2.5. For , let and denote .
Recall that by Assumption 2.1, . Since , it follows that for . Moreover, setting , it is evident from the tail estimate in Assumption 2.1 that for ,
and trivially, that . Therefore, by Theorem 2.6, for every , with probability at least ,
for suitable absolute constants and . A similar estimate holds for .
Fix to be specified later and set . We will assume that , as the case is considerably simpler and is omitted.
Recall that for every , and thus
Observe that for every and ,
Hence, if is the smallest integer for which ,
Applying (2.2) and summing the probabilities, it is evident that with probability at least , for every
Under the assumption that for set for a constant to be selected later, and that depends only on and on , and let ; hence,
By a straightforward computation, combined with a proper choice of the constant in the definition of ,
To estimate the probability of that event, note that
Thus, , and with probability at least ,
One may take and and repeat the argument used in the case .
3. Finally, if , then (2.4) turns out to be
and one can again repeat the same argument used in the case , this time for the choice
for a constant that depends only on and .
The smallest singular value of a random matrix
Next, let us consider the case . The next result shows that the smallest singular value of is bounded from below in a more general setup than the one formulated in Theorem 1.4.
1. There exist constants for which a\leq\|\bigl{<}X,t\bigr{>}\|_{L_{2}}\leq A for every .
2. There exists a constant that satisfies that \|\bigl{<}X,t\bigr{>}\|_{L_{2}}\leq B\|\bigl{<}X,t\bigr{>}\|_{L_{1}} for every .
The first part of Assumption 3.1 simply states that the covariance operator of is non-degenerate. The second is the additional component that allows one to bound the ‘small-ball’ function from below.
There exist absolute constants , and for which the following holds. If satisfies Assumption 3.1 and , then with probability at least , .
Proof. Let {\cal F}=\left\{\bigl{<}t,\cdot\bigr{>}:t\in S^{n-1}\right\} and let be the distribution of Using the notation of Corollary 2.3,
and setting , it follows that .
These estimates, combined with Theorem 2.1, show that if , there are absolute constants and , for which, with probability at least ,