Restricted isometry property of matrices with independent columns and neighborly polytopes by random sampling
Radosław Adamczak, Alexander E. Litvak, Alain Pajor, Nicole Tomczak-Jaegermann
Introduction
The connection between the neighborliness of and sparse solutions of underdetermined linear equations was discovered in , Theorem 1, where it is proved that the following two statements are equivalent:
has vertices and is -neighborly
whenever has a solution having at most non-zero coordinates (in other words is -sparse), then is the unique solution of the program:
Definition. Let be a matrix. For any , the isometry constant of is defined as the smallest number so that
The relevance of this parameter for the reconstruction property ii) is for instance revealed in ,, where it was shown that if then satisfies ii) (see also , , ). In the present paper, we shall use the following sufficient condition from : if a matrix satisfies
then i) and ii) are satisfied. In other words, if has RIP then has the reconstruction property ii). This approach gives the strategy of our paper.
Recall that no general construction of centrally symmetric polytopes is known to produce polytopes with an optimal order of neighborliness. All known results are of randomized nature, namely, they show that for a certain probability on the space of matrices, a polytope is -neighborly with overwhelming probability, for (large) depending on and . Consequently, from now on, will be a random matrix in some Ensemble in the sense of Random Matrix Theory. Due to the normalization, we shall consider the isometry constant of . The plan consists in specializing to some model of random matrices, the condition .
Linear forms obey a uniform sub-exponential decay, that is, for all , all , and ,
The Euclidean norms of are concentrated around their average:
Note that such a concentration inequality is clearly necessary in order to have RIP.
One of the main results of this paper, Theorem 4.3, claims that under these conditions, whenever
the random polytope is -centrally-neighborly with probability larger than , where are universal numerical constants. We will make it more precise in Section 4. This model includes the cases when
’s are independent isotropic random vectors with a log-concave density;
the entries of the matrix are independent, centered with variance one and satisfy a sub-exponential tail inequality;
’s are on the sphere of radius and linear forms exhibit a uniform sub-exponential tail inequality.
These examples give rise to new classes of compressed sensing matrices. The class of i.i.d. entries with sub-exponential tail behavior (that is, entries being random variables), contains a subclass of matrices with i.i.d. entries for (see Definition 2.1 below of random variables). Since in this case the obtained bounds are better by a power of logarithm that may be essential in applications, we prove our results in full generality, for .
Sub-gaussian matrices with independent entries, which correspond to , are by now well understood. They include for instance the Gaussian case when the matrix is built with i.i.d. Gaussian random variables (see ,,); the case when the entries of are i.i.d. Bernoulli random variables (, , ); a general case of i.i.d. sub-gaussian entries is treated in (,, also see for simpler proofs).
Results of this paper are based on concentration type inequalities for random matrices under consideration. The proof of the main technical result, Theorem 3.2, will employ methods from . A crucial new ingredient consists of an analysis of the quantity
Notation and preliminaries
It is well known that the -norm of a random variable may be estimated from the growth of the moments. More precisely if a random variable is such that for any , , for some , then where is a numerical constant.
Remark: The above notation of for the weak norm of a random vector should not be confused with the standard convention in the probability theory that this notation stands for the norm of the random variable , i.e., –this latter meaning will never be used in this paper.
We recall the well known Bernstein’s inequality which we shall use in the form of a estimate ().
Let be independent real random variables with zero mean such that for some and every , . Then, for any ,
in other words, if is centered and its covariance matrix is the identity.
It is well known that if a measure has a log-concave density, then linear functionals exhibit a sub-exponential decay. More precisely, we have:
where is a universal constant. As a consequence, if is an isotropic random vector with a log-concave density then .
The Euclidean norm of an isotropic random vector with a log-concave density highly concentrates around its expectation, this translates geometrically to the concentration of mass of an isotropic convex body within a thin Euclidean shell (, see also ). We will use here the following result immediately derived from , Theorem 4.4.
Moreover, one can take and .
Remark: It is conjectured that in the above theorem one can replace by .
We shall also use the following result from as formulated in .
with probability at least .
In this paper, different universal positive constants may be denoted by the same letters , etc.
Isometry constant
We begin this section by formulating, in Theorem 3.3, a general estimate for the isometry constant of random matrices with independent columns. Then, in order to apply such an estimate, we introduce two sufficient conditions that determine large classes of random matrices. Finally, we give examples of important classes that satisfy the estimates from Theorem 3.3 and thus provide us with models: the Log-Concave Ensemble, matrices with i.i.d. entries, and matrices defined by independent vectors on a sphere.
Techniques of “compressed sensing” rely on properties of the sampling matrix, which should act almost isometrically on sparse vectors. This motivated the concept of Restricted Isometry Property (RIP) defined in . To quantify this property of the “sensing” matrix, the authors introduced the isometry constant defined in the introduction, that we recall here for the convenience of the reader.
Let be a matrices and let . For any , the isometry constant of is defined as the smallest number so that
Thus the isometry constant is controlled by quantity and the second term, We begin by estimating .
Then setting , the inequality
where are absolute positive constants.
We postpone the proof of Theorem 3.2 to the last section. Combining this theorem with inequality (3.3), relating the RIP, and concentration of the Euclidean norm of the ’s, we immediately deduce an estimate for the isometry constant of a random matrix with independent columns.
Condition : Linear forms obey a uniform estimate:
Condition : ’s are concentrated around their average:
As already mentioned in the Introduction, a condition such as is necessary to have the RIP. Indeed, if the matrix has RIP with probability then is satisfied.
2 Examples
We now specialize Theorem 3.3 to some specific classes of matrices.
Assume the above “log-concave setting”. There exist universal constants such that conditions and are satisfied whenever , where is given in Lemma 2.6.
The proof is immediate from Lemmas 2.5 and 2.6.
Applying Theorem 3.3 (with ) together with Lemmas 3.4 and 2.7 to the Log-Concave Ensemble, we get that for every ,
where are universal constants and is given in Lemma 2.6.
It might be worthwhile to note that using directly Lemma 2.6 one can replace the second term in estimate (3.6) by a term tending to 0 when , but this would require an adjustment in probability. For example works with the probability estimate in which is replaced by . (Here is given in Lemma 2.6.)
Consider now the “ setting”, where the entries of the matrix are independent centered, with variance one, random variables (with ). Set .
Assume the above “ setting” with . Then conditions and are satisfied whenever , where are absolute positive constants.
The following Lemma is a combination of Corollaries 2.9 and 2.10 of .
where and , for .
The behavior of general centered variables can be easily reduced to symmetric Weibull variables. The argument is quite standard, we sketch it here for the sake of completeness.
Assume thus that are independent mean zero random variables with . Let and set . Let be defined as in Lemma 3.6.
We will use the above observation together with symmetrization and the contraction principle to estimate moments of linear combinations of variables . We have for ,
where to get the last two inequalities we used Khinchine’s inequality, Lemma 3.6 and integration by parts to pass from tail to moment estimates.
We are now ready to prove condition . Fix and consider the linear combination . Since , we obtain by homogeneity
The proof of condition goes along similar lines. Instead of Lemma 3.6 we will now use the following lemma, which is an easy consequence of Theorem 6.2 in and the observation that the -th moment of a Weibull variable with parameter is of order , where remains bounded for away from .
Moreover, for , is bounded by some absolute constant.
Using similar arguments as in the proof of condition we can infer from the above lemma that if are independent mean zero random variables with (), then for ,
Therefore, for any by the Chebyshev inequality in ,
for some (new) universal constant or equivalently
(The additional constants appearing above stem from the fact that under the standard definition for , is not a norm but only a quasi-norm and additionally . One can modify the function so that it is convex. For away from zero, this modification changes the norm by an absolute constant). Therefore, applying (3.7) with yields
For the proof is similar, but uses Lemma 3.6 (which in this case reduces to Bernstein’s inequality) instead of Lemma 3.7 (the argument is simpler since in this case the involved norms of the vector do not depend on and we get (3.7) directly).
The lemma follows now by the union bound.
Applying Theorem 3.3 together with Lemma 3.5 to the “ setting”, we get that for every ,
2.3 Vectors on a sphere
Another interesting case is when the vectors lie on a common sphere. To keep the same normalization as in the previous cases we assume that the sphere has the radius . Then condition (3.5) becomes empty. Let and assume that the vectors are and let . Let and set . Then Theorem 3.3 immediately gives that
The geometry of faces of random polytopes
In this Section we discuss the geometry of random polytopes. Let be an matrix. We denote by (resp. ) the convex hull (resp., the symmetric convex hull) of the columns of .
For an integer , a polytope is called -neighborly if any set of less than vertices is the vertex set of a face. In the symmetric setting, a centrally symmetric convex polytope is -centrally-neighborly if any set of less than vertices containing no-opposite pairs is the vertex set of a face. We refer the reader to the books and for classical details on neighborly polytopes. (Some new quantitative invariants related to neighborliness were recently developed in .)
The relation between the problem of reconstruction and neighborly polytopes was discovered in .
(, Theorem 1) Let be a matrix, . The following two assertions are equivalent.
The polytope has vertices and is -centrally-neighborly.
Whenever has a solution having at most non-zero coordinates, is the unique solution of the optimization problem :
We will also use the following result from (which could be replaced by a similar result from ).
We are now ready to state the main result on neighborly random polytopes.
Let be integers. Let . Let and . Let be independent random vectors satisfying with parameter and with probability . Let be the matrix with as columns. Then, with probability larger than
the polytopes and are -neighborly and -centrally-neighborly, respectively, whenever
Observe that the probability is positive for large enough provided that .
Proof. Theorem 3.3 and the definition of property imply that for arbitrary , and , setting , we have
In view of Lemma 4.2, we look for and to ensure . For instance, we let and note that (3.5) implies
Now set m_{0}=[{c^{\prime}n\big{/}\psi^{4}\log^{2/r}(C^{\prime}\psi^{6}N/n)}] (for some new constants ). It is clearly sufficient to prove that the polytopes and are -neighborly and -centrally-neighborly, respectively. Thus adjusting the constants and writing for , we obtain
Combining this with the choice of , passing from to and adjusting the constants again if necessary, we conclude that with probability larger than
The last estimate follows from (4.1) by applying (3.5) and (4.2) to the last two terms, respectively; and where are again new constants.
2 Examples
We will now apply Theorem 4.3 in the three different settings introduced in the previous section.
Applying Lemma 3.4 and bound (3.6) we get the following:
Let be integers. Let be independent isotropic vectors with log-concave densities. This is for instance the case if are i.i.d. random vectors uniformly distributed on an isotropic convex body. Then, for any , with probability at least , the polytopes and are -neighborly and -centrally-neighborly, respectively, whenever
where are universal constants and is given in Lemma 2.6.
In a similar way as above, Lemma 3.5 and bound (3.8) imply the following theorem (note that its conclusion becomes empty if and ).
Let be a matrix with entries that are independent centered variance one random variables. Let and assume that the norms of the entries are bounded by some constant . Then, for any , with probability at least , the polytopes and are -neighborly and -centrally-neighborly, respectively, whenever satisfies
2.3 Vectors on a sphere
Finally assume that the vectors are on a sphere of radius . From bound (3.9) we obtain:
Let be integers. Let . Let be independent vectors on a sphere of radius and satisfying for some parameter . Let and set . Then, with probability at least , the polytopes and are -neighborly and -centrally-neighborly, respectively, whenever
Remark: 1) For the matrix with i.i.d. Gaussian entries (the case considered in Section 3.2.2 above when ), it is known that with overwhelming probability, is -centrally-neighborly, whenever satisfies
where are universal constants, (see ,,,). The precise asymptotic dependence of on and has been well studied in when and in when .
Main technical result
and define the other two quantities as follows:
Given a real number , we will denote by .
The main purpose of this Section is to prove Theorem 3.2. In fact we will prove a stronger technical result, Theorem 5.1, from which Theorem 3.2 will follow.
where is an absolute constant and .
is an absolute constant and .
Before starting the proof of the theorem we show how it implies Theorem 3.2, stated in Section 3.
Fix and let be such that
By Theorem 5.1 with , and the condition on ,
and and are absolute positive constants. Thus, if for some , then
where is an absolute constant. This concludes the proof.
2 Proof of Theorem 5.1
The proof will use the same construction as in , which however requires some modifications. For completeness and the reader’s convenience we provide details of the argument.
We require the following two lemmas proved in with . Since the proofs for general repeat the same arguments, we leave them for the reader.
The following formula is well known and the proof is in its statement.
We are now ready to start the proof of Theorem 5.1.
Proof of Theorem 5.1. As in , the construction splits into two cases.
Now we split according to the structure of . Namely we let
We first estimate . By Lemma 5.4 we have
We now set and apply Lemma 5.2 to each summand in the sum above with the parameters
We now pass to the estimate for which essentially follows the same lines.
Then and (similarly as in ) we can estimate the cardinality
where is the an absolute constant.
Since , then
Moreover, is chosen to have the same support as , and thus has the support .
It follows from the definitions of and that
(here , and denote the coordinates of , and , respectively). Thus
Thus, by (5.4) and using again we obtain
3 Optimality of estimates
We conclude this section by an example showing optimality, in a certain sense, of estimates in Theorem 3.2. We will limit ourselves to the case, that is to . To this end we consider a special case when where are i.i.d. symmetric exponential variables with variance one. We begin by showing an optimal estimate for .
First, from (Theorem 3.5) we have that for and any ,
where are numerical constants. In the other direction, we have the following
by general tail estimates for linear combinations of exponential variables with vector valued coefficients (see e.g. Corollary 1 in ), we get
where to simplify the notation we set . On the right hand side we actually have , where is such a rearrangement of that , which can be used to derive lower bounds on the expectation. We will however not rely on this representation, instead we will use a Sudakov type minoration principle for exponential variables proved in , Theorem 5.2.9, which we state here in a simplified version, adapted to our purposes.