Restricted isometry property of matrices with independent columns and neighborly polytopes by random sampling

Radosław Adamczak, Alexander E. Litvak, Alain Pajor, Nicole Tomczak-Jaegermann

Introduction

The connection between the neighborliness of $K(A)$ and sparse solutions of underdetermined linear equations was discovered in , Theorem 1, where it is proved that the following two statements are equivalent:

i) $K(A)$ has $2N$ vertices and is $m$-neighborly;

ii) whenever $y=Az$ has a solution $z$ having at most $m$ non-zero coordinates (in other words, $z$ is $m$-sparse), then $z$ is the unique solution of the program

$$(P)\qquad \min\|t\|_{\ell_{1}},\quad At=y.$$

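Definition. Let $M$ be an $n\times N$ matrix. For any $1\leq m\leq\min(n,N)$, the isometry constant of $M$ is defined as the smallest number $\delta_{m}=\delta_{m}(M)$ so that

$$(1-\delta_{m})|z|^{2}\leq|Mz|^{2}\leq(1+\delta_{m})|z|^{2}$$

holds for all $m$-sparse vectors $z\in\mathbb{R}^{N}$.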

The relevance of this parameter for the reconstruction property ii) is for instance revealed in ,, where it was shown that if $\delta_{m}(M)+\delta_{2m}(M)+\delta_{3m}(M)<1$ then $M$ satisfies ii) (see also , , ). In the present paper, we shall use the following sufficient condition from : if a matrix $M$ satisfies

$$\delta_{2m}(M)<\sqrt{2}-1,$$

then i) and ii) are satisfied. In other words, if $M$ has RIP$_{2m}(\sqrt{2}-1)$ then $M$ has the reconstruction property ii). This approach gives the strategy of our paper.
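To make the sufficient condition concrete, the following minimal sketch computes $\delta_{2}(M)$ by brute force and compares it with $\sqrt{2}-1$, which is the case $m=1$ of the condition. It assumes the usual form of the definition, $(1-\delta_{m})|z|^{2}\leq|Mz|^{2}\leq(1+\delta_{m})|z|^{2}$ for all $m$-sparse $z$. For a support $\{i,j\}$ the extreme values of $|Mz|^{2}/|z|^{2}$ are the eigenvalues of the $2\times 2$ Gram matrix of the two columns, which have a closed form. The function name `delta_2` and the two example matrices are illustrative assumptions, not taken from the paper.

```python
import math
from itertools import combinations


def delta_2(M):
    """Isometry constant of order 2 of M, given as a list of rows (needs >= 2 columns)."""
    n, N = len(M), len(M[0])
    cols = [[M[i][j] for i in range(n)] for j in range(N)]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    delta = 0.0
    for i, j in combinations(range(N), 2):
        # Gram matrix [[a, b], [b, c]] of columns i and j.
        a, b, c = dot(cols[i], cols[i]), dot(cols[i], cols[j]), dot(cols[j], cols[j])
        mid = (a + c) / 2.0
        rad = math.hypot((a - c) / 2.0, b)
        # Eigenvalues are mid +/- rad; record the worst deviation from 1.
        delta = max(delta, mid + rad - 1.0, 1.0 - (mid - rad))
    return delta


THRESHOLD = math.sqrt(2.0) - 1.0  # sufficient condition for m = 1: delta_2 < sqrt(2) - 1

nearly_orthogonal = [[1.0, 0.3], [0.0, math.sqrt(1.0 - 0.09)]]  # hypothetical example
correlated = [[1.0, 0.6], [0.0, 0.8]]                           # hypothetical example
for name, M in [("nearly orthogonal", nearly_orthogonal), ("correlated", correlated)]:
    d = delta_2(M)
    print(name, round(d, 3), "certified" if d < THRESHOLD else "not certified")
```

For unit columns the constant equals the largest absolute inner product between two columns, so the first matrix passes the test and the second does not. Failing the test does not by itself mean that reconstruction fails, since the condition is only sufficient.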

Recall that no general construction of centrally symmetric polytopes is known to produce polytopes with an optimal order of neighborliness. All known results are of a randomized nature: they show that, for a certain probability measure on the space of $n\times N$ matrices, the polytope $K(A)$ is $m$-neighborly with overwhelming probability, for (large) $m$ depending on $n$ and $N$. Consequently, from now on, $A$ will be a random matrix in some Ensemble in the sense of Random Matrix Theory. Due to the normalization, we shall consider the isometry constant of $A/\sqrt{n}$. The plan is to verify the condition $\delta_{2m}\left({A\over\sqrt{n}}\right)<\sqrt{2}-1$ for specific models of random matrices.

Linear forms obey a uniform sub-exponential decay, that is, for all $1\leq i\leq N$ , all ${y\in S^{n-1}}$ , and $t>0$ ,

The Euclidean norms of $X_{1},\dots,X_{N}$ are concentrated around their average:

Note that such a concentration inequality is clearly necessary in order to have RIP$_{1}((\sqrt{2}-1)/2)$.

One of the main results of this paper, Theorem 4.3, claims that under these conditions, whenever

the random polytope $K(A)$ is $m$ -centrally-neighborly with probability larger than $1-2\lambda-C\exp(-c\sqrt{n})$ , where $C,c>0$ are universal numerical constants. We will make it more precise in Section 4. This model includes the cases when

$X_{i}$ ’s are independent isotropic random vectors with a log-concave density;

the entries of the matrix are independent, centered with variance one and satisfy a sub-exponential tail inequality;

$X_{i}$ ’s are on the sphere of radius $\sqrt{n}$ and linear forms exhibit a uniform sub-exponential tail inequality.

These examples give rise to new classes of compressed sensing matrices. The class of i.i.d. entries with sub-exponential tail behavior (that is, entries being $\psi_{1}$ random variables) contains the subclass of matrices with i.i.d. $\psi_{r}$ entries for $1<r\leq 2$ (see Definition 2.1 below for $\psi_{r}$ random variables). Since in this case the obtained bounds are better by a power of the logarithm, which may be essential in applications, we prove our results in full generality, for $1\leq r\leq 2$.

Sub-gaussian matrices with independent $\psi_{2}$ entries, which correspond to $r=2$ , are by now well understood. They include for instance the Gaussian case when the matrix $A$ is built with i.i.d. Gaussian $N(0,1)$ random variables (see ,,); the case when the entries of $A$ are i.i.d. $(\pm 1)$ Bernoulli random variables (, , ); a general case of i.i.d. sub-gaussian entries is treated in (,, also see for simpler proofs).

Results of this paper are based on concentration type inequalities for random matrices under consideration. The proof of the main technical result, Theorem 3.2, will employ methods from . A crucial new ingredient consists of an analysis of the quantity

Notation and preliminaries

It is well known that the $\psi_{r}$ -norm of a random variable may be estimated from the growth of the moments. More precisely if a random variable $Y$ is such that for any $p\geq 1$ , $\|Y\|_{p}\leq p^{1/r}K$ , for some $K>0$ , then $\|Y\|_{\psi_{r}}\leq cK$ where $c>0$ is a numerical constant.
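As a quick numerical illustration of this moment criterion (a sketch, not part of the paper's argument), take $Y$ to be a standard exponential variable, for which $\mathbb{E}|Y|^{p}=\Gamma(p+1)$. The check below evaluates $\|Y\|_{p}/p$ on a grid of values of $p$, so that a ratio of at most one means the hypothesis $\|Y\|_{p}\leq p^{1/r}K$ holds with $r=1$ and $K=1$ on that grid. The helper name `exp_moment_norm` and the grid are assumptions made for the example.

```python
import math


def exp_moment_norm(p):
    """L_p norm of a standard exponential variable, (Gamma(p + 1)) ** (1 / p)."""
    return math.exp(math.lgamma(p + 1.0) / p)


grid = [1.0 + 0.5 * k for k in range(59)]        # p = 1, 1.5, ..., 30
ratios = [exp_moment_norm(p) / p for p in grid]  # ||Y||_p / p^{1/r} with r = 1
print(max(ratios), min(ratios))
```

The largest ratio occurs at $p=1$, where $\|Y\|_{1}=1$, and the ratios decrease as $p$ grows.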

Remark: The above notation $\|X\|_{\psi_{r}}$ for the weak $\psi_{r}$ norm of a random vector $X$ should not be confused with the standard convention in probability theory, where this notation stands for the $\psi_{r}$ norm of the random variable $|X|$, i.e., $\|\,|X|\,\|_{\psi_{r}}$; the latter meaning will never be used in this paper.

We recall the well known Bernstein’s inequality which we shall use in the form of a $\psi_{1}$ estimate ().

Let $Y_{1},...,Y_{n}$ be independent real random variables with zero mean such that for some $\psi>0$ and every $i$ , $\|Y_{i}\|_{\psi_{1}}\leq\psi$ . Then, for any $t>0$ ,

in other words, if $X$ is centered and its covariance matrix is the identity.

It is well known that if a measure has a log-concave density, then linear functionals exhibit a sub-exponential decay. More precisely, we have:

where $c>0$ is a universal constant. As a consequence, if $X$ is an isotropic random vector with a log-concave density then $\|X\|_{\psi_{1}}\leq c$ .

The Euclidean norm of an isotropic random vector with a log-concave density is highly concentrated around its expectation; geometrically, this translates to the concentration of mass of an isotropic convex body within a thin Euclidean shell (, see also ). We will use here the following result, immediately derived from , Theorem 4.4.

Moreover, one can take $c_{0}=3.33$ and $c_{1}=0.33$ .

Remark: It is conjectured that in the above theorem one can replace $\theta^{3.33}n^{0.33}$ by $c(\theta)n^{1/2}$ .

We shall also use the following result from as formulated in .

with probability at least $1-\exp(-K\sqrt{n})$ .

In this paper, different universal positive constants may be denoted by the same letters $C,C_{0},C^{\prime},c,c_{0},c^{\prime}$ , etc.

Isometry constant

We begin this section by formulating, in Theorem 3.3, a general estimate for the isometry constant of random matrices with independent $\psi_{r}$ columns. Then, in order to apply such an estimate, we introduce two sufficient conditions that determine large classes of random matrices. Finally, we give examples of important classes that satisfy the estimates from Theorem 3.3 and thus provide us with models: the Log-Concave Ensemble, matrices with i.i.d. $\psi_{r}$ entries, and matrices defined by independent $\psi_{r}$ vectors on a sphere.

Techniques of “compressed sensing” rely on properties of the sampling matrix, which should act almost isometrically on sparse vectors. This motivated the concept of Restricted Isometry Property (RIP) defined in . To quantify this property of the “sensing” matrix, the authors introduced the isometry constant defined in the introduction, that we recall here for the convenience of the reader.

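Let $M$ be an $n\times N$ matrix and let $\delta\in(0,1)$. For any $1\leq m\leq\min(n,N)$, the isometry constant of $M$ is defined as the smallest number $\delta_{m}=\delta_{m}(M)$ so that

$$(1-\delta_{m})|z|^{2}\leq|Mz|^{2}\leq(1+\delta_{m})|z|^{2}$$

holds for all $m$-sparse vectors $z\in\mathbb{R}^{N}$. The matrix $M$ is said to have RIP$_{m}(\delta)$ if $\delta_{m}(M)<\delta$.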

Thus the isometry constant is controlled by the quantity $B_{m}$ and the second term, $\max_{i\leq N}\left|{|X_{i}|^{2}\over n}-1\right|$. We begin by estimating $B_{m}$.

Then setting $\xi=\psi K+K^{\prime}$ , the inequality

where $C,c$ are absolute positive constants.

We postpone the proof of Theorem 3.2 to the last section. Combining this theorem with inequality (3.3), relating the RIP, $B_{m}$ and concentration of the Euclidean norm of the $X_{i}$ ’s, we immediately deduce an estimate for the isometry constant of a random matrix with independent $\psi_{r}$ columns.

Condition $H_{1}(r,\psi)$ : Linear forms obey a uniform $\psi_{r}$ estimate:

Condition $H_{2}(\lambda)$ : $|X_{i}|$ ’s are concentrated around their average:

As already mentioned in the Introduction, a condition such as $H_{2}(\lambda)$ is necessary to have the RIP. Indeed, if the matrix $A/\sqrt{n}$ has RIP$_{1}((\sqrt{2}-1)/2)$ with probability $\lambda$ then $H_{2}(\lambda)$ is satisfied.
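The link between the order-one isometry constant and norm concentration can be seen directly: testing the isometry inequality on the coordinate vectors $e_{i}$ gives $\delta_{1}(A/\sqrt{n})=\max_{i\leq N}\bigl||X_{i}|^{2}/n-1\bigr|$. The sketch below evaluates this quantity for a matrix with i.i.d. standard Gaussian entries. The function name `delta_1`, the dimensions and the seed are choices made for this illustration only.

```python
import random


def delta_1(columns):
    """max_i | |X_i|^2 / n - 1 | for a list of columns X_i in R^n."""
    n = len(columns[0])
    return max(abs(sum(x * x for x in col) / n - 1.0) for col in columns)


random.seed(0)            # fixed seed, only for reproducibility of the illustration
n, N = 400, 50            # hypothetical dimensions
gaussian_columns = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(N)]
print(delta_1(gaussian_columns))
```

Increasing `n` with `N` fixed should make the printed value shrink, which is the concentration of $|X_{i}|^{2}/n$ around one that condition $H_{2}$ quantifies.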

3.2 Examples

We now specialize Theorem 3.3 to some specific classes of matrices.

Assume the above “log-concave setting”. There exist universal constants $\psi,C,c>0$ such that conditions $H_{1}(1,\psi)$ and $H_{2}(C\exp(-cn^{c_{1}}))$ are satisfied whenever $N\leq\exp{(cn^{c_{1}})}$ , where ${c_{1}}$ is given in Lemma 2.6.

The proof is immediate from Lemmas 2.5 and 2.6.

Applying Theorem 3.3 (with $r=1$ ) together with Lemmas 3.4 and 2.7 to the Log-Concave Ensemble, we get that for every $N\leq\exp(cn^{c_{1}})$ ,

where $C,c>0$ are universal constants and $c_{1}$ is given in Lemma 2.6.

It might be worthwhile to note that using directly Lemma 2.6 one can replace the second term in estimate (3.6) by a term tending to 0 when $n\to\infty$ , but this would require an adjustment in probability. For example $1/n^{c_{1}/2c_{0}}$ works with the probability estimate in which $\exp(-cn^{c_{1}})$ is replaced by $\exp(-cn^{c_{1}/2})$ . (Here $c_{0}$ is given in Lemma 2.6.)

Consider now the “ $\psi_{r}$ setting”, where the entries $a_{ij}$ of the matrix $A$ are independent, centered, variance one, $\psi_{r}$ random variables (with $r\in[1,2]$). Set $\psi=\max_{ij}\|a_{ij}\|_{\psi_{r}}$.

Assume the above “ $\psi_{r}$ setting” with $r\in[1,2]$. Then conditions $H_{1}(r,C\psi)$ and $H_{2}(2\exp(-cn^{r/2}/\psi^{2r}))$ are satisfied whenever $N\leq\exp(cn^{r/2}/\psi^{2r})$, where $C,c$ are absolute positive constants.

The following Lemma is a combination of Corollaries 2.9 and 2.10 of .

where $1/r^{\ast}+1/r=1$ and $\|a\|_{q}=(|a_{1}|^{q}+\ldots+|a_{n}|^{q})^{1/q}$ , for $1\leq q<\infty$ .

The behavior of general centered $\psi_{r}$ variables can be easily reduced to that of symmetric Weibull variables. The argument is quite standard; we sketch it here for the sake of completeness.

Assume thus that $Z_{1},\ldots,Z_{n}$ are independent mean zero random variables with $\|Z_{i}\|_{\psi_{r}}\leq 1$ . Let $\beta=(\log 2)^{1/r}$ and set $U_{i}=(|Z_{i}|-\beta)_{+}$ . Let $Y_{i}$ be defined as in Lemma 3.6.

We will use the above observation together with symmetrization and the contraction principle to estimate moments of linear combinations of variables $Z_{i}$ . We have for $p\geq 1$ ,

where to get the last two inequalities we used Khinchine’s inequality, Lemma 3.6 and integration by parts to pass from tail to moment estimates.

We are now ready to prove condition $H_{1}(r,C\psi)$ . Fix $y\in S^{n-1}$ and consider the linear combination $\sum_{i=1}^{n}y_{i}a_{ij}$ . Since $\|a_{ij}\|_{\psi_{r}}\leq\psi$ , we obtain by homogeneity

The proof of condition $H_{2}$ goes along similar lines. Instead of Lemma 3.6 we will now use the following lemma, which is an easy consequence of Theorem 6.2 in and the observation that the $p$-th moment of a Weibull variable with parameter $s$ is of order $C_{s}p^{1/s}$, where $C_{s}$ remains bounded for $s$ away from $0$.

Moreover, for $s\geq 1/2$ , $C_{s}$ is bounded by some absolute constant.

Using similar arguments as in the proof of condition $H_{1}$ we can infer from the above lemma that if $Z_{1},\ldots,Z_{n}$ are independent mean zero random variables with $\|Z_{i}\|_{\psi_{s}}\leq b$ ( $s\in[1/2,1)$ ), then for $p\geq 2$ ,

Therefore, for any $p\geq 2$ by the Chebyshev inequality in $L_{p}$ ,

for some (new) universal constant $C$ or equivalently

(The additional constants appearing above stem from the fact that under the standard definition for $s<1$ , $\|\cdot\|_{\psi_{s}}$ is not a norm but only a quasi-norm and additionally $\|1\|_{\psi_{r/2}}\neq 1$ . One can modify the function $x\mapsto e^{x^{r}}-1$ so that it is convex. For $r$ away from zero, this modification changes the norm by an absolute constant). Therefore, applying (3.7) with $t=\varepsilon n$ yields

For $r=2$ the proof is similar, but uses Lemma 3.6 (which in this case reduces to Bernstein’s $\psi_{1}$ inequality) instead of Lemma 3.7 (the argument is simpler since in this case the involved norms of the vector $a$ do not depend on $p$ and we get (3.7) directly).

The lemma follows now by the union bound.

Applying Theorem 3.3 together with Lemma 3.5 to the “ $\psi_{r}$ setting”, we get that for every $N\leq\exp(cn^{r/2}/\psi^{2r})$ ,

3.2.3 Vectors on a sphere

Another interesting case is when the vectors $X_{1},\dots,X_{N}$ lie on a common sphere. To keep the same normalization as in the previous cases we assume that the sphere has the radius $\sqrt{n}$ . Then condition (3.5) becomes empty. Let $1\leq r\leq 2$ and assume that the vectors are $\psi_{r}$ and let $\psi=\max_{i\leq N}\|X_{i}\|_{\psi_{r}}$ . Let $K\geq 1$ and set $\xi=\psi K$ . Then Theorem 3.3 immediately gives that

The geometry of faces of random polytopes

In this Section we discuss the geometry of random polytopes. Let $A$ be an $n\times N$ matrix. We denote by $K^{+}(A)$ (resp. $K(A)$ ) the convex hull (resp., the symmetric convex hull) of the $N$ columns of $A$ .

For an integer $1\leq m\leq n$, a polytope is called $m$-neighborly if any set of fewer than $m$ vertices is the vertex set of a face. In the symmetric setting, a centrally symmetric convex polytope is $m$-centrally-neighborly if any set of fewer than $m$ vertices containing no opposite pairs is the vertex set of a face. We refer the reader to the books and for classical details on neighborly polytopes. (Some new quantitative invariants related to neighborliness were recently developed in .)

The relation between the problem of reconstruction and neighborly polytopes was discovered in .

(, Theorem 1) Let $A$ be an $n\times N$ matrix, $n\leq N$. The following two assertions are equivalent.

i) The polytope $K(A)$ has $2N$ vertices and is $m$-centrally-neighborly.

ii) Whenever $y=Az$ has a solution $z$ having at most $m$ non-zero coordinates, $z$ is the unique solution of the optimization problem $(P)$:

$$(P)\qquad \min\|t\|_{\ell_{1}},\quad At=y.$$
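For very small instances the second assertion can be explored by solving the $\ell_{1}$ problem exactly. The sketch below assumes that $(P)$ is the standard program $\min\|t\|_{\ell_{1}}$ subject to $At=y$ and that $A$ has rank $n$. Under these assumptions an optimal solution can be chosen with support on $n$ linearly independent columns, so it suffices to enumerate such supports. This brute-force enumeration is only an illustration and not the method of the paper (a linear programming solver would be used in practice). The names `solve_square` and `l1_minimizer` and the example matrix are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(M, b):
    """Solve M x = b exactly by Gauss-Jordan elimination; return None if M is singular."""
    k = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][k] for r in range(k)]


def l1_minimizer(A, y):
    """Return (minimal l1 norm, a minimizer) of min |t|_1 s.t. A t = y, assuming rank(A) = n."""
    n, N = len(A), len(A[0])
    best = None
    for S in combinations(range(N), n):          # candidate supports of size n
        sub = [[A[i][j] for j in S] for i in range(n)]
        sol = solve_square(sub, y)
        if sol is None:                          # columns in S are linearly dependent
            continue
        t = [Fraction(0)] * N
        for j, v in zip(S, sol):
            t[j] = v
        norm = sum(abs(v) for v in t)
        if best is None or norm < best[0]:
            best = (norm, t)
    return best


A = [[1, 0, 1], [0, 1, 1]]                       # hypothetical 2 x 3 example
z = [0, 0, 1]                                    # a 1-sparse vector
y = [sum(a * zj for a, zj in zip(row, z)) for row in A]
norm, t = l1_minimizer(A, y)
print(norm, [int(v) for v in t])
```

In this example the minimizer returned by the enumeration coincides with the 1-sparse vector `z`. The code does not check uniqueness of the minimizer, which would require comparing all supports attaining the minimal norm.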

We will also use the following result from (which could be replaced by a similar result from ).

We are now ready to state the main result on neighborly random polytopes.

Let $1\leq m\leq n\leq N$ be integers. Let $1\leq r\leq 2$ . Let $\psi\geq 1$ and $\lambda\in(0,1/2)$ . Let $X_{1},\dots,X_{N}$ be independent random vectors satisfying $H_{1}(r,\psi)$ with parameter $\psi$ and $H_{2}(\lambda)$ with probability $\lambda$ . Let $A$ be the $n\times N$ matrix with $X_{1},\dots,X_{N}$ as columns. Then, with probability larger than

the polytopes $K^{+}(A)$ and $K(A)$ are $m$ -neighborly and $m$ -centrally-neighborly, respectively, whenever

Observe that the probability is positive for $n$ large enough provided that $\lambda<1/2$ .

Proof. Theorem 3.3 and the definition of property $H_{1}(r,\psi)$ imply that for arbitrary $\theta^{\prime}\in(0,1)$ , and $K,K^{\prime}\geq 1$ , setting $\xi=\psi K+K^{\prime}$ , we have

In view of Lemma 4.2, we look for $m$ and $\theta^{\prime}$ to ensure $\delta_{2m}(A/\sqrt{n})<\sqrt{2}-1$ . For instance, we let $\theta^{\prime}=(\sqrt{2}-1)/2$ and note that (3.5) implies

Now set $m_{0}=[{c^{\prime}n\big{/}\psi^{4}\log^{2/r}(C^{\prime}\psi^{6}N/n)}]$ (for some new constants $C^{\prime},c^{\prime}>0$ ). It is clearly sufficient to prove that the polytopes $K^{+}(A)$ and $K(A)$ are $m_{0}$ -neighborly and $m_{0}$ -centrally-neighborly, respectively. Thus adjusting the constants $C^{\prime},c^{\prime}>0$ and writing $m$ for $m_{0}$ , we obtain

Combining this with the choice of $\theta^{\prime}$ , passing from $m$ to $2m$ and adjusting the constants again if necessary, we conclude that $\delta_{m}\left({A\over\sqrt{n}}\right)<\sqrt{2}-1$ with probability larger than

The last estimate follows from (4.1) by applying (3.5) and (4.2) to the last two terms, respectively; and where $C^{\prime\prime},c^{\prime\prime}>0$ are again new constants.
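To get a feeling for the order of the quantity $m_{0}=[c^{\prime}n/\psi^{4}\log^{2/r}(C^{\prime}\psi^{6}N/n)]$ used in the proof, the sketch below evaluates it, reading the bracket as the integer part and the whole product $\psi^{4}\log^{2/r}(\cdot)$ as the denominator. The constants $c^{\prime}$ and $C^{\prime}$ are not specified in the text. The values `c_prime=1.0` and `C_prime=math.e` are placeholders chosen only so that the logarithm is positive, so the output shows the dependence on $n$, $N$, $\psi$ and $r$ rather than an actual guarantee. The function name `sparsity_level` is likewise an assumption.

```python
import math


def sparsity_level(n, N, psi=1.0, r=1.0, c_prime=1.0, C_prime=math.e):
    """Integer part of c' n / (psi^4 * log^{2/r}(C' psi^6 N / n)); constants are placeholders."""
    log_term = math.log(C_prime * psi ** 6 * N / n)
    if log_term <= 0.0:
        raise ValueError("the logarithm must be positive for the expression to make sense")
    return math.floor(c_prime * n / (psi ** 4 * log_term ** (2.0 / r)))


n, N = 1000, 10000                                # hypothetical dimensions
m_subexponential = sparsity_level(n, N, r=1.0)    # psi_1 case
m_subgaussian = sparsity_level(n, N, r=2.0)       # psi_2 case
print(m_subexponential, m_subgaussian)
```

With the same placeholder constants, the value for $r=2$ exceeds the value for $r=1$ by a factor of roughly $\log(C^{\prime}\psi^{6}N/n)$, which is the gain of a power of the logarithm mentioned in the Introduction.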

4.2 Examples

We will now apply Theorem 4.3 in the three different settings introduced in the previous section.

Applying Lemma 3.4 and bound (3.6) we get the following:

Let $1\leq m\leq n\leq N$ be integers. Let $X_{1},\dots,X_{N}$ be independent isotropic vectors with log-concave densities. This is for instance the case if $X_{1},\dots,X_{N}$ are i.i.d. random vectors uniformly distributed on an isotropic convex body. Then, for any $N\leq\exp(cn^{c_{1}/2})$ , with probability at least $1-C\exp(-cn^{c_{1}/2})$ , the polytopes $K^{+}(A)$ and $K(A)$ are $m$ -neighborly and $m$ -centrally-neighborly, respectively, whenever

where $C,c>0$ are universal constants and ${c_{1}}$ is given in Lemma 2.6.

In a similar way as above, Lemma 3.5 and bound (3.8) imply the following theorem (note that its conclusion becomes empty if $N\geq\exp(cn^{r/2}/\psi^{2r})$ and $\psi\geq 1$ ).

Let $A$ be a matrix with entries that are independent centered variance one random variables. Let $1\leq r\leq 2$ and assume that the $\psi_{r}$ norms of the entries are bounded by some constant $\psi$ . Then, for any $N\leq\exp(cn^{r/2}/\psi^{2r})$ , with probability at least $1-C\exp(-cn^{r/2}/\psi^{2r})$ , the polytopes $K^{+}(A)$ and $K(A)$ are $m$ -neighborly and $m$ -centrally-neighborly, respectively, whenever $1\leq m\leq n$ satisfies

4.2.3 Vectors on a sphere

Finally assume that the vectors are on a sphere of radius $\sqrt{n}$ . From bound (3.9) we obtain:

Let $1\leq m\leq n\leq N$ be integers. Let $1\leq r\leq 2$ . Let $X_{1},\dots,X_{N}$ be independent vectors on a sphere of radius $\sqrt{n}$ and satisfying $H_{1}(r,\psi)$ for some parameter $\psi>0$ . Let $K\geq 1$ and set $\xi=\psi K$ . Then, with probability at least $1-C\exp(-K\sqrt{n}/\psi^{2})$ , the polytopes $K^{+}(A)$ and $K(A)$ are $m$ -neighborly and $m$ -centrally-neighborly, respectively, whenever

Remark: 1) For the matrix $A$ with i.i.d. Gaussian $N(0,1)$ entries (the case considered in Section 3.2.2 above when $r=2$ ), it is known that with overwhelming probability, $K(A)$ is $m$ -centrally-neighborly, whenever $1\leq m\leq n$ satisfies

where $C,c>0$ are universal constants, (see ,,,). The precise asymptotic dependence of $m$ on $n$ and $N$ has been well studied in when $n/N\to\delta\in(0,1)$ and in when $n/N\to 0$ .

Main technical result

and define the other two quantities as follows:

Given a real number $s$ , we will denote $\max(s,0)$ by $s_{+}$ .

The main purpose of this Section is to prove Theorem 3.2. In fact we will prove a stronger technical result, Theorem 5.1, from which Theorem 3.2 will follow.

where $C_{0}$ is an absolute constant and $q=\max\{1,1/r\}$ .

where $C$ is an absolute constant and $q=\max\{1,1/r\}$.

Before starting the proof of the theorem we show how it implies Theorem 3.2, stated in Section 3.

Fix $K_{1}\geq 1$ and let $K\geq K_{1}$ be such that

By Theorem 5.1 with $r\geq 1$ , and the condition on $m$ ,

and $c$ and $C_{0}$ are absolute positive constants. Thus, if $C_{m}\leq K_{2}\sqrt{n}$ for some $K_{2}$ , then

where $C_{1}$ is an absolute constant. This concludes the proof.

5.2 Proof of Theorem 5.1

The proof will use the same construction as in , which however requires some modifications. For completeness and the reader’s convenience we provide details of the argument.

We require the following two lemmas, proved in with $r=1$. Since the proofs for general $r$ repeat the same arguments, we leave them to the reader.

The following formula is well known and the proof is in its statement.

We are now ready to start the proof of Theorem 5.1.

Proof of Theorem 5.1. As in , the construction splits into two cases.

Now we split $D_{x}$ according to the structure of $x$ . Namely we let

We first estimate $D^{\prime}_{x}$ . By Lemma 5.4 we have

We now set $q=\max\{1,1/r\}$ and apply Lemma 5.2 to each summand in the sum above with the parameters

We now pass to the estimate for $D^{\prime\prime}_{x}$ which essentially follows the same lines.

Then ${\cal{M}}_{k}(\theta)\subset 2B_{2}^{N}$ and (similarly as in ) we can estimate the cardinality

where $C_{2}$ is an absolute constant.

Since $D_{x}=D^{\prime}_{x}+D^{\prime\prime}_{x}$, we obtain

Moreover, $x$ is chosen to have the same support as $z$, and thus $w=z-x$ satisfies $|\mathop{\rm supp}w|\leq m$.

It follows from the definitions of $D_{z}$ and $A$ that

(here $w(i)$ , $x(i)$ and $z(i)$ denote the coordinates of $w$ , $x$ and $z$ , respectively). Thus

Thus, by (5.4) and using again $A_{m}\leq\sqrt{B_{m}^{2}+C_{m}^{2}}\leq B_{m}+C_{m}$ we obtain

5.3 Optimality of estimates

We conclude this section with an example showing optimality, in a certain sense, of the estimates in Theorem 3.2. We will limit ourselves to the $\psi_{1}$ case, that is, to $r=1$. To this end we consider the special case when $X_{i}=(X_{ij})_{j=1}^{n}$, where $X_{ij}$ are i.i.d. symmetric exponential variables with variance one. We begin by showing an optimal estimate for $A_{m}$.
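For experimentation, entries of this type are easy to simulate. A symmetric exponential (Laplace) variable with density $\frac{1}{2b}e^{-|x|/b}$ has variance $2b^{2}$, so variance one corresponds to $b=1/\sqrt{2}$. The sketch below draws such variables as a random sign times an exponential variable of rate $\sqrt{2}$ and checks the first two empirical moments. The function name `symmetric_exponential`, the sample size and the seed are illustrative assumptions.

```python
import math
import random


def symmetric_exponential(rng):
    """One draw with density (1/sqrt(2)) * exp(-sqrt(2) * |x|): mean 0, variance 1."""
    magnitude = rng.expovariate(math.sqrt(2.0))   # exponential with rate sqrt(2)
    return magnitude if rng.random() < 0.5 else -magnitude


rng = random.Random(0)                            # fixed seed for reproducibility
sample = [symmetric_exponential(rng) for _ in range(200_000)]
mean = sum(sample) / len(sample)
variance = sum((x - mean) ** 2 for x in sample) / len(sample)
print(round(mean, 2), round(variance, 2))
```

A column $X_{i}$ of the matrix in this example would then be a list of $n$ independent draws of this kind.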

First, from (Theorem 3.5) we have that for $N\leq\exp(c\sqrt{n})$ and any $K\geq 1$ ,

where $C,c>0$ are numerical constants. In the other direction, we have the following

by general tail estimates for linear combinations of exponential variables with vector valued coefficients (see e.g. Corollary 1 in ), we get

where to simplify the notation we set $Y_{i}=X_{i1}$ . On the right hand side we actually have $\sum_{i=1}^{m}|Y_{i}^{\ast}|$ , where $Y_{i}^{\ast}$ is such a rearrangement of $Y_{i}$ that $|Y_{1}^{\ast}|\geq|Y_{2}^{\ast}|\geq\ldots\geq|Y_{n}^{\ast}|$ , which can be used to derive lower bounds on the expectation. We will however not rely on this representation, instead we will use a Sudakov type minoration principle for exponential variables proved in , Theorem 5.2.9, which we state here in a simplified version, adapted to our purposes.


References