Compressed Sensing: How sharp is the Restricted Isometry Property

Jeffrey D. Blanchard, Coralia Cartis, Jared Tanner

Introduction

Questions of this form must have been around for millennia. Consider this puzzle: “A counterfeit coin is hidden in a batch of $N$ otherwise similar coins; it is distinguished from the others by its slightly heavier weight. How many balance weighings are needed to find the counterfeit?” Abstractly, this concerns the special case where $K$ is an unknown singleton and the nonzero value is nonnegative; the balance is abstractly the same as an inner product which gives weight +1 to the coefficients placed in the “right” pan and -1 to the coefficients placed in the “left” pan. Many people quickly find that roughly $\log(N)$ measurements suffice to find the position and value of the nonzero, each time putting half the remaining coins in one pan, half in the other, and discarding from further consideration the coins that turn out on the light side. Lighthearted as puzzles can sometimes seem, they can lead to serious applications.

During World War Two, efficient screening of large groups of soldiers for certain infections was based on the principle of group testing, in which blood from many soldiers is combined in a single tube and tested for presence of an infectious agent. If an infection is found, one studies that group and by dyadic subdivision eventually isolates the infecteds .

More advanced mathematics can do much better than such common-sense ideas. Those with a physical bent may quickly see that, if $N$ is prime, again assuming a singleton $K$ and a nonnegative $x$ , it will be enough, in fact, to make only 2 inner products, with respectively a sine and cosine of frequency $2\pi/N$ ; the phase of the corresponding complex Fourier coefficient immediately reveals the position of the nonzero. Note here that, for large $N$ , we are doing dramatically better than common-sense (2 measurements rather than $\log(N)$ ).

Advanced mathematics is better than the common-sense approach in another way: common-sense uses adaptive measurements, where the next measurement vector is selected after viewing all previous measurements. In the advanced approach, adaptivity is unnecessary: one simply makes 2 measurements defined a priori and later combines the two to reconstruct.

Compressed Sensing (CS) embodies the advanced approach: it designs a special matrix $A$ of size $n\times N$ , measures $x$ via $y=Ax$ , giving $n$ measurements of the $N$ vector $x$ in parallel, and reconstructs $x$ from $(y,A)$ using computationally efficient and stable algorithms. The key point is that $n$ can be taken much smaller than $N$ , and much closer to $k$ . For example, if $x$ is known to be $k$ -sparse and nonnegative $n=2k+1$ suffices and if $x$ is only known to be $k$ -sparse, roughly $n=2\log(N/n)\cdot k$ will suffice, if $k/N$ is small .

Since the release of the seminal CS papers in 2004, , a great deal of excitement has been generated in signal processing and applied mathematics research, with hundreds of papers on the theory, applications, and extensions of compressed sensing (more than 400 of these are collected at Rice’s online Compressive Sensing Resources archive dsp.rice.edu/cs). Many applications have been proposed, including magnetic resonance imaging , radar , and single-pixel cameras to name a few. In the MRI applications, it has been reported that diagnostic quality images can be obtained in $1/7$ the recording time using CS approaches, . For a recent review of CS see the special issue containing and for a review of sparse approximation see .

which is the convex relaxation of the computationally intractable decoder, , seeking the sparsest solution in agreement with the measurements

Following the usual convention in the CS community, $\|z\|_{0}$ counts the number of nonzero entries in $z$ . Many other encoder/decoder pairs are also being actively studied, with new alternatives being proposed regularly; see Section 3.

To quantify the sparsity/undersampling trade off, we adopt a proportional-growth asymptotic, in which we consider sequences of triples $(k,n,N)$ where all elements grow large in a coordinated way, $n\sim\delta N$ and $k\sim\rho n$ for some constants $\delta,\rho>0$ . This defines a two-dimensional phase space $(\delta,\rho)$ in $^{2}$ for asymptotic analysis.

A sequence of problem sizes $(k,n,N)$ is said to grow proportionally if, for $(\delta,\rho)\in^{2}$ , $\frac{n}{N}\rightarrow\delta$ and $\frac{k}{n}\rightarrow\rho$ as $n\rightarrow\infty$ .

Ultimately, we want to determine, as precisely as possible, which subset of this phase space corresponds to successful recovery and which subset corresponds to unsuccessful recovery. This is the phase-transition framework advocated by Donoho et al ; see Section 3 for a precise definition. By translating the sufficient RIP conditions into the proportional-growth asymptotic, we find lower bounds on the phase-transition for $(\delta,\rho)$ in $^{2}$ . An answer to this question plays the role of an undersampling theorem: to what degree can we undersample a signal and still be able to reconstruct it?

to shed some light on the behavior of the RIP constants of a matrix ensemble with as much precision as possible;

to introduce a reader new to this topic to the type of large deviation analysis calculations often encountered in CS and applicable to many areas faced with combinatorial challenges.

We conclude with a discussion of some other important and related topics not addressed in the current paper. We briefly discuss comparisons of results when noise is present in the measurements or the signal $x$ is not perfectly $k$ -sparse, average case analysis versus the theoretical worst case analysis presented here, and the potential to improve the phase transition curves through improved analysis or improved bounds.

Bounds on RIP for Gaussian Random Matrices

The asymmetric way that the expected eigenvalues $\Lambda^{max}_{n,k}$ and $\Lambda^{min}_{n,k}$ deviate from 1 suggests that the symmetric treatment used by the traditional RIP is missing an important part of the picture. We generalize the RIP to an asymmetric form and derive the sharpest recovery conditions implied by the RIP.

For a matrix $A$ of size $n\times N$ , the asymmetric RIP constants $L(k,n,N;A)$ and $U(k,n,N;A)$ are defined as:

(A similar change in the definition of the RIP constants was used independently by Foucart and Lai in , motivated by different concerns.)

Although both the smallest and largest singular values of $A_{K}^{T}A_{K}$ affect the stability of the reconstruction algorithms, the smaller eigenvalue is dominant for compressed sensing in that it allows distinguishing between sparse vectors from their measurement by $A$ . In fact, it is often incorrectly stated that $R(2k,n,N)<1$ is a necessary condition to ensure that there are no two $k$ -sparse vectors, say $x$ and $x^{\prime}$ , with the same measurements $Ax=Ax^{\prime}$ ; the actual necessary condition is $L(2k,n,N)<1$ .

We see from (4) and (5) that $(1-L(k,n,N))=\min_{K}\lambda^{min}(G_{K})$ and $(1+U(k,n,N))=\max_{K}\lambda^{max}(G_{K})$ with $G_{K}=A_{K}^{T}A_{K}$ . A standard large deviation analysis of bounds on the probability density functions of $\Lambda^{max}_{n,k}$ and $\Lambda^{min}_{n,k}$ allows us to establish upper bounds of $L(k,n,N)$ and $U(k,n,N)$ which are exponentially unlikely to be exceeded.

Let $A$ be a matrix of size $n\times N$ drawn from the Gaussian ensemble and consider the proportional-growth asymptotic ( $\frac{n}{N}\rightarrow\delta$ and $\frac{k}{n}\rightarrow\rho$ as $n\rightarrow\infty$ ). Let $H(p):=p\log(1/p)+(1-p)\log(1/(1-p))$ denote the usual Shannon Entropy with base $e$ logarithms, and let

Define $\lambda^{min}(\delta,\rho)$ and $\lambda^{max}(\delta,\rho)$ as the solution to (8) and (9), respectively:

Define $\mathcal{L}(\delta,\rho)$ and $\mathcal{U}(\delta,\rho)$ as

To facilitate ease of calculating $\mathcal{L}(\delta,\rho)$ and $\mathcal{U}(\delta,\rho)$ , web forms for their calculation are available at ecos.maths.ed.ac.uk.

In the proportional growth asymptotic, the probability that $\mathcal{L}(\delta,\rho)$ and $\mathcal{U}(\delta,\rho)$ bound the random variables $L(k,n,N)$ and $U(k,n,N)$ , respectively, tends to 1 as $n\rightarrow\infty$ . In statistical terminology, the coverage probability of the upper confidence bounds $\mathcal{L}(\delta,\rho)$ and $\mathcal{U}(\delta,\rho)$ tends to one as $n\rightarrow\infty$ . In fact, all probabilities presented in this manuscript converge to their limit “exponentially in $n$ ”; that is, the probability for finite $n$ approaches its limit as $n$ grows with discrepancy bounded by a constant multiple of $e^{-n\beta}$ for some fixed $\beta>0$ .

Fix $\epsilon>0$ . Under the proportional-growth asymptotic, Definition 2, sample each $n\times N$ matrix $A$ from the Gaussian ensemble. Then

Extensive empirical estimates of $L(k,n,N)$ and $U(k,n,N)$ show that the bounds $\mathcal{L}(\delta,\rho)$ and $\mathcal{U}(\delta,\rho)$ are rather sharp; in fact, they are no more than twice the actual upper bounds on $L(k,n,N)$ and $U(k,n,N)$ , see Figure 4 and Table 1, and are much closer for the region applicable for CS decoders, $\rho\ll 1$ . The empirically observed lower bounds on $L(k,n,N)$ and $U(k,n,N)$ are calculated through the following process. The number of rows, $n$ , is fixed at one of the values in Table 1. For each $n$ , 47 values of $N$ are selected so that $n/N$ ranges from $1/20$ to $20/21$ . For each $(n,N)$ a matrix $A$ of size $n\times N$ is drawn from ${\cal N}(0,n^{-1})$ and either the algorithm from or is applied to determine support sets of size $k=1,2,\ldots,n-1$ which are candidates for the support sets that maximize $L(k,n,N;A)$ or $U(k,n,N;A)$ . The largest or smallest eigenvalue of each resulting $n\times k$ submatrix is calculated and recorded. The above process is repeated for some number of matrices, see the caption of Table 1, and the maximum value recorded. The empirical calculation of RIP constants are lower bounds on the true RIP constants as the support sets calculated by and may not be the support sets which maximize the RIP constants.

In order to prove Theorem 5, this section employs a type of large deviation technique often encountered in CS and applicable in fact, to many areas faced with combinatorial challenges.

We first establish some useful lemmas concerning the extreme eigenvalues of Wishart matrices. The matrix $A$ generates ${N\choose k}$ different Wishart matrices $G_{k}=A_{K}^{T}A_{K}$ . Exponential bounds on the tail probabilities of the largest and smallest eigenvalues of such Wishart matrices can be combined with exponential bounds on ${N\choose k}$ to control the chance of large deviations using the union bound. This large deviation analysis technique is characteristic of proofs in compressed sensing. By using the exact probability density functions on the tail behavior of the extreme eigenvalues of Wishart matrices the overestimation of the union bound is dramatically reduced. We focus on the slightly more technical results for the bound on the most extreme of the largest eigenvalues, $\mathcal{U}(\delta,\rho)$ , and prove these statements in full detail. Corresponding results for $\mathcal{L}(\delta,\rho)$ are stated with their similar proofs omitted.

The probability density function, $f_{max}(k,n;\lambda)$ , for the largest eigenvalue of the $k\times k$ Wishart matrix $A_{K}^{T}A_{K}$ was determined by Edelman in . For our analysis, a simplified upper bound suffices.

Let $A_{K}$ be a matrix of size $n\times k$ whose entries are drawn i.i.d from ${\cal N}(0,n^{-1})$ . Let $f_{max}(k,n;\lambda)$ denote the probability density function for the largest eigenvalue of the Wishart matrix $A_{K}^{T}A_{K}$ of size $k\times k$ . Then $f_{max}(k,n;\lambda)$ satisfies:

For our purposes, it is sufficient to have a precise characterization of $g_{max}(k,n;\lambda)$ ’s exponential (with respect to $n$ ) behavior.

where $p_{max}(n,\lambda)$ is a polynomial in $n,\lambda$ .

Let $g_{max}(k,n;\lambda)$ be as defined in (11) and let $\rho_{n}=k/n$ . To extract the exponential behavior of $g_{max}(k,n;\lambda)$ we write $\frac{1}{n}\log(g_{max}(k,n;\lambda))=\Phi_{1}(k,n;\lambda)+\Phi_{2}(k,n;\lambda)+\Phi_{3}(k,n;\lambda)$ where

Clearly, $\lim_{n\rightarrow\infty}\Phi_{1}(k,n;\lambda)=0$ and can be subsumed as part of $p_{max}(n,\lambda)$ . To simplify $\Phi_{3}$ , we apply the second of Binet’s log gamma formulas [52, Sec. 12.32], namely $\log(\Gamma(z))=(z-1/2)\log z-z+\log\sqrt{2\pi}+I$ where $I$ is a convergent, improper integral. With $c(n,\rho)$ representing the constant and integral from Binet’s formula we then have

As $\lim_{n\rightarrow\infty}n^{-1}c(n,\rho_{n})=0$ it can be absorbed into $p_{max}(n,\lambda)$ and we have

To bound $U(k,n,N)$ , we must simultaneously account for all ${N\choose k}$ Wishart matrices $A_{K}^{T}A_{K}$ derived from $A$ . Using a union bound this amounts to studying the exponential behavior of ${N\choose k}g_{max}(k,n;\lambda)$ . In the proportional-growth asymptotic this can be determined by characterizing $\lim_{N\rightarrow\infty}N^{-1}\log\left[{N\choose k}g_{max}(k,n;\lambda)\right]$ , which from Lemma 7 is given by

Recall that $H(p):=p\log(1/p)+(1-p)\log(1/(1-p))$ is the usual Shannon Entropy with base $e$ logarithms.

Equipped with Lemma 7 and (13), Proposition 8 establishes $\lambda^{max}(\delta,\rho)-1$ as an upper bound on $U(k,n,N)$ in the proportional-growth asymptotic.

Throughout this proof $\delta$ and $\rho$ are fixed, and we focus our attention on $\lambda$ , often abbreviating $\psi_{\mathcal{U}}(\delta,\rho;\lambda)$ in (13) as $\psi_{U}(\lambda)$ . We first verify that (9) has a unique solution. Since

$\psi_{U}(\lambda)$ is strictly decreasing on $[1+\rho,\infty)$ and is strictly concave. Combined with

and $\lim_{\lambda\rightarrow\infty}\psi_{U}(\lambda)=-\infty$ , there is a unique solution to (9), namely $\lambda^{max}(\delta,\rho)$ .

Select $\epsilon>0$ and let $(k,n,N)$ be such that $\frac{n}{N}=\delta_{n}$ , $\frac{k}{n}=\rho_{n}$ . First, we write the probability statement in terms of $\lambda^{max}(\delta_{n},\rho_{n})$ :

To bound the integral in (14) in terms of $g_{max}(\delta,\rho;\lambda^{max}(\delta_{n},\rho_{n}))$ we write $g_{max}(k,n;\lambda)$ in terms of $n$ , $\rho_{n}$ , and $\lambda$ as $g_{max}(k,n;\lambda)=\varphi(n,\rho_{n})\lambda^{-\frac{3}{2}}\lambda^{\frac{n}{2}(1+\rho_{n})}e^{-\frac{n}{2}\lambda}$ where

Since $\lambda^{max}(\delta_{n},\rho_{n})>1+\rho_{n}$ , the quantity $\lambda^{\frac{n}{2}(1+\rho_{n})}e^{-\frac{n}{2}\lambda}$ is strictly decreasing in $\lambda$ on $[\lambda^{max}(\delta,\rho_{n}),\infty)$ . Therefore we have

Therefore, combining (14) and (15) we obtain

with the last inequality following from the strict concavity of $\psi_{U}(\lambda)$ . Since $\frac{d}{d\lambda}\psi_{U}\left(\lambda^{max}(\delta,\rho)\right)<0$ is strictly bounded away from zero and $\lim_{n\rightarrow\infty}\lambda^{max}(\delta_{n},\rho_{n})=\lambda^{max}(\delta,\rho)$ , we arrive at, for any $\epsilon>0$

By the definition of $\chi^{N}(k)$ in Definition 1, $U(j,n,N)\geq U(k,n,N)$ for $j=k+1,k+2,\ldots,n$ ; combined with Proposition 8 for $\frac{j}{n}\rightarrow\nu$ as $n\rightarrow\infty$

exponentially in $n$ , and taking a minimum over the compact set $\nu\in[\rho,1]$ we arrive at the desired result. ∎

A similar approach leads to corresponding results for $\mathcal{L}(\delta,\rho)$ . Edelman also determined the probability density function, $f_{min}(k,n;\lambda)$ , for the smallest eigenvalue of the $k\times k$ Wishart matrix $A_{K}^{T}A_{K}$ . Here again, a simplified upper bound suffices:

Let $A_{K}$ be a matrix of size $n\times k$ whose entries are drawn i.i.d. from $\mathcal{N}(0,n^{-1})$ . Let $f_{min}(k,n;\lambda)$ denote the probability density function for the smallest eigenvalue of the Wishart matrix $A_{K}^{T}A_{K}$ of size $k\times k$ . Then $f_{min}(k,n;\lambda)$ satisfies:

With Lemma 10, we establish a bound on the asymptotic behavior of the distribution of the smallest eigenvalue of Wishart matrix of size $k\times k$ .

where $p_{min}(n,\lambda)$ is a polynomial in $n,\lambda$ .

With Lemma 11, the large deviation analysis yields

Similar to the proof of Proposition 8, Lemma 11 and (19) are used to establish $\mathcal{L}(\delta,\rho)$ as an upper bound on $L(k,n,N)$ in the proportional-growth asymptotic.

Let $\delta,\rho\in(0,1]$ , and $A$ be a matrix of size $n\times N$ whose entries are drawn i.i.d. from ${\cal N}(0,n^{-1})$ . Define $\mathcal{L}(\delta,\rho):=1-\lambda^{min}(\delta,\rho)$ where $\lambda^{min}(\delta,\rho)$ is the solution to (8). Then for any $\epsilon>0$ , in the proportional-growth asymptotic

The bound $\mathcal{L}(\delta,\rho)$ is strictly increasing in $\rho$ for any $\delta\in(0,1)$ , and as a consequence no tighter bound can be achieved by minimizing over matrices of larger size as was done in Proposition 9.

RIP Undersampling Theorems

As stated at the end of the introduction, one of the central aims of this article is to advocate a unifying framework for the comparison of results in compressed sensing. Currently there is no general agreement in the compressed sensing community on such a framework, making it difficult to compare results obtained by different methods of analysis or to identify when new results are improvements over existing ones. Donoho has put forth the phase transition framework borrowed from the statistical mechanics literature and used successfully in a similar context by the combinatorial optimization community, see . This framework has been successfully employed in compressed sensing by Donoho et al, .

the oversampling rate of making $n$ measurements as opposed to the optimal oracle rate of making $k$ measurements when the oracle knows the support of $x$ :

For each value of $\delta_{n}\in(0,1)$ there is a largest value of $\rho_{n}$ which guarantees successful recovery of $x$ .

We now formalize the phase transition framework described above.

The event StrongEquiv( $A,$ alg) denotes the following property of an $n\times N$ matrix $A$ : for every $k$ -sparse vector $x$ , the algorithm “alg” exactly recovers $x$ from the corresponding measurements $y=Ax$ .

For most compressed sensing algorithms and for a broad class of matrices, under the proportional-growth asymptotic there is a strictly positive function $\rho_{S}(\delta;$ alg $)>0$ defining a region of the $(\delta,\rho)$ phase space which ensures successful recovery of every $k$ -sparse vector $x\in\chi^{N}(k)$ . This function, $\rho_{S}(\delta;$ alg), is called the Strong phase transition function .

Consider the proportional-growth asymptotic with parameters $(\delta,\rho)\in(0,1)\times(0,1/2)$ . Draw the corresponding $n\times N$ matrices $A$ from the Gaussian ensemble and fix $\epsilon>0$ . Suppose that we are given a function $\rho_{S}(\delta$ ;alg) with the property that, whenever $0<\rho<(1-\epsilon)\rho_{S}(\delta;$ alg), $\hbox{Prob}(\hbox{StrongEquiv}(A,$ alg $))\rightarrow 1$ as $n\rightarrow\infty$ . We say that $\rho_{S}(\delta$ ;alg) bounds a region of strong equivalence.

For any matrix $A$ of size $n\times N$ with RIP constants $L(2k,n,N)$ and $U(2k,n,N)$ , for $2k\leq n<N$ . Define

To translate this result into the phase transition framework for matrices from the Gaussian ensemble, we employ the RIP bounds (10) to the asymmetric RIP constants $L(2k,n,N)$ and $U(2k,n,N)$ . It turns out that naively inserting these bounds into (20) yields a bound on $\mu^{FL}(k,n,N)$ , see Lemma 18, and provides a simple way to obtain a bound on the region of strong equivalence.

and $\rho_{S}^{FL}(\delta)$ as the solution to $\mu^{FL}(\delta,\rho)=1$ .

The function $\rho_{S}^{FL}(\delta)$ is displayed as the red curve in Figure 5.

Fix $\epsilon>0$ . Consider the proportional-growth asymptotic with parameters $(\delta,\rho)\in(0,1)\times(0,1/2)$ . Draw the corresponding $n\times N$ matrices $A$ from the Gaussian ensemble. Then

Theorem 5 and the form of $\mu^{FL}(\delta,\rho)$ imply a similar bound to the above with a modified dependence on $\epsilon$ . For any $c\epsilon>0$ , with $n/N\rightarrow\delta\in(0,1)$ and $k/n\rightarrow\rho\in(0,1/2]$ , the probability, on the draw of $A$ from the Gaussian ensemble, that

is satisfied converges to one exponentially with $n$ . Since $\mathcal{U}(\delta,\rho)$ is non-decreasing in $\rho$ and $\mathcal{L}(\delta,\rho)$ is strictly increasing in $\rho$ for any $\delta$ and $\rho\in(0,1)$ , it follows that the right-hand side of (23) can be bounded by the right-hand side of (22) for any fixed $\epsilon$ satisfying $0<\epsilon<\frac{1}{2\rho}-1$ , by setting

(The upper bound on $\epsilon$ is imposed so that the second argument of $\mathcal{U}(\delta,\cdot)$ and $\mathcal{L}(\delta,\cdot)$ , $2(1+\epsilon)\rho$ , is in the admissible range of $(0,1)$ .) That the bound (22) is satisfied for all $\epsilon>0$ sufficiently small, and that the right hand side of (22) is strictly increasing in $\epsilon$ establishes that (22) is in fact satisfied probability on the draw of $A$ that converges to one exponentially in $n$ for any $\epsilon\in\left(0,\frac{1}{2\rho}-1\right)$ . ∎

and $\rho_{S}^{RV}(\delta)$ as the solution to $\mu^{RV}(\delta,\rho)=1$ .

The function $\rho_{S}^{RV}(\delta)$ is displayed as the blue curve in Figure 5.

Versions of Theorems 19 and 21 exist for finite values of $(k,n,N)$ , , but in each case the recoverability conditions rapidly approach the stated asymptotic limiting functions $\rho_{S}(\delta)$ as $(k,n,N)$ grow; we do not further complicate the discussion with their rates of convergence.

3 Further Considerations

In this article, we have considered only the case of noiseless measurements, Regions of Strong Equivalence, and a particular result obtained via an eigenvalue analysis and the RIP. We briefly discuss some additional considerations for the phase transition framework.

3.2 Regions of Weak Equivalence and Average Case Performance

In many applications, it may not be imperative that the decoder is able to reconstruct every $k$ -sparse vector. Instead, one may be willing to lose a small fraction of all possible $k$ -sparse signals. This is the behavior observed when a decoder is tested on $k$ -sparse vectors whose support sets are drawn uniformly at random. Large scale empirical testing of CoSaMP, Subspace Pursuit and Iterated Hard Thresholding were compiled by Donoho and Maleki . Most sparse approximation algorithms do not have a theoretical average case analysis. The polytope analysis of Donoho and Tanner allows for analytical arguments providing a Region of Weak Equivalence where recovery is guaranteed for all but a but a small fraction of $k$ -sparse signals. An average case variant of the RIP is being developed, see .

3.3 Improving the RIP Phase Transition

Alternatively, the region of strong equivalence might be increased by improving the bounds on the RIP constants, $\mathcal{L}(\delta,\rho),\mathcal{U}(\delta,\rho)$ . If the method of analysis remained the same, we can explore the effects of improved bounds by examining the statements with empirically observed lower bounds on RIP constants for Gaussian matrices. As detailed in Table 1, the current bounds from Thm. 5 are no more than twice the empirical RIP constants. Replacing the RIP constants with empirically observed lower bounds of the RIP constants (for $n=800$ ) in $\mu^{FL}(k,n,N)$ gives us a function $\rho_{S}^{emp}(\delta)$ , see Figure 6, which is an upper bound on the region of strong equivalence implied by Thm. 15; this improvement is no more than 2.5 times $\rho_{S}^{FL}(\delta)$ for $\delta\in[1/20,20/21]$ .

Foucart and Lai bounded the discrepancy between $x^{\star}_{\theta}$ and any $x_{\theta}$ satisfying (25) in terms of the discrepancy between $x_{\theta}$ and its best $k$ sparse approximation,

Given $q\in(0,1]$ , for any matrix $A$ of size $n\times N$ with $n<N$ and with RIP constants $L(2k,n,N)$ and $U(2k,n,N)$ and $\mu(2k,n,N)$ defined as (20), if

then a solution $x^{\star}_{\theta}$ of (26) approximates any $x_{\theta}$ satisfying (25) within the bounds

with $C_{1},C_{2},D_{1},$ and $D_{2}$ functions of $q$ , $\alpha$ , and $\frac{1+U(2\alpha k,n,N)}{1-L(2\alpha k,n,N)}$ .

The parameter $\alpha$ in Theorem 22 is a free parameter from the method of proof, and should be selected so as to maximize the region where (28) and/or other conditions are satisfied. For brevity we do not state the formulae for $C_{1},C_{2},D_{1},$ and $D_{2}$ as functions of $(k,n,N)$ , but only state them in Theorem 23 in terms of their bounds for Gaussian random matrices as $(k,n,N)\rightarrow\infty$ .

Although the solution of (26), $x^{\star}_{\theta}$ , has unknown sparsity, Theorem 22 ensures that if there is a solution of (25), $x_{\theta}$ , which can be well approximated by a $k$ sparse vector, i.e. if $\sigma_{k}(x_{\theta})_{q}$ is small, then if (28) is satisfied the discrepancy between $x^{\star}_{\theta}$ and $x_{\theta}$ will be similarly small. For instance, if the sparsest solution of (26), $x_{\theta}$ , is $k$ sparse, then (30) implies that $\|x_{\theta}-x^{\star}_{\theta}\|_{2}\leq D_{2}\cdot\theta$ ; moreover, if $\theta=0$ then $x^{\star}_{\theta}$ will be $k$ sparse and satisfy $Ax^{\star}_{\theta}=b$ (in the case $q=1$ this result is summarized as Theorem 15). Substituting bounds on the asymmetric RIP constants $L(2\alpha k,n,N)$ and $U(2\alpha k,n,N)$ from Theorem 5 we arrive at a quantitative version of Theorem 22 for Gaussian random matrices.

Given $q\in(0,1]$ , for any $\epsilon>0$ , as $(k,n,N)\rightarrow\infty$ with $n/N\rightarrow\delta\in(0,1)$ and $k/n\rightarrow\rho\in(0,1/2]$ , if $\rho<(1-\epsilon)\rho_{S}^{FL}(\delta;q)$ where $\rho_{S}^{FL}(\delta;q)$ is the maximum over $1\leq\alpha\leq 1/2\rho$ of the solutions, $\rho(\delta;q,\alpha)$ , of $\mu_{\alpha}(\delta,2\alpha\rho):=\alpha^{1/2-1/q}\mu(\delta,2\alpha\rho)=1$ with $\mu(\delta,2\rho)$ defined as in (21), there is overwhelming probability on the draw of $A$ with Gaussian i.i.d. entries that a solution $x^{\star}_{\theta}$ of (26) approximates any $x_{\theta}$ satisfying (25) within the bounds

The multiplicative “stability factors” are defined as:

with $\beta(\delta,\rho):=(1+\sqrt{2})\frac{1+\mathcal{U}(\delta,\rho)}{1-\mathcal{L}(\delta,\rho)}$ .