Compressed Sensing: How sharp is the Restricted Isometry Property

Jeffrey D. Blanchard, Coralia Cartis, Jared Tanner

Introduction

Questions of this form must have been around for millennia. Consider this puzzle: “A counterfeit coin is hidden in a batch of NN otherwise similar coins; it is distinguished from the others by its slightly heavier weight. How many balance weighings are needed to find the counterfeit?” Abstractly, this concerns the special case where KK is an unknown singleton and the nonzero value is nonnegative; the balance is abstractly the same as an inner product which gives weight +1 to the coefficients placed in the “right” pan and -1 to the coefficients placed in the “left” pan. Many people quickly find that roughly log(N)\log(N) measurements suffice to find the position and value of the nonzero, each time putting half the remaining coins in one pan, half in the other, and discarding from further consideration the coins that turn out on the light side. Lighthearted as puzzles can sometimes seem, they can lead to serious applications.

During World War Two, efficient screening of large groups of soldiers for certain infections was based on the principle of group testing, in which blood from many soldiers is combined in a single tube and tested for presence of an infectious agent. If an infection is found, one studies that group and by dyadic subdivision eventually isolates the infecteds .

More advanced mathematics can do much better than such common-sense ideas. Those with a physical bent may quickly see that, if NN is prime, again assuming a singleton KK and a nonnegative xx, it will be enough, in fact, to make only 2 inner products, with respectively a sine and cosine of frequency 2π/N2\pi/N; the phase of the corresponding complex Fourier coefficient immediately reveals the position of the nonzero. Note here that, for large NN, we are doing dramatically better than common-sense (2 measurements rather than log(N)\log(N)).

Advanced mathematics is better than the common-sense approach in another way: common-sense uses adaptive measurements, where the next measurement vector is selected after viewing all previous measurements. In the advanced approach, adaptivity is unnecessary: one simply makes 2 measurements defined a priori and later combines the two to reconstruct.

Compressed Sensing (CS) embodies the advanced approach: it designs a special matrix AA of size n×Nn\times N, measures xx via y=Axy=Ax, giving nn measurements of the NN vector xx in parallel, and reconstructs xx from (y,A)(y,A) using computationally efficient and stable algorithms. The key point is that nn can be taken much smaller than NN, and much closer to kk. For example, if xx is known to be kk-sparse and nonnegative n=2k+1n=2k+1 suffices and if xx is only known to be kk-sparse, roughly n=2log(N/n)kn=2\log(N/n)\cdot k will suffice, if k/Nk/N is small .

Since the release of the seminal CS papers in 2004, , a great deal of excitement has been generated in signal processing and applied mathematics research, with hundreds of papers on the theory, applications, and extensions of compressed sensing (more than 400 of these are collected at Rice’s online Compressive Sensing Resources archive dsp.rice.edu/cs). Many applications have been proposed, including magnetic resonance imaging , radar , and single-pixel cameras to name a few. In the MRI applications, it has been reported that diagnostic quality images can be obtained in 1/71/7 the recording time using CS approaches, . For a recent review of CS see the special issue containing and for a review of sparse approximation see .

which is the convex relaxation of the computationally intractable decoder, , seeking the sparsest solution in agreement with the measurements

Following the usual convention in the CS community, z0\|z\|_{0} counts the number of nonzero entries in zz. Many other encoder/decoder pairs are also being actively studied, with new alternatives being proposed regularly; see Section 3.

To quantify the sparsity/undersampling trade off, we adopt a proportional-growth asymptotic, in which we consider sequences of triples (k,n,N)(k,n,N) where all elements grow large in a coordinated way, nδNn\sim\delta N and kρnk\sim\rho n for some constants δ,ρ>0\delta,\rho>0. This defines a two-dimensional phase space (δ,ρ)(\delta,\rho) in 2^{2} for asymptotic analysis.

A sequence of problem sizes (k,n,N)(k,n,N) is said to grow proportionally if, for (δ,ρ)2(\delta,\rho)\in^{2}, nNδ\frac{n}{N}\rightarrow\delta and knρ\frac{k}{n}\rightarrow\rho as nn\rightarrow\infty.

Ultimately, we want to determine, as precisely as possible, which subset of this phase space corresponds to successful recovery and which subset corresponds to unsuccessful recovery. This is the phase-transition framework advocated by Donoho et al ; see Section 3 for a precise definition. By translating the sufficient RIP conditions into the proportional-growth asymptotic, we find lower bounds on the phase-transition for (δ,ρ)(\delta,\rho) in 2^{2}. An answer to this question plays the role of an undersampling theorem: to what degree can we undersample a signal and still be able to reconstruct it?

to shed some light on the behavior of the RIP constants of a matrix ensemble with as much precision as possible;

to introduce a reader new to this topic to the type of large deviation analysis calculations often encountered in CS and applicable to many areas faced with combinatorial challenges.

We conclude with a discussion of some other important and related topics not addressed in the current paper. We briefly discuss comparisons of results when noise is present in the measurements or the signal xx is not perfectly kk-sparse, average case analysis versus the theoretical worst case analysis presented here, and the potential to improve the phase transition curves through improved analysis or improved bounds.

Bounds on RIP for Gaussian Random Matrices

The asymmetric way that the expected eigenvalues Λn,kmax\Lambda^{max}_{n,k} and Λn,kmin\Lambda^{min}_{n,k} deviate from 1 suggests that the symmetric treatment used by the traditional RIP is missing an important part of the picture. We generalize the RIP to an asymmetric form and derive the sharpest recovery conditions implied by the RIP.

For a matrix AA of size n×Nn\times N, the asymmetric RIP constants L(k,n,N;A)L(k,n,N;A) and U(k,n,N;A)U(k,n,N;A) are defined as:

(A similar change in the definition of the RIP constants was used independently by Foucart and Lai in , motivated by different concerns.)

Although both the smallest and largest singular values of AKTAKA_{K}^{T}A_{K} affect the stability of the reconstruction algorithms, the smaller eigenvalue is dominant for compressed sensing in that it allows distinguishing between sparse vectors from their measurement by AA. In fact, it is often incorrectly stated that R(2k,n,N)<1R(2k,n,N)<1 is a necessary condition to ensure that there are no two kk-sparse vectors, say xx and xx^{\prime}, with the same measurements Ax=AxAx=Ax^{\prime}; the actual necessary condition is L(2k,n,N)<1L(2k,n,N)<1.

We see from (4) and (5) that (1L(k,n,N))=minKλmin(GK)(1-L(k,n,N))=\min_{K}\lambda^{min}(G_{K}) and (1+U(k,n,N))=maxKλmax(GK)(1+U(k,n,N))=\max_{K}\lambda^{max}(G_{K}) with GK=AKTAKG_{K}=A_{K}^{T}A_{K}. A standard large deviation analysis of bounds on the probability density functions of Λn,kmax\Lambda^{max}_{n,k} and Λn,kmin\Lambda^{min}_{n,k} allows us to establish upper bounds of L(k,n,N)L(k,n,N) and U(k,n,N)U(k,n,N) which are exponentially unlikely to be exceeded.

Let AA be a matrix of size n×Nn\times N drawn from the Gaussian ensemble and consider the proportional-growth asymptotic (nNδ\frac{n}{N}\rightarrow\delta and knρ\frac{k}{n}\rightarrow\rho as nn\rightarrow\infty). Let H(p):=plog(1/p)+(1p)log(1/(1p))H(p):=p\log(1/p)+(1-p)\log(1/(1-p)) denote the usual Shannon Entropy with base ee logarithms, and let

Define λmin(δ,ρ)\lambda^{min}(\delta,\rho) and λmax(δ,ρ)\lambda^{max}(\delta,\rho) as the solution to (8) and (9), respectively:

Define L(δ,ρ)\mathcal{L}(\delta,\rho) and U(δ,ρ)\mathcal{U}(\delta,\rho) as

To facilitate ease of calculating L(δ,ρ)\mathcal{L}(\delta,\rho) and U(δ,ρ)\mathcal{U}(\delta,\rho), web forms for their calculation are available at ecos.maths.ed.ac.uk.

In the proportional growth asymptotic, the probability that L(δ,ρ)\mathcal{L}(\delta,\rho) and U(δ,ρ)\mathcal{U}(\delta,\rho) bound the random variables L(k,n,N)L(k,n,N) and U(k,n,N)U(k,n,N), respectively, tends to 1 as nn\rightarrow\infty. In statistical terminology, the coverage probability of the upper confidence bounds L(δ,ρ)\mathcal{L}(\delta,\rho) and U(δ,ρ)\mathcal{U}(\delta,\rho) tends to one as nn\rightarrow\infty. In fact, all probabilities presented in this manuscript converge to their limit “exponentially in nn”; that is, the probability for finite nn approaches its limit as nn grows with discrepancy bounded by a constant multiple of enβe^{-n\beta} for some fixed β>0\beta>0.

Fix ϵ>0\epsilon>0. Under the proportional-growth asymptotic, Definition 2, sample each n×Nn\times N matrix AA from the Gaussian ensemble. Then

Extensive empirical estimates of L(k,n,N)L(k,n,N) and U(k,n,N)U(k,n,N) show that the bounds L(δ,ρ)\mathcal{L}(\delta,\rho) and U(δ,ρ)\mathcal{U}(\delta,\rho) are rather sharp; in fact, they are no more than twice the actual upper bounds on L(k,n,N)L(k,n,N) and U(k,n,N)U(k,n,N), see Figure 4 and Table 1, and are much closer for the region applicable for CS decoders, ρ1\rho\ll 1. The empirically observed lower bounds on L(k,n,N)L(k,n,N) and U(k,n,N)U(k,n,N) are calculated through the following process. The number of rows, nn, is fixed at one of the values in Table 1. For each nn, 47 values of NN are selected so that n/Nn/N ranges from 1/201/20 to 20/2120/21. For each (n,N)(n,N) a matrix AA of size n×Nn\times N is drawn from N(0,n1){\cal N}(0,n^{-1}) and either the algorithm from or is applied to determine support sets of size k=1,2,,n1k=1,2,\ldots,n-1 which are candidates for the support sets that maximize L(k,n,N;A)L(k,n,N;A) or U(k,n,N;A)U(k,n,N;A). The largest or smallest eigenvalue of each resulting n×kn\times k submatrix is calculated and recorded. The above process is repeated for some number of matrices, see the caption of Table 1, and the maximum value recorded. The empirical calculation of RIP constants are lower bounds on the true RIP constants as the support sets calculated by and may not be the support sets which maximize the RIP constants.

In order to prove Theorem 5, this section employs a type of large deviation technique often encountered in CS and applicable in fact, to many areas faced with combinatorial challenges.

We first establish some useful lemmas concerning the extreme eigenvalues of Wishart matrices. The matrix AA generates (Nk){N\choose k} different Wishart matrices Gk=AKTAKG_{k}=A_{K}^{T}A_{K}. Exponential bounds on the tail probabilities of the largest and smallest eigenvalues of such Wishart matrices can be combined with exponential bounds on (Nk){N\choose k} to control the chance of large deviations using the union bound. This large deviation analysis technique is characteristic of proofs in compressed sensing. By using the exact probability density functions on the tail behavior of the extreme eigenvalues of Wishart matrices the overestimation of the union bound is dramatically reduced. We focus on the slightly more technical results for the bound on the most extreme of the largest eigenvalues, U(δ,ρ)\mathcal{U}(\delta,\rho), and prove these statements in full detail. Corresponding results for L(δ,ρ)\mathcal{L}(\delta,\rho) are stated with their similar proofs omitted.

The probability density function, fmax(k,n;λ)f_{max}(k,n;\lambda), for the largest eigenvalue of the k×kk\times k Wishart matrix AKTAKA_{K}^{T}A_{K} was determined by Edelman in . For our analysis, a simplified upper bound suffices.

Let AKA_{K} be a matrix of size n×kn\times k whose entries are drawn i.i.d from N(0,n1){\cal N}(0,n^{-1}). Let fmax(k,n;λ)f_{max}(k,n;\lambda) denote the probability density function for the largest eigenvalue of the Wishart matrix AKTAKA_{K}^{T}A_{K} of size k×kk\times k. Then fmax(k,n;λ)f_{max}(k,n;\lambda) satisfies:

For our purposes, it is sufficient to have a precise characterization of gmax(k,n;λ)g_{max}(k,n;\lambda)’s exponential (with respect to nn) behavior.

where pmax(n,λ)p_{max}(n,\lambda) is a polynomial in n,λn,\lambda.

Let gmax(k,n;λ)g_{max}(k,n;\lambda) be as defined in (11) and let ρn=k/n\rho_{n}=k/n. To extract the exponential behavior of gmax(k,n;λ)g_{max}(k,n;\lambda) we write 1nlog(gmax(k,n;λ))=Φ1(k,n;λ)+Φ2(k,n;λ)+Φ3(k,n;λ)\frac{1}{n}\log(g_{max}(k,n;\lambda))=\Phi_{1}(k,n;\lambda)+\Phi_{2}(k,n;\lambda)+\Phi_{3}(k,n;\lambda) where

Clearly, limnΦ1(k,n;λ)=0\lim_{n\rightarrow\infty}\Phi_{1}(k,n;\lambda)=0 and can be subsumed as part of pmax(n,λ)p_{max}(n,\lambda). To simplify Φ3\Phi_{3}, we apply the second of Binet’s log gamma formulas [52, Sec. 12.32], namely log(Γ(z))=(z1/2)logzz+log2π+I\log(\Gamma(z))=(z-1/2)\log z-z+\log\sqrt{2\pi}+I where II is a convergent, improper integral. With c(n,ρ)c(n,\rho) representing the constant and integral from Binet’s formula we then have

As limnn1c(n,ρn)=0\lim_{n\rightarrow\infty}n^{-1}c(n,\rho_{n})=0 it can be absorbed into pmax(n,λ)p_{max}(n,\lambda) and we have

To bound U(k,n,N)U(k,n,N), we must simultaneously account for all (Nk){N\choose k} Wishart matrices AKTAKA_{K}^{T}A_{K} derived from AA. Using a union bound this amounts to studying the exponential behavior of (Nk)gmax(k,n;λ){N\choose k}g_{max}(k,n;\lambda). In the proportional-growth asymptotic this can be determined by characterizing limNN1log[(Nk)gmax(k,n;λ)]\lim_{N\rightarrow\infty}N^{-1}\log\left[{N\choose k}g_{max}(k,n;\lambda)\right], which from Lemma 7 is given by

Recall that H(p):=plog(1/p)+(1p)log(1/(1p))H(p):=p\log(1/p)+(1-p)\log(1/(1-p)) is the usual Shannon Entropy with base ee logarithms.

Equipped with Lemma 7 and (13), Proposition 8 establishes λmax(δ,ρ)1\lambda^{max}(\delta,\rho)-1 as an upper bound on U(k,n,N)U(k,n,N) in the proportional-growth asymptotic.

Throughout this proof δ\delta and ρ\rho are fixed, and we focus our attention on λ\lambda, often abbreviating ψU(δ,ρ;λ)\psi_{\mathcal{U}}(\delta,\rho;\lambda) in (13) as ψU(λ)\psi_{U}(\lambda). We first verify that (9) has a unique solution. Since

ψU(λ)\psi_{U}(\lambda) is strictly decreasing on [1+ρ,)[1+\rho,\infty) and is strictly concave. Combined with

and limλψU(λ)=\lim_{\lambda\rightarrow\infty}\psi_{U}(\lambda)=-\infty, there is a unique solution to (9), namely λmax(δ,ρ)\lambda^{max}(\delta,\rho).

Select ϵ>0\epsilon>0 and let (k,n,N)(k,n,N) be such that nN=δn\frac{n}{N}=\delta_{n}, kn=ρn\frac{k}{n}=\rho_{n}. First, we write the probability statement in terms of λmax(δn,ρn)\lambda^{max}(\delta_{n},\rho_{n}):

To bound the integral in (14) in terms of gmax(δ,ρ;λmax(δn,ρn))g_{max}(\delta,\rho;\lambda^{max}(\delta_{n},\rho_{n})) we write gmax(k,n;λ)g_{max}(k,n;\lambda) in terms of nn, ρn\rho_{n}, and λ\lambda as gmax(k,n;λ)=φ(n,ρn)λ32λn2(1+ρn)en2λg_{max}(k,n;\lambda)=\varphi(n,\rho_{n})\lambda^{-\frac{3}{2}}\lambda^{\frac{n}{2}(1+\rho_{n})}e^{-\frac{n}{2}\lambda} where

Since λmax(δn,ρn)>1+ρn\lambda^{max}(\delta_{n},\rho_{n})>1+\rho_{n}, the quantity λn2(1+ρn)en2λ\lambda^{\frac{n}{2}(1+\rho_{n})}e^{-\frac{n}{2}\lambda} is strictly decreasing in λ\lambda on [λmax(δ,ρn),)[\lambda^{max}(\delta,\rho_{n}),\infty). Therefore we have

Therefore, combining (14) and (15) we obtain

with the last inequality following from the strict concavity of ψU(λ)\psi_{U}(\lambda). Since ddλψU(λmax(δ,ρ))<0\frac{d}{d\lambda}\psi_{U}\left(\lambda^{max}(\delta,\rho)\right)<0 is strictly bounded away from zero and limnλmax(δn,ρn)=λmax(δ,ρ)\lim_{n\rightarrow\infty}\lambda^{max}(\delta_{n},\rho_{n})=\lambda^{max}(\delta,\rho), we arrive at, for any ϵ>0\epsilon>0

By the definition of χN(k)\chi^{N}(k) in Definition 1, U(j,n,N)U(k,n,N)U(j,n,N)\geq U(k,n,N) for j=k+1,k+2,,nj=k+1,k+2,\ldots,n; combined with Proposition 8 for jnν\frac{j}{n}\rightarrow\nu as nn\rightarrow\infty

exponentially in nn, and taking a minimum over the compact set ν[ρ,1]\nu\in[\rho,1] we arrive at the desired result. ∎

A similar approach leads to corresponding results for L(δ,ρ)\mathcal{L}(\delta,\rho). Edelman also determined the probability density function, fmin(k,n;λ)f_{min}(k,n;\lambda), for the smallest eigenvalue of the k×kk\times k Wishart matrix AKTAKA_{K}^{T}A_{K} . Here again, a simplified upper bound suffices:

Let AKA_{K} be a matrix of size n×kn\times k whose entries are drawn i.i.d. from N(0,n1)\mathcal{N}(0,n^{-1}). Let fmin(k,n;λ)f_{min}(k,n;\lambda) denote the probability density function for the smallest eigenvalue of the Wishart matrix AKTAKA_{K}^{T}A_{K} of size k×kk\times k. Then fmin(k,n;λ)f_{min}(k,n;\lambda) satisfies:

With Lemma 10, we establish a bound on the asymptotic behavior of the distribution of the smallest eigenvalue of Wishart matrix of size k×kk\times k.

where pmin(n,λ)p_{min}(n,\lambda) is a polynomial in n,λn,\lambda.

With Lemma 11, the large deviation analysis yields

Similar to the proof of Proposition 8, Lemma 11 and (19) are used to establish L(δ,ρ)\mathcal{L}(\delta,\rho) as an upper bound on L(k,n,N)L(k,n,N) in the proportional-growth asymptotic.

Let δ,ρ(0,1]\delta,\rho\in(0,1], and AA be a matrix of size n×Nn\times N whose entries are drawn i.i.d. from N(0,n1){\cal N}(0,n^{-1}). Define L(δ,ρ):=1λmin(δ,ρ)\mathcal{L}(\delta,\rho):=1-\lambda^{min}(\delta,\rho) where λmin(δ,ρ)\lambda^{min}(\delta,\rho) is the solution to (8). Then for any ϵ>0\epsilon>0, in the proportional-growth asymptotic

The bound L(δ,ρ)\mathcal{L}(\delta,\rho) is strictly increasing in ρ\rho for any δ(0,1)\delta\in(0,1), and as a consequence no tighter bound can be achieved by minimizing over matrices of larger size as was done in Proposition 9.

RIP Undersampling Theorems

As stated at the end of the introduction, one of the central aims of this article is to advocate a unifying framework for the comparison of results in compressed sensing. Currently there is no general agreement in the compressed sensing community on such a framework, making it difficult to compare results obtained by different methods of analysis or to identify when new results are improvements over existing ones. Donoho has put forth the phase transition framework borrowed from the statistical mechanics literature and used successfully in a similar context by the combinatorial optimization community, see . This framework has been successfully employed in compressed sensing by Donoho et al, .

the oversampling rate of making nn measurements as opposed to the optimal oracle rate of making kk measurements when the oracle knows the support of xx:

For each value of δn(0,1)\delta_{n}\in(0,1) there is a largest value of ρn\rho_{n} which guarantees successful recovery of xx.

We now formalize the phase transition framework described above.

The event StrongEquiv(A,A,alg) denotes the following property of an n×Nn\times N matrix AA: for every kk-sparse vector xx, the algorithm “alg” exactly recovers xx from the corresponding measurements y=Axy=Ax.

For most compressed sensing algorithms and for a broad class of matrices, under the proportional-growth asymptotic there is a strictly positive function ρS(δ;\rho_{S}(\delta;alg)>0)>0 defining a region of the (δ,ρ)(\delta,\rho) phase space which ensures successful recovery of every kk-sparse vector xχN(k)x\in\chi^{N}(k). This function, ρS(δ;\rho_{S}(\delta;alg), is called the Strong phase transition function .

Consider the proportional-growth asymptotic with parameters (δ,ρ)(0,1)×(0,1/2)(\delta,\rho)\in(0,1)\times(0,1/2). Draw the corresponding n×Nn\times N matrices AA from the Gaussian ensemble and fix ϵ>0\epsilon>0. Suppose that we are given a function ρS(δ\rho_{S}(\delta;alg) with the property that, whenever 0<ρ<(1ϵ)ρS(δ;0<\rho<(1-\epsilon)\rho_{S}(\delta;alg), Prob(StrongEquiv(A,\hbox{Prob}(\hbox{StrongEquiv}(A,alg))1))\rightarrow 1 as nn\rightarrow\infty. We say that ρS(δ\rho_{S}(\delta;alg) bounds a region of strong equivalence.

For any matrix AA of size n×Nn\times N with RIP constants L(2k,n,N)L(2k,n,N) and U(2k,n,N)U(2k,n,N), for 2kn<N2k\leq n<N. Define

To translate this result into the phase transition framework for matrices from the Gaussian ensemble, we employ the RIP bounds (10) to the asymmetric RIP constants L(2k,n,N)L(2k,n,N) and U(2k,n,N)U(2k,n,N). It turns out that naively inserting these bounds into (20) yields a bound on μFL(k,n,N)\mu^{FL}(k,n,N), see Lemma 18, and provides a simple way to obtain a bound on the region of strong equivalence.

and ρSFL(δ)\rho_{S}^{FL}(\delta) as the solution to μFL(δ,ρ)=1\mu^{FL}(\delta,\rho)=1.

The function ρSFL(δ)\rho_{S}^{FL}(\delta) is displayed as the red curve in Figure 5.

Fix ϵ>0\epsilon>0. Consider the proportional-growth asymptotic with parameters (δ,ρ)(0,1)×(0,1/2)(\delta,\rho)\in(0,1)\times(0,1/2). Draw the corresponding n×Nn\times N matrices AA from the Gaussian ensemble. Then

Theorem 5 and the form of μFL(δ,ρ)\mu^{FL}(\delta,\rho) imply a similar bound to the above with a modified dependence on ϵ\epsilon. For any cϵ>0c\epsilon>0, with n/Nδ(0,1)n/N\rightarrow\delta\in(0,1) and k/nρ(0,1/2]k/n\rightarrow\rho\in(0,1/2], the probability, on the draw of AA from the Gaussian ensemble, that

is satisfied converges to one exponentially with nn. Since U(δ,ρ)\mathcal{U}(\delta,\rho) is non-decreasing in ρ\rho and L(δ,ρ)\mathcal{L}(\delta,\rho) is strictly increasing in ρ\rho for any δ\delta and ρ(0,1)\rho\in(0,1), it follows that the right-hand side of (23) can be bounded by the right-hand side of (22) for any fixed ϵ\epsilon satisfying 0<ϵ<12ρ10<\epsilon<\frac{1}{2\rho}-1, by setting

(The upper bound on ϵ\epsilon is imposed so that the second argument of U(δ,)\mathcal{U}(\delta,\cdot) and L(δ,)\mathcal{L}(\delta,\cdot), 2(1+ϵ)ρ2(1+\epsilon)\rho, is in the admissible range of (0,1)(0,1).) That the bound (22) is satisfied for all ϵ>0\epsilon>0 sufficiently small, and that the right hand side of (22) is strictly increasing in ϵ\epsilon establishes that (22) is in fact satisfied probability on the draw of AA that converges to one exponentially in nn for any ϵ(0,12ρ1)\epsilon\in\left(0,\frac{1}{2\rho}-1\right). ∎

and ρSRV(δ)\rho_{S}^{RV}(\delta) as the solution to μRV(δ,ρ)=1\mu^{RV}(\delta,\rho)=1.

The function ρSRV(δ)\rho_{S}^{RV}(\delta) is displayed as the blue curve in Figure 5.

Versions of Theorems 19 and 21 exist for finite values of (k,n,N)(k,n,N), , but in each case the recoverability conditions rapidly approach the stated asymptotic limiting functions ρS(δ)\rho_{S}(\delta) as (k,n,N)(k,n,N) grow; we do not further complicate the discussion with their rates of convergence.

3 Further Considerations

In this article, we have considered only the case of noiseless measurements, Regions of Strong Equivalence, and a particular result obtained via an eigenvalue analysis and the RIP. We briefly discuss some additional considerations for the phase transition framework.

3.2 Regions of Weak Equivalence and Average Case Performance

In many applications, it may not be imperative that the decoder is able to reconstruct every kk-sparse vector. Instead, one may be willing to lose a small fraction of all possible kk-sparse signals. This is the behavior observed when a decoder is tested on kk-sparse vectors whose support sets are drawn uniformly at random. Large scale empirical testing of CoSaMP, Subspace Pursuit and Iterated Hard Thresholding were compiled by Donoho and Maleki . Most sparse approximation algorithms do not have a theoretical average case analysis. The polytope analysis of Donoho and Tanner allows for analytical arguments providing a Region of Weak Equivalence where recovery is guaranteed for all but a but a small fraction of kk-sparse signals. An average case variant of the RIP is being developed, see .

3.3 Improving the RIP Phase Transition

Alternatively, the region of strong equivalence might be increased by improving the bounds on the RIP constants, L(δ,ρ),U(δ,ρ)\mathcal{L}(\delta,\rho),\mathcal{U}(\delta,\rho). If the method of analysis remained the same, we can explore the effects of improved bounds by examining the statements with empirically observed lower bounds on RIP constants for Gaussian matrices. As detailed in Table 1, the current bounds from Thm. 5 are no more than twice the empirical RIP constants. Replacing the RIP constants with empirically observed lower bounds of the RIP constants (for n=800n=800) in μFL(k,n,N)\mu^{FL}(k,n,N) gives us a function ρSemp(δ)\rho_{S}^{emp}(\delta), see Figure 6, which is an upper bound on the region of strong equivalence implied by Thm. 15; this improvement is no more than 2.5 times ρSFL(δ)\rho_{S}^{FL}(\delta) for δ[1/20,20/21]\delta\in[1/20,20/21].

Foucart and Lai bounded the discrepancy between xθx^{\star}_{\theta} and any xθx_{\theta} satisfying (25) in terms of the discrepancy between xθx_{\theta} and its best kk sparse approximation,

Given q(0,1]q\in(0,1], for any matrix AA of size n×Nn\times N with n<Nn<N and with RIP constants L(2k,n,N)L(2k,n,N) and U(2k,n,N)U(2k,n,N) and μ(2k,n,N)\mu(2k,n,N) defined as (20), if

then a solution xθx^{\star}_{\theta} of (26) approximates any xθx_{\theta} satisfying (25) within the bounds

with C1,C2,D1,C_{1},C_{2},D_{1}, and D2D_{2} functions of qq, α\alpha, and 1+U(2αk,n,N)1L(2αk,n,N)\frac{1+U(2\alpha k,n,N)}{1-L(2\alpha k,n,N)}.

The parameter α\alpha in Theorem 22 is a free parameter from the method of proof, and should be selected so as to maximize the region where (28) and/or other conditions are satisfied. For brevity we do not state the formulae for C1,C2,D1,C_{1},C_{2},D_{1}, and D2D_{2} as functions of (k,n,N)(k,n,N), but only state them in Theorem 23 in terms of their bounds for Gaussian random matrices as (k,n,N)(k,n,N)\rightarrow\infty.

Although the solution of (26), xθx^{\star}_{\theta}, has unknown sparsity, Theorem 22 ensures that if there is a solution of (25), xθx_{\theta}, which can be well approximated by a kk sparse vector, i.e. if σk(xθ)q\sigma_{k}(x_{\theta})_{q} is small, then if (28) is satisfied the discrepancy between xθx^{\star}_{\theta} and xθx_{\theta} will be similarly small. For instance, if the sparsest solution of (26), xθx_{\theta}, is kk sparse, then (30) implies that xθxθ2D2θ\|x_{\theta}-x^{\star}_{\theta}\|_{2}\leq D_{2}\cdot\theta; moreover, if θ=0\theta=0 then xθx^{\star}_{\theta} will be kk sparse and satisfy Axθ=bAx^{\star}_{\theta}=b (in the case q=1q=1 this result is summarized as Theorem 15). Substituting bounds on the asymmetric RIP constants L(2αk,n,N)L(2\alpha k,n,N) and U(2αk,n,N)U(2\alpha k,n,N) from Theorem 5 we arrive at a quantitative version of Theorem 22 for Gaussian random matrices.

Given q(0,1]q\in(0,1], for any ϵ>0\epsilon>0, as (k,n,N)(k,n,N)\rightarrow\infty with n/Nδ(0,1)n/N\rightarrow\delta\in(0,1) and k/nρ(0,1/2]k/n\rightarrow\rho\in(0,1/2], if ρ<(1ϵ)ρSFL(δ;q)\rho<(1-\epsilon)\rho_{S}^{FL}(\delta;q) where ρSFL(δ;q)\rho_{S}^{FL}(\delta;q) is the maximum over 1α1/2ρ1\leq\alpha\leq 1/2\rho of the solutions, ρ(δ;q,α)\rho(\delta;q,\alpha), of μα(δ,2αρ):=α1/21/qμ(δ,2αρ)=1\mu_{\alpha}(\delta,2\alpha\rho):=\alpha^{1/2-1/q}\mu(\delta,2\alpha\rho)=1 with μ(δ,2ρ)\mu(\delta,2\rho) defined as in (21), there is overwhelming probability on the draw of AA with Gaussian i.i.d. entries that a solution xθx^{\star}_{\theta} of (26) approximates any xθx_{\theta} satisfying (25) within the bounds

The multiplicative “stability factors” are defined as:

with β(δ,ρ):=(1+2)1+U(δ,ρ)1L(δ,ρ)\beta(\delta,\rho):=(1+\sqrt{2})\frac{1+\mathcal{U}(\delta,\rho)}{1-\mathcal{L}(\delta,\rho)}.

Acknowledgements. The authors would like to thank the editor and the referees for their useful suggestions that have greatly improved the manuscript.

References