Various thresholds for $\ell_1$-optimization in compressed sensing

Mihailo Stojnic

Introduction

In last several years the area of compressed sensing has been the subject of extensive research. The breakthrough results of and theoretically demonstrated that in certain applications (e.g. signal processing in sensor networks) classical sampling at Nyquist rate may not be necessary to perfectly recover signals. Namely, it turns out that a crucial compressed sensing problem is finding the sparsest solution of an under-determined system of equations. While this problem had been known for a long time it is the work of and that rigorously proved for the first time that a sparse enough solution can be recovered by solving a linear program in polynomial time. These results generated enormous amount of research with possible applications ranging from high-dimensional geometry, image reconstruction, single-pixel camera design, decoding of linear codes, channel estimation in wireless communications, to machine learning, data-streaming algorithms, DNA micro-arrays, magneto-encephalography etc. (more on the compressed sensing problems, their importance, and wide spectrum of different applications can be found in excellent references ).

In this paper we are interested in the mathematical background of certain compressed sensing problems. As is well known, these problems are very easy to pose and very difficult to solve. Namely, they are as simple as the following: we would like to find x{\bf x} such that

where AA is an m×nm\times n (m<nm<n) measurement matrix and y{\bf y} is an m×1m\times 1 measurement vector. Standard compressed sensing context assumes that x{\bf x} is an n×1n\times 1 unknown kk-sparse vector (see Figure 1; here and in the rest of the paper, under kk-sparse vector we assume a vector that has at most kk nonzero components). The main topic of this paper will be compressed sensing of the so-called ideally sparse signals (more on the so-called approximately sparse signals can be found in e.g. ). We will mostly throughout the paper assume no special structure on the sparse signal (more on the very relevant cases of sparse signals with special structures the interested reader can find in ). Also, in the rest of the paper we will assume the so-called linear regime, i.e. we will assume that k=βnk=\beta n and that the number of the measurements is m=αnm=\alpha n where α\alpha and β\beta are absolute constants independent of nn (more on the non-linear regime, i.e. on the regime when mm is larger than linearly proportional to kk can be found in e.g. ).

We generally distinguish two classes of possible algorithms that can be developed for solving (1). The first class of algorithms assumes freedom in designing the measurement matrix AA. If one has the freedom to design the measurement matrix AA then the results from demonstrated that the techniques from coding theory (based on the coding/decoding of Reed-Solomon codes) can be employed to determine any kk-sparse x{\bf x} in (1) for any 0<α10<\alpha\leq 1 and any βα2\beta\leq\frac{\alpha}{2} in polynomial time. It is relatively easy to show that under the unique recoverability assumption β\beta can not be greater than α2\frac{\alpha}{2}. Therefore, as long as one is concerned with the unique recovery of kk-sparse x{\bf x} in (1) in polynomial time the results from are optimal. The complexity of algorithms from is roughly O(n3)O(n^{3}). In a similar fashion one can, instead of using coding/decoding techniques associated with Reed/Solomon codes, design the measurement matrix and the corresponding recovery algorithm based on the techniques related to the coding/decoding of Expander codes (see e.g. and references therein). In that case recovering x{\bf x} in (1) is significantly faster for large dimensions nn. Namely, the complexity of the techniques from e.g. (or their slight modifications) is usually O(n)O(n) which is clearly for large nn significantly smaller than O(n3)O(n^{3}). However, the techniques based on coding/decoding of Expander codes usually do not allow for β\beta to be as large as α2\frac{\alpha}{2}.

The main subject of this paper will be the algorithms from the second class. Namely, the second class assumes the algorithms that should be designed without having freedom to design the measurement matrix AA in parallel. If one has no freedom in the choice of the matrix AA (instead the matrix AA is rather given to us) then the recovery problem (1) becomes NP-hard. The following two algorithms (and their different variations) are then of special interest (and certainly have been the subject of an extensive research in recent years):

Under certain probabilistic assumptions on the elements of the matrix AA it can be shown (see e.g. ) that if α=O(βlog(1β))\alpha=O(\beta\log(\frac{1}{\beta})) OMP (or a slightly modified OMP) can recover x{\bf x} in (1) with complexity of recovery O(n2)O(n^{2}). On the other hand the so-called stage-wise OMP from recovers x{\bf x} in (1) with complexity of recovery O(nlogn)O(n\log n).

Instead of characterizing the m×nm\times n matrix AA through the RIP condition, in the author associates certain polytope with the matrix AA. Namely, consider polytope obtained by projecting the regular nn-dimensional cross-polytope using the matrix AA. It turns out that a necessary and sufficient condition for (2) to produce the solution of (1) is that this polytope associated with the matrix AA is kk-neighborly . Using the results of , it is further shown in , that if the matrix AA is a random m×nm\times n ortho-projector matrix then with overwhelming probability polytope obtained projecting the standard nn-dimensional cross-polytope by AA is kk-neighborly. The precise relation between mm and kk in order for this to happen is characterized in as well.

It should be noted that one usually considers success of (2) in finding solution of (1) for any given x{\bf x}. It is also of interest to consider success of (2) in finding solution of (1) for almost any given x{\bf x}. To make a distinction between these cases we recall on the following definitions from .

Clearly, for any given constant α1\alpha\leq 1 there is a maximum allowable value of the constant β\beta such that (2) finds solution of (1) with overwhelming probability for any x{\bf x}. This maximum allowable value of the constant β\beta is called the strong threshold (see ). We will denote the value of the strong threshold by βs\beta_{s}. Similarly, for any given constant α1\alpha\leq 1 one can define the sectional threshold as the maximum allowable value of the constant β\beta such that (2) finds the solution of (1) with overwhelming probability for any x{\bf x} with a given fixed location of non-zero components (see ). In a similar fashion one can then denote the value of the sectional threshold by βsec\beta_{sec}. Finally, for any given constant α1\alpha\leq 1 one can define the weak threshold as the maximum allowable value of the constant β\beta such that (2) finds the solution of (1) with overwhelming probability for any x{\bf x} with a given fixed location of non-zero components and a given fixed combination of its elements signs (see ). In a similar fashion one can then denote the value of the weak threshold by βw\beta_{w}. In this paper we determine the values of βs,βsec,βw\beta_{s},\beta_{sec},\beta_{w} for the entire range of α\alpha, i.e. for 0α10\leq\alpha\leq 1, for a specific group of randomly generated matrices AA.

We organize the rest of the paper in the following way. In Section 2 we introduce two key theorems that will be the heart of our subsequent analysis. In Section 3 we determine the values of various thresholds in the case of general sparse signals x{\bf x} under the assumption that the null-space of the matrix AA is uniformly distributed in the Grassmanian. Under the same assumption on the statistics of the measurement matrix AA in Section 4 we determine the values of the weak threshold in a special case of the so-called signed vectors x{\bf x}. Finally, in Section 5 we discuss obtained results and possible directions for future work.

Key theorems

(Null-space characterization; General x{\bf x}) Assume that an m×nm\times n measurement matrix AA is given. Let x{\bf x} be a kk-sparse vector whose non-zero components can be both positive or negative. Further, assume that y=Ax{\bf y}=A{\bf x} and that w{\bf w} is an n×1n\times 1 vector. Let KK be any subset of {1,2,,n}\{1,2,\dots,n\} such that K=k|K|=k and let KiK_{i} denote the ii-th element of KK. Further, let Kˉ={1,2,,n}K\bar{K}=\{1,2,\dots,n\}\setminus K. Let 1{\bf 1} be a 2k×k2^{k}\times k sign matrix. Each element of the matrix 1{\bf 1} is either 11 or 1-1 and there are no two rows that are identical. Let 1j{\bf 1}_{j} be the jj-th row of the matrix 1{\bf 1}. Then (2) will produce the solution of (1) if

Having matrix AA such that (3) holds would be enough for solutions of (2) and (1) to coincide. If one assumes that mm and kk are proportional to nn (the case of our interest in this paper) then the construction of the deterministic matrices AA that would satisfy (3) is not an easy task (in fact, one may say that it is one of the most fundamental open problems in the area of theoretical compressed sensing; more on an equally important inverse problem of checking if a given matrix satisfies the condition of Theorem 3 the interested reader can find in ). However, turning to random matrices significantly simplifies things. As we will see later in the paper, the random matrices AA that have the null-space uniformly distributed in the Grassmanian will turn out to be a very convenient choice. The following phenomenal result from that relates to such matrices will be the key ingredient in the analysis that will follow.

( Escape through a mesh) Let SS be a subset of the unit Euclidean sphere Sn1S^{n-1} in RnR^{n}. Let YY be a random (nm)(n-m)-dimensional subspace of RnR^{n}, distributed uniformly in the Grassmanian with respect to the Haar measure. Let

where h{\bf h} is a random column vector in RnR^{n} with i.i.d. N(0,1){\cal N}(0,1) components. Assume that w(S)<(m14m)w(S)<\left(\sqrt{m}-\frac{1}{4\sqrt{m}}\right). Then

Remark: Gordon’s original constant 3.53.5 was substituted by 2.52.5 in . Both constants are fine for our subsequent analysis.

Probabilistic analysis of the null-space characterizations – general 𝐱𝐱{\bf x}

In this section we probabilistically analyze validity of the null-space characterization given in Theorem 3. In the first subsection of this section we will show how one can obtain the values of the strong threshold βs\beta_{s} for the entire range 0α10\leq\alpha\leq 1 based on such an analysis. In the later two subsections we will generalize the strong threshold analysis and obtain the values of the weak and the sectional threshold.

As masterly noted in Theorem 2 can be used to probabilistically analyze (3) (and as we will see later in the paper, many of its variants). Namely, let SS in (4) be

which according to Theorem 3 (or more precisely according to the remark after Theorem 3) means that the solutions of (1) and (2) coincide with probability 11. For any given value of α(0,1)\alpha\in(0,1) a threshold value of β\beta can be then determined as a maximum β\beta such that w(Ss)<(m14m)w(S_{s})<\left(\sqrt{m}-\frac{1}{4\sqrt{m}}\right). That maximum β\beta will be exactly the value of the strong threshold βs\beta_{s}. If one is only concerned with finding a possible value for βs\beta_{s} it is easy to note that instead of computing w(Ss)w(S_{s}) it is sufficient to find its an upper bound. However, as we will see later in the paper, to determine as good values of βs\beta_{s} as possible, the upper bound on w(Ss)w(S_{s}) should be as tight as possible. The main contribution of this work will be a fairly precise estimate of w(Ss)w(S_{s}).

In the following subsections we present a way to get such an estimate. To simplify the exposition we first set w(h,Ss)=maxwSs(hTw)w({\bf h},S_{s})=\max_{{\bf w}\in S_{s}}({\bf h}^{T}{\bf w}). In order to upper-bound w(Ss)w(S_{s}) we will first in Subsection 3.1.1 determine an upper bound BsB_{s} on w(h,Ss)w({\bf h},S_{s}). The expected value with respect to h{\bf h} of such an upper bound will be an upper bound on w(Ss)w(S_{s}). In Subsection 3.1.2 we will compute an upper bound on that expected value, i.e. we will compute an upper bound on E(Bs)E(B_{s}). That quantity will be an upper bound on w(Ss)w(S_{s}) since according to the following E(Bs)E(B_{s}) is an upper bound on w(Ss)w(S_{s})

From the definition of set SsS_{s} given in (6) it easily follows that if w{\bf w} is in SsS_{s} then any vector obtain from w{\bf w} by changing the signs to any subset of its elements is also in SsS_{s}. The signs of w{\bf w} can therefore be chosen so that they match the signs of the corresponding elements in h{\bf h}. We then easily have

Let w^\hat{{\bf w}} be the solution of the maximization on the right-hand side of (9). Then w^nw^n1w^n2w^1|\hat{{\bf w}}_{n}|\geq|\hat{{\bf w}}_{n-1}|\geq|\hat{{\bf w}}_{n-2}|\geq\dots\geq|\hat{{\bf w}}_{1}|.

Let y=(y1,y2,,yn)TRn{\bf y}=({\bf y}_{1},{\bf y}_{2},\dots,{\bf y}_{n})^{T}\in R^{n}. Then one can simplify (9) in the following way

To be in complete agreement with Lemma 1 one should add the sorting constraints on the elements of y{\bf y} in the optimization problem above. However, it is an easy exercise (similar to the proof of Lemma 1) to show that these constraints would be redundant, i.e. it is easy to show that any solution y^\hat{{\bf y}} to the above optimization problem will automatically satisfy y^ny^n1y^1\hat{{\bf y}}_{n}\geq\hat{{\bf y}}_{n-1}\geq\dots\geq\hat{{\bf y}}_{1} (of course since we will be interested in upper-bounding w(h,Ss)w({\bf h},S_{s}) one can even argue that in the maximization problem (9) dropping constraints would certainly provide a value no smaller than the optimal one obtainable if the constraints are included). To determine an upper bound on w(h,Ss)w({\bf h},S_{s}) we will use the method of Lagrange duality. Before deriving the Lagrange dual we slightly modify (10) in the following way

To further facilitate writing let zRn{\bf z}\in R^{n} be a column vector such that zi=1,1i(nk){\bf z}_{i}=1,1\leq i\leq(n-k) and zi=1,nk+1in{\bf z}_{i}=-1,n-k+1\leq i\leq n. Further, let λ=(λ1,λ2,,λn)TRn\lambda=(\lambda_{1},\lambda_{2},\dots,\lambda_{n})^{T}\in R^{n}. Following, e.g. , we can write the dual of the optimization problem (11) and its optimal value wup(h,Ss)w_{up}({\bf h},S_{s}) as

One can then transform the objective function in the following way

After trivially solving the inner minimization in (13) we obtain

By duality, wup(h,Ss)w(h,Ss)-w_{up}({\bf h},S_{s})\leq-w({\bf h},S_{s}) which easily implies w(h,Ss)wup(h,Ss)w({\bf h},S_{s})\leq w_{up}({\bf h},S_{s}). Therefore wup(h,Ss)w_{up}({\bf h},S_{s}) is an upper bound on w(h,Ss)w({\bf h},S_{s}). (In fact one can easily show that the strong duality holds and that w(h,Ss)=wup(h,Ss)w({\bf h},S_{s})=w_{up}({\bf h},S_{s}); however, as explained earlier, for our analysis showing that wup(h,Ss)w_{up}({\bf h},S_{s}) is an upper bound on w(h,Ss)w({\bf h},S_{s}) is sufficient.) Along the same lines, one can easily spot that any feasible values ν\nu and λ\lambda in (15) will provide a valid upper bound on wup(h,Ss)w_{up}({\bf h},S_{s}) and hence a valid upper bound on w(h,Ss)w({\bf h},S_{s}). In what follows we will in fact determine the optimal values for ν\nu and λ\lambda. However, since it is not necessary for our analysis we will not put too much effort into proving that these values are optimal. As we have stated earlier, for our analysis it will be enough to show that the values for ν\nu and λ\lambda that we will obtain are feasible in (15).

Summing the above derivatives over ii and equalling with zero we obtain

Plugging the value for i=1cλi\sum_{i=1}^{c}\lambda_{i} obtained in (21) in (19) we have

As long as we can find a cc such that λi,1ic\lambda_{i},1\leq i\leq c given in (22) are non-negative ν\nu will be non-negative as well and ν\nu and λ\lambda will therefore be feasible in (15). This in turn implies

where f(h,ν,λ)f({\bf h},\nu,\lambda) is computed for the values of λ\lambda and ν\nu given in (22) and (24), respectively. (In fact determining the largest cc such that λi,1ic\lambda_{i},1\leq i\leq c given in (22) are non-negative will insure that f(h,ν,λ)=w(h,Ss)\sqrt{f({\bf h},\nu,\lambda)}=w({\bf h},S_{s}); however, as already stated earlier, this fact is not of any special importance for our analysis).

Let us now assume that cc is fixed such that λ\lambda and ν\nu are as given in (22) and (24). Then combining (18), (21), and (24) we have

To facilitate the exposition in the following subsection we will make the upper bound given in (29) slightly more pessimistic in the following lemma.

Fa1()F_{a}^{-1}(\cdot) is the inverse cdf of the random variable X|X| where XX is zero-mean, unit variance gaussian random variable. ϵ>0\epsilon>0 is an arbitrarily small constant independent of nn.

Follows from the previous analysis and (29). ∎

In this subsection we will compute an upper bound on E(Bs)E(B_{s}). As a first step we determine a lower bound on P(ζs(h,cs)>0)P(\zeta_{s}({\bf h},c_{s})>0). We start by a sequence of obvious inequalities

The rest of the analysis assumes that nn is large so that δs\delta_{s} can be assumed to be real (of course, δs\delta_{s} is a proportionality constant independent of nn). Using the results from we obtain

We will also need the following brilliant result from . Let ξ():RnR\xi(\cdot):R^{n}\longrightarrow R be a Lipschitz function such that ξ(a)ξ(b)σab2|\xi({\bf a})-\xi({\bf b})|\leq\sigma\|{\bf a}-{\bf b}\|_{2}. Let a{\bf a} be a vector comprised of i.i.d. zero-mean, unit variance Gaussian random variables. Then

The following sequence of inequalities/equalities is easy to establish

From (33), (34), and (38) we finally have

We now return to computing an upper bound on E(Bs)E(B_{s}). By the definition of BsB_{s} we have

Finally, the following lemma easily follows by combining (40), (42), and (43).

If nn is large the first term in (44) goes to zero. Then from (5), (7), and (44) it easily follows that for a fixed α\alpha one can determine βs\beta_{s} as a maximum β\beta such that

We recall that k=βnk=\beta n and zRn{\bf z}\in R^{n} is a column vector such that zi=1,1i(nk){\bf z}_{i}=1,1\leq i\leq(n-k) and zi=1,nk+1in{\bf z}_{i}=-1,n-k+1\leq i\leq n. Therefore, in the above equation β\beta is hidden in z{\bf z}. It is relatively easy to see that problem of finding βs\beta_{s} for a given fixed α\alpha is equivalent to finding minimum α\alpha such that (45) holds for a fixed βs\beta_{s}. Let βsmax\beta_{s}^{max} be βs\beta_{s} such that minimum α\alpha that satisfies (45) is 11. Our goal is then to determine minimum α\alpha that satisfies (45) for any βs[0,βsmax]\beta_{s}\in[0,\beta_{s}^{max}].

Therefore in the rest of this subsection we show how the left hand side of (45) can be computed for a randomly chosen fixed βs\beta_{s}. We do so in two steps:

From Lemma 2 we have cs=δsnc_{s}=\delta_{s}n is a cc such that

We then easily compute Fa1(1θs)F_{a}^{-1}(1-\theta_{s}) in the following way

where erfinv is the inverse of the standard error function of the normal random variable. We further find

Combination of (46), (50), (51), and (52) gives us the following equation for computing θs\theta_{s}

Let θ^s\hat{\theta}_{s} be the solution of (53). Then δs=1θ^s\delta_{s}=1-\hat{\theta}_{s} and cs=δsn=(1θ^s)nc_{s}=\delta_{s}n=(1-\hat{\theta}_{s})n. This concludes step 11.

where Fb1F_{b}^{-1} is the inverse cdf of the squared zero-mean unit variance Gaussian random variable. We then easily compute Fb1(1θ^s)F_{b}^{-1}(1-\hat{\theta}_{s}) in the following way

We summarize the results from this section in the following theorem.

(Strong threshold) Let AA be an m×nm\times n measurement matrix in (1) with the null-space uniformly distributed in the Grassmanian. Let the unknown x{\bf x} in (1) be kk-sparse. Let k,m,nk,m,n be large and let α=mn\alpha=\frac{m}{n} and βs=kn\beta_{s}=\frac{k}{n} be constants independent of mm and nn. Let erfinv be the inverse of the standard error function associated with zero-mean unit variance Gaussian random variable. Further, let ϵ>0\epsilon>0 be an arbitrarily small constant and θ^s\hat{\theta}_{s}, (βsθ^s1\beta_{s}\leq\hat{\theta}_{s}\leq 1) be the solution of

If α\alpha and βs\beta_{s} further satisfy

then the solutions of (1) and (2) coincide with overwhelming probability.

Follows from the previous discussion combining (5), (7), (30), (44), (45), (53), (54), and (58). ∎

The results for the strong threshold obtained from the above theorem as well as the best currently known ones from are presented on Figure 2. As can be seen, the threshold results obtained from the previous analysis are comparable to those from in a large portion of the range for α\alpha. For the values of α\alpha that are close to 11 the threshold values from Theorem 3 are slightly better than those from . For α1\alpha\longrightarrow 1 we have β.24\beta\approx.24 which matches the value obtained in and is in fact optimal.

2 Weak threshold

In this subsection we determine the weak threshold βw\beta_{w}. Before proceeding further we quickly recall on the definition of the weak threshold. Namely, for a given α\alpha, βw\beta_{w} is the maximum value of β\beta such that the solutions of (1) and (2) coincide for any given βn\beta n-sparse x{\bf x} with a fixed location of nonzero components and a fixed combination of signs of its elements. Since the analysis will clearly be irrelevant with respect to what particular location and what particular combination of signs of nonzero elements are chosen, we can for the simplicity of the exposition and without loss of generality assume that the components x1,x2,,xnk{\bf x}_{1},{\bf x}_{2},\dots,{\bf x}_{n-k} of x{\bf x} are equal to zero and the components xnk+1,xnk+2,,xn{\bf x}_{n-k+1},{\bf x}_{n-k+2},\dots,{\bf x}_{n} of x{\bf x} are smaller than or equal to zero. Under this assumption we have the following corollary of Theorem 3.

(Nonzero part of x{\bf x} has fixed signs and location) Assume that an m×nm\times n measurement matrix AA is given. Let x{\bf x} be a kk-sparse vector whose nonzero components are negative. Also let x1=x2==xnk=0.{\bf x}_{1}={\bf x}_{2}=\dots={\bf x}_{n-k}=0. Further, assume that y=Ax{\bf y}=A{\bf x} and that w{\bf w} is an n×1n\times 1 vector. Then (2) will produce the solution of (1) if

Following the procedure of Subsection 3.1 we set SwS_{w}

where as earlier h{\bf h} is a random column vector in RnR^{n} with i.i.d. N(0,1){\cal N}(0,1) components and Sn1S^{n-1} is the unit nn-dimensional sphere. As in Subsection 3.1 our goal will be to compute an upper bound on w(Sw)w(S_{w}) and then equal that upper bound to (m14m)\left(\sqrt{m}-\frac{1}{4\sqrt{m}}\right). In the following subsections we present a way to get such an upper bound. As earlier, to simplify the exposition we again set w(h,Sw)=maxwSw(hTw)w({\bf h},S_{w})=\max_{{\bf w}\in S_{w}}({\bf h}^{T}{\bf w}). In order to upper-bound w(Sw)w(S_{w}) we will first in Subsection 3.2.1 determine an upper bound BwB_{w} on w(h,Sw)w({\bf h},S_{w}). The expected value with respect to h{\bf h} of such an upper bound will be an upper bound on w(Sw)w(S_{w}). In Subsection 3.2.2 we will compute an upper bound on that expected value, i.e. we will compute an upper bound on E(Bw)E(B_{w}). That quantity will be an upper bound on w(Sw)w(S_{w}) since according to the following E(Bw)E(B_{w}) is an upper bound on w(Sw)w(S_{w})

Let h1:(nk)=(h1,h2,,hnk)T{\bf h}_{1:(n-k)}=({\bf h}_{1},{\bf h}_{2},\dots,{\bf h}_{n-k})^{T}. Further, let now h(i)(nk)|{\bf h}|_{(i)}^{(n-k)} be the ii-th smallest magnitude of elements of h1:(nk){\bf h}_{1:(n-k)}. Set

Then one can simplify (65) in the following way

where hˉi\bar{{\bf h}}_{i} is the ii-th element of hˉ\bar{{\bf h}} and c(nk)c\leq(n-k) is such that ((hˉTz)i=1chˉi)0((\bar{{\bf h}}^{T}{\bf z})-\sum_{i=1}^{c}\bar{{\bf h}}_{i})\geq 0. Clearly, as long as (hˉTz)0(\bar{{\bf h}}^{T}{\bf z})\geq 0 there will be a cc (it is possible that c=0c=0) such that quantity on the most right hand side of (68) is an upper bound on w(h,Sw)w({\bf h},S_{w}).

Using (68) we then establish the following analogue to Lemma 2.

Let hRn{\bf h}\in R^{n} be a vector with i.i.d. zero-mean unit variance gaussian components. Further let hˉ\bar{{\bf h}} be as defined in (66) and w(h,Sw)=maxwSw(hTw)w({\bf h},S_{w})=\max_{{\bf w}\in S_{w}}({\bf h}^{T}{\bf w}) where SwS_{w} is as defined in (62). Let zRn{\bf z}\in R^{n} be a column vector such that zi=1,1i(nk){\bf z}_{i}=1,1\leq i\leq(n-k) and zi=1,nk+1in{\bf z}_{i}=-1,n-k+1\leq i\leq n. Then

ζw(h,c)=(hˉTz)i=1chˉinchˉc\zeta_{w}({\bf h},c)=\frac{(\bar{{\bf h}}^{T}{\bf z})-\sum_{i=1}^{c}\bar{{\bf h}}_{i}}{n-c}-\bar{{\bf h}}_{c} and cw=δwnc_{w}=\delta_{w}n is a cnkc\leq n-k such that

Fa1()F_{a}^{-1}(\cdot) is the inverse cdf of the random variable X|X| where XX is zero-mean, unit variance gaussian random variable. ϵ>0\epsilon>0 is an arbitrarily small constant independent of nn.

Following step-by-step the derivation of Lemma 44 (with a trivial adjustment in computing Lipschitz constant σ\sigma) we can establish the weak threshold analogue to it.

Assume the setup of Lemma 5. Let further ψw=E(hˉTz)i=1cwhˉi)n\psi_{w}=\frac{E(\bar{{\bf h}}^{T}{\bf z})-\sum_{i=1}^{c_{w}}\bar{{\bf h}}_{i})}{n}.Then

Follows directly from the derivation before Lemma 44. ∎

As in (45), if nn is large, for a fixed α\alpha one can determine βw\beta_{w} as a maximum β\beta such that

As earlier k=βnk=\beta n and zRn{\bf z}\in R^{n} is a column vector such that zi=1,1i(nk){\bf z}_{i}=1,1\leq i\leq(n-k) and zi=1,nk+1in{\bf z}_{i}=-1,n-k+1\leq i\leq n. Also, as in Subsection 3.1.2, β\beta is again hidden in z{\bf z}. It is not difficult to see that problem of finding βw\beta_{w} for a given fixed α\alpha is equivalent to finding minimum α\alpha such that (73) holds for a fixed βw\beta_{w}. Let βwmax\beta_{w}^{max} be βw\beta_{w} such that minimum α\alpha that satisfies (73) is 11. Analogously to what was done in Subsection 3.1.2, we will determine minimum α\alpha that satisfies (73) for any βw[0,βwmax]\beta_{w}\in[0,\beta_{w}^{max}].

Therefore in the rest of this subsection we show how the left hand side of (73) can be computed for a randomly chosen fixed βw\beta_{w}. We, as in as in Subsection 3.1.2, do so in two steps:

We then compute Ei=cw+1nhˉi2n(E(hˉTz)Ei=1cwhˉi)2n(ncw)\frac{E\sum_{i=c_{w}+1}^{n}\bar{{\bf h}}_{i}^{2}}{n}-\frac{(E(\bar{{\bf h}}^{T}{\bf z})-E\sum_{i=1}^{c_{w}}\bar{{\bf h}}_{i})^{2}}{n(n-c_{w})} with cwc_{w} found in step 11.

From Lemma 5 we have cw=δwnc_{w}=\delta_{w}n is a cc such that

where we recall hˉi,1i(nβwn)\bar{{\bf h}}_{i},1\leq i\leq(n-\beta_{w}n), is the ii-th smallest magnitude of vector h1:(nβwn){\bf h}_{1:(n-\beta_{w}n)}. We also recall that h1:(nβwn){\bf h}_{1:(n-\beta_{w}n)} stands for the first (nβwn)(n-\beta_{w}n) components of h{\bf h} and hi,nβwn+1in{\bf h}_{i},n-\beta_{w}n+1\leq i\leq n, are naturally the last βwn\beta_{w}n components of vector h{\bf h}. Also, as always, all components of h{\bf h} are i.i.d. zero-mean unit variance Gaussian random variables and ϵ>0\epsilon>0 is an arbitrarily small constant. Then clearly Ehi=0,nβwn+1inE{\bf h}_{i}=0,n-\beta_{w}n+1\leq i\leq n and we have from (74)

Set θw=1δw\theta_{w}=1-\delta_{w}. Following and in a way completely analogous to (50) we obtain

Combination of (75), (76), and (77) gives us the following equation for computing θw\theta_{w}

Let θ^w\hat{\theta}_{w} be the solution of (78). Then δw=1θ^w\delta_{w}=1-\hat{\theta}_{w} and cw=δwn=(1θ^w)nc_{w}=\delta_{w}n=(1-\hat{\theta}_{w})n. This concludes step 11.

In this step we compute Ei=cw+1nhˉi2n(E(hˉTz)Ei=1cwhˉi)2n(ncw)\frac{E\sum_{i=c_{w}+1}^{n}\bar{{\bf h}}_{i}^{2}}{n}-\frac{(E(\bar{{\bf h}}^{T}{\bf z})-E\sum_{i=1}^{c_{w}}\bar{{\bf h}}_{i})^{2}}{n(n-c_{w})} with cw=(1θ^w)nc_{w}=(1-\hat{\theta}_{w})n. Using the results from step 11 we easily find

Effectively, what is left to compute is Ei=cw+1nhˉi2n\frac{E\sum_{i=c_{w}+1}^{n}\bar{{\bf h}}_{i}^{2}}{n}. First we note that

Using an approach similar to the one from step 22 of Subsection 3.1.2 and following we have

where as in Subsection 3.1.2 Fb1F_{b}^{-1} is the inverse cdf of squared zero-mean unit variance Gaussian random variable. Following (56) we then have

Combining (80), (81), (82), and (83) we obtain

We summarize the results from this section in the following theorem.

(Weak threshold) Let AA be an m×nm\times n measurement matrix in (1) with the null-space uniformly distributed in the Grassmanian. Let the unknown x{\bf x} in (1) be kk-sparse. Further, let the location and signs of nonzero elements of x{\bf x} be arbitrarily chosen but fixed. Let k,m,nk,m,n be large and let α=mn\alpha=\frac{m}{n} and βw=kn\beta_{w}=\frac{k}{n} be constants independent of mm and nn. Let erfinv be the inverse of the standard error function associated with zero-mean unit variance Gaussian random variable. Further, let ϵ>0\epsilon>0 be an arbitrarily small constant and θ^w\hat{\theta}_{w}, (βwθ^w1\beta_{w}\leq\hat{\theta}_{w}\leq 1) be the solution of

If α\alpha and βw\beta_{w} further satisfy

then the solutions of (1) and (2) coincide with overwhelming probability.

Follows from the previous discussion combining (5), (64), (69), (72), (73), (78), (79), and (84). ∎

The results for the weak threshold obtained from the above theorem as well as the best currently known ones from are presented on Figure 3. As can be seen, the threshold results obtained from the previous analysis match those from .

3 Sectional threshold

In this subsection we determine the sectional threshold βsec\beta_{sec}. Before proceeding further we one more time quickly recall on the definition of the sectional threshold. Namely, for a given α\alpha, βsec\beta_{sec} is the maximum value of β\beta such that the solutions of (1) and (2) coincide for any given βn\beta n-sparse x{\bf x} with a fixed location of nonzero components. Since the analysis will clearly be irrelevant with respect to what particular location of nonzero elements is chosen, we can for the simplicity of the exposition and without loss of generality assume that the components x1,x2,,xnk{\bf x}_{1},{\bf x}_{2},\dots,{\bf x}_{n-k} of x{\bf x} are equal to zero. Under this assumption we have the following corollary of Theorem 3.

Assume that an m×nm\times n measurement matrix AA is given. Let x{\bf x} be a kk-sparse vector. Also let x1=x2==xnk=0.{\bf x}_{1}={\bf x}_{2}=\dots={\bf x}_{n-k}=0. Further, assume that y=Ax{\bf y}=A{\bf x} and that w{\bf w} is an n×1n\times 1 vector. Then (2) will produce the solution of (1) if

Following the procedure of Subsection 3.2 we set SsecS_{sec}

where as earlier h{\bf h} is a random column vector in RnR^{n} with i.i.d. N(0,1){\cal N}(0,1) components and Sn1S^{n-1} is the unit nn-dimensional sphere. As in Subsections 3.1 and 3.2 our goal will be to compute an upper bound on w(Ssec)w(S_{sec}) and then equal that upper bound to (m14m)\left(\sqrt{m}-\frac{1}{4\sqrt{m}}\right). In the following subsections we present a way to get such an upper bound. As earlier, we set w(h,Ssec)=maxwSsec(hTw)w({\bf h},S_{sec})=\max_{{\bf w}\in S_{sec}}({\bf h}^{T}{\bf w}). Following the strategy of previous sections in Subsection 3.3.1 we determine an upper bound BsecB_{sec} on w(h,Ssec)w({\bf h},S_{sec}). In Subsection 3.3.2 we will compute an upper bound on E(Bsec)E(B_{sec}). That quantity will be an upper bound on w(Ssec)w(S_{sec}) since according to the following E(Bsec)E(B_{sec}) is an upper bound on w(Ssec)w(S_{sec})

As earlier, let h1:(nk)=(h1,h2,,hnk)T{\bf h}_{1:(n-k)}=({\bf h}_{1},{\bf h}_{2},\dots,{\bf h}_{n-k})^{T} and let h(i)(nk)|{\bf h}|_{(i)}^{(n-k)} be the ii-th smallest magnitude of elements of h1:(nk){\bf h}_{1:(n-k)}. Set

where hi,nk+1in|{\bf h}_{i}|,n-k+1\leq i\leq n, is the absolute value (magnitude) of hi,nk+1in{\bf h}_{i},n-k+1\leq i\leq n. Then one can simplify (91) in the following way

where h^i\hat{{\bf h}}_{i} is the ii-th element of h^\hat{{\bf h}} and c(nk)c\leq(n-k) is such that ((h^Tz)i=1ch^i)0((\hat{{\bf h}}^{T}{\bf z})-\sum_{i=1}^{c}\hat{{\bf h}}_{i})\geq 0. As earlier, as long as (h^Tz)0(\hat{{\bf h}}^{T}{\bf z})\geq 0 there will be a cc (it is possible that c=0c=0) such that quantity on the most right hand side of (94) is an upper bound on w(h,Ssec)w({\bf h},S_{sec}).

Using (94) we then establish the following analogue to Lemmas 2 and 5.

Let hRn{\bf h}\in R^{n} be a vector with i.i.d. zero-mean unit variance gaussian components. Further let h^\hat{{\bf h}} be as defined in (92) and w(h,Ssec)=maxwSsec(hTw)w({\bf h},S_{sec})=\max_{{\bf w}\in S_{sec}}({\bf h}^{T}{\bf w}) where SsecS_{sec} is as defined in (88). Let zRn{\bf z}\in R^{n} be a column vector such that zi=1,1i(nk){\bf z}_{i}=1,1\leq i\leq(n-k) and zi=1,nk+1in{\bf z}_{i}=-1,n-k+1\leq i\leq n. Then

ζsec(h,c)=(h^Tz)i=1ch^inch^c\zeta_{sec}({\bf h},c)=\frac{(\hat{{\bf h}}^{T}{\bf z})-\sum_{i=1}^{c}\hat{{\bf h}}_{i}}{n-c}-\hat{{\bf h}}_{c} and csec=δsecnc_{sec}=\delta_{sec}n is a cnkc\leq n-k such that

Fa1()F_{a}^{-1}(\cdot) is the inverse cdf of the random variable X|X| where XX is zero-mean, unit variance gaussian random variable. ϵ>0\epsilon>0 is an arbitrarily small constant independent of nn.

Follows directly from the derivation before Lemma 2. ∎

Following step-by-step the derivation of Lemma 44 (with a trivial adjustment in computing Lipschitz constant σ\sigma) we can establish the sectional threshold analogue to it.

Assume the setup of Lemma 7. Let further ψsec=E(h^Tz)i=1csech^i)n\psi_{sec}=\frac{E(\hat{{\bf h}}^{T}{\bf z})-\sum_{i=1}^{c_{sec}}\hat{{\bf h}}_{i})}{n}.Then

Follows directly from the derivation before Lemma 44. ∎

As in (45), if nn is large, for a fixed α\alpha one can determine βsec\beta_{sec} as a maximum β\beta such that

In the rest of this subsection we show how the left hand side of (99) can be computed for a randomly chosen fixed βsec\beta_{sec}. We again, as earlier, do so in two steps:

We then compute Ei=csec+1nh^i2n(E(h^Tz)Ei=1csech^i)2n(ncsec)\frac{E\sum_{i=c_{sec}+1}^{n}\hat{{\bf h}}_{i}^{2}}{n}-\frac{(E(\hat{{\bf h}}^{T}{\bf z})-E\sum_{i=1}^{c_{sec}}\hat{{\bf h}}_{i})^{2}}{n(n-c_{sec})} with csecc_{sec} found in step 11.

From Lemma 7 we have csec=δsecnc_{sec}=\delta_{sec}n is a cc such that

where as in Subsection 3.2 h(i)(nk),1i(nβsecn)|{\bf h}|_{(i)}^{(n-k)},1\leq i\leq(n-\beta_{sec}n), is the ii-th smallest magnitude of vector h1:(nβsecn){\bf h}_{1:(n-\beta_{sec}n)}. Furthermore, h^i=h(i)(nk)\hat{{\bf h}}_{i}=|{\bf h}|_{(i)}^{(n-k)} and clearly h1:(nβsecn){\bf h}_{1:(n-\beta_{sec}n)} stands for first (nβsecn)(n-\beta_{sec}n) components of h{\bf h}. We also recall that hi,nβsecn+1in|{\bf h}_{i}|,n-\beta_{sec}n+1\leq i\leq n, are the magnitudes of the last βsecn\beta_{sec}n components of vector h{\bf h} (these magnitudes of last βsecn\beta_{sec}n components of vector h{\bf h} are not sorted). As earlier, all components of h{\bf h} are i.i.d. zero-mean unit variance Gaussian random variables and ϵ>0\epsilon>0 is an arbitrarily small constant. Then clearly Ehi=2π,nβsecn+1inE|{\bf h}_{i}|=\sqrt{\frac{2}{\pi}},n-\beta_{sec}n+1\leq i\leq n, and we have from (LABEL:eq:secdelta1)

Set θsec=1δsec\theta_{sec}=1-\delta_{sec}. Following the derivation of (76) and (77) we have the following equation for computing θsec\theta_{sec}

Let θ^sec\hat{\theta}_{sec} be the solution of (102). Then δsec=1θ^sec\delta_{sec}=1-\hat{\theta}_{sec} and csec=δsecn=(1θ^sec)nc_{sec}=\delta_{sec}n=(1-\hat{\theta}_{sec})n. This concludes step 11.

In this step we compute Ei=csec+1nh^i2n(E(h^Tz)Ei=1csech^i)2n(ncsec)\frac{E\sum_{i=c_{sec}+1}^{n}\hat{{\bf h}}_{i}^{2}}{n}-\frac{(E(\hat{{\bf h}}^{T}{\bf z})-E\sum_{i=1}^{c_{sec}}\hat{{\bf h}}_{i})^{2}}{n(n-c_{sec})} with csec=(1θ^sec)nc_{sec}=(1-\hat{\theta}_{sec})n. Using results from step 11 we easily find

Effectively, what is left to compute is Ei=csec+1nh^i2n\frac{E\sum_{i=c_{sec}+1}^{n}\hat{{\bf h}}_{i}^{2}}{n}. However, the same quantity has already been computed in (84). Hence we have

We summarize the results from this section in the following theorem.

(Sectional threshold) Let AA be an m×nm\times n measurement matrix in (1) with the null-space uniformly distributed in the Grassmanian. Let the unknown x{\bf x} in (1) be kk-sparse. Further, let the location of nonzero elements of x{\bf x} be arbitrarily chosen but fixed. Let k,m,nk,m,n be large and let α=mn\alpha=\frac{m}{n} and βsec=kn\beta_{sec}=\frac{k}{n} be constants independent of mm and nn. Let erfinv be the inverse of the standard error function associated with zero-mean unit variance Gaussian random variable. Further, let ϵ>0\epsilon>0 be an arbitrarily small constant and θ^sec\hat{\theta}_{sec}, (βsecθ^sec1\beta_{sec}\leq\hat{\theta}_{sec}\leq 1) be the solution of

If α\alpha and βsec\beta_{sec} further satisfy

then the solutions of (1) and (2) coincide with overwhelming probability.

Follows from the previous discussion combining (5), (90), (95), (98), (99), (102), (103), and (104). ∎

The results for the sectional threshold obtained from the above theorem as well as the best currently known ones from are presented on Figure 4. As can be seen, the threshold results obtained from the previous analysis slightly improve on those from .

Probabilistic analysis of the null-space characterizations – signed 𝐱𝐱{\bf x}

In this section we consider recovery of vectors x{\bf x} with elements known to have certain sign pattern. Without loss of generality we assume that it is known that xi0,1in{\bf x}_{i}\geq 0,1\leq i\leq n. We also again assume that x{\bf x} is kk-sparse, i.e. we assume that x{\bf x} has no more than kk nonzero elements. To solve (1) for such an x{\bf x} instead of (2) we consider the following optimization problem

The following theorem from e.g. characterizes the equivalence of (1) and (107).

(Null-space characterization; Non-negative x{\bf x}) Assume that an m×nm\times n measurement matrix AA is given. Let x{\bf x} be a kk-sparse vector whose non-zero components are known to be positive. Further, assume that y=Ax{\bf y}=A{\bf x} and that w{\bf w} is an n×1n\times 1 vector. Let KK be any subset of {1,2,,n}\{1,2,\dots,n\} such that K=k|K|=k and let KiK_{i} denote the ii-th element of KK. Further, let Kˉ={1,2,,n}K\bar{K}=\{1,2,\dots,n\}\setminus K. Then (107) will produce the solution of (1) if

In the rest of this section we will probabilistically analyze validity of (108) (or to be more precise, its a slight modification). In the first following subsection we will show how one can obtain the values of the weak threshold βw+\beta_{w}^{+} for the entire range 0α10\leq\alpha\leq 1 based on such a probabilistic analysis.

In this subsection we determine the weak threshold βw+\beta_{w}^{+}. Before proceeding further we quickly recall on the definition of the weak threshold. The definition of the weak threshold was already introduced in Section 3.2 when recovery of general signals (vectors) x{\bf x} was considered. Here, we slightly modify it so that it fits the scenario of a priori known sign patterns of elements of x{\bf x}. Namely, for a given α\alpha, βw+\beta_{w}^{+} is the maximum value of β\beta such that the solutions of (1) and (107) coincide for any given βn\beta n-sparse x{\bf x} with a fixed location of nonzero components and a priori known to be comprised of non-negative elements. Since the analysis will clearly be irrelevant with respect to what particular location of nonzero elements is chosen, we can for the simplicity of the exposition and without loss of generality assume that the components x1,x2,,xnk{\bf x}_{1},{\bf x}_{2},\dots,{\bf x}_{n-k} of x{\bf x} are equal to zero and the components xnk+1,xnk+2,,xn{\bf x}_{n-k+1},{\bf x}_{n-k+2},\dots,{\bf x}_{n} of x{\bf x} are greater than or equal to zero. Under this assumption we have the following corollary of Theorem 6.

(Nonzero part of x{\bf x} has fixed a location; The signs of elements of x{\bf x} a priori known) Assume that an m×nm\times n measurement matrix AA is given. Let x{\bf x} be a kk-sparse vector whose nonzero components are known to be positive. Also let x1=x2==xnk=0.{\bf x}_{1}={\bf x}_{2}=\dots={\bf x}_{n-k}=0. Further, assume that y=Ax{\bf y}=A{\bf x} and that w{\bf w} is an n×1n\times 1 vector. Then (107) will produce the solution of (1) if

Following the procedure of Subsection 3.2 we set Sw+S_{w}^{+}

where as earlier h{\bf h} is a random column vector in RnR^{n} with i.i.d. N(0,1){\cal N}(0,1) components and Sn1S^{n-1} is the unit nn-dimensional sphere. As in Subsection 3.1 our goal will be to compute an upper bound on w(Sw+)w(S_{w}^{+}) and then equal that upper bound to (m14m)\left(\sqrt{m}-\frac{1}{4\sqrt{m}}\right). To simplify the exposition we again set w(h,Sw+)=maxwSw+(hTw)w({\bf h},S_{w}^{+})=\max_{{\bf w}\in S_{w}^{+}}({\bf h}^{T}{\bf w}). We will proceed again as earlier and in Subsection 4.1.1 we will determine an upper bound Bweak+B_{weak}^{+} on w(h,Sw+)w({\bf h},S_{w}^{+}). In Subsection 4.1.2 we will compute an upper bound on E(Bweak+)E(B_{weak}^{+}). That quantity will be an upper bound on w(Sw+)w(S_{w}^{+}) since according to the following E(Bweak+)E(B_{weak}^{+}) is an upper bound on w(Sw+)w(S_{w}^{+})

w({\bf h},S_{w}^{+}) In a fashion analogous to (8) we can write

Let again h1:(nk)=(h1,h2,,hnk)T{\bf h}_{1:(n-k)}=({\bf h}_{1},{\bf h}_{2},\dots,{\bf h}_{n-k})^{T}. Further, let h(i)(nk){\bf h}_{(i)}^{(n-k)} be the ii-th smallest of the elements of h1:(nk){\bf h}_{1:(n-k)}. Set

Then one can simplify (113) in the following way

where hˉi+\bar{{\bf h}}^{+}_{i} is the ii-th element of hˉ+\bar{{\bf h}}^{+} and c(nk)c\leq(n-k) is such that (((hˉ+)Tz)i=1chˉi+)0(((\bar{{\bf h}}^{+})^{T}{\bf z})-\sum_{i=1}^{c}\bar{{\bf h}}^{+}_{i})\geq 0. Clearly, as long as ((hˉ+)Tz)0((\bar{{\bf h}}^{+})^{T}{\bf z})\geq 0 there will be a cc (it is possible that c=0c=0) such that quantity on the most right hand side of (116) is an upper bound on w(h,Sw+)w({\bf h},S_{w}^{+}).

Using (116) we then establish the following analogue to Lemma 2.

Let hRn{\bf h}\in R^{n} be a vector with i.i.d. zero-mean unit variance gaussian components. Further let hˉ+\bar{{\bf h}}^{+} be as defined in (114) and w(h,Sw+)=maxwSw+(hTw)w({\bf h},S_{w}^{+})=\max_{{\bf w}\in S_{w}^{+}}({\bf h}^{T}{\bf w}) where Sw+S_{w}^{+} is as defined in (110). Let zRn{\bf z}\in R^{n} be a column vector such that zi=1,1i(nk){\bf z}_{i}=1,1\leq i\leq(n-k) and zi=1,nk+1in{\bf z}_{i}=-1,n-k+1\leq i\leq n. Then

ζw+(h,c)=((hˉ+)Tz)i=1chˉi+nchˉc+\zeta_{w}^{+}({\bf h},c)=\frac{((\bar{{\bf h}}^{+})^{T}{\bf z})-\sum_{i=1}^{c}\bar{{\bf h}}^{+}_{i}}{n-c}-\bar{{\bf h}}^{+}_{c} and cweak+=δw+nc_{weak}^{+}=\delta_{w}^{+}n is a cnkc\leq n-k such that

Fc1()F_{c}^{-1}(\cdot) is the inverse cdf of zero-mean, unit variance Gaussian random variable. ϵ>0\epsilon>0 is an arbitrarily small constant independent of nn.

E(B_{weak}^{+}) Following step-by-step the derivation of Lemma 44 we can establish the following analogue to it.

Assume the setup of Lemma 9. Let further ψw+=E((hˉ+)Tz)i=1cweak+hˉi+)n\psi_{w}^{+}=\frac{E((\bar{{\bf h}}^{+})^{T}{\bf z})-\sum_{i=1}^{c_{weak}^{+}}\bar{{\bf h}}^{+}_{i})}{n}.Then

Follows directly from the derivation before Lemma 44. ∎

As earlier, following (45), if nn is large, for a fixed α\alpha one can determine βw+\beta_{w}^{+} as a maximum β\beta such that

In the rest of this subsection we show how the left hand side of (121) can be computed for a randomly chosen fixed βw+\beta_{w}^{+}. We will again repeat the two crucial steps:

We then compute Ei=cweak++1nhˉi+2n(E((hˉ+)Tz)Ei=1cweak+hˉi+)2n(ncweak+)\frac{E\sum_{i=c_{weak}^{+}+1}^{n}|\bar{{\bf h}}^{+}_{i}|^{2}}{n}-\frac{(E((\bar{{\bf h}}^{+})^{T}{\bf z})-E\sum_{i=1}^{c_{weak}^{+}}\bar{{\bf h}}^{+}_{i})^{2}}{n(n-c_{weak}^{+})} with cweak+c_{weak}^{+} found in step 11.

From Lemma 9 we have cweak+=δw+nc_{weak}^{+}=\delta_{w}^{+}n is a cc such that

where we recall that now hˉi+,1i(nβw+n)\bar{{\bf h}}^{+}_{i},1\leq i\leq(n-\beta_{w}^{+}n), is the ii-th smallest element (not magnitude) of vector h1:(nβw+n){\bf h}_{1:(n-\beta_{w}^{+}n)}. Also, we easily have Ehi=0,nβw+n+1inE{\bf h}_{i}=0,n-\beta_{w}^{+}n+1\leq i\leq n, and then from (LABEL:eq:weakdelta1non)

Set θw+=1δw+\theta_{w}^{+}=1-\delta_{w}^{+}. Following and (55) we obtain

We first easily compute Fc1(1θw+1βw+)F_{c}^{-1}(\frac{1-\theta_{w}^{+}}{1-\beta_{w}^{+}}) in the following way

Combining (LABEL:eq:weakdelta1non), (123), (124), (126), and (127) we obtain the following equation for finding θw+\theta_{w}^{+}

Let θ^w+\hat{\theta}_{w}^{+} be the solution of (128). Then δw+=1θ^w+\delta_{w}^{+}=1-\hat{\theta}_{w}^{+} and cweak+=δw+n=(1θ^w+)nc_{weak}^{+}=\delta_{w}^{+}n=(1-\hat{\theta}_{w}^{+})n. This concludes step 11.

In this step we compute Ei=cweak++1nhˉi+2n(E((hˉ+)Tz)Ei=1cweak+hˉi+)2n(ncweak+)\frac{E\sum_{i=c_{weak}^{+}+1}^{n}|\bar{{\bf h}}^{+}_{i}|^{2}}{n}-\frac{(E((\bar{{\bf h}}^{+})^{T}{\bf z})-E\sum_{i=1}^{c_{weak}^{+}}\bar{{\bf h}}^{+}_{i})^{2}}{n(n-c_{weak}^{+})} with cweak+=(1θ^w+)nc_{weak}^{+}=(1-\hat{\theta}_{w}^{+})n. Using results from step 11 we easily find

Effectively, what is left to compute is Ei=cweak++1nhˉi+2n\frac{E\sum_{i=c_{weak}^{+}+1}^{n}|\bar{{\bf h}}^{+}_{i}|^{2}}{n}. First we note that

Using an approach similar to the one from step 22 of Subsection 3.2.2 and following we have

where Fd1F_{d}^{-1} is the inverse cdf of random variable \mboxsign(X)X2\mbox{sign}(X)|X|^{2} and XX is zero-mean unit variance Gaussian random variable. Straightforward calculations produce

Combining (130), (131), (132), and (133) we obtain

We summarize the results from this section in the following theorem.

(Weak threshold, a priori known signs of x{\bf x}) Let AA be an m×nm\times n measurement matrix in (1) with the null-space uniformly distributed in the Grassmanian. Let the unknown x{\bf x} in (1) be kk-sparse. Let it be known that the nonzero components of x{\bf x} are positive. Further, let the locations of nonzero elements of x{\bf x} be arbitrarily chosen but fixed. Let k,m,nk,m,n be large and let α=mn\alpha=\frac{m}{n} and βw+=kn\beta_{w}^{+}=\frac{k}{n} be constants independent of mm and nn. Let erfinv be the inverse of the standard error function associated with zero-mean unit variance Gaussian random variable. Further, let ϵ>0\epsilon>0 be an arbitrarily small constant and θ^w+\hat{\theta}_{w}^{+}, (βw+θ^w+1\beta_{w}^{+}\leq\hat{\theta}_{w}^{+}\leq 1), be the solution of

If α\alpha and βw+\beta_{w}^{+} further satisfy

then the solutions of (1) and (107) coincide with overwhelming probability.

Follows from the previous discussion combining (5), (112), (117), (120), (121), (128), (129), and (134). ∎

The results for the weak thresholds obtained from the above theorem in the case of a priori known signs of components of x{\bf x} as well as the best currently known ones from are presented on Figure 5. As can be seen, the threshold results obtained from the previous analysis match those from .

Discussion

Carefully following our exposition one can note that the strong threshold results in case of signed vectors x{\bf x} are missing. We should mention that the procedure presented in this paper can be repeated for that case as well. However, due to a somewhat complicated structure of the set SS the analysis of that case becomes a bit more tedious and certainly loses on elegance. Nevertheless, we conducted the analysis. However, since the final results that we obtained trail those of (except in a very narrow range around α1\alpha\longrightarrow 1) we decided not to include them in this paper.

On the technical side we should mention that our analysis made critical use of an excellent work . On the other hand massively relied on phenomenal results related to the estimates of the normal tail distributions of Lipshitz functions. In a very recent work related to the matrix-rank optimization the authors in successfully applied results of directly without relying on the main results from . It will certainly be interesting to see what performance guarantees the direct application of the results of would produce in the problems considered in this paper.

At the end we should finally mention a potential universal value of the results presented here. In this paper we were mostly concerned with the compressed sensing signal processing applications. However, the results presented here may be of independent mathematical interest as well. First, clearly our analysis (as almost any other analysis related to compressed sensing) has immediate impact on important mathematical problem of solving under-determined systems of linear equations. Second, following the derivations of it is not that difficult to see that our results can be directly applied to determine the neighborliness thresholds of projected cross-polytope, regular simplex, and positive orthant as well.

References