Orthonormal Expansion l1-Minimization Algorithms for Compressed Sensing

Zai Yang, Cishen Zhang, Jun Deng, Wenmiao Lu

I Introduction

which is known as the basis pursuit denoising (BPDN) problem, with e2ϵ\left\|\emph{{e}}\right\|_{2}\leq\epsilon. Another frequently discussed approach is to solve the problem in its Lagrangian form

It follows from the knowledge of convex optimization that (BPϵ)\left(\text{BP}_{\epsilon}\right) and (QPλ)\left(\text{QP}_{\lambda}\right) are equivalent with appropriate choices of ϵ\epsilon and λ\lambda. In general, λ\lambda decreases as ϵ\epsilon decreases. In the limiting case of λ,ϵ0\lambda,\epsilon\rightarrow 0, both (BPϵ)\left(\text{BP}_{\epsilon}\right) and (QPλ)\left(\text{QP}_{\lambda}\right) converges to the following basis pursuit (BP) problem in noiseless CS:

In the case of strictly sparse signals and noise-free measurements, many fast algorithms have been proposed to exactly reconstruct xo\emph{{x}}^{o}. One class of algorithms uses the greedy pursuit method, which iteratively refines the support and entries of a sparse solution to yield a better approximation of xo\emph{{x}}^{o}, such as OMP, StOMP and CoSaMP. These algorithms, however, may not produce satisfactory sparsity-undersampling tradeoff compared with (BP)\left(\text{BP}\right) because of their greedy operations. As mentioned before, (QPλ)\left(\text{QP}_{\lambda}\right) is equivalent to (BP)\left(\text{BP}\right) as λ0\lambda\rightarrow 0. Hence, (BP)\left(\text{BP}\right) can be solved with high accuracy using algorithms for (QPλ)\left(\text{QP}_{\lambda}\right) by setting λ\lambda to a small value, e.g. 1×1061\times 10^{-6}. IST has attracted much attention because of its simple form. In the case where λ\lambda is small, however, the standard IST in (1) can be very slow. To improve its speed, a fixed-point continuation (FPC) strategy is exploited , in which λ\lambda is decreased in a continuation scheme and a qq-linear convergence rate is achieved. Further, FPC-AS is developed to improve the performance of FPC by introducing an active set, inspired by greedy pursuit algorithms. An alternative approach to improving the speed of IST is to use an aggressive continuation, which takes the form

where λt\lambda_{t} may decrease in each iteration. The algorithm of this form typically has a worse sparsity-undersampling tradeoff than (BP)\left(\text{BP}\right). Such a disadvantage is partially overcome by approximately message passing (AMP) algorithm, in which a modification is introduced in the current residual zt\emph{{z}}_{t}:

where x0\left\|\emph{{x}}\right\|_{0} counts the number of nonzero entries of x. It is noted that AMP having the same spasity-undersampling tradeoff as (BP)\left(\text{BP}\right) is only established based on heuristic arguments and numerical simulations. Moreover, it cannot be easily extended to deal with more general complex-valued sparse signals, though real-valued signals are only considered in this correspondence.

The rest of the correspondence is organized as follows. Section II introduces the proposed exact and relaxed ONE-L1 algorithms followed by their implementation details. Section III reports the efficiency of our algorithm through numerical simulations in both noise-free and noise-contaminated cases. Conclusions are drawn in Section IV.

II ONE-L1 Algorithms

The operator Sλ()S_{\lambda}(\cdot) can be extended to vector variables by its element-wise operation.

II-B Problem Description

where Γ(p)\Gamma(\emph{{p}}) is an operator projecting the vector p onto its first nn entries.

II-C ALM Based ONE-L1 Algorithms

Subsequently, we have the following optimization problem (SP)\left(SP\right):

Instead of solving (SP)\left(SP\right), let us consider the two related problems

To solve (SP2)\left(SP_{2}\right), we let Γ(p)L(x,p,y,μ)=0\partial_{\overline{\Gamma}(\emph{{p}})}\mathcal{L}(\emph{{x}},\emph{{p}},\emph{{y}},\mu)=0 to obtain Γ(p)=Γ(Φxμ1y)\overline{\Gamma}(\emph{{p}})=\overline{\Gamma}\left({\Phi}\emph{{x}}-\mu^{-1}\emph{{y}}\right), i.e.,

where Γ()\overline{\Gamma}\left(\cdot\right) is the operator projecting the variable to its last NnN-n entries. As a result, an iterative solution of (SP)\left(SP\right) is stated in the following Lemma 1.

For fixed y and μ\mu, the iterative algorithm given by

converges to an optimal solution of (SP)\left(SP\right).

Denote L(x,p,y,μ)\mathcal{L}\left(\emph{{x}},\emph{{p}},\emph{{y}},\mu\right) as L(x,p)\mathcal{L}\left(\emph{{x}},\emph{{p}}\right), for simplicity. By the optimality and uniqueness of xj+1\emph{{x}}^{j+1} and pj+1\emph{{p}}^{j+1}, we have L(xj+1,pj+1)L(xj,pj)\mathcal{L}\left(\emph{{x}}^{j+1},\emph{{p}}^{j+1}\right)\leq\mathcal{L}\left(\emph{{x}}^{j},\emph{{p}}^{j}\right) and the equality holds if and only if (xj+1,pj+1)=(xj,pj)\left(\emph{{x}}^{j+1},\emph{{p}}^{j+1}\right)=\left(\emph{{x}}^{j},\emph{{p}}^{j}\right). Hence, the sequence {L(xj,pj)}\left\{\mathcal{L}\left(\emph{{x}}^{j},\emph{{p}}^{j}\right)\right\} is bounded and converges to a constant LL^{*}, i.e., L(xj,pj)L\mathcal{L}\left(\emph{{x}}^{j},\emph{{p}}^{j}\right)\rightarrow L^{*} as j+j\rightarrow+\infty. Since the sequence {xj}\left\{\emph{{x}}^{j}\right\} is also bounded by xj1L(xj,pj)+12μy22\left\|\emph{{x}}^{j}\right\|_{1}\leq\mathcal{L}\left(\emph{{x}}^{j},\emph{{p}}^{j}\right)+\frac{1}{2\mu}\left\|\emph{{y}}\right\|_{2}^{2}, there exists a sub-sequence {xji}i=1+\left\{\emph{{x}}^{j_{i}}\right\}_{i=1}^{+\infty} such that xjixs\emph{{x}}^{j_{i}}\rightarrow\emph{{x}}_{s}^{*} as i+i\rightarrow+\infty, where xs\emph{{x}}_{s}^{*} is an accumulation point of {xj}\left\{\emph{{x}}^{j}\right\}. Correspondingly, pjips\emph{{p}}^{j_{i}}\rightarrow\emph{{p}}_{s}^{*}, and moreover, L(xs,ps)=L\mathcal{L}\left(\emph{{x}}_{s}^{*},\emph{{p}}_{s}^{*}\right)=L^{*}.

We now use contradiction to show that (xs,ps)\left(\emph{{x}}_{s}^{*},\emph{{p}}_{s}^{*}\right) is a fixed point of the algorithm. Suppose that there exist (xs,ps)(xs,ps)\left(\overline{\emph{{x}}}_{s},\overline{\emph{{p}}}_{s}\right)\neq\left(\emph{{x}}_{s}^{*},\emph{{p}}_{s}^{*}\right) such that xs=argminxL(x,ps)\overline{\emph{{x}}}_{s}=\arg\min_{\emph{{x}}}\mathcal{L}\left(\emph{{x}},\emph{{p}}_{s}^{*}\right) and ps=argminp,Γ(p)=bL(xs,p)\overline{\emph{{p}}}_{s}=\arg\min_{\emph{{p}},\Gamma\left(\emph{{p}}\right)=\emph{{b}}}\mathcal{L}\left(\emph{{x}}_{s}^{*},\emph{{p}}\right), and L(xs,ps)<L\mathcal{L}\left(\overline{\emph{{x}}}_{s},\overline{\emph{{p}}}_{s}\right)<L^{*}. By (9), (12) and [4, Lemma 2.2], we have xji+1xs2xjixs20\left\|\emph{{x}}^{j_{i}+1}-\overline{\emph{{x}}}_{s}\right\|_{2}\leq\left\|\emph{{x}}^{j_{i}}-\emph{{x}}_{s}^{*}\right\|_{2}\rightarrow 0, i.e., xji+1xs\emph{{x}}^{j_{i}+1}\rightarrow\overline{\emph{{x}}}_{s}, as i+i\rightarrow+\infty. Meanwhile, pji+1ps\emph{{p}}^{j_{i}+1}\rightarrow\overline{\emph{{p}}}_{s}. Hence, L(xji+1,pji+1)L(xs,ps)<L\mathcal{L}\left(\emph{{x}}^{j_{i}+1},\emph{{p}}^{j_{i}+1}\right)\rightarrow\mathcal{L}\left(\overline{\emph{{x}}}_{s},\overline{\emph{{p}}}_{s}\right)<L^{*}, which contradicts L(xj,pj)L\mathcal{L}\left(\emph{{x}}^{j},\emph{{p}}^{j}\right)\rightarrow L^{*}, resulting in that (xs,ps)\left(\emph{{x}}_{s}^{*},\emph{{p}}_{s}^{*}\right) is a fixed point. Moreover, it follows from xji+qxs2xjixs20\left\|\emph{{x}}^{j_{i}+q}-\emph{{x}}_{s}^{*}\right\|_{2}\leq\left\|\emph{{x}}^{j_{i}}-\emph{{x}}_{s}^{*}\right\|_{2}\rightarrow 0 for any positive integer qq, that xjxs\emph{{x}}^{j}\rightarrow\emph{{x}}_{s}^{*}, as j+j\rightarrow+\infty.

Note that orthonormal matrix Φ=[AB]{\Phi}=\begin{bmatrix}\emph{{A}}\\ \emph{{B}}\end{bmatrix} and ΦΦ=AA+BB=I{\Phi}^{\prime}{\Phi}=\emph{{A}}^{\prime}\emph{{A}}+\emph{{B}}^{\prime}\emph{{B}}=\emph{{I}}. We can obtain

Meanwhile, (SP)\left(SP\right) is equivalent to

By [4, Proposition 3.10], xs\emph{{x}}_{s}^{*} is an optimal solution of the problem in (14) and equivalently, (xs,ps)\left(\emph{{x}}_{s}^{*},\emph{{p}}_{s}^{*}\right) is an optimal solution of (SP)\left(SP\right).

Lemma 1 shows that to solve problem (SP)\left(SP\right) is equivalent to solve, iteratively, problems (SP1)\left(SP_{1}\right) and (SP2)\left(SP_{2}\right).

Reference [4, Proposition 3.10] only deals with the special case A2<1\left\|\emph{{A}}\right\|_{2}<1 and it is, in fact, straightforward to extend the result to arbitrary A.

Following from the framework of the ALM method and Lemma 1, the ALM based ONE-L1 algorithm is outlined in Algorithm 1, where (xt,pt)\left(\emph{{x}}_{t}^{*},\emph{{p}}_{t}^{*}\right) is the optimal solution to (SP)\left(SP\right) in the ttth iteration and yt\emph{{y}}_{t}^{*} is the corresponding Lagrange multiplier.

The convergence of Algorithm 1 is stated in the following theorem.

Any accumulation point (x,p)\left(\emph{{x}}^{*},\emph{{p}}^{*}\right) of sequence {(xt,pt)}t=1+\left\{\left(\emph{{x}}_{t}^{*},\emph{{p}}_{t}^{*}\right)\right\}_{t=1}^{+\infty} of Algorithm 1 is an optimal solution of (BPo)\left(\text{BP}^{o}\right) and the convergence rate with respect to the outer iteration loop index tt is at least O(μt11)O\left(\mu_{t-1}^{-1}\right) in the sense that

where x=x1\emph{{x}}^{\dagger}=\|\emph{{x}}^{*}\|_{1}.

We first show that the sequence {yt}\left\{\emph{{y}}_{t}^{*}\right\} is bounded. By the optimality of (xt+1,pt+1)\left(\emph{{x}}_{t+1}^{*},\emph{{p}}_{t+1}^{*}\right) we have

where x\partial_{\emph{{x}}} denotes the partial differential operator with respect to x resulting in a set of subgradients. Hence, Φyt+1xt+11{\Phi}^{\prime}\emph{{y}}_{t+1}^{*}\in\partial\left\|\emph{{x}}_{t+1}^{*}\right\|_{1}. It follows that Φyt+11\left\|{\Phi}^{\prime}\emph{{y}}_{t+1}^{*}\right\|_{\infty}\leq 1 and {yt}\left\{\emph{{y}}_{t}^{*}\right\} is bounded. By xL(xt+1,pt+1,yt,μt)\emph{{x}}^{\dagger}\geq\mathcal{L}\left(\emph{{x}}_{t+1}^{*},\emph{{p}}_{t+1}^{*},\emph{{y}}_{t}^{*},\mu_{t}\right),

By {yt}\left\{\emph{{y}}_{t}^{*}\right\} is bounded,

For any accumulation point x\emph{{x}}^{*} of xt\emph{{x}}_{t}^{*}, without loss of generality, we have xtx\emph{{x}}_{t}^{*}\rightarrow\emph{{x}}^{*} as t+t\rightarrow+\infty. Hence, x1x\left\|\emph{{x}}^{*}\right\|_{1}\leq\emph{{x}}^{\dagger}. In the mean time, pt+1=Φxt+1+μt1(yt+1yt)p\emph{{p}}_{t+1}^{*}={\Phi}\emph{{x}}_{t+1}^{*}+\mu_{t}^{-1}\left(\emph{{y}}_{t+1}^{*}-\emph{{y}}_{t}^{*}\right)\rightarrow\emph{{p}}^{*} and Φx=p{\Phi}\emph{{x}}^{*}=\emph{{p}}^{*} result in that (x,p)\left(\emph{{x}}^{*},\emph{{p}}^{*}\right) is an optimal solution to (BPo)\left(\text{BP}^{o}\right).

Moreover, by xt+1=Φ[pt+1μt1(yt+1yt)]\emph{{x}}_{t+1}^{*}={\Phi}^{\prime}\left[\emph{{p}}_{t+1}^{*}-\mu_{t}^{-1}\left(\emph{{y}}_{t+1}^{*}-\emph{{y}}_{t}^{*}\right)\right] and

we have xt+11xO(μt1)\left\|\emph{{x}}_{t+1}^{*}\right\|_{1}\geq\emph{{x}}^{\dagger}-O\left(\mu_{t}^{-1}\right), which establishes the theorem with (15).

Algorithm 1 contains, respectively, an inner and an outer iteration loops. Theorem 1 presents only the convergence rate of the outer loop. A natural way to speed up Algorithm 1 is to terminate the inner loop without convergence and use the obtained inner-loop solution as the initialization for the next iteration. This is similar to a continuation strategy and can be realized with reasonably set precision and step size μt\mu_{t} . When the continuation parameter μt\mu_{t} increases very slowly, in a few iterations, the inner loop can produce a solution with high accuracy. In particular, for the purpose of fast and simple computing, we may update the variables in the inner loop only once before stepping into the outer loop operation. This results in a relaxed version of exact ONE-L1 algorithm (eONE-L1), namely relaxed ONE-L1 algorithm (rONE-L1) outlined in Algorithm 2.

The iterative solution (xt,pt)(\emph{{x}}_{t},\emph{{p}}_{t}) of Algorithm 2 converges to a feasible solution (xf,pf)(\emph{{x}}^{f},\emph{{p}}^{f}) of (BPo)\left(\text{BP}^{o}\right) if t=1+μt1<+\sum_{t=1}^{+\infty}\mu_{t}^{-1}<+\infty. It converges at least exponentially to (xf,pf)(\emph{{x}}^{f},\emph{{p}}^{f}) if {μt}\left\{\mu_{t}\right\} is an exponentially increasing sequence.

We show first that sequences {y^t}\left\{\hat{\emph{{y}}}_{t}\right\} and {yt}\left\{\emph{{y}}_{t}\right\} are bounded, where y^t=yt1+μt1(pt1Φxt)\hat{\emph{{y}}}_{t}=\emph{{y}}_{t-1}+\mu_{t-1}\left(\emph{{p}}_{t-1}-{\Phi}\emph{{x}}_{t}\right). By the optimality of xt+1\emph{{x}}_{t+1} and pt+1\emph{{p}}_{t+1} we have

Hence, Φy^t+11\left\|{\Phi}^{\prime}\hat{\emph{{y}}}_{t+1}\right\|_{\infty}\leq 1 and it follows that {y^t}\left\{\hat{\emph{{y}}}_{t}\right\} is bounded. Since yt+1=y^t+1+μt(pt+1pt)\emph{{y}}_{t+1}=\hat{\emph{{y}}}_{t+1}+\mu_{t}\left(\emph{{p}}_{t+1}-\emph{{p}}_{t}\right), we obtain Γ(yt+1)=Γ(y^t+1)\Gamma\left(\emph{{y}}_{t+1}\right)=\Gamma\left(\hat{\emph{{y}}}_{t+1}\right). This together with Γ(yt+1)=0\overline{\Gamma}\left(\emph{{y}}_{t+1}\right)=0 results in yt+12y^t+12\left\|\emph{{y}}_{t+1}\right\|_{2}\leq\left\|\hat{\emph{{y}}}_{t+1}\right\|_{2} and the boundedness of {yt}\left\{\emph{{y}}_{t}\right\}. By pt+1pt=μt1(yt+1y^t+1)\emph{{p}}_{t+1}-\emph{{p}}_{t}=\mu_{t}^{-1}\left(\emph{{y}}_{t+1}-\hat{\emph{{y}}}_{t+1}\right), we have pt+1pt2Cμt1\left\|\emph{{p}}_{t+1}-\emph{{p}}_{t}\right\|_{2}\leq C\mu_{t}^{-1} with CC being a constant. Then {pt}\left\{\emph{{p}}_{t}\right\} is a Cauchy sequence if t=1+μt1<+\sum_{t=1}^{+\infty}\mu_{t}^{-1}<+\infty, resulting in ptpf\emph{{p}}_{t}\rightarrow\emph{{p}}^{f} as t+t\rightarrow+\infty. In the mean time, xtxf\emph{{x}}_{t}\rightarrow\emph{{x}}^{f}, Φxf=pf{\Phi}\emph{{x}}^{f}=\emph{{p}}^{f}. Hence, (xf,pf)\left(\emph{{x}}^{f},\emph{{p}}^{f}\right) is a feasible solution of (BPo)\left(\text{BP}^{o}\right). Suppose that {μt}\left\{\mu_{t}\right\} is an exponentially increasing sequence, i.e., μt+1=rμt\mu_{t+1}=r\mu_{t} with r>1r>1. By the boundedness of {yt}\left\{\emph{{y}}_{t}\right\} and {y^t}\left\{\hat{\emph{{y}}}_{t}\right\} we have

Hence, {pt}\left\{\emph{{p}}_{t}\right\} converges at least exponentially to pf\emph{{p}}^{f} since {μt1}\left\{\mu_{t}^{-1}\right\} exponentially converges to , and the same result holds for {xt}\left\{\emph{{x}}_{t}\right\}.

It is shown in Theorem 2 that faster growth of {μt}\left\{\mu_{t}\right\} can result in faster convergence of {xt}\left\{\emph{{x}}_{t}\right\}. Intuitively, the reduced number of iterations for the inner loop problem (SP)\left(SP\right) may result in some error from the optimal solution xtx^{*}_{t} of the inner loop. This will likely affect the accuracy of the final solution xfx^{f} for (BP)\left(\text{BP}\right). Therefore, the growth speed of {μt}\left\{\mu_{t}\right\} provides a tradeoff between the convergence speed of the algorithm and the precision of the final solution, which will be illustrated in Section III through numerical simulations.

II-D Relationship Between rONE-L1 and IST

The studies and applications of IST type algorithms have been very active in recent years because of their concise presentations. This subsection considers the relationship between rONE-L1 and IST. Note that Γ(yt)=0\overline{\Gamma}\left(\emph{{y}}_{t}\right)=0 in Algorithm 2 and ΦΦ=AA+BB=I{\Phi}^{\prime}{\Phi}=\emph{{A}}^{\prime}\emph{{A}}+\emph{{B}}^{\prime}\emph{{B}}=\emph{{I}}. After some derivations, it can be shown that the rONE-L1 algorithm is equivalent to the following iteration (starting from xt=0\emph{{x}}_{t}=0, as t0t\leq 0, and zt=0\emph{{z}}_{t}=0, as t<0t<0):

where λt=μt1\lambda_{t}=\mu_{t}^{-1} and κt=μt1μt\kappa_{t}=\frac{\mu_{t-1}}{\mu_{t}}. Compared with the general form of IST in (2), one more term κtzt1\kappa_{t}\emph{{z}}_{t-1} is added when computing the current residual zt\emph{{z}}_{t} in rONE-L1. Moreover, a weighted sum (1+κt)xtκtxt1(1+\kappa_{t})\emph{{x}}_{t}-\kappa_{t}\emph{{x}}_{t-1} is used instead of the current solution xt\emph{{x}}_{t}. It will be shown later that these two changes essentially improve the sparsity-undersampling tradeoff.

Equations in (16) show that the expansion from the partially orthonormal matrix A to orthonormal Φ{\Phi} is not at all involved in the actual implementation and computation of rONE-L1. The same claim also holds for eONE-L1 algorithm. Nevertheless, the orthonormal expansion is a key instrumentation in the derivation and analysis of Algorithms 1 and 2.

II-E Implementation Details

As noted in Remark 3, the expansion from A to Φ{\Phi} is not involved in the computing of ONE-L1 algorithms. In our implementations, we consider using exponentially increasing μt\mu_{t}, i.e., fixing r>1r>1 and μt=rtμ0\mu_{t}=r^{t}\mu_{0}. Let Qα()Q_{\alpha}(\cdot) be an α\alpha-quantile operator and μ0=1/Qα(Ab)\mu_{0}=1/Q_{\alpha}\left(\left|\emph{{A}}^{\prime}\emph{{b}}\right|\right), with \left|\cdot\right| applying to the vector variable elementwise, μ01\mu_{0}^{-1} being the threshold in the first iteration and α=0.99\alpha=0.99. In eONE-L1, a large rr can speed up the convergence of the outer loop iteration according to Theorem 1. However, simulations show that a larger rr can result in more iterations in the inner loop. We use r=1+n/Nr=1+n/N as default. In rONE-L1, the value of rr provides a tradeoff between the convergence speed of the algorithm and the precision of the final solution. Our recommendation of rr to achieve the optimal sparsity-undersampling tradeoff is r=min(1+0.04n/N,1.02)r=\min\left(1+0.04n/N,1.02\right), which will be illustrated in Section III-A.

An iterative algorithm needs a termination criterion. The eONE-L1 algorithm is considered converged if Axtb2b2<τ1\frac{\left\|\emph{{A}}\emph{{x}}_{t}^{*}-\emph{{b}}\right\|_{2}}{\left\|b\right\|_{2}}<\tau_{1} with τ1\tau_{1} being a user-defined tolerance. The inner iteration is considered converged if xtj+1xtj2xtj2<τ2\frac{\left\|\emph{{x}}_{t}^{j+1}-\emph{{x}}_{t}^{j}\right\|_{2}}{\left\|\emph{{x}}_{t}^{j}\right\|_{2}}<\tau_{2}. In our implementation, the default values are (τ1,τ2)=(105,106)\left(\tau_{1},\tau_{2}\right)=\left(10^{-5},10^{-6}\right). The rONE-L1 algorithm is considered converged if Axtb2b2<τ\frac{\left\|\emph{{A}}\emph{{x}}_{t}-\emph{{b}}\right\|_{2}}{\left\|\emph{{b}}\right\|_{2}}<\tau, with τ=105\tau=10^{-5} as default.

III Numerical Simulations

This subsection considers the sparsity-undersampling tradeoff of rONE-L1 in the case of strictly sparse signals and noise-free measurements. Phase transition is a measure of the sparsity-undersampling tradeoff in this case. Let the sampling ratio be δ=n/N\delta=n/N and the sparsity ratio be ρ=k/n\rho=k/n, where kk is a measure of sparsity of x, and we call that x is kk-sparse if at most kk entries of x are nonzero. As k,n,Nk,n,N\rightarrow\infty with fixed δ\delta and ρ\rho, the behavior of the phase transition of (BP)\left(\text{BP}\right) is controlled by (δ,ρ)(\delta,\rho). We denote this theoretical curve by ρ=ρT(δ)\rho=\rho_{T}\left(\delta\right), which is plotted in Fig 1.

We estimate the phase transition of rONE-L1 using a Monte Carlo method as in . Two matrix ensembles are considered, including Gaussian with N=1000N=1000 and partial-DCT with N=1024N=1024. Here the finite-NN phase transition is defined as the value of ρ\rho at which the success probability to recover the original signal is 50%50\%. We consider 3333 equispaced values of δ\delta in {0.02,0.05,,0.98}\left\{0.02,0.05,\cdots,0.98\right\}. For each δ\delta, 2121 equispaced values of ρ\rho are generated in the interval [ρT(δ)0.1,ρT(δ)+0.1]\left[\rho_{T}\left(\delta\right)-0.1,\rho_{T}\left(\delta\right)+0.1\right]. Then M=20M=20 random problem instances are generated and solved with respect to each combination of (δ,ρ)(\delta,\rho), where n=δNn=\lceil\delta N\rceil, k=ρnk=\lceil\rho n\rceil, and nonzero entries of sparse signals are generated from the standard Gaussian distribution. Success is declared if the relative root mean squared error (relative RMSE) x^xo2xo2<104\frac{\left\|\hat{\emph{{x}}}-\emph{{x}}^{o}\right\|_{2}}{\left\|\emph{{x}}^{o}\right\|_{2}}<10^{-4}, where x^\hat{\emph{{x}}} is the recovered signal. The number of success among MM experiments is recorded. Finally, a generalized linear model is used to estimate the phase transition as in .

III-B Comparison with IST

The rONE-L1 algorithm can be considered as a modified version of IST in (2). In this subsection we compare the sparsity-undersampling tradeoff and speed of these two algorithms. A similar method is adopted to estimate the phase transition of IST, which is implemented using the same parameter values as rONE-L1. Only nine values of δ\delta in {0.1,0.2,,0.9}\left\{0.1,0.2,\cdots,0.9\right\} are considered with the partial-DCT matrix ensemble for time consideration. Another choice of r=1+0.2δr=1+0.2\delta is considered besides the recommended one. Correspondingly, the phase transition of rONE-L1 with r=1+0.2δr=1+0.2\delta is also estimated.

The observed phase transitions are shown in Fig. 1. As a modified version of IST, obviously, rONE-L1 makes a great improvement over IST in the sparsity-undersampling tradeoff. Meanwhile, comparison of the averaged number of iterations of the two algorithms shows that rONE-L1 is also faster than IST. For example, as δ=0.2\delta=0.2 and the recommended rr is used, rONE-L1 is about 6 times faster than IST.

III-C Comparison with AMP, FPC-AS and NESTA in Noise-free Case

In this subsection, we report numerical simulation results comparing rONE-L1 with state-of-the-art algorithms, including AMP, FPC-AS and NESTA, in the case of sparse signals and noise-free measurements. Our experiments used FPC-AS v.1.21, NESTA v.1.1, and AMP codes provided by the author. We choose parameter values for FPC-AS and NESTA such that each method produces a solution with approximately the same precision as that produced by rONE-L1. In our experiments we consider the recovery of exactly sparse signals from partial-DCT measurements. We set N=214N=2^{14} and δ=0.2\delta=0.2, and an ‘easy’ case where ρ=0.1\rho=0.1 and a ‘hard’ case where ρ=0.22\rho=0.22 are considered, respectively. Here ‘easy’ and ‘hard’ refer to the difficulty degree in recovering a sparse signal from a specific number of measurements. The setting (δ,ρ)=(0.2,0.22)\left(\delta,\rho\right)=(0.2,0.22) is close to the phase transition of (BP)\left(\text{BP}\right). Twenty random problems are created and solved for each algorithm with each combination of (δ,ρ)\left(\delta,\rho\right), and the minimum, maximum and averaged relative RMSE, number of calls of A and A\emph{{A}}^{\prime}, and CPU time usages are recorded. All experiments are carried on Matlab v.7.7.0 on a PC with a Windows XP system and a 3GHz CPU. Default parameter values are used in eONE-L1 and rONE-L1.

AMP: terminating if Axtb2b2<105\frac{\left\|\emph{{A}}\emph{{x}}_{t}-\emph{{b}}\right\|_{2}}{\left\|\emph{{b}}\right\|_{2}}<10^{-5}.

FPC-AS: λ=2×106\lambda=2\times 10^{-6} and gtol=1×106gtol=1\times 10^{-6}, where gtolgtol is the termination criterion on the maximum norm of sub-gradient. FPC-AS solves the problem (QPλ)\left(\text{QP}_{\lambda}\right).

NESTA: λ=2×106\lambda=2\times 10^{-6}, ϵ=0\epsilon=0 and the termination criterion tolvar=1×108tolvar=1\times 10^{-8}. NESTA solves (BPϵ)\left(\text{BP}_{\epsilon}\right) using the Nesterov algorithm , with continuation.

Our experiment results are presented in Table I. In both ‘easy’ and ‘hard’ cases, rONE-L1 is much faster than eONE-L1. In the ‘easy’ case, the proposed rONE-L1 algorithm takes the most number of calls of A and A\emph{{A}}^{\prime}, except that of eONE-L1, due to a conservative setting of rr. But this number of calls (515.4) is very close to that of NESTA (468.9), and furthermore, the CPU time usage of rONE-L1 (2.14 s) is less than that of NESTA (2.70 s) because of its concise implementation. In the ‘hard’ case, rONE-L1 has the second best performance with significantly less CPU time than that of AMP and NESTA. AMP has the second worst CPU time and the worst accuracy as the dynamic threshold in each iteration depends on the mean squared error of the current iterative solution, which cannot be calculated exactly in the implementation.

III-D A Practical Example

This subsection demonstrates the efficiency of rONE-L1 in the general CS where the signal of interest is approximately sparse and measurements are contaminated with noise. We seek to reconstruct the Mondrian image of size 256×256256\times 256, shown in Fig. 2, from its noise-contaminated partial-DCT coefficients. This image presents a challenge as its wavelet expansion is compressible but not exactly sparse. The sampling pattern, which is inspired by magnetic resonance imaging (MRI) and is shown in Fig. 2, is adopted as most energy of the image concentrates at low-frequency components after the DCT transform. The measurement vector b contains n=7419n=7419 DCT measurements (δ=0.113\delta=0.113). White Gaussian noise with standard deviation σ=1\sigma=1 is then added. We set ϵ=n+22nσ\epsilon=\sqrt{n+2\sqrt{2n}}\sigma. Haar wavelet with a decomposition level 4 is chosen as the sparsifying transform W\mathcal{W}. Hence, the problem to be solved is (BPϵ)\left(\text{BP}_{\epsilon}\right) with A=FpW\emph{{A}}=\mathcal{F}_{p}\mathcal{W}^{\prime}, where Fp\mathcal{F}_{p} is the partial-DCT transform. The reconstructed image is H^=Wx^\hat{\emph{{H}}}=\mathcal{W}^{\prime}\hat{\emph{{x}}} with x^\hat{\emph{{x}}} being the reconstructed wavelet coefficients and reconstruction error is calculated as H^HoFHoF\frac{\left\|\hat{\emph{{H}}}-\emph{{H}}^{o}\right\|_{\text{F}}}{\left\|\emph{{H}}^{o}\right\|_{\text{F}}}, where Ho\emph{{H}}^{o} is the original image and F\left\|\cdot\right\|_{\text{F}} denotes the Frobenius norm. We compare the performance of rONE-L1 with NESTA and FPC-AS.

AMP is omitted for its poor performance in this approximately-sparse-signal case. For AMP, the value of the dynamic threshold λt\lambda_{t} and the term xt0\left\|x_{t}\right\|_{0} in (3) depend on the condition that the signal to reconstruct is strictly sparse.

In such a noisy measurement case, an exact solution for (BPϵ)\left(\text{BP}_{\epsilon}\right) is not sought after in the rONE-L1 simulation. The computation of the rONE-L1 algorithm is set to terminate if Axtb2b2τ=ϵb2\frac{\left\|\emph{{A}}\emph{{x}}_{t}-\emph{{b}}\right\|_{2}}{\left\|\emph{{b}}\right\|_{2}}\leq\tau=\frac{\epsilon}{\left\|b\right\|_{2}}, i.e., rONE-L1 outputs the first xt\emph{{x}}_{t} when it becomes a feasible solution of (BPϵ)\left(\text{BP}_{\epsilon}\right).

FPC-AS: λ=1×103\lambda=1\times 10^{-3}, gtol=1×103gtol=1\times 10^{-3}, gtol_scale_x=1×106gtol\_scale\_x=1\times 10^{-6} and the maximum number of iterations for subspace optimization sub_mxitr=10sub\_mxitr=10. The parameters are set according to [15, Section 4.4].

NESTA: λ=1×104\lambda=1\times 10^{-4}, and tolvar=1×106tolvar=1\times 10^{-6}. The parameters are tuned to achieve the minimum reconstruction error.

Fig. 2 shows the experiment results where rONE-L1, FPC-AS and NESTA produce faithful reconstructions of the original image. The rONE-L1 algorithm produces a reconstruction error (0.0741) lower than that of FPC-AS (0.0809) with comparable computation times (11.1 s and 11.4 s, respectively). While NESTA results in a slightly lower reconstruction error (0.0649), it incurs about twice more computation time (29.4 s).

IV Conclusion

In this work, we have presented novel algorithms to solve the basis pursuit problem for noiseless CS. The proposed rONE-L1 algorithm, based on the augmented Lagrange multiplier method and heuristic simplification, can be considered as a modified IST with an aggressive continuation strategy. The following two cases of CS have been studied: 1) exact reconstruction of sparse signals from noise-free measurements, and 2) reconstruction of approximately sparse signals from noise-contaminated measurements. The proposed rONE-L1 outperforms AMP, which is a well-known IST type algorithm, in Case 2 and also in Case 1 when the setting of (δ,ρ)\left(\delta,\rho\right) is close to the phase transition of basis pursuit. It is faster than NESTA in both Case 1 and Case 2. The numerical experiments further show that rONE-L1 outperforms FPC-AS in Case 2. Apart from the high computational efficiency and accurate reconstruction result, another useful property of rONE-L1 is its ease of parameter tuning. It only needs to set a termination criterion τ\tau if the recommended rr is used and the value of τ\tau is explicit in Case 2. While this correspondence focuses on reconstruction of real-valued signals, it is straightforward to apply ONE-L1 algorithms to the reconstruction of complex-valued signals . More rigorous analysis of rONE-L1 is currently under investigation.

Acknowledgment

The authors are grateful to the anonymous reviewers for helpful comments. Z. Yang wishes to thank Arian Maleki for providing the AMP codes. The Matlab codes of ONE-L1 algorithms are available at http://sites.google.com/site/zaiyang0248.

References