Asymptotic Analysis of Complex LASSO via Complex Approximate Message Passing (CAMP)

Arian Maleki, Laura Anitori, Zai Yang, Richard Baraniuk

I Introduction

Recovering a sparse signal from an undersampled set of random linear measurements is the main problem of interest in compressed sensing (CS). In the past few years many algorithms have been proposed for signal recovery, and their performance has been analyzed both analytically and empirically . However, whereas most of the theoretical work has focussed on the case of real-valued signals and measurements, in many applications, such as magnetic resonance imaging and radar, the signals are more easily representable in the complex domain . In such applications, the real and imaginary components of a complex signal are often either zero or non-zero simultaneously. Therefore, recovery algorithms may benefit from this prior knowledge. Indeed the results presented in this paper confirm this intuition.

Motivated by this observation, we investigate the performance of the complex-valued LASSO in the case of noise-free and noisy measurements. The derivations are based on the state evolution (SE) framework, presented previously in . Also a new algorithm, complex approximate message passing (CAMP), is presented to solve the complex LASSO problem. This algorithm is an extension of the AMP algorithm . However, the extension of AMP and its analysis from the real to the complex setting is not trivial; although CAMP shares some interesting features with AMP, it is substantially more challenging to establish the characteristics of CAMP. Furthermore, some important features of CAMP are specific to complex-valued signals and the relevant optimization problem. Note that the extension of the Bayesian-AMP algorithm to complex-valued signals has been considered elsewhere and is not the main focus of this work.

In the next section, we briefly review some of the existing algorithms for sparse signal recovery in the real-valued setting and then focus on recovery algorithms for the complex case, with particular attention to the AMP and CAMP algorithms. We then introduce two criteria which we use as measures of performance for various algorithms in noiseless and noisy settings. Based on these criteria, we establish the novelty of our results compared to the existing work. An overview of the organization of the rest of the paper is provided in Section I-G.

Consider the problem of recovering a sparse vector so\mathdsRNs_{o}\in\mathds{R}^{N} from a noisy undersampled set of linear measurements y\mathdsRny\in\mathds{R}^{n}, where y=Aso+wy=As_{o}+w and ww is the noise. Let kk denote the number of nonzero elements of sos_{o}. The measurement matrix AA has i.i.d. elements from a given distribution on \mathdsR\mathds{R}. Given yy and AA, we seek an approximation to sos_{o}.

In this paper, we are particularly interested in AMP . Starting from x0=0x^{0}=0 and z0=yz^{0}=y, AMP uses the following iterations:

where η(x;τ)=(xτ)+sign(x)\eta_{\circ}(x;\tau)=(|x|-\tau)_{+}\text{sign}(x) is the soft thresholding function, τt\tau_{t} is the threshold parameter, and ItI^{t} is the active set of xtx^{t}, i.e., It={i  xit0}I^{t}=\{i\ |\ x^{t}_{i}\neq 0\}. The notation It|I^{t}| denotes the cardinality of ItI^{t}. As we will describe later, the strong connection between AMP and LASSO and the ease of predicting the performance of AMP has led to an accurate performance analysis of LASSO , .

I-B Complex-valued sparse recovery algorithms

Consider the complex setting, where the signal sos_{o}, the measurements yy, and the matrix AA are complex-valued. The success of LASSO has motivated researchers to use similar techniques in this setting as well. We consider the following two schemes that have been used in the signal processing literature:

which is called the basis pursuit problem, or r-BP in this paper. It is straightforward to extend the analyses of LASSO and BP for the real-valued signals to r-LASSO and r-BP.The asymptotic theoretical results on LASSO and BP consider i.i.d. Gaussian measurement matrices . However, it has been conjectured that the results are universal and hold for a “larger” class of random matrices .

r-LASSO ignores the information about any potential grouping of the real and imaginary parts. But, in many applications the real and imaginary components tend to be either zero or non-zero simultaneously. Considering this extra information in the recovery stage may improve the overall performance of a CS system.

c-LASSO: Another natural extension of the LASSO to the complex setting is the following optimization problem that we term c-LASSO

An important question we address in this paper is: can we measure how much the grouping of the real and the imaginary parts improves the performance of c-LASSO compared to r-LASSO? Several papers have considered similar problems and have provided guarantees on the performance of c-LASSO. However, the results are usually inconclusive because of the loose constants involved in the analyses. This paper addresses the above questions with an analysis that does not involve any loose constants and therefore provides accurate comparisons.

Motivated by the recent results in the asymptotic analysis of the LASSO , , we first derive the complex approximate message passing algorithm (CAMP) as a fast and efficient algorithm for solving the c-LASSO problem. We then extend the state evolution (SE) framework introduced in to predict the performance of the CAMP algorithm in the asymptotic setting. Since the CAMP algorithm solves c-LASSO, such predictions are accurate for c-LASSO as well for NN\rightarrow\infty. The analysis carried out in this paper provides new information and insight on the performance of the c-LASSO that was not known before such as the least favorable distribution and the noise sensitivity of c-LASSO and CAMP. A more detailed description of the contributions of this paper is summarized in Section I-E.

I-C Notation

We are interested in the asymptotic setting where δ=n/N\delta=n/N and ρ=k/n\rho=k/n are fixed, while NN\rightarrow\infty. We further assume that the elements of sos_{o} are i.i.d. so,i(1ρδ)δ0(so,i)+ρδG(so,i)s_{o,i}\sim(1-\rho\delta)\delta_{0}(|s_{o,i}|)+\rho\delta G(s_{o,i}), where GG is an unknown probability distribution with no point mass at , and δ0\delta_{0} is a Dirac delta function.This assumption is not necessary and as long as the marginal distribution of sos_{o} converges to a given distribution the statements of this paper hold. For further information on this, see and . Clearly, the expected number of non-zero elements in the vector sos_{o} is ρδN\rho\delta N. We call this value the sparsity level of the signal. In this model, we are assuming that all the non-zero real and imaginary coefficients are paired. This quantifies the maximum amount of improvement the c-LASSO gains by grouping the real and imaginary parts.

We use the notations \mathdsE\mathds{E}, \mathdsEX\mathds{E}_{X}, and \mathdsEXF\mathds{E}_{X\sim F} for expected value, conditional expected value given the random variable XX, and expected value with respect to a random variable XX drawn from the distribution FF, respectively. Define Fϵ,γ\mathcal{F}_{\epsilon,\gamma} as the family of distributions FF with \mathdsEXF(\mathdsI(X=0))1ϵ\mathds{E}_{X\sim F}(\mathds{I}(|X|=0))\geq 1-\epsilon and \mathdsEXF(X2)ϵγ2\mathds{E}_{X\sim F}(|X|^{2})\leq\epsilon\gamma^{2}, where \mathdsI\mathds{I} denotes the indicator function. An important distribution in this class is qo(X)q(X)(1ϵ)δ0(X)+ϵδγ(X)q_{o}(X)\triangleq q(|X|)\triangleq(1-\epsilon)\delta_{0}(|X|)+\epsilon\delta_{\gamma}(|X|), where δγ(X)δ0(Xγ)\delta_{\gamma}(|X|)\triangleq\delta_{0}(|X|-\gamma). Note that this distribution is independent of the phase and in addition to a point mass at zero has another point mass at γ\gamma. Finally, define Fϵ{ F  \mathdsEXF(\mathdsI(X0))ϵ}\mathcal{F}_{\epsilon}\triangleq\{\ F\ |\ \mathds{E}_{X\sim F}(\mathds{I}(|X|\neq 0))\leq\epsilon\}.

I-D Performance criteria

We compare c-LASSO with r-LASSO in both the noise-free and noisy measurements cases. For each scenario, we define a specific measure to compare the performance of the two algorithms.

Consider the problem of recovering sos_{o} drawn from so,ii.i.d.(1ρδ)δ0(so,i)+ρδG(so,i)s_{o,i}\overset{\rm i.i.d.}{\sim}(1-\rho\delta)\delta_{0}(|s_{o,i}|)+\rho\delta G(s_{o,i}), from a set of noise free measurements y=Asoy=As_{o}. Let Aα\mathcal{A}_{\alpha} be a sparse recovery algorithm with free parameter α\alpha. For instance A\mathcal{A} may be the c-LASSO algorithm and the free parameter of the algorithm is the regularization argument λ\lambda. Given (y,A)(y,A), Aα\mathcal{A}_{\alpha} returns an estimate x^Aα\hat{x}^{\mathcal{A}_{\alpha}} of sos_{o}. Suppose that in the noise free case, as NN\rightarrow\infty, the performance of Aα\mathcal{A}_{\alpha} exhibits a sharp phase transition, i.e., for every value of δ\delta, there exists ρAα(δ){\rho}^{\mathcal{A}_{\alpha}}(\delta), below which limNx^Aαso2/N0\lim_{N\rightarrow\infty}\|\hat{x}^{\mathcal{A}_{\alpha}}-s_{o}\|^{2}/N\rightarrow 0 almost surely, while for ρ>ρAα(δ)\rho>\rho^{\mathcal{A}_{\alpha}}(\delta), Aα\mathcal{A}_{\alpha} fails and limNx^Aαso2/N0\lim_{N\rightarrow\infty}\|\hat{x}^{\mathcal{A}_{\alpha}}-s_{o}\|^{2}/N\nrightarrow 0. The phase transition has been studied both empirically and theoretically for many sparse recovery algorithms . The phase transition curve ρAα(δ)\rho^{\mathcal{A}_{\alpha}}(\delta) specifies the fundamental exact recovery limit of algorithm Aα\mathcal{A}_{\alpha}. The free parameter α\alpha can strongly affect the performance of the sparse recovery algorithm . Therefore, optimal tuning of this parameter is essential in practical applications. One approach is to tune the parameter for the highest phase transition ,In this paper, we consider algorithms whose phase transitions do not depend on the distribution GG of non-zero coefficients. Otherwise, one could use the maximin framework introduced in . i.e.,

In other words, ρA\rho^{\mathcal{A}} is the best performance Aα\mathcal{A}_{\alpha} provides in the exact sparse signal recovery problem, if we know how to tune the algorithm properly. Based on this framework, we say algorithm A\mathcal{A} outperforms B\mathcal{B} at a given δ\delta, if and only if ρA(δ)>ρB(δ)\rho^{\mathcal{A}}(\delta)>\rho^{\mathcal{B}}(\delta).

I-D2 Noisy measurements

where α\alpha denotes the tuning parameter of the algorithm Aα\mathcal{A}_{\alpha}. If the noise sensitivity is large, then the measurement noise may severely degrade the final reconstruction. In (1) we search for the distribution that induces the maximum reconstruction error to the algorithm. This ensures that for other signal distributions the reconstruction error is smaller. By tuning α\alpha, we may obtain better estimate of sos_{o}. Therefore, we tune the parameter α\alpha to obtain the lowest noise sensitivity, i.e.,

Based on this framework, we say that algorithm A\mathcal{A} outperforms B\mathcal{B} at a given δ\delta and ρ\rho if and only if NSA(δ,ρ)<NSB(δ,ρ){\rm NS}^{\mathcal{A}}(\delta,\rho)<{\rm NS}^{\mathcal{B}}(\delta,\rho).

I-E Contributions

In this paper, we first develop the complex approximate message passing (CAMP) algorithm that is a simple and fast converging iterative method for solving c-LASSO. We extend the state evolution (SE), introduced recently as a framework for accurate asymptotic predictions of the AMP performance, to CAMP.Note that SE has been proved to be accurate only for the case of Gaussian measurement matrices . But, extensive simulations have confirmed its accuracy for a large class of random measurement matrices . The results of our paper are also provably correct for complex Gaussian measurement matrices. But, our simulations confirm that they hold for broader set of matrices. We will then use the connection between CAMP and c-LASSO to provide an accurate asymptotic analysis of the c-LASSO problem. We aim to characterize the phase transition curve (noise-free measurements) and noise sensitivity (noisy measurements) of c-LASSO and CAMP when the real and imaginary parts are paired, i.e., they are both zero or non-zero simultaneously. Both criteria have been extensively studied for the real signals (and hence for the r-LASSO) . The results of our predictions are summarized in Figures 1, 2, and 3. Figure 1 compares the phase transition curve of c-BP and CAMP with the phase transition curve of r-BP. As we expected c-BP outperforms r-BP since it exploits the connection between the real and imaginary parts. If ρSE(δ)\rho_{SE}(\delta) denotes the phase transition curve, then we also prove that ρSE(δ)1log(1/2δ)\rho_{SE}(\delta)\sim\frac{1}{\log(1/2\delta)} as δ0\delta\rightarrow 0. Comparing this with ρSER(δ)12log(1/δ)\rho^{R}_{SE}(\delta)\sim\frac{1}{2\log(1/\delta)} for the r-LASSO , we conclude that

This means that, in the very high undersampling regime the c-LASSO can recover signals that are two times more dense than the signals that are recovered by r-LASSO. Figure 2 exhibits the noise sensitivity of c-LASSO and CAMP. We prove in Section III-C that, as the sparsity approaches the phase transition curve, the noise sensitivity grows up to infinity. Finally, Figure 3 compares the contour plots of the noise sensitivity of c-LASSO with those of the r-LASSO. For the fixed value noise sensitivity, the level set of the c-LASSO is higher than that of r-LASSO. It is worth noting that the same comparisons hold between CAMP and AMP, as we will clarify in Section III-D.

I-F Related work

The state evolution framework used in this paper was first introduced in . Deriving the phase transition and noise sensitivity of the LASSO for real-valued signals and real-valued measurements from SE is due to ; see for more comprehensive discussion. Finally, the derivation of AMP from the full sum-product message passing is due to . Our main contribution in this paper is to extend these results to the complex setting. Not only is the analysis of the state evolution more challenging in this setting, but it also provides new insights on the performance of c-LASSO that have not been available. For instance, the noise sensitivity of c-LASSO has not previously been determined.

The recovery of sparse complex signals is a special case of group-sparsity or block-sparsity, where all the groups are non-overlapping and have size 22. According to the group sparsity assumption, the non-zero elements of the signal tend to occur in groups or clusters. One of the algorithms used in this context is the group-LASSO . Consider a signal so\mathdsRNs_{o}\in\mathds{R}^{N}. Partition the indices of sos_{o} into mm groups g1,,gmg_{1},\ldots,g_{m}. The group-LASSO algorithm minimizes the following cost function:

where the λi\lambda_{i}’s are regularization parameters.

The group-Lasso algorithm has been extensively studied in the literature . We briefly review several papers and emphasize the differences from our work. analyzes the consistency of the group LASSO estimator in the presence of noise. Fixing the signal sos_{o}, it provides conditions under which the group LASSO is consistent as nn\rightarrow\infty. consider a weak notion of consistency, i.e., exact support recovery. However, proves that in the setting we are interested in, i.e., k/n=ρk/n=\rho and n/N=δn/N=\delta, even exact support recovery is not possible. When noise is present, our goal is neither exact recovery nor exact support recovery. Instead, we characterize the mean square error (MSE) of the reconstruction. This criterion has been considered in . Although the results of show qualitatively the benefit of group sparsity, they do not characterize the difference quantitatively. In fact, loose constants in both the error bound and the number of samples do not permit accurate performance comparison. In our analysis, no loose constant is involved, and we provide very accurate characterization of the mean square error.

Group-sparsity and group-LASSO are also of interest in the sparse recovery community. For example, the analysis carried out in are based on “coherence”. These results provide sufficient conditions with again loose constants as discussed above. The work of addresses this issue by an accurate analysis of the algorithm in the noiseless setting σ=0\sigma=0. They provide a very accurate estimate of the phase transition curve for the group-LASSO. However, SE provides a more flexible framework to analyze c-LASSO than the analysis of , and it provides more information than just the phase transition curve. For instance, it points to the least favorable distribution of the input and noise sensitivity of c-LASSO.

The Bayesian approach that assumes a hidden Markov model for the signal has been also explored for the recovery of group sparse signals . It has been shown that AMP combined with an expectation maximization algorithm (for estimating the parameters of the distribution) leads to promising results in practice . Kamilov et al. have taken the first step towards a theoretical understanding of such algorithms. However, the complete understanding of the expectation maximization employed in such methods is not available yet. Furthermore, the success of such algorithms seem to be dependent on the match between the assumed and actual prior distribution. Such dependencies have not been theoretically analyzed yet. In this paper we assume that the distribution of non-zero coefficients is not known beforehand and characterize the performance of c-LASSO for the least favorable distribution.

While writing this paper we were made aware that in an independent work Donoho, Johnstone, and Montanari are extending the SE framework to the general setting of group sparsity . Their work considers the state evolution framework for the group-LASSO problem and will include the generalization of the analysis provided in this paper to the case where the variables tend to cluster in groups of size BB.

Both complex signals and group-sparse signals are special cases of model-based CS . By introducing more structured models for the signal, proves that the number of measurements needed are proportional to the “complexity” of the model rather than the sparsity level . The results in model-based CS also suffer from loose constants in both the number of measurements and the mean square error bounds.

Finally, from an algorithmic point of view, several papers have considered solving the c-LASSO problem using first-order algorithms .First-order methods are iterative algorithms that use either the gradient or the subgradient of the function at the previous iterations to update their estimates. The deterministic framework that measures the convergence of an algorithm on the problem instance that yields the slowest convergence rate, is not an appropriate measure of the convergence rate for the compressed sensing problems . Therefore, considers the average convergence rate for iterative algorithms. In that setting, AMP is the only first order algorithm that provably achieves linear convergence to date. Similarly, the CAMP algorithm, introduced in this paper, provides the first, first-order c-LASSO solver that provides a linear average convergence rate.

I-G Organization of the paper

We introduce the CAMP algorithm in Section II. We then explain the state evolution equations that characterizes the evolution of the mean square error through the iterations of the CAMP algorithm in Section III, and we analyze the important properties of the SE equations. We then discuss the connection between our calculations and the solution of LASSO in Section III-D. We confirm our results via Monte Carlo simulations in Section IV.

II Complex Approximate Message Passing

The high computational complexity of interior point methods for solving large scale convex optimization problems has spurred the development of first-order methods for solving the LASSO problem. See and the references therein for a description of some of these algorithms. One of the most successful algorithms for CS problems is the AMP algorithm introduced in . In this section, we use the approach introduced in to derive the approximate message passing algorithm for the c-LASSO problem that we term Complex Approximate Message Passing (CAMP).

Let s1,s2,,sNs_{1},s_{2},\ldots,s_{N} be NN random variables with the following distribution:

where β\beta is a constant and Z(β)seβλs1β2yAs22dsZ(\beta)\triangleq\int_{s}{\rm e}^{-\beta\lambda\|s\|_{1}-\frac{\beta}{2}\|y-As\|_{2}^{2}}ds. As β\beta\rightarrow\infty, the mass of this distribution concentrates around the solution of the LASSO. Therefore, one way to find the solution of LASSO is to marginalize this distribution. However, calculating the marginal distribution is an NP-complete problem. The sum-product message passing algorithm provides a successful heuristic for approximating the marginal distribution. As NN\rightarrow\infty and β\beta\rightarrow\infty the iterations of the sum-product message passing algorithm are simplified to ( or Chapter 5 of )

III Formal analysis of CAMP and c-LASSO

In this section, we explain the state evolution (SE) framework that predicts the performance of the CAMP and c-LASSO in the asymptotic settings. We then use this framework to analyze the phase transition and noise sensitivity of the CAMP and c-LASSO. The formal connection between state evolution and CAMP/c-LASSO is discussed in Section III-D.

We now conduct an asymptotic analysis of the CAMP algorithm. As we confirm in Section III-D, the asymptotic performance of the algorithm is tracked through a few variables, called the state variables. The state of the algorithm is the 5-tuple s=(m;δ,ρ,σ,G)\mathbf{s}=(m;\delta,\rho,\sigma,G), where GG corresponds to the distribution of the non-zero elements of the sparse vector sos_{o}, σ\sigma is the standard deviation of the measurement noise, and mm is the asymptotic normalized mean square error. The threshold parameter (threshold policy) of CAMP in its most general form could be a function of the state of the algorithm τ(s)\tau(\mathbf{s}). Define npi(m;σ,δ)σ2+mδ{\rm npi}(m;\sigma,\delta)\triangleq\sigma^{2}+\frac{m}{\delta}. The mean square error (MSE) map is defined as

where Z1,Z2N(0,1/2)Z_{1},Z_{2}\sim N(0,1/2) and X(1ρδ)δ0(x)+ρδG(x)X\sim(1-\rho\delta)\delta_{0}(|x|)+\rho\delta G(x) are independent random variables. Note that GG is a probability distribution on \mathdsC\mathds{C}. In the rest of this paper, we consider the thresholding policy τ(s)=τnpi(m,σ,δ)\tau(\mathbf{s})=\tau\sqrt{{\rm npi}(m,\sigma,\delta)}, where the constant τ\tau is yet to be tuned according to the schemes introduced in Sections I-D1 and I-D2. When we use this thresholding policy we may equivalently write Ψ(s,τ(s))\Psi(\mathbf{s},\tau(\mathbf{s})) as Ψ(s,τ)\Psi(\mathbf{s},\tau). This thresholding policy is the same as the thresholding policy introduced in . When the parameters δ,ρ,σ,τ\delta,\rho,\sigma,\tau and GG are clear from the context, we denote the MSE map by Ψ(m)\Psi(m). SE is the evolution of mm (starting from t=0t=0 and m0=E(X2)m_{0}=E(|X|^{2})) by the rule

where npitnpi(mt,σ,δ){\rm npi}^{t}\triangleq{\rm npi}(m_{t},\sigma,\delta). As will be described in Section III-D, this equation tracks the normalized MSE of the CAMP algorithm in the asymptotic setting n,Nn,N\rightarrow\infty and n/Nδn/N\rightarrow\delta. In other words, if mtm_{t} is the MSE of the CAMP algorithm at iteration tt, the mt+1m_{t+1}, calculated by (7), is the MSE of CAMP at iteration t+1t+1.

Let Ψ\Psi be almost everywhere differentiable. mm^{*} is called a fixed point of Ψ\Psi if and only if Ψ(m)=m\Psi(m^{*})=m^{*}. Furthermore, a fixed point is called stable if dΨ(m)dmm=m<1\left.\frac{d\Psi(m)}{dm}\right|_{m=m^{*}}<1, and unstable if dΨ(m)dmm=m>1\left.\frac{d\Psi(m)}{dm}\right|_{m=m^{*}}>1.

It is clear that if mm^{*} is the unique stable fixed point of the Ψ\Psi function, then mtmm_{t}\rightarrow m^{*} as tt\rightarrow\infty. Also, if all the fixed points of Ψ\Psi are unstable, then mtm_{t}\rightarrow\infty as tt\rightarrow\infty. Define μX\mu\triangleq|X| and θX\theta\triangleq\measuredangle X. Let G(μ,θ)G(\mu,\theta) denote the probability density function of XX and define G(μ)G(μ,θ)dθG(\mu)\triangleq\int G(\mu,\theta)d\theta as the marginal distribution of μ\mu. The next lemma shows that in order to analyze the state evolution function we only need to consider the amplitude distribution. This substantially simplifies our analysis of SE in the next sections.

The MSE map does not depend on the phase distribution of the input signal, i.e.,

III-B Noise-free signal recovery

Consider the noise free setting with σ=0\sigma=0. Suppose that SE predicts the MSE of CAMP in the asymptotic setting (we will make this rigorous in Section III-D). As mentioned in Section I-D1, in order to characterize the performance of CAMP in the noiseless setting, we first derive its phase transition curve and then optimize over τ\tau to obtain the highest phase transition CAMP can achieve. Fix all the state variables except for mm, and ρ\rho. The evolution of mm, discriminates the following two regions for ρ\rho:

Region I: The values of ρ\rho for which Ψ(m)<m\Psi(m)<m for every m>0m>0;

Since is necessarily a fixed point of the Ψ\Psi function, in Region I mt0m_{t}\rightarrow 0 as tt\rightarrow\infty. The following lemma shows that in Region II m=0m=0 is an unstable fixed point and therefore starting from m00m_{0}\neq 0, mt0m_{t}\nrightarrow 0.

Let σ=0\sigma=0. If ρ\rho is in Region II, then Ψ\Psi has an unstable fixed point at zero.

We prove in Lemma V.2 that Ψ(m)\Psi(m) is a concave function of mm. Therefore, ρ\rho is in Region II if and only if dΨ(m)dmm=0>1\left.\frac{d\Psi(m)}{dm}\right|_{m=0}>1. This in turn indicates that is an unstable fixed point. ∎

It is also easy to confirm that Region I is of the form [0,ρSE(δ,G,τ))[0,\rho_{SE}(\delta,G,\tau)). As we will see in Section III-D, ρSE(δ,G,τ)\rho_{SE}(\delta,G,\tau) determines the phase transition curve of the CAMP algorithm. According to Lemma III.2, the MSE map does not depend on the phase distribution of the non-zero elements. The following proposition shows that in fact ρSE\rho_{SE} is independent of GG even though the Ψ\Psi function depends on G(μ)G(\mu).

ρSE(δ,G,τ)\rho_{SE}(\delta,G,\tau) is independent of the distribution GG.

According to Lemma V.2 in Appendix V-D, Ψ\Psi is concave. Therefore, it has a stable fixed point at zero if and only if its derivative at zero is less than 11. It is also straightforward (from Appendix V-D) to show that

Setting this derivative to 11, it is clear that the phase transition value of ρ\rho is independent of GG. ∎

According to Proposition III.4 the only parameters that affect ρSE\rho_{SE} are δ\delta and the free parameter τ\tau. Fixing δ\delta, we tune τ\tau such that the algorithm achieves its highest phase transition for a certain number of measurements, i.e.,

Using SE we can calculate the optimal value of τ\tau and ρSE(δ)\rho_{SE}(\delta).

ρSE(δ)\rho_{SE}(\delta) and δ\delta satisfy the following implicit relations:

for τ[0,)\tau\in[0,\infty). Here, χ1(τ)ωτω(τω)eω2dω\chi_{1}(\tau)\triangleq\int_{\omega\geq\tau}\omega(\tau-\omega){\rm e}^{-\omega^{2}}d\omega and χ2(τ)ω>τω(ωτ)2eω2\chi_{2}(\tau)\triangleq\int_{\omega>\tau}\omega(\omega-\tau)^{2}{\rm e}^{-\omega^{2}}.

See Appendix V-D for the proof. Figure 1 displays this phase transition curve that is derived from the SE framework and compares it with the phase transition of r-BP algorithm. As will be described later, ρSE(δ)\rho_{SE}(\delta) corresponds to the phase transition of c-LASSO. Hence the difference between ρSE(δ)\rho_{SE}(\delta) and phase transition curve of r-LASSO is the benefit of grouping the real and imaginary parts.

It is also interesting to compare the ρSE(δ)\rho_{SE}(\delta) (which as we see later predicts the performance of c-LASSO) with the phase transition of r-LASSO in high undersampling regime δ0\delta\rightarrow 0. The implicit formulation above enables us to calculate the asymptotic performance of the phase transition as δ0\delta\rightarrow 0.

ρSE(δ)\rho_{SE}(\delta) follows the asymptotic behavior

See Appendix V-E for the proof. As mentioned above, this theorem shows that as δ0\delta\rightarrow 0 the phase transition of c-BP and CAMP is two times that of the r-LASSO, which is given by ρSER1/(2log(1/δ))\rho^{R}_{SE}\sim 1/(2\log(1/\delta)) . This improvement is due to the grouping of real and imaginary parts of the signal.

III-C Noise sensitivity

In this section we characterize the noise sensitivity of SE. To achieve this goal, we first discuss the risk of the complex soft thresholding function. The properties of this risk play an important role in the discussion of the noise sensitivity of SE in Section III-C2.

Define the risk of the soft thresholding function as

where μ[0,)\mu\in[0,\infty), θ[0,2π)\theta\in[0,2\pi), and the expected value is with respect to the two independent random variables Z1,Z2N(0,1/2)Z_{1},Z_{2}\sim N(0,1/2). It is important to note that according to Lemma III.2, the risk function is independent of θ\theta. The following lemma characterizes two important properties of this risk function:

r(μ,τ)r(\mu,\tau) is an increasing function of μ\mu and a concave function in terms of μ2\mu^{2}.

See Appendix V-F for the proof of this lemma. We define the minimax risk of the soft thresholding function as

where qq is the probability density function of XX, and the expected value is with respect to XX, Z1Z_{1} and Z2Z_{2}.

Note that qFϵq\in\mathcal{F}_{\epsilon} implies that qq has a point mass of 1ϵ1-\epsilon at zero; see Section I-C for more information. In the next section we show a connection between this minimax risk and the noise sensitivity of the SE. Therefore, it is important to characterize M(ϵ)M^{\flat}(\epsilon).

The minimax risk of the soft thresholding function satisfies

See Appendix V-G for the proof. It is important to note that the quantities in (8) can be easily calculated in terms of the density and distribution function of a normal random variable. Therefore, a simple computer program may accurately calculate the value of M(ϵ)M^{\flat}(\epsilon) for any ϵ\epsilon.

The proof provided for Proposition III.8 also proves the following proposition. We will discuss the importance of this result for compressed sensing problems in the next section.

The maximum of the risk function, maxqFϵ,γ\mathdsEη(X+Z1+iZ2;τ)X2\max_{q\in\mathcal{F}_{\epsilon,\gamma}}\mathds{E}|\eta(X+Z_{1}+iZ_{2};\tau)-X|^{2}, is achieved on q(X)=(1ϵ)δ0(X)+ϵδγ(X)q(X)=(1-\epsilon)\delta_{0}(|X|)+\epsilon\delta_{\gamma}(|X|).

First, note that the maximizing distribution (or least favorable distribution) is independent of the threshold parameter. Second, note that the maximizing distribution is not unique since we have already proved that the phase distribution does not affect the risk function.

III-C2 Noise sensitivity of state evolution

As mentioned in Section III-A, in the presence of measurement noise, SE is given by

where npi=σ2+mtδ{\rm npi}={\sigma^{2}+\frac{m_{t}}{\delta}}. As mentioned above, mtm_{t} characterizes the asymptotic MSE of CAMP at iteration tt. Therefore, the final solution of the CAMP algorithm converges to one of the stable fixed points of the Ψ\Psi function. The next theorem suggests that the stable fixed point is unique, and therefore no matter where the algorithm starts from it will always converge to the same MSE.

Ψ(m)\Psi(m) has a unique stable fixed point to which the sequence of {mt}\{m_{t}\} converges.

We call the fixed point in Lemma III.10 fMSE(σ2,δ,ρ,G,τ){\rm fMSE}(\sigma^{2},\delta,\rho,G,\tau). According to Section I-D2, we define the minimax noise sensitivity as

The noise sensitivity of SE can be easily evaluated from M(ϵ)M^{\flat}(\epsilon). The following theorem characterizes this relation.

Let ρMSE(δ)\rho_{MSE}(\delta) be the value of ρ\rho satisfying M(ρδ)=δM^{\flat}(\rho\delta)=\delta. Then, for ρ<ρMSE\rho<\rho_{MSE} we have

and for ρ>ρMSE(δ)\rho>\rho_{MSE}(\delta), NSSE(δ,ρ)={\rm NS}^{SE}(\delta,\rho)=\infty.

The proof of this theorem follows along the same lines as the proof of Proposition 3.1 in , and therefore we skip it for the sake of brevity. The contour lines of this noise sensitivity function are displayed in Figure 2.

Similar arguments as those presented in Proposition 3.1 in combined with Proposition III.9 prove the following.

The maximum of the formal MSE, maxqFϵ,γfMSE(σ2,δ,ρ,G,τ)\max_{q\in\mathcal{F}_{\epsilon,\gamma}}{\rm fMSE}(\sigma^{2},\delta,\rho,G,\tau) is achieved by q=(1ϵ)δ0(X)+ϵδγ(X)q=(1-\epsilon)\delta_{0}(|X|)+\epsilon\delta_{\gamma}(|X|), independent of σ\sigma and τ\tau.

Again we emphasize that the maximizing or least favorable distribution is not unique. Note that the least favorable distribution provides a simple approach for designing and setting the parameters of CS systems : We design the system such that it performs well on the least favorable distribution, and it is then guaranteed that the system will perform as well (or in many cases better) on all other input distributions.

As a final remark we note that ρMSE(δ)\rho_{MSE}(\delta) equals ρSE(δ)\rho_{SE}(\delta) as proved next.

The proof is a simple comparison of the formulas. We first know that ρMSE\rho_{MSE} is derived from the following equation

On the other hand, since Ψ(m)\Psi(m) is a concave function of mm, ρSE(δ,τ)\rho_{SE}(\delta,\tau) is derived from dΨ(m)dmm=0=1\left.\frac{d\Psi(m)}{dm}\right|_{m=0}=1. This derivative is equal to

Also, ρSE(δ)=supτρSE(τ,δ)\rho_{SE}(\delta)=\sup_{\tau}\rho_{SE}(\tau,\delta). However, in order to obtain the highest ρ\rho we should minimize the above expression over τ\tau. Therefore, both ρSE(δ)\rho_{SE}(\delta) and ρMSE(δ)\rho_{MSE}(\delta) satisfy the same equations and thus are exactly equal. ∎

III-D Connection between the state evolution, CAMP, and c-LASSO

There is a strong connection between the SE framework, the CAMP algorithm, and c-LASSO. Recently, proved that SE predicts the asymptotic performance of the AMP algorithm when the measurement matrix is i.i.d. Gaussian. The result also holds for complex Gaussian matrices and complex input vectors. As in , we conjecture that the SE predictions are correct for a “large” class of random matrices. We show evidence of this claim in Section IV. Here, for the sake of completeness, we quote the result of in the complex setting. Let γ:\mathdsC2\mathdsR\gamma:\mathds{C}^{2}\rightarrow\mathds{R} be a pseudo-Lipschitz function.γ:\mathdsC2\mathdsR\gamma:\mathds{C}^{2}\rightarrow\mathds{R} is pseudo-Lipschitz if and only if ψ(x)ψ(y)L(1+x2+y2)xy2|\psi(x)-\psi(y)|\leq L(1+\|x\|_{2}+\|y\|_{2})\|x-y\|_{2}. To make the presentation clear we consider a simplified version of Definition 1 in .

A sequence of instances {so(N),A(N),w(N)}\{s_{o}(N),A(N),w(N)\}, indexed by the ambient dimension NN, is called a converging sequence if the following conditions hold:

The elements of so(N)\mathdsRNs_{o}(N)\in\mathds{R}^{N} are i.i.d. drawn from (1ρδ)δ0(so,i)+ρδG(so,i)(1-\rho\delta)\delta_{0}(|s_{o,i}|)+\rho\delta G(s_{o,i}).

The elements of w(N)\mathdsRnw(N)\in\mathds{R}^{n} (n=δNn=\delta N) are i.i.d. drawn from N(0,σw2)N(0,\sigma_{w}^{2}).

The elements of A(N)\mathdsRn×NA(N)\in\mathds{R}^{n\times N} are i.i.d. drawn from a complex Gaussian distribution.

Consider a converging sequence {so(N),A(N),w(N)}\{s_{o}(N),A(N),w(N)\}. Let xt(N)x^{t}(N) be the estimate of the CAMP algorithm at iteration tt. For any pseudo Lipschitz function γ:\mathdsC2\mathdsR\gamma:\mathds{C}^{2}\rightarrow\mathds{R} we have

almost surely, where Z1+iZ2CN(0,1)Z_{1}+iZ_{2}\sim CN(0,1) and X(1ρδ)δ0(so,i)+ρδG(so,i)X\sim(1-\rho\delta)\delta_{0}(|s_{o,i}|)+\rho\delta G(s_{o,i}) are independent complex random variables. Also, npitσ2+mt/δ{\rm npi}^{t}\triangleq\sigma^{2}+m_{t}/\delta, where mtm_{t} satisfies (7).

The proof of this theorem is similar to the proof of Theorem 1 in and hence is skipped here.

It is also straightforward to extend the result of and on the connection of message passing algorithms and LASSO to the complex setting. For a given value of τ\tau suppose that the fixed point of the state evolution is denoted by mm^{*}. Define λ(τ)\lambda(\tau) as

and \mathdsE\mathds{E} is with respect to independent random variables Z1+iZ2CN(0,1)Z_{1}+iZ_{2}\sim CN(0,1) and X(1ρδ)δ0(so,i)+ρδG(so,i)X\sim(1-\rho\delta)\delta_{0}(|s_{o,i}|)+\rho\delta G(s_{o,i}). The following theorem establishes the connection between the solution of LASSO and the state evolution equation.

Consider a converging sequence {so(N),A(N),w(N)}\{s_{o}(N),A(N),w(N)\}. Let x^λ(τ)(N)\hat{x}^{\lambda(\tau)}(N) be the solution of LASSO. Then, for any pseudo Lipschitz function γ:\mathdsC2\mathdsR\gamma:\mathds{C}^{2}\rightarrow\mathds{R} we have

almost surely, where Z1+iZ2CN(0,1)Z_{1}+iZ_{2}\sim CN(0,1) and X(1ρδ)δ0(so,i)+ρδG(so,i)X\sim(1-\rho\delta)\delta_{0}(|s_{o,i}|)+\rho\delta G(s_{o,i}) are independent complex random variables. npiσ2+m/δ{\rm npi}^{*}\triangleq\sigma^{2}+m^{*}/\delta, where mm^{*} is the fixed point of (7).

The proof of the theorem is similar to the proof of Theorem 1.4 in and hence is skipped here.

Note that according to Theorems III.15 and III.16, SE predicts the dynamic of the AMP algorithm and the solution of LASSO accurately in the asymptotic settings.

III-E Discussion

In this section we briefly discuss the convergence rate of the CAMP algorithm. In this respect our results are straightforward extension of the analysis in . But, for the sake of completeness, we mention a few highlights. Let {mt}t=1\{m_{t}\}_{t=1}^{\infty} be a sequence of MSE generated according to state evolution (7) for X(1ϵ)δ0(X)+ϵG(X)X\sim(1-\epsilon)\delta_{0}(|X|)+\epsilon G(X), τ\tau, and σ2=0\sigma^{2}=0. The following proposition provides an upper bound on mtm_{t} as a function of iteration tt.

Let {mt}t=1\{m_{t}\}_{t=1}^{\infty} be a sequence of MSEs generated according to SE. Then

Since according to Lemma V.2 Ψ(m)\Psi(m) is concave, we have Ψ(m)dΨ(m)dmm=0m\Psi(m)\leq\left.\frac{d\Psi(m)}{dm}\right|_{m=0}m. Hence at every iteration, mm is attenuated by dΨ(m)dmm=0\left.\frac{d\Psi(m)}{dm}\right|_{m=0}. After tt iterations we have mt(dΨ(m)dmm=0)tm0m^{t}\leq\left(\left.\frac{d\Psi(m)}{dm}\right|_{m=0}\right)^{t}m_{0}. ∎

According to Theorem III.17 the convergence rate of CAMP is linear (in the asymptotic setting).If the measurement matrix is not i.i.d. random CAMP does not necessarily converge at this rate. This is due to the fact that state evolution does not necessarily hold for arbitrary matrices. In fact, due to the concavity of the Ψ\Psi function, CAMP converges faster for large values of MSE mm. As mm reaches zero the convergence rate decreases towards the rate predicted by this theorem. Theorem III.17 provides an upper bound on the number of iterations the algorithm requires to reach to a certain accuracy. Figure 4 exhibits the value of dΨ(m)dmm=0\left.\frac{d\Psi(m)}{dm}\right|_{m=0} as a function of ρ\rho and δ\delta. This figure is based on the calculations we have presented in Appendix V-D. Here, τ\tau is chosen such that the CAMP algorithm achieves the same phase transition as c-BP algorithm. Note that, according to Proposition III.17, if dΨ(m)dmm=0<0.9\left.\frac{d\Psi(m)}{dm}\right|_{m=0}<0.9, then m200<7.1×1010m0m_{200}<7.1\times 10^{-10}m_{0}.

Theorem III.17 only considers the noise-free problem. But, again due to the concavity of the Ψ\Psi function, the convergence of CAMP to its fixed point is even faster for noisy measurements. To see this, note that once the measurements are noisy, the fixed point of CAMP occurs at a larger value of mm. Since Ψ\Psi is concave, the derivative at this point is lower than the derivative at zero. Hence, convergence will be faster.

III-E2 Extensions

The results presented in this paper are concerned with the two most popular problems in compressed sensing, i.e., exact recovery of sparse signals and approximate recovery of sparse signals in the presence of noise. However, our framework is far more powerful and can address other compressed sensing problems as well. For instance a similar framework has been used to address the problem of recovering approximately sparse signals in the presence of noise . For the sake of brevity we have not provided such an analysis in the current paper. However, the properties we proved in Lemmas III.2, III.7, and Proposition III.8 enable a straightforward extension of our analysis to such cases as well.

Furthermore, the framework we developed here provides a way for recovering sparse complex-valued signals when the distribution of non-zero elements is known. This area has been studied in .

IV Simulations

As explained in Section III-D, our theoretical results show that, if the elements of the matrix are i.i.d. Gaussian, then SE predicts the performance of the CAMP and c-LASSO algorithms accurately. However, in this section we will show evidence that suggests the theoretical framework is applicable to a wider class of measurement matrices. We then investigate the dependence of the empirical phase transition on the input distribution for medium problem sizes.

We investigate the effect of the measurement matrix distribution on the performance of CAMP and c-LASSO in two different cases. First, we consider the case where the measurements are noise-free. We postpone a discussion of measurement noise to Section IV-A2.

Suppose that the measurements are noise-free. Our goal is to empirically measure the phase transition curves of the c-LASSO and CAMP on the measurement matrices provided in Table I. To characterize the phase transition of an algorithm, we do the following:

We consider 33 equispaced values of δ\delta between and 11.

For each value of δ\delta, we calculate ρSE(δ)\rho_{SE}(\delta) from the theoretical framework and then consider 41 equispaced values of ρ\rho in [ρSE(δ)0.2,ρSE(δ)+0.2][\rho_{SE}(\delta)-0.2,\rho_{SE}(\delta)+0.2].

We fix N=1000N=1000, and for any value of ρ\rho and δ\delta, we calculate n=δNn=\lfloor\delta N\rfloor and k=ρδNk=\lfloor\rho\delta N\rfloor.

We draw M=20M=20 independent random matrices from one of the distributions described in Table I and for each matrix we construct a random input vector sos_{o} with one of the distributions described in Table II. We then form y=Asoy=As_{o} and recover sos_{o} from y,Ay,A by either c-BP or CAMP to obtain x^\hat{x}. The matrix distributions and coefficient distributions we consider in our simulations are specified in Tables I and II, respectively.

For each δ\delta, ρ\rho, and Monte Carlo sample jj we define a success variable Sδ,ρ,j=\mathdsI(x^x2x2<tol)S_{\delta,\rho,j}=\mathds{I}\left(\frac{\|\hat{x}-x\|_{2}}{\|x\|_{2}}<{\rm tol}\right) and we calculate the success probability p^δ,ρS=1MjSδ,ρ,j\hat{p}^{S}_{\delta,\rho}=\frac{1}{M}\sum_{j}S_{\delta,\rho,j}. This provides an empirical estimate of the probability of correct recovery. The value of tol in our case is set to 10410^{-4}.

For a fixed value of δ\delta, we fit a logistic regression function to p^S(δ,ρ)\hat{p}^{S}(\delta,\rho) to obtain pδS(ρ)p_{\delta}^{S}(\rho). Then we find the value of ρ^δ\hat{\rho}_{\delta} for which pδS(ρ)=0.5p_{\delta}^{S}(\rho)=0.5.

See for a more detailed discussion of this approach. For the c-LASSO algorithm, we are reproducing the experiments of and, therefore, we are using one-L1 algorithm . Although Figure 4 confirms that for most cases even 200200 iterations of CAMP are enough to reach convergence, since our goal is to measure the phase transition, we consider 30003000 iterations. See Section III-E1 for the discussion on the convergence rate.

Figure 5 compares the phase transition of c-LASSO and CAMP on the ensembles specified in Table I with the theoretical prediction of this paper. In this simulation the coefficient ensemble is UP (see Table II). Clearly, the empirical and theoretical phase transitions of the algorithms coincide. More importantly, we can conjecture that the choice of the measurement matrix ensemble does not affect the phase transition of these two algorithms. We will next discuss the impact of measurement matrix when there is noise on the measurements.

IV-A2 Noisy measurements

In this section we aim to show that, even in the presence of noise, the matrix ensembles defined in Table I perform similarly. Here is the setup for our experiment:

We set δ=0.25\delta=0.25, ρ=0.1\rho=0.1, and N=1000N=1000.

We choose 5050 different values of σ\sigma in the range [0.001, 0.1].

We choose n×Nn\times N measurement matrix AA from one of the ensembles specified in Table I.

We draw kk i.i.d. elements from UP ensemble for the k=ρnk=\lfloor\rho n\rfloor non-zero elements of the input sos_{o}.

We form the measurement vector y=Aso+σwy=As_{o}+\sigma w where ww is the noise vector with i.i.d. elements from CN(0,1)CN(0,1).

For CAMP, we set τ=2\tau=2. For c-LASSO, we use (9) to derive the corresponding values of λ\lambda for τ=2\tau=2 in CAMP.

We calculate the MSE x^so22/N\|\hat{x}-s_{o}\|_{2}^{2}/N for each matrix ensemble and compare the results.

Figures 6 and 7 summarize our results. The concentration of the points along the y=xy=x line indicates that the matrix ensembles, specified in Table I, perform similarly. The coincidence of the phase transition curves for different matrix ensembles is known as universality hypothesis (conjecture). In order to provide a stronger evidence, we run the above experiment with N=4000N=4000. The results of this experiment are exhibited in Figures 8 and 9. It is clear from these figures that the MSE is now more concentrated around the y=xy=x line. Additional experiments with other parameter values exhibited the same behavior. Note that as NN grows, the variance of the MSE estimate becomes smaller, and the behavior of the algorithm is closer to the average performance that is predicted by the SE equation.

IV-B Coefficient ensemble simulations

According to Proposition III.4, ρSE(δ,τ)\rho_{SE}(\delta,\tau) is independent of the distribution GG of non-zero coefficients of s0s_{0}. We test the accuracy of this result on medium problem sizes. We fix δ\delta to 0.10.1 and we calculate p^δ,ρS\hat{p}^{S}_{\delta,\rho} for 6060 equispaced values of ρ\rho between 0.10.1 and 0.50.5. For each algorithm and each value of ρ\rho we run 100100 Monte Carlo trials and calculate the success rate for the Gaussian matrix and the coefficient ensembles specified in Table II. Figure 10 summarizes our result. Simulations at other values of δ\delta result in very similar behavior. These results are consistent with Proposition III.4. The small differences between the empirical phase transitions are due to two issues that are not reflected in Proposition III.4: (i) NN is finite, while Proposition III.4 considers the asymptotic setting. (ii) The number of algorithm iterations is finite, while Proposition III.4 assumes that we run CAMP for an infinite number of iterations.

V Proofs of the main results

For a given convex function f:\mathdsCn\mathdsRf:\mathds{C}^{n}\rightarrow\mathds{R} the proximity operator at point xx is defined as

Since (10) can be decoupled into the elements of the x,yx,y, we can obtain the optimal value of yy, by optimizing over its individual components. In other words, we solve the optimization in (10) for x,y\mathdsCx,y\in\mathds{C}. In this case the optimization reduces to

Suppose that the optimal yy_{*} satisfies (yR)2+(yI)2>0(y_{*}^{R})^{2}+(y_{*}^{I})^{2}>0. Then the function (yR)2+(yI)2\sqrt{(y^{R})^{2}+(y^{I})^{2}} is differentiable and the optimal solution satisfies

Combining the two equations in V-A we obtain yRxI=xRyIy_{*}^{R}x^{I}=x^{R}y_{*}^{I}. Replacing this in (V-A) we have yR=xRτxR(xR)2+(xI)2y_{*}^{R}=x^{R}-\frac{\tau|x^{R}|}{\sqrt{(x^{R})^{2}+(x^{I})^{2}}} and yI=xIτxI(xR)2+(xI)2y_{*}^{I}=x^{I}-\frac{\tau|x^{I}|}{\sqrt{(x^{R})^{2}+(x^{I})^{2}}}. It is clear that if (xR)2+(xI)2<τ\sqrt{(x^{R})^{2}+(x^{I})^{2}}<\tau, then the signs of yRy_{*}^{R} and xRx^{R} will be opposite, which is in contradiction with (V-A). Therefore, if (xR)2+(xI)2<τ\sqrt{(x^{R})^{2}+(x^{I})^{2}}<\tau, both yRy_{*}^{R} and yIy_{*}^{I} are zero. It is straightforward to check that (0,0)(0,0) satisfies the subgradient optimality condition. ∎

V-B Proof of Proposition II.1

denote the real and imaginary parts of the complex soft thresholding function. Define

We also use the first-order expansion of the soft thresholding function to obtain

V-C Proof of Lemma III.2

Let μ\mu and θ\theta denote the amplitude and phase of the random variable XX. Define νnpi=σ2+mδ\nu\triangleq\sqrt{\rm npi}=\sqrt{\sigma^{2}+\frac{m}{\delta}} and ζμν\zeta\triangleq\frac{\mu}{{\nu}}. Then

where \mathdsEζ,θ\mathds{E}_{\zeta,\theta} denotes the conditional expectation given the variables ζ,θ\zeta,\theta. Note that the marginal distribution of ζ\zeta depends only on the marginal distribution of μ\mu. The first term in (18) is independent of the phase θ\theta, and therefore we should prove that the second term is also independent of θ\theta. Define

We prove that Φ\Phi is independent of θ\theta. For two real-valued variables zrz_{r} and zcz_{c}, define z(zr,zc)\mathbf{z}\triangleq(z_{r},z_{c}), dzdzrdzcd\mathbf{z}\triangleq dz_{r}dz_{c}, and

Define the two sets Sτ{(zr,zc)  αz<τ}S_{\tau}\triangleq\{(z_{r},z_{c})\ |\ \alpha_{z}<\tau\} and Sτc\mathdsR2\SτS_{\tau}^{c}\triangleq\mathds{R}^{2}\backslash S_{\tau}, where “\\backslash” is the set subtraction operator. We have

The first integral in (20) corresponds to the case ζeiθ+zr+izc<τ|\zeta{\rm e}^{i\theta}+z_{r}+iz_{c}|<\tau. The second integral is over the values of zrz_{r} and zcz_{c} for which ζeiθ+zr+izcτ|\zeta{\rm e}^{i\theta}+z_{r}+iz_{c}|\geq\tau. Define βζcosθ+zr\beta\triangleq\zeta\cos\theta+z_{r} and γζsinθ+zc\gamma\triangleq\zeta\sin\theta+z_{c}. We then obtain

Equality (a) is the result of the change of integration variables from γ\gamma and β\beta to rβ2+γ2r\triangleq\sqrt{\beta^{2}+\gamma^{2}} and \phi\triangleq\arctan\Big{(}\frac{\gamma}{\beta}\Big{)}. The periodicity of the cosine function proves that the last integration is independent of the phase θ\theta. We can similarly prove that zSτζ21πezr2+zc2dz\int_{\mathbf{z}\in S_{\tau}}\zeta^{2}\frac{1}{\pi}{\rm e}^{-z_{r}^{2}+z_{c}^{2}}d\mathbf{z} is independent of θ\theta. This completes the proof. \hfill\hfill\Box

V-D Proof of Theorem III.5

We first prove the following lemma that simplifies the proof of Theorem III.5.

The function Ψ(m)\Psi(m) is concave with respect to mm.

For the notational simplicity define νσ2+mδ\nu\triangleq\sqrt{\sigma^{2}+\frac{m}{\delta}}, XνXνX_{\nu}\triangleq\frac{X}{\nu}, and AνXνZ1+iZ2A_{\nu}\triangleq|X_{\nu}-Z_{1}+iZ_{2}|. We note that

Therefore, Ψ\Psi is concave with respect to mm if and only if it is concave with respect to ν2\nu^{2}. According to Lemma III.2 the phase distribution of XX does not affect the Ψ\Psi function. Therefore, we set the phase of XX to zero and assume that it is a positive-valued random variable (representing the amplitude). This assumption substantially simplifies the calculations. We have

where \mathdsEX\mathds{E}_{X} denotes the expected value conditioned on the random variable XX. We first prove that ΨX(ν2)ν2\mathdsEX(η(Xν+Z1+iZ2;τ)Xν2)\Psi_{X}(\nu^{2})\triangleq\nu^{2}\mathds{E}_{X}\left(\left|\eta(X_{\nu}+Z_{1}+iZ_{2};\tau)-X_{\nu}\right|^{2}\right) is concave with respect to ν2\nu^{2} by proving dΨXd(ν2)20\frac{d\Psi_{X}}{d(\nu^{2})^{2}}\leq 0. Then, since Ψ(ν2)\Psi(\nu^{2}) is a convex combination of ΨX(ν2)\Psi_{X}(\nu^{2}), we conclude that Ψ(ν2)\Psi(\nu^{2}) is a concave function of ν2\nu^{2} as well. The rest of the proof details the algebra required for calculating and simplifying d2ΨX(ν2)d2ν2\frac{d^{2}\Psi_{X}(\nu^{2})}{d^{2}\nu^{2}}.

Using the real and imaginary parts of the soft thresholding function and its partial derivatives introduced in (V-B) and (V-B) we have

where 1R\partial_{1}^{R}, 2R\partial_{2}^{R}, 1I\partial_{1}^{I}, and 2I\partial_{2}^{I} are defined in (V-B). Note that in the above calculations, 2ηR\partial_{2}\eta^{R} and 2ηI\partial_{2}\eta^{I} did not appear, since we assumed that XX is a real-valued random variable. Define Aν(Xν+Z1)2+Z22A_{\nu}\triangleq\sqrt{(X_{\nu}+Z_{1})^{2}+Z_{2}^{2}}. It is straightforward to show that

For f:\mathdsC\mathdsRf:\mathds{C}\rightarrow\mathds{R} we define 12f(x+iy)2f(x+iy)x2\partial^{2}_{1}f(x+iy)\triangleq\frac{\partial^{2}f(x+iy)}{\partial x^{2}}. It is straightforward to show that

Our next objective is to simplify the terms in (22). We start with

Note that in the above expression we have replaced (1τZ22Aν3)δ(Aντ)\left(1-\frac{\tau Z_{2}^{2}}{A_{\nu}^{3}}\right)\delta(A_{\nu}-\tau) with (1Z22Aν2)δ(Aντ)\left(1-\frac{Z_{2}^{2}}{A_{\nu}^{2}}\right)\delta(A_{\nu}-\tau) for an obvious reason. It is straightforward to simplify this expression to obtain

By plugging (V-D), (24), and (25) into (22), we obtain

We claim that both terms on the right hand side of (V-D) are negative. To prove this claim, we first focus on the first term:

Define Sτ{(Z1,Z2)  Aντ}S_{\tau}\triangleq\{(Z_{1},Z_{2})\ |\ A_{\nu}\geq\tau\}. We have

Equality (a) is the result of the change of integration variables from z1z_{1}, z2z_{2} to rAνr\triangleq A_{\nu} and ϕarctan(z2z1+Xν)\phi\triangleq\arctan\left(\frac{z_{2}}{z_{1}+X_{\nu}}\right). With exactly similar approach we can prove that the second term of (V-D) is also negative.

So far we have proved that ΨX(m)\Psi_{X}(m) is concave with respect to mm. But this implies that Ψ(m)\Psi(m) is also concave, since it is a convex combination of concave functions. ∎

As proved in Lemma V.2, Ψ(m)\Psi(m) is a concave function. Furthermore Ψ(0)=0\Psi(0)=0. Therefore a given value of ρ\rho is below the phase transition, i.e., ρ<ρSE(δ)\rho<\rho_{SE}(\delta) if and only if dΨdmm<1\left.\frac{d\Psi}{dm}\right|_{m}<1. It is straightforward to calculate the derivative at zero and confirm that

Since Z1,Z2N(0,1/2)Z_{1},Z_{2}\sim N(0,1/2) and are independent, the phase of Z1+iZ2Z_{1}+iZ_{2} has a uniform distribution, while its amplitude has Rayleigh distribution. Therefore, we have

We plug (29) into (28) and set the derivative dΨdmm=1\left.\frac{d\Psi}{dm}\right|_{m}=1 to obtain the value of ρ\rho at which the phase transition occurs. This value is given by

Clearly the phase transition depends on τ\tau. Hence according to the framework we introduced in Section I-D1, we search for the value of τ\tau that maximizes the phase transition ρ\rho. Define χ1(τ)τω(τω)eω2dω\chi_{1}(\tau)\triangleq\int_{\tau_{*}}^{\infty}\omega(\tau_{*}-\omega){\rm e}^{-\omega^{2}}d\omega and χ2(τ)τω(ωτ)2eω2dω\chi_{2}(\tau)\triangleq\int_{\tau_{*}}^{\infty}\omega(\omega-\tau_{*})^{2}{\rm e}^{-\omega^{2}}d\omega. This optimal τ\tau satisfies

which in turn results in δ=4(1+τ2)χ1(τ)4τχ2(τ)2τ+4χ1(τ)\delta=\frac{4(1+\tau_{*}^{2})\chi_{1}(\tau^{*})-4\tau_{*}\chi_{2}(\tau^{*})}{-2\tau_{*}+4\chi_{1}(\tau^{*})}. Plugging δ\delta into the formula for ρ\rho, we obtain the formula in Theorem III.5. ∎

V-E Proof of Theorem III.6

We first show that the value of δ\delta in Theorem III.5 goes to zero as τ\tau\rightarrow\infty. By changing the variable of integration from ω\omega to γ=ωτ\gamma=\omega-\tau, we obtain

Again by changing integration variables we have

Using (30) and (31) in the formula for δ\delta in Theorem III.5 establishes that δ0\delta\rightarrow 0 as τ\tau\rightarrow\infty. Therefore, in order to find the asymptotic behavior of the phase transition as δ0\delta\rightarrow 0, we can calculate the asymptotic behavior of δ\delta and ρ\rho as τ\tau\rightarrow\infty. This is a standard application of Laplace’s method. Using this method we calculate the leading terms of ρ\rho and δ\delta:

Plugging (32) and (33) into the formula we have for ρ\rho and δ\delta in Theorem III.5, we obtain

V-F Proof of Lemma III.7

According to Lemma III.2, the phase θ\theta does not affect the risk function, and therefore we set it to zero. We have

where ηR(μ+Z1+iZ2;τ)=(μ+Z1τ(μ+Z1)A)\mathdsI(Aτ)\eta^{R}(\mu+Z_{1}+iZ_{2};\tau)=\left(\mu+Z_{1}-\frac{\tau(\mu+Z_{1})}{A}\right)\mathds{I}(A\geq\tau), ηI(μ+Z1+iZ2;τ)=(z2τZ2A)\mathdsI(Aτ)\eta^{I}(\mu+Z_{1}+iZ_{2};\tau)=\left(z_{2}-\frac{\tau Z_{2}}{A}\right)\mathds{I}(A\geq\tau) and A(μ+Z1)2+Z22A\triangleq\sqrt{(\mu+Z_{1})^{2}+Z_{2}^{2}}. If we calculate the derivative of the risk function with respect to μ\mu, then we have

Therefore, the risk of the complex soft thresholding is an increasing function of μ\mu. Furthermore,

It is clear that the next derivative with respect to μ2\mu^{2} is negative, and therefore the function is concave. \hfill\hfill\Box

V-G Proof of Proposition III.8

As is clear from the statement of the theorem, the main challenge here is to characterize

Let X=μeiθX=\mu{\rm e}^{i\theta}, where μ\mu and θ\theta are the phase and amplitude of XX respectively. According to Lemma III.2, the risk function is independent of θ\theta. Furthermore, since qFϵq\in F_{\epsilon}, we can write it as q(μ)=(1ϵ)δ0(μ)+(1ϵ)G(μ)q(\mu)=(1-\epsilon)\delta_{0}(\mu)+(1-\epsilon)G(\mu), where GG is absolutely continuous with respect to Lebesgue measure. We then have

Furthermore, from the monotonicity of the risk function proved in Lemma III.7, we have

Again we can use the monotonicity of the risk function to prove that

The last equality is the result of the monotone convergence theorem. Combining (34) and (35) completes the proof. \hfill\hfill\Box

VI Conclusions

We have considered the problem of recovering a complex-valued sparse signal from an undersampled set of complex-valued measurements. We have accurately analyzed the asymptotic performance of c-LASSO and CAMP algorithms. Using the state evolution framework, we have derived simple expressions for the noise sensitivity and phase transition of these two algorithms. The results presented here show that substantial improvements can be achieved when the real and imaginary parts are considered jointly by the recovery algorithm. For instance, Theorem III.6 shows that in the high undersampling regime the phase transition of CAMP and c-BP is two times higher than the phase transition of r-LASSO.

Acknowledgements

Thanks to David Donoho and Andrea Montanari for their encouragement and their valuable suggestions on an early draft of this paper. We would also like to thank the reviewers and the associate editor for the thoughtful comments that helped us improve the quality of the manuscript, and to thank Ali Mousavi for careful reading of our paper and suggesting improvements. This work was supported by the grants NSF CCF-0431150, CCF-0926127, and CCF-1117939; DARPA/ONR N66001-11-C-4092 and N66001-11-1-4090; ONR N00014-08-1-1112, N00014-10-1-0989, and N00014-11-1-0714; AFOSR FA9550-09-1-0432; ARO MURI W911NF-07-1-0185 and W911NF-09-1-0383; and the TI Leadership University Program.

References