Asymptotic Analysis of Complex LASSO via Complex Approximate Message Passing (CAMP)

Arian Maleki, Laura Anitori, Zai Yang, Richard Baraniuk

I Introduction

Recovering a sparse signal from an undersampled set of random linear measurements is the main problem of interest in compressed sensing (CS). In the past few years many algorithms have been proposed for signal recovery, and their performance has been analyzed both analytically and empirically . However, whereas most of the theoretical work has focussed on the case of real-valued signals and measurements, in many applications, such as magnetic resonance imaging and radar, the signals are more easily representable in the complex domain . In such applications, the real and imaginary components of a complex signal are often either zero or non-zero simultaneously. Therefore, recovery algorithms may benefit from this prior knowledge. Indeed the results presented in this paper confirm this intuition.

Motivated by this observation, we investigate the performance of the complex-valued LASSO in the case of noise-free and noisy measurements. The derivations are based on the state evolution (SE) framework, presented previously in . Also a new algorithm, complex approximate message passing (CAMP), is presented to solve the complex LASSO problem. This algorithm is an extension of the AMP algorithm . However, the extension of AMP and its analysis from the real to the complex setting is not trivial; although CAMP shares some interesting features with AMP, it is substantially more challenging to establish the characteristics of CAMP. Furthermore, some important features of CAMP are specific to complex-valued signals and the relevant optimization problem. Note that the extension of the Bayesian-AMP algorithm to complex-valued signals has been considered elsewhere and is not the main focus of this work.

In the next section, we briefly review some of the existing algorithms for sparse signal recovery in the real-valued setting and then focus on recovery algorithms for the complex case, with particular attention to the AMP and CAMP algorithms. We then introduce two criteria which we use as measures of performance for various algorithms in noiseless and noisy settings. Based on these criteria, we establish the novelty of our results compared to the existing work. An overview of the organization of the rest of the paper is provided in Section I-G.

Consider the problem of recovering a sparse vector $s_{o}\in\mathds{R}^{N}$ from a noisy undersampled set of linear measurements $y\in\mathds{R}^{n}$ , where $y=As_{o}+w$ and $w$ is the noise. Let $k$ denote the number of nonzero elements of $s_{o}$ . The measurement matrix $A$ has i.i.d. elements from a given distribution on $\mathds{R}$ . Given $y$ and $A$ , we seek an approximation to $s_{o}$ .

In this paper, we are particularly interested in AMP . Starting from $x^{0}=0$ and $z^{0}=y$ , AMP uses the following iterations:

where $\eta_{\circ}(x;\tau)=(|x|-\tau)_{+}\text{sign}(x)$ is the soft thresholding function, $\tau_{t}$ is the threshold parameter, and $I^{t}$ is the active set of $x^{t}$ , i.e., $I^{t}=\{i\ |\ x^{t}_{i}\neq 0\}$ . The notation $|I^{t}|$ denotes the cardinality of $I^{t}$ . As we will describe later, the strong connection between AMP and LASSO and the ease of predicting the performance of AMP has led to an accurate performance analysis of LASSO , .

I-B Complex-valued sparse recovery algorithms

Consider the complex setting, where the signal $s_{o}$ , the measurements $y$ , and the matrix $A$ are complex-valued. The success of LASSO has motivated researchers to use similar techniques in this setting as well. We consider the following two schemes that have been used in the signal processing literature:

which is called the basis pursuit problem, or r-BP in this paper. It is straightforward to extend the analyses of LASSO and BP for the real-valued signals to r-LASSO and r-BP.The asymptotic theoretical results on LASSO and BP consider i.i.d. Gaussian measurement matrices . However, it has been conjectured that the results are universal and hold for a “larger” class of random matrices .

r-LASSO ignores the information about any potential grouping of the real and imaginary parts. But, in many applications the real and imaginary components tend to be either zero or non-zero simultaneously. Considering this extra information in the recovery stage may improve the overall performance of a CS system.

c-LASSO: Another natural extension of the LASSO to the complex setting is the following optimization problem that we term c-LASSO

An important question we address in this paper is: can we measure how much the grouping of the real and the imaginary parts improves the performance of c-LASSO compared to r-LASSO? Several papers have considered similar problems and have provided guarantees on the performance of c-LASSO. However, the results are usually inconclusive because of the loose constants involved in the analyses. This paper addresses the above questions with an analysis that does not involve any loose constants and therefore provides accurate comparisons.

Motivated by the recent results in the asymptotic analysis of the LASSO , , we first derive the complex approximate message passing algorithm (CAMP) as a fast and efficient algorithm for solving the c-LASSO problem. We then extend the state evolution (SE) framework introduced in to predict the performance of the CAMP algorithm in the asymptotic setting. Since the CAMP algorithm solves c-LASSO, such predictions are accurate for c-LASSO as well for $N\rightarrow\infty$ . The analysis carried out in this paper provides new information and insight on the performance of the c-LASSO that was not known before such as the least favorable distribution and the noise sensitivity of c-LASSO and CAMP. A more detailed description of the contributions of this paper is summarized in Section I-E.

I-C Notation

We are interested in the asymptotic setting where $\delta=n/N$ and $\rho=k/n$ are fixed, while $N\rightarrow\infty$ . We further assume that the elements of $s_{o}$ are i.i.d. $s_{o,i}\sim(1-\rho\delta)\delta_{0}(|s_{o,i}|)+\rho\delta G(s_{o,i})$ , where $G$ is an unknown probability distribution with no point mass at , and $\delta_{0}$ is a Dirac delta function.This assumption is not necessary and as long as the marginal distribution of $s_{o}$ converges to a given distribution the statements of this paper hold. For further information on this, see and . Clearly, the expected number of non-zero elements in the vector $s_{o}$ is $\rho\delta N$ . We call this value the sparsity level of the signal. In this model, we are assuming that all the non-zero real and imaginary coefficients are paired. This quantifies the maximum amount of improvement the c-LASSO gains by grouping the real and imaginary parts.

We use the notations $\mathds{E}$ , $\mathds{E}_{X}$ , and $\mathds{E}_{X\sim F}$ for expected value, conditional expected value given the random variable $X$ , and expected value with respect to a random variable $X$ drawn from the distribution $F$ , respectively. Define $\mathcal{F}_{\epsilon,\gamma}$ as the family of distributions $F$ with $\mathds{E}_{X\sim F}(\mathds{I}(|X|=0))\geq 1-\epsilon$ and $\mathds{E}_{X\sim F}(|X|^{2})\leq\epsilon\gamma^{2}$ , where $\mathds{I}$ denotes the indicator function. An important distribution in this class is $q_{o}(X)\triangleq q(|X|)\triangleq(1-\epsilon)\delta_{0}(|X|)+\epsilon\delta_{\gamma}(|X|)$ , where $\delta_{\gamma}(|X|)\triangleq\delta_{0}(|X|-\gamma)$ . Note that this distribution is independent of the phase and in addition to a point mass at zero has another point mass at $\gamma$ . Finally, define $\mathcal{F}_{\epsilon}\triangleq\{\ F\ |\ \mathds{E}_{X\sim F}(\mathds{I}(|X|\neq 0))\leq\epsilon\}$ .

I-D Performance criteria

We compare c-LASSO with r-LASSO in both the noise-free and noisy measurements cases. For each scenario, we define a specific measure to compare the performance of the two algorithms.

Consider the problem of recovering $s_{o}$ drawn from $s_{o,i}\overset{\rm i.i.d.}{\sim}(1-\rho\delta)\delta_{0}(|s_{o,i}|)+\rho\delta G(s_{o,i})$ , from a set of noise free measurements $y=As_{o}$ . Let $\mathcal{A}_{\alpha}$ be a sparse recovery algorithm with free parameter $\alpha$ . For instance $\mathcal{A}$ may be the c-LASSO algorithm and the free parameter of the algorithm is the regularization argument $\lambda$ . Given $(y,A)$ , $\mathcal{A}_{\alpha}$ returns an estimate $\hat{x}^{\mathcal{A}_{\alpha}}$ of $s_{o}$ . Suppose that in the noise free case, as $N\rightarrow\infty$ , the performance of $\mathcal{A}_{\alpha}$ exhibits a sharp phase transition, i.e., for every value of $\delta$ , there exists ${\rho}^{\mathcal{A}_{\alpha}}(\delta)$ , below which $\lim_{N\rightarrow\infty}\|\hat{x}^{\mathcal{A}_{\alpha}}-s_{o}\|^{2}/N\rightarrow 0$ almost surely, while for $\rho>\rho^{\mathcal{A}_{\alpha}}(\delta)$ , $\mathcal{A}_{\alpha}$ fails and $\lim_{N\rightarrow\infty}\|\hat{x}^{\mathcal{A}_{\alpha}}-s_{o}\|^{2}/N\nrightarrow 0$ . The phase transition has been studied both empirically and theoretically for many sparse recovery algorithms . The phase transition curve $\rho^{\mathcal{A}_{\alpha}}(\delta)$ specifies the fundamental exact recovery limit of algorithm $\mathcal{A}_{\alpha}$ . The free parameter $\alpha$ can strongly affect the performance of the sparse recovery algorithm . Therefore, optimal tuning of this parameter is essential in practical applications. One approach is to tune the parameter for the highest phase transition ,In this paper, we consider algorithms whose phase transitions do not depend on the distribution $G$ of non-zero coefficients. Otherwise, one could use the maximin framework introduced in . i.e.,

In other words, $\rho^{\mathcal{A}}$ is the best performance $\mathcal{A}_{\alpha}$ provides in the exact sparse signal recovery problem, if we know how to tune the algorithm properly. Based on this framework, we say algorithm $\mathcal{A}$ outperforms $\mathcal{B}$ at a given $\delta$ , if and only if $\rho^{\mathcal{A}}(\delta)>\rho^{\mathcal{B}}(\delta)$ .

I-D2 Noisy measurements

where $\alpha$ denotes the tuning parameter of the algorithm $\mathcal{A}_{\alpha}$ . If the noise sensitivity is large, then the measurement noise may severely degrade the final reconstruction. In (1) we search for the distribution that induces the maximum reconstruction error to the algorithm. This ensures that for other signal distributions the reconstruction error is smaller. By tuning $\alpha$ , we may obtain better estimate of $s_{o}$ . Therefore, we tune the parameter $\alpha$ to obtain the lowest noise sensitivity, i.e.,

Based on this framework, we say that algorithm $\mathcal{A}$ outperforms $\mathcal{B}$ at a given $\delta$ and $\rho$ if and only if ${\rm NS}^{\mathcal{A}}(\delta,\rho)<{\rm NS}^{\mathcal{B}}(\delta,\rho)$ .

I-E Contributions

In this paper, we first develop the complex approximate message passing (CAMP) algorithm that is a simple and fast converging iterative method for solving c-LASSO. We extend the state evolution (SE), introduced recently as a framework for accurate asymptotic predictions of the AMP performance, to CAMP.Note that SE has been proved to be accurate only for the case of Gaussian measurement matrices . But, extensive simulations have confirmed its accuracy for a large class of random measurement matrices . The results of our paper are also provably correct for complex Gaussian measurement matrices. But, our simulations confirm that they hold for broader set of matrices. We will then use the connection between CAMP and c-LASSO to provide an accurate asymptotic analysis of the c-LASSO problem. We aim to characterize the phase transition curve (noise-free measurements) and noise sensitivity (noisy measurements) of c-LASSO and CAMP when the real and imaginary parts are paired, i.e., they are both zero or non-zero simultaneously. Both criteria have been extensively studied for the real signals (and hence for the r-LASSO) . The results of our predictions are summarized in Figures 1, 2, and 3. Figure 1 compares the phase transition curve of c-BP and CAMP with the phase transition curve of r-BP. As we expected c-BP outperforms r-BP since it exploits the connection between the real and imaginary parts. If $\rho_{SE}(\delta)$ denotes the phase transition curve, then we also prove that $\rho_{SE}(\delta)\sim\frac{1}{\log(1/2\delta)}$ as $\delta\rightarrow 0$ . Comparing this with $\rho^{R}_{SE}(\delta)\sim\frac{1}{2\log(1/\delta)}$ for the r-LASSO , we conclude that

This means that, in the very high undersampling regime the c-LASSO can recover signals that are two times more dense than the signals that are recovered by r-LASSO. Figure 2 exhibits the noise sensitivity of c-LASSO and CAMP. We prove in Section III-C that, as the sparsity approaches the phase transition curve, the noise sensitivity grows up to infinity. Finally, Figure 3 compares the contour plots of the noise sensitivity of c-LASSO with those of the r-LASSO. For the fixed value noise sensitivity, the level set of the c-LASSO is higher than that of r-LASSO. It is worth noting that the same comparisons hold between CAMP and AMP, as we will clarify in Section III-D.

I-F Related work

The state evolution framework used in this paper was first introduced in . Deriving the phase transition and noise sensitivity of the LASSO for real-valued signals and real-valued measurements from SE is due to ; see for more comprehensive discussion. Finally, the derivation of AMP from the full sum-product message passing is due to . Our main contribution in this paper is to extend these results to the complex setting. Not only is the analysis of the state evolution more challenging in this setting, but it also provides new insights on the performance of c-LASSO that have not been available. For instance, the noise sensitivity of c-LASSO has not previously been determined.

The recovery of sparse complex signals is a special case of group-sparsity or block-sparsity, where all the groups are non-overlapping and have size $2$ . According to the group sparsity assumption, the non-zero elements of the signal tend to occur in groups or clusters. One of the algorithms used in this context is the group-LASSO . Consider a signal $s_{o}\in\mathds{R}^{N}$ . Partition the indices of $s_{o}$ into $m$ groups $g_{1},\ldots,g_{m}$ . The group-LASSO algorithm minimizes the following cost function:

where the $\lambda_{i}$ ’s are regularization parameters.

The group-Lasso algorithm has been extensively studied in the literature . We briefly review several papers and emphasize the differences from our work. analyzes the consistency of the group LASSO estimator in the presence of noise. Fixing the signal $s_{o}$ , it provides conditions under which the group LASSO is consistent as $n\rightarrow\infty$ . consider a weak notion of consistency, i.e., exact support recovery. However, proves that in the setting we are interested in, i.e., $k/n=\rho$ and $n/N=\delta$ , even exact support recovery is not possible. When noise is present, our goal is neither exact recovery nor exact support recovery. Instead, we characterize the mean square error (MSE) of the reconstruction. This criterion has been considered in . Although the results of show qualitatively the benefit of group sparsity, they do not characterize the difference quantitatively. In fact, loose constants in both the error bound and the number of samples do not permit accurate performance comparison. In our analysis, no loose constant is involved, and we provide very accurate characterization of the mean square error.

Group-sparsity and group-LASSO are also of interest in the sparse recovery community. For example, the analysis carried out in are based on “coherence”. These results provide sufficient conditions with again loose constants as discussed above. The work of addresses this issue by an accurate analysis of the algorithm in the noiseless setting $\sigma=0$ . They provide a very accurate estimate of the phase transition curve for the group-LASSO. However, SE provides a more flexible framework to analyze c-LASSO than the analysis of , and it provides more information than just the phase transition curve. For instance, it points to the least favorable distribution of the input and noise sensitivity of c-LASSO.

The Bayesian approach that assumes a hidden Markov model for the signal has been also explored for the recovery of group sparse signals . It has been shown that AMP combined with an expectation maximization algorithm (for estimating the parameters of the distribution) leads to promising results in practice . Kamilov et al. have taken the first step towards a theoretical understanding of such algorithms. However, the complete understanding of the expectation maximization employed in such methods is not available yet. Furthermore, the success of such algorithms seem to be dependent on the match between the assumed and actual prior distribution. Such dependencies have not been theoretically analyzed yet. In this paper we assume that the distribution of non-zero coefficients is not known beforehand and characterize the performance of c-LASSO for the least favorable distribution.

While writing this paper we were made aware that in an independent work Donoho, Johnstone, and Montanari are extending the SE framework to the general setting of group sparsity . Their work considers the state evolution framework for the group-LASSO problem and will include the generalization of the analysis provided in this paper to the case where the variables tend to cluster in groups of size $B$ .

Both complex signals and group-sparse signals are special cases of model-based CS . By introducing more structured models for the signal, proves that the number of measurements needed are proportional to the “complexity” of the model rather than the sparsity level . The results in model-based CS also suffer from loose constants in both the number of measurements and the mean square error bounds.

Finally, from an algorithmic point of view, several papers have considered solving the c-LASSO problem using first-order algorithms .First-order methods are iterative algorithms that use either the gradient or the subgradient of the function at the previous iterations to update their estimates. The deterministic framework that measures the convergence of an algorithm on the problem instance that yields the slowest convergence rate, is not an appropriate measure of the convergence rate for the compressed sensing problems . Therefore, considers the average convergence rate for iterative algorithms. In that setting, AMP is the only first order algorithm that provably achieves linear convergence to date. Similarly, the CAMP algorithm, introduced in this paper, provides the first, first-order c-LASSO solver that provides a linear average convergence rate.

I-G Organization of the paper

We introduce the CAMP algorithm in Section II. We then explain the state evolution equations that characterizes the evolution of the mean square error through the iterations of the CAMP algorithm in Section III, and we analyze the important properties of the SE equations. We then discuss the connection between our calculations and the solution of LASSO in Section III-D. We confirm our results via Monte Carlo simulations in Section IV.

II Complex Approximate Message Passing

The high computational complexity of interior point methods for solving large scale convex optimization problems has spurred the development of first-order methods for solving the LASSO problem. See and the references therein for a description of some of these algorithms. One of the most successful algorithms for CS problems is the AMP algorithm introduced in . In this section, we use the approach introduced in to derive the approximate message passing algorithm for the c-LASSO problem that we term Complex Approximate Message Passing (CAMP).

Let $s_{1},s_{2},\ldots,s_{N}$ be $N$ random variables with the following distribution:

where $\beta$ is a constant and $Z(\beta)\triangleq\int_{s}{\rm e}^{-\beta\lambda\|s\|_{1}-\frac{\beta}{2}\|y-As\|_{2}^{2}}ds$ . As $\beta\rightarrow\infty$ , the mass of this distribution concentrates around the solution of the LASSO. Therefore, one way to find the solution of LASSO is to marginalize this distribution. However, calculating the marginal distribution is an NP-complete problem. The sum-product message passing algorithm provides a successful heuristic for approximating the marginal distribution. As $N\rightarrow\infty$ and $\beta\rightarrow\infty$ the iterations of the sum-product message passing algorithm are simplified to ( or Chapter 5 of )

III Formal analysis of CAMP and c-LASSO

In this section, we explain the state evolution (SE) framework that predicts the performance of the CAMP and c-LASSO in the asymptotic settings. We then use this framework to analyze the phase transition and noise sensitivity of the CAMP and c-LASSO. The formal connection between state evolution and CAMP/c-LASSO is discussed in Section III-D.

We now conduct an asymptotic analysis of the CAMP algorithm. As we confirm in Section III-D, the asymptotic performance of the algorithm is tracked through a few variables, called the state variables. The state of the algorithm is the 5-tuple $\mathbf{s}=(m;\delta,\rho,\sigma,G)$ , where $G$ corresponds to the distribution of the non-zero elements of the sparse vector $s_{o}$ , $\sigma$ is the standard deviation of the measurement noise, and $m$ is the asymptotic normalized mean square error. The threshold parameter (threshold policy) of CAMP in its most general form could be a function of the state of the algorithm $\tau(\mathbf{s})$ . Define ${\rm npi}(m;\sigma,\delta)\triangleq\sigma^{2}+\frac{m}{\delta}$ . The mean square error (MSE) map is defined as

where $Z_{1},Z_{2}\sim N(0,1/2)$ and $X\sim(1-\rho\delta)\delta_{0}(|x|)+\rho\delta G(x)$ are independent random variables. Note that $G$ is a probability distribution on $\mathds{C}$ . In the rest of this paper, we consider the thresholding policy $\tau(\mathbf{s})=\tau\sqrt{{\rm npi}(m,\sigma,\delta)}$ , where the constant $\tau$ is yet to be tuned according to the schemes introduced in Sections I-D1 and I-D2. When we use this thresholding policy we may equivalently write $\Psi(\mathbf{s},\tau(\mathbf{s}))$ as $\Psi(\mathbf{s},\tau)$ . This thresholding policy is the same as the thresholding policy introduced in . When the parameters $\delta,\rho,\sigma,\tau$ and $G$ are clear from the context, we denote the MSE map by $\Psi(m)$ . SE is the evolution of $m$ (starting from $t=0$ and $m_{0}=E(|X|^{2})$ ) by the rule

where ${\rm npi}^{t}\triangleq{\rm npi}(m_{t},\sigma,\delta)$ . As will be described in Section III-D, this equation tracks the normalized MSE of the CAMP algorithm in the asymptotic setting $n,N\rightarrow\infty$ and $n/N\rightarrow\delta$ . In other words, if $m_{t}$ is the MSE of the CAMP algorithm at iteration $t$ , the $m_{t+1}$ , calculated by (7), is the MSE of CAMP at iteration $t+1$ .

Let $\Psi$ be almost everywhere differentiable. $m^{*}$ is called a fixed point of $\Psi$ if and only if $\Psi(m^{*})=m^{*}$ . Furthermore, a fixed point is called stable if $\left.\frac{d\Psi(m)}{dm}\right|_{m=m^{*}}<1$ , and unstable if $\left.\frac{d\Psi(m)}{dm}\right|_{m=m^{*}}>1$ .

It is clear that if $m^{*}$ is the unique stable fixed point of the $\Psi$ function, then $m_{t}\rightarrow m^{*}$ as $t\rightarrow\infty$ . Also, if all the fixed points of $\Psi$ are unstable, then $m_{t}\rightarrow\infty$ as $t\rightarrow\infty$ . Define $\mu\triangleq|X|$ and $\theta\triangleq\measuredangle X$ . Let $G(\mu,\theta)$ denote the probability density function of $X$ and define $G(\mu)\triangleq\int G(\mu,\theta)d\theta$ as the marginal distribution of $\mu$ . The next lemma shows that in order to analyze the state evolution function we only need to consider the amplitude distribution. This substantially simplifies our analysis of SE in the next sections.

The MSE map does not depend on the phase distribution of the input signal, i.e.,

III-B Noise-free signal recovery

Consider the noise free setting with $\sigma=0$ . Suppose that SE predicts the MSE of CAMP in the asymptotic setting (we will make this rigorous in Section III-D). As mentioned in Section I-D1, in order to characterize the performance of CAMP in the noiseless setting, we first derive its phase transition curve and then optimize over $\tau$ to obtain the highest phase transition CAMP can achieve. Fix all the state variables except for $m$ , and $\rho$ . The evolution of $m$ , discriminates the following two regions for $\rho$ :

Region I: The values of $\rho$ for which $\Psi(m)<m$ for every $m>0$ ;

Since is necessarily a fixed point of the $\Psi$ function, in Region I $m_{t}\rightarrow 0$ as $t\rightarrow\infty$ . The following lemma shows that in Region II $m=0$ is an unstable fixed point and therefore starting from $m_{0}\neq 0$ , $m_{t}\nrightarrow 0$ .

Let $\sigma=0$ . If $\rho$ is in Region II, then $\Psi$ has an unstable fixed point at zero.

We prove in Lemma V.2 that $\Psi(m)$ is a concave function of $m$ . Therefore, $\rho$ is in Region II if and only if $\left.\frac{d\Psi(m)}{dm}\right|_{m=0}>1$ . This in turn indicates that is an unstable fixed point. ∎

It is also easy to confirm that Region I is of the form $[0,\rho_{SE}(\delta,G,\tau))$ . As we will see in Section III-D, $\rho_{SE}(\delta,G,\tau)$ determines the phase transition curve of the CAMP algorithm. According to Lemma III.2, the MSE map does not depend on the phase distribution of the non-zero elements. The following proposition shows that in fact $\rho_{SE}$ is independent of $G$ even though the $\Psi$ function depends on $G(\mu)$ .

$\rho_{SE}(\delta,G,\tau)$ is independent of the distribution $G$ .

According to Lemma V.2 in Appendix V-D, $\Psi$ is concave. Therefore, it has a stable fixed point at zero if and only if its derivative at zero is less than $1$ . It is also straightforward (from Appendix V-D) to show that

Setting this derivative to $1$ , it is clear that the phase transition value of $\rho$ is independent of $G$ . ∎

According to Proposition III.4 the only parameters that affect $\rho_{SE}$ are $\delta$ and the free parameter $\tau$ . Fixing $\delta$ , we tune $\tau$ such that the algorithm achieves its highest phase transition for a certain number of measurements, i.e.,

Using SE we can calculate the optimal value of $\tau$ and $\rho_{SE}(\delta)$ .

$\rho_{SE}(\delta)$ and $\delta$ satisfy the following implicit relations:

for $\tau\in[0,\infty)$ . Here, $\chi_{1}(\tau)\triangleq\int_{\omega\geq\tau}\omega(\tau-\omega){\rm e}^{-\omega^{2}}d\omega$ and $\chi_{2}(\tau)\triangleq\int_{\omega>\tau}\omega(\omega-\tau)^{2}{\rm e}^{-\omega^{2}}$ .

See Appendix V-D for the proof. Figure 1 displays this phase transition curve that is derived from the SE framework and compares it with the phase transition of r-BP algorithm. As will be described later, $\rho_{SE}(\delta)$ corresponds to the phase transition of c-LASSO. Hence the difference between $\rho_{SE}(\delta)$ and phase transition curve of r-LASSO is the benefit of grouping the real and imaginary parts.

It is also interesting to compare the $\rho_{SE}(\delta)$ (which as we see later predicts the performance of c-LASSO) with the phase transition of r-LASSO in high undersampling regime $\delta\rightarrow 0$ . The implicit formulation above enables us to calculate the asymptotic performance of the phase transition as $\delta\rightarrow 0$ .

$\rho_{SE}(\delta)$ follows the asymptotic behavior

See Appendix V-E for the proof. As mentioned above, this theorem shows that as $\delta\rightarrow 0$ the phase transition of c-BP and CAMP is two times that of the r-LASSO, which is given by $\rho^{R}_{SE}\sim 1/(2\log(1/\delta))$ . This improvement is due to the grouping of real and imaginary parts of the signal.

III-C Noise sensitivity

In this section we characterize the noise sensitivity of SE. To achieve this goal, we first discuss the risk of the complex soft thresholding function. The properties of this risk play an important role in the discussion of the noise sensitivity of SE in Section III-C2.

Define the risk of the soft thresholding function as

where $\mu\in[0,\infty)$ , $\theta\in[0,2\pi)$ , and the expected value is with respect to the two independent random variables $Z_{1},Z_{2}\sim N(0,1/2)$ . It is important to note that according to Lemma III.2, the risk function is independent of $\theta$ . The following lemma characterizes two important properties of this risk function:

$r(\mu,\tau)$ is an increasing function of $\mu$ and a concave function in terms of $\mu^{2}$ .

See Appendix V-F for the proof of this lemma. We define the minimax risk of the soft thresholding function as

where $q$ is the probability density function of $X$ , and the expected value is with respect to $X$ , $Z_{1}$ and $Z_{2}$ .

Note that $q\in\mathcal{F}_{\epsilon}$ implies that $q$ has a point mass of $1-\epsilon$ at zero; see Section I-C for more information. In the next section we show a connection between this minimax risk and the noise sensitivity of the SE. Therefore, it is important to characterize $M^{\flat}(\epsilon)$ .

The minimax risk of the soft thresholding function satisfies

See Appendix V-G for the proof. It is important to note that the quantities in (8) can be easily calculated in terms of the density and distribution function of a normal random variable. Therefore, a simple computer program may accurately calculate the value of $M^{\flat}(\epsilon)$ for any $\epsilon$ .

The proof provided for Proposition III.8 also proves the following proposition. We will discuss the importance of this result for compressed sensing problems in the next section.

The maximum of the risk function, $\max_{q\in\mathcal{F}_{\epsilon,\gamma}}\mathds{E}|\eta(X+Z_{1}+iZ_{2};\tau)-X|^{2}$ , is achieved on $q(X)=(1-\epsilon)\delta_{0}(|X|)+\epsilon\delta_{\gamma}(|X|)$ .

First, note that the maximizing distribution (or least favorable distribution) is independent of the threshold parameter. Second, note that the maximizing distribution is not unique since we have already proved that the phase distribution does not affect the risk function.

III-C2 Noise sensitivity of state evolution

As mentioned in Section III-A, in the presence of measurement noise, SE is given by

where ${\rm npi}={\sigma^{2}+\frac{m_{t}}{\delta}}$ . As mentioned above, $m_{t}$ characterizes the asymptotic MSE of CAMP at iteration $t$ . Therefore, the final solution of the CAMP algorithm converges to one of the stable fixed points of the $\Psi$ function. The next theorem suggests that the stable fixed point is unique, and therefore no matter where the algorithm starts from it will always converge to the same MSE.

$\Psi(m)$ has a unique stable fixed point to which the sequence of $\{m_{t}\}$ converges.

We call the fixed point in Lemma III.10 ${\rm fMSE}(\sigma^{2},\delta,\rho,G,\tau)$ . According to Section I-D2, we define the minimax noise sensitivity as

The noise sensitivity of SE can be easily evaluated from $M^{\flat}(\epsilon)$ . The following theorem characterizes this relation.

Let $\rho_{MSE}(\delta)$ be the value of $\rho$ satisfying $M^{\flat}(\rho\delta)=\delta$ . Then, for $\rho<\rho_{MSE}$ we have

and for $\rho>\rho_{MSE}(\delta)$ , ${\rm NS}^{SE}(\delta,\rho)=\infty$ .

The proof of this theorem follows along the same lines as the proof of Proposition 3.1 in , and therefore we skip it for the sake of brevity. The contour lines of this noise sensitivity function are displayed in Figure 2.

Similar arguments as those presented in Proposition 3.1 in combined with Proposition III.9 prove the following.

The maximum of the formal MSE, $\max_{q\in\mathcal{F}_{\epsilon,\gamma}}{\rm fMSE}(\sigma^{2},\delta,\rho,G,\tau)$ is achieved by $q=(1-\epsilon)\delta_{0}(|X|)+\epsilon\delta_{\gamma}(|X|)$ , independent of $\sigma$ and $\tau$ .

Again we emphasize that the maximizing or least favorable distribution is not unique. Note that the least favorable distribution provides a simple approach for designing and setting the parameters of CS systems : We design the system such that it performs well on the least favorable distribution, and it is then guaranteed that the system will perform as well (or in many cases better) on all other input distributions.

As a final remark we note that $\rho_{MSE}(\delta)$ equals $\rho_{SE}(\delta)$ as proved next.

The proof is a simple comparison of the formulas. We first know that $\rho_{MSE}$ is derived from the following equation

On the other hand, since $\Psi(m)$ is a concave function of $m$ , $\rho_{SE}(\delta,\tau)$ is derived from $\left.\frac{d\Psi(m)}{dm}\right|_{m=0}=1$ . This derivative is equal to

Also, $\rho_{SE}(\delta)=\sup_{\tau}\rho_{SE}(\tau,\delta)$ . However, in order to obtain the highest $\rho$ we should minimize the above expression over $\tau$ . Therefore, both $\rho_{SE}(\delta)$ and $\rho_{MSE}(\delta)$ satisfy the same equations and thus are exactly equal. ∎

III-D Connection between the state evolution, CAMP, and c-LASSO

There is a strong connection between the SE framework, the CAMP algorithm, and c-LASSO. Recently, proved that SE predicts the asymptotic performance of the AMP algorithm when the measurement matrix is i.i.d. Gaussian. The result also holds for complex Gaussian matrices and complex input vectors. As in , we conjecture that the SE predictions are correct for a “large” class of random matrices. We show evidence of this claim in Section IV. Here, for the sake of completeness, we quote the result of in the complex setting. Let $\gamma:\mathds{C}^{2}\rightarrow\mathds{R}$ be a pseudo-Lipschitz function. $\gamma:\mathds{C}^{2}\rightarrow\mathds{R}$ is pseudo-Lipschitz if and only if $|\psi(x)-\psi(y)|\leq L(1+\|x\|_{2}+\|y\|_{2})\|x-y\|_{2}$ . To make the presentation clear we consider a simplified version of Definition 1 in .

A sequence of instances $\{s_{o}(N),A(N),w(N)\}$ , indexed by the ambient dimension $N$ , is called a converging sequence if the following conditions hold:

The elements of $s_{o}(N)\in\mathds{R}^{N}$ are i.i.d. drawn from $(1-\rho\delta)\delta_{0}(|s_{o,i}|)+\rho\delta G(s_{o,i})$ .

The elements of $w(N)\in\mathds{R}^{n}$ ( $n=\delta N$ ) are i.i.d. drawn from $N(0,\sigma_{w}^{2})$ .

The elements of $A(N)\in\mathds{R}^{n\times N}$ are i.i.d. drawn from a complex Gaussian distribution.

Consider a converging sequence $\{s_{o}(N),A(N),w(N)\}$ . Let $x^{t}(N)$ be the estimate of the CAMP algorithm at iteration $t$ . For any pseudo Lipschitz function $\gamma:\mathds{C}^{2}\rightarrow\mathds{R}$ we have

almost surely, where $Z_{1}+iZ_{2}\sim CN(0,1)$ and $X\sim(1-\rho\delta)\delta_{0}(|s_{o,i}|)+\rho\delta G(s_{o,i})$ are independent complex random variables. Also, ${\rm npi}^{t}\triangleq\sigma^{2}+m_{t}/\delta$ , where $m_{t}$ satisfies (7).

The proof of this theorem is similar to the proof of Theorem 1 in and hence is skipped here.

It is also straightforward to extend the result of and on the connection of message passing algorithms and LASSO to the complex setting. For a given value of $\tau$ suppose that the fixed point of the state evolution is denoted by $m^{*}$ . Define $\lambda(\tau)$ as

and $\mathds{E}$ is with respect to independent random variables $Z_{1}+iZ_{2}\sim CN(0,1)$ and $X\sim(1-\rho\delta)\delta_{0}(|s_{o,i}|)+\rho\delta G(s_{o,i})$ . The following theorem establishes the connection between the solution of LASSO and the state evolution equation.

Consider a converging sequence $\{s_{o}(N),A(N),w(N)\}$ . Let $\hat{x}^{\lambda(\tau)}(N)$ be the solution of LASSO. Then, for any pseudo Lipschitz function $\gamma:\mathds{C}^{2}\rightarrow\mathds{R}$ we have

almost surely, where $Z_{1}+iZ_{2}\sim CN(0,1)$ and $X\sim(1-\rho\delta)\delta_{0}(|s_{o,i}|)+\rho\delta G(s_{o,i})$ are independent complex random variables. ${\rm npi}^{*}\triangleq\sigma^{2}+m^{*}/\delta$ , where $m^{*}$ is the fixed point of (7).

The proof of the theorem is similar to the proof of Theorem 1.4 in and hence is skipped here.

Note that according to Theorems III.15 and III.16, SE predicts the dynamic of the AMP algorithm and the solution of LASSO accurately in the asymptotic settings.

III-E Discussion

In this section we briefly discuss the convergence rate of the CAMP algorithm. In this respect our results are straightforward extension of the analysis in . But, for the sake of completeness, we mention a few highlights. Let $\{m_{t}\}_{t=1}^{\infty}$ be a sequence of MSE generated according to state evolution (7) for $X\sim(1-\epsilon)\delta_{0}(|X|)+\epsilon G(X)$ , $\tau$ , and $\sigma^{2}=0$ . The following proposition provides an upper bound on $m_{t}$ as a function of iteration $t$ .

Let $\{m_{t}\}_{t=1}^{\infty}$ be a sequence of MSEs generated according to SE. Then

Since according to Lemma V.2 $\Psi(m)$ is concave, we have $\Psi(m)\leq\left.\frac{d\Psi(m)}{dm}\right|_{m=0}m$ . Hence at every iteration, $m$ is attenuated by $\left.\frac{d\Psi(m)}{dm}\right|_{m=0}$ . After $t$ iterations we have $m^{t}\leq\left(\left.\frac{d\Psi(m)}{dm}\right|_{m=0}\right)^{t}m_{0}$ . ∎

According to Theorem III.17 the convergence rate of CAMP is linear (in the asymptotic setting).If the measurement matrix is not i.i.d. random CAMP does not necessarily converge at this rate. This is due to the fact that state evolution does not necessarily hold for arbitrary matrices. In fact, due to the concavity of the $\Psi$ function, CAMP converges faster for large values of MSE $m$ . As $m$ reaches zero the convergence rate decreases towards the rate predicted by this theorem. Theorem III.17 provides an upper bound on the number of iterations the algorithm requires to reach to a certain accuracy. Figure 4 exhibits the value of $\left.\frac{d\Psi(m)}{dm}\right|_{m=0}$ as a function of $\rho$ and $\delta$ . This figure is based on the calculations we have presented in Appendix V-D. Here, $\tau$ is chosen such that the CAMP algorithm achieves the same phase transition as c-BP algorithm. Note that, according to Proposition III.17, if $\left.\frac{d\Psi(m)}{dm}\right|_{m=0}<0.9$ , then $m_{200}<7.1\times 10^{-10}m_{0}$ .

Theorem III.17 only considers the noise-free problem. But, again due to the concavity of the $\Psi$ function, the convergence of CAMP to its fixed point is even faster for noisy measurements. To see this, note that once the measurements are noisy, the fixed point of CAMP occurs at a larger value of $m$ . Since $\Psi$ is concave, the derivative at this point is lower than the derivative at zero. Hence, convergence will be faster.

III-E2 Extensions

The results presented in this paper are concerned with the two most popular problems in compressed sensing, i.e., exact recovery of sparse signals and approximate recovery of sparse signals in the presence of noise. However, our framework is far more powerful and can address other compressed sensing problems as well. For instance a similar framework has been used to address the problem of recovering approximately sparse signals in the presence of noise . For the sake of brevity we have not provided such an analysis in the current paper. However, the properties we proved in Lemmas III.2, III.7, and Proposition III.8 enable a straightforward extension of our analysis to such cases as well.

Furthermore, the framework we developed here provides a way for recovering sparse complex-valued signals when the distribution of non-zero elements is known. This area has been studied in .

IV Simulations

As explained in Section III-D, our theoretical results show that, if the elements of the matrix are i.i.d. Gaussian, then SE predicts the performance of the CAMP and c-LASSO algorithms accurately. However, in this section we will show evidence that suggests the theoretical framework is applicable to a wider class of measurement matrices. We then investigate the dependence of the empirical phase transition on the input distribution for medium problem sizes.

We investigate the effect of the measurement matrix distribution on the performance of CAMP and c-LASSO in two different cases. First, we consider the case where the measurements are noise-free. We postpone a discussion of measurement noise to Section IV-A2.

Suppose that the measurements are noise-free. Our goal is to empirically measure the phase transition curves of the c-LASSO and CAMP on the measurement matrices provided in Table I. To characterize the phase transition of an algorithm, we do the following:

We consider 33 equispaced values of $\delta$ between and $1$ .

For each value of $\delta$ , we calculate $\rho_{SE}(\delta)$ from the theoretical framework and then consider 41 equispaced values of $\rho$ in $[\rho_{SE}(\delta)-0.2,\rho_{SE}(\delta)+0.2]$ .

We fix $N=1000$ , and for any value of $\rho$ and $\delta$ , we calculate $n=\lfloor\delta N\rfloor$ and $k=\lfloor\rho\delta N\rfloor$ .

We draw $M=20$ independent random matrices from one of the distributions described in Table I and for each matrix we construct a random input vector $s_{o}$ with one of the distributions described in Table II. We then form $y=As_{o}$ and recover $s_{o}$ from $y,A$ by either c-BP or CAMP to obtain $\hat{x}$ . The matrix distributions and coefficient distributions we consider in our simulations are specified in Tables I and II, respectively.

For each $\delta$ , $\rho$ , and Monte Carlo sample $j$ we define a success variable $S_{\delta,\rho,j}=\mathds{I}\left(\frac{\|\hat{x}-x\|_{2}}{\|x\|_{2}}<{\rm tol}\right)$ and we calculate the success probability $\hat{p}^{S}_{\delta,\rho}=\frac{1}{M}\sum_{j}S_{\delta,\rho,j}$ . This provides an empirical estimate of the probability of correct recovery. The value of tol in our case is set to $10^{-4}$ .

For a fixed value of $\delta$ , we fit a logistic regression function to $\hat{p}^{S}(\delta,\rho)$ to obtain $p_{\delta}^{S}(\rho)$ . Then we find the value of $\hat{\rho}_{\delta}$ for which $p_{\delta}^{S}(\rho)=0.5$ .

See for a more detailed discussion of this approach. For the c-LASSO algorithm, we are reproducing the experiments of and, therefore, we are using one-L1 algorithm . Although Figure 4 confirms that for most cases even $200$ iterations of CAMP are enough to reach convergence, since our goal is to measure the phase transition, we consider $3000$ iterations. See Section III-E1 for the discussion on the convergence rate.

Figure 5 compares the phase transition of c-LASSO and CAMP on the ensembles specified in Table I with the theoretical prediction of this paper. In this simulation the coefficient ensemble is UP (see Table II). Clearly, the empirical and theoretical phase transitions of the algorithms coincide. More importantly, we can conjecture that the choice of the measurement matrix ensemble does not affect the phase transition of these two algorithms. We will next discuss the impact of measurement matrix when there is noise on the measurements.

IV-A2 Noisy measurements

In this section we aim to show that, even in the presence of noise, the matrix ensembles defined in Table I perform similarly. Here is the setup for our experiment:

We set $\delta=0.25$ , $\rho=0.1$ , and $N=1000$ .

We choose $50$ different values of $\sigma$ in the range [0.001, 0.1].

We choose $n\times N$ measurement matrix $A$ from one of the ensembles specified in Table I.

We draw $k$ i.i.d. elements from UP ensemble for the $k=\lfloor\rho n\rfloor$ non-zero elements of the input $s_{o}$ .

We form the measurement vector $y=As_{o}+\sigma w$ where $w$ is the noise vector with i.i.d. elements from $CN(0,1)$ .

For CAMP, we set $\tau=2$ . For c-LASSO, we use (9) to derive the corresponding values of $\lambda$ for $\tau=2$ in CAMP.

We calculate the MSE $\|\hat{x}-s_{o}\|_{2}^{2}/N$ for each matrix ensemble and compare the results.

Figures 6 and 7 summarize our results. The concentration of the points along the $y=x$ line indicates that the matrix ensembles, specified in Table I, perform similarly. The coincidence of the phase transition curves for different matrix ensembles is known as universality hypothesis (conjecture). In order to provide a stronger evidence, we run the above experiment with $N=4000$ . The results of this experiment are exhibited in Figures 8 and 9. It is clear from these figures that the MSE is now more concentrated around the $y=x$ line. Additional experiments with other parameter values exhibited the same behavior. Note that as $N$ grows, the variance of the MSE estimate becomes smaller, and the behavior of the algorithm is closer to the average performance that is predicted by the SE equation.

IV-B Coefficient ensemble simulations

According to Proposition III.4, $\rho_{SE}(\delta,\tau)$ is independent of the distribution $G$ of non-zero coefficients of $s_{0}$ . We test the accuracy of this result on medium problem sizes. We fix $\delta$ to $0.1$ and we calculate $\hat{p}^{S}_{\delta,\rho}$ for $60$ equispaced values of $\rho$ between $0.1$ and $0.5$ . For each algorithm and each value of $\rho$ we run $100$ Monte Carlo trials and calculate the success rate for the Gaussian matrix and the coefficient ensembles specified in Table II. Figure 10 summarizes our result. Simulations at other values of $\delta$ result in very similar behavior. These results are consistent with Proposition III.4. The small differences between the empirical phase transitions are due to two issues that are not reflected in Proposition III.4: (i) $N$ is finite, while Proposition III.4 considers the asymptotic setting. (ii) The number of algorithm iterations is finite, while Proposition III.4 assumes that we run CAMP for an infinite number of iterations.

V Proofs of the main results

For a given convex function $f:\mathds{C}^{n}\rightarrow\mathds{R}$ the proximity operator at point $x$ is defined as

Since (10) can be decoupled into the elements of the $x,y$ , we can obtain the optimal value of $y$ , by optimizing over its individual components. In other words, we solve the optimization in (10) for $x,y\in\mathds{C}$ . In this case the optimization reduces to

Suppose that the optimal $y_{*}$ satisfies $(y_{*}^{R})^{2}+(y_{*}^{I})^{2}>0$ . Then the function $\sqrt{(y^{R})^{2}+(y^{I})^{2}}$ is differentiable and the optimal solution satisfies

Combining the two equations in V-A we obtain $y_{*}^{R}x^{I}=x^{R}y_{*}^{I}$ . Replacing this in (V-A) we have $y_{*}^{R}=x^{R}-\frac{\tau|x^{R}|}{\sqrt{(x^{R})^{2}+(x^{I})^{2}}}$ and $y_{*}^{I}=x^{I}-\frac{\tau|x^{I}|}{\sqrt{(x^{R})^{2}+(x^{I})^{2}}}$ . It is clear that if $\sqrt{(x^{R})^{2}+(x^{I})^{2}}<\tau$ , then the signs of $y_{*}^{R}$ and $x^{R}$ will be opposite, which is in contradiction with (V-A). Therefore, if $\sqrt{(x^{R})^{2}+(x^{I})^{2}}<\tau$ , both $y_{*}^{R}$ and $y_{*}^{I}$ are zero. It is straightforward to check that $(0,0)$ satisfies the subgradient optimality condition. ∎

V-B Proof of Proposition II.1

denote the real and imaginary parts of the complex soft thresholding function. Define

We also use the first-order expansion of the soft thresholding function to obtain

V-C Proof of Lemma III.2

Let $\mu$ and $\theta$ denote the amplitude and phase of the random variable $X$ . Define $\nu\triangleq\sqrt{\rm npi}=\sqrt{\sigma^{2}+\frac{m}{\delta}}$ and $\zeta\triangleq\frac{\mu}{{\nu}}$ . Then

where $\mathds{E}_{\zeta,\theta}$ denotes the conditional expectation given the variables $\zeta,\theta$ . Note that the marginal distribution of $\zeta$ depends only on the marginal distribution of $\mu$ . The first term in (18) is independent of the phase $\theta$ , and therefore we should prove that the second term is also independent of $\theta$ . Define

We prove that $\Phi$ is independent of $\theta$ . For two real-valued variables $z_{r}$ and $z_{c}$ , define $\mathbf{z}\triangleq(z_{r},z_{c})$ , $d\mathbf{z}\triangleq dz_{r}dz_{c}$ , and

Define the two sets $S_{\tau}\triangleq\{(z_{r},z_{c})\ |\ \alpha_{z}<\tau\}$ and $S_{\tau}^{c}\triangleq\mathds{R}^{2}\backslash S_{\tau}$ , where “ $\backslash$ ” is the set subtraction operator. We have

The first integral in (20) corresponds to the case $|\zeta{\rm e}^{i\theta}+z_{r}+iz_{c}|<\tau$ . The second integral is over the values of $z_{r}$ and $z_{c}$ for which $|\zeta{\rm e}^{i\theta}+z_{r}+iz_{c}|\geq\tau$ . Define $\beta\triangleq\zeta\cos\theta+z_{r}$ and $\gamma\triangleq\zeta\sin\theta+z_{c}$ . We then obtain

Equality (a) is the result of the change of integration variables from $\gamma$ and $\beta$ to $r\triangleq\sqrt{\beta^{2}+\gamma^{2}}$ and $\phi\triangleq\arctan\Big{(}\frac{\gamma}{\beta}\Big{)}$ . The periodicity of the cosine function proves that the last integration is independent of the phase $\theta$ . We can similarly prove that $\int_{\mathbf{z}\in S_{\tau}}\zeta^{2}\frac{1}{\pi}{\rm e}^{-z_{r}^{2}+z_{c}^{2}}d\mathbf{z}$ is independent of $\theta$ . This completes the proof. $\hfill\Box$

V-D Proof of Theorem III.5

We first prove the following lemma that simplifies the proof of Theorem III.5.

The function $\Psi(m)$ is concave with respect to $m$ .

For the notational simplicity define $\nu\triangleq\sqrt{\sigma^{2}+\frac{m}{\delta}}$ , $X_{\nu}\triangleq\frac{X}{\nu}$ , and $A_{\nu}\triangleq|X_{\nu}-Z_{1}+iZ_{2}|$ . We note that

Therefore, $\Psi$ is concave with respect to $m$ if and only if it is concave with respect to $\nu^{2}$ . According to Lemma III.2 the phase distribution of $X$ does not affect the $\Psi$ function. Therefore, we set the phase of $X$ to zero and assume that it is a positive-valued random variable (representing the amplitude). This assumption substantially simplifies the calculations. We have

where $\mathds{E}_{X}$ denotes the expected value conditioned on the random variable $X$ . We first prove that $\Psi_{X}(\nu^{2})\triangleq\nu^{2}\mathds{E}_{X}\left(\left|\eta(X_{\nu}+Z_{1}+iZ_{2};\tau)-X_{\nu}\right|^{2}\right)$ is concave with respect to $\nu^{2}$ by proving $\frac{d\Psi_{X}}{d(\nu^{2})^{2}}\leq 0$ . Then, since $\Psi(\nu^{2})$ is a convex combination of $\Psi_{X}(\nu^{2})$ , we conclude that $\Psi(\nu^{2})$ is a concave function of $\nu^{2}$ as well. The rest of the proof details the algebra required for calculating and simplifying $\frac{d^{2}\Psi_{X}(\nu^{2})}{d^{2}\nu^{2}}$ .

Using the real and imaginary parts of the soft thresholding function and its partial derivatives introduced in (V-B) and (V-B) we have

where $\partial_{1}^{R}$ , $\partial_{2}^{R}$ , $\partial_{1}^{I}$ , and $\partial_{2}^{I}$ are defined in (V-B). Note that in the above calculations, $\partial_{2}\eta^{R}$ and $\partial_{2}\eta^{I}$ did not appear, since we assumed that $X$ is a real-valued random variable. Define $A_{\nu}\triangleq\sqrt{(X_{\nu}+Z_{1})^{2}+Z_{2}^{2}}$ . It is straightforward to show that

For $f:\mathds{C}\rightarrow\mathds{R}$ we define $\partial^{2}_{1}f(x+iy)\triangleq\frac{\partial^{2}f(x+iy)}{\partial x^{2}}$ . It is straightforward to show that

Our next objective is to simplify the terms in (22). We start with

Note that in the above expression we have replaced $\left(1-\frac{\tau Z_{2}^{2}}{A_{\nu}^{3}}\right)\delta(A_{\nu}-\tau)$ with $\left(1-\frac{Z_{2}^{2}}{A_{\nu}^{2}}\right)\delta(A_{\nu}-\tau)$ for an obvious reason. It is straightforward to simplify this expression to obtain

By plugging (V-D), (24), and (25) into (22), we obtain

We claim that both terms on the right hand side of (V-D) are negative. To prove this claim, we first focus on the first term:

Define $S_{\tau}\triangleq\{(Z_{1},Z_{2})\ |\ A_{\nu}\geq\tau\}$ . We have

Equality (a) is the result of the change of integration variables from $z_{1}$ , $z_{2}$ to $r\triangleq A_{\nu}$ and $\phi\triangleq\arctan\left(\frac{z_{2}}{z_{1}+X_{\nu}}\right)$ . With exactly similar approach we can prove that the second term of (V-D) is also negative.

So far we have proved that $\Psi_{X}(m)$ is concave with respect to $m$ . But this implies that $\Psi(m)$ is also concave, since it is a convex combination of concave functions. ∎

As proved in Lemma V.2, $\Psi(m)$ is a concave function. Furthermore $\Psi(0)=0$ . Therefore a given value of $\rho$ is below the phase transition, i.e., $\rho<\rho_{SE}(\delta)$ if and only if $\left.\frac{d\Psi}{dm}\right|_{m}<1$ . It is straightforward to calculate the derivative at zero and confirm that

Since $Z_{1},Z_{2}\sim N(0,1/2)$ and are independent, the phase of $Z_{1}+iZ_{2}$ has a uniform distribution, while its amplitude has Rayleigh distribution. Therefore, we have

We plug (29) into (28) and set the derivative $\left.\frac{d\Psi}{dm}\right|_{m}=1$ to obtain the value of $\rho$ at which the phase transition occurs. This value is given by

Clearly the phase transition depends on $\tau$ . Hence according to the framework we introduced in Section I-D1, we search for the value of $\tau$ that maximizes the phase transition $\rho$ . Define $\chi_{1}(\tau)\triangleq\int_{\tau_{*}}^{\infty}\omega(\tau_{*}-\omega){\rm e}^{-\omega^{2}}d\omega$ and $\chi_{2}(\tau)\triangleq\int_{\tau_{*}}^{\infty}\omega(\omega-\tau_{*})^{2}{\rm e}^{-\omega^{2}}d\omega$ . This optimal $\tau$ satisfies

which in turn results in $\delta=\frac{4(1+\tau_{*}^{2})\chi_{1}(\tau^{*})-4\tau_{*}\chi_{2}(\tau^{*})}{-2\tau_{*}+4\chi_{1}(\tau^{*})}$ . Plugging $\delta$ into the formula for $\rho$ , we obtain the formula in Theorem III.5. ∎

V-E Proof of Theorem III.6

We first show that the value of $\delta$ in Theorem III.5 goes to zero as $\tau\rightarrow\infty$ . By changing the variable of integration from $\omega$ to $\gamma=\omega-\tau$ , we obtain

Again by changing integration variables we have

Using (30) and (31) in the formula for $\delta$ in Theorem III.5 establishes that $\delta\rightarrow 0$ as $\tau\rightarrow\infty$ . Therefore, in order to find the asymptotic behavior of the phase transition as $\delta\rightarrow 0$ , we can calculate the asymptotic behavior of $\delta$ and $\rho$ as $\tau\rightarrow\infty$ . This is a standard application of Laplace’s method. Using this method we calculate the leading terms of $\rho$ and $\delta$ :

Plugging (32) and (33) into the formula we have for $\rho$ and $\delta$ in Theorem III.5, we obtain

V-F Proof of Lemma III.7

According to Lemma III.2, the phase $\theta$ does not affect the risk function, and therefore we set it to zero. We have

where $\eta^{R}(\mu+Z_{1}+iZ_{2};\tau)=\left(\mu+Z_{1}-\frac{\tau(\mu+Z_{1})}{A}\right)\mathds{I}(A\geq\tau)$ , $\eta^{I}(\mu+Z_{1}+iZ_{2};\tau)=\left(z_{2}-\frac{\tau Z_{2}}{A}\right)\mathds{I}(A\geq\tau)$ and $A\triangleq\sqrt{(\mu+Z_{1})^{2}+Z_{2}^{2}}$ . If we calculate the derivative of the risk function with respect to $\mu$ , then we have

Therefore, the risk of the complex soft thresholding is an increasing function of $\mu$ . Furthermore,

It is clear that the next derivative with respect to $\mu^{2}$ is negative, and therefore the function is concave. $\hfill\Box$

V-G Proof of Proposition III.8

As is clear from the statement of the theorem, the main challenge here is to characterize

Let $X=\mu{\rm e}^{i\theta}$ , where $\mu$ and $\theta$ are the phase and amplitude of $X$ respectively. According to Lemma III.2, the risk function is independent of $\theta$ . Furthermore, since $q\in F_{\epsilon}$ , we can write it as $q(\mu)=(1-\epsilon)\delta_{0}(\mu)+(1-\epsilon)G(\mu)$ , where $G$ is absolutely continuous with respect to Lebesgue measure. We then have

Furthermore, from the monotonicity of the risk function proved in Lemma III.7, we have

Again we can use the monotonicity of the risk function to prove that

The last equality is the result of the monotone convergence theorem. Combining (34) and (35) completes the proof. $\hfill\Box$

VI Conclusions

We have considered the problem of recovering a complex-valued sparse signal from an undersampled set of complex-valued measurements. We have accurately analyzed the asymptotic performance of c-LASSO and CAMP algorithms. Using the state evolution framework, we have derived simple expressions for the noise sensitivity and phase transition of these two algorithms. The results presented here show that substantial improvements can be achieved when the real and imaginary parts are considered jointly by the recovery algorithm. For instance, Theorem III.6 shows that in the high undersampling regime the phase transition of CAMP and c-BP is two times higher than the phase transition of r-LASSO.

Acknowledgements

Thanks to David Donoho and Andrea Montanari for their encouragement and their valuable suggestions on an early draft of this paper. We would also like to thank the reviewers and the associate editor for the thoughtful comments that helped us improve the quality of the manuscript, and to thank Ali Mousavi for careful reading of our paper and suggesting improvements. This work was supported by the grants NSF CCF-0431150, CCF-0926127, and CCF-1117939; DARPA/ONR N66001-11-C-4092 and N66001-11-1-4090; ONR N00014-08-1-1112, N00014-10-1-0989, and N00014-11-1-0714; AFOSR FA9550-09-1-0432; ARO MURI W911NF-07-1-0185 and W911NF-09-1-0383; and the TI Leadership University Program.