Phase Recovery, MaxCut and Complex Semidefinite Programming

Irène Waldspurger, Alexandre d'Aspremont, Stéphane Mallat

Introduction

The phase recovery problem, i.e. the problem of reconstructing a complex phase vector given only the magnitude of linear measurements, appears in a wide range of engineering and physical applications. It is needed for example in X-ray and crystallography imaging [Harrison, 1993], diffraction imaging [Bunk et al., 2007] or microscopy [Miao et al., 2008]. In these applications, the detectors cannot measure the phase of the incoming wave and only record its amplitude. Recovering the complex phase of wavelet transforms from their amplitude also has applications in audio signal processing [Griffin and Lim, 1984].

Recovering the phase of AxAx from Ax|Ax| is a nonconvex optimization problem. Until recently, this problem was solved using various greedy algorithms [Gerchberg and Saxton, 1972; Fienup, 1982; Griffin and Lim, 1984], which alternate projections on the range of AA and on the nonconvex set of vectors yy such that y=Ax|y|=|Ax|. However, these algorithms often stall in local minima. A convex relaxation called PhaseLift was introduced in [Chai et al., 2011] and [Candes et al., 2011b] by observing that Ax2|Ax|^{2} is a linear function of X=xxX=xx^{*} which is a rank one Hermitian matrix. The recovery of xx is thus expressed as a rank minimization problem over positive semidefinite Hermitian matrices XX satisfying some linear conditions. This last problem is approximated by a semidefinite program. It has been shown that this program recovers xx with high probability when AA has gaussian independant entries [Candes et al., 2011b, a]. Numerically, the same result seems to hold for several classes of linear operators AA.

Our main contribution here is to formulate phase recovery as a quadratic optimization problem over the unit complex torus. We then write a convex relaxation to phase recovery very similar to the MaxCut semidefinite program (we call this relaxation PhaseCut in what follows). While the resulting SDP is typically larger than the PhaseLift relaxation, its simple structure (the constraint matrices are singletons) allows us to solve it very efficiently. In particular, this allows us to use a provably convergent block coordinate descent algorithm whose iteration complexity is similar to that of the original greedy algorithm in Gerchberg and Saxton (each iteration is a matrix vector product, which can be computed efficiently). We also show that tightness of PhaseLift implies tightness of a modified version of PhaseCut. Furthermore, under the condition that AA is injective and bb has no zero coordinate, we derive an equivalence result between PhaseLift and a modified version of PhaseCut, in the noiseless setting. This result implies that both algorithms are simultaneously tight (an earlier version of this paper showed PhaseLift tightness implies PhaseCut tightness and the reverse direction was then proved in [Voroninski, 2012] under mild additional assumptions). In a noisy setting, one can show that PhaseCut is at least as stable as a variant of PhaseLift, while PhaseCut empirically appears to be more stable in some cases, e.g. when bb is sparse.

Seeing the MaxCut relaxation emerge in a phase recovery problem is not entirely surprising: it appears, for example, in an angular synchronization problem where one seeks to reconstruct a sequence of angles θi\theta_{i} (up to a global phase), given information on pairwise differences θiθj\theta_{i}-\theta_{j} mod. 2π2\pi, for (i,j)S(i,j)\in S [see Singer, 2011], the key difference between this last problem and the phase recovery problem in (1) is that the sign information is lost in the input to (1). Complex MaxCut-like relaxations of decoding problems also appear in maximum-likelihood channel detection [Luo et al., 2003; Kisialiou and Luo, 2010; So, 2010]. From a combinatorial optimization perspective, showing the equivalence between phase recovery and MaxCut allows us to expose a new class of nontrivial problem instances where the semidefinite relaxation for a MaxCut-like problem is tight, together with explicit conditions for tightness directly imported from the matrix completion formulation of these problems (these conditions are of course also hard to check, but hold with high probability for some classes of random experiments).

Phase recovery

Because F\bf F is not convex however, this alternating projection method usually converges to a stationary point yy^{\infty} which does not belong to the intersection of F\bf F with the image of AA, and hence AAyb|AA^{\dagger}y^{\infty}|\neq b. Several modifications proposed in [Fienup, 1982] improve the convergence rate but do not eliminate the existence of multiple stationary points. To guarantee convergence to a unique solution, which hopefully belongs to the intersection of F\bf F and the image of AA, this non-convex optimization problem has recently been relaxed as a semidefinite program [Chai et al., 2011; Candes et al., 2011b], where phase recovery is formulated as a matrix completion problem (described in Section 4). Although the computational complexity of this relaxation is much higher than that of the Gerchberg-Saxton algorithm, it is able to recover xx from Ax|Ax| (up to a multiplicative constant) in a number of cases [Chai et al., 2011; Candes et al., 2011b].

2. Splitting Phase and Amplitude Variables

which means that problem (1) is equivalent to the reduced problem

The objective of this last problem can be rewritten as follows

is positive semidefinite. The intuition behind this last formulation is simple, (IAA)(\mathbf{I}-AA^{\dagger}) is the orthogonal projector on the orthogonal complement of the image of AA (the kernel of AA^{*}), so this last problem simply minimizes in the phase vector uu the norm of the component of diag(b)u\mathop{\bf diag}(b)u which is not in the image of AA.

3. Greedy Optimization in Phase

for each i=1,,ni=1,\ldots,n, when uu is the optimum solution to problem (2). We can use this fact to derive Algorithm 2, a greedy algorithm for optimizing the phase problem.

This greedy algorithm converges to a stationary point uu^{\infty}, but it is generally not a global solution of problem (2), and hence AAdiag(u)bb|AA^{\dagger}\mathop{\bf diag}(u^{\infty})b|\neq b. It has often nearly the same stationary points as the Gerchberg-Saxton algorithm. One can indeed verify that if uu^{\infty} is a stationary point then y=diag(u)by^{\infty}=\mathop{\bf diag}(u^{\infty})b is a stationary point of the Gerchberg-Saxton algorithm. Conversely if bb has no zero coordinate and yy^{\infty} is a stable stationary point of the Gerchberg-Saxton algorithm then ui=yi/yiu_{i}^{\infty}=y_{i}^{\infty}/|y_{i}^{\infty}| defines a stationary point of the greedy algorithm in phase.

If AxAx can be computed with a fast algorithm using O(nlogn)O(n\log n) operations, which is the case for Fourier or wavelets transform operators for example, then each Gerchberg-Saxton iteration is computed with O(nlogn)O(n\log n) operations. The greedy phase algorithm above does not take advantage of this fast algorithm and requires O(n2)O(n^{2}) operations to update all coordinates uiu_{i} for each iteration kk. However, we will see in Section 3.6 that a small modification of the algorithm allows for O(nlogn)O(n\log n) iteration complexity.

4. Complex MaxCut

Following the classical relaxation argument in [Shor, 1987; Lovász and Schrijver, 1991; Goemans and Williamson, 1995; Nesterov, 1998], we first write U=uu\mboxHnU=uu^{*}\in{\mbox{\bf H}}_{n}. Problem (2), written

in the variable U\mboxHnU\in{\mbox{\bf H}}_{n}. After dropping the (nonconvex) rank constraint, we obtain the following convex relaxation

which is a semidefinite program (SDP) in the matrix U\mboxHnU\in{\mbox{\bf H}}_{n} and can be solved efficiently. When the solution of problem PhaseCut has rank one, the relaxation is tight and the vector uu such that U=uuU=uu^{*} is an optimal solution of the phase recovery problem (2). If the solution has rank larger than one, a normalized leading eigenvector vv of UU is used as an approximate solution, and diag(UvvT)\mathop{\bf diag}(U-vv^{T}) gives a measure of the uncertainty around the coefficients of vv.

In practice, semidefinite programming solvers are rarely designed to directly handle problems written over Hermitian matrices and start by reformulating complex programs in \mboxHn{\mbox{\bf H}}_{n} as real semidefinite programs over \mboxS2n{\mbox{\bf S}}_{2n} based on the simple facts that follow. For Z,Y\mboxHnZ,Y\in{\mbox{\bf H}}_{n}, we define T(Z)\mboxS2n{\mathcal{T}}(Z)\in{\mbox{\bf S}}_{2n} as in [Goemans and Williamson, 2004]

so that Tr(T(Z)T(Y))=2Tr(ZY)\mathop{\bf Tr}({\mathcal{T}}(Z){\mathcal{T}}(Y))=2\mathop{\bf Tr}(ZY). By construction, Z\mboxHnZ\in{\mbox{\bf H}}_{n} iff T(Z)\mboxS2n{\mathcal{T}}(Z)\in{\mbox{\bf S}}_{2n}. One can also check that z=x+iyz=x+iy is an eigenvector of ZZ with eigenvalue λ\lambda if and only if

are eigenvectors of T(Z){\mathcal{T}}(Z), both with eigenvalue λ\lambda (depending on the normalization of zz, one corresponds to (Re(z),Im(z))(\operatorname{Re}(z),\operatorname{Im}(z)), the other one to (Re(iz),Im(iz))(\operatorname{Re}(i\,z),\operatorname{Im}(i\,z)). This means in particular that Z0Z\succeq 0 if and only if T(Z)0{\mathcal{T}}(Z)\succeq 0.

We can use these facts to formulate an equivalent semidefinite program over real symmetric matrices, written

in the variable XX in \mboxS2n{\mbox{\bf S}}_{2n}. This last problem is equivalent to PhaseCut. In fact, because of symmetries in T(M){\mathcal{T}}(M), the equality constraints enforcing symmetry can be dropped, and this problem is equivalent to a MaxCut like problem in dimension 2n2n, which reads

in the variable XX in \mboxS2n{\mbox{\bf S}}_{2n}. As we will see below, formulating a relaxation to the phase recovery problem as a complex MaxCut-like semidefinite program has direct computational benefits.

Algorithms

In the previous section, we have approximated the phase recovery problem (2) by a convex relaxation, written

which is a semidefinite program in the matrix U\mboxHnU\in{\mbox{\bf H}}_{n}. The dual, written

For small scale problems, with n102n\sim 10^{2}, generic interior point solvers such as SDPT3 [Toh et al., 1999] solve problem (5) with a complexity typically growing as O(n4.5log(1/ϵ))O\left(n^{4.5}\log(1/\epsilon)\right) where ϵ>0\epsilon>0 is the target precision [Ben-Tal and Nemirovski, 2001, §4.6.3]. Exploiting the fact that the 2n2n equality constraints on the diagonal in (5) are singletons, Helmberg et al. derive an interior point method for solving the MaxCut problem, with complexity growing as O(n3.5log(1/ϵ))O\left(n^{3.5}\log(1/\epsilon)\right) where the most expensive operation at each iteration is the inversion of a positive definite matrix, which costs O(n3)O(n^{3}) flops.

2. First-Order Methods

When nn becomes large, the cost of running even one iteration of an interior point solver rapidly becomes prohibitive. However, we can exploit the fact that the dual of problem (5) can be written (after switching signs) as a maximum eigenvalue minimization problem. Smooth first-order minimization algorithms detailed in [Nesterov, 2007] then produce an ϵ\epsilon-solution after

floating point operations. Each iteration requires forming a matrix exponential, which costs O(n3)O(n^{3}) flops. This is not strictly smaller than the iteration complexity of specialized interior point algorithms, but matrix structure often allows significant speedup in this step. Finally, the simplest subgradient methods produce an ϵ\epsilon-solution in

floating point operations. Each iteration requires computing a leading eigenvector which has complexity roughly O(n2logn)O(n^{2}\log n).

3. Block Coordinate Descent

We can also solve the semidefinite program in PhaseCut using a block coordinate descent algorithm. While no explicit complexity bounds are available for this method in our case, the algorithm is particularly simple and has a very low cost per iteration (it only requires computing a matrix vector product). We write ici^{c} the index set {1,,i1,i+1,,n}\{1,\ldots,i-1,i+1,\ldots,n\} and describe the method as Algorithm 3.

Block coordinate descent is widely used to solve statistical problems where the objective is separable (LASSO is a typical example) and was shown to efficiently solve semidefinite programs arising in covariance estimation [d’Aspremont et al., 2006]. These results were extended by [Wen et al., 2012] to a broader class of semidefinite programs, including MaxCut. We briefly recall its simple construction below, applied to a barrier version of the MaxCut relaxation PhaseCut, written

which is a semidefinite program in the matrix U\mboxHnU\in{\mbox{\bf H}}_{n}, where μ>0\mu>0 is the barrier parameter. As in interior point algorithms, the barrier enforces positive semidefiniteness and the value of μ>0\mu>0 precisely controls the distance between the optimal solution to (7) and the optimal set of PhaseCut. We refer the reader to [Boyd and Vandenberghe, 2004] for further details. The key to applying coordinate descent methods to problems penalized by the logdet()\log\det(\cdot) barrier is the following block-determinant formula

This means that, all other parameters being fixed, minimizing the function det(X)\det(X) in the row and column block of variables xx, is equivalent to minimizing the quadratic form yxTZ1xy-x^{T}Z^{-1}x, arguably a much simpler problem. Solving the semidefinite program (7) row/column by row/column thus amounts to solving the simple problem (9) described in the following lemma.

Proof. As in [Wen et al., 2012], a direct consequence of the first order optimality conditions for (9).

Here, we see problem (7) as an unconstrained minimization problem over the off-diagonal coefficients of UU, and (8) shows that each block iteration amounts to solving a minimization subproblem of the form (9). Lemma 3.1 then shows that this is equivalent to computing a matrix vector product. Linear convergence of the algorithm is guaranteed by the result in [Boyd and Vandenberghe, 2004, §9.4.3] and the fact that the function logdet\log\det is strongly convex over compact subsets of the positive semidefinite cone. So the complexity of the method is bounded by O(log1ϵ)O\left(\log\frac{1}{\epsilon}\right) but the constant in this bound depends on nn here, and the dependence cannot be quantified explicitly.

4. Initialization & Randomization

All the sample points zz generated using this procedure satisfy zi=1|z_{i}|=1, hence are feasible points for problem (2). This means in particular that QP(M)E[zMz]QP(M)\leq\textstyle\mathop{\bf E}[z^{*}Mz]. In fact, this expectation can be computed almost explicitly, using

where F(U)F(U) is the matrix with coefficients F(Uij)F(U_{ij}), i,j=1,,ni,j=1,\ldots,n. We then get

5. Approximation Bounds

The semidefinite program in PhaseCut is a MaxCut-type graph partitioning relaxation whose performance has been studied extensively. Note however that most approximation results for MaxCut study maximization problems over positive semidefinite or nonnegative matrices, while we are minimizing in PhaseCut so, as pointed out in [Kisialiou and Luo, 2010; So and Ye, 2010] for example, we do not inherit the constant approximation ratios that hold in the classical MaxCut setting.

6. Exploiting Structure

In some instances, we have additional structural information on the solution of problems (1) and (2), which usually reduces the complexity of approximating PhaseCut and improves the quality of the approximate solutions. We briefly highlight a few examples below.

6.2. Alignment

In other instances, we might have prior knowledge that the phases of certain samples are aligned, i.e. that there is an index set II such that ui=uj,\mboxforalli,jIu_{i}=u_{j},\quad\mbox{for all }i,j\in I, this reduces to the symmetric case discussed above when the phase is arbitrary. W.l.o.g., we can also fix the phase to be one, with ui=1u_{i}=1 for iIi\in I, and solve a constrained version of the relaxation PhaseCut

which is a semidefinite program in U\mboxHnU\in{\mbox{\bf H}}_{n}.

6.3. Fast Fourier transform

If the product MxMx can be computed with a fast algorithm in O(nlogn)O(n\log n) operations, which is the case for Fourier or wavelet transform operators, we significantly speed up the iterations of Algorithm 3 to update all coefficients at once. Each iteration of the modified Algorithm 3 then has cost O(nlogn)O(n\log n) instead of O(n2)O(n^{2}).

6.4. Real valued signal

In some cases, we know that the solution vector xx in (1) is real valued. Problem (1) can be reformulated to explicitly constrain the solution to be real, by writing it

or again, using the operator T(){\mathcal{T}}(\cdot) defined in (4)

The optimal solution of the inner minimization problem in xx is given by x=A2B2vx=A_{2}^{\dagger}B_{2}v, where

which is a semidefinite program in the variable V\mboxS2nV\in{\mbox{\bf S}}_{2n}, where M2=(A2A2B2B2)T(A2A2B2B2)=B2T(IA2A2)B2M_{2}=(A_{2}A_{2}^{\dagger}B_{2}-B_{2})^{T}(A_{2}A_{2}^{\dagger}B_{2}-B_{2})=B_{2}^{T}(\mathbf{I}-A_{2}A_{2}^{\dagger})B_{2}.

Matrix completion & exact recovery conditions

in the variable X\mboxHpX\in{\mbox{\bf H}}_{p}, where X=xxX=xx^{*} when exact recovery occurs. This last problem can be relaxed as

which is a semidefinite program (called PhaseLift by Candes et al. [2011b]) in the variable X\mboxHpX\in{\mbox{\bf H}}_{p}. Recent results in [Candes et al., 2011a; Candes and Li, 2012] give explicit (if somewhat stringent) conditions on AA and xx under which the relaxation is tight (i.e. the optimal XX in PhaseLift is unique, has rank one, with leading eigenvector xx).

We also introduce a weak version of PhaseLift, which is more directly related to PhaseCut and is easier to interpret geometrically. It was noted in [Candes et al., 2011a] that, when I\mboxspan{aiai}i=1n\mathbf{I}\in\mbox{span}\{a_{i}a_{i}^{*}\}_{i=1}^{n}, the condition Tr(aiaiX)=bi2,i=1,...,n\mathop{\bf Tr}(a_{i}a_{i}^{*}X)=b_{i}^{2},i=1,...,n determines Tr(X)\mathop{\bf Tr}(X), so in this case the trace minimization objective is redundant and PhaseLift is equivalent to

When I\mboxspan{aiai}i=1n\mathbf{I}\notin\mbox{span}\{a_{i}a_{i}^{*}\}_{i=1}^{n} on the other hand, Weak PhaseLift and PhaseLift are not equivalent: solutions of PhaseLift solve Weak PhaseLift too but the converse is not true. Interior point solvers typically pick a solution at the analytic center of the feasible set of Weak PhaseLift which in general can be significantly different from the minimum trace solution.

However, in practice, the removal of trace minimization does not really seem to alter the performances of the algorithm. We will illustrate this affirmation with numerical experiments in §5.4 and a formal proof is given in [Demanet and Hand, 2012] who showed that, in the case of Gaussian random measurements, the relaxation of Weak PhaseLift was tight with high probability under the same conditions as PhaseLift.

2. Phase Recovery as a Projection

We will see in what follows that phase recovery can interpreted as a projection problem. These results will prove useful later to study stability. The PhaseCut reconstruction problem defined in PhaseCut is written

with M=diag(b)(IAA)diag(b)M=\mathop{\bf diag}(b)(\mathbf{I}-AA^{\dagger})\mathop{\bf diag}(b). In what follows, we assume bi0b_{i}\neq 0, i=1,...,ni=1,...,n, which means that, after scaling UU, solving PhaseCut is equivalent to solving

In the following lemma, we show that this last semidefinite program can be understood as a projection problem on a section of the semidefinite cone using the trace (or nuclear) norm. We define

which is also F={V\mboxHn:(IAA)V(IAA)=0}\mathcal{F}=\{V\in{\mbox{\bf H}}_{n}:(\mathbf{I}-AA^{\dagger})V(\mathbf{I}-AA^{\dagger})=0\}, and we now formulate the objective of problem (11) as a distance.

For all V\mboxHnV\in{\mbox{\bf H}}_{n} such that V0V\succeq 0,

where d1d_{1} is the distance associated to the trace norm.

Proof. Let B1\mathcal{B}_{1} (resp. B2\mathcal{B}_{2}) be an orthonormal basis of \mboxrangeA\mbox{{range}}\,A (resp. (\mboxrangeA)(\mbox{{range}}\,A)^{\perp}). Let TT be the transformation matrix from canonical basis to orthonormal basis B1B2\mathcal{B}_{1}\cup\mathcal{B}_{2}. Then

As the transformation XT1XTX\to T^{-1}XT preserves the nuclear norm, for every matrix V0V\succeq 0, if we write

then the orthogonal projection of VV onto F\mathcal{F} is

so d1(V,F)=VW1=(000V3)1d_{1}(V,\mathcal{F})=\|V-W\|_{1}=\|\left(\begin{smallmatrix}0&0\\ 0&V_{3}\end{smallmatrix}\right)\|_{1}. As V0V\succeq 0, (V1V2V2V3)0\left(\begin{smallmatrix}V_{1}&V_{2}\\ V_{2}^{*}&V_{3}\end{smallmatrix}\right)\succeq 0 hence (000V3)0\left(\begin{smallmatrix}0&0\\ 0&V_{3}\end{smallmatrix}\right)\succeq 0, so d1(V,F)=Tr(000V3)d_{1}(V,\mathcal{F})=\mathop{\bf Tr}\left(\begin{smallmatrix}0&0\\ 0&V_{3}\end{smallmatrix}\right). Because AAAA^{\dagger} is the orthogonal projection onto R(A){\mathcal{R}}(A), we have T1(IAA)T=(000I)T^{-1}(\mathbf{I}-AA^{\dagger})T=\left(\begin{smallmatrix}0&0\\ 0&\mathbf{I}\end{smallmatrix}\right) hence

This means that PhaseCut can be written as a projection problem, i.e.

in the variable V\mboxHnV\in{\mbox{\bf H}}_{n}, where Hb={V\mboxHn\mboxs.t.Vi,i=bi2,i=1,...,n}\mathcal{H}_{b}=\{V\in{\mbox{\bf H}}_{n}\mbox{ s.t. }V_{i,i}=b_{i}^{2},i=1,...,n\}. Moreover, with aia_{i} the ii-th row of AA, we have for all X\mboxHp+X\in{\mbox{\bf H}}_{p}^{+}, Tr(aiaiX)=aiXai=diag(AXA)i,i=1,,n\mathop{\bf Tr}(a_{i}a_{i}^{*}X)=a_{i}^{*}Xa_{i}=\mathop{\bf diag}(AXA^{*})_{i},\quad i=1,\ldots,n, so if we call V=AXAFV=AXA^{*}\in\mathcal{F}, when AA is injective, X=AVAX=A^{\dagger}VA^{{\dagger}*} and Weak PhaseLift is equivalent to

First order algorithms for Weak PhaseLift will typically solve

which is another projection problem in VV.

3. Tightness of the Semidefinite Relaxation

We will now formulate a refinement of the semidefinite relaxation in PhaseCut and prove that this refinement is equivalent to the relaxation in PhaseLift under mild technical assumptions. Suppose uu is the optimal phase vector, we know that the optimal solution to (1) can then be written x=Adiag(b)ux=A^{\dagger}\mathop{\bf diag}(b)u, which corresponds to the matrix X=Adiag(b)uudiag(b)AX=A^{\dagger}\mathop{\bf diag}(b)uu^{*}\mathop{\bf diag}(b)A^{{\dagger}*} in PhaseLift, hence

Writing B=diag(b)AAdiag(b)B=\mathop{\bf diag}(b)A^{{\dagger}*}A^{\dagger}\mathop{\bf diag}(b), when problem (1) is solvable, we look for the “minimum trace” solution among all the optimal points of relaxation PhaseCut by solving

which is a semidefinite program in U\mboxHnU\in{\mbox{\bf H}}_{n}. When problem (1) is solvable, then every optimal solution of the semidefinite relaxation PhaseCut is a feasible point of relaxation PhaseCutMod. In practice, the semidefinite program SDP(M+γB)SDP(M+\gamma B), written

obtained by replacing MM by M+γBM+\gamma B in problem PhaseCut, will produce a solution to PhaseCutMod whenever γ>0\gamma>0 is sufficiently small (this is essentially the exact penalty method detailed in [Bertsekas, 1998, §4.3] for example). This means that all algorithms (greedy or SDP) designed to solve the original PhaseCut problem can be recycled to solve PhaseCutMod with negligible effect on complexity. We now show that the PhaseCutMod and PhaseLift relaxations are simultaneously tight when AA is injective. An earlier version of this paper showed PhaseLift tightness implies PhaseCut tightness and the argument was reversed in [Voroninski, 2012] under mild additional assumptions.

Assume that bi0b_{i}\neq 0 for i=1,,ni=1,\ldots,n, that AA is injective and that there is a solution xx to (1). The function

is a bijection between the feasible points of PhaseCutMod and those of PhaseLift.

Proof. Note that Φ\Phi is injective whenever b>0b>0 and AA has full rank. We have to show that UU is a feasible point of PhaseCutMod if and only if it can be written under the form Φ(X)\Phi(X), where XX is feasible for PhaseLift. We first show that

for some X0X\succeq 0. Observe that Tr(UM)=0\mathop{\bf Tr}(UM)=0 means UM=0UM=0 because U,M0U,M\succeq 0, hence Tr(MU)=0\mathop{\bf Tr}(MU)=0 in (15) is equivalent to

because b>0b>0 and M=diag(b)(IAA)diag(b)M=\mathop{\bf diag}(b)(\mathbf{I}-AA^{\dagger})\mathop{\bf diag}(b). If we set X=Adiag(b)Udiag(b)AX=A^{\dagger}\mathop{\bf diag}(b)U\mathop{\bf diag}(b)A^{{\dagger}*}, this last equality implies both

which is U=Φ(X)U=\Phi(X), and shows (15) implies (16). Conversely, if U=Φ(X)U=\Phi(X) then diag(b)Udiag(b)=AXA\mathop{\bf diag}(b)U\mathop{\bf diag}(b)=AXA^{*} and using AAA=AAA^{\dagger}A=A, we get AXA=AAAXA=AAdiag(b)Udiag(b)AXA^{*}=AA^{\dagger}AXA^{*}=AA^{\dagger}\mathop{\bf diag}(b)U\mathop{\bf diag}(b) which means MU=0MU=0, hence (15) is in fact equivalent to (16) since U0U\succeq 0 by construction.

Now, if XX is feasible for PhaseLift, we have shown Tr(MΦ(X))=0\mathop{\bf Tr}(M\Phi(X))=0 and ϕ(X)0\phi(X)\succeq 0, moreover diag(Φ(X))i=Tr(aiaiX)/bi2=1\mathop{\bf diag}(\Phi(X))_{i}=\mathop{\bf Tr}(a_{i}a_{i}^{*}X)/b_{i}^{2}=1, so U=Φ(X)U=\Phi(X) is a feasible point of PhaseCutMod. Conversely, if UU is feasible for PhaseCutMod, we have shown that there exists X0X\succeq 0 such that U=Φ(X)U=\Phi(X) which means diag(b)Udiag(b)=AXA\mathop{\bf diag}(b)U\mathop{\bf diag}(b)=AXA^{*}. We also have Tr(aiaiX)=bi2Uii=bi2\mathop{\bf Tr}(a_{i}a_{i}^{*}X)=b_{i}^{2}U_{ii}=b_{i}^{2}, which means XX is feasible for PhaseLift and concludes the proof.

We now have the following central corollary showing the equivalence between PhaseCutMod and PhaseLift in the noiseless case.

If AA is injective, bi0b_{i}\neq 0 for all i=1,...,ni=1,...,n and if the reconstruction problem (1) admits an exact solution, then PhaseCutMod is tight (i.e. has a unique rank one solution) whenever PhaseLift is.

Proof. When AA is injective, Tr(X)=Tr(BΦ(X))\mathop{\bf Tr}(X)=\mathop{\bf Tr}(B\Phi(X)) and Rank(X)=Rank(Φ(X))\mathop{\bf Rank}(X)=\mathop{\bf Rank}(\Phi(X)).

This last result shows that in the noiseless case, the relaxations PhaseLift and PhaseCutMod are in fact equivalent. In the same way, we could have shown that Weak PhaseLift and PhaseCut were equivalent. The performances of both algorithms may not match however when the information on bb is noisy and perfect recovery is not possible.

Note that Proposition 4.2 and corollary 4.3 also hold when the initial signal is real and the measurements are complex. In this case, we define the BB in PhaseCutMod by B=B2A2A2B2B=B_{2}A_{2}^{{\dagger}*}A_{2}^{\dagger}B_{2} (with the notations of paragraph 3.6.4). We must also replace the definition of Φ\Phi by Φ(X)=B21A2XA2B21\Phi(X)=B_{2}^{-1}A_{2}XA_{2}^{*}B_{2}^{-1}. Furthermore, all steps in the proof of proposition 4.2 are still valid if we replace MM by M2M_{2}, AA by A2A_{2} and diag(b)\mathop{\bf diag}(b) by B2B_{2}. The only difference is that now 1bi2Tr(aiaiX)=diag(Φ(X))i+diag(Φ(X))n+i\frac{1}{b_{i}^{2}}\mathop{\bf Tr}(a_{i}a_{i}^{*}X)=\mathop{\bf diag}(\Phi(X))_{i}+\mathop{\bf diag}(\Phi(X))_{n+i}.

4. Stability in the Presence of Noise.

We now consider the case where the vector of measurements bb is of the form b=Ax0+bnoiseb=|Ax_{0}|+b_{\rm noise}. We first introduce a definition of CC-stability for PhaseCut and Weak PhaseLift. The main result of this section is that, when the Weak PhaseLift solution in (14) is stable at a point, PhaseCut is stable too, with a constant of the same order. The converse does not seem to be true when bb is sparse.

The following matrix perturbation result motivates this definition, by showing that a CC-stable algorithm generates a O(Cbnoise2)O(C\|b_{\rm noise}\|_{2})-error over the signal it reconstructs.

Let C>0C>0 be arbitrary. We suppose that Ax00Ax_{0}\neq 0 and V(Ax0)(Ax0)2CAx02bnoise2Ax022/2\|V-(Ax_{0})(Ax_{0})^{*}\|_{2}\leq C\|Ax_{0}\|_{2}\|b_{\rm noise}\|_{2}\leq\|Ax_{0}\|_{2}^{2}/2. Let yy be VV’s main eigenvector, normalized so that (Ax0)y=Ax02(Ax_{0})^{*}y=\|Ax_{0}\|_{2}. Then

and the constant in this last equation does not depend upon AA, x0x_{0}, CC or b2\|b\|_{2}.

Proof. We use [El Karoui and d’Aspremont, 2009, Eq.10] for

This result is based on [Kato, 1995, Eq. 3.29], which gives a precise asymptotic expansion of uvu-v. For our purposes here, we only need the first-order term. See also Bhatia , Stewart and Sun or Stewart among others for a complete discussion. We get vu=O(E2)\|v-u\|=O(\|E\|_{2}) because if M=uuM=uu^{*}, then R=1\|R\|_{\infty}=1 in [El Karoui and d’Aspremont, 2009, Eq.10]. This implies

Note that normalizing yy differently, we would obtain yAx024Cbnoise2\|y-Ax_{0}\|_{2}\leq 4C\|b_{\rm noise}\|_{2}. We now show the main result of this section, according to which PhaseCut is “almost as stable as” Weak PhaseLift. In practice of course, the exact values of the stability constants has no importance, what matters is that they are of the same order.

Such a VPCV_{PC} must exist or PhaseCut would be DD-stable in x0x_{0}. We call VPC\sslashV_{PC}^{\sslash} the restriction of VPCV_{PC} to \mboxrange(A)\mbox{{range}}(A) (that is, the matrix such that x(VPC\sslash)y=x(VPC)yx^{*}(V_{PC}^{\sslash})y=x^{*}(V_{PC})y if x,y\mboxrange(A)x,y\in\mbox{{range}}(A) and x(VPC\sslash)y=0x^{*}(V_{PC}^{\sslash})y=0 if x\mboxrange(A)x\in\mbox{{range}}(A)^{\perp} or y\mboxrange(A)y\in\mbox{{range}}(A)^{\perp}) and VPCV_{PC}^{\perp} the restriction of VPCV_{PC} to \mboxrange(A)\mbox{{range}}(A)^{\perp}. Let us set bn,PL=VPCii\sslashAx0iib_{\rm n,PL}=\sqrt{V_{PC\,ii}^{\sslash}}-|Ax_{0}|_{ii} for i=1,...,ni=1,...,n. As VPC\sslash\mboxHn+FV_{PC}^{\sslash}\in{\mbox{\bf H}}_{n}^{+}\cap\mathcal{F}, VPC\sslashV_{PC}^{\sslash} minimizes (14) for b=Ax0+bn,PLb=|Ax_{0}|+b_{\rm n,PL} (because VPC\sslashHbV_{PC}^{\sslash}\in\mathcal{H}_{b}). Lemmas A.1 and A.2 (proven in the appendix) imply that VPC\sslash(Ax0)(Ax0)2>CAx02bn,PL2\|V_{PC}^{\sslash}-(Ax_{0})(Ax_{0})^{*}\|_{2}>C\|Ax_{0}\|_{2}\|b_{\rm n,PL}\|_{2} and bn,PL2ϵ\|b_{\rm n,PL}\|_{2}\leq\epsilon. As ϵ\epsilon is arbitrary, Weak PhaseLift is not CC-stable in x0x_{0}, which contradicts our hypotheses. Consequently, PhaseCut is (2C+22+1)(2C+2\sqrt{2}+1)-stable in x0x_{0}.

Theorem 4.7 is still true if we replace 2C+22+12C+2\sqrt{2}+1 by any D>2C+2D>2C+\sqrt{2}. We only have to replace, in the demonstration, the inequality bn,PC2Ax02\|b_{\rm n,PC}\|_{2}\leq\|Ax_{0}\|_{2} by bn,PC2αAx02\|b_{\rm n,PC}\|_{2}\leq\alpha\|Ax_{0}\|_{2} with α=D(2C+2)/(1+2)\alpha={D-(2C+\sqrt{2})}/{(1+\sqrt{2})}. Also, the demonstration of this theorem is based on the fact that, when VPCV_{PC} solves (13), one can construct some VPL=VPC\sslashV_{PL}=V_{PC}^{\sslash} close to VPCV_{PC}, which is an approximate solution of (14). It is natural to wonder whether, conversely, from a solution VPLV_{PL} of (14), one can construct an approximate solution VPCV_{PC} of (13). It does not seem to be the case. One could for example imagine setting VPC=diag(R)VPLdiag(R)V_{PC}=\mathop{\bf diag}(R)V_{PL}\mathop{\bf diag}(R), where Ri=bi/VPLiiR_{i}={b_{i}}/{\sqrt{V_{PL\,ii}}}. Then VPCV_{PC} would not necessarily minimize (13) but at least belong to Hb\mathcal{H}_{b}. But VPCVPL2\|V_{PC}-V_{PL}\|_{2} might be quite large: (14) implies that diag(VPL)b2s\|\mathop{\bf diag}(V_{PL})-b^{2}\|_{s} is small but, if some coefficients of bb are very small, some RiR_{i} may still be huge, so diag(R)≉I\mathop{\bf diag}(R)\not\approx\mathbf{I}. This does happen in practice (see § 5.5).

To conclude this section, we relate this definition of stability to the one introduced in [Candes and Li, 2012]. Suppose that AA is a matrix of random gaussian independant measurements such that E[Ai,j2]=1\textstyle\mathop{\bf E}[|A_{i,j}|^{2}]=1 for all i,ji,j. We also suppose that nc0pn\geq c_{0}p (for some c0c_{0} independent of nn and pp). In the noisy setting, Candes and Li showed that the minimizer XX of a modified version of PhaseLift satisfies with high probability

for some C0C_{0} independent of all variables. Assuming that the Weak PhaseLift solution in (14) behaves as PhaseLift in a noisy setting and that (17) also holds for Weak PhaseLift, then

Consequently, for any C>2C0A2nC>2C_{0}\frac{||A||_{\infty}^{2}}{n}, Weak PhaseLift is CC-stable in all x0x_{0}. With high probability, A2(1+1/8)n||A||_{\infty}^{2}\leq(1+1/8)n (it is a corollary of [Candes and Li, 2012, Lemma 2.1]) so Weak PhaseLift (and thus also PhaseCut) is CC-stable with high probability for some CC independent of all parameters of the problem.

5. Perturbation Results

We recall here sensitivity analysis results for semidefinite programming from Todd and Yildirim ; Yildirim , which produce explicit bounds on the impact of small perturbations in the observation vector b2b^{2} on the solution VV of the semidefinite program (11). Roughly speaking, these results show that if b2+bnoiseb^{2}+b_{noise} remains in an explicit ellipsoid (called Dikin’s ellipsoid), then interior point methods converge back to the solution in one full Newton step, hence the impact on VV is linear, equal to the Newton step. These results are more numerical in nature than the stability bounds detailed in the previous section, but they precisely quantify both the size and, perhaps more importantly, the geometry of the stability region.

6. Complexity Comparisons

Both the relaxation in PhaseLift and that in PhaseCut are semidefinite programs and we highlight below the relative complexity of solving these problems depending on algorithmic choices and precision targets. Note that, in their numerical experiments, [Candes et al., 2011b] solve a penalized formulation of problem PhaseLift, written

in the variable X\mboxHpX\in{\mbox{\bf H}}_{p}, for various values of the penalty parameter λ>0\lambda>0.

The trace norm promotes a low rank solution, and solving a sequence of weighted trace-norm problems has been shown to further reduce the rank in [Fazel et al., 2003; Candes et al., 2011b]. This method replaces Tr(X)\mathop{\bf Tr}(X) by Tr(WkX)\mathop{\bf Tr}(W_{k}X) where W0W_{0} is initialized to the identity II. Given a solution XkX_{k} of the resulting semidefinite program, the weighted matrix is updated to Wk+1=(Xk+ηI)1W_{k+1}=(X_{k}+\eta I)^{-1} (see Fazel et al. for details). We denote by KK the total number of such iterations, typically of the order of 1010. Trace minimization is not needed for the semidefinite program (PhaseCut), where the trace is fixed because we optimize over a normalized phase vector. However, weighted trace-norm iterations could potentially improve performance in PhaseCut as well.

Recall that pp is the size of the signal and nn is the number of measured samples with n=Jpn=Jp in the examples reviewed in Section 5. In the numerical experiments in [Candes et al., 2011b] as well as in this paper, J=3,4,5J=3,4,5. The complexity of solving the PhaseCut and PhaseLift relaxations in PhaseLift using generic semidefinite programming solvers such as SDPT3 [Toh et al., 1999], without exploiting structure, is given by

for PhaseCut and PhaseLift respectively [Ben-Tal and Nemirovski, 2001, § 6.6.3]. The fact that the constraint matrices have only one nonzero coefficient in PhaseCut can be exploited (the fact that the constraints aiaia_{i}a_{i}^{*} are rank one in PhaseLift helps, but it does not modify the principal complexity term) so we get

for PhaseCut and PhaseLift respectively using the algorithm in Helmberg et al. for example. If we use first-order solvers such as TFOCS [Becker et al., 2012], based on the optimal algorithm in [Nesterov, 1983], the dependence on the dimension can be further reduced, to become

for solving a penalized version of the PhaseCut relaxation and the penalized formulation of PhaseLift in (18). While the dependence on the signal dimensions pp is somewhat reduced, the dependence on the target precision grows from log(1/ϵ)\log(1/\epsilon) to 1/ϵ1/\epsilon. Finally, the iteration complexity of the block coordinate descent Algorithm 3 is substantially lower and its convergence is linear, but no fully explicit bounds on the number of iterations are known in our case. The complexity of the method is then bounded by O(log1ϵ)O\left(\log\frac{1}{\epsilon}\right) but the constant in this bound depends on nn here, and the dependence cannot be quantified explicitly.

Algorithmic choices are ultimately guided by precision targets. If ϵ\epsilon is large enough so that a first-order solver or a block coordinate descent can be used, the complexity of PhaseCut is not significantly better than that of PhaseLift. On the contrary, when ϵ\epsilon is small, we must use an interior point solver, for which PhaseCut’s complexity is an order of magnitude lower than that of PhaseLift because its constraint matrices are singletons. In practice, the target value for ϵ\epsilon strongly depends on the sampling matrix AA. For example, when AA corresponds to the convolution by 66 Gaussian random filters (§5.2), to reconstruct a Gaussian white noise of size 6464 with a relative precision of η\eta, we typically need ϵ2.101η\epsilon\sim 2.10^{-1}\eta. For 44 Cauchy wavelets (§5.3), it is twenty times less, with ϵ102η\epsilon\sim 10^{-2}\eta. For other types of signals than Gaussian white noise, we may even need ϵ103η\epsilon\sim 10^{-3}\eta.

7. Greedy Refinement

8. Sparsity

which is a semidefinite program in the (larger) matrix variable U\mboxHpU\in{\mbox{\bf H}}_{p}, with M3M_{3} the matrix containing MM in its upper left block and zeros elsewhere, and V=(Adiag(b),F)V=(A^{\dagger}\mathop{\bf diag}(b),F).

Numerical results

In this section, we compare the numerical performance of the Gerchberg-Saxton (greedy), PhaseCut and PhaseLift algorithms on various phase recovery problems. As in [Candes et al., 2011b], the PhaseLift problem is solved using the package in [Becker et al., 2012], with reweighting, using K=10K=10 outer iterations and 10001000 iterations of the first order algorithm. The PhaseCut and Gerchberg-Saxton algorithms described here are implemented in a public software package available at

http://www.cmap.polytechnique.fr/scattering/code/phaserecovery.zip

We also record the error over measured amplitudes, written

The discrete Fourier transform y^\hat{y} of a signal yy of qq coefficients is written

In X-ray crystallography or diffraction imaging experiments, compactly supported signals are estimated from the amplitude of Fourier transforms oversampled by a factor J2J\geq 2. The corresponding operator AA computes an oversampled discrete Fourier transform evaluated over n=Jpn=Jp coefficients. The signal xx of size pp is extended into xJx^{J} by adding (J1)p(J-1)p zeros and

2. Multiple Random Illumination Filters

To guarantee uniqueness of the phase recovery problem, one can add independent measurements by “illuminating” the object through J filters hjh^{j} in the context of X-ray imaging or crystallography [Candes et al., 2011a]. The resulting operator AA is the discrete Fourier transform of xx multiplied by each filter hjh^{j} of size pp

where x^h^j\hat{x}\star\hat{h}^{j} is the circular convolution between x^\hat{x} and h^j\hat{h}^{j}. Candes et al. [2011a] proved that, for some constant C>0C>0 large enough, CpCp Gaussian independent measurements are sufficient to perfectly reconstruct a signal of size pp, with high probability. Similarly, we would expect that, picking the filters hjh^{j} as realizations of independent Gaussian random variables, perfect reconstruction will be guaranteed with high probability if JJ is large enough (and independent of pp). This result has not yet been proven because Gaussian filters do not give independent measurements but Candes et al. [2011b] observed that, empirically, for signals of size p=128p=128, with J=4J=4 filters, perfect recovery is achieved in 100%100\% of experiments.

3. Wavelet Transform

Phase recovery problems from the modulus of wavelet coefficients appear in audio signal processing where this modulus is used by many audio and speech recognition systems. These moduli also provide physiological models of cochlear signals in the ear [Chi et al., 2005] and recovering audio signals from wavelet modulus coefficients is an important problem in this context.

To simplify experiments, we consider wavelets dilated by dyadic factors 2j2^{j} which have a lower frequency resolution than audio wavelets. A discrete wavelet transform is computed by circular convolutions with discrete wavelet filters, i.e.

Numerical computations are performed with a Cauchy wavelet whose Fourier transform is

up to a scaling factor, with d=5d=5. To guarantee that AA is an invertible operator, the lowest signal frequencies are carried by a suitable low-pass filter ϕ\phi and

The PhaseCut algorithm reconstructs exactly all test signals. Moreover, the recovered matrix UU is always of rank one and it is therefore not necessary to refine the solution with Gerchberg-Saxton iterations. At first sight, this difference in performance between PhaseCut and PhaseLift may seem to contradict the equivalence results of §4.3 (which are valid when xx is real and when xx is complex). It can be explained however by the fact that 1010 steps of reweighing and 10001000 inner iterations per step are not enough to let PhaseLift fully converge. In these experiments, the precision required to get perfect reconstruction is very high and, consequently, the number of first-order iterations required to achieve it is too large (see §4.6). With an interior-point-solver, this number would be much smaller but the time required per iteration would become prohibitively large. The much simpler structure of the PhaseCut relaxation allows us to solve these larger problems more efficiently.

4. Impact of Trace Minimization

We saw in §4.1 that, in the absence of noise, PhaseCut was very similar to a simplified version of PhaseLift, Weak PhaseLift, in which no trace minimization is performed. Here, we confirm empirically that Weak PhaseLift and PhaseLift are essentially equivalent. Minimizing the trace is usually used as rank minimization heuristic, with recovery guarantees in certain settings [Fazel et al., 2003; Candes and Recht, 2008; Chandrasekaran et al., 2012] but it does not seem to make much difference here. In fact, Demanet and Hand recently showed that in the independent experiments setting, Weak PhaseLift has a unique (rank one) solution with high probability, i.e. the feasible set of PhaseLift is a singleton and trace minimization has no impact. Of course, from a numerical point of view, solving the feasibility problem Weak PhaseLift is about as hard as solving the trace minimization problem PhaseLift, so the result [Demanet and Hand, 2012] simplifies analysis but does not really affect numerical performance.

Figure 3 compares the performances of PhaseLift and Weak PhaseLift as a function of nn (the number of measurements). We plot the percentage of successful reconstructions (left) and the percentage of cases where the relaxation was exact, i.e. the reconstructed matrix XX was rank one (right). The plot shows a clear phase transitions when the number of measurements increases. For PhaseLift, these transitions happen respectively at n=1552.5pn=155\approx 2.5p and n=2854.5pn=285\approx 4.5p, while for Weak PhaseLift, the values become n=1702.7pn=170\approx 2.7p and n=2954.6pn=295\approx 4.6p, so the transition thresholds are very similar. Note that, in the absence of noise, Weak PhaseLift and PhaseCut have the same solutions, up to a linear transformation (see §4.2), so we can expect the same behavior in the comparison PhaseCut versus PhaseCutMod.

5. Reconstruction in the Presence of Noise

Numerical stability is crucial for practical applications. In this last subsection, we suppose that the vector bb of measurements is of the form

When all coefficients of AxAx have approximately the same amplitude, PhaseLift and PhaseCut produce similar results, but when AxAx is sparse, PhaseLift appears less stable. We gave a qualitative explanation of this behavior at the end of §4.4 which seems to be confirmed by the results in Figure 4. This boils down to the fact that the values of the phase variables in PhaseCut corresponding to zeros in bb can be set to zero so the problem becomes much smaller. Indeed, the performance of PhaseLift and PhaseCut are equivalent in the case of Gaussian random filters (where measurements are never sparse), they are a bit worse in the case of sinusoids (where measurements are sometimes sparse) and quite unsatisfactory for scan-lines of images (where measurements are always sparse).

5.2. Multiple random illumination filters

The same result seems to hold for AA corresponding to Gaussian random illumination filters (cf. §5.2). Moreover, PhaseCut is as stable as PhaseLift. Actually, up to 20%20\% of noise, when followed by some Gerchberg-Saxton iterations, PhaseCut and PhaseLift almost always reconstruct the same function. Figure 5 displays the corresponding empirical performance, confirming that both algorithms are stable. The relative reconstruction errors are approximately linear in the amount of noise, with

In the non-sparse case, both algorithms yield very similar error ϵ7bnoise2/Ax2\epsilon\approx 7{\|b_{\rm noise}\|_{2}}/{\|Ax\|_{2}} (the difference for a relative noise of 10410^{-4} may come from a computational artifact). In the sparse case, there are less phases to reconstruct, because we do not need to reconstruct the phase of null measurements. Consequently, the problem is better constrained and we expect the algorithms to be more stable. Indeed, the relative errors over the reconstructed matrices are smaller. However, in this case, the performance of PhaseLift and PhaseCut do not match anymore: ϵ3bnoise2/Ax2\epsilon\approx 3{\|b_{\rm noise}\|_{2}}/{\|Ax\|_{2}} for PhaseLift and ϵ1.2bnoise2/Ax2\epsilon\approx 1.2{\|b_{\rm noise}\|_{2}}/{\|Ax\|_{2}} for PhaseCut. This remark has no practical impact in our particular example here because taking a few Gerchberg-Saxton iterations would likely make both methods converge towards the same solution, but it confirms the importance of accounting for the sparsity of Ax|Ax|.

Acknowledgments

The authors are grateful to Richard Baraniuk, Emmanuel Candès, Rodolphe Jenatton, Amit Singer and Vlad Voroninski for very constructive comments. In particular, Vlad Voroninski showed in [Voroninski, 2012] that the argument in the first version of this paper, proving that PhaseCutMod is tight when PhaseLift is, could be reversed under mild technical conditions and pointed out an error in our handling of sparsity constraints. AA would like to acknowledge support from a starting grant from the European Research Council (project SIPA), and SM acknowledges support from ANR grant BLAN 012601.

Appendix A Technical lemmas

We now prove two technical lemmas used in the proof of Theorem 4.7.

Under the assumptions and notations of Theorem 4.7, we have

We also have TrVPC\sslash=TrVPCTrVPC\mathop{\bf Tr}V_{PC}^{\sslash}=\mathop{\bf Tr}V_{PC}-\mathop{\bf Tr}V_{PC}^{\perp}. This equality comes from the fact that, if {fi}\{f_{i}\} is an hermitian base of \mboxrange(A)\mbox{{range}}(A) and {gi}\{g_{i}\} an hermitian base of \mboxrange(A)\mbox{{range}}(A)^{\perp}, then

As VPC0V_{PC}^{\perp}\succeq 0, TrVPC\sslashTrVPC=Ax0+bn,PC22\mathop{\bf Tr}V_{PC}^{\sslash}\leq\mathop{\bf Tr}V_{PC}=\||Ax_{0}|+b_{\rm n,PC}\|_{2}^{2} and, by combining this with relations (21) and (A), we get

And, by reminding that we assumed bn,PC2Ax02\|b_{\rm n,PC}\|_{2}\leq\|Ax_{0}\|_{2},

Under the assumptions and notations of Theorem 4.7, we have bn,PL22bn,PC\|b_{\rm n,PL}\|_{2}\leq 2\|b_{\rm n,PC}\|.

Because fiVPCgifiVPCfigiVPCgi=VPCii\sslashVPCii|f_{i}^{*}V_{PC}g_{i}|\leq\sqrt{f_{i}^{*}V_{PC}f_{i}}\sqrt{g_{i}^{*}V_{PC}g_{i}}=\sqrt{V_{PC\,ii}^{\sslash}}\sqrt{V_{PC\,ii}^{\perp}},

References