Phase retrieval for imaging problems

Fajwel Fogel, Irène Waldspurger, Alexandre d'Aspremont

Introduction

Phase retrieval seeks to reconstruct a complex signal, given a number of observations on the magnitude of linear measurements, i.e. solve

Because the phase constraint $|Ax|=b$ is nonconvex, the phase recovery problem (1) is non-convex. Several greedy algorithms have been developed (see [Gerchberg and Saxton, 1972; Fienup, 1982; Griffin and Lim, 1984; Bauschke et al., 2002] among others), which alternate projections on the range of $A$ and on the nonconvex set of vectors $y$ such that $|y|=|Ax|$ . While empirical performance is often good, these algorithms can stall in local minima. A convex relaxation was introduced in [Chai et al., 2011] and [Candes et al., 2011] (who call it PhaseLift) by observing that $|Ax|^{2}$ is a linear function of $X=xx^{*}$ which is a rank one Hermitian matrix, using the classical lifting argument for nonconvex quadratic programs developed in [Shor, 1987; Lovász and Schrijver, 1991]. The recovery of $x$ is thus expressed as a rank minimization problem over positive semidefinite Hermitian matrices $X$ satisfying some linear conditions, i.e. a matrix completion problem. This last problem has received a significant amount of attention because of its link to compressed sensing and the NETFLIX collaborative filtering problem. This minimum rank matrix completion problem is approximated by a semidefinite program which has been shown to recover $x$ for several (random) classes of observation operators $A$ [Candes et al., 2011, 2013a, 2013b].

On the algorithmic side, [Waldspurger et al., 2012] showed that the phase retrieval problem (1) can be reformulated in terms of a single phase variable, which can be read as an extension of the MAXCUT combinatorial graph partitioning problem over the unit complex torus, allowing fast algorithms designed for solving semidefinite relaxations of MAXCUT to be applied to the phase retrieval problem.

On the experimental side, phase recovery is a classical problem in Fourier optics for example [Goodman, 2008], where a diffraction medium takes the place of a lens. This has a direct applications in X-ray and crystallography imaging, diffraction imaging or microscopy [Harrison, 1993; Bunk et al., 2007; Johnson et al., 2008; Miao et al., 2008; Dierolf et al., 2010].

Here, we implement and study several efficient convex relaxation algorithms for phase retrieval on imaging problem instances where $A$ is based on a Fourier operator. We show in particular how structural assumptions on the signal and the observations (e.g. sparsity, smoothness, positivity, known support, oversampling, etc.) can be exploited to both speed-up convergence and improve recovery performance. While no experimental data is available from diffraction imaging problems with multiple randomly coded illuminations, we simulate numerical experiments of this type using molecular density information from the protein data bank [Berman et al., 2002]. Our results show in particular that the convex relaxation is stable and that in some settings, as few as two random illuminations suffice to reconstruct the image.

The paper is organized as follows. Section 2 briefly recalls the structure of some key algorithms used in phase retrieval. Section 3 describes applications to imaging problems and how structural assumptions can sifnificantly reduce the cost of solving large-scale instances and improve recovery performance. Section 4 details some numerical experiments while Section 5 describes the interface to the numerical library developed for these problems.

Algorithms

In this section, we briefly recall several basic algorithmic approaches to solve the phase retrieval problem (1). Early methods were all based on extensions of an alternating projection algorithm. However, recent results showed that phase retrieval could be interpreted as a matrix completion problem similar to the NETFLIX problem, a formulation which yields both efficient convex relaxations and recovery guarantees.

The phase retrieval problem (1) can be rewritten

The algorithm Gerchberg-Saxton by [Gerchberg and Saxton, 1972] for instance seeks to reconstruct $y=Ax$ and alternates between orthogonal projections on the range of $A$ and normalization of the magnitudes $|y|$ to match the observations $b$ . The cost per iteration of this method is minimal but convergence (when it happens) is often slow.

A classical “input-output” variant, detailed here as algorithm Fienup, introduced by [Fienup, 1982] adds an extra penalization step which usually speeds up convergence and improves recovery performance when additional information is available on the support of the signal. Oversampling the Fourier transform forming $A$ in imaging problems usually helps performance as well. Of course, in all these cases, convergence to a global optimum cannot be guaranteed but empirical recovery performance is often quite good.

2. PhaseLift: semidefinite relaxation in signal

Using a classical lifting argument by [Shor, 1987], and writing

[Chai et al., 2011; Candes et al., 2011] reformulate the phase recovery problem (1) as a matrix completion problem, written

in the variable $X\in{\mbox{\bf H}}_{p}$ , where $X=xx^{*}$ when exact recovery occurs. This last problem can be relaxed as

which is a semidefinite program (called PhaseLift by Candes et al. ) in the variable $X\in{\mbox{\bf H}}_{p}$ . This problem is solved in [Candes et al., 2011] using first order algorithms implemented in [Becker et al., 2011]. This semidefinite relaxation has been shown to recover the true signal $x$ exactly for several classes of observation operators $A$ [Candes et al., 2011, 2013a, 2013b].

3. PhaseCut: semidefinite relaxation in phase

Now, given the phase, signal reconstruction is a simple least squares problem, i.e. given $u$ we obtain $x$ as

where $A^{\dagger}$ is the pseudo inverse of $A$ . Replacing $x$ by its value in (3), the phase recovery problem becomes

is positive semidefinite. This problem is non-convex in the phase variable $u$ . [Waldspurger et al., 2012] detailed greedy algorithm Greedy to locally optimize (5) in the phase variable.

A convex relaxation to (5) was also derived in [Waldspurger et al., 2012] using the classical lifting argument for nonconvex quadratic programs developed in [Shor, 1987; Lovász and Schrijver, 1991]. This relaxation is written

which is a semidefinite program (SDP) in the matrix $U\in{\mbox{\bf H}}_{n}$ . This problem has a structure similar to the classical MAXCUT relaxation and instances of reasonable size can be solved using specialized implementations of interior point methods designed for that problem [Helmberg et al., 1996]. Larger instances are solved in [Waldspurger et al., 2012] using the block-coordinate descent algorithm BlockPhaseCut.

Ultimately, algorithmic choices heavily depend on problem structure, and these will be discussed in detail in the section that follows. In particular, we will study how to exploit structural information on the signal (nonnegativity, sparse 2D FFT, etc.), to solve realistically large instances formed in diffraction imaging applications.

Imaging problems

In the imaging problems we study here, various illuminations of a single object are performed through randomly coded masks, hence the matrix $A$ is usually formed using a combination of random masks and Fourier transforms, and we have significant structural information on both the signal we seek to reconstruct (regularity, etc.) and the observations (power law decay in frequency domain, etc.). Many of these additional structural hints can be used to speedup numerical operations, convergence and improve phase retrieval performance. The paragraphs that follow explore these points in more detail.

and the pseudo-inverse of $A$ also has a simple structure, with

where $I^{\prime}_{l}$ is the dual filter of $I_{l}$ , which is

Convolution with $\delta_{i}$ is only a shift and vectors $\mathcal{F}(I_{s}\circ I^{\prime}_{l})$ may be precomputed so this operation is very fast.

2. Low rank iterates

3. Bounded support

The situation is a lot simpler in some of the molecular imaging problems we are studying below since the electron density we are trying to recover is often smooth, which means that its Fourier transform will be sparse, with known support. While we lose the phase, we do observe the magnitude of the Fourier coefficients so we can rank them by magnitude. This allows us to considerably reduce the size of the SDP relaxation without losing much reconstruction fidelity, i.e. in many cases we observe that a significant fraction of the coefficients of $b$ are close to zero. From a computational point of view, sparsity in $b$ allows us to solve a truncated semidefinite relaxation (PhaseCut). See Figure 1 for an illustration of this phenomenon on the caffeine molecule.

Indeed, without loss of generality, we can reorder the observations $b$ such that we approximately have $b=(b_{1}^{T},0)^{T}$ . Similarly, we note

Using the fact that $b_{2}=0$ , the matrix $M$ in the objective of (5) can itself be written in blocks, that is

Since $b_{2}=0$ , any complex vector with coefficients of magnitude one can be taken for $u_{2}$ and the optimization problem (5) is equivalent to

is positive semidefinite. This problem can in turn be relaxed into a PhaseCut problem which is usually considerably smaller than the original (PhaseCut) problem since $M_{1}$ is typically a fraction of the size of $M$ .

4. Real, positive densities

In some cases, such as imaging experiments where a random binary mask is projected on an object for example, we know that the linear observations are formed as the Fourier transform of a positive measure. This introduces additional restrictions on the structure of these observations, which can be written as convex constraints on the phase vector. We detail two different ways of accounting for this positivity assumption.

In the case where the signal is real and nonnegative, [Waldspurger et al., 2012] show that problem (3) can modified to specifically account for the fact that the signal is real, by writing it

using the operator ${\mathcal{T}}(\cdot)$ defined as

we can rewrite the phase problem on real valued signal as

The optimal solution of the inner minimization problem in $x$ is given by $x=A_{2}^{\dagger}B_{2}v$ , where

which is a semidefinite program in the variable $V\in{\mbox{\bf S}}_{2n}$ , where

Because $x=A_{2}^{\dagger}B_{2}v$ for real instances, we can add a nonnegativity constraint to this relaxation, using

which is a semidefinite program in $V\in{\mbox{\bf S}}_{2n}$ .

4.2. Bochner’s theorem and the Fourier transform of positive measures

Proof. See [Berg et al., 1984] for example.

then when $\mathcal{F}x=\mathop{\bf diag}(b)u$ , Bochner’s theorem states that $B(\mathop{\bf diag}(b)u)\succeq 0$ iff $x\geq 0$ . The contraint $B(\mathop{\bf diag}(b)u)\succeq 0$ is a linear matrix inequality in $u$ , hence is convex.

and the phase retrieval problem (5) for positive signals $x$ is now written

The SDP relaxation PhaseCut+ cannot be solved using block coordinate descent. Without positivity constraints, the relaxation PhaseCutR designed for real signals can be solved efficiently using the algorithm in [Helmberg et al., 1996]. The constraint structure in PhaseCutR means that the most expensive step at each iteration of the algorithm in [Helmberg et al., 1996] is computing the inverse of a symmetric matrix of dimension $n$ (or less, exploiting sparsity in $b$ ). Sparse instances of the more complex relation PhaseCut+ were solved using SDPT3 [Toh et al., 1999] in what follows.

Numerical Experiments

We study molecular imaging problems based on electronic densities obtained from the Protein Data Bank [Berman et al., 2002]. From a 3D image, we obtain a 2D projection by integrating the third dimension. After normalizing these images, we simulate multiple diffraction observations for each molecule, using several random masks. Here, our masks consist of randomly generated binary filters placed before the sample, but other settings are possible [Candes et al., 2013b]. Our vector of observations then corresponds to the magnitude of the Fourier transform of the componentwise product of the image and the filter. As in the SPSIM package [Maia, 2013] simulating diffraction imaging experiments, random Poisson noise is added to the observations, modeling sensor and electronic noise. More specifically, the noisy intensity measurements are obtained using the following formula,

We present numerical experiments on two molecules from the Protein Data Bank (PDB), namely caffeine and cocaine, with very different structure (properly projected, caffeine is mostly circular, cocaine has a star shape). Images of the caffeine and cocaine molecules at low and high resolutions are presented in Figure 2. We first use “high” $128\times 128$ pixels resolutions to evaluate the sensitivity of PhaseCut to noise and number of masks using the fast BlockPhaseCut algorithm (see section 4.1). We then use a “low” $16\times 16$ pixels image resolution to compare PhaseCut formulations using structural constraints, i.e. complex PhaseCut, real PhaseCutR, and PhaseCut+ (with positivity constraints, see section 4.2) on a large number of random experiments.

We first compare the results obtained by the Fienup and BlockPhaseCut algorithms while varying the number of masks and the noise level. For the PhaseCut relaxation, in order to deal with the large size of the lifted matrix, we use the low rank approximation described in §3.2 to store iterates and exploit sparsity in the magnitude of the observations vector described as described in §3.3. Tables 1 and 2 illustrate the impact of image resolution and oversampling on the fraction of coefficients required to approximately reconstruct a molecular density up to a given quality threshold, for various molecules. We observe that the sparsity of 2D FFTs increases with resolution and oversampling, but varies from one molecule to another. We then retrieve the phase vector as the first eigenvector in the final low rank approximation, then refine it with the greedy algorithms Greedy or Fienup.

More specifically, in the experiments that follow, the image was of size $128\times 128$ , we used a rank of two for the low rank approximation, kept the largest 1000 observations, did 5000 iterations of algorithm Fienup, and 20 cycles of algorithm BlockPhaseCut (one cycle corresponds to optimizing once over all rows/columns of the lifted matrix). We compared the results of the phase recovery using one to four masks, and three different levels of Poisson noise (no noise, “small” noise, “large” noise). In all settings, all points of the electronic density were illuminated at least once by the random masks (the first mask lets all the signal go through). The noisy (Poisson) intensity measurements were obtained using the formula described above. Experiments were performed on a regular Linux desktop using Matlab for the greedy algorithms and a C implementation of the block coordinate algorithm for PhaseCut. Reported CPU times are in seconds.

1.2. Results

In most cases both algorithm Fienup and BlockPhaseCut seem to converge to the (global) optimal solution, though Fienup is much faster. In some cases however, such as the experiment with two filters and no noise in Figure 3, initializing algorithm Fienup with the solution from BlockPhaseCut significantly outperforms the solution obtained by algorithm Fienup alone, which appears to be stuck in a local minimum. The corresponding MSE values are listed in Table 3. In Figure 5 we plot the histogram of MSE for the noiseless case with only two illuminations, using either algorithm Fienup, or BlockPhaseCut followed by greedy refinements, over many random illumination configurations. We observe that in many samples, algorithm Fienup gets stuck in a local optimum, while the SDP always converges to a global optimum.

2. Performance of PhaseCut relaxations with respect to number of masks, noise and filter resolution

We now compare PhaseCut formulations with structural constraints, i.e. complex PhaseCut, real PhaseCutR, and PhaseCut+ (with positivity constraints, see section 4.2) on a large number of random experiments formed using “low” $16\times 16$ pixels image resolution.

Masks are of resolution $1\times 1$ and no noise is added. As shown in Figures 6 and 7, PhaseCut, PhaseCutR and PhaseCut+ with Fienup post processing (respectively “SDP + Fienup HIO”, “SDP + real + Fienup HIO”, “SDP + real + toeplitz + Fienup HIO” and “Fienup HIO” curves on the figure) all outperform Fienup alone. For PhaseCut, in most cases, two to three masks seem enough to exactly recover the phase. Moreover, as expected, PhaseCutR performs a little bit better than PhaseCut, but surprisingly, positivity constraints of PhaseCut+ do not seem to improve the solution of PhaseCutR in these experiments. Finally, as shown in Figures 8 and 9, oversampling the Fourier transform seems to have a positive impact on the reconstruction. Results on caffeine and cocaine are very similar.

2.2. Varying mask resolution

Here, two or three masks are used and no noise is added. As shown in Figures 10, 11, 12 and 13, we can see that the MSE of reconstructed images increase with the resolution of masks. Moreover PhaseCutR is more robust to lower mask resolution than PhaseCut. Finally, as expected, with more randomly masked illuminations, we can afford to lower mask resolution.

2.3. Varying noise levels

Here two masks are used (the minimum), with resolution $1\times 1$ . Poisson noise is added (parameterized by $\alpha$ ). As shown in Figures 14, and 15, we can see that PhaseCut and PhaseCutR are stable with regards to noise, i.e. we obtain a linear increase of the log MSE with respect to the log noise.

User guide

We provide here the instructions to artificially recover the image of a molecule from the Protein Data Bank using PhaseCutToolbox (download at www.di.ens.fr/~aspremon). This example is entirely reproduced with comments in the script testPhaseCut.m.

Our toolbox works on all recent versions of MATLAB on Mac OS X, and on MATLAB versions anterior to 2008 on Linux (there might be conflicts with Arpack library for ulterior versions, when using BlockPhaseCut). Installation only requires to put the toolbox folder and subdirectories on the Matlab path. Use for instance the command:

where MYPATH is the directory where you have copied the toolbox.

2. Generate the diffraction pattern of a molecule

Suppose we work with the caffeine molecule, on an image of resolution $128\times 128$ pixels. We set the corresponding input variables.

Now, we set the parameters of the masks. The number of masks (also called filters or illuminations) is set to 2. Moreover we set the filter resolution to 1. The filter resolution corresponds to the square root of the number of pixels in each block of the binary filter. The filter resolution must divide N (the square root of the number of pixels in the image).

Since the filters are generated randomly, we set the seed of the uniform random generator to 1 in order to get reproducible experiments. Note that the quality of the phase retrieval may depend on the shape of the generated masks, especially when using only 2 or 3 filters.

Now we can generate an image, 2 masks and their corresponding diffraction patterns. We set the level of noise on the observations to zero here (i.e. no noise). $\alpha$ is the level of Poisson noise, and $\beta$ is the level of Gaussian noise.

We set the oversampling parameter for the Fourier transform to 2.

The total number of observations, i.e. the size of the vector $b$ is

Suppose that we want to use only the first largest one thousand observations in PhaseCut, we set

Note that the number of observations that is sufficient to get close to the optimal solution depends on the size of the data N and the sparsity of the vector b. From a more practical point of view, the larger nbObsKept, the more time intensive the optimization. Therefore, for a quick test we recommend setting nbObsKept to a few thousands, then increasing it if the results are not satisfying.

Finally we call the function genData which is going to generate both the image x we want to recover, filters, and observations b. bs corresponds to the thousand largest observations, xs is the image recovered with the true phase but using only bs. idx_bs is the logical indicator vector of bs (bs=b(idx_bs)). We put displayFig to 1 in order to display the filters, the images of the molecule x and xs, as well as the diffraction patterns (with and without noise).

3. Phase Retrieval using Fienup and/or PhaseCut

Using the data generated in the previous section, we retrieve the phase of the observations vector b. Suppose we want to use the SDP relaxation with greedy refinement, we set

The other choices for method are ’Fienup’, ’Fienup HIO’ and ’SDP’ (no greedy refinement). We set the initial (full) phase vector $u$ to the vector of ones, and the number of iterations for Fienup algorithm to 5000. The number of iterations for Fienup algorithm must be large enough so that the objective function converges to a stationary point. In most cases 5000 iterations seems to be enough.

We also need to choose which algorithm we want to use in order to solve the SDP relaxation. For high resolution images, we recommend to always use the block coordinate descent algorithm with a low rank approximation of the lifted matrix (BCDLR), since interior points methods (when using SDPT3 or Mosek) and block coordinate descent without low rank approximation (BCD) become very slow when the number of observations used is over a few thousands.

If we had wanted to solve PhaseCutR or PhaseCut+ we would have set

We can now set up the parameters for the BCDLR solver.

One cycle corresponds to optimizing over all the columns of the lifted matrix. In most cases, it seems that using nbCycles between 20 and 40 is enough to get close to the optimum, at least when refining the solution with Fienup algorithm. r is the rank for the low rank approximation of the lifted matrix. Similarly it seems that r between 2 and 4 gives reasonable results. Note that you can check that the low rank approximation is valid by looking at the maximum ratio between the last and the first eigenvalues throughout all iterations of the BCDLR algorithm. This ratio is outputted as relax.eigRatio when calling the function retrievePhase (see below). We finally call the function retrievePhase in order to solve the SDP relaxation with greedy refinement.

The function retrievePhase outputs the vector of retrieved phase as retrievedPhase and the values of the objective function at each iteration/cycle of the algorithm in objValues (add .Fienup, .SDP .SDPREfined to retrievedPhase and objValues to get the corresponding retrieved phase and objective value). If using the SDP relaxation, the vector retrievedPhase is the first eigenvector of the final lifted matrix in PhaseCut. Note that the objective value in Fienup and in the SDP relaxation do not correspond exactly since the lifted matrix may be of rank bigger than one during the iterations of the BCDLR. Therefore we also output finalObj, which is the objective value of the phase vector extracted from the lifted matrix (i.e. the vector retrievedPhase). The image can now be retrieved using the command

Finally you can visualize the results using the following standard Matlab commands, plotting the objective values

4. Reproduceing the experiments of the paper

All the numerical experiments of the paper can be reproduced using the Matlab scripts included in the toolbox directory Experiments.

phaseTransition_OSF1.m (evolution of MSE with number of filters, with no oversampling of the Fourier transform, Figures 6, 7)

phaseTransition_OSF2.m (evolution of MSE with number of filters, with oversampling of the Fourier transform Figures 8, 9)

filterResTransition.m (evolution of MSE with filter resolution, figures 12, 13, 10, 11).

noiseTransition.m (evolution of MSE with noise, Figures 14, 15)

testNoiseNbIllums.m (test noise vs number of filters, Figures 3 and 4, and table 3)

testSeeds.m (test different seeds to generate filters, Figure 5)

Acknowledgments

AA and FF would like to acknowledge support from a starting grant from the European Research Council (project SIPA).