Iterative Reconstruction of Rank-One Matrices in Noise

Alyson K. Fletcher, Sundeep Rangan

Introduction

where W\mathbf{W} represents some unknown noise and m\sqrt{m} is a normalization factor. The problem can be considered as a rank-one special case of finding a low-rank matrix in the presence of noise. Such low-rank estimation problems arise in a range of applications including blind channel estimation , antenna array processing , subspace system identification , and principal component or factor analysis .

When the noise term W\mathbf{W} is zero, the vector pair (u0,v0)(\mathbf{u}_{0},\mathbf{v}_{0}) can be recovered exactly, up to a scaling, from the maximal left and right singular vectors of A\mathbf{A} . However, in the presence of noise, the rank-one matrix can in general only be estimated approximately. In this case, a priori information or constraints on (u0,v0)(\mathbf{u}_{0},\mathbf{v}_{0}) may improve the estimation. Such constraints arise, for example, in factor analysis in statistics, where one of the factors is often constrained to be either positive or sparse . Similar sparsity constraints occur in the problem of dictionary learning . Also, in digital communications, one of the factors could come from a discrete QAM constellation.

In this paper, we impose the constraints on u0\mathbf{u}_{0} and v0\mathbf{v}_{0} in a Bayesian setting where u0\mathbf{u}_{0} and v0\mathbf{v}_{0} are assumed to be independent from one another with i.i.d. densities of the form

for some scalar random variables U0U_{0} and V0V_{0}. The noise W\mathbf{W} in (1) is assumed to have i.i.d. Gaussian components where WijN(0,τw)W_{ij}\sim{\mathcal{N}}(0,\tau_{w}) for some variance τw>0\tau_{w}>0. Under this model, the posterior density of u0\mathbf{u}_{0} and v0\mathbf{v}_{0} is given by

where F\|\cdot\|_{F} is the Frobenius norm. Given this posterior density, we consider two estimation problems:

MAP estimation: In this problem, we wish to find the maximum a posteriori estimates

MMSE estimation: In this problem, we wish to find the posterior mean or, equivalently, the minimum mean squared error estimate,

We may also be interested in the posterior variances and posterior marginals.

Exact computation of either of these estimates is generally intractable. Even though the components u0\mathbf{u}_{0} and v0\mathbf{v}_{0} are assumed to have independent components, the posterior density (3) is not, in general, separable due to the term u0v0T\mathbf{u}_{0}\mathbf{v}_{0}^{T}. Thus, the MAP estimate requires a search over mm and nn-dimensional vectors, (u0,v0)(\mathbf{u}_{0},\mathbf{v}_{0}) and the MMSE estimate requires integration over this m+nm+n-dimensional space. However, due to the separability assumption on the priors (2), it is computationally simpler to alternately estimate the factors u0\mathbf{u}_{0} and v0\mathbf{v}_{0}. This form of alternating estimation is used, for example, in the classic alternating power method for finding maximal singular values and some alternating methods in sparse or non-negative dictionary learning .

The approach in this work also uses a form of alternating estimation, but based on a recently-developed powerful class of algorithms known as Approximate Message Passing or AMP. AMP methods are derived from Gaussian and quadratic approximations of loopy belief propagation in graphical models and were originally used for problems in compressed sensing . The AMP methodology has been successfully applied in a range of applications . In recent years, there has been growing interest of AMP for matrix factorization problems as well . The problem considered in this paper can be considered as a simple rank one case of these problems.

Our main contribution in this work is to show that, for the rank one MMSE and MAP estimation problems, the proposed IterFac algorithm admits an asymptotically-exact characterization in the limit when m,nm,n\rightarrow\infty with m/nm/n constant and the components of the true vectors u0\mathbf{u}_{0} and v0\mathbf{v}_{0} have limiting empirical distributions. In this scenario, we show that the empirical joint distribution of the components of u0\mathbf{u}_{0} and v0\mathbf{v}_{0} and the corresponding estimates from the IterFac algorithm are described by a simple scalar equivalent model where the IterFac component estimates are identically distributed to scalar estimates of the variables corrupted by Gaussian noise. Moreover, the effective Gaussian noise level in this model is described by a simple set of scalar state evolution (SE) equations. This scalar equivalent model is common in analyses of AMP methods as well as replica analyses of related estimation problems . From the scalar equivalent model, one can compute the asymptotic value of almost any component-separable metric including mean-squared error or correlation. Thus, in addition to being computationally simple and general, the IterFac algorithm admits a precise analysis in the case of Gaussian noise. Moreover, since fixed points of the IterFac algorithm correspond, under suitable circumstances, to local maxima of objectives such as (4), the analysis can be used to characterize the behavior of such minima—even if alternate algorithms to IterFac are used.

The main analytical tool is a recently-developed technique by Bayati and Montanari used in the analysis of AMP algorithms. This work proved that, in the limit for large Gaussian mixing matrices, the behavior of AMP estimates can be described by a scalar equivalent model with effective noise levels defined by certain scalar state evolution (SE) equations. Similar SE analyses have appeared in related contexts . To prove the SE equations for the IterFac algorithm, we apply a key theorem from with a simple change of variables and a slight modification to account for parameter adaptation.

A conference version of this paper appeared in . This paper provides all the proofs along with more detailed discussions and simulations. Since the original publication of the conference version in , several other research groups have extended and built on the work. Importantly, has shown that in the case of certain discrete priors, the IterFac algorithm is provably Bayes optimal in a large system limit. It was shown in , that an algorithm closely related to IterFac could provide the best known scaling laws for the hidden clique problem. More recently, and have used related AMP-type methods for matrix factorization problems with rank greater than one.

Iterative Rank-One Factorization

for some scalar functions Gu()G_{u}(\cdot) and Gv()G_{v}(\cdot). The choice of the factor selection function will depend on whether IterFac is used for MAP or MMSE estimation.

To describe the factor selection functions for the MAP estimation problem (4), Let λu(t)\lambda_{u}(t) and λv(t)\lambda_{v}(t) be sequences of pairs of parameters

Then, for each tt, consider random vectors p(t)\mathbf{p}(t) and q(t)\mathbf{q}(t) given by,

The random variables p(t)\mathbf{p}(t) and q(t)\mathbf{q}(t) in (8) are simply scaled versions of the true vectors u0\mathbf{u}_{0} and v0\mathbf{v}_{0} with additive white Gaussian noise (AWGN). Then, for the MAP problem we take the factor selection functions to be the MAP estimates of u0\mathbf{u}_{0} and v0\mathbf{v}_{0} given the observations p(t)\mathbf{p}(t) and q(t)\mathbf{q}(t):

Importantly, due to the separability assumption on the priors (2), these MAP estimates can be computed componentwise:

Hence, the IterFac algorithm replaces the vector-valued MAP estimation of u0,v0\mathbf{u}_{0},\mathbf{v}_{0} from the joint density (3), with a sequence of scalar MAP estimation problems along with multiplications by A\mathbf{A} and AT\mathbf{A}^{T}.

MMSE Estimation:

For the MMSE estimation problem, we simply take the factor selection functions the MMSE estimates with respect to the random variables (8),

Again, due to the separability assumption (2), these MMSE estimation problems can be performed componentwise. Hence MMSE IterFac reduces the vector MMSE problem to a sequence of scalar problems in Gaussian noise.

Intuitive Algorithm Derivation and Analysis

Before we formally analyze the algorithm, it is instructive to understand the intuition behind the method. For both the MAP and MMSE versions of IterFac, first observe that

where (a) follows from line 6 in Algorithm 1; (b) follows from the assumptions on the measurements (1) and in (c), we have used the definitions

For the first iteration, W\mathbf{W} is a large zero mean matrix independent of the initial condition v(0)\mathbf{v}(0). Also, μu(0)=0\mu_{u}(0)=0 and hence the components of zu(0)\mathbf{z}_{u}(0) will be asymptotically Gaussian zero mean variables due to the Central Limit Theorem. Hence, the vector p(0)\mathbf{p}(0) will be distributed as the true vector u0\mathbf{u}_{0} with Gaussian noise as in the model (8). Therefore, we can take an initial MAP or MMSE estimate of u0\mathbf{u}_{0} from the scalar estimation functions in (9) or (11).

Unfortunately, in subsequent iterations with t>0t>0, v(t)\mathbf{v}(t) is no longer independent of W\mathbf{W} and hence Wv(t)\mathbf{W}\mathbf{v}(t) will not, in general, be a Gaussian zero mean random vector. However, remarkably, we will show that with the specific choice of μu(t)\mu_{u}(t) in Algorithm 1, the addition of the term μu(t)u(t)\mu_{u}(t)\mathbf{u}(t) in (13) “debiases” the noise term so that zu(t)\mathbf{z}_{u}(t) is asymptotically zero mean Gaussian. Thus, p(t)\mathbf{p}(t) continues to be distributed as in (8) and we can construct MAP or MMSE estimates of u0\mathbf{u}_{0} from the vector p(t)\mathbf{p}(t). For example, using the MMSE estimation function (11) with appropriate selection of the parameters λu(t)\lambda_{u}(t), we obtain that the estimate for u(t ⁣+ ⁣1)\mathbf{u}(t\!+\!1) is given by the MMSE estimate

For the MAP estimation problem, the MAP selection function (9) computes the MAP estimate for u0\mathbf{u}_{0}.

Similarly, for q(t)\mathbf{q}(t), we see that

for γv(t)=u0Tu(t ⁣+ ⁣1)/m\gamma_{v}(t)=\mathbf{u}_{0}^{T}\mathbf{u}(t\!+\!1)/m and zv(t)=(1/m)WTu(t ⁣+ ⁣1)+μv(t)v(t)\mathbf{z}_{v}(t)=(1/\sqrt{m})\mathbf{W}^{T}\mathbf{u}(t\!+\!1)+\mu_{v}(t)\mathbf{v}(t). Then, v(t ⁣+ ⁣1)\mathbf{v}(t\!+\!1) is taken as the MMSE or MAP estimate of v0\mathbf{v}_{0} from the vector q(t)\mathbf{q}(t).

Thus, we see that the algorithm attempts to produce a sequence of unbiased estimates p(t)\mathbf{p}(t) and q(t)\mathbf{q}(t) for scaled versions of u0\mathbf{u}_{0} and v0\mathbf{v}_{0}. Then, it constructs the MAP or MMSE estimates u(t)\mathbf{u}(t) and v(t ⁣+ ⁣1)\mathbf{v}(t\!+\!1) from these unbiased estimates.

2 State Evolution Analysis

The above “derivation” of the algorithm suggests a possible method to analyze the IterFac algorithm. Consider a sequence of problems indexed by nn and suppose that the dimension m=m(n)m=m(n) grows linearly with nn in that

for some β>0\beta>0. For each nn and iteration number tt, define the sets

which are the set of the components of the true vectors u0iu_{0i} and v0jv_{0j} and their corresponding estimates ui(t)u_{i}(t) and vj(t)v_{j}(t). To characterize the quality of the estimates, we would like to describe the distributions of the pairs in θu(t)\theta_{u}(t) and θv(t)\theta_{v}(t).

Our formal analysis below will provide an exact characterization of these distributions. Specifically, we will show that the sets have empirical limits of the form

for some random variable pairs (U0,U(t)(U_{0},U(t) and (V0,V(t))(V_{0},V(t)). The precise definition of empirical limits is given in Appendix A.1, but loosely, it means that the empirical distribution of the components in θu(t)\theta_{u}(t) and θv(t)\theta_{v}(t) converge to certain random variable pairs.

In addition, we can inductively derive what the distributions are for the limits in (17). To make matters simple, suppose that the parameters λu(t)\lambda_{u}(t) and λv(t)\lambda_{v}(t) are not selected adaptively but instead given by a fixed sequences λu(t)\overline{\lambda}_{u}(t) and λv(t)\overline{\lambda}_{v}(t). Also, given random variables in the limits of (17), define the deterministic constants

Now suppose that the second limit in (17) holds for some tt. Then, γu(t)\gamma_{u}(t) in (13) would have the following limit:

Also, if we ignore the debiasing term with μu(t)\mu_{u}(t) and assume that W\mathbf{W} is independent of v(t)\mathbf{v}(t), the variance of the components of zu(t)\mathbf{z}_{u}(t) in (13) would be

Thus, we would expect a typical component of pi(t)p_{i}(t) in (12) to be distributed as

Due to the separability assumption (6), each component ui(t ⁣+ ⁣1)=Gu(pi(t),λu(t))u_{i}(t\!+\!1)=G_{u}(p_{i}(t),\overline{\lambda}_{u}(t)). So, we would expect the components to follow the distribution

This provides an exact description of the expected joint density of (U0,U(t ⁣+ ⁣1))(U_{0},U(t\!+\!1)). From this density we can then compute αu0(t ⁣+ ⁣1)\overline{\alpha}_{u0}(t\!+\!1) and αu1(t ⁣+ ⁣1)\overline{\alpha}_{u1}(t\!+\!1) in (18).

A similar calculation, again assuming the “debiasing” and Gaussianity assumptions are valid, shows that the limiting empirical distribution of θv(t ⁣+ ⁣1)\theta_{v}(t\!+\!1) in (17) should follow

This provides the joint density (V0,V(t ⁣+ ⁣1))(V_{0},V(t\!+\!1)) from which we can compute αv0(t ⁣+ ⁣1)\overline{\alpha}_{v0}(t\!+\!1) and αv1(t ⁣+ ⁣1)\overline{\alpha}_{v1}(t\!+\!1) in (18). Thus, we have provided a simple method to recursively compute the joint densities of the limits in (17) and their second-order statistics (18).

Asymptotic Analysis under Gaussian Noise

In the above intuitive analysis, we did not formally describe the sense of convergence nor offer any formal justification for the Gaussianity assumptions. In addition, we assumed that the parameters λu(t)\lambda_{u}(t) and λv(t)\lambda_{v}(t) were fixed sequences. In reality, we will need them to be data adaptive. We will make the above arguments rigorous under the following assumptions. Note that although we will be interested in MAP or MMSE estimation functions, our analysis will apply to arbitrary factor selection functions Gu()G_{u}(\cdot) and Gv()G_{v}(\cdot).

Consider a sequence of random realizations of the estimation problem in Section 1 indexed by the dimension nn. The matrix A\mathbf{A} and the parameters in Algorithm 1 satisfy the following:

For each nn, the output dimension m=m(n)m=m(n) is deterministic and scales linearly with nn as (15).

The factor selection functions Gu(p,λu)G_{u}(\mathbf{p},\lambda_{u}) and Gv(q,λv)G_{v}(\mathbf{q},\lambda_{v}) in lines 7 and 12 are componentwise separable in that for all component indices ii and jj,

for some scalar functions Gu(p,λu)G_{u}(p,\lambda_{u}) and Gv(q,λv)G_{v}(q,\lambda_{v}). The scalar functions must be differentiable in pp and qq. Moreover, for every tt, the functions Gu(p,λu)G_{u}(p,\lambda_{u}) and Gu(p,λu)/p\partial G_{u}(p,\lambda_{u})/\partial p must be Lipschitz continuous in pp with a Lipschitz constant that is continuous in λu\lambda_{u}, and continuous in λu\lambda_{u} uniformly over pp. Similarly, for every tt, the functions Gv(q,λv)G_{v}(q,\lambda_{v}) and Gv(q,λv)/q\partial G_{v}(q,\lambda_{v})/\partial q must be Lipschitz continuous in qq with a Lipschitz constant that is continuous in λv\lambda_{v}, and continuous in λv\lambda_{v} uniformly over qq.

The parameters λu(t)\lambda_{u}(t) and λv(t)\lambda_{v}(t) are computed via

for (possibly vector-valued) functions ϕλu()\phi_{\lambda u}(\cdot) and ϕλv()\phi_{\lambda v}(\cdot) that are pseudo-Lipschitz continuous of order p=2p=2.

For t=0t=0, the sets (17) empirically converge with bounded moments of order 22 to the limits

for some random variable pairs (U0,U(0)(U_{0},U(0) and (V0,V(0))(V_{0},V(0)). See Appendix A.1 for a precise definition of the empirical convergence used here.

The assumptions need some explanations. Assumptions 4.1(a) and (b) simply state that we are considering an asymptotic analysis for certain large matrices A\mathbf{A} consisting of a random rank one matrix plus Gaussian noise. The analysis of Algorithm 1 for higher ranks is still not known, but we provide some possible ideas later. Assumption 4.1(c) is a mild condition on the factor selection functions. In particular, the separability assumption holds for the MAP or MMSE functions (9) and (11) under separable priors.

Assumption (d) allows for the parameters λu(t)\lambda_{u}(t) and λv(t)\lambda_{v}(t) in the factor selection functions to be data dependent, provided that they can each be determined via empirical averages of some function of the most recent data. Assumption (e) is the initial induction hypothesis.

Under these assumptions, we can recursively define the sequence of random variables (U0,U(t))(U_{0},U(t)) and (V0,V(t))(V_{0},V(t)) as described above. For the parameters λu(t)\overline{\lambda}_{u}(t) and λv(t)\overline{\lambda}_{v}(t), define them from the expectations

These are the limiting values we would expect given the adaptation rules (22).

Under Assumption 4.1, the sets θu(t)\theta_{u}(t) and θv(t)\theta_{v}(t) in (16) converge empirically with bounded moments of order p=2p=2 with the limits in (17).

2 Scalar Equivalent Model

The main contribution of Theorem 4.1 is that it provides a simple scalar equivalent model for the asymptotic behavior of the algorithm. The sets θu(t)={(u0i,ui(t))}\theta_{u}(t)=\{(u_{0i},u_{i}(t))\} and θv(t)={(v0j,vj(t))}\theta_{v}(t)=\{(v_{0j},v_{j}(t))\} in (16) are the components of true vectors u0\mathbf{u}_{0} and v0\mathbf{v}_{0} and their estimates u(t)\mathbf{u}(t) and v(t)\mathbf{v}(t). The theorem shows that empirical distribution of these components are asymptotically equivalent to simple random variable pairs (U0,U(t))(U_{0},U(t)) and (V0,V(t))(V_{0},V(t)) given by (19) and (20). In this scalar system, the variable U(t ⁣+ ⁣1)U(t\!+\!1) is the output of the factor selection function Gu()G_{u}(\cdot) applied to a scaled and Gaussian noise-corrupted version of the true variable U0U_{0}. Similarly, V(t ⁣+ ⁣1)V(t\!+\!1) is the output of the factor selection function Gv()G_{v}(\cdot) applied to a scaled and Gaussian noise-corrupted version of the true variable V0V_{0}. Following , we can thus call the result a single-letter characterization of the algorithm.

From this single-letter characterization, one can exactly compute a large class of performance metrics of the algorithm. Specifically, the empirical convergence of θu(t)\theta_{u}(t) shows that for any pseudo-Lipschitz function ϕ(u0,u)\phi(u_{0},u) of order pp, the following limit exists almost surely:

where the expectation on the right-hand side is over the variables (U0,U(t))(U_{0},U(t)) with U0U_{0} identical to the variable in the limit in (23) and U(t)U(t) given by (19). This expectation can thus be explicitly evaluated by a simple two-dimensional integral and consequently any component-separable performance metric based on a suitably continuous loss function ϕ(u0,u)\phi(u_{0},u) can be exactly computed.

For example, if we take ϕ(u0,u)=(uu0)2\phi(u_{0},u)=(u-u_{0})^{2} we can compute the asymptotic mean squared error of the estimate,

Also, for each tt, define the empirical second-order statistics

From the second order statistics, we can compute the asymptotic correlation coefficient between u0\mathbf{u}_{0} and its estimate u\mathbf{u} given by

Similarly, the asymptotic correlation coefficient between v0\mathbf{v}_{0} and v\mathbf{v} has a simple expression

The correlation coefficient is useful, since we know that, without additional constraints, the terms u0\mathbf{u}_{0} and v0\mathbf{v}_{0} can only be estimated up to a scalar. The correlation coefficient is scale invariant.

Examples

The SE analysis can be used to exactly predict the asymptotic behavior of the IterFac algorithm for any smooth scalar estimation functions, including the MAP or MMSE functions (9) and (11). There are, however, two cases, where the SE equations have particularly simple and interesting solutions: linear estimation functions and MMSE functions.

Suppose we use linear selection functions of the form

where the parameters λu\lambda_{u} and λv\lambda_{v} allow for normalization or other scalings of the outputs. Linear selection functions of the form (31) arise when one selects Gu()G_{u}(\cdot) and Gv()G_{v}(\cdot) from the MAP or MMSE functions (9) or (11) with Gaussian priors.

With Gaussian priors, the correct solution to the MAP estimate (4) is for (u^,v^)(\widehat{\mathbf{u}},\widehat{\mathbf{v}}) to be the (appropriately scaled) left and right maximal singular vectors of A\mathbf{A}. We will thus call the estimates (u(t),v(t))(\mathbf{u}(t),\mathbf{v}(t)) of Algorithm 1 and linear selection functions (31) the estimated maximal singular vectors.

Consider the state evolution equation (18), (19), and (20) with the linear selection functions (31). Then:

The asymptotic correlation coefficients (29) and (30) satisfy the following recursive rules:

For any positive initial condition, ρv(0)>0\rho_{v}(0)>0, the asymptotic correlation coefficients converge to the limits

The theorem provides a set of recursive equations for the asymptotic correlation coefficients ρu(t)\rho_{u}(t) and ρv(t)\rho_{v}(t) along with simple expressions for the limiting values as tt\rightarrow\infty. We thus obtain exactly how correlated the estimated maximal singular vectors of a matrix A\mathbf{A} of the form (1) are to the rank one factors (u0,v0)(\mathbf{u}_{0},\mathbf{v}_{0}). The proof of the theorem also provides expressions for the second-order statistics in (18) to be used in the scalar equivalent model.

The fixed point expressions (33) agree with the more general results in that derive the correlations for ranks greater than one and low-rank recovery with missing entries. Similar results can also be found in . An interesting consequence of the expressions in (33) is that unless

the asymptotic correlation coefficients are exactly zero. The ratio τuτv/τw\tau_{u}\tau_{v}/\tau_{w} can be interpreted as a scaled SNR.

2 Minimum Mean-Squared Error Estimation

Next suppose we use the MMSE selection functions (11). Using the scalar equivalent models (19) and (20), we take the scaling factor and noise parameters as

where DN(0,1)D\sim\mathcal{N}(0,1) is independent of U0U_{0} and V0V_{0}. That is, Eu(ηu){\mathcal{E}}_{u}(\eta_{u}) and Ev(ηv){\mathcal{E}}_{v}(\eta_{v}) are the mean-squared errors of estimating U0U_{0} and V0V_{0} from observations YY with SNRs of ηu\eta_{u} and ηv\eta_{v}. The functions Eu(){\mathcal{E}}_{u}(\cdot) and Ev(){\mathcal{E}}_{v}(\cdot) arise in a range of estimation problems and the analytic and functional properties of these functions can be found in .

For all tt, the asymptotic correlation coefficients (29) and (30) satisfy the recursive relationships

If, in addition, Eu(ηu){\mathcal{E}}_{u}(\eta_{u}) and Ev(ηv){\mathcal{E}}_{v}(\eta_{v}) are continuous, then for any positive initial condition, ρv(0)>0\rho_{v}(0)>0, as tt\rightarrow\infty, the asymptotic correlation coefficients (ρu(t),ρv(t))(\rho_{u}(t),\rho_{v}(t)) increase monotonically to fixed points (ρu,ρu)(\rho_{u}^{*},\rho_{u}^{*}) of (37) with ρv>0\rho_{v}^{*}>0.

Again, we see that we can obtain simple, explicit recursions for the asymptotic correlations. Moreover, the asymptotic correlations provably converge to fixed points of the SE equations. The proof of the theorem also provides expressions for the second-order statistics in (18) to be used in the scalar equivalent model.

3 Zero Initial Conditions

Of course, with zero mean random variables, a more sensible initial condition is to take v(0)\mathbf{v}(0) to be some non-zero random vector, as is commonly done in power algorithm recursions for computing maximal singular vectors. To understand the behavior of the algorithm under this random initial condition, let

where we have explicitly denoted the dependence on the problem dimension nn. From (30), we have that limnρv(t,n)=ρv(t)\lim_{n\rightarrow\infty}\rho_{v}(t,n)=\rho_{v}(t) for all tt. Also, with a random initial condition v(0)\mathbf{v}(0) independent of v0\mathbf{v}_{0}, it can be checked that ρv(0,n)=O(1/n)\rho_{v}(0,n)=O(1/n) so that

Hence, from the SE equations (37), ρv(t)=ρu(t)=0\rho_{v}(t)=\rho_{u}(t)=0 for all tt. That is,

This limit suggests that, even with random initial condition, the IterFac algorithm will not produce a useful estimate.

However, it is still possible that the limit in the opposite order of (39) may be non-zero:

That is, for each nn, it may be possible to obtain a non-zero correlation, but the number of iterations for convergence increases with nn since the algorithm starts from a decreasingly small initial correlation. Unfortunately, our SE analysis cannot make predictions on limits in the order of (40).

We can however analyze the following limit:

If βτuτv>τw\sqrt{\beta}\tau_{u}\tau_{v}>\tau_{w},

Conversely, if βτuτv<τw\sqrt{\beta}\tau_{u}\tau_{v}<\tau_{w},

The result of the lemma is somewhat disappointing. The lemma shows that βτuτv>τw\sqrt{\beta}\tau_{u}\tau_{v}>\tau_{w} is essentially necessary and sufficient for the IterFac algorithm with MMSE estimates to be able to overcome arbitrarily small initial conditions and obtain an estimate with a non-zero correlation to the true vector. Unfortunately, this is the identical to the condition (34) for the linear estimator to obtain a non-zero correlation. Thus, the IterFac algorithm with MMSE estimates performs no better than simple linear estimation in the initial iterations when the priors have zero means. Since linear estimation is equivalent to finding maximal singular vectors without any particular constraints, we could interpret Lemma 5.5 as saying that the IterFac algorithm under MMSE estimation cannot exploit structure in the components in the initial iterations. As a result, in low SNRs it may be necessary to use other algorithms as an initial condition for IterFac – such procedures, however, require further study.

Numerical Simulation

which provides a simple model for a sparse, positive vector. The parameter λ\lambda is the fraction of nonzero components and is set in this simulation to λ=0.1\lambda=0.1. Note that these components have a non-zero mean so the difficulties of Section 5.3 are avoided. The dimensions are (m,n)=(1000,500)(m,n)=(1000,500), and the noise level τw\tau_{w} is set according to the scaled SNR defined as

The results of the simulation are shown in Fig. 1, which shows the simulated and SE-predicted performance of the IterFac algorithm with both the linear and MMSE selection functions for the priors on u\mathbf{u} and v\mathbf{v}. The algorithm is run for t=10t=10 iterations and the plot shows the median of the final correlation coefficient ρv(t)\rho_{v}(t) over 50 Monte Carlo trials at each value of SNR. It can be seen that the performance of the IterFac algorithm for both the linear and MMSE estimates are in excellent agreement with the SE predictions. The correlation coefficient of the linear estimator also matches the correlation of the estimates produced from the maximal singular vectors of A\mathbf{A}. This is not surprising since, with linear selection functions, the IterFac algorithm is essentially an iterative method to find the maximal singular vectors. The figure also shows the benefit of exploiting the prior on v0\mathbf{v}_{0}, which is evident from the superior performance of the MMSE estimate over the linear reconstruction.

Fig. 2 shows the correlation coefficient as a function of the iteration number for the MMSE and linear methods for two values of the SNR. Again, we see that the SE predictions are closely matched to the median simulated values. In addition, we see that we get good convergence within 4 to 8 iterations. Based on the SE predictions, this number will not scale with dimension and hence the simulation suggests that only a small number of iterations will be necessary for even very large problems. All code for the simulation can be found in the public GAMP sourceforge project .

Limitations and Extensions

There are several potential lines for future work – some of which have already been explored in other works made since the original publication of this paper in .

The algorithm presented in this paper considers only rank one matrices. The works have proposed AMP-type algorithms for more general classes of matrix factorization problems and provided analyses of these methods based on replica methods and other ideas from statistical physics.

Unknown priors

The MMSE estimator in Section 5.2 requires exact knowledge of the priors on U0U_{0} and V0V_{0} as well as the Gaussian noise level τw\tau_{w}. In many problems in statistics, these are not known. There are two possible solutions that may be investigated in the future. One method is to parameterize the distributions of U0U_{0} and V0V_{0} and estimate these parameters in the MMSE selection functions (11) – potentially through an EM type procedure as in . This EM type approach with hidden hyperparameters has been recently successfully used in a related approximate message passing method in . The analysis of the such parameter learning could possibly be accounted for through the adaptation parameters λu(t)\lambda_{u}(t) and λv(t)\lambda_{v}(t). A second approach is to assume that the distributions of U0U_{0} and V0V_{0} belong to a known family of distributions and then find a min-max solution. Such a min-max technique was proposed for AMP recovery of sparse vectors in . See also .

Optimality

While the current paper characterizes the performance of the IterFac algorithm, it remains open how far that performance is to optimal estimation such as the joint MMSE estimates of u0\mathbf{u}_{0} and v0\mathbf{v}_{0}. AMP methods have been shown to be provably optimal in a wide range of scenarios . More recently, has proven that for sparse binary priors, the IterFac algorithm was provably Bayes optimal for certain sparsity levels. A potential line of future work is to see if these results can be extended to more general settings.

Conclusions

We have presented a computationally-efficient method for estimating rank-one matrices in noise. The estimation problem is reduced to a sequence of scalar AWGN estimation problems which can be performed easily for a large class of priors or regularization functions on the coefficients. In the case of Gaussian noise, the asymptotic performance of the algorithm is exactly characterized by a set of scalar state evolution equations which appear to match the performance at moderate dimensions well. Thus, the methodology is computationally simple, general and admits a precise analysis in certain asymptotic, random settings. Future work include extensions to higher rank matrices and handling of the cases where the priors are not known.

Acknowledgments

The authors thank Vivek Goyal, Ulugbek Kamilov, Andrea Montanari and Phil Schniter for detailed comments on an earlier draft.

Appendix A Appendices: Proofs

Now suppose that for each n=1,2,n=1,2,\ldots, we have a set of vectors

When the nature of convergence is clear, we may write (with some abuse of notation)

A.2 Bayati–Montanari Recursions with Adaptation

and ξv(t)\xi_{v}(t) and ξu(t)\xi_{u}(t) are scalar step sizes given by

The recursions (46) to (48) are identical to the recursions analyzed in , except for the introduction of the parameters νu(t)\nu_{u}(t) and νv(t)\nu_{v}(t). We will call these parameters adaptation parameters since they enable the functions Hu()H_{u}(\cdot) and Hv()H_{v}(\cdot) to have some data dependence, not explicitly considered in . Similar to the selection of the parameters λu(t)\lambda_{u}(t) and λv(t)\lambda_{v}(t) in (22), we assume that, in each iteration tt, the adaptation parameters are selected by functions of the form,

where ϕu()\phi_{u}(\cdot) and ϕv()\phi_{v}(\cdot) are (possibly vector-valued) pseudo-Lipschitz continuous of order pp for some p>1p>1. Thus, the values of νu(t)\nu_{u}(t) and νv(t)\nu_{v}(t) depend on the outputs v(t)\mathbf{v}(t) and u(t ⁣+ ⁣1)\mathbf{u}(t\!+\!1). Note that in equations (47) to (49), did_{i}, u0iu_{0i}, bjb_{j} and v0jv_{0j} are the components of the vectors d\mathbf{d}, u0\mathbf{u}_{0}, b\mathbf{b} and v0\mathbf{v}_{0}, respectively. The algorithm is initialized with t=0t=0, ξu(0)=0\xi_{u}(0)=0 and some vector v(0)\mathbf{v}(0).

Now, similar to Section 4, consider a sequence of random realizations of the parameters indexed by the input dimension nn. For each nn, we assume that the output dimension m=m(n)m=m(n) is deterministic and scales linearly as in (15) for some β0\beta\geq 0. Assume that the transform matrix S\mathbf{S} has i.i.d. Gaussian components sijN(0,1/m)s_{ij}\sim{\cal N}(0,1/m). Also assume that the empirical limits in (23) hold with bounded moments of order 2p22p-2 for some limiting random variables (U0,U(0))(U_{0},U(0)) and (V0,V(0))(V_{0},V(0)). We will also assume the following continuity assumptions on Hu()H_{u}(\cdot) and Hv()H_{v}(\cdot):

The function Hu(t,b,u0,νu)H_{u}(t,b,u_{0},\nu_{u}) satisfies the following continuity conditions:

For every νu\nu_{u} and tt, Hu(t,b,u0,νu)H_{u}(t,b,u_{0},\nu_{u}) and its derivative Hu(t,b,u0,νu)/b\partial H_{u}(t,b,u_{0},\nu_{u})/\partial b are Lipschitz continuous in bb and u0u_{0} for some Lipschitz constant that is continuous in νu\nu_{u}; and

For every νu\nu_{u} and tt, Hu(t,b,u0,νu)H_{u}(t,b,u_{0},\nu_{u}) and Hu(t,b,u0,νu)/b\partial H_{u}(t,b,u_{0},\nu_{u})/\partial b are is continuous at νu\nu_{u} uniformly over (b,u0)(b,u_{0}).

The function Hv(t,d,v0,νv)H_{v}(t,d,v_{0},\nu_{v}) satisfies the analogous continuity assumptions in dd, v0v_{0} and νv\nu_{v}.

Under these assumption, we will show, as in Section 4, that for any fixed iteration tt, the sets θu(t)\theta_{u}(t) and θv(t)\theta_{v}(t) in (16) converge empirically to the limits (17) for some random variable pairs (U0,U(t))(U_{0},U(t)) and (V0,V(t))(V_{0},V(t)). The random variable U0U_{0} is identical to the variable in the limit (23) and, for t0t\geq 0, U(t)U(t) is given by

for some deterministic constants νv(t)\overline{\nu}_{v}(t) and τb(t)\tau^{b}(t) that will be defined below. Similarly, the random variable V0V_{0} is identical to the variable in the limit (23) and, for t0t\geq 0, V(t)V(t) is given by

for some constants νv(t)\overline{\nu}_{v}(t) and τd(t)\tau^{d}(t), also defined below.

The constants τb(t)\tau^{b}(t), τd(t)\tau^{d}(t), νu(t)\overline{\nu}_{u}(t) and νv(t)\overline{\nu}_{v}(t) can be computed recursively through the following state evolution equations

where the expectations are over the random variables U(t)U(t) and V(t)V(t) above and initialized with

With these definitions, we can now state the adaptive version of the result from Bayati and Montanari . Although the full proof requires that p=2p=2, much of the proof is valid for p2p\geq 2. Hence, where possible, we provide the steps for the general pp case.

Consider the recursion in (46) to (49) satisfying the above assumptions for the case when p=2p=2. Then, for any fixed iteration number tt, the sets θu(t)\theta_{u}(t) and θv(t)\theta_{v}(t) in (16) converge empirically to the limits (17) with bounded moments of order p=2p=2 to the random variable pairs (U0,U(t))(U_{0},U(t)) and (V0,V(t))(V_{0},V(t)) described above.

Now, Bayati and Montanari’s result in shows that this non-adaptive algorithm satisfies the required properties. That is, the following limits hold with bounded moments of order pp,

So, the limits (17) will be shown if we can prove the following limits hold almost surely for all tt:

where p\|\cdot\|_{p} is the pp-norm. In the course of proving (57), we will also show the following limits hold almost surely,

The proof of the limits (57) and (58) can be demonstrated via induction on tt with the following straightforward (but somewhat tedious) continuity argument:

To begin the induction argument, first note that the non-adaptive algorithm has the same initial condition as the adaptive algorithm. That is, v(0)=v(0)\mathbf{v}^{*}(0)=\mathbf{v}(0) and ξu(0)=ξu(0)=0\xi_{u}(0)=\xi_{u}^{*}(0)=0. Also, since ξu(0)=ξu(0)=0\xi_{u}(0)=\xi_{u}^{*}(0)=0, from (46a), the initial value of u(t)\mathbf{u}(t) does not matter. So, without loss of generality, we can assume that the initial condition satisfies u(0)=u(0)\mathbf{u}(0)=\mathbf{u}^{*}(0). Thus, the limits (57) and (58c) hold for t=0t=0.

We now proceed by induction. Suppose that the limits (57) and (58c) hold almost surely for some t0t\geq 0. Since S\mathbf{S} has i.i.d. components with zero mean and variance 1/m1/m, by the Marceko-Pastur Theorem , the maximum singular value of S\mathbf{S} is bounded. For p=2p=2, the maximum singular value is the pp-norm operator norm, and therefore, there exists a constant CS>0C_{S}>0 such that

Substituting the bound (59) into (46a), we obtain

Now, since p1p\geq 1, we have that for any positive numbers aa and bb,

Applying (61) into (60), and the fact that limnm/n=β\lim_{n}m/n=\beta, we obtain that

for some other constant CS>0C^{\prime}_{S}>0. Now, since u(t)\mathbf{u}^{*}(t) is the output of the non-adaptive algorithm it satisfies the limit

Substituting the bound (63) along with induction hypotheses, (57) and (58c) into (62) shows (58a).

Next, to prove the limit (58e), first observe that since ϕu()\phi_{u}(\cdot) is pseudo-Lipschitz continuous of order pp, we have that νu(t)\overline{\nu}_{u}(t) in (52b) can be replaced by the limit of the empirical means

where the limit holds almost surely. Combining (64) with the (49),

Applying the fact that ϕu()\phi_{u}(\cdot) is pseudo-Lipschitz continuous of order pp to (65), we obtain that there exists a constant Lv>0L_{v}>0 such that

where the last step is due to Hölder’s inequality with the exponents

Now, similar to the proof of (63) one can show that the non-adaptive output satisfies the limit

Also, from the induction hypothesis (57), it follows that the non-adaptive output must satisfy the same limit

Applying the bounds (67) and (68) and the limit (66) shows that (58e) holds almost surely.

Now the limit in (58e) and the second limit in (57) together with the continuity conditions on Hu()H_{u}(\cdot) in Assumption A.1 show that the first limit in (57) holds almost surely for t+1t+1 and (58d) holds almost surely for tt. Using (46b), the proof of the limit (58b) is similar to the proof of (58a). These limits in turn show that the second limit in (57) and the limits in (58c) hold almost surely for t+1t+1. We have thus shown that if (57) and (58c) hold almost surely for some tt, they hold for t+1t+1. Thus, by induction they hold for all tt. Finally, applying the limits (56), (57) and a continuity argument shows that the desired limits (17) hold almost surely.

A.3 Proof of Theorem 4.1

The theorem directly follows from the adaptive Bayati–Montanari recursion theorem, Theorem A.2 above, with some change of variables. Specifically, let

where W\mathbf{W} is the Gaussian noise in the rank one model in Assumption 4.1(b). Since W\mathbf{W} has i.i.d. components with Gaussian distributions N(0,τw){\mathcal{N}}(0,\tau_{w}), the components of S\mathbf{S} will be i.i.d. with distributions N(0,1/m){\mathcal{N}}(0,1/m).

Now, using the rank one model for A\mathbf{A} in (1)

where the last step is from the definition of αv1(t)\alpha_{v1}(t) in (26). Substituting (70) into the the update rule for p(t)\mathbf{p}(t) in line 6 of Algorithm 1, we obtain

Note that we have used the fact that β=n/m\beta=n/m. Hence, if we define

Similarly, one can show that if we define

which are the adaptation functions in (22) with additional components for the second-order statistics uu0uu_{0} and vv0vv_{0}. Since ϕλu(t,)\phi_{\lambda u}(t,\cdot) and ϕλv(t,)\phi_{\lambda v}(t,\cdot) are pseudo-Lipschitz of order pp, so are ϕu(t,)\phi_{u}(t,\cdot) and ϕv(t,)\phi_{v}(t,\cdot). Taking the empirical means over each of the two components of ϕu()\phi_{u}(\cdot) and ϕv()\phi_{v}(\cdot), and applying (22) and (26), we see that if νu(t)\nu_{u}(t) and νv(t)\nu_{v}(t) are defined as in (49),

Therefore, νu(t)\nu_{u}(t) and νv(t)\nu_{v}(t) are vectors containing the parameters λu(t)\lambda_{u}(t) and λv(t)\lambda_{v}(t) for the factor selection functions in lines 7 and 12 of Algorithm 1 as well as the second-order statistics αu1(t)\alpha_{u1}(t) and αv1(t)\alpha_{v1}(t). Now, for νu=(αv1,λu)\nu_{u}=(\alpha_{v1},\lambda_{u}) and νv=(αu1,λv)\nu_{v}=(\alpha_{u1},\lambda_{v}) define the scalar functions

Since Gu(p,λu)G_{u}(p,\lambda_{u}) and Gv(q,λv)G_{v}(q,\lambda_{v}) satisfy the continuity conditions in Assumption 4.1(c), Hu(t,b,u0,λu)H_{u}(t,b,u_{0},\lambda_{u}) and Hv(t,d,v0,λv)H_{v}(t,d,v_{0},\lambda_{v}) satisfy Assumption A.1. In addition, the componentwise separability assumption in (21) implies that the updates in lines 7 and 12 of Algorithm 1 can be rewritten as

Thus, combining (78) and (79) with the definitions of b(t)\mathbf{b}(t) and d(t)\mathbf{d}(t) in (72) and (74), we obtain

where (a) follows from the definition of ξu(t)\xi_{u}(t) in (73); (b) is the setting for μu(t ⁣+ ⁣1)\mu_{u}(t\!+\!1) in line 8; and (c) follows from the definition of Hv(t,d)H_{v}(t,d) in (78b). Similarly, one can show that

Equations (73), (75), (77), (78), (81) and (82) exactly match the recursions in equations (46) to (49). Therefore, Theorem A.2 shows that the limits (17) hold in a sense that the sets θu(t)\theta_{u}(t) and θv(t)\theta_{v}(t) converge empirically with bounded moments of order pp.

We next show that the limits U(t)U(t) and V(t)V(t) on the right-hand side of (17) match the descriptions in (19) and (20). First, define αu1(t)\overline{\alpha}_{u1}(t) and αv1(t)\overline{\alpha}_{v1}(t) in (18) and λu(t)\overline{\lambda}_{u}(t) and λv(t)\overline{\lambda}_{v}(t) as in (24). Then, from (76), the expectations νu(t)\overline{\nu}_{u}(t) and νv(t)\overline{\nu}_{v}(t) in (52b) are given by

where B(t)N(0,τb(t))B(t)\sim{\mathcal{N}}(0,\tau^{b}(t)). Therefore, if we let Zu(t)=τwB(t)Z_{u}(t)=\sqrt{\tau_{w}}B(t), then Zu(t)Z_{u}(t) is zero mean Gaussian with variance

where follows from (52a) and (b) follows from the definition of αv0(t)\overline{\alpha}_{v0}(t) in (18). Substituting Zu(t)=τwB(t)Z_{u}(t)=\sqrt{\tau_{w}}B(t) into (84) we obtain the model for U(t ⁣+ ⁣1)U(t\!+\!1) in (19). Similarly, using (51) and (78b), we can obtain the model V(t ⁣+ ⁣1)V(t\!+\!1) in (20). Thus, we have proven that the random variables (U0,U(t))(U_{0},U(t)) and (V0,V(t))(V_{0},V(t)) are described by (19) and (20), and this completes the proof.

A.4 Proof of Theorem 5.1

The theorem is proven by simply evaluating the second order statistics. We begin with αu1(t ⁣+ ⁣1)\overline{\alpha}_{u1}(t\!+\!1):

where (a) is the definition in (18); (b) follows from (19) and (31); (c) follows from (19); and (d) follows from the independence of Zu(t)Z_{u}(t) and U0U_{0} and the definition of τu\tau_{u} in (28). Similarly, one can show that

Substituting (85) and (86) into (29), we obtain the asymptotic correlation coefficient

where the last step follows from (30). This proves the first equation in (32).

Applying these equations into (29) and (30), we obtain the recursion (32). Hence, we have proven part (a) of the theorem.

For part (b), we need the following simple lemma.

Suppose that H:H:\rightarrow is continuous and monotonically increasing, and x(t)x(t) is a sequence satisfying the recursive relation

for some initial condition x(0)x(0)\in. Then, either x(t)x(t) monotonically increases or decreases to some x=H(x)x^{*}=H(x^{*}).

This can be proven similar to [42, lemma 7].

To apply Lemma A.4, observe that the recursions (32) show that

then it follows from (32) that ρu(t ⁣+ ⁣1)=H(ρu(t))\rho_{u}(t\!+\!1)=H(\rho_{u}(t)) for all tt. By taking the derivative, it can be checked that H(ρu)H(\rho_{u}) is monotonically increasing. It follows from Lemma A.4 that ρu(t)ρu\rho_{u}(t)\rightarrow\rho_{u}^{*} for some fixed point ρu=H(ρu)\rho_{u}^{*}=H(\rho_{u}^{*}) with ρu\rho_{u}^{*}\in.

Now, there are only two fixed point solutions to ρu=H(ρu)\rho_{u}^{*}=H(\rho_{u}^{*}): ρu=0\rho_{u}^{*}=0 and

then ρu\rho_{u}^{*} in (90) is not positive, so the only fixed solution in $isis\rho_{u}^{*}=0.Therefore,when(91)isnotsatisfied,. Therefore, when (91) is not satisfied,\rho_{u}(t)mustconvergetothezerofixedpoint:must converge to the zero fixed point:\rho_{u}(t)\rightarrow 0$.

Now, suppose that (91) is satisfied. In this case, we claim that ρu(t)ρu\rho_{u}(t)\rightarrow\rho_{u}^{*} where ρu\rho_{u}^{*} is in (90). We prove this claim by contradiction and suppose, instead, that ρu(t)\rho_{u}(t) converges to the other fixed point: ρu(t)0\rho_{u}(t)\rightarrow 0. Since Lemma A.4 shows that ρu(t)\rho_{u}(t) must be either monotonically increasing or decreasing, the only way ρu(t)0\rho_{u}(t)\rightarrow 0 is that ρu(t)\rho_{u}(t) monotonically decreases to zero. But, when (91) is satisfied, it can be checked that for ρu(t)\rho_{u}(t) sufficiently small and positive, ρu(t ⁣+ ⁣1)>H(ρu(t))\rho_{u}(t\!+\!1)>H(\rho_{u}(t)). This contradicts the fact that ρu(t)\rho_{u}(t) is monotonically decreasing, and therefore, ρu(t)\rho_{u}(t) must converge to the other fixed point ρu\rho_{u}^{*} in (90).

Hence, we have shown that when (91) is not satisfied, ρu(t)0\rho_{u}(t)\rightarrow 0, and when (91) is satisfied ρu(t)ρu\rho_{u}(t)\rightarrow\rho_{u}^{*} in (90). This is equivalent to the first limit in (33). The second limit (33) is proved similarly.

A.5 Proof of Theorem 5.3

and therefore αu0(t)=αu1(t)\overline{\alpha}_{u0}(t)=\overline{\alpha}_{u1}(t). Now, consider the measurement P(t)P(t) in (19). The SNR in this channel is

Since U(t ⁣+ ⁣1)U(t\!+\!1) is the conditional expectation of U0U_{0} given P(t)P(t), the mean-squared error is given by Eu(ηu(t)){\mathcal{E}}_{u}(\eta_{u}(t)) defined in (36). Therefore,

where (a) follows from expanding the square and substituting in the definitions in (18) and (28); and (b) follows from the fact that αu0(t ⁣+ ⁣1)=αu1(t ⁣+ ⁣1)\overline{\alpha}_{u0}(t\!+\!1)=\overline{\alpha}_{u1}(t\!+\!1) proven above. We have thus proven that

Therefore, the asymptotic correlation coefficient is given by

where (a) follows from (29) and (b) follows from (95). Substituting in (93) into (96) proves the first equation in (37). The second recursion in (37) can be proven similarly.

Hence, from (30), the initial correlation coefficient is

which agrees with the statement in the theorem. This proves part (a).

To prove part (b), we again use Lemma A.4. Define the functions

From (37), it follows that ρv(t ⁣+ ⁣1)=H(ρv(t))\rho_{v}(t\!+\!1)=H(\rho_{v}(t)). Now, Eu(ηu){\mathcal{E}}_{u}(\eta_{u}) and Ev(ηv){\mathcal{E}}_{v}(\eta_{v}) defined in (36) are the mean-squared errors of U0U_{0} and V0V_{0} under AWGN estimation measurements with SNRs ηu\eta_{u} and ηv\eta_{v}. Therefore, Eu(ηu){\mathcal{E}}_{u}(\eta_{u}) and Ev(ηv){\mathcal{E}}_{v}(\eta_{v}) must be monotonically decreasing in ηu\eta_{u} and ηv\eta_{v}. Therefore, Hu(ρv)H_{u}(\rho_{v}) and Hv(ρu)H_{v}(\rho_{u}) in (97) are monotonically increasing functions and thus so is the concatenated function H(ρv)H(\rho_{v}) in (98). Also, since the assumption of part (b) is that Eu(ηu){\mathcal{E}}_{u}(\eta_{u}) and Ev(ηv){\mathcal{E}}_{v}(\eta_{v}) are continuous, H(ρv)H(\rho_{v}) is also continuous. It follows from Lemma A.4 that ρv(t)ρv\rho_{v}(t)\rightarrow\rho_{v}^{*} where ρv\rho_{v}^{*} is a fixed point of (37).

It remains to show ρv>0\rho_{v}^{*}>0. Observe that

Therefore, the limit point ρv\rho_{v}^{*} of ρv(t)\rho_{v}(t) must satisfy ρvρv(0)>0\rho_{v}^{*}\geq\rho_{v}(0)>0.

A.6 Proof of Lemma 5.5

Towards this end, we first use a standard result (see, for example, ) that, for any prior on U0U_{0} with bounded second moments, the mean-squared error in (36) satisfies

The term τu/(1+ηuτu)\tau_{u}/(1+\eta_{u}\tau_{u}) is the minimum mean-squared error with linear estimation of U0U_{0} from an AWGN noise-corrupted measurement Y=ηuU0+DY=\sqrt{\eta_{u}}U_{0}+D, DN(0,1)D\sim{\mathcal{N}}(0,1). Equation (99) arises from the fact that linear estimation is optimal in low SNRs – see for details. Using (99), we can compute the derivative of HuH_{u},

We now apply a standard linearization analysis of the nonlinear system ρv(t ⁣+ ⁣1)=H(ρv(t))\rho_{v}(t\!+\!1)=H(\rho_{v}(t)) around the fixed point ρv=0\rho_{v}=0. See, for example, . If βτuτv<τw\sqrt{\beta}\tau_{u}\tau_{v}<\tau_{w} then H(0)<1H^{\prime}(0)<1 and the fixed point is stable. Thus, for any ρv(0)\rho_{v}(0) sufficiently small ρv(t)0\rho_{v}(t)\rightarrow 0. This proves part (b) of the lemma.

On the other hand, if βτuτv>τw\sqrt{\beta}\tau_{u}\tau_{v}>\tau_{w} then H(0)>1H^{\prime}(0)>1 and the fixed point is unstable. This will imply that for any ρv(0)>0\rho_{v}(0)>0, ρv(t)\rho_{v}(t) will diverge from zero. But, we know from Theorem 5.3 that ρv(t)\rho_{v}(t) must converge to some fixed point of ρv=H(ρv)\rho_{v}=H(\rho_{v}). So, the limit point must be positive. This proves part (a) of the lemma.

References