Near-Optimal Column-Based Matrix Reconstruction

Christos Boutsidis, Petros Drineas, Malik Magdon-Ismail

Introduction

as possible. We present polynomial-time near-optimal constructions for arbitrary $r>k$ , settling important open questions regarding column-based matrix reconstruction.

Spectral norm: What is the best reconstruction error with $r>k$ columns? We present polynomial-time (deterministic and randomized) algorithms with approximation error asymptotically matching a lower bound proven in this work. Prior work had focused on the $r=k$ case and presented near-optimal polynomial-time algorithms .

Frobenius norm: How many columns are needed for relative error approximation, i.e. a reconstruction error of at most

2 Main results

we will state all our bounds in terms of the latter quantity. Note that we chose to state our Frobenius norm bounds in terms of the square of the Frobenius norm; this choice facilitates comparisons with prior work and simplifies our proofs (see also Table 1 for a summary of our results).

Our bound implies a constant-factor approximation. Previous work presents deterministic near-optimal algorithms for $r=k$ ; we are unaware of any deterministic algorithms for $r>k$ .

Our last, yet perhaps most interesting result, guarantees relative-error Frobenius norm approximation by combining the algorithm of Theorem 4 with one round of adaptive sampling . This is the first relative-error approximation for Frobenius norm reconstruction that uses a linear number of columns in $k$ (the target rank). Previous work achieves relative error with $O(k\log k+k/\epsilon)$ columns. Our result asymptotically matches the $\Omega(k/\epsilon)$ lower bound in .

Notice that in Theorems 2 and 4, which use the deterministic spectral sparsification technique of Lemma 14 to select the columns, we only achieve a $2+\epsilon$ error by selecting $O(k/\epsilon^{2})$ columns. To improve this constant factor approximation to a relative error bound we used the adaptive sampling idea from .

3 Running times

the accuracy guarantees only weaken by small constant factors.

4 Lower Bounds

Table 2 provides a summary on lower bounds for the ratio

5 Prior results on column-based matrix reconstructions

There is a long literature on algorithms for column-based matrix reconstruction using $r\geq k$ columns. The first result goes back to , with the most recent one being, to the best of our knowledge, the work in .

We present known upper bounds for the approximation ratio

5.2 The spectral norm case

We present known guarantees for the approximation ratio

In general, results for spectral norm have been rare. When $r=k$ , the strongest bound emerges from the Strong Rank Revealing QR (RRQR) (specifically Algorithm 4 in ), which, for $f>1$ , runs in $O(mnk\log_{f}n)$ time and guarantees an $f^{2}k(n-k)+1$ approximation. For $r>k$ , to the best of our knowledge, there is no easy way to extend the RRQR guarantees. In fact we are only aware of one bound that is applicable to this domain, other than those obtained by trivially extending the Frobenius norm bounds, because any $\alpha$ -approximation in the Frobenius norm gives an $\alpha(\rho-k)$ -approximation in the spectral norm:

Main Tools

Our two main tools are the use of matrix factorizations for column-based low-rank matrix reconstruction, and two deterministic sparsification lemmas which extend the work of .

$(b)$ follows by matrix-Pythagoras because

We now prove Eqn. (3). In the above derivation up to (3.1), we have shown:

By strong submultiplicativity, the last term is bounded by

The proposed algorithm runs in $O\left(mnk\epsilon^{-1}\log\left(k^{-1}\min\{m,n\}\right)\right)$ time.

The proposed algorithm runs in $O\left(mnk\epsilon^{-1}\right)$ time.

2 Sparse approximate decompositions of the identity

Proofs of our Main Results

In this section, we leverage the main tools described in Section 3 in order to prove the results of Section 1.2 (Theorems 1 through 5). We start with a proof of Theorem 1, using Lemmas 10 and 13.

2 Proof of Theorem 3

3 Proof of Theorem 2

4 Proof of Theorem 4

5 Proof of Theorem 5

Finally, we will prove Theorem 5 by combining the results of Theorem 4 (a constant factor approximation algorithm) with one round of adaptive sampling. We first recall the following lemma, which has appeared in prior work .

The matrix $C$ can be computed in $O(mn+r\log r)$ time.

contain all the sampled columns. We will analyze the expectation

Finally, recall that for our choice of $s$ , $s\geq c_{0}k/\epsilon$ , and so we obtain the relative error bound. The number of columns needed is

and use $d=O(\epsilon^{-2/3})$ to get the final asymptotic run time.

We conclude by noting that the number of columns required for relative error approximation is approximately ${2k\over\epsilon}$ , a two-factor from optimal, since ${k\over\epsilon}$ columns are necessary (see and Section 9.1). We get an improved running time equal to

using just a constant factor more columns by setting $d$ and $\epsilon_{0}$ in the proof to constants (for example, setting $d=100,\ \epsilon_{0}={62\over 181}\approx{1\over 3}$ results in sampling ${3k\over\epsilon}(1+o(1))$ columns).

Proof of Lemma 11: Approximate SVD in the Spectral Norm

The running time of the above algorithm is $O(mnrq)$ . Corollary 10.10 of presents the following bound:

Let $x$ be a positive random variable. Then, for any $h\geq 1$ ,

To get the reverse inequality, we will use matrix-Pythagoras as follows:

where the last step follows from Lemma 25. We conclude that

Using $\sqrt{x^{2}+y^{2}}\leq x+y$ , we conclude that

collecting our results together, we obtain:

To conclude, combine with Eqn. (8) and note that

Proof of Lemma 12: Approximate SVD in the Frobenius Norm

The running time of the above algorithm is $O(mnr)$ . Theorem 10.5 in presents the following bound:

Dual Set Spectral Sparsification: proof of Lemma 13

In this section, we prove Lemma 13, which generalizes Theorem 3.1 in . Indeed, setting ${\cal V}={\cal U}$ reproduces the spectral sparsification result of Theorem 3.1 in . The basic observation is that the abalysis of for the upper bound and lower bound are essentially independent. This means that the analysis in directly applies when the upper and lower bounds are analyzed with respect to different sets of vectors. The only point at which the bounds are used together is when one needs to compute a weight that is in between the two bounds, which as show, is always possible with a single set of vectors and it is also true with different sets of vectors for the same reason. For completeness we present the details, simplifying a little and emphasizing the independent treatment of ${\mathbf{v}}_{i}$ and ${\mathbf{u}}_{i}$ in the proof.

As in , we will provide a constructive proof of the lemma and we start by describing the algorithm that computes the weights $s_{i}$ , $i=1,\ldots,n$ .

The fundamental idea underlying Algorithm 1 is the greedy selection of vectors that satisfy a number of desired properties in each step. These properties will eventually imply the eigenvalue bounds of Lemma 13. We start by defining several quantities that will be used in the description of the algorithm and its proof. First, fix two constants:

Algorithm 1 runs in $r$ steps. The initial vector of weights ${\mathbf{s}}_{0}$ is initialized to the all-zero vector. At each step $\tau=0,\ldots,r-1$ , the algorithm selects a pair of vectors $({\mathbf{u}}_{j},{\mathbf{v}}_{j})$ that satisfy Eqn. (10), computes the associated weight $t$ from Eqn. (11), and updates two matrices and the vector of weights appropriately, as specified in Eqn. (12).

2 Running time

3 Proof of Correctness

Now recall that Algorithm 1 runs in $r$ steps. Initially, all $n$ weights are set to zero. Assume that at the $\tau$ -th step ( $\tau=0,\ldots,r-1$ ) the vector of weights

At the $\tau$ -th step, for all $\tau=0,\ldots,r-1$ , there exists an index $j$ in $\left\{1,\ldots,n\right\}$ such that setting the weight $t>0$ as in Eqn. (11) satisfies

The lemma now follows by simple induction on $\tau$ .

We are now ready to conclude the proof of Lemma 13. By Lemma 31, at the $r$ -th step,

4 Proof of Lemma 30

In order to prove Lemma 30 we will use the following averaging argument.

Assuming ${\cal E}\geq 0$ the claim follows immediately because $\delta_{\textsc{l}}=1$ and

Thus, we only need to show that ${\cal E}\geq 0$ . From the Cauchy-Schwarz inequality, for $a_{i},b_{i}\geq 0$ , $\left(\sum_{i}a_{i}b_{i}\right)^{2}\leq\left(\sum_{i}a_{i}^{2}b_{i}\right)\left(\sum_{i}b_{i}\right)$ and thus

(recall that $r>k$ ). Second, $\lambda_{i}>{\textsc{l}}_{\tau+1}$ because

Combining these two observations with Eqn. (20) we conclude that ${\cal E}\geq 0$ .

Lemma 30 follows from Lemma 32 because the two inequalities must hold simultaneously for at least one index $j$ .

Dual-set Spectral-Frobenius Sparsification: proof of Lemma 14

In this section we will provide a constructive proof of Lemma 14. Our proof closely follows the proof of Lemma 13, so we will only highlight the differences. We first discuss modifications to Algorithm 1. First of all, the new inputs are ${\cal V}=\{{\mathbf{v}}_{1},\ldots,{\mathbf{v}}_{n}\}$ and ${\cal A}=\{{\mathbf{a}}_{1},\ldots,{\mathbf{a}}_{n}\}$ . The output is a set of $n$ non-negative weights $s_{i}$ , at most $r$ of which are non-zero. We define the parameters

Then, at the $\tau$ -th step, the algorithm will pick an index $j$ and compute a weight $t>0$ such that

The algorithm updates the vector of weights ${\mathbf{s}}_{\tau}$ and the matrix

It is worth noting that the algorithm does not need to update the matrix

At every step $\tau=0,\ldots,r-1$ there exists an index $j$ in $\left\{1,\ldots,n\right\}$ that satisfies Eqn. (22).

where the last equality follows from the definition of $\delta_{\textsc{u}}$ .

Using the conditions of the lemma and the definition of $U_{F}$ from Eqn. (21),

We can now combine Lemmas 28 and 34 to prove that at all steps $\tau=0,\ldots,r-1$ ,

Note that after all $r$ steps of the algorithm are completed,

Lower bounds

Theorem 35 below is the main result in this section.

We extend the lower bound in to arbitrary $r>k$ . Consider the matrix

where ${\mathbf{e}}_{i}\in\R^{n+1}$ are the standard basis vectors. Then,

2 Frobenius norm approximation

does not imply a lower bound for the ratio

Also, note that Proposition 4 in shows a lower bound equal to $\left(1+k/2r\right)$ for the ratio

For completeness, we extend the bound of to the ratio

In fact, we obtain a lower bound which is asymptotically $1+k/r$ . We start with the following lemma.

This last expression is minimized subject to the constraint that $\sum_{i=1}^{k}r_{i}=r$ when $r_{i}=r/k$ , and so

Where we used $\alpha^{2}=k{\alpha^{\prime}}^{2}$ . The result follows because

Conclusions and Open Problems

Several interesting questions remain unanswered, which we summarize in this section.

First, is it possible to improve the running time of the deterministic algorithms of Lemmas 13 and 14? Recently, Zouzias made progress in improving the running time of the spectral sparsification result of ; can we get a similar improvement for the 2-set algorithms presented here? Or perhaps, can we trade off the running time with randomization in those algorithms?

Acknowledgments

We would like to thank A. Deshpande, D. Feldman, K. Varadarajan, and J. Tropp for useful discussions, and D. Feldman and K. Varadarajan for pointing out the connections between the subspace approximation line of research and our work. We would also like to thank an anonymous reviewer for pointing out and how this result on oblique projections implies Eqn. (3) and its implications to Theorems 3 and 15; this avenue suggested by the reviewer slightly improved our previous bounds.

This work has been supported by NSF CCF-1016501 and NSF DMS-1008983 to P. Drineas and M. Magdon-Ismail. C. Boutsidis acknowledges the support from XDATA program of the Defense Advanced Research Projects Agency (DARPA), administered through Air Force Research Laboratory contract FA8750-12-C-0323.