Near-Optimal Column-Based Matrix Reconstruction
Christos Boutsidis, Petros Drineas, Malik Magdon-Ismail
Introduction
as possible. We present polynomial-time near-optimal constructions for arbitrary , settling important open questions regarding column-based matrix reconstruction.
Spectral norm: What is the best reconstruction error with columns? We present polynomial-time (deterministic and randomized) algorithms with approximation error asymptotically matching a lower bound proven in this work. Prior work had focused on the case and presented near-optimal polynomial-time algorithms .
Frobenius norm: How many columns are needed for relative error approximation, i.e. a reconstruction error of at most
2 Main results
we will state all our bounds in terms of the latter quantity. Note that we chose to state our Frobenius norm bounds in terms of the square of the Frobenius norm; this choice facilitates comparisons with prior work and simplifies our proofs (see also Table 1 for a summary of our results).
Our bound implies a constant-factor approximation. Previous work presents deterministic near-optimal algorithms for ; we are unaware of any deterministic algorithms for .
Our last, yet perhaps most interesting result, guarantees relative-error Frobenius norm approximation by combining the algorithm of Theorem 4 with one round of adaptive sampling . This is the first relative-error approximation for Frobenius norm reconstruction that uses a linear number of columns in (the target rank). Previous work achieves relative error with columns. Our result asymptotically matches the lower bound in .
Notice that in Theorems 2 and 4, which use the deterministic spectral sparsification technique of Lemma 14 to select the columns, we only achieve a error by selecting columns. To improve this constant factor approximation to a relative error bound we used the adaptive sampling idea from .
3 Running times
the accuracy guarantees only weaken by small constant factors.
4 Lower Bounds
Table 2 provides a summary on lower bounds for the ratio
5 Prior results on column-based matrix reconstructions
There is a long literature on algorithms for column-based matrix reconstruction using columns. The first result goes back to , with the most recent one being, to the best of our knowledge, the work in .
We present known upper bounds for the approximation ratio
5.2 The spectral norm case
We present known guarantees for the approximation ratio
In general, results for spectral norm have been rare. When , the strongest bound emerges from the Strong Rank Revealing QR (RRQR) (specifically Algorithm 4 in ), which, for , runs in time and guarantees an approximation. For , to the best of our knowledge, there is no easy way to extend the RRQR guarantees. In fact we are only aware of one bound that is applicable to this domain, other than those obtained by trivially extending the Frobenius norm bounds, because any -approximation in the Frobenius norm gives an -approximation in the spectral norm:
Main Tools
Our two main tools are the use of matrix factorizations for column-based low-rank matrix reconstruction, and two deterministic sparsification lemmas which extend the work of .
follows by matrix-Pythagoras because
We now prove Eqn. (3). In the above derivation up to (3.1), we have shown:
By strong submultiplicativity, the last term is bounded by
The proposed algorithm runs in time.
The proposed algorithm runs in time.
2 Sparse approximate decompositions of the identity
Proofs of our Main Results
In this section, we leverage the main tools described in Section 3 in order to prove the results of Section 1.2 (Theorems 1 through 5). We start with a proof of Theorem 1, using Lemmas 10 and 13.
2 Proof of Theorem 3
3 Proof of Theorem 2
4 Proof of Theorem 4
5 Proof of Theorem 5
Finally, we will prove Theorem 5 by combining the results of Theorem 4 (a constant factor approximation algorithm) with one round of adaptive sampling. We first recall the following lemma, which has appeared in prior work .
The matrix can be computed in time.
contain all the sampled columns. We will analyze the expectation
Finally, recall that for our choice of , , and so we obtain the relative error bound. The number of columns needed is
and use to get the final asymptotic run time.
We conclude by noting that the number of columns required for relative error approximation is approximately , a two-factor from optimal, since columns are necessary (see and Section 9.1). We get an improved running time equal to
using just a constant factor more columns by setting and in the proof to constants (for example, setting results in sampling columns).
Proof of Lemma 11: Approximate SVD in the Spectral Norm
The running time of the above algorithm is . Corollary 10.10 of presents the following bound:
Let be a positive random variable. Then, for any ,
To get the reverse inequality, we will use matrix-Pythagoras as follows:
where the last step follows from Lemma 25. We conclude that
Using , we conclude that
collecting our results together, we obtain:
To conclude, combine with Eqn. (8) and note that
Proof of Lemma 12: Approximate SVD in the Frobenius Norm
The running time of the above algorithm is . Theorem 10.5 in presents the following bound:
Dual Set Spectral Sparsification: proof of Lemma 13
In this section, we prove Lemma 13, which generalizes Theorem 3.1 in . Indeed, setting reproduces the spectral sparsification result of Theorem 3.1 in . The basic observation is that the abalysis of for the upper bound and lower bound are essentially independent. This means that the analysis in directly applies when the upper and lower bounds are analyzed with respect to different sets of vectors. The only point at which the bounds are used together is when one needs to compute a weight that is in between the two bounds, which as show, is always possible with a single set of vectors and it is also true with different sets of vectors for the same reason. For completeness we present the details, simplifying a little and emphasizing the independent treatment of and in the proof.
As in , we will provide a constructive proof of the lemma and we start by describing the algorithm that computes the weights , .
The fundamental idea underlying Algorithm 1 is the greedy selection of vectors that satisfy a number of desired properties in each step. These properties will eventually imply the eigenvalue bounds of Lemma 13. We start by defining several quantities that will be used in the description of the algorithm and its proof. First, fix two constants:
Algorithm 1 runs in steps. The initial vector of weights is initialized to the all-zero vector. At each step , the algorithm selects a pair of vectors that satisfy Eqn. (10), computes the associated weight from Eqn. (11), and updates two matrices and the vector of weights appropriately, as specified in Eqn. (12).
2 Running time
3 Proof of Correctness
Now recall that Algorithm 1 runs in steps. Initially, all weights are set to zero. Assume that at the -th step () the vector of weights
At the -th step, for all , there exists an index in such that setting the weight as in Eqn. (11) satisfies
The lemma now follows by simple induction on .
We are now ready to conclude the proof of Lemma 13. By Lemma 31, at the -th step,
4 Proof of Lemma 30
In order to prove Lemma 30 we will use the following averaging argument.
Assuming the claim follows immediately because and
Thus, we only need to show that . From the Cauchy-Schwarz inequality, for , and thus
(recall that ). Second, because
Combining these two observations with Eqn. (20) we conclude that .
Lemma 30 follows from Lemma 32 because the two inequalities must hold simultaneously for at least one index .
Dual-set Spectral-Frobenius Sparsification: proof of Lemma 14
In this section we will provide a constructive proof of Lemma 14. Our proof closely follows the proof of Lemma 13, so we will only highlight the differences. We first discuss modifications to Algorithm 1. First of all, the new inputs are and . The output is a set of non-negative weights , at most of which are non-zero. We define the parameters
Then, at the -th step, the algorithm will pick an index and compute a weight such that
The algorithm updates the vector of weights and the matrix
It is worth noting that the algorithm does not need to update the matrix
At every step there exists an index in that satisfies Eqn. (22).
where the last equality follows from the definition of .
Using the conditions of the lemma and the definition of from Eqn. (21),
We can now combine Lemmas 28 and 34 to prove that at all steps ,
Note that after all steps of the algorithm are completed,
Lower bounds
Theorem 35 below is the main result in this section.
We extend the lower bound in to arbitrary . Consider the matrix
where are the standard basis vectors. Then,
2 Frobenius norm approximation
does not imply a lower bound for the ratio
Also, note that Proposition 4 in shows a lower bound equal to for the ratio
For completeness, we extend the bound of to the ratio
In fact, we obtain a lower bound which is asymptotically . We start with the following lemma.
This last expression is minimized subject to the constraint that when , and so
Where we used . The result follows because
Conclusions and Open Problems
Several interesting questions remain unanswered, which we summarize in this section.
First, is it possible to improve the running time of the deterministic algorithms of Lemmas 13 and 14? Recently, Zouzias made progress in improving the running time of the spectral sparsification result of ; can we get a similar improvement for the 2-set algorithms presented here? Or perhaps, can we trade off the running time with randomization in those algorithms?
Acknowledgments
We would like to thank A. Deshpande, D. Feldman, K. Varadarajan, and J. Tropp for useful discussions, and D. Feldman and K. Varadarajan for pointing out the connections between the subspace approximation line of research and our work. We would also like to thank an anonymous reviewer for pointing out and how this result on oblique projections implies Eqn. (3) and its implications to Theorems 3 and 15; this avenue suggested by the reviewer slightly improved our previous bounds.
This work has been supported by NSF CCF-1016501 and NSF DMS-1008983 to P. Drineas and M. Magdon-Ismail. C. Boutsidis acknowledges the support from XDATA program of the Defense Advanced Research Projects Agency (DARPA), administered through Air Force Research Laboratory contract FA8750-12-C-0323.