OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings

Jelani Nelson, Huy L. Nguyen

Introduction

There has been much recent work on applications of dimensionality reduction to handling large datasets. Typically special features of the data such as low “intrinsic” dimensionality, or sparsity, are exploited to reduce the volume of data before processing, thus speeding up analysis time. One success story of this approach is the applications of fast algorithms for the Johnson-Lindenstrauss lemma [JL84], which allows one to reduce the dimension of a set of vectors while preserving all pairwise distances. There have been two popular lines of work in this area: one focusing on fast embeddings for all vectors [AC09, AL09, AL11, HV11, KMR12, KW11, Vyb11], and one focusing on fast embeddings specifically for sparse vectors [Ach03, BOR10, DKS10, KN10, KN12].

In this work we focus on the problem of constructing an oblivious subspace embedding (OSE) [Sar06] and on applications of these embeddings. Roughly speaking, the problem is to design a data-independent distribution over linear mappings such that when data come from an unknown low-dimensional subspace, they are reduced to roughly their true dimension while their structure (all distances in the subspace in this case) is preserved at the same time. It can be seen as a continuation of the approach based on the Johnson-Lindenstrauss lemma to subspaces. Here we focus on the setting of sparse inputs, where it is important that the algorithms take time proportional to the input sparsity. These embeddings have found applications in numerical linear algebra problems such as least squares regression, low rank approximation, and approximating leverage scores [CW09, CW12, DMIMW12, NDT09, Sar06, Tro11]. We refer the interested reader to the surveys [HMT11, Mah11] for an overview of this area.

Here n,d,ε,δn,d,\varepsilon,\delta are given parameters of the problem and we would like mm as small as possible.

The O(ndlogn)O(nd\log n) running time of the above scheme to compute ΠA\Pi A seems almost linear, and thus nearly optimal, since the input size is already ndnd to describe AA. While this is true for dense AA, in many practical instances one often expects the input matrix AA to be sparse, in which case linear time in the input description actually means O(nnz(A))O(\operatornamewithlimits{nnz}(A)), where nnz()\operatornamewithlimits{nnz}(\cdot) is the number of non-zero entries. For example consider the case of AA being the Netflix matrix, where Ai,jA_{i,j} is user ii’s score for movie jj: AA is very sparse since most users do not watch, let alone score, most movies [ZWSP08].

In the following paragraph we let SΠS_{\Pi} be the space required to store Π\Pi implicitly (e.g. store the seed to some hash function that specifies Π\Pi). We let tct_{c} be the running time required by an algorithm which, given a column index and the length-SΠS_{\Pi} seed specifying Π\Pi, returns the list of all non-zeroes in that column in Π\Pi.

1 Problem Statements and Bounds

We now formally define all numerical linear algebra problems we consider. Plugging our new OSE’s into previous algorithms for the above problems yields the bounds in Figure 1; the value rr used in bounds denotes rank(A)\operatorname{rank}(A).

Least Squares Regression:

Low Rank Approximation:

2 Our Construction and Techniques

Bounding this latter quantity is a simple calculation and fits in under a page (Theorem 3).

The two constructions with smaller md/ε2m\approx d/\varepsilon^{2} are the sparse Johnson-Lindenstrauss matrices of [KN12]. In particular, the only properties we need from our OSE in our analyses are the following. Let each matrix in the support of the OSE have entries in {0,1/s,1/s}\{0,1/\sqrt{s},-1/\sqrt{s}\}. For a randomly drawn Π\Pi, let δi,j\delta_{i,j} be an indicator random variable for the event Πi,j0\Pi_{i,j}\neq 0, and write Πi,j=δi,jσi,j/s\Pi_{i,j}=\delta_{i,j}\sigma_{i,j}/\sqrt{s}, where the σi,j\sigma_{i,j} are random signs. Then the properties we need are

For any j[n]j\in[n], i=1mδi,j=s\sum_{i=1}^{m}\delta_{i,j}=s with probability 11.

The second property says the δi,j\delta_{i,j} are negatively correlated. We call any matrix drawn from an OSE with the above properties an oblivious sparse norm-approximating projection (OSNAP).

The work of [KN12] gave two OSNAP distributions, either of which suffice for our current OSE problem. In the first construction, each column is chosen to have exactly ss non-zero entries in random locations, each equal to ±1/s\pm 1/\sqrt{s} uniformly at random. For our purposes the signs σi,j\sigma_{i,j} need only be O(logd)O(\log d)-wise independent, and each column can be specified by a O(logd)O(\log d)-wise independent permutation, and the seeds specifying the permutations in different columns need only be O(logd)O(\log d)-wise independent. In the second construction we pick hash functions h:[d]×[s][m/s]h:[d]\times[s]\rightarrow[m/s], σ:[d]×[s]{1,1}\sigma:[d]\times[s]\rightarrow\{-1,1\}, both O(logd)O(\log d)-wise independent, and thus each representable using O(logdlognd)O(\log d\log nd) random bits. For each (i,j)[d]×[s](i,j)\in[d]\times[s] we set Π(j1)s+h(i,j),i=σ(i,j)/s\Pi_{(j-1)s+h(i,j),i}=\sigma(i,j)/\sqrt{s}, and all other entries in Π\Pi are set to zero. Note also that the TZ sketch is itself an OSNAP with s=1s=1.

Our approach follows the classical moment method in random matrix theory; see [Tao12, Section 2] or [Ver12] introductions to this area. In particular, our approach is inspired by one taken by Bai and Yin [BY93], who in our notation were concerned with the case n=dn=d, U=IU=I, Π\Pi dense. Most of the complications in our proof arise because UU is not the identity matrix, so that rows of UU are not orthogonal. For example, in the case of UU having orthogonal rows all graphs GG in the last paragraph have no edges other than self-loops and are trivial to analyze.

Analysis

Let M,H,PM,H,P be n×nn\times n Hermitian matrices where MM has eigenvalues μ1μn\mu_{1}\geq\ldots\geq\mu_{n}, HH has eigenvalues ν1νn\nu_{1}\geq\ldots\geq\nu_{n}, and PP has eigenvalues ρ1ρn\rho_{1}\geq\ldots\geq\rho_{n}. Then  1in\forall\ 1\leq i\leq n, it holds that νi+ρnμiνi+ρ1\nu_{i}+\rho_{n}\leq\mu_{i}\leq\nu_{i}+\rho_{1}.

Let S=(ΠU)ΠUS=(\Pi U)^{*}\Pi U. Letting II be the d×dd\times d identity matrix, Weyl’s inequality with M=SM=S, H=(1+ε2)IH=(1+\varepsilon^{2})I, and P=S(1+ε2)IP=S-(1+\varepsilon^{2})I implies that all the eigenvalues of SS lie in the range [1+ε2+λmin(P),1+ε2+λmax(P)][1+ε2P,1+ε2+P][1+\varepsilon^{2}+\lambda_{min}(P),1+\varepsilon^{2}+\lambda_{max}(P)]\subseteq[1+\varepsilon^{2}-\|P\|,1+\varepsilon^{2}+\|P\|], where λmin(M)\lambda_{min}(M) (resp. λmax(M)\lambda_{max}(M)) is the smallest (resp. largest) eigenvalue of MM. Since Pε2+SI\|P\|\leq\varepsilon^{2}+\|S-I\|, it thus suffices to show

since P2ε\|P\|\leq 2\varepsilon implies that all eigenvalues of SS lie in [(1ε)2,(1+ε)2][(1-\varepsilon)^{2},(1+\varepsilon)^{2}].

Before proceeding with our proofs below, observe that for all k,kk,k^{\prime}

Noting uk,uk=uk2=1\langle u^{k},u^{k}\rangle=\|u^{k}\|^{2}=1 and uk,uk=0\langle u^{k},u^{k^{\prime}}\rangle=0 for kkk\neq k^{\prime}, we have for all k,kk,k^{\prime}

For Π\Pi an OSNAP with s=1s=1 and ε(0,1)\varepsilon\in(0,1), with probability at least 1δ1-\delta all singular values of ΠU\Pi U are 1±ε1\pm\varepsilon as long as mδ1(d2+d)/(2εε2)2m\geq\delta^{-1}(d^{2}+d)/(2\varepsilon-\varepsilon^{2})^{2}, σ\sigma is 44-wise independent, and hh is pairwise independent.

Noting 0=uk,uk2=k=1n(uik)2(uik)2+ijuikuikujkujk0=\langle u^{k},u^{k^{\prime}}\rangle^{2}=\sum_{k=1}^{n}(u^{k}_{i})^{2}(u^{k^{\prime}}_{i})^{2}+\sum_{i\neq j}u^{k}_{i}u^{k^{\prime}}_{i}u^{k}_{j}u^{k^{\prime}}_{j} we have that ijuikuikujkujk0\sum_{i\neq j}u^{k}_{i}u^{k^{\prime}}_{i}u^{k}_{j}u^{k^{\prime}}_{j}\leq 0, so

Before proving the next theorem, it is helpful to state a few facts that we will repeatedly use. Recall that uiu^{i} denotes the iith column of UU, and we will let uiu_{i} denote the iith row of UU.

and this inner product is 11 for i=ji=j and otherwise. \blacksquare

A multigraph GG has kk edge-disjoint spanning trees iff

for every partition PP of the vertex set of GG, where EP(G)E_{P}(G) is the set of edges of GG crossing between two different partitions in PP.

The following corollary is standard, and we will later only need it for the case k=2k=2.

Let GG be a multigraph formed by removing at most kk edges from a multigraph GG^{\prime} that has edge-connectivity at least 2k2k. Then GG must have at least kk edge-disjoint spanning trees.

Proof. For any partition PP of the vertex set, each partition must have at least 2k2k edges leaving it in GG^{\prime}. Thus the number of edges crossing partitions must be at least kPk|P| in GG^{\prime}, and thus at least kPkk|P|-k in GG. Theorem 6 thus implies that GG has kk edge-disjoint spanning trees. \blacksquare

Proof. We have supx,y=1xByB\sup_{\|x\|,\|y\|=1}x^{*}By\leq\|B\| since xByxByx^{*}By\leq\|x\|\cdot\|B\|\cdot\|y\|. To show that unit norm x,yx,y exist which achieve B\|B\|, let B=UΣVB=U\Sigma V^{*} be the singular value decomposition of BB. That is, U,VU,V are unitary and Σ\Sigma is diagonal with entries σ1σ2σd0\sigma_{1}\geq\sigma_{2}\geq\ldots\sigma_{d}\geq 0 so that B=σ1\|B\|=\sigma_{1}. We can then achieve xBy=σ1x^{*}By=\sigma_{1} by letting xx be the first column of UU and yy be the first column of VV. \blacksquare

For Π\Pi an OSNAP with s=Θ(log3(d/δ)/ε)s=\Theta(\log^{3}(d/\delta)/\varepsilon) and ε(0,1)\varepsilon\in(0,1), with probability at least 1δ1-\delta, all singular values of ΠU\Pi U are 1±ε1\pm\varepsilon as long as m=Ω(dlog8(d/δ)/ε2)m=\Omega(d\log^{8}(d/\delta)/\varepsilon^{2}) and σ,h\sigma,h are Ω(log(d/δ))\Omega(\log(d/\delta))-wise independent.

Proof. We will again show Eq. (2). Recall that by Eq. (1) we have

Now we make some observations. Due to the random signs σr,i\sigma_{r,i}, a monomial ψ\psi has expectation zero unless every bond in MR(G)\mathop{MR}(G) has even multiplicity, in which case the product of random signs in ψ\psi is 11. Also, note the expectation of the product of the δr,i\delta_{r,i} terms in ψ\psi is at most (s/m)b(s/m)^{b} by OSNAP properties. Thus letting G\mathcal{G} be the set of all such graphs GG with even bond multiplicity in MR(G)\mathop{MR}(G) that arise from some monomial ψ\psi appearing in Eq. (6), we have

Before continuing further it will be convenient to introduce a notion we will use in our analysis called a generalized dot product multigraph. Such a graph G^\widehat{G} is just as in the case of a dot product multigraph, except that each edge e=(i,j)e=(i,j) is associated with some matrix MeM_{e}. We call MeM_{e} the edge-matrix of ee. Also since G^\widehat{G} is undirected, we can think of an edge e=(i,j)e=(i,j) with edge-matrix MeM_{e} also as an edge (j,i)(j,i), in which case we say its associated edge-matrix is MeM_{e}^{*}. We then associate with G^\widehat{G} the product

Note that a dot product multigraph is simply a generalized dot product multigraph in which Me=IM_{e}=I for all ee. Also, in such a generalized dot product multigraph, we treat multiedges as representing the same bond iff the associated edge-matrices are also equal (in general multiedges may have different edge-matrices).

Let HH be a connected generalized dot product multigraph on vertex set [N][N] with E(H)E(H)\neq\emptyset and where every bond has even multiplicity. Also suppose that for all eE(H)e\in E(H), Me1\|M_{e}\|\leq 1. Define

where vai=uaiv_{a_{i}}=u_{a_{i}} for i1i\neq 1, and va1v_{a_{1}} equals some fixed vector cc with c1\|c\|\leq 1. Then f(H)c2f(H)\leq\|c\|^{2}.

Proof. Let π\pi be some permutation of {2,,N}\{2,\ldots,N\}. For a bond q=(i,j)B(H)q=(i,j)\in B(H), let 2αq2\alpha_{q} denote the multiplicity of qq in HH. Then by ordering the assignments of the ata_{t} in the summation

according to π\pi, we obtain the exactly equal expression

Here we have taken the product over tπ1(j)t\leq\pi^{-1}(j) as opposed to t<π1(j)t<\pi^{-1}(j) since there may be self-loops. By Lemma 5 and the fact that c1\|c\|\leq 1 we have that for any i,ji,j, vi,vj2vi2vj21\langle v_{i},v_{j}\rangle^{2}\leq\|v_{i}\|^{2}\cdot\|v_{j}\|^{2}\leq 1, so we obtain an upper bound on Eq. (8) by replacing each vaπ(t),vaj2αv\langle v_{a_{\pi(t)}},v_{a_{j}}\rangle^{2\alpha_{v}} term with vaπ(t),vaj2\langle v_{a_{\pi(t)}},v_{a_{j}}\rangle^{2}. We can thus obtain the sum

which upper bounds Eq. (8). Now note for 2tN2\leq t\leq N that for any nonnegative integer βt\beta_{t} and for {qB(H):q=(π(t),j),t<π1(j)}\{q\in B(H):q=(\pi(t),j),t<\pi^{-1}(j)\} non-empty (note the strict inequality t<π1(j)t<\pi^{-1}(j)),

where Eq. (10) used Lemma 5, Eq. (11) used Lemma 4, and Eq. (12) used that Mq1\|M_{q}\|\leq 1. Now consider processing the alternating sum-product in Eq. (9) from right to left. We say that a bond (i,j)B(H)(i,j)\in B(H) is assigned to ii if π1(i)<π1(j)\pi^{-1}(i)<\pi^{-1}(j). When arriving at the ttth sum-product and using the upper bound Eq. (11) on the previous t1t-1 sum-products, we will have a sum over vaπ(t)2\|v_{a_{\pi(t)}}\|^{2} raised to some nonnegative power (specifically the number of bonds incident upon π(t)\pi(t) but not assigned to π(t)\pi(t), plus one if π(t)\pi(t) has a self-loop) multiplied by a product of vaπ(t),vaj2\langle v_{a_{\pi(t)}},v_{a_{j}}\rangle^{2} over all bonds (π(t),j)(\pi(t),j) assigned to π(t)\pi(t). There are two cases. In the first case π(t)\pi(t) has no bonds assigned to it. We will ignore this case since we will show that we can choose π\pi to avoid it.

The other case is that π(t)\pi(t) has at least one bond assigned to it. In this case we are in the scenario of Eq. (11) and thus summing over aπ(t)a_{\pi(t)} yields a non-empty product of vaj2\|v_{a_{j}}\|^{2} for the jj for which (π(t),j)(\pi(t),j) is a bond assigned to π(t)\pi(t). Thus in our final sum, as long as we choose π\pi to avoid the first case, we are left with an upper bound of c\|c\| raised to some power equal to the edge-degree of vertex 11 in HH, which is at least 22. The lemma would then follow since cjc2\|c\|^{j}\leq\|c\|^{2} for j2j\geq 2.

It now remains to show that we can choose π\pi to avoid the first case where some t{2,,N}t\in\{2,\ldots,N\} is such that π(t)\pi(t) has no bonds assigned to it. Let TT be a spanning tree in HH rooted at vertex 11. We then choose any π\pi with the property that for any i<ji<j, π(i)\pi(i) is not an ancestor of π(j)\pi(j) in TT. This can be achieved, for example, by assigning π\pi values in reverse breadth first search order. \blacksquare

Let G^\widehat{G} be any dot product graph as in Eq. (7). Then

Proof. We first note that we have the inequality

We can view the sum over tt on the right hand side of the above as creating t1t-1 new dot product multigraphs, each with one fewer vertex where we eliminated vertex yy and associated it with vertex tt for some tt, and for each edge (y,a)(y,a) we effectively replaced it with (t,a)(t,a). Also in first sum where we sum over all nn values of aya_{y}, we have eliminated the constraints ayaia_{y}\neq a_{i} for iyi\neq y. By recursively applying this inequality to each of the resulting tt summations, we bound

by a sum of contributions from y!y! dot product multigraphs where in none of these multigraphs do we have the constraint that aiaja_{i}\neq a_{j} for iji\neq j. We will show that each one of these resulting multigraphs contributes at most dyw+1d^{y-w+1}, from which the lemma follows.

Let GG^{\prime} be one of the dot product multigraphs at a leaf of the above recursion so that we now wish to bound

where Me=IM_{e}=I for all ee for GG^{\prime}. Before proceeding, we first claim that every connected component of GG^{\prime} is Eulerian. To see this, observe GG has an Eulerian tour, by following the edges of GG in increasing order of label, and thus all middle vertices have even edge-degree in GG. However they also have even edge-degree in MR(G)\mathop{MR}(G), and thus the edge-degree of a middle vertex in LM(G)\mathop{LM}(G) must be even as well. Thus, every vertex in G^\widehat{G} has even edge-degree, and thus every vertex in each of the recursively created leaf graphs also has even edge-degree since at every step when we eliminate a vertex, some other vertex’s degree increases by the eliminated vertex’s degree which was even. Thus every connected component of GG^{\prime} is Eulerian as desired.

We now upper bound F(G)F(G^{\prime}). Let the connected components of GG^{\prime} be C1,,CCC(G)C_{1},\ldots,C_{CC(G^{\prime})}, where CC()CC(\cdot) counts connected components. An observation we repeatedly use later is that for any generalized dot product multigraph HH with components C1,,CCC(H)C_{1},\ldots,C_{CC(H)},

We treat GG^{\prime} as a generalized dot product multigraph so that each edge ee has an associated matrix MeM_{e} (though in fact Me=IM_{e}=I for all ee). Define an undirected multigraph to be good if all its connected components have two edge-disjoint spanning trees. We will show that F(G)F(G)F(G^{\prime})\leq F(G^{\prime\prime}) for some generalized dot product multigraph GG^{\prime\prime} that is good then will show F(G)dyw+1F(G^{\prime\prime})\leq d^{y-w+1}. If GG^{\prime} itself is good then we can set G=GG^{\prime\prime}=G^{\prime}. Otherwise, we will show F(G)=F(H0)==F(Hτ)F(G^{\prime})=F(H_{0})=\ldots=F(H_{\tau}) for smaller and smaller generalized dot product multigraphs HtH_{t} (i.e. with successively fewer vertices) whilst maintaining the invariant that each HtH_{t} has Eulerian connected components and has Me1\|M_{e}\|\leq 1 for all ee. We stop when some HτH_{\tau} is good and we can set G=HτG^{\prime\prime}=H_{\tau}.

Let us now focus on constructing this sequence of HtH_{t} in the case that GG^{\prime} is not good. Let H0=GH_{0}=G^{\prime}. Suppose we have constructed H0,,Ht1H_{0},\ldots,H_{t-1} for i1i\geq 1 none of which are good, and now we want to construct HtH_{t}. Since Ht1H_{t-1} is not good it cannot be 44-edge-connected by Corollary 7, so there is some connected component CjC_{j^{*}} of Ht1H_{t-1} with some cut SV(Cj)S\subsetneq V(C_{j^{*}}) with 22 edges crossing the cut (S,V(Cj)\S)(S,V(C_{j^{*}})\backslash S) (note that since CjC_{j^{*}} is Eulerian, any cut has an even number of edges crossing it). Choose such an SV(Cj)S\subsetneq V(C_{j^{*}}) with S|S| minimum amongst all such cuts. Let the two edges crossing the cut be (g,h),(g,h)(g,h),(g^{\prime},h^{\prime}) with h,hSh,h^{\prime}\in S (note that it may be the case that g=gg=g^{\prime} and/or h=hh=h^{\prime}). Note that F(Cj)F(C_{j^{*}}) equals the magnitude of

We define HtH_{t} to be Ht1H_{t-1} but where in the jj^{*}th component we replace CjC_{j^{*}} with Cj(V(Cj)\S)C_{j}^{*}(V(C_{j}^{*})\backslash S) and add an additional edge from gg to gg^{\prime} which we assign edge-matrix MM. We thus have that F(Ht1)=F(Ht)F(H_{t-1})=F(H_{t}) by Eq. (14). Furthermore each component of HtH_{t} is still Eulerian since every vertex in Ht1H_{t-1} has either been eliminated, or its edge-degree has been preserved and thus all edge-degrees are even. It remains to show that M1\|M\|\leq 1.

We first claim that Cj(S)C_{j^{*}}(S) has two edge-disjoint spanning trees. Define CC^{\prime} to be the graph Cj(S)C_{j^{*}}(S) with an edge from hh to hh^{\prime} added. We show that C(S)C^{\prime}(S) is 44-edge-connected so that Cj(S)C_{j^{*}}(S) has two edge-disjoint spanning trees by Corollary 7. Now to see this, consider some SSS^{\prime}\subsetneq S. Consider the cut (S,V(C)\S)(S^{\prime},V(C^{\prime})\backslash S^{\prime}). CC^{\prime} is Eulerian, so the number of edges crossing this cut is either 22 or at least 44. If it 22, then since S<S|S^{\prime}|<|S| this is a contradiction since SS was chosen amongst such cuts to have S|S| minimum. Thus it is at least 44, and we claim that the number of edges crossing the cut (S,S\S)(S^{\prime},S\backslash S^{\prime}) in C(S)C^{\prime}(S) must also be at least 44. If not, then it is 22 since C(S)C^{\prime}(S) is Eulerian. However since the number of edges leaving SS^{\prime} in CC^{\prime} is at least 44, it must then be that h,hSh,h^{\prime}\in S^{\prime}. But then the cut (S\S,V(C)\(S\S))(S\backslash S^{\prime},V(C^{\prime})\backslash(S\backslash S^{\prime})) has 22 edges crossing it so that S\SS\backslash S^{\prime} is a smaller cut than SS with 22 edges leaving it in CC^{\prime}, violating the minimality of S|S|, a contradiction. Thus C(S)C^{\prime}(S) is 44-edge-connected, implying Cj(S)C_{j^{*}}(S) has two edge-disjoint spanning trees T1,T2T_{1},T_{2} as desired.

Now to show M1\|M\|\leq 1, by Fact 8 we have M=supx,x=1xMx\|M\|=\sup_{\|x\|,\|x^{\prime}\|=1}x^{*}Mx^{\prime}. We have that

where Eq. (16) used the AM-GM inequality, and Eq. (17) used Lemma 10 (note the graph with vertex set S{g}S\cup\{g^{\prime}\} and edge set E(Cj(S))\T1{(g,h)}E(C_{j^{*}}(S))\backslash T_{1}\cup\{(g^{\prime},h^{\prime})\} is connected since T2E(Cj(S))\T1T_{2}\subseteq E(C_{j^{*}}(S))\backslash T_{1}). Thus we have shown that HtH_{t} satisfies the desired properties. Now notice that the sequence H0,,H1,H_{0},\ldots,H_{1},\ldots must eventually terminate since the number of vertices is strictly decreasing in this sequence and any Eulerian graph on 22 vertices is good. Therefore we have that HτH_{\tau} is eventually good for some τ>0\tau>0 and we can set G=HτG^{\prime\prime}=H_{\tau}.

It remains to show that for our final good GG^{\prime\prime} we have F(G)dyw+1F(G^{\prime\prime})\leq d^{y-w+1}. We will show this in two parts by showing that both CC(G)dyw+1CC(G^{\prime\prime})\leq d^{y-w+1} and F(G)dCC(G)F(G^{\prime\prime})\leq d^{CC(G^{\prime\prime})}. For the first claim, note that CC(G)CC(G^)CC(G^{\prime\prime})\leq CC(\widehat{G}) since every HtH_{t} has the same number of connected components as GG^{\prime}, and CC(G)CC(G^)CC(G^{\prime})\leq CC(\widehat{G}). This latter inequality holds since in each level of recursion used to eventually obtain GG^{\prime} from G^\widehat{G}, we repeatedly identified two vertices as equal and merged them, which can only decrease the number of connected components. Now, all middle vertices in GG lie in one connected component (since GG is connected) and MR(G)\mathop{MR}(G) has ww connected components. Thus the at least w1w-1 edges connecting these components in GG must come from LM(G)\mathop{LM}(G), implying that LM(G)\mathop{LM}(G) (and thus G^\widehat{G}) has at most yw+1y-w+1 connected components, which thus must also be true for GG^{\prime\prime} as argued above.

It only remains to show F(G)dCC(G)F(G^{\prime\prime})\leq d^{CC(G^{\prime\prime})}. Let GG^{\prime\prime} have connected components C1,,CCC(G)C_{1},\ldots,C_{CC(G^{\prime\prime})} with each CjC_{j} having 22 edge-disjoint spanning trees T1j,T2jT^{j}_{1},T^{j}_{2}. We then have

where Eq. (18) used the AM-GM inequality, and Eq. (19) used Lemma 10, which applies since V(Ct)V(C_{t}) with edge set T1tT^{t}_{1} is connected, and V(Ct)V(C_{t}) with edge set E(Ct)\T1tE(C_{t})\backslash T^{t}_{1} is connected (since T2tE(Ct)\T1tT_{2}^{t}\subseteq E(C_{t})\backslash T^{t}_{1}). \blacksquare

Let α,γ>0\alpha,\gamma>0 be arbitrary constants. For Π\Pi an OSNAP with s=Θ(1/ε)s=\Theta(1/\varepsilon) and ε(0,1)\varepsilon\in(0,1), with probability at least 11/dα1-1/d^{\alpha}, all singular values of ΠU\Pi U are 1±ε1\pm\varepsilon for m=Ω(d1+γ/ε2)m=\Omega(d^{1+\gamma}/\varepsilon^{2}) and σ,h\sigma,h being Ω(logd)\Omega(\log d)-wise independent. The constants in the big-Θ\Theta and big-Ω\Omega depend on α,γ\alpha,\gamma.

Applications

We use the fact that many matrix problems have the same time complexity as matrix multiplication including computing the matrix inverse [BH74][Har08, Appendix A], and QR decomposition [Sch73]. In this paper we only consider the real RAM model and state the running time in terms of the number of field operations. The algorithms for solving linear systems, computing inverse, QR decomposition, and approximating SVD based on fast matrix multiplication can be implemented with precision comparable to that of conventional algorithms to achieve the same error bound (with a suitable notion of approximation/stability). We refer readers to [DDH07] for details. Notice that it is possible that both algorithms based on fast matrix multiplication and conventional counterparts are unstable, see e.g. [AV97] for an example of a pathological matrix with very high condition number.

In this section we describe some applications of our subspace embeddings to problems in numerical linear algebra. All applications follow from a straightforward replacement of previously used embeddings with our new ones as most proofs go through verbatim. In the statement of our bounds we implicitly assume nnz(A)n\operatornamewithlimits{nnz}(A)\geq n, since otherwise fully zero rows of AA can be ignored without affecting the problem solution.

This section describes the application of our subspace embedding from Theorem 9 or Theorem 12 to approximating the leverage scores. Consider a matrix AA of size n×dn\times d and rank rr. Let UU be a n×rn\times r matrix whose columns form an orthonormal basis of the column space of AA. The leverage scores of AA are the squared lengths of the rows of UU. The algorithm for approximating the leverage scores and the analysis are the same as those of [CW12], which itself uses essentially the same algorithm outline as Algorithm 1 of [DMIMW12]. The improved bound is stated below (cf. [CW12, Theorem 21]).

2 Least Squares Regression

There is an algorithm for least squares regression running in time O(nnz(A)+d3log(d/ε)/ε2)O(\operatornamewithlimits{nnz}(A)+d^{3}\log(d/\varepsilon)/\varepsilon^{2}) and succeeding with probability at least 2/32/3.

Proof. Applying Theorem 3 to the subspace spanned by columns of AA and bb, we get a distribution over matrices Π\Pi of size O(d2/ε2)×nO(d^{2}/\varepsilon^{2})\times n such that Π\Pi preserves lengths of vectors in the subspace up to a factor 1±ε1\pm\varepsilon with probability at least 5/65/6. Thus, we only need to find argminxΠAxΠb2\operatornamewithlimits{argmin}_{x}\|\Pi Ax-\Pi b\|_{2}. Note that ΠA\Pi A has size O(d2/ε2)×dO(d^{2}/\varepsilon^{2})\times d. By Theorem 12 of [Sar06], there is an algorithm that with probability at least 5/65/6, finds a 1±ε1\pm\varepsilon approximate solution for least squares regression for the smaller input of ΠA\Pi A and Πb\Pi b and runs in time O(d3log(d/ε)/ε2)O(d^{3}\log(d/\varepsilon)/\varepsilon^{2}). \blacksquare

The following theorem follows from using the embedding of Theorem 9 and the same argument as [CW12, Theorem 32].

Let rr be the rank of AA. There is an algorithm for least squares regression running in time O(nnz(A)((logr)O(1)+log(n/ε))+rω(logr)O(1)+r2log(1/ε))O(\operatornamewithlimits{nnz}(A)((\log r)^{O(1)}+\log(n/\varepsilon))+r^{\omega}(\log r)^{O(1)}+r^{2}\log(1/\varepsilon)) and succeeding with probability at least 2/32/3.

3 Low Rank Approximation

In this section, we describe the application of our subspace embeddings to low rank approximation. Here given a matrix AA, one wants to find a rank kk matrix AkA_{k} minimizing AAkF\|A-A_{k}\|_{F}. Let Δk\Delta_{k} be the minimum AAkF\|A-A_{k}\|_{F} over all rank kk matrices AkA_{k}. Notice that our matrices are of the same form as sparse JL matrices considered by [KN12] so the following property holds for matrices constructed in Theorem 9 (cf. [CW12, Lemma 24]).

[KN12, Theorem 19] Fix ε,δ>0\varepsilon,\delta>0. Let D\mathcal{D} be the distribution over matrices given in Theorem 9 with nn columns. For any matrices A,BA,B with nn rows,

The matrices of Theorem 3 are the same as those of [CW12] so the above property holds for them as well. Therefore, the same algorithm and analysis as in [CW12] work. We state the improved bounds using the embedding of Theorem 3 and Theorem 9 below (cf. [CW12, Theorem 36 and 38]).

Given a matrix AA of size n×nn\times n, there are 2 algorithms that, with probability at least 3/53/5, find 3 matrices U,Σ,VU,\Sigma,V where UU is of size n×kn\times k, Σ\Sigma is of size k×kk\times k, VV is of size n×kn\times k, UTU=VTV=IkU^{T}U=V^{T}V=I_{k}, Σ\Sigma is a diagonal matrix, and

Proof. The proof is essentially the same as that of [CW12] so we only mention the difference. We use 2 bounds for the running time: multiplying an a×ba\times b matrix and a b×cb\times c matrix with c>ac>a takes O(aω2bc)O(a^{\omega-2}bc) time (simply dividing the matrices into a×aa\times a blocks), and approximating SVD for an a×ba\times b matrix MM with a>ba>b takes O(abω1)O(ab^{\omega-1}) time (time to compute MTMM^{T}M, approximate SVD of MTM=QDQTM^{T}M=QDQ^{T} in O(bω)O(b^{\omega}) time [DDH07], and compute MQMQ to complete the SVD of MM). \blacksquare

Acknowledgments

We thank Andrew Drucker for suggesting the SNAP acronym for the OSE’s considered in this work, to which we added the “oblivious” descriptor.

References