An Efficient Parallel Solver for SDD Linear Systems

Richard Peng, Daniel A. Spielman

Introduction

The problem of solving systems of linear equations in symmetric, diagonally dominant (SDD) matrices, and in particular the Laplacian matrices of graphs, arises in applications ranging from the solution of elliptic PDEs [BHV08], to the computation of maximum flows in graphs [DS08, CKM+11, Mad13], to semi-supervised learning [ZGL03, ZS04, ZBL+03]. It has recently been exploited as a primitive in many other algorithms [KM09, KMP12, OSV12, LKP12, KMT11].

In the last decade, there have been remarkable advances in the development of fast algorithms for solving systems of linear equations in SDD matrices. Following an approach suggested by Vaidya [Vai90], Spielman and Teng [ST08] developed a nearly-linear time algorithm for this problem. Their algorithm has three main ingredients: a multi-level framework suggested by Vaidya [Vai90], Joshi [Jos97] and Reif [Rei98]; low-stretch spanning tree preconditioners, introduced by Boman and Hendrickson [BH01] as an improvement of Vaidya’s maximum spanning tree preconditioners; and spectral graph sparsifiers [ST11]. Koutis, Miller and Peng [KMP10, KMP11] developed a simpler and faster way to exploit these ingredients, resulting in an algorithm for finding ϵ\epsilon-approximate solutions to these systems in time O~(nlognlogϵ1)\widetilde{O}(n\log n\log\epsilon^{-1}).

Kelner, Orecchia, Sidford and Zhu [KOSZ13] discovered a simple and efficient algorithm for solving these linear equations that relies only on low-stretch spanning trees, avoiding the use of graph sparsifiers and the multi-level framework. This algorithm has been improved by Lee and Sidford [LS13] to run in time O~(nlog3/2nlogϵ1)\widetilde{O}(n\log^{3/2}n\log\epsilon^{-1}).

There has also been some progress in parallelizing these solvers. When describing a parallel algorithm, we call the number of operations performed its work and the time it takes in parallel its depth. For planar graphs, Koutis and Miller gave an algorithm that requires nearly-linear work and depth close to m1/6m^{1/6} [KM07]. Their approach was extended to general graphs by Blelloch, et. al., who gave parallel algorithms for constructing low-stretch embeddings, leading to an algorithm with depth close to m1/3m^{1/3} [BGK+11].

We present the first algorithm that requires nearly-linear work and poly-logarithmic depth. Unlike the previous nearly-linear time algorithms for solving systems in SDD matrices, our algorithm does not require low-stretch spanning trees. It merely requires spectral sparsifiers of graphs, which we use to construct a sparse approximate inverse chain. It would be easy to solve a system of equations in a matrix if one had a sparse matrix that approximates its inverse. While not all matrices have sparse approximate inverses, we construct something almost as good for SDD matrices: a sequence of sparse matrices whose product approximates the inverse.

Given a sparse approximate inverse chain, the resulting algorithm for solving the linear equations is very simple: we multiply a vector by each matrix in the chain twice. The reason that we multiply by each twice, rather than once, is to make the solver a symmetric operator. This is explained in more detail in Section 3. The algorithm is analogous to the V-cycle used in the Multigrid method [BHM01], whereas the multilevel approach of [ST08, KMP10, KMP11] required the solution of the smallest systems many times, and better resembles the Multigrid W-cyclesVariants of the Multigrid algorithm are among the most popular approaches to solving large SDD linear systems in practice. The advantage of these algorithms is that they can be efficiently parallelized. However, there is no analysis proving that they should work in general. In fact, the theory suggests that V-cycles should be sufficient, while the more expensive W-cycles are often necessary. .

There is a polynomial time algorithm that on input an nn-dimensional SDDM matrix M\boldsymbol{\mathit{M}} with mm nonzeros and condition number at most κ\kappa, produces a sparse approximate inverse chain that can be used to solve any linear equation in M\boldsymbol{\mathit{M}} to any precision ϵ\epsilon in work O((m+nlog3κ)log1/ϵ)O((m+n\log^{3}\kappa)\log 1/\epsilon) and depth O(lognlogκlog1/ϵ)O(\log n\log\kappa\log 1/\epsilon).

The second is a proof that slightly weaker approximate inverse chains exist and can be constructed in nearly-linear work and polylogarithmic time.

There is an algorithm that on input an nn-dimensional SDDM matrix M\boldsymbol{\mathit{M}} with mm nonzeros and condition number at most κ\kappa, produces with probability at least 1/21/2 a sparse approximate inverse chain that can be used to solve any linear equation in M\boldsymbol{\mathit{M}} to any precision ϵ\epsilon in work O((m+nlogcnlog3κ)log1/ϵ)O((m+n\log^{c}n\log^{3}\kappa)\log 1/\epsilon) and depth O(lognlogκlog1/ϵ)O(\log n\log\kappa\log 1/\epsilon), for some constant cc. Moreover, the algorithm runs in work O(mlogc1nlog3κ)O(m\log^{c_{1}}n\log^{3}{\kappa}) and depth O(logc2nlogκ)O(\log^{c_{2}}n\log{\kappa}) for some other constants c1c_{1} and c2c_{2}.

We present an overview of our algorithm in Section 3, and supply the details later.

Background and Notation

Throughout this paper we will consider symmetric matrices. A symmetric matrix is diagonally dominant if each diagonal entry is at leastThis is sometimes called weak diagonal dominance, with strict diagonal dominance referring to the case in which each diagonal is strictly larger. as large as the sum of the absolute values of the other entries in its row. The problem of solving systems of linear equations in symmetric, diagonally dominant (SDD) matrices can be reduced to the problems of solving systems in either Laplacian matrices—SDD matrices with zero rows sums and non-positive off-diagonal elements—or SDDM matrices—positive definite SDD matrices with non-positive off-diagonal elements. The reductions from one form to another approximately preserve the condition numbers of the system (see Appendix A).

A fundamental property of a matrix is its condition number: the ratio of its largest eigenvalue to its smallest. In the case of a Laplacian matrix, which has as an eigenvalue, we consider the ratio to the second-smallest. The condition number measures how much the solution to a linear system in a matrix can change if small changes are made to the target vector or the matrix itself. Thus, the condition number is a lower bound on the precision required when carrying out computations with a matrix. It is also appears in the running time of many numerical algorithms, including the one we present here. The condition number of the Laplacian matrix of a connected weighted graph with nn vertices is at most O(n3wmax/wmin)O(n^{3}w_{max}/w_{min}), where wmaxw_{max} and wminw_{min} are the largest and smallest weights of edges in the graph (see [ST08, Lemma 6.1]) The condition number of a submatrix of a Laplacian is at most O(n4wmax/wmin)O(n^{4}w_{max}/w_{min}) (see Appendix A). Our algorithms will run in time depending upon the logarithm of the condition number.

We recall that a symmetric matrix X\boldsymbol{\mathit{X}} is positive definite if all of its eigenvalues are positive. We indicate this through the notation \boldsymbol{\mathit{X}}\succ{\mbox{\boldmath0}}, where indicates the all-0 matrix. We similarly write \boldsymbol{\mathit{X}}\succcurlyeq{\mbox{\boldmath0}} if X\boldsymbol{\mathit{X}} is positive semidefinite. For matrices X\boldsymbol{\mathit{X}} and Y\boldsymbol{\mathit{Y}}, we write XY\boldsymbol{\mathit{X}}\succ\boldsymbol{\mathit{Y}} or XY\boldsymbol{\mathit{X}}\succcurlyeq\boldsymbol{\mathit{Y}} to indicate that XY\boldsymbol{\mathit{X}}-\boldsymbol{\mathit{Y}} is positive definite or positive semidefinite.

This partial ordering allows us to define a notion of approximation for matrices. We write XϵY\boldsymbol{\mathit{X}}\approx_{\epsilon}\boldsymbol{\mathit{Y}} to indicate that

and we remark that this relation is symmetric. While it is more common to use 1+ϵ1+\epsilon in place of exp(ϵ)\exp(\epsilon) above, the exponential facilitates combining approximations. The ϵ\epsilons will be small everywhere in this paper, and we note that when ϵ\epsilon is close to , exp(ϵ)\exp(\epsilon) is between 1+ϵ1+\epsilon and 1+2ϵ1+2\epsilon. The following facts about the positive definite order and approximation are standard (see [BGH+06, BH03, ST08]).

For positive semidefinite matrices X\boldsymbol{\mathit{X}}, Y\boldsymbol{\mathit{Y}}, W\boldsymbol{\mathit{W}} and Z\boldsymbol{\mathit{Z}},

if YϵZ\boldsymbol{\mathit{Y}}\approx_{\epsilon}\boldsymbol{\mathit{Z}}, then X+YϵX+Z\boldsymbol{\mathit{X}}+\boldsymbol{\mathit{Y}}\approx_{\epsilon}\boldsymbol{\mathit{X}}+\boldsymbol{\mathit{Z}};

if XϵY\boldsymbol{\mathit{X}}\approx_{\epsilon}\boldsymbol{\mathit{Y}} and WϵZ\boldsymbol{\mathit{W}}\approx_{\epsilon}\boldsymbol{\mathit{Z}}, then X+WϵY+Z\boldsymbol{\mathit{X}}+\boldsymbol{\mathit{W}}\approx_{\epsilon}\boldsymbol{\mathit{Y}}+\boldsymbol{\mathit{Z}};

if Xϵ1Y\boldsymbol{\mathit{X}}\approx_{\epsilon_{1}}\boldsymbol{\mathit{Y}} and Yϵ2Z\boldsymbol{\mathit{Y}}\approx_{\epsilon_{2}}\boldsymbol{\mathit{Z}}, then Xϵ1+ϵ2Z\boldsymbol{\mathit{X}}\approx_{\epsilon_{1}+\epsilon_{2}}\boldsymbol{\mathit{Z}};

if X\boldsymbol{\mathit{X}} and Y\boldsymbol{\mathit{Y}} are positive definite matrices such that XϵY\boldsymbol{\mathit{X}}\approx_{\epsilon}\boldsymbol{\mathit{Y}}, then X1ϵY1\boldsymbol{\mathit{X}}^{-1}\approx_{\epsilon}\boldsymbol{\mathit{Y}}^{-1};

if XϵY\boldsymbol{\mathit{X}}\approx_{\epsilon}\boldsymbol{\mathit{Y}} and V\boldsymbol{\mathit{V}} is a matrix, then VTXVϵVTYV.\boldsymbol{\mathit{V}}^{T}\boldsymbol{\mathit{X}}\boldsymbol{\mathit{V}}\approx_{\epsilon}\boldsymbol{\mathit{V}}^{T}\boldsymbol{\mathit{Y}}\boldsymbol{\mathit{V}}.

Overview

Our algorithm is most naturally described for SDDM matrices. They are the matrices M\boldsymbol{\mathit{M}} that can be written as

where D\boldsymbol{\mathit{D}} is a nonnegative diagonal matrix, A\boldsymbol{\mathit{A}} is a symmetric nonnegative matrix, and DA\boldsymbol{\mathit{D}}\succ\boldsymbol{\mathit{A}}.

Our algorithm is inspired by the following identity, which holds for matrices A\boldsymbol{\mathit{A}} of norm less than 11.

As the terms corresponding to larger values of kk make shrinking contributions, it is possible to obtain a good approximation to (IA)1(\boldsymbol{\mathit{I}}-\boldsymbol{\mathit{A}})^{-1} by truncating this product.

We exploit a variation of this identity that is symmetric, and thus allows us to employ sparsifiersThe product of sparse approximations of matrices need not be an approximation of the product, unless the products are taken symmetrically. . Our identity also applies more naturally to SDDM matrices. It is:

This tells us that, aside for a few multiplications of a vector by a matrix, we can reduce the problem of solving a linear equation in DA\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}} to the problem of solving a linear equation in DAD1A\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}. Moreover, this latter matrix is also a SDDM matrix (Proposition 5.6). We can in turn reduce this to the problem of solving a system of equations in

After applying this identity kk times, we obtain the matrix

As kk grows large, it becomes easy to approximately solve systems of linear equations in this matrix because all of the eigenvalues of D1A\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}} are less than 11 in absolute value. So, the term (D1A)2k(\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}})^{2^{k}} becomes negligible and the matrix approaches D\boldsymbol{\mathit{D}} (Corollary 5.5).

However, this does not yet yield an efficient algorithm as it requires us to multiply vectors by matrices of the form

for several integers kk. This can require a lot of work if those matrices are dense. We overcome this problem by sparsifying those matrices.

We now describe this process in more detail. Let M0=D0A0\boldsymbol{\mathit{M}}_{0}=\boldsymbol{\mathit{D}}_{0}-\boldsymbol{\mathit{A}}_{0} be the matrix that defines the equation we want to solve. We observe (in Proposition 5.6) that the matrix D0A0D01A0\boldsymbol{\mathit{D}}_{0}-\boldsymbol{\mathit{A}}_{0}\boldsymbol{\mathit{D}}_{0}^{-1}\boldsymbol{\mathit{A}}_{0} is also positive definite and diagonally dominant. So, a spectral sparsifier [ST11, BSS12] will allow us to approximate this matrix by a sparse SDDM matrix D1A1\boldsymbol{\mathit{D}}_{1}-\boldsymbol{\mathit{A}}_{1}. Proceeding in this fashion, we obtain a sequence of sparse matrices

and so that Ai\boldsymbol{\mathit{A}}_{i} becomes negligible as ii grows. These provide a natural algorithm for solving a system of equations of the form M0x=b0\boldsymbol{\mathit{M}}_{0}\boldsymbol{\mathit{x}}=\boldsymbol{\mathit{b}}_{0}, which we now present.

x=\textscSolve((M0,D0)(Md,Dd),b)\boldsymbol{\mathit{x}}=\textsc{Solve}((\boldsymbol{\mathit{M}}_{0},\boldsymbol{\mathit{D}}_{0})\ldots(\boldsymbol{\mathit{M}}_{d},\boldsymbol{\mathit{D}}_{d}),\boldsymbol{\mathit{b}}) 1. For ii from 11 to dd, set bi=(I+AiDi1)bi1.\boldsymbol{\mathit{b}}_{i}=(\boldsymbol{\mathit{I}}+\boldsymbol{\mathit{A}}_{i}\boldsymbol{\mathit{D}}_{i}^{-1})\boldsymbol{\mathit{b}}_{i-1}. 2. Set xd=Dd1bd\boldsymbol{\mathit{x}}_{d}=\boldsymbol{\mathit{D}}_{d}^{-1}\boldsymbol{\mathit{b}}_{d}. 3. For ii from d1d-1 downto , set xi=12(Di1bi+(I+Di1Ai)xi+1).\boldsymbol{\mathit{x}}_{i}=\frac{1}{2}(\boldsymbol{\mathit{D}}_{i}^{-1}\boldsymbol{\mathit{b}}_{i}+(\boldsymbol{\mathit{I}}+\boldsymbol{\mathit{D}}_{i}^{-1}\boldsymbol{\mathit{A}}_{i})\boldsymbol{\mathit{x}}_{i+1}).

The work required by Solve is proportional to the total number of nonzero entries in the matrices in the chain, and the depth is proportional to logn\log n times dd.

If the approximations in (2) are good enough, then Mx0\boldsymbol{\mathit{M}}\boldsymbol{\mathit{x}}_{0} will be a good approximation of b0\boldsymbol{\mathit{b}}_{0}. We now roughly estimate how good these approximations need to be. We carry out the details in the following sections. If the condition number of M\boldsymbol{\mathit{M}} is κ\kappa, then the largest eigenvalue of D1A\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}} at most 11/κ1-1/\kappa (Proposition 5.3).

So, if we take d=O(log(κ))d=O(\log(\kappa)), all of the eigenvalues of (D1A)2d(\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}})^{2^{d}} will be close to 0. At every level of the recursion, we will incur a loss in approximation quality. To limit the overall loss to a constant, we will require the approximation in (2) to be at least as good as 1/2d\approx_{1/2d}. Using the best-known sparsifiers, we could achieve this with sparsifiers that have O(d2n)O(d^{2}n) edges. So, the total number of nonzero entries in all of the matrices A1,,AdA_{1},\dotsc,A_{d} will be O(d3n)=O(nlog3κ)O(d^{3}n)=O(n\log^{3}\kappa). We establish the existence of such chains in Section 5. If we instead use sparsifiers that we presently know how to construct efficiently, we multiply the total number of nonzero entries by a factor of O(logcn)O(\log^{c}n), for some constant cc. In Section 6, we give a nearly linear work, polylog depth algorithm for constructing these chains.

Approximate Inverse Chains

for all 1id1\leq i\leq d, Miϵi1Di1Ai1Di11Ai1\boldsymbol{\mathit{M}}_{i}\approx_{\epsilon_{i-1}}\boldsymbol{\mathit{D}}_{i-1}-\boldsymbol{\mathit{A}}_{i-1}\boldsymbol{\mathit{D}}_{i-1}^{-1}\boldsymbol{\mathit{A}}_{i-1};

for all 1id1\leq i\leq d, Diϵi1Di1\boldsymbol{\mathit{D}}_{i}\approx_{\epsilon_{i-1}}\boldsymbol{\mathit{D}}_{i-1}; and,

DdϵdMd\boldsymbol{\mathit{D}}_{d}\approx_{\epsilon_{d}}\boldsymbol{\mathit{M}}_{d}.

For the rest of this section, we assume that D1,,Dd\boldsymbol{\mathit{D}}_{1},\dotsc,\boldsymbol{\mathit{D}}_{d} and A1,,Ad\boldsymbol{\mathit{A}}_{1},\dotsc,\boldsymbol{\mathit{A}}_{d} are an approximate inverse chain for D0A0\boldsymbol{\mathit{D}}_{0}-\boldsymbol{\mathit{A}}_{0}.

The choice of the bound of 22 on ϵi\sum\epsilon_{i} is not particularly important: any constant will allow us to use an approximate inverse chain to solve systems of linear equations in D0A0\boldsymbol{\mathit{D}}_{0}-\boldsymbol{\mathit{A}}_{0}. We remark that many of the matrices Ai\boldsymbol{\mathit{A}}_{i} will have nonzero diagonal entries. For this reason, we keep the representation of Di\boldsymbol{\mathit{D}}_{i} and Ai\boldsymbol{\mathit{A}}_{i} separate.

We now show that an approximate inverse chain allows one to crudely solve systems of linear equations in D0A0\boldsymbol{\mathit{D}}_{0}-\boldsymbol{\mathit{A}}_{0} in time proportional to the number of nonzero entries in the matrices in the chain. We first verify that if we replace DiAiDi1Ai\boldsymbol{\mathit{D}}_{i}-\boldsymbol{\mathit{A}}_{i}\boldsymbol{\mathit{D}}_{i}^{-1}\boldsymbol{\mathit{A}}_{i} by Mi+1\boldsymbol{\mathit{M}}_{i+1} in identity (1), then we still obtain a good approximation to Mi1\boldsymbol{\mathit{M}}_{i}^{-1}.

We now use Lemma 4.2, to prove that Solve approximates the inverse of M0\boldsymbol{\mathit{M}}_{0}.

Let Z0\boldsymbol{\mathit{Z}}_{0} be the operator defined by Solve. That is, x0=Z0b0\boldsymbol{\mathit{x}}_{0}=\boldsymbol{\mathit{Z}}_{0}\boldsymbol{\mathit{b}}_{0}. Then,

Let Zi\boldsymbol{\mathit{Z}}_{i} be the operator such that xi=Zibi\boldsymbol{\mathit{x}}_{i}=\boldsymbol{\mathit{Z}}_{i}\boldsymbol{\mathit{b}}_{i}. That is,

Zd=Dd1\boldsymbol{\mathit{Z}}_{d}=\boldsymbol{\mathit{D}}_{d}^{-1}, and

for 0id10\leq i\leq d-1, Zi=12(Di1+(I+Di1Ai)Zi+1(I+AiDi1))\boldsymbol{\mathit{Z}}_{i}=\frac{1}{2}\left(\boldsymbol{\mathit{D}}_{i}^{-1}+\left(\boldsymbol{\mathit{I}}+\boldsymbol{\mathit{D}}_{i}^{-1}\boldsymbol{\mathit{A}}_{i}\right)\boldsymbol{\mathit{Z}}_{i+1}\left(\boldsymbol{\mathit{I}}+\boldsymbol{\mathit{A}}_{i}\boldsymbol{\mathit{D}}_{i}^{-1}\right)\right).

We will prove by reverse induction on ii that

The base case of i=di=d follows from applying Fact 2.1d to DdϵdMd\boldsymbol{\mathit{D}}_{d}\approx_{\epsilon_{d}}\boldsymbol{\mathit{M}}_{d}. For the induction, suppose the result is true for i+1i+1. Then the induction hypothesis gives

by Lemma 4.2. An application of Fact 2.1c completes the induction. ∎

This leads to a constant factor approximation of M0\boldsymbol{\mathit{M}}_{0}. In order to turn this into a high quality approximation, we can use preconditioned Richardson iteration.

[Saa03, Axe94] There exists an algorithm PreconRichardson such that for any symmetric positive semi-definite matrices A\boldsymbol{\mathit{A}} and B\boldsymbol{\mathit{B}} such that BO(1)A1\boldsymbol{\mathit{B}}\approx_{O(1)}\boldsymbol{\mathit{A}}^{-1}, and any error tolerance 0<ϵ1/20<\epsilon\leq 1/2,

Under exact arithmetic, \textscPreconRichardson(A,B,b,ϵ)\textsc{PreconRichardson}(\boldsymbol{\mathit{A}},\boldsymbol{\mathit{B}},\boldsymbol{\mathit{b}},\epsilon) is a linear operator on b\boldsymbol{\mathit{b}} and if Z\boldsymbol{\mathit{Z}} is the matrix such that Zb=\textscPreconRichardson(A,B,b,ϵ)\boldsymbol{\mathit{Z}}\boldsymbol{\mathit{b}}=\textsc{PreconRichardson}(\boldsymbol{\mathit{A}},\boldsymbol{\mathit{B}},\boldsymbol{\mathit{b}},\epsilon), then

\textscPreconRichardson(A,B,b,ϵ)\textsc{PreconRichardson}(\boldsymbol{\mathit{A}},\boldsymbol{\mathit{B}},\boldsymbol{\mathit{b}},\epsilon) takes O(log(1/ϵ))O(\log(1/\epsilon)) iterations, each consisting of one multiplication of a vector by A\boldsymbol{\mathit{A}} and one by B\boldsymbol{\mathit{B}}.

This allows us to solve linear equations in M0\boldsymbol{\mathit{M}}_{0} to arbitrary precision, and the overall performance of the solver can be summarized as follows.

Given an approximate inverse chain for M0\boldsymbol{\mathit{M}}_{0} where Di\boldsymbol{\mathit{D}}_{i} and Ai\boldsymbol{\mathit{A}}_{i} have total size mim_{i}, there is an algorithm \textscSolve(M,b,ϵ)\textsc{Solve}(\boldsymbol{\mathit{M}},\boldsymbol{\mathit{b}},\epsilon) that

runs in O(dlognlog(1/ϵ))O(d\log{n}\log(1/\epsilon)) depth and O(i=0dmilog(1/ϵ))O(\sum_{i=0}^{d}m_{i}\log(1/\epsilon)) work;

is a symmetric linear operator on b\boldsymbol{\mathit{b}}; and

if Z\boldsymbol{\mathit{Z}} is the matrix such that \textscSolve(M,b,ϵ)=b\textsc{Solve}(\boldsymbol{\mathit{M}},\boldsymbol{\mathit{b}},\epsilon)=\boldsymbol{\mathit{b}}, then ZϵM1\boldsymbol{\mathit{Z}}\approx_{\epsilon}\boldsymbol{\mathit{M}}^{-1}.

The algorithm calls the procedure Solve inside preconditioned Richardson iteration. The guarantees of the operator follows from Lemma 4.3, therefore the guarantees of the overall algorithm is given by Lemma 4.4.

To analyze the running time, observe that each of the O(log(1/ϵ))O(\log(1/\epsilon)) iterations performs one matrix-vector multiplication involving M0\boldsymbol{\mathit{M}}_{0}, and then invokes the lower accuracy solver. This solver in turn performs two matrix-vector products for each matrix Mi\boldsymbol{\mathit{M}}_{i}, along with a constant number of vector additions. The bounds on the depth and work follow from the fact that the cost of each of such a matrix-vector multiplication requires depth O(logn)O(\log{n}) and work O(mi)O(m_{i}). ∎

Existence of Sparse Approximate Inverse Chains

We now show that short sparse approximate inverse chains exist. Specifically, we show that as long as each Mi+1\boldsymbol{\mathit{M}}_{i+1} is a good approximation of DiAiDi1Ai\boldsymbol{\mathit{D}}_{i}-\boldsymbol{\mathit{A}}_{i}\boldsymbol{\mathit{D}}_{i}^{-1}\boldsymbol{\mathit{A}}_{i}, Di+1\boldsymbol{\mathit{D}}_{i+1} quickly becomes a good approximation to Mi+1\boldsymbol{\mathit{M}}_{i+1}. The following proposition tells us that the approximation factor improves by a constant factor if we do not use approximations.

Let D\boldsymbol{\mathit{D}} and A\boldsymbol{\mathit{A}} be matrices such that D1A\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}} is diagonalizable and has real eigenvalues. If all of the eigenvalues of D1A\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}} are at most 1λ1-\lambda, Then all of the eigenvalues of D1AD1A\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}} are between and (1λ)2(1-\lambda)^{2}.

This follows from the fact that D1AD1A=(D1A)2\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}=(\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}})^{2}. ∎

In Appendix B we prove the following lemma which says that this remains approximately true if we substitute a good approximation of DAD1A\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}. Note that it agrees with Proposition 5.1 when ϵ=0\epsilon=0.

Let D\boldsymbol{\mathit{D}} and D^\boldsymbol{\widehat{\mathit{D}}} be positive diagonal matrices and let A\boldsymbol{\mathit{A}} and A^\boldsymbol{\widehat{\mathit{A}}} be nonnegative symmetric matrices such that DA\boldsymbol{\mathit{D}}\succ\boldsymbol{\mathit{A}}, D^A^\boldsymbol{\widehat{\mathit{D}}}\succ\boldsymbol{\widehat{\mathit{A}}}, DϵD^\boldsymbol{\mathit{D}}\approx_{\epsilon}\boldsymbol{\widehat{\mathit{D}}} and

Let the largest eigenvalue of D1A\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}} be 1λ1-\lambda. Then, the eigenvalues of D^1A^\boldsymbol{\widehat{\mathit{D}}}^{-1}\boldsymbol{\widehat{\mathit{A}}} lie between 1exp(2ϵ)1-\exp(2\epsilon) and 1(1(1λ)2)exp(2ϵ)1-(1-(1-\lambda)^{2})\exp(-2\epsilon).

The following proposition allows us to bound the eigenvalues of D01A0\boldsymbol{\mathit{D}}_{0}^{-1}\boldsymbol{\mathit{A}}_{0} in terms of the condition number of D0A0\boldsymbol{\mathit{D}}_{0}-\boldsymbol{\mathit{A}}_{0}.

Let M=DA\boldsymbol{\mathit{M}}=\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}} be a positive definite matrix with condition number κ\kappa, where D\boldsymbol{\mathit{D}} is a positive diagonal matrix and A\boldsymbol{\mathit{A}} is nonnegative. Then, the eigenvalues of D1A\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}} are between 1+1/κ-1+1/\kappa and 11/κ1-1/\kappa.

Let λmax\lambda_{max} and λmin\lambda_{min} be the largest and smallest eigenvalues of M\boldsymbol{\mathit{M}}. As the largest eigenvalue of a positive diagonal matrix is its largest entry, and the largest eigenvalue of a symmetric matrix is at least its largest entry, the largest eigenvalue of D\boldsymbol{\mathit{D}} is at most λmax\lambda_{max}. We will prove that the smallest eigenvalue of ID1A\boldsymbol{\mathit{I}}-\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}} is at least λmin/λmax=1/κ\lambda_{min}/\lambda_{max}=1/\kappa. In particular, it is equal to

This implies that the largest eigenvalue of D1A\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}} is at most 11/κ1-1/\kappa. The Perron-Frobenius theorem tells us that the largest absolute value of an eigenvalue of a nonnegative matrix is the largest eigenvalue. As D1A\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}} is nonnegative, the bound on its smallest eigenvalue follows. ∎

Conversely, the following proposition allows us to show that D\boldsymbol{\mathit{D}} is a good approximation of DA\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}} if D1A\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}} is small.

If the eigenvalues of D1A\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}} lie between α-\alpha and β\beta, then

Applying parts aa and ee of Fact 2.1, we derive

One obstacle to finding an approximate inverse chain is that the last matrix must be approximated by its diagonal. We use the preceding lemma and propositions to show that we can achieve this with a chain whose depth is logarithmic in the condition number of M0\boldsymbol{\mathit{M}}_{0}.

If M0=D0A0\boldsymbol{\mathit{M}}_{0}=\boldsymbol{\mathit{D}}_{0}-\boldsymbol{\mathit{A}}_{0} is a SDDM matrix with condition number κ\kappa, d=log4/3κd=\left\lceil\log_{4/3}\kappa\right\rceil, and D1,,Dd\boldsymbol{\mathit{D}}_{1},\dots,\boldsymbol{\mathit{D}}_{d} and A1,,Ad\boldsymbol{\mathit{A}}_{1},\dots,\boldsymbol{\mathit{A}}_{d} satisfy conditions aa and bb of Definition 4.1 with ϵ0,,ϵd11/9\epsilon_{0},\dots,\epsilon_{d-1}\leq 1/9, then DdϵdMd\boldsymbol{\mathit{D}}_{d}\approx_{\epsilon_{d}}\boldsymbol{\mathit{M}}_{d} for ϵd=ln3\epsilon_{d}=\ln 3.

Proposition 5.3 tells us that the eigenvalues of D01A0\boldsymbol{\mathit{D}}_{0}^{-1}\boldsymbol{\mathit{A}}_{0} are at most 11/κ1-1/\kappa in absolute value. For ϵi1/9\epsilon_{i}\leq 1/9 and λ1/3\lambda\leq 1/3,

So, Lemma 5.2 implies that the eigenvalues of Dd1Ad\boldsymbol{\mathit{D}}_{d}^{-1}\boldsymbol{\mathit{A}}_{d} lie between 1exp(2/9)1/41-\exp(2/9)\geq-1/4 and 2/32/3. Proposition 5.4 then tells us that DγDA\boldsymbol{\mathit{D}}\approx_{\gamma}\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}, where

It remains to show that we can find sequences of matrices Di\boldsymbol{\mathit{D}}_{i} and Ai\boldsymbol{\mathit{A}}_{i} that satisfy conditions aa and bb of Definition 4.1. We begin by proving that as long as DA\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}} is a SDDM matrix, so is DAD1A\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}.

It is clear that M^\boldsymbol{\widehat{\mathit{M}}} is symmetric, and all entries in AD1A\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}} are nonnegative. To check that M^\boldsymbol{\widehat{\mathit{M}}} is diagonally dominant, we compute the sum of the off-diagonal entries in row ii:

Reordering the two summations and collecting terms gives

As M^\boldsymbol{\widehat{\mathit{M}}} is SDD, it is positive semidefinite. To see that it is positive definite, one need merely observe that it is nonsingular. This follows from the nonsingularity of M\boldsymbol{\mathit{M}} and identity (1). ∎

This allows us to use the spectral sparsifiers of Batson, Spielman, and Srivastava [BSS12] to sparsify DAD1A\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}.

For every nn-dimensional Laplacian matrix L\boldsymbol{\mathit{L}} and every 0<ϵ<1/20<\epsilon<1/2, there exists a Laplacian matrix L~\boldsymbol{\widetilde{\mathit{L}}} with O(n/ϵ2)O(n/\epsilon^{2}) nonzero entries so that LϵL~\boldsymbol{\mathit{L}}\approx_{\epsilon}\boldsymbol{\widetilde{\mathit{L}}}.

For every nn-dimensional SDDM matrix M=DA\boldsymbol{\mathit{M}}=\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}} with D\boldsymbol{\mathit{D}} nonnegative diagonal and A\boldsymbol{\mathit{A}} nonnegative, and every 0<ϵ<1/20<\epsilon<1/2, there exist a nonnegative diagonal D^\boldsymbol{\widehat{\mathit{D}}} and a nonnegative A^\boldsymbol{\widehat{\mathit{A}}} with at most O(n/ϵ2)O(n/\epsilon^{2}) nonzero entries so that

Let Y\boldsymbol{\mathit{Y}} be the diagonal matrix containing the diagonal entries of A\boldsymbol{\mathit{A}} and let X\boldsymbol{\mathit{X}} be the diagonal matrix so that DXA\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{X}}-\boldsymbol{\mathit{A}} has zero row-sums. Then, DXA\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{X}}-\boldsymbol{\mathit{A}} is a Laplacian matrix, so by Theorem 5.7 there exists a Laplacian matrix D~A~\boldsymbol{\widetilde{\mathit{D}}}-\boldsymbol{\widetilde{\mathit{A}}} where D~\boldsymbol{\widetilde{\mathit{D}}} is nonnegative diagonal, A~\boldsymbol{\widetilde{\mathit{A}}} has at most O(n/ϵ2)O(n/\epsilon^{2}) nonzero entries and is nonnegative with zero diagonal, and

As neither A~\boldsymbol{\widetilde{\mathit{A}}} nor AY\boldsymbol{\mathit{A}}-\boldsymbol{\mathit{Y}} have diagonal entries, it follows that

We now set D^=D~+X+Y\boldsymbol{\widehat{\mathit{D}}}=\boldsymbol{\widetilde{\mathit{D}}}+\boldsymbol{\mathit{X}}+\boldsymbol{\mathit{Y}} and A^=A~+Y\boldsymbol{\widehat{\mathit{A}}}=\boldsymbol{\widetilde{\mathit{A}}}+\boldsymbol{\mathit{Y}}. The desired properties of D^\boldsymbol{\widehat{\mathit{D}}} and A^\boldsymbol{\widehat{\mathit{A}}} then follow from Fact 2.1a. ∎

If M0=D0A0\boldsymbol{\mathit{M}}_{0}=\boldsymbol{\mathit{D}}_{0}-\boldsymbol{\mathit{A}}_{0} is an nn-dimensional SDDM matrix with condition number at most κ\kappa, D0\boldsymbol{\mathit{D}}_{0} is diagonal and A0\boldsymbol{\mathit{A}}_{0} is nonnegative, then M0\boldsymbol{\mathit{M}}_{0} has an approximate inverse chain D1,,Dd\boldsymbol{\mathit{D}}_{1},\dots,\boldsymbol{\mathit{D}}_{d} and A1,,Ad\boldsymbol{\mathit{A}}_{1},\dots,\boldsymbol{\mathit{A}}_{d} such that d=O(logκ)d=O(\log\kappa) and the total number of nonzero entries in the matrices in the chain is O(nlog3κ)O(n\log^{3}{\kappa}).

Set d=log4/3κd=\left\lceil\log_{4/3}\kappa\right\rceil and ϵ0,,ϵd1\epsilon_{0},\dots,\epsilon_{d-1} to the minimum of 1/91/9 and 1/2d1/2d. By Corollary 5.8 there exists a sequence of matrices Mi=DiAi\boldsymbol{\mathit{M}}_{i}=\boldsymbol{\mathit{D}}_{i}-\boldsymbol{\mathit{A}}_{i} that satisfy conditions aa and bb of Definition 4.1 and that each have O(n/log2κ)O(n/\log^{2}\kappa) nonzero entries. By Corollary 5.5, we then know that DdϵdMd\boldsymbol{\mathit{D}}_{d}\approx_{\epsilon_{d}}\boldsymbol{\mathit{M}}_{d} for ϵd=ln3\epsilon_{d}=\ln 3. As the sum of the ϵi\epsilon_{i} is at most 1/2+ln3<21/2+\ln 3<2, this sequence of matrices is an approximate inverse chain. ∎

Theorem 1.1 now follows from Theorem 4.5 and Theorem 5.9.

Efficient Parallel Construction

We now show how to construct sparse approximate inverse chains efficiently in parallel. The only obstacle is that we must employ an efficient sparsification routine instead of Theorem 5.7. We do this through a two-step process. If Ai\boldsymbol{\mathit{A}}_{i} is an nn-dimensional matrix with mm nonzero entries, we first compute an approximation of DiAiDi1Ai\boldsymbol{\mathit{D}}_{i}-\boldsymbol{\mathit{A}}_{i}\boldsymbol{\mathit{D}}_{i}^{-1}\boldsymbol{\mathit{A}}_{i} that has O(n+mlogn/ϵ2)O(n+m\log n/\epsilon^{2}) nonzero entries. We do this by observing that the off-diagonal entries of this matrix come from a sum of weighted cliques, and that we can sparsify those cliques individually. This is made easy by the existence of a closed form for the effective resistance between vertices of the cliques. We then employ a general-purpose sparsification routine to further reduce the number of nonzero entries to O(nlogcn/ϵ2)O(n\log^{c}n/\epsilon^{2}), for some constant cc.

To find an algorithm that produces these sparsifiers in nearly-linear work and polylogarithmic depth, we look to the original spectral sparsification algorithm of Spielman and Teng [ST11]. It uses a polylogarithmic number of calls to a graph partitioning routine to divide a graph into a small set of edges plus a number of subgraphs of high conductance. It then samples edges from these subgraphs at random. This algorithm requires nearly-linear work and polylogarithmic depth if the graph partitioning algorithm does as well.

The graph partitioning algorithm used in [ST11] comes from [ST13]. This algorithm, which is based on local graph clustering, could probably be implemented efficiently in parallel. However, it is not stated in a parallel form in that paper. Instead, we rely on an improvement of this graph partitioning algorithm by Orecchia and Vishnoi [OV11]. By using their BalCut algorithm, we obtain a sparsifier with fewer edges while using less work and in polylogarithmic depth. To see that the BalCut algorithm can be implemented in nearly-linear time and polylogarithmic depth, we observe that all of its operations are either multiplication of vectors by matrices, elementary vector operations, or sorting and computing sparsest cuts by sweeps along vectors. All of these components parallelize efficiently, and Orecchia and Vishnoi showed that their algorithm only requires O(mlogcm)O(m\log^{c}m) work, for some constant cc, when asked to produce cuts of polylogarithmic conductance. This is the case in the calls to BalCut made by Spielman and Teng’s [ST11] algorithm, specifically by the routine Partition2 inside Sparsify.

We summarize this discussion with the following theorem.

There exists an algorithm that on input an nn-dimensional Laplacian matrix L\boldsymbol{\mathit{L}} with mm nonzero entries, an ϵ[0,1/2]\epsilon\in[0,1/2], produces with probability at least 11/n21-1/n^{2} a Laplacian matrix L^\boldsymbol{\widehat{\mathit{L}}} such that LϵL^\boldsymbol{\mathit{L}}\approx_{\epsilon}\boldsymbol{\widehat{\mathit{L}}} and L^\boldsymbol{\widehat{\mathit{L}}} has O(nlogcn/ϵ2)O(n\log^{c}n/\epsilon^{2}) entries for some constant cc. Moreover, this algorithm requires O(mlogc1n)O(m\log^{c_{1}}n) work and O(logc2n)O(\log^{c_{2}}n) depth, for some other constants c1c_{1} and c2c_{2}.

We cannot directly apply the above-described algorithm to sparsify DAD1A\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}} because that matrix could be dense. So, we must avoid actually constructing this matrix, and construct a sparse approximation of it first instead. Let mm be the number of nonzero entries in A\boldsymbol{\mathit{A}}. We will show how to construct a sparse approximation of DAD1A\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}} with O(mlogm/ϵ2)O(m\log m/\epsilon^{2}) nonzero entries. We begin by writing the second matrix as a sum of outer products:

where A:,u\boldsymbol{\mathit{A}}_{:,u} denotes the uuth column of A\boldsymbol{\mathit{A}} and Au,:\boldsymbol{\mathit{A}}_{u,:} denotes the uuth row. We can see that the diagonal entries of this sum come from two types of products: Au,uDu,u1Au,u\boldsymbol{\mathit{A}}_{u,u}\boldsymbol{\mathit{D}}_{u,u}^{-1}\boldsymbol{\mathit{A}}_{u,u} and Au,vDv,v1Av,u\boldsymbol{\mathit{A}}_{u,v}\boldsymbol{\mathit{D}}_{v,v}^{-1}\boldsymbol{\mathit{A}}_{v,u}. There are only nn terms of the first type and mm terms of the second. So, all of these can be computed in time O(n+m)O(n+m). We are more concerned with the off-diagonal entries. These can come from three types of products: Au,uDu,u1Au,v\boldsymbol{\mathit{A}}_{u,u}\boldsymbol{\mathit{D}}_{u,u}^{-1}\boldsymbol{\mathit{A}}_{u,v}, Av,uDu,u1Au,u\boldsymbol{\mathit{A}}_{v,u}\boldsymbol{\mathit{D}}_{u,u}^{-1}\boldsymbol{\mathit{A}}_{u,u}, and Av,uDu,u1Au,w\boldsymbol{\mathit{A}}_{v,u}\boldsymbol{\mathit{D}}_{u,u}^{-1}\boldsymbol{\mathit{A}}_{u,w}. There are only O(m)O(m) terms of the first two types, so we can just compute and store all of them. For a fixed uu, the terms of the last type correspond to a weighted complete graph on the neighbors of vertex uu in which the edge between vv and ww has weight Av,uDu,u1Au,w\boldsymbol{\mathit{A}}_{v,u}\boldsymbol{\mathit{D}}_{u,u}^{-1}\boldsymbol{\mathit{A}}_{u,w}. Denote this graph by GuG_{u} and its Laplacian matrix by Lu\boldsymbol{\mathit{L}}_{u}. We now show how to sparsify such a weighted complete graph.

We will do this through the approach of sampling edges by their effective resistance introduced by Spielman and Srivastava [SS08]. The effective resistance of an edge (v,w)(v,w) in a weighted graph GG with Laplacian matrix Lu\boldsymbol{\mathit{L}}_{u} is given by

where ev\boldsymbol{\mathit{e}}_{v} is the elementary unit vector in direction vv and Lu\boldsymbol{\mathit{L}}_{u}^{{\dagger}} is the pseudo-inverse of Lu\boldsymbol{\mathit{L}}_{u}.

It has been observed [KMP10, KL13] that one can strengthen the result of [SS08] by replacing Rudelson’s concentration theorem [Rud99] with that of Rudelson and Vershynin [RV07] to obtain the following result.

Let G=(V,E,w)G=(V,E,\boldsymbol{\mathit{w}}) be a weighted graph with nn vertices and let L\boldsymbol{\mathit{L}} be its Laplacian matrix. Consider the distribution on edges obtained by choosing an edge from EE with probability proportional to its weight times its effective resistance, and then dividing its weight by this probability. For every 0<ϵ<1/20<\epsilon<1/2, if one forms the Laplacian of O(nlogn/ϵ2)O(n\log n/\epsilon^{2}) edges chosen from this distribution, with replacement, then the resulting Laplacian matrix, L^\boldsymbol{\widehat{\mathit{L}}}, satisfies LϵL^\boldsymbol{\mathit{L}}\approx_{\epsilon}\boldsymbol{\widehat{\mathit{L}}} with probability at least 11/n21-1/n^{2}.

To apply this theorem, we need to know the effective resistances between pairs of vertices in GuG_{u}.

The effective resistance between vv and ww in GuG_{u} is

Note that Du,u\boldsymbol{\mathit{D}}_{u,u} can be larger than dud_{u}.

There exists an algorithm that when given as input an ϵ[0,1/2]\epsilon\in[0,1/2] and an nn-dimensional SDDM matrix M=DA\boldsymbol{\mathit{M}}=\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}, with D\boldsymbol{\mathit{D}} nonnegative diagonal and A\boldsymbol{\mathit{A}} nonnegative with mm nonzero entries, produces with probability at least 11/n1-1/n a nonnegative diagonal matrix D^\boldsymbol{\widehat{\mathit{D}}} and a nonnegative A^\boldsymbol{\widehat{\mathit{A}}} with at most O(mlogn/ϵ2)O(m\log n/\epsilon^{2}) nonzero entries so that

Moreover, this algorithm runs in time O(mlog2n/ϵ2)O(m\log^{2}n/\epsilon^{2}).

We handle the diagonal entries as in Corollary 5.8. We also keep all of the off-diagonal entries of the first two types described above. For each vertex uu, let δu\delta_{u} be the number of neighbors of vertex uu. We will approximate Lu\boldsymbol{\mathit{L}}_{u} by a Laplacian L^u\boldsymbol{\widehat{\mathit{L}}}_{u} with O(δulogn/ϵ2)O(\delta_{u}\log n/\epsilon^{2}) non-zero entries. By Fact 2.1b, the sum of the L^u\boldsymbol{\widehat{\mathit{L}}}_{u} approximates the sum of the Lu\boldsymbol{\mathit{L}}_{u}. By Theorem 6.2, we can construct L^u\boldsymbol{\widehat{\mathit{L}}}_{u} by sampling the appropriate number of edges of GuG_{u}, and then re-scaling them. It remains to check that we can sample each of these edges in time O(logn)O(\log n). To see this, observe that we need to sample edge (v,w)(v,w) with probability proportional to

So, in time O(δu)O(\delta_{u}) we can compute for each vv the probability that an edge involving vv is sampled. We can then create a data structure that will allow us to sample vv with this probability in time O(logn)O(\log n), and to then follow it by sampling ww from the distribution induced by fixing the choice of vv. ∎

There exists an algorithm that on input an nn-dimensional diagonal D0\boldsymbol{\mathit{D}}_{0} and nonnegative A0\boldsymbol{\mathit{A}}_{0} with mm nonzero entries such that M0=D0A0\boldsymbol{\mathit{M}}_{0}=\boldsymbol{\mathit{D}}_{0}-\boldsymbol{\mathit{A}}_{0} has condition number at most κ\kappa, constructs with probability at least 1/21/2 an approximate inverse chain D1,,Dd\boldsymbol{\mathit{D}}_{1},\dots,\boldsymbol{\mathit{D}}_{d} and A1,,Ad\boldsymbol{\mathit{A}}_{1},\dots,\boldsymbol{\mathit{A}}_{d} such that d=O(logκ)d=O(\log\kappa) and the total number of nonzero entries in the matrices in the chain is O(nlogcnlog3κ)O(n\log^{c}n\log^{3}{\kappa}), for some constant cc. Moreover, the algorithm runs in work O(mlogc1nlog3κ)O(m\log^{c_{1}}n\log^{3}{\kappa}) and depth O(logc2nlogκ)O(\log^{c_{2}}n\log{\kappa}), for some other constants c1c_{1} and c2c_{2}.

The proof follows the proof of Theorem 5.9. However, in place of Corollary 5.8 and Theorem 5.7, it uses the algorithms implicit in Corollary 6.4 and Theorem 6.1 to construct Di\boldsymbol{\mathit{D}}_{i} and Ai\boldsymbol{\mathit{A}}_{i} from Di1\boldsymbol{\mathit{D}}_{i-1} and Ai1\boldsymbol{\mathit{A}}_{i-1}. ∎

Theorem 1.2 now follows from Theorem 4.5 and Theorem 6.5.

Acknowledgement

We thank Gary Miller for very helpful discussions, and Jakub Pachocki for comments on an earlier draft.

References

Appendix A The Condition Number of a Submatrix

To learn more about the reductions between the problems of solving systems of linear equations in various forms of SDD matrices, we refer the reader to either Section 3.2 of [ST08] or Appendix A of [KOSZ13]. For example, Spielman and Teng [ST08] point out that one can solve a system of equations Lx=b\boldsymbol{\mathit{L}}\boldsymbol{\mathit{x}}=\boldsymbol{\mathit{b}} in a Laplacian matrix by first solving the system in which the first entry of x\boldsymbol{\mathit{x}} is forced to be zero. After solving this reduced system, one can orthogonalize the solution with respect to the nullspace of L\boldsymbol{\mathit{L}}, the all 1s vector. The reduced system is a linear system in the submatrix of L\boldsymbol{\mathit{L}} obtained by removing its first row and column. The resulting matrix is a SDDM matrix. We now show that its condition number is at most nn times the finite condition number of L\boldsymbol{\mathit{L}}. As the condition number of a submatrix of a positive definite matrix is at most the condition number of the original, the same bound holds for every submatrix of L\boldsymbol{\mathit{L}}.

Let L\boldsymbol{\mathit{L}} be the Laplacian matrix of a connected graph on nn vertices and let M\boldsymbol{\mathit{M}} be the submatrix of L\boldsymbol{\mathit{L}} containing all rows and columns of L\boldsymbol{\mathit{L}} but the first. Then, λ1(M)λ2(L)/n\lambda_{1}(\boldsymbol{\mathit{M}})\geq\lambda_{2}(\boldsymbol{\mathit{L}})/n.

For any vector v\boldsymbol{\mathit{v}} of length n1n-1, let u\boldsymbol{\mathit{u}} be the vector consisting of a 0 followed by v\boldsymbol{\mathit{v}}. Then, vTMv=uTLu\boldsymbol{\mathit{v}}^{T}\boldsymbol{\mathit{M}}\boldsymbol{\mathit{v}}=\boldsymbol{\mathit{u}}^{T}\boldsymbol{\mathit{L}}\boldsymbol{\mathit{u}}. As 11 spans the nullspace of L\boldsymbol{\mathit{L}}, let x\boldsymbol{\mathit{x}} be the vector obtained by orthogonalizing u\boldsymbol{\mathit{u}} with respect to 11:

where μ\mu is the average of the entries of u\boldsymbol{\mathit{u}}. Then, vTMv=xTLx\boldsymbol{\mathit{v}}^{T}\boldsymbol{\mathit{M}}\boldsymbol{\mathit{v}}=\boldsymbol{\mathit{x}}^{T}\boldsymbol{\mathit{L}}\boldsymbol{\mathit{x}}. We will show that xv/n\left\|\boldsymbol{\mathit{x}}\right\|\geq\left\|\boldsymbol{\mathit{v}}\right\|/\sqrt{n}. The lemma will then follow, as

As \boldsymbol{\mathit{u}}-\mu{\mbox{\boldmath1}} is orthogonal to u\boldsymbol{\mathit{u}},

Under the conditions of Lemma A.1, the condition number of M\boldsymbol{\mathit{M}} is at most nn times the finite condition number of L\boldsymbol{\mathit{L}}. Moreover, the same is true for every principal submatrix of L\boldsymbol{\mathit{L}}.

The largest and smallest eigenvalues of a principal submatrix of a matrix are between the largest and smallest eigenvalues of that matrix. As M\boldsymbol{\mathit{M}} is a sub-matrix of L\boldsymbol{\mathit{L}}, its largest eigenvalue is smaller than the largest eigenvalue of L\boldsymbol{\mathit{L}}. So, the condition number of M\boldsymbol{\mathit{M}} is at most nn times the finite condition number of L\boldsymbol{\mathit{L}}. The condition number of every sub-matrix of M\boldsymbol{\mathit{M}} can only be smaller. ∎

Appendix B Proof of Lemma 5.2

The proof is a routine calculation. We build to it in a few steps.

Let A\boldsymbol{\mathit{A}} be a positive definite matrix and let X\boldsymbol{\mathit{X}} be a non-negative diagonal matrix. Then,

We just prove the first inequality. The second is similar. We have

Let D~\boldsymbol{\widetilde{\mathit{D}}} and D^\boldsymbol{\widehat{\mathit{D}}} be positive diagonal matrices and let A~\boldsymbol{\widetilde{\mathit{A}}} and A^\boldsymbol{\widehat{\mathit{A}}} be non-negative symmetric matrices such that D~A~\boldsymbol{\widetilde{\mathit{D}}}\succ\boldsymbol{\widetilde{\mathit{A}}}, D^A^\boldsymbol{\widehat{\mathit{D}}}\succ\boldsymbol{\widehat{\mathit{A}}}, D~ϵD^\boldsymbol{\widetilde{\mathit{D}}}\approx_{\epsilon}\boldsymbol{\widehat{\mathit{D}}} and D~A~ϵD^A^\boldsymbol{\widetilde{\mathit{D}}}-\boldsymbol{\widetilde{\mathit{A}}}\approx_{\epsilon}\boldsymbol{\widehat{\mathit{D}}}-\boldsymbol{\widehat{\mathit{A}}}. Then,

Let λ\lambda be the largest eigenvalue of D~1A~\boldsymbol{\widetilde{\mathit{D}}}^{-1}\boldsymbol{\widetilde{\mathit{A}}}, and let μ=1λ\mu=1-\lambda. Then, μ\mu is the smallest eigenvalue of ID~1/2A~D~1/2\boldsymbol{\mathit{I}}-\boldsymbol{\widetilde{\mathit{D}}}^{-1/2}\boldsymbol{\widetilde{\mathit{A}}}\boldsymbol{\widetilde{\mathit{D}}}^{-1/2}. As

Let X=D~1/2D^1/2\boldsymbol{\mathit{X}}=\boldsymbol{\widetilde{\mathit{D}}}^{-1/2}\boldsymbol{\widehat{\mathit{D}}}^{1/2}. As λmin(X)2exp(ϵ)\lambda_{min}(\boldsymbol{\mathit{X}})^{2}\geq\exp(-\epsilon), Proposition B.1 allows us to conclude that the smallest eigenvalue of

We may similarly conclude that the largest eigenvalue of (3) is at most

Let D~=D\boldsymbol{\widetilde{\mathit{D}}}=\boldsymbol{\mathit{D}} and A~=AD1A\boldsymbol{\widetilde{\mathit{A}}}=\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}. From Proposition 5.1, we know that the largest eigenvalue of D~1A~\boldsymbol{\widetilde{\mathit{D}}}^{-1}\boldsymbol{\widetilde{\mathit{A}}} is at most (1λ)2(1-\lambda)^{2}, and so 1λmax(D~1A~)1(1λ)21-\lambda_{max}(\boldsymbol{\widetilde{\mathit{D}}}^{-1}\boldsymbol{\widetilde{\mathit{A}}})\geq 1-(1-\lambda)^{2}. Proposition B.2 then implies that

As the smallest eigenvalue of D~1A~\boldsymbol{\widetilde{\mathit{D}}}^{-1}\boldsymbol{\widetilde{\mathit{A}}} is at least zero, the largest eigenvalue of ID~1A~\boldsymbol{\mathit{I}}-\boldsymbol{\widetilde{\mathit{D}}}^{-1}\boldsymbol{\widetilde{\mathit{A}}} is at most 11. So, Proposition B.2 tells us that the largest eigenvalue of ID^1A^\boldsymbol{\mathit{I}}-\boldsymbol{\widehat{\mathit{D}}}^{-1}\boldsymbol{\widehat{\mathit{A}}} is at most exp(2ϵ)\exp(2\epsilon). This in turn implies that the smallest eigenvalue of D^1A^\boldsymbol{\widehat{\mathit{D}}}^{-1}\boldsymbol{\widehat{\mathit{A}}} is at least 1exp(2ϵ)1-\exp(2\epsilon). ∎