An Efficient Parallel Solver for SDD Linear Systems

Richard Peng, Daniel A. Spielman

Introduction

The problem of solving systems of linear equations in symmetric, diagonally dominant (SDD) matrices, and in particular the Laplacian matrices of graphs, arises in applications ranging from the solution of elliptic PDEs [BHV08], to the computation of maximum flows in graphs [DS08, CKM+11, Mad13], to semi-supervised learning [ZGL03, ZS04, ZBL+03]. It has recently been exploited as a primitive in many other algorithms [KM09, KMP12, OSV12, LKP12, KMT11].

In the last decade, there have been remarkable advances in the development of fast algorithms for solving systems of linear equations in SDD matrices. Following an approach suggested by Vaidya [Vai90], Spielman and Teng [ST08] developed a nearly-linear time algorithm for this problem. Their algorithm has three main ingredients: a multi-level framework suggested by Vaidya [Vai90], Joshi [Jos97] and Reif [Rei98]; low-stretch spanning tree preconditioners, introduced by Boman and Hendrickson [BH01] as an improvement of Vaidya’s maximum spanning tree preconditioners; and spectral graph sparsifiers [ST11]. Koutis, Miller and Peng [KMP10, KMP11] developed a simpler and faster way to exploit these ingredients, resulting in an algorithm for finding $\epsilon$ -approximate solutions to these systems in time $\widetilde{O}(n\log n\log\epsilon^{-1})$ .

Kelner, Orecchia, Sidford and Zhu [KOSZ13] discovered a simple and efficient algorithm for solving these linear equations that relies only on low-stretch spanning trees, avoiding the use of graph sparsifiers and the multi-level framework. This algorithm has been improved by Lee and Sidford [LS13] to run in time $\widetilde{O}(n\log^{3/2}n\log\epsilon^{-1})$ .

There has also been some progress in parallelizing these solvers. When describing a parallel algorithm, we call the number of operations performed its work and the time it takes in parallel its depth. For planar graphs, Koutis and Miller gave an algorithm that requires nearly-linear work and depth close to $m^{1/6}$ [KM07]. Their approach was extended to general graphs by Blelloch, et. al., who gave parallel algorithms for constructing low-stretch embeddings, leading to an algorithm with depth close to $m^{1/3}$ [BGK+11].

We present the first algorithm that requires nearly-linear work and poly-logarithmic depth. Unlike the previous nearly-linear time algorithms for solving systems in SDD matrices, our algorithm does not require low-stretch spanning trees. It merely requires spectral sparsifiers of graphs, which we use to construct a sparse approximate inverse chain. It would be easy to solve a system of equations in a matrix if one had a sparse matrix that approximates its inverse. While not all matrices have sparse approximate inverses, we construct something almost as good for SDD matrices: a sequence of sparse matrices whose product approximates the inverse.

Given a sparse approximate inverse chain, the resulting algorithm for solving the linear equations is very simple: we multiply a vector by each matrix in the chain twice. The reason that we multiply by each twice, rather than once, is to make the solver a symmetric operator. This is explained in more detail in Section 3. The algorithm is analogous to the V-cycle used in the Multigrid method [BHM01], whereas the multilevel approach of [ST08, KMP10, KMP11] required the solution of the smallest systems many times, and better resembles the Multigrid W-cyclesVariants of the Multigrid algorithm are among the most popular approaches to solving large SDD linear systems in practice. The advantage of these algorithms is that they can be efficiently parallelized. However, there is no analysis proving that they should work in general. In fact, the theory suggests that V-cycles should be sufficient, while the more expensive W-cycles are often necessary. .

There is a polynomial time algorithm that on input an $n$ -dimensional SDDM matrix $\boldsymbol{\mathit{M}}$ with $m$ nonzeros and condition number at most $\kappa$ , produces a sparse approximate inverse chain that can be used to solve any linear equation in $\boldsymbol{\mathit{M}}$ to any precision $\epsilon$ in work $O((m+n\log^{3}\kappa)\log 1/\epsilon)$ and depth $O(\log n\log\kappa\log 1/\epsilon)$ .

The second is a proof that slightly weaker approximate inverse chains exist and can be constructed in nearly-linear work and polylogarithmic time.

There is an algorithm that on input an $n$ -dimensional SDDM matrix $\boldsymbol{\mathit{M}}$ with $m$ nonzeros and condition number at most $\kappa$ , produces with probability at least $1/2$ a sparse approximate inverse chain that can be used to solve any linear equation in $\boldsymbol{\mathit{M}}$ to any precision $\epsilon$ in work $O((m+n\log^{c}n\log^{3}\kappa)\log 1/\epsilon)$ and depth $O(\log n\log\kappa\log 1/\epsilon)$ , for some constant $c$ . Moreover, the algorithm runs in work $O(m\log^{c_{1}}n\log^{3}{\kappa})$ and depth $O(\log^{c_{2}}n\log{\kappa})$ for some other constants $c_{1}$ and $c_{2}$ .

We present an overview of our algorithm in Section 3, and supply the details later.

Background and Notation

Throughout this paper we will consider symmetric matrices. A symmetric matrix is diagonally dominant if each diagonal entry is at leastThis is sometimes called weak diagonal dominance, with strict diagonal dominance referring to the case in which each diagonal is strictly larger. as large as the sum of the absolute values of the other entries in its row. The problem of solving systems of linear equations in symmetric, diagonally dominant (SDD) matrices can be reduced to the problems of solving systems in either Laplacian matrices—SDD matrices with zero rows sums and non-positive off-diagonal elements—or SDDM matrices—positive definite SDD matrices with non-positive off-diagonal elements. The reductions from one form to another approximately preserve the condition numbers of the system (see Appendix A).

A fundamental property of a matrix is its condition number: the ratio of its largest eigenvalue to its smallest. In the case of a Laplacian matrix, which has as an eigenvalue, we consider the ratio to the second-smallest. The condition number measures how much the solution to a linear system in a matrix can change if small changes are made to the target vector or the matrix itself. Thus, the condition number is a lower bound on the precision required when carrying out computations with a matrix. It is also appears in the running time of many numerical algorithms, including the one we present here. The condition number of the Laplacian matrix of a connected weighted graph with $n$ vertices is at most $O(n^{3}w_{max}/w_{min})$ , where $w_{max}$ and $w_{min}$ are the largest and smallest weights of edges in the graph (see [ST08, Lemma 6.1]) The condition number of a submatrix of a Laplacian is at most $O(n^{4}w_{max}/w_{min})$ (see Appendix A). Our algorithms will run in time depending upon the logarithm of the condition number.

We recall that a symmetric matrix $\boldsymbol{\mathit{X}}$ is positive definite if all of its eigenvalues are positive. We indicate this through the notation $\boldsymbol{\mathit{X}}\succ{\mbox{\boldmath$ 0 $}}$ , where indicates the all-0 matrix. We similarly write $\boldsymbol{\mathit{X}}\succcurlyeq{\mbox{\boldmath$ 0 $}}$ if $\boldsymbol{\mathit{X}}$ is positive semidefinite. For matrices $\boldsymbol{\mathit{X}}$ and $\boldsymbol{\mathit{Y}}$ , we write $\boldsymbol{\mathit{X}}\succ\boldsymbol{\mathit{Y}}$ or $\boldsymbol{\mathit{X}}\succcurlyeq\boldsymbol{\mathit{Y}}$ to indicate that $\boldsymbol{\mathit{X}}-\boldsymbol{\mathit{Y}}$ is positive definite or positive semidefinite.

This partial ordering allows us to define a notion of approximation for matrices. We write $\boldsymbol{\mathit{X}}\approx_{\epsilon}\boldsymbol{\mathit{Y}}$ to indicate that

and we remark that this relation is symmetric. While it is more common to use $1+\epsilon$ in place of $\exp(\epsilon)$ above, the exponential facilitates combining approximations. The $\epsilon$ s will be small everywhere in this paper, and we note that when $\epsilon$ is close to , $\exp(\epsilon)$ is between $1+\epsilon$ and $1+2\epsilon$ . The following facts about the positive definite order and approximation are standard (see [BGH+06, BH03, ST08]).

For positive semidefinite matrices $\boldsymbol{\mathit{X}}$ , $\boldsymbol{\mathit{Y}}$ , $\boldsymbol{\mathit{W}}$ and $\boldsymbol{\mathit{Z}}$ ,

if $\boldsymbol{\mathit{Y}}\approx_{\epsilon}\boldsymbol{\mathit{Z}}$ , then $\boldsymbol{\mathit{X}}+\boldsymbol{\mathit{Y}}\approx_{\epsilon}\boldsymbol{\mathit{X}}+\boldsymbol{\mathit{Z}}$ ;

if $\boldsymbol{\mathit{X}}\approx_{\epsilon}\boldsymbol{\mathit{Y}}$ and $\boldsymbol{\mathit{W}}\approx_{\epsilon}\boldsymbol{\mathit{Z}}$ , then $\boldsymbol{\mathit{X}}+\boldsymbol{\mathit{W}}\approx_{\epsilon}\boldsymbol{\mathit{Y}}+\boldsymbol{\mathit{Z}}$ ;

if $\boldsymbol{\mathit{X}}\approx_{\epsilon_{1}}\boldsymbol{\mathit{Y}}$ and $\boldsymbol{\mathit{Y}}\approx_{\epsilon_{2}}\boldsymbol{\mathit{Z}}$ , then $\boldsymbol{\mathit{X}}\approx_{\epsilon_{1}+\epsilon_{2}}\boldsymbol{\mathit{Z}}$ ;

if $\boldsymbol{\mathit{X}}$ and $\boldsymbol{\mathit{Y}}$ are positive definite matrices such that $\boldsymbol{\mathit{X}}\approx_{\epsilon}\boldsymbol{\mathit{Y}}$ , then $\boldsymbol{\mathit{X}}^{-1}\approx_{\epsilon}\boldsymbol{\mathit{Y}}^{-1}$ ;

if $\boldsymbol{\mathit{X}}\approx_{\epsilon}\boldsymbol{\mathit{Y}}$ and $\boldsymbol{\mathit{V}}$ is a matrix, then $\boldsymbol{\mathit{V}}^{T}\boldsymbol{\mathit{X}}\boldsymbol{\mathit{V}}\approx_{\epsilon}\boldsymbol{\mathit{V}}^{T}\boldsymbol{\mathit{Y}}\boldsymbol{\mathit{V}}.$

Overview

Our algorithm is most naturally described for SDDM matrices. They are the matrices $\boldsymbol{\mathit{M}}$ that can be written as

where $\boldsymbol{\mathit{D}}$ is a nonnegative diagonal matrix, $\boldsymbol{\mathit{A}}$ is a symmetric nonnegative matrix, and $\boldsymbol{\mathit{D}}\succ\boldsymbol{\mathit{A}}$ .

Our algorithm is inspired by the following identity, which holds for matrices $\boldsymbol{\mathit{A}}$ of norm less than $1$ .

As the terms corresponding to larger values of $k$ make shrinking contributions, it is possible to obtain a good approximation to $(\boldsymbol{\mathit{I}}-\boldsymbol{\mathit{A}})^{-1}$ by truncating this product.

We exploit a variation of this identity that is symmetric, and thus allows us to employ sparsifiersThe product of sparse approximations of matrices need not be an approximation of the product, unless the products are taken symmetrically. . Our identity also applies more naturally to SDDM matrices. It is:

This tells us that, aside for a few multiplications of a vector by a matrix, we can reduce the problem of solving a linear equation in $\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}$ to the problem of solving a linear equation in $\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ . Moreover, this latter matrix is also a SDDM matrix (Proposition 5.6). We can in turn reduce this to the problem of solving a system of equations in

After applying this identity $k$ times, we obtain the matrix

As $k$ grows large, it becomes easy to approximately solve systems of linear equations in this matrix because all of the eigenvalues of $\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ are less than $1$ in absolute value. So, the term $(\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}})^{2^{k}}$ becomes negligible and the matrix approaches $\boldsymbol{\mathit{D}}$ (Corollary 5.5).

However, this does not yet yield an efficient algorithm as it requires us to multiply vectors by matrices of the form

for several integers $k$ . This can require a lot of work if those matrices are dense. We overcome this problem by sparsifying those matrices.

We now describe this process in more detail. Let $\boldsymbol{\mathit{M}}_{0}=\boldsymbol{\mathit{D}}_{0}-\boldsymbol{\mathit{A}}_{0}$ be the matrix that defines the equation we want to solve. We observe (in Proposition 5.6) that the matrix $\boldsymbol{\mathit{D}}_{0}-\boldsymbol{\mathit{A}}_{0}\boldsymbol{\mathit{D}}_{0}^{-1}\boldsymbol{\mathit{A}}_{0}$ is also positive definite and diagonally dominant. So, a spectral sparsifier [ST11, BSS12] will allow us to approximate this matrix by a sparse SDDM matrix $\boldsymbol{\mathit{D}}_{1}-\boldsymbol{\mathit{A}}_{1}$ . Proceeding in this fashion, we obtain a sequence of sparse matrices

and so that $\boldsymbol{\mathit{A}}_{i}$ becomes negligible as $i$ grows. These provide a natural algorithm for solving a system of equations of the form $\boldsymbol{\mathit{M}}_{0}\boldsymbol{\mathit{x}}=\boldsymbol{\mathit{b}}_{0}$ , which we now present.

$\boldsymbol{\mathit{x}}=\textsc{Solve}((\boldsymbol{\mathit{M}}_{0},\boldsymbol{\mathit{D}}_{0})\ldots(\boldsymbol{\mathit{M}}_{d},\boldsymbol{\mathit{D}}_{d}),\boldsymbol{\mathit{b}})$ 1. For $i$ from $1$ to $d$ , set $\boldsymbol{\mathit{b}}_{i}=(\boldsymbol{\mathit{I}}+\boldsymbol{\mathit{A}}_{i}\boldsymbol{\mathit{D}}_{i}^{-1})\boldsymbol{\mathit{b}}_{i-1}.$ 2. Set $\boldsymbol{\mathit{x}}_{d}=\boldsymbol{\mathit{D}}_{d}^{-1}\boldsymbol{\mathit{b}}_{d}$ . 3. For $i$ from $d-1$ downto , set $\boldsymbol{\mathit{x}}_{i}=\frac{1}{2}(\boldsymbol{\mathit{D}}_{i}^{-1}\boldsymbol{\mathit{b}}_{i}+(\boldsymbol{\mathit{I}}+\boldsymbol{\mathit{D}}_{i}^{-1}\boldsymbol{\mathit{A}}_{i})\boldsymbol{\mathit{x}}_{i+1}).$

The work required by Solve is proportional to the total number of nonzero entries in the matrices in the chain, and the depth is proportional to $\log n$ times $d$ .

If the approximations in (2) are good enough, then $\boldsymbol{\mathit{M}}\boldsymbol{\mathit{x}}_{0}$ will be a good approximation of $\boldsymbol{\mathit{b}}_{0}$ . We now roughly estimate how good these approximations need to be. We carry out the details in the following sections. If the condition number of $\boldsymbol{\mathit{M}}$ is $\kappa$ , then the largest eigenvalue of $\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ at most $1-1/\kappa$ (Proposition 5.3).

So, if we take $d=O(\log(\kappa))$ , all of the eigenvalues of $(\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}})^{2^{d}}$ will be close to 0. At every level of the recursion, we will incur a loss in approximation quality. To limit the overall loss to a constant, we will require the approximation in (2) to be at least as good as $\approx_{1/2d}$ . Using the best-known sparsifiers, we could achieve this with sparsifiers that have $O(d^{2}n)$ edges. So, the total number of nonzero entries in all of the matrices $A_{1},\dotsc,A_{d}$ will be $O(d^{3}n)=O(n\log^{3}\kappa)$ . We establish the existence of such chains in Section 5. If we instead use sparsifiers that we presently know how to construct efficiently, we multiply the total number of nonzero entries by a factor of $O(\log^{c}n)$ , for some constant $c$ . In Section 6, we give a nearly linear work, polylog depth algorithm for constructing these chains.

Approximate Inverse Chains

for all $1\leq i\leq d$ , $\boldsymbol{\mathit{M}}_{i}\approx_{\epsilon_{i-1}}\boldsymbol{\mathit{D}}_{i-1}-\boldsymbol{\mathit{A}}_{i-1}\boldsymbol{\mathit{D}}_{i-1}^{-1}\boldsymbol{\mathit{A}}_{i-1}$ ;

for all $1\leq i\leq d$ , $\boldsymbol{\mathit{D}}_{i}\approx_{\epsilon_{i-1}}\boldsymbol{\mathit{D}}_{i-1}$ ; and,

$\boldsymbol{\mathit{D}}_{d}\approx_{\epsilon_{d}}\boldsymbol{\mathit{M}}_{d}$ .

For the rest of this section, we assume that $\boldsymbol{\mathit{D}}_{1},\dotsc,\boldsymbol{\mathit{D}}_{d}$ and $\boldsymbol{\mathit{A}}_{1},\dotsc,\boldsymbol{\mathit{A}}_{d}$ are an approximate inverse chain for $\boldsymbol{\mathit{D}}_{0}-\boldsymbol{\mathit{A}}_{0}$ .

The choice of the bound of $2$ on $\sum\epsilon_{i}$ is not particularly important: any constant will allow us to use an approximate inverse chain to solve systems of linear equations in $\boldsymbol{\mathit{D}}_{0}-\boldsymbol{\mathit{A}}_{0}$ . We remark that many of the matrices $\boldsymbol{\mathit{A}}_{i}$ will have nonzero diagonal entries. For this reason, we keep the representation of $\boldsymbol{\mathit{D}}_{i}$ and $\boldsymbol{\mathit{A}}_{i}$ separate.

We now show that an approximate inverse chain allows one to crudely solve systems of linear equations in $\boldsymbol{\mathit{D}}_{0}-\boldsymbol{\mathit{A}}_{0}$ in time proportional to the number of nonzero entries in the matrices in the chain. We first verify that if we replace $\boldsymbol{\mathit{D}}_{i}-\boldsymbol{\mathit{A}}_{i}\boldsymbol{\mathit{D}}_{i}^{-1}\boldsymbol{\mathit{A}}_{i}$ by $\boldsymbol{\mathit{M}}_{i+1}$ in identity (1), then we still obtain a good approximation to $\boldsymbol{\mathit{M}}_{i}^{-1}$ .

We now use Lemma 4.2, to prove that Solve approximates the inverse of $\boldsymbol{\mathit{M}}_{0}$ .

Let $\boldsymbol{\mathit{Z}}_{0}$ be the operator defined by Solve. That is, $\boldsymbol{\mathit{x}}_{0}=\boldsymbol{\mathit{Z}}_{0}\boldsymbol{\mathit{b}}_{0}$ . Then,

Let $\boldsymbol{\mathit{Z}}_{i}$ be the operator such that $\boldsymbol{\mathit{x}}_{i}=\boldsymbol{\mathit{Z}}_{i}\boldsymbol{\mathit{b}}_{i}$ . That is,

$\boldsymbol{\mathit{Z}}_{d}=\boldsymbol{\mathit{D}}_{d}^{-1}$ , and

for $0\leq i\leq d-1$ , $\boldsymbol{\mathit{Z}}_{i}=\frac{1}{2}\left(\boldsymbol{\mathit{D}}_{i}^{-1}+\left(\boldsymbol{\mathit{I}}+\boldsymbol{\mathit{D}}_{i}^{-1}\boldsymbol{\mathit{A}}_{i}\right)\boldsymbol{\mathit{Z}}_{i+1}\left(\boldsymbol{\mathit{I}}+\boldsymbol{\mathit{A}}_{i}\boldsymbol{\mathit{D}}_{i}^{-1}\right)\right)$ .

We will prove by reverse induction on $i$ that

The base case of $i=d$ follows from applying Fact 2.1d to $\boldsymbol{\mathit{D}}_{d}\approx_{\epsilon_{d}}\boldsymbol{\mathit{M}}_{d}$ . For the induction, suppose the result is true for $i+1$ . Then the induction hypothesis gives

by Lemma 4.2. An application of Fact 2.1c completes the induction. ∎

This leads to a constant factor approximation of $\boldsymbol{\mathit{M}}_{0}$ . In order to turn this into a high quality approximation, we can use preconditioned Richardson iteration.

[Saa03, Axe94] There exists an algorithm PreconRichardson such that for any symmetric positive semi-definite matrices $\boldsymbol{\mathit{A}}$ and $\boldsymbol{\mathit{B}}$ such that $\boldsymbol{\mathit{B}}\approx_{O(1)}\boldsymbol{\mathit{A}}^{-1}$ , and any error tolerance $0<\epsilon\leq 1/2$ ,

Under exact arithmetic, $\textsc{PreconRichardson}(\boldsymbol{\mathit{A}},\boldsymbol{\mathit{B}},\boldsymbol{\mathit{b}},\epsilon)$ is a linear operator on $\boldsymbol{\mathit{b}}$ and if $\boldsymbol{\mathit{Z}}$ is the matrix such that $\boldsymbol{\mathit{Z}}\boldsymbol{\mathit{b}}=\textsc{PreconRichardson}(\boldsymbol{\mathit{A}},\boldsymbol{\mathit{B}},\boldsymbol{\mathit{b}},\epsilon)$ , then

$\textsc{PreconRichardson}(\boldsymbol{\mathit{A}},\boldsymbol{\mathit{B}},\boldsymbol{\mathit{b}},\epsilon)$ takes $O(\log(1/\epsilon))$ iterations, each consisting of one multiplication of a vector by $\boldsymbol{\mathit{A}}$ and one by $\boldsymbol{\mathit{B}}$ .

This allows us to solve linear equations in $\boldsymbol{\mathit{M}}_{0}$ to arbitrary precision, and the overall performance of the solver can be summarized as follows.

Given an approximate inverse chain for $\boldsymbol{\mathit{M}}_{0}$ where $\boldsymbol{\mathit{D}}_{i}$ and $\boldsymbol{\mathit{A}}_{i}$ have total size $m_{i}$ , there is an algorithm $\textsc{Solve}(\boldsymbol{\mathit{M}},\boldsymbol{\mathit{b}},\epsilon)$ that

runs in $O(d\log{n}\log(1/\epsilon))$ depth and $O(\sum_{i=0}^{d}m_{i}\log(1/\epsilon))$ work;

is a symmetric linear operator on $\boldsymbol{\mathit{b}}$ ; and

if $\boldsymbol{\mathit{Z}}$ is the matrix such that $\textsc{Solve}(\boldsymbol{\mathit{M}},\boldsymbol{\mathit{b}},\epsilon)=\boldsymbol{\mathit{b}}$ , then $\boldsymbol{\mathit{Z}}\approx_{\epsilon}\boldsymbol{\mathit{M}}^{-1}$ .

The algorithm calls the procedure Solve inside preconditioned Richardson iteration. The guarantees of the operator follows from Lemma 4.3, therefore the guarantees of the overall algorithm is given by Lemma 4.4.

To analyze the running time, observe that each of the $O(\log(1/\epsilon))$ iterations performs one matrix-vector multiplication involving $\boldsymbol{\mathit{M}}_{0}$ , and then invokes the lower accuracy solver. This solver in turn performs two matrix-vector products for each matrix $\boldsymbol{\mathit{M}}_{i}$ , along with a constant number of vector additions. The bounds on the depth and work follow from the fact that the cost of each of such a matrix-vector multiplication requires depth $O(\log{n})$ and work $O(m_{i})$ . ∎

Existence of Sparse Approximate Inverse Chains

We now show that short sparse approximate inverse chains exist. Specifically, we show that as long as each $\boldsymbol{\mathit{M}}_{i+1}$ is a good approximation of $\boldsymbol{\mathit{D}}_{i}-\boldsymbol{\mathit{A}}_{i}\boldsymbol{\mathit{D}}_{i}^{-1}\boldsymbol{\mathit{A}}_{i}$ , $\boldsymbol{\mathit{D}}_{i+1}$ quickly becomes a good approximation to $\boldsymbol{\mathit{M}}_{i+1}$ . The following proposition tells us that the approximation factor improves by a constant factor if we do not use approximations.

Let $\boldsymbol{\mathit{D}}$ and $\boldsymbol{\mathit{A}}$ be matrices such that $\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ is diagonalizable and has real eigenvalues. If all of the eigenvalues of $\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ are at most $1-\lambda$ , Then all of the eigenvalues of $\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ are between and $(1-\lambda)^{2}$ .

This follows from the fact that $\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}=(\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}})^{2}$ . ∎

In Appendix B we prove the following lemma which says that this remains approximately true if we substitute a good approximation of $\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ . Note that it agrees with Proposition 5.1 when $\epsilon=0$ .

Let $\boldsymbol{\mathit{D}}$ and $\boldsymbol{\widehat{\mathit{D}}}$ be positive diagonal matrices and let $\boldsymbol{\mathit{A}}$ and $\boldsymbol{\widehat{\mathit{A}}}$ be nonnegative symmetric matrices such that $\boldsymbol{\mathit{D}}\succ\boldsymbol{\mathit{A}}$ , $\boldsymbol{\widehat{\mathit{D}}}\succ\boldsymbol{\widehat{\mathit{A}}}$ , $\boldsymbol{\mathit{D}}\approx_{\epsilon}\boldsymbol{\widehat{\mathit{D}}}$ and

Let the largest eigenvalue of $\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ be $1-\lambda$ . Then, the eigenvalues of $\boldsymbol{\widehat{\mathit{D}}}^{-1}\boldsymbol{\widehat{\mathit{A}}}$ lie between $1-\exp(2\epsilon)$ and $1-(1-(1-\lambda)^{2})\exp(-2\epsilon)$ .

The following proposition allows us to bound the eigenvalues of $\boldsymbol{\mathit{D}}_{0}^{-1}\boldsymbol{\mathit{A}}_{0}$ in terms of the condition number of $\boldsymbol{\mathit{D}}_{0}-\boldsymbol{\mathit{A}}_{0}$ .

Let $\boldsymbol{\mathit{M}}=\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}$ be a positive definite matrix with condition number $\kappa$ , where $\boldsymbol{\mathit{D}}$ is a positive diagonal matrix and $\boldsymbol{\mathit{A}}$ is nonnegative. Then, the eigenvalues of $\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ are between $-1+1/\kappa$ and $1-1/\kappa$ .

Let $\lambda_{max}$ and $\lambda_{min}$ be the largest and smallest eigenvalues of $\boldsymbol{\mathit{M}}$ . As the largest eigenvalue of a positive diagonal matrix is its largest entry, and the largest eigenvalue of a symmetric matrix is at least its largest entry, the largest eigenvalue of $\boldsymbol{\mathit{D}}$ is at most $\lambda_{max}$ . We will prove that the smallest eigenvalue of $\boldsymbol{\mathit{I}}-\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ is at least $\lambda_{min}/\lambda_{max}=1/\kappa$ . In particular, it is equal to

This implies that the largest eigenvalue of $\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ is at most $1-1/\kappa$ . The Perron-Frobenius theorem tells us that the largest absolute value of an eigenvalue of a nonnegative matrix is the largest eigenvalue. As $\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ is nonnegative, the bound on its smallest eigenvalue follows. ∎

Conversely, the following proposition allows us to show that $\boldsymbol{\mathit{D}}$ is a good approximation of $\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}$ if $\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ is small.

If the eigenvalues of $\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ lie between $-\alpha$ and $\beta$ , then

Applying parts $a$ and $e$ of Fact 2.1, we derive

One obstacle to finding an approximate inverse chain is that the last matrix must be approximated by its diagonal. We use the preceding lemma and propositions to show that we can achieve this with a chain whose depth is logarithmic in the condition number of $\boldsymbol{\mathit{M}}_{0}$ .

If $\boldsymbol{\mathit{M}}_{0}=\boldsymbol{\mathit{D}}_{0}-\boldsymbol{\mathit{A}}_{0}$ is a SDDM matrix with condition number $\kappa$ , $d=\left\lceil\log_{4/3}\kappa\right\rceil$ , and $\boldsymbol{\mathit{D}}_{1},\dots,\boldsymbol{\mathit{D}}_{d}$ and $\boldsymbol{\mathit{A}}_{1},\dots,\boldsymbol{\mathit{A}}_{d}$ satisfy conditions $a$ and $b$ of Definition 4.1 with $\epsilon_{0},\dots,\epsilon_{d-1}\leq 1/9$ , then $\boldsymbol{\mathit{D}}_{d}\approx_{\epsilon_{d}}\boldsymbol{\mathit{M}}_{d}$ for $\epsilon_{d}=\ln 3$ .

Proposition 5.3 tells us that the eigenvalues of $\boldsymbol{\mathit{D}}_{0}^{-1}\boldsymbol{\mathit{A}}_{0}$ are at most $1-1/\kappa$ in absolute value. For $\epsilon_{i}\leq 1/9$ and $\lambda\leq 1/3$ ,

So, Lemma 5.2 implies that the eigenvalues of $\boldsymbol{\mathit{D}}_{d}^{-1}\boldsymbol{\mathit{A}}_{d}$ lie between $1-\exp(2/9)\geq-1/4$ and $2/3$ . Proposition 5.4 then tells us that $\boldsymbol{\mathit{D}}\approx_{\gamma}\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}$ , where

It remains to show that we can find sequences of matrices $\boldsymbol{\mathit{D}}_{i}$ and $\boldsymbol{\mathit{A}}_{i}$ that satisfy conditions $a$ and $b$ of Definition 4.1. We begin by proving that as long as $\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}$ is a SDDM matrix, so is $\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ .

It is clear that $\boldsymbol{\widehat{\mathit{M}}}$ is symmetric, and all entries in $\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ are nonnegative. To check that $\boldsymbol{\widehat{\mathit{M}}}$ is diagonally dominant, we compute the sum of the off-diagonal entries in row $i$ :

Reordering the two summations and collecting terms gives

As $\boldsymbol{\widehat{\mathit{M}}}$ is SDD, it is positive semidefinite. To see that it is positive definite, one need merely observe that it is nonsingular. This follows from the nonsingularity of $\boldsymbol{\mathit{M}}$ and identity (1). ∎

This allows us to use the spectral sparsifiers of Batson, Spielman, and Srivastava [BSS12] to sparsify $\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ .

For every $n$ -dimensional Laplacian matrix $\boldsymbol{\mathit{L}}$ and every $0<\epsilon<1/2$ , there exists a Laplacian matrix $\boldsymbol{\widetilde{\mathit{L}}}$ with $O(n/\epsilon^{2})$ nonzero entries so that $\boldsymbol{\mathit{L}}\approx_{\epsilon}\boldsymbol{\widetilde{\mathit{L}}}$ .

For every $n$ -dimensional SDDM matrix $\boldsymbol{\mathit{M}}=\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}$ with $\boldsymbol{\mathit{D}}$ nonnegative diagonal and $\boldsymbol{\mathit{A}}$ nonnegative, and every $0<\epsilon<1/2$ , there exist a nonnegative diagonal $\boldsymbol{\widehat{\mathit{D}}}$ and a nonnegative $\boldsymbol{\widehat{\mathit{A}}}$ with at most $O(n/\epsilon^{2})$ nonzero entries so that

Let $\boldsymbol{\mathit{Y}}$ be the diagonal matrix containing the diagonal entries of $\boldsymbol{\mathit{A}}$ and let $\boldsymbol{\mathit{X}}$ be the diagonal matrix so that $\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{X}}-\boldsymbol{\mathit{A}}$ has zero row-sums. Then, $\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{X}}-\boldsymbol{\mathit{A}}$ is a Laplacian matrix, so by Theorem 5.7 there exists a Laplacian matrix $\boldsymbol{\widetilde{\mathit{D}}}-\boldsymbol{\widetilde{\mathit{A}}}$ where $\boldsymbol{\widetilde{\mathit{D}}}$ is nonnegative diagonal, $\boldsymbol{\widetilde{\mathit{A}}}$ has at most $O(n/\epsilon^{2})$ nonzero entries and is nonnegative with zero diagonal, and

As neither $\boldsymbol{\widetilde{\mathit{A}}}$ nor $\boldsymbol{\mathit{A}}-\boldsymbol{\mathit{Y}}$ have diagonal entries, it follows that

We now set $\boldsymbol{\widehat{\mathit{D}}}=\boldsymbol{\widetilde{\mathit{D}}}+\boldsymbol{\mathit{X}}+\boldsymbol{\mathit{Y}}$ and $\boldsymbol{\widehat{\mathit{A}}}=\boldsymbol{\widetilde{\mathit{A}}}+\boldsymbol{\mathit{Y}}$ . The desired properties of $\boldsymbol{\widehat{\mathit{D}}}$ and $\boldsymbol{\widehat{\mathit{A}}}$ then follow from Fact 2.1a. ∎

If $\boldsymbol{\mathit{M}}_{0}=\boldsymbol{\mathit{D}}_{0}-\boldsymbol{\mathit{A}}_{0}$ is an $n$ -dimensional SDDM matrix with condition number at most $\kappa$ , $\boldsymbol{\mathit{D}}_{0}$ is diagonal and $\boldsymbol{\mathit{A}}_{0}$ is nonnegative, then $\boldsymbol{\mathit{M}}_{0}$ has an approximate inverse chain $\boldsymbol{\mathit{D}}_{1},\dots,\boldsymbol{\mathit{D}}_{d}$ and $\boldsymbol{\mathit{A}}_{1},\dots,\boldsymbol{\mathit{A}}_{d}$ such that $d=O(\log\kappa)$ and the total number of nonzero entries in the matrices in the chain is $O(n\log^{3}{\kappa})$ .

Set $d=\left\lceil\log_{4/3}\kappa\right\rceil$ and $\epsilon_{0},\dots,\epsilon_{d-1}$ to the minimum of $1/9$ and $1/2d$ . By Corollary 5.8 there exists a sequence of matrices $\boldsymbol{\mathit{M}}_{i}=\boldsymbol{\mathit{D}}_{i}-\boldsymbol{\mathit{A}}_{i}$ that satisfy conditions $a$ and $b$ of Definition 4.1 and that each have $O(n/\log^{2}\kappa)$ nonzero entries. By Corollary 5.5, we then know that $\boldsymbol{\mathit{D}}_{d}\approx_{\epsilon_{d}}\boldsymbol{\mathit{M}}_{d}$ for $\epsilon_{d}=\ln 3$ . As the sum of the $\epsilon_{i}$ is at most $1/2+\ln 3<2$ , this sequence of matrices is an approximate inverse chain. ∎

Theorem 1.1 now follows from Theorem 4.5 and Theorem 5.9.

Efficient Parallel Construction

We now show how to construct sparse approximate inverse chains efficiently in parallel. The only obstacle is that we must employ an efficient sparsification routine instead of Theorem 5.7. We do this through a two-step process. If $\boldsymbol{\mathit{A}}_{i}$ is an $n$ -dimensional matrix with $m$ nonzero entries, we first compute an approximation of $\boldsymbol{\mathit{D}}_{i}-\boldsymbol{\mathit{A}}_{i}\boldsymbol{\mathit{D}}_{i}^{-1}\boldsymbol{\mathit{A}}_{i}$ that has $O(n+m\log n/\epsilon^{2})$ nonzero entries. We do this by observing that the off-diagonal entries of this matrix come from a sum of weighted cliques, and that we can sparsify those cliques individually. This is made easy by the existence of a closed form for the effective resistance between vertices of the cliques. We then employ a general-purpose sparsification routine to further reduce the number of nonzero entries to $O(n\log^{c}n/\epsilon^{2})$ , for some constant $c$ .

To find an algorithm that produces these sparsifiers in nearly-linear work and polylogarithmic depth, we look to the original spectral sparsification algorithm of Spielman and Teng [ST11]. It uses a polylogarithmic number of calls to a graph partitioning routine to divide a graph into a small set of edges plus a number of subgraphs of high conductance. It then samples edges from these subgraphs at random. This algorithm requires nearly-linear work and polylogarithmic depth if the graph partitioning algorithm does as well.

The graph partitioning algorithm used in [ST11] comes from [ST13]. This algorithm, which is based on local graph clustering, could probably be implemented efficiently in parallel. However, it is not stated in a parallel form in that paper. Instead, we rely on an improvement of this graph partitioning algorithm by Orecchia and Vishnoi [OV11]. By using their BalCut algorithm, we obtain a sparsifier with fewer edges while using less work and in polylogarithmic depth. To see that the BalCut algorithm can be implemented in nearly-linear time and polylogarithmic depth, we observe that all of its operations are either multiplication of vectors by matrices, elementary vector operations, or sorting and computing sparsest cuts by sweeps along vectors. All of these components parallelize efficiently, and Orecchia and Vishnoi showed that their algorithm only requires $O(m\log^{c}m)$ work, for some constant $c$ , when asked to produce cuts of polylogarithmic conductance. This is the case in the calls to BalCut made by Spielman and Teng’s [ST11] algorithm, specifically by the routine Partition2 inside Sparsify.

We summarize this discussion with the following theorem.

There exists an algorithm that on input an $n$ -dimensional Laplacian matrix $\boldsymbol{\mathit{L}}$ with $m$ nonzero entries, an $\epsilon\in[0,1/2]$ , produces with probability at least $1-1/n^{2}$ a Laplacian matrix $\boldsymbol{\widehat{\mathit{L}}}$ such that $\boldsymbol{\mathit{L}}\approx_{\epsilon}\boldsymbol{\widehat{\mathit{L}}}$ and $\boldsymbol{\widehat{\mathit{L}}}$ has $O(n\log^{c}n/\epsilon^{2})$ entries for some constant $c$ . Moreover, this algorithm requires $O(m\log^{c_{1}}n)$ work and $O(\log^{c_{2}}n)$ depth, for some other constants $c_{1}$ and $c_{2}$ .

We cannot directly apply the above-described algorithm to sparsify $\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ because that matrix could be dense. So, we must avoid actually constructing this matrix, and construct a sparse approximation of it first instead. Let $m$ be the number of nonzero entries in $\boldsymbol{\mathit{A}}$ . We will show how to construct a sparse approximation of $\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ with $O(m\log m/\epsilon^{2})$ nonzero entries. We begin by writing the second matrix as a sum of outer products:

where $\boldsymbol{\mathit{A}}_{:,u}$ denotes the $u$ th column of $\boldsymbol{\mathit{A}}$ and $\boldsymbol{\mathit{A}}_{u,:}$ denotes the $u$ th row. We can see that the diagonal entries of this sum come from two types of products: $\boldsymbol{\mathit{A}}_{u,u}\boldsymbol{\mathit{D}}_{u,u}^{-1}\boldsymbol{\mathit{A}}_{u,u}$ and $\boldsymbol{\mathit{A}}_{u,v}\boldsymbol{\mathit{D}}_{v,v}^{-1}\boldsymbol{\mathit{A}}_{v,u}$ . There are only $n$ terms of the first type and $m$ terms of the second. So, all of these can be computed in time $O(n+m)$ . We are more concerned with the off-diagonal entries. These can come from three types of products: $\boldsymbol{\mathit{A}}_{u,u}\boldsymbol{\mathit{D}}_{u,u}^{-1}\boldsymbol{\mathit{A}}_{u,v}$ , $\boldsymbol{\mathit{A}}_{v,u}\boldsymbol{\mathit{D}}_{u,u}^{-1}\boldsymbol{\mathit{A}}_{u,u}$ , and $\boldsymbol{\mathit{A}}_{v,u}\boldsymbol{\mathit{D}}_{u,u}^{-1}\boldsymbol{\mathit{A}}_{u,w}$ . There are only $O(m)$ terms of the first two types, so we can just compute and store all of them. For a fixed $u$ , the terms of the last type correspond to a weighted complete graph on the neighbors of vertex $u$ in which the edge between $v$ and $w$ has weight $\boldsymbol{\mathit{A}}_{v,u}\boldsymbol{\mathit{D}}_{u,u}^{-1}\boldsymbol{\mathit{A}}_{u,w}$ . Denote this graph by $G_{u}$ and its Laplacian matrix by $\boldsymbol{\mathit{L}}_{u}$ . We now show how to sparsify such a weighted complete graph.

We will do this through the approach of sampling edges by their effective resistance introduced by Spielman and Srivastava [SS08]. The effective resistance of an edge $(v,w)$ in a weighted graph $G$ with Laplacian matrix $\boldsymbol{\mathit{L}}_{u}$ is given by

where $\boldsymbol{\mathit{e}}_{v}$ is the elementary unit vector in direction $v$ and $\boldsymbol{\mathit{L}}_{u}^{{\dagger}}$ is the pseudo-inverse of $\boldsymbol{\mathit{L}}_{u}$ .

It has been observed [KMP10, KL13] that one can strengthen the result of [SS08] by replacing Rudelson’s concentration theorem [Rud99] with that of Rudelson and Vershynin [RV07] to obtain the following result.

Let $G=(V,E,\boldsymbol{\mathit{w}})$ be a weighted graph with $n$ vertices and let $\boldsymbol{\mathit{L}}$ be its Laplacian matrix. Consider the distribution on edges obtained by choosing an edge from $E$ with probability proportional to its weight times its effective resistance, and then dividing its weight by this probability. For every $0<\epsilon<1/2$ , if one forms the Laplacian of $O(n\log n/\epsilon^{2})$ edges chosen from this distribution, with replacement, then the resulting Laplacian matrix, $\boldsymbol{\widehat{\mathit{L}}}$ , satisfies $\boldsymbol{\mathit{L}}\approx_{\epsilon}\boldsymbol{\widehat{\mathit{L}}}$ with probability at least $1-1/n^{2}$ .

To apply this theorem, we need to know the effective resistances between pairs of vertices in $G_{u}$ .

The effective resistance between $v$ and $w$ in $G_{u}$ is

Note that $\boldsymbol{\mathit{D}}_{u,u}$ can be larger than $d_{u}$ .

There exists an algorithm that when given as input an $\epsilon\in[0,1/2]$ and an $n$ -dimensional SDDM matrix $\boldsymbol{\mathit{M}}=\boldsymbol{\mathit{D}}-\boldsymbol{\mathit{A}}$ , with $\boldsymbol{\mathit{D}}$ nonnegative diagonal and $\boldsymbol{\mathit{A}}$ nonnegative with $m$ nonzero entries, produces with probability at least $1-1/n$ a nonnegative diagonal matrix $\boldsymbol{\widehat{\mathit{D}}}$ and a nonnegative $\boldsymbol{\widehat{\mathit{A}}}$ with at most $O(m\log n/\epsilon^{2})$ nonzero entries so that

Moreover, this algorithm runs in time $O(m\log^{2}n/\epsilon^{2})$ .

We handle the diagonal entries as in Corollary 5.8. We also keep all of the off-diagonal entries of the first two types described above. For each vertex $u$ , let $\delta_{u}$ be the number of neighbors of vertex $u$ . We will approximate $\boldsymbol{\mathit{L}}_{u}$ by a Laplacian $\boldsymbol{\widehat{\mathit{L}}}_{u}$ with $O(\delta_{u}\log n/\epsilon^{2})$ non-zero entries. By Fact 2.1b, the sum of the $\boldsymbol{\widehat{\mathit{L}}}_{u}$ approximates the sum of the $\boldsymbol{\mathit{L}}_{u}$ . By Theorem 6.2, we can construct $\boldsymbol{\widehat{\mathit{L}}}_{u}$ by sampling the appropriate number of edges of $G_{u}$ , and then re-scaling them. It remains to check that we can sample each of these edges in time $O(\log n)$ . To see this, observe that we need to sample edge $(v,w)$ with probability proportional to

So, in time $O(\delta_{u})$ we can compute for each $v$ the probability that an edge involving $v$ is sampled. We can then create a data structure that will allow us to sample $v$ with this probability in time $O(\log n)$ , and to then follow it by sampling $w$ from the distribution induced by fixing the choice of $v$ . ∎

There exists an algorithm that on input an $n$ -dimensional diagonal $\boldsymbol{\mathit{D}}_{0}$ and nonnegative $\boldsymbol{\mathit{A}}_{0}$ with $m$ nonzero entries such that $\boldsymbol{\mathit{M}}_{0}=\boldsymbol{\mathit{D}}_{0}-\boldsymbol{\mathit{A}}_{0}$ has condition number at most $\kappa$ , constructs with probability at least $1/2$ an approximate inverse chain $\boldsymbol{\mathit{D}}_{1},\dots,\boldsymbol{\mathit{D}}_{d}$ and $\boldsymbol{\mathit{A}}_{1},\dots,\boldsymbol{\mathit{A}}_{d}$ such that $d=O(\log\kappa)$ and the total number of nonzero entries in the matrices in the chain is $O(n\log^{c}n\log^{3}{\kappa})$ , for some constant $c$ . Moreover, the algorithm runs in work $O(m\log^{c_{1}}n\log^{3}{\kappa})$ and depth $O(\log^{c_{2}}n\log{\kappa})$ , for some other constants $c_{1}$ and $c_{2}$ .

The proof follows the proof of Theorem 5.9. However, in place of Corollary 5.8 and Theorem 5.7, it uses the algorithms implicit in Corollary 6.4 and Theorem 6.1 to construct $\boldsymbol{\mathit{D}}_{i}$ and $\boldsymbol{\mathit{A}}_{i}$ from $\boldsymbol{\mathit{D}}_{i-1}$ and $\boldsymbol{\mathit{A}}_{i-1}$ . ∎

Theorem 1.2 now follows from Theorem 4.5 and Theorem 6.5.

Acknowledgement

We thank Gary Miller for very helpful discussions, and Jakub Pachocki for comments on an earlier draft.

References

Appendix A The Condition Number of a Submatrix

To learn more about the reductions between the problems of solving systems of linear equations in various forms of SDD matrices, we refer the reader to either Section 3.2 of [ST08] or Appendix A of [KOSZ13]. For example, Spielman and Teng [ST08] point out that one can solve a system of equations $\boldsymbol{\mathit{L}}\boldsymbol{\mathit{x}}=\boldsymbol{\mathit{b}}$ in a Laplacian matrix by first solving the system in which the first entry of $\boldsymbol{\mathit{x}}$ is forced to be zero. After solving this reduced system, one can orthogonalize the solution with respect to the nullspace of $\boldsymbol{\mathit{L}}$ , the all 1s vector. The reduced system is a linear system in the submatrix of $\boldsymbol{\mathit{L}}$ obtained by removing its first row and column. The resulting matrix is a SDDM matrix. We now show that its condition number is at most $n$ times the finite condition number of $\boldsymbol{\mathit{L}}$ . As the condition number of a submatrix of a positive definite matrix is at most the condition number of the original, the same bound holds for every submatrix of $\boldsymbol{\mathit{L}}$ .

Let $\boldsymbol{\mathit{L}}$ be the Laplacian matrix of a connected graph on $n$ vertices and let $\boldsymbol{\mathit{M}}$ be the submatrix of $\boldsymbol{\mathit{L}}$ containing all rows and columns of $\boldsymbol{\mathit{L}}$ but the first. Then, $\lambda_{1}(\boldsymbol{\mathit{M}})\geq\lambda_{2}(\boldsymbol{\mathit{L}})/n$ .

For any vector $\boldsymbol{\mathit{v}}$ of length $n-1$ , let $\boldsymbol{\mathit{u}}$ be the vector consisting of a 0 followed by $\boldsymbol{\mathit{v}}$ . Then, $\boldsymbol{\mathit{v}}^{T}\boldsymbol{\mathit{M}}\boldsymbol{\mathit{v}}=\boldsymbol{\mathit{u}}^{T}\boldsymbol{\mathit{L}}\boldsymbol{\mathit{u}}$ . As $1$ spans the nullspace of $\boldsymbol{\mathit{L}}$ , let $\boldsymbol{\mathit{x}}$ be the vector obtained by orthogonalizing $\boldsymbol{\mathit{u}}$ with respect to $1$ :

where $\mu$ is the average of the entries of $\boldsymbol{\mathit{u}}$ . Then, $\boldsymbol{\mathit{v}}^{T}\boldsymbol{\mathit{M}}\boldsymbol{\mathit{v}}=\boldsymbol{\mathit{x}}^{T}\boldsymbol{\mathit{L}}\boldsymbol{\mathit{x}}$ . We will show that $\left\|\boldsymbol{\mathit{x}}\right\|\geq\left\|\boldsymbol{\mathit{v}}\right\|/\sqrt{n}$ . The lemma will then follow, as

As $\boldsymbol{\mathit{u}}-\mu{\mbox{\boldmath$ 1 $}}$ is orthogonal to $\boldsymbol{\mathit{u}}$ ,

Under the conditions of Lemma A.1, the condition number of $\boldsymbol{\mathit{M}}$ is at most $n$ times the finite condition number of $\boldsymbol{\mathit{L}}$ . Moreover, the same is true for every principal submatrix of $\boldsymbol{\mathit{L}}$ .

The largest and smallest eigenvalues of a principal submatrix of a matrix are between the largest and smallest eigenvalues of that matrix. As $\boldsymbol{\mathit{M}}$ is a sub-matrix of $\boldsymbol{\mathit{L}}$ , its largest eigenvalue is smaller than the largest eigenvalue of $\boldsymbol{\mathit{L}}$ . So, the condition number of $\boldsymbol{\mathit{M}}$ is at most $n$ times the finite condition number of $\boldsymbol{\mathit{L}}$ . The condition number of every sub-matrix of $\boldsymbol{\mathit{M}}$ can only be smaller. ∎

Appendix B Proof of Lemma 5.2

The proof is a routine calculation. We build to it in a few steps.

Let $\boldsymbol{\mathit{A}}$ be a positive definite matrix and let $\boldsymbol{\mathit{X}}$ be a non-negative diagonal matrix. Then,

We just prove the first inequality. The second is similar. We have

Let $\boldsymbol{\widetilde{\mathit{D}}}$ and $\boldsymbol{\widehat{\mathit{D}}}$ be positive diagonal matrices and let $\boldsymbol{\widetilde{\mathit{A}}}$ and $\boldsymbol{\widehat{\mathit{A}}}$ be non-negative symmetric matrices such that $\boldsymbol{\widetilde{\mathit{D}}}\succ\boldsymbol{\widetilde{\mathit{A}}}$ , $\boldsymbol{\widehat{\mathit{D}}}\succ\boldsymbol{\widehat{\mathit{A}}}$ , $\boldsymbol{\widetilde{\mathit{D}}}\approx_{\epsilon}\boldsymbol{\widehat{\mathit{D}}}$ and $\boldsymbol{\widetilde{\mathit{D}}}-\boldsymbol{\widetilde{\mathit{A}}}\approx_{\epsilon}\boldsymbol{\widehat{\mathit{D}}}-\boldsymbol{\widehat{\mathit{A}}}$ . Then,

Let $\lambda$ be the largest eigenvalue of $\boldsymbol{\widetilde{\mathit{D}}}^{-1}\boldsymbol{\widetilde{\mathit{A}}}$ , and let $\mu=1-\lambda$ . Then, $\mu$ is the smallest eigenvalue of $\boldsymbol{\mathit{I}}-\boldsymbol{\widetilde{\mathit{D}}}^{-1/2}\boldsymbol{\widetilde{\mathit{A}}}\boldsymbol{\widetilde{\mathit{D}}}^{-1/2}$ . As

Let $\boldsymbol{\mathit{X}}=\boldsymbol{\widetilde{\mathit{D}}}^{-1/2}\boldsymbol{\widehat{\mathit{D}}}^{1/2}$ . As $\lambda_{min}(\boldsymbol{\mathit{X}})^{2}\geq\exp(-\epsilon)$ , Proposition B.1 allows us to conclude that the smallest eigenvalue of

We may similarly conclude that the largest eigenvalue of (3) is at most

Let $\boldsymbol{\widetilde{\mathit{D}}}=\boldsymbol{\mathit{D}}$ and $\boldsymbol{\widetilde{\mathit{A}}}=\boldsymbol{\mathit{A}}\boldsymbol{\mathit{D}}^{-1}\boldsymbol{\mathit{A}}$ . From Proposition 5.1, we know that the largest eigenvalue of $\boldsymbol{\widetilde{\mathit{D}}}^{-1}\boldsymbol{\widetilde{\mathit{A}}}$ is at most $(1-\lambda)^{2}$ , and so $1-\lambda_{max}(\boldsymbol{\widetilde{\mathit{D}}}^{-1}\boldsymbol{\widetilde{\mathit{A}}})\geq 1-(1-\lambda)^{2}$ . Proposition B.2 then implies that

As the smallest eigenvalue of $\boldsymbol{\widetilde{\mathit{D}}}^{-1}\boldsymbol{\widetilde{\mathit{A}}}$ is at least zero, the largest eigenvalue of $\boldsymbol{\mathit{I}}-\boldsymbol{\widetilde{\mathit{D}}}^{-1}\boldsymbol{\widetilde{\mathit{A}}}$ is at most $1$ . So, Proposition B.2 tells us that the largest eigenvalue of $\boldsymbol{\mathit{I}}-\boldsymbol{\widehat{\mathit{D}}}^{-1}\boldsymbol{\widehat{\mathit{A}}}$ is at most $\exp(2\epsilon)$ . This in turn implies that the smallest eigenvalue of $\boldsymbol{\widehat{\mathit{D}}}^{-1}\boldsymbol{\widehat{\mathit{A}}}$ is at least $1-\exp(2\epsilon)$ . ∎