Approximate Gaussian Elimination for Laplacians: Fast, Sparse, and Simple
Rasmus Kyng, Sushant Sachdeva
Introduction
A symmetric matrix is called Symmetric and Diagonally Dominant (SDD) if for all An SDD matrix is a Laplacian if for and for all A Laplacian matrix is naturally associated with a graph on its vertices, where are adjacent if The problem of solving systems of linear equations where is an SDD matrix (and often a Laplacian), is a fundamental primitive and arises in varied applications in both theory and practice. Example applications include solutions of partial differential equations via the finite element method [Str86, BHV08], semi-supervised learning on graphs [ZGL03, ZS04, ZBL+04], and computing maximum flows in graphs [DS08, CKM+11, Mad13, LS14]. It has also been used as a primitive in the design of several fast algorithms [KM09, OSV12, KMP12, LKP12, KRS15]. It is known that solving SDD linear systems can be reduced to solving Laplacian systems [Gre96].
A natural approach to solving systems of linear equations is Gaussian elimination, or its variant for symmetric matrices, Cholesky factorization. Cholesky factorization of a matrix produces a factorization where is a lower-triangular matrix, and is a diagonal matrix. Such a factorization allows us to solve a system by computing where the inverse of , and can be applied quickly since they are lower-triangular and diagonal, respectively.
The fundamental obstacle to using Cholesky factorization for quickly solving systems of linear equations is that can be a dense matrix even if the original matrix is sparse. The reason is that the key step in Cholesky factorization, eliminating a variable, say from a system of equations, creates a new coefficient for every pair such that and are non-zero. This phenomenon is called fill-in. For Laplacian systems, eliminating the first variable corresponds to eliminating the first vertex in the graph, and the fill-in corresponds to adding a clique on all the neighbors of the first vertex. Sequentially eliminating variables often produces a sequence of increasingly-dense systems, resulting in an worst-case time even for sparse Informally, the algorithm for generating the Cholesky factorization for a Laplacian can be expressed as follows:
Use equation to express the variable for vertex in terms of the remaining variables.
Eliminate vertex , adding a clique on the neighbors of
Eliminating the vertices in an order given by a permutation generates a factorization where denotes the permutation matrix of , i.e., for all Though picking a good order of elimination can significantly reduce the running time of Cholesky factorization, it gives no guarantees for general systems, e.g., for sparse expander graphs, every ordering results in an running time [RJL79].
Our Results.
In this paper, we present the first nearly linear time algorithm that generates a sparse approximate Cholesky decomposition for Laplacian matrices, with provable approximation guarantees. Our algorithm SparseCholesky can be described informally as follows (see Section 3 for a precise description):
Use equation to express the variable for vertex in terms of the remaining variables.
Eliminate vertex , adding random samples from the clique on the neighbors of
We prove the following theorem about our algorithm, where for symmetric matrices we write if is positive semidefinite (PSD).
where i.e., has a sparse Cholesky factorization.
The sparse approximate Cholesky factorization for given by Theorem C.1 immediately implies fast solvers for Laplacian systems. We can use the simplest iterative method, called iterative refinement [Hig02, Chapter 12] to solve the system as follows. We let,
For all Laplacian systems with and all using the sparse approximate Cholesky factorization given by Theorem C.1, the above iterate for satisfies We can compute such an in time
In our opinion, this is the simplest nearly-linear time solver for Laplacian systems. Our algorithm only uses random sampling, and no graph-theoretic constructions, in contrast with all previous Laplacian solvers. The analysis is also entirely contained in this paper. We also remark that there is a possibility that our analysis is not tight, and that the bounds can be improved by a stronger matrix concentration result.
Technical Contributions.
There are several key ideas that are central to our result: The first is randomizing the order of elimination. At step of the algorithm, if we eliminate a fixed vertex and sample the resulting clique, we do not know useful bounds on the sample variances that would allow us to prove concentration. Randomizing over the choice of vertex to eliminate allows us to bound the sample variance by roughly times the Laplacian at the previous step.
The second key idea is our approach to estimating effective resistances: A core element in all nearly linear time Laplacian solvers is a procedure for estimating effective resistances (or leverage scores) for edges in order to compute sampling probabilities. In previous solvers, these estimates are obtained using fairly involved procedures (e.g. low-stretch trees, ultrasparsifiers, or the subsampling procedure due to Cohen et al. [CLM+15]). In contrast, our solver starts with the crudest possible estimates of for every edge, and then uses the triangle inequality for effective resistances (Lemma 5.1) to obtain estimates for the new edges generated. We show that these estimates suffice for constructing a nearly linear time Laplacian solver.
Finally, we develop new concentration bounds for a class of matrix martingales that we call bags-of-dice martingales. The name is motivated by a simple scalar model: at each step, we pick a random bag of dice from a collection of bags (in the algorithm, this corresponds to picking a random vertex to eliminate), and then we independently roll each die in the selected bag (corresponding to drawing independent samples from the clique added). The guarantees obtained from existing matrix concentration results are too weak for our application. The concentration bound gives us a powerful tool for handling conditionally independent sums of variables. We defer a formal description of the martingales and the concentration bound to Section 4.2.
Comparison to other Laplacian solvers.
In contrast, our algorithm requires no graph-theoretic construction, and is based purely on random sampling. Our result only uses two algebraic facts about Laplacian matrices: 1. They are closed under taking Schur complements, and 2. They satisfy the effective resistance triangle inequality (Lemma 5.1).
[KLP+16] presented the first nearly linear time solver for block Diagonally Dominant (bDD) systems – a generalization of SDD systems. If bDD matrices satisfy the effective resistance triangle inequality (we are unaware if they do), then the algorithm in the main body of this paper immediately applies to bDD systems, giving a sparse approximate block Cholesky decomposition and a nearly linear time solver for bDD matrices.
In Section C, we sketch a near-linear time algorithm for computing a sparse approximate block Cholesky factorization for bDD matrices. It combines the approach of SparseCholesky with a recursive approach for estimating effective resistances, as in [KLP+16], using the subsampling procedure [CLM+15]. Though the algorithm is more involved than SparseCholesky, it runs in time and produces a sparse approximate Cholesky decomposition with only entries. The algorithm only uses that bDD matrices are closed under taking Schur complements, and that the Schur complements have a clique structure similar to Laplacians (see Section 2).
Comparison to Incomplete Cholesky Factorization.
A popular approach to tackling fill-in is Incomplete Cholesky factorization, where we throw away most of the new entries generated when eliminating variables. The hope is that the resulting factorization is still an approximation to the original matrix in which case such an approximate factorization can be used to quickly solve systems in Though variants of this approach are used often in practice, and we have approximation guarantees for some families of Laplacians [Gus78, Gua97, BGH+06], there are no known guarantees for general Laplacians to the best of our knowledge.
Preliminaries
A weighted multi-graph is not uniquely defined by its Laplacian, since the Laplacian only depends on the sum of the weights of the multi-edges on each edge. We want to establish a one-to-one correspondence between a weighted multi-graph and its Laplacian , so from now on, we will consider every Laplacian to be maintained explicitly as a sum of Laplacians of multi-edges, and we will maintain this multi-edge decomposition as part of our algorithms.
If is connected, then the kernel of is the span of the vector
Cholesky Factorization in Sum and Product Forms.
We now formally introduce Cholesky factorization. Rather than the usual perspective where we factor out lower triangular matrices at every step of the algorithm, we present an equivalent perspective where we subtract a rank one term from the current matrix, obtaining its Schur complement. The lower triangular structure follows from the fact that the matrix effectively become smaller at every step.
Let be any symmetric positive-semidefinite matrix. Let denote the column of Using the first equation in the system to eliminate the variable produces another system where is with replaced by 0, and
For computing the Cholesky factorization, we perform a sequence of such operations, where in the step, we select an index and eliminate the variable We define
If at some step , , then we define , and . Continuing until , will have at most one non-zero entry, which will be on the diagonal. We define and .
Let be the matrix with as its column, and be the diagonal matrix , then Define the permutation matrix by Letting we have This decomposition is known as Cholesky factorization. Crucially, is lower triangular, since and for we have .
Clique Structure of the Schur Complement.
For example, we denote the first column of by then We can write the Schur complement as It is immediate that is a Laplacian matrix, since . A more surprising (but well-known) fact is that
is also a Laplacian, and its edges form a clique on the neighbors of . It suffices to show it for We write to denote Then
Thus is a Laplacian since it is a sum of two Laplacians. By induction, for all is a Laplacian.
The SparseCholesky Algorithm
Algorithm 1 gives the pseudo-code for our algorithm SparseCholesky. Our main result, Theorem 3.1 (a more precise version of Theorem C.1), shows that the algorithm computes an approximate sparse Cholesky decomposition in nearly linear time. We assume the Real RAM model. We prove the theorem in Section 4.
The expected number of non-zero entries in is . The algorithm runs in expected time .
Algorithm 2 gives the pseudo-code for our CliqueSample algorithm.
The most significant obstacle to making Cholesky factorization of Laplacians efficient is the fill-in phenomenon, namely that each clique has roughly non-zero entries. To solve this problem, we develop a sampling procedure CliqueSample that produces a sparse Laplacian matrix which approximates the clique . As input, the procedure requires a Laplacian matrix , maintained as a sum of Laplacians of multi-edges, and a vertex . It then computes a sampled matrix that approximates . The elimination step in SparseCholesky removes the edges incident on , and only adds at most multi-edges. This means the total number of multi-edges does not increase with each elimination step, solving the fill-in problem. The sampling procedure is also very fast: It takes time, much faster than the order time required to even write down the clique .
Although it is notationally convenient for us to pass the whole matrix to CliqueSample, the procedure only relies on multi-edges incident on , so we will only pass these multi-edges.
Theorem 3.1 only provides guarantees only on the expected running time. In fact, if we make a small change to the algorithm, we can get running time w.h.p. At the elimination, instead of picking the vertex uniformly at random among the remaining vertices, we pick the vertex uniformly at random among the remaining vertices with at most twice the average multi-edge degree in . In Appendix B, we sketch a proof of this.
Analysis of the Algorithm using Matrix Concentration
In this section, we analyze the SparseCholesky algorithm, and prove Theorem 3.1. To prove the theorem, we need several intermediate results which we will now present. In Section 4.1, we show how the output the SparseCholesky and CliqueSample algorithms can be used to define a matrix martingale. In Section 4.2, we introduce a new type of martingale, called a bags-of-dice martingale, and a novel matrix concentration result for these martingales. In Section 4.3, we show how to apply our new matrix concentration results to the SparseCholesky martingale and prove Theorem 3.1. We defer proofs of the lemmas that characterize CliqueSample to Section 5, and proofs of our matrix concentration results to Section 6.
Throughout this section, we will study matrices that arise in the when using SparseCholesky to produce a sparse approximate Cholesky factorization of the Laplacian of a multi-graph . We will very frequently need to refer to matrices that are normalized by . We adopt the following notation: Given a symmetric matrix s.t. ,
We will only use this notation for matrices that satisfy the condition . Note that and iff Normalization is always done with respect to the Laplacian input to SparseCholesky. We say a multi-edge is -bounded if
Given a Laplacian that corresponds to a multi-graph , and a scalar , we say that is -bounded if every multi-edge of is -bounded. Since every multi-edge of is trivially -bounded, we can obtain a -bounded Laplacian that corresponds to the same matrix, by splitting each multi-edge into identical copies, with a fraction of the initial weight. The resulting Laplacian has at most multi-edges.
Our next lemma describes some basic properties of the samples output by CliqueSample. We prove the lemma in Section 5.
is or the Laplacian of a multi-edge with endpoints , where are neighbors of in .
, i.e. is -bounded w.r.t. .
The algorithm runs in time .
The lemma tells us that the samples in expectation behave like the clique , and that each sample is -bounded. This will be crucial to proving concentration properties of our algorithm. We will use the fact that the expectation of the CliqueSample algorithm output equals the matrix produced by standard Cholesky elimination, to show that in expectation, the sparse approximate Cholesky decomposition produced by our SparseCholesky algorithm equals the original Laplacian. We will also see how we can use this expected behaviour to represent our sampling process as a martingale. We define the approximate Laplacian as
Thus our final output equals . Note that Line (9) of the SparseCholesky algorithm does not introduce any sampling error, and so . The only significance of Line (9) is that it puts the matrix in the form we need for our factorization. Now
This is a martingale. To prove multiplicative concentration bounds, we need to normalize the martingale by , and so instead we consider
This martingale has considerable structure beyond a standard martingale. Conditional on the choices of the SparseCholesky algorithm until step , and conditional on , the terms are independent.
In Section 4.2 we define a type of martingale that formalizes the important aspects of this structure.
2 Bags-of-Dice Martingales and Matrix Concentration Results
We use the following shorthand notation: Given a sequence of random variables , for every we write
Extending this notation to conditional expectations, we write,
Note that is allowed to be random, as long as it is fixed conditional on and . The martingale given in Equation (6) is a bags-of-dice martingale, with , and . The name is motivated by a simple model: At each step of the martingale we pick a random bag of dice from a collection of bags (this corresponds to the outcome of ) and then we independently roll each die in the bag (corresponds to the outcomes ).
Suppose is a bags-of-dice martingale of matrices that satisfies
Every sample satisfies
We remark that this theorem, and all the results in this section extend immediately to Hermitian matrices. We prove the above theorem in Section 6. This result is based on the techniques introduced by Tropp [Tro12] for using Lieb’s Concavity Theorem to prove matrix concentration results. Tropp’s result improved on earlier work, such as Ahlswede and Winter [AW02] and Rudelson and Vershynin [RV07]. These earlier matrix concentration results required IID sampling, making them unsuitable for our purposes.
Unfortunately, we cannot apply Theorem 4.3 directly to the bags-of-dice martingale in Equation (6). As we will see later, the variance of can have norm proportional to , which can grow large.
However, we expect that the probability of growing large is very small. Our next construction allows us to leverage this idea, and avoid the small probability tail events that prevent us from directly applying Theorem 4.3 to the bags-of-dice martingale in Equation (6).
Given a bags-of-dice martingale of matrices , and a scalar , we define for each the event
We also define the -truncated martingale:
The truncated martingale is derived from another martingale by forcing the martingale to get “stuck” if it grows too large. This ensures that so long as the martingale is not stuck, it is not too large. On the other hand, as our next result shows, the truncated martingale fails more often than the original martingale, and so it suffices to prove concentration of the truncated martingale to prove concentration of the original martingale. The theorem stated below is proven in Section 6.
Given a bags-of-dice martingale of matrices , a scalar , the associated -truncated martingale is also a bags-of-dice martingale, and
3 Analyzing the SparseCholesky Algorithm Using Bags-of-Dice Martingales
By taking , and , we obtain a bags-of-dice martingale . Let denote the corresponding -truncated bags-of-dice martingale. The next lemma shows that is well-behaved. The lemma is proven in Section 5.
Proof of Theorem 3.1: We have . Since for all , , the statement is equivalent to . Further, , and so it is equivalent to . By Theorem 4.5, we have,
Thus, we can pick Finally, Lemma 4.1 also gives,
Thus, we can pick and
Similarly, we obtain concentration for with the same parameters. Thus, by Theorem 4.3,
Picking we get and
Combining this with Equation ( ‣ 4.3) establishes Equation (11).
Finally, we need to bound the expected running time of the algorithm. We start by observing that the algorithm maintains the two following invariants:
Every multi-edge in is -bounded.
The total number of multi-edges is at most .
We establish the first invariant inductively. The invariant holds for , because of the splitting of original edges into copies with weight . The invariant thus also holds for , since the multi-edges of this Laplacian are a subset of the previous ones. By Lemma 4.1, every multi-edge output by CliqueSample is -bounded, so is -bounded. If we apply this argument repeatedly for we get invariant (1).
Invariant (2) is also very simple to establish: It holds for , because splitting of original edges into copies does not produce more than multi-edges in total. When computing , we subtract , which removes exactly multi-edges, while we add the multi-edges produced by the call to , which is at most . So the number of multi-edges is not increasing.
Clique Sampling Proofs
In this section, we prove Lemmas 4.1 and 4.6 that characterize the behaviour of our algorithm CliqueSample, which is used in SparseCholesky to approximate the clique generated by eliminating a variable.
A important element of the CliqueSample algorithm is our very simple approach to leverage score estimation. Using the well-known result that effective resistance in Laplacians is a distance (see Lemma 5.2), we give a bound on the leverage scores of all edges in a clique that arises from elimination. We let
Note that the factor accounts for the fact that every pair is double counted.
Suppose multi-edges are -bounded w.r.t. , and have endpoints and respectively, and , then is -bounded.
To prove Lemma 5.1, we need the following result about Laplacians:
Given a connected weighted multi-graph with associated Laplacian matrix in , consider three distinct vertices , and the pair-vectors , and .
This is known as phenomenon that Effective Resistance is a distance [KR93].
Proof of Lemma 5.1: Using the previous lemma:
To prove Lemma 4.1, we need the following result of Walker [Wal77] (see Bringmann and Panagiotou [BP12] for a modern statement of the result).
The time required for each sample is .
We note that there are simpler sampling constructions than that of Lemma 5.3 that need time per sample, and using such a method would only worsen our running time by a factor .
Proof of Lemma 4.1: From Lines (5) and (6), is or the Laplacian of a multi-edge with endpoints . To upper bound the running time, it is important to note that we do not need access to the entire matrix . We only need the multi-edges incident on . When calling CliqueSample, we only pass a copy of just these multi-edges.
We observe that the uniform samples in Line (3) can be done in time each, provided we count the number of multi-edges incident on to find . We can compute in time. Using Lemma 5.3, if we do time preprocessing, we can compute each sample in Line (2) in time . Since we do samples, the total time for sampling is hence .
Now we determine the expected value of the sum of the samples. Note that in the sum below, each pair of multi-edges appears twice, with different weights.
Proof of Lemma 4.6: Throughout the proof of this lemma, all the random variables considered are conditional on the choices of the SparseCholesky algorithm up to and including step .
Now, is PSD since it is a Laplacian, so . By Equation (2), we get and by Equation (1) we get , finally by Equation (5) we get so
Again, using , we get
Matrix Concentration Analysis
To prove Theorem 4.3, we need the following lemma, which is the main technical result of this section.
Given a bags-of-dice martingale of matrices that is for all such that , we have,
Before proving this lemma, we will see how to use it to prove Theorem 4.3.
Proof of Theorem 4.3: Given Lemma 6.1, we can show Theorem 4.3 using the following bound via trace exponentials, which was first developed by Ahlswede and Winter [AW02].
To show Lemma 6.1 will need the following result by Tropp [Tro12], which is a corollary of Lieb’s Concavity Theorem [Lie73].
Given a random symmetric matrix , and a fixed symmetric matrix ,
We will use the following claim to control the above trace by an inductive argument.
For all and all such that , we have,
Before proving this claim, we see that it immediately implies Lemma 6.1.
Proof of Lemma 6.1: We chain the inequalities given by Claim 6.3 for to obtain,
where the last inequality follows from for symmetric and
We will also need the next two lemmas, which essentially appear in the work of Tropp [Tro12]. For completeness, we also prove these lemmas in Appendix A.
We also need the following well-known fact (see for example [Tro12]):
Given symmetric matrices s.t. , .
For all such that all and for all symmetric that are fixed given
Now, using Fact 6.6 which states that is monotone increasing with respect to the PSD order, we obtain the lemma.
Proof of Claim 6.3: Since using Lemma 6.7 with we obtain,
Now, using Fact 6.6, namely that is monotone increasing with respect to the PSD order, we obtain the lemma.
2 Truncating Bags-of-Dice Martingales
To prove this theorem, we use the next lemma, which we will prove later in this section:
Acknowledgements
We thank Daniel Spielman for suggesting this project and for helpful comments and discussions.
References
Appendix A Conditions for Bounding Matrix Moment Generating Functions
Proof of Lemma 6.4: We define . Note that . The function is positive and increasing in for all real . This means for every symmetric matrix , , and so for any symmetric matrix , . Thus
Proof of Lemma 6.5: We define . The function is positive and increasing in for all real . This means for every symmetric matrix , , and so for any symmetric matrix , . Also . Thus
The lemma now follows using the fact that is operator monotone (increasing), for all real and
Appendix B Obtaining Concentration of Running Time
As indicated in Remark 3.2, we can obtain a version of Theorem 3.1 that provides running time guarantees with high probability instead of in expectation, by making a slight change to the SparseCholesky algorithm. In this appendix, we briefly sketch how to prove this. We refer to the modified algorithm as LowDegreeSparseCholesky. The algorithm is requires only two small modifications: Firstly, instead of initially choosing a permutation at random, we choose the vertex to eliminate by sampling it uniformly at random amongst the remaining vertices that have degree at most twice the average multi-edge degree in . We can do this by keeping track of all vertex degrees, and sampling a remaining vertex at random, and resampling if the degree is too high, until we get a low degree vertex. Secondly, to make up for the slight reduction in the randomization of the choice of vertex, we double the value of used in Line 1.
The number of non-zero entries in is . With high probability the algorithm runs in time .
Appendix C Sparse Cholesky Factorization for bDD Matrices
In this appendix, we sketch a version of our approximate Cholesky factorization algorithm, that is also applicable to BDD matrices, which include the class of Connection Laplacian matrices. We call this algorithm BDDSparseCholesky. It follows closely the algorithmic structure used in [KLP+16]. Like [KLP+16], we need the input matrix to be non-singular, which we can achieve using the approach described in Claims 2.4 and 2.5 of [KLP+16].
Our algorithm replace their expander-based Schur complement approximation routine with our simple one-by-one vertex elimination, while still using their recursive subsampling based framework for estimation of leverage scores. The constants in this appendix are not optimized.
We study bDD matrices as defined in [KLP+16], with blocks, where is a constant (see their Section 1.1). The class of bDD matrices is Hermitian, rather than symmetric, but our notion of spectral approximation and our matrix concentration results extend immediately to Hermitian matrices. Throughout this section, we use to conjugate transpose. Our algorithm will still compute a Cholesky composition, except we do not factor the individual block matrices. We will sketch a proof of the following result:
where i.e., has a sparse Cholesky factorization.
We choose a fixed , but the algorithms can be adapted to produce approximate Cholesky decompositions with spectral approximation and running time dependence .
For BDD matrices, we do not know a result analogous to the fact that effective resistance is a distance in Laplacians (see Lemma 5.2). Instead, we use a result that is weaker by a factor : Given two bDD multi-edge matrices , and that are incident on vertex blocks and respectively, if we eliminate vertex block , this creates a multi-edge with BDD matrix satisfying
We will sketch how to modify the LowDegreeSparseCholesky algorithm to solve BDD matrices. We call this BDDSparseCholesky. This algorithm is similar to SparseCholesky and LowDegreeSparseCholesky, except that the number of multi-edges in the approximate factorization will be slowly increasing with each elimination and to counter this we will need to occasionally sparsify the matrices we produce. First we will assume an oracle procedure Sparsify, which we will later see how to construct using a boot-strapping approach that recursively calls BDDSparseCholesky on smaller matrices.
should be identical to LowDegreeSparseCholesky, except
The sampling rate is an explicit parameter to BDDSparseCholesky.
BDDSparseCholesky should not split the initial input multi-edge into smaller copies. This will be important because we use BDDSparseCholesky recursively, and we will only split edges at the top level.
We adapt the CliqueSample routine to sample from a bDD elimination clique, and we produce samples, and scale all samples by a factor .
After eliminating vertices, it calls to produce a sampled matrix of dimension , and then calls to recursively produce an approximate Cholesky decomposition of .
To compute an approximate Cholesky decomposition of , we set . Form by splitting each edge of into copies with of their initial weight. Call .
The clique sampling routine for bDD matrices uses more conservative sampling than CliqueSample , because we use the weaker Equation (12). The sparsification step then becomes necessary because our clique sampling routine now causes the total number of number of multi-edges to increase with each elimination. However, the increase will not exceed a factor , so after eliminations, the total number of multi-edges has not grown by more than .
We can use a truncated martingale to analyze the entire approximate Cholesky decomposition produced by BDDSparseCholesky and its recursive calls using Theorems 4.3 and 4.5 (these theorems extend to Hermitian matrices immediately). The calls to Sparsify will cause our bound on the martingale variance to grow larger, but only by a constant factor. By increasing by an appropriate constant, we still obtain concentration.
On the other hand, the calls to Sparsify will ensure that the time spent in the recursive calls to BDDSparseCholesky is only a constant fraction of the time spent in the initial call, assuming the total number of multi-edges in exceeds (if not, we can always split edges to achieve this). This corresponds to assuming that before the initial edge splitting at the start of the algorithm, we have at least edges.
This means the total time to compute the approximate Cholesky decomposition of using BDDSparseCholesky will only be , excluding calls to Sparsify. The decomposition will have non-zeros, and its approximate inverse can be applied in time.
We now sketch briefly how to implement the Sparsify routine. It closely resembles the Sparsify routine of [KLP+16] (see their Lemma H.1).
Implementing the sparsification routine.
uses the subsampling-based techniques of [CLM+15]. It is identical to the sparsification routine of [KLP+16], except the recursive call to a linear solver uses BDDSparseCholesky. The routine first samples each multi-edge with probability to produce a sparse matrix , and then uses Johnson-Lindenstrauss-based leverage score estimation (see [CLM+15]) to compute IID samples with the desired -bound w.r.t. . The IID samples are summed to give the output matrix . This requires approximately solving systems of linear equations in the sparse matrix . To do so, Sparsify first splits every edge of into copies (increasing the number of multi-edges by a factor ), then makes a single recursive call to , and then uses the resulting Cholesky decomposition times to compute approximate solutions to systems of linear equations. One issue requires some care: The subsampling guarantees provided by [CLM+15] are with respect to and not , however, by using a truncated martingale in our analysis, we can assume that .
Running time including sparsification.
Finally, if we take account of time spent on calls to Sparsify and its recursive calls to , we get a time recursion for BDDSparseCholesky of
(assuming initially for that ), which can be solved to give a running time bound of .
Fixing at the top level ensures that a union bound across all recursive calls in BDDSparseCholesky will give that the approximate Cholesky decomposition obtains a factor spectral approximation with high probability.
By applying the sparsification approach described in this section once, we can compute an approximate Cholesky decomposition of Laplacian matrices in time time w.h.p.
We run LowDegreeSparseCholesky with a modification: after eliminating all but vertices, we do sparsification on the remaining graph with multi-edges and vertices. The call to LowDegreeSparseCholesky until sparsification will take time w.h.p. The sparsification is done using a modified version of the Sparsify routine described above. Instead of sampling each multi-edge with probability , we use a probability of . The recursive linear solve in Sparsify can be done using unmodified LowDegreeSparseCholesky, and linear system solves for Johnson-Lindenstrauss leverage score estimation can be done using this decomposition, all in time. The output graph from Sparsify can be Cholesky decomposed using unmodified LowDegreeSparseCholesky as well, and this will take time . In total, we get a running time and number of non-zeros bounded by time w.h.p.