Twice-Ramanujan Sparsifiers

Joshua Batson, Daniel A. Spielman, Nikhil Srivastava

Introduction

where LGL_{G} and LHL_{H} are the Laplacian matrices of GG and HH. We recall that

where wu,vw_{u,v} is the weight of edge (u,v)(u,v) in GG. By considering vectors xx that are the characteristic vectors of sets, one can see that condition (1) is strictly stronger than the cut condition of Benczur and Karger.

In the case where GG is the complete graph, excellent spectral sparsifiers are supplied by Ramanujan Graphs . These are dd-regular graphs HH all of whose non-zero Laplacian eigenvalues lie between d2d1d-2\sqrt{d-1} and d+2d1d+2\sqrt{d-1}. Thus, if we take a Ramanujan graph on nn vertices and multiply the weight of every edge by n/(d2d1)n/(d-2\sqrt{d-1}), we obtain a graph that κ\kappa-approximates the complete graph, for

In this paper, we prove that every graph can be approximated at least this well Strictly speaking, our approximation constant is only better than the Ramanujan bound κ=d+2d1d2d1\kappa=\frac{d+2\sqrt{d-1}}{d-2\sqrt{d-1}} in the regime d1+52d\geq\frac{1+\sqrt{5}}{2}. This includes the actual Ramanujan graphs, for which dd is an integer greater than 22. by a graph with only twice as many edges as the Ramanujan graph (as a dd-regular graph has dn/2dn/2 edges).

Our proof provides a deterministic greedy algorithm for computing the graph HH in time O(dn3m)O(dn^{3}m).

We remark that while the edges of HH are a subset of the edges of GG, the weights of edges in HH and GG will typically be different. In fact, there exist unweighted graphs GG for which every good spectral sparsifier HH must contain edges of widely varying weights .

In the case that GG is a complete graph, our construction produces expanders. However, these expanders are slightly unusual in that their edges have weights, they may be irregular, and the weighted degrees of vertices can vary slightly. This may lead one to ask whether they should really be considered expanders. In Section 4 we argue that they should be.

As the graphs we produce are irregular and weighted, it is also not immediately clear that we should be comparing κ\kappa with the Ramanujan bound of

It is knownWhile lower bounds on the spectral gap of dd-regular graphs focus on showing that the second-smallest eigenvalue is asymptotically at most d2d1d-2\sqrt{d-1}, the same proofs by test functions can be used to show that the largest eigenvalue is at asymptotically least d+2d1d+2\sqrt{d-1}. that no dd-regular graph of uniform weight can κ\kappa-approximate a complete graph for κ\kappa asymptotically better than (2) . While we believe that no graph of average degree dd can be a κ\kappa-approximation of a complete graph for κ\kappa asymptotically better than (2), we are unable to show this at the moment and prove instead the weaker claim that no such graph can achieve κ\kappa less than

2 Prior Work

Spielman and Teng introduced the notion of sparsification that we consider, and proved that (1+ϵ)(1+\epsilon)-approximations with O~(n/ϵ2)\widetilde{{O}}\left(n/\epsilon^{2}\right) edges could be constructed in O~(m)\widetilde{{O}}\left(m\right) time. They used these sparsifiers to obtain a nearly-linear time algorithm for solving diagonally dominant systems of linear equations .

Spielman and Teng were inspired by the notion of sparsification introduced by Benczur and Karger for cut problems, which only required inequality (1) to hold for all x{0,1}Vx\in\left\{0,1\right\}^{V}. Benczur and Karger showed how to construct graphs HH meeting this guarantee with O(nlogn/ϵ2){{O}}\left(n\log n/\epsilon^{2}\right) edges in O(mlog3n){{O}}\left(m\log^{3}n\right) time; their cut sparsifiers have been used to obtain faster algorithms for cut problems .

Spielman and Srivastava proved the existence of spectral sparsifiers with O(nlogn/ϵ2){{O}}\left(n\log n/\epsilon^{2}\right) edges, and showed how to construct them in O~(m)\widetilde{{O}}\left(m\right) time. They conjectured that it should be possible to find such sparsifiers with only O(n/ϵ2){{O}}\left(n/\epsilon^{2}\right) edges. We affirmatively resolve this conjecture.

Recently, partial progress was made towards this conjecture by Goyal, Rademacher and Vempala , who showed how to find graphs HH with only 2n2n edges that O(logn){{O}}\left(\log n\right)-approximate bounded degree graphs GG under the cut notion of Benczur and Karger.

We remark that all of these constructions were randomized. Ours is the first deterministic algorithm to achieve the guarantees of any of these papers.

Preliminaries

Let G=(V,E,w)G=(V,E,w) be a connected weighted undirected graph with nn vertices and mm edges and edge weights we>0w_{e}>0. If we orient the edges of GG arbitrarily, we can write its Laplacian as L=BTWBL=B^{T}WB, where Bm×nB_{m\times n} is the signed edge-vertex incidence matrix, given by

and Wm×mW_{m\times m} is the diagonal matrix with W(e,e)=weW(e,e)=w_{e}. It is immediate that LL is positive semidefinite since:

and that GG is connected if and only if ker(L)=ker(W1/2B)=span(1)\ker(L)=\ker(W^{1/2}B)=\textrm{span}(\mathbf{1}).

2 The Pseudoinverse

Since LL is symmetric we can diagonalize it and write

where λ1,,λn1\lambda_{1},\ldots,\lambda_{n-1} are the nonzero eigenvalues of LL and u1,,un1u_{1},\ldots,u_{n-1} are a corresponding set of orthonormal eigenvectors. The Moore-Penrose Pseudoinverse of LL is then defined as

Notice that ker(L)=ker(L+)\ker(L)=\ker(L^{+}) and that

3 Formulas for Rank-one Updates

We use the following well-known theorem from linear algebra, which describes the behavior of the inverse of a matrix under rank-one updates (see [8, Section 2.1.3]).

If AA is a nonsingular n×nn\times n matrix and v\mathbf{v} is a vector, then

There is a related formula describing the change in the determinant of a matrix under the same update:

If AA is nonsingular and v\mathbf{v} is a vector, then

The Main Result

At the heart of this work is the following purely linear algebraic theorem. We use the notation ABA\preceq B to mean that BAB-A is positive semidefinite, and idS\mathbf{id}_{S} to denote the identity operator on a vector space SS.

Then there exist scalars si0s_{i}\geq 0 with {i:si0}dn|\{i:s_{i}\neq 0\}|\leq dn so that

The sparsification result for graphs follows quickly from this theorem as shown below.

which are indexed by the edges of GG and satisfy

By the Courant-Fischer Theorem, this is equivalent to:

The rest of this section is devoted to proving Theorem 3.1. The proof is constructive and yields a deterministic polynomial time algorithm for finding the scalars sis_{i}, which can then be used to sparsify graphs, as advertised.

Given vectors {vi}\{\mathbf{v}_{i}\}, our goal is to choose a small set of coefficients sis_{i} so that A=isiviviTA=\sum_{i}s_{i}\mathbf{v}_{i}\mathbf{v}_{i}^{T} is well-conditioned. We will build the matrix AA in steps, starting with A=0A=0 and adding one vector siviviTs_{i}\mathbf{v}_{i}\mathbf{v}_{i}^{T} at a time. Before beginning the proof, it will be instructive to study how the eigenvalues and characteristic polynomial of a matrix evolve upon the addition of a vector. This discussion should provide some intuition for the structure of the proof, and demystify the origin of the ‘Twice-Ramanujan’ number d+1+2dd+12d\frac{d+1+2\sqrt{d}}{d+1-2\sqrt{d}} which appears in our final result.

It is well known that the eigenvalues of A+vvTA+\mathbf{v}\mathbf{v}^{T} interlace those of AA. In fact, the new eigenvalues can be determined exactly by looking at the characteristic polynomial of A+vvTA+\mathbf{v}\mathbf{v}^{T}, which is computed using Lemma 2.2 as follows:

where λi\lambda_{i} are the eigenvalues of AA and uju_{j} are the corresponding eigenvectors. The polynomial pA+vvT(x)p_{A+\mathbf{v}\mathbf{v}^{T}}(x) has two kinds of zeros λ\lambda:

Those for which pA(λ)=0p_{A}(\lambda)=0. These are equal to the eigenvalues λj\lambda_{j} of AA for which the added vector v\mathbf{v} is orthogonal to the corresponding eigenvector uju_{j}, and which do not therefore ‘move’ upon adding vvT\mathbf{v}\mathbf{v}^{T}.

Those for which pA(λ)0p_{A}(\lambda)\neq 0 and

These are the eigenvalues which have moved and strictly interlace the old eigenvalues. The above equation immediately suggests a simple physical model which gives intuition as to where these new eigenvalues are located.

Physical Model. We interpret the eigenvalues λ\lambda as charged particles lying on a slope. On the slope are nn fixed, chargeless barriers located at the initial eigenvalues λj\lambda_{j}, and each particle is resting against one of the barriers under the influence of gravity. Adding the vector vvT\mathbf{v}\mathbf{v}^{T} corresponds to placing a charge of v,uj2\langle\mathbf{v},u_{j}\rangle^{2} on the barrier corresponding to λj\lambda_{j}. The charges on the barriers repel those on the eigenvalues with a force that is proportional to the charge on the barrier and inversely proportional to the distance from the barrier — i.e., the force from barrier jj is given by

a quantity which is positive for λj\lambda_{j} ‘below’ λ\lambda, which are pushing the partical ‘upward’, and negative otherwise. The eigenvalues move up the slope until they reach an equilibrium in which the repulsive forces from the barriers cancel the effect of gravity, which we take to be a +1+1 in the downward direction. Thus the equilibrium condition corresponds exactly to having the total ‘downward pull’ f(λ)f(\lambda) equal to zero.

With this physical model in mind, we begin to consider what happens to the eigenvalues of AA when we add a random vector from our set {vi}\{\mathbf{v}_{i}\}. The first observation is that for any eigenvector uju_{j} (in fact for any vector at all), the expected projection of a randomly chosen v{vi}im\mathbf{v}\in\{\mathbf{v}_{i}\}_{i\leq m} is

Of course, this does not mean that there is any single vector vi\mathbf{v}_{i} in our set that realizes this ‘expected behavior’ of equal projections on the eigenvectors. But if we were to add such a vector For concreteness, we remark that this ‘average’ vector would be precisely vavg=1mjuj.\mathbf{v}_{\textsf{avg}}=\frac{1}{\sqrt{m}}\sum_{j}u_{j}. in our physical model, we would add equal charges of 1/m1/m to each of the barriers, and we would expect all of the eigenvalues of AA to drift forward ‘steadily’. In fact, one might expect that after sufficiently many iterations of this process, the eigenvalues would all march forward together, with no eigenvalue too far ahead or too far behind, and we would end up in a position where λmax/λmin\lambda_{max}/\lambda_{min} is bounded.

In fact, this intuition turns out to be correct. Adding a vector with equal projections changes the characteristic polynomial in the following manner:

since pA(x)=jij(xλi)p_{A}^{\prime}(x)=\sum_{j}\prod_{i\neq j}(x-\lambda_{i}). If we start with A=0A=0, which has characteristic polynomial p0(x)=xnp_{0}(x)=x^{n}, then after kk iterations of this process we obtain the polynomial

where DD is the derivative with respect to xx. Fortunately, iterating the operator (IαD)(I-\alpha D) for any α>0\alpha>0 generates a standard family of orthogonal polynomials – the associated Laguerre polynomials . These polynomials are very well-studied and the locations of their zeros are known; in particular, after k=dnk=dn iterations the ratio of the largest to the smallest zero is known to be

To prove the theorem, we will show that we can choose a sequence of actual vectors that realizes the expected behavior (i.e. the behavior of repeatedly adding vavg\mathbf{v}_{\textsf{avg}}), as long as we are allowed to add arbitrary fractional amounts of the viviT\mathbf{v}_{i}\mathbf{v}_{i}^{T} via the weights si0s_{i}\geq 0. We will control the eigenvalues of our matrix by maintaining two barriers as in the physical model, and keeping the eigenvalues between them. The lower barrier will ‘repel’ the eigenvalues forward; the upper one will make sure they do not go too far. The barriers will move forward at a steady pace. By maintaining that the total ‘repulsion’ at every step of this process is bounded, we will be able to guarantee that there is always some multiple of a vector to add that allows us to continue the process.

2 Proof by Barrier Functions

We begin by defining two ‘barrier’ potential functions which measure the quality of the eigenvalues of a matrix. These potential functions are inspired by the inverse law of repulsion in the physical model discussed in the last section.

To prove the theorem, we will build the sum isiviviT\sum_{i}s_{i}\mathbf{v}_{i}\mathbf{v}_{i}^{T} iteratively, adding one vector at a time. Specifically, we will construct a sequence of matrices

along with positive constantsOn first reading the paper, we suggest the reader follow the proof with the assignment ϵU=ϵL=1\epsilon_{U}=\epsilon_{L}=1, u0=nu_{0}=n, l0=nl_{0}=-n, δU=2\delta_{U}=2, δL=1/3\delta_{L}=1/3. This will provide the bound (6d+1)/(d1)(6d+1)/(d-1), and eliminates the need to use Claim 3.6. u0,l0,δU,δL,ϵUu_{0},l_{0},\delta_{U},\delta_{L},\epsilon_{U} and ϵL\epsilon_{L} which satisfy the following conditions:

Initially, the barriers are at u=u0u=u_{0} and l=l0l=l_{0} and the potentials are

Each matrix is obtained by a rank-one update of the previous one — specifically by adding a positive multiple of an outer product of some vi\mathbf{v}_{i}.

If we increment the barriers uu and ll by δU\delta_{U} and δL\delta_{L} respectively at each step, then the upper and lower potentials do not increase. For every q=0,1,Qq=0,1,\ldots Q,

No eigenvalue ever jumps across a barrier. For every q=0,1,Qq=0,1,\ldots Q,

To complete the proof we will choose u0,l0,δU,δL,ϵUu_{0},l_{0},\delta_{U},\delta_{L},\epsilon_{U} and ϵL\epsilon_{L} so that after Q=dnQ=dn steps, the condition number of A(Q)A^{(Q)} is bounded by

By construction, A(Q)A^{(Q)} is a weighted sum of at most dndn of the vectors, as desired.

The main technical challenge is to show that conditions (b) and (c) can be satisfied simultaneously — i.e., that there is always a choice of vvT\mathbf{v}\mathbf{v}^{T} to add to the current matrix which allows us to shift both barriers up by a constant without increasing either potential. We achieve this in the following three lemmas.

The first lemma concerns shifting the upper barrier. If we shift uu forward to u+δUu+\delta_{U} without changing the matrix AA, then the upper potential Φu(A)\Phi^{u}(A) decreases since the eigenvalues λi\lambda_{i} do not move and uu moves away from them. This gives us room to add some multiple of a vector tvvTt\mathbf{v}\mathbf{v}^{T}, which will move the λi\lambda_{i} towards ll and increase the potential, counteracting the initial decrease due to shifting. The following lemma quantifies exactly how much of a given vvT\mathbf{v}\mathbf{v}^{T} we can add without increasing the potential beyond its original value before shifting.

That is, if we add tt times vvT\mathbf{v}\mathbf{v}^{T} to AA and shift the upper barrier by δU\delta_{U}, then we do not increase the upper potential.

We remark that UA(v)U_{A}(\mathbf{v}) is linear in the outer product vvT\mathbf{v}\mathbf{v}^{T}.

Let u=u+δUu^{\prime}=u+\delta_{U}. By the Sherman-Morrison formula, we can write the updated potential as:

The second lemma is about shifting the lower barrier. Here, shifting ll forward to l+δLl+\delta_{L} while keeping AA fixed has the opposite effect — it increases the lower potential Φl(A)\Phi_{l}(A) since the barrier ll moves towards the eigenvalues λi\lambda_{i}. Adding a multiple of a vector tvvTt\mathbf{v}\mathbf{v}^{T} will move the λi\lambda_{i} forward and away from the barrier, decreasing the potential. Here, we quantify exactly how much of a given vvT\mathbf{v}\mathbf{v}^{T} we need to add to compensate for the initial increase from shifting ll, and return the potential to its original value before the shift.

That is, if we add tt times vvT\mathbf{v}\mathbf{v}^{T} to AA and shift the lower barrier by δL\delta_{L}, then we do not increase the lower potential.

Now proceed as in the proof for the upper potential. Let l=l+δLl^{\prime}=l+\delta_{L}. By Sherman-Morrison, we have:

Rearranging shows that Φl+δL(A+tvvT)Φl(A)\Phi_{l+\delta_{L}}(A+t\mathbf{v}\mathbf{v}^{T})\leq\Phi_{l}(A) when 1/tLA(v)1/t\leq L_{A}(\mathbf{v}). ∎

The third lemma identifies the conditions under which we can find a single tvvTt\mathbf{v}\mathbf{v}^{T} which allows us to maintain both potentials while shifting barriers, and thereby continue the process. The proof that such a vector exists is by an averaging argument, so this can be seen as the step in which we relate the behavior of actual vectors to the behavior of the expected vector vavg\mathbf{v}_{\textsf{avg}}. Notice that the use of variable weights tt, from which the eventual sis_{i} arise, is crucial to this part of the proof.

then there exists an ii and positive tt for which

from which the claim will follow by Lemmas 3.3 and 3.4. We begin by bounding

If λi>l\lambda_{i}>l for all ii, 0i(λil)1ϵL0\leq\sum_{i}(\lambda_{i}-l)^{-1}\leq\epsilon_{L}, and 1/δLϵL01/\delta_{L}-\epsilon_{L}\geq 0, then

for every ii. So, the denominator of the left-most term on the left-hand side is positive, and the claimed inequality is equivalent to

which, by moving the first term on the RHS to the LHS, is just

All we need to do now is set ϵU,ϵL,δU\epsilon_{U},\epsilon_{L},\delta_{U}, and δL\delta_{L} in a manner that satisfies Lemma 3.5 and gives a good bound on the condition number. Then, we can take A(0)=0A^{(0)}=0 and construct A(q+1)A^{(q+1)} from A(q)A^{(q)} by choosing any vector vi\mathbf{v}_{i} with

(such a vector is guaranteed to exist by Lemma 3.5) and setting A(q+1)=A(q)+tviviTA^{(q+1)}=A^{(q)}+t\mathbf{v}_{i}\mathbf{v}_{i}^{T} for any t0t\geq 0 satisfying:

The initial potentials are ΦnϵU(0)=ϵU\Phi^{\frac{n}{\epsilon_{U}}}(0)=\epsilon_{U} and ΦnϵL(0)=ϵL\Phi_{\frac{n}{\epsilon_{L}}}(0)=\epsilon_{L}. After dndn steps, we have

To turn this proof into an algorithm, one must first compute the vectors vi\mathbf{v}_{i}, which can be done in time O(n2m){{O}}\left(n^{2}m\right). For each iteration of the algorithm, we must compute ((u+δU)IA)1((u+\delta_{U})I-A)^{-1}, ((u+δU)IA)2((u+\delta_{U})I-A)^{-2}, and the same matrices for the lower potential function. This computation can be performed in time O(n3){{O}}\left(n^{3}\right). Finally, we can decide which edge to add in each iteration by computing UA(vi)U_{A}(\mathbf{v}_{i}) and LA(vi)L_{A}(\mathbf{v}_{i}) for each edge, which can be done in time O(n2m){{O}}\left(n^{2}m\right). As we run for dndn iterations, the total time of the algorithm is O(dn3m){{O}}\left(dn^{3}m\right).

Sparsifiers of the Complete Graph

Let G=(V,E)G=(V,E) be the complete graph on nn vertices, and let H=(V,F,w)H=(V,F,w) be a weighted graph of average degree dd that (1+ϵ)(1+\epsilon)-approximates GG. As xTLGx=nx2x^{T}L_{G}x=n\left\|x\right\|^{2} for every xx orthogonal to 11, it is immediate that every vertex of HH has weighed degree between nn and (1+ϵ)n(1+\epsilon)n. Thus, one should think of HH as being an expander graph in which each edge weight has been multiplied by n/dn/d.

As HH is weighted and can be irregular, it may at first seem strange to view it as an expander. However, it may easily be shown to have the properties that define expanders: it has high edge-conductance, random walks mix rapidly on HH and converge to an almost-uniform distribution, and it satisfies the Expander Mixing Property (see or [10, Lemma 2.5]). High edge-conductance and rapid mixing would not be so interesting if the weighted degrees were not nearly uniform — for example, the star graph has both of these properties, but the random walk on the star graph converges to a very non-uniform distribution, and the star does not satisfy the Expander Mixing Property. For the convenience of the reader, we include a proof that HH has the Expander Mixing Property below.

Let LH=(V,E,w)L_{H}=(V,E,w) be a graph that (1+ϵ)(1+\epsilon)-approximates LGL_{G}, the complete graph on VV. Then, for every pair of disjoint sets SS and TT,

where w(S,T)w(S,T) denotes the sum of the weights of edges between SS and TT.

where MM is a matrix of norm at most (ϵ/2)LGnϵ/2(\epsilon/2)\left\|L_{G}\right\|\leq n\epsilon/2. Let xx be the characteristic vector of SS, and let yy be the characteristic vector of TT. We have

As GG is the complete graph and SS and TT are disjoint, we also know

Using the proof of the lower bound on the spectral gap of Alon and Boppana (see ) one can show that a dd-regular unweighted graph cannot κ\kappa-approximate a complete graph for κ\kappa asymptotically better than (2). We conjecture that this bound also holds for weighted graphs of average degree dd. Presently, we prove the following weaker result for such graphs.

Let GG be the complete graph on vertex set VV, and let H=(V,E,w)H=(V,E,w) be a weighted graph with nn vertices and a vertex of degree dd. If HH κ\kappa-approximates GG, then

We use a standard approach. Suppose HH is a κ\kappa-approximation of the complete graph. We will construct vectors xx^{*} and yy^{*} orthogonal to the 11 vector so that

is large, and this will give us a lower bound on κ\kappa.

Let v0v_{0} be the vertex of degree dd, and let its neighbors be v1,,vdv_{1},\dotsc,v_{d}. Suppose viv_{i} is connected to v0v_{0} by an edge of weight wiw_{i}, and the total weight of the edges between viv_{i} and vertices other than v0,v1,,vdv_{0},v_{1},\ldots,v_{d} is δi\delta_{i}. We begin by considering vectors xx and yy with

These vectors are not orthogonal to 11, but we will take care of that later. It is easy to compute the values taken by the quadratic form at xx and yy:

Since HH is a κ\kappa-approximation, all weighted degrees must lie between nn and nκn\kappa, which gives

Let xx^{*} and yy^{*} be the projections of xx and yy respectively orthogonal to the 11 vector. Then

Combining (5) and (6), we conclude that asymptotically:

But by our assumption the LHS is at most κ\kappa, so we have

Conclusion

We conclude by drawing a connection between Theorem 3.1 and an outstanding open problem in mathematics, the Kadison-Singer conjecture. This conjecture, which dates back to 1959, is equivalent to the well-known Paving Conjecture as well as to a stronger form of the restricted invertibility theorem of Bourgain and Tzafriri . The following formulation is due to Nik Weaver .

then there is a partition X1,XrX_{1},\ldots X_{r} of {1,,m}\{1,\ldots,m\} for which

Suppose we had a version of Theorem 3.1 which, assuming viδ\|\mathbf{v}_{i}\|\leq\delta, guaranteed that the scalars sis_{i} were all either or some constant β>0\beta>0, and gave a constant approximation factor κ<β\kappa<\beta. Then we would have

for S={i:si0}S=\{i:s_{i}\neq 0\}, yielding a proof of Conjecture 5.1 with r=2r=2 and ϵ=min{1κβ,1β}\epsilon=\min\{1-\frac{\kappa}{\beta},\frac{1}{\beta}\} since

As a special case, such a theorem would also imply the existence of unweighted sparsifiers for the complete graph and other (sufficiently dense) edge-transitive graphs. It is also worth noting that the viδ\|\mathbf{v}_{i}\|\leq\delta condition when applied to vectors {Πe}eE\{\Pi_{e}\}_{e\in E} arising from a graph simply means that the effective resistances of all edges are bounded; thus, we would be able to conclude that any graph with sufficiently small resistances can be split into two graphs that approximate it spectrally.

References