Twice-Ramanujan Sparsifiers

Joshua Batson, Daniel A. Spielman, Nikhil Srivastava

Introduction

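We say that $H$ is a $\kappa$-approximation of $G$ if for all $x\in\mathbb{R}^{V}$,

$$x^{T}L_{G}x\leq x^{T}L_{H}x\leq\kappa\cdot x^{T}L_{G}x,\tag{1}$$

where $L_{G}$ and $L_{H}$ are the Laplacian matrices of $G$ and $H$. We recall that

$$x^{T}L_{G}x=\sum_{(u,v)\in E}w_{u,v}(x_{u}-x_{v})^{2},$$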

where $w_{u,v}$ is the weight of edge $(u,v)$ in $G$ . By considering vectors $x$ that are the characteristic vectors of sets, one can see that condition (1) is strictly stronger than the cut condition of Benczur and Karger.

In the case where $G$ is the complete graph, excellent spectral sparsifiers are supplied by Ramanujan graphs. These are $d$-regular graphs $H$ all of whose non-zero Laplacian eigenvalues lie between $d-2\sqrt{d-1}$ and $d+2\sqrt{d-1}$. Thus, if we take a Ramanujan graph on $n$ vertices and multiply the weight of every edge by $n/(d-2\sqrt{d-1})$, we obtain a graph that $\kappa$-approximates the complete graph, for

$$\kappa=\frac{d+2\sqrt{d-1}}{d-2\sqrt{d-1}}.$$

In this paper, we prove that every graph can be approximated at least this well by a graph with only twice as many edges as the Ramanujan graph (as a $d$-regular graph has $dn/2$ edges). (Strictly speaking, our approximation constant is only better than the Ramanujan bound $\kappa=\frac{d+2\sqrt{d-1}}{d-2\sqrt{d-1}}$ in the regime $d\geq\frac{1+\sqrt{5}}{2}$. This includes the actual Ramanujan graphs, for which $d$ is an integer greater than $2$.)
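To get a feel for the size of the Ramanujan bound, the following short Python sketch evaluates $\kappa$ and the edge-weight scaling factor for a few degrees. The function names are ours and purely illustrative; the formulas are the ones stated above, and $d>2$ is assumed so that the denominator is positive.

```python
import math

def ramanujan_kappa(d):
    """Ratio (d + 2*sqrt(d-1)) / (d - 2*sqrt(d-1)) for a d-regular
    Ramanujan graph; assumes d > 2 so the denominator is positive."""
    r = 2.0 * math.sqrt(d - 1)
    return (d + r) / (d - r)

def ramanujan_edge_scale(n, d):
    """Factor n / (d - 2*sqrt(d-1)) by which every edge weight is multiplied."""
    return n / (d - 2.0 * math.sqrt(d - 1))

# The bound approaches 1 as the degree grows.
for d in (3, 10, 50):
    print(d, ramanujan_kappa(d))
```

For instance, $d=10$ gives $\kappa=(10+6)/(10-6)=4$, and the bound decreases towards $1$ as $d$ grows.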

Our proof provides a deterministic greedy algorithm for computing the graph $H$ in time $O(dn^{3}m)$ .

We remark that while the edges of $H$ are a subset of the edges of $G$ , the weights of edges in $H$ and $G$ will typically be different. In fact, there exist unweighted graphs $G$ for which every good spectral sparsifier $H$ must contain edges of widely varying weights .

In the case that $G$ is a complete graph, our construction produces expanders. However, these expanders are slightly unusual in that their edges have weights, they may be irregular, and the weighted degrees of vertices can vary slightly. This may lead one to ask whether they should really be considered expanders. In Section 4 we argue that they should be.

As the graphs we produce are irregular and weighted, it is also not immediately clear that we should be comparing $\kappa$ with the Ramanujan bound of

$$\frac{d+2\sqrt{d-1}}{d-2\sqrt{d-1}}=1+\frac{4}{\sqrt{d}}+O\left(\frac{1}{d}\right).\tag{2}$$

It is known that no $d$-regular graph of uniform weight can $\kappa$-approximate a complete graph for $\kappa$ asymptotically better than (2). (While lower bounds on the spectral gap of $d$-regular graphs focus on showing that the second-smallest eigenvalue is asymptotically at most $d-2\sqrt{d-1}$, the same proofs by test functions can be used to show that the largest eigenvalue is asymptotically at least $d+2\sqrt{d-1}$.) While we believe that no graph of average degree $d$ can be a $\kappa$-approximation of a complete graph for $\kappa$ asymptotically better than (2), we are unable to show this at the moment and prove instead the weaker claim that no such graph can achieve $\kappa$ less than

$$1+\frac{2}{\sqrt{d}}-O\left(\frac{\sqrt{d}}{n}\right).$$

2 Prior Work

Spielman and Teng introduced the notion of sparsification that we consider, and proved that $(1+\epsilon)$ -approximations with $\widetilde{{O}}\left(n/\epsilon^{2}\right)$ edges could be constructed in $\widetilde{{O}}\left(m\right)$ time. They used these sparsifiers to obtain a nearly-linear time algorithm for solving diagonally dominant systems of linear equations .

Spielman and Teng were inspired by the notion of sparsification introduced by Benczur and Karger for cut problems, which only required inequality (1) to hold for all $x\in\left\{0,1\right\}^{V}$ . Benczur and Karger showed how to construct graphs $H$ meeting this guarantee with ${{O}}\left(n\log n/\epsilon^{2}\right)$ edges in ${{O}}\left(m\log^{3}n\right)$ time; their cut sparsifiers have been used to obtain faster algorithms for cut problems .

Spielman and Srivastava proved the existence of spectral sparsifiers with ${{O}}\left(n\log n/\epsilon^{2}\right)$ edges, and showed how to construct them in $\widetilde{{O}}\left(m\right)$ time. They conjectured that it should be possible to find such sparsifiers with only ${{O}}\left(n/\epsilon^{2}\right)$ edges. We affirmatively resolve this conjecture.

Recently, partial progress was made towards this conjecture by Goyal, Rademacher and Vempala , who showed how to find graphs $H$ with only $2n$ edges that ${{O}}\left(\log n\right)$ -approximate bounded degree graphs $G$ under the cut notion of Benczur and Karger.

We remark that all of these constructions were randomized. Ours is the first deterministic algorithm to achieve the guarantees of any of these papers.

Preliminaries

Let $G=(V,E,w)$ be a connected weighted undirected graph with $n$ vertices and $m$ edges and edge weights $w_{e}>0$. If we orient the edges of $G$ arbitrarily, we can write its Laplacian as $L=B^{T}WB$, where $B_{m\times n}$ is the signed edge-vertex incidence matrix, given by

$$B(e,v)=\begin{cases}1&\text{if }v\text{ is the head of }e\\-1&\text{if }v\text{ is the tail of }e\\0&\text{otherwise}\end{cases}$$

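and $W_{m\times m}$ is the diagonal matrix with $W(e,e)=w_{e}$. It is immediate that $L$ is positive semidefinite since

$$x^{T}Lx=x^{T}B^{T}WBx=\left\|W^{1/2}Bx\right\|_{2}^{2}\geq 0\quad\text{for every }x\in\mathbb{R}^{n},$$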

and that $G$ is connected if and only if $\ker(L)=\ker(W^{1/2}B)=\textrm{span}(\mathbf{1})$ .
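The factorization $L=B^{T}WB$ can be checked on a toy example. The sketch below is illustrative only (the helper names and the three-vertex graph are ours): it assembles $L$ edge by edge from the rows of $B$ and compares the quadratic form $x^{T}Lx$ with the weighted sum of squared differences across edges.

```python
def laplacian(n, edges):
    """Return L = B^T W B for weighted edges given as (u, v, w) triples."""
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        # Row of B for this edge under an arbitrary orientation:
        # +1 at one endpoint, -1 at the other, 0 elsewhere.
        b = [0.0] * n
        b[u], b[v] = 1.0, -1.0
        for i in range(n):
            for j in range(n):
                L[i][j] += w * b[i] * b[j]
    return L

def quadratic_form(M, x):
    """Return x^T M x."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

# A weighted triangle on vertices 0, 1, 2.
edges = [(0, 1, 2.0), (1, 2, 1.0), (0, 2, 3.0)]
L = laplacian(3, edges)
x = [1.0, -2.0, 0.5]
lhs = quadratic_form(L, x)
rhs = sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges)
print(lhs, rhs)
```

Both quantities agree, and every row of $L$ sums to zero, reflecting $\mathbf{1}\in\ker(L)$.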

2 The Pseudoinverse

Since $L$ is symmetric we can diagonalize it and write

$$L=\sum_{i=1}^{n-1}\lambda_{i}u_{i}u_{i}^{T},$$

where $\lambda_{1},\ldots,\lambda_{n-1}$ are the nonzero eigenvalues of $L$ and $u_{1},\ldots,u_{n-1}$ are a corresponding set of orthonormal eigenvectors. The Moore-Penrose pseudoinverse of $L$ is then defined as

$$L^{+}=\sum_{i=1}^{n-1}\frac{1}{\lambda_{i}}u_{i}u_{i}^{T}.$$

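Notice that $\ker(L)=\ker(L^{+})$ and that

$$LL^{+}=L^{+}L=\sum_{i=1}^{n-1}u_{i}u_{i}^{T},$$

which is the orthogonal projection onto the span of the nonzero eigenvectors of $L$.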

3 Formulas for Rank-one Updates

We use the following well-known theorem from linear algebra, which describes the behavior of the inverse of a matrix under rank-one updates (see [8, Section 2.1.3]).

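Lemma 2.1 (Sherman-Morrison formula). If $A$ is a nonsingular $n\times n$ matrix and $\mathbf{v}$ is a vector, then

$$(A+\mathbf{v}\mathbf{v}^{T})^{-1}=A^{-1}-\frac{A^{-1}\mathbf{v}\mathbf{v}^{T}A^{-1}}{1+\mathbf{v}^{T}A^{-1}\mathbf{v}}.$$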

There is a related formula describing the change in the determinant of a matrix under the same update:

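Lemma 2.2 (Matrix determinant lemma). If $A$ is nonsingular and $\mathbf{v}$ is a vector, then

$$\det(A+\mathbf{v}\mathbf{v}^{T})=\det(A)\left(1+\mathbf{v}^{T}A^{-1}\mathbf{v}\right).$$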
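Both rank-one update identities can be verified exactly on a small example. The following sketch uses Python's `fractions` module so that no rounding is involved; the $2\times 2$ matrix, the vector and the helper names are arbitrary choices of ours, and the code assumes $A$ is symmetric (so that $\mathbf{v}^{T}A^{-1}$ is the transpose of $A^{-1}\mathbf{v}$).

```python
from fractions import Fraction as F

def det2(M):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = M
    return a * d - b * c

def inv2(M):
    """Inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = M
    det = det2(M)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[F(2), F(1)], [F(1), F(3)]]   # symmetric and nonsingular
v = [F(1), F(2)]

Ainv = inv2(A)
Av = [sum(Ainv[i][j] * v[j] for j in range(2)) for i in range(2)]  # A^{-1} v
vAv = sum(v[i] * Av[i] for i in range(2))                          # v^T A^{-1} v

# Right-hand side of the Sherman-Morrison formula.
sherman_morrison = [[Ainv[i][j] - Av[i] * Av[j] / (1 + vAv)
                     for j in range(2)] for i in range(2)]

# Direct computation on the updated matrix A + v v^T.
updated = [[A[i][j] + v[i] * v[j] for j in range(2)] for i in range(2)]

print(sherman_morrison == inv2(updated))
print(det2(updated) == det2(A) * (1 + vAv))
```

Here $\mathbf{v}^{T}A^{-1}\mathbf{v}=7/5$, so the determinant grows from $5$ to $5\cdot 12/5=12$.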

The Main Result

At the heart of this work is the following purely linear algebraic theorem. We use the notation $A\preceq B$ to mean that $B-A$ is positive semidefinite, and $\mathbf{id}_{S}$ to denote the identity operator on a vector space $S$ .

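Theorem 3.1. Suppose $d>1$ and $\mathbf{v}_{1},\ldots,\mathbf{v}_{m}$ are vectors in $\mathbb{R}^{n}$ with

$$\sum_{i\leq m}\mathbf{v}_{i}\mathbf{v}_{i}^{T}=\mathbf{id}_{\mathbb{R}^{n}}.$$

Then there exist scalars $s_{i}\geq 0$ with $|\{i:s_{i}\neq 0\}|\leq dn$ so that

$$\mathbf{id}_{\mathbb{R}^{n}}\preceq\sum_{i\leq m}s_{i}\mathbf{v}_{i}\mathbf{v}_{i}^{T}\preceq\left(\frac{d+1+2\sqrt{d}}{d+1-2\sqrt{d}}\right)\mathbf{id}_{\mathbb{R}^{n}}.$$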

The sparsification result for graphs follows quickly from this theorem as shown below.

which are indexed by the edges of $G$ and satisfy

By the Courant-Fischer Theorem, this is equivalent to:

The rest of this section is devoted to proving Theorem 3.1. The proof is constructive and yields a deterministic polynomial time algorithm for finding the scalars $s_{i}$ , which can then be used to sparsify graphs, as advertised.

Given vectors $\{\mathbf{v}_{i}\}$ , our goal is to choose a small set of coefficients $s_{i}$ so that $A=\sum_{i}s_{i}\mathbf{v}_{i}\mathbf{v}_{i}^{T}$ is well-conditioned. We will build the matrix $A$ in steps, starting with $A=0$ and adding one vector $s_{i}\mathbf{v}_{i}\mathbf{v}_{i}^{T}$ at a time. Before beginning the proof, it will be instructive to study how the eigenvalues and characteristic polynomial of a matrix evolve upon the addition of a vector. This discussion should provide some intuition for the structure of the proof, and demystify the origin of the ‘Twice-Ramanujan’ number $\frac{d+1+2\sqrt{d}}{d+1-2\sqrt{d}}$ which appears in our final result.

It is well known that the eigenvalues of $A+\mathbf{v}\mathbf{v}^{T}$ interlace those of $A$. In fact, the new eigenvalues can be determined exactly by looking at the characteristic polynomial of $A+\mathbf{v}\mathbf{v}^{T}$, which is computed using Lemma 2.2 as follows:

$$p_{A+\mathbf{v}\mathbf{v}^{T}}(x)=\det(xI-A-\mathbf{v}\mathbf{v}^{T})=\det(xI-A)\left(1-\mathbf{v}^{T}(xI-A)^{-1}\mathbf{v}\right)=p_{A}(x)\left(1-\sum_{j}\frac{\langle\mathbf{v},u_{j}\rangle^{2}}{x-\lambda_{j}}\right),$$

where $\lambda_{j}$ are the eigenvalues of $A$ and $u_{j}$ are the corresponding eigenvectors. The polynomial $p_{A+\mathbf{v}\mathbf{v}^{T}}(x)$ has two kinds of zeros $\lambda$:

Those for which $p_{A}(\lambda)=0$ . These are equal to the eigenvalues $\lambda_{j}$ of $A$ for which the added vector $\mathbf{v}$ is orthogonal to the corresponding eigenvector $u_{j}$ , and which do not therefore ‘move’ upon adding $\mathbf{v}\mathbf{v}^{T}$ .

Those for which $p_{A}(\lambda)\neq 0$ and

$$f(\lambda)=1-\sum_{j}\frac{\langle\mathbf{v},u_{j}\rangle^{2}}{\lambda-\lambda_{j}}=0.$$

These are the eigenvalues which have moved and strictly interlace the old eigenvalues. The above equation immediately suggests a simple physical model which gives intuition as to where these new eigenvalues are located.

Physical Model. We interpret the eigenvalues $\lambda$ as charged particles lying on a slope. On the slope are $n$ fixed, chargeless barriers located at the initial eigenvalues $\lambda_{j}$, and each particle is resting against one of the barriers under the influence of gravity. Adding the vector $\mathbf{v}\mathbf{v}^{T}$ corresponds to placing a charge of $\langle\mathbf{v},u_{j}\rangle^{2}$ on the barrier corresponding to $\lambda_{j}$. The charges on the barriers repel those on the eigenvalues with a force that is proportional to the charge on the barrier and inversely proportional to the distance from the barrier; that is, the force from barrier $j$ is given by

$$\frac{\langle\mathbf{v},u_{j}\rangle^{2}}{\lambda-\lambda_{j}},$$

a quantity which is positive for $\lambda_{j}$ ‘below’ $\lambda$, which are pushing the particle ‘upward’, and negative otherwise. The eigenvalues move up the slope until they reach an equilibrium in which the repulsive forces from the barriers cancel the effect of gravity, which we take to be a $+1$ in the downward direction. Thus the equilibrium condition corresponds exactly to having the total ‘downward pull’ $f(\lambda)$ equal to zero.
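The equilibrium picture can be made concrete numerically. The sketch below is an illustration of ours, not part of the proof: it takes $A$ diagonal, so that the eigenvectors are the standard basis vectors and $\langle\mathbf{v},u_{j}\rangle=v_{j}$, and locates the new eigenvalues by bisection on the total downward pull $1-\sum_{j}v_{j}^{2}/(\lambda-\lambda_{j})$. It assumes distinct, sorted old eigenvalues and nonzero $v_{j}$, so that every eigenvalue moves.

```python
def secular_roots(lams, v):
    """Eigenvalues of diag(lams) + v v^T, found by bisection on
    f(x) = 1 - sum_j v_j^2 / (x - lam_j).
    Assumes lams is strictly increasing and every v_j is nonzero."""
    def f(x):
        return 1.0 - sum(vj * vj / (x - lj) for vj, lj in zip(v, lams))

    norm2 = sum(vj * vj for vj in v)
    # One moved eigenvalue lies in each gap (lam_j, lam_{j+1}); the top one
    # lies in (lam_n, lam_n + |v|^2].
    upper_ends = list(lams[1:]) + [lams[-1] + norm2]
    roots = []
    for lo, hi in zip(lams, upper_ends):
        while True:
            mid = 0.5 * (lo + hi)
            if mid <= lo or mid >= hi:   # interval has shrunk to adjacent floats
                break
            if f(mid) < 0.0:             # repulsion exceeds gravity: move up
                lo = mid
            else:
                hi = mid
        roots.append(hi)
    return roots

old = [1.0, 2.0, 4.0]
v = [1.0, 1.0, 1.0]
new = secular_roots(old, v)
print(new)
```

The computed eigenvalues strictly interlace the old ones, and their sum equals $\mathrm{Tr}(A)+\|\mathbf{v}\|^{2}$, as it must.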

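With this physical model in mind, we begin to consider what happens to the eigenvalues of $A$ when we add a random vector from our set $\{\mathbf{v}_{i}\}$. The first observation is that for any eigenvector $u_{j}$ (in fact for any unit vector at all), the expected squared projection of a randomly chosen $\mathbf{v}\in\{\mathbf{v}_{i}\}_{i\leq m}$ is

$$\mathbb{E}_{\mathbf{v}}\langle\mathbf{v},u_{j}\rangle^{2}=\frac{1}{m}\sum_{i}\langle\mathbf{v}_{i},u_{j}\rangle^{2}=\frac{1}{m}u_{j}^{T}\left(\sum_{i}\mathbf{v}_{i}\mathbf{v}_{i}^{T}\right)u_{j}=\frac{\left\|u_{j}\right\|^{2}}{m}=\frac{1}{m},$$

where we have used the assumption $\sum_{i}\mathbf{v}_{i}\mathbf{v}_{i}^{T}=\mathbf{id}$.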

Of course, this does not mean that there is any single vector $\mathbf{v}_{i}$ in our set that realizes this ‘expected behavior’ of equal projections on the eigenvectors. But if we were to add such a vector in our physical model, we would add equal charges of $1/m$ to each of the barriers, and we would expect all of the eigenvalues of $A$ to drift forward ‘steadily’. (For concreteness, we remark that this ‘average’ vector would be precisely $\mathbf{v}_{\textsf{avg}}=\frac{1}{\sqrt{m}}\sum_{j}u_{j}$.) In fact, one might expect that after sufficiently many iterations of this process, the eigenvalues would all march forward together, with no eigenvalue too far ahead or too far behind, and we would end up in a position where $\lambda_{max}/\lambda_{min}$ is bounded.

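In fact, this intuition turns out to be correct. Adding a vector with equal projections changes the characteristic polynomial in the following manner:

$$p_{A+\mathbf{v}_{\textsf{avg}}\mathbf{v}_{\textsf{avg}}^{T}}(x)=p_{A}(x)\left(1-\sum_{j}\frac{1/m}{x-\lambda_{j}}\right)=p_{A}(x)-\frac{1}{m}p_{A}^{\prime}(x),$$

since $p_{A}^{\prime}(x)=\sum_{j}\prod_{i\neq j}(x-\lambda_{i})$. If we start with $A=0$, which has characteristic polynomial $p_{0}(x)=x^{n}$, then after $k$ iterations of this process we obtain the polynomial

$$p_{k}(x)=\left(I-\frac{1}{m}D\right)^{k}x^{n},$$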

where $D$ is the derivative with respect to $x$. Fortunately, iterating the operator $(I-\alpha D)$ for any $\alpha>0$ generates a standard family of orthogonal polynomials: the associated Laguerre polynomials. These polynomials are very well-studied and the locations of their zeros are known; in particular, after $k=dn$ iterations the ratio of the largest to the smallest zero is known to be, asymptotically as $n\to\infty$,

$$\frac{d+1+2\sqrt{d}}{d+1-2\sqrt{d}},$$

which is exactly the ‘Twice-Ramanujan’ number.
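The operator iteration is easy to carry out exactly on coefficient lists. The following sketch (our own illustration, with hypothetical helper names) applies $(I-\alpha D)$ repeatedly to $x^{n}$ using rational arithmetic. For $n=2$ the result can be checked by hand: $(I-\alpha D)^{k}x^{2}=x^{2}-2k\alpha x+k(k-1)\alpha^{2}$, whose zeros are $\alpha(k\pm\sqrt{k})$.

```python
from fractions import Fraction

def apply_step(coeffs, alpha):
    """Apply (I - alpha*D) to the polynomial sum_i coeffs[i] * x**i."""
    # Coefficients of the derivative, padded to the same length.
    deriv = [i * c for i, c in enumerate(coeffs)][1:] + [0]
    return [c - alpha * dc for c, dc in zip(coeffs, deriv)]

def iterate(n, k, alpha):
    """Return the coefficients of (I - alpha*D)^k x^n, lowest degree first."""
    p = [Fraction(0)] * n + [Fraction(1)]   # the polynomial x**n
    for _ in range(k):
        p = apply_step(p, alpha)
    return p

alpha = Fraction(1, 3)
p = iterate(n=2, k=4, alpha=alpha)
print(p)   # coefficients of x^2 - (8/3) x + 4/3
```

In this small case the zeros are $2/3$ and $2$, so their ratio is $3=(\sqrt{k}+1)/(\sqrt{k}-1)$ with $k=4$. The twice-Ramanujan ratio itself only emerges in the limit of large $n$.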

To prove the theorem, we will show that we can choose a sequence of actual vectors that realizes the expected behavior (i.e. the behavior of repeatedly adding $\mathbf{v}_{\textsf{avg}}$ ), as long as we are allowed to add arbitrary fractional amounts of the $\mathbf{v}_{i}\mathbf{v}_{i}^{T}$ via the weights $s_{i}\geq 0$ . We will control the eigenvalues of our matrix by maintaining two barriers as in the physical model, and keeping the eigenvalues between them. The lower barrier will ‘repel’ the eigenvalues forward; the upper one will make sure they do not go too far. The barriers will move forward at a steady pace. By maintaining that the total ‘repulsion’ at every step of this process is bounded, we will be able to guarantee that there is always some multiple of a vector to add that allows us to continue the process.

2 Proof by Barrier Functions

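We begin by defining two ‘barrier’ potential functions which measure the quality of the eigenvalues of a matrix. These potential functions are inspired by the inverse law of repulsion in the physical model discussed in the last section. For $u,l\in\mathbb{R}$ and a symmetric matrix $A$ with eigenvalues $\lambda_{1},\ldots,\lambda_{n}$, define the upper and lower potentials

$$\Phi^{u}(A)=\mathrm{Tr}(uI-A)^{-1}=\sum_{i}\frac{1}{u-\lambda_{i}}\quad\text{and}\quad\Phi_{l}(A)=\mathrm{Tr}(A-lI)^{-1}=\sum_{i}\frac{1}{\lambda_{i}-l}.$$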

To prove the theorem, we will build the sum $\sum_{i}s_{i}\mathbf{v}_{i}\mathbf{v}_{i}^{T}$ iteratively, adding one vector at a time. Specifically, we will construct a sequence of matrices

$$0=A^{(0)},A^{(1)},\ldots,A^{(Q)}$$

along with positive constants $u_{0},l_{0},\delta_{U},\delta_{L},\epsilon_{U}$ and $\epsilon_{L}$ which satisfy the conditions listed below. (On first reading the paper, we suggest the reader follow the proof with the assignment $\epsilon_{U}=\epsilon_{L}=1$, $u_{0}=n$, $l_{0}=-n$, $\delta_{U}=2$, $\delta_{L}=1/3$. This will provide the bound $(6d+1)/(d-1)$, and eliminates the need to use Claim 3.6.)

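(a) Initially, the barriers are at $u=u_{0}$ and $l=l_{0}$ and the potentials are

$$\Phi^{u_{0}}(A^{(0)})=\epsilon_{U}\quad\text{and}\quad\Phi_{l_{0}}(A^{(0)})=\epsilon_{L}.$$

(b) Each matrix is obtained by a rank-one update of the previous one; specifically, by adding a positive multiple of an outer product of some $\mathbf{v}_{i}$:

$$A^{(q+1)}=A^{(q)}+t\mathbf{v}\mathbf{v}^{T}\quad\text{for some }\mathbf{v}\in\{\mathbf{v}_{i}\}\text{ and }t>0.$$

(c) If we increment the barriers $u$ and $l$ by $\delta_{U}$ and $\delta_{L}$ respectively at each step, then the upper and lower potentials do not increase. For every $q=0,1,\ldots Q$,

$$\Phi^{u+\delta_{U}}(A^{(q+1)})\leq\Phi^{u}(A^{(q)})\leq\epsilon_{U}\quad\text{for }u=u_{0}+q\delta_{U},$$

$$\Phi_{l+\delta_{L}}(A^{(q+1)})\leq\Phi_{l}(A^{(q)})\leq\epsilon_{L}\quad\text{for }l=l_{0}+q\delta_{L}.$$

(d) No eigenvalue ever jumps across a barrier. For every $q=0,1,\ldots Q$,

$$\lambda_{\max}(A^{(q)})<u_{0}+q\delta_{U}\quad\text{and}\quad\lambda_{\min}(A^{(q)})>l_{0}+q\delta_{L}.$$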

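To complete the proof we will choose $u_{0},l_{0},\delta_{U},\delta_{L},\epsilon_{U}$ and $\epsilon_{L}$ so that after $Q=dn$ steps, the condition number of $A^{(Q)}$ is bounded by

$$\frac{\lambda_{\max}(A^{(Q)})}{\lambda_{\min}(A^{(Q)})}\leq\frac{u_{0}+dn\delta_{U}}{l_{0}+dn\delta_{L}}=\frac{d+1+2\sqrt{d}}{d+1-2\sqrt{d}}.$$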

By construction, $A^{(Q)}$ is a weighted sum of at most $dn$ of the vectors, as desired.

The main technical challenge is to show that conditions (b) and (c) can be satisfied simultaneously — i.e., that there is always a choice of $\mathbf{v}\mathbf{v}^{T}$ to add to the current matrix which allows us to shift both barriers up by a constant without increasing either potential. We achieve this in the following three lemmas.

The first lemma concerns shifting the upper barrier. If we shift $u$ forward to $u+\delta_{U}$ without changing the matrix $A$, then the upper potential $\Phi^{u}(A)$ decreases since the eigenvalues $\lambda_{i}$ do not move and $u$ moves away from them. This gives us room to add some multiple of a vector $t\mathbf{v}\mathbf{v}^{T}$, which will move the $\lambda_{i}$ towards $u$ and increase the potential, counteracting the initial decrease due to shifting. The following lemma quantifies exactly how much of a given $\mathbf{v}\mathbf{v}^{T}$ we can add without increasing the potential beyond its original value before shifting.

That is, if we add $t$ times $\mathbf{v}\mathbf{v}^{T}$ to $A$ and shift the upper barrier by $\delta_{U}$ , then we do not increase the upper potential.

We remark that $U_{A}(\mathbf{v})$ is linear in the outer product $\mathbf{v}\mathbf{v}^{T}$ .

Let $u^{\prime}=u+\delta_{U}$ . By the Sherman-Morrison formula, we can write the updated potential as:

The second lemma is about shifting the lower barrier. Here, shifting $l$ forward to $l+\delta_{L}$ while keeping $A$ fixed has the opposite effect — it increases the lower potential $\Phi_{l}(A)$ since the barrier $l$ moves towards the eigenvalues $\lambda_{i}$ . Adding a multiple of a vector $t\mathbf{v}\mathbf{v}^{T}$ will move the $\lambda_{i}$ forward and away from the barrier, decreasing the potential. Here, we quantify exactly how much of a given $\mathbf{v}\mathbf{v}^{T}$ we need to add to compensate for the initial increase from shifting $l$ , and return the potential to its original value before the shift.

That is, if we add $t$ times $\mathbf{v}\mathbf{v}^{T}$ to $A$ and shift the lower barrier by $\delta_{L}$ , then we do not increase the lower potential.

Now proceed as in the proof for the upper potential. Let $l^{\prime}=l+\delta_{L}$ . By Sherman-Morrison, we have:

Rearranging shows that $\Phi_{l+\delta_{L}}(A+t\mathbf{v}\mathbf{v}^{T})\leq\Phi_{l}(A)$ when $1/t\leq L_{A}(\mathbf{v})$ . ∎

The third lemma identifies the conditions under which we can find a single $t\mathbf{v}\mathbf{v}^{T}$ which allows us to maintain both potentials while shifting barriers, and thereby continue the process. The proof that such a vector exists is by an averaging argument, so this can be seen as the step in which we relate the behavior of actual vectors to the behavior of the expected vector $\mathbf{v}_{\textsf{avg}}$ . Notice that the use of variable weights $t$ , from which the eventual $s_{i}$ arise, is crucial to this part of the proof.

then there exists an $i$ and positive $t$ for which

from which the claim will follow by Lemmas 3.3 and 3.4. We begin by bounding

If $\lambda_{i}>l$ for all $i$ , $0\leq\sum_{i}(\lambda_{i}-l)^{-1}\leq\epsilon_{L}$ , and $1/\delta_{L}-\epsilon_{L}\geq 0$ , then

for every $i$ . So, the denominator of the left-most term on the left-hand side is positive, and the claimed inequality is equivalent to

which, by moving the first term on the RHS to the LHS, is just

All we need to do now is set $\epsilon_{U},\epsilon_{L},\delta_{U}$, and $\delta_{L}$ in a manner that satisfies Lemma 3.5 and gives a good bound on the condition number. Then, we can take $A^{(0)}=0$ and construct $A^{(q+1)}$ from $A^{(q)}$ by choosing any vector $\mathbf{v}_{i}$ with

$$L_{A^{(q)}}(\mathbf{v}_{i})\geq U_{A^{(q)}}(\mathbf{v}_{i})$$

(such a vector is guaranteed to exist by Lemma 3.5) and setting $A^{(q+1)}=A^{(q)}+t\mathbf{v}_{i}\mathbf{v}_{i}^{T}$ for any $t\geq 0$ satisfying:

$$L_{A^{(q)}}(\mathbf{v}_{i})\geq\frac{1}{t}\geq U_{A^{(q)}}(\mathbf{v}_{i}).$$

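The initial potentials are $\Phi^{\frac{n}{\epsilon_{U}}}(0)=\epsilon_{U}$ and $\Phi_{-\frac{n}{\epsilon_{L}}}(0)=\epsilon_{L}$; that is, we start the barriers at $u_{0}=n/\epsilon_{U}$ and $l_{0}=-n/\epsilon_{L}$. After $dn$ steps, we have

$$\frac{\lambda_{\max}(A^{(dn)})}{\lambda_{\min}(A^{(dn)})}\leq\frac{n/\epsilon_{U}+dn\delta_{U}}{-n/\epsilon_{L}+dn\delta_{L}}.$$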
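As a worked illustration of how the final constant arises, the following derivation evaluates the ratio of the two barriers after $dn$ steps, with starting positions $u_{0}=n/\epsilon_{U}$ and $l_{0}=-n/\epsilon_{L}$, for one particular parameter choice. The specific values of $\delta_{L},\epsilon_{L},\delta_{U},\epsilon_{U}$ below do not appear in the text above; we recall them from the standard presentation of this argument and treat them here as an assumption.

```latex
% Assumed parameter choice (recalled, not stated in the text above):
\[
\delta_L = 1,\qquad \epsilon_L = \frac{1}{\sqrt{d}},\qquad
\delta_U = \frac{\sqrt{d}+1}{\sqrt{d}-1},\qquad
\epsilon_U = \frac{\sqrt{d}-1}{d+\sqrt{d}} .
\]
% Positions of the two barriers after dn steps:
\begin{align*}
\frac{n}{\epsilon_U} + dn\,\delta_U
  &= n\,\frac{(d+\sqrt{d}) + d(\sqrt{d}+1)}{\sqrt{d}-1}
   = n\,\frac{\sqrt{d}\,(\sqrt{d}+1)^2}{\sqrt{d}-1},\\
-\frac{n}{\epsilon_L} + dn\,\delta_L
  &= -n\sqrt{d} + dn
   = n\,\sqrt{d}\,(\sqrt{d}-1).
\end{align*}
% Their ratio bounds the condition number:
\[
\frac{n/\epsilon_U + dn\,\delta_U}{-n/\epsilon_L + dn\,\delta_L}
  = \frac{(\sqrt{d}+1)^2}{(\sqrt{d}-1)^2}
  = \frac{d+1+2\sqrt{d}}{d+1-2\sqrt{d}} .
\]
```

With this choice one also has $1/\delta_{U}+\epsilon_{U}=1-1/\sqrt{d}=1/\delta_{L}-\epsilon_{L}$, so the two barriers advance at exactly balanced rates.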

To turn this proof into an algorithm, one must first compute the vectors $\mathbf{v}_{i}$ , which can be done in time ${{O}}\left(n^{2}m\right)$ . For each iteration of the algorithm, we must compute $((u+\delta_{U})I-A)^{-1}$ , $((u+\delta_{U})I-A)^{-2}$ , and the same matrices for the lower potential function. This computation can be performed in time ${{O}}\left(n^{3}\right)$ . Finally, we can decide which edge to add in each iteration by computing $U_{A}(\mathbf{v}_{i})$ and $L_{A}(\mathbf{v}_{i})$ for each edge, which can be done in time ${{O}}\left(n^{2}m\right)$ . As we run for $dn$ iterations, the total time of the algorithm is ${{O}}\left(dn^{3}m\right)$ .
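A minimal sketch of the resulting greedy procedure is given below, under several assumptions that we state explicitly. The expressions used for $U_{A}(\mathbf{v})$ and $L_{A}(\mathbf{v})$ and the parameter values are recalled from the standard statement of the barrier lemmas rather than taken from the text above. The input vectors are assumed to satisfy $\sum_{i}\mathbf{v}_{i}\mathbf{v}_{i}^{T}=\mathbf{id}$. The helper names and the dense Gauss-Jordan inverse are ours, chosen for self-containedness rather than speed. At each step the sketch picks the vector maximizing $L_{A}-U_{A}$ and sets $1/t$ to the midpoint of the admissible interval.

```python
import math

def resolvent(A, c):
    """Return (c*I - A)^{-1} by Gauss-Jordan elimination (small dense A)."""
    n = len(A)
    aug = [[(c if i == j else 0.0) - A[i][j] for j in range(n)]
           + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def square(M):
    n = len(M)
    return [[sum(M[i][k] * M[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def quad(M, v):
    """Return v^T M v."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

def bss_weights(vectors, d):
    """Greedy barrier method: returns weights s_i for the given vectors."""
    n = len(vectors[0])
    sd = math.sqrt(d)
    delta_L, eps_L = 1.0, 1.0 / sd                      # assumed parameters
    delta_U, eps_U = (sd + 1) / (sd - 1), (sd - 1) / (d + sd)
    u, l = n / eps_U, -n / eps_L                        # initial barriers
    A = [[0.0] * n for _ in range(n)]
    s = [0.0] * len(vectors)
    for _ in range(math.ceil(d * n)):
        Ru = resolvent(A, u + delta_U)                  # ((u+dU) I - A)^{-1}
        Rl = resolvent(A, l + delta_L)                  # = -(A - (l+dL) I)^{-1}
        gap_U = trace(resolvent(A, u)) - trace(Ru)      # Phi^u - Phi^{u+dU}
        gap_L = trace(resolvent(A, l)) - trace(Rl)      # Phi_{l+dL} - Phi_l
        Ru2, Rl2 = square(Ru), square(Rl)
        best = None
        for i, v in enumerate(vectors):
            U = quad(Ru2, v) / gap_U + quad(Ru, v)      # U_A(v)
            L = quad(Rl2, v) / gap_L + quad(Rl, v)      # L_A(v)
            if best is None or L - U > best[0]:
                best = (L - U, i, 0.5 * (L + U))
        _, i, inv_t = best
        t = 1.0 / inv_t                                 # L_A >= 1/t >= U_A
        v = vectors[i]
        for a in range(n):
            for b in range(n):
                A[a][b] += t * v[a] * v[b]
        s[i] += t
        u, l = u + delta_U, l + delta_L
    return s

# Example: three unit directions at 120 degrees in the plane, scaled so
# that the outer products sum to the identity.
c = math.sqrt(2.0 / 3.0)
frame = [(0.0, c),
         (-c * math.sqrt(3) / 2, -c / 2),
         (c * math.sqrt(3) / 2, -c / 2)]
weights = bss_weights(frame, d=4)
print(weights)
```

With $d=4$ and $n=2$ the procedure runs for $8$ steps, and the barrier argument bounds the condition number of $\sum_{i}s_{i}\mathbf{v}_{i}\mathbf{v}_{i}^{T}$ by $(d+1+2\sqrt{d})/(d+1-2\sqrt{d})=9$. Recomputing all four resolvents from scratch in every iteration mirrors the cost accounting above; it is not meant as an efficient implementation.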

Sparsifiers of the Complete Graph

Let $G=(V,E)$ be the complete graph on $n$ vertices, and let $H=(V,F,w)$ be a weighted graph of average degree $d$ that $(1+\epsilon)$-approximates $G$. As $x^{T}L_{G}x=n\left\|x\right\|^{2}$ for every $x$ orthogonal to $\mathbf{1}$, it is immediate that every vertex of $H$ has weighted degree between $n$ and $(1+\epsilon)n$. Thus, one should think of $H$ as being an expander graph in which each edge weight has been multiplied by $n/d$.

As $H$ is weighted and can be irregular, it may at first seem strange to view it as an expander. However, it may easily be shown to have the properties that define expanders: it has high edge-conductance, random walks mix rapidly on $H$ and converge to an almost-uniform distribution, and it satisfies the Expander Mixing Property (see or [10, Lemma 2.5]). High edge-conductance and rapid mixing would not be so interesting if the weighted degrees were not nearly uniform — for example, the star graph has both of these properties, but the random walk on the star graph converges to a very non-uniform distribution, and the star does not satisfy the Expander Mixing Property. For the convenience of the reader, we include a proof that $H$ has the Expander Mixing Property below.

Let $H=(V,E,w)$ be a graph that $(1+\epsilon)$-approximates $G$, the complete graph on $V$. Then, for every pair of disjoint sets $S$ and $T$,

$$\left|w(S,T)-\left(1+\frac{\epsilon}{2}\right)|S||T|\right|\leq\frac{n\epsilon}{2}\sqrt{|S||T|},$$

where $w(S,T)$ denotes the sum of the weights of edges between $S$ and $T$ .

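Since $L_{G}\preceq L_{H}\preceq(1+\epsilon)L_{G}$, we can write

$$L_{H}=\left(1+\frac{\epsilon}{2}\right)L_{G}+M,$$

where $M$ is a matrix of norm at most $(\epsilon/2)\left\|L_{G}\right\|\leq n\epsilon/2$. Let $x$ be the characteristic vector of $S$, and let $y$ be the characteristic vector of $T$. We have

$$-w(S,T)=x^{T}L_{H}y=\left(1+\frac{\epsilon}{2}\right)x^{T}L_{G}y+x^{T}My.$$

As $G$ is the complete graph and $S$ and $T$ are disjoint, we also know

$$x^{T}L_{G}y=-|S||T|.$$

The claim now follows from $\left|x^{T}My\right|\leq\left\|M\right\|\left\|x\right\|\left\|y\right\|\leq(n\epsilon/2)\sqrt{|S||T|}$.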

Using the proof of the lower bound on the spectral gap of Alon and Boppana (see ) one can show that a $d$ -regular unweighted graph cannot $\kappa$ -approximate a complete graph for $\kappa$ asymptotically better than (2). We conjecture that this bound also holds for weighted graphs of average degree $d$ . Presently, we prove the following weaker result for such graphs.

Let $G$ be the complete graph on vertex set $V$, and let $H=(V,E,w)$ be a weighted graph with $n$ vertices and a vertex of degree $d$. If $H$ $\kappa$-approximates $G$, then

$$\kappa\geq 1+\frac{2}{\sqrt{d}}-O\left(\frac{\sqrt{d}}{n}\right).$$

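We use a standard approach. Suppose $H$ is a $\kappa$-approximation of the complete graph. We will construct vectors $x^{*}$ and $y^{*}$ orthogonal to the $\mathbf{1}$ vector so that

$$\frac{x^{*T}L_{H}x^{*}}{\left\|x^{*}\right\|^{2}}\cdot\frac{\left\|y^{*}\right\|^{2}}{y^{*T}L_{H}y^{*}}$$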

is large, and this will give us a lower bound on $\kappa$ .

Let $v_{0}$ be the vertex of degree $d$ , and let its neighbors be $v_{1},\dotsc,v_{d}$ . Suppose $v_{i}$ is connected to $v_{0}$ by an edge of weight $w_{i}$ , and the total weight of the edges between $v_{i}$ and vertices other than $v_{0},v_{1},\ldots,v_{d}$ is $\delta_{i}$ . We begin by considering vectors $x$ and $y$ with

These vectors are not orthogonal to $1$ , but we will take care of that later. It is easy to compute the values taken by the quadratic form at $x$ and $y$ :

Since $H$ is a $\kappa$ -approximation, all weighted degrees must lie between $n$ and $n\kappa$ , which gives

Let $x^{*}$ and $y^{*}$ be the projections of $x$ and $y$ respectively orthogonal to the $1$ vector. Then

Combining (5) and (6), we conclude that asymptotically:

But by our assumption the LHS is at most $\kappa$ , so we have

Conclusion

We conclude by drawing a connection between Theorem 3.1 and an outstanding open problem in mathematics, the Kadison-Singer conjecture. This conjecture, which dates back to 1959, is equivalent to the well-known Paving Conjecture as well as to a stronger form of the restricted invertibility theorem of Bourgain and Tzafriri . The following formulation is due to Nik Weaver .

then there is a partition $X_{1},\ldots X_{r}$ of $\{1,\ldots,m\}$ for which

Suppose we had a version of Theorem 3.1 which, assuming $\|\mathbf{v}_{i}\|\leq\delta$, guaranteed that the scalars $s_{i}$ were all either $0$ or some constant $\beta>0$, and gave a constant approximation factor $\kappa<\beta$. Then we would have

for $S=\{i:s_{i}\neq 0\}$ , yielding a proof of Conjecture 5.1 with $r=2$ and $\epsilon=\min\{1-\frac{\kappa}{\beta},\frac{1}{\beta}\}$ since

As a special case, such a theorem would also imply the existence of unweighted sparsifiers for the complete graph and other (sufficiently dense) edge-transitive graphs. It is also worth noting that the $\|\mathbf{v}_{i}\|\leq\delta$ condition when applied to vectors $\{\Pi_{e}\}_{e\in E}$ arising from a graph simply means that the effective resistances of all edges are bounded; thus, we would be able to conclude that any graph with sufficiently small resistances can be split into two graphs that approximate it spectrally.