Spectral redemption: clustering sparse networks

Florent Krzakala, Cristopher Moore, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborová, Pan Zhang

I Spectral Clustering and Sparse Networks

In order to study the effectiveness of spectral algorithms in a specific ensemble of graphs, suppose that a graph GG is generated by the stochastic block model blockmodel1 . There are qq groups of vertices, and each vertex vv has a group label gv{1,,q}g_{v}\in\{1,\ldots,q\}. Edges are generated independently according to a q×qq\times q matrix pp of probabilities, with Pr[Au,v=1]=pgu,gv\Pr[A_{u,v}=1]=p_{g_{u},g_{v}}. In the sparse case, we have pab=cab/np_{ab}=c_{ab}/n, where the affinity matrix cabc_{ab} stays constant in the limit nn\to\infty.

The adjacency matrix is defined as the n×nn\times n matrix Au,v=1A_{u,v}=1 if (u,v)E(u,v)\in E and otherwise. A typical spectral algorithm assigns each vertex a kk-dimensional vector according to its entries in the first kk eigenvectors of AA for some kk, and clusters these vectors according to a heuristic such as the kk-means algorithm (often after normalizing or weighting them in some way). In the case q=2q=2, we can simply label the vertices according to the sign of the second eigenvector.

As shown in nadakuditi-newman1 , spectral algorithms succeed all the way down to the threshold (1) if the graph is sufficiently dense. In that case AA’s spectrum has a discrete part and a continuous part in the limit nn\to\infty. Its first eigenvector essentially sorts vertices according to their degree, while the second eigenvector is correlated with the communities. The second eigenvalue is given by

The question is when this eigenvalue gets lost in the continuous bulk of eigenvalues coming from the randomness in the graph. This part of the spectrum, like that of a sufficiently dense Erdős-Rényi random graph, is asymptotically distributed according to Wigner’s semicircle law Wigner

Thus the bulk of the spectrum lies in the interval [2c,2c][-2\sqrt{c},2\sqrt{c}]. If λc>c\lambda_{c}>c, which is equivalent to (1), the spectral algorithm can find the corresponding eigenvector, and it is correlated with the true community structure.

However, in the sparse case where cc is constant while nn is large, this picture breaks down due to a number of reasons. Most importantly, the leading eigenvalues of AA are dictated by the vertices of highest degree, and the corresponding eigenvectors are localized around these vertices KrivelevichSudakov:03 . As nn grows, these eigenvalues exceed λc\lambda_{c}, swamping the community-correlated eigenvector, if any, with the bulk of uninformative eigenvectors. As a result, spectral algorithms based on AA fail a significant distance from the threshold given by (1). Moreover, this gap grows as nn increases: for instance, the largest eigenvalue grows as the square root of the largest degree, which is roughly proportional to logn/loglogn\log n/\log\log n for Erdős-Rényi graphs. To illustrate this problem, the spectrum of AA for a large graph generated by the block model is depicted in Fig. 1.

Other popular operators for spectral clustering include the Laplacian L=DAL=D-A where Duv=duδu,vD_{uv}=d_{u}\delta_{u,v} is the diagonal matrix of vertex degrees, the random walk matrix Quv=Auv/duQ_{uv}=A_{uv}/d_{u}, and the modularity matrix Muv=Auvdudv/(2m)M_{uv}=A_{uv}-d_{u}d_{v}/(2m). However, all these experience qualitatively the same difficulties as with AA in the sparse case. Another simple heuristic is to simply remove the high-degree vertices (e.g. CO:10 ), but this throws away a significant amount of information; in the sparse case it can even destroy the giant component, causing the graph to fall apart into disconnected pieces BoJaRi:07 .

II The Non-Backtracking Operator

The main contribution of this paper is to show how to redeem the performance of spectral algorithms in sparse networks by using a different linear operator. The non-backtracking matrix BB is a 2m×2m2m\times 2m matrix, defined on the directed edges of the graph. Specifically,

Using BB rather than AA addresses the problem described above. The spectrum of BB is not sensitive to high-degree vertices, since a walk starting at vv cannot turn around and return to it immediately. Other convenient properties of BB are that any tree dangling off the graph, or disconnected from it, simply contributes zero eigenvalues to the spectrum, since a non-backtracking walk is forced to a leaf of the tree where it has nowhere to go. Similarly one can show that unicyclic components yield eigenvalues that are either , 11 or 1-1.

Moreover, the bulk of BB’s spectrum is confined to the disk in the complex plane of radius c\sqrt{c}, as shown in Fig. 2. As a result, the second eigenvalue is well separated from the top of the bulk, i.e., from the third largest eigenvalue in absolute value, as shown in Fig. 3.

The eigenvector corresponding to μc\mu_{c} is strongly correlated with the community structure. Since BB is defined on directed edges, at each vertex we sum this eigenvector over all its incoming edges. If we label vertices according to the sign of this sum, then the majority of vertices are labeled correctly (up to a change of sign, which switches the two communities). Thus a spectral algorithm based on BB succeeds when μc>c\mu_{c}>\sqrt{c}, i.e. when (1) holds—but unlike standard spectral algorithms, this criterion now holds even in the sparse case. We present arguments for these claims in the next section.

III Reconstruction and a Community-Correlated Eigenvector

In this section we sketch justifications of the claims in the previous section regarding BB’s spectral properties, showing that its second eigenvector is correlated with the communities whenever (1) holds. Let us start by recalling how to generalize equation (2) for the adjacency matrix AA of sparse graphs. We follow mossel-neeman-sly , who derived a similar result in the case of random regular graphs.

With μ=μc\mu=\mu_{c} defined as in (3), for a given integer rr, consider the vector

where σu=±1\sigma_{u}=\pm 1 denotes uu’s community. By the theory of the reconstruction problem on trees KestenStigum:66 ; MosselPeres:03 , if (1) holds then the correlation f(r),σ/n\langle f^{(r)},\sigma\rangle/n is bounded away from zero in the limit nn\to\infty.

We will show that if rr is large but small compared to the diameter of the graph, then f(r)f^{(r)} is closely related to the second eigenvector of BB. Thus if we label vertices according to the sign of this second eigenvector (summed over all incoming edges at each vertex) we obtain the true communities with significant accuracy.

First we show that f(r)f^{(r)} approximately obeys an eigenvalue equation that generalizes (2). As long as the radius-rr neighborhood of vv is a tree, we have

Summing over vv’s neighborhood gives the expectation

and summing the fluctuations over the (in expectation) crc^{r} vertices at distance rr gives

If μ=μc\mu=\mu_{c} and (1) holds so that μc>c\mu_{c}>\sqrt{c}, these fluctuations tend to zero for large rr. In that case, we can identify f(r)f^{(r)} with f(r±1)f^{(r\pm 1)}, and (5) becomes

In particular, in the dense case we can recover (2) by approximating DD with c\mathds1c\mathds{1}, or equivalently pretending that the graph is cc-regular. Then ff is an eigenvector of AA with eigenvalue λc=μ+(c1)μ1\lambda_{c}=\mu+(c-1)\mu^{-1}.

We define an analogous approximate eigenvector of BB,

where now dd refers to the number of steps in the graph of directed edges. We have in expectation

and as before g(r)g(r+1)|g^{(r)}-g^{(r+1)}| tends to zero as rr increases. Identifying them gives an approximate eigenvector gg with eigenvalue μ\mu,

Furthermore, summing over all incoming edges gives

giving signs correlated with the true community memberships σv\sigma_{v}.

We note that the relation between the eigenvalue equation (7) for BB and the quadratic eigenvalue equation (6) is exact and well known in the theory of zeta functions of graphs Hashimoto:89 ; Bass92 ; AnFrHo:07 . More generally, all eigenvalues μ\mu of BB that are not ±1\pm 1 are the roots of the equation

This equation hence describes 2n2n of BB’s eigenvalues. These are the eigenvalues of a 2n×2n2n\times 2n matrix,

The left eigenvectors of BB^{\prime} are of the form (f,μf)(f,-\mu f) where ff obeys (6). Thus we can find ff by dealing with a 2n×2n2n\times 2n matrix rather than a 2m×2m2m\times 2m one, which considerably reduces the computational complexity of our algorithm.

Next, we argue that the bulk of BB’s spectrum is confined to the disk of radius c\sqrt{c}. First note that for any matrix BB,

Since this holds for any fixed rr, we conclude that almost all of BB’s eigenvalues obey μc\left|\mu\right|\leq\sqrt{c}. Proving rigorously that all the eigenvalues in the bulk are asymptotically confined to this disk requires a more precise argument and is left for future work.

As a side remark we note that (8) yields BB’s spectrum for dd-regular graphs AnFrHo:07 . There are nn pairs of eigenvalues μ±\mu_{\pm} such that

where λ\lambda are the (real) eigenvalues of AA. These are related by μ+μ=d1\mu_{+}\mu_{-}=d-1, so all the non-real eigenvalues of BB are conjugate pairs on the circle of radius d1\sqrt{d-1}. The other eigenvalues are ±1\pm 1. For random regular graphs, the asymptotic spectral density of BB follows straightforwardly from the well known result of McKay:81 for the spectral density of the adjacency matrix.

Finally, the singular values of BB are easy to derive for any simple graph, i.e., one without self-loops or multiple edges. Namely, BBTBB^{T} is block-diagonal: for each vertex vv, it has a rank-one block of size dvd_{v} that connects vv’s outgoing edges to each other. As a consequence, BB has nn singular values dv1d_{v}-1, and its other 2mn2m-n singular values are 11. However, since BB is not symmetric, its eigenvalues and its singular values are different—while its singular values are controlled by the vertex degrees, its eigenvalues are not. This is precisely why its spectral properties are better than those of AA and related operators.

IV More Than Two Groups and General Degree Distributions

The arguments given above regarding BB’s spectral properties generalize straightforwardly to other graph ensembles. First, consider block models with qq groups, where for 1aq1\leq a\leq q group aa has fractional size nan_{a}. The average degree of group aa is ca=bcabnbc_{a}=\sum_{b}c_{ab}n_{b}. The hardest case is where ca=cc_{a}=c is the same for all aa, so that we cannot simply label vertices according to their degree.

More generally, if the community-correlated eigenvectors have distinct eigenvalues, we can have multiple transitions where some of them can be detected by a spectral algorithm while others cannot.

There is an important difference between the general case and q=2q=2. While for q=2q=2 it is literally impossible for any algorithm to distinguish the communities below this transition, for larger qq the situation is more complicated. In general (for q5q\geq 5 in the assortative case, and q3q\geq 3 in the disassortative one) the threshold (12) marks a transition from an “easily detectable” regime to a “hard detectable” one. In the hard detectable regime, it is theoretically possible to find the communities, but it is conjectured that any algorithm that does so takes exponential time decelle-etal1 ; decelle-etal2 . In particular, we have found experimentally that none of BB’s eigenvectors are correlated with the groups in the hard regime. Nonetheless, our arguments suggest that spectral algorithms based on BB are optimal in the sense that they succeed all the way down to this easy/hard transition.

Since a major drawback of the stochastic block model is that its degree distribution is Poisson, we can also consider random graphs with specified degree distributions. Again, the hardest case is where the groups have the same degree distribution. Let aka_{k} denote the fraction of vertices of degree kk. The average branching ratio of a branching process that explores the neighborhood of a vertex, i.e., the average number of new edges leaving a vertex vv that we arrive at when following a random edge, is

V Deriving B by Linearizing Belief Propagation

The matrix BB also appears naturally as a linearization of the update equations for belief propagation (BP). This linearization was used previously to investigate phase transitions in the performance of the BP algorithm CoMoVi:09 ; Urbanke ; decelle-etal1 ; decelle-etal2 .

The update (13) has a trivial fixed point ηvw=1/2\eta_{v\to w}=1/2, where every vertex is equally likely to be in either community. Writing ηuv±=1/2±δuv\eta_{u\to v}^{\pm}=1/2\pm\delta_{u\to v} and linearizing around this fixed point gives the following update rule for δ\delta,

More generally, in a block model with qq communities, an affinity matrix cabc_{ab}, and an expected fraction nan_{a} of vertices in each community aa, linearizing around the trivial point and defining ηuva=na+δuva\eta_{u\to v}^{a}=n_{a}+\delta_{u\to v}^{a} gives a tensor product operator

where TT is the q×qq\times q matrix defined in (11).

Thus we can analyze BP to first order around the trivial fixed point by keeping track of just 2qn2qn variables rather than 2qm2qm of them.

This shows that the spectral properties of the non-backtracking matrix are closely related to belief propagation. Specifically, the trivial fixed point is unstable, leading to a fixed point that is correlated with the community structure, exactly when TBT\otimes B has an eigenvalue greater than 11. However, by avoiding the fixed point where all the vertices belong to the same group, we suppress BB’s leading eigenvalue; thus the criterion for instability is νμ2>1\nu\mu_{2}>1 where ν\nu is TT’s leading eigenvalue and μ2\mu_{2} is BB’s second eigenvalue. This is equivalent to (12) in the case where the groups are of equal size.

In general, the BP algorithm provides a slightly better agreement with the actual group assignment, since it approximates the Bayes-optimal inference of the block model. On the other hand, the BP update rule depends on the parameters of the block model, and if these parameters are unknown they need to be learned, which presents additional difficulties zhang-etal . In contrast, our spectral algorithm does not depend on the parameters of the block model, giving an advantage over BP in addition to its computational efficiency.

VI Experimental Results and Discussion

In Fig. 4, we compare the spectral algorithm based on the non-backtracking matrix BB with those based on various classical operators: the adjacency matrix AA, the modularity matrix MM, the Laplacian LL, and the random walk matrix QQ. We see that there is a regime where standard spectral algorithms do no better than chance, while the one based on BB achieves a strong correlation with the true group assignment all the way down to the detectability threshold. We also show the performance of belief propagation, which is believed to be asymptotically optimal decelle-etal1 ; decelle-etal2 .

We measure the performance as the overlap, defined as

Finally we turn towards real networks to illustrate the advantages of spectral clustering based on the non-backtracking matrix in practical applications. In Fig. 6 we show BB’s spectrum for several networks commonly used as benchmarks for community detection. In each case we plot a circle whose radius is the square root of the largest eigenvalue. Even though these networks were not generated by the stochastic block model, these spectra look qualitatively similar to the picture discussed above (Fig. 2). This leads to several very convenient properties. For each of these networks we observed that only the eigenvectors with real eigenvalues are correlated to the group assignment given by the ground truth. Moreover, the real eigenvalues that lie outside of the circle are clearly identifiable. This is very unlike the situation for the operators used in standard spectral clustering algorithms, where one must decide which eigenvalues are in the bulk and which are outside.

In particular, the number of real eigenvalues outside of circle seems to be a natural indicator for the true number qq of clusters present in the network, just as for networks generated by the stochastic block model. This suggests that in the network of political books there might in fact be 4 groups rather than 3, in the blog network there might be more than two groups, and in the NCAA football network there might be 10 groups rather than 12. However, we also note that large real eigenvalues may correspond in some networks to small cliques in the graph; it is a philosophical question whether or not to count these as communities.

Note also that clustering based on the non-backtracking matrix works not only for assortative networks, but also for disassortative ones, such as word adjacency networks adjnoun , where the important real eigenvalue is negative—without being told which is the case.

A Matlab implementation with demos that can be used to reproduce our numerical results can be found at code .

VII Conclusion

While recent advances have made statistical inference of network models for community detection far more scalable than in the past (e.g. decelle-etal1 ; ball-karrer-newman ; pseudolikelihood ; subsampling ) spectral algorithms are highly competitive because of the computational efficiency of sparse linear algebra. However, for sparse networks there is a large regime in which statistical inference methods such as belief propagation can detect communities, while standard spectral algorithms cannot.

We closed this gap by using the non-backtracking matrix BB as a new starting point for spectral algorithms. We showed that for sparse networks generated by the stochastic block model, BB’s spectral properties are much better than those of the adjacency matrix and its relatives. In fact, it is asymptotically optimal in the sense that it allows us to detect communities all the way down to the detectability transition. We also computed BB’s spectrum for some common benchmarks for community detection in real-world networks, showing that the real eigenvalues are a good guide to the number of communities and the correct labeling of the vertices.

Our approach can be straightforwardly generalized to spectral clustering for other types of sparse data, such as real-valued similarities between objects. The definition of BB extends to

where s(u,v)s(u,v) is the similarity index between uu and vv. As in the case of graphs, we cluster the vertices by computing the top eigenvectors of BB, projecting the rows of BB to the space spanned by these eigenvectors, and using a low-dimensional clustering algorithm such as kk-means to cluster the projected rows clustering-intro . However, we believe that, as for sparse graphs, there will be important regimes in which using BB will succeed where standard clustering algorithms fail. Given the wide use of spectral clustering throughout the sciences, we expect that the non-backtracking matrix and its generalizations will have a significant impact on data analysis.

References