Community detection thresholds and the weak Ramanujan property

Laurent Massoulie

Introduction

Community detection, like clustering, aims to identify groups of similar items from a global population. It is a useful primitive for performing recommendation, e.g. of contents or contacts to users of online social networks. The stochastic block model was introduced by Holland et al. to represent interactions between individuals. It consists of a random graph on $n$ nodes, each node $i\in\mathcal{N}=\{1,\ldots,n\}$ being assigned a type $\sigma_{i}$ from some fixed set $\Sigma$. Conditionally on node types, edge $(i,j)$ is present with probability $p(\sigma_{i},\sigma_{j})$ independently of other edges, for some matrix of probabilities $(p(\sigma,\sigma^{\prime}))$.
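The definition above can be illustrated by a minimal sampler. This is only a sketch: the function name `sample_sbm`, the dictionary encoding of the probability matrix and the toy parameters are assumptions made for illustration, not part of the model's original presentation.

```python
import random

def sample_sbm(types, p, rng):
    """Sample an undirected stochastic block model graph.

    types: list of node types sigma_i.
    p: dict mapping a pair of types to the edge probability p(sigma, sigma').
    Returns the set of edges (i, j) with i < j.
    """
    n = len(types)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Given the types, each pair is connected independently.
            if rng.random() < p[(types[i], types[j])]:
                edges.add((i, j))
    return edges

# Toy instance (hypothetical parameters): two perfectly separated communities.
rng = random.Random(0)
types = [+1, +1, -1, -1]
p = {(+1, +1): 1.0, (-1, -1): 1.0, (+1, -1): 0.0, (-1, +1): 0.0}
edges = sample_sbm(types, p, rng)
```

With probabilities 1 within types and 0 across types, the sampled graph consists of exactly the two within-community edges, whatever the seed.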

It constitutes an adequate testbed for community detection. Indeed the performance of candidate detection schemes, captured by the fraction of nodes $i$ for which estimated types $\hat{\sigma}_{i}$ and true types $\sigma_{i}$ coincide, can be compared and analysed on instances of the stochastic block model. Such analyses can in turn suggest new schemes.

Recently Decelle et al. conjectured the existence of a phase transition in the sparse regime where the graph’s average degree is $O(1)$ . Specifically, they predicted that for parameters below a certain threshold, no estimates $\hat{\sigma}_{i}$ of node types existed that would be positively correlated with true types $\sigma_{i}$ , while above the threshold, belief propagation algorithms could determine estimates $\hat{\sigma}_{i}$ achieving such a positive correlation. Their conjecture is formulated on a simple symmetric instance of the stochastic block model featuring two node types $\{+1,-1\}$ . The phenomenon appears more general though: Heimlicher et al. extended the conjecture to the more general setup of labeled stochastic block models.

The study of this phenomenon is important for two reasons. First, by localizing precisely the transition point below which no useful signal is present in the observations, one thus characterizes how much subsampling of the original graph can be performed before all information is lost. Second, algorithms leading to estimates $\hat{\sigma}_{i}$ that achieve positive correlation all the way down to the transition are expected to constitute more robust approaches than alternatives which would fail before the transition. It is therefore important to determine such algorithms.

The negative part of the conjecture has been proven by Mossel, Neeman and Sly. Essentially, they established that the existence of estimates $\hat{\sigma}_{i}$ positively correlated with true types $\sigma_{i}$ would imply feasibility of a reconstruction problem on a random tree model describing the local statistics of the original random graph. However, by results of Evans et al., such reconstruction is infeasible below the conjectured transition point.

Until now, positive results in the sparse case did not apply down to the transition point. The best results to date relied on Coja-Oghlan, showing that spectral clustering applied to the adjacency matrix, suitably trimmed by removal of high degree nodes, yields positively correlated estimates. However, this does not apply down to the conjectured threshold.

This limitation stems from the following fact. Spectral methods perform well on matrices enjoying a spectral separation property: the spectrum should comprise a few large eigenvalues whose associated eigenvectors reflect the sought structure, and all other eigenvalues should be negligible. The prototype of such separation is the Ramanujan property, according to which a $d$-regular graph has second eigenvalue $\lambda$ no larger than $2\sqrt{d-1}$ in absolute value. Friedman established that random $d$-regular graphs almost satisfy this, in that for them $|\lambda|\leq 2\sqrt{d-1}+o(1)$. Erdős-Rényi graphs with average degree $d$ are such that $|\lambda|\leq O(\sqrt{d})$, provided $d=\Omega(\log n)$ (see Feige and Ofek), but such Ramanujan-like separation is lost for smaller $d$. This lack of separation inherently limits the power of spectral methods in the sparse case.
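To make the Ramanujan bound concrete, the following sketch checks it numerically on the Petersen graph, a small 3-regular graph chosen here purely for illustration (it does not appear in the argument). The helper names and the power-iteration approach are assumptions of this example.

```python
import math

def petersen_adjacency():
    """Adjacency sets of the Petersen graph (3-regular, 10 nodes)."""
    adj = [set() for _ in range(10)]
    for i in range(5):
        # outer cycle edge, spoke, inner pentagram edge
        for u, v in ((i, (i + 1) % 5), (i, i + 5), (i + 5, 5 + (i + 2) % 5)):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def second_eigenvalue_abs(adj, iters=200):
    """Largest |eigenvalue| of the adjacency matrix on vectors orthogonal to all-ones."""
    n = len(adj)
    x = [float((i * i) % 7) for i in range(n)]  # arbitrary starting vector
    lam = 0.0
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # project out the all-ones eigenvector
        norm = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / norm for xi in x]
        y = [sum(x[j] for j in adj[i]) for i in range(n)]  # y = A x
        lam = math.sqrt(sum(yi * yi for yi in y))  # ||A x|| for unit-norm x
        x = y
    return lam

d = 3
lam = second_eigenvalue_abs(petersen_adjacency())
ramanujan_bound = 2 * math.sqrt(d - 1)
```

For a regular graph the all-ones vector carries the top eigenvalue $d$, so projecting it out at every step lets plain power iteration converge to the second eigenvalue in absolute value, which can then be compared with $2\sqrt{d-1}$.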

2 Main results

We focus on the stochastic block model in Decelle et al. The graph is denoted $\mathcal{G}$, and node types (or spins) $\sigma_{i}$ are drawn i.i.d. uniformly from $\{-1,+1\}$. An edge is present between any two nodes $i$, $j$ with probability $a/n$ if $\sigma_{i}=\sigma_{j}$, and $b/n$ if $\sigma_{i}=-\sigma_{j}$, the constants $a$ and $b$ being the model parameters. The conjectured transition point is specified by the quantity $\tau=(a-b)^{2}/[2(a+b)]$: for $\tau<1$ it is known that positively correlated detection is impossible; we set out to prove that it is feasible for $\tau>1$.

We shall make use of the notations $\alpha:=(a+b)/2$, $\beta:=(a-b)/2$. Since $\tau=(2\beta)^{2}/(4\alpha)=\beta^{2}/\alpha$, the detectability condition $\tau>1$ can be restated as

$$\beta^{2}>\alpha.$$
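The identity $\tau=\beta^{2}/\alpha$ behind this restatement can be checked numerically. The sketch below uses a hypothetical helper `sbm_parameters` and illustrative values of $a$ and $b$ that are not taken from the text.

```python
def sbm_parameters(a, b):
    """Return (alpha, beta, tau) for the symmetric two-type model."""
    alpha = (a + b) / 2
    beta = (a - b) / 2
    tau = (a - b) ** 2 / (2 * (a + b))
    return alpha, beta, tau

# Illustrative values only: one pair above and one below the threshold.
alpha, beta, tau = sbm_parameters(5.0, 1.0)
alpha_low, beta_low, tau_low = sbm_parameters(3.0, 1.0)
detectable = beta ** 2 > alpha
detectable_low = beta_low ** 2 > alpha_low
```

For $(a,b)=(5,1)$ one gets $\alpha=3$, $\beta=2$ and $\tau=4/3>1$, while $(a,b)=(3,1)$ gives $\tau=1/2<1$.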

the empirical overlap between the true and estimated spins defined as

converges in probability to the set $\{-r,+r\}$ for some strictly positive constant $r>0$ as $n\to\infty$ .
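As a hedged illustration, assume the empirical overlap is the normalized inner product $\frac{1}{n}\sum_{i}\sigma_{i}\hat{\sigma}_{i}$ (the defining display is not reproduced here, so this form is an assumption). It can then be computed as follows; the function name is hypothetical.

```python
def empirical_overlap(true_spins, estimated_spins):
    """Normalized inner product of two +/-1 spin vectors (assumed definition)."""
    n = len(true_spins)
    return sum(s * s_hat for s, s_hat in zip(true_spins, estimated_spins)) / n

sigma = [1, 1, -1, -1]
sigma_hat = [1, 1, -1, 1]  # one mistake out of four
overlap = empirical_overlap(sigma, sigma_hat)
flipped = empirical_overlap(sigma, [-s for s in sigma_hat])
```

Under this definition the overlap equals twice the fraction of correctly estimated spins minus one, and a global sign flip of the estimates changes its sign, which is consistent with the limit being stated as the set $\{-r,+r\}$.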

3 Paper organization

Proof structure

Before we describe the steps used to establish this, let us verify how it implies Theorem 1.1. Note that since ${\mathbf{E}}(X)=1$ , writing

we see that the inequality ${\mathbf{P}}(X\geq x)-{\mathbf{P}}(-X\geq x)>0$ must hold on a set of $x$’s of positive Lebesgue measure. Since the set of points $x$ at which the distribution of either $X$ or $-X$ has an atom is at most countable, there exists an $x$ at which neither distribution has an atom and at which the desired inequality ${\mathbf{P}}(X\geq x)-{\mathbf{P}}(-X\geq x)>0$ holds. Letting $t=x/\sqrt{{\mathbf{E}}(X^{2})}$ and $r={\mathbf{P}}(X\geq x)-{\mathbf{P}}(-X\geq x)$ we readily have by (4) that the empirical overlap in (3) must converge to $\{-r,+r\}$.

Theorem 2.1 will follow from the combination of two analyses. Let $\bar{A}$ denote the expectation of the graph’s adjacency matrix conditional on the spin vector $\sigma$ , that is
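As a hedged sketch of this conditional expectation: off the diagonal, the entry for nodes $i$, $j$ is $a/n$ when $\sigma_{i}=\sigma_{j}$ and $b/n$ otherwise, which equals $(\alpha+\beta\sigma_{i}\sigma_{j})/n$. The code below assumes this rank-two form for all entries, ignoring any diagonal correction (an assumption of this example), and checks that $e$ and $\sigma$ are eigenvectors when the spins are balanced.

```python
def expected_adjacency(sigma, a, b):
    """Rank-two matrix with entries (alpha + beta * sigma_i * sigma_j) / n.

    Assumption: the diagonal is treated like any other entry.
    """
    n = len(sigma)
    alpha, beta = (a + b) / 2, (a - b) / 2
    return [[(alpha + beta * sigma[i] * sigma[j]) / n for j in range(n)]
            for i in range(n)]

def matvec(m, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in m]

sigma = [1, 1, 1, -1, -1, -1]  # balanced toy spin vector
a, b = 5.0, 1.0
A_bar = expected_adjacency(sigma, a, b)
e = [1.0] * len(sigma)
A_bar_e = matvec(A_bar, e)          # equals alpha * e for balanced spins
A_bar_sigma = matvec(A_bar, sigma)  # equals beta * sigma for balanced spins
```

For balanced spins the two nonzero eigenvalues are thus $\alpha$ and $\beta$, with eigenvectors $e$ and $\sigma$.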

The first analysis establishes the following

They are close (in a sense made precise in Section 4) to the corresponding quantities $(B^{(t)}e)_{i}$ , $(B^{(t)}\sigma)_{i}$ , and are easier to analyze. In particular, they enjoy a quasi-deterministic growth property:

This, combined with Theorem 2.2, yields the key intermediate step:

Matrix expansion and spectral radii bounds

Our aim in this section is to establish Theorem 2.2. Denoting $\xi_{ij}$ the indicator of edge $(i,j)$ ’s presence in $\mathcal{G}$ we can write

where $\bar{A}$ is as in (5). We then have the expansion:

With these notations at hand, one obtains from (15):

since we chose $k$ so that $2k\epsilon>1$ and the last term is polylogarithmic in $n$ . This establishes (7).

and this last bound decays to zero as a power of $n$ by the condition $2k\epsilon>1$ and the fact that the last term in the product is polylogarithmic in $n$ . This completes the proof of Theorem 2.2.

Local Analysis: structure of expanded neighborhoods

For any $k\geq 0$ , the number of nodes with spin $\pm$ at distance $k$ (respectively $\leq k$ ) of node $i$ is denoted $U^{\pm}_{k}(i)$ (respectively, $U^{\pm}_{\leq k}(i)$ ). We thus have

We shall omit indices $i$ when considering quantities related to a fixed node $i$. In the remainder of the section we condition on the spins $\sigma$ of all nodes. We denote by $n_{\pm}$ the number of nodes with spin $\pm$.
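The counts $U^{\pm}_{k}(i)$ can be obtained by a breadth-first search from node $i$. The following is a minimal sketch; the function name, the adjacency-list input format and the toy path graph are assumptions of this example.

```python
from collections import deque
from itertools import accumulate

def spin_counts_by_distance(adj, spins, root, max_dist):
    """Return (u_plus, u_minus), where u_plus[k] counts +1 nodes at distance k from root."""
    dist = {root: 0}
    queue = deque([root])
    u_plus = [0] * (max_dist + 1)
    u_minus = [0] * (max_dist + 1)
    while queue:
        i = queue.popleft()
        k = dist[i]
        if spins[i] > 0:
            u_plus[k] += 1
        else:
            u_minus[k] += 1
        if k == max_dist:
            continue  # do not explore beyond the requested radius
        for j in adj[i]:
            if j not in dist:
                dist[j] = k + 1
                queue.append(j)
    return u_plus, u_minus

# Toy example: the path 0 - 1 - 2 - 3 with alternating spins.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
spins = [+1, -1, +1, -1]
u_plus, u_minus = spin_counts_by_distance(adj, spins, root=0, max_dist=2)
u_plus_cumulative = list(accumulate(u_plus))  # counts at distance <= k
```

The cumulative counts $U^{\pm}_{\leq k}$ are running sums of the per-distance counts, as computed in the last line.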

For fixed $i\in\mathcal{N}$ it is readily seen that, conditionally on ${\mathcal{F}}_{k-1}:=\sigma(U^{+}_{t},U^{-}_{t},t\leq k-1)$ , we have:

Theorem 2.3 is established based on these characterizations by extensive use of Chernoff bounds for binomial variables. Its proof is deferred to the Appendix.
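To illustrate the binomial characterization, assume that each unexplored node joins the next layer independently when it is adjacent to at least one node of the current layer, which has $U$ nodes of its own spin and $V$ nodes of the opposite spin. This specific form is an assumption of the sketch below, as are the function names. The success probability is then $1-(1-a/n)^{U}(1-b/n)^{V}$, which lies between $x-x^{2}/2$ and $x$ for $x=(aU+bV)/n$.

```python
def discovery_probability(a, b, n, u_same, u_other):
    """Probability that a node has at least one neighbour in the current layer.

    Assumed form: independent edges with probability a/n towards same-spin
    nodes and b/n towards opposite-spin nodes.
    """
    return 1.0 - (1.0 - a / n) ** u_same * (1.0 - b / n) ** u_other

def linearized_probability(a, b, n, u_same, u_other):
    """First-order approximation (a * U + b * V) / n of the same probability."""
    return (a * u_same + b * u_other) / n

a, b, n = 5.0, 1.0, 1000
prob = discovery_probability(a, b, n, u_same=10, u_other=7)
x = linearized_probability(a, b, n, u_same=10, u_other=7)
```

Multiplying such a probability by the number of still unexplored nodes of a given spin gives a conditional mean close to $(aU+bV)/2$ when the spins are roughly balanced and few nodes have been explored.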

The next technical result establishes approximate independence of neighborhoods of distinct nodes. It is instrumental in Section 4.3, e.g. in establishing weak laws of large numbers on the fraction of nodes satisfying a given property.

We first state how to transport the deterministic growth controls (11) of Theorem 2.3 to vectors $B^{(m-1)}e$ and $B^{(m-1)}\sigma$ , a key step in the proof of Theorem 2.4. One has the following

The proof is in the Appendix, together with that of the following Corollary:

Proof (of Theorem 2.4). Using identity (6), write for unit norm $x$:

Using the bounds (26,27), the right-hand side is no larger than

By the previous inequalities (10,24,25) and the row sum bound, we have that

We now state two Lemmas which will allow us to establish Theorem 4.1.

Using these, we now establish the following

3 Coupling with Poisson tree growth process

Introduce the stochastic process $\{V^{\pm}_{t}\}_{t\geq 0}$ defined by

where ${\mathcal{G}}_{t-1}=\sigma(V^{\pm}_{k},k\leq t-1)$. We then have the following

The proof given in the Appendix relies on the Stein-Chen method for Poisson approximation.
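For intuition, here is a hedged simulation sketch of a two-type Poisson growth process. It assumes that, given the past, $V^{\pm}_{t}$ are independent Poisson variables with means $(aV^{\pm}_{t-1}+bV^{\mp}_{t-1})/2$, started from a single root of spin $+$. Both the initial condition and this exact form are assumptions of the example, and the function names are hypothetical.

```python
import math
import random

def poisson(mean, rng):
    """Knuth's multiplication method; only suitable for moderate means."""
    if mean <= 0:
        return 0
    threshold = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def conditional_means(v_plus, v_minus, a, b):
    """Assumed conditional means of the next generation, per spin."""
    return (a * v_plus + b * v_minus) / 2, (a * v_minus + b * v_plus) / 2

def simulate(a, b, steps, rng):
    """Run the process for a few generations from one '+' root."""
    v = (1, 0)
    path = [v]
    for _ in range(steps):
        m_plus, m_minus = conditional_means(v[0], v[1], a, b)
        v = (poisson(m_plus, rng), poisson(m_minus, rng))
        path.append(v)
    return path

path = simulate(5.0, 1.0, steps=4, rng=random.Random(1))
m_plus, m_minus = conditional_means(1, 0, 5.0, 1.0)
```

Under these assumptions the conditional mean of the total population is $\alpha$ times the current total, and that of the difference is $\beta$ times the current difference: for $(a,b)=(5,1)$ and one root, the means are $2.5$ and $0.5$, with sum $\alpha=3$ and difference $\beta=2$.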

where $V^{\pm}_{t}$ is as defined in (31). We then have the following

The two processes $\{M_{t}\}$ , $\{\Delta_{t}\}$ are ${\mathcal{G}}_{t}$ -martingales. Process $\{M_{t}\}$ is uniformly integrable under Condition $\alpha>1$ . Under Condition $\beta^{2}>\alpha$ , process $\{\Delta_{t}\}$ is also uniformly integrable.

Under $\alpha<\beta^{2}$ the martingale $\{\Delta_{t}\}$ converges almost surely to a unit mean random variable $\Delta_{\infty}$ . Moreover this random variable has a finite variance $1/(\beta^{2}/\alpha-1)$ to which the variance of $\Delta_{t}$ converges. It further holds that ${\mathbf{E}}|\Delta^{2}_{t}-\Delta_{\infty}^{2}|\to 0$ as $t\to\infty$ .
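The stated limiting variance can be checked by a short deterministic computation. The sketch assumes the normalizations $M_{t}=\alpha^{-t}(V^{+}_{t}+V^{-}_{t})$ and $\Delta_{t}=\beta^{-t}(V^{+}_{t}-V^{-}_{t})$ with Poisson offspring and a single root; these forms are assumptions of the example, since the defining display is not reproduced here. Under them, the conditional variance formula gives increments $\alpha^{-t}$ for $\hbox{Var}(M_{t})$ and $(\alpha/\beta^{2})^{t}$ for $\hbox{Var}(\Delta_{t})$.

```python
def martingale_variances(a, b, t_max):
    """Variances of M_t and Delta_t for t = 0..t_max under the assumed normalizations."""
    alpha, beta = (a + b) / 2, (a - b) / 2
    var_m, var_delta = [0.0], [0.0]
    for t in range(1, t_max + 1):
        # Increment = expected conditional variance of step t.
        var_m.append(var_m[-1] + alpha ** (-t))
        var_delta.append(var_delta[-1] + (alpha / beta ** 2) ** t)
    return var_m, var_delta

a, b = 5.0, 1.0
alpha, beta = (a + b) / 2, (a - b) / 2
var_m, var_delta = martingale_variances(a, b, t_max=200)
limit_delta = 1.0 / (beta ** 2 / alpha - 1.0)
limit_m = 1.0 / (alpha - 1.0)
```

For $(a,b)=(5,1)$ the geometric series sums to $1/(\beta^{2}/\alpha-1)=3$ for $\Delta_{t}$ and to $1/(\alpha-1)=1/2$ for $M_{t}$, and it diverges for $\Delta_{t}$ as soon as $\beta^{2}\leq\alpha$.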

Together these properties allow us to establish the following

One has the following convergence in probability

For each $t$ that is an atom of neither $\Delta_{\infty}$’s nor $-\Delta_{\infty}$’s distribution, the following convergence in probability holds

To convey the main ideas of the proof (deferred to the Appendix), we now indicate how to establish a property similar to (34), namely for a continuous bounded function $g$ we establish convergence in probability

The expectation of the sum in the left-hand side reads

where $i\neq j$ are two fixed nodes with spin $\pm$ . By the coupling lemma 4.1 it holds that

It follows that the variance of the empirical average in (39) goes to zero as $n\to\infty$. Its announced convergence in probability to $(1/2){\mathbf{E}}g(\pm\Delta_{\infty})$ then follows by Chebyshev’s inequality.

Theorems 4.1 and 4.2 readily imply Theorem 2.1.

Conclusions

Acknowledgements: The author gratefully acknowledges stimulating discussions on the topic with Marc Lelarge and Charles Bordenave.

References

Appendix A Proof of Proposition 3.1

We bound the expectation of the corresponding sum as follows. Let $v$ (respectively, $e$ ) be the number of nodes (respectively, edges) traversed by a particular circuit. The quantity $c=e-v+1$ is the so-called “tree excess”, counting the number of edges that are traversed while not being part of the tree consisting of edges whose first traversal strictly augments the number of spanned nodes.
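The quantities $v$, $e$ and the tree excess $c=e-v+1$ are straightforward to compute for a circuit given as a sequence of visited nodes. The sketch below counts distinct nodes and distinct undirected edges; the function name and the list representation of a circuit are assumptions of this example.

```python
def tree_excess(circuit):
    """Return (v, e, c) for a walk given as a list of successively visited nodes."""
    nodes = set(circuit)
    # Distinct undirected edges, regardless of how often each is traversed.
    edges = {frozenset((u, w)) for u, w in zip(circuit, circuit[1:])}
    v, e = len(nodes), len(edges)
    return v, e, e - v + 1

triangle = tree_excess([1, 2, 3, 1])   # one cycle, so excess 1
back_and_forth = tree_excess([1, 2, 1])  # a tree edge traversed twice, so excess 0
```

A circuit whose edges form a tree has excess zero, and each additional edge closing a cycle increases the excess by one.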

We represent the corresponding circuit as follows.

We number nodes by the order in which they are met by the circuit, starting with node 1.

a path using only edges already used in the circuit, and lying on the tree of new node discoveries

a cycle edge connecting the end of the two previous steps to a node already spanned. Such a cycle edge may have already been traversed by the circuit one or several times.

Given the tree previously spanned, and the current position on it, the first part of the sequence is characterized by the node label of its end: indeed, since on this subsequence we enforce the condition that the paths are simple, back-tracking is forbidden. Hence there is only one path on the tree going from the origin to the destination. We thus represent the first part by the number of the destination node if this part is non-empty, and by zero otherwise.

For a given number of nodes $v$ and edges $e$ , the number of corresponding nodes in $\{1,\ldots,n\}$ is upper-bounded by $n^{v}$ . For a given edge present with multiplicity $m\in\{1,\ldots,2k\}$ , the corresponding expectation is zero if $m=1$ , and for $m\geq 2$ , we have

Appendix B Proof of Proposition 3.2

Thus we have the upper bound on the number of valid circuit labels with $v$ nodes and $e$ edges:

Appendix C Proof of Theorem 2.3

The following inequality is easily verified to hold for any non-negative $U$ , $V$ , $a$ , $b$ , $n$ such that $a/n,b/n\leq 1$ , and will be instrumental in the sequel:

The next lemma is the key ingredient to establish Theorem 2.3.

where $M$ denotes the matrix $\begin{pmatrix}a/2&b/2\\ b/2&a/2\end{pmatrix}$.

Recall that conditionally on $\mathcal{F}_{t-1}$ the random variables $U^{+}_{t}$ and $U^{-}_{t}$ are independent, distributed according to

Let $T$ be the first instant $t$ for which $U_{t}\geq K\log(n)$ , for some $K$ to be specified.

By definition of $T$ , necessarily $U_{T-1}<K\log(n)$ . Thus

The mean of the Binomial distribution in the right-hand side of the above is equivalent to $(a\vee b)(1/2)K\log(n)$ and less than $\kappa\log(n)$ for $\kappa=(a\vee b)K$ . Hence by Chernoff’s inequality, for $h(x):=x\log(x)-x+1$ ,

Take $K^{\prime}$ so that $\kappa h(K^{\prime}/2\kappa)>2$ . The right-hand side of the above is then no larger than $n^{-2}$ .
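The choice of $K^{\prime}$ can be illustrated numerically. The sketch below uses the standard Chernoff upper-tail bound $\exp(-\mu h(x))$ for a binomial variable with mean $\mu$ exceeding $x\mu$, $x\geq 1$. The function names, the integer search for $K^{\prime}$ and the numerical values are assumptions of this example, and the step replacing the mean by its upper bound $\kappa\log(n)$ is not reproduced.

```python
import math

def h(x):
    """Rate function h(x) = x log(x) - x + 1, defined for x > 0."""
    return x * math.log(x) - x + 1.0

def chernoff_upper_tail(mean, x):
    """Chernoff bound on P(X >= x * mean) for a binomial X with the given mean, x >= 1."""
    return math.exp(-mean * h(x))

def smallest_k_prime(kappa):
    """Smallest integer K' >= 2 * kappa with kappa * h(K' / (2 * kappa)) > 2."""
    k_prime = math.ceil(2 * kappa)  # start where K' / (2 * kappa) >= 1
    while kappa * h(k_prime / (2 * kappa)) <= 2:
        k_prime += 1
    return k_prime

kappa, n = 5.0, 1000  # illustrative values only
k_prime = smallest_k_prime(kappa)
bound = chernoff_upper_tail(kappa * math.log(n), k_prime / (2 * kappa))
```

With the mean taken equal to $\kappa\log(n)$, the bound equals $n^{-\kappa h(K^{\prime}/2\kappa)}$, which is below $n^{-2}$ by the choice of $K^{\prime}$.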

Thus properties (10) clearly hold for $t\leq T$ . We now establish that they hold with sufficiently large probability for larger $t$ .

Conditional on $\mathcal{F}_{T}$ , the binomial distribution of $U^{\pm}_{T+1}$ has mean

Using the inequalities (41) we obtain that this mean lies in the interval

For a given $\epsilon>0$ , we can choose $K$ sufficiently large so that

It follows that $U^{\pm}_{T+1}$ admits a relative deviation from its conditional mean by $\epsilon$ with probability at most $n^{-2}$ .

and consider the events $\mathcal{A}_{t}:=\{U^{\pm}_{t}\in[1-\epsilon_{t},1+\epsilon_{t}]\frac{aU^{\pm}_{t-1}+bU^{\mp}_{t-1}}{2}\}$ . Conditionally on $\mathcal{A}_{T},\ldots,\mathcal{A}_{t}$ , the vector $U_{t}=(U^{+}_{t},U^{-}_{t})$ verifies the announced inequality (42). Given that $\alpha$ is the spectral radius of $M$ , it follows from this condition that $U^{\pm}_{t}\geq(1-O(\epsilon))\alpha^{t-T}K^{\prime\prime}\log(n)$ . We then check that Chernoff’s bound applies to show that the condition holds at step $t$ with high enough probability. It suffices to ensure that

Using (42), we readily have for $t,t^{\prime}\geq T$, with $t>t^{\prime}$:

A similar lower bound holds with $-\epsilon_{s}$ in place of $+\epsilon_{s}$. Setting $t^{\prime}=T$ in the upper bound, since $S_{T}=O(\log(n))$, the upper bound (10) follows for $S_{t}$, as $\prod_{s=T+1}^{t}(1+\epsilon_{s})=O(1)$.

It readily follows that (11) holds for $S_{t}$ .

Consider now $D_{t}$ . Using (42) again, we have

Since $S_{s}=O(\log(n)\alpha^{s-T})$, $|D_{T}|=O(\log(n))$ and $\epsilon_{s}=O(\alpha^{-(s-T)/2})$, we obtain for $t^{\prime}=T$:

where we have used the assumption that $\beta^{2}>\alpha$ to bound $\sum_{u>0}\beta^{-u}\alpha^{u/2}$ . Property (10) thus holds for $D_{t}$ .

Finally, the right-hand side of (43) is of order

Since for $t^{\prime}<T$ we readily have $D_{t^{\prime}}=O(\log(n))$ by definition of $T$, property (11) follows for $D_{t}$. ∎

Appendix D Proof of Lemma 4.2

There are two ways of creating cycles within the distance $k$-neighborhood of $i$: an edge may be present between two nodes at distance $k-1$ of $i$, or two nodes at distance $k-1$ may be connected to the same node at distance $k$ of $i$. The number of edges of the first type is stochastically dominated by $\hbox{Bin}(S_{k-1}^{2},(a\vee b)/n)$. Its expected number conditionally on $\Omega_{k-1}(i)$, defined as

As for the second type of cycles, its number is stochastically dominated by

Appendix E Proof of Lemma 4.3

Appendix F Proof of Corollary 4.1

Using the bound (25) for $i\in{\mathcal{B}}$ , we can bound the first summation, using Cauchy-Schwarz’s inequality by

By Cauchy-Schwarz again, this is no larger than

Appendix G Proof of Lemma 4.6

We assume that $\sigma_{i}=+$, the case $\sigma_{i}=-$ being similar. Introduce the events

where constant $C$ is as in Theorem 2.3. As established in the proof of Theorem 2.3, the probability of each $\Omega_{k}$ is $1-o(n^{-2})$ .

Let us evaluate, conditionally on ${\mathcal{F}}_{k-1}$ and on $\Omega_{k-1}$ the variation distance between $(U^{+}_{k},U^{-}_{k})$ and a pair of (conditionally on ${\mathcal{F}}_{k-1}$ ) independent random variables with respective distributions

The Stein-Chen method allows one to bound the variation distance between a $\hbox{Bin}(n,\lambda/n)$ and a $\hbox{Poi}(\lambda)$ random variable by $n\min(1,\lambda^{-1})(\lambda/n)^{2}\leq\lambda/n$. Furthermore, two Poisson random variables with respective parameters $\lambda$, $\lambda^{\prime}$ have variation distance at most $|\lambda-\lambda^{\prime}|$. This entails the bounds
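The Stein-Chen bound quoted above can be compared with the exact variation distance on small instances. The sketch below is only a numerical illustration: the function names and the values of $n$ and $\lambda$ are assumptions of this example.

```python
import math

def binomial_pmf(n, p, k):
    """P(Bin(n, p) = k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    """P(Poi(lam) = k), computed in log scale for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def tv_binomial_poisson(n, lam):
    """Exact variation distance between Bin(n, lam / n) and Poi(lam)."""
    p = lam / n
    diff = sum(abs(binomial_pmf(n, p, k) - poisson_pmf(lam, k)) for k in range(n + 1))
    # Poisson mass above n, where the binomial puts no mass.
    poisson_tail = max(0.0, 1.0 - sum(poisson_pmf(lam, k) for k in range(n + 1)))
    return 0.5 * (diff + poisson_tail)

def stein_chen_bound(n, lam):
    """The bound n * min(1, 1 / lam) * (lam / n) ** 2 quoted in the text."""
    return n * min(1.0, 1.0 / lam) * (lam / n) ** 2

n = 50
tv_large, bound_large = tv_binomial_poisson(n, 2.0), stein_chen_bound(n, 2.0)
tv_small, bound_small = tv_binomial_poisson(n, 0.5), stein_chen_bound(n, 0.5)
```

In both regimes ($\lambda\geq 1$ and $\lambda<1$) the exact distance stays below the bound, which in turn is at most $\lambda/n$.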

Appendix H Proof of Lemma 4.7

It readily follows that both processes $\{M_{t}\}$ , $\{\Delta_{t}\}$ are martingales. To establish uniform integrability we shall show that both processes have uniformly bounded variance. To that end we use the conditional variance formula

and the fact that the variance of a Poisson random variable equals its mean. Thus

This yields by the conditional variance formula

Since $\hbox{Var}(M_{0})=0$ , it follows by induction that

The latter is uniformly bounded for $\alpha>1$ hence the uniform integrability of $\{M_{t}\}$ under this condition.

It thus follows by $\hbox{Var}(\Delta_{0})=0$ and induction that

thus establishing uniform integrability of martingale $\{\Delta_{t}\}$ . ∎

Appendix I Proof of Corollary 4.2

Convergence almost surely and in $L_{1}$ is guaranteed under uniform integrability by the martingale convergence theorem. Finiteness of the limiting variable’s variance under uniform bounds on the variance is also standard; it follows from Fatou’s lemma. Convergence of the variances is established as follows. The limiting variable satisfies a distributional equation given by

where the $\Delta_{i}$ , $\Delta^{\prime}_{i}$ are i.i.d. and distributed as $\Delta$ . The only solution for the variance of $\Delta$ , apart from the degenerate solution , is then readily seen to be $1/(\beta^{2}/\alpha-1)$ , which is indeed the limit of the variance of $\Delta_{t}$ . The $L_{1}$ -convergence of $\Delta^{2}_{t}$ to $\Delta_{\infty}^{2}$ is then a direct consequence of Scheffé’s lemma. ∎

Appendix J Proof of Theorem 4.2

Let us now consider the second moment of the empirical sum:

We break it into two terms, the first being

Using Lemma 4.6 and Theorem 2.3, using similar arguments as before we can bound this term by

which clearly goes to zero as $n\to\infty$ .

The convergence in probability (33) follows.

We now turn to establishing (34). We shall only consider the case of sign +, the other being handled similarly. Fix some arbitrarily small $\delta>0$ . Because $\tau$ is a continuity point of the distribution of $\Delta_{\infty}$ , we can find two bounded Lipschitz-continuous functions $f$ , $g$ such that

we have that this empirical sum differs from the simpler one

The same argument can be applied to $g$ , eventually leading to the convergence in probability

As $\delta$ is arbitrary, this establishes (34).

Pick again an arbitrary $\delta>0$ , two pairs of Lipschitz-continuous functions $f_{\pm}$ and $g_{\pm}$ such that

The difference $(n_{+}-n_{-})/n$ is of order $1/\sqrt{n}$ and thus vanishes. We upper-bound the remaining terms by

Letting $K$ denote the Lipschitz-continuity constant for both $g_{+}$ and $f_{-}$ , this last display differs from

Because of the assumed convergence in probability $\lim_{n\to\infty}||x-y||=0$, the first error term necessarily tends to zero in probability by the Cauchy-Schwarz inequality. The second term is dealt with as mentioned in the proof of the previous lemma. Finally, using the coupling lemmas 4.6 and 4.1, by evaluating the first and second moments of (49), we obtain the convergence in probability

The latter term is then an upper bound on the $\limsup$ of the empirical overlap. By the same approach, we obtain a lower bound of

on the $\liminf$ of the overlap. These upper and lower bounds differ by at most $2\delta$ , and differ from ${\mathbf{P}}(\Delta_{\infty}\geq t)-{\mathbf{P}}(\Delta_{\infty}\leq-t)$ by at most $\delta$ . Since $\delta$ is arbitrary, this establishes the announced convergence in probability of the empirical overlap to quantity $x$ where

is strictly positive by our choice of $t$ . ∎

Appendix K Proof of Lemma 4.4

Appendix L Proof of Lemma 4.5

To establish the lower bound of (29), note that by Cauchy-Schwarz,

The lower bound in (30) is established similarly, from the inequality

We control the magnitude of this quantity in the tree model; using coupling we will then transpose the corresponding estimates to the original scenario.

Let then ${\mathcal{T}}$ denote a branching process with offspring $\hbox{Poi}(\alpha)$. The process of spins is then constructed by sampling uniformly the root’s spin, and then propagating spins in a Markovian fashion with transition matrix $\begin{pmatrix}a/(a+b)&b/(a+b)\\ b/(a+b)&a/(a+b)\end{pmatrix}$, that is $\alpha^{-1}M$. Its eigenvalues are thus $(1,\beta/\alpha)$.
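The role of the second eigenvalue $\beta/\alpha$ can be seen by propagating spins along a path of the tree: the correlation between the root's spin and that of a node at depth $d$ is $(\beta/\alpha)^{d}$. The sketch below checks this by explicit matrix powers; the function names and parameter values are assumptions of this example.

```python
def transition_matrix(a, b):
    """Spin transition matrix: keep the spin w.p. a/(a+b), flip it w.p. b/(a+b)."""
    return [[a / (a + b), b / (a + b)], [b / (a + b), a / (a + b)]]

def matmul(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def spin_correlation(a, b, depth):
    """E[sigma_root * sigma_v] for a node v at the given depth below the root."""
    p = transition_matrix(a, b)
    power = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(depth):
        power = matmul(power, p)
    # P(same spin as root) - P(opposite spin), starting from a '+' root.
    return power[0][0] - power[0][1]

a, b = 5.0, 1.0
alpha, beta = (a + b) / 2, (a - b) / 2
correlation = spin_correlation(a, b, depth=3)
```

For $(a,b)=(5,1)$ the correlation at depth 3 is $(2/3)^{3}=8/27$, since the vector $(1,-1)$ is an eigenvector of the transition matrix with eigenvalue $\beta/\alpha$.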

We evaluate its second moment conditionally on ${\mathcal{T}}$ by writing $X^{2}$ as

We will use this formula, and further distinguish nodes $j^{\prime}$ according to their distance $2(d+d^{\prime}-\tau)$ for $\tau=0,\ldots,2(d\wedge d^{\prime})$ . This yields

Note now that with high probability, we have the following evaluations

By coupling (techniques of Theorem 4.2 involving Chebyshev’s inequality, based on the bounds of Theorem 2.3 and Lemmas 4.6 and 4.1) we thus have that with high probability,