Community detection thresholds and the weak Ramanujan property
Laurent Massoulie
Introduction
Community detection, like clustering, aims to identify groups of similar items from a global population. It is a useful primitive for performing recommendation, e.g. of contents or contacts to users of online social networks. The stochastic block model has been introduced by Holland et al. to represent interactions between individuals. It consists of a random graph on nodes, each node being assigned a type from some fixed set . Conditionally on node types, edge is present with probability independently of other edges, for some matrix of probabilities .
It constitutes an adequate testbed for community detection. Indeed the performance of candidate detection schemes, captured by the fraction of nodes for which estimated types and true types coincide, can be compared and analysed on instances of the stochastic block model. Such analyses can in turn suggest new schemes.
Recently Decelle et al. conjectured the existence of a phase transition in the sparse regime where the graph’s average degree is . Specifically, they predicted that for parameters below a certain threshold, no estimates of node types existed that would be positively correlated with true types , while above the threshold, belief propagation algorithms could determine estimates achieving such a positive correlation. Their conjecture is formulated on a simple symmetric instance of the stochastic block model featuring two node types . The phenomenon appears more general though: Heimlicher et al. extended the conjecture to the more general setup of labeled stochastic block models.
The study of this phenomenon is important for two reasons. First, by localizing precisely the transition point below which no useful signal is present in the observations, one thus characterizes how much subsampling of the original graph can be performed before all information is lost. Second, algorithms leading to estimates that achieve positive correlation all the way down to the transition are expected to constitute more robust approaches than alternatives which would fail before the transition. It is therefore important to determine such algorithms.
The negative part of the conjecture has been proven by Mossel, Neeman and Sly . Essentially they established that existence of estimates positively correlated with true types would imply feasibility of a reconstruction problem on a random tree model describing the local statistics of the original random graph. However by results of Evans et al. such reconstruction is infeasible below the conjectured transition point.
Until now, positive results in the sparse case did not apply down to the transition point. The best results to date (see ) relied on Coja-Oghlan , showing that spectral clustering applied to the adjacency matrix, suitably trimmed by removal of high degree nodes, yields positively correlated estimates. However this does not apply down to the conjectured threshold.
This limitation stems from the following fact. Spectral methods perform well on matrices enjoying a spectral separation property, namely the spectrum should comprise a few large eigenvalues whose associated eigenvectors reflect the sought structure and all other eigenvalues should be negligible. The prototype of such separation is the Ramanujan property, according to which -regular graphs have the second eigenvalue no larger than in absolute value. Friedman established that random -regular graphs almost satisfy this, in that for them . Erdős-Rényi graphs with average degree are such that , provided (see Feige and Ofek ), but such Ramanujan-like separation is lost for smaller . This lack of separation inherently limits the power of spectral methods in the sparse case.
2 Main results
We focus on the stochastic block model in Decelle et al. . The graph is denoted , node types (or spins) are uniformly and i.i.d. drawn from . An edge is present between any two nodes , with probability if , and if , constants and being the model parameters. The conjectured transition point is specified by quantity : for it is known that positively correlated detection is impossible; we set out to prove that it is feasible for .
We shall make use of the notations , . The detectability condition can be restated as
the empirical overlap between the true and estimated spins defined as
converges in probability to the set for some strictly positive constant as .
3 Paper organization
Proof structure
Before we describe the steps used to establish this, let us verify how it implies Theorem 1.1. Note that since , writing
we see that inequality must hold on a set of ’s of positive Lebesgue measure. Since the points at which the distribution of either or has an atom is at most countable, there thus exists an at which neither distribution has an atom, and the desired inequality holds. Letting and we readily have by (4) that the empirical overlap in (3) must converge to .
Theorem 2.1 will follow from the combination of two analyses. Let denote the expectation of the graph’s adjacency matrix conditional on the spin vector , that is
The first analysis establishes the following
They are close (in a sense made precise in Section 4) to the corresponding quantities , , and are easier to analyze. In particular, they enjoy a quasi-deterministic growth property:
This, combined with Theorem 2.2, yields the key intermediate step:
Matrix expansion and spectral radii bounds
Our aim in this section is to establish Theorem 2.2. Denoting the indicator of edge ’s presence in we can write
where is as in (5). We then have the expansion:
With these notations at hand, one obtains from (15):
since we chose so that and the last term is polylogarithmic in . This establishes (7).
and this last bound decays to zero as a power of by the condition and the fact that the last term in the product is polylogarithmic in . This completes the proof of Theorem 2.2.
Local Analysis: structure of expanded neighborhoods
For any , the number of nodes with spin at distance (respectively ) of node is denoted (respectively, ). We thus have
We shall omit indices when considering quantities related to a fixed node . In the remainder of the section we condition on the spins of all nodes. We denote as the number of nodes with spin .
For fixed it is readily seen that, conditionally on , we have:
Theorem (2.3) is established based on these characterizations by extensive use of Chernoff bounds for binomial variables. Its proof is deferred to the Appendix.
The next technical result establishes approximate independence of neighborhoods of distinct nodes. It is instrumental in Section 4.3 e.g. in establishing weak laws of large numbers on the fraction of nodes satisfying a given property.
We first state how to transport the deterministic growth controls (11) of Theorem 2.3 to vectors and , a key step in the proof of Theorem 2.4. One has the following
Proof is in the Appendix, together with that of the following Corollary:
(of Theorem 2.4). Using identity (6), write for unit norm :
Using the bounds (26,27), the right-hand side is no larger than
By the previous inequalities (10,24,25) and the row sum bound, we have that
We now state two Lemmas which will allow to establish Theorem 4.1.
Using these, we now establish the following
3 Coupling with Poisson tree growth process
Introduce the stochastic process defined by
where . We then have the following
The proof given in the Appendix relies on the Stein-Chen method for Poisson approximation.
where is as defined in (31). We then have the following
The two processes , are -martingales. Process is uniformly integrable under Condition . Under Condition , process is also uniformly integrable.
Under the martingale converges almost surely to a unit mean random variable . Moreover this random variable has a finite variance to which the variance of converges. It further holds that as .
Together these properties allow to establish the following
One has the following convergence in probability
For each that is an atom of neither ’s or ’s distribution, the following convergence in probability holds
To convey the main ideas of the proof (deferred to the Appendix), we now indicate how to establish a property similar to (34), namely for a continuous bounded function we establish convergence in probability
The expectation of the sum in the left-hand side reads
where are two fixed nodes with spin . By the coupling lemma 4.1 it holds that
It follows that the variance of the empirical average in (39) goes to zero as . Its announced convergence in probability to then follows by Tchebitchev’s inequality.
Theorems 4.1 and 4.2 readily imply Theorem 2.1.
Conclusions
acknowledgements: The author gratefully acknowledges stimulating discussions on the topic with Marc Lelarge and Charles Bordenave.
References
Appendix A Proof of Proposition 3.1
We bound the expectation of the corresponding sum as follows. Let (respectively, ) be the number of nodes (respectively, edges) traversed by a particular circuit. The quantity is the so-called “tree excess”, counting the number of edges that are traversed while not being part of the tree consisting of edges whose first traversal strictly augments the number of spanned nodes.
We represent the corresponding circuit as follows.
We number nodes by the order in which they are met by the circuit, starting with node 1.
a path using only edges already used in the circuit, and lying on the tree of new node discoveries
a cycle edge connecting the end of the two previous steps to a node already spanned. Such a cycle edge may have already been traversed by the circuit one or several times.
Given the tree previously spanned, and the current position on it, the first part of the sequence is characterized by the node label of its end: indeed, since on this subsequence we enforce the condition that the paths are simple, back-tracking is forbidden. Hence there is only one path on the tree going from the origin to the destination. We thus represent the first part by the number of the destination node if this part is non-empty, and by zero otherwise.
For a given number of nodes and edges , the number of corresponding nodes in is upper-bounded by . For a given edge present with multiplicity , the corresponding expectation is zero if , and for , we have
Appendix B Proof of Proposition 3.2
Thus we have the upper bound on the number of valid circuit labels with nodes and edges:
Appendix C Proof of Theorem 2.3
The following inequality is easily verified to hold for any non-negative , , , , such that , and will be instrumental in the sequel:
Next lemma is the key ingredient to establish Theorem 2.3.
where denotes the matrix .
Recall that conditionally on the random variables and are independent, distributed according to
Let be the first instant for which , for some to be specified.
By definition of , necessarily . Thus
The mean of the Binomial distribution in the right-hand side of the above is equivalent to and less than for . Hence by Chernoff’s inequality, for ,
Take so that . The right-hand side of the above is then no larger than .
Thus properties (10) clearly hold for . We now establish that they hold with sufficiently large probability for larger .
Conditional on , the binomial distribution of has mean
Using the inequalities (41) we obtain that this mean lies in the interval
For a given , we can choose sufficiently large so that
It follows that admits a relative deviation from its conditional mean by with probability at most .
and consider the events . Conditionally on , the vector verifies the announced inequality (42). Given that is the spectral radius of , it follows from this condition that . We then check that Chernoff’s bound applies to show that the condition holds at step with high enough probability. It suffices to ensure that
Using (42), we readily have for , with :
A similar lower bound holds with in place of . Setting in the upper bound, since , the upper bound (10) follows for , as .
It readily follows that (11) holds for .
Consider now . Using (42) again, we have
Since , and , we obtain for :
where we have used the assumption that to bound . Property (10) thus holds for .
Finally, the right-hand side of (43) is of order
Since for we readily have by definition of , property (11) follows for . ∎
Appendix D Proof of Lemma 4.2
There are two ways for creating cycles within the distance -neighborhood of : an edge may be present between two nodes at distance of , or two nodes at distance may be connected to the same node at distance of . The number of edges of the first type is stochastically dominated by . Its expected number conditionally on , defined as
As for the second type of cycles, its number is stochastically dominated by
Appendix E Proof of Lemma 4.3
Appendix F Proof of Corollary 4.1
Using the bound (25) for , we can bound the first summation, using Cauchy-Schwarz’s inequality by
By Cauchy-Schwarz again, this is no larger than
Appendix G Proof of Lemma 4.6
We assume that , the case being similar. Introduce the events
where constant is as in Theorem 2.3. As established in the proof of Theorem 2.3, the probability of each is .
Let us evaluate, conditionally on and on the variation distance between and a pair of (conditionally on ) independent random variables with respective distributions
The Stein-Chen method enables to bound the variation distance between a and a random variables by . Furthermore, two Poisson random variables with respective parameters , have variation distance at most . This entails the bounds
Appendix H Proof of Lemma 4.7
It readily follows that both processes , are martingales. To establish uniform integrability we shall show that both processes have uniformly bounded variance. To that end we use the conditional variance formula
and the fact that the variance of a Poisson random variable equals its mean. Thus
This yields by the conditional variance formula
Since , it follows by induction that
The latter is uniformly bounded for hence the uniform integrability of under this condition.
It thus follows by and induction that
thus establishing uniform integrability of martingale . ∎
Appendix I Proof of Corollary 4.2
Convergence almost surely and in is guaranteed under uniform integrability by the martingale convergence theorem (). Finiteness of the limiting variable’s variance under uniform bounds on the variance is also standard; it follows from Fatou’s lemma. Convergence of the variances is established as follows. The limiting variable satisfies a distributional equation given by
where the , are i.i.d. and distributed as . The only solution for the variance of , apart from the degenerate solution , is then readily seen to be , which is indeed the limit of the variance of . The -convergence of to is then a direct consequence of Scheffé’s lemma. ∎
Appendix J Proof of Theorem 4.2
Let us now consider the second moment of the empirical sum:
We break it into two terms, the first being
Using Lemma 4.6 and Theorem 2.3, using similar arguments as before we can bound this term by
which clearly goes to zero as .
The convergence in probability (33) follows.
We now turn to establishing (34). We shall only consider the case of sign +, the other being handled similarly. Fix some arbitrarily small . Because is a continuity point of the distribution of , we can find two bounded Lipschitz-continuous functions , such that
we have that this empirical sum differs from the simpler one
The same argument can be applied to , eventually leading to the convergence in probability
As is arbitrary, this establishes (34).
Pick again an arbitrary , two pairs of Lipschitz-continuous functions and such that
The difference is of order and thus vanishes. We upper-bound the remaining terms by
Letting denote the Lipschitz-continuity constant for both and , this last display differs from
Because of the assumed convergence in probability , the first error term necessarily tends to zero in probability by Cauchy-Schwarz inequality. The second term is dealt with as mentioned in the proof of the previous lemma. Finally, using the coupling lemmas 4.6 and 4.1, by evaluating the first and second moments of (49), we obtain the convergence in probability
The latter term is then an upper bound on the of the empirical overlap. By the same approach, we obtain a lower bound of
on the of the overlap. These upper and lower bounds differ by at most , and differ from by at most . Since is arbitrary, this establishes the announced convergence in probability of the empirical overlap to quantity where
is strictly positive by our choice of . ∎
Appendix K Proof of Lemma 4.4
Appendix L Proof of Lemma 4.5
To establish the lower bound of (29), note that by Cauchy-Schwarz,
The lower bound in (30) is established similarly, from the inequality
We control the magnitude of this quantity in the tree model; using coupling we will then transpose the corresponding estimates to the original scenario.
Let then denote a branching process with offspring . The process of spins is then constructed by sampling uniformly the root’s spin, and then propagating spins in a Markovian fashion with transition matrix that is . Its eigenvalues are thus .
We evaluate its second moment conditionally on by writing as
We will use this formula, and further distinguish nodes according to their distance for . This yields
Note now that with high probability, we have the following evaluations
By coupling (techniques of Theorem 4.2 involving Tchebitchev inequality, based on the bounds of Theorem 2.3 and Lemmas 4.6 and 4.1) we thus have that with high probability,