Non-backtracking spectrum of random graphs: community detection and non-regular Ramanujan graphs
Charles Bordenave, Marc Lelarge, Laurent Massoulié
Introduction
Given a finite (simple, non-oriented) graph , several matrices of interest can be associated to , besides its adjacency matrix . In this work we are interested in the so-called non-backtracking matrix of , denoted by . It is indexed by the set of oriented edges in and defined by
where for any , we set , , . This matrix was introduced by Hashimoto . A non-backtracking walk is a directed path of directed edges of such that no edge is the inverse of its preceding edge. It is easily seen that for any , counts the number of non-backtracking walks of edges on starting with and ending with .
Our focus is the spectrum of , referred to in the sequel as the non-backtracking spectrum of , when is a sparse random graph. Specifically we shall characterize the asymptotic behavior of the leading eigenvalues and associated eigenvectors in the non-backtracking spectrum of sparse random graphs in the limit where .
The random graphs we consider are drawn according to the Stochastic Block Model, a generalization of Erdős-Rényi graphs due to Holland et al. . In this model nodes are partitioned into groups. An edge between two nodes is drawn with probability , where denotes the group node belongs to. Thus when the matrix is fixed the expected node degrees remain of order 1 as . We focus moreover on instances where the fraction of nodes in group converges to a limit as .
Our primary motivation stems from the problem of community detection, namely: how to estimate a clustering of graph nodes into groups close to the underlying blocks, based on the observation of such a random graph ? Decelle et al. conjectured a phase transition phenomenon on detectability, namely: the underlying block structure could be detected if and only if .
In the case of two communities with roughly equal sizes () and symmetric matrix , the negative part (impossibility of detection when ) was proven by Mossel et al . As for the positive part (feasibility of detection when ), it was conjectured in that the so-called belief propagation algorithm would succeed. Krzakala et al. then made their so-called “spectral redemption conjecture” according to which detection based on the second eigenvector of the non-backtracking matrix would succeed.
Recently a variant of the spectral redemption conjecture was proven by Massoulié : the spectrum of a matrix counting self-avoiding paths in allows to detect communities through thresholding of the second eigenvector. More recently and independently of , an alternative proof of the positive part of the conjecture in was given by Mossel et al. , based on an elaborate construction involving countings of non-backtracking paths in .
The two approaches of and to proving the positive part of the conjecture in , while both relying ultimately on properties of specific path counts, differ however in the following respects. The method in relies on a clear spectral separation property but its implementation is computationally expensive, as the counts of self-avoiding walks it relies upon take super-linear (though polynomial) time. The method in is computationally efficient as it runs in quasi-linear time, but the proof does not establish a spectral separation property. The other two methods conjectured to achieve successful reconstruction, namely belief propagation and analysis of non-backtracking spectrum, are computationally efficient and they are motivated by a clear intuition as described in the spectral redemption conjecture.
Our present work proves the spectral redemption conjecture. More generally by characterizing all the leading eigen-elements it determines the limits of community detection based on the non-backtracking spectrum in the presence of an arbitrary number of communities.
Ihara zeta function
Hashimoto introduced the matrix in the context of the Ihara zeta function . We have the identity
where is the Ihara zeta function of the graph, refer to . It follows that the poles of the Ihara zeta function are the reciprocal of the eigenvalues of . Our main results have thus consequences on the localization of the poles of the zeta function of random graphs.
Weak Ramanujan property
Our result also has an interpretation from the standpoint of Ramanujan graphs, introduced by Lubotzky et al. (see Murty for a recent survey). These are by definition -regular graphs such that the second largest modulus of its eigenvalues is at most . By a result of Alon and Boppana (see ) for fixed , -regular graphs on nodes must have their second largest eigenvalue at least as . Hence Ramanujan graphs are regular graphs with maximal spectral gap between the first and second eigenvalue moduli. A celebrated result of Friedman states that random -regular graphs achieve this lower bound with high probability as their number of nodes goes to infinity.
Lubotzky has proposed an extension of the definition of Ramanujan graphs to the non-regular case. Specifically, is Ramanujan according to this definition if and only if
where is the adjacency matrix of , its spectral radius, and the spectral radius of the adjacency operator on the universal covering tree of .
Using the analogy between the Ihara zeta function and the Riemann zeta function, Stark and Terras (see ) have defined a graph to satisfy the graph Riemann hypothesis if its non-backtracking matrix has no eigenvalues such that , where is the Perron-Frobenius eigenvalue of . Interestingly, a regular graph is Ramanujan if and only if it satisfies the graph Riemann hypothesis (see and ). Thus the graph Riemann hypothesis can also be seen as a generalization of the notion of Ramanujan graphs to the non-regular case, phrased in terms of non-backtracking spectra rather than on spectra of universal covers as in the definition of Lubotzky .
Our results imply that for fixed , Erdős-Rényi graphs have an associated non-backtracking matrix such that and all its other eigenvalues verify with high probability as . In this sense, Erdős-Rényi graphs asymptotically satisfy the graph Riemann hypothesis, itself is a plausible extension of the notion of Ramanujan graphs to the non-regular case. This may be seen as an analogue of Friedman’s Theorem in the context of Erdős-Rényi graphs. Similarly, for the Stochastic Block Model, our main result is analogue to recent results on the eigenvalues of random -lifts of base graphs, see . Interestingly, in , the methods developed in the present paper are adapted to lead to a new and simpler proof of Friedman’s Theorem and its extensions to random -lifts. The random graphs studied here will require a more delicate analysis.
Organization
The paper is organized as follows. We start in Section 2 with preliminaries on non-backtracking matrices. We state our main results in Section 3, namely Theorems 3 and 4 characterizing properties of non-backtracking spectra of Erdős-Rényi graphs and Stochastic Block Models respectively, and Theorem 5 establishing their consequence for community detection.
In Section 4, we provide the algebraic ingredients we shall need. Specifically we establish general bounds on the perturbation of eigenvalues and eigenvectors of not necessarily symmetric matrices that elaborate on the Bauer-Fike Theorem.
In Section 6 we detail another combinatorial argument needed in our proof, namely we establish bounds on the norms of the various matrices involved in the previous matrix expansion. The latter norm bounds are established by the trace method, adapting arguments due to Fűredi and Komlós .
Section 7 gives the proof strategy for Theorem 4 on non-backtracking spectra of Stochastic Block Models.
In Section 8 we establish convergence results on multitype branching processes that extend results of Kesten and Stigum . These are then used in Section 9 to characterize the local structure of the random graphs under study. Specifically we relate the local statistics of Stochastic Block Models to branching processes via coupling, and then establish weak laws of large numbers on these local statistics. These probabilistic arguments together with the previous algebraic and combinatorial arguments allow us to conclude the proofs of Theorems 3 and 4. Section 11 contains the proof of Theorem 5 on community detection.
Preliminaries on non-backtracking matrices
We set . A priori, is not a normal matrix. We are interested in its eigenvalues ordered non-increasingly, . The Perron-Frobenius Theorem implies notably that is a non-negative real. If is connected, is equal to the growth rate of the universal cover of (see Angel, Friedman and Hoory ).
Hence is a symmetric matrix (in mathematical physics, this type of symmetry is called PT-invariance, PT standing for parity-time). If , , are the eigenvalues of and , , is an orthonormal basis of eigenvectors, we deduce that
This is precisely the singular value decomposition of .
For example, for , it is a simple exercise to compute . We find that the eigenvalues of are , , and with multiplicity . In particular, the singular values of contain only information on the degree sequence of the underlying graph .
For large however, we may expect that the decomposition (3) carries more structural information on the graph (this will be further discussed in Subsection 2.2 below). This will be the underlying principle in the proof of our main results. For the moment, we simply note the following. Assume that is irreducible. From Perron-Frobenius theorem, if is the Perron eigenvector of , , then for any fixed,
A quantitative version of the above limits will be given in the forthcoming Proposition 7. Another consequence of (3) is that, for , and should be nearly orthogonal if these vectors converge as . Indeed, a heuristic computation gives
We will exploit this general phenomenon in the proof of our main results.
2 Chung, Cheeger and Alon-Boppana inequalities for non-backtracking matrices
The aim of this subsection is to advocate the use of non-backtracking matrices. Here, we discuss briefly candidate counterparts for irregular graphs of inequalities that are classical in the context of regular graphs. This subsection will not be used in the proof of our main results, it may be skipped.
The diameter bound of Chung gives an upper bound on the diameter of a regular graph in terms of its spectral gap. The following lemma, expressed in terms of the decomposition (4) of , is an analogue:
Let , be such that
Then, the graph distance between and is at most .
Observe that if then and are at most at distance . On the other hand from (4),
has absolute value at most from Cauchy-Schwartz inequality and the orthonormality of the , . Thus finally, ∎
Cheeger-type inequalities connect the expansion ratio (isoperimetry) of the graph and its spectral gap, for a survey see . For a subset of edges, we measure its volume by
By construction, our volume is normalized with . We say that is edge-symmetric if . For example the set of edges adjacent to a given subset of vertices is edge-symmetric. If are edge-symmetric, we define
Since is the number of non-backtracking walks of length starting with and ending with , measures a kind of conductance between and with a proximity range of radius . If , the scalar
can be thought of as the outer surface of a set . The -th expansion ratio of is then defined as
In (3), after reordering the eigenvalues of as , plays the role of the spectral gap in the classical Cheeger inequality. With this new convention, the following lemma is the analog of the easy part of Cheeger’s inequality for graphs.
The argument is standard. For simplicity, we drop the index . From Courant-Fisher min-max Theorem, we have
Let be edge-symmetric. We set
By construction and . Hence,
However, using the edge-symmetry of and ,
Also, using the singular value equation ,
Since, for , , it concludes the proof. ∎
The Alon-Boppana theorem gives a lower bound on the second largest eigenvalue of the adjacency matrix of a regular graph (see ). We conclude this paragraph with an elementary bound of this type. We introduce for ,
In words, is the number of non-backtracking walks of length starting with . As already pointed, if is irreducible, the Perron eigenvalue is the growth rate of the universal cover of the graph: for any ,
This last crude inequality gives a lower bound on the second largest singular value of .
Main results
We now state our results on the non-backtracking spectra of Erdős-Rényi graphs first, and Stochastic Block Models next.
Let be an Erdős-Rényi graph with parameters for some fixed parameter . Then, with probability tending to as , the eigenvalues of its non-backtracking matrix satisfy
Moreover the normalized Perron-Frobenius eigenvector associated to is asymptotically aligned with
Theorem 3 is illustrated by Figure 1. We conjecture that the lower bound holds, it is reasonable in view of Figure 1. We shall prove a weaker lower bound, see forthcoming Remark 12. It is also an interesting open problem to study the convergence of the empirical distribution of the eigenvalues of .
2 Stochastic Block Model
For integer , we set . We consider a random graph on the vertex set defined as follows. Each vertex is given a type from the set where the number of types is assumed fixed and the map is such that, for all ,
for some probability vector . For ease of notation, we often write in place of .
i.e. for some integer , has positive coefficients. In particular, is the Perron-Frobenius eigenvalue. It implies notably that for all , . We define by
(with ). Since , the matrix is diagonalizable. Let be an orthonormal basis of eigenvectors of such that . Then and are the left and right-eigenvectors associated to eigenvalue , , . We get
where the second identity comes from and .
We will make the further assumption that each vertex type has the same asymptotic average degree , i.e.,
This entails that is a stochastic matrix and we then have
We will also assume that a quantitative version of (7) holds, namely that for some ,
The random graph is usually called the stochastic block model (SBM for short) or inhomogeneous random graph, see Bollobás, Janson and Riordan and Holland, Laskey and Leinhardt . A popular case is when the map is itself random and are i.i.d. with distribution . In this case, with probability one, condition (13) is met for any .
If , then we have . Under condition (11), we have so that and .
If , and for all so that condition (11) is satisfied. We have and .
In particular, . Our main result is the following generalization of Theorem 3.
Let be an SBM as above such that hypotheses (8,11,13) hold. Then with probability tending to as ,
Moreover, if is a simple eigenvalue of for some , then a normalized eigenvector, say , of is asymptotically aligned with
It follows from this result that a non-trivial estimation of the node types is feasible on the basis of the eigenvectors provided . More precisely, for vertex type estimators based on the observed random graph , following Decelle et al. , define the overlap as the minimum over permutations of the quantity
We shall say that has asymptotic overlap if converges in probability to as grows. It has asymptotic positive overlap if for some , with probability tending to as grows. Note that an asymptotic overlap of zero is always achievable by assigning to each vertex the type that maximizes . In the case where all communities have asymptotically the same size, i.e. , zero overlap is also achieved by assigning types at random.
As conjectured in and proven in , in the setup of Example 2 with , the best possible overlap is with high probability when , i.e. when . Conversely, adapting the argument in , when , we have the following
The reason for the existence of the signing in the above statement is that we do not know a priori whether the vector or is asymptotically close to (15). In the simplest case, we will be able to estimate this sign and the vector will be equal to or and , .
3 Notation
We denote by the transpose of .
Given a (non-oriented) graph , we denote by a walk of length where each and for all . We also denote the concatenation of two walks and by . A walk is non-backtracking if for all , . A walk contains a cycle if there exists with .
Algebraic tools: Perturbation of Eigenvalues and Eigenvectors
One main tool in our analysis is the Bauer-Fike Theorem. The form given below elaborates on the usual statement of the Theorem which in general omits the second half.
(Bauer-Fike Theorem; see [4, Theorem VI.5.1]). Let be a diagonalizable matrix such that for some invertible matrix and diagonal matrix one has . Let be a perturbation matrix.
Then any eigenvalue of verifies
where is the -th diagonal entry of .
Denote by the right-hand side of (17) and the ball centered at with radius . Let be a set of indices such that
Then the number of eigenvalues of in is exactly .
The following proposition on perturbation of rank one matrices will be a basic ingredient to deduce from expressions like (3) quantitative versions of (5). It relies on the stability of eigenvalues and eigenvectors of matrices which are not too far from being normal ( is normal if and only if ; see ), and with a well separated spectrum.
with , and
Let , be the eigenvalues of , with . Then, has multiplicity one and we have
with . Moreover, there exists a unit eigenvector of with eigenvalue such that
The eigenvalues of are . We deduce that
In particular, the eigenvalue has multiplicity one in and thus has multiplicity one in . We now bound the difference between and . First, by assumption,
with .
Since for , we obtain from (19) that
where is some scalar and we have used . We now use the following general inequality
From the trianle inequality, . We then have
Finally, since has multiplicity one in , the eigenspace of for coincides with the eigenspace of for . This concludes the proof of Proposition 7. ∎
Assume there exist such that for all , , , and
where , (the minimum over the empty set being ). Let , , be the eigenvalues of with . Then, there exists a permutation such that for all ,
with . Moreover, if has multiplicity one in , is a simple eigenvalue and there exists a unit eigenvector of with eigenvalue such that
with .
Now, by assumption, is less than the minimal distance between the distinct eigenvalues of . We deduce from Theorem 6 applied to that there is a permutation such that
We now bound the difference between and . By assumption, Hence, arguing as in the proof of Proposition 7,
It now remains to control the eigenvector of such that has multiplicity one. First from (22), is a simple eigenvalue of . Let be a corresponding normed eigenvector of . Applying yields
Applying once more to (23) yields
Multiplying (23) by and subracting it to the previous display yields
Moreover, this implies upon dividing (24) by :
We conclude this paragraph with an elementary lemma on Gram-Schmidt orthonormalization process. It will be used to obtain vectors which are exactly orthogonal as in the assumptions of Proposition 8.
We prove the statement by induction. For , . For , we denote by the orthonormal projection of on the span of . We have
It is easy to check that for all . In particular, if , and then from (21), . ∎
Erdős-Rényi graph: proof strategy for Theorem 3
(if , we set ). The proof relies on the following two propositions.
Hence, Proposition 11 implies that w.h.p.
We may now apply Proposition 7. If , we find that w.h.p.
and the normalized Perron eigenvector of satisfies w.h.p.
We note that from [15, Theorem 3.3.16], we get that in (4),
The proof of Proposition 11 relies crucially on a matrix expansion given in Proposition 13, which extends the argument introduced in for matrices counting self-avoiding walks to the present setup where non-backtracking walks instead are considered. We now introduce some notation to state it.
where is the graph’s adjacency matrix. For integer , , we define as the set of non-backtracking walks of length starting from and ending at in the complete graph on the vertex set . We have that
and is the subset of tangle-free paths in . For , we set
The matrix can be thought of as an attempt to center the non-backtracking matrix when the underlying graph is tangle-free. We use the convention that a product over an empty set is equal to . We also set
Notably, is the projection on . We have the following telescopic sum decomposition.
3 Norm bounds
The following proposition will be established in Section 6 using path counting combinatorial arguments.
4 Proof of Proposition 11
Together with Propositions 13 and 14, we shall also need the next two results, established by local analysis in Section 9. In particular the forthcoming Lemma 30 implies that
For the Erdős-Rényi graph, Corollary 34 states the following.
Also, from (2), since ,
Hence, from Proposition 16 and norm bound (31), w.h.p.
Proof of Proposition 14: path count combinatorics
The proof will use a version of the trace method. For , we set
The symmetry (2) implies that . With the convention that , we get
where is the set of sequence of paths such that is non-backtracking tangle-free of length and for all ,
with the convention that .
where is the subset of where each non-oriented edge is visited at least twice. For each we associate the graph of visited vertices and edges. We set
We say that a path is canonical if and the vertices are first visited in order. will denote the set of canonical paths in . Every canonical path is isomorphic to paths in . We also have the following
Let be the set of canonical paths with and . We have
In order to upper bound we need to find an injective way to encode the canonical paths .
Let . We set , will be called an edge of . We explore the sequence in lexicographic order denoted by (that is and ). We think of the index as a time. For , we say that is a first time, if has not been seen before (that is for all ). If is a first time the edge is called a tree edge. By construction, the set of tree edges forms a tree with vertex set . The edges which are not in are called the excess edges of . Any vertex different from has its associated tree edge. It follows that the cardinal of excess edges is .
We build a first encoding of . If is not a first time, we say that is an important time and we mark the time by the vector , where is the next time that will not be a tree edge (by convention if remains on the tree for all ). Since there is a unique non-backtracking path between two vertices of a tree, we can reconstruct from the position of the important times and their mark. It gives rise to our first encoding.
The main issue with this encoding is that the number of important times could be large. We have however not used so far the hypothesis that each path is tangle free. To this end, we are going to partition important times into three categories, short cycling, long cycling and superfluous times. First consider the case where the -th path contains a cycle. For each , the first time such that is called a short cycling time. Let be such that . By the assumption of tangle-freeness, is the only cycle visited by . We denote by the first time after that in not an edge of (by convention if remains on ). We add the extra mark to the short cycling time. Important times with or are called long cycling times. The other important times are called superfluous. The key observation is that for each , the number of long cycling times is bounded by (since there is at most one cycle, no edge of can be seen twice outside those of , the coming from the fact that the short cycling time is an excess edge). Now consider the case where the -th path does not contain a cycle, then all important times are called long cycling times and their number is bounded by .
We now have our second encoding. We can reconstruct from the positions of the long cycling and the short cycling times and their marks. For each , there are at most short cycling time and long cycling times within if contains a cycle and short cycling time and long cycling times if does not contain a cycle. There are at most ways to position them (in time). There are at most different possible marks for a long cycling time and possible marks for a short cycling time. We deduce that
We use to obtain the announced bound. ∎
From (37) and Markov inequality, it suffices to prove that
For our choice of in (35), we have, for large enough,
In particular, the above geometric series converges and (38) follows. ∎
The bound (31) on we will now establish improves by a factor on the trivial estimate . Its proof parallels the argument used to show (30). We have
where is the set of pairs of paths such is non-backtracking and and each edge is visited at least twice. The only difference with defined above is that we do not require that . However, this last condition was not used in the proof of Lemma 17. It follows that the set of canonical paths in with distinct vertices and distinct edges has cardinal bounded by . Since the paths are connected and each edge appears at least twice, we have . As in the proof of (30), we get from (40) with
We conclude with Markov inequality and the union bound. ∎
with the convention that .
We define as the union of the graph , , . Note that the edges are not taken into account in . As usual, we set and . Since is tangled either (a) contains a cycle and is connected or (b) both and contain a cycle. In particular, all connected components of contain a cycle and it follows that
Taking the expectation in (42), we find that
Since is of order , we observe that the second statement in (33) improves by a factor the crude bound .
We only prove the second statement. The first statement is proved similarly. The argument again parallels that used to show (30). We take as in (35). We have
where is the set of sequence of paths defined below (36). From (47) and Markov inequality, it suffices to prove that
If is a canonical element of then and . Also, any is isomorphic to less that elements in . Moreover, we have that
For our choice of in (35), the above geometric series converges and (48) follows. ∎
Observe that unless , , or in which cases . We may thus decompose
where is the identity, and the non-zero entries of are equal and are the pairs such that , or . Thus
Stochastic Block Model : proof of Theorem 4
(in the above, if , we set ). We also define
Proposition 19 will follow from the local analysis done in Section 9. The next Proposition will be established in Section 10 using a matrix expansion together with norm bounds derived by combinatorial arguments parallel to the proof of Proposition 11 for the Erdős-Rényi graph.
We now check that the two preceding propositions imply Theorem 4. We consider obtained by the Gram-Schmidt orthonormalization of . By Lemma 9 and Proposition 19(iv), w.h.p. and for all ,
Since , from Proposition 19(i)-(iii), we find by induction on , w.h.p. for all ,
Consequently, from Proposition 20, we have w.h.p.
Hence, Proposition 20 and (53)-(54) imply that w.h.p.
We are now in position to apply Proposition 8. From (52), the statement of Proposition 19(ii) also holds with replaced by . It readily implies Theorem 4.
Controls on the growth of Poisson multi-type branching processes
In this section we derive results for multi-type Galton-Watson branching processes with Poisson offspring that will be crucial for the local analysis of Section 9. We refer to Section 3.2 for the notation used below.
We include the proof for later use. For , we have
so that, as ,
It follows easily that is an -martingale with mean . From Doob’s martingale convergence Theorem, the statement will follow if we prove that for some and all integer ,
To this end, we denote by the number of individuals of type in the -th generation which descend from a particle of type in the -th generation. Thus . We then have
where in the penultimate equality we used the fact that the variance of a Poisson random variable equals its mean. It follows that
The Perron-Frobenius Theorem implies that the matrix converges elementwise to as . In our case, , and . Consequently, for some ,
Since the above series is convergent. ∎
We also need to control the behavior of for . The next result is contained in Kesten and Stigum [17, Theorem 2.4].
Assume . For define
Then converges weakly to a random variable with finite positive variance.
Note that Theorem 2.4 in expresses as a mixture of Gaussian variables. The normalization in the case comes from the fact that is diagonalizable, and hence all its Jordan blocks are of size .
2 Quantitative versions of the Kesten-Stigum Theorems
We will also need probabilistic bounds on the growth of the total population at generation defined as
Assume . There exist such that for all ,
It is straightforward to check that converges, hence there exist constants such that for all ,
where we have set . In particular, on the event , we have
where we have used the existence of some such that for , one has . Finally by our choice of and (57), if ,
Hence we deduce the statement of the lemma for some (suitably redefined) constants . ∎
A key ingredient in the subsequent analysis will be the following result, which bounds by how much the growth of processes deviates from a purely deterministic exponential growth.
and for all , all ,
with . Similarly, for one has
where by convention for . Let . Then for any ,
In particular, for any , letting , we have, if ,
Consider first the case where . As there exists such that for all , , we get
Consider now the case where . As there exists such that, for all , , we get
since implies that from (11). Thus there exists some such that, for any ,
If , then and the same bound trivially holds. We thus obtain the existence of constants such that, for any ,
Now, from (55), for any , ,
From Equation (59) and Lemma 23, for large enough, with probability at least , we have for all that and . On this event, we get, for ,
where at the last line, we used that and for . Similarly, on the same event, for , from (55), for and ,
Using now , we have .
3 A cross-generation functional
where and is the centered martingale defined in Theorem 21.
We now prove the statements of the theorem for . We find similarly
Since , . We write
The last statement of Theorem 24 and (64) give
For any , there exists a constant such that for any ,
We use twice the bound . We find
for some new constant depending on and . We deduce that for some new ,
4 Decorrelation in homogeneous Galton-Watson branching processes
Assume that the spin at the root node is distributed according to the stationary distribution . Conditionally on the branching tree , the process of spins attached to the vertices of the tree is a Markov random field. For any two neighbor nodes of and any , one has the following transition probabilities
For any two (possibly equal) nodes of , any , , one has
By standard properties of independent Poisson random variables, conditionally on the spin and on the number of children of the root , then the spins of each of the children of the root are i.i.d., distributed according to . Moreover, is the stationary distribution for this transition kernel, which is reversible, as follows from the relation and the facts that is symmetric together with the assumption (11) that the column sums of all coincide with . The Markov random field property and the expression of the transition kernel follow by iterating this argument.
We now evaluate the conditional expectation in (67). Let denote the unique path in connecting nodes and . Let denote the -field generated by and the spin variables . We then have by the Markov random field property
where we used the fact that is a left-eigenvector of associated with eigenvalue . Thus
where the last equality follows from -orthogonality (9) between vectors and for . ∎
by Lemma 27, Equation(67). This concludes the proof. ∎
Local structure of random graphs
We now derive the necessary controls on the local structure of the SBM random graphs under consideration. Coupling results will allow to bound the deviation of their local structure from branching processes. Asymptotic independence between local neighborhoods of distinct nodes will then be used to establish weak laws of large numbers.
For , we define the ”oriented” distance
Then, for integer , we introduce the vector where, for ,
The vector counts the types at oriented distance from .
We shall denote by the set of vertices at distance from . We introduce
where at the last line we have used Assumption (11)-(13). Central to our local study is the classical exploration process of the neighborhood of which starts with and at stage , if is not empty, takes a vertex in at minimal distance from , say , reveals its neighbors, say , in , and update . We will denote by the filtration generated by and by the set of discovered vertices at time . We start by establishing a rough bound on the growth of .
There exist such that for all and for any ,
Consequently, for any , there exists such that
To prove the first statement, observe that, in the exploration process, given , if has type , the number of neighbors of in is upper bounded stochastically by
We now check that the random graph is locally tree-like. For and integer , we denote by the rooted subgraph of rooted at , spanned by the vertices at distance at most from . If , we set where is the graph with the edge removed (if it was present in ).
where and we have used again Lemma 29. ∎
We conclude this subsection with a coupling of the process and a multi-type Galton-Watson tree. Recall that for probability measures on a countable set , the total variation distance is given by
where the minimum is over all coupling such that , .
From (13), the latter is . We thus have proved that (73) holds for some new . ∎
We will use the following corollary of Proposition 31.
2 Geometric growth of linear functions of non-backtracking walks
The next proposition asserts that for most , grows nearly geometrically in with rate up to an error of order .
and statement (i) follows from Corollary 32. For statement (ii), we simply use that w.h.p. is tangle free (by Lemma 30), hence, there are at most two non-backtracking walks of length from to any . We get
However, by Lemma 29 w.h.p. for all and all , . ∎
From Cauchy-Schwartz inequality, the first term is bounded w.h.p. by
3 Laws of large numbers for local functions
We first prove weak laws of large numbers for general local functionals of SBM random graphs that will then be applied to specific functionals of interest.
Hence, for any , for some ,
Now, for , let , where is the edge set of . The vector is an independent vector and for some function ,
We also define as the graph with edge set . We set
where is the random rooted tree associated to the Galton-Watson branching process defined in Section 8 started from and has distribution .
In view of Proposition 35 and Jensen’s inequality, it is sufficient to prove that
3.2 Law of large numbers for specific local functions
For any , there exists such that, in probability,
For any , there exists such that w.h.p.
Let , , be the Galton-Watson branching process defined in Section 8 started from and has distribution . We denote by the associated random rooted tree. If , by Theorem 21, for some ,
It proves statement (i) of the proposition.
For statement (ii), we use Theorem 22 instead. We denote by the limit martingale in Theorem 22 when . If , we find similarly that for any ,
Since , the right hand side is . ∎
For any , there exists such that w.h.p.
For any , there exists such that w.h.p.
Let , , be the Galton-Watson branching process defined in Section 8 started from and has distribution . We denote by the associated random rooted tree. For any , by Theorem 25, for some positive constant ,
On the other hand, Theorem 25 also ensures that for some , for ,
Since , we have the rough bound
Since , the right hand side goes to and statement (i) follows from (79). Statement (ii) is proved similarly using (80). For statement (iii), we use the above computation, together with Theorem 28 and (9). It gives the claimed bound. ∎
4 Proof of Proposition 19
where at the last line, we have used Lemma 30. Since , it proves the first statement.
On the other hand, from Proposition 37(i), w.h.p.
The conclusion follows since .∎
All ingredients are finally gathered to prove Proposition 19.
Proof of (i). Let . By definition
From Proposition 37(i) and Proposition 38(i) respectively, for some positive constants , w.h.p.
It remains to use Lemma 39 and the assumption . We find w.h.p.
Proof of (ii). Let . Since is an isometry,
Proof of (iv). Let . From (2) and (82)-(83) for , w.h.p.
In addition, Equations (82)-(83), Proposition 37(i) and Lemma 39 entail that w.h.p.
From Proposition 37(iii), we get if that w.h.p.
Proof of (v). Let . From (2) and (82), w.h.p.
As in (84), we use . We find from (82), Proposition 37(i) and Lemma 39 that w.h.p.
Proof of (vi). We again use the same argument. Let . From (82),
Recall that . Then from (82), Proposition 38(i) and Lemma 39, we obtain w.h.p.
Norm of non-backtracking matrices
In this section we prove Proposition 20. The argument used for Erdős-Rényi graphs extends rather directly to the stochastic block model.
In this paragraph, we essentially repeat the argument of subsection 5.2. We define, for , the centered variable,
We now re-define as the weighted non-backtracking matrix on the complete graph on , for ,
where represents the non-backtracking property, and . We also introduce
2 Proof of Proposition 20
The proof of Proposition 20 parallels that of Proposition 11.
First, the expressions (39)-(46)-(49) now depend on the types of the vertices involved in a path. We treat for example the case of (39) needed for Proposition 30. We claim that if is a canonical path with edges and vertices,
With (87) in place of (39), using Lemma 17 we then bound given by (38) as follows
The second difference lies in the definition of the matrix . For the stochastic block model, from (10), the entry is zero unless , , or . Moreover, the non-zero entries are bounded by . Then the argument of the proof of bound (34) carries over easily.
With bounds (30)-(32)-(33)-(34) available for the stochastic block model, the remainder of the proof of Proposition 20 repeats the argument of subsection 5.4.
Stochastic Block Model : proof of Theorem 5
The strategy of proof is based on the following lemma which asserts that the existence of a Boolean function non constant over the classes ensures the existence of an estimation with asymptotically positive overlap.
Assume that and there exists a function of the graph such that in probability, for any ,
where is not a constant function (there exists such that ). Let be a partition of , such that and
Then the following estimation procedure yields asymptotically positive overlap with permutation in (16) equal to the identity : assign to each vertex a label picked uniformly at random from if and from if .
Observe that the existence of a non trivial partition satisfying (88) is implied by the assumption that is not constant.
Summing over all , in probability,
where is the left hand side of (88). Similarly, if is the right hand side of (88), in probability,
where the strict inequality comes from (88). ∎
Our aim is now to find a non constant functions over the classes which depends on the eigenvector . To this end, we introduce a new random variable, for ,
where and was defined in Proposition 38.
Let , and be as in Lemma 41. For any continuity point of the distribution of , in ,
From Proposition 38(i) we find that, in probability,
Hence, from the triangle inequality, w.h.p.
We deduce from Cauchy-Schwartz inequality that w.h.p.
Since is a continuity point of , it is then a routine to deduce Lemma 42 from Lemma 41. ∎
All ingredients are now gathered to prove Theorem 5. We fix as in Theorem 5 and let be as above. We set
Write, for , , . Note that by definition of . Also by the orthogonality relation (9) between and we obtain , so that . For , we shall denote by the random variable obtained as a mixture of the for , with weights . Note that has mean .
We may now come back to the eigenvector in Theorem 5. We set in Theorem 5. For some unknown sign we have
We first assume that is known and . We consider the function
From (90), (88) is satisfied with and we can apply Lemma 40 to obtain an asymptotically positive overlap. We note that the sign is easy to estimate consistently if the random variable which is the mixture of the with weights is not symmetric. Indeed, in this case, for some bounded continuous function ,
Then, from (89), given , in probability,
takes a different value for and .
Another simple case is if defined above is symmetric and . If this occurs, and have the same distribution. We consider the function and the estimation where a vertex such that receives a uniform label in , and otherwise a label uniform in . By Lemma 40 applied to , if , we obtain an positive overlap with and the permutation in (16) equal to the identity. If , we obtain an positive overlap with and any permutation in (16) such that ,
We consider the function and the estimation where if and uniform on otherwise. We apply Lemma 40 to and the partition , . We obtain an asymptotically positive overlap for any permutation in (16) such that .
Hence, it follows from (90) that (88) is satisfied with . We can then apply Lemma 40 to obtain an asymptotically positive overlap.