Non-backtracking spectrum of random graphs: community detection and non-regular Ramanujan graphs

Charles Bordenave, Marc Lelarge, Laurent Massoulié

Introduction

Given a finite (simple, non-oriented) graph G=(V,E)G=(V,E), several matrices of interest can be associated to GG, besides its adjacency matrix A=(1 ⁣ ⁣I{u,v}E)u,vVA=(1\!\!{\sf I}_{\{u,v\}\in E})_{u,v\in V}. In this work we are interested in the so-called non-backtracking matrix of GG, denoted by BB. It is indexed by the set E={(u,v):{u,v}E}\vec{E}=\{(u,v):\{u,v\}\in E\} of oriented edges in EE and defined by

where for any e=(u,v)Ee=(u,v)\in\vec{E}, we set e1=ue_{1}=u, e2=ve_{2}=v, e1=(v,u)e^{-1}=(v,u). This matrix was introduced by Hashimoto . A non-backtracking walk is a directed path of directed edges of GG such that no edge is the inverse of its preceding edge. It is easily seen that for any k1k\geq 1, BefkB^{k}_{ef} counts the number of non-backtracking walks of k+1k+1 edges on GG starting with ee and ending with ff.

Our focus is the spectrum of BB, referred to in the sequel as the non-backtracking spectrum of GG, when GG is a sparse random graph. Specifically we shall characterize the asymptotic behavior of the leading eigenvalues and associated eigenvectors in the non-backtracking spectrum of sparse random graphs in the limit nn\to\infty where n=Vn=|V|.

The random graphs we consider are drawn according to the Stochastic Block Model, a generalization of Erdős-Rényi graphs due to Holland et al. . In this model nodes vVv\in V are partitioned into rr groups. An edge between two nodes u,vu,v is drawn with probability W(σ(u),σ(v))/nW(\sigma(u),\sigma(v))/n, where σ(u)[r]\sigma(u)\in[r] denotes the group node uu belongs to. Thus when the r×rr\times r matrix WW is fixed the expected node degrees remain of order 1 as nn\to\infty. We focus moreover on instances where the fraction of nodes in group ii converges to a limit π(i)\pi(i) as nn\to\infty.

Our primary motivation stems from the problem of community detection, namely: how to estimate a clustering of graph nodes into groups close to the underlying blocks, based on the observation of such a random graph GG? Decelle et al. conjectured a phase transition phenomenon on detectability, namely: the underlying block structure could be detected if and only if μ2>α|\mu_{2}|>\sqrt{\alpha}.

In the case of two communities with roughly equal sizes (π(1)=π(2)=1/2\pi(1)=\pi(2)=1/2) and symmetric matrix WW, the negative part (impossibility of detection when μ2α|\mu_{2}|\leq\sqrt{\alpha}) was proven by Mossel et al . As for the positive part (feasibility of detection when μ2>α|\mu_{2}|>\sqrt{\alpha}), it was conjectured in that the so-called belief propagation algorithm would succeed. Krzakala et al. then made their so-called “spectral redemption conjecture” according to which detection based on the second eigenvector of the non-backtracking matrix BB would succeed.

Recently a variant of the spectral redemption conjecture was proven by Massoulié : the spectrum of a matrix counting self-avoiding paths in GG allows to detect communities through thresholding of the second eigenvector. More recently and independently of , an alternative proof of the positive part of the conjecture in was given by Mossel et al. , based on an elaborate construction involving countings of non-backtracking paths in GG.

The two approaches of and to proving the positive part of the conjecture in , while both relying ultimately on properties of specific path counts, differ however in the following respects. The method in relies on a clear spectral separation property but its implementation is computationally expensive, as the counts of self-avoiding walks it relies upon take super-linear (though polynomial) time. The method in is computationally efficient as it runs in quasi-linear time, but the proof does not establish a spectral separation property. The other two methods conjectured to achieve successful reconstruction, namely belief propagation and analysis of non-backtracking spectrum, are computationally efficient and they are motivated by a clear intuition as described in the spectral redemption conjecture.

Our present work proves the spectral redemption conjecture. More generally by characterizing all the leading eigen-elements it determines the limits of community detection based on the non-backtracking spectrum in the presence of an arbitrary number of communities.

Ihara zeta function

Hashimoto introduced the matrix BB in the context of the Ihara zeta function . We have the identity

where ζG\zeta_{G} is the Ihara zeta function of the graph, refer to . It follows that the poles of the Ihara zeta function are the reciprocal of the eigenvalues of BB. Our main results have thus consequences on the localization of the poles of the zeta function of random graphs.

Weak Ramanujan property

Our result also has an interpretation from the standpoint of Ramanujan graphs, introduced by Lubotzky et al. (see Murty for a recent survey). These are by definition kk-regular graphs such that the second largest modulus of its eigenvalues is at most 2k12\sqrt{k-1}. By a result of Alon and Boppana (see ) for fixed kk, kk-regular graphs on nn nodes must have their second largest eigenvalue at least 2k1o(1)2\sqrt{k-1}-o(1) as nn\to\infty. Hence Ramanujan graphs are regular graphs with maximal spectral gap between the first and second eigenvalue moduli. A celebrated result of Friedman states that random kk-regular graphs achieve this lower bound with high probability as their number of nodes nn goes to infinity.

Lubotzky has proposed an extension of the definition of Ramanujan graphs to the non-regular case. Specifically, GG is Ramanujan according to this definition if and only if

where AA is the adjacency matrix of GG, ρ\rho its spectral radius, and σ\sigma the spectral radius of the adjacency operator on the universal covering tree of GG.

Using the analogy between the Ihara zeta function and the Riemann zeta function, Stark and Terras (see ) have defined a graph to satisfy the graph Riemann hypothesis if its non-backtracking matrix BB has no eigenvalues λ\lambda such that λ(ρB,ρB)|\lambda|\in(\sqrt{\rho_{B}},\rho_{B}), where ρB\rho_{B} is the Perron-Frobenius eigenvalue of BB. Interestingly, a regular graph GG is Ramanujan if and only if it satisfies the graph Riemann hypothesis (see and ). Thus the graph Riemann hypothesis can also be seen as a generalization of the notion of Ramanujan graphs to the non-regular case, phrased in terms of non-backtracking spectra rather than on spectra of universal covers as in the definition of Lubotzky .

Our results imply that for fixed α>1\alpha>1, Erdős-Rényi graphs G(n,α/n)\mathcal{G}(n,\alpha/n) have an associated non-backtracking matrix BB such that ρBα\rho_{B}\sim\alpha and all its other eigenvalues λ\lambda verify λα+o(1)|\lambda|\leq\sqrt{\alpha}+o(1) with high probability as nn\to\infty. In this sense, Erdős-Rényi graphs asymptotically satisfy the graph Riemann hypothesis, itself is a plausible extension of the notion of Ramanujan graphs to the non-regular case. This may be seen as an analogue of Friedman’s Theorem in the context of Erdős-Rényi graphs. Similarly, for the Stochastic Block Model, our main result is analogue to recent results on the eigenvalues of random nn-lifts of base graphs, see . Interestingly, in , the methods developed in the present paper are adapted to lead to a new and simpler proof of Friedman’s Theorem and its extensions to random nn-lifts. The random graphs studied here will require a more delicate analysis.

Organization

The paper is organized as follows. We start in Section 2 with preliminaries on non-backtracking matrices. We state our main results in Section 3, namely Theorems 3 and 4 characterizing properties of non-backtracking spectra of Erdős-Rényi graphs and Stochastic Block Models respectively, and Theorem 5 establishing their consequence for community detection.

In Section 4, we provide the algebraic ingredients we shall need. Specifically we establish general bounds on the perturbation of eigenvalues and eigenvectors of not necessarily symmetric matrices that elaborate on the Bauer-Fike Theorem.

In Section 6 we detail another combinatorial argument needed in our proof, namely we establish bounds on the norms of the various matrices involved in the previous matrix expansion. The latter norm bounds are established by the trace method, adapting arguments due to Fűredi and Komlós .

Section 7 gives the proof strategy for Theorem 4 on non-backtracking spectra of Stochastic Block Models.

In Section 8 we establish convergence results on multitype branching processes that extend results of Kesten and Stigum . These are then used in Section 9 to characterize the local structure of the random graphs under study. Specifically we relate the local statistics of Stochastic Block Models to branching processes via coupling, and then establish weak laws of large numbers on these local statistics. These probabilistic arguments together with the previous algebraic and combinatorial arguments allow us to conclude the proofs of Theorems 3 and 4. Section 11 contains the proof of Theorem 5 on community detection.

Preliminaries on non-backtracking matrices

We set m=Em=|\vec{E}|. A priori, BB is not a normal matrix. We are interested in its eigenvalues λ1(B),,λm(B)\lambda_{1}(B),\ldots,\lambda_{m}(B) ordered non-increasingly, λ1(B)λm(B)|\lambda_{1}(B)|\geq\ldots\geq|\lambda_{m}(B)|. The Perron-Frobenius Theorem implies notably that λ1(B)\lambda_{1}(B) is a non-negative real. If GG is connected, λ1(B)\lambda_{1}(B) is equal to the growth rate of the universal cover of GG (see Angel, Friedman and Hoory ).

Hence BkPB^{k}P is a symmetric matrix (in mathematical physics, this type of symmetry is called PT-invariance, PT standing for parity-time). If (σj,k)(\sigma_{j,k}), 1jm1\leq j\leq m, are the eigenvalues of BkPB^{k}P and (xj,k)(x_{j,k}), 1jm1\leq j\leq m, is an orthonormal basis of eigenvectors, we deduce that

This is precisely the singular value decomposition of BkB^{k}.

For example, for k=1k=1, it is a simple exercise to compute (σj,1)1jm(\sigma_{j,1})_{1\leq j\leq m}. We find that the eigenvalues of BPBP are (deg(v)1)(\deg(v)-1), 1vn1\leq v\leq n, and 1-1 with multiplicity mnm-n. In particular, the singular values of BB contain only information on the degree sequence of the underlying graph GG.

For large kk however, we may expect that the decomposition (3) carries more structural information on the graph (this will be further discussed in Subsection 2.2 below). This will be the underlying principle in the proof of our main results. For the moment, we simply note the following. Assume that BB is irreducible. From Perron-Frobenius theorem, if ξ\xi is the Perron eigenvector of BB, ξ=1\|\xi\|=1, then for any nn fixed,

A quantitative version of the above limits will be given in the forthcoming Proposition 7. Another consequence of (3) is that, for iji\neq j, xi,kx_{i,k} and xˇj,k\check{x}_{j,k} should be nearly orthogonal if these vectors converge as kk\to\infty. Indeed, a heuristic computation gives

We will exploit this general phenomenon in the proof of our main results.

2 Chung, Cheeger and Alon-Boppana inequalities for non-backtracking matrices

The aim of this subsection is to advocate the use of non-backtracking matrices. Here, we discuss briefly candidate counterparts for irregular graphs of inequalities that are classical in the context of regular graphs. This subsection will not be used in the proof of our main results, it may be skipped.

The diameter bound of Chung gives an upper bound on the diameter of a regular graph in terms of its spectral gap. The following lemma, expressed in terms of the decomposition (4) of BkB^{k}, is an analogue:

Let e=(u,u)e=(u,u^{\prime}), f=(v,v)Ef=(v,v^{\prime})\in\vec{E} be such that

Then, the graph distance between uu and vv is at most k+1k+1.

Observe that if Bef1k>0B^{k}_{ef^{-1}}>0 then uu and vv are at most at distance k+1k+1. On the other hand from (4),

has absolute value at most s2,ks_{2,k} from Cauchy-Schwartz inequality and the orthonormality of the xj,kx_{j,k}, 1jm1\leq j\leq m. Thus finally, Bef1ks1,kx1,k(e)x1,k(f)s2.B^{k}_{ef^{-1}}\geq s_{1,k}x_{1,k}(e)x_{1,k}(f)-s_{2}.

Cheeger-type inequalities connect the expansion ratio (isoperimetry) of the graph and its spectral gap, for a survey see . For a subset XEX\subset\vec{E} of edges, we measure its volume by

By construction, our volume is normalized with Vk(E)=1V_{k}(\vec{E})=1. We say that XEX\subset\vec{E} is edge-symmetric if Xˇ=X\check{X}=X. For example the set of edges adjacent to a given subset of vertices is edge-symmetric. If X,YX,Y are edge-symmetric, we define

Since BefkB^{k}_{ef} is the number of non-backtracking walks of length k+1k+1 starting with ee and ending with ff, Ek(X,Y)E_{k}(X,Y) measures a kind of conductance between XX and YY with a proximity range of radius k+1k+1. If Xc=X\EX^{c}=X\backslash\vec{E}, the scalar

can be thought of as the outer surface of a set XX. The kk-th expansion ratio of GG is then defined as

In (3), after reordering the eigenvalues of BkPB^{k}P as σ1,kσ2,kσm,k\sigma_{1,k}\geq\sigma_{2,k}\geq\ldots\geq\sigma_{m,k}, σ1,kσ2,k\sigma_{1,k}-\sigma_{2,k} plays the role of the spectral gap in the classical Cheeger inequality. With this new convention, the following lemma is the analog of the easy part of Cheeger’s inequality for graphs.

The argument is standard. For simplicity, we drop the index kk. From Courant-Fisher min-max Theorem, we have

Let XEX\subset\vec{E} be edge-symmetric. We set

By construction x,x1=0\langle x,x_{1}\rangle=0 and x2=1/V(X)+1/V(Xc)\|x\|^{2}=1/V(X)+1/V(X^{c}). Hence,

However, using the edge-symmetry of XX and XcX^{c},

Also, using the singular value equation Bkxˇ1=σ1x1B^{k}\check{x}_{1}=\sigma_{1}x_{1},

Since, for x,x>0x,x^{\prime}>0, 1/x+1/x2/(xx)1/x+1/x^{\prime}\leq 2/(x\wedge x^{\prime}), it concludes the proof. ∎

The Alon-Boppana theorem gives a lower bound on the second largest eigenvalue of the adjacency matrix of a regular graph (see ). We conclude this paragraph with an elementary bound of this type. We introduce for eEe\in\vec{E},

In words, Sk(e)S_{k}(e) is the number of non-backtracking walks of length k+1k+1 starting with ee. As already pointed, if BB is irreducible, the Perron eigenvalue is the growth rate of the universal cover of the graph: for any eEe\in\vec{E},

This last crude inequality gives a lower bound on the second largest singular value of BkB^{k}.

Main results

We now state our results on the non-backtracking spectra of Erdős-Rényi graphs first, and Stochastic Block Models next.

Let GG be an Erdős-Rényi graph with parameters (n,α/n)(n,\alpha/n) for some fixed parameter α>1\alpha>1. Then, with probability tending to 11 as nn\to\infty, the eigenvalues λi(B)\lambda_{i}(B) of its non-backtracking matrix BB satisfy

Moreover the normalized Perron-Frobenius eigenvector associated to λ1(B)\lambda_{1}(B) is asymptotically aligned with

Theorem 3 is illustrated by Figure 1. We conjecture that the lower bound λ2(B)αo(1)|\lambda_{2}(B)|\geq\sqrt{\alpha}-o(1) holds, it is reasonable in view of Figure 1. We shall prove a weaker lower bound, see forthcoming Remark 12. It is also an interesting open problem to study the convergence of the empirical distribution of the eigenvalues of BB.

2 Stochastic Block Model

For integer k1k\geq 1, we set [k]={1,,k}[k]=\{1,\ldots,k\}. We consider a random graph G=(V,E)G=(V,E) on the vertex set V=[n]V=[n] defined as follows. Each vertex v[n]v\in[n] is given a type σn(v)\sigma_{n}(v) from the set [r][r] where the number of types rr is assumed fixed and the map σn:[n][r]\sigma_{n}:[n]\to[r] is such that, for all i[r]i\in[r],

for some probability vector π=(π(1),,π(r))\pi=(\pi(1),\cdots,\pi(r)). For ease of notation, we often write σ\sigma in place of σn\sigma_{n}.

i.e. for some integer k1k\geq 1, MkM^{k} has positive coefficients. In particular, μ1>maxk2μk\mu_{1}>\max_{k\geq 2}|\mu_{k}| is the Perron-Frobenius eigenvalue. It implies notably that for all i[r]i\in[r], π(i)>0\pi(i)>0. We define r0r_{0} by

(with μr+1=0\mu_{r+1}=0). Since M=Π1/2SΠ1/2M=\Pi^{1/2}S\Pi^{-1/2}, the matrix MM is diagonalizable. Let {ui}i[r]\{u_{i}\}_{i\in[r]} be an orthonormal basis of eigenvectors of SS such that Sui=μiuiSu_{i}=\mu_{i}u_{i}. Then ϕi:=Π1/2ui\phi_{i}:=\Pi^{-1/2}u_{i} and ψi=Π1/2ui\psi_{i}=\Pi^{1/2}u_{i} are the left and right-eigenvectors associated to eigenvalue μi\mu_{i}, ϕiM=μiϕi\phi_{i}^{*}M=\mu_{i}\phi^{*}_{i}, Mψi=μiψiM\psi_{i}=\mu_{i}\psi_{i}. We get

where the second identity comes from ψk=Πϕk\psi_{k}=\Pi\phi_{k} and W=Π1MW=\Pi^{-1}M.

We will make the further assumption that each vertex type has the same asymptotic average degree α>1\alpha>1, i.e.,

This entails that M/αM^{*}/\alpha is a stochastic matrix and we then have

We will also assume that a quantitative version of (7) holds, namely that for some γ(0,1]\gamma\in(0,1],

The random graph GG is usually called the stochastic block model (SBM for short) or inhomogeneous random graph, see Bollobás, Janson and Riordan and Holland, Laskey and Leinhardt . A popular case is when the map σ\sigma is itself random and σ(v)\sigma(v) are i.i.d. with distribution (π(1),,π(r))(\pi(1),\cdots,\pi(r)). In this case, with probability one, condition (13) is met for any γ<1/2\gamma<1/2.

If r=2r=2, then we have π(1)=1π(2)\pi(1)=1-\pi(2). Under condition (11), we have W22=(π(1)W11+(12π(1))W12)/(1π(1))W_{22}=(\pi(1)W_{11}+(1-2\pi(1))W_{12})/(1-\pi(1)) so that μ1=α=π(1)W11+(1π(1))W12\mu_{1}=\alpha=\pi(1)W_{11}+(1-\pi(1))W_{12} and μ2=π(1)(W11W12)\mu_{2}=\pi(1)\left(W_{11}-W_{12}\right).

If r2r\geq 2, π(i)=1/r\pi(i)=1/r and Wii=ab=WijW_{ii}=a\neq b=W_{ij} for all iji\neq j so that condition (11) is satisfied. We have μ1=α=(a+(r1)b)/r\mu_{1}=\alpha=(a+(r-1)b)/r and μ2==μr=(ab)/r\mu_{2}=\ldots=\mu_{r}=(a-b)/r.

In particular, χ1=χ\chi_{1}=\chi. Our main result is the following generalization of Theorem 3.

Let GG be an SBM as above such that hypotheses (8,11,13) hold. Then with probability tending to 11 as nn\to\infty,

Moreover, if μk\mu_{k} is a simple eigenvalue of MM for some k[r0]k\in[r_{0}], then a normalized eigenvector, say ξk\xi_{k}, of λk(B)\lambda_{k}(B) is asymptotically aligned with

It follows from this result that a non-trivial estimation of the node types σ(v)\sigma(v) is feasible on the basis of the eigenvectors {ξk}2kr0\{\xi_{k}\}_{2\leq k\leq r_{0}} provided r0>1r_{0}>1. More precisely, for vertex type estimators σ^(v):[n][r]\hat{\sigma}(v):[n]\to[r] based on the observed random graph GG, following Decelle et al. , define the overlap ov(σ^,σ)\hbox{ov}(\hat{\sigma},\sigma) as the minimum over permutations p:[r][r]p:[r]\to[r] of the quantity

We shall say that σ^\hat{\sigma} has asymptotic overlap δ\delta if ov(σ^,σ)\hbox{ov}(\hat{\sigma},\sigma) converges in probability to δ\delta as nn grows. It has asymptotic positive overlap if for some δ>0\delta>0, ov(σ^,σ)>δ\hbox{ov}(\hat{\sigma},\sigma)>\delta with probability tending to 11 as nn grows. Note that an asymptotic overlap of zero is always achievable by assigning to each vertex the type kk^{*} that maximizes π(k)\pi(k). In the case where all communities have asymptotically the same size, i.e. π(i)1/r\pi(i)\equiv 1/r, zero overlap is also achieved by assigning types at random.

As conjectured in and proven in , in the setup of Example 2 with r=2r=2, the best possible overlap is o(1)o(1) with high probability when r0=1r_{0}=1, i.e. when μ2μ1\mu_{2}\leq\sqrt{\mu_{1}}. Conversely, adapting the argument in , when r0>1r_{0}>1, we have the following

The reason for the existence of the signing ω{1,1}V\omega\in\{-1,1\}^{V} in the above statement is that we do not know a priori whether the vector ξk\xi_{k} or ξk-\xi_{k} is asymptotically close to (15). In the simplest case, we will be able to estimate this sign and the vector ω\omega will be equal to 1 ⁣ ⁣I-1\!\!{\sf I} or 1 ⁣ ⁣I1\!\!{\sf I} and I+={i[r]:ϕk(i)>0}I^{+}=\{i\in[r]:\phi_{k}(i)>0\}, I=[r]\I+I^{-}=[r]\backslash I^{+}.

3 Notation

We denote by CC^{*} the transpose of CC.

Given a (non-oriented) graph G=(V,E)G=(V,E), we denote by γ=(γ0,,γk)\gamma=(\gamma_{0},\dots,\gamma_{k}) a walk of length kk where each γiV\gamma_{i}\in V and {γi,γi+1}E\{\gamma_{i},\gamma_{i+1}\}\in E for all i{0,k˙1}i\in\{0,\dot{k}-1\}. We also denote the concatenation of two walks γ\gamma and γ\gamma^{\prime} by (γ,γ)(\gamma,\gamma^{\prime}). A walk is non-backtracking if for all i{0,,k2}i\in\{0,\dots,k-2\}, γiγi+2\gamma_{i}\neq\gamma_{i+2}. A walk contains a cycle if there exists iji\neq j with γi=γj\gamma_{i}=\gamma_{j}.

Algebraic tools: Perturbation of Eigenvalues and Eigenvectors

One main tool in our analysis is the Bauer-Fike Theorem. The form given below elaborates on the usual statement of the Theorem which in general omits the second half.

(Bauer-Fike Theorem; see [4, Theorem VI.5.1]). Let DD be a diagonalizable matrix such that for some invertible matrix VV and diagonal matrix Λ\Lambda one has D=V1ΛVD=V^{-1}\Lambda V. Let EE be a perturbation matrix.

Then any eigenvalue μ\mu of D+ED+E verifies

where λi\lambda_{i} is the ii-th diagonal entry of Λ\Lambda.

Denote by RR the right-hand side of (17) and Ci:=B(λi,R)C_{i}:={\mathcal{B}}(\lambda_{i},R) the ball centered at λi\lambda_{i} with radius RR. Let I{\mathcal{I}} be a set of indices such that

Then the number of eigenvalues of D+ED+E in iICi\cup_{i\in{\mathcal{I}}}C_{i} is exactly I|{\mathcal{I}}|.

The following proposition on perturbation of rank one matrices will be a basic ingredient to deduce from expressions like (3) quantitative versions of (5). It relies on the stability of eigenvalues and eigenvectors of matrices which are not too far from being normal (AA is normal if and only if AA=AAA^{*}A=AA^{*}; see ), and with a well separated spectrum.

with yk,xkc0\langle y_{k},x_{k}\rangle\geq c_{0}, xkykc1\|x_{k}\|\|y_{k}\|\leq c_{1} and

Let (λi)(\lambda_{i}), 1in1\leq i\leq n be the eigenvalues of AA, with λnλ1|\lambda_{n}|\leq\ldots\leq|\lambda_{1}|. Then, λ1\lambda_{1} has multiplicity one and we have

with C=π/2+2c11log(2(c1c01))C=\pi/2+2\sqrt{c_{1}\vee 1}\log(2(c_{1}\vee c_{0}^{-1})). Moreover, there exists a unit eigenvector ψ\psi of AA with eigenvalue λ1\lambda_{1} such that

The eigenvalues of WWWW^{*} are 1±b1\pm|b|. We deduce that

In particular, the eigenvalue λ1k\lambda_{1}^{k} has multiplicity one in AkA^{k} and thus λ1\lambda_{1} has multiplicity one in AA. We now bound the difference between λ1\lambda_{1} and θ=1\theta=1. First, by assumption,

with c2=(2c1)1log(2(c1c01))c_{2}=\sqrt{(2c_{1})\vee 1}\log(2(c_{1}\vee c_{0}^{-1})).

Since x(π/2)sin(x)|x|\leq(\pi/2)|\sin(x)| for x[π/2,π/2]x\in[-\pi/2,\pi/2], we obtain from (19) that

where cc is some scalar and we have used λ1kc0/2|\lambda_{1}|^{k}\geq c_{0}/2. We now use the following general inequality

From the trianle inequality, z01z\|z_{0}\|\geq 1-\|z^{\perp}\|. We then have

Finally, since λ=λ1k\lambda=\lambda^{k}_{1} has multiplicity one in AkA^{k}, the eigenspace of λ1\lambda_{1} for AA coincides with the eigenspace of λ1k\lambda_{1}^{k} for AkA^{k}. This concludes the proof of Proposition 7. ∎

Assume there exist c0,c1>0c_{0},c_{1}>0 such that for all ij[r]i\neq j\in[r], yk,j,xk,jc0\langle y_{k,j},x_{k,j}\rangle\geq c_{0}, xk,jyk,jc1\|x_{k,j}\|\|y_{k,j}\|\leq c_{1}, xk,j,yk,i=xk,j,xk,i=yk,j,yk,i=0\langle x_{k,j},y_{k,i}\rangle=\langle x_{k,j},x_{k,i}\rangle=\langle y_{k,j},y_{k,i}\rangle=0 and

where ϑ=miniθi\vartheta=\min_{i}|\theta_{i}|, γ=min{θi/θj:θi>θj>0 or θi<θj<0}\gamma=\min\{\theta_{i}/\theta_{j}:\theta_{i}>\theta_{j}>0\hbox{ or }\theta_{i}<\theta_{j}<0\} (the minimum over the empty set being ++\infty). Let (λi)(\lambda_{i}), 1in1\leq i\leq n, be the eigenvalues of AA with λnλ1|\lambda_{n}|\leq\ldots\leq|\lambda_{1}|. Then, there exists a permutation σSr\sigma\in S_{r} such that for all i[r]i\in[r],

with C=π/2+2c11log(2(c1c01))C=\pi/2+2\sqrt{c_{1}\vee 1}\log(2(c_{1}\vee c_{0}^{-1})). Moreover, if θσ(i)\theta_{\sigma(i)} has multiplicity one in θ\theta, λi\lambda_{i} is a simple eigenvalue and there exists a unit eigenvector ψi\psi_{i} of AA with eigenvalue λi\lambda_{i} such that

with C=24c1c01/(1(c0γkc1)+c0)C^{\prime}=24c_{1}c_{0}^{-1}/(1\wedge(c_{0}\gamma^{k}-c_{1})_{+}\wedge c_{0}).

Now, by assumption, 2κRk<c0(c0γkc1)+2\kappa\|R_{k}\|<c_{0}\wedge(c_{0}\gamma^{k}-c_{1})_{+} is less than the minimal distance between the distinct eigenvalues of DD. We deduce from Theorem 6 applied to D+U1RkUD+U^{-1}R_{k}U that there is a permutation sSrs\in S_{r} such that

We now bound the difference between λi\lambda_{i} and θs(i)\theta_{s(i)}. By assumption, c0θjνjc1θj.c_{0}|\theta_{j}|\leq|\nu_{j}|\leq c_{1}|\theta_{j}|. Hence, arguing as in the proof of Proposition 7,

It now remains to control the eigenvector of λi\lambda_{i} such that θσ(i)\theta_{\sigma(i)} has multiplicity one. First from (22), λi\lambda_{i} is a simple eigenvalue of AA. Let zz be a corresponding normed eigenvector of AA. Applying AkA^{k} yields

Applying AkA^{k} once more to (23) yields

Multiplying (23) by λik\lambda_{i}^{k} and subracting it to the previous display yields

Moreover, this implies upon dividing (24) by λik\lambda_{i}^{k}:

We conclude this paragraph with an elementary lemma on Gram-Schmidt orthonormalization process. It will be used to obtain vectors which are exactly orthogonal as in the assumptions of Proposition 8.

We prove the statement by induction. For k=1k=1, uˉ1=u1\bar{u}_{1}=u_{1}. For k1k\geq 1, we denote by vk+1v_{k+1} the orthonormal projection of uk+1u_{k+1} on the span of (u1,,uk)(u_{1},\cdots,u_{k}). We have

It is easy to check that 2k(1+kk)21(k+1)k+1\sqrt{2k(1+k^{k})}\leq 2^{-1}(k+1)^{k+1} for all k1k\geq 1. In particular, if δ<(k+1)(k+1)\delta<(k+1)^{-(k+1)}, vk+1uk+1v_{k+1}\neq u_{k+1} and then from (21), uk+1uˉk+12vk+1δ(k+1)k+1\|u_{k+1}-\bar{u}_{k+1}\|\leq 2\|v_{k+1}\|\leq\delta(k+1)^{k+1}. ∎

Erdős-Rényi graph: proof strategy for Theorem 3

(if θ=0\theta=0, we set ζ=0\zeta=0). The proof relies on the following two propositions.

Hence, Proposition 11 implies that w.h.p.

We may now apply Proposition 7. If λi=λi(B)\lambda_{i}=\lambda_{i}(B), we find that w.h.p.

and the normalized Perron eigenvector ξ\xi of BB satisfies w.h.p.

We note that from [15, Theorem 3.3.16], we get that in (4),

The proof of Proposition 11 relies crucially on a matrix expansion given in Proposition 13, which extends the argument introduced in for matrices counting self-avoiding walks to the present setup where non-backtracking walks instead are considered. We now introduce some notation to state it.

where AA is the graph’s adjacency matrix. For integer k1k\geq 1, e,fE(V)e,f\in\vec{E}(V), we define Γefk\Gamma^{k}_{ef} as the set of non-backtracking walks γ=(γ0,,γk)\gamma=(\gamma_{0},\ldots,\gamma_{k}) of length kk starting from (γ0,γ1)=e(\gamma_{0},\gamma_{1})=e and ending at (γk1,γk)=f(\gamma_{k-1},\gamma_{k})=f in the complete graph on the vertex set VV. We have that

and Fefk+1F^{k+1}_{ef} is the subset of tangle-free paths in Γefk+1\Gamma^{k+1}_{ef}. For uvu\neq v, we set

The matrix Δ(k)\Delta^{(k)} can be thought of as an attempt to center the non-backtracking matrix BkB^{k} when the underlying graph is tangle-free. We use the convention that a product over an empty set is equal to 11. We also set

Notably, B(0)B^{(0)} is the projection on E\vec{E}. We have the following telescopic sum decomposition.

3 Norm bounds

The following proposition will be established in Section 6 using path counting combinatorial arguments.

4 Proof of Proposition 11

Together with Propositions 13 and 14, we shall also need the next two results, established by local analysis in Section 9. In particular the forthcoming Lemma 30 implies that

For the Erdős-Rényi graph, Corollary 34 states the following.

Also, from (2), since χˇ=χ\check{\chi}=\chi,

Hence, from Proposition 16 and norm bound (31), w.h.p.

Proof of Proposition 14: path count combinatorics

The proof will use a version of the trace method. For n3n\geq 3, we set

The symmetry (2) implies that Δef(k)=Δf1e1(k)\Delta^{(k)}_{ef}=\Delta^{(k)}_{f^{-1}e^{-1}}. With the convention that e2m+1=e1e_{2m+1}=e_{1}, we get

where Wk,mW_{k,m} is the set of sequence of paths γ=(γ1,,γ2m)\gamma=(\gamma_{1},\ldots,\gamma_{2m}) such that γi=(γi,0,,γi,k)Vk+1\gamma_{i}=(\gamma_{i,0},\cdots,\gamma_{i,k})\in V^{k+1} is non-backtracking tangle-free of length kk and for all i=1,,2mi=1,\ldots,2m,

with the convention that γ0=γ2m\gamma_{0}=\gamma_{2m}.

where Wk,mW^{\prime}_{k,m} is the subset of Wk,mW_{k,m} where each non-oriented edge is visited at least twice. For each γWk,m\gamma\in W_{k,m} we associate the graph G(γ)=(V(γ),E(γ))G(\gamma)=(V(\gamma),E(\gamma)) of visited vertices and edges. We set

We say that a path γ\gamma is canonical if V(γ)={1,,v(γ)}V(\gamma)=\{1,\cdots,v(\gamma)\} and the vertices are first visited in order. Wk,m\mathcal{W}_{k,m} will denote the set of canonical paths in Wk,mW_{k,m}. Every canonical path is isomorphic to (nv(γ))v(γ)!{n\choose v(\gamma)}v(\gamma)! paths in Wk,mW_{k,m}. We also have the following

Let Wk,m(v,e)\mathcal{W}_{k,m}(v,e) be the set of canonical paths with v(γ)=vv(\gamma)=v and e(γ)=ee(\gamma)=e. We have

In order to upper bound Wk,m(v,e)|\mathcal{W}_{k,m}(v,e)| we need to find an injective way to encode the canonical paths xWk,m(v,e)x\in\mathcal{W}_{k,m}(v,e).

Let x=(xi,t)1i2m,0tkWk,m(v,e)x=(x_{i,t})_{1\leq i\leq 2m,0\leq t\leq k}\in\mathcal{W}_{k,m}(v,e). We set yi,t={xi,t,xi,t+1}y_{i,t}=\{x_{i,t},x_{i,t+1}\}, yi,ty_{i,t} will be called an edge of xx. We explore the sequence (xi,t)(x_{i,t}) in lexicographic order denoted by \preceq (that is (i,t)(i+1,t)(i,t)\preceq(i+1,t^{\prime}) and (i,t)(i,t+1)(i,t)\preceq(i,t+1)). We think of the index (i,t)(i,t) as a time. For 0tk10\leq t\leq k-1, we say that (i,t)(i,t) is a first time, if xi,t+1x_{i,t+1} has not been seen before (that is xi,t+1xi,tx_{i,t+1}\neq x_{i^{\prime},t^{\prime}} for all (i,t)(i,t)(i^{\prime},t^{\prime})\preceq(i,t)). If (i,t)(i,t) is a first time the edge yi,ty_{i,t} is called a tree edge. By construction, the set of tree edges forms a tree TT with vertex set {1,,v}\{1,\ldots,v\}. The edges which are not in TT are called the excess edges of xx. Any vertex different from 11 has its associated tree edge. It follows that the cardinal of excess edges is ϵ=ev+1\epsilon=e-v+1.

We build a first encoding of Wk,m(v,e)\mathcal{W}_{k,m}(v,e). If (i,t)(i,t) is not a first time, we say that (i,t)(i,t) is an important time and we mark the time (i,t)(i,t) by the vector (xi,t+1,xi,τ)(x_{i,t+1},x_{i,\tau}), where (i,τ)(i,\tau) is the next time that yi,τy_{i,\tau} will not be a tree edge (by convention τ=k\tau=k if xi,sx_{i,s} remains on the tree for all t+1skt+1\leq s\leq k). Since there is a unique non-backtracking path between two vertices of a tree, we can reconstruct xWk,mx\in\mathcal{W}_{k,m} from the position of the important times and their mark. It gives rise to our first encoding.

The main issue with this encoding is that the number of important times could be large. We have however not used so far the hypothesis that each path xix_{i} is tangle free. To this end, we are going to partition important times into three categories, short cycling, long cycling and superfluous times. First consider the case where the ii-th path xix_{i} contains a cycle. For each ii, the first time (i,t)(i,t) such that xi,t+1{xi,0,,xi,t}x_{i,t+1}\in\{x_{i,0},\ldots,x_{i,t}\} is called a short cycling time. Let 0σt0\leq\sigma\leq t be such that xi,t+1=xi,σx_{i,t+1}=x_{i,\sigma}. By the assumption of tangle-freeness, C:=(xi,σ,,xi,t+1)C:=(x_{i,\sigma},\cdots,x_{i,t+1}) is the only cycle visited by xix_{i}. We denote by (i,τ)(i,\tau) the first time after (i,t)(i,t) that yi,τy_{i,\tau} in not an edge of CC (by convention τ=k\tau=k if xix_{i} remains on CC). We add the extra mark τ\tau to the short cycling time. Important times (i,t)(i,t) with 1t<σ1\leq t<\sigma or τ<tk\tau<t\leq k are called long cycling times. The other important times are called superfluous. The key observation is that for each 1i2m1\leq i\leq 2m, the number of long cycling times (i,t)(i,t) is bounded by ϵ1\epsilon-1 (since there is at most one cycle, no edge of xx can be seen twice outside those of CC, the 1-1 coming from the fact that the short cycling time is an excess edge). Now consider the case where the ii-th path does not contain a cycle, then all important times are called long cycling times and their number is bounded by ϵ\epsilon.

We now have our second encoding. We can reconstruct xx from the positions of the long cycling and the short cycling times and their marks. For each 1i2m1\leq i\leq 2m, there are at most 11 short cycling time and ϵ1\epsilon-1 long cycling times within xix_{i} if xix_{i} contains a cycle and short cycling time and ϵ\epsilon long cycling times if xix_{i} does not contain a cycle. There are at most k2mϵk^{2m\epsilon} ways to position them (in time). There are at most v2v^{2} different possible marks for a long cycling time and v2kv^{2}k possible marks for a short cycling time. We deduce that

We use v2kmv\leq 2km to obtain the announced bound. ∎

From (37) and Markov inequality, it suffices to prove that

For our choice of mm in (35), we have, for nn large enough,

In particular, the above geometric series converges and (38) follows. ∎

The bound (31) on Δ(k)χ\|\Delta^{(k)}\chi\| we will now establish improves by a factor n\sqrt{n} on the trivial estimate Δ(k)χχΔ(k)\|\Delta^{(k)}\chi\|\leq\|\chi\|\|\Delta^{(k)}\|. Its proof parallels the argument used to show (30). We have

where Wk,1W^{\prime\prime}_{k,1} is the set of pairs of paths (γ1,γ2)(\gamma_{1},\gamma_{2}) such γi=(γi,0,,γi,k)\gamma_{i}=(\gamma_{i,0},\ldots,\gamma_{i,k}) is non-backtracking and (γ1,k1,γ1,k)=(γ2,1,γ2,0)(\gamma_{1,k-1},\gamma_{1,k})=(\gamma_{2,1},\gamma_{2,0}) and each edge is visited at least twice. The only difference with Wk,1W_{k,1} defined above is that we do not require that (γ1,0,γ1,1)=(γ2,k,γ2,k1)(\gamma_{1,0},\gamma_{1,1})=(\gamma_{2,k},\gamma_{2,k-1}). However, this last condition (γ1,0,γ1,1)=(γ2m,k,γ2m,k1)(\gamma_{1,0},\gamma_{1,1})=(\gamma_{2m,k},\gamma_{2m,k-1}) was not used in the proof of Lemma 17. It follows that the set of canonical paths in Wk,1W^{\prime\prime}_{k,1} with vv distinct vertices and ee distinct edges has cardinal bounded by k2(2k)6(ev+1)k^{2}(2k)^{6(e-v+1)}. Since the paths are connected and each edge appears at least twice, we have v1ekv-1\leq e\leq k. As in the proof of (30), we get from (40) with m=1m=1

We conclude with Markov inequality and the union bound. ∎

with the convention that γ0=γ2m\gamma_{0}=\gamma_{2m}.

We define G(γ)=(V(γ),E(γ))G(\gamma)=(V(\gamma),E(\gamma)) as the union of the graph G(γiz)G(\gamma_{i}^{z}), 1i2m1\leq i\leq 2m, z{1,2}z\in\{1,2\}. Note that the edges (γi,k,γi,k+1)(\gamma_{i,k},\gamma_{i,k+1}) are not taken into account in G(γ)G(\gamma). As usual, we set v(γ)=V(γ)v(\gamma)=|V(\gamma)| and e(γ)=E(γ)v(γ)e(\gamma)=|E(\gamma)|\geq v(\gamma). Since γi\gamma_{i} is tangled either (a) G(γi)G(\gamma_{i}) contains a cycle and is connected or (b) both G(γi1)G(\gamma_{i}^{1}) and G(γi2)G(\gamma_{i}^{2}) contain a cycle. In particular, all connected components of G(γ)G(\gamma) contain a cycle and it follows that

Taking the expectation in (42), we find that

Since K\|K\| is of order nn, we observe that the second statement in (33) improves by a factor n\sqrt{n} the crude bound KB(k)KB(k)\|KB^{(k)}\|\leq\|K\|\|B^{(k)}\|.

We only prove the second statement. The first statement is proved similarly. The argument again parallels that used to show (30). We take mm as in (35). We have

where Wk,mW_{k,m} is the set of sequence of paths γ=(γ1,,γ2m)\gamma=(\gamma_{1},\ldots,\gamma_{2m}) defined below (36). From (47) and Markov inequality, it suffices to prove that

If γWk,m\gamma\in\mathcal{W}_{k,m} is a canonical element of Wk,mW_{k,m} then v(γ)1e(γ)2kmv(\gamma)-1\leq e(\gamma)\leq 2km and v(γ)3v(\gamma)\geq 3. Also, any γWk,m\gamma\in\mathcal{W}_{k,m} is isomorphic to less that nv(γ)n^{v(\gamma)} elements in Wk,mW_{k,m}. Moreover, we have that

For our choice of mm in (35), the above geometric series converges and (48) follows. ∎

Observe that Lef=0L_{ef}=0 unless e=fe=f, Kef=1K_{ef}=1, Kf1e=1K_{f^{-1}e}=1 or Kef1=1K_{ef^{-1}}=1 in which cases Lef=1L_{ef}=-1. We may thus decompose

where II is the identity, and the non-zero entries of KK^{\prime} are equal 11 and are the pairs (e,f)(e,f) such that Kef=1K_{ef}=1, Kf1e=1K_{f^{-1}e}=1 or Kef1=1K_{ef^{-1}}=1. Thus

Stochastic Block Model : proof of Theorem 4

(in the above, if θk=0\theta_{k}=0, we set ζk=0\zeta_{k}=0). We also define

Proposition 19 will follow from the local analysis done in Section 9. The next Proposition will be established in Section 10 using a matrix expansion together with norm bounds derived by combinatorial arguments parallel to the proof of Proposition 11 for the Erdős-Rényi graph.

We now check that the two preceding propositions imply Theorem 4. We consider (φˉ1,,φˉr)(\bar{\varphi}_{1},\cdots,\bar{\varphi}_{r^{\prime}}) obtained by the Gram-Schmidt orthonormalization of (φˇ1,,φˇr)(\check{\varphi}_{1},\cdots,\check{\varphi}_{r}). By Lemma 9 and Proposition 19(iv), w.h.p. r=rr^{\prime}=r and for all k[r]k\in[r],

Since φˇkφˉk=o(1)\|\check{\varphi}_{k}-\bar{\varphi}_{k}\|=o(1), from Proposition 19(i)-(iii), we find by induction on k[r]k\in[r], w.h.p. for all k[r]k\in[r],

Consequently, from Proposition 20, we have w.h.p.

Hence, Proposition 20 and (53)-(54) imply that w.h.p.

We are now in position to apply Proposition 8. From (52), the statement of Proposition 19(ii) also holds with ζk\zeta_{k} replaced by ζˉk\bar{\zeta}_{k}. It readily implies Theorem 4.

Controls on the growth of Poisson multi-type branching processes

In this section we derive results for multi-type Galton-Watson branching processes with Poisson offspring that will be crucial for the local analysis of Section 9. We refer to Section 3.2 for the notation used below.

We include the proof for later use. For 0s<t0\leq s<t, we have

so that, as ϕkM=μkϕk\phi_{k}^{*}M=\mu_{k}\phi_{k}^{*},

It follows easily that (Xk(t))(X_{k}(t)) is an Ft\mathcal{F}_{t}-martingale with mean . From Doob’s martingale convergence Theorem, the statement will follow if we prove that for some C>0C>0 and all integer t0t\geq 0,

To this end, we denote by Zs+1(i,j)Z_{s+1}(i,j) the number of individuals of type ii in the s+1s+1-th generation which descend from a particle of type jj in the ss-th generation. Thus j[r]Zs+1(i,j)=Zs+1(i)\sum_{j\in[r]}Z_{s+1}(i,j)=Z_{s+1}(i). We then have

where in the penultimate equality we used the fact that the variance of a Poisson random variable equals its mean. It follows that

The Perron-Frobenius Theorem implies that the matrix (M/μ1)s(M/\mu_{1})^{s} converges elementwise to ψ1ϕ1/ϕ1ψ1\psi_{1}\phi_{1}^{*}/\phi_{1}^{*}\psi_{1} as ss\to\infty. In our case, ϕ1=1 ⁣ ⁣I\phi_{1}=1\!\!{\sf I}, ψ1=π\psi_{1}=\pi^{*} and 1 ⁣ ⁣I,ψ1=1\langle 1\!\!{\sf I},\psi_{1}\rangle=1. Consequently, for some C1C\geq 1,

Since μk2>μ1\mu_{k}^{2}>\mu_{1} the above series is convergent. ∎

We also need to control the behavior of ϕk,Zt\langle\phi_{k},Z_{t}\rangle for k[r]\[r0]k\in[r]\backslash[r_{0}]. The next result is contained in Kesten and Stigum [17, Theorem 2.4].

Assume Z0=xZ_{0}=x. For k[r]\[r0]k\in[r]\backslash[r_{0}] define

Then Xk(t)X_{k}(t) converges weakly to a random variable XkX_{k} with finite positive variance.

Note that Theorem 2.4 in expresses XkX_{k} as a mixture of Gaussian variables. The normalization in the case μk2=μ1\mu_{k}^{2}=\mu_{1} comes from the fact that MM is diagonalizable, and hence all its Jordan blocks are of size 11.

2 Quantitative versions of the Kesten-Stigum Theorems

We will also need probabilistic bounds on the growth of the total population at generation tt defined as

Assume S0=1S_{0}=1. There exist c0,c1>0c_{0},c_{1}>0 such that for all s0s\geq 0,

It is straightforward to check that fkf_{k} converges, hence there exist constants c0,c1>0c_{0},c_{1}>0 such that for all k1k\geq 1,

where we have set γ(s)=slogss+1\gamma(s)=s\log s-s+1. In particular, on the event {Sksfkμ1k}Fk\{S_{k}\leq sf_{k}\mu_{1}^{k}\}\in\mathcal{F}_{k}, we have

where we have used the existence of some θ>0\theta>0 such that for x[0,c1]x\in[0,c_{1}], one has γ(1+x)θx2\gamma(1+x)\geq\theta x^{2}. Finally by our choice of εk\varepsilon_{k} and (57), if smax(1/c0,1/c1)s\geq\max(1/c^{\prime}_{0},1/c_{1}),

Hence we deduce the statement of the lemma for some (suitably redefined) constants c0,c1>0c_{0},c_{1}>0. ∎

A key ingredient in the subsequent analysis will be the following result, which bounds by how much the growth of processes sϕk,Zss\to\langle\phi_{k},Z_{s}\rangle deviates from a purely deterministic exponential growth.

and for all k[r]\[r0]k\in[r]\backslash[r_{0}], all t0t\geq 0,

with γ(s)=slogs+1s\gamma(s)=s\log s+1-s. Similarly, for s<1s<1 one has

where by convention γ(x)=+\gamma(x)=+\infty for x0x\leq 0. Let δ(x):=γ(1x)γ(1+x)\delta(x):=\gamma(1-x)\wedge\gamma(1+x). Then for any s0s\geq 0,

In particular, for any i[r]i\in[r], letting y:=MZty:=MZ_{t}, we have, if Zt0Z_{t}\neq 0,

Consider first the case where sy11/2y(i)s\|y\|^{1/2}_{1}\leq y(i). As there exists θ>0\theta>0 such that for all xx\in, δ(x)θx2\delta(x)\geq\theta x^{2}, we get

Consider now the case where sy11/2>y(i)s\|y\|^{1/2}_{1}>y(i). As there exists θ>0\theta^{\prime}>0 such that, for all x1x\geq 1, δ(x)θx\delta(x)\geq\theta^{\prime}x, we get

since Zt0Z_{t}\neq 0 implies that y1μ1\|y\|_{1}\geq\mu_{1} from (11). Thus there exists some c0>0c_{0}>0 such that, for any s0s\geq 0,

If Zt=0Z_{t}=0, then Zt+1=0Z_{t+1}=0 and the same bound trivially holds. We thus obtain the existence of constants c0,c1>0c_{0},c_{1}>0 such that, for any u1u\geq 1,

Now, from (55), for any ss, 0st0\leq s\leq t,

From Equation (59) and Lemma 23, for CC large enough, with probability at least 1nβ1-n^{-\beta}, we have for all h0h\geq 0 that Zh+1MZh2C(logn)(h+1)Zh11/2\|Z_{h+1}-MZ_{h}\|_{2}\leq C(\log n)(h+1)\|Z_{h}\|_{1}^{1/2} and Zh1C(logn)μ1h\|Z_{h}\|_{1}\leq C(\log n)\mu_{1}^{h}. On this event, we get, for k[r0]k\in[r_{0}],

where at the last line, we used that μk2>μ1\mu_{k}^{2}>\mu_{1} and hshahc(a)sas\sum_{h\geq s}ha^{h}\leq c(a)sa^{s} for 0<a<10<a<1. Similarly, on the same event, for k[r]\[r0]k\in[r]\backslash[r_{0}], from (55), for t1t\geq 1 and s=0s=0,

Using now μk2μ1\mu_{k}^{2}\leq\mu_{1}, we have u=0t1(u+1)(μ1μk)u=O(t2(μ1/μk)t)\sum_{u=0}^{t-1}(u+1){{\left(\frac{\sqrt{\mu}_{1}}{\mu_{k}}\right)}}^{u}=O(t^{2}(\sqrt{\mu_{1}}/\mu_{k})^{t}).

3 A cross-generation functional

where Yk(t)=Xk(t)+ϕk,Z0Y_{k}(t)=X_{k}(t)+\langle\phi_{k},Z_{0}\rangle and XkX_{k} is the centered martingale defined in Theorem 21.

We now prove the statements of the theorem for k[r0]k\in[r_{0}]. We find similarly

Since k[r0]k\in[r_{0}], ρk:=μk2/α>1\rho_{k}:=\mu^{2}_{k}/\alpha>1. We write

The last statement of Theorem 24 and (64) give

For any p1p\geq 1, there exists a constant C=C(p,α)>0C=C(p,\alpha)>0 such that for any k[r]k\in[r],

We use twice the bound i=1nxipnp1i=1nxip|\sum_{i=1}^{n}x_{i}|^{p}\leq n^{p-1}\sum_{i=1}^{n}|x_{i}|^{p}. We find

for some new constant CC^{\prime} depending on α\alpha and pp. We deduce that for some new C>0C>0,

4 Decorrelation in homogeneous Galton-Watson branching processes

Assume that the spin σ(o)\sigma(o) at the root node oo is distributed according to the stationary distribution π\pi. Conditionally on the branching tree T{\mathcal{T}}, the process of spins σ(u)\sigma(u) attached to the vertices of the tree is a Markov random field. For any two neighbor nodes u,vu,v of T{\mathcal{T}} and any i,j[r]i,j\in[r], one has the following transition probabilities

For any two (possibly equal) nodes u,vu,v of T{\mathcal{T}}, any i,j[r]i,j\in[r], iji\neq j, one has

By standard properties of independent Poisson random variables, conditionally on the spin σ(o)\sigma(o) and on the number of children of the root oo, then the spins of each of the children of the root are i.i.d., distributed according to Mσ(o)/αM_{\cdot\sigma(o)}/\alpha. Moreover, π\pi is the stationary distribution for this transition kernel, which is reversible, as follows from the relation M=ΠWM=\Pi W and the facts that WW is symmetric together with the assumption (11) that the column sums of MM all coincide with α\alpha. The Markov random field property and the expression of the transition kernel follow by iterating this argument.

We now evaluate the conditional expectation in (67). Let u1=u,,ut=vu_{1}=u,\ldots,u_{t}=v denote the unique path in T{\mathcal{T}} connecting nodes uu and vv. Let Fs{\mathcal{F}}_{s} denote the σ\sigma-field generated by T{\mathcal{T}} and the spin variables σ(u1),,σ(us)\sigma(u_{1}),\ldots,\sigma(u_{s}). We then have by the Markov random field property

where we used the fact that ϕj\phi_{j} is a left-eigenvector of MM associated with eigenvalue μj\mu_{j}. Thus

where the last equality follows from π\pi-orthogonality (9) between vectors ϕk\phi_{k} and ϕj\phi_{j} for jkj\neq k. ∎

by Lemma 27, Equation(67). This concludes the proof. ∎

Local structure of random graphs

We now derive the necessary controls on the local structure of the SBM random graphs under consideration. Coupling results will allow to bound the deviation of their local structure from branching processes. Asymptotic independence between local neighborhoods of distinct nodes will then be used to establish weak laws of large numbers.

For e,fE(V)e,f\in\vec{E}(V), we define the ”oriented” distance

Then, for integer t0t\geq 0, we introduce the vector Yt(e)=(Yt(e)(i))i[r]Y_{t}(e)=(Y_{t}(e)(i))_{i\in[r]} where, for i[r]i\in[r],

The vector Yt(e)Y_{t}(e) counts the types at oriented distance tt from ee.

We shall denote by St(v)S_{t}(v) the set of vertices at distance tt from vv. We introduce

where at the last line we have used Assumption (11)-(13). Central to our local study is the classical exploration process of the neighborhood of vv which starts with A0={v}A_{0}=\{v\} and at stage t0t\geq 0, if AtA_{t} is not empty, takes a vertex in AtA_{t} at minimal distance from vv, say vtv_{t}, reveals its neighbors, say Nt+1N_{t+1}, in [n]\At[n]\backslash A_{t}, and update At+1=(AtNt+1)\{vt}A_{t+1}=(A_{t}\cup N_{t+1})\backslash\{v_{t}\}. We will denote by Ft\mathcal{F}_{t} the filtration generated by (A0,,At)(A_{0},\cdots,A_{t}) and by Dt=0stAsD_{t}=\cup_{0\leq s\leq t}A_{s} the set of discovered vertices at time tt. We start by establishing a rough bound on the growth of StS_{t}.

There exist c0,c1>0c_{0},c_{1}>0 such that for all s0s\geq 0 and for any w[n]E(V)w\in[n]\cup\vec{E}(V),

Consequently, for any p1p\geq 1, there exists c>0c>0 such that

To prove the first statement, observe that, in the exploration process, given Ft\mathcal{F}_{t}, if vtv_{t} has type jj, the number of neighbors of vtv_{t} in [n]\Dt[n]\backslash D_{t} is upper bounded stochastically by

We now check that the random graph GG is locally tree-like. For v[n]v\in[n] and integer h0h\geq 0, we denote by (G,v)h(G,v)_{h} the rooted subgraph of GG rooted at vv, spanned by the vertices at distance at most hh from vv. If e=(u,v)E(V)e=(u,v)\in\vec{E}(V), we set (G,e)h=(G,v)h(G,e)_{h}=(G^{\prime},v)_{h} where GG^{\prime} is the graph GG with the edge {u,v}\{u,v\} removed (if it was present in GG).

where c>0c>0 and we have used again Lemma 29. ∎

We conclude this subsection with a coupling of the process Zt(e)Z_{t}(e) and a multi-type Galton-Watson tree. Recall that for probability measures P,QP,Q on a countable set X\mathcal{X}, the total variation distance is given by

where the minimum is over all coupling (X,Y)(X,Y) such that XdPX\stackrel{{\scriptstyle d}}{{\sim}}P, YdQY\stackrel{{\scriptstyle d}}{{\sim}}Q.

From (13), the latter is O(nγ(1κ))O(n^{-\gamma\wedge(1-\kappa)}). We thus have proved that (73) holds for some new C>0C>0. ∎

We will use the following corollary of Proposition 31.

2 Geometric growth of linear functions of non-backtracking walks

The next proposition asserts that for most eEe\in\vec{E}, Btχk,δe\langle B^{t}\chi_{k},\delta_{e}\rangle grows nearly geometrically in tt with rate μk\mu_{k} up to an error of order αt/2\alpha^{t/2}.

and statement (i) follows from Corollary 32. For statement (ii), we simply use that w.h.p. GG is tangle free (by Lemma 30), hence, there are at most two non-backtracking walks of length tt from ee to any ff. We get

However, by Lemma 29 w.h.p. for all t0t\geq 0 and all eEe\in\vec{E}, St(e)C(logn)αt|S_{t}(e)|\leq C(\log n)\alpha^{t}. ∎

From Cauchy-Schwartz inequality, the first term is bounded w.h.p. by

3 Laws of large numbers for local functions

We first prove weak laws of large numbers for general local functionals of SBM random graphs that will then be applied to specific functionals of interest.

Hence, for any p1p\geq 1, for some cp>0c_{p}>0,

Now, for 1kn1\leq k\leq n, let Xk={1vk:{v,k}E}X_{k}=\{1\leq v\leq k:\{v,k\}\in E\}, where EE is the edge set of GG. The vector (X1,,Xn)(X_{1},\cdots,X_{n}) is an independent vector and for some function FF,

We also define GkG_{k} as the graph with edge set vkXv\cup_{v\neq k}X_{v}. We set

where (T,o)(T,o) is the random rooted tree associated to the Galton-Watson branching process defined in Section 8 started from Z0=διZ_{0}=\delta_{\iota} and ι\iota has distribution (π(1),,π(r))(\pi(1),\ldots,\pi(r)).

In view of Proposition 35 and Jensen’s inequality, it is sufficient to prove that

3.2 Law of large numbers for specific local functions

For any k[r0]k\in[r_{0}], there exists ρk>0\rho_{k}>0 such that, in probability,

For any k[r]\[r0]k\in[r]\backslash[r_{0}], there exists ρk>0\rho_{k}>0 such that w.h.p.

Let ZtZ_{t}, t0t\geq 0, be the Galton-Watson branching process defined in Section 8 started from Z0=διZ_{0}=\delta_{\iota} and ι\iota has distribution (π(1),,π(r))(\pi(1),\ldots,\pi(r)). We denote by (T,o)(T,o) the associated random rooted tree. If k[r0]k\in[r_{0}], by Theorem 21, for some ρk>0\rho_{k}>0,

It proves statement (i) of the proposition.

For statement (ii), we use Theorem 22 instead. We denote by XkX_{k} the limit martingale in Theorem 22 when Z0=διZ_{0}=\delta_{\iota}. If μk2<μ1\mu_{k}^{2}<\mu_{1}, we find similarly that for any θ>0\theta>0,

Since κ<γ/4\kappa<\gamma/4, the right hand side is o(1)o(1). ∎

For any k[r0]k\in[r_{0}], there exists ρk>0\rho^{\prime}_{k}>0 such that w.h.p.

For any k[r]\[r0]k\in[r]\backslash[r_{0}], there exists Ck>0C_{k}>0 such that w.h.p.

Let ZtZ_{t}, t0t\geq 0, be the Galton-Watson branching process defined in Section 8 started from Z0=διZ_{0}=\delta_{\iota} and ι\iota has distribution (π(1),,π(r))(\pi(1),\ldots,\pi(r)). We denote by (T,o)(T,o) the associated random rooted tree. For any k[r0]k\in[r_{0}], by Theorem 25, for some positive constant ρk\rho^{\prime}_{k},

On the other hand, Theorem 25 also ensures that for some Ck>0C_{k}>0, for k[r]\[r0]k\in[r]\backslash[r_{0}],

Since μk2>α\mu_{k}^{2}>\alpha, we have the rough bound

Since κ<γ/5\kappa<\gamma/5, the right hand side goes to and statement (i) follows from (79). Statement (ii) is proved similarly using (80). For statement (iii), we use the above computation, together with Theorem 28 and (9). It gives the claimed bound. ∎

4 Proof of Proposition 19

where at the last line, we have used Lemma 30. Since κ<1/2\kappa<1/2, it proves the first statement.

On the other hand, from Proposition 37(i), w.h.p.

The conclusion follows since κ<1/2\kappa<1/2.∎

All ingredients are finally gathered to prove Proposition 19.

Proof of (i). Let k[r0]k\in[r_{0}]. By definition

From Proposition 37(i) and Proposition 38(i) respectively, for some positive constants c0,c1c_{0},c_{1}, w.h.p.

It remains to use Lemma 39 and the assumption μk2>α\mu_{k}^{2}>\alpha. We find w.h.p.

Proof of (ii). Let k[r0]k\in[r_{0}]. Since Px=xˇPx=\check{x} is an isometry,

Proof of (iv). Let μˉk=μkα\bar{\mu}_{k}=\mu_{k}\vee\sqrt{\alpha}. From (2) and (82)-(83) for k,j[r]k,j\in[r], w.h.p.

In addition, Equations (82)-(83), Proposition 37(i) and Lemma 39 entail that w.h.p.

From Proposition 37(iii), we get if kjk\neq j that w.h.p.

Proof of (v). Let kj[r0]k\neq j\in[r_{0}]. From (2) and (82), w.h.p.

As in (84), we use x,yx,yxyy+yxx{{\left|\langle x,y\rangle-\langle x^{\prime},y^{\prime}\rangle\right|}}\leq\|x^{\prime}\|\|y-y^{\prime}\|+\|y\|\|x-x^{\prime}\|. We find from (82), Proposition 37(i) and Lemma 39 that w.h.p.

Proof of (vi). We again use the same argument. Let kj[r0]k\neq j\in[r_{0}]. From (82),

Recall that x,yx,yxyy+yxx{{\left|\langle x,y\rangle-\langle x^{\prime},y^{\prime}\rangle\right|}}\leq\|x^{\prime}\|\|y-y^{\prime}\|+\|y\|\|x-x^{\prime}\|. Then from (82), Proposition 38(i) and Lemma 39, we obtain w.h.p.

Norm of non-backtracking matrices

In this section we prove Proposition 20. The argument used for Erdős-Rényi graphs extends rather directly to the stochastic block model.

In this paragraph, we essentially repeat the argument of subsection 5.2. We define, for uvVu\neq v\in V, the centered variable,

We now re-define KK as the weighted non-backtracking matrix on the complete graph on VV, for e,fE(V)e,f\in\vec{E}(V),

where efe\to f represents the non-backtracking property, e2=f1e_{2}=f_{1} and ef1e\neq f^{-1}. We also introduce

2 Proof of Proposition 20

The proof of Proposition 20 parallels that of Proposition 11.

First, the expressions (39)-(46)-(49) now depend on the types of the vertices involved in a path. We treat for example the case of (39) needed for Proposition 30. We claim that if γWk,m\gamma\in W_{k,m} is a canonical path with ee edges and vv vertices,

With (87) in place of (39), using Lemma 17 we then bound SS given by (38) as follows

The second difference lies in the definition of the matrix L=K(2)WˉL=K^{(2)}-\bar{W}. For the stochastic block model, from (10), the entry LefL_{ef} is zero unless e=fe=f, efe\to f, f1ef^{-1}\to e or ef1e\to f^{-1}. Moreover, the non-zero entries are bounded by aa. Then the argument of the proof of bound (34) carries over easily.

With bounds (30)-(32)-(33)-(34) available for the stochastic block model, the remainder of the proof of Proposition 20 repeats the argument of subsection 5.4.

Stochastic Block Model : proof of Theorem 5

The strategy of proof is based on the following lemma which asserts that the existence of a Boolean function non constant over the classes ensures the existence of an estimation with asymptotically positive overlap.

Assume that π(i)1/r\pi(i)\equiv 1/r and there exists a function F:V{0,1}F:V\to\{0,1\} of the graph GG such that in probability, for any i[r]i\in[r],

where f:[r]f:[r]\to is not a constant function (there exists (i,j)(i,j) such that f(i)f(j)f(i)\neq f(j)). Let (I+,I)(I^{+},I^{-}) be a partition of [r][r], such that 0<I+<r0<|I^{+}|<r and

Then the following estimation procedure yields asymptotically positive overlap with permutation pp in (16) equal to the identity : assign to each vertex vv a label σ^(v)\hat{\sigma}(v) picked uniformly at random from I+I^{+} if F(v)=1F(v)=1 and from II^{-} if F(v)=0F(v)=0.

Observe that the existence of a non trivial partition (I+,I)(I^{+},I^{-}) satisfying (88) is implied by the assumption that ff is not constant.

Summing over all jI+j\in I^{+}, in probability,

where f+f_{+} is the left hand side of (88). Similarly, if ff_{-} is the right hand side of (88), in probability,

where the strict inequality comes from (88). ∎

Our aim is now to find a non constant functions over the classes which depends on the eigenvector ξk\xi_{k}. To this end, we introduce a new random variable, for vVv\in V,

where s=αρks=\sqrt{\alpha\rho^{\prime}_{k}} and ρk\rho^{\prime}_{k} was defined in Proposition 38.

Let k[r0]k\in[r_{0}], i[r]i\in[r] and Yk,iY_{k,i} be as in Lemma 41. For any continuity point tt of the distribution of Yk,i|Y_{k,i}|, in L2L^{2},

From Proposition 38(i) we find that, in probability,

Hence, from the triangle inequality, w.h.p.

We deduce from Cauchy-Schwartz inequality that w.h.p.

Since tt is a continuity point of Yk,i|Y_{k,i}|, it is then a routine to deduce Lemma 42 from Lemma 41. ∎

All ingredients are now gathered to prove Theorem 5. We fix k[r0]k\in[r_{0}] as in Theorem 5 and let ξk\xi^{\prime}_{k} be as above. We set

Write, for ε=±\varepsilon=\pm, πε=jIεπ(j)\pi_{\varepsilon}=\sum_{j\in I^{\varepsilon}}\pi(j), gε=jJεπ(j)ϕk(j)g_{\varepsilon}=\sum_{j\in J^{\varepsilon}}\pi(j)\phi_{k}(j). Note that g+>0g_{+}>0 by definition of J+J^{+}. Also by the orthogonality relation (9) between ϕ1=1 ⁣ ⁣I\phi_{1}=1\!\!{\sf I} and ϕk\phi_{k} we obtain g++g=0g_{+}+g_{-}=0, so that g<0g_{-}<0. For ε=±\varepsilon=\pm, we shall denote by XεX_{\varepsilon} the random variable obtained as a mixture of the XjX_{j} for jJεj\in J^{\varepsilon}, with weights π(j)/πε\pi(j)/\pi_{\varepsilon}. Note that XεX_{\varepsilon} has mean gε/πεg_{\varepsilon}/\pi_{\varepsilon}.

We may now come back to the eigenvector ξk\xi_{k} in Theorem 5. We set τ=st0\tau=st_{0} in Theorem 5. For some unknown sign ω{1,1}\omega\in\{-1,1\} we have

We first assume that ω\omega is known and ξk=ξk\xi^{\prime}_{k}=\xi_{k}. We consider the function

From (90), (88) is satisfied with I±=J±I^{\pm}=J^{\pm} and we can apply Lemma 40 to obtain an asymptotically positive overlap. We note that the sign ω\omega is easy to estimate consistently if the random variable XX which is the mixture of the XjX_{j} with weights π(j)=1/r\pi(j)=1/r is not symmetric. Indeed, in this case, for some bounded continuous function ff,

Then, from (89), given ω\omega, in probability,

takes a different value for ω=1\omega=1 and ω=1\omega=-1.

Another simple case is if XX defined above is symmetric and J+=J|J^{+}|=|J^{-}|. If this occurs, X+X^{+} and X-X_{-} have the same distribution. We consider the function F(v)=1 ⁣ ⁣I(e:e2=vξk(e)>τ/n)=1 ⁣ ⁣I(ωIk(v)>t0)F(v)=1\!\!{\sf I}(\sum_{e:e_{2}=v}\xi_{k}(e)>\tau/\sqrt{n})=1\!\!{\sf I}(\omega I_{k}(v)>t_{0}) and the estimation where a vertex such that F(v)=1F(v)=1 receives a uniform label in J+J^{+}, and otherwise a label uniform in JJ^{-}. By Lemma 40 applied to FF, if ω=1\omega=1, we obtain an positive overlap with I±=J±I^{\pm}=J^{\pm} and the permutation pp in (16) equal to the identity. If ω=1\omega=-1, we obtain an positive overlap with I±=JI^{\pm}=J^{\mp} and any permutation pp in (16) such that p(J±)=Jp(J^{\pm})=J^{\mp},

We consider the function F(v)=1 ⁣ ⁣I(e:e2=vξk(e)>τ/n)=1 ⁣ ⁣I(ωIk(v)>t0)F(v)=1\!\!{\sf I}(\sum_{e:e_{2}=v}\xi_{k}(e)>\tau/\sqrt{n})=1\!\!{\sf I}(\omega I_{k}(v)>t_{0}) and the estimation where σ^(v)=1\hat{\sigma}(v)=1 if F(v)=1F(v)=1 and σ^(v)\hat{\sigma}(v) uniform on {2,,r}\{2,\ldots,r\} otherwise. We apply Lemma 40 to FF and the partition I+={jω}I^{+}=\{j_{\omega}\}, I=[r]\{jω}I^{-}=[r]\backslash\{j_{\omega}\}. We obtain an asymptotically positive overlap for any permutation pp in (16) such that p(jω)=1p(j_{\omega})=1.

Hence, it follows from (90) that (88) is satisfied with I±=J±I^{\pm}=J^{\pm}. We can then apply Lemma 40 to obtain an asymptotically positive overlap.

References