A Proof Of The Block Model Threshold Conjecture

Elchanan Mossel, Joe Neeman, Allan Sly

Introduction

We consider the simplest version of the stochastic block model, namely the version with two symmetric states:

If q=qq=q^{\prime}, the stochastic block model is just an Erdős-Rényi model, but if qqq\gg q^{\prime} then one expects that a typical graph will have two well-defined clusters.

Theoretical computer scientists’ interest in the average case analysis of the minimum-bisection problem led to intensive research on algorithms for recovering the partition (although not all of these works used the same model as us; for example, considered random regular graphs with a fixed minimum bisection size). At the same time, the model is a classical statistical model for networks with communities , and the questions of identifiability of the parameters and recovery of the clusters have been studied extensively, see e.g. . We refer the readers to for more background on the model.

2 The block models in sparse graphs

Until recently, all of the theoretical literature on the stochastic block model focused on what we call the dense case, where the average degree is of order at least logn\log n and the graph is connected. Indeed, it is clear that connectivity is required, if we wish to label all vertices accurately. However, the case of sparse graphs with constant average degree is well motivated from the perspective of real networks, see e.g. .

Although sparse graphs are natural for modelling many large networks, the stochastic block model seems to be most difficult to analyze in the sparse setting. Despite the large body of work about this model, until recently the only result for the sparse case q,q=O(1n)q,q^{\prime}=O(\frac{1}{n}) was that of Coja-Oghlan . Recently, Decelle et al. made some fascinating conjectures for the cluster identification problem in the sparse stochastic block model. In what follows, we will set q=a/nq=a/n and q=b/nq^{\prime}=b/n for some fixed a,ba,b. It will be useful to parameterize these by d=(a+b)/2d=(a+b)/2 and s=(ab)/2s=(a-b)/2. Note that with these parameters, s2>ds^{2}>d implies that s,d>1s,d>1.

If s2>ds^{2}>d then the clustering problem in G(n,an,bn)\mathcal{G}(n,\frac{a}{n},\frac{b}{n}) is solvable as nn\to\infty, in the sense that one can a.a.s. find a bisection whose correlation with the planted bisection is bounded away from 0.

Decelle et al.’s work is based on deep but non-rigorous ideas from statistical physics. In order to identify the best bisection, they use the sum-product algorithm (also known as belief propagation). Using the cavity method, they argue that the algorithm should work, a claim that is bolstered by compelling simulation results. By contrast the best rigorous work by Coja-Oghlan showed that if s2>Cdlogds^{2}>Cd\log d for a large constant CC, then the spectral method solves the clustering problem. (After the current article first appeared as a preprint, independent works gave simple algorithms that work when s2>Cds^{2}>Cd.)

What makes Conjecture 1.2 even more appealing is the fact that if it is true, it represents a threshold for the solvability of the clustering problem. Indeed it was conjectured in and proved in that if s2ds^{2}\leq d then the clustering problem in G(n,an,bn)\mathcal{G}(n,\frac{a}{n},\frac{b}{n}) problem is not solvable as nn\to\infty. It was further shown in that s2=ds^{2}=d represents the threshold for identifiability of the parameters aa and bb as conjectured by .

The threshold d=s2d=s^{2} can be understood both in terms of spin systems and in terms of random matrices. It was first derived heuristically as a stability condition for the belief propagation algorithm . Around a typical vertex, the joint distribution of the graph labeled by the clusters is asymptotically (in the sense of local weak convergence) a Galton-Watson branching process labeled with the free Ising model. The threshold d=s2d=s^{2} corresponds to the extremality or reconstruction threshold for the Ising model, the point at which information on the spin at the root can be recovered over arbitrarily long distances. In this property was used to show the impossibility of reconstructing clusters when s2ds^{2}\leq d.

In the threshold was heuristically derived by considering the spectrum of the matrix of non-backtracking walks. On an Erdős-Rényi random graph the bulk spectrum of this matrix has radius d1/2d^{1/2}. When s2>ds^{2}>d there is a natural construction of an approximate eigenvector of eigenvalue ss which thus escapes from the bulk exactly at d=s2d=s^{2}. The random matrix interpretation plays a central role in our anaylsis and is discussed further in Section 2.3.

3 Notation

We write graphs as G=(V,E)G=(V,E), where VV is a vertex set and EE is the set of edges. We write vwv\sim w if {v,w}E\{v,w\}\in E. If we need to speak about several graphs at the same time, we may write V(G)V(G) or E(G)E(G) in order to be clear that we are referring to the vertices (or edges) of the graph GG. If σ\sigma is a labelling on VV and UVU\subset V, then we write σU\sigma_{U} for the restriction of σ\sigma to UU.

In order not to be overwhelmed with quantifiers, we make heavy use of the asymptotic notations o,O,ωo,O,\omega, and Ω\Omega, including in the antecedent. For example, the statement that “an=O(bn)a_{n}=O(b_{n}) implies that cn=O(dn)c_{n}=O(d_{n})” means that for every C1>0C_{1}>0 there exists some C2>0C_{2}>0 such that anC1bna_{n}\leq C_{1}b_{n} for all nn implies that cnC2dnc_{n}\leq C_{2}d_{n} for all nn. Given a collection of sequences (av,n)(a_{v,n}) depending on some other parameter vv, we say that they satisfy some asymptotics uniformly in vv if the hidden constants or rates of convergence do not depend on vv. For example, “av,n=o(bn)a_{v,n}=o(b_{n}) uniformly in vv” means that there is some sequence cn0c_{n}\to 0 such that av,n/bncna_{v,n}/b_{n}\leq c_{n} for all vv and all nn.

We write that a sequence of events holds asymptotically almost surely (or a.a.s.) if their probabilities converge to one.

Our results

Our main result is a proof of Conjecture 1.2:

If s2>ds^{2}>d then the clustering problem in G(n,an,bn)\mathcal{G}(n,\frac{a}{n},\frac{b}{n}) is solvable as nn\to\infty, in the sense that one can a.a.s. find a bisection whose correlation with the planted bisection is bounded away from 0.

Our algorithm is also computationally efficient, and can be implemented in almost linear time O(nlog2n)O(n\log^{2}n). We recently learned that Laurent Massoulié independently found a different proof of the conjecture .

We note that our proof of Theorem 2.1 actually gives slightly stronger results. First, aa and bb do not need to be fixed, but may grow slowly with nn:

If a,b=no(1/loglogn)a,b=n^{o(1/\log\log n)} and s2/dλ>1s^{2}/d\geq\lambda>1 for all nn then the clustering problem in G(n,an,bn)\mathcal{G}(n,\frac{a}{n},\frac{b}{n}) is solvable as nn\to\infty, in the sense of Theorem 2.1.

Moreover, although our proof of Theorem 2.1 does not give particularly good bounds for the size of the correlation, it does show that the correlation tends to 1 as s2/ds^{2}/d grows.

If a,b=no(1/loglogn)a,b=n^{o(1/\log\log n)} and s2/ds^{2}/d\to\infty then the clustering problem in G(n,an,bn)\mathcal{G}(n,\frac{a}{n},\frac{b}{n}) is solvable as nn\to\infty, in the sense that one can a.a.s. find a bisection that agrees with the planted bisection up to an error of o(n)o(n) vertices.

It was conjectured in that a popular algorithm, belief propagation initialized with i.i.d. uniform messages, detects communities all the way to the threshold. However, analysis of belief propagation with random initial messages is a difficult task. Krzakala et al. argued that a novel and very efficient spectral algorithm based on a non-backtracking matrix should also detect communities all the way to the threshold. Unfortunately we were unable to follow the path suggested in and our algorithm for detection is not a spectral algorithm. Still, our analysis is based on the non-backtracking walk and we show that it can be implemented using matrix powering. The algorithm has very good theoretical running time O(nlog2n)O(n\log^{2}n) but the constant in the OO needed for the proof that the algorithm is correct is very large, so the algorithm described is not nearly as efficient as the one in (nor have we implemented it).

A path is a sequence u0,,uku_{0},\dots,u_{k} of vertices such that for all ii, uiui1u_{i}\neq u_{i-1}. (Note that we do not require vertices in a path to be connected by an edge in any given graph; thus, it might be more standard – but also rather longer – to use the term path in the complete graph instead.) We write E(γ)E(\gamma) for the set of {ui1,ui}\{u_{i-1},u_{i}\} and V(γ)V(\gamma) for the set {u0,,uk}\{u_{0},\dots,u_{k}\}.

A non-backtracking path is a path u0,,uku_{0},\dots,u_{k} such that for every 0ik20\leq i\leq k-2, uiui+2u_{i}\neq u_{i+2}.

A self-avoiding path is a sequence of vertices u0,,uku_{0},\dots,u_{k} that are all distinct.

The basic intuition behind the proof is that we should be expecting a larger number of non-backtracking paths of a given length kk in the graph between two vertices uu and vv if they have the same label, while a smaller number of non-backtracking paths in the graph is expected if the nodes uu and vv have different labels.

Instead of working with the number of non-backtracking walks in the graph, it is more convenient to work with a rank one correction, where an edge is represented by 1d/n1-d/n and a non-edge by d/n-d/n. With this alteration the expected weight of each edge is .

Let We=1{eE(G)}d/nW_{e}=1_{\{e\in E(G)\}}-d/n. For a non-backtracking path γ=u0,,uk\gamma=u_{0},\ldots,u_{k}, let

Where kk is clear from the context, we will sometimes just write Nu,vN_{u,v}.

Our basic method is to show that Nu,v(k)N_{u,v}^{(k)} is correlated with σuσv\sigma_{u}\sigma_{v} for some klognk\sim\log n. In order to do this, we would like to compute the expectation and variance of the Nu,vN_{u,v}. There is an obstacle, however, which is that on some very rare event there are many more paths in the graph than there should be; this event throws off the expectation and variance of Nu,vN_{u,v}. For intuition, take k=αlognk=\lceil\alpha\log n\rceil for some large constant α\alpha (in order to make our method work, we will need α\alpha arbitrarily large if s2s^{2} is arbitrarily close to dd). As we will show later, Nu,vN_{u,v} is of the order sk/ns^{k}/n with high probability. However, the expectation of Nu,vN_{u,v} could be much larger. Indeed, the probability that an mm-clique containing uu and vv will appear is at least nm2=em2lognn^{-m^{2}}=e^{-m^{2}\log n}. On the event of its appearance, there are at least (m2)k=e(m2)αlogn(m-2)^{k}=e^{(m-2)\alpha\log n} non-backtracking paths of length kk from uu to vv that stay entirely within the graph; each of these paths γ\gamma has Xγ1X_{\gamma}\approx 1. If log(m2)2logs\log(m-2)\geq 2\log s and αlog(m2)>2m2\alpha\log(m-2)>2m^{2} (which can be achieved by first taking mm large enough depending on ss and then taking α\alpha large enough depending on mm), then these paths contribute an expected weight of at least

which is of a larger order than sk/ns^{k}/n. For this reason, our argument for controlling Nu,vN_{u,v} will be divided into two parts: we will use the second moment method to control the part of Nu,vN_{u,v} that involves “nice” paths, and we will control the other paths by conditioning on an event that excludes cliques, along with some other problematic structures which we call tangles.

Roughly speaking, our main technical result is that Nu,vN_{u,v} is correlated with σuσv\sigma_{u}\sigma_{v}, and that as uu and vv vary then the variables Nu,vN_{u,v} are essentially uncorrelated.

It follows easily from Theorem 2.8 that if s2/ds^{2}/d\to\infty then we can get very accurate estimates of σuσv\sigma_{u}\sigma_{v} by computing Nu,v(k)N_{u,v}^{(k)}. To achieve non-trivial estimates of σuσv\sigma_{u}\sigma_{v} in the case s2/d>1s^{2}/d>1 is more complicated. We will explain the procedure roughly in the next section.

2 Almost linear time algorithm

Theorem 2.8 suggests a natural way to check if two vertices are in the same cluster; this is the basis of the algorithm we develop to cluster the graph. We further show how to efficiently perform the algorithm using matrix powering.

There is an algorithm that runs in time O(ndlog2n)O(nd\log^{2}n) and satisfies the following guarantee: for any λ>1\lambda>1 there is an ϵ>0\epsilon>0 such that if s2/dλ>1s^{2}/d\geq\lambda>1 for all nn then the algorithm, given GG(n,an,bn)G\sim\mathcal{G}(n,\frac{a}{n},\frac{b}{n}), produces a labelling τ\tau satisfying

with probability 1o(1)1-o(1), where σ\sigma is the true labelling of GG.

With more care in the analysis the running time could be reduced to O(ndlogn)O(nd\log n) for a slightly modified algorithm.

3 Connections with random matrix theory

The preceding arguments – and also the more general random matrix theory – break down in the sparse case, where aa and bb are O(1)O(1). One reason for this is the presence of high-degree vertices: with high probability there exist vertices with degree Ω(logn/loglogn)\Omega(\log n/\log\log n), and these play havoc with the spectrum of AA. We emphasize that this is not merely a looseness in the analysis: naive spectral algorithms genuinely fail for sparse graphs, see e.g. for a discussion of why this happens for the stochastic block model, or for a quite different example of the connection between vertex degree and spectrum in random graphs.

Some attempts were made to modify spectral methods. A popular idea is to prune all nodes whose degree is larger than some big constant. This idea was pioneered by Feige and Ofek for a different application, and studied in our context by Coja-Oghlan , who gave a spectral algorithm that succeeds on sparse graphs but not all the way to the threshold: it requires s2>Cdlogds^{2}>Cd\log d for some constant CC.

Krzakala et al. suggest quite a different way to “fix” the spectrum of AA: instead of AA, they consider a non-symmetric matrix that avoids the contribution of high-degree vertices by counting non-backtracking paths in the graph instead of all paths in the graph. Although simulations strongly suggested that the non-backtracking matrix had desirable spectral properties whenever s2>ds^{2}>d, the proof remained elusive until very recently (and after the first appearance of this work), when Bordenave et al. gave a solution. Their result required new developments in random matrix theory, partly because the non-backtracking matrix is very sparse and partly because its entries are far from i.i.d. These methods were further developed by Bordenave , who gave a new proof of Alon’s conjecture for the second eigenvalue of random dd-regular graphs.

In an independent (and concurrent) work, Massoulié gave a proof of Theorem 2.1 using a spectral algorithm. He considered the matrix MM where MuvM_{uv} is the number of self-avoiding walks between uu and vv of length αlogn\alpha\log n, for some not-too-large constant α\alpha. This “regularized” matrix has several advantages over the non backtracking matrix we consider. The matrix is quite dense (all degree are polynomial) and in fact is close to regular. Moreover, the matrix is symmetric which allows standard perturbation theory to apply. On the other hand, the entries of the matrix are not independent. Still, Massoulié showed how to apply the trace method and analyze the spectrum of the matrix. He proved that if s2>ds^{2}>d then MM has a separation between its second- and third-largest eigenvalues, and that the second eigenvector is correlated with the true labelling. Hence, the spectral algorithm that rounds the second eigenvector of MM succeeds down to the threshold. We note that the algorithm we describe is much more efficient than the one by , which might not even be implementable in polynomial time (for example, counting self-avoiding walks is #P-complete even for fairly simple families of graphs); in any case, simply writing down the dense n×nn\times n matrix in will take time O(n2)O(n^{2}).

The algorithm and its running time

In this section, we will describe the algorithm and give its analysis assuming Theorem 2.8. We will begin by describing how to use the quantities Nu,v(k)N_{u,v}^{(k)} to estimate the graph labelling. In Section 3.2, we will show how these quantities may be computed efficiently, thereby completing the description of our algorithm. In Section 3.3, we prove the algorithm’s correctness.

Recall that our random graph model adds a within-class edge with probability a/na/n and a between-class edge with probability b/nb/n, where aa and bb are parameters that may grow slowly with nn. We set d=(a+b)/2d=(a+b)/2 and s=(ab)/2s=(a-b)/2, and assume that s2/dλ>1s^{2}/d\geq\lambda>1 for all nn.

We begin by describing a simplified version of our algorithm. This simplified version runs more slowly, but it is more intuitive and will serve to motivate the main algorithm. The basic idea is to fix a very slowly increasing sequence, say R=Rn=2loglogloglognR=R_{n}=2\lceil\log\log\log\log n\rceil. We write Br(v)B_{r}(v) for the set of vertices whose path distance to vv in GG is at most rr, and we write Sv=BR(v)BR1(v)S_{v}=B_{R}(v)\setminus B_{R-1}(v). Fix a node ww^{*} with large degree (at least loglogn\sqrt{\log\log n}, say). For every other node vv, consider the graph H=H(v)H=H(v) obtained by removing ww^{*} and BR1(v)B_{R-1}(v) from GG. Our estimate for σv\sigma_{v} will be

The preceding algorithm has two flaws that we will correct shortly. First, the distribution of HH is slightly painful to work with, because after removing the node ww^{*} the remaining edges are no longer independent. Second, the running time of the algorithm above will be about O(n2logn)O(n^{2}\log n), because we must compute the numbers Nu,w(k)N_{u,w}^{(k)} (each of which takes time O(nlogn)O(n\log n)) with respect to O(n)O(n) different graphs H(v)H(v). This could be fixed by handling several nodes simultaneously: we could remove vBR1(v)\bigcup_{v}B_{R-1}(v) from GG, where the union is taken over, say, n/lognn/\log n vertices vv. This is almost the approach that we will take, but we need to be careful that whatever we remove, the remaining graph is almost distributed according to the stochastic block model. We will ensure this by a slightly convoluted plan: instead of removing specific nodes and neighborhoods, we will remove δn\delta n vertices from GG uniformly at random and look for nodes and neighborhoods that are contained in the removed part. The precise description of our algorithm follows:

Let R=2loglogloglognR=2\lceil\log\log\log\log n\rceil. Let δ>0\delta^{\prime}>0 be chosen so that s2(1δ)2=d(1δ)s^{2}(1-\delta^{\prime})^{2}=d(1-\delta^{\prime}) and let δ=δ/2\delta=\delta^{\prime}/2. We will choose constants κ\kappa and kk depending on ss^{\prime} and dd^{\prime} (the precise dependence will be given later). Then, we proceed as follows:

Remove at random n\lceil\sqrt{n}\rceil vertices VV^{\prime\prime} from the graph , leaving the graph G=(V,E)G^{\prime}=(V^{\prime},E^{\prime})

Let ww^{\ast} be a node in VV^{\prime\prime} whose number of neighbors in VV^{\prime} is closest to loglogn\lceil\sqrt{\log\log n}\rceil. Let SS_{\ast} be the set of its neighbors in VV^{\prime}.

For each vVv\in V^{\prime} denote Sv=BR(v)BR1(v)S_{v}=B_{R}(v)\setminus B_{R-1}(v).

For each 1jlogn1\leq j\leq\log n, let UjU_{j} be a uniformly random set of nδn\lceil n\delta\rceil-\lceil\sqrt{n}\rceil vertices of VSV^{\prime}\setminus S_{\ast}; set Vj=V(SUj)V_{j}=V^{\prime}\setminus(S_{\ast}\cup U_{j}).

For each vVv\in V^{\prime} and 1jlogn1\leq j\leq\log n define

where ξj,v\xi_{j,v} are i.i.d. random variables uniform on $$ and

For each vVv\in V^{\prime} let JvJ_{v} be the first jj such that BR1(v)Vj=B_{R-1}(v)\cap V_{j}=\emptyset and (SvS)Vj(S_{v}\cup S_{\ast})\subset V_{j}, and 0 if no such jj exists. Then set τ(v)=τJv,v\tau(v)=\tau_{J_{v},v} when Jv0J_{v}\neq 0 and τJv,v0\tau_{J_{v},v}\neq 0. For all other vVv\in V, choose τ(v)\tau(v) at independently at random uniformly from {1,1}\{1,-1\}.

We will prove Theorem 2.9 in Section 3.3 by showing that the output τ\tau of the algorithm above is correlated with the true partition with high probability. In the following section we describe how to evaluate the τ\tau in time O(ndlog2n)O(nd\log^{2}n).

2 Efficient implementation of the algorithm

The main computational step in the algorithm above is to compute Nu,u(k,j)N^{(k,j)}_{u,u^{\prime}}; in this section, we will describe how to do so. First, however, note that Nu,u(k,j)N^{(k,j)}_{u,u^{\prime}} is just Nu,u(k)N^{(k)}_{u,u^{\prime}} computed on the subgraph induced by V(SVj)V^{\prime}\setminus(S_{\ast}\cup V_{j}). In particular, it is enough to show how to compute Nu,u(k)N^{(k)}_{u,u^{\prime}} efficiently.

Let VV be the vertex set and AA the adjacency matrix of the graph GG. We recall the definition of Nu,v(k)N^{(k)}_{u,v} and introduce some related matrices

(where an empty sum is defined to be zero, so Q(k,ρ)Q^{(k,\rho)} is the zero matrix for k<0k<0), and define the 4n×4n4n\times 4n matrices

Finally, define the 4n×n4n\times n matrix Qk\mathcal{Q}_{k} by

For a graph on nn vertices and mm edges and for every vector zz the matrix N(k)zN^{(k)}z can be computed in time O((m+n)k)O((m+n)k).

The proof follows from the fact that (by Lemma 3.2) N(k)zN^{(k)}z is the first nn coordinates of

and that each of the matrices M,M^\mathcal{M},\hat{\mathcal{M}} and Q0\mathcal{Q}_{0} is made of at most 1616 blocks, each of which is a sum of a sparse matrix with O(n+m)O(n+m) entries and a rank 11 matrix. Therefore, the displayed expression above can be computed with k+1k+1 matrix-vector multiplications, each of which requires O(n+m)O(n+m) time. ∎

We can now prove the running time bound in Theorem 2.9. Indeed, in iteration jj of the algorithm, the sum uSVj,uSvVjNu,u(k,j)\sum_{u\in S_{\ast}\cap V_{j},u^{\prime}\in S_{v}\cap V_{j}}N_{u,u^{\prime}}^{(k,j)} is the non-trivial computation that needs to be done. This sum can be read from the entries of N(k)zN^{(k)}z, where NN is computed on the graph with the removed nodes and zz is the indicator of the vertices in SS_{\ast}. By Proposition 3.3, the running time of iteration jj is O((n+m)k)O((n+m)k). Since there are logn\log n iterations and we have m=O(nd)m=O(nd) and k=O(logn)k=O(\log n) we obtain that the running time of the algorithm is O(ndlog2n)O(nd\log^{2}n).

We will write Nu,w,v(k)N^{(k)}_{u,w,v} to denote the Nu,v(k)N^{(k)}_{u,v}, but with the sum restricted to non-backtracking paths which move to ww on their first step. Then we have the recursion

By expanding the terms of the form Nu,w,v(k2)N^{(k-2)}_{u,w,v} repeatedly, we obtain by induction that

The recursion above can be written using the matrix QQ from (5) in the following way:

Written in terms of the matrices Q\mathcal{Q} from (8) and M,M^\mathcal{M},\hat{\mathcal{M}} defined in (6) and (7), the recursions above can be written as MQk=Qk+1\mathcal{M}\mathcal{Q}_{k}=\mathcal{Q}_{k+1} and \hat{\mathcal{M}}\mathcal{Q}_{k}=\left(\begin{array}[]{cccc}N^{(k+1)}&0&0&0\\ \end{array}\right)^{T}, as claimed. ∎

3 Correctness of the algorithm

In this section, we will prove the correctness of the algorithm assuming Theorem 2.8. We begin with some preliminary observations about the distributions of various subgraphs of GG. The distribution of GG^{\prime} is simply a stochastic block model with fewer vertices, that is G(nn,a/n,b/n)G(n-\lceil\sqrt{n}\rceil,a/n,b/n). Let Gj=(Vj,Ej)G_{j}=(V_{j},E_{j}) denote the graph obtained at iteration jj; GjG_{j} is also distributed as a stochastic block model, but we will need to say more because we will need to use GjG_{j} conditioned on some extra properties. In particular, we need to argue that conditioned on a vertex neighborhood being removed, the distribution on the remaining graph is drawn (approximately) from the block model. The technical issue here is that the removed vertices are correlated and moreover we need to condition on some of their labels. Nevertheless, this can be handled because the neighborhood of a single vertex does not contain too many other vertices.

For a vertex vv in GjG_{j}, let U=U(v)U=U(v) denote the set SvSS_{v}\cup S_{\ast}. We will be interested in the distribution of (Gj,U,σU)(G_{j},U,\sigma_{U}) and we would like to couple it with a configuration of (G,U,σU)(G^{\prime},U^{\prime},\sigma^{\prime}_{U^{\prime}}) drawn from G(nδn,a/n,b/n)\mathcal{G}(n-\lceil\delta n\rceil,a/n,b/n) and UU^{\prime} is some fixed set of vertices of size U|U|.

The proof will couple σ\sigma^{\prime} with σVj\sigma_{V_{j}} and the edges of GG^{\prime} with the edges of GjG_{j}. The coupling proceeds in the following way:

We take σ\sigma^{\prime} and σ\sigma to be equal on U=UU=U^{\prime} (neither one is random in either measure).

Then we try to couple all other labels so they are completely identical.

Finally, if the labels are identical, we will include exactly the same edges. This is possible since different edges are independent and the probabilities of including edges just depend on the end points.

where n±n_{\pm} is the number of ±1\pm 1 labels in σBR1(v)\sigma_{B_{R-1}(v)}. Thus

Next we note that with high probability S=loglogn|S_{\ast}|=\lceil\sqrt{\log\log n}\rceil since the probability that there exists a vertex in VV^{\prime\prime} with that number of neighbors tends to one; we will condition on this event. Also, we may assume without loss of generality that σw=+\sigma_{w^{\ast}}=+. We will denote

Note that MM_{\ast} is a sum of i.i.d. signs, each of which has expectation sp\frac{s}{p} (since we conditioned on σw=+\sigma_{w^{\ast}}=+). Hence, a.a.s.

which means in particular that MM_{\ast} a.a.s. has the same sign as ss.

Before we proceed to the estimates that apply specifically for our algorithm, let us note a simple corollary of the first three statements of Theorem 2.8:

Take disjoint sets U1,U2VU_{1},U_{2}\subset V that have cardinality no(1)n^{o(1)}; let U=U1U2U=U_{1}\cup U_{2}. Under the notation and assumptions of Theorem 2.8, if Y=uU1,vU2Yu,vY=\sum_{u\in U_{1},v\in U_{2}}Y_{u,v} then uniformly for all labellings σU\sigma_{U} on UU,

We divide the sum into three parts: the first part (containing U1U2|U_{1}||U_{2}| terms) sums over u=uu=u^{\prime} and v=vv=v^{\prime}; for this part, we apply (2). The second part (containing less than U12U2+U22U1|U_{1}|^{2}|U_{2}|+|U_{2}|^{2}|U_{1}| terms) sums over indices with either u=uu=u^{\prime} or v=vv=v^{\prime}; for this part, we use the bound

and then apply (2) to each term on the right hand side. Finally, the third part ranges over distinct u,u,v,vu,u^{\prime},v,v^{\prime} (less than U12V12|U_{1}|^{2}|V_{1}|^{2} terms), and we apply (3) Putting these three parts together,

We may apply the previous lemma with (4) and Chebyshev’s inequality to show that

can be used to estimate the sign of MvM_{v} (which, recall, was defined in (9)).

For a random vertex vv and any ϵ>0\epsilon>0, conditioned on Jv0J_{v}\neq 0,

Next, we control ZvYvZ_{v}-Y_{v}. By (4) and a union bound,

Since Sv|S_{v}| and S|S_{\ast}| are no(1)n^{o(1)}, (4) implies that for any t1t\geq 1, the right hand side above converges to zero. Putting it together and setting t=ϵsRt=\epsilon{s^{\prime}}^{R} (which is at least 1 for large enough nn),

The purpose of this section is to show that MvM_{v} can be used to estimate σv\sigma_{v}. We will do this by exploiting the connection between neighborhoods in GG and multi-type branching processes. Since this section is the only place where we will use the theory of branching processes, we will give only a brief introduction; readers unfamiliar with this theory should consult the book by Athreya and Ney .

For notational simplicity, we will assume for now that s>0s>0. The case s<0s<0 will be discussed at the end of the section. For the rest of this section, TT will denote a Galton-Watson branching process with Poisson(d)\operatorname{Poisson}(d) offspring distribution rooted at ρ\rho. We will assign three random labellings to the vertices of TT in the following way: first, divide TT into connected components by running bond percolation: deleting each edge independently with probability s/ds/d. Then, for each component choose a label uniformly in {±1}\{\pm 1\} and assign that label to all vertices in that component. We define η,η+,\eta,\eta^{+}, and η\eta^{-} respectively to be the configurations generated this way where the connected component of the root is labelled randomly, labelled +1+1, or labelled 1-1 respectively. Let ζ=ηρ\zeta=\eta_{\rho}, let ΨR=vSR(ρ)ηv\Psi_{R}=\sum_{v\in S_{R}(\rho)}\eta_{v} and define ΨR±\Psi_{R}^{\pm} similarly.

It is well-known (and not hard to check) that the random labelling η\eta may also be generated in the following way: choose ηρ\eta_{\rho} uniformly at random. For every child uu of ηρ\eta_{\rho} independently, let ηu=ηρ\eta_{u}=\eta_{\rho} with probability aa+b\frac{a}{a+b} and otherwise let ηu=ηρ\eta_{u}=-\eta_{\rho}. Then recurse this process down the tree: for every child ww of uu independently, let ηw=ηu\eta_{w}=\eta_{u} with probability aa+b\frac{a}{a+b} and otherwise let ηw=ηu\eta_{w}=-\eta_{u}. The processes η+\eta^{+} and η\eta^{-} may be generated similarly, except that instead of beginning with ηρ\eta_{\rho} labelled randomly, we fix ηρ+=+1\eta^{+}_{\rho}=+1 and ηρ=1\eta^{-}_{\rho}=-1.

Let ξ\xi be a uniform random variable on $thatisindependentofthat is independent ofT,,\eta,and, and\eta^{\pm}.Thereexist. There exist\kappa>0andand\epsilon>0$ such that

By symmetry, ΨR\Psi_{R} is symmetric about 0 and so if ξ\xi is an independent uniform on $thenforanythen for any\kappa>0$,

for small enough δ>0\delta>0. By Chebyshev’s inequality, both ΨR+\Psi_{R}^{+} and ΨR\Psi_{R}^{-} belong to [κsR,κsR][-\kappa s^{R},\kappa s^{R}] with probability 1O(κ2)1-O(\kappa^{-2}). Then

Finally, symmetry of ΨR+\Psi_{R}^{+} and ΨR\Psi_{R}^{-} implies that

which completes the proof if κ\kappa is a sufficiently large constant. ∎

For 1ilogn1\leq i\leq\log n let (Ti,ηi)(T_{i},\eta_{i}) be iid copies of (T,η)(T,\eta) above for R=2loglogloglognR=2\lceil\log\log\log\log n\rceil. For v1,,vlognv_{1},\ldots,v_{\log n} be uniformly chosen vertices in VV,

This argument is a minor variation on a well-known argument showing the local tree-like structure of sparse graphs. We will give only a sketch, but a much more detailed argument (although for only one neighborhood) is given in .

We establish the result by coupling the two processes. By Markov’s inequality, with high probability i=1lognTilog2n\sum_{i=1}^{\log n}|T_{i}|\leq\log^{2}n and i=1lognBR(vi)log2n\sum_{i=1}^{\log n}|B_{R}(v_{i})|\leq\log^{2}n. Moreover, by standard arguments in sparse random graphs, BR(vi)\bigcup B_{R}(v_{i}) is a disjoint union of trees with high probability.

We reveal the branching process trees by sequentially revealing for each vertex how many children of each label it has (Poisson(a/2)\hbox{Poisson}(a/2) of the same label and Poisson(b/2)\hbox{Poisson}(b/2) of the opposite label) down to level RR in a breadth-first manner.

Similarly, we can reveal the neighborhoods of the viv_{i} and their labels in GG by sequentially revealing the neighbors and labels of the currently revealed vertices. Suppose that we condition on the labels of all vertices and on the graph structure that was revealed so far, and suppose that we want to reveal the neighbors of a given vertex uu. With high probability, none of these revealed neighbors will belong to the already-explored set and so we will focus on uu’s neighbors among the unexplored vertices. If n±n^{\pm} are the numbers of ±1\pm 1-labelled vertices that have not yet been explored, then uu has Bin(nσu,a/n)\operatorname{Bin}(n^{\sigma_{u}},a/n) neighbors of label σu\sigma_{u} and Bin(nσu,b/n)\operatorname{Bin}(n^{\sigma_{u}},b/n) neighbors of label σu-\sigma_{u}. Note that n±n^{\pm} are both in n/2±n2/3n/2\pm n^{2/3} with high probability, because the original labels were biased by at most O(n1/2)O(n^{1/2}) and we have revealed at most log2n\log^{2}n of them.

We couple these two processes with the usual coupling of Poisson and Binomial random variables. In each step we fail with probability O(n1/3)O(n^{-1/3}) and (since there are at most log2n\log^{2}n steps) the coupling altogether fails with probability o(1)o(1). ∎

Using the coupling between graphs and trees, we will show that Aj,v\mathcal{A}_{j,v} is a good estimator for σv\sigma_{v}. Later, we will show that Aj,v\mathcal{A}_{j,v} usually agrees with our previous estimator τj,v\tau_{j,v}.

Let v1,,vlognv_{1},\ldots,v_{\log n} be a uniform sample without replacement from VV^{\prime}. Take the coupling in Lemma 3.8, and let ΨR,i=vSR(ρi)ηv\Psi_{R,i}=\sum_{v\in S_{R}(\rho_{i})}\eta_{v}, where ρi\rho_{i} is the root of TiT_{i}. Set

By Lemma 3.8, the same holds for Aj,vi\mathcal{A}_{j,v_{i}}. If we now partition VV^{\prime} to sets of size logn\log n and use the fact that σviAj,vi\sigma_{v_{i}}\mathcal{A}^{\prime}_{j,v_{i}} are ±1\pm 1, we obtain the claim of the lemma. ∎

Recall that we have been assuming s>0s>0. In the case s<0s<0, Lemma 3.9 (which is the only result from this section that we will use later) remains true. Indeed, in order to generate the TT and η\eta for the case s<0s<0, one can generate them for s|s| and then flip the sign of every label in an odd generation. Since RR is even, level RR of the tree is unchanged and sR=sRs^{R}=|s|^{R}. Thus, Lemma 3.9 remains true.

and that JvJ_{v} is the first jj such that BR1(v)Vj=B_{R-1}(v)\cap V_{j}=\emptyset and (SvS)Vj(S_{v}\cup S_{\ast})\subset V_{j}, and Jv=0J_{v}=0 if no such jj exists.

Recall that with high probability we have that S=loglogn|S_{\ast}|=\sqrt{\log\log n}. With high probability BR(v)d2R|B_{R}(v)|\leq d^{2R}. Condition on BR(v)d2R|B_{R}(v)|\leq d^{2R}. The probability that BR1Vj=B_{R-1}\cap V_{j}=\emptyset and SvSVjS_{v}\cup S_{\ast}\subset V_{j} is bounded below by ec(d2R+S)(logn)1/2e^{-c(d^{2R}+|S_{\ast}|)}\geq(\log n)^{-1/2}. Since these are independent events given BR(v)|B_{R}(v)| it follows that with probability tending to one Jv0J_{v}\neq 0. ∎

We now show that the indicators AJv,v\mathcal{A}_{J_{v},v} and τJv,v\tau_{J_{v},v} usually agree.

By Lemma 3.10, the probability of Jv=0J_{v}=0 goes to zero, and therefore Lemma 3.6 implies that

The event AJv,vτJv,v\mathcal{A}_{J_{v},v}\neq\tau_{J_{v},v} is equivalent to ξJv,v\xi_{J_{v},v} falling outside the interval with end-points Mv/(κsR)-M_{v}/(\kappa s^{\prime R}) and

Since with high probability MM_{\ast} is concentrated around sdS\frac{s^{\prime}}{d}|S_{\ast}|, Lemma 3.6 implies the latter end point converges in probability to

Therefore the probability that ξJv,v-\xi_{J_{v},v} falls in the interval converges to as needed. ∎

We can now complete the proof of Theorem 2.9.

Combining Lemmas 3.9, 3.11 and 3.10 we have that with high probability

yielding an algorithm recovering the a constant correlation with the true partition. The running time bound was proved in Section 3.2. ∎

The proof of Theorem 2.3 (i.e., when we are far above the threshold) is rather easier, and doesn’t require the branching process tools:

Combinatorial path bounds

A crucial ingredient in the proof is obtaining bounds on the number of various types of paths (in the complete graph) in terms of how much they self-intersect, either by intersecting a previous vertex on the path or by repeating an edge of the path.

Given a path γ=(v1,,vk)\gamma=(v_{1},\dots,v_{k}), we say that an edge (vi,vi+1)(v_{i},v_{i+1})

is new if for all jij\leq i, vjvi+1v_{j}\neq v_{i+1}

is old if there is some j<ij<i such that {vi,vi+1}={vj,vj+1}\{v_{i},v_{i+1}\}=\{v_{j},v_{j+1}\}.

Otherwise, we say that (vi,vi+1)(v_{i},v_{i+1}) is returning (in this case vi+1=vjv_{i+1}=v_{j} for j<ij<i but {vi,vi+1}\{v_{i},v_{i+1}\} is not one of the previous edges).

Let kn(γ),ko(γ)k_{n}(\gamma),k_{o}(\gamma) and kr(γ)k_{r}(\gamma) be the number of new, old, and returning edges respectively.

For the rest of this subsection we fix α\alpha and set k=αlognk=\lceil\alpha\log n\rceil. Note that for every new edge in a path, the number of distinct vertices in the path increases by one, as does the number of distinct edges. For a returning edge, only the number of edges increases, while for an old edge, neither increases. Therefore we easily see that:

The number of vertices visited by the path γ\gamma is kn(γ)+1k_{n}(\gamma)+1 and the number of edges is kn(γ)+kr(γ)k_{n}(\gamma)+k_{r}(\gamma).

Our first bound is a fairly crude one that will allow us to assume that krk_{r} is smaller than some constant. Note that there is no non-backtracking restriction yet.

For any constant CC, if kr1k_{r}\geq 1 and nn is sufficiently large then there are at most

paths γ\gamma of length at most ClognC\log n, with a fixed starting and ending point, and satisfying kn(γ)=knk_{n}(\gamma)=k_{n} and kr(γ)=krk_{r}(\gamma)=k_{r}.

The point of Lemma 4.4 is that it implies that paths with large krk_{r} are so rare that they do not contribute any weight. Indeed, for some α\alpha to be determined choose kk^{*} large enough (depending on α\alpha) so that

It then follows from Lemma 4.4 that if Γ\Gamma is the collection of all paths of length at most 4αlogn4\alpha\log n with kr(γ)kk_{r}(\gamma)\geq k^{*} then

Note that for any path γ\gamma and any labelling σ\sigma,

(since kr(γ)+kn(γ)k_{r}(\gamma)+k_{n}(\gamma) is the number of edges in γ\gamma). Hence, we have:

Let k=k(α,d)k^{*}=k^{*}(\alpha,d) be defined in (13). Then

where the sum ranges over γ\gamma of length at most 4αlogn4\alpha\log n and with kr(γ)kk_{r}(\gamma)\geq k^{*}.

Consider paths of fixed length kk; later, we will sum over all kClognk\leq C\log n. Suppose that for all ii, we decide in advance whether (vi,vi+1)(v_{i},v_{i+1}) will be new, old, or returning. There are at most (kkn ko kr)\binom{k}{k_{n}\ k_{o}\ k_{r}} ways to make this choice. Fix an ii and suppose that viv_{i} has already been determined. If (vi,vi+1)(v_{i},v_{i+1}) is new then there are at most nn choices for vi+1v_{i+1}. If (vi,vi+1)(v_{i},v_{i+1}) is returning then there are at most V(γ)=kn+1k|V(\gamma)|=k_{n}+1\leq k choices for vi+1v_{i+1}. Otherwise, (vi,vi+1)(v_{i},v_{i+1}) is an old edge, and there are at most kr+2k_{r}+2 choices for vi+1v_{i+1} because kr+2k_{r}+2 bounds the maximum degree of the final path. Hence, the total number of choices is at most

(In the case ko=0k_{o}=0, we adopt the convention (y/0)0=1(y/0)^{0}=1.) Now, the quantity (y/x)x(y/x)^{x} is increasing in xx as long as xy/ex\leq y/e. Applying this with y=2ekkry=2ekk_{r} and the values x=koky/ex=k_{o}\leq k\leq y/e we have

On the other hand, ek2/(nkr)n2/3ek^{2}/(nk_{r})\leq n^{-2/3} for sufficiently large nn. Hence, the total number of paths of length kk is at most

Summing over kClognk\leq C\log n introduces an extra factor of ClognC\log n, but this factor is cancelled out by nkr/6n^{-k_{r}/6} for sufficiently large nn. ∎

The bounds of Lemma 4.4 are not accurate when krk_{r} is small. Essentially, we require bounds of nkn1+o(1)n^{k_{n}-1+o(1)} in order to make the rest of our argument work (certainly, we can’t expect any better bounds, since every new edge but the last one has almost nn choices). In order to achieve this bound, we need to introduce extra structure into our paths: they need to be non-backtracking and without many tangles.

First, note that if we specify which edges are returning and we also specify the first new edge after each returning edge, then we have also determined which edges are old (because every edge after a new edge but before the next returning edge is new). Therefore, the number of ways to specify which edges are old, new, or returning is at most k2krk^{2k_{r}}. Then there are at most kkrk^{k_{r}} ways to choose the returning edges and at most nkn1n^{k_{n}-1} ways to choose the new edges (since one of them must hit the final vertex, so it has no choices). So far, we have made at most k3krnkn1k^{3k_{r}}n^{k_{n}-1} choices, and these choices determine the edges traversed by γ\gamma.

Having fixed the edges traversed by γ\gamma, we will now count old edges. We denote by d(v)=d(v,γ)d(v)=d(v,\gamma) the degree of vv in γ\gamma: that is, the number of wγw\in\gamma such that {w,v}E(γ)\{w,v\}\in E(\gamma). Let V3{V_{\geq 3}} be the set of vertices with degree at least 3. Note that because γ\gamma is non-backtracking, if an edge (vi,vi+1)(v_{i},v_{i+1}) is old, then vi+1v_{i+1} is already determined by the path up to viv_{i} unless viV3v_{i}\in{V_{\geq 3}}.

where m(v)mout(v)+min(v)m(v)^{m_{\text{out}}(v)+m_{\text{in}}(v)} bounds the number of ways that we can intersperse the short arrivals and departures among all visits to vv, and the second inequality follows from the fact that d(v)m(v)d(v)\leq m(v). Repeating this for all vV3v\in{V_{\geq 3}}, we see that the number of ways to choose all the old edges in dd is at most

where the second inequality follows because every time the walk returns to its old path, it creates at most two vertices of degree higher than two (one when the walk returns, and one when it leaves again). Since m(v)km(v)\leq k for every vv, the quantity above is bounded by k2krk^{2k_{r}}. Plugging this back into (14), we get the claimed bound. ∎

When we take second moments over various sums over paths, we will end up having to control the number of pairs of paths with certain properties. In what follows, we take two self-avoiding paths, γ1\gamma_{1} and γ2\gamma_{2}, of length kk. We will refine Definition 4.1 by saying that a (directed) edge (u,v)(u,v) of γ2\gamma_{2} is new with respect to γ1\gamma_{1} if v∉V(γ1)v\not\in V(\gamma_{1}). We say that (u,v)(u,v) is old with respect to γ1\gamma_{1} if the (undirected) edge {u,v}\{u,v\} appears in γ1\gamma_{1}. Otherwise, we say that that (u,v)(u,v) is returning with respect to γ1\gamma_{1}. We write kn,γ1(γ2)k_{n,\gamma_{1}}(\gamma_{2}), ko,γ1(γ2)k_{o,\gamma_{1}}(\gamma_{2}), and kr,γ1(γ2)k_{r,\gamma_{1}}(\gamma_{2}) for the numbers of edges of these three types in γ2\gamma_{2}.

Fix vertices u,u,v,vu,u^{\prime},v,v^{\prime} (not necessarily distinct). There are at most

pairs (γ1,γ2)(\gamma_{1},\gamma_{2}) of length-kk self-avoiding paths where γ1\gamma_{1} goes from uu to vv, γ2\gamma_{2} goes from uu^{\prime} to vv^{\prime}, and where kn,γ1(γ2)=kn,γ1k_{n,\gamma_{1}}(\gamma_{2})=k_{n,\gamma_{1}} and kr,γ1(γ2)=kr,γ1k_{r,\gamma_{1}}(\gamma_{2})=k_{r,\gamma_{1}}.

For this proof, whenever we speak of old, new, or returning edges of γ2\gamma_{2}, we mean with respect to γ1\gamma_{1}.

First, assume that vv^{\prime} is not an interior node of γ1\gamma_{1}. There are at most nk1n^{k-1} such choices for γ1\gamma_{1}; fix one and consider γ2\gamma_{2}. Every sequence of old edges in γ2\gamma_{2} either occurs at the beginning of γ2\gamma_{2}, or it is preceded by a returning edge. Hence, there are at most (kkr,γ1)(kkr,γ1+1)\binom{k}{k_{r,\gamma_{1}}}\binom{k}{k_{r,\gamma_{1}}+1} choices for the edge types of γ2\gamma_{2}: (kkr,γ1)\binom{k}{k_{r,\gamma_{1}}} choices for which edges are returning, and at most (kkr,γ1+1)\binom{k}{k_{r,\gamma_{1}}+1} choices for the end of each sequence of old edges. Each new edge has at most nn choices for its endpoint. In the case that v∉{u,v}v^{\prime}\not\in\{u,v\} then (since it is also not an interior node of γ1\gamma_{1}) the last edge is new, but it has no choices. Hence, there are at most nkn,γ11v∉{u,v}n^{k_{n,\gamma_{1}}-1_{v^{\prime}\not\in\{u,v\}}} choices for the new edges. Each returning edge has at most E(γ1)=k|E(\gamma_{1})|=k choices for its endpoint, and every sequence of old edges has at most 22 choices: it must follow the (self-avoiding) path γ1\gamma_{1}, but it may do so in either direction; moreover, there are at most kr,γ1+1k_{r,\gamma_{1}}+1 distinct sequences of old edges. Hence, the total number of choices for γ2\gamma_{2} is bounded by

pairs that satisfy the conditions of the lemma, and also the additional constraint that vv^{\prime} is not an interior node of γ1\gamma_{1}.

Now suppose that vv^{\prime} is an interior node of γ1\gamma_{1}. There are at most knk2kn^{k-2} ways to choose such a γ1\gamma_{1}. We may repeat the previous paragraph to bound the number of choices of γ2\gamma_{2}, except that this time there will be up to nkn,γ1n^{k_{n,\gamma_{1}}} choices for the new edges, because the final edge of γ2\gamma_{2} may not be new. Hence, there are at most

pairs of paths of this form, where vv^{\prime} is an interior node of γ1\gamma_{1}. Combined with the other case, this proves the claim. ∎

Weighted sums over self-avoiding paths

In this section, we consider the behavior of weighted sums over self-avoiding paths. In particular, we will prove (1), (2), and (3) from Theorem 2.8.

Eventually, we will need to bound (or bound) the expected weight of complicated paths. Our basic building block for these computations is the expected weight of a self-avoiding path.

Let ζ\zeta be either a self-avoiding path or a simple cycle. Let zz be the length of ζ\zeta and let u,vu,v be the endpoints. If pmz=no(1)pmz=n^{o(1)} then uniformly with respect to γ\gamma

First, consider the case m=1m=1. For any labelling τ\tau that is compatible with σu\sigma_{u} and σv\sigma_{v},

Since ζ\zeta is a path, if xx is an interior vertex of ζ\zeta then τx\tau_{x} appears exactly twice in the product above. Since τx2=1\tau_{x}^{2}=1, these terms all cancel out, leaving

which proves the claim in the case m=1m=1.

Now we take the average over all assignments τ\tau that agree with σu\sigma_{u} and σv\sigma_{v}:

where the sum ranges over all 2z12^{z-1} labellings τ\tau on ζ\zeta that agree with σu\sigma_{u} and σv\sigma_{v}. Combining this with (15) completes the proof. ∎

2 Decomposition into segments

Although our current goal is to understand the contribution of self-avoiding paths, in order to compute the second moment in Theorem 2.8, we will need to consider the concatenation of two self-avoiding paths (which may not be self-avoiding). Therefore, we introduce the following method for decomposing a general path into its self-avoiding pieces. This decomposition will also be useful in Section 6, where we consider more complicated paths.

Consider a path γ\gamma. We say that a collection of paths ζ(1),,ζ(r)\zeta^{(1)},\dots,\zeta^{(r)} is a SAW-decomposition of γ\gamma if

each ζ(i)\zeta^{(i)} is a self-avoiding path;

the interior vertices of each ζ(i)\zeta^{(i)} are not contained in any other ζ(j)\zeta^{(j)}, nor is any interior vertex of ζ(i)\zeta^{(i)} equal to the starting or ending vertex of γ\gamma; and

the ζ(i)\zeta^{(i)} cover γ\gamma, in the sense that E(γ)=iE(ζ(i))E(\gamma)=\bigcup_{i}E(\zeta^{(i)}).

Note that since ζ(i)\zeta^{(i)} and ζ(j)\zeta^{(j)} share no interior vertices, every time that the path γ\gamma begins to traverse ζ(i)\zeta^{(i)}, it must finish traversing ζ(i)\zeta^{(i)}. Moreover, the fact that ζ(i)\zeta^{(i)} and ζ(j)\zeta^{(j)} share no interior vertices implies that they are edge-disjoint, and so for each fixed ii, every edge in ζ(i)\zeta^{(i)} is traversed the same number (i.e. mim_{i}) of times.

There is a natural way to construct a SAW-decomposition of a path γ\gamma. Consider a path γ\gamma between uu and vv, and let V3{V_{\geq 3}} be the subset of γ\gamma’s vertices that have degree 3 or more in γ\gamma. Let

We call the preceding construction of ζ(1),,ζ(r)\zeta^{(1)},\dots,\zeta^{(r)} the canonical SAW-decomposition of γ\gamma.

For a set of vertices UU, if we run the preceding construction, but with

instead of as defined in (16), then we call the resulting SAW-decomposition the UU-canonical SAW-decomposition of γ\gamma.

If ζ(1),,ζ(r)\zeta^{(1)},\dots,\zeta^{(r)} is the canonical SAW-decomposition of γ\gamma then r2kr(γ)+B(γ)+1r\leq 2k_{r}(\gamma)+B(\gamma)+1, where B(γ)B(\gamma) is the number of backtracks in γ\gamma.

If ζ(1),,ζ(r)\zeta^{(1)},\dots,\zeta^{(r)} is the UU-canonical SAW-decomposition of γ\gamma then r2kr(γ)+B(γ)+1+Ur\leq 2k_{r}(\gamma)+B(\gamma)+1+|U|.

Every returning edge in γ\gamma increases rr by at most 2, since it can create a new SAW component, and it can split an existing component into 2 pieces. Every backtrack in γ\gamma increases rr by at most 1, since it can create a new SAW component. This proves the first statement; to prove the second, note that each vertex vAv\in A creates at most one new component, since if vV3v\in{V_{\geq 3}} then it has no effect, while if v∉V3v\not\in{V_{\geq 3}} then it has degree at most 2 in γ\gamma and so splitting the path that goes through vv introduces at most one new component. ∎

3 The weight of a SAW-decomposition

We can compute the expected weight of a SAW-decomposition by simply applying Lemma 5.1 to each component. We state the following lemma slightly more generally, so that we may also apply it to subsets of the SAW-decomposition of a path.

where k=ieζ(i)mek=\sum_{i}\sum_{e\in\zeta^{(i)}}m_{e}.

where the last inequality follows because 1<d<s21<d<s^{2}, and so me>1m_{e}>1 implies d<s2smed<s^{2}\leq s^{m_{e}}. ∎

4 Proof of (1)–(3)

Now we prove the first three parts of Theorem 2.8. The claim (1) about the first moment follows from Lemma 5.1.

For the second moment, we will expand the square in Yu,v2Y_{u,v}^{2}. Suppose γ1\gamma_{1} and γ2\gamma_{2} are a pair of self-avoiding paths of length kk from uu to vv. By reversing γ2\gamma_{2} and appending it to γ1\gamma_{1}, we obtain a single path (γ\gamma, say) from uu to itself which passes through vv and backtracks at most once (at vv). We consider the set of all γ\gamma that can be obtained in this way, and divide them into four classes:

Γ0\Gamma_{0} is the collection of such paths with kr(γ)=0k_{r}(\gamma)=0. These paths begin with a self-avoiding walk from uu to vv, after which they backtrack at vv and walk back to uu along exactly the same path. They have kk edges, k1k-1 vertices, and every edge is visited twice.

Γ1\Gamma_{1} is the collection of such paths with kr=1k_{r}=1. These paths consist of a simple cycle that is traversed once, with up to two “tails” that are traversed twice each.

Γ2\Gamma_{2} is the collection of such paths with 2krk2\leq k_{r}\leq k^{*}.

Γ3\Gamma_{3} is the collection of such paths with kr>kk_{r}>k^{*}.

where the second inequality follows from our choice of kk in Theorem 2.8. In particular, this term is of a lower order than the bound claimed in the theorem.

Next, we consider Γ1\Gamma_{1}. Recall that the first kk steps of γΓ1\gamma\in\Gamma_{1} make up a simple path. Let ii be minimal so that the (k+i+1)(k+i+1)th step of γ\gamma is new; let jj be such that the 2kj2k-jth step of γ\gamma is returning. It follows that the first jj edges of γ\gamma consist of a simple path where each edge is traversed twice. The same holds for edges ki+1k-i+1 through k1k-1. The rest of γ\gamma consists of a simple cycle of length 2k2(i+j)2k-2(i+j), each edge of which is traversed once. Let Γ1(i,j)\Gamma_{1}(i,j) denote the set of such paths. By Lemma 5.6, if γ\gamma’s interior does not intersect UU then the expected weight of γΓ1(i,j)\gamma\in\Gamma_{1}(i,j) is bounded by

Now, Γ1(i,j)=(1+o(1))n2kij2|\Gamma_{1}(i,j)|=(1+o(1))n^{2k-i-j-2} because γΓ1(i,j)\gamma\in\Gamma_{1}(i,j) has 2kij2k-i-j distinct vertices (including uu and vv), and once those vertices and their order is fixed then γ\gamma is determined. As in the argument for Γ0\Gamma_{0}, the paths whose interiors intersect UU provide a negligible contribution, and hence

Hence, Γ1\Gamma_{1} provides the main term in the claimed bound.

Summing over the kk choices of knk_{n} shows that the paths in Γ2\Gamma_{2} contribute a smaller order term than the paths in Γ1\Gamma_{1}.

Finally, we bound Γ3\Gamma_{3} using Corollary 4.5; these terms also contribute a smaller order term.

4.2 The cross moment

Γ0\Gamma_{0} are the pairs of paths that do not intersect.

Γ1\Gamma_{1} are the pairs of paths that do intersect, and that satisfy kr,γ1(γ2)kk_{r,\gamma_{1}}(\gamma_{2})\leq k^{*}.

Γ2\Gamma_{2} are the pairs of paths that satisfy kr,γ1(γ2)>kk_{r,\gamma_{1}}(\gamma_{2})>k^{*}.

For (γ1,γ2)Γ0(\gamma_{1},\gamma_{2})\in\Gamma_{0}, the variables Xγ1X_{\gamma_{1}} and Xγ2X_{\gamma_{2}} are independent, and hence

We recall from (1) that the right hand side above is of the order s2kn2s^{2k}n^{-2}; in order to prove the claim about the cross moments, we need to show that the contributions of Γ1\Gamma_{1} and Γ2\Gamma_{2} are of the order s2kn3+o(1)s^{2k}n^{-3+o(1)}.

To control Γ1\Gamma_{1}, we split pairs of paths according to kn,γ1(γ2)k_{n,\gamma_{1}}(\gamma_{2}). If Γ1,kn,γ1\Gamma_{1,k_{n,\gamma_{1}}} is the set of pairs of paths in Γ1\Gamma_{1} with kn,γ1(γ2)=kn,γ1k_{n,\gamma_{1}}(\gamma_{2})=k_{n,\gamma_{1}} then Lemma 4.8 implies that Γ1,kn,γ1nk+kn,γ12+o(1)|\Gamma_{1,k_{n,\gamma_{1}}}|\leq n^{k+k_{n,\gamma_{1}}-2+o(1)} (when applying Lemma 4.8, recall that vv^{\prime} is distinct from uu and vv). By Lemma 5.6, and noting that E(γ1)E(γ2)k+kn,γ1+1|E(\gamma_{1})\cup E(\gamma_{2})|\geq k+k_{n,\gamma_{1}}+1 because kr,γ1(γ2)1k_{r,\gamma_{1}}(\gamma_{2})\geq 1,

Summing over the kk possible values of kn,γ1k_{n,\gamma_{1}} adds another no(1)n^{o(1)} factor, and we conclude that Γ1\Gamma_{1} is a lower order term.

Finally, we control Γ2\Gamma_{2}. For each pair (γ1,γ2)Γ2(\gamma_{1},\gamma_{2})\in\Gamma_{2}, we may create a new path γ\gamma by joining the end of γ1\gamma_{1} to the beginning of γ2\gamma_{2}. Then γ\gamma has length 2k+12k+1 and kr(γ)kk_{r}(\gamma)\geq k^{*}. Note that Xγ1nXγ1Xγ2|X_{\gamma}|\geq\frac{1}{n}|X_{\gamma_{1}}X_{\gamma_{2}}| because the new edge that we added always has We1n|W_{e}|\geq\frac{1}{n}. Hence,

where the second sum ranges over all γ\gamma of length 2k+12k+1 satisfying kr(γ)kk_{r}(\gamma)\geq k^{*}. But by Corollary 4.5, the last quantity is at most n3+o(1)n^{-3+o(1)}, and so Γ2\Gamma_{2} contributes a lower order term.

Weighted sums over complicated paths

For any path γ\gamma, ΞΩF(γ)\Xi\subset\Omega_{\mathcal{F}(\gamma)}.

Let HH be any fixed graph with mm vertices and m+1m+1 edges. There are at most nmn^{m} ways to embed HH into GG, and for each of those embeddings, the probability that all edges in HH appear is at most (2d/n)m(2d/n)^{m}. By a union bound, the probability that HH is a subgraph of GG is at most n1(2d)mn^{-1}(2d)^{m}.

Before proceeding to bound the weight of non-self-avoiding paths, we present one more preliminary lemma. Because we will take a second moment, we will need to handle pairs of non-self-avoiding paths. In order to do so, we need to interpret the condition eFmet\sum_{e\in F}m_{e}\geq t in the definition of F(γ)\mathcal{F}(\gamma) for pairs of paths. In the following lemma we deal with multiple paths, so we will write me(γ)m_{e}(\gamma) for the number of times that the path γ\gamma crosses the edge ee.

Let γ1\gamma_{1} and γ2\gamma_{2} be two paths from uu to vv of length kk. Let γ\gamma be the path from uu to uu obtained by first following γ1\gamma_{1} and then following the reversal of γ2\gamma_{2}. For any F1F(γ1)F_{1}\in\mathcal{F}(\gamma_{1}) and F2F(γ2)F_{2}\in\mathcal{F}(\gamma_{2}),

Let F=F1F2F=F_{1}\cup F_{2}, F1=(F1F2)E(γ2)F_{1}^{\prime}=(F_{1}\setminus F_{2})\cap E(\gamma_{2}), and F2=(F2F1)E(γ1)F_{2}^{\prime}=(F_{2}\setminus F_{1})\cap E(\gamma_{1}). Then set H=F(F1F2)H=F\setminus(F_{1}^{\prime}\cup F_{2}^{\prime}). Recall that γiFi\gamma_{i}\setminus F_{i} has at least as many edges as vertices.

also has at least as many edges as vertices. Hence, Hkr(γ)1|H|\leq k_{r}(\gamma)-1.

For every eFie\in F_{i}^{\prime}, we have me(γ)me(γi)+1m_{e}(\gamma)\geq m_{e}(\gamma_{i})+1. Hence,

Let γ1\gamma_{1} and γ2\gamma_{2} be non-backtracking paths from uu to vv of length kk that are not self-avoiding. That is, kr(γ1),kr(γ2)1k_{r}(\gamma_{1}),k_{r}(\gamma_{2})\geq 1. Let γ\gamma be the path obtained by first following γ1\gamma_{1} and then following γ2\gamma_{2} backwards. Let t(γi)t(\gamma_{i}) be the number of tangles in γi\gamma_{i}.

Recall from (13) that kk^{*} is a constant (depending on ss and dd) such that paths with kr>kk_{r}>k^{*} are irrelevant.

uniformly over γ\gamma and σU\sigma_{U}.

The term involving XLX_{L} may be bounded by Lemma 5.6 (recalling that the combined path γ\gamma has length 2k2k):

Next, we turn to the first term of (18). Recall that ΩF\Omega_{F} is the event that no edge in FF appears. Hence, if F1F(γ1)F_{1}\in\mathcal{F}(\gamma_{1}), F2F(γ2)F_{2}\in\mathcal{F}(\gamma_{2}), and F=F1F2F=F_{1}\cup F_{2} then

where we applied Lemma 6.3 in the first inequality above. Hence,

Now, E(K)+E(L)|E(K)|+|E(L)| is the number of distinct edges traversed by γ\gamma, which is also equal to kn(γ)+kr(γ)k_{n}(\gamma)+k_{r}(\gamma). Applying this in the exponent of nn completes the proof. ∎

2 Proof of (4)

We will now combine Lemma 6.4 with our earlier bounds on the number of paths (Lemma 4.7) to show that the total weight of non-self-avoiding paths is negligible on the event that the graph contains no tangles.

because both sides are non-negative and, by Lemma 6.1, they agree whenever the left hand side is non-zero. Next, we expand the sum above as

Combining the last two displayed equations,

Let γ=γ(γ1,γ2)\gamma=\gamma(\gamma_{1},\gamma_{2}) be γ1\gamma_{1} concatenated with the reverse of γ2\gamma_{2}. Let Λ1Γbad×Γbad\Lambda_{1}\subset\Gamma^{\text{bad}}\times\Gamma^{\text{bad}} be the set of pairs (γ1,γ2)(\gamma_{1},\gamma_{2}) such that γ(γ1,γ2)\gamma(\gamma_{1},\gamma_{2}) has more than kk^{*} returning edges. Let Λ2(Γbad×Γbad)Λ1\Lambda_{2}\subset(\Gamma^{\text{bad}}\times\Gamma^{\text{bad}})\setminus\Lambda_{1} be the set of pairs (γ1,γ2)(\gamma_{1},\gamma_{2}) such that (V(γ1)V(γ2))U2logn|(V(\gamma_{1})\cup V(\gamma_{2}))\cap U|\geq 2\sqrt{\log n}. Let Λ=Λ1Λ2\Lambda=\Lambda_{1}\cup\Lambda_{2}. By Corollary 4.5,

Since U=no(1)n1/2|U|=n^{o(1)}\leq n^{1/2} for large enough nn, the fraction of γ1Γbad\gamma_{1}\in\Gamma^{\text{bad}} such that V(γ1)Ulogn|V(\gamma_{1})\cap U|\geq\sqrt{\log n} is at most nclognn^{-c\sqrt{\log n}} for some constant c>0c>0. By Lemma 4.4,

We will further split this sum according to the number of tangles in γ1\gamma_{1} and γ2\gamma_{2}; that is, we define Γtbad\Gamma^{\text{bad}}_{t} to be the set of γ1Γbad\gamma_{1}\in\Gamma^{\text{bad}} with t(γ1)=tt(\gamma_{1})=t. We will show that for any t1t_{1} and t2t_{2},

Summing over the k=no(1)k=n^{o(1)} possible values of t1t_{1} and t2t_{2}, this will imply (20) and complete the proof.

To control (21), fix γ1\gamma_{1} and consider the sum over γ2\gamma_{2}. By Lemma 4.7, there are at most nkn+t21+o(1)n^{k_{n}+t_{2}-1+o(1)} choices of γ2Γt2bad\gamma_{2}\in\Gamma^{\text{bad}}_{t_{2}} that satisfy kn(γ2)=knk_{n}(\gamma_{2})=k_{n} (denote this set by Γt2,knbad\Gamma^{\text{bad}}_{t_{2},k_{n}}). Note that the fraction of γ2Γt2,knbad\gamma_{2}\in\Gamma^{\text{bad}}_{t_{2},k_{n}} satisfying kn(γ)=kn(γ1)+kn(γ2)mk_{n}(\gamma)=k_{n}(\gamma_{1})+k_{n}(\gamma_{2})-m is at most k2m(n2k)mk^{2m}(n-2k)^{-m}. Indeed, m=kn(γ1)+kn(γ2)kn(γ)m=k_{n}(\gamma_{1})+k_{n}(\gamma_{2})-k_{n}(\gamma) is the number of edges that were new in γ2\gamma_{2} but not in γ\gamma. There are at most kmk^{m} ways to choose which edges in γ2\gamma_{2} will no longer be new and each one has at most kmk^{m} choices for a non-new step, versus at least (n2k)m(n-2k)^{m} choices for a new step. Set Γt2,kn,m,γ1bad\Gamma^{\text{bad}}_{t_{2},k_{n},m,\gamma_{1}} to be the paths γ2Γt2,knbad\gamma_{2}\in\Gamma^{\text{bad}}_{t_{2},k_{n}} satisfying kn(γ)=kn(γ1)+kn(γ2)mk_{n}(\gamma)=k_{n}(\gamma_{1})+k_{n}(\gamma_{2})-m. Note that if (γ1,γ2)∉Λ(\gamma_{1},\gamma_{2})\not\in\Lambda then the total number of returning edges in γ\gamma is at most k=O(1)k^{*}=O(1), and the number of vertices in UU intersecting V(γ)V(\gamma) is at most 2logn2\sqrt{\log n}. Hence, Lemma 6.4 applied with U=UV(γ)U=U\cap V(\gamma) implies that for any γ1Γt1bad\gamma_{1}\in\Gamma^{\text{bad}}_{t_{1}} and any mm,

Taking the sum over knkk_{n}\leq k and mkm\leq k only contributes a factor of no(1)n^{o(1)}; hence,

Summing over the nkn+t11+o(1)n^{k_{n}+t_{1}-1+o(1)} possible γ1Γt1,knbad\gamma_{1}\in\Gamma^{\text{bad}}_{t_{1},k_{n}} and then over the kk possible values of knk_{n}, we see that the right hand side of (21) is bounded by s2kn3+o(1)s^{2k}n^{-3+o(1)}, as claimed. ∎

Finally, note that we have finished the proof of Theorem 2.8. Indeed, we proved (1) at the beginning of Section 5.4, (2) in Section 5.4.1, (3) in Section 5.4.2, and we just proved (4).

Acknowledgments

The authors are grateful to Cris Moore and Lenka Zdeborová for stimulating and interesting discussions on many aspects of the block model. They also thank the Charles Bordenave and the anonymous referees for pointing out several simplifications and corrections.

References