Consistency Thresholds for the Planted Bisection Model

Elchanan Mossel, Joe Neeman, Allan Sly

Introduction

The “planted bisection model” is a random graph model with 2n2n vertices that are divided into two classes with nn vertices each. Edges within the classes are added to the graph independently with probability pnp_{n} each, while edges between the classes are added with probability qnq_{n}. Following Bui et al, who studied a related model, Dyer and Frieze introduced the planted bisection model in order to study the average-case complexity of the Min-Bisection problem, which asks for a bisection of a graph that cuts the smallest possible number of edges. This problem is known to be NP-complete in the worst case , but on a random graph model with a “planted” small bisection one might hope that it is usually easy. Indeed, Dyer and Frieze showed that if pn=p>q=qnp_{n}=p>q=q_{n} are fixed as nn\to\infty then with high probability the bisection that separates the two classes is the minimum bisection, and it can be found in expected O(n3)O(n^{3}) time.

These models were introduced slightly earlier in the statistics literature (under the name “stochastic block model”) in order to study the problem of community detection in random graphs. Here, the two parts of the bisection are interpreted as latent “communities” in a network, and the goal is to identify them from the observed graph structure. If pn>qnp_{n}>q_{n}, the maximum a posteriori estimate of the true communities is exactly the same as the minimum bisection (see the discussion leading to Lemma 4.1), and so the community detection problem on a stochastic block model is exactly the same as the Min-Bisection problem on a planted bisection model; hence, we will use the statistical and computer science terminologies interchangeably. We note, however, the statistics literature is slightly more general, in the sense that it often allows qn>pnq_{n}>p_{n}, and sometimes relaxes the problem by allowing the detected communities to contain some errors.

Our main contribution is a necessary and sufficient condition on pnp_{n} and qnq_{n} for recoverability of the planted bisection. When the bisection can be recovered, we provide an efficient algorithm for doing so.

Definitions and results

The oldest and most fundamental question about planted partition models is the label reconstruction problem: if we were given the graph GG but not the labelling σ\sigma, could we reconstruct σ\sigma (up to its sign) from GG? This problem is usually framed in the asymptotic regime, where the number of nodes nn\to\infty, and pp and qq are allowed to depend on nn.

Given sequences pnp_{n} and qnq_{n} in $,andgivenamap, and given a map\mathcal{A}fromgraphstovertexlabellings,wesaythatfrom graphs to vertex labellings, we say that\mathcal{A}$ is strongly consistent (or sometimes just consistent) if

where the probability Prn\operatorname{Pr}_{n} is taken with respect to (G,σ)G(2n,pn,qn)(G,\sigma)\sim\mathcal{G}(2n,p_{n},q_{n}).

Depending on the application, it may also make sense to ask for a labelling which is almost completely accurate, in the sense that it correctly labels all but a vanishingly small fraction of nodes. Amini et al. suggested the term “weak consistency” for this notion.

Given σ,τ{1,1}2n\sigma,\tau\in\{1,-1\}^{2n}, define

Given sequences pnp_{n} and qnq_{n} in $,andgivenamap, and given a map\mathcal{A}fromgraphstovertexlabellings,wesaythatfrom graphs to vertex labellings, we say that\mathcal{A}$ is weakly consistent if

where “P\stackrel{{\scriptstyle P}}{{\to}}” means convergence in probability, and the probability is taken with respect to (G,σ)G(2n,pn,qn)(G,\sigma)\sim\mathcal{G}(2n,p_{n},q_{n}).

Our main result is a characterization of the sequences pnp_{n} and qnq_{n} for which consistent or weakly consistent estimators exist. Note that the characterization of weak consistency was obtained previously by Yun and Proutiere , but we include it here for completeness.

When m=nm=n, we will abbreviate by P(n,p,q)=P(n,n,p,q)P(n,p,q)=P(n,n,p,q).

Consider sequences pnp_{n} and qnq_{n} in $.Thereexistsastronglyconsistentestimatorfor. There exists a strongly consistent estimator for\mathcal{G}(2n,p_{n},q_{n})ifandonlyifif and only ifP(n,p_{n},q_{n})=o(n^{-1}).Thereexistsaweaklyconsistentestimatorfor. There exists a weakly consistent estimator for\mathcal{G}(2n,p_{n},q_{n})ifandonlyifif and only ifP(n,p_{n},q_{n})\to 0$.

In order to provide some intuition for Definition 2.4 and its appearance in our characterization, we note the following graph-theoretic interpretation of P(n,p,q)P(n,p,q):

Given a labelled graph (G,σ)G(2n,p,q)(G,\sigma)\sim\mathcal{G}(2n,p,q) and a node vV(G)v\in V(G), we say that vv has a majority of size kk if either

We say that vv has a majority if it has a majority of size one. If vv does not have a majority, we say that it has a minority.

Fix sequences pnp_{n} and qnq_{n} in $andletand let(G,\sigma)\sim\mathcal{G}(n,p_{n},q_{n})$. Then

P(n,pn,qn)=o(n1)P(n,p_{n},q_{n})=o(n^{-1}) if and only if a.a.s. every vV(G)v\in V(G) has a majority; and

P(n,pn,qn)0P(n,p_{n},q_{n})\to 0 if and only if a.a.s. at most o(n)o(n) nodes in V(G)V(G) fail to have a majority.

Proposition 2.7 suggests some intuition for Theorem 2.5: namely, that a node can be labelled correctly if and only if it has a majority. In fact, having a majority is necessary for correct labelling (and we will use this to prove one direction of Theorem 2.5); however, it is not sufficient. For example, there are regimes in which 51% of nodes have majorities, but only 50% of them can be correctly labelled (see ).

We note that Theorem 2.5 has certain parallels with local-to-global threshold phenomena in random graphs. For example, Erdős and Rényi showed that for G(n,pn)\mathcal{G}(n,p_{n}), if pnp_{n} is large enough so that with high probability every node has a neighbor then the graph is connected with high probability. On the other hand, every node having a neighbor is clearly necessary for the graph to be connected. An analogous story holds for the existence of Hamiltonian cycles: Komlós and Szemerédi showed that G(n,pn)\mathcal{G}(n,p_{n}) has a Hamiltonian cycle with high probability if and only if with high probability every node has degree at least two.

These results on connectedness and Hamiltonicity have a feature in common: in both cases, an obviously necessary local condition turns out to also be sufficient (on random graphs) for a global condition. One can interpret Theorem 2.5 similarly: the minimum bisection in G(n,pn,qn)\mathcal{G}(n,p_{n},q_{n}) equals the planted bisection with high probability if and only if with high probability every node has more neighbors of its own label than those of the other label.

Our algorithm comes in three steps, each of which is based on an idea that has already appeared in the literature. Our first step is a spectral algorithm, along the lines of those developed by Boppana , McSherry , and Coja-Oghlan . Yun and Proutiere recently made some improvements to (a special case of) Coja-Oghlan’s work, showing that a spectral algorithm can find a bisection with o(n)o(n) errors if n(pnqn)2pn+qnn\frac{(p_{n}-q_{n})^{2}}{p_{n}+q_{n}}\to\infty; this is substantially weaker than McSherry’s condition for strong consistency, which would require converging to infinity with a rate of at least logn\log n.

The second stage of our algorithm is to apply a “replica trick.” We hold out a small subset UU of vertices and run a spectral algorithm on the subgraph induced by VUV\setminus U. Then we label vertices in UU by examining the edges between UU and VUV\setminus U. By repeating the process for many subsets UU, we dramatically reduce the number of errors made by the spectral algorithm. More importantly, we get extra information about the structure of the errors; for example, we can show that the set of incorrectly-labelled vertices is very poorly connected. Similar ideas are used by Condon and Karp , who used successive augmentation to build an initial guess on a subset of vertices, and then used that guess to correctly classify the remaining vertices. The authors also used a similar idea in the pn,qn=Θ(n1)p_{n},q_{n}=\Theta(n^{-1}) regime, with a more complicated replica trick based on belief propagation.

The third step of our algorithm is a hill-climbing algorithm, or a sequence of local improvements. We simply relabel vertices so that they agree with the majority of their neighbors. An iterative version of this procedure was considered in , and a randomized version (based on simulated annealing) was studied by Jerrum and Sorkin . Our version has better performance guarantees because we begin our hill-climbing just below the summit: as we will show, we need to relabel only a tiny fraction of the vertices and each of those will be relabelled only once.

As noted above, none of the ingredients in our algorithm are novel on their own. However, the way that we combine them is new (and also crucial to the correctness of the resulting algorithm). For example, McSherry used a spectral algorithm with a “clean-up” stage, but his clean-up stage was different from our second and third stages.

Although Theorem 2.5 is not particularly explicit in terms of pnp_{n} and qnq_{n}, one can obtain various explicit characterizations in particular regimes (for example, in order to better compare our results with the existing literature). We will focus our attention on the case where pnp_{n} and qnq_{n} are bounded away from one; for concreteness, suppose pn,qn2/3p_{n},q_{n}\leq 2/3. Because of the symmetry of the problem, this case suffices: indeed, replacing GG(n,pn,qn)G\sim\mathcal{G}(n,p_{n},q_{n}) by its complement (the graph in which two vertices are connected if they are not connected in GG) corresponds to replacing pnp_{n} by 1pn1-p_{n} and qnq_{n} by 1qn1-q_{n}. Hence, if we handle the case pn,qn2/3p_{n},q_{n}\leq 2/3 then we also handle the case pn,qn1/3p_{n},q_{n}\geq 1/3. There remains the case in which min{pn,qn}1/3\min\{p_{n},q_{n}\}\leq 1/3 and 2/3max{pn,qn}2/3\leq\max\{p_{n},q_{n}\}, but this case is trivial: P(n,pn,qn)P(n,p_{n},q_{n}) decreases exponentially fast in nn, and even very simple algorithms are known to be strongly consistent.

One can easily see that to obtain strong consistency, at least one of pnp_{n} or qnq_{n} must be at least n1lognn^{-1}\log n asymptotically. Indeed, suppose qnpn=n1lognq_{n}\leq p_{n}=n^{-1}\log n and let XBinom(n,pn)X\sim\operatorname{Binom}(n,p_{n}), YBinom(n,qn)Y\sim\operatorname{Binom}(n,q_{n}). Then Pr(X=0)=Θ(n1)\operatorname{Pr}(X=0)=\Theta(n^{-1}), and so certainly P(n,pn,qn)=Pr(YX)=Ω(n1)P(n,p_{n},q_{n})=\operatorname{Pr}(Y\geq X)=\Omega(n^{-1}), which means that strong consistency is impossible for these parameters. However, strong consistency is possible for some other parameters in the range Θ(n1logn)\Theta(n^{-1}\log n). Using a Poisson approximation, we can characterize explicitly which of these sequences allow for strong consistency:

Let pn=ann1lognp_{n}=a_{n}n^{-1}\log n and qn=bnn1lognq_{n}=b_{n}n^{-1}\log n. If there is a constant CC such that C1an,bnCC^{-1}\leq a_{n},b_{n}\leq C for all but finitely many nn then P(n,pn,qn)=o(n1)P(n,p_{n},q_{n})=o(n^{-1}) if and only if

In a denser regime, it is tempting to approximate Binom(n,pn)\operatorname{Binom}(n,p_{n}) and Binom(n,qn)\operatorname{Binom}(n,q_{n}) by the normal random variables N(npn,nσp2)\mathcal{N}(np_{n},n\sigma_{p}^{2}) and N(nqn,nσq2)\mathcal{N}(nq_{n},n\sigma_{q}^{2}), where σp=p(1p)\sigma_{p}=\sqrt{p(1-p)} and σq=q(1q)\sigma_{q}=\sqrt{q(1-q)}. That is,

where σ=σp2+σq2\sigma=\sqrt{\sigma_{p}^{2}+\sigma_{q}^{2}}. The central limit theorem implies that the normal approximation is correct in the bulk of the distribution if npnnp_{n}\to\infty and nqnnq_{n}\to\infty. However, we are interested in applying this approximation for the tail, which requires a faster increase of npnnp_{n} and a more delicate argument.

Suppose pn,qn=ω(n1log3n)p_{n},q_{n}=\omega\left(n^{-1}\log^{3}n\right) and pn,qn2/3.p_{n},q_{n}\leq 2/3. Then the following conditions are equivalent

nPr(N(0,1)σn1n(pnqn))0n\operatorname{Pr}\left(\mathcal{N}(0,1)\geq\sigma_{n}^{-1}\sqrt{n}(p_{n}-q_{n})\right)\to 0

nσnpnqnexp(n(pnqn)22σn2)0\frac{\sqrt{n}\sigma_{n}}{p_{n}-q_{n}}\exp(-\frac{n(p_{n}-q_{n})^{2}}{2\sigma_{n}^{2}})\to 0,

where σn=pn(1pn)+qn(1qn)\sigma_{n}=\sqrt{p_{n}(1-p_{n})+q_{n}(1-q_{n})}.

In particular, the third condition in Proposition 2.9 gives an explicit formula for checking whether a strongly consistent estimator exists.

The formula for weak consistency is rather simpler:

P(n,pn,qn)0P(n,p_{n},q_{n})\to 0 if and only if n(pnqn)2pn+qn\frac{n(p_{n}-q_{n})^{2}}{p_{n}+q_{n}}\to\infty.

One direction of Proposition 2.10 follows from Chebyshev’s inequality, while the other follows from the central limit theorem.

3 Relation to prior work

Over the years, various authors have improved on the seminal work of Dyer and Frieze by proving weaker sufficient conditions on the sequences pnp_{n} and qnq_{n} for which the planted bisection can be recovered. (Various results also generalized the problem by allowing more than two labels, but we will ignore this generalization here.) For example, Jerrum and Sorkin required pnqn=Ω(n1/6+ϵ)p_{n}-q_{n}=\Omega(n^{-1/6+\epsilon}), while Condon and Karp improved this to pnqn=Ω(n1/2+ϵ)p_{n}-q_{n}=\Omega(n^{-1/2+\epsilon}). McSherry made a big step by showing that if

for a large enough constant CC then spectral methods can exactly recover the labels. This was significant because it allowed pnp_{n} and qnq_{n} to be as small as Θ(n1logn)\Theta(n^{-1}\log n), which is order-wise the smallest possible. A similar result for a slightly different random graph model had been claimed earlier by Boppana , but the proof was incomplete. Carson and Impagliazzo showed that with slightly worse poly-logarithmic factors, a simple hill-climbing algorithm also works. Analogous results were later obtained by by Bickel and Chen using modularity maximization (for which no efficient algorithm is known).

Until now, none of the sufficient conditions in the literature were also necessary; in fact, necessary conditions on pnp_{n} and qnq_{n} have only rarely been discussed. It is instructive to keep the example pn=1/2p_{n}=1/2, qn=1/2rnq_{n}=1/2-r_{n} in mind. In this case McSherry’s condition is the same as requiring that rnCn1lognr_{n}\geq C\sqrt{n^{-1}\log n}. On the other hand, Carson and Impagliazzo pointed out that if rncn1lognr_{n}\leq c\sqrt{n^{-1}\log n} for some small constant cc then the minimum bisection no longer coincides with the planted bisection (as far as we are aware, this was the only necessary condition in the literature). From a statistical point of view, this means that the true communities can no longer be reconstructed perfectly. Our contribution closes the gap between McSherry’s sufficient condition and Carson-Impagliazzo’s necessary condition. In the above case, for example, Proposition 2.9 shows that the critical constant is C=c=1C=c=1.

4 Parallel independent work

Abbe et al. independently studied the same problem in the logarithmic sparsity regime. They consider pn=(alogn)/np_{n}=(a\log n)/n and qn=(blogn)/nq_{n}=(b\log n)/n for constants aa and bb; they show that (a+b)2ab>1(a+b)-2\sqrt{ab}>1 is sufficient for strong consistency and that (a+b)2ab1(a+b)-2\sqrt{ab}\geq 1 is necessary. Note that these are implied by Proposition 2.8, which is more precise. Abbe et al. also consider a semidefinite programming algorithm for recovering the labels; they show that it performs well under slightly stronger assumptions.

5 Other related work, and an open problem

Consistency is not the only interesting notion that one can study on the planted partition model. Earlier work by the authors and by Massoulié considered a much weaker notion of recovery: they only asked whether one could find a labelling that was positively correlated with the true labels.

There are also model-free notions of consistency. Kumar and Kannan considered a deterministic spatial clustering problem and showed that if every point is substantially closer to the center of its own cluster than it is to the center of the other cluster then one can exactly reconstruct the clusters. This is in much the same spirit as Theorem 2.5.

Makarychev, Makarychev, and Vijayaraghavan proposed semi-random models for planted bisections. These models allow for adversarial noise, and also allow edge distributions that are not independent, but only invariant under permutations. They then give approximation algorithms for Min-Bisection, which they prove to work under expansion conditions that hold with high probability for their semi-random model.

We ask whether the techniques developed here could sharpen the results obtained by Makarychev et al. For example, exact recovery under adversarial noise is clearly impossible, but if the adversary is restricted to adding o(n)o(n) edges, then maybe one can guarantee almost exact recovery.

Binomial probabilities and graph structure

In this section, we will prove Proposition 2.7, which relates the binomial probabilities P(n,pn,qn)P(n,p_{n},q_{n}) to the structure of random graphs GG(2n,pn,qn)G\sim\mathcal{G}(2n,p_{n},q_{n}).

From now on, the letters cc and CC refer to positive constants, whose value may change from line to line. We adopt the convention that CC refers to a “sufficiently large” constant, so that any statement involving CC will remain true if CC is replaced by a larger constant. Similarly, cc refers to a “sufficiently small” constant.

Note that the condition mp64logmmp\geq 64\log m is not only a technical one (although the constant 6464 is certainly not optimal). For example, if p=m1logmp=m^{-1}\log m and q=0q=0 then (2) fails to hold, because Pr(YX)=Pr(X=0)m1\operatorname{Pr}(Y\geq X)=\operatorname{Pr}(X=0)\sim m^{-1} but Pr(YX1)=Pr(X1)m1logm\operatorname{Pr}(Y\geq X-1)=\operatorname{Pr}(X\leq 1)\sim m^{-1}\log m.

Nevertheless, it is still possible to consider similar estimates in the sparse case. Here is an analogue of (2) that holds with p=O(m1logm)p=O(m^{-1}\log m).

2 Majorities are uncorrelated

The preceding propositions may be combined to show that the event that uu has a minority is essentially independent of the event that vv has a minority. First, we observe that removing one trial from a binomial random variable doesn’t change very much.

There is a universal constant C>0C>0 such that for all m,nm,n and all p,q2/3p,q\leq 2/3,

Assume without loss of generality that pqp\geq q. Let XBinom(m1,p)X^{\prime}\sim\operatorname{Binom}(m-1,p), YBinom(n1,q)Y^{\prime}\sim\operatorname{Binom}(n-1,q), ξXBernoulli(p)\xi_{X}\sim\operatorname{Bernoulli}(p) and ξYBernoulli(q)\xi_{Y}\sim\operatorname{Bernoulli}(q) be independent, and then take X=X+ξXX=X^{\prime}+\xi_{X} and Y=Y+ξYY=Y^{\prime}+\xi_{Y}. In terms of these variables, the left-hand inequality above may be written as

We will focus on this inequality (since the other inequality is essentially identical). Now,

If we assume that (m1)p64log(m1)(m-1)p\geq 64\log(m-1) then (1) implies that

which implies the claim. On the other hand, if (m1)p64log(m1)(m-1)p\leq 64\log(m-1) then directly from (3) we have

Next, we show that {u has a minority}\{u\text{ has a minority}\} and {v has a minority}\{v\text{ has a minority}\} are essentially uncorrelated. We recall that if AA and BB are events then Cov(A,B)=Pr(AB)Pr(A)Pr(B)\operatorname{Cov}(A,B)=\operatorname{Pr}(A\cap B)-\operatorname{Pr}(A)\operatorname{Pr}(B).

Fix nodes uu and vv. Let AA and BB be the events that uu and vv respectively have minorities. If p,q2/3p,q\leq 2/3 then

Assume that p>qp>q and that σu=+\sigma_{u}=+ and σv=\sigma_{v}=- (the other cases are very similar). Let ξ\xi be the indicator that uvu\sim v, and let AA and BB be the events that uu and vv respectively have minorities. Note that AA and BB are conditionally independent given ξ\xi, which means that

where the last equality holds because AA and BB have the same distribution given ξ\xi.

Define \alpha=P(n-1,n,p,q)=\operatorname{Pr}(\text{uhas a minority})=\operatorname{Pr}(\text{vhas a minority}). By our assumption that σuσv\sigma_{u}\neq\sigma_{v} and p>qp>q, we have Pr(Aξ=0)Pr(Aξ=1)\operatorname{Pr}(A\mid\xi=0)\leq\operatorname{Pr}(A\mid\xi=1). On the other hand,

Next, we consider Pr(Aξ=1)\operatorname{Pr}(A\mid\xi=1). Note that

By applying either (2) or Proposition 3.2 to the right hand side above, we have

(To get the second case, we are either applying (2) for 64lognnpn1/264\log n\leq np\leq n^{1/2} or we are applying Proposition 3.2.) In the first case, the random variable Pr(Aξ)\operatorname{Pr}(A\mid\xi) is supported on an interval of width at most Cn1/6α+Cn2Cn^{-1/6}\alpha+Cn^{-2} and so its variance is at most Cn1/3α2+Cn4Cn^{-1/3}\alpha^{2}+Cn^{-4}. In the second case, Pr(ξ=1)=qpn1/2\operatorname{Pr}(\xi=1)=q\leq p\leq n^{-1/2}, and so

which is bounded by Cα2n1/3+Cn4C\alpha^{2}n^{-1/3}+Cn^{-4}.

3 Graph structure

Finally, we will use our preceding estimates to prove Proposition 2.7. Most of the proof essentially follows by straightforward first moment arguments. The most complicated part is showing that P(n,pn,qn)=Ω(n1)P(n,p_{n},q_{n})=\Omega(n^{-1}) implies that with constant probability there exists a node with a minority. This uses a fairly standard second moment argument, the main technical part of which is contained in Lemma 3.5.

Fix a node vV(G)v\in V(G) and suppose without loss of generality that σv=+\sigma_{v}=+. For notational convenience, we will also suppose that p>qp>q; an essentially identical proof works for p<qp<q. Let XX and YY denote the number of ++- and --labelled neighbors of vv. Then

Suppose first that P(n,pn,qn)=o(1)P(n,p_{n},q_{n})=o(1). Then

by Lemma 3.3. Summing over vV(G)v\in V(G), we have

and so Markov’s inequality implies that a.a.s. all but o(n)o(n) nodes have a majority.

For the rest of the proof, we will assume that pn,qn2/3p_{n},q_{n}\leq 2/3. As we explained in Section 2.2, this case suffices: if pn,qn1/3p_{n},q_{n}\geq 1/3 then we may apply the result with pnp_{n} and qnq_{n} replaced by 1pn1-p_{n} and 1qn1-q_{n}; if qn1/3q_{n}\leq 1/3 and pn2/3p_{n}\geq 2/3 then P(n,pn,qn)=o(n1)P(n,p_{n},q_{n})=o(n^{-1}) and we have already given that part of the proof.

Suppose that the number of nodes without a majority is not o(n)o(n) a.a.s. Then there is some ϵ>0\epsilon>0 such that for infinitely many nn, the probability of having ϵn\epsilon n nodes with a minority is at least ϵ\epsilon. Thus, the expected number of nodes with a minority is at least ϵ2n\epsilon^{2}n for infinitely many nn, which in turn implies that P(n1,n,pn,qn)=Pr(YX)ϵ2P(n-1,n,p_{n},q_{n})=\operatorname{Pr}(Y\geq X)\geq\epsilon^{2} for infinitely many nn. By Lemma 3.3, P(n,pn,qn)↛0P(n,p_{n},q_{n})\not\to 0.

It remains to prove that all nodes have a majority a.a.s. only if P(n,pn,qn)=o(n1)P(n,p_{n},q_{n})=o(n^{-1}). This requires a second moment argument: let ξu\xi_{u} be the indicator that uu has a minority and let N=uξuN=\sum_{u}\xi_{u} be the number of nodes with a minority. If \alpha=\operatorname{Pr}(\text{uhas a minority}) (which is the same for all uu) then

Sufficient condition for strong consistency

The rough idea behind our strongly consistent labelling algorithm is to first run a weakly consistent algorithm and then try to improve it. The natural way to improve an almost-accurate labelling τ\tau is to search for nodes uu that have a minority with respect to τ\tau and flip their signs. In fact, if the errors in τ\tau were independent of the neighbors of uu then this would work quite well: assuming that uu has a decently large majority (which it will, for most uu, by Proposition 3.1), then having a labelling τ\tau with few errors is like observing each neighbor of uu with a tiny amount of noise. This tiny amount of noise is very unlikely to flip uu’s neighborhood from a majority to a minority. Therefore, choosing uu’s sign to give it a majority is a reasonable approach.

There are two important problems with the argument outlined in the previous paragraph: it requires the errors in τ\tau to be independent, and it is only guaranteed to work for those uu that have a sizeable majority (i.e., almost, but not quite, all the nodes in GG). Nevertheless, this procedure is a good starting point and it motivates the first clean-up stage of our algorithm (Algorithm 1). By removing uu from the graph before looking for the almost-accurate labelling τ\tau, we ensure the required independence properties (as a result, note that we will be dealing with multiple labellings τ\tau, depending on which nodes we removed before running our almost-accurate labelling algorithm). And although the final labelling we obtain is not guaranteed to be entirely correct, we show that it has very few (i.e., at most nϵn^{\epsilon}) errors whereas the initial labelling was only guaranteed to have o(n)o(n) errors.

In order to finally produce the correct labelling, we return to the earlier idea: flipping the label of every node that has a minority. We analyze this procedure by noting that after the previous step of the algorithm, the errors were confined to a very particular set of nodes (namely, those without a very strong majority). We show that this set of nodes is small and poorly connected, which means that every node in the graph is guaranteed to only have a few neighbors in this bad set. In particular, even nodes with relatively weak majorities cannot be flipped by labelling errors in the bad set. We analyze this procedure in Section 4.3.

As stated in the introduction, there exist algorithms for a.a.s. correctly labelling all but o(n)o(n) nodes. Assuming that pn+qn=Ω(n1logn)p_{n}+q_{n}=\Omega(n^{-1}\log n), such an algorithm is easy to describe, and we include it for completeness; indeed, the algorithm we give is essentially folklore, although a nice treatment is given in . A slightly more complex algorithm that doesn’t assume pn+qn=Ω(n1logn)p_{n}+q_{n}=\Omega(n^{-1}\log n) can be found in .

If pn+qn=Ω(n1logn)p_{n}+q_{n}=\Omega(n^{-1}\log n) then there is a constant CC such that

a.a.s. as nn\to\infty, where \|\cdot\| denotes the spectral norm.

2 The replica step

Let BBPartition be an algorithm that is guaranteed to a.a.s. label all but o(n)o(n) nodes correctly; we will use it as a black box. Note that we may assume that BBPartition produces an exactly balanced labelling. If not, then if its output has more ++ labels than - labels, say, we can randomly choose some ++-labelled vertices and relabel them. The new labelling is balanced, and it is still guaranteed to have at most o(n)o(n) mistakes.

We define VϵV_{\epsilon} to be a set of “bad” nodes that our first step is not required to label correctly.

Let VϵV_{\epsilon} be the elements of VV that have a majority of size less than ϵnplogn\epsilon\sqrt{np\log n}, or that have more than 100np100np neighbors.

For any ϵ>0\epsilon>0, Algorithm 1 a.a.s. correctly labels every node in VVϵV\setminus V_{\epsilon}.

Before proving Proposition 4.3, we deal with a minor technical point. The following lemma shows that we can apply BBPartition to subgraphs of GG(2n,pn,qn)G\sim\mathcal{G}(2n,p_{n},q_{n}), and it will still have the required guarantees.

If P(n,pn,qn)=o(n1)P(n,p_{n},q_{n})=o(n^{-1}) then for any α>0\alpha>0, P(αn,pn,qn)0P(\lfloor\alpha n\rfloor,p_{n},q_{n})\to 0.

This follows from two simple properties of the function PP. First, we have P(n1+n2,p,q)P(n1,p,q)P(n2,p,q)P(n_{1}+n_{2},p,q)\geq P(n_{1},p,q)P(n_{2},p,q) for any n1,n2,pn_{1},n_{2},p, and qq. Indeed, if XiBinom(ni,p)X_{i}\sim\operatorname{Binom}(n_{i},p) and YiBinom(ni,q)Y_{i}\sim\operatorname{Binom}(n_{i},q) are independent then

A similar coupling argument shows that for any n20n_{2}\geq 0, P(n1,p,q)12P(n1+n2,p,q)P(n_{1},p,q)\geq\frac{1}{2}P(n_{1}+n_{2},p,q). Indeed, conditioned on X1+X2Y1+Y2X_{1}+X_{2}\leq Y_{1}+Y_{2}, the probability of X1Y1X_{1}\leq Y_{1} is at least 12\frac{1}{2}. Hence,

Now, choose an integer kk so that α1/k\alpha\geq 1/k. Then

Since kk and α\alpha are constant as nn\to\infty, this completes the proof.

First, we may assume without loss of generality that the partition U+,UU_{+},U_{-} that was produced in line 1 is positively correlated with the true labelling σ\sigma. By our assumption on BBPartition, at line 1 Ui,+U_{i,+} either agrees with V+UiV_{+}\setminus U_{i} or VUiV_{-}\setminus U_{i}, up to an error of o(n)o(n). After the relabelling in line 1, then, a.a.s. Ui,+U_{i,+} agrees with V+UiV_{+}\setminus U_{i} up to an error of o(n)o(n). Since mm is a constant independent of nn, this property a.a.s. holds for every ii simultaneously.

Now, consider a node v∉Vϵv\not\in V_{\epsilon} and suppose without loss of generality that σv=+\sigma_{v}=+. Conditioned on vUiv\in U_{i}, every other node is added to UiU_{i} independently with probability 1/m1/m. Hence, conditioned on vv having k+k_{+} ++-labelled neighbors and kk_{-} --labelled neighbors, it has Binom(k+,1/m)\operatorname{Binom}(k_{+},1/m) ++-labelled neighbors in UiU_{i} and Binom(k,1/m)\operatorname{Binom}(k_{-},1/m) --labelled neighbors in UiU_{i}. Let k+,ik_{+,i} denote the number of ++-labelled neighbors that vv has in UiU_{i} and let k+,¬i=k+k+,ik_{+,\lnot i}=k_{+}-k_{+,i} be the number of ++-labelled neighbors that vv has in VUiV\setminus U_{i} (and similarly for -).

By Bernstein’s inequality, with probability at least 12n21-2n^{-2},

Recall that v∉Vϵv\not\in V_{\epsilon} implies that k+100npk_{+}\leq 100np, k100npk_{-}\leq 100np and

where the last inequality follows from the definition of mm. Taking a union bound over the events leading to (4), we see that a.a.s., for every v∉Vϵv\not\in V_{\epsilon} with σv=+\sigma_{v}=+, if vUiv\in U_{i} then

In other words, every v∉Vϵv\not\in V_{\epsilon} still has a strong majority, even if we consider only edges between vv and the complement of UiU_{i}.

Let XX_{-} be the number of ++-valued neighbors of vv that were incorrectly labelled as - in line 1 (i.e. X={u:uv,σu=+,uUi,}X_{-}=|\{u:u\sim v,\sigma_{u}=+,u\in U_{i,-}\}|), and let X+X_{+} be the number of --valued neighbors that were incorrectly labelled as ++. Note that the quantities considered in line 1 of Algorithm 1 may be expressed in terms of kk and XX as

Hence, the inequality X+X<12k+,¬iki,¬i|X_{+}-X_{-}|<\frac{1}{2}|k_{+,\lnot i}-k_{i,\lnot i}| will imply that vv is correctly labelled in lines 1–1. For the rest of the proof, our goal will be to show that a.a.s. the above inequality holds for all v∉Vϵv\not\in V_{\epsilon}.

Let E=#{uUi,:σu=+}E_{-}=\#\{u\in U_{i,-}:\sigma_{u}=+\} (i.e., the total number of ++-labelled vertices that were mislabelled in line 1) and let E+=#{uUi,+:σu=}E_{+}=\#\{u\in U_{i,+}:\sigma_{u}=-\}. Note that the neighbors of vv are independent of Ui,U_{i,-}, and so conditioned on k+,¬ik_{+,\lnot i} and k,¬ik_{-,\lnot i},

where V+V_{+} and VV_{-} are the set of uu with σu=+\sigma_{u}=+ and σu=\sigma_{u}=-, respectively. Now condition on k+,¬ik_{+,\lnot i} and k,¬ik_{-,\lnot i}, and on the following a.a.s. events:

Under the above events, and recalling that k+100npk_{+}\leq 100np,

Going back to (6), we see that a.a.s. for all v∉Vϵv\not\in V_{\epsilon},

Next, we consider the deviations of XX_{-} and X+X_{+} around their means. By Bernstein’s inequality for hypergeometric variables, there is a constant CC such that with probability 1n21-n^{-2}, XX_{-} is within

of its expectation. Since E=o(n)E_{-}=o(n), we can take nn large enough so that XX_{-} is within ϵ16nplogn\frac{\epsilon}{16}\sqrt{np\log n} of its expectation with probability 1n21-n^{-2}. Arguing similarly for X+X_{+} we have

with probability 12n21-2n^{-2}. Taking a union bound over v∉Vϵv\not\in V_{\epsilon} (recall that XX and kk both depend on vv), we see that the above inequality holds a.a.s. for all v∉Vϵv\not\in V_{\epsilon} simultaneously. By (6), a.a.s. for all vVϵv\in V_{\epsilon},

3 The hill-climbing step

After running Algorithm 1, we are left with a graph in which only nodes belonging to VϵV_{\epsilon} could possibly be mis-labelled. Fortunately, very few nodes belong to VϵV_{\epsilon}, and those that do are poorly connected to the rest of the graph. This is the content of the next two propositions.

For every δ>0\delta>0 there exists an ϵ>0\epsilon>0 such that if P(n,p,q)=o(n1)P(n,p,q)=o(n^{-1}) then Vϵnδ|V_{\epsilon}|\leq n^{\delta} a.a.s.

Consider a single vVv\in V. By Bernstein’s inequality the probability that vv has 100np100np neighbors is less than n2n^{-2} (using nplognnp\geq\log n, which follows from P(n,p,q)=o(n1)P(n,p,q)=o(n^{-1})). Hence, a.a.s. every vv has at most 100np100np neighbors.

In particular, if Cϵ<δC\epsilon<\delta then the right hand size is o(n1+δ)o(n^{-1+\delta}). By Markov’s inequality, this implies that a.a.s. at most nδn^{\delta} nodes fail to have a majority of size ϵnplogn\epsilon\sqrt{np\log n}.

Since (2/ϵ)ϵ1(2/\epsilon)^{\epsilon}\to 1 as ϵ0\epsilon\to 0, we may choose ϵ\epsilon so that (2C/ϵ)Cϵlognnδ/2(2C/\epsilon)^{C\epsilon\log n}\leq n^{\delta/2}. By Markov’s inequality, we see that at most nδn^{\delta} nodes fail to have a majority of size ϵnplogn\epsilon\sqrt{np\log n}.

Suppose that P(n,p,q)=o(n1)P(n,p,q)=o(n^{-1}) and npn1/4np\leq n^{1/4}. For sufficiently small ϵ\epsilon, a.a.s. no node has two or more neighbors in VϵV_{\epsilon}.

Fix u,vVu,v\in V; let XBinom(n1,p)X\sim\operatorname{Binom}(n-1,p) and YBinom(n,q)Y\sim\operatorname{Binom}(n,q). As in the proof of Proposition 4.7, a.a.s. every vVv\in V has at most 100np100np neighbors; for the rest of the proof, we condition on this event. Moreover, we may choose ϵ\epsilon small enough so that Pr(YXϵnplogn)n7/8\operatorname{Pr}(Y\geq X-\epsilon\sqrt{np\log n})\leq n^{-7/8}. In particular, that means that Pr(uVϵ)n7/8\operatorname{Pr}(u\in V_{\epsilon})\leq n^{-7/8}. Now condition on the neighbors of uu. If vv has a majority of 2ϵnplogn2\epsilon\sqrt{np\log n} on all edges except for uu, then it lies outside of VϵV_{\epsilon} regardless of whether it neighbors uu. But this event is independent of whether uVϵu\in V_{\epsilon}, and if ϵ\epsilon is sufficiently small then it has probability at least 1n7/81-n^{-7/8}. Hence, Pr(u,vVϵ)n7/4\operatorname{Pr}(u,v\in V_{\epsilon})\leq n^{-7/4}.

Now condition on the event that u,vVϵu,v\in V_{\epsilon}. Recall that uu and vv each have at most 100np100n1/4100np\leq 100n^{1/4} neighbors in VV_{-} and at most 100n1/4100n^{1/4} neighbors in V+V_{+}. Conditioned on the number of neighbors in VV_{-} and V+V_{+}, the neighbors of uu and vv are independent and uniformly distributed. Hence, the probability that they have a common neighbor is O(n3/43/4+1)=O(n1/2)O(n^{-3/4-3/4+1})=O(n^{-1/2}). Combining this with the previous paragraph, we have

Taking a union bound over n2n^{2} choices of uu and vv completes the proof.

Suppose that npn1/4np\leq n^{1/4}. For sufficiently small ϵ\epsilon, a.a.s. no two nodes in VϵV_{\epsilon} are adjacent.

Fix u,vVu,v\in V. The probability that they are adjacent is at most pn3/4p\leq n^{-3/4}. As in the previous proof, if ϵ\epsilon is small enough then Pr(uVϵuv)\operatorname{Pr}(u\in V_{\epsilon}\mid u\sim v) and Pr(vVϵuv,uVϵ)\operatorname{Pr}(v\in V_{\epsilon}\mid u\sim v,u\in V_{\epsilon}) are both at most n7/8n^{-7/8}. Multiplying these conditional probabilities, we have

and we conclude by taking a union bound over uu and vv.

Suppose that we initialize Algorithm 2 with a partition whose errors are restricted to VϵV_{\epsilon}, and suppose that P(n,pn,qn)=o(n1)P(n,p_{n},q_{n})=o(n^{-1}). Then a.a.s., Algorithm 2 returns the true partition.

We consider two cases: the dense regime n1/4np2n/3n^{1/4}\leq np\leq 2n/3, and the sparse regime 12lognnpn1/4\frac{1}{2}\log n\leq npn^{1/4}.

In the dense regime, note that by Proposition 3.1, a.a.s. every node has a majority of Ω(np/logn)Ω(n1/9)\Omega(\sqrt{np/\log n})\geq\Omega(n^{1/9}). On the other hand, if ϵ\epsilon is sufficiently small then (by Proposition 4.7) Vϵn1/10|V_{\epsilon}|\leq n^{1/10}, which implies that every node in V+V_{+} will have most of its neighbors in U+U_{+}. Therefore, W+=V+W_{+}=V_{+} in Algorithm 2.

In the sparse regime, let VV^{\prime} be the set of nodes with a majority of less than three; note that VVϵV^{\prime}\subset V_{\epsilon}. By Proposition 4.9, a.a.s. every node has at most one neighbor in VϵV_{\epsilon}, which implies that every node in V+VV_{+}\setminus V^{\prime} has most of its neighbors in U+U_{+}; hence every node outside of VV^{\prime} will be correctly labelled. On the other hand, Proposition 4.11 shows that nodes in VV^{\prime} are also correctly labelled, since none of them have any neighbors in VϵV_{\epsilon} (recalling that VVϵV^{\prime}\subset V_{\epsilon}).

Necessary condition for strong consistency

A classical fact in Bayesian statistics says that if we are asked to produce a configuration σ^\hat{\sigma} from the graph GG, then the algorithm with the highest probability of success is the maximum a posteriori estimator, σ^\hat{\sigma}, which is defined to be any τ{1,1}V(G)\tau\in\{-1,1\}^{V(G)} satisfying uτu=0\sum_{u}\tau_{u}=0 that maximizes Pr(Gσ=τ)\operatorname{Pr}(G\mid\sigma=\tau). (To see that this is the estimator with the highest probability of success, note that every τ\tau that maximizes Pr(Gσ=τ)\operatorname{Pr}(G\mid\sigma=\tau) also maximizes Pr(σ=τG)\operatorname{Pr}(\sigma=\tau\mid G); clearly, a τ\tau that maximizes the latter quantity is an optimal estimate.) In order to prove that P(n,pn,qn)=o(n1)P(n,p_{n},q_{n})=o(n^{-1}) is necessary for strong consistency, we relate the success probability of σ^\hat{\sigma} to the existence of nodes with minorities. Note that we say vv has a majority with respect to τ\tau if (assuming p>qp>q) τ\tau gives the same label to vv as it does to most of vv’s neighbors.

If there is a unique maximal σ^\hat{\sigma} then with respect to σ^\hat{\sigma}, there cannot be both a ++-labelled node with a minority and a --labelled node with a minority.

For convenience, we will assume that p>qp>q. The same proof works for p<qp<q, but one needs to remember that the definition of “majority” and “minority” swap in that case (Definition 2.6).

The probability of GG conditioned on the labelling τ\tau may be written explicitly: if AτA_{\tau} is the set of unordered pairs uvu\neq v with τu=τv\tau_{u}=\tau_{v} and BτB_{\tau} is the set of unordered pairs uvu\neq v with τuτv\tau_{u}\neq\tau_{v} then

Consider a labelling τ\tau. Suppose that there exist nodes uu and vv with τu=+\tau_{u}=+ and τv=\tau_{v}=-, and such that both uu and vv have minorities with respect to τ\tau. We will show that τ\tau cannot be the unique maximizer of Pr(Gσ=τ)\operatorname{Pr}(G\mid\sigma=\tau), which will establish the lemma.

Consider the labelling τ\tau^{\prime} that is identical to τ\tau except that τu=\tau^{\prime}_{u}=- and τv=+\tau^{\prime}_{v}=+. The fact that uu and vv both had minorities with respect to τ\tau implies that

(note that equality is possible in the inequalities above if uu and vv are neighbors). On the other hand, the number of ++ and - labels are the same for τ\tau and τ\tau^{\prime}; hence Aτ=Aτ|A_{\tau}|=|A_{\tau^{\prime}}| and Bτ=Bτ|B_{\tau}|=|B_{\tau^{\prime}}|. Looking back at (7), therefore, we have

Hence, τ\tau cannot be the unique maximizer of Pr(Gσ=τ)\operatorname{Pr}(G\mid\sigma=\tau).

In order to argue that P(n,pn,qn)=o(n1)P(n,p_{n},q_{n})=o(n^{-1}) is necessary for strong consistency, we need to show that if P(n,pn,qn)P(n,p_{n},q_{n}) is not o(n1)o(n^{-1}) then (G,σ)G(2n,pn,qn)(G,\sigma)\sim\mathcal{G}(2n,p_{n},q_{n}) has a non-vanishing chance of containing nodes of both labels with minorities.

Suppose that P(n,pn,qn)P(n,p_{n},q_{n}) is not o(n1)o(n^{-1}). By Proposition 2.7, there is some ϵ>0\epsilon>0 such that for infinitely many nn, \operatorname{Pr}(\exists u:\text{uhas a minority})\geq\epsilon. Since ++-labelled nodes and --labelled nodes are symmetric, there are infinitely many nn such that

By Harris’s inequality , the two events above are non-negatively correlated because both of them are monotonic events with the same directions: both are monotonic increasing in the edges between ++-labelled and --labelled nodes and monotonic decreasing in the other edges. Hence, there are infinitely many nn for which

Binomial approximations

In this section, we collect various technical, but not particularly enlightening, estimates for binomial variables. Specifically, we prove Propositions 2.8 and 2.9, which give explicit characterizations of the condition P(n,pn,qn)=o(n1)P(n,p_{n},q_{n})=o(n^{-1}) in the sparse and dense case respectively, and Proposition 3.1 and 3.2, which give perturbative estimates for binomial probabilities. Our main tools are Bernstein’s inequality, Stirling’s approximation and Taylor expansion.

For simplicity, in this section we write a=an,b=bna=a_{n},b=b_{n} and c=a+bc=a+b. If there is a constant C>0C>0 such that C1fgCfC^{-1}f\leq g\leq Cf then we write fgf\asymp g. We recall that a,b=Θ(1)a,b=\Theta(1) and that pn=alognpn=a\log n and qn=blognqn=b\log n. Let XBinom(n,p)X\sim\operatorname{Binom}(n,p) and YBinom(n,q)Y\sim\operatorname{Binom}(n,q).

We begin with a Poisson approximation to binomials.

If Z=X+YZ=X+Y then for every k10clognk\leq 10c\log n,

where the sequence implicit in the o(1)o(1) notation is independent of nn and kk.

By a direct computation, if k10clognk\leq 10c\log n then

We first note that if abϵ=ϵ(C)a-b\leq\epsilon=\epsilon(C) then strong consistency does not hold. This follows because with constant probability we have that XX is less than its mean anlogna_{n}\log n and the probability that YY is larger than alogna\log n is at least n1/2n^{-1/2} if ϵ\epsilon is a sufficiently small constant.

Without loss of generality, we may assume that c1c\geq 1. Indeed, if c<1c<1 then the proposition is trivially true: on the one hand P(n,pn,qn)=Ω(n1)P(n,p_{n},q_{n})=\Omega(n^{-1}) because Pr(X=0)\operatorname{Pr}(X=0) and Pr(Y=0)\operatorname{Pr}(Y=0) are both Ω(n1)\Omega(n^{-1}); on the other hand, (a+b2ab1)logn+12loglogn(a+b-2\sqrt{ab}-1)\log n+\frac{1}{2}\log\log n\to-\infty because a+b=c<1a+b=c<1 and ab\sqrt{ab} is bounded away from zero as nn\to\infty.

where the second equality follows from the fact that Pr(Z10clogn)O(n2)\operatorname{Pr}(Z\geq 10c\log n)\leq O(n^{-2}), recalling that c1c\geq 1.

For a fixed k10clognk\leq 10c\log n, we have that

where η=ba+b12(1ϵ)\eta=\frac{b}{a+b}\leq\frac{1}{2}(1-\epsilon). Recall that binomial tail probabilities decay exponentially fast; since η12(1ϵ)\eta\leq\frac{1}{2}(1-\epsilon), Pr(Binom(k,η)k/2)Pr(Binom(k,η)=k/2)\operatorname{Pr}(\operatorname{Binom}(k,\eta)\geq k/2)\asymp\operatorname{Pr}(\operatorname{Binom}(k,\eta)=\lceil k/2\rceil).Combining this with Stirling’s approximation we have

where θ=η(1η)=aba+b\theta=\sqrt{\eta(1-\eta)}=\frac{\sqrt{ab}}{a+b}. By Lemma 6.1,

and so Stirling’s approximation for k1k\geq 1 gives

and so the maximum is obtained around the value

Thus nPr(YX)0n\operatorname{Pr}(Y\geq X)\to 0 if and only if

2 Characterization of dense strong consistency

Our main tool for proving Proposition 2.9 will be the following Local Central Limit Theorem. The proof is a standard application of Stirling’s approximation.

Let C>0C>0 be an arbitrary constant and YBinom(n,q)Y\sim\operatorname{Binom}(n,q), where

Let σq2=q(1q)\sigma_{q}^{2}=q(1-q) and let ϕ(x)=(2π)1/2ex2/2\phi(x)=(2\pi)^{-1/2}e^{-x^{2}/2}. Then for all integers kk such that knqCnlognσq|k-nq|\leq C\sqrt{n\log n}\sigma_{q} it holds that

The second statement follows easily from the first one using the formula for ϕ\phi and noting that if δCnlognσq\delta\leq C\sqrt{n\log n}\sigma_{q} and ϵ1|\epsilon|\leq 1 then

To prove the first statement, we begin with Stirling’s approximation. Noting that kk\to\infty as nn\to\infty, we obtain:

and since q=ω(n1log3n)q=\omega(n^{-1}\log^{3}n) implies σqlognn=o(q/logn)\frac{\sigma_{q}\sqrt{\log n}}{\sqrt{n}}=o(q/\log n), it follows that n/k=(1+o(1/logn))1qn/k=(1+o(1/\log n))\frac{1}{q}. Similarly, nnk=(1+o(1/logn))11q\frac{n}{n-k}=(1+o(1/\log n))\frac{1}{1-q} and so

Next, we use Taylor expansion around nq=knq=k. The first-order term vanishes and we have

where the last equality uses the fact that

which follows from the assumption that q=ω(n1log3(n))q=\omega(n^{-1}\log^{3}(n)). Now, from (8) we have nk(nk)=(1+o(1/logn))1nσq2\frac{n}{k(n-k)}=(1+o(1/\log n))\frac{1}{n\sigma_{q}^{2}}. Since (knq)2=O(σq2nlogn)(k-nq)^{2}=O(\sigma_{q}^{2}n\log n), we have

The proof follows by combining this with (8) and Stirling’s approximation for Pr(Y=k)\operatorname{Pr}(Y=k).

The second and third conditions are clearly equivalent; we will show the equivalence of the first two.

So writing bq=5nlognσqb_{q}=5\sqrt{n\log n}\sigma_{q} and bp=5nlognσpb_{p}=5\sqrt{n\log n}\sigma_{p} we have:

Where M,NN(0,1)M,N\sim\mathcal{N}(0,1) are independent. The proof follows.

3 Perturbation estimates for dense binomials

The main approximation that we use to prove Proposition 3.1 is the following:

Now, under the assumption mp64logmmp\geq 64\log m, we have mp4mplogmmp/2mp-4\sqrt{mp\log m}\geq mp/2 and mp+4mplogm3mp/2mp+4\sqrt{mp\log m}\leq 3mp/2. Consider the first term in the upper bound of Lemma 6.7:

where the last inequality used kmp4mplogm|k-mp|\leq 4\sqrt{mp\log m} and kmp/2k\geq mp/2. The other term in the upper bound of Lemma 6.7 is similar:

for sufficiently large mm, where the second inequality follows by lower-bounding both terms in the denominator: p2/3p\leq 2/3 implies mmp2mpm-mp\geq 2mp and kmp+4mplogmk\leq mp+4\sqrt{mp\log m} implies mkcmpm-k\geq cmp for some c>0c>0 and sufficiently large mm (this follows by considering the cases p[210,2/3]p\in[2^{-10},2/3] and p[64m1logm,210]p\in[64m^{-1}\log m,2^{-10}] separately). Combining (11) and (12) with Lemma 6.7, we obtain

The lower bound (i.e. (1)) is essentially the same, and we give only a sketch: we write

4 Perturbation estimates for sparse binomials

The sparse case needs a slightly different argument and has slightly worse bounds. We have the following analogue of Lemma 6.7:

This time, we will use a sharper bound on the sum: since the logarithm is an increasing function,

Since kk and mpmp are o(m)o(m), log((mk)/(mmp))=o(1)\log((m-k)/(m-mp))=o(1), and so

This proof is similar to the proof of Proposition 3.1, but with Lemma 6.10 instead of Lemma 6.7 and some slightly different truncations: we write

By Bernstein’s inequality, we may truncate the sum at m\sqrt{m} at the cost of an additive ecme^{-c\sqrt{m}} term. We apply the inequality

(which follows from Lemma 6.10) to each term in the sum, yielding

where the second inequality follows (assuming CeC\geq e) because (ey/x)x(ey/x)^{x} is an increasing function of xx for xyx\leq y. Putting everything together,

Finally, note that Pr(X=0)Pr(YX)\operatorname{Pr}(X=0)\leq\operatorname{Pr}(Y\geq X) so that the first two terms above may be combined at the cost of increasing CC. For the additive term ecme^{-c\sqrt{m}}, note that mp128logmmp\leq 128\log m implies that Pr(YX)Pr(X=0)=Ω(nα)\operatorname{Pr}(Y\geq X)\geq\operatorname{Pr}(X=0)=\Omega(n^{-\alpha}) for some constant α\alpha, and so ecme^{-c\sqrt{m}} may also be absorbed into the main term at the cost of increasing CC.

Erratum

The published version of this paper contained a mistake; we are grateful to Jan van Waaij for pointing it out.

The statement of Lemma 5.1 is incorrect; the error in the proof was introduced in the inequality

which does not hold under the assumption of Lemma 5.1. To formulate a correct version, we introduce the notion of a strict minority:

Given a labelled graph (G,σ)(G,\sigma), we say that vv has a strict minority if either

Here is a corrected version of Lemma 5.1 (using the notation of Lemma 5.1):

If there is a unique maximal σ^\hat{\sigma} then with respect to σ^\hat{\sigma} then there cannot be both a ++-labelled node uu and a --labelled node vv such that either

uu and vv both have strict minorities, or

uu and vv are non-adjacent and both have minorities.

The proof of Lemma 7.2 is essentially the same as the proof of Lemma 5.1, except that the strengthened assumption means that the problematic inequality is now true. As before, assume that p>qp>q, let uu and vv be any nodes with τu=+\tau_{u}=+ and τv=\tau_{v}=-, let Aτ={{u,v}:τu=τv}A_{\tau}=\{\{u,v\}:\tau_{u}=\tau_{v}\} and let τ\tau^{\prime} be the labelling obtained from τ\tau by swapping the labels of uu and vv. We need to show that under either of the two conditions in Lemma 7.2, E(G)AτE(G)Aτ|E(G)\cap A_{\tau^{\prime}}|\geq|E(G)\cap A_{\tau}|.

The sets E(G)AτE(G)\cap A_{\tau^{\prime}} and E(G)AτE(G)\cap A_{\tau} differ only among edges that are incident to either uu or vv, so it suffices to consider such edges, of which there are five types:

if wuw\sim u has τw=+\tau_{w}=+ then {u,w}Aτ\{u,w\}\in A_{\tau} but not AτA_{\tau^{\prime}};

if wuw\sim u, wvw\neq v has τw=\tau_{w}=- then {u,w}Aτ\{u,w\}\in A_{\tau^{\prime}} but not AτA_{\tau};

if wvw\sim v has τw=\tau_{w}=- then {v,w}Aτ\{v,w\}\in A_{\tau} but not AτA_{\tau^{\prime}};

if wvw\sim v, wuw\neq u has τw=+\tau_{w}=+ then {v,w}Aτ\{v,w\}\in A_{\tau^{\prime}} but not AτA_{\tau};

{u,v}\{u,v\} belongs to neither AτA_{\tau} nor AτA_{\tau^{\prime}}.

Let NaN_{a} through NeN_{e} be the number of edges of GG corresponding to each of the types above. Then E(G)AτE(G)Aτ=Nb+NdNaNc|E(G)\cap A_{\tau^{\prime}}|-|E(G)\cap A_{\tau}|=N_{b}+N_{d}-N_{a}-N_{c}. Note that NeN_{e} is either zero or one, and it is one if and only if {u,v}E(G)\{u,v\}\in E(G), and note also that uu has a minority if and only if NaNb+NeN_{a}\leq N_{b}+N_{e}, while uu has a strict minority if and only if NaNb+Ne1N_{a}\leq N_{b}+N_{e}-1 (and similarly for vv). Hence, if uu and vv both have strict minorities then

while if uu and vv both have minorities and are non-adjacent then

Hence, in either case we have established that E(G)AτE(G)Aτ|E(G)\cap A_{\tau^{\prime}}|\geq|E(G)\cap A_{\tau}|.

Finally, note that if Bτ={{u,v}:τuτv}B_{\tau}=\{\{u,v\}:\tau_{u}\neq\tau_{v}\} then EBτ=EEAτ|E\cap B_{\tau}|=|E|-|E\cap A_{\tau}| and so (7) implies that

If it were possible to increase E(G)Aτ|E(G)\cap A_{\tau}| while maintaining E(G)|E(G)|, Aτ|A_{\tau}|, and Bτ|B_{\tau}|, τ\tau could not have been the unique maximum a posteriori estimator.

Since the incorrect Lemma 5.1 was used to prove that P(n,pn,qn)=o(n1)P(n,p_{n},q_{n})=o(n^{-1}) is necessary for strong consistency, we will now show how Lemma 7.2 can be used for the same purpose. So, for the rest of the section we fix some ϵ>0\epsilon>0 and assume (after passing to a subsequence of nn, if necessary) that P(n,pn,qn)ϵn1P(n,p_{n},q_{n})\geq\epsilon n^{-1}. We will divide the proof into a sparse case and a dense case. In the sparse case, we show that there is a pair of non-adjacent minorities:

If P(n,pn,qn)ϵn1P(n,p_{n},q_{n})\geq\epsilon n^{-1} and npn64lognnp_{n}\leq 64\log n for infinitely many nn then with asymptotically positive probability there is a non-adjacent pair uu, vv of nodes such that σu=+\sigma_{u}=+, σv=\sigma_{v}=-, and uu and vv have minorities.

We will divide {u:σu=+}\{u:\sigma_{u}=+\} into three sets S1,+,S2,+S_{1,+},S_{2,+}, and S3,+S_{3,+}, each of size at least n/4n/4; similarly, we divide {u:σu=}\{u:\sigma_{u}=-\} into S1,S_{1,-}, S2,S_{2,-}, and S3,S_{3,-}. For each of these six sets Si,jS_{i,j}, Pr(uSi,j:u is a minority)δ\operatorname{Pr}(\exists u\in S_{i,j}:u\text{ is a minority})\geq\delta. Next, note that the event that uu has a minority is (in the sense of Harris ) monotone increasing in the edges between ++-labelled and --labelled nodes and monotone decreasing in the other edges. It follows from Harris’s inequality that any such events are non-negatively correlated. In particular, with probability at least δ6\delta^{6}, every Si,jS_{i,j} contains a node with a minority (ui,ju_{i,j}, say).

We will complete the proof by showing that a.a.s. it is not the case that every ui,+u_{i,+} is connected to every uk,u_{k,-}. Indeed, if every ui,+u_{i,+} is connected to every uk,u_{k,-} then the graph GG contains a subgraph isomorphic to K3,3K_{3,3} (the complete bipartite graph). However, the random graph GG is stochastically dominated by the Erdős-Rényi graph G(n,64n1logn)\mathcal{G}(n,64n^{-1}\log n), and it is well-known (for example, by the first moment method) that such a graph a.a.s. does not contain a copy of K3,3K_{3,3}.

To complete the proof we consider the dense case, where we prove that there is a pair of strict minorities:

If P(n,pn,qn)ϵn1P(n,p_{n},q_{n})\geq\epsilon n^{-1} and npn64lognnp_{n}\geq 64\log n infinitely often then with asymptotically positive probability there are a pair uu and vv with opposite labels and strict minorities.

Once we have established that Pr(N+1)\operatorname{Pr}(N_{+}\geq 1) is bounded away from zero, it follows by symmetry that Pr(N1)\operatorname{Pr}(N_{-}\geq 1) is bounded away from zero (where NN_{-} is the number of --labelled nodes with a minority). As in the proof of Lemma 7.4, Harris’s inequality implies that Pr(N+1 and N1)Pr(N+1)Pr(N1)\operatorname{Pr}(N_{+}\geq 1\text{ and }N_{-}\geq 1)\geq\operatorname{Pr}(N_{+}\geq 1)\operatorname{Pr}(N_{-}\geq 1), and then it follows that with asymptotically positive probability there are strict minorities with both ++ and - labels.

Finally, the proof that P(n,pn,qn)=o(n1)P(n,p_{n},q_{n})=o(n^{-1}) is necessary for strong consistency follows by combining Lemma 7.2 with Lemma 7.4 in the sparse case, or with Lemma 7.6 in the dense case.

References