Consistency Thresholds for the Planted Bisection Model

Elchanan Mossel, Joe Neeman, Allan Sly

Introduction

The “planted bisection model” is a random graph model with $2n$ vertices that are divided into two classes with $n$ vertices each. Edges within the classes are added to the graph independently with probability $p_{n}$ each, while edges between the classes are added with probability $q_{n}$. Following Bui et al., who studied a related model, Dyer and Frieze introduced the planted bisection model in order to study the average-case complexity of the Min-Bisection problem, which asks for a bisection of a graph that cuts the smallest possible number of edges. This problem is known to be NP-complete in the worst case, but on a random graph model with a “planted” small bisection one might hope that it is usually easy. Indeed, Dyer and Frieze showed that if $p_{n}=p>q=q_{n}$ are fixed as $n\to\infty$ then with high probability the bisection that separates the two classes is the minimum bisection, and it can be found in expected $O(n^{3})$ time.

These models were introduced slightly earlier in the statistics literature (under the name “stochastic block model”) in order to study the problem of community detection in random graphs. Here, the two parts of the bisection are interpreted as latent “communities” in a network, and the goal is to identify them from the observed graph structure. If $p_{n}>q_{n}$, the maximum a posteriori estimate of the true communities is exactly the same as the minimum bisection (see the discussion leading to Lemma 4.1), and so the community detection problem on a stochastic block model is exactly the same as the Min-Bisection problem on a planted bisection model; hence, we will use the statistical and computer science terminologies interchangeably. We note, however, that the statistics literature is slightly more general, in the sense that it often allows $q_{n}>p_{n}$, and sometimes relaxes the problem by allowing the detected communities to contain some errors.

Our main contribution is a necessary and sufficient condition on $p_{n}$ and $q_{n}$ for recoverability of the planted bisection. When the bisection can be recovered, we provide an efficient algorithm for doing so.

Definitions and results

The oldest and most fundamental question about planted partition models is the label reconstruction problem: if we were given the graph $G$ but not the labelling $\sigma$ , could we reconstruct $\sigma$ (up to its sign) from $G$ ? This problem is usually framed in the asymptotic regime, where the number of nodes $n\to\infty$ , and $p$ and $q$ are allowed to depend on $n$ .

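Given sequences $p_{n}$ and $q_{n}$ in $[0,1]$, and given a map $\mathcal{A}$ from graphs to vertex labellings, we say that $\mathcal{A}$ is strongly consistent (or sometimes just consistent) if

$$\operatorname{Pr}_{n}\bigl(\mathcal{A}(G)=\sigma\ \text{or}\ \mathcal{A}(G)=-\sigma\bigr)\to 1,$$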

where the probability $\operatorname{Pr}_{n}$ is taken with respect to $(G,\sigma)\sim\mathcal{G}(2n,p_{n},q_{n})$ .

Depending on the application, it may also make sense to ask for a labelling which is almost completely accurate, in the sense that it correctly labels all but a vanishingly small fraction of nodes. Amini et al. suggested the term “weak consistency” for this notion.

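Given $\sigma,\tau\in\{1,-1\}^{2n}$, define

$$\Delta(\sigma,\tau)=1-\frac{1}{2n}\Bigl|\sum_{i=1}^{2n}\sigma_{i}\tau_{i}\Bigr|,$$

which is twice the fraction of coordinates on which $\tau$ disagrees with whichever of $\sigma$ and $-\sigma$ it is closer to.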

Given sequences $p_{n}$ and $q_{n}$ in $[0,1]$, and given a map $\mathcal{A}$ from graphs to vertex labellings, we say that $\mathcal{A}$ is weakly consistent if

$$\Delta(\mathcal{A}(G),\sigma)\stackrel{P}{\to}0,$$

where “$\stackrel{P}{\to}$” means convergence in probability, and the probability is taken with respect to $(G,\sigma)\sim\mathcal{G}(2n,p_{n},q_{n})$.

Our main result is a characterization of the sequences $p_{n}$ and $q_{n}$ for which consistent or weakly consistent estimators exist. Note that the characterization of weak consistency was obtained previously by Yun and Proutiere, but we include it here for completeness.

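Given $m$, $n$, $p$ and $q$, let $X\sim\operatorname{Binom}(m,\max\{p,q\})$ and $Y\sim\operatorname{Binom}(n,\min\{p,q\})$ be independent, and define $P(m,n,p,q)=\operatorname{Pr}(Y\geq X)$. When $m=n$, we abbreviate $P(n,p,q)=P(n,n,p,q)$.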
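To make $P(m,n,p,q)$ concrete, the following sketch evaluates it exactly by conditioning on the value of $X$. It assumes that $P(m,n,p,q)=\operatorname{Pr}(Y\geq X)$ for independent $X\sim\operatorname{Binom}(m,\max\{p,q\})$ and $Y\sim\operatorname{Binom}(n,\min\{p,q\})$; the function names `binom_pmf` and `P` are our own, and the probability mass function is computed in log space so that it does not overflow for large $m$ and $n$.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(k: int, trials: int, prob: float) -> float:
    """Pr(B = k) for B ~ Binom(trials, prob), computed in log space."""
    if prob == 0.0:
        return 1.0 if k == 0 else 0.0
    if prob == 1.0:
        return 1.0 if k == trials else 0.0
    log_choose = lgamma(trials + 1) - lgamma(k + 1) - lgamma(trials - k + 1)
    return exp(log_choose + k * log(prob) + (trials - k) * log1p(-prob))


def P(m: int, n: int, p: float, q: float) -> float:
    """Pr(Y >= X) for independent X ~ Binom(m, max(p, q)), Y ~ Binom(n, min(p, q))."""
    hi, lo = max(p, q), min(p, q)
    # y_tail[x] = Pr(Y >= x) for x = 0, ..., n; the sentinel y_tail[n + 1] is 0.
    y_tail = [0.0] * (n + 2)
    for y in range(n, -1, -1):
        y_tail[y] = y_tail[y + 1] + binom_pmf(y, n, lo)
    # Condition on X = x; values x > n contribute nothing because Y <= n.
    return sum(binom_pmf(x, m, hi) * y_tail[x] for x in range(min(m, n) + 1))
```

For example, `P(1, 1, 0.5, 0.5)` is $3/4$ up to floating-point rounding, because $Y<X$ happens only when $X=1$ and $Y=0$.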

Consider sequences $p_{n}$ and $q_{n}$ in $[0,1]$. There exists a strongly consistent estimator for $\mathcal{G}(2n,p_{n},q_{n})$ if and only if $P(n,p_{n},q_{n})=o(n^{-1})$. There exists a weakly consistent estimator for $\mathcal{G}(2n,p_{n},q_{n})$ if and only if $P(n,p_{n},q_{n})\to 0$.

In order to provide some intuition for Definition 2.4 and its appearance in our characterization, we note the following graph-theoretic interpretation of $P(n,p,q)$ :

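Given a labelled graph $(G,\sigma)\sim\mathcal{G}(2n,p,q)$ and a node $v\in V(G)$, we say that $v$ has a majority of size $k$ if either

$$p>q\quad\text{and}\quad\#\{u\sim v:\sigma_{u}=\sigma_{v}\}\geq k+\#\{u\sim v:\sigma_{u}\neq\sigma_{v}\},$$

or

$$p<q\quad\text{and}\quad\#\{u\sim v:\sigma_{u}\neq\sigma_{v}\}\geq k+\#\{u\sim v:\sigma_{u}=\sigma_{v}\}.$$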

We say that $v$ has a majority if it has a majority of size one. If $v$ does not have a majority, we say that it has a minority.

Fix sequences $p_{n}$ and $q_{n}$ in $[0,1]$ and let $(G,\sigma)\sim\mathcal{G}(2n,p_{n},q_{n})$. Then

$P(n,p_{n},q_{n})=o(n^{-1})$ if and only if a.a.s. every $v\in V(G)$ has a majority; and

$P(n,p_{n},q_{n})\to 0$ if and only if a.a.s. at most $o(n)$ nodes in $V(G)$ fail to have a majority.
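The statements in Proposition 2.7 can be explored numerically. The sketch below samples $(G,\sigma)\sim\mathcal{G}(2n,p,q)$ as a dense NumPy adjacency matrix and counts the nodes that have a minority in the sense defined above (for $p=q$ it uses the $p>q$ convention). The helper names are hypothetical, and a single sample only illustrates the asymptotic statements; it proves nothing.

```python
import numpy as np


def sample_planted_bisection(n: int, p: float, q: float, rng: np.random.Generator):
    """Sample (adjacency matrix, labels) from G(2n, p, q)."""
    sigma = np.repeat([1, -1], n)  # n vertices labelled +1, then n labelled -1
    edge_prob = np.where(np.equal.outer(sigma, sigma), p, q)
    upper = np.triu(rng.random((2 * n, 2 * n)) < edge_prob, k=1)  # pairs i < j
    adj = (upper | upper.T).astype(np.int64)  # symmetric, zero diagonal
    return adj, sigma


def count_minorities(adj: np.ndarray, sigma: np.ndarray, p: float, q: float) -> int:
    """Number of vertices that do not have a majority (of size one)."""
    margin = sigma * (adj @ sigma)  # (#same-label) - (#other-label) neighbours
    if p < q:
        margin = -margin  # if q > p, the other label is the favoured one
    return int(np.count_nonzero(margin <= 0))
```

With well-separated parameters such as $p=0.9$, $q=0.1$ and $n=100$ one should expect no minority nodes at all, while for $p=q$ roughly half of the nodes have a minority.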

Proposition 2.7 suggests some intuition for Theorem 2.5: namely, that a node can be labelled correctly if and only if it has a majority. In fact, having a majority is necessary for correct labelling (and we will use this to prove one direction of Theorem 2.5); however, it is not sufficient. For example, there are regimes in which 51% of nodes have majorities, but only 50% of them can be correctly labelled.

We note that Theorem 2.5 has certain parallels with local-to-global threshold phenomena in random graphs. For example, Erdős and Rényi showed that for $\mathcal{G}(n,p_{n})$ , if $p_{n}$ is large enough so that with high probability every node has a neighbor then the graph is connected with high probability. On the other hand, every node having a neighbor is clearly necessary for the graph to be connected. An analogous story holds for the existence of Hamiltonian cycles: Komlós and Szemerédi showed that $\mathcal{G}(n,p_{n})$ has a Hamiltonian cycle with high probability if and only if with high probability every node has degree at least two.

These results on connectedness and Hamiltonicity have a feature in common: in both cases, an obviously necessary local condition turns out to also be sufficient (on random graphs) for a global condition. One can interpret Theorem 2.5 similarly: the minimum bisection in $\mathcal{G}(n,p_{n},q_{n})$ equals the planted bisection with high probability if and only if with high probability every node has more neighbors of its own label than those of the other label.

Our algorithm comes in three steps, each of which is based on an idea that has already appeared in the literature. Our first step is a spectral algorithm, along the lines of those developed by Boppana, McSherry, and Coja-Oghlan. Yun and Proutiere recently made some improvements to (a special case of) Coja-Oghlan’s work, showing that a spectral algorithm can find a bisection with $o(n)$ errors if $n\frac{(p_{n}-q_{n})^{2}}{p_{n}+q_{n}}\to\infty$; this is substantially weaker than McSherry’s condition for strong consistency, which requires this quantity to be at least $C\log n$ for a sufficiently large constant $C$.
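The following is a deliberately simplified spectral step, not the algorithm of any of the works cited above. Assuming $p>q$, the expected adjacency matrix is approximately $\frac{p+q}{2}\mathbf{1}\mathbf{1}^{T}+\frac{p-q}{2}\sigma\sigma^{T}$, so the eigenvector of the second-largest eigenvalue should roughly align with $\sigma$; the sketch splits the vertices at the median of that eigenvector. The function name `spectral_bisection` is hypothetical.

```python
import numpy as np


def spectral_bisection(adj: np.ndarray) -> np.ndarray:
    """Balanced +/-1 labelling from the second eigenvector of adj (assumes p > q)."""
    num_vertices = adj.shape[0]
    # eigh handles symmetric matrices and returns eigenvalues in ascending order,
    # so the eigenvector for the second-largest eigenvalue is column -2.
    _, eigvecs = np.linalg.eigh(adj.astype(float))
    second = eigvecs[:, -2]
    # Put the half of the vertices with the largest entries on the + side.
    order = np.argsort(second)
    labels = -np.ones(num_vertices, dtype=int)
    labels[order[num_vertices // 2:]] = 1
    return labels
```

The output is only meaningful up to a global sign, and the sketch makes no attempt to match the error guarantees quoted above.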

The second stage of our algorithm is to apply a “replica trick.” We hold out a small subset $U$ of vertices and run a spectral algorithm on the subgraph induced by $V\setminus U$. Then we label vertices in $U$ by examining the edges between $U$ and $V\setminus U$. By repeating the process for many subsets $U$, we dramatically reduce the number of errors made by the spectral algorithm. More importantly, we get extra information about the structure of the errors; for example, we can show that the set of incorrectly-labelled vertices is very poorly connected. Similar ideas are used by Condon and Karp, who used successive augmentation to build an initial guess on a subset of vertices, and then used that guess to correctly classify the remaining vertices. The present authors have also used a similar idea in the $p_{n},q_{n}=\Theta(n^{-1})$ regime, with a more complicated replica trick based on belief propagation.
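A minimal sketch of one pass of the replica idea, under assumptions of our own: the vertices are split into $m\geq 2$ random parts $U_1,\dots,U_m$ (each vertex independently and uniformly), an arbitrary black-box partitioner labels each $V\setminus U_i$, the sign of each such labelling is aligned with a reference labelling of the whole graph, and each vertex of $U_i$ is placed on the side where it has more neighbours in $V\setminus U_i$ (assuming $p>q$, with ties going to $+$). This is not a transcription of Algorithm 1; `replica_step` and `black_box` are hypothetical names.

```python
from typing import Callable

import numpy as np


def replica_step(
    adj: np.ndarray,
    black_box: Callable[[np.ndarray], np.ndarray],
    m: int,
    rng: np.random.Generator,
) -> np.ndarray:
    """Relabel each vertex using a labelling of the graph with its part held out."""
    num_vertices = adj.shape[0]
    reference = black_box(adj)  # labelling of the whole graph, used only for signs
    part = rng.integers(0, m, size=num_vertices)  # vertex v lies in U_{part[v]}
    labels = np.empty(num_vertices, dtype=int)
    for i in range(m):
        held_out = part == i
        if not held_out.any():
            continue
        rest = ~held_out
        sub_labels = black_box(adj[np.ix_(rest, rest)])  # labels of V \ U_i
        # A partition is only defined up to sign: align it with the reference.
        if np.dot(sub_labels, reference[rest]) < 0:
            sub_labels = -sub_labels
        # Score = (#neighbours labelled +) - (#neighbours labelled -) in V \ U_i.
        score = adj[np.ix_(held_out, rest)] @ sub_labels
        labels[held_out] = np.where(score >= 0, 1, -1)  # ties go to + (arbitrary)
    return labels
```

The key design point mirrored here is that a held-out vertex's own edges are never seen by the black box that labels its neighbours.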

The third step of our algorithm is a hill-climbing algorithm, or a sequence of local improvements. We simply relabel vertices so that they agree with the majority of their neighbors. An iterative version of this procedure has been considered before, and a randomized version (based on simulated annealing) was studied by Jerrum and Sorkin. Our version has better performance guarantees because we begin our hill-climbing just below the summit: as we will show, we need to relabel only a tiny fraction of the vertices and each of those will be relabelled only once.
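The local-improvement step can be sketched as a single synchronous round in which every vertex adopts the label carried by the majority of its neighbours under the current labelling. The sketch assumes $p>q$ and keeps the current label on ties (our choice); `hill_climb_once` is a hypothetical name. On its own it carries none of the guarantees discussed here, which depend on starting from a labelling whose errors are few and poorly connected.

```python
import numpy as np


def hill_climb_once(adj: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Move every vertex to the label held by most of its neighbours."""
    score = adj @ labels  # (#neighbours labelled +) - (#neighbours labelled -)
    new_labels = labels.copy()
    new_labels[score > 0] = 1
    new_labels[score < 0] = -1
    return new_labels  # vertices with score == 0 keep their current label
```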

As noted above, none of the ingredients in our algorithm are novel on their own. However, the way that we combine them is new (and also crucial to the correctness of the resulting algorithm). For example, McSherry used a spectral algorithm with a “clean-up” stage, but his clean-up stage was different from our second and third stages.

Although Theorem 2.5 is not particularly explicit in terms of $p_{n}$ and $q_{n}$ , one can obtain various explicit characterizations in particular regimes (for example, in order to better compare our results with the existing literature). We will focus our attention on the case where $p_{n}$ and $q_{n}$ are bounded away from one; for concreteness, suppose $p_{n},q_{n}\leq 2/3$ . Because of the symmetry of the problem, this case suffices: indeed, replacing $G\sim\mathcal{G}(n,p_{n},q_{n})$ by its complement (the graph in which two vertices are connected if they are not connected in $G$ ) corresponds to replacing $p_{n}$ by $1-p_{n}$ and $q_{n}$ by $1-q_{n}$ . Hence, if we handle the case $p_{n},q_{n}\leq 2/3$ then we also handle the case $p_{n},q_{n}\geq 1/3$ . There remains the case in which $\min\{p_{n},q_{n}\}\leq 1/3$ and $2/3\leq\max\{p_{n},q_{n}\}$ , but this case is trivial: $P(n,p_{n},q_{n})$ decreases exponentially fast in $n$ , and even very simple algorithms are known to be strongly consistent.
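The complementation argument can be checked at the level of the binomial probabilities. For $p\geq q$ write $P(n,p,q)=\operatorname{Pr}(Y\geq X)$ with $X\sim\operatorname{Binom}(n,p)$ and $Y\sim\operatorname{Binom}(n,q)$ independent, with the convention that the first binomial has the larger success probability. After complementation the larger probability is $1-q$, and counting failures instead of successes gives

$$
\begin{aligned}
X' &:= n-Y\sim\operatorname{Binom}(n,1-q),\qquad Y' := n-X\sim\operatorname{Binom}(n,1-p),\\
P(n,1-p,1-q) &= \operatorname{Pr}(Y'\geq X') = \operatorname{Pr}(n-X\geq n-Y) = \operatorname{Pr}(Y\geq X) = P(n,p,q),
\end{aligned}
$$

so the threshold quantity is unchanged when $(p_n,q_n)$ is replaced by $(1-p_n,1-q_n)$.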

One can easily see that to obtain strong consistency, at least one of $p_{n}$ or $q_{n}$ must be at least $n^{-1}\log n$ asymptotically. Indeed, suppose $q_{n}\leq p_{n}=n^{-1}\log n$ and let $X\sim\operatorname{Binom}(n,p_{n})$ , $Y\sim\operatorname{Binom}(n,q_{n})$ . Then $\operatorname{Pr}(X=0)=\Theta(n^{-1})$ , and so certainly $P(n,p_{n},q_{n})=\operatorname{Pr}(Y\geq X)=\Omega(n^{-1})$ , which means that strong consistency is impossible for these parameters. However, strong consistency is possible for some other parameters in the range $\Theta(n^{-1}\log n)$ . Using a Poisson approximation, we can characterize explicitly which of these sequences allow for strong consistency:
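For completeness, here is the computation behind $\operatorname{Pr}(X=0)=\Theta(n^{-1})$ when $p_{n}=n^{-1}\log n$, using $\log(1-x)=-x+O(x^{2})$ as $x\to 0$:

$$\operatorname{Pr}(X=0)=\Bigl(1-\frac{\log n}{n}\Bigr)^{n}=\exp\Bigl(n\log\Bigl(1-\frac{\log n}{n}\Bigr)\Bigr)=\exp\Bigl(-\log n+O\Bigl(\frac{\log^{2}n}{n}\Bigr)\Bigr)=\frac{1+o(1)}{n}.$$

Since $\{X=0\}\subseteq\{Y\geq X\}$, this gives $P(n,p_{n},q_{n})\geq(1+o(1))n^{-1}$.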

Let $p_{n}=a_{n}n^{-1}\log n$ and $q_{n}=b_{n}n^{-1}\log n$. If there is a constant $C$ such that $C^{-1}\leq a_{n},b_{n}\leq C$ for all but finitely many $n$ then $P(n,p_{n},q_{n})=o(n^{-1})$ if and only if

$$\bigl(a_{n}+b_{n}-2\sqrt{a_{n}b_{n}}-1\bigr)\log n+\tfrac{1}{2}\log\log n\to\infty.$$

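In a denser regime, it is tempting to approximate $\operatorname{Binom}(n,p_{n})$ and $\operatorname{Binom}(n,q_{n})$ by the normal random variables $\mathcal{N}(np_{n},n\sigma_{p}^{2})$ and $\mathcal{N}(nq_{n},n\sigma_{q}^{2})$, where $\sigma_{p}=\sqrt{p_{n}(1-p_{n})}$ and $\sigma_{q}=\sqrt{q_{n}(1-q_{n})}$. That is, taking the two normal variables to be independent,

$$P(n,p_{n},q_{n})\approx\operatorname{Pr}\bigl(\mathcal{N}(nq_{n},n\sigma_{q}^{2})\geq\mathcal{N}(np_{n},n\sigma_{p}^{2})\bigr)=\operatorname{Pr}\Bigl(\mathcal{N}(0,1)\geq\frac{\sqrt{n}(p_{n}-q_{n})}{\sigma}\Bigr),$$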

where $\sigma=\sqrt{\sigma_{p}^{2}+\sigma_{q}^{2}}$ . The central limit theorem implies that the normal approximation is correct in the bulk of the distribution if $np_{n}\to\infty$ and $nq_{n}\to\infty$ . However, we are interested in applying this approximation for the tail, which requires a faster increase of $np_{n}$ and a more delicate argument.

Suppose $p_{n},q_{n}=\omega\left(n^{-1}\log^{3}n\right)$ and $p_{n},q_{n}\leq 2/3$. Then the following conditions are equivalent:

$P(n,p_{n},q_{n})=o(n^{-1})$;

$n\operatorname{Pr}\left(\mathcal{N}(0,1)\geq\sigma_{n}^{-1}\sqrt{n}(p_{n}-q_{n})\right)\to 0$;

$\frac{\sqrt{n}\sigma_{n}}{p_{n}-q_{n}}\exp(-\frac{n(p_{n}-q_{n})^{2}}{2\sigma_{n}^{2}})\to 0$,

where $\sigma_{n}=\sqrt{p_{n}(1-p_{n})+q_{n}(1-q_{n})}$ .

In particular, the third condition in Proposition 2.9 gives an explicit formula for checking whether a strongly consistent estimator exists.
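The second and third conditions are easy to evaluate numerically. The sketch below (function name hypothetical, assuming $p>q$) computes both quantities. By the Gaussian tail bound $\operatorname{Pr}(\mathcal{N}(0,1)\geq t)\leq\varphi(t)/t$ for $t>0$, the third quantity is always at least $\sqrt{2\pi}$ times the second, and their ratio tends to $\sqrt{2\pi}$ as $\sqrt{n}(p_{n}-q_{n})/\sigma_{n}\to\infty$, which is one way to see why these two conditions are equivalent.

```python
from math import erfc, exp, sqrt


def normal_conditions(n: int, p: float, q: float) -> tuple[float, float]:
    """Quantities in the second and third conditions of Proposition 2.9 (p > q)."""
    sigma = sqrt(p * (1 - p) + q * (1 - q))
    t = sqrt(n) * (p - q) / sigma
    second = n * 0.5 * erfc(t / sqrt(2))  # n * Pr(N(0, 1) >= t)
    third = sqrt(n) * sigma / (p - q) * exp(-t * t / 2)  # t^2 = n (p - q)^2 / sigma^2
    return second, third
```

For instance, with $p=1/2$ and $q=1/2-c\sqrt{n^{-1}\log n}$ one can watch the third quantity shrink as $n$ grows when $c>1$ and grow when $c<1$.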

The formula for weak consistency is rather simpler:

$P(n,p_{n},q_{n})\to 0$ if and only if $\frac{n(p_{n}-q_{n})^{2}}{p_{n}+q_{n}}\to\infty$ .

One direction of Proposition 2.10 follows from Chebyshev’s inequality, while the other follows from the central limit theorem.
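The Chebyshev direction can be written out in one line. Assume $p_{n}>q_{n}$, let $X\sim\operatorname{Binom}(n,p_{n})$ and $Y\sim\operatorname{Binom}(n,q_{n})$ be independent, and set $D=X-Y$, so that $\mathbb{E}D=n(p_{n}-q_{n})$ and $\operatorname{Var}D=n\bigl(p_{n}(1-p_{n})+q_{n}(1-q_{n})\bigr)\leq n(p_{n}+q_{n})$. Then

$$P(n,p_{n},q_{n})=\operatorname{Pr}(D\leq 0)\leq\operatorname{Pr}\bigl(|D-\mathbb{E}D|\geq\mathbb{E}D\bigr)\leq\frac{\operatorname{Var}D}{(\mathbb{E}D)^{2}}\leq\frac{p_{n}+q_{n}}{n(p_{n}-q_{n})^{2}},$$

so $n(p_{n}-q_{n})^{2}/(p_{n}+q_{n})\to\infty$ implies $P(n,p_{n},q_{n})\to 0$; the converse is the direction that uses the central limit theorem.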

3 Relation to prior work

Over the years, various authors have improved on the seminal work of Dyer and Frieze by proving weaker sufficient conditions on the sequences $p_{n}$ and $q_{n}$ for which the planted bisection can be recovered. (Various results also generalized the problem by allowing more than two labels, but we will ignore this generalization here.) For example, Jerrum and Sorkin required $p_{n}-q_{n}=\Omega(n^{-1/6+\epsilon})$, while Condon and Karp improved this to $p_{n}-q_{n}=\Omega(n^{-1/2+\epsilon})$. McSherry made a big step by showing that if

$$\frac{p_{n}-q_{n}}{\sqrt{p_{n}}}\geq C\sqrt{\frac{\log n}{n}}$$

for a large enough constant $C$ then spectral methods can exactly recover the labels. This was significant because it allowed $p_{n}$ and $q_{n}$ to be as small as $\Theta(n^{-1}\log n)$, which is order-wise the smallest possible. A similar result for a slightly different random graph model had been claimed earlier by Boppana, but the proof was incomplete. Carson and Impagliazzo showed that with slightly worse poly-logarithmic factors, a simple hill-climbing algorithm also works. Analogous results were later obtained by Bickel and Chen using modularity maximization (for which no efficient algorithm is known).

Until now, none of the sufficient conditions in the literature were also necessary; in fact, necessary conditions on $p_{n}$ and $q_{n}$ have only rarely been discussed. It is instructive to keep the example $p_{n}=1/2$ , $q_{n}=1/2-r_{n}$ in mind. In this case McSherry’s condition is the same as requiring that $r_{n}\geq C\sqrt{n^{-1}\log n}$ . On the other hand, Carson and Impagliazzo pointed out that if $r_{n}\leq c\sqrt{n^{-1}\log n}$ for some small constant $c$ then the minimum bisection no longer coincides with the planted bisection (as far as we are aware, this was the only necessary condition in the literature). From a statistical point of view, this means that the true communities can no longer be reconstructed perfectly. Our contribution closes the gap between McSherry’s sufficient condition and Carson-Impagliazzo’s necessary condition. In the above case, for example, Proposition 2.9 shows that the critical constant is $C=c=1$ .
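To see where the constant $1$ comes from, take $p_{n}=1/2$ and $q_{n}=1/2-r_{n}$ with $r_{n}=c\sqrt{n^{-1}\log n}$ for a constant $c>0$, and evaluate the third condition of Proposition 2.9 (whose hypotheses hold here). Then $\sigma_{n}^{2}=\tfrac{1}{4}+(\tfrac12-r_{n})(\tfrac12+r_{n})=\tfrac12-r_{n}^{2}\to\tfrac12$, and

$$\frac{nr_{n}^{2}}{2\sigma_{n}^{2}}=\frac{c^{2}\log n}{1-2r_{n}^{2}}=c^{2}\log n+o(1),\qquad\frac{\sqrt{n}\,\sigma_{n}}{r_{n}}=\frac{\sigma_{n}\,n}{c\sqrt{\log n}},$$

so the quantity in the third condition equals

$$\bigl(1+o(1)\bigr)\frac{n^{1-c^{2}}}{\sqrt{2}\,c\sqrt{\log n}},$$

which tends to $0$ if and only if $c\geq 1$ (at $c=1$ it decays like $1/\sqrt{\log n}$).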

4 Parallel independent work

Abbe et al. independently studied the same problem in the logarithmic sparsity regime. They consider $p_{n}=(a\log n)/n$ and $q_{n}=(b\log n)/n$ for constants $a$ and $b$ ; they show that $(a+b)-2\sqrt{ab}>1$ is sufficient for strong consistency and that $(a+b)-2\sqrt{ab}\geq 1$ is necessary. Note that these are implied by Proposition 2.8, which is more precise. Abbe et al. also consider a semidefinite programming algorithm for recovering the labels; they show that it performs well under slightly stronger assumptions.

5 Other related work, and an open problem

Consistency is not the only interesting notion that one can study on the planted partition model. Earlier work by the authors and by Massoulié considered a much weaker notion of recovery: they only asked whether one could find a labelling that was positively correlated with the true labels.

There are also model-free notions of consistency. Kumar and Kannan considered a deterministic spatial clustering problem and showed that if every point is substantially closer to the center of its own cluster than it is to the center of the other cluster then one can exactly reconstruct the clusters. This is in much the same spirit as Theorem 2.5.

Makarychev, Makarychev, and Vijayaraghavan proposed semi-random models for planted bisections. These models allow for adversarial noise, and also allow edge distributions that are not independent, but only invariant under permutations. They then give approximation algorithms for Min-Bisection, which they prove to work under expansion conditions that hold with high probability for their semi-random model.

We ask whether the techniques developed here could sharpen the results obtained by Makarychev et al. For example, exact recovery under adversarial noise is clearly impossible, but if the adversary is restricted to adding $o(n)$ edges, then maybe one can guarantee almost exact recovery.

Binomial probabilities and graph structure

In this section, we will prove Proposition 2.7, which relates the binomial probabilities $P(n,p_{n},q_{n})$ to the structure of random graphs $G\sim\mathcal{G}(2n,p_{n},q_{n})$ .

From now on, the letters $c$ and $C$ refer to positive constants, whose value may change from line to line. We adopt the convention that $C$ refers to a “sufficiently large” constant, so that any statement involving $C$ will remain true if $C$ is replaced by a larger constant. Similarly, $c$ refers to a “sufficiently small” constant.

Note that the condition $mp\geq 64\log m$ is not merely a technical one (although the constant $64$ is certainly not optimal). For example, if $p=m^{-1}\log m$ and $q=0$ then (2) fails to hold, because $\operatorname{Pr}(Y\geq X)=\operatorname{Pr}(X=0)\sim m^{-1}$ but $\operatorname{Pr}(Y\geq X-1)=\operatorname{Pr}(X\leq 1)\sim m^{-1}\log m$.
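The two asymptotics in this example can be checked directly. With $X\sim\operatorname{Binom}(m,p)$ and $p=m^{-1}\log m$, we have $(1-p)^{m}=(1+o(1))m^{-1}$ and hence also $(1-p)^{m-1}=(1+o(1))m^{-1}$, so

$$\operatorname{Pr}(X=0)=(1-p)^{m}\sim m^{-1},\qquad\operatorname{Pr}(X=1)=mp(1-p)^{m-1}=(\log m)(1-p)^{m-1}\sim m^{-1}\log m.$$

Thus $\operatorname{Pr}(X\leq 1)\sim m^{-1}\log m$, which exceeds $\operatorname{Pr}(X=0)$ by an unbounded factor.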

Nevertheless, it is still possible to consider similar estimates in the sparse case. Here is an analogue of (2) that holds with $p=O(m^{-1}\log m)$ .

2 Majorities are uncorrelated

The preceding propositions may be combined to show that the event that $u$ has a minority is essentially independent of the event that $v$ has a minority. First, we observe that removing one trial from a binomial random variable doesn’t change very much.

There is a universal constant $C>0$ such that for all $m,n$ and all $p,q\leq 2/3$ ,

Assume without loss of generality that $p\geq q$ . Let $X^{\prime}\sim\operatorname{Binom}(m-1,p)$ , $Y^{\prime}\sim\operatorname{Binom}(n-1,q)$ , $\xi_{X}\sim\operatorname{Bernoulli}(p)$ and $\xi_{Y}\sim\operatorname{Bernoulli}(q)$ be independent, and then take $X=X^{\prime}+\xi_{X}$ and $Y=Y^{\prime}+\xi_{Y}$ . In terms of these variables, the left-hand inequality above may be written as

We will focus on this inequality (since the other inequality is essentially identical). Now,

If we assume that $(m-1)p\geq 64\log(m-1)$ then (1) implies that

which implies the claim. On the other hand, if $(m-1)p\leq 64\log(m-1)$ then directly from (3) we have

Next, we show that $\{u\text{ has a minority}\}$ and $\{v\text{ has a minority}\}$ are essentially uncorrelated. We recall that if $A$ and $B$ are events then $\operatorname{Cov}(A,B)=\operatorname{Pr}(A\cap B)-\operatorname{Pr}(A)\operatorname{Pr}(B)$ .

Fix nodes $u$ and $v$. Let $A$ and $B$ be the events that $u$ and $v$ respectively have minorities. If $p,q\leq 2/3$ then

$$\operatorname{Cov}(A,B)\leq Cn^{-1/3}\operatorname{Pr}(A)^{2}+Cn^{-4}.$$

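Assume that $p>q$ and that $\sigma_{u}=+$ and $\sigma_{v}=-$ (the other cases are very similar). Let $\xi$ be the indicator that $u\sim v$. Note that $A$ and $B$ are conditionally independent given $\xi$, which means that

$$\operatorname{Cov}(A,B)=\mathbb{E}\bigl[\operatorname{Pr}(A\mid\xi)\operatorname{Pr}(B\mid\xi)\bigr]-\mathbb{E}\bigl[\operatorname{Pr}(A\mid\xi)\bigr]\,\mathbb{E}\bigl[\operatorname{Pr}(B\mid\xi)\bigr]=\operatorname{Var}\bigl(\operatorname{Pr}(A\mid\xi)\bigr),$$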

where the last equality holds because $A$ and $B$ have the same distribution given $\xi$ .

Define $\alpha=P(n-1,n,p,q)=\operatorname{Pr}(u\text{ has a minority})=\operatorname{Pr}(v\text{ has a minority})$. By our assumption that $\sigma_{u}\neq\sigma_{v}$ and $p>q$, we have $\operatorname{Pr}(A\mid\xi=0)\leq\operatorname{Pr}(A\mid\xi=1)$. On the other hand,

Next, we consider $\operatorname{Pr}(A\mid\xi=1)$ . Note that

By applying either (2) or Proposition 3.2 to the right hand side above, we have

(To get the second case, we are either applying (2) for $64\log n\leq np\leq n^{1/2}$ or we are applying Proposition 3.2.) In the first case, the random variable $\operatorname{Pr}(A\mid\xi)$ is supported on an interval of width at most $Cn^{-1/6}\alpha+Cn^{-2}$ and so its variance is at most $Cn^{-1/3}\alpha^{2}+Cn^{-4}$ . In the second case, $\operatorname{Pr}(\xi=1)=q\leq p\leq n^{-1/2}$ , and so

which is bounded by $C\alpha^{2}n^{-1/3}+Cn^{-4}$ .

3 Graph structure

Finally, we will use our preceding estimates to prove Proposition 2.7. Most of the proof essentially follows by straightforward first moment arguments. The most complicated part is showing that $P(n,p_{n},q_{n})=\Omega(n^{-1})$ implies that with constant probability there exists a node with a minority. This uses a fairly standard second moment argument, the main technical part of which is contained in Lemma 3.5.

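Fix a node $v\in V(G)$ and suppose without loss of generality that $\sigma_{v}=+$. For notational convenience, we will also suppose that $p>q$; an essentially identical proof works for $p<q$. Let $X$ and $Y$ denote the number of $+$-labelled and $-$-labelled neighbors of $v$. Then $X\sim\operatorname{Binom}(n-1,p)$ and $Y\sim\operatorname{Binom}(n,q)$ are independent, and so

$$\operatorname{Pr}(v\text{ has a minority})=\operatorname{Pr}(Y\geq X)=P(n-1,n,p_{n},q_{n}).$$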

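Suppose first that $P(n,p_{n},q_{n})=o(1)$. Then

$$\operatorname{Pr}(v\text{ has a minority})=P(n-1,n,p_{n},q_{n})=o(1)$$

by Lemma 3.3. Summing over $v\in V(G)$, we have

$$\mathbb{E}\,\#\{v\in V(G):v\text{ has a minority}\}=o(n),$$

and so Markov’s inequality implies that a.a.s. all but $o(n)$ nodes have a majority.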

For the rest of the proof, we will assume that $p_{n},q_{n}\leq 2/3$ . As we explained in Section 2.2, this case suffices: if $p_{n},q_{n}\geq 1/3$ then we may apply the result with $p_{n}$ and $q_{n}$ replaced by $1-p_{n}$ and $1-q_{n}$ ; if $q_{n}\leq 1/3$ and $p_{n}\geq 2/3$ then $P(n,p_{n},q_{n})=o(n^{-1})$ and we have already given that part of the proof.

Suppose that the number of nodes without a majority is not $o(n)$ a.a.s. Then there is some $\epsilon>0$ such that for infinitely many $n$, the probability of having at least $\epsilon n$ nodes with a minority is at least $\epsilon$. Thus, the expected number of nodes with a minority is at least $\epsilon^{2}n$ for infinitely many $n$; since there are $2n$ nodes, each of which has a minority with probability $P(n-1,n,p_{n},q_{n})$, this implies that $P(n-1,n,p_{n},q_{n})=\operatorname{Pr}(Y\geq X)\geq\epsilon^{2}/2$ for infinitely many $n$. By Lemma 3.3, $P(n,p_{n},q_{n})\not\to 0$.

It remains to prove that all nodes have a majority a.a.s. only if $P(n,p_{n},q_{n})=o(n^{-1})$. This requires a second moment argument: let $\xi_{u}$ be the indicator that $u$ has a minority and let $N=\sum_{u}\xi_{u}$ be the number of nodes with a minority. If $\alpha=\operatorname{Pr}(u\text{ has a minority})$ (which is the same for all $u$) then $\mathbb{E}N=2n\alpha$ and

$$\mathbb{E}N^{2}=2n\alpha+\sum_{u\neq v}\bigl(\alpha^{2}+\operatorname{Cov}(\xi_{u},\xi_{v})\bigr).$$
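One way to finish this second moment argument is sketched below, assuming, as the end of the proof of Lemma 3.5 indicates, that $\operatorname{Cov}(\xi_{u},\xi_{v})\leq Cn^{-1/3}\alpha^{2}+Cn^{-4}$ for all $u\neq v$. With $N$ and $\alpha$ as above, $\mathbb{E}N=2n\alpha$, and bounding each covariance in $\mathbb{E}N^{2}=2n\alpha+\sum_{u\neq v}(\alpha^{2}+\operatorname{Cov}(\xi_{u},\xi_{v}))$ gives

$$\mathbb{E}N^{2}\leq 2n\alpha+4n^{2}\alpha^{2}\bigl(1+Cn^{-1/3}\bigr)+4Cn^{-2}.$$

By the Cauchy–Schwarz inequality, $\mathbb{E}N=\mathbb{E}[N\mathbf{1}_{N>0}]\leq(\mathbb{E}N^{2})^{1/2}\operatorname{Pr}(N>0)^{1/2}$, and so

$$\operatorname{Pr}(N>0)\geq\frac{(\mathbb{E}N)^{2}}{\mathbb{E}N^{2}}\geq\frac{4n^{2}\alpha^{2}}{2n\alpha+4n^{2}\alpha^{2}(1+Cn^{-1/3})+4Cn^{-2}}.$$

Finally, writing $X\sim\operatorname{Binom}(n,p)$ as $X'+\xi_{X}$ with $X'\sim\operatorname{Binom}(n-1,p)$ shows that $\alpha=P(n-1,n,p,q)\geq P(n,p,q)$ when $p>q$. Hence if $P(n,p_{n},q_{n})\geq c/n$ along a subsequence then $n\alpha\geq c$ there, and since $x\mapsto 2x/(1+2x)$ is increasing, the right-hand side above is at least $(1-o(1))\frac{2c}{1+2c}$; that is, with probability bounded away from zero some node has a minority.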

Sufficient condition for strong consistency

The rough idea behind our strongly consistent labelling algorithm is to first run a weakly consistent algorithm and then try to improve it. The natural way to improve an almost-accurate labelling $\tau$ is to search for nodes $u$ that have a minority with respect to $\tau$ and flip their signs. In fact, if the errors in $\tau$ were independent of the neighbors of $u$ then this would work quite well: assuming that $u$ has a decently large majority (which it will, for most $u$, by Proposition 3.1), having a labelling $\tau$ with few errors is like observing each neighbor of $u$ with a tiny amount of noise. This tiny amount of noise is very unlikely to flip $u$’s neighborhood from a majority to a minority. Therefore, choosing $u$’s sign to give it a majority is a reasonable approach.

There are two important problems with the argument outlined in the previous paragraph: it requires the errors in $\tau$ to be independent, and it is only guaranteed to work for those $u$ that have a sizeable majority (i.e., almost, but not quite, all the nodes in $G$ ). Nevertheless, this procedure is a good starting point and it motivates the first clean-up stage of our algorithm (Algorithm 1). By removing $u$ from the graph before looking for the almost-accurate labelling $\tau$ , we ensure the required independence properties (as a result, note that we will be dealing with multiple labellings $\tau$ , depending on which nodes we removed before running our almost-accurate labelling algorithm). And although the final labelling we obtain is not guaranteed to be entirely correct, we show that it has very few (i.e., at most $n^{\epsilon}$ ) errors whereas the initial labelling was only guaranteed to have $o(n)$ errors.

In order to finally produce the correct labelling, we return to the earlier idea: flipping the label of every node that has a minority. We analyze this procedure by noting that after the previous step of the algorithm, the errors were confined to a very particular set of nodes (namely, those without a very strong majority). We show that this set of nodes is small and poorly connected, which means that every node in the graph is guaranteed to only have a few neighbors in this bad set. In particular, even nodes with relatively weak majorities cannot be flipped by labelling errors in the bad set. We analyze this procedure in Section 4.3.

As stated in the introduction, there exist algorithms for a.a.s. correctly labelling all but $o(n)$ nodes. Assuming that $p_{n}+q_{n}=\Omega(n^{-1}\log n)$, such an algorithm is easy to describe, and we include it for completeness; indeed, the algorithm we give is essentially folklore. A slightly more complex algorithm that does not assume $p_{n}+q_{n}=\Omega(n^{-1}\log n)$ is also available in the literature.

If $p_{n}+q_{n}=\Omega(n^{-1}\log n)$ then there is a constant $C$ such that

$$\|A-\mathbb{E}[A\mid\sigma]\|\leq C\sqrt{n(p_{n}+q_{n})}$$

a.a.s. as $n\to\infty$, where $A$ is the adjacency matrix of $G$ and $\|\cdot\|$ denotes the spectral norm.
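A quick numerical illustration of this kind of spectral-norm concentration, under our assumption that the quantity bounded is $\|A-\mathbb{E}[A\mid\sigma]\|$ for the adjacency matrix $A$ (the standard statement of this type is $\|A-\mathbb{E}A\|=O(\sqrt{n(p_{n}+q_{n})})$ once $p_{n}+q_{n}$ is at least of order $n^{-1}\log n$). The sketch reports the normalized deviation for a single sample; it says nothing about the constant $C$, and the function name is hypothetical.

```python
import numpy as np


def spectral_deviation_ratio(n: int, p: float, q: float, seed: int = 0) -> float:
    """Return ||A - E[A | sigma]|| / sqrt(n (p + q)) for one sample of G(2n, p, q)."""
    rng = np.random.default_rng(seed)
    sigma = np.repeat([1, -1], n)
    mean = np.where(np.equal.outer(sigma, sigma), p, q)
    np.fill_diagonal(mean, 0.0)  # no self-loops, so the expected diagonal is zero
    upper = np.triu(rng.random((2 * n, 2 * n)) < mean, k=1)
    adj = (upper | upper.T).astype(float)
    # ord=2 gives the largest singular value, i.e. the spectral norm.
    deviation = np.linalg.norm(adj - mean, ord=2)
    return deviation / np.sqrt(n * (p + q))
```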

2 The replica step

Let BBPartition be an algorithm that is guaranteed to a.a.s. label all but $o(n)$ nodes correctly; we will use it as a black box. Note that we may assume that BBPartition produces an exactly balanced labelling. If not, and its output has more $+$ labels than $-$ labels, say, we can randomly choose some $+$-labelled vertices and relabel them. The new labelling is balanced, and it is still guaranteed to have at most $o(n)$ mistakes.
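The rebalancing step can be sketched as follows (function name hypothetical). For a labelling of $2n$ vertices the imbalance $\#\{+\}-\#\{-\}$ is even, so flipping half of it from the larger class restores balance. Compared with a balanced $\sigma$, that number of flips is at most the number of mistakes already present, so the number of mistakes at most doubles and an $o(n)$ guarantee survives.

```python
import numpy as np


def balance(labels: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Flip uniformly chosen vertices of the larger class until both classes
    have the same size (assumes an even number of vertices)."""
    labels = labels.copy()
    excess = int(labels.sum()) // 2  # (#plus - #minus) / 2
    if excess != 0:
        larger = 1 if excess > 0 else -1
        candidates = np.flatnonzero(labels == larger)
        flip = rng.choice(candidates, size=abs(excess), replace=False)
        labels[flip] = -larger
    return labels
```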

We define $V_{\epsilon}$ to be a set of “bad” nodes that our first step is not required to label correctly.

Let $V_{\epsilon}$ be the elements of $V$ that have a majority of size less than $\epsilon\sqrt{np\log n}$ , or that have more than $100np$ neighbors.
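For concreteness, the sketch below computes $V_{\epsilon}$ for a labelled graph given as an adjacency matrix, assuming $p>q$ (so a majority means more same-label neighbours) and taking $\log$ to be the natural logarithm; the function name is hypothetical. Note that $V_{\epsilon}$ depends on the true labels $\sigma$, so it is a device for the analysis rather than something the algorithm computes.

```python
import numpy as np


def bad_set(adj: np.ndarray, sigma: np.ndarray, p: float, eps: float) -> np.ndarray:
    """Indices of V_eps: majority below eps*sqrt(n p log n) or degree above 100 n p."""
    n = adj.shape[0] // 2  # the graph has 2n vertices
    margin = sigma * (adj @ sigma)  # (#same-label) - (#other-label) neighbours
    degree = adj.sum(axis=1)
    threshold = eps * np.sqrt(n * p * np.log(n))
    return np.flatnonzero((margin < threshold) | (degree > 100 * n * p))
```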

For any $\epsilon>0$ , Algorithm 1 a.a.s. correctly labels every node in $V\setminus V_{\epsilon}$ .

Before proving Proposition 4.3, we deal with a minor technical point. The following lemma shows that we can apply BBPartition to subgraphs of $G\sim\mathcal{G}(2n,p_{n},q_{n})$ , and it will still have the required guarantees.

If $P(n,p_{n},q_{n})=o(n^{-1})$ then for any $\alpha>0$ , $P(\lfloor\alpha n\rfloor,p_{n},q_{n})\to 0$ .

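This follows from two simple properties of the function $P$. First, we have $P(n_{1}+n_{2},p,q)\geq P(n_{1},p,q)P(n_{2},p,q)$ for any $n_{1},n_{2},p$, and $q$. Indeed, if $X_{i}\sim\operatorname{Binom}(n_{i},p)$ and $Y_{i}\sim\operatorname{Binom}(n_{i},q)$ are independent then

$$P(n_{1}+n_{2},p,q)=\operatorname{Pr}(X_{1}+X_{2}\leq Y_{1}+Y_{2})\geq\operatorname{Pr}(X_{1}\leq Y_{1},\ X_{2}\leq Y_{2})=P(n_{1},p,q)\,P(n_{2},p,q).$$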

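A similar coupling argument shows that for any $n_{2}\geq 0$, $P(n_{1},p,q)\geq\frac{1}{2}P(n_{1}+n_{2},p,q)$. Indeed, conditioned on $X_{1}+X_{2}\leq Y_{1}+Y_{2}$, the probability of $X_{1}\leq Y_{1}$ is at least $\frac{1}{2}$. Hence,

$$P(n_{1},p,q)\geq\operatorname{Pr}(X_{1}\leq Y_{1},\ X_{1}+X_{2}\leq Y_{1}+Y_{2})\geq\tfrac{1}{2}\operatorname{Pr}(X_{1}+X_{2}\leq Y_{1}+Y_{2})=\tfrac{1}{2}P(n_{1}+n_{2},p,q).$$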

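Now, choose an integer $k$ so that $\alpha>1/k$. Then $k\lfloor\alpha n\rfloor\geq k(\alpha n-1)\geq n$ for all sufficiently large $n$, and so the two properties above give

$$P(\lfloor\alpha n\rfloor,p_{n},q_{n})^{k}\leq P(k\lfloor\alpha n\rfloor,p_{n},q_{n})\leq 2P(n,p_{n},q_{n})=o(n^{-1}).$$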

Since $k$ and $\alpha$ are constant as $n\to\infty$ , this completes the proof.

First, we may assume without loss of generality that the partition $U_{+},U_{-}$ that was produced in line 1 is positively correlated with the true labelling $\sigma$ . By our assumption on BBPartition, at line 1 $U_{i,+}$ either agrees with $V_{+}\setminus U_{i}$ or $V_{-}\setminus U_{i}$ , up to an error of $o(n)$ . After the relabelling in line 1, then, a.a.s. $U_{i,+}$ agrees with $V_{+}\setminus U_{i}$ up to an error of $o(n)$ . Since $m$ is a constant independent of $n$ , this property a.a.s. holds for every $i$ simultaneously.

Now, consider a node $v\not\in V_{\epsilon}$ and suppose without loss of generality that $\sigma_{v}=+$. Conditioned on $v\in U_{i}$, every other node is added to $U_{i}$ independently with probability $1/m$. Hence, conditioned on $v$ having $k_{+}$ $+$-labelled neighbors and $k_{-}$ $-$-labelled neighbors, it has $\operatorname{Binom}(k_{+},1/m)$ $+$-labelled neighbors in $U_{i}$ and $\operatorname{Binom}(k_{-},1/m)$ $-$-labelled neighbors in $U_{i}$. Let $k_{+,i}$ denote the number of $+$-labelled neighbors that $v$ has in $U_{i}$ and let $k_{+,\lnot i}=k_{+}-k_{+,i}$ be the number of $+$-labelled neighbors that $v$ has in $V\setminus U_{i}$ (and similarly for $-$).

By Bernstein’s inequality, with probability at least $1-2n^{-2}$ ,

Recall that $v\not\in V_{\epsilon}$ implies that $k_{+}\leq 100np$ , $k_{-}\leq 100np$ and

where the last inequality follows from the definition of $m$ . Taking a union bound over the events leading to (4), we see that a.a.s., for every $v\not\in V_{\epsilon}$ with $\sigma_{v}=+$ , if $v\in U_{i}$ then

In other words, every $v\not\in V_{\epsilon}$ still has a strong majority, even if we consider only edges between $v$ and the complement of $U_{i}$ .

Let $X_{-}$ be the number of $+$-valued neighbors of $v$ that were incorrectly labelled as $-$ in line 1 (i.e., $X_{-}=|\{u:u\sim v,\sigma_{u}=+,u\in U_{i,-}\}|$), and let $X_{+}$ be the number of $-$-valued neighbors that were incorrectly labelled as $+$. Note that the quantities considered in line 1 of Algorithm 1 may be expressed in terms of $k$ and $X$ as

Hence, the inequality $|X_{+}-X_{-}|<\frac{1}{2}|k_{+,\lnot i}-k_{-,\lnot i}|$ will imply that $v$ is correctly labelled in lines 1–1. For the rest of the proof, our goal will be to show that a.a.s. the above inequality holds for all $v\not\in V_{\epsilon}$.

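Let $E_{-}=\#\{u\in U_{i,-}:\sigma_{u}=+\}$ (i.e., the total number of $+$-labelled vertices that were mislabelled in line 1) and let $E_{+}=\#\{u\in U_{i,+}:\sigma_{u}=-\}$. Note that the neighbors of $v$ are independent of $U_{i,-}$, and so conditioned on $k_{+,\lnot i}$ and $k_{-,\lnot i}$,

$$\mathbb{E}X_{-}=k_{+,\lnot i}\frac{E_{-}}{|V_{+}\setminus U_{i}|}\qquad\text{and}\qquad\mathbb{E}X_{+}=k_{-,\lnot i}\frac{E_{+}}{|V_{-}\setminus U_{i}|},$$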

where $V_{+}$ and $V_{-}$ are the set of $u$ with $\sigma_{u}=+$ and $\sigma_{u}=-$ , respectively. Now condition on $k_{+,\lnot i}$ and $k_{-,\lnot i}$ , and on the following a.a.s. events:

Under the above events, and recalling that $k_{+}\leq 100np$ ,

Going back to (6), we see that a.a.s. for all $v\not\in V_{\epsilon}$ ,

Next, we consider the deviations of $X_{-}$ and $X_{+}$ around their means. By Bernstein’s inequality for hypergeometric variables, there is a constant $C$ such that with probability $1-n^{-2}$ , $X_{-}$ is within

of its expectation. Since $E_{-}=o(n)$ , we can take $n$ large enough so that $X_{-}$ is within $\frac{\epsilon}{16}\sqrt{np\log n}$ of its expectation with probability $1-n^{-2}$ . Arguing similarly for $X_{+}$ we have

with probability $1-2n^{-2}$. Taking a union bound over $v\not\in V_{\epsilon}$ (recall that $X$ and $k$ both depend on $v$), we see that the above inequality holds a.a.s. for all $v\not\in V_{\epsilon}$ simultaneously. By (6), a.a.s. for all $v\not\in V_{\epsilon}$,

3 The hill-climbing step

After running Algorithm 1, we are left with a graph in which only nodes belonging to $V_{\epsilon}$ could possibly be mis-labelled. Fortunately, very few nodes belong to $V_{\epsilon}$ , and those that do are poorly connected to the rest of the graph. This is the content of the next two propositions.

For every $\delta>0$ there exists an $\epsilon>0$ such that if $P(n,p,q)=o(n^{-1})$ then $|V_{\epsilon}|\leq n^{\delta}$ a.a.s.

Consider a single $v\in V$. By Bernstein’s inequality the probability that $v$ has more than $100np$ neighbors is less than $n^{-2}$ (using $np\geq\log n$, which follows from $P(n,p,q)=o(n^{-1})$). Hence, a.a.s. every $v$ has at most $100np$ neighbors.

In particular, if $C\epsilon<\delta$ then the right-hand side is $o(n^{-1+\delta})$. By Markov’s inequality, this implies that a.a.s. at most $n^{\delta}$ nodes fail to have a majority of size $\epsilon\sqrt{np\log n}$.

Since $(2C/\epsilon)^{C\epsilon\log n}=n^{C\epsilon\log(2C/\epsilon)}$ and $\epsilon\log(2C/\epsilon)\to 0$ as $\epsilon\to 0$, we may choose $\epsilon$ so that $(2C/\epsilon)^{C\epsilon\log n}\leq n^{\delta/2}$. By Markov’s inequality, we see that a.a.s. at most $n^{\delta}$ nodes fail to have a majority of size $\epsilon\sqrt{np\log n}$.

Suppose that $P(n,p,q)=o(n^{-1})$ and $np\leq n^{1/4}$ . For sufficiently small $\epsilon$ , a.a.s. no node has two or more neighbors in $V_{\epsilon}$ .

Fix distinct $u,v\in V$; let $X\sim\operatorname{Binom}(n-1,p)$ and $Y\sim\operatorname{Binom}(n,q)$. As in the proof of Proposition 4.7, a.a.s. every $v\in V$ has at most $100np$ neighbors; for the rest of the proof, we condition on this event. Moreover, we may choose $\epsilon$ small enough so that $\operatorname{Pr}(Y\geq X-\epsilon\sqrt{np\log n})\leq n^{-7/8}$. In particular, $\operatorname{Pr}(u\in V_{\epsilon})\leq n^{-7/8}$. Now condition on the neighbors of $u$. If $v$ has a majority of at least $2\epsilon\sqrt{np\log n}$ when its possible edge to $u$ is ignored, then (for large $n$) it lies outside of $V_{\epsilon}$ regardless of whether it neighbors $u$. This event does not involve the edges at $u$, so it is independent of whether $u\in V_{\epsilon}$, and if $\epsilon$ is sufficiently small then it has probability at least $1-n^{-7/8}$. Hence, $\operatorname{Pr}(u,v\in V_{\epsilon})\leq n^{-7/4}$.

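Now condition on the event that $u,v\in V_{\epsilon}$. Recall that $u$ and $v$ each have at most $100np\leq 100n^{1/4}$ neighbors in $V_{-}$ and at most $100n^{1/4}$ neighbors in $V_{+}$. Conditioned on the number of neighbors in $V_{-}$ and $V_{+}$, the neighbors of $u$ and $v$ are independent and uniformly distributed. Hence (each fixed node is a neighbor of $u$, and likewise of $v$, with probability $O(n^{-3/4})$, and there are $O(n)$ candidate common neighbors), the probability that they have a common neighbor is $O(n^{-3/4-3/4+1})=O(n^{-1/2})$. Combining this with the previous paragraph, we have

$$\operatorname{Pr}(u,v\in V_{\epsilon}\text{ and }u,v\text{ have a common neighbor})=O(n^{-7/4}\cdot n^{-1/2})=O(n^{-9/4}).$$

Taking a union bound over the $O(n^{2})$ choices of $u$ and $v$ bounds the probability that some node has two or more neighbors in $V_{\epsilon}$ by $O(n^{-1/4})=o(1)$, which completes the proof.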

Proposition 4.11. Suppose that $P(n,p,q)=o(n^{-1})$ and $np\leq n^{1/4}$. For sufficiently small $\epsilon$, a.a.s. no two nodes in $V_{\epsilon}$ are adjacent.

Fix distinct $u,v\in V$. The probability that they are adjacent is at most $p\leq n^{-3/4}$. As in the previous proof, if $\epsilon$ is small enough then $\operatorname{Pr}(u\in V_{\epsilon}\mid u\sim v)$ and $\operatorname{Pr}(v\in V_{\epsilon}\mid u\sim v,u\in V_{\epsilon})$ are both at most $n^{-7/8}$. Multiplying these conditional probabilities, we have

$$\operatorname{Pr}(u\sim v,\ u\in V_{\epsilon},\ v\in V_{\epsilon})\leq n^{-3/4}\cdot n^{-7/8}\cdot n^{-7/8}=n^{-5/2},$$

and we conclude by taking a union bound over the $O(n^{2})$ pairs $u,v$, which bounds the failure probability by $O(n^{-1/2})$.
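On a sampled graph, the conclusions of Propositions 4.9 and 4.11 can be checked directly once the set $V_{\epsilon}$ has been computed. The Python helpers below are a hypothetical sketch; they assume the graph is stored as a list of neighbor sets indexed by node.

```python
def max_neighbors_in(adj, S):
    """Largest number of neighbors that a single node has inside the node set S."""
    S = set(S)
    return max((len(nbrs & S) for nbrs in adj), default=0)


def has_internal_edge(adj, S):
    """True if some pair of nodes in S is adjacent."""
    S = set(S)
    return any(adj[u] & S for u in S)


def hill_climbing_preconditions(adj, v_eps):
    """Conclusions of Propositions 4.9 and 4.11 for a given set V_eps."""
    return max_neighbors_in(adj, v_eps) <= 1 and not has_internal_edge(adj, v_eps)
```

In the sparse regime $np\leq n^{1/4}$ the two propositions predict that `hill_climbing_preconditions` returns `True` a.a.s.; these are exactly the two facts the hill-climbing argument uses.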

Suppose that we initialize Algorithm 2 with a partition whose errors are restricted to $V_{\epsilon}$ , and suppose that $P(n,p_{n},q_{n})=o(n^{-1})$ . Then a.a.s., Algorithm 2 returns the true partition.

We consider two cases: the dense regime $n^{1/4}\leq np\leq 2n/3$, and the sparse regime $\frac{1}{2}\log n\leq np<n^{1/4}$.

In the dense regime, note that by Proposition 3.1, a.a.s. every node has a majority of $\Omega(\sqrt{np/\log n})\geq\Omega(n^{1/9})$. On the other hand, if $\epsilon$ is sufficiently small then (by Proposition 4.7) $|V_{\epsilon}|\leq n^{1/10}$. Since the initial partition errs only on $V_{\epsilon}$, each node has at most $n^{1/10}$ mislabelled neighbors, which is too few to overturn a majority of order $n^{1/9}$; hence every node in $V_{+}$ will have most of its neighbors in $U_{+}$. Therefore, $W_{+}=V_{+}$ in Algorithm 2.

In the sparse regime, let $V^{\prime}$ be the set of nodes with a majority of less than three; note that $V^{\prime}\subset V_{\epsilon}$ for large $n$, since $\epsilon\sqrt{np\log n}\to\infty$. By Proposition 4.9, a.a.s. every node has at most one neighbor in $V_{\epsilon}$. A single mislabelled neighbor reduces a node's majority by at most two, so every node in $V_{+}\setminus V^{\prime}$ still has most of its neighbors in $U_{+}$; hence every node outside of $V^{\prime}$ will be correctly labelled. On the other hand, Proposition 4.11 shows that nodes in $V^{\prime}$ are also correctly labelled, since none of them have any neighbors in $V_{\epsilon}$ (recalling that $V^{\prime}\subset V_{\epsilon}$).
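Algorithm 2 itself is not reproduced in this section. The Python sketch below assumes it performs a single majority-vote pass: given the current partition $(U_{+},U_{-})$, a node joins $W_{+}$ if it has more neighbors in $U_{+}$ than in $U_{-}$, and $W_{-}$ if it has fewer (ties keep the current label, an arbitrary choice made here). The demo plants a few labelling errors in a dense planted bisection and runs one pass, mirroring the dense-regime argument; all names and parameter values are hypothetical.

```python
import random


def sample_planted_bisection(n, p, q, rng):
    """Planted bisection on 2n nodes: labels +1 for 0..n-1, -1 for n..2n-1."""
    sigma = [1] * n + [-1] * n
    adj = [set() for _ in range(2 * n)]
    for u in range(2 * n):
        for v in range(u + 1, 2 * n):
            if rng.random() < (p if sigma[u] == sigma[v] else q):
                adj[u].add(v)
                adj[v].add(u)
    return adj, sigma


def hill_climb_step(adj, labels):
    """One majority-vote pass (assumed reading of Algorithm 2, for p > q)."""
    new_labels = []
    for v, nbrs in enumerate(adj):
        score = sum(labels[w] for w in nbrs)  # |N(v) in U_+| - |N(v) in U_-|
        if score > 0:
            new_labels.append(1)
        elif score < 0:
            new_labels.append(-1)
        else:
            new_labels.append(labels[v])  # tie: keep the current label
    return new_labels


def demo(n=100, p=0.5, q=0.05, n_errors=5, seed=0):
    """Return (true labels, corrupted start, labels after one pass)."""
    rng = random.Random(seed)
    adj, sigma = sample_planted_bisection(n, p, q, rng)
    start = list(sigma)
    for v in rng.sample(range(2 * n), n_errors):  # plant a few labelling errors
        start[v] = -start[v]
    return sigma, start, hill_climb_step(adj, start)
```

With these parameters each node's majority is about $45$ while five planted errors can shift it by at most $10$, so a single pass recovers the true labels with overwhelming probability.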

Necessary condition for strong consistency

A classical fact in Bayesian statistics says that if we are asked to produce a configuration $\hat{\sigma}$ from the graph $G$, then the algorithm with the highest probability of success is the maximum a posteriori estimator, $\hat{\sigma}$, which is defined to be any $\tau\in\{-1,1\}^{V(G)}$ satisfying $\sum_{u}\tau_{u}=0$ that maximizes $\operatorname{Pr}(G\mid\sigma=\tau)$. (To see that this is the estimator with the highest probability of success, note that $\sigma$ is uniformly distributed over balanced labellings, so by Bayes’ rule $\operatorname{Pr}(\sigma=\tau\mid G)$ is proportional to $\operatorname{Pr}(G\mid\sigma=\tau)$; hence every $\tau$ that maximizes $\operatorname{Pr}(G\mid\sigma=\tau)$ also maximizes $\operatorname{Pr}(\sigma=\tau\mid G)$, and clearly a $\tau$ that maximizes the latter quantity is an optimal estimate.) In order to prove that $P(n,p_{n},q_{n})=o(n^{-1})$ is necessary for strong consistency, we relate the success probability of $\hat{\sigma}$ to the existence of nodes with minorities. Note that we say $v$ has a majority with respect to $\tau$ if (assuming $p>q$) $\tau$ gives the same label to $v$ as it does to most of $v$’s neighbors.

Lemma 5.1. If there is a unique maximal $\hat{\sigma}$, then with respect to $\hat{\sigma}$ there cannot be both a $+$-labelled node with a minority and a $-$-labelled node with a minority.

For convenience, we will assume that $p>q$. The same proof works for $p<q$, but one needs to remember that the definitions of “majority” and “minority” are swapped in that case (Definition 2.6).

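The probability of $G$ conditioned on the labelling $\tau$ may be written explicitly: if $A_{\tau}$ is the set of unordered pairs $u\neq v$ with $\tau_{u}=\tau_{v}$ and $B_{\tau}$ is the set of unordered pairs $u\neq v$ with $\tau_{u}\neq\tau_{v}$ then

$$\operatorname{Pr}(G\mid\sigma=\tau)=p^{|E(G)\cap A_{\tau}|}(1-p)^{|A_{\tau}|-|E(G)\cap A_{\tau}|}\,q^{|E(G)\cap B_{\tau}|}(1-q)^{|B_{\tau}|-|E(G)\cap B_{\tau}|}.\tag{7}$$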
Consider a labelling $\tau$ . Suppose that there exist nodes $u$ and $v$ with $\tau_{u}=+$ and $\tau_{v}=-$ , and such that both $u$ and $v$ have minorities with respect to $\tau$ . We will show that $\tau$ cannot be the unique maximizer of $\operatorname{Pr}(G\mid\sigma=\tau)$ , which will establish the lemma.

Consider the labelling $\tau^{\prime}$ that is identical to $\tau$ except that $\tau^{\prime}_{u}=-$ and $\tau^{\prime}_{v}=+$ . The fact that $u$ and $v$ both had minorities with respect to $\tau$ implies that

(note that equality is possible in the inequalities above if $u$ and $v$ are neighbors). On the other hand, the numbers of $+$ and $-$ labels are the same for $\tau$ and $\tau^{\prime}$; hence $|A_{\tau}|=|A_{\tau^{\prime}}|$ and $|B_{\tau}|=|B_{\tau^{\prime}}|$. Looking back at (7), therefore, we have

$$\operatorname{Pr}(G\mid\sigma=\tau^{\prime})\geq\operatorname{Pr}(G\mid\sigma=\tau).$$
Hence, $\tau$ cannot be the unique maximizer of $\operatorname{Pr}(G\mid\sigma=\tau)$ .
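For toy graphs the maximum a posteriori estimator can be computed by brute force, which makes the role of the likelihood formula (7) concrete: once $|A_{\tau}|$ and $|B_{\tau}|$ are fixed by balance, $\operatorname{Pr}(G\mid\sigma=\tau)$ depends on $\tau$ only through $|E(G)\cap A_{\tau}|$. The Python sketch below (hypothetical function names; exponential in the number of nodes) evaluates $\log\operatorname{Pr}(G\mid\sigma=\tau)$ from the product formula and returns every balanced maximizer.

```python
import itertools
import math


def log_likelihood(edges, tau, p, q):
    """log Pr(G | sigma = tau) from the product formula (7); assumes 0 < p, q < 1."""
    N = len(tau)
    e_in = sum(1 for u, v in edges if tau[u] == tau[v])  # |E(G) cap A_tau|
    e_out = len(edges) - e_in                              # |E(G) cap B_tau|
    plus = sum(1 for t in tau if t == 1)
    a_size = plus * (plus - 1) // 2 + (N - plus) * (N - plus - 1) // 2  # |A_tau|
    b_size = plus * (N - plus)                                           # |B_tau|
    return (e_in * math.log(p) + (a_size - e_in) * math.log(1 - p)
            + e_out * math.log(q) + (b_size - e_out) * math.log(1 - q))


def map_estimates(N, edges, p, q, tol=1e-12):
    """All balanced labellings in {-1, 1}^N maximizing Pr(G | sigma = tau)."""
    best, argmax = -math.inf, []
    for plus_set in itertools.combinations(range(N), N // 2):
        tau = [-1] * N
        for u in plus_set:
            tau[u] = 1
        ll = log_likelihood(edges, tau, p, q)
        if ll > best + tol:
            best, argmax = ll, [tau]
        elif abs(ll - best) <= tol:
            argmax.append(tau)
    return argmax


if __name__ == "__main__":
    # Two triangles joined by a single edge.
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    print(map_estimates(6, edges, p=0.6, q=0.2))
```

On the two-triangle example the maximizers are exactly the minimum bisection separating the triangles. Because $\tau$ and $-\tau$ induce the same sets $A_{\tau}$ and $B_{\tau}$, they always have the same likelihood, so both orientations of that split are reported.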

In order to argue that $P(n,p_{n},q_{n})=o(n^{-1})$ is necessary for strong consistency, we need to show that if $P(n,p_{n},q_{n})$ is not $o(n^{-1})$ then $(G,\sigma)\sim\mathcal{G}(2n,p_{n},q_{n})$ has a non-vanishing chance of containing nodes of both labels with minorities.

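Suppose that $P(n,p_{n},q_{n})$ is not $o(n^{-1})$. By Proposition 2.7, there is some $\epsilon>0$ such that for infinitely many $n$, $\operatorname{Pr}(\exists u:u\text{ has a minority})\geq\epsilon$. Since $+$-labelled nodes and $-$-labelled nodes are symmetric, a union bound shows that there are infinitely many $n$ such that

$$\operatorname{Pr}(\exists u:\sigma_{u}=+,\ u\text{ has a minority})=\operatorname{Pr}(\exists v:\sigma_{v}=-,\ v\text{ has a minority})\geq\frac{\epsilon}{2}.$$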
By Harris’s inequality, the two events above are non-negatively correlated because they are monotone in the same directions: both are increasing in the edges between $+$-labelled and $-$-labelled nodes and decreasing in the other edges. Hence, there are infinitely many $n$ for which

$$\operatorname{Pr}(\exists u,v:\sigma_{u}=+,\ \sigma_{v}=-,\ u\text{ and }v\text{ both have minorities})\geq\frac{\epsilon^{2}}{4}.$$
Binomial approximations

In this section, we collect various technical, but not particularly enlightening, estimates for binomial variables. Specifically, we prove Propositions 2.8 and 2.9, which give explicit characterizations of the condition $P(n,p_{n},q_{n})=o(n^{-1})$ in the sparse and dense cases respectively, and Propositions 3.1 and 3.2, which give perturbative estimates for binomial probabilities. Our main tools are Bernstein’s inequality, Stirling’s approximation and Taylor expansion.

For simplicity, in this section we write $a=a_{n},b=b_{n}$ and $c=a+b$ . If there is a constant $C>0$ such that $C^{-1}f\leq g\leq Cf$ then we write $f\asymp g$ . We recall that $a,b=\Theta(1)$ and that $pn=a\log n$ and $qn=b\log n$ . Let $X\sim\operatorname{Binom}(n,p)$ and $Y\sim\operatorname{Binom}(n,q)$ .
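As a concrete reference point for the sparse case, the Python sketch below evaluates $n\operatorname{Pr}(Y\geq X)$ numerically for $X\sim\operatorname{Binom}(n,p)$ and $Y\sim\operatorname{Binom}(n,q)$ with $pn=a\log n$ and $qn=b\log n$, next to the quantity $(a+b-2\sqrt{ab}-1)\log n+\frac{1}{2}\log\log n$ that appears in the sparse characterization below. Treating $\operatorname{Pr}(Y\geq X)$ as a stand-in for $P(n,p,q)$ is an assumption of the sketch (the exact definition of $P$ is given elsewhere in the paper); helper names and parameters are illustrative.

```python
import math


def log_binom_pmf(k, n, p):
    """log Pr(Binom(n, p) = k), computed with log-gamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def prob_y_at_least_x(n, p, q, kmax):
    """Pr(Y >= X) for independent X ~ Binom(n, p), Y ~ Binom(n, q), truncated at kmax."""
    pmf_x = [math.exp(log_binom_pmf(k, n, p)) for k in range(kmax + 1)]
    pmf_y = [math.exp(log_binom_pmf(k, n, q)) for k in range(kmax + 1)]
    tail_y = [0.0] * (kmax + 2)  # tail_y[k] = Pr(k <= Y <= kmax)
    for k in range(kmax, -1, -1):
        tail_y[k] = tail_y[k + 1] + pmf_y[k]
    return sum(pmf_x[k] * tail_y[k] for k in range(kmax + 1))


def criterion(n, a, b):
    """(a + b - 2 sqrt(ab) - 1) log n + (1/2) log log n."""
    L = math.log(n)
    return (a + b - 2 * math.sqrt(a * b) - 1) * L + 0.5 * math.log(L)


def scaled_failure(n, a, b):
    """n * Pr(Y >= X) with pn = a log n and qn = b log n."""
    L = math.log(n)
    # Truncate at 10 c log n with c = a + b; the neglected mass is at most
    # Pr(X > kmax) + Pr(Y > kmax), which is negligible for these parameters.
    kmax = int(10 * (a + b) * L)
    return n * prob_y_at_least_x(n, a * L / n, b * L / n, kmax)


if __name__ == "__main__":
    for a, b in [(9, 1), (2, 1)]:
        print(a, b, criterion(10**6, a, b), scaled_failure(10**6, a, b))
```

For $(a,b)=(9,1)$ the criterion is large and positive and $n\operatorname{Pr}(Y\geq X)$ is tiny, while for $(a,b)=(2,1)$ the criterion is negative and $n\operatorname{Pr}(Y\geq X)$ is far above one.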

We begin with a Poisson approximation to binomials.

If $Z=X+Y$ then for every $k\leq 10c\log n$ ,

where the sequence implicit in the $o(1)$ notation is independent of $n$ and $k$ .
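The display with the lemma's formula is missing here. Assuming it is the usual Poisson approximation $\operatorname{Pr}(Z=k)=(1+o(1))\operatorname{Pr}(\operatorname{Pois}(c\log n)=k)$, the following Python sketch checks it numerically for one illustrative choice of $(n,a,b)$, computing the law of $Z=X+Y$ exactly as a convolution of two binomials. It is a sanity check under that assumption, not a substitute for the proof.

```python
import math


def log_binom_pmf(k, n, p):
    """log Pr(Binom(n, p) = k), computed with log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def sum_pmf(k, n, p, q):
    """Pr(X + Y = k) for independent X ~ Binom(n, p) and Y ~ Binom(n, q)."""
    return sum(math.exp(log_binom_pmf(j, n, p) + log_binom_pmf(k - j, n, q))
               for j in range(k + 1))


def poisson_pmf(k, lam):
    """Pr(Pois(lam) = k)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


n, a, b = 10**6, 3.0, 1.0
L = math.log(n)
p, q, c = a * L / n, b * L / n, a + b
# Ratio of the exact law of Z to the Poisson(c log n) law, for k within a few
# standard deviations of the mean c log n (about 55 here).
ratios = [sum_pmf(k, n, p, q) / poisson_pmf(k, c * L) for k in range(20, 100)]
```

For these parameters every ratio lies within a few percent of one.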

By a direct computation, if $k\leq 10c\log n$ then

We first note that if $a-b\leq\epsilon=\epsilon(C)$ then strong consistency does not hold. This follows because with constant probability $X$ is at most its mean $a\log n$, while the probability that $Y$ is at least $a\log n$ is at least $n^{-1/2}$ if $\epsilon$ is a sufficiently small constant; by independence, $\operatorname{Pr}(Y\geq X)=\Omega(n^{-1/2})$, which is not $o(n^{-1})$.

Without loss of generality, we may assume that $c\geq 1$ . Indeed, if $c<1$ then the proposition is trivially true: on the one hand $P(n,p_{n},q_{n})=\Omega(n^{-1})$ because $\operatorname{Pr}(X=0)$ and $\operatorname{Pr}(Y=0)$ are both $\Omega(n^{-1})$ ; on the other hand, $(a+b-2\sqrt{ab}-1)\log n+\frac{1}{2}\log\log n\to-\infty$ because $a+b=c<1$ and $\sqrt{ab}$ is bounded away from zero as $n\to\infty$ .

where the second equality follows from the fact that $\operatorname{Pr}(Z\geq 10c\log n)=O(n^{-2})$, recalling that $c\geq 1$.

For a fixed $k\leq 10c\log n$ , we have that

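where $\eta=\frac{b}{a+b}\leq\frac{1}{2}(1-\epsilon)$. Since $\eta\leq\frac{1}{2}(1-\epsilon)$, the binomial tail decays geometrically: for $j\geq k/2$, $\frac{\operatorname{Pr}(\operatorname{Binom}(k,\eta)=j+1)}{\operatorname{Pr}(\operatorname{Binom}(k,\eta)=j)}=\frac{k-j}{j+1}\cdot\frac{\eta}{1-\eta}\leq\frac{1-\epsilon}{1+\epsilon}$, and hence $\operatorname{Pr}(\operatorname{Binom}(k,\eta)\geq k/2)\asymp\operatorname{Pr}(\operatorname{Binom}(k,\eta)=\lceil k/2\rceil)$. Combining this with Stirling’s approximation we have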
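Summing the geometric series behind this comparison gives an explicit constant: for $\eta<\frac{1}{2}$, the tail $\operatorname{Pr}(\operatorname{Binom}(k,\eta)\geq k/2)$ lies between one and $\frac{1-\eta}{1-2\eta}$ times its first term. The short Python sketch below (function names are ours) computes the ratio exactly, so the bound can be checked for small $k$.

```python
import math


def binom_pmf(j, k, eta):
    """Pr(Binom(k, eta) = j)."""
    return math.comb(k, j) * eta**j * (1 - eta) ** (k - j)


def upper_tail(k, eta):
    """Pr(Binom(k, eta) >= k/2)."""
    return sum(binom_pmf(j, k, eta) for j in range(math.ceil(k / 2), k + 1))


def tail_to_first_term(k, eta):
    """Pr(Binom(k, eta) >= k/2) divided by Pr(Binom(k, eta) = ceil(k/2))."""
    return upper_tail(k, eta) / binom_pmf(math.ceil(k / 2), k, eta)
```

For example, with $\eta=0.25$ the ratio never exceeds $0.75/0.5=1.5$; the constant blows up only as $\eta\to\frac{1}{2}$, which is why the argument needs $\eta$ bounded away from $\frac{1}{2}$.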

where $\theta=\sqrt{\eta(1-\eta)}=\frac{\sqrt{ab}}{a+b}$ . By Lemma 6.1,

and so Stirling’s approximation for $k\geq 1$ gives

and so the maximum is obtained around the value

Thus $n\operatorname{Pr}(Y\geq X)\to 0$ if and only if

$$(a+b-2\sqrt{ab}-1)\log n+\frac{1}{2}\log\log n\to\infty.$$
2 Characterization of dense strong consistency

Our main tool for proving Proposition 2.9 will be the following Local Central Limit Theorem. The proof is a standard application of Stirling’s approximation.

Let $C>0$ be an arbitrary constant and $Y\sim\operatorname{Binom}(n,q)$ , where

Let $\sigma_{q}^{2}=q(1-q)$ and let $\phi(x)=(2\pi)^{-1/2}e^{-x^{2}/2}$ . Then for all integers $k$ such that $|k-nq|\leq C\sqrt{n\log n}\sigma_{q}$ it holds that

The second statement follows easily from the first one using the formula for $\phi$ and noting that if $\delta\leq C\sqrt{n\log n}\sigma_{q}$ and $|\epsilon|\leq 1$ then

To prove the first statement, we begin with Stirling’s approximation. Noting that $k\to\infty$ as $n\to\infty$ , we obtain:

and since $q=\omega(n^{-1}\log^{3}n)$ implies $\frac{\sigma_{q}\sqrt{\log n}}{\sqrt{n}}=o(q/\log n)$ , it follows that $n/k=(1+o(1/\log n))\frac{1}{q}$ . Similarly, $\frac{n}{n-k}=(1+o(1/\log n))\frac{1}{1-q}$ and so

Next, we use a Taylor expansion around $k=nq$. The first-order term vanishes and we have

where the last equality uses the fact that

which follows from the assumption that $q=\omega(n^{-1}\log^{3}(n))$ . Now, from (8) we have $\frac{n}{k(n-k)}=(1+o(1/\log n))\frac{1}{n\sigma_{q}^{2}}$ . Since $(k-nq)^{2}=O(\sigma_{q}^{2}n\log n)$ , we have

The proof follows by combining this with (8) and Stirling’s approximation for $\operatorname{Pr}(Y=k)$ .
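A numerical sanity check of the local limit theorem: assuming its first statement compares $\operatorname{Pr}(Y=k)$ with $\frac{1}{\sqrt{n}\sigma_{q}}\phi\big(\frac{k-nq}{\sqrt{n}\sigma_{q}}\big)$ (the display is not reproduced here), the Python sketch below evaluates the ratio of the two at integers within three standard deviations of $nq$. It does not test the lemma's precise error term, and the parameters are illustrative.

```python
import math


def binom_pmf(k, n, q):
    """Pr(Binom(n, q) = k), computed via log-gamma."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(q) + (n - k) * math.log1p(-q))
    return math.exp(log_pmf)


def gaussian_approx(k, n, q):
    """phi((k - nq) / (sqrt(n) sigma_q)) / (sqrt(n) sigma_q), with sigma_q^2 = q(1 - q)."""
    s = math.sqrt(n * q * (1 - q))
    x = (k - n * q) / s
    return math.exp(-x * x / 2) / (math.sqrt(2 * math.pi) * s)


n, q = 10**6, 0.1
s = math.sqrt(n * q * (1 - q))
ks = [round(n * q + t * s) for t in range(-3, 4)]
ratios = [binom_pmf(k, n, q) / gaussian_approx(k, n, q) for k in ks]
```

Here $\sqrt{n}\sigma_{q}=300$, and all seven ratios are within about one percent of one.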

The second and third conditions are clearly equivalent; we will show the equivalence of the first two.

So writing $b_{q}=5\sqrt{n\log n}\sigma_{q}$ and $b_{p}=5\sqrt{n\log n}\sigma_{p}$ we have:

where $M,N\sim\mathcal{N}(0,1)$ are independent. The proof follows.

3 Perturbation estimates for dense binomials

The main approximation that we use to prove Proposition 3.1 is the following:

Now, under the assumption $mp\geq 64\log m$, we have $4\sqrt{mp\log m}\leq mp/2$ (this inequality is equivalent to $mp\geq 64\log m$), and hence $mp-4\sqrt{mp\log m}\geq mp/2$ and $mp+4\sqrt{mp\log m}\leq 3mp/2$. Consider the first term in the upper bound of Lemma 6.7:

where the last inequality used $|k-mp|\leq 4\sqrt{mp\log m}$ and $k\geq mp/2$ . The other term in the upper bound of Lemma 6.7 is similar:

for sufficiently large $m$, where the second inequality follows by lower-bounding both terms in the denominator: $p\leq 2/3$ implies $m-mp\geq mp/2$, and $k\leq mp+4\sqrt{mp\log m}$ implies $m-k\geq cmp$ for some $c>0$ and sufficiently large $m$ (this follows by considering the cases $p\in[2^{-10},2/3]$ and $p\in[64m^{-1}\log m,2^{-10}]$ separately). Combining (11) and (12) with Lemma 6.7, we obtain

The lower bound (i.e. (1)) is essentially the same, and we give only a sketch: we write

4 Perturbation estimates for sparse binomials

The sparse case needs a slightly different argument and has slightly worse bounds. We have the following analogue of Lemma 6.7:

This time, we will use a sharper bound on the sum: since the logarithm is an increasing function,

Since $k$ and $mp$ are $o(m)$ , $\log((m-k)/(m-mp))=o(1)$ , and so

This proof is similar to the proof of Proposition 3.1, but with Lemma 6.10 instead of Lemma 6.7 and some slightly different truncations: we write

By Bernstein’s inequality, we may truncate the sum at $\sqrt{m}$ at the cost of an additive $e^{-c\sqrt{m}}$ term. We apply the inequality

(which follows from Lemma 6.10) to each term in the sum, yielding

where the second inequality follows (assuming $C\geq e$) because $(ey/x)^{x}$ is an increasing function of $x$ for $0<x\leq y$ (its logarithm $x\log(ey/x)$ has derivative $\log(y/x)\geq 0$). Putting everything together,

Finally, note that $\operatorname{Pr}(X=0)\leq\operatorname{Pr}(Y\geq X)$, so that the first two terms above may be combined at the cost of increasing $C$. For the additive term $e^{-c\sqrt{m}}$, note that $mp\leq 128\log m$ implies that $\operatorname{Pr}(Y\geq X)\geq\operatorname{Pr}(X=0)=\Omega(m^{-\alpha})$ for some constant $\alpha$, and so $e^{-c\sqrt{m}}$ may also be absorbed into the main term at the cost of increasing $C$.

Erratum

The published version of this paper contained a mistake; we are grateful to Jan van Waaij for pointing it out.

The statement of Lemma 5.1 is incorrect; the error in the proof was introduced in the inequality

which does not hold under the assumption of Lemma 5.1. To formulate a correct version, we introduce the notion of a strict minority:

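Given a labelled graph $(G,\sigma)$, we say that $v$ has a strict minority if either

$p>q$ and $v$ has strictly more neighbors $w$ with $\sigma_{w}\neq\sigma_{v}$ than neighbors $w$ with $\sigma_{w}=\sigma_{v}$, or

$p<q$ and $v$ has strictly more neighbors $w$ with $\sigma_{w}=\sigma_{v}$ than neighbors $w$ with $\sigma_{w}\neq\sigma_{v}$.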
Here is a corrected version of Lemma 5.1 (using the notation of Lemma 5.1):

Lemma 7.2. If there is a unique maximal $\hat{\sigma}$, then with respect to $\hat{\sigma}$ there cannot be both a $+$-labelled node $u$ and a $-$-labelled node $v$ such that either

$u$ and $v$ both have strict minorities, or

$u$ and $v$ are non-adjacent and both have minorities.

The proof of Lemma 7.2 is essentially the same as the proof of Lemma 5.1, except that the strengthened assumption means that the problematic inequality is now true. As before, assume that $p>q$, let $u$ and $v$ be any nodes with $\tau_{u}=+$ and $\tau_{v}=-$, let $A_{\tau}=\{\{x,y\}:x\neq y,\ \tau_{x}=\tau_{y}\}$, and let $\tau^{\prime}$ be the labelling obtained from $\tau$ by swapping the labels of $u$ and $v$. We need to show that under either of the two conditions in Lemma 7.2, $|E(G)\cap A_{\tau^{\prime}}|\geq|E(G)\cap A_{\tau}|$.

The sets $E(G)\cap A_{\tau^{\prime}}$ and $E(G)\cap A_{\tau}$ differ only among edges that are incident to either $u$ or $v$ , so it suffices to consider such edges, of which there are five types:

if $w\sim u$ has $\tau_{w}=+$ then $\{u,w\}\in A_{\tau}$ but not $A_{\tau^{\prime}}$ ;

if $w\sim u$ , $w\neq v$ has $\tau_{w}=-$ then $\{u,w\}\in A_{\tau^{\prime}}$ but not $A_{\tau}$ ;

if $w\sim v$ has $\tau_{w}=-$ then $\{v,w\}\in A_{\tau}$ but not $A_{\tau^{\prime}}$ ;

if $w\sim v$ , $w\neq u$ has $\tau_{w}=+$ then $\{v,w\}\in A_{\tau^{\prime}}$ but not $A_{\tau}$ ;

$\{u,v\}$ belongs to neither $A_{\tau}$ nor $A_{\tau^{\prime}}$ .

Let $N_{a}$ through $N_{e}$ be the numbers of edges of $G$ of each of the five types above, respectively. Then $|E(G)\cap A_{\tau^{\prime}}|-|E(G)\cap A_{\tau}|=N_{b}+N_{d}-N_{a}-N_{c}$. Note that $N_{e}$ is either zero or one, and it is one if and only if $\{u,v\}\in E(G)$; note also that $u$ has a minority if and only if $N_{a}\leq N_{b}+N_{e}$, while $u$ has a strict minority if and only if $N_{a}\leq N_{b}+N_{e}-1$ (and similarly for $v$, with $N_{c}$ and $N_{d}$ in place of $N_{a}$ and $N_{b}$). Hence, if $u$ and $v$ both have strict minorities then

$$N_{b}+N_{d}-N_{a}-N_{c}\geq N_{b}+N_{d}-(N_{b}+N_{e}-1)-(N_{d}+N_{e}-1)=2(1-N_{e})\geq 0,$$
while if $u$ and $v$ both have minorities and are non-adjacent then $N_{e}=0$ and

$$N_{b}+N_{d}-N_{a}-N_{c}\geq N_{b}+N_{d}-N_{b}-N_{d}=0.$$
Hence, in either case we have established that $|E(G)\cap A_{\tau^{\prime}}|\geq|E(G)\cap A_{\tau}|$ .
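The edge-counting identity $|E(G)\cap A_{\tau^{\prime}}|-|E(G)\cap A_{\tau}|=N_{b}+N_{d}-N_{a}-N_{c}$ and the two sufficient conditions of Lemma 7.2 are easy to test by brute force on small random graphs. The Python sketch below does this; the function names and the random-graph parameters are hypothetical choices for illustration only.

```python
import itertools
import random


def same_label_edges(edges, tau):
    """|E(G) cap A_tau|: number of edges whose endpoints carry the same label."""
    return sum(1 for x, y in edges if tau[x] == tau[y])


def swap_counts(adj, tau, u, v):
    """(N_a, N_b, N_c, N_d, N_e) for tau[u] = +1 and tau[v] = -1, as in the five types."""
    n_a = sum(1 for w in adj[u] if tau[w] == 1)
    n_b = sum(1 for w in adj[u] if tau[w] == -1 and w != v)
    n_c = sum(1 for w in adj[v] if tau[w] == -1)
    n_d = sum(1 for w in adj[v] if tau[w] == 1 and w != u)
    n_e = 1 if v in adj[u] else 0
    return n_a, n_b, n_c, n_d, n_e


def check_swap_identity(trials=500, num_nodes=10, edge_prob=0.4, seed=0):
    """Randomized check of the identity and of both conditions; True if all pass."""
    rng = random.Random(seed)
    for _ in range(trials):
        edges = [e for e in itertools.combinations(range(num_nodes), 2)
                 if rng.random() < edge_prob]
        adj = [set() for _ in range(num_nodes)]
        for x, y in edges:
            adj[x].add(y)
            adj[y].add(x)
        tau = [1] * (num_nodes // 2) + [-1] * (num_nodes - num_nodes // 2)
        rng.shuffle(tau)
        u = rng.choice([x for x in range(num_nodes) if tau[x] == 1])
        v = rng.choice([x for x in range(num_nodes) if tau[x] == -1])
        swapped = list(tau)
        swapped[u], swapped[v] = -1, 1
        n_a, n_b, n_c, n_d, n_e = swap_counts(adj, tau, u, v)
        diff = same_label_edges(edges, swapped) - same_label_edges(edges, tau)
        if diff != n_b + n_d - n_a - n_c:
            return False
        strict = n_a <= n_b + n_e - 1 and n_c <= n_d + n_e - 1
        nonadjacent = n_e == 0 and n_a <= n_b and n_c <= n_d
        if (strict or nonadjacent) and diff < 0:
            return False
    return True
```

The same helpers show why adjacency matters. Take nodes $0,1$ labelled $+$ and $2,3$ labelled $-$, with edges $\{0,1\},\{0,2\},\{2,3\}$, and swap $u=0$ with $v=2$. Both nodes have (non-strict) minorities, yet $N_{a}=N_{c}=N_{e}=1$ and $N_{b}=N_{d}=0$, so the swap lowers $|E(G)\cap A_{\tau}|$ by two.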

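Finally, note that if $B_{\tau}=\{\{x,y\}:x\neq y,\ \tau_{x}\neq\tau_{y}\}$ then $|E\cap B_{\tau}|=|E|-|E\cap A_{\tau}|$, and so (7) implies that

$$\operatorname{Pr}(G\mid\sigma=\tau)=(1-p)^{|A_{\tau}|}(1-q)^{|B_{\tau}|}\Big(\frac{q}{1-q}\Big)^{|E(G)|}\Big(\frac{p(1-q)}{q(1-p)}\Big)^{|E(G)\cap A_{\tau}|}.$$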
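Since $p>q$, we have $p(1-q)>q(1-p)$, so by (7), with $|E(G)|$, $|A_{\tau}|$ and $|B_{\tau}|$ held fixed, $\operatorname{Pr}(G\mid\sigma=\tau)$ is increasing in $|E(G)\cap A_{\tau}|$. Swapping the labels of $u$ and $v$ preserves $|E(G)|$, $|A_{\tau}|$ and $|B_{\tau}|$ and does not decrease $|E(G)\cap A_{\tau}|$, so $\operatorname{Pr}(G\mid\sigma=\tau^{\prime})\geq\operatorname{Pr}(G\mid\sigma=\tau)$ and $\tau$ could not have been the unique maximum a posteriori estimator.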

Since the incorrect Lemma 5.1 was used to prove that $P(n,p_{n},q_{n})=o(n^{-1})$ is necessary for strong consistency, we will now show how Lemma 7.2 can be used for the same purpose. So, for the rest of the section we fix some $\epsilon>0$ and assume (after passing to a subsequence of $n$ , if necessary) that $P(n,p_{n},q_{n})\geq\epsilon n^{-1}$ . We will divide the proof into a sparse case and a dense case. In the sparse case, we show that there is a pair of non-adjacent minorities:

Lemma 7.4. If $P(n,p_{n},q_{n})\geq\epsilon n^{-1}$ and $np_{n}\leq 64\log n$ for infinitely many $n$, then with asymptotically positive probability there is a non-adjacent pair $u$, $v$ of nodes such that $\sigma_{u}=+$, $\sigma_{v}=-$, and $u$ and $v$ have minorities.

We will divide $\{u:\sigma_{u}=+\}$ into three sets $S_{1,+},S_{2,+}$, and $S_{3,+}$, each of size at least $n/4$; similarly, we divide $\{u:\sigma_{u}=-\}$ into $S_{1,-}$, $S_{2,-}$, and $S_{3,-}$. For each of these six sets $S_{i,j}$, $\operatorname{Pr}(\exists u\in S_{i,j}:u\text{ has a minority})\geq\delta$. Next, note that the event that $u$ has a minority is (in the sense of Harris) monotone increasing in the edges between $+$-labelled and $-$-labelled nodes and monotone decreasing in the other edges. It follows from Harris’s inequality that any collection of such events is non-negatively correlated. In particular, with probability at least $\delta^{6}$, every $S_{i,j}$ contains a node with a minority ($u_{i,j}$, say).

We will complete the proof by showing that a.a.s. it is not the case that every $u_{i,+}$ is connected to every $u_{k,-}$. Indeed, if every $u_{i,+}$ is connected to every $u_{k,-}$ then the graph $G$ contains a subgraph isomorphic to $K_{3,3}$ (the complete bipartite graph with three vertices on each side). However, the random graph $G$ is stochastically dominated by the Erdős-Rényi graph $\mathcal{G}(2n,64n^{-1}\log n)$, and it is well-known (for example, by the first moment method) that such a graph a.a.s. does not contain a copy of $K_{3,3}$. Hence, with probability at least $\delta^{6}-o(1)$ there are $i$ and $k$ such that $u_{i,+}$ and $u_{k,-}$ are non-adjacent, and these two nodes form the required pair.
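The first-moment bound can be written out explicitly. Count ordered choices of the six vertices of a potential copy of $K_{3,3}$ among the $2n$ vertices (this overcounts copies), and use that each copy requires nine specific edges, each present with probability at most $64n^{-1}\log n$:

$$\mathbb{E}\big[\#\{\text{copies of }K_{3,3}\}\big]\leq(2n)^{6}\Big(\frac{64\log n}{n}\Big)^{9}=2^{6}\cdot 64^{9}\,\frac{(\log n)^{9}}{n^{3}}\longrightarrow 0.$$

By Markov’s inequality, the probability that at least one copy exists is at most this expectation, so a.a.s. there is none.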

To complete the proof we consider the dense case, where we prove that there is a pair of strict minorities:

Lemma 7.6. If $P(n,p_{n},q_{n})\geq\epsilon n^{-1}$ and $np_{n}\geq 64\log n$ infinitely often, then with asymptotically positive probability there is a pair $u$ and $v$ of nodes with opposite labels and strict minorities.

Once we have established that $\operatorname{Pr}(N_{+}\geq 1)$ is bounded away from zero, it follows by symmetry that $\operatorname{Pr}(N_{-}\geq 1)$ is bounded away from zero (where $N_{-}$ is the number of $-$-labelled nodes with a strict minority). As in the proof of Lemma 7.4, Harris’s inequality implies that $\operatorname{Pr}(N_{+}\geq 1\text{ and }N_{-}\geq 1)\geq\operatorname{Pr}(N_{+}\geq 1)\operatorname{Pr}(N_{-}\geq 1)$, and then it follows that with asymptotically positive probability there are strict minorities with both $+$ and $-$ labels.

Finally, the proof that $P(n,p_{n},q_{n})=o(n^{-1})$ is necessary for strong consistency follows by combining Lemma 7.2 with Lemma 7.4 in the sparse case, or with Lemma 7.6 in the dense case.