Achieving Exact Cluster Recovery Threshold via Semidefinite Programming: Extensions

Bruce Hajek, Yihong Wu, Jiaming Xu

Introduction

The stochastic block model (SBM) , also known as the planted partition model , is a popular statistical model for studying the community detection and graph partitioning problem (see, e.g., and the references therein). In its simple form, it assumes that out of a total of nn vertices, (K1++Kr)(K_{1}+\cdots+K_{r}) of them are partitioned into rr clusters with sizes K1,,Kr,K_{1},\ldots,K_{r}, and the remaining n(K1++Kr)n-(K_{1}+\cdots+K_{r}) vertices do not belong to any clusters (called outlier vertices); a random graph GG is then generated based on the cluster structure, where each pair of vertices is connected independently with probability pp if they are in the same cluster or qq otherwise. In this paper, we focus on the problem of exactly recovering the clusters (up to a permutation of cluster indices) based on the graph GG.

In the setting of two equal-sized clusters or a single cluster plus outlier vertices, recently it has been shown in that the semidefinite programming (SDP) relaxation of the maximum likelihood (ML) estimator achieves the optimal recovery threshold with high probability, in the asymptotic regime of p=alogn/np=a\log n/n and q=blogn/nq=b\log n/n for fixed constants a,ba,b and cluster sizes growing linearly in nn as nn\to\infty. The result for two equal-sized clusters was originally conjectured in and another resolution was recently given in independently.

In this paper, we extend the optimality of SDP to the following three cases, while still assuming p=alogn/np=a\log n/n and q=blogn/nq=b\log n/n with a>b>0a>b>0:

Stochastic block model with two asymmetric clusters: the first cluster consists of KK vertices and the second cluster consists of nKn-K vertices with K=ρnK=\lfloor\rho n\rfloor for some ρ[0,1/2].\rho\in[0,1/2]. The value of ρ\rho may be known or unknown to the recovery procedure.

Stochastic block model with rr clusters of equal size KK: r2r\geq 2 is a fixed integer and n=rKn=rK.

Censored block model with two clusters: given an Erdős-Rényi random graph GG(n,p)G\sim{\mathcal{G}}(n,p), each edge (i,j)(i,j) has a label Lij{±1}L_{ij}\in\{\pm 1\} independently drawn according to the distribution:

where σi=1\sigma^{\ast}_{i}=1 if vertex ii is in the first cluster and σi=1\sigma^{\ast}_{i}=-1 otherwise; ϵ[0,1/2]\epsilon\in[0,1/2] is a fixed constant.Under the censored block model, the graph itself does not contain any information about the underlying clusters and we are interested in recovering the clusters by observing the graph and edge labels.

In all three cases, we show that a necessary condition for the maximum likelihood (ML) estimator to succeed coincides with a sufficient condition for the correctness of the SDP procedure, thereby establishing both the optimal recovery threshold and the optimality of the SDP relaxation. The proof techniques in this paper are similar to those in ; however, the construction and validation of dual certificates for the success of SDP are more challenging especially in the multiple-cluster case. Notably, we resolve an open problem raised in [1, Section 6] about the optimal recovery threshold in the censored block model and show that the optimal recovery threshold can be achieved in polynomial-time via SDP.

To further investigate the applicability of SDP procedures for community detection, we explored two cases for which the algorithm is adaptive to the unknown cluster sizes. First, we found that for two clusters, the conditions for exact recovery are the strongest in the equal-sized case. This suggests, and it is shown in Section 2.2, that if the cluster size constraint is replaced by an appropriate Lagrangian term not depending on the cluster size, exact recovery is achieved for all cluster sizes under the condition required for two equal-sized clusters. Secondly, we examined the general community detection problem with a fixed number of unequal-sized clusters and with outlier vertices, and identified a sufficient condition for the SDP procedure to achieve exact recovery with knowledge of only the smallest cluster size and the parameters a,b.a,b. (See Section 5.)

The optimality result of SDP has recently been extended to the cases with o(logn)o(\log n) number of equal-sized clusters in and a fixed number of clusters with unequal sizes in .

For the case of rr equal-sized clusters, it is independently shown in that the optimal recovery threshold can be obtained in polynomial-time. Their clustering algorithm is a two-step procedure similar to , where the partial recovery is achieved via a simple spectral algorithm. For the case with two unequal-sized clusters, a sufficient (but not tight) recovery condition is also derived in .

Further literature on SDP for cluster recovery

There has been a recent surge of interest in analyzing the semidefinite programming relaxation approach for cluster recovery; some of the latest development are summarized below. For different recovery approaches such as spectral methods, we refer the reader to for details.

The SDP approach is mostly analyzed in the regime where the average degrees scale as logn\log n, with the objective of exact cluster recovery. In this setting, the analysis often relies on the standard technique of dual witnesses, which amounts to constructing the dual variables so that the desired KKT conditions are satisfied for the primal variable corresponding to the true clusters. The SDP has been applied to recover cliques or densest subgraphs in . For the stochastic block model with possibly unbounded number of clusters, a sufficient condition for an SDP procedure to achieve exact recovery is obtained in , which improves the sufficient conditions in in terms of scaling. Various formulations of SDP for cluster recovery are discussed in . The robustness of the SDP has been investigated in for minimum bisection in the semirandom model with monotone adversary and, more recently, in for generalized SBM with arbitrary outlier vertices. The SDP machinery has also been applied to recover clusters with partially observed graphs and binary matrices . In the converse direction, necessary conditions for the success of particular SDPs are obtained in . In contrast to the previous work mentioned above where the constants are often loose, the recent line of work initiated by , and followed by and the current paper, focus on establishing necessary and sufficient conditions in the special case of a fixed number of clusters with sharp constants, attained via SDP relaxations.

In the sparse graph case with bounded average degree, exact recovery is provably impossible and instead the goal is to achieve partial recovery, namely, to correctly cluster all but a small fraction of vertices. Using Grothendieck’s inequality, a sufficient condition for SDP to achieve partial recovery is obtained in ; the technique is extended to the labeled stochastic block model in . In , an SDP-based test is applied to distinguish the binary symmetric stochastic block model versus the Erdős-Rényi random graph and shown to attain the optimal detection threshold.

Notation

Denote the identity matrix by I\mathbf{I}, the all-one matrix by J\mathbf{J} and the all-one vector by 1\mathbf{1}. We write X0X\succeq 0 if XX is symmetric and positive semidefinite and X0X\geq 0 if all the entries of XX are non-negative. Let Sn{\mathcal{S}}^{n} denote the set of all n×nn\times n symmetric matrices. For XSnX\in{\mathcal{S}}^{n}, let λ2(X)\lambda_{2}(X) denote its second smallest eigenvalue. For any matrix YY, let Y\|Y\| denote its spectral norm. For any positive integer nn, let [n]={1,,n}[n]=\{1,\ldots,n\}. For any set T[n]T\subset[n], let T|T| denote its cardinality and TcT^{c} denote its complement. For ρ\rho\in, let ρˉ=1ρ\bar{\rho}=1-\rho. We use standard big OO notations, e.g., for any sequences {an}\{a_{n}\} and {bn}\{b_{n}\}, an=Θ(bn)a_{n}=\Theta(b_{n}) if there is an absolute constant c>0c>0 such that 1/can/bnc1/c\leq a_{n}/b_{n}\leq c; an=Ω(bn)a_{n}=\Omega(b_{n}) or bn=O(an)b_{n}=O(a_{n}) if there exists an absolute constant c>0c>0 such that an/bnca_{n}/b_{n}\geq c. Let Bern(p){\rm Bern}(p) denote the Bernoulli distribution with mean pp and Binom(n,p){\rm Binom}(n,p) denote the binomial distribution with nn trials and success probability pp. All logarithms are natural and we use the convention 0log0=00\log 0=0.

Binary asymmetric SBM

Let AA denote the adjacency matrix of the graph, and (C1,C2)(C^{\ast}_{1},C^{\ast}_{2}) denote the underlying true partition, where the clusters C1C^{\ast}_{1} and C2C^{\ast}_{2} have cardinalities KK and nKn-K, respectively, and we consider the asymptotic regime K=nρK=\lceil n\rho\rceil as nn\to\infty for ρ[0,12]\rho\in[0,\frac{1}{2}] fixed. In this subsection we assume that ρ\rho is known to the recovery procedure and the goal is to obtain the ρ\rho-dependent optimal recovery threshold attained by SDP relaxations.

The cluster structure under the binary stochastic block model can be represented by a vector σ{±1}n\sigma\in\{\pm 1\}^{n} such that σi=1\sigma_{i}=1 if vertex ii is in the first cluster and σi=1\sigma_{i}=-1 otherwise. Let σ\sigma^{\ast} correspond to the true clusters. Then the ML estimator of σ\sigma^{\ast} for the case a>ba>b can be simply stated as

which maximizes the number of in-cluster edges minus the number of out-cluster edges subject to the cluster size constraint. If K=n/2K=n/2, (1) reduces to the minimum graph bisection problem which is NP-hard in the worst case. Due to the computational intractability of the ML estimator, next we turn to its convex relaxation. Let Y=σσY=\sigma\sigma^{\top}. Then Yii=1Y_{ii}=1 is equivalent to σi=±1\sigma_{i}=\pm 1, and σ1=±(2Kn)\sigma^{\top}\mathbf{1}=\pm(2K-n) if and only if Y,J=(2Kn)2\langle Y,\mathbf{J}\rangle=(2K-n)^{2}. Therefore, (1) can be recast asHenceforth, all matrix variables in the optimization are symmetric.

Notice that any feasible solution is a rank-one positive semidefinite matrix. Relaxing this condition by dropping the rank-one restriction, we obtain the following convex relaxation of (2), which is a semidefinite program:

We note that the only model parameter needed by the estimator (3) is the cluster size KK.

Let Y=σ(σ)Y^{\ast}=\sigma^{\ast}(\sigma^{\ast})^{\top} correspond to the true partition and Yn{σσ:σ{±1}n,σ1=2Kn}{\mathcal{Y}}_{n}\triangleq\{\sigma\sigma^{\top}:\sigma\in\{\pm 1\}^{n},\sigma^{\top}\mathbf{1}=2K-n\} denote the set of all admissible partitions. The following result establishes the optimality of the SDP procedure.

with ρˉ=1ρ\bar{\rho}=1-\rho, τ=ablogalogb\tau=\frac{a-b}{\log a-\log b}, γ=(ρˉρ)2τ2+4ρρˉab\gamma=\sqrt{(\bar{\rho}-\rho)^{2}\tau^{2}+4\rho\bar{\rho}ab}, and η(0,a,b)=limρ0η(ρ,a,b)=a+b2τlogeabτ.\eta(0,a,b)=\lim_{\rho\to 0}\eta(\rho,a,b)=\frac{a+b}{2}-\tau\log\frac{{\rm e}\sqrt{ab}}{\tau}.

The proof of Theorem 1 is similar in outline to the proof given in , but a considerable detour is needed to handle the imbalance. Notice that by definition, η(ρ,a,b)=η(ρˉ,a,b)\eta(\rho,a,b)=\eta(\bar{\rho},a,b), and η(1/2,a,b)=12(ab)2\eta(1/2,a,b)=\frac{1}{2}(\sqrt{a}-\sqrt{b})^{2}. The threshold function η(ρ,a,b)\eta(\rho,a,b) turns out to be the error exponent in the following large deviation events. For vertex ii, let e(i,C1)e(i,C^{*}_{1}) denotes the number of edges between vertex ii and vertices in C1C^{*}_{1}, and define e(i,C2)e(i,C^{*}_{2}) similarly. Then,

Next we prove a converse for Theorem 1 which shows that the recovery threshold achieved by the SDP relaxation is in fact optimal.

In the special case with two equal-sized clusters, we have K=n/2K=n/2 and η(1/2,a,b)=12(ab)2\eta(1/2,a,b)=\frac{1}{2}(\sqrt{a}-\sqrt{b})^{2}. The corresponding threshold (ab)2>2(\sqrt{a}-\sqrt{b})^{2}>2 has been established in , and the achievability by SDP has been shown in and independently by later.

A recent work also studies the exact recovery problem in the unbalanced case and provides the sufficient (but not tight) recovery condition for a polynomial-time two-step procedure based on the spectral method.

2 Unknown cluster size

Theorem 4 shows that if one knows the relative cluster size ρ\rho, the SDP relaxation (3) achieves the size-dependent optimal threshold η(ρ,a,b)>1\eta(\rho,a,b)>1. For fixed aa and bb, η(ρ,a,b)\eta(\rho,a,b) is minimized at ρ=12\rho=\frac{1}{2} (see Appendix A for a proof). This suggests that for two communities the equal-sized case is the most difficult to cluster. Indeed, the next result proves that if there is no constraint on the cluster size, then the optimal recovery threshold coincides with that in the balanced case, i.e., (ab)2>2(\sqrt{a}-\sqrt{b})^{2}>2, which can be achieved by a penalized SDP.

Theorem 3 holds for all cluster sizes K,K, including the extreme case where the entire network forms a single cluster (K=0K=0), in which case the SDP (5) outputs Y=JY^{*}=\mathbf{J} with high probability. The downside is that the penalization parameter λ\lambda^{*} depends on the parameters aa and bb. Nevertheless, there exists a fully data-driven choice of λ\lambda^{*} based on the degree distribution of the network, so that Theorem 3 continues to hold whenever the cluster sizes scale linearly, i.e., K/nρ(0,12]K/n\to\rho\in(0,\frac{1}{2}]; the price to pay for adaptivity is that the probability of error vanishes polylogarithmically instead of polynomially as nn\to\infty. See Appendix B for details.

SBM with multiple equal-sized clusters

The cluster structure under the stochastic block model with rr clusters of equal size KK can be represented by rr binary vectors ξ1,,ξr{0,1}n\xi_{1},\ldots,\xi_{r}\in\{0,1\}^{n}, where ξk\xi_{k} is the indicator function of the cluster kk, such that ξk(i)=1\xi_{k}(i)=1 if vertex ii is in cluster kk and ξk(i)=0\xi_{k}(i)=0 otherwise. Let ξ1,,ξr\xi^{\ast}_{1},\ldots,\xi^{\ast}_{r} correspond to the true clusters and let AA denote the adjacency matrix. Then the maximum likelihood (ML) estimator of ξ\xi^{\ast} for the case a>ba>b can be simply stated as

We remark that we could as well have worked with the constraint Y,J=0\langle Y,\mathbf{J}\rangle=0, which, for Y0Y\succeq 0, is equivalent to the last constraint in (8). Letting Z=r1rY+1rJZ=\frac{r-1}{r}Y+\frac{1}{r}\mathbf{J}, we can also equivalently rewrite (8) as

The only model parameter needed by the estimator (9) is the cluster size KK. Let Z=k=1rξk(ξk)Z^{*}=\sum_{k=1}^{r}\xi_{k}^{*}(\xi_{k}^{*})^{\top} correspond to the true clusters and define

The sufficient condition for the success of SDP in (9) is given as follows.

The following result establishes the optimality of the SDP procedure.

The optimal recovery threshold ab=r\sqrt{a}-\sqrt{b}=\sqrt{r} is also obtained by two parallel independent works via a polynomial-time two-step procedure, consisting of a partial recovery algorithm followed by a cleanup stage. The previous work studies the stochastic block model in a much more general setting with rr clusters of equal size KK plus outlier vertices, where r,Kr,K and the edge probabilities p,qp,q may scale with nn arbitrarily as long as rKnrK\leq n; it is shown that an SDP achieves exact recovery with high probability provided that

for some universal constant CC. In the special setting where the network consists of a fixed number of clusters without outliers and p=alogn/n>q=blogn/np=a\log n/n>q=b\log n/n, the sufficient condition (10) simplifies to abCr\sqrt{a}-\sqrt{b}\geq C^{\prime}\sqrt{r} for some absolute constant CC^{\prime}, which is off by a constant factor compared to the sharp sufficient condition ab>r\sqrt{a}-\sqrt{b}>\sqrt{r} given by Theorem 4.

It is straightforward to extend the current proof of Theorem 5 to the regime where r=γlogsnr=\gamma\log^{s}n, p=alogs+1nn>q=blogs+1nnp=\frac{a\log^{s+1}n}{n}>q=\frac{b\log^{s+1}n}{n} for any fixed γ,a,b>0\gamma,a,b>0 and s[0,1)s\in[0,1), showing that SDP achieves the optimal recovery threshold ab=γ\sqrt{a}-\sqrt{b}=\sqrt{\gamma}. Indeed, the preprint shows similar optimality results of SDP for r=o(logn)r=o(\log n) number of equal-sized clusters. Conversely, it has been recently proved in that SDP relaxations cease to be optimal for logarithmically many communities in the sense that SDP is constantwise suboptimal when rClognr\geq C\log n for a large enough constant CC and orderwise suboptimal when r=ω(logn)r=\omega(\log n).

Binary censored block model

Under the binary censored block model, with possibly unequal cluster sizes, the cluster structure can be represented by a vector σ{±1}n\sigma\in\{\pm 1\}^{n} such that σi=1\sigma_{i}=1 if vertex ii is in the first cluster and σi=1\sigma_{i}=-1 if vertex ii is in the second cluster. Let σ{±1}n\sigma^{\ast}\in\{\pm 1\}^{n} correspond to the true clusters. Let AA denote the weighted adjacency matrix such that Aij=0A_{ij}=0 if i,ji,j are not connected by an edge; Aij=1A_{ij}=1 if i,ji,j are connected by an edge with label +1+1; Aij=1A_{ij}=-1 if i,ji,j are connected by an edge with label 1-1. Then the ML estimator of σ\sigma^{\ast} can be simply stated as

which maximizes the number of in-cluster +1+1 edges minus that of in-cluster 1-1 edges, or equivalently, maximizes the number of cross-cluster 1-1 edges minus that of cross-cluster +1+1 edges. The NP-hard max-cut problem can be reduced to (11) by simply labeling all the edges in the input graph as 1-1 edges, and thus (11) is computationally intractable in the worst case. Instead, we consider the SDP studied in obtained by convex relaxation. Let Y=σσY=\sigma\sigma^{\top}. Then Yii=1Y_{ii}=1 is equivalent to σi=±1\sigma_{i}=\pm 1. Therefore, (6) can be recast as

Replacing the rank-one constraint by positive semidefiniteness, we obtain the following convex relaxation of (12), which is an SDP:

We remark that (13) does not rely on any knowledge of the model parameters. Let Y=σ(σ)Y^{\ast}=\sigma^{\ast}(\sigma^{\ast})^{\top} and Yn{σσ ⁣:σ{±1}n}{\mathcal{Y}}_{n}\triangleq\{\sigma\sigma^{\top}\colon\sigma\in\{\pm 1\}^{n}\}. The following result establishes the success condition of the SDP procedure in the scaling regime p=alogn/np=a\log n/n for a fixed constant aa:

Next we prove a converse for Theorem 6 which shows that the recovery threshold achieved by the SDP relaxation is in fact optimal.

Theorem 7 still holds if the cluster sizes are proportional to nn and known to the estimators, i.e., the prior distribution of σ\sigma^{\ast} is uniform over {σ{±1}n:σ1=2Kn}\{\sigma\in\{\pm 1\}^{n}:\sigma^{\top}\mathbf{1}=2K-n\} for K=ρnK=\lfloor\rho n\rfloor with ρ(0,1/2]\rho\in(0,1/2].

Denote by a(ϵ)a^{*}(\epsilon) the optimal recovery threshold, namely, the infimum of a>0a>0 such that exact cluster recovery is possible with probability converging to one as nn\to\infty. Our results show that for all ϵ[0,1/2]\epsilon\in[0,1/2], the optimal recovery threshold is given by

and can be achieved by the SDP relaxations. The optimal recovery threshold is insensitive to ρ\rho, which is in contrast to what we have seen for the binary stochastic block model.

Exact cluster recovery in the censored block model is previously studied in and it is shown that if ϵ1/2\epsilon\to 1/2, the maximum likelihood estimator achieves the optimal recovery threshold a(12ϵ)2>2+o(1)a(1-2\epsilon)^{2}>2+o(1), while an SDP relaxation of the ML estimator succeeds if a(12ϵ)2>4+o(1)a(1-2\epsilon)^{2}>4+o(1). The optimal recovery threshold for any fixed ϵ(0,1/2)\epsilon\in(0,1/2) and whether it can be achieved in polynomial-time were previously unknown. Theorem 6 and Theorem 7 together show that the SDP relaxation achieves the optimal recovery threshold a(1ϵϵ)2>1a(\sqrt{1-\epsilon}-\sqrt{\epsilon})^{2}>1 for any fixed constant ϵ[0,1/2]\epsilon\in[0,1/2]. Notice that (1ϵϵ)2=12(12ϵ)2+o((12ϵ)2)(\sqrt{1-\epsilon}-\sqrt{\epsilon})^{2}=\frac{1}{2}(1-2\epsilon)^{2}+o((1-2\epsilon)^{2}) when ϵ1/2\epsilon\to 1/2. For the censored block model with the background graph being random regular graph, it is further shown in that the SDP relaxations also achieve the optimal exact recovery threshold.

The above exact recovery threshold in the regime p=alogn/np=a\log n/n shall be contrasted with the positively correlated recovery threshold in the sparse regime p=a/np=a/n for constant aa. In this sparse regime, there exists at least a constant fraction of vertices with no neighbors and exactly recovering the clusters is hopeless; instead, the goal is to find an estimator σ^\widehat{\sigma} positively correlated with σ\sigma^{\ast} up to a global flip of signs. It was conjectured in that the positively correlated recovery is possible if and only if a(12ϵ)2>1a(1-2\epsilon)^{2}>1; the converse part is shown in and recently it is proved in that spectral algorithms achieve the sharp threshold in polynomial-time.

An SDP for general cluster structure

In this section we consider SDPs for the general case of multiple clusters and outliers. We assume there are rr clusters with sizes K1,,Kr,K_{1},\ldots,K_{r}, and n(K1++Kr)n-(K_{1}+\cdots+K_{r}) outlier vertices. Vertices in the same cluster are connected with probability pp, while other pairs of vertices are connected between them with probability q.q. We consider the asymptotic regime p=alognn,p=\frac{a\log n}{n}, q=blognnq=\frac{b\log n}{n} and Kk=ρknK_{k}=\rho_{k}n as nn\rightarrow\infty for a,b,ρ0,,ρra,b,\rho_{0},\ldots,\rho_{r} fixed, with ρ1ρr>0.\rho_{1}\geq\ldots\geq\rho_{r}>0. Let ρmin=ρr.\rho_{\min}=\rho_{r}. We derive sufficient conditions for exact recovery by SDPs. While the conditions are not the tightest possible for specific cases, we would like to identify an algorithm that recovers the cluster matrix exactly without knowing the details of the cluster structure. As in Section 3, the true cluster matrix can be expressed as Z=k=1rξk(ξk),Z^{*}=\sum_{k=1}^{r}\xi_{k}^{*}(\xi_{k}^{*})^{\top}, where ξk\xi_{k}^{*} is the indicator function of the kthk^{{\rm th}} cluster. Denote by Zn{\mathcal{Z}}_{n} the collection of all such cluster matrices.

Implementing the SDP (15) requires no knowledge of the density parameters aa and bb, the number of clusters r,r, or the sizes of the individual clusters; but it does require the exact knowledge of the sum as well as the sum of squares of the cluster sizes, which, in practical applications, may be unrealistic to assume. Therefore, similar to (5), we also consider the following penalized SDP, obtained by removing the constraints for those two quantities while augmenting the objective function:

Here the penalization parameters η\eta^{*} and λ\lambda^{*} must be specified.

Clearly the above two SDPs are different and need not have the same solutions; nevertheless, they are similar enough so that in the following theorem we state a sufficient condition for either of the SDPs to exactly recover ZZ^{*} with high probability. Define

For μ>0\mu>0 fixed, I(μ,d)I(\mu,d) is a strictly convex, nonnegative function in dd which is zero if and only if d=μ.d=\mu.

Suppose there exists ψ1>0\psi_{1}>0 and ψ2>0\psi_{2}>0 with b+ψ1+ψ2<ab+\psi_{1}+\psi_{2}<a such that

We examine two simpler sufficient conditions for recovery, assuming we have enough information to implement one of the two SDPs, and we also have a lower bound ρmin,\rho_{\min}, on the ρk\rho_{k}’s, but we don’t know how many clusters there are nor whether there are outlier vertices. The conditions of Theorem 8 are most stringent when there are two clusters of the smallest possible size ρmin,\rho_{\min}, and in that case we get the tightest result from the theorem by selecting ψ1=ψ2=ψ,\psi_{1}=\psi_{2}=\psi, yielding the following corollary:

There is no simple expression for ψ\psi in Corollary 1. If instead we consider the equation I(a,b+2ψ)=I(b,b+2ψ),I(a,b+2\psi)=I(b,b+2\psi), we have the smaller but explicit solution ψ=τb2,\psi=\frac{\tau-b}{2}, where τ=ablog(a/b).\tau=\frac{a-b}{\log(a/b)}. Using this ψ\psi in the test I(b,b+ψ)>1/ρmin,I(b,b+\psi)>1/\rho_{\min}, we obtain the following weaker but more explicit recovery condition, which, nevertheless, is within a factor of eight of the necessary condition (see Remark 32 below):

Let us compare the sufficient condition provided by Corollary 2 with necessary conditions for recovery. In the presence of outliers, I(b,τ)>1/ρminI(b,\tau)>1/\rho_{\min} is a necessary condition as shown in [24, Theorem 4], for otherwise we can swap a vertex in the smallest cluster with an outlier vertex to increase the number of in-cluster edges. Also, with at least two clusters,

is necessary, because we could have two smallest clusters of sizes ρminn,\rho_{\min}n, and even if a genie were to reveal all the other clusters, we would still need (32) to recover the two smallest ones, as shown by [2, Theorem 1]. By Lemma 11, I(b,τ)(ab)22I(b,τ)I(b,\tau)\leq(\sqrt{a}-\sqrt{b})^{2}\leq 2I(b,\tau); so with or without outliers, 2I(b,τ)>1/ρmin2I(b,\tau)>1/\rho_{\min} is necessary. By Lemma 12, I(b,τ)4I(b,τ+b2)I(b,\tau)\leq 4I(b,\frac{\tau+b}{2}). Therefore we conclude that the sufficient condition of Corollary 2 is within a factor of four (resp. eight) of the necessary condition in the presence (resp. absence) of outliers.

Conclusions

This paper shows that the SDP procedure works for recovering community structure at the asymptotically optimal threshold in various important settings beyond the case of two equal-sized clusters or that of a single cluster and outliers considered in . In particular, SDP relaxations works asymptotically optimally for two unequal clusters (with or without knowing the cluster size), or rr equal clusters, or the binary censored block model with the background graph being Erdős-Rényi. These results demonstrate the versatility of SDP relaxation as a simple, general purpose, computationally feasible methodology for community detection.

The picture is less impressive when these cases are combined to have a general case with rr clusters of various sizes plus outliers. Still, we found that an SDP procedure can achieve exact recovery even without the knowledge of the cluster sizes; the sufficient condition for recovery is within a factor of eight of the necessary information-theoretic bound. An interesting open problem is whether the SDP relaxation can achieve the optimal recovery threshold in this general case. The preprint addresses this problem, showing that the SDP relaxation still achieves the optimal threshold for recovering a fixed number of clusters with unequal sizes.

Proofs

with γ=α2+4ρ1ρ2ab.\gamma=\sqrt{\alpha^{2}+4\rho_{1}\rho_{2}ab}.

We first prove the upper tail bound in (36) using Chernoff’s bound. In particular,

with γn=αn2+4ρ1,nρ2,nab\gamma_{n}=\sqrt{\alpha_{n}^{2}+4\rho_{1,n}\rho_{2,n}ab}. Thus, using the inequality that log(1x)x\log(1-x)\leq-x, we have

If ρ1,n=ρ1+o(1)\rho_{1,n}=\rho_{1}+o(1), ρ2,n=ρ2+o(1)\rho_{2,n}=\rho_{2}+o(1), and αn=α+o(1)\alpha_{n}=\alpha+o(1), then we let

with γ=α2+4ρ1ρ2ab\gamma=\sqrt{\alpha^{2}+4\rho_{1}\rho_{2}ab}. It follows that

and thus the upper tail bound in (35) holds in view of (37). Next, we prove the lower tail bound in (35).

Case 1: ρ1,ρ2>0\rho_{1},\rho_{2}>0. For any choice of the constant α\alpha^{\prime} with α>α,\alpha^{\prime}>|\alpha|,

Setting α=α2+4ρ1ρ2ab\alpha^{\prime}=\sqrt{\alpha^{2}+4\rho_{1}\rho_{2}ab} in the last displayed equation yields

where the last inequality follows from Lemma 1. ∎

The following lemma provides a deterministic sufficient condition for the success of SDP (3) in the case a>ba>b.

Then Y^SDP=Y\widehat{Y}_{{\rm SDP}}=Y^{\ast} is the unique solution to (3).

where (a)(a) holds because S,Y0\langle S^{\ast},Y\rangle\geq 0; (b)(b) holds because Y,S=(σ)Sσ=0\langle Y^{\ast},S^{\ast}\rangle=(\sigma^{\ast})^{\top}S^{\ast}\sigma^{\ast}=0 by (38). Hence, YY^{\ast} is an optimal solution. It remains to establish its uniqueness. To this end, suppose Y~{\widetilde{Y}} is an optimal solution. Then,

where (a)(a) holds because J,Y~=J,Y\langle\mathbf{J},{\widetilde{Y}}\rangle=\langle\mathbf{J},Y^{\ast}\rangle, A,Y~=A,Y\langle A,{\widetilde{Y}}\rangle=\langle A,Y^{\ast}\rangle, and Y~ii=Yii=1{\widetilde{Y}}_{ii}=Y^{*}_{ii}=1 for all i[n]i\in[n]. In view of (38), since Y~0{\widetilde{Y}}\succeq 0, S0S^{\ast}\succeq 0 with λ2(S)>0\lambda_{2}(S^{*})>0, Y~{\widetilde{Y}} must be a multiple of Y=σ(σ)Y^{*}=\sigma^{\ast}(\sigma^{\ast})^{\top}. Because Y~ii=1{\widetilde{Y}}_{ii}=1 for all i[n]i\in[n], Y~=Y{\widetilde{Y}}=Y^{\ast}. ∎

Let D=diag{di}D^{\ast}=\mathsf{diag}\left\{{d^{\ast}_{i}}\right\} with

and choose λ=τlogn/n\lambda^{*}=\tau\log n/n, where τ=ablogalogb\tau=\frac{a-b}{\log a-\log b}. It suffices to show that S=DA+λJS^{*}=D^{\ast}-A+\lambda^{\ast}\mathbf{J} satisfies the conditions in Lemma 3 with high probability.

By definition, diσi=jAijσjλ(2Kn)d^{\ast}_{i}\sigma_{i}^{\ast}=\sum_{j}A_{ij}\sigma^{\ast}_{j}-\lambda^{\ast}(2K-n) for all i[n]i\in[n], i.e., Dσ=Aσλ(2Kn)1D^{\ast}\sigma^{\ast}=A\sigma^{\ast}-\lambda^{\ast}(2K-n)\mathbf{1}. Since Jσ=(2Kn)1\mathbf{J}\sigma^{\ast}=(2K-n)\mathbf{1}, it follows that the desired (38) holds, that is, Sσ=0S^{*}\sigma^{*}=0. It remains to verify that S0S^{\ast}\succeq 0 and λ2(S)>0\lambda_{2}(S^{\ast})>0 with high probability, which amounts to showing that

where (a)(a) holds because x,σ=0\left\langle x,\sigma^{\ast}\right\rangle=0. It follows from (41) that for any xσx\perp\sigma^{\ast} and x2=1\|x\|_{2}=1, xTSx=t1(x)+t2(x)x^{T}S^{*}x=t_{1}(x)+t_{2}(x) where

We next bound infxσ,x2=1t1(x)\inf_{x\perp\sigma^{*},\|x\|_{2}=1}t_{1}(x) from the below. Consider the specific vector xˇ\check{x} that maximizes xJxx^{\top}\mathbf{J}x subject to the unit norm constraint and x,σ=0.\langle x,\sigma^{*}\rangle=0. It has coordinates nKnK\sqrt{\frac{n-K}{nK}} for the KK vertices of the first cluster and coordinates Kn(nK)\sqrt{\frac{K}{n(n-K)}} for the nKn-K vertices of the other cluster. Let E2=\mboxspan(σ,xˇ);E_{2}=\mbox{span}(\sigma^{*},\check{x}); E2E_{2} is the set of vectors that are constant over each cluster. Then

Notice that for any vector xx with xE2,x\perp E_{2}, Jx=0.\mathbf{J}x=0. It follows that

We bound the three terms in the parenthesis separately in the sequel.

Lower bound on t1(xˇ)t_{1}(\check{x}): Notice that xˇJxˇ=4K(nK)/n\check{x}^{\top}\mathbf{J}\check{x}=4K(n-K)/n and thus

Since xˇDxˇ=A,Bλ(2Kn)i=1nxˇi2σi\check{x}^{\top}D^{\ast}\check{x}=\langle A,B\rangle-\lambda^{\ast}(2K-n)\sum_{i=1}^{n}\check{x}_{i}^{2}\sigma_{i}^{\ast}, where Bij=σiσjxˇi2B_{ij}=\sigma_{i}\sigma_{j}\check{x}_{i}^{2}, it follows that xˇDxˇ\check{x}^{\top}D\check{x} is Lipschitz continuous in AA with Lipschitz constant BF=(1ρ)2/ρ+ρ2/(1ρ)+o(1)\|B\|_{\rm F}=\sqrt{(1-\rho)^{2}/{\rho}+\rho^{2}/(1-\rho)}+o(1). Moreover, AijA_{ij} is $valued.ItfollowsfromtheTalagrandsconcentrationinequalityforLipschitzconvexfunctions(see,e.g.,[40,Theorem2.1.13])thatforany-valued. It follows from the Talagrand’s concentration inequality for Lipschitz convex functions (see, e.g., [40, Theorem 2.1.13]) that for anyc>0,thereexists, there existsc^{\prime}>0onlydependingononly depending on\rho$, such that

Hence, with probability at least 1nc1-n^{-c},

Lower bound on infx2=1,xE2xDxˇ\inf_{\|x\|_{2}=1,x\perp E_{2}}x^{\top}D^{*}\check{x}: Note that E[D]xˇE2.E[D^{*}]\check{x}\in E_{2}. So for any vector xx with xE2,x\perp E_{2}, xE[D]xˇ=0x^{\top}E[D^{*}]\check{x}=0. Hence,

where the last inequality follows from the Cauchy-Schwartz inequality. It follows from the Talagrand’s concentration inequality for Lipschitz convex functions that for any c>0c>0, there exists c>0c^{\prime}>0 such that

Lower bound on infx2=1,xE2xDx\inf_{\|x\|_{2}=1,x\perp E_{2}}x^{\top}D^{*}x: Notice that for x2=1,\|x\|_{2}=1, xDxminidix^{\top}D^{*}x\geq\min_{i}d^{\ast}_{i}, so it suffices to bound minidi\min_{i}d^{\ast}_{i} from the below. For iC1i\in C_{1}, AijσiσjA_{ij}\sigma_{i}\sigma_{j} is equal in distribution to XRX-R, where XBinom(K1,alognn)X\sim{\rm Binom}(K-1,\frac{a\log n}{n}) and RBinom(nK,blognn)R\sim{\rm Binom}(n-K,\frac{b\log n}{n}). It follows from Lemma 2 that

For iC2i\in C_{2}, AijσiσjA_{ij}\sigma_{i}\sigma_{j} is equal in distribution to XRX-R, where XBinom(nK1,alognn)X\sim{\rm Binom}(n-K-1,\frac{a\log n}{n}) and RBinom(K,blognn)R\sim{\rm Binom}(K,\frac{b\log n}{n}). It follows from Lemma 2 that

It follows from the definition of did^{\ast}_{i} that

where the last inequality follows from the assumption that η(ρ,a,b)>1\eta(\rho,a,b)>1.

Combing all the three lower bounds together, we get that with high probability,

Notice that we have shown that with high probability infxσ,x2=1t2(x)pclogn\inf_{x\perp\sigma^{*},\|x\|_{2}=1}t_{2}(x)\geq p-c^{\prime}\sqrt{\log n}. It follows from (44) that with high probability,

Notice that a>b>0a>b>0 and thus τ>b\tau>b. Therefore, the desired (40) holds and the theorem follows from Lemma 3. ∎

By symmetry, we can condition on C1C_{1}^{*} being the first KK vertices. Let TT denote the set of first nlog2n\lfloor\frac{n}{\log^{2}n}\rfloor vertex. Then

Suppose that the true clusters are C1C^{\ast}_{1} and C2C^{\ast}_{2} of cardinality KnK_{n} and nKnn-K_{n}, respectively. One can easily check that Lemma 3 still holds with λ=τlogn/n\lambda^{\ast}=\tau\log n/n, where τ=ablogalogb\tau=\frac{a-b}{\log a-\log b}. Choose the same did_{i}^{\ast} in (39) as in the proof of Theorem 1. It suffices to show for any 0Knn0\leq K_{n}\leq n,

First, consider the case Kn=0K_{n}=0 or nn where Y=JY^{*}=\mathbf{J} and the graph is simply G(n,p){\mathcal{G}}(n,p). Then for i[n]i\in[n], jAijσiσjBinom(n1,alognn)\sum_{j}A_{ij}\sigma^{\ast}_{i}\sigma^{\ast}_{j}\sim{\rm Binom}(n-1,\frac{a\log n}{n}). Recall that τ=ablogalogb\tau=\frac{a-b}{\log a-\log b} and notice that in this case, di=jAijσiσjτlognd_{i}^{\ast}=\sum_{j}A_{ij}\sigma^{\ast}_{i}\sigma^{\ast}_{j}-\tau\log n. It follows from Lemma 1 that

where η(0,a,b)=aτlog(ea/τ)\eta(0,a,b)=a-\tau\log({\rm e}a/\tau) and the last inequality follows from Lemma 13 in Appendix A. By the union bound,

where the last inequality holds because η(1/2,a,b)=12(ab)2>1\eta(1/2,a,b)=\frac{1}{2}(\sqrt{a}-\sqrt{b})^{2}>1 by assumption. Moreover, since σ=±1\sigma^{*}=\pm\mathbf{1}, any xx such that xσx\perp\sigma^{\ast} satisfies xJx=0x^{\top}\mathbf{J}x=0. It follows from (41) that

Next, we consider the case 1Knn11\leq K_{n}\leq n-1. For iC1i\in C_{1}, jAijσiσj\sum_{j}A_{ij}\sigma_{i}\sigma_{j} is stochastically larger than XR1X-R-1, where XBinom(Kn,alognn)X\sim{\rm Binom}(K_{n},\frac{a\log n}{n}) and RBinom(nKn,blognn)R\sim{\rm Binom}(n-K_{n},\frac{b\log n}{n}). Let ρn=Knn(0,1)\rho_{n}=\frac{K_{n}}{n}\in(0,1) and tn=τ(12ρn)lognlognloglogn1t_{n}=\tau(1-2\rho_{n})\log n-\frac{\log n}{\log\log n}-1. Applying the non-asymptotic upper bound in Lemma 2 yields

We proceed to show that g(ρn,1ρn,a,b,tnlogn)η(1/2,a,b)+o(1)g(\rho_{n},1-\rho_{n},a,b,-\frac{t_{n}}{\log n})\geq\eta(1/2,a,b)+o(1). First note that

and tnlogn=τ(ρnρˉn)ϵn-\frac{t_{n}}{\log n}=\tau(\rho_{n}-\bar{\rho}_{n})-\epsilon_{n}, where ϵn=1loglogn+1logn\epsilon_{n}=\frac{1}{\log\log n}+\frac{1}{\log n} and ρˉn1ρn\bar{\rho}_{n}\triangleq 1-\rho_{n}. Furthermore, for any fixed a,b>0a,b>0,

for some function F(a,b)F(a,b) independent of nn and ρˉ1ρ\bar{\rho}\triangleq 1-\rho. To see this, let t=τ(ρρˉ)δt=\tau(\rho-\bar{\rho})-\delta, where 0δϵn0\leq\delta\leq\epsilon_{n}. First consider the case of t<0t<0. Then ρ12+o(1)2/3\rho\leq\frac{1}{2}+o(1)\leq 2/3. Hence 1/3ρˉ11/3\leq\bar{\rho}\leq 1. Then

Since ab<τ<a+b2\sqrt{ab}<\tau<\frac{a+b}{2} whenever aba\neq b and ρˉρ\bar{\rho}-\rho\in, both the numerator and denominator in (50) are bounded away from zero and infinity uniformly in ρ\rho. The case of t>0t>0 follows analogously. Therefore

where (51) is due to (49), (52) is by definition of η\eta, and (53) follows from Lemma 13.

Similarly, for iC2i\in C_{2}, AijσiσjA_{ij}\sigma_{i}\sigma_{j} is stochastically larger than XR1X-R-1, where XBinom(nKn,alognn)X\sim{\rm Binom}(n-K_{n},\frac{a\log n}{n}) and RBinom(Kn,blognn)R\sim{\rm Binom}(K_{n},\frac{b\log n}{n}). Let kn=τ(12ρn)logn+lognloglogn+1k^{\prime}_{n}=\tau(1-2\rho_{n})\log n+\frac{\log n}{\log\log n}+1. It follows from Lemma 2 that

where the last inequality follows from the same steps as in (51) – (53). It follows from the definition of did^{\ast}_{i} that

where the last inequality follows from the assumption that η(1/2,a,b)=12(ab)2>1\eta(1/2,a,b)=\frac{1}{2}(\sqrt{a}-\sqrt{b})^{2}>1. Furthermore one can verify that infxσt2(x)pO(logn)\inf_{x\perp\sigma^{\ast}}t_{2}(x)\geq p-O(\sqrt{\log n}) with high probability, where the functions t1t_{1} and t2t_{2} are defined in (42)–(43). We divide the remaining analysis into the two cases:

Case 1: Knn/lognK_{n}\leq n/\sqrt{\log n} or nKnn/lognn-K_{n}\leq n/\sqrt{\log n}. Notice that τa+b2\tau\leq\frac{a+b}{2} and recall the definition of xˇ\check{x} in the proof of Theorem 1. Then

Thus the desired (48) follows by the same argument used in the proof of Theorem 1.

Then the desired (48) follows by the same argument used in the proof of Theorem 1.

2 Proofs for Section 3: Multiple equal-sized clusters

Theorem 4 is proved after three lemmas are given. For k[r]k\in[r], denote by Ck[n]C_{k}\subset[n] the support of the kthk^{\rm th} cluster. For a set TT of vertices, let e(i,T)jTAije(i,T)\triangleq\sum_{j\in T}A_{ij} and e(T,T)=iTe(i,T).e(T^{\prime},T)=\sum_{i\in T^{\prime}}e(i,T). Let k(i)k(i) denote the index of the cluster containing vertex i.i. Denote the number of neighbors of ii in its own cluster by si=e(i,Ck(i))s_{i}=e(i,C_{k(i)}) and the maximum number of neighbors of ii in other clusters by ri=maxkk(i)e(i,Ck).r_{i}=\max_{k^{\prime}\neq k(i)}e(i,C_{k^{\prime}}).

Notice that siBinom(K,p)s_{i}\sim{\rm Binom}(K,p) and for kk(i)k^{\prime}\neq k(i), e(i,Ck)Binom(K,q)e(i,C_{k^{\prime}})\sim{\rm Binom}(K,q). It is shown in that

Applying the union bound over all possible vertices, we complete the proof. ∎

There exists a constant c>0c>0 depending only on bb and rr such that

where (a)(a) follows from Bernstein’s inequality. Furthermore,

It follows from McDiarmid’s inequality that

Thus, with probability at most n2n^{-2}, 1KiCkriKq+O(logn)\frac{1}{K}\sum_{i\in C_{k}}r_{i}\geq Kq+O(\sqrt{\log n}). The lemma follows in view of the union bound. ∎

The following lemma provides a deterministic sufficient condition for the success of SDP (9) in the case a>ba>b.

Then Z^SDP=Z\widehat{Z}_{SDP}=Z^{*} is the unique solution to (9).

Let H=ZZ,H=Z-Z^{*}, where ZZ is an arbitrary feasible matrix for the SDP (9). Since ZZ and ZZ^{*} are both feasible, D,H=λ1,H=1(λ),H=0.\langle D^{*},H\rangle=\langle\lambda^{*}\mathbf{1}^{\top},H\rangle=\langle\mathbf{1}(\lambda^{\ast})^{\top},H\rangle=0. Since A=DBS+λ1+1(λ),A=D^{*}-B^{*}-S^{*}+\lambda^{*}\mathbf{1}^{\top}+\mathbf{1}(\lambda^{\ast})^{\top},

B,H0,\langle B^{*},H\rangle\geq 0, with equality if and only if B,Z=0.\langle B^{*},Z\rangle=0. That is because B0,B^{*}\geq 0, Z0,Z\geq 0, and B,Z=0.\langle B^{*},Z^{*}\rangle=0.

S,H0,\langle S^{*},H\rangle\geq 0, with equality if and only if S,Z=0.\langle S^{*},Z\rangle=0. That is because S,Z0\langle S^{*},Z\rangle\geq 0 (because S,Z0S^{*},Z\succeq 0) and S,Z=0\langle S^{*},Z^{*}\rangle=0 (because Z=k=1rξk(ξk)Z^{*}=\sum_{k=1}^{r}\xi^{*}_{k}(\xi^{*}_{k})^{\top} and Sξk=0S^{*}\xi^{*}_{k}=0 for all k[r]k\in[r]).

Thus, A,H0,\langle A,H\rangle\leq 0, so that ZZ^{*} is a solution to the SDP.

To prove that ZZ^{*} is the unique solution, restrict attention to the case that ZZ is another solution to the SDP. We need to show Z=Z.Z=Z^{*}. Since both ZZ and ZZ^{*} are solutions, A,H=0,\langle A,H\rangle=0, so that B,H=S,H=0.\langle B^{*},H\rangle=\langle S^{*},H\rangle=0. Therefore, by the above two points: B,Z=S,Z=0.\langle B^{*},Z\rangle=\langle S^{*},Z\rangle=0. For each ii, Bi,j=0B^{*}_{i,j}=0 if and only if vertices ii and jj are in the same cluster. Also, the fact Z0Z\succeq 0 and Zii1Z_{ii}\leq 1 for all ii implies Zij1Z_{ij}\leq 1 for all i,j.i,j. Thus, the only way ZZ can meet the constraint Z1=K1Z\mathbf{1}=K\mathbf{1} is that Zij=1Z_{ij}=1 whenever ii and jj are in the same cluster. Therefore Z=ZZ=Z^{*} and hence ZZ^{*} is the unique solution. ∎

We now begin the proof of Theorem 4. Let EE denote the subspace spanned by vectors {ξk}k[r]\{\xi^{\ast}_{k}\}_{k\in[r]}, i.e., E=span(ξk:k[r]).E=\text{span}(\xi^{\ast}_{k}:k\in[r]). Ultimately, we will show that

Since BB^{*} is assumed to be symmetric, (59) is equivalent to requiring that

for some ykky_{kk^{\prime}}^{\ast} and zkkz_{kk^{\prime}}^{\ast}. Next we ensure that Sξk=0S^{*}\xi^{\ast}_{k}=0 for k[r]k\in[r]. Equivalently, we want to ensure that for any distinct k,k[r]k,k^{\prime}\in[r] and any iCki\in C_{k},

Requiring (62) for all distinct k,k[r]k,k^{\prime}\in[r] and all iCki\in C_{k} is equivalent to requiring

for all distinct k,k[r]k,k^{\prime}\in[r] and all jCkj\in C_{k^{\prime}} (by swapping ii for jj and kk for kk^{\prime}). Moreover, it is equivalent to checking both (62) and (63) under the additional assumption that k<k.k<k^{\prime}. Substituting (60) into (62) and (63) gives that for all k,k[r]k,k^{\prime}\in[r] with k<k,k<k^{\prime},

For k<kk<k^{\prime} and iCk,jCki\in C_{k},j\in C_{k^{\prime}}, set

where ukku_{kk^{\prime}} and αk\alpha_{k} are to be determined. Equations (64) and (65) both reduce to:

(which must hold whenever k<kk<k^{\prime}) and (61) becomes

In view of Lemma 4 and the assumption that ab>r\sqrt{a}-\sqrt{b}>\sqrt{r}, mini(siri)logn/loglogn\min_{i}(s_{i}-r_{i})\geq\log n/\log\log n with high probability. By Lemma 5, maxk[r]1KiCkriKq+O(logn)\max_{k\in[r]}\frac{1}{K}\sum_{i\in C_{k}}r_{i}\leq Kq+O(\sqrt{\log n}) with high probability. Finally set

Thus, the desired (57) holds in view of (69) and (58). Also, note that e(Ck,Ck)Binom(K2,q)e(C_{k},C_{k^{\prime}})\sim{\rm Binom}(K^{2},q). For XBinom(n,p0)X\sim{\rm Binom}(n,p_{0}) with p0p_{0}\in, Chernoff’s bound yields

Applying the union bound, we have that with high probability, e(Ck,Ck)>K2qKlogne(C_{k},C_{k^{\prime}})>K^{2}q-K\sqrt{\log n} for all 1k<kr1\leq k<k^{\prime}\leq r. Hence ykk(i)>0y^{\ast}_{kk^{\prime}}(i)>0 and zkk(j)>0z^{\ast}_{kk^{\prime}}(j)>0 for all 1k<kr1\leq k<k^{\prime}\leq r and iCki\in C_{k} and jCkj\in C_{k^{\prime}} so that Bij>0B^{\ast}_{ij}>0 for all i,ji,j in distinct clusters as desired. ∎

3 Proofs for Section 4: Binary censored block model

Our analysis of the SDP relies on two key ingredients: the spectrum of labeled Erdős-Rényi random graph and the tail bounds for the binomial distributions, which we first present.

Let E=(Eij)E=(E_{ij}) denote an n×nn\times n matrix with independent entries drawn from μ^p2δ1+p2δ1+(1p)δ0\widehat{\mu}\triangleq\frac{p}{2}\delta_{1}+\frac{p}{2}\delta_{-1}+(1-p)\delta_{0}, which is the distribution of a Rademacher random variable multiplied with an independent Bernoulli with bias pp. Define EE^{\prime} as Eii=EiiE^{\prime}_{ii}=E_{ii} and Eij=EjiE^{\prime}_{ij}=-E_{ji} for all iji\neq j. Let AA^{\prime} be an independent copy of AA. Let DD be a zero-diagonal symmetric matrix whose entries are drawn from μ^\widehat{\mu} and DD^{\prime} be an independent copy of DD. Let M=(Mij)M=(M_{ij}) denote an n×nn\times n zero-diagonal symmetric matrix whose entries are Rademacher and independent from CC and CC^{\prime}. We apply the usual symmetrization arguments:

where (a),(d)(a),(d) follow from the Jensen’s inequality; (b)(b) follows because AAA-A^{\prime} has the same distribution as (AA)M(A-A^{\prime})\circ M, where \circ denotes the element-wise product; (c),(f)(c),(f) follow from the triangle inequality; (e)(e) follows from the fact that DDD-D^{\prime} has the same distribution as EEE-E^{\prime}. In particular, first, the diagonal entries of DDD-D^{\prime} and EEE-E^{\prime} are all equal to zero. Second, both DDD-D^{\prime} and EEE-E^{\prime} are symmetric matrices with independent upper triangular entries. Third, DijDijD_{ij}-D^{\prime}_{ij} is equal in distribution to EijEijE_{ij}-E^{\prime}_{ij} for all i<ji<j by definition.

Then, we apply the result of Seginer which characterized the expected spectral norm of i.i.d. random matrices within universal constant factors. Let Xji=1nEij2X_{j}\triangleq\sum_{i=1}^{n}E_{ij}^{2}, which are independent Binom(n,p){\rm Binom}(n,p). Since μ^\widehat{\mu} is symmetric, [39, Theorem 1.1] and Jensen’s inequality yield

for some universal constant κ\kappa. In view of the following Chernoff bound for the binomial distribution [31, Theorem 4.4]:

for all t6npt\geq 6np, setting t0=6max{np/logn,1}t_{0}=6\max\{np/\log n,1\} and applying the union bound, we have

where the last inequality follows from npc0lognnp\geq c_{0}\log n. Assembling (70) – (72), we obtain

Assume that kn[m]k_{n}\in[m] and kn=(1+o(1))lognloglognk_{n}=(1+o(1))\frac{\log n}{\log\log n}. Then

If ϵ=0\epsilon=0, then i=1mXiBinom(m,p)\sum_{i=1}^{m}X_{i}\sim{\rm Binom}(m,p) and the lemma follows from Lemma 1. Next we focus on the case ϵ>0\epsilon>0. It follows from the Chernoff bound that

Hence, by setting x=kn/mx=k_{n}/m, we get λ=12log1ϵϵ+o(1)\lambda^{\ast}=\frac{1}{2}\log\frac{1-\epsilon}{\epsilon}+o(1) and thus

where the last equality holds due to the Taylor expansion of log(1x)\log(1-x) at x=0x=0 and p=alogn/np=a\log n/n. Combining the last displayed equation with (75) gives the desired (74). ∎

The following lemma establishes a lower tail bound for i=1mXi\sum_{i=1}^{m}X_{i}.

Let k2aϵ(1ϵ)lognk^{\ast}\triangleq\lfloor 2a\sqrt{\epsilon(1-\epsilon)}\log n\rfloor. Notice that i=1mXi2Binom(m,p)\sum_{i=1}^{m}X^{2}_{i}\sim{\rm Binom}(m,p). Let Z1,Z2,,Zni.i.d.(1ϵ)δ+1+ϵδ1.Z_{1},Z_{2},\ldots,Z_{n}{\stackrel{{\scriptstyle\text{i.i.d.}}}{{\sim}}}(1-\epsilon)\delta_{+1}+\epsilon\delta_{-1}. Then

We use the following non-asymptotic bound on the binomial tail probability [9, Lemma 4.7.2]: For UBinom(n,p)U\sim{\rm Binom}(n,p),

where λ=kn(0,1)p\lambda=\frac{k}{n}\in(0,1)\geq p and D(λp)=λlogλp+(1λ)log1λ1pD(\lambda\|p)=\lambda\log\frac{\lambda}{p}+(1-\lambda)\log\frac{1-\lambda}{1-p} is the binary divergence function. Let WBinom(k,ϵ)W\sim{\rm Binom}(k^{\ast},\epsilon). Then,

Moreover, using the following bound on binomial coefficients [9, Lemma 4.7.1]:

where λ=kn(0,1)\lambda=\frac{k}{n}\in(0,1) and h(λ)=λlogλ(1λ)log(1λ)h(\lambda)=-\lambda\log\lambda-(1-\lambda)\log(1-\lambda) is the binary entropy function, we have that

Observe that by the definition of kk^{\ast}, logalognk=D(1/2ϵ)+o(1)\log\frac{a\log n}{k^{\ast}}=D(1/2\|\epsilon)+o(1) and it follows from (76) that

The following lemma provides a deterministic sufficient condition for the success of SDP (13) in the case of a>ba>b.

Suppose there exist D=diag{di}D^{\ast}=\mathsf{diag}\left\{{d^{\ast}_{i}}\right\} such that SDAS^{*}\triangleq D^{\ast}-A satisfies S0S^{\ast}\succeq 0, λ2(S)>0\lambda_{2}(S^{\ast})>0 and

Then Y^SDP=Y\widehat{Y}_{{\rm SDP}}=Y^{\ast} is the unique solution to (13).

where the Lagrangian multipliers are S0S\succeq 0 and D=diag{di}D=\mathsf{diag}\left\{{d_{i}}\right\}. Then for any YY satisfying the constraints in (13),

where (a)(a) holds because S,Y0\langle S^{\ast},Y\rangle\geq 0; (b)(b) holds because Y,S=(σ)Sσ=0\langle Y^{\ast},S^{\ast}\rangle=(\sigma^{\ast})^{\top}S^{\ast}\sigma^{\ast}=0 by (77). Hence, YY^{\ast} is an optimal solution. It remains to establish its uniqueness. To this end, suppose Y~{\widetilde{Y}} is an optimal solution. Then,

where (a)(a) holds because A,Y~=A,Y\langle A,{\widetilde{Y}}\rangle=\langle A,Y^{\ast}\rangle and Y~ii=Yii=1{\widetilde{Y}}_{ii}=Y^{*}_{ii}=1 for all i[n]i\in[n]. In view of (77), since Y~0{\widetilde{Y}}\succeq 0, S0S^{\ast}\succeq 0 with λ2(S)>0\lambda_{2}(S^{*})>0, Y~{\widetilde{Y}} must be a multiple of Y=σ(σ)Y^{*}=\sigma^{\ast}(\sigma^{\ast})^{\top}. Because Y~ii=1{\widetilde{Y}}_{ii}=1 for all i[n]i\in[n], Y~=Y{\widetilde{Y}}=Y^{\ast}. ∎

Let D=diag{di}D^{\ast}=\mathsf{diag}\left\{{d^{\ast}_{i}}\right\} with

It suffices to show that S=DAS^{*}=D^{\ast}-A satisfies the conditions in Lemma 9 with high probability.

By definition, diσi=jAijσjd^{\ast}_{i}\sigma_{i}^{\ast}=\sum_{j}A_{ij}\sigma^{\ast}_{j} for all ii, i.e., Dσ=AσD^{\ast}\sigma^{\ast}=A\sigma^{\ast}. Thus (77) holds, that is, Sσ=0S^{*}\sigma^{*}=0. It remains to verify that S0S^{\ast}\succeq 0 and λ2(S)>0\lambda_{2}(S^{\ast})>0 with probability converging to one, which amounts to showing that

Applying the union bound implies that mini[n]dilognloglogn\min_{i\in[n]}d^{\ast}_{i}\geq\frac{\log n}{\log\log n} holds with probability at least 1n1a(1ϵϵ)2+o(1)1-n^{1-a(\sqrt{1-\epsilon}-\sqrt{\epsilon})^{2}+o(1)}. It follows from the assumption a(1ϵϵ)2>1a(\sqrt{1-\epsilon}-\sqrt{\epsilon})^{2}>1 and (80) that the desired (79) holds, completing the proof. ∎

The prior distribution of σ\sigma^{\ast} is uniform over {±1}n\{\pm 1\}^{n}. First consider the case of ϵ=0\epsilon=0. If a<1a<1, then the number of isolated vertices tends to infinity in probability . Notice that for isolated vertices ii, vertex σi\sigma^{\ast}_{i} is equally likely to be +1+1 or 1-1 conditional on the graph. Hence, the probability of exact recovery converges to .

Let TT denote the set of first nlog2n\lfloor\frac{n}{\log^{2}n}\rfloor vertices and Tc=[n]\TT^{c}=[n]\backslash T. Let si=jTc:σj=σiAijs^{\prime}_{i}=\sum_{j\in T^{c}:\sigma^{\ast}_{j}=\sigma^{\ast}_{i}}A_{ij} and ri=jTc:σjσiAijr^{\prime}_{i}=\sum_{j\in T^{c}:\sigma^{\ast}_{j}\neq\sigma^{\ast}_{i}}A_{ij}. Then

4 Proofs for Section 5: General cluster structure

We first present a dual certificate lemma which is useful for the proof of Theorem 8. Recall that ξk\xi_{k}^{*} denotes the indicator vector of cluster kk for k[r]k\in[r] and Z=kξkξk.Z^{*}=\sum_{k}\xi_{k}^{\top}\xi_{k}.

(If the penalized SDP (22) is used, the same η\eta^{*} and λ\lambda^{*} should be used in the SDP and in this lemma.) Then ZZ^{*} is the unique solution to both SDP (15) and (22) (i.e., Z^SDP\widehat{Z}_{SDP} produced by either SDP is equal to ZZ^{*}).

Let H=ZZ,H=Z-Z^{*}, where ZZ is either an arbitrary feasible matrix for the SDP (15) or an arbitrary feasible matrix for the SDP (22). Since AηIλJ=DBS,A-\eta^{*}\mathbf{I}-\lambda^{*}\mathbf{J}=D^{*}-B^{*}-S^{*},

D,H0,\langle D^{*},H\rangle\leq 0, with equality if and only if Zii=1Z_{ii}=1 for all inlier s i.i. That is because for inliers ii, di>0d^{*}_{i}>0, Zii1=Zii,Z_{ii}\leq 1=Z^{*}_{ii}, and for outliers i,i, di=0.d^{*}_{i}=0.

B,H0,\langle B^{*},H\rangle\geq 0, with equality if and only if B,Z=0.\langle B^{*},Z\rangle=0. That is because B0,B^{*}\geq 0, Z0,Z\geq 0, and B,Z=0.\langle B^{*},Z^{*}\rangle=0.

S,H0,\langle S^{*},H\rangle\geq 0, with equality if and only if S,Z=0.\langle S^{*},Z\rangle=0. That is because S,Z0\langle S^{*},Z\rangle\geq 0 (because S,Z0S^{*},Z\succeq 0) and S,Z=0\langle S^{*},Z^{*}\rangle=0 (because ZZ^{*} is a sum of matrices of the form ξkξk\xi_{k}\xi_{k}^{\top} and Sξk=0S^{*}\xi_{k}=0 for all k.k.)

Thus, A,HηI,HλJ,H0.\langle A,H\rangle-\eta^{*}\langle\mathbf{I},H\rangle-\lambda^{*}\langle\mathbf{J},H\rangle\leq 0. Therefore, ZZ^{*} is a solution to SDP (22). If ZZ is a feasible solution for the SDP (15), (as ZZ^{*} is), then I,H=J,H=0,\langle\mathbf{I},H\rangle=\langle\mathbf{J},H\rangle=0, so we conclude that A,H0,\langle A,H\rangle\leq 0, so ZZ^{*} is also a solution to SDP (15).

To prove that ZZ^{*} is the unique solution, restrict attention to the case that ZZ is another solution to either one of the SDPs. We need to show Z=Z.Z=Z^{*}. Since both ZZ and ZZ^{*} are solutions, A,HηI,HλJ,H0,\langle A,H\rangle-\eta^{*}\langle\mathbf{I},H\rangle-\lambda^{*}\langle\mathbf{J},H\rangle\leq 0, so that D,H=B,H=S,H=0.\langle D^{*},H\rangle=\langle B^{*},H\rangle=\langle S^{*},H\rangle=0. Therefore, by the above three points: Zii=1Z_{ii}=1 for all inliers ii, and B,Z=S,Z=0.\langle B^{*},Z\rangle=\langle S^{*},Z\rangle=0.

Since Bij>0B^{*}_{ij}>0 whenever ii and jj are in distinct clusters, and Z0Z\geq 0 and B0B^{*}\geq 0, the condition B,Z=0\langle B^{*},Z\rangle=0 implies that Zij=0Z_{ij}=0 whenever ii and jj are in distinct clusters. By assumption, ξk\xi_{k}^{*} is an eigenvector of SS^{*} with corresponding eigenvalue zero, for 1kr.1\leq k\leq r. Since λr+1(S)>0\lambda_{r+1}(S)>0, it follows that all the other eigenvalues of SS^{*} are strictly positive. The condition S,Z=0\langle S^{*},Z\rangle=0 thus implies that all the other eigenvectors of SS^{*} are in the null space of Z,Z, so the eigenvectors of ZZ corresponding to the positive eigenvalues of ZZ must be in the span of ξ1,,ξr.\xi^{\ast}_{1},\ldots,\xi^{\ast}_{r}. It follows that ZZ is a linear combination of matrices of the form ξk(ξk),\xi^{\ast}_{k}(\xi^{\ast}_{k^{\prime}})^{\top}, for k,k[r].k,k^{\prime}\in[r]. It follows that Zij=0Z_{ij}=0 if either ii or jj is an outlier vertex, or both are outlier vertices. Moreover, whenever ii and jj are in the same cluster, Zij=Zii=1Z_{ij}=Z_{ii}=1. In conclusion, Z=ZZ=Z^{\ast}. ∎

For k[r]k\in[r], denote by Ck[n]C_{k}\subset[n] the support of the kthk^{\rm th} cluster. Also, let C0C_{0} denote the set of outlier vertices. For a set TT of vertices, let e(i,T)jTAije(i,T)\triangleq\sum_{j\in T}A_{ij} and e(T,T)=iTe(i,T).e(T^{\prime},T)=\sum_{i\in T^{\prime}}e(i,T). Let k(i)k(i) denote the index of the cluster containing vertex i.i. Denote the number of neighbors of ii in its own cluster by si=e(i,Ck(i))s_{i}=e(i,C_{k(i)}) and the maximum number of neighbors of ii in other clusters by ri=maxkk(i)e(i,Ck).r_{i}=\max_{k^{\prime}\neq k(i)}e(i,C_{k^{\prime}}).

Now, let us construct (D,B,η,λ)(D^{*},B^{*},\eta^{*},\lambda^{*}) such that the conditions of Lemma 10 hold with high probability. Notice that di=0d^{*}_{i}=0 if ii is an outlier and BCk×Ck=0B^{*}_{C_{k}\times C_{k}}=0 for k[r]k\in[r]. In order that (Sξk)i=0(S^{*}\xi_{k}^{*})_{i}=0 for iCki\in C_{k} and k[r]k\in[r], we must choose:

The condition Sξk=0S^{*}\xi_{k}^{*}=0 for k[r]k\in[r] also partially constrains the symmetric matrix B.B^{*}. We should try to be economical in the choice of BB^{*} so that we have a chance to prove that S0.S^{*}\succeq 0.

where we used the fact that for each pair of distinct kk and kk^{\prime}, each of the terms in the definition of BCk×Ck(i,j)B^{*}_{C_{k}\times C_{k^{\prime}}}(i,j) is either constant in ii or constant in jj, or both, and if k=0k=0 the terms are constant in jj and if k=0k^{\prime}=0 the terms are constant in i.i. The needed condition di0d_{i}^{*}\geq 0 involves getting a lower bound on the number of edges a vertex ii has to other vertices in its own cluster (we can concentrate on the smallest cluster for that purpose), while the needed condition B0B\geq 0 involves an upper bound on the number of edges between a vertex ii in one cluster and the vertices of a different cluster.

where ξ0\xi_{0} is the indicator function for the set of outlier vertices and we used the fact that J=11\mathbf{J}=\mathbf{1}\mathbf{1}^{\top} and 1=(1ξ0)+ξ0.\mathbf{1}=(\mathbf{1}-\xi_{0})+\xi_{0}. From this it is clear that if λq,\lambda^{*}\geq q, then λr+1(S)>0.\lambda_{r+1}(S^{\ast})>0. So we will be sure to select λq.\lambda^{*}\geq q. In fact, that will be needed to ensure that Bij0B_{ij}\geq 0 for all i,j.i,j.

It remains to select λ\lambda^{*} so that di0d_{i}\geq 0 and Bij0B_{ij}\geq 0 for all i,ji,j with high probability. Let λ=τ~logn/n\lambda^{*}=\widetilde{\tau}\log n/n with τ~=b+ψ1+ψ2\widetilde{\tau}=b+\psi_{1}+\psi_{2}, where ψ1\psi_{1} and ψ2\psi_{2} satisfy the assumptions (28)-(31). Then, for inlier vertex iCki\in C_{k}, in view of Lemma 1 and the definition of I(,)I(\cdot,\cdot) in (27),

Turning next to BijB^{\ast}_{ij}’s for (i,j)Ck×Ck,(i,j)\in C_{k}\times C_{k^{\prime}}, it suffices to consider the two following cases:

Note that e(Ck,Ck)KkKk\frac{e(C_{k},C_{k^{\prime}})}{K_{k}K_{k^{\prime}}} will be very close to qq with high probability, so we can replace it by q=blognnq=\frac{b\log n}{n}, which is also the mean of e(i,Ck)Kk\frac{e(i,C_{k^{\prime}})}{K_{k^{\prime}}} and e(j,Ck)Kk.\frac{e(j,C_{k})}{K_{k}}. Specifically, it follows from the Chernoff bound that

where μ=qKkKk\mu=qK_{k}K_{k^{\prime}} and ϵ=2lognqKkKk.\epsilon=\frac{2\sqrt{\log n}}{\sqrt{qK_{k}K_{k^{\prime}}}}. In view of Lemma 1 and the union bound,

By the assumptions ρrI(b,b+ψ1)>1\rho_{r}I(b,b+\psi_{1})>1 and ρr1I(b,b+ψ2)>1\rho_{r-1}I(b,b+\psi_{2})>1, it follows that with high probability BCk×Ck>0B^{\ast}_{C_{k}\times C_{k^{\prime}}}>0.

By the assumptions ρrI(b,τ~)>1\rho_{r}I(b,\widetilde{\tau})>1, it follows that with high probability BCk×Ck0B^{\ast}_{C_{k}\times C_{k^{\prime}}}\geq 0.

In conclusion, we have constructed (D,B,η,λ)(D^{*},B^{*},\eta^{*},\lambda^{*}) such that the conditions of Lemma 10 hold with high probability. Therefore, the theorem follows by applying Lemma 10. ∎

Let τ=ablog(a/b)\tau=\frac{a-b}{\log(a/b)} for 0<a<b.0<a<b. Then I(b,τ)(ab)22I(b,τ).I(b,\tau)\leq(\sqrt{a}-\sqrt{b})^{2}\leq 2I(b,\tau).

Notice that I(a,x)+I(b,x)I(a,x)+I(b,x) is strictly convex in xx. By setting the derivative to be zero, we find that it achieves its minimum value, (ab)2,(\sqrt{a}-\sqrt{b})^{2}, at x=ab.x=\sqrt{ab}. By definition, I(a,τ)=I(b,τ)I(a,\tau)=I(b,\tau) and abτ.\sqrt{ab}\leq\tau. Thus, (ab)2I(a,τ)+I(b,τ)=2I(b,τ)(\sqrt{a}-\sqrt{b})^{2}\leq I(a,\tau)+I(b,\tau)=2I(b,\tau). Moreover, I(a,x)I(a,x) is decreasing for xax\leq a and I(b,x)I(b,x) is non-negative. Therefore, I(b,τ)=I(a,τ)I(a,ab)(ab)2.I(b,\tau)=I(a,\tau)\leq I(a,\sqrt{ab})\leq(\sqrt{a}-\sqrt{b})^{2}.

For any μ>0\mu>0 and x>0,x>0, I(μ,μ+2x)4I(μ,μ+x).I(\mu,\mu+2x)\leq 4I(\mu,\mu+x).

Let f(x)=I(μ,μ+x)f(x)=I(\mu,\mu+x) for x0x\geq 0. Then f(0)=f(0)=0f(0)=f^{\prime}(0)=0 and f(s)=1μ+s.f^{\prime\prime}(s)=\frac{1}{\mu+s}. Therefore,

Thus, using a change of variables s=2ts=2t,

Comparing the expressions for f(x)f(x) and f(2x)f(2x) completes the proof. ∎

Appendix A Behavior of threshold function in (4)

Recall η(ρ,a,b)\eta(\rho,a,b) defined in (4) which governs the sharp recovery threshold for the asymmetric binary SBM. The following lemma implies η(ρ,a,b)\eta(\rho,a,b) is minimized at ρ=1/2.\rho=1/2.

For any a>b>0a>b>0, η(ρ,a,b)\eta(\rho,a,b) is convex in ρ\rho over ,, and symmetric about ρ=1/2.\rho=1/2.

and from this expression it is easily checked that η\eta is symmetric about ρ=1/2.\rho=1/2.

Let η,η\eta^{\prime},\eta^{\prime\prime} denote the first-order and second-order derivative of η\eta with respect to ρ\rho, respectively. We show that η0\eta^{\prime\prime}\geq 0. Recall that γ=(12ρ)2τ2+4ρ(1ρ)ab\gamma=\sqrt{(1-2\rho)^{2}\tau^{2}+4\rho(1-\rho)ab}. Hence,

Let h(ρ)=log(γ+(12ρ)τ)ρ(γ(12ρ)τ)(1ρ)h(\rho)=\log\frac{(\gamma+(1-2\rho)\tau)\rho}{(\gamma-(1-2\rho)\tau)(1-\rho)} and then

where (a)(a) follows using the expression of γ\gamma. Therefore,

where the last inequality follows because by letting x=ab/τ2x=ab/\tau^{2},

Appendix B A data-driven choice of the penalization parameter in (5)

Set ρ^=1n1{wiw^}\widehat{\rho}=\frac{1}{n}\sum{\mathbf{1}_{\left\{{w_{i}\leq\widehat{w}}\right\}}}, w^+=1nwi1{wi>w^}\widehat{w}_{+}=\frac{1}{n}\sum w_{i}{\mathbf{1}_{\left\{{w_{i}>\widehat{w}}\right\}}} and w^=1nwi1{wi<w^}\widehat{w}_{-}=\frac{1}{n}\sum w_{i}{\mathbf{1}_{\left\{{w_{i}<\widehat{w}}\right\}}}, which are consistent estimates for ρ,w+,w\rho,w_{+},w_{-}, respectively. From these we can readily obtain consistent estimates for (a,b,ρ)(a,b,\rho) whenever ρ1/2\rho\neq 1/2. Furthermore, when ρ=1/2\rho=1/2, we claim that

Now we are ready to choose the penalty parameter λ^=λ^(A)\widehat{\lambda}=\widehat{\lambda}(A), so that Theorem 3 continues to hold upon replacing the deterministic λ\lambda^{*} by λ^\widehat{\lambda}. Let

Let d~1=j2A1j\widetilde{d}_{1}=\sum_{j\neq 2}A_{1j} and d~2=j1A2j\widetilde{d}_{2}=\sum_{j\neq 1}A_{2j}. Let w~i=d~i/logn\widetilde{w}_{i}=\widetilde{d}_{i}/\log n for i=1,2i=1,2. Then

References