Semidefinite Programs for Exact Recovery of a Hidden Community

Bruce Hajek, Yihong Wu, Jiaming Xu

Introduction

Consider the stochastic block model (SBM) with a single community, where out of nn vertices a community consisting of KK vertices are chosen uniformly at random; two vertices are connected by an edge with probability pp if they both belong to the community and with probability qq if either one of them is not in the community. The goal is to recover the community based on observation of the graph, which, when p>qp>q, is also known as the planted dense subgraph recovery problem .

In the special case of p=1p=1 and q=1/2q=1/2, planted dense subgraph recovery reduces to the widely-studied planted clique problem, i.e., finding a hidden clique of size KK in the Erdős-Rényi random graph G(n,1/2){\mathcal{G}}(n,1/2). It is well-known that the maximum likelihood estimator (MLE), which is computationally intractable, finds any clique of size K2(1+ϵ)log2nK\geq 2(1+\epsilon)\log_{2}n for any constant ϵ>0\epsilon>0; however, existing polynomial-time algorithms, including spectral methods , message passing , and semi-definite programming (SDP) relaxation of MLE , are only known to find a clique of size KϵnK\geq\epsilon\sqrt{n}. In fact, impossibility results for the more powerful ss-round Lovász-Schrijver relaxations and, more recently, degree-2r2r sums-of-squares (SOS) relaxation (with s=1s=1 and r=1r=1 corresponding to SDP) have been recently obtained in and , showing that relaxations of constant rounds or degrees lead to order-wise suboptimality even for detecting the clique. In other words, for the planted clique problem there is a significant gap between the state of the art of polynomial-time algorithms and what is information-theoretically possible.

In sharp contrast, for sparser graphs and larger community size, SDP relaxations have been shown to achieve the information-theoretic recovery limit up to sharp constants. For p=alogn/n,q=blogn/np=a\log n/n,q=b\log n/n and K=ρnK=\rho n for fixed constants a,b>0a,b>0 and 0<ρ<10<\rho<1, the recent work identified a sharp threshold ρ=ρ(a,b)\rho^{\ast}=\rho^{\ast}(a,b) such that if ρ>ρ\rho>\rho^{\ast}, an SDP relaxation of MLE recovers the hidden community with high probability; if ρ<ρ\rho<\rho^{\ast}, exact recovery is information theoretically impossible. This optimality result of SDP has been extended to multiple communities as long as their sizes scale linearly with the graph size nn .

The dichotomy between the optimality of SDP up to sharp constants in the relatively sparse regime and the order-wise suboptimality of SDP in the dense regime prompts us to investigate the following question:

When do SDP relaxations cease to be optimal for planted dense subgraph recovery?

In this paper, we address this question under the more general hidden community model considered in .

Let CC^{*} be drawn uniformly at random from all subsets of [n][n] of cardinality KK. Given probability measures PP and QQ on a common measurable space X{\mathcal{X}}, let AA be an n×nn\times n symmetric matrix with zero diagonal where for all 1i<jn1\leq i<j\leq n, AijA_{ij} are mutually independent, and AijPA_{ij}\sim P if i,jCi,j\in C^{*} and AijQA_{ij}\sim Q otherwise.

We are particularly interested in the following choices of PP and QQ:

Bernoulli case: P=Bern(p)P={\rm Bern}(p) and Q=Bern(q)Q={\rm Bern}(q) with 0q<p10\leq q<p\leq 1. In this case, the data matrix AA corresponds to the adjacency matrix of a graph, and the problem reduces to planted dense subgraph recovery.

Gaussian case: P=N(μ,1)P={\mathcal{N}}(\mu,1) and Q=N(0,1)Q={\mathcal{N}}(0,1) with μ>0\mu>0. In this case, the submatrix of AA with row and column indices in CC^{\ast} has a positive mean μ\mu except on the diagonal, while the rest of AA has zero mean, and the problem corresponds to a symmetric version of the submatrix localization problem studied in .

2 Main results

We show that for both planted dense subgraph recovery and submatrix localization, SDP relaxations of MLE achieve the information-theoretic optimal threshold if and only if the hidden community size satisfies K=ω(nlogn)K=\omega(\frac{n}{\log n}). More specifically,

K=ω(nlogn)K=\omega(\frac{n}{\log n}), SDP attains the information-theoretic recovery limits with sharp constants. This extends the previous result in obtained for K=Θ(n)K=\Theta(n) and the Bernoulli case.

K=Θ(nlogn)K=\Theta(\frac{n}{\log n}), SDP is order-wise optimal, but strictly suboptimal by a constant factor;

K=o(nlogn)K=o(\frac{n}{\log n}) and KK\to\infty, SDP is order-wise suboptimal.

To establish our main results, we derive a sufficient condition and a necessary condition under which the optimal solution to SDP is unique and coincides with the true cluster matrix. In particular, for planted dense subgraph recovery, whenever SDP does not achieve the information-theoretic threshold, our sufficient condition and necessary condition are within constant factors of each other; for submatrix localization, we characterize the minimal signal-to-noise ratio required by SDP within a factor of four when K=ω(n)K=\omega(\sqrt{n}). The sufficiency proof is similar to those in based on the dual certificate argument; we extend the construction and validation of dual certificates for the success of SDP to the general distributions P,QP,Q. The necessity proof is via constructing a high-probability feasible solution to the SDP by means of random perturbation of the ground truth that leads to a higher objective value. One could instead adapt the existing constructions in the SOS literature for planted clique to our setting, but it falls short of establishing the impossibility of SDP to attain the optimal recovery threshold in the critical regime of K=Θ(n/logn)K=\Theta(n/\log n); see Remark 2 for details.

An alternative approach to establish impossibility results for SDP, thanks to strong duality that holds for the specific program, is to prove the non-existence of dual certificates, which turns out to yield the same condition given by the aforementioned explicit construction of primal solutions. The dual-based method has been previously used for proving necessary conditions for related nuclear-norm constrained optimization problems, see e.g., ; however, the constants in the derived conditions are often loose or unspecified. In comparison, we aim to obtain necessary conditions for SDP relaxations with explicit constants. Another difference is that the specific SDP considered here is more complicated involving the stringent positive semi-definite constraint and a set of equality and non-negativity constraints.

Using similar techniques, we obtain analogous results for SDP relaxation for SBM with logarithmically many communities. Specifically, consider the network of n=rKn=rK vertices partitioned into rr communities of cardinality KK each, with edge probability pp for pairs of vertices within communities and qq for other pairs of vertices. Then SDP relaxation, in contrast to the MLE, is constantwise suboptimal if rClognr\geq C\log n for sufficiently large CC, and orderwise suboptimal if r=ω(logn)r=\omega\left(\log n\right). That is, it is constantwise suboptimal if KcnlognK\leq\frac{cn}{\log n} for sufficiently small cc, and orderwise suboptimal if K=o(nlogn)K=o\left(\frac{n}{\log n}\right). This result complements the sharp optimality for SDP previously established in for r=O(1)r=O(1) and extended to r=o(logn)r=o(\log n) in .

3 Notation

Semidefinite programming relaxations

which maximizes the sum of entries among all K×KK\times K principal submatrices of LL.

If LL is the log likelihood ratio (LLR) matrix with f(Aij)=logdPdQ(Aij)f(A_{ij})=\log\frac{dP}{dQ}(A_{ij}) for iji\neq j and Lii=0L_{ii}=0, then ξ^\widehat{\xi} is precisely the MLE of ξ\xi^{\ast}. In general, evaluating the MLE requires knowledge of KK and the distributions P,QP,Q. Computing the MLE is NP hard in the worst case for general values of nn and KK since certifying the existence of a clique of a specified size in an undirected graph, which is known to be NP complete , can be reduced to computation of the MLE. This intractability of the MLE prompts us to consider its semidefinite programming relaxation as studied in . Note that (1) can be equivalentlyHere (1) and (2) are equivalent in the following sense: For any feasible ξ\xi for (1), Z=ξξZ=\xi\xi^{\top} is feasible for (2); Any feasible ZZ for (2) can be written as Z=ξξZ=\xi\xi^{\top} such that either ξ\xi or ξ-\xi is feasible for (1). formulated as

Replacing the rank-one constraint by the positive semidefinite constraint leads to the following convex relaxation of (2), which can be cast as a semidefinite program:Z^SDP\widehat{Z}_{{\rm SDP}} as argmax\mathop{\arg\max} denotes the set of maximizers of the optimization problem (3). If ZZ^{*} is the unique maximizer, we write Z^SDP=Z\widehat{Z}_{{\rm SDP}}=Z^{*}.

Recall that if LL is the LLR matrix, then the solution ξ^\widehat{\xi} to (1) is precisely the MLE of ξ\xi^{\ast}. In the Gaussian case, logdPdQ(Aij)=μ(Aijμ/2)\log\frac{dP}{dQ}(A_{ij})=\mu(A_{ij}-\mu/2) with μ>0\mu>0 for iji\neq j; in the Bernoulli case, logdPdQ(Aij)=logp(1q)q(1p)Aij+log1p1q\log\frac{dP}{dQ}(A_{ij})=\log\frac{p(1-q)}{q(1-p)}A_{ij}+\log\frac{1-p}{1-q} with p>qp>q for iji\neq j. Thus, in both cases, (3) with L=AL=A corresponds to a semidefinite programming relaxation of the MLE, and the only model parameter needed for evaluating (3) is the cluster size KK.

Analysis of SDP in the general model

In this section, we give a sufficient condition and a necessary condition, both deterministic, for the success of SDP (3) for exact recovery. Define

The sufficient condition of Theorem 1 is derived via the dual certificate argument. That is, we give an explicit construction of dual variables which together with ZZ^{*} are shown to satisfy the KKT conditions under the condition (5).

There is no feasible solution to (6) unless 1am1\leq a\leq m, so by convention, let Vm(a)=V_{m}(a)=-\infty if a<1a<1 or a>ma>m. Dropping the second and the last constraints in (6), yields Vm(a)λmax(M)V_{m}(a)\leq\lambda_{\max}(M). Also, Vm(1)=0,V_{m}(1)=0, Vm(m)=M,J/m,V_{m}(m)=\langle M,\mathbf{J}\rangle/m, and aVm(a)a\mapsto V_{m}(a) is concave on [1,m][1,m]. Clearly, the distributions of MM as well as Vm(a)V_{m}(a) depend on the distribution QQ but not PP.

Fix K,n,C,K,n,C^{*}, the matrix LL, and a[1,K]a\in[1,K]. Also, let r=aKr=\frac{a}{K}. For ease of notation suppose the indices are permuted so that C=[K],C^{*}=[K], index KK minimizes e(i,C)e(i,C^{*}) over all iC,i\in C^{*}, and index K+1K+1 maximizes e(j,C)e(j,C^{*}) over all j∉Cj\not\in C^{*}. Let UU be an n×nn\times n matrix corresponding to the solution of the SDP defining Vm(a)V_{m}(a) with M=L(C)c×(C)cM=L_{(C^{*})^{c}\times(C^{*})^{c}} in (6). That is, UU is a symmetric n×nn\times n matrix with Uij=0U_{ij}=0 if (i,j)∉(C)c×(C)c,(i,j)\not\in(C^{*})^{c}\times(C^{*})^{c}, Vm(a)=L,UV_{m}(a)=\langle L,U\rangle, U0U\succeq 0, U0U\geq 0, Tr(U)=1,\mathsf{Tr}(U)=1, and J,U=a=Kr\langle\mathbf{J},U\rangle=a=Kr.

Next we give intuition about the construction of primal feasible solutions via random perturbation that lead to a necessary condition for SDP. Three positive semidefinite perturbations of ZZ^{*}, namely Z+δiZ^{*}+\delta_{i} for 1i3,1\leq i\leq 3, can be defined for 0<ϵ<1/20<\epsilon<1/2 by letting (dashed lines delineate the K×KK\times K submatrices and only nonzero entries are shown):

It turns out that Z+δ1+δ2+δ3Z^{*}+\delta_{1}+\delta_{2}+\delta_{3} is close to being positive semidefinite for small ϵ\epsilon. Also,

Therefore, up to o(ϵ)o(\epsilon) terms, Z+δ1+δ2+δ3Z^{*}+\delta_{1}+\delta_{2}+\delta_{3} satisfies the two equality constraints of the SDP (3) and is near a feasible solution of the SDP (3), suggesting that a necessary condition for the optimality of ZZ^{*} is L,δ1+δ2+δ30\langle L,\delta_{1}+\delta_{2}+\delta_{3}\rangle\leq 0. Note that

Hence the term inside the parenthesis must be non-positive. This leads the following deterministic necessary condition for SDP. The proof, given in Section 8.2, is a minor variation of the heuristic argument just presented.

If ZZ^SDPZ^{*}\in\widehat{Z}_{{\rm SDP}}, then

Setting a=Ka=K in (24) yields the weaker necessary condition: miniCe(i,C)VnK(K)\min_{i\in C^{\ast}}e(i,C^{*})\geq V_{n-K}(K).

The problem formulation as well as proof technique of Theorem 2 differ from existing results on the planted clique problem for the sum of squares (SoS) hierarchy in an essential way. Aside from the fact that those papers consider more powerful convex relaxations, they address the clique detection problem (which do have implications for clique estimation), which can be viewed as testing the null hypothesis H0:H_{0}: clique absent versus the alternative H1:H_{1}: clique present, using the value of the SOS program as the test statistic. The approach of these papers involves only the null hypothesis, showing that a feasible solution to SOS program can be constructed based on the G(n,1/2){\mathcal{G}}(n,1/2) graph whose objective value is much larger than the size of the largest clique in G(n,1/2){\mathcal{G}}(n,1/2), leading to a large integrality gap. This further induces a high false-positive error probability if the size of the planted clique KK is small. In comparison, since we are dealing with recovery as opposed to detection using SDP, the impossibility result in Theorem 2 follows from the fact that, if the true cluster matrix ZZ^{*} is an optimal solution, then certain random perturbations of ZZ^{*} must not lead to a strictly larger objective value. More precisely, the perturbation argument involves three directions (14)-(23). Note that the matrix UU in (23) is the maximizer of (6) and can be constructed using similar techniques in the SoS literature. However, this perturbation alone is not enough to separate the performance of SDP from MLE in the critical regime K=Θ(n/logn)K=\Theta(n/\log n), and it is necessary to exploit the other perturbations terms (14)-(22) that depend on the true cluster matrix.

Since Slater’s condition and hence strong duality holds for the SDP (3), the fulfillment of the KKT conditions is necessary for ZZ^{*} to be a maximizer. We provide an alternative proof of Theorem 2 in Section 8.2, showing that (24) is necessary for the existence of dual variables to satisfy the KKT conditions together with ZZ^{*}.

By comparing Theorem 1 and Theorem 2, we find that both the sufficient and necessary conditions are in terms of the separation between miniCe(i,C)\min_{i\in C^{*}}e(i,C^{*}) and maxjCe(j,C)\max_{j\notin C^{*}}e(j,C^{*}). In comparison, for the optimal estimator, MLE, to succeed in exact recovery, it is necessary that miniCe(i,C)maxjCe(j,C)\min_{i\in C^{*}}e(i,C^{*})\geq\max_{j\notin C^{*}}e(j,C^{*}); otherwise, one can form a candidate community CC by swapping the node ii in CC^{*} achieving the minimum e(i,C)e(i,C^{*}) with the node jj not in CC^{*} achieving the maximum e(j,C)e(j,C^{*}), so that the new community CC has a likelihood at least as large as that of CC^{*}.

Capitalizing on Theorems 1 and 2, we will derive explicit sufficient and necessary results for the success of SDP in the Gaussian and Bernoulli cases. Interestingly, in both cases, if K=ω(n/logn)K=\omega(n/\log n), the sufficient condition of SDP coincides in the leading terms with the information-theoretic necessary condition for miniCe(i,C)maxjCe(j,C)\min_{i\in C^{*}}e(i,C^{*})\geq\max_{j\notin C^{*}}e(j,C^{*}), thus resulting in the optimality of SDP with the sharp constants.

Submatrix localization

In this section we consider the submatrix localization problem corresponding to the Gaussian case of Definition 1. The SDP relaxation of MLE is given by (3) with L=AL=A.

Assume that K2K\geq 2 and nKnn-K\asymp n. Let ϵ>0\epsilon>0 be an arbitrary constant. If either KK\to\infty and

To deduce (25) from the general sufficient condition (5), we first show that

Next we present a converse result for the exact recovery performance of SDP in a strong sense:

To deduce Theorem 4 from the general necessary condition given in Theorem 2, we first show that the inequalities in (27) and (28) are in fact equalities. Then, we prove a high-probability lower bound to Vm(a)V_{m}(a) and choose a=o(K)a=o(K) in (24) when K=ω(n)K=\omega(\sqrt{n}), and a=Ka=K when K=O(n)K=O(\sqrt{n}).

By comparing sufficient condition (25) and necessary condition (29), we can see that the sufficient condition and necessary condition are within a factor of 44 in the case of K=ω(n).K=\omega(\sqrt{n}).

It is instructive to compare the performance of the SDP to the information-theoretic fundamental limits. We focus on the most interesting regime of KK\to\infty and nKnn-K\asymp n. It has been shown (cf. [20, Theorem 4]) that, for any ϵ>0\epsilon>0, the MLE (which minimizes the probability of error) achieves exact recovery if

no estimator can exactly recover the community with high probability.

Comparing (32) – (33) with (25), (26), and (29)– (31), we arrive at the following conclusion on the performance of the SDP relaxation:

K=ω(n/logn)K=\omega(n/\log n): Since n=o(Klogn)\sqrt{n}=o(\sqrt{K\log n}), in this regime SDP attains the information-theoretically optimal recovery threshold with sharp constant.

K=Θ(n/logn)K=\Theta(n/\log n): SDP is order-wise optimal but strictly suboptimal in terms of constants. More precisely, consider the critical regime of

for fixed constants ρ,μ0>0\rho,\mu_{0}>0. Then MLE succeeds (resp. fails) if ρμ02\rho\mu_{0}^{2} >> (resp. <<) 88. If ρμ0>22ρ+2\rho\mu_{0}>2\sqrt{2\rho}+2, then SDP succeeds; conversely, if SDP succeeds, then ρμ022ρ+1/2\rho\mu_{0}\geq 2\sqrt{2\rho}+1/2. Moreover, it is shown in that a message passing algorithm plus clean-up succeeds if ρμ02>8\rho\mu_{0}^{2}>8 and ρμ0>1/e\rho\mu_{0}>1/\sqrt{{\rm e}}, while a linear message passing algorithm corresponds to a spectral method succeeds if ρμ02>8\rho\mu_{0}^{2}>8 and ρμ0>1\rho\mu_{0}>1. Therefore, SDP is strictly suboptimal comparing to MLE, message passing, and linear message passing for ρ>0\rho>0, ρ>(1/e1/2)2/8\rho>(1/\sqrt{{\rm e}}-1/2)^{2}/8, and ρ>1/32\rho>1/32, respectively. See Fig. 1 for an illustration.

ω(1)K=o(n/logn)\omega(1)\leq K=o(n/\log n): Comparing to MLE, SDP is order-wise suboptimal. Moreover, when Kn1/2δK\leq n^{1/2-\delta} for any fixed constant δ>0\delta>0, μ=Ω(logn)\mu=\Omega(\sqrt{\log n}) is necessary for SDP to achieve exact recovery, while the entrywise hard-thresholding or simply picking the largest entries attains exact recovery when μ(1ϵ)2logn\mu(1-\epsilon)\geq 2\sqrt{\log n}. Thus in this regime, the more sophisticated SDP procedure is only possible to outperform the trivial thresholding algorithm by a constant factor. Similar phenomena has been observed in the bi-clustering problem , which is an asymmetric version of the submatrix localization problem, and the sparse PCA .

K=Θ(1)K=\Theta(1): In this case the sufficient condition of SDP is within a constant factor of the information limit. For the extreme case of K=2K=2, SDP achieves the information limit with optimal constant, namely, μ(1ϵ)2logn\mu(1-\epsilon)\geq 2\sqrt{\log n}; however, in this case exact recovery can be trivially achieved by entrywise hard-thresholding or simply picking the largest entries.

Planted densest subgraph

In this section, we turn to the planted densest subgraph problem corresponding to the Bernoulli case of Definition 1, where P=Bern(p)P={\rm Bern}(p) and Q=Bern(q)Q={\rm Bern}(q) with 0q<p10\leq q<p\leq 1. We prove both positive and negative results for the SDP relaxation of the MLE, i.e., (3) with L=AL=A being the adjacency matrix of the graph, to exactly recover the community CC^{*}.

The following assumption on the community size and graph sparsity will be imposed:

As nn\to\infty, KK\to\infty, nKnn-K\asymp n, qq is bounded away from 11, and nq=Ω(logn)nq=\Omega(\log n).

Our SDP results are in terms of the following quantities:It can be shown that τ1\tau_{1} and τ2\tau_{2} are well-defined whenever exact recovery is information-theoretically possible; see Lemma 15.

To deduce sufficient condition (36) from the general result (5), we first show that with high probability,

Then, we prove that with high probability,

We prove (40) by contradiction: assuming (40) is violated, we construct explicitly a high-probability feasible solution ZZ to (3) based on the optimal solution of SDP defining VnK(K)V_{n-K}(K) given in (6), and show that A,Z=A,Z\langle A,Z\rangle=\langle A,Z^{*}\rangle, contracting the unique optimality of ZZ^{*}. Notice that in the special case of p=1p=1 (planted clique), ZZ^{*} is always a maximizer of the SDP (3) therefore the failure of SDP amounts to multiple maximizers.

To deduce the necessary condition (41) from Theorem 2, we first establish some inequalities similar to (38) and (39) but in the reverse direction. Then, we prove a high-probability lower bound to Vm(a)V_{m}(a) and choose a=1κnq1q+1a=\frac{1}{\kappa}\sqrt{\frac{nq}{1-q}}+1.

Particularizing Theorem 5 and Theorem 6 to the planted clique problem (p=1p=1 and q=1/2q=1/2), we conclude that: for any fixed ϵ>0\epsilon>0, if K2(1+ϵ)nK\geq 2(1+\epsilon)\sqrt{n}, then SDP succeeds (namely, ZZ^{*} is the unique optimal solution to (3)) with high probability; conversely, if K(1ϵ)n/2K\leq(1-\epsilon)\sqrt{n}/2, SDP fails with high probability. In comparison, a message passing algorithm plus clean-up is shown in to succeed if K>(1+ϵ)n/eK>(1+\epsilon)\sqrt{n/{\rm e}}.

Assume that logp(1q)q(1p)\log\frac{p(1-q)}{q(1-p)} is bounded. If K(pq)nq=O(1)\frac{K(p-q)}{\sqrt{nq}}=O(1), then the sufficient condition of SDP given in Theorem 5 reduces to K(τ1τ2)nqΩ(1)\frac{K(\tau_{1}-\tau_{2})}{\sqrt{nq}}\geq\Omega(1), while the necessary condition of SDP given in Theorem 6 reduces to K(τ1τ2)nqΩ(1)\frac{K(\tau_{1}-\tau_{2})}{\sqrt{nq}}\geq\Omega(1). Thus, the sufficient and necessary conditions are within constant factors of each other. If instead K(pq)nq=ω(1)\frac{K(p-q)}{\sqrt{nq}}=\omega(1), then SDP attains the information-theoretic recovery threshold with sharp constants, as shown in the next subsection.

In this section, we compare the performance limits of SDP with the information-theoretic limits of exact recovery obtained in under the assumption that logp(1q)q(1p)\log\frac{p(1-q)}{q(1-p)} is bounded and K/nK/n is bounded away from 11. Let

It is shown in [20, Theorem 3] that, the optimal estimator, MLE, achieves exact recovery if

no estimator can exactly recover the community with high probability.

Next we compare the SDP conditions (Theorems 5 and 6) to the information limit (43)–(44). Without loss of generality, we can assume the MLE necessary conditions holds. Our results on the performance limits of SDP lead to the following observations:

K=ω(n/logn)K=\omega(n/\log n). In this case, (43) implies (36) and thus SDP attains the information-theoretic recovery threshold with sharp constants. To see this, note that Lemma 15 shows that τ1(1ϵ)τ+ϵp\tau_{1}\geq(1-\epsilon)\tau^{\ast}+\epsilon p and τ2(1ϵ)τ+ϵq\tau_{2}\leq(1-\epsilon)\tau^{\ast}+\epsilon q for some small constant ϵ>0\epsilon>0. Moreover, Lemma 12 and Lemma 14 imply that

Therefore, if K=ω(n/logn)K=\omega(n/\log n), (43) implies that Kq=Ω(logn)Kq=\Omega(\log n) and K(pq)/nqK(p-q)/\sqrt{nq}\to\infty, and consequently

which in turn implies condition (36). This result recovers the previous result in the special case of K=ρnK=\rho n, p=alogn/np=a\log n/n, and q=blogn/nq=b\log n/n with fixed constants ρ,a,b\rho,a,b, where SDP has been shown to attain the information-theoretic recovery threshold with sharp constants .

K=o(n/logn)K=o(n/\log n). In this case, condition (41) together with qτ2pq\leq\tau_{2}\leq p and τ1p\tau_{1}\leq p implies that K(pq)/nq=Ω(1)K(p-q)/\sqrt{nq}=\Omega(1). In comparison, in view of (45), K(pq)/nq=ω(Klogn/n)K(p-q)/\sqrt{nq}=\omega(\sqrt{K\log n/n}) is sufficient for the information-theoretic sufficient condition (43) to hold. Hence, in this regime, SDP is order-wise suboptimal.

The above observations imply that a gap between the performance limit of SDP and information-theoretic limit emerges at K=Θ(n/logn)K=\Theta(n/\log n). To elaborate on this, consider the following regime:

where ρ>0\rho>0 and a>b>0a>b>0 are fixed constants. Let I(x,y)xylog(ex/y)I(x,y)\triangleq x-y\log({\rm e}x/y) for x,y>0x,y>0. Let γ1\gamma_{1} satisfy γ1<a\gamma_{1}<a and ρI(a,γ1)=1\rho I(a,\gamma_{1})=1 and γ2\gamma_{2} satisfy γ2>b\gamma_{2}>b and ρI(b,γ2)=1\rho I(b,\gamma_{2})=1. The following corollary follows from the performance limit of MLE given by (43)-(44) and that of SDP given by (36)-(41).

If γ1>γ2\gamma_{1}>\gamma_{2}, then MLE attains exact recovery; conversely, if MLE attains exact recovery, then γ1γ2\gamma_{1}\geq\gamma_{2}.

If ρ(γ1γ2)>4b\rho(\gamma_{1}-\gamma_{2})>4\sqrt{b}, then SDP attains exact recovery; conversely, if SDP attains exact recovery, then ρ(γ1γ2)b/4\rho(\gamma_{1}-\gamma_{2})\geq\sqrt{b}/4.

The proof is deferred to Appendix D. The above corollary implies that in the regime of (46), SDP is order-wise optimal, but strictly suboptimal by a constant factor. In comparison, as shown in , belief propagation plus clean-up succeeds if γ1>γ2\gamma_{1}>\gamma_{2} and ρ(ab)>b/e\rho(a-b)>\sqrt{b/{\rm e}}, while a linear message-passing algorithm corresponding to spectral method succeeds if γ1>γ2\gamma_{1}>\gamma_{2} and ρ(ab)>b\rho(a-b)>\sqrt{b}.

Stochastic block model with Ω​(log⁡n)Ω𝑛\Omega(\log n) communities

In this section, we consider the stochastic block model with r2r\geq 2 communities of size KK in a network of n=rKn=rK nodes. Derived in , the following SDP is a natural convex relaxation of MLE:There are slightly different but equivalent ways to impose the constraints. Under the condition Y0,Y\succeq 0, the constraint Y,J=0\langle Y,\mathbf{J}\rangle=0 is equivalent to Y1=0Y\mathbf{1}=0, which is the formulation used in .

Define the n×nn\times n symmetric matrix YY^{\ast} corresponding to the true clusters by Yij=1Y^{\ast}_{ij}=1 if vertices ii and jj are in the same cluster, including the case i=ji=j, and Yij=1r1Y^{\ast}_{ij}=-\frac{1}{r-1} otherwise.

where κ\kappa is the constant defined in (37).

Under the assumption of q=Θ(p)q=\Theta(p), the information-theoretic condition has been established in : MLE succeeds with high probability if and only if

Comparing (49) to the necessary condition (48) for SDP, we immediately conclude that SDP is orderwise suboptimal if r=ω(logn)r=\omega(\log n), or equivalently, K=o(nlogn)K=o(\frac{n}{\log n}). Furthermore, if rClognr\geq C\log n for a sufficiently large constant CC, SDP is suboptimal in terms of constants, which is consistent with the single-community result in Section 1.2.

Discussions

An interesting future direction is to establish upper and lower bounds of SOS relaxations for the problem of finding a hidden community in relatively sparse SBM.

Proofs

In this section, we prove our main theorems. In particular, Section 8.1 contains the proofs of SDP sufficient conditions given in Theorem 1, Theorem 3, and Theorem 5. The proofs of SDP necessary conditions given in Theorem 2, Theorem 4, and Theorem 6 are presented in Section 8.2.

In this subsection, we provide the proof of Theorem 1, as well as the proofs of its further consequence in the Gaussian and Bernoulli cases.

Before the main proofs, we need a dual certificate lemma, providing a set of deterministic conditions which is both sufficient and necessary for the success of SDP (3).

then ZZ^{\ast} is the unique optimal solution to (3).

Notice that Z=K(nK)n(n1)I+K(K1)n(n1)JZ=\frac{K(n-K)}{n(n-1)}\mathbf{I}+\frac{K(K-1)}{n(n-1)}\mathbf{J} is strictly feasible to (3), i.e., the Slater’s condition holds, which implies, via Slater’s theorem for SDP, that strong duality holds (see, e.g., [7, Section 5.9.1]). Thus the KKT conditions given in (50)–(52) are both sufficient and necessary for the optimality of ZZ^{\ast}.

To show the uniqueness of ZZ^{\ast} under condition (53) or condition (54), consider another optimal solution Z~{\widetilde{Z}}. Then,

where (a)(a) holds because I,Z~=K\langle\mathbf{I},{\widetilde{Z}}\rangle=K and J,Z~=K2\langle\mathbf{J},{\widetilde{Z}}\rangle=K^{2}; (b)(b) holds because L,Z~=L,Z\langle L,{\widetilde{Z}}\rangle=\langle L,Z^{\ast}\rangle, B,Z~0B,{\widetilde{Z}}\geq 0, and D,Z~iCdi=D,Z\langle D,{\widetilde{Z}}\rangle\leq\sum_{i\in C^{*}}d_{i}=\langle D,Z^{*}\rangle in view of di0d_{i}\geq 0 and Z~ii1{\widetilde{Z}}_{ii}\leq 1 for all i[n]i\in[n]. It follows that the inequality (b)(b) holds with equality, and thus D,Z~Z=0\langle D,\widetilde{Z}-Z^{\ast}\rangle=0 and B,Z~=0\langle B,\widetilde{Z}\rangle=0.

Suppose (53) holds. Since Z~0{\widetilde{Z}}\succeq 0, S0S\succeq 0, and S,Z~=0\langle S,{\widetilde{Z}}\rangle=0, Z~{\widetilde{Z}} needs to be a multiple of Z=ξ(ξ)Z^{\ast}=\xi^{\ast}(\xi^{\ast})^{\top}. Then Z~=Z{\widetilde{Z}}=Z^{\ast} since Tr(Z~)=Tr(Z)=K\mathsf{Tr}({\widetilde{Z}})=\mathsf{Tr}(Z^{\ast})=K.

Suppose instead (54) holds. Since B,Z~=0\langle B,\widetilde{Z}\rangle=0 and B,Z~0B,{\widetilde{Z}}\geq 0, it follows that Z~ij=0\widetilde{Z}_{ij}=0 for all iji\neq j such that (i,j)C×C(i,j)\notin C^{*}\times C^{\ast}. Also, in view of D,Z~Z=0\langle D,\widetilde{Z}-Z^{\ast}\rangle=0 and Z~ii1{\widetilde{Z}}_{ii}\leq 1, we have that Z~ii=1{\widetilde{Z}}_{ii}=1 for all iCi\in C^{*}. Hence, Z~ii=0{\widetilde{Z}}_{ii}=0 for all iCi\notin C^{\ast} due to I,Z~=K\langle\mathbf{I},{\widetilde{Z}}\rangle=K. Finally, it follows from J,Z~=K2\langle\mathbf{J},{\widetilde{Z}}\rangle=K^{2} that Z~ij=1{\widetilde{Z}}_{ij}=1 for all (i,j)C×C(i,j)\in C^{\ast}\times C^{\ast}. Hence, we conclude that Z~=Z{\widetilde{Z}}=Z^{\ast}. ∎

We construct (λ,η,S,D,B)(\lambda,\eta,S,D,B) which satisfy the conditions in Lemma 1. Observe that to satisfy (50), (51), and (52), we need that D=diag{di}D=\mathsf{diag}\left\{{d_{i}}\right\} with

and Bij=0B_{ij}=0 for i,jCi,j\in C^{\ast}, and

where, given λ\lambda, η\eta can be chosen without loss of generality to be:

where biλ1KjCLijb_{i}\triangleq\lambda-\frac{1}{K}\sum_{j\in C^{\ast}}L_{ij} for iCi\notin C^{\ast}. By definition, we have di(Zii1)=0d_{i}(Z^{\ast}_{ii}-1)=0 and BijZij=0B_{ij}Z^{\ast}_{ij}=0 for all i,j[n]i,j\in[n]. Moreover, for all iCi\in C^{\ast},

where the last equality holds because Bij=0B_{ij}=0 if (i,j)C×C(i,j)\in C^{\ast}\times C^{\ast}; for all iCi\notin C^{\ast},

where the last equality follows from our choice of bib_{i}. Hence, Dξ=Lξ+BξηξλK1D\xi^{\ast}=L\xi^{\ast}+B\xi^{\ast}-\eta\xi^{\ast}-\lambda K\mathbf{1} and consequently Sξ=0S\xi^{\ast}=0. Also, by definition, miniCdi0\min_{i\in C^{\ast}}d_{i}\geq 0 and miniCbi0\min_{i\notin C^{\ast}}b_{i}\geq 0, and thus D0D\geq 0, B0B\geq 0.

It remains to verify S0S\succeq 0 with λ2(S)>0\lambda_{2}(S)>0, i.e.,

it follows that for any xξx\perp\xi^{\ast} and x2=1\|x\|_{2}=1,

where (a)(a) holds because Bij=0B_{ij}=0 for all (i,j)C×C(i,j)\in C^{\ast}\times C^{\ast} and

We need the following standard result in extreme value theory (e.g., see [12, Example 10.5.3] and use union bound).

Let {Zi}\{Z_{i}\} be a sequence of standard normal random variables. Then

with equality if the random variables are independent.

Note that {jCAij:iC}\left\{\sum_{j\in C^{*}}A_{ij}:i\in C^{\ast}\right\} are not mutually independent. By Lemma 2 applied to Aij,-A_{ij},

By Lemma 5, for any sequence tn,t_{n}\to\infty,

with probability converging to one. Hence, in view of the assumption (25), we have that (61) holds with high probability.

In the remainder, we prove (26) for any K2K\geq 2 implies that ZZ^{*} is the unique optimal solution of the SDP. We write T={(i,j)C×C:ij}T=\{(i,j)\in C^{\ast}\times C^{\ast}:i\neq j\} and Tc={(i,j)[n]×[n]:ij}\TT^{c}=\{(i,j)\in[n]\times[n]:i\neq j\}\backslash T. Recall that for distinct i,ji,j, AijN(μ,1)A_{ij}\sim{\mathcal{N}}(\mu,1) if i,jCi,j\in C^{*} and N(0,1){\mathcal{N}}(0,1) otherwise. Using Lemma 2 and the assumption (26), we have

with probability converging to 11. Hence, without loss of generality, we can and do assume that (62) holds in the following. Let ZZ be any feasible solution of SDP (3). Since Zii1Z_{ii}\leq 1 for all ii and Z0,Z\succeq 0, it follows that Zij1|Z_{ij}|\leq 1 for all i,ji,j. Hence 0ZJ0\leq Z\leq\mathbf{J}. Also, JI,Z=K(K1)\langle\mathbf{J}-\mathbf{I},Z\rangle=K(K-1). So Z,A\langle Z,A\rangle is a weighted sum of the terms (Aij:ij),(A_{ij}:i\neq j), where the weights ZijZ_{ij} are nonnegative, with values in , and total weight equal to K(K1)K(K-1). The sum is thus maximized if and only if all the weight is placed on the K(K1)K(K-1) largest terms, namely AijA_{ij} with (i,j)T(i,j)\in T, which are each strictly larger than the other terms. Thus, ZZ^{*} is the unique maximizer.

1.2 Bernoulli case

We will use the following upper bounds for the binomial distribution tails [42, Theorem 1]:

where Q()Q(\cdot) denotes the standard normal tail probability. By the definition of τ1\tau_{1} and τ2\tau_{2}, it follows that

By the union bound, with high probability,

We decompose A=A1+A2A=A_{1}+A_{2}, where A1A_{1} is obtained from AA by setting all entries not in C×CC^{\ast}\times C^{\ast} to be zero; similar, A2A_{2} is obtained from AA by setting all entries in C×CC^{\ast}\times C^{\ast} to be zero. Applying Lemma 10 yields that with high probability,

where κ\kappa is defined in (37). Hence, in view of the assumption (36), we have that (63) holds with high probability. ∎

2 Necessary conditions

The proof is a slight variation of the heuristic derivation given before the statement of Theorem 2. Fix K,n,C,K,n,C^{*}, the matrix LL, and a constant aa with 1aK1\leq a\leq K and let r=aKr=\frac{a}{K}. Suppose the indices are ordered and the matrix UU is defined as in the heuristic derivation.

Let ZZ be defined as a function of ϵ0\epsilon\geq 0 as follows. We shall specify α\alpha and β\beta depending on ϵ\epsilon for sufficiently small ϵ\epsilon in such a way that

Let ξϵ\xi_{\epsilon} be the column vector with K+1K+1 nonzero entries, defined by ξϵ=(1,,1,1ϵ,βϵ,0,,0)\xi_{\epsilon}=(1,\dots,1,1-\epsilon,\beta\epsilon,0,\ldots,0)^{\top}. Finally, let Z=αξϵξϵT+2ϵUZ=\alpha\xi_{\epsilon}\xi_{\epsilon}^{T}+2\epsilon U. In expanded form:

Up to o(ϵ)o(\epsilon) terms, ZZ is equal to the matrix Z+δ1+δ2+δ3Z^{*}+\delta_{1}+\delta_{2}+\delta_{3} described in the heuristic derivation. Clearly for ϵ\epsilon sufficiently small, Z0,Z\geq 0, Z0,Z\succeq 0, and Zii1Z_{ii}\leq 1. It is also straightforward to see that

so that once we establish the feasibility of Z,Z, the proof will be complete. That is, it remains to show that α\alpha and β\beta can be selected for sufficiently small ϵ\epsilon so that (66), I,Z=K\langle\mathbf{I},Z\rangle=K, and J,Z=K2\langle\mathbf{J},Z\rangle=K^{2} hold true. The later two equations can be written as

Combining (67) and (68) to eliminate α\alpha and simplifying yields the following equation for β:\beta:

This equation has the form F(ϵ,β)=0F(\epsilon,\beta)=0 (KK and rr are fixed) with a solution at (ϵ,β)=(0,1r)(\epsilon,\beta)=(0,1-r). Also, Fϵ(0,1r)=K(1r)\frac{\partial F}{\partial\epsilon}(0,1-r)=K(1-r) and Fβ(0,1r)=K20\frac{\partial F}{\partial\beta}(0,1-r)=-K^{2}\neq 0. Therefore, by the implicit function theorem, the equation determines β\beta as a continuously differentiable function of ϵ\epsilon for small enough epsilon, and

This expression for β\beta together with (67) yields that for sufficiently small ϵ,\epsilon, α<1\alpha<1 and

Here is an alternative proof of Theorem 2 via a dual-based approach. If Z=ξ(ξ)Z^{*}=\xi^{\ast}(\xi^{\ast})^{\top} maximizes (3), then by Lemma 1 there exist dual variables (S,D,B,λ,η)(S,D,B,\lambda,\eta) with S=DBL+ηI+λJ0S=D-B-L+\eta\mathbf{I}+\lambda\mathbf{J}\succeq 0, B0B\geq 0, D=diag{di}0D=\mathsf{diag}\left\{{d_{i}}\right\}\geq 0, such that (50), (51) and (52) are satisfied. As a consequence, the choice of DD is fixed, namely,

Therefore, the condition miniCdi0\min_{i\in C^{\ast}}d_{i}\geq 0 implies that

Moreover, the dual variable BB satisfies BCC=0B_{C^{\ast}C^{\ast}}=0 and the off-diagonal block B(C)cCB_{(C^{\ast})^{c}C^{\ast}} satisfies

Denote all possible choices of BB by the following convex set:

In particular, we have jCBij0\sum_{j\in C^{*}}B_{ij}\geq 0 for all iCi\notin C^{*}, which implies that

Finally, S=D+λJBL+ηI0S=D+\lambda\mathbf{J}-B-L+\eta\mathbf{I}\succeq 0 and Sξ=0S\xi^{\ast}=0 imply that there exists BBB\in{\mathcal{B}} and η\eta such that ηsupx=1x(B+LDλJ)x\eta\geq\sup_{\|x\|=1}x^{\top}(B+L-D-\lambda\mathbf{J})x and \eqrefeq:etaupperbound\eqref{eq:etaupperbound} holds. Hence,

where (a) follows because U=(1/n)I+JU=(1/n)\mathbf{I}+\mathbf{J} is strictly feasible for the supremum in (75) (i.e. it satisfies Slater’s condition) so the strong duality holds.

Restricting UU in (76) to satisfy Uij=0U_{ij}=0 except for those i,jCi,j\notin C^{\ast}, and U,J=a[1,K]\langle U,\mathbf{J}\rangle=a\in[1,K], we get that ηsup1aK{VnK(a)aλ}\eta\geq\sup_{1\leq a\leq K}\left\{V_{n-K}(a)-a\lambda\right\}. It follows from (72) that

where the last inequality follows from aKa\leq K and (74). ∎

Consider the Gaussian case P=N(μ,1)P={\mathcal{N}}(\mu,1) and Q=N(0,1)Q={\mathcal{N}}(0,1). Before the proof of Theorem 4, we need to introduce a key lemma to lower bound the value of Vm(a)V_{m}(a) given in (6). Recall that m=nKm=n-K. By the assumption, L=AL=A and hence MM has the same distribution as an m×mm\times m symmetric random matrix WW with zero-diagonal and Wiji.i.d.N(0,1)W_{ij}{\stackrel{{\scriptstyle\text{i.i.d.}}}{{\sim}}}{\mathcal{N}}(0,1) for 1i<jm1\leq i<j\leq m. The following lemma provides a high-probability lower bound on Vm(a)V_{m}(a) defined in (6); its proof is deferred to Appendix E.

Assume that a>1a>1 and a=o(m)a=o(m) as mm\to\infty. Let M=WM=W be an m×mm\times m symmetric random matrix with zero-diagonal and independent standard normal entries in the definition of Vm(a)V_{m}(a) in (6). Then with probability tending to one,

where rm3/48(a1)+2am=o(m)r\triangleq\frac{m^{3/4}}{\sqrt{8(a-1)}}+\frac{2a}{\sqrt{m}}=o(\sqrt{m}) if a=ω(m)a=\omega(\sqrt{m}).

We also have the following simple observations on Vm(a)V_{m}(a):

Dropping the second and the last constraints in (6), we have Vm(a)λmax(W)=2m(1+oP(1))V_{m}(a)\leq\lambda_{\max}(W)=2\sqrt{m}(1+o_{P}(1)).

We next prove Theorem 4 by combining Theorem 2 and Lemma 3.

It follows from (78) that with a non-vanishing probability,

K=ω(n)K=\omega(\sqrt{n}). We show that the necessary condition (29) holds. In view of (81), to get a necessary condition as tight as possible, one should choose aa so that VnK(a)V_{n-K}(a) is large and aa is small comparing to KK. To this end, set a=K(nK)1/4a=\sqrt{K}(n-K)^{1/4}. Since K=o(n)K=o(n) and K=ω(n)K=\omega(\sqrt{n}) by assumption, we have a=ω(nK)a=\omega(\sqrt{n-K}) and a=o(K)a=o(K). Applying Lemma 3, we conclude that

Combining (78), (79), (80), and (82), and using nKnK/(2nK),\sqrt{n-K}\geq\sqrt{n}-K/(2\sqrt{n-K}), we obtain the desired (29).

K=O(n)K=O(\sqrt{n}). In view of the high-probability lower bounds to VnK(a)V_{n-K}(a) for a=O(nK)a=O(\sqrt{n-K}) given in (77), VnK(a)a2log(nK)/KV_{n-K}(a)-a\sqrt{2\log(n-K)/K} is maximized over [1,K][1,K] at a=Ka=K. Hence, we set a=Ka=K, which satisfies a=O(nK)a=O(\sqrt{n-K}). It follows from (81) that with a non-vanishing probability,

The desired lower bound on μ\mu follows from the high-probability lower bounds on VnK(K)V_{n-K}(K) given in (77) for a=O(nK)a=O(\sqrt{n-K}). ∎

2.2 Bernoulli case

Recall that m=nKm=n-K and by assumption, L=AL=A. In the Bernoulli case, MM is an m×mm\times m symmetric random matrix with zero diagonal and independent entries such that Mij=MjiBern(q)M_{ij}=M_{ji}\sim{\rm Bern}(q) for all i<ji<j. The following lemma provides a high-probability lower bound on Vm(a)V_{m}(a) defined in (6); its proof is deferred to Appendix F.

Assume that a=o(m)a=o(m), qq is bounded away from 11, m2qm^{2}q\to\infty. Recall that κ\kappa is defined in (37). With probability tending to one,

If a11κmq/(1q)a-1\geq\frac{1}{\kappa}\sqrt{mq/(1-q)}, then

If 0a11κmq/(1q)0\leq a-1\leq\frac{1}{\kappa}\sqrt{mq/(1-q)}, then Vm(a)=a1V_{m}(a)=a-1.

We have the following simple observations on Vm(a)V_{m}(a):

Vm(1)=0V_{m}(1)=0 and Vm(a)(a1)A=a1V_{m}(a)\leq(a-1)\|A\|_{\infty}=a-1.

Dropping the second and the last constraints in (6), we have with high probability Vm(a)λmax(A)κmq(1q)V_{m}(a)\leq\lambda_{\max}(A)\leq\kappa\sqrt{mq(1-q)}.

We next prove Theorem 6 by combining Theorem 2 and Lemma 4.

We first show that if ZZ^{*} is unique with some non-vanishing probability, then K1nq/(1q)/κK-1\geq\sqrt{nq/(1-q)}/\kappa. We prove it by contradiction. Suppose that K1<(nK)q/(1q)/κK-1<\sqrt{(n-K)q/(1-q)}/\kappa. Let A~{\widetilde{A}} denote the (nK)×(nK)(n-K)\times(n-K) submatrix of AA supported on (C)c×(C)c(C^{\ast})^{c}\times(C^{\ast})^{c}. Take a=Ka=K in Lemma 4; the last statement of the lemma implies that VnK(K)=K1V_{n-K}(K)=K-1 with high probability. Furthermore, the proof of the lemma shows that the (nK)×(nK)(n-K)\times(n-K) matrix Z~{\widetilde{Z}} defined by Z~ii=1/(nK){\widetilde{Z}}_{ii}=1/(n-K) and Z~ij=(K1)A~ij/A~,J{\widetilde{Z}}_{ij}=(K-1){\widetilde{A}}_{ij}/\langle{\widetilde{A}},\mathbf{J}\rangle for iji\neq j satisfies Z~,A~=K1\langle{\widetilde{Z}},{\widetilde{A}}\rangle=K-1 and, with high probability, Z~0{\widetilde{Z}}\succeq 0. Let ZZ be the n×nn\times n matrix such that Z(C)c(C)c=KZ~Z_{(C^{\ast})^{c}(C^{\ast})^{c}}=K{\widetilde{Z}} and Zij=0Z_{ij}=0 for all (i,j)(C)c×(C)c(i,j)\notin(C^{\ast})^{c}\times(C^{\ast})^{c}. Then one can easily verify that ZZ is feasible for (3) with high probability and Z,A=K(K1)\langle Z,A\rangle=K(K-1). Since Z,AK(K1)\langle Z^{*},A\rangle\leq K(K-1), it follows that with high probability ZZ^{\ast} is not the unique optimal solution to (3), arriving at a contradiction. The necessity of (40) is then proved.

We use the following lower bounds for the binomial distribution tails [42, Theorem 1]:

Let Ko=KlogKK_{o}=\lceil\frac{K}{\log K}\rceil and σ2=(Ko1)p\sigma^{2}=(K_{o}-1)p. Define events

By the definition of τ2\tau^{\prime}_{2} and the convexity of divergence, we have that d(τ2q)(1δ)d(τ2q)d(\tau^{\prime}_{2}\|q)\leq(1-\delta)d(\tau_{2}\|q). Thus

where we used the bound Q(x)12πtt2+1et2/2Q(x)\geq\frac{1}{\sqrt{2\pi}}\frac{t}{t^{2}+1}{\rm e}^{-t^{2}/2} and the fact that δloglog(nK)log(nK)\delta\geq\frac{\log\log(n-K)}{\log(n-K)}. Hence,

Applying Lemma 4, we have that with probability converging to 11,

Recall that we have shown that K11κ(nK)q/(1q)K-1\geq\frac{1}{\kappa}\sqrt{(n-K)q/(1-q)} in the first part of the proof. In view of τ2q\tau^{\prime}_{2}\geq q and (88), VnK(a)aτ2V_{n-K}(a)-a\tau^{\prime}_{2} is maximized at

which gives VnK(a)=a1V_{n-K}(a)=a-1. Hence, it follows from (87) that

Plugging in the definition of τ1\tau^{\prime}_{1} and τ2\tau^{\prime}_{2}, we derive that

where the last inequality follows because τ2τ2\tau^{\prime}_{2}\leq\tau_{2} and δ2loglogKlogK\delta\leq\frac{2\log\log K}{\log K}. Hence, we arrive at the desired necessary condition (41). ∎

2.3 Multiple-community stochastic block model

Since the MLE is optimal, in proving the theorem, we can assume without loss of generality that the necessary condition for consistency of the MLE, K(pq)2=Ω(qlogn)K(p-q)^{2}=\Omega(q\log n), holds (see Remark 9). Since p=Θ(q)p=\Theta(q), it follows that we can assume without loss of generality that K(pq)=Ω(logn)K(p-q)=\Omega(\log n) and Kq=Ω(logn).Kq=\Omega(\log n).

Suppose (48) fails, namely, there exists ϵ>0\epsilon>0 such that

We construct a matrix YY which, with high probability, constitutes a feasible solution to the SDP program (47) with an objective value exceeding that of YY^{\ast}. The construction is a variant of that used in proving Lemma 4 in Appendix F. Let

Since w0w\leq 0, to satisfy the other constraints in (47), it suffices to ensure

where dmax=maxidid_{\max}=\max_{i}d_{i} is the maximal degree.

Since Y1=0Y\mathbf{1}=0, (93) is equivalent to PYP0PYP\succeq 0, where P=I1nJP=\mathbf{I}-\frac{1}{n}\mathbf{J} is the matrix for projection onto the subspace orthogonal to 1\mathbf{1}. Since

By the Chernoff bounds for binomial distributions,

Solving (91) and (95) and by the assumption p=o(1)p=o(1) and the fact 1n1=o(pqr),\frac{1}{n-1}=o(\frac{p-q}{r}), we have:

Hence t+2wdmax=(1+ϵ+oP(1))pqr1r1t+2wd_{\max}=-(1+\epsilon+o_{P}(1))\frac{p-q}{r}\geq-\frac{1}{r-1}, i.e., (92), holds with high probability.

Acknowledgement

This research was supported by the National Science Foundation under Grant CCF 14-09106, IIS-1447879, NSF OIS 13-39388, and CCF 14-23088, and Strategic Research Initiative on Big-Data Analytics of the College of Engineering at the University of Illinois, and DOD ONR Grant N00014-14-1-0823, and Grant 328025 from the Simons Foundation. This work was done in part while J. Xu was visiting the Simons Institute for the Theory of Computing.

References

Appendix A Bounds on spectral norms of random matrices

For the convenience of the reader, this section collects known bounds on the spectral norms of random matrices that are used in this paper.

There is a universal constant CC such that whenever AA is a random matrix (not necessarily square) of independent and zero-mean entries:

The following two lemmas are used in the proof of Lemma 10 below.

Let XX be an n×nn\times n symmetric random matrix with Xij=ξijbijX_{ij}=\xi_{ij}b_{ij}, where {ξij:ij}\{\xi_{ij}:i\geq j\} are independent symmetric random variables with unit variance, and {bij:ij}\{b_{ij}:i\geq j\} are given scalers. Then for any α3\alpha\geq 3,

where σ2maxijbij2\sigma^{2}\triangleq\max_{i}\sum_{j}b_{ij}^{2}.

There are universal constants CC and CC^{\prime} such that the following holds. Let AA be a symmetric random matrix such that {Aij:1ijn}\{A_{ij}:1\leq i\leq j\leq n\} are independent, zero-mean, variance at most σ2\sigma^{2}, and bounded in absolute value by KK. If KK and σ\sigma depend on nn such that σCn1/2Klog2n\sigma\geq C^{\prime}n^{-1/2}K\log^{2}n, then

with probability converging to one as nn\to\infty.

For example, for the case the matrix entries are Bern(p){\rm Bern}(p), the second term in (97) becomes asymptotically negligible compared to the first if pn=ω((np)1/4logn)\sqrt{pn}=\omega((np)^{1/4}\log n), or equivalently, np=ω(log4n)np=\omega(\log^{4}n).

Let MM denote a symmetric n×nn\times n random matrix with zero diagonals and independent entries such that Mij=MjiBern(pij)M_{ij}=M_{ji}\sim{\rm Bern}(p_{ij}) for all i<ji<j with pijp_{ij}\in. Assume pij(1pij)rp_{ij}(1-p_{ij})\leq r for all i<ji<j and nr=Ω(logn)nr=\Omega(\log n). Then, with high probability,

It follows from the symmetrization argument and Lemma 8 (for this application of the lemma, bijr,b_{ij}\leq\sqrt{r}, ξijbij1|\xi_{ij}b_{ij}|\leq 1, and σ2nr\sigma^{2}\leq nr) that for any α3\alpha\geq 3,

Appendix B A concentration inequality for a random matrix of log normal entries

Let g(x)=eτxτ2/2g(x)=e^{\tau x-\tau^{2}/2} for some τ>0\tau>0. Recall that WW is an m×mm\times m symmetric, zero-diagonal random matrix with i.i.d. standard Gaussian entries up to symmetry. Let g(W)g(W) denote an m×mm\times m symmetric, zero-diagonal random matrix whose (i,j)(i,j)-th entry is g(Wij)g(W_{ij}) for iji\neq j. We need the following matrix concentration inequality for g(W)g(W).

There exists a universal constant C>0C>0 such that

In addition, if τ0\tau\to 0 as mm\to\infty, then the following refined bound holds:

where Δ=2mτ3/2=o(mτ)\Delta=2\sqrt{m}\tau^{3/2}=o(\sqrt{m}\tau).

where g(s)(s+1)64(s+1)3+6(s+1)3=s2(3+16s+15s2+6s3+s4)200s2g(s)\triangleq(s+1)^{6}-4(s+1)^{3}+6(s+1)-3=s^{2}(3+16s+15s^{2}+6s^{3}+s^{4})\leq 200s^{2} for all ss\in. Applying (102) again yields the desired result.

To bound h(W)\|h(W)\|, let BB be the upper-triangular part of h(W)h(W), namely, Bij=h(Wij)B_{ij}=h(W_{ij}) if i<ji<j and 0 elsewhere. Then h(W)2B\|h(W)\|\leq 2\|B\|. Since BB consists of independent zero-mean entries, Lemma 6 yields

for some universal constant cc. Note that

Combining the last displayed equation with (103) and applying the union bound complete the proof for the case τ0\tau\to 0. ∎

Appendix C Useful facts on binary divergences

The upper bound follows by applying the inequality logxx1\log x\leq x-1 for x>0x>0 and the lower bound is proved using 2d(pq)p2=1p(1p)\frac{\partial^{2}d(p\|q)}{\partial p^{2}}=\frac{1}{p(1-p)} and Taylor’s expansion. ∎

Assume that 0<qp<10<q\leq p<1 and u,v[q,p]u,v\in[q,p]. Then for any 0<η<10<\eta<1,

for some x(min{u,v},max{u,v})x\in(\min\{u,v\},\max\{u,v\}). Notice that d(xv)=logx(1v)(1x)vd^{\prime}(x\|v)=\log\frac{x(1-v)}{(1-x)v} and thus

where the last equality holds due to log(1+x)x\log(1+x)\leq x and x(q,p).x\in(q,p). It follows that

where the last inequality holds due to the lower bounds in (104) and (105). Thus the first claim follows. For the second claim,

where the first inequality holds due to the lower bounds in (104) and (105); the last inequality holds due to the upper bounds in (104) and (105). ∎

Assume that logp(1q)q(1p)\log\frac{p(1-q)}{q(1-p)} is bounded from above. Suppose for some ϵ>0\epsilon>0 that Kd(pq)>(1+ϵ)lognKKd(p\|q)>(1+\epsilon)\log\frac{n}{K} for all sufficiently large nn. Recall that τ\tau^{\ast} is defined in (42). Then pτ=Θ(pq)p-\tau^{\ast}=\Theta(p-q) and τq=Θ(pq)\tau^{\ast}-q=\Theta(p-q).

Notice that d(pq)+d(qp)=(pq)logp(1q)q(1p)d(p\|q)+d(q\|p)=(p-q)\log\frac{p(1-q)}{q(1-p)}. Hence,

By the boundedness assumption of logp(1q)q(1p)\log\frac{p(1-q)}{q(1-p)} and Lemma 12, d(pq)d(qp)d(p\|q)\asymp d(q\|p). Since Kd(pq)>(1+ϵ)lognKKd(p\|q)>(1+\epsilon)\log\frac{n}{K} for all sufficiently large nn, it follows that pτp-\tau^{*} and τq\tau^{*}-q are both Θ(pq).\Theta(p-q).

Assume that logp(1q)q(1p)\log\frac{p(1-q)}{q(1-p)} is bounded. Suppose that Kd(pq)>(1+ϵ)lognKKd(p\|q)>(1+\epsilon)\log\frac{n}{K} for all sufficiently large nn.

If lim infnKd(τq)logn1\liminf_{n\to\infty}\frac{Kd(\tau^{\ast}\|q)}{\log n}\geq 1, then τ1\tau_{1} and τ2\tau_{2} in (35) are well-defined and take values in the interval [q,p][q,p].

If lim infnKd(τq)logn>1\liminf_{n\to\infty}\frac{Kd(\tau^{\ast}\|q)}{\log n}>1, then there exists a fixed constant η>0\eta>0 such that τ1(1η)τ+ηp\tau_{1}\geq(1-\eta)\tau^{\ast}+\eta p and τ2(1η)τ+ηq\tau_{2}\leq(1-\eta)\tau^{\ast}+\eta q.

It follows from Lemma 14 that pτ=Ω(pq)p-\tau^{\ast}=\Omega(p-q) and τq=Ω(pq)\tau^{\ast}-q=\Omega(p-q). In particular, there exists a fixed constant δ>0\delta>0 such that (1δ)q+δpτ(1δ)p+δq(1-\delta)q+\delta p\leq\tau^{\ast}\leq(1-\delta)p+\delta q. By the monotonicity and convexity of divergence, d(τq)(1δ)d(pq)d(\tau^{\ast}\|q)\leq(1-\delta)d(p\|q) and d(τp)(1δ)d(qp)d(\tau^{\ast}\|p)\leq(1-\delta)d(q\|p). Hence, if lim infnKd(τq)logn1\liminf_{n\to\infty}\frac{Kd(\tau^{\ast}\|q)}{\log n}\geq 1, then Kd(pq)(1+δ)lognKd(p\|q)\geq(1+\delta^{\prime})\log n and Kd(qp)(1+δ)logKKd(q\|p)\geq(1+\delta^{\prime})\log K for some fixed constant δ>0\delta^{\prime}>0. Thus, in view of the continuity of binary divergence functions, τ1\tau_{1} and τ2\tau_{2} are well-defined, and moreover τ1q\tau_{1}\geq q and τ2p\tau_{2}\leq p.

Note that (1η)τ+ηp[q,p](1-\eta)\tau^{\ast}+\eta p\in[q,p]. In view of Lemma 13,

If lim infnKd(τq)logn>1\liminf_{n\to\infty}\frac{Kd(\tau^{\ast}\|q)}{\log n}>1, then there exists a fixed constant ϵ>0\epsilon^{\prime}>0 such that for sufficiently large nn, Kd(τq)(1+ϵ)lognKd(\tau^{\ast}q)\geq(1+\epsilon^{\prime})\log n. It follows from the last displayed equation that by choosing η\eta sufficiently small, d((1η)τ+ηqq)(1+δ)lognd\left((1-\eta)\tau^{\ast}+\eta q\|q\right)\geq(1+\delta^{\prime})\log n for some fixed constant δ>0\delta^{\prime}>0. Thus by definition, τ2(1η)τ+ηq\tau_{2}\leq(1-\eta)\tau^{\ast}+\eta q. Similarly, one can verify that τ1(1η)τ+ηp\tau_{1}\geq(1-\eta)\tau^{\ast}+\eta p. ∎

Appendix D Proof of Corollary 1

We first show that if γ1>γ2\gamma_{1}>\gamma_{2}, then

which implies that MLE achieves exact recovery in view of (43).

Recall that I(x,y)=xylog(ex/y)I(x,y)=x-y\log({\rm e}x/y) for x,y>0.x,y>0. Define τ0=ablog(a/b)\tau_{0}=\frac{a-b}{\log(a/b)}. Then I(b,τ0)=I(a,τ0)I(b,\tau_{0})=I(a,\tau_{0}). Note that I(b,γ2)=I(a,γ1)=1/ρI(b,\gamma_{2})=I(a,\gamma_{1})=1/\rho. Since I(b,x)I(b,x) is strictly increasing over [b,)[b,\infty) and I(a,x)I(a,x) is strictly decreasing over (0,a](0,a], it follows that γ2<τ0<γ1\gamma_{2}<\tau_{0}<\gamma_{1}. Thus I(b,τ0)>1/ρI(b,\tau_{0})>1/\rho. In the regime (46), we have τ=log2nn(τ0+o(1))\tau^{\ast}=\frac{\log^{2}n}{n}\left(\tau_{0}+o(1)\right). Taylor’s expansion yields that

Secondly, suppose that MLE achieves exact recovery. We aim to show that γ1γ2\gamma_{1}\geq\gamma_{2}. Suppose not. Then γ1<γ2\gamma_{1}<\gamma_{2}. By the similar argument as above, it follows that γ1<τ0<γ2\gamma_{1}<\tau_{0}<\gamma_{2}. Thus I(b,τ0)<1/ρI(b,\tau_{0})<1/\rho. As a consequence,

for some positive constant ϵ>0\epsilon>0, which contradicts the fact that lim infnKd(τq)logn1\liminf_{n\to\infty}\frac{Kd(\tau^{\ast}\|q)}{\log n}\geq 1, the necessary condition (44) for MLE to achieve exact recovery.

Finally, we prove the claims for SDP. By definition, τ1=log2n(γ1+o(1))/n\tau_{1}=\log^{2}n(\gamma_{1}+o(1))/n and τ2=log2n(γ2+o(1))/n\tau_{2}=\log^{2}n(\gamma_{2}+o(1))/n. Therefore, if ρ(γ1γ2)>4b\rho(\gamma_{1}-\gamma_{2})>4\sqrt{b}, then the sufficient condition for SDP (36) holds; if the necessary condition for SDP (41) holds, then ρ(γ1γ2)b/4\rho(\gamma_{1}-\gamma_{2})\geq\sqrt{b}/4.

Appendix E Proof of Lemma 3

Define an m×mm\times m matrix ZZ by Zii=1mZ_{ii}=\frac{1}{m} and Zij=a1αm(m1)g(Wij)Z_{ij}=\frac{a-1}{\alpha m(m-1)}g(W_{ij}) for iji\neq j. By definition, Z0,Tr(Z)=1Z\geq 0,\mathsf{Tr}(Z)=1, and Z,J=a\langle Z,\mathbf{J}\rangle=a.

Thus to get a lower bound to Vm(a)V_{m}(a) as tight as possible, we would like to maximize τ\tau so that Z0Z\succeq 0 with high probability.

Hence, since J0\mathbf{J}\succeq 0 and a1a\geq 1, to show Z0Z\succeq 0, it suffices to verify that

The last two terms in (110) are lower order terms comparing to the first term; thus τ=(1+o(1))m2(a1)\tau=(1+o(1))\frac{\sqrt{m}}{2(a-1)}. It follows that for sufficiently large mm, τ>0\tau>0, τ=o(1)\tau=o(1), and τ=ω(1/m)\tau=\omega(1/\sqrt{m}).

Since τ0\tau\to 0 and τm\tau m\to\infty, applying Lemma 11 yields that with probability tending to one,

Plugging in the definition of τ\tau given in (110), (114) implies that with probability converging to one,

Since by assumption a=ω(m)a=\omega(\sqrt{m}) and a=o(m)a=o(m), combining (113) and (115) yields that with probability tending to one, (109) and hence Z0Z\succeq 0 hold.

In view of (110) and (112), with probability tending to one,

a=o(m)a=o(\sqrt{m}). The desired lower bound given in the third part of (77) is trivially true for a=1a=1 so we suppose a2a\geq 2. The proof is almost identical to the first case except that we set

First, we verify that (109) holds with high probability. By the choice of τ\tau, eτ2=o(m1/3)e^{\tau^{2}}=o(m^{1/3}). Thus, (111) and hence (112) continue to hold. It follows from (112) that α=1+OP(log(m/a2)m)\alpha=1+O_{P}(\frac{\log(m/a^{2})}{\sqrt{m}}). Applying Lemma 11 and Markov’s inequality, with probability at least 1(logma2)1/41-(\log\frac{m}{a^{2}})^{-1/4},

Plugging in the definition of τ\tau given in (117), it further implies that with high probability,

Therefore (109) holds with high probability.

Then we compute the value of the objective function Z,W\langle Z,W\rangle. Entirely analogously to (116), we have

Therefore with probability tending to one,

By the choice of τ\tau given in (117), we have that

Combining the last two displayed equations yield that with high probability,

proving the desired lower bound to Vm(a)V_{m}(a) given in the third part of (77).

a=Θ(m)a=\Theta(\sqrt{m}). Let τ\tau be a constant to be chosen later. The proof is similar to the previous two cases; the key difference is that the distributions of entries of g(W)g(W) are independent of mm, and thus we can the invoke Lemma 7, a corollary of the Bai-Yin theorem, instead of Lemma 11, to obtain

In view of (109), as long as τ\tau is chosen to be a constant so that

we have Z0Z\succeq 0 with high probability.

Finally, we compute the value of the objective function Z,W\langle Z,W\rangle. Entirely analogously to (116), we have

It follows that with probability converging to 11,

which yields the desired lower bound to Vm(a)V_{m}(a). ∎

Appendix F Proof of Lemma 4

The proof follows the same fashion as that in the Gaussian case. In particular, to prove the desired lower bound to Vm(a)V_{m}(a), we construct an explicit feasible solution ZZ to (6); however, the particular construction is different. Recall that in the Bernoulli case, MM is assumed to be an m×mm\times m symmetric random matrix with zero diagonal and independent entries such that Mij=MijBern(q)M_{ij}=M_{ij}\sim{\rm Bern}(q) for all i<ji<j.

and assume that R(0,1)R\in(0,1) for the time being. For a given γ(0,1]\gamma\in(0,1], define

Define an m×mm\times m matrix ZZ by Zii=1/mZ_{ii}=1/m and Zij=αMij+βZ_{ij}=\alpha M_{ij}+\beta for iji\neq j. By definition, Z,I=1\langle Z,\mathbf{I}\rangle=1, α+β=γ(a1)Rm(m1)0\alpha+\beta=\frac{\gamma(a-1)}{Rm(m-1)}\geq 0, and thus Z0Z\geq 0. Moreover,

Thus to get a lower bound to Vm(a)V_{m}(a) as tight as possible, we would like to choose γ\gamma as large as possible to satisfy Z0Z\succeq 0 with high probability.

Thus, to show Z0Z\succeq 0, it suffices to verify that

As a result, we would like to choose γ(0,1]\gamma\in(0,1] as large as possible to satisfy (119). We pause to give some intuitions on the choice of γ\gamma. By concentration inequalities, RqR\approx q with high probability. Since a=o(m)a=o(m), β=o(1/m)\beta=o(1/m). Furthermore, qmq(1q)q\ll\sqrt{mq(1-q)}. Hence, to satisfy (119), roughly it suffices that

This suggests that we should take γ\gamma to be the minimum of q+mq(1q)κ(a1)q+\frac{\sqrt{mq(1-q)}}{\kappa(a-1)} and 11.

Before specifying the precise choice of γ\gamma, we first show that RR is close to qq with high probability. Let cm=log(mq)c_{m}=\log(m\sqrt{q}) which converges to infinity under the assumption that m2qm^{2}q\to\infty. Thus, by the Chernoff bound for the binomial distribution, with probability converging to 11, Rqcmq/m|R-q|\leq c_{m}\sqrt{q}/m. Without loss of generality, we can and do assume that Rqcmq/m|R-q|\leq c_{m}\sqrt{q}/m in the remainder of the proof. Since qq is bounded away from 11 and m2qm^{2}q\to\infty, RR is also bounded away from 11 and R>0R>0. This verifies that α,β\alpha,\beta and hence ZZ are well-defined.

where ϵ=2/log(mmin{q,1/a})\epsilon=2/\log\left(m\min\{\sqrt{q},1/a\}\right). Equivalently,

The assumptions, m2qm^{2}q\to\infty and a=o(m)a=o(m), imply that ϵ=o(1)\epsilon=o(1) and hence γ[q,1]\gamma\in[q,1].

Next, we compute the value of Z,M\langle Z,M\rangle. In view of (118), it suffices to evaluate (a1)γ(a-1)\gamma. By the choice of γ\gamma,

Since ϵ=o(1)\epsilon=o(1), absorbing the factor 1ϵ1-\epsilon in the last displayed equation into the definition of κ\kappa given in (37) yields the desired lower bound to Vm(a)V_{m}(a).

To finish the proof, we are left to verify (119). Since β+αR=a1m(m1)\beta+\alpha R=\frac{a-1}{m(m-1)}, it follows that

where we used the fact that Rqcmq/m|R-q|\leq c_{m}\sqrt{q}/m and αaγ/m2R\alpha\leq a\gamma/m^{2}R in the last equality.

Let α0=γqq(1q)(a1)m(m1)\alpha_{0}=\frac{\gamma-q}{q(1-q)}\frac{(a-1)}{m(m-1)}. Next, we bound αα0|\alpha-\alpha_{0}| from the above. In view of Rqcmq/m|R-q|\leq c_{m}\sqrt{q}/m and γq\gamma\geq q,

Thus, to verify (119), it reduces to show the right hand side of the last displayed equation is negative. In view of (121),

where the last equality because cm=log(mq)c_{m}=\log(m\sqrt{q}) and the assumption that a=o(m)a=o(m). Combining the last two displayed equations and plugging in the definition of ϵ\epsilon yield that

Hence, it follows from (125) that (119) holds. Consequently, Z0Z\succeq 0 holds with high probability. This completes the proof of the lemma.

Appendix G Proof of (79)

Note that for each iCi\in C, XijCWijX_{i}\triangleq\sum_{j\in C}W_{ij} is distributed according to N(0,K1){\mathcal{N}}(0,K-1) but not independently. Below we use the Chung-Erdös inequality :

Appendix H Proof of (86)

Suppose for convenience of notation that CC^{*} consists of the first KK indices, and TT consists of the first KoK_{o} indices: C=[K]C^{*}=[K] and T=[Ko]T=[K_{o}]. Let T={iT:e(i,T)(Ko1)p+6σ},T^{\prime}=\{i\in T:e(i,T)\leq(K_{o}-1)p+6\sigma\}, SinceIn case T=T^{\prime}=\emptyset we use the usual convention that the minimum of an empty set of numbers is ++\infty.

The set TT^{\prime} is independent of (e(i,C\T):iT)(e(i,C^{*}\backslash T):i\in T) and those variables each have the Binom(KKo,p){\rm Binom}(K-K_{o},p) distribution. Using the tail lower bound (84), we have

By definition of τ1\tau^{\prime}_{1} and the convexity of divergence, d(τ1p)(1δ)d(τ1p)d(\tau^{\prime}_{1}\|p)\leq(1-\delta)d(\tau_{1}\|p), it follows that