Tight Lower Bounds for Planted Clique in the Degree-4 SOS Program
Prasad Raghavendra, Tselil Schramm
Introduction
In the Maximum Clique problem, the input consists of a graph and the goal is to find the largest subset of vertices all of which are connected to each other. The Maximum Clique problem is NP-hard to approximate within a -factor for all [Hås96, Kho01].
The works of [FK08, BV09] show that, if one were able to efficiently calculate the injective tensor norm of a certain random order- tensor, then by extending the spectral algorithm of [AKS98] one would have a polynomial-time algorithm for . However, there is no known algorithm that efficiently computes the injective tensor norm of an order- tensor; in fact computing the inective tensor norm is hard to approximate in the general case [HM13].
This line of work brings to the fore the question: can a -degree SOS relaxation solve the planted clique problem for some ? While lower bounds are known for Lovász-Schrijver SDP relaxations for planted clique [FK00, FK03], SOS relaxations can in general be much more powerful than Lovász-Schrijver relaxations. For example, while there are instances of unique games that are hard for -rounds of the Lovász-Schrijver SDP hierarchy [KS09, RS09], recent work has shown that these instances are solved by degree- SOS hierarchy [BBH+12].
Moreover, even the degree- SOS relaxation proves to be surprisingly powerful in a few applications:
First, the work of Barak et al. [BBH+12] shows that a degree SOS relaxation can certify hypercontractivity of low degree polynomials over the hypercube. This argument is the reason that hard instances for Lovász-Schriver and other SDP hierarchies constructed via the noisy hypercube gadgets are easily refuted by the SOS hierarchy.
Second, a degree- SOS relaxation can certify that the -to- norm of a random subspace of dimension at most is bounded by a constant (with high probability over the choice of the subspace) [BBH+12]. This average-case problem has superficial similarities to the planted clique problem.
In this work, we make modest progress towards a lower bound for SOS relaxations of planted clique by obtaining a nearly tight lower bound for the degree- SOS relaxation (corresponding to two rounds, ). More precisely, our main result is the following.
Note that by the work of [AKS98], this result is tight up to logarithmic factors. In an independent work, Hopkins, Kothari and Potechin [HKP15] have obtained a similar result.
We use the symbol to denote the PSD ordering on matrices, saying that if is PSD and that if . When we wish to hide constant factors for clarity, we use to denote that for some constant .
In our notation, we at times closely follow the notation of [DM15a], as our paper builds on their results and we recycle many of their bounds.
2 Organization
Preliminaries and Proof Overview
In this section, we describe the degree-4 SOS relaxation for the max-clique SDP and give background on the Deshpande-Montanari witness. We then describe our own modified witness, and give an overview of the proof that our witness is feasible (the difficult part being showing that our witness is PSD). The full proof of feasibility is deferred to Section 3.
It is instructive to think of the variable as a pseudoexpectation of the product of indicator variables, or a pseudomoment:
We obtain Theorem 2.1 by constructing a point, or witness, for each , then proving that the point is feasible with high probability. We defer the description of our witness to Definition 2.8 and Definition 2.9, as we spend Section 2.2 and Section 2.3 motivating our construction; however the curious reader may skip ahead to Definition 2.9 which does not require the knowledge of additional notation.
2 Deshpande-Montanari Witness
This SDP solution assigns to each clique of size , a value that depends only on its size (in our case, ). In essence, their solution takes advantage of the independence of the instance. The motivating observation is that the variable can be thought of as a pseudoexpectation of the indicator that is a subclique of the planted clique. The idea is then to make this pseudoexpectation of the indicator consistent with the true expectation under the distribution where a clique of size is planted uniformly at random within the instance of . Thus, every vertex is in the clique “with uniform probability:”
Then, the same principle is applied to edges, traingles, and -cliques, so that
This is the general idea of the SDP solution of [DM15a]. More formally, the SDP solution in [DM15a] is specified by four parameters as,
where for a set of vertices , is the indicator that the subgraph induced on is a clique. The parameters determine the value of the objective function, and the feasibility of the solution. As a convention, we will define .
It is easy to check that the solution satisfies all the linear constraints of the SOS program (2.1), since it assigns non-zero values only to cliques in . The key difficulty is in showing that the matrix is PSD for an appropriate choice of parameters .
In order to show that , it is sufficient to show that where,
where is the indicator for the presence of the edge . In words, is the matrix where the entry is proportional not to the indicator of whether is a clique, but to the indicator of whether has as a subgraph the bipartite clique with bipartitions and . It is easy to see that the matrix is obtained by dropping from the rows and columns corresponding where . Hence .
Notice that is a random matrix whose entries depend on the edges in the random graph . At the risk of over-simplification, the approach of both the previous works [MPW15] and [DM15a] can be broadly summarized as follows:
The most significant challenge is to argue that (2.3) holds with high probability. In fact, the inequality only holds for the Deshpande-Montanari SDP solution with high probability for parameters for which the objective value is .
where are the projections to the spaces respectively. The eigenvalues are given by,
Deviation from Expectation
where includes all multilinear entries, and includes all non-multilinear entries, i.e., entries where . Formally,
The spectral norm of the matrix over the eigenspaces is carefully bounded in [DM15a].
(Proposition 4.20, 4.25 in [DM15a]) With probability at least , all of the following bounds hold:
Deshpande and Montanari fix , , and for a parameter . Using Proposition 2.3 and Lemma 2.4, the above matrix inequality becomes,
which can be shown to hold for . Eventually, it is necessary to show (2.3), which is stronger than . This is again achieved by showing bounds on the spectra of and . We refer the reader to [DM15a] for more details of the arguments.
3 Problematic Subspace
The SDP solution described above ceases to be PSD at which corresponds to an objective value of . The specific obstruction to arises out of (2.10). More precisely, the bottom principal minor which yields the constraint,
In fact, we identify a specific subspace that is problematic for the [DM15a] solution. To describe the subspace, let us fix some notation. Define the random variable to be if , and otherwise. We follow the convention that .
This is an immediate observation from the various matrix norm bounds in [DM15a] (specifically Lemma A.2, Lemma A.3 and Observation A.5). We defer the detailed proof to Appendix A.1. ∎
Since , the above lemma implies that all the vectors with large singular values for are within the subspace . Furthermore, we will show the following lemma which clearly articulates that is the sole obstruction to .
It is sufficient to show that and . Using Proposition 2.3 and (2.9), when condidition (2.11) holds. Using Proposition 2.3, Lemma 2.4 and Lemma 2.5 we can write,
which is PSD given the bounds on in conditions (2.12) and (2.13). To see this, one shows that all the principal minors are PSD.
An immediate corollary of the proof of the above lemma is the following.
Under the hypothesis of Lemma 2.6, with probability ,
4 The Corrected Witness
Suppose we have an unconstrained matrix that we wish to modify as little as possible so as to ensure . Given a test vector so that , the natural update to make is to take for a suitably chosen . This would suggest creating a new SDP solution by setting .
Unfortunately, the SOS SDP relaxation has certain hard constraints, namely that the non-clique entries are fixed at zero. Moreover, the entry must depend only on . Setting the SDP solution matrix to would almost certainly violate both these constraints. It is thus natural to consider multiplicative updates to the entries of the matrix which clearly preserve the zero entries of the matrix.
where . Then our SDP witness is the matrix , defined so that
where is the projection that zeros out rows and columns corresponding to pairs .
where is some factor chosen for each depending on the choice of , which we will set later to ensure that the final moments matrix is PSD.
For , and , with probability at least , the solution does not violate any of the linear constraints of the planted clique SDP.
First, whenever so these entries satisfy the constraints of the SDP. If then is given by,
Notice that is non-zero only if is a clique, and it depends only on . Moreover, is a sum over iid mean random variables and therefore satisfies,
A simple union bound over all subsets shows that for all of them with probability at least . ∎
It now remains to verify that . We will do this by verifying the Schur complement conditions, as in [DM15a]. Analogous to the submatrix , one can consider the corresponding submatrix of . The expression for is as follows:
Here is the matrix with on the diagonal, and is the matrix corresponding to the non-multilinear entries (entries corresponding to monomials like ), and is the all-s matrix. The matrices and are unchanged, and so we must simply verify that and that .
This concludes our proof overview. In Section 3, we verify the Schur complement conditions and prove our main result, and in Section 4 we give the random matrix concentration results upon which we rely throughout the proof.
Proof of the Main Result
In this section, we will demonstrate that , and that . This will allow us to conclude that our solution matrix is PSD, and therefore is a feasible point for the degree-4 SOS relaxation.
Before we proceed further, it will be convenient to parametrize the choice of and . In particular, it will be useful to fix,
for two parameters , which we will finally fix to and . For this setting of parameters, the eigenvalues from Proposition 2.3 are bounded by,
Here we will make a first step towards verifying the Schur complement conditions of Lemma 2.2 by showing that . Specifically, we will show the following stronger claim.
For and , , the following holds with probability at least ,
Fix . By definition of , we have
Define . We can apply Lemma 2.6 to the term and Corollary 2.7 for ,
Now we will appeal to a few matrix concentration bounds that we show in Section 4. First, with probability , the vectors are nearly orthogonal, and therefore form a well-conditioned basis for the subspace .
Also, the vectors have negligible projection on to the eigenspaces which implies that with overwhelming probability,
Finally, is an dimensional space. Each has only non-zero singular values each of which is . Moreover, multiplying on the left and right by acts as a random linear transformation/ random change of basis. Intuitively, this suggests that has eigenvalues all of which are roughly . In fact, with probability ,
By Lemma 2.4, with probability at least , . Substituting this bound for along with (3.2), finishes the proof for our choice of parameters. The details are presented below for completeness.
Clearly and
Towards bounding the eigenvalues of , Deshpande and Montanari [DM15a] observe the following properties of with regards to the spaces and .
Thus, the columns of are in . So we have that
In Lemma 4.5, we bound . So we have that
where we have applied the inequality . The conclusion follows by noting that , and that by Lemma 3.2 . ∎
We will bound , as in our bounds on , by splitting the matrix up according to the eigenspaces .
Let be as defined in (3.1). For the choice of in (3.1), we have that with probability ,
If we set then the inequality reduces to,
which is an immediate consequence of triangle inequality for and Cauchy-Schwartz inequality. ∎
To simplify the calculations and make the dominant terms apparent, let us fix , , and wherein each . First, observe that for this setting of parameters.
For the terms , and we can write,
The conclusion follows by grouping the projections, and taking the dominating terms as grows. ∎
For the choice of given in (3.1), we have that with probability .
Recall that by Theorem 3.1 with probability at least ,
By our choice of the parameters , the conclusion of Theorem 3.4 implies that
as desired. The details of the calculation are spelled out below for the sake of completeness; we verify that the coefficient of each projector is non-negative.
5 Proof of Main Theorem
We finally have the components needed to prove Theorem 2.1.
First, we recall that independent of our choice of , the SDP solution defined in Definition 2.8 does not violate any of the linear constraints of (2.1), as shown in Proposition 2.10. To meet the program constraints, it remains to show that for the choice of given in (3.1), our solution is PSD.
The solution matrix is a principal submatrix of , and so implies . We prove that satisfies the Schur complement conditions from Lemma 2.2 with high probability. Observing that and , we apply Theorem 3.6, which states that for our choice of , . For the term, we apply the lower bound on the eigenvalues of given by [DM15a] (Lemma A.7), which state that so long as and , we have with probability . For our choice of , we have
and so we may conclude that . Therefore by a union bound, the conditions of Lemma 2.2 are satisfied with probability , and and so our solution satisfies the PSDness constraint.
Concentration of Projected Matrices
In this section, we give bounds on the spectra of random matrices that are part of the correction term. Though we are able to recycle many of the spectral bounds of Deshpande and Montanari [DM15a], in our modification to their witness, we introduce new matrices which also require description and norm bounds.
From , we form a new graph by identifying all the nodes with the same label; thus, the number of nodes in is the number of labels in . We then collapse the parallel edges in to form the graph ; since each labelled edge appears at least twice, the number of edges in is at most half that in . The number of nodes in (and thus labels in ) is at most the number of edges in plus the number of connected components; this is tight when is a forest. Thus the number of distinct labels in is at most , where is the number of components in . ∎
We begin with a lemma that shows that the are close to an orthogonal basis for .
By definition, the vectors form a basis for the subspace .
Let be the matrix whose th row is . We will use matrix concentration to analyze the eigenvalues of , which are identical to the nonzero eigenvalues of .
The expression is a sum over monomial products over variables , where each monomial product corresponds to a labelling of a graph . Each entry in corresponds to a sum over links, where each link is a cycle of length , with the vertices on opposite ends of the cycle, and the necessarily distinct vertices are on the other opposite ends of a cycle. We will refer to as the center vertices and as the peripheral vertices of the link. Each edge of the link is weighted by . Since for all , for every contributing labelling, it can never be the case that one of . Each monomial product in the summation corresponds to a labelling of the graph , where is a cycle with links. has edges, and in total it has vertices.
We prove that any such contributing labelling has at most unique vertex labels. We proceed by induction on , the length of the cycle. In the base case, we have a cycle on two links; by inspection no such cycle can have more than labels, and the base case holds.
Now, consider a cycle of length . If every label appears twice, then we are done, since there are vertices in . Thus there must be a vertex that appears only once.
There can be no peripheral vertex whose label does not repeat, since the two center vertices neighboring a single peripheral vertex cannot have the same label in a contributing term, as . Now, if there exists a center vertex whose label does not repeat, it must be that there is a matching between its neighbors so that every vertex is matched to a vertex of the same label; we identify these same-label vertices and remove and two of its neighbors from the graph, leaving us with a cycle of length , having removed at most one label from the graph. The induction hypothesis now applies, and we have a total of at most labels, as desired.
The following lemma allows us to approximate the projector to by a matrix that is easy to describe; we will use this matrix as an approximation to the projector in later proofs.
Now, let ; we can explicitly calculate the entries of ,
We will require the fact that lies mostly outside of , which we prove in the following lemma.
With probability at least ,
The only contributing terms correspond to those for which every edge variable in the product has even multiplicity. Each contributing term is a connected graph and has edges and at most vertices where every labeled edge appears twice, so we may apply Proposition 4.2 to conclude that there are at most labels in any such cycle. We thus have that
and applying Lemma 4.1, we conclude that with probability , as desired. ∎
We combine the above lemmas to bound the norm of one final matrix that arises in the computations in Theorem 3.1.
where to obtain the second line we have applied Cauchy-Schwarz, to obtain the third line we have used the fact that , and to obtain the final line we have used the fact that .
Now, the second term is with overwhelming probability by Lemma 4.5. It remains to bound the first term. To this end, we apply Lemma 4.3 to replace with . Let . An entry of has the form
Acknowledgements
We thank Satish Rao for many helpful conversations.
We also greatfully acknowledge the comments of anonymous reviewers in helping us improve the manuscript.
References
Appendix A Matrix Norm Bounds from Deshpande and Montanari
In this appendix, we give for completeness a list of the bounds proven by Deshpande and Montanari [DM15a] that were not included in the body above out of space or expository considerations.
With probability at least , for each and for each ,
Deshpande and Montanari use the trace power method to bound the norm of by bounding the norms of the individually. Some of the matrices have Wigner-like behavior.
With probability , we have that for each ,
A select few of the have larger eigenvalues.
With probability , we have that for each ,
We also give a short proof of an observation of Deshpande and Montanari, which states that some of the vanish when projected to :
Let be the projector to . Then always,
and so by the characterization of from Proposition 2.3 the vector . The conclusion follows.
Finally, we use a bound on the norm of the matrix , which is the difference of and the non-multilinear entries.
We also require bounds on the matrices used in the Schur complement steps. The bounds of Deshpande and Montanari suffice for us, since we do not modify moments of order less than .
We prove Lemma 2.5, which follows almost immediately from the bounds of [DM15a].
Using the matrices from Definition A.1 and Observation A.5, we have that
where we have used the fact that the columns of and lie in . We thus have
Now, we prove that the trace power method works, for completeness.
The proof follows from an application of Markov’s inequality. We have that for even ,
where we have applied Stirling’s approximation in the last step. Choosing and completes the proof. ∎