Sum-of-squares lower bounds for planted clique
Raghu Meka, Aaron Potechin, Avi Wigderson
Introduction
Finding cliques in random graphs has been the focus of substantial study in algorithm design. Let denote Erdös-Renyi random graphs on vertices where each edge is kept in the graph with probability . It is easy to check that in a random graph , the largest clique has size with high probability. On the other hand, the best known polynomial-time algorithms can only find cliques of size and obtaining better algorithms remains a longstanding open problem: Karp [Kar76] suggested that even finding cliques of size could require superpolynomial time.
Motivated by this, much attention has been given to the related planted clique problem or hidden clique problem introduced by Jerrum [Jer92] and Kucera [Kuc95]. Here, we are given a graph generated by first choosing a random graph and placing a clique of size in the random graph for . The goal is to recover the hidden clique for as small a as possible given . The study of the planted clique problem and its variations (like finding planted dense subgraphs) is motivated from several other more recent directions. Its potential as being hard on average has lead to proposals to base crypto systems on variants of it [ABW10]. It was used to argue that testing -wise independence is hard near the information theoretic limit by [AAK+07]. It is used in [ABBG10] to argue that evaluating some financial derivatives is hard. It was also used to justify the hardness of sparse principal component detection by Bethet and Rigollet [BR13]. Another source of interest comes from the related algorithmic problem of finding large communities in social networks. The best known polynomial-time algorithms can solve the problem for [AKS98] (see [DGGP14] for a near linear-time algorithm) and improving on this bound has received significant attention. The algorithmic problem has also been of much interest in the context of signal finding in molecular biology (pattern discovery in DNA sequences) as modeled in the work of [PS+00].
In this work we exhibit a lower bound for the problem in the powerful Lasserre [Las01] and “sum-of-squares” () [Par00] semi-definite programming hierarchiesFor brevity, in the following, we will use hierarchy as a common term for the formulations of Lasserre [Las01] and Parrilo [Par00] which are essentially the same in our context.. As it happens, proving such lower bounds for the planted clique problem reduces easily to proving an integrality gap of value for the natural formulation of the maximum clique problem in these hierarchies on graphs. Our main result then is the following average-case lower bound for maximum clique. We defer the formal definition of the semi-definite relaxation and hierarchies for now, and only note a few facts. First, that implementing the th level of the hierarchy (namely, rounds), takes roughly time, which is polynomial for constant . Second, the above algorithm for may be viewed as implementing only one round. Third, that suffices for exact solution of the problem, namely finding the maximum clique. Our lower bound implies that polynomial time (when the number of rounds is constant) cannot handle even , and that as many as rounds cannot handle . Here are more precise statementsThroughout, denote constants..
With high probability, for the natural -round relaxation of the maximum clique problem has an integrality gap of at least .
As a corollary we obtain the following lower bound for the planted clique problem.
With high probability, for the natural -round relaxation of the planted clique problem has an integrality gap of at least .
2 Background and related work
Linear and semi-definite hierarchies are one of the most powerful and well-studied techniques in algorithm design. The most prominent of these are the Sherali-Adams hierarchy () [SA90], Lovasz-Schrijver hierarchy () [LS91], their semi-definite versions , and Lasserre and hierarchies. The hierarchies present progressively stronger convex relaxations for combinatorial optimization problems parametrized by the number of rounds , where the -round relaxation can be solved in time on instances of size in all of them. In terms of relative power (barring some minor technicalities about how the numbering of rounds starts), it is known that . Because they capture most powerful techniques for combinatorial optimization, lower bounds for hierarchies serve as strong unconditional evidence for computational hardness. Such lower bounds are even more relevant and compelling in situations where we do not have NP-hardness results, as is the case for typical average-case optimization problems.
Broadly speaking, our understanding of the hierarchy is more limited than those of and hierarchies and in fact the hierarchy appears to be much more powerful. A particularly striking example of this phenomenon was provided by a recent work of Barak et al. [BBH+12]. They showed that a constant number of rounds of the hierarchy can solve the much studied unique games problem on instances which need super constant number of rounds. It was also shown by the works of [BRS11, GS11] that the hierarchy captures the sub-exponential algorithm for unique games of [ABS10]. These results emphasize the need for a better understanding of the power and limitations of the hierarchy.
From the perspective of proving limitations, all known lower bounds for the hierarchy essentially have their origins in the works of Grigoriev [Gri01b, Gri01a], some of which were later independently rediscovered by Schoenebeck [Sch08]. These works show that even rounds of hierarchy cannot solve random or instances, implying a strong unconditional average-case lower bound for a natural distribution.
Most subsequent lower bounds for hierarchy such as those of [Tul09], [BCV+12] rely on [Gri01b] and [Sch08] and gadget reductions. For example, Tulsiani [Tul09] shows that rounds of has an integrality gap of for maximum clique in worst-case. This is in stark contrast to the average-case setting: even a single round of gets an integrality gap of at most for maximum clique on [FK00]. Thus, the worst-case and average-case problems have very different complexities. Finally, using reductions tend to induce distributions that are far from uniform and definitely not as natural as .
For max-clique on random graphs, Feige and Krauthgamer [FK00] showed that , and hence , has an integrality gap of at most with high probability. Complementing this, they also showed [FK03] that the gap remains for with high probability. However, there were no non-trivial lower bounds known for the stronger hierarchy.
For the planted clique problem, other algorithmic techniques were studied. Jerrum [Jer92] showed that a broad class of Markov chain Monte-Carlo (MCMC) based methods cannot solve the problem when the planted clique has size for any constant . Another approach for the planted clique problem based on optimizing a third order tensor was suggested by Frieze and Kannan [FK08]. However, the corresponding optimization problem is NP-hard in the worst-case.
In a recent work, Feldman et al. [FGR+13] introduced the framework of statistical algorithms which generalizes many algorithmic approaches like MCMC methods and showed that such algorithms cannot find large cliques when the planted clique has size in less than timeThe results of [FGR+13] actually apply to the harder bipartite planted clique problem, but this assumption is not too critical.. However, their framework seems quite different from hierarchy based algorithms. In particular, the statistical algorithms framework is not applicable to algorithms which first pick a sample, fix it, and then perform various operations (such as convex relaxations) on it, as is the case for the hierarchies above.
Meka and Wigderson [MW13] addressed lower bounds for planted clique and claimed a stronger bound than Thm 1.1. While there was a fatal error in their proof, many of the techniques introduced there are used in the present paper.
Independent of our work, Deshpande and Montanari [DM15] recently gave a degree lower bound for planted clique; while they are only able to handle the degree case (i.e., ) , they obtain a better bound for this case than us (roughly vs as we do).
3 Proof systems and SDP hierarchies
A potentially simpler problem than deciding is a large clique exists is the problem of producing short certificates to the non-existence of such cliques. This puts the problem in the realm of proof complexity. Indeed, we approach the problem of lower bounds from this viewpoint, via the positivstellensatz proof system perspective of Grigoriev and Volobjov [GV01]. We explain this proof system next in general, and then specialize to Boolean problems and specifically to planted clique.
Suppose we are given a system of polynomial equations or “axioms”
where and are arbitrary -variate polynomials. Clearly, if there exists an identity as above, then the system has no solution over reals. Starting with the seminal work of Artin on Hilbert’s seventeenth problem [Art27], a long line of important results in real algebraic geometry – [Kri64, Ste73, Put93, Sch91]; cf. [BCR98] and references therein – showed that, under some (important) technical conditionsWe avoid going into the details here as the conditions are easily met in the presence of Boolean axioms., such certifying identities always exist for an infeasible system. This motivates the following notion of complexity for refuting systems of polynomial equations.
where are -variate polynomials such that for all and for all .
Our interest in positivstellensatz refutations as above comes from the known relations between such identities and hierarchy. Informally (and under appropriate technical conditions), identities as above of degree show that hierarchy can certify infeasibility of the axioms in rounds and vice versa. We will focus on showing degree lower bounds for identities as above and use them to get integrality gaps for the the hierarchy. We formalize this in Section 12. For a brief history of the different formulations from [GV01], [Las01], [Par00] and the relations between them and results in real algebraic geometry we refer the reader to [OZ13].
Given the above setup, we shall consider the following set of natural axioms to test if a graph has a clique of size .
Given the above theorem it is easy to deduce the integrality gap for the SOS hierarchy, Theorem 1.1: see Section 12. We next highlight the outline of the proof, and some of our techniques which may be of broader interest.
4 Outline
Under reasonable technical conditions which ensure strong duality, the converse also holds. For the clique axioms from Equation 2.1, a dual certificate would correspond to a feasible vector solution for the -round relaxation for maximum clique (see Figure 1 for the exact formulation) with value .
The following elementary lemma will be crucial.
The existence of such a mapping trivially implies a lower bound for refutations: apply to both sides of a purported identity as in Equation 1.1 to arrive at a contradiction.
The lemma suggests a general recipe for proving refutation lower bounds:
Reduction to PSDness of another matrix : The matrix has many zero rows and columns which makes it difficult to work with. In Section 5 we fix this by filling in the zero rows and columns of to obtain a new matrix . We then argue that to show is PSD it is sufficient to show that is PSD.
(Deterministic) Matrix analysis: is PSD with a large minimum eigenvalue . We show this statement in Section Section 7 by using the theory of association schemes described below.
Large deviation: with high probability, . This is done by using the structure of our matrix along-with a careful application of the trace method to bound the norms of certain random matrices with dependent entries.
As discussed, the essence of proving Theorem 1.5 involves showing that a certain random matrix is positive semi-definite (PSD) with high probability. In our case, this calls for showing a relation of the form Here and henceforth denotes PSD ordering: if and only if is positive definite. for two matrices whose rows and columns are indexed by subsets of of size . This in turn leads us to matrices which though complicated to describe, will be set-symmetric - the entry defined by any two (row and column) sets depends solely on the size of the intersection . The set of all such matrices, called the Johnson scheme, is quite well studied in combinatorics as a special case of association schemes. In particular, all such matrices commute with one another and their common eigenspaces are completely understood. This theory allows us to estimate the eigenvalues and norms of various matrices that arise in the analysis.
Techniques: Trace bounds for locally random matrices
After various simplifications and reductions, a central problem we have to deal with is upper bounding the spectral norm of certain random matrices, defined by the underlying random graph . As above, these matrices have rows and columns indexed by subsets of vertices. The entry of the matrix will be a random variable of expectation zero, which depends only on the edges and non-edges of in the subgraph induced by (hence we name such matrices local). In the simple case when (so rows and columns are indexed by singletons), which is the one studied in the analysis of the approximation algorithm, the random variables in all entries are mutually independent, and a norm bound is easy to obtain by a straightforward use of the trace method. However, for as we need to handle, the entries of the matrix are dependent whenever the edge sets of their entries intersect. This significantly complicates the trace calculation, and we develop some combinatorial tools to bound the trace of high powers of such local matrices.
Dual certificate for 𝖯𝖲(r)𝖯𝖲𝑟\mathsf{PS}(r) refutations of max-clique
As mentioned in the introduction, we can often work out what the dual certificate should be from the axioms and basic linear algebra. As an example, we first work out the case where the graph is the complete graph; this will also help us draw a concrete connection to the work of [Gri01a].
For complete graph, the clique axioms simplify to
These incidentally also correspond to proving lower bounds for knapsack as studied by Grigoriev [Gri01a] (and was what lead us to the specific dual certificate we study). However, in the context of lower bounds for knapsack, the axioms are mainly interesting for non-integer and Grigoriev shows that for non-integer , the above system has no refutation for .
From the above it follows that we can define and hence as follows:
Grigoriev takes . Here we set with a view towards what is to come. Thus, the final certificate is
For , the mapping defined above is PSD for .
2 Certificate for clique axioms
The above equations give us a system of linear equations that needs to satisfy. By working with the equations, it is easy to guess a natural solution for the system.
Given a graph on , and , , let
For instance, if and , then is the degree of vertex .
For any graph , defined by Equation 2.4 satisfies Equations 2.2.
The first equation in Equation 2.2 follows immediately from the definition of . Now, for ,
Observe that our notion of degree, , satisfies the following recurrence: for ,
The above two equations imply that satisfies the second equation in 2.2. ∎
Thus, to prove our main theorem Theorem 1.5, it suffices to show that as defined above is PSD with high probability. We now argue that in fact, to show that is PSD we do not need to consider all polynomials of degree at most . Rather, it is sufficient to show that whenever is multilinear and homogeneous of degree .
For any of degree at most we may write where is multilinear and homogeneous of degree , has degree at most , and all have degree at most .
We first make multilinear by removing any terms which are not multilinear from as follows. If has a term of the form where has degree at most , write . Iteratively applying this procedure, we may write plus terms of the form where is multilinear of degree at most and has degree at most .
We now make multilinear and homogeneous of degree by removing any terms which have lower degree as follows. If has a term of the form where , write
Iteratively applying this procedure, we may write plus terms of the form and terms of the forms where is multilinear and homogeneous of degree , all such have degree at most and all such have degree at most . Putting everything together, the result follows. ∎
If for all multilinear homogeneous of degree then is PSD.
Assume for all multilinear homogeneous of degree and for some . Using Lemma 2.3, we may write where is multilinear and homogeneous of degree . so . Contradiction. ∎
In the remainder of the paper, we show that is PSD with high probability for .
There exists a constant such that, with high probability over , the matrix defined by Equation 2.5 is PSD for .
Overview of proof of Theorem 2.5
The proof of Theorem 2.5 is quite technical, and is broken into two parts, where the second part is further broken down into smaller parts. While we gave a sketch of the proof of Theorem 2.5 in the inroduction, we give a more detailed overview of the proof here. Recall that all matrices mentioned below are random matrices which are specified by the choice of the random graph .
As mentioned in the introduction, the matrix has many zero rows and columns which makes it difficult to work with. The first part is to fill in the zero rows and columns of to obtain a new matrix, , which is nonsingular and has no high variance entries. In Section 5 we define this matrix and show that if is PSD, so is . The idea is that and are symmetric and the nonzero part of is a principal submatrix of , so the smallest nonzero eigenvalue of is at least as large as the smallest eigenvalue of .
Having defined (which is set-symmetric), let us spell out what the other matrices are. The“local” random matrix is defined in a simple way as follows:
Finally, define the last matrix .
The proof that is PSD proceeds in three modular steps:
We use the results about Johnson scheme to show that and has a large least eigenvalue (roughly ); see Section 7.
We next show that by exploiting the recursive structure of the matrix and some careful trace calculations. This is the most technically intensive part of the proof, and requires the development of some combinatorial tools to estimate the trace of high powers of ; see Section 8.2.
We then show that . This is done by first showing that every entry of is small in magnitude, via concentration bounds on the number of cliques in random graphs, and bounding its norm using Gershgorin’s circle theorem (Lemma 4.1); see Section 8.3.
Preliminaries
We shall use the following notationsSome are repeated from the introduction so as to have them at one place.:
denotes the set of -variate polynomials of degree at most .
denotes positivstellensatz refutations of degree at most as defined in Definition 1.3.
For , let , denote all subsets of size exactly and at most , respectively.
For , let .
By default all vectors are column vectors. For a set , denotes the indicator vector of the set .
We will also need the following standard fact from matrix theory (see [GVL96] for instance).
Finally, we need McDiarmid’s inequality for obtaining tail bounds for functions of independent random variables (see [dubashi2009concentration] for instance)
Let be independent random variables and let be a function over the domain space of . Let be such that for all , ,
In this section, we define the matrix and show that if is PSD then so is . We use the following notations for brevity: For any set , let . For , let
Intuitively, for every , is what would be had we added cliques on the subsets , to the graph. The above definition avoids the problem of the whole row and column corresponding to or becoming zero if either was not a clique and controls the variance of the entries. We now show that to show is PSD, it is sufficient to show that is PSD.
The reason this lemma is true is because as shown below, the nonzero part of is a principal submatrix of .
Whenever and are cliques of size in ,
Suppose that and are cliques in . Then, if and is a clique and otherwise. Therefore,
The nonzero part of is a principal submatrix of .
We now use the following elementary fact about matrices.
If is a principal submatrix of a symmetric matrix then the smallest eigenvalue of is at least as large as the smallest eigenvalue of .
Combining Corollary 5.3 and Proposition 5.4, if is PSD then is PSD, as needed. ∎
Johnson scheme
Association schemes is a classical area in combinatorics and coding theory (cf. for instance [vLW01]). We shall use a few classical results (lemmas 6.6, 6.7 below), about the eigenspaces and eigenvalues of association schemes and the Johnson scheme in particular. We also introduce two bases for the Johnson scheme, which will play a key role in bounding the eigenvalues of various matrices later.
We start with some basics about the Johnson scheme - some of our notations are non-standard but they fit better with the rest of the manuscript.
As we will soon see, is also a commutative algebra. There is a natural basis for the subspace :
Another important collection of matrices that come up naturally while studying PSD’ness of set-symmetric matrices is the following which gives a basis of PSD matrices for the Johnson scheme.
Equivalently, for , if we let be the PSD rank one matrix
For fixed , the following relations hold:
The first relation follows immediately from the definition of . The second relation follows from inverting the set of equations given in (1). ∎
The main nontrivial result from the theory of association schemes we use is the following characterization of the eigenspaces of matrices in . The starting point for these characterizations is the fact that matrices in commute with one another and hence are simultaneously diagonalizable. We refer the reader to Section 7.4 in [God] (the matrices in our notation correspond to matrices in [God]) for the proofs of these results.
are eigenspaces for and consequently for all matrices in .
For , .
For any matrix , let denote the eigenvalue of within the eigenspace . Then,
Therefore, as and ’s have common eigenspaces, by Lemma 6.6,
PSD’ness of the expectation matrices
The expectation matrix above is just a scalar multiple of (viewed as a matrix) as defined in Equation 2.1. Therefore, by Theorem 2.1, as defined above is PSD for . We give a simpler proof of this claim here for the case when .
The matrix is positive definite for .
We will show this by writing as a suitable positive linear combination of the PSD matrices ’s from Section 6. More concretely, for any , we have
If and then is PSD with minimal eigenvalue
Since the ’s are PSD and , is PSD with minimal eigenvalue , as needed. ∎
PSD’ness of dual certificate
We are now ready to prove our main result, Theorem 1.5, with the aid of several technical results whose proof is deferred to Section 9 and Section 10. We prove Theorem 1.5 by showing that the matrix will be PSD with high probability (Theorem 2.5). In turn, we show that is PSD with high probability with our main technical lemma, which says that is PSD with high probability (this is sufficient by Lemma 5.1).
To prove Lemma 8.1, we first decompose as in Section 8.1. We then analyze and in Section 8.2 and Section 8.3 respectively. We put all the pieces together to show the PSD’ness of in Section 8.4.
For the remainder of this section, we shall use the following additional notations:
For , let . Then, for with , is the probability that .
In the following we will adopt the convention that denote elements of and denote elements of .
We write if there exist constants such that .
Finally, define . We have already shown in Section 7 that is PSD with minimal eigenvalue . There are now two remaining modular steps in the proof:
We show that is by exploiting the recursive structure of the matrix and some careful trace calculations. This is the most technically intensive part of the proof.
We then show that is . This is done by first showing that each entry of is small in magnitude and using Lemma 4.1.
The next two subsections address these two steps with the corresponding technical elements dealt with in Section 9 and Section 10 respectively.
2 Bounding the norm of the locally random matrix L𝐿L
In this subsection, we bound the norm of the matrix .
For some constant , with probability at least over the random graph ,
In other words, for disjoint the ’th entry is essentially (up to a constant multiple) a shift of the indicator random variable which is if all edges in are in and otherwise.
If , for all , .
Note that is an easy bound for (each entry of the matrix is at most in magnitude); the main advantage of the claim is the multiplicative factor.
In the remainder of this section we use the recursive structure of the matrix to prove Claim 8.2 assuming the above claim. We first introduce some notation:
The next claim relates the norms of “lifts” of matrix , . Conceptually, bounding the norms of matrices with non-zero entries on intersecting indexing sets are reduced to that of the disjoint case. Note that the requirement exactly captures the latter.
We partition the entries of as follows.
For any such that , , and where , let be the matrix such that the following is true:
if where are the elements of in increasing order and are the elements of in increasing order.
For all , .
The nonzero part of can be viewed as a submatrix of , so it cannot have larger induced norm than . ∎
If then . If then . This implies that if and only if , is the set of indices of in , and is the set of indices of in , which happens for precisely one . Thus, for precisely one and is otherwise, so , as needed. ∎
.
If , are distinct subsets of of size , , and then and .
Assume that and let be the elements of in increasing order. Then . Contradiction. Following similar logic, we cannot have that either. ∎
For any , .
Note that we can permute the rows and columns of a matrix without affecting its induced norm. By Proposition 8.9, we can permute the rows and columns of to put it into block form where each block is the nonzero part of for some . For a matrix in block form, its norm is the maximum of the norms of the individual blocks, which by Proposition 8.6 is at most , as needed. ∎
With these results, Lemma 8.4 follows immediately. Plugging in Proposition 8.10 to Proposition 8.8 gives , as needed. ∎
We now use the above statements to prove Lemma 8.2.
We claim that for , and as in Equation 8.1
To see the above, fix with and let , . Observe that
We cosider two cases as in the definition of .
Case 1. . Then, . Equation 8.6 now follows from the first case of the definition of .
Case 2. . Then, . Equation 8.6 now follows from the second case of the definition of .
Therefore, by Claim 8.3, Lemma 8.4 and Equation 8.6,
The lemma now follows as . ∎
3 Bounding the norm of the global error matrix ΔΔ\Delta
The main claim of this subsection is the following bound on the spectral norm of .
For , with probability at least over the random graph ,
The proof relies on the following bound on the individual entries of .
For some universal constant , and , with probability at least over the random graph , for all , with ,
Before proving the lemma, we first use it to bound .
Suppose that the conclusion of the previous lemma holds. Then, for any ,
The lemma now follows from the above bound and Lemma 4.1. ∎
Fix sets with . Let be the event that .
Then, by the second case of Equation 8.3, conditioned on we have . Thus, the claim holds trivially in this case. In the following we condition on . Observe that
We next use the following claim that is concentrated around its mean when conditioned on being a clique. At a high level, this follows from the fact that conditioned on being a clique, can be written as a (structured) low-degree polynomial in the indicator variables of the edges not in with small variance. We defer the proof to the appendix.
As a consequence of the above claim we also get concentration for . This is because is identically distributed as . Therefore, taking and applying a union bound over all sets we get that with probability at least , for all such that , and ,
and conditioned on , . The lemma now follows by combining the above two bounds. ∎
4 Putting things together
We now prove Lemma 8.1 and use it to prove our main results.
for as in the statement of the lemma for a sufficiently big constant . ∎
We bring the arguments from previous sections together to prove our main results Theorem 2.5 and Theorem 1.5.
Follows immediately from Lemma 5.1 and Lemma 8.1. ∎
Follows immediately from Lemma 1.8, Claim 2.2 and Theorem 2.5. ∎
Theorems 1.1 and 1.2 follow immediately from our -refutation lower bound using standard arguments. We defer these to the appendix.
Bounding norms of locally random matrices
Going back to Claim 8.3 let us first look at the special case of to gain some intuition. In this case, the entries of are (essentially) independent, and so the trace method is easy to apply. More precisely, is a symmetric random matrix with zeros on the diagonal and the entries in the upper diagonal taking independent uniformly random values. It is well known that in this case (see [Ver] for instance). One can also prove the bound by the trace method as follows. We have that
where . We can then look at which products have expectation .
To handle higher ’s we first generalize the above argument based on constraint graphs to work with general locally-random matrices. However, unlike for , distinct entries of the matrix are now dependent, which significantly complicates the structure of the terms and the associated count of the terms which have non-zero expectation. The rest of the section is devoted to this. While we apply our arguments to the particular locally-random matrices arising in our proof, these techniques should apply more generally to other locally-random matrices.
We next state our main technical result which gives us a way to bound traces of high powers of locally random matrices based on the structure of the individual terms. The advantage being that the conditions on the terms will be easier to ascertain in our applications.
Here we use rather than for subsets because we will be viewing the individual elements of each as vertices.
Assume that we have values and for every positive , we have a function such that and can be written in the form
For every term with non-zero expected value, for some integers and where and .
Then, if , for all ,
We will use this theorem with two types of functions . When for some matrix depending on , for all so this theorem gives us a probabilistic bound on . When for some function , then for all so this theorem gives us a probabilistic bound on .
In the case when , . Each term here has expected value at most and it is easy to argue that for any term with non-zero expected value, the number of distinct elements is at most . Applying Theorem 9.1 with , and we have that for all , and ,
This bound is weaker (by a logarithmic factor) than the bounds in e.g. [Ver], but is sufficient for our purposes.
Before proving the theorem we introduce the concept of constraint graphs which are a useful way to visualize our calculations. While the statement of the above theorem does not involve constraint graphs, thinking in terms of constraint graphs is helpful in proving the conditions required to apply the theorem.
Given a family of sets of vertices , we define a corresponding constraint graph whose vertices are the sets and there is an edge between , , if .
The above definition is useful because of the following elementary lemma.
In the following we use as a short form for . We prove this result by obtaining an upper bound on the number of terms in with nonzero expected value. This gives us a probabilistic upper bound for , implying the upper bound on .
Define to be the number of ways to choose subsets of such that and for all , .
We can choose each ordered -tuple of elements in which contains at most distinct elements as follows. There must be at least elements which are duplicates of other elements, so we can first choose a set of indices such that for all , for some . There are choices for . We then choose the elements . There are no restrictions on these elements so there are choices for these elements. Finally, we choose the elements . To determine each it is sufficient to specify the such that . For each there are choices for the corresponding , so the number of choices for these elements is at most . Putting everything together, the total number of choices is at most . Now note that since we are choosing subsets of rather than one big ordered tuple, the order within each subset does not matter. Thus, there are different ordered tuples which give the same subsets of elements, so the total number of possibilities for the subsets is at most , as needed. ∎
Moreover, by our assumptions, each of these nonzero terms has value at most , so
Now, by Markov’s inequality applied to ,
The claim now follows by rearranging the above bound. ∎
In this subsection, we prove Claim 8.3 using Theorem 9.1. For convenience, we restate Claim 8.3 here with more precise constants.
If , for all , .
The core of the proof will be to bound for any term with non-zero expectation which appear in the expansion of . We will do so by arguing that the constraint graph associated with the term has at most connected components, which we do by inductively decomposing as follows.
Given a partition of , define if and and otherwise.
whenever and are not disjoint. For all disjoint and , for choices of and and is for the rest. ∎
Since , ∎
To simplify this expression, rename the sets of vertices as follows.
If and is odd then take
If and is even then take
where we take . To study which of these terms may have non-zero expectation, we first define a graph related to the corresponding constraint graph.
Given a constraint graph , let be a graph with two types of edges, product edges and constraint edges, such that
Now, each is a random variable with expectation , so if any is independent from everything else, the product will have expectation . Such dependencies arise due to the presence of edges from G occurring in (at least two) different “elements” (say , for ) of the term. Such repeated occurrences manifest in our constraint graphs (and the graph defined above) as (three or four) cycles in the graph, which we call independence breaking. For a term to have non-zero expectation it must be that every element is on some such cycle. This implies that each product has zero expected value unless all of the product edges in the corresponding are part of independence-breaking cycles. This places restrictions on (see Lemma 9.17) which in turn places restrictions on the constraint graph , allowing us to use Theorem 9.1. We make these ideas precise below.
Given and , we define for all .
Define an independence breaking 3-cycle in to consist of product edges , and a constraint edge .
Define an independence breaking 4-cycle to consist of product edges , and constraint edges and .
For all such that whenever is odd and whenever is even, if the corresponding has a product edge which is not contained in any independence-breaking cycle then
If is not contained in any independence-breaking cycle then no edge between and appears anywhere else so is a random variable with expectation which is independent from everything else and thus . ∎
We now bound the number of connected components in with the following lemma.
Let and be a graph such that
Every product edge of is contained in an independence-breaking cycle.
Every constraint edge of is of the form where is even.
Then, the number of connected components in the graph defined by only the constraint edges of is at most .
The intuitive idea behind this lemma is that if we add the constraint edges in the right order, every new constraint edge can put two product edges into independence breaking cycles. For example, a constraint edge between and puts the product edges and into an independence breaking 3-cycle. If we then add a constraint edge between and , this puts the product edges and into an independence breaking 4-cycle. The final constraint edge can put 4 product edges into independence breaking cycles, so the number of constraint edges needed is .
To make this argument work, we use an inductive proof. We note that if there is no which is isolated in , we must have at least constraint edges. On the other hand, if there a which is isolated, there must be a constraint edge between and . As noted above, this constraint edge puts the product edges and into an independence breaking 3-cycle. We take this to be the first constraint edge. We then argue that we can essentially delete and merge and which allows us to use the inductive hypothesis. We make these ideas rigorous below.
We prove Lemma 9.17 by induction on . The base case is trivial, as we clearly need at least one constraint edge, so the number of connected components in is at most . Now assume that and the result is true for .
First note that if there is no which is isolated (when looking only at constraint edges), then there are at most connected components in . Thus, we may assume that is isolated for some . Now note that for the product edge , since is isolated, there are no independence breaking 3-cycles or 4-cycles where is the endpoint of a constraint edge. Thus, we must have that is part of an independence breaking 3-cycle consisting of , , and a constraint edge .
Now form a new graph as follows. Delete and contract the constraint edge between and . More precisely,
Take
Take
After doing this, rename as and rename each where as . In going from to , we have effectively reduced both and the number of connected components by . To complete the proof, we need to check that satisfies the inductive hypotheses. Based on the reduction from to , we still have that every constraint edge is of the form where is even. We check that every product edge is still part of an independence-breaking cycle case by case.
Every independence-breaking cycle which did not contain the constraint edge in is preserved in except that the vertices may have been renamed. The reason for this is that such an independence breaking cycle in cannot contain and can contain at most one of .
The independence-breaking 3-cycle in consisting of the product edges , and the constraint edge is removed, but so are the product edges and , so this is fine.
If we have an independence breaking 4-cycle in consisting of the product edges , and the constraint edges , , this becomes an independence-breaking 3-cycle in with product edges , and a constraint edge (note that and are merged into in and is renamed as in ).
satisfies the inductive hypotheses, so looking only at the constraint edges, has at most connected components. has one more connected component than (the vertex in ), so has at most connected components, as needed. ∎
The above lemma combined with Lemma 9.5 gives the following corollary.
For all terms occurring in Equation 9.1 with nonzero expectation, .
We can now apply Theorem 9.1 with , and by the above corollary. Every entry of has magnitude at most so we can take . By Theorem 9.1, if , for all and , for every ,
Since for all and , we have that for all , for all and and all ,
Now by Corollary 9.11, so
We now prove large deviation bounds for leading to Claim 8.13 which we state below in a more precise form.
If , and , then for all , with ,
To prove the claim we first show a similar concentration bound for the number of cliques of a certain size in . While similar results appear in the literature, see for instance [Ruc88, Vu01, JLR11], we give a short direct proof based on Theorem 9.1.
For a graph , define to be the number of -cliques in .
For all , for all and , and
The first part of the theorem is trivial so we focus on the second part. Given a set of vertices of size , define to be if is a clique and otherwise. Then,
Now let’s consider the function .
Note that unless each set of vertices has two vertices in common with a different set of vertices . Now consider a graph where the vertices are and an edge between if . Let be the number of connected components in . We claim that . For, as in the proof of Lemma 9.5, first consider elements belonging to the different connected components. Now, add the remaining elements of so that each new element is adjacent to at least one of the previously added sets. When doing so, each step can increase the size of the union by at most . Therefore, the size of the union is at most . On the other hand, each connected component in must have at least two vertices, so . Therefore, .
We can now apply Theorem 9.1 with , and so that for , and ,
Using the facts that and for all nonnegative integers , we have that
We are now ready to prove Theorem 10.1. The idea is as follows. Let be the collection of vertices which are adjacent to all the vertices in . Then, conditioned on being a clique, is just the number of cliques of size in the vertices which is primarily determined by . This is because the edges between vertices of are independent of the edges involving vertices in so that we can apply Theorem 10.3 to .
Let be as above and let us condition on being a clique. Then, is just the number of cliques of size among the vertices in . Therefore, by Theorem 10.3, with probability at least ,
We next argue that is concentrated around its mean. For , let be the indicator random variable that is if the ’th vertex is adjacent to all the vertices in and otherwise. Then, and
Observe that the random variables are independent of each other and that
We next apply McDiarmid’s inequality to the function . Note that changing any single coordinate of the inputs to can change its value by at most . Therefore, by Theorem 4.2, with probability at least ,
Combining the above equations, we get that with probability at least ,
The theorem now follows as . ∎
Conclusion and future work
In this work we showed a lower bound for the maximum clique problem on random graphs in the hierarchy and positivstellensatz proof system. Besides the specific application to clique lower bounds, the PSD’ness of the matrix from Equation 2.5 seems to carry further information that could be potentially useful elsewhere, perhaps for studying various sub-graph statistics. Further, the arguments related to association schemes and bounding the norm of locally random matrices could also be useful elsewhere, especially for other hierarchy lower bounds. One natural and interesting candidate is the densest subgraph problem.
For planted clique itself, the most obvious open problem is to tighten the gap between the current upper bound of and our lower bound of for rounds of the SOS hierarchy. In particular, can a constant number of rounds of beat the square-root barrier and identify planted cliques of size ? KelnerPersonal comminication showed that our dual certificate actually is not PSD for roughly . Thus one needs to come up with a different dual certificate to approach the upper bound of even for .
We thank Boaz Barak, Siu-on Chan, Jonathan Kelner, Robert Krauthgamer, James Lee, Nati Linial, David Steurer, Madhu Sudan and Amir Yehudayoff for several useful comments.
References
Hierarchy Gaps and Positivstellensatz Refutations
For a detailed discussion of the hierarchies and -refutations we refer the reader to the discussions in [OZ13]. The basic principle is that, typically, -refutations are more robust and stronger than the hierarchy formulations.
The (or Lasserre) relaxation for maximum clique is stated in Figure 1 (cf. [Tul09]). Although, the formulation itself is not in terms of an SDP, it is a standard fact that as the program only involves inner products of vectors, the optimization can be done by semi-definite programming.
The connection between Figure 1 and -refutations comes from the following straightforward lemma stating that a certificate for -refutations is simply a primal solution to the standard -round -relaxation of the problem.
Observe that for any two subsets ,
Therefore, the vectors satisfy the first two constraints of Figure 1 as is a dual certificate. Further, and for any set ,
so that . Thus, : give a feasible solution for the program in Figure 1. Finally, the value of the solution is
Let . Then, from the above lemma and the proof of Theorem 1.5 (where we showed the existence of a dual certificate for the clique axioms), the value of the -round -relaxation for max-clique on is at least with high probability. The claim follows as the integral value is with high probability. ∎
The value of the relaxation in Figure 1 is clearly monotone with respect to adding edges. Therefore, from the above argument, for the value of the -round -relaxation for max-clique on is at least with high probability. The claim follows as the integral value is with high probability. ∎