Sum-of-squares lower bounds for planted clique

Raghu Meka, Aaron Potechin, Avi Wigderson

Introduction

Finding cliques in random graphs has been the focus of substantial study in algorithm design. Let G(n,p)G(n,p) denote Erdös-Renyi random graphs on nn vertices where each edge is kept in the graph with probability pp. It is easy to check that in a random graph GG(n,1/2)G\leftarrow G(n,1/2), the largest clique has size (2+o(1))log2n(2+o(1))\log_{2}n with high probability. On the other hand, the best known polynomial-time algorithms can only find cliques of size (1+o(1))log2n(1+o(1))\log_{2}n and obtaining better algorithms remains a longstanding open problem: Karp [Kar76] suggested that even finding cliques of size (1+ε)log2n(1+\varepsilon)\log_{2}n could require superpolynomial time.

Motivated by this, much attention has been given to the related planted clique problem or hidden clique problem introduced by Jerrum [Jer92] and Kucera [Kuc95]. Here, we are given a graph GG(n,1/2,k)G\leftarrow G(n,1/2,k) generated by first choosing a G(n,1/2)G(n,1/2) random graph and placing a clique of size kk in the random graph for tlog2nt\gg\log_{2}n. The goal is to recover the hidden clique for as small a kk as possible given GG. The study of the planted clique problem and its variations (like finding planted dense subgraphs) is motivated from several other more recent directions. Its potential as being hard on average has lead to proposals to base crypto systems on variants of it [ABW10]. It was used to argue that testing kk-wise independence is hard near the information theoretic limit by [AAK+07]. It is used in [ABBG10] to argue that evaluating some financial derivatives is hard. It was also used to justify the hardness of sparse principal component detection by Bethet and Rigollet [BR13]. Another source of interest comes from the related algorithmic problem of finding large communities in social networks. The best known polynomial-time algorithms can solve the problem for k=Θ(n)k=\Theta(\sqrt{n}) [AKS98] (see [DGGP14] for a near linear-time algorithm) and improving on this bound has received significant attention. The algorithmic problem has also been of much interest in the context of signal finding in molecular biology (pattern discovery in DNA sequences) as modeled in the work of [PS+00].

In this work we exhibit a lower bound for the problem in the powerful Lasserre [Las01] and “sum-of-squares” (SOS\mathsf{SOS}) [Par00] semi-definite programming hierarchiesFor brevity, in the following, we will use SOS\mathsf{SOS} hierarchy as a common term for the formulations of Lasserre [Las01] and Parrilo [Par00] which are essentially the same in our context.. As it happens, proving such lower bounds for the planted clique problem reduces easily to proving an integrality gap of value kk for the natural formulation of the maximum clique problem in these hierarchies on G(n,1/2)G(n,1/2) graphs. Our main result then is the following average-case lower bound for maximum clique. We defer the formal definition of the semi-definite relaxation and hierarchies for now, and only note a few facts. First, that implementing the rrth level of the SOS\mathsf{SOS} hierarchy (namely, rr rounds), takes roughly nO(r)n^{O(r)} time, which is polynomial for constant rr. Second, the above algorithm for k=Θ(n)k=\Theta(\sqrt{n}) may be viewed as implementing only one round. Third, that r=lognr=\log n suffices for exact solution of the problem, namely finding the maximum clique. Our lower bound implies that polynomial time (when the number of rounds rr is constant) cannot handle even k=no(1)k=n^{o(1)}, and that as many as (logn)1/2(\log n)^{1/2} rounds cannot handle k=(logn)O(1)k=(\log n)^{O(1)}. Here are more precise statementsThroughout, c,Cc,C denote constants..

With high probability, for GG(n,1/2)G\leftarrow G(n,1/2) the natural rr-round SOS\mathsf{SOS} relaxation of the maximum clique problem has an integrality gap of at least n1/2r/Cr(logn)2n^{1/2r}/C^{r}(\log n)^{2}.

As a corollary we obtain the following lower bound for the planted clique problem.

With high probability, for GG(n,1/2,t)G\leftarrow G(n,1/2,t) the natural rr-round SOS\mathsf{SOS} relaxation of the planted clique problem has an integrality gap of at least n1/2r/tCr(logn)2n^{1/2r}/tC^{r}(\log n)^{2}.

2 Background and related work

Linear and semi-definite hierarchies are one of the most powerful and well-studied techniques in algorithm design. The most prominent of these are the Sherali-Adams hierarchy (SA\mathsf{SA}) [SA90], Lovasz-Schrijver hierarchy (LS\mathsf{LS}) [LS91], their semi-definite versions SA+\mathsf{SA}_{+}, LS+\mathsf{LS}_{+} and Lasserre and SOS\mathsf{SOS} hierarchies. The hierarchies present progressively stronger convex relaxations for combinatorial optimization problems parametrized by the number of rounds rr, where the rr-round relaxation can be solved in nO(r)n^{O(r)} time on instances of size nn in all of them. In terms of relative power (barring some minor technicalities about how the numbering of rounds starts), it is known that LS+(r)<SA+(r)<SOS(r)\mathsf{LS}_{+}(r)<\mathsf{SA}_{+}(r)<\mathsf{SOS}(r). Because they capture most powerful techniques for combinatorial optimization, lower bounds for hierarchies serve as strong unconditional evidence for computational hardness. Such lower bounds are even more relevant and compelling in situations where we do not have NP-hardness results, as is the case for typical average-case optimization problems.

Broadly speaking, our understanding of the SOS\mathsf{SOS} hierarchy is more limited than those of LS+\mathsf{LS}_{+} and SA+\mathsf{SA}_{+} hierarchies and in fact the SOS\mathsf{SOS} hierarchy appears to be much more powerful. A particularly striking example of this phenomenon was provided by a recent work of Barak et al. [BBH+12]. They showed that a constant number of rounds of the SOS\mathsf{SOS} hierarchy can solve the much studied unique games problem on instances which need super constant number of LS+,SA+\mathsf{LS}_{+},\mathsf{SA}_{+} rounds. It was also shown by the works of [BRS11, GS11] that the SOS\mathsf{SOS} hierarchy captures the sub-exponential algorithm for unique games of [ABS10]. These results emphasize the need for a better understanding of the power and limitations of the SOS\mathsf{SOS} hierarchy.

From the perspective of proving limitations, all known lower bounds for the SOS\mathsf{SOS} hierarchy essentially have their origins in the works of Grigoriev [Gri01b, Gri01a], some of which were later independently rediscovered by Schoenebeck [Sch08]. These works show that even Ω(n)\Omega(n) rounds of SOS\mathsf{SOS} hierarchy cannot solve random 3XOR3XOR or 3SAT3SAT instances, implying a strong unconditional average-case lower bound for a natural distribution.

Most subsequent lower bounds for SOS\mathsf{SOS} hierarchy such as those of [Tul09], [BCV+12] rely on [Gri01b] and [Sch08] and gadget reductions. For example, Tulsiani [Tul09] shows that 2O(logn)2^{O(\sqrt{\log n})} rounds of SOS\mathsf{SOS} has an integrality gap of n/2O(logn)n/2^{O(\sqrt{\log n})} for maximum clique in worst-case. This is in stark contrast to the average-case setting: even a single round of SOS\mathsf{SOS} gets an integrality gap of at most O(n)O(\sqrt{n}) for maximum clique on G(n,1/2)G(n,1/2) [FK00]. Thus, the worst-case and average-case problems have very different complexities. Finally, using reductions tend to induce distributions that are far from uniform and definitely not as natural as G(n,1/2)G(n,1/2).

For max-clique on random G(n,1/2)G(n,1/2) graphs, Feige and Krauthgamer [FK00] showed that LS+(r)\mathsf{LS}_{+}(r), and hence SOS(r)\mathsf{SOS}(r), has an integrality gap of at most n/2Ω(r)\sqrt{n}/2^{\Omega(r)} with high probability. Complementing this, they also showed [FK03] that the gap remains n/2r\sqrt{n}/2^{r} for LS+(r)\mathsf{LS}_{+}(r) with high probability. However, there were no non-trivial lower bounds known for the stronger SOS\mathsf{SOS} hierarchy.

For the planted clique problem, other algorithmic techniques were studied. Jerrum [Jer92] showed that a broad class of Markov chain Monte-Carlo (MCMC) based methods cannot solve the problem when the planted clique has size O(n1/2δ)O(n^{1/2-\delta}) for any constant δ>0\delta>0. Another approach for the planted clique problem based on optimizing a third order tensor was suggested by Frieze and Kannan [FK08]. However, the corresponding optimization problem is NP-hard in the worst-case.

In a recent work, Feldman et al. [FGR+13] introduced the framework of statistical algorithms which generalizes many algorithmic approaches like MCMC methods and showed that such algorithms cannot find large cliques when the planted clique has size O(n1/2δ)O(n^{1/2-\delta}) in less than nΩ(logn)n^{\Omega(\log n)} timeThe results of [FGR+13] actually apply to the harder bipartite planted clique problem, but this assumption is not too critical.. However, their framework seems quite different from hierarchy based algorithms. In particular, the statistical algorithms framework is not applicable to algorithms which first pick a sample, fix it, and then perform various operations (such as convex relaxations) on it, as is the case for the hierarchies above.

Meka and Wigderson [MW13] addressed SOS\mathsf{SOS} lower bounds for planted clique and claimed a stronger bound than Thm 1.1. While there was a fatal error in their proof, many of the techniques introduced there are used in the present paper.

Independent of our work, Deshpande and Montanari [DM15] recently gave a degree 44 SOS\mathsf{SOS} lower bound for planted clique; while they are only able to handle the degree 44 case (i.e., r=2r=2) , they obtain a better bound for this case than us (roughly n1/3n^{1/3} vs n1/4n^{1/4} as we do).

3 Proof systems and SDP hierarchies

A potentially simpler problem than deciding is a large clique exists is the problem of producing short certificates to the non-existence of such cliques. This puts the problem in the realm of proof complexity. Indeed, we approach the problem of SOS\mathsf{SOS} lower bounds from this viewpoint, via the positivstellensatz proof system perspective of Grigoriev and Volobjov [GV01]. We explain this proof system next in general, and then specialize to Boolean problems and specifically to planted clique.

Suppose we are given a system of polynomial equations or “axioms”

where {g1,,gm}\{g_{1},\ldots,g_{m}\} and {h1,,hN}\{h_{1},\ldots,h_{N}\} are arbitrary nn-variate polynomials. Clearly, if there exists an identity as above, then the system F\mathcal{F} has no solution over reals. Starting with the seminal work of Artin on Hilbert’s seventeenth problem [Art27], a long line of important results in real algebraic geometry – [Kri64, Ste73, Put93, Sch91]; cf. [BCR98] and references therein – showed that, under some (important) technical conditionsWe avoid going into the details here as the conditions are easily met in the presence of Boolean axioms., such certifying identities always exist for an infeasible system. This motivates the following notion of complexity for refuting systems of polynomial equations.

where g1,,gm,h1,,hNg_{1},\ldots,g_{m},h_{1},\ldots,h_{N} are nn-variate polynomials such that deg(figi)2rdeg(f_{i}g_{i})\leq 2r for all i[m]i\in[m] and deg(hj)rdeg(h_{j})\leq r for all j[N]j\in[N].

Our interest in positivstellensatz refutations as above comes from the known relations between such identities and SOS\mathsf{SOS} hierarchy. Informally (and under appropriate technical conditions), identities as above of degree rr show that SOS\mathsf{SOS} hierarchy can certify infeasibility of the axioms in 2r+Θ(1)2r+\Theta(1) rounds and vice versa. We will focus on showing degree lower bounds for identities as above and use them to get integrality gaps for the the SOS\mathsf{SOS} hierarchy. We formalize this in Section 12. For a brief history of the different formulations from [GV01], [Las01], [Par00] and the relations between them and results in real algebraic geometry we refer the reader to [OZ13].

Given the above setup, we shall consider the following set of natural axioms to test if a graph GG has a clique of size kk.

Given the above theorem it is easy to deduce the integrality gap for the SOS hierarchy, Theorem 1.1: see Section 12. We next highlight the outline of the proof, and some of our techniques which may be of broader interest.

4 Outline

Under reasonable technical conditions which ensure strong duality, the converse also holds. For the clique axioms from Equation 2.1, a dual certificate would correspond to a feasible vector solution for the rr-round SOS\mathsf{SOS} relaxation for maximum clique (see Figure 1 for the exact formulation) with value kk.

The following elementary lemma will be crucial.

The existence of such a mapping trivially implies a lower bound for PS(r)\mathsf{PS}(r) refutations: apply M\mathcal{M} to both sides of a purported PS(r)\mathsf{PS}(r) identity as in Equation 1.1 to arrive at a contradiction.

The lemma suggests a general recipe for proving PS(r)\mathsf{PS}(r) refutation lower bounds:

Reduction to PSDness of another matrix MM^{\prime}: The matrix MM has many zero rows and columns which makes it difficult to work with. In Section 5 we fix this by filling in the zero rows and columns of MM to obtain a new matrix MM^{\prime}. We then argue that to show MM is PSD it is sufficient to show that MM^{\prime} is PSD.

(Deterministic) Matrix analysis: E=E[M]E=E[M^{\prime}] is PSD with a large minimum eigenvalue λmin(E)\lambda_{min}(E). We show this statement in Section Section 7 by using the theory of association schemes described below.

Large deviation: with high probability, MEλmin(E)\|M^{\prime}-E\|\leq\lambda_{min}(E). This is done by using the structure of our matrix MM^{\prime} along-with a careful application of the trace method to bound the norms of certain random matrices with dependent entries.

As discussed, the essence of proving Theorem 1.5 involves showing that a certain random matrix is positive semi-definite (PSD) with high probability. In our case, this calls for showing a relation of the form ABA\prec BHere and henceforth \prec denotes PSD ordering: ABA\prec B if and only if BAB-A is positive definite. for two matrices A,BA,B whose rows and columns are indexed by subsets of [n][n] of size rr. This in turn leads us to matrices which though complicated to describe, will be set-symmetric - the entry defined by any two (row and column) sets I,JI,J depends solely on the size of the intersection IJI\cap J. The set of all such matrices, called the Johnson scheme, is quite well studied in combinatorics as a special case of association schemes. In particular, all such matrices commute with one another and their common eigenspaces are completely understood. This theory allows us to estimate the eigenvalues and norms of various matrices that arise in the analysis.

Techniques: Trace bounds for locally random matrices

After various simplifications and reductions, a central problem we have to deal with is upper bounding the spectral norm of certain random matrices, defined by the underlying random graph GG(n,1/2)G\leftarrow G(n,1/2). As above, these matrices have rows and columns indexed by subsets of vertices. The entry (I,J)(I,J) of the matrix will be a random variable of expectation zero, which depends only on the edges and non-edges of GG in the subgraph induced by IJI\cup J (hence we name such matrices local). In the simple case when r=1r=1 (so rows and columns are indexed by singletons), which is the one studied in the analysis of the n\sqrt{n} approximation algorithm, the random variables in all entries are mutually independent, and a norm bound is easy to obtain by a straightforward use of the trace method. However, for r>1r>1 as we need to handle, the entries of the matrix are dependent whenever the edge sets of their entries intersect. This significantly complicates the trace calculation, and we develop some combinatorial tools to bound the trace of high powers of such local matrices.

Dual certificate for 𝖯𝖲​(r)𝖯𝖲𝑟\mathsf{PS}(r) refutations of max-clique

As mentioned in the introduction, we can often work out what the dual certificate should be from the axioms and basic linear algebra. As an example, we first work out the case where the graph GG is the complete graph; this will also help us draw a concrete connection to the work of [Gri01a].

For complete graph, the clique axioms simplify to

These incidentally also correspond to proving lower bounds for knapsack as studied by Grigoriev [Gri01a] (and was what lead us to the specific dual certificate we study). However, in the context of lower bounds for knapsack, the axioms are mainly interesting for non-integer kk and Grigoriev shows that for non-integer kn/2k\leq n/2, the above system has no PS(r)\mathsf{PS}(r) refutation for r<kr<k.

From the above it follows that we can define ff and hence M\mathcal{M} as follows:

Grigoriev takes f(0)=1f(0)=1. Here we set f(0)=(n2r)f(0)=\binom{n}{2r} with a view towards what is to come. Thus, the final certificate is

For k<n/2k<n/2, the mapping MGr\mathcal{M}_{Gr} defined above is PSD for r<kr<k.

2 Certificate for clique axioms

The above equations give us a system of linear equations that M\mathcal{M} needs to satisfy. By working with the equations, it is easy to guess a natural solution for the system.

Given a graph GG on [n][n], and I[n]I\subseteq[n], I2r|I|\leq 2r, let

For instance, if r=1r=1 and vGv\in G, then degG({v})deg_{G}(\{v\}) is the degree of vertex vv.

For any graph GG, MMG\mathcal{M}\equiv\mathcal{M}_{G} defined by Equation 2.4 satisfies Equations 2.2.

The first equation in Equation 2.2 follows immediately from the definition of M\mathcal{M}. Now, for I[n],I<2rI\subseteq[n],|I|<2r,

Observe that our notion of degree, degGdeg_{G}, satisfies the following recurrence: for I<2r|I|<2r,

The above two equations imply that M\mathcal{M} satisfies the second equation in 2.2. ∎

Thus, to prove our main theorem Theorem 1.5, it suffices to show that M\mathcal{M} as defined above is PSD with high probability. We now argue that in fact, to show that M\mathcal{M} is PSD we do not need to consider all polynomials PP of degree at most rr. Rather, it is sufficient to show that M(P12)0\mathcal{M}(P_{1}^{2})\geq 0 whenever P1P_{1} is multilinear and homogeneous of degree rr.

For any PP of degree at most rr we may write P=P1+iP2i(xi2xi)+P3(ixik)P=P_{1}+\sum_{i}{P_{2i}(x^{2}_{i}-x_{i})}+P_{3}(\sum_{i}{x_{i}}-k) where P1P_{1} is multilinear and homogeneous of degree rr, P3P_{3} has degree at most r1r-1, and all P2iP_{2i} have degree at most r2r-2.

We first make PP multilinear by removing any terms which are not multilinear from PP as follows. If PP has a term of the form xi2f{x^{2}_{i}}f where ff has degree at most r2r-2, write xi2f=(xi2xi)f+xif{x^{2}_{i}}f=(x^{2}_{i}-x_{i})f+{x_{i}}f. Iteratively applying this procedure, we may write P=PP=P^{\prime} plus terms of the form (xi2xi)f(x^{2}_{i}-x_{i})f where PP^{\prime} is multilinear of degree at most rr and ff has degree at most r2r-2.

We now make PP^{\prime} multilinear and homogeneous of degree rr by removing any terms which have lower degree as follows. If PP^{\prime} has a term of the form XIX_{I} where I<r|I|<r, write

Iteratively applying this procedure, we may write P=P1P=P_{1} plus terms of the form (xi2xi)f(x^{2}_{i}-x_{i})f and terms of the forms (ixik)g(\sum_{i}{x_{i}-k})g where P1P_{1} is multilinear and homogeneous of degree rr, all such ff have degree at most r2r-2 and all such gg have degree at most r1r-1. Putting everything together, the result follows. ∎

If M(P12)0\mathcal{M}(P_{1}^{2})\geq 0 for all multilinear homogeneous P1P_{1} of degree rr then M\mathcal{M} is PSD.

Assume M(P12)0\mathcal{M}(P_{1}^{2})\geq 0 for all multilinear homogeneous P1P_{1} of degree rr and M(P2)<0\mathcal{M}(P^{2})<0 for some PP(n,r)P\in\mathcal{P}(n,r). Using Lemma 2.3, we may write P=P1+iP2i(xi2xi)+P3(ixik)P=P_{1}+\sum_{i}{P_{2i}(x^{2}_{i}-x_{i})}+P_{3}(\sum_{i}{x_{i}}-k) where P1P_{1} is multilinear and homogeneous of degree rr. M(P2)=M(P12)\mathcal{M}(P^{2})=\mathcal{M}(P_{1}^{2}) so M(P12)<0\mathcal{M}(P_{1}^{2})<0. Contradiction. ∎

In the remainder of the paper, we show that MM is PSD with high probability for kΩr(n1/2r/(logn)1/r)k\leq\Omega_{r}(n^{1/{2r}}/(\log n)^{1/r}).

There exists a constant c>0c>0 such that, with high probability over GG(n,1/2)G\leftarrow G(n,1/2), the matrix MGM_{G} defined by Equation 2.5 is PSD for k2cr(n/logn)1/rk\leq 2^{-cr}\cdot(\sqrt{n}/\log n)^{1/r}.

Overview of proof of Theorem 2.5

The proof of Theorem 2.5 is quite technical, and is broken into two parts, where the second part is further broken down into smaller parts. While we gave a sketch of the proof of Theorem 2.5 in the inroduction, we give a more detailed overview of the proof here. Recall that all matrices mentioned below are random matrices which are specified by the choice of the random graph GG.

As mentioned in the introduction, the matrix M=MGM=M_{G} has many zero rows and columns which makes it difficult to work with. The first part is to fill in the zero rows and columns of MM to obtain a new matrix, MM^{\prime}, which is nonsingular and has no high variance entries. In Section 5 we define this matrix MM^{\prime} and show that if MM^{\prime} is PSD, so is MM. The idea is that MM and MM^{\prime} are symmetric and the nonzero part of MM is a principal submatrix of MM^{\prime}, so the smallest nonzero eigenvalue of MM is at least as large as the smallest eigenvalue of MM^{\prime}.

Having defined EE (which is set-symmetric), let us spell out what the other matrices are. The“local” random matrix LL is defined in a simple way as follows:

Finally, define the last matrix Δ=MEL\Delta=M^{\prime}-E-L.

The proof that MM^{\prime} is PSD proceeds in three modular steps:

We use the results about Johnson scheme to show that E0E\succ 0 and has a large least eigenvalue (roughly Ωr(krnr)\Omega_{r}(k^{r}n^{r})); see Section 7.

We next show that L<Ck2rnr1/2logn\|L\|<Ck^{2r}n^{r-1/2}\log n by exploiting the recursive structure of the matrix LL and some careful trace calculations. This is the most technically intensive part of the proof, and requires the development of some combinatorial tools to estimate the trace of high powers of LL; see Section 8.2.

We then show that Δ<Ck2rnr1/2logn\|\Delta\|<Ck^{2r}n^{r-1/2}\log n. This is done by first showing that every entry of Δ\Delta is small in magnitude, via concentration bounds on the number of cliques in random graphs, and bounding its norm using Gershgorin’s circle theorem (Lemma 4.1); see Section 8.3.

Preliminaries

We shall use the following notationsSome are repeated from the introduction so as to have them at one place.:

P(n,2r)\mathcal{P}(n,2r) denotes the set of nn-variate polynomials of degree at most 2r2r.

PS(r)\mathsf{PS}(r) denotes positivstellensatz refutations of degree at most rr as defined in Definition 1.3.

For 0rn0\leq r\leq n, let ([n]r)\binom{[n]}{r}, ([n]r)\binom{[n]}{\leq r} denote all subsets of size exactly and at most rr, respectively.

For I[n]I\subseteq[n], let XI=iIxiX_{I}=\prod_{i\in I}x_{i}.

By default all vectors are column vectors. For a set II, \mathds1(I)\mathds{1}(I) denotes the indicator vector of the set II.

We will also need the following standard fact from matrix theory (see [GVL96] for instance).

Finally, we need McDiarmid’s inequality for obtaining tail bounds for functions of independent random variables (see [dubashi2009concentration] for instance)

Let X1,,XnX_{1},\ldots,X_{n} be independent random variables and let ff be a function over the domain space of (X1,,Xn)(X_{1},\ldots,X_{n}). Let c1,,cn>0c_{1},\ldots,c_{n}>0 be such that for all ii, x1,,xn,xix_{1},\ldots,x_{n},x_{i}^{\prime},

In this section, we define the matrix MM^{\prime} and show that if MM^{\prime} is PSD then so is MM. We use the following notations for brevity: For any set I[n]I\subseteq[n], let E(I)={{i,j}:ijI}\mathcal{E}(I)=\{\{i,j\}:i\neq j\in I\}. For 0ir0\leq i\leq r, let

Intuitively, for every I,JI,J, M(I,J)M^{\prime}(I,J) is what M(I,J)M(I,J) would be had we added cliques on the subsets II, JJ to the graph. The above definition avoids the problem of the whole row and column corresponding to II or JJ becoming zero if either was not a clique and controls the variance of the entries. We now show that to show MM is PSD, it is sufficient to show that MM^{\prime} is PSD.

The reason this lemma is true is because as shown below, the nonzero part of MM is a principal submatrix of MM^{\prime}.

Whenever II and JJ are cliques of size rr in GG, M(I,J)=M(I,J)M^{\prime}(I,J)=M(I,J)

Suppose that II and JJ are cliques in GG. Then, MT(I,J)=β(IJ)M_{T}(I,J)=\beta(|I\cap J|) if IJTI\cup J\subseteq T and TT is a clique and otherwise. Therefore,

The nonzero part of MM is a principal submatrix of MM^{\prime}.

We now use the following elementary fact about matrices.

If AA is a principal submatrix of a symmetric matrix BB then the smallest eigenvalue of AA is at least as large as the smallest eigenvalue of BB.

Combining Corollary 5.3 and Proposition 5.4, if MM^{\prime} is PSD then MM is PSD, as needed. ∎

Johnson scheme

Association schemes is a classical area in combinatorics and coding theory (cf. for instance [vLW01]). We shall use a few classical results (lemmas 6.6, 6.7 below), about the eigenspaces and eigenvalues of association schemes and the Johnson scheme in particular. We also introduce two bases for the Johnson scheme, which will play a key role in bounding the eigenvalues of various matrices later.

We start with some basics about the Johnson scheme - some of our notations are non-standard but they fit better with the rest of the manuscript.

As we will soon see, J\mathcal{J} is also a commutative algebra. There is a natural basis for the subspace J\mathcal{J}:

Another important collection of matrices that come up naturally while studying PSD’ness of set-symmetric matrices is the following which gives a basis of PSD matrices for the Johnson scheme.

Equivalently, for T[n]T\subseteq[n], if we let PTP_{T} be the PSD rank one matrix

For fixed n,rn,r, the following relations hold:

The first relation follows immediately from the definition of PtP_{t}. The second relation follows from inverting the set of equations given in (1). ∎

The main nontrivial result from the theory of association schemes we use is the following characterization of the eigenspaces of matrices in J\mathcal{J}. The starting point for these characterizations is the fact that matrices in J\mathcal{J} commute with one another and hence are simultaneously diagonalizable. We refer the reader to Section 7.4 in [God] (the matrices PtP_{t} in our notation correspond to matrices CtC_{t} in [God]) for the proofs of these results.

V0,,VrV_{0},\ldots,V_{r} are eigenspaces for {Pt:0tr}\{P_{t}:0\leq t\leq r\} and consequently for all matrices in J\mathcal{J}.

For 0jr0\leq j\leq r, dim(Vj)=(nj)(nj1)dim(V_{j})=\binom{n}{j}-\binom{n}{j-1}.

For any matrix QJQ\in\mathcal{J}, let λj(Q)\lambda_{j}(Q) denote the eigenvalue of QQ within the eigenspace VjV_{j}. Then,

Therefore, as QQ and PtP_{t}’s have common eigenspaces, by Lemma 6.6,

PSD’ness of the expectation matrices

The expectation matrix above is just a scalar multiple of MGr\mathcal{M}_{Gr} (viewed as a matrix) as defined in Equation 2.1. Therefore, by Theorem 2.1, EME_{M} as defined above is PSD for r<min(k,nk)r<\min(k,n-k). We give a simpler proof of this claim here for the case when rmin(k2,nk)r\leq\min(\frac{k}{2},n-k).

The matrix EME_{M} is positive definite for rmin(k2,nk)r\leq\min(\frac{k}{2},n-k).

We will show this by writing EME_{M} as a suitable positive linear combination of the PSD matrices PtP_{t}’s from Section 6. More concretely, for any α0,,αt>0\alpha_{0},\ldots,\alpha_{t}>0, we have

If k<n2r3r2r1k<\frac{n-2r}{3r\cdot 2^{r-1}} and rk2r\leq\frac{k}{2} then EE is PSD with minimal eigenvalue 2O(r2)krnr2^{-O(r^{2})}{k^{r}}{n^{r}}

Since the PtP_{t}’s are PSD and Pr=IP_{r}=I, EE is PSD with minimal eigenvalue 2O(r2)krnr2^{-O(r^{2})}{k^{r}}{n^{r}}, as needed. ∎

PSD’ness of dual certificate

We are now ready to prove our main result, Theorem 1.5, with the aid of several technical results whose proof is deferred to Section 9 and Section 10. We prove Theorem 1.5 by showing that the matrix MM will be PSD with high probability (Theorem 2.5). In turn, we show that MM is PSD with high probability with our main technical lemma, which says that MM^{\prime} is PSD with high probability (this is sufficient by Lemma 5.1).

To prove Lemma 8.1, we first decompose MM^{\prime} as M=E+L+ΔM^{\prime}=E+L+\Delta in Section 8.1. We then analyze LL and Δ\Delta in Section 8.2 and Section 8.3 respectively. We put all the pieces together to show the PSD’ness of MM^{\prime} in Section 8.4.

For the remainder of this section, we shall use the following additional notations:

For 0ir0\leq i\leq r, let p(i)=2(ri)2p(i)=2^{-(r-i)^{2}}. Then, for I,J([n]r)I,J\in\binom{[n]}{r} with IJ=i|I\cap J|=i, p(i)p(i) is the probability that E(IJ)(E(I)E(J))G\mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J))\subseteq G.

In the following we will adopt the convention that I,J,KI,J,K denote elements of ([n]r)\binom{[n]}{r} and T,TT,T^{\prime} denote elements of ([n]2r)\binom{[n]}{2r}.

We write ArBA\approx_{r}B if there exist constants c,Cc,C such that cr2BACr2Bc^{r^{2}}B\leq A\leq C^{r^{2}}B.

Finally, define Δ=MEL\Delta=M^{\prime}-E-L. We have already shown in Section 7 that EE is PSD with minimal eigenvalue 2O(r2)krnr2^{-O(r^{2})}{k^{r}}{n^{r}}. There are now two remaining modular steps in the proof:

We show that L\|L\| is 2O(r2)k2rnr1/2logn2^{O(r^{2})}k^{2r}n^{r-1/2}\log{n} by exploiting the recursive structure of the matrix LL and some careful trace calculations. This is the most technically intensive part of the proof.

We then show that Δ\|\Delta\| is 2O(r2)k2rnr1/2logn2^{O(r^{2})}k^{2r}n^{r-1/2}\log{n}. This is done by first showing that each entry of Δ\Delta is small in magnitude and using Lemma 4.1.

The next two subsections address these two steps with the corresponding technical elements dealt with in Section 9 and Section 10 respectively.

2 Bounding the norm of the locally random matrix L𝐿L

In this subsection, we bound the norm of the matrix LL.

For some constant C>0C>0, with probability at least 11/n1-1/n over the random graph GG,

In other words, for disjoint V,W([n]a)V,W\in\binom{[n]}{a} the Ra(V,W)R_{a}(V,W)’th entry is essentially (up to a constant multiple) a shift of the indicator random variable which is 11 if all edges in V×WV\times W are in GG and otherwise.

If n100n\geq 100, for all ε(0,1)\varepsilon\in(0,1), Pr[Ra>2a2+2a+2ln(nε)na12]<ε\operatorname*{\mathsf{Pr}}\left[||R_{a}||>2^{a^{2}+2a+2}\ln{(\frac{n}{\varepsilon})}n^{a-\frac{1}{2}}\right]<\varepsilon.

Note that 2a2na2^{a^{2}}n^{a} is an easy bound for Ra\|R_{a}\| (each entry of the matrix is at most 2a22^{a^{2}} in magnitude); the main advantage of the claim is the multiplicative n1/2n^{-1/2} factor.

In the remainder of this section we use the recursive structure of the matrix LL to prove Claim 8.2 assuming the above claim. We first introduce some notation:

The next claim relates the norms of “lifts” of matrix RR, R(i)R^{(i)}. Conceptually, bounding the norms of matrices with non-zero entries on intersecting indexing sets are reduced to that of the disjoint case. Note that the requirement R=R0R=R^{0} exactly captures the latter.

We partition the entries of R(i)R^{(i)} as follows.

For any X,Y,KX,Y,K such that X[1,r1]X\subseteq[1,r_{1}], Y[1,r2]Y\subseteq[1,r_{2}], and KV(G)K\subseteq V(G) where K=X=Y=i|K|=|X|=|Y|=i, let RX,Y,K(i)R_{X,Y,K}^{(i)} be the matrix such that the following is true:

RX,Y,K(i)(I,J)=R(i)(I,J)=R(IK,JK)R_{X,Y,K}^{(i)}(I,J)=R^{(i)}(I,J)=R(I\setminus K,J\setminus K) if K={ix:xX}={jy:yY}K=\{i_{x}:x\in X\}=\{j_{y}:y\in Y\} where i1,,ir1i_{1},\cdots,i_{r_{1}} are the elements of II in increasing order and j1,,jr2j_{1},\cdots,j_{r_{2}} are the elements of JJ in increasing order.

For all X,Y,KX,Y,K, RX,Y,K(i)R||R_{X,Y,K}^{(i)}||\leq||R||.

The nonzero part of RX,Y,K(i)R_{X,Y,K}^{(i)} can be viewed as a submatrix of RR, so it cannot have larger induced norm than RR. ∎

If R(i)(I,J)=0R^{(i)}(I,J)=0 then X,Y,KRX,Y,K(i)(I,J)=0\sum_{X,Y,K}{R_{X,Y,K}^{(i)}(I,J)}=0. If R(i)(I,J)0R^{(i)}(I,J)\neq 0 then IJ=i|I\cap J|=i. This implies that K={ix:xX}={jy:yY}K=\{i_{x}:x\in X\}=\{j_{y}:y\in Y\} if and only if K=IJK=I\cap J, XX is the set of indices of KK in II, and YY is the set of indices of KK in JJ, which happens for precisely one X,Y,KX,Y,K. Thus, RX,Y,K(i)(I,J)=R(i)(I,J)R_{X,Y,K}^{(i)}(I,J)=R^{(i)}(I,J) for precisely one I,J,XI,J,X and is otherwise, so R(i)=X,Y,KRX,Y,K(i)R^{(i)}=\sum_{X,Y,K}{R_{X,Y,K}^{(i)}}, as needed. ∎

R(i)X,YKRX,Y,K(i)||R^{(i)}||\leq\sum_{X,Y}{||\sum_{K}{R_{X,Y,K}^{(i)}}||}.

If K1K_{1},K2K_{2} are distinct subsets of V(G)V(G) of size xx, RX,Y,K1(i)(I1,J1)0R_{X,Y,K_{1}}^{(i)}(I_{1},J_{1})\neq 0, and RX,Y,K2(i)(I2,J2)0R_{X,Y,K_{2}}^{(i)}(I_{2},J_{2})\neq 0 then I1I2I_{1}\neq I_{2} and J1J2J_{1}\neq J_{2}.

Assume that I1=I2=II_{1}=I_{2}=I and let i1,,ir1i_{1},\cdots,i_{r_{1}} be the elements of II in increasing order. Then K1={ix:xX}=K2K_{1}=\{i_{x}:x\in X\}=K_{2}. Contradiction. Following similar logic, we cannot have that J1=J2J_{1}=J_{2} either. ∎

For any X,Y[1,n]X,Y\subseteq[1,n], KRX,Y,K(i)R||\sum_{K}{R_{X,Y,K}^{(i)}}||\leq||R||.

Note that we can permute the rows and columns of a matrix without affecting its induced norm. By Proposition 8.9, we can permute the rows and columns of KRX,Y,K(i)\sum_{K}{R_{X,Y,K}^{(i)}} to put it into block form where each block is the nonzero part of RX,Y,K(i)R_{X,Y,K}^{(i)} for some KK. For a matrix in block form, its norm is the maximum of the norms of the individual blocks, which by Proposition 8.6 is at most R||R||, as needed. ∎

With these results, Lemma 8.4 follows immediately. Plugging in Proposition 8.10 to Proposition 8.8 gives R(i)X,YKRX,Y,K(i)X,YR(r1i)(r2i)R||R^{(i)}||\leq\sum_{X,Y}{||\sum_{K}{R_{X,Y,K}^{(i)}}||}\leq\sum_{X,Y}{||R||}\leq{{r_{1}}\choose i}{{r_{2}}\choose i}||R||, as needed. ∎

We now use the above statements to prove Lemma 8.2.

We claim that for 0ir0\leq i\leq r, and αi\alpha_{i} as in Equation 8.1

To see the above, fix I,J([n]r)I,J\in\binom{[n]}{r} with IJ=i|I\cap J|=i and let V=I(IJ)V=I\setminus(I\cap J), W=J(IJ)W=J\setminus(I\cap J). Observe that

We cosider two cases as in the definition of LL.

Case 1. E(IJ)(E(I)E(J))G\mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J))\subseteq G. Then, Rri(i)(I,J)=Rri(V,W)=2(ri)21=(1p(i))/p(i)R_{r-i}^{(i)}(I,J)=R_{r-i}(V,W)=2^{(r-i)^{2}-1}=(1-p(i))/p(i). Equation 8.6 now follows from the first case of the definition of LL.

Case 2. E(IJ)(E(I)E(J))⊈G\mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J))\not\subseteq G. Then, Rri(i)(I,J)=Rri(V,W)=1R_{r-i}^{(i)}(I,J)=R_{r-i}(V,W)=-1. Equation 8.6 now follows from the second case of the definition of LL.

Therefore, by Claim 8.3, Lemma 8.4 and Equation 8.6,

The lemma now follows as L=i=0rLiL=\sum_{i=0}^{r}L^{i}. ∎

3 Bounding the norm of the global error matrix ΔΔ\Delta

The main claim of this subsection is the following bound on the spectral norm of Δ\Delta.

For n>C24r2n>C2^{4r^{2}}, with probability at least 11/n1-1/n over the random graph GG,

The proof relies on the following bound on the individual entries of Δ\Delta.

For some universal constant CC, and n>C24r2n>C2^{4r^{2}}, with probability at least 11/n1-1/n over the random graph GG, for all I,J([n]r)I,J\in\binom{[n]}{r}, with i=IJi=|I\cap J|,

Before proving the lemma, we first use it to bound Δ\|\Delta\|.

Suppose that the conclusion of the previous lemma holds. Then, for any I([n]r)I\in\binom{[n]}{r},

The lemma now follows from the above bound and Lemma 4.1. ∎

Fix sets I,JI,J with IJ=i|I\cap J|=i. Let A\mathcal{A} be the event that E(IJ)(E(I)E(J))G\mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J))\subseteq G.

Then, by the second case of Equation 8.3, conditioned on ¬A\neg\mathcal{A} we have Δ(I,J)=0\Delta(I,J)=0. Thus, the claim holds trivially in this case. In the following we condition on A\mathcal{A}. Observe that

We next use the following claim that degG(IJ)\deg_{G}(I\cup J) is concentrated around its mean when conditioned on IJI\cup J being a clique. At a high level, this follows from the fact that conditioned on IJI\cup J being a clique, degG(IJ)\deg_{G}(I\cup J) can be written as a (structured) low-degree polynomial in the indicator variables of the edges not in IJI\cup J with small variance. We defer the proof to the appendix.

As a consequence of the above claim we also get concentration for M(I,J)AM^{\prime}(I,J)\mid\mathcal{A}. This is because M(I,J)AM^{\prime}(I,J)\mid\mathcal{A} is identically distributed as M(I,J)(IJ a clique)M(I,J)\mid(I\cup J\text{ a clique}). Therefore, taking ε=1/n2r+1\varepsilon=1/n^{2r+1} and applying a union bound over all sets I,JI,J we get that with probability at least 11/n1-1/n, for all I,JI,J such that E(IJ)(E(I)E(J))G\mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J))\subseteq G, and IJ=i|I\cap J|=i,

and conditioned on A\mathcal{A}, Δ(I,J)=M(I,J)α(IJ)/p(IJ)\Delta(I,J)=M^{\prime}(I,J)-\alpha(|I\cap J|)/p(|I\cap J|). The lemma now follows by combining the above two bounds. ∎

4 Putting things together

We now prove Lemma 8.1 and use it to prove our main results.

for kk as in the statement of the lemma for a sufficiently big constant cc. ∎

We bring the arguments from previous sections together to prove our main results Theorem 2.5 and Theorem 1.5.

Follows immediately from Lemma 5.1 and Lemma 8.1. ∎

Follows immediately from Lemma 1.8, Claim 2.2 and Theorem 2.5. ∎

Theorems 1.1 and 1.2 follow immediately from our PS(r)\mathsf{PS}(r)-refutation lower bound using standard arguments. We defer these to the appendix.

Bounding norms of locally random matrices

Going back to Claim 8.3 let us first look at the special case of a=1a=1 to gain some intuition. In this case, the entries of R1R_{1} are (essentially) independent, and so the trace method is easy to apply. More precisely, R1R_{1} is a symmetric random matrix with zeros on the diagonal and the entries in the upper diagonal taking independent uniformly random ±1\pm{1} values. It is well known that R1=O(n)\|R_{1}\|=O(\sqrt{n}) in this case (see [Ver] for instance). One can also prove the bound by the trace method as follows. We have that

where i2q+1=i1i_{2q+1}=i_{1}. We can then look at which products j=12qR1(ij,ij+1)\prod_{j=1}^{2q}{{R_{1}}({i_{j}},i_{j+1})} have expectation .

To handle higher aa’s we first generalize the above argument based on constraint graphs to work with general locally-random matrices. However, unlike for a=1a=1, distinct entries of the matrix are now dependent, which significantly complicates the structure of the terms and the associated count of the terms which have non-zero expectation. The rest of the section is devoted to this. While we apply our arguments to the particular locally-random matrices arising in our proof, these techniques should apply more generally to other locally-random matrices.

We next state our main technical result which gives us a way to bound traces of high powers of locally random matrices based on the structure of the individual terms. The advantage being that the conditions on the terms will be easier to ascertain in our applications.

Here we use VV rather than II for subsets because we will be viewing the individual elements of each VV as vertices.

Assume that we have values a,B>0a,B>0 and for every positive qq, we have a function p(G,2q)p(G,2q) such that p(G,2q)0p(G,2q)\geq 0 and p(G,2q)p(G,2q) can be written in the form

For every term f(G,{V1,,V2q})f(G,\{V_{1},\ldots,V_{2q}\}) with non-zero expected value, jVj2aqqy+z|\cup_{j}V_{j}|\leq 2aq-qy+z for some integers yy and zz where 1y2a1\leq y\leq 2a and z0z\geq 0.

Then, if n10n\geq 10, for all ε(0,1)\varepsilon\in(0,1),

We will use this theorem with two types of functions pp. When p(G,2q)=tr((MTM)q)p(G,2q)=tr(({M^{T}}M)^{q}) for some matrix MM depending on GG, Mp(G,2q)2q||M||\leq\sqrt[2q]{p(G,2q)} for all q>0q>0 so this theorem gives us a probabilistic bound on M||M||. When p(G,2q)=h(G)2qp(G,2q)=h(G)^{2q} for some function hh, then h(G)=p(G,2q)2qh(G)=\sqrt[2q]{p(G,2q)} for all q>0q>0 so this theorem gives us a probabilistic bound on h(G)h(G).

In the case when p(G,2q)=tr(R12q)p(G,2q)=tr(R_{1}^{2q}), p(G,2q)=i1,,i2qj=12qR1(ij,ij+1)p(G,2q)=\sum_{i_{1},\cdots,i_{2q}}{\prod_{j=1}^{2q}{{R_{1}}({i_{j}},i_{j+1})}}. Each term here has expected value at most 11 and it is easy to argue that for any term with non-zero expected value, the number of distinct elements is at most q+1q+1. Applying Theorem 9.1 with y=z=1y=z=1, and B=1B=1 we have that for all n10n\geq 10, and ε(0,1)\varepsilon\in(0,1),

This bound is weaker (by a logarithmic factor) than the bounds in e.g. [Ver], but is sufficient for our purposes.

Before proving the theorem we introduce the concept of constraint graphs which are a useful way to visualize our calculations. While the statement of the above theorem does not involve constraint graphs, thinking in terms of constraint graphs is helpful in proving the conditions required to apply the theorem.

Given a family of sets of vertices {Vi}\{V_{i}\}, we define a corresponding constraint graph CC whose vertices are the sets {Vi}\{V_{i}\} and there is an edge between Vi,VjV_{i},V_{j}, iji\neq j, if ViVjV_{i}\cap V_{j}\neq\emptyset.

The above definition is useful because of the following elementary lemma.

In the following we use {Vi}\{V_{i}\} as a short form for {V1,,V2q}\{V_{1},\ldots,V_{2q}\}. We prove this result by obtaining an upper bound on the number of terms in p(G,2q)={Vi}f(G,{Vi})p(G,2q)=\sum_{\{V_{i}\}}{f(G,\{V_{i}\})} with nonzero expected value. This gives us a probabilistic upper bound for p(G,2q)p(G,2q), implying the upper bound on minq{p(G,2q)2q}\min_{q}{\{\sqrt[2q]{p(G,2q)}\}}.

Define N(n,a,q,m)N(n,a,q,m) to be the number of ways to choose subsets {Vi:i[2q]}\{V_{i}:i\in[2q]\} of [n][n] such that iVim|\cup_{i}{V_{i}}|\leq m and for all ii, Vi=a|V_{i}|=a.

We can choose each ordered 2aq2aq-tuple (v1,,v2aq)(v_{1},\cdots,v_{2aq}) of elements in [n][n] which contains at most mm distinct elements as follows. There must be at least 2aqm2aq-m elements which are duplicates of other elements, so we can first choose a set II of 2aqm2aq-m indices such that for all iIi\in I, vi=vjv_{i}=v_{j} for some jIj\notin I. There are (2aq2aqm)\binom{2aq}{2aq-m} choices for II. We then choose the elements {vj:jI}\{v_{j}:j\notin I\}. There are no restrictions on these elements so there are nmn^{m} choices for these elements. Finally, we choose the elements {vi:iI}\{v_{i}:i\in I\}. To determine each viv_{i} it is sufficient to specify the jIj\notin I such that vi=vjv_{i}=v_{j}. For each ii there are mm choices for the corresponding jj, so the number of choices for these elements is at most m2aqmm^{2aq-m}. Putting everything together, the total number of choices is at most (2aq2aqm)nmm2aqm\binom{2aq}{2aq-m}{n^{m}}{{m}^{2aq-m}}. Now note that since we are choosing subsets {Vi:i[2q]}\{V_{i}:i\in[2q]\} of [n][n] rather than one big ordered tuple, the order within each subset does not matter. Thus, there are (a!)2q(a!)^{2q} different ordered tuples which give the same subsets of elements, so the total number of possibilities for the subsets {Vi}\{V_{i}\} is at most (a!)2q(2aq2aqm)nmm2aqm(a!)^{-2q}{\binom{2aq}{2aq-m}}{n^{m}}{{m}^{2aq-m}}, as needed. ∎

Moreover, by our assumptions, each of these nonzero terms E[f(G,{Vi})]E[f(G,\{V_{i}\})] has value at most B2qB^{2q}, so

Now, by Markov’s inequality applied to p(G,2q)p(G,2q),

The claim now follows by rearranging the above bound. ∎

In this subsection, we prove Claim 8.3 using Theorem 9.1. For convenience, we restate Claim 8.3 here with more precise constants.

If n100n\geq 100, for all ε(0,1)\varepsilon\in(0,1), Pr[Ra>2a2+2a+2ln(nε)na12]<ε\operatorname*{\mathsf{Pr}}\left[||R_{a}||>2^{a^{2}+2a+2}\ln{(\frac{n}{\varepsilon})}n^{a-\frac{1}{2}}\right]<\varepsilon.

The core of the proof will be to bound j=12qVij|\cup_{j=1}^{2q}V_{i_{j}}| for any term j=12qRa(Vij,Vij+1)\prod_{j=1}^{2q}R_{a}(V_{i_{j}},V_{i_{j+1}}) with non-zero expectation which appear in the expansion of tr((RaTRa)q)tr((R_{a}^{T}R_{a})^{q}). We will do so by arguing that the constraint graph associated with the term has at most 2aqq+12aq-q+1 connected components, which we do by inductively decomposing RaR_{a} as follows.

Given a partition (A,B)(A,B) of [1,n][1,n], define Ra,A,B(V1,V2)=Ra(V1,V2)R_{a,A,B}(V_{1},V_{2})=R_{a}(V_{1},V_{2}) if V1AV_{1}\subseteq A and V2BV_{2}\subseteq B and otherwise.

Ra,A,B(V1,V2)=Ra(V1,V2)=0R_{a,A,B}(V_{1},V_{2})=R_{a}(V_{1},V_{2})=0 whenever V1V_{1} and V2V_{2} are not disjoint. For all disjoint V1V_{1} and V2V_{2}, Ra,A,B(V1,V2)=Ra(V1,V2)R_{a,A,B}(V_{1},V_{2})=R_{a}(V_{1},V_{2}) for 2n2a2^{n-2a} choices of AA and BB and is for the rest. ∎

Ra22amaxA,B{Ra,A,B}||R_{a}||\leq 2^{2a}\max_{A,B}{\{||R_{a,A,B}||\}}

Since Ra=22anA,BRa,A,BR_{a}=2^{2a-n}\sum_{A,B}{R_{a,A,B}}, Ra22anA,BRa,A,B22amaxA,B{Ra,A,B}||R_{a}||\leq 2^{2a-n}\sum_{A,B}{||R_{a,A,B}||}\leq 2^{2a}\max_{A,B}{\{||R_{a,A,B}||\}}

To simplify this expression, rename the sets of vertices as follows.

If i[1,2q]i\in[1,2q] and ii is odd then take Wi=V(i+12)1W_{i}=V_{(\frac{i+1}{2})1}

If i[1,2q]i\in[1,2q] and ii is even then take Wi=V(i2)2W_{i}=V_{(\frac{i}{2})2}

where we take W2q+1=W1W_{2q+1}=W_{1}. To study which of these terms may have non-zero expectation, we first define a graph related to the corresponding constraint graph.

Given a constraint graph CC, let HH be a graph with two types of edges, product edges and constraint edges, such that

EP(H)={(Wi,Wi+1):i[1,2q]}E_{P}(H)=\{(W_{i},W_{i+1}):i\in[1,2q]\}

EC(H)={(Wi,Wj):ij,WiWj}E_{C}(H)=\{(W_{i},W_{j}):i\neq j,W_{i}\cap W_{j}\neq\emptyset\}

Now, each Ra(Wi,Wi+1)R_{a}(W_{i},W_{i+1}) is a random variable with expectation , so if any Ra(Wi,Wi+1)R_{a}(W_{i},W_{i+1}) is independent from everything else, the product will have expectation . Such dependencies arise due to the presence of edges from G occurring in (at least two) different “elements” (say (Wi,Wi+1)(W_{i},W_{i+1}), (Wj,Wj+1)(W_{j},W_{j+1}) for iji\neq j) of the term. Such repeated occurrences manifest in our constraint graphs (and the graph HH defined above) as (three or four) cycles in the graph, which we call independence breaking. For a term to have non-zero expectation it must be that every element (Wi,Wi+1)(W_{i},W_{i+1}) is on some such cycle. This implies that each product i=12qRa(Wi,Wi+1)\prod_{i=1}^{2q}{R_{a}(W_{i},W_{i+1})} has zero expected value unless all of the product edges in the corresponding HH are part of independence-breaking cycles. This places restrictions on HH (see Lemma 9.17) which in turn places restrictions on the constraint graph CC, allowing us to use Theorem 9.1. We make these ideas precise below.

Given qq and {W1,,W2q}\{W_{1},\cdots,W_{2q}\}, we define Wi±2q=WiW_{i\pm 2q}=W_{i} for all i[1,2q]i\in[1,2q].

Define an independence breaking 3-cycle in HH to consist of product edges (Wi,Wi+1)(W_{i},W_{i+1}), (Wi+1,Wi+2)(W_{i+1},W_{i+2}) and a constraint edge ((Wi,j),(Wi+2,j))((W_{i},j),(W_{i+2},j)).

Define an independence breaking 4-cycle to consist of product edges e1=(Wi1,Wi1+1)e_{1}=(W_{i_{1}},W_{i_{1}+1}), e2=(Wi2,Wi2±1)e_{2}=(W_{i_{2}},W_{i_{2}\pm 1}) and constraint edges (Wi1,Wi2)(W_{i_{1}},W_{i_{2}}) and (Wi1+1,Wi2±1)(W_{i_{1}+1},W_{i_{2}\pm 1}).

For all W1,,W2qW_{1},\cdots,W_{2q} such that WiBW_{i}\subseteq B whenever ii is odd and WiAW_{i}\subseteq A whenever ii is even, if the corresponding HH has a product edge (Wi,Wi+1)(W_{i},W_{i+1}) which is not contained in any independence-breaking cycle then E[i=12qRa(Wi,Wi+1)]=0E[\prod_{i=1}^{2q}{R_{a}(W_{i},W_{i+1})}]=0

If (Wi,Wi+1)(W_{i},W_{i+1}) is not contained in any independence-breaking cycle then no edge between WiW_{i} and Wi+1W_{i+1} appears anywhere else so Ra(Wi,Wi+1)R_{a}(W_{i},W_{i+1}) is a random variable with expectation which is independent from everything else and thus E[i=12qRa(Wi,Wi+1)]=0E[\prod_{i=1}^{2q}{R_{a}(W_{i},W_{i+1})}]=0. ∎

We now bound the number of connected components in HH with the following lemma.

Let q2q\geq 2 and HH be a graph such that

Every product edge of HH is contained in an independence-breaking cycle.

Every constraint edge of HH is of the form (Wi,Wi+j)(W_{i},W_{i+j}) where jj is even.

Then, the number of connected components in the graph defined by only the constraint edges of HH is at most q+1q+1.

The intuitive idea behind this lemma is that if we add the constraint edges in the right order, every new constraint edge can put two product edges into independence breaking cycles. For example, a constraint edge between Wi1W_{i-1} and Wi+1W_{i+1} puts the product edges (Wi1,Wi)(W_{i-1},W_{i}) and (Wi,Wi+1)(W_{i},W_{i+1}) into an independence breaking 3-cycle. If we then add a constraint edge between Wi2W_{i-2} and Wi+2W_{i+2}, this puts the product edges (Wi2,Wi1)(W_{i-2},W_{i-1}) and (Wi+1,Wi+2)(W_{i+1},W_{i+2}) into an independence breaking 4-cycle. The final constraint edge can put 4 product edges into independence breaking cycles, so the number of constraint edges needed is q1q-1.

To make this argument work, we use an inductive proof. We note that if there is no WiW_{i} which is isolated in HH, we must have at least qq constraint edges. On the other hand, if there a WiW_{i} which is isolated, there must be a constraint edge between Wi1W_{i-1} and Wi+1W_{i+1}. As noted above, this constraint edge puts the product edges (Wi1,Wi)(W_{i-1},W_{i}) and (Wi,Wi+1)(W_{i},W_{i+1}) into an independence breaking 3-cycle. We take this to be the first constraint edge. We then argue that we can essentially delete WiW_{i} and merge Wi1W_{i-1} and Wi+1W_{i+1} which allows us to use the inductive hypothesis. We make these ideas rigorous below.

We prove Lemma 9.17 by induction on qq. The base case q=2q=2 is trivial, as we clearly need at least one constraint edge, so the number of connected components in HH is at most 33. Now assume that q=k3q=k\geq 3 and the result is true for q=k1q=k-1.

First note that if there is no WiW_{i} which is isolated (when looking only at constraint edges), then there are at most qq connected components in HH. Thus, we may assume that WiW_{i} is isolated for some ii. Now note that for the product edge (Wi1,Wi)(W_{i-1},W_{i}), since WiW_{i} is isolated, there are no independence breaking 3-cycles or 4-cycles where WiW_{i} is the endpoint of a constraint edge. Thus, we must have that (Wi1,Wi)(W_{i-1},W_{i}) is part of an independence breaking 3-cycle consisting of (Wi1,Wi)(W_{i-1},W_{i}), (Wi,Wi+1)(W_{i},W_{i+1}), and a constraint edge (Wi1,Wi+1)(W_{i-1},W_{i+1}).

Now form a new graph HH^{\prime} as follows. Delete WiW_{i} and contract the constraint edge between Wi1W_{i-1} and Wi+1W_{i+1}. More precisely,

Take V(H)=V(H){Wi1,Wi,Wi+1}{U}V(H^{\prime})=V(H)\setminus\{W_{i-1},W_{i},W_{i+1}\}\cup\{U\}

Take Eproduct(H)=Eproduct(H){(Wj,Wj+1):j[i2,i+1]}{(Wi2,U),(U,Wi+2)}E_{product}(H^{\prime})=E_{product}(H)\setminus\{(W_{j},W_{j+1}):j\in[i-2,i+1]\}\cup\{(W_{i-2},U),(U,W_{i+2})\}

After doing this, rename UU as Wi1W_{i-1} and rename each WjW_{j} where j>i+1j>i+1 as Wj2W_{j-2}. In going from HH to HH^{\prime}, we have effectively reduced both qq and the number of connected components by 11. To complete the proof, we need to check that HH^{\prime} satisfies the inductive hypotheses. Based on the reduction from HH to HH^{\prime}, we still have that every constraint edge is of the form (Wi,Wi+j)(W_{i},W_{i+j}) where jj is even. We check that every product edge is still part of an independence-breaking cycle case by case.

Every independence-breaking cycle which did not contain the constraint edge (Wi1,Wi+1)(W_{i-1},W_{i+1}) in HH is preserved in HH^{\prime} except that the vertices may have been renamed. The reason for this is that such an independence breaking cycle in HH cannot contain WiW_{i} and can contain at most one of {Wi1,Wi+1}\{W_{i-1},W_{i+1}\}.

The independence-breaking 3-cycle in HH consisting of the product edges (Wi1,Wi)(W_{i-1},W_{i}), (Wi,Wi+1)(W_{i},W_{i+1}) and the constraint edge (Wi1,Wi+1)(W_{i-1},W_{i+1}) is removed, but so are the product edges (Wi1,Wi)(W_{i-1},W_{i}) and (Wi,Wi+1)(W_{i},W_{i+1}), so this is fine.

If we have an independence breaking 4-cycle in HH consisting of the product edges (Wi2,Wi1)(W_{i-2},W_{i-1}), (Wi+1,Wi+2)(W_{i+1},W_{i+2}) and the constraint edges (Wi1,Wi+1)(W_{i-1},W_{i+1}), (Wi2,Wi+2)(W_{i-2},W_{i+2}), this becomes an independence-breaking 3-cycle in HH^{\prime} with product edges (Wi2,Wi1)(W_{i-2},W_{i-1}), (Wi1,Wi)(W_{i-1},W_{i}) and a constraint edge (Wi2,Wi)(W_{i-2},W_{i}) (note that Wi1W_{i-1} and Wi+1W_{i+1} are merged into Wi1W_{i-1} in HH^{\prime} and Wi+2W_{i+2} is renamed as WiW_{i} in HH^{\prime}).

HH^{\prime} satisfies the inductive hypotheses, so looking only at the constraint edges, HH^{\prime} has at most (q1)+1=q(q-1)+1=q connected components. HH has one more connected component than HH^{\prime} (the vertex WiW_{i} in HH), so HH has at most q+1q+1 connected components, as needed. ∎

The above lemma combined with Lemma 9.5 gives the following corollary.

For all terms i=12qRa(Wi,Wi+1)\prod_{i=1}^{2q}R_{a}(W_{i},W_{i+1}) occurring in Equation 9.1 with nonzero expectation, i=12qWi2aq(2q)+q+1|\cup_{i=1}^{2q}W_{i}|\leq 2aq-(2q)+q+1.

We can now apply Theorem 9.1 with y=1y=1, and z=1z=1 by the above corollary. Every entry of Ra,A,BR_{a,A,B} has magnitude at most 2a22^{a^{2}} so we can take B=2a2B=2^{a^{2}}. By Theorem 9.1, if n10n\geq 10, for all AA and BB, for every ε(0,1)\varepsilon\in(0,1),

Since 4lnne(lnn+2)4\ln{n}\geq e(\ln{n}+2) for all n100n\geq 100 and a!aa!\geq a, we have that for all n100n\geq 100, for all AA and BB and all ε(0,1)\varepsilon\in(0,1),

Now by Corollary 9.11, Ra22amaxA,B{Ra,A,B}||R_{a}||\leq 2^{2a}\max_{A,B}{\{||R_{a,A,B}||\}} so

We now prove large deviation bounds for degG(  )deg_{G}(\;) leading to Claim 8.13 which we state below in a more precise form.

If n10n\geq 10, and ε(0,1)\varepsilon\in(0,1), then for all I[n]I\subseteq[n], with I=i2r|I|=i\leq 2r,

To prove the claim we first show a similar concentration bound for the number of cliques of a certain size in GG. While similar results appear in the literature, see for instance [Ruc88, Vu01, JLR11], we give a short direct proof based on Theorem 9.1.

For a graph GG, define Na(G)N_{a}(G) to be the number of aa-cliques in GG.

For all aa, for all n10n\geq 10 and ε(0,1)\varepsilon\in(0,1), E[Na(G)]=2(a2)(na)E[N_{a}(G)]=2^{-\binom{a}{2}}{\binom{n}{a}} and

The first part of the theorem is trivial so we focus on the second part. Given a set of vertices VV of size aa, define cVc_{V} to be 12(a2)1-2^{-\binom{a}{2}} if VV is a clique and 2(a2)-2^{-\binom{a}{2}} otherwise. Then,

Now let’s consider the function p(G,2q)=(V:V=acV)2q=W1,,W2qi=12qcWip(G,2q)=(\sum_{V:|V|=a}{c_{V}})^{2q}=\sum_{W_{1},\cdots,W_{2q}}{\prod_{i=1}^{2q}{c_{W_{i}}}}.

Note that E[i=12qcWi]=0E[\prod_{i=1}^{2q}{c_{W_{i}}}]=0 unless each set of vertices WiW_{i} has two vertices in common with a different set of vertices WjW_{j}. Now consider a graph C2C_{2} where the vertices are {W1,,W2q}\{W_{1},\ldots,W_{2q}\} and an edge between Wi,WjW_{i},W_{j} if WiWj2|W_{i}\cap W_{j}|\geq 2. Let tt be the number of connected components in C2C_{2}. We claim that iWi2aq4q+2t|\cup_{i}W_{i}|\leq 2aq-4q+2t. For, as in the proof of Lemma 9.5, first consider elements Wi1,,WitW_{i_{1}},\ldots,W_{i_{t}} belonging to the tt different connected components. Now, add the remaining elements of {W1,,W2q}\{W_{1},\ldots,W_{2q}\} so that each new element is adjacent to at least one of the previously added sets. When doing so, each step can increase the size of the union by at most a2a-2. Therefore, the size of the union is at most at+(a2)(2qt)=2aq4q+2tat+(a-2)(2q-t)=2aq-4q+2t. On the other hand, each connected component in C2C_{2} must have at least two vertices, so tqt\leq q. Therefore, iWi2aq2q|\cup_{i}W_{i}|\leq 2aq-2q.

We can now apply Theorem 9.1 with y=2y=2, z=0z=0 and B=1B=1 so that for n10n\geq 10, and ε(0,1)\varepsilon\in(0,1),

Using the facts that e2<8e^{2}<8 and m2m!2\frac{m^{2}}{m!}\leq 2 for all nonnegative integers mm, we have that

We are now ready to prove Theorem 10.1. The idea is as follows. Let AIA_{I} be the collection of vertices which are adjacent to all the vertices in II. Then, conditioned on II being a clique, degG(I)deg_{G}(I) is just the number of cliques of size 2ri2r-i in the vertices AIA_{I} which is primarily determined by AI|A_{I}|. This is because the edges between vertices of AIA_{I} are independent of the edges involving vertices in II so that we can apply Theorem 10.3 to AIA_{I}.

Let AIA_{I} be as above and let us condition on II being a clique. Then, degG(I)deg_{G}(I) is just the number of cliques of size 2ri2r-i among the vertices in AIA_{I}. Therefore, by Theorem 10.3, with probability at least 1ε/21-\varepsilon/2,

We next argue that (AI2ri)\binom{|A_{I}|}{2r-i} is concentrated around its mean. For jIj\notin I, let XjX_{j} be the indicator random variable that is 11 if the jj’th vertex is adjacent to all the vertices in II and otherwise. Then, AI=jIXj|A_{I}|=\sum_{j\notin I}X_{j} and

Observe that the random variables XjX_{j} are independent of each other and that

We next apply McDiarmid’s inequality to the function ff. Note that changing any single coordinate of the inputs to ff can change its value by at most n2ri1n^{2r-i-1}. Therefore, by Theorem 4.2, with probability at least 1ε/21-\varepsilon/2,

Combining the above equations, we get that with probability at least 1ε1-\varepsilon,

The theorem now follows as (2ri2)+i(2ri)=(2r2)(i2)\binom{2r-i}{2}+i(2r-i)=\binom{2r}{2}-\binom{i}{2}. ∎

Conclusion and future work

In this work we showed a lower bound for the maximum clique problem on random G(n,1/2)G(n,1/2) graphs in the SOS\mathsf{SOS} hierarchy and positivstellensatz proof system. Besides the specific application to clique lower bounds, the PSD’ness of the matrix MM from Equation 2.5 seems to carry further information that could be potentially useful elsewhere, perhaps for studying various sub-graph statistics. Further, the arguments related to association schemes and bounding the norm of locally random matrices could also be useful elsewhere, especially for other SOS\mathsf{SOS} hierarchy lower bounds. One natural and interesting candidate is the densest subgraph problem.

For planted clique itself, the most obvious open problem is to tighten the gap between the current upper bound of O(n/2r)O(\sqrt{n}/2^{r}) and our lower bound of 2O(r)(n/logn)1/r2^{-O(r)}(\sqrt{n}/\log n)^{1/r} for rr rounds of the SOS hierarchy. In particular, can a constant number of rounds of SOS\mathsf{SOS} beat the square-root barrier and identify planted cliques of size o(n1/2)o(n^{1/2})? KelnerPersonal comminication showed that our dual certificate MM actually is not PSD for kk roughly O(n1/(r+1))O(n^{1/(r+1)}). Thus one needs to come up with a different dual certificate to approach the upper bound of n\sqrt{n} even for r=2r=2.

We thank Boaz Barak, Siu-on Chan, Jonathan Kelner, Robert Krauthgamer, James Lee, Nati Linial, David Steurer, Madhu Sudan and Amir Yehudayoff for several useful comments.

References

Hierarchy Gaps and Positivstellensatz Refutations

For a detailed discussion of the hierarchies and PS(r)\mathsf{PS}(r)-refutations we refer the reader to the discussions in [OZ13]. The basic principle is that, typically, PS(r)\mathsf{PS}(r)-refutations are more robust and stronger than the hierarchy formulations.

The SOS\mathsf{SOS} (or Lasserre) relaxation for maximum clique is stated in Figure 1 (cf. [Tul09]). Although, the formulation itself is not in terms of an SDP, it is a standard fact that as the program only involves inner products of vectors, the optimization can be done by semi-definite programming.

The connection between Figure 1 and PS(r)\mathsf{PS}(r)-refutations comes from the following straightforward lemma stating that a certificate for PS(r)\mathsf{PS}(r)-refutations is simply a primal solution to the standard rr-round SOS\mathsf{SOS}-relaxation of the problem.

Observe that for any two subsets S1,S2([n]r)S_{1},S_{2}\in\binom{[n]}{\leq r},

Therefore, the vectors (US:Sr)(\bm{U}_{S}:|S|\leq r) satisfy the first two constraints of Figure 1 as M\mathcal{M} is a dual certificate. Further, U2=M(,)=1\|\bm{U}_{\emptyset}\|^{2}=M(\emptyset,\emptyset)=1 and for any set SS,

so that US1\|\bm{U}_{S}\|\leq 1. Thus, (US(\bm{U}_{S}: Sr)|S|\leq r) give a feasible solution for the program in Figure 1. Finally, the value of the solution is

Let GG(n,1/2)G\leftarrow G(n,1/2). Then, from the above lemma and the proof of Theorem 1.5 (where we showed the existence of a dual certificate for the clique axioms), the value of the rr-round SOS\mathsf{SOS}-relaxation for max-clique on GG is at least n1/2r/Cr(logn)1/rn^{1/2r}/C^{r}(\log n)^{1/r} with high probability. The claim follows as the integral value is (2+o(1))log2n(2+o(1))\log_{2}n with high probability. ∎

The value of the relaxation in Figure 1 is clearly monotone with respect to adding edges. Therefore, from the above argument, for GG(n,1/2,t)G\leftarrow G(n,1/2,t) the value of the rr-round SOS\mathsf{SOS}-relaxation for max-clique on GG is at least n1/2r/Cr(logn)1/rn^{1/2r}/C^{r}(\log n)^{1/r} with high probability. The claim follows as the integral value is tt with high probability. ∎