Sum-of-squares lower bounds for planted clique

Raghu Meka, Aaron Potechin, Avi Wigderson

Introduction

Finding cliques in random graphs has been the focus of substantial study in algorithm design. Let $G(n,p)$ denote Erdős-Rényi random graphs on $n$ vertices where each edge is kept in the graph with probability $p$. It is easy to check that in a random graph $G\leftarrow G(n,1/2)$, the largest clique has size $(2+o(1))\log_{2}n$ with high probability. On the other hand, the best known polynomial-time algorithms can only find cliques of size $(1+o(1))\log_{2}n$, and obtaining better algorithms remains a longstanding open problem: Karp [Kar76] suggested that even finding cliques of size $(1+\varepsilon)\log_{2}n$ could require superpolynomial time.
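To make the $\log_{2}n$ scale concrete, the following is a minimal sketch of the folklore greedy heuristic on one sample of $G(n,1/2)$. It is an illustration only, not one of the algorithms cited above; the helper names `random_graph`, `greedy_clique`, and `is_clique`, the seed, and the choice $n=512$ are assumptions made for this example.

```python
import math
import random

def random_graph(n, p=0.5, seed=0):
    """Sample G(n, p) as a list of adjacency sets (hypothetical helper)."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def greedy_clique(adj):
    """Scan vertices in order, keeping each one adjacent to everything kept so far."""
    clique = []
    for v in range(len(adj)):
        if all(u in adj[v] for u in clique):
            clique.append(v)
    return clique

def is_clique(adj, vertices):
    """True when every pair of the listed vertices is joined by an edge."""
    return all(v in adj[u] for i, u in enumerate(vertices) for v in vertices[i + 1:])

n = 512
adj = random_graph(n)
clique = greedy_clique(adj)
# Compare the greedy clique size with log2(n) and with the 2*log2(n) benchmark.
print(len(clique), math.log2(n), 2 * math.log2(n))
```

The greedy output is always a maximal clique, and on typical samples its size sits near $\log_{2}n$ rather than $2\log_{2}n$.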

Motivated by this, much attention has been given to the related planted clique problem or hidden clique problem introduced by Jerrum [Jer92] and Kučera [Kuc95]. Here, we are given a graph $G\leftarrow G(n,1/2,k)$ generated by first choosing a $G(n,1/2)$ random graph and placing a clique of size $k$ in the random graph for $k\gg\log_{2}n$. The goal is to recover the hidden clique for as small a $k$ as possible given $G$. The study of the planted clique problem and its variations (like finding planted dense subgraphs) is motivated from several other more recent directions. Its potential as being hard on average has led to proposals to base cryptosystems on variants of it [ABW10]. It was used to argue that testing $k$-wise independence is hard near the information theoretic limit by [AAK+07]. It is used in [ABBG10] to argue that evaluating some financial derivatives is hard. It was also used to justify the hardness of sparse principal component detection by Berthet and Rigollet [BR13]. Another source of interest comes from the related algorithmic problem of finding large communities in social networks. The best known polynomial-time algorithms can solve the problem for $k=\Theta(\sqrt{n})$ [AKS98] (see [DGGP14] for a near linear-time algorithm) and improving on this bound has received significant attention. The algorithmic problem has also been of much interest in the context of signal finding in molecular biology (pattern discovery in DNA sequences) as modeled in the work of [PS+00].
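The planted distribution $G(n,1/2,k)$ can be sketched as follows. The recovery step shown is the simple highest-degree heuristic, which is only expected to work when $k$ is well above $\sqrt{n\log n}$; it is not the algorithm of [AKS98]. The function names, the seed, and the parameters $n=400$, $k=120$ are assumptions chosen for illustration.

```python
import random

def planted_clique_graph(n, k, seed=0):
    """Sample G(n, 1/2) and then add all edges inside a random k-subset."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < 0.5:
                adj[u].add(v)
                adj[v].add(u)
    planted = rng.sample(range(n), k)
    for a in planted:
        for b in planted:
            if a != b:
                adj[a].add(b)
    return adj, set(planted)

def top_degree_guess(adj, k):
    """Guess the planted set as the k vertices of largest degree."""
    order = sorted(range(len(adj)), key=lambda v: len(adj[v]), reverse=True)
    return set(order[:k])

adj, planted = planted_clique_graph(n=400, k=120)
guess = top_degree_guess(adj, 120)
print(len(guess & planted), "of", len(planted), "planted vertices recovered")
```

In this regime planted vertices have expected degree about $k+(n-k)/2$ against $n/2$ for the rest, which is why degrees alone separate them; for $k$ near $\sqrt{n}$ that gap is within the degree fluctuations and the heuristic is not expected to succeed.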

In this work we exhibit a lower bound for the problem in the powerful Lasserre [Las01] and “sum-of-squares” ($\mathsf{SOS}$) [Par00] semi-definite programming hierarchies. (For brevity, in the following, we will use $\mathsf{SOS}$ hierarchy as a common term for the formulations of Lasserre [Las01] and Parrilo [Par00], which are essentially the same in our context.) As it happens, proving such lower bounds for the planted clique problem reduces easily to proving an integrality gap of value $k$ for the natural formulation of the maximum clique problem in these hierarchies on $G(n,1/2)$ graphs. Our main result then is the following average-case lower bound for maximum clique. We defer the formal definition of the semi-definite relaxation and hierarchies for now, and only note a few facts. First, implementing the $r$th level of the $\mathsf{SOS}$ hierarchy (namely, $r$ rounds) takes roughly $n^{O(r)}$ time, which is polynomial for constant $r$. Second, the above algorithm for $k=\Theta(\sqrt{n})$ may be viewed as implementing only one round. Third, $r=\log n$ suffices for exact solution of the problem, namely finding the maximum clique. Our lower bound implies that polynomial time (when the number of rounds $r$ is constant) cannot handle even $k=n^{o(1)}$, and that as many as $(\log n)^{1/2}$ rounds cannot handle $k=(\log n)^{O(1)}$. Here are more precise statements. (Throughout, $c,C$ denote constants.)

With high probability, for $G\leftarrow G(n,1/2)$ the natural $r$ -round $\mathsf{SOS}$ relaxation of the maximum clique problem has an integrality gap of at least $n^{1/2r}/C^{r}(\log n)^{2}$ .

As a corollary we obtain the following lower bound for the planted clique problem.

With high probability, for $G\leftarrow G(n,1/2,t)$ the natural $r$ -round $\mathsf{SOS}$ relaxation of the planted clique problem has an integrality gap of at least $n^{1/2r}/tC^{r}(\log n)^{2}$ .

2 Background and related work

Linear and semi-definite hierarchies are among the most powerful and well-studied techniques in algorithm design. The most prominent of these are the Sherali-Adams hierarchy ($\mathsf{SA}$) [SA90], the Lovász-Schrijver hierarchy ($\mathsf{LS}$) [LS91], their semi-definite versions $\mathsf{SA}_{+}$, $\mathsf{LS}_{+}$, and the Lasserre and $\mathsf{SOS}$ hierarchies. The hierarchies present progressively stronger convex relaxations for combinatorial optimization problems parametrized by the number of rounds $r$, where the $r$-round relaxation can be solved in $n^{O(r)}$ time on instances of size $n$ in all of them. In terms of relative power (barring some minor technicalities about how the numbering of rounds starts), it is known that $\mathsf{LS}_{+}(r)<\mathsf{SA}_{+}(r)<\mathsf{SOS}(r)$. Because they capture most powerful techniques for combinatorial optimization, lower bounds for hierarchies serve as strong unconditional evidence for computational hardness. Such lower bounds are even more relevant and compelling in situations where we do not have NP-hardness results, as is the case for typical average-case optimization problems.

Broadly speaking, our understanding of the $\mathsf{SOS}$ hierarchy is more limited than that of the $\mathsf{LS}_{+}$ and $\mathsf{SA}_{+}$ hierarchies, and in fact the $\mathsf{SOS}$ hierarchy appears to be much more powerful. A particularly striking example of this phenomenon was provided by a recent work of Barak et al. [BBH+12]. They showed that a constant number of rounds of the $\mathsf{SOS}$ hierarchy can solve the much studied unique games problem on instances which need a super-constant number of $\mathsf{LS}_{+},\mathsf{SA}_{+}$ rounds. It was also shown by the works of [BRS11, GS11] that the $\mathsf{SOS}$ hierarchy captures the sub-exponential algorithm for unique games of [ABS10]. These results emphasize the need for a better understanding of the power and limitations of the $\mathsf{SOS}$ hierarchy.

From the perspective of proving limitations, all known lower bounds for the $\mathsf{SOS}$ hierarchy essentially have their origins in the works of Grigoriev [Gri01b, Gri01a], some of which were later independently rediscovered by Schoenebeck [Sch08]. These works show that even $\Omega(n)$ rounds of $\mathsf{SOS}$ hierarchy cannot solve random $3XOR$ or $3SAT$ instances, implying a strong unconditional average-case lower bound for a natural distribution.

Most subsequent lower bounds for the $\mathsf{SOS}$ hierarchy, such as those of [Tul09], [BCV+12], rely on [Gri01b] and [Sch08] and gadget reductions. For example, Tulsiani [Tul09] shows that $2^{O(\sqrt{\log n})}$ rounds of $\mathsf{SOS}$ has an integrality gap of $n/2^{O(\sqrt{\log n})}$ for maximum clique in the worst case. This is in stark contrast to the average-case setting: even a single round of $\mathsf{SOS}$ gets an integrality gap of at most $O(\sqrt{n})$ for maximum clique on $G(n,1/2)$ [FK00]. Thus, the worst-case and average-case problems have very different complexities. Finally, using reductions tends to induce distributions that are far from uniform and definitely not as natural as $G(n,1/2)$.

For max-clique on random $G(n,1/2)$ graphs, Feige and Krauthgamer [FK00] showed that $\mathsf{LS}_{+}(r)$ , and hence $\mathsf{SOS}(r)$ , has an integrality gap of at most $\sqrt{n}/2^{\Omega(r)}$ with high probability. Complementing this, they also showed [FK03] that the gap remains $\sqrt{n}/2^{r}$ for $\mathsf{LS}_{+}(r)$ with high probability. However, there were no non-trivial lower bounds known for the stronger $\mathsf{SOS}$ hierarchy.

For the planted clique problem, other algorithmic techniques were studied. Jerrum [Jer92] showed that a broad class of Markov chain Monte-Carlo (MCMC) based methods cannot solve the problem when the planted clique has size $O(n^{1/2-\delta})$ for any constant $\delta>0$ . Another approach for the planted clique problem based on optimizing a third order tensor was suggested by Frieze and Kannan [FK08]. However, the corresponding optimization problem is NP-hard in the worst-case.

In a recent work, Feldman et al. [FGR+13] introduced the framework of statistical algorithms, which generalizes many algorithmic approaches like MCMC methods, and showed that such algorithms cannot find large cliques when the planted clique has size $O(n^{1/2-\delta})$ in less than $n^{\Omega(\log n)}$ time. (The results of [FGR+13] actually apply to the harder bipartite planted clique problem, but this assumption is not too critical.) However, their framework seems quite different from hierarchy based algorithms. In particular, the statistical algorithms framework is not applicable to algorithms which first pick a sample, fix it, and then perform various operations (such as convex relaxations) on it, as is the case for the hierarchies above.

Meka and Wigderson [MW13] addressed $\mathsf{SOS}$ lower bounds for planted clique and claimed a stronger bound than Theorem 1.1. While there was a fatal error in their proof, many of the techniques introduced there are used in the present paper.

Independent of our work, Deshpande and Montanari [DM15] recently gave a degree $4$ $\mathsf{SOS}$ lower bound for planted clique; while they are only able to handle the degree $4$ case (i.e., $r=2$), they obtain a better bound for this case than we do (roughly $n^{1/3}$ versus our $n^{1/4}$).

3 Proof systems and SDP hierarchies

A potentially simpler problem than deciding if a large clique exists is the problem of producing short certificates of the non-existence of such cliques. This puts the problem in the realm of proof complexity. Indeed, we approach the problem of $\mathsf{SOS}$ lower bounds from this viewpoint, via the positivstellensatz proof system perspective of Grigoriev and Vorobjov [GV01]. We explain this proof system next in general, and then specialize to Boolean problems and specifically to planted clique.

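Suppose we are given a system of polynomial equations or “axioms” $\mathcal{F}=\{f_{1}(x)=0,\ldots,f_{m}(x)=0\}$, where each $f_{i}$ is an $n$-variate real polynomial. A positivstellensatz refutation of $\mathcal{F}$ is an identity of the form

$$\sum_{i=1}^{m}f_{i}g_{i}\equiv 1+\sum_{j=1}^{N}h_{j}^{2},$$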

where $\{g_{1},\ldots,g_{m}\}$ and $\{h_{1},\ldots,h_{N}\}$ are arbitrary $n$-variate polynomials. Clearly, if there exists an identity as above, then the system $\mathcal{F}$ has no solution over the reals. Starting with the seminal work of Artin on Hilbert’s seventeenth problem [Art27], a long line of important results in real algebraic geometry – [Kri64, Ste73, Put93, Sch91]; cf. [BCR98] and references therein – showed that, under some (important) technical conditions (we avoid going into the details here as the conditions are easily met in the presence of Boolean axioms), such certifying identities always exist for an infeasible system. This motivates the following notion of complexity for refuting systems of polynomial equations.

where $g_{1},\ldots,g_{m},h_{1},\ldots,h_{N}$ are $n$ -variate polynomials such that $deg(f_{i}g_{i})\leq 2r$ for all $i\in[m]$ and $deg(h_{j})\leq r$ for all $j\in[N]$ .

Our interest in positivstellensatz refutations as above comes from the known relations between such identities and the $\mathsf{SOS}$ hierarchy. Informally (and under appropriate technical conditions), identities as above of degree $r$ show that the $\mathsf{SOS}$ hierarchy can certify infeasibility of the axioms in $2r+\Theta(1)$ rounds and vice versa. We will focus on showing degree lower bounds for identities as above and use them to get integrality gaps for the $\mathsf{SOS}$ hierarchy. We formalize this in Section 12. For a brief history of the different formulations from [GV01], [Las01], [Par00] and the relations between them and results in real algebraic geometry we refer the reader to [OZ13].

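Given the above setup, we shall consider the following set of natural axioms to test if a graph $G$ has a clique of size $k$:

$$x_{i}^{2}-x_{i}=0\ \ \forall i\in[n],\qquad x_{i}x_{j}=0\ \ \forall\,\{i,j\}\notin G,\qquad \sum_{i\in[n]}x_{i}-k=0.$$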

Given the above theorem it is easy to deduce the integrality gap for the SOS hierarchy, Theorem 1.1: see Section 12. We next highlight the outline of the proof, and some of our techniques which may be of broader interest.

4 Outline

Under reasonable technical conditions which ensure strong duality, the converse also holds. For the clique axioms from Equation 2.1, a dual certificate would correspond to a feasible vector solution for the $r$ -round $\mathsf{SOS}$ relaxation for maximum clique (see Figure 1 for the exact formulation) with value $k$ .

The following elementary lemma will be crucial.

The existence of such a mapping trivially implies a lower bound for $\mathsf{PS}(r)$ refutations: apply $\mathcal{M}$ to both sides of a purported $\mathsf{PS}(r)$ identity as in Equation 1.1 to arrive at a contradiction.

The lemma suggests a general recipe for proving $\mathsf{PS}(r)$ refutation lower bounds:

Reduction to PSDness of another matrix $M^{\prime}$ : The matrix $M$ has many zero rows and columns which makes it difficult to work with. In Section 5 we fix this by filling in the zero rows and columns of $M$ to obtain a new matrix $M^{\prime}$ . We then argue that to show $M$ is PSD it is sufficient to show that $M^{\prime}$ is PSD.

(Deterministic) Matrix analysis: $E=E[M^{\prime}]$ is PSD with a large minimum eigenvalue $\lambda_{min}(E)$. We show this statement in Section 7 by using the theory of association schemes described below.

Large deviation: with high probability, $\|M^{\prime}-E\|\leq\lambda_{min}(E)$. This is done by using the structure of our matrix $M^{\prime}$ along with a careful application of the trace method to bound the norms of certain random matrices with dependent entries.

As discussed, the essence of proving Theorem 1.5 involves showing that a certain random matrix is positive semi-definite (PSD) with high probability. In our case, this calls for showing a relation of the form $A\prec B$ for two matrices $A,B$ whose rows and columns are indexed by subsets of $[n]$ of size $r$. (Here and henceforth $\prec$ denotes PSD ordering: $A\prec B$ if and only if $B-A$ is positive definite.) This in turn leads us to matrices which, though complicated to describe, will be set-symmetric: the entry defined by any two (row and column) sets $I,J$ depends solely on the size of the intersection $I\cap J$. The set of all such matrices, called the Johnson scheme, is quite well studied in combinatorics as a special case of association schemes. In particular, all such matrices commute with one another and their common eigenspaces are completely understood. This theory allows us to estimate the eigenvalues and norms of various matrices that arise in the analysis.
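The commutativity of set-symmetric matrices can be checked numerically on a small case. The sketch below builds the $0/1$ matrices recording $|I\cap J|=i$ for $n=6$, $r=2$ and verifies that two random linear combinations of them commute. The name `johnson_basis`, the parameters, and the random coefficients are assumptions made for this illustration.

```python
from itertools import combinations
import numpy as np

def johnson_basis(n, r):
    """D[i][I, J] = 1 exactly when |I ∩ J| = i, for r-subsets I, J of [n]."""
    subsets = [frozenset(s) for s in combinations(range(n), r)]
    D = [np.zeros((len(subsets), len(subsets))) for _ in range(r + 1)]
    for a, I in enumerate(subsets):
        for b, J in enumerate(subsets):
            D[len(I & J)][a, b] = 1.0
    return D

D = johnson_basis(n=6, r=2)
rng = np.random.default_rng(0)
# Two arbitrary set-symmetric matrices: entries depend only on |I ∩ J|.
A = sum(c * Di for c, Di in zip(rng.normal(size=3), D))
B = sum(c * Di for c, Di in zip(rng.normal(size=3), D))
print(np.allclose(A @ B, B @ A))
```

Because the matrices commute and are symmetric, they can be diagonalized simultaneously, which is what makes eigenvalue estimates for such matrices tractable.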

Techniques: Trace bounds for locally random matrices

After various simplifications and reductions, a central problem we have to deal with is upper bounding the spectral norm of certain random matrices, defined by the underlying random graph $G\leftarrow G(n,1/2)$ . As above, these matrices have rows and columns indexed by subsets of vertices. The entry $(I,J)$ of the matrix will be a random variable of expectation zero, which depends only on the edges and non-edges of $G$ in the subgraph induced by $I\cup J$ (hence we name such matrices local). In the simple case when $r=1$ (so rows and columns are indexed by singletons), which is the one studied in the analysis of the $\sqrt{n}$ approximation algorithm, the random variables in all entries are mutually independent, and a norm bound is easy to obtain by a straightforward use of the trace method. However, for $r>1$ as we need to handle, the entries of the matrix are dependent whenever the edge sets of their entries intersect. This significantly complicates the trace calculation, and we develop some combinatorial tools to bound the trace of high powers of such local matrices.
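For the independent case $r=1$, the trace method can be sketched in a few lines: for a symmetric matrix $A$ and any integer $q\geq 1$, $\|A\|^{2q}\leq\mathrm{tr}(A^{2q})$, so a bound on the trace of a high power yields a norm bound. The matrix below is a symmetric random sign matrix used as a stand-in for the $r=1$ case; the size $n=200$, the power $q=4$, and the seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 200, 4
signs = rng.choice([-1.0, 1.0], size=(n, n))
A = np.triu(signs, k=1)
A = A + A.T  # symmetric, zero diagonal, independent +-1 entries above the diagonal

spectral_norm = np.linalg.norm(A, 2)
# For symmetric A, tr(A^{2q}) = sum_i lambda_i^{2q} >= ||A||^{2q}.
trace_bound = np.trace(np.linalg.matrix_power(A, 2 * q)) ** (1.0 / (2 * q))
print(spectral_norm, trace_bound, 2 * np.sqrt(n))
```

The trace bound overshoots the norm by at most a factor $n^{1/(2q)}$, which is why powers of order $\log n$ are typically used. With dependent entries, as for $r>1$, the expectation of $\mathrm{tr}(A^{2q})$ no longer factorizes over entries, which is the difficulty described above.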

Dual certificate for $\mathsf{PS}(r)$ refutations of max-clique

As mentioned in the introduction, we can often work out what the dual certificate should be from the axioms and basic linear algebra. As an example, we first work out the case where the graph $G$ is the complete graph; this will also help us draw a concrete connection to the work of [Gri01a].

For the complete graph, the clique axioms simplify to

$$x_{i}^{2}-x_{i}=0\ \ \forall i\in[n],\qquad \sum_{i\in[n]}x_{i}-k=0.$$

These incidentally also correspond to proving lower bounds for knapsack as studied by Grigoriev [Gri01a] (and this was what led us to the specific dual certificate we study). However, in the context of lower bounds for knapsack, the axioms are mainly interesting for non-integer $k$, and Grigoriev shows that for non-integer $k\leq n/2$, the above system has no $\mathsf{PS}(r)$ refutation for $r<k$.

From the above it follows that we can define $f$ and hence $\mathcal{M}$ as follows:

Grigoriev takes $f(0)=1$ . Here we set $f(0)=\binom{n}{2r}$ with a view towards what is to come. Thus, the final certificate is

For $k<n/2$ , the mapping $\mathcal{M}_{Gr}$ defined above is PSD for $r<k$ .
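As a sanity check on the complete-graph case, suppose (as an assumption for this sketch) that the certificate is symmetric, $\mathcal{M}(X_{I})=f(|I|)$, and that it must vanish on $(\sum_{i}x_{i}-k)X_{I}$ after multilinearizing with $x_{i}^{2}=x_{i}$. This gives the linear constraint $|I|\,f(|I|)+(n-|I|)\,f(|I|+1)=k\,f(|I|)$, which is solved by $f(s)=\prod_{t<s}\frac{k-t}{n-t}$ under the normalization $f(0)=1$. The code verifies this with exact rational arithmetic, including for non-integer $k$; the function names are hypothetical.

```python
from fractions import Fraction

def f(s, n, k):
    """f(s) = prod_{t < s} (k - t) / (n - t), normalised so that f(0) = 1."""
    value = Fraction(1)
    for t in range(s):
        value *= Fraction(k - t) / (n - t)
    return value

def constraint_holds(s, n, k):
    """Check s*f(s) + (n - s)*f(s + 1) == k*f(s), the constraint from sum_i x_i = k."""
    return s * f(s, n, k) + (n - s) * f(s + 1, n, k) == k * f(s, n, k)

n, k = 10, Fraction(7, 2)  # a non-integer k, the regime relevant for knapsack
print(all(constraint_holds(s, n, k) for s in range(n)))
```

Any other normalization of $f(0)$ rescales every $f(s)$ by the same factor and leaves the constraint intact.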

2 Certificate for clique axioms

The above equations give us a system of linear equations that $\mathcal{M}$ needs to satisfy. By working with the equations, it is easy to guess a natural solution for the system.

Given a graph $G$ on $[n]$ , and $I\subseteq[n]$ , $|I|\leq 2r$ , let

For instance, if $r=1$ and $v\in G$ , then $deg_{G}(\{v\})$ is the degree of vertex $v$ .

For any graph $G$ , $\mathcal{M}\equiv\mathcal{M}_{G}$ defined by Equation 2.4 satisfies Equations 2.2.

The first equation in Equation 2.2 follows immediately from the definition of $\mathcal{M}$ . Now, for $I\subseteq[n],|I|<2r$ ,

Observe that our notion of degree, $deg_{G}$ , satisfies the following recurrence: for $|I|<2r$ ,

The above two equations imply that $\mathcal{M}$ satisfies the second equation in 2.2. ∎
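The recurrence used in the proof is a double-counting statement, and it can be checked by brute force on a small graph. The sketch assumes that $deg_{G}(I)$ counts the cliques of size $2r$ in $G$ that contain $I$, which is consistent with the remark that for $r=1$ it is the ordinary vertex degree; under that assumption the recurrence reads $\sum_{j\notin I}deg_{G}(I\cup\{j\})=(2r-|I|)\,deg_{G}(I)$. The helper names, $n=9$, $r=2$, and the seed are illustrative choices.

```python
from itertools import combinations
import random

def random_edges(n, seed=0):
    """Edge set of one G(n, 1/2) sample, stored as frozensets {u, v}."""
    rng = random.Random(seed)
    return {frozenset(p) for p in combinations(range(n), 2) if rng.random() < 0.5}

def is_clique(edges, S):
    return all(frozenset(p) in edges for p in combinations(S, 2))

def clique_degree(edges, n, I, r):
    """ASSUMED definition: number of 2r-cliques of G that contain I."""
    I = frozenset(I)
    rest = [v for v in range(n) if v not in I]
    return sum(1 for extra in combinations(rest, 2 * r - len(I))
               if is_clique(edges, I | set(extra)))

def recurrence_holds(edges, n, I, r):
    """Check sum_{j not in I} deg(I + j) == (2r - |I|) * deg(I)."""
    I = frozenset(I)
    lhs = sum(clique_degree(edges, n, I | {j}, r) for j in range(n) if j not in I)
    return lhs == (2 * r - len(I)) * clique_degree(edges, n, I, r)

n, r = 9, 2
edges = random_edges(n)
print(all(recurrence_holds(edges, n, I, r)
          for size in range(2 * r) for I in combinations(range(n), size)))
```

Each $2r$-clique containing $I$ is counted once for every one of its $2r-|I|$ vertices outside $I$, which is exactly the factor on the right-hand side.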

Thus, to prove our main theorem Theorem 1.5, it suffices to show that $\mathcal{M}$ as defined above is PSD with high probability. We now argue that in fact, to show that $\mathcal{M}$ is PSD we do not need to consider all polynomials $P$ of degree at most $r$ . Rather, it is sufficient to show that $\mathcal{M}(P_{1}^{2})\geq 0$ whenever $P_{1}$ is multilinear and homogeneous of degree $r$ .

For any $P$ of degree at most $r$ we may write $P=P_{1}+\sum_{i}{P_{2i}(x^{2}_{i}-x_{i})}+P_{3}(\sum_{i}{x_{i}}-k)$ where $P_{1}$ is multilinear and homogeneous of degree $r$ , $P_{3}$ has degree at most $r-1$ , and all $P_{2i}$ have degree at most $r-2$ .

We first make $P$ multilinear by removing any terms which are not multilinear from $P$ as follows. If $P$ has a term of the form ${x^{2}_{i}}f$ where $f$ has degree at most $r-2$ , write ${x^{2}_{i}}f=(x^{2}_{i}-x_{i})f+{x_{i}}f$ . Iteratively applying this procedure, we may write $P=P^{\prime}$ plus terms of the form $(x^{2}_{i}-x_{i})f$ where $P^{\prime}$ is multilinear of degree at most $r$ and $f$ has degree at most $r-2$ .

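We now make $P^{\prime}$ multilinear and homogeneous of degree $r$ by removing any terms which have lower degree as follows. If $P^{\prime}$ has a term of the form $X_{I}$ where $|I|<r$, write

$$X_{I}=\frac{1}{k-|I|}\left(\sum_{i\notin I}X_{I\cup\{i\}}+\sum_{i\in I}(x_{i}^{2}-x_{i})X_{I\setminus\{i\}}-\Big(\sum_{i}x_{i}-k\Big)X_{I}\right).$$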

Iteratively applying this procedure, we may write $P=P_{1}$ plus terms of the form $(x^{2}_{i}-x_{i})f$ and terms of the form $(\sum_{i}{x_{i}}-k)g$, where $P_{1}$ is multilinear and homogeneous of degree $r$, all such $f$ have degree at most $r-2$ and all such $g$ have degree at most $r-1$. Putting everything together, the result follows. ∎
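The degree-raising step can be checked numerically. The sketch below evaluates, at a random real point, one polynomial identity that expresses $X_{I}$ through the degree-$(|I|+1)$ monomials $X_{I\cup\{i\}}$ plus multiples of the axioms $x_{i}^{2}-x_{i}$ and $\sum_{i}x_{i}-k$; it requires $k\neq|I|$. The function names and the test point are assumptions made for illustration.

```python
import math
import random

def monomial(x, I):
    """X_I = product of x_i over i in I (the empty product is 1)."""
    return math.prod(x[i] for i in I)

def raised(x, I, k):
    """Right-hand side: lifted monomials plus axiom multiples, divided by k - |I|."""
    n, I = len(x), set(I)
    lifted = sum(monomial(x, I | {i}) for i in range(n) if i not in I)
    boolean_part = sum((x[i] ** 2 - x[i]) * monomial(x, I - {i}) for i in I)
    size_part = (sum(x) - k) * monomial(x, I)
    return (lifted + boolean_part - size_part) / (k - len(I))

rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(6)]  # an arbitrary real point, not Boolean
I, k = {0, 3}, 4
print(abs(monomial(x, I) - raised(x, I, k)))
```

Since the identity holds for all real inputs and not only Boolean ones, it is a genuine polynomial identity, so the correction terms are exactly of the two forms named in the lemma.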

If $\mathcal{M}(P_{1}^{2})\geq 0$ for all multilinear homogeneous $P_{1}$ of degree $r$ then $\mathcal{M}$ is PSD.

Assume $\mathcal{M}(P_{1}^{2})\geq 0$ for all multilinear homogeneous $P_{1}$ of degree $r$ and $\mathcal{M}(P^{2})<0$ for some $P\in\mathcal{P}(n,r)$. Using Lemma 2.3, we may write $P=P_{1}+\sum_{i}{P_{2i}(x^{2}_{i}-x_{i})}+P_{3}(\sum_{i}{x_{i}}-k)$ where $P_{1}$ is multilinear and homogeneous of degree $r$. Expanding $P^{2}$, every term other than $P_{1}^{2}$ is a multiple of an axiom and has degree at most $2r$, so $\mathcal{M}$ vanishes on it. Hence $\mathcal{M}(P^{2})=\mathcal{M}(P_{1}^{2})$, so $\mathcal{M}(P_{1}^{2})<0$, a contradiction. ∎

In the remainder of the paper, we show that $M$ is PSD with high probability for $k\leq\Omega_{r}(n^{1/{2r}}/(\log n)^{1/r})$ .

There exists a constant $c>0$ such that, with high probability over $G\leftarrow G(n,1/2)$ , the matrix $M_{G}$ defined by Equation 2.5 is PSD for $k\leq 2^{-cr}\cdot(\sqrt{n}/\log n)^{1/r}$ .

Overview of proof of Theorem 2.5

The proof of Theorem 2.5 is quite technical, and is broken into two parts, where the second part is further broken down into smaller parts. While we gave a sketch of the proof of Theorem 2.5 in the introduction, we give a more detailed overview of the proof here. Recall that all matrices mentioned below are random matrices which are specified by the choice of the random graph $G$.

As mentioned in the introduction, the matrix $M=M_{G}$ has many zero rows and columns which makes it difficult to work with. The first part is to fill in the zero rows and columns of $M$ to obtain a new matrix, $M^{\prime}$ , which is nonsingular and has no high variance entries. In Section 5 we define this matrix $M^{\prime}$ and show that if $M^{\prime}$ is PSD, so is $M$ . The idea is that $M$ and $M^{\prime}$ are symmetric and the nonzero part of $M$ is a principal submatrix of $M^{\prime}$ , so the smallest nonzero eigenvalue of $M$ is at least as large as the smallest eigenvalue of $M^{\prime}$ .

Having defined $E$ (which is set-symmetric), let us spell out what the other matrices are. The “local” random matrix $L$ is defined in a simple way as follows:

Finally, define the last matrix $\Delta=M^{\prime}-E-L$ .

The proof that $M^{\prime}$ is PSD proceeds in three modular steps:

We use the results about Johnson scheme to show that $E\succ 0$ and has a large least eigenvalue (roughly $\Omega_{r}(k^{r}n^{r})$ ); see Section 7.

We next show that $\|L\|<Ck^{2r}n^{r-1/2}\log n$ by exploiting the recursive structure of the matrix $L$ and some careful trace calculations. This is the most technically intensive part of the proof, and requires the development of some combinatorial tools to estimate the trace of high powers of $L$ ; see Section 8.2.

We then show that $\|\Delta\|<Ck^{2r}n^{r-1/2}\log n$ . This is done by first showing that every entry of $\Delta$ is small in magnitude, via concentration bounds on the number of cliques in random graphs, and bounding its norm using Gershgorin’s circle theorem (Lemma 4.1); see Section 8.3.
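The entrywise-to-norm step for $\Delta$ can be illustrated as follows. For a symmetric matrix, Gershgorin's circle theorem implies that the spectral norm is at most the largest absolute row sum, so small entries give a small norm. The exact statement of Lemma 4.1 is not reproduced here; the bound below is the standard consequence and the random matrix is only a stand-in for $\Delta$.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 50
Delta = rng.uniform(-1, 1, size=(m, m))
Delta = (Delta + Delta.T) / 2  # symmetric stand-in for the error matrix

# Gershgorin: every eigenvalue lies within max_i sum_j |Delta[i, j]| of zero.
row_sum_bound = np.abs(Delta).sum(axis=1).max()
spectral_norm = np.linalg.norm(Delta, 2)
print(spectral_norm, row_sum_bound)
```

In particular, if every entry has magnitude at most $\varepsilon$ and the matrix has $m$ columns, the norm is at most $m\varepsilon$.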

Preliminaries

We shall use the following notations (some are repeated from the introduction so as to have them in one place):

$\mathcal{P}(n,2r)$ denotes the set of $n$ -variate polynomials of degree at most $2r$ .

$\mathsf{PS}(r)$ denotes positivstellensatz refutations of degree at most $r$ as defined in Definition 1.3.

For $0\leq r\leq n$ , let $\binom{[n]}{r}$ , $\binom{[n]}{\leq r}$ denote all subsets of size exactly and at most $r$ , respectively.

For $I\subseteq[n]$ , let $X_{I}=\prod_{i\in I}x_{i}$ .

By default all vectors are column vectors. For a set $I$ , $\mathds{1}(I)$ denotes the indicator vector of the set $I$ .

We will also need the following standard fact from matrix theory (see [GVL96] for instance).

Finally, we need McDiarmid’s inequality for obtaining tail bounds for functions of independent random variables (see [dubashi2009concentration] for instance)

Let $X_{1},\ldots,X_{n}$ be independent random variables and let $f$ be a function over the domain space of $(X_{1},\ldots,X_{n})$ . Let $c_{1},\ldots,c_{n}>0$ be such that for all $i$ , $x_{1},\ldots,x_{n},x_{i}^{\prime}$ ,

In this section, we define the matrix $M^{\prime}$ and show that if $M^{\prime}$ is PSD then so is $M$ . We use the following notations for brevity: For any set $I\subseteq[n]$ , let $\mathcal{E}(I)=\{\{i,j\}:i\neq j\in I\}$ . For $0\leq i\leq r$ , let

Intuitively, for every $I,J$ , $M^{\prime}(I,J)$ is what $M(I,J)$ would be had we added cliques on the subsets $I$ , $J$ to the graph. The above definition avoids the problem of the whole row and column corresponding to $I$ or $J$ becoming zero if either was not a clique and controls the variance of the entries. We now show that to show $M$ is PSD, it is sufficient to show that $M^{\prime}$ is PSD.

The reason this lemma is true is because as shown below, the nonzero part of $M$ is a principal submatrix of $M^{\prime}$ .

Whenever $I$ and $J$ are cliques of size $r$ in $G$ , $M^{\prime}(I,J)=M(I,J)$

Suppose that $I$ and $J$ are cliques in $G$. Then, $M_{T}(I,J)=\beta(|I\cap J|)$ if $I\cup J\subseteq T$ and $T$ is a clique, and $0$ otherwise. Therefore,

The nonzero part of $M$ is a principal submatrix of $M^{\prime}$ .

We now use the following elementary fact about matrices.

If $A$ is a principal submatrix of a symmetric matrix $B$ then the smallest eigenvalue of $A$ is at least as large as the smallest eigenvalue of $B$ .

Combining Corollary 5.3 and Proposition 5.4, if $M^{\prime}$ is PSD then $M$ is PSD, as needed. ∎
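The principal-submatrix fact used above is an instance of Cauchy interlacing and is easy to check numerically. In the sketch below $B$ is a random symmetric matrix and $A$ is the principal submatrix on an arbitrary index set; the sizes, the seed, and the index set `keep` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(8, 8))
B = (B + B.T) / 2  # symmetric matrix playing the role of M'

keep = [0, 2, 3, 6]            # rows/columns retained in the principal submatrix
A = B[np.ix_(keep, keep)]      # plays the role of the nonzero part of M

lam_min_A = np.linalg.eigvalsh(A)[0]  # eigvalsh returns eigenvalues in ascending order
lam_min_B = np.linalg.eigvalsh(B)[0]
print(lam_min_A, lam_min_B)
```

The inequality follows from the variational characterization: the minimum of $x^{T}Bx$ over unit vectors supported on `keep` cannot be smaller than the minimum over all unit vectors.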

Johnson scheme

The theory of association schemes is a classical area in combinatorics and coding theory (cf. for instance [vLW01]). We shall use a few classical results (Lemmas 6.6 and 6.7 below) about the eigenspaces and eigenvalues of association schemes and the Johnson scheme in particular. We also introduce two bases for the Johnson scheme, which will play a key role in bounding the eigenvalues of various matrices later.

We start with some basics about the Johnson scheme - some of our notations are non-standard but they fit better with the rest of the manuscript.

As we will soon see, $\mathcal{J}$ is also a commutative algebra. There is a natural basis for the subspace $\mathcal{J}$ :

Another important collection of matrices that come up naturally while studying PSD’ness of set-symmetric matrices is the following which gives a basis of PSD matrices for the Johnson scheme.

Equivalently, for $T\subseteq[n]$ , if we let $P_{T}$ be the PSD rank one matrix

For fixed $n,r$ , the following relations hold:

The first relation follows immediately from the definition of $P_{t}$ . The second relation follows from inverting the set of equations given in (1). ∎

The main nontrivial result from the theory of association schemes we use is the following characterization of the eigenspaces of matrices in $\mathcal{J}$ . The starting point for these characterizations is the fact that matrices in $\mathcal{J}$ commute with one another and hence are simultaneously diagonalizable. We refer the reader to Section 7.4 in [God] (the matrices $P_{t}$ in our notation correspond to matrices $C_{t}$ in [God]) for the proofs of these results.

$V_{0},\ldots,V_{r}$ are eigenspaces for $\{P_{t}:0\leq t\leq r\}$ and consequently for all matrices in $\mathcal{J}$ .

For $0\leq j\leq r$ , $dim(V_{j})=\binom{n}{j}-\binom{n}{j-1}$ .
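The dimension formula can be observed on a small instance. For $n=6$, $r=2$ the sketch takes one particular set-symmetric matrix, the $0/1$ matrix recording $|I\cap J|=1$, and compares its eigenvalue multiplicities with $\binom{n}{j}-\binom{n}{j-1}$ for $j=0,1,2$. The choice of this matrix (whose eigenvalues on the three eigenspaces happen to be distinct) and of the parameters is an assumption made for illustration.

```python
from itertools import combinations
from math import comb
import numpy as np

n, r = 6, 2
subsets = [frozenset(s) for s in combinations(range(n), r)]
# D1[I, J] = 1 when |I ∩ J| = 1: one particular set-symmetric matrix.
D1 = np.array([[1.0 if len(I & J) == 1 else 0.0 for J in subsets] for I in subsets])

# The eigenvalues are integers here, so rounding groups them exactly.
eigenvalues = np.round(np.linalg.eigvalsh(D1)).astype(int)
values, counts = np.unique(eigenvalues, return_counts=True)
multiplicities = sorted(counts.tolist())
expected = sorted(comb(n, j) - (comb(n, j - 1) if j > 0 else 0) for j in range(r + 1))
print(multiplicities, expected)
```

The dimensions telescope to $\binom{n}{r}$, the total number of rows, so the eigenspaces $V_{0},\ldots,V_{r}$ account for the whole space.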

For any matrix $Q\in\mathcal{J}$ , let $\lambda_{j}(Q)$ denote the eigenvalue of $Q$ within the eigenspace $V_{j}$ . Then,

Therefore, as $Q$ and $P_{t}$ ’s have common eigenspaces, by Lemma 6.6,

PSD’ness of the expectation matrices

The expectation matrix above is just a scalar multiple of $\mathcal{M}_{Gr}$ (viewed as a matrix) as defined in Equation 2.1. Therefore, by Theorem 2.1, $E_{M}$ as defined above is PSD for $r<\min(k,n-k)$ . We give a simpler proof of this claim here for the case when $r\leq\min(\frac{k}{2},n-k)$ .

The matrix $E_{M}$ is positive definite for $r\leq\min(\frac{k}{2},n-k)$ .

We will show this by writing $E_{M}$ as a suitable positive linear combination of the PSD matrices $P_{t}$ ’s from Section 6. More concretely, for any $\alpha_{0},\ldots,\alpha_{t}>0$ , we have

If $k<\frac{n-2r}{3r\cdot 2^{r-1}}$ and $r\leq\frac{k}{2}$ then $E$ is PSD with minimal eigenvalue $2^{-O(r^{2})}{k^{r}}{n^{r}}$

Since the $P_{t}$ ’s are PSD and $P_{r}=I$ , $E$ is PSD with minimal eigenvalue $2^{-O(r^{2})}{k^{r}}{n^{r}}$ , as needed. ∎

PSD’ness of dual certificate

We are now ready to prove our main result, Theorem 1.5, with the aid of several technical results whose proofs are deferred to Section 9 and Section 10. We prove Theorem 1.5 by showing that the matrix $M$ will be PSD with high probability (Theorem 2.5). In turn, we show that $M$ is PSD with high probability with our main technical lemma, which says that $M^{\prime}$ is PSD with high probability (this is sufficient by Lemma 5.1).

To prove Lemma 8.1, we first decompose $M^{\prime}$ as $M^{\prime}=E+L+\Delta$ in Section 8.1. We then analyze $L$ and $\Delta$ in Section 8.2 and Section 8.3 respectively. We put all the pieces together to show the PSD’ness of $M^{\prime}$ in Section 8.4.

For the remainder of this section, we shall use the following additional notations:

For $0\leq i\leq r$ , let $p(i)=2^{-(r-i)^{2}}$ . Then, for $I,J\in\binom{[n]}{r}$ with $|I\cap J|=i$ , $p(i)$ is the probability that $\mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J))\subseteq G$ .

In the following we will adopt the convention that $I,J,K$ denote elements of $\binom{[n]}{r}$ and $T,T^{\prime}$ denote elements of $\binom{[n]}{2r}$ .

We write $A\approx_{r}B$ if there exist constants $c,C$ such that $c^{r^{2}}B\leq A\leq C^{r^{2}}B$ .
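The exponent in $p(i)=2^{-(r-i)^{2}}$ is simply the number of vertex pairs in $\mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J))$, each of which is an edge of $G(n,1/2)$ independently with probability $1/2$. The sketch below counts these pairs directly for $r=4$; the helper name `cross_pairs` and the specific sets are illustrative assumptions.

```python
from itertools import combinations

def cross_pairs(I, J):
    """Pairs inside I ∪ J that lie neither inside I nor inside J."""
    def pairs(S):
        return {frozenset(p) for p in combinations(S, 2)}
    return pairs(I | J) - (pairs(I) | pairs(J))

r = 4
counts = []
for i in range(r + 1):
    I = set(range(r))
    J = set(range(r - i, 2 * r - i))  # chosen so that |I ∩ J| = i
    counts.append((len(I & J), len(cross_pairs(I, J)), (r - i) ** 2))
print(counts)
```

Every such pair has one endpoint in $I\setminus J$ and the other in $J\setminus I$, and both of these sets have size $r-i$.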

Finally, define $\Delta=M^{\prime}-E-L$ . We have already shown in Section 7 that $E$ is PSD with minimal eigenvalue $2^{-O(r^{2})}{k^{r}}{n^{r}}$ . There are now two remaining modular steps in the proof:

We show that $\|L\|$ is $2^{O(r^{2})}k^{2r}n^{r-1/2}\log{n}$ by exploiting the recursive structure of the matrix $L$ and some careful trace calculations. This is the most technically intensive part of the proof.

We then show that $\|\Delta\|$ is $2^{O(r^{2})}k^{2r}n^{r-1/2}\log{n}$ . This is done by first showing that each entry of $\Delta$ is small in magnitude and using Lemma 4.1.

The next two subsections address these two steps with the corresponding technical elements dealt with in Section 9 and Section 10 respectively.

2 Bounding the norm of the locally random matrix $L$

In this subsection, we bound the norm of the matrix $L$ .

For some constant $C>0$ , with probability at least $1-1/n$ over the random graph $G$ ,

In other words, for disjoint $V,W\in\binom{[n]}{a}$ the $R_{a}(V,W)$’th entry is essentially (up to a constant multiple) a shift of the indicator random variable which is $1$ if all edges in $V\times W$ are in $G$ and $0$ otherwise.

If $n\geq 100$ , for all $\varepsilon\in(0,1)$ , $\operatorname*{\mathsf{Pr}}\left[||R_{a}||>2^{a^{2}+2a+2}\ln{(\frac{n}{\varepsilon})}n^{a-\frac{1}{2}}\right]<\varepsilon$ .

Note that $2^{a^{2}}n^{a}$ is an easy bound for $\|R_{a}\|$ (each entry of the matrix is at most $2^{a^{2}}$ in magnitude); the main advantage of the claim is the multiplicative $n^{-1/2}$ factor.

In the remainder of this section we use the recursive structure of the matrix $L$ to prove Claim 8.2 assuming the above claim. We first introduce some notation:

The next claim relates the norms of “lifts” of a matrix $R$, denoted $R^{(i)}$. Conceptually, bounding the norms of matrices with non-zero entries on intersecting indexing sets is reduced to the disjoint case. Note that the requirement $R=R^{(0)}$ exactly captures the latter.

We partition the entries of $R^{(i)}$ as follows.

For any $X,Y,K$ such that $X\subseteq[1,r_{1}]$ , $Y\subseteq[1,r_{2}]$ , and $K\subseteq V(G)$ where $|K|=|X|=|Y|=i$ , let $R_{X,Y,K}^{(i)}$ be the matrix such that the following is true:

$R_{X,Y,K}^{(i)}(I,J)=R^{(i)}(I,J)=R(I\setminus K,J\setminus K)$ if $K=\{i_{x}:x\in X\}=\{j_{y}:y\in Y\}$ where $i_{1},\cdots,i_{r_{1}}$ are the elements of $I$ in increasing order and $j_{1},\cdots,j_{r_{2}}$ are the elements of $J$ in increasing order.

For all $X,Y,K$ , $||R_{X,Y,K}^{(i)}||\leq||R||$ .

The nonzero part of $R_{X,Y,K}^{(i)}$ can be viewed as a submatrix of $R$ , so it cannot have larger induced norm than $R$ . ∎

If $R^{(i)}(I,J)=0$ then $\sum_{X,Y,K}{R_{X,Y,K}^{(i)}(I,J)}=0$. If $R^{(i)}(I,J)\neq 0$ then $|I\cap J|=i$. This implies that $K=\{i_{x}:x\in X\}=\{j_{y}:y\in Y\}$ if and only if $K=I\cap J$, $X$ is the set of indices of $K$ in $I$, and $Y$ is the set of indices of $K$ in $J$, which happens for precisely one $X,Y,K$. Thus, $R_{X,Y,K}^{(i)}(I,J)=R^{(i)}(I,J)$ for precisely one $X,Y,K$ and is $0$ otherwise, so $R^{(i)}=\sum_{X,Y,K}{R_{X,Y,K}^{(i)}}$, as needed. ∎

$||R^{(i)}||\leq\sum_{X,Y}{||\sum_{K}{R_{X,Y,K}^{(i)}}||}$ .

If $K_{1}$ , $K_{2}$ are distinct subsets of $V(G)$ of size $i$ , $R_{X,Y,K_{1}}^{(i)}(I_{1},J_{1})\neq 0$ , and $R_{X,Y,K_{2}}^{(i)}(I_{2},J_{2})\neq 0$ then $I_{1}\neq I_{2}$ and $J_{1}\neq J_{2}$ .

Assume that $I_{1}=I_{2}=I$ and let $i_{1},\cdots,i_{r_{1}}$ be the elements of $I$ in increasing order. Then $K_{1}=\{i_{x}:x\in X\}=K_{2}$ . Contradiction. Following similar logic, we cannot have that $J_{1}=J_{2}$ either. ∎

For any $X,Y\subseteq[1,n]$ , $||\sum_{K}{R_{X,Y,K}^{(i)}}||\leq||R||$ .

Note that we can permute the rows and columns of a matrix without affecting its induced norm. By Proposition 8.9, we can permute the rows and columns of $\sum_{K}{R_{X,Y,K}^{(i)}}$ to put it into block form where each block is the nonzero part of $R_{X,Y,K}^{(i)}$ for some $K$ . For a matrix in block form, its norm is the maximum of the norms of the individual blocks, which by Proposition 8.6 is at most $||R||$ , as needed. ∎

With these results, Lemma 8.4 follows immediately. Plugging Proposition 8.10 into Proposition 8.8 gives $||R^{(i)}||\leq\sum_{X,Y}{||\sum_{K}{R_{X,Y,K}^{(i)}}||}\leq\sum_{X,Y}{||R||}\leq{{r_{1}}\choose i}{{r_{2}}\choose i}||R||$ , as needed. ∎

We now use the above statements to prove Lemma 8.2.

We claim that for $0\leq i\leq r$ , and $\alpha_{i}$ as in Equation 8.1

To see the above, fix $I,J\in\binom{[n]}{r}$ with $|I\cap J|=i$ and let $V=I\setminus(I\cap J)$ , $W=J\setminus(I\cap J)$ . Observe that

We consider two cases as in the definition of $L$ .

Case 1. $\mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J))\subseteq G$ . Then, $R_{r-i}^{(i)}(I,J)=R_{r-i}(V,W)=2^{(r-i)^{2}}-1=(1-p(i))/p(i)$ . Equation 8.6 now follows from the first case of the definition of $L$ .

Case 2. $\mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J))\not\subseteq G$ . Then, $R_{r-i}^{(i)}(I,J)=R_{r-i}(V,W)=-1$ . Equation 8.6 now follows from the second case of the definition of $L$ .

Therefore, by Claim 8.3, Lemma 8.4 and Equation 8.6,

The lemma now follows as $L=\sum_{i=0}^{r}L^{i}$ . ∎

3 Bounding the norm of the global error matrix $\Delta$

The main claim of this subsection is the following bound on the spectral norm of $\Delta$ .

For $n>C2^{4r^{2}}$ , with probability at least $1-1/n$ over the random graph $G$ ,

The proof relies on the following bound on the individual entries of $\Delta$ .

For some universal constant $C$ , and $n>C2^{4r^{2}}$ , with probability at least $1-1/n$ over the random graph $G$ , for all $I,J\in\binom{[n]}{r}$ , with $i=|I\cap J|$ ,

Before proving the lemma, we first use it to bound $\|\Delta\|$ .

Suppose that the conclusion of the previous lemma holds. Then, for any $I\in\binom{[n]}{r}$ ,

The lemma now follows from the above bound and Lemma 4.1. ∎

Fix sets $I,J$ with $|I\cap J|=i$ . Let $\mathcal{A}$ be the event that $\mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J))\subseteq G$ .

Then, by the second case of Equation 8.3, conditioned on $\neg\mathcal{A}$ we have $\Delta(I,J)=0$ . Thus, the claim holds trivially in this case. In the following we condition on $\mathcal{A}$ . Observe that

We next use the following claim that $\deg_{G}(I\cup J)$ is concentrated around its mean when conditioned on $I\cup J$ being a clique. At a high level, this follows from the fact that conditioned on $I\cup J$ being a clique, $\deg_{G}(I\cup J)$ can be written as a (structured) low-degree polynomial in the indicator variables of the edges not in $I\cup J$ with small variance. We defer the proof to the appendix.

As a consequence of the above claim we also get concentration for $M^{\prime}(I,J)\mid\mathcal{A}$ . This is because $M^{\prime}(I,J)\mid\mathcal{A}$ is identically distributed as $M(I,J)\mid(I\cup J\text{ a clique})$ . Therefore, taking $\varepsilon=1/n^{2r+1}$ and applying a union bound over all sets $I,J$ we get that with probability at least $1-1/n$ , for all $I,J$ such that $\mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J))\subseteq G$ , and $|I\cap J|=i$ ,

and conditioned on $\mathcal{A}$ , $\Delta(I,J)=M^{\prime}(I,J)-\alpha(|I\cap J|)/p(|I\cap J|)$ . The lemma now follows by combining the above two bounds. ∎

4 Putting things together

We now prove Lemma 8.1 and use it to prove our main results.

for $k$ as in the statement of the lemma for a sufficiently big constant $c$ . ∎

We bring the arguments from previous sections together to prove our main results Theorem 2.5 and Theorem 1.5.

Follows immediately from Lemma 5.1 and Lemma 8.1. ∎

Follows immediately from Lemma 1.8, Claim 2.2 and Theorem 2.5. ∎

Theorems 1.1 and 1.2 follow immediately from our $\mathsf{PS}(r)$ -refutation lower bound using standard arguments. We defer these to the appendix.

Bounding norms of locally random matrices

Going back to Claim 8.3, let us first look at the special case of $a=1$ to gain some intuition. In this case, the entries of $R_{1}$ are (essentially) independent, and so the trace method is easy to apply. More precisely, $R_{1}$ is a symmetric random matrix with zeros on the diagonal and the entries above the diagonal taking independent uniformly random $\pm{1}$ values. It is well known that $\|R_{1}\|=O(\sqrt{n})$ in this case (see [Ver] for instance). One can also prove the bound by the trace method as follows. We have that

$tr(R_{1}^{2q})=\sum_{i_{1},\cdots,i_{2q}}{\prod_{j=1}^{2q}{{R_{1}}({i_{j}},i_{j+1})}}$

where $i_{2q+1}=i_{1}$ . We can then look at which products $\prod_{j=1}^{2q}{{R_{1}}({i_{j}},i_{j+1})}$ have expectation $0$ .

To handle higher $a$ ’s we first generalize the above argument based on constraint graphs to work with general locally-random matrices. However, unlike for $a=1$ , distinct entries of the matrix are now dependent, which significantly complicates the structure of the terms and the associated count of the terms which have non-zero expectation. The rest of the section is devoted to this. While we apply our arguments to the particular locally-random matrices arising in our proof, these techniques should apply more generally to other locally-random matrices.

We next state our main technical result which gives us a way to bound traces of high powers of locally random matrices based on the structure of the individual terms. The advantage being that the conditions on the terms will be easier to ascertain in our applications.

Here we use $V$ rather than $I$ for subsets because we will be viewing the individual elements of each $V$ as vertices.

Assume that we have values $a,B>0$ and for every positive $q$ , we have a function $p(G,2q)$ such that $p(G,2q)\geq 0$ and $p(G,2q)$ can be written in the form

For every term $f(G,\{V_{1},\ldots,V_{2q}\})$ with non-zero expected value, $|\cup_{j}V_{j}|\leq 2aq-qy+z$ for some integers $y$ and $z$ where $1\leq y\leq 2a$ and $z\geq 0$ .

Then, if $n\geq 10$ , for all $\varepsilon\in(0,1)$ ,

We will use this theorem with two types of functions $p$ . When $p(G,2q)=tr(({M^{T}}M)^{q})$ for some matrix $M$ depending on $G$ , $||M||\leq\sqrt[2q]{p(G,2q)}$ for all $q>0$ so this theorem gives us a probabilistic bound on $||M||$ . When $p(G,2q)=h(G)^{2q}$ for some function $h$ , then $h(G)=\sqrt[2q]{p(G,2q)}$ for all $q>0$ so this theorem gives us a probabilistic bound on $h(G)$ .

In the case when $p(G,2q)=tr(R_{1}^{2q})$ , $p(G,2q)=\sum_{i_{1},\cdots,i_{2q}}{\prod_{j=1}^{2q}{{R_{1}}({i_{j}},i_{j+1})}}$ . Each term here has expected value at most $1$ and it is easy to argue that for any term with non-zero expected value, the number of distinct elements is at most $q+1$ . Applying Theorem 9.1 with $y=z=1$ , and $B=1$ we have that for all $n\geq 10$ , and $\varepsilon\in(0,1)$ ,

This bound is weaker (by a logarithmic factor) than the bounds in e.g. [Ver], but is sufficient for our purposes.
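The counting fact used for $R_{1}$ can be confirmed by exhaustive enumeration for tiny parameters. The sketch below is an illustration under the stated model (independent uniform $\pm 1$ entries above the diagonal, zeros on the diagonal), with helper names of our choosing: a term has non-zero expectation exactly when no diagonal entry is used and every unordered pair $\{i_{j},i_{j+1}\}$ is used an even number of times.

```python
from collections import Counter
from itertools import product

def nonzero_terms(n, q):
    """Yield index tuples (i_1..i_2q) whose product of R_1 entries has nonzero mean."""
    m = 2 * q
    for walk in product(range(n), repeat=m):
        steps = [(walk[j], walk[(j + 1) % m]) for j in range(m)]
        if any(u == v for u, v in steps):
            continue  # a diagonal entry appears, so the product is 0
        uses = Counter(frozenset(s) for s in steps)
        # Independent +-1 entries: the mean is 1 if every entry appears an
        # even number of times, and 0 otherwise.
        if all(c % 2 == 0 for c in uses.values()):
            yield walk

def max_distinct(n, q):
    """Largest number of distinct indices among terms with nonzero mean."""
    return max(len(set(w)) for w in nonzero_terms(n, q))

if __name__ == "__main__":
    print(max_distinct(5, 2), max_distinct(5, 3))
    print(sum(1 for _ in nonzero_terms(5, 2)))
```

Since every surviving term has expectation exactly $1$ in this model, the number of surviving terms equals $E[tr(R_{1}^{2q})]$ ; for $n=5$ , $q=2$ this is $n(n-1)^{2}+n(n-1)(n-2)=140$ .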

Before proving the theorem we introduce the concept of constraint graphs which are a useful way to visualize our calculations. While the statement of the above theorem does not involve constraint graphs, thinking in terms of constraint graphs is helpful in proving the conditions required to apply the theorem.

Given a family of sets of vertices $\{V_{i}\}$ , we define a corresponding constraint graph $C$ whose vertices are the sets $\{V_{i}\}$ and there is an edge between $V_{i},V_{j}$ , $i\neq j$ , if $V_{i}\cap V_{j}\neq\emptyset$ .
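As an illustration of this definition, the following sketch computes the number of connected components of a constraint graph with a small union-find structure. It also evaluates the elementary bound $|\cup_{i}V_{i}|\leq at+(a-1)(m-t)$ for $m$ sets of size $a$ whose constraint graph has $t$ components, which follows by adding the sets one at a time so that each set outside the $t$ starting sets meets a previously added one. The function names are ours, and the bound is stated here only as a sanity check.

```python
def constraint_components(sets):
    """Number of connected components of the constraint graph on `sets`."""
    parent = list(range(len(sets)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if sets[i] & sets[j]:  # edge: the two sets intersect
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(sets))})

def union_size_bound(sets, a):
    """a*t + (a-1)*(m-t) for m sets of size a with t components."""
    t = constraint_components(sets)
    return a * t + (a - 1) * (len(sets) - t)

if __name__ == "__main__":
    family = [{1, 2}, {2, 3}, {3, 4}, {7, 8}]
    print(constraint_components(family))   # components of the constraint graph
    print(len(set().union(*family)), union_size_bound(family, 2))
```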

The above definition is useful because of the following elementary lemma.

In the following we use $\{V_{i}\}$ as a short form for $\{V_{1},\ldots,V_{2q}\}$ . We prove this result by obtaining an upper bound on the number of terms in $p(G,2q)=\sum_{\{V_{i}\}}{f(G,\{V_{i}\})}$ with nonzero expected value. This gives us a probabilistic upper bound for $p(G,2q)$ , implying the upper bound on $\min_{q}{\{\sqrt[2q]{p(G,2q)}\}}$ .

Define $N(n,a,q,m)$ to be the number of ways to choose subsets $\{V_{i}:i\in[2q]\}$ of $[n]$ such that $|\cup_{i}{V_{i}}|\leq m$ and for all $i$ , $|V_{i}|=a$ .

We can choose each ordered $2aq$ -tuple $(v_{1},\cdots,v_{2aq})$ of elements in $[n]$ which contains at most $m$ distinct elements as follows. There must be at least $2aq-m$ elements which are duplicates of other elements, so we can first choose a set $I$ of $2aq-m$ indices such that for all $i\in I$ , $v_{i}=v_{j}$ for some $j\notin I$ . There are $\binom{2aq}{2aq-m}$ choices for $I$ . We then choose the elements $\{v_{j}:j\notin I\}$ . There are no restrictions on these elements so there are $n^{m}$ choices for these elements. Finally, we choose the elements $\{v_{i}:i\in I\}$ . To determine each $v_{i}$ it is sufficient to specify the $j\notin I$ such that $v_{i}=v_{j}$ . For each $i$ there are $m$ choices for the corresponding $j$ , so the number of choices for these elements is at most $m^{2aq-m}$ . Putting everything together, the total number of choices is at most $\binom{2aq}{2aq-m}{n^{m}}{{m}^{2aq-m}}$ . Now note that since we are choosing subsets $\{V_{i}:i\in[2q]\}$ of $[n]$ rather than one big ordered tuple, the order within each subset does not matter. Thus, there are $(a!)^{2q}$ different ordered tuples which give the same subsets of elements, so the total number of possibilities for the subsets $\{V_{i}\}$ is at most $(a!)^{-2q}{\binom{2aq}{2aq-m}}{n^{m}}{{m}^{2aq-m}}$ , as needed. ∎
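The counting bound just proved can be compared against an exact brute-force count for very small parameters. In the sketch below (an illustration only, with names of our choosing) we interpret $\{V_{i}:i\in[2q]\}$ as an ordered sequence of $2q$ subsets of size $a$ , which is the reading used in the proof.

```python
from itertools import combinations, product
from math import comb, factorial

def count_families(n, a, q, m):
    """Exact N(n,a,q,m): sequences of 2q a-subsets of [n] with union size <= m."""
    subsets = [frozenset(c) for c in combinations(range(n), a)]
    total = 0
    for family in product(subsets, repeat=2 * q):
        if len(frozenset().union(*family)) <= m:
            total += 1
    return total

def lemma_bound(n, a, q, m):
    """(a!)^{-2q} * C(2aq, 2aq-m) * n^m * m^{2aq-m}, valid for a <= m <= 2aq."""
    dup = 2 * a * q - m  # number of forced duplicate positions
    return comb(2 * a * q, dup) * n ** m * m ** dup / factorial(a) ** (2 * q)

if __name__ == "__main__":
    for m in (2, 3, 4):
        print(m, count_families(4, 2, 1, m), lemma_bound(4, 2, 1, m))
```

For $n=4$ , $a=2$ , $q=1$ the exact counts are $6$ , $30$ and $36$ for $m=2,3,4$ , against bounds of $96$ , $192$ and $64$ , so the bound is loose but valid in each case.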

Moreover, by our assumptions, each of these nonzero terms $E[f(G,\{V_{i}\})]$ has value at most $B^{2q}$ , so

Now, by Markov’s inequality applied to $p(G,2q)$ ,

The claim now follows by rearranging the above bound. ∎

In this subsection, we prove Claim 8.3 using Theorem 9.1. For convenience, we restate Claim 8.3 here with more precise constants.

If $n\geq 100$ , for all $\varepsilon\in(0,1)$ , $\operatorname*{\mathsf{Pr}}\left[||R_{a}||>2^{a^{2}+2a+2}\ln{(\frac{n}{\varepsilon})}n^{a-\frac{1}{2}}\right]<\varepsilon$ .

The core of the proof will be to bound $|\cup_{j=1}^{2q}V_{i_{j}}|$ for any term $\prod_{j=1}^{2q}R_{a}(V_{i_{j}},V_{i_{j+1}})$ with non-zero expectation that appears in the expansion of $tr((R_{a}^{T}R_{a})^{q})$ . We will do so by arguing that the constraint graph associated with the term has at most $2aq-q+1$ connected components, which we do by inductively decomposing $R_{a}$ as follows.

Given a partition $(A,B)$ of $[1,n]$ , define $R_{a,A,B}(V_{1},V_{2})=R_{a}(V_{1},V_{2})$ if $V_{1}\subseteq A$ and $V_{2}\subseteq B$ and $0$ otherwise.

$R_{a,A,B}(V_{1},V_{2})=R_{a}(V_{1},V_{2})=0$ whenever $V_{1}$ and $V_{2}$ are not disjoint. For all disjoint $V_{1}$ and $V_{2}$ , $R_{a,A,B}(V_{1},V_{2})=R_{a}(V_{1},V_{2})$ for $2^{n-2a}$ choices of $A$ and $B$ and is $0$ for the rest. ∎

$||R_{a}||\leq 2^{2a}\max_{A,B}{\{||R_{a,A,B}||\}}$

Since $R_{a}=2^{2a-n}\sum_{A,B}{R_{a,A,B}}$ , $||R_{a}||\leq 2^{2a-n}\sum_{A,B}{||R_{a,A,B}||}\leq 2^{2a}\max_{A,B}{\{||R_{a,A,B}||\}}$ ∎
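The factor $2^{2a-n}$ comes from counting the ordered partitions $(A,B)$ that separate a given disjoint pair $V_{1},V_{2}$ . A minimal brute-force sketch of this count, with a function name of our choosing, is:

```python
from itertools import product

def count_partitions(n, V1, V2):
    """Count ordered partitions (A, B) of {0..n-1} with V1 inside A and V2 inside B."""
    count = 0
    for mask in product((0, 1), repeat=n):  # mask[v] == 0 puts v in A
        A = {v for v in range(n) if mask[v] == 0}
        B = set(range(n)) - A
        if V1 <= A and V2 <= B:
            count += 1
    return count

if __name__ == "__main__":
    n, a = 6, 2
    print(count_partitions(n, {0, 1}, {2, 3}), 2 ** (n - 2 * a))
    print(count_partitions(n, {0, 1}, {1, 2}))  # overlapping sets are never separated
```

Each of the $n-2a$ elements outside $V_{1}\cup V_{2}$ can go to either side, which gives the count $2^{n-2a}$ .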

To simplify this expression, rename the sets of vertices as follows.

If $i\in[1,2q]$ and $i$ is odd then take $W_{i}=V_{(\frac{i+1}{2})1}$

If $i\in[1,2q]$ and $i$ is even then take $W_{i}=V_{(\frac{i}{2})2}$

where we take $W_{2q+1}=W_{1}$ . To study which of these terms may have non-zero expectation, we first define a graph related to the corresponding constraint graph.

Given a constraint graph $C$ , let $H$ be a graph with two types of edges, product edges and constraint edges, such that

$E_{P}(H)=\{(W_{i},W_{i+1}):i\in[1,2q]\}$

$E_{C}(H)=\{(W_{i},W_{j}):i\neq j,W_{i}\cap W_{j}\neq\emptyset\}$

Now, each $R_{a}(W_{i},W_{i+1})$ is a random variable with expectation $0$ , so if any $R_{a}(W_{i},W_{i+1})$ is independent from everything else, the product will have expectation $0$ . Such dependencies arise due to the presence of edges from $G$ occurring in (at least two) different “elements” (say $(W_{i},W_{i+1})$ , $(W_{j},W_{j+1})$ for $i\neq j$ ) of the term. Such repeated occurrences manifest in our constraint graphs (and the graph $H$ defined above) as (three or four) cycles in the graph, which we call independence breaking. For a term to have non-zero expectation it must be that every element $(W_{i},W_{i+1})$ is on some such cycle. This implies that each product $\prod_{i=1}^{2q}{R_{a}(W_{i},W_{i+1})}$ has zero expected value unless all of the product edges in the corresponding $H$ are part of independence-breaking cycles. This places restrictions on $H$ (see Lemma 9.17) which in turn places restrictions on the constraint graph $C$ , allowing us to use Theorem 9.1. We make these ideas precise below.

Given $q$ and $\{W_{1},\cdots,W_{2q}\}$ , we define $W_{i\pm 2q}=W_{i}$ for all $i\in[1,2q]$ .

Define an independence breaking 3-cycle in $H$ to consist of product edges $(W_{i},W_{i+1})$ , $(W_{i+1},W_{i+2})$ and a constraint edge $(W_{i},W_{i+2})$ .

Define an independence breaking 4-cycle to consist of product edges $e_{1}=(W_{i_{1}},W_{i_{1}+1})$ , $e_{2}=(W_{i_{2}},W_{i_{2}\pm 1})$ and constraint edges $(W_{i_{1}},W_{i_{2}})$ and $(W_{i_{1}+1},W_{i_{2}\pm 1})$ .

For all $W_{1},\cdots,W_{2q}$ such that $W_{i}\subseteq B$ whenever $i$ is odd and $W_{i}\subseteq A$ whenever $i$ is even, if the corresponding $H$ has a product edge $(W_{i},W_{i+1})$ which is not contained in any independence-breaking cycle then $E[\prod_{i=1}^{2q}{R_{a}(W_{i},W_{i+1})}]=0$

If $(W_{i},W_{i+1})$ is not contained in any independence-breaking cycle then no edge between $W_{i}$ and $W_{i+1}$ appears anywhere else, so $R_{a}(W_{i},W_{i+1})$ is a random variable with expectation $0$ which is independent from everything else and thus $E[\prod_{i=1}^{2q}{R_{a}(W_{i},W_{i+1})}]=0$ . ∎

We now bound the number of connected components in $H$ with the following lemma.

Let $q\geq 2$ and $H$ be a graph such that

Every product edge of $H$ is contained in an independence-breaking cycle.

Every constraint edge of $H$ is of the form $(W_{i},W_{i+j})$ where $j$ is even.

Then, the number of connected components in the graph defined by only the constraint edges of $H$ is at most $q+1$ .

The intuitive idea behind this lemma is that if we add the constraint edges in the right order, every new constraint edge can put two product edges into independence breaking cycles. For example, a constraint edge between $W_{i-1}$ and $W_{i+1}$ puts the product edges $(W_{i-1},W_{i})$ and $(W_{i},W_{i+1})$ into an independence breaking 3-cycle. If we then add a constraint edge between $W_{i-2}$ and $W_{i+2}$ , this puts the product edges $(W_{i-2},W_{i-1})$ and $(W_{i+1},W_{i+2})$ into an independence breaking 4-cycle. The final constraint edge can put 4 product edges into independence breaking cycles, so the number of constraint edges needed is $q-1$ .

To make this argument work, we use an inductive proof. We note that if there is no $W_{i}$ which is isolated in $H$ , we must have at least $q$ constraint edges. On the other hand, if there is a $W_{i}$ which is isolated, there must be a constraint edge between $W_{i-1}$ and $W_{i+1}$ . As noted above, this constraint edge puts the product edges $(W_{i-1},W_{i})$ and $(W_{i},W_{i+1})$ into an independence breaking 3-cycle. We take this to be the first constraint edge. We then argue that we can essentially delete $W_{i}$ and merge $W_{i-1}$ and $W_{i+1}$ which allows us to use the inductive hypothesis. We make these ideas rigorous below.

We prove Lemma 9.17 by induction on $q$ . The base case $q=2$ is trivial, as we clearly need at least one constraint edge, so the number of connected components in $H$ is at most $3$ . Now assume that $q=k\geq 3$ and the result is true for $q=k-1$ .

First note that if there is no $W_{i}$ which is isolated (when looking only at constraint edges), then there are at most $q$ connected components in $H$ . Thus, we may assume that $W_{i}$ is isolated for some $i$ . Now note that for the product edge $(W_{i-1},W_{i})$ , since $W_{i}$ is isolated, there are no independence breaking 3-cycles or 4-cycles where $W_{i}$ is the endpoint of a constraint edge. Thus, we must have that $(W_{i-1},W_{i})$ is part of an independence breaking 3-cycle consisting of $(W_{i-1},W_{i})$ , $(W_{i},W_{i+1})$ , and a constraint edge $(W_{i-1},W_{i+1})$ .

Now form a new graph $H^{\prime}$ as follows. Delete $W_{i}$ and contract the constraint edge between $W_{i-1}$ and $W_{i+1}$ . More precisely,

Take $V(H^{\prime})=V(H)\setminus\{W_{i-1},W_{i},W_{i+1}\}\cup\{U\}$

Take $E_{product}(H^{\prime})=E_{product}(H)\setminus\{(W_{j},W_{j+1}):j\in[i-2,i+1]\}\cup\{(W_{i-2},U),(U,W_{i+2})\}$

After doing this, rename $U$ as $W_{i-1}$ and rename each $W_{j}$ where $j>i+1$ as $W_{j-2}$ . In going from $H$ to $H^{\prime}$ , we have effectively reduced both $q$ and the number of connected components by $1$ . To complete the proof, we need to check that $H^{\prime}$ satisfies the inductive hypotheses. Based on the reduction from $H$ to $H^{\prime}$ , we still have that every constraint edge is of the form $(W_{i},W_{i+j})$ where $j$ is even. We check that every product edge is still part of an independence-breaking cycle case by case.

Every independence-breaking cycle which did not contain the constraint edge $(W_{i-1},W_{i+1})$ in $H$ is preserved in $H^{\prime}$ except that the vertices may have been renamed. The reason for this is that such an independence breaking cycle in $H$ cannot contain $W_{i}$ and can contain at most one of $\{W_{i-1},W_{i+1}\}$ .

The independence-breaking 3-cycle in $H$ consisting of the product edges $(W_{i-1},W_{i})$ , $(W_{i},W_{i+1})$ and the constraint edge $(W_{i-1},W_{i+1})$ is removed, but so are the product edges $(W_{i-1},W_{i})$ and $(W_{i},W_{i+1})$ , so this is fine.

If we have an independence breaking 4-cycle in $H$ consisting of the product edges $(W_{i-2},W_{i-1})$ , $(W_{i+1},W_{i+2})$ and the constraint edges $(W_{i-1},W_{i+1})$ , $(W_{i-2},W_{i+2})$ , this becomes an independence-breaking 3-cycle in $H^{\prime}$ with product edges $(W_{i-2},W_{i-1})$ , $(W_{i-1},W_{i})$ and a constraint edge $(W_{i-2},W_{i})$ (note that $W_{i-1}$ and $W_{i+1}$ are merged into $W_{i-1}$ in $H^{\prime}$ and $W_{i+2}$ is renamed as $W_{i}$ in $H^{\prime}$ ).

$H^{\prime}$ satisfies the inductive hypotheses, so looking only at the constraint edges, $H^{\prime}$ has at most $(q-1)+1=q$ connected components. $H$ has one more connected component than $H^{\prime}$ (the vertex $W_{i}$ in $H$ ), so $H$ has at most $q+1$ connected components, as needed. ∎
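For small $q$ the statement of Lemma 9.17 can also be checked exhaustively. The sketch below is an illustration only, under our reading of the definitions: vertices are indexed $0,\ldots,2q-1$ cyclically, constraint edges join indices of equal parity, a 3-cycle uses a constraint edge between indices at cyclic distance two, and a 4-cycle joins two distinct product edges by two constraint edges. It enumerates every admissible set of constraint edges and reports the largest number of components among those in which all product edges are covered.

```python
from itertools import combinations

def covered(i, C, m):
    """Is product edge (i, i+1) on an independence-breaking 3- or 4-cycle?"""
    nxt = (i + 1) % m
    # 3-cycles: constraint edge (W_i, W_{i+2}) or (W_{i-1}, W_{i+1}).
    if frozenset((i, (i + 2) % m)) in C or frozenset(((i - 1) % m, nxt)) in C:
        return True
    # 4-cycles: a second product edge (j, j+1) tied to (i, i+1) by two constraint edges.
    for j in range(m):
        if j == i:
            continue
        k = (j + 1) % m
        if frozenset((i, j)) in C and frozenset((nxt, k)) in C:
            return True
        if frozenset((i, k)) in C and frozenset((nxt, j)) in C:
            return True
    return False

def components(C, m):
    """Connected components of the graph on m vertices with edge set C."""
    parent = list(range(m))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for e in C:
        u, v = tuple(e)
        parent[find(u)] = find(v)
    return len({find(x) for x in range(m)})

def max_components(q):
    """Max number of constraint-edge components over all valid H on 2q vertices."""
    m = 2 * q
    candidates = [frozenset(p) for p in combinations(range(m), 2)
                  if (p[1] - p[0]) % 2 == 0]  # equal parity only
    best = None
    for mask in range(1 << len(candidates)):
        C = {e for b, e in enumerate(candidates) if mask >> b & 1}
        if all(covered(i, C, m) for i in range(m)):
            t = components(C, m)
            best = t if best is None else max(best, t)
    return best

if __name__ == "__main__":
    for q in (2, 3, 4):
        print(q, max_components(q))
```

Under these conventions the maximum equals $q+1$ for $q=2,3,4$ , so the bound of the lemma is attained in these small cases.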

The above lemma combined with Lemma 9.5 gives the following corollary.

For all terms $\prod_{i=1}^{2q}R_{a}(W_{i},W_{i+1})$ occurring in Equation 9.1 with nonzero expectation, $|\cup_{i=1}^{2q}W_{i}|\leq 2aq-(2q)+q+1$ .

We can now apply Theorem 9.1 with $y=1$ , and $z=1$ by the above corollary. Every entry of $R_{a,A,B}$ has magnitude at most $2^{a^{2}}$ so we can take $B=2^{a^{2}}$ . By Theorem 9.1, if $n\geq 10$ , for all $A$ and $B$ , for every $\varepsilon\in(0,1)$ ,

Since $4\ln{n}\geq e(\ln{n}+2)$ for all $n\geq 100$ and $a!\geq a$ , we have that for all $n\geq 100$ , for all $A$ and $B$ and all $\varepsilon\in(0,1)$ ,

Now by Corollary 9.11, $||R_{a}||\leq 2^{2a}\max_{A,B}{\{||R_{a,A,B}||\}}$ so

We now prove large deviation bounds for $deg_{G}(\;)$ leading to Claim 8.13 which we state below in a more precise form.

If $n\geq 10$ , and $\varepsilon\in(0,1)$ , then for all $I\subseteq[n]$ , with $|I|=i\leq 2r$ ,

To prove the claim we first show a similar concentration bound for the number of cliques of a certain size in $G$ . While similar results appear in the literature, see for instance [Ruc88, Vu01, JLR11], we give a short direct proof based on Theorem 9.1.

For a graph $G$ , define $N_{a}(G)$ to be the number of $a$ -cliques in $G$ .
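To make the definition concrete, the following sketch counts $a$ -cliques directly and computes the exact average of $N_{a}(G)$ over all graphs on a few vertices, which is the expectation under $G(n,1/2)$ . By linearity of expectation this average is $2^{-\binom{a}{2}}\binom{n}{a}$ . The helper names are ours and the enumeration is only feasible for very small $n$ .

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def count_cliques(n, edges, a):
    """N_a(G): number of a-subsets of {0..n-1} all of whose pairs are edges."""
    return sum(
        1
        for S in combinations(range(n), a)
        if all(frozenset(p) in edges for p in combinations(S, 2))
    )

def exact_expectation(n, a):
    """Average of N_a(G) over all 2^{C(n,2)} graphs on n vertices."""
    pairs = [frozenset(p) for p in combinations(range(n), 2)]
    total = 0
    for mask in range(1 << len(pairs)):
        edges = {e for b, e in enumerate(pairs) if mask >> b & 1}
        total += count_cliques(n, edges, a)
    return Fraction(total, 1 << len(pairs))

if __name__ == "__main__":
    print(exact_expectation(4, 3), Fraction(comb(4, 3), 2 ** comb(3, 2)))
```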

For all $a$ , for all $n\geq 10$ and $\varepsilon\in(0,1)$ , $E[N_{a}(G)]=2^{-\binom{a}{2}}{\binom{n}{a}}$ and

The first part of the theorem is trivial so we focus on the second part. Given a set of vertices $V$ of size $a$ , define $c_{V}$ to be $1-2^{-\binom{a}{2}}$ if $V$ is a clique and $-2^{-\binom{a}{2}}$ otherwise. Then,

Now let’s consider the function $p(G,2q)=(\sum_{V:|V|=a}{c_{V}})^{2q}=\sum_{W_{1},\cdots,W_{2q}}{\prod_{i=1}^{2q}{c_{W_{i}}}}$ .

Note that $E[\prod_{i=1}^{2q}{c_{W_{i}}}]=0$ unless each set of vertices $W_{i}$ has two vertices in common with a different set of vertices $W_{j}$ . Now consider a graph $C_{2}$ where the vertices are $\{W_{1},\ldots,W_{2q}\}$ and an edge between $W_{i},W_{j}$ if $|W_{i}\cap W_{j}|\geq 2$ . Let $t$ be the number of connected components in $C_{2}$ . We claim that $|\cup_{i}W_{i}|\leq 2aq-4q+2t$ . For, as in the proof of Lemma 9.5, first consider elements $W_{i_{1}},\ldots,W_{i_{t}}$ belonging to the $t$ different connected components. Now, add the remaining elements of $\{W_{1},\ldots,W_{2q}\}$ so that each new element is adjacent to at least one of the previously added sets. When doing so, each step can increase the size of the union by at most $a-2$ . Therefore, the size of the union is at most $at+(a-2)(2q-t)=2aq-4q+2t$ . On the other hand, each connected component in $C_{2}$ must have at least two vertices, so $t\leq q$ . Therefore, $|\cup_{i}W_{i}|\leq 2aq-2q$ .

We can now apply Theorem 9.1 with $y=2$ , $z=0$ and $B=1$ so that for $n\geq 10$ , and $\varepsilon\in(0,1)$ ,

Using the facts that $e^{2}<8$ and $\frac{m^{2}}{m!}\leq 2$ for all nonnegative integers $m$ , we have that

We are now ready to prove Theorem 10.1. The idea is as follows. Let $A_{I}$ be the collection of vertices which are adjacent to all the vertices in $I$ . Then, conditioned on $I$ being a clique, $deg_{G}(I)$ is just the number of cliques of size $2r-i$ in the vertices $A_{I}$ which is primarily determined by $|A_{I}|$ . This is because the edges between vertices of $A_{I}$ are independent of the edges involving vertices in $I$ so that we can apply Theorem 10.3 to $A_{I}$ .

Let $A_{I}$ be as above and let us condition on $I$ being a clique. Then, $deg_{G}(I)$ is just the number of cliques of size $2r-i$ among the vertices in $A_{I}$ . Therefore, by Theorem 10.3, with probability at least $1-\varepsilon/2$ ,

We next argue that $\binom{|A_{I}|}{2r-i}$ is concentrated around its mean. For $j\notin I$ , let $X_{j}$ be the indicator random variable that is $1$ if the $j$ ’th vertex is adjacent to all the vertices in $I$ and $0$ otherwise. Then, $|A_{I}|=\sum_{j\notin I}X_{j}$ and

Observe that the random variables $X_{j}$ are independent of each other and that

We next apply McDiarmid’s inequality to the function $f$ . Note that changing any single coordinate of the inputs to $f$ can change its value by at most $n^{2r-i-1}$ . Therefore, by Theorem 4.2, with probability at least $1-\varepsilon/2$ ,

Combining the above equations, we get that with probability at least $1-\varepsilon$ ,

The theorem now follows as $\binom{2r-i}{2}+i(2r-i)=\binom{2r}{2}-\binom{i}{2}$ . ∎
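The closing identity has a simple counting explanation: split a set of $2r$ elements into a part of size $i$ and a part of size $2r-i$ ; every pair lies inside the first part, inside the second part, or crosses between them, so $\binom{2r}{2}=\binom{i}{2}+\binom{2r-i}{2}+i(2r-i)$ . A small numerical check, with a function name of our choosing, is:

```python
from math import comb

def identity_holds(r, i):
    """Check C(2r-i, 2) + i(2r-i) == C(2r, 2) - C(i, 2)."""
    return comb(2 * r - i, 2) + i * (2 * r - i) == comb(2 * r, 2) - comb(i, 2)

if __name__ == "__main__":
    print(all(identity_holds(r, i) for r in range(1, 20) for i in range(2 * r + 1)))
```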

Conclusion and future work

In this work we showed a lower bound for the maximum clique problem on random $G(n,1/2)$ graphs in the $\mathsf{SOS}$ hierarchy and the Positivstellensatz proof system. Besides the specific application to clique lower bounds, the PSD’ness of the matrix $M$ from Equation 2.5 seems to carry further information that could be potentially useful elsewhere, perhaps for studying various sub-graph statistics. Further, the arguments related to association schemes and bounding the norm of locally random matrices could also be useful elsewhere, especially for other $\mathsf{SOS}$ hierarchy lower bounds. One natural and interesting candidate is the densest subgraph problem.

For planted clique itself, the most obvious open problem is to tighten the gap between the current upper bound of $O(\sqrt{n}/2^{r})$ and our lower bound of $2^{-O(r)}(\sqrt{n}/\log n)^{1/r}$ for $r$ rounds of the $\mathsf{SOS}$ hierarchy. In particular, can a constant number of rounds of $\mathsf{SOS}$ beat the square-root barrier and identify planted cliques of size $o(n^{1/2})$ ? Kelner (personal communication) showed that our dual certificate $M$ actually is not PSD for $k$ roughly $O(n^{1/(r+1)})$ . Thus one needs to come up with a different dual certificate to approach the upper bound of $\sqrt{n}$ even for $r=2$ .

We thank Boaz Barak, Siu-on Chan, Jonathan Kelner, Robert Krauthgamer, James Lee, Nati Linial, David Steurer, Madhu Sudan and Amir Yehudayoff for several useful comments.

References

Hierarchy Gaps and Positivstellensatz Refutations

For a detailed discussion of the hierarchies and $\mathsf{PS}(r)$ -refutations we refer the reader to the discussions in [OZ13]. The basic principle is that, typically, $\mathsf{PS}(r)$ -refutations are more robust and stronger than the hierarchy formulations.

The $\mathsf{SOS}$ (or Lasserre) relaxation for maximum clique is stated in Figure 1 (cf. [Tul09]). Although the formulation itself is not stated as an SDP, it is a standard fact that, since the program only involves inner products of vectors, the optimization can be done by semi-definite programming.

The connection between Figure 1 and $\mathsf{PS}(r)$ -refutations comes from the following straightforward lemma stating that a certificate for $\mathsf{PS}(r)$ -refutations is simply a primal solution to the standard $r$ -round $\mathsf{SOS}$ -relaxation of the problem.

Observe that for any two subsets $S_{1},S_{2}\in\binom{[n]}{\leq r}$ ,

Therefore, the vectors $(\bm{U}_{S}:|S|\leq r)$ satisfy the first two constraints of Figure 1 as $\mathcal{M}$ is a dual certificate. Further, $\|\bm{U}_{\emptyset}\|^{2}=M(\emptyset,\emptyset)=1$ and for any set $S$ ,

so that $\|\bm{U}_{S}\|\leq 1$ . Thus, $(\bm{U}_{S}$ : $|S|\leq r)$ give a feasible solution for the program in Figure 1. Finally, the value of the solution is

Let $G\leftarrow G(n,1/2)$ . Then, from the above lemma and the proof of Theorem 1.5 (where we showed the existence of a dual certificate for the clique axioms), the value of the $r$ -round $\mathsf{SOS}$ -relaxation for max-clique on $G$ is at least $n^{1/2r}/C^{r}(\log n)^{1/r}$ with high probability. The claim follows as the integral value is $(2+o(1))\log_{2}n$ with high probability. ∎

The value of the relaxation in Figure 1 is clearly monotone with respect to adding edges. Therefore, from the above argument, for $G\leftarrow G(n,1/2,t)$ the value of the $r$ -round $\mathsf{SOS}$ -relaxation for max-clique on $G$ is at least $n^{1/2r}/C^{r}(\log n)^{1/r}$ with high probability. The claim follows as the integral value is $t$ with high probability. ∎