A Nearly Tight Sum-of-Squares Lower Bound for the Planted Clique Problem

Boaz Barak, Samuel B. Hopkins, Jonathan Kelner, Pravesh K. Kothari, Ankur Moitra, Aaron Potechin

Introduction

The planted clique (also known as hidden clique) problem is a central question in average-case complexity. Arising from the 1976 work of Karp [Kar76], the problem was formally defined by Jerrum [Jer92] and Kucera [Kuc95] as follows: given a random Erdős-Rényi graph GG from the distribution G(n,1/2)G(n,1/2) (where every edge is chosen to be included with probability 1/21/2 independently of all others) in which we plant an additional clique (i.e., set of vertices that are all neighbors of one another) SS of size ω\omega, find SS. It is not hard to see that the problem is solvable by brute force search (which in this case takes quasipolynomial time) whenever ω>clogn\omega>c\log n for any constant c>2c>2. However, despite intense effort, the best polynomial-time algorithms only work for ω=εn\omega=\varepsilon\sqrt{n}, for any constant ε>0\varepsilon>0 [AKS98].

Over the years the planted clique problem and related problems have been connected to many other questions in a variety of areas including finding communities [HWX15], finding signals in molecular biology [PS000], discovering network motifs in biological networks [MSOI+02, JM15], computing Nash equilibrium [HK11, ABC13], property testing [AAK+07], sparse principal component analysis [BR13], compressed sensing [KZ14], cryptography [JP00, ABW10] and even mathematical finance [DBL10].

Thus, the question of whether the currently known algorithms can be improved is of great interest. Unfortunately, it is unlikely that lower bounds for planted clique (because it is an average-case problem) can be derived from conjectured complexity class separations such as PNPP\neq NP [FF93, BT06]. Our best evidence for the difficulty of this problem comes from works showing limitations on particular classes of algorithms. In particular, since many of the algorithmic approaches for this and related problems involve spectral techniques and convex programs, limitations for these types of algorithm are of significant interest. One such negative result was shown by Feige and Krauthgamer [FK03a] who proved that the nO(d)n^{O(d)}-time degree dd Lovász-Schrijver semidefinite programming hierarchy (LS+LS_{+} for short) can only recover the clique if its size is at least n/2d\sqrt{n/2^{d}}.As we discuss in Remark 1.2 below, formally such results apply to the incomparable refutation problem, which is the task of certifying that there is no ω\omega-sized clique in a random G(n,1/2)G(n,1/2) graph. However, our current knowledge is consistent with these variants having the same computational complexity.

However, recently it was shown that in several cases, the Sum-of-Squares (SoS) hierarchy [Sho87, Par00, Las01] — a stronger family of semidefinite programs which can be solved in time nO(d)n^{O(d)} for degree parameter dd — can be significantly more powerful than other algorithms such as LS+LS_{+} [BBH+12, BKS14, BKS15]. Thus it was conceivable that the SOS hierarchy might be able to find cliques that are much smaller than n\sqrt{n} in polynomial time.

There is an absolute constant cc so that for every d=d(n)d=d(n) and large enough nn, the SoS relaxation of the planted clique problem has integrality gap at least n1/2c(d/logn)1/2n^{1/2-c(d/\log n)^{1/2}}.

Beyond improving the previously known results, our proof is significantly more general and we believe provides a better intuition behind the limitations for SoS algorithms by viewing them from a “computational Bayesian probability” lens that is of its own interest. Moreover, there is some hope (as we elaborate below) that this view could be useful not just for more negative results but for SoS upper bounds as well. In particular our proof elucidates to a certain extent the way in which the SoS algorithm is more powerful than the LS+LS_{+} algorithm.

Like other average-case problems in NP\mathbf{NP}, the planted clique problem with parameter ω\omega has three variants of search, refutation, and decision. The search variant is the task of recovering the clique from a graph in which it was planted. The refutation variant is the task of certifying that a random graph in G(n,1/2)G(n,1/2) (where with high probability the largest clique has size (2+o(1))logn(2+o(1))\log n) does not have a clique of size ω\omega. The decision problem is to distinguish between a random graph from G(n,1/2)G(n,1/2) and a graph in which an ω\omega-sized clique has been planted. The decision variant can be reduced to either the search or the refutation variant, but we know of no reduction between the latter two variants. Integrality gaps for mathematical relaxations such as the Sum-of-Squares hierarchy are most naturally stated as negative results for the refutation variant, as they show that such relaxations cannot certify that a random graph has no ω\omega-sized clique by looking at the maximum value of the objective function. Our result can also be viewed as showing that the natural SoS-based algorithm for the decision problem (which attempts to distinguish on the objective value) also fails. Moreover, our result also rules out some types of SoS-based algorithms for the search problem as it shows that in a graph with a planted clique, there exists a solution with an objective value of ω\omega based only on the random part, which means that it does not contain any information about which nodes participate in the clique and hence is not useful for rounding algorithms.

Planted Clique and Probabilistic Inference

We now discuss the ways in which the planted clique problem differs from problems for which strong SoS lower bounds have been shown before, and how this relates to a “computational Bayesian” perspective. There have been several strong lower bounds for the SoS algorithm before, in particular for problems such as 3SAT, 3XOR and other constraint satisfaction problems as well as the knapsack problem [Gri01, Sch08, BCK15]. However, obtaining strong lower bounds for the planted clique problem seems to have required different techniques. A high-level way to describe the difference between the planted clique problems and the problems tackled by previous results is that lower bounds for the planted clique problem boil down to handling weak global constraints as opposed to strong local ones. That is, while in the random 3SAT/3XOR setting, the effect of one variable on another is either extremely strong (if they are "nearby" in the formula) or essentially zero, in the planted clique setting every variable has a weak global effect on all other variables. We now explain this in more detail.

Consider a random graph GG in which a clique SS of size ω\omega has been planted. If someone tells us that vertex 1717 is not in SS, then it makes it slightly less likely that 1717’s neighbors are in SS and slightly more likely that 1717’s non-neighbors are in SS. So, this information has a weak global effect. In contrast, when we have a random sparse 3SAT formula φ\varphi in which an assignment xx has been planted, if someone tells us that x17=0x_{17}=0 then it gives us a lot of information about the local neighborhood of the 17th17^{th} variable (the variables that are involved in constraints with 1717 or one that have a short path of constraints to it) but there is an exponential decay of these correlations and so this information basically tells us essentially nothing about the distribution of most of the variables xix_{i} (that are far away from 1717 in the sparse graph induced by φ\varphi).xThis exponential decay can be shown formally for the case of satisfiable random 3SAT or 3XOR formulas whose clause density is sufficiently smaller than the threshold. In our regime of overconstrainted random 3SAT/3XOR formulas there will not exist any satisfying assignments, and so to talk about “correlations” in the distributions of assignments we need to talk about the “Bayesian estimates” that arise from algorithms such as Sum-of-Squares or belief propagation. Both these algorithms exhibit this sort of exponential decay we talk about; see also Remark 2.1 Thus, in the random 3SAT setting information about the assignments of individual variables has a strong local effect. Indeed, previous Sum-of-Squares lower bounds for random 3SAT and 3XOR [Gri01, Sch08], could be interpreted as producing "distribution like" objects in which, conditioned on the value of a small set of variables SS, some of the variables "close" to SS in the formula were completely fixed, and the rest were completely independent.

This difference between the random SAT and the planted clique problems means that some subtleties that can be ignored in setting of random constraint satisfaction problems need to be tackled head-on when dealing with planted cliques. However to make this clearer, we need to take a detour and discuss Bayesian probabilities and their relation to the Sum of Square Algorithm.

Strictly speaking, if a graph GG contains a unique clique SS of size ω\omega, for every vertex ii, the probability that ii is in SS is either zero or one. But, a computationally bounded observer may not know whether ii is in the clique or not, and we could try to quantify this ignorance using probabilities. These can be thought of as a computational analogs of Bayesian probabilities, that, rather than aiming to measure the frequency at which an event occurs in some sample space, attempt to capture the subjective beliefs of some observer.

That is, the Bayesian probability that an observer BB assigns to an event EE can be thought of as corresponding to the odds at which BB would make the bet that EE holds. Note that this probability could be strictly between zero and one even if the event EE is fully determined, depending on the evidence available to BB. While typically Bayesian analysis does not take into account computational limitations, one could imagine that even if BB has access to information that fully determines whether EE happened or not, she could still rationally assign a subjective probability to EE that is strictly between zero and one if making the inferences from this information is computationally infeasible. In particular, in the example above, even if a computationally bounded observer has access to the graph GG, which information-theoretically fully determines the planted ω\omega-sized clique, she could still assign a probability strictly between zero and one to the event that vertex 1717 is in the planted ω\omega-sized clique, based on some simple to compute statistics such as how many neighbors 1717 has, etc.

The Sum-of-Squares algorithm can be thought of as giving rise to an internally consistent set of such "computational probabilities". These probabilities may not capture all possible inferences that a computationally bounded observer could make, but they do capture all inferences that can be made via a certain restricted proof system.

Indeed, the key conceptual insight of this paper is to phrase the calibration property (2.2) as a desiderata for our pseudo-distributions. Namely, we define that a function f=f(G,x)f=f(G,x) is “simple” if it is a low degree polynomial in both the entries of GG’s adjacency matrix and the variables xx, and then require (2.2) to hold for all simple functions. It turns out that once you do so, the choice for the pseudo-distribution is essentially determined, and hence proving the main result amounts to showing that it satisfies the constraints of the SoS algorithm. In the next section we will outline some of the ideas involved in this proof.

In the light of the discussion above, it is instructive to consider the case of random 3XOR discussed before. Random 3XOR instances on nn variables and Θ(n)\Theta(n) constraints are easily seen to be maximally unsatisfiable (that is, at most 1/2\approx 1/2 the constraints can be satisfied by any assignment) with high probability. On the other hand, Grigorev [Gri01] constructed a sum of squares pseudoexpectation that pretends that such instances instances are satisfiable with high probability, proving a sum of squares lower bound for refuting random 3XOR formulas.

Analogous to the planted distribution G(n,1/2,ω)G(n,1/2,\omega), one can define a natural planted distribution over 3XOR instances - roughly speaking, this corresponds to first choosing a random Boolean assignment xx^{*} to nn variables and then sampling random 3XOR constraints conditioned on being consistent with xx^{*}. It is not hard to show that pseudo-calibrating with respect to this planted distribution a la (2.2) produces precisely the pseudoexpectation that Grigoriev constructed. However, unlike in the planted clique case, in the case of 3XOR, the pseudo-calibration condition implies that for every low-degree monomial xSx_{S}, either the value of xSx_{S} is completely fixed (if it can be derived via low width resolution from the 3XOR equations of the instance) or it is completely unconstrained.

The pseudoexpectations considered in previous works [FK03b, MPW15, DM15]) are similar to Grigoriev’s construction, in the sense that they essentially respect only strong constraints (e.g., that if AA is not a clique in the graph, then the probability that it is contained in the planted clique is zero), but other than that assume that variables are independent. However, unlike the 3XOR case, in the planted clique problem respecting these strong constraints is not enough to achieve the pseudo-calibration condition (2.2) and the pseudoexpectation of [FK03b, MPW15, DM15] can be shown to violate weak probabilistic constraints imposed by (2.2) even at degree four. See Observation 2.4 for an example.

2 From Calibrated Pseudo-distributions to Sum-of-Squares Lower Bounds

We can restate our main result as follows:

Previously, the choice of the pseudo-distribution seemed to require a “creative guess” or an “ansatz”. For problems such as random 3SAT this guess was fairly natural and almost “forced”, while for planted clique planted clique as well as some related problems [MW15] the choice of the pseudo-distribution seemed to have more freedom, and more than one choice appeared in the literature.

The coefficients of the polynomial pGp_{G} of Observation 2.4 are themselves low degree polynomials in the adjacency matrix of GG. This turns out to be a common feature in all the families of polynomials one encounters in the above works. Thus our approach is to fix all these polynomials by fiat, by placing the constraint that the pseudo-distribution must satisfy (2.2) for every such polynomial, and using that as our implicit definition of the pseudo-distribution. Indeed it turns our that once we do so, the pseudo-distribution is essentially determined. Moreover, (2.2) guarantees that it satisfies many of the “weak global constraints” that can be shown using Bayesian calculations.

We believe that this principled approach to designing pseudo-distributions elucidates the power and limitations of the SoS algorithm in cases such as the planted clique, where accounting for weak global correlations is a crucial aspect of the problem.

Theorem 2.3 (as well as Theorem 1.1) makes no mention of the planted distribution G(n,1/2,ω)G(n,1/2,\omega) and only refers to an actual random graph. Thus it might seem strange that we base our pseudo-distribution on the planted distribution via (2.2). One way to think about the planted distribution is that it corresponds to a Bayesian prior distribution on the clique. Note that this is the maximum entropy distribution on cliques of size ω\omega, and so it is a natural choice for a prior per Jaynes’s principle of maximum entropy. Our actual pseudo-distribution can be viewed as correcting this planted distribution to a posterior that respects simple inferences from the observed graph GG.

3 Proving Positivity

where V(T)\mathcal{V}(T) is the set of nodes incident to the subset of edges (i.e., graph) TT and χT(G)=eTGe\chi_{T}(G)=\prod_{e\in T}G_{e}. We carry out this computation in Section 5. For ωn0.5ε\omega\approx n^{0.5-\varepsilon}, we will need to choose the truncation threshold τd/ε\tau\gtrapprox d/\varepsilon. It turns out that constraints 1–5 are easy to verify and thus we are left with proving the positivity constraint. Indeed this is not surprising as verifying this constraint is always the hardest part of a sum of squares lower bound.

Given a (symmetric) matrix NN, to show that N0N\succeq 0 our first hope might be to diagonalize NN. That is, we would hope to find a matrix VV and a diagonal matrix DD so that N=VDVN=VDV^{\dagger}. Then as long as every entry of DD is nonnegative, we would obtain N0N\succeq 0. Unfortunately, carrying this out directly can be far too complicated. Even the eigenvectors of very simple random matrices–for example, a matrix with independent ±1\pm 1 entries—are not explicitly understood. Our moment matrix M\mathcal{M} is a much more complicated random matrix, with intricate dependencies among the entries. However, as the next example demonstrates, it is sometimes possible to prove PSDness for a random matrix using an approximate diagonalization.

We return now to the moment matrix M\mathcal{M} for our (pseudo)calibrated pseudodistribution. Our goal is to give an approximate diagonalization of M\mathcal{M}. There are several obstacles to doing so:

In the case d=2d=2 there was just one rank-11 approximate eigenspace to be handled. The number of these approximate eigenspaces will grow with dd, so we will need a more generic way to handle them.

The errors in our diagonalization of M\mathcal{M}—corresponding in our d=2d=2 example to the matrix EE—will not be so small that we can ignore them as we did above. Instead, each error matrix will itself have to be approximately diagonalized, recursively until these errors are driven down sufficiently far in magnitude.

At a high level our approach can be viewed as falling into the general paradigm of “structure vs. randomness” as discussed by Tao [Tao05]. The general idea of this paradigm is to separate an object OO into a “structured” part that is simple and predictable, and a “random” part that is unpredictable but has small magnitude or has some global statistical properties.

One example of this is the Szemerédi regularity lemma [Sze78] as well variants such as [FK96] that partition a matrix into a sum of a low rank and pseudorandom components. Another example arises from the random models for the primes (e.g., see [Tao15, Gra95]). These can be thought of positing that, as far as certain simple statistics are concerned, (large enough) primes can be thought of as being selected randomly conditioned on not being divisible by 2,3,52,3,5 etc.. up to some bound ww.

All these examples can be viewed from a computationally bounded Bayesian perspective. For every object OO we can consider the part of OO that can be inferred by a computationally bounded observer to be OO’s structured component, while the remaining uncertainty can be treated as if it is random, even if in actuality it is fully determined. Thus in our case, even though for almost every particular graph GG from G(n,1/2,ω)G(n,1/2,\omega), the clique xx is fully determined by GG, we still think of xx as having a “structured” part which consists of all the inferences a “simple” observer can make from GG (e.g., that if ii and jj are non-neighbors then xixj=0x_{i}x_{j}=0), and a “random” part that consists of the remaining uncertainty. As in other cases of applying this paradigm, part of the technical work is bounding the magnitude (in our case in spectral norm) that arises from the “random” part, though as mentioned above in our case we need a particularly delicate control of the error terms which ends up causing much of the technical difficulty.

Proving Positivity: A Technical Overview

The matrix M\mathcal{M} is generated from the random graph GG, but its entries are not independent. Rather, each entry is a polynomial in GeG_{e}, and there are some fairly complex dependencies between different them. Indeed, these dependencies will create a spectral structure for M\mathcal{M} that is very different from the spectrum of standard random matrices with independent entries and makes proving M\mathcal{M} positive semidefinite challenging. Our approach to showing that M\mathcal{M} is positive semidefinite is through a type of “symbolic factorization” or “approximate diagonalization,” which we explain next.

It is instructive to begin with the tight analysis presented in [HKP15] of the moments constructed in [MPW15]The construction in [MPW15] actually also satisfies xi=ω\sum x_{i}=\omega as a constraint which causes the precise form to differ. We ignore this distinction here.. These moments can in fact obtained by using truncation threshold τ=S\tau=|S| in (2.3). This choice of τ\tau is the smallest possible for which the resulting construction satisfies the hard clique constraints. [HKP15] show that this construction satisfies positivity for ωn1/(d2+1).\omega\lessapprox n^{1/(\frac{d}{2}+1)}.

For the purpose of this overview, let us work with the principal submatrix FF indexed by subsets II and JJ of size exactly dd. The analysis in [HKP15] proceeds by first splitting FF into d+1d+1 components F=F0+F1++FdF=F_{0}+F_{1}+\cdots+F_{d} where Fi(I,J)=F(I,J)F_{i}(I,J)=F(I,J) if IJ=i|I\cap J|=i and otherwise. Below, we discuss two of the key ideas involved that will serve as an inspiration for us.

As discussed before, we must approximately diagonalize the matrix FF in the sense that the off diagonals blocks must be "small enough" to be charged to the on diagonal block. Thus the main question before us is obtain an (approximate) understanding of the spectrum of FF that allows us to come up with a "change of basis" in which the off diagonal blocks are small enough to be charged to the positive eigenmass in the on-diagonal blocks.

Let us consider the piece F0F_{0} for our discussion here. As alluded to in Section 3, we want to break FF into minimal pieces so that each piece is symmetric under the permutation of vertices. We can hope that each piece will then essentially have a single dominating eigenvalue that can be determined relatively easily. Below, we will essentially implement this plan.

First, we need to decide what kind of "pieces" we will need. These are the graphical matrices that we define next.

Let UU be a graph on [2d][2d] with specially identified subsets left and right subsets [d][d] and [2d][d][2d]\setminus[d]. For any I,J([n]d)I,J\in{{[n]}\choose{d}}, IJ=I\cap J=\varnothing, let πI,J\pi_{I,J} be an injective map that takes [d][d] into II and [2d][d][2d]\setminus[d] into JJ using a fixed convention. The graphical matrix MUM_{U} with graph UU is then defined by MU(I,J)=χπI,J(U)(G).M_{U}(I,J)=\chi_{\pi_{I,J}(U)}(G).

In particular, this implies that when UU has a perfect matching, MUM_{U} is pseudorandom in the sense that FUF_{U} essentially has the spectral norm nd/2\approx n^{d/2}, the same as that of an independent {1,1}\{-1,1\} random matrix of the same dimensions. This allows MUM_{U} to be bounded against the positive eigenvalue (ωn)d{\left(\frac{\omega}{n}\right)}^{d} of the diagonal matrix FdF_{d} as (ωn)d(ωn)2dnd/2{\left(\frac{\omega}{n}\right)}^{d}\gg{\left(\frac{\omega}{n}\right)}^{2d}n^{d/2} (even for ω\omega approaching n\sqrt{n}!). However for MUM_{U} when UU has a maximum matching of size t<dt<d, one can’t bound against the diagonal matrix FdF_{d} anymore.

The next main idea is to note that for every MUM_{U} there’s an appropriate "diagonal" against which we must charge the negative eigenvalues of MUM_{U}. When UU has a perfect matching, this is literally the diagonal matrix FdF_{d} as done above. However, when, say, UU is a (bipartite) matching of size t<dt<d, we should instead charge against the "diagonal" matrix that can thought of as obtained by "collapsing" each matching edge into a vertex in UU. In particular, this collapsing produces a matrix that lies in the decomposition of FtF_{t}.

There are a two main takeaways from this analysis that would serve as inspiration in the analysis of our actual construction. First is the decomposition into graphical matrices in order to have a coarse handle on the spectrum of the moment matrix. Second, the "charging" of negative eigenvalues against appropriate "diagonals" is essentially governed by the combinatorics of matchings in UU.

2 The Main Analysis

We can now try to use the lessons from the warm up analysis to inspire our actual analysis. To begin with, we recall that each graphical matrix was obtained by choosing an appropriate (set of) Fourier monomials for any entry indexed by I,JI,J. However, since for our actual construction we have monomials of much higher degree, we need to extend the notion of graphical matrices with shapes corresponding to larger graphs UU. See Def 7.6 for a formal definition.

It turns out that the right combinatorial idea to generalize the size of the maximum matching and control the spectral norm of the graphical matrices MU\mathcal{M}_{U} is the maximum number of vertex disjoint paths between specially designated left and right endpoints of UU (themselves the generalization of the bipartition we had in the warmup). Using Menger’s theorem, this is equal to the size of a minimal collection of vertices that separates the left and right sets in the graph UU, which we call the separator size of UU.

One could hope that we could replace the RHS of (3.3) by

In fact, it turns out we can focus attention (up to sufficiently small error in the spectral norm) to the case τ1τ/3\tau_{1}\leqslant\tau/3, τ2τ/3\tau_{2}\leqslant\tau/3, τ3τ/3\tau_{3}\leqslant\tau/3 in which case if M(I,J)M(I,J) was equal to (3.4) we could simply write

for some other matrix Q1\mathcal{Q}_{1}. Continuing this argument gives us for every qq a factorization of Mq\mathcal{M}_{q} as

The error matrices ξ0,ξ1,,ξ2d\xi_{0},\xi_{1},\ldots,\xi_{2d} arise from truncation issues, which we have ignored in the argument above and turn out to be negligible.

It is not hard to show that Q0D\mathcal{Q}_{0}\succeq D for some positive semidefinite matrix DD that we define later. What remains is to bound the remaining matrices Q1,Q2d1\mathcal{Q}_{1},\ldots\mathcal{Q}_{2d-1} in order to conclude that M\mathcal{M} is positive semidefinite. Next, we elaborate on the structure of these matrices. It turns out that we can define the “shape” of a graph Rm\mathcal{R}_{m} in an appropriate way so that

where UU is a finite (for constant dd) sized graph with vertex set ABCA\cup B\cup C, where we call AA the “left” side of UU and BB the “right” side of UU. Moreover Qi=UQiU\mathcal{Q}_{i}=\sum_{U}\mathcal{Q}^{U}_{i}. Now QiU\mathcal{Q}^{U}_{i} is a random matrix and special cases of this general family of matrices (for particular choices of UU) arise in several earlier works on lower bounds for planted clique. Medarametla and Potechin [MP] showed that the spectral norm of QU\mathcal{Q}^{U} can be controlled by a bound on its coefficients and a few combinatorial parameters of UU — namely V(U)|\mathcal{V}(U)|, AB|A\cap B| and the number of vertex disjoint paths between A/BA/B and B/AB/A.

Preliminaries

We use small Greek letters indicate constants/parameters.

Pdn\mathcal{P}_{d}^{n} denotes the linear space of all multilinear polynomials of degree at most dd on {0,1}n\{0,1\}^{n}.

We write 1Q\mathbf{1}_{Q} for any event QQ to be the -11 indicator of whether QQ happens.

For a subset T([n]2)T\subseteq{{[n]}\choose{2}} of edges of a graph on vertex set [n][n], we write V(T)[n]\mathcal{V}(T)\subseteq[n] to denote the vertices that have at least one edge incident on them in TT.

For a graph GG, let \mathcal{C}_{q}=\mathcal{C}_{q}(G)=\{I\subseteq[n]\,:\,I\text{ is aqcliquein-clique inG}\}, and let Cq=qqCd\mathcal{C}_{\leqslant q}=\bigcup_{q^{\prime}\leqslant q}\mathcal{C}_{d^{\prime}}. Let C(G)=C\mathcal{C}(G)=\mathcal{C}_{\leqslant\infty} be the collection of all cliques in GG. We count the empty set and all singletons as cliques.

We write G(n,12)\mathcal{G}(n,\frac{1}{2}) to denote the distribution on graphs on the vertex set [n][n] where each edge is included with probability 1/21/2 independently of others.

We write f(n)g(n)f(n)\ll g(n) to mean that for every constant cc there is an n0n_{0} such that if nn0n\geqslant n_{0}, f(n)Cg(n)f(n)\leqslant Cg(n).

2 Graphs

We identify a graph GG with its {1,1}\{-1,1\} adjacency matrix and write Ge{1,1}G_{e}\in\{-1,1\} for the {1,1}\{-1,1\}-indicator of whether e[n]×[n]e\in[n]\times[n] is an edge (indicated by Ge=+1G_{e}=+1) in the graph GG or not. When GG(n,12)G\sim\mathcal{G}(n,\frac{1}{2}), GeG_{e} are independent {1,1}\{-1,1\}-random variables.

A graph function is a real-valued function of the variables Ge{1,1}G_{e}\in\{-1,1\} for e([n]2)e\in{{[n]}\choose{2}}. For graphs G1,G2,,GkG^{1},G^{2},\ldots,G^{k} on the vertex set [n][n], we define Δ(G1,G2,,Gk)\Delta(G^{1},G^{2},\ldots,G^{k}) to be the graph GG satisfying Ge=ΠikGei.G_{e}=\Pi_{i\leqslant k}G^{i}_{e}.

For a graph GG on [n][n] and vertex sets I,J[n]I,J\subseteq[n], a set of vertices S[n]S\subseteq[n] is said to be a minimal vertex separator if SS is a set of smallest possible size such that every path between II and JJ in GG passes through some vertex of SS.

Often, II and JJ will be allowed to intersect in which case any vertex separator must contain IJI\cap J.

For a graph GG on [n][n] and two subsets of vertices I,J[n]I,J\subseteq[n], the maximum number of vertex disjoint paths between II and JJ in GG is equal to the size of any minimal vertex separator between II and JJ in GG.

3 Fourier Analysis

where χW(G)\chi_{W}(G) is the parity function on edges in WW:

Let GG be a graph on nn described by the vector G{1,1}(n2)G\in\{-1,1\}^{{n\choose 2}}. For any subset S[n]S\subseteq[n] of the vertices, we have the identity:

4 The Sum-of-Squares Algorithm

The sum of squares algorithm has several equivalent definitions. We follow the notation of pseudoexpectations as in the survey of Barak and Steurer [BS14].

Given a set of constraints {pi=0}\{p_{i}=0\} for 1im1\leqslant i\leqslant m and an objective polynomial pp, degre sum of squares algorithm of degree dd solves the problem

The Pseudo-expectation

Let S[n]S\subseteq[n] be a set of vertices of size Sd|S|\leqslant d. Let T([n]2)T\subseteq{{[n]}\choose{2}} be a set of edges. Let χT=eTGe\chi_{T}=\prod_{e\in T}G_{e}. Let

The Fourier coefficients can in fact be explicitly computed easily:

Let T([n]2)T\subseteq{{[n]}\choose{2}}, S[n]S\subseteq[n] and V(T)[n]\mathcal{V}(T)\subseteq[n] be the vertices incident to edges in TT. Then

We note that the second term above is . It’s easy to see if xS=0x_{S}=0. Otherwise, xV(T)=0x_{\mathcal{V}(T)}=0, and there is an edge eTe\in T but not contained in the clique xx. Thus,

3 Proof of Main Theorem

With high probability over GG from G(n,1/2)G(n,1/2), every pPdp\in\mathcal{P}_{d} satisfies,

It is easy to complete the proof of Theorem 1.1 now:

By Lemma 5.4, Lemma 5.5, and Lemma 5.6, there is a universal CC so that if Cd/ετ(1/C)εlognCd/\varepsilon\leqslant\tau\leqslant(1/C)\varepsilon\log n, (by a union bound) with high probability the following all hold:

4 Proof Plan

As is standard, we can reduce Lemma 5.6 to showing that the associated moment matrix, is positive semidefinite.

Thus, Lemma 5.6 is equivalent to showing:

With high probability, M0.\mathcal{M}\succeq 0.

At a high level our plan involves first getting an approximate factorization of the moment matrix M=LQ0L+"error"\mathcal{M}=\operatorname{\mathcal{L}}\mathcal{Q}_{0}\operatorname{\mathcal{L}}^{\dagger}+"error" for appropriately defined matrices L\operatorname{\mathcal{L}} and Q0\mathcal{Q}_{0}. This step is the key technical part of the proof - given such a factorization, our task reduces to showing that Q0\mathcal{Q}_{0} and LL\operatorname{\mathcal{L}}\operatorname{\mathcal{L}}^{\dagger} has large enough positive eigenvalues to compensate for the error. The first approximate factorization step will occupy us in Section 6. The technical work in second step involves showing upper bounds on the spectral norms of appropriately defined pieces of Q0\mathcal{Q}_{0} and is the content of Section 7.

Approximate Factorization of the Moment Matrix

In this section we get set up for the first step in the proof of Lemma 5.8 by setting up some definitions. Ribbons will play a crucial role in our analysis:

An (I,J)(I,J)-ribbon R\mathcal{R} is a graph with edge set WR([n]2)W_{\mathcal{R}}\subseteq{{[n]}\choose{2}} and vertex set VRV(WR)IJV_{\mathcal{R}}\supseteq\mathcal{V}(W_{\mathcal{R}})\cup I\cup J, for two specially identified subsets I,J[n]I,J\subseteq[n], each of size at most dd, called the left and the right ends, respectively. We sometimes write V(R)=defVR\mathcal{V}(\mathcal{R})\stackrel{{\scriptstyle\textrm{def}}}{{=}}V_{\mathcal{R}} and call V(R)|\mathcal{V}(\mathcal{R})| the size of R\mathcal{R}. Also, we write χR\chi_{\mathcal{R}} for the monomial χWR\chi_{W_{\mathcal{R}}} where WRW_{\mathcal{R}} is the edge set of the ribbon R\mathcal{R}.

In our analysis, (I,J)(I,J)-ribbons arise as the terms in the Fourier decomposition of the entry M(I,J)\mathcal{M}(I,J) in the moment matrix. It is important to emphasize that the subsets II and JJ in an (I,J)(I,J)-ribbon are allowed to intersect. Also V(R)\mathcal{V}(\mathcal{R}) can contain vertices that are not in V(WR)\mathcal{V}(W_{\mathcal{R}}) if there are isolated vertices in the ribbon.

Ultimately, we will want to partition a ribbon into three subribbons in such a way that we can express the moment matrix as the sum of positive semidefinite matrices, and some error terms. Our partitioning will be based on minimum vertex separators.

For an (I,J)(I,J)-ribbon R\mathcal{R} with edge set WRW_{\mathcal{R}}, a subset QV(R)Q\subseteq\mathcal{V}(\mathcal{R}) of vertices is a vertex separator if QQ separates II and JJ in WRW_{\mathcal{R}}. A vertex separator is minimum if there are no other vertex separators with strictly fewer vertices. The separator size of R\mathcal{R} is the cardinality of any minimum vertex separator of R\mathcal{R}.

The following elementary lemma establishes that a ribbon has a unique leftmost and rightmost vertex separator of minimum size. We defer its proof to Appendix A.3.

Let R\mathcal{R} be an (I,J)(I,J)-ribbon. There is a unique minimum vertex separator SS of R\mathcal{R} such that SS separates II and QQ for any vertex separator QQ of R\mathcal{R}. We call SS the leftmost separator in R\mathcal{R}. We define the rightmost separator analogously and we denote them by SL(R)S_{L}(\mathcal{R}) and SR(R)S_{R}(\mathcal{R}) respectively.

We illustrate the notion of a leftmost and rightmost vertex separator in the example below.

Let I={a,b,c}I=\{a,b,c\} and let J={c,x,y,z}J=\{c,x,y,z\}. The maximum number of vertex disjoint paths from II to JJ is 22 — for example, we could take the path {c}\{c\} and the path {b,h,i,j,z}\{b,h,i,j,z\}. The leftmost and rightmost separators are SL={c,i}S_{L}=\{c,i\} and SR={c,j}S_{R}=\{c,j\} respectively. This example illustrates an important point that when II and JJ intersect, SLS_{L} and SRS_{R} must both contain IJI\cap J.

2 Factorization of Monomials

Our factorization of M\mathcal{M} will rely on an iterative argument for grouping and factoring the Fourier characters in the decomposition of M(I,J)\mathcal{M}(I,J).

With the definition of the canonical factorization in hand, we will collect some important properties about it that we will make use of later:

In the discussion above, we established some properties that a canonical factorization must satisfy. Next we show the reverse direction, that any collection of ribbons that satisfies the below properties must be a canonical factorization. Consider a collection of ribbons R0,R1,R2\mathcal{R}_{0},\mathcal{R}_{1},\mathcal{R}_{2}, and the following list of properties:

3 Factorization of Matrix Entries

This leads to our first factorization of the entries M(I,J)\mathcal{M}(I,J) of M\mathcal{M}. Unfortunately, the error terms in this first attempt will be too large. Using canonical factorizations and Claim 6.5, for any I,J[n]I,J\subseteq[n] of size at most dd we can write

4 Factorization of the Matrix ℳℳ\mathcal{M}

The powers of (ω/n)(\omega/n) are split between Q0\mathcal{Q}_{0} and L\operatorname{\mathcal{L}} so that the typical of eigenvalue of Q0\mathcal{Q}_{0} will be approximately 11 (although it will be some time before we are prepared to prove that).

The equation in lines 6.1, 6.2, and 6.3 can be written succinctly as

As we will see later, with high probability Q00\mathcal{Q}_{0}\succeq 0, and thus also LQ0L0\operatorname{\mathcal{L}}\mathcal{Q}_{0}\operatorname{\mathcal{L}}^{\dagger}\succeq 0. So long as τ\tau is sufficiently large, the spectral norm ξ0\|\xi_{0}\| of the error term that accounts for ribbons whose size is too large will be negligible. However, the error E0E_{0} does not turn out to be negligible. To overcome this we will apply a similar factorization approach to E0E_{0} as we did for M\mathcal{M}; iterating this factorization will push down the error from ribbon nondisjointness.

We record an elementary fact about Q0\mathcal{Q}_{0}:

Let Π\Pi be the projector to Span{eC:CCd}\mathsf{Span}\{e_{C}\,:\,C\in\mathcal{C}_{\leqslant d}\}. Then Q0=ΠQ0=Q0Π\mathcal{Q}_{0}=\Pi\mathcal{Q}_{0}=\mathcal{Q}_{0}\Pi.

Suppose SS is not a clique in GG. We need to show that the row Q0(S,)\mathcal{Q}_{0}(S,\cdot) is zero. For every entry Q0(S,S)\mathcal{Q}_{0}(S,S^{\prime}), notice that the Fourier coefficients Q0(S,S)^(T)=Q0(S,S)^(T)\widehat{\mathcal{Q}_{0}(S,S^{\prime})}(T)=\widehat{\mathcal{Q}_{0}(S,S^{\prime})}(T^{\prime}) if T,T([n]2)T,T^{\prime}\subseteq{{[n]}\choose{2}} disagree only on edges inside SS. (That is, TT(S2)T\oplus T^{\prime}\subseteq{S\choose 2}.) This means that \mathcal{Q}_{0}(S,S^{\prime})=\mathbf{1}_{S\text{ is a clique inG}}\cdot f_{S,S^{\prime}}(G) for some function fS,Sf_{S,S^{\prime}}. ∎

In what follows, we will show how to factor a slightly more general sort of matrix; this factorization will be applicable iteratively, starting with E0E_{0}.

An improper (I,J)(I,J)-ribbon R\mathcal{R} is an (I,J)(I,J)-ribbon R0\mathcal{R}_{0} together with a set Z(R)[n]\mathcal{Z}(\mathcal{R})\subseteq[n] of vertices disjoint from V(R0)\mathcal{V}(\mathcal{R}_{0}). (Think of adding the vertices Z(R)\mathcal{Z}(\mathcal{R}) to the ribbon R0\mathcal{R}_{0} as degree- nodes.) We write V(R)=V(R0)Z(R)\mathcal{V}(\mathcal{R})=\mathcal{V}(\mathcal{R}_{0})\cup\mathcal{Z}(\mathcal{R}). When we need to distinguish, we sometimes call ordinary ribbons “proper”.

Every ribbon is also an improper ribbon by taking Z()=\mathcal{Z}(\cdot)=\varnothing, and every improper ribbon has a corresponding ribbon given by deleting its degree- vertices.

We are ready to define our recursive factorization of E0E_{0}. Recall that c0(Rm)=1c_{0}(\mathcal{R}_{m})=1 if Rm\mathcal{R}_{m} satisfies 3 and c0(Rm)=0c_{0}(\mathcal{R}_{m})=0 otherwise and E0=Ec0E_{0}=\mathcal{E}_{c_{0}}. Applying the factorization above to Ec0\mathcal{E}_{c_{0}} we obtain matrices ξ1=ξc0,Q1\xi_{1}=\xi_{c_{0}},\mathcal{Q}_{1}, and Ec1\mathcal{E}_{c_{1}}. Then of course we can apply the factorization again to Ec1\mathcal{E}_{c_{1}}.

Proceeding inductively, for all i[1,2d]i\in[1,2d] let ξi=ξci1,Qi,\xi_{i}=\xi_{c_{i-1}},\mathcal{Q}_{i}, and Eci\mathcal{E}_{c_{i}} be the matrices given by applying the factorization to Eci1\mathcal{E}_{c_{i-1}} at step ii.

We have that M=L(Q0)LE0ξ0\mathcal{M}=\operatorname{\mathcal{L}}(\mathcal{Q}_{0})\operatorname{\mathcal{L}}^{\dagger}-\mathcal{E}_{0}-\xi_{0} and Ei1=LQiLEiξci1=LQiLEiξi\mathcal{E}_{i-1}=\operatorname{\mathcal{L}}\mathcal{Q}_{i}\operatorname{\mathcal{L}}^{\dagger}-\mathcal{E}_{i}-\xi_{c_{i-1}}=\operatorname{\mathcal{L}}\mathcal{Q}_{i}\operatorname{\mathcal{L}}^{\dagger}-\mathcal{E}_{i}-\xi_{i}. We prove the claim by starting with the first formula and appliying the second formula for each i[1,2d]i\in[1,2d]. At the end, we are left with an extra term E2d\mathcal{E}_{2d}. We must show that E2d=0\mathcal{E}_{2d}=0.

ℳℳ\mathcal{M} is PSD

In this section we combine the factorization of M\mathcal{M} in terms of the matrices L,Qi,ξi\operatorname{\mathcal{L}},\mathcal{Q}_{i},\xi_{i} that we obtained in Section 6 with estimates on the eigenvalues of the Q\mathcal{Q}s and ξ\xis. The starting point is the following PSDness claim for Q0\mathcal{Q}_{0}.

We also need to bound Qi\|\mathcal{Q}_{i}\| for i>0i>0.

The preceding lemmas are enough to obtain Q0+Q2dD/2\mathcal{Q}_{0}-\ldots+\mathcal{Q}_{2d}\succeq D/2, but in the end we need to work with the matrix L(Q0+Q2d)L(ξ0+ξ2d)\operatorname{\mathcal{L}}(\mathcal{Q}_{0}-\ldots+\mathcal{Q}_{2d})\operatorname{\mathcal{L}}^{\dagger}-(\xi_{0}-\ldots+\xi_{2d}). The next two lemmas allow us to make this last step.

With high probability, ΠLΠLΠΩ(ω/n)d+1Π\Pi\operatorname{\mathcal{L}}\Pi\operatorname{\mathcal{L}}^{\dagger}\Pi\succeq\Omega(\omega/n)^{d+1}\cdot\Pi, where as usual Π\Pi is the projector to Span{eC:CCd}\mathsf{Span}\{e_{C}\,:\,C\in\mathcal{C}_{\leqslant d}\}.

Finally, we need a bound on the ξ\xi matrices.

With high probability, ξ0+ξ2dn16d\|\xi_{0}-\ldots+\xi_{2d}\|\leqslant n^{-16d}.

By a union bound, with high probability the conclusions of Lemmas 7.1, 7.2, 7.3, and 7.4 all hold. By Lemma 7.1 and Lemma 7.2,

where as usual Π\Pi is the projector to SpaneC:CCd\mathsf{Span}{e_{C}\,:\,C\in\mathcal{C}_{\leqslant d}}. Thus by Lemma 7.3, we obtain L(Q0+Q2d)LΩ(ω/n)d+1Π\operatorname{\mathcal{L}}(\mathcal{Q}_{0}-\ldots+\mathcal{Q}_{2d})\operatorname{\mathcal{L}}^{\dagger}\succeq\Omega(\omega/n)^{d+1}\cdot\Pi. Finally, by Lemma 7.4 we have

In the next subsections, we prove the foregoing lemmas.

Our PSDness arguments require bounds on the spectral norm of certain random matrices.

Our random matrices arise out of decompositions of the moment matrix from Definition 5.7 and are functions of a graph GG on vertex set [n][n]. Our norm bounds will hold for what we call as graphical matrices, that are are defined to capture the matrices are invariant under permutation of vertices in the graph GG and are in fact "minimal" such matrices.

We first identify the shape of a ribbon that basically identifies the structure of a ribbon up to renaming.

For an (I,J)(I,J)-ribbon R\mathcal{R}, consider the graph UU on the vertex set [V(R)][|\mathcal{V}(\mathcal{R})|] whose edges are

(Here we are considering V(R)\mathcal{V}(\mathcal{R}) to have the usual ordering inherited from [n][n].) Also, let UU have two distinguished subsets of vertices AA and BB, where A=\{i\,:\,\text{theithelementof-th element of\mathcal{V}(\mathcal{R})isinis inI}\}, and similarly for BB and JJ. We call UU the shape of R\mathcal{R} and write shape(R)=U\mathsf{shape}(\mathcal{R})=U.

We record some observations on shapes of ribbons.

If R\mathcal{R} is a ribbon (not an improper ribbon), its shape satisfies the assumptions of Lemma 7.8 (namely, that every vertex outside ABA\cup B has degree at least 11).

If, for example, R\mathcal{R} is an (I,J)(I,J) ribbon where IJ={1}I\cap J=\{1\} (which must be the least element in both II and JJ), then (I,J)(I^{\prime},J^{\prime})-ribbon R\mathcal{R}^{\prime} only has the same shape as R\mathcal{R} if IJ=1|I^{\prime}\cap J^{\prime}|=1 and contains only the least element in II and JJ. More broadly, specifying the shape of a ribbon in particular specifies the pattern of intersection of its endpoints.

We are now ready to define graphical matrices.

When UU is a graph on 22 vertices with distinguished sets {1}\{1\} and {2}\{2\} of size 11 each and a single edge connecting vertex 11 and 22, the graphical matrix of shape UU is just the standard {1,1}\{-1,1\}-adjacency matrix of the graph GG.

The following lemma will be our main tool. It is in essence due to Medarametla and Potechin [MP] and special cases of the bound have been proven and used in [HKP+16, HKP15, DM15]. We give a proof in the appendix for completeness.

Let UU be a graph on tO(logn)t\leqslant O(\log n) vertices, with two distinguished subsets of vertices AA and BB, and suppose:

UU admits pp vertex-disjoint paths from ABA\setminus B to BAB\setminus A.

Every vertex outside ABA\cup B has degree at least 11.

Let M=M(G)M=M(G) be the graphical matrix with shape UU. Then, whp, Mntpr22O(t)(logn)O(tr+p)\|M\|\leqslant n^{\tfrac{t-p-r}{2}}\cdot 2^{O(t)}\cdot(\log n)^{O(t-r+p)}.

Lemma 7.8 can be seen as a generalization of the standard upper bound on the spectral norm of the adjacency matrix. Example 7.7 shows how adjacency matrix is a graphical matrix with a shape UU on 22 vertices with a single edge connecting them, thus, t=2t=2 and r=1r=1. Lemma 7.8 thus shows an upper bound of npolylog(n)\sqrt{n}\operatorname{poly}\log{(n)} on the spectral norm of the adjacency matrix which is tight up to a polylog(n)\operatorname{poly}\log{(n)} factor.

In this section we prove Lemma 7.1, which we restate here.

To begin, we split Q0\mathcal{Q}_{0} into its diagonal Q0diag\mathcal{Q}_{0}^{\text{diag}} and its off-diagonal Q0off-diag\mathcal{Q}_{0}^{\text{off-diag}} parts.

Then Q0=Q0diag+Q0off-diag\mathcal{Q}_{0}=\mathcal{Q}_{0}^{\text{diag}}+\mathcal{Q}_{0}^{\text{off-diag}}. Expanding Q0diag\mathcal{Q}_{0}^{\text{diag}},

for all S([n]d)S\in{{[n]}\choose{d}} with high probability by a similar argument as in Lemma 5.4 and a union bound.

Next, we bound Q0off-diag\|\mathcal{Q}_{0}^{\text{off-diag}}\| be decomposing it according to ribbon shape. Fix s,tτs,t\leqslant\tau. Let U1(s,t),,Uq(s,t)U_{1}^{(s,t)},\ldots,U_{q}^{(s,t)} be all the graphs on vertex set [t][t] with two distinguished sets of vertices A,BA,B, both of size ss, with ABs1|A\cap B|\leqslant s-1, and where there are sABs-|A\cap B| vertex-disjoint paths from ABA\setminus B to BAB\setminus A. Let Mi(s,t)M_{i}^{(s,t)} be given by

We can apply Lemma 7.8 to conclude that with probability at least 1O(n100logn)1-O(n^{-100\log n}),

where to conclude the bound on the exponent in (logn)O(tAB+AB)(\log n)^{O(t-|A\cap B|+|A\setminus B|)} we have used that t2sABt\geqslant 2s-|A\cap B|.

Notice that for fixed ss and tt, there are at most 2(t2)+O(t)2^{{t\choose 2}+O(t)} unique shapes U1(s,t),,Uq(s,t)U_{1}^{(s,t)},\ldots,U_{q}^{(s,t)}. Thus, a union bound followed by the triangle inequality, we obtain that for fixed ss and tt, with probability at least 1O(n99logn)1-O(n^{-99\log n}),

Under our assumptions on the parameters d,τ,d,\tau, and ε\varepsilon, this is at most 2(s2)/(100τ)2^{{s\choose 2}}/(100\tau). Summing over all tτt\leqslant\tau, for a fixed ss we have

Notice that the above matrix is exactly the block of Q0off-diag\mathcal{Q}_{0}^{\text{off-diag}} corresponding to subsets of size ss. Together with our bound on Q0diag\mathcal{Q}_{0}^{\text{diag}}, this proves the lemma. ∎

In this section we prove Lemma 7.2, restated here.

We will need to bound the coefficients ci(Rm)c_{i}(\mathcal{R}_{m}^{\prime}) used to define the matrices Qi\mathcal{Q}_{i} which we set up in Section 6.

recalling that ω=n1/2ε\omega=n^{1/2-\varepsilon}. Furthermore, if Rm\mathcal{R}_{m} and Rm\mathcal{R}_{m}^{\prime} have the same shape, then ci(Rm)=ci(Rm)c_{i}(\mathcal{R}_{m})=c_{i}(\mathcal{R}_{m}^{\prime}).

With this lemma in hand we can prove Lemma 7.2.

Fix some 0<i2d0<i\leqslant 2d. We will use Lemma 7.8, which requires that we first decompose each Qi\mathcal{Q}_{i} into simpler matrices. First of all, for a proper ribbon Rm\mathcal{R}_{m}, let

Note that we include Rm\mathcal{R}_{m} itself in this sum as a proper ribbon is also an improper ribbon.

Consider all of the improper ribbons Rm\mathcal{R}^{\prime}_{m} with kk isolated vertices whose largest proper subribbon is Rm\mathcal{R}_{m}. For each such ribbon Rm\mathcal{R}^{\prime}_{m}, by Lemma 7.10, (ω/n)kci(Rm)(ωn)k+snpki/22+εs(\omega/n)^{k}c_{i}(\mathcal{R}^{\prime}_{m})\leqslant\left(\frac{\omega}{n}\right)^{k+s}\cdot n^{\frac{p-k-i/2}{2}+\varepsilon s}. There are at most nkn^{k} such improper ribbons. Adding all of their contributions together gives at most

Summing this up over all k0k\geqslant 0 gives the result. ∎

As usual, let Π\Pi be the projector to Span{eC:CCd}\mathsf{Span}\{e_{C}\,:\,C\in\mathcal{C}_{\leqslant d}\}. Note that ΠQi=QiΠ=Qi\Pi\mathcal{Q}_{i}=\mathcal{Q}_{i}\Pi=\mathcal{Q}_{i}, since Qi(I,J)=0\mathcal{Q}_{i}(I,J)=0 whenever II or JJ is not a clique. So, to show that D/8dQiD/8dD/8d\succeq\mathcal{Q}_{i}\succeq-D/8d, it is sufficient to show that for all vectors vv with v=Πvv=\Pi v it happens that vQivvT(D/8d)v|v^{\dagger}{\mathcal{Q}_{i}}v|\leqslant v^{T}(D/8d)v. To see this, let vkv_{k} be the part of vv indexed by cliques of size exactly kk. Now,

We turn to the proof of Lemma 7.10, for which we want the following characterization of the effect of the separating factorization on the underlying graph of a ribbon.

We require the following combinatorial quantities:

In this section we prove Lemma 7.3, restated here.

With high probability, ΠLΠLΠΩ(ω/n)d+1Π\Pi\operatorname{\mathcal{L}}\Pi\operatorname{\mathcal{L}}^{\dagger}\Pi\succeq\Omega(\omega/n)^{d+1}\cdot\Pi, where as usual Π\Pi is the projector to Span{eC:CCd}\mathsf{Span}\{e_{C}\,:\,C\in\mathcal{C}_{\leqslant d}\}.

We recall the definition of L\operatorname{\mathcal{L}}.

Consider a diagonal entry L(S,S)\operatorname{\mathcal{L}}(S,S). Since every ribbon R\mathcal{R} appearing in its expansion must have 1, in particular it has no edges inside SS. Thus, by the same argument as in Lemma 5.4, with probability at least 1O(n10logn)1-O(n^{-10\log n}),

Let Loff-diag\operatorname{\mathcal{L}}^{\text{off-diag}} be given by

every vertex in UU outside ABA\cup B is reachable from AA without passing through BB, and

BB is the unique minimum-size vertex separator in UU separating AA from BB.

with probability 1O(n99logn)1-O(n^{-99\log n}). By our assumptions on d,τ,d,\tau, and ε\varepsilon, this is at most (ω/n)sr/21/d4(\omega/n)^{s_{r}/2}\cdot 1/d^{4}.

5 High-Degree Matrices Have Small Norms

In this section we prove Lemma 7.4, restated here:

With high probability, ξ0+ξ2dn16d\|\xi_{0}-\ldots+\xi_{2d}\|\leqslant n^{-16d}.

We recall the definition of ξi\xi_{i}. For a coefficient function on ribbons ci1(Rm)c_{i-1}(\mathcal{R}_{m}), we have a matrix E\mathcal{E} given by

and another one, E\mathcal{E}^{\prime}, given by

Then the matrix ξi\xi_{i} is given by EE\mathcal{E}-\mathcal{E}^{\prime}.

We will actually prove a bound on the Frobenious norm of each matrix ξi\xi_{i}. The following will allow us to control the magnitude of the entries. It follows immediately from our concentration bound Lemma A.1, which is proved via the moment method. (Under the slightly stronger assumption τεlogn/loglogn\tau\ll\varepsilon\log n/\log\log n, it would also follow from standard hypercontractivity.)

Suppose cTc_{T} are a collection of coefficients, one for each T([n]2)T\subseteq{{[n]}\choose{2}}, and there is a constant CC such that

Otherwise, cT(ω/n)T/CCd|c_{T}|\leqslant(\omega/n)^{|T|/C-Cd}.

Then with probability at least 1O(n100logn)1-O(n^{-100\log n}) it occurs that T([n]2)cTχTn20d\left\lvert\sum_{T\subseteq{{[n]}\choose{2}}}c_{T}\cdot\chi_{T}\right\rvert\leqslant n^{-20d}.

We will also need several facts about the coefficients of ribbons in the expansion of each matrix ξi\xi_{i}.

We will apply Lemma 7.17 to ξi(I,J)\xi_{i}(I,J) for each i2di\leqslant 2d and I,J[n]I,J\subseteq[n] with I,Jd|I|,|J|\leqslant d. So consider the Fourier expansion of ξi(I,J)\xi_{i}(I,J), given by

From Lemma 7.18, we obtain that if T>Cτ|T|>C\tau then cT=0c_{T}=0, for some absolute constant CC. For smaller TT we need a bound on the magnitude cT|c_{T}|. The coefficient cTc_{T} is bounded by

By Lemma 7.10, we have ci1(Rm)nd(ω/n)2dc_{i-1}(\mathcal{R}_{m})\leqslant n^{d}\leqslant(\omega/n)^{-2d}. At the same time, there are at most 2O(τ2)2^{O(\tau^{2})} nonzero terms in the sum (7.3). Thus by Lemma 7.18 and our assumptions on d,τ,d,\tau, and ε\varepsilon, the coefficient cTc_{T} is at most (ω/n)τ/CCd(\omega/n)^{\tau/C-Cd} for some absolute constant CC.

Applying Lemma 7.17, we obtain ξi(I,J)n20d|\xi_{i}(I,J)|\leqslant n^{-20d} with probability 1O(n100logn)1-O(n^{-100\log n}). Taking a union bound over all n2dn2lognn^{2d}\leqslant n^{2\log n} entries of ξi\xi_{i}, and over all i2di\leqslant 2d, we obtain that ξ0+ξ2dξ0+ξ2dFn16d\|\xi_{0}-\ldots+\xi_{2d}\|\leqslant\|\xi_{0}-\ldots+\xi_{2d}\|_{F}\leqslant n^{-16d} with probability 1O(n96logn)1-O(n^{-96\log n}). ∎

Acknowledgements

We thank Raghu Meka, Ryan O’Donnell, Prasad Raghavendra, Tselil Schramm, David Steurer, and Avi Wigderson for many useful discussions related to this paper.

References

Appendix A Omitted Proofs

In this subsection we prove Lemma 5.3, restated here.

A.2 Concentration Bounds for Linear Constraints

In this section we prove Lemma 5.4. We will use the following elementary concentration bound repeatedly. (It is the scalar version of the matrix concentration bound Lemma 7.8; we state and prove a scalar version here because it is a good warmup for Lemma 7.8.)

Considering each T([n]2)T\subseteq{{[n]}\choose{2}} as a graph, we partition {T([n]2):V(T)=t}\{T\subseteq{{[n]}\choose{2}}\,:\,|\mathcal{V}(T)|=t\} into ptp_{t} families {Tit}i=1p\{\mathcal{T}_{i}^{t}\}_{i=1}^{p} by placing TT and TT^{\prime} in the same family iff there exists a permutation σ:[n][n]\sigma:[n]\rightarrow[n] of vertices so that σ(T)=T\sigma(T)=T^{\prime}. Thus,

By a union bound over all pt2t2p_{t}\leqslant 2^{t^{2}} families Tit\mathcal{T}_{i}^{t}, we get that with high probability,

For τ(ε/2)logn\tau\leqslant(\varepsilon/2)\log n and ω=n1/2ε\omega=n^{1/2-\varepsilon}, this is at most nΩ(ε)n^{-\Omega(\varepsilon)}. ∎

A.3 Combinatorial Proofs about Ribbons

In this section we prove Lemma 6.3, restated here:

Let R\mathcal{R} be an (I,J)(I,J)-ribbon. There is a unique minimum vertex separator SS of R\mathcal{R} such that SS separates II and QQ for any vertex separator QQ of R\mathcal{R}. We call SS the leftmost separator in R\mathcal{R}. We define the rightmost separator analogously and we denote them by SL(R)S_{L}(\mathcal{R}) and SR(R)S_{R}(\mathcal{R}) respectively.

We start by defining a natural partial order on the set of vertex separators in a ribbon R\mathcal{R}.

We write Q1Q2Q_{1}\leqslant Q_{2} for two vertex separators Q1Q_{1} and Q2Q_{2} of an (I,J)(I,J)-ribbon R\mathcal{R} if Q1Q_{1} separates II and Q2Q_{2}.

Next, we check that the definition above indeed is a partial order.

For any set of minimum vertex separators Q1,Q2,Q3Q_{1},Q_{2},Q_{3} an (I,J)(I,J)-ribbon, we have:

If Q1Q2Q_{1}\leqslant Q_{2} and Q2Q3Q_{2}\leqslant Q_{3}, then, Q1Q3Q_{1}\leqslant Q_{3}.

If Q1Q2Q_{1}\leqslant Q_{2} and Q2Q1Q_{2}\leqslant Q_{1}, then, Q1=Q2Q_{1}=Q_{2}.

The first statement is immediate from the definition. For the second, consider a path PP from II to Q3Q_{3} in R\mathcal{R}. Since Q2Q3Q_{2}\leqslant Q_{3}, PP passes through a vertex in Q2Q_{2}. Thus, PP contains a subpath that connects II and Q2Q_{2}. But since Q1Q2Q_{1}\leqslant Q_{2}, this subpath must pass through Q1Q_{1}. Thus, any such PP must pass through Q1Q_{1} and thus, Q1Q3Q_{1}\leqslant Q_{3}.

Finally, for the third statement, let k=Q1=Q2k=|Q_{1}|=|Q_{2}|. Then, using Menger’s theorem (Fact 4.2, there is a set of kk vertex disjoint paths P1,P2,,PkP_{1},P_{2},\ldots,P_{k} between II and JJ. By virtue of Q1,Q2Q_{1},Q_{2} being minimum vertex separators of R\mathcal{R}, Q1Q_{1} and Q2Q_{2} must intersect each PiP_{i} in exactly one vertex. It is then immediate that the only way Q1Q2Q_{1}\leqslant Q_{2} and Q2Q1Q_{2}\leqslant Q_{1} if every PiP_{i} intersects Q1,Q2Q_{1},Q_{2} in the same vertex. ∎

It is enough to show that for any two minimum separators Q1,Q2Q_{1},Q_{2} of size kk in RR, there are separators QL,QRQ_{L},Q_{R} such that QLQ1QRQ_{L}\leqslant Q_{1}\leqslant Q_{R} and QLQ2QRQ_{L}\leqslant Q_{2}\leqslant Q_{R}. We now construct QLQ_{L} and QRQ_{R} as required.

Let U=Q1Q2U=Q_{1}\cap Q_{2} and V=Q1ΔQ2V=Q_{1}\Delta Q_{2}. Let WLVW_{L}\subseteq V be the set of vertices ww such that there is a path from II to ww that doesn’t pass through Q1Q2Q_{1}\cup Q_{2}. Similarly, let WRVW_{R}\subseteq V be the set of vertices such that there is a path from ww to some vertex in JJ that doesn’t pass through any vertex in Q1Q2Q_{1}\cup Q_{2}. Then we first observe:

Assume otherwise and let wWLWRw\in W_{L}\cap W_{R}. Then there is a path between II and JJ that doesn’t go through any vertex in at least one of Q1Q_{1} or Q2Q_{2} contradicting that both are in fact vertex separators. ∎

Let QL=UWLQ_{L}=U\cup W_{L} and QR=UWRQ_{R}=U\cup W_{R}. Then QL,QRQ_{L},Q_{R} are both vertex separators in RR.

We only give the argument for QLQ_{L}, the other case is similar. Assume there is a path PP from II to JJ that does not pass through QLQ_{L}. PP must intersect Q1Q2Q_{1}\cup Q_{2}. Then there is a vertex vQ1Q2v\in Q_{1}\cup Q_{2} such that there is a path II to vv which intersects no other vertices in Q1Q2Q_{1}\cup Q_{2}. This implies that either vUv\in U or vWLv\in W_{L}. But by our construction of WLW_{L} this is a contradiction. ∎

Finally, we note that both QL,QRQ_{L},Q_{R} must in fact be minimum vertex separators.

Let Q1=Q2=k|Q_{1}|=|Q_{2}|=k. Then 2k=Q1+Q2=2U+V2U+WL+WR=UWL+UWR=QL+QR2k=|Q_{1}|+|Q_{2}|=2|U|+|V|\geqslant 2|U|+|W_{L}|+|W_{R}|=|U\cup W_{L}|+|U\cup W_{R}|=|Q_{L}|+|Q_{R}|. Since QLQ_{L} and QRQ_{R} are vertex separators, QL,QRk|Q_{L}|,|Q_{R}|\geqslant k. Thus, QL=QR=k|Q_{L}|=|Q_{R}|=k. ∎

Finally, we have the ordering requirement on QLQ_{L} and QRQ_{R}.

QLQ1Q_{L}\leqslant Q_{1} and Q2QRQ_{2}\leqslant Q_{R}.

Let PP be a path from II to Q1Q_{1}, let vv be the first vertex on this path which is in Q1Q2Q_{1}\cup Q_{2}. Then, vUv\in U or vWLv\in W_{L}. Thus, QLQ1Q_{L}\leqslant Q_{1}. The other case is similar.∎

Appendix B Spectral Norms

The results in this section are in essence due to Medarametla and Potechin [MP]. For completeness, we state and prove them here in the language and notation of the current paper, with minor modifications as needed.

Let UU be a graph on tO(logn)t\leqslant O(\log n) vertices, with two distinguished subsets of vertices AA and BB, and suppose:

UU admits pp vertex-disjoint paths from ABA\setminus B to BAB\setminus A.

Every vertex outside ABA\cup B has degree at least 11.

Let M=M(G)M=M(G) be the graphical matrix with shape UU. Then, whp, Mntpr22O(t)(logn)O(tr+p)\|M\|\leqslant n^{\tfrac{t-p-r}{2}}\cdot 2^{O(t)}\cdot(\log n)^{O(t-r+p)}.

We proceed by the trace power method, with a dependence-breaking step beforehand.

Let q1,,qpq_{1},\ldots,q_{p} be vertex-disjoint paths from ABA\setminus B to BAB\setminus A in UU. Without loss of generality we can take each to intersect ABA\setminus B and BAB\setminus A only at its endpoints. We will partition the space of labelings σ\sigma into disjoint sets S1,,SmS_{1},\ldots,S_{m}. For each SkS_{k} there will be a partition V1k,V2kV_{1}^{k},V_{2}^{k} of [n][n] so that σ(jpqj)V1k\sigma(\bigcup_{j\leqslant p}q_{j})\subseteq V_{1}^{k} and σ(U(jpqj))V2k\sigma(U\setminus(\bigcup_{j\leqslant p}q_{j}))\subseteq V_{2}^{k} for every σSk\sigma\in S_{k}. Let (V11,V21),,(V1m,V2m)(V_{1}^{1},V_{2}^{1}),\ldots,(V_{1}^{m},V_{2}^{m}) be a sequence of independent uniformly random partitions of [n][n]. Call a labeling σ\sigma good at kk if the preceeding conditions apply to σ\sigma for the partition V1k,V2kV_{1}^{k},V_{2}^{k} and not for any V1k,V2kV_{1}^{k^{\prime}},V_{2}^{k^{\prime}} for some k<kk^{\prime}<k. Let S_{k}=\{\sigma\,:\,\sigma\mbox{ is good atk}\}.

There is m=O(2ttlogn)m=O(2^{t}\cdot t\cdot\log n) so that k=1mSk\bigcup_{k=1}^{m}S_{k} contains every labeling σ:UG\sigma:U\rightarrow G.

Henceforth, let S1,,SmS_{1},\ldots,S_{m} be the partition guaranteed by the preceeding claim. For kmk\leqslant m, let Mk(I,J)=σSk:σ(A)=I,σ(B)=Jval(σ)M_{k}(I,J)=\sum_{\sigma\in S_{k}\,:\,\sigma(A)=I,\sigma(B)=J}\operatorname{val}(\sigma). Then M=k=1mMkM=\sum_{k=1}^{m}M_{k}.

This means that among the labels σi(j)\sigma_{i}(j) for all jABj\notin A\cap B, there are at most

Finally, consider the labels of the rr vertices j1,,jrj_{1},\ldots,j_{r} in ABA\cap B. The first labelling σ1\sigma_{1} assigns these vertices some σ1(j1),,σ1(jr)\sigma_{1}(j_{1}),\ldots,\sigma_{1}(j_{r}) labels in GG. Since σ2\sigma_{2} agrees with σ1\sigma_{1} on AA-vertices, we must have σ2(j1)=σ1(j1),,σ1(jr)=σ2(jr)\sigma_{2}(j_{1})=\sigma_{1}(j_{1}),\ldots,\sigma_{1}(j_{r})=\sigma_{2}(j_{r}). Since σ3\sigma_{3} agrees with σ2\sigma_{2} on BB-vertices, we must have σ3(j1)=σ2(j1),,σ3(jr)=σ2(jr)\sigma_{3}(j_{1})=\sigma_{2}(j_{1}),\ldots,\sigma_{3}(j_{r})=\sigma_{2}(j_{r}), and so on. So there are at most rr unique labels for such vertices.

Choose the labels σi(j1),,σi(jr)\sigma_{i}(j_{1}),\ldots,\sigma_{i}(j_{r}) of all the vertices in ABA\cap B. Here there are at most nrn^{r} options.

Choose the labels σi(j)[n]\sigma_{i}(j)\in[n] for all jABj\notin A\cap B and pairs (i,j)(i,j) which in Stage 2 we chose to be the first appearance of a label. If there are xx such vertices, there are at most nxn^{x} options.

Now using Markov’s inequality and standard manipulations, for any ss,