Tight Lower Bounds for Planted Clique in the Degree-4 SOS Program

Prasad Raghavendra, Tselil Schramm

Introduction

In the Maximum Clique problem, the input consists of a graph $G=(V,E)$ and the goal is to find the largest subset $S$ of vertices all of which are connected to each other. The Maximum Clique problem is NP-hard to approximate within a $n^{1-\varepsilon}$ -factor for all $\varepsilon>0$ [Hås96, Kho01].

The works of [FK08, BV09] show that, if one were able to efficiently calculate the injective tensor norm of a certain random order- $m$ tensor, then by extending the spectral algorithm of [AKS98] one would have a polynomial-time algorithm for $k>n^{1/m}$ . However, there is no known algorithm that efficiently computes the injective tensor norm of an order- $m$ tensor; in fact computing the inective tensor norm is hard to approximate in the general case [HM13].

This line of work brings to the fore the question: can a $d=O(1)$ -degree SOS relaxation solve the planted clique problem for some $k<\sqrt{n}$ ? While lower bounds are known for Lovász-Schrijver SDP relaxations for planted clique [FK00, FK03], SOS relaxations can in general be much more powerful than Lovász-Schrijver relaxations. For example, while there are instances of unique games that are hard for $\operatorname{poly}(\log\log n)$ -rounds of the Lovász-Schrijver SDP hierarchy [KS09, RS09], recent work has shown that these instances are solved by degree- $8$ SOS hierarchy [BBH+12].

Moreover, even the degree- $4$ SOS relaxation proves to be surprisingly powerful in a few applications:

First, the work of Barak et al. [BBH+12] shows that a degree $4$ SOS relaxation can certify $2-to-4$ hypercontractivity of low degree polynomials over the hypercube. This argument is the reason that hard instances for Lovász-Schriver and other SDP hierarchies constructed via the noisy hypercube gadgets are easily refuted by the SOS hierarchy.

Second, a degree- $4$ SOS relaxation can certify that the $2$ -to- $4$ norm of a random subspace of dimension at most $o(\sqrt{n})$ is bounded by a constant (with high probability over the choice of the subspace) [BBH+12]. This average-case problem has superficial similarities to the planted clique problem.

In this work, we make modest progress towards a lower bound for SOS relaxations of planted clique by obtaining a nearly tight lower bound for the degree- $4$ SOS relaxation (corresponding to two rounds, $r=2$ ). More precisely, our main result is the following.

Note that by the work of [AKS98], this result is tight up to logarithmic factors. In an independent work, Hopkins, Kothari and Potechin [HKP15] have obtained a similar result.

We use the symbol $\succeq$ to denote the PSD ordering on matrices, saying that $A\succeq 0$ if $A$ is PSD and that $A\succeq B$ if $A-B\succeq 0$ . When we wish to hide constant factors for clarity, we use $a\lesssim b$ to denote that $a\leq C\cdot b$ for some constant $C$ .

In our notation, we at times closely follow the notation of [DM15a], as our paper builds on their results and we recycle many of their bounds.

2 Organization

Preliminaries and Proof Overview

In this section, we describe the degree-4 SOS relaxation for the max-clique SDP and give background on the Deshpande-Montanari witness. We then describe our own modified witness, and give an overview of the proof that our witness is feasible (the difficult part being showing that our witness is PSD). The full proof of feasibility is deferred to Section 3.

It is instructive to think of the variable $X_{S}$ as a pseudoexpectation of the product of indicator variables, or a pseudomoment:

We obtain Theorem 2.1 by constructing a point, or witness, for each $G\sim G(n,\tfrac{1}{2})$ , then proving that the point is feasible with high probability. We defer the description of our witness to Definition 2.8 and Definition 2.9, as we spend Section 2.2 and Section 2.3 motivating our construction; however the curious reader may skip ahead to Definition 2.9 which does not require the knowledge of additional notation.

2 Deshpande-Montanari Witness

This SDP solution assigns to each clique of size $1,\ldots,d$ , a value that depends only on its size (in our case, $d=4$ ). In essence, their solution takes advantage of the independence of the $G(n,p)$ instance. The motivating observation is that the variable $X_{S}$ can be thought of as a pseudoexpectation of the indicator that $S$ is a subclique of the planted clique. The idea is then to make this pseudoexpectation of the indicator consistent with the true expectation under the distribution where a clique of size $k$ is planted uniformly at random within the instance of $G(n,p)$ . Thus, every vertex is in the clique “with uniform probability:”

Then, the same principle is applied to edges, traingles, and $4$ -cliques, so that

This is the general idea of the SDP solution of [DM15a]. More formally, the SDP solution in [DM15a] is specified by four parameters $\underline{\alpha}=\{\alpha_{i}\}_{i=1}^{4}$ as,

where for a set of vertices $A\subseteq V$ , $\mathcal{G}_{A}$ is the indicator that the subgraph induced on $A$ is a clique. The parameters $\{\alpha_{i}\}_{i\in}$ determine the value of the objective function, and the feasibility of the solution. As a convention, we will define $\alpha_{0}=1$ .

It is easy to check that the solution $M(G,\underline{\alpha})$ satisfies all the linear constraints of the SOS program (2.1), since it assigns non-zero values only to cliques in $G$ . The key difficulty is in showing that the matrix $M$ is PSD for an appropriate choice of parameters $\underline{\alpha}$ .

In order to show that $M(G,\underline{\alpha})\succeq 0$ , it is sufficient to show that $N(G,\underline{\alpha})\succeq 0$ where,

where $\mathcal{G}_{ij}$ is the indicator for the presence of the edge $(i,j)$ . In words, $N$ is the matrix where the entry $\{a,b,c,d\}$ is proportional not to the indicator of whether $\{a,b,c,d\}$ is a clique, but to the indicator of whether $G$ has as a subgraph the bipartite clique with bipartitions $\{a,b\}$ and $\{c,d\}$ . It is easy to see that the matrix $M$ is obtained by dropping from $N$ the rows and columns corresponding $\{a,b\}\in\binom{n}{2}$ where $(a,b)\notin E(G)$ . Hence $N\succeq 0\Longrightarrow M\succeq 0$ .

Notice that $N$ is a random matrix whose entries depend on the edges in the random graph $G$ . At the risk of over-simplification, the approach of both the previous works [MPW15] and [DM15a] can be broadly summarized as follows:

The most significant challenge is to argue that (2.3) holds with high probability. In fact, the inequality only holds for the Deshpande-Montanari SDP solution with high probability for parameters $\alpha$ for which the objective value is $o(n^{1/3})$ .

where $\Pi_{0},\Pi_{1},\Pi_{2}$ are the projections to the spaces $V_{0},V_{1},V_{2}$ respectively. The eigenvalues are given by,

Deviation from Expectation

where $Q$ includes all multilinear entries, and $K$ includes all non-multilinear entries, i.e., entries $K(A,B)$ where $A\cap B\neq\emptyset$ . Formally,

The spectral norm of the matrix $Q$ over the eigenspaces $V_{0},V_{1},V_{2}$ is carefully bounded in [DM15a].

(Proposition 4.20, 4.25 in [DM15a]) With probability at least $1-O(n^{-4})$ , all of the following bounds hold:

Deshpande and Montanari fix $\alpha_{1}=\kappa$ , $\alpha_{2}=4\kappa^{2}$ , $\alpha_{3}=8\kappa^{3}$ and $\alpha_{4}=512\kappa^{4}$ for a parameter $\kappa$ . Using Proposition 2.3 and Lemma 2.4, the above matrix inequality becomes,

which can be shown to hold for $\kappa\ll n^{-2/3}$ . Eventually, it is necessary to show (2.3), which is stronger than $H_{22}\succeq 0$ . This is again achieved by showing bounds on the spectra of $H_{11}^{-1}$ and $H_{12}$ . We refer the reader to [DM15a] for more details of the arguments.

3 Problematic Subspace

The SDP solution described above ceases to be PSD at $\kappa\simeq n^{-2/3}$ which corresponds to an objective value of $O(n^{1/3})$ . The specific obstruction to $H_{22}\succeq 0$ arises out of (2.10). More precisely, the bottom $2\times 2$ principal minor which yields the constraint,

In fact, we identify a specific subspace $W$ that is problematic for the [DM15a] solution. To describe the subspace, let us fix some notation. Define the random variable $A_{ij}$ to be $-1$ if $(i,j)\not\in E$ , and $+1$ otherwise. We follow the convention that $A_{ii}=0$ .

This is an immediate observation from the various matrix norm bounds in [DM15a] (specifically Lemma A.2, Lemma A.3 and Observation A.5). We defer the detailed proof to Appendix A.1. ∎

Since $\lVert\Pi_{2}Q\Pi_{1}\rVert\gg\alpha_{4}\overline{n}$ , the above lemma implies that all the vectors with large singular values for $Q$ are within the subspace $W$ . Furthermore, we will show the following lemma which clearly articulates that $W$ is the sole obstruction to $H_{22}\succeq 0$ .

It is sufficient to show that $B_{W^{\perp}},B_{W}$ and $B_{K}\succeq 0$ . Using Proposition 2.3 and (2.9), $B_{K}\succeq(\lambda_{0}-\alpha_{3}\overline{n}^{1/2})\Pi_{0}+(\lambda_{1}-\alpha_{3}\overline{n}^{1/2})\Pi_{1}+(\lambda_{2}-\alpha_{3}\overline{n}^{1/2})\Pi_{2}\succeq 0$ when condidition (2.11) holds. Using Proposition 2.3, Lemma 2.4 and Lemma 2.5 we can write,

which is PSD given the bounds on $\lambda_{1},\lambda_{2},\lambda_{3}$ in conditions (2.12) and (2.13). To see this, one shows that all the $2\times 2$ principal minors are PSD.

An immediate corollary of the proof of the above lemma is the following.

Under the hypothesis of Lemma 2.6, with probability $1-O(n^{-4})$ ,

4 The Corrected Witness

Suppose we have an unconstrained matrix $M$ that we wish to modify as little as possible so as to ensure $M\succeq 0$ . Given a test vector $w$ so that $w^{T}Mw<0$ , the natural update to make is to take $M^{\prime}=M+\beta\cdot ww^{T}$ for a suitably chosen $\beta$ . This would suggest creating a new SDP solution by setting $H_{22}^{\prime}=H_{22}+\beta\sum_{i\in[n]}a_{i}a_{i}^{T}$ .

Unfortunately, the SOS SDP relaxation has certain hard constraints, namely that the non-clique entries are fixed at zero. Moreover, the entry $X_{S_{1},S_{2}}$ must depend only on $S_{1}\cup S_{2}$ . Setting the SDP solution matrix to $H_{22}+\beta\sum_{i\in[n]}a_{i}a_{i}^{T}$ would almost certainly violate both these constraints. It is thus natural to consider multiplicative updates to the entries of the matrix which clearly preserve the zero entries of the matrix.

where $\beta=\frac{1}{100\sqrt{n}\log n}$ . Then our SDP witness is the matrix $M^{\prime}$ , defined so that

where $\mathcal{P}$ is the projection that zeros out rows and columns corresponding to pairs $(i,j)\not\in E$ .

where $c_{|S|}(\underline{\alpha})$ is some factor chosen for each $|S|\in\{0,\ldots,4\}$ depending on the choice of $\underline{\alpha}$ , which we will set later to ensure that the final moments matrix is PSD.

For $\beta=\frac{1}{100\sqrt{n}\log n}$ , and $\alpha_{4}\leq\tfrac{1}{2}$ , with probability at least $1-O(n^{-5})$ , the solution $N^{\prime}(G,\underline{\alpha})$ does not violate any of the linear constraints of the planted clique SDP.

First, $M^{\prime}(S_{1},S_{2})=M(S_{1},S_{2})$ whenever $|S_{1}\cup S_{2}|<4$ so these entries satisfy the constraints of the SDP. If $\lvert S_{1}\cup S_{2}\rvert=4$ then $M^{\prime}(S_{1},S_{2})$ is given by,

Notice that $M^{\prime}(S_{1},S_{2})$ is non-zero only if $S_{1}\cup S_{2}$ is a clique, and it depends only on $S_{1}\cup S_{2}$ . Moreover, $\sum_{i\in[n]}\prod_{j\in S_{1}\cup S_{2}}A_{ij}$ is a sum over iid mean random variables and therefore satisfies,

A simple union bound over all subsets $S_{1}\cup S_{2}\in\binom{n}{4}$ shows that $M^{\prime}(S_{1},S_{2})\in$ for all of them with probability at least $1-O(n^{-5})$ . ∎

It now remains to verify that $N^{\prime}(G,\underline{\alpha})\succeq 0$ . We will do this by verifying the Schur complement conditions, as in [DM15a]. Analogous to the submatrix $H_{22}$ , one can consider the corresponding submatrix $H^{\prime}_{22}$ of $N^{\prime}$ . The expression for $H^{\prime}$ is as follows:

Here $D_{i}$ is the matrix with $a_{i}$ on the diagonal, and $K$ is the matrix corresponding to the non-multilinear entries (entries corresponding to monomials like $x_{a}^{2}x_{b}x_{c}$ ), and $J_{\binom{n}{2}}$ is the all- $1$ s matrix. The matrices $H_{12}$ and $H_{11}$ are unchanged, and so we must simply verify that $H_{22}^{\prime}\succeq H_{12}^{\top}H_{11}^{-1}H_{12}$ and that $H_{22}^{\prime}\succeq 0$ .

This concludes our proof overview. In Section 3, we verify the Schur complement conditions and prove our main result, and in Section 4 we give the random matrix concentration results upon which we rely throughout the proof.

Proof of the Main Result

In this section, we will demonstrate that $H^{\prime}_{22}\succeq 0$ , and that $H^{\prime}_{22}\succeq H_{12}^{\top}H_{11}^{-1}H_{12}$ . This will allow us to conclude that our solution matrix is PSD, and therefore is a feasible point for the degree-4 SOS relaxation.

Before we proceed further, it will be convenient to parametrize the choice of $\alpha_{1},\alpha_{2},\alpha_{3}$ and $\alpha_{4}$ . In particular, it will be useful to fix,

for two parameters $\gamma,\rho$ , which we will finally fix to $\gamma=\log^{4}n$ and $\rho=\log^{-20}n$ . For this setting of parameters, the eigenvalues $\lambda_{0},\lambda_{1},\lambda_{2}$ from Proposition 2.3 are bounded by,

Here we will make a first step towards verifying the Schur complement conditions of Lemma 2.2 by showing that $H^{\prime}_{22}\succeq 0$ . Specifically, we will show the following stronger claim.

For $\beta=\frac{1}{100\sqrt{n}\log{n}}$ and $\gamma=\log^{4}{n}$ , $\rho<\log^{-20}n$ , the following holds with probability at least $1-O(n^{-4})$ ,

Fix $\theta=\frac{\lVert Q\rVert^{2}}{\lambda_{1}}$ . By definition of $H^{\prime}_{22}$ , we have

Define $P_{W}=\sum_{i\in[n]}a_{i}a_{i}^{T}$ . We can apply Lemma 2.6 to the $H_{22}$ term and Corollary 2.7 for $H_{22}-K$ ,

Now we will appeal to a few matrix concentration bounds that we show in Section 4. First, with probability $1-O(n^{-5})$ , the vectors $\{a_{i}^{\otimes 2}\}$ are nearly orthogonal, and therefore form a well-conditioned basis for the subspace $W$ .

Also, the vectors $\{a_{i}^{\otimes 2}\}$ have negligible projection on to the eigenspaces $V_{0},V_{1}$ which implies that with overwhelming probability,

Finally, $W$ is an $n$ dimensional space. Each $D_{i}\Pi_{2}\Pi_{W}\Pi_{2}D_{i}$ has only $n$ non-zero singular values each of which is $O(1)$ . Moreover, multiplying on the left and right by $D_{i}$ acts as a random linear transformation/ random change of basis. Intuitively, this suggests that $\sum_{i}D_{i}\Pi_{2}\Pi_{W}\Pi_{2}D_{i}$ has $n^{2}$ eigenvalues all of which are roughly $O(1)$ . In fact, with probability $1-O(n^{-5})$ ,

By Lemma 2.4, with probability at least $1-O(n^{-4})$ , $\lVert Q\rVert\lesssim\alpha_{4}\overline{n}^{3/2}$ . Substituting this bound for $\theta=\frac{\lVert Q\rVert^{2}}{\lambda_{1}}$ along with (3.2), finishes the proof for our choice of parameters. The details are presented below for completeness.

Clearly $\lambda_{0}>\lambda_{1}>\lambda_{2}$ and

Towards bounding the eigenvalues of $H_{21}^{T}H_{11}^{-1}H_{12}$ , Deshpande and Montanari [DM15a] observe the following properties of $H_{21}$ with regards to the spaces $V_{0},V_{1}$ and $V_{2}$ .

Thus, the columns of $U$ are in $W\cup V_{1}$ . So we have that

In Lemma 4.5, we bound $\|\Pi_{01}\Pi_{W}\Pi_{01}\|\leq O(\frac{\log^{2}n}{n})$ . So we have that

where we have applied the inequality $a^{2}+b^{2}\geq 2ab$ . The conclusion follows by noting that $\|x^{T}\Pi_{2}H_{21}\|=\|x^{T}\Pi_{2}\Pi_{W}U\|\leq\|x^{T}\Pi_{2}\Pi_{W}\|\cdot\|U\|$ , and that by Lemma 3.2 $\|U\|\lesssim\alpha_{3}\overline{n}$ . ∎

We will bound $H_{21}H_{11}^{-1}H_{12}$ , as in our bounds on $H_{22}^{\prime}$ , by splitting the matrix up according to the eigenspaces $\Pi_{0},\Pi_{1},\Pi_{2}$ .

Let $c_{1},\ldots,c_{4}$ be as defined in (3.1). For the choice of $\underline{\alpha}$ in (3.1), we have that with probability $1-O(n^{-5})$ ,

If we set $y_{i}=A^{1/2}x_{i}$ then the inequality reduces to,

which is an immediate consequence of triangle inequality for $\lVert\cdot\rVert$ and Cauchy-Schwartz inequality. ∎

To simplify the calculations and make the dominant terms apparent, let us fix $\alpha_{1}=c_{1}/\sqrt{n}$ , $\alpha_{2}=c_{2}/n$ , $\alpha_{3}=c_{3}/n^{3/2}$ and $\alpha_{4}=c_{4}/n$ wherein each $c_{i}\in[1/\log^{200}n,1]$ . First, observe that $\frac{1}{\alpha_{1}}\gg\frac{1}{n(\alpha_{2}-2\alpha_{1}^{2})}$ for this setting of parameters.

For the terms $x_{A}^{T}H_{11}^{-1}x_{A}$ , $x_{B}^{T}H_{11}^{-1}x_{B}$ and $x_{C}^{T}H_{11}^{-1}x_{C}$ we can write,

The conclusion follows by grouping the projections, and taking the dominating terms as $n$ grows. ∎

For the choice of $\underline{\alpha}$ given in (3.1), we have that $H^{\prime}_{22}\succeq H_{12}^{T}H_{11}^{-1}H_{12}$ with probability $1-O(n^{-4})$ .

Recall that by Theorem 3.1 with probability at least $1-O(n^{-4})$ ,

By our choice of the parameters $\alpha_{1},\alpha_{2},\alpha_{3},\alpha_{4}$ , the conclusion of Theorem 3.4 implies that

as desired. The details of the calculation are spelled out below for the sake of completeness; we verify that the coefficient of each projector is non-negative.

5 Proof of Main Theorem

We finally have the components needed to prove Theorem 2.1.

First, we recall that independent of our choice of $\underline{\alpha}$ , the SDP solution defined in Definition 2.8 does not violate any of the linear constraints of (2.1), as shown in Proposition 2.10. To meet the program constraints, it remains to show that for the choice of $\underline{\alpha}$ given in (3.1), our solution is PSD.

The solution matrix $M^{\prime}(G,\underline{\alpha})$ is a principal submatrix of $N^{\prime}(G,\underline{\alpha})$ , and so $N^{\prime}\succeq 0$ implies $M^{\prime}\succeq 0$ . We prove that $N^{\prime}$ satisfies the Schur complement conditions from Lemma 2.2 with high probability. Observing that $H^{\prime}_{11}=H_{11}$ and $H_{12}^{\prime}=H_{12}$ , we apply Theorem 3.6, which states that for our choice of $\underline{\alpha}$ , $H_{22}^{\prime}\succeq H_{12}^{T}H_{11}^{-1}H_{21}$ . For the $H_{11}$ term, we apply the lower bound on the eigenvalues of $H_{11}$ given by [DM15a] (Lemma A.7), which state that so long as $\alpha_{1}-\alpha_{2}\gg\alpha_{2}n^{-1/2}$ and $\alpha_{2}-2\alpha_{1}^{2}\geq 0$ , we have $H_{11}\succeq 0$ with probability $1-O(n^{-5})$ . For our choice of $\alpha_{1},\alpha_{2}$ , we have

and so we may conclude that $H_{11}\succeq 0$ . Therefore by a union bound, the conditions of Lemma 2.2 are satisfied with probability $1-O(n^{-4})$ , and $N^{\prime}(G,\underline{\alpha})\succeq 0$ and so our solution satisfies the PSDness constraint.

Concentration of Projected Matrices

In this section, we give bounds on the spectra of random matrices that are part of the correction term. Though we are able to recycle many of the spectral bounds of Deshpande and Montanari [DM15a], in our modification to their witness, we introduce new matrices which also require description and norm bounds.

From $F$ , we form a new graph $F^{\prime}$ by identifying all the nodes with the same label; thus, the number of nodes in $F^{\prime}$ is the number of labels in $F$ . We then collapse the parallel edges in $F^{\prime}$ to form the graph $H$ ; since each labelled edge appears at least twice, the number of edges in $H$ is at most half that in $F$ . The number of nodes in $H$ (and thus labels in $F$ ) is at most the number of edges in $H$ plus the number of connected components; this is tight when $H$ is a forest. Thus the number of distinct labels in $F$ is at most $|E|/2+c$ , where $c$ is the number of components in $F$ . ∎

We begin with a lemma that shows that the $a_{i}$ are close to an orthogonal basis for $W$ .

By definition, the vectors $a_{1},\ldots,a_{n}$ form a basis for the subspace $W$ .

Let $\mathcal{R}$ be the matrix whose $i$ th row is $a_{i}$ . We will use matrix concentration to analyze the eigenvalues of $\mathcal{R}\mathcal{R}^{T}$ , which are identical to the nonzero eigenvalues of $P_{W}=\mathcal{R}^{T}\mathcal{R}$ .

The expression $\operatorname{Tr}(M^{k})$ is a sum over monomial products over variables $\{A_{ip}\}_{i,p\in[n]}$ , where each monomial product corresponds to a labelling $\mathcal{L}:F\to[n]$ of a graph $F$ . Each entry in $M_{ij}$ corresponds to a sum over links, where each link is a cycle of length $4$ , with the vertices $i,j$ on opposite ends of the cycle, and the necessarily distinct vertices $p,q$ are on the other opposite ends of a cycle. We will refer to $i,j$ as the center vertices and $p,q$ as the peripheral vertices of the link. Each edge $(u,v)$ of the link is weighted by $A_{uv}$ . Since $A_{ii}=0$ for all $i\in[n]$ , for every contributing labelling, it can never be the case that one of $p,q=i$ . Each monomial product in the summation $\operatorname{Tr}(M^{k})$ corresponds to a labelling $(F,\mathcal{L})$ of the graph $F$ , where $F$ is a cycle with $k$ links. $F$ has $4k$ edges, and in total it has $3k$ vertices.

We prove that any such contributing labelling $\mathcal{L}:F\to[n]$ has at most $3k/2+1$ unique vertex labels. We proceed by induction on $k$ , the length of the cycle. In the base case, we have a cycle on two links; by inspection no such cycle can have more than $5$ labels, and the base case holds.

Now, consider a cycle of length $k$ . If every label appears twice, then we are done, since there are $3k$ vertices in $F$ . Thus there must be a vertex that appears only once.

There can be no peripheral vertex whose label does not repeat, since the two center vertices neighboring a single peripheral vertex cannot have the same label in a contributing term, as $M(i,i)=0$ . Now, if there exists a center vertex $i$ whose label does not repeat, it must be that there is a matching between its $p,q$ neighbors so that every vertex is matched to a vertex of the same label; we identify these same-label vertices and remove $i$ and two of its neighbors from the graph, leaving us with a cycle of length $k-1$ , having removed at most one label from the graph. The induction hypothesis now applies, and we have a total of at most $3(k-1)/2+2\leq 3k/2+1$ labels, as desired.

The following lemma allows us to approximate the projector to $V_{0}\cup V_{1}$ by a matrix that is easy to describe; we will use this matrix as an approximation to the projector in later proofs.

Now, let $P_{01}=\sum_{i}v_{i}v_{i}^{T}$ ; we can explicitly calculate the entries of $P_{01}$ ,

We will require the fact that $W$ lies mostly outside of $V_{0}\cup V_{1}$ , which we prove in the following lemma.

With probability at least $1-O(n^{-\omega(\log n)})$ ,

The only contributing terms correspond to those for which every edge variable in the product has even multiplicity. Each contributing term is a connected graph and has $4k$ edges and at most $5k$ vertices where every labeled edge appears twice, so we may apply Proposition 4.2 to conclude that there are at most $2k+1$ labels in any such cycle. We thus have that

and applying Lemma 4.1, we conclude that $\|M\|\lesssim\frac{\log^{2}n}{n}$ with probability $1-O(n^{-\omega(\log n)})$ , as desired. ∎

We combine the above lemmas to bound the norm of one final matrix that arises in the computations in Theorem 3.1.

where to obtain the second line we have applied Cauchy-Schwarz, to obtain the third line we have used the fact that $a^{2}+b^{2}\geq 2ab$ , and to obtain the final line we have used the fact that $\|D_{i}x\|=\|x\|$ .

Now, the second term is $O(\log^{2}n)\cdot\|x\|^{2}$ with overwhelming probability by Lemma 4.5. It remains to bound the first term. To this end, we apply Lemma 4.3 to replace $\Pi_{W}$ with $\tfrac{1+o(1)}{n^{2}}\cdot P_{W}=\tfrac{1+o(1)}{n^{2}}\cdot\sum_{i}a_{i}a_{i}^{T}$ . Let $M=\frac{1}{n^{2}}\cdot\sum_{i}D_{i}P_{W}D_{i}$ . An entry of $M$ has the form

Acknowledgements

We thank Satish Rao for many helpful conversations.

We also greatfully acknowledge the comments of anonymous reviewers in helping us improve the manuscript.

References

Appendix A Matrix Norm Bounds from Deshpande and Montanari

In this appendix, we give for completeness a list of the bounds proven by Deshpande and Montanari [DM15a] that were not included in the body above out of space or expository considerations.

With probability at least $1-6n^{-5}$ , for each $\eta\leq 2$ and for each $\nu\leq\binom{4}{\eta}$ ,

Deshpande and Montanari use the trace power method to bound the norm of $Q$ by bounding the norms of the $J_{\eta,\nu}$ individually. Some of the $J_{\eta,\nu}$ matrices have Wigner-like behavior.

With probability $1-O(n^{-5})$ , we have that for each $(\eta,\nu)\in\{(2,1),(2,6),(3,\cdot),(4,1)\}$ ,

A select few of the $J_{\eta,\nu}$ have larger eigenvalues.

With probability $1-O(n^{-4})$ , we have that for each $(\eta,\nu)\in\{(1,\cdot),(2,2),(2,3),(2,4),(2,5)\}$ ,

We also give a short proof of an observation of Deshpande and Montanari, which states that some of the $J_{\eta,\nu}$ vanish when projected to $V_{2}$ :

Let $\Pi_{2}$ be the projector to $V_{2}$ . Then always,

and so by the characterization of $V_{1}$ from Proposition 2.3 the vector $v^{\prime}\in V_{1}$ . The conclusion follows.

Finally, we use a bound on the norm of the matrix $K$ , which is the difference of $H_{2,2}$ and the non-multilinear entries.

We also require bounds on the matrices used in the Schur complement steps. The bounds of Deshpande and Montanari suffice for us, since we do not modify moments of order less than $4$ .

We prove Lemma 2.5, which follows almost immediately from the bounds of [DM15a].

Using the matrices from Definition A.1 and Observation A.5, we have that

where we have used the fact that the columns of $J_{2,4}$ and $J_{2,2}$ lie in $W$ . We thus have

Now, we prove that the trace power method works, for completeness.

The proof follows from an application of Markov’s inequality. We have that for even $k$ ,

where we have applied Stirling’s approximation in the last step. Choosing $k=O(\log n)$ and $t=O\left(\eta^{-1/k}\cdot\gamma\cdot\log n\cdot n^{\alpha}\right)$ completes the proof. ∎