Tight Lower Bounds for Planted Clique in the Degree-4 SOS Program

Prasad Raghavendra, Tselil Schramm

Introduction

In the Maximum Clique problem, the input consists of a graph G=(V,E)G=(V,E) and the goal is to find the largest subset SS of vertices all of which are connected to each other. The Maximum Clique problem is NP-hard to approximate within a n1εn^{1-\varepsilon}-factor for all ε>0\varepsilon>0 [Hås96, Kho01].

The works of [FK08, BV09] show that, if one were able to efficiently calculate the injective tensor norm of a certain random order-mm tensor, then by extending the spectral algorithm of [AKS98] one would have a polynomial-time algorithm for k>n1/mk>n^{1/m}. However, there is no known algorithm that efficiently computes the injective tensor norm of an order-mm tensor; in fact computing the inective tensor norm is hard to approximate in the general case [HM13].

This line of work brings to the fore the question: can a d=O(1)d=O(1)-degree SOS relaxation solve the planted clique problem for some k<nk<\sqrt{n}? While lower bounds are known for Lovász-Schrijver SDP relaxations for planted clique [FK00, FK03], SOS relaxations can in general be much more powerful than Lovász-Schrijver relaxations. For example, while there are instances of unique games that are hard for poly(loglogn)\operatorname{poly}(\log\log n)-rounds of the Lovász-Schrijver SDP hierarchy [KS09, RS09], recent work has shown that these instances are solved by degree-88 SOS hierarchy [BBH+12].

Moreover, even the degree-44 SOS relaxation proves to be surprisingly powerful in a few applications:

First, the work of Barak et al. [BBH+12] shows that a degree 44 SOS relaxation can certify 2to42-to-4 hypercontractivity of low degree polynomials over the hypercube. This argument is the reason that hard instances for Lovász-Schriver and other SDP hierarchies constructed via the noisy hypercube gadgets are easily refuted by the SOS hierarchy.

Second, a degree-44 SOS relaxation can certify that the 22-to-44 norm of a random subspace of dimension at most o(n)o(\sqrt{n}) is bounded by a constant (with high probability over the choice of the subspace) [BBH+12]. This average-case problem has superficial similarities to the planted clique problem.

In this work, we make modest progress towards a lower bound for SOS relaxations of planted clique by obtaining a nearly tight lower bound for the degree-44 SOS relaxation (corresponding to two rounds, r=2r=2). More precisely, our main result is the following.

Note that by the work of [AKS98], this result is tight up to logarithmic factors. In an independent work, Hopkins, Kothari and Potechin [HKP15] have obtained a similar result.

We use the symbol \succeq to denote the PSD ordering on matrices, saying that A0A\succeq 0 if AA is PSD and that ABA\succeq B if AB0A-B\succeq 0. When we wish to hide constant factors for clarity, we use aba\lesssim b to denote that aCba\leq C\cdot b for some constant CC.

In our notation, we at times closely follow the notation of [DM15a], as our paper builds on their results and we recycle many of their bounds.

2 Organization

Preliminaries and Proof Overview

In this section, we describe the degree-4 SOS relaxation for the max-clique SDP and give background on the Deshpande-Montanari witness. We then describe our own modified witness, and give an overview of the proof that our witness is feasible (the difficult part being showing that our witness is PSD). The full proof of feasibility is deferred to Section 3.

It is instructive to think of the variable XSX_{S} as a pseudoexpectation of the product of indicator variables, or a pseudomoment:

We obtain Theorem 2.1 by constructing a point, or witness, for each GG(n,12)G\sim G(n,\tfrac{1}{2}), then proving that the point is feasible with high probability. We defer the description of our witness to Definition 2.8 and Definition 2.9, as we spend Section 2.2 and Section 2.3 motivating our construction; however the curious reader may skip ahead to Definition 2.9 which does not require the knowledge of additional notation.

2 Deshpande-Montanari Witness

This SDP solution assigns to each clique of size 1,,d1,\ldots,d, a value that depends only on its size (in our case, d=4d=4). In essence, their solution takes advantage of the independence of the G(n,p)G(n,p) instance. The motivating observation is that the variable XSX_{S} can be thought of as a pseudoexpectation of the indicator that SS is a subclique of the planted clique. The idea is then to make this pseudoexpectation of the indicator consistent with the true expectation under the distribution where a clique of size kk is planted uniformly at random within the instance of G(n,p)G(n,p). Thus, every vertex is in the clique “with uniform probability:”

Then, the same principle is applied to edges, traingles, and 44-cliques, so that

This is the general idea of the SDP solution of [DM15a]. More formally, the SDP solution in [DM15a] is specified by four parameters α={αi}i=14\underline{\alpha}=\{\alpha_{i}\}_{i=1}^{4} as,

where for a set of vertices AVA\subseteq V, GA\mathcal{G}_{A} is the indicator that the subgraph induced on AA is a clique. The parameters {αi}i\{\alpha_{i}\}_{i\in} determine the value of the objective function, and the feasibility of the solution. As a convention, we will define α0=1\alpha_{0}=1.

It is easy to check that the solution M(G,α)M(G,\underline{\alpha}) satisfies all the linear constraints of the SOS program (2.1), since it assigns non-zero values only to cliques in GG. The key difficulty is in showing that the matrix MM is PSD for an appropriate choice of parameters α\underline{\alpha}.

In order to show that M(G,α)0M(G,\underline{\alpha})\succeq 0, it is sufficient to show that N(G,α)0N(G,\underline{\alpha})\succeq 0 where,

where Gij\mathcal{G}_{ij} is the indicator for the presence of the edge (i,j)(i,j). In words, NN is the matrix where the entry {a,b,c,d}\{a,b,c,d\} is proportional not to the indicator of whether {a,b,c,d}\{a,b,c,d\} is a clique, but to the indicator of whether GG has as a subgraph the bipartite clique with bipartitions {a,b}\{a,b\} and {c,d}\{c,d\}. It is easy to see that the matrix MM is obtained by dropping from NN the rows and columns corresponding {a,b}(n2)\{a,b\}\in\binom{n}{2} where (a,b)E(G)(a,b)\notin E(G). Hence N0M0N\succeq 0\Longrightarrow M\succeq 0.

Notice that NN is a random matrix whose entries depend on the edges in the random graph GG. At the risk of over-simplification, the approach of both the previous works [MPW15] and [DM15a] can be broadly summarized as follows:

The most significant challenge is to argue that (2.3) holds with high probability. In fact, the inequality only holds for the Deshpande-Montanari SDP solution with high probability for parameters α\alpha for which the objective value is o(n1/3)o(n^{1/3}).

where Π0,Π1,Π2\Pi_{0},\Pi_{1},\Pi_{2} are the projections to the spaces V0,V1,V2V_{0},V_{1},V_{2} respectively. The eigenvalues are given by,

Deviation from Expectation

where QQ includes all multilinear entries, and KK includes all non-multilinear entries, i.e., entries K(A,B)K(A,B) where ABA\cap B\neq\emptyset. Formally,

The spectral norm of the matrix QQ over the eigenspaces V0,V1,V2V_{0},V_{1},V_{2} is carefully bounded in [DM15a].

(Proposition 4.20, 4.25 in [DM15a]) With probability at least 1O(n4)1-O(n^{-4}), all of the following bounds hold:

Deshpande and Montanari fix α1=κ\alpha_{1}=\kappa, α2=4κ2\alpha_{2}=4\kappa^{2}, α3=8κ3\alpha_{3}=8\kappa^{3} and α4=512κ4\alpha_{4}=512\kappa^{4} for a parameter κ\kappa. Using Proposition 2.3 and Lemma 2.4, the above matrix inequality becomes,

which can be shown to hold for κn2/3\kappa\ll n^{-2/3}. Eventually, it is necessary to show (2.3), which is stronger than H220H_{22}\succeq 0. This is again achieved by showing bounds on the spectra of H111H_{11}^{-1} and H12H_{12}. We refer the reader to [DM15a] for more details of the arguments.

3 Problematic Subspace

The SDP solution described above ceases to be PSD at κn2/3\kappa\simeq n^{-2/3} which corresponds to an objective value of O(n1/3)O(n^{1/3}). The specific obstruction to H220H_{22}\succeq 0 arises out of (2.10). More precisely, the bottom 2×22\times 2 principal minor which yields the constraint,

In fact, we identify a specific subspace WW that is problematic for the [DM15a] solution. To describe the subspace, let us fix some notation. Define the random variable AijA_{ij} to be 1-1 if (i,j)∉E(i,j)\not\in E, and +1+1 otherwise. We follow the convention that Aii=0A_{ii}=0.

This is an immediate observation from the various matrix norm bounds in [DM15a] (specifically Lemma A.2, Lemma A.3 and Observation A.5). We defer the detailed proof to Appendix A.1. ∎

Since Π2QΠ1α4n\lVert\Pi_{2}Q\Pi_{1}\rVert\gg\alpha_{4}\overline{n}, the above lemma implies that all the vectors with large singular values for QQ are within the subspace WW. Furthermore, we will show the following lemma which clearly articulates that WW is the sole obstruction to H220H_{22}\succeq 0.

It is sufficient to show that BW,BWB_{W^{\perp}},B_{W} and BK0B_{K}\succeq 0. Using Proposition 2.3 and (2.9), BK(λ0α3n1/2)Π0+(λ1α3n1/2)Π1+(λ2α3n1/2)Π20B_{K}\succeq(\lambda_{0}-\alpha_{3}\overline{n}^{1/2})\Pi_{0}+(\lambda_{1}-\alpha_{3}\overline{n}^{1/2})\Pi_{1}+(\lambda_{2}-\alpha_{3}\overline{n}^{1/2})\Pi_{2}\succeq 0 when condidition (2.11) holds. Using Proposition 2.3, Lemma 2.4 and Lemma 2.5 we can write,

which is PSD given the bounds on λ1,λ2,λ3\lambda_{1},\lambda_{2},\lambda_{3} in conditions (2.12) and (2.13). To see this, one shows that all the 2×22\times 2 principal minors are PSD.

An immediate corollary of the proof of the above lemma is the following.

Under the hypothesis of Lemma 2.6, with probability 1O(n4)1-O(n^{-4}),

4 The Corrected Witness

Suppose we have an unconstrained matrix MM that we wish to modify as little as possible so as to ensure M0M\succeq 0. Given a test vector ww so that wTMw<0w^{T}Mw<0, the natural update to make is to take M=M+βwwTM^{\prime}=M+\beta\cdot ww^{T} for a suitably chosen β\beta. This would suggest creating a new SDP solution by setting H22=H22+βi[n]aiaiTH_{22}^{\prime}=H_{22}+\beta\sum_{i\in[n]}a_{i}a_{i}^{T}.

Unfortunately, the SOS SDP relaxation has certain hard constraints, namely that the non-clique entries are fixed at zero. Moreover, the entry XS1,S2X_{S_{1},S_{2}} must depend only on S1S2S_{1}\cup S_{2}. Setting the SDP solution matrix to H22+βi[n]aiaiTH_{22}+\beta\sum_{i\in[n]}a_{i}a_{i}^{T} would almost certainly violate both these constraints. It is thus natural to consider multiplicative updates to the entries of the matrix which clearly preserve the zero entries of the matrix.

where β=1100nlogn\beta=\frac{1}{100\sqrt{n}\log n}. Then our SDP witness is the matrix MM^{\prime}, defined so that

where P\mathcal{P} is the projection that zeros out rows and columns corresponding to pairs (i,j)∉E(i,j)\not\in E.

where cS(α)c_{|S|}(\underline{\alpha}) is some factor chosen for each S{0,,4}|S|\in\{0,\ldots,4\} depending on the choice of α\underline{\alpha}, which we will set later to ensure that the final moments matrix is PSD.

For β=1100nlogn\beta=\frac{1}{100\sqrt{n}\log n}, and α412\alpha_{4}\leq\tfrac{1}{2}, with probability at least 1O(n5)1-O(n^{-5}), the solution N(G,α)N^{\prime}(G,\underline{\alpha}) does not violate any of the linear constraints of the planted clique SDP.

First, M(S1,S2)=M(S1,S2)M^{\prime}(S_{1},S_{2})=M(S_{1},S_{2}) whenever S1S2<4|S_{1}\cup S_{2}|<4 so these entries satisfy the constraints of the SDP. If S1S2=4\lvert S_{1}\cup S_{2}\rvert=4 then M(S1,S2)M^{\prime}(S_{1},S_{2}) is given by,

Notice that M(S1,S2)M^{\prime}(S_{1},S_{2}) is non-zero only if S1S2S_{1}\cup S_{2} is a clique, and it depends only on S1S2S_{1}\cup S_{2}. Moreover, i[n]jS1S2Aij\sum_{i\in[n]}\prod_{j\in S_{1}\cup S_{2}}A_{ij} is a sum over iid mean random variables and therefore satisfies,

A simple union bound over all subsets S1S2(n4)S_{1}\cup S_{2}\in\binom{n}{4} shows that M(S1,S2)M^{\prime}(S_{1},S_{2})\in for all of them with probability at least 1O(n5)1-O(n^{-5}). ∎

It now remains to verify that N(G,α)0N^{\prime}(G,\underline{\alpha})\succeq 0. We will do this by verifying the Schur complement conditions, as in [DM15a]. Analogous to the submatrix H22H_{22}, one can consider the corresponding submatrix H22H^{\prime}_{22} of NN^{\prime}. The expression for HH^{\prime} is as follows:

Here DiD_{i} is the matrix with aia_{i} on the diagonal, and KK is the matrix corresponding to the non-multilinear entries (entries corresponding to monomials like xa2xbxcx_{a}^{2}x_{b}x_{c}), and J(n2)J_{\binom{n}{2}} is the all-11s matrix. The matrices H12H_{12} and H11H_{11} are unchanged, and so we must simply verify that H22H12H111H12H_{22}^{\prime}\succeq H_{12}^{\top}H_{11}^{-1}H_{12} and that H220H_{22}^{\prime}\succeq 0.

This concludes our proof overview. In Section 3, we verify the Schur complement conditions and prove our main result, and in Section 4 we give the random matrix concentration results upon which we rely throughout the proof.

Proof of the Main Result

In this section, we will demonstrate that H220H^{\prime}_{22}\succeq 0, and that H22H12H111H12H^{\prime}_{22}\succeq H_{12}^{\top}H_{11}^{-1}H_{12}. This will allow us to conclude that our solution matrix is PSD, and therefore is a feasible point for the degree-4 SOS relaxation.

Before we proceed further, it will be convenient to parametrize the choice of α1,α2,α3\alpha_{1},\alpha_{2},\alpha_{3} and α4\alpha_{4}. In particular, it will be useful to fix,

for two parameters γ,ρ\gamma,\rho, which we will finally fix to γ=log4n\gamma=\log^{4}n and ρ=log20n\rho=\log^{-20}n. For this setting of parameters, the eigenvalues λ0,λ1,λ2\lambda_{0},\lambda_{1},\lambda_{2} from Proposition 2.3 are bounded by,

Here we will make a first step towards verifying the Schur complement conditions of Lemma 2.2 by showing that H220H^{\prime}_{22}\succeq 0. Specifically, we will show the following stronger claim.

For β=1100nlogn\beta=\frac{1}{100\sqrt{n}\log{n}} and γ=log4n\gamma=\log^{4}{n}, ρ<log20n\rho<\log^{-20}n, the following holds with probability at least 1O(n4)1-O(n^{-4}),

Fix θ=Q2λ1\theta=\frac{\lVert Q\rVert^{2}}{\lambda_{1}}. By definition of H22H^{\prime}_{22}, we have

Define PW=i[n]aiaiTP_{W}=\sum_{i\in[n]}a_{i}a_{i}^{T}. We can apply Lemma 2.6 to the H22H_{22} term and Corollary 2.7 for H22KH_{22}-K,

Now we will appeal to a few matrix concentration bounds that we show in Section 4. First, with probability 1O(n5)1-O(n^{-5}), the vectors {ai2}\{a_{i}^{\otimes 2}\} are nearly orthogonal, and therefore form a well-conditioned basis for the subspace WW.

Also, the vectors {ai2}\{a_{i}^{\otimes 2}\} have negligible projection on to the eigenspaces V0,V1V_{0},V_{1} which implies that with overwhelming probability,

Finally, WW is an nn dimensional space. Each DiΠ2ΠWΠ2DiD_{i}\Pi_{2}\Pi_{W}\Pi_{2}D_{i} has only nn non-zero singular values each of which is O(1)O(1). Moreover, multiplying on the left and right by DiD_{i} acts as a random linear transformation/ random change of basis. Intuitively, this suggests that iDiΠ2ΠWΠ2Di\sum_{i}D_{i}\Pi_{2}\Pi_{W}\Pi_{2}D_{i} has n2n^{2} eigenvalues all of which are roughly O(1)O(1). In fact, with probability 1O(n5)1-O(n^{-5}),

By Lemma 2.4, with probability at least 1O(n4)1-O(n^{-4}), Qα4n3/2\lVert Q\rVert\lesssim\alpha_{4}\overline{n}^{3/2}. Substituting this bound for θ=Q2λ1\theta=\frac{\lVert Q\rVert^{2}}{\lambda_{1}} along with (3.2), finishes the proof for our choice of parameters. The details are presented below for completeness.

Clearly λ0>λ1>λ2\lambda_{0}>\lambda_{1}>\lambda_{2} and

Towards bounding the eigenvalues of H21TH111H12H_{21}^{T}H_{11}^{-1}H_{12}, Deshpande and Montanari [DM15a] observe the following properties of H21H_{21} with regards to the spaces V0,V1V_{0},V_{1} and V2V_{2}.

Thus, the columns of UU are in WV1W\cup V_{1}. So we have that

In Lemma 4.5, we bound Π01ΠWΠ01O(log2nn)\|\Pi_{01}\Pi_{W}\Pi_{01}\|\leq O(\frac{\log^{2}n}{n}). So we have that

where we have applied the inequality a2+b22aba^{2}+b^{2}\geq 2ab. The conclusion follows by noting that xTΠ2H21=xTΠ2ΠWUxTΠ2ΠWU\|x^{T}\Pi_{2}H_{21}\|=\|x^{T}\Pi_{2}\Pi_{W}U\|\leq\|x^{T}\Pi_{2}\Pi_{W}\|\cdot\|U\|, and that by Lemma 3.2 Uα3n\|U\|\lesssim\alpha_{3}\overline{n}. ∎

We will bound H21H111H12H_{21}H_{11}^{-1}H_{12}, as in our bounds on H22H_{22}^{\prime}, by splitting the matrix up according to the eigenspaces Π0,Π1,Π2\Pi_{0},\Pi_{1},\Pi_{2}.

Let c1,,c4c_{1},\ldots,c_{4} be as defined in (3.1). For the choice of α\underline{\alpha} in (3.1), we have that with probability 1O(n5)1-O(n^{-5}),

If we set yi=A1/2xiy_{i}=A^{1/2}x_{i} then the inequality reduces to,

which is an immediate consequence of triangle inequality for \lVert\cdot\rVert and Cauchy-Schwartz inequality. ∎

To simplify the calculations and make the dominant terms apparent, let us fix α1=c1/n\alpha_{1}=c_{1}/\sqrt{n}, α2=c2/n\alpha_{2}=c_{2}/n, α3=c3/n3/2\alpha_{3}=c_{3}/n^{3/2} and α4=c4/n\alpha_{4}=c_{4}/n wherein each ci[1/log200n,1]c_{i}\in[1/\log^{200}n,1]. First, observe that 1α11n(α22α12)\frac{1}{\alpha_{1}}\gg\frac{1}{n(\alpha_{2}-2\alpha_{1}^{2})} for this setting of parameters.

For the terms xATH111xAx_{A}^{T}H_{11}^{-1}x_{A}, xBTH111xBx_{B}^{T}H_{11}^{-1}x_{B} and xCTH111xCx_{C}^{T}H_{11}^{-1}x_{C} we can write,

The conclusion follows by grouping the projections, and taking the dominating terms as nn grows. ∎

For the choice of α\underline{\alpha} given in (3.1), we have that H22H12TH111H12H^{\prime}_{22}\succeq H_{12}^{T}H_{11}^{-1}H_{12} with probability 1O(n4)1-O(n^{-4}).

Recall that by Theorem 3.1 with probability at least 1O(n4)1-O(n^{-4}),

By our choice of the parameters α1,α2,α3,α4\alpha_{1},\alpha_{2},\alpha_{3},\alpha_{4}, the conclusion of Theorem 3.4 implies that

as desired. The details of the calculation are spelled out below for the sake of completeness; we verify that the coefficient of each projector is non-negative.

5 Proof of Main Theorem

We finally have the components needed to prove Theorem 2.1.

First, we recall that independent of our choice of α\underline{\alpha}, the SDP solution defined in Definition 2.8 does not violate any of the linear constraints of (2.1), as shown in Proposition 2.10. To meet the program constraints, it remains to show that for the choice of α\underline{\alpha} given in (3.1), our solution is PSD.

The solution matrix M(G,α)M^{\prime}(G,\underline{\alpha}) is a principal submatrix of N(G,α)N^{\prime}(G,\underline{\alpha}), and so N0N^{\prime}\succeq 0 implies M0M^{\prime}\succeq 0. We prove that NN^{\prime} satisfies the Schur complement conditions from Lemma 2.2 with high probability. Observing that H11=H11H^{\prime}_{11}=H_{11} and H12=H12H_{12}^{\prime}=H_{12}, we apply Theorem 3.6, which states that for our choice of α\underline{\alpha}, H22H12TH111H21H_{22}^{\prime}\succeq H_{12}^{T}H_{11}^{-1}H_{21}. For the H11H_{11} term, we apply the lower bound on the eigenvalues of H11H_{11} given by [DM15a] (Lemma A.7), which state that so long as α1α2α2n1/2\alpha_{1}-\alpha_{2}\gg\alpha_{2}n^{-1/2} and α22α120\alpha_{2}-2\alpha_{1}^{2}\geq 0, we have H110H_{11}\succeq 0 with probability 1O(n5)1-O(n^{-5}). For our choice of α1,α2\alpha_{1},\alpha_{2}, we have

and so we may conclude that H110H_{11}\succeq 0. Therefore by a union bound, the conditions of Lemma 2.2 are satisfied with probability 1O(n4)1-O(n^{-4}), and N(G,α)0N^{\prime}(G,\underline{\alpha})\succeq 0 and so our solution satisfies the PSDness constraint.

Concentration of Projected Matrices

In this section, we give bounds on the spectra of random matrices that are part of the correction term. Though we are able to recycle many of the spectral bounds of Deshpande and Montanari [DM15a], in our modification to their witness, we introduce new matrices which also require description and norm bounds.

From FF, we form a new graph FF^{\prime} by identifying all the nodes with the same label; thus, the number of nodes in FF^{\prime} is the number of labels in FF. We then collapse the parallel edges in FF^{\prime} to form the graph HH; since each labelled edge appears at least twice, the number of edges in HH is at most half that in FF. The number of nodes in HH (and thus labels in FF) is at most the number of edges in HH plus the number of connected components; this is tight when HH is a forest. Thus the number of distinct labels in FF is at most E/2+c|E|/2+c, where cc is the number of components in FF. ∎

We begin with a lemma that shows that the aia_{i} are close to an orthogonal basis for WW.

By definition, the vectors a1,,ana_{1},\ldots,a_{n} form a basis for the subspace WW.

Let R\mathcal{R} be the matrix whose iith row is aia_{i}. We will use matrix concentration to analyze the eigenvalues of RRT\mathcal{R}\mathcal{R}^{T}, which are identical to the nonzero eigenvalues of PW=RTRP_{W}=\mathcal{R}^{T}\mathcal{R}.

The expression Tr(Mk)\operatorname{Tr}(M^{k}) is a sum over monomial products over variables {Aip}i,p[n]\{A_{ip}\}_{i,p\in[n]}, where each monomial product corresponds to a labelling L:F[n]\mathcal{L}:F\to[n] of a graph FF. Each entry in MijM_{ij} corresponds to a sum over links, where each link is a cycle of length 44, with the vertices i,ji,j on opposite ends of the cycle, and the necessarily distinct vertices p,qp,q are on the other opposite ends of a cycle. We will refer to i,ji,j as the center vertices and p,qp,q as the peripheral vertices of the link. Each edge (u,v)(u,v) of the link is weighted by AuvA_{uv}. Since Aii=0A_{ii}=0 for all i[n]i\in[n], for every contributing labelling, it can never be the case that one of p,q=ip,q=i. Each monomial product in the summation Tr(Mk)\operatorname{Tr}(M^{k}) corresponds to a labelling (F,L)(F,\mathcal{L}) of the graph FF, where FF is a cycle with kk links. FF has 4k4k edges, and in total it has 3k3k vertices.

We prove that any such contributing labelling L:F[n]\mathcal{L}:F\to[n] has at most 3k/2+13k/2+1 unique vertex labels. We proceed by induction on kk, the length of the cycle. In the base case, we have a cycle on two links; by inspection no such cycle can have more than 55 labels, and the base case holds.

Now, consider a cycle of length kk. If every label appears twice, then we are done, since there are 3k3k vertices in FF. Thus there must be a vertex that appears only once.

There can be no peripheral vertex whose label does not repeat, since the two center vertices neighboring a single peripheral vertex cannot have the same label in a contributing term, as M(i,i)=0M(i,i)=0. Now, if there exists a center vertex ii whose label does not repeat, it must be that there is a matching between its p,qp,q neighbors so that every vertex is matched to a vertex of the same label; we identify these same-label vertices and remove ii and two of its neighbors from the graph, leaving us with a cycle of length k1k-1, having removed at most one label from the graph. The induction hypothesis now applies, and we have a total of at most 3(k1)/2+23k/2+13(k-1)/2+2\leq 3k/2+1 labels, as desired.

The following lemma allows us to approximate the projector to V0V1V_{0}\cup V_{1}by a matrix that is easy to describe; we will use this matrix as an approximation to the projector in later proofs.

Now, let P01=iviviTP_{01}=\sum_{i}v_{i}v_{i}^{T}; we can explicitly calculate the entries of P01P_{01},

We will require the fact that WW lies mostly outside of V0V1V_{0}\cup V_{1}, which we prove in the following lemma.

With probability at least 1O(nω(logn))1-O(n^{-\omega(\log n)}),

The only contributing terms correspond to those for which every edge variable in the product has even multiplicity. Each contributing term is a connected graph and has 4k4k edges and at most 5k5k vertices where every labeled edge appears twice, so we may apply Proposition 4.2 to conclude that there are at most 2k+12k+1 labels in any such cycle. We thus have that

and applying Lemma 4.1, we conclude that Mlog2nn\|M\|\lesssim\frac{\log^{2}n}{n} with probability 1O(nω(logn))1-O(n^{-\omega(\log n)}), as desired. ∎

We combine the above lemmas to bound the norm of one final matrix that arises in the computations in Theorem 3.1.

where to obtain the second line we have applied Cauchy-Schwarz, to obtain the third line we have used the fact that a2+b22aba^{2}+b^{2}\geq 2ab, and to obtain the final line we have used the fact that Dix=x\|D_{i}x\|=\|x\|.

Now, the second term is O(log2n)x2O(\log^{2}n)\cdot\|x\|^{2} with overwhelming probability by Lemma 4.5. It remains to bound the first term. To this end, we apply Lemma 4.3 to replace ΠW\Pi_{W} with 1+o(1)n2PW=1+o(1)n2iaiaiT\tfrac{1+o(1)}{n^{2}}\cdot P_{W}=\tfrac{1+o(1)}{n^{2}}\cdot\sum_{i}a_{i}a_{i}^{T}. Let M=1n2iDiPWDiM=\frac{1}{n^{2}}\cdot\sum_{i}D_{i}P_{W}D_{i}. An entry of MM has the form

Acknowledgements

We thank Satish Rao for many helpful conversations.

We also greatfully acknowledge the comments of anonymous reviewers in helping us improve the manuscript.

References

Appendix A Matrix Norm Bounds from Deshpande and Montanari

In this appendix, we give for completeness a list of the bounds proven by Deshpande and Montanari [DM15a] that were not included in the body above out of space or expository considerations.

With probability at least 16n51-6n^{-5}, for each η2\eta\leq 2 and for each ν(4η)\nu\leq\binom{4}{\eta},

Deshpande and Montanari use the trace power method to bound the norm of QQ by bounding the norms of the Jη,νJ_{\eta,\nu} individually. Some of the Jη,νJ_{\eta,\nu} matrices have Wigner-like behavior.

With probability 1O(n5)1-O(n^{-5}), we have that for each (η,ν){(2,1),(2,6),(3,),(4,1)}(\eta,\nu)\in\{(2,1),(2,6),(3,\cdot),(4,1)\},

A select few of the Jη,νJ_{\eta,\nu} have larger eigenvalues.

With probability 1O(n4)1-O(n^{-4}), we have that for each (η,ν){(1,),(2,2),(2,3),(2,4),(2,5)}(\eta,\nu)\in\{(1,\cdot),(2,2),(2,3),(2,4),(2,5)\},

We also give a short proof of an observation of Deshpande and Montanari, which states that some of the Jη,νJ_{\eta,\nu} vanish when projected to V2V_{2}:

Let Π2\Pi_{2} be the projector to V2V_{2}. Then always,

and so by the characterization of V1V_{1} from Proposition 2.3 the vector vV1v^{\prime}\in V_{1}. The conclusion follows.

Finally, we use a bound on the norm of the matrix KK, which is the difference of H2,2H_{2,2} and the non-multilinear entries.

We also require bounds on the matrices used in the Schur complement steps. The bounds of Deshpande and Montanari suffice for us, since we do not modify moments of order less than 44.

We prove Lemma 2.5, which follows almost immediately from the bounds of [DM15a].

Using the matrices from Definition A.1 and Observation A.5, we have that

where we have used the fact that the columns of J2,4J_{2,4} and J2,2J_{2,2} lie in WW. We thus have

Now, we prove that the trace power method works, for completeness.

The proof follows from an application of Markov’s inequality. We have that for even kk,

where we have applied Stirling’s approximation in the last step. Choosing k=O(logn)k=O(\log n) and t=O(η1/kγlognnα)t=O\left(\eta^{-1/k}\cdot\gamma\cdot\log n\cdot n^{\alpha}\right) completes the proof. ∎