Lower bounds on the size of semidefinite programming relaxations

James R. Lee, Prasad Raghavendra, David Steurer

Introduction

Convex characterizations and relaxations of combinatorial problems have been a consistent, powerful theme in the theory of algorithms since its inception. Linear and semidefinite programming relaxations have been particularly useful for the efficient computation of approximate solutions to NP-hard problems (see, for instance, the books [WS11, Vaz01]). In some sense, semidefinite programs (SDPs) can be seen as combining the rich expressiveness of linear programs with the global geometric power of spectral methods. For many fundamental combinatorial problems, this provides a genuinely new structural and computational perspective [GW95, KMS98, ARV09]. Indeed, for an array of optimization problems, the best-known approximation algorithms can only be achieved via SDP relaxations.

It has long been known that integrality gaps for linear programs (LPs) can often lead to gadgets for NP-hardness of approximation reductions (see, e.g., [LY93, CGH+05, HK03]). Furthermore, assuming the Unique Games Conjecture [Kho02], it is known that integrality gaps for SDPs can be translated directly into hardness of approximation results [KKMO04, Aus10, Rag08]. All of this suggests that the computational model underlying LPs and SDPs is remarkably powerful.

In the setting of linear programs (LPs), the search for a model and characterization began in a remarkable work of Yannakakis [Yan91]. He proved that the TSP and matching polytopes do not admit symmetric linear programming formulations of size 2o(n)2^{o(n)}, where nn is the number of vertices in the underlying graph. In the process, he laid the structural framework (in terms of nonnegative factorizations) that would underlie all future work in the subject. It took over 20 years before Fiorini, Massar, Pokutta, Tiwary, and de Wolf [FMP+12] were able to remove the symmetry assumption and obtain a lower bound of 2Ω(n)2^{\Omega(\sqrt{n})} on the size of any LP formulation. Soon afterward, Rothvoß [Rot14] gave a lower bound of 2Ω(n)2^{\Omega(n)} on the size of any LP formulation for the matching polytope (and also TSP), completing Yannakakis’ vision.

Despite the progress in understanding the power of LP formulations, it remained a mystery whether there were similar strong lower bounds in the setting of SDPs. An analogous positive semidefinite factorization framework was provided in [FMP+12, GPT11]. Following the LP methods of [CLRS13], the papers [LRST14, FSP13] proved exponential lower bounds on the size of symmetric SDP formulations for NP-hard constraint satisfaction problems (CSPs).

In the present work, we prove strong lower bounds on the size of general SDP formulations for the cut, TSP, and stable set polytopes. Moreover, we show that polynomial-size SDP relaxations cannot achieve arbitrarily good approximations for many NP-hard constraint satisfaction problems. For instance, no polynomial-size family of relaxations can achieve better than a 7/87/8-approximation for max 3-sat. More generally, we show that the low-degree sum-of-squares SDP relaxations yield the best approximation among all polynomial-sized families of relaxations for max-CSPs.

This is achieved by relating arbitrary SDP formulations to those coming from the sum-of-squares SDP hierarchyThis hierarchy is also frequently referred to as the Lasserre SDP hierarchy. [Las01, Par00, Sho87], analogous to our previous work with Chan relating LP formulations to the Sherali–Adams hierarchy [CLRS13]. The SDP setting poses a number of significant challenges. At a very high level, our approach can be summarized as follows: Given an arbitrary SDP formulation of small size, we use methods from quantum entropy maximization and online convex optimization (often going by the name “matrix multiplicative weights update”) to learn an approximate low-degree sum-of-squares formulation on a subset of the input variables. In the next section, we present a formal overview of our results, and a discussion of the connections to quantum information theory, real algebraic geometry, and proof complexity.

The results of this work fall along two broad themes, lower bounds on spectrahedral lifts of specific polytopes and lower bounds on SDP relaxations for constraint satisfaction problems. Both sets of results are consequences of a general method for proving lower bounds on positive semidefinite rank. For the convenience of the reader, we have organized the two themes in two self-contained trajectories. Thus, lower bounds on spectrahedral lifts can be accessed via Section 1.1, Section 1.3, Section 2, Section 3 and Section 5. The lower bounds for constraint satisfaction problems can be reached through Section 1.2, Section 1.3, Section 2, Section 3 and Section 6.

We also present general results on approximating density operators against families of linear tests through quantum learning in Section 4. Finally, in Section 7, we exhibit applications of our techniques to non-negative rank; in particular, this is used to give a simple, self-contained proof of a lower bound on the nonnegative rank of the unique disjointness matrix.

1 Spectrahedral lifts of polytopes

one says that PP admits a positive-semidefinite (psd) lift of size kk. (This terminology is taken from [FGP+14].) We remark that the intersection of a PSD cone with an affine subspace is often referred to as a spectrahedron.

Here, the optimization problem on the right is a semidefinite programming problem in kk-by-kk matrices. This idea also goes under the name of a semidefinite extended formulation [FMP+12].

In Section 5.1, we show the following strong lower bound on its psd rank.

The importance of the correlation polytope corrn\text{{{corr}}}_{n} lies in the fact that a number of interesting polytopes from combinatorial optimization contain a face that linearly projects to corrn\text{{{corr}}}_{n} . We first define a few different families of polytopes and then recall their relation to corrn\text{{{corr}}}_{n}.

Finally, consider an arbitrary nn-vertex graph G=([n],E)G=([n],E). We recall that a subset of vertices S[n]S\subseteq[n] is an independent set (also called a stable set) if there are no edges between vertices in SS. The stable set polytope of GG is given by

By results of [DS90] and [FMP+12] (see Proposition 5.2), Theorem 1.1 directly implies the following lower bounds on the psd rank of the cut, TSP, and stable set polytopes.

The following lower bounds hold for every n1n\geqslant 1,

2 Semidefinite relaxations and constraint satisfaction

We now formalize a computational model of semidefinite relaxations for combinatorial optimization problems and prove strong lower bounds for it. Unlike the polytope setting in the previous section, this model also allows us to capture approximation algorithms directly.

Many basic optimization problems are special cases of this general problem, corresponding to functions ff of a particular form: For the problem of finding the maximum cut in a graph GG with nn vertices, the function ff outputs on input x{0,1}nx\in\{0,1\}^{n} the number of edges in GG that cross the bipartition represented by xx, i.e., f(x)f(x) is the number of edges {i,j}E(G)\{i,j\}\in E(G) with xixjx_{i}\neq x_{j}. Similarly, for max 3-sat on a 3CNF formula φ\varphi with nn variables, f(x)f(x) is the number of clauses in φ\varphi satisfied by the assignment xx. More generally, for any kk-ary boolean constraint satisfaction problem, the function ff counts the number of satisfied constraints. Note that in these examples, the functions have at most degree 22, degree 33, and degree kk, respectively.

The sosd\overline{\textup{sos}}_{d} upper bound is equivalent to the degree-dd sum-of-squares (also known as the level-d/2d/2 Lasserre) SDP bound, and for small values of dd, these upper bounds underlie the best-known approximation algorithms for several optimization problems. For example, the Goemans–Williamson algorithm for max cut is based on the upper bound sos2\overline{\textup{sos}}_{2}. If we let αGW0.878\alpha_{GW}\approx 0.878 be the approximation ratio of this algorithm, then every graph GG satisfies max(fG)αGWsos2(fG)\max(f_{G})\geqslant\alpha_{GW}\cdot\overline{\textup{sos}}_{2}(f_{G}) where the function fGf_{G} measures cuts in GG, i.e., fG(x):ijE(G)(xixj)2f_{G}(x)\coloneq\sum_{ij\in E(G)}(x_{i}-x_{j})^{2}.

A natural generalization of low-degree sum-of-squares certificates is obtained by summing squares of functions in a low-dimensional subspace. We can formulate this generalization as a non-uniform model of computation that captures general semidefinite programming relaxations. First, we make the following definition for a subspace of functions.

In this work, we exhibit unconditional lower bounds in this powerful non-uniform model of computation. For example, we show that the max 3-sat problem cannot be approximated to a factor better than 7/87/8 using a polynomial-size family of SDP relaxations. Formally, we show the following lower bound for max 3-sat.

there exists a max 3-sat instance \Im on nn variables such that max()s\max(\Im)\leqslant s but sosU()=1\overline{\textup{sos}}_{U}(\Im)=1 (i.e., UU fails to achieve a factor-ss approximation for max 3-sat).

Our main result is a characterization of an optimal semidefinite programming relaxation for the class of constraint satisfaction problems among all families of SDP relaxations of similar size. Roughly speaking, we show that the O(1)O(1)-degree sum-of-squares relaxations are optimal among all polynomial-sized SDP relaxations for constraint satisfaction problems. Towards stating our main result, we define the class of constraint satisfaction problems. For the sake of clarity, we restrict ourselves to boolean constraint satisfaction problems although the results hold in greater generality.

For a finite collection P\mathscr{P} of kk-ary predicates P ⁣:{0,1}k{0,1}P\colon\{0,1\}^{k}\to\{0,1\}, we let max-P\mathscr{P} denote the following optimization problem: An instance \Im consists of boolean variables X1,,XnX_{1},\ldots,X_{n} and a collection of P\mathscr{P}-constraints P1(X)=1,,PM(X)=1P_{1}(X)=1,\ldots,P_{M}(X)=1 over these variables. A P\mathscr{P}-constraint is a predicate P0 ⁣:{0,1}n{0,1}P_{0}\colon\{0,1\}^{n}\to\{0,1\} such that P0(X)=P(Xi1,,Xik)P_{0}(X)=P(X_{i_{1}},\ldots,X_{i_{k}}) for some PPP\in\mathscr{P} and distinct indices i1,,ik[n]i_{1},\ldots,i_{k}\in[n]. The objective is to find an assignment x{0,1}nx\in\{0,1\}^{n} that satisfies as many of the constraints as possible, that is, which maximizes

We denote the optimal value of an assignment for \Im as opt()=maxx{0,1}n(x)\operatorname{opt}(\Im)=\max_{x\in\{0,1\}^{n}}\Im(x). For example, max cut corresponds to the case where P\mathscr{P} consists of the binary inequality predicate. For max 3-sat, P\mathscr{P} contains all eight 3-literal disjunctions, e.g., X1Xˉ2Xˉ3X_{1}\vee\bar{X}_{2}\vee\bar{X}_{3}.

In other words, the upper bound sosU\overline{\textup{sos}}_{U} allows us to distinguishIn order to distinguish between the cases max()s\max(\Im)\leqslant s and max()>c\max(\Im)>c it is enough to check whether \Im satisfies sosU()c\overline{\textup{sos}}_{U}(\Im)\leqslant c. In the case max()s\max(\Im)\leqslant s, we know that sosU()c\overline{\textup{sos}}_{U}(\Im)\leqslant c by (1.2). On the other hand, in the case max()>c\max(\Im)>c, we know that sosU()>c\overline{\textup{sos}}_{U}(\Im)>c because sosU()\overline{\textup{sos}}_{U}(\Im) is always an upper bound on max()\max(\Im). between the cases max()s\max(\Im)\leqslant s and max()>c\max(\Im)>c for all instances Πn\Im\in\Pi_{n}.

We prove the following theorem, which shows that for every boolean CSP, the approximation guarantees obtained by the degree-dd sum-of-squares upper bound (also known as the the level-d/2d/2 Lasserre SDP relaxation) are optimal among all semidefinite programming relaxations of size at most ncdn^{cd} for some universal constant c>0c>0.

The theorem has several immediate concrete consequences for specific boolean CSPs. First, we know that O(1)O(1)-degree sos upper bounds do not achieve an approximation ratio better than 7/87/8 for max 3-sat [Gri01b, Sch08], therefore Theorem 1.6 implies that polynomial-size SDP relaxations for max 3-sat cannot achieve an approximation ratio better than 7/87/8. In fact, a quantitatively stronger version of the above theorem yields Theorem 1.5.

3 Positive semidefinite rank and sum-of-squares degree

In order to prove our results on spectrahedral lifts and semidefinite relaxations, the factorization perspective will be essential. In the LP setting, the characterization of polyhedral lifts and LP relaxations in terms of nonnegative factorizations is a significant contribution of Yannakakis [Yan91]. In the SDP setting, the analogous characterization is in terms of positive semidefinite factorizations [FMP+12, GPT11].

Nonnegative factorizations correspond to the special case that the matrices {Ai}\{A_{i}\} and {Bj}\{B_{j}\} are restricted to be diagonal. A rank-rr nonnegative factorization can equivalently be viewed as a sum of rr rank-11 nonnegative factorizations (nonnegative rectangles). Indeed, this viewpoint is crucial for all lower bounds on nonnegative factorization. In contrast, rank-rr psd factorizations do not seem to admit a good characterization in terms of rank-11 psd factorizations. This difference captures one of the main difficulties of proving psd rank lower bounds.

This notion is closely relatedFor the sake of simplicity, we have only defined this notion for functions on the discrete cube. In more general settings, one has to be a bit more careful; we refer to [GV02]. to (a special case of) the Positivstellensatz proof system of Grigoriev and Vorobjov [GV02]. We refer to the surveys [Lau09, BS14] and the introduction of [OZ12] for a review of such proof systems and their relationship to semidefinite programming.

where S[n]S\subseteq[n] runs over all subsets of size mm and x{0,1}nx\in\{0,1\}^{n}.

The reader might observe that the matrix in (1.3) looks very similar to the “pattern matrices” defined by Sherstov [She11]. This comparison is not unfounded; some high-level aspects of our proof are quite similar to his. Random restrictions are a powerful tool for analyzing functions over the discrete cube. We refer to [O’D14, Ch. 4] for a discussion of their utility in the context of discrete Fourier analysis. They were also an important tool in the work [CLRS13] on lower bounds for LPs. Accordingly, one hopes that our methods may have additional applications in communication complexity. This would not be surprising, as there is a model of quantum communication that exactly captures psd rank (see [FMP+12]).

Given the preceding proposition, the following result of Grigoriev on the Knapsack tautologies completes our quest for a lower bound.

has degsos(f)m+1\deg_{\mathsf{sos}}(f)\geqslant m+1.

Note that since m/2m/2 is not an integer, (1.4) is nonnegative for all x{0,1}mx\in\{0,1\}^{m}. It turns out that in order to prove stronger lower bounds for corrn\text{{{corr}}}_{n}, we will require a lower bound on the approximate sos degree of ff. Thus the Knapsack tautologies (1.4) will be studied carefully in Section 5.1. In Section 2.2, we discuss the proof of Theorem 1.8 in some detail. Then in Section 3, we present a quantitatively stronger theorem and its proof.

Fix now numbers k,n1k,n\geqslant 1 and a boolean CSP Π\Pi. Fix a pair of constants 0sc10\leqslant s\leqslant c\leqslant 1. Suppose our goal is to show a lower bound on the size of SDP relaxations that yield a (c,s)(c,s)-approximation on instances of size nn. It turns out that this task reduces to proving a lower bound on the positive semidefinite rank of an explicit matrix MM indexed by problem instances and problem solutions (points on the discrete cube in our case).

For any boolean CSP Πn\Pi_{n} and any constants 0s<c10\leqslant s<c\leqslant 1, let UU be a subspace of minimal dimension that achieves a (c,s)(c,s)-approximation for Πn\Pi_{n}. Denote the set of instances

Before describing the proof of this proposition, observe that together with our main theorem the proposition implies Theorem 1.6 (optimality of degree-dd sum-of-squares for approximating boolean CSPs): If 0\Im_{0} is a max-P\mathscr{P} instance on mm variables with max(0)s\max(\Im_{0})\leqslant s and sosd(0)>c\overline{\textup{sos}}_{d}(\Im_{0})>c, then f=c0f=c-\Im_{0} has sos degree larger than dd. Our main theorem gives a lower bound on the psd rank of the matrix MnfM^{f}_{n}. Since this matrix is a submatrix of the matrix in Proposition 1.13, our psd rank lower bound implies a lower bound on the minimum dimension of a subspace achieving a (c,s)(c,s)-approximation for \textup{{max-\mathscr{P}}}_{n}.

Since UU achieves a (c,s)(c,s)-approximation for Πn\Pi_{n}, for every instance Πns\Im\in\Pi_{n}^{\leqslant s} we will have sosU()c\overline{\textup{sos}}_{U}(\Im)\leqslant c. By definition of sosU()\overline{\textup{sos}}_{U}(\Im), this implies that c=igi2c-\Im=\sum_{i}g_{i}^{2} for giUg_{i}\in U. By expressing each gi2g_{i}^{2} as gi2=Tr(ΛiQ(x))g_{i}^{2}=\operatorname{Tr}(\Lambda_{i}Q(x)) for some ΛiSr+\Lambda_{i}\in\mathcal{S}_{r}^{+} we get,

Proof overview and setup

If HH comes equipped with a canonical (ordered) orthonormal basis (as will always be the case throughout), we represent linear operators on HH by matrices with rows and columns indexed by the basis elements. In this case, M(H)\mathcal{M}(H) consists of symmetric matrices and D(H)\mathcal{D}(H) consists of symmetric, positive semidefinite matrices whose diagonal entries summing to one. If AM(H)A\in\mathcal{M}(H) is positive semidefinite, we use A1/2A^{1/2} to denote the positive semidefinite square root of AA.

Given a linear operator A:HHA:H\to H, we define the operator, trace, and Frobenius norms, respectively:

Recall Tr(ATB)AB\operatorname{Tr}(A^{T}B)\leqslant\|A\|\cdot\|B\|_{*} and the Cauchy-Schwarz inequality Tr(ATB)AFBF\operatorname{Tr}(A^{T}B)\leqslant\|A\|_{F}\|B\|_{F}. For a matrix MM, we use M\|M\|_{\infty} to denote the maximum absolute value of an entry in MM.

2 Factorizations, quantum learning, and pseudo-densities

This contradicts degsos(f)=d+2\deg_{\mathsf{sos}}(f)=d+2 since P(S)R(x)F2\|\sqrt{P(S)}R(x)\|_{F}^{2} is a sum of squares of a polynomials of degree less than d/2d/2.

By appealing to convex duality, it is possible to construct a certificate that the matrix MnfM_{n}^{f} does not admit low degree factorizations. The certificate consists of a linear functional that separates MnfM_{n}^{f} from the convex hull of matrices that admit low degree psd factorizations. Formally, if we define the convex set Cd\mathcal{C}_{d} of non-negative matrices as,

then we will construct a linear functional LL on (nm)×2n\binom{n}{m}\times 2^{n} matrices such that

The linear functional is precisely the one derived from what we refer to as a pseudo-density.

Consider a matrix NCdN\in\mathcal{C}_{d} which admits a low degree psd factorization given by N(S,x)=Tr(P(S)R(x)2)N(S,x)=\operatorname{Tr}(P(S)R(x)^{2}). Then since DD is a degree-dd pseudo-density, we would have

However, since DD is negatively correlated with ff,

Formally, for a number r1r\geqslant 1, consider the following set Cr\mathcal{C}_{r} of nonnegative matrices,

Fix a matrix NCrN\in\mathcal{C}_{r}. It is instructive to have the situation N1,N=Θ(1)\lVert N\rVert_{1},\lVert N\rVert_{\infty}=\Theta(1) in mind for the rest of this outline. By definition of Cr\mathcal{C}_{r}, the matrix NN admits a psd factorization of rank O(r)O(r). In light of the above discussion, our goal is to approximate the matrix NN by a low degree factorization with respect to the functional LDL_{D}. A low degree approximation for NN is constructed in two steps.

Well-behaved factorizations. The first step involves obtaining a nicer factorization of NN. Toward this end, we define the quantity

Applying the above lemma to the matrix NN at hand, we get a psd factorization N(S,x)=Tr(P(S)Q(x))N(S,x)=\operatorname{Tr}(P(S)Q(x)) wherein P(S)\lVert P(S)\rVert and Q(x)\lVert Q(x)\rVert_{*} are bounded polynomially in rr. This analytic control on the factorization will be important for controlling error bounds, but also—in a more subtle way—for the next step.

Now, if T\mathcal{T} is a family of test functionals, then a canonical way of finding a “simple” approximation to UQU_{Q} that satisfies all the tests is via the following maximum-entropy (convex) optimization problem:

where we recall that S()S(\cdot) denotes the quantum entropy functional. Moreover, one can attempt to solve this optimization by some form of projected sub-gradient descent. Interpretations of this algorithm go by many names, notably the “matrix multiplicative weights update method” and “mirror descent” with quantum entropy as the regularizer; see, e.g., [NY83, BT03, TRW05, AK07, WK12] and the recent survey [Bub14].

Let κ1\kappa\geqslant 1 and ω>0\omega>0 be given. Define

and for all tests ΛTκ,ω\Lambda\in\mathcal{T}_{\kappa,\omega},

In other words, the learning algorithm produces a hypothesis with error at most ε\varepsilon for all the tests in Tκ,ω\mathcal{T}_{\kappa,\omega}; moreover, the hypothesis is the square of a polynomial whose degree is not much larger than that of the tests. The value ω\omega corresponds to the ubiquitous “width” parameter and, as in most applications of the multiplicative weights method, bounding ω\omega will be centrally important. The reader should also take note of the appearance of the relative entropy in the degree bound (2.8). It will turn out that low psd rank factorizations will give us functions Q:{0,1}nS+kQ:\{0,1\}^{n}\to\mathcal{S}_{+}^{k} with high entropy (and thus small relative entropy with respect to the uniform state); this is actually a direct consequence of the factorization rescaling in Lemma 2.1.

Then from the definition of MnfM_{n}^{f}, we have

On the other hand, we will prove the following theorem in Section 3.1.

Now if we consider the normalized matrix N=Mnf/MnfN=M^{f}_{n}/\lVert M^{f}_{n}\rVert_{\infty}, we see that it satisfies the first premise N1\lVert N\rVert_{\infty}\leqslant 1 but violates the conclusion of the theorem (because of (2.9)). Therefore we know that the second premise is violated, which gives the lower bound

Since this achieves our goal, we are left to explain why Theorem 2.3 should be true, at least when we apply it with N=Mnf/MnfN=M_{n}^{f}/\|M_{n}^{f}\|_{\infty}. If we apply LDL_{D} to the right-hand side of (2.1)—our presumed factorization for MnfM^{f}_{n}—we arrive at the expression

We can view this as a test on QQ in the sense of Theorem 2.2. Since deg(D)m\deg(D)\leqslant m (because DD is only a function of mm variables), this is a low-degree test. Theorem 2.2 then suggests that we can replace QQ by a low-degree approximator R2R^{2}, while losing only ε\varepsilon in the “accuracy” of the test.

Since the approximation property implies that Q(x)Q(x) and R(x)2R(x)^{2} should perform similarly under the test (up to the “accuracy” ε\varepsilon), we would conclude that LD(Mnf)εL_{D}(M_{n}^{f})\geqslant-\varepsilon, yielding the conclusion of Theorem 2.3.

Random restriction and degree reduction. The one serious issue with the preceding argument is that our supposition is far too strong: One cannot expect to have deg(R)d/2\deg(R)\leqslant d/2. Indeed, the guarantee of Theorem 2.2 tells us that the approximator R(x)R(x) has degree at most Kdeg(D)K\cdot\deg(D) for some (possibly large) number KK (which itself depends on many parameters). To overcome this problem, we use another crucial property of our functional (2.4): It is an expectation over small sets S[n]S\subseteq[n]. If we randomly choose such a subset with S=mn|S|=m\ll n and randomly choose values ySˉy_{\bar{S}} for the variables in Sˉ\bar{S}, we expect that the resulting (partially evaluated) polynomial R(xS,xSˉ)xSˉ=ySˉR(x_{S},x_{\bar{S}})|_{x_{\bar{S}}=y_{\bar{S}}} will satisfy deg(R(xS,xSˉ)xSˉ=ySˉ)deg(R)\deg(R(x_{S},x_{\bar{S}})|_{x_{\bar{S}}=y_{\bar{S}}})\ll\deg(R). (Strictly speaking, this will only be true in an approximate sense.)

It is precisely this degree reduction property of random restriction that saves the preceding sketch. In the next sections, we perform a more delicate quantitative analysis capable of achieving much stronger lower bounds. The norm D\|D\|_{\infty} of the pseudo-density will play a central role in this study. Thus in Section 5.1, we show that Grigoriev’s proof of Theorem 1.12 can be carefully recast in the language of pseudo-densities such that the resulting pseudo-density has small norm.

PSD rank and sum-of-squares degree

We now move to proving the main technical theorems of the paper along the lines of the informal overview presented in Section 2.2.

we have LD(N)εL_{D}(N)\geqslant-\varepsilon. Moreover, this holds for

The proof of this theorem consists of two parts. First, we observe that if DD is a degree-dd pseudo-density, then LD(N)L_{D}(N) is nonnegative for all matrices NN that admit a factorization in terms of squares of low-degree polynomials, i.e., a factorization N(S,x)=Tr(AS2Bx2)N(S,x)=\operatorname{Tr}(A_{S}^{2}B_{x}^{2}) for symmetric matrics {AS}\{A_{S}\} and {Bx}\{B_{x}\} such that the function xBxx\mapsto B_{x} has degree at most d/2d/2 over {0,1}n\{0,1\}^{n}.For the convenience of the reader, we recall that the degree of the matrix-valued function xBxx\mapsto B_{x} is defined as the maximum degree of the functions x(Bx)ijx\mapsto(B_{x})_{ij} where i,ji,j range over the indices of BxB_{x}. Indeed, consider such a factorization. Then,

As explained in Section 2.2, this guarantee is not sufficient for us. The following theorem (proved in Section 3.2) allows us to analyze LDL_{D} even when the degree of the map xBxx\mapsto B_{x} is much larger than d/2d/2. (For mno(1)m\leqslant n^{o(1)}, it will be the case that the linear functional LDL_{D} is approximately nonnegative on NN even when xBxx\mapsto B_{x} has degree up to no(1)n^{o(1)}).

Mi,jTr(PiQj)Mi,j+ηMM_{i,j}\leqslant\operatorname{Tr}(P_{i}Q_{j})\leqslant M_{i,j}+\eta\|M\|_{\infty},

Note that Item 2 holds by construction. Also observe that

verifying Item 1. Finally, we have the inequalities for all i[p],j[q]i\in[p],j\in[q],

The first inequality verifies Item 3 since r1r\geqslant 1 and η1\eta\leqslant 1. The last inequality implies that

satisfying the conclusions of the theorem.

where the last inequality has used N1\|N\|_{\infty}\leqslant 1.

From Theorem 3.3(4), the density matrix QQ satisfies

Theorem 3.3(1) allows us to lower bound LD(N)L_{D}(N) in terms of the matrix (S,x)Tr(PSQx)(S,x)\mapsto\operatorname{Tr}(P_{S}Q_{x}) and value D\|D\|_{\infty}:

Theorem 3.3(2) allows us to upper bound the spectral norm of FF by

The next theorem allows us to lower bound Tr(FQ)\operatorname{Tr}(FQ) by replacing QQ with a simpler density matrix that is a low-degree polynomial in FF. (See Theorem 4.1, where a slightly more general version is proved.)

Apply Theorem 3.4 to the density matrix QQ and the symmetric matrix FF defined above with the value ε\varepsilon (which is already fixed). Let pp be the resulting degree-kk polynomial, with kk satisfying the bounds of the theorem.

We have now assembled all components of the proof of Theorem 3.1.

We lower bound the linear functional LD(N)L_{D}(N) by

Now set η:=min(ε/D,1)\eta\mathrel{\mathop{:}}=\min(\varepsilon/\|D\|_{\infty},1) and use (3.1) to bound τ1+η2\tau\leqslant 1+\eta\leqslant 2. This yields

Now recall that our invocation of Theorem 3.4 gives us a bound on k=deg(p)k=\deg(p):

Plugging this bound into (3.8) yields, for some constant c>0c^{\prime}>0,

2 Degree reduction

The next theorem is a restatement of Theorem 3.2. One should simply note that for any symmetric matrix AA, we have AF2=Tr(A2)\|A\|_{F}^{2}=\operatorname{Tr}(A^{2}).

For the sake of this lemma, which uses Fourier analysis, we will think of BB and DD as functions on {1,1}n\{-1,1\}^{n}. Since this is a linear transformation on the domain, it does not affect their degrees as multilinear polynomials.

For every S[n]S\subseteq[n] with S=m\lvert S\rvert=m, we decompose BB into two parts B=BS,low+BS,highB=B_{S,\mathsf{low}}+B_{S,\mathsf{high}} such that BS,lowB_{S,\mathsf{low}} is the part of BB with degree at most d/2d/2 in the variables SS:

(Recall Section 2.1 for the Fourier-analytic definitions.)

The proof consists of two steps that are captured by the following two lemmas.

Let τ=maxSA(S)2\tau=\max_{S}\lVert A(S)^{2}\rVert. Then,

Here, we have used that the quadratic formula ABF2=ABlowF2+ABhighF2+2ABlow,ABhigh\lVert AB\rVert_{F}^{2}=\lVert AB_{\mathsf{low}}\rVert_{F}^{2}+\lVert AB_{\mathsf{high}}\rVert_{F}^{2}+2\langle AB_{\mathsf{low}},AB_{\mathsf{high}}\rangle, where ,\langle\cdot,\cdot\rangle is the inner product that induces F\lVert\cdot\rVert_{F}, i.e., X,Y=Tr(XTY)\langle X,Y\rangle=\operatorname{Tr}(X^{T}Y). Hence,

The first step used the identity x2y2=x+yxy\lvert x^{2}-y^{2}\rvert=\lvert x+y\rvert\cdot\lvert x-y\rvert. In the second step, we applied Cauchy–Schwarz. The third step used the triangle inequality, ABFABlowFABhighF.\left\lvert\vphantom{\bigoplus}\lVert AB\rVert_{F}-\lVert AB_{\mathsf{low}}\rVert_{F}\right\rvert\leqslant\lVert AB_{\mathsf{high}}\rVert_{F}\,.

This bound implies the desired lower bound

By construction the Fourier transform of BS,highB_{S,\mathsf{high}} satisfies

We combine the previous two lemmas to lower bound the correlation between the pseudo-density D(xS)D(x_{S}) and the norms A(S)B(x)F2\lVert A(S)B(x)\rVert_{F}^{2},

3 Proof of the main theorem

For a function f ⁣:{0,1}mf\colon\{0,1\}^{m}\to and an integer nmn\geqslant m, let Mnf ⁣:(nm)×{0,1}nM^{f}_{n}\colon\binom{n}{m}\times\{0,1\}^{n}\to be the matrix,

For any m,d1m,d\geqslant 1, the following holds. Let f:{0,1}mf:\{0,1\}^{m}\to be a nonnegative function with d+2=degsos(f)d+2=\deg_{\mathsf{sos}}(f). Then for n2mn\geqslant 2m,

where Cf>0C_{f}>0 is a constant depending only on ff.

which yields an explicit psd factorization of MnfM^{f}_{n} with matrices {PS},{Qx}\{P_{S}\},\{Q_{x}\} of dimension r=i1+d/2(ni)1+n1+d/2r=\sum_{i\leqslant 1+d/2}\binom{n}{i}\leqslant 1+n^{1+d/2}. ∎

Approximations for density operators

We turn now to a central theme of our approach: High-entropy states can be approximated by “simple” states if the approximation is only with respect to “simple” tests. In our setting, “simple” will mean low-degree. In Section 4.1, we present a basic version of this principle with respect to a single test functional. This suffices for essentially all our applications to psd rank lower bounds.

We believe that the maximum-entropy approximation framework is a powerful one, so Section 4.2 is devoted to a more general exploration of the principle. In particular, we state and prove approximation theorems for density operators with respect to families of tests. In the rest of this section, we fix a finite-dimensional real inner product space HH.

Let FM(H)F\in\mathcal{M}(H) be a symmetric matrix and let QD(H)Q\in\mathcal{D}(H) be a density matrix. Then, for every ε(0,12)\varepsilon\in(0,\frac{1}{2}), there exists a degree-kk univariate polynomial pp with kO(F/ε)S(QU)+O(log1/εloglog1/ε)k\leqslant O(\lVert F\rVert/\varepsilon)\cdot S(Q\,\|\,U)+O\left(\frac{\log 1/\varepsilon}{\log\log 1/\varepsilon}\right) such that

Moreover, the polynomial pp depends only on ε\varepsilon, the operator norm F\lVert F\rVert, and the relative entropy S(QU)S(Q\,\|\,U).)

The proof consists of two steps. First, we will show that the theorem holds with 1Tr(p(F)2)p(F)2\tfrac{1}{\operatorname{Tr}(p(F)^{2})}p(F)^{2} replaced by eλF/Tr(eλF)e^{-\lambda F}/\operatorname{Tr}(e^{-\lambda F}) for λ(1/ε)S(QU)\lambda\leqslant(1/\varepsilon)\cdot S(Q\,\|\,U). Then, we will approximate the matrix exponential by the square of a low-degree polynomial.

For every symmetric matrix FF and every density matrix QQ,

as long as λ1/εS(QU)\lambda\geqslant 1/\varepsilon\cdot S(Q\,\|\,U).

By the duality formula for quantum entropy (see, e.g., [Car10, Thm. 2.13]), the function f:XλTr(FX)+S(XU)f:X\mapsto\lambda\operatorname{Tr}(FX)+S(X\,\|\,\mathcal{U}) over the the set of density matrices is minimized at X=eλF/Tr(eλF)X^{\star}=e^{-\lambda F}/\operatorname{Tr}(e^{-\lambda F}). Therefore, using the fact S(XU)0S(X^{\star}\,\|\,\mathcal{U})\geqslant 0, we get

which implies that Tr(FX)Tr(FQ)+S(QU)/λTr(FQ)+ε\operatorname{Tr}(FX^{\star})\leqslant\operatorname{Tr}(FQ)+S(Q\,\|\,\mathcal{U})/\lambda\leqslant\operatorname{Tr}(FQ)+\varepsilon, as desired. ∎

Next we observe that one can pass from univariate approximations of exe^{x} to approximations of eFe^{F} in the trace norm.

Let δ(0,1]\delta\in(0,1] and τ>0\tau>0 be given. Suppose there exists a univariate polynomial p(x)p(x) such that for every x[τ/2,τ/2]x\in[-\tau/2,\tau/2],

Then for every FM(H)F\in\mathcal{M}(H) with Fτ\|F\|\leqslant\tau, we have

Under the assumptions, for every x[τ,τ]x\in[-\tau,\tau], one has

where the last line follows from δ1\delta\leqslant 1. Note the elementary equality: For all x,y,x,y>0x,y,x^{\prime},y^{\prime}>0,

Let λ1,λ2,,λn[τ,τ]\lambda_{1},\lambda_{2},\ldots,\lambda_{n}\in[-\tau,\tau] denote the eigenvalues of FF. We conclude that

Since eFe^{F} and p(F)p(F) are simultaneously diagonalizable, the preceding inequality is precisely our goal (4.2). ∎

The following corollary of Lemma 4.3 follows by checking that the Taylor expansion of exe^{x} satisfies the approximation guarantee (4.1).

For every ε(0,12)\varepsilon\in(0,\frac{1}{2}) and every symmetric matrix FM(H)F\in\mathcal{M}(H), there is a number k3e(F+log(1/ε)loglog(1/ε))k\leqslant 3e\left(\|F\|_{\infty}+\frac{\log(1/\varepsilon)}{\log\log(1/\varepsilon)}\right) and a univariate degree-kk polynomial pkp_{k} with non-negative coefficients such that

Let pk(x)=t=0kxkk!p_{k}(x)=\sum_{t=0}^{k}\frac{x^{k}}{k!}. By Taylor’s theorem, we have

Define τ=F\tau=\|F\|_{\infty} and choose k=3e(τ+log(1/ε)loglog(1/ε))k=\left\lfloor 3e\left(\tau+\frac{\log(1/\varepsilon)}{\log\log(1/\varepsilon)}\right)\right\rfloor so that for x[τ/2,τ/2]x\in[-\tau/2,\tau/2], we have xk+1(k+1)!ε/6\frac{x^{k+1}}{(k+1)!}\leqslant\varepsilon/6. Finally, apply Lemma 4.3. ∎

The proof of the main theorem in this section follows by combining Lemma 4.2 and Corollary 4.4.

Fix ε(0,12)\varepsilon\in(0,\frac{1}{2}) and FM(H)F\in\mathcal{M}(H), qD(H)q\in\mathcal{D}(H). Choose λ=(2/ε)S(QU)\lambda=(2/\varepsilon)\cdot S(Q\,\|\,\mathcal{U}) and F=λFF^{\prime}=-\lambda F. Let pkp_{k} be the polynomial from Corollary 4.4 for k=3e(F+log(1/ε)loglog(1/ε))k=3e\left(\lVert F^{\prime}\rVert+\frac{\log(1/\varepsilon^{\prime})}{\log\log(1/\varepsilon^{\prime})}\right) and ε=ε/(2F)\varepsilon^{\prime}=\varepsilon/(2\lVert F\rVert). Note that kO(F/ε)S(QU)+O(log1/εloglog1/ε)k\leqslant O(\lVert F\rVert/\varepsilon)\cdot S(Q\,\|\,\mathcal{U})+O\left(\frac{\log 1/\varepsilon}{\log\log 1/\varepsilon}\right). Moreover,

Since ε/2+εFε\varepsilon/2+\varepsilon^{\prime}\cdot\lVert F\rVert\leqslant\varepsilon, the polynomial p(x)=pk(λx/2)p(x)=p_{k}(-\lambda x/2) satisfies the desired bound

2 Approximation against a family of tests

Let TM(H)\mathcal{T}\subseteq\mathcal{M}(H) denote a compact set of matrices, and set Δ(T):=supATA\Delta(\mathcal{T})\mathrel{\mathop{:}}=\sup_{A\in\mathcal{T}}\|A\|. For AM(H)A\in\mathcal{M}(H), we define the associated dual gauge

One should think of T\mathcal{T} as a set of test functionals; for A,AM(H)A,A^{\prime}\in\mathcal{M}(H), the value [AA]T[A-A^{\prime}]_{\mathcal{T}} measures the extent to which AA and AA^{\prime} are distinguishable using tests from T\mathcal{T}. It is important to note that if T\mathcal{T} is not centrally symmetric, then []T[\cdot]_{\mathcal{T}} might also fail to be symmetric. For future reference, we observe that fact that for any AM(H)A\in\mathcal{M}(H),

Our main approximation theorem asserts that, with respect to tests from a convex set T\mathcal{T}, a high-entropy density operator can be well-approximated by the square of a low-degree polynomial in some element of T\mathcal{T}.

For every ε(0,12)\varepsilon\in(0,\frac{1}{2}), the following holds. Let TM(H)\mathcal{T}\subseteq\mathcal{M}(H) be compact and convex, and let QD(H)Q\in\mathcal{D}(H) be a density matrix. Then there exists a number

a univariate degree-kk polynomial pp, and an element FTF\in\mathcal{T} such that Tr(p(F)2)=1\operatorname{Tr}(p(F)^{2})=1 and

This can be verified immediately by showing that both sides satisfy the differential equation

with F(0,t)=0F(0,t)=0 for all tt. (This argument is taken from [Wil67].)

We will only require (4.8) for β=1\beta=1. For example, (4.8) and cyclicity of the trace yields

Denote X(t)=dX(t)dtX^{\prime}(t)=\frac{dX(t)}{dt}. If we know that X(t)X(t) is symmetric, and its eigenvalues are {λi}\{\lambda_{i}\}, then by diagonalizing in the basis of X(t)X(t), we can also derive

where in the final line we have used the fact that if aba\geqslant b, then

since eba1+(ba)e^{b-a}\geqslant 1+(b-a). Thus we have

We will use this for the following lemma.

For every ε>0\varepsilon>0, the following holds. Let CM(H)\mathcal{C}\subseteq\mathcal{M}(H) be a compact set, and let Q,Q0D(H)Q,Q_{0}\in\mathcal{D}(H) be density matrices. If one defines h=8ε2S(QQ0)Δ(T)2h=\lceil\frac{8}{\varepsilon^{2}}S(Q\,\|\,Q_{0})\Delta(\mathcal{T})^{2}\rceil then there exist A1,A2,,AhTA_{1},A_{2},\ldots,A_{h}\in\mathcal{T} such that

Consider for t0t\geqslant 0, the density matrix

where sΛsTs\mapsto\Lambda_{s}\in\mathcal{T} is any measurable function.

where in the final line we have used Tr(Q)=1\operatorname{Tr}(Q)=1.

Let T=2εS(QQ0)T=\frac{2}{\varepsilon}S(Q\,\|\,Q_{0}). Suppose the map tΛtTt\mapsto\Lambda_{t}\in\mathcal{T} is such that

Then from (4.15) and (4.16), we arrive at

which contradicts the fact that S(QQT)0S(Q\,\|\,Q_{T})\geqslant 0.

and define Λt=Ai+1\Lambda_{t}=A_{i+1} for t(ti,ti+1]t\in(t_{i},t_{i+1}].

Finally, observe that for t(ti,ti+1)t\in(t_{i},t_{i+1}), we have

where we have used the fact that Λt=ddtlog(elogQ00tΛsds)\Lambda_{t}=-\frac{d}{dt}\log\left(e^{\log Q_{0}-\int_{0}^{t}\Lambda_{s}\,ds}\right). We conclude that

with λ1+1δS(QU)|\lambda|\lesssim 1+\frac{1}{\delta}S(Q\,\|\,U) and which satisfies

Note here that since T\mathcal{T} is assumed to be convex, we have FTF\in\mathcal{T} (see the form of (4.12)).

Then we apply Corollary 4.4 to λF\lambda F to obtain a degree-kk polynomial pkp_{k} such that

for the relative entropy between fμf\mu and μ\mu. We will also allow ourselves to conflate μ\mu with the corresponding density by writing μ(x)\mu(x) for μ({x})\mu(\{x\}) and xXx\in X. One should note that an analog of Lemma 4.6 for the special case of probability distributions (instead of density matrices) can be proved exactly along the same lines, but without the use of matrix inequalities.

We also lift each test gg to a matrix M(g)=x{0,1}ng(x)exexTM(g)=\sum_{x\in\{0,1\}^{n}}g(x)e_{x}e_{x}^{T} and the set M(T)M(\mathcal{T}) now denotes a class of matrix tests.

so that S(M(f)Q0)=D(fμ)S(M(f)\,\|\,Q_{0})=D(f\,\|\,\mu).

and such that for every gTg\in\mathcal{T},

The correlation polytope

This polytope is also known as the Boolean quadric polytope [Pad89] for the following reason.

where AA is the matrix A=(aij)A=(a_{ij}). Since this inequality holds at the vertices, it holds for all of corrn\text{{{corr}}}_{n}. ∎

We now recall the relationship between the correlation, cut, TSP, and stable set polytopes. The first fact is from [DS90], while the second two are taken from [FMP+12].

For every n1n\geqslant 1, the following hold:

corrn\text{{{corr}}}_{n} is linearly isomorphic to cutn+1\text{{{cut}}}_{n+1}.

There exists a number anO(n2)a_{n}\leqslant O(n^{2}) such that some face of tspan\text{{{tsp}}}_{a_{n}} linearly projects to corrn\text{{{corr}}}_{n}.

There exists a graph HnH_{n} on bnO(n2)b_{n}\leqslant O(n^{2}) vertices such that some face of stabbn(Hn)\text{{{stab}}}_{b_{n}}(H_{n}) linearly projects to corrn\text{{{corr}}}_{n}.

We will now prove a lower bound on the psd rank of corrn\text{{{corr}}}_{n}. Our first goal is to construct a suitable family of pseudo-densities. We will employ Grigoriev’s work [Gri01a] on degree lower bounds for Positivstellensatz calculus refutations. The primary difficulty will be in expressing Grigoriev’s lower bound using a pseudo-density of small norm.

Grigoriev constructs a linear functional G\mathcal{G} on the space of mm-variate real polynomials modulo the ideal I\mathcal{I} generated by {Xi2Xi:i=1[m]}\{X_{i}^{2}-X_{i}:i=1\in[m]\}:

The functional is uniquely defined by the values

for each multilinear monomial XS=iSXiX^{S}=\prod_{i\in S}X_{i} with S[m]S\subseteq[m]. Observe that m/2m/2 is not an integer and the (generalized) binomial coefficient (m/2k)\binom{m/2}{k} is defined using the formal expression

It is easy to check that G\mathcal{G} satisfies (5.2):

Grigoriev shows that G\mathcal{G} satisfies (5.1) [Gri01a, Lem. 1.4].

We claim that for any univariate real polynomial pp with deg(p)m\deg(p)\leqslant m,

Both sides of the claimed identity are univariate polynomials in tt of degree at most mm and agree with each other on the m+1m+1 points given by t{0,1,,m}t\in\{0,1,\ldots,m\}. Hence, the two polynomials are identically equal.

For each x{0,1}mx\in\{0,1\}^{m}, let x|x| denote its hamming weight, and define

Finally, in order to bound D\lVert D\rVert_{\infty}, observe that the polynomials cw(t)c_{w}(t) are given by the interpolation formula

For an x{0,1}mx\in\{0,1\}^{m} with x=w|x|=w we have,

where in the last step we have used Sterling’s approximation for the inequality (m1(m1)/2)2m1/32(m1)+1\binom{m-1}{(m-1)/2}\leqslant 2^{m-1}/\sqrt{\frac{3}{2}(m-1)+1}, valid for m3m\geqslant 3. ∎

There is a constant α>0\alpha>0 such that for every n1n\geqslant 1,

Choosing n2αm13/2lognn\geqslant\frac{2}{\alpha^{\prime}}m^{13/2}\log n, an easy calculation shows that

Optimality of low-degree sum-of-squares for max CSPs

Constraint satisfaction problems form a broad class of discrete optimization problems that include, for example, max cut and max 3-sat. For simplicity of presentation, we will focus on constraint satisfaction problems with a boolean alphabet, though similar ideas extend to larger domains (see an analogous generalization in Section 7). We begin our presentation with a formal definition of semidefinite programming relaxations for max-CSPs.

Given an instance \Im\in\textup{{max-\Pi}}_{n}, the SDP relaxation S\mathcal{S} has value

The degree-dd sos relaxation for \Im is given by

We will now write the dual formulation of the above semidefinite program to expose the underlying spectrahedron and linearization. The dual of (6.1) can be written as,

where the final inequality used the fact that D,g20\langle D,g^{2}\rangle\geqslant 0 for all functions gg with deg(g)d/2\deg(g)\leqslant d/2.

From the above discussion, it is clear that the feasible region of the degree dd-sos relaxation (6.2) corresponds to the spectrahedron,

From (6.3) and (6.4), for every instance \Im\in\textup{{max-\Pi}}_{n} and every assignment x{0,1}nx\in\{0,1\}^{n} we have

Now the degree-dd sos relaxation corresponding to an instance \Im\in\textup{{max-\Pi}}_{n} in (6.1) and (6.2) can be equivalently formulated as

where the first index of Mc,sn,ΠM^{n,\Pi}_{c,s} ranges over all instances on nn variables satisfying opt()s\operatorname{opt}(\Im)\leqslant s. A simple consequence of Proposition 1.10 is the following.

2 General SDPs vs. sum-of-squares

Our main theorem is that general SDP relaxations for max-CSPs are no more powerful than low-degree sum-of-squares relaxations in the polynomial-size regime.

Given that the degree-dd sos relaxation cannot achieve a (c,s)(c,s)-approximation, there exists an instance \Im of \textup{{max-\Pi}}_{m} for some mm such that opt()s\operatorname{opt}(\Im)\leqslant s but degsos(c)>d\deg_{\mathsf{sos}}(c-\Im)>d.

This implies that no sequence of SDP relaxations of size at most o((nlogn)d/4)o((\frac{n}{\log n})^{d/4}) can achieve a (c,s)(c,s)-approximation for max-Π\Pi. ∎

For a stronger quantitative bound, we require the following simple fact.

Write χα=χAχB\chi_{\alpha}=\chi_{A}\chi_{B} for some A,BA,B with A,Bd/2|A|,|B|\leqslant d/2 and observe that

Suppose that for every n1n\geqslant 1, the degree-d(n)d(n) sos relaxation cannot achieve a (c+ε,s)(c+\varepsilon,s)-approximation for \textup{{max-\Pi}}_{n}. Then for all n1n\geqslant 1, no SDP relaxation of size at most Knd(n)2/8Kn^{d(n)^{2}/8} can achieve a (c,s)(c,s)-approximation for \textup{{max-\Pi}}_{N} for every N>n4d(n)N>n^{4d(n)}.

Using known lower bounds for low degree sum-of-squares relaxations for max-CSPs [Gri01b, Sch08, Tul09], Theorem 6.4 implies lower bounds against general SDP relaxations for a range of specific max-CSPs. For instance, the lower bounds of Grigoriev [Gri01b] and Schoenebeck [Sch08] imply a lower bound for max 3-sat (see Theorem 1.5).

For every ε>0\varepsilon>0, there exists a constant cεc_{\varepsilon} such that the following holds. For every n1n\geqslant 1, there is a \textup{{max-\Pi}}_{n} instance n\Im_{n} such that opt(n)7/8+ε\operatorname{opt}(\Im_{n})\leqslant 7/8+\varepsilon, but soscεn()=1\overline{\textup{sos}}_{c_{\varepsilon}n}(\Im)=1.

Observe that one can obtain the bound of Theorem 1.5 using the preceding result as follows. In Theorem 6.4, choose nlogNn\asymp\log N and d(n)logNloglogNd(n)\asymp\frac{\log N}{\log\log N} so that n4d(n)Nn^{4d(n)}\asymp N. In that case, the lower bound obtained is of the order Nd(n)/32NΩ(logN/loglogN)N^{d(n)/32}\asymp N^{\Omega(\log N/\log\log N)}.

Nonnegative rank

Theorem 3.8 exhibits a connection between psd rank and sos degree. There is a similar connection between nonnegative rank and junta-degree. The results of Section 7.1 generalize those of [CLRS13], while the method of proof is closely related. As opposed to [CLRS13], we use the learning approach of Section 4 to approximate by juntas. In Section 7.2, we demonstrate an application to the correlation polytope.

We also define a more quantitative notion: Approximate junta degree with respect to an arbitrary measure. Given ε>0\varepsilon>0 and a measure μ\mu on XnX^{n}, we define

where we take the maximum to be equal to 1-1 if no such pseudo-density exists. (See Section 7.2 for an example where a biased measure μ\mu is used to analyze the nonnegative rank of the lopsided disjointness matrix.) We can now state our main theorem on nonnegative rank.

where d+1=degJε(f;μm)d+1=\deg_{\mathsf{J}}^{\varepsilon}(f;\mu^{m}) and c>0c>0 is a universal constant.

The left-hand-side inequality of (7.1) follows from the fact that the cone of nonnegative dd-juntas is spanned by i=0d+1(ni)1+nd+1\sum_{i=0}^{d+1}\binom{n}{i}\leqslant 1+n^{d+1} nonnegative dd-juntas. We move on to right-hand inequality.

Let Λτ={i:qiτ}\Lambda_{\tau}=\{i:\|q_{i}\|_{\infty}\leqslant\tau\}. Note that for iΛτi\notin\Lambda_{\tau}, we must have

Let JiJ_{i} denote the set of coordinates on which qiq_{i} depends, so that Jik|J_{i}|\leqslant k^{\prime}.

We break the right-hand side of (7.2) into two parts. First, using (7.4),

For the second part, we use (7.5) so that for every iΛτi\in\Lambda_{\tau} and S=m|S|=m,

This implies that for iΛτi\in\Lambda_{\tau},

where in the final line we have used λif1\|\lambda_{i}\|_{\infty}\leqslant\|f\|_{1} from (7.3) and also Jik|J_{i}|\leqslant k^{\prime} and n2mn\geqslant 2m.

Combining this with (7.7) and (7.6), we conclude that

Let us now set τ=3rff1\tau=3r\frac{\|f\|_{\infty}}{\|f\|_{1}} and δ=εD/3\delta=\varepsilon\|D\|_{\infty}/3, yielding

Now, if rndr\geqslant n^{d}, then we are done. Otherwise, (7.8) yields

2 The correlation polytope and lopsided disjointness

We now illustrate a particularly simple application of our method to nonnegative rank.

and let μ\mu be the measure on {0,1}\{0,1\} satisfying μ(0)=12/m\mu(0)=1-2/m and μ(1)=2/m\mu(1)=2/m. Then,

Plugging this result into Theorem 7.2 yields the following.

There is a constant c>0c>0 such that for every m3m\geqslant 3 and n2mn\geqslant 2m, we have

We now verify that DD is a dd-local pseudo-density (with respect to μm\mu^{m}) for d=m2+1d=\frac{m}{2}+1. Observe first that

Let β=2m(12m)m1\beta=\frac{2}{m}\left(1-\frac{2}{m}\right)^{m-1} denote μm(1,0,,0)\mu^{m}(1,0,\ldots,0). Consider a subset S[m]S\subseteq[m] and some fixed string b{0,1}Sb\in\{0,1\}^{S}. Let 1b:{0,1}m{0,1}\bm{1}_{b}:\{0,1\}^{m}\to\{0,1\} denote the indicator of whether xS=bx_{S}=b. If b=0b=\mathbf{0}, then

The latter quantity is nonnegative as long as Sm2+1|S|\leqslant\frac{m}{2}+1.

In other words, the lower bound of Theorem 7.4 applies to all matrices that have a subset of entries corresponding to the unique disjointness problem.

3 Unique games hardness for LPs

As an illustrative application of the relation between nonnegative rank and junta-degree (Theorem 7.2), we present an LP hardness result for the Unique Games problem.We thank Ola Svensson for the suggestion to make this explicit.

Fix an integer q1q\geqslant 1. An instance \Im of unique games UGq\textup{{UG}}^{q} consists of variables X1,,XnX_{1},\ldots,X_{n} taking values in [q][q] and a collection of predicates P1,,PMP_{1},\ldots,P_{M} over these variables. Each constraint PiP_{i} is over a pair of distinct variables {Xai,Xbi}\{X_{a_{i}},X_{b_{i}}\} and is specified by a bijection πi:[q][q]\pi_{i}:[q]\to[q] as follows:

The goal is to find an assignment that maximizes, over x[q]nx\in[q]^{n}, the number of satisfied constraints:

Recall that opt()=maxx[q]n(x)\operatorname{opt}(\Im)=\max_{x\in[q]^{n}}\Im(x). Let UGnq\textup{{UG}}^{q}_{n} denote the family of Unique Games instances on nn variables. The authors of [CMM09] exhibit a strong integrality gap for Sherali-Adams linear programming relaxations of UGq\textup{{UG}}^{q}

Fix a number t1t\geqslant 1 and let q=2tq=2^{t}. Then for every δ>0\delta>0, there exist γ,ε>0\gamma,\varepsilon>0, an m1m\geqslant 1, and an instance UGmq\Im\in\textup{{UG}}^{q}_{m} such that

but degJε(1δ)mγ\deg_{\mathsf{J}}^{\varepsilon}(1-\delta-\Im)\geqslant m^{\gamma}.

In fact, the authors of [CMM09] construct a lower bound for the dd-round Sherali-Adams LP relaxation (where dmγ)d\asymp m^{\gamma}). But there is an equivalence between such lower bounds and the existence of a dd-local pseudo-density; we refer to [CLRS13] for a discussion. Applying Theorem 7.2 (with X=[q]X=[q] and μ\mu as the uniform measure on [q][q]), we obtain the following corollary.

Let Mc,sn,UGqM^{n,\textup{{UG}}^{q}}_{c,s} denote the matrix with entries

where \Im runs over all UGnq\textup{{UG}}^{q}_{n} instances with opt()s\operatorname{opt}(\Im)\leqslant s, and all values x[q]nx\in[q]^{n}.

For every t1t\geqslant 1, δ>0\delta>0, and d1d\geqslant 1, there exists a constant c>0c>0 such that for all n1n\geqslant 1,

In the language of [CLRS13] (see also Section 6 for related definitions in the SDP setting), this shows that polynomial-size families of LP relaxations cannot achieve a (1δ,1q+δ)(1-\delta,\frac{1}{q}+\delta)-approximation for the Unique Games problem.

Acknowledgments

This work was supported, in large part, by NSF grant CCF-1407779. A significant fraction of the project was completed during a long-term visit of the authors to the Simons Institute for the Theory of Computing (Berkeley) for the program on Algorithmic Spectral Graph Theory. The authors would also like to thank Paul Beame, Siu-On Chan, Daniel Dadush, Troy Lee, Sebastian Pokutta, Pablo Parrilo, Mohit Singh, Ola Svensson, Thomas Rothvoß, and Rekha Thomas for valuable discussions and comments.

References