Hypercontractivity, Sum-of-Squares Proofs, and their Applications

Boaz Barak, Fernando G. S. L. Brandão, Aram W. Harrow, Jonathan A. Kelner, David Steurer, Yuan Zhou

Introduction

For a function f ⁣:Ω\varmathbbRf\colon\Omega\to\varmathbb R on a (finite) probability space Ω\Omega, the pp-norm is defined as fp=(\varmathbbEΩfp)1/p\lVert f\rVert_{p}=(\operatorname*{\varmathbb{E}}_{\Omega}f^{p})^{1/p}. We follow the convention to use expectation norms for functions (on probability spaces) and counting norms, denoted as vp=(i=1nvip)1/p{\boldsymbol{\lVert}}v{\boldsymbol{\rVert}}_{p}=(\sum_{i=1}^{n}|v_{i}|^{p})^{1/p}, for vectors v\varmathbbRmv\in\varmathbb R^{m}. All normed spaces here will be finite dimensional. We distinguish between expectation and counting norms to avoid recurrent normalization factors. The pqp\rightarrow q norm Apq\|A\|_{p\rightarrow q} of a linear operator AA between vector spaces of such functions is the smallest number c0c\geqslant 0 such that Afqcfp\lVert Af\rVert_{q}\leqslant c\lVert f\rVert_{p} for all functions ff in the domain of AA. We also define the pqp\to q norm of a subspace VV to be the maximum of fq/fp\lVert f\rVert_{q}/\lVert f\rVert_{p} for fVf\in V; note that for p=2p=2 this is the same as the norm of the projector operator into VV.

In this work, we are interested in the case p<qp<q and we will call such pqp\to q norms hypercontractive.We use this name because a bound of the form Apq1\|A\|_{p\rightarrow q}\leqslant 1 for p<qp<q is often called a hypercontractive inequality. Roughly speaking, for p<qp<q, a function ff with large fq\lVert f\rVert_{q} compared to fp\lVert f\rVert_{p} can be thought of as “spiky” or somewhat sparse (i.e., much of the mass concentrated in small portion of the entries). Hence finding a function ff in a linear subspace VV maximizing fq/f2\lVert f\rVert_{q}/\lVert f\rVert_{2} for some q>2q>2 can be thought of as a geometric analogue of the problem finding the shortest word in a linear code. This problem is equivalent to computing the 2q2\to q norm of the projector PP into VV (since Pf2f2\lVert Pf\rVert_{2}\leqslant\lVert f\rVert_{2}). Also when AA is a normalized adjacency matrix of a graph (or more generally a Markov operator), upper bounds on the pqp\to q norm are known as mixed-norm, Nash or hypercontractive inequalities and can be used to show rapid mixing of the corresponding random walk (e.g., see the surveys [Gro75, SC97]). Such bounds also have many applications to theoretical computer science, which are described in the survey [Bis11].

However, very little is known about the complexity of computing these norms. This is in contrast to the case of pqp\to q norms for pqp\geqslant q, where much more is known both in terms of algorithms and lower bounds, see [Ste05, KNS08, BV11].

Our Results

We initiate a study of the computational complexity of approximating the 242\to 4 (and more generally 2q2\to q for q>2q>2) norm. While there are still many more questions than answers on this topic, we are able to show some new algorithmic and hardness results, as well as connections to both Khot’s unique games conjecture [Kho02] (UGC) and questions from quantum information theory. In particular our paper gives some conflicting evidence regarding the validity of the UGC and its close variant— the small set expansion hypothesis (SSEH) of [RS10]. (See also our conclusions section.)

First, we show in Theorem 2.5 that approximating the 242\to 4 problem to within any constant factor cannot be done in polynomial time (unless SAT can be solved in exp(o(n))\exp(o(n)) time) but yet this problem is seemingly related to the Unique Games and Small-Set Expansion problems. In particular, we show that approximating the 242\to 4 norm is Small-Set Expansion- hard but yet has a subexponential algorithm closely related to the [ABS10] algorithm for Unique Games and Small-Set Expansion. Thus the computational difficulty of this problem can be considered as some indirect evidence supporting the validity of the UGC (or perhaps some weaker variants of it). To our knowledge, this is the first evidence of this kind for the UGC.

On the other hand, we show that a natural polynomial-time algorithm (based on an SDP hierarchy) that solves the previously proposed hard instances for Unique Games. The previous best algorithms for some of these instances took almost exponential ( exp(exp(logΩ(1)n))\exp(\exp(\log^{\Omega(1)}n)) ) time, and in fact they were shown to require super-polynomial time for some hierarchies. Thus this result suggests that this algorithm could potentially refute the UGC, and hence can be construed as evidence opposing the UGC’s validity.

We show several algorithmic results for the 242\to 4 (and more generally 2q2\to q) norm.

For q2q\geqslant 2, we say that an algorithm provides a (c,C)(c,C)-approximation for the 2q2\to q norm if on input an operator AA, the algorithm can distinguish between the case that A2qcσ\lVert A\rVert_{2\to q}\leqslant c\sigma and the case that A2qCσ\lVert A\rVert_{2\to q}\geqslant C\sigma, where σ=σmin(A)\sigma=\sigma_{\min}(A) is the minimum nonzero singular value of AA. (Note that since we use the expectation norm, AfqAf2σf2\lVert Af\rVert_{q}\geqslant\lVert Af\rVert_{2}\geqslant\sigma\lVert f\rVert_{2} for every function ff orthogonal to the Kernel of AA.) We say that an algorithm provides a good approximation for the 2q2\to q norm if it provides a (c,C)(c,C)-approximation for some (dimension independent) constants c<Cc<C. The motivation behind this definition is to capture the notion of a dimension independent approximation factor, and is also motivated by Theorem 2.4 below, that relates a good approximation for the 2q2\to q norm to solving the Small-Set Expansion problem.

For every 1<c<C1<c<C, there is a poly(n)exp(n2/q)\operatorname{poly}(n)\exp(n^{2/q})-time algorithm that computes a (c,C)(c,C)-approximation for the 2q2\to q norm of any linear operator whose range is \varmathbbRn\varmathbb R^{n}.

Combining this with our results below, we get as a corollary a subexponential algorithm for the Small-Set Expansion problem matching the parameters of [ABS10]’s algorithm. We note that this algorithm can be achieved by the “Sum of Squares” SDP hierarchy described below (and probably weaker hierarchies as well, although we did not verify this).

1.2 Polynomial algorithm for specific instances

We study a natural semidefinite programming (SDP) relaxation for computing the 242\to 4 norm of a given linear operator which we call Tensor-SDP.We use the name Tensor-SDP for this program since it will be a canonical relaxation of the polynomial program maxx2=1T,x4\max_{\lVert x\rVert_{2}=1}{\langle T,x^{\otimes 4}\rangle} where TT is the 44-tensor such that T,x4=Ax44\langle T,x^{\otimes 4}\rangle=\lVert Ax\rVert_{4}^{4}. See Section 4.5 for more details. While Tensor-SDP is very unlikely to provide a poly-time constant-factor approximation for the 242\to 4 norm in general (see Theorem 2.5 below), we do show that it provides such approximation on two very different types of instances:

We show that Tensor-SDP certifies a constant upper bound on the ratio A24/A22\|A\|_{2\rightarrow 4}/\lVert A\rVert_{2\to 2} where A:\varmathbbRn\varmathbbRmA:\varmathbb R^{n}\to\varmathbb R^{m} is a random linear operator (e.g., obtained by a matrix with entries chosen as i.i.d Bernoulli variables) and mΩ(n2logn)m\geqslant\Omega(n^{2}\log n). In contrast, if m=o(n2)m=o(n^{2}) then this ratio is ω(1)\omega(1), and hence this result is almost tight in the sense of obtaining “good approximation” in the sense mentioned above. We find this interesting, since random matrices seem like natural instances; indeed for superficially similar problems such shortest codeword, shortest lattice vector (or even the 121\to 2 norm), it seems hard to efficiently certify bounds on random operators.

We show that Tensor-SDP gives a good approximation of the 242\to 4 norm of the operator projecting a function f:{±1}n\varmathbbRf:\{\pm 1\}^{n}\to\varmathbb R into its low-degree component:

Let Pd\mathcal{P}_{d} be the linear operator that maps a function f:{±1}n\varmathbbRf:\{\pm 1\}^{n}\to\varmathbb R of the form f=α[n]f^αχαf=\sum_{\alpha\subseteq[n]}\hat{f}_{\alpha}\chi_{\alpha} to its low-degree part f=αdf^αχαf^{\prime}=\sum_{|\alpha|\leqslant d}\hat{f}_{\alpha}\chi_{\alpha} (where χα(x)=iαxi\chi_{\alpha}(x)=\prod_{i\in\alpha}x_{i}). Then Tensor-SDP(Pd)9d\textup{{Tensor-SDP}}(\mathcal{P}_{d})\leqslant 9^{d}.

The fact that Pd\mathcal{P}_{d} has bounded 242\to 4 norm is widely used in the literature relating to the UGC. Previously, no general-purpose algorithm was known to efficiently certify this fact.

1.3 Quasipolynomial algorithm for additive approximation

We also consider the generalization of Tensor-SDP to a natural SDP hierarchy. This is a convex relaxation that starts from an initial SDP and tightens it by adding additional constraints. Such hierarchies are generally paramaterized by a number rr (often called the number of rounds), where the 1st1^{\text{st}} round corresponds to the initial SDP, and the nthn^{\text{th}} round (for discrete problems where nn is the instance size) corresponds to the exponential brute force algorithm that outputs an optimal answer. Generally, the rthr^{\text{th}}-round of each such hierarchy can be evaluated in nO(r)n^{O(r)} time (though in some cases nO(1)2O(r)n^{O(1)}2^{O(r)} time suffices [BRS11]). See Section 3, as well as the surveys [CT10, Lau03] and the papers [SA90, LS91, RS09, KPS10] for more information about these hierarchies.

We call the hierarchy we consider here the Sum of Squares (SoS) hierarchy. It is not novel but rather a variant of the hierarchies studied by several authors including Shor [Sho87], Parrilo [Par00, Par03], Nesterov [Nes00] and Lasserre [Las01]. (Generally in our context these hierarchies can be made equivalent in power, though there are some subtleties involved; see [Lau09] and Appendix C for more details.) We describe the SoS hierarchy formally in Section 3. We show that Tensor-SDP’s extension to several rounds of the SoS hierarchy gives a non-trivial additive approximation:

Let Tensor-SDP(d)\textup{{Tensor-SDP}}^{(d)} denote the nO(d)n^{O(d)}-time algorithm by extending Tensor-SDP to dd rounds of the Sum-of-Squares hierarchy. Then for all ε\varepsilon, there is d=O(log(n)/ε2)d=O(\log(n)/\varepsilon^{2}) such that

The term A222A22\|A\|_{2\rightarrow 2}^{2}\|A\|_{2\rightarrow\infty}^{2} is a natural upper bound on A244\|A\|_{2\rightarrow 4}^{4} obtained using Hölder’s inequality. Since A22\|A\|_{2\rightarrow 2} is the largest singular value of AA, and A2\|A\|_{2\rightarrow\infty} is the largest 2-norm of any row of AA, they can be computed quickly. Theorem 2.3 shows that one can improve this upper bound by a factor of ε\varepsilon using run time exp(log2(n)/ε2)\exp(\log^{2}(n)/\varepsilon^{2})). Note however that in the special case (relevant to the UGC) that AA is a projector to a subspace VV, A22=1\lVert A\rVert_{2\to 2}=1 and A2dim(V)\lVert A\rVert_{2\to\infty}\geqslant\sqrt{\dim(V)} (see Lemma 10.1), which unfortunately means that Theorem 2.3 does not give any new algorithms in that setting.

Despite Theorem 2.3 being a non-quantum algorithm for for an ostensibly non-quantum problem, we actually achieve it using the results of Brandão, Christandl and Yard [BaCY11] about the quantum separability problem. In fact, it turns out that the SoS hierarchy extension of Tensor-SDP is equivalent to techniques that have been used to approximate separable states [DPS04]. We find this interesting both because there are few positive general results about the convergence rate of SDP hierarchies, and because the techniques of [BaCY11], based on entanglement measures of quantum states, are different from typical ways of proving correctness of semidefinite programs, and in particular different techniques from the ones we use to analyze Tensor-SDP in other settings. This connection also means that integrality gaps for Tensor-SDP would imply new types of entangled states that pass most of the known tests for separability.

2 Reductions

We relate the question of computing the hypercontractive norm with two other problems considered in the literature: the small set expansion problem [RS10, RST10], and the injective tensor norm question studied in the context of quantum information theory [HM10, BaCY11].

Khot’s Unique Games Conjecture [Kho02] (UGC) has been the focus of intense research effort in the last few years. The conjecture posits the hardness of approximation for a certain constraint-satisfaction problem, and shows promise to settle many open questions in the theory of approximation algorithms. Many works have been devoted to studying the plausibility of the UGC, as well as exploring its implications and obtaining unconditional results inspired or motivated by this effort. Tantalizingly, at the moment we have very little insight on whether this conjecture is actually true, and thus producing evidence on the UGC’s truth or falsity is a central effort in computational complexity. Raghavendra and Steurer [RS10] proposed a hypothesis closely related to the UGC called the Small-Set Expansion hypothesis (SSEH). Loosely speaking, the SSEH states that it is NP-hard to certify that a given graph G=(V,E)G=(V,E) is a small-set expander in the sense that subsets with size o(V)o(|V|) vertices have almost all their neighbors outside the set. [RS10] showed that SSEH implies UGC. While a reduction in the other direction is not known, all currently known algorithmic and integrality gap results apply to both problems equally well (e.g., [ABS10, RST12]), and thus the two conjectures are likely to be equivalent.

We show, loosely speaking, that a graph is a small-set expander if and only if the projection operator to the span of its top eigenvectors has bounded 242\to 4 norm. To make this precise, if G=(V,E)G=(V,E) is a regular graph, then let Pλ(G)P_{\geqslant\lambda}(G) be the projection operator into the span of the eigenvectors of GG’s normalized adjacency matrix with eigenvalue at least λ\lambda, and ΦG(δ)\Phi_{G}(\delta) be minSV,SδV\varmathbbP(u,v)E[v∉SuS]\min_{S\subseteq V,|S|\leqslant\delta|V|}\operatorname*{\varmathbb{P}}_{(u,v)\in E}[v\not\in S|u\in S].

Then we relate small-set expansion to the 242\rightarrow 4 norm (indeed the 2q2\rightarrow q norm for even q4q\geqslant 4) as follows:

For every regular graph GG, λ>0\lambda>0 and even qq,

(Norm bound implies expansion) For all δ>0,ε>0\delta>0,\varepsilon>0, Pλ(G)2qε/δ(q2)/2q\|P_{\geqslant\lambda}(G)\|_{2\to q}\leqslant\varepsilon/\delta^{(q-2)/2q} implies that ΦG(δ)1λε2\Phi_{G}(\delta)\geqslant 1-\lambda-\varepsilon^{2}.

(Expansion implies norm bound) There are constants c1,c2>0c_{1},c_{2}>0 such that for all δ>0\delta>0, ΦG(δ)>1c1λ2q2c2q\Phi_{G}(\delta)>1-c_{1}\lambda^{2q}2^{-c_{2}q} implies Pλ(G)2q2/δ\|P_{\geqslant\lambda}(G)\|_{2\rightarrow q}\leqslant 2/\sqrt{\delta}.

While one direction (bounded hypercontractive norm implies small-set expansion) was already known,While we do not know who was the first to point out this fact explicitly, within theoretical CS it was implicitly used in several results relating the Bonami-Beckner-Gross hypercontractivity of the Boolean noise operator to isoperimetric properties, with one example being O’Donnell’s proof of the soundness of [KV05]’s integrality gap (see [KV05, Sec 9.1]). to our knowledge the other direction is novel. As a corollary we show that the SSEH implies that there is no good approximation for the 242\to 4 norm.

2.2 Hypercontractivity and the injective tensor norm

We are able to make progress in understanding both the complexity of the 242\rightarrow 4 norm and the quality of our SDP relaxation by relating the 242\rightarrow 4 norm to several natural questions about tensors. An rr-tensor can be thought of as an rr-linear form on \varmathbbRn\varmathbb R^{n}, and the injective tensor norm inj\|\cdot\|_{\operatorname{inj}} of a tensor is given by maximizing this form over all unit vector inputs. See Section 9 for a precise definition. When r=1r=1, this norm is the 2-norm of a vector and when r=2r=2, it is the operator norm (or 2 ⁣ ⁣22\!\rightarrow\!2-norm) of a matrix, but for r=3r=3 it becomes NP-hard to calculate. One motivation to study this norm comes from quantum mechanics, where computing it is equivalent to a number of long-studied problems concerning entanglement and many-body physics [HM10]. More generally, tensors arise in a vast range of practical problems involving multidimensional data [vL09] for which the injective tensor norm is both of direct interest and can be used as a subroutine for other tasks, such as tensor decomposition [dlVKKV05].

It is not hard to show that A244\|A\|_{2\rightarrow 4}^{4} is actually equal to Tinj\lVert T\rVert_{{\rm inj}} for some 44-tensor T=TAT=T_{A}. Not all 44-tensors can arise this way, but we show that the injective tensor norm problem for general tensors can be reduced to those of the form TAT_{A}. Combined with known results about the hardness of tensor computations, this reduction implies the following hardness result. To formulate the theorem, recall that the Exponential Time Hypothesis (ETH) [IPZ98] states that 3-SAT instances of length nn require time exp(Ω(n))\exp(\Omega(n)) to solve.

Assuming ETH, then for any ε,δ\varepsilon,\delta satisfying 2ε+δ<12\varepsilon+\delta<1, the 242\rightarrow 4 norm of an m×mm\times m matrix AA cannot be approximated to within a exp(logε(m))\exp(\log^{\varepsilon}(m)) multiplicative factor in time less than mlogδ(m)m^{\log^{\delta}(m)} time. This hardness result holds even with AA is a projector.

While we are primarily concerned with the case of Ω(1)\Omega(1) approximation factor, we note that poly-time approximations to within multiplicative factor 1+1/n1.011+1/n^{1.01} are not possible unless P=NP\textbf{P}=\textbf{NP}. This, along with Theorem 2.5, is restated more formally as Theorem 9.4 in Section 9.2. Theorem 2.5 yields as a corollary that, assuming ETH, there is no polynomial-time algorithm obtaining a good approximation for the 242\rightarrow 4 norm. We note that these results hold under weaker assumptions than the ETH; see Section 9.2 as well.

Previously no hardness results were known for the 242\rightarrow 4 norm, or any pqp\rightarrow q norm with p<qp<q, even for calculating the norms exactly. However, hardness of approximation results for 1+1/poly(n)1+1/\operatorname{poly}(n) multiplicative error have been proved for other polynomial optimization problems [BTN98].

3 Relation to the Unique Games Conjecture

Our results and techniques have some relevance to the unique games conjecture. Theorem 2.4 shows that obtaining a good approximation for the 2q2\to q norm is Small-Set Expansion hard, but Theorem 2.1 shows that this problem is not “that much harder” than Unique Games and Small-Set Expansion since it too has a subexponential algorithm. Thus, the 2q2\to q problem is in some informal sense “of similar flavor” to the Unique Games/ Small-Set Expansion. On the other hand, we actually are able to show in Theorem 2.5 hardness (even if only quasipolynomial) of this problem, whereas a similar result for Unique Games or Small-Set Expansion would be a major breakthrough. So there is a sense in which these results can be thought of as some “positive evidence” in favor of at least weak variants of the UGC. (We emphasize however that there are inherent difficulties in extending these results for Unique Games, and it may very well be that obtaining a multiplicative approximation to the 242\to 4 of an operator is significantly harder problem than Unique Games or Small-Set Expansion.) In contrast, our positive algorithmic results show that perhaps the 2q2\to q norm can be thought of as a path to refuting the UGC. In particular we are able to extend our techniques to show a polynomial time algorithm can approximate the canonical hard instances for Unique Games considered in prior works.

(Informal) Eight rounds of the SoS relaxation certifies that it is possible to satisfy at most 1/1001/100 fraction of the constraints of Unique Games instances of the “quotient noisy cube” and “short code” types considered in [RS09, KS09, KPS10, BGH+11]

These instances are the same ones for which previous works showed that weaker hierarchies such as “SDP+Sherali Adams” and “Approximate Lasserre” require ω(1)\omega(1) rounds to certify that one cannot satisfy almost all the constraints [KV05, RS10, KS09, BGH+11]. In fact, for the “short code” based instances of [BGH+11] there was no upper bound known better than exp(logΩ(1)n)\exp(\log^{\Omega(1)}n) on the number of rounds required to certify that they are not almost satisfiable, regardless of the power of the hierarchy used.

This is significant since the current best known algorithms for Unique Games utilize SDP hierarchies [BRS11, GS11],Both these works showed SDP-hierarchy-based algorithms matching the performance of the subexponential algorithm of [ABS10]. [GS11] used the Lasserre hierarchy, while [BRS11] used the weaker “SDP+Sherali-Adams” hierarchy. and the instances above were the only known evidence that polynomial time versions of these algorithms do not refute the unique games conjecture. Our work also show that strong “basis independent” hierarchies such as Sum of Squares [Par00, Par03] and Lasserre [Las01] can in fact do better than the seemingly only slightly weaker variants.The only other result of this kind we are aware of is [KMN11], that show that Lasserre gives a better approximation ratio than the linear programming Sherali-Adams hierarchy for the knapsack problem. We do not know if weaker semidefinite hierarchies match this ratio, although knapsack of course has a simple dynamic programming based PTAS.

The SoS hierarchy

For our algorithmic results in this paper we consider a semidefinite programming (SDP) hierarchy that we call the Sum of Squares (SoS) hierarchy. This is not a novel algorithm and essentially the same hierarchies were considered by many other researchers (see the survey [Lau09]). Because different works sometimes used slightly different definitions, in this section we formally define the hierarchy we use as well as explain the intuition behind it. While there are some subtleties involved, one can think of this hierarchy as equivalent in power to the programs considered by Parrilo, Lasserre and others, while stronger than hierarchies such “SDP+Sherali-Adams” and “Approximate Lasserre” considered in [RS09, KPS10, BRS11].

The SoS SDP is a relaxation for polynomial equations. That is, we consider a system of the following form: maximize P0(x)P_{0}(x) over x\varmathbbRnx\in\varmathbb R^{n} subject to Pi2(x)=0P_{i}^{2}(x)=0 for i=1mi=1\ldots m and P0,,PmP_{0},\ldots,P_{m} polynomials of degree at most dd.This form is without loss of generality, as one can translate an inequality constraint of the form Pi(x)0P_{i}(x)\geqslant 0 into the equality constraint (Pi(x)y2)2=0(P_{i}(x)-y^{2})^{2}=0 where yy is some new auxiliary variable. It is useful to show equivalences between various hierarchy formulations; see also Appendix C. For r2dr\geqslant 2d, the rr-round SoS SDP optimizes over x1,,xnx_{1},\ldots,x_{n} that can be thought of as formal variables rather than actual numbers. For these formal variables, expressions of the form P(x)P(x) are well defined and correspond to a real number (which can be computed from the SDP solution) as long as PP is a polynomial of degree at most rr. These numbers obey the linearity property which is that (P+Q)(x)=P(x)+Q(x)(P+Q)(x)=P(x)+Q(x), and, most importantly, the positivity property that P2(x)0P^{2}(x)\geqslant 0 for every polynomial PP of degree at most r/2r/2. These expressions satisfy all initial constraints (i.e., Pi2(x)=0P_{i}^{2}(x)=0 for i=1mi=1\ldots m) and the value of the SDP is set to be the expression P0(x)P_{0}(x). The above means that to show that the SoS relaxation has value at most vv it is sufficient to give any proof that derives from the constraints {Pi2(x)=0}i=1..m\{P_{i}^{2}(x)=0\}_{i=1..m} the conclusion that P0(x)vP_{0}(x)\leqslant v using only the linearity and positivity properties, without using any polynomials of degree larger than rr in the intermediate steps. In fact, such a proof always has the form

where R1,,Rk,Q1,,QmR_{1},\ldots,R_{k},Q_{1},\ldots,Q_{m} are arbitrary polynomials satisfying degRir/2,degPiQir\deg R_{i}\leqslant r/2,\deg P_{i}Q_{i}\leqslant r. The polynomial iRi(x)2\sum_{i}R_{i}(x)^{2} is a SoS (sum of squares) and optimizing over such polynomials (along with the Q1,,QmQ_{1},\ldots,Q_{m}) can be achieved with a semi-definite program.

We can now present our formal definition of pseudo-expectation and the SoS hierarchy:We use the name “Sum of Squares” since the positivity condition below is the most important constraint of this program. However, some prior works used this name for the dual of the program we define here. As we show in Appendix C, in many cases of interest to us there is no duality gap.

The main difference between the SoS hierarchy and weaker SDP hierarchies considered in the literature such as SDP+Sherali Adams and the Approximate Lasserre hierarchies [RS09, KPS10] is that the SoS hierarchy treats all polynomials equally and hence is agnostic to the choice of basis. For example, the approximate Lasserre hierarchy can also be described in terms of pseudo-expectations, but these pseudo-expectations are only defined for monomials, and are allowed some small error. While they can be extended linearly to other polynomials, for non-sparse polynomials that error can greatly accumulate.

1 Basic properties of pseudo-expectation

For two polynomials PP and QQ, we write PQP\preceq Q if Q=P+i=1mRi2Q=P+\sum_{i=1}^{m}R_{i}^{2} for some polynomials R1,,RmR_{1},\ldots,R_{m}.

We would like to understand how polynomials behave on linear subspaces of \varmathbbRn\varmathbb R^{n}. A map P ⁣:\varmathbbRn\varmathbbRP\colon\varmathbb R^{n}\to\varmathbb R is polynomial over a linear subspace V\varmathbbRnV\subseteq\varmathbb R^{n} if PP restricted to VV agrees with a polynomial in the coefficients for some basis of VV. Concretely, if g1,,gmg_{1},\ldots,g_{m} is an (orthonormal) basis of VV, then PP is polynomial over VV if P(f)P(f) agrees with a polynomial in f,g1,,f,gm\langle f,g_{1}\rangle,\ldots,\langle f,g_{m}\rangle. We say that PQP\preceq Q holds over a subspace VV if PQP-Q, as a polynomial over VV, is a sum of squares.

2 Why is this SoS hierarchy useful?

While the proof of (3.3) is fairly simple, we find the result— that hypercontractivity of polynomials is efficiently certifiable—somewhat surprising. The reason is that hypercontractivity serves as the basis of the integrality gaps results which are exactly instances of maximization problems where the objective value is low but this is supposedly hard to certify. In particular, we consider integrality gaps for Unique Games considered before in the literature. All of these instances follow the framework initiated by Khot and Vishnoi [KV05]. Their idea was inspired by Unique Games hardness proofs, with the integrality gap obtained by composing an initial instance with a gadget. The proof that these instances have “cheating” SDP solutions is obtained by “lifting” the completeness proof of the gadget. On the other hand, the soundness property of the gadget, combined with some isoperimetric results, showed that the instances do not have real solutions. This approach of lifting completeness proofs of reductions was used to get other integrality gap results as well [Tul09]. We show that the SoS hierarchy allows us to lift a certain soundness proof for these instances, which includes a (variant of) the invariance principle of [MOO05], influence-decoding a la [KKMO04], and hypercontractivity of low-degree polynomials. It turns out all these results can be proven via sum-of-squares type arguments and hence lifted to the SoS hierarchy.

Overview of proofs

We now give a very high level overview of the tools we use to obtain our results, leaving details to the later sections and appendices.

Our subexponential algorithm for obtaining a good approximation for the 2q2\to q norm is extremely simple. It is based on the observation that a subspace V\varmathbbRnV\subseteq\varmathbb R^{n} of too large a dimension must contain a function ff such that fqf2\lVert f\rVert_{q}\gg\lVert f\rVert_{2}. For example, if dim(V)n\dim(V)\gg\sqrt{n}, then there must be ff such that f4f2\lVert f\rVert_{4}\gg\lVert f\rVert_{2}. This means that if we want to distinguish between, say, the case that V242\|V\|_{2\rightarrow 4}\leqslant 2 and V243\|V\|_{2\rightarrow 4}\geqslant 3, then we can assume without loss of generality that dim(V)=O(n)\dim(V)=O(\sqrt{n}) in which case we can solve the problem in exp(O(n))\exp(O(\sqrt{n})) time. To get intuition, consider the case that VV is spanned by an orthonormal basis f1,,fdf^{1},\ldots,f^{d} of functions whose entries are all in ±1\pm 1. Then clearly we can find coefficients a1,,ad{±1}a_{1},\ldots,a_{d}\in\{\pm 1\} such that the first coordinate of g=ajfjg=\sum a_{j}f^{j} is equal to dd, which means that its 44-norm is at least (d4/n)1/4=d/n1/4(d^{4}/n)^{1/4}=d/n^{1/4}. On the other hand, since the basis is orthonormal, the 22-norm of gg equals d\sqrt{d} which is d/n1/4\ll d/n^{1/4} for dnd\gg\sqrt{n}.

Note the similarity between this algorithm and [ABS10]’s algorithm for Small-Set Expansion, that also worked by showing that if the dimension of the top eigenspace of a graph is too large then it cannot be a small-set expander. Indeed, using our reduction of Small-Set Expansion to the 2q2\to q norm, we can reproduce a similar result to [ABS10].

2 Bounding the value of SoS relaxations

We show that in several cases, the SoS SDP hierarchy gives strong bounds on various instances. At the heart of these results is a general approach of “lifting” proofs about one-dimensional objects into the SoS relaxation domain. Thus we transform the prior proofs that these instances have small objective value, into a proof that the SoS relaxation also has a small objective The crucial observation is that many proofs boil down to the simple fact that a sum of squares of numbers is always non-negative. It turns out that this “sum of squares” axiom is surprisingly powerful (e.g. implying a version of the Cauchy–Schwarz inequality given by Lemma A.4), and many proofs boil down to essentially this principle.

3 The 2-to-4 norm and small-set expansion

Bounds on the pqp\to q norm of operators for p<qp<q have been used to show fast convergence of Markov chains. In particular, it is known that if the projector to the top eigenspace of a graph GG has bounded 242\to 4 norm, then that graph is a small-set expander in the sense that sets of o(1)o(1) fraction of the vertices have most of their edges exit the set. In this work we show a converse to this statement, proving that if GG is a small-set expander, then the corresponding projector has bounded 242\to 4 norm. As mentioned above, one corollary of this result is that a good (i.e., dimension-independent) approximation to the 242\to 4 norm will refute the Small-Set Expansion hypothesis of [RS10].

We give a rough sketch of the proof. Suppose that GG is a sufficiently strong small-set expander, in the sense that every set SS with SδV(G)|S|\leqslant\delta|V(G)| has all but a tiny fraction of the edges (u,v)(u,v) with uSu\in S satisfying v∉Sv\not\in S. Let ff be a function in the eigenspace of GG corresponding to eigenvalues larger than, say 0.990.99. Since ff is in the top eigenspace, for the purposes of this sketch let’s imagine that it satisfies

where the expectation is over a random neighbor yy of xx. Now, suppose that \varmathbbEf(x)2=1\operatorname*{\varmathbb{E}}f(x)^{2}=1 but \varmathbbEf(x)4=C\operatorname*{\varmathbb{E}}f(x)^{4}=C for some Cpoly(1/δ)C\gg\operatorname{poly}(1/\delta). That means that most of the contribution to the 44-norm comes from the set SS of vertices xx such that f(x)(1/2)C1/4f(x)\geqslant(1/2)C^{1/4}, but SδV(G)|S|\ll\delta|V(G)|. Moreover, suppose for simplicity that f(x)((1/2)C1/4,2C1/4)f(x)\in((1/2)C^{1/4},2C^{1/4}), in which case the condition (*) together with the small-set expansion condition that for most vertices yy in Γ(S)\Gamma(S) (the neighborhood of SS) satisfy f(y)C1/4/3f(y)\geqslant C^{1/4}/3, but the small-set expansion condition, together with the regularity of the graph imply that Γ(S)200S|\Gamma(S)|\geqslant 200|S| (say), which implies that \varmathbbEf(x)42C\operatorname*{\varmathbb{E}}f(x)^{4}\geqslant 2C—a contradiction.

The actual proof is more complicated, since we can’t assume the condition (4.1). Instead we will approximate it it by assuming that ff is the function in the top eigenspace that maximizes the ratio f4/f2\lVert f\rVert_{4}/\lVert f\rVert_{2}. See Section 8 for the details.

4 The 2-to-4 norm and the injective tensor norm

To relate the 242\to 4 norm to the injective tensor norm, we start by establishing equivalences between the 242\rightarrow 4 norm and a variety of different tensor problems. Some of these are straightforward exercises in linear algebra, analogous to proving that the largest eigenvalue of MTMM^{T}M equals the square of the operator norm of MM.

One technically challenging reduction is between the problem of optimizing a general degree-4 polynomial f(x)f(x) for x\varmathbbRnx\in\varmathbb R^{n} and a polynomial that can be written as the sum of fourth powers of linear functions of xx. Straightforward approaches will magnify errors by poly(n)\operatorname{poly}(n) factors, which would make it impossible to rule out a PTAS for the 242\rightarrow 4 norm. This would still be enough to prove that a 1/poly(n)1/\operatorname{poly}(n) additive approximation is NP-hard. However, to handle constant-factor approximations, we will instead use a variant of a reduction in [HM10]. This will allow us to map a general tensor optimization problem (corresponding to a general degree-4 polynomial) to a 242\rightarrow 4 norm calculation without losing very much precision.

To understand this reduction, we first introduce the n2×n2n^{2}\times n^{2} matrix A2,2A_{2,2} (defined in Section 9) with the property that A244=maxzTA2,2z\|A\|_{2\rightarrow 4}^{4}=\max z^{T}A_{2,2}z, where the maximum is taken over unit vectors zz that can be written in the form xyx\otimes y. Without this last restriction, the maximum would simply be the operator norm of A2,2A_{2,2}. Operationally, we can think of A2,2A_{2,2} as a quantum measurement operator, and vectors of the form xyx\otimes y as unentangled states (equivalently we say that vectors in this form are tensor product states, or simply “product states”). Thus the difference between A244\|A\|_{2\rightarrow 4}^{4} and A2,2222\|A_{2,2}\|_{2\rightarrow 2}^{2} can be thought of as the extent to which the measurement A2,2A_{2,2} can notice the difference between product states and (some) entangled state.

Next, we define a matrix AA^{\prime} whose rows are of the form (xy)A2,2(x^{\prime}\otimes y^{\prime})^{*}\sqrt{A_{2,2}}, where x,y\varmathbbRnx^{\prime},y^{\prime}\in\varmathbb R^{n} range over a distribution that approximates the uniform distribution. If AA^{\prime} acts on a vector of the form xyx\otimes y, then the maximum output 4-norm (over L2L_{2}-unit vectors x,yx,y) is precisely A24\|A\|_{2\rightarrow 4}. Intuitively, if AA^{\prime} acts on a highly “entangled” vector zz, meaning that z,xy\langle z,x\otimes y\rangle is small for all unit vectors x,yx,y, then Az4\|A^{\prime}z\|_{4} should be small. This is because zz will have small overlap with xyx^{\prime}\otimes y^{\prime}, and A2,2A_{2,2} is positive semi-definite, so its off-diagonal entries can be upper-bounded in terms of its operator norm. These arguments (detailed in Section 9.2) lead to only modest bounds on AA^{\prime}, but then we can use an amplification argument to make the 242\rightarrow 4 norm of AA^{\prime} depend more sensitively on that of AA, at the cost of blowing up the dimension by a polynomial factor.

The reductions we achieve also permit us, in Section 9.3, to relate our Tensor-SDP algorithm with the sum-of-squares relaxation used by Doherty, Parrilo, and Spedalieri [DPS04] (henceforth DPS). We show the two relaxations are essentially equivalent, allowing us to import results proved, in some cases, with techniques from quantum information theory. One such result, from [BaCY11], requires relating A2,2A_{2,2} to a quantum measurement of the 1-LOCC form. This means that there are two nn-dimensional subsystems, combined via tensor products, and A2,2A_{2,2} can be implemented as a measurement on the first subsystem followed by a measurement on the second subsystem that is chosen conditioned on the results of the first measurement. The main result of [BaCY11] proved that such 1-LOCC measurements exhibit much better behavior under DPS, and they obtain nontrivial approximation guarantees with only O(log(n)/ε2)O(\log(n)/\varepsilon^{2}) rounds. Since this is achieved by DPS, it also implies an upper bound on the error of Tensor-SDP. This upper bound is εZ\varepsilon Z, where ZZ is the smallest number for which A2,2ZMA_{2,2}\leqslant ZM for some 1-LOCC measurement MM. While ZZ is not believed to be efficiently computable, it is at least A2,222\|A_{2,2}\|_{2\rightarrow 2}, since any measurement MM has M221\|M\|_{2\rightarrow 2}\leqslant 1. To upper bound ZZ, we can explicitly construct A2,2A_{2,2} as a quantum measurement. This is done by the following protocol. Let a1,,ama_{1},\ldots,a_{m} be the rows of AA. One party performs the quantum measurement with outcomes {αaiaiT}i=1m\{\alpha a_{i}a_{i}^{T}\}_{i=1}^{m} (where α\alpha is a normalization factor) and transmits the outcome ii to the other party. Upon receiving message ii, the second party does the two outcome measurement {βaiaiT,IβaiaiT}\{\beta a_{i}a_{i}^{T},I-\beta a_{i}a_{i}^{T}\} and outputs 0 or 1 accordingly, where β\beta is another normalization factor. The measurement A2,2A_{2,2} corresponds to the “0” outcomes. For this to be a physically realizable 1-LOCC measurement, we need αATA22\alpha\leqslant\|A^{T}A\|_{2\rightarrow 2} and βA22\beta\leqslant\|A\|_{2\rightarrow\infty}^{2}. Combining these ingredients, we obtain the approximation guarantee in Theorem 2.3. More details on this argument are in Section 9.3.1.

5 Definitions and Notation

In most of this paper we use expectation norms defined as above, but in some contexts the counting norms will be more convenient. We will usually stick to the convention that functions use expectation norms while vectors use the counting norms. For a vector v\varmathbbCUv\in\varmathbb C^{\mathcal{U}} and p1p\geqslant 1, the pp counting norm of vv, denoted vp{\boldsymbol{\lVert}}v{\boldsymbol{\rVert}}_{p}, is defined to be (iUvip)1/p\left(\sum_{i\in\mathcal{U}}|v_{i}|^{p}\right)^{1/p}. The counting inner product of two vectors u,v\varmathbbRUu,v\in\varmathbb R^{\mathcal{U}}, denoted as u,v\boldsymbol{\langle}u,v\boldsymbol{\rangle}, is defined to be iUuivi\sum_{i\in\mathcal{U}}u_{i}v_{i}^{*}.

The Tensor-SDP algorithm

There is a very natural SoS program for the 242\to 4 norm for a given linear operator A ⁣:L2(U)L2(V)A\colon L_{2}(\mathcal{U})\to L_{2}(\mathcal{V}):

Note that Af44\lVert Af\rVert_{4}^{4} is indeed a degree 44 polynomial in the variables {f(u)}uU\{f(u)\}_{u\in\mathcal{U}}. The Tensor-SDP(d)\textup{{Tensor-SDP}}^{(d)} algorithm makes sense for d4d\geqslant 4, and we denote by Tensor-SDP its most basic version where d=4d=4. The Tensor-SDP algorithm applies not just to the 242\to 4 norm, but to optimizing general polynomials over the unit ball of L2(U)L_{2}(\mathcal{U}) by replacing Af44\lVert Af\rVert_{4}^{4} with an arbitrary polynomial PP.

While we do not know the worst-case performance of the Tensor-SDP algorithm, we do know that it performs well on random instances (see Section 7), and (perhaps more relevant to the UGC) on the projector to low-degree polynomials (see Theorem 2.2). The latter is a corollary of the following result:

Over the space of nn-variate Fourier polynomialsAn nn-variate Fourier polynomial with degree at most dd is a function f ⁣:{±1}n\varmathbbRf\colon\{\pm 1\}^{n}\to\varmathbb R of the form f=α[n],αdf^αχαf=\sum_{\alpha\subseteq[n],|\alpha|\leqslant d}\hat{f}_{\alpha}\chi_{\alpha} where χα(x)=iαxi\chi_{\alpha}(x)=\prod_{i\in\alpha}x_{i}. ff with degree at most dd,

where the expectations are over {±1}n\{\pm 1\}^{n}.

The result is proven by a careful variant of the standard inductive proof of the hypercontractivity for low-degree polynomials (see e.g. [O’D07]). We include it in this part of the paper since it is the simplest example of how to “lift” known proofs about functions over the reals into proofs about the fictitious random variables that arise in semidefinite programming hierarchies. To strengthen the inductive hypothesis, we will prove the more general statement that for ff and gg being nn-variate Fourier polynomials with degrees at most dd and ee, it holds that \varmathbbEf2g29d+e2(\varmathbbEf2)(\varmathbbEg2)\operatorname*{\varmathbb{E}}f^{2}g^{2}\preceq 9^{\frac{d+e}{2}}\left(\operatorname*{\varmathbb{E}}f^{2}\right)\left(\operatorname*{\varmathbb{E}}g^{2}\right). (Formally, this polynomial relation is over the linear space of pairs of nn-variate Fourier polynomials (f,g)(f,g), where ff has degree at most dd and gg has degree at most ee.) The proof is by induction on the number of variables.

If one of the functions is constant (so that d=0d=0 or e=0e=0), then \varmathbbEf2g2=(\varmathbbEf2)(\varmathbbEg2)\operatorname*{\varmathbb{E}}f^{2}g^{2}=(\operatorname*{\varmathbb{E}}f^{2})(\operatorname*{\varmathbb{E}}g^{2}), as desired. Otherwise, let f0,f1,g0,g1f_{0},f_{1},g_{0},g_{1} be Fourier polynomials depending only on x1,,xn1x_{1},\ldots,x_{n-1} such that f(x)=f0(x)+xnf1(x)f(x)=f_{0}(x)+x_{n}f_{1}(x) and g(x)=g0(x)+xng1(x)g(x)=g_{0}(x)+x_{n}g_{1}(x). The Fourier polynomials f0,f1,g0,g1f_{0},f_{1},g_{0},g_{1} depend linearly on ff and gg (because f0(x)=12f(x1,,xn1,1)+12f(x1,,xn1,1)f_{0}(x)=\tfrac{1}{2}f(x_{1},\ldots,x_{n-1},1)+\tfrac{1}{2}f(x_{1},\ldots,x_{n-1},-1) and f1(x)=12f(x1,,xn1,1)12f(x1,,xn1,1)f_{1}(x)=\tfrac{1}{2}f(x_{1},\ldots,x_{n-1},1)-\tfrac{1}{2}f(x_{1},\ldots,x_{n-1},-1)). Furthermore, the degrees of f0f_{0}, f1f_{1}, g0g_{0}, and g1g_{1} are at most dd, d1d-1, ee, and e1e-1, respectively.

Since \varmathbbExn=\varmathbbExn3=0\operatorname*{\varmathbb{E}}x_{n}=\operatorname*{\varmathbb{E}}x_{n}^{3}=0, if we expand \varmathbbEf2g2=\varmathbbE(f0+xnf1)2(g0+xng1)2\operatorname*{\varmathbb{E}}f^{2}g^{2}=\operatorname*{\varmathbb{E}}(f_{0}+x_{n}f_{1})^{2}(g_{0}+x_{n}g_{1})^{2} then the terms where xnx_{n} appears in an odd power vanish, and we obtain

By expanding the square expression 2\varmathbbE(f0f1g0g1)22\operatorname*{\varmathbb{E}}(f_{0}f_{1}-g_{0}g_{1})^{2}, we get 4\varmathbbEf0f1g0g12\varmathbbEf02g12+f12g024\operatorname*{\varmathbb{E}}f_{0}f_{1}g_{0}g_{1}\preceq 2\operatorname*{\varmathbb{E}}f_{0}^{2}g_{1}^{2}+f_{1}^{2}g_{0}^{2} and thus

Applying the induction hypothesis to all four terms on the right-hand side of (5.1) (using for the last two terms that the degree of f1f_{1} and g1g_{1} is at most d1d-1 and e1e-1),

Since \varmathbbEf02+\varmathbbEf12=\varmathbbE(f0+xnf1)2=\varmathbbEf2\operatorname*{\varmathbb{E}}f_{0}^{2}+\operatorname*{\varmathbb{E}}f_{1}^{2}=\operatorname*{\varmathbb{E}}(f_{0}+x_{n}f_{1})^{2}=\operatorname*{\varmathbb{E}}f^{2} (using \varmathbbExn=0\operatorname*{\varmathbb{E}}x_{n}=0) and similarly \varmathbbEg02+\varmathbbEg12=\varmathbbEg2\operatorname*{\varmathbb{E}}g_{0}^{2}+\operatorname*{\varmathbb{E}}g_{1}^{2}=\operatorname*{\varmathbb{E}}g^{2}, we derive the desired relation \varmathbbEf2g29d+e2(\varmathbbEf2)(\varmathbbEg2)\operatorname*{\varmathbb{E}}f^{2}g^{2}\preceq 9^{\frac{d+e}{2}}\left(\operatorname*{\varmathbb{E}}f^{2}\right)\left(\operatorname*{\varmathbb{E}}g^{2}\right). ∎

SoS succeeds on Unique Games integrality gaps

In this section we prove Theorem 2.6, showing that 8 rounds of the SoS hierarchy can beat the Basic SDP program on the canonical integrality gaps considered in the literature.

For sufficiently small ε\varepsilon and large kk, and every n\varmathbbNn\in\varmathbb N, let W\mathcal{W} be an nn-variable kk-alphabet Unique Games instance of the type considered in [RS09, KS09, KPS10] obtained by composing the “quotient noisy cube” instance of [KV05] with the long-code alphabet reduction of [KKMO04] so that the best assignment to W\mathcal{W}’s variable satisfies at most an ε\varepsilon fraction of the constraints. Then, on input W\mathcal{W}, eight rounds of the SoS relaxation outputs at most 1/1001/100.

The proof is very technical, as it is obtained by taking the already rather technical proofs of soundness for these instances, and “lifting” each step into the SoS hierarchy, a procedure that causes additional difficulties. The high level structure of all integrality gap instances constructed in the literature was the following: Start with a basic integrality gap instance of Unique Games where the Basic SDP outputs 1o(1)1-o(1) but the true optimum is o(1)o(1), the alphabet size of U\mathcal{U} is (necessarily) R=ω(1)R=\omega(1). Then, apply an alphabet-reduction gadget (such as the long code, or in the recent work [BGH+11] the so called “short code”) to transform U\mathcal{U} into an instance W\mathcal{W} with some constant alphabet size kk. The soundness proof of the gadget guarantees that the true optimum of U\mathcal{U} is small, while the analysis of previous works managed to “lift” the completeness proofs, and argue that the instance U\mathcal{U} survives a number of rounds that tends to infinity as ε\varepsilon tends to zero, where (1ε)(1-\varepsilon) is the completeness value in the gap constructions, and exact tradeoff between number of rounds and ε\varepsilon depends on the paper and hierarchy.

The fact that the basic instance U\mathcal{U} has small integral value can be shown by appealing to hypercontractivity of low-degree polynomials, and hence can be “lifted” to the SoS world via Lemma 5.1. The bulk of the technical work is in lifting the soundness proof of the gadget. On a high level this proof involves the following components: (1) The invariance principle of [MOO05], saying that low influence functions cannot distinguish between the cube and the sphere; this allows us to argue that functions that perform well on the gadget must have an influential variable, and (2) the influence decoding procedure of [KKMO04] that maps these influential functions on each local gadget into a good global assignment for the original instance U\mathcal{U}.

The invariance principle poses a special challenge, since the proof of [MOO05] uses so called “bump” functions which are not at all low-degree polynomials.A similar, though not identical, challenge arises in [BGH+11] where they need to extend the invariance principle to the “short code” setting. However, their solution does not seem to apply in our case, and we use a different approach. We use a weaker invariance principle, only showing that the 44 norm of a low influence function remains the same between two probability spaces that agree on the first 22 moments. Unlike the usual invariance principle, we do not move between Bernoulli variables and Gaussian space, but rather between two different distributions on the discrete cube. It turns out that for the purposes of these Unique Games integrality gaps, the above suffices. The lifted invariance principle is proven via a “hybrid” argument similar to the argument of [MOO05], where hypercontractivity of low-degree polynomials again plays an important role.

The soundness analysis of [KKMO04] is obtained by replacing each local function with an average over its neighbors, and then choosing a random influential coordinate from the new local function as an assignment for the original uniquegames instance. We follow the same approach, though even simple tasks such as independent randomized rounding turn out to be much subtler in the lifted setting. However, it turns out that by making appropriate modification to the analysis, it can be lifted to complete the proof of Theorem 2.6.

We now turn to the actual proof of Theorem 6.1. The proof consists of lifting to the SoS hierarchy all the steps used in the analysis of previous integrality gaps, which themselves arise from hardness reductions. We start in Section 6.2 by showing a sum-of-squares proof for a weaker version of [MOO05]’s invariance principle. Then in Section 6.3 we show how one can perform independent rounding in the SoS world (this is a trivial step in proofs involving true random variables, but becomes much more subtle when dealing with SoS solutions). In Sections 6.4 and 6.5 we lift variants of the [KKMO04] dictatorship test. The proof uses a SoS variant of influence decoding, which is covered in Section 6.6. Together all these sections establish SoS analogs of the soundness properties of the hardness reduction used in the previous results. Then, in Section 6.7 we show that analysis of the basic instance has a sum of squares proof (since it is based on hypercontractivity of low-degree polynomials). Finally in Section 6.8 we combine all these tools to conclude the proof. In Section 6.9 we discuss why this proof applies (with some modifications) also to the “short-code” based instances of [BGH+11].

2 Invariance Principle for Fourth Moment

In this section, we will give a sum-of-squares proof for a variant of the invariance principle of [MOO05]. Instead of for general smooth functionals (usually constructed from “bump functions”), we show invariance only for the fourth moment. It turns out that invariance of the fourth moment is enough for our applications.

Let k=2tk=2^{t} for t\varmathbbNt\in\varmathbb N and let X=(X1,,XR)\mathcal{X}=(\mathcal{X}_{1},\ldots,\mathcal{X}_{R}) be an independent sequenceAn orthonormal ensemble is a collection of orthonormal real-valued random variables, one being the constant 11. A sequence of such ensembles is independent if each ensemble is defined over an independent probability space. (See [MOO05] for details.) of orthonormal ensembles Xr=(Xr,0,,Xr,k1)\mathcal{X}_{r}=(X_{r,0},\ldots,X_{r,k-1}). Concretely, we choose Xr,i=χi(xr)X_{r,i}=\chi_{i}(x_{r}), where χ0,,χk1{\chi_{0},\ldots,\chi_{k-1}} is the set of characters of \varmathbbF2t\varmathbb F_{2}^{t} and xx is sampled uniformly from (\varmathbbF2t)R(\varmathbb F_{2}^{t})^{R}. Every random variable over (\varmathbbF2t)R(\varmathbb F_{2}^{t})^{R} can be expressed as a multilinear polynomial over the sequence X\mathcal{X}. In this sense, X\mathcal{X} is maximally dependent. On the other hand, let Y=(Y1,,YR)\mathcal{Y}=(\mathcal{Y}_{1},\ldots,\mathcal{Y}_{R}) be a sequence of ensembles Yr=(Yr,0,,Yr,k1)\mathcal{Y}_{r}=(Y_{r,0},\ldots,Y_{r,k-1}), where Yr,01Y_{r,0}\equiv 1 and Yr,jY_{r,j} are independent, unbiased {±1}\{\pm 1\} Bernoulli variables. The sequence Y\mathcal{Y} is maximally independent since it consists of completely independent random variables.

(Since the expressions \varmathbbEXf4\operatorname*{\varmathbb{E}}_{\mathcal{X}}f^{4} and \varmathbbEYf4\operatorname*{\varmathbb{E}}_{\mathcal{Y}}f^{4} are degree-44 polynomials in ff, their pseudo-expectations are defined.)

We consider the following intermediate sequences of ensembles Z(r)=(X1,,Xr,Yr+1,,YR)\mathcal{Z}^{(r)}=(\mathcal{X}_{1},\ldots,\mathcal{X}_{r},\mathcal{Y}_{r+1},\ldots,\mathcal{Y}_{R}). Note that Z(0)=Y\mathcal{Z}^{(0)}=\mathcal{Y} and Z(R)=X\mathcal{Z}^{(R)}=\mathcal{X}. For r\varmathbbNr\in\varmathbb N, we write f=Erf+Drff=E_{r}f+D_{r}f, where ErfE_{r}f is the part of ff that does not dependent on coordinate rr and Drf=fErfD_{r}f=f-E_{r}f. For all r\varmathbbNr\in\varmathbb N, the following identities (between polynomials in ff) hold

The last step uses that the first two moments of the ensembles Xr\mathcal{X}_{r} and Yr\mathcal{Y}_{r} match and that ErfE_{r}f does not dependent on coordinate rr.

Next, we consider the term r\varmathbbEZ(r)(Erf)(Drf)3\sum_{r}\operatorname*{\varmathbb{E}}_{\mathcal{Z}^{(r)}}(E_{r}f)(D_{r}f)^{3}. (The remaining two terms are analogous.) To bound its pseudo-expectation, we apply Cauchy-Schwarz,

3 Interlude: Independent Rounding

In this section, we will show how to convert variables that satisfy f2ff^{2}\preceq f to variables fˉ\bar{f} satisfying fˉ2=fˉ\bar{f}^{2}=\bar{f}. The derived variables fˉ\bar{f} will inherit several properties of the original variables ff (in particular, multilinear expectations). This construction corresponds to the standard independent rounding for variables with values between and 11. The main challenge is that our random variables are fictitious.

Let ff be a 44-f.r.v. over \varmathbbRn\varmathbb R^{n}. Suppose fi2fif_{i}^{2}\preceq f_{i} (in terms of an unspecified jointly-distributed 44-f.r.v.). Note that for real numbers xx, the condition x2xx^{2}\leqslant x is equivalent to xx\in.

4 Dictatorship Test for Small-Set Expansion

Let Ω={0,,k1}\Omega=\{0,\ldots,k-1\} and let T1εT_{1-\varepsilon} be the noise graph on ΩR\Omega^{R} with second largest eigenvalue 1ε1-\varepsilon. Let ff be a 44-f.r.v. over L2(ΩR)L_{2}(\Omega^{R}). Suppose f2ff^{2}\preceq f (in terms of an unspecified jointly-distributed 44-f.r.v.). Note that for real numbers xx, the condition x2xx^{2}\leqslant x is equivalent to xx\in.

The following theorem is an analog of the “Majority is Stablest” result [MOO05].

(Here, we assume that ε\varepsilon, δ\delta and τ\tau are sufficiently small.)

The previous theorem is about graph expansion (measured by the quadratic form f,T1εf\langle f,T_{1-\varepsilon}f\rangle). The following lemma allows us to relate graph expansion to the 44-norm of the projection of ff into the span of the eigenfunctions of T1εT_{1-\varepsilon} with significant eigenvalue. We will be able to bound this 44-norm in terms of the influences of ff (using the invariance principle in the previous section).

Let ff be a 44-f.r.v. over L2(ΩR)L_{2}(\Omega^{R}). Suppose f2ff^{2}\preceq f (in terms of unspecified jointly-distributed 44-f.r.v. s). Then for all λ>0\lambda>0,

Here, P>λP_{>\lambda} is the projector into the span of the eigenfunctions of T1εT_{1-\varepsilon} with eigenvalue larger than λ\lambda.

The following relation between polynomials holds

By Corollary 6.5, there exists a 44-f.r.v. (f,fˉ)(f,\bar{f}) over L2(ΩR)×L2(ΩR)L_{2}(\Omega^{R})\times L_{2}(\Omega^{R}) such that fˉ2fˉfˉ\bar{f}^{2}\equiv_{\bar{f}}\bar{f}. Then,

To balance the terms (1/λ)O(1/ε)δ5/4(1/\lambda)^{O(1/\varepsilon)}\delta^{5/4} and λδ\lambda\delta, we choose λ=δΩ(ε)\lambda=\delta^{\Omega(\varepsilon)}. We conclude the desired bound,

5 Dictatorship Test for Unique Games

Let Ω=\varmathbbZk\Omega=\varmathbb Z_{k} (cyclic group of order kk) and let ff be a 44-f.r.v. over L2(Ω×ΩR)L_{2}(\Omega\times\Omega^{R}). Here, f(a,x)f(a,x) is intended to be 0/10/1 variable indicating whether symbol aa is assigned to the point xx.

The following graph T1εT^{\prime}_{1-\varepsilon} on Ω×ΩR\Omega\times\Omega^{R} corresponds to the 22-query dictatorship test for Unique Games [KKMO04],

Here, y1εxy\sim_{1-\varepsilon}x means that yy is a random neighbor of xx in the graph T1εT_{1-\varepsilon} (the ε\varepsilon-noise graph on ΩR\Omega^{R}).

We define fˉ(x):=\varmathbbEcΩf(c,xc\varmathbbI)\bar{f}(x)\mathrel{\mathop{:}}=\operatorname*{\varmathbb{E}}_{c\in\Omega}f(c,x-c\cdot\varmathbb I). (We think of fˉ\bar{f} as a variable over L2(ΩR)L_{2}(\Omega^{R}).) Then, the following polynomial identity (in ff) holds

(Here, we assume that ε\varepsilon, δ\delta and τ\tau are sufficiently small.)

6 Influence Decoding

Let U\mathcal{U} be a unique game with vertex set VV and alphabet [R][R]. Recall that we represent U\mathcal{U} as a distribution over triples (u,v,π)(u,v,\pi) where u,vVu,v\in V and π\pi is a permutation of [R][R]. The triples encode the constraints of U\mathcal{U}. We assume that the unique game U\mathcal{U} is regular in the same that every vertex participates in the same fraction of constraints.

Let Ω=\varmathbbZk\Omega=\varmathbb Z_{k} (cyclic group of order kk). We reduce U\mathcal{U} to a unique game W=Wε,k(U)\mathcal{W}=\mathcal{W}_{\varepsilon,k}(\mathcal{U}) with vertex set V×ΩRV\times\Omega^{R} and alphabet Ω\Omega. Let f={fu}uVf=\{f_{u}\}_{u\in V} be a variable over L2(Ω×ΩR)VL_{2}(\Omega\times\Omega^{R})^{V}. The unique game W\mathcal{W} corresponds to the following quadratic form in ff,

Here, (u,v,π)Uu(u,v,\pi)\sim\mathcal{U}\mid u denotes a random constraint of U\mathcal{U} incident to vertex uu, the graph T1εT^{\prime}_{1-\varepsilon} corresponds to the dictatorship test of Unique Games defined in Section 6.5, and fv(π)(a,x)=fv(a,π.x)f^{(\pi)}_{v}(a,x)=f_{v}(a,\pi.x) is the function obtained by permuting the last RR coordinates according to π\pi (where π.x(i)=xπ(i)\pi.x(i)=x_{\pi(i)}).

We define gu=\varmathbbE(u,v,π)Uufv(π)g_{u}=\operatorname*{\varmathbb{E}}_{(u,v,\pi)\sim\mathcal{U}\mid u}f^{(\pi)}_{v}. Then,

By applying Theorem 6.8 to (6.2), we can bound the objective value of ff

The arguments in this subsection imply the following theorem.

The optimal value of the level-dd SoS relaxation for the unique game W=Wε,k(U)\mathcal{W}=\mathcal{W}_{\varepsilon,k}(\mathcal{U}) is bounded from above by

Since the quadratic form GUh2\lVert G_{\mathcal{U}}h\rVert^{2} has only nonnegative coefficients (in the standard basis), we can use Corollary 6.5 to ensure that the level-d/2d/2 random variable hh satisfies in addition h2hhh^{2}\equiv_{h}h.

7 Certifying Small-Set Expansion

Let T1εT_{1-\varepsilon} be a the noise graph on {±1}R\{\pm 1\}^{R} with second largest eigenvalue 1ε1-\varepsilon.

By Lemma 6.7 (applying it for the case Ω={0,1}\Omega=\{0,1\}), for every λ>0\lambda>0,

To balance the terms, we choose λ=δΩ(ε)\lambda=\delta^{\Omega(\varepsilon)}, which gives the desired bound. ∎

8 Putting Things Together

Let T1ηT_{1-\eta} be a the noise graph on {±1}R\{\pm 1\}^{R} with second largest eigenvalue 1η1-\eta. Let U=Uη,R\mathcal{U}=\mathcal{U}_{\eta,R} be an instance of Unique Games with label-extended graph GU=T1ηG_{\mathcal{U}}=T_{1-\eta} (e.g., the construction in [KV05]).

Combining Theorem 6.9 (with d=4d=4) and Theorem 6.11 gives the following result.

The optimal value of the level-88 SoS relaxation for the unique game W=Wε,k(Uη,R)\mathcal{W}=\mathcal{W}_{\varepsilon,k}(\mathcal{U}_{\eta,R}) is bounded from above by

In particular, the optimal value of the relaxation is close to 1/kΩ(ε)1/k^{\Omega(\varepsilon)} if logR(logk)2/η\log R\gg(\log k)^{2}/\eta.

9 Refuting Instances based on Short Code

Let U=Uη,R\mathcal{U}^{\prime}=\mathcal{U}^{\prime}_{\eta,R} be an instance of Unique Games according to the basic construction in [BGH+11]. (The label-extended graph of U\mathcal{U} will be a subgraph of T1εT_{1-\varepsilon} induced by the subset of {±1}R\{\pm 1\}^{R} corresponding to a Reed–Muller code, that is, evaluations of low-degree \varmathbbF2\varmathbb F_{2}-polynomials.)

Let W=Wε,k(Uη,R)\mathcal{W}^{\prime}=\mathcal{W}^{\prime}_{\varepsilon,k}(\mathcal{U}^{\prime}_{\eta,R}) be the unique game obtained by applying the short-code alphabet reduction of [BGH+11].

The following analog of Theorem 6.12 holds.

The optimal value of the level-88 SoS relaxation for the unique game W=Wε,k(Uη,R)\mathcal{W}^{\prime}=\mathcal{W}^{\prime}_{\varepsilon,k}(\mathcal{U}^{\prime}_{\eta,R}) is bounded from above by

In particular, the optimal value of the relaxation is close to 1/kΩ(ε)1/k^{\Omega(\varepsilon)} if logR(logk)2/η\log R\gg(\log k)^{2}/\eta.

The proof of Theorem 6.13 is almost literally the same as the proof of Theorem 6.12. In the following, we sketch the main arguments why the proof doesn’t have to change. First, several of the results of the previous sections apply to general graphs and instances of Unique Games. In particular, Lemma 6.7 applies to general graphs and Theorem 6.9 applies to general gadget-composed instances of unique games assuming a “Majority is Stablest” result for the gadget. In fact, the only parts that require further justification are the invariance principle (Theorem 6.2) and hypercontractivity bound (Lemma 5.1). Both the invariance principle and the hypercontractivity bound are about the fourth moment of a low-degree Fourier polynomial (whose coefficients are fictitious random variables). For the construction of [BGH+11], we need to argue about the fourth moment with respect to a different distribution over inputs. (Instead of the uniform distribution, [BGH+11] considers a distribution over inputs related to the Reed–Muller code.) However, this distribution happens to be kk-wise independent for k/4k/4 larger than the degree of our Fourier polynomial. Hence, as a degree-44 polynomial in Fourier coefficients, the fourth moment with respect to the [BGH+11]-input distribution is the same as with respect to the uniform distribution, which considered here.

Hypercontractivity of random operators

We already saw that the Tensor-SDP algorithm provides non-trivial guarantees on the 242\to 4 norms of the projector to low-degree polynomials. In this section we show that it also works for a natural but very different class of instances, namely random linear operators. Let

where eie_{i} is the vector with a 1 in the ithi^{\text{th}} position, and each ai,ja_{i,j} is chosen i.i.d. from a distribution D\mathcal{D} on \varmathbbR\varmathbb R. We will show that Tensor-SDP returns an answer close to the correct value under fairly general assumptions on D\mathcal{D}. Specifically, we will assume that

for some constant ψ>0\psi>0. Two examples of distributions that meet these criteria are the uniform distribution over {1,1}\{-1,1\} and the standard (mean-zero, unit variance) Gaussian distribution.

The following theorem is the main result of this section, and shows that the approximation ratio of Tensor-SDP approaches 1 as m,nm,n\rightarrow\infty and n2/m0n^{2}/m\rightarrow 0.

There exist constants 0<c1<c20<c_{1}<c_{2} such that

with high probability (i.e. probability 1o(1)1-o(1)) over random matrices AA distributed according to (7.1) with D\mathcal{D} satisfying (7.2). Here o(1)o(1) refers to quantities that approach zero when both mm and nn approach \infty.

To get some intuition for these terms, note that \varmathbbE[Z4]=3\operatorname*{\varmathbb{E}}[Z^{4}]=3 if ZZ is a standard Gaussian random variable. Similarly for any fixed vector x\varmathbbRnx\in\varmathbb R^{n}, if AA has standard Gaussian entries, then AxAx will have Gaussian entries with mean zero and variance x2\lVert x\rVert^{2}. Even if AA has general entries (with variance 1) this will be approximately true because of the central limit theorem. This accounts for the 3 term that dominates when mn2m\gg n^{2}. On the other hand, the lower bound of n2/mn^{2}/m holds because we can choose xx after AA is chosen. We will see that it emerges from choosing the signs of xx to match those of any row of AA. These lower bounds hold under rather general assumptions and variants of them apply even without any randomness, as we will briefly explore in Lemma 7.5.

The upper bound on the value of Tensor-SDP is rather more complicated. It will be seen to follow from a concentration-of-measure bound for matrices. However, the argument does not simply involve bounding the top eigenvalue of the appropriate random matrix. We will need to further make use of the symmetry properties of Tensor-SDP; indeed it appears crucial that Tensor-SDP uses semidefinite programming rather than simply an eigenvalue calculation. See Remark 7.6 for more on this point and for a comparison with the case of complex matrices.

Before proving Theorem 7.1, we introduce some results from the literature. First, we discuss the implications of (7.2d).

Let ZZ be a real-valued random variable with \varmathbbE[Z]=0\operatorname*{\varmathbb{E}}[Z]=0. The following are equivalent with parameters Ci>0C_{i}>0 differing by at most constant factors:

Sub-gaussian moment: \varmathbbE[exp((Z/C1)2)]2\operatorname*{\varmathbb{E}}[\exp((Z/C_{1})^{2})]\leqslant 2.

Moments: \varmathbbE[Zk](C2k)k\operatorname*{\varmathbb{E}}[|Z|^{k}]\leqslant(C_{2}\sqrt{k})^{k} for all nonnegative integers kk.

Tails: \varmathbbP[Zt]exp(1t2/C33)\operatorname*{\varmathbb{P}}[|Z|\geqslant t]\leqslant\exp(1-t^{2}/C_{3}^{3}) for all t0t\geqslant 0.

Sub-exponential moments: \varmathbbE[exp(tZ)]exp(t2C42)\operatorname*{\varmathbb{E}}[\exp(tZ)]\leqslant\exp(t^{2}C_{4}^{2}) for all t\varmathbbRt\in\varmathbb R.

The largest ψ\psi for which (7.2d) holds is called the ψ2\psi_{2} norm of a distribution, where the 22 refers to the fact that we have Z2Z^{2} in the exponent. In general, the ψp\psi_{p} norm of a distribution D\mathcal{D} refers to the smallest ψ>0\psi>0 such that \varmathbbEZD[exp(Z/ψp)]2\operatorname*{\varmathbb{E}}_{Z\sim\mathcal{D}}[\exp(|Z/\psi|^{p})]\leqslant 2.

In what follows, we will also need to define the ψ1\psi_{1} norm of a vector. If D\mathcal{D} is now a distribution on \varmathbbRN\varmathbb R^{N}, define the ψp\psi_{p} norm Dψp\lVert\mathcal{D}\rVert_{\psi_{p}} to be the smallest ψ>0\psi>0 such that

or \infty if no finite such ψ\psi exists. We depart from the normal convention by including a factor of Np/2N^{p/2} in the definition, so that our expectation-norm convention will be consistent with the results in [ALPTJ11], and specifically Lemma 7.3 below. This definition is also consistent with the term “subgaussian moment” since a vector of i.i.d. Gaussians with mean zero and variance one will have ψ2\psi_{2} equal to a constant (in fact 8/3\sqrt{8/3}). We will abuse notation and write Xψp\|X\|_{\psi_{p}} to refer to the ψp\psi_{p} norm of the distribution D\mathcal{D} associated with a random variable XX.

Relations between norms. One can verify that if a vector has i.i.d. entries with O(1)O(1) ψ2\psi_{2} norm, then the distribution over vectors also has Dψ2O(1)\|\mathcal{D}\|_{\psi_{2}}\leqslant O(1). However, the relation between ψp\psi_{p} norms for different values of pp is less clear in the vector case. If ZZ is a real-valued random variable, then Xψpp=Xpψ1\|X\|^{p}_{\psi_{p}}=\|X^{p}\|_{\psi_{1}}, but there is no simple analogue of this for vectors. We will see an example below where vvψ1\|v\otimes v\|_{\psi_{1}} can be larger than vψ22\|v\|_{\psi_{2}}^{2} by a dimension-dependent factor; also we will see why bounding the ψ1\psi_{1} norm is important.

We will require a bound from [ALPTJ10, ALPTJ11] about the convergence of sums of i.i.d rank-one matrices.

Let b1,,bmb_{1},\ldots,b_{m} be independent random vectors in \varmathbbRN\varmathbb R^{N} with biψ1φ\|b_{i}\|_{\psi_{1}}\leqslant\varphi and satisfying

Then with probability 12exp(cN)\geqslant 1-2\exp(-c\sqrt{N}), we have

where ε=C(φ+K)2max(N/m,N/m)\varepsilon=C(\varphi+K)^{2}\max(N/m,\sqrt{N/m}) with c,C>0c,C>0 universal constants.

The NmN\leqslant m case (when the N/m\sqrt{N/m} term is applicable) was proven in Theorem 1 of [ALPTJ11], and the N>mN>m case (i.e. when the max is achieved by N/mN/m) is discussed in Remark 1.2 of [ALPTJ11] (see also Theorem 3.13 of [ALPTJ10]).

We will also make use of the Hanson-Wright inequality, in the form proved by Rudelson and Vershynin in [RV13].

Let a\varmathbbRNa\in\varmathbb R^{N} be a random vector with independent components satisfying \varmathbbE[ai]=0\operatorname*{\varmathbb{E}}[a_{i}]=0 and aiψ2ψ\|a_{i}\|_{\psi_{2}}\leqslant\psi. Let YY be an n×nn\times n matrix. For t0t\geqslant 0,

Here YF\|Y\|_{F} is the Frobenius norm, defined as YF:=TrYTY\|Y\|_{F}:=\sqrt{\operatorname{Tr}Y^{T}Y}.

2 Proof of Theorem 7.1

First we prove that, for some constant c1>0c_{1}>0,

with high probability (i.e. probability 1o(1)1-o(1)). Define a vector x\varmathbbRnx\in\varmathbb R^{n} by xj=sign(a1,j)x_{j}=\operatorname{sign}(a_{1,j}) (or arbitrarily if a1,j=0a_{1,j}=0). Then x2=1\|x\|_{2}=1 and

Next, we will average over the choice of AA. The first term is proportional to \varmathbbE[a1,j1a1,j2a1,j3a1,j4]=\varmathbbE[a1,1]4\operatorname*{\varmathbb{E}}[|a_{1,j_{1}}a_{1,j_{2}}a_{1,j_{3}}a_{1,j_{4}}|]=\operatorname*{\varmathbb{E}}[|a_{1,1}|]^{4}, if we drop the O(1/n)O(1/n) fraction of terms where the j1,j2,j3,j4j_{1},j_{2},j_{3},j_{4} are not all distinct.

For the second term, recall that \varmathbbE[ai,j]=0\operatorname*{\varmathbb{E}}[a_{i,j}]=0. Since xjx_{j} and ai,ja_{i,j} are independent for i1i\neq 1 we also have \varmathbbE[ai,jxj]=0\operatorname*{\varmathbb{E}}[a_{i,j}x_{j}]=0. Thus the only terms that survive have j1,j2,j3,j4j_{1},j_{2},j_{3},j_{4} paired off in one of three ways: either j1=j2,j3=j4j_{1}=j_{2},j_{3}=j_{4} or j1=j3,j2=j4j_{1}=j_{3},j_{2}=j_{4} or j1=j4,j2=j3j_{1}=j_{4},j_{2}=j_{3}. Each of these contributes an identical \varmathbbE[a1,12]2\operatorname*{\varmathbb{E}}[a_{1,1}^{2}]^{2}. We can neglect the overcounting from the j1=j2=j3=j4j_{1}=j_{2}=j_{3}=j_{4} terms because it accounts for an O(1/n)O(1/n) fraction of the terms and by Lemma 7.2, \varmathbbE[a1,14]=μ4O(1)\operatorname*{\varmathbb{E}}[a_{1,1}^{4}]=\mu_{4}\leqslant O(1). We conclude that

By assumption \varmathbbE[a1,12]=1\operatorname*{\varmathbb{E}}[a_{1,1}^{2}]=1. From Lemma 7.2 we have that \varmathbbE[a1,14]O(1)\operatorname*{\varmathbb{E}}[a_{1,1}^{4}]\leqslant O(1). Recall the Berger-Hölder inequality \varmathbbE[X]\varmathbbE[X2]3/2\varmathbbE[X4]1/2\operatorname*{\varmathbb{E}}[|X|]\geqslant\frac{\operatorname*{\varmathbb{E}}[X^{2}]^{3/2}}{\operatorname*{\varmathbb{E}}[X^{4}]^{1/2}} from [Ber97]. Combining these facts we have that \varmathbbE[a1,1]μ41/2=:c11/4\operatorname*{\varmathbb{E}}[|a_{1,1}|]\geqslant\mu_{4}^{-1/2}=:c_{1}^{1/4}. This proves that \varmathbbE[Ax44](3+c1n2/m)(1o(1))\operatorname*{\varmathbb{E}}[\|Ax\|_{4}^{4}]\geqslant(3+c_{1}n^{2}/m)(1-o(1)).

To show that this inequality holds with high probability we will argue that \varmathbbV(Ax44)o(\varmathbbE[Ax44]2)\operatorname*{\varmathbb{V}}(\|Ax\|_{4}^{4})\leqslant o(\operatorname*{\varmathbb{E}}[\|Ax\|_{4}^{4}]^{2}). We calculate

The dominant terms here all correspond (up to 1o(1)1-o(1) factors) to terms in the expansion of \varmathbbE[Ax44]2\operatorname*{\varmathbb{E}}[\|Ax\|_{4}^{4}]^{2}. These terms correspond to i=1i=1, j1,j2,j3,j4j_{1},j_{2},j_{3},j_{4} all distinct or i1i\neq 1, j1,j2,j3,j4j_{1},j_{2},j_{3},j_{4} comprising two elements each repeated twice; and similarly one of those two possibilities for i,j1,j2,j3,j4i^{\prime},j_{1}^{\prime},j_{2}^{\prime},j_{3}^{\prime},j_{4}^{\prime}. In the case when i,i1i,i^{\prime}\neq 1, the dominant contribution comes from iii\neq i^{\prime} for which the only pairings we count involve matchings within the {j1,j2,j3,j4}\{j_{1},j_{2},j_{3},j_{4}\} and {j1,j2,j3,j4}\{j_{1}^{\prime},j_{2}^{\prime},j_{3}^{\prime},j_{4}^{\prime}\}, but not between these two sets. The terms we neglect in this way are smaller by a O(n1+m1)O(n^{-1}+m^{-1}) factor. We conclude that (7.9a) holds with high probability.

In case μ4\mu_{4} is large, this bound may not be optimal. In that case, we choose x=nejx=\sqrt{n}e_{j} for some j[n]j\in[n]. Then Ax=iai,jeiAx=\sum_{i}a_{i,j}e_{i} and Ax44=1mi[m]ai,j4\|Ax\|_{4}^{4}=\frac{1}{m}\sum_{i\in[m]}a_{i,j}^{4}. Thus \varmathbbE[Ax44]=μ4\operatorname*{\varmathbb{E}}[\|Ax\|_{4}^{4}]=\mu_{4}. For any fixed jj, we have \varmathbbE[Ax48]=μ42+m1(\varmathbbE[a1,18]μ42)\operatorname*{\varmathbb{E}}[\|Ax\|_{4}^{8}]=\mu_{4}^{2}+m^{-1}(\operatorname*{\varmathbb{E}}[a_{1,1}^{8}]-\mu_{4}^{2}), implying that \varmathbbV[Ax44]=O(1/m)\operatorname*{\varmathbb{V}}[\|Ax\|_{4}^{4}]=O(1/m). This implies that (7.9b) holds with high probability, and concludes the proof of the lower bound on A244\|A\|_{2\rightarrow 4}^{4}.

The more interesting half of the proof is to show that

Let ai:=j[n]ai,jeja_{i}:=\sum_{j\in[n]}a_{i,j}e_{j} so that A=i=1meiaiT/nA=\sum_{i=1}^{m}e_{i}a_{i}^{T}/\sqrt{n}. Define A2,2=1mi=1maiaiTaiaiTA_{2,2}=\frac{1}{m}\sum_{i=1}^{m}a_{i}a_{i}^{T}\otimes a_{i}a_{i}^{T}. For n2×n2n^{2}\times n^{2} real matrices X,YX,Y, define X,Y:=TrXTY/n2=\varmathbbEi,j[n]Xi,jYi,j\langle X,Y\rangle:=\operatorname{Tr}X^{T}Y/n^{2}=\operatorname*{\varmathbb{E}}_{i,j\in[n]}X_{i,j}Y_{i,j}. Additionally define the convex set X\mathcal{X} to be the set of n2×n2n^{2}\times n^{2} real matrices X=(X(i1,i2),(i3,i4))i1,i2,i3,i4[n]X=(X_{(i_{1},i_{2}),(i_{3},i_{4})})_{i_{1},i_{2},i_{3},i_{4}\in[n]} with X0X\succeq 0, \varmathbbEi,j[n]X(i,j),(i,j)=1\operatorname*{\varmathbb{E}}_{i,j\in[n]}X_{(i,j),(i,j)}=1 and X(i1,i2),(i3,i4)=X(iπ(1),iπ(2)),(iπ(3),iπ(4))X_{(i_{1},i_{2}),(i_{3},i_{4})}=X_{(i_{\pi(1)},i_{\pi(2)}),(i_{\pi(3)},i_{\pi(4)})} for any permutation πS4\pi\in\mathcal{S}_{4}. Finally, let hX(Y):=maxXXX,Yh_{\mathcal{X}}(Y):=\max_{X\in\mathcal{X}}\langle X,Y\rangle. It is straightforward to show (c.f. Lemma 9.3) that

We note that if X\mathcal{X} were defined without the symmetry constraint, it would simply be the convex hull of xxTxx^{T} for unit vectors x\varmathbbRn2x\in\varmathbb R^{n^{2}} and Tensor-SDP(A)\textup{{Tensor-SDP}}(A) would simply be the largest eigenvalue of A2,2A_{2,2}. However, we will later see that the symmetry constraint is crucial to Tensor-SDP(A)\textup{{Tensor-SDP}}(A) being O(1)O(1).

Our strategy will be to analyze A2,2A_{2,2} by applying Lemma 7.3 to show that A2,2A_{2,2} is close to Σ:=\varmathbbE[A2,2]=\varmathbbE[aiaiTaiaiT]\Sigma:=\operatorname*{\varmathbb{E}}[A_{2,2}]=\operatorname*{\varmathbb{E}}[a_{i}a_{i}^{T}\otimes a_{i}a_{i}^{T}]. First we calculate Σ\Sigma. Following the paragraph above (7.10) we find that

We can write this more concisely in terms of operators. Define

This will help us compute the spectrum of Σ\Sigma. First observe that F(xy)=yxF(x\otimes y)=y\otimes x for any x,y\varmathbbRnx,y\in\varmathbb R^{n} and thus has all eigenvalues ±1\pm 1. The spectrum of Δ\Delta is similarly bounded, since it is a projector onto the nn-dimensional subspace spanned by {eiei:i[n]}\{e_{i}\otimes e_{i}:i\in[n]\}. On the other hand, ΦΦT\Phi\Phi^{T} has a single eigenvalue equal to nn, and the rest equal to zero. Putting this together, Σ\Sigma has one eigenvalue equal to n+μ41n+\mu_{4}-1, n1n-1 eigenvalues equal to μ41\mu_{4}-1, n(n1)2\frac{n(n-1)}{2} eigenvalues equal to 22 and n(n1)2\frac{n(n-1)}{2} eigenvalues equal to 0.

We would like to show that A2,2A_{2,2} converges to Σ\Sigma using Lemma 7.3. To this end, define Σ0:=I+F+ΦΦT\Sigma_{0}:=I+F+\Phi\Phi^{T} and define bi:=Σ01/2(aiai)b_{i}:=\Sigma_{0}^{-1/2}(a_{i}\otimes a_{i}). Here Σ01/2\Sigma_{0}^{-1/2} refers to the square root of the pseudo-inverse of Σ0\Sigma_{0}. We can compute this by considering Σ0\Sigma_{0} to be an operator on 2\varmathbbRn\vee^{2}\varmathbb R^{n}, which is defined to be the symmetric subspace of \varmathbbRn\varmathbbRn\varmathbb R^{n}\otimes\varmathbb R^{n}, i.e. the set of +1+1-eigenvectors of FF. Observe that Σ01/21/2\|\Sigma_{0}^{-1/2}\|\leqslant 1/2.

Next we would like to bound the ψ1\psi_{1} norm of bib_{i}, corresponding to the fact that this distribution is not too “pointy.” To this end we will compute

where φ>0\varphi>0 will be chosen later, we have assumed WLOG in the first equation that x2\varmathbbRnx\in\vee^{2}\varmathbb R^{n} and we have defined y=Σ1/2xy=\Sigma^{-1/2}x in the last equation. Let YY be the n×nn\times n matrix with Yi,j=yi,j/nY_{i,j}=y_{i,j}/n. Thus y,aa=aTYa/n\langle y,a\otimes a\rangle=a^{T}Ya/n. To interpret the condition Σ01/2y21\|\Sigma_{0}^{1/2}y\|_{2}\leqslant 1, observe that

Then the ψ1\psi_{1} norm of bb is the smallest positive φ\varphi for which

is 2\leqslant 2. Choose a YY achieving the maximum in (7.22). Observe that

From (7.2a) and (7.2b) we have \varmathbbE[aTYa]=TrY1\operatorname*{\varmathbb{E}}[a^{T}Ya]=\operatorname{Tr}Y\leqslant 1. Applying Lemma 7.4 and using (7.23) we have that

Note that for a random variable XX, \varmathbbE[eX]=0sds\varmathbbP[eXs]=dtet\varmathbbP[Xt]\operatorname*{\varmathbb{E}}[e^{X}]=\int_{0}^{s}ds\,\operatorname*{\varmathbb{P}}[e^{X}\geqslant s]=\int_{-\infty}^{\infty}dt\,e^{t}\operatorname*{\varmathbb{P}}[X\geqslant t]. Combining this with (7.24) we can upper bound (7.22) as

Using TrY1\operatorname{Tr}Y\leqslant 1, it follows that we can take φ=O(ψ2)\varphi=O(\psi^{2}) to bound (7.25) 2\leqslant 2. We conclude that the ψ1\psi_{1} norm of the bib_{i} vectors are O(ψ2)\leqslant O(\psi^{2}).

We can now apply Lemma 7.3 to find that with high probability

where ε=Cψ4max(n2/m,n2/m)\varepsilon=C\psi^{4}\max(n^{2}/m,\sqrt{n^{2}/m}) for some C>0C>0. Rearranging we find that

To translate this operator inequality into a statement about hXh_{\mathcal{X}} we observe that

hX(M1)hX(M2)h_{\mathcal{X}}(M_{1})\leqslant h_{\mathcal{X}}(M_{2}) whenever M1M2M_{1}\preceq M_{2}; and

hX(M1+M2)hX(M1)+hX(M2)h_{\mathcal{X}}(M_{1}+M_{2})\leqslant h_{\mathcal{X}}(M_{1})+h_{\mathcal{X}}(M_{2}).

Here we define (μ43)+:=max(0,μ43)(\mu_{4}-3)^{+}:=\max(0,\mu_{4}-3). Observe that II, FF and Δ\Delta each have largest eigenvalue equal to 1, and so hX(I),hX(F),hX(Δ)1h_{\mathcal{X}}(I),h_{\mathcal{X}}(F),h_{\mathcal{X}}(\Delta)\leqslant 1. (In fact, these are each equalities.)

However, the single nonzero eigenvalue of ΦΦT\Phi\Phi^{T} is equal to nn. Here we will need to use the symmetry constraint on X\mathcal{X}. Let XΓX^{\Gamma} be the matrix with entries X(i1,i2),(i3,i4)Γ:=X(i1,i4),(i3,i2)X^{\Gamma}_{(i_{1},i_{2}),(i_{3},i_{4})}:=X_{(i_{1},i_{4}),(i_{3},i_{2})}. If XXX\in\mathcal{X} then X=XΓX=X^{\Gamma}. Additionally, X,Y=XΓ,YΓ\langle X,Y\rangle=\langle X^{\Gamma},Y^{\Gamma}\rangle. Thus

This last equality follows from the fact that (ΦΦT)Γ=F(\Phi\Phi^{T})^{\Gamma}=F.

Putting together these ingredients, we establish (7.12), which concludes the proof of the theorem. ∎

3 Discussion

It may seem surprising that the factor of 31/43^{1/4} emerges even for matrices with, say, ±1\pm 1 entries. An intuitive justification for this is that even if the columns of AA are not Gaussian vectors, most linear combinations of them resemble Gaussians. The following Lemma shows that this behavior begins as soon as nn is ω(1)\omega(1).

Let A=i=1meiaiT/nA=\sum_{i=1}^{m}e_{i}a_{i}^{T}/\sqrt{n} with \varmathbbEiai241\operatorname*{\varmathbb{E}}_{i}\|a_{i}\|_{2}^{4}\geqslant 1. Then A24(3/(1+2/n))1/4.\|A\|_{2\rightarrow 4}\geqslant(3/(1+2/n))^{1/4}.

To see that the denominator cannot be improved in general, observe that when n=1n=1 a random sign matrix will have 242\rightarrow 4 norm equal to 1.

Choose x\varmathbbRnx\in\varmathbb R^{n} to be a random Gaussian vector such that \varmathbbExx22=1\operatorname*{\varmathbb{E}}_{x}\|x\|_{2}^{2}=1. Then

The last equality comes from the fact that ai,x\langle a_{i},x\rangle is a Gaussian random variable with mean zero and variance ai22/n\|a_{i}\|_{2}^{2}/n. On the other hand, \varmathbbExx24=1+2/n\operatorname*{\varmathbb{E}}_{x}\|x\|_{2}^{4}=1+2/n. Thus, there must exist an xx for which Ax44/x243/(1+2/n)\|Ax\|_{4}^{4}/\|x\|_{2}^{4}\geqslant 3/(1+2/n). ∎

It is instructive to consider a variant of the above argument. A simpler upper bound on the value of Tensor-SDP(A)\textup{{Tensor-SDP}}(A) is given simply by A2,2\|A_{2,2}\|. However, the presence of the ΦΦT\Phi\Phi^{T} term means that this bound will be off by an nn-dependent factor. Thus we observe that the symmetry constraints of Tensor-SDP(4)\textup{{Tensor-SDP}}^{(4)} provide a crucial advantage over the simpler bound using eigenvalues. In the language of quantum information (see Section 9.3), this means that the PPT constraint is necessary for the approximation to succeed. See Section 9.3.2 for an example of this that applies to higher levels of the hierarchy as well.

There is a similar reason why we cannot directly apply Lemma 7.3 to the vectors aiaia_{i}\otimes a_{i}. In computing the ψ1\psi_{1} norm for aiaia_{i}\otimes a_{i}, take x=nΦx=\sqrt{n}\Phi in (7.5). Then we find that nΦ,aiain/φ=ai22n/φ\langle\sqrt{n}\Phi,a_{i}\otimes a_{i}\rangle n/\varphi=\|a_{i}\|_{2}^{2}\sqrt{n}/\varphi and thus ψ1n\psi_{1}\sim\sqrt{n}.

On the other hand, when the aia_{i} are chosen to be random complex Gaussian vectors, we simply have \varmathbbEaiaiaiai=I+F\operatorname*{\varmathbb{E}}a_{i}a_{i}^{*}\otimes a_{i}a_{i}^{*}=I+F. In this case, the upper bound Tensor-SDP(A)A2,2\textup{{Tensor-SDP}}(A)\leqslant\|A_{2,2}\| is already sufficient. Thus, only real random vectors demonstrate a separation between these two bounds.

Our results can be seen as proving that hPPT(M)h_{\operatorname{PPT}}(M) is close to hSep(M)h_{\operatorname{Sep}}(M) when MM is of the form 1mi=1maiaiTaiaiT\frac{1}{m}\sum_{i=1}^{m}a_{i}a_{i}^{T}\otimes a_{i}a_{i}^{T} and a1,,ama_{1},\ldots,a_{m} are random vectors. In the case when MM is instead a randomly chosen projector, Montanaro [Mon13] proved a similar bound using much more sophisticated techniques. His methods do not apply to our problem since our choices of MM are very different than random projectors.

The 2-to-q norm and small-set expansion

In this section we show that a graph is a small-set expander if and only if the projector to the subspace of its adjacency matrix’s top eigenvalues has a bounded 2q2\to q norm for even q4q\geqslant 4. While the “if” part was known before, the “only if” part is novel. This characterization of small-set expanders is of general interest, and also leads to a reduction from the Small-Set Expansion problem considered in [RS10] to the problem of obtaining a good approximation for the 2q2\to q norms.

Our main theorem of this section is the following:

For every regular graph GG, λ>0\lambda>0 and even qq,

(Norm bound implies expansion) For all δ>0,ε>0\delta>0,\varepsilon>0, Pλ(G)2qε/δ(q2)/2q\|P_{\geqslant\lambda}(G)\|_{2\to q}\leqslant\varepsilon/\delta^{(q-2)/2q} implies that ΦG(δ)1λε2\Phi_{G}(\delta)\geqslant 1-\lambda-\varepsilon^{2}.

(Expansion implies norm bound) There are constants c1,c2>0c_{1},c_{2}>0 such that for all δ>0\delta>0, ΦG(δ)>1c1λ2q2c2q\Phi_{G}(\delta)>1-c_{1}\lambda^{2q}2^{-c_{2}q} implies Pλ(G)2q2/δ\|P_{\geqslant\lambda}(G)\|_{2\rightarrow q}\leqslant 2/\sqrt{\delta}.

One corollary of Theorem 2.4 is that a good approximation to the 2q2\to q norm implies an approximation of Φδ(G)\Phi_{\delta}(G)

If there is a polynomial-time computable relaxation R\mathcal{R} yielding good approximation for the 2q2\to q, then the Small-Set Expansion Hypothesis of [RS10] is false.

Using [RST10], to refute the small-set expansion hypothesis it is enough to come up with an efficient algorithm that given an input graph GG and sufficiently small δ>0\delta>0, can distinguish between the Yes case: ΦG(δ)<0.1\Phi_{G}(\delta)<0.1 and the No case ΦG(δ)>12clog(1/δ)\Phi_{G}(\delta^{\prime})>1-2^{-c\log(1/\delta^{\prime})} for any δδ\delta^{\prime}\geqslant\delta and some constant cc. In particular for all η>0\eta>0, if δ\delta is small enough then in the No case ΦG(δ0.4)>1η\Phi_{G}(\delta^{0.4})>1-\eta.

Using Theorem 2.4, in the Yes case we know P1/22q1/(10δ(q2)/2q)\|P_{\geqslant 1/2}\|_{2\rightarrow q}\geqslant 1/(10\delta^{(q-2)/2q}), while in the No case, if we choose δ\delta sufficiently small so that η\eta is smaller than c1(1/2)q2c2qc_{1}(1/2)^{q}2^{-c_{2}q}, then we know that P1/22q2/δ0.2\|P_{\geqslant 1/2}\|_{2\rightarrow q}\leqslant 2/\sqrt{\delta^{0.2}}. Clearly, if we have a good approximation for the 2q2\to q norm then, for sufficiently small δ\delta, we can distinguish between these two cases. ∎

The first part of Theorem 2.4 follows from previous work (e.g., see [KV05]). For completeness, we include a proof in Appendix B. The second part will follow from the following lemma:

Set e=e(λ,q):=c12c2q/λqe=e(\lambda,q):=c_{1}2^{c_{2}q}/\lambda^{q}, with universal constants c1,c2>0c_{1},c_{2}>0. Then for every λ>0\lambda>0 and 1δ01\geqslant\delta\geqslant 0, if GG is a graph that satisfies cp(G(S))1/(eS)\textup{{cp}}(G(S))\leqslant 1/(e|S|) for all SS with μ(S)δ\mu(S)\leqslant\delta, then fq2f2/δ\lVert f\rVert_{q}\leqslant 2\lVert f\rVert_{2}/\sqrt{\delta} for all fVλ(G)f\in V_{\geqslant\lambda}(G).

We use the variant of the local Cheeger bound obtained in [Ste10, Theorem 2.1], stating that if ΦG(δ)1η\Phi_{G}(\delta)\geqslant 1-\eta then for every fL2(V)f\in L_{2}(V) satisfying f12δf22\lVert f\rVert_{1}^{2}\leqslant\delta\lVert f\rVert_{2}^{2}, Gf22cηf22\lVert Gf\rVert_{2}^{2}\leqslant c\sqrt{\eta}\lVert f\rVert_{2}^{2}. The proof follows by noting that for every set SS, if ff is the characteristic function of SS then f1=f22=μ(S)\lVert f\rVert_{1}=\lVert f\rVert_{2}^{2}=\mu(S), and cp(G(S))=Gf22/(μ(S)S)\textup{{cp}}(G(S))=\lVert Gf\rVert_{2}^{2}/(\mu(S)|S|). ∎

Fix λ>0\lambda>0. We assume that the graph satisfies the condition of the Lemma with c12c2q/λqc_{1}2^{c_{2}q}/\lambda^{q}, for constants c1,c2c_{1},c_{2} that we will set later. Let G=(V,E)G=(V,E) be such a graph, and ff be function in Vλ(G)V_{\geqslant\lambda}(G) with f2=1\lVert f\rVert_{2}=1 that maximizes fq\lVert f\rVert_{q}. We write f=i=1mαiχif=\sum_{i=1}^{m}\alpha_{i}\chi_{i} where χ1,,χm\chi_{1},\ldots,\chi_{m} denote the eigenfunctions of GG with values λ1,,λm\lambda_{1},\ldots,\lambda_{m} that are at least λ\lambda. Assume towards a contradiction that fq>2/δ\lVert f\rVert_{q}>2/\sqrt{\delta}. We’ll prove that g=i=1m(αi/λi)χig=\sum_{i=1}^{m}(\alpha_{i}/\lambda_{i})\chi_{i} satisfies gq5fq/λ\lVert g\rVert_{q}\geqslant 5\lVert f\rVert_{q}/\lambda. This is a contradiction since (using λi[λ,1]\lambda_{i}\in[\lambda,1]) g2f2/λ\lVert g\rVert_{2}\leqslant\lVert f\rVert_{2}/\lambda, and we assumed ff is a function in Vλ(G)V_{\geqslant\lambda}(G) with a maximal ratio of fq/f2\lVert f\rVert_{q}/\lVert f\rVert_{2}.

Let UVU\subseteq V be the set of vertices such that f(x)1/δ|f(x)|\geqslant 1/\sqrt{\delta} for all xUx\in U. Using Markov inequality and the fact that \varmathbbExV[f(x)2]=1\operatorname*{\varmathbb{E}}_{x\in V}[f(x)^{2}]=1, we know that μ(U)=U/Vδ\mu(U)=|U|/|V|\leqslant\delta, meaning that under our assumptions any subset SUS\subseteq U satisfies cp(G(S))1/(eS)\textup{{cp}}(G(S))\leqslant 1/(e|S|). On the other hand, because fqq2q/δq/2\lVert f\rVert_{q}^{q}\geqslant 2^{q}/\delta^{q/2}, we know that UU contributes at least half of the term fqq=\varmathbbExVf(x)q\lVert f\rVert_{q}^{q}=\operatorname*{\varmathbb{E}}_{x\in V}f(x)^{q}. That is, if we define α\alpha to be μ(U)\varmathbbExUf(x)q\mu(U)\operatorname*{\varmathbb{E}}_{x\in U}f(x)^{q} then αfqq/2\alpha\geqslant\lVert f\rVert_{q}^{q}/2. We will prove the lemma by showing that gqq(10λ1)qα\lVert g\rVert_{q}^{q}\geqslant\left(10\lambda^{-1}\right)^{q}\alpha.

Let cc be a sufficiently large constant (c=100c=100 will do). We define UiU_{i} to be the set {xU:f(x)[ci/δ,ci+1/δ)}\{x\in U:f(x)\in[c^{i}/\sqrt{\delta},c^{i+1}/\sqrt{\delta})\}, and let II be the maximal ii such that UiU_{i} is non-empty. Thus, the sets U0,,UIU_{0},\ldots,U_{I} form a partition of UU (where some of these sets may be empty). We let αi\alpha_{i} be the contribution of UiU_{i} to α\alpha. That is, αi=μi\varmathbbExUif(x)q\alpha_{i}=\mu_{i}\operatorname*{\varmathbb{E}}_{x\in U_{i}}f(x)^{q}, where μi=μ(Ui)\mu_{i}=\mu(U_{i}). Note that α=α0++αI\alpha=\alpha_{0}+\cdots+\alpha_{I}. We’ll show that there are some indices i1,,iJi_{1},\ldots,i_{J} such that:

αi1++αiJα/(2c10)\alpha_{i_{1}}+\cdots+\alpha_{i_{J}}\geqslant\alpha/(2c^{10}).

For all j[J]j\in[J], there is a non-negative function gj:V\varmathbbRg_{j}:V\to\varmathbb R such that \varmathbbExVgj(x)qeαij/(10c2)q/2\operatorname*{\varmathbb{E}}_{x\in V}g_{j}(x)^{q}\geqslant e\alpha_{i_{j}}/(10c^{2})^{q/2}.

For every xVx\in V, g1(x)++gJ(x)g(x)g_{1}(x)+\cdots+g_{J}(x)\leqslant|g(x)|.

Showing these will complete the proof, since it is easy to see that for two non-negative functions g,gg^{\prime},g^{\prime\prime} and even integer qq, \varmathbbE(g(x)+g(x))q\varmathbbEg(x)q+\varmathbbEg(x)q\operatorname*{\varmathbb{E}}(g^{\prime}(x)+g^{\prime\prime}(x))^{q}\geqslant\operatorname*{\varmathbb{E}}g^{\prime}(x)^{q}+\operatorname*{\varmathbb{E}}g^{\prime\prime}(x)^{q}, and hence (ii) and (iii) imply that

Using (i) we conclude that for e2c1010q(10c2)q/2/λqe\geqslant 2c^{10}10^{q}(10c^{2})^{q/2}/\lambda^{q}, the right-hand side of (8.1) will be larger than (10/λ)qα(10/\lambda)^{q}\alpha.

We find the indices i1,,iJi_{1},\ldots,i_{J} iteratively. We let I\mathcal{I} be initially the set {0..I}\{0..I\} of all indices. For j=1,2,...j=1,2,... we do the following as long as I\mathcal{I} is not empty:

Let iji_{j} be the largest index in I\mathcal{I}.

Remove from I\mathcal{I} every index ii such that αic10αij/2iij\alpha_{i}\leqslant c^{10}\alpha_{i_{j}}/2^{i-i_{j}}.

We let JJ denote the step when we stop. Note that our indices i1,,iJi_{1},\ldots,i_{J} are sorted in descending order. For every step jj, the total of the αi\alpha_{i}’s for all indices we removed is less than c10αijc^{10}\alpha_{i_{j}} and hence we satisfy (i). The crux of our argument will be to show (ii) and (iii). They will follow from the following claim:

Let SVS\subseteq V and β>0\beta>0 be such that Sδ|S|\leqslant\delta and f(x)β|f(x)|\geqslant\beta for all xSx\in S. Then there is a set TT of size at least eSe|S| such that \varmathbbExTg(x)2β2/4\operatorname*{\varmathbb{E}}_{x\in T}g(x)^{2}\geqslant\beta^{2}/4.

The claim will follow from the following lemma:

Let DD be a distribution with cp(D)1/N\textup{{cp}}(D)\leqslant 1/N and gg be some function. Then there is a set TT of size NN such that \varmathbbExTg(x)2(\varmathbbEg(D))2/4\operatorname*{\varmathbb{E}}_{x\in T}g(x)^{2}\geqslant(\operatorname*{\varmathbb{E}}g(D))^{2}/4.

Identify the support of DD with the set [M][M] for some MM, we let pip_{i} denote the probability that DD outputs ii, and sort the pip_{i}’s such that p1p2pMp_{1}\geqslant p_{2}\cdots p_{M}. We let β\beta^{\prime} denote \varmathbbEg(D)\operatorname*{\varmathbb{E}}g(D); that is, β=i=1Mpig(i)\beta^{\prime}=\sum_{i=1}^{M}p_{i}g(i). We separate to two cases. If i>Npig(i)β/2\sum_{i>N}p_{i}g(i)\geqslant\beta^{\prime}/2, we define the distribution DD^{\prime} as follows: we set \varmathbbP[D=i]\operatorname*{\varmathbb{P}}[D^{\prime}=i] to be pip_{i} for i>Ni>N, and we let all iNi\leqslant N be equiprobable (that is be output with probability (i=1Npi)/N(\sum_{i=1}^{N}p_{i})/N). Clearly, \varmathbbEg(D)i>Npig(i)β/2\operatorname*{\varmathbb{E}}|g(D^{\prime})|\geqslant\sum_{i>N}p_{i}g(i)\geqslant\beta^{\prime}/2, but on the other hand, since the maximum probability of any element in DD^{\prime} is at most 1/N1/N, it can be expressed as a convex combination of flat distributions over sets of size NN, implying that one of these sets TT satisfies \varmathbbExTg(x)β/2\operatorname*{\varmathbb{E}}_{x\in T}|g(x)|\geqslant\beta^{\prime}/2, and hence \varmathbbExTg(x)2β2/4\operatorname*{\varmathbb{E}}_{x\in T}g(x)^{2}\geqslant\beta^{\prime 2}/4.

The other case is that i=1Npig(i)β/2\sum_{i=1}^{N}p_{i}g(i)\geqslant\beta^{\prime}/2. In this case we use Cauchy-Schwarz and argue that

But using our bound on the collision probability, the right-hand side of (8.2) is upper bounded by 1Ni=1Ng(i)2=\varmathbbEx[N]g(x)2\tfrac{1}{N}\sum_{i=1}^{N}g(i)^{2}=\operatorname*{\varmathbb{E}}_{x\in[N]}g(x)^{2}. ∎

By construction f=Ggf=Gg, and hence we know that for every xx, f(x)=\varmathbbEyxg(y)f(x)=\operatorname*{\varmathbb{E}}_{y\sim x}g(y). This means that if we let DD be the distribution G(S)G(S) then

By the expansion property of GG, cp(D)1/(eS)\textup{{cp}}(D)\leqslant 1/(e|S|) and thus by Lemma 8.4 there is a set TT of size eSe|S| satisfying \varmathbbExTg(x)2β2/4\operatorname*{\varmathbb{E}}_{x\in T}g(x)^{2}\geqslant\beta^{2}/4. ∎

We will construct the functions g1,,gJg_{1},\ldots,g_{J} by applying iteratively Claim 8.3. We do the following for j=1,,Jj=1,\ldots,J:

Let TjT_{j} be the set of size eUije|U_{i_{j}}| that is obtained by applying Claim 8.3 to the function ff and the set UijU_{i_{j}}. Note that \varmathbbExTjg(x)2βij2/4\operatorname*{\varmathbb{E}}_{x\in T_{j}}g(x)^{2}\geqslant\beta_{i_{j}}^{2}/4, where we let βi=ci/δ\beta_{i}=c^{i}/\sqrt{\delta} (and hence for every xUix\in U_{i}, βif(x)cβi\beta_{i}\leqslant|f(x)|\leqslant c\beta_{i}).

Let gjg^{\prime}_{j} be the function on input xx that outputs γg(x)\gamma\cdot|g(x)| if xTjx\in T_{j} and otherwise, where γ1\gamma\leqslant 1 is a scaling factor that ensures that \varmathbbExTjg(x)2\operatorname*{\varmathbb{E}}_{x\in T_{j}}g^{\prime}(x)^{2} equals exactly βij2/4\beta_{i_{j}}^{2}/4.

We define gj(x)=max{0,gj(x)k<jgk(x)}g_{j}(x)=\max\{0,g^{\prime}_{j}(x)-\sum_{k<j}g_{k}(x)\}.

Note that the second step ensures that gj(x)g(x)g^{\prime}_{j}(x)\leqslant|g(x)|, while the third step ensures that g1(x)++gj(x)gj(x)g_{1}(x)+\cdots+g_{j}(x)\leqslant g^{\prime}_{j}(x) for all jj, and in particular g1(x)++gJ(x)g(x)g_{1}(x)+\cdots+g_{J}(x)\leqslant|g(x)|. Hence the only thing left to prove is the following:

\varmathbbExVgj(x)qeαij/(10c)q/2\operatorname*{\varmathbb{E}}_{x\in V}g_{j}(x)^{q}\geqslant e\alpha_{i_{j}}/(10c)^{q/2}

Recall that for every ii, αi=μi\varmathbbExUif(x)q\alpha_{i}=\mu_{i}\operatorname*{\varmathbb{E}}_{x\in U_{i}}f(x)^{q}, and hence (using f(x)[βi,cβi)f(x)\in[\beta_{i},c\beta_{i}) for xUix\in U_{i}):

Now fix T=TjT=T_{j}. Since \varmathbbExVgj(x)q\operatorname*{\varmathbb{E}}_{x\in V}g_{j}(x)^{q} is at least (in fact equal) μ(T)\varmathbbExTgj(x)q\mu(T)\operatorname*{\varmathbb{E}}_{x\in T}g_{j}(x)^{q} and μ(T)=eμ(Uij)\mu(T)=e\mu(U_{i_{j}}), we can use (8.3) and \varmathbbExTgj(x)q(ExTgj(x)2)q/2\operatorname*{\varmathbb{E}}_{x\in T}g_{j}(x)^{q}\geqslant(E_{x\in T}g_{j}(x)^{2})^{q/2}, to reduce proving the claim to showing the following:

We know that \varmathbbExTgj(x)2=βij2/4\operatorname*{\varmathbb{E}}_{x\in T}g^{\prime}_{j}(x)^{2}=\beta_{i_{j}}^{2}/4. We claim that (8.4) will follow by showing that for every k<jk<j,

where i=ikiji^{\prime}=i_{k}-i_{j}. (Note that i>0i^{\prime}>0 since in our construction the indices i1,,iJi_{1},\ldots,i_{J} are sorted in descending order.)

Indeed, (8.5) means that if we let momentarily gj\lVert g_{j}\rVert denote \varmathbbExTgj(x)2\sqrt{\operatorname*{\varmathbb{E}}_{x\in T}g_{j}(x)^{2}} then

The first inequality holds because we can write gjg_{j} as gjhjg^{\prime}_{j}-h_{j}, where hj=min{gj,k<jgk}h_{j}=\min\{g^{\prime}_{j},\sum_{k<j}g_{k}\}. Then, on the one hand, gjgjhj\lVert g_{j}\rVert\geqslant\lVert g^{\prime}_{j}\rVert-\lVert h_{j}\rVert, and on the other hand, hjk<jgk\lVert h_{j}\rVert\leqslant\lVert\sum_{k<j}g_{k}\rVert since gj0g^{\prime}_{j}\geqslant 0. The second inequality holds because gkgk\lVert g_{k}\rVert\leqslant\lVert g^{\prime}_{k}\rVert. By squaring (8.6) and plugging in the value of gj2\lVert g^{\prime}_{j}\rVert^{2} we get (8.4).

since otherwise the index iji_{j} would have been removed from the I\mathcal{I} at the kthk^{th} step. Since βik=βijci\beta_{i_{k}}=\beta_{i_{j}}c^{i^{\prime}}, we can plug (8.3) in (8.7) to get

Since Ti=eUi|T_{i}|=e|U_{i}| for all ii, it follows that Tk/T(2/c)4ic6|T_{k}|/|T|\leqslant(2/c)^{4i^{\prime}}c^{-6}. On the other hand, we know that \varmathbbExTkgk(x)2=βik2/4=c2iβij2/4\operatorname*{\varmathbb{E}}_{x\in T_{k}}g^{\prime}_{k}(x)^{2}=\beta_{i_{k}}^{2}/4=c^{2i^{\prime}}\beta^{2}_{i_{j}}/4. Thus,

and now we just choose cc sufficiently large so that c2/24>100c^{2}/2^{4}>100. ∎ ∎

Relating the 2-to-4 norm and the injective tensor norm

In this section, we present several equivalent formulations of the 2-to-4 norm: 1) as the injective tensor norm of a 4-tensor, 2) as the injective tensor norm of a 3-tensor, and 3) as the maximum of a linear function over a convex set, albeit a set where the weak membership problem is hard. Additionally, we can consider maximizations over real or complex vectors. These equivalent formulations are discussed in Section 9.1.

We use this to show hardness of approximation (Theorem 2.5) for the 2-to-4 norm in Section 9.2, and then show positive algorithmic results (Theorem 2.3) in Section 9.3. Somewhat surprisingly, many of the key arguments in these sections are imported from the quantum information literature, even though no quantum algorithms are involved. It is an interesting question to find a more elementary proof of the result in Section 9.3.

In this section, it will be convenient to sometimes work with the counting norms .{\boldsymbol{\lVert}}.{\boldsymbol{\rVert}}, which we recall are defined as xp:=(ixip)1/p{\boldsymbol{\lVert}}x{\boldsymbol{\rVert}}_{p}:=(\sum_{i}|x_{i}|^{p})^{1/p}, and the counting inner product, defined by x,y:=xy\boldsymbol{\langle}x,y\boldsymbol{\rangle}:=x^{*}y, where ∗ denotes the conjugate transpose.

We will also need the definition of separable states from quantum information. For a vector space VV, define L(V)L(V) to be the linear operators on VV, and define D(V):={ρL(V):ρ0,Trρ=1}=conv{vv:vS(V)}\mathcal{D}(V):=\{\rho\in L(V):\rho\succeq 0,\operatorname{Tr}\rho=1\}=\operatorname{conv}\{vv^{*}:v\in\mathbf{S}(V)\} to be the density operators on VV. The trace induces an inner product on operators: X,Y:=TrXY\boldsymbol{\langle}X,Y\boldsymbol{\rangle}:=\operatorname{Tr}X^{*}Y. An important class of density operators are the separable density operators. For vector spaces V1,,VrV_{1},\ldots,V_{r}, these are

If V=V1==VrV=V_{1}=\cdots=V_{r}, then let Sepr(V)\operatorname{Sep}^{r}(V) denote Sep(V1,,Vr)\operatorname{Sep}(V_{1},\ldots,V_{r}). Physically, density operators are the quantum analogues of probability distributions, and separable density operators describe unentangled quantum states; conversely, entangled states are defined to be the set of density operators that are not separable. For readers familiar with quantum information, we point out that our treatment differs principally in its use of the expectation for norms and inner products, rather than the sum.

For any bounded convex set KK, define the support function of KK to be

Define ei\varmathbbFne_{i}\in\varmathbb F^{n} to be the vector with 1 in the ithi^{\text{th}} position. Now we can give the convex-optimization formulation of the injective tensor norm.

Let V1,,VrV_{1},\ldots,V_{r} be vector spaces with ni:=dimVin_{i}:=\dim V_{i}, and TV1VrT\in V_{1}\otimes\cdots\otimes V_{r}. Choose an orthonormal basis e1,,enre_{1},\ldots,e_{n_{r}} for VrV_{r}. Define T1,,TnrV1Vr1T_{1},\ldots,T_{n_{r}}\in V_{1}\otimes\cdots\otimes V_{r_{1}} by T=i=1nrTieiT=\sum_{i=1}^{n_{r}}T_{i}\otimes e_{i} and define ML(V1Vr1)M\in L(V_{1}\otimes\cdots\otimes V_{r-1}) by M=i=1nrTiTiM=\sum_{i=1}^{n_{r}}T_{i}T_{i}^{*}. Then

Observe that any M0M\succeq 0 can be expressed in this form, possibly by padding nrn_{r} to be at least rankM\operatorname{rank}M. Thus calculating inj{\boldsymbol{\lVert}}\cdot{\boldsymbol{\rVert}}_{\rm inj} for rr-tensors is equivalent in difficulty to computing hSepr1\mathbf{h}_{\operatorname{Sep}^{r-1}} for p.s.d. arguments. This argument appeared before in [HM10], where it was explained using quantum information terminology.

It is instructive to consider the r=2r=2 case. In this case, TT is equivalent to a matrix T^\hat{T} and Tinj=T^22{\boldsymbol{\lVert}}T{\boldsymbol{\rVert}}_{\rm inj}={\boldsymbol{\lVert}}\hat{T}{\boldsymbol{\rVert}}_{2\rightarrow 2}. Moreover Sep1(\varmathbbFn1)=D(\varmathbbFn1)\operatorname{Sep}^{1}(\varmathbb F^{n_{1}})=\mathcal{D}(\varmathbb F^{n_{1}}) is simply the convex hull of vvvv^{*} for unit vectors vv. Thus hSep1(\varmathbbFn1)(M)\mathbf{h}_{\operatorname{Sep}^{1}(\varmathbb F^{n_{1}})}(M) is simply the maximum eigenvalue of M=TTM=TT^{*}. In this case, Lemma 9.1 merely states that the square of the largest singular value of T^\hat{T} is the largest eigenvalue of T^T^\hat{T}\hat{T}^{*}. The general proof follows this framework.

In what follows, we will also need to make use of some properties of symmetric tensors. Define Sk\mathcal{S}_{k} to be the group of permutations of [k][k] and define Pn(π)L((\varmathbbFn)k)P_{n}(\pi)\in L((\varmathbb F^{n})^{\otimes k}) to be the operator that permutes kk tensor copies of \varmathbbFn\varmathbb F^{n} according to π\pi. Formally,

Then define k\varmathbbFn\vee^{k}\varmathbb F^{n} to be the subspace of vectors in (\varmathbbFn)r(\varmathbb F^{n})^{\otimes r} that are unchanged by each Pn(π)P_{n}(\pi). This space is called the symmetric subspace. A classic result in symmetric polynomials states that r\varmathbbFn\vee^{r}\varmathbb F^{n} is spanned by the vectors {vr:v\varmathbbFn}\{v^{\otimes r}:v\in\varmathbb F^{n}\}.For the proof, observe that vrr\varmathbbFnv^{\otimes r}\in\vee^{r}\varmathbb F^{n} for any v\varmathbbFnv\in\varmathbb F^{n}. To construct a basis for r\varmathbbFn\vee^{r}\varmathbb F^{n} out of linear combinations of different vrv^{\otimes r}, let z1,,znz_{1},\ldots,z_{n} be indeterminates and evaluate the rr-fold derivatives of (z1e1++znen)r(z_{1}e_{1}+\cdots+z_{n}e_{n})^{\otimes r} at z1==zn=0z_{1}=\cdots=z_{n}=0.

One important fact about symmetric tensors is that for injective tensor norm, the vectors in the maximization can be taken to be equal. Formally,

This has been proven in several different works; see the paragraph above Eq. (3.1) of [CKP00] for references.

1.2 Connection to the 2-to-4 norm

Let A=i=1meiaiTA=\sum_{i=1}^{m}e_{i}a_{i}^{T}, so that a1,,am\varmathbbRna_{1},\ldots,a_{m}\in\varmathbb R^{n} are the rows of AA. Define

Further, for a real tensor T(\varmathbbRn)rT\in(\varmathbb R^{n})^{\otimes r}, define Tinj[\varmathbbC]{\boldsymbol{\lVert}}T{\boldsymbol{\rVert}}_{\operatorname{inj}[\varmathbb C]} to be the injective tensor norm that results from treating TT as a complex tensor; that is, max{T,x1xr:x1,,xrS(\varmathbbCn)}\max\{|\boldsymbol{\langle}T,x_{1}\otimes\cdots\otimes x_{r}\boldsymbol{\rangle}|:x_{1},\ldots,x_{r}\in\mathbf{S}(\varmathbb C^{n})\}. For r3r\geqslant 3, Tinj[\varmathbbC]{\boldsymbol{\lVert}}T{\boldsymbol{\rVert}}_{\operatorname{inj}[\varmathbb C]} can be larger than Tinj{\boldsymbol{\lVert}}T{\boldsymbol{\rVert}}_{\operatorname{inj}} by as much as 2\sqrt{2} [CKP00].

Our main result on equivalent forms of the 242\rightarrow 4 norm is the following.

Next one can verify with direct calculation (and using maxzS(\varmathbbRn)v,z=v2\max_{z\in\mathbf{S}(\varmathbb R^{n})}\boldsymbol{\langle}v,z\boldsymbol{\rangle}={\boldsymbol{\lVert}}v{\boldsymbol{\rVert}}_{2}) that

Now define z(i):=ei,zz(i):=\boldsymbol{\langle}e_{i},z\boldsymbol{\rangle} and continue.

From Lemma 9.1, we thus have A244=hSep2(\varmathbbRn)(A2,2)=hSep2(\varmathbbCn)(A2,2){\boldsymbol{\lVert}}A{\boldsymbol{\rVert}}_{2\rightarrow 4}^{4}=\mathbf{h}_{\operatorname{Sep}^{2}(\varmathbb R^{n})}(A_{2,2})=\mathbf{h}_{\operatorname{Sep}^{2}(\varmathbb C^{n})}(A_{2,2}).

To justify (9.22), we argue that the maximum in (9.21) is achieved by taking all the z(i)z(i) real (and indeed nonnegative). The resulting matrix iz(i)aiaiT\sum_{i}z(i)a_{i}a_{i}^{T} is real and symmetric, so its operator norm is achieved by taking x=yx=y to be real vectors. Thus, the maximum in A3inj[\varmathbbC]{\boldsymbol{\lVert}}A_{3}{\boldsymbol{\rVert}}_{\operatorname{inj}[\varmathbb C]} is achieved for real x,y,zx,y,z and as a result A3inj[\varmathbbC]=A3inj{\boldsymbol{\lVert}}A_{3}{\boldsymbol{\rVert}}_{\operatorname{inj}[\varmathbb C]}={\boldsymbol{\lVert}}A_{3}{\boldsymbol{\rVert}}_{\operatorname{inj}}.

Having now made the bridge to complex vectors, we can work backwards to establish the last equivalence: A4inj[\varmathbbC]\|A_{4}\|_{\operatorname{inj}[\varmathbb C]}. Repeating the argument that led to (9.17) will establish that A4inj[\varmathbbC]=maxxS(\varmathbbCn)maxzS(\varmathbbCm)A3,xxz2=A3inj[\varmathbbC]2{\boldsymbol{\lVert}}A_{4}{\boldsymbol{\rVert}}_{\operatorname{inj}[\varmathbb C]}=\max_{x\in\mathbf{S}(\varmathbb C^{n})}\max_{z\in\mathbf{S}(\varmathbb C^{m})}|\boldsymbol{\langle}A_{3},x\otimes x\otimes z\boldsymbol{\rangle}|^{2}={\boldsymbol{\lVert}}A_{3}{\boldsymbol{\rVert}}_{\operatorname{inj}[\varmathbb C]}^{2}. ∎

2 Hardness of approximation for the 2-to-4 norm

This section is devoted to the proof of Theorem 2.5, establishing hardness of approximation for the 2-to-4 norm.

First, we restate Theorem 2.5 more precisely. We omit the reduction to when AA is a projector, deferring this argument to Corollary 9.9, where we will further use a randomized reduction.

(restatement of Theorem 2.5) Let ϕ\phi be a 3-SAT instance with nn variables and O(n)O(n) clauses. Determining whether ϕ\phi is satisfiable can be reduced in polynomial time to determining whether A24C\|A\|_{2\rightarrow 4}\geqslant C or A24c\|A\|_{2\rightarrow 4}\leqslant c where 0c<C0\leqslant c<C and AA is an m×mm\times m matrix. This is possible for two choices of parameters:

m=poly(n)m=\operatorname{poly}(n), and C/c>1+1/npolylog(n)C/c>1+1/n\operatorname{poly}\log(n); or,

m=exp(npolylog(n)log(C/c))m=\exp(\sqrt{n}\operatorname{poly}\log(n)\log(C/c)).

The key challenge is establishing the following reduction.

Let ML(\varmathbbCn\varmathbbCn)M\in L(\varmathbb C^{n}\otimes\varmathbb C^{n}) satisfy 0MI0\leqslant M\leqslant I. Assume that either (case Y) hSep(n,n)(M)=1\mathbf{h}_{\operatorname{Sep}(n,n)}(M)=1 or (case N) hSep(n,n)(M)1δ\mathbf{h}_{\operatorname{Sep}(n,n)}(M)\leqslant 1-\delta. Let kk be a positive integer. Then there exists a matrix AA of size n4k×n2kn^{4k}\times n^{2k} such that in case Y, A24=1\|A\|_{2\rightarrow 4}=1, and in case N, A24=(1δ/2)k\|A\|_{2\rightarrow 4}=(1-\delta/2)^{k}. Moreover, AA can be constructed efficiently from MM.

Once Lemma 9.5 is proved, Theorem 2.5 follows from previously known results about the hardness of approximating hSep)\mathbf{h}_{\operatorname{Sep}}). Let ϕ\phi be a 3-SAT instance with nn variables and O(n)O(n) clauses. In Theorem 4 of [GNN12] (improving on earlier work of [Gur03]), it was proved that ϕ\phi can be reduced to determining whether hSep(nc,nc)(M)\mathbf{h}_{\operatorname{Sep}(n^{c},n^{c})}(M) is equal to 1 (“case Y”) or 11/nlogc(n)\leqslant 1-1/n\log^{c}(n) (“case N”), where c>0c>0 is a universal constant, and MM is an efficiently constructible matrix with 0MI0\leqslant M\leqslant I. Now we apply Lemma 9.5 with k=1k=1 to find that exists a matrix AA of dimension poly(n)\operatorname{poly}(n) such that in case Y, A24=1{\boldsymbol{\lVert}}A{\boldsymbol{\rVert}}_{2\rightarrow 4}=1, and in case N, A2411/2nlogc(n){\boldsymbol{\lVert}}A{\boldsymbol{\rVert}}_{2\rightarrow 4}\leqslant 1-1/2n\log^{c}(n). Thus, distinguishing these cases would determine whether ϕ\phi is satisfiable. This establishes part (1) of Theorem 2.5.

For part (2), we start with Corollary 14 of [HM10], which gives a reduction from determining the satisfiability of ϕ\phi to distinguishing between (“case Y”) hSep(m,m)(M)=1\mathbf{h}_{\operatorname{Sep}(m,m)}(M)=1 and (“case N”) hSep(m,m)(M)1/2\mathbf{h}_{\operatorname{Sep}(m,m)}(M)\leqslant 1/2. Again 0MI0\leqslant M\leqslant I, and MM can be constructed in time poly(m)\operatorname{poly}(m) from ϕ\phi, but this time m=exp(npolylog(n))m=\exp(\sqrt{n}\operatorname{poly}\log(n)). Applying Lemma 9.5 in a similar fashion completes the proof. ∎

The previous section shows that computing A24{\boldsymbol{\lVert}}A{\boldsymbol{\rVert}}_{2\rightarrow 4} is equivalent to computing hSep(n,n)(A2,2)\mathbf{h}_{\operatorname{Sep}(n,n)}(A_{2,2}), for A2,2A_{2,2} defined as in (9.13). However, the hardness results of [Gur03, GNN12, HM10] produce matrices MM that are not in the form of A2,2A_{2,2}. The reduction of [HM10] comes closest, by producing a matrix that is a sum of terms of the form xxyyxx^{*}\otimes yy^{*}. However, we need a sum of terms of the form xxxxxx^{*}\otimes xx^{*}. This will be achieved by a variant of the protocol used in [HM10].

Let M0L(\varmathbbCn\varmathbbCn)M_{0}\in L(\varmathbb C^{n}\otimes\varmathbb C^{n}) satisfy 0MI0\leqslant M\leqslant I. Consider the promise problem of distinguishing the cases hSep(n,n)(M0)=1\mathbf{h}_{\operatorname{Sep}(n,n)}(M_{0})=1 (called “case Y”) from hSep(n,n)(M0)1/2\mathbf{h}_{\operatorname{Sep}(n,n)}(M_{0})\leqslant 1/2 (called “case N”). We show that this reduces to finding a multiplicative approximation for A24{\boldsymbol{\lVert}}A{\boldsymbol{\rVert}}_{2\rightarrow 4} for some real AA of dimension nαn^{\alpha} for a constant α>0\alpha>0. Combined with known hardness-of-approximation results (Corollary 15 of [HM10]), this will imply Theorem 2.5.

Define PP to be the projector onto the subspace of (\varmathbbCn)4(\varmathbb C^{n})^{\otimes 4} that is invariant under Pn((1,3))P_{n}((1,3)) and Pn((2,4))P_{n}((2,4)) (see Section 9.1 for definitions). This can be obtained by applying Pn((2,3))P_{n}((2,3)) to 2\varmathbbCn2\varmathbbCn\vee^{2}\varmathbb C^{n}\otimes\vee^{2}\varmathbb C^{n}, where we recall that 2\varmathbbCn\vee^{2}\varmathbb C^{n} is the symmetric subspace of (\varmathbbCn)2(\varmathbb C^{n})^{\otimes 2}. Since PP projects onto the vectors invariant under the 4-element group generated by Pn((1,3))P_{n}((1,3)) and Pn((2,4))P_{n}((2,4)), we can write it as

An alternate definition of PP is due to Wick’s theorem:

where the expectation is taken over complex-Gaussian-distributed vectors a,b\varmathbbCna,b\in\varmathbb C^{n} normalized so that \varmathbbEa22=\varmathbbEb22=n/2\operatorname*{\varmathbb{E}}\|a\|_{2}^{2}=\operatorname*{\varmathbb{E}}\|b\|_{2}^{2}=n/\sqrt{2}. Here we use the notation ^\hat{\otimes} to mark the separation between systems that we will use to define the separable states Sep(n2,n2)\operatorname{Sep}(n^{2},n^{2}). We could equivalently write P=\varmathbbEa,b[(aabb)^2]P=\operatorname*{\varmathbb{E}}_{a,b}[(aa^{*}\otimes bb^{*})^{\hat{\otimes}2}]. We will find that (9.24) is more useful for doing calculations, while (9.25) is helpful for converting M0M_{0} into a form that resembles A2,2A_{2,2} for some matrix AA.

Define M1=(M0^M0)P(M0^M0)M_{1}=(\sqrt{M_{0}}\,\hat{\otimes}\sqrt{M_{0}})P\,(\sqrt{M_{0}}\,\hat{\otimes}\sqrt{M_{0}}), where M0\sqrt{M_{0}} is taken to be the unique positive-semidefinite square root of M0M_{0}. Observe that

where we define va,b:=M0(ab)v_{a,b}:=\sqrt{M_{0}}(a\otimes b) and Va,b:=va,bva,bV_{a,b}:=v_{a,b}v_{a,b}^{*}. We claim that hSep(M1)\mathbf{h}_{\operatorname{Sep}}(M_{1}) gives a reasonable proxy for hSep(M0)\mathbf{h}_{\operatorname{Sep}}(M_{0}) in the following sense.

The proof of Lemma 9.6 is deferred to the end of this section. The analysis is very similar to Theorem 13 of [HM10], but the analysis here is much simpler because M0M_{0} acts on only two systems. However, it is strictly speaking not a consequence of the results in [HM10], because that paper considered a slightly different choice of M1M_{1}.

The advantage of replacing M0M_{0} with M1M_{1} is that (thanks to (9.25)) we now have a matrix with the same form as A2,2A_{2,2} in (9.13), allowing us to make use of Lemma 9.3. However, we first need to amplify the separation between cases Y and N. This is achieved by the matrix M2:=M1kM_{2}:=M_{1}^{\otimes k}. This tensor product is not across the cut we use to define separable states; in other words:

Now Lemma 12 from [HM10] implies that hSep(n2k,n2k)(M2)=hSep(n2,n2)(M1)k\mathbf{h}_{\operatorname{Sep}(n^{2k},n^{2k})}(M_{2})=\mathbf{h}_{\operatorname{Sep}(n^{2},n^{2})}(M_{1})^{k}. This is either 1 or (3/4)k\leqslant(3/4)^{k}, depending on whether we have case Y or N.

Finally, we would like to relate this to the 242\rightarrow 4 norm of a matrix. It will be more convenient to work with M1M_{1}, and then take tensor powers of the corresponding matrix. Naively applying Lemma 9.3 would relate hSep(M1)\mathbf{h}_{\operatorname{Sep}}(M_{1}) to A24{\boldsymbol{\lVert}}A{\boldsymbol{\rVert}}_{2\rightarrow 4} for an infinite-dimensional AA. Instead, we first replace the continuous distribution on aa (resp. bb) with a finitely-supported distribution in a way that does not change \varmathbbEaaaaa\operatorname*{\varmathbb{E}}_{a}aa^{*}\otimes aa^{*} (resp. \varmathbbEbbbbb\operatorname*{\varmathbb{E}}_{b}bb^{*}\otimes bb^{*}). Such distributions are called complex-projective (2,2)-designs or quantum (state) 2-designs, and can be constructed from spherical 4-designs on \varmathbbR2n\varmathbb R^{2n} [AE07]. Finding these designs is challenging when each vector needs to have the same weight, but for our purposes we can use Carathéodory’s theorem to show that there exist vectors z1,,zmz_{1},\ldots,z_{m} with m=n2m=n^{2} such that

In what follows, assume that the average over a,ba,b used in the definitions of P,M1,M2P,M_{1},M_{2} is replaced by the sum over z1,,zmz_{1},\ldots,z_{m}. By (9.29) this change does not affect the values of P,M1,M2P,M_{1},M_{2}.

For i,j[m]i,j\in[m], define wi,j:=M0(zizj)w_{i,j}:=\sqrt{M_{0}}(z_{i}\otimes z_{j}), and let ei,j:=eieje_{i,j}:=e_{i}\otimes e_{j}. Now we can apply Lemma 9.3 to find that hSep(M1)=A1244\mathbf{h}_{\operatorname{Sep}}(M_{1})={\boldsymbol{\lVert}}A_{1}{\boldsymbol{\rVert}}_{2\rightarrow 4}^{4}, where

The amplified matrix M2M_{2} similarly satisfies hSep(n2k,n2k)(M2)=A2244\mathbf{h}_{\operatorname{Sep}(n^{2k},n^{2k})}(M_{2})={\boldsymbol{\lVert}}A_{2}{\boldsymbol{\rVert}}_{2\rightarrow 4}^{4}, where

The last step is to relate the complex matrix A2A_{2} to a real matrix A3A_{3} with the same 242\rightarrow 4 norm once we restrict to real inputs. This can be achieved by replacing a single complex entry α+iβ\alpha+i\beta with the 6×26\times 2 real matrix

A complex input x+iyx+iy is represented by the column vector (xy)\begin{pmatrix}x\\ y\end{pmatrix}. The initial 2×22\times 2 matrix maps this to the real representation of (α+iβ)(x+iy)(\alpha+i\beta)(x+iy), and then the fixed 6×26\times 2 matrix maps this to a vector whose 4-norm equals (α+iβ)(x+iy)4|(\alpha+i\beta)(x+iy)|^{4}.

We conclude with the proof of Lemma 9.6, mostly following [HM10].

Case Y is simplest, and also provides intuition for the choices of the M1M_{1} construction. Since the extreme points of Sep(n,n)\operatorname{Sep}(n,n) are of the form xxyyxx^{*}\otimes yy^{*} for x,yS(\varmathbbCn)x,y\in\mathbf{S}(\varmathbb C^{n}), it follows that there exists x,yS(\varmathbbCn)x,y\in\mathbf{S}(\varmathbb C^{n}) with xy,M(xy)=1\boldsymbol{\langle}x\otimes y,M(x\otimes y)\boldsymbol{\rangle}=1. Since MIM\leqslant I, this implies that M(xy)=(xy)M(x\otimes y)=(x\otimes y). Thus M0(xy)=(xy)\sqrt{M_{0}}(x\otimes y)=(x\otimes y). Let

Then zz is an eigenvector of both M0M0\sqrt{M_{0}}\otimes\sqrt{M_{0}} and PP, with eigenvalue 11 in each case. To see this for PP, we use the definition in (9.24). Thus z,M1z=1\boldsymbol{\langle}z,M_{1}z\boldsymbol{\rangle}=1, and it follows that hSep(n2,n2)(M1)1\mathbf{h}_{\operatorname{Sep}(n^{2},n^{2})}(M_{1})\geqslant 1. On the other hand, M1IM_{1}\leqslant I, implying that hSep(n2,n2)(M1)1\mathbf{h}_{\operatorname{Sep}(n^{2},n^{2})}(M_{1})\leqslant 1. This establishes case Y.

For case N, we assume that hSep(n,n)(M0)1δ\mathbf{h}_{\operatorname{Sep}(n,n)}(M_{0})\leqslant 1-\delta for any x,yS(\varmathbbCn)x,y\in\mathbf{S}(\varmathbb C^{n}). The idea of the proof is that for any x,yS(\varmathbbCn2)x,y\in\mathbf{S}(\varmathbb C^{n^{2}}), we must either have x,yx,y close to a product state, in which case the M0\sqrt{M_{0}} step will shrink the vector, or if they are far from a product state and preserved by M0M0\sqrt{M_{0}}\otimes\sqrt{M_{0}}, then the PP step will shrink the vector. In either case, the length will be reduced by a dimension-independent factor.

We now spell this argument out in detail. Choose x,yS(\varmathbbCn2)x,y\in\mathbf{S}(\varmathbb C^{n^{2}}) to achieve

Let X,YL(\varmathbbCn)X,Y\in L(\varmathbb C^{n}) be defined by

Note that X,X=x,M0x1\boldsymbol{\langle}X,X\boldsymbol{\rangle}=\boldsymbol{\langle}x,M_{0}x\boldsymbol{\rangle}\leqslant 1 and similarly for Y,Y\boldsymbol{\langle}Y,Y\boldsymbol{\rangle}. We wish to estimate

Using (9.24) we see that the expression inside the \boldsymbol{\langle}\cdot\boldsymbol{\rangle} is

Using the AM-GM inequality we see that the maximum of this expression is achieved when X=YX=Y, in which case we have

Let the singular values of XX be σ1σn\sigma_{1}\geqslant\cdots\geqslant\sigma_{n}. Observe that σ22=X,X1{\boldsymbol{\lVert}}\sigma{\boldsymbol{\rVert}}_{2}^{2}=\boldsymbol{\langle}X,X\boldsymbol{\rangle}\leqslant 1, and thus σ44=XX,XXσ12{\boldsymbol{\lVert}}\sigma{\boldsymbol{\rVert}}_{4}^{4}=\boldsymbol{\langle}X^{*}X,X^{*}X\boldsymbol{\rangle}\leqslant\sigma_{1}^{2}. On the other hand,

Remark: It is possible to extend Lemma 9.5 to the situation when case Y has hSep(M)>1δ\mathbf{h}_{Sep}(M)>1-\delta^{\prime} for some constant δ<δ\delta^{\prime}<\delta. Since the details are somewhat tedious, and repeat arguments in [HM10], we omit them here.

Can Theorem 2.5 give any super-polynomial lower bound for the SSE problem if we assume the Exponential-Time Hypothesis for 3-SAT? To resolve this question using our techniques, we would like to reduce 3-SAT to estimating the 242\rightarrow 4 norm of the projector onto the eigenvectors of a graph that have large eigenvalue. We do not know how to do this. However, instead, we show that the matrix AA constructed in Theorem 2.5 can be taken to be a projector. This is almost WLOG, except that the resulting 242\rightarrow 4 norm will be at least 31/43^{1/4}.

Let AA be a linear map from \varmathbbRk\varmathbb R^{k} to \varmathbbRn\varmathbb R^{n} and 0<c<C0<c<C , ε>0\varepsilon>0 some numbers. Then there is m=O(n2/ε2)m={O}(n^{2}/\varepsilon^{2}) and a map AA^{\prime} from \varmathbbRk\varmathbb R^{k} to \varmathbbRm\varmathbb R^{m} such that σmin(A)1ε\sigma_{\min}(A^{\prime})\geqslant 1-\varepsilon and (i) if A24c\|A\|_{2\rightarrow 4}\leqslant c then A2431/4+ε\|A^{\prime}\|_{2\rightarrow 4}\leqslant 3^{1/4}+\varepsilon, (ii) A24C\|A\|_{2\rightarrow 4}\geqslant C then A24Ω(εC/c)\|A^{\prime}\|_{2\rightarrow 4}\geqslant\Omega(\varepsilon C/c).

We let BB be a random map from \varmathbbRk\varmathbb R^{k} to \varmathbbRO(n2/δ2)\varmathbb R^{{O(n^{2}/\delta^{2})}} with entries that are i.i.d. Gaussians with mean zero and variance 1/k1/\sqrt{k}. Then Dvoretzky’s theorem [Pis99] implies that for every f\varmathbbRkf\in\varmathbb R^{k}, Bf431/4(1±δ)f2\lVert Bf\rVert_{4}\in 3^{1/4}(1\pm\delta)\lVert f\rVert_{2}. Consider the operator A=(A/cB)A^{\prime}=\begin{pmatrix}A/c\\ B\end{pmatrix} that maps ff into the concatenation of AfAf and BfBf. Moreover we take multiple copies of each coordinate so that the measure of output coordinates of AA^{\prime} corresponding to AA is δ\delta, while the measure of coordinates corresponding to BB is 1δ1-\delta.

Now for every function ff, we get that Af44=δc4Af44+(1δ)Bf44\lVert A^{\prime}f\rVert_{4}^{4}=\frac{\delta}{c^{4}}\lVert Af\rVert_{4}^{4}+(1-\delta)\lVert Bf\rVert_{4}^{4}. In particular, since Bf443(1±δ)f24\lVert Bf\rVert_{4}^{4}\in 3(1\pm\delta)\lVert f\rVert_{2}^{4}, we get that if ff is a unit vector and Af44c4\lVert Af\rVert_{4}^{4}\leqslant c^{4} then Af44δ+3(1+δ)\lVert A^{\prime}f\rVert_{4}^{4}\leqslant\delta+3(1+\delta), while if Af44C4\lVert Af\rVert_{4}^{4}\geqslant C^{4}, then Af44δ(C/c)4\lVert A^{\prime}f\rVert_{4}^{4}\geqslant\delta(C/c)^{4}.

Also note that the random operator BB will satisfy that for every function ff, Bf2(1δ)f2\lVert Bf\rVert_{2}\geqslant(1-\delta)\lVert f\rVert_{2}, and hence Af(1δ)2f\lVert A^{\prime}f\rVert\geqslant(1-\delta)^{2}\lVert f\rVert. Choosing δ=ε/2\delta=\varepsilon/2 concludes the proof. ∎

It turns out that for the purposes of hardness of good approximation, the case that AA is a projector is almost without loss of generality.

Suppose that for some ε>0,C>1+ε\varepsilon>0,C>1+\varepsilon there is a poly(n)\operatorname{poly}(n) algorithm that on input a subspace V\varmathbbRnV\subseteq\varmathbb R^{n} can distinguish between the case (Y) ΠV24C\|\Pi_{V}\|_{2\rightarrow 4}\geqslant C and the case (N) ΠV2431/4+ε\|\Pi_{V}\|_{2\rightarrow 4}\leqslant 3^{1/4}+\varepsilon, where ΠV\Pi_{V} denotes the projector onto VV. Then there is δ=Ω(ε)\delta=\Omega(\varepsilon) and a poly(n)\operatorname{poly}(n) algorithm that on input an operator A:\varmathbbRk\varmathbbRnA:\varmathbb R^{k}\to\varmathbb R^{n} with σmin(A)1δ\sigma_{\min}(A)\geqslant 1-\delta can distinguish between the case (Y) A24C(31/4+δ)\|A\|_{2\rightarrow 4}\geqslant C(3^{1/4}+\delta) and (N) A2431/4+δ\|A\|_{2\rightarrow 4}\leqslant 3^{1/4}+\delta.

First we can assume without loss of generality that A22=σmax(A)31/4+δ\lVert A\rVert_{2\to 2}=\sigma_{\max}(A)\leqslant 3^{1/4}+\delta, since otherwise we could rule out case (N). Now we let VV be the image of AA. In the case (N) we get that that for every f\varmathbbRkf\in\varmathbb R^{k}

implying ΠV2431/4+O(δ)\|\Pi_{V}\|_{2\rightarrow 4}\leqslant 3^{1/4}+O(\delta). In the case (Y) we get that there is some ff such that Af4C(31/4+δ)f2\lVert Af\rVert_{4}\geqslant C(3^{1/4}+\delta)\lVert f\rVert_{2}, but since Af2σmax(A)f2(31/4+δ)f2\lVert Af\rVert_{2}\leqslant\sigma_{\max}(A)\lVert f\rVert_{2}\leqslant(3^{1/4}+\delta)\lVert f\rVert_{2}, we get that Af4CAf2\lVert Af\rVert_{4}\geqslant C\lVert Af\rVert_{2}, implying ΠV24C\|\Pi_{V}\|_{2\rightarrow 4}\geqslant C. ∎

Together these two lemmas effectively extend Theorem 2.5 to the case when AA is a projector. We focus on the hardness of approximating to within a constant factor.

3 Algorithmic applications of equivalent formulations

In this section we discuss the positive algorithmic results that come from the equivalences in Section 9.1. Since entanglement plays such a central role in quantum mechanics, the set Sep2(\varmathbbCn)\operatorname{Sep}^{2}(\varmathbb C^{n}) has been extensively studied. However, because its hardness has long been informally recognized (and more recently has been explicitly established [Gur03, Liu07, HM10, GNN12]), various relaxations have been proposed for the set. These relaxations are generally efficiently computable, but also have limited accuracy; see [BS10] for a review.

Two of the most important relaxations are the PPT condition and kk-extendability. For an operator XL((\varmathbbCn)r)X\in L((\varmathbb C^{n})^{\otimes r}) and a set S[r]S\subseteq[r], define the partial transpose XTSX^{T_{S}} to be the result of applying the transpose map to the systems SS. Formally, we define

and extend TST_{S} linearly to all of L((\varmathbbCn)r)L((\varmathbb C^{n})^{\otimes r}). One can verify that if XSepr(\varmathbbCn)X\in\operatorname{Sep}^{r}(\varmathbb C^{n}) then XTS0X^{T_{S}}\succeq 0 for all S[r]S\subseteq[r]. In this case we say that XX is PPT, meaning that it has Positive Partial Transposes. However, the converse is not always true. If n>2n>2 or r>2r>2, then there are states which are PPT but not in Sep\operatorname{Sep} [HHH96].

The second important relaxation of Sep\operatorname{Sep} is called rr-extendability. To define this, we need to introduce the partial trace. For S[r]S\subseteq[r], we define TrS\operatorname{Tr}_{S} to be the map from L((\varmathbbCn)r)L((\varmathbb C^{n})^{\otimes r}) to L((\varmathbbCn)rS)L((\varmathbb C^{n})^{\otimes r-|S|}) that results from applying Tr\operatorname{Tr} to the systems in SS. Formally

and TrS\operatorname{Tr}_{S} extends by linearity to all of L((\varmathbbCn)r)L((\varmathbb C^{n})^{\otimes r}).

To obtain our relaxation of Sep\operatorname{Sep}, we say that ρD(\varmathbbCn\varmathbbCn)\rho\in\mathcal{D}(\varmathbb C^{n}\otimes\varmathbb C^{n}) is rr-extendable if there exists a symmetric extension σD(\varmathbbCnr\varmathbbCn)\sigma\in\mathcal{D}(\varmathbb C^{n}\otimes\vee^{r}\varmathbb C^{n}) such that Tr{3,,r+1}σ=ρ\operatorname{Tr}_{\{3,\ldots,r+1\}}\sigma=\rho. Observe that if ρSep2(\varmathbbCn)\rho\in\operatorname{Sep}^{2}(\varmathbb C^{n}), then we can write ρ=ixixiyiyi\rho=\sum_{i}x_{i}x_{i}^{*}\otimes y_{i}y_{i}^{*}, and so σ=ixixi(yiyi)r\sigma=\sum_{i}x_{i}x_{i}^{*}\otimes(y_{i}y_{i}^{*})^{\otimes r} is a valid symmetric extension. Thus the set of kk-extendable states contains the set of separable states, but again the inclusion is strict. Indeed, increasing kk gives an infinite hierarchy of strictly tighter approximations of Sep2(\varmathbbCn)\operatorname{Sep}^{2}(\varmathbb C^{n}). This hierarchy ultimately converges [DPS04], although not always at a useful rate (see Example IV.1 of [CKMR07]). Interestingly this relaxation is known to completely fail as a method of approximating Sep2(\varmathbbRn)\operatorname{Sep}^{2}(\varmathbb R^{n}) [CFS02], but our Lemma 9.3 is evidence that those difficulties do not arise in the 2 ⁣ ⁣42\!\rightarrow\!4-norm problem.

These two relaxations can be combined to optimize over symmetric extensions that have positive partial transposes [DPS04]. Call this the level-rr DPS relaxation. It is known to converge in some cases more rapidly than rr-extendability alone [NOP09], but also is never exact for any finite rr [DPS04]. Like SoS, this relaxation is an SDP with size nO(r)n^{O(r)}. In fact, for the case of the 242\rightarrow 4 norm, the relaxations are equivalent.

When the level-rr DPS relaxation is applied to A2,2A_{2,2}, the resulting approximation is equivalent to Tensor-SDP(2r+2)\textup{{Tensor-SDP}}^{(2r+2)}

Suppose we are given an optimal solution to the level-rr DPS relaxation. This can be thought of as a density operator σD(\varmathbbCnr\varmathbbCn)\sigma\in\mathcal{D}(\varmathbb C^{n}\otimes\vee^{r}\varmathbb C^{n}) whose objective value is λ:=A2,2,Tr{3,,r+1}σ=A2,2Inr1,σ\lambda:=\boldsymbol{\langle}A_{2,2},\operatorname{Tr}_{\{3,\ldots,r+1\}}\sigma\boldsymbol{\rangle}=\boldsymbol{\langle}A_{2,2}\otimes I_{n}^{\otimes r-1},\sigma\boldsymbol{\rangle}. Let Πsym(2):=(I+Pn((1,2)))/2\Pi_{\rm sym}^{(2)}:=(I+P_{n}((1,2)))/2 be the orthogonal projector onto 2\varmathbbCn\vee^{2}\varmathbb C^{n}. Then A2,2=Πsym(2)A2,2Πsym(2)A_{2,2}=\Pi_{\rm sym}^{(2)}A_{2,2}\Pi_{\rm sym}^{(2)}. Thus, we can replace σ\sigma by σ:=(Πsym(2)Inr1)σ(Πsym(2)Inr1)\sigma^{\prime}:=(\Pi_{\rm sym}^{(2)}\otimes I_{n}^{\otimes r-1})\sigma(\Pi_{\rm sym}^{(2)}\otimes I_{n}^{\otimes r-1}) without changing the objective function. However, unless σ=σ\sigma^{\prime}=\sigma, we will have Trσ<1\operatorname{Tr}\sigma^{\prime}<1. In this case, either σ=0\sigma^{\prime}=0 and λ=0\lambda=0, or σ/Trσ\sigma^{\prime}/\operatorname{Tr}\sigma^{\prime} is a solution of the DPS relaxation with a higher objective value. In either case, this contradicts the assumption that λ\lambda is the optimal value. Thus, we must have σ=σ\sigma=\sigma^{\prime}, and in particular suppσ2\varmathbbCn(\varmathbbCn)r1\operatorname{supp}\sigma\subseteq\vee^{2}\varmathbb C^{n}\otimes(\varmathbb C^{n})^{\otimes r-1}. Since we had suppσ\varmathbbCnr\varmathbbCn\operatorname{supp}\sigma\subseteq\varmathbb C^{n}\otimes\vee^{r}\varmathbb C^{n} by assumption, it follows that

Observe next that σT\sigma^{T} is also a valid and optimal solution to the DPS relaxation, and so σ=(σ+σT)/2\sigma^{\prime}=(\sigma+\sigma^{T})/2 is as well. Since σ\sigma^{\prime} is both symmetric and Hermitian, it must be a real matrix. Replacing σ\sigma with σ\sigma^{\prime}, we see that we can assume WLOG that σ\sigma is real.

Similarly, the PPT condition implies that σTA0\sigma^{T_{A}}\geqslant 0. (Recall that the first system is AA and the rest are B1,,BrB_{1},\ldots,B_{r}.) Since the partial transpose doesn’t change the objective function, σ=(σ+σTA)/2\sigma^{\prime}=(\sigma+\sigma^{T_{A}})/2 is also an optimal solution. Replacing σ\sigma with σ\sigma^{\prime}, we see that we can assume WLOG that σ=σTA\sigma=\sigma^{T_{A}}. Let σ(\varmathbbRn)2r+2\vec{\sigma}\in(\varmathbb R^{n})^{\otimes 2r+2} denote the flattening of σ\sigma; i.e. xy,σ=x,σy\boldsymbol{\langle}x\otimes y,\vec{\sigma}\boldsymbol{\rangle}=\boldsymbol{\langle}x,\sigma y\boldsymbol{\rangle} for all x,y(\varmathbbRn)r+1x,y\in(\varmathbb R^{n})^{r+1}. Then the fact that σ=σTA\sigma=\sigma^{T_{A}} means that σ\vec{\sigma} is invariant under the action of Pn((1,r+1))P_{n}((1,r+1)). Similarly, the fact that suppσr+1\varmathbbRn\operatorname{supp}\sigma\subseteq\vee^{r+1}\varmathbb R^{n} implies that σr+1\varmathbbRnr+1\varmathbbRn\vec{\sigma}\in\vee^{r+1}\varmathbb R^{n}\otimes\vee^{r+1}\varmathbb R^{n}. Combining these two facts we find that σ2r+2\varmathbbRn\vec{\sigma}\in\vee^{2r+2}\varmathbb R^{n}.

where Πsym(2r+2):=12r+2!πS2r+2Pn(π)\Pi_{\rm sym}^{(2r+2)}:=\frac{1}{2r+2!}\sum_{\pi\in\mathcal{S}_{2r+2}}P_{n}(\pi). For a homogenous polynomial Q(f)Q(f) of even degree 2r2r+22r^{\prime}\leqslant 2r+2 we define coeff\operatorname{\mathit{coeff}} by

For a homogenous polynomial Q(f)Q(f) of odd degree, we set coeff(Q):=0\operatorname{\mathit{coeff}}(Q):=0. Then we can extend coeff\operatorname{\mathit{coeff}} by linearity to all polynomials of degree 2r+2\leqslant 2r+2. Now define

To translate a Tensor-SDP solution into a DPS solution, we run this construction in reverse. The arguments are essentially the same, except that we no longer need to establish symmetry across all 2r+22r+2 indices. ∎

Many approximation guarantees for the kk-extendable relaxation (with or without the additional PPT constraints) required that kk be poly(n)\operatorname{poly}(n), and thus do not lead to useful algorithms. Recently, [BaCY11] showed that in some cases it sufficed to take k=O(logn)k=O(\log n), leading to quasi-polynomial algorithms. It is far from obvious that their proof translates into our sum-of-squares framework, but nevertheless Lemma 9.10 implies that Tensor-SDP can take advantage of their analysis.

To apply the algorithm of [BaCY11], we need to upper-bound A2,2A_{2,2} by an 1-LOCC measurement operator. That is, a quantum measurement that can be implemented by one-way Local Operations and Classical Communication (LOCC). Such a measurement should have a decomposition of the form iViWi\sum_{i}V_{i}\otimes W_{i} where each Vi,Wi0V_{i},W_{i}\succeq 0, iViIn\sum_{i}V_{i}\preceq I_{n} and each WiInW_{i}\preceq I_{n}. Thus, for complex vectors v1,,vm,w1,,wmv_{1},\ldots,v_{m},w_{1},\ldots,w_{m} satisfying iviviIn\sum_{i}v_{i}v_{i}^{*}\preceq I_{n} and i,wiwiIn\forall i,\,w_{i}w_{i}^{*}\preceq I_{n}, the operator iviviwiwi\sum_{i}v_{i}v_{i}^{*}\otimes w_{i}w_{i}^{*} is a 1-LOCC measurement.

To upper-bound A2,2A_{2,2} by a 1-LOCC measurement, we note that aiaiTai22Ina_{i}a_{i}^{T}\preceq{\boldsymbol{\lVert}}a_{i}{\boldsymbol{\rVert}}_{2}^{2}I_{n}. Thus, if we define Z:=iaiaiT22maxiai2Z:={\boldsymbol{\lVert}}\sum_{i}a_{i}a_{i}^{T}{\boldsymbol{\rVert}}_{2\to 2}\max_{i}{\boldsymbol{\lVert}}a_{i}{\boldsymbol{\rVert}}^{2}, then A2,2/ZA_{2,2}/Z is a 1-LOCC measurement. Note that this is a stricter requirement than merely requiring A2,2/ZIn2A_{2,2}/Z\preceq I_{n^{2}}. On the other hand, in some cases (e.g. aia_{i} all orthogonal), it may be too pessimistic.

In terms of the original matrix A=ieiaiTA=\sum_{i}e_{i}a_{i}^{T}, we have maxiai2=A2\max_{i}{\boldsymbol{\lVert}}a_{i}{\boldsymbol{\rVert}}_{2}={\boldsymbol{\lVert}}A{\boldsymbol{\rVert}}_{2\rightarrow\infty}. Also iaiaiT22=ATA22=A222{\boldsymbol{\lVert}}\sum_{i}a_{i}a_{i}^{T}{\boldsymbol{\rVert}}_{2\to 2}={\boldsymbol{\lVert}}A^{T}A{\boldsymbol{\rVert}}_{2\rightarrow 2}={\boldsymbol{\lVert}}A{\boldsymbol{\rVert}}_{2\rightarrow 2}^{2}. Thus

Recall from the introduction that ZZ is an upper bound on A244{\boldsymbol{\lVert}}A{\boldsymbol{\rVert}}_{2\rightarrow 4}^{4}, based on the fact that x4x2x{\boldsymbol{\lVert}}x{\boldsymbol{\rVert}}_{4}\leqslant\sqrt{{\boldsymbol{\lVert}}x{\boldsymbol{\rVert}}_{2}{\boldsymbol{\lVert}}x{\boldsymbol{\rVert}}_{\infty}} for any xx. (This bound also arises from using interpolation of norms [Ste56].)

We can now apply the argument of [BaCY11] and show that optimizing over O(r)O(r)-extendable states will approximate A244{\boldsymbol{\lVert}}A{\boldsymbol{\rVert}}_{2\rightarrow 4}^{4} up to additive error log(n)rZ\sqrt{\frac{\log(n)}{r}}Z. Equivalently, we can obtain additive error εZ\varepsilon Z using O(log(n)/ε2)O(\log(n)/\varepsilon^{2})-round Tensor-SDP. Whether the relaxation used is the DPS relaxation or our SoS-based Tensor-SDP algorithm, the resulting runtime is exp(O(log2(n)/ε2))\exp(O(\log^{2}(n)/\varepsilon^{2})).

3.2 Gap instances

Since Tensor-SDP is equivalent than the DPS relaxation for separable states, any gap instance for Tensor-SDP would translate into a gap instance for the DPS relaxation. This would mean the existence of a state that passes the kk-extendability and PPT test, but nevertheless is far from separable, with A2,2A_{2,2} serving as the entanglement witness demonstrating this. While such states are already known [DPS04, BS10], it would be of interest to find new such families of states, possibly with different scaling of rr and nn.

Many of the results about the convergence of DPSr\operatorname{DPS}_{r} to Sep\operatorname{Sep} (such as [DPS04, CKMR07, KM09, BaCY11]) use only the fact that DPSrExtr\operatorname{DPS}_{r}\subset\operatorname{Ext}_{r}. A rare exception is [NOP09], which shows that DPSr\operatorname{DPS}_{r} is at least quadratically closer to Sep\operatorname{Sep} than Extr\operatorname{Ext}_{r} is, in the regime where rnr\gg n. Another simple example comes from M=ΦΦM=\Phi\Phi^{*}, where Φ\Phi is the maximally entangled state n1/2i=1neiein^{-1/2}\sum_{i=1}^{n}e_{i}\otimes e_{i}. Then one can readily compute that hSep(M)=hDPS1(M)=1/n\mathbf{h}_{\operatorname{Sep}}(M)=\mathbf{h}_{\operatorname{DPS}_{1}}(M)=1/n, while the rr-extendible state

achieves hExtr(M)1/r\mathbf{h}_{\operatorname{Ext}_{r}}(M)\geqslant 1/r. (In words, (9.43) describes a state where AA and a randomly chosen BiB_{i} share the state ΦΦ\Phi\Phi^{*}, while the other BjB_{j} systems are described by maximally mixed states.) This proves that the rr-extendable hierarchy cannot achieve a good multiplicative approximation of hSep(M)\mathbf{h}_{\operatorname{Sep}}(M) for all MM without taking rΩ(n)r\geqslant\Omega(n).

Can we improve this when MM is in a restricted class, such as 1-LOCC? Here [BRSdW11] show that the Khot-Vishnoi integrality construction can yield an n2n^{2}-dimensional MM for which hSep(M)O(1/n)h_{\operatorname{Sep}}(M)\leqslant O(1/n), but TrMΦΩ(1/log2(n))\operatorname{Tr}M\Phi\geqslant\Omega(1/\log^{2}(n)). Combined with (9.43) this implies that hExtr(M)Ω(1/rlog2(n))\mathbf{h}_{\operatorname{Ext}_{r}}(M)\geqslant\Omega(1/r\log^{2}(n)). On the other hand, Theorem 6.12 and Lemma 9.10 implies that hDPS3(M)O(1/n)\mathbf{h}_{\operatorname{DPS}_{3}}(M)\leqslant O(1/n). Additionally, the MM from Ref. [BRSdW11] belongs to the class BELL, a subset of 1-LOCC, given by measurements of the form i,jpi,jAiBj\sum_{i,j}p_{i,j}A_{i}\otimes B_{j}, with 0pi,j10\leqslant p_{i,j}\leqslant 1 and iAi=jBj=I\sum_{i}A_{i}=\sum_{j}B_{j}=I. As a result, we obtain the following corollary.

There exists an n2n^{2} dimensional MBELLM\in\text{BELL} such that

Subexponential algorithm for the 2-to-q norm

For every 1<c<C1<c<C, there is a poly(n)exp(n2/q)\operatorname{poly}(n)\exp(n^{2/q})-time algorithm that computes a (c,C)(c,C)-approximation for the 2q2\to q norm of any linear operator whose range is \varmathbbRn\varmathbb R^{n}.

and obtain as a corollary a subexponential algorithm for Small-Set Expansion. The algorithm roughly matches the performance of [ABS10]’s for the same problem, and in fact is a very close variant of it. The proof is obtained by simply noticing that a subspace VV cannot have too large of a dimension without containing a vector vv (that can be easily found) such that vqv2\lVert v\rVert_{q}\gg\lVert v\rVert_{2}, while of course it is always possible to find such a vector (if it exists) in time exponential in dim(V)\dim(V). The key observation is the following basic fact (whose proof we include here for completeness):

For every subspace V\varmathbbRnV\subseteq\varmathbb R^{n}, V2dim(V)\lVert V\rVert_{2\to\infty}\geqslant\sqrt{\dim(V)}.

Let f1,,fdf^{1},\ldots,f^{d} be an orthonormal basis for VV, where d=dim(V)d=\dim(V). For every i[n]i\in[n], let gig^{i} be the function j=1dfijfi\sum_{j=1}^{d}f^{j}_{i}f^{i}. Note that the ithi^{th} coordinate of gig^{i} is equal to j=1d(fij)2\sum_{j=1}^{d}(f^{j}_{i})^{2} (*) which also equals gi22\lVert g^{i}\rVert_{2}^{2} since the fjf^{j}’s are an orthonormal basis. Also the expectation of (*) over ii is j=1d\varmathbbEi[n](fij)2=j=1dfj22=d\sum_{j=1}^{d}\operatorname*{\varmathbb{E}}_{i\in[n]}(f^{j}_{i})^{2}=\sum_{j=1}^{d}\lVert f^{j}\rVert_{2}^{2}=d since these are unit vectors. Thus we get that \varmathbbEigi\varmathbbEigii=d=\varmathbbEig22\operatorname*{\varmathbb{E}}_{i}\lVert g^{i}\rVert_{\infty}\geqslant\operatorname*{\varmathbb{E}}_{i}g^{i}_{i}=d=\operatorname*{\varmathbb{E}}_{i}\lVert g\rVert_{2}^{2}. We claim that one of the gig^{i}’s must satisfy gidgi2\lVert g^{i}\rVert_{\infty}\geqslant\sqrt{d}\lVert g^{i}\rVert_{2}. Indeed, suppose otherwise, then we’d get that

meaning Eigi2<d2E_{i}\lVert g^{i}\rVert_{\infty}^{2}<d^{2}, but Eigi2(\varmathbbEigi)2=d2E_{i}\lVert g^{i}\rVert_{\infty}^{2}\geqslant\left(\operatorname*{\varmathbb{E}}_{i}\lVert g^{i}\rVert_{\infty}\right)^{2}=d^{2}— a contradiction. ∎

For every subspace V\varmathbbRnV\subseteq\varmathbb R^{n}, V2qdim(V)/n1/q\lVert V\rVert_{2\to q}\geqslant\sqrt{\dim(V)}/n^{1/q}

By looking at the contribution to the qthq^{th}-norm of just one coordinate one can see that for every function ff, fq(fq/n)1/q=f/n1/q\lVert f\rVert_{q}\geqslant(\lVert f\rVert_{\infty}^{q}/n)^{1/q}=\lVert f\rVert_{\infty}/n^{1/q}. ∎

Let A:\varmathbbRm\varmathbbRnA:\varmathbb R^{m}\to\varmathbb R^{n} be an operator, and let 1<c<C1<c<C be some constants and σ=σmin(A)\sigma=\sigma_{\min}(A) be such that Af2σf2\lVert Af\rVert_{2}\geqslant\sigma\lVert f\rVert_{2} for every ff orthogonal to the kernel of AA. We want to distinguish between the case that A2qc\lVert A\rVert_{2\to q}\leqslant c and the case that A2qC\lVert A\rVert_{2\to q}\geqslant C. If σ>c\sigma>c then clearly we are not in the first case, and so we are done. Let VV be the image of AA. If dim(V)C2n2/q\dim(V)\leqslant C^{2}n^{2/q} then we can use brute force enumeration to find out if such vv exists in the space. Otherwise, by Corollary 10.2 we must be in the second case. ∎

Note that by applying Theorem 2.3 we can replace the brute force enumeration step by the SoS hierarchy, since V221\lVert V\rVert_{2\to 2}\leqslant 1 automatically, and unless V2Cn1/q\lVert V\rVert_{2\to\infty}\leqslant Cn^{1/q} we will be in the second case.

A corollary of Theorem 2.1 is a subexponential algorithm for Small-Set Expansion

For every 0.4>ν>00.4>\nu>0 there is an exp(n1/O(log(1/ν)))\exp(n^{1/O(\log(1/\nu))}) time algorithm that given a graph with the promise that either (i) ΦG(δ)1ν\Phi_{G}(\delta)\geqslant 1-\nu or (ii) ΦG(δ2)0.5\Phi_{G}(\delta^{2})\leqslant 0.5 decides which is the case.

For q=O(log(1/ν))q=O(\log(1/\nu)) we find from Theorem 2.4 that in case (i), V0.42q2/δ\|V_{\geqslant 0.4}\|_{2\to q}\leqslant 2/\sqrt{\delta}, while in case (ii) V0.42q0.1/δ12/q\|V_{\geqslant 0.4}\|_{2\to q}\geqslant 0.1/\delta^{1-2/q}. Thus it sufficies to obtain a (2/δ,0.1/δ12/q)(2/\sqrt{\delta},0.1/\delta^{1-2/q})-approximation for the 2q2\to q norm to solve the problem, and by Theorem 2.1 this can be achieved in time exp(nO(log(1/ν)))\exp(n^{O(\log(1/\nu))}) for sufficiently small δ\delta. ∎

Conclusions

This work motivates further study of the complexity of approximating hypercontractive norms such as the 242\to 4 norm. A particulary interesting question is what is the complexity of obtaining a good approximation for the 242\to 4 norm and what’s the relation of this problem to the Small-Set Expansion problem. Our work leaves possible at least the following three scenarios: (i) both these problems can be solved in quasipolynomial time, but not faster, which would mean that the UGC as stated is essentially false but a weaker variant of it is true, (ii) both these problems are NP-hard to solve (via a reduction with polynomial blowup) meaning that the UGC is true, and (iii) the Small-Set Expansion and Unique Games problems are significantly easier than the 242\to 4 problem with the most extreme case being that the former two problems can be solved in polynomial time and the latter is NP-hard and hence cannot be done faster than subexponential time. This last scenario would mean that one can improve on the subexponential algorithm for the 242\to 4 norm for general instances by using the structure of instances arising from the Small-Set Expansion reduction of Theorem 2.4 (which indeed seem quite different from the instances arising from the hardness reduction of Theorem 2.5). In any case we hope that further study of the complexity of computing hypercontractive norms can lead to a better understanding of the boundary between hardness and easiness for Unique Games and related problems.

We thank Pablo Parrilo for useful discussions, Daniel Dadush for catching an error in an earlier version of Section 7 and the anonymous STOC referees for numerous comments that greatly improved the presentation of this paper. Aram Harrow was funded by NSF grants 0916400, 0829937, 0803478, 1111382, DARPA QuEST contract FA9550-09-1-0044 and ARO contract W911NF-12-1-0486. Jonathan Kelner was partially supported by NSF awards 1111109 and 0843915.

Appendix A More facts about pseudo-expectation

In this section we note some additional facts about pseudo-expectation functionals that are useful in this paper.

The relation P2PP^{2}\preceq P holds if and only if 0P10\preceq P\preceq 1. Furthermore, if P2PP^{2}\preceq P and 0QP0\preceq Q\preceq P, then Q2QQ^{2}\preceq Q.

If P0P\succeq 0, then P1P\preceq 1 implies P2PP^{2}\preceq P. (Multiplying both sides with a sum of squares preserves the order.) On the other hand, suppose P2PP^{2}\preceq P. Since P20P^{2}\succeq 0, we also have P0P\succeq 0. Since 1P=PP2+(1P)21-P=P-P^{2}+(1-P)^{2}, the relation P2PP^{2}\preceq P also implies P1P\preceq 1.

For the second part of the lemma, suppose P2PP^{2}\preceq P and 0QP0\preceq Q\preceq P. Using the first part of the lemma, we have P1P\preceq 1. It follows that 0Q10\preceq Q\preceq 1, which in turn implies Q2QQ^{2}\preceq Q (using the other direction of the first part of the lemma). ∎

The right-hand side minus the LHS equals the square polynomial 12fg,fg\tfrac{1}{2}\langle f-g,f-g\rangle

If (f,g)(f,g) is a level-22 fictitious random variable over \varmathbbRU×\varmathbbRU\varmathbb R^{\mathcal{U}}\times\varmathbb R^{\mathcal{U}}, then

If (f,g)(f,g) is a 44-f.r.v. over \varmathbbRU×\varmathbbRU\varmathbb R^{\mathcal{U}}\times\varmathbb R^{\mathcal{U}}, then

Appendix B Norm bound implies small-set expansion

In this section, we show that an upper bound on 2q2\to q norm of the projector to the top eigenspace of a graph implies that the graph is a small-set expander. This proof appeared elsewhere implicitly [KV05, O’D07] or explicitly [BGH+11] and is presented here only for completeness. We use the same notation from Section 8. Fix a graph GG (identified with its normalized adjacency matrix), and λ(0,1)\lambda\in(0,1), letting VλV_{\geqslant\lambda} denote the subspace spanned by eigenfunctions with eigenvalue at least λ\lambda.

If p,qp,q satisfy 1/p+1/q=11/p+1/q=1 then xp=maxy:yq1x,y\lVert x\rVert_{p}=\max_{y:\lVert y\rVert_{q}\leqslant 1}|\langle x,y\rangle|. Indeed, x,yxpyq|\langle x,y\rangle|\leqslant\lVert x\rVert_{p}\lVert y\rVert_{q} by Hölder’s inequality, and by choosing yi=sign(xi)xip1y_{i}=\operatorname{sign}(x_{i})|x_{i}|^{p-1} and normalizing one can see this equality is tight. In particular, for every xL(U)x\in L(\mathcal{U}), xq=maxy:yq/(q1)1x,y\lVert x\rVert_{q}=\max_{y:\lVert y\rVert_{q/(q-1)}\leqslant 1}|\langle x,y\rangle| and yq/(q1)=maxxq1x,y\lVert y\rVert_{q/(q-1)}=\max_{\lVert x\rVert_{q}\leqslant 1}|\langle x,y\rangle|. As a consequence

Note that if AA is a projection operator, A=ATA=A^{T}. Thus, part 1 of Theorem 2.4 follows from the following lemma:

Let G=(V,E)G=(V,E) be regular graph and λ(0,1)\lambda\in(0,1). Then, for every SVS\subseteq V,

Let ff be the characteristic function of SS, and write f=f+ff=f^{\prime}+f^{\prime\prime} where fVλf^{\prime}\in V_{\lambda} and f=fff^{\prime\prime}=f-f^{\prime} is the projection to the eigenvectors with value less than λ\lambda. Let μ=μ(S)\mu=\mu(S). We know that

And fq/(q1)=(\varmathbbEf(x)q/(q1))(q1)/q=μ(q1)/q\lVert f\rVert_{q/(q-1)}=\left(\operatorname*{\varmathbb{E}}f(x)^{q/(q-1)}\right)^{(q-1)/q}=\mu^{(q-1)/q}, meaning that fVλq/(q1)2μ(q1)/q\lVert f^{\prime}\rVert\leqslant\lVert V_{\lambda}\rVert_{q/(q-1)\to 2}\mu^{(q-1)/q}. We now write

Plugging this into (B.1) yields the result. ∎

Appendix C Semidefinite Programming Hierarchies

In this section, we compare different SDP hierarchies and discuss some of their properties.

In this section, we compare the SoS hierarchy and Lasserre hierarchy at the example of Max Cut. (We use a formulation of Lasserre’s hierarchy similar to the one in [Sch08].) It will turn out that these different formulations are equivalent up to (small) constant factors in the number of levels. We remark that the same proof with syntactic modifications shows that our SoS relaxation of Unique Games is equivalent to the corresponding Lasserre relaxation.

References