Making the long code shorter, with applications to the Unique Games Conjecture

Boaz Barak, Parikshit Gopalan, Johan Hastad, Raghu Meka, Prasad Raghavendra, David Steurer

Introduction

Khot’s Unique Games Conjecture [Kho02] (UGC) has been the focus of intense research effort in the last few years. The conjecture posits the hardness of approximation for a certain constraint satisfaction problem, and shows promise to settle many open questions in theory of approximation algorithms. Specifically, an instance Γ\Gamma of the Unique Games problem with nn variables and alphabet Σ\Sigma is described by a collection of constraints of the form (x,y,π)(x,y,\pi) where π\pi is a permutation over Σ\Sigma. An assignment to Γ\Gamma is a mapping ff from [n][n] to Σ\Sigma, and ff’s value is the fraction of constraints (x,y,π)(x,y,\pi) such that f(y)=π(f(x))f(y)=\pi(f(x)). The Unique Games Conjecture is that for any ε>0\varepsilon>0, there is some finite Σ\Sigma such that it is NP hard to distinguish between the case that a Unique Games instance Γ\Gamma with alphabet Γ\Gamma has an assignment satisfying 1ε1-\varepsilon fraction of the constraints, and the case that every assignment satisfies at most ε\varepsilon fraction of Γ\Gamma’s constraint.

Many works have been devoted to studying the plausibility of the UGC, as well as exploring its implications and obtaining unconditional results motivated by this effort. Tantalizingly, at the moment we have very little evidence for the truth of this conjecture. One obvious reason to believe the UGC is that no algorithm is known to contradict it, though that of course may have more to do with our proof techniques for algorithm analysis than actual computational difficulty. Thus perhaps the strongest evidence for the conjecture comes from results showing particular instances on which certain natural algorithms will fail to solve the problem. However, even those integrality gaps are quantitatively rather weak. For example, while Arora, Barak and Steurer [ABS10] showed a subexponential upper bound on an algorithm for the Unique Games and the related Small-Set Expansion problem, the hardest known instances for their algorithm only required quasipolynomial time [Kol10]. Similarly (and related to this), known integrality gaps for Unique Games and related problems do not rule out their solution by an O(logn)O(\log n)-round semidefinite hierarchy, an algorithm that can be implemented in quasipolynomial (or perhaps even polynomial [BRS11]) time.

The long code has been a central tool in many of these works. This is the set of “dictator” functions mapping \varmathbbF2N\varmathbb F_{2}^{N} to \varmathbbF2\varmathbb F_{2} that have the form x1xNxix_{1}\ldots x_{N}\mapsto x_{i} for some ii. Many hardness reductions (especially from Unique Games) and constructions of integrality gap instances use the long code as a tool. However, this is also the source of their inefficiency, as the long code is indeed quite long. Specifically, it has only NN codewords but dimension 2N2^{N}, which leads to exponential blowup in many of these applications.In this work, we introduce a different code, which we call the “short code”, that is exponentially more efficient, and can be used in the long code’s place in many of these applications, leading to significant quantitative improvements. In particular, we use our code to show instances on which the [ABS10] algorithm, as well as certain semidefinite hierarchies, require almost sub-exponential time, thus considerably strengthening the known evidence in support of the Unique Games Conjecture. Moreover, our results open up possibilities for qualitative improvements as well, in particular suggesting a new approach to prove the Unique Games Conjecture via an efficient alphabet reduction.

At the heart of the long code’s applications lie its connection with the noisy Hypercube. This is the weighted graph HN,εH_{N,\varepsilon} whose vertices are elements in \varmathbbF2N\varmathbb F_{2}^{N} where a random neighbor of x\varmathbbF2Nx\in\varmathbb F_{2}^{N} is obtained by flipping each bit of xx independently with probability ε\varepsilon.This graph is closely related and has similar properties to the unweighted graph where we connect xx and yy if their Hamming distance is at most εN\varepsilon N. It is not too hard to show that the codewords of the long code correspond to the top eigenvectors of the noisy hypercube which also give the minimal bisections of the graph, cutting only an ε\varepsilon fraction of edges. In addition, several converse results are known, showing that bisections (and more general functions) cutting few edges are close to these top eigenvectors (or dictatorships) in some sense. (One such result is the “Majority is Stablest” Theorem of [MOO05].) The inefficiency of the longcode is manifested in the fact that the number of vertices of the noisy cube is exponential in the number NN of its top eigenvectors.

Another way to describe the long code is that it encodes x\varmathbbF2nx\in\varmathbb F_{2}^{n} by a binary vector vxv_{x} of length 22n2^{2^{n}} where vx(f)=f(x)v_{x}(f)=f(x) for every function f:\varmathbbF2n\varmathbbF2f:\varmathbb F_{2}^{n}\rightarrow\varmathbb F_{2}. This view also accounts for the name “long code”, since one can see that this is the longest possible encoding of xx without having repeated coordinates. For every subset D\mathcal{D} of functions mapping \varmathbbF2n\varmathbb F_{2}^{n} to \varmathbbF2\varmathbb F_{2}, we define the D\mathcal{D}-short code to be the code that encodes xx by a vector vxv_{x} of length D|\mathcal{D}| where vx(f)=f(x)v_{x}(f)=f(x) for every fDf\in\mathcal{D}. Note that this is a very general definition that encapsulates any code without repeated coordinates. For d\varmathbbNd\in\varmathbb N, we define the dd-short code to be the the D\mathcal{D}-short code where D\mathcal{D} is the set of all polynomials over \varmathbbF2n\varmathbb F_{2}^{n} of degree at most dd. Note that the 11-short code is the Hadamard code, while the nn-short code is the long code. We use the name “short code” to denote the dd short code for d=O(1)d=O(1). Note that the short code has 2n2^{n} codewords and dimension roughly 2nd2^{n^{d}}, and hence only quasipolynomial blowup, as opposed to the exponential blowup of the long code. Our main contribution is a construction of a “derandomized” noisy cube, which is a small subgraph of the noisy cube that enjoys the same relations to the short code (including a “Majority is Stablest” theorem) as the original noisy cube has to the long code. As a result, in many applications one can use the short code and the derandomized cube in place of the long code and the noisy cube, obtaining an exponential advantage. Using this approach we obtain the following results:

Small set expanders with many large eigenvalues

Our first application, and the motivation to this work, is a question of Arora, Barak and Steurer [ABS10]: How many eigenvectors with eigenvalue at least 1ε1-\varepsilon can an nn-vertex small set expander graph have? We say a graph is a small set expander (SSE) if all sufficiently small subsets of vertices have, say, at least 0.90.9 fracton of their neighbors outside the set. [ABS10] showed an upper bound of nO(ε)n^{O(\varepsilon)} on the number of large (i.e., greater than 1ε1-\varepsilon) eigenvalues of a small set expander. Arora et al. then observed that the subspace enumeration algorithm of [KT07, Kol10] for approximating small set expansion in an input graph takes time at most exponential in this number, which they then use to give an algorithm with similar running time for the Unique Games problem. Up to this work, the best lower bound was polylog(n)\operatorname{polylog}(n), with the example being the noisy cube, and hence as far as we knew the algorithm of [ABS10] could solve the small set expansion problem in quasipolynomial time, which in turn might have had significant implications for the Unique Games problem as well. Our derandomized noisy cube yields an example with an almost polynomial number of large eigenvalues:

For every ε>0\varepsilon>0, there is an nn-vertex small set expander graph with 2(logn)Ω(1)2^{(\log n)^{\Omega(1)}} eigenvectors with corresponding eigenvalues at least 1ε1-\varepsilon.

Theorem 1 actually follows from a more general result connecting locally testable codes to small set expanders, which we instantiate with the Reed Muller code. See Section 2 for details.

Efficient integrality gaps

There is a standard semidefinite program (SDP) relaxation for the Unique Games problem, known as the “basic SDP” [KV05, RS09]. Several works have shown upper and lower bounds on the approximation guarantees of this relaxation, and for constant alphabet size, the relation between the alphabet size and approximation guarantee is completely understood [CMM06]. However, for unbounded alphabet, there was still a big gap in our understanding of the relation between the approximation guarantee and the number of variables. Gupta and Talwar [GT06] showed that if the relaxation’s value is 1ε1-\varepsilon, there is an assignment satisfying 1O(εlogn)1-O(\varepsilon\log n) fraction of constraints. On the other hand, Khot and Vishnoi [KV05] gave an integrality gap instance where the relaxation’s value was 11/poly(loglogn)1-1/\operatorname{poly}(\log\log n)Throughout, for any function ff, poly(f(n))\operatorname{poly}(f(n)) denotes a function gg satisfying g(n)=f(n)Ω(1)g(n)=f(n)^{\Omega(1)}. but the objective value (maximum fraction of constraints satisfied by any assignment) was o(1)o(1). It was a natural question whether this could be improved (e.g., see [Lee11]), and indeed our short code allows us to obtain an almost exponential improvement:

There is an nn-variable instance of Unique Games with objective value o(1)o(1) but for which the standard semidefinite programming (SDP) relaxation has value at least 11/qpolylog(n)1-1/\operatorname{qpolylog}(n).For functions f,g:\varmathbbN[0,)f,g:\varmathbb N\to[0,\infty) we write f=qpoly(g)f=\operatorname{qpoly}(g) if f=exp(polylog(g))f=\exp(\operatorname{polylog}(g)). That is, if there are constants C>c>0C>c>0 such that for all sufficiently large nn, exp((logg(n))c)f(n)exp((logg(n))C)\exp((\log g(n))^{c})\leqslant f(n)\leqslant\exp((\log g(n))^{C}). (Note that we allow c<1c<1, and so f=qpoly(g)f=\operatorname{qpoly}(g) does not imply that f>gf>g.) Similarly, we define qpolylog(g)=qpoly(logg)\operatorname{qpolylog}(g)=\operatorname{qpoly}(\log g) and write f=qqpoly(g)f=\operatorname{qqpoly}(g) if f=exp(exp(poly(loglogg)))f=\exp(\exp(\operatorname{poly}(\log\log g))).

Integrality gaps for SDP hierarchies

Barak, Raghavendra and Steurer [BRS11] (see also [GS11]) showed that for every ε>0\varepsilon>0, nεn^{\varepsilon} rounds of the SA\mathsf{SA} hierarchy yields a non-trivial improvement over the basic SDP . The unique games conjecture predicts that this is optimal, in the sense that no(1)n^{o(1)} rounds of any hierarchy should not improve the worst-case approximation ratio above the basic SDP.This is under the widely believed assumption that NPDtime(exp(no(1))\mathbf{NP}\nsubseteq\mathbf{Dtime}(\exp(n^{o(1)}). However, this prediction is far from being verified, with the best lower bounds given by [RS09] (see also [KS09]) who showed instances that require logΩ(1)n\log^{\Omega(1)}n rounds for the LH\mathsf{LH} hierarchy, and (loglogn)Ω(1)(\log\log n)^{\Omega(1)} rounds for the SA\mathsf{SA} hierarchy. Moreover, these instances are known to be solvable in quasipolynomial time [Kol10] and in fact via polylog(n)\operatorname{polylog}(n) rounds of the SA\mathsf{SA} hierarchy [BRS11] . Thus prior work gave no evidence that the unique games problem cannot be solved in quasipolynomial time. In this work we obtain almost-exponentially more efficient integrality gaps, resisting qpoly(logn)\operatorname{qpoly}(\log n) rounds of the SA\mathsf{SA} hierarchy and qqpoly(n)\operatorname{qqpoly}(n) rounds of the LH\mathsf{LH} hierarchy. The latter is the first superlogarithmic SDP hierarchy lower bound for Unique Games for any SDP hierarchy considered in the literature.

For every ε>0\varepsilon>0 there is some k=k(ε)k=k(\varepsilon), such that for every nn there is an nn variable instance Γ\Gamma of Unique Games with alphabet size kk such that the objective value of Γ\Gamma is at most ε\varepsilon, but the value on Γ\Gamma of both qpoly(logn)\operatorname{qpoly}(\log n) rounds of the SA\mathsf{SA} hierarchy and qqpoly(n)\operatorname{qqpoly}(n) rounds of the LH\mathsf{LH} hierarchy is at least 1ε1-\varepsilon.

Alphabet reduction gadget

Khot, Kindler, Mossel and O’Donnel [KKMO04] used the long code to show an “alphabet reduction” gadget for unique games. They show how to reduce a unique game instance with some large alphabet KK to an instance with arbitrarily small alphabet. (In particular, they showed how one can reduce arbitrary unique games instances into binary alphabet instances, which turns out to be equivalent to the Max Cut problem.) However, quantitatively their result was rather inefficient, incurring an exponential in KK blowup of the instance. By replacing the long code with our “short code”, we obtain a more efficient gadget, incurring only a qusipolynomial blowup. One caveat is that, because the short code doesn’t support arbitrary permutations, this reduction only works for unique games instances whose constraints are affine functions over \varmathbbF2k\varmathbb F_{2}^{k} where k=logKk=\log K; however this class of unique games seems sufficiently rich for many applications.For example, because the multiplicative group of the field \varmathbbF2n\varmathbb F_{2^{n}} is cyclic, one can represent constraints of the form xixj=ci,j(mod2n1)x_{i}-x_{j}=c_{i,j}\pmod{2^{n}-1} as linear constraints over \varmathbbF2n\varmathbb F_{2}^{n} (i.e., constraints of the form xi=Ci,jxjx_{i}=C_{i,j}x_{j} where Ci,jC_{i,j} is an invertible linear map over \varmathbbF2n\varmathbb F_{2}^{n}).

Once again, our quantitative results are stronger than stated here, and as in [KKMO04], we obtain nearly optimal relation between the alphabet size kk and the soundness and completeness thresholds. In particular for k=2k=2 our results match the parameters of the Max Cut algorithm of Geomans and Williamson. Our alphabet reduction gadget suggests a new approach to proving the unique games conjecture by using it as an “inner PCP”. For example, one could first show hardness of unique games with very large alphabet (polynomial or even subexponential in the number of variables) and then applying alphabet reduction. At the very least, coming up with plausible hard instances for unique games should be easier with a large alphabet.

The long code is also used as a tool in applications that do not involve the unique games conjecture. On a high level, there are two properties that make the long code useful in hardness of approximation: (i) it has a 22 query test obtained from the noisy hypercube and (ii) it has many symmetries, and in particular one can read off any function of xx from the xthx^{th} codeword. Our short code preserves property (i) but (as is necessary for a more efficient code) does not preserve property (ii), as one can only read off low degree polynomials of xx (also it is only symmetric under affine transformations). We note that if one does not care about property (i) and is happy with a 33 query test, then it’s often possible to use the Hadamard code which is more efficient than the short code (indeed it’s essentially equal to the dd-short code for d=1d=1). Thus, at least in the context of hardness of approximation, it seems that the applications the short code will be most useful are those where property (i) is the crucial one.

Despite the name “short code”, our code is not the shortest possible code. While in our applications, dimension linear in the number of codewords is necessary (e.g., one can’t have a graph with more eigenvalues than vertices), it’s not clear that the dimension needs to be polynomial. It is a very interesting open question to find shorter codes that can still be used in the above applications.

Our techniques

To explain our techniques we focus on our first application— the construction of a small set expander with many eigenvalues close to 11. The best way to view this construction is as a derandomization of the noisy hypercube, and so it will be useful to recall why the noisy hypercube itself is a small set expander.

Recall that the ε\varepsilon-noisy hypercube is the graph HN,εH_{N,\varepsilon} whose vertex set is {±1}N\{\pm 1\}^{N} where we sample a neighbor of xx by flipping each bit independently with probability ε\varepsilon. The eigenvectors in HN,εH_{N,\varepsilon} are given by the parity functions χα(x)=iαxi\chi_{\alpha}(x)=\prod_{i\in\alpha}x_{i} for subsets α[N]\alpha\subseteq[N] and the corresponding eigenvalues are λα=(12ε)α\lambda_{\alpha}=(1-2\varepsilon)^{|\alpha|}. Thus λα\lambda_{\alpha} only depends on the degree α|\alpha| of χα\chi_{\alpha}. In particular, the “dictator” functions χ{i}(x)=xi\chi_{\{i\}}(x)=x_{i} have eigenvalue 12ε1-2\varepsilon and they correspond to balanced cuts (where vertices are partitioned based on the value of xix_{i}) with edge expansion ε\varepsilon. As α\alpha increases, λα\lambda_{\alpha} decreases, becoming a constant around α=O(1/ε)|\alpha|=O(1/\varepsilon).

Given f:{±1}N{0,1}f:\{\pm 1\}^{N}\rightarrow\{0,1\} which is the indicator of a set SS, its Fourier expansion f(x)=αf^(α)χα(x)f(x)=\sum_{\alpha}\hat{f}(\alpha)\chi_{\alpha}(x) can be viewed as expressing the vector ff in the eigenvector basis. The edge expansion of SS is determined by the distribution of its Fourier mass; sets where most of the Fourier mass is on large sets will expand well. Given this connection, small-set expansion follows from the fact that the indicator functions of small sets have most of their mass concentrated on large Fourier coefficients. More precisely a set SS of measure μ\mu has most of its Fourier mass on coefficients of degree Ω(log(1/μ))\Omega(\log(1/\mu)). This follows from the so-called (2,4)-hypercontractive inequality for low-degree polynomials— that for every degree dd polynomial ff,

for some CC depending only on dd. (See Section 4.1 for the proof, though some intuition can be obtained by noting that if ff is a characteristic function of a set SS of measure μ=o(1)\mu=o(1) then \varmathbbE[f2]2=μ2\operatorname*{\varmathbb{E}}[f^{2}]^{2}=\mu^{2} and \varmathbbE[f4]=μ\operatorname*{\varmathbb{E}}[f^{4}]=\mu and hence Equation (2.1) shows that ff cannot be an O(1)O(1)-degree polynomial.)

By a “derandomized hypercube” we mean a graph on much fewer vertices that still (approximately) preserves the above properties of the noisy hypercube. Specifically we want to find a very small subset D\mathcal{D} of {±1}N\{\pm 1\}^{N} and a subgraph GG of HN,εH_{N,\varepsilon} whose vertex set is D\mathcal{D} such that (i) GG will have similar eigenvalue profile to HN,εH_{N,\varepsilon}, and in particular have NN eigenvalues close to 11 and (ii) GG will be a a small set expander. To get the parameters we are looking for, we’ll need to have the size of D\mathcal{D} be at most qpoly(N)\operatorname{qpoly}(N).

A natural candidate is to take D\mathcal{D} to be a random set, but it is not hard to show that this will not work. A better candidate might be a linear subspace D\varmathbbF2N\mathcal{D}\subseteq\varmathbb F_{2}^{N} that looks suitably pseudorandom. We show that in fact it suffices to choose a subspace D\mathcal{D} whose dual C=D\mathcal{C}=\mathcal{D}^{\perp} is a sufficiently good locally testable code. (We identify \varmathbbF2N\varmathbb F_{2}^{N} with {±1}N\{\pm 1\}^{N} via the usual map (b1,,bN)((1)b1,,(1)bN)(b_{1},\ldots,b_{N})\mapsto((-1)^{b_{1}},\ldots,(-1)^{b_{N}}).)

Our construction requires an asymptotic family of [N,K,D]2[N,K,D]_{2} linear codes C\varmathbbF2N\mathcal{C}\subseteq\varmathbb F_{2}^{N} where the distance DD tends to infinity. The code should have a εN\varepsilon N-query local tester which when given a received word α\varmathbbF2N\alpha\in\varmathbb F_{2}^{N} samples a codeword qq of weight at most εN\varepsilon N from a distribution T\mathcal{T} on C\mathcal{C}^{\perp} and accepts if α,q=1\langle\alpha,q\rangle=1. The test clearly accepts codewords in C\mathcal{C}, we also require it to reject words that are distance at least D/10D/10 from every codeword in C\mathcal{C} with probability 0.490.49. Given such a locally testable code C\mathcal{C}, we consider the Cayley graphCayley graph are usually defined to be unweighted graph. However, the definition can be generalized straightforwardly to weighted graphs. GG whose vertices are the codewords of the dual code D=C\mathcal{D}=\mathcal{C}^{\perp} while the (appropriately weighted) edges correspond to the distribution T\mathcal{T}. That is, a vertex of GG is a codeword xDx\in\mathcal{D}, while a random neighbor of xx is obtained by picking a random qq from T\mathcal{T} and moving to x+qx+q.

Because D\mathcal{D} is a subspace, it is easy to show that the eigenvectors of GG are linear functions of of the form χα(x)\chi_{\alpha}(x) for x,α\varmathbbF2Nx,\alpha\in\varmathbb F_{2}^{N} (where if ααC\alpha\oplus\alpha^{\prime}\in\mathcal{C} then χα\chi_{\alpha} and χα\chi_{\alpha^{\prime}} are identical on GG’s vertices). Moreover, from the way we designed the graph, for every α\varmathbbF2n\alpha\in\varmathbb F_{2}^{n}, the corresponding eigenvalue λα\lambda_{\alpha} is equal to \varmathbbEqT[(1)α,q]=12\varmathbbPT[Test rejects α]\operatorname*{\varmathbb{E}}_{q\in\mathcal{T}}[(-1)^{\langle\alpha,q\rangle}]=1-2\operatorname*{\varmathbb{P}}_{\mathcal{T}}[\text{Test rejects }\alpha]. This connection between the spectrum of GG and the local testability of C\mathcal{C} allows us to invoke machinery from coding theory in our analysis.

From this one can deduce that the eigenvalue spectrum of GG does indeed resemble the hypercube in the range close to 11. In particular each χ{i}(x)=xi\chi_{\{i\}}(x)=x_{i} is a distinct eigenvector with eigenvalue 12ε1-2\varepsilon, and gives a bad cut in GG (where vertices are partitioned based on the value of xix_{i}). On the other hand for any eigenvector χ\chi of GG, choose α\alpha of minimal weight such that χ=χα\chi=\chi_{\alpha}. Now if α>D/10|\alpha|>D/10 this means that the distance of α\alpha from C\mathcal{C} is at least D/10D/10, which using the testing property implies that λα120.49=0.02\lambda_{\alpha}\leqslant 1-2\cdot 0.49=0.02.

If we can show that indicator functions of small sets have most of their Fourier mass on such eigenvectors (with small eigenvalue), that will imply that small sets have good expansion. For small subsets of the hypercube, recall that this is proved using (2,4)-hpercontractivity for low-degree polynomials. The key observation is that the inequality

still holds for all polynomials ff of degree d<D/4d<D/4. This is because the distance of C\mathcal{C} is DD, hence the distribution of a random xx in D\mathcal{D} is DD-wise independent, which means that the expectation of any polynomial of degree at most DD is equal over such xx and over a uniform xx in {±1}N\{\pm 1\}^{N}. Thus (2.2) follows from (2.1), completing our proof.

We instantiate this approach with using for C\mathcal{C} the Reed Muller code consisting of polynomials in nn variables over \varmathbbF2\varmathbb F_{2} of degree nd1n-d-1. This is a code of distance D=2d1D=2^{d-1}. We note that the degree nd1n-d-1 and hence the rate of the code C\mathcal{C} are very high. The graph is over the codewords of D=C\mathcal{D}=\mathcal{C}^{\perp} that is itself the Reed Muller code of polynomials over \varmathbbF2n\varmathbb F_{2}^{n} of degree dd. Our basic tester consists of selecting a random minimum weight codeword of D\mathcal{D}.For many applications we amplify the success of this tester by selecting a sum of tt random such words, this corresponds to taking some power of the basic graph G\mathcal{G} described. Thus our graph G\mathcal{G} has as its vertices the dd degree polynomials over \varmathbbF2n\varmathbb F_{2}^{n} with an edge between every polynomials p,qp,q such that pqp-q is a product of dd linearly-independent affine functions (as those are the minimal weight codewords in the Reed Muller code). We use the optimal analysis of Bhattacharyya, Kopparty, Schoenebeck, Sudan and Zuckerman [BKS+10] to argue about the local testability of C\mathcal{C} which is a high degree Reed Muller code. We should note that this test is very closely related to the Gowers uniformity test that was first analyzed in the work of Kaufman et al. [AKK+05], but our application requires the stronger result from [BKS+10].

We now briefly outline how we use the above tools to obtain more efficient versions of several other constructions such as alphabet reduction gadgets and integrality gaps for unique games and other problems.

To beign with, the graph we construct can be used to prove Theorem 2. That is, a construction of an MM variable instance Γ\Gamma of unique games where every assignment can satisfy at most a very small (say 1/1001/100) fraction of the constraints, but for which the standard semidefinite programming (SDP) relaxation has value of at least 11/qpoly(logM)1-1/\operatorname{qpoly}(\log M). The basic idea is to simply take the graph G\mathcal{G} we constructed above, and turn it into an instance of unique games by considering it to be the label extended graph of some unique games instance. We now elaborate a bit below, leaving the full details to Section 7. Recall that a Unique Games instance Γ\Gamma with MM variables and alphabet Σ\Sigma is described by a collection of constraints of the form (x,y,π)(x,y,\pi) where π\pi is a permutation over Σ\Sigma. An assignment to Γ\Gamma is a mapping ff from [M][M] to Σ\Sigma, and ff’s value is the fraction of constraints (x,y,π)(x,y,\pi) such that f(y)=π(f(x))f(y)=\pi(f(x)). The label extended graph corresponding to Γ\Gamma is the graph GΓG_{\Gamma} over vertices [M]×Σ[M]\times\Sigma where for every constraint of the form (x,y,π)(x,y,\pi) and σΣ\sigma\in\Sigma we add an edge between (x,σ)(x,\sigma) and (y,π(σ))(y,\pi(\sigma)). It is not hard to see that an assignment of value 1ε1-\varepsilon corresponds to a subset SS containing exactly MM of GΓG_{\Gamma}’s vertices with small expansion (i.e., ε\varepsilon fraction of the edges from SS leave the set). Thus if GΓG_{\Gamma} is an expander for sets of measure 1/Σ1/|\Sigma| in GΓG_{\Gamma} then there is no nearly satisfying assignment for the unique games instance Γ\Gamma. In our case, our graph G\mathcal{G} has the degree dd polynomials over \varmathbbF2n\varmathbb F_{2}^{n} as its vertices, and we transform it into a unique game instance whose variables correspond to degree dd polynomials without linear terms. The alphabet Σ\Sigma consists of all linear functions over \varmathbbF2n\varmathbb F_{2}^{n}. We ensure that the graph G\mathcal{G} is the label extended graph of Γ\Gamma by setting the permutations accordingly: given a polynomial pp without a linear term, and a function qq that is a product of dd affine functions,Actually, to get better parameters, we take some power tt of G\mathcal{G}, meaning that we consider qq that is a sum of tt functions that are products of dd affine functions. if we write q=q+qq=q^{\prime}+q^{\prime\prime} where qq^{\prime\prime} is the linear part of qq, then we add a constraint of the form (p,p+q,π)(p,p+q^{\prime},\pi) where π\pi is the permutation that maps a linear function rr into r+qr+q^{\prime\prime}. Some not too difficult calculations show that the top eigenvectors of our graph G\mathcal{G} yield a solution for the semidefinite program for Γ\Gamma (if the top eigenvectors are f1,,fKf^{1},\ldots,f^{K}, our vector solution will associate with each vertex xx the vector (f1(x),,fK(x)(f^{1}(x),\ldots,f^{K}(x)). By choosing carefully the parameters of the graph G\mathcal{G}, the instance Γ\Gamma will have SDP value 11/qpoly(logM)1-1/\operatorname{qpoly}(\log M) where MM is the number of variables.

Derandomized Invariance Principle

While hypercontractivity of low degree polynomials suffices for some applications of the long code, other applications require other theorems, and in particular the invariance principle, shown for the hypercube by Mossel, O’Donnel and Oleszkiewicz [MOO05].Roughly speaking their invariance principle says that for “nice” functions ff on the vertices of the NN-dimensional noisy hypercube, the distribution of f(x)f(x) where xx is a random vertex is close to the distribution of f(y)f(y) where yy consists of NN independent standard Gaussian random variables (appropriately extending ff to act on \varmathbbRN\varmathbb R^{N}). To obtain more efficient version of these applications, we first show that the same holds even when xx is a random vertex in our smaller subset of NN-dimensional strings – the Reed–Muller codewords. Our central tool is a recent result by Meka and Zuckerman [MZ10] which derandomizes the invariance principle of Mossel et al. Our key insight is that taking a random Reed–Muller codeword can in fact be viewed as an instantiation of the Meka-Zuckerman generator, which involves splitting the input into blocks via a pairwise independent hash function, and using independent kk-wise independent distributions in each block. This allows us to obtain a version of the “Majority is Stablest” theorem for our graph, which is the main corollary of the invariance principle that is used in applications of the longcode. See Section 5 for more details.

Efficient alphabet reduction

With the “Majority of Stablest” theorem in hand, proving Theorem 4 (efficient alphabet reduction for unique games), is fairly straightforward. The idea is to simply replace the noisy hypercube gadget used by [KKMO04] with our derandomized hypercube. This is essentially immediate in the case of alphabet reduction to binary alphabet (i.e., reduction to Max Cut) but requires a bit more work when reducing to a larger alphabet. See Section 6 for more details.

Efficient hierarchy integrality gaps

Our proof Theorem 3 again works by plugging in our short code / derandomized noisy hypercube in place of the long code in the previous integrality gap constructions [KV05, KS09, RS09]. Specifically, these constructions worked by starting with an integrality gap for unique games where the basic SDP yields 11/r1-1/r, and then composing it with an alphabet reduction gadget to obtain a new instance; Raghavendra and Steurer [RS09] showed that the composed instances resist poly(r)\operatorname{poly}(r) rounds of the SA\mathsf{SA} hierarchy and exp(poly(r))\exp(\operatorname{poly}(r)) rounds of the LH\mathsf{LH} hierarchy. These constructions used the noisy cube twice— both to obtain the basic unique games gap instance, and to obtain the alphabet reduction gadget. We simply plug in our short code in both usages— using for the basic unique games instance the efficient version obtained in Theorem 2, and for the alphabet reduction gadget the efficient version obtained in Theorem 4. (Luckily, our unique games instance has affine constraints and so is compatible with our alphabet reduction gadget.) The result essentially follows in a blackbox way from the analysis of [RS09]. See Section 8 for details.

Preliminaries

Let GG be a regular graph with vertex set VV. For a subset SVS\subseteq V we define the volume of SS, denoted μ(S)\mu(S), to be S/V|S|/|V|. We define the expansion of SS, denoted Φ(S)\operatorname{\Phi}(S), to be the probability over a random edge (u,v)(u,v), conditioned on uSu\in S that v∉Sv\not\in S. Equivalently (since GG is regular), Φ(S)=G(S,VS)/(degGS)\operatorname{\Phi}(S)=G(S,V\setminus S)/(\deg_{G}\lvert S\rvert) where degG\deg_{G} is the degree of the graph GG and G(S,VS)G(S,V\setminus S) is the number of edges going from SS to VSV\setminus S. Throughout, we denote the normalized adjacency matrix of a graph GG also by GG, and refer to the spectrum of the adjacency matrix as the spectrum of the graph GG. Note that by definition, every regular graph has maximum eigenvalue 11. In this paper, we use expectation norms for real-valued functions. That is, for a function f ⁣:S\varmathbbRf\colon S\to\varmathbb R and p1p\geqslant 1, we let fp:=(\varmathbbExSf(x)p)1/p\lVert f\rVert_{p}\mathrel{\mathop{:}}=(\operatorname*{\varmathbb{E}}_{x\in S}|f(x)|^{p})^{1/p}.

Many of the unique games instances that appear in this work belong to a special subclass of unique games, namely \varmathbbF2n\textscMax2Lin\varmathbb F_{2}^{n}\textsc{-Max-2Lin} instances defined below.

Given a group H\mathcal{H}, an H\textscMax2Lin\mathcal{H}\textsc{-Max-2Lin} instance consists of a system of linear equations over the group H\mathcal{H} where each equation is of the form xixj=cijx_{i}-x_{j}=c_{ij} for some cijHc_{ij}\in\mathcal{H}.

Let C\mathcal{C} be an [N,K,D]2[N,K,D]_{2} code, that is, C\mathcal{C} is a KK-dimensional linear subspace of \varmathbbF2N\varmathbb F_{2}^{N} with minimum distance DD (=min{wt(x):xC}=\min\{{\sf wt}(x):x\in\mathcal{C}\}). (In this paper, we are mostly interested in the extremely high rate regime when H=NKH=N-K is very small compared to NN and are happy with DD being some large constant.) Let Δ(x,y){0,,N}\Delta(x,y)\in\{0,\ldots,N\} denote Hamming distance between x,y\varmathbbF2Nx,y\in\varmathbb F_{2}^{N}. For α\varmathbbF2N\alpha\in\varmathbb F_{2}^{N} and a code C\mathcal{C} we define

We say a distribution T\mathcal{T} over \varmathbbF2N\varmathbb F_{2}^{N} is a canonical tester for C\mathcal{C} if every vector in the support of the distribution T\mathcal{T} is a codeword qCq\in C^{\bot}. The query complexity of T\mathcal{T} is the maximum weight of a vector in its support. The tester’s soundness curve sT ⁣:\varmathbbNs_{\mathcal{T}}\colon\varmathbb N\to is defined as

Similarly, we denote the rejection probability of T\mathcal{T} for a vector α\varmathbbF2N\alpha\in\varmathbb F_{2}^{N} by sT(α)=\varmathbbPqT{α,q=1}s_{\mathcal{T}}(\alpha)=\operatorname*{\varmathbb{P}}_{{q\sim\mathcal{T}}}\left\{\langle\alpha,q\rangle=1\right\}. We let the query probability τ\tau\in of a tester be the expected fraction of queried coordinates, that is, τ=\varmathbbEqTwt(q)/N\tau=\operatorname*{\varmathbb{E}}_{q\sim\mathcal{T}}{\sf wt}(q)/N. We say that a tester T\mathcal{T} with query probability τ\tau is smooth if for any coordinate i[N]i\in[N], \varmathbbPqT{qi=1}=τ\operatorname*{\varmathbb{P}}_{{q\sim\mathcal{T}}}\left\{q_{i}=1\right\}=\tau and we say it is 22-smooth if in addition, for any two distinct coordinates iji\neq j, \varmathbbPqT{qi=qj=1}=τ2\operatorname*{\varmathbb{P}}_{{q\sim\mathcal{T}}}\left\{q_{i}=q_{j}=1\right\}=\tau^{2}.

Finally, the following simple lemma gives some estimates for rejection probabilities of vectors for smooth testers.

If T\mathcal{T} is a smooth canonical tester with query probability τ\tau, then sT(α)Δ(α,C)τs_{\mathcal{T}}(\alpha)\leqslant\Delta(\alpha,\mathcal{C})\cdot\tau for every vector α\varmathbbF2N\alpha\in\varmathbb F_{2}^{N}. Furthermore, if T\mathcal{T} is 22-smooth, then sT(α)(1γ)Δ(α,C)τs_{\mathcal{T}}(\alpha)\geqslant(1-\gamma)\cdot\Delta(\alpha,\mathcal{C})\cdot\tau for every vector α\varmathbbF2N\alpha\in\varmathbb F_{2}^{N} with Δ(α,C)τγ\Delta(\alpha,\mathcal{C})\tau\leqslant\gamma.

Fix α\varmathbbF2N\alpha\in\varmathbb F_{2}^{N} and let k=Δ(α,C)k=\Delta(\alpha,\mathcal{C}). Without loss of generality, we may assume wt(α)=k{\sf wt}(\alpha)=k. By renaming coordinates, we may assume α1==αk=1\alpha_{1}=\ldots=\alpha_{k}=1 and αk+1==αN=0\alpha_{k+1}=\ldots=\alpha_{N}=0. Then, sT(α)\varmathbbPqT{q1=1}++\varmathbbPqT{qk=1}=kτs_{\mathcal{T}}(\alpha)\leqslant\operatorname*{\varmathbb{P}}_{{q\sim\mathcal{T}}}\left\{q_{1}=1\right\}+\ldots+\operatorname*{\varmathbb{P}}_{{q\sim\mathcal{T}}}\left\{q_{k}=1\right\}=k\cdot\tau. On the other hand,

We review the prerequisites for Majority is Stablest and Unique Games related results in the corresponding sections.

Small Set Expanders from Locally Testable Codes

In this section we first use some known properties of hypercontractive norms to give a sufficient condition for graphs to be small set expanders. We then describe a generic way to construct graphs satisfying this condition from locally testable codes, proving Theorem 1.

Let V\mathcal{V} be a subspace of the set of functions from VV to \varmathbbR\varmathbb R for some finite set VV. We denote by PVP_{\mathcal{V}} the projection operator to the space V\mathcal{V}. For p,q1p,q\geqslant 1, we define

We now relate this notion to small set expansion. We first show that a subspace V\mathcal{V} with bounded (4/3)2(4/3)\to 2 norm cannot contain the characteristic function of a small set:

Let f:V{0,1}f:V\to\{0,1\} such that μ=\varmathbbExV[f(x)]\mu=\operatorname*{\varmathbb{E}}_{x\in V}[f(x)] then PVf22V4/322μ3/2\lVert P_{\mathcal{V}}f\rVert_{2}^{2}\leqslant\lVert\mathcal{V}\rVert_{4/3\to 2}^{2}\mu^{3/2}.

Note that if V4/32=O(1)\lVert\mathcal{V}\rVert_{4/3\to 2}=O(1) and μ=o(1)\mu=o(1), then PVf22=o(f22)\lVert P_{\mathcal{V}}f\rVert_{2}^{2}=o(\lVert f\rVert_{2}^{2}), meaning the projection of ff onto VV is small. It is often easier to work with the 242\to 4 norm instead of the 4/324/3\to 2 norm. The following lemma allows us to use a bound on the former to bound the latter:

Let f:V\varmathbbRf:V\to\varmathbb R and let f=PVff^{\prime}=P_{V}f. We know that

Dividing by f2=\varmathbbE[f2]1/2\lVert f\rVert_{2}=\operatorname*{\varmathbb{E}}[f^{2}]^{1/2} yields the result. ∎

We now conclude that graphs for which the top eigenspace has bounded 242\to 4 norm are small set expanders. The lemma can be viewed qualitatively as a generalization of one direction of the classical Cheeger’s inequality relating combinatorial expansion to eigenvalue gap [Che70].

Let G=(V,E)G=(V,E) be regular graph, and V\mathcal{V} be the span of the eigenvectors of GG with eigenvalue larger than λ\lambda. Then, for every SVS\subseteq V,

Let ff be the characteristic function of SS, and write f=f+ff=f^{\prime}+f^{\prime\prime} where f=PVff^{\prime}=P_{\mathcal{V}}f (and so f=fff^{\prime\prime}=f-f^{\prime} is the projection to the eigenvectors with value at most λ\lambda). Let μ=μ(S)\mu=\mu(S). We know that

Plugging this into (4.1) yields the result. ∎

2 Cayley graphs on codes

Motivated by the previous section, we now construct a graph for which the projection operator on to the top eigenspace is hypercontractive, i.e., has small 242\to 4 norm, while also having high rank.

By Xoring the results of mulitple tests, one can let the soundness s(k)s(k) tend to \nicefrac12\nicefrac{{1}}{{2}}. Hence, if s(k)s(k) is significantly larger than ε\varepsilon (for appropriate kk), one can obtain a graph with many large eigenvalues such that small enough sets have near-perfect expansion.

Note that if deg(χα)<D/2\deg(\chi_{\alpha})<D/2, then the minimum weight representative in α+C\alpha+\mathcal{C} is unique. (This uniqueness will allow us later to define low-degree influences of functions, see Section 5.)

We let λα\lambda_{\alpha} denote the eigenvalue corresponding to character χα\chi_{\alpha}. The following observation connects the soundness of the canonical tester to the spectrum of GG:

For any α\varmathbbF2N\alpha\in\varmathbb F_{2}^{N}, λα=12s(α)\lambda_{\alpha}=1-2s(\alpha).

From standard facts about Cayley graphs, it follows that

We use this to show that many dictator cuts in GG which correspond to characters with degree 11 have eigenvalues close to 11. We let λi,χi\lambda_{i},\chi_{i} denote λ{i},χ{i}\lambda_{\{i\}},\chi_{\{i\}}. As noted before, for D>2D>2 these are distinct characters.

We have λi14ε\lambda_{i}\geqslant 1-4\varepsilon for at least N/2N/2 coordinates [i]N[i]\in N.

We have λi=12\varmathbbPqT[qi=1]\lambda_{i}=1-2\operatorname*{\varmathbb{P}}_{q\in\mathcal{T}}[q_{i}=1]. Since wt(q)εN{\sf wt}(q)\leqslant\varepsilon N for every qTq\in\mathcal{T},

So we can have \varmathbbPqT[qi=1]2ε\operatorname*{\varmathbb{P}}_{q\in\mathcal{T}}[q_{i}=1]\geqslant 2\varepsilon for at most N/2N/2 coordinates. ∎

Another immediate conseuqence of Lemma 4.5 is that large degree characters have small eigenvalues.

If deg(χα)k\deg(\chi_{\alpha})\geqslant k, then λα12s(k)\lambda_{\alpha}\leqslant 1-2s(k).

Subspace Hypercontractivity

Given a function f ⁣:C\varmathbbRf\colon\mathcal{C}^{\perp}\to\varmathbb R we can write it (uniquely) as a linear combination of the characters {χα}α\varmathbbF2N/C\{\chi_{\alpha}\}_{\alpha\in\varmathbb F_{2}^{N}/\mathcal{C}}

where f^(α)=χα,f\hat{f}(\alpha)=\langle\chi_{\alpha},f\rangle is the Fourier transform of ff (over the abelian group C\mathcal{C}^{\bot}).

We define the degree of ff, denoted deg(f)\deg(f) to be maxα:f^(α)0deg(χα).\max_{\alpha:\hat{f}(\alpha)\neq 0}\deg(\chi_{\alpha}). Note that deg(f+g)max{deg(f),deg(g)}\deg(f+g)\leqslant\max\{\deg(f),\deg(g)\} and deg(fg)deg(f)+deg(g)\deg(fg)\leqslant\deg(f)+\deg(g). The following crucial observation follows immediately from the fact that C\mathcal{C} has minimum distance DD.

The uniform distribution on C\mathcal{C}^{\perp} is (D1)(D-1) wise independent. That is, for any α\varmathbbF2N\alpha\in\varmathbb F_{2}^{N} such that 1wt(α)<D1\leqslant{\sf wt}(\alpha)<D we have \varmathbbEpC[χα(p)]=0\operatorname*{\varmathbb{E}}_{p\in\mathcal{C}^{\perp}}[\chi_{\alpha}(p)]=0.

The proof follows from the following two facts:

This bound on the 242\to 4 norm is known to hold for true low degree polynomials under the uniform distribution on the hypercube by the Bonami-Beckner-Gross inequality [O’D08].

Combining the above bound with Lemma 4.3 we get that, if the local tester rejects sufficiently far codewords with high probability, then the resulting graph is a small set expander:

In particular, as s(k)s(k) tends to 1/21/2, the expansion of small sets tends to 11. This corollary together with Corollary 4.6 completes the proof of Theorem 4.4.

3 A Canonical Tester for Reed Muller codes

There exists a constant η0>0\eta_{0}>0 such that for all n,dn,d, and k<η02dk<\eta_{0}2^{d} the tester TRM\mathcal{T}_{\text{\sf RM}} described above has soundness s(k)(k/2)2ds(k)\geqslant(k/2)\cdot 2^{-d}.

For a graph GG the continuous-time random walk on GG with parameter tt is described by the (stochastic) matrix G(t)=et(IG)G(t)=e^{-t(I-G)}. G(t)G(t) and GG have the same eigenvectors and the eigenvalues of G(t)G(t) are {et(1μi)}\{e^{-t(1-\mu_{i})}\}, where {μi}\{\mu_{i}\} is the spectrum of GG.

If deg(χα)=k\deg(\chi_{\alpha})=k, λαmax(ρk/2,ρμ02d)\lambda_{\alpha}\leqslant\max(\rho^{k/2},\rho^{\mu_{0}2^{d}}) where μ0\mu_{0} is an absolute constant.

For all δ<δ0\delta<\delta_{0} for some constant δ0\delta_{0}, if deg(χα)=k<δ22d+1\deg(\chi_{\alpha})=k<\delta^{2}2^{d+1}, λαρkδ\lvert\lambda_{\alpha}-\rho^{k}\rvert\leqslant\delta.

For k2dk\leqslant 2^{d}, 1μα=kτ±k2τ2kτ/21-\mu_{\alpha}=k\tau\pm k^{2}\tau^{2}\geqslant k\tau/2. Therefore, if deg(χα)2d\deg(\chi_{\alpha})\leqslant 2^{d}, λα=eε2d+1(1μα)eε2d+1kτ/2=ρk/2\lambda_{\alpha}=e^{-\varepsilon 2^{d+1}(1-\mu_{\alpha})}\leqslant e^{-\varepsilon 2^{d+1}k\tau/2}=\rho^{k/2}. For k>2dk>2^{d}, by Corollary 4.7, μα<12s(k)<C0\mu_{\alpha}<1-2s(k)<C_{0} for a universal constant C0<1C_{0}<1. Therefore, λα<eε2d+1(1C0)=ρ(1C0)2d+1<ρμ02d|\lambda_{\alpha}|<e^{-\varepsilon 2^{d+1}(1-C_{0})}=\rho^{(1-C_{0})2^{d+1}}<\rho^{\mu_{0}2^{d}} for μ0<(1C0)/2\mu_{0}<(1-C_{0})/2.

We now prove the second bound. If εk2τ<δ/10\varepsilon k^{2}\tau<\delta/10, we have λα=ρk(1±δ)\lambda_{\alpha}=\rho^{-k}(1\pm\delta) which implies λαρkδ\lvert\lambda_{\alpha}-\rho^{k}\rvert\leqslant\delta. Otherwise, if εk2τδ/10\varepsilon k^{2}\tau\geqslant\delta/10, our assumption k<δ22d+1k<\delta^{2}2^{d+1} implies εk>1/(10δ)\varepsilon k>1/(10\delta), hence εεke110δδ/4\varepsilon^{-\varepsilon k}\leqslant e^{-\frac{1}{10\delta}}\leqslant\delta/4 for all δ<δ0\delta<\delta_{0}. For k2dk\leqslant 2^{d}, (1μα)=kτ±k2τ2kτ/2(1-\mu_{\alpha})=k\tau\pm k^{2}\tau^{2}\geqslant k\tau/2. Hence λαetkτ/2e120δδ/4\lambda_{\alpha}\leqslant e^{-tk\tau/2}\leqslant e^{-\frac{1}{20\delta}}\leqslant\delta/4 for all δ<δ0\delta<\delta_{0}. In this case, we get λαρkλα+ρkδ/2\lvert\lambda_{\alpha}-\rho^{k}\rvert\leqslant|\lambda_{\alpha}|+|\rho^{k}|\leqslant\delta/2.

For any ε,η>0\varepsilon,\eta>0, there exists a graph GG with 2(logG)1d2^{(\log|G|)^{\frac{1}{d}}} eigenvalues larger than 1ε1-\varepsilon for d=log(1/ε)+loglog(1/η)+O(1)d=\log(1/\varepsilon)+\log\log(1/\eta)+O(1) and where every set SGS\subseteq G has expansion

Then, for every α\varmathbbF2N/C\alpha\in\varmathbb F_{2}^{N}/\mathcal{C}, deg(χα)=1\deg(\chi_{\alpha})=1, we have sτ(α)2ds_{\tau}(\alpha)\geqslant 2^{-d}. Hence μα12d+1\mu_{\alpha}\geqslant 1-2^{-d+1}, λαet2d+1=e4ε\lambda_{\alpha}\geqslant e^{-t2^{-d+1}}=e^{-4\varepsilon}. Therefore, there are at least N=2(logG)1/dN=2^{(\log|G|)^{1/d}} eigenvalues which are larger than 14ε1-4\varepsilon.

The fact that our graph is a Cayley graph over \varmathbbF2n\varmathbb F_{2}^{n} has a potentially interesting implication for coding theory. By looking at the set of edge labels as the rows of a generating matrix for a code, we know that large Fourier coefficients corresponds to low weight codewords, and hence we get a code of dimension m=(nd)m=\binom{n}{d} that has an almost exponential (i.e. 2n2^{n}) number of codewords of low weight, but yet has small generalized Hamming distance in the sense that every subspace of codimension ω(1)\omega(1) contains a codeword of fractional Hamming weight 1o(1)1-o(1). In particular by setting dd to be a function slowly tending to infinity we can get a linear code for which correcting from an o(1)o(1) fraction of corruption errors requires an almost exponential list size, but for which one can correct a fraction approaching 11 of erasure errors using a list of constant size. (The code obtained by taking all edges of our graph has an almost exponential blowup, but this can be reduced by subsampling the edges.)

Majority is Stablest over Codes

In this section we show an analogue of the “Majority is Stablest” result of Mossel et al. for the RM graph we constructed in the previous section; this will help us replace the noisy cube with the RM graph in various unique games gadgets.

For ρ>0\rho>0, let Γρ:\Gamma_{\rho}:\to be the Gaussian noise stability curve defined as follows. For μ\mu\in, let t\varmathbbRt\in\varmathbb R be such that \varmathbbPgN(0,1[g<t]=μ\operatorname*{\varmathbb{P}}_{g\leftarrow\mathcal{N}(0,1}[g<t]=\mu. Then, Γρ(μ)=\varmathbbPX,Y[Xt,Yt]\Gamma_{\rho}(\mu)=\operatorname*{\varmathbb{P}}_{X,Y}[X\leqslant t,Y\leqslant t], where (X,Y)\varmathbbR2(X,Y)\in\varmathbb R^{2} is a two-dimensional mean zero Gaussian random vector with covariance matrix \left(\begin{array}[]{cc}1&\rho\\ \rho&1\end{array}\right). We refer the reader to Appendix B in Mossel et al. [MOO05] for a more detailed discussion on Γρ\Gamma_{\rho}.

Let P(x)=I[N]aIiIxiP(x)=\sum_{I\subseteq[N]}a_{I}\prod_{i\in I}x_{i} be a NN-variate multilinear polynomial P:\varmathbbRN\varmathbbRP:\varmathbb R^{N}\to\varmathbb R. Define P2=IaI2\|P\|^{2}=\sum_{I}a_{I}^{2} and we say PP is ε\varepsilon-regular if for every i[N]i\in[N], iIaI2ε2P2\sum_{i\ni I}a_{I}^{2}\leqslant\varepsilon^{2}\cdot\|P\|^{2}.

Throught this section, we let Tρ,NT_{\rho,N} (we omit NN when the dimension is clear) denote the noisy hypercube graph with second largest eigenvalue ρ\rho.

The following theorem shows that, in the context of noise stability, a regular function on the hypercube behaves like a function on Gaussian space.

Let f ⁣:{±1}Nf\colon\{\pm 1\}^{N}\to be a function with \varmathbbEf=μ\operatorname*{\varmathbb{E}}f=\mu. Suppose Infi10log(1/τ)(f)τ\operatorname{Inf}_{i}^{\leqslant 10\log(1/\tau)}(f)\leqslant\tau for all i[N]i\in[N]. Then,

where TρT_{\rho} is the Boolean noise graph with second largest eigenvalue ρ\rho and Γρ\Gamma_{\rho} is the Gaussian noise stability curve.

We will need the following ingredient of the proof of Theorem 5.1 from [MOO05]. For a,b\varmathbbRa,b\in\varmathbb R, let ζ[a,b] ⁣:\varmathbbR\varmathbbR+\zeta_{[a,b]}\colon\varmathbb R\to\varmathbb R_{+} be the functional ζ[a,b](x)=max{ax,xb,0}2\zeta_{[a,b]}(x)=\max\{a-x,x-b,0\}^{2}. For a real-valued random variable XX, the expectation \varmathbbEζ(X)\operatorname*{\varmathbb{E}}\zeta(X) is the L22L_{2}^{2}-distance of XX to the set of [a,b][a,b]-valued random variables (over the same probability space as XX). We will be interested in the case a=0a=0 and b=1b=1. For this case, we abbreviate ζ=ζ\zeta=\zeta_{}.

We will need the following corollary of that can handle functions that are not valuedasinthetheorembutjustcloseto-valued as in the theorem but just close to-valued functions.

Let f ⁣:{±1}N\varmathbbRf\colon\{\pm 1\}^{N}\to\varmathbb R be a function with \varmathbbEf=μ\operatorname*{\varmathbb{E}}f=\mu and \varmathbbEζfτ\operatorname*{\varmathbb{E}}\zeta\circ f\leqslant\tau. Suppose Infif30log(1/τ)τ\operatorname{Inf}_{i}f^{\leqslant 30\log(1/\tau)}\leqslant\tau for all i[N]i\in[N]. Then,

where TρT_{\rho} is the Boolean noise graph with second largest eigenvalue ρ\rho and Γρ\Gamma_{\rho} is the Gaussian noise stability curve. (Here, we assume that τ\tau is small enough.)

Let ff^{\prime} be the closest $valuedfunctionto-valued function tof.Since. Since\lVert f-f^{\prime}\rVert\leqslant\sqrt{\tau},itfollowsthat, it follows that\operatorname{Inf}^{\leqslant 20\log(1/\tau)}_{i}f^{\prime}\leqslant\tau+O(\sqrt{\tau})\ll\tau^{1/3}andand\operatorname*{\varmathbb{E}}f^{\prime}\leqslant\operatorname*{\varmathbb{E}}f+\sqrt{\tau}.Since. Since\langle f,T_{\rho}f\rangle\leqslant\langle f^{\prime},T_{\rho}f^{\prime}\rangle+O(\sqrt{\tau}),thecorollaryfollowsbyapplyingTheorem5.1tothefunction, the corollary follows by applying Theorem 5.1 to the functionf^{\prime}.(Here,wealsousethatfactthat. (Here, we also use that fact that\Gamma_{\rho}(\mu+\sqrt{\tau})\leqslant\Gamma_{\rho}(\mu)+2\sqrt{\tau}$. See Lemma B.3 in [MOO05].) ∎

We remark that although we specialize to Reed–Muller codes in this section, most of the arguments generalize appropriately to arbitrary codes with good canonical testers modulo a conjecture about bounded independence distributions fooling low-degree polynomial threshold functions. We briefly discuss this in Section 5.3.

To state our version of “Majority is Stablest” we first extend the notion of influences to functions over Reed–Muller codes. For n,d\varmathbbNn,d\in\varmathbb N, N=2nN=2^{n}, let C\varmathbbF2N\mathcal{C}\subseteq\varmathbb F_{2}^{N} be the Reed–Muller code RM(n,nd1)\text{\sf RM}(n,n-d-1) and let C\varmathbbF2N\mathcal{C}^{\bot}\subseteq\varmathbb F_{2}^{N} be its dual RM(n,d)\text{\sf RM}(n,d). For the rest of this section we assume that a set of representatives corresponding to the minimum weight codeword in each coset is chosen for the coset space \varmathbbF2N/C\varmathbb F_{2}^{N}/\mathcal{C}.

where \varmathbbV[f]=\varmathbbE[f2](\varmathbbE[f])2\operatorname*{\varmathbb{V}}[f]=\operatorname*{\varmathbb{E}}[f^{2}]-(\operatorname*{\varmathbb{E}}[f])^{2} denotes the variance of ff.

We are now ready to state the main result of this section generalizing the Majority is Stablest result to Reed-Muller codes. Let TRM\mathcal{T}_{\text{\sf RM}} be the canonical tester for C\mathcal{C} as defined in Section 4.3.

where ρ=eε\rho=e^{-\varepsilon} and Γρ ⁣:\varmathbbR\varmathbbR\Gamma_{\rho}\colon\varmathbb R\to\varmathbb R is the noise stability curve of Gaussian space.

The proof of the theorem proceeds in three steps. We first show that the eigenvalue profile of the graph GG is close to the eigenvalue profile of the Boolean noise graph (see Lemma 4.13). We then show an invariance principle for low-degree polynomials (and as a corollary for smoothed functions) showing that they have similar behaviour under the uniform distribution over the hypercube and the uniform distribution over the appropriate Reed–Muller code. Finally, we use the invariance principle to translate the majority is stablest result in the hypercube setting to the Reed–Muller code. The above approach is similar to that of Mossel et al., who translate a majority is stablest result in the Gaussian space to the hypercube using a similar invariance principle.

We first state the invariance principle that we use below (see the next subsection for the proof). Recall the definition of the functional ζ ⁣:\varmathbbR\varmathbbR\zeta\colon\varmathbb R\to\varmathbb R from Section 5.1.

The (somewhat technical) proof of Theorem 5.6 from the above invariance principle closely follows the argument of Mossel et al. and is defered to the appendix – see Section A.

2 Invariance Principles over Reed–Muller Codes

The various invariance principles of Mossel et al. [MOO05] are essentially equivalent (upto some polynomial loss in error estimates) to saying that for any low-degree regular polynomial PP, the polynomial threshold function (PTF) sign(P(  ))\operatorname{sign}(P(\;)) cannot distinguish between the uniform distribution over the hybercube and the standard multivariate Gaussian distribution N(0,1)N\mathcal{N}(0,1)^{N}.

Ideally, we would like a similar invariance principle to hold even when xx is chosen uniformly from the codes of the earlier sections instead of being uniform over the hypercube. Such an invariance principle will allow us to analyze alphabet reductions and integrality gaps based on graphs considered in earlier sections (e.g., the RM graph). We obtain such generic invariance principles applicable to all codes modulo certain plausible conjectures on low-degree polynomials being fooled by bounded independence.

For the explicit example of Reed–Muller code we bypass the conjectures and directly show an invariance principle by proving that the uniform distribution over the Reed–Muller code fools low-degree PTFs. To do so, we will use the specific structure of the Reed–Muller code along with the pseudorandom generator (PRG) for PTFs of Meka and Zuckerman [MZ10]. Specifically, we show that the uniform distribution over RM can be seen as an instantiation of the PRG of [MZ10] and then use the latter’s analysis as a blackbox. Call a smooth function ψ:\varmathbbR\varmathbbR\psi:\varmathbb R\rightarrow\varmathbb R BB-nice if ψ(4)(t)B|\psi^{(4)}(t)|\leqslant B for every t\varmathbbRt\in\varmathbb R.

Choose a random hHh\in\mathcal{H} and partition [N][N] into tt blocks B1,,BtB_{1},\ldots,B_{t}, with Bj={i:h(i)=j}B_{j}=\{i:h(i)=j\}.

Choose independent samples x1,,xtDx_{1},\ldots,x_{t}\leftarrow\mathcal{D} and let y{±1}Ny\in\{\pm 1\}^{N} be chosen according to an arbitrary distribution independent of x1,,xtx_{1},\ldots,x_{t}.

OutputThe description we give here is slightly different from that of [MZ10] due to the presence of the string yy. However, the analysis of [MZ10] works without any changes for this case as well.

Meka and Zuckerman show that GH,DG_{\mathcal{H},\mathcal{D}} as above fool (arbitrary) low-degree polynomials. Below we state their result for regular PTFs which suffices for our purposes and gives better quantitative bounds.

For simplicity, in the following discussion we view RM(n,d)\text{\sf RM}(n,d) as generating a vector in \varmathbbF2N\varmathbb F_{2}^{N} and show that the uniform distribution over RM(n,d)\text{\sf RM}(n,d) has the appropriate independence structure as required by Theorem 5.10, albeit with {±1}\{\pm 1\} replaced with {0,1}\{0,1\}. This does not the effect the analysis of the generator.

Let c=2log(1/ε)c=2\log(1/\varepsilon) and let S\mathcal{S} be the subspace of polynomials of the form

where the polynomials PaP_{a} each have degree at most dcd-c. Note that we can sample a uniformly random element Q1SQ_{1}\in\mathcal{S} by choosing independent, uniformly random degree at most dcd-c polynomials Pa:\varmathbbF2nc\varmathbbF2P_{a}:\varmathbb F_{2}^{n-c}\rightarrow\varmathbb F_{2} for a{0,1}ca\in\{0,1\}^{c} and setting Q1Q_{1} as above. This is because, each collection (Pa)a{0,1}c(P_{a})_{a\in\{0,1\}^{c}} leads to a unique element of S\mathcal{S} and together they cover all elements of S\mathcal{S}.

Let S\mathcal{S}^{\prime} be a subspace of degree dd, nn-variate polynomials such that SS={0}\mathcal{S}\cap\mathcal{S}^{\prime}=\{0\} and S,S\mathcal{S},\mathcal{S}^{\prime} together span all degree dd polynomials. Let A:\varmathbbF2n\varmathbbF2n\mathcal{A}:\varmathbb F_{2}^{n}\rightarrow\varmathbb F_{2}^{n} be the space of all affine transformations. For AAA\in\mathcal{A}, let hA:[N][t]h_{A}:[N]\rightarrow[t] be defined by hA(x)=A(x)[c]h_{A}(x)=A(x)_{|[c]} and let H{hA:AA}\mathcal{H}\equiv\{h_{A}:A\in\mathcal{A}\}. It is easy to see that for AuAA\in_{u}\mathcal{A}, the hash functions hAh_{A} are almost pairwise independent. Observe that for Q1uS,Q2uSQ_{1}\in_{u}\mathcal{S},Q_{2}\in_{u}\mathcal{S}^{\prime} and AuAA\in_{u}\mathcal{A}, the polynomial Q(  )=(Q1+Q2)(A(  ))Q(\;)=(Q_{1}+Q_{2})(A(\;)) is uniformly distributed over all nn-variate degree dd polynomials.

Now, fix a polynomial Q2SQ_{2}\in\mathcal{S}^{\prime}. Then, for a random Q1uSQ_{1}\in_{u}\mathcal{S}, we have

where u=Axu=Ax and the polynomials (Pa)a{0,1}c(P_{a})_{a\in\{0,1\}^{c}}, are independent uniformly random polynomials of degree at most dcd-c in ncn-c variables. Let D\mathcal{D} denote the distribution of (P(u))u\varmathbbF2nc(P^{\prime}(u))_{u\in\varmathbb F_{2}^{n-c}} for PP^{\prime} a uniformly random polynomial of degree at most dcd-c in (nc)(n-c) variables. Then, for every fixed AAA\in\mathcal{A} and Q2SQ_{2}\in\mathcal{S}^{\prime}, the distribution of the evaluations of QQ restricted to different buckets Ba={x:hA(x)=a}B_{a}=\{x:h_{A}(x)=a\} are independent of one another. Moreover, within each bucket BaB_{a}, the evaluations vector (Q1(x))xBa(Q_{1}(x))_{x\in B_{a}} is distributed as D\mathcal{D}, which is (2dc1)(2^{d-c}-1)-wise independent.

Therefore, for every fixed Q2SQ_{2}\in\mathcal{S}^{\prime}, the distribution of z=(Q(x))x\varmathbbF2nz=(Q(x))_{x\in\varmathbb F_{2}^{n}} is the same as the output of GH,DG_{\mathcal{H},\mathcal{D}} as defined in Equation 5.2, where y=Q2(A(x))y=Q_{2}(A(x)). The theorem now follows from Theorem 5.10. ∎

The invariance principle of Theorem 5.9 combined with the appropriate choice of the smooth function ψ\psi gives us the following corollaries.

Follows from using Theorem 5.9 and an argument as in Theorem 3.19 of [MOO05] who get a similar conclusion for the hypercube starting from an invariance principle for the hypercube to the Gaussian space. ∎

Follows from Theorem 5.9 and Lemma 5.8 in [MZ10]. ∎

3 Invariance Principles over Codes

Our main tool for proving that “majority is stablest” result over Reed–Muller codes, Theorem 5.6, was the invariance principle Theorem 5.7. We conjecture that similar results should hold for any linear code with sufficiently large dual distance so that the codewords have bounded independence. In particular, we conjecture that bounded independence fools arbitrary low-degree polynomial threshold functions (PTFs) over {±1}n\{\pm 1\}^{n}.

The conjecture is known to be true for halfspaces [DGJ+09], degree two PTFs [DKN10] and for Gaussians with bounded independence [Kan11].

For all d\varmathbbNd\in\varmathbb N and ε>0\varepsilon>0, there exists k=k(d,ε)k=k(d,\varepsilon) such that the following holds: Let QQ be an nn-variate multilinear real polynomial with degree dd. Let XX be an kk-wise independent distribution over {±1}n\{\pm 1\}^{n} and let YY be the uniform distribution over {±1}n\{\pm 1\}^{n}. Then, \varmathbbEsignQ(X)\varmathbbEsignQ(Y)ε.\lvert\operatorname*{\varmathbb{E}}\operatorname{sign}\circ Q(X)-\operatorname*{\varmathbb{E}}\operatorname{sign}\circ Q(Y)\rvert\leqslant\varepsilon\,.

Finally, we remark that for the application to “majority is stablest” it suffices to show a weaker invariance principle applicable to the ζ\zeta functional.

For all d\varmathbbNd\in\varmathbb N and ε>0\varepsilon>0, there exists k=k(d,ε)k=k(d,\varepsilon) and η=η(ε)\eta=\eta(\varepsilon) such that the following holds: Let QQ be an nn-variate multilinear real polynomial with degree dd. Let XX be a kk-wise independent distribution over {±1}n\{\pm 1\}^{n} and let YY be the uniform distribution over {±1}n\{\pm 1\}^{n}. Suppose that \varmathbbEQ(X)21\operatorname*{\varmathbb{E}}Q(X)^{2}\leqslant 1 and \varmathbbEζQ(X)η\operatorname*{\varmathbb{E}}\zeta\circ Q(X)\leqslant\eta. Then, \varmathbbEζQ(Y)ε\operatorname*{\varmathbb{E}}\zeta\circ Q(Y)\leqslant\varepsilon.

We show in the appendix that Conjecture 5.13 implies Conjecture 5.14.

Efficient Alphabet Reduction

The long code over a (non-binary) alphabet QQ consists of the set of dictator functions {f1,,fN ⁣:QNQ},\{f_{1},\ldots,f_{N}\colon Q^{N}\to Q\}, where fi(x)=xif_{i}(x)=x_{i} for all xQNx\in Q^{N}.

A natural 22-query test for this code was proposed by Khot et al. [KKMO07] and analyzed in Mossel et al. [MOO05]. The queries of the test are associated with the edges of the ε\varepsilon-noise graph on QNQ^{N}. In this graph, the weight of an edge (x,y)(x,y) is its probability in the following sampling procedure: Sample xQNx\in Q^{N} uniformly at random and resample each coordinate of xQNx\in Q^{N} independently with probability ε\varepsilon to generate yQNy\in Q^{N}.

In this section, we present a more efficient code that serves as an analogue for the long code over a non-binary alphabet. For n,d\varmathbbNn,d\in\varmathbb N, let N=2nN=2^{n} and let C\varmathbbF2N\mathcal{C}\subseteq\varmathbb F_{2}^{N} be the Reed–Muller code RM(n,nd1)\text{\sf RM}(n,n-d-1) and let D=C\varmathbbF2N\mathcal{D}=\mathcal{C}^{\perp}\in\varmathbb F_{2}^{N} be its dual RM(n,d)\text{\sf RM}(n,d). Let TD\mathcal{T}\subseteq\mathcal{D} denote the canonical test set for the code C\mathcal{C} as in Section 4.3.

Let t\varmathbbNt\in\varmathbb N and let Q=\varmathbbF2tQ=\varmathbb F_{2}^{t}. We define the following distribution Tt\mathcal{T}_{t} over Dt\mathcal{D}^{t} (the tt-fold direct sum of D\mathcal{D}, a subspace of \varmathbbF2tN\varmathbb F_{2}^{t\cdot N}),

Sample cc from the test set TD\mathcal{T}\subseteq\mathcal{D}.

Sample w=(w(1),,w(t))w=(w^{(1)},\ldots,w^{(t)}) from \varmathbbF2t\varmathbb F_{2}^{t} at random.

Sample z=(z(1),,z(t))Dtz=(z^{(1)},\ldots,z^{(t)})\in\mathcal{D}^{t} by setting

Since the noise graph on QNQ^{N} is a Cayley graph over the abelian group \varmathbbF2tN\varmathbb F_{2}^{tN}, the characters of this group form a basis of eigenfunctions. For β=(β1,,βN)QN\beta=(\beta_{1},\ldots,\beta_{N})\in Q^{N}, let χβ ⁣:QN{±1}\chi_{\beta}\colon Q^{N}\to\{\pm 1\} denote the character

The eigenvalue of χβ\chi_{\beta} in the ε\varepsilon-noise graph on QNQ^{N} (1ε)wt(β)(1-\varepsilon)^{{\sf wt}(\beta)} where wt(β)={iβi0t}{\sf wt}(\beta)=\lvert\{i\mid\beta_{i}\neq 0^{t}\}\rvert is the Hamming weight of β\beta as a length-NN string over alphabet QQ. (In this section wt(){\sf wt}(\cdot) will always refer to the Hamming weight of strings over alphabet QQ.)

where wt(β)={i[N]βi0t}{\sf wt}(\beta^{\prime})=\lvert\{i\in[N]\mid\beta^{\prime}_{i}\neq 0^{t}\}\rvert is the Hamming weight of β\beta^{\prime} seen as a length-NN string over alphabet QQ. (Here, the minimum is over all βQN\beta^{\prime}\in Q^{N} that lie in the same coset as β\beta in QN/CtQ^{N}/\mathcal{C}^{t}.)

We will first prove an upper bound on λβ\lambda_{\beta} for the case that wt(β)2d{\sf wt}(\beta)\gg 2^{d}. We write β=(β(1),,β(t))\beta=(\beta^{(1)},\ldots,\beta^{(t)}) with β(i)\varmathbbF2N\beta^{(i)}\in\varmathbb F_{2}^{N}. Let z=(z(1),,z(t))Dtz=(z^{(1)},\ldots,z^{(t)})\in\mathcal{D}^{t} be a string drawn from the distribution Tt\mathcal{T}_{t}. Note that z(i)=w(i)cz^{(i)}=w^{(i)}\cdot c, where w=(w(1),,w(t))w=(w^{(1)},\ldots,w^{(t)}) and cc are sampled as in the definition of Tt\mathcal{T}_{t}. Since ww is a random vector in \varmathbbF2t\varmathbb F_{2}^{t}, we can upper bound λβ\lambda_{\beta},

Without loss of generality, we may assume that β(t)\beta^{(t)} has Hamming weight (as a binary string) at least wt(β)/t{\sf wt}(\beta)/t. By Theorem 4.11, if wt(β)>η2d{\sf wt}(\beta)>\eta 2^{-d} for sufficiently small η>0\eta>0, we can upper bound λβ1Ω(η/t)\lambda_{\beta}\leqslant 1-\Omega(\eta/t).

Next, we will estimate λβ\lambda_{\beta} (from below and above) for wt(β)2d{\sf wt}(\beta)\ll 2^{-d}. Let I[N]I\subseteq[N] be the set of coordinates i[N]i\in[N] with βi0t\beta_{i}\neq 0^{t}. We claim,

We write β=(β1,,βN)\beta=(\beta_{1},\ldots,\beta_{N}) with βi\varmathbbF2t\beta_{i}\in\varmathbb F_{2}^{t}. Then, β,z=i[N]ciw,βi\langle\beta,z\rangle=\sum_{i\in[N]}c_{i}\langle w,\beta_{i}\rangle. We refine the event β,z=1\langle\beta,z\rangle=1 according to the cardinality of Isupp(c)I\cap\operatorname{supp}(c). If Isupp(c)=I\cap\operatorname{supp}(c)=\emptyset, then β,z=0\langle\beta,z\rangle=0. On the other hand, conditioned on Isupp(c)\lvert I\cap\operatorname{supp}(c)\rvert, the event β,z=1\langle\beta,z\rangle=1 is equivalent to the event w,βi0=1\langle w,\beta_{i_{0}}\rangle=1 with {i0}=Isupp(c)\{i_{0}\}=I\cap\operatorname{supp}(c). Since βi00t\beta_{i_{0}}\neq 0^{t}, this event has (conditional) probability \nicefrac12\nicefrac{{1}}{{2}}. Hence,

which implies the claimed estimate for λβ\lambda_{\beta}.

It remains to estimate the distribution of Isupp(c)\lvert I\cap\operatorname{supp}(c)\rvert. The argument is similar to the proof of Lemma 3.3. For every coordinate i[N]i\in[N], we have \varmathbbPcT{ci=1}=2d\operatorname*{\varmathbb{P}}_{{c\in\mathcal{T}}}\left\{c_{i}=1\right\}=2^{-d}. Thus, \varmathbbP{Isupp(c)=1}I2d=wt(β)/2d\operatorname*{\varmathbb{P}}\left\{\,\lvert I\cap\operatorname{supp}(c)\rvert=1\right\}\leqslant\lvert I\rvert\cdot 2^{-d}={\sf wt}(\beta)/2^{d}. On the other hand, for any two distinct coordinates ij[N]i\neq j\in[N], we have \varmathbbPcT{ci=cj=1}=22d\operatorname*{\varmathbb{P}}_{{c\in\mathcal{T}}}\left\{c_{i}=c_{j}=1\right\}=2^{-2d}. Therefore,

Similarly, \varmathbbP{Isupp(c)2}(wt(β)/2d)2\operatorname*{\varmathbb{P}}\left\{\,\lvert I\cap\operatorname{supp}(c)\rvert\geqslant 2\right\}\leqslant({\sf wt}(\beta)/2^{d})^{2}. We conclude that

(Note that the estimate is only meaningful when wt(β)2d{\sf wt}(\beta)\ll 2^{d}.) ∎

For an absolute constant c0c_{0} and all βQN/Dt\beta\in Q^{N}/\mathcal{D}^{t}, λβmax(ρwt(β)/c0t,ρ2d/c0t)\lambda_{\beta}\leqslant\max(\rho^{{\sf wt}(\beta)/c_{0}t},\rho^{2^{d}/c_{0}t}).

Influences

Let βQN/Ct\beta\in Q^{N}/\mathcal{C}^{t}. Suppose wt(β)<wt(Ct)/2{\sf wt}(\beta)<{\sf wt}(\mathcal{C}^{t})/2. (Note that CtQN\mathcal{C}^{t}\subseteq Q^{N} has the same minimum distance as C\varmathbbF2N\mathcal{C}\subseteq\varmathbb F_{2}^{N}. ) In this case, we will identify β\beta with the (unique) codeword of minimum weight in the equivalence class βQN/Ct\beta\in Q^{N}/\mathcal{C}^{t}.

(Here, βi\beta_{i} refers to the ii-th coordinate of the unique minimum-weight representative of the equivalence class β\beta.)

1 Majority is Stablest

In this section, we show an analogue of the majority is stablest theorem of [MOO05] on the ε\varepsilon-noise graph on QNQ^{N} just as Theorem 5.6 showed an analogue of the majority is stablest theorem over the Boolean noise graph.

where ρ=eε\rho=e^{-\varepsilon}, μ=\varmathbbExDt[f(x)]\mu=\operatorname*{\varmathbb{E}}_{x\sim\mathcal{D}^{t}}[f(x)] and Γρ ⁣:\varmathbbR\varmathbbR\Gamma_{\rho}\colon\varmathbb R\to\varmathbb R is the noise stability curve over Gaussian space.

2 222-Query Test

We will now describe a dictatorship test for functions on Dt\mathcal{D}^{t}, analogous to the 22-query dictatorship test on ε\varepsilon-noise graph.

Clearly, the dictator functions are linear functions over Dt\mathcal{D}^{t}, i.e., χβ(c+c)=χβ(c)+χβ(c)\chi_{\beta}(c+c^{\prime})=\chi_{\beta}(c)+\chi_{\beta}(c^{\prime}). This linearity is used to perform the 22-query test via folding. Note that for each αQ\alpha\in Q, the constant function α(x)=α\alpha(x)=\alpha for all x\varmathbbF2nx\in\varmathbb F_{2}^{n}, belongs to the code Dt\mathcal{D}^{t}. We will fold the function by enforcing that for all αQ\alpha\in Q, f(c+α)=f(c)+αf(c+\alpha)=f(c)+\alpha for all αQ\alpha\in Q.

Completeness

Recall that for a cCc\in\mathcal{C}^{\bot} generated from distribution Tε\mathcal{T}_{\varepsilon}, for each x\varmathbbF2nx\in\varmathbb F_{2}^{n} (see Lemma 4.5)

It is easy to see that by construction, this property holds for the distribution Tt,ε\mathcal{T}_{t,\varepsilon} also, namely,

Soundness

The probability of acceptance of the 22-query test is given by,

By applying Theorem 5.6, there is an appropriate choice of L,τL,\tau such that if maxi[N]InfiL(fα)τ\max_{i\in[N]}\operatorname{Inf}^{\leqslant L}_{i}(f_{\alpha})\leqslant\tau for all α\alpha then the probability of acceptance can be bounded by

Efficient integrality gaps for unique games

In this section, we present constructions of SDP integrality gap instances starting from a code C\mathcal{C} along with a local tester. To this end, we make an additional assumption on the code C\mathcal{C}. Specifically, let us suppose there exists a subcode H\mathcal{H} of D=C\mathcal{D}=\mathcal{C}^{\perp} with distance 12\frac{1}{2}. Formally, we show the following result.

Let C\mathcal{C} be an [N,K,D]2[N,K,D]_{2} linear code with a canonical tester T\mathcal{T} as described in Definition 3.2. Furthermore, let H\mathcal{H} be a subcode of D=C\mathcal{D}=\mathcal{C}^{\perp} with distance 12\frac{1}{2}. Then, there exists an instance of unique games, more specifically a H\textscMax2Lin\mathcal{H}\textsc{-Max-2Lin} instance, whose vertices are D\mathcal{D} (D=2NK|\mathcal{D}|=2^{N-K}) and alphabet H\mathcal{H} such that:

The optimum value of the natural SDP relaxation for unique games is at least (12tN)2\left(1-\frac{2t}{N}\right)^{2} where tt is the number of queries made by the canonical tester T\mathcal{T}.

Instantiating the above theorem with the Reed–Muller code and its canonical tester we obtain the following explicit SDP integrality gap instance.

For every integer nn, δ>0\delta>0 there exists a \varmathbbF2n\textscMax2Lin\varmathbb F_{2}^{n}\textsc{-Max-2Lin} instance Γ\Gamma on M=22log2nM=2^{2^{\log^{2}n}} vertices such that the optimum value of the SDP relaxation on Γ\Gamma is 1O(log(1/δ)n)=1O(log(1/δ)2(loglogM)1/2)1-O(\frac{\log(1/\delta)}{n})=1-O\left(\frac{\log(1/\delta)}{2^{(\log\log M)^{1/2}}}\right) while every labelling of Γ\Gamma satisfies at most O(δ)O(\delta) fraction of edges.

Fix the code C\mathcal{C} to be the Reed–Muller code RM(n,nlogn)\text{\sf RM}(n,n-\log n) of degee d=lognd=\log n over nn variables. The block length of the code is N=2nN=2^{n}, while the rate is K=2nid(ni)2nO(2log2n)K=2^{n}-\sum_{i\leqslant d}\binom{n}{i}\leqslant 2^{n}-O(2^{\log^{2}n}). This code contains the Hadammard code H\mathcal{H} which is of relative distance 12\frac{1}{2}.

Let TRM\mathcal{T}_{\text{\sf RM}} denote the canonical Reed–Muller tester for RM(n,nlogn)\text{\sf RM}(n,n-\log n), and let TRMr\mathcal{T}_{\text{\sf RM}}^{\oplus r} denote the XOR of rr-independent tests. Let us fix r=100log(1/δ)r=100\log(1/\delta), thus yielding a canonical tester making t=log(1/δ)2ndt=\log(1/\delta)\cdot 2^{n-d} queries. By the work of [BKS+10], this tester has a soundness of at least s(k)=12(1k/2d+1)r/2s(k)=\frac{1}{2}-(1-k/2^{d+1})^{r}/2. With k=2d/10k=2^{d}/10, the above soundness is at least s(k)1/2δ/2s(k)\geqslant 1/2-\delta/2. Using Theorem 7.1, the optimum value of the resulting \varmathbbF2n\textscMax2Lin\varmathbb F_{2}^{n}\textsc{-Max-2Lin} instance is at most δ\delta. On the other hand, the SDP value is at least

Starting from C\mathcal{C}, we construct an SDP integrality gap instance Γ(C,T)\Gamma(\mathcal{C},\mathcal{T}) for unique games as described below.

Here we construct SDP vectors that form a feasible solution to a natural SDP relaxation of unique games [KV05].

For a vector c\varmathbbF2mc\in\varmathbb F_{2}^{m}, we will use (1)c\varmathbbRm(-1)^{c}\in\varmathbb R^{m} to denote the vector whose coordinates are given by (1)ic=(1)ci(-1)^{c}_{i}=(-1)^{c_{i}}. For a pair of vectors c,cc,c^{\prime}, we have

For each vertex cDc\in\mathcal{D} associate vectors {bc,h=(1)c+h(1)c+hhH}\{{\bm{b}}_{c,h}=(-1)^{c+h}\otimes(-1)^{c+h}|h\in\mathcal{H}\}. Notice that for a pair of vectors bc,h,bc,h{\bm{b}}_{c,h},{\bm{b}}_{c^{\prime},h^{\prime}} we have,

Since the distance of the code H\mathcal{H} is 12\frac{1}{2}, we have

In other words, for every vertex cc, the corresponding SDP vectors are orthonormal. The objective value of the SDP solution is given by,

where tt is the number of queries made by the canonical tester T\mathcal{T} for C\mathcal{C}.

Soundness

The expectation of the function fpf_{p} is given by,

Since fpf_{p} is bounded in the range $$ we have,

Applying Corollary 4.10, we get that for each pp,

Hierarchy integrality gaps for Unique Games and Related Problems

This section is devoted to the construction of a integrality gap instance for a hierarchy of SDP relaxations to Unique Games. More specifically, we consider the LHr\mathsf{LH}_{r} and SAr\mathsf{SA}_{r} SDP hierarchies described in [RS09]. For these SDP hierarchies, we will demonstrate the following integrality gap constructions.

For every ε,δ>0\varepsilon,\delta>0, there exists an \varmathbbF2t\textscMax2Lin\varmathbb F_{2}^{t}\textsc{-Max-2Lin} instance II for some positive integer tt, such that no labelling satisfies more than δ\delta fraction of edges of Γ\Gamma while there exists an SDP solution such that,

the SDP solution is feasible for LHR\mathsf{LH}_{R} with R=exp(exp(Ω(loglog1/2N)))R=\exp(\exp(\Omega(\log\log^{1/2}N))).

the SDP solution is feasible for SAR\mathsf{SA}_{R} with R=exp(Ω(loglog1/2N))R=\exp(\Omega(\log\log^{1/2}N)).

the SDP solution has value 1O(ε)1-O(\varepsilon).

where NN is the number of vertices in the instance II.

Composing the above SDP integrality gap with Unique games based hardness reductions yields corresponding gap instances for several classes of problems like constraint satisfaction problems (CSPs) and ordering CSPs like maximum acyclic subgraph. Specifically, up to exp(exp(Ω(loglog1/2N)))\exp(\exp(\Omega(\log\log^{1/2}N))) rounds of LH\mathsf{LH} hierarchy or the exp(Ω(loglog1/2N)\exp(\Omega(\log\log^{1/2}N) rounds of the SA\mathsf{SA} hierarchy can be shown to have te same SDP integrality gap as the simple SDP relaxation for every CSP.For the sake of brevity, we omit a formal statement of this result here.

Towards showing Theorem 8.1, we follow the approach outlined in [RS09]. At a high-level, the idea is to start with an integrality gap instance Γ\Gamma for a simple SDP relaxation for unique games over a large alphabet. The instance Γ\Gamma is reduced to an instance Ψε,Q,d(Γ)\Psi_{\varepsilon,Q,d}(\Gamma) of unique games over a smaller alphabet using a reduction similar to Khot et al. [KKMO07]. Moreover, the SDP solution to the simple SDP relaxation of Γ\Gamma can be translated to a solution for several rounds of SDP hierarchy for Ψε,Q,d(Γ)\Psi_{\varepsilon,Q,d}(\Gamma).

Let Γ\Gamma be an instance of \varmathbbF2n\textscMax2Lin\varmathbb F_{2}^{n}\textsc{-Max-2Lin} over a set of vertices V(Γ)V(\Gamma) and edges E(Γ)E(\Gamma). On every edge (u,v)E(Γ)(u,v)\in E(\Gamma), there is a constraint of the form uv=αuvu-v=\alpha_{uv} for some α\varmathbbF2n\alpha\in\varmathbb F_{2}^{n}. We will reduce Γ\Gamma to an instance of Q\textscMax2LinQ\textsc{-Max-2Lin} instance using the two query test described in Section 6.

Notice that Reed-Muller codes are invariant under translation of its coordinates. Therefore, the code Dt\mathcal{D}^{t} and the test distributions Tt,ε\mathcal{T}_{t,\varepsilon} are both invariant under translation. Formally, for an α\varmathbbF2n\alpha\in\varmathbb F_{2}^{n}, the translation operator Tα ⁣:QNQNT_{\alpha}\colon Q^{N}\to Q^{N} is defined by

Given a codeword cDtc\in\mathcal{D}^{t}, we have TαcDtT_{\alpha}\circ c\in\mathcal{D}^{t}.

We are now ready to describe the reduction from Γ\Gamma to an instance of \varmathbbF2n\textscMax2Lin\varmathbb F_{2}^{n}\textsc{-Max-2Lin}.

Soundness

For all sufficiently small constants ε,δ>0\varepsilon,\delta>0 and all choices of Q=2tQ=2^{t}, there exists γ,d\gamma,d such that if no labelling of Γ\Gamma satisfies more than γ\gamma fraction of edges, then every labelling of Ψε,Q,d(Γ)\Psi_{\varepsilon,Q,d}(\Gamma) satisfies at most QΓρ(\nicefrac1Q)+δQ\Gamma_{\rho}\left(\nicefrac{{1}}{{Q}}\right)+\delta fraction of constraints, where ρ=eε\rho=e^{-\varepsilon}.

Due to folding we have fqv(c)=fq+rv(c+r)f^{v}_{q}(c)=f^{v}_{q+r}(c+r) for all rQr\in Q. Moreover, this implies that \varmathbbEcDtfqv=1Q\operatorname*{\varmathbb{E}}_{c\in\mathcal{D}^{t}}f^{v}_{q}=\frac{1}{Q}. Finally, for a vertex uV(Γ)u\in V(\Gamma) and rQr\in Q define,

Clearly, for the functions hruh^{u}_{r} also we have,

The probability of acceptance of the verifier can be arithmetized in terms of the functions hruh^{u}_{r}.

Suppose the probability of acceptance of the verifier is at least QΓρ(\nicefrac1Q)+δQ\cdot\Gamma_{\rho}(\nicefrac{{1}}{{Q}})+\delta. By simple averaging, for at least δ/2\delta/2 fraction of the vertices uV(Γ)u\in V(\Gamma) we have,

Let us refer to such a vertex uu as being good.

Fix the parameters τ,L,d\tau,L,d to those obtained by applying Theorem 6.4 with parameters ε,δ/2Q\varepsilon,\delta/2Q. Recall that by (8.1), we have \varmathbbEDt[hqu]=1Q\operatorname*{\varmathbb{E}}_{\mathcal{D}^{t}}[h^{u}_{q}]=\frac{1}{Q}. Applying Theorem 6.4, if for each qQq\in Q, maxα\varmathbbF2nInfαl(hqu)τ\max_{\alpha\in\varmathbb F_{2}^{n}}\operatorname{Inf}_{\alpha}^{\leqslant l}(h^{u}_{q})\leqslant\tau then,

This implies that for each good vertex uu there exists q,αq,\alpha such that InfαL(hqu)τ\operatorname{Inf}_{\alpha}^{\leqslant L}(h^{u}_{q})\geqslant\tau. We will use these influential coordinates to decode a labelling for the \varmathbbF2n\textscMax2Lin\varmathbb F_{2}^{n}\textsc{-Max-2Lin} instance Γ\Gamma.

For each vertex vV(Γ)v\in V(\Gamma) define the set of influential coordinates SvS_{v} as,

Using Lemma A.1, for each of the functions hqvh^{v}_{q} or fqvf^{v}_{q}, there are at most 2L/τ2L/\tau coordinates with influence greater than τ/2\tau/2. Therefore, for each vertex vv the set SvS_{v} is of size at most 2Q2L/τ=4QL/τ2\cdot Q\cdot 2L/\tau=4QL/\tau.

Define an assignment of labels A:V(Γ)\varmathbbF2nA:V(\Gamma)\to\varmathbb F_{2}^{n} as follows. For each vertex vv, sample a random αSv\alpha\in S_{v} and assign A(v)=αA(v)=\alpha.

Fix one good vertex uu, and a corresponding q,αq,\alpha such that InfαL(hqu)τ\operatorname{Inf}_{\alpha}^{\leqslant L}(h_{q}^{u})\geqslant\tau. By definition of hquh_{q}^{u} this implies that

Hence, for at least a τ/2\tau/2 fraction of the neighbours vN(u)v\in N(u), the coordinate ααuv\alpha-\alpha_{uv} has influence at least τ/2\tau/2 on fqvf^{v}_{q}. Therefore, for every good vertex uu, for at least τ/2\tau/2 fraction of its neighbours vN(u)v\in N(u), the edge (u,v)(u,v) is satisfied by the labelling AA with probability at least 1Su1Svτ2/16Q2L2\frac{1}{|S_{u}|}\frac{1}{|S_{v}|}\geqslant\tau^{2}/16Q^{2}L^{2}. Since there are at least a δ/2\delta/2-fraction of good vertices uu, the expected fraction of edges satisfied by the labelling AA is at least δ2τ2τ216Q2L2=δτ364Q2L2\frac{\delta}{2}\cdot\frac{\tau}{2}\cdot\frac{\tau^{2}}{16Q^{2}L^{2}}=\frac{\delta\tau^{3}}{64Q^{2}L^{2}}.

SDP Solution

We will construct feasible solutions to certain strong SDP relaxations of Ψε,Q,d(Γ)\Psi_{\varepsilon,Q,d}(\Gamma) by appealing to the work of [RS09]. The SDP hierarchies that we consider are referred to as the LH\mathsf{LH} and SA\mathsf{SA} hierarchies. Informally, the rthr^{th}-level LH\mathsf{LH} relaxation (LHr\mathsf{LH}_{r}) consists of the simple SDP relaxation for unique games augmented by local distributions μS\mu_{S} over integral assignments for every set SS of at most rr vertices. The local distribution μS\mu_{S} is required to be consistent with the inner products of the SDP vectors. Alternately, this SDP hierarchy can be thought of as, the simple SDP relaxation augmented by every valid constraint on at most rr vertices.

The SA\mathsf{SA} hierarchy is a somewhat stronger hierarchy that requires the local distributions μS\mu_{S} to be consistent with each other, namely, μS\mu_{S} and μT\mu_{T} agree on STS\cap T. Alternately, the SA\mathsf{SA} hierarchy corresponds to the simple SDP relaxation augmented with rr-rounds of Sherali-Adams LP variables. We refer the reader to [RS09] for formal definitions of the SA\mathsf{SA} and LH\mathsf{LH} hierarchies.

Suppose Γ\Gamma has an SDP solution that of value 1η1-\eta, then there exists an SDP solution to the instance Ψε,Q,d(Γ)\Psi_{\varepsilon,Q,d}(\Gamma) such that,

the SDP solution is feasible for LHR\mathsf{LH}_{R} with R=2Ω(ε/η1/4)R=2^{\Omega(\varepsilon/\eta^{1/4})}.

the SDP solution is feasible for SAR\mathsf{SA}_{R} with R=Ω(ε/η1/4)R=\Omega(\varepsilon/\eta^{1/4}).

the SDP solution has value 1O(ε)oη(1)1-O(\varepsilon)-o_{\eta}(1) on Ψε,Q,d(Γ)\Psi_{\varepsilon,Q,d}(\Gamma).

This lemma is a direct consequence of Theorem 99 from [RS09].

In [RS09], the authors start with an integrality gap instance Γ\Gamma for the simple SDP for unique games, and then perform a traditional long code based reduction to obtain an instance Φε,Q(Γ)\Phi_{\varepsilon,Q}(\Gamma).

The crucial observation is the following.

The vertices of Ψε,Q,d(Γ)\Psi_{\varepsilon,Q,d}(\Gamma) are a subset of vertices of Φε,Q(Γ)\Phi_{\varepsilon,Q}(\Gamma) – the instance obtained by the traditional QQ-ary long code reduction on Γ\Gamma.

The vertices of Ψε,Q,d(Γ)\Psi_{\varepsilon,Q,d}(\Gamma) are pairs of the form (v,c)(v,c) where vV(Γ)v\in V(\Gamma) and cDtc\in\mathcal{D}^{t}. The codeword cDtc\in\mathcal{D}^{t} can be thought of as a string of length N=2nN=2^{n} over the alphabet Q=\varmathbbF2tQ=\varmathbb F_{2}^{t}, namely, cQ2nc\in Q^{2^{n}}. The vertices of the instance Φε,Q(Γ)\Phi_{\varepsilon,Q}(\Gamma) obtained via a traditional QQ-ary long code reduction is V(Γ)×Q2nV(\Gamma)\times Q^{2^{n}}. Hence the observation follows. ∎

In [RS09], the authors construct an SDP solution for the instance Φε,Q(Γ)\Phi_{\varepsilon,Q}(\Gamma) that is feasible for LHR\mathsf{LH}_{R} relaxation with R=2Ω(ε/η1/4)R=2^{\Omega(\varepsilon/\eta^{1/4})} and for SAR\mathsf{SA}_{R} relaxation with R=Ω(ε/η1/4)R=\Omega(\varepsilon/\eta^{1/4}). As noted in 8.5, the vertices of Ψε,Q(Γ)\Psi_{\varepsilon,Q}(\Gamma) are a subset of the vertices of Φε,Q(Γ)\Phi_{\varepsilon,Q}(\Gamma). Therefore, the same SDP solution constructed in [RS09] when restricted to the instance Ψε,Q,d(Γ)\Psi_{\varepsilon,Q,d}(\Gamma) yields a feasible solution for the corresponding LHR\mathsf{LH}_{R} and SAR\mathsf{SA}_{R} relaxations.

To finish the proof, we need to show that the value of the SDP solution from [RS09] is 12εoη(1)1-2\varepsilon-o_{\eta}(1).

The traditional long code based reduction to get Φε,Q(Γ)\Phi_{\varepsilon,Q}(\Gamma) uses the noise stability test as the inner gadget. Namely, to test if a function f:\varmathbbFQ2n\varmathbbFQf:\varmathbb F_{Q}^{2^{n}}\to\varmathbb F_{Q} is a dictator function, the verifier picks x\varmathbbFQ2nx\in\varmathbb F_{Q}^{2^{n}} uniformly at random, and rerandomizes each coordinate of xx independently with probability ε\varepsilon, and then tests if f(x)=f(y)f(x)=f(y). Composing this noise stability test with the outer unique game Γ\Gamma yields the instance Φε,Q(Γ)\Phi_{\varepsilon,Q}(\Gamma). The value of the SDP solution constructed for Φε,Q(Γ)\Phi_{\varepsilon,Q}(\Gamma) in [RS09] depends only on the expected hamming distance between the queries x,yx,y. More precisely, in Claim 22 of [RS09], the authors show that if the distribution on the queries (x,y)\varmathbbFQ2n×\varmathbbFQ2n(x,y)\in\varmathbb F_{Q}^{2^{n}}\times\varmathbb F_{Q}^{2^{n}} is chosen to be an arbitrary distribution NS\mathsf{NS} over \varmathbbFQ2n×\varmathbbFQ2n\varmathbb F_{Q}^{2^{n}}\times\varmathbb F_{Q}^{2^{n}}, the SDP objective value of the solution is given by

Fix t=10/εlog(1/δ)t=\lceil 10/\varepsilon\log(1/\delta)\rceil and Q=2tQ=2^{t}. By our choice of QQ, we have QΓeε(1/Q)δQ\Gamma_{e^{-\varepsilon}}(1/Q)\leqslant\delta (see Appendix B in [MOO05] for such asymptotic bounds on Γ\Gamma).

Fix γ,d\gamma,d depending on ε,δ\varepsilon,\delta and QQ as dictated by Lemma 8.3. Let Γ\Gamma be the Unique games instance obtained by Corollary 7.2 with the optimal integral value set to γ\gamma. In particular, Γ\Gamma is a \varmathbbF2n\textscMax2Lin\varmathbb F_{2}^{n}\textsc{-Max-2Lin} instance that has M=22log2nM=2^{2^{\log^{2}n}} vertices. Its SDP optimum for the simple Unique games SDP relaxation is at least 1O(C(ε,δ)/n)1-O(C(\varepsilon,\delta)/n) (η=O(C(ε,δ)/n)\eta=O(C(\varepsilon,\delta)/n)) for some constant C(ε,δ)C(\varepsilon,\delta) depending on ε,δ\varepsilon,\delta.

Now we apply the reduction to \varmathbbF2t\textscMax2Lin\varmathbb F_{2}^{t}\textsc{-Max-2Lin} outlined below to obtain an instance Ψε,Q,d(Γ)\Psi_{\varepsilon,Q,d}(\Gamma). The number of vertices of the instance Ψε,Q,d(Γ)\Psi_{\varepsilon,Q,d}(\Gamma) is V(Γ)×Dt|V(\Gamma)|\times|\mathcal{D}^{t}|. Note that the choice of the degree dd is a constant (say d(ε,δ)d(\varepsilon,\delta)) depending on ε,δ\varepsilon,\delta. Hence, the number of points in Dt\mathcal{D}^{t} is given by Dt=2O(nd(ε,δ))|\mathcal{D}^{t}|=2^{O(n^{d(\varepsilon,\delta)})}. Therefore, the number of vertices of Ψε,Q,d(Γ)\Psi_{\varepsilon,Q,d}(\Gamma) is N=22log2n2O(nd(ε,δ))=22O(log2n)N=2^{2^{\log^{2}n}}\cdot 2^{O(n^{d(\varepsilon,\delta)})}=2^{2^{O(\log^{2}n)}}. Equivalently, we have n=2Ω(loglog1/2N)n=2^{\Omega(\log\log^{1/2}N)}.

By Lemma 8.3, the optimal labelling to Ψε,Q,d(Γ)\Psi_{\varepsilon,Q,d}(\Gamma) satisfies at most QΓeε(1/Q)+δ=O(δ)Q\Gamma_{e^{-\varepsilon}}(1/Q)+\delta=O(\delta) fraction of constraints.

By Lemma 8.4, there exists an SDP solution to the instance Ψε,Q,d(Γ)\Psi_{\varepsilon,Q,d}(\Gamma) with value 1O(ε)oη(1)1-O(\varepsilon)-o_{\eta}(1). Since η=O(C(ε,δ)/n)\eta=O(C(\varepsilon,\delta)/n), for large enough choice of nn, the SDP value is at least 1O(ε)1-O(\varepsilon).

The SDP solution is feasible for LHR\mathsf{LH}_{R} for R=2Ω(ε/η1/4)=2c(ε,δ)n1/4=exp(exp(Ω(loglog1/2N)))R=2^{\Omega(\varepsilon/\eta^{1/4})}=2^{c(\varepsilon,\delta)n^{1/4}}=\exp(\exp(\Omega(\log\log^{1/2}N))) rounds, where c(ε,δ)c(\varepsilon,\delta) is a constant depending on ε\varepsilon and δ\delta. Furthermore, the SDP solution is also feasible for SAR\mathsf{SA}_{R} for R=Ω(ε/η1/4)=c(ε,δ)n1/4=exp(Ω(loglog1/2N))R=\Omega(\varepsilon/\eta^{1/4})=c(\varepsilon,\delta)n^{1/4}=\exp(\Omega(\log\log^{1/2}N)).

Acknowledgements.

We have had many beneficial conversations on this question with a number of researchers including Venkatesan Guruswami, Ran Raz, Alex Samorodnitsky, Amir Shpilka, Daniel Spielman, Ryan O’Donnell, Subhash Khot and Madhu Sudan. We thank Avi Wigderson for suggesting the name “short code”.

References

Appendix A Missing Proofs

Let g=Gγfg=G^{\gamma}f and G=G12γG^{\prime}=G^{1-2\gamma}. Then, the graph GG^{\prime} has the same eigenfunctions as GG - χα\chi_{\alpha} for α\varmathbbF2N/C\alpha\in\varmathbb F_{2}^{N}/\mathcal{C} with eigenvalues λα=λα12γ\lambda_{\alpha}^{\prime}=\lambda_{\alpha}^{1-2\gamma}. From the above equation, it is easy to check that, for ρ=ρ12γ\rho^{\prime}=\rho^{1-2\gamma},

Further, as the eigenvalues of GG are each at most 11, the coordinate influences of gg are no larger than those of ff.

Hence, using Equation (A.2) (and the crude bound μ1\mu\leqslant 1),

Let S{±1}NS\subseteq\{\pm 1\}^{N} be the set of {±1}\{\pm 1\}-vectors corresponding to the Reed–Muller code C=RM(n,d)\mathcal{C}^{\bot}=\text{\sf RM}(n,d), that is, for every codeword cCc\in\mathcal{C}^{\bot}, the set SS contains the vector ((1)c1,,(1)cN)((-1)^{c_{1}},\ldots,(-1)^{c_{N}}). Then, as gg is $valuedon-valued on\mathcal{C}^{\bot}andand\zeta$ measures distance to bounded random variables, by Equation (A.1),

Since Γρ(μ+η)Γρ(μ)+2η\Gamma_{\rho^{\prime}}(\mu+\sqrt{\eta})\leqslant\Gamma_{\rho^{\prime}}(\mu)+2\sqrt{\eta} and Γρ(μ)Γρ(μ)+ρρ/(1ρ)\Gamma_{\rho}(\mu)\leqslant\Gamma_{\rho^{\prime}}(\mu)+|\rho-\rho^{\prime}|/(1-\rho) (cf. Lemma B.3, Corollary B.5 in [MOO05]), it follows from Equations (A.3) and (A.4) that

(Here we used the estimate ρρ=ρρ12γ=O(γlog(1/ρ))|\rho-\rho^{\prime}|=|\rho-\rho^{1-2\gamma}|=O(\gamma\log(1/\rho)).) By choosing dClog(1/τ)d\geqslant C\log(1/\tau) and γ=CKloglog(1/τ)/(log(1/τ)log(1/ρ))\gamma=CK\log\log(1/\tau)/(\log(1/\tau)\log(1/\rho)) for an appropriately large constant CC, the above expression simplifies to

Since ζQ(X)\zeta\circ Q(X) is a non-negative random variable,

The following lemma shows a bound on the sum of influences.

The usual identity for the total (low-degree) influence holds,

Analogous to Theorem 5.7, the following invariance principle can be shown for regular multilinear polynomials.

The proof follows easily from the proof of Theorems 5.9 and 5.7 and the fact that if RM(n,d)\text{\sf RM}(n,d) satisfies the properties of the PRG in [MZ10], then so does RM(n,d)t\text{\sf RM}(n,d)^{t}. We omit the proof.

The work of Mossel et al. [MOO05] also obtains bounds on noise stability of functions over product spaces of large alphabets namely QNQ^{N}. The following corollary is a consequence of Theorem 4.44.4 in [MOO05]. The proof is analgous to that of Corollary 5.3 from Theorem 5.1.

Let f ⁣:QN\varmathbbRf\colon Q^{N}\to\varmathbb R be a function with \varmathbbEf=μ\operatorname*{\varmathbb{E}}f=\mu and \varmathbbEζfτ\operatorname*{\varmathbb{E}}\zeta\circ f\leqslant\tau. Suppose Infif30log(1/τ)/logQτ\operatorname{Inf}_{i}f^{\leqslant 30\log(1/\tau)/\log Q}\leqslant\tau for all i[N]i\in[N]. Then,

where TρT_{\rho} is the noise graph on QNQ^{N} with second largest eigenvalue ρ\rho and Γρ\Gamma_{\rho} is the Gaussian noise stability curve. (Here, we assume that τ\tau is small enough.)

Now we are ready to present the proof of the majority is stablest theorem over Dt\mathcal{D}^{t} (Theorem 6.4) using Theorem A.2 and Corollary A.3.

Let g=Gγfg=G^{\gamma}f and G=G12γG^{\prime}=G^{1-2\gamma}. Then, the graph GG^{\prime} has the same eigenfunctions as GG - χα\chi_{\alpha} for αQN/C\alpha\in Q^{N}/\mathcal{C} with eigenvalues λα=λα12γ\lambda_{\alpha}^{\prime}=\lambda_{\alpha}^{1-2\gamma}. From the above equation, it is easy to check that, for ρ=ρ12γ\rho^{\prime}=\rho^{1-2\gamma},

Hence, using Equation (A.6) (and the crude bound μ1\mu\leqslant 1),

Let S{±1}NtS\subseteq\{\pm 1\}^{Nt} be the set of {±1}\{\pm 1\}-vectors corresponding to the Reed–Muller code Dt\mathcal{D}^{t}, that is, for every codeword c=(c(1),c(2),,c(t))Dtc=(c^{(1)},c^{(2)},\ldots,c^{(t)})\in\mathcal{D}^{t}, the set SS contains the vector ((1)c1(1),(1)cj(i),(1)cN(t))((-1)^{c^{(1)}_{1}},\ldots(-1)^{c^{(i)}_{j}},(-1)^{c^{(t)}_{N}}). Then, as gg is $valuedon-valued on\mathcal{D}^{t}andand\zeta$ measures distance to bounded random variables, by Equation (A.5),

Since Γρ(μ+η)Γρ(μ)+2η\Gamma_{\rho^{\prime}}(\mu+\sqrt{\eta})\leqslant\Gamma_{\rho^{\prime}}(\mu)+2\sqrt{\eta} and Γρ(μ)Γρ(μ)+ρρ/(1ρ)\Gamma_{\rho}(\mu)\leqslant\Gamma_{\rho^{\prime}}(\mu)+|\rho-\rho^{\prime}|/(1-\rho) (cf. Lemma B.3, Corollary B.5 in [MOO05]), it follows from Equations (A.7) and (A.8) that