Rounding Semidefinite Programming Hierarchies via Global Correlation

Boaz Barak, Prasad Raghavendra, David Steurer

Introduction

This paper is concerned with hierarchies of semi-definite programs (SDP’s). Semidefinite programs are an extremely useful tool in algorithms and in particular approximation algorithms (e.g., [GW95, KMS98]). SDP’s involve finding an integral (say 0/10/1) solution for some optimization problem, by using convex programming to find a fractional/high-dimensional solution and then rounding it into an integral solution. Sherali and Adams [SA90], Lovász and Schrijver [LS91], and, later Lasserre [Las01], proposed systematic ways, known as hierarchies, to make this convex relaxation tighter, thus ensuring that the fractional solution is closer to an integral one. These hierarchies are parameterized by a number rr, called the level or number of rounds of the hierarchy. Given a program on nn variables, optimizing over the rthr^{th} level of the hierarchy can be done in time nO(r)n^{O(r)}. The gap between integral and fractional solutions decreases with rr, and reaches zero at the nthn^{th} level, where the program is guaranteed to find an optimal integral solution. The paper [Lau03] surveys and compares the different hierarchies proposed in the literature, see also the recent survey [CT10].

These semidefinite programming hierarchies have been of some interest in recent years, since they provide natural candidate algorithms for many computational problems. In particular, whenever the basic semidefinite or linear program provides a suboptimal approximation factor, it makes sense to ask how many rounds of the hierarchy are required to significantly improve upon this factor. Unfortunately, taking advantage of these hierarchies has often been difficult, and while some algorithms (e.g., [ARV09]) can be encapsulated in, say, level 33 or 44 of some hierarchies, there have been relatively few results (e.g. [Chl07, BCC+10]) that use higher levels to obtain new algorithmic results. In fact, there has been more success in showing that high levels of hierarchies do not help for many computational problems [ABLT06, STT07, GMPT07, RS09, KS09]. In particular for 3SAT and several other NP-hard problems, it is known that it takes Ω(n)\Omega(n) rounds of the strongest SDP hierarchy (i.e., Lasserre) to improve upon the approximation rate achieved by the basic SDP (or sometimes even simpler algorithms) [Sch08, Tul09].

Semidefinite hierarchies are of particular interest in the case of problems related to Khot’s Unique Games Conjecture (UGC) [Kho02]. Several works have shown that for a wide variety of problems, the UGC implies that (unless P=NP\textbf{P}=\textbf{NP}) the basic semidefinite program cannot be improved upon by any polynomial-time algorithm [KKMO04, MOO05, Rag08]. Thus in particular the UGC predicts that for all these problems, it will take a super-constant (and in fact polynomial, under widely believed assumptions) number of hierarchy rounds to improve upon the basic SDP. Investigating this prediction, particularly for the Unique Games problem itself and other related problems such as Max Cut, Sparsest Cut and Small-Set Expansion, has been the focus of several works, and it is known that at least (loglogn)Ω(1)(\log\log n)^{\Omega(1)} rounds are required for a non-trivial approximation [RS09, KS09] by a natural (though not strongest possible) SDP hierarchy. However, no non-trivial upper bound was known prior to the current work, and so it was conceivable that these lower bounds can be improved to Ω(n)\Omega(n).

Recently, Arora, Barak and Steurer [ABS10] gave a 2npoly(ε)2^{n^{\operatorname{poly}(\varepsilon)}}-time algorithm for solving the Unique Games and Small-Set Expansion problems (where ε\varepsilon is the completeness parameter, see below). However, their algorithm did not use semidefinite programming hierarchies, and so does not immediately imply an upper bound on the number of rounds needed.

Our main contribution is a new method to analyze and round SDP hierarchies. We elaborate more on our method in Section 2, but its high level description is that uses global correlations inside the high-dimensional SDP solution, combined with the hierarchy constraints, to obtain a better rounding of this solution into an integral one. We believe this method can be of general utility, and in particular we use it here to give new algorithms for approximating constraint satisfaction problem on two-variable constraints (2-CSP’s), that run faster than the previously known algorithms for a natural family of instances. To state our results we need the notion of a threshold rank.

Results for Unique Games constraints

We say that a Max 2-Csp instance is a Unique Games instance if all the relation Πi,j\Pi_{i,j} have the form that (a,b)Πi,j(a,b)\in\Pi_{i,j} iff a=πi,j(b)a=\pi_{i,j}(b) where πi,j\pi_{i,j} is a permutation of [k][k]. As mentioned above, the performance of SDP hierarchy on Unique Games instances and related problems is of particular interest. We obtain somewhat stronger quantitative results for Unique Games instances. Also, as remarked below, our results are “morally stronger” in this case, since it’s conceivable that the hardest instances for these types of problems have small threshold rank. First, we show that for Unique Games instances the threshold τ\tau in Theorem 1.1 does not need to depend on the alphabet size. Namely, we prove

The Unique Games Conjecture is about a specific approximation regime for Unique Games. Given a Unique Games instance with optimal value 1ε1-\varepsilon, the goal is to find an assignment with value at least 1/21/2.

We also show that in this case a sublinear (and in fact a small root) number of rounds suffice to get such an approximation in the worst case, regardless of the instance’s threshold rank. Moreover, we also show that such an approximation can be obtained in a number of rounds that depends on the τ\tau-threshold rank for τ\tau that is close to 11 (as opposed to the small value of τ\tau needed for Theorems 1.1 and 1.2).

Examples of graphs with small threshold rank

Also, as noted in [ABS10], hypercontractive graphs (i.e., graphs whose 22 to 44 operator norm is bounded) have at most polylogarithmic τ\tau-threshold rank for every constant τ>0\tau>0. For several 2-CSP’s such as Max Cut, Unique Games, Small-Set Expansion, Sparsest Cut, the constraint graphs for the canonical “problematic instances” (i.e., integrality gap examples [FS02, KV05, KS09, RS09]) are all hypercontractive, since they are based on either the noisy Gaussian graph or noisy Boolean cube. In fact, it is conceivable that the Small-Set Expansion problem is trivial on graphs with large threshold rank, in the sense that we do not know of any example of an instance having, say, logω(1)n\log^{\omega(1)}n 0.990.99-threshold rank, and objective value smaller than 1/21/2. (For the Unique Games and Max Cut problems it is trivial to construct instances with large threshold rank by taking many disjoint copies of the same instance, though it could still be the case that the hardest instances are the ones with small threshold rank.) On the other hand, for other 2-CSPs such as Label Cover, some natural hard instances have linear threshold-rank. For example this is the case if one considers the natural “clause vs. variable” or “clause vs. clause” 2-CSP obtained from random instances of 3SAT (which is not surprising given that a non-trivial approximation for random 3SAT requires Ω(n)\Omega(n) levels of the Lasserre hierarchy [Sch08]).

Algorithm efficiency

Our algorithm actually does not require the full power of the Lasserre hierarchy. First, we can use the relaxed variant with approximate constraints studied in [KS09, RS09, KPS10]. Second, in the proof of Theorem 1.3, we don’t need to utilize the constraints on all (nr)\binom{n}{r} rr-sized subsets of nn variables, but rather sufficiently many random sets suffice. As a result, we can implement our rr-round algorithm in time 2O(r)poly(n)2^{O(r)}\operatorname{poly}(n).

2 Related works

For Unique Games and related problems, previous works [KT07, Kol10, ABS10] used subspace enumeration to give algorithms with similar running time to Theorem 1.3 in the case that the threshold rank of the label extended graph of the instance is small. This is known to be a stronger condition on the instances than bounding the threshold rank of the constraint graph. The only known bound on the 1ε1-\varepsilon threshold rank of the label extended graph in terms of the 1ε1-\varepsilon threshold rank of the constraint graph loses a factor of about nεn^{\varepsilon} [ABS10]. These subspace enumeration algorithms also only applied to nearly satisfiable instances (whose objective value is close to 11), and so did not give guarantees comparable to Theorems 1.1 and 1.2. As mentioned below in Section 2, SDP-based algorithms have some robustness advantages over spectral techniques. SDP hierarchies are also easily shown to yield polynomial-time approximation scheme for 2CSPs whose constraint graphs can have very high threshold rank such as bounded tree width graphs and regular planar graphs (or more generally any hyperfinite family of graphs, see e.g. [HKNO09] and the references therein).

Approximation schemes for (pseudo) dense CSP’s

For general 2CSP’s, several works gave polynomial-time approximation schemes for dense and pseudo-dense instances [FK99, ACOH+10, COCF10]. Our work generalizes these results, since pseudo-density is a stronger condition than having a constraint graph of low threshold rank. Furthermore, for an ε\varepsilon-approximation the degree of the instance needed by these works is exponential in 1ε\frac{1}{\varepsilon}, while the results of this work apply even on random graphs of degree poly(1/ε)poly(1/\varepsilon).

Analyzing SDP hierarchy

Using very different methods, Chlamtac [Chl07] and Bhaskara et al [BCC+10] gave LP/SDP-hierarchy based algorithms for hypergraph coloring and the densest subgraph problem respectively. As mentioned above, several works gave lower bounds for LP/SDP hierarchies. In particular [RS09, KS09] showed that approximation such as those achieved in Theorem 1.3 for Unique Games problem require loglogΩ(1)n\log\log^{\Omega(1)}n rounds of a relaxed variant of the Lasserre hierarchy. This relaxed variant captures our hierarchy as well. Schoenebeck [Sch08] proved that achieving a non-trivial approximation for 3SAT on random instances requires Ω(n)\Omega(n) rounds in the Lasserre hierarchy, while Tulsiani [Tul09] showed that Lasserre lower bounds are preserved under common types of NP-hardness reductions.

In a concurrent and independent work, Guruswami and Sinop [GS11] gave results very similar to ours. They also use the Lasserre hierarchy to get an approximation scheme with similar performance to our Theorem 1.1 for 2-CSPs, and in fact even consider generalizations involving additional (approximate) global linear constraints. They also get essentially the same results for Unique Games as our Theorem 1.3. Furthermore, their rounding algorithm is the same as ours. However, there are some differences both in results and the proof. First, although [GS11] use a notion similar to our local-to-global correlation, they view it differently, and interestingly relate it to the problem of column selection for low rank approximations of matrices. Also, apart from the special case of unique constraints, they work with the threshold rank of the label extended graph, as opposed to the constraint graph as is the case here (however for binary alphabet these two graphs coincide). Their analysis relies on the full power of the Lasserre hierarchy, whereas we show that a weaker hierarchy is sufficient in the Unique Games case, and can even be done faster (i.e., exp(r)poly(n)\exp(r)\operatorname{poly}(n) vs nO(r)n^{O(r)}).

Our techniques

We now describe, on a very high and imprecise level, the ideas behind our rounding algorithm and its analysis. A semidefinite programming relaxation of an optimization problem yields a set of vectors v1,,vnv_{1},\ldots,v_{n} satisfying certain conditions and achieving some objective value cc. The goal of a rounding algorithm is to transform this set of vectors into, say, a +1/1+1/-1 solution, satisfying the same conditions and achieving value cc^{\prime} that is close to cc. At a very high level, our main result is that if these vectors have some non-trivial global correlation, then a good rounding can be achieved with a non-trivially small number of hierarchy rounds. Our second observation is that in several cases, the vectors corresponding to a good SDP solution can be shown to have significant mass inside some low-dimensional subspace, and that implies a lower bound on their global correlation. Below we elaborate on what we mean, using the Max Cut problem (which is a special case of Unique Games) as an illustrative example. Our result for Max Cut is worked out in more detail in Section 4.

The SDP solution for Max Cut problem consists of a sequence V=v1,,vn\mathcal{V}=v_{1},\ldots,v_{n} of unit vectors, and the objective value is the expectation of (1vi,vj)/2(1-\langle v_{i},v_{j}\rangle)/2 over all edges {i,j}\{i,j\} in the input graph. Note that in the case that the vectors v1,,vnv_{1},\ldots,v_{n} are one dimensional unit vectors (i.e., vi{±1}v_{i}\in\{\pm 1\}), V\mathcal{V} exactly corresponds to a cut in the graph, and the objective value measures the fraction of edges cut. Now, suppose that you could find rr vectors vi1,,virVv_{i_{1}},\ldots,v_{i_{r}}\in\mathcal{V}, whom we’ll call the basis vectors, such that every other vVv\in\mathcal{V} has some significant projection ρ\rho into the span of vi1,,virv_{i_{1}},\ldots,v_{i_{r}}. That is, if we let PP be the projection operator corresponding to this space, then for every vVv\in\mathcal{V} , Pv2ρ\lVert Pv\rVert_{2}\geqslant\rho. It turns out that in this case, if ρ\rho is sufficiently close to 11 and the vector solution V\mathcal{V} satisfied r+2r+2 rounds of an appropriate SDP hierarchy, then we can round V\mathcal{V} to achieve a very good cut. The intuition behind this is the following: the constraints of r+2r+2 hierarchy rounds allow us to essentially assume without loss of generality that the vectors vi1,,virv_{i_{1}},\ldots,v_{i_{r}} are one-dimensional. That is, after applying an appropriate rotation, we can think of each one of them as a vector of the form (±1,0,,0)(\pm 1,0,\ldots,0). Moreover, our assumption implies that every other vector in vv has a magnitude of at least ρ\rho in its first coordinate. Now one can show that simply rounding each vector to the sign of its first coordinate will result in a ±1\pm 1 assignment to the vertices corresponding to a good cut.

Local to global correlation

From the above discussion, our goal of rounding SDP hierarchies is reduced to finding a small number of basis vectors vi1,,virv_{i_{1}},\ldots,v_{i_{r}} such that every (or at least most) other vector in the solution V\mathcal{V} has very large projection into their span. But, why should such vectors exist? We show that we can assume they exist if the original Max Cut instance has small threshold rank. The latter is a condition that, as mentioned above, holds for many natural families of instances, including the canonical “hard instances” that are known to fool the GW algorithm— the noisy sphere and noisy Gaussian graphs [FS02, RS09]. The key concept behind our proof is the notion of local vs global correlations. It is a very well known property of expander graphs that random edges behave similarly to pairs of independently chosen vertices with respect to some tests. Specifically, if GG is an nn-vertex expander in the sense that the normalized adjacency matrix AGA_{G}’s second largest eigenvalue is at most ε\varepsilon, and ff is a bounded function mapping vertices to numbers, then we know that \varmathbbEi,j[f(i)f(j)2](1±O(ε))\varmathbbEij[f(i)f(j)2])\operatorname*{\varmathbb{E}}_{i,j}[|f(i)-f(j)|^{2}]\in(1\pm O(\varepsilon))\operatorname*{\varmathbb{E}}_{i\sim j}[|f(i)-f(j)|^{2}]), where the former expectation is over pairs of vertices and the latter is over pairs connected by an edge. In other words, expander graphs imply that if ff is locally correlated over the edges of an expander graph, then it is also globally correlated. In fact, this is easily shown to hold even if ff maps vertices not into numbers but into vectors— i.e., if v1,,vnv_{1},\ldots,v_{n} are unit vectors that are locally correlated over the edges of GG then they are also globally correlated. Indeed, this property of expanders has been used in the work of [AKK+08], who showed that the basic SDP program for Unique Games can be successfully rounded if the input graph is an expander.

Distribution view of SDP’s

Another, often beneficial way to view SDP hierarchies is as providing distribution on integral solutions (see Section 3.2). In this view, for every set of r+2r+2 vertices i1,,ir,ir+1,ir+2i_{1},\ldots,i_{r},i_{r+1},i_{r+2}, the SDP hierarchy provides a distribution Xi1,,XirX_{i_{1}},\ldots,X_{i_{r}} over ±1\pm 1. Moreover, we require that distributions on overlapping sets will be consistent, and that the for every two variables i,ji,j the expectation E[XiXj]E[X_{i}X_{j}] will equal the inner product vi,vj\langle v_{i},v_{j}\rangle of the corresponding vectors. The challenge in rounding the SDP is that there is not necessarily a way to sample simultaneously the random variables X1,,XnX_{1},\ldots,X_{n} in some consistent way. The projection of a vector vv into the span of vi1,,virv_{i_{1}},\ldots,v_{i_{r}} turns out to capture (an appropriate notion of) the mutual information between the variable Xi1X_{i_{1}} and the variables Xi1,,XirX_{i_{1}},\ldots,X_{i_{r}}. Looked at from this viewpoint, our rounding algorithm involves choosing an assignment from the distribution for the basis vertices, and conditioning on its value. As long as (**) holds, we can find a random variable XiX_{i} such that conditioning on XiX_{i} will significantly decrease the entropy of the remaining variables. When we get stuck and (*) is violated, it means that for a typical edge iji\sim j, the random variables XiX_{i} and XjX_{j} are close to being statistically independent. This means that just sampling each XiX_{i} independently will give approximately the same value on a typical constraint.

Threshold rank vs global correlation

Whenever the graph has small number of large eigenvalues, the condition that local correlation implies global correlation holds. This is useful to simulate eigenspace enumeration algorithms such as used by [KT07, Kol10, ABS10, Ste10] since in the case of Unique Games (and other related problems), a good SDP solution must be locally well correlated. But the notion of local to global correlation is somewhat more general and robust than having small threshold rank. For example, adding n\sqrt{n} isolated vertices to a graph will increase correspondingly the number of eigenvectors with value 11, but will actually not change by much the local to global correlation. This captures to a certain extent the fact that SDP-based solutions are more robust than the spectral based algorithms. (A similar example of this phenomenon is that adding a tiny bipartite disjoint graph to the input graph makes the smallest eigenvalue become 1-1, but does not change by much the value of the Goemans-Williamson SDP.) We hope that this robustness of the SDP-based approach will enable further improvements in the future.

Theorem 1.3 considers a different parameter than Theorems 1.1 and 1.2. The latter two results consider threshold ranks for a small (i.e., close to ) threshold τ\tau, and achieve a very good approximation. In contrast, Theorem 1.1 considers threshold τ\tau that is close to 11, but only achieve a rough approximation (corresponding to the approximation guarantee relevant to the unique games conjecture). This is also manifested in some technical differences in the proofs.

Organization

We begin by fixing notation and a few formal definitions in the next section. For the purpose of exposition, we first present an algorithm for Max Cut on low-rank graphs using the Lasserre hierarchy in Section 4. Following this, the general algorithm for 2-CSPs on low-rank graphs is presented in Section 5. The connection between local and global correlations in low-rank graphs that is central to our algorithms, is outlined in Section 6. To implement our general approach in a hierarchy weaker than Lasserre hierarchy, we outline an argument to obtain low-rank approximation to any set of vectors in Section 7. The final section (Section 8) of the paper is devoted to subexponential time algorithm for Unique Games.

Preliminaries

We will use capital letters X,YX,Y to denote random variables, and lower-case letters to denote assignments to these random variables.

The collision probability of XX is defined as

where XX^{\prime} is an independent copy of XX (so that the sequence X,XX,X^{\prime} is i.i.d.). It is easy to see that the variance and collision probability are related by,

2 Local Distributions

3 Lasserre Hierarchy

Let UU be a Unique Games instance with constraint graph G=(V,E)G=(V,E), label set [k]={1,,k}[k]=\{1,\ldots,k\}, and bisections {πij}ijE\{\pi_{ij}\}_{ij\in E}. An mm-round Lasserre solution consists of mm-local random variables X1,,XnX_{1},\ldots,X_{n} and vectors vS,αv_{S,\alpha} for all vertex sets SVS\subseteq V with Sm+2\lvert S\rvert\leqslant m+2 and all local assignments α[k]S\alpha\in[k]^{S}. A Lasserre solution is feasible if the local random variables are consistent with the vectors, in the sense that for all S,TVS,T\subseteq V and α[k]S,β[k]T\alpha\in[k]^{S},\beta\in[k]^{T} with STm+2\lvert S\cup T\rvert\leqslant m+2, we have

The objective is to maximize the following expression

An important consequence of the existence of the vectors vS,αv_{S,\alpha} is that for every set SVS\subseteq V with Sm\lvert S\rvert\leqslant m and local assignment xS[k]Sx_{S}\in[k]^{S}, the matrix {Cov(Xia,XjbXS=xS)}i,jV,a,b[k]\left\{\operatorname*{Cov}(X_{ia},X_{jb}\mid X_{S}=x_{S})\right\}_{i,j\in V,\,a,b\in[k]} is positive semidefinite.

Warmup – MaxCut Example

For the sake of exposition, we first present an algorithm for the Max Cut problem on low-rank graphs. In the Max Cut problem, the input consists of a graph G=(V,E)G=(V,E) and the goal is to find a cut SSˉ=VS\cup\bar{S}=V of the vertices that maximizes the number of edges crossing, i.e., maximizes E(S,Sˉ)|E(S,\bar{S})|.

The Goemans-Williamson SDP relaxation for the problem assigns a unit vector viv_{i} for every vertex iVi\in V, so as to maximize the average squared length Ei,jEvivj2E_{i,j\in E}\lVert v_{i}-v_{j}\rVert^{2} of the edges. Formally, the SDP relaxation is given by,

Stronger SDP relaxations produced by hierarchies such as Sherali-Adams and Lasserre hierarchy also yield probability distributions over local assignments.

More precisely, given a mm-round Lasserre SDP solution, it can be associated with a set of mm-local random variables X1,,XnX_{1},\ldots,X_{n} taking values in {1,1}\{-1,1\}. For an edge (i,j)(i,j), its contribution to the SDP objective value (vivj2\lVert v_{i}-v_{j}\rVert^{2}) is equal to the probability that the edge (i,j)(i,j) is cut under the distribution of local assignments μij\mu_{ij}, namely,

Consequently, in order to obtain a cut with value close to the SDP objective, it is sufficient to jointly sample X1,,XnX_{1},\ldots,X_{n}, such that on every edge (i,j)(i,j) the distribution of XiX_{i} and XjX_{j} is close to the corresponding local distribution μij\mu_{ij}. However, the variables X1,,XnX_{1},\ldots,X_{n} are not jointly distributed, and hence cannot all be sampled together.

As a first attempt, let us suppose we sample each XiX_{i} independently from its associated marginal μi\mu_{i}. If on most edges (i,j)(i,j), the distribution of the resulting samples Xi,XjX_{i},X_{j} is close to μij\mu_{ij}, then we are done. On an edge (i,j)(i,j), the local distribution μij\mu_{ij} is far from the independent sampling distribution μi×μj\mu_{i}\times\mu_{j} only if the random variables Xi,XjX_{i},X_{j} are correlated. Henceforth, these correlations across the edges would be refered to as “local correlations". A natural measure for correlations that we will utilize here is defined as Cov(Xi,Xj)=\varmathbbE[XiXj]\varmathbbE[Xi]\varmathbbE[Xj]\operatorname*{Cov}(X_{i},X_{j})=\operatorname*{\varmathbb{E}}[X_{i}X_{j}]-\operatorname*{\varmathbb{E}}[X_{i}]\operatorname*{\varmathbb{E}}[X_{j}]. Using this measure, the statistical distance between independent sampling (μi×μj\mu_{i}\times\mu_{j}) and correlated sampling (μij\mu_{ij}) is given by

(See Lemma 5.1 for a more general version of the above bound).

On the flip side, the existence of correlations makes the problem of sampling X1,,XnX_{1},\ldots,X_{n} easier! If two variables Xi,XjX_{i},X_{j} are correlated, then sampling/fixing the value of XiX_{i} reduces the uncertainty in the value of XjX_{j}. More precisely, conditioning on the value of XiX_{i} reduces the variance of XjX_{j} as shown below:

Therefore, if we pick an iVi\in V at random and fix its value then the expected decrease in the variance of all the other variables is given by,

The above bound is proven in a more general setting in Lemma 5.2. As all random variables involved have variance at most 11, we can rewrite the above expression as,

The decrease in the variance is directly related to the global correlations between random pairs of vertices i,jVi,j\in V.

Recall that, the failure of independent sampling yields a lower bound on the average local correlations on the edges namely, Ei,jECov(Xi,Xj)E_{i,j\in E}|\operatorname*{Cov}(X_{i},X_{j})|. The crucial observation is that if the graph GG is a good expander in a suitable sense, then these local correlations translate in to non-negligible global correlations. Formally, we show the following (in Section 6):

Let v1,,vn{\bm{v}}_{1},\ldots,{\bm{v}}_{n} be vectors in the unit ball. Suppose that the vectors are correlated across the edges of a regular nn-vertex graph GG,

Then, the global correlation of the vectors is lower bounded by

As random variables XiX_{i} arise from the solution to a SDP, the matrix (Cov(Xi,Xj))i,jV\left(\operatorname*{Cov}(X_{i},X_{j})\right)_{i,j\in V} is positive semidefinite, i.e., there exists vectors uiu_{i} such that ui,uj=Cov(Xi,Xj)i,jV\langle u_{i},u_{j}\rangle=\operatorname*{Cov}(X_{i},X_{j})\,\,\forall i,j\in V. Let us consider the vectors vi=ui2v_{i}=u_{i}^{\otimes 2}. Suppose the local correlation \varmathbbEi,jECov(Xi,Xj)\operatorname*{\varmathbb{E}}_{i,j\in E}|\operatorname*{Cov}(X_{i},X_{j})| is at least ε\varepsilon then we have,

and \varmathbbEi[vi2]1\operatorname*{\varmathbb{E}}_{i}[\lVert v_{i}\rVert^{2}]\leqslant 1. If the graph GG is low-rank, then by Lemma 4.1 we get a lower bound on the global correlation of the vectors viv_{i}, namely

General 2-CSP on Low Rank Graphs

Let \Im be a (general) Max 2-Csp instance with variable set V=[n]V=[n] and label set [k][k]. (We represent \Im as a distribution over triples (i,j,Π)(i,j,\Pi), where i,jVi,j\in V and Π[k]×[k]\Pi\subseteq[k]\times[k] is an arbitrary binary predicate. The goal is to find an assignment x[k]Vx\in[k]^{V} that maximizes the probability \varmathbbP(i,j,Π){(xi,xj)Π}\operatorname*{\varmathbb{P}}_{{(i,j,\Pi)\sim\Im}}\left\{(x_{i},x_{j})\in\Pi\right\}.)

For simplicity,If the constraint graph is not regular, all of our results still hold for an appropriate definition of threshold rank. we will assume that the constraint graph of \Im is regular, i.e., every variable iVi\in V appears in the same number of constraints. (Since we allow the constraints to be weighted, the precise condition is that the total weight of the constraints incident to a vertex is the same for every vertex.)

Let X1,,XnX_{1},\ldots,X_{n} be rr-local random variables with range [k][k]. We write XiaX_{ia} to denote the {0,1}\{0,1\}-indicator of the event Xi=aX_{i}=a. Notice that {Xia}iV,a[k]\{X_{ia}\}_{i\in V,\,a\in[k]} are also mm-local random variables.

For two random variables XX and XX^{\prime} with the same range, we denote their statistical distance,

The following lemma shows that the statistical difference between independent sampling and correlated sampling is explained by local correlation.

Under the distribution {XiXj}\{X_{i}X_{j}\}, the event {Xi=a,Xj=b}\{X_{i}=a,X_{j}=b\} has probability \varmathbbEXiaXjb\operatorname*{\varmathbb{E}}X_{ia}X_{jb}. On the other hand, under the product distribution {Xi}{Xj}\{X_{i}\}\{X_{j}\}, this event has probability \varmathbbEXia\varmathbbEXjb\operatorname*{\varmathbb{E}}X_{ia}\operatorname*{\varmathbb{E}}X_{jb}. Hence, the difference of these probabilities is equal to \varmathbbEXiaXjb\varmathbbEXia\varmathbbEXjb=Cov(Xia,Xjb)\operatorname*{\varmathbb{E}}X_{ia}X_{jb}-\operatorname*{\varmathbb{E}}X_{ia}\operatorname*{\varmathbb{E}}X_{jb}=\operatorname*{Cov}(X_{ia},X_{jb}). ∎

Conditional Variance and Pairwise Correlation

The following lemma shows that conditioning on a variable XjX_{j} decreases the variance of a variable XiX_{i} by the correlation of the variables XiaX_{ia} and XjbX_{jb}.

Pairwise Correlations and Inner Products

Suppose that the matrix (Cov(Xia,Xjb))iV,a[k]\left(\operatorname*{Cov}(X_{ia},X_{jb})\right)_{i\in V,\,a\in[k]} is positive semidefinite. Then, there exists vectors v1,,vn{\bm{v}}_{1},\ldots,{\bm{v}}_{n} in the unit ball such that for all vertices i,jVi,j\in V,

Local Correlation vs Global Correlation on Low-Rank Graphs

The following lemma shows that local correlation (correlation across edges of a graph) implies global correlation (correlation between random vertices) if the graph has low threshold rank. (Proof in Section 6.)

Let v1,,vn{\bm{v}}_{1},\ldots,{\bm{v}}_{n} be vectors in the unit ball. Suppose that the vectors are correlated across the edges of a regular nn-vertex graph GG,

Then, the global correlation of the vectors is lower bounded by

Putting Things Together

The following lemma shows that either independent sampling is statistically close to correlated sampling across edges of a graph or the typical variance of a vertex decreases non-trivially by conditioning on a random vertex.

Let GG be a regular nn-vertex graph and ε\varepsilon be the expected statistical distance between independent and correlated sampling across the edges of GG,

Further, suppose that the matrix (Cov(Xia,Xjb))iV,a[k]\left(\operatorname*{Cov}(X_{ia},X_{jb})\right)_{i\in V,\,a\in[k]} is positive semidefinite. Then, conditioning on a random vertex decreases the variances by

Let v1,,vn{\bm{v}}_{1},\ldots,{\bm{v}}_{n} be the vectors constructed in Lemma 5.3. By Lemma 5.3 and Lemma 5.1, the local correlation of these vectors is at least

(The last step also uses Cauchy–Schwartz.) Hence, Lemma 4.1 implies the following lower bound on the global correlation of these vectors,

Lemma 5.3 and Lemma 5.2 allows us to relate the expected decrement of the variances to the global correlation of the vectors v1,,vn{\bm{v}}_{1},\ldots,{\bm{v}}_{n},

The following lemma asserts that if the constraint graph has low threshold rank then there exists a partial assignment xSx_{S} to a small set SS of vertices such that independent sampling conditioned on this assignment xSx_{S} gives almost the same value as correlated sampling (without conditioning on the assignment xSx_{S}).

Algorithm 5.5 (Propagation Sampling). Input: rr-local random variables X1,,XnX_{1},\ldots,X_{n} over [k][k] Output: (global) distribution over assignments x[k]Vx\in[k]^{V}. 1. Choose m{1,,r}m\in\{1,\ldots,r\} at random. 2. Sample a random set of “seed vertices” SVmS\in V^{m}. (Repeated vertices are allowed.) 3. Sample a assignment xS[k]Sx_{S}\in[k]^{S} for SS according to its local distribution {XS}\{X_{S}\}. 4. For every other vertex iVSi\in V\setminus S, sample a label xi[k]x_{i}\in[k] according to the local distribution for S{i}S\cup\{i\} conditioned on the assignment xSx_{S} for SS.

To prove the current theorem it is enough to show that \varmathbbEm[r]εmε\operatorname*{\varmathbb{E}}_{m\in[r]}\varepsilon_{m}\leqslant\varepsilon. For mrm\leqslant r, define a non-negative potential Φm\Phi_{m} as follows

Let m[r]m\in[r]. Suppose εmε/2\varepsilon_{m}\geqslant\varepsilon/2. Then,

The following theorem directly implies Theorem 1.1.

An optimal rr-round Lasserre solution gives rise to rr-local random variables X1,,XnX_{1},\ldots,X_{n} over [k][k]. Let XiaX_{ia} be the indicator variable of the event Xi=aX_{i}=a. The matrices {Cov(Xia,XjbXS=xS)}i,jV,a,b[k]\{\operatorname*{Cov}(X_{ia},X_{jb}\mid X_{S}=x_{S})\}_{i,j\in V,\,a,b\in[k]} are positive semidefinite for all sets SVS\subseteq V with Sr\lvert S\rvert\leqslant r and local assignments xS[k]Sx_{S}\in[k]^{S}. Furthermore, the Lasserre solution satisfies

Let X1,,XnX_{1}^{\prime},\ldots,X^{\prime}_{n} be the jointly-distributed (global) random variables in Theorem 5.6. By Theorem 5.6, we can estimate the expected value of the assignment X1,,XnX^{\prime}_{1},\ldots,X^{\prime}_{n} as

1 Special case of Unique Games

The following lemma is a version of Lemma 5.3 tailored towards Unique Games. The advantage of this version of the lemma is that the bounds are independent of the alphabet size kk.

Let X1,,XnX_{1},\ldots,X_{n} be rr-local random variables over [k][k] and let XiaX_{ia} be the indicator of the event Xi=aX_{i}=a. Suppose that the matrix (Cov(Xia,Xjb))iV,a[k]\left(\operatorname*{Cov}(X_{ia},X_{jb})\right)_{i\in V,\,a\in[k]} is positive semidefinite. Then, there exists vectors v1,,vn{\bm{v}}_{1},\ldots,{\bm{v}}_{n} in the unit ball such that for all vertices i,jVi,j\in V and permutations π\pi of [k][k],

The following theorem immediately implies Theorem 1.2. Let \Im be a Unique Games instance with alphabet size kk and constraint graph GG.

Let X1,,XnX_{1},\ldots,X_{n} be rr-local random variables over [k][k] from an optimal rr-round Lasserre solution for \Im. The local variables satisfy

For a permutation π\pi of [k][k], we define a modified version of statistical distance,

Therefore, we can estimate the expected fraction of satisfied constraints as

Local Correlation implies Global Correlation in Low-Rank Graphs

Let GG be a regular graph with vertex set V={1,,n}V=\{1,\ldots,n\}. We identify GG with its normalized adjacency matrix, a symmetric stochastic matrix. Let λ1λn\lambda_{1}\geqslant\ldots\geqslant\lambda_{n}\in be the eigenvalues of GG in non-increasing order.

The following lemma shows that a violation of the local vs global correlation condition implies that the graph has high threshold rank.

Suppose there exist vectors v1,,vn\varmathbbRnv_{1},\ldots,v_{n}\in\varmathbb R^{n} such that

Then for all C>1C>1, λ(11/C)m1Cε\lambda_{(1-1/C)m}\geqslant 1-C\cdot\varepsilon. In particular, λm/2>12ε\lambda_{m/2}>1-2\varepsilon.

Let X=(xr,s)r,s[n]X=(x_{r,s})_{r,s\in[n]} be the Gram matrix (vi,vj)i,jV(\langle{\bm{v}}_{i},{\bm{v}}_{j}\rangle)_{i,j\in V} represented in the eigenbasis of GG, so that

Let mm^{\prime} be the largest index such that λm1Cε\lambda_{m^{\prime}}\geqslant 1-C\cdot\varepsilon. Notice that the numbers p1=x1,1,,pn=xn,np_{1}=x_{1,1},\ldots,p_{n}=x_{n,n} form a probability distribution over r[n]r\in[n]. Let q=i=1mpiq=\sum_{i=1}^{m^{\prime}}p_{i} be the probability of the event rmr\leqslant m^{\prime}. Using Cauchy–Schwarz, we can bound this probability in terms of mm,

On the other hand, we can bound the expectation of λr\lambda_{r} with respect to the probability distribution (p1,,n)(p_{1},\ldots,_{n}) in terms of this probability qq,

It follows that m(1\nicefrac1C)mm^{\prime}\geqslant\left(1-\nicefrac{{1}}{{C}}\right)\cdot m, which gives the desired conclusion that GG has at least (1\nicefrac1C)m\left(1-\nicefrac{{1}}{{C}}\right)\cdot m eigenvalues λrCε\lambda_{r}\geqslant-C\cdot\varepsilon. ∎

Note that Lemma 4.1 follows directly from the previous lemma by picking C=(1ρ/100)(1ρ)C=\frac{(1-\rho/100)}{(1-\rho)} and observing that \varmathbbEi,jVvi,vj\varmathbbEi,jVvi,vj2\operatorname*{\varmathbb{E}}_{i,j\in V}|\langle{\bm{v}}_{i},{\bm{v}}_{j}\rangle|\geqslant\operatorname*{\varmathbb{E}}_{i,j\in V}|\langle{\bm{v}}_{i},{\bm{v}}_{j}\rangle|^{2} since vi,vj1|\langle{\bm{v}}_{i},{\bm{v}}_{j}\rangle|\leqslant 1 for all i,jVi,j\in V

As a converse to Lemma 6.1, the following lemma shows that if a graph has many eigenvalues close to 11, then there exist vectors for the vertices of the graph with high local correlation and low global correlation.

If λm1ε\lambda_{m}\geqslant 1-\varepsilon, then there exist vectors v1,,vn\varmathbbRmv_{1},\ldots,v_{n}\in\varmathbb R^{m} such that

Let f(1),,f(m) ⁣:V\varmathbbRf^{(1)},\ldots,f^{(m)}\colon V\to\varmathbb R be orthonormal eigenfunctions of GG with eigenvalue larger than 1ε1-\varepsilon. Consider vectors v1,,vn\varmathbbRmv_{1},\ldots,v_{n}\in\varmathbb R^{m} satisfying vi,vj=\varmathbbEr[m]fi(r)fj(r).\langle v_{i},v_{j}\rangle=\operatorname*{\varmathbb{E}}_{r\in[m]}f^{(r)}_{i}f^{(r)}_{j}. Since the functions f(r)f^{(r)} have norm 11, the typical squared norm of the vectors viv_{i} satisfies

Since the eigenvalues of the eigenfunctions f(r)f^{(r)} are larger than 1ε1-\varepsilon, we can lower bound the local correlation of the vectors viv_{i},

Finally, since the function f(m)f^{(m)} are orthonormal, the global correlation of the vectors viv_{i} is

The condition that there exist vectors v1,,vn\varmathbbRnv_{1},\ldots,v_{n}\in\varmathbb R^{n} with

is equivalent to the condition that there exists a symmetric positive semidefinite matrix X\varmathbbRV×VX\in\varmathbb R^{V\times V} such that

On Low Rank Approximations to Sets of Vectors

Let v1,,vn\varmathbbRnv_{1},\ldots,v_{n}\in\varmathbb R^{n} be vectors in the unit ball. Then for every ε>0\varepsilon>0, there exists a subset U{v1,,vn}U\subseteq\{v_{1},\ldots,v_{n}\} with U1/ε\lvert U\rvert\leqslant 1/\varepsilon such that \varmathbbEi,j[n]wiwjwˉi,wˉj2ε\operatorname*{\varmathbb{E}}_{i,j\in[n]}\lVert w_{i}\rVert\,\lVert w_{j}\rVert\,\langle\bar{w}_{i},\bar{w}_{j}\rangle^{2}\leqslant\varepsilon, where wiw_{i} is the projection of viv_{i} to the orthogonal complement of the span of UU.

The proof of Theorem 7.1 is by an iterative construction. In each iteration, we will use the following lemma.

Let v1,,vn\varmathbbRnv_{1},\ldots,v_{n}\in\varmathbb R^{n} be vectors. Then, there exists a unit vector u{vˉ1,,vˉn}u\in\{\bar{v}_{1},\ldots,\bar{v}_{n}\} such that the vectors v1,,vnv^{\prime}_{1},\ldots,v^{\prime}_{n} with vi=vivi,uuv^{\prime}_{i}=v_{i}-\langle v_{i},u\rangle u satisfy the following condition,

Suppose we pick a random index j[n]j\in[n] and choose u=vˉju=\bar{v}_{j}. In this case, the squared norm of the vectors vi=vivi,uuv^{\prime}_{i}=v_{i}-\langle v_{i},u\rangle u equals

Hence, we can estimate the expected decrease of the typical squared norms for a random vector u{vˉ1,,vˉn}u\in\{\bar{v}_{1},\ldots,\bar{v}_{n}\}.

It follows that there exists a unit vector u{vˉ1,,vˉn}u\in\{\bar{v}_{1},\ldots,\bar{v}_{n}\} such that the vectors vi=vivi,uuv^{\prime}_{i}=v_{i}-\langle v_{i},u\rangle u have the desired property

We can construct the set UU in a greedy fashion so as to minimize the total squared norm of the vectors w1,,wnw_{1},\ldots,w_{n} (the projections of the vectors viv_{i} to the orthogonal complement of the span of UU). (In fact, we could choose set UU randomly.) To make the analysis more convenient, we use the following, slightly different construction.

Let vi(1)=viv^{(1)}_{i}=v_{i} for all i[n]i\in[n].

For tt from 11 to 1/ε1/\varepsilon, construct vectors u(t)\varmathbbRnu^{(t)}\in\varmathbb R^{n} and v1(t+1),,vn(t+1)\varmathbbRnv^{(t+1)}_{1},\ldots,v^{(t+1)}_{n}\in\varmathbb R^{n} as follows:

Using Lemma 7.2, pick a unit vector u(t){vˉ1(t),,vˉn(t)}u^{(t)}\in\{\bar{v}^{(t)}_{1},\ldots,\bar{v}^{(t)}_{n}\} such that the vectors vi(t+1)=vi(t)vi(t),u(t)u(t)v^{(t+1)}_{i}=v^{(t)}_{i}-\langle v^{(t)}_{i},u^{(t)}\rangle u^{(t)} satisfy the condition

Notice that the vectors v1(t),,vn(t)v^{(t)}_{1},\ldots,v^{(t)}_{n} are the projections of the vector v1,,vnv_{1},\ldots,v_{n} into the orthogonal complement of the span of the vectors u(1),,u(t1)u^{(1)},\ldots,u^{(t-1)}. Let UU be the set of all indices jj such that u(t)=vˉj(t)u^{(t)}=\bar{v}^{(t)}_{j} for some t{1,,1/ε}t\in\{1,\ldots,1/\varepsilon\}. We can verify that the vectors u(1),,u(1/ε)u^{(1)},\ldots,u^{(1/\varepsilon)} are an orthonormal basis of the span of UU. Let w1,,wnw_{1},\ldots,w_{n} be the projections of the vectors v1,,vnv_{1},\ldots,v_{n} into the orthogonal complement of the span of UU (so that wi=vi(1/ε)w_{i}=v^{(1/\varepsilon)}_{i}). Since the vectors w1,,wnw_{1},\ldots,w_{n} are projections of the vectors v1(t),,vn(t)v^{(t)}_{1},\ldots,v^{(t)}_{n} for all t1,,1/εt\in{1,\ldots,1/\varepsilon}, it follows that

Hence, we can bound the typical squared norm of the vectors wiw_{i},

Since the left-hand side is nonnegative and \varmathbbEi[n]vi21\operatorname*{\varmathbb{E}}_{i\in[n]}\lVert v_{i}\rVert^{2}\leqslant 1, it follows that \varmathbbEi,j[n]wiwjwˉi,wˉj2ε,\operatorname*{\varmathbb{E}}_{i,j\in[n]}\lVert w_{i}\rVert\,\lVert w_{j}\rVert\,\langle\bar{w}_{i},\bar{w}_{j}\rangle^{2}\leqslant\varepsilon\,, as desired. ∎

For our applications it will sometimes be convenient to associate different subspace with subsets UU of vectors (in Theorem 7.1, we associate the span of vectors in UU with the subset UU).

Let v1,,vn\varmathbbRnv_{1},\ldots,v_{n}\in\varmathbb R^{n} be vectors in the unit ball. For every subset UVU\subseteq V, let QUQ_{U} be the projector on some subspace orthogonal to the span of UU. (Note that QUQ_{U} is not necessarily the projector on the orthogonal complement of the span of UU.) Then for every ε>0\varepsilon>0, there exists a subset U{v1,,vn}U\subseteq\{v_{1},\ldots,v_{n}\} with U1/ε\lvert U\rvert\leqslant 1/\varepsilon such that \varmathbbEi,j[n]wiwjwˉi,wˉj2ε\operatorname*{\varmathbb{E}}_{i,j\in[n]}\lVert w_{i}\rVert\,\lVert w_{j}\rVert\,\langle\bar{w}_{i},\bar{w}_{j}\rangle^{2}\leqslant\varepsilon, where wi=QUviw_{i}=Q_{U}v_{i}.

We use the same construction as in the proof of Theorem 7.1. The only difference is that we define vi(t+1)=PU(t)viv^{(t+1)}_{i}=P_{U^{(t)}}v_{i} (instead of vi(t+1)=vi(t)vi(t),u(t)u(t)v^{(t+1)}_{i}=v^{(t)}_{i}-\langle v^{(t)}_{i},u^{(t)}\rangle u^{(t)}). Here, U(t)U^{(t)} is the set of all indices jj such that u(t)=vˉj(t)u^{(t^{\prime})}=\bar{v}_{j}^{(t^{\prime})} for some ttt^{\prime}\leqslant t. The proof is still applies to this modifies construction because vi(t+1)vi(t)vi(t),u(t)u(t)\lVert v^{(t+1)}_{i}\rVert\leqslant\lVert v^{(t)}_{i}-\langle v^{(t)}_{i},u^{(t)}\rangle u^{(t)}\rVert (which is the only fact used about these vectors). ∎

Rounding SDP Solutions to Unique Games

In this section, we will present a subexponential time algorithm for Unique Games based on a SDP hierarchy, namely the simple SDP augmented with Sherali-Adams hierarchy. This hierarchy of relaxations weaker than the Lasserre hierarchy was studied in some earlier works [RS09, KS09]. Roughly speaking, the mmth round relaxation in this hierarchy corresponds to the basic semidefinite program, along with all valid constraints on at most mm vectors. Formally, the variables in the mmth round relaxation for Unique Games consists of

A set of vectors V={via}iV,a[k]\mathcal{V}=\{v_{ia}\}{i\in V,a\in[k]} with kk orthogonal vectors for every vertex iVi\in V.

The constraint of the SDP relaxation ensure that the inner products of the vectors are consistent with the corresponding local distributions, i.e., for all SVS\subseteq V, Sm\lvert S\rvert\leqslant m i,jSi,j\in S and a,b[k]a,b\in[k],

The objective value of the SDP corresponds to minimizing the number of violated constraints,

Sample a assignment xS[k]Sx_{S}\in[k]^{S} for SS according to its local distribution μS\mu_{S}.

For every other vertex iVSi\in V\setminus S, sample a label xi[k]x_{i}\in[k] according to the local distribution for S{i}S\cup\{i\} conditioned on the assignment xSx_{S} for SS.

The above procedure will be referred to as propagation rounding and the set SS of vertices will be called the seed vertices.

The following lemma implies that if the seed vertices SS nearly determine the values of a set of vertices TT, then the assignment output by the propagation rounding has a distribution similar to the local distribution μT\mu_{T} that is part of the LP/SDP solution (hence gets close to the SDP value).

For a set SVS\subseteq V S=mt|S|=m-t, let μS\mu^{|S} denote the distribution over global assignments x[k]Vx\in[k]^{V} output by propagation rounding with SS as the seed vertex set. Then, for every subset TT with Tt|T|\leqslant t we have

Sample a assignment xS[k]Sx_{S}\in[k]^{S} for SS according to its local distribution μS\mu_{S},

Sample an assignment yT[k]Ty_{T}\in[k]^{T} according to the local distribution for STS\cup T conditioned on the assignment xSx_{S} for SS,

For every vertex tTt\in T, sample a label xt[k]x_{t}\in[k] according to the local distribution for S{t}S\cup\{t\} conditioned on the assignment xSx_{S} for SS,

Clearly the distribution of yTy_{T} is μT\mu_{T}, while the distribution of xTx_{T} is μTS\mu^{|S}_{T}. For any tTt\in T, the coordinates xtx_{t} and yty_{t} are independent samples from μS,txS\mu_{S,t}\mid x_{S}. Therefore we have,

Averaging over the different choices of xSx_{S},

2 Unique Games on Low Rank Graphs

Let GG be an instance of unique games whose constraint graph GG has low threshold rank. Let V={via}iV,a[k]\mathcal{V}=\{v_{ia}\}_{i\in V,a\leqslant[k]} be an SDP solution for GG, and let {μS}SV,Sm\{\mu_{S}\}_{S\subseteq V,|S|\leqslant m} denote the associated set of locla distributions. Let X1,,XnX_{1},\ldots,X_{n} denote the associated mm-local random variables. The main result of this section shows that there exists a small set of seed vertices fixing whose value determines the value of almost every other vertex. Formally, we show the following

For every integer mm, there exists a subset of vertices SVS\subseteq V of size S=k2m|S|=k^{2}m such that

To this end, we will relate conditioning a random variable XiX_{i} on a set XSX_{S}, to projecting the SDP vectors corresponding to the variable XiX_{i} in to the span of the vectors corresponding to XSX_{S}. This analogy is formalized in the following lemma.

Let X1,X2,,XrX_{1},X_{2},\ldots,X_{r} be random variables with range [k][k] with a joint distribution μ\mu associated with them. For each i[r]i\in[r], a[k]a\in[k], let XiaX_{ia} be the indicator of the event that Xi=aX_{i}=a. Let us suppose there exists vectors {via}i[r],a[k]\{v_{ia}\}_{i\in[r],a\in[k]} such that

Let us suppose via=jS,b[k]cjbvjb+PSviav_{ia}=\sum_{j\in S,b\in[k]}c_{jb}v_{jb}+P_{S}v_{ia}. Define a random variable CSC_{S} as follows,

Note that on fixing the values {Xj}jS\{X_{j}\}_{j\in S}, the random variable CSC_{S} is fixed.

By the definition of variance of a real random variable we have the following inequality.

Averaging the above inequality over the settings of xSx_{S}, we get

Note that the second moments of the random variables {Xia}i[r],a[k]\{X_{ia}\}_{i\in[r],a\in[k]} match with the corresponding inner products of vectors {via}i[r],a[k]\{v_{ia}\}_{i\in[r],a\in[k]}. Hence,

The claim (1) follows from (8.1) and (8.2).

The claim (2) follows from (1) and the definition of variance of a random variable taking values over [k][k]. ∎

For a subset TV={via}iV,a[k]T\subseteq\mathcal{V}=\{v_{ia}\}_{i\in V,a\in[k]}, let STS_{T} be the set of vertices associated with it namely,

Let QTQ_{T} denote the projector on to the subspace orthogonal to span of {viaiST,a[k]}\{v_{ia}|i\in S_{T},a\in[k]\}. In particular, QTQ_{T} is a projector on to a subspace orthogonal to TT for all TVT\subseteq\mathcal{V}.

Apply Theorem 7.3 on the set of vectors V={via}iV,a[k]\mathcal{V}=\{v_{ia}\}_{i\in V,a\in[k]} with the projectors QTQ_{T} for a subset TVT\subseteq\mathcal{V}. Theorem 7.3 implies that there exists a choice of TVT\subseteq\mathcal{V} of size T=k2m|T|=k^{2}m such that if uia=QTviau_{ia}=Q_{T}v_{ia} then,

From Lemma 8.5, the low global correlation of vectors {Ui}iV\{U_{i}\}_{i\in V} implies that their squared length is small, i.e.,

If V\mathcal{V} is an SDP solution to unique games with value 1η1-\eta, i.e.,

If the vectors {Ui}iV\{U_{i}\}_{i\in V} satisfy,

then the average correlation among the vectors {Ui}iV\{U_{i}\}_{i\in V} is at least \nicefrac1m\nicefrac{{1}}{{m}}, i.e.,

By Lemma 8.4, the vectors {Ui}\{U_{i}\} satisfy

Let \varmathbbEiUi2=C4η/λm\operatorname*{\varmathbb{E}}_{i}\lVert U_{i}\rVert^{2}=C\geqslant 4\eta/\lambda_{m}. Normalize the vectors UiU_{i} so as to make their average squared length equal to 11. The resulting vectors have correlation at least (1η/C)1λm/2(1-\eta/C)\geqslant 1-\lambda_{m}/2. By Lemma 6.1, this implies that \varmathbbEi,jVUi,Uj21m\operatorname*{\varmathbb{E}}_{i,j\in V}\langle U_{i},U_{j}\rangle^{2}\geqslant\frac{1}{m}. Since Ui1\lVert U_{i}\rVert\leqslant 1 for all iVi\in V, we get

3 Wrapping Up

Our main result about Unique Games (Theorem 1.3) is a direct consequence of Theorem 8.6 and Theorem 8.7 presented here.

For every positive integer mm, there exists an algorithm running in time nO(mk2)n^{O(mk^{2})} that given a unique games instance Γ\Gamma over alphabet [k][k] with value 1η1-\eta, finds a labelling satisfying 1O(ηλm)1-O(\frac{\eta}{\lambda_{m}}) fraction of the edges. Here λm\lambda_{m} is the mthm^{th} smallest eigen value of the Laplacian of the constraint graph Γ\Gamma.

The algorithm proceeds by solving the k2m+2k^{2}m+2-round Lasserre SDP for the given instance. Starting with the SDP solution, the algorithm runs the propagation rounding algorithm starting from every possible seed set SS of size S=k2m|S|=k^{2}m.

By Lemma 8.2, there exists one such set SS for which we have,

Let μS\mu^{|S} denote the distribution over global assignments output by the propagation rounding scheme. For an edge (i,j)(i,j), let μij\mu_{ij} denote the local distribution over [k]2[k]^{2} suggested by the SDP solution. From Lemma 8.1, the statistical distance between μij\mu_{ij} and μijS\mu^{|S}_{ij} is at most

Averaging over all the edges we see that,

where Val(V)\mathsf{Val}(\mathcal{V}) is the SDP objective value of the solution V\mathcal{V}. Along with (8.5), this implies that the algorithm on the choice of the appropriate seed set SS would find a solution with value at least 1ηO(ηλm)1-\eta-O(\frac{\eta}{\lambda_{m}}). ∎

There exists an algorithm that given a Unique Games instance Γ\Gamma with vertex set [n][n], label set [k][k], and optimal value 1ε1-\varepsilon, finds an assignment with value at least \nicefrac12\nicefrac{{1}}{{2}} by rounding an k2nO(ε1/3)k^{2}\cdot n^{O(\varepsilon^{1/3})}-round Lasserre solution.

The proof follows by combining our propagation rounding and the decomposition theorem of [ABS10]. The latter result allows us to partition the input graph into disjoint components each with 1cε1-c\varepsilon rank at most nO(ε1/3)n^{O(\varepsilon^{1/3})} by removing at most 0.010.01 fraction of the edges in our input graph. An SDP solution for the input graph induces a solution for each of the components, and hence we can round the solution for each component separately using propagation rounding. ∎

Conclusions

We have shown that nO(ε1/3)n^{O(\varepsilon^{1/3})} rounds of an SDP hierarchy suffice for solving the Unique Games problem on (1ε)(1-\varepsilon)-satisfiable instances. The best lower bound known for the hierarchy we used is loglogΩ(1)n\log\log^{\Omega(1)}n [RS09, KS09], and so a natural question, with obvious relevance to the unique games conjecture, is which bound is closer to the truth. The fact that our algorithm’s running time for rr rounds is only 2O(r)2^{O(r)} (as opposed to nO(r)n^{O(r)}), challenges the interpretation of lower bounds in the range [ω(1),\O(logn)][\omega(1),\O(\log n)] as corresponding to super-polynomial running time, and so provides further motivation to the question of whether the current hierarchy lower bounds can be improved further.

With the exception of the Small-Set Expansion problem, we do not know how to translate algorithms for Unique Games into other computational problems. We hope that our ideas will help in combining the [ABS10] subexponential algorithm for Unique Games with SDP-based method to make progress on other Unique Games-hard computational problems. Indeed, Arora and Ge (personal communication) recently used the ideas of this work to obtain improved algorithms for 33-coloring on some interesting families of instances. A concrete open question along similar lines is whether one can get an algorithm for the Max Cut problem with approximation factor ε\varepsilon better than the factor of the Goemans-Williamson algorithm that runs in time exp(npoly(ε))\exp(n^{\operatorname{poly}(\varepsilon)}).

For general 2-CSPs, we know that some instances will require a large number of hierarchy rounds, but it’s interesting to see whether there is any clean characterization of the instances on which SDP hierarchies do well, encompassing, say, both low threshold rank graphs and planar graphs. Another interesting question is to find the right generalization of the low threshold rank condition to kk-CSPs for k>2k>2.

References

Appendix A Faster Algorithms for SDP hierarchies

In this section, we argue that our rounding algorithm also works with weaker SDP hierarchies. We will show that for these weaker hierarchies, a near-optimal mm-round solution can be computed in time 2O(r)poly(n)2^{O(r)}\operatorname{poly}(n). Due to the equivalence of optimization and separation, it is enough to describe a separation oracle with running time 2O(r)poly(n)2^{O(r)}\operatorname{poly}(n). Given a collection of vectors {via}\{v_{ia}\}, the separation oracle either has to output a good assignment or it has to output a valid linear constraint violated by the inner products of the input vectors.

We argue that such a separation oracle can easily be extracted from our rounding algorithm. Our rounding algorithm for Unique Games first selects a set SS of roughly mm vertices, then samples an assignment xSx_{S} for these vertices, and finally samples labels xix_{i} for the remaining vertices from the local distributions conditioned on the event xSx_{S}. The selection of the set SS depends only on the SDP vectors {via}\{v_{ia}\} but not on the local distributions (which are not known to the separation oracle).

Hence, given vectors {via}\{v_{ia}\}, our separation oracle can simply work as follows:

Select a vertex subset using Theorem 7.1 based on the given vectors {via}\{v_{ia}\}.

Using linear programming, find local distributions that are as consistent as possible with the inner products of the vectors {via}\{v_{ia}\}. If these local distributions match the inner products sufficiently closely, then our propagation rounding algorithm will succeed. On the other hand, if the local distributions do not match the inner products closely enough, then we can find a valid linear constraints that is violated by the inner product of the given vectors. (This separating linear constraint can be obtained from the dual solution of the linear program that was used to find the best local distributions.)

Appendix B Omitted proofs from Section 5 and Section 8

This appendix contains the proofs for some omitted proofs.

(Lemma 8.4 restated) If V\mathcal{V} is an SDP solution to unique games with value 1η1-\eta, i.e.,

Observe that the vectors uiau_{ia} are projections of viav_{ia} and projections shrinks distances, which implies that the {uia}\{u_{ia}\} vectors are correlated across constraints of the Unique Games instance,

Since xˉ1xˉ2yˉ1yˉ22xˉ1yˉ12+xˉ2yˉ22\lVert\bar{x}_{1}\otimes\bar{x}_{2}-\bar{y}_{1}\otimes\bar{y}_{2}\rVert^{2}\leqslant\lVert\bar{x}_{1}-\bar{y}_{1}\rVert^{2}+\lVert\bar{x}_{2}-\bar{y}_{2}\rVert^{2}, we can further upper bound

(In the last step, we again used the identity xy2=(xy)2+xyxˉyˉ2.\lVert x-y\rVert^{2}=(\lVert x\rVert-\lVert y\rVert)^{2}+\lVert x\rVert\,\lVert y\rVert\,\lVert\bar{x}-\bar{y}\rVert^{2}\,. and the fact that uiaujπij(a)viavjπij(a)\lVert u_{ia}\rVert\,\lVert u_{j\pi_{ij}(a)}\rVert\leqslant\lVert v_{ia}\rVert\,\lVert v_{j\pi_{ij}(a)}\rVert.) By averaging over the label set and the edges of the graph, it follows as claimed that

Let X1,,XnX_{1},\ldots,X_{n} be rr-local random variables over [k][k] and let XiaX_{ia} be the indicator of the event Xi=aX_{i}=a. Suppose that the matrix (Cov(Xia,Xjb))iV,a[k]\left(\operatorname*{Cov}(X_{ia},X_{jb})\right)_{i\in V,\,a\in[k]} is positive semidefinite. Then, there exists vectors v1,,vn{\bm{v}}_{1},\ldots,{\bm{v}}_{n} in the unit ball such that for all vertices i,jVi,j\in V and permutations π\pi of [k][k],

On the other hand, we can upper bound the inner product of vi{\bm{v}}_{i} and vj{\bm{v}}_{j},

Finally, the vectors v1,vn{\bm{v}}_{1}\ldots,{\bm{v}}_{n} are in the unit ball,

Here, we are using the fact that via,vib=0\langle v_{ia},v_{ib}\rangle=0 for all distinct a,b[k]a,b\in[k]. ∎

Appendix C Facts about Variance

Let XX and YY be jointly-distributed random variables. Assume that YY has finite range.Let ZZ be the orthogonal projection of the random variable XX onto the subspace of functions of the random variable YY. Then,

By construction ZZ is a function f(Y)f(Y) of the random variable YY and XZX-Z is orthogonal to all functions of the variable YY. Hence, \varmathbbE[XY=y]=f(y)\operatorname*{\varmathbb{E}}[X\mid Y=y]=f(y). Therefore, the expected variance of [XY][X\mid Y] is

which gives the desired identity using Z=f(Y)Z=f(Y). ∎

Let XX and YY be as in the previous lemma. Suppose the range of YY has cardinality 22. Then,

Without loss of generality, we may assume that \varmathbbEX=\varmathbbEY=0\operatorname*{\varmathbb{E}}X=\operatorname*{\varmathbb{E}}Y=0 and \varmathbbEY2=1\operatorname*{\varmathbb{E}}Y^{2}=1. Then, the set of random variables {1,Y}\{1,Y\} is an orthonormal basis for the subspace of functions of YY. Let ρ=\varmathbbEXY\rho=\operatorname*{\varmathbb{E}}XY. Then, ρY\rho Y is the orthogonal projection of XX to the subspace of function of YY. (Here, we use the assumption \varmathbbEX=0\operatorname*{\varmathbb{E}}X=0.) Hence, using the previous lemma,