Rounding Semidefinite Programming Hierarchies via Global Correlation

Boaz Barak, Prasad Raghavendra, David Steurer

Introduction

This paper is concerned with hierarchies of semi-definite programs (SDP’s). Semidefinite programs are an extremely useful tool in algorithms and in particular approximation algorithms (e.g., [GW95, KMS98]). SDP’s involve finding an integral (say $0/1$ ) solution for some optimization problem, by using convex programming to find a fractional/high-dimensional solution and then rounding it into an integral solution. Sherali and Adams [SA90], Lovász and Schrijver [LS91], and, later Lasserre [Las01], proposed systematic ways, known as hierarchies, to make this convex relaxation tighter, thus ensuring that the fractional solution is closer to an integral one. These hierarchies are parameterized by a number $r$ , called the level or number of rounds of the hierarchy. Given a program on $n$ variables, optimizing over the $r^{th}$ level of the hierarchy can be done in time $n^{O(r)}$ . The gap between integral and fractional solutions decreases with $r$ , and reaches zero at the $n^{th}$ level, where the program is guaranteed to find an optimal integral solution. The paper [Lau03] surveys and compares the different hierarchies proposed in the literature, see also the recent survey [CT10].

These semidefinite programming hierarchies have been of some interest in recent years, since they provide natural candidate algorithms for many computational problems. In particular, whenever the basic semidefinite or linear program provides a suboptimal approximation factor, it makes sense to ask how many rounds of the hierarchy are required to significantly improve upon this factor. Unfortunately, taking advantage of these hierarchies has often been difficult, and while some algorithms (e.g., [ARV09]) can be encapsulated in, say, level $3$ or $4$ of some hierarchies, there have been relatively few results (e.g. [Chl07, BCC+10]) that use higher levels to obtain new algorithmic results. In fact, there has been more success in showing that high levels of hierarchies do not help for many computational problems [ABLT06, STT07, GMPT07, RS09, KS09]. In particular for 3SAT and several other NP-hard problems, it is known that it takes $\Omega(n)$ rounds of the strongest SDP hierarchy (i.e., Lasserre) to improve upon the approximation rate achieved by the basic SDP (or sometimes even simpler algorithms) [Sch08, Tul09].

Semidefinite hierarchies are of particular interest in the case of problems related to Khot’s Unique Games Conjecture (UGC) [Kho02]. Several works have shown that for a wide variety of problems, the UGC implies that (unless $\textbf{P}=\textbf{NP}$ ) the basic semidefinite program cannot be improved upon by any polynomial-time algorithm [KKMO04, MOO05, Rag08]. Thus in particular the UGC predicts that for all these problems, it will take a super-constant (and in fact polynomial, under widely believed assumptions) number of hierarchy rounds to improve upon the basic SDP. Investigating this prediction, particularly for the Unique Games problem itself and other related problems such as Max Cut, Sparsest Cut and Small-Set Expansion, has been the focus of several works, and it is known that at least $(\log\log n)^{\Omega(1)}$ rounds are required for a non-trivial approximation [RS09, KS09] by a natural (though not strongest possible) SDP hierarchy. However, no non-trivial upper bound was known prior to the current work, and so it was conceivable that these lower bounds can be improved to $\Omega(n)$ .

Recently, Arora, Barak and Steurer [ABS10] gave a $2^{n^{\operatorname{poly}(\varepsilon)}}$ -time algorithm for solving the Unique Games and Small-Set Expansion problems (where $\varepsilon$ is the completeness parameter, see below). However, their algorithm did not use semidefinite programming hierarchies, and so does not immediately imply an upper bound on the number of rounds needed.

Our main contribution is a new method to analyze and round SDP hierarchies. We elaborate more on our method in Section 2, but its high-level description is that it uses global correlations inside the high-dimensional SDP solution, combined with the hierarchy constraints, to obtain a better rounding of this solution into an integral one. We believe this method can be of general utility, and in particular we use it here to give new algorithms for approximating constraint satisfaction problems on two-variable constraints (2-CSP’s) that run faster than the previously known algorithms for a natural family of instances. To state our results we need the notion of a threshold rank.

Results for Unique Games constraints

We say that a Max 2-Csp instance is a Unique Games instance if all the relations $\Pi_{i,j}$ have the form that $(a,b)\in\Pi_{i,j}$ iff $a=\pi_{i,j}(b)$, where $\pi_{i,j}$ is a permutation of $[k]$. As mentioned above, the performance of SDP hierarchies on Unique Games instances and related problems is of particular interest. We obtain somewhat stronger quantitative results for Unique Games instances. Also, as remarked below, our results are “morally stronger” in this case, since it is conceivable that the hardest instances for these types of problems have small threshold rank. First, we show that for Unique Games instances the threshold $\tau$ in Theorem 1.1 does not need to depend on the alphabet size. Namely, we prove

The Unique Games Conjecture is about a specific approximation regime for Unique Games. Given a Unique Games instance with optimal value $1-\varepsilon$ , the goal is to find an assignment with value at least $1/2$ .

We also show that in this case a sublinear (and in fact a small root) number of rounds suffices to get such an approximation in the worst case, regardless of the instance’s threshold rank. Moreover, we also show that such an approximation can be obtained in a number of rounds that depends on the $\tau$-threshold rank for $\tau$ that is close to $1$ (as opposed to the small value of $\tau$ needed for Theorems 1.1 and 1.2).

Examples of graphs with small threshold rank

Also, as noted in [ABS10], hypercontractive graphs (i.e., graphs whose $2$ to $4$ operator norm is bounded) have at most polylogarithmic $\tau$ -threshold rank for every constant $\tau>0$ . For several 2-CSP’s such as Max Cut, Unique Games, Small-Set Expansion, Sparsest Cut, the constraint graphs for the canonical “problematic instances” (i.e., integrality gap examples [FS02, KV05, KS09, RS09]) are all hypercontractive, since they are based on either the noisy Gaussian graph or noisy Boolean cube. In fact, it is conceivable that the Small-Set Expansion problem is trivial on graphs with large threshold rank, in the sense that we do not know of any example of an instance having, say, $\log^{\omega(1)}n$ $0.99$ -threshold rank, and objective value smaller than $1/2$ . (For the Unique Games and Max Cut problems it is trivial to construct instances with large threshold rank by taking many disjoint copies of the same instance, though it could still be the case that the hardest instances are the ones with small threshold rank.) On the other hand, for other 2-CSPs such as Label Cover, some natural hard instances have linear threshold-rank. For example this is the case if one considers the natural “clause vs. variable” or “clause vs. clause” 2-CSP obtained from random instances of 3SAT (which is not surprising given that a non-trivial approximation for random 3SAT requires $\Omega(n)$ levels of the Lasserre hierarchy [Sch08]).
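To make the polylogarithmic bound concrete for one of the canonical examples, here is a minimal sketch that counts the threshold rank of the noisy Boolean cube. It relies on the standard fact that the noise operator with correlation $\rho$ on $\{\pm 1\}^{d}$ has eigenvalue $\rho^{|S|}$ for each character $\chi_{S}$, so the eigenvalue $\rho^{j}$ has multiplicity $\binom{d}{j}$. The function name, and the convention that the $\tau$-threshold rank counts eigenvalues of value at least $\tau$, are assumptions made for this illustration.

```python
from math import comb

def noisy_cube_threshold_rank(d: int, rho: float, tau: float) -> int:
    """Number of eigenvalues >= tau of the rho-noisy cube on {-1, 1}^d.

    Assumes 0 < rho < 1 and tau > 0. The eigenvalue rho**j has
    multiplicity comb(d, j), one for each character of weight j.
    """
    rank = 0
    for j in range(d + 1):
        if rho ** j >= tau:
            rank += comb(d, j)
    return rank

d = 10
n = 2 ** d  # number of vertices of the cube
rank = noisy_cube_threshold_rank(d, rho=0.5, tau=0.2)
print(n, rank)
```

Only characters of weight at most $\log(1/\tau)/\log(1/\rho)$ contribute, so for constant $\rho$ and $\tau$ the count is $d^{O(1)}$, which is polylogarithmic in $n=2^{d}$.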

Algorithm efficiency

Our algorithm actually does not require the full power of the Lasserre hierarchy. First, we can use the relaxed variant with approximate constraints studied in [KS09, RS09, KPS10]. Second, in the proof of Theorem 1.3, we don’t need to utilize the constraints on all $\binom{n}{r}$ $r$ -sized subsets of $n$ variables, but rather sufficiently many random sets suffice. As a result, we can implement our $r$ -round algorithm in time $2^{O(r)}\operatorname{poly}(n)$ .

Related works

For Unique Games and related problems, previous works [KT07, Kol10, ABS10] used subspace enumeration to give algorithms with similar running time to Theorem 1.3 in the case that the threshold rank of the label extended graph of the instance is small. This is known to be a stronger condition on the instances than bounding the threshold rank of the constraint graph. The only known bound on the $1-\varepsilon$ threshold rank of the label extended graph in terms of the $1-\varepsilon$ threshold rank of the constraint graph loses a factor of about $n^{\varepsilon}$ [ABS10]. These subspace enumeration algorithms also only applied to nearly satisfiable instances (whose objective value is close to $1$), and so did not give guarantees comparable to Theorems 1.1 and 1.2. As mentioned below in Section 2, SDP-based algorithms have some robustness advantages over spectral techniques. SDP hierarchies are also easily shown to yield polynomial-time approximation schemes for 2-CSPs whose constraint graphs can have very high threshold rank, such as bounded tree width graphs and regular planar graphs (or more generally any hyperfinite family of graphs, see e.g. [HKNO09] and the references therein).

Approximation schemes for (pseudo) dense CSP’s

For general 2CSP’s, several works gave polynomial-time approximation schemes for dense and pseudo-dense instances [FK99, ACOH+10, COCF10]. Our work generalizes these results, since pseudo-density is a stronger condition than having a constraint graph of low threshold rank. Furthermore, for an $\varepsilon$-approximation the degree of the instance needed by these works is exponential in $\frac{1}{\varepsilon}$, while the results of this work apply even on random graphs of degree $\operatorname{poly}(1/\varepsilon)$.

Analyzing SDP hierarchy

Using very different methods, Chlamtac [Chl07] and Bhaskara et al. [BCC+10] gave LP/SDP-hierarchy based algorithms for hypergraph coloring and the densest subgraph problem respectively. As mentioned above, several works gave lower bounds for LP/SDP hierarchies. In particular, [RS09, KS09] showed that approximations such as those achieved in Theorem 1.3 for the Unique Games problem require $(\log\log n)^{\Omega(1)}$ rounds of a relaxed variant of the Lasserre hierarchy. This relaxed variant captures our hierarchy as well. Schoenebeck [Sch08] proved that achieving a non-trivial approximation for 3SAT on random instances requires $\Omega(n)$ rounds in the Lasserre hierarchy, while Tulsiani [Tul09] showed that Lasserre lower bounds are preserved under common types of NP-hardness reductions.

In a concurrent and independent work, Guruswami and Sinop [GS11] gave results very similar to ours. They also use the Lasserre hierarchy to get an approximation scheme with similar performance to our Theorem 1.1 for 2-CSPs, and in fact even consider generalizations involving additional (approximate) global linear constraints. They also get essentially the same results for Unique Games as our Theorem 1.3. Furthermore, their rounding algorithm is the same as ours. However, there are some differences both in results and the proof. First, although [GS11] use a notion similar to our local-to-global correlation, they view it differently, and interestingly relate it to the problem of column selection for low rank approximations of matrices. Also, apart from the special case of unique constraints, they work with the threshold rank of the label extended graph, as opposed to the constraint graph as is the case here (however for binary alphabet these two graphs coincide). Their analysis relies on the full power of the Lasserre hierarchy, whereas we show that a weaker hierarchy is sufficient in the Unique Games case, and can even be done faster (i.e., $\exp(r)\operatorname{poly}(n)$ vs $n^{O(r)}$ ).

Our techniques

We now describe, on a very high and imprecise level, the ideas behind our rounding algorithm and its analysis. A semidefinite programming relaxation of an optimization problem yields a set of vectors $v_{1},\ldots,v_{n}$ satisfying certain conditions and achieving some objective value $c$ . The goal of a rounding algorithm is to transform this set of vectors into, say, a $+1/-1$ solution, satisfying the same conditions and achieving value $c^{\prime}$ that is close to $c$ . At a very high level, our main result is that if these vectors have some non-trivial global correlation, then a good rounding can be achieved with a non-trivially small number of hierarchy rounds. Our second observation is that in several cases, the vectors corresponding to a good SDP solution can be shown to have significant mass inside some low-dimensional subspace, and that implies a lower bound on their global correlation. Below we elaborate on what we mean, using the Max Cut problem (which is a special case of Unique Games) as an illustrative example. Our result for Max Cut is worked out in more detail in Section 4.

The SDP solution for the Max Cut problem consists of a sequence $\mathcal{V}=v_{1},\ldots,v_{n}$ of unit vectors, and the objective value is the expectation of $(1-\langle v_{i},v_{j}\rangle)/2$ over all edges $\{i,j\}$ in the input graph. Note that in the case that the vectors $v_{1},\ldots,v_{n}$ are one-dimensional unit vectors (i.e., $v_{i}\in\{\pm 1\}$), $\mathcal{V}$ exactly corresponds to a cut in the graph, and the objective value measures the fraction of edges cut. Now, suppose that you could find $r$ vectors $v_{i_{1}},\ldots,v_{i_{r}}\in\mathcal{V}$, which we’ll call the basis vectors, such that every other $v\in\mathcal{V}$ has some significant projection $\rho$ into the span of $v_{i_{1}},\ldots,v_{i_{r}}$. That is, if we let $P$ be the projection operator corresponding to this space, then for every $v\in\mathcal{V}$, $\lVert Pv\rVert_{2}\geqslant\rho$. It turns out that in this case, if $\rho$ is sufficiently close to $1$ and the vector solution $\mathcal{V}$ satisfies $r+2$ rounds of an appropriate SDP hierarchy, then we can round $\mathcal{V}$ to achieve a very good cut. The intuition behind this is the following: the constraints of $r+2$ hierarchy rounds allow us to essentially assume without loss of generality that the vectors $v_{i_{1}},\ldots,v_{i_{r}}$ are one-dimensional. That is, after applying an appropriate rotation, we can think of each one of them as a vector of the form $(\pm 1,0,\ldots,0)$. Moreover, our assumption implies that every other vector in $\mathcal{V}$ has a magnitude of at least $\rho$ in its first coordinate. Now one can show that simply rounding each vector to the sign of its first coordinate will result in a $\pm 1$ assignment to the vertices corresponding to a good cut.
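The rounding step in this intuition can be illustrated on a toy input. The sketch below is not the hierarchy-based algorithm: the vectors are hand-picked rather than obtained from an SDP solver, the single basis direction is assumed to be already rotated onto the first coordinate axis, and all function names are hypothetical. It only shows how the objective $(1-\langle v_{i},v_{j}\rangle)/2$ and sign-of-first-coordinate rounding fit together.

```python
from math import sqrt

def sdp_value(vectors, edges):
    """Average of (1 - <v_i, v_j>) / 2 over the edges."""
    total = 0.0
    for i, j in edges:
        inner = sum(a * b for a, b in zip(vectors[i], vectors[j]))
        total += (1.0 - inner) / 2.0
    return total / len(edges)

def round_first_coordinate(vectors):
    """Assign each vertex the sign of its vector's first coordinate."""
    return [1 if v[0] >= 0 else -1 for v in vectors]

def cut_value(assignment, edges):
    """Fraction of edges whose endpoints receive different signs."""
    return sum(assignment[i] != assignment[j] for i, j in edges) / len(edges)

# Unit vectors on a 4-cycle whose first coordinate has magnitude rho.
rho = 0.9
rest = sqrt(1.0 - rho * rho)
vectors = [(rho, rest), (-rho, rest), (rho, rest), (-rho, rest)]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

x = round_first_coordinate(vectors)
print(sdp_value(vectors, edges), cut_value(x, edges))
```

In this toy instance every vector has first-coordinate magnitude $\rho=0.9$, and the rounded assignment cuts every edge of the cycle.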

Local to global correlation

From the above discussion, our goal of rounding SDP hierarchies is reduced to finding a small number of basis vectors $v_{i_{1}},\ldots,v_{i_{r}}$ such that every (or at least most) other vector in the solution $\mathcal{V}$ has very large projection into their span. But why should such vectors exist? We show that we can assume they exist if the original Max Cut instance has small threshold rank. The latter is a condition that, as mentioned above, holds for many natural families of instances, including the canonical “hard instances” that are known to fool the GW algorithm: the noisy sphere and noisy Gaussian graphs [FS02, RS09]. The key concept behind our proof is the notion of local vs global correlations. It is a very well known property of expander graphs that random edges behave similarly to pairs of independently chosen vertices with respect to some tests. Specifically, if $G$ is an $n$-vertex expander in the sense that the second largest eigenvalue of the normalized adjacency matrix $A_{G}$ is at most $\varepsilon$, and $f$ is a bounded function mapping vertices to numbers, then we know that $\operatorname*{\varmathbb{E}}_{i,j}[|f(i)-f(j)|^{2}]\in(1\pm O(\varepsilon))\operatorname*{\varmathbb{E}}_{i\sim j}[|f(i)-f(j)|^{2}]$, where the former expectation is over pairs of vertices and the latter is over pairs connected by an edge. In other words, if $f$ is locally correlated over the edges of an expander graph, then it is also globally correlated. In fact, this is easily shown to hold even if $f$ maps vertices not into numbers but into vectors, i.e., if $v_{1},\ldots,v_{n}$ are unit vectors that are locally correlated over the edges of $G$ then they are also globally correlated. Indeed, this property of expanders has been used in the work of [AKK+08], who showed that the basic SDP program for Unique Games can be successfully rounded if the input graph is an expander.

Distribution view of SDP’s

Another, often beneficial way to view SDP hierarchies is as providing a distribution on integral solutions (see Section 3.2). In this view, for every set of $r+2$ vertices $i_{1},\ldots,i_{r},i_{r+1},i_{r+2}$, the SDP hierarchy provides a distribution $X_{i_{1}},\ldots,X_{i_{r+2}}$ over $\pm 1$. Moreover, we require that distributions on overlapping sets be consistent, and that for every two variables $i,j$ the expectation $E[X_{i}X_{j}]$ equal the inner product $\langle v_{i},v_{j}\rangle$ of the corresponding vectors. The challenge in rounding the SDP is that there is not necessarily a way to sample simultaneously the random variables $X_{1},\ldots,X_{n}$ in some consistent way. The projection of a vector $v$ into the span of $v_{i_{1}},\ldots,v_{i_{r}}$ turns out to capture (an appropriate notion of) the mutual information between the variable corresponding to $v$ and the variables $X_{i_{1}},\ldots,X_{i_{r}}$. Looked at from this viewpoint, our rounding algorithm involves choosing an assignment from the distribution for the basis vertices, and conditioning on its value. As long as (**) holds, we can find a random variable $X_{i}$ such that conditioning on $X_{i}$ will significantly decrease the entropy of the remaining variables. When we get stuck and (*) is violated, it means that for a typical edge $i\sim j$, the random variables $X_{i}$ and $X_{j}$ are close to being statistically independent. This means that just sampling each $X_{i}$ independently will give approximately the same value on a typical constraint.

Threshold rank vs global correlation

Whenever the graph has a small number of large eigenvalues, the condition that local correlation implies global correlation holds. This is useful to simulate eigenspace enumeration algorithms such as those used by [KT07, Kol10, ABS10, Ste10], since in the case of Unique Games (and other related problems) a good SDP solution must be locally well correlated. But the notion of local to global correlation is somewhat more general and robust than having small threshold rank. For example, adding $\sqrt{n}$ isolated vertices to a graph will correspondingly increase the number of eigenvectors with value $1$, but will actually not change the local to global correlation by much. This captures to a certain extent the fact that SDP-based solutions are more robust than spectral-based algorithms. (A similar example of this phenomenon is that adding a tiny bipartite disjoint graph to the input graph makes the smallest eigenvalue become $-1$, but does not change the value of the Goemans-Williamson SDP by much.) We hope that this robustness of the SDP-based approach will enable further improvements in the future.

Theorem 1.3 considers a different parameter than Theorems 1.1 and 1.2. The latter two results consider threshold ranks for a small (i.e., close to $0$) threshold $\tau$, and achieve a very good approximation. In contrast, Theorem 1.3 considers a threshold $\tau$ that is close to $1$, but only achieves a rough approximation (corresponding to the approximation guarantee relevant to the Unique Games Conjecture). This is also manifested in some technical differences in the proofs.

Organization

We begin by fixing notation and a few formal definitions in the next section. For the purpose of exposition, we first present an algorithm for Max Cut on low-rank graphs using the Lasserre hierarchy in Section 4. Following this, the general algorithm for 2-CSPs on low-rank graphs is presented in Section 5. The connection between local and global correlations in low-rank graphs, which is central to our algorithms, is outlined in Section 6. To implement our general approach in a hierarchy weaker than the Lasserre hierarchy, we outline an argument to obtain a low-rank approximation to any set of vectors in Section 7. The final section (Section 8) of the paper is devoted to a subexponential-time algorithm for Unique Games.

Preliminaries

We will use capital letters $X,Y$ to denote random variables, and lower-case letters to denote assignments to these random variables.

The collision probability of $X$ is defined as $\operatorname{CP}(X)=\operatorname*{\varmathbb{P}}\{X=X^{\prime}\}$,

where $X^{\prime}$ is an independent copy of $X$ (so that the sequence $X,X^{\prime}$ is i.i.d.). It is easy to see that the variance and collision probability are related by,

Local Distributions

Lasserre Hierarchy

Let $U$ be a Unique Games instance with constraint graph $G=(V,E)$, label set $[k]=\{1,\ldots,k\}$, and bijections $\{\pi_{ij}\}_{ij\in E}$. An $m$-round Lasserre solution consists of $m$-local random variables $X_{1},\ldots,X_{n}$ and vectors $v_{S,\alpha}$ for all vertex sets $S\subseteq V$ with $\lvert S\rvert\leqslant m+2$ and all local assignments $\alpha\in[k]^{S}$. A Lasserre solution is feasible if the local random variables are consistent with the vectors, in the sense that for all $S,T\subseteq V$ and $\alpha\in[k]^{S},\beta\in[k]^{T}$ with $\lvert S\cup T\rvert\leqslant m+2$, we have

The objective is to maximize the following expression

An important consequence of the existence of the vectors $v_{S,\alpha}$ is that for every set $S\subseteq V$ with $\lvert S\rvert\leqslant m$ and local assignment $x_{S}\in[k]^{S}$ , the matrix $\left\{\operatorname*{Cov}(X_{ia},X_{jb}\mid X_{S}=x_{S})\right\}_{i,j\in V,\,a,b\in[k]}$ is positive semidefinite.

Warmup – MaxCut Example

For the sake of exposition, we first present an algorithm for the Max Cut problem on low-rank graphs. In the Max Cut problem, the input consists of a graph $G=(V,E)$ and the goal is to find a cut $S\cup\bar{S}=V$ of the vertices that maximizes the number of edges crossing, i.e., maximizes $|E(S,\bar{S})|$ .

The Goemans-Williamson SDP relaxation for the problem assigns a unit vector $v_{i}$ for every vertex $i\in V$ , so as to maximize the average squared length $E_{i,j\in E}\lVert v_{i}-v_{j}\rVert^{2}$ of the edges. Formally, the SDP relaxation is given by,

Stronger SDP relaxations produced by hierarchies such as Sherali-Adams and Lasserre hierarchy also yield probability distributions over local assignments.

More precisely, given an $m$-round Lasserre SDP solution, it can be associated with a set of $m$-local random variables $X_{1},\ldots,X_{n}$ taking values in $\{-1,1\}$. For an edge $(i,j)$, its contribution to the SDP objective value ($\lVert v_{i}-v_{j}\rVert^{2}$) is equal to the probability that the edge $(i,j)$ is cut under the distribution of local assignments $\mu_{ij}$, namely,

Consequently, in order to obtain a cut with value close to the SDP objective, it is sufficient to jointly sample $X_{1},\ldots,X_{n}$ , such that on every edge $(i,j)$ the distribution of $X_{i}$ and $X_{j}$ is close to the corresponding local distribution $\mu_{ij}$ . However, the variables $X_{1},\ldots,X_{n}$ are not jointly distributed, and hence cannot all be sampled together.

As a first attempt, let us suppose we sample each $X_{i}$ independently from its associated marginal $\mu_{i}$. If on most edges $(i,j)$ the distribution of the resulting samples $X_{i},X_{j}$ is close to $\mu_{ij}$, then we are done. On an edge $(i,j)$, the local distribution $\mu_{ij}$ is far from the independent sampling distribution $\mu_{i}\times\mu_{j}$ only if the random variables $X_{i},X_{j}$ are correlated. Henceforth, these correlations across the edges will be referred to as “local correlations”. A natural measure for correlations that we will utilize here is defined as $\operatorname*{Cov}(X_{i},X_{j})=\operatorname*{\varmathbb{E}}[X_{i}X_{j}]-\operatorname*{\varmathbb{E}}[X_{i}]\operatorname*{\varmathbb{E}}[X_{j}]$. Using this measure, the statistical distance between independent sampling ($\mu_{i}\times\mu_{j}$) and correlated sampling ($\mu_{ij}$) is given by

(See Lemma 5.1 for a more general version of the above bound).

On the flip side, the existence of correlations makes the problem of sampling $X_{1},\ldots,X_{n}$ easier! If two variables $X_{i},X_{j}$ are correlated, then sampling/fixing the value of $X_{i}$ reduces the uncertainty in the value of $X_{j}$ . More precisely, conditioning on the value of $X_{i}$ reduces the variance of $X_{j}$ as shown below:

Therefore, if we pick an $i\in V$ at random and fix its value then the expected decrease in the variance of all the other variables is given by,

The above bound is proven in a more general setting in Lemma 5.2. As all random variables involved have variance at most $1$ , we can rewrite the above expression as,

The decrease in the variance is directly related to the global correlations between random pairs of vertices $i,j\in V$ .
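The variance-reduction step can be checked numerically on a single pair of variables. The sketch below uses the law of total variance, $\operatorname{Var}(Y)-\operatorname*{\varmathbb{E}}_{x}\operatorname{Var}(Y\mid X=x)=\operatorname{Var}(\operatorname*{\varmathbb{E}}[Y\mid X])$, together with the fact that for a two-valued $X$ this drop equals $\operatorname{Cov}(X,Y)^{2}/\operatorname{Var}(X)$ exactly (in general it is at least that much). The joint distribution and helper names are made up for illustration, and the normalization may differ from the one used in the bounds of this section.

```python
def moments(joint):
    """Variances and covariance of a pair (X, Y) given {(x, y): prob}."""
    ex = sum(p * x for (x, y), p in joint.items())
    ey = sum(p * y for (x, y), p in joint.items())
    var_x = sum(p * (x - ex) ** 2 for (x, y), p in joint.items())
    var_y = sum(p * (y - ey) ** 2 for (x, y), p in joint.items())
    cov = sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())
    return var_x, var_y, cov

def expected_conditional_variance(joint):
    """E_x[ Var(Y | X = x) ], with x drawn from the marginal of X."""
    result = 0.0
    for x0 in {x for (x, _) in joint}:
        px = sum(p for (x, _), p in joint.items() if x == x0)
        if px == 0:
            continue
        ey = sum(p * y for (x, y), p in joint.items() if x == x0) / px
        ey2 = sum(p * y * y for (x, y), p in joint.items() if x == x0) / px
        result += px * (ey2 - ey * ey)
    return result

# A correlated pair of +1/-1 variables (hypothetical local distribution).
joint = {(1, 1): 0.4, (1, -1): 0.1, (-1, 1): 0.2, (-1, -1): 0.3}
var_x, var_y, cov = moments(joint)
drop = var_y - expected_conditional_variance(joint)
print(drop, cov ** 2 / var_x)
```

Both printed quantities agree, so the larger the covariance between the two variables, the more variance is removed by fixing one of them.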

Recall that the failure of independent sampling yields a lower bound on the average local correlations on the edges, namely $E_{i,j\in E}|\operatorname*{Cov}(X_{i},X_{j})|$. The crucial observation is that if the graph $G$ is a good expander in a suitable sense, then these local correlations translate into non-negligible global correlations. Formally, we show the following (in Section 6):

Let ${\bm{v}}_{1},\ldots,{\bm{v}}_{n}$ be vectors in the unit ball. Suppose that the vectors are correlated across the edges of a regular $n$ -vertex graph $G$ ,

Then, the global correlation of the vectors is lower bounded by

As the random variables $X_{i}$ arise from the solution to an SDP, the matrix $\left(\operatorname*{Cov}(X_{i},X_{j})\right)_{i,j\in V}$ is positive semidefinite, i.e., there exist vectors $u_{i}$ such that $\langle u_{i},u_{j}\rangle=\operatorname*{Cov}(X_{i},X_{j})\,\,\forall i,j\in V$. Let us consider the vectors $v_{i}=u_{i}^{\otimes 2}$. Suppose the local correlation $\operatorname*{\varmathbb{E}}_{i,j\in E}|\operatorname*{Cov}(X_{i},X_{j})|$ is at least $\varepsilon$. Then we have,

and $\operatorname*{\varmathbb{E}}_{i}[\lVert v_{i}\rVert^{2}]\leqslant 1$ . If the graph $G$ is low-rank, then by Lemma 4.1 we get a lower bound on the global correlation of the vectors $v_{i}$ , namely
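The identity behind the choice $v_{i}=u_{i}^{\otimes 2}$ is that inner products of tensor squares are squares of the original inner products, $\langle u^{\otimes 2},w^{\otimes 2}\rangle=\langle u,w\rangle^{2}$, so correlations between the $v_{i}$ are always non-negative. A small sketch with arbitrary example vectors (the helper names are hypothetical):

```python
def dot(u, w):
    """Standard inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, w))

def tensor_square(u):
    """Flattened u (x) u: all pairwise products u[a] * u[b]."""
    return [a * b for a in u for b in u]

u = [0.6, 0.8, 0.0]
w = [0.0, 0.6, -0.8]

lhs = dot(tensor_square(u), tensor_square(w))
rhs = dot(u, w) ** 2
print(lhs, rhs)
```

The same identity with $w=u$ gives $\lVert u^{\otimes 2}\rVert^{2}=\lVert u\rVert^{4}$, so vectors in the unit ball stay in the unit ball after squaring.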

General 2-CSP on Low Rank Graphs

Let $\Im$ be a (general) Max 2-Csp instance with variable set $V=[n]$ and label set $[k]$ . (We represent $\Im$ as a distribution over triples $(i,j,\Pi)$ , where $i,j\in V$ and $\Pi\subseteq[k]\times[k]$ is an arbitrary binary predicate. The goal is to find an assignment $x\in[k]^{V}$ that maximizes the probability $\operatorname*{\varmathbb{P}}_{{(i,j,\Pi)\sim\Im}}\left\{(x_{i},x_{j})\in\Pi\right\}$ .)

For simplicity, we will assume that the constraint graph of $\Im$ is regular, i.e., every variable $i\in V$ appears in the same number of constraints. (Since we allow the constraints to be weighted, the precise condition is that the total weight of the constraints incident to a vertex is the same for every vertex.) If the constraint graph is not regular, all of our results still hold for an appropriate definition of threshold rank.

Let $X_{1},\ldots,X_{n}$ be $r$-local random variables with range $[k]$. We write $X_{ia}$ to denote the $\{0,1\}$-indicator of the event $X_{i}=a$. Notice that $\{X_{ia}\}_{i\in V,\,a\in[k]}$ are also $r$-local random variables.

For two random variables $X$ and $X^{\prime}$ with the same range, we denote their statistical distance,

The following lemma shows that the statistical distance between independent sampling and correlated sampling is explained by local correlation.

Under the distribution $\{X_{i}X_{j}\}$ , the event $\{X_{i}=a,X_{j}=b\}$ has probability $\operatorname*{\varmathbb{E}}X_{ia}X_{jb}$ . On the other hand, under the product distribution $\{X_{i}\}\{X_{j}\}$ , this event has probability $\operatorname*{\varmathbb{E}}X_{ia}\operatorname*{\varmathbb{E}}X_{jb}$ . Hence, the difference of these probabilities is equal to $\operatorname*{\varmathbb{E}}X_{ia}X_{jb}-\operatorname*{\varmathbb{E}}X_{ia}\operatorname*{\varmathbb{E}}X_{jb}=\operatorname*{Cov}(X_{ia},X_{jb})$ . ∎
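The argument above says that each term of the statistical distance between the joint and the product distribution is the absolute value of one indicator covariance. The following sketch checks this on a small example. It assumes the usual total variation convention with a factor $1/2$ (the normalization used in the lemma may differ), uses labels $0,\ldots,k-1$ instead of $[k]$, and the joint distribution and function names are made up for illustration.

```python
from itertools import product

def marginals(joint, k):
    """Marginals of X_i and X_j from a joint {(a, b): prob} over labels 0..k-1."""
    pi = [sum(joint.get((a, b), 0.0) for b in range(k)) for a in range(k)]
    pj = [sum(joint.get((a, b), 0.0) for a in range(k)) for b in range(k)]
    return pi, pj

def indicator_cov(joint, k, a, b):
    """Cov(X_ia, X_jb) = E[X_ia X_jb] - E[X_ia] E[X_jb]."""
    pi, pj = marginals(joint, k)
    return joint.get((a, b), 0.0) - pi[a] * pj[b]

def statistical_distance(joint, k):
    """Total variation distance between the joint and the product of its marginals."""
    pi, pj = marginals(joint, k)
    return 0.5 * sum(abs(joint.get((a, b), 0.0) - pi[a] * pj[b])
                     for a, b in product(range(k), repeat=2))

k = 3
joint = {(0, 0): 0.3, (1, 1): 0.3, (2, 2): 0.2, (0, 1): 0.1, (2, 0): 0.1}
tv = statistical_distance(joint, k)
cov_sum = 0.5 * sum(abs(indicator_cov(joint, k, a, b))
                    for a, b in product(range(k), repeat=2))
print(tv, cov_sum)
```

For a product distribution every indicator covariance vanishes, and the distance is zero.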

Conditional Variance and Pairwise Correlation

The following lemma shows that conditioning on a variable $X_{j}$ decreases the variance of a variable $X_{i}$ by the correlation of the variables $X_{ia}$ and $X_{jb}$ .

Pairwise Correlations and Inner Products

Suppose that the matrix $\left(\operatorname*{Cov}(X_{ia},X_{jb})\right)_{i,j\in V,\,a,b\in[k]}$ is positive semidefinite. Then, there exist vectors ${\bm{v}}_{1},\ldots,{\bm{v}}_{n}$ in the unit ball such that for all vertices $i,j\in V$,

Local Correlation vs Global Correlation on Low-Rank Graphs

The following lemma shows that local correlation (correlation across edges of a graph) implies global correlation (correlation between random vertices) if the graph has low threshold rank. (Proof in Section 6.)

Let ${\bm{v}}_{1},\ldots,{\bm{v}}_{n}$ be vectors in the unit ball. Suppose that the vectors are correlated across the edges of a regular $n$ -vertex graph $G$ ,

Then, the global correlation of the vectors is lower bounded by

Putting Things Together

The following lemma shows that either independent sampling is statistically close to correlated sampling across edges of a graph or the typical variance of a vertex decreases non-trivially by conditioning on a random vertex.

Let $G$ be a regular $n$ -vertex graph and $\varepsilon$ be the expected statistical distance between independent and correlated sampling across the edges of $G$ ,

Further, suppose that the matrix $\left(\operatorname*{Cov}(X_{ia},X_{jb})\right)_{i,j\in V,\,a,b\in[k]}$ is positive semidefinite. Then, conditioning on a random vertex decreases the variances by

Let ${\bm{v}}_{1},\ldots,{\bm{v}}_{n}$ be the vectors constructed in Lemma 5.3. By Lemma 5.3 and Lemma 5.1, the local correlation of these vectors is at least

(The last step also uses Cauchy–Schwarz.) Hence, Lemma 4.1 implies the following lower bound on the global correlation of these vectors,

Lemma 5.3 and Lemma 5.2 allow us to relate the expected decrement of the variances to the global correlation of the vectors ${\bm{v}}_{1},\ldots,{\bm{v}}_{n}$,

The following lemma asserts that if the constraint graph has low threshold rank then there exists a partial assignment $x_{S}$ to a small set $S$ of vertices such that independent sampling conditioned on this assignment $x_{S}$ gives almost the same value as correlated sampling (without conditioning on the assignment $x_{S}$ ).

Algorithm 5.5 (Propagation Sampling).
Input: $r$-local random variables $X_{1},\ldots,X_{n}$ over $[k]$.
Output: (global) distribution over assignments $x\in[k]^{V}$.
1. Choose $m\in\{1,\ldots,r\}$ at random.
2. Sample a random set of “seed vertices” $S\in V^{m}$. (Repeated vertices are allowed.)
3. Sample an assignment $x_{S}\in[k]^{S}$ for $S$ according to its local distribution $\{X_{S}\}$.
4. For every other vertex $i\in V\setminus S$, sample a label $x_{i}\in[k]$ according to the local distribution for $S\cup\{i\}$ conditioned on the assignment $x_{S}$ for $S$.
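The four steps of propagation sampling can be sketched in a few lines. The representation below is an assumption made for illustration: local distributions are supplied by a hypothetical callback `local_dist(T)` that returns a dictionary from assignment tuples on the vertex tuple `T` to probabilities, labels are $0,\ldots,k-1$, and in the toy usage the local distributions are simply marginals of a genuine global distribution (a special case in which consistency holds trivially), not the output of a Lasserre solver.

```python
import random

def marginal(global_dist, vertices):
    """Marginal of {assignment tuple: prob} on the given tuple of vertices."""
    out = {}
    for x, p in global_dist.items():
        key = tuple(x[v] for v in vertices)
        out[key] = out.get(key, 0.0) + p
    return out

def sample(dist, rng):
    """Draw one key of `dist` with probability proportional to its weight."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[key] for key in keys])[0]

def propagation_sampling(n, r, local_dist, rng):
    m = rng.randint(1, r)                                  # step 1: m in {1..r}
    # Step 2: m seed draws with repetition; repeated vertices collapse.
    seeds = tuple(sorted({rng.randrange(n) for _ in range(m)}))
    x_seeds = sample(local_dist(seeds), rng)               # step 3
    x = dict(zip(seeds, x_seeds))
    for i in range(n):                                     # step 4
        if i in x:
            continue
        joint = local_dist(seeds + (i,))
        # Distribution of x_i conditioned on the seed assignment.
        cond = {a[-1]: p for a, p in joint.items()
                if a[:-1] == x_seeds and p > 0}
        x[i] = sample(cond, rng)
    return [x[i] for i in range(n)]

# Toy input: all three variables are always equal under the global distribution.
global_dist = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
local_dist = lambda T: marginal(global_dist, T)
rng = random.Random(0)
samples = [propagation_sampling(3, 2, local_dist, rng) for _ in range(5)]
print(samples)
```

Note that the non-seed vertices are sampled independently of each other given $x_{S}$; only sets of size at most $m+1$ are ever queried, which is why local distributions suffice.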

To prove the current theorem it is enough to show that $\operatorname*{\varmathbb{E}}_{m\in[r]}\varepsilon_{m}\leqslant\varepsilon$ . For $m\leqslant r$ , define a non-negative potential $\Phi_{m}$ as follows

Let $m\in[r]$ . Suppose $\varepsilon_{m}\geqslant\varepsilon/2$ . Then,

The following theorem directly implies Theorem 1.1.

An optimal $r$ -round Lasserre solution gives rise to $r$ -local random variables $X_{1},\ldots,X_{n}$ over $[k]$ . Let $X_{ia}$ be the indicator variable of the event $X_{i}=a$ . The matrices $\{\operatorname*{Cov}(X_{ia},X_{jb}\mid X_{S}=x_{S})\}_{i,j\in V,\,a,b\in[k]}$ are positive semidefinite for all sets $S\subseteq V$ with $\lvert S\rvert\leqslant r$ and local assignments $x_{S}\in[k]^{S}$ . Furthermore, the Lasserre solution satisfies

Let $X_{1}^{\prime},\ldots,X^{\prime}_{n}$ be the jointly-distributed (global) random variables in Theorem 5.6. By Theorem 5.6, we can estimate the expected value of the assignment $X^{\prime}_{1},\ldots,X^{\prime}_{n}$ as

1 Special case of Unique Games

The following lemma is a version of Lemma 5.3 tailored towards Unique Games. The advantage of this version of the lemma is that the bounds are independent of the alphabet size $k$ .

Let $X_{1},\ldots,X_{n}$ be $r$ -local random variables over $[k]$ and let $X_{ia}$ be the indicator of the event $X_{i}=a$ . Suppose that the matrix $\left(\operatorname*{Cov}(X_{ia},X_{jb})\right)_{i\in V,\,a\in[k]}$ is positive semidefinite. Then, there exist vectors ${\bm{v}}_{1},\ldots,{\bm{v}}_{n}$ in the unit ball such that for all vertices $i,j\in V$ and permutations $\pi$ of $[k]$ ,

The following theorem immediately implies Theorem 1.2. Let $\Im$ be a Unique Games instance with alphabet size $k$ and constraint graph $G$ .

Let $X_{1},\ldots,X_{n}$ be $r$ -local random variables over $[k]$ from an optimal $r$ -round Lasserre solution for $\Im$ . The local variables satisfy

For a permutation $\pi$ of $[k]$ , we define a modified version of statistical distance,

Therefore, we can estimate the expected fraction of satisfied constraints as

Local Correlation implies Global Correlation in Low-Rank Graphs

Let $G$ be a regular graph with vertex set $V=\{1,\ldots,n\}$ . We identify $G$ with its normalized adjacency matrix, a symmetric stochastic matrix. Let $\lambda_{1}\geqslant\ldots\geqslant\lambda_{n}\in[-1,1]$ be the eigenvalues of $G$ in non-increasing order.
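To make the eigenvalue notation concrete, the following Python sketch counts the eigenvalues above a threshold for the $n$ -cycle. It assumes that the threshold rank at $\tau$ is the number of eigenvalues $\lambda_{i}\geqslant\tau$ of the normalized adjacency matrix, and it uses the standard closed form $\cos(2\pi j/n)$ for the spectrum of the cycle. The function names are hypothetical.

```python
import math

def cycle_eigenvalues(n):
    """Eigenvalues of the normalized adjacency matrix of the n-cycle (n >= 3),
    sorted in non-increasing order."""
    return sorted((math.cos(2 * math.pi * j / n) for j in range(n)), reverse=True)

def threshold_rank(eigenvalues, tau):
    """Number of eigenvalues that are at least tau."""
    return sum(1 for lam in eigenvalues if lam >= tau)

eigs = cycle_eigenvalues(12)
rank_at_08 = threshold_rank(eigs, 0.8)  # counts j = 0, 1, 11
```

For the 12-cycle, only the eigenvalues $1$ and $\cos(\pi/6)$ (the latter with multiplicity two) are at least $0.8$ .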

The following lemma shows that a violation of the local vs global correlation condition implies that the graph has high threshold rank.

Suppose there exist vectors $v_{1},\ldots,v_{n}\in\varmathbb R^{n}$ such that

Then for all $C>1$ , $\lambda_{(1-1/C)m}\geqslant 1-C\cdot\varepsilon$ . In particular, $\lambda_{m/2}>1-2\varepsilon$ .

Let $X=(x_{r,s})_{r,s\in[n]}$ be the Gram matrix $(\langle{\bm{v}}_{i},{\bm{v}}_{j}\rangle)_{i,j\in V}$ represented in the eigenbasis of $G$ , so that

Let $m^{\prime}$ be the largest index such that $\lambda_{m^{\prime}}\geqslant 1-C\cdot\varepsilon$ . Notice that the numbers $p_{1}=x_{1,1},\ldots,p_{n}=x_{n,n}$ form a probability distribution over $r\in[n]$ . Let $q=\sum_{i=1}^{m^{\prime}}p_{i}$ be the probability of the event $r\leqslant m^{\prime}$ . Using Cauchy–Schwarz, we can bound this probability in terms of $m$ ,

On the other hand, we can bound the expectation of $\lambda_{r}$ with respect to the probability distribution $(p_{1},\ldots,p_{n})$ in terms of this probability $q$ ,

It follows that $m^{\prime}\geqslant\left(1-\nicefrac{{1}}{{C}}\right)\cdot m$ , which gives the desired conclusion that $G$ has at least $\left(1-\nicefrac{{1}}{{C}}\right)\cdot m$ eigenvalues $\lambda_{r}\geqslant 1-C\cdot\varepsilon$ . ∎

Note that Lemma 4.1 follows directly from the previous lemma by picking $C=\frac{(1-\rho/100)}{(1-\rho)}$ and observing that $\operatorname*{\varmathbb{E}}_{i,j\in V}|\langle{\bm{v}}_{i},{\bm{v}}_{j}\rangle|\geqslant\operatorname*{\varmathbb{E}}_{i,j\in V}|\langle{\bm{v}}_{i},{\bm{v}}_{j}\rangle|^{2}$ since $|\langle{\bm{v}}_{i},{\bm{v}}_{j}\rangle|\leqslant 1$ for all $i,j\in V$ .

As a converse to Lemma 6.1, the following lemma shows that if a graph has many eigenvalues close to $1$ , then there exist vectors for the vertices of the graph with high local correlation and low global correlation.

If $\lambda_{m}\geqslant 1-\varepsilon$ , then there exist vectors $v_{1},\ldots,v_{n}\in\varmathbb R^{m}$ such that

Let $f^{(1)},\ldots,f^{(m)}\colon V\to\varmathbb R$ be orthonormal eigenfunctions of $G$ with eigenvalue larger than $1-\varepsilon$ . Consider vectors $v_{1},\ldots,v_{n}\in\varmathbb R^{m}$ satisfying $\langle v_{i},v_{j}\rangle=\operatorname*{\varmathbb{E}}_{r\in[m]}f^{(r)}_{i}f^{(r)}_{j}.$ Since the functions $f^{(r)}$ have norm $1$ , the typical squared norm of the vectors $v_{i}$ satisfies

Since the eigenvalues of the eigenfunctions $f^{(r)}$ are larger than $1-\varepsilon$ , we can lower bound the local correlation of the vectors $v_{i}$ ,

Finally, since the functions $f^{(r)}$ are orthonormal, the global correlation of the vectors $v_{i}$ is
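The construction in this proof can be checked numerically on a concrete graph. The following Python sketch builds the vectors for the $n$ -cycle from its top $m$ eigenfunctions (the constant function and the cosine/sine pairs). It assumes that the norm of a function is taken with respect to the inner product $\langle f,g\rangle=\operatorname*{\varmathbb{E}}_{i\in V}f_{i}g_{i}$ , so that $\langle v_{i},v_{j}\rangle=\operatorname*{\varmathbb{E}}_{r\in[m]}f^{(r)}_{i}f^{(r)}_{j}$ yields average squared norm $1$ , local correlation equal to the average of the chosen eigenvalues, and global correlation $1/m$ . The function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cycle_vectors(n, m):
    """Vectors v_0..v_{n-1} in R^m with <v_i, v_j> = E_r f_r(i) f_r(j), where
    f_0 = 1 and f_{2j-1}, f_{2j} = sqrt(2) cos / sqrt(2) sin of 2*pi*j*i/n are
    orthonormal eigenfunctions of the n-cycle (m odd)."""
    assert m % 2 == 1 and (m - 1) // 2 < n / 2
    vecs = []
    for i in range(n):
        coords = [1.0]
        for j in range(1, (m - 1) // 2 + 1):
            angle = 2 * math.pi * j * i / n
            coords.append(math.sqrt(2) * math.cos(angle))
            coords.append(math.sqrt(2) * math.sin(angle))
        vecs.append([c / math.sqrt(m) for c in coords])
    return vecs

n, m = 40, 5
v = cycle_vectors(n, m)

# Typical squared norm E_i |v_i|^2.
avg_sq_norm = sum(dot(x, x) for x in v) / n
# Local correlation E_{ij ~ G} <v_i, v_j>; by symmetry it suffices to
# average over the edges (i, i+1).
local_corr = sum(dot(v[i], v[(i + 1) % n]) for i in range(n)) / n
# Global correlation E_{i,j} <v_i, v_j>^2.
global_corr = sum(dot(a, b) ** 2 for a in v for b in v) / n ** 2
# Smallest eigenvalue among the m eigenfunctions used.
lam_m = math.cos(2 * math.pi * ((m - 1) // 2) / n)
```

Here the local correlation stays close to $1$ while the global correlation is exactly $1/m$ up to rounding error, which is the behaviour the lemma describes for graphs with many eigenvalues near $1$ .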

The condition that there exist vectors $v_{1},\ldots,v_{n}\in\varmathbb R^{n}$ with

is equivalent to the condition that there exists a symmetric positive semidefinite matrix $X\in\varmathbb R^{V\times V}$ such that

On Low Rank Approximations to Sets of Vectors

Let $v_{1},\ldots,v_{n}\in\varmathbb R^{n}$ be vectors in the unit ball. Then for every $\varepsilon>0$ , there exists a subset $U\subseteq\{v_{1},\ldots,v_{n}\}$ with $\lvert U\rvert\leqslant 1/\varepsilon$ such that $\operatorname*{\varmathbb{E}}_{i,j\in[n]}\lVert w_{i}\rVert\,\lVert w_{j}\rVert\,\langle\bar{w}_{i},\bar{w}_{j}\rangle^{2}\leqslant\varepsilon$ , where $w_{i}$ is the projection of $v_{i}$ to the orthogonal complement of the span of $U$ .

The proof of Theorem 7.1 is by an iterative construction. In each iteration, we will use the following lemma.

Let $v_{1},\ldots,v_{n}\in\varmathbb R^{n}$ be vectors. Then, there exists a unit vector $u\in\{\bar{v}_{1},\ldots,\bar{v}_{n}\}$ such that the vectors $v^{\prime}_{1},\ldots,v^{\prime}_{n}$ with $v^{\prime}_{i}=v_{i}-\langle v_{i},u\rangle u$ satisfy the following condition,

Suppose we pick a random index $j\in[n]$ and choose $u=\bar{v}_{j}$ . In this case, the squared norm of the vectors $v^{\prime}_{i}=v_{i}-\langle v_{i},u\rangle u$ equals

Hence, we can estimate the expected decrease of the typical squared norms for a random vector $u\in\{\bar{v}_{1},\ldots,\bar{v}_{n}\}$ .

It follows that there exists a unit vector $u\in\{\bar{v}_{1},\ldots,\bar{v}_{n}\}$ such that the vectors $v^{\prime}_{i}=v_{i}-\langle v_{i},u\rangle u$ have the desired property

We can construct the set $U$ in a greedy fashion so as to minimize the total squared norm of the vectors $w_{1},\ldots,w_{n}$ (the projections of the vectors $v_{i}$ to the orthogonal complement of the span of $U$ ). (In fact, we could choose the set $U$ randomly.) To make the analysis more convenient, we use the following, slightly different construction.

Let $v^{(1)}_{i}=v_{i}$ for all $i\in[n]$ .

For $t$ from $1$ to $1/\varepsilon$ , construct vectors $u^{(t)}\in\varmathbb R^{n}$ and $v^{(t+1)}_{1},\ldots,v^{(t+1)}_{n}\in\varmathbb R^{n}$ as follows:

Using Lemma 7.2, pick a unit vector $u^{(t)}\in\{\bar{v}^{(t)}_{1},\ldots,\bar{v}^{(t)}_{n}\}$ such that the vectors $v^{(t+1)}_{i}=v^{(t)}_{i}-\langle v^{(t)}_{i},u^{(t)}\rangle u^{(t)}$ satisfy the condition

Notice that the vectors $v^{(t)}_{1},\ldots,v^{(t)}_{n}$ are the projections of the vectors $v_{1},\ldots,v_{n}$ into the orthogonal complement of the span of the vectors $u^{(1)},\ldots,u^{(t-1)}$ . Let $U$ be the set of all indices $j$ such that $u^{(t)}=\bar{v}^{(t)}_{j}$ for some $t\in\{1,\ldots,1/\varepsilon\}$ . We can verify that the vectors $u^{(1)},\ldots,u^{(1/\varepsilon)}$ are an orthonormal basis of the span of $U$ . Let $w_{1},\ldots,w_{n}$ be the projections of the vectors $v_{1},\ldots,v_{n}$ into the orthogonal complement of the span of $U$ (so that $w_{i}=v^{(1/\varepsilon)}_{i}$ ). Since the vectors $w_{1},\ldots,w_{n}$ are projections of the vectors $v^{(t)}_{1},\ldots,v^{(t)}_{n}$ for all $t\in\{1,\ldots,1/\varepsilon\}$ , it follows that

Hence, we can bound the typical squared norm of the vectors $w_{i}$ ,

Since the left-hand side is nonnegative and $\operatorname*{\varmathbb{E}}_{i\in[n]}\lVert v_{i}\rVert^{2}\leqslant 1$ , it follows that $\operatorname*{\varmathbb{E}}_{i,j\in[n]}\lVert w_{i}\rVert\,\lVert w_{j}\rVert\,\langle\bar{w}_{i},\bar{w}_{j}\rangle^{2}\leqslant\varepsilon\,,$ as desired. ∎
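The iterative construction can be sketched in Python. This is a variant under stated assumptions, not the paper's exact procedure: each step picks the normalized vector that removes the most total squared norm (which does at least as well as the random choice analyzed in Lemma 7.2), and the loop stops as soon as the weighted correlation drops to $\varepsilon$ instead of always running $1/\varepsilon$ iterations. Since every step taken lowers the average squared norm by more than $\varepsilon$ and the vectors start in the unit ball, at most $1/\varepsilon$ steps occur. All function names are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def avg_sq_norm(vs):
    return sum(dot(v, v) for v in vs) / len(vs)

def weighted_correlation(vs, tol=1e-12):
    """E_{i,j} |v_i| |v_j| <v̄_i, v̄_j>^2; (near-)zero vectors contribute 0."""
    norms = [math.sqrt(dot(v, v)) for v in vs]
    total = 0.0
    for a, na in zip(vs, norms):
        for b, nb in zip(vs, norms):
            if na > tol and nb > tol:
                total += dot(a, b) ** 2 / (na * nb)
    return total / len(vs) ** 2

def best_projection_step(vs, tol=1e-12):
    """Project out the direction v̄_j that removes the most total squared norm.
    Assumes at least one vector has norm above tol."""
    best_j, best_drop = None, -1.0
    for j, c in enumerate(vs):
        nc = math.sqrt(dot(c, c))
        if nc > tol:
            drop = sum(dot(a, c) ** 2 for a in vs) / nc ** 2
            if drop > best_drop:
                best_j, best_drop = j, drop
    scale = math.sqrt(dot(vs[best_j], vs[best_j]))
    u = [x / scale for x in vs[best_j]]
    projected = []
    for a in vs:
        coeff = dot(a, u)
        projected.append([x - coeff * y for x, y in zip(a, u)])
    return best_j, projected

def low_rank_subset(vs, eps):
    """Return (indices chosen for U, projected vectors w_1..w_n)."""
    chosen = []
    while weighted_correlation(vs) > eps:
        j, vs = best_projection_step(vs)
        chosen.append(j)
    return chosen, vs

# Toy input: 30 random vectors in the unit ball of R^6.
rng = random.Random(1)
vectors = []
for _ in range(30):
    g = [rng.gauss(0, 1) for _ in range(6)]
    s = rng.random() / math.sqrt(dot(g, g))
    vectors.append([s * x for x in g])

U, w = low_rank_subset(vectors, eps=0.1)
```

The single-step function also gives a direct check of the inequality behind Lemma 7.2: the average squared norm after the step is at most the average squared norm before it minus the weighted correlation.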

For our applications it will sometimes be convenient to associate different subspaces with subsets $U$ of vectors (in Theorem 7.1, we associate the span of vectors in $U$ with the subset $U$ ).

Let $v_{1},\ldots,v_{n}\in\varmathbb R^{n}$ be vectors in the unit ball. For every subset $U\subseteq V$ , let $Q_{U}$ be the projector on some subspace orthogonal to the span of $U$ . (Note that $Q_{U}$ is not necessarily the projector on the orthogonal complement of the span of $U$ .) Then for every $\varepsilon>0$ , there exists a subset $U\subseteq\{v_{1},\ldots,v_{n}\}$ with $\lvert U\rvert\leqslant 1/\varepsilon$ such that $\operatorname*{\varmathbb{E}}_{i,j\in[n]}\lVert w_{i}\rVert\,\lVert w_{j}\rVert\,\langle\bar{w}_{i},\bar{w}_{j}\rangle^{2}\leqslant\varepsilon$ , where $w_{i}=Q_{U}v_{i}$ .

We use the same construction as in the proof of Theorem 7.1. The only difference is that we define $v^{(t+1)}_{i}=P_{U^{(t)}}v_{i}$ (instead of $v^{(t+1)}_{i}=v^{(t)}_{i}-\langle v^{(t)}_{i},u^{(t)}\rangle u^{(t)}$ ). Here, $U^{(t)}$ is the set of all indices $j$ such that $u^{(t^{\prime})}=\bar{v}_{j}^{(t^{\prime})}$ for some $t^{\prime}\leqslant t$ . The proof still applies to this modified construction because $\lVert v^{(t+1)}_{i}\rVert\leqslant\lVert v^{(t)}_{i}-\langle v^{(t)}_{i},u^{(t)}\rangle u^{(t)}\rVert$ (which is the only fact used about these vectors). ∎

Rounding SDP Solutions to Unique Games

In this section, we present a subexponential-time algorithm for Unique Games based on an SDP hierarchy, namely the simple SDP augmented with the Sherali-Adams hierarchy. This hierarchy of relaxations, which is weaker than the Lasserre hierarchy, was studied in some earlier works [RS09, KS09]. Roughly speaking, the $m$ th round relaxation in this hierarchy corresponds to the basic semidefinite program, along with all valid constraints on at most $m$ vectors. Formally, the variables in the $m$ th round relaxation for Unique Games consist of

A set of vectors $\mathcal{V}=\{v_{ia}\}_{i\in V,a\in[k]}$ with $k$ orthogonal vectors for every vertex $i\in V$ .

The constraints of the SDP relaxation ensure that the inner products of the vectors are consistent with the corresponding local distributions, i.e., for all $S\subseteq V$ with $\lvert S\rvert\leqslant m$ , $i,j\in S$ and $a,b\in[k]$ ,

The objective value of the SDP corresponds to minimizing the number of violated constraints,

Sample an assignment $x_{S}\in[k]^{S}$ for $S$ according to its local distribution $\mu_{S}$ .

For every other vertex $i\in V\setminus S$ , sample a label $x_{i}\in[k]$ according to the local distribution for $S\cup\{i\}$ conditioned on the assignment $x_{S}$ for $S$ .

The above procedure will be referred to as propagation rounding and the set $S$ of vertices will be called the seed vertices.

The following lemma implies that if the seed vertices $S$ nearly determine the values of a set of vertices $T$ , then the assignment output by the propagation rounding has a distribution similar to the local distribution $\mu_{T}$ that is part of the LP/SDP solution (hence gets close to the SDP value).

For a set $S\subseteq V$ with $|S|=m-t$ , let $\mu^{|S}$ denote the distribution over global assignments $x\in[k]^{V}$ output by propagation rounding with $S$ as the seed vertex set. Then, for every subset $T$ with $|T|\leqslant t$ we have

Sample an assignment $x_{S}\in[k]^{S}$ for $S$ according to its local distribution $\mu_{S}$ ,

Sample an assignment $y_{T}\in[k]^{T}$ according to the local distribution for $S\cup T$ conditioned on the assignment $x_{S}$ for $S$ ,

For every vertex $t\in T$ , sample a label $x_{t}\in[k]$ according to the local distribution for $S\cup\{t\}$ conditioned on the assignment $x_{S}$ for $S$ ,

Clearly the distribution of $y_{T}$ is $\mu_{T}$ , while the distribution of $x_{T}$ is $\mu^{|S}_{T}$ . For any $t\in T$ , the coordinates $x_{t}$ and $y_{t}$ are independent samples from $\mu_{S,t}\mid x_{S}$ . Therefore we have,

Averaging over the different choices of $x_{S}$ ,
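The two distributions compared in this proof can be computed exactly on small examples. The following Python sketch assumes, unlike the general setting, that the local distributions are marginals of one explicit global distribution `joint`. It computes the local distribution $\mu_{T}$ , the distribution $\mu^{|S}_{T}$ produced by propagation rounding (a mixture over $x_{S}$ of products of single-vertex conditionals), and their statistical distance. The function names are hypothetical.

```python
from itertools import product

def marginal(joint, idxs):
    """Marginal of a global distribution {assignment tuple: prob} on positions idxs."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in idxs)
        out[key] = out.get(key, 0.0) + p
    return out

def propagated_marginal(joint, S, T):
    """Distribution of x_T under propagation rounding with seed set S."""
    out = {}
    for x_S, p_S in marginal(joint, S).items():
        # Conditional distribution of each single vertex t given x_S.
        conds = []
        for t in T:
            local = marginal(joint, list(S) + [t])
            conds.append({key[-1]: p / p_S
                          for key, p in local.items() if key[:-1] == x_S})
        # Labels of different vertices in T are sampled independently given x_S.
        for labels in product(*(c.items() for c in conds)):
            p = p_S
            for _, q in labels:
                p *= q
            x_T = tuple(a for a, _ in labels)
            out[x_T] = out.get(x_T, 0.0) + p
    return out

def statistical_distance(p, q):
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(key, 0.0) - q.get(key, 0.0)) for key in keys)

# Case 1: the seed X_0 determines X_1 and X_2, so the distance is 0.
determined = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
d1 = statistical_distance(marginal(determined, [1, 2]),
                          propagated_marginal(determined, [0], [1, 2]))

# Case 2: X_1 = X_2 is independent of the seed X_0, so the correlation
# between X_1 and X_2 is lost by propagation rounding.
undetermined = {(a, b, b): 0.25 for a in range(2) for b in range(2)}
d2 = statistical_distance(marginal(undetermined, [1, 2]),
                          propagated_marginal(undetermined, [0], [1, 2]))
```

The two cases show the dichotomy the lemma quantifies: when the seeds leave no conditional variance the two distributions coincide, and when they leave the variables in $T$ undetermined the distance can be large.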

2 Unique Games on Low Rank Graphs

Let $\Gamma$ be an instance of Unique Games whose constraint graph $G$ has low threshold rank. Let $\mathcal{V}=\{v_{ia}\}_{i\in V,a\in[k]}$ be an SDP solution for $\Gamma$ , and let $\{\mu_{S}\}_{S\subseteq V,|S|\leqslant m}$ denote the associated set of local distributions. Let $X_{1},\ldots,X_{n}$ denote the associated $m$ -local random variables. The main result of this section shows that there exists a small set of seed vertices such that fixing their values determines the value of almost every other vertex. Formally, we show the following.

For every integer $m$ , there exists a subset of vertices $S\subseteq V$ of size $|S|=k^{2}m$ such that

To this end, we will relate conditioning a random variable $X_{i}$ on a set $X_{S}$ to projecting the SDP vectors corresponding to the variable $X_{i}$ into the span of the vectors corresponding to $X_{S}$ . This analogy is formalized in the following lemma.

Let $X_{1},X_{2},\ldots,X_{r}$ be random variables with range $[k]$ and joint distribution $\mu$ . For each $i\in[r]$ , $a\in[k]$ , let $X_{ia}$ be the indicator of the event that $X_{i}=a$ . Suppose there exist vectors $\{v_{ia}\}_{i\in[r],a\in[k]}$ such that

Let us suppose $v_{ia}=\sum_{j\in S,b\in[k]}c_{jb}v_{jb}+P_{S}v_{ia}$ . Define a random variable $C_{S}$ as follows,

Note that on fixing the values $\{X_{j}\}_{j\in S}$ , the random variable $C_{S}$ is fixed.

By the definition of variance of a real random variable we have the following inequality.

Averaging the above inequality over the settings of $x_{S}$ , we get

Note that the second moments of the random variables $\{X_{ia}\}_{i\in[r],a\in[k]}$ match with the corresponding inner products of vectors $\{v_{ia}\}_{i\in[r],a\in[k]}$ . Hence,

The claim (1) follows from (8.1) and (8.2).

The claim (2) follows from (1) and the definition of variance of a random variable taking values over $[k]$ . ∎

For a subset $T\subseteq\mathcal{V}=\{v_{ia}\}_{i\in V,a\in[k]}$ , let $S_{T}$ be the set of vertices associated with it, namely,

Let $Q_{T}$ denote the projector onto the subspace orthogonal to the span of $\{v_{ia}\mid i\in S_{T},a\in[k]\}$ . In particular, $Q_{T}$ is a projector onto a subspace orthogonal to $T$ for all $T\subseteq\mathcal{V}$ .

Apply Theorem 7.3 on the set of vectors $\mathcal{V}=\{v_{ia}\}_{i\in V,a\in[k]}$ with the projectors $Q_{T}$ for a subset $T\subseteq\mathcal{V}$ . Theorem 7.3 implies that there exists a choice of $T\subseteq\mathcal{V}$ of size $|T|=k^{2}m$ such that if $u_{ia}=Q_{T}v_{ia}$ then,

From Lemma 8.5, the low global correlation of vectors $\{U_{i}\}_{i\in V}$ implies that their squared length is small, i.e.,

If $\mathcal{V}$ is an SDP solution to unique games with value $1-\eta$ , i.e.,

If the vectors $\{U_{i}\}_{i\in V}$ satisfy,

then the average correlation among the vectors $\{U_{i}\}_{i\in V}$ is at least $\nicefrac{{1}}{{m}}$ , i.e.,

By Lemma 8.4, the vectors $\{U_{i}\}$ satisfy

Let $\operatorname*{\varmathbb{E}}_{i}\lVert U_{i}\rVert^{2}=C\geqslant 4\eta/\lambda_{m}$ . Normalize the vectors $U_{i}$ so as to make their average squared length equal to $1$ . The resulting vectors have correlation at least $(1-\eta/C)\geqslant 1-\lambda_{m}/2$ . By Lemma 6.1, this implies that $\operatorname*{\varmathbb{E}}_{i,j\in V}\langle U_{i},U_{j}\rangle^{2}\geqslant\frac{1}{m}$ . Since $\lVert U_{i}\rVert\leqslant 1$ for all $i\in V$ , we get

3 Wrapping Up

Our main result about Unique Games (Theorem 1.3) is a direct consequence of Theorem 8.6 and Theorem 8.7 presented here.

For every positive integer $m$ , there exists an algorithm running in time $n^{O(mk^{2})}$ that, given a Unique Games instance $\Gamma$ over alphabet $[k]$ with value $1-\eta$ , finds a labelling satisfying a $1-O(\frac{\eta}{\lambda_{m}})$ fraction of the edges. Here $\lambda_{m}$ is the $m^{th}$ smallest eigenvalue of the Laplacian of the constraint graph $\Gamma$ .

The algorithm proceeds by solving the $k^{2}m+2$ -round Lasserre SDP for the given instance. Starting with the SDP solution, the algorithm runs the propagation rounding algorithm starting from every possible seed set $S$ of size $|S|=k^{2}m$ .

By Lemma 8.2, there exists one such set $S$ for which we have,

Let $\mu^{|S}$ denote the distribution over global assignments output by the propagation rounding scheme. For an edge $(i,j)$ , let $\mu_{ij}$ denote the local distribution over $[k]^{2}$ suggested by the SDP solution. From Lemma 8.1, the statistical distance between $\mu_{ij}$ and $\mu^{|S}_{ij}$ is at most

Averaging over all the edges we see that,

where $\mathsf{Val}(\mathcal{V})$ is the SDP objective value of the solution $\mathcal{V}$ . Along with (8.5), this implies that the algorithm on the choice of the appropriate seed set $S$ would find a solution with value at least $1-\eta-O(\frac{\eta}{\lambda_{m}})$ . ∎
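The outer loop of this algorithm, which tries every seed set of a given size and keeps the best labelling, can be sketched in Python. The sketch is illustrative only: the representation of an instance as triples $(i,j,\pi_{ij})$ and the callback `round_from_seed`, which stands in for one run of propagation rounding on an SDP solution, are assumptions made here, and the SDP itself is not solved.

```python
from itertools import combinations

def ug_value(edges, labelling):
    """Fraction of constraints (i, j, pi) satisfied, i.e. pi[x_i] == x_j."""
    satisfied = sum(1 for i, j, pi in edges if pi[labelling[i]] == labelling[j])
    return satisfied / len(edges)

def best_over_seed_sets(n, seed_size, edges, round_from_seed):
    """Try every seed set of the given size and keep the best labelling found.
    round_from_seed is a hypothetical callback mapping a seed set (a tuple of
    vertices) to a labelling of all n vertices."""
    best = None
    for S in combinations(range(n), seed_size):
        labelling = round_from_seed(S)
        value = ug_value(edges, labelling)
        if best is None or value > best[0]:
            best = (value, S, labelling)
    return best

# Toy instance on a path 0 - 1 - 2 with alphabet {0, 1, 2}; both constraints
# are the cyclic shift a -> a + 1 (mod 3).
shift = [1, 2, 0]
edges = [(0, 1, shift), (1, 2, shift)]

# Stub rounding (an assumption for the demo): only the seed set (0,) yields
# the satisfying labelling.
def stub_round(S):
    return [0, 1, 2] if S == (0,) else [0, 0, 0]

best_value, best_seed, best_labelling = best_over_seed_sets(3, 1, edges, stub_round)
```

The loop visits $\binom{n}{s}\leqslant n^{s}$ seed sets of size $s$ , which is where the exponent $O(mk^{2})$ in the running time comes from when $s=k^{2}m$ .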

There exists an algorithm that, given a Unique Games instance $\Gamma$ with vertex set $[n]$ , label set $[k]$ , and optimal value $1-\varepsilon$ , finds an assignment with value at least $\nicefrac{{1}}{{2}}$ by rounding a $k^{2}\cdot n^{O(\varepsilon^{1/3})}$ -round Lasserre solution.

The proof follows by combining our propagation rounding and the decomposition theorem of [ABS10]. The latter result allows us to partition the input graph into disjoint components, each with $(1-c\varepsilon)$ -threshold rank at most $n^{O(\varepsilon^{1/3})}$ , by removing at most a $0.01$ fraction of the edges in our input graph. An SDP solution for the input graph induces a solution for each of the components, and hence we can round the solution for each component separately using propagation rounding. ∎

Conclusions

We have shown that $n^{O(\varepsilon^{1/3})}$ rounds of an SDP hierarchy suffice for solving the Unique Games problem on $(1-\varepsilon)$ -satisfiable instances. The best lower bound known for the hierarchy we used is $\log\log^{\Omega(1)}n$ [RS09, KS09], and so a natural question, with obvious relevance to the unique games conjecture, is which bound is closer to the truth. The fact that our algorithm’s running time for $r$ rounds is only $2^{O(r)}$ (as opposed to $n^{O(r)}$ ), challenges the interpretation of lower bounds in the range $[\omega(1),O(\log n)]$ as corresponding to super-polynomial running time, and so provides further motivation to the question of whether the current hierarchy lower bounds can be improved further.

With the exception of the Small-Set Expansion problem, we do not know how to translate algorithms for Unique Games into other computational problems. We hope that our ideas will help in combining the [ABS10] subexponential algorithm for Unique Games with SDP-based methods to make progress on other Unique Games-hard computational problems. Indeed, Arora and Ge (personal communication) recently used the ideas of this work to obtain improved algorithms for $3$ -coloring on some interesting families of instances. A concrete open question along similar lines is whether one can get an algorithm for the Max Cut problem with approximation factor $\varepsilon$ better than the factor of the Goemans-Williamson algorithm that runs in time $\exp(n^{\operatorname{poly}(\varepsilon)})$ .

For general 2-CSPs, we know that some instances will require a large number of hierarchy rounds, but it’s interesting to see whether there is any clean characterization of the instances on which SDP hierarchies do well, encompassing, say, both low threshold rank graphs and planar graphs. Another interesting question is to find the right generalization of the low threshold rank condition to $k$ -CSPs for $k>2$ .

References

Appendix A Faster Algorithms for SDP hierarchies

In this section, we argue that our rounding algorithm also works with weaker SDP hierarchies. We will show that for these weaker hierarchies, a near-optimal $m$ -round solution can be computed in time $2^{O(r)}\operatorname{poly}(n)$ . Due to the equivalence of optimization and separation, it is enough to describe a separation oracle with running time $2^{O(r)}\operatorname{poly}(n)$ . Given a collection of vectors $\{v_{ia}\}$ , the separation oracle either has to output a good assignment or it has to output a valid linear constraint violated by the inner products of the input vectors.

We argue that such a separation oracle can easily be extracted from our rounding algorithm. Our rounding algorithm for Unique Games first selects a set $S$ of roughly $m$ vertices, then samples an assignment $x_{S}$ for these vertices, and finally samples labels $x_{i}$ for the remaining vertices from the local distributions conditioned on the event $x_{S}$ . The selection of the set $S$ depends only on the SDP vectors $\{v_{ia}\}$ but not on the local distributions (which are not known to the separation oracle).

Hence, given vectors $\{v_{ia}\}$ , our separation oracle can simply work as follows:

Select a vertex subset using Theorem 7.1 based on the given vectors $\{v_{ia}\}$ .

Using linear programming, find local distributions that are as consistent as possible with the inner products of the vectors $\{v_{ia}\}$ . If these local distributions match the inner products sufficiently closely, then our propagation rounding algorithm will succeed. On the other hand, if the local distributions do not match the inner products closely enough, then we can find a valid linear constraint that is violated by the inner products of the given vectors. (This separating linear constraint can be obtained from the dual solution of the linear program that was used to find the best local distributions.)

Appendix B Omitted proofs from Section 5 and Section 8

This appendix contains the proofs omitted from Section 5 and Section 8.

(Lemma 8.4 restated) If $\mathcal{V}$ is an SDP solution to unique games with value $1-\eta$ , i.e.,

Observe that the vectors $u_{ia}$ are projections of $v_{ia}$ and projections shrink distances, which implies that the $\{u_{ia}\}$ vectors are correlated across constraints of the Unique Games instance,

Since $\lVert\bar{x}_{1}\otimes\bar{x}_{2}-\bar{y}_{1}\otimes\bar{y}_{2}\rVert^{2}\leqslant\lVert\bar{x}_{1}-\bar{y}_{1}\rVert^{2}+\lVert\bar{x}_{2}-\bar{y}_{2}\rVert^{2}$ , we can further upper bound

(In the last step, we again used the identity $\lVert x-y\rVert^{2}=(\lVert x\rVert-\lVert y\rVert)^{2}+\lVert x\rVert\,\lVert y\rVert\,\lVert\bar{x}-\bar{y}\rVert^{2}$ and the fact that $\lVert u_{ia}\rVert\,\lVert u_{j\pi_{ij}(a)}\rVert\leqslant\lVert v_{ia}\rVert\,\lVert v_{j\pi_{ij}(a)}\rVert$ .) By averaging over the label set and the edges of the graph, it follows as claimed that
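The two elementary facts used in this proof, the norm identity and the tensor inequality, can be verified directly. The following derivation is a sketch that writes $\bar{x}=x/\lVert x\rVert$ for nonzero $x$ and treats $\bar{x}_{1},\bar{x}_{2},\bar{y}_{1},\bar{y}_{2}$ as unit vectors; the abbreviations $a$ and $b$ are introduced here for illustration only.

```latex
\begin{align*}
\lVert x-y\rVert^{2}
  &= \lVert x\rVert^{2}+\lVert y\rVert^{2}-2\langle x,y\rangle \\
  &= (\lVert x\rVert-\lVert y\rVert)^{2}
     +2\,\lVert x\rVert\,\lVert y\rVert\,\bigl(1-\langle\bar{x},\bar{y}\rangle\bigr) \\
  &= (\lVert x\rVert-\lVert y\rVert)^{2}
     +\lVert x\rVert\,\lVert y\rVert\,\lVert\bar{x}-\bar{y}\rVert^{2}.
\end{align*}
% Tensor inequality, with a = <\bar{x}_1,\bar{y}_1> and b = <\bar{x}_2,\bar{y}_2>,
% both in [-1,1] because all four vectors are unit vectors:
\begin{align*}
\lVert\bar{x}_{1}\otimes\bar{x}_{2}-\bar{y}_{1}\otimes\bar{y}_{2}\rVert^{2}
  &= 2-2ab, \\
\lVert\bar{x}_{1}-\bar{y}_{1}\rVert^{2}+\lVert\bar{x}_{2}-\bar{y}_{2}\rVert^{2}
  &= (2-2a)+(2-2b), \\
(2-2a)+(2-2b)-(2-2ab)
  &= 2(1-a)(1-b)\;\geqslant\;0.
\end{align*}
```

The last line is nonnegative because $a,b\leqslant 1$ , which gives the inequality $\lVert\bar{x}_{1}\otimes\bar{x}_{2}-\bar{y}_{1}\otimes\bar{y}_{2}\rVert^{2}\leqslant\lVert\bar{x}_{1}-\bar{y}_{1}\rVert^{2}+\lVert\bar{x}_{2}-\bar{y}_{2}\rVert^{2}$ .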

On the other hand, we can upper bound the inner product of ${\bm{v}}_{i}$ and ${\bm{v}}_{j}$ ,

Finally, the vectors ${\bm{v}}_{1},\ldots,{\bm{v}}_{n}$ are in the unit ball,

Here, we are using the fact that $\langle v_{ia},v_{ib}\rangle=0$ for all distinct $a,b\in[k]$ . ∎

Appendix C Facts about Variance

Let $X$ and $Y$ be jointly-distributed random variables. Assume that $Y$ has finite range. Let $Z$ be the orthogonal projection of the random variable $X$ onto the subspace of functions of the random variable $Y$ . Then,

By construction $Z$ is a function $f(Y)$ of the random variable $Y$ and $X-Z$ is orthogonal to all functions of the variable $Y$ . Hence, $\operatorname*{\varmathbb{E}}[X\mid Y=y]=f(y)$ . Therefore, the expected variance of $[X\mid Y]$ is

which gives the desired identity using $Z=f(Y)$ . ∎
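The relation between conditional variance and projection can be checked with exact arithmetic on a small example. The following Python sketch assumes the identity being proved is $\operatorname*{\varmathbb{E}}_{Y}\operatorname{Var}[X\mid Y]=\operatorname*{\varmathbb{E}}(X-Z)^{2}$ with $Z=f(Y)$ and $f(y)=\operatorname*{\varmathbb{E}}[X\mid Y=y]$ , and it represents a joint distribution as a dictionary `{(x, y): probability}`. The function names and the toy distribution are hypothetical.

```python
from fractions import Fraction

def conditional_stats(joint):
    """Return {y: (P[Y=y], E[X | Y=y], E[X^2 | Y=y])} for joint = {(x, y): prob}."""
    stats = {}
    for y in {y for _, y in joint}:
        rows = [(x, p) for (x, yy), p in joint.items() if yy == y]
        p_y = sum(p for _, p in rows)
        mean = sum(p * x for x, p in rows) / p_y
        second = sum(p * x * x for x, p in rows) / p_y
        stats[y] = (p_y, mean, second)
    return stats

def expected_conditional_variance(joint):
    """E_y Var[X | Y = y]."""
    return sum(p_y * (second - mean ** 2)
               for p_y, mean, second in conditional_stats(joint).values())

def projection_residual(joint):
    """E (X - Z)^2, where Z = f(Y) and f(y) = E[X | Y = y]."""
    f = {y: mean for y, (_, mean, _) in conditional_stats(joint).items()}
    return sum(p * (x - f[y]) ** 2 for (x, y), p in joint.items())

# Toy joint distribution of (X, Y) with exact rational probabilities.
joint = {
    (0, 0): Fraction(1, 4),
    (1, 0): Fraction(1, 4),
    (2, 1): Fraction(1, 8),
    (5, 1): Fraction(3, 8),
}
lhs = expected_conditional_variance(joint)
rhs = projection_residual(joint)
```

Using `Fraction` avoids rounding error, so the two sides can be compared for exact equality.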

Let $X$ and $Y$ be as in the previous lemma. Suppose the range of $Y$ has cardinality $2$ . Then,

Without loss of generality, we may assume that $\operatorname*{\varmathbb{E}}X=\operatorname*{\varmathbb{E}}Y=0$ and $\operatorname*{\varmathbb{E}}Y^{2}=1$ . Then, the set of random variables $\{1,Y\}$ is an orthonormal basis for the subspace of functions of $Y$ . Let $\rho=\operatorname*{\varmathbb{E}}XY$ . Then, $\rho Y$ is the orthogonal projection of $X$ to the subspace of functions of $Y$ . (Here, we use the assumption $\operatorname*{\varmathbb{E}}X=0$ .) Hence, using the previous lemma,
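The computation that finishes this argument is short. The following derivation is a sketch, assuming that the previous lemma states $\operatorname*{\varmathbb{E}}_{Y}\operatorname{Var}[X\mid Y]=\operatorname*{\varmathbb{E}}(X-Z)^{2}$ for the projection $Z$ , and using the normalization $\operatorname*{\varmathbb{E}}X=\operatorname*{\varmathbb{E}}Y=0$ , $\operatorname*{\varmathbb{E}}Y^{2}=1$ from above.

```latex
\begin{align*}
\operatorname*{\varmathbb{E}}_{Y}\operatorname{Var}[X\mid Y]
  &= \operatorname*{\varmathbb{E}}(X-\rho Y)^{2} \\
  &= \operatorname*{\varmathbb{E}}X^{2}
     -2\rho\operatorname*{\varmathbb{E}}XY
     +\rho^{2}\operatorname*{\varmathbb{E}}Y^{2} \\
  &= \operatorname*{\varmathbb{E}}X^{2}-\rho^{2}
   \;=\; \operatorname{Var}X-\operatorname{Cov}(X,Y)^{2}.
\end{align*}
```

Undoing the normalization of $Y$ (which does not change the subspace of functions of $Y$ ) turns the right-hand side into $\operatorname{Var}X-\operatorname{Cov}(X,Y)^{2}/\operatorname{Var}Y$ for general $X$ and a two-valued, non-constant $Y$ .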