Rounding Semidefinite Programming Hierarchies via Global Correlation
Boaz Barak, Prasad Raghavendra, David Steurer
Introduction
This paper is concerned with hierarchies of semi-definite programs (SDP’s). Semidefinite programs are an extremely useful tool in algorithms and in particular approximation algorithms (e.g., [GW95, KMS98]). SDP’s involve finding an integral (say ) solution for some optimization problem, by using convex programming to find a fractional/high-dimensional solution and then rounding it into an integral solution. Sherali and Adams [SA90], Lovász and Schrijver [LS91], and, later Lasserre [Las01], proposed systematic ways, known as hierarchies, to make this convex relaxation tighter, thus ensuring that the fractional solution is closer to an integral one. These hierarchies are parameterized by a number , called the level or number of rounds of the hierarchy. Given a program on variables, optimizing over the level of the hierarchy can be done in time . The gap between integral and fractional solutions decreases with , and reaches zero at the level, where the program is guaranteed to find an optimal integral solution. The paper [Lau03] surveys and compares the different hierarchies proposed in the literature, see also the recent survey [CT10].
These semidefinite programming hierarchies have been of some interest in recent years, since they provide natural candidate algorithms for many computational problems. In particular, whenever the basic semidefinite or linear program provides a suboptimal approximation factor, it makes sense to ask how many rounds of the hierarchy are required to significantly improve upon this factor. Unfortunately, taking advantage of these hierarchies has often been difficult, and while some algorithms (e.g., [ARV09]) can be encapsulated in, say, level or of some hierarchies, there have been relatively few results (e.g. [Chl07, BCC+10]) that use higher levels to obtain new algorithmic results. In fact, there has been more success in showing that high levels of hierarchies do not help for many computational problems [ABLT06, STT07, GMPT07, RS09, KS09]. In particular for 3SAT and several other NP-hard problems, it is known that it takes rounds of the strongest SDP hierarchy (i.e., Lasserre) to improve upon the approximation rate achieved by the basic SDP (or sometimes even simpler algorithms) [Sch08, Tul09].
Semidefinite hierarchies are of particular interest in the case of problems related to Khot’s Unique Games Conjecture (UGC) [Kho02]. Several works have shown that for a wide variety of problems, the UGC implies that (unless ) the basic semidefinite program cannot be improved upon by any polynomial-time algorithm [KKMO04, MOO05, Rag08]. Thus in particular the UGC predicts that for all these problems, it will take a super-constant (and in fact polynomial, under widely believed assumptions) number of hierarchy rounds to improve upon the basic SDP. Investigating this prediction, particularly for the Unique Games problem itself and other related problems such as Max Cut, Sparsest Cut and Small-Set Expansion, has been the focus of several works, and it is known that at least rounds are required for a non-trivial approximation [RS09, KS09] by a natural (though not strongest possible) SDP hierarchy. However, no non-trivial upper bound was known prior to the current work, and so it was conceivable that these lower bounds can be improved to .
Recently, Arora, Barak and Steurer [ABS10] gave a -time algorithm for solving the Unique Games and Small-Set Expansion problems (where is the completeness parameter, see below). However, their algorithm did not use semidefinite programming hierarchies, and so does not immediately imply an upper bound on the number of rounds needed.
Our main contribution is a new method to analyze and round SDP hierarchies. We elaborate more on our method in Section 2, but its high level description is that uses global correlations inside the high-dimensional SDP solution, combined with the hierarchy constraints, to obtain a better rounding of this solution into an integral one. We believe this method can be of general utility, and in particular we use it here to give new algorithms for approximating constraint satisfaction problem on two-variable constraints (2-CSP’s), that run faster than the previously known algorithms for a natural family of instances. To state our results we need the notion of a threshold rank.
Results for Unique Games constraints
We say that a Max 2-Csp instance is a Unique Games instance if all the relation have the form that iff where is a permutation of . As mentioned above, the performance of SDP hierarchy on Unique Games instances and related problems is of particular interest. We obtain somewhat stronger quantitative results for Unique Games instances. Also, as remarked below, our results are “morally stronger” in this case, since it’s conceivable that the hardest instances for these types of problems have small threshold rank. First, we show that for Unique Games instances the threshold in Theorem 1.1 does not need to depend on the alphabet size. Namely, we prove
The Unique Games Conjecture is about a specific approximation regime for Unique Games. Given a Unique Games instance with optimal value , the goal is to find an assignment with value at least .
We also show that in this case a sublinear (and in fact a small root) number of rounds suffice to get such an approximation in the worst case, regardless of the instance’s threshold rank. Moreover, we also show that such an approximation can be obtained in a number of rounds that depends on the -threshold rank for that is close to (as opposed to the small value of needed for Theorems 1.1 and 1.2).
Examples of graphs with small threshold rank
Also, as noted in [ABS10], hypercontractive graphs (i.e., graphs whose to operator norm is bounded) have at most polylogarithmic -threshold rank for every constant . For several 2-CSP’s such as Max Cut, Unique Games, Small-Set Expansion, Sparsest Cut, the constraint graphs for the canonical “problematic instances” (i.e., integrality gap examples [FS02, KV05, KS09, RS09]) are all hypercontractive, since they are based on either the noisy Gaussian graph or noisy Boolean cube. In fact, it is conceivable that the Small-Set Expansion problem is trivial on graphs with large threshold rank, in the sense that we do not know of any example of an instance having, say, -threshold rank, and objective value smaller than . (For the Unique Games and Max Cut problems it is trivial to construct instances with large threshold rank by taking many disjoint copies of the same instance, though it could still be the case that the hardest instances are the ones with small threshold rank.) On the other hand, for other 2-CSPs such as Label Cover, some natural hard instances have linear threshold-rank. For example this is the case if one considers the natural “clause vs. variable” or “clause vs. clause” 2-CSP obtained from random instances of 3SAT (which is not surprising given that a non-trivial approximation for random 3SAT requires levels of the Lasserre hierarchy [Sch08]).
Algorithm efficiency
Our algorithm actually does not require the full power of the Lasserre hierarchy. First, we can use the relaxed variant with approximate constraints studied in [KS09, RS09, KPS10]. Second, in the proof of Theorem 1.3, we don’t need to utilize the constraints on all -sized subsets of variables, but rather sufficiently many random sets suffice. As a result, we can implement our -round algorithm in time .
2 Related works
For Unique Games and related problems, previous works [KT07, Kol10, ABS10] used subspace enumeration to give algorithms with similar running time to Theorem 1.3 in the case that the threshold rank of the label extended graph of the instance is small. This is known to be a stronger condition on the instances than bounding the threshold rank of the constraint graph. The only known bound on the threshold rank of the label extended graph in terms of the threshold rank of the constraint graph loses a factor of about [ABS10]. These subspace enumeration algorithms also only applied to nearly satisfiable instances (whose objective value is close to ), and so did not give guarantees comparable to Theorems 1.1 and 1.2. As mentioned below in Section 2, SDP-based algorithms have some robustness advantages over spectral techniques. SDP hierarchies are also easily shown to yield polynomial-time approximation scheme for 2CSPs whose constraint graphs can have very high threshold rank such as bounded tree width graphs and regular planar graphs (or more generally any hyperfinite family of graphs, see e.g. [HKNO09] and the references therein).
Approximation schemes for (pseudo) dense CSP’s
For general 2CSP’s, several works gave polynomial-time approximation schemes for dense and pseudo-dense instances [FK99, ACOH+10, COCF10]. Our work generalizes these results, since pseudo-density is a stronger condition than having a constraint graph of low threshold rank. Furthermore, for an -approximation the degree of the instance needed by these works is exponential in , while the results of this work apply even on random graphs of degree .
Analyzing SDP hierarchy
Using very different methods, Chlamtac [Chl07] and Bhaskara et al [BCC+10] gave LP/SDP-hierarchy based algorithms for hypergraph coloring and the densest subgraph problem respectively. As mentioned above, several works gave lower bounds for LP/SDP hierarchies. In particular [RS09, KS09] showed that approximation such as those achieved in Theorem 1.3 for Unique Games problem require rounds of a relaxed variant of the Lasserre hierarchy. This relaxed variant captures our hierarchy as well. Schoenebeck [Sch08] proved that achieving a non-trivial approximation for 3SAT on random instances requires rounds in the Lasserre hierarchy, while Tulsiani [Tul09] showed that Lasserre lower bounds are preserved under common types of NP-hardness reductions.
In a concurrent and independent work, Guruswami and Sinop [GS11] gave results very similar to ours. They also use the Lasserre hierarchy to get an approximation scheme with similar performance to our Theorem 1.1 for 2-CSPs, and in fact even consider generalizations involving additional (approximate) global linear constraints. They also get essentially the same results for Unique Games as our Theorem 1.3. Furthermore, their rounding algorithm is the same as ours. However, there are some differences both in results and the proof. First, although [GS11] use a notion similar to our local-to-global correlation, they view it differently, and interestingly relate it to the problem of column selection for low rank approximations of matrices. Also, apart from the special case of unique constraints, they work with the threshold rank of the label extended graph, as opposed to the constraint graph as is the case here (however for binary alphabet these two graphs coincide). Their analysis relies on the full power of the Lasserre hierarchy, whereas we show that a weaker hierarchy is sufficient in the Unique Games case, and can even be done faster (i.e., vs ).
Our techniques
We now describe, on a very high and imprecise level, the ideas behind our rounding algorithm and its analysis. A semidefinite programming relaxation of an optimization problem yields a set of vectors satisfying certain conditions and achieving some objective value . The goal of a rounding algorithm is to transform this set of vectors into, say, a solution, satisfying the same conditions and achieving value that is close to . At a very high level, our main result is that if these vectors have some non-trivial global correlation, then a good rounding can be achieved with a non-trivially small number of hierarchy rounds. Our second observation is that in several cases, the vectors corresponding to a good SDP solution can be shown to have significant mass inside some low-dimensional subspace, and that implies a lower bound on their global correlation. Below we elaborate on what we mean, using the Max Cut problem (which is a special case of Unique Games) as an illustrative example. Our result for Max Cut is worked out in more detail in Section 4.
The SDP solution for Max Cut problem consists of a sequence of unit vectors, and the objective value is the expectation of over all edges in the input graph. Note that in the case that the vectors are one dimensional unit vectors (i.e., ), exactly corresponds to a cut in the graph, and the objective value measures the fraction of edges cut. Now, suppose that you could find vectors , whom we’ll call the basis vectors, such that every other has some significant projection into the span of . That is, if we let be the projection operator corresponding to this space, then for every , . It turns out that in this case, if is sufficiently close to and the vector solution satisfied rounds of an appropriate SDP hierarchy, then we can round to achieve a very good cut. The intuition behind this is the following: the constraints of hierarchy rounds allow us to essentially assume without loss of generality that the vectors are one-dimensional. That is, after applying an appropriate rotation, we can think of each one of them as a vector of the form . Moreover, our assumption implies that every other vector in has a magnitude of at least in its first coordinate. Now one can show that simply rounding each vector to the sign of its first coordinate will result in a assignment to the vertices corresponding to a good cut.
Local to global correlation
From the above discussion, our goal of rounding SDP hierarchies is reduced to finding a small number of basis vectors such that every (or at least most) other vector in the solution has very large projection into their span. But, why should such vectors exist? We show that we can assume they exist if the original Max Cut instance has small threshold rank. The latter is a condition that, as mentioned above, holds for many natural families of instances, including the canonical “hard instances” that are known to fool the GW algorithm— the noisy sphere and noisy Gaussian graphs [FS02, RS09]. The key concept behind our proof is the notion of local vs global correlations. It is a very well known property of expander graphs that random edges behave similarly to pairs of independently chosen vertices with respect to some tests. Specifically, if is an -vertex expander in the sense that the normalized adjacency matrix ’s second largest eigenvalue is at most , and is a bounded function mapping vertices to numbers, then we know that , where the former expectation is over pairs of vertices and the latter is over pairs connected by an edge. In other words, expander graphs imply that if is locally correlated over the edges of an expander graph, then it is also globally correlated. In fact, this is easily shown to hold even if maps vertices not into numbers but into vectors— i.e., if are unit vectors that are locally correlated over the edges of then they are also globally correlated. Indeed, this property of expanders has been used in the work of [AKK+08], who showed that the basic SDP program for Unique Games can be successfully rounded if the input graph is an expander.
Distribution view of SDP’s
Another, often beneficial way to view SDP hierarchies is as providing distribution on integral solutions (see Section 3.2). In this view, for every set of vertices , the SDP hierarchy provides a distribution over . Moreover, we require that distributions on overlapping sets will be consistent, and that the for every two variables the expectation will equal the inner product of the corresponding vectors. The challenge in rounding the SDP is that there is not necessarily a way to sample simultaneously the random variables in some consistent way. The projection of a vector into the span of turns out to capture (an appropriate notion of) the mutual information between the variable and the variables . Looked at from this viewpoint, our rounding algorithm involves choosing an assignment from the distribution for the basis vertices, and conditioning on its value. As long as (**) holds, we can find a random variable such that conditioning on will significantly decrease the entropy of the remaining variables. When we get stuck and (*) is violated, it means that for a typical edge , the random variables and are close to being statistically independent. This means that just sampling each independently will give approximately the same value on a typical constraint.
Threshold rank vs global correlation
Whenever the graph has small number of large eigenvalues, the condition that local correlation implies global correlation holds. This is useful to simulate eigenspace enumeration algorithms such as used by [KT07, Kol10, ABS10, Ste10] since in the case of Unique Games (and other related problems), a good SDP solution must be locally well correlated. But the notion of local to global correlation is somewhat more general and robust than having small threshold rank. For example, adding isolated vertices to a graph will increase correspondingly the number of eigenvectors with value , but will actually not change by much the local to global correlation. This captures to a certain extent the fact that SDP-based solutions are more robust than the spectral based algorithms. (A similar example of this phenomenon is that adding a tiny bipartite disjoint graph to the input graph makes the smallest eigenvalue become , but does not change by much the value of the Goemans-Williamson SDP.) We hope that this robustness of the SDP-based approach will enable further improvements in the future.
Theorem 1.3 considers a different parameter than Theorems 1.1 and 1.2. The latter two results consider threshold ranks for a small (i.e., close to ) threshold , and achieve a very good approximation. In contrast, Theorem 1.1 considers threshold that is close to , but only achieve a rough approximation (corresponding to the approximation guarantee relevant to the unique games conjecture). This is also manifested in some technical differences in the proofs.
Organization
We begin by fixing notation and a few formal definitions in the next section. For the purpose of exposition, we first present an algorithm for Max Cut on low-rank graphs using the Lasserre hierarchy in Section 4. Following this, the general algorithm for 2-CSPs on low-rank graphs is presented in Section 5. The connection between local and global correlations in low-rank graphs that is central to our algorithms, is outlined in Section 6. To implement our general approach in a hierarchy weaker than Lasserre hierarchy, we outline an argument to obtain low-rank approximation to any set of vectors in Section 7. The final section (Section 8) of the paper is devoted to subexponential time algorithm for Unique Games.
Preliminaries
We will use capital letters to denote random variables, and lower-case letters to denote assignments to these random variables.
The collision probability of is defined as
where is an independent copy of (so that the sequence is i.i.d.). It is easy to see that the variance and collision probability are related by,
2 Local Distributions
3 Lasserre Hierarchy
Let be a Unique Games instance with constraint graph , label set , and bisections . An -round Lasserre solution consists of -local random variables and vectors for all vertex sets with and all local assignments . A Lasserre solution is feasible if the local random variables are consistent with the vectors, in the sense that for all and with , we have
The objective is to maximize the following expression
An important consequence of the existence of the vectors is that for every set with and local assignment , the matrix is positive semidefinite.
Warmup – MaxCut Example
For the sake of exposition, we first present an algorithm for the Max Cut problem on low-rank graphs. In the Max Cut problem, the input consists of a graph and the goal is to find a cut of the vertices that maximizes the number of edges crossing, i.e., maximizes .
The Goemans-Williamson SDP relaxation for the problem assigns a unit vector for every vertex , so as to maximize the average squared length of the edges. Formally, the SDP relaxation is given by,
Stronger SDP relaxations produced by hierarchies such as Sherali-Adams and Lasserre hierarchy also yield probability distributions over local assignments.
More precisely, given a -round Lasserre SDP solution, it can be associated with a set of -local random variables taking values in . For an edge , its contribution to the SDP objective value () is equal to the probability that the edge is cut under the distribution of local assignments , namely,
Consequently, in order to obtain a cut with value close to the SDP objective, it is sufficient to jointly sample , such that on every edge the distribution of and is close to the corresponding local distribution . However, the variables are not jointly distributed, and hence cannot all be sampled together.
As a first attempt, let us suppose we sample each independently from its associated marginal . If on most edges , the distribution of the resulting samples is close to , then we are done. On an edge , the local distribution is far from the independent sampling distribution only if the random variables are correlated. Henceforth, these correlations across the edges would be refered to as “local correlations". A natural measure for correlations that we will utilize here is defined as . Using this measure, the statistical distance between independent sampling () and correlated sampling () is given by
(See Lemma 5.1 for a more general version of the above bound).
On the flip side, the existence of correlations makes the problem of sampling easier! If two variables are correlated, then sampling/fixing the value of reduces the uncertainty in the value of . More precisely, conditioning on the value of reduces the variance of as shown below:
Therefore, if we pick an at random and fix its value then the expected decrease in the variance of all the other variables is given by,
The above bound is proven in a more general setting in Lemma 5.2. As all random variables involved have variance at most , we can rewrite the above expression as,
The decrease in the variance is directly related to the global correlations between random pairs of vertices .
Recall that, the failure of independent sampling yields a lower bound on the average local correlations on the edges namely, . The crucial observation is that if the graph is a good expander in a suitable sense, then these local correlations translate in to non-negligible global correlations. Formally, we show the following (in Section 6):
Let be vectors in the unit ball. Suppose that the vectors are correlated across the edges of a regular -vertex graph ,
Then, the global correlation of the vectors is lower bounded by
As random variables arise from the solution to a SDP, the matrix is positive semidefinite, i.e., there exists vectors such that . Let us consider the vectors . Suppose the local correlation is at least then we have,
and . If the graph is low-rank, then by Lemma 4.1 we get a lower bound on the global correlation of the vectors , namely
General 2-CSP on Low Rank Graphs
Let be a (general) Max 2-Csp instance with variable set and label set . (We represent as a distribution over triples , where and is an arbitrary binary predicate. The goal is to find an assignment that maximizes the probability .)
For simplicity,If the constraint graph is not regular, all of our results still hold for an appropriate definition of threshold rank. we will assume that the constraint graph of is regular, i.e., every variable appears in the same number of constraints. (Since we allow the constraints to be weighted, the precise condition is that the total weight of the constraints incident to a vertex is the same for every vertex.)
Let be -local random variables with range . We write to denote the -indicator of the event . Notice that are also -local random variables.
For two random variables and with the same range, we denote their statistical distance,
The following lemma shows that the statistical difference between independent sampling and correlated sampling is explained by local correlation.
Under the distribution , the event has probability . On the other hand, under the product distribution , this event has probability . Hence, the difference of these probabilities is equal to . ∎
Conditional Variance and Pairwise Correlation
The following lemma shows that conditioning on a variable decreases the variance of a variable by the correlation of the variables and .
Pairwise Correlations and Inner Products
Suppose that the matrix is positive semidefinite. Then, there exists vectors in the unit ball such that for all vertices ,
Local Correlation vs Global Correlation on Low-Rank Graphs
The following lemma shows that local correlation (correlation across edges of a graph) implies global correlation (correlation between random vertices) if the graph has low threshold rank. (Proof in Section 6.)
Let be vectors in the unit ball. Suppose that the vectors are correlated across the edges of a regular -vertex graph ,
Then, the global correlation of the vectors is lower bounded by
Putting Things Together
The following lemma shows that either independent sampling is statistically close to correlated sampling across edges of a graph or the typical variance of a vertex decreases non-trivially by conditioning on a random vertex.
Let be a regular -vertex graph and be the expected statistical distance between independent and correlated sampling across the edges of ,
Further, suppose that the matrix is positive semidefinite. Then, conditioning on a random vertex decreases the variances by
Let be the vectors constructed in Lemma 5.3. By Lemma 5.3 and Lemma 5.1, the local correlation of these vectors is at least
(The last step also uses Cauchy–Schwartz.) Hence, Lemma 4.1 implies the following lower bound on the global correlation of these vectors,
Lemma 5.3 and Lemma 5.2 allows us to relate the expected decrement of the variances to the global correlation of the vectors ,
The following lemma asserts that if the constraint graph has low threshold rank then there exists a partial assignment to a small set of vertices such that independent sampling conditioned on this assignment gives almost the same value as correlated sampling (without conditioning on the assignment ).
Algorithm 5.5 (Propagation Sampling). Input: -local random variables over Output: (global) distribution over assignments . 1. Choose at random. 2. Sample a random set of “seed vertices” . (Repeated vertices are allowed.) 3. Sample a assignment for according to its local distribution . 4. For every other vertex , sample a label according to the local distribution for conditioned on the assignment for .
To prove the current theorem it is enough to show that . For , define a non-negative potential as follows
Let . Suppose . Then,
The following theorem directly implies Theorem 1.1.
An optimal -round Lasserre solution gives rise to -local random variables over . Let be the indicator variable of the event . The matrices are positive semidefinite for all sets with and local assignments . Furthermore, the Lasserre solution satisfies
Let be the jointly-distributed (global) random variables in Theorem 5.6. By Theorem 5.6, we can estimate the expected value of the assignment as
1 Special case of Unique Games
The following lemma is a version of Lemma 5.3 tailored towards Unique Games. The advantage of this version of the lemma is that the bounds are independent of the alphabet size .
Let be -local random variables over and let be the indicator of the event . Suppose that the matrix is positive semidefinite. Then, there exists vectors in the unit ball such that for all vertices and permutations of ,
The following theorem immediately implies Theorem 1.2. Let be a Unique Games instance with alphabet size and constraint graph .
Let be -local random variables over from an optimal -round Lasserre solution for . The local variables satisfy
For a permutation of , we define a modified version of statistical distance,
Therefore, we can estimate the expected fraction of satisfied constraints as
Local Correlation implies Global Correlation in Low-Rank Graphs
Let be a regular graph with vertex set . We identify with its normalized adjacency matrix, a symmetric stochastic matrix. Let be the eigenvalues of in non-increasing order.
The following lemma shows that a violation of the local vs global correlation condition implies that the graph has high threshold rank.
Suppose there exist vectors such that
Then for all , . In particular, .
Let be the Gram matrix represented in the eigenbasis of , so that
Let be the largest index such that . Notice that the numbers form a probability distribution over . Let be the probability of the event . Using Cauchy–Schwarz, we can bound this probability in terms of ,
On the other hand, we can bound the expectation of with respect to the probability distribution in terms of this probability ,
It follows that , which gives the desired conclusion that has at least eigenvalues . ∎
Note that Lemma 4.1 follows directly from the previous lemma by picking and observing that since for all
As a converse to Lemma 6.1, the following lemma shows that if a graph has many eigenvalues close to , then there exist vectors for the vertices of the graph with high local correlation and low global correlation.
If , then there exist vectors such that
Let be orthonormal eigenfunctions of with eigenvalue larger than . Consider vectors satisfying Since the functions have norm , the typical squared norm of the vectors satisfies
Since the eigenvalues of the eigenfunctions are larger than , we can lower bound the local correlation of the vectors ,
Finally, since the function are orthonormal, the global correlation of the vectors is
The condition that there exist vectors with
is equivalent to the condition that there exists a symmetric positive semidefinite matrix such that
On Low Rank Approximations to Sets of Vectors
Let be vectors in the unit ball. Then for every , there exists a subset with such that , where is the projection of to the orthogonal complement of the span of .
The proof of Theorem 7.1 is by an iterative construction. In each iteration, we will use the following lemma.
Let be vectors. Then, there exists a unit vector such that the vectors with satisfy the following condition,
Suppose we pick a random index and choose . In this case, the squared norm of the vectors equals
Hence, we can estimate the expected decrease of the typical squared norms for a random vector .
It follows that there exists a unit vector such that the vectors have the desired property
We can construct the set in a greedy fashion so as to minimize the total squared norm of the vectors (the projections of the vectors to the orthogonal complement of the span of ). (In fact, we could choose set randomly.) To make the analysis more convenient, we use the following, slightly different construction.
Let for all .
For from to , construct vectors and as follows:
Using Lemma 7.2, pick a unit vector such that the vectors satisfy the condition
Notice that the vectors are the projections of the vector into the orthogonal complement of the span of the vectors . Let be the set of all indices such that for some . We can verify that the vectors are an orthonormal basis of the span of . Let be the projections of the vectors into the orthogonal complement of the span of (so that ). Since the vectors are projections of the vectors for all , it follows that
Hence, we can bound the typical squared norm of the vectors ,
Since the left-hand side is nonnegative and , it follows that as desired. ∎
For our applications it will sometimes be convenient to associate different subspace with subsets of vectors (in Theorem 7.1, we associate the span of vectors in with the subset ).
Let be vectors in the unit ball. For every subset , let be the projector on some subspace orthogonal to the span of . (Note that is not necessarily the projector on the orthogonal complement of the span of .) Then for every , there exists a subset with such that , where .
We use the same construction as in the proof of Theorem 7.1. The only difference is that we define (instead of ). Here, is the set of all indices such that for some . The proof is still applies to this modifies construction because (which is the only fact used about these vectors). ∎
Rounding SDP Solutions to Unique Games
In this section, we will present a subexponential time algorithm for Unique Games based on a SDP hierarchy, namely the simple SDP augmented with Sherali-Adams hierarchy. This hierarchy of relaxations weaker than the Lasserre hierarchy was studied in some earlier works [RS09, KS09]. Roughly speaking, the th round relaxation in this hierarchy corresponds to the basic semidefinite program, along with all valid constraints on at most vectors. Formally, the variables in the th round relaxation for Unique Games consists of
A set of vectors with orthogonal vectors for every vertex .
The constraint of the SDP relaxation ensure that the inner products of the vectors are consistent with the corresponding local distributions, i.e., for all , and ,
The objective value of the SDP corresponds to minimizing the number of violated constraints,
Sample a assignment for according to its local distribution .
For every other vertex , sample a label according to the local distribution for conditioned on the assignment for .
The above procedure will be referred to as propagation rounding and the set of vertices will be called the seed vertices.
The following lemma implies that if the seed vertices nearly determine the values of a set of vertices , then the assignment output by the propagation rounding has a distribution similar to the local distribution that is part of the LP/SDP solution (hence gets close to the SDP value).
For a set , let denote the distribution over global assignments output by propagation rounding with as the seed vertex set. Then, for every subset with we have
Sample a assignment for according to its local distribution ,
Sample an assignment according to the local distribution for conditioned on the assignment for ,
For every vertex , sample a label according to the local distribution for conditioned on the assignment for ,
Clearly the distribution of is , while the distribution of is . For any , the coordinates and are independent samples from . Therefore we have,
Averaging over the different choices of ,
2 Unique Games on Low Rank Graphs
Let be an instance of unique games whose constraint graph has low threshold rank. Let be an SDP solution for , and let denote the associated set of locla distributions. Let denote the associated -local random variables. The main result of this section shows that there exists a small set of seed vertices fixing whose value determines the value of almost every other vertex. Formally, we show the following
For every integer , there exists a subset of vertices of size such that
To this end, we will relate conditioning a random variable on a set , to projecting the SDP vectors corresponding to the variable in to the span of the vectors corresponding to . This analogy is formalized in the following lemma.
Let be random variables with range with a joint distribution associated with them. For each , , let be the indicator of the event that . Let us suppose there exists vectors such that
Let us suppose . Define a random variable as follows,
Note that on fixing the values , the random variable is fixed.
By the definition of variance of a real random variable we have the following inequality.
Averaging the above inequality over the settings of , we get
Note that the second moments of the random variables match with the corresponding inner products of vectors . Hence,
The claim (1) follows from (8.1) and (8.2).
The claim (2) follows from (1) and the definition of variance of a random variable taking values over . ∎
For a subset , let be the set of vertices associated with it namely,
Let denote the projector on to the subspace orthogonal to span of . In particular, is a projector on to a subspace orthogonal to for all .
Apply Theorem 7.3 on the set of vectors with the projectors for a subset . Theorem 7.3 implies that there exists a choice of of size such that if then,
From Lemma 8.5, the low global correlation of vectors implies that their squared length is small, i.e.,
If is an SDP solution to unique games with value , i.e.,
If the vectors satisfy,
then the average correlation among the vectors is at least , i.e.,
By Lemma 8.4, the vectors satisfy
Let . Normalize the vectors so as to make their average squared length equal to . The resulting vectors have correlation at least . By Lemma 6.1, this implies that . Since for all , we get
3 Wrapping Up
Our main result about Unique Games (Theorem 1.3) is a direct consequence of Theorem 8.6 and Theorem 8.7 presented here.
For every positive integer , there exists an algorithm running in time that given a unique games instance over alphabet with value , finds a labelling satisfying fraction of the edges. Here is the smallest eigen value of the Laplacian of the constraint graph .
The algorithm proceeds by solving the -round Lasserre SDP for the given instance. Starting with the SDP solution, the algorithm runs the propagation rounding algorithm starting from every possible seed set of size .
By Lemma 8.2, there exists one such set for which we have,
Let denote the distribution over global assignments output by the propagation rounding scheme. For an edge , let denote the local distribution over suggested by the SDP solution. From Lemma 8.1, the statistical distance between and is at most
Averaging over all the edges we see that,
where is the SDP objective value of the solution . Along with (8.5), this implies that the algorithm on the choice of the appropriate seed set would find a solution with value at least . ∎
There exists an algorithm that given a Unique Games instance with vertex set , label set , and optimal value , finds an assignment with value at least by rounding an -round Lasserre solution.
The proof follows by combining our propagation rounding and the decomposition theorem of [ABS10]. The latter result allows us to partition the input graph into disjoint components each with rank at most by removing at most fraction of the edges in our input graph. An SDP solution for the input graph induces a solution for each of the components, and hence we can round the solution for each component separately using propagation rounding. ∎
Conclusions
We have shown that rounds of an SDP hierarchy suffice for solving the Unique Games problem on -satisfiable instances. The best lower bound known for the hierarchy we used is [RS09, KS09], and so a natural question, with obvious relevance to the unique games conjecture, is which bound is closer to the truth. The fact that our algorithm’s running time for rounds is only (as opposed to ), challenges the interpretation of lower bounds in the range as corresponding to super-polynomial running time, and so provides further motivation to the question of whether the current hierarchy lower bounds can be improved further.
With the exception of the Small-Set Expansion problem, we do not know how to translate algorithms for Unique Games into other computational problems. We hope that our ideas will help in combining the [ABS10] subexponential algorithm for Unique Games with SDP-based method to make progress on other Unique Games-hard computational problems. Indeed, Arora and Ge (personal communication) recently used the ideas of this work to obtain improved algorithms for -coloring on some interesting families of instances. A concrete open question along similar lines is whether one can get an algorithm for the Max Cut problem with approximation factor better than the factor of the Goemans-Williamson algorithm that runs in time .
For general 2-CSPs, we know that some instances will require a large number of hierarchy rounds, but it’s interesting to see whether there is any clean characterization of the instances on which SDP hierarchies do well, encompassing, say, both low threshold rank graphs and planar graphs. Another interesting question is to find the right generalization of the low threshold rank condition to -CSPs for .
References
Appendix A Faster Algorithms for SDP hierarchies
In this section, we argue that our rounding algorithm also works with weaker SDP hierarchies. We will show that for these weaker hierarchies, a near-optimal -round solution can be computed in time . Due to the equivalence of optimization and separation, it is enough to describe a separation oracle with running time . Given a collection of vectors , the separation oracle either has to output a good assignment or it has to output a valid linear constraint violated by the inner products of the input vectors.
We argue that such a separation oracle can easily be extracted from our rounding algorithm. Our rounding algorithm for Unique Games first selects a set of roughly vertices, then samples an assignment for these vertices, and finally samples labels for the remaining vertices from the local distributions conditioned on the event . The selection of the set depends only on the SDP vectors but not on the local distributions (which are not known to the separation oracle).
Hence, given vectors , our separation oracle can simply work as follows:
Select a vertex subset using Theorem 7.1 based on the given vectors .
Using linear programming, find local distributions that are as consistent as possible with the inner products of the vectors . If these local distributions match the inner products sufficiently closely, then our propagation rounding algorithm will succeed. On the other hand, if the local distributions do not match the inner products closely enough, then we can find a valid linear constraints that is violated by the inner product of the given vectors. (This separating linear constraint can be obtained from the dual solution of the linear program that was used to find the best local distributions.)
Appendix B Omitted proofs from Section 5 and Section 8
This appendix contains the proofs for some omitted proofs.
(Lemma 8.4 restated) If is an SDP solution to unique games with value , i.e.,
Observe that the vectors are projections of and projections shrinks distances, which implies that the vectors are correlated across constraints of the Unique Games instance,
Since , we can further upper bound
(In the last step, we again used the identity and the fact that .) By averaging over the label set and the edges of the graph, it follows as claimed that
Let be -local random variables over and let be the indicator of the event . Suppose that the matrix is positive semidefinite. Then, there exists vectors in the unit ball such that for all vertices and permutations of ,
On the other hand, we can upper bound the inner product of and ,
Finally, the vectors are in the unit ball,
Here, we are using the fact that for all distinct . ∎
Appendix C Facts about Variance
Let and be jointly-distributed random variables. Assume that has finite range.Let be the orthogonal projection of the random variable onto the subspace of functions of the random variable . Then,
By construction is a function of the random variable and is orthogonal to all functions of the variable . Hence, . Therefore, the expected variance of is
which gives the desired identity using . ∎
Let and be as in the previous lemma. Suppose the range of has cardinality . Then,
Without loss of generality, we may assume that and . Then, the set of random variables is an orthonormal basis for the subspace of functions of . Let . Then, is the orthogonal projection of to the subspace of function of . (Here, we use the assumption .) Hence, using the previous lemma,