On largest volume simplices and sub-determinants
Marco Di Summa, Friedrich Eisenbrand, Yuri Faenza, Carsten Moldenhauer
Introduction
Many techniques in convex geometry begin with approximating a geometric shape by a simpler one. The maximum volume ellipsoid, or John ellipsoid (see, e.g., ), for example, is a prominent such simplification with many applications in discrete and continuous optimization.
Simplices are, next to ellipsoids, among the most primitive convex sets. We are interested here in the problem of approximating a given -polytope by a contained simplex of largest volume. More precisely, we investigate the approximability and hardness of the following problem.
We assume here, without loss of generality, that the convex hull of the points is full-dimensional. As is the case for ellipsoids, the largest volume simplex in a convex body has attracted a lot of attention in the computer science and optimization literature, see, e.g., .
where is the matrix with columns . Thus MVS can be reduced to instances of the problem of finding the largest absolute value of a subdeterminant of a matrix. This motivates the second problem that is central to our study.
Here a basis of a matrix is a maximal subset of the column indices such that the corresponding columns are linearly independent, and is the matrix consisting of the columns indexed by . Khachiyan has shown that there exists a approximation algorithm for MSD, and thus also for MVS, with running time polynomial in and .
We show that there exists an algorithm for MVS, with running time polynomial in and , that computes a simplex with
We show that there exists a constant such that MVS cannot be approximated within a factor of , unless . This improves the previous best -inapproximability of Packer .
Our hardness result (ii) holds for instances with . A recent sampling technique immediately yields a -approximation for such instances (for another constant ), showing that the hardness is essentially tight in this case.
These results (i), (ii) and (iii) also hold for MSD.
The literature on topics related to MVS and MSD is extensive. In order to put our results in perspective, we provide an overview of a selection of related papers.
Variants where the dimension of the solution can be restricted, e.g. finding a maximum volume -dimensional simplex, have been considered for MVS and MSD .
Hardness of approximation
Packer has shown that MVS is inapproximable within a constant factor smaller than . This implies the same hardness for MSD, which was shown earlier to be NP-hard by Papadimitriou .
In both cases, the authors show that there exist a constant and a function such that it is NP-hard to approximate the respective problem with factor less than . Here, is linear in , but with constant dependence strictly less than one, thus these results do not cover the case .
Subdeterminants in optimization and combinatorics
The subdeterminant lower bound for hereditary discrepancy is . Recently Matousek has shown that the subdeterminant bound is tight up to a polynomial factor in and . In a recent series of papers by Nikolov et al. it was shown how to approximate the hereditary discrepancy, and thus the subdeterminant bound , within a polynomial factor in and . Our result provides a -approximation to . It is an interesting problem whether a polynomial time approximation algorithm for the subdeterminant bound with a guarantee that is polynomial in exists.
A tight analysis of Khachiyan’s algorithm
We now come to the main algorithmic result of our paper and show the following theorem. Recall that is the matrix corresponding to the columns of indexed by . We denote the set of bases of by .
The algorithm runs in time polynomial in , and . Thus MSD and MVS can be approximated within the factor above.
The largest subdeterminant of , in absolute value, corresponds to the largest volume simplex in with one vertex being the origin. The algorithm begins by rounding the polytope .
Khachiyan showed that, if is a symmetric -polytope that is explicitly given by its vertices, then it is possible to compute an ellipsoid such that in time polynomial in the number of vertices of , and . This applies to the polytope .
holds. From there, the algorithm proceeds in a greedy fashion, see Figure 1.
Note that correspond to the Gram-Schmidt orthogonalization of the vectors returned by the greedy procedure. Thus, after rounding, the absolute value of the determinant induced by the chosen vectors is .
We now review Khachiyan’s analysis showing that his algorithm is a factor approximation algorithm for MSD. Let us denote the Euclidean lengths of the picked vectors by for , and define . The claimed bound follows from the following two facts:
There is a polynomial-time -approximation algorithm for problems MSD and MVS.
2 Improving the analysis
We will now show that the approximation factor can in fact be bounded by . To do so, we significantly tighten the upper bound (i) presented in the analysis above. Key to this improvement is the following observation.
Each is on a principal axis of .
None of the vectors is contained in the interior of .
There exists such that each is contained in .
The theorem follows from the existence of an ellipsoid satisfying the conditions of Lemma 3 with .
The vectors are pairwise orthogonal due to the projection in Step 2. Recall that determinants are invariant under rotation and note that the algorithm’s execution is also invariant under rotation. Therefore, by suitably rotating the input, we can assume without loss of generality that are the vectors , where is the -th unit vector.
Next, let us group the elements of the decreasing sequence as follows. Let be the set of indices such that . Let denote the number of such groups. Since , we have . Assume that all groups are non-empty (discarding empty groups will decrease the number of groups and subsequently yield an improved analysis). Let be the largest element of and note that for all .
We are done once we have shown that the ellipsoid , together with , satisfies the conditions (a)–(c) from Lemma 3.
3 Tightness of the analysis
We now provide a family of instances of increasing dimension where Khachiyan’s algorithm achieves a ratio of , where tends to as grows. Thus, we basically match the upper bound on the approximation ratio given in the previous section.
Let be a power of two. The instances are matrices with rows of the form
Here, is a diagonal matrix of the form
Thus the polytope that is generated by the columns of the matrix (1) and their negatives is “round”, in the sense that it contains the unit ball and it is contained in the unit ball scaled by . We will show below that Khachiyan’s algorithm will output a solution of value at most . This implies the claim, as
Then using , we deduce that the approximation ratio is
Let be any column vector of . The squared norm of the projection of into the orthogonal complement of the first column vectors of (where ) is
The last term is the squared norm of the -th column vector of . Thus Khachiyan’s algorithm outputs the first columns of , and in the last step a column of norm at most from . The determinant of the column vector selected by Khachiyan’s algorithm is then at most , as claimed above.
Hardness
We now consider the hardness of approximating MVS and MSD. As mentioned in Section 1.1, the best inapproximability result was due to Packer , who proved that MVS cannot be approximated with a factor better than , unless . In this section we provide a drastic improvement showing that it is NP-hard to approximate MVS and MSD with a factor , where is an explicit constant. In particular, we show that the result holds for instances where . We will also conclude that the hardness result is best possible for such instances.
Our argument is based on the connection between MSD and the following problem.
Given a simple undirected graph, find a maximum family of vertex-disjoint odd cycles.
The overall strategy for proving hardness is the following. We first build on a hardness result by Berman and Karpinski on stable sets in -regular graphs and show that OCP is NP-Hard to approximate with a factor , where is an explicit constant. Our second step is to reduce OCP to MSD, using the construction seen above. Hence the constant inapproximability for OCP leads to a -inapproximability for MSD. Last, we reduce MSD to MVS.
Let us remark that OCP is NP-hard even when restricted to planar graphs and, in that case, allows for constant factor approximations . Following the hardness for packing the maximum number of disjoint cycles by Friggstad and Salavatipour , we can deduce for OCP a constant hardness under PNP and a hardness of unless , where is the number of nodes of the graph. This result relies on the PCP-theorem. Our construction below yields a weaker hardness for OCP, but it does not rely on the PCP-theorem, it leads to an improved hardness result for the vertex-disjoint triangle packing problem, and it is much simpler to use for constructing a hardness for MSD subsequently. In particular, it enables us to easily calculate the explicit constants in the hardness for MSD.
On the positive side, Kawarabayashi and Reed showed that for general graphs OCP can be approximated within a factor of .
We now describe our inapproximability result for OCP. We require a result of Berman and Karpinski for maximum stable set on 3-regular graphs. Given a system of linear equations modulo 2 with 3 variables per equation, Håstad showed that it is NP-hard to distinguish between instances where there exists a solution satisfying equations and instances where no solution satisfies more than equations, for any arbitrarily small . Building on this result, Berman and Karpinski gave a polynomial time construction of a 3-regular graph on vertices that translates satisfied equations to a maximum stable set of size at most and satisfied equations to a maximum stable set of size at least . Thus, it is NP-Hard to detect if a -regular graph on vertices has a stable set of size at least , or at most , for each .
Let be the graph constructed in . Intuitively, we would like to construct a new graph such that every vertex in corresponds to a triangle in and that a stable set in also corresponds to a packing of triangles in . A first candidate for such a graph would be the line graph of (recall that is -regular), but in the line graph we might also create triangles that do not correspond to vertices in .
We solve this issue by slightly changing . Subdivide every edge in twice, i.e., substitute an edge by a path . Let be the obtained graph. Since has vertices, hence edges, has vertices and edges. Note that is triangle-free.
We now prove that every subdivision of an edge in augments the stable sets by exactly one vertex. Consider a stable set in . By choosing when or otherwise we obtain an induced stable set in of size . Conversely, let be a stable set in . Modify such that for each not both and are in : if both are in , then the stable set has the same cardinality and only includes one of . Then we can obtain a stable set in of size at most . In particular, a stable set of size (resp., ) in translates into a stable set of size (resp., ) in .
Now let be constructed as follows: starting from the line graph of , for each add two vertices and connect them as to obtain the graph depicted in Figure 2. The number of vertices in is and the number of edges is , as every vertex belonging to the line graph of has degree four and the two additional vertices in each have degree two.
As is triangle-free, there is a one-to-one correspondence between triangles in and vertices in . Moreover, two vertices in are adjacent if and only if their corresponding triangles in have a common vertex. Thus a maximum stable set of size (resp., ) in translates into a maximum number of (resp., ) vertex-disjoint triangles in . It is known that finding the maximum number of vertex-disjoint triangles in a graph is APX-hard . However, no explicit lower bound was known.
It is NP-hard to approximate the maximum number of vertex-disjoint triangles in a graph with a factor of for arbitrarily small constant . The result holds even for graphs with maximum degree four.
Now, consider OCP in , i.e., finding the maximum number of vertex-disjoint odd cycles. We prove that there is always an optimal solution consisting of triangles only. Assume the contrary, i.e., the optimal solution includes a cycle of length at least . We distinguish three cases. First, does not contain any vertex other than those of type and (see Figure 2). Then, must be a triangle. Second, is fully contained in some . One easily checks that also in this case must be a triangle. Third, is not contained in any . Then has to pass through the node of some and hence the solution does not include any triangle in such . Substitute by any of the two triangles in to obtain another optimal solution to OCP. Hence there is an optimal solution to OCP in only containing triangles. Therefore we have the following hardness result.
It is NP-hard to approximate OCP with a factor of for arbitrarily small constant . The result holds even for graphs with maximum degree four.
2 From OCP to MSD and MVS
Consider the node-edge incidence matrix of : this is a matrix. From what was argued above, the maximum number of vertex-disjoint odd cycles of (resp., ) translates into subdeterminants of size (resp., ). This allows us to prove a hardness of when the dimension is .
It is NP-hard to approximate MSD with a factor of for arbitrarily small constant . The hardness even holds when restricted to node-edge incidence matrices of graphs with maximum degree four.
where and . ∎
We now derive a similar inapproximability result for MVS.
It is NP-hard to approximate MVS with a factor of for arbitrarily small constant .
where is the simplex among of maximum volume. Note moreover that the submatrix associated to the non-zero vertices of is a feasible solution to the original MSD problem. We deduce
Hence, we can output and obtain the required approximation. ∎
For every , it is NP-Hard to approximate
the maximum number of vertex-disjoint triangles in a graph and OCP within a factor of ;
MSD and MVS with a factor of .
3 Tightness for instances with n=O(d)𝑛𝑂𝑑n=O(d)
The claimed result then follows by repeated sampling and picking the largest basis.
If , then there exists a randomized algorithm for MSD with approximation ratio for some constant that depends on the constant in the -notation.