Randomized Rounding for the Largest Simplex Problem

Aleksandar Nikolov

Introduction

The $j$ -MVS problem can be easily reduced to a a problem about subdeterminants of positive semidefinite matrices (see Lemma 5 for the reduction). For an $m\times n$ matrix $M$ , let $M_{S,T}$ be the submatrix with rows indexed by $S\subseteq[m]$ and columns indexed by $T\subseteq[n]$ . In the maximum $j$ -subdeterminant problem ( $j$ -MSD) we are given an $n\times n$ positive semidefinite matrix $M$ of rank $d$ , and the goal is find a set $S$ of cardinality $j$ so that $\det M_{S,S}$ is maximized. The $j$ -MVS problem in $d$ dimensions can be reduced to solving $n$ instances of the $j$ -MSD problem for matrices of rank $d$ , and the reduction is approximation preserving.

The $j$ -MSD problem was also independently studied in the context of low-rank approximations. The optimal row-rank approximation of a matrix $A$ is well understood, and for both the operator and the Frobenius norm (and in fact any unitarily invariant matrix norm) is given by the the projection of the rows and columns of $A$ onto the top singular vectors. However, an approximation in terms of a submatrix of $A$ often has a better explanatory value. For example, if $A$ is a $n\times d$ matrix in which each row is a data point, and if $A$ is well-approximated by its projection onto the span of $j$ of its columns, we can argue that, at least intuitively, these columns represent important features in the data. Goreinov and Tyrtyshnikov [GT01] gave one formalization of this intuition, which we cite next.

Let $M\succeq 0$ be an $n\times n$ matrix and let $S\subseteq[n]$ be an optimal solution to the $j$ -MSD problem for $M$ . Then, for $T=[n]\setminus S$ we have

where $\sigma_{j+1}$ is the $(j+1)$ -st largest singular value of $M$ .

Hereditary discrepancy has an important application to rounding problems: roughly speaking, if the hereditary discrepancy of the matrix $A$ is low, then any solution to a $x$ linear system $Ax=b$ can be rounded to an integral vector $\bar{x}$ without introducing a lot of error. In this sense, hereditary discrepancy generalizes total unimodularity, which is equivalent to $\operatorname{herdisc}(A)=1$ [GH62]. The following theorem, proved by Lovász, Spencer, and Vesztergombi, makes this connection precise.

In fact the theorem holds with hereditary discrepancy defined in terms of any norm.

The following theorem shows that $\operatorname{detlb}_{2}(A)$ gives nearly-tight upper and lower bound on $\operatorname{herdisc}_{2}(A)$ . While not explicitly stated in this form, the theorem can be proved by modifying the arguments in [LSV86, Mat11] in a straightforward way.

There exists a constant $C$ such that for any $d\times n$ matrix $A$ ,

If for each $1\leq j\leq d$ we have a factor $\alpha(j)$ approximation for $j$ -MSD, then we get a factor $\alpha\stackrel{{\scriptstyle def}}{{=}}\max_{j}\alpha(j)^{1/2j}$ -approximation to $\operatorname{detlb}_{2}(A)$ , and, therefore, a factor $C\alpha\log d$ approximation to $\operatorname{herdisc}_{2}(A)$ .

On the algorithmic side, the best known approximation for $d$ -MSD is $(c\log d)^{d}$ for a constant $c$ , proved by Di Summa et al. [DEFM14]. They show that this approximation is achieved by a classical algorithm by Khachiyan [Kha95], for which they give a new analysis. This also implies a factor $(c\log d)^{d/2}$ approximation for $d$ -MVS (see Lemma 5). For $j<d$ , the best approximation known is of the form $(cj)^{j}$ for a constant $c$ : algorithms with this guarantee were given by Packer [Pac04] for $j$ -MVS and by Çivril and Magdon-Ismail [ÇM13] for $j$ -MSD.

2 Our Contribution

In this paper we design deterministic polynomial time approximation algorithms for the $j$ -MSD, and, therefore, also the $j$ -MVS, problems. Our main result is the following theorem.

There exists a deterministic polynomial time algorithm which approximates the $j$ -MSD problem within a factor of $e^{j+o(j)}$ . This implies that there also exists a deterministic polynomial time algorithm which approximates the $j$ -MVS problem within a factor of $e^{j/2+o(j)}$

This is the first approximation algorithm for $j$ -MSD and $j$ -MVS with an approximation factor of the form $\exp(O(j))$ , which matches the known hardness results up to the constant in the exponent. It is natural to conjecture that it is $\mathsf{NP}$ -hard to approximate $j$ -MSD within a factor $e^{j-\epsilon}$ for any $\epsilon>0$ . We leave this as an open problem.

Theorem 4 implies a factor $\sqrt{e}+o(1)$ approximation to $\operatorname{detlb}_{2}(A)$ for any $d\times n$ matrix $A$ , and, therefore, a factor $O(\log d)$ approximation to $\operatorname{herdisc}_{2}(A)$ . The latter result also follows from the techniques of the author and Talwar [NT14]. However, our result gives the first constant-factor approximation to a natural variant of the determinant lower bound. It is an interesting open problem to extend this result to the determinant lower bound for $\operatorname{herdisc}(A)$ , which is equal to

We also use our techniques to give an elementary and short proof of a variant of the restricted invertibility principle of Bourgain and Tzafriri [BT87].

3 Techniques

Our strategy for approximating $j$ -MSD when $j<d$ is similar, but the analysis becomes more complicated. For motivation, let us consider the $j=1$ case, in which we simply need to compute the largest diagonal entry of the input matrix $M$ , or, working with the columns $v_{1},\ldots,v_{n}$ of the square root $V$ of $M$ , we need to compute the index $i$ such that $v_{i}$ has the largest squared Euclidean norm. Of course, this problem can be solved trivially in linear time by enumerating over the $v_{i}$ , but it is instructive to solve it using an approach similar to the one we used for $d$ -MSD. Consider the smallest enclosing ball problem for $v_{1},\ldots,v_{n}$ : minimize $r$ subject to $v_{1},\ldots,v_{n}$ being contained in a Euclidean ball of radius $r$ centered at . It is clear that the optimal $r$ is equal to the norm of the longest $v_{i}$ . The dual of the smallest enclosing ball problem is the problem of maximizing $\sum p_{i}v_{i}v_{i}^{T}=\sum p_{i}\|v_{i}\|_{2}^{2}$ over probability vectors $p$ (a much more general version of this fact is proved in Theorem 15). This latter problem is our convex relaxation of $1$ -MSD. While this is a natural relaxation that we could have arrived at directly, without going through the smallest enclosing ball problem, our approach pays off when generalizing to the case $1<j<d$ , in which it is not clear how to come up directly with a natural convex relaxation of $j$ -MSD. The randomized rounding algorithm applied to the relaxation samples an index $i$ from the distribution determined by an optimal vector $p$ ; the expected squared length of $v_{i}$ is $\sum p_{i}\|v_{i}\|_{2}^{2}$ , i.e. exactly the objective value of the relaxation.

We follow a similar strategy for general $j$ . We define a minimization problem over ellipsoids centered at 0 that contain $v_{1},\ldots,v_{n}$ . The objective of the problem is to minimize the volume of the largest $j$ -dimensional section of the containing ellipsoid. It is not hard to show that this problem gives an upper bound on $j$ -MSD (Lemma 13). The main technical challenge is to derive the dual of this optimization problem and to analyze the natural randomized rounding algorithm applied to it. An important difference from the $j=d$ case is that the objective of the ellipsoid optimization problem is no longer differentiable, which complicates the analysis of the dual. When $1<j<d$ , the objective of the dual “splits” into two terms, one that resembles the $j=1$ case and another that resembles the $j=d$ case. To relate the expected value of the output of the rounding algorithm to this more complicated objective we use the theory of Schur-concave functions applied to the elementary symmetric polynomials.

We derandomize our algorithms using the method of conditional expectations. This approach and the use of the elementary symmetric polynomials to relate the eigenvalues of a matrix to its entries are inspired by the volume-sampling algorithms of Deshpande and Rademacher [DR10]. These sampling techniques together with the Schur concavity of ratios of elementary symmetric polynomials were used previously in the work of Guruswami and Sinop [GS12] on low rank matrix approximations.

Preliminaries

We use the notation $[n]=\{1,\ldots,n\}$ for an integer $n$ . With ${S\choose k}$ we denote the set of size $k$ subsets of the set $S$ .

Lemma 5 implies that a factor $\alpha$ approximation algorithm for $j$ -MSD implies a factor $\sqrt{\alpha}$ approximation algorithm for $j$ -MVS. For this reason, for the rest of the paper we will focus our attention on the $j$ -MSD problem.

2 Convex Analysis and Optimization

Consider an optimization problem in the following general form:

For any $x$ which is feasible for (1)–(2), and any $y\geq 0$ , $g(y)\leq f_{0}(x)$ . This fact is known as weak duality. The Lagrange dual problem is defined as

Strong duality holds when the optimal value of (3) equals the optimal value of (1)–(2). Slater’s condition is a commonly used sufficient condition for strong duality. We state it next.

Assume $f_{0},\ldots,f_{m}$ in the problem (1)–(2) are convex functions over their respective domains, and for some $k\geq 0$ , $f_{1},\ldots,f_{k}$ are affine functions. Let there be a point $x$ in the relative interior of the domains of $f_{0},\ldots,f_{m}$ , so that $f_{i}(x)\leq 0$ for $1\leq i\leq k$ and $f_{j}(x)<0$ for $k+1\leq j\leq m$ . Then the optimal value of (1)–(2) equals the optimal value of (3), and the value of (3) is achieved if it is finite.

For more information on convex programming and duality, we refer the reader to the books by Boyd and Vandenberghe [BV04] and Rockafellar [Roc70].

3 Ellipsoids and John’s Theorem

This program corresponds to finding the minimum volume ellipsoid centered at 0 that contains $v_{1},\ldots,v_{n}$ . It is a convex minimization problem over the open domain $\{W:W\succ 0\}$ with affine constraints, and, therefore, satisfies Slater’s condition. The dual problem to (4)–(6) is

Up to scaling of the variables $c_{1},\ldots,c_{n}$ , this is the $D$ -optimal design problem. For a proof of the duality, see [BV04, Sect. 5.1.6, 5.2.4, 7.5.2]; it also follows from the the more general Theorem 15. Since it is the dual of a convex minimization problem, (7)–(9) is a convex maximization problem. Then the following variant of John’s theorem is a direct consequence of strong duality for the program (4)–(6) (which is implied by Slater’s condition):

The optimal value of (4)–(6) is equal to the optimal value of (7)–(9).

4 Properties of Determinants

First we recall the classical Binet-Cauchy formula for the determinant of a matrix product. For any $m\times n$ matrix $A$ , $m\geq n$ , we have

Let $e_{k}$ be the degree $k$ elementary symmetric polynomial, i.e.

Let $M$ be an $n\times n$ symmetric matrix with eigenvalues $\lambda_{1},\ldots,\lambda_{n}$ . It is well-known that $\det(M)=e_{n}(\lambda_{1},\ldots,\lambda_{n})$ and ${\rm tr}(M)=e_{1}(\lambda_{1},\ldots,\lambda_{n})$ . In fact a similar identity involving the entries of $M$ and its eigenvalues holds for all $k$ :

This fact is also classical and can be proved by expressing each coefficient of the characteristic polynomial of $M$ in two different ways: as a sum of subdeterminants, and as a symmetric polynomial of its roots.

5 Schur Convexity

We use the following classical fact about the Schur-concavity of elementary symmetric functions, proved by Schur.

The elementary symmetric polynomial $e_{k}$ of degree $k$ , $1\leq k\leq n$ , is Schur-concave.

The Full-Dimensional Case

In this section we discuss the special case $j=d$ . We treat this case separately because it is a natural problem in itself, and the technical details of our algorithm are simpler, while illustrating some of the key ideas of our approach.

We first prove a simple lemma which is essential to our analysis.

Let $V$ be a $d\times n$ matrix with column vectors $v_{1},\ldots,v_{n}$ . Let $p_{1},\ldots,p_{n}$ give the probabilities for a distribution on $[n]$ , i.e. $p_{i}\geq 0$ for all $i$ and $\sum p_{i}=1$ . Let $S$ be a random multiset of $d$ elements, each sampled independently with replacement from $[n]$ according to the distribution determined by $p_{1},\ldots,p_{n}$ . Then

where $P=\operatorname{diag}(p_{1},\ldots,p_{n})$ is a diagonal matrix with the values $p_{i}$ on the main diagonal. The right hand side is equal to $d!\det(VPV^{T})$ by the Binet-Cauchy formula (10). Since $VPV^{T}=\sum_{i=1}^{n}{p_{i}v_{i}v_{i}^{T}}$ , this finishes the proof. ∎

We present our approximation algorithm for $d$ -MSD as Algorithm 1. The main approximation guarantee of the algorithm is given in Theorem 10.

Let the random multiset $S$ be the output of Algorithm 1 for input $M$ and an $\alpha$ -optimal $c_{1},\ldots,c_{n}$ . Then

Observe first that $\det(M_{S,S})=\det(V_{S}^{T}V_{S})=\det(V_{S})^{2}$ for any $S$ of size $d$ . Then by Lemma 9, with $p_{i}\stackrel{{\scriptstyle def}}{{=}}\frac{1}{d}c_{i}$ , we have

The first inequality above follows by applying the AM-GM inequality to the eigenvalues of $V_{T}^{T}WV_{T}$ , and the last inequality is implied by the constraints (5). What we have shown is equivalent to the intuitive geometric fact that the volume of the largest simplex with one vertex at contained in the convex hull of $v_{1},\ldots,v_{n}$ is at most the volume of the largest simplex with one vertex at contained in the Löwner ellipsoid of $v_{1},\ldots,v_{n}$ (or in fact any ellipsoid containing these points).

as desired. The asymptotic estimate $\frac{d!}{d^{d}}\sim\sqrt{2\pi d}e^{-d}$ is a restatement of Stirling’s approximation of $d!$ . ∎

Since (7)–(9) is a convex optimization problem, we can use the ellipsoid method to to compute an $\alpha$ -optimal solution in time polynomial in $n,d,\log\alpha^{-1}$ [GLS81]. Khachiyan [Kha96] showed how to compute a $d\ln(1+\epsilon)$ -optimal solution to (7)–(9) (i.e. a multiplicative $(1+\epsilon)^{d}$ approximation to $\det\Bigl{(}\sum_{i}c_{i}v_{i}v_{i}^{T}\Bigr{)}$ ) using a polynomial in $n,d,\epsilon^{-1}$ number of real value operations. Using either method with Algorithm 1, we get an approximation factor of $\frac{1}{\sqrt{2\pi d}}((1+\epsilon)e)^{d}$ in time polynomial in $n$ , $d$ , and $\epsilon^{-1}$ .

In Section 5 we show how to derandomize Algorithm 1 using the method of conditional expectations.

The General Case

A natural first attempt to extend Algorithm 1 to general $j<d$ is to simply sample $j$ , rather than $d$ , coordinates from the distribution induced by an optimal solution to (7)–(9). A straightforward extension of the analysis in Section 3 shows that this algorithm achieves approximation factor $\tfrac{d^{d}}{j!}$ , which is $\exp(O(j))$ for $j=\Omega(d)$ but approaches $d^{d}$ for smaller $j$ . In order to achieve $\exp(O(j))$ approximation for all $j$ , we generalize the Löwner ellipsoid problem. The rounding algorithm remains essentially the same, but the details of the analysis become more complicated.

It is not hard to see that $\max_{H}{\operatorname{vol}_{j}(H\cap E)}$ for an ellipsoid $E$ is proportional to the product of the lengths of the $j$ longest major axes of $E$ . We use this observation to formulate the problem of finding $j$ -Löwner ellipsoid as a convex program. First we need to define the appropriate function on the space of positive definite matrices.

To show that $\Delta_{j}(W)$ is convex and continuous, and to characterize its subdifferentials, we will use a general result of Lewis, extending classical work by von Neumann on unitarily invariant matrix norms. Below we state a slightly specialized case of his result.

For a set $S\subseteq[d]$ , let us use the notation $1_{S}$ for the $d$ -dimensional indicator vector of $S$ , i.e. the $i$ -th coordinate of $1_{S}$ is $1$ if $i\in S$ and otherwise. Let us define the convex polytope $V_{j,d}\stackrel{{\scriptstyle def}}{{=}}\operatorname{conv}\{1_{S}:S\in{[d]\choose j}\}$ . This is the basis polytope of the rank $j$ uniform matroid. We can now prove the convexity of $\Delta_{j}$ and characterize its subdifferentials.

The function $\Delta_{j}$ is convex and continuous on the space of positive definite matrices. Moreover, for any $W\succ 0$ with eigenvalues

the subdifferential of $\Delta_{j}$ at $W$ is

Because $\delta_{j}$ is symmetric, Lemma 11 implies that in order to show that $\Delta_{j}$ is convex and continuous, we only need to show that $\delta_{j}$ is convex and continuous. Because the function $-\ln x$ is monotone decreasing in $x$ , we can write $\delta_{j}(x)$ as

Since $\delta_{j}(\lambda)=\max\{\delta_{S}(\lambda):S\in{[d]\choose j}\}$ , and each $\delta_{S}$ is differentiable, we have

The gradient $\nabla\delta_{S}(\lambda)$ is given by $\frac{\partial\delta_{S}(\lambda)}{\partial\lambda_{i}}=-\lambda_{i}^{-1}$ for $i\in S$ and $\frac{\partial\delta_{S}(\lambda)}{\partial\lambda_{i}}=0$ otherwise. We have

This implies the desired characterization of $\partial\delta_{j}(\lambda)$ . ∎

Geometrically, the lemma captures the following fact. Let $E$ be an ellipsoid centered at and containing $v_{1},\ldots,v_{n}$ . Then the volume of the largest $j$ -dimensional simplex in the convex hull of $v_{1},\ldots,v_{n}$ with one vertex at is at most the volume of the largest $j$ -dimensional simplex in $E$ with one vertex at . Moreover, the latter simplex is contained in the $j$ -dimensional subspace whose intersection with $E$ has the largest volume. In the formal proof below we use a linear algebraic argument.

Let $\lambda_{1}\leq\ldots\leq\lambda_{d}$ be the eigenvalues of $W$ , and let $\mu_{1}\leq\ldots\leq\mu_{j}$ be the eigenvalues of $U^{T}WU$ . By the Cauchy interlace theorem, $\lambda_{i}\leq\mu_{i}$ for any $1\leq i\leq j$ , and, therefore,

On the other hand, by applying the AM-GM inequality to the eigenvalues of the matrix $V_{S}^{T}WV_{S}$ , we get

The final inequality above follows from the constraints (14). Combining the inequalities gives the desired bound. ∎

2 Duality for j𝑗j-Löwner Ellipsoids

As mentioned above, the program (13)–(15) that we used to capture $j$ -Löwner ellipsoids is convex and satisfies Slater’s condition. Therefore, it admits a dual characterization, which we will use in our algorithm. In this section we derive the dual characterization using the Lagrange dual function.

Before we introduce the dual, or even properly define its objective function, we need to prove a technical lemma.

Let $x_{1}\geq\ldots\geq x_{m}\geq 0$ be non-negative reals, and let $j\leq m$ be a positive integer. There exists a unique integer $k$ , $0\leq k\leq j-1$ , such that

Define $x_{>k}\stackrel{{\scriptstyle def}}{{=}}\sum_{i>k}{x_{i}}$ . If $x_{>0}\geq jx_{1}$ holds, then (16) is satisfied for $k=0$ , and we are done. So let us assume that $x_{>0}<jx_{1}$ . Then $x_{>1}=x_{>0}-x_{1}<(j-1)x_{1}$ , and the first inequality in (16) is satisfied for $k=1$ . If the second inequality is also satisfied we are done, so let us assume that $x_{>1}<(j-1)x_{2}$ , which implies the first inequality in (16) for $k=2$ . Continuing in this manner, we see that if the inequalities (16) are not satisfied for any $k\in\{0,\ldots,j-2\}$ , then we must have $x_{>j-1}<x_{j-1}$ . But the second inequality for $k=j-1$ , i.e. $x_{>j-1}=x_{j}+x_{>j}\geq x_{j}$ is always satisfied because all the $x_{i}$ are non-negative, so we have that if (16) does not hold for any $k\leq j-2$ , then it must hold for $k=j-1$ . This finishes the proof of existence.

This completes the proof of uniqueness. ∎

We now introduce a function which will be used in formulating a dual characterization of (13)–(15).

We will prove that the dual of (13)–(15) is equivalent to the following optimization problem:

The program (17)–(19) is a convex optimization problem, and its optimal value is equal to the optimal value of (13)–(15).

Observe that when $j=d$ , $\Gamma_{j}(X)=\ln\det(X)$ , so that (13)–(15) in this case reduces to (7)–(9). I.e. Theorem 15 generalizes Lemma 7.

To prove Theorem 15, we need two additional technical lemmas. The first one is well-know and follows from more general results characterizing the facets of the basis polytope of a matroid [Sch03].

For any $j$ and $d$ , $V_{j,d}=\{x:\sum_{i=1}^{d}{x_{i}}=j,0\leq x_{i}\leq 1~{}\forall 1\leq i\leq d\}.$

The next lemma is the key technical ingredient in the proof of Theorem 15.

Let $X\succeq 0$ be a $d\times d$ matrix of rank at least $j$ . Then there exists a $d\times d$ matrix $W\succ 0$ such that $X\in-\partial\Delta_{j}(W)$ , and $\Gamma_{j}(X)=\Delta_{j}(W)$ .

Let $r$ be the rank of $X$ , and let $\mu_{1}\geq\ldots\geq\mu_{d}$ be its eigenvalues. Let $U$ be an orthonormal matrix such that $X=U\operatorname{diag}(\lambda)U^{T}$ for $\mu=(\mu_{1},\ldots,\mu_{d})$ . Assume that $k$ is a positive integer strictly smaller than $j$ such that $\mu_{k}>\frac{\sum_{i>k}{\mu_{i}}}{j-k}\geq\mu_{k+1}$ and define $\nu\stackrel{{\scriptstyle def}}{{=}}\frac{\sum_{i>k}{\lambda_{i}}}{j-k}$ . A unique such choice of $k$ exists by Lemma 14. Moreover, since $X$ has rank at least $j$ , $\lambda_{1}\geq\ldots\geq\lambda_{k+1}>0$ , which also implies $\nu\geq 0$ . Therefore, the following vector $\lambda$ is well-defined for any $\nu>\epsilon>0$ :

Let us set $W\stackrel{{\scriptstyle def}}{{=}}U\operatorname{diag}(\lambda)U^{T}$ . By Lemma 12, to prove that $X\in-\partial\Delta_{j}(W)$ , it suffices to show that $(\mu_{k+1},\ldots,\mu_{r})\in\nu V_{j-k,r-k}$ . This inclusion follows from Lemma 16 because, by the choice of $\nu$ and $k$ , $0\leq\lambda_{i}\leq\nu$ for all $k+1\leq i\leq r$ , and $\sum_{i=k+1}^{r}{\mu_{i}}=(j-k)\nu$ .

The equality $\Gamma_{j}(X)=\Delta_{j}(W)$ follows by a calculation. By the choice of $k$ and $\nu$ , $\mu_{1}\geq\ldots\geq\mu_{k}>\nu$ . Therefore, the $j$ smallest eigenvalues of $W$ are $\lambda_{1}\leq\ldots\leq\lambda_{j}$ , and we have

Let us define $\{W:W\succ 0\}$ to be the domain for the constraints (14) and the objective function (13). This makes the constraint $W\succ 0$ implicit. The optimization problem is convex by Lemma 12. Is is also always feasible: for example, if $r=\max_{i=1}^{n}{\|v_{i}\|_{2}^{2}}$ , then $r^{-1}I$ is a feasible solution. Slater’s condition is therefore satisfied and strong duality holds. To prove the theorem, it suffices to show that the dual problem to (13)–(15) is equivalent to (17)–(19).

The Lagrange dual function for (13)–(15) is

A matrix $W\succ 0$ achieves the minimum above if and only if $0\in\partial g(c)$ , which, by the additivity of subgradients, is equivalent to $\sum_{i=1}^{n}{c_{i}v_{i}v_{i}^{T}}\in-\partial\Delta_{j}(W)$ . Define $X\stackrel{{\scriptstyle def}}{{=}}\sum_{i=1}^{n}{c_{i}v_{i}v_{i}^{T}}$ . Consider first the case in which $X$ has rank less than $j$ . Let $t\geq 0$ be a parameter, and let $\Pi$ be an orthogonal projection matrix onto the nullspace of $X$ . Consider the matrix $W\stackrel{{\scriptstyle def}}{{=}}I+t\Pi$ . The sum $\sum_{i=1}^{n}{c_{i}v_{i}^{T}Wv_{i}}={\rm tr}(XW)={\rm tr}(X)$ remains bounded for all $t$ , while $\Delta_{j}(W)$ goes to $-\infty$ as $t\to\infty$ . Therefore $g(c)=-\infty$ in this case.

Next we consider the case in which $X$ has rank at least $j$ . Then, by Lemma 17, there exists a $W$ such that $X\in-\partial\Delta_{j}(W)$ , and, therefore, this $W$ achieves $g(c)$ . From Lemma 12 and $X\in-\partial\Delta_{j}(W)$ , it follows that $\sum_{i=1}^{n}{c_{i}v_{i}^{T}Wv_{i}}={\rm tr}(XW)=j$ . Also using the fact that, by Lemma 17, $\Delta_{j}(W)=\Gamma_{j}(X)$ , we have

To finish the proof we show that any $c$ that maximizes the right hand side above satisfies $\sum_{i=1}^{n}{c_{i}}=j$ , and, therefore, the optimal value of the dual problem, $\max\{g(c):c_{i}\geq 0\ \forall 1\leq i\leq n\}$ , is equal to the optimal value of (17)–(19). Let us fix some arbitrary $c$ such that $c_{i}\geq 0$ for all $i$ and $\sum_{i=1}^{n}{c_{i}}=j$ , and consider the function $h(t)\stackrel{{\scriptstyle def}}{{=}}g(tc)$ , defined over positive real numbers $t$ . It will be enough to show that the unique maximizer of $h(t)$ is $t=1$ . Since $h$ is a restriction of a convex function, it is also convex, and it is enough to show that $1$ is the unique solution of $\frac{dh}{dt}=0$ . Let $\lambda_{1}\geq\ldots\geq\lambda_{d}$ be the eigenvalues of $\sum_{i=1}^{n}{c_{i}v_{i}v_{i}^{T}}$ , and let $k$ be the unique integer with which $\Gamma_{j}(\sum_{i=1}^{n}{c_{i}v_{i}v_{i}^{T}})$ is computed, i.e. $k$ is such that $\lambda_{k}>\frac{\sum_{i>l}{\lambda_{i}}}{j-k}\geq\lambda_{k+1}$ . The eigenvalues of $\sum_{i=1}^{n}{tc_{i}v_{i}v_{i}^{T}}$ are $t\lambda_{1}\geq\ldots\geq t\lambda_{d}$ , so the condition $t\lambda_{k}>\frac{\sum_{i>l}{t\lambda_{i}}}{j-k}\geq t\lambda_{k+1}$ is clearly satisfied, and, by Lemma 14, this choice of $k$ is unique. Therefore, $\Gamma_{j}\Bigl{(}\sum_{i=1}^{n}{c_{i}v_{i}v_{i}^{T}}\Bigr{)}$ and $\Gamma_{j}\Bigl{(}\sum_{i=1}^{n}{tc_{i}v_{i}v_{i}^{T}}\Bigr{)}$ are computed with the same $k$ . By the basic properties of logarithms, $\Gamma_{j}\Bigl{(}t\sum_{i=1}^{n}{c_{i}v_{i}v_{i}^{T}}\Bigr{)}=\Gamma_{j}\Bigl{(}\sum_{i=1}^{n}{c_{i}v_{i}v_{i}^{T}}\Bigr{)}+j\ln t$ , and, therefore,

The derivative $\frac{dh}{dt}=\frac{j}{t}-j$ vanishes only at $t=1$ , which implies that $h(t)=g(tc)$ is maximized at $t=1$ . This proves the claim and finishes the proof of the theorem. ∎

3 The Rounding Algorithm

Our rounding algorithm, shown as Algorithm 2, is nearly identical to Algorithm 1, except for using probability weights proportional to an optimal solution of (17)–(19). The approximation guarantee for the algorithm is given by the following theorem.

Let the random multiset $S$ be the output of Algorithm 1 for input $M$ and $\alpha$ -optimal $c_{1},\ldots,c_{n}$ . Then

Let us define $p_{i}\stackrel{{\scriptstyle def}}{{=}}\frac{1}{j}c_{i}$ for $1\leq i\leq n$ , and $P\stackrel{{\scriptstyle def}}{{=}}\operatorname{diag}(p_{1},\ldots,p_{n})$ . If some element in $S$ repeats, then $\det(M_{S,S})=0$ . On the other hand, each set $S$ can be sampled in $j!$ different ways, one for each ordering of its elements. We can then write the expectation of $\det(M_{S,S})$ as

The asymptotic estimate $\frac{j!}{j^{j}}\sim\sqrt{2\pi j}e^{-j}$ is again just Sterling’s approximation to $j!$ . This completes the proof of the theorem. ∎

Since (17)–(19) is a convex optimization problem, we can use the ellipsoid method to to compute an $\alpha$ -optimal solution in time polynomial in $n,d,\log\alpha^{-1}$ [GLS81]. Together with Algorithm 2, we get an approximation factor of $\frac{1}{\sqrt{2\pi d}}((1+\epsilon)e)^{j}$ in time polynomial in $n$ , $d$ , and $\log\epsilon^{-1}$ . It is also conceivable that the barycentric coordinate descent method of Khachiyan [Kha96] can be extended to solve (17)–(19).

In Section 5 we show how to derandomize Algorithm 2 using the method of conditional expectations.

Derandomizing the Algorithms

In Theorems 10 and 18 we only proved our approximation guarantees in expectation. A priori, this does not give a useful bound on the probability that the set output by Algorithm 1 or 2 is close to optimal. However, it is not hard to derandomize the algorithms using the method of conditional expectation. The deterministic algorithm is presented as Algorithm 3.

By the method of conditional expectation [AS08], and Theorem 18, it is enough to show that

where the expectation is over the distribution on the output of Algorithm 2. Expanding the expectation on the right hand side, we have

The second equality follows from the “base times height” formula for the determinant. The penultimate equality follows from (11). ∎

To implement Algorithm 3, we need to be able to evaluate the elementary symmetric polynomial $e_{j-|T|}(\lambda(T))$ . This can be done in polynomial time by expanding the characteristic polynomial of the matrix $(C^{1/2}V^{T}\Pi(T)VC^{1/2})_{[n]\setminus T,[n]\setminus T}$ : $e_{j-|T|}(\lambda(T))$ is equal to $(-1)^{j-|T|}$ times the coefficient of the term of degree $d-j+|T|$ .

Restricted Invertibility Principles

The celebrated Restricted Invertibility Principle (RIP) of Bourgain and Tzafriri [BT87] is a powerful generalization of the simple fact in linear algebra that a matrix of rank $r$ has an invertible submatrix with at least $r$ columns. The RIP shows that if the “robust rank” of a matrix is large, i.e. the matrix has many large singular values, then it contains a proportionally large submatrix which is well-invertible, i.e. its inverse is bounded in operator norm. The RIP has had many applications in Banach space theory, asymptotic convex geometry, statistics, and recently in discrepancy theory and private data analysis.

The version of the RIP above was proved by Spielman and Srivastava for $c_{1}=\ldots=c_{n}=1$ . However, essentially the same proof shows the slight generalization formulated above.

Next we state our version of the RIP for determinants.

Moreover, such a set $S$ can be computed in deterministic polynomial time.

where $P\stackrel{{\scriptstyle def}}{{=}}\operatorname{diag}(p_{1},\ldots,p_{n})$ , $\mu$ is the vector of eigenvalues of the matrix $P^{1/2}XP^{1/2}$ , and the final equality follows by (11).

Let us identify $L$ with a $d\times d$ matrix in the natural way. The matrix

Note that the entries of $\lambda$ (i.e. the eigenvalues of $LL^{T}$ ) are equal to the squared singular values of $L$ , and, therefore, $\|L\|_{HS}^{2}=\|\lambda\|_{1}$ and $\|L\|_{2\to 2}^{2}=\lambda_{1}$ . To complete the proof, it remains to show that $e_{j}(\lambda)\geq j^{-j}\|\lambda\|_{1}^{j}$ . Because $e_{j}$ is Schur concave, it is enough to show that $\lambda\prec\bar{\lambda}\stackrel{{\scriptstyle def}}{{=}}\Bigl{(}\frac{\|\lambda\|_{1}}{j},\ldots,\frac{\|\lambda\|_{1}}{j},0,\ldots,0\Bigr{)}$ , where $\bar{\lambda}$ has $j$ non-zero coordinates. Indeed, for any $k\leq j$ , by the choice of $j$ ,

For $j<k\leq d$ , $\sum_{i=1}^{k}{\lambda_{i}}\leq\|\lambda\|_{1}=\sum_{i=1}^{k}{\bar{\lambda}_{i}}$ , with equality for $k=d$ . Therefore, $\lambda\prec\bar{\lambda}$ and $e_{j}(\lambda)\geq e_{j}(\bar{\lambda})=j^{-j}\|\lambda\|_{1}^{j}$ . This finishes the proof of the main claim.

A set $S\subseteq[n]$ such that $M=X_{S,S}$ satisfies the conclusion of the theorem can be computed in deterministic polynomial time via the method of conditional expectations, as in the proof of Theorem 19. ∎

Theorem 21 is incomparable with Theorem 20. On one hand, the conclusion of Theorem 20 is of a qualitatively stronger type: it implies a lower bound on the smallest singular value of the matrix $M=(\langle Lv_{i},Lv_{k}\rangle)_{i,k\in S}$ . On the other hand, the set $S$ in Theorem 21 can be as large as the (floor function of the) robust rank $\|L\|_{HS}^{2}/\|L\|_{2\to 2}^{2}$ , while this is in general not possible in Theorem 20.

Conclusion

We have given a polynomial time deterministic algorithm that approximates the $j$ -MSD problem by a factor of $e^{j+o(j)}$ , and, therefore, the $j$ -MVS problem by a factor of $e^{j/2+o(j)}$ . Our algorithms use randomized rounding with a generalization of the $D$ -optimal design problem. The analysis relies on convex duality, Schur convexity, and elementary properties of determinants.

We conjecture that approximating the $j$ -MSD problem within a factor of $e^{j-\epsilon}$ is $\mathsf{NP}$ -hard for any $\epsilon>0$ . As an easier problem, it will be interesting to construct an input for which the $j$ -Löwner ellipsoid approximates $j$ -MSD no better than a factor of $e^{j}$ , or to give a better analysis.

We also leave open the problem of computing a constant factor approximation to the determinant lower bound on hereditary discrepancy. Finally, it will be interesting to generalize Khachiyan’s barycentric coordinate descent algorithm for the $D$ -optimal design problem to the dual of the $j$ -Löwner ellipsoid problem.

Acknowledgements

I thank Kunal Talwar and Nikhil Bansal for useful discussions.