Randomized Rounding for the Largest Simplex Problem

Aleksandar Nikolov

Introduction

The jj-MVS problem can be easily reduced to a a problem about subdeterminants of positive semidefinite matrices (see Lemma 5 for the reduction). For an m×nm\times n matrix MM, let MS,TM_{S,T} be the submatrix with rows indexed by S[m]S\subseteq[m] and columns indexed by T[n]T\subseteq[n]. In the maximum jj-subdeterminant problem (jj-MSD) we are given an n×nn\times n positive semidefinite matrix MM of rank dd, and the goal is find a set SS of cardinality jj so that detMS,S\det M_{S,S} is maximized. The jj-MVS problem in dd dimensions can be reduced to solving nn instances of the jj-MSD problem for matrices of rank dd, and the reduction is approximation preserving.

The jj-MSD problem was also independently studied in the context of low-rank approximations. The optimal row-rank approximation of a matrix AA is well understood, and for both the operator and the Frobenius norm (and in fact any unitarily invariant matrix norm) is given by the the projection of the rows and columns of AA onto the top singular vectors. However, an approximation in terms of a submatrix of AA often has a better explanatory value. For example, if AA is a n×dn\times d matrix in which each row is a data point, and if AA is well-approximated by its projection onto the span of jj of its columns, we can argue that, at least intuitively, these columns represent important features in the data. Goreinov and Tyrtyshnikov [GT01] gave one formalization of this intuition, which we cite next.

Let M0M\succeq 0 be an n×nn\times n matrix and let S[n]S\subseteq[n] be an optimal solution to the jj-MSD problem for MM. Then, for T=[n]ST=[n]\setminus S we have

where σj+1\sigma_{j+1} is the (j+1)(j+1)-st largest singular value of MM.

Hereditary discrepancy has an important application to rounding problems: roughly speaking, if the hereditary discrepancy of the matrix AA is low, then any solution to a xx linear system Ax=bAx=b can be rounded to an integral vector xˉ\bar{x} without introducing a lot of error. In this sense, hereditary discrepancy generalizes total unimodularity, which is equivalent to herdisc(A)=1\operatorname{herdisc}(A)=1 [GH62]. The following theorem, proved by Lovász, Spencer, and Vesztergombi, makes this connection precise.

In fact the theorem holds with hereditary discrepancy defined in terms of any norm.

The following theorem shows that detlb2(A)\operatorname{detlb}_{2}(A) gives nearly-tight upper and lower bound on herdisc2(A)\operatorname{herdisc}_{2}(A). While not explicitly stated in this form, the theorem can be proved by modifying the arguments in [LSV86, Mat11] in a straightforward way.

There exists a constant CC such that for any d×nd\times n matrix AA,

If for each 1jd1\leq j\leq d we have a factor α(j)\alpha(j) approximation for jj-MSD, then we get a factor α=defmaxjα(j)1/2j\alpha\stackrel{{\scriptstyle def}}{{=}}\max_{j}\alpha(j)^{1/2j}-approximation to detlb2(A)\operatorname{detlb}_{2}(A), and, therefore, a factor CαlogdC\alpha\log d approximation to herdisc2(A)\operatorname{herdisc}_{2}(A).

On the algorithmic side, the best known approximation for dd-MSD is (clogd)d(c\log d)^{d} for a constant cc, proved by Di Summa et al. [DEFM14]. They show that this approximation is achieved by a classical algorithm by Khachiyan [Kha95], for which they give a new analysis. This also implies a factor (clogd)d/2(c\log d)^{d/2} approximation for dd-MVS (see Lemma 5). For j<dj<d, the best approximation known is of the form (cj)j(cj)^{j} for a constant cc: algorithms with this guarantee were given by Packer [Pac04] for jj-MVS and by Çivril and Magdon-Ismail [ÇM13] for jj-MSD.

2 Our Contribution

In this paper we design deterministic polynomial time approximation algorithms for the jj-MSD, and, therefore, also the jj-MVS, problems. Our main result is the following theorem.

There exists a deterministic polynomial time algorithm which approximates the jj-MSD problem within a factor of ej+o(j)e^{j+o(j)}. This implies that there also exists a deterministic polynomial time algorithm which approximates the jj-MVS problem within a factor of ej/2+o(j)e^{j/2+o(j)}

This is the first approximation algorithm for jj-MSD and jj-MVS with an approximation factor of the form exp(O(j))\exp(O(j)), which matches the known hardness results up to the constant in the exponent. It is natural to conjecture that it is NP\mathsf{NP}-hard to approximate jj-MSD within a factor ejϵe^{j-\epsilon} for any ϵ>0\epsilon>0. We leave this as an open problem.

Theorem 4 implies a factor e+o(1)\sqrt{e}+o(1) approximation to detlb2(A)\operatorname{detlb}_{2}(A) for any d×nd\times n matrix AA, and, therefore, a factor O(logd)O(\log d) approximation to herdisc2(A)\operatorname{herdisc}_{2}(A). The latter result also follows from the techniques of the author and Talwar [NT14]. However, our result gives the first constant-factor approximation to a natural variant of the determinant lower bound. It is an interesting open problem to extend this result to the determinant lower bound for herdisc(A)\operatorname{herdisc}(A), which is equal to

We also use our techniques to give an elementary and short proof of a variant of the restricted invertibility principle of Bourgain and Tzafriri [BT87].

3 Techniques

Our strategy for approximating jj-MSD when j<dj<d is similar, but the analysis becomes more complicated. For motivation, let us consider the j=1j=1 case, in which we simply need to compute the largest diagonal entry of the input matrix MM, or, working with the columns v1,,vnv_{1},\ldots,v_{n} of the square root VV of MM, we need to compute the index ii such that viv_{i} has the largest squared Euclidean norm. Of course, this problem can be solved trivially in linear time by enumerating over the viv_{i}, but it is instructive to solve it using an approach similar to the one we used for dd-MSD. Consider the smallest enclosing ball problem for v1,,vnv_{1},\ldots,v_{n}: minimize rr subject to v1,,vnv_{1},\ldots,v_{n} being contained in a Euclidean ball of radius rr centered at . It is clear that the optimal rr is equal to the norm of the longest viv_{i}. The dual of the smallest enclosing ball problem is the problem of maximizing piviviT=pivi22\sum p_{i}v_{i}v_{i}^{T}=\sum p_{i}\|v_{i}\|_{2}^{2} over probability vectors pp (a much more general version of this fact is proved in Theorem 15). This latter problem is our convex relaxation of 11-MSD. While this is a natural relaxation that we could have arrived at directly, without going through the smallest enclosing ball problem, our approach pays off when generalizing to the case 1<j<d1<j<d, in which it is not clear how to come up directly with a natural convex relaxation of jj-MSD. The randomized rounding algorithm applied to the relaxation samples an index ii from the distribution determined by an optimal vector pp; the expected squared length of viv_{i} is pivi22\sum p_{i}\|v_{i}\|_{2}^{2}, i.e. exactly the objective value of the relaxation.

We follow a similar strategy for general jj. We define a minimization problem over ellipsoids centered at 0 that contain v1,,vnv_{1},\ldots,v_{n}. The objective of the problem is to minimize the volume of the largest jj-dimensional section of the containing ellipsoid. It is not hard to show that this problem gives an upper bound on jj-MSD (Lemma 13). The main technical challenge is to derive the dual of this optimization problem and to analyze the natural randomized rounding algorithm applied to it. An important difference from the j=dj=d case is that the objective of the ellipsoid optimization problem is no longer differentiable, which complicates the analysis of the dual. When 1<j<d1<j<d, the objective of the dual “splits” into two terms, one that resembles the j=1j=1 case and another that resembles the j=dj=d case. To relate the expected value of the output of the rounding algorithm to this more complicated objective we use the theory of Schur-concave functions applied to the elementary symmetric polynomials.

We derandomize our algorithms using the method of conditional expectations. This approach and the use of the elementary symmetric polynomials to relate the eigenvalues of a matrix to its entries are inspired by the volume-sampling algorithms of Deshpande and Rademacher [DR10]. These sampling techniques together with the Schur concavity of ratios of elementary symmetric polynomials were used previously in the work of Guruswami and Sinop [GS12] on low rank matrix approximations.

Preliminaries

We use the notation [n]={1,,n}[n]=\{1,\ldots,n\} for an integer nn. With (Sk){S\choose k} we denote the set of size kk subsets of the set SS.

Lemma 5 implies that a factor α\alpha approximation algorithm for jj-MSD implies a factor α\sqrt{\alpha} approximation algorithm for jj-MVS. For this reason, for the rest of the paper we will focus our attention on the jj-MSD problem.

2 Convex Analysis and Optimization

Consider an optimization problem in the following general form:

For any xx which is feasible for (1)–(2), and any y0y\geq 0, g(y)f0(x)g(y)\leq f_{0}(x). This fact is known as weak duality. The Lagrange dual problem is defined as

Strong duality holds when the optimal value of (3) equals the optimal value of (1)–(2). Slater’s condition is a commonly used sufficient condition for strong duality. We state it next.

Assume f0,,fmf_{0},\ldots,f_{m} in the problem (1)–(2) are convex functions over their respective domains, and for some k0k\geq 0, f1,,fkf_{1},\ldots,f_{k} are affine functions. Let there be a point xx in the relative interior of the domains of f0,,fmf_{0},\ldots,f_{m}, so that fi(x)0f_{i}(x)\leq 0 for 1ik1\leq i\leq k and fj(x)<0f_{j}(x)<0 for k+1jmk+1\leq j\leq m. Then the optimal value of (1)–(2) equals the optimal value of (3), and the value of (3) is achieved if it is finite.

For more information on convex programming and duality, we refer the reader to the books by Boyd and Vandenberghe [BV04] and Rockafellar [Roc70].

3 Ellipsoids and John’s Theorem

This program corresponds to finding the minimum volume ellipsoid centered at 0 that contains v1,,vnv_{1},\ldots,v_{n}. It is a convex minimization problem over the open domain {W:W0}\{W:W\succ 0\} with affine constraints, and, therefore, satisfies Slater’s condition. The dual problem to (4)–(6) is

Up to scaling of the variables c1,,cnc_{1},\ldots,c_{n}, this is the DD-optimal design problem. For a proof of the duality, see [BV04, Sect. 5.1.6, 5.2.4, 7.5.2]; it also follows from the the more general Theorem 15. Since it is the dual of a convex minimization problem, (7)–(9) is a convex maximization problem. Then the following variant of John’s theorem is a direct consequence of strong duality for the program (4)–(6) (which is implied by Slater’s condition):

The optimal value of (4)–(6) is equal to the optimal value of (7)–(9).

4 Properties of Determinants

First we recall the classical Binet-Cauchy formula for the determinant of a matrix product. For any m×nm\times n matrix AA, mnm\geq n, we have

Let eke_{k} be the degree kk elementary symmetric polynomial, i.e.

Let MM be an n×nn\times n symmetric matrix with eigenvalues λ1,,λn\lambda_{1},\ldots,\lambda_{n}. It is well-known that det(M)=en(λ1,,λn)\det(M)=e_{n}(\lambda_{1},\ldots,\lambda_{n}) and tr(M)=e1(λ1,,λn){\rm tr}(M)=e_{1}(\lambda_{1},\ldots,\lambda_{n}). In fact a similar identity involving the entries of MM and its eigenvalues holds for all kk:

This fact is also classical and can be proved by expressing each coefficient of the characteristic polynomial of MM in two different ways: as a sum of subdeterminants, and as a symmetric polynomial of its roots.

5 Schur Convexity

We use the following classical fact about the Schur-concavity of elementary symmetric functions, proved by Schur.

The elementary symmetric polynomial eke_{k} of degree kk, 1kn1\leq k\leq n, is Schur-concave.

The Full-Dimensional Case

In this section we discuss the special case j=dj=d. We treat this case separately because it is a natural problem in itself, and the technical details of our algorithm are simpler, while illustrating some of the key ideas of our approach.

We first prove a simple lemma which is essential to our analysis.

Let VV be a d×nd\times n matrix with column vectors v1,,vnv_{1},\ldots,v_{n}. Let p1,,pnp_{1},\ldots,p_{n} give the probabilities for a distribution on [n][n], i.e. pi0p_{i}\geq 0 for all ii and pi=1\sum p_{i}=1. Let SS be a random multiset of dd elements, each sampled independently with replacement from [n][n] according to the distribution determined by p1,,pnp_{1},\ldots,p_{n}. Then

where P=diag(p1,,pn)P=\operatorname{diag}(p_{1},\ldots,p_{n}) is a diagonal matrix with the values pip_{i} on the main diagonal. The right hand side is equal to d!det(VPVT)d!\det(VPV^{T}) by the Binet-Cauchy formula (10). Since VPVT=i=1npiviviTVPV^{T}=\sum_{i=1}^{n}{p_{i}v_{i}v_{i}^{T}}, this finishes the proof. ∎

We present our approximation algorithm for dd-MSD as Algorithm 1. The main approximation guarantee of the algorithm is given in Theorem 10.

Let the random multiset SS be the output of Algorithm 1 for input MM and an α\alpha-optimal c1,,cnc_{1},\ldots,c_{n}. Then

Observe first that det(MS,S)=det(VSTVS)=det(VS)2\det(M_{S,S})=\det(V_{S}^{T}V_{S})=\det(V_{S})^{2} for any SS of size dd. Then by Lemma 9, with pi=def1dcip_{i}\stackrel{{\scriptstyle def}}{{=}}\frac{1}{d}c_{i}, we have

The first inequality above follows by applying the AM-GM inequality to the eigenvalues of VTTWVTV_{T}^{T}WV_{T}, and the last inequality is implied by the constraints (5). What we have shown is equivalent to the intuitive geometric fact that the volume of the largest simplex with one vertex at contained in the convex hull of v1,,vnv_{1},\ldots,v_{n} is at most the volume of the largest simplex with one vertex at contained in the Löwner ellipsoid of v1,,vnv_{1},\ldots,v_{n} (or in fact any ellipsoid containing these points).

as desired. The asymptotic estimate d!dd2πded\frac{d!}{d^{d}}\sim\sqrt{2\pi d}e^{-d} is a restatement of Stirling’s approximation of d!d!. ∎

Since (7)–(9) is a convex optimization problem, we can use the ellipsoid method to to compute an α\alpha-optimal solution in time polynomial in n,d,logα1n,d,\log\alpha^{-1} [GLS81]. Khachiyan [Kha96] showed how to compute a dln(1+ϵ)d\ln(1+\epsilon)-optimal solution to (7)–(9) (i.e. a multiplicative (1+ϵ)d(1+\epsilon)^{d} approximation to \det\Bigl{(}\sum_{i}c_{i}v_{i}v_{i}^{T}\Bigr{)}) using a polynomial in n,d,ϵ1n,d,\epsilon^{-1} number of real value operations. Using either method with Algorithm 1, we get an approximation factor of 12πd((1+ϵ)e)d\frac{1}{\sqrt{2\pi d}}((1+\epsilon)e)^{d} in time polynomial in nn, dd, and ϵ1\epsilon^{-1}.

In Section 5 we show how to derandomize Algorithm 1 using the method of conditional expectations.

The General Case

A natural first attempt to extend Algorithm 1 to general j<dj<d is to simply sample jj, rather than dd, coordinates from the distribution induced by an optimal solution to (7)–(9). A straightforward extension of the analysis in Section 3 shows that this algorithm achieves approximation factor ddj!\tfrac{d^{d}}{j!}, which is exp(O(j))\exp(O(j)) for j=Ω(d)j=\Omega(d) but approaches ddd^{d} for smaller jj. In order to achieve exp(O(j))\exp(O(j)) approximation for all jj, we generalize the Löwner ellipsoid problem. The rounding algorithm remains essentially the same, but the details of the analysis become more complicated.

It is not hard to see that maxHvolj(HE)\max_{H}{\operatorname{vol}_{j}(H\cap E)} for an ellipsoid EE is proportional to the product of the lengths of the jj longest major axes of EE. We use this observation to formulate the problem of finding jj-Löwner ellipsoid as a convex program. First we need to define the appropriate function on the space of positive definite matrices.

To show that Δj(W)\Delta_{j}(W) is convex and continuous, and to characterize its subdifferentials, we will use a general result of Lewis, extending classical work by von Neumann on unitarily invariant matrix norms. Below we state a slightly specialized case of his result.

For a set S[d]S\subseteq[d], let us use the notation 1S1_{S} for the dd-dimensional indicator vector of SS, i.e. the ii-th coordinate of 1S1_{S} is 11 if iSi\in S and otherwise. Let us define the convex polytope Vj,d=defconv{1S:S([d]j)}V_{j,d}\stackrel{{\scriptstyle def}}{{=}}\operatorname{conv}\{1_{S}:S\in{[d]\choose j}\}. This is the basis polytope of the rank jj uniform matroid. We can now prove the convexity of Δj\Delta_{j} and characterize its subdifferentials.

The function Δj\Delta_{j} is convex and continuous on the space of positive definite matrices. Moreover, for any W0W\succ 0 with eigenvalues

the subdifferential of Δj\Delta_{j} at WW is

Because δj\delta_{j} is symmetric, Lemma 11 implies that in order to show that Δj\Delta_{j} is convex and continuous, we only need to show that δj\delta_{j} is convex and continuous. Because the function lnx-\ln x is monotone decreasing in xx, we can write δj(x)\delta_{j}(x) as

Since δj(λ)=max{δS(λ):S([d]j)}\delta_{j}(\lambda)=\max\{\delta_{S}(\lambda):S\in{[d]\choose j}\}, and each δS\delta_{S} is differentiable, we have

The gradient δS(λ)\nabla\delta_{S}(\lambda) is given by δS(λ)λi=λi1\frac{\partial\delta_{S}(\lambda)}{\partial\lambda_{i}}=-\lambda_{i}^{-1} for iSi\in S and δS(λ)λi=0\frac{\partial\delta_{S}(\lambda)}{\partial\lambda_{i}}=0 otherwise. We have

This implies the desired characterization of δj(λ)\partial\delta_{j}(\lambda). ∎

Geometrically, the lemma captures the following fact. Let EE be an ellipsoid centered at and containing v1,,vnv_{1},\ldots,v_{n}. Then the volume of the largest jj-dimensional simplex in the convex hull of v1,,vnv_{1},\ldots,v_{n} with one vertex at is at most the volume of the largest jj-dimensional simplex in EE with one vertex at . Moreover, the latter simplex is contained in the jj-dimensional subspace whose intersection with EE has the largest volume. In the formal proof below we use a linear algebraic argument.

Let λ1λd\lambda_{1}\leq\ldots\leq\lambda_{d} be the eigenvalues of WW, and let μ1μj\mu_{1}\leq\ldots\leq\mu_{j} be the eigenvalues of UTWUU^{T}WU. By the Cauchy interlace theorem, λiμi\lambda_{i}\leq\mu_{i} for any 1ij1\leq i\leq j, and, therefore,

On the other hand, by applying the AM-GM inequality to the eigenvalues of the matrix VSTWVSV_{S}^{T}WV_{S}, we get

The final inequality above follows from the constraints (14). Combining the inequalities gives the desired bound. ∎

2 Duality for j𝑗j-Löwner Ellipsoids

As mentioned above, the program (13)–(15) that we used to capture jj-Löwner ellipsoids is convex and satisfies Slater’s condition. Therefore, it admits a dual characterization, which we will use in our algorithm. In this section we derive the dual characterization using the Lagrange dual function.

Before we introduce the dual, or even properly define its objective function, we need to prove a technical lemma.

Let x1xm0x_{1}\geq\ldots\geq x_{m}\geq 0 be non-negative reals, and let jmj\leq m be a positive integer. There exists a unique integer kk, 0kj10\leq k\leq j-1, such that

Define x>k=defi>kxix_{>k}\stackrel{{\scriptstyle def}}{{=}}\sum_{i>k}{x_{i}}. If x>0jx1x_{>0}\geq jx_{1} holds, then (16) is satisfied for k=0k=0, and we are done. So let us assume that x>0<jx1x_{>0}<jx_{1}. Then x>1=x>0x1<(j1)x1x_{>1}=x_{>0}-x_{1}<(j-1)x_{1}, and the first inequality in (16) is satisfied for k=1k=1. If the second inequality is also satisfied we are done, so let us assume that x>1<(j1)x2x_{>1}<(j-1)x_{2}, which implies the first inequality in (16) for k=2k=2. Continuing in this manner, we see that if the inequalities (16) are not satisfied for any k{0,,j2}k\in\{0,\ldots,j-2\}, then we must have x>j1<xj1x_{>j-1}<x_{j-1}. But the second inequality for k=j1k=j-1, i.e. x>j1=xj+x>jxjx_{>j-1}=x_{j}+x_{>j}\geq x_{j} is always satisfied because all the xix_{i} are non-negative, so we have that if (16) does not hold for any kj2k\leq j-2, then it must hold for k=j1k=j-1. This finishes the proof of existence.

This completes the proof of uniqueness. ∎

We now introduce a function which will be used in formulating a dual characterization of (13)–(15).

We will prove that the dual of (13)–(15) is equivalent to the following optimization problem:

The program (17)–(19) is a convex optimization problem, and its optimal value is equal to the optimal value of (13)–(15).

Observe that when j=dj=d, Γj(X)=lndet(X)\Gamma_{j}(X)=\ln\det(X), so that (13)–(15) in this case reduces to (7)–(9). I.e. Theorem 15 generalizes Lemma 7.

To prove Theorem 15, we need two additional technical lemmas. The first one is well-know and follows from more general results characterizing the facets of the basis polytope of a matroid [Sch03].

For any jj and dd, Vj,d={x:i=1dxi=j,0xi1 1id}.V_{j,d}=\{x:\sum_{i=1}^{d}{x_{i}}=j,0\leq x_{i}\leq 1~{}\forall 1\leq i\leq d\}.

The next lemma is the key technical ingredient in the proof of Theorem 15.

Let X0X\succeq 0 be a d×dd\times d matrix of rank at least jj. Then there exists a d×dd\times d matrix W0W\succ 0 such that XΔj(W)X\in-\partial\Delta_{j}(W), and Γj(X)=Δj(W)\Gamma_{j}(X)=\Delta_{j}(W).

Let rr be the rank of XX, and let μ1μd\mu_{1}\geq\ldots\geq\mu_{d} be its eigenvalues. Let UU be an orthonormal matrix such that X=Udiag(λ)UTX=U\operatorname{diag}(\lambda)U^{T} for μ=(μ1,,μd)\mu=(\mu_{1},\ldots,\mu_{d}). Assume that kk is a positive integer strictly smaller than jj such that μk>i>kμijkμk+1\mu_{k}>\frac{\sum_{i>k}{\mu_{i}}}{j-k}\geq\mu_{k+1} and define ν=defi>kλijk\nu\stackrel{{\scriptstyle def}}{{=}}\frac{\sum_{i>k}{\lambda_{i}}}{j-k}. A unique such choice of kk exists by Lemma 14. Moreover, since XX has rank at least jj, λ1λk+1>0\lambda_{1}\geq\ldots\geq\lambda_{k+1}>0, which also implies ν0\nu\geq 0. Therefore, the following vector λ\lambda is well-defined for any ν>ϵ>0\nu>\epsilon>0:

Let us set W=defUdiag(λ)UTW\stackrel{{\scriptstyle def}}{{=}}U\operatorname{diag}(\lambda)U^{T}. By Lemma 12, to prove that XΔj(W)X\in-\partial\Delta_{j}(W), it suffices to show that (μk+1,,μr)νVjk,rk(\mu_{k+1},\ldots,\mu_{r})\in\nu V_{j-k,r-k}. This inclusion follows from Lemma 16 because, by the choice of ν\nu and kk, 0λiν0\leq\lambda_{i}\leq\nu for all k+1irk+1\leq i\leq r, and i=k+1rμi=(jk)ν\sum_{i=k+1}^{r}{\mu_{i}}=(j-k)\nu.

The equality Γj(X)=Δj(W)\Gamma_{j}(X)=\Delta_{j}(W) follows by a calculation. By the choice of kk and ν\nu, μ1μk>ν\mu_{1}\geq\ldots\geq\mu_{k}>\nu. Therefore, the jj smallest eigenvalues of WW are λ1λj\lambda_{1}\leq\ldots\leq\lambda_{j}, and we have

Let us define {W:W0}\{W:W\succ 0\} to be the domain for the constraints (14) and the objective function (13). This makes the constraint W0W\succ 0 implicit. The optimization problem is convex by Lemma 12. Is is also always feasible: for example, if r=maxi=1nvi22r=\max_{i=1}^{n}{\|v_{i}\|_{2}^{2}}, then r1Ir^{-1}I is a feasible solution. Slater’s condition is therefore satisfied and strong duality holds. To prove the theorem, it suffices to show that the dual problem to (13)–(15) is equivalent to (17)–(19).

The Lagrange dual function for (13)–(15) is

A matrix W0W\succ 0 achieves the minimum above if and only if 0g(c)0\in\partial g(c), which, by the additivity of subgradients, is equivalent to i=1nciviviTΔj(W)\sum_{i=1}^{n}{c_{i}v_{i}v_{i}^{T}}\in-\partial\Delta_{j}(W). Define X=defi=1nciviviTX\stackrel{{\scriptstyle def}}{{=}}\sum_{i=1}^{n}{c_{i}v_{i}v_{i}^{T}}. Consider first the case in which XX has rank less than jj. Let t0t\geq 0 be a parameter, and let Π\Pi be an orthogonal projection matrix onto the nullspace of XX. Consider the matrix W=defI+tΠW\stackrel{{\scriptstyle def}}{{=}}I+t\Pi. The sum i=1nciviTWvi=tr(XW)=tr(X)\sum_{i=1}^{n}{c_{i}v_{i}^{T}Wv_{i}}={\rm tr}(XW)={\rm tr}(X) remains bounded for all tt, while Δj(W)\Delta_{j}(W) goes to -\infty as tt\to\infty. Therefore g(c)=g(c)=-\infty in this case.

Next we consider the case in which XX has rank at least jj. Then, by Lemma 17, there exists a WW such that XΔj(W)X\in-\partial\Delta_{j}(W), and, therefore, this WW achieves g(c)g(c). From Lemma 12 and XΔj(W)X\in-\partial\Delta_{j}(W), it follows that i=1nciviTWvi=tr(XW)=j\sum_{i=1}^{n}{c_{i}v_{i}^{T}Wv_{i}}={\rm tr}(XW)=j. Also using the fact that, by Lemma 17, Δj(W)=Γj(X)\Delta_{j}(W)=\Gamma_{j}(X), we have

To finish the proof we show that any cc that maximizes the right hand side above satisfies i=1nci=j\sum_{i=1}^{n}{c_{i}}=j, and, therefore, the optimal value of the dual problem, max{g(c):ci0 1in}\max\{g(c):c_{i}\geq 0\ \forall 1\leq i\leq n\}, is equal to the optimal value of (17)–(19). Let us fix some arbitrary cc such that ci0c_{i}\geq 0 for all ii and i=1nci=j\sum_{i=1}^{n}{c_{i}}=j, and consider the function h(t)=defg(tc)h(t)\stackrel{{\scriptstyle def}}{{=}}g(tc), defined over positive real numbers tt. It will be enough to show that the unique maximizer of h(t)h(t) is t=1t=1. Since hh is a restriction of a convex function, it is also convex, and it is enough to show that 11 is the unique solution of dhdt=0\frac{dh}{dt}=0. Let λ1λd\lambda_{1}\geq\ldots\geq\lambda_{d} be the eigenvalues of i=1nciviviT\sum_{i=1}^{n}{c_{i}v_{i}v_{i}^{T}}, and let kk be the unique integer with which Γj(i=1nciviviT)\Gamma_{j}(\sum_{i=1}^{n}{c_{i}v_{i}v_{i}^{T}}) is computed, i.e. kk is such that λk>i>lλijkλk+1\lambda_{k}>\frac{\sum_{i>l}{\lambda_{i}}}{j-k}\geq\lambda_{k+1}. The eigenvalues of i=1ntciviviT\sum_{i=1}^{n}{tc_{i}v_{i}v_{i}^{T}} are tλ1tλdt\lambda_{1}\geq\ldots\geq t\lambda_{d}, so the condition tλk>i>ltλijktλk+1t\lambda_{k}>\frac{\sum_{i>l}{t\lambda_{i}}}{j-k}\geq t\lambda_{k+1} is clearly satisfied, and, by Lemma 14, this choice of kk is unique. Therefore, \Gamma_{j}\Bigl{(}\sum_{i=1}^{n}{c_{i}v_{i}v_{i}^{T}}\Bigr{)} and \Gamma_{j}\Bigl{(}\sum_{i=1}^{n}{tc_{i}v_{i}v_{i}^{T}}\Bigr{)} are computed with the same kk. By the basic properties of logarithms, \Gamma_{j}\Bigl{(}t\sum_{i=1}^{n}{c_{i}v_{i}v_{i}^{T}}\Bigr{)}=\Gamma_{j}\Bigl{(}\sum_{i=1}^{n}{c_{i}v_{i}v_{i}^{T}}\Bigr{)}+j\ln t, and, therefore,

The derivative dhdt=jtj\frac{dh}{dt}=\frac{j}{t}-j vanishes only at t=1t=1, which implies that h(t)=g(tc)h(t)=g(tc) is maximized at t=1t=1. This proves the claim and finishes the proof of the theorem. ∎

3 The Rounding Algorithm

Our rounding algorithm, shown as Algorithm 2, is nearly identical to Algorithm 1, except for using probability weights proportional to an optimal solution of (17)–(19). The approximation guarantee for the algorithm is given by the following theorem.

Let the random multiset SS be the output of Algorithm 1 for input MM and α\alpha-optimal c1,,cnc_{1},\ldots,c_{n}. Then

Let us define pi=def1jcip_{i}\stackrel{{\scriptstyle def}}{{=}}\frac{1}{j}c_{i} for 1in1\leq i\leq n, and P=defdiag(p1,,pn)P\stackrel{{\scriptstyle def}}{{=}}\operatorname{diag}(p_{1},\ldots,p_{n}). If some element in SS repeats, then det(MS,S)=0\det(M_{S,S})=0. On the other hand, each set SS can be sampled in j!j! different ways, one for each ordering of its elements. We can then write the expectation of det(MS,S)\det(M_{S,S}) as

The asymptotic estimate j!jj2πjej\frac{j!}{j^{j}}\sim\sqrt{2\pi j}e^{-j} is again just Sterling’s approximation to j!j!. This completes the proof of the theorem. ∎

Since (17)–(19) is a convex optimization problem, we can use the ellipsoid method to to compute an α\alpha-optimal solution in time polynomial in n,d,logα1n,d,\log\alpha^{-1} [GLS81]. Together with Algorithm 2, we get an approximation factor of 12πd((1+ϵ)e)j\frac{1}{\sqrt{2\pi d}}((1+\epsilon)e)^{j} in time polynomial in nn, dd, and logϵ1\log\epsilon^{-1}. It is also conceivable that the barycentric coordinate descent method of Khachiyan [Kha96] can be extended to solve (17)–(19).

In Section 5 we show how to derandomize Algorithm 2 using the method of conditional expectations.

Derandomizing the Algorithms

In Theorems 10 and 18 we only proved our approximation guarantees in expectation. A priori, this does not give a useful bound on the probability that the set output by Algorithm 1 or 2 is close to optimal. However, it is not hard to derandomize the algorithms using the method of conditional expectation. The deterministic algorithm is presented as Algorithm 3.

By the method of conditional expectation [AS08], and Theorem 18, it is enough to show that

where the expectation is over the distribution on the output of Algorithm 2. Expanding the expectation on the right hand side, we have

The second equality follows from the “base times height” formula for the determinant. The penultimate equality follows from (11). ∎

To implement Algorithm 3, we need to be able to evaluate the elementary symmetric polynomial ejT(λ(T))e_{j-|T|}(\lambda(T)). This can be done in polynomial time by expanding the characteristic polynomial of the matrix (C1/2VTΠ(T)VC1/2)[n]T,[n]T(C^{1/2}V^{T}\Pi(T)VC^{1/2})_{[n]\setminus T,[n]\setminus T}: ejT(λ(T))e_{j-|T|}(\lambda(T)) is equal to (1)jT(-1)^{j-|T|} times the coefficient of the term of degree dj+Td-j+|T|.

Restricted Invertibility Principles

The celebrated Restricted Invertibility Principle (RIP) of Bourgain and Tzafriri [BT87] is a powerful generalization of the simple fact in linear algebra that a matrix of rank rr has an invertible submatrix with at least rr columns. The RIP shows that if the “robust rank” of a matrix is large, i.e. the matrix has many large singular values, then it contains a proportionally large submatrix which is well-invertible, i.e. its inverse is bounded in operator norm. The RIP has had many applications in Banach space theory, asymptotic convex geometry, statistics, and recently in discrepancy theory and private data analysis.

The version of the RIP above was proved by Spielman and Srivastava for c1==cn=1c_{1}=\ldots=c_{n}=1. However, essentially the same proof shows the slight generalization formulated above.

Next we state our version of the RIP for determinants.

Moreover, such a set SS can be computed in deterministic polynomial time.

where P=defdiag(p1,,pn)P\stackrel{{\scriptstyle def}}{{=}}\operatorname{diag}(p_{1},\ldots,p_{n}), μ\mu is the vector of eigenvalues of the matrix P1/2XP1/2P^{1/2}XP^{1/2}, and the final equality follows by (11).

Let us identify LL with a d×dd\times d matrix in the natural way. The matrix

Note that the entries of λ\lambda (i.e. the eigenvalues of LLTLL^{T}) are equal to the squared singular values of LL, and, therefore, LHS2=λ1\|L\|_{HS}^{2}=\|\lambda\|_{1} and L222=λ1\|L\|_{2\to 2}^{2}=\lambda_{1}. To complete the proof, it remains to show that ej(λ)jjλ1je_{j}(\lambda)\geq j^{-j}\|\lambda\|_{1}^{j}. Because eje_{j} is Schur concave, it is enough to show that \lambda\prec\bar{\lambda}\stackrel{{\scriptstyle def}}{{=}}\Bigl{(}\frac{\|\lambda\|_{1}}{j},\ldots,\frac{\|\lambda\|_{1}}{j},0,\ldots,0\Bigr{)}, where λˉ\bar{\lambda} has jj non-zero coordinates. Indeed, for any kjk\leq j, by the choice of jj,

For j<kdj<k\leq d, i=1kλiλ1=i=1kλˉi\sum_{i=1}^{k}{\lambda_{i}}\leq\|\lambda\|_{1}=\sum_{i=1}^{k}{\bar{\lambda}_{i}}, with equality for k=dk=d. Therefore, λλˉ\lambda\prec\bar{\lambda} and ej(λ)ej(λˉ)=jjλ1je_{j}(\lambda)\geq e_{j}(\bar{\lambda})=j^{-j}\|\lambda\|_{1}^{j}. This finishes the proof of the main claim.

A set S[n]S\subseteq[n] such that M=XS,SM=X_{S,S} satisfies the conclusion of the theorem can be computed in deterministic polynomial time via the method of conditional expectations, as in the proof of Theorem 19. ∎

Theorem 21 is incomparable with Theorem 20. On one hand, the conclusion of Theorem 20 is of a qualitatively stronger type: it implies a lower bound on the smallest singular value of the matrix M=(Lvi,Lvk)i,kSM=(\langle Lv_{i},Lv_{k}\rangle)_{i,k\in S}. On the other hand, the set SS in Theorem 21 can be as large as the (floor function of the) robust rank LHS2/L222\|L\|_{HS}^{2}/\|L\|_{2\to 2}^{2}, while this is in general not possible in Theorem 20.

Conclusion

We have given a polynomial time deterministic algorithm that approximates the jj-MSD problem by a factor of ej+o(j)e^{j+o(j)}, and, therefore, the jj-MVS problem by a factor of ej/2+o(j)e^{j/2+o(j)}. Our algorithms use randomized rounding with a generalization of the DD-optimal design problem. The analysis relies on convex duality, Schur convexity, and elementary properties of determinants.

We conjecture that approximating the jj-MSD problem within a factor of ejϵe^{j-\epsilon} is NP\mathsf{NP}-hard for any ϵ>0\epsilon>0. As an easier problem, it will be interesting to construct an input for which the jj-Löwner ellipsoid approximates jj-MSD no better than a factor of eje^{j}, or to give a better analysis.

We also leave open the problem of computing a constant factor approximation to the determinant lower bound on hereditary discrepancy. Finally, it will be interesting to generalize Khachiyan’s barycentric coordinate descent algorithm for the DD-optimal design problem to the dual of the jj-Löwner ellipsoid problem.

Acknowledgements

I thank Kunal Talwar and Nikhil Bansal for useful discussions.

References