Faster and Simpler Width-Independent Parallel Algorithms for Positive Semidefinite Programming

Richard Peng, Kanat Tangwongsan, Peng Zhang

Introduction

Semidefinite programming (SDP), alongside linear programming (LP), is an important tool in approximation algorithms, optimization, and discrete mathematics. In the context of approximation algorithms alone, it has emerged as a key technique which underlies a number of impressive results that substantially improve the approximation ratios. To solve a semidefinite program, algorithms from the linear programming literature such as Ellipsoid or interior-point algorithms can be applied to derive near exact solutions. But they are often costly. As a result, finding efficient approximations to such problems is a critical step in making them more practical.

From a parallel algorithms standpoint, both LPs and SDPs are P-complete even to approximate to any constant accuracy, suggesting that it is unlikely that they have a polylogarithmic depth algorithm. For linear programs, however, the special case of positive linear programs, first studied by Luby and Nisan , has an algorithm that finds a (1+ε)(1+\varepsilon)-approximate solution in O(poly(1εlogn))O(\textrm{poly}(\tfrac{1}{\varepsilon}\log{n})) iterations. This weaker approximation guarantee is still sufficient for approximation algorithms (e.g., solutions to vertex cover and set cover via randomized rounding), spurring interest in studying these problems in both sequential and parallel contexts (see, e.g., ).

The importance of problems such as MaxCut and Sparsest Cut has led to the identification and study of positive SDPs. Our work is motivated by a result by Jain and Yao that gave the first positive SDP algorithm whose work and depth are independent of the width parameter (commonly known as width-independent algorithms). As there are substantial differences in the analysis in this version compared to the conference version , we address the relation between our paper and other works in Section 1.1.

We present a simple algorithm that offers guarantees similar to but has less work-depth complexity. Each iteration of our algorithm involves only simple matrix operations and computing the trace of the product of a matrix exponential and a positive semidefinite matrix. The input consists of an accuracy parameter ε>0\varepsilon>0 and a positive semidefinite program (PSDP) in the following standard primal form:

where the matrices C,A1,,An\mathbf{C},\mathbf{A}_{1},\dots,\mathbf{A}_{n} are mm-by-mm symmetric positive semidefinite matrices, \bullet denotes the pointwise dot product between matrices (see Section 2), and the scalars b1,,bnb_{1},\dots,b_{n} are nonnegative reals. This is a subclass of SDPs where the matrices and scalars are “positive” in their respective settings. We also assume, as is standard, that the SDP has strong duality. Our main result is as follows:

Given a primal positive SDP involving m×mm\times m matrices with nn constraints and an accuracy parameter ε>0\varepsilon>0, there is an algorithm approxPSDP that produces a (1+ε)(1+\varepsilon)-approximation in O(1ε3log3n)O(\frac{1}{\varepsilon^{3}}\log^{3}n) iterations, where each iteration involves computing matrix sums and a special primitive that computes exp(Φ)A\exp(\mathbf{\Phi})\bullet\mathbf{A} in the case when Φ\mathbf{\Phi} and A\mathbf{A} are both positive semidefinite.

The theorem quantifies the cost of our algorithm in terms of the number of iterations. The work and depth bounds implied by this theorem vary with the format of the input and how the matrix exponential is computed in each iteration. As we will discuss in Section 4, with input given in a suitable form, our algorithm runs in nearly-linear work and polylogarithmic depth.

The first definition of positive SDPs was due to Klein and Lu , who used it to characterize the MaxCut SDP. The MaxCut SDP can be viewed as a direct generalization of positive (packing) LPs. More recent work defines the notion of positive packing SDPs , which captures problems such as MaxCut, sparse PCA, and coloring; and the notion of covering SDPs , which captures the ARV relaxation of Sparsest Cut among others. Works this area tend to focus on developing fast sequential algorithms for finding a (1+ε)(1+\varepsilon)-approximation, leading to a series of sequential algorithms (e.g., ). The iteration count for these algorithms, however, depends on the so-called “width” parameter of the input program or some parameter of the spectrum of the input program. In some instances, the width parameter can be as large as Ω(n)\Omega(n), making it a bottleneck in the depth of direct parallelization. This is even more so in the case of the Sparsest Cut SDP even though the problem has been labeled as a covering SDP .

Over the past few years, width-independent algorithms for positive SDPs have received much attention. Jain and Yao gave the first result in this direction: a polylog depth algorithm based on the first width independent linear programming algoirthm by Luby and Nisan . Their algorithm is based on updating the primal matrix. This leads to an intricate analysis based on carefully analyzing the eigenspaces of a particular matrix before and after each update. It takes O(1ε13log13mlogn)O(\frac{1}{\varepsilon^{13}}\log^{13}m\log n) iterations, each of which involves computing spectral decompositions using least Ω(mω)\Omega(m^{\omega}) work.

Our algorithm follows a different approach. It updates the dual program in a way motivated by the width-independent positive LP algorithm by Young . Concurrently, Jain and Yao gave a similar algorithm for positive SDPs. Their algorithm solves a class of SDPs which contains both packing and diagonal covering constraints. Since matrix packing conditions between diagonal matrices are equivalent to point-wise conditions of the diagonal entries, these constraints are closer to a generalization of positive covering LP constraints. We believe that removing this restriction on diagonal packing matrices would greatly widen the class of problems included in this class of SDPs and discuss possibilities in this direction in Section 5.

The convergence analyses of both of these routines go through the matrix multiplicative weights update framework for solving semidefinite programs . However, a crucial algebraic piece is missing for adapting the iteration count bound from Young’s algorithm. Unlike scalar exponentials, matrix exponentials is not a monotonic function under standard notions of matrix ordering such as the Loewner partial order.

Most recently, Allen-Zhu et al. gave the first rigorous adaptation of Young’s algorithm to positive SDPs . Motivated by this result, as well as its precursor in positive LPs , we complete the analysis of our original algorithm. Specifically, we show that the dual generation scheme in these routines provide good bounds on the iteration count. This leads to a different analysis of Young’s algorithm that omits phases. Our modified analysis is for a simplified pseudocode of the algorithm from that removes these phases. However, the phase-based version can be analyzed similarly.

Compared to the algorithm by Allen-Zhu et al. , our analysis uses more elementary linear algebraic techniques, and is closer to multiplicative weights update schemes. We believe the dynamic bucketing method for obtaining better dependencies on ε\varepsilon is also applicable to our analysis. Finally, as many recent results on faster positive LP / SDP algorithms take optimization based views, we believe our approach is also of independent interest.

2 Overview

We derive our algorithm by generalizing Young’s algoirthm . In place of the “soft max” function for bounding the maximum of a set of linear constraints, we use matrix exponential and the matrix multiplicative weights update (MMWU) mechanism. Moreover, besides standard operations on (sparse) matrix, the only other primitive needed is the matrix dot product exp(Φ)A\exp(\mathbf{\Phi})\bullet\mathbf{A}, where Φ\mathbf{\Phi} and A\mathbf{A} are positive semidefinite.

Intuitions. For intuition about packing SDPs and the matrix multiplicative weights update method for finding approximate solutions, a useful analogy of the decision problem is that of packing a (fractional) amount of ellipses into the unit ball. Figure 1 provides an example involving 33 matrices (ellipses) in 22 dimensions. Note that A1\mathbf{A}_{1} and A2\mathbf{A}_{2} are axis-aligned; their sum is also axis-aligned in this case. In fact, positive linear programs in the broader context corresponds exactly to the restriction of all ellipsoids being axis-aligned. In this setting, the algorithm of can be viewed as creating a penalty function by weighting the length of the axises using an exponential function. Then, ellipsoids with sufficiently small penalty subject to this function have their weights increased.

However, once we allow general ellipsoids such as A3\mathbf{A}_{3}, the resulting sum will no longer be axis-aligned. In this setting, a natural extension is to take the exponential of the semimajor axises of the resulting ellipsoid instead, leading to the matrix multiplicative weights scheme. Our analysis then focuses on showing that Young’s algorithm still has a width-independent iteration count in this setting.

Work and Depth. We now discuss the work and depth bounds of our algorithm. The main cost of each iteration of our algorithm comes from computing the dot product between a matrix exponential and a PSD matrix. Like in the sequential setting , we need to compute for each iteration the product Aiexp(Φ)\mathbf{A}_{i}\bullet\exp(\mathbf{\Phi}), where Φ\mathbf{\Phi} is some PSD matrix. The cost of our algorithm therefore depends on how the input is specified. When the input is given prefactored—that is, the mm-by-mm matrices Ai\mathbf{A}_{i}’s are given as Ai=QiQi\mathbf{A}_{i}=\mathbf{Q}_{i}\mathbf{Q}_{i}^{\top} and the matrix C1/2\mathbf{C}^{-1/2} is given, then Theorem 4.1 can be used to compute matrix exponential in O(1ε3(m+q)lognlogqlog(\nicefrac1ε))O(\frac{1}{\varepsilon^{3}}(m+q)\log n\log q\log(\nicefrac{{1}}{{\varepsilon}})) work and O(1εlognlogqlog(\nicefrac1ε))O(\frac{1}{\varepsilon}\log n\log q\log(\nicefrac{{1}}{{\varepsilon}})) depth, where qq is the number of nonzero entries across Qi\mathbf{Q}_{i}’s and C1/2\mathbf{C}^{-1/2}. This is because the matrix Φ\mathbf{\Phi} that we exponentiate has Φ2O(1εlogn)\|{\mathbf{\Phi}}\|_{2}\leq O(\frac{1}{\varepsilon}\log n), as shown in Lemma 3.5. Therefore, as a corollary to the main theorem, we have the following cost bounds:

The algorithm approxPSDP has in O~(1ϵ6(n+m+q))\widetilde{O}(\frac{1}{\epsilon^{6}}(n+m+q)) work and O(1ϵ4logO(1)(n+m+q))O(\frac{1}{\epsilon^{4}}\log^{O(1)}(n+m+q)) depth.

If, however, the input program is not given in this form, we can add a preprocessing step that factors each Ai\mathbf{A}_{i} into QiQi\mathbf{Q}_{i}\mathbf{Q}_{i}^{\top} since Ai\mathbf{A}_{i} is positive semidefinite. In general, this preprocessing requires at most O(m4)O(m^{4}) work and O(log3m)O(\log^{3}m) depth using standard parallel QR factorization . Furthermore, these matrices often have certain structure that makes them easier to factor. Similarly, we can factor and invert C\mathbf{C} with the same cost bound, and can do better if it also has specialized structure.

Background and Notation

where v1,v2,,vm\mathbf{v}_{1},\mathbf{v}_{2},\dots,\mathbf{v}_{m} are the eigenvectors of A\mathbf{A} with eigenvalues λ1λm\lambda_{1}\geq\dots\geq\lambda_{m} respectively. We will use λ1(A)\lambda_{1}(\mathbf{A}), λ2(A),,λm(A)\lambda_{2}(\mathbf{A}),\dots,\lambda_{m}(\mathbf{A}) to represent the eigenvalues of A\mathbf{A} in decreasing order and also use λmax(A)\lambda_{\max}(\mathbf{A}) to denote λ1(A)\lambda_{1}(\mathbf{A}). Notice that positive semidefiniteness induces a partial ordering on matrices. We write AB\mathbf{A}\preccurlyeq\mathbf{B} if BA0\mathbf{B}-\mathbf{A}\succcurlyeq\mathbf{0}.

The trace of a matrix A\mathbf{A}, denoted Tr[A]\mathop{\mathsf{Tr}}\left[\mathbf{A}\right], is the sum of the matrix’s diagonal entries: Tr[A]=iAi,i\mathop{\mathsf{Tr}}\left[\mathbf{A}\right]=\sum_{i}A_{i,i}. Alternatively, the trace of a matrix can be expressed as the sum of its eigenvalues, so Tr[A]=iλi(A)\mathop{\mathsf{Tr}}\left[\mathbf{A}\right]=\sum_{i}\lambda_{i}(\mathbf{A}). Furthermore, we define

It follows that A\mathbf{A} is positive semidefinite if and only if AB0\mathbf{A}\bullet\mathbf{B}\geq 0 for all PSD B\mathbf{B}.

where, again, vi\mathbf{v}_{i} is the eigenvector corresponding to the eigenvalue λi\lambda_{i}. It is not difficult to check that for exp(A)\exp(\mathbf{A}), this definition coincides with exp(A)=i01i!Ai\exp(\mathbf{A})=\sum_{i\geq 0}\frac{1}{i!}\mathbf{A}^{i}.

Our algorithm relies on a matrix multiplicative weights (MMW) algorithm, which can be summarized as follows. For a fixed ε012\varepsilon_{0}\leq\frac{1}{2} and W(1)=I\mathbf{W}^{(1)}=\mathbf{I}, we play a “game” a number of times, where in iteration t=1,2,t=1,2,\dots, the following steps are performed:

Produce a “probability” matrix P(t)=W(t)/Tr[W(t)]\mathbf{P}^{(t)}=\mathbf{W}^{(t)}/\mathop{\mathsf{Tr}}\left[\mathbf{W}^{(t)}\right];

Incur a gain matrix M(t)\mathbf{M}^{(t)}; and

Like in the standard setting of multiplicative weights algorithms, the gain matrix is chosen by an external party, possibly adversarially. In our algorithm, the gain matrix is chosen to reflect the step we make in the iteration. Arora and Kale shows that the MMW algorithm has the following guarantees (restated for our setting):

For ε012\varepsilon_{0}\leq\frac{1}{2}, if M(t)\mathbf{M}^{(t)}’s are all PSD and M(t)I\mathbf{M}^{(t)}\preccurlyeq\mathbf{I}, then after TT iterations,

2 Reduction to Bounded Decision Version

Our algorithm works with normalized primal/dual programs shown in Figure 2.

By using binary search and appropriately scaling the input program, such an SDP can be approximated using the following decision problem:

or a PSD matrix Y\mathbf{Y} (a primal solution) such that

This reduction can also ensure bounded trace on all Ai\mathbf{A}_{i}’s. The following lemma summarizes key properties of such a reduction:

For 0<ε<10<\varepsilon<1, a positive packing semidefinite program can be approximated to a relative error to ε\varepsilon using O(logn)O(\log{n}) calls to the ε\varepsilon-decision problem. Furthermore, each Ai\mathbf{A}_{i} supplied to the decision problem has Tr[Ai]O(n3)\mathop{\mathsf{Tr}}\left[\mathbf{A}_{i}\right]\leq O(n^{3}).

The reduction is common to most positive linear and semidefinite program solvers ; we briefly sketch the idea for completeness.

(Sektch) First, transform to the normalized form by “dividing through” by C\mathbf{C} (see Appendix A). Since Tr[ixiAi]\mathop{\mathsf{Tr}}\left[\sum_{i}x_{i}\mathbf{A}_{i}\right] is within a factor of nn of the maximum eigenvalue of ixiAi\sum_{i}x_{i}\mathbf{A}_{i}, 1miniTr[Ai]\frac{1}{\min_{i}\mathop{\mathsf{Tr}}\left[\mathbf{A}_{i}\right]} gives a value that is within a factor nn of the optimum. Therefore, the optimization version can be solved by binary searching on the objective a total of at most O(log(nε))O(\log(\frac{n}{\varepsilon})) iterations.

In each of these decision instances, we can rescale Ai\mathbf{A}_{i}’s so that the threshold in question is 11. In this setting, the total sum of xix_{i} with Tr[Ai]n3\mathop{\mathsf{Tr}}\left[\mathbf{A}_{i}\right]\geq n^{3} is at most 1/n1/n. By not using these xix_{i}’s, the optimum changes by an additive value of less than ε\varepsilon.

Solving Positive SDPs

In this section, we describe a parallel algorithm for solving the decision version of positive packing SDPs, inspired by Young’s algorithm for positive LPs. The following theorem presents the guarantees of our algorithm.

Let 0<ε<10<\varepsilon<1. There is an algorithm decisionPSDP that given a positive SDP, solves the ε\varepsilon-decision problem in O(ε3log2n)O(\varepsilon^{-3}\log^{2}n) iterations.

Presented in Algorithm 3.1 is an algorithm that we will show to satisfy the theorem.

The algorithm is a multiplicative weights update routine, which proceeds in several rounds. The starting solution is xi(0)=1nTr[Ai]\mathbf{x}^{(0)}_{i}=\frac{1}{n\mathop{\mathsf{Tr}}\left[\mathbf{A}_{i}\right]}. This solution is chosen to be small so that ixi(0)AiI\sum_{i}x_{i}^{(0)}\mathbf{A}_{i}\preccurlyeq\mathbf{I}, hence respecting the dual constraint. Each subsequent update is a multiple of the current solution, so this x(0)\mathbf{x}^{(0)} is also chosen to ensure that these updates make rapid progress.

In each iteration following that, the algorithm computes

Our presentation follows the multiplicative weights update framework from Arora-Kale and Kale . Several intermediate variables are helpful for further discussion. Define the cumulative sum corresponding to x(t)\mathbf{x}^{(t)} as

This allows us to write the exponential as W(t)=exp(Ψ(t))\mathbf{W}^{(t)}=\exp(\mathbf{\Psi}^{(t)}). We will also specifically define the probability matrix by which we use to pick the update coordinates:

This matrix is easier to work with because it has trace 11:

Using this “probability” matrix P(t)\mathbf{P}^{(t)}, the algorithm identifies which x\mathbf{x} coordinates to update. These are the coordinates xix_{i}’s for which their contributions with respect to the Ai\mathbf{A}_{i}’s are still small—P(t)Ai1+ε\mathbf{P}^{(t)}\bullet\mathbf{A}_{i}\leq 1+\varepsilon. Each of these x\mathbf{x} coordinates will be incremented by δ(t)=defαxi\delta^{(t)}\stackrel{{\scriptstyle{\smash{\textsf{def}}}}}{{=}}\alpha\cdot\mathbf{x}_{i}, where α=ε/K1+10ε\alpha=\frac{\varepsilon/K}{1+10\varepsilon}. Therefore, in terms of δ(t)\delta^{(t)}, we have

The main loop in Algorithm 3.1 terminates when x(t)1>K\|{\mathbf{x}^{(t)}}\|_{1}>K or the number of iterations exceeds a preset threshold R=O(ε3log2n)R=O(\varepsilon^{-3}\log^{2}n). For a desired accuracy parameter ε>0\varepsilon>0, we set KK to O(1ε(1+lnn))O(\frac{1}{\varepsilon}(1+\ln n)) so that the lnn\ln n additive term from Theorem 2.1 can be absorbed into the relative error. This additive term comes from the starting point x(0)\mathbf{x}^{(0)}.

The choice of α\alpha may seem mysterious at this point; it is chosen to prevent us from taking a step that is too big from the current solution while still making substantial progress. It ensures that

The update has small width, aka. iδi(t)AiεI\sum_{i}\delta_{i}^{(t)}\mathbf{A}_{i}\preccurlyeq\varepsilon\mathbf{I}, and

We cannot overshoot by much when exiting from the while loop, 1δ(t)ε\mathbf{1}^{\top}\delta^{(t)}\leq\varepsilon.

Our analysis of decisionPSDP and in turn our proof of Theorem 3.1 hinge on two complementary processess: show that after RR steps, Y\overline{\mathbf{Y}} is indeed a feasible primal solution, or that if the the algorithm terminates because x(t)1>K\|{\mathbf{x}^{(t)}}\|_{1}>K, then x^\widehat{\mathbf{x}} is a feasible dual solution. In the latter case, we have

For this x^\widehat{\mathbf{x}} to be a dual solution, we still need to show that it satisfies ix^iAiI\sum_{i}\widehat{\mathbf{x}}_{i}\mathbf{A}_{i}\preccurlyeq\mathbf{I}.

Let TT be the final iteration count (i.e., the final tt). To meet the requirement above, we only need to show that 1(1+10ε)KΨ(T)I\frac{1}{(1+10\varepsilon)K}\mathbf{\Psi}^{(T)}\preccurlyeq\mathbf{I}. We prove the following spectrum bound:

We prove this lemma by induction on the iteration number, resorting to properties of the MMW algorithm (Theorem 2.1), which relates the final spectral values to the “gain” derived at each intermediate step.

Ψ(0)I\mathbf{\Psi}^{(0)}\preccurlyeq\mathbf{I}. In other words, λmax(Ψ(0))=λmax(i=1nxi(0)Ai)1\lambda_{\max}\left(\mathbf{\Psi}^{(0)}\right)=\lambda_{\max}\left(\sum_{i=1}^{n}x_{i}^{(0)}\mathbf{A}_{i}\right)\leq 1.

Now for any iteration tTt\leq T, we can rewrite Ψ(t)\mathbf{\Psi}^{(t)} as

since both sums yield positive semidefinite matrices and the λmax\lambda_{\max} of the first sum is at most 11 by Claim 3.3.

Proof of Lemma 3.2: We will prove (3.5) by (strong) induction on tt. The base case of t=0t=0 is true by Claim 3.3. For a given tt, if we inductively assume that Ψ(τ)(1+10ε)KI\mathbf{\Psi}^{(\tau)}\preccurlyeq(1+10\varepsilon)K\mathbf{I} for all τ<t\tau<t, then for each 1τ<t1\leq\tau<t,

This makes Theorem 2.1 applicable, which gives

which allows us to conclude that Ψ(t)(1+10ε)KI\mathbf{\Psi}^{(t)}\preccurlyeq(1+10\varepsilon)K\mathbf{I}, as desired. \blacksquare

It remains to show the claims about the algorithm utilized in the above proof.

Our choice of x(0)\mathbf{x}^{(0)} guarantees that for all i=1,,ni=1,\dots,n,

Summing across i=1,,ni=1,\dots,n gives the desired bound.

Now every iB(t)i\in B^{(t)}, though B(t)B^{(t)} can be empty, has the property that AiP(t)(1+ε)\mathbf{A}_{i}\bullet\mathbf{P}^{(t)}\leq(1+\varepsilon), so

It remains to examine the case where the algorithm returns a primal solution: Equation (3.3) gives

Furthermore, this Y\overline{\mathbf{Y}} satisfies the primal constraints:

If x(T)1K\|{\mathbf{x}^{(T)}}\|_{1}\leq K—i.e, the algorithm exits the while-loop because it reaches RR iterations—then for all i=1,,ni=1,\dots,n, AiY1\mathbf{A}_{i}\bullet\overline{\mathbf{Y}}\geq 1.

Assume for a contradiction that there is an i[n]i\in[n] such that AiY<1\mathbf{A}_{i}\bullet\overline{\mathbf{Y}}<1. This means

Let U={τ:P(τ)Ai<1+ε}U=\{\tau:\mathbf{P}^{(\tau)}\bullet\mathbf{A}_{i}<1+\varepsilon\} be the iterations in which the ii-th coordinate of x\mathbf{x} is incremented. By Markov’s inequality, the number of such iterations is bounded by U<ε1+εT|U|<\frac{\varepsilon}{1+\varepsilon}T. But then, every time the ii-th coordinate changes, it increases by a factor of 1+α1+\alpha, so

Because x(T)1K\|{\mathbf{x}^{(T)}}\|_{1}\leq K, the algorithm exits the while-loop with T=RT=R. Therefore, for 0<ε<10<\varepsilon<1,

as Tr[Ai]O(n3)\mathop{\mathsf{Tr}}\left[\mathbf{A}_{i}\right]\leq O(n^{3}) by Lemma 2.2. This is a contradiction to x(T)1K\|{\mathbf{x}^{(T)}}\|_{1}\leq K, which proves the lemma.

Proof of Theorem 3.1: The algorithm terminates after at most RR iterations. Notice that B(t)B^{(t)} may be empty in some iterations but this does not harm the algorithm nor the proof. It is standard to check that R=O(ε3log2n)R=O(\varepsilon^{-3}\log^{2}n). If we do run this many iterations, then Lemma 3.6 gives that we terminate with a primal solution.

Otherwise, Lemma 3.2 gives that at any point in the algorithm, the solution vector x(t)\mathbf{x}^{(t)} satisfies ixi(t)Ai(1+10ε)KI\sum_{i}x^{(t)}_{i}\mathbf{A}_{i}\preccurlyeq(1+10\varepsilon)K\mathbf{I}. Together with Equation (3.4), we know that any x^\widehat{\mathbf{x}} returned x1110ε\|{\mathbf{x}^{*}}\|_{1}\geq 1-10\varepsilon and

Thus, x^\widehat{\mathbf{x}} is indeed a dual solution with value at least 110ε1-10\varepsilon. Replacing ε\varepsilon with ε/10\varepsilon/10 then meets the requirements of the decision problem. \blacksquare

Matrix Exponential Evaluation

We describe a fast algorithm for computing the matrix dot product of a positive semidefinite matrix and the matrix exponential of another positive semidefinite matrix.

There is an algorithm bigDotExp that when given a mm-by-mm matrix Φ\mathbf{\Phi} with pp non-zero entries, κmax{1,Φ2}\kappa\geq\max\{1,\|{\mathbf{\Phi}}\|_{2}\}, and mm-by-mm matrices Ai\mathbf{A}_{i} in factorized form Ai=QiQi\mathbf{A}_{i}=\mathbf{Q}_{i}\mathbf{Q}_{i}^{\top} where the total number of nonzeros across all Qi\mathbf{Q}_{i} is qq; \mboxbigDotExp(Φ,{Ai=QiQi}i=1n)\mbox{{bigDotExp}}(\mathbf{\Phi},\{\mathbf{A}_{i}=\mathbf{Q}_{i}\mathbf{Q}_{i}^{\top}\}_{i=1}^{n}) computes (1±ε)(1\pm\varepsilon) approximations to all exp(Φ)Ai\exp{(\mathbf{\Phi})}\bullet\mathbf{A}_{i} in O(κlogmlog(\nicefrac1ε))O(\kappa\log{m}\log(\nicefrac{{1}}{{\varepsilon}})) depth and O(1ε2(κlog(\nicefrac1ϵ)p+q)logm)O(\frac{1}{\varepsilon^{2}}(\kappa\log(\nicefrac{{1}}{{\epsilon}})p+q)\log{m}) work.

The idea behind Theorem 4.1 is to approximate the matrix exponential using a low-degree polynomial because evaluating matrix exponentials exactly is costly. For this, we will apply the following lemma, reproduced from Lemma 6 in :

If B\mathbf{B} is a PSD matrix such that B2κ\|{\mathbf{B}}\|_{2}\leq\kappa, then the operator

The given factorization of each Ai\mathbf{A}_{i} allows us to write exp(Φ)Ai\exp{(\mathbf{\Phi})}\bullet\mathbf{A}_{i} as the 22-norm of a vector:

By Lemma 4.2, it suffices to evaluate B^Ai\widehat{\mathbf{B}}\bullet\mathbf{A}_{i} where B^\widehat{\mathbf{B}} is an approximation to B=exp(12Φ)\mathbf{B}=\exp(\tfrac{1}{2}\mathbf{\Phi}). To further reduce the work, we can apply the Johnson-Lindenstrauss transformation to reduce the length of the vectors to O(logm)O(\log{m}); specifically, we find a O(1ε2logm)×mO(\frac{1}{\varepsilon^{2}}\log{m})\times m Gaussian matrix Π\mathbf{\Pi} and evaluate

Since Π\mathbf{\Pi} only has O(1ε2logm)O(\frac{1}{\varepsilon^{2}}\log{m}) rows, we can compute ΠB^\mathbf{\Pi}\widehat{\mathbf{B}} using O(logm)O(\log{m}) evaluations of B^\widehat{\mathbf{B}}. The work/depth bounds follow from doing each of the evaluations of B^Πi\widehat{\mathbf{B}}\Pi_{i}, where Πi\Pi_{i} denotes the ii-th column of Π\mathbf{\Pi}, and matrix-vector multiplies involving Φ\mathbf{\Phi} in parallel.

Conclusion

We presented a simple NC parallel algorithm for packing SDPs that requires O(1ε4log4nlog(1ε))O(\frac{1}{\varepsilon^{4}}\log^{4}n\log(\frac{1}{\varepsilon})) iterations, where each iteration involves only simple matrix operations and computing the trace of the product of a matrix exponential and a positive semidefinite matrix. When a positive SDP is given in a factorized form, we showed how the dot product with matrix exponential can be implemented in nearly-linear work, leading to an algorithm with O~(m+n+q)\widetilde{O}(m+n+q) work, where nn is the number of constraint matrices, mm is the dimension of these matrices, and qq is the total number of nonzero entries in the factorization.

Compared to the situation with positive LPs, the classification of positive SDPs is much richer because packing/covering constraints can take many forms, either as matrices (e.g. i=1nxiAiI\sum_{i=1}^{n}x_{i}\mathbf{A}_{i}\preccurlyeq\mathbf{I} for packing, i=1nxiAiI\sum_{i=1}^{n}x_{i}\mathbf{A}_{i}\succcurlyeq\mathbf{I} for covering) or as dot products between matrices (e.g. AiY1\mathbf{A}_{i}\bullet\mathbf{Y}\leq 1 for packing, AiY1\mathbf{A}_{i}\bullet\mathbf{Y}\geq 1 for covering). The positive SDPs studied in and our paper should be compared with the closely related notion of covering SDPs studied by Iyengar et al ; however, among the applications they examine, only the beamforming SDP relaxation discussed in Section 2.2 of falls completely within the framework of packing SDPs as defined in 2.10. Problems such as MaxCut and SparsestCut require additional matrix-based packing constraints. We believe extending these algorithms to solve mixed packing/covering SDPs is an interesting direction for future work.

Acknowledgments

This work is partially supported by the National Science Foundation under grant numbers CCF-1018463, CCF-1018188, and CCF-1016799 and by generous gifts from IBM, Intel, and Microsoft. Richard Peng was partly supported by a Microsoft Research PhD. Fellowship.

We thank the SPAA reviewers, as well as Guy Blelloch and Gary Miller for suggestions that helped improve this paper. While making this revision, we benefitted greatly from discussions with Jon Kelner and Di Wang.

References

Appendix A Normalized Positive SDPs

This is the same transformation that Jain and Yao presented ; we only present it here for easy reference.

Consider the primal program in (1.4). It suffices to show that it can be transformed into the following program without changing the optimal value:

We can make the following assumptions without loss of generality: First, bi>0b_{i}>0 for all i=1,,mi=1,\dots,m because if bib_{i} were , we could have thrown it away. Second, all Ai\mathbf{A}_{i}’s are the support of C\mathbf{C}, or otherwise we know that the corresponding dual variable must be set to and therefore can be removed right away. Therefore, we will treat CC as having a full-rank, allowing us to define

It is not hard to verify that the normalized program (A.4) has the same optimal value as the original SDP (1.4).

Note that if we’re given factorization of Ai\mathbf{A}_{i} into QiQi\mathbf{Q}_{i}\mathbf{Q}_{i}^{\top}, then Bi\mathbf{B}_{i} can also be factorized as:

Furthermore, it can be checked that the dual of the normalized program is the same as the dual in Equation 2.10.