A Parallel Approximation Algorithm for Positive Semidefinite Programming

Rahul Jain, Penghui Yao

Introduction

In this work we consider the class of positive semidefinite programs. A positive semidefinite program can be expressed in the following standard form (we use symbols $\geq,\leq$ to also represent Löwner order).

Our algorithm is inspired by the algorithm of Luby and Nisan for solving positive linear programs. Positive linear programs can be viewed as a special case of positive semidefinite programs in which the matrices in the description of the program all commute pairwise. Our algorithm (like that of Luby and Nisan) is based on the 'multiplicative weights update' (MWU) method. This is a powerful technique for 'experts learning' with origins in several fields, including learning theory, game theory, and optimization. The algorithms used in are based on its matrix variant, the 'matrix multiplicative weights update' method.

The algorithm of Luby and Nisan proceeds in phases. In each phase, the large eigenvalues of $\sum_{i=1}^{m}y^{t}_{i}A_{i}$ (where the $y^{t}_{i}$'s are the candidate dual variables at time $t$) are to be brought below a threshold determined for that phase. The primal variable at time step $t$ is chosen to be the projection onto the eigenspace of $\sum_{i=1}^{m}y^{t}_{i}A_{i}$ spanned by the eigenvectors with large eigenvalues (above the threshold). Using the sum of the primal variables generated so far, the dual variables are updated by the MWU method. A suitable scaling parameter $\lambda_{t}$ is chosen during this update: small enough that the properties needed in the MWU analysis are preserved, yet large enough that reasonable progress is made in bringing down the large eigenvalues.
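A minimal NumPy sketch of one primal/dual step as just described may help fix ideas. It assumes that the dual variables are $y_{i}=\exp(-\operatorname{Tr}A_{i}X)$ for the current primal sum $X$, i.e. $Y=\exp(-\Phi(X))$ with $\Phi(X)$ taken to be the diagonal matrix of the values $\operatorname{Tr}A_{i}X$. The helper names `projector_above` and `mwu_step` are hypothetical, and the threshold and $\lambda_{t}$ are simply passed in rather than chosen as in the Algorithm.

```python
import numpy as np


def projector_above(M, thr):
    """Projector onto the span of eigenvectors of Hermitian M with eigenvalue >= thr."""
    w, V = np.linalg.eigh(M)
    U = V[:, w >= thr]
    return U @ U.conj().T


def mwu_step(As, X, lam, thr):
    """One simplified primal/dual step (hypothetical helper, not the Algorithm's Step 3).

    Assumes Phi(X) = diag(Tr A_1 X, ..., Tr A_m X), so that Y = exp(-Phi(X))
    has diagonal entries y_i = exp(-Tr(A_i X)). The threshold `thr` and the
    scaling `lam` are taken as given; choosing them is the hard part.
    """
    y = np.array([np.exp(-np.trace(A @ X).real) for A in As])  # dual weights
    M = sum(yi * A for yi, A in zip(y, As))                    # sum_i y_i A_i
    Pi = projector_above(M, thr)                               # large-eigenvalue projector
    return X + lam * Pi, y, Pi                                 # add scaled projector to primal sum
```

For example, with $A_{1}=\mathrm{diag}(1,0)$, $A_{2}=\mathrm{diag}(1,\tfrac{1}{2})$, $X=0$ and threshold $1$, the matrix $\sum_{i}y_{i}A_{i}$ is $\mathrm{diag}(2,\tfrac{1}{2})$, so the step selects the projector onto the first coordinate.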

Due to the non-commutative nature of the matrices involved in our case, our algorithm deviates from that of Luby and Nisan primarily in how the threshold is determined inside each phase. Roughly, the problem is as follows. Since the $A_{i}$'s may not commute, when the $y^{t}_{i}$'s are scaled down, the sum of the large eigenvalues of $\sum_{i=1}^{m}y^{t}_{i}A_{i}$ may not decrease; the scaling may merely move the eigenspace of large eigenvalues. Therefore a suitable extra condition must be ensured when choosing the threshold. For the same reason, our analysis also deviates from that of Luby and Nisan primarily in bounding the number of time steps required in any phase, and is significantly more involved. The analysis requires us to study the relationship between the eigenspaces of large eigenvalues before and after scaling (say $W_{1}$ and $W_{2}$). For this purpose, we decompose the underlying space into one- and two-dimensional subspaces that are invariant under both $\Pi_{1}$ and $\Pi_{2}$ (the projections onto $W_{1}$ and $W_{2}$, respectively), which helps the analysis significantly. Such decompositions have also been useful in earlier work, for example in quantum walks and quantum complexity theory.

We present the algorithm in the next section and its analysis (both optimality and running time) in the subsequent section. Due to space constraints, some proofs are deferred to the Appendix.

Algorithm

Given the positive semidefinite program $(P,D)$ as above, we first show in Appendix A that without loss of generality $(P,D)$ can be assumed to be in the following special form.

Analysis

Throughout this section, let ${\varepsilon}_{1}=\frac{3{\varepsilon}}{\ln n}$. We assume that $n$ is sufficiently large and ${\varepsilon}$ is sufficiently small.

In this section we present the analysis assuming that all operations performed by the algorithm are exact. We claim, without going into further details, that a similar analysis can be carried out while taking into account the accuracy loss due to the actual operations of the algorithm within the limited running time.

For all $t\leq t_{f}$, $\lambda_{t}$ satisfies conditions 1 and 2 of Step 3(d) of the Algorithm.

Follows since $\frac{1}{m^{1/{\varepsilon}}}\geq\operatorname{Tr}Y_{t_{f}}=\operatorname{Tr}\exp(-\Phi(X_{t_{f}}))>\exp(-\alpha)\enspace.$ ∎

The following lemma shows that for any time $t$, $\|\Phi^{*}(Y_{t})\|$ is not much larger than $(1+{\varepsilon}_{0})^{\mathsf{thr}}$.

For all $t\leq t_{f}$ , $\|\Phi^{*}(Y_{t})\|\leq(1+{\varepsilon}_{0})^{\mathsf{thr}}(1+{\varepsilon}_{1}).$

Fix any $t\leq t_{f}$ . As $\operatorname{Tr}(\Phi^{*}(Y_{t}))\leq nN_{(1+{\varepsilon}_{0})^{k}}(\Phi^{*}(Y_{t}))$ , the loop at Step 3(c) runs at most $\frac{\ln n}{\ln(1+\frac{2{\varepsilon}}{5})}$ times. Hence
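As a quick numerical sanity check (the grid of values is an arbitrary choice), the iteration count $\frac{\ln n}{\ln(1+\frac{2{\varepsilon}}{5})}$ stays below $\frac{3\ln n}{{\varepsilon}}$, the quantity appearing in the bound $\mathsf{thr}\geq k-\frac{3\ln n}{{\varepsilon}}$ used later in this section. Analytically, $\ln(1+x)\geq\frac{x}{1+x}$ with $x=\frac{2{\varepsilon}}{5}$ gives $\ln(1+\frac{2{\varepsilon}}{5})\geq\frac{{\varepsilon}}{3}$ whenever ${\varepsilon}\leq\frac{1}{2}$. The helper names are hypothetical.

```python
import math


def loop_bound(n, eps):
    """ln n / ln(1 + 2*eps/5): the bound on the passes through Step 3(c)."""
    return math.log(n) / math.log(1 + 2 * eps / 5)


def crude_bound(n, eps):
    """The simpler bound 3 ln n / eps."""
    return 3 * math.log(n) / eps


# ln(1 + x) >= x / (1 + x) with x = 2*eps/5 gives loop_bound <= crude_bound for eps <= 1/2.
checks = [loop_bound(n, eps) <= crude_bound(n, eps)
          for n in (10, 10**3, 10**6)
          for eps in (0.5, 0.1, 0.01, 0.001)]
```

For small ${\varepsilon}$ the ratio of the two bounds tends to $\frac{5}{6}$, so the cruder bound loses only a constant factor.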

The following lemma shows that as $t$ increases, the trace of the dual variable decreases by at least an amount proportional to $\operatorname{Tr}\Pi_{t}$, the trace of the primal variable chosen at time $t$.

For all $t\leq t_{f}$ we have $\operatorname{Tr}Y_{t+1}\leq\operatorname{Tr}Y_{t}-\lambda_{t}\cdot(1-4\sqrt{{\varepsilon}})\cdot\left\lVert\mspace{1.0mu}\Phi^{*}(Y_{t})\mspace{1.0mu}\right\rVert\cdot(\operatorname{Tr}\Pi_{t})\enspace.$

The following lemma relates the trace of $X_{t_{f}}$ to the traces of $Y^{*}$ and $Y_{t_{f}}$.

$\operatorname{Tr}X_{t_{f}}\leq\frac{1}{(1-4\sqrt{{\varepsilon}})}\cdot(\operatorname{Tr}Y^{*})\cdot\ln(m/\operatorname{Tr}Y_{t_{f}})\enspace.$

We can now finally bound the trace of $X^{*}$ in terms of the trace of $Y^{*}$ .

$X^{*}$ and $Y^{*}$ are feasible for $P$ and $D$, respectively, and

It is easily verified that $X^{*}$ and $Y^{*}$ are feasible for $P$ and $D$ respectively. From Lemma 5 we have,

Since $Y_{t_{f}}=\exp(-\Phi(X_{t_{f}}))$ we have

Time complexity

Let us first introduce some notation. Let $A$ be a Hermitian matrix and $l$ be a real number. Let

$\Pi^{A}_{l}$ denote the projector onto the space spanned by the eigenvectors of $A$ with eigenvalues at least $l$ . Let $\Pi^{A}$ be shorthand for $\Pi^{A}_{1}$ .

$N_{l}(A)$ denote the sum of eigenvalues of $A$ at least $l$ . Thus $N_{l}(A)=\operatorname{Tr}\Pi^{A}_{l}A$ . Let $N(A)$ be shorthand for $N_{1}(A)$ .

$\lambda_{k}(A)$ denote the $k$-th largest eigenvalue of $A$.

$\lambda^{\downarrow}(A)\stackrel{{\scriptstyle\smash{\text{\tiny def}}}}{{=}}(\lambda_{1}(A),\cdots,\lambda_{n}(A))$ .

For any two vectors $u,v\in\mathbb{R}^{n}$ we say $u$ majorizes $v$, denoted $u\succeq v$, iff $\sum_{i=1}^{n}u_{i}=\sum_{i=1}^{n}v_{i}$ and for any $j\in[n]$ we have $\sum_{i=1}^{j}u_{i}\geq\sum_{i=1}^{j}v_{i}$.
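The notation above translates directly into a few NumPy helpers, which may be handy for experimenting with the lemmas below. This is a sketch with hypothetical names (`eig_desc`, `proj_at_least`, `N_l`); it relies on `numpy.linalg.eigh`/`eigvalsh` returning eigenvalues in ascending order.

```python
import numpy as np


def eig_desc(A):
    """lambda^down(A): eigenvalues of the Hermitian matrix A in non-increasing order."""
    return np.linalg.eigvalsh(A)[::-1]


def proj_at_least(A, l=1.0):
    """Pi^A_l: projector onto the eigenvectors of A with eigenvalue at least l."""
    w, V = np.linalg.eigh(A)
    U = V[:, w >= l]
    return U @ U.conj().T


def N_l(A, l=1.0):
    """N_l(A): sum of the eigenvalues of A that are at least l."""
    w = np.linalg.eigvalsh(A)
    return float(w[w >= l].sum())
```

The defaults `l=1.0` mirror the shorthands $\Pi^{A}=\Pi^{A}_{1}$ and $N(A)=N_{1}(A)$, and $N_{l}(A)=\operatorname{Tr}\Pi^{A}_{l}A$ can be checked directly.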

For $n\times n$ Hermitian matrices $A$ and $B$, $A\geq B$ implies $\lambda_{i}(A)\geq\lambda_{i}(B)$ for all $1\leq i\leq n$. Thus $N_{l}(A)\geq N_{l}(B)$ for any real number $l\geq 0$.
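The first statement is Weyl's monotonicity principle. The consequence for $N_{l}$ uses $l\geq 0$ (all thresholds used in this section are positive): every index counted in $N_{l}(B)$ is also counted in $N_{l}(A)$, and any extra terms are at least $l\geq 0$. A small numerical sketch, with random test matrices and hypothetical helper names, including a one-dimensional example showing that monotonicity of $N_{l}$ can fail for a negative threshold:

```python
import numpy as np


def N_l(A, l):
    """Sum of the eigenvalues of the Hermitian matrix A that are at least l."""
    w = np.linalg.eigvalsh(A)
    return float(w[w >= l].sum())


def loewner_pair(n, seed=0):
    """Random PSD B and A = B + (PSD matrix), so that A >= B in Loewner order."""
    rng = np.random.default_rng(seed)
    G, H = rng.standard_normal((2, n, n))
    B = G @ G.T
    return B + H @ H.T, B


A, B = loewner_pair(5)
lam_A = np.linalg.eigvalsh(A)[::-1]
lam_B = np.linalg.eigvalsh(B)[::-1]
weyl_ok = bool(np.all(lam_A >= lam_B - 1e-9))          # lambda_i(A) >= lambda_i(B)
N_ok = all(N_l(A, l) >= N_l(B, l) - 1e-9 for l in (0.0, 1.0, 5.0, 20.0))

# With a negative threshold the monotonicity of N_l can fail:
a, b = np.array([[-0.5]]), np.array([[-1.0]])          # a >= b
counterexample = (N_l(a, -0.7), N_l(b, -0.7))          # (-0.5, 0.0)
```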

Let $A$ be an $n\times n$ Hermitian matrix and $P_{1},\cdots,P_{r}$ be a family of mutually orthogonal projections summing to the identity. Then $\lambda^{\downarrow}(A)\succeq\lambda^{\downarrow}(\sum_{i}P_{i}AP_{i}).$
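This is the standard statement that pinching a Hermitian matrix makes its spectrum "less spread out". The sketch below checks it numerically for one random symmetric matrix. The projections are coordinate blocks (our choice); a general orthogonal family reduces to this case by a unitary change of basis, which does not change eigenvalues. The helper names `majorizes` and `pinch` are hypothetical.

```python
import numpy as np


def majorizes(u, v, tol=1e-9):
    """u majorizes v: equal totals and dominating partial sums of the sorted entries."""
    u = np.sort(np.asarray(u, dtype=float))[::-1]
    v = np.sort(np.asarray(v, dtype=float))[::-1]
    cu, cv = np.cumsum(u), np.cumsum(v)
    return bool(abs(cu[-1] - cv[-1]) <= tol and np.all(cu >= cv - tol))


def pinch(A, blocks):
    """sum_i P_i A P_i, where P_i is the coordinate projector onto the indices in blocks[i]."""
    n = A.shape[0]
    out = np.zeros_like(A)
    for idx in blocks:
        P = np.zeros((n, n))
        P[idx, idx] = 1.0                     # sets the diagonal entries listed in idx
        out = out + P @ A @ P
    return out


rng = np.random.default_rng(0)
G = rng.standard_normal((6, 6))
A = (G + G.T) / 2                             # random real symmetric matrix
blocks = [[0, 1], [2, 3, 4], [5]]             # mutually orthogonal, summing to the identity
pinched_ok = majorizes(np.linalg.eigvalsh(A), np.linalg.eigvalsh(pinch(A, blocks)))
```

If the blocks did not cover all coordinates, the traces would generally differ and the equal-total condition of majorization would fail.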

For any two projectors $\Pi$ and $\Delta$, there exists an orthogonal decomposition of the underlying vector space into one-dimensional and two-dimensional subspaces that are invariant under both $\Pi$ and $\Delta$. Moreover, inside each two-dimensional subspace, $\Pi$ and $\Delta$ are rank-one projectors.
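This fact is commonly known as Jordan's lemma. The two-dimensional pieces can be built explicitly: if $u$ is an eigenvector of $\Pi\Delta\Pi$ with eigenvalue $c\in(0,1)$, then $u$ lies in the range of $\Pi$, $\Pi\Delta u=cu$ and $\Delta(\Delta u)=\Delta u$, so $\mathrm{span}\{u,\Delta u\}$ is invariant under both projectors, and each of them has rank one there. The following NumPy sketch (random real projectors and hypothetical helper names; the one-dimensional pieces are not constructed) checks this numerically.

```python
import numpy as np


def random_projector(n, r, rng):
    """Orthogonal projector onto a random r-dimensional subspace of R^n."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
    return Q @ Q.T


def two_dim_blocks(Pi, Delta, tol=1e-9):
    """Orthonormal bases of the 2-D invariant subspaces span{u, Delta u}, where u is an
    eigenvector of Pi Delta Pi with eigenvalue strictly between 0 and 1.
    (The 1-D pieces, e.g. range(Pi) intersect range(Delta), are not constructed.)"""
    w, V = np.linalg.eigh(Pi @ Delta @ Pi)
    bases = []
    for c, u in zip(w, V.T):
        if tol < c < 1 - tol:
            Q, _ = np.linalg.qr(np.column_stack([u, Delta @ u]))
            bases.append(Q)
    return bases


rng = np.random.default_rng(0)
Pi, Delta = random_projector(6, 2, rng), random_projector(6, 3, rng)
bases = two_dim_blocks(Pi, Delta)
for Q in bases:
    S = Q @ Q.T                                           # projector onto the 2-D subspace
    assert np.allclose(S @ Pi, Pi @ S, atol=1e-8)         # invariant under Pi
    assert np.allclose(S @ Delta, Delta @ S, atol=1e-8)   # invariant under Delta
    assert np.linalg.matrix_rank(S @ Pi @ S, tol=1e-8) == 1
    assert np.linalg.matrix_rank(S @ Delta @ S, tol=1e-8) == 1
```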

Let $k_{f}$ be the final value of $k$ . Then $k_{s}-k_{f}=\mathcal{O}(\frac{\log m\log^{2}n}{{\varepsilon}^{3}})$ .

Hence $k_{f}\geq-\mathcal{O}(\frac{\log m}{{\varepsilon}{\varepsilon}_{0}})$ . Therefore $k_{s}-k_{f}=\mathcal{O}(\frac{\log m}{{\varepsilon}{\varepsilon}_{0}})=\mathcal{O}(\frac{\log m\log^{2}n}{{\varepsilon}^{3}})$ . ∎

For any fixed $k$ , the number of iterations of the algorithm is at most $\mathcal{O}(\frac{\log^{2}n}{{\varepsilon}_{1}^{9}{\varepsilon}})$ . Hence combined with Lemma 10, the total number of iterations of the algorithm is at most $\mathcal{O}(\frac{\log^{13}n\log m}{{\varepsilon}^{13}}).$
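Substituting ${\varepsilon}_{1}=\frac{3{\varepsilon}}{\ln n}$ makes the exponent bookkeeping in this statement explicit (treating $\log$ and $\ln$ as equal up to constant factors); the second factor is the bound on $k_{s}-k_{f}$:

```latex
\frac{\log^{2}n}{\varepsilon_{1}^{9}\,\varepsilon}
  = \frac{\log^{2}n\,\ln^{9}n}{3^{9}\,\varepsilon^{10}}
  = \mathcal{O}\!\left(\frac{\log^{11}n}{\varepsilon^{10}}\right),
\qquad
\mathcal{O}\!\left(\frac{\log^{11}n}{\varepsilon^{10}}\right)\cdot
\mathcal{O}\!\left(\frac{\log m\,\log^{2}n}{\varepsilon^{3}}\right)
  = \mathcal{O}\!\left(\frac{\log^{13}n\,\log m}{\varepsilon^{13}}\right).
```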

Fix $k$. Assume that, for this fixed $k$, the Algorithm has reached step $3(d)$ a total of $\frac{6\log^{2}n}{{\varepsilon}_{1}^{9}{\varepsilon}}$ times. As argued in the proof of Lemma 4, whenever the Algorithm reaches step $3(d)$, $\mathsf{thr}\geq k-\frac{3\ln n}{{\varepsilon}}$. Thus, by the pigeonhole principle, there exists a value $s$ between $k$ and $k-\frac{3\ln n}{{\varepsilon}}$ such that $\mathsf{thr}=s$ at least $\frac{2\log n}{{\varepsilon}_{1}^{9}}$ times.

From Lemma 4, the sum of the eigenvalues above $(1+{\varepsilon}_{0})^{s}$ is at most $n(1+{\varepsilon}_{1})(1+{\varepsilon}_{0})^{s}$ at the beginning of this phase. Whenever $\mathsf{thr}\neq s$ in this phase, using Fact 7, we conclude that the eigenvalues of $\Phi^{*}(Y_{t})$ above $(1+{\varepsilon}_{0})^{s}$ do not increase. Whenever $\mathsf{thr}=s$ in this phase, using Lemma 12, we conclude that the sum of the eigenvalues of $\Phi^{*}(Y_{t})$ above $(1+{\varepsilon}_{0})^{s}$ shrinks by a factor of $(1-{\varepsilon}_{1}^{9})$. This can be seen by letting $A$ in Lemma 12 be $\frac{1-\exp(-2\sqrt{{\varepsilon}})}{(1+{\varepsilon}_{0})^{s}}\cdot\Phi^{*}(P^{\geq}_{\lambda_{t}}Y_{t}P^{\geq}_{\lambda_{t}})$ and $B$ be $\frac{1}{(1+{\varepsilon}_{0})^{s}}\Phi^{*}(Y_{t})-A$. Condition $3(d)(1.)$ of the Algorithm then gives condition $(2)$ of Lemma 12. Condition $(1)$ of Lemma 12 can also be seen to hold (using Lemma 3), and condition $(4)$ of Lemma 12 is false due to condition $3(c)$ of the Algorithm. This implies that condition $(3)$ of Lemma 12 must also be false, which gives the desired conclusion.

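Therefore the eigenvalues of $\Phi^{*}(Y_{t})$ above $(1+{\varepsilon}_{0})^{s}$ (in particular, those above $(1+{\varepsilon}_{0})^{k}$) vanish before $\mathsf{thr}=s$ has occurred $\frac{2\log n}{{\varepsilon}_{1}^{9}}$ times: after that many reductions their sum would be at most $n(1+{\varepsilon}_{1})(1+{\varepsilon}_{0})^{s}(1-{\varepsilon}_{1}^{9})^{2\log n/{\varepsilon}_{1}^{9}}\leq n(1+{\varepsilon}_{1})(1+{\varepsilon}_{0})^{s}e^{-2\log n}\leq\frac{1+{\varepsilon}_{1}}{n}(1+{\varepsilon}_{0})^{s}<(1+{\varepsilon}_{0})^{s}$, whereas any nonzero sum of eigenvalues above $(1+{\varepsilon}_{0})^{s}$ is at least $(1+{\varepsilon}_{0})^{s}$. Hence $k$ must decrease before the Algorithm has reached step $3(d)$ a total of $\frac{6\log^{2}n}{{\varepsilon}_{1}^{9}{\varepsilon}}$ times. ∎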

The following is a key lemma. Informally, it states that for two positive semidefinite matrices $A,B$, if $A$ has significant weight in the eigenspace of $A+B$ with eigenvalues above $1$, and if the sum of the eigenvalues of $B$ above $1$ is nearly the same as that of $A+B$, then the sum of the eigenvalues of $A+B$ above a threshold slightly below $1$ is a constant fraction larger than the sum above $1$.

Let ${\varepsilon}^{\prime}=\frac{{\varepsilon}_{0}}{1+{\varepsilon}_{0}}$ . Let $A,B$ be two $n\times n$ positive semidefinite matrices satisfying

To prove this lemma, we first need a few auxiliary lemmas. By Fact 9, $\Pi^{B}$ and $\Pi^{A+B}$ decompose the underlying space $V$ as follows:

Above, for each $i\in[k]$, $V_{i}$ is either a one-dimensional or a two-dimensional subspace, invariant under both $\Pi^{B}$ and $\Pi^{A+B}$, and inside $V_{i}$ at least one of $\Pi^{B}$ and $\Pi^{A+B}$ is nonzero. $W$ is the subspace on which both $\Pi^{B}$ and $\Pi^{A+B}$ vanish. We identify the subspace $V_{i}$ with the projector onto it. For any matrix $M$, define $M_{i}$ to be $V_{i}MV_{i}$. Both projectors $\Pi^{B}$ and $\Pi^{A+B}$ then decompose into direct sums of one-dimensional projectors as follows.

For any $i\in[k]$, $\Pi^{B_{i}}=\Pi^{B}_{i}$ and $\Pi^{A+B}_{i}=\Pi^{A_{i}+B_{i}}$. That is, the eigenspace of $B_{i}$ with eigenvalues at least $1$ is exactly the restriction of $\Pi^{B}$ to $V_{i}$, and similarly for $A_{i}+B_{i}$.

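Let $\Pi^{B}_{i}=|v_{1}\rangle\langle v_{1}|$ and $V_{i}-\Pi^{B}_{i}=|v_{0}\rangle\langle v_{0}|$. Then

$$B_{i}=\langle v_{1}|B|v_{1}\rangle\,|v_{1}\rangle\langle v_{1}|+\langle v_{0}|B|v_{0}\rangle\,|v_{0}\rangle\langle v_{0}|$$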

is the spectral decomposition of $B_{i}$ . As $\Pi^{B}|v_{1}\rangle=\Pi^{B}_{i}|v_{1}\rangle=|v_{1}\rangle$ and $\Pi^{B}|v_{0}\rangle=\Pi^{B}_{i}|v_{0}\rangle=0$ , we have $\langle v_{1}|B|v_{1}\rangle\geq 1$ and $\langle v_{0}|B|v_{0}\rangle<1$ , and hence $\Pi^{B_{i}}=|v_{1}\rangle\langle v_{1}|.$ ∎
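The argument above only uses that $B$ commutes with its spectral projector $\Pi^{B}$ and that $V_{i}$ is invariant under $\Pi^{B}$, so $B_{i}$ is diagonal in the basis $\{v_{1},v_{0}\}$. The following NumPy sketch checks both identities of the lemma numerically. The test matrices (a $B$ with eigenvalues well separated from $1$, a small random perturbation $A$) and the construction of each $V_{i}$ as $\mathrm{span}\{u,\Pi^{A+B}u\}$ from eigenvectors $u$ of $\Pi^{B}\Pi^{A+B}\Pi^{B}$ are our own choices.

```python
import numpy as np


def proj_at_least(M, l=1.0):
    """Projector onto the eigenvectors of the symmetric matrix M with eigenvalue >= l."""
    w, V = np.linalg.eigh(M)
    U = V[:, w >= l]
    return U @ U.T


rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
B = Q @ np.diag([2.0, 1.5, 0.5, 0.3, 0.2, 0.1]) @ Q.T    # Pi^B has rank 2
G = rng.standard_normal((6, 6))
A = 0.005 * G @ G.T                                      # small PSD perturbation
PB, PAB = proj_at_least(B), proj_at_least(A + B)

# 2-D subspaces V_i = span{u, PAB u}, u an eigenvector of PB PAB PB with eigenvalue in (0, 1)
w, V = np.linalg.eigh(PB @ PAB @ PB)
subspaces = []
for c, u in zip(w, V.T):
    if 1e-9 < c < 1 - 1e-9:
        Qi, _ = np.linalg.qr(np.column_stack([u, PAB @ u]))
        subspaces.append(Qi @ Qi.T)                      # projector onto V_i

for Vi in subspaces:
    Bi, ABi = Vi @ B @ Vi, Vi @ (A + B) @ Vi             # B_i and A_i + B_i
    assert np.allclose(proj_at_least(Bi), Vi @ PB @ Vi, atol=1e-8)     # Pi^{B_i} = Pi^B_i
    assert np.allclose(proj_at_least(ABi), Vi @ PAB @ Vi, atol=1e-8)   # Pi^{A_i+B_i} = Pi^{A+B}_i
```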

We prove (6); (7) and (8) follow similarly.

From (1), for all $i\in[k]$ , $\operatorname{Tr}\Pi^{A_{i}+B_{i}}(A_{i}+B_{i})\leq 1+{\varepsilon}_{1}$ . Combined with (8), we have

From (10) (since for all $i\in[k],N(A_{i}+B_{i})\geq N(B_{i})$ ),

Note that for any $i\in I\cap J$ , $\dim V_{i}=2$ . Otherwise, either $\Pi^{A_{i}+B_{i}}=\Pi^{B_{i}}$ or $\Pi^{B_{i}}=0$ and neither of these can happen in $I\cap J$ (from definitions of $I$ and $J$ ).

The following lemma states that for each $i\in I\cap J$, the second eigenvalue of $A_{i}+B_{i}$ is close to $1$. Its proof involves direct calculations; due to space constraints, we defer it to Appendix B.

Let $P$ and $Q$ be $2\times 2$ positive semidefinite matrices satisfying

Then $\lambda_{2}(P+Q)>1-\frac{1}{9}{\varepsilon}_{1}^{3}.$

We can finally prove Lemma 12. By Fact 7, $\lambda^{\downarrow}(A+B)\succeq\lambda^{\downarrow}(\sum_{i}(A_{i}+B_{i}))$. Let $j_{1}=\max\{j:\lambda_{j}(A+B)\geq 1\}$, $j_{2}=\max\{j:\lambda_{j}(\sum_{i}(A_{i}+B_{i}))\geq 1\}$, and $j_{0}=j_{1}+\frac{99}{100}{\varepsilon}k$. Then

According to the decomposition in Fact 9, Lemma 14 and the remarks below it, $j_{1}=j_{2}=k$ and

The right-hand sides of both equations are equal by Lemma 14. Therefore,

Let $x=N_{1-{\varepsilon}^{\prime}}(A+B)-N(A+B)$ , then

Therefore, from the previous three inequalities,

Note that ${\varepsilon}_{1}^{3}\ll{\varepsilon}^{\prime}$; therefore, from Remark 2,

Penghui Yao would like to thank Attila Pereszlényi and Huangjun Zhu for helpful discussions.

References

Appendix A Transforming to special form

Let us consider an instance of a positive semidefinite program as follows.

We show how to transform the primal problem to the special form; a similar transformation can be applied to the dual problem. First observe that if $b_{i}=0$ for some $i$, the corresponding constraint in the primal problem is trivial and can be removed. Similarly, if for some $i$ the support of $A_{i}$ is not contained in the support of $C$, then $y_{i}$ must be $0$ and can be removed. Therefore we can assume w.l.o.g. that for all $i$, $b_{i}>0$ and the support of $A_{i}$ is contained in the support of $C$. Hence w.l.o.g. we can take the support of $C$ to be the whole space; in other words, $C$ is invertible. For all $i\in[m]$, define $A_{i}^{\prime}\stackrel{{\scriptstyle\smash{\text{\tiny def}}}}{{=}}\frac{C^{-1/2}A_{i}C^{-1/2}}{b_{i}}$. Consider the normalized Primal problem.
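The normalization can be sanity-checked numerically. The sketch below assumes that the primal constraints read $\operatorname{Tr}A_{i}X\geq b_{i}$ with objective $\operatorname{Tr}CX$, and uses the change of variables $X^{\prime}=C^{1/2}XC^{1/2}$ (our choice, consistent with the definition of $A_{i}^{\prime}$). By cyclicity of the trace, $\operatorname{Tr}A_{i}^{\prime}X^{\prime}=\operatorname{Tr}A_{i}X/b_{i}$ and $\operatorname{Tr}X^{\prime}=\operatorname{Tr}CX$. The helper names `psd_power` and `normalize` are hypothetical.

```python
import numpy as np


def psd_power(C, p):
    """C**p for a positive definite symmetric/Hermitian C, via its eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return (V * w**p) @ V.conj().T


def normalize(As, bs, C):
    """A_i' = C^{-1/2} A_i C^{-1/2} / b_i for every constraint i (needs C invertible, b_i > 0)."""
    C_inv_half = psd_power(C, -0.5)
    return [C_inv_half @ A @ C_inv_half / b for A, b in zip(As, bs)]


# Toy instance (assumption: primal constraints Tr(A_i X) >= b_i, objective Tr(C X)).
rng = np.random.default_rng(0)
n = 4
G = rng.standard_normal((n, n))
C = G @ G.T + n * np.eye(n)                    # positive definite, hence invertible
As = [H @ H.T for H in rng.standard_normal((3, n, n))]
bs = [0.5, 2.0, 3.0]
K = rng.standard_normal((n, n))
X = K @ K.T                                    # any PSD primal candidate
C_half = psd_power(C, 0.5)
Xp = C_half @ X @ C_half                       # assumed change of variables X' = C^{1/2} X C^{1/2}
Aps = normalize(As, bs, C)
```

Under this substitution a constraint $\operatorname{Tr}A_{i}X\geq b_{i}$ becomes $\operatorname{Tr}A_{i}^{\prime}X^{\prime}\geq 1$, and $X^{\prime}$ is positive semidefinite exactly when $X$ is.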

The next step in transforming the problem is to limit the range of the eigenvalues of the $A_{i}^{\prime}$'s. Let $\beta=\min_{i}{\|A_{i}^{\prime}\|}$.

Let $A_{i}^{\prime}=\sum_{j=1}^{n}a_{ij}^{\prime}|v_{ij}\rangle\langle v_{ij}|$ be the spectral decomposition of $A_{i}^{\prime}$ . Define for all $i\in[m]$ and $j\in[n]$ ,

Define $A_{i}^{{}^{\prime\prime}}=\sum_{j=1}^{n}a_{ij}^{{}^{\prime\prime}}|v_{ij}\rangle\langle v_{ij}|.$ Consider the transformed Primal problem $P^{{}^{\prime\prime}}$ .

Transformed Primal problem $P^{{}^{\prime\prime}}$

Any feasible solution to $P^{{}^{\prime\prime}}$ is also a feasible solution to $P^{\prime}$ .

Follows immediately from the fact that $A_{i}^{{}^{\prime\prime}}\leq A_{i}^{\prime}$ .

Fix $i\in[m]$ . Assume that there exists $j\in[n]$ such that $a_{ij}^{\prime}\geq\frac{\beta m}{{\varepsilon}}$ . Then, from Claim 18

Note that for all $i\in[m]$ , the ratio between the largest eigenvalue and the smallest nonzero eigenvalue of $A_{i}^{{}^{\prime\prime}}$ is at most $\frac{m^{2}}{{\varepsilon}^{2}}=\gamma$ .

Finally, we get the special form Primal problem $\hat{P}$ as follows. Let $t=\max_{i\in[m]}\|A^{{}^{\prime\prime}}_{i}\|$ and for all $i\in[m]$ define $\hat{A}_{i}\stackrel{{\scriptstyle\smash{\text{\tiny def}}}}{{=}}\frac{A_{i}^{{}^{\prime\prime}}}{t}$ . Consider,
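Since every $A_{i}^{{}^{\prime\prime}}$ is divided by the same positive scalar $t$, all eigenvalue ratios (in particular the bound $\gamma$ on the ratio between the largest and the smallest nonzero eigenvalue) are unchanged, while $\max_{i}\|\hat{A}_{i}\|=1$. A small NumPy sketch with toy diagonal matrices and hypothetical helper names:

```python
import numpy as np


def scale_to_unit_norm(As):
    """Divide every A_i by t = max_i ||A_i|| (spectral norm); returns the scaled list and t."""
    t = max(np.linalg.norm(A, 2) for A in As)
    return [A / t for A in As], t


def eig_ratio(A, rel_tol=1e-12):
    """Largest eigenvalue over smallest nonzero eigenvalue of a PSD matrix A."""
    w = np.linalg.eigvalsh(A)
    w = w[w > rel_tol * w.max()]               # drop (numerically) zero eigenvalues
    return w.max() / w.min()


As = [np.diag([4.0, 1.0, 0.0]), np.diag([2.0, 0.5, 0.0])]
hats, t = scale_to_unit_norm(As)
```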

Appendix B Deferred Proofs

is the eigenvector of $P_{2}+Q_{2}$ with eigenvalue $\lambda$ . Hence $\Pi^{P_{2}+Q_{2}}=|v\rangle\langle v|$ . Note that $\lambda>b+|r|\sin^{2}\theta$ , because $\lambda_{2}(P_{2}+Q_{2})=1+|r|+b-\lambda<1.$ Consider