A Parallel Approximation Algorithm for Positive Semidefinite Programming

Rahul Jain, Penghui Yao

Introduction

In this work we consider the class of positive semidefinite programs. A positive semidefinite program can be expressed in the following standard form (we use symbols ,\geq,\leq to also represent Löwner order).

Our algorithm is inspired by the algorithm used by Luby and Nisan to solve positive linear programs. Positive linear programs can be considered as a special case of positive semidefinite programs in which the matrices used in the description of the program are all pairwise commuting. Our algorithm (and the algorithm in ) is based on the ’multiplicative weights update’ (MWU) method. This is a powerful technique for ’experts learning’ and finds its origins in various fields including learning theory, game theory, and optimization. The algorithms used in are based on its matrix variant the ’matrix multiplicative weights update’ method. The algorithm of Luby and Nisan proceeds in phases, where in each phase the large eigenvalues of i=1myitAi\sum_{i=1}^{m}y^{t}_{i}A_{i} (yity^{t}_{i}s represent the candidate dual variables at time tt) are sought to be brought below a threshold determined for that phase. The primal variable at time step tt is chosen to be the projection onto the large eigenvalues (above the threshold) eigenspace of i=1myitAi\sum_{i=1}^{m}y^{t}_{i}A_{i}. Using the sum of the primal variables generated so far, the dual variables are updated using the MWU method. A suitable scaling parameter λt\lambda_{t} is chosen during this update, which is small enough so that the good properties needed in the analysis of MWU are preserved and at the same time is large enough so that there is reasonable progress in bringing down the large eigenvalues.

Due to the non-commutative nature of the matrices involved in our case, our algorithm primarily deviates from that of in how the threshold is determined inside each phase. The problem that is faced is roughly as follows. Since AiA_{i}’s could be non-commuting, when yity^{t}_{i}s are scaled down, the sum of the large eigenvalues of i=1myitAi\sum_{i=1}^{m}y^{t}_{i}A_{i} may not come down and this scaling may just move the large eigenvalues eigenspace. Therefore a suitable extra condition needs to be ensured while choosing the threshold. Due to this, our analysis also primarily deviates from in bounding the number of time steps required in any phase and is significantly more involved. The analysis requires us to study the relationship between the large eigenvalues eigenspaces before and after scaling (say W1W_{1} and W2W_{2}). For this purpose we consider the decomposition of the underlying space into one and two-dimensional subspaces which are invariant under the actions of both Π1\Pi_{1} and Π2\Pi_{2} (projections onto W1W_{1} and W2W_{2} respectively) and this helps the analysis significantly. Such decomposition has been quite useful in earlier works as well for example in quantum walk and quantum complexity theory .

We present the algorithm in the next section and its analysis, both optimality and the running time, in the subsequent section. Due to space constraints we move some proofs to the Appendix.

Algorithm

Given the positive semidefinite program (P,D)({P},{D}) as above, we first show in Appendix A that without loss of generality (P,D)(P,D) can be in the following special form.

Analysis

For all of this section, let ε1=3εlnn{\varepsilon}_{1}=\frac{3{\varepsilon}}{\ln n}. In the following we assume that nn is sufficiently large and ε{\varepsilon} is sufficiently small.

In this section we present the analysis assuming that all the operations performed by the algorithm are perfect. We claim, without going into further details, that similar analysis can be performed while taking into account the accuracy loss due to the actual operations of the algorithm in the limited running time.

For all ttft\leq t_{f}, λt\lambda_{t} satisfies the conditions 1.1. and 2.2. in Step (3d) in the Algorithm.

Follows since 1m1/εTrYtf=Trexp(Φ(Xtf))>exp(α).\frac{1}{m^{1/{\varepsilon}}}\geq\operatorname{Tr}Y_{t_{f}}=\operatorname{Tr}\exp(-\Phi(X_{t_{f}}))>\exp(-\alpha)\enspace.

Following lemma shows that for any time tt, Φ(Yt)\|\Phi^{*}(Y_{t})\| is not much larger than (1+ε0)thr(1+{\varepsilon}_{0})^{\mathsf{thr}}.

For all ttft\leq t_{f}, Φ(Yt)(1+ε0)thr(1+ε1).\|\Phi^{*}(Y_{t})\|\leq(1+{\varepsilon}_{0})^{\mathsf{thr}}(1+{\varepsilon}_{1}).

Fix any ttft\leq t_{f}. As Tr(Φ(Yt))nN(1+ε0)k(Φ(Yt))\operatorname{Tr}(\Phi^{*}(Y_{t}))\leq nN_{(1+{\varepsilon}_{0})^{k}}(\Phi^{*}(Y_{t})), the loop at Step 3(c) runs at most lnnln(1+2ε5)\frac{\ln n}{\ln(1+\frac{2{\varepsilon}}{5})} times. Hence

Following lemma shows that as tt increases, there is a reduction in the trace of the dual variable in terms of the trace of the primal variable.

For all ttft\leq t_{f} we have, TrYt+1TrYtλt(14ε)\mspace1.0muΦ(Yt)\mspace1.0mu(TrΠt).\operatorname{Tr}Y_{t+1}\leq\operatorname{Tr}Y_{t}-\lambda_{t}\cdot(1-4\sqrt{{\varepsilon}})\cdot\left\lVert\mspace{1.0mu}\Phi^{*}(Y_{t})\mspace{1.0mu}\right\rVert\cdot(\operatorname{Tr}\Pi_{t})\enspace.

Following lemma relates the trace of XtfX_{t_{f}} with the trace of YY^{*} and YtfY_{t_{f}}.

TrXtf1(14ε)(TrY)ln(m/TrYtf).\operatorname{Tr}X_{t_{f}}\leq\frac{1}{(1-4\sqrt{{\varepsilon}})}\cdot(\operatorname{Tr}Y^{*})\cdot\ln(m/\operatorname{Tr}Y_{t_{f}})\enspace.

We can now finally bound the trace of XX^{*} in terms of the trace of YY^{*}.

XX^{*} and YY^{*} are feasible for the PP and DD respectively and

It is easily verified that XX^{*} and YY^{*} are feasible for PP and DD respectively. From Lemma 5 we have,

Since Ytf=exp(Φ(Xtf))Y_{t_{f}}=\exp(-\Phi(X_{t_{f}})) we have

2 Time complexity

Let us first introduce some notation. Let AA be a Hermitian matrix and ll be a real number. Let

ΠlA\Pi^{A}_{l} denote the projector onto the space spanned by the eigenvectors of AA with eigenvalues at least ll. Let ΠA\Pi^{A} be shorthand for Π1A\Pi^{A}_{1}.

Nl(A)N_{l}(A) denote the sum of eigenvalues of AA at least ll. Thus Nl(A)=TrΠlAAN_{l}(A)=\operatorname{Tr}\Pi^{A}_{l}A. Let N(A)N(A) be shorthand for N1(A)N_{1}(A).

λk(A)\lambda_{k}(A) denote the k-th largest eigenvalue of AA.

λ(A)=def(λ1(A),,λn(A))\lambda^{\downarrow}(A)\stackrel{{\scriptstyle\smash{\text{\tiny def}}}}{{=}}(\lambda_{1}(A),\cdots,\lambda_{n}(A)).

for any two vectors u,vRnu,v\in\mathcal{R}^{n} we say uu majorizes vv, denoted uvu\succeq v, iff i=1kui=i=1kvi\sum_{i=1}^{k}u_{i}=\sum_{i=1}^{k}v_{i} and for any j[n]j\in[n] we have, i=1juii=1jvi\sum_{i=1}^{j}u_{i}\geq\sum_{i=1}^{j}v_{i}.

For n×nn\times n Hermitian matrices AA and BB, ABA\geq B implies λi(A)λi(B)\lambda_{i}(A)\geq\lambda_{i}(B) for all 1in1\leq i\leq n. Thus Nl(A)Nl(B)N_{l}(A)\geq N_{l}(B) for any real number ll.

Let AA be an n×nn\times n Hermitian matrix and P1,,PrP_{1},\cdots,P_{r} be a family of mutually orthogonal projections. Then λ(A)λ(iPiAPi).\lambda^{\downarrow}(A)\succeq\lambda^{\downarrow}(\sum_{i}P_{i}AP_{i}).

For any two projectors Π\Pi and Δ\Delta, there exits an orthogonal decomposition of the underlying vector space into one dimensional and two dimensional subspaces that are invariant under both Π\Pi and Δ\Delta. Moreover, inside each two-dimensional subspace, Π\Pi and Δ\Delta are rank-one projectors.

Let kfk_{f} be the final value of kk. Then kskf=O(logmlog2nε3)k_{s}-k_{f}=\mathcal{O}(\frac{\log m\log^{2}n}{{\varepsilon}^{3}}).

Hence kfO(logmεε0)k_{f}\geq-\mathcal{O}(\frac{\log m}{{\varepsilon}{\varepsilon}_{0}}). Therefore kskf=O(logmεε0)=O(logmlog2nε3)k_{s}-k_{f}=\mathcal{O}(\frac{\log m}{{\varepsilon}{\varepsilon}_{0}})=\mathcal{O}(\frac{\log m\log^{2}n}{{\varepsilon}^{3}}). ∎

For any fixed kk, the number of iterations of the algorithm is at most O(log2nε19ε)\mathcal{O}(\frac{\log^{2}n}{{\varepsilon}_{1}^{9}{\varepsilon}}). Hence combined with Lemma 10, the total number of iterations of the algorithm is at most O(log13nlogmε13).\mathcal{O}(\frac{\log^{13}n\log m}{{\varepsilon}^{13}}).

Fix kk. Assume that the Algorithm has reached step 3(d)3(d) for this fixed kk , 6log2nε19ε\frac{6\log^{2}n}{{\varepsilon}_{1}^{9}{\varepsilon}} times. As argued in the proof of Lemma 4, whenever Algorithm reaches step 3(d)3(d), thrk3lnnε\mathsf{thr}\geq k-\frac{3\ln n}{{\varepsilon}}. Thus there exists a value ss between kk and k3lnnεk-\frac{3\ln n}{{\varepsilon}} such that thr=s\mathsf{thr}=s at least 2lognε19\frac{2\log n}{{\varepsilon}_{1}^{9}} times.

From Lemma 4 we get that the sum of the eigenvalues above (1+ε0)s(1+{\varepsilon}_{0})^{s}, is at most n(1+ε1)(1+ε0)sn(1+{\varepsilon}_{1})(1+{\varepsilon}_{0})^{s} at the beginning of this phase. Whenever thrs\mathsf{thr}\neq s in this phase, using Fact 7, we conclude that the eigenvalues of Φ(Yt)\Phi^{*}(Y_{t}) above (1+ε0)s(1+{\varepsilon}_{0})^{s} do not increase. Whenever thr=s\mathsf{thr}=s in this phase, using Lemma 12, we conclude that the eigenvalues of Φ(Yt)\Phi^{*}(Y_{t}) above (1+ε0)s(1+{\varepsilon}_{0})^{s} reduce by a factor of (1ε19)(1-{\varepsilon}_{1}^{9}). This can be seen by letting AA in Lemma 12 to be 1exp(2ε)(1+ε0)sΦ(PλtYtPλt)\frac{1-\exp(-2\sqrt{{\varepsilon}})}{(1+{\varepsilon}_{0})^{s}}\cdot\Phi^{*}(P^{\geq}_{\lambda_{t}}Y_{t}P^{\geq}_{\lambda_{t}}) and BB to be 1(1+ε0)sΦ(Yt)A\frac{1}{(1+{\varepsilon}_{0})^{s}}\Phi^{*}(Y^{t})-A. Now condition 3(d)(1.)3(d)(1.) of the Algorithm gives condition (2)(2) of Lemma 12. Condition (1)(1) of Lemma 12 can also be seen to be satisfied (using Lemma 3) and condition (4)(4) of Lemma 12 is false due to condition 3(c)3(c) of the Algorithm. This implies condition (3)(3) of Lemma 12 must also be false which gives us the desired conclusion.

Therefore the eigenvalues of Φ(Yt)\Phi^{*}(Y_{t}) above (1+ε0)s(1+{\varepsilon}_{0})^{s} (in particular above (1+ε0)k(1+{\varepsilon}_{0})^{k}) will vanish before thr=s\mathsf{thr}=s, 2lognε19\frac{2\log n}{{\varepsilon}_{1}^{9}} times. Hence kk must decrease before the Algorithm has reached step 3(d)3(d), 6log2nε19ε\frac{6\log^{2}n}{{\varepsilon}_{1}^{9}{\varepsilon}} times. ∎

Following is a key lemma. It states that for two positive semidefinite matrices A,BA,B, if AA has good weight in the large (above 11) eigenvalues space of A+BA+B and if the sum of large (above 11) eigenvalues of BB is pretty much the same as for A+BA+B, then the sum of eigenvalues of A+BA+B, slightly below 11 should be a constant fraction larger than the sum above 11.

Let ε=ε01+ε0{\varepsilon}^{\prime}=\frac{{\varepsilon}_{0}}{1+{\varepsilon}_{0}}. Let A,BA,B be two n×nn\times n positive semidefinite matrices satisfying

In order to prove this Lemma we will need to first show a few other Lemmas. By Fact 9, ΠB\Pi^{B} and ΠA+B\Pi^{A+B} decompose the underlying space VV as follows,

Above for each i[k]i\in[k], ViV_{i} is either one-dimensional or two-dimensional subspace, invariant for both ΠB\Pi^{B} and ΠA+B\Pi^{A+B} and inside ViV_{i} at least one of ΠB\Pi^{B} and ΠA+B\Pi^{A+B} survives. WW is the subspace where both ΠB\Pi^{B} and ΠA+B\Pi^{A+B} vanish. We identify the subspace ViV_{i} and the projector onto itself. For any matrix MM, define MiM_{i} to be ViMViV_{i}MV_{i}. We can see that both the projectors ΠB\Pi^{B} and ΠA+B\Pi^{A+B} are decomposed into the direct sum of one-dimensional projectors as follows.

For any i[k]i\in[k], ΠBi=ΠiB\Pi^{B_{i}}=\Pi^{B}_{i} and ΠiA+B=ΠAi+Bi\Pi^{A+B}_{i}=\Pi^{A_{i}+B_{i}}. That is, the eigenspace of BiB_{i} with eigenvalues at least 11, is exactly the restriction of ΠB\Pi^{B} to ViV_{i} and similarly for Ai+BiA_{i}+B_{i}.

Let ΠiB=v1v1\Pi^{B}_{i}=|v_{1}\rangle\langle v_{1}| and ViΠiB=v0v0V_{i}-\Pi^{B}_{i}=|v_{0}\rangle\langle v_{0}|, then

is the spectral decomposition of BiB_{i}. As ΠBv1=ΠiBv1=v1\Pi^{B}|v_{1}\rangle=\Pi^{B}_{i}|v_{1}\rangle=|v_{1}\rangle and ΠBv0=ΠiBv0=0\Pi^{B}|v_{0}\rangle=\Pi^{B}_{i}|v_{0}\rangle=0, we have v1Bv11\langle v_{1}|B|v_{1}\rangle\geq 1 and v0Bv0<1\langle v_{0}|B|v_{0}\rangle<1, and hence ΠBi=v1v1.\Pi^{B_{i}}=|v_{1}\rangle\langle v_{1}|.

We prove (6) and (7) and (8) follow similarly.

From (1), for all i[k]i\in[k], TrΠAi+Bi(Ai+Bi)1+ε1\operatorname{Tr}\Pi^{A_{i}+B_{i}}(A_{i}+B_{i})\leq 1+{\varepsilon}_{1}. Combined with (8), we have

From (10) (since for all i[k],N(Ai+Bi)N(Bi)i\in[k],N(A_{i}+B_{i})\geq N(B_{i})),

Note that for any iIJi\in I\cap J, dimVi=2\dim V_{i}=2. Otherwise, either ΠAi+Bi=ΠBi\Pi^{A_{i}+B_{i}}=\Pi^{B_{i}} or ΠBi=0\Pi^{B_{i}}=0 and neither of these can happen in IJI\cap J (from definitions of II and JJ).

The following lemma states that for each iIJi\in I\cap J, the second eigenvalue of Ai+BiA_{i}+B_{i} is close to 11. Its proof involves some direct calculations and due to space constraint we move it to Appendix B.

Let PP and QQ be 2×22\times 2 positive semidefinite matrices satisfying

Then λ2(P+Q)>119ε13.\lambda_{2}(P+Q)>1-\frac{1}{9}{\varepsilon}_{1}^{3}.

We can finally prove Lemma 12. By Fact 7, λ(A+B)λ(iAi+Bi)\lambda^{\downarrow}(A+B)\succeq\lambda^{\downarrow}(\sum_{i}{A_{i}+B_{i}}). Let j1=max{j:λj(A+B)1}j_{1}=\max\{j:\lambda_{j}(A+B)\geq 1\}, j2=max{j:λj(i(Ai+Bi))1},j_{2}=\max\{j:\lambda_{j}(\sum_{i}(A_{i}+B_{i}))\geq 1\}, and j0=j1+99100εkj_{0}=j_{1}+\frac{99}{100}{\varepsilon}k. Then

According to the decomposition in Fact 9, Lemma 14 and the remarks below it, j1=j2=kj_{1}=j_{2}=k and

The RHS of both the equations are equal by Lemma 14. Therefore,

Let x=N1ε(A+B)N(A+B)x=N_{1-{\varepsilon}^{\prime}}(A+B)-N(A+B), then

Therefore from previous three inequalities,

Note that ε13ε{\varepsilon}_{1}^{3}\ll{\varepsilon}^{\prime}, therefore from Remark 2.,

Penghui Yao would like to thank Attila Pereszleˊ\acute{e}nyi and Huangjun Zhu for helpful discussions.

References

Appendix A Transforming to special form

Let us consider an instance of a positive semidefinite program as follows.

We show how to transform the primal problem to the special form and a similar transformation can be applied to dual problem. First observe that if for some ii, bi=0b_{i}=0, the corresponding constraint in primal problem is trivial and can be removed. Similarly if for some ii, the support of AiA_{i} is not contained in the support of CC, then yiy_{i} must be and can be removed. Therefore we can assume w.l.o.g. that for all i,bi>0i,b_{i}>0 and the support of AiA_{i} is contained in the support of CC. Hence w.l.o.g we can take the support of CC as the whole space, in other words, CC is invertible. For all i[m],i\in[m], define Ai=defC1/2AiC1/2biA_{i}^{\prime}\stackrel{{\scriptstyle\smash{\text{\tiny def}}}}{{=}}\frac{C^{-1/2}A_{i}C^{-1/2}}{b_{i}}. Consider the normalized Primal problem.

The next step to transforming the problem is to limit the range of eigenvalues of AiA_{i}^{\prime}s. Let β=miniAi\beta=\min_{i}{\|A_{i}^{\prime}\|}.

Let Ai=j=1naijvijvijA_{i}^{\prime}=\sum_{j=1}^{n}a_{ij}^{\prime}|v_{ij}\rangle\langle v_{ij}| be the spectral decomposition of AiA_{i}^{\prime}. Define for all i[m]i\in[m] and j[n]j\in[n],

Define Ai=j=1naijvijvij.A_{i}^{{}^{\prime\prime}}=\sum_{j=1}^{n}a_{ij}^{{}^{\prime\prime}}|v_{ij}\rangle\langle v_{ij}|. Consider the transformed Primal problem PP^{{}^{\prime\prime}}.

Transformed Primal problem PP^{{}^{\prime\prime}}

Any feasible solution to PP^{{}^{\prime\prime}} is also a feasible solution to PP^{\prime}.

Follows immediately from the fact that AiAiA_{i}^{{}^{\prime\prime}}\leq A_{i}^{\prime}.

Fix i[m]i\in[m]. Assume that there exists j[n]j\in[n] such that aijβmεa_{ij}^{\prime}\geq\frac{\beta m}{{\varepsilon}}. Then, from Claim 18

Note that for all i[m]i\in[m], the ratio between the largest eigenvalue and the smallest nonzero eigenvalue of AiA_{i}^{{}^{\prime\prime}} is at most m2ε2=γ\frac{m^{2}}{{\varepsilon}^{2}}=\gamma.

Finally, we get the special form Primal problem P^\hat{P} as follows. Let t=maxi[m]Ait=\max_{i\in[m]}\|A^{{}^{\prime\prime}}_{i}\| and for all i[m]i\in[m] define A^i=defAit\hat{A}_{i}\stackrel{{\scriptstyle\smash{\text{\tiny def}}}}{{=}}\frac{A_{i}^{{}^{\prime\prime}}}{t}. Consider,

Appendix B Deferred Proofs

is the eigenvector of P2+Q2P_{2}+Q_{2} with eigenvalue λ\lambda. Hence ΠP2+Q2=vv\Pi^{P_{2}+Q_{2}}=|v\rangle\langle v|. Note that λ>b+rsin2θ\lambda>b+|r|\sin^{2}\theta, because λ2(P2+Q2)=1+r+bλ<1.\lambda_{2}(P_{2}+Q_{2})=1+|r|+b-\lambda<1. Consider