Quantum state tomography via compressed sensing

David Gross, Yi-Kai Liu, Steven T. Flammia, Stephen Becker, Jens Eisert

Let ρ\rho be an arbitrary state of rank rr. If m=cdrlog2dm=cdr\log^{2}d randomly chosen Pauli expectations are known, then ρ\rho can be uniquely reconstructed by solving the convex optimization problem (1) with probability of failure exponentially small in cc.

The proof is inspired by, but technically very different from, earlier work on matrix completion Candes2008 . Our methods are more general, can be tuned to give tighter bounds, and are much more compact, allowing us to present a fairly complete argument in this Letter. A more detailed presentation of this technique – covering the reconstruction of low-rank matrices from few expansion coefficients w.r.t. general operator bases (not just Pauli matrices or matrix elements) – will be published elsewhere prep2 .

Proof: Here we sketch the argument and explain the main ideas; detailed calculations are in the EPAPS supplement.

We will ascertain this by using a basic idea from convex optimization: constructing a strict subgradient YY for the norm. A matrix YY is a strict subgradient if ρ+Δtr>ρtr+trYΔ\|\rho+\Delta\|_{\operatorname{tr}}>\|\rho\|_{\operatorname{tr}}+\operatorname{tr}Y\Delta for all Δ0\Delta\neq 0. The main contribution below is a method for constructing such a YY which is also in the range of R\mathcal{R}. For then RΔ=0\mathcal{R}\Delta=0 implies that Δ\Delta is orthogonal to the range of R\mathcal{R}, thus trYΔ=0\operatorname{tr}Y\Delta=0 and the subgradient condition reads ρ+Δtr>ρtr\|\rho+\Delta\|_{\operatorname{tr}}>\|\rho\|_{\operatorname{tr}}. This implies uniqueness. (In fact, it is sufficient to approximate the subgradient condition in a certain sense).

Let EE be the projection onto the range of ρ\rho, let TT be the space spanned by those operators whose row or column space is contained in rangeρ\operatorname{range}\rho. Let PT\mathcal{P}_{T} be the projection onto TT, PT\mathcal{P}_{T}^{\bot} onto the orthogonal complement. Decompose Δ=ΔT+ΔT\Delta=\Delta_{T}+\Delta_{T}^{\bot}, the parts of Δ\Delta that lie in the subspaces TT and TT^{\bot}. We distinguish two cases: (i) ΔT2>d2ΔT2\|\Delta_{T}\|_{2}>d^{2}\|\Delta_{T}^{\bot}\|_{2}, and (ii) ΔT2d2ΔT2\|\Delta_{T}\|_{2}\leq d^{2}\|\Delta_{T}^{\bot}\|_{2} note2 .

Case (i) is easier. In this case, Δ\Delta is well-approximated by ΔT\Delta_{T} and essentially we only have to show that the restriction A:=PTRPT\mathcal{A}:=\mathcal{P}_{T}\mathcal{R}\mathcal{P}_{T} of R\mathcal{R} to TT is invertible. Using a non-commutative large deviation bound (see EPAPS supplement),

Case (ii) is more involved. A matrix Yspan(w(A1),,w(Am))Y\in\operatorname{span}(w(A_{1}),\ldots,w(A_{m})) is an almost subgradient note3 if

First, suppose such a YY exists. Then a simple calculation (see EPAPS) using the condition (ii) shows that RΔ=0\mathcal{R}\Delta=0 indeed implies ρ+Δtr>ρtr\|\rho+\Delta\|_{\operatorname{tr}}>\|\rho\|_{\operatorname{tr}} as hinted at above. This proves uniqueness in case (ii). The difficult part consists in showing that an almost-subgradient exists.

To this end, we design a recursive process (the “golfing scheme” prep2 ) which converges to a subgradient exponentially fast. Assume we draw ll batches of κ0rd\kappa_{0}rd Pauli observables independently at random (κ0\kappa_{0} will be chosen later). Define recursively X0=EX_{0}=E,

so that Xi22iX0=2ir\|X_{i}\|_{2}\leq 2^{-i}\|X_{0}\|=2^{-i}\sqrt{r}. Hence, Y=YlY=Y_{l} fulfills the first part of (3), as soon as llog2(2d2r)l\geq\log_{2}(2d^{2}\sqrt{r}). We turn to the second part. Again using large-deviation techniques (EPAPS) we find PTRiXi11/(4r)Xi12\|\mathcal{P}_{T}^{\bot}\mathcal{R}_{i}X_{i-1}\|\leq 1/(4\sqrt{r})\|X_{i-1}\|_{2} with some (high) probability (1p3)(1-p_{3}). Therefore:

Lastly, we have to bound the total probability of failure pfp1+p2+p3p_{f}\leq p_{1}+p_{2}+p_{3}. Set κ0=64μ(1+ln(8dl))\kappa_{0}=64\mu(1+\ln(8dl)), which means that m=dr(lnd)2O(1)m=dr(\ln d)^{2}\,O(1) coefficients will be sampled in total. A simple calculation gives pfeμp_{f}\leq e^{-\mu}. This completes the proof of our main result. \Box

In the remaining space, we address the important aspects of resilience against noise, certified tomography, and numerical performance. Owing to space limitations, the presentation will focus on conceptual issues, with the details in prep1 .

Realistic situations will differ from the previous case in two regards. First, the true state ρt\rho_{t} may not be low-rank, but only well approximated by a state ρ\rho of rank rr: ρtρ2ε1\|\rho_{t}-\rho\|_{2}\leq\varepsilon_{1}. Second, due to systematic and statistical noise, the available estimates for the Pauli expectations are not exactly trρtw(a)\operatorname{tr}\rho_{t}w(a), but of the form trωw(a)\operatorname{tr}\omega w(a) for some matrix ω\omega. Assume RωRρt2ε2\|\mathcal{R}\omega-\mathcal{R}\rho_{t}\|_{2}\leq\varepsilon_{2} (in practical situations, ε2\varepsilon_{2} may be estimated from the error bars associated with the individual Pauli expectation values errormodel ). In order to get an estimate for ρt\rho_{t}, choose some λ1\lambda\geq 1 and ελ(d2/m)ε1+ε2\varepsilon\geq\lambda(\sqrt{d^{2}/m})\varepsilon_{1}+\varepsilon_{2}, and solve the convex program

Let ρt\rho_{t} be an approximately low-rank state as described above. Suppose m=cdrlog2dm=cdr\log^{2}d randomly chosen Pauli expectations are known up to an error of ε\varepsilon as in (6), and let σ\sigma^{\star} be the solution of (6). Then the difference σρttr\|\sigma^{\star}-\rho_{t}\|_{\operatorname{tr}} is smaller than O(εrd)O(\varepsilon\sqrt{rd}). This holds with probability of failure at most 1/λ21/\lambda^{2} plus the probability of failure in Theorem 1.

The proof combines ideas from Ref. Candes2009 with our argument above note4 . The main difference from the noise-free case is that, instead of using trYΔ=0\operatorname{tr}Y\Delta=0, we must now work with trYΔ2Y2δ|\operatorname{tr}Y\Delta|\leq 2\|Y\|_{2}\,\delta. With this estimate, Observation 1 follows from the noise-free proof, together with some elementary calculations (see EPAPS). We remark that the above bound is likely to be quite loose; based on related work involving the “restricted isometry property,” we conjecture that the robustness to noise is actually substantially stronger than what is shown here FCRP08 .

The preceding results require an a priori promise: that the true state ρt\rho_{t} is δ1\delta_{1}-close to a rank-rr state. However, when performing tomography of an unknown state, neither rr nor δ1\delta_{1} are known beforehand. There are a few solutions to this quandary. First, rr and δ1\delta_{1} may be estimated from other physical parameters of the system, such as the strength of the local noise localnoise .

Another approach is to estimate rr and δ1\delta_{1} from the same data that is used to reconstruct the state. When r=1r=1, this approach is particularly effective, in entirely assumption-free tomography: one can estimate δ1\delta_{1}, using only O(d)O(d) Pauli expectation values. This is because δ1\delta_{1} is related to the purity Trρ2\operatorname{Tr}\rho^{2}, which has a simple closed-form expression in terms of Pauli expectation values. See EPAPS for details. We get:

Assume that the unknown physical state is close to being pure. Then one can find a certificate for that assumption, and reconstruct the state with explicit guarantees on the reconstruction error, from O(cdlog2d)O(cd\log^{2}d) Pauli expectation values. The probability of failure is exponentially small in cc.

Finally, when the state is approximately low-rank but not nearly pure (r>1r>1), one may perform tomography using different numbers of random Pauli expectation values mm. When mm is larger than necessary (corresponding to an over-estimate of rr), we are guaranteed to find the correct density matrix. When mm is too small, we find empirically that the algorithms for reconstructing the density matrix (i.e., solving the convex program (1)) simply fail to converge.

Here we describe a variant of our tomography method that makes the classical post-processing step (i.e., solving the convex program (1) to reconstruct the density matrix) faster. This method also uses random Pauli measurements, but they are chosen in a structured way. Any Pauli matrix is of the form w(u,v)=k=1niukvk(σx)uk(σz)vkw(u,v)=\bigotimes_{k=1}^{n}i^{u_{k}v_{k}}(\sigma^{x})^{u_{k}}(\sigma^{z})^{v_{k}} for u,v{0,1}nu,v\in\{0,1\}^{n}. We choose a random subset S{0,1}nS\subset\{0,1\}^{n} of size O(rpolylog(d))O(r\operatorname{polylog}(d)), and then for all uSu\in S and v{0,1}nv\in\{0,1\}^{n}, measure the Pauli matrix w(u,v)w(u,v). We call this the “hybrid method” because it is equivalent to a certain structured matrix completion problem. This fact implies that certain key computations in solving the convex program (1) can be implemented in time O(d)O(d) rather than O(d2)O(d^{2}) Cai2008 . However, the hybrid method is not covered by the strong theoretical guarantees shown earlier, though it does give accurate results in practice. For a more complete discussion, see the EPAPS supplement.

We numerically simulated both the random Pauli and hybrid approaches discussed above. For both approaches, we used singular value thresholding (SVT) Cai2008 . Instead of directly solving Eq. (6), SVT minimizes  τσtr+σ22/2\ \tau\|\sigma\|_{\operatorname{tr}}+\|\sigma\|_{2}^{2}/2 subject to tr(σω)w(Ai)δ|\operatorname{tr}(\sigma-\omega)w(A_{i})|\leq\delta, which is a good proxy to Eq. (6) when τ\tau dominates the second term; the programs are equivalent in the limit τ\tau\rightarrow\infty (provided Eq. (6) has a unique solution) Cai2008 . Estimating the second term for typical states suggests choosing 2τr12\tau r\gg 1; we use τ=5\tau=5. To simulate tomography, we chose a random state from the Haar measure on a d×rd\times r dimensional system and traced out the rr-dimensional ancilla, then applied depolarizing noise of strength γ\gamma. We sampled expectation values associated with randomly chosen operators as above, and added additional statistical noise (respecting Hermiticity) which was i.i.d. Gaussian with variance σ2\sigma^{2} and mean zero. We used SVT and quantified the quality of the reconstruction by the fidelity and the trace distance for various values of mm, each averaged over 55 simulations. This dependence is shown in Fig. 1. The reconstruction is remarkably high fidelity, despite severe undersampling and corruption by both depolarizing and statistical noise tr_renorm . Using the hybrid method with 88 qubits on a rank 33 state plus γ=5%\gamma=5\% depolarizing, and statistical noise strength σd=0.1\sigma d=0.1, we typically achieve 95%95\% fidelity reconstructions in under 1010 seconds on a modest laptop with 22 GB of RAM and a 2.22.2 GHz dual-core processor using MATLAB — even though 90%90\% of the matrix elements remain unsampled. Increasing the number of samples only improves our accuracy and speed, so long as sparsity is maintained.

Using truly randomly chosen Pauli observables (instead of the hybrid method) slightly increases the processing time due to the dense matrix multiplications involved: in our setup about one minute. However, this method achieves even better performance with respect to errors, as seen in Fig. 1.

The simulations above show that our method work for generic low rank states. Lastly, we demonstrate the functioning of the approach in the experimental context of the state ρ\rho found in the 88 ion experiment of Ref. Haffner2005 . To exemplify the above results, we simulated physical measurements by sampling from the probability distribution computed using the Born rule applied to the reconstructed state ρ\rho. This state is approximately low-rank, with 99% of the weight concentrated on the first 1111 eigenvectors. The standard deviation per observable was 3/d3/d. Fewer than 30% of all Pauli matrices were chosen randomly. From this information, a rank =3=3 approximation σ\sigma with fidelity of 90.5%90.5\% with respect to ρ\rho was found in about 33 minutes on the aforementioned laptop.

We have presented new methods for low-rank quantum state tomography, which require only O(rdlog2(d))O(rd\,\log^{2}(d)) measurements, where rr is the rank of the unknown density matrix and dd is the Hilbert space dimension. Our methods are based on and further develop the new paradigm of compressed sensing, and in particular, matrix completion Candes2008 ; Candes2009a . We use measurements that are experimentally feasible, together with very fast classical post-processing. The methods perform well in practice, and are also supported by theoretical guarantees. It would be interesting to further flesh out the trade off between the need for measurements that can be performed easily in an experiment and the need for sparse matrices during the classical post-processing step. It is the hope that this work stimulates such further investigations.

Acknowledgments. We thank E. Candès and Y. Plan for useful discussions. Research at PI is supported by the Government of Canada through Industry Canada and by the Province of Ontario through the Ministry of Research & Innovation. YL is supported by an NSF Mathematical Sciences Postdoctoral Fellowship, JE by the EU (QAP, QESSENCE, MINOS, COMPAS) and the EURYI, DG by the EU (CORNER). We thank the anonymous referees for many helpful suggestions.

References

I Appendix

While this publication contains a complete proof of all the claims relevant for quantum tomography, the reader is invited to consult the more general and explicit presentation in Ref. prep2 (and soon prep1 ). Below, we provide those details of the proof of Theorem 1, which were left out in the main text.

We introduce some more formal notations used in the argument. Denote the trace inner product between two Hermitian operators ρ,σ\rho,\sigma by \mbox{\big{(}\rho,\sigma\big{)}}:=\operatorname{tr}\rho\sigma. We assume that w(A1),,w(Am)w(A_{1}),\dots,w(A_{m}) are independent, identically distributed matrix-valued random variables, with w(Ai)w(A_{i}) drawn from the d2d^{2} Pauli matrices with uniform probability. Thus, we model the selection of the observables as a process of sampling with replacement. It is both very plausible and easily provable nesme that drawing the observables without replacement can only yield better results.

An essential tool for the proof is a non-commutative large-deviation bound from ahlswede . Let S=imXiS=\sum_{i}^{m}X_{i} be a sum of i.i.d. matrix-valued random variables (r.v.’s) XiX_{i}. Then it is shown in ahlswede that for every λ,t>0\lambda,t>0 we have

I.1.2 “Case (i)”: large-deviation bound

The first application of (9) is to verify Eq. (2) from the main text, which claims that

To this end, let YiY_{i} be the super-operator defined by

which implies σ22drm2\sigma^{2}\leq\frac{2dr}{m^{2}}. The claimed Eq. (10) directly follows by plugging this estimate of σ2\sigma^{2} into the non-commutative large-deviation bound (9).

I.1.3 “Case (ii)”: the approximate subgradient

Next, consider the claim after Eq. (3) of the main text. There, we assumed that YY was a matrix in span(w(A1),,w(Am))\operatorname{span}(w(A_{1}),\ldots,w(A_{m})) such that

Recall the scalar sign function sign\operatorname{sign} which maps positive numbers to +1+1, to and negative numbers to 1-1. If σ\sigma is any Hermitian matrix, then signσ\operatorname{sign}\sigma is the matrix resulting from applying the sign\operatorname{sign}-function to the eigenvalues of σ\sigma. Note that

for any two Hermitian σ1,σ2\sigma_{1},\sigma_{2}.

Letting F=signΔTF=\operatorname{sign}\Delta_{T}^{\perp} we compute:

(Use the “pinching inequality” bhatia in the first step; (12), (13) in the second. The third step is (12) and using that RΔ=0\mathcal{R}\Delta=0 and YrangeYY\in\operatorname{range}Y implies (Y,Δ)=0(Y,\Delta)=0. The last estimate uses (11) and, once more, (13)).

I.1.4 “Case (ii)”: large deviation bound

The deviation bound before Eq. (5) of the main text follows again from (9). Let FF be an arbitrary matrix in TT. With Xi=dmPT(w(Ai))trw(Ai)FX_{i}=\frac{d}{m}\mathcal{P}^{\bot}_{T}(w(A_{i}))\operatorname{tr}w(A_{i})F:

having used that PTwa1\|\mathcal{P}_{T}^{\bot}w_{a}\|\leq 1 and that the {d1/2wa}\{d^{-1/2}w_{a}\} form an orthonormal basis. Thus

In the proof, we use (16) for t=1/(4r)t=1/(4\sqrt{r}). Hence the probability of failure becomes

I.2 Details for Observation 1

In this subsection we need to assume that the Paulis are sampled without replacement. All previous bounds continue to hold — see remark above. Let

be the projection operator onto rangeR\operatorname{range}\mathcal{R}, normalized so that Q=1\lVert\mathcal{Q}\rVert=1. Define γ=md2\gamma=\frac{m}{d^{2}}, and note that Q=γR\mathcal{Q}=\gamma\mathcal{R}. The optimization program (6) of the main text becomes minσtr\min\|\sigma\|_{\operatorname{tr}}, s.t. QσQω2γε\|\mathcal{Q}\sigma-\mathcal{Q}\omega\|_{2}\leq\gamma\varepsilon.

Let Δ=σρ\Delta=\sigma-\rho. We upper-bound QΔ2\lVert\mathcal{Q}{\Delta}\rVert_{2} as follows. First,

(where we defined δ=γε\delta=\gamma\varepsilon).

Combining the above two inequalities and rearranging, we get

We now show that ρ+Δtr<ρtr\|\rho+\Delta\|_{\operatorname{tr}}<\|\rho\|_{\operatorname{tr}} implies that Δ\Delta must be small. With the estimate (17) at our disposal, we re-visit (I.1.3):

We use a crude bound Y2=PT(Y)2+PT(Y)2PT(Y)E2+E2+PT(Y)d12d2+r+12d\lVert Y\rVert_{2}=\lVert\mathcal{P}_{T}(Y)\rVert_{2}+\lVert\mathcal{P}_{T}^{\perp}(Y)\rVert_{2}\leq\lVert\mathcal{P}_{T}(Y)-E\rVert_{2}+\lVert E\rVert_{2}+\lVert\mathcal{P}_{T}^{\perp}(Y)\rVert\sqrt{d}\leq\frac{1}{2d^{2}}+\sqrt{r}+\frac{1}{2}\sqrt{d}. Then, for reasonable values of the parameters (say d16d\geq 16, m16m\geq 16, rd/10r\leq d/10), we have

So ρ+Δtr<ρtr\|\rho+\Delta\|_{\operatorname{tr}}<\|\rho\|_{\operatorname{tr}} implies

Finally, write Δtr2rΔT2+ΔTtr\lVert\Delta\rVert_{\operatorname{tr}}\leq\sqrt{2r}\lVert\Delta_{T}\rVert_{2}+\lVert\Delta_{T}^{\perp}\rVert_{\operatorname{tr}}, and use (17) and (18). After simplifying, substituting in δ=γε\delta=\gamma\varepsilon, and setting κ=m/(rd)\kappa=m/(rd), one obtains

Finally, we write σρttrσρtr+ρρttr\lVert\sigma-\rho_{t}\rVert_{\operatorname{tr}}\leq\lVert\sigma-\rho\rVert_{\operatorname{tr}}+\lVert\rho-\rho_{t}\rVert_{\operatorname{tr}}. The first term is bounded by O(εrd)O(\varepsilon\sqrt{rd}) as shown above; the second term is ρρt2dε1dεd\leq\lVert\rho-\rho_{t}\rVert_{2}\sqrt{d}\leq\varepsilon_{1}\sqrt{d}\leq\varepsilon\sqrt{d}. This gives the desired result.

I.3 Certified tomography for almost-pure states

For almost-pure states (r=1r=1), it is possible to obtain estimates for δ1\delta_{1} from only O(d)O(d) Pauli expectation values without any assumptions. In this subsection, we sketch a simple scheme based on this observation: it outputs a reconstructed density matrix σ\sigma, together with a certified bound on the deviation σρttr\|\sigma-\rho_{t}\|_{\operatorname{tr}}. The algorithm takes two inputs: O(dlog2d)O(d\,\log^{2}d) random Pauli expectation values, and the experimentalist’s estimate of the measurement precision δ2\delta_{2} errormodel .

Concretely, we set r=1r=1 and aim to put a bound on δ1=ρtψψ2\delta_{1}=\|\rho_{t}-|\psi\rangle\langle\psi|\|_{2}, where ψ|\psi\rangle is the eigenvector of ρt\rho_{t} corresponding to the largest eigenvalue. Such a bound can be obtained in terms of the purity trρt2=ρt22\operatorname{tr}\rho_{t}^{2}=\|\rho_{t}\|_{2}^{2}. E.g.,

Choose m=4μd/t2m=4\mu d/t^{2} for some μ>1\mu>1 to ensure that

Combining the previous equation with (20), we have arrived at a certified estimate for δ1\delta_{1}.

I.4 A hybrid approach to matrix recovery

Matrix recovery using Pauli measurements does lack one desirable feature: the classical post-processing (solving the convex programs) is more costly, compared to matrix completion Candes2008 ; Candes2009a . This is due to the role of sparse linear algebra in the SVT (singular value thresholding) algorithm Cai2008 . The basic issue is that SVT must handle matrices of the form Rρ\mathcal{R}\rho. For matrix completion, Rρ\mathcal{R}\rho is sparse, so basic operations such as matrix-vector multiplication take time O(d)O(d); but when we use random Pauli measurements, R(ρ)\mathcal{R}(\rho) is dense, and basic operations take time O(d2)O(d^{2}). We now describe a “hybrid” approach that avoids this difficulty, and works well in practice. The main observation is that for certain, carefully selected sets of Pauli matrices, Rρ\mathcal{R}\rho is sparse after all.

for u,v{0,1}nu,v\in\{0,1\}^{n}. Plainly, the position of the dd non-zero matrix elements of w(u,v)w(u,v) depends only on uu (vv encodes only phase information). Now choose a random subset S{0,1}nS\subset\{0,1\}^{n} of size O(rpolylog(d))O(r\operatorname{polylog}(d)), and then for all uSu\in S and v{0,1}nv\in\{0,1\}^{n}, measure the Pauli matrix w(u,v)w(u,v). Thus we are measuring each of the Pauli strings containing only σz\sigma^{z} or identity, together with these same strings “masked” by applying a set of size S|S| of Pauli strings with a pattern of σx\sigma^{x} and identity. Formally, this means

It follows that Rρ\mathcal{R}\rho is sparse with only Sd|S|d non-zero matrix elements. This “hybrid method” can be viewed as a variant of the usual matrix completion problem, where instead of sampling matrix elements independently at random, we sample groups of matrix elements determined by the random strings uSu\in S.

While the hybrid algorithm works well for generic states, certain input states ρ\rho may fail to be “incoherent enough” w.r.t. the very specific set expectation values obtained (c.f. Candes2008 ; Candes2009a ). For example, when the eigenvectors of ρ\rho are nearly aligned with the standard basis, most of the matrix elements of ρ\rho are nearly 0, and hence matrix completion is impossible. To avoid this problem, we suggest to perform a pseudo-random unitary UU prior to measuring the Pauli matrices. One then uses the hybrid method on UρUU\rho U^{\dagger}, and finally applies U1U^{-1} to recover ρ\rho. In particular, one can draw UU at random from an (approximate) unitary kk-design with kn/lognk\sim n/\log n. Explicit constructions of such unitaries are known, and can be implemented efficiently harrow-low .

While we cannot at this point prove rigorous guarantees for the hybrid approach, we do show below that randomization by approximate kk-designs generates sufficient “incoherence” that the original matrix completion algorithms Candes2008 ; Candes2009a would work. Because these algorithms call for matrix elements to be sampled from a uniform distribution, Observation 3 does not rigorously apply to the hybrid scheme. It does, however, make it plausible that pseudo-randomization overcomes incoherence problems and that guarantees for the hybrid method can be proven in the future.

Let ρ\rho be an arbitrary state of rank rr and dimension dd, and let EE be the projector onto the support of ρ\rho. Let i|i\rangle, i=1,,di=1,\ldots,d, denote the standard basis. Let UU be drawn at random from an (ε\varepsilon-approximate) unitary kk-design with kn/lognk\sim n/\log n (and ε=1/dk\varepsilon=1/d^{k}), and let bi=Ui|b_{i}\rangle=U|i\rangle. Then, with probability at least 1(1/d)1-(1/d), the following holds:

where μ0=C1(logd)C2\mu_{0}=C_{1}(\log d)^{C_{2}}, and C1C_{1} and C2C_{2} are fixed constants.

This implies the incoherence conditions (A0) and (A1) of Candes2008 , specialized to the case of positive semidefinite matrices, with μ0\mu_{0} as given above and μ1=μ0r\mu_{1}=\mu_{0}\sqrt{r}. Combining with the results of Candes2008 shows that ordinary matrix completion, with matrix elements sampled independently at random, will succeed. This guarantee does not extend to the hybrid method, however.

Proof of Observation 3: First consider a single vector b1|b_{1}\rangle, and define Z=Eb122Z=\lVert E|b_{1}\rangle\rVert^{2}_{2}. We will compute the kk’th moment of ZZ:

Using Markov’s inequality, and setting t=(rk/d)d2/k(r/d)poly(logd)t=(rk/d)\cdot d^{2/k}\leq(r/d)\cdot\text{poly}(\log d), we get

This proves the claim for a single vector b1|b_{1}\rangle. Now take the union bound over all the vectors bi|b_{i}\rangle, i=1,,di=1,\ldots,d. \square