Operator scaling: theory and applications

Ankit Garg, Leonid Gurvits, Rafael Oliveira, Avi Wigderson

Introduction

This introduction will be unusually long, due to the unusual number of settings in which the problems we study appear, and their interconnections and history.

The main computational problem we will be concerned with in this paper (which we call SINGULAR) is determining whether such a symbolic matrix is invertible or not (over the field of fractions in the given variables). This problem has a dual life, depending on whether the variables commute or don’t commute. In the commutative case this problem has an illustrious history and significance. It was first explicitly stated by Edmonds [Edm67], and shown to have a randomized polynomial time algorithm by Lovasz [Lov79]. The completeness of determinant for arithmetic formulas by Valiant [Val79] means that singularity captures the celebrated Polynomial Identity Testing (PIT) problem, and so in the commutative setting we will refer to it as PIT. The derandomization of the latter probabilistic algorithm for PIT (namely, proving PIT P\in\mathcal{P}) became all-important overnight when Kabanets and Impagliazzo [KI04] showed it would imply nontrivial arithmetic or Boolean lower bounds well beyond current reach.

On the other hand, in the non-commutative case even the meaning of this problem SINGULAR is unclear. It took decades to fully define and understand the related notion of a “field of fractions” for non-commutative polynomials, namely the free skew field over which we (attempt to) invert the matrixFor now, the reader may think of the elements of this “free skew field” simply as containing all expressions (formulas) built from the variables and constants using the arithmetic operations of addition, multiplication and division (we define it more formally a bit later). We note that while this is syntactically the same definition one can use for commuting variables, the skew field is vastly more complex, and in particular its elements cannot be described canonically as ratios of polynomials.. But as we will explain below, this non-commutative SINGULAR problem has many intuitive and clean equivalent formulations (some entirely commutative!). It captures a non-commutative version of identity testing for polynomials and rational functions, provides a possible avenue to attack the notorious commutative PIT version, and quite surprisingly, its different formulations arise naturally in diverse areas of mathematics, revealing surprising connections between them. We only note that unlike the commutative PIT, it is not even clear from its definition that the problem SINGULAR is decidable. It requires some of the very nontrivial characterizations above, and other important results in commutative algebra, to prove a deterministic exponential time upper bound on its complexity (see two very different proofs in [CR99, IQS17]), the best known before this work. No better bound was known even allowing randomness.

The main result of this paper is a deterministic polynomial time algorithm for this problem!

LL is singular (namely not invertible) over 𝔽​(

For every integer kk we have Det(i=1mXiAi)0\textnormal{Det}\left(\sum_{i=1}^{m}X_{i}\otimes A_{i}\right)\equiv 0, where the XiX_{i} are k×kk\times k matrices of commutative variables, Det is still the commutative determinant polynomial, acting here on kn×knkn\times kn matrices.

The completely positive quantum operator (or map) associated with LL is rank-decreasing. Namely, there exist a positive semi-definite matrix PP such that rank(i=1mAiPAi)<rank(P)\textnormal{rank}(\sum_{i=1}^{m}A_{i}PA_{i}^{\dagger})<\emph{{rank}}(P).Here AA^{\dagger} denotes the conjugate transpose of a complex matrix AA.

In the rest of this introduction we will try to explain how and why the problem SINGULAR arises in these different contexts, how are they related. We will discuss the algorithm that solves SINGULAR (which already appears in [Gur04]), its complexity analysis (which is the main result of this paper), and how they arise from these (and other!) sources. We will also discuss the extension of this algorithm to non-commutative rank computation, and the implications of these algorithms to the various settings. We also highlight the recurring analogs and connections between the commutative and non-commutative worlds. There are probably many ways to weave this meandering story due to the multiple connections between the topics; we chose one (some accounts of subsets of these connections appear e.g. in [Gur04, HW14, IQS17]). Note that most descriptions will be informal or semi-formal; formal definitions for everything that is actually needed will be given in the technical sections, and for the rest we provide references.

Polynomials, both in commuting and non-commuting variables, can always be uniquely defined by their coefficients. And so, while their computational complexity is of course interesting, it is not part of their description. For rational functions this persists in the commutative setting, as they are simply ratios of polynomials. It is perhaps amusing that in the non-commutative setting, the first definition of the free skew field of rational functions was computational, namely one whose objects are described by their computation via a sequence of arithmetic operations. Both this description, and the subsequent discovery of a syntactic representation play an important role in our story.

Books on history and construction of the skew field include [Coh95, Row80], and a thorough discussion from the computational perspective is given in [HW14]. Here we will be relatively brief, highlighting the key roles played by matrices and symbolic matrices in these constructions of the free skew field.

The rational expressions under the above equivalence relation comprise the (universal) free skew field 𝔽​(

This theorem immediately implies a polynomial identity test in probabilistic polynomial time (just like in the commutative case) mentioned in the next subsection - simply plug random matrices of appropriate size for the variables. However, for rational expressions, which is our main interest here, such upper bounds on the sufficient dimension to exclude rational identities are much harder to prove, and what is known is much weaker. The best bounds (and only for some fields) are exponential, which seem to only provide probabilistic exponential time rational identity tests. The last subsection of this introduction shows how these exponential dimension bounds arise and obtained from invariant theory, and how we use them to nonetheless derive a deterministic, polynomial time identity test promised in our main theorem.

A second construction of the free skew field which is somehow more concrete was developed by Cohn. In the notation we have already established of symbolic matrices Cohn proved:

Every element of the free skew field 𝔽​(

For this definition not to be self-referential (and other reasons) Cohn proved some of the important characterizations of invertible matrices, including items (2) and (5) in Theorem 1.4. It is clear that (each entry of) the inverse of such a matrix can be given by a rational expression, and the question of whether a matrix is invertible again arises naturally, and it turns out to capture rational identity testing.

2 Word Problems and Identity Tests

Word problems and their complexity are central throughout mathematics, arising whenever mathematical objects in a certain class have several representations. In such cases, a basic problem is whether two such representations describe the same object. Often, indeed, some problems of this form have served as early examples of decidable and undecidable problemsE.g. deciding if two knots diagrams describe the same knot was proved decidable by Haken [Hak61], and deciding if two presentations with generators and relations describe the same group was proved undecidable by Rabin [Rab58].

As is well known, the main reason why PIT is so important in the commutative setting is that it captures the polynomial and rational function identity test for formulas, as proved by Valiant [Val79]. Combining it with the equivalences of Fact 1.2, we have

There is an efficient algorithm which converts every arithmetic formula ϕ(y)\phi({\mathbf{y}}) in commuting variables y{\mathbf{y}} of size ss to a symbolic matrix LϕL_{\phi} of size poly(s)\emph{{poly}}(s), such that ϕ\phi is identically zero if and only if LϕL_{\phi}\in PIT (i.e., LϕL_{\phi} is singular).

Theorem 1.7 of the previous subsection, showing that in the non-commutative setting as well SINGULAR captures polynomial and rational function identity test, is the analog Valiant’s completeness theorem. It was proved even earlier by Cohn (see also Malcolmson’s version [Mal78]), using similar linear algebraic ideas. Here is a more precise statement which gives the analogy.

There is an efficient algorithm which converts every arithmetic formula ϕ(x)\phi({\mathbf{x}}) in non-commuting variables x{\mathbf{x}} of size ss to a symbolic matrix LϕL_{\phi} of size poly(s)\emph{{poly}}(s), such that the rational expression computed by ϕ\phi is identically zero if and only if LϕL_{\phi}\in SINGULAR.

As mentioned, the structure of the free skew field is so complex that unlike the commutative case, even decidability of SINGULAR (and rational identity testing) is far from obvious. The first to prove decidability was Cohn in [Coh73, Coh75]. The first explicit bound on time was given by Cohn and Reutenauer in [CR99], reducing it to a system of commutative polynomial equations using characterization (5) of SINGULAR, proved earlier by Cohn, which puts it in PSPACE\mathcal{PSPACE}. The best upper bound before this work was singly exponential time, obtained by [IQS17], and of course yields the same bound for rational identity testing.

From our Theorem 1.1 we conclude a deterministic, polynomial identity test for non-commuting rational expressions.

It is important to stress that unlike the commutative case, where symbolic matrix inversion can be simulated by quasi-polynomial sized formulas, in the non-commutative case symbolic matrix inversion is exponentially more powerful than formulas (and so Theorem 1.1 is far more powerful than its corollary above). We state these two results formally below for contrastReplacing formulas by circuits there is no contrast - in both the commutative and non-commutative setting matrix inverse has a polynomial size circuit (with division of course) [HW14]. Note that when saying “computing the inverse” we mean computing any entry in the inverse matrix.

The inverse of a non-commutative n×nn\times n symbolic matrix in 𝔽​(

Our algorithm of Theorem 1.1 and Corollary 1.10 is a “white-box” identity test (namely one which uses the description of the given matrix or formula), in contrast to a “black-box” algorithm which only has input-output access to the function computed by them. The best-known “black-box” identity test for these formulas, even if randomization is allowed(!), requires exponential time. It is a very interesting question to find a faster black-box algorithm.

When division is not allowed, efficient deterministic identity tests of non-commutative polynomials were known in several models. The strongest is for arithmetic branching programs (ABPs). As this model is easily simulable by matrix inversion (see Theorem 6.5 in [HW14], our algorithm provides an alternative (and very different) proof of the following theorem of Raz and Shpilka [RS05].

There is a deterministic polynomial time white-box identity testing algorithm for non-commutative ABPs.

For certain classes computing only polynomials even efficient black-box algorithms are known, and we mention two below; the first deterministic, for non-commutative ABPs, and the second probabilistic (but for circuits).

There is a deterministic quasi-polynomial time black-box identity testing algorithm for non-commutative ABPs.

There is a probabilistic polynomial time black-box identity testing algorithm for non-commutative circuits.

3 Commutative and Non-commutative Rank of Symbolic Matrices

There are many sources, motivations and results regarding invertibility, and more generally rank, of commutative symbolic matrices, which are much older than the complexity theory interest in it. These mathematical papers, like the ones in the computational complexity literature, regard it as a difficult problem, and attempt to understand a variety of special cases (of a different nature than restrictions of the computational power of the model). Some of the many references to this body of work can be found in the papers [FR04, GM02]. Often, the non-commutative rank (explicitly or implicitly) is used to give upper bound on the commutative rank, and their relationship becomes of interest. We focus on this connection here, and explain how our main result implies a deterministic approximation algorithm to the commutative rank.

The following are equivalent for a matrix of linear forms over commutative variables, L(y)L({\mathbf{y}}).

While the characterization above is simple, the one below is very substantial, mostly developed by Cohn for his construction of the free skew field. The first characterization is due to him [Coh95] and the second is due to Fortin and Reutenauer [FR04] who heavily use Cohn’s techniques.

The following are equivalent for a matrix of linear forms over non-commutative variables, L(x)L({\mathbf{x}}).

nc-rank(L(x))=r\emph{{nc-rank}}(L({\mathbf{x}}))=r over 𝔽​(

There is a deterministic algorithm which, given mm n×nn\times n integer matrices A1,AmA_{1},\dots A_{m} with entries of bit-size bb, computes nc-rank(L)\emph{{nc-rank}}(L) in time poly(n,m,b)\emph{{poly}}(n,m,b) (where L=i=1mxiAiL=\sum_{i=1}^{m}x_{i}A_{i}).

We note here that in some formulations of the problem, LL is represented in affine form, namely where an additional constant matrix A0A_{0} is added to LL (this may be viewed as a non-commutative analog of the rank-completion problem). However, the algorithm above works in this case as well , since nc-rank(L)\textnormal{nc-rank}(L) remains unchanged if an additional variable x0x_{0} was added as well. So, we will stick with the linear formulation.

It is not hard to see from the definitions that for every LL we have rank(L)nc-rank(L)\textnormal{rank}(L)\leq\textnormal{nc-rank}(L). We have already seen that they can be different in Example 1.3. Taking many copies of that 3×33\times 3 matrix we see that there can be a factor 3/23/2 gap between the two: for any rr there are matrices LL with rank(L)=2r\textnormal{rank}(L)=2r and nc-rank(L)=3r\textnormal{nc-rank}(L)=3r. However, Fortin and Reutenauer [FR04] proved that this gap is never more than a factor of 2.

For every LL we have nc-rank(L)2rank(L)\emph{{nc-rank}}(L)\leq 2\emph{{rank}}(L).

An immediate corollary of our main result is thus a factor 2 approximation of commutative rank.

We find the question of obtaining a better approximation ratio efficiently a very interesting problem, a different relaxation of the commutative PIT problem that as far as we are aware was not studied till now.

4 Compression Spaces, Optimization and Gurvits’ Algorithm G𝐺G

An important class of spaces of matrices LL, which is studied in many of the papers mentioned above, is the class of compression spaces (this notation was apparently introduced by Eisenbud and Harris [EH88]). They are defined as those spaces LL for which LL is rank(L)\textnormal{rank}(L)-decomposable. A simpler definition follows the characterization of [FR04] above, namely a space LL is a compression space iff rank(L)=nc-rank(L)\textnormal{rank}(L)=\textnormal{nc-rank}(L). The importance of compression spaces and their many origins will be surveyed below.

There is a deterministic polynomial time algorithm, Algorithm GG, which for every nn and every n×nn\times n matrix LL given by a set of integer matrices (A1,A2,,Am)(A_{1},A_{2},\dots,A_{m}), outputs “singular” or “invertible”, and its output is guaranteed to be correctFor both the commutative and non-commutative definitions. when either rank(L)=n\emph{{rank}}(L)=n or nc-rank(L)<n\emph{{nc-rank}}(L)<n.

In particular, the algorithm always gives the correct answer for compression spaces. Our result on efficient computation of the non-commutative rank implies that for compression spaces we can determine the commutative rank.

Yet another source, elaborated on in [Gur04], is geometric duality theorems, such as the ones for matroid intersection and matroid parity problems. These sometimes give rise to compression spaces, with one prototypical example being spaces where the generating matrices AiA_{i} all have rank-1; this follows from the Edmonds-Rado [Edm69, Rad42] duality theorem for matroid intersectionIt is an interesting question whether a compression space naturally arises from matroid parity duality of Lovasz [Lov80, Lov89].. Another, simpler example are spaces generated by upper-triangular matrices. Other examples of compression spaces are given in [Gur04]. Following his paper, we note that our rank algorithm above can solve such optimization problems in the dark, namely when the given subspace has such structured spanning generators, but the generators AiA_{i} actually given to the algorithm may be arbitrary. This is quite surprising, and in general cannot work by uncovering the original structured generators from the given ones: an efficient algorithm is not known for the problem of deciding if a given space of matrices is spanned by rank-1 matrices and we believe it is hard. It would be interesting to explore which other optimization problems can be encoded as the rank of a compression space, and whether the fact that it can be computed “in the dark” has any applications.

5 Permanents, Quantum Operators, Origins and Nature of Algorithm G𝐺G

We will describe the algorithm formally in Section 2.2. Here we give an informal description, which makes it easy to explain its origins and nature. We find these aspects particularly interesting. One (which has very few parallels) is that while the problem SINGULAR solved by Algorithm GG in Theorem 1.1 is purely algebraic, the algorithm itself is purely analytic; it generates from the input a sequence of complex-valued matrices and attempts to discover if it is convergent. Another is that the algorithm arises as a quantum analog of another algorithm with very different motivation that we now discuss.

To give more insight to the working of algorithm GG, let us describe another algorithm (for a different problem) which inspired it, which we call Algorithm SS. This matrix scaling algorithm was developed by Sinkhorn [Sin64] for applications in numerical analysis, and has since found many other applications (see survey and references in [LSW98], who used it as a basis for their deterministic algorithm for approximating the permanent of non-negative matrices). Two different analyses of Sinkhorn’s algorithm SS, one of [LSW98] and the other in the unpublished [GY98] inspire the analysis of Algorithm GG in [Gur04].

We describe the [LSW98] analysis for Algorithm SS. We need a few definitions. For a non-negative matrix AA, let R(A)R(A) denote the diagonal matrix whose (i,i)(i,i)-entry is the inverse of the L1L_{1} norm of row ii (which here is simply the sum of its entries as AA is non-negative). Similarly C(A)C(A) is defined for the columnsA “non-triviality” assumption is that no row or column in AA is all zero..

Algorithm SS gets as input a non-negative integer matrix AA. For a fixed polynomial (in the input size) number of iterations it repeats the following two steps

Normalize rows: AR(A)AA\leftarrow R(A)\cdot A

Normalize columns: AAC(A)A\leftarrow A\cdot C(A)

What does this algorithm do? It is clear that in alternate steps either R(A)=IR(A)=I or C(A)=IC(A)=I, where II is the identity matrix. Thus AA itself alternates being row-stochastic and column-stochastic. The question is whether both converge to II together, namely, if this process converts AA to a doubly stochastic matrix. In [LSW98] it is proved that this happens if and only if Per(A)>0\textnormal{Per}(A)>0, where Per is the permanent polynomial. Moreover, convergence is easy to detect after a few iterations! If we define ds(A)=R(A)I2+C(A)I2\textnormal{ds}(A)=||R(A)-I||^{2}+||C(A)-I||^{2} as a notion of distance between AA and the doubly stochastic matrices, then the convergence test is simply whether ds(A)<1/n\textnormal{ds}(A)<1/n. If it is that small at the end of the algorithm then Per(A)>0\textnormal{Per}(A)>0, otherwise Per(A)=0\textnormal{Per}(A)=0.

The analysis of convergence of Algorithm SS in [LSW98] is extremely simple, using the permanent itself as a progress measure (or potential function). It has the usual three parts which makes a potential function useful:

The input size provides an exponential lower bound on the starting value of Per(A)\textnormal{Per}(A),

The arithmetic-geometric mean inequality guarantees that it grows by a factor of 1+1/n1+1/n at every iteration, and

The permanent of any stochastic matrix is upper bounded by 1.

We shall return to this analysis of Algorithm SS soon.

As it happens, Algorithm GG is a quantum analog of Algorithm SS! In quantum analogs of classical situations two things typically happen, diagonal matrices (which commute) become general matrices (which do not), and the L1L_{1} norm is replaced by L2L_{2}. This happens here as well, and we do so almost syntactically, referring the reader to [Gur04] for their quantum information theoretic intuition and meaning of all notions we mention.

The input to algorithm GG is a symbolic matrix L=ixiAiL=\sum_{i}x_{i}A_{i}, given by the n×nn\times n integer matrices (A1,A2,,Am)(A_{1},A_{2},\dots,A_{m}). Briefly, LL is viewed as a completely positive (quantum) operator, or map, on psd matrices, mapping such a (complex valued) matrix PP to L(P)=iAiPAiL(P)=\sum_{i}A_{i}PA_{i}^{\dagger} (PP is typically a “density matrix” describing a quantum state, namely a psd matrix with unit trace, and the operator LL will typically preserve trace or at least not increase it). The dual operator LL^{*} acts (as you’d expect) by L(P)=iAiPAiL^{*}(P)=\sum_{i}A_{i}^{\dagger}PA_{i}. The analog “normalizing factors” for LL, named R(L)R(L) and C(L)C(L) are definedAgain using a “non-triviality” assumption these matrices are invertible. by R(L)=(iAiAi)12R(L)=(\sum_{i}A_{i}A_{i}^{\dagger})^{-\frac{1}{2}}, and C(L)=(iAiAi)12C(L)=(\sum_{i}A_{i}^{\dagger}A_{i})^{-\frac{1}{2}}. Note that R(L)=L(I)12R(L)=L(I)^{-\frac{1}{2}} and C(L)=L(I)12C(L)=L^{*}(I)^{-\frac{1}{2}}.

On input (A1,A2,,Am)(A_{1},A_{2},\dots,A_{m}) Algorithm GG repeats, for a fixed polynomial (in the input size) number of iterations, the following analogous two steps

Normalize rows: LR(L)LL\leftarrow R(L)\cdot L

Normalize columns: LLC(L)L\leftarrow L\cdot C(L)

So again, row and column operations are performed alternately, simultaneously on all matrices AiA_{i}. It is clear, as above, that after each step either R(L)=IR(L)=I or C(L)=IC(L)=I. It is natural to define the case when both occur as “doubly stochastic”, and wonder under what conditions does this sequence converge, namely both R(L)R(L) and C(L)C(L) simultaneously approach II, and alternatively the limiting LL simultaneously fixes the identity matrix II. A natural guess would be that it has to do with a “quantum permanent”. Indeed, Gurvits [Gur04] defines such a polynomial (in the entries of the AiA_{i}) QPer(L)\textnormal{QPer}(L), and proves several properties and characterizations (for example, it is always non-negative like the permanent, and moreover specializes to the permanent when the operator LL is actually a “classical” operator described by a single non-negative matrix AA).

One can similarly define in an analogous way to the classical setting a “distance from double stochastic” by ds(L)=R(L)I2+C(L)I2\textnormal{ds}(L)=||R(L)-I||^{2}+||C(L)-I||^{2}, and test (after polynomially many iterations) if ds<1/n\textnormal{ds}<1/n. This is precisely what Algorithm GG does. It is not hard to see that if LL (with non-commuting variables) is singular, then there is no convergence, the test above fails (and indeed QPer(L)=0\textnormal{QPer}(L)=0). However, in contrast to Algorithm SS, the analysis in [Gur04] falls short of proving that, otherwise we have convergence (and QPer(L)>0\textnormal{QPer}(L)>0). It does so only under the strong (commutative) assumption Det(L)0\textnormal{Det}(L)\neq 0, namely when the symbolic matrix, viewed with commuting variables, is nonsingular. This result was stated above as Theorem 1.22. The 3-step complexity analysis proceeds exactly as in [LSW98] for the classical Algorithm SS described above, using the quantum permanent as a progress measure. However, if Det(L)=0\textnormal{Det}(L)=0 then QPer(L)=0\textnormal{QPer}(L)=0, and lower bounding the initial quantum permanent from below in part (1) of that analysis fails.

To prove our main theorem, showing that convergence happens if and only if LL is non-singular over the non-commutative skew field, we use another ingredient from Gurvits’ paper. He proposes another progress measure, called capacity and denoted cap(L)\textnormal{cap}(L). Capacity (defined in the next section) is a quantum analog of another classical progress measure, based on KL-divergence, which was used in [GY98] for the analysis of Algorithm SS. The advantage of capacity, as pointed out in [Gur04], is that it does not necessarily vanish when Det(L)=0\textnormal{Det}(L)=0. Indeed, it is positive whenever LL is non-singular in the non-commutative sense! However Gurvits could only bound it sufficiently well from below as a function of the input size if Det(L)0\textnormal{Det}(L)\neq 0. Thus, for a general input LL, positive capacity only guarantees that Algorithm GG converges in a finite number of steps, without providing an upper bound on that number.

Our main contribution, which leads to a polynomial upper bound, is proving an explicit capacity lower bound for every LL that is nonsingular over the skew field. The source of the proof turns out to be commutative algebra and invariant theory, as we discuss next.

6 Invariant Theory, Left-Right Action, Capacity and the Analysis of Algorithm G𝐺G

Invariant theory is a vast field; we will exposit here only the minimal background that is necessary for this paper. The books [CLO07, DK02, KP96] provide expositions on the general theory. More focused discussions towards our applications appear in the appendix of [HW14] and Section 1.2 of [FS13a].

Invariant theory deals with understanding the symmetries of mathematical objects, namely transformations of the underlying space which leave an object unchanged or invariant. Such a set of transformations always form a group (and every group arises this way). One major question in this field is, given a group acting on a space, characterize the objects left invariant under all elements of the group. Here we will only discuss very specific space and actions: polynomials (with commuting variables!) that are left invariant under certain linear transformations of the variables. The invariant polynomials under such action clearly form a ring (called the invariant ring). Important for us is the null-cone of the action, which is simply all assignments to variables which vanish on all non-constant homogeneous invariant polynomials. Namely the null cone is the affine algebraic variety defined by the ideal generated by homogeneous invariant polynomials of positive degree.

Here are two motivating examples surely familiar to the reader. The first example is given by polynomials in nn variables y1,,yny_{1},\dots,y_{n} invariant under the full group of permutations SnS_{n}; here the invariant polynomials are (naturally) all symmetric polynomials. While this is an infinite set, containing polynomials of arbitrarily high degrees, note that this set of polynomials is generated (as an algebra) by the n+1n+1 elementary symmetric polynomials, with the largest degree being nn. This finite generation is no coincidence! A general, important theorem of Hilbert [Hil93] assures us that for “such” linear actions there is always a finite generating set (and hence a finite upper bound on their maximum degree). Obtaining upper bounds on the degree of generating sets, finding descriptions of minimal generating sets for natural actions are the classical goals of this area, and a more modern oneArising in particular in the GCT program of Mulmuley and Sohoni is obtaining succinct descriptions and efficient computation of these invariants (e.g. see [Mul12, FS13a]). The case of the action of the symmetric group is an excellent example of a perfect understanding of the invariant polynomial ring. Note that for this action the null-cone is simply the all-zero vector.

What we consider here is the left-right action on mm n×nn\times n matrices, where a pair (B,C)(B,C) as above takes (Y1,Y2,,Ym)(Y_{1},Y_{2},\dots,Y_{m}) to (BY1C,BY2C,,BYmC)(BY_{1}C,BY_{2}C,\dots,BY_{m}C). The study of the invariant ring of polynomials (in the mn2mn^{2} variables sitting in the entries of these matrices) for this action was achievedWe note that this is part of the larger project of understanding quiver representations, started by the works of Procesi, Razmysolov, and Formanek [Pro76, Raz74, For86]. by [DW00, DZ01, SdB01, ANS10], and will look very familiar from condition (4) in Theorem 1.4, as well as from Amitsur’s definition of the free skew fieldNote though that the roles of which matrices in the tensor product are variable, and which are constant, has switched!

Over algebraically closed fields, the invariant ring of polynomials of the left-right action above is generated by all polynomials of the form Det(iDiYi)\textnormal{Det}\left(\sum_{i}D_{i}\otimes Y_{i}\right), for all dd and all d×dd\times d matrices DiD_{i}.

It is worthwhile to stress the connection forged between the commutative and non-commutative worlds by this theorem when combined with Amitsur’s and Cohn’s constructions of the skew field. A set of matrices (A1,A2,,Am)(A_{1},A_{2},\dots,A_{m}) is in the null-cone of the left-right action if and only if the symbolic matrix L=ixiAiL=\sum_{i}x_{i}A_{i} is not invertible in the free skew field! In other words, the non-commutative SINGULAR problem (and thus rational identity testing, and the word problem in the skew field) arises completely naturally in commutative algebra. Of course, invertibility itself is invariant under the left-right action (indeed, even by any invertible matrices B,CB,C, not necessarily of determinant 1), so one expects a connection in at least one direction. At any rate, one may hope now that commutative algebraic geometric tools will aid in solving these non-commutative problems, and they do!

The generating set of the invariant ring given in the theorem is still an infinite set, and as mentioned above, Hilbert proved that some finite subset of it is already generating.

The invariant ring of polynomials of the action of a reductive group on a finite dimensional vector space is finitely generated.

This exponential upper bound was the state of art when we wrote our paper. Subsequent to it, this bound was greatly improved to linear!

This much better bound gave rise to further important developments discussed below under “Subsequent Work”, that highlight even more how invariant theory informs algorithmic efficiency in certain problems. We conclude this section describing the idea behind our proof, which uncovers this connection.

We plan to used the exact same 3-step complexity analysis of Algorithm GG as the one for Algorithm SS described in subsection 1.5, and which Gurvits [Gur04] used to analyze Algorithm GG for a special subset of inputs. As mentioned, the only missing ingredient is step (1), namely an explicit lower bound on the capacity of an input operator LL. Just like in the classical case of Algorithm SS, an exponential lower bound (in terms of the size of LL) is required to establish a polynomial convergence. This is a proper time to finally define this important parameter of positive operatorsWe note that this notion of capacity seems to have nothing to do with the usual capacity of a quantum channel.

Given L=(A1,A2,,Am)L=(A_{1},A_{2},\ldots,A_{m}), recall that L(P)=iAiPAiL(P)=\sum_{i}A_{i}PA_{i}^{\dagger} and define cap(L)\textnormal{cap}(L) by

where the infimum is taken over all psd matrices PP with Det(P)1\textnormal{Det}(P)\geq 1.

First, observe that capacity is the solution to a non-convex optimization problem (over a convex domain Recall that log determinant is a concave function over the domain of positive definite matrices). We will see in the next section that we can not only bound it, but actually compute it efficiently! Next, observe that capacity is an invariant (though not a polynomial invariant) of the left-right action.What we know is that cap(L)0\textnormal{cap}(L)\neq 0. From this we infer that some polynomial invariant of the form above, Det(iAiDi)\textnormal{Det}\left(\sum_{i}A_{i}\otimes D_{i}\right) is nonzero, where the DiD_{i} are d×dd\times d matrices for some dd.

This expression naturally suggests considering several new positive operators, and relating their capacities. The first, in dimension dd, is the operator, M=(D1,D2,,Dm)M=(D_{1},D_{2},\ldots,D_{m}). Two others, in ndnd dimensions are LM=(A1D1,A2D2,,AmDm)L_{M}=(A_{1}\otimes D_{1},A_{2}\otimes D_{2},\ldots,A_{m}\otimes D_{m}), and LM=(A1D1,AiDj,,AmDm)L\otimes M=(A_{1}\otimes D_{1},\ldots\,A_{i}\otimes D_{j},\ldots,\,A_{m}\otimes D_{m}). The non-vanishing of the determinant above proves that both cap(LM)>0\textnormal{cap}(L_{M})>0 and cap(LM)>0\textnormal{cap}(L\otimes M)>0. And it is easy to obtain exponential lower bounds on both quantities, which however depend not only on the description size of LL (which is given), but also of that of MM which could be much larger.

To facilitate our original capacity lower bound, which only used dσ(n)exp(poly(n))d\leq\sigma(n)\leq\exp(\textnormal{poly}(n)), we proved that the normalizedTaking the dimension’s root of capacity capacity is multiplicative, a fact of independent interest (see details in Section 3.3):

Using only the easier direction, namely that the left hand side is at most the right hand side, it suggests that our lower bound on cap(L)\textnormal{cap}(L) will follow from a lower bound on cap(LM)\textnormal{cap}(L\otimes M) and an upper bound on cap(M)\textnormal{cap}(M). Both of these in turn follow from an upper bound on the entry sizes in MM, namely in the DiD_{i}, which easily follow from the given exponential upper bound on dd. A key thing to note is that even when dd is exponential in nn, and so these size bounds are doubly exponential, it is the dd’th root of this bound that gives the required lower bound cap(L)\textnormal{cap}(L).

For the new lower bound, presented in Section 2.3, we bound cap(L)\textnormal{cap}(L) using directly a bound on cap(LM)\textnormal{cap}(L\otimes M). Using much more careful size bounds on the entries of MM, which follow from Alon’s Nullstellensatz theorem, and tightening the reduction between the two capacities, we are able to derive the desired capacity lower bound using any finite bound on dd !

This new proof eliminates the need for good degree upper bounds for the purpose of analyzing this algorithm (if one does not care about efficiency beyond having a polynomial bound). However good degree upper bounds (that are anyway an important goal in invariant theory) were essential for the original proof, and for general actions were useful for other algorithmic problems which arose in other contexts in computational complexity, for example in geometric complexity theory (GCT). Further, we believe they will be important for studying other problems in algebraic complexity and optimization. Indeed, in a work in progress, we need degree bounds for invariant rings whose generating sets are not understood as well as for the left-right action, but Derksen’s general exponential degree upper bound still holds and is useful.

The capacity lower bound above implies, as mentioned, a polynomial upper bound on the number of iterations of Algorithm GG, assuming we can perform exact arithmetic! To actually bound the running time the algorithm has to be refined, truncating numbers so that they can be represented by polynomially many bits, and so a careful analysis of the bit complexity is required to actually prove its running time. This issue naturally leads to the next section, in which we discuss how the algorithm can be extended to actually compute capacity of the input operator, as well as provide bounds on its continuity. These results in turn are essential in a follow-up work [GGOW18] we have done on the Brascamp-Lieb inequalities, where capacity arises naturally via a simple reduction to operator scaling.

7 Computation and Continuity of Capacity

We also show that Algorithm G can be modified to compute the capacity of a completely positive operator LL. Recall that the capacity of an operator LL is defined to be cap(L)=infDet(L(P))\textnormal{cap}(L)=\text{inf}\>\textnormal{Det}(L(P)), where the infimum is taken over all psd matrices PP with Det(P)=1\textnormal{Det}(P)=1. The following theorem is proven as Theorem 5.3 in Section 5.

There is a poly(n,b,1/ϵ)\textnormal{poly}(n,b,1/\epsilon) time algorithm for computing a (1+ϵ)(1+\epsilon)-multiplicative approximation to cap(L)\textnormal{cap}(L). Here nn is the dimension on which LL acts and bb denotes the bit-sizes involved in description of LL.

This is quite fascinating since we don’t know of a convex formulation for cap(L)\textnormal{cap}(L). The idea for the computation of capacity is quite simple. We know that the capacity of a doubly stochastic operator is exactly 11. We also know how operator scaling changes the capacity. If LB,CL_{B,C} is a scaling of LL i.e.

if LB,CL_{B,C} is doubly stochastic. Thus if we can find a doubly-stochastic-scaling of LL, then we can compute cap(L)\textnormal{cap}(L) exactly. Algorithm G helps in finding an approximately-doubly-stochastic-scaling of LL, which results in an approximation scheme for computing cap(L)\textnormal{cap}(L). It is a very intriguing open problem if a (1+ϵ)(1+\epsilon) approximation to cap(L)\textnormal{cap}(L) can be computed in time poly(n,b,log(1/ϵ))\textnormal{poly}(n,b,\log(1/\epsilon)). In the aforementioned upcoming paper, we use this result (via a black-box reduction!) to approximating optimal constants in Brascamp-Lieb inequalities.

Although, we don’t know of a convex formulation for capacity, it does have some nice properties. Let us look at the Lagrangian for log(cap(L))\log(\textnormal{cap}(L)):

A critical point CC i.e. where the gradient f(C,λ)=0\nabla f(C,\lambda)=0 should satisfy L(L(C)1)=C1L^{*}\left(L(C)^{-1}\right)=C^{-1}. This follows from

and hence any critical point is in fact a global minimum for cap(L)\textnormal{cap}(L)! It would be interesting to see if our techniques can help solve other non-convex optimization problems which share the same property.

Algorithm GG can also be thought of as an alternating minimization algorithm for computing capacity. First let us write capacity in an alternative way.

While the constraints are convex (by log-concavity of determinant), the objective is quadratic. However if, we fix XX or YY, this program is convex. In fact, if we fix XX, the optimum YY is Det(L(X))1/nL(X)1\textnormal{Det}(L(X))^{1/n}\cdot L(X)^{-1}. Similarly, if we fix YY, the optimum XX is Det(L(Y))1/nL(Y)1\textnormal{Det}(L^{*}(Y))^{1/n}\cdot L^{*}(Y)^{-1}. Starting from X0=IX_{0}=I, if we do alternate minimization, we get exactly Algorithm GG and our results imply that this process converges in polynomial number of steps!

From the description of capacity, it is not clear if it is continuous. The reason is that a priori, the optimizing matrices could be radically different for cap(L1)\textnormal{cap}(L_{1}) and cap(L2)\textnormal{cap}(L_{2}) even if L1L_{1} and L2L_{2} are quite close. We show that this not the case, essentially because cap(L)\textnormal{cap}(L) (as well as the optimizer) can be approximated by a simple iterative process namely Algorithm GG! And Algorithm GG is clearly continuous in LL. We prove this formally in Theorem 4.5 of Section 4.3. In the aforementioned upcoming paper, we use this result to prove continuity of Brascamp-Lieb constants.

The continuity of capacity can also be proven via other methods and is already mentioned in [Gur04]. But here we manage to get explicit bounds on the continuity parameter which we don’t know how to get by methods other than analyzing Algorithm GG.

Subsequent Work

After our work, Derksen and Makam [DM17] obtained polynomial degree bounds for the left-right action over any field! Using this bound, Ivanyos, Qiao and Subrahmanyam [IQS18] designed a completely different deterministic polynomial time algorithm for SINGULAR that works over all fields. This algorithm has a combinatorial/linear algebraic flavor, and has several important advantages over our algorithm, but also some limitations in comparison. One clear advantage is of course that it works over every field. Another is that this algorithm can certify the non-commutative singularity of an input LL over 𝔽​(

Using some of the techniques developed in [IQS17, IQS18], Bläser, Jindal and Pandey [BJP17] designed a deterministic PTAS for the commutative rank i.e. a (1+ϵ)(1+\epsilon)-approximation algorithm which runs in deterministic nO(1/ϵ)n^{O(1/\epsilon)} time. This improves the factor-22 approximation algorithm given in this paper.

8 Organization

In Section 2, we describe Algorithm G for testing if quantum operators are rank-decreasing. We explain the notion of capacity of quantum operators, prove an explicit lower bound on the capacity of rank non-decreasing operators, and explain how it is used in the analysis to prove that Algorithm GG runs in polynomial time. In Section 3, we study various properties of capacity, some of which are used in other places in the paper. In Section 4, we analyze the bit complexity of Algorithm GG and also prove an explicit bound on the continuity of capacity. In Section 5, we will show how a modification of Algorithm GG can be used to approximate capacity. In the appendix Section A, we show how to compute the non-commutative rank. We conclude in Section 6 with a short discussion and open problems.

Quantum Operators and Analysis of Algorithm G

This section is devoted to Algorithm GG and its analysis. We start with preliminaries about completely positive operators, their properties, and basic quantities associated with them. We then formally describe Algorithm GG, and proceed to give a full analysis of its running time, proving the main theorem of this paper. Section 2.1 contains definitions and properties of completely positive operators and their capacity. Section 2.2 describes the Algorithm GG and its analysis assuming an explicit lower bound on the capacity. Section 2.3 contains the main theorem concerning the lower bound on capacity of rank non-decreasing operators.

The above is actually not the usual definition of completely positive operators. TT is defined to be positive if T(X)0T(X)\succeq 0 whenever X0X\succeq 0. TT is completely positive if IkTI_{k}\otimes T is positive for all k1k\geq 1. Choi [Cho75] proved that an operator is completely positive iff it is of the form stated above.

If T(X)=i=1mAiXAiT(X)=\sum_{i=1}^{m}A_{i}XA_{i}^{\dagger} is a completely positive operator, we define its dual TT^{*} by T(X)=i=1mAiXAiT^{*}(X)=\sum_{i=1}^{m}A_{i}^{\dagger}XA_{i}. If both TT and TT^{*} are trace preserving, namely T(I)=T(I)=IT(I)=T^{*}(I)=I then we call TT (and TT^{*}) doubly stochastic.

A completely positive operator TT is said to be rank-decreasing if there exists an X0X\succeq 0 s.t. rank(T(X))<rank(X)\text{rank}(T(X))<\text{rank}(X). TT is said to be cc-rank-decreasing if there exists an X0X\succeq 0 s.t. rank(T(X))rank(X)c\text{rank}(T(X))\leq\text{rank}(X)-c. We will sometimes refer to operators that are not rank decreasing as rank non-decreasing.

Now that we defined completely positive operators, we define their capacity, which is a very important complexity measure of such operators suggested in [Gur04]. Its evolution will be central to the complexity analysis of our algorithm.

[Gur04] The capacity of a completely positive operator TT, denoted by cap(T)\textnormal{cap}(T), is defined as

This notion of capacity has a very interesting history. Some special cases of capacity were defined in [GS00] and [GS02]. It was then extended to hyperbolic polynomials and this also led to a resolution (and extremely elegant proofs) of the Van der Waerden Conjecture for Mixed Discriminants [Gur06].

The next proposition shows how capacity changes when linear transformations are applied (as in the algorithm) to the completely positive operator.

Let TT be the operator defined by A1,,AmA_{1},\ldots,A_{m} and let TB,CT_{B,C} be the operator defined by BA1C,,BAmCBA_{1}C,\ldots,BA_{m}C, where B,CB,C are invertible matrices. Then

The next proposition gives a useful upper bound on the capacity of trace preserving completely positive operators; this will be used for the convergence analysis of the algorithm.

Let TT be a completely positive operator with Kraus operators A1,,AmA_{1},\ldots,A_{m} (which are n×nn\times n complex matrices). Also suppose that either i=1mAiAi=I\sum_{i=1}^{m}A_{i}^{\dagger}A_{i}=I or i=1mAiAi=I\sum_{i=1}^{m}A_{i}A_{i}^{\dagger}=I. Then cap(T)1\textnormal{cap}(T)\leq 1.

Note that tr(T(I))=n\textnormal{tr}(T(I))=n in either case.

The second inequality follows from the AM-GM inequality. ∎

2 Algorithm G𝐺G and its convergence rate

We now describe Algorithm GG to test if a completely positive operator TT (given in terms of Kraus operators A1,,AmA_{1},\ldots,A_{m}) is rank non-decreasing (equivalently properties (1)(8)(1)-(8) in Theorem 1.4 one of which is that A1,,AmA_{1},\ldots,A_{m} admit no shrunk subspace). We will then analyze its convergence rate, namely the number of scaling iterations needed as a function of the input size. In section 4, we will continue with the finer analysis of the bit complexity of this algorithm.

Since the property of A1,,AmA_{1},\ldots,A_{m} having a shrunk subspace or not remains invariant if we replace A1,,AmA_{1},\ldots,A_{m} by a basis spanning the subspace spanned by A1,,AmA_{1},\ldots,A_{m}, we can always assume wlog that mn2m\leq n^{2}. Suppose the maximum bit size of the entries of A1,,AmA_{1},\ldots,A_{m} is bb. Since scaling the matrices A1,,AmA_{1},\ldots,A_{m} doesn’t change the problem, we can assume that A1,,AmA_{1},\ldots,A_{m} have integer entries of magnitude at most M=2O(b)M=2^{O(b)}.

Algorithm GG below is essentially Gurvits’ algorithm [Gur04] for Edmonds’ problem for subspaces of matrices having Edmonds-Rado property. It is a non-commutative generalization of the Sinkhorn scaling procedure to scale non-negative matrices to doubly stochastic matrices (see for example [LSW98] and references therein). The algorithm alternates applying a “normalizing” basis change from the left and right to the given matrices, so as to alternately make the operator or its dual trace preserving. The idea is that this process will converge to a doubly stochastic operator iff it is not rank-decreasing, and furthermore, we can bound the number tt of iterations by poly(n,log(M))\textnormal{poly}(n,\log(M)). We will use the following to measure of our operator to being doubly stochastic.

Given a completely positive operator TT, define its right normalization TRT_{R} as follows:

Given a completely positive operator TT, define its left normalization TLT_{L} as follows:

Note that TL(I)=IT_{L}(I)=I. These operations are referred to as row and column operations in [Gur04].

We next prove that if a completely positive operator TT is doubly stochastic, or close to being one in this measure, then it is rank non-decreasing. This will be important for the termination condition of Algorithm GG.

If TT is a completely positive operator which is right ((or left)) normalized satisfying ds(T)1/(n+1)\textnormal{ds}(T)\leq 1/(n+1), then TT is rank non-decreasing.

Suppose X0X\succeq 0 is a psd matrix s.t. Rank(X)=r\text{Rank}(X)=r. We would like to prove that Rank(T(X))r\text{Rank}(T(X))\geq r. Let

be the eigenvalue decomposition of XX where viv_{i}’s are orthonormal and λi>0\lambda_{i}>0. Then

Let us denote T(vivi)T\left(v_{i}v_{i}^{\dagger}\right) by RiR_{i}. Since T(I)=IT(I)=I, we get that i=1rRiT(I)=I\sum_{i=1}^{r}R_{i}\preceq T(I)=I. Also note that

This is because RiR_{i}’s are psd matrices and hence a vector is in the kernel of i=1rλiRi\sum_{i=1}^{r}\lambda_{i}R_{i} iff it is in the kernel of all the RiR_{i}’s. Because of i=1rRiI\sum_{i=1}^{r}R_{i}\preceq I, we get that

Suppose T(I)=I+ΔT^{*}(I)=I+\Delta. We know that tr[Δ2]1/(n+1)\textnormal{tr}[\Delta^{2}]\leq 1/(n+1). Then

Let PP denote the projection onto {v1,,vr}\{v_{1},\ldots,v_{r}\}. Adding the above equations for all i{1,,r}i\in\{1,\ldots,r\}, we get

which completes the proof since Rank(T(X))\text{Rank}\left(T(X)\right) is an integer. ∎

We are now ready to present the algorithm.

In algorithm GG, the operators are maintained in terms of the Kraus operators and it is easy to see the effect of right and left normalizations on the Kraus operators. In fact, they are named such since for a right (left) normalization, the Kraus operators multiplied on the right (left) by T(I)1/2(T(I)1/2)T^{*}(I)^{-1/2}\left(T(I)^{-1/2}\right).

The main objective is to analyze the minimum number of steps tt for which this algorithm terminates with the correct answer. The following theorem of [Gur04] gives the time analysis of the algorithm assuming an initial lower bound on the capacity of the input completely positive operator. In the next subsection we will prove our main result, an appropriate lower bound, which shows that the algorithm terminates with the correct answer in polynomial time.

Let TT be a completely positive operator that is rank non-decreasing, with Kraus operators A1,,AmA_{1},\ldots,A_{m}, which are n×nn\times n integer matrices such that each entry of AiA_{i} has absolute value at most MM. Suppose TRT_{R} is the right normalization of TT. If cap(TR)f(n,M)\textnormal{cap}(T_{R})\geq f(n,M), then Algorithm GG when applied for at least t=2+36nlog(1/f(n,M))t=2+36n\cdot\log(1/f(n,M)) steps is correct.

For completeness sake, we provide a full proof of this theorem. Again, this analysis follows similar ones for the classical Sinkhorn iterations, e.g. as in [LSW98, GY98]. Basically, capacity increases by a factor roughly 1+1/36n1+1/36n per iteration as long as it is not too close to 1.

If either T(I)T(I) or T(I)T^{*}(I) is singular, then TT is rank-decreasing. When T(I)T(I) is singular, TT decreases the rank of II. When T(I)T^{*}(I) is singular, any vector in the kernel of T(I)T^{*}(I) lies in the kernels of all the AiA_{i}’s. If T(I)T(I) and T(I)T^{*}(I) are both non-singular, it is easy to verify that Tj(I)T_{j}(I) and Tj(I)T_{j}^{*}(I) will remain non-singular for all jj and hence step 2 is well defined. Also using Theorem 2.12 and the fact that right and left normalizations don’t change the property of being rank decreasing, Algorithm GG will always output rank decreasing if TT is rank-decreasing.

So what is left to prove is if TT is rank non-decreasing, then min{ϵj:1jt}1/6n\min\{\epsilon_{j}:1\leq j\leq t\}\leq 1/6n. Assume to the contrary that it is not. Denote by capj\textnormal{cap}_{j} to be the capacity of the operator TjT_{j}. By Lemma 5.2 below (which essentially is a robust version of the AM-GM inequality), if ϵj>1/6n\epsilon_{j}>1/6n, then capj+1exp(1/36n)capj\textnormal{cap}_{j+1}\geq\textnormal{exp}(1/36n)\cdot\textnormal{cap}_{j}.

From the assumption of the theorem, we know that cap1f(n,M)\textnormal{cap}_{1}\geq f(n,M). Also it is easy to see that capj1\textnormal{cap}_{j}\leq 1 for all jj (Proposition 2.8). However by the assumption that min{ϵj:1jt}>1/6n\min\{\epsilon_{j}:1\leq j\leq t\}>1/6n and the increase in capacity per iteration seen above, we get that

Plugging in t=2+36nlog(1/f(n,M))t=2+36n\cdot\log(1/f(n,M)) gives us the required contradiction.

In the main Theorem 2.20 below we will prove that the quantity f(n,M)f(n,M) used in the statement of Theorem 2.14 is exp(4nlog(Mmn))\geq\textnormal{exp}(-4n\log(Mmn)), which will prove that the number of iterations needed in Algorithm GG is O(n2log(Mmn))O(n^{2}\log(Mmn)). But this alone doesn’t guarantee that the algorithm is actually polynomial time, since the bit sizes of numbers involved might get exponential. As it happens, simple truncation suffices to overcome this problem, as shown in [Gur04], and we reproduce this analysis in Subsection 4.2 for completeness.

3 Main Theorem: The Lower Bound on Capacity

In this subsection we prove our main theorem, a lower bound on capacity of an operator in terms of its description size, in Theorem 2.20. Before diving into the proof we provide a high level plan. We will first prove that if a completely positive operator with integer entries is rank non-decreasing (that is has positive capacity), then the capacity is actually non-negligible. Our starting point (Theorem 1.4) is the statement of the equivalence between the rank non-decreasing property and an algebraic condition (non-vanishing of a certain determinant). Using Alon’s Combinatorial Nullstellensatz, we can ensure small coefficients in the algebraic condition above. We then state and prove (Lemma 2.17) a Cauchy-Schwartz inequality for matrices needed next. The main result (Theorem 2.18 ) proceeds by applying an appropriate scaling to the original integral operator, which reduces it to one that preserves the identity matrix. The latter property (which provides a trace bound), the integrality of the original operator (which provides an integral determinantal lower bound) and the multiplicativity of capacity under scaling combine (via the above inequality) to give the desired bound. We now proceed with the details of this plan.

We will need the fact that (nonzero) polynomials of degree dd cannot vanish on all points with non-negative integer coordinates with sum d\leq d. This follows from Alon’s combinatorial nullstellensatz [Alo99, Theorem 1.2].

As a corollary of Theorem 1.4 and of Lemma 2.15, we get the following:

Furthermore, D1,,DmD_{1},\ldots,D_{m} can be chosen to be matrices with non-negative integer entries s.t.

This is because the total degree of pp is ndnd and the individual degree of each variable Xj(k,l)X_{j}(k,l) in pp is at most nn. This implies the desired upper bound. ∎

We will also need the following Cauchy-Schwarz like inequality:

Let C1,,CmC_{1},\ldots,C_{m} and D1,,DmD_{1},\ldots,D_{m} be d1d_{1} and d2d_{2} dimensional complex matrices, respectively. Then

The first inequality is just tr[YY]0\textnormal{tr}\left[YY^{\dagger}\right]\geq 0. The second equality follows from linearity of trace, tr[YZ]=tr[Y]tr[Z]\textnormal{tr}\left[Y\otimes Z\right]=\textnormal{tr}[Y]\cdot\textnormal{tr}[Z] and tr[XY]=tr[YX]\textnormal{tr}[XY^{\dagger}]=\textnormal{tr}[YX^{\dagger}]. Rearranging the above, we get the following inequality:

Summing Equation (1) over pairs (i,j)(i,j) with i<ji<j, we obtain the following:

Adding itr[CiCi]tr[DiDi]\sum_{i}\textnormal{tr}\left[C_{i}C_{i}^{\dagger}\right]\cdot\textnormal{tr}\left[D_{i}D_{i}^{\dagger}\right] to both sides completes the proof (due to linearity of trace). ∎

Now we are ready to prove our main theorem for operators with integral Kraus operators.

Since TAT_{A} is rank non-decreasing, by Corollary 2.16, there exist non-negative integer matrices D1,,DmD_{1},\ldots,D_{m} of some dimension dd s.t. Det(D1A1++DmAm)0\textnormal{Det}(D_{1}\otimes A_{1}+\cdots+D_{m}\otimes A_{m})\neq 0 and also

Since A1,,AmA_{1},\ldots,A_{m} are also integer matrices, Det(D1A1++DmAm)\textnormal{Det}(D_{1}\otimes A_{1}+\cdots+D_{m}\otimes A_{m}) is a non-zero integer and hence

Let X0X\succ 0 be such that Det(X)=1\textnormal{Det}(X)=1. Consider the matrices Ci=TA(X)1/2AiX1/2C_{i}=T_{A}(X)^{-1/2}A_{i}X^{1/2}. This is a scaling intended so that the operator TCT_{C} defined by the matrices CiC_{i} preserves identity i.e. satisfies

Let YY denote the matrix D1C1++DmCmD_{1}\otimes C_{1}+\cdots+D_{m}\otimes C_{m}. Then

The equality follows from multiplicativity of determinant and the fact that Det(IdY)=Det(Y)d\textnormal{Det}\left(I_{d}\otimes Y\right)=\textnormal{Det}(Y)^{d}. The inequality follows from Det(X)=1\textnormal{Det}(X)=1 and Equation (3). Hence we obtain

On the other hand, by the AM-GM inequality

The first inequality follows from Lemma 2.17. The second equality follows from Equation (4). The second inequality follows from Equation (2). Combining equations (7), (8) and (9), we obtain:

Since XX was an arbitrary matrix s.t. X0X\succ 0 and Det(X)=1\textnormal{Det}(X)=1, we obtain the desired lower bound on cap(TA)\textnormal{cap}(T_{A}). ∎

A slightly stronger bound can be obtained by using quantum permanents [Gur04] in the case when the DiD_{i}’s are of constant dimension.

Theorem 2.18 immediately implies the following capacity lower bound that we need.

Suppose TT is a cptp map which is rank non-decreasing and is obtained by right normalization of an operator with Kraus operators A1,,AmA_{1},\ldots,A_{m}, which are n×nn\times n integer matrices such that each entry of AiA_{i} has absolute value at most MM. Then cap(T)exp(4nlog(Mmn))\textnormal{cap}(T)\geq\textnormal{exp}(-4n\log(Mmn)).

Let TAT_{A} be the operator defined by A1,,AmA_{1},\ldots,A_{m}. Since TT is the right normalization of TAT_{A}, by Proposition 2.7, we get that

Note that T(I)T^{*}(I) is a psd matrix. Also

This implies (via the AM-GM inequality) that

Combining Equations (10) and (11), and Theorem 2.18 gives the desired bound. ∎

Properties of Capacity

In this section, we prove some interesting properties of capacity. In subsection 3.1, we prove that the capacity of operators which are close to doubly stochastic is close to 11. This is used in the proof of Theorem 3.11. In subsection 3.2, we characterize the almost optimal points of capacity in terms of approximate fixed points of an operator. This will be used later in section 4 in the proof of continuity of capacity (theorem 4.5). In 3.3, we prove a multiplicativity property of capacity under tensor products of operators. This property was used in the lower bound on capacity in the previous version of this paper. While it is no longer needed for this purpose now, it is an interesting and intriguing fact by itself, so we include a proof.

The following definition of capacity for non-negative matrices is due to [GY98]. They used it to analyze Sinkhorn’s algorithm for matrix scaling.

Given a non-negative matrix AA, its capacity is defined as follows:

The next lemma states that the capacity of almost doubly stochastic matrices is close to 11.

Let AA be a non-negative n×nn\times n matrix s.t. the row sums of AA are 11 and

where c1,,cnc_{1},\ldots,c_{n} are the column sums of AA. Then

For the inequality, we used concavity of log and the fact that row sums of BB are 11. For the third equality, we used that the column sums of BB are 11.

Now we need to move on to AA which is almost doubly stochastic. A direct argument like for the doubly stochastic case does not work. We need to first prove the following claim ([LSW98]):

There exists a decomposition of A=λB+ZA=\lambda B+Z, where BB is doubly stochastic, ZZ is non-negative and λ1nϵ\lambda\geq 1-\sqrt{n\epsilon}.

Given the claim, it is easy to complete the proof of the lemma.

So we end with a proof of the claim from [LSW98]. We will first prove that there is a decomposition A=D+ZA=D+Z where DD is multiple of a doubly stochastic matrix and per(Z)=0\text{per}(Z)=0. Initially, we start with D=0D=0 and Z=AZ=A. As long as per(Z)>0\text{per}(Z)>0, there is an α>0\alpha>0 and a permutation matrix PP s.t. Z=ZαPZ^{\prime}=Z-\alpha P is nonnegative and the number of non-zero entries in ZZ^{\prime} is strictly less than that of ZZ. Replace DD by D=D+αPD^{\prime}=D+\alpha P and ZZ by ZZ^{\prime}. Note that DD^{\prime} is also a multiple of a doubly stochastic matrix. After at most n2n^{2} iterations, we will find a decomposition A=λB+ZA=\lambda B+Z, where per(Z)=0\text{per}(Z)=0 (and BB is doubly stochastic). We will now prove that λ1nϵ\lambda\geq 1-\sqrt{n\epsilon}. If λ=1\lambda=1, then we are already done, so assume λ<1\lambda<1. In this case, consider the matrix C=Z1λC=\frac{Z}{1-\lambda}. Row sums of CC are 11 and the ithi^{\text{th}} column sum ci=ciλ1λc_{i}^{\prime}=\frac{c_{i}-\lambda}{1-\lambda}. Then

Since per(C)=0\text{per}(C)=0 and row sums are 11, it follows that (see Lemma 5.2 in [LSW98])

which implies λ1nϵ\lambda\geq 1-\sqrt{n\epsilon}

The next lemma says that capacity of almost doubly stochastic operators is close to 11. The proof will proceed by reducing the operator case to the non-negative matrix case established above.

Let TT be a positive operator acting on n×nn\times n matrices such that tr[(T(I)I)2]ϵ\textnormal{tr}[(T(I)-I)^{2}]\leq\epsilon and T(I)=IT^{*}(I)=I (equivalently TT is trace-preserving). Then cap(T)(1nϵ)n\textnormal{cap}(T)\geq(1-\sqrt{n\epsilon})^{n}.

The proof can be adapted to obtain a similar statement when both T(I),T(I)IT(I),T^{*}(I)\approx I.

Let XX be a positive definite matrix s.t. Det(X)=1\textnormal{Det}(X)=1. Let

be an eigenvalue decomposition of XX with v1,,vnv_{1},\ldots,v_{n} forming a orthonormal basis. Then

Let us denote T(vjvj)T(v_{j}v_{j}^{\dagger}) by RjR_{j}. Then since TT is trace preserving, we have that tr(Rj)=1\textnormal{tr}(R_{j})=1 for all jj. Also let

be an eigenvalue decomposition for T(X)T(X) with u1,,unu_{1},\ldots,u_{n} forming a orthonormal basis. It follows that for all ii,

Let AA denote the non-negative matrix s.t. Ai,j=uiRjuiA_{i,j}=u_{i}^{\dagger}R_{j}u_{i}. Then σ=Aλ\sigma=A\lambda. Also the column sums of AA are all 11 since tr(Rj)=1\textnormal{tr}(R_{j})=1 for all jj. We will also prove that

where r1,,rnr_{1},\ldots,r_{n} denote the column sums of AA. Then the lemma follows from Lemma 3.2 (applied with row sums switched with column sums) and the facts that Det(T(X))=i=1nσi\textnormal{Det}(T(X))=\prod_{i=1}^{n}\sigma_{i} and Det(X)=i=1nλi\textnormal{Det}(X)=\prod_{i=1}^{n}\lambda_{i}. Now note that

The first inequality can be proved via convexity of square. ∎

We also prove an easy lemma for the other direction: operators with capacity almost 11 are almost doubly stochastic.

Let TT be a positive operator acting on n×nn\times n matrices s.t. cap(T)exp(δ)\textnormal{cap}(T)\geq\textnormal{exp}(-\delta), δ1/6\delta\leq 1/6 and also T(I)=IT^{*}(I)=I. Then ds(T)=tr[(T(I)I)2]6δ\textnormal{ds}(T)=\textnormal{tr}[(T(I)-I)^{2}]\leq 6\delta.

cap(T)exp(δ)\textnormal{cap}(T)\geq\textnormal{exp}(-\delta) implies that Det(T(I))exp(δ)\textnormal{Det}(T(I))\geq\textnormal{exp}(-\delta). Also tr[T(I)]=tr[T(I)]=n\textnormal{tr}[T(I)]=\textnormal{tr}\left[T^{*}(I)\right]=n. Suppose tr[(T(I)I)2]=α\textnormal{tr}[(T(I)-I)^{2}]=\alpha. Then Lemma 5.1 below implies that

This implies that α6δ\alpha\leq 6\delta. ∎

Note that the parameters in Lemmas 3.4 and 3.6 don’t match and that is because capacity and the distance to doubly stochasticity don’t characterize each other exactly.

2 Characterization of approximate optimizers for capacity

Let TT be a completely positive operator. Define Fixed(T,ϵ)\text{Fixed}(T,\epsilon) to be the set of hermitian positive-definite matrices which are ϵ\epsilon-approximate fixed point of the operator XT(T(X)1)1X\rightarrow T^{*}(T(X)^{-1})^{-1} i.e. CFixed(T,ϵ)C\in\text{Fixed}(T,\epsilon) if the following holds:

The following lemma essentially says that the elements of Fixed(T,ϵ)\text{Fixed}(T,\epsilon) are approximate minimizers for cap(T)\textnormal{cap}(T).

Suppose CFixed(T,ϵ)C\in\text{Fixed}(T,\epsilon). If ϵ1/(n+1)\epsilon\leq 1/(n+1), then TT is rank non-decreasing. Furthermore,

Here nn is the size of matrices on which TT acts. Similar statement also holds for the operator XT(T(X)1)1X\rightarrow T(T^{*}(X)^{-1})^{-1}.

We will use a CFixed(T,ϵ)C\in\text{Fixed}(T,\epsilon) to find a scaling of TT which is almost doubly stochastic and then apply Theorem 2.12 (saying that almost doubly stochastic operators are rank non-decreasing) and Lemma 3.4 (giving a lower bound on capacity of almost doubly stochastic operators).

When ϵ1/(n+1)\epsilon\leq 1/(n+1), by Theorem 2.12, ZZ is rank non-decreasing and hence TT is rank non-decreasing. Also by Lemma 3.4,

The upper bound on cap(T)\textnormal{cap}(T) is clear from the definition of capacity. ∎

Next we prove the other direction but again, not with matching parameters.

where δ1/6\delta\leq 1/6. Then CFixed(T,6δ)C\in\text{Fixed}(T,6\delta).

As in the proof of Lemma 3.9, define the operator

This implies that CFixed(T,6δ)C\in\text{Fixed}(T,6\delta) because of the way operator ZZ is set up. ∎

3 Multiplicativity of capacity

In this subsection, we prove a curious multiplicativity property of capacity of completely positive operators. This is not required for the lower bound on capacity presented in this version but the easy direction of this theorem was required in the lower bound on capacity in the previous version.

Let T1T_{1} and T2T_{2} be completely positive operators where T1T_{1} acts on matrices of dimension d1d_{1} and T2T_{2} acts on matrices of dimension d2d_{2}. Then

We first prove the easy \leq direction of the above theorem. If XX and YY are the minimizers for cap(T1)\textnormal{cap}(T_{1}) and cap(T2)\textnormal{cap}(T_{2}), then looking at how much T1T2T_{1}\otimes T_{2} shrinks the determinant of XYX\otimes Y will give us the required statement. The infimum for cap(T1)\textnormal{cap}(T_{1}) and cap(T2)\textnormal{cap}(T_{2}) might not be achieved but that is fine.

Let T1T_{1} and T2T_{2} be completely positive operators where T1T_{1} acts on matrices of dimension d1d_{1} and T2T_{2} acts on matrices of dimension d2d_{2}. Then

Suppose XX, YY be d1d_{1} and d2d_{2} dimensional matrices respectively s.t. X,Y0X,Y\succ 0 and Det(X),Det(Y)=1\textnormal{Det}(X),\textnormal{Det}(Y)=1. Then XY0X\otimes Y\succ 0 and Det(XY)=1\textnormal{Det}(X\otimes Y)=1 as well. Also

This proves that cap(T1)d2cap(T2)d1cap(T1T2)\textnormal{cap}(T_{1})^{d_{2}}\cdot\textnormal{cap}(T_{2})^{d_{1}}\geq\textnormal{cap}(T_{1}\otimes T_{2}) (by taking XX and YY to be sequences of matrices which approach cap(T1)\textnormal{cap}(T_{1}) and cap(T2)\textnormal{cap}(T_{2}) respectively). Taking 1/(d1d2)1/(d_{1}d_{2}) powers on both sides completes the proof of the easy direction.

We are now ready to prove Theorem 3.11. The main ingredient is the duality between capacity being and the existence of a scaling to almost doubly stochastic position.

More formally, S1(I)=I,S2(I)=IS_{1}^{*}(I)=I,S_{2}^{*}(I)=I and tr[(S1(I)I)2]ϵ,tr[(S2(I)I)2]ϵ\textnormal{tr}\left[(S_{1}(I)-I)^{2}\right]\leq\epsilon,\textnormal{tr}\left[(S_{2}(I)-I)^{2}\right]\leq\epsilon. Now consider the operator S=S1S2S=S_{1}\otimes S_{2}. Note that S(I)=S1(I)S2(I)S(I)=S_{1}(I)\otimes S_{2}(I) and S(I)=S1(I)S2(I)S^{*}(I)=S_{1}^{*}(I)\otimes S_{2}^{*}(I). Hence S(I)=IS^{*}(I)=I and tr[(S(I)I)2]ϵ=2(d1+d2+ϵ)ϵ\textnormal{tr}\left[(S(I)-I)^{2}\right]\leq\epsilon^{\prime}=2(d_{1}+d_{2}+\epsilon)\epsilon. This can be seen from the following chain of inequalities (and equalities):

The first inequality follows from the following Cauchy-Schwarz inequality for psd matrices:

The second inequality follows from tr[S1(I)2]d1+ϵ\textnormal{tr}[S_{1}(I)^{2}]\leq d_{1}+\epsilon, which follows from the fact that tr[S1(I)]=tr[S1(I)]=tr[Id1]=d1\textnormal{tr}[S_{1}(I)]=\textnormal{tr}[S_{1}^{*}(I)]=\textnormal{tr}\left[I_{d_{1}}\right]=d_{1} and that tr[(S1(I)I)2]ϵ\textnormal{tr}\left[(S_{1}(I)-I)^{2}\right]\leq\epsilon, as well as tr[(S2(I)I)2]ϵ\textnormal{tr}\left[(S_{2}(I)-I)^{2}\right]\leq\epsilon. Note that the operator SS is explicitly given as follows:

By Lemma 3.4, we have that cap(S)(1d1d2ϵ)d1d2\textnormal{cap}(S)\geq(1-\sqrt{d_{1}d_{2}\epsilon^{\prime}})^{d_{1}d_{2}} and hence

Combining the fact that S1(I)=IS_{1}^{*}(I)=I and S2(I)=IS_{2}^{*}(I)=I along with Proposition 2.8, we get that cap(S1),cap(S2)1\textnormal{cap}(S_{1}),\textnormal{cap}(S_{2})\leq 1. Also by Proposition 2.7,we have that

Combining equations (12), (13) and (14), we get that

Since ϵ=2(d1+d2+ϵ)ϵ\epsilon^{\prime}=2(d_{1}+d_{2}+\epsilon)\epsilon can be taken to be arbitrarily close to , this completes the proof. ∎

Bit Complexity Analysis of Algorithm G𝐺G and Continuity of Capacity

In this section, we will provide the bit complexity analysis of Algorithm GG and also provide explicit bounds on the continuity of capacity by a sensitivity analysis of Algorithm GG. We start by analyzing a natural iterative sequence associated with an operator TT that will be very useful, both, in the bit complexity and continuity analysis. Section 4.1 will describe how the operators TjT_{j} in Algorithm GG evolve with respect this iterative sequence. Section 4.2 provides the bit complexity analysis of Algorithm GG and Section 4.3 provides explicit bounds for continuity of capacity.

Consider the sequence of matrices given by S0=T(I)S_{0}=T^{*}(I) (II is of size n×nn\times n), and

The next proposition studies the stability properties of this sequence of matrices.

For a real symmetric positive definite matrix XX, let l(X)l(X) denote its smallest eigenvalue and u(X)u(X) its largest.

Suppose AA is an n×nn\times n real symmetric positive definite matrix with integer entries s.t. the magnitude of any entry of AA is at most MM. Then l(A)1(Mn)n1l(A)\geq\frac{1}{(Mn)^{n-1}} and u(A)Mnu(A)\leq Mn.

Define α=(M2n2m)n1\alpha=(M^{2}n^{2}m)^{n-1}. Let TT be a completely positive operator whose Kraus operators A1,,AmA_{1},\ldots,A_{m} are n×nn\times n integer matrices and each entry of AiA_{i} is of magnitude at most MM. Also assume that T(I),T(I)T(I),T^{*}(I) are both non-singular. Then

Let T,M,n,mT,M,n,m be as before. Let XX be a real symmetric matrix. Then T(X)αX||T(X)||\leq\alpha||X||. Also if X is real symmetric positive definite, then l(T(X))α1l(X)l(T(X))\geq\alpha^{-1}\cdot l(X) and u(T(X))αu(X)u(T(X))\leq\alpha\cdot u(X). Similarly T(X)αX||T^{*}(X)||\leq\alpha||X||, l(T(X))α1l(X)l(T^{*}(X))\geq\alpha^{-1}\cdot l(X) and u(T(X))αu(X)u(T^{*}(X))\leq\alpha\cdot u(X).

SjS_{j} are real symmetric positive definite matrices for all 0jt0\leq j\leq t. Also l(Sj)α(j+1)l(S_{j})\geq\alpha^{-(j+1)} and u(Sj)αj+1u(S_{j})\leq\alpha^{j+1} for all 0jt0\leq j\leq t.

For all 1k,ln1\leq k,l\leq n, Sj(k,l)αj+1|S_{j}(k,l)|\leq\alpha^{j+1}.

The last inequality follows from Cauchy-Schwarz. Now since AA is positive definite and the determinant is an integer, Det(A)1\textnormal{Det}(A)\geq 1. Then

from which it follows that l(A)1(Mn)n1l(A)\geq\frac{1}{(Mn)^{n-1}}.

Follows from previous point by noting that each entry of T(I),T(I)T(I),T^{*}(I) is an integer of magnitude at most M2nmM^{2}nm and T(I),T(I)T(I),T^{*}(I) are both symmetric matrices.

Suppose XX is a real symmetric matrix satisfying X=β||X||=\beta. Then

since TT is completely positive and hence it maps psd matrices to psd matrices. Thus we get

Similarly one can prove that βαIT(X)\beta\alpha I\succeq T(X) which would imply T(X)αβ=αX||T(X)||\leq\alpha\beta=\alpha||X||. Other statements can be proven in a similar manner.

We will prove this via induction on jj. It is true for S0S_{0} by point 2 in the proposition. Suppose the statement holds for Sj1S_{j-1}. Then Sj=T(Sj11)S_{j}=T(S_{j-1}^{-1}) or Sj=T(Sj11)S_{j}=T^{*}(S_{j-1}^{-1}). It does not really matter which case it is for the purpose of this proof. Lets assume the former. Then

The first inequality is by point 3 in the proposition. The second inequality is by induction hypothesis. Similarly we can also prove that u(Sj)αj+1u(S_{j})\leq\alpha^{j+1}, which would complete the induction step.

Here eke_{k} and ele_{l} are the standard basis vectors.

Define another sequence of matrices defined by U0=S0U_{0}=S_{0} and

where Δj\Delta_{j}’s are small perturbations. We now study the closeness of the sequences {Uj}\{U_{j}\} and {Sj}\{S_{j}\}.

If Δj12jαj+1||\Delta_{j}||\leq\frac{1}{2^{j}\cdot\alpha^{j+1}} for all j1j\geq 1, then l(Uj)12jαj+1l(U_{j})\geq\frac{1}{2^{j}\cdot\alpha^{j+1}} and u(Uj)2jαj+1u(U_{j})\leq 2^{j}\cdot\alpha^{j+1}, for all j0j\geq 0. Here α=(M2n2m)n1\alpha=(M^{2}n^{2}m)^{n-1}.

We will prove this using induction on jj. It is true for j=0j=0 since U0=S0U_{0}=S_{0}. Assume that l(Uj)12jαj+1l(U_{j})\geq\frac{1}{2^{j}\cdot\alpha^{j+1}} and u(Uj)2jαj+1u(U_{j})\leq 2^{j}\cdot\alpha^{j+1}. Then Uj+1=T(Uj1)+Δj+1U_{j+1}=T(U_{j}^{-1})+\Delta_{j+1} or Uj+1=T(Uj1)+Δj+1U_{j+1}=T^{*}(U_{j}^{-1})+\Delta_{j+1}. Suppose it is T(Uj1)+Δj+1T(U_{j}^{-1})+\Delta_{j+1}. The other case is similar.

Note that there is a lot of slack in this part, but it does not matter for our purposes. ∎

The following holds for every integer tt: suppose Δjδ||\Delta_{j}||\leq\delta for all 1jt1\leq j\leq t, where δ1(2α)t+1\delta\leq\frac{1}{(2\alpha)^{t+1}}. Then

for all 0jt0\leq j\leq t. Here α=(M2n2m)n1\alpha=(M^{2}n^{2}m)^{n-1}.

For the fourth inequality we used Proposition 4.1 and Lemma 4.2. Note that value of δ\delta is small enough so that conditions of Lemma 4.2 are satisfied. ∎

In this subsection we compute succinct expressions for the operators TjT_{j} which appear in Algorithm GG, together with expressions for their capacity and distance to doubly stochastic. These expressions will involve the matrices SjS_{j} defined in the previous subsection. These operators TjT_{j} are the intermediary operators in the scaling procedure, and the succinct expressions will be important for us when approximating capacity, or proving the stability of the capacity of a quantum operator.

Starting from a completely positive operator TT which satisfies T(I)=IT(I)=I, denote the sequence of operators obtained after right and left normalizations by {Tj}\{T_{j}\} i.e. TjT_{j} is the operator obtained after jj iterations of right and left normalizations (as in Algorithm GG). Note that

For each jj, TjT_{j} is an operator scaling of TT. So

for some non-singular matrices CjC_{j} and DjD_{j}. Let us denote CjCjC_{j}^{\dagger}C_{j} by PjP_{j} and DjDjD_{j}^{\dagger}D_{j} by QjQ_{j}. It turns out PjP_{j} and QjQ_{j} are the matrices that really matter for the purpose of analyzing Algorithm GG and we will see next how these evolve.

When jj is odd, Tj(I)=IT_{j}^{*}(I)=I. Note that

So the condition Tj(I)=IT_{j}^{*}(I)=I gives us that Qj=T(Pj)1Q_{j}=T^{*}(P_{j})^{-1}. Also note that when jj is odd, Pj=Pj1P_{j}=P_{j-1}. When jj is even, we have, Tj(I)=IT_{j}(I)=I, which implies Pj=T(Qj)1P_{j}=T(Q_{j})^{-1} and also Qj=Qj1Q_{j}=Q_{j-1} holds. This can be summarized as follows:

along with T0=TT_{0}=T and P0,Q0=IP_{0},Q_{0}=I. Thus we can see that

with the convention S1=IS_{-1}=I. For Algorithm GG, what matters is

we get (because similar matrices have the same trace) that

For the computation of capacity, the determinants that matter are

By the discussion above and the fact that similar matrices have the same determinants, we get that

with the convention S2,S1=IS_{-2},S_{-1}=I. Thus we get

2 Bit Complexity Analysis of Algorithm G𝐺G

We are now ready to analyze the bit complexity of Algorithm GG. We will prove that if one runs Algorithm GG while truncating the numbers to an appropriate polynomial number of bits there is essentially no change in the convergence and required number of iterations. We will call the algorithm working with truncated inputs Algorithm GG^{\prime}.

Given a matrix AA, let Trn(A)\text{Trn}(A) be the matrix obtained by truncating the entries of A up to P(n,log(M))P(n,\log(M)) bits after the decimal point. P(n,log(M))P(n,\log(M)) is a polynomial which we will specify later. Note that ATrn(A)2P(n,log(M))||A-\text{Trn}(A)||_{\infty}\leq 2^{-P(n,\log(M))}. We now describe Algorithm GG^{\prime}, which is the variant of Algorithm GG with truncation.

The parameter tt will be chosen as before: t=2+36nlog(1/f(n,M))t=2+36n\cdot\log(1/f(n,M)). We now show that throughout the iterations, distances to double stochasticity is essentially the same for the original and truncated algorithms GG and GG^{\prime}.

Here ϵj=ds(Tj)\epsilon_{j}=\textnormal{ds}(T_{j}) as defined in Algorithm GG. Also note that by equations (19a) and (19b),

Hence by Theorem 2.12, TT is rank non-decreasing. Now assume, for the reverse direction, that TT is rank non-decreasing. Then by the analysis in Section 2.2, min{ϵj:1jt}1/12n\min\{\epsilon_{j}:1\leq j\leq t\}\leq 1/12n. Hence by Lemma 4.4

This proves the correctness of Algorithm GG^{\prime}.

where the last inequality follows from Proposition 4.1. Also

where the last inequality follows from Lemma 4.2 and the fact that P(n,log(M))P(n,\log(M)) will be chosen to be a large enough polynomial so that the perturbations Δj\Delta_{j} in the lemma satisfy the condition Δj2P(n,log(M))12jαj+1||\Delta_{j}||\leq 2^{-P(n,\log(M))}\leq\frac{1}{2^{j}\cdot\alpha^{j+1}}. Now

In the first equality, we used the fact that tr(MjNj)=tr(NjMj)\textnormal{tr}(M_{j}N_{j})=\textnormal{tr}(N_{j}M_{j}). In the second inequality, we used the fact that the maximum magnitude of any entry in Mj+NjM_{j}+N_{j} is bounded by (2α)2j(2\alpha)^{2j} and that tr(MjNj)|\textnormal{tr}(M_{j}-N_{j})| is upper bounded by nMjNjn||M_{j}-N_{j}||. The third inequality is just Cauchy-Schwarz and the fourth is the fact that Frobenius norm of a matrix is upper bounded by nMjNj2n||M_{j}-N_{j}||^{2}. Let us try to upper bound MjNj||M_{j}-N_{j}|| now.

The second last inequality follows by application of Proposition 4.1 and Lemmas 4.2 and 4.3. Also δ\delta here is at most n2P(n,log(M))n\cdot 2^{-P(n,\log(M))} since Δj2P(n,log(M))||\Delta_{j}||_{\infty}\leq 2^{-P(n,\log(M))}.

Now α=(M2n2m)n1=exp(Θ(nlog(n)log(M)))\alpha=(M^{2}n^{2}m)^{n-1}=\textnormal{exp}(\Theta(n\log(n)\log(M))) (since mn2m\leq n^{2}). Hence

We proved in Theorem 2.20 that log(1/f(n,M))\log(1/f(n,M)) is poly(n,log(M))\textnormal{poly}(n,\log(M)), so P(n,log(M))P(n,\log(M)) is also a polynomial in nn and log(M)\log(M). ∎

3 Continuity of Capacity

In this section, we prove the continuity of capacity. We prove the following theorem:

Suppose A1,,AmA_{1},\ldots,A_{m} and B1,,BmB_{1},\ldots,B_{m} are two tuples of n×nn\times n matrices such that the bit-complexity of elements of AiA_{i}’s is bb and AiBiδ||A_{i}-B_{i}||\leq\delta for all ii. Let TAT_{A} be the operator defined by A1,,AmA_{1},\ldots,A_{m} and TBT_{B} be the operator defined by B1,,BmB_{1},\ldots,B_{m}. Then there exists a polynomial P(n,b,log(m))P(n,b,\log(m)) s.t. if δexp(P(n,b,log(m)))\delta\leq\textnormal{exp}(-P(n,b,\log(m))), then cap(TA)>0\textnormal{cap}(T_{A})>0 implies cap(TB)>0\textnormal{cap}(T_{B})>0. Furthermore,

The fact that capacity is continuous is already mentioned in [Gur04] and can be proved by other methods. But here we provide explicit bounds on the continuity parameters. Recall that Fixed(T,ϵ)\text{Fixed}(T,\epsilon) (defined in subsection 3.2) is the set of hermitian positive-definite matrices CC which are ϵ\epsilon-approximate fixed points of the operator XT(T(X)1)1X\rightarrow T(T^{*}(X)^{-1})^{-1} i.e. satisfy the following condition:

The main insight is that by the analysis of Algorithm GG, since cap(TA)>0\textnormal{cap}(T_{A})>0, there exists a CFixed(TA,ϵ)C\in\text{Fixed}(T_{A},\epsilon) with low spectral norm. Since TBT_{B} is close to TAT_{A}, CFixed(TB,ϵ)C\in\text{Fixed}(T_{B},\epsilon^{\prime}) for ϵ\epsilon^{\prime} close to ϵ\epsilon and then the proof is finished by applying Lemma 3.9.

(Of Theorem 4.5) The proof of Theorem 2.14 can be modified to prove the following: there exists a polynomial Q(n,b)Q(n,b) s.t. for all ϵ>0\epsilon>0, if we run t=Q(n,b)/ϵt=Q(n,b)/\epsilon iterations of Algorithm GG starting from T0=TAT_{0}=T_{A} satisfying cap(TA)>0\textnormal{cap}(T_{A})>0, then for some 1jt1\leq j\leq t, ϵj=ds(Tj)ϵ\epsilon_{j}=\textnormal{ds}(T_{j})\leq\epsilon. Essentially, at each step capacity increases by roughly exp(Ω(ϵ))\textnormal{exp}(\Omega(\epsilon)) if ϵj>ϵ\epsilon_{j}>\epsilon, capacity is lower bounded by exp(Q(n,b))\textnormal{exp}(-Q(n,b)) initially and upper bounded by 11 always. By equations (19a) and (19b), we know that

where {Si}\{S_{i}\} is the sequence of matrices given by S0=TA(I)S_{0}=T_{A}^{*}(I), and

Suppose 1rt1\leq r\leq t be such that ϵrϵ\epsilon_{r}\leq\epsilon. Wlog assume that rr is odd. Then Sr=TA(TA(Sr21)1)S_{r}=T_{A}\left(T_{A}^{*}\left(S_{r-2}^{-1}\right)^{-1}\right). ϵrϵ\epsilon_{r}\leq\epsilon implies that Sr21S_{r-2}^{-1} is an ϵ\epsilon-approximate fixed point of the operator XTA(TA(X)1)1X\rightarrow T_{A}\left(T_{A}^{*}(X)^{-1}\right)^{-1}. Hence by Lemma 3.9,

We will prove now that Sr21S_{r-2}^{-1} is also an ϵ\epsilon^{\prime}-approximate fixed point of the operator XTB(TB(X)1)1X\rightarrow T_{B}\left(T_{B}^{*}(X)^{-1}\right)^{-1} for an appropriate choice of ϵ\epsilon^{\prime}. Let us denote Sr21S_{r-2}^{-1} by CC. By an application of Proposition 4.1, it follows that the lowest and highest eigenvalues of CC, l(C)l(C) and u(C)u(C) satisfy

where Q1(n,b,log(m))Q_{1}(n,b,\log(m)) is another polynomial s.t. Q1(n,b,log(m))=O(Q(n,b)nblog(nm))Q_{1}(n,b,\log(m))=O(Q(n,b)\cdot n\cdot b\cdot\log(nm)). Let DD be an arbitrary matrix. Then

The first two inequalities are just the triangle inequality. The third inequality follows from submultiplicativity of the spectral norm. The fourth inequality follows from the fact that

We will now upper bound TA(TA(C)1)TB(TB(C)1)||T_{A}\left(T_{A}^{*}(C)^{-1}\right)-T_{B}\left(T_{B}^{*}(C)^{-1}\right)||:

Equation (22) along with Proposition 4.1 implies that the lowest and highest eigenvalues of TA(C)T_{A}^{*}(C) satisfy the following:

Now applying equation (23) with D=TA(C)1D=T_{A}^{*}(C)^{-1} along with equation (26) gives us the following:

Now we will upper bound TA(C)1TB(C)1||T_{A}^{*}(C)^{-1}-T_{B}^{*}(C)^{-1}||.

The last inequality follows from equations (26) and (24). Note that we need δexp(P(n,b,log(m)))\delta\leq\textnormal{exp}(-P(n,b,\log(m))) for a sufficiently large polynomial PP here to upper bound TB(C)1||T_{B}^{*}(C)^{-1}|| via equations (26) and (24). Now applying equation (23) with D=TA(C)1TB(C)1D=T_{A}^{*}(C)^{-1}-T_{B}^{*}(C)^{-1} along with equation (28) gives us that

We are left to upper bound TA(TA(C)1TB(C)1)||T_{A}\left(T_{A}^{*}(C)^{-1}-T_{B}^{*}(C)^{-1}\right)||. This follows from Proposition 4.1 and equation (28).

Combining equations (25), (27), (29) and (30), we get the following

Let us denote the matrix D1=CTA(TA(C)1)D_{1}=C\cdot T_{A}\left(T_{A}^{*}(C)^{-1}\right) and D2=CTB(TB(C)1)D_{2}=C\cdot T_{B}\left(T_{B}^{*}(C)^{-1}\right). D1D_{1} determines whether CC is an approximate-fixed point of TAT_{A} and D2D_{2} determines whether CC is an approximate-fixed point of TBT_{B}.

The second inequality follows from equations (22) and (31). We also have the following elementary inequality:

The above inequality implies that CC is an ϵ\epsilon^{\prime}-approximate fixed point of TBT_{B} for

We can now choose ϵ=12(n+1)\epsilon=\frac{1}{2(n+1)}. Then as long as δexp(Q11(n,b,log(m))/ϵ)2(n+1)\delta\leq\frac{\textnormal{exp}(-Q_{11}(n,b,\log(m))/\epsilon)}{2(n+1)}, then CC is a 1/(n+1)1/(n+1)-approximate fixed point of the operator XTB(TB(X)1)1X\rightarrow T_{B}\left(T_{B}^{*}(X)^{-1}\right)^{-1} and by Lemma 3.9, TBT_{B} is rank non-decreasing and hence cap(TB)>0\textnormal{cap}(T_{B})>0. This proves first part of the theorem. For the second part of the theorem, since CC is an ϵ\epsilon^{\prime}-approximate fixed point of TBT_{B} for

by equation (23) and Proposition 4.1. Hence

Now, combining equations (21) and (21), we get that

Combining this with equation (35) gives us

Now choose ϵ=2max{Q11(n,b,log(m)),Q12(n,b,log(m))}/log(1/δ)\epsilon=2\cdot\text{max}\left\{Q_{11}(n,b,\log(m)),Q_{12}(n,b,\log(m))\right\}/\log(1/\delta). This ensures that ϵ2ϵ\epsilon^{\prime}\leq 2\epsilon and elementary calculations can then finish the proof of the theorem. ∎

Computing the Capacity of a Quantum Operator

In this section, we show how algorithm GG can be used to compute an approximation to the capacity of any quantum operator. For simplicity of exposition, we will present in this section an analysis of convergence of algorithm GG without truncation. Afterwards, in subsection 5.1, we show how to adapt the analysis of algorithm GG to handle the truncation. This corresponds to the analysis of algorithm GG^{\prime} in the previous section.

We begin with the following lemma, which is an adaptation of Lemma 3.10 from [LSW98].

Let x1,,xnx_{1},\ldots,x_{n} be positive real numbers such that i=1nxi=n\displaystyle\sum_{i=1}^{n}x_{i}=n and i=1n(xi1)2=α\displaystyle\sum_{i=1}^{n}(x_{i}-1)^{2}=\alpha. Then

Where in the last inequalities we used the fact that i=1n(xi1)3(i=1n(xi1)2)3/2\displaystyle\sum_{i=1}^{n}(x_{i}-1)^{3}\leq\left(\displaystyle\sum_{i=1}^{n}(x_{i}-1)^{2}\right)^{3/2} and α3/2α\alpha^{3/2}\leq\alpha.

Consider the function f(λ)=i=1n(1+λ(xi1))f(\lambda)=\prod_{i=1}^{n}(1+\lambda(x_{i}-1)). We will prove that ff is a decreasing function of λ\lambda when λ\lambda\in. In that case

where the last inequality follows from Case 1. Now let us prove that ff is decreasing.

As a corollary of Lemma 5.1, we obtain the following quantitative progress measure towards computing capacity:

Let TT be a right (left) normalized quantum operator such that ds(T)=α\textnormal{ds}(T)=\alpha. Additionally, let T~\widetilde{T} be the left (right) normalization of operator TT. Then,

Suppose TT is right normalized and T~\widetilde{T} is the left normalization of TT. Proposition 2.7 tells us that

Let λ1,,λn\lambda_{1},\ldots,\lambda_{n} be the eigenvalues of T(I)T(I). As n=tr(T(I))=i=1nλin=\textnormal{tr}(T(I))=\displaystyle\sum_{i=1}^{n}\lambda_{i} and α=ds(T)=tr[(T(I)I)2]=i=1n(λi1)2\alpha=\textnormal{ds}(T)=\textnormal{tr}[(T(I)-I)^{2}]=\displaystyle\sum_{i=1}^{n}(\lambda_{i}-1)^{2}, the conditions of Lemma 5.1 apply and we have

This implies the desired lower bounds on cap(T~)\textnormal{cap}(\widetilde{T}). Since the case where TT is left normalized is analogous, we omit the argument. ∎

We now state a slight modification of Algorithm G, with a view towards computing the capacity of a quantum operator.

Let TT be a completely positive operator, whose Kraus operators are given by n×nn\times n integer matrices A1,,AmA_{1},\ldots,A_{m}, such that each entry of AiA_{i} has absolute value at most MM. Algorithm GG when applied for t=n3ϵ2(1+10n2log(Mn))t=\dfrac{n^{3}}{\epsilon^{2}}\cdot\left(1+10n^{2}\log(Mn)\right) steps approximates cap(T)\textnormal{cap}(T) within a multiplicative factor of 1±ϵ1\pm\epsilon.

If either T(I)T(I) or T(I)T^{*}(I) is singular, then TT decreases the rank of II. When T(I)T(I) is singular, rank decreasing follows by definition. When T(I)T^{*}(I) is singular, one way to see it is that Im(Ai)Im(T(I))\text{Im}(A_{i})\subseteq\text{Im}\left(T^{*}(I)\right) for all ii. Since Im(AiAi)=Im(Ai)\text{Im}\left(A_{i}A_{i}^{\dagger}\right)=\text{Im}(A_{i}), we get that Im(T(I))Im(T(I))\text{Im}\left(T(I)\right)\subseteq\text{Im}\left(T^{*}(I)\right) and hence T(I)T(I) is singular. Therefore, the algorithm is correct on step 1, by outputting cap(T)=0\textnormal{cap}(T)=0.

If T(I)T(I) and T(I)T^{*}(I) are both non-singular, it is easy to verify that Tj(I)T_{j}(I) and Tj(I)T_{j}^{*}(I) will remain non-singular for all jj and hence step 3 is well defined.

In this case, since right and left normalizations don’t change the property of being rank decreasing, we have cap(Tj)=0\textnormal{cap}(T_{j})=0 for all 0jt0\leq j\leq t. Hence, Lemma 3.4 implies that ds(Tj)>ϵ2n3\textnormal{ds}(T_{j})>\dfrac{\epsilon^{2}}{n^{3}} for all 0jt0\leq j\leq t. In this case, step 3 of Algorithm G will correctly output cap(T)=0\textnormal{cap}(T)=0.

In this case, we will show that there must exist 0jt0\leq j\leq t such that ϵjϵ2n3\epsilon_{j}\leq\dfrac{\epsilon^{2}}{n^{3}}. Assume the contrary, for the sake of contradiction. By Theorem 2.20, we know that cap(T1)exp(10n2log(Mn))\textnormal{cap}(T_{1})\geq\exp(-10n^{2}\log(Mn)). Also Proposition 2.8 implies that cap(Tj)1\textnormal{cap}(T_{j})\leq 1 for all jj. However by the assumption that ϵj>ϵ2n3\epsilon_{j}>\dfrac{\epsilon^{2}}{n^{3}}, Lemma 5.2 implies that cap(Tj+1)exp(ϵ2/n3)cap(Tj)\textnormal{cap}(T_{j+1})\geq\exp(\epsilon^{2}/n^{3})\cdot\textnormal{cap}(T_{j}) for all 0jt0\leq j\leq t. Hence, we obtain:

Plugging in t=n3ϵ2(1+10n2log(Mn))t=\dfrac{n^{3}}{\epsilon^{2}}\cdot\left(1+10n^{2}\log(Mn)\right) gives us the required contradiction.

Now that we showed the existence of 0jt0\leq j\leq t such that ϵjϵ2n3\epsilon_{j}\leq\dfrac{\epsilon^{2}}{n^{3}}, we will show that step 4 indeed computes a good approximation to capacity. For the first ϵj\epsilon_{j} such that ϵjϵ2n3\epsilon_{j}\leq\dfrac{\epsilon^{2}}{n^{3}}, Lemma 3.4 implies that cap(Tj)(1nϵj)n(1ϵ/n)n1ϵ\textnormal{cap}(T_{j})\geq(1-\sqrt{n\epsilon_{j}})^{n}\geq(1-\epsilon/n)^{n}\geq 1-\epsilon. Since cap(Tj)=cap(T)(i=0j1det(Ri))1\textnormal{cap}(T_{j})=\textnormal{cap}(T)\cdot\left(\displaystyle\prod_{i=0}^{j-1}\det(R_{i})\right)^{-1}, we have

As 1ϵcap(Tj)11-\epsilon\leq\textnormal{cap}(T_{j})\leq 1, we obtain the correct approximation. ∎

In this subsection, we analyze the computation of capacity when we truncate the input matrices. This analysis will be similar to the one in Section 4.2. We begin with some intuition on why truncation works.

Notice that to approximate the capacity, all we need is to compute the determinants of the matrices UiU_{i} in Algorithm 2. The UiU_{i}’s are the truncations of the matrices SiS_{i} from equation (15), the latter matrices being important as they describe the scaled operator TjT_{j} in terms of the original operator TT, see equations (16, 17, 18). The reason why truncating the input works is because the eigenvalues of UiU_{i} are very similar to the eigenvalues of SiS_{i}. Therefore, we can show that det(Si)det(Ui)\det(S_{i})\approx\det(U_{i}). This will imply that the truncated capacity is a good approximation to the actual capacity. The analysis will rely mainly on Lemma 4.3, which gives a good bound on the spectral norm of SiUiS_{i}-U_{i}. Now we state the truncated algorithm.

Given a matrix AA, let Trn(A)\textnormal{Trn}(A) be the matrix obtained by truncating the entries of AA up to P(n,1/ϵ,log(M))=1ϵ(n12log4(Mn))log(n4/ϵ2)P(n,1/\epsilon,\log(M))=\frac{1}{\epsilon}\cdot(n^{12}\log^{4}(Mn))\cdot\log(n^{4}/\epsilon^{2}) bits after the decimal point.

We now proceed to the analysis of Algorithm 4. In Theorem 5.3, we proved the correctness of Algorithm G without truncation. Thus, to prove correctness of Algorithm 4, it is enough to prove two statements:

if ϵ~jϵ2/4n3\widetilde{\epsilon}_{j}\leq\epsilon^{2}/4n^{3}, then the operators TjT_{j} will satisfy the ϵjϵ2/n3\epsilon_{j}\leq\epsilon^{2}/n^{3} bounds

UiSi2P(n,1/ϵ,log(M))/2\|U_{i}-S_{i}\|\leq 2^{-P(n,1/\epsilon,\log(M))/2}.

The first item implies that steps 1 to 3 of the algorithm above are correct, and the second item will tell us that step 4 indeed computes an 1±ϵ1\pm\epsilon approximation to capacity. More formally, we have the following theorem.

By applying Lemma 4.4 with parameters tt and P(n,1/ϵ,log(M))P(n,1/\epsilon,\log(M)) as above, we get that

Therefore, steps 1 to 3 of Algorithm 4 work just as if we had not done any truncation (as in Algorithm 3). This implies that we will always output cap(T)=0\textnormal{cap}(T)=0 whenever the operator TT is rank decreasing.

We are now left with the computation of capacity when TT is rank non-decreasing, which is done in step 4. By applying Lemma 4.3 with parameter δ=2P(n,1/ϵ,log(M))\delta=2^{-P(n,1/\epsilon,\log(M))}, we get

Let 0μi1μi2μin0\leq\mu_{i1}\leq\mu_{i2}\leq\dots\leq\mu_{in} be the eigenvalues of UiU_{i} and 0λi1λi2λin0\leq\lambda_{i1}\leq\lambda_{i2}\leq\dots\leq\lambda_{in} be the eigenvalues of SiS_{i}. From UiSi2P(n,1/ϵ,log(M))/2\|U_{i}-S_{i}\|\leq 2^{-P(n,1/\epsilon,\log(M))/2} and Lemma 4.2, we have

Similarly, we have that det(Sj1)det(Sj2)(1+ϵ/2)det(Uj1)det(Uj2)\det(S_{j-1})\cdot\det(S_{j-2})\leq(1+\epsilon/2)\cdot\det(U_{j-1})\cdot\det(U_{j-2}).

As ϵ~jϵ24n3\widetilde{\epsilon}_{j}\leq\dfrac{\epsilon^{2}}{4n^{3}} implies that ϵjϵ24n3+ϵ2n4\epsilon_{j}\leq\dfrac{\epsilon^{2}}{4n^{3}}+\dfrac{\epsilon^{2}}{n^{4}}, by Lemma 3.4 we have cap(Tj)[(1ϵ/2),1]\textnormal{cap}(T_{j})\in[(1-\epsilon/2),1]. Thus, equation (4.1) yields

The inequalities above imply that det(Uj1)det(Uj2)\det(U_{j-1})\cdot\det(U_{j-2}), that is, the output of Algorithm 4, lies in the interval [(1ϵ)cap(T),(1+ϵ)cap(T)][(1-\epsilon)\textnormal{cap}(T),(1+\epsilon)\textnormal{cap}(T)]. ∎

Conclusion and Open Problems

In this paper we gave a polynomial time algorithm for computing the non-commutative rank of a symbolic matrix over any subfield of the complex numbers. We stated its different incarnations and implications to the many different areas in which this problem arises (indeed we feel that expositing these many connections, some essential to the present result, may yield better future interaction between them with possible more benefits). We note that our algorithm and the analysis bypasses the need to use any degree bounds at all. We further note again that despite the purely algebraic nature of the problem our algorithm is purely analytic, generating a sequence of complex matrices and testing its convergence.

We collect now the most obvious directions for future research, some of them already mentioned in the paper.

Find more applications of this algorithm to optimization problems.

Can we use these techniques to design an efficient deterministic algorithm for the orbit-closure intersection problem for the Left-Right action? In terms of invariants, this is equivalent to asking if two tuples of matrices can be separated by invariants of the Left-Right action (over algebraically closed fields of characteristic ). More formally given two tuples of matrices, (A1,,Am)(A_{1},\ldots,A_{m}) and (B1,,Bm)(B_{1},\ldots,B_{m}), check whether for all (T1,,Tm)(T_{1},\ldots,T_{m}) of arbitrary dimension,

The results of [DM17] give a randomized polynomial time algorithm for this problem (over algebraically closed fields of characteristic ): just plug in random (T1,,Tm)(T_{1},\ldots,T_{m}) of polynomial dimension.

Find a black-box algorithm for SINGULAR. That is, efficiently produce (deterministically) a polynomial size set S\mathcal{S} of tuples of polynomial dimension matrices s.t. for all L=i=1mxiAiL=\sum_{i=1}^{m}x_{i}A_{i} s.t. LL is non-singular, it holds that for some (T1,,Tm)S(T_{1},\ldots,T_{m})\in\mathcal{S},

Due to the recent polynomial dimension bounds of [DM17], it can be proved that a random set S\mathcal{S} works. The challenge is to produce it deterministically. As a special case, this captures deterministic parallel algorithms for the decision version of bipartite perfect matching (when A1,,AmA_{1},\ldots,A_{m} are elementary matrices Ei,jE_{i,j} representing the edges of a bipartite graph). So perhaps, techniques from the recent breakthrough work [FGT16] can be useful.

Explore further the connection between commutative and non-commutative PIT problems. We feel that beyond the many connections between commutative and non-commutative settings that arise here, this different angle of looking at the commutative PIT problem, relating it to its non-commutative cousin, may help in the major quest of finding an efficient deterministic algorithm for it. As mentioned above, this viewpoint has already resulted in a deterministic PTAS for the commutative rank [BJP17].

We design an efficient algorithm for checking if a completely positive operator is rank-decreasing. Can we do the same for positive operators? Algorithm G in fact works for positive operators as well and all that is needed is to prove an effective lower bound on the capacity cap(T)\textnormal{cap}(T) of a positive operator TT which is rank non-decreasing (similar to Theorem 2.20). It was already proven in [Gur04] that cap(T)>0\textnormal{cap}(T)>0 for a positive operator TT which is rank non-decreasing.

Design a strongly polynomial time algorithm for operator scaling. Strongly polynomial time algorithms for matrix scaling were given by [LSW98]. Can they be extended to the operator case?

Can we compute (1+ϵ)(1+\epsilon) approximation to cap(T)\textnormal{cap}(T) in time poly(n,b,log(1/ϵ))\textnormal{poly}(n,b,\log(1/\epsilon))? For computing capacity of non-negative matrices, such algorithms exist. One of the algorithms in [LSW98] has this stronger convergence rate. Also for matrices, capacity can be formulated as a convex program and hence the Ellipsoid algorithm also gives this stronger convergence rate [GY98].

Can we design efficient algorithms for testing the null-cone of general quivers? There is reduction from general quivers to Kronecker-quiver or the left-right action (e.g. see [DM17]) but the reduction is not always efficient. What about the general problem of testing the null-cone of actions of reductive groups?

We would like to thank Harm Derksen, Pavel Hrubes, Louis Rowen and K. V. Subrahmanyam for helpful discussions. We would also like to thank Oded Regev for suggesting us that operator scaling could be used for approximating capacity. Finally, we thank the anonymous reviewers for a comprehensive reading of the paper and pointing several typographical errors and minor bugs.

References

Appendix A Symbolic matrices with polynomial entries and non-commutative rank

In this section, we show how to compute the non-commutative rank of any (not necessarily square) matrix with linear entries over the free skew field ℚ​(

In fact, we solve a more general problem. Subsection A.1 starts with a reduction of computing nc-rank of a matrix with polynomial entries (given by formulae), to the problem of computing the nc-rank of a matrix with linear entries, via the so-called “Higman’s trick” (Proposition A.2). We give the simple quantitative analysis of this reduction, which as far as we know does not appear in the literature and may be useful elsewhere. This reduction, with the two above, allow computing the non-commutative rank of any matrix in time polynomial in the description of its entries.

Before stating the full version of the effective Higman trick, we need to define the bit complexity of a formula computing a non-commutative polynomial.

With this definition in hand, we can state and prove Higman’s trick, which first appeared in [Hig40]. In the proposition below, it will be useful to have the following notation to denote the direct sum of two matrices AA and BB:

where the zero matrices in the top right and bottom left corners are of appropriate dimensions. Before stating and proving Higman’s trick, let us work through a small example which showcases the essence of the trick.

Suppose we want to know the nc-rank of matrix (1xyz+xy)\begin{pmatrix}1&x\\ y&z+xy\end{pmatrix}. The problem here is that this matrix is not linear, and we need to have a linear matrix. How can we convert this matrix into a linear matrix while preserving the rank, or the complement of the rank? To do this, we need to remove the multiplication happening in z+xyz+xy.

Notice that the complement of its rank does not change after the following transformation:

Since the complement of the rank does not change after we perform elementary row or column operations, we can first add x(third row)x\cdot\text{(third row)} to the second row, and then subtract (third column)y\text{(third column)}\cdot y to the second column, to obtain:

The complement of the rank of this last matrix is the same as the complement of the rank of our original matrix! In particular, if this last matrix is full rank, it implies that our original matrix is also full rank. This is the essence of Higman’s trick. We now proceed to its full version.

Let Mult(aij)\textsf{Mult}(a_{ij}) be the number of multiplication gates in the formula computing entry aija_{ij} and

That is, TT is the total number of multiplication gates used to compute all entries of the matrix AA.

We prove this proposition by induction on TT, for matrices of all dimensions. The base case, when T=0T=0, is trivial, as in this case AA itself has linear entries. Suppose now that the proposition is true for all matrices (regardless of their dimensions) which can be computed by formulas using <T<T multiplication gates.

Let AA be our matrix, which can be computed using TT multiplications. W.l.o.g., we can assume that Mult(amn)1\textsf{Mult}(a_{mn})\geq 1. Then, by finding a multiplication gate in the formula for amna_{mn} that has no other multiplication gate as an ancestor, we can write amna_{mn} in the form amn=a+bca_{mn}=a+b\cdot c, where

Therefore, the number of multiplications needed to compute BB is given by

Setting P=PRP=P^{\prime}R and Q=SQQ=SQ^{\prime} proves the inductive step and completes the proof. Since we only use subformulas of the formulas computing the entries of AA, the bound on the bit complexity does not change. ∎

A.2 Classical Reduction

Since nc-rank(M)r\textnormal{nc-rank}(M)\geq r, there exists an r×rr\times r minor of MM of full rank. Let QQ be such a minor of MM. W.l.o.g.,Notice that we can make the following assumption just to simplify notation. In actuality, we do not know where the full rank minor is located in MM. we can assume that QQ is the [r]×[r][r]\times[r] principal minor of MM. Hence, we have that

where U1U_{1} and V1V_{1} are r×rr\times r matrices and the others are matrices with the proper dimensions.

Letting U=(Ir0)U^{\prime}=\begin{pmatrix}I_{r}&0\end{pmatrix} and V=(Ir0)V^{\prime}=\begin{pmatrix}I_{r}\\ 0\end{pmatrix}, the equality above becomes:

we obtain that UMVUMV is full, as we wanted. Notice that the second inequality comes from the fact that rank does not increase after restrictions of the new variables. ∎

Notice that we do not know the rank nc-rank(M)\textnormal{nc-rank}(M) a priori. Therefore, our algorithm will try all possible values of r[n]r\in[n] and output the maximum value of rr for which we find a full matrix.

For each r×rr\times r matrix UMVUMV, we can use the effective Higman’s trick to convert UMVUMV into a s×ss\times s matrix with linear entries. With this matrix, we can use the truncated Gurvits’ algorithm to check whether the matrix we just obtained is full. Since we have this test, we will be able to output the correct rank. Algorithm 5 is the precise formulation of the procedure just described.

To prove this theorem, it is enough to show that Algorithm 5 is correct and it runs with the desired runtime.

Without loss of generality, we can assume that nmn\leq m. Therefore we have that nc-rank(M)n\textnormal{nc-rank}(M)\leq n. By Lemma A.3, if rnc-rank(M)r\leq\textnormal{nc-rank}(M), then matrix MrM_{r} will be of full rank (and therefore will not have a shrunk subspace, by Theorem 1.4). Since Mr=UrMVrM_{r}=U_{r}MV_{r}, from the formulas computing the entries of MM we obtain formulas of size at most 2smn2smn computing the entries of MrM_{r}. Moreover, the bit complexities of these formulas will still be bounded by bb, as multiplication by generic matrices do not mix any of the polynomials of MM.

By Proposition A.2 and the fact that the size of the formulas computing the entries of MrM_{r} are bounded by 2smn2smn, we have that NrN_{r} is a linear matrix of dimensions (k+r)×(k+r)(k+r)\times(k+r), where k2s(mn)2k\leq 2s(mn)^{2} and the bit complexity of the coefficients bounded by bb. Moreover, Nr=P(MrIk)QN_{r}=P(M_{r}\oplus I_{k})Q implies that NrN_{r} is full if, and only if, MrM_{r} is full, which is true if, and only if, nc-rank(M)r\textnormal{nc-rank}(M)\geq r.

Now, by Theorem 1.1, we have a deterministic polynomial time algorithm to determine whether NrN_{r} is full rank. If rnc-rank(M)r\leq\textnormal{nc-rank}(M), NrN_{r} will be full, and the maximum such rr will be exactly when r=nc-rank(M)r=\textnormal{nc-rank}(M). Therefore, by outputting the maximum rr for which NrN_{r} we compute nc-rank(M)\textnormal{nc-rank}(M). This proves that our algorithm is correct. Notice that the runtime is polynomial in the input size, as we perform at most nn applications of the Higman trick and of Algorithm GG^{\prime}. This completes the proof. ∎

A.3 The Quantum Reduction

Here X1,1X_{1,1}, X1,2X_{1,2}, X2,1X_{2,1}, X2,2X_{2,2} are n×nn\times n, n×c1n\times c-1, c1×nc-1\times n, c1×c1c-1\times c-1 matrices respectively. Then T\overline{T} is completely positive and TT is cc-rank-decreasing iff T\overline{T} is rank-decreasing. Note that we are considering cnc\leq n.

A well known characterization due to Choi [Cho75] states that T\overline{T} is completely positive iff i,j=1n+c1Ei,jT(Ei,j)\sum_{i,j=1}^{n+c-1}E_{i,j}\otimes\overline{T}(E_{i,j}) is psd. Here Ei,jE_{i,j} is the matrix with 11 at i,ji,j position and everywhere else. Now

From here it is easy to verify that i,j=1n+c1Ei,jT(Ei,j)\sum_{i,j=1}^{n+c-1}E_{i,j}\otimes\overline{T}(E_{i,j}) is psd given that i,j=1nEi,jT(Ei,j)\sum_{i,j=1}^{n}E_{i,j}\otimes T(E_{i,j}) is psd. Now suppose that T\overline{T} is rank-decreasing. This can only happen if X1,1=0X_{1,1}=0 or X2,2=0X_{2,2}=0, otherwise

can be psd (and hermitian) only if X1,2=X2,1=0X_{1,2}=X_{2,1}=0. In this case a c1c-1 ranked matrix is mapped to rank nn matrix. So X2,2X_{2,2} has to be zero. Then again by the psd condition X1,2=X2,1=0X_{1,2}=X_{2,1}=0. So

Hence Rank(T(X1,1))Rank(X1,1)c\text{Rank}(T(X_{1,1}))\leq\text{Rank}(X_{1,1})-c. This proves one direction. Now suppose that TT is cc-rank-decreasing and Rank(T(X))Rank(X)c\text{Rank}(T(X))\leq\text{Rank}(X)-c, then

This seems to be the “quantum” analogue of obtaining a maximum matching oracle based on a perfect matching oracle: add c-1 dummy vertices to both sides of the bipartite graph and connect them to everything. Then the new graph has a perfect matching iff the original graph had a matching of size nc+1\geq n-c+1.

Here we didn’t specify a set of Kraus operators for the operator T\overline{T} which seem to be needed to run Algorithms 1 and 2 but Kraus operators can be obtained by looking at the eigenvectors of i,j=1n+c1Ei,jT(Ei,j)\sum_{i,j=1}^{n+c-1}E_{i,j}\otimes\overline{T}(E_{i,j}). Alternatively Algorithms 1 and 2 can also be interpreted as acting directly on the Choi-Jamiolkowski state of T\overline{T} i.e. i,j=1n+c1Ei,jT(Ei,j)\sum_{i,j=1}^{n+c-1}E_{i,j}\otimes\overline{T}(E_{i,j}).