Operator scaling: theory and applications

Ankit Garg, Leonid Gurvits, Rafael Oliveira, Avi Wigderson

Introduction

This introduction will be unusually long, due to the unusual number of settings in which the problems we study appear, and their interconnections and history.

The main computational problem we will be concerned with in this paper (which we call SINGULAR) is determining whether such a symbolic matrix is invertible or not (over the field of fractions in the given variables). This problem has a dual life, depending on whether the variables commute or don’t commute. In the commutative case this problem has an illustrious history and significance. It was first explicitly stated by Edmonds [Edm67], and shown to have a randomized polynomial time algorithm by Lovasz [Lov79]. The completeness of determinant for arithmetic formulas by Valiant [Val79] means that singularity captures the celebrated Polynomial Identity Testing (PIT) problem, and so in the commutative setting we will refer to it as PIT. The derandomization of the latter probabilistic algorithm for PIT (namely, proving PIT $\in\mathcal{P}$ ) became all-important overnight when Kabanets and Impagliazzo [KI04] showed it would imply nontrivial arithmetic or Boolean lower bounds well beyond current reach.

On the other hand, in the non-commutative case even the meaning of this problem SINGULAR is unclear. It took decades to fully define and understand the related notion of a “field of fractions” for non-commutative polynomials, namely the free skew field over which we (attempt to) invert the matrixFor now, the reader may think of the elements of this “free skew field” simply as containing all expressions (formulas) built from the variables and constants using the arithmetic operations of addition, multiplication and division (we define it more formally a bit later). We note that while this is syntactically the same definition one can use for commuting variables, the skew field is vastly more complex, and in particular its elements cannot be described canonically as ratios of polynomials.. But as we will explain below, this non-commutative SINGULAR problem has many intuitive and clean equivalent formulations (some entirely commutative!). It captures a non-commutative version of identity testing for polynomials and rational functions, provides a possible avenue to attack the notorious commutative PIT version, and quite surprisingly, its different formulations arise naturally in diverse areas of mathematics, revealing surprising connections between them. We only note that unlike the commutative PIT, it is not even clear from its definition that the problem SINGULAR is decidable. It requires some of the very nontrivial characterizations above, and other important results in commutative algebra, to prove a deterministic exponential time upper bound on its complexity (see two very different proofs in [CR99, IQS17]), the best known before this work. No better bound was known even allowing randomness.

The main result of this paper is a deterministic polynomial time algorithm for this problem!

$L$ is singular (namely not invertible) over 𝔽(

For every integer $k$ we have $\textnormal{Det}\left(\sum_{i=1}^{m}X_{i}\otimes A_{i}\right)\equiv 0$ , where the $X_{i}$ are $k\times k$ matrices of commutative variables, Det is still the commutative determinant polynomial, acting here on $kn\times kn$ matrices.

The completely positive quantum operator (or map) associated with $L$ is rank-decreasing. Namely, there exist a positive semi-definite matrix $P$ such that $\textnormal{rank}(\sum_{i=1}^{m}A_{i}PA_{i}^{\dagger})<\emph{{rank}}(P)$ .Here $A^{\dagger}$ denotes the conjugate transpose of a complex matrix $A$ .

In the rest of this introduction we will try to explain how and why the problem SINGULAR arises in these different contexts, how are they related. We will discuss the algorithm that solves SINGULAR (which already appears in [Gur04]), its complexity analysis (which is the main result of this paper), and how they arise from these (and other!) sources. We will also discuss the extension of this algorithm to non-commutative rank computation, and the implications of these algorithms to the various settings. We also highlight the recurring analogs and connections between the commutative and non-commutative worlds. There are probably many ways to weave this meandering story due to the multiple connections between the topics; we chose one (some accounts of subsets of these connections appear e.g. in [Gur04, HW14, IQS17]). Note that most descriptions will be informal or semi-formal; formal definitions for everything that is actually needed will be given in the technical sections, and for the rest we provide references.

Polynomials, both in commuting and non-commuting variables, can always be uniquely defined by their coefficients. And so, while their computational complexity is of course interesting, it is not part of their description. For rational functions this persists in the commutative setting, as they are simply ratios of polynomials. It is perhaps amusing that in the non-commutative setting, the first definition of the free skew field of rational functions was computational, namely one whose objects are described by their computation via a sequence of arithmetic operations. Both this description, and the subsequent discovery of a syntactic representation play an important role in our story.

Books on history and construction of the skew field include [Coh95, Row80], and a thorough discussion from the computational perspective is given in [HW14]. Here we will be relatively brief, highlighting the key roles played by matrices and symbolic matrices in these constructions of the free skew field.

The rational expressions under the above equivalence relation comprise the (universal) free skew field 𝔽(

This theorem immediately implies a polynomial identity test in probabilistic polynomial time (just like in the commutative case) mentioned in the next subsection - simply plug random matrices of appropriate size for the variables. However, for rational expressions, which is our main interest here, such upper bounds on the sufficient dimension to exclude rational identities are much harder to prove, and what is known is much weaker. The best bounds (and only for some fields) are exponential, which seem to only provide probabilistic exponential time rational identity tests. The last subsection of this introduction shows how these exponential dimension bounds arise and obtained from invariant theory, and how we use them to nonetheless derive a deterministic, polynomial time identity test promised in our main theorem.

A second construction of the free skew field which is somehow more concrete was developed by Cohn. In the notation we have already established of symbolic matrices Cohn proved:

Every element of the free skew field 𝔽(

For this definition not to be self-referential (and other reasons) Cohn proved some of the important characterizations of invertible matrices, including items (2) and (5) in Theorem 1.4. It is clear that (each entry of) the inverse of such a matrix can be given by a rational expression, and the question of whether a matrix is invertible again arises naturally, and it turns out to capture rational identity testing.

2 Word Problems and Identity Tests

Word problems and their complexity are central throughout mathematics, arising whenever mathematical objects in a certain class have several representations. In such cases, a basic problem is whether two such representations describe the same object. Often, indeed, some problems of this form have served as early examples of decidable and undecidable problemsE.g. deciding if two knots diagrams describe the same knot was proved decidable by Haken [Hak61], and deciding if two presentations with generators and relations describe the same group was proved undecidable by Rabin [Rab58].

As is well known, the main reason why PIT is so important in the commutative setting is that it captures the polynomial and rational function identity test for formulas, as proved by Valiant [Val79]. Combining it with the equivalences of Fact 1.2, we have

There is an efficient algorithm which converts every arithmetic formula $\phi({\mathbf{y}})$ in commuting variables ${\mathbf{y}}$ of size $s$ to a symbolic matrix $L_{\phi}$ of size $\emph{{poly}}(s)$ , such that $\phi$ is identically zero if and only if $L_{\phi}\in$ PIT (i.e., $L_{\phi}$ is singular).

Theorem 1.7 of the previous subsection, showing that in the non-commutative setting as well SINGULAR captures polynomial and rational function identity test, is the analog Valiant’s completeness theorem. It was proved even earlier by Cohn (see also Malcolmson’s version [Mal78]), using similar linear algebraic ideas. Here is a more precise statement which gives the analogy.

There is an efficient algorithm which converts every arithmetic formula $\phi({\mathbf{x}})$ in non-commuting variables ${\mathbf{x}}$ of size $s$ to a symbolic matrix $L_{\phi}$ of size $\emph{{poly}}(s)$ , such that the rational expression computed by $\phi$ is identically zero if and only if $L_{\phi}\in$ SINGULAR.

As mentioned, the structure of the free skew field is so complex that unlike the commutative case, even decidability of SINGULAR (and rational identity testing) is far from obvious. The first to prove decidability was Cohn in [Coh73, Coh75]. The first explicit bound on time was given by Cohn and Reutenauer in [CR99], reducing it to a system of commutative polynomial equations using characterization (5) of SINGULAR, proved earlier by Cohn, which puts it in $\mathcal{PSPACE}$ . The best upper bound before this work was singly exponential time, obtained by [IQS17], and of course yields the same bound for rational identity testing.

From our Theorem 1.1 we conclude a deterministic, polynomial identity test for non-commuting rational expressions.

It is important to stress that unlike the commutative case, where symbolic matrix inversion can be simulated by quasi-polynomial sized formulas, in the non-commutative case symbolic matrix inversion is exponentially more powerful than formulas (and so Theorem 1.1 is far more powerful than its corollary above). We state these two results formally below for contrastReplacing formulas by circuits there is no contrast - in both the commutative and non-commutative setting matrix inverse has a polynomial size circuit (with division of course) [HW14]. Note that when saying “computing the inverse” we mean computing any entry in the inverse matrix.

The inverse of a non-commutative $n\times n$ symbolic matrix in 𝔽(

Our algorithm of Theorem 1.1 and Corollary 1.10 is a “white-box” identity test (namely one which uses the description of the given matrix or formula), in contrast to a “black-box” algorithm which only has input-output access to the function computed by them. The best-known “black-box” identity test for these formulas, even if randomization is allowed(!), requires exponential time. It is a very interesting question to find a faster black-box algorithm.

When division is not allowed, efficient deterministic identity tests of non-commutative polynomials were known in several models. The strongest is for arithmetic branching programs (ABPs). As this model is easily simulable by matrix inversion (see Theorem 6.5 in [HW14], our algorithm provides an alternative (and very different) proof of the following theorem of Raz and Shpilka [RS05].

There is a deterministic polynomial time white-box identity testing algorithm for non-commutative ABPs.

For certain classes computing only polynomials even efficient black-box algorithms are known, and we mention two below; the first deterministic, for non-commutative ABPs, and the second probabilistic (but for circuits).

There is a deterministic quasi-polynomial time black-box identity testing algorithm for non-commutative ABPs.

There is a probabilistic polynomial time black-box identity testing algorithm for non-commutative circuits.

3 Commutative and Non-commutative Rank of Symbolic Matrices

There are many sources, motivations and results regarding invertibility, and more generally rank, of commutative symbolic matrices, which are much older than the complexity theory interest in it. These mathematical papers, like the ones in the computational complexity literature, regard it as a difficult problem, and attempt to understand a variety of special cases (of a different nature than restrictions of the computational power of the model). Some of the many references to this body of work can be found in the papers [FR04, GM02]. Often, the non-commutative rank (explicitly or implicitly) is used to give upper bound on the commutative rank, and their relationship becomes of interest. We focus on this connection here, and explain how our main result implies a deterministic approximation algorithm to the commutative rank.

The following are equivalent for a matrix of linear forms over commutative variables, $L({\mathbf{y}})$ .

While the characterization above is simple, the one below is very substantial, mostly developed by Cohn for his construction of the free skew field. The first characterization is due to him [Coh95] and the second is due to Fortin and Reutenauer [FR04] who heavily use Cohn’s techniques.

The following are equivalent for a matrix of linear forms over non-commutative variables, $L({\mathbf{x}})$ .

$\emph{{nc-rank}}(L({\mathbf{x}}))=r$ over 𝔽(

There is a deterministic algorithm which, given $m$ $n\times n$ integer matrices $A_{1},\dots A_{m}$ with entries of bit-size $b$ , computes $\emph{{nc-rank}}(L)$ in time $\emph{{poly}}(n,m,b)$ (where $L=\sum_{i=1}^{m}x_{i}A_{i}$ ).

We note here that in some formulations of the problem, $L$ is represented in affine form, namely where an additional constant matrix $A_{0}$ is added to $L$ (this may be viewed as a non-commutative analog of the rank-completion problem). However, the algorithm above works in this case as well , since $\textnormal{nc-rank}(L)$ remains unchanged if an additional variable $x_{0}$ was added as well. So, we will stick with the linear formulation.

It is not hard to see from the definitions that for every $L$ we have $\textnormal{rank}(L)\leq\textnormal{nc-rank}(L)$ . We have already seen that they can be different in Example 1.3. Taking many copies of that $3\times 3$ matrix we see that there can be a factor $3/2$ gap between the two: for any $r$ there are matrices $L$ with $\textnormal{rank}(L)=2r$ and $\textnormal{nc-rank}(L)=3r$ . However, Fortin and Reutenauer [FR04] proved that this gap is never more than a factor of 2.

For every $L$ we have $\emph{{nc-rank}}(L)\leq 2\emph{{rank}}(L)$ .

An immediate corollary of our main result is thus a factor 2 approximation of commutative rank.

We find the question of obtaining a better approximation ratio efficiently a very interesting problem, a different relaxation of the commutative PIT problem that as far as we are aware was not studied till now.

4 Compression Spaces, Optimization and Gurvits’ Algorithm G𝐺G

An important class of spaces of matrices $L$ , which is studied in many of the papers mentioned above, is the class of compression spaces (this notation was apparently introduced by Eisenbud and Harris [EH88]). They are defined as those spaces $L$ for which $L$ is $\textnormal{rank}(L)$ -decomposable. A simpler definition follows the characterization of [FR04] above, namely a space $L$ is a compression space iff $\textnormal{rank}(L)=\textnormal{nc-rank}(L)$ . The importance of compression spaces and their many origins will be surveyed below.

There is a deterministic polynomial time algorithm, Algorithm $G$ , which for every $n$ and every $n\times n$ matrix $L$ given by a set of integer matrices $(A_{1},A_{2},\dots,A_{m})$ , outputs “singular” or “invertible”, and its output is guaranteed to be correctFor both the commutative and non-commutative definitions. when either $\emph{{rank}}(L)=n$ or $\emph{{nc-rank}}(L)<n$ .

In particular, the algorithm always gives the correct answer for compression spaces. Our result on efficient computation of the non-commutative rank implies that for compression spaces we can determine the commutative rank.

Yet another source, elaborated on in [Gur04], is geometric duality theorems, such as the ones for matroid intersection and matroid parity problems. These sometimes give rise to compression spaces, with one prototypical example being spaces where the generating matrices $A_{i}$ all have rank-1; this follows from the Edmonds-Rado [Edm69, Rad42] duality theorem for matroid intersectionIt is an interesting question whether a compression space naturally arises from matroid parity duality of Lovasz [Lov80, Lov89].. Another, simpler example are spaces generated by upper-triangular matrices. Other examples of compression spaces are given in [Gur04]. Following his paper, we note that our rank algorithm above can solve such optimization problems in the dark, namely when the given subspace has such structured spanning generators, but the generators $A_{i}$ actually given to the algorithm may be arbitrary. This is quite surprising, and in general cannot work by uncovering the original structured generators from the given ones: an efficient algorithm is not known for the problem of deciding if a given space of matrices is spanned by rank-1 matrices and we believe it is hard. It would be interesting to explore which other optimization problems can be encoded as the rank of a compression space, and whether the fact that it can be computed “in the dark” has any applications.

5 Permanents, Quantum Operators, Origins and Nature of Algorithm G𝐺G

We will describe the algorithm formally in Section 2.2. Here we give an informal description, which makes it easy to explain its origins and nature. We find these aspects particularly interesting. One (which has very few parallels) is that while the problem SINGULAR solved by Algorithm $G$ in Theorem 1.1 is purely algebraic, the algorithm itself is purely analytic; it generates from the input a sequence of complex-valued matrices and attempts to discover if it is convergent. Another is that the algorithm arises as a quantum analog of another algorithm with very different motivation that we now discuss.

To give more insight to the working of algorithm $G$ , let us describe another algorithm (for a different problem) which inspired it, which we call Algorithm $S$ . This matrix scaling algorithm was developed by Sinkhorn [Sin64] for applications in numerical analysis, and has since found many other applications (see survey and references in [LSW98], who used it as a basis for their deterministic algorithm for approximating the permanent of non-negative matrices). Two different analyses of Sinkhorn’s algorithm $S$ , one of [LSW98] and the other in the unpublished [GY98] inspire the analysis of Algorithm $G$ in [Gur04].

We describe the [LSW98] analysis for Algorithm $S$ . We need a few definitions. For a non-negative matrix $A$ , let $R(A)$ denote the diagonal matrix whose $(i,i)$ -entry is the inverse of the $L_{1}$ norm of row $i$ (which here is simply the sum of its entries as $A$ is non-negative). Similarly $C(A)$ is defined for the columnsA “non-triviality” assumption is that no row or column in $A$ is all zero..

Algorithm $S$ gets as input a non-negative integer matrix $A$ . For a fixed polynomial (in the input size) number of iterations it repeats the following two steps

Normalize rows: $A\leftarrow R(A)\cdot A$

Normalize columns: $A\leftarrow A\cdot C(A)$

What does this algorithm do? It is clear that in alternate steps either $R(A)=I$ or $C(A)=I$ , where $I$ is the identity matrix. Thus $A$ itself alternates being row-stochastic and column-stochastic. The question is whether both converge to $I$ together, namely, if this process converts $A$ to a doubly stochastic matrix. In [LSW98] it is proved that this happens if and only if $\textnormal{Per}(A)>0$ , where Per is the permanent polynomial. Moreover, convergence is easy to detect after a few iterations! If we define $\textnormal{ds}(A)=||R(A)-I||^{2}+||C(A)-I||^{2}$ as a notion of distance between $A$ and the doubly stochastic matrices, then the convergence test is simply whether $\textnormal{ds}(A)<1/n$ . If it is that small at the end of the algorithm then $\textnormal{Per}(A)>0$ , otherwise $\textnormal{Per}(A)=0$ .

The analysis of convergence of Algorithm $S$ in [LSW98] is extremely simple, using the permanent itself as a progress measure (or potential function). It has the usual three parts which makes a potential function useful:

The input size provides an exponential lower bound on the starting value of $\textnormal{Per}(A)$ ,

The arithmetic-geometric mean inequality guarantees that it grows by a factor of $1+1/n$ at every iteration, and

The permanent of any stochastic matrix is upper bounded by 1.

We shall return to this analysis of Algorithm $S$ soon.

As it happens, Algorithm $G$ is a quantum analog of Algorithm $S$ ! In quantum analogs of classical situations two things typically happen, diagonal matrices (which commute) become general matrices (which do not), and the $L_{1}$ norm is replaced by $L_{2}$ . This happens here as well, and we do so almost syntactically, referring the reader to [Gur04] for their quantum information theoretic intuition and meaning of all notions we mention.

The input to algorithm $G$ is a symbolic matrix $L=\sum_{i}x_{i}A_{i}$ , given by the $n\times n$ integer matrices $(A_{1},A_{2},\dots,A_{m})$ . Briefly, $L$ is viewed as a completely positive (quantum) operator, or map, on psd matrices, mapping such a (complex valued) matrix $P$ to $L(P)=\sum_{i}A_{i}PA_{i}^{\dagger}$ ( $P$ is typically a “density matrix” describing a quantum state, namely a psd matrix with unit trace, and the operator $L$ will typically preserve trace or at least not increase it). The dual operator $L^{*}$ acts (as you’d expect) by $L^{*}(P)=\sum_{i}A_{i}^{\dagger}PA_{i}$ . The analog “normalizing factors” for $L$ , named $R(L)$ and $C(L)$ are definedAgain using a “non-triviality” assumption these matrices are invertible. by $R(L)=(\sum_{i}A_{i}A_{i}^{\dagger})^{-\frac{1}{2}}$ , and $C(L)=(\sum_{i}A_{i}^{\dagger}A_{i})^{-\frac{1}{2}}$ . Note that $R(L)=L(I)^{-\frac{1}{2}}$ and $C(L)=L^{*}(I)^{-\frac{1}{2}}$ .

On input $(A_{1},A_{2},\dots,A_{m})$ Algorithm $G$ repeats, for a fixed polynomial (in the input size) number of iterations, the following analogous two steps

Normalize rows: $L\leftarrow R(L)\cdot L$

Normalize columns: $L\leftarrow L\cdot C(L)$

So again, row and column operations are performed alternately, simultaneously on all matrices $A_{i}$ . It is clear, as above, that after each step either $R(L)=I$ or $C(L)=I$ . It is natural to define the case when both occur as “doubly stochastic”, and wonder under what conditions does this sequence converge, namely both $R(L)$ and $C(L)$ simultaneously approach $I$ , and alternatively the limiting $L$ simultaneously fixes the identity matrix $I$ . A natural guess would be that it has to do with a “quantum permanent”. Indeed, Gurvits [Gur04] defines such a polynomial (in the entries of the $A_{i}$ ) $\textnormal{QPer}(L)$ , and proves several properties and characterizations (for example, it is always non-negative like the permanent, and moreover specializes to the permanent when the operator $L$ is actually a “classical” operator described by a single non-negative matrix $A$ ).

One can similarly define in an analogous way to the classical setting a “distance from double stochastic” by $\textnormal{ds}(L)=||R(L)-I||^{2}+||C(L)-I||^{2}$ , and test (after polynomially many iterations) if $\textnormal{ds}<1/n$ . This is precisely what Algorithm $G$ does. It is not hard to see that if $L$ (with non-commuting variables) is singular, then there is no convergence, the test above fails (and indeed $\textnormal{QPer}(L)=0$ ). However, in contrast to Algorithm $S$ , the analysis in [Gur04] falls short of proving that, otherwise we have convergence (and $\textnormal{QPer}(L)>0$ ). It does so only under the strong (commutative) assumption $\textnormal{Det}(L)\neq 0$ , namely when the symbolic matrix, viewed with commuting variables, is nonsingular. This result was stated above as Theorem 1.22. The 3-step complexity analysis proceeds exactly as in [LSW98] for the classical Algorithm $S$ described above, using the quantum permanent as a progress measure. However, if $\textnormal{Det}(L)=0$ then $\textnormal{QPer}(L)=0$ , and lower bounding the initial quantum permanent from below in part (1) of that analysis fails.

To prove our main theorem, showing that convergence happens if and only if $L$ is non-singular over the non-commutative skew field, we use another ingredient from Gurvits’ paper. He proposes another progress measure, called capacity and denoted $\textnormal{cap}(L)$ . Capacity (defined in the next section) is a quantum analog of another classical progress measure, based on KL-divergence, which was used in [GY98] for the analysis of Algorithm $S$ . The advantage of capacity, as pointed out in [Gur04], is that it does not necessarily vanish when $\textnormal{Det}(L)=0$ . Indeed, it is positive whenever $L$ is non-singular in the non-commutative sense! However Gurvits could only bound it sufficiently well from below as a function of the input size if $\textnormal{Det}(L)\neq 0$ . Thus, for a general input $L$ , positive capacity only guarantees that Algorithm $G$ converges in a finite number of steps, without providing an upper bound on that number.

Our main contribution, which leads to a polynomial upper bound, is proving an explicit capacity lower bound for every $L$ that is nonsingular over the skew field. The source of the proof turns out to be commutative algebra and invariant theory, as we discuss next.

6 Invariant Theory, Left-Right Action, Capacity and the Analysis of Algorithm G𝐺G

Invariant theory is a vast field; we will exposit here only the minimal background that is necessary for this paper. The books [CLO07, DK02, KP96] provide expositions on the general theory. More focused discussions towards our applications appear in the appendix of [HW14] and Section 1.2 of [FS13a].

Invariant theory deals with understanding the symmetries of mathematical objects, namely transformations of the underlying space which leave an object unchanged or invariant. Such a set of transformations always form a group (and every group arises this way). One major question in this field is, given a group acting on a space, characterize the objects left invariant under all elements of the group. Here we will only discuss very specific space and actions: polynomials (with commuting variables!) that are left invariant under certain linear transformations of the variables. The invariant polynomials under such action clearly form a ring (called the invariant ring). Important for us is the null-cone of the action, which is simply all assignments to variables which vanish on all non-constant homogeneous invariant polynomials. Namely the null cone is the affine algebraic variety defined by the ideal generated by homogeneous invariant polynomials of positive degree.

Here are two motivating examples surely familiar to the reader. The first example is given by polynomials in $n$ variables $y_{1},\dots,y_{n}$ invariant under the full group of permutations $S_{n}$ ; here the invariant polynomials are (naturally) all symmetric polynomials. While this is an infinite set, containing polynomials of arbitrarily high degrees, note that this set of polynomials is generated (as an algebra) by the $n+1$ elementary symmetric polynomials, with the largest degree being $n$ . This finite generation is no coincidence! A general, important theorem of Hilbert [Hil93] assures us that for “such” linear actions there is always a finite generating set (and hence a finite upper bound on their maximum degree). Obtaining upper bounds on the degree of generating sets, finding descriptions of minimal generating sets for natural actions are the classical goals of this area, and a more modern oneArising in particular in the GCT program of Mulmuley and Sohoni is obtaining succinct descriptions and efficient computation of these invariants (e.g. see [Mul12, FS13a]). The case of the action of the symmetric group is an excellent example of a perfect understanding of the invariant polynomial ring. Note that for this action the null-cone is simply the all-zero vector.

What we consider here is the left-right action on $m$ $n\times n$ matrices, where a pair $(B,C)$ as above takes $(Y_{1},Y_{2},\dots,Y_{m})$ to $(BY_{1}C,BY_{2}C,\dots,BY_{m}C)$ . The study of the invariant ring of polynomials (in the $mn^{2}$ variables sitting in the entries of these matrices) for this action was achievedWe note that this is part of the larger project of understanding quiver representations, started by the works of Procesi, Razmysolov, and Formanek [Pro76, Raz74, For86]. by [DW00, DZ01, SdB01, ANS10], and will look very familiar from condition (4) in Theorem 1.4, as well as from Amitsur’s definition of the free skew fieldNote though that the roles of which matrices in the tensor product are variable, and which are constant, has switched!

Over algebraically closed fields, the invariant ring of polynomials of the left-right action above is generated by all polynomials of the form $\textnormal{Det}\left(\sum_{i}D_{i}\otimes Y_{i}\right)$ , for all $d$ and all $d\times d$ matrices $D_{i}$ .

It is worthwhile to stress the connection forged between the commutative and non-commutative worlds by this theorem when combined with Amitsur’s and Cohn’s constructions of the skew field. A set of matrices $(A_{1},A_{2},\dots,A_{m})$ is in the null-cone of the left-right action if and only if the symbolic matrix $L=\sum_{i}x_{i}A_{i}$ is not invertible in the free skew field! In other words, the non-commutative SINGULAR problem (and thus rational identity testing, and the word problem in the skew field) arises completely naturally in commutative algebra. Of course, invertibility itself is invariant under the left-right action (indeed, even by any invertible matrices $B,C$ , not necessarily of determinant 1), so one expects a connection in at least one direction. At any rate, one may hope now that commutative algebraic geometric tools will aid in solving these non-commutative problems, and they do!

The generating set of the invariant ring given in the theorem is still an infinite set, and as mentioned above, Hilbert proved that some finite subset of it is already generating.

The invariant ring of polynomials of the action of a reductive group on a finite dimensional vector space is finitely generated.

This exponential upper bound was the state of art when we wrote our paper. Subsequent to it, this bound was greatly improved to linear!

This much better bound gave rise to further important developments discussed below under “Subsequent Work”, that highlight even more how invariant theory informs algorithmic efficiency in certain problems. We conclude this section describing the idea behind our proof, which uncovers this connection.

We plan to used the exact same 3-step complexity analysis of Algorithm $G$ as the one for Algorithm $S$ described in subsection 1.5, and which Gurvits [Gur04] used to analyze Algorithm $G$ for a special subset of inputs. As mentioned, the only missing ingredient is step (1), namely an explicit lower bound on the capacity of an input operator $L$ . Just like in the classical case of Algorithm $S$ , an exponential lower bound (in terms of the size of $L$ ) is required to establish a polynomial convergence. This is a proper time to finally define this important parameter of positive operatorsWe note that this notion of capacity seems to have nothing to do with the usual capacity of a quantum channel.

Given $L=(A_{1},A_{2},\ldots,A_{m})$ , recall that $L(P)=\sum_{i}A_{i}PA_{i}^{\dagger}$ and define $\textnormal{cap}(L)$ by

where the infimum is taken over all psd matrices $P$ with $\textnormal{Det}(P)\geq 1$ .

First, observe that capacity is the solution to a non-convex optimization problem (over a convex domain Recall that log determinant is a concave function over the domain of positive definite matrices). We will see in the next section that we can not only bound it, but actually compute it efficiently! Next, observe that capacity is an invariant (though not a polynomial invariant) of the left-right action.What we know is that $\textnormal{cap}(L)\neq 0$ . From this we infer that some polynomial invariant of the form above, $\textnormal{Det}\left(\sum_{i}A_{i}\otimes D_{i}\right)$ is nonzero, where the $D_{i}$ are $d\times d$ matrices for some $d$ .

This expression naturally suggests considering several new positive operators, and relating their capacities. The first, in dimension $d$ , is the operator, $M=(D_{1},D_{2},\ldots,D_{m})$ . Two others, in $nd$ dimensions are $L_{M}=(A_{1}\otimes D_{1},A_{2}\otimes D_{2},\ldots,A_{m}\otimes D_{m})$ , and $L\otimes M=(A_{1}\otimes D_{1},\ldots\,A_{i}\otimes D_{j},\ldots,\,A_{m}\otimes D_{m})$ . The non-vanishing of the determinant above proves that both $\textnormal{cap}(L_{M})>0$ and $\textnormal{cap}(L\otimes M)>0$ . And it is easy to obtain exponential lower bounds on both quantities, which however depend not only on the description size of $L$ (which is given), but also of that of $M$ which could be much larger.

To facilitate our original capacity lower bound, which only used $d\leq\sigma(n)\leq\exp(\textnormal{poly}(n))$ , we proved that the normalizedTaking the dimension’s root of capacity capacity is multiplicative, a fact of independent interest (see details in Section 3.3):

Using only the easier direction, namely that the left hand side is at most the right hand side, it suggests that our lower bound on $\textnormal{cap}(L)$ will follow from a lower bound on $\textnormal{cap}(L\otimes M)$ and an upper bound on $\textnormal{cap}(M)$ . Both of these in turn follow from an upper bound on the entry sizes in $M$ , namely in the $D_{i}$ , which easily follow from the given exponential upper bound on $d$ . A key thing to note is that even when $d$ is exponential in $n$ , and so these size bounds are doubly exponential, it is the $d$ ’th root of this bound that gives the required lower bound $\textnormal{cap}(L)$ .

For the new lower bound, presented in Section 2.3, we bound $\textnormal{cap}(L)$ using directly a bound on $\textnormal{cap}(L\otimes M)$ . Using much more careful size bounds on the entries of $M$ , which follow from Alon’s Nullstellensatz theorem, and tightening the reduction between the two capacities, we are able to derive the desired capacity lower bound using any finite bound on $d$ !

This new proof eliminates the need for good degree upper bounds for the purpose of analyzing this algorithm (if one does not care about efficiency beyond having a polynomial bound). However good degree upper bounds (that are anyway an important goal in invariant theory) were essential for the original proof, and for general actions were useful for other algorithmic problems which arose in other contexts in computational complexity, for example in geometric complexity theory (GCT). Further, we believe they will be important for studying other problems in algebraic complexity and optimization. Indeed, in a work in progress, we need degree bounds for invariant rings whose generating sets are not understood as well as for the left-right action, but Derksen’s general exponential degree upper bound still holds and is useful.

The capacity lower bound above implies, as mentioned, a polynomial upper bound on the number of iterations of Algorithm $G$ , assuming we can perform exact arithmetic! To actually bound the running time the algorithm has to be refined, truncating numbers so that they can be represented by polynomially many bits, and so a careful analysis of the bit complexity is required to actually prove its running time. This issue naturally leads to the next section, in which we discuss how the algorithm can be extended to actually compute capacity of the input operator, as well as provide bounds on its continuity. These results in turn are essential in a follow-up work [GGOW18] we have done on the Brascamp-Lieb inequalities, where capacity arises naturally via a simple reduction to operator scaling.

7 Computation and Continuity of Capacity

We also show that Algorithm G can be modified to compute the capacity of a completely positive operator $L$ . Recall that the capacity of an operator $L$ is defined to be $\textnormal{cap}(L)=\text{inf}\>\textnormal{Det}(L(P))$ , where the infimum is taken over all psd matrices $P$ with $\textnormal{Det}(P)=1$ . The following theorem is proven as Theorem 5.3 in Section 5.

There is a $\textnormal{poly}(n,b,1/\epsilon)$ time algorithm for computing a $(1+\epsilon)$ -multiplicative approximation to $\textnormal{cap}(L)$ . Here $n$ is the dimension on which $L$ acts and $b$ denotes the bit-sizes involved in description of $L$ .

This is quite fascinating since we don’t know of a convex formulation for $\textnormal{cap}(L)$ . The idea for the computation of capacity is quite simple. We know that the capacity of a doubly stochastic operator is exactly $1$ . We also know how operator scaling changes the capacity. If $L_{B,C}$ is a scaling of $L$ i.e.

if $L_{B,C}$ is doubly stochastic. Thus if we can find a doubly-stochastic-scaling of $L$ , then we can compute $\textnormal{cap}(L)$ exactly. Algorithm G helps in finding an approximately-doubly-stochastic-scaling of $L$ , which results in an approximation scheme for computing $\textnormal{cap}(L)$ . It is a very intriguing open problem if a $(1+\epsilon)$ approximation to $\textnormal{cap}(L)$ can be computed in time $\textnormal{poly}(n,b,\log(1/\epsilon))$ . In the aforementioned upcoming paper, we use this result (via a black-box reduction!) to approximating optimal constants in Brascamp-Lieb inequalities.

Although, we don’t know of a convex formulation for capacity, it does have some nice properties. Let us look at the Lagrangian for $\log(\textnormal{cap}(L))$ :

A critical point $C$ i.e. where the gradient $\nabla f(C,\lambda)=0$ should satisfy $L^{*}\left(L(C)^{-1}\right)=C^{-1}$ . This follows from

and hence any critical point is in fact a global minimum for $\textnormal{cap}(L)$ ! It would be interesting to see if our techniques can help solve other non-convex optimization problems which share the same property.

Algorithm $G$ can also be thought of as an alternating minimization algorithm for computing capacity. First let us write capacity in an alternative way.

While the constraints are convex (by log-concavity of determinant), the objective is quadratic. However if, we fix $X$ or $Y$ , this program is convex. In fact, if we fix $X$ , the optimum $Y$ is $\textnormal{Det}(L(X))^{1/n}\cdot L(X)^{-1}$ . Similarly, if we fix $Y$ , the optimum $X$ is $\textnormal{Det}(L^{*}(Y))^{1/n}\cdot L^{*}(Y)^{-1}$ . Starting from $X_{0}=I$ , if we do alternate minimization, we get exactly Algorithm $G$ and our results imply that this process converges in polynomial number of steps!

From the description of capacity, it is not clear if it is continuous. The reason is that a priori, the optimizing matrices could be radically different for $\textnormal{cap}(L_{1})$ and $\textnormal{cap}(L_{2})$ even if $L_{1}$ and $L_{2}$ are quite close. We show that this not the case, essentially because $\textnormal{cap}(L)$ (as well as the optimizer) can be approximated by a simple iterative process namely Algorithm $G$ ! And Algorithm $G$ is clearly continuous in $L$ . We prove this formally in Theorem 4.5 of Section 4.3. In the aforementioned upcoming paper, we use this result to prove continuity of Brascamp-Lieb constants.

The continuity of capacity can also be proven via other methods and is already mentioned in [Gur04]. But here we manage to get explicit bounds on the continuity parameter which we don’t know how to get by methods other than analyzing Algorithm $G$ .

Subsequent Work

After our work, Derksen and Makam [DM17] obtained polynomial degree bounds for the left-right action over any field! Using this bound, Ivanyos, Qiao and Subrahmanyam [IQS18] designed a completely different deterministic polynomial time algorithm for SINGULAR that works over all fields. This algorithm has a combinatorial/linear algebraic flavor, and has several important advantages over our algorithm, but also some limitations in comparison. One clear advantage is of course that it works over every field. Another is that this algorithm can certify the non-commutative singularity of an input $L$ over 𝔽(

Using some of the techniques developed in [IQS17, IQS18], Bläser, Jindal and Pandey [BJP17] designed a deterministic PTAS for the commutative rank i.e. a $(1+\epsilon)$ -approximation algorithm which runs in deterministic $n^{O(1/\epsilon)}$ time. This improves the factor- $2$ approximation algorithm given in this paper.

8 Organization

In Section 2, we describe Algorithm G for testing if quantum operators are rank-decreasing. We explain the notion of capacity of quantum operators, prove an explicit lower bound on the capacity of rank non-decreasing operators, and explain how it is used in the analysis to prove that Algorithm $G$ runs in polynomial time. In Section 3, we study various properties of capacity, some of which are used in other places in the paper. In Section 4, we analyze the bit complexity of Algorithm $G$ and also prove an explicit bound on the continuity of capacity. In Section 5, we will show how a modification of Algorithm $G$ can be used to approximate capacity. In the appendix Section A, we show how to compute the non-commutative rank. We conclude in Section 6 with a short discussion and open problems.

Quantum Operators and Analysis of Algorithm G

This section is devoted to Algorithm $G$ and its analysis. We start with preliminaries about completely positive operators, their properties, and basic quantities associated with them. We then formally describe Algorithm $G$ , and proceed to give a full analysis of its running time, proving the main theorem of this paper. Section 2.1 contains definitions and properties of completely positive operators and their capacity. Section 2.2 describes the Algorithm $G$ and its analysis assuming an explicit lower bound on the capacity. Section 2.3 contains the main theorem concerning the lower bound on capacity of rank non-decreasing operators.

The above is actually not the usual definition of completely positive operators. $T$ is defined to be positive if $T(X)\succeq 0$ whenever $X\succeq 0$ . $T$ is completely positive if $I_{k}\otimes T$ is positive for all $k\geq 1$ . Choi [Cho75] proved that an operator is completely positive iff it is of the form stated above.

If $T(X)=\sum_{i=1}^{m}A_{i}XA_{i}^{\dagger}$ is a completely positive operator, we define its dual $T^{*}$ by $T^{*}(X)=\sum_{i=1}^{m}A_{i}^{\dagger}XA_{i}$ . If both $T$ and $T^{*}$ are trace preserving, namely $T(I)=T^{*}(I)=I$ then we call $T$ (and $T^{*}$ ) doubly stochastic.

A completely positive operator $T$ is said to be rank-decreasing if there exists an $X\succeq 0$ s.t. $\text{rank}(T(X))<\text{rank}(X)$ . $T$ is said to be $c$ -rank-decreasing if there exists an $X\succeq 0$ s.t. $\text{rank}(T(X))\leq\text{rank}(X)-c$ . We will sometimes refer to operators that are not rank decreasing as rank non-decreasing.

Now that we defined completely positive operators, we define their capacity, which is a very important complexity measure of such operators suggested in [Gur04]. Its evolution will be central to the complexity analysis of our algorithm.

[Gur04] The capacity of a completely positive operator $T$ , denoted by $\textnormal{cap}(T)$ , is defined as

This notion of capacity has a very interesting history. Some special cases of capacity were defined in [GS00] and [GS02]. It was then extended to hyperbolic polynomials and this also led to a resolution (and extremely elegant proofs) of the Van der Waerden Conjecture for Mixed Discriminants [Gur06].

The next proposition shows how capacity changes when linear transformations are applied (as in the algorithm) to the completely positive operator.

Let $T$ be the operator defined by $A_{1},\ldots,A_{m}$ and let $T_{B,C}$ be the operator defined by $BA_{1}C,\ldots,BA_{m}C$ , where $B,C$ are invertible matrices. Then

The next proposition gives a useful upper bound on the capacity of trace preserving completely positive operators; this will be used for the convergence analysis of the algorithm.

Let $T$ be a completely positive operator with Kraus operators $A_{1},\ldots,A_{m}$ (which are $n\times n$ complex matrices). Also suppose that either $\sum_{i=1}^{m}A_{i}^{\dagger}A_{i}=I$ or $\sum_{i=1}^{m}A_{i}A_{i}^{\dagger}=I$ . Then $\textnormal{cap}(T)\leq 1$ .

Note that $\textnormal{tr}(T(I))=n$ in either case.

The second inequality follows from the AM-GM inequality. ∎

2 Algorithm G𝐺G and its convergence rate

We now describe Algorithm $G$ to test if a completely positive operator $T$ (given in terms of Kraus operators $A_{1},\ldots,A_{m}$ ) is rank non-decreasing (equivalently properties $(1)-(8)$ in Theorem 1.4 one of which is that $A_{1},\ldots,A_{m}$ admit no shrunk subspace). We will then analyze its convergence rate, namely the number of scaling iterations needed as a function of the input size. In section 4, we will continue with the finer analysis of the bit complexity of this algorithm.

Since the property of $A_{1},\ldots,A_{m}$ having a shrunk subspace or not remains invariant if we replace $A_{1},\ldots,A_{m}$ by a basis spanning the subspace spanned by $A_{1},\ldots,A_{m}$ , we can always assume wlog that $m\leq n^{2}$ . Suppose the maximum bit size of the entries of $A_{1},\ldots,A_{m}$ is $b$ . Since scaling the matrices $A_{1},\ldots,A_{m}$ doesn’t change the problem, we can assume that $A_{1},\ldots,A_{m}$ have integer entries of magnitude at most $M=2^{O(b)}$ .

Algorithm $G$ below is essentially Gurvits’ algorithm [Gur04] for Edmonds’ problem for subspaces of matrices having Edmonds-Rado property. It is a non-commutative generalization of the Sinkhorn scaling procedure to scale non-negative matrices to doubly stochastic matrices (see for example [LSW98] and references therein). The algorithm alternates applying a “normalizing” basis change from the left and right to the given matrices, so as to alternately make the operator or its dual trace preserving. The idea is that this process will converge to a doubly stochastic operator iff it is not rank-decreasing, and furthermore, we can bound the number $t$ of iterations by $\textnormal{poly}(n,\log(M))$ . We will use the following to measure of our operator to being doubly stochastic.

Given a completely positive operator $T$ , define its right normalization $T_{R}$ as follows:

Given a completely positive operator $T$ , define its left normalization $T_{L}$ as follows:

Note that $T_{L}(I)=I$ . These operations are referred to as row and column operations in [Gur04].

We next prove that if a completely positive operator $T$ is doubly stochastic, or close to being one in this measure, then it is rank non-decreasing. This will be important for the termination condition of Algorithm $G$ .

If $T$ is a completely positive operator which is right $($ or left $)$ normalized satisfying $\textnormal{ds}(T)\leq 1/(n+1)$ , then $T$ is rank non-decreasing.

Suppose $X\succeq 0$ is a psd matrix s.t. $\text{Rank}(X)=r$ . We would like to prove that $\text{Rank}(T(X))\geq r$ . Let

be the eigenvalue decomposition of $X$ where $v_{i}$ ’s are orthonormal and $\lambda_{i}>0$ . Then

Let us denote $T\left(v_{i}v_{i}^{\dagger}\right)$ by $R_{i}$ . Since $T(I)=I$ , we get that $\sum_{i=1}^{r}R_{i}\preceq T(I)=I$ . Also note that

This is because $R_{i}$ ’s are psd matrices and hence a vector is in the kernel of $\sum_{i=1}^{r}\lambda_{i}R_{i}$ iff it is in the kernel of all the $R_{i}$ ’s. Because of $\sum_{i=1}^{r}R_{i}\preceq I$ , we get that

Suppose $T^{*}(I)=I+\Delta$ . We know that $\textnormal{tr}[\Delta^{2}]\leq 1/(n+1)$ . Then

Let $P$ denote the projection onto $\{v_{1},\ldots,v_{r}\}$ . Adding the above equations for all $i\in\{1,\ldots,r\}$ , we get

which completes the proof since $\text{Rank}\left(T(X)\right)$ is an integer. ∎

We are now ready to present the algorithm.

In algorithm $G$ , the operators are maintained in terms of the Kraus operators and it is easy to see the effect of right and left normalizations on the Kraus operators. In fact, they are named such since for a right (left) normalization, the Kraus operators multiplied on the right (left) by $T^{*}(I)^{-1/2}\left(T(I)^{-1/2}\right)$ .

The main objective is to analyze the minimum number of steps $t$ for which this algorithm terminates with the correct answer. The following theorem of [Gur04] gives the time analysis of the algorithm assuming an initial lower bound on the capacity of the input completely positive operator. In the next subsection we will prove our main result, an appropriate lower bound, which shows that the algorithm terminates with the correct answer in polynomial time.

Let $T$ be a completely positive operator that is rank non-decreasing, with Kraus operators $A_{1},\ldots,A_{m}$ , which are $n\times n$ integer matrices such that each entry of $A_{i}$ has absolute value at most $M$ . Suppose $T_{R}$ is the right normalization of $T$ . If $\textnormal{cap}(T_{R})\geq f(n,M)$ , then Algorithm $G$ when applied for at least $t=2+36n\cdot\log(1/f(n,M))$ steps is correct.

For completeness sake, we provide a full proof of this theorem. Again, this analysis follows similar ones for the classical Sinkhorn iterations, e.g. as in [LSW98, GY98]. Basically, capacity increases by a factor roughly $1+1/36n$ per iteration as long as it is not too close to 1.

If either $T(I)$ or $T^{*}(I)$ is singular, then $T$ is rank-decreasing. When $T(I)$ is singular, $T$ decreases the rank of $I$ . When $T^{*}(I)$ is singular, any vector in the kernel of $T^{*}(I)$ lies in the kernels of all the $A_{i}$ ’s. If $T(I)$ and $T^{*}(I)$ are both non-singular, it is easy to verify that $T_{j}(I)$ and $T_{j}^{*}(I)$ will remain non-singular for all $j$ and hence step 2 is well defined. Also using Theorem 2.12 and the fact that right and left normalizations don’t change the property of being rank decreasing, Algorithm $G$ will always output rank decreasing if $T$ is rank-decreasing.

So what is left to prove is if $T$ is rank non-decreasing, then $\min\{\epsilon_{j}:1\leq j\leq t\}\leq 1/6n$ . Assume to the contrary that it is not. Denote by $\textnormal{cap}_{j}$ to be the capacity of the operator $T_{j}$ . By Lemma 5.2 below (which essentially is a robust version of the AM-GM inequality), if $\epsilon_{j}>1/6n$ , then $\textnormal{cap}_{j+1}\geq\textnormal{exp}(1/36n)\cdot\textnormal{cap}_{j}$ .

From the assumption of the theorem, we know that $\textnormal{cap}_{1}\geq f(n,M)$ . Also it is easy to see that $\textnormal{cap}_{j}\leq 1$ for all $j$ (Proposition 2.8). However by the assumption that $\min\{\epsilon_{j}:1\leq j\leq t\}>1/6n$ and the increase in capacity per iteration seen above, we get that

Plugging in $t=2+36n\cdot\log(1/f(n,M))$ gives us the required contradiction.

In the main Theorem 2.20 below we will prove that the quantity $f(n,M)$ used in the statement of Theorem 2.14 is $\geq\textnormal{exp}(-4n\log(Mmn))$ , which will prove that the number of iterations needed in Algorithm $G$ is $O(n^{2}\log(Mmn))$ . But this alone doesn’t guarantee that the algorithm is actually polynomial time, since the bit sizes of numbers involved might get exponential. As it happens, simple truncation suffices to overcome this problem, as shown in [Gur04], and we reproduce this analysis in Subsection 4.2 for completeness.

3 Main Theorem: The Lower Bound on Capacity

In this subsection we prove our main theorem, a lower bound on capacity of an operator in terms of its description size, in Theorem 2.20. Before diving into the proof we provide a high level plan. We will first prove that if a completely positive operator with integer entries is rank non-decreasing (that is has positive capacity), then the capacity is actually non-negligible. Our starting point (Theorem 1.4) is the statement of the equivalence between the rank non-decreasing property and an algebraic condition (non-vanishing of a certain determinant). Using Alon’s Combinatorial Nullstellensatz, we can ensure small coefficients in the algebraic condition above. We then state and prove (Lemma 2.17) a Cauchy-Schwartz inequality for matrices needed next. The main result (Theorem 2.18 ) proceeds by applying an appropriate scaling to the original integral operator, which reduces it to one that preserves the identity matrix. The latter property (which provides a trace bound), the integrality of the original operator (which provides an integral determinantal lower bound) and the multiplicativity of capacity under scaling combine (via the above inequality) to give the desired bound. We now proceed with the details of this plan.

We will need the fact that (nonzero) polynomials of degree $d$ cannot vanish on all points with non-negative integer coordinates with sum $\leq d$ . This follows from Alon’s combinatorial nullstellensatz [Alo99, Theorem 1.2].

As a corollary of Theorem 1.4 and of Lemma 2.15, we get the following:

Furthermore, $D_{1},\ldots,D_{m}$ can be chosen to be matrices with non-negative integer entries s.t.

This is because the total degree of $p$ is $nd$ and the individual degree of each variable $X_{j}(k,l)$ in $p$ is at most $n$ . This implies the desired upper bound. ∎

We will also need the following Cauchy-Schwarz like inequality:

Let $C_{1},\ldots,C_{m}$ and $D_{1},\ldots,D_{m}$ be $d_{1}$ and $d_{2}$ dimensional complex matrices, respectively. Then

The first inequality is just $\textnormal{tr}\left[YY^{\dagger}\right]\geq 0$ . The second equality follows from linearity of trace, $\textnormal{tr}\left[Y\otimes Z\right]=\textnormal{tr}[Y]\cdot\textnormal{tr}[Z]$ and $\textnormal{tr}[XY^{\dagger}]=\textnormal{tr}[YX^{\dagger}]$ . Rearranging the above, we get the following inequality:

Summing Equation (1) over pairs $(i,j)$ with $i<j$ , we obtain the following:

Adding $\sum_{i}\textnormal{tr}\left[C_{i}C_{i}^{\dagger}\right]\cdot\textnormal{tr}\left[D_{i}D_{i}^{\dagger}\right]$ to both sides completes the proof (due to linearity of trace). ∎

Now we are ready to prove our main theorem for operators with integral Kraus operators.

Since $T_{A}$ is rank non-decreasing, by Corollary 2.16, there exist non-negative integer matrices $D_{1},\ldots,D_{m}$ of some dimension $d$ s.t. $\textnormal{Det}(D_{1}\otimes A_{1}+\cdots+D_{m}\otimes A_{m})\neq 0$ and also

Since $A_{1},\ldots,A_{m}$ are also integer matrices, $\textnormal{Det}(D_{1}\otimes A_{1}+\cdots+D_{m}\otimes A_{m})$ is a non-zero integer and hence

Let $X\succ 0$ be such that $\textnormal{Det}(X)=1$ . Consider the matrices $C_{i}=T_{A}(X)^{-1/2}A_{i}X^{1/2}$ . This is a scaling intended so that the operator $T_{C}$ defined by the matrices $C_{i}$ preserves identity i.e. satisfies

Let $Y$ denote the matrix $D_{1}\otimes C_{1}+\cdots+D_{m}\otimes C_{m}$ . Then

The equality follows from multiplicativity of determinant and the fact that $\textnormal{Det}\left(I_{d}\otimes Y\right)=\textnormal{Det}(Y)^{d}$ . The inequality follows from $\textnormal{Det}(X)=1$ and Equation (3). Hence we obtain

On the other hand, by the AM-GM inequality

The first inequality follows from Lemma 2.17. The second equality follows from Equation (4). The second inequality follows from Equation (2). Combining equations (7), (8) and (9), we obtain:

Since $X$ was an arbitrary matrix s.t. $X\succ 0$ and $\textnormal{Det}(X)=1$ , we obtain the desired lower bound on $\textnormal{cap}(T_{A})$ . ∎

A slightly stronger bound can be obtained by using quantum permanents [Gur04] in the case when the $D_{i}$ ’s are of constant dimension.

Theorem 2.18 immediately implies the following capacity lower bound that we need.

Suppose $T$ is a cptp map which is rank non-decreasing and is obtained by right normalization of an operator with Kraus operators $A_{1},\ldots,A_{m}$ , which are $n\times n$ integer matrices such that each entry of $A_{i}$ has absolute value at most $M$ . Then $\textnormal{cap}(T)\geq\textnormal{exp}(-4n\log(Mmn))$ .

Let $T_{A}$ be the operator defined by $A_{1},\ldots,A_{m}$ . Since $T$ is the right normalization of $T_{A}$ , by Proposition 2.7, we get that

Note that $T^{*}(I)$ is a psd matrix. Also

This implies (via the AM-GM inequality) that

Combining Equations (10) and (11), and Theorem 2.18 gives the desired bound. ∎

Properties of Capacity

In this section, we prove some interesting properties of capacity. In subsection 3.1, we prove that the capacity of operators which are close to doubly stochastic is close to $1$ . This is used in the proof of Theorem 3.11. In subsection 3.2, we characterize the almost optimal points of capacity in terms of approximate fixed points of an operator. This will be used later in section 4 in the proof of continuity of capacity (theorem 4.5). In 3.3, we prove a multiplicativity property of capacity under tensor products of operators. This property was used in the lower bound on capacity in the previous version of this paper. While it is no longer needed for this purpose now, it is an interesting and intriguing fact by itself, so we include a proof.

The following definition of capacity for non-negative matrices is due to [GY98]. They used it to analyze Sinkhorn’s algorithm for matrix scaling.

Given a non-negative matrix $A$ , its capacity is defined as follows:

The next lemma states that the capacity of almost doubly stochastic matrices is close to $1$ .

Let $A$ be a non-negative $n\times n$ matrix s.t. the row sums of $A$ are $1$ and

where $c_{1},\ldots,c_{n}$ are the column sums of $A$ . Then

For the inequality, we used concavity of log and the fact that row sums of $B$ are $1$ . For the third equality, we used that the column sums of $B$ are $1$ .

Now we need to move on to $A$ which is almost doubly stochastic. A direct argument like for the doubly stochastic case does not work. We need to first prove the following claim ([LSW98]):

There exists a decomposition of $A=\lambda B+Z$ , where $B$ is doubly stochastic, $Z$ is non-negative and $\lambda\geq 1-\sqrt{n\epsilon}$ .

Given the claim, it is easy to complete the proof of the lemma.

So we end with a proof of the claim from [LSW98]. We will first prove that there is a decomposition $A=D+Z$ where $D$ is multiple of a doubly stochastic matrix and $\text{per}(Z)=0$ . Initially, we start with $D=0$ and $Z=A$ . As long as $\text{per}(Z)>0$ , there is an $\alpha>0$ and a permutation matrix $P$ s.t. $Z^{\prime}=Z-\alpha P$ is nonnegative and the number of non-zero entries in $Z^{\prime}$ is strictly less than that of $Z$ . Replace $D$ by $D^{\prime}=D+\alpha P$ and $Z$ by $Z^{\prime}$ . Note that $D^{\prime}$ is also a multiple of a doubly stochastic matrix. After at most $n^{2}$ iterations, we will find a decomposition $A=\lambda B+Z$ , where $\text{per}(Z)=0$ (and $B$ is doubly stochastic). We will now prove that $\lambda\geq 1-\sqrt{n\epsilon}$ . If $\lambda=1$ , then we are already done, so assume $\lambda<1$ . In this case, consider the matrix $C=\frac{Z}{1-\lambda}$ . Row sums of $C$ are $1$ and the $i^{\text{th}}$ column sum $c_{i}^{\prime}=\frac{c_{i}-\lambda}{1-\lambda}$ . Then

Since $\text{per}(C)=0$ and row sums are $1$ , it follows that (see Lemma 5.2 in [LSW98])

which implies $\lambda\geq 1-\sqrt{n\epsilon}$ ∎

The next lemma says that capacity of almost doubly stochastic operators is close to $1$ . The proof will proceed by reducing the operator case to the non-negative matrix case established above.

Let $T$ be a positive operator acting on $n\times n$ matrices such that $\textnormal{tr}[(T(I)-I)^{2}]\leq\epsilon$ and $T^{*}(I)=I$ (equivalently $T$ is trace-preserving). Then $\textnormal{cap}(T)\geq(1-\sqrt{n\epsilon})^{n}$ .

The proof can be adapted to obtain a similar statement when both $T(I),T^{*}(I)\approx I$ .

Let $X$ be a positive definite matrix s.t. $\textnormal{Det}(X)=1$ . Let

be an eigenvalue decomposition of $X$ with $v_{1},\ldots,v_{n}$ forming a orthonormal basis. Then

Let us denote $T(v_{j}v_{j}^{\dagger})$ by $R_{j}$ . Then since $T$ is trace preserving, we have that $\textnormal{tr}(R_{j})=1$ for all $j$ . Also let

be an eigenvalue decomposition for $T(X)$ with $u_{1},\ldots,u_{n}$ forming a orthonormal basis. It follows that for all $i$ ,

Let $A$ denote the non-negative matrix s.t. $A_{i,j}=u_{i}^{\dagger}R_{j}u_{i}$ . Then $\sigma=A\lambda$ . Also the column sums of $A$ are all $1$ since $\textnormal{tr}(R_{j})=1$ for all $j$ . We will also prove that

where $r_{1},\ldots,r_{n}$ denote the column sums of $A$ . Then the lemma follows from Lemma 3.2 (applied with row sums switched with column sums) and the facts that $\textnormal{Det}(T(X))=\prod_{i=1}^{n}\sigma_{i}$ and $\textnormal{Det}(X)=\prod_{i=1}^{n}\lambda_{i}$ . Now note that

The first inequality can be proved via convexity of square. ∎

We also prove an easy lemma for the other direction: operators with capacity almost $1$ are almost doubly stochastic.

Let $T$ be a positive operator acting on $n\times n$ matrices s.t. $\textnormal{cap}(T)\geq\textnormal{exp}(-\delta)$ , $\delta\leq 1/6$ and also $T^{*}(I)=I$ . Then $\textnormal{ds}(T)=\textnormal{tr}[(T(I)-I)^{2}]\leq 6\delta$ .

$\textnormal{cap}(T)\geq\textnormal{exp}(-\delta)$ implies that $\textnormal{Det}(T(I))\geq\textnormal{exp}(-\delta)$ . Also $\textnormal{tr}[T(I)]=\textnormal{tr}\left[T^{*}(I)\right]=n$ . Suppose $\textnormal{tr}[(T(I)-I)^{2}]=\alpha$ . Then Lemma 5.1 below implies that

This implies that $\alpha\leq 6\delta$ . ∎

Note that the parameters in Lemmas 3.4 and 3.6 don’t match and that is because capacity and the distance to doubly stochasticity don’t characterize each other exactly.

2 Characterization of approximate optimizers for capacity

Let $T$ be a completely positive operator. Define $\text{Fixed}(T,\epsilon)$ to be the set of hermitian positive-definite matrices which are $\epsilon$ -approximate fixed point of the operator $X\rightarrow T^{*}(T(X)^{-1})^{-1}$ i.e. $C\in\text{Fixed}(T,\epsilon)$ if the following holds:

The following lemma essentially says that the elements of $\text{Fixed}(T,\epsilon)$ are approximate minimizers for $\textnormal{cap}(T)$ .

Suppose $C\in\text{Fixed}(T,\epsilon)$ . If $\epsilon\leq 1/(n+1)$ , then $T$ is rank non-decreasing. Furthermore,

Here $n$ is the size of matrices on which $T$ acts. Similar statement also holds for the operator $X\rightarrow T(T^{*}(X)^{-1})^{-1}$ .

We will use a $C\in\text{Fixed}(T,\epsilon)$ to find a scaling of $T$ which is almost doubly stochastic and then apply Theorem 2.12 (saying that almost doubly stochastic operators are rank non-decreasing) and Lemma 3.4 (giving a lower bound on capacity of almost doubly stochastic operators).

When $\epsilon\leq 1/(n+1)$ , by Theorem 2.12, $Z$ is rank non-decreasing and hence $T$ is rank non-decreasing. Also by Lemma 3.4,

The upper bound on $\textnormal{cap}(T)$ is clear from the definition of capacity. ∎

Next we prove the other direction but again, not with matching parameters.

where $\delta\leq 1/6$ . Then $C\in\text{Fixed}(T,6\delta)$ .

As in the proof of Lemma 3.9, define the operator

This implies that $C\in\text{Fixed}(T,6\delta)$ because of the way operator $Z$ is set up. ∎

3 Multiplicativity of capacity

In this subsection, we prove a curious multiplicativity property of capacity of completely positive operators. This is not required for the lower bound on capacity presented in this version but the easy direction of this theorem was required in the lower bound on capacity in the previous version.

Let $T_{1}$ and $T_{2}$ be completely positive operators where $T_{1}$ acts on matrices of dimension $d_{1}$ and $T_{2}$ acts on matrices of dimension $d_{2}$ . Then

We first prove the easy $\leq$ direction of the above theorem. If $X$ and $Y$ are the minimizers for $\textnormal{cap}(T_{1})$ and $\textnormal{cap}(T_{2})$ , then looking at how much $T_{1}\otimes T_{2}$ shrinks the determinant of $X\otimes Y$ will give us the required statement. The infimum for $\textnormal{cap}(T_{1})$ and $\textnormal{cap}(T_{2})$ might not be achieved but that is fine.

Let $T_{1}$ and $T_{2}$ be completely positive operators where $T_{1}$ acts on matrices of dimension $d_{1}$ and $T_{2}$ acts on matrices of dimension $d_{2}$ . Then

Suppose $X$ , $Y$ be $d_{1}$ and $d_{2}$ dimensional matrices respectively s.t. $X,Y\succ 0$ and $\textnormal{Det}(X),\textnormal{Det}(Y)=1$ . Then $X\otimes Y\succ 0$ and $\textnormal{Det}(X\otimes Y)=1$ as well. Also

This proves that $\textnormal{cap}(T_{1})^{d_{2}}\cdot\textnormal{cap}(T_{2})^{d_{1}}\geq\textnormal{cap}(T_{1}\otimes T_{2})$ (by taking $X$ and $Y$ to be sequences of matrices which approach $\textnormal{cap}(T_{1})$ and $\textnormal{cap}(T_{2})$ respectively). Taking $1/(d_{1}d_{2})$ powers on both sides completes the proof of the easy direction.

We are now ready to prove Theorem 3.11. The main ingredient is the duality between capacity being and the existence of a scaling to almost doubly stochastic position.

More formally, $S_{1}^{*}(I)=I,S_{2}^{*}(I)=I$ and $\textnormal{tr}\left[(S_{1}(I)-I)^{2}\right]\leq\epsilon,\textnormal{tr}\left[(S_{2}(I)-I)^{2}\right]\leq\epsilon$ . Now consider the operator $S=S_{1}\otimes S_{2}$ . Note that $S(I)=S_{1}(I)\otimes S_{2}(I)$ and $S^{*}(I)=S_{1}^{*}(I)\otimes S_{2}^{*}(I)$ . Hence $S^{*}(I)=I$ and $\textnormal{tr}\left[(S(I)-I)^{2}\right]\leq\epsilon^{\prime}=2(d_{1}+d_{2}+\epsilon)\epsilon$ . This can be seen from the following chain of inequalities (and equalities):

The first inequality follows from the following Cauchy-Schwarz inequality for psd matrices:

The second inequality follows from $\textnormal{tr}[S_{1}(I)^{2}]\leq d_{1}+\epsilon$ , which follows from the fact that $\textnormal{tr}[S_{1}(I)]=\textnormal{tr}[S_{1}^{*}(I)]=\textnormal{tr}\left[I_{d_{1}}\right]=d_{1}$ and that $\textnormal{tr}\left[(S_{1}(I)-I)^{2}\right]\leq\epsilon$ , as well as $\textnormal{tr}\left[(S_{2}(I)-I)^{2}\right]\leq\epsilon$ . Note that the operator $S$ is explicitly given as follows:

By Lemma 3.4, we have that $\textnormal{cap}(S)\geq(1-\sqrt{d_{1}d_{2}\epsilon^{\prime}})^{d_{1}d_{2}}$ and hence

Combining the fact that $S_{1}^{*}(I)=I$ and $S_{2}^{*}(I)=I$ along with Proposition 2.8, we get that $\textnormal{cap}(S_{1}),\textnormal{cap}(S_{2})\leq 1$ . Also by Proposition 2.7,we have that

Combining equations (12), (13) and (14), we get that

Since $\epsilon^{\prime}=2(d_{1}+d_{2}+\epsilon)\epsilon$ can be taken to be arbitrarily close to , this completes the proof. ∎

Bit Complexity Analysis of Algorithm G𝐺G and Continuity of Capacity

In this section, we will provide the bit complexity analysis of Algorithm $G$ and also provide explicit bounds on the continuity of capacity by a sensitivity analysis of Algorithm $G$ . We start by analyzing a natural iterative sequence associated with an operator $T$ that will be very useful, both, in the bit complexity and continuity analysis. Section 4.1 will describe how the operators $T_{j}$ in Algorithm $G$ evolve with respect this iterative sequence. Section 4.2 provides the bit complexity analysis of Algorithm $G$ and Section 4.3 provides explicit bounds for continuity of capacity.

Consider the sequence of matrices given by $S_{0}=T^{*}(I)$ ( $I$ is of size $n\times n$ ), and

The next proposition studies the stability properties of this sequence of matrices.

For a real symmetric positive definite matrix $X$ , let $l(X)$ denote its smallest eigenvalue and $u(X)$ its largest.

Suppose $A$ is an $n\times n$ real symmetric positive definite matrix with integer entries s.t. the magnitude of any entry of $A$ is at most $M$ . Then $l(A)\geq\frac{1}{(Mn)^{n-1}}$ and $u(A)\leq Mn$ .

Define $\alpha=(M^{2}n^{2}m)^{n-1}$ . Let $T$ be a completely positive operator whose Kraus operators $A_{1},\ldots,A_{m}$ are $n\times n$ integer matrices and each entry of $A_{i}$ is of magnitude at most $M$ . Also assume that $T(I),T^{*}(I)$ are both non-singular. Then

Let $T,M,n,m$ be as before. Let $X$ be a real symmetric matrix. Then $||T(X)||\leq\alpha||X||$ . Also if X is real symmetric positive definite, then $l(T(X))\geq\alpha^{-1}\cdot l(X)$ and $u(T(X))\leq\alpha\cdot u(X)$ . Similarly $||T^{*}(X)||\leq\alpha||X||$ , $l(T^{*}(X))\geq\alpha^{-1}\cdot l(X)$ and $u(T^{*}(X))\leq\alpha\cdot u(X)$ .

$S_{j}$ are real symmetric positive definite matrices for all $0\leq j\leq t$ . Also $l(S_{j})\geq\alpha^{-(j+1)}$ and $u(S_{j})\leq\alpha^{j+1}$ for all $0\leq j\leq t$ .

For all $1\leq k,l\leq n$ , $|S_{j}(k,l)|\leq\alpha^{j+1}$ .

The last inequality follows from Cauchy-Schwarz. Now since $A$ is positive definite and the determinant is an integer, $\textnormal{Det}(A)\geq 1$ . Then

from which it follows that $l(A)\geq\frac{1}{(Mn)^{n-1}}$ .

Follows from previous point by noting that each entry of $T(I),T^{*}(I)$ is an integer of magnitude at most $M^{2}nm$ and $T(I),T^{*}(I)$ are both symmetric matrices.

Suppose $X$ is a real symmetric matrix satisfying $||X||=\beta$ . Then

since $T$ is completely positive and hence it maps psd matrices to psd matrices. Thus we get

Similarly one can prove that $\beta\alpha I\succeq T(X)$ which would imply $||T(X)||\leq\alpha\beta=\alpha||X||$ . Other statements can be proven in a similar manner.

We will prove this via induction on $j$ . It is true for $S_{0}$ by point 2 in the proposition. Suppose the statement holds for $S_{j-1}$ . Then $S_{j}=T(S_{j-1}^{-1})$ or $S_{j}=T^{*}(S_{j-1}^{-1})$ . It does not really matter which case it is for the purpose of this proof. Lets assume the former. Then

The first inequality is by point 3 in the proposition. The second inequality is by induction hypothesis. Similarly we can also prove that $u(S_{j})\leq\alpha^{j+1}$ , which would complete the induction step.

Here $e_{k}$ and $e_{l}$ are the standard basis vectors.

Define another sequence of matrices defined by $U_{0}=S_{0}$ and

where $\Delta_{j}$ ’s are small perturbations. We now study the closeness of the sequences $\{U_{j}\}$ and $\{S_{j}\}$ .

If $||\Delta_{j}||\leq\frac{1}{2^{j}\cdot\alpha^{j+1}}$ for all $j\geq 1$ , then $l(U_{j})\geq\frac{1}{2^{j}\cdot\alpha^{j+1}}$ and $u(U_{j})\leq 2^{j}\cdot\alpha^{j+1}$ , for all $j\geq 0$ . Here $\alpha=(M^{2}n^{2}m)^{n-1}$ .

We will prove this using induction on $j$ . It is true for $j=0$ since $U_{0}=S_{0}$ . Assume that $l(U_{j})\geq\frac{1}{2^{j}\cdot\alpha^{j+1}}$ and $u(U_{j})\leq 2^{j}\cdot\alpha^{j+1}$ . Then $U_{j+1}=T(U_{j}^{-1})+\Delta_{j+1}$ or $U_{j+1}=T^{*}(U_{j}^{-1})+\Delta_{j+1}$ . Suppose it is $T(U_{j}^{-1})+\Delta_{j+1}$ . The other case is similar.

Note that there is a lot of slack in this part, but it does not matter for our purposes. ∎

The following holds for every integer $t$ : suppose $||\Delta_{j}||\leq\delta$ for all $1\leq j\leq t$ , where $\delta\leq\frac{1}{(2\alpha)^{t+1}}$ . Then

for all $0\leq j\leq t$ . Here $\alpha=(M^{2}n^{2}m)^{n-1}$ .

For the fourth inequality we used Proposition 4.1 and Lemma 4.2. Note that value of $\delta$ is small enough so that conditions of Lemma 4.2 are satisfied. ∎

In this subsection we compute succinct expressions for the operators $T_{j}$ which appear in Algorithm $G$ , together with expressions for their capacity and distance to doubly stochastic. These expressions will involve the matrices $S_{j}$ defined in the previous subsection. These operators $T_{j}$ are the intermediary operators in the scaling procedure, and the succinct expressions will be important for us when approximating capacity, or proving the stability of the capacity of a quantum operator.

Starting from a completely positive operator $T$ which satisfies $T(I)=I$ , denote the sequence of operators obtained after right and left normalizations by $\{T_{j}\}$ i.e. $T_{j}$ is the operator obtained after $j$ iterations of right and left normalizations (as in Algorithm $G$ ). Note that

For each $j$ , $T_{j}$ is an operator scaling of $T$ . So

for some non-singular matrices $C_{j}$ and $D_{j}$ . Let us denote $C_{j}^{\dagger}C_{j}$ by $P_{j}$ and $D_{j}^{\dagger}D_{j}$ by $Q_{j}$ . It turns out $P_{j}$ and $Q_{j}$ are the matrices that really matter for the purpose of analyzing Algorithm $G$ and we will see next how these evolve.

When $j$ is odd, $T_{j}^{*}(I)=I$ . Note that

So the condition $T_{j}^{*}(I)=I$ gives us that $Q_{j}=T^{*}(P_{j})^{-1}$ . Also note that when $j$ is odd, $P_{j}=P_{j-1}$ . When $j$ is even, we have, $T_{j}(I)=I$ , which implies $P_{j}=T(Q_{j})^{-1}$ and also $Q_{j}=Q_{j-1}$ holds. This can be summarized as follows:

along with $T_{0}=T$ and $P_{0},Q_{0}=I$ . Thus we can see that

with the convention $S_{-1}=I$ . For Algorithm $G$ , what matters is

we get (because similar matrices have the same trace) that

For the computation of capacity, the determinants that matter are

By the discussion above and the fact that similar matrices have the same determinants, we get that

with the convention $S_{-2},S_{-1}=I$ . Thus we get

2 Bit Complexity Analysis of Algorithm G𝐺G

We are now ready to analyze the bit complexity of Algorithm $G$ . We will prove that if one runs Algorithm $G$ while truncating the numbers to an appropriate polynomial number of bits there is essentially no change in the convergence and required number of iterations. We will call the algorithm working with truncated inputs Algorithm $G^{\prime}$ .

Given a matrix $A$ , let $\text{Trn}(A)$ be the matrix obtained by truncating the entries of A up to $P(n,\log(M))$ bits after the decimal point. $P(n,\log(M))$ is a polynomial which we will specify later. Note that $||A-\text{Trn}(A)||_{\infty}\leq 2^{-P(n,\log(M))}$ . We now describe Algorithm $G^{\prime}$ , which is the variant of Algorithm $G$ with truncation.

The parameter $t$ will be chosen as before: $t=2+36n\cdot\log(1/f(n,M))$ . We now show that throughout the iterations, distances to double stochasticity is essentially the same for the original and truncated algorithms $G$ and $G^{\prime}$ .

Here $\epsilon_{j}=\textnormal{ds}(T_{j})$ as defined in Algorithm $G$ . Also note that by equations (19a) and (19b),

Hence by Theorem 2.12, $T$ is rank non-decreasing. Now assume, for the reverse direction, that $T$ is rank non-decreasing. Then by the analysis in Section 2.2, $\min\{\epsilon_{j}:1\leq j\leq t\}\leq 1/12n$ . Hence by Lemma 4.4

This proves the correctness of Algorithm $G^{\prime}$ .

where the last inequality follows from Proposition 4.1. Also

where the last inequality follows from Lemma 4.2 and the fact that $P(n,\log(M))$ will be chosen to be a large enough polynomial so that the perturbations $\Delta_{j}$ in the lemma satisfy the condition $||\Delta_{j}||\leq 2^{-P(n,\log(M))}\leq\frac{1}{2^{j}\cdot\alpha^{j+1}}$ . Now

In the first equality, we used the fact that $\textnormal{tr}(M_{j}N_{j})=\textnormal{tr}(N_{j}M_{j})$ . In the second inequality, we used the fact that the maximum magnitude of any entry in $M_{j}+N_{j}$ is bounded by $(2\alpha)^{2j}$ and that $|\textnormal{tr}(M_{j}-N_{j})|$ is upper bounded by $n||M_{j}-N_{j}||$ . The third inequality is just Cauchy-Schwarz and the fourth is the fact that Frobenius norm of a matrix is upper bounded by $n||M_{j}-N_{j}||^{2}$ . Let us try to upper bound $||M_{j}-N_{j}||$ now.

The second last inequality follows by application of Proposition 4.1 and Lemmas 4.2 and 4.3. Also $\delta$ here is at most $n\cdot 2^{-P(n,\log(M))}$ since $||\Delta_{j}||_{\infty}\leq 2^{-P(n,\log(M))}$ .

Now $\alpha=(M^{2}n^{2}m)^{n-1}=\textnormal{exp}(\Theta(n\log(n)\log(M)))$ (since $m\leq n^{2}$ ). Hence

We proved in Theorem 2.20 that $\log(1/f(n,M))$ is $\textnormal{poly}(n,\log(M))$ , so $P(n,\log(M))$ is also a polynomial in $n$ and $\log(M)$ . ∎

3 Continuity of Capacity

In this section, we prove the continuity of capacity. We prove the following theorem:

Suppose $A_{1},\ldots,A_{m}$ and $B_{1},\ldots,B_{m}$ are two tuples of $n\times n$ matrices such that the bit-complexity of elements of $A_{i}$ ’s is $b$ and $||A_{i}-B_{i}||\leq\delta$ for all $i$ . Let $T_{A}$ be the operator defined by $A_{1},\ldots,A_{m}$ and $T_{B}$ be the operator defined by $B_{1},\ldots,B_{m}$ . Then there exists a polynomial $P(n,b,\log(m))$ s.t. if $\delta\leq\textnormal{exp}(-P(n,b,\log(m)))$ , then $\textnormal{cap}(T_{A})>0$ implies $\textnormal{cap}(T_{B})>0$ . Furthermore,

The fact that capacity is continuous is already mentioned in [Gur04] and can be proved by other methods. But here we provide explicit bounds on the continuity parameters. Recall that $\text{Fixed}(T,\epsilon)$ (defined in subsection 3.2) is the set of hermitian positive-definite matrices $C$ which are $\epsilon$ -approximate fixed points of the operator $X\rightarrow T(T^{*}(X)^{-1})^{-1}$ i.e. satisfy the following condition:

The main insight is that by the analysis of Algorithm $G$ , since $\textnormal{cap}(T_{A})>0$ , there exists a $C\in\text{Fixed}(T_{A},\epsilon)$ with low spectral norm. Since $T_{B}$ is close to $T_{A}$ , $C\in\text{Fixed}(T_{B},\epsilon^{\prime})$ for $\epsilon^{\prime}$ close to $\epsilon$ and then the proof is finished by applying Lemma 3.9.

(Of Theorem 4.5) The proof of Theorem 2.14 can be modified to prove the following: there exists a polynomial $Q(n,b)$ s.t. for all $\epsilon>0$ , if we run $t=Q(n,b)/\epsilon$ iterations of Algorithm $G$ starting from $T_{0}=T_{A}$ satisfying $\textnormal{cap}(T_{A})>0$ , then for some $1\leq j\leq t$ , $\epsilon_{j}=\textnormal{ds}(T_{j})\leq\epsilon$ . Essentially, at each step capacity increases by roughly $\textnormal{exp}(\Omega(\epsilon))$ if $\epsilon_{j}>\epsilon$ , capacity is lower bounded by $\textnormal{exp}(-Q(n,b))$ initially and upper bounded by $1$ always. By equations (19a) and (19b), we know that

where $\{S_{i}\}$ is the sequence of matrices given by $S_{0}=T_{A}^{*}(I)$ , and

Suppose $1\leq r\leq t$ be such that $\epsilon_{r}\leq\epsilon$ . Wlog assume that $r$ is odd. Then $S_{r}=T_{A}\left(T_{A}^{*}\left(S_{r-2}^{-1}\right)^{-1}\right)$ . $\epsilon_{r}\leq\epsilon$ implies that $S_{r-2}^{-1}$ is an $\epsilon$ -approximate fixed point of the operator $X\rightarrow T_{A}\left(T_{A}^{*}(X)^{-1}\right)^{-1}$ . Hence by Lemma 3.9,

We will prove now that $S_{r-2}^{-1}$ is also an $\epsilon^{\prime}$ -approximate fixed point of the operator $X\rightarrow T_{B}\left(T_{B}^{*}(X)^{-1}\right)^{-1}$ for an appropriate choice of $\epsilon^{\prime}$ . Let us denote $S_{r-2}^{-1}$ by $C$ . By an application of Proposition 4.1, it follows that the lowest and highest eigenvalues of $C$ , $l(C)$ and $u(C)$ satisfy

where $Q_{1}(n,b,\log(m))$ is another polynomial s.t. $Q_{1}(n,b,\log(m))=O(Q(n,b)\cdot n\cdot b\cdot\log(nm))$ . Let $D$ be an arbitrary matrix. Then

The first two inequalities are just the triangle inequality. The third inequality follows from submultiplicativity of the spectral norm. The fourth inequality follows from the fact that

We will now upper bound $||T_{A}\left(T_{A}^{*}(C)^{-1}\right)-T_{B}\left(T_{B}^{*}(C)^{-1}\right)||$ :

Equation (22) along with Proposition 4.1 implies that the lowest and highest eigenvalues of $T_{A}^{*}(C)$ satisfy the following:

Now applying equation (23) with $D=T_{A}^{*}(C)^{-1}$ along with equation (26) gives us the following:

Now we will upper bound $||T_{A}^{*}(C)^{-1}-T_{B}^{*}(C)^{-1}||$ .

The last inequality follows from equations (26) and (24). Note that we need $\delta\leq\textnormal{exp}(-P(n,b,\log(m)))$ for a sufficiently large polynomial $P$ here to upper bound $||T_{B}^{*}(C)^{-1}||$ via equations (26) and (24). Now applying equation (23) with $D=T_{A}^{*}(C)^{-1}-T_{B}^{*}(C)^{-1}$ along with equation (28) gives us that

We are left to upper bound $||T_{A}\left(T_{A}^{*}(C)^{-1}-T_{B}^{*}(C)^{-1}\right)||$ . This follows from Proposition 4.1 and equation (28).

Combining equations (25), (27), (29) and (30), we get the following

Let us denote the matrix $D_{1}=C\cdot T_{A}\left(T_{A}^{*}(C)^{-1}\right)$ and $D_{2}=C\cdot T_{B}\left(T_{B}^{*}(C)^{-1}\right)$ . $D_{1}$ determines whether $C$ is an approximate-fixed point of $T_{A}$ and $D_{2}$ determines whether $C$ is an approximate-fixed point of $T_{B}$ .

The second inequality follows from equations (22) and (31). We also have the following elementary inequality:

The above inequality implies that $C$ is an $\epsilon^{\prime}$ -approximate fixed point of $T_{B}$ for

We can now choose $\epsilon=\frac{1}{2(n+1)}$ . Then as long as $\delta\leq\frac{\textnormal{exp}(-Q_{11}(n,b,\log(m))/\epsilon)}{2(n+1)}$ , then $C$ is a $1/(n+1)$ -approximate fixed point of the operator $X\rightarrow T_{B}\left(T_{B}^{*}(X)^{-1}\right)^{-1}$ and by Lemma 3.9, $T_{B}$ is rank non-decreasing and hence $\textnormal{cap}(T_{B})>0$ . This proves first part of the theorem. For the second part of the theorem, since $C$ is an $\epsilon^{\prime}$ -approximate fixed point of $T_{B}$ for

by equation (23) and Proposition 4.1. Hence

Now, combining equations (21) and (21), we get that

Combining this with equation (35) gives us

Now choose $\epsilon=2\cdot\text{max}\left\{Q_{11}(n,b,\log(m)),Q_{12}(n,b,\log(m))\right\}/\log(1/\delta)$ . This ensures that $\epsilon^{\prime}\leq 2\epsilon$ and elementary calculations can then finish the proof of the theorem. ∎

Computing the Capacity of a Quantum Operator

In this section, we show how algorithm $G$ can be used to compute an approximation to the capacity of any quantum operator. For simplicity of exposition, we will present in this section an analysis of convergence of algorithm $G$ without truncation. Afterwards, in subsection 5.1, we show how to adapt the analysis of algorithm $G$ to handle the truncation. This corresponds to the analysis of algorithm $G^{\prime}$ in the previous section.

We begin with the following lemma, which is an adaptation of Lemma 3.10 from [LSW98].

Let $x_{1},\ldots,x_{n}$ be positive real numbers such that $\displaystyle\sum_{i=1}^{n}x_{i}=n$ and $\displaystyle\sum_{i=1}^{n}(x_{i}-1)^{2}=\alpha$ . Then

Where in the last inequalities we used the fact that $\displaystyle\sum_{i=1}^{n}(x_{i}-1)^{3}\leq\left(\displaystyle\sum_{i=1}^{n}(x_{i}-1)^{2}\right)^{3/2}$ and $\alpha^{3/2}\leq\alpha$ .

Consider the function $f(\lambda)=\prod_{i=1}^{n}(1+\lambda(x_{i}-1))$ . We will prove that $f$ is a decreasing function of $\lambda$ when $\lambda\in$ . In that case

where the last inequality follows from Case 1. Now let us prove that $f$ is decreasing.

As a corollary of Lemma 5.1, we obtain the following quantitative progress measure towards computing capacity:

Let $T$ be a right (left) normalized quantum operator such that $\textnormal{ds}(T)=\alpha$ . Additionally, let $\widetilde{T}$ be the left (right) normalization of operator $T$ . Then,

Suppose $T$ is right normalized and $\widetilde{T}$ is the left normalization of $T$ . Proposition 2.7 tells us that

Let $\lambda_{1},\ldots,\lambda_{n}$ be the eigenvalues of $T(I)$ . As $n=\textnormal{tr}(T(I))=\displaystyle\sum_{i=1}^{n}\lambda_{i}$ and $\alpha=\textnormal{ds}(T)=\textnormal{tr}[(T(I)-I)^{2}]=\displaystyle\sum_{i=1}^{n}(\lambda_{i}-1)^{2}$ , the conditions of Lemma 5.1 apply and we have

This implies the desired lower bounds on $\textnormal{cap}(\widetilde{T})$ . Since the case where $T$ is left normalized is analogous, we omit the argument. ∎

We now state a slight modification of Algorithm G, with a view towards computing the capacity of a quantum operator.

Let $T$ be a completely positive operator, whose Kraus operators are given by $n\times n$ integer matrices $A_{1},\ldots,A_{m}$ , such that each entry of $A_{i}$ has absolute value at most $M$ . Algorithm $G$ when applied for $t=\dfrac{n^{3}}{\epsilon^{2}}\cdot\left(1+10n^{2}\log(Mn)\right)$ steps approximates $\textnormal{cap}(T)$ within a multiplicative factor of $1\pm\epsilon$ .

If either $T(I)$ or $T^{*}(I)$ is singular, then $T$ decreases the rank of $I$ . When $T(I)$ is singular, rank decreasing follows by definition. When $T^{*}(I)$ is singular, one way to see it is that $\text{Im}(A_{i})\subseteq\text{Im}\left(T^{*}(I)\right)$ for all $i$ . Since $\text{Im}\left(A_{i}A_{i}^{\dagger}\right)=\text{Im}(A_{i})$ , we get that $\text{Im}\left(T(I)\right)\subseteq\text{Im}\left(T^{*}(I)\right)$ and hence $T(I)$ is singular. Therefore, the algorithm is correct on step 1, by outputting $\textnormal{cap}(T)=0$ .

If $T(I)$ and $T^{*}(I)$ are both non-singular, it is easy to verify that $T_{j}(I)$ and $T_{j}^{*}(I)$ will remain non-singular for all $j$ and hence step 3 is well defined.

In this case, since right and left normalizations don’t change the property of being rank decreasing, we have $\textnormal{cap}(T_{j})=0$ for all $0\leq j\leq t$ . Hence, Lemma 3.4 implies that $\textnormal{ds}(T_{j})>\dfrac{\epsilon^{2}}{n^{3}}$ for all $0\leq j\leq t$ . In this case, step 3 of Algorithm G will correctly output $\textnormal{cap}(T)=0$ .

In this case, we will show that there must exist $0\leq j\leq t$ such that $\epsilon_{j}\leq\dfrac{\epsilon^{2}}{n^{3}}$ . Assume the contrary, for the sake of contradiction. By Theorem 2.20, we know that $\textnormal{cap}(T_{1})\geq\exp(-10n^{2}\log(Mn))$ . Also Proposition 2.8 implies that $\textnormal{cap}(T_{j})\leq 1$ for all $j$ . However by the assumption that $\epsilon_{j}>\dfrac{\epsilon^{2}}{n^{3}}$ , Lemma 5.2 implies that $\textnormal{cap}(T_{j+1})\geq\exp(\epsilon^{2}/n^{3})\cdot\textnormal{cap}(T_{j})$ for all $0\leq j\leq t$ . Hence, we obtain:

Plugging in $t=\dfrac{n^{3}}{\epsilon^{2}}\cdot\left(1+10n^{2}\log(Mn)\right)$ gives us the required contradiction.

Now that we showed the existence of $0\leq j\leq t$ such that $\epsilon_{j}\leq\dfrac{\epsilon^{2}}{n^{3}}$ , we will show that step 4 indeed computes a good approximation to capacity. For the first $\epsilon_{j}$ such that $\epsilon_{j}\leq\dfrac{\epsilon^{2}}{n^{3}}$ , Lemma 3.4 implies that $\textnormal{cap}(T_{j})\geq(1-\sqrt{n\epsilon_{j}})^{n}\geq(1-\epsilon/n)^{n}\geq 1-\epsilon$ . Since $\textnormal{cap}(T_{j})=\textnormal{cap}(T)\cdot\left(\displaystyle\prod_{i=0}^{j-1}\det(R_{i})\right)^{-1}$ , we have

As $1-\epsilon\leq\textnormal{cap}(T_{j})\leq 1$ , we obtain the correct approximation. ∎

In this subsection, we analyze the computation of capacity when we truncate the input matrices. This analysis will be similar to the one in Section 4.2. We begin with some intuition on why truncation works.

Notice that to approximate the capacity, all we need is to compute the determinants of the matrices $U_{i}$ in Algorithm 2. The $U_{i}$ ’s are the truncations of the matrices $S_{i}$ from equation (15), the latter matrices being important as they describe the scaled operator $T_{j}$ in terms of the original operator $T$ , see equations (16, 17, 18). The reason why truncating the input works is because the eigenvalues of $U_{i}$ are very similar to the eigenvalues of $S_{i}$ . Therefore, we can show that $\det(S_{i})\approx\det(U_{i})$ . This will imply that the truncated capacity is a good approximation to the actual capacity. The analysis will rely mainly on Lemma 4.3, which gives a good bound on the spectral norm of $S_{i}-U_{i}$ . Now we state the truncated algorithm.

Given a matrix $A$ , let $\textnormal{Trn}(A)$ be the matrix obtained by truncating the entries of $A$ up to $P(n,1/\epsilon,\log(M))=\frac{1}{\epsilon}\cdot(n^{12}\log^{4}(Mn))\cdot\log(n^{4}/\epsilon^{2})$ bits after the decimal point.

We now proceed to the analysis of Algorithm 4. In Theorem 5.3, we proved the correctness of Algorithm G without truncation. Thus, to prove correctness of Algorithm 4, it is enough to prove two statements:

if $\widetilde{\epsilon}_{j}\leq\epsilon^{2}/4n^{3}$ , then the operators $T_{j}$ will satisfy the $\epsilon_{j}\leq\epsilon^{2}/n^{3}$ bounds

$\|U_{i}-S_{i}\|\leq 2^{-P(n,1/\epsilon,\log(M))/2}$ .

The first item implies that steps 1 to 3 of the algorithm above are correct, and the second item will tell us that step 4 indeed computes an $1\pm\epsilon$ approximation to capacity. More formally, we have the following theorem.

By applying Lemma 4.4 with parameters $t$ and $P(n,1/\epsilon,\log(M))$ as above, we get that

Therefore, steps 1 to 3 of Algorithm 4 work just as if we had not done any truncation (as in Algorithm 3). This implies that we will always output $\textnormal{cap}(T)=0$ whenever the operator $T$ is rank decreasing.

We are now left with the computation of capacity when $T$ is rank non-decreasing, which is done in step 4. By applying Lemma 4.3 with parameter $\delta=2^{-P(n,1/\epsilon,\log(M))}$ , we get

Let $0\leq\mu_{i1}\leq\mu_{i2}\leq\dots\leq\mu_{in}$ be the eigenvalues of $U_{i}$ and $0\leq\lambda_{i1}\leq\lambda_{i2}\leq\dots\leq\lambda_{in}$ be the eigenvalues of $S_{i}$ . From $\|U_{i}-S_{i}\|\leq 2^{-P(n,1/\epsilon,\log(M))/2}$ and Lemma 4.2, we have

Similarly, we have that $\det(S_{j-1})\cdot\det(S_{j-2})\leq(1+\epsilon/2)\cdot\det(U_{j-1})\cdot\det(U_{j-2})$ .

As $\widetilde{\epsilon}_{j}\leq\dfrac{\epsilon^{2}}{4n^{3}}$ implies that $\epsilon_{j}\leq\dfrac{\epsilon^{2}}{4n^{3}}+\dfrac{\epsilon^{2}}{n^{4}}$ , by Lemma 3.4 we have $\textnormal{cap}(T_{j})\in[(1-\epsilon/2),1]$ . Thus, equation (4.1) yields

The inequalities above imply that $\det(U_{j-1})\cdot\det(U_{j-2})$ , that is, the output of Algorithm 4, lies in the interval $[(1-\epsilon)\textnormal{cap}(T),(1+\epsilon)\textnormal{cap}(T)]$ . ∎

Conclusion and Open Problems

In this paper we gave a polynomial time algorithm for computing the non-commutative rank of a symbolic matrix over any subfield of the complex numbers. We stated its different incarnations and implications to the many different areas in which this problem arises (indeed we feel that expositing these many connections, some essential to the present result, may yield better future interaction between them with possible more benefits). We note that our algorithm and the analysis bypasses the need to use any degree bounds at all. We further note again that despite the purely algebraic nature of the problem our algorithm is purely analytic, generating a sequence of complex matrices and testing its convergence.

We collect now the most obvious directions for future research, some of them already mentioned in the paper.

Find more applications of this algorithm to optimization problems.

Can we use these techniques to design an efficient deterministic algorithm for the orbit-closure intersection problem for the Left-Right action? In terms of invariants, this is equivalent to asking if two tuples of matrices can be separated by invariants of the Left-Right action (over algebraically closed fields of characteristic ). More formally given two tuples of matrices, $(A_{1},\ldots,A_{m})$ and $(B_{1},\ldots,B_{m})$ , check whether for all $(T_{1},\ldots,T_{m})$ of arbitrary dimension,

The results of [DM17] give a randomized polynomial time algorithm for this problem (over algebraically closed fields of characteristic ): just plug in random $(T_{1},\ldots,T_{m})$ of polynomial dimension.

Find a black-box algorithm for SINGULAR. That is, efficiently produce (deterministically) a polynomial size set $\mathcal{S}$ of tuples of polynomial dimension matrices s.t. for all $L=\sum_{i=1}^{m}x_{i}A_{i}$ s.t. $L$ is non-singular, it holds that for some $(T_{1},\ldots,T_{m})\in\mathcal{S}$ ,

Due to the recent polynomial dimension bounds of [DM17], it can be proved that a random set $\mathcal{S}$ works. The challenge is to produce it deterministically. As a special case, this captures deterministic parallel algorithms for the decision version of bipartite perfect matching (when $A_{1},\ldots,A_{m}$ are elementary matrices $E_{i,j}$ representing the edges of a bipartite graph). So perhaps, techniques from the recent breakthrough work [FGT16] can be useful.

Explore further the connection between commutative and non-commutative PIT problems. We feel that beyond the many connections between commutative and non-commutative settings that arise here, this different angle of looking at the commutative PIT problem, relating it to its non-commutative cousin, may help in the major quest of finding an efficient deterministic algorithm for it. As mentioned above, this viewpoint has already resulted in a deterministic PTAS for the commutative rank [BJP17].

We design an efficient algorithm for checking if a completely positive operator is rank-decreasing. Can we do the same for positive operators? Algorithm G in fact works for positive operators as well and all that is needed is to prove an effective lower bound on the capacity $\textnormal{cap}(T)$ of a positive operator $T$ which is rank non-decreasing (similar to Theorem 2.20). It was already proven in [Gur04] that $\textnormal{cap}(T)>0$ for a positive operator $T$ which is rank non-decreasing.

Design a strongly polynomial time algorithm for operator scaling. Strongly polynomial time algorithms for matrix scaling were given by [LSW98]. Can they be extended to the operator case?

Can we compute $(1+\epsilon)$ approximation to $\textnormal{cap}(T)$ in time $\textnormal{poly}(n,b,\log(1/\epsilon))$ ? For computing capacity of non-negative matrices, such algorithms exist. One of the algorithms in [LSW98] has this stronger convergence rate. Also for matrices, capacity can be formulated as a convex program and hence the Ellipsoid algorithm also gives this stronger convergence rate [GY98].

Can we design efficient algorithms for testing the null-cone of general quivers? There is reduction from general quivers to Kronecker-quiver or the left-right action (e.g. see [DM17]) but the reduction is not always efficient. What about the general problem of testing the null-cone of actions of reductive groups?

We would like to thank Harm Derksen, Pavel Hrubes, Louis Rowen and K. V. Subrahmanyam for helpful discussions. We would also like to thank Oded Regev for suggesting us that operator scaling could be used for approximating capacity. Finally, we thank the anonymous reviewers for a comprehensive reading of the paper and pointing several typographical errors and minor bugs.

References

Appendix A Symbolic matrices with polynomial entries and non-commutative rank

In this section, we show how to compute the non-commutative rank of any (not necessarily square) matrix with linear entries over the free skew field ℚ(

In fact, we solve a more general problem. Subsection A.1 starts with a reduction of computing nc-rank of a matrix with polynomial entries (given by formulae), to the problem of computing the nc-rank of a matrix with linear entries, via the so-called “Higman’s trick” (Proposition A.2). We give the simple quantitative analysis of this reduction, which as far as we know does not appear in the literature and may be useful elsewhere. This reduction, with the two above, allow computing the non-commutative rank of any matrix in time polynomial in the description of its entries.

Before stating the full version of the effective Higman trick, we need to define the bit complexity of a formula computing a non-commutative polynomial.

With this definition in hand, we can state and prove Higman’s trick, which first appeared in [Hig40]. In the proposition below, it will be useful to have the following notation to denote the direct sum of two matrices $A$ and $B$ :

where the zero matrices in the top right and bottom left corners are of appropriate dimensions. Before stating and proving Higman’s trick, let us work through a small example which showcases the essence of the trick.

Suppose we want to know the nc-rank of matrix $\begin{pmatrix}1&x\\ y&z+xy\end{pmatrix}$ . The problem here is that this matrix is not linear, and we need to have a linear matrix. How can we convert this matrix into a linear matrix while preserving the rank, or the complement of the rank? To do this, we need to remove the multiplication happening in $z+xy$ .

Notice that the complement of its rank does not change after the following transformation:

Since the complement of the rank does not change after we perform elementary row or column operations, we can first add $x\cdot\text{(third row)}$ to the second row, and then subtract $\text{(third column)}\cdot y$ to the second column, to obtain:

The complement of the rank of this last matrix is the same as the complement of the rank of our original matrix! In particular, if this last matrix is full rank, it implies that our original matrix is also full rank. This is the essence of Higman’s trick. We now proceed to its full version.

Let $\textsf{Mult}(a_{ij})$ be the number of multiplication gates in the formula computing entry $a_{ij}$ and

That is, $T$ is the total number of multiplication gates used to compute all entries of the matrix $A$ .

We prove this proposition by induction on $T$ , for matrices of all dimensions. The base case, when $T=0$ , is trivial, as in this case $A$ itself has linear entries. Suppose now that the proposition is true for all matrices (regardless of their dimensions) which can be computed by formulas using $<T$ multiplication gates.

Let $A$ be our matrix, which can be computed using $T$ multiplications. W.l.o.g., we can assume that $\textsf{Mult}(a_{mn})\geq 1$ . Then, by finding a multiplication gate in the formula for $a_{mn}$ that has no other multiplication gate as an ancestor, we can write $a_{mn}$ in the form $a_{mn}=a+b\cdot c$ , where

Therefore, the number of multiplications needed to compute $B$ is given by

Setting $P=P^{\prime}R$ and $Q=SQ^{\prime}$ proves the inductive step and completes the proof. Since we only use subformulas of the formulas computing the entries of $A$ , the bound on the bit complexity does not change. ∎

A.2 Classical Reduction

Since $\textnormal{nc-rank}(M)\geq r$ , there exists an $r\times r$ minor of $M$ of full rank. Let $Q$ be such a minor of $M$ . W.l.o.g.,Notice that we can make the following assumption just to simplify notation. In actuality, we do not know where the full rank minor is located in $M$ . we can assume that $Q$ is the $[r]\times[r]$ principal minor of $M$ . Hence, we have that

where $U_{1}$ and $V_{1}$ are $r\times r$ matrices and the others are matrices with the proper dimensions.

Letting $U^{\prime}=\begin{pmatrix}I_{r}&0\end{pmatrix}$ and $V^{\prime}=\begin{pmatrix}I_{r}\\ 0\end{pmatrix}$ , the equality above becomes:

we obtain that $UMV$ is full, as we wanted. Notice that the second inequality comes from the fact that rank does not increase after restrictions of the new variables. ∎

Notice that we do not know the rank $\textnormal{nc-rank}(M)$ a priori. Therefore, our algorithm will try all possible values of $r\in[n]$ and output the maximum value of $r$ for which we find a full matrix.

For each $r\times r$ matrix $UMV$ , we can use the effective Higman’s trick to convert $UMV$ into a $s\times s$ matrix with linear entries. With this matrix, we can use the truncated Gurvits’ algorithm to check whether the matrix we just obtained is full. Since we have this test, we will be able to output the correct rank. Algorithm 5 is the precise formulation of the procedure just described.

To prove this theorem, it is enough to show that Algorithm 5 is correct and it runs with the desired runtime.

Without loss of generality, we can assume that $n\leq m$ . Therefore we have that $\textnormal{nc-rank}(M)\leq n$ . By Lemma A.3, if $r\leq\textnormal{nc-rank}(M)$ , then matrix $M_{r}$ will be of full rank (and therefore will not have a shrunk subspace, by Theorem 1.4). Since $M_{r}=U_{r}MV_{r}$ , from the formulas computing the entries of $M$ we obtain formulas of size at most $2smn$ computing the entries of $M_{r}$ . Moreover, the bit complexities of these formulas will still be bounded by $b$ , as multiplication by generic matrices do not mix any of the polynomials of $M$ .

By Proposition A.2 and the fact that the size of the formulas computing the entries of $M_{r}$ are bounded by $2smn$ , we have that $N_{r}$ is a linear matrix of dimensions $(k+r)\times(k+r)$ , where $k\leq 2s(mn)^{2}$ and the bit complexity of the coefficients bounded by $b$ . Moreover, $N_{r}=P(M_{r}\oplus I_{k})Q$ implies that $N_{r}$ is full if, and only if, $M_{r}$ is full, which is true if, and only if, $\textnormal{nc-rank}(M)\geq r$ .

Now, by Theorem 1.1, we have a deterministic polynomial time algorithm to determine whether $N_{r}$ is full rank. If $r\leq\textnormal{nc-rank}(M)$ , $N_{r}$ will be full, and the maximum such $r$ will be exactly when $r=\textnormal{nc-rank}(M)$ . Therefore, by outputting the maximum $r$ for which $N_{r}$ we compute $\textnormal{nc-rank}(M)$ . This proves that our algorithm is correct. Notice that the runtime is polynomial in the input size, as we perform at most $n$ applications of the Higman trick and of Algorithm $G^{\prime}$ . This completes the proof. ∎

A.3 The Quantum Reduction

Here $X_{1,1}$ , $X_{1,2}$ , $X_{2,1}$ , $X_{2,2}$ are $n\times n$ , $n\times c-1$ , $c-1\times n$ , $c-1\times c-1$ matrices respectively. Then $\overline{T}$ is completely positive and $T$ is $c$ -rank-decreasing iff $\overline{T}$ is rank-decreasing. Note that we are considering $c\leq n$ .

A well known characterization due to Choi [Cho75] states that $\overline{T}$ is completely positive iff $\sum_{i,j=1}^{n+c-1}E_{i,j}\otimes\overline{T}(E_{i,j})$ is psd. Here $E_{i,j}$ is the matrix with $1$ at $i,j$ position and everywhere else. Now

From here it is easy to verify that $\sum_{i,j=1}^{n+c-1}E_{i,j}\otimes\overline{T}(E_{i,j})$ is psd given that $\sum_{i,j=1}^{n}E_{i,j}\otimes T(E_{i,j})$ is psd. Now suppose that $\overline{T}$ is rank-decreasing. This can only happen if $X_{1,1}=0$ or $X_{2,2}=0$ , otherwise

can be psd (and hermitian) only if $X_{1,2}=X_{2,1}=0$ . In this case a $c-1$ ranked matrix is mapped to rank $n$ matrix. So $X_{2,2}$ has to be zero. Then again by the psd condition $X_{1,2}=X_{2,1}=0$ . So

Hence $\text{Rank}(T(X_{1,1}))\leq\text{Rank}(X_{1,1})-c$ . This proves one direction. Now suppose that $T$ is $c$ -rank-decreasing and $\text{Rank}(T(X))\leq\text{Rank}(X)-c$ , then

This seems to be the “quantum” analogue of obtaining a maximum matching oracle based on a perfect matching oracle: add c-1 dummy vertices to both sides of the bipartite graph and connect them to everything. Then the new graph has a perfect matching iff the original graph had a matching of size $\geq n-c+1$ .

Here we didn’t specify a set of Kraus operators for the operator $\overline{T}$ which seem to be needed to run Algorithms 1 and 2 but Kraus operators can be obtained by looking at the eigenvectors of $\sum_{i,j=1}^{n+c-1}E_{i,j}\otimes\overline{T}(E_{i,j})$ . Alternatively Algorithms 1 and 2 can also be interpreted as acting directly on the Choi-Jamiolkowski state of $\overline{T}$ i.e. $\sum_{i,j=1}^{n+c-1}E_{i,j}\otimes\overline{T}(E_{i,j})$ .