Scaling a unitary matrix
Alexis De Vos, Stijn De Baerdemacker
Introduction
By definition, the scaling of an matrix is the multiplication of this matrix to the left by a diagonal matrix and to the right by a diagonal matrix , resulting in the matrix , called the scaled matrix .
Sinkhorn has demonstrated that an arbitrary matrix with exclusively real and positive entries can be scaled by diagonal matrices and with exclusively positive real entries, such that the resulting scaled matrix is doubly stochastic, i.e. such that all entries of are real and in the interval and all line sums (i.e. all row sums and all column sums) of are equal to 1.
In order to find the appropriate matrices and , one could consider the set of line-sum equations and solve it for the unknown entries of and the unknown entries of . However, all equations are quadratic. As the equations are non-linear, an analytic solution of the set is not available. Instead, one proceeds iteratively, in the way pioneered by Kruithof . One computes the matrices , , …, successive approximations of the wanted matrix . At the th iteration, one chooses a diagonal matrix with diagonal entries equal to the inverse of the row sums of the matrix :
and one chooses a diagonal matrix with diagonal entries equal to the inverse of the resulting column sums
Because this procedure converges, the matrices and ultimately become the and unit matrices, respectively. This means that ultimately becomes a matrix with unit line sums. By choosing equal to , the two wanted scaling matrices are
and the scaled matrix is .
In the present paper, we investigate whether it is possible to scale an unitary matrix by two unitary diagonal matrices and , such that the scaled matrix has all line sums equal to 1. For this purpose, we apply a Sinkhorn-like algorithm, however with
thus guaranteeing that all the diagonal entries of and are automatically unitary. Here, we call a complex number unitary iff and define the function of a complex number as
If this procedure ultimately converges, then both and ultimately become the unit matrix. This means that ultimately becomes a matrix with all line sums having , i.e. with all line sums real (positive or zero). Below, we will see that not only converges to a matrix with exclusively realThe unitary matrices with real line-sums form a subset of U(), but not a subgroup of U(). This can easily be illustrated by multiplying two U(2) matrices: the square root of NOT matrix M_{1}=\frac{1}{2}\ {\scriptsize\left(\begin{array}[]{rr}1+i&1-i\\ 1-i&1+i\end{array}\right)} and the orthogonal matrix M_{2}=\frac{1}{2}\ {\scriptsize\left(\begin{array}[]{rr}\sqrt{3}&-1\\ 1&\sqrt{3}\end{array}\right)}. All four line sums of both matrices are real (and positive); their product , however, has a real and positive first column sum , but a complex first row sum . (non-negative) line sums, but that surprisingly those line sums moreover equal 1.
A progress measure
For investigating the progress in the matrix sequence , we basicly follow a reasoning similar to the elegant proofs of Sinkhorn’s theorem by Linial et al. and by Aaronson . However, the pivotal role (called either ‘progress measure’ or ‘potential function’) played either by the matrix permanent (Linial et al.) or by the matrix product (Aaronson) is taken over here by the absolute value of the matrix sum. The matrix sum of an matrix is defined as the sum of all its entries:
Assume a matrix with row sums and column sums . We denote by the diagonal matrix with entries equal to . If is a short-hand notation for , and and are its row sums and column sums, respectively, then we have
because can be regarded as a vector sum of vectors with the same length as the vectors but with zero angles between them. The equality sign holds iff all numbers have the same argument. Similarly, we denote by the diagonal matrix with entries equal to . If is a short-hand notation for , and are its column sums, then
where the equality sign holds iff all numbers have the same argument. We thus can conclude that
where the equality holds iff the equality holds both in (1) and in (2). The equality in (1) occurs if all have the same argument, whereas the equality in (2) occurs if all have the same argument. But, a constant leads to an of the form and therefore to column sums and the condition of constant then is equivalent to the condition of constant .
We conclude that, in the matrix sequence of Section 1, we have
where the equality sign holds iff is a matrix with all line sum arguments equal. As soon as , this condition is equivalent to iff is a matrix with all line sums real, either zero or positiveWe remark that, for , the procedure of Section 1 guarantees that all column sums and thus also the matrix sum are positive or zero. Hence, for , the absolute value symbols in (3) could be omitted..
In Appendix A, we prove that, for an arbitrary unitary matrix , we have . The equality sign holds iff is, up to a global phase, a matrix with all line sums equal to 1. For sake of convenience, we define the potential function of an matrix as
such that for all unitary matrices holds and the zero potential corresponds with unit line-sum matrices (times a factor ). With this convention, we rewrite (3) as
The landscape of the matrix group U() displays stationary points. In Appendix B, we show that these points either have zero matrix sum or have all (non-zero) line sums with same argument. We distinguish three categories of stationary points: their potential satisfies
If the first case occurs, then the stationary point is a global maximum; if the second case occurs, then the stationary point is a global minimum. We conjecture that the third class consists of saddle points. In other words: we conjecture that the landscape has no local minima (nor local maxima). As a result, the scaling procedure (with ever decreasing ) ultimaltely converges to the point with minimal potential. We conjecture that this global minimum is a matrix with and thus is a wanted unit line-sum matrix .
Either the given matrix does not have constant line sum arguments, in which case we choose . As long as the subsequent matrices , , … do not have all row sums real, we have a strictly decreasing sequence . This sequence is bounded by 0. Therefore a limit matrix exists, with .
If , then is the wanted scaled matrix. The scaling matrices are and .
In the (very unlikely) case that , the matrix is a stationary point in the potential landscape . According to the conjecture, this point is a saddle point and therefore we can apply appropriate matrices and , both close to the unit matrix, such that has potential lower than . It is sufficient to try mutually orthogonal directions in the -dimensional neighbourhood of the saddle point. After applying these and , we restart the algorithm with a new , equal to .
Or the given matrix has all line sum arguments equal and thus is a stationary point. In this case we choose , with two appropriate matrices and , such that the start matrix is not a stationary point. For this purpose, we can proceed as follows:
If at least two row sums of are different from 0, e.g. and , then we take equal to the unit matrix and all entries of equal to 1, except , thus resulting in at least two different row-sum arguments for .
If at least two column sums of are different from 0, e.g. and , then we take equal to the unit matrix and all entries of equal to 1, except , thus resulting in at least two different column-sum arguments for .
If only one row sum (say ) and only one column sum (say ) of differ from 0, i.e. if is a generalized Hadamard matrix , then we take equal to the unit matrix and all entries of equal to 1, except , thus resulting in at least two different column-sum arguments for .
An example of the latter case is the orthogonal matrix
where, for convenience, we assume . Indeed: all its line sums have zero argument. If we would take , then would be equal to the unit matrix and subsequently would be equal to the unit matrix, such that and, in fact, all subsequent would be equal to and therefore , equal to in this example. In this way, the algorithm cannot find the wanted solution, in spite of the fact that such scaled matrices with exclusively unit line sums actually exist, e.g.
In order to avoid the no-start of the convergence towards the desired scaled matrix , we apply e.g. the matrices L_{0}={\scriptsize\left(\begin{array}[]{rr}i&0\\ 0&1\end{array}\right)} and R_{0}={\scriptsize\left(\begin{array}[]{rr}1&0\\ 0&1\end{array}\right)}, resulting in
where row sums and indeed have unequal arguments: and 0, respectively.
Convergence speed
As a first example of the procedure of Section 2, we take the unitary matrix
with line sums, matrix sum, and potential
Because , , and the line sums do not have equal argument, we are in a ‘common’ case, i.e. not in a stationary point of the landscape. We thus choose . Subsequent steps of the algorithm yield potentials
Next, with the method of Życzkowski et al. , we generate 1,000 random elements of U(3), uniformly distributed with respect to the Haar measure. Table 1 shows how the potential decreases after each step of the scaling procedure. Whereas the values of the initial matrices have a wide distribution between 0 and , the distribution of is very peaked at , as soon as .
The convergence speed turns out to be strongly different for different matrices . Among the 1,000 samples, some converge exceptionally slowly, as is illustrated by the column ‘maximum()’. Usually, however, convergence is fast, as is illustrated by the column ‘average()’. We stress the fact that all 1,000 experiments directly converge to the global minimum , and thus none ‘gets trapped’ in a local minimum and none temporarily ‘halts’ in a saddle point.
Finally, similar experiments with 1,000 random elements from U(4) (see Table 1) and with 1,000 random elements from U(5) lead to similar results.
For equal to 2, 4, 8, 16, and 32, Figure 1 shows the probability distribution of the potential , after , , , and iteration steps. We see how the distribution, at each step, shifts more and more to .
The convergence can also be visualized by displaying as a function of , i.e. the correlation between a after and before an iteration step. Figure 2 shows , , and for both and . As expected, all points lay below the line . We see how the cloud of points, after each step, becomes smaller and smaller and moves closer and closer to the point .
Application
In Reference , two subgroups of the unitary group U() are presented:
the subgroup ZU(), consisting of all diagonal matrices with upper-left entry equal to 1 and other diagonal entries from U(1);
the subgroup XU(), consisting of all unitary matrices with all of their line sums are equal to 1,
and the following theorem is proved: any U() matrix can be decomposed as
with and where all are ZU() matrices and all are XU() matrices. In Reference , it is proved that a shorter decomposition exists: with .
In the present paper, we conjecture that even a far stronger theorem holds: . This means that any U() matrix can be decomposed as
where both and are ZU() matrices and is an XU() matrix. In , it is proved that the group XU() is isomorphic to U() and hence has dimension . Therefore, the product has
degrees of freedom, matching exactly the dimension of U() and thus making the conjecture dimensionally possible. However, no analytic expression is provided for the unknown entries neither of the matrices and nor of the scaled matrix . An analytic solution of the decomposition problem is easily found for . Indeed, an arbitrary member of U(2) can be decomposed according to (4), in two different ways:
For more details about the case U(2), the reader is referred to Appendix C.
No analytic solution is found as soon as . Even the decomposition of an arbitrary member of U(3) is an unsolved problemAs soon as one of the nine entries of the U(3) matrix equals zero, the problem is analytically solvable., in spite of substantial efforts by the authors of the present paper. Independently, in the framework of an other but related problem, Shchesnovich comes to a similar conclusion. For arbitrary , Reference gives an analytic solution for a -dimensional subset of the -dimensional group U(). We conjecture that the asymptotic scaling procedure of Sections 1 and 2 provides a numerical solution, for any member of U(), with arbitrary .
If, in particular, we have , then a U() matrix represents a quantum circuit of width , i.e. acting on qubits. We thus may conclude that such circuit can be decomposed as the cascade of an overall phase, two subcircuits and one subcircuit. The basic building block of any circuit is the 1-qubit circuit represented by matrix
the basic building block of any circuit is the 1-qubit circuit represented by
The NEGATOR realizes an arbitrary root of NOT and thus is a natural generalization of the square root of the NOT gate .
Each matrix is implemented by a string of controlled PHASORs; any matrix represents a circuit composed of controlled NEGATORs .
where is a short-hand notation for and is the permutation matrix
we can transform (4) into a decomposition containing exclusively XU and ZU matrices:
where is an XU matrix which can be implemented with classical reversible gates (i.e. NOTs and controlled NOTs), where is a ZU matrix which can be implemented by a single (uncontrolled) PHASOR gate, and where is the product . The short decomposition (6) illustrates the power of the two subgroups XU() and ZU(), which are complementary , in the sense that
they overlap very little, as their intersection is the trivial group consisting of merely the unit matrix and
they strengthen each other sufficiently, as their closure is the whole unitary group U().
As an example, we give here a decomposition of the Hadamard gate according to schemes (4) and (6), respectively:
Conclusion
We have presented a method for scaling an arbitrary matrix from the unitary group U(), by multiplying the matrix to the left and to the right by unitary diagonal matrices. We conjecture that the resulting scaled matrix is a member of XU(), i.e. the subgroup of U() consisting of all unitary matrices with all line sums equal to 1. If , then scaling can be performed analytically and thus with infinite precision. If , then scaling has to be performed numerically and thus with finite precision. In the terminology of Linial et al. , we would say that matrices from U(2) are ‘scalable’, whereas matrices from U() with are ‘almost scalable’. The conjecture that the numerical algorithm converges to a unit line-sum matrix, is based on four observations:
the proof that such matrix exists in the case ,
the proof that such matrix exists for a -dimensional subset of the general case U() ,
the success of 1,000 numerical experiments in the cases , , and , and
the fact that, according to (5), there are, for arbitrary , exactly the right number of freedoms.
For the special case of , this leads to a decomposition of an arbitrary quantum circuit into
one overall phase, one X circuit and two Z circuits or
References
Appendix A Theorem
Theorem : The absolute value of the matrix sum of a U() matrix is smaller than or equal to . The U() matrices with abs(matrixsum) = are member of the subgroup XU(), where XU() denotes the subgroup of U() consisting of the matrices with unit line sums.
Let , , …, and be the row sums of an matrix. For convenience, we give their real and imaginary parts an explicit notation:
as proved in Appendix A of Reference . We rewrite this property as follows:
where the angle is allowed to have any value.
We consider (9) as the eqn of an -dimensional hypersphere. We ask ourselves what is the highest value of the function
on the surface of this hypersphere. For this purpose, we note that , with some positive constant, is the eqn of the set of two parallel hyperplanes:
The highest value of on the hypersphere is when the two planes are tangent to the sphere. This happens when equals , the two tangent points having coordinates and having the value . See the 2-dimensional illustration in Figure 3.
A similar reasoning is possible for the function
Noting that equals , we conclude that has maximum value . The unitary matrices with this particular value are the matrices with , i.e. the matrices with constant row sum equal to , where is either or .
A dual reasoning holds for the column sums, with , such that and . The unitary matrices with are the matrices with , i.e. the matrices with constant column sum equal to , where is either or . Because a matrix can have only one matrix sum, a matrix with both constant row sum and constant column sum necessarily has constant line sum. Therefore, for the matrices with , the angle equals the angle .
The maximum- matrices equal times a matrix with constant line sum equal to 1. Thus they are member of the group XU(), a subgroup of U(), isomorphic to U(1) XU(), and thus isomorphic to U(1) U().
Appendix B The potential landscape
Given a unitary matrix , finding the scaled matrix is equivalent to solving the (non-linear) eqn
we introduce a -dimensional landscape , given by
We have to find the minimum of this function, i.e. the point .
In order to investigate the shape of the function, we linearize the equation around , i.e. in the neighbourhood of = . For this purpose, we write the row sums, column sums, and matrix sum of the given matrix as follows:
The coefficients of and form the gradient vector of the landscape. A stationary point occurs whenever, for all ,
These conditions can only be fulfilled in the following cases:
i.e. when the matrix sum is zero and hence has the global maximum value of ,
i.e. when all non-zero line sums have the same argument.
We conjecture that all these stationary points are either maxima or saddle points or global minima. In other words: we conjecture that no local minima exist. Moreover, we conjecture that the global minima satisfy .
Appendix C The case U(2)
We consider the unitary group U(2). All diagonal unitary matrices form the subgroup DU(2), isomorphic to U(1) U(1). The subgroup DU(2) divides its supergroup U(2) into double cosets. Let be an arbitrary U(2) matrix:
Its double coset consists of all matrices
where and are short-hand notations for and , respectively. We introduce the variables
Therefore, the double coset is the 3-parameter space
Thus the double coset of consists of the matrices , i.e. of all matrices with the same value of the angle . This constitutes a 3-dimensional subspace of the 4-dimensional space U(2), except for the cases (i.e. for the double coset of the IDENTITY matrix {\tiny\left(\begin{array}[]{cc}1&0\\ 0&1\end{array}\right)}) and (i.e. for the double coset of the NOT matrix {\tiny\left(\begin{array}[]{cc}0&1\\ 1&0\end{array}\right)}), which both are 2-dimensional onlyOne may consider an arbitrary place on earth. The points , with same latitude but arbitrary longitude , form a 1-dimensional subspace of the 2-dimensional earth surface, called the parallel of , except if either , in which case is a 0-dimensional subspace, called the north pole, or , in which case is a 0-dimensional subspace, called the south pole. We therefore may consider the 2-dimensional double coset of the IDENTITY matrix and the 2-dimensional double coset of the NOT matrix as the north and the south pole, respectively, of the 4-dimensional U(2) manifold..
What are, within the double coset of , the stationary points of the landscape? We easily find
The conditions , , and immediately lead to
This set of two trigonometric equations in the three unknowns , , and has infinitely many solutions, leading to an infinite number of matrices:
with arbitrary, , and . These sixteen sets of matrices lead to values equal to , , , and , corresponding to global maxima, saddle points, saddle points, and global minima, respectively. The saddle points belong to U(1)O(2), subgroup of U(2); the global extrema do not.
What are, within the same double coset, the matrices with all line sums real? We easily find the conditions:
This set of three trigonometric equations in the three unknowns , , and has twelve solutions:
Four matrices have matrix sum and thus ; four matrices have matrix sum and thus ; four matrices have matrix sum and thus ; four matrices have matrix sum and thus . Thus four of the twelve matrices represent a global maximum; eight of the twelve matrices represent a saddle point; four of the twelve matrices represent a global minimum.
As an example, we choose the matrix with , such that . Among the twelve matrices of its double coset with real line sums , there are only four matrices where the four line sums are positive, i.e.
where is a short-hand notation for . Among these four matrices, only and have unit line sum (and thus ).
In the neighbourhood of , we have the matrices
with , , and small. This yields a matrix sum
The opposite signs of the coefficients of and illustrate the fact that is a saddle point of the landscape. Only if the subsequent matrices are situated on the line, then the Sinkhorn-like procedure of Section 2 halts at the point . In order to leave this stop, it suffices to continue along another line, e.g. . Similar conclusions hold for the point .
In the neighbourhood of , we have the matrices
with , , and small. This yields a matrix sum
The positive signs of the coefficients of and illustrate the fact that is a minimum (actually, a global minimum) of the landscape. The same conclusion holds for the point .
Let us assume that, in spite of the direct analytic solution for , we scale a U(2) matrix by the iterative method of Sections 1 and 2. Once close to the point , how fast do we converge to this global minimum of ? Close to , we have of the form
As soon as , both column sums of are real (or zero), such that and . Thus we have a matrix
and, because of (15), a potential . Thus all matrices lay on a line, the 1-dimensional space (17), subspace of the 3-dimensional space (16). If we apply the th step of the iterative algorithm, we find, after some algebra:
where . Hence the new potential is and
This illustrates the fact that the convergence speed of the algorithm is indeed dependent on the given matrix , more specifically on its parameter . If this angle is close to , then convergence is fast; if the angle is close to , then convergence is slow.
Finally, we ask ourselves, given the matrix , does the algorithm of Sections 1 and 2 lead to the scaled matrix or to the scaled matrix ? The separatrice consists of the spaces and . If , then the trajectory , , , … ends in the attractor ; if , then the trajectory , , , … ends in the attractor ; if or , then is an orthogonal matrix (either or ) and thus a saddle-point, such that the final destination (either or ) depends on the direction in which one leaves the saddle point.
We close this appendix by comparing the above quantitative U(2) results with the qualitative U() properties. It is well-known that, if a finite group G has a subgroup H, then H divides G into double cosets with sizes ranging from order(H) to order2(H). Similarly, if a Lie group G has a Lie subgroup H, then H divides G into double cosets with dimension ranging from dim(H) to 2 dim(H). The group U() is -dimensional and its subgroup DU() is -dimensional. As a result, DU() divides U() into double cosetsThis set of double cosets, i.e. the double coset space can be mapped to the set (not group!) of so-called unistochastic matrices , a subset of the well-known semigroup of bistochastic matrices (a.k.a. doubly stochastic matrices)., each with dimension between and . In fact, in this particular case, the dimensions of the double cosets range from to . Most of the double cosets are -dimensional; only some are lower-dimensional, e.g. the double cosets of permutation matrices being only -dimensionalTogether these double cosets form the group of complex permutation matrices, a group isomorphic to the semidirect product DU() : Sn, where Sn is the symmetric group of degree .. Thus within a double coset, we have at most degrees of freedom. If we want a matrix with all line sums real, then this imposes conditions, usually lowering the number of freedoms to 0. In other words: in each double coset there usually are a finite number of real line-sum matrices. We conjecture that at least one of these matrices is a unit line-sum matrix.