The Birkhoff theorem for unitary matrices of prime dimension

Alexis De Vos, Stijn De Baerdemacker

Introduction

Doubly stochastic matrices are square matrices with real entries, all belonging to the interval (0,1)(0,1), such that all row sums and all column sums equal unity . Because the product of two doubly stochastic matrices is again a doubly stochastic matrix, the doubly stochastic matrices form a semigroup. They do not form a group because the inverse of a doubly stochastic matrix is not necessarily a doubly stochastic matrix. Because of their interpretation as probability distributions, doubly stochastic matrices emerge in several sections of physics, especially statistical physics. Birkhoff’s theorem says that any doubly stochastic matrix can be written as a weighted sum of permutation matrices, such that all weights are real and belong to the interval (0,1)(0,1) and the sum of the weights equals unity. So, every doubly stochastic matrix is contained in a convex set, spanned by the permutation matrices at the corners, thus dressing them with a geometric interpretation. The higher-dimensional solid containing the matrix is called Birkhoff’s polytope .

In the present paper, we aim to formulate an equivalent of Birkhoff’s theorem for unitary matrices. The importance of unitary matrices equally follows from physics, more in particular from quantum physics and quantum information. In contrast to the n×nn\times n doubly stochastic matrices, the n×nn\times n unitary matrices form a genuine group, called the unitary group and denoted U(nn). Within this group figures a subgroup denoted XU(nn): the group of n×nn\times n unitary matrices with all row sums and all column sums equal unity . As such, XU(nn) acts as a ‘doubly stochastic’ analogon within U(nn). Whereas U(nn) is an n2n^{2}-dimensional Lie group, XU(nn) is only an (n1)2(n-1)^{2}-dimensional Lie group, isomorphic to U(n1n-1). Below, we will demonstrate Birkhoff-like properties for XU matrices, giving them a geometric interpretation.

Three theorems

Theorem 1: If a U(nn) matrix can be decomposed as a weighted sum jmjPj\sum_{j}m_{j}P_{j} of permutation matrices PjP_{j}, then it is, up to a global phase, member of the subgroup XU(nn).

Indeed, let us assume that the matrix MM can be written as jmjPj\sum_{j}m_{j}P_{j}. Each of the permutation matrices PjP_{j} is a matrix with all line sums equal to 1. Therefore the matrix mjPjm_{j}P_{j} is a matrix with all line sums equal to mjm_{j}, and the matrix jmjPj\sum_{j}m_{j}P_{j} is a matrix with all line sums equal to jmj\sum_{j}m_{j}. If MM is member of U(nn) and has constant line sum, then this constant can only be equal to a number of the form eiαe^{i\alpha}, where α\alpha is an arbitrary real . Hence MM is of the form eiαXe^{i\alpha}X with XX member of XU(nn). \blacksquare Thus MM belongs to the group of constant-line-sum unitary matrices eiαXe^{i\alpha}X, a group isomorphic to the direct product U(1) ×\times XU(nn) and thus isomorphic to U(1) ×\times U(n1n-1).

Before introducing two more theorems, we present and prove two lemmas:

Lemma 1: A circulant XU(nn) matrix can be written as a weighted sum of permutation matrices with the sum of the weights equal to 1.

The proof is trivial, the decomposition consisting of the nn circulant n×nn\times n permutation matrices, each with a coefficient equal to one of the entries of the given XU matrix. \blacksquare

Lemma 2: If two matrices can both be written as a weighted sum of permutation matrices with the sum of the weights equal to 1, then also the product of the two matrices can.

We consider two n×nn\times n matrices with the Birkhoff property, i.e.

Here, each PjP_{j} denotes an n×nn\times n permutation matrix. The product c=abc=ab is

i.e. a matrix of the form wmwPw\sum_{w}m_{w}P_{w}. Because, moreover, wmw=uvmuamvb\sum_{w}m_{w}=\sum_{u}\sum_{v}m_{u}^{a}m_{v}^{b} =umua vmvb=1=\sum_{u}m_{u}^{a}\ \sum_{v}m_{v}^{b}=1, we conclude that the product matrix cc also has the Birkhoff property. \blacksquare

We now are in a position to present and prove the following theorem:

Theorem 2: If a matrix belongs to XU(nn), then it can be written as a weighted sum of permutation matrices with the sum of the weights equal to 1.

The proof is by induction on nn: we assume that the theorem is valid for n=Nn=N and consider an arbitrary matrix XX from XU(N+1N+1). It can be written as follows :

where FF is the (N+1)×(N+1)(N+1)\times(N+1) discrete Fourier transform and UU is a matrix from U(NN). The matrix UU can be written as follows :

where aa is a member of U(1), i.e. a complex number with unit modulus, where xx is a member of XU(NN), and where both Z1Z_{1} and Z2Z_{2} are member of ZU(NN). Here, ZU(nn) is the (n1)(n-1)-dimensional subgroup of U(nn), consisting of all diagonal n×nn\times n unitary matrices with upper-left entry equal to unity and thus isomorphic to U(1)n-1. Because of our induction assumption, the matrix xx can be written as

where all pjp_{j} are N×NN\times N permutation matrices and jmj=1\sum_{j}m_{j}=1. We conclude that

such that we have the matrix decomposition

First, we note that both \left(\begin{array}[]{cc}1&\\ &aZ_{1}\end{array}\right) and \left(\begin{array}[]{cc}1&\\ &Z_{2}\end{array}\right) are members of ZU(N+1N+1). For any member ZZ of ZU(N+1N+1) holds the property that FZF1FZF^{-1} is a circulant XU(N+1N+1) matrix. Thus, because of Lemma 1, both X1X_{1} and X2X_{2} can be written as a weighted sum of permutation matrices (i.e. obey the theorem-to-be-proved). Hence, by virtue of Lemma 2, to prove the theorem for XX, it suffices to prove it for YY. For this purpose, we note that, because of jmj=1\sum_{j}m_{j}=1, we have

One can easily verify that any product of the form F\,{\tiny\left(\begin{array}[]{cc}1&\\ &p_{j}\end{array}\right)}\,F^{-1} is a unitary matrix with upper-left entry equal to 1. Because F\,{\tiny\left(\begin{array}[]{cc}1&\\ &p_{j}\end{array}\right)}\,F^{-1} is of the form F\,{\tiny\left(\begin{array}[]{cc}1&\\ &U\end{array}\right)}\,F^{-1}, this product is also an XU(N+1N+1) matrix . A matrix with these two properties is necessarily of the form {\tiny\left(\begin{array}[]{cc}1&\\ &y_{j}\end{array}\right)} with yjy_{j} a member of XU(NN). Because of the induction hypothesis, we may put

Taking into account that 1=kmky1=\sum_{k}m_{k}^{y}, we find

Hence, YY is of the Birkhoff form: a weighted sum of (N+1)×(N+1)(N+1)\times(N+1) permutation matrices with sum-of-weights equal to 1. Hence, XX is. Thus the theorem holds for n=N+1n=N+1.

They can be written as a weighted sum of the two 2×22\times 2 permutation matrices:

and the sum of the two weights m1m_{1} and m2m_{2} equals 1.

Because the theorem holds for n=2n=2 and the theorem holds for n=N+1n=N+1 as soon as it holds for n=Nn=N, the proof of Theorem 2 is complete. \blacksquare

Whereas the Birkhoff decomposition (5) of an XU(2) matrix is unique, the decomposition of an XU(nn) matrix with n>2n>2 is not unique. We now investigate whether, among the many possible decompositions jmjPj\sum_{j}m_{j}P_{j}, there is one or more that satisfies not only jmj=1\sum_{j}m_{j}=1 but also jmj2=1\sum_{j}|m_{j}|^{2}=1. This is a slightly stronger formulation of the mj<1|m_{j}|<1 constraints of the original Birkhoff theorem on doubly stochastic matrices and again defines a convex polytope in which all XU(nn) lie. We start with the case where nn is a prime:

Theorem 3: If a matrix belongs to XU(nn) with prime nn, then it can be written as a weighted sum of permutation matrices with the sum of the squared moduli of the weights equal to 1.

Before proving Theorem 3, it is interesting to investigate some low-dimensional examples. The theorem is trivial for n=2n=2. Indeed, above, we have shown that m1=(1+eiα)/2m_{1}=(1+e^{i\alpha})/2 and m2=(1eiα)/2m_{2}=(1-e^{i\alpha})/2, such that m12+m22=1|m_{1}|^{2}+|m_{2}|^{2}=1.

The theorem is also valid for n=3n=3. In fact, there exist an infinity of decompositions of XX as a weighted sum of the n!=6n!=6 permutation matrices, all satisfying j=16mj2=1\sum_{j=1}^{6}|m_{j}|^{2}=1. Indeed, any member XX of XU(3) can be written as (1), with FF the 3×33\times 3 discrete Fourier transform and UU a 2×22\times 2 unitary matrix. Hence

where ω\omega is the primitive 3 rd root of unity, i.e. ei2π/3=12+i32e^{i2\pi/3}=-\frac{1}{2}+i\,\frac{\sqrt{3}}{2}. The entries of XX therefore look like

Each product ωaUrs\omega^{a}U_{rs} (a=0,1,2\,\forall a=0,1,2 and r,s=1,2\forall r,s=1,2) appears exactly once in every row and exactly once in every column of XX. Therefore it is staightforward to check that XX can be written as

and W3W_{3} is the doubly stochastic matrix with all entries identical, i.e. equal to 13\frac{1}{3}. We call W3W_{3} the 3×33\times 3 van der Waerden matrix . It can be written both as a sum of the circulant matrices and as a sum of the anticirculant matrices:

Here, we apply the following decomposition:

where pp is an arbitrary complex number.

straightforward computations lead to jmj=1\sum_{j}m_{j}=1 and

Taking into account that UU is a 2×22\times 2 unitary matrix leads to

For this sum to equal 1, it suffices that

i.e. that, in the complex plane, pp is located on the circle with center 12\frac{1}{2} and radius 12\frac{1}{2}. For the particular choice p=1p=1, we obtain:

We are now in a position to prove Theorem 3 for an arbitrary prime. We will suffice by demonstrating the existence of one appropriate decomposition. Any member XX of XU(nn) can be written as (1), where FF is the n×nn\times n discrete Fourier transform and UU is a matrix from U(n1n-1). Hence, the matrix entries can be written

where ω\omega is the nn th root of unity. Thus, given the numbers rr and ss, each number UrsU_{rs} appears in the expression of every entry XklX_{kl}. Therefore, we can write XX as a sum of 1+(n1)21+(n-1)^{2} matrices:

where WnW_{n} is the n×nn\times n van der Waerden matrix, i.e. the doubly stochastic matrix with all entries equal to 1n\frac{1}{n}. We call MrsM_{rs} the transfer factor of UrsU_{rs}. It is an n×nn\times n matrix with all entries equal to some ωa\omega^{a}:

As nn is prime, a given number ωa\omega^{a} appears only once in every row and only once in every column of MrsM_{rs}. Moreover MrsM_{rs} has the structure of a ‘supercirculant’ matrix. A square matrix AA is called supercirculant if there exist two integers xx and yy, such that, for all {k,l}\{k,l\}, we have both Ak+1,l+x=Ak,lA_{k+1,l+x}=A_{k,l} and Ak+y,l+1=Ak,lA_{k+y,l+1}=A_{k,l} (where sums are modulo nn). The numbers xx and yy (with 1xn11\leq x\leq n-1 and 1yn11\leq y\leq n-1) are called the pitches. They are interdependent, as

If x=1x=1, then y=1y=1 and AA is simply called circulant; if x=n1x=n-1, then y=n1y=n-1 and AA is called anticirculant. The matrix MrsM_{rs} is supercirculant because the difference l(K+1)l(K)l(K+1)-l(K) in column number, in which ωa\omega^{a} (for a given aa) occurs for two consecutive rows KK and K+1K+1, is a constant (modulo nn) independent of KK. Indeed, applying ω(k1)r(l1)s\omega^{(k-1)r-(l-1)s} equal to ωa\omega^{a} for both k=Kk=K and k=K+1k=K+1 yields

such that l(K+1)l(K)l(K+1)-l(K) is a constant, say xx. Analogously, for a given ωa\omega^{a}, k(L+1)k(L)k(L+1)-k(L) is a constant, say yy. We can summarize that the two pitches xx and yy follow from

Because nn is prime, xx and nn are coprime and so are yy and nn. Therefore, ωa\omega^{a} does not appear more than once in a column or row of MrsM_{rs}. As an example, for n=5n=5, the eqns (6) yield the following functions x(r,s)x(r,s) and y(r,s)y(r,s):

respectively. From the table, one can read that the pitches of M12M_{12} for n=5n=5 are x=3x=3 and y=2y=2, respectively, leading to the explicit form

with ω\omega equal to the 5 th root of unity, i.e. ei2π/5=(51+i10+25)/4e^{i2\pi/5}=(\sqrt{5}-1+i\,\sqrt{10+2\sqrt{5}}\,)/4.

We thus can conclude that any transfer matrix can be written as

Here, Cl,xC_{l,x}, with 1ln1\leq l\leq n and 1xn11\leq x\leq n-1, denotes the n×nn\times n supercirculant permutation matrixNote that ClxC_{lx} is a permutation matrix if and only if xx and nn are coprime. with a first-row unit entry in column ll and a pitch equal to xx. In other words: we have (Cl,x)1,l=(Cl,x)2,l+x=1(C_{l,x})_{1,l}=(C_{l,x})_{2,l+x}=1.

We thus obtain the following decomposition:

where we thus sum over all n(n1)n(n-1) supercirculant permutation matrices ClxC_{lx}. We note here that, because of eqns (6), different values of ss in (7) give rise to different values of r(s,x)r(s,x).

For n2n\neq 2, we may apply the following decomposition of the van der Waerden matrix:

where the nn permutation matrices DjD_{j} are chosen such that they have no 1s in common and are not supercirculant, e.g. Dj=Qj1D1D_{j}=Q^{j-1}D_{1}, where

the former being called the shift matrix. Thanks to such choice, the matrix sets {Clx}\{C_{lx}\} and {Dj}\{D_{j}\} do not overlap and the sum jmjPj\sum_{j}m_{j}P_{j} consists of two separate parts:

These parts have the following respective properties:

The n(n1)n(n-1) weights mlxm_{lx} of the permutation matrices ClxC_{lx} equal a sum of n1n-1 products:

With l=1nω(l1)(ts)=nδst\sum_{l=1}^{n}\omega^{(l-1)(t-s)}=n\,\delta_{st}, we obtain

As UU is an (n1)×(n1)(n-1)\times(n-1) unitary matrix, we have s=1n1 Urs Urs=1\sum_{s=1}^{n-1}\ U_{rs}\ \overline{U_{rs}}=1. Hence

The nn weights mjm_{j} of the permutation matrices DjD_{j} equal 1n\frac{1}{n} and therefore contribute to jmjmj\sum_{j}m_{j}\overline{m_{j}}\, with an amount nn times 1n2|\frac{1}{n}|^{2} and thus 1n\frac{1}{n}.

We note that the above construction does not work for the special case n=3n=3 because, for n=3n=3, the matrices D1D_{1}, D2D_{2}, and D3D_{3} are, by coincidence, anticirculant. Therefore, D1D_{1}, D2D_{2}, and D3D_{3} coincide with C12C_{12}, C22C_{22}, and C32C_{32}, respectively, such that the above special-purpose construction for n=3n=3 was necessary. As a matter of fact, the proposed Birkhoff decomposition consists of n(n1)n(n-1) matrices ClxC_{lx} and nn matrices DjD_{j}, thus of a total of n2n^{2} permutation matrices PjP_{j}. Only for n>3n>3, the relation n2n!n^{2}\leq n! is valid and there exist enough permutation matrices to prove Theorem 3 in the generic way.

If nn is not prime, then not all transfer matrices MrsM_{rs} are supercirculant and the key property of the decomposition, proposed in the proof, is not fulfilled. If both rr and ss are coprime with nn, then MrsM_{rs} is supercirculant. The other transfer matrices consist of identical blocks of size b×cb\times c with

E.g., for n=4n=4, the 4×44\times 4 matrix M12M_{12} has two identical blocks of size b×c=4×2b\times c=4\times 2:

where ω\omega here is the 4 th root of unity, i.e. ii.

Whether Theorem 3 is also valid if nn is a composite number, is left for further investigation. At least it is valid for the smallest non-prime, i.e. for n=4n=4. This can be verified by checking that the decomposition

where the weights mjm_{j} have the values as in the Appendix.

A consequence

As already mentioned in Section 2, any n×nn\times n unitary matrix UU can be decomposed as

where eiαe^{i\alpha} is an overall phase factor, XX is an XU(nn) matrix, and both Z1Z_{1} and Z2Z_{2} are ZU(nn) matrices. Applying the fact that XX can be written as a weighted sum of permutation matrices, we can conclude that UU can be written as a weighted sum of complex permutation matrices. Here, we define a complex permutation matrix as a unitary matrix having one and only one non-zero entry in every row and every column .

Conclusion

We have demonstrated that all matrices of the group eiαe^{i\alpha}XU(nn) can be written as a weighted sum of permutation matrices and that, among the U(nn) matrices they are the only ones that can be decomposed that way. The sum of the weights equals eiαe^{i\alpha}. We prove that the sum of the squared moduli of the weights can be made equal to unity whenever nn is prime, giving a convex geometric interpretation to the decomposition, as in the standard Birkhoff theorem. The case of non-prime nn is left for further investigation.

References

Appendix

An arbitrary member XX of XU(4) may be written as j=124mjPj\sum_{j=1}^{24}m_{j}P_{j} with

where the condition jmj2=1\sum_{j}|m_{j}|^{2}=1 is fulfilled. Here, the n!=24n!=24 permutation matrices have been ordered ‘lexicographically’ as follows:

In this ordering, the supercirculant permutation matrices are C11=P1C_{11}=P_{1}, C13=P6C_{13}=P_{6}, C21=P10C_{21}=P_{10}, C23=P8C_{23}=P_{8}, C31=P15C_{31}=P_{15}, C41=P19C_{41}=P_{19}, and C43=P24C_{43}=P_{24}.