The Birkhoff theorem for unitary matrices of prime dimension

Alexis De Vos, Stijn De Baerdemacker

Introduction

Doubly stochastic matrices are square matrices with real entries, all belonging to the interval $[0,1]$, such that all row sums and all column sums equal unity. Because the product of two doubly stochastic matrices is again a doubly stochastic matrix, the doubly stochastic matrices form a semigroup. They do not form a group because the inverse of a doubly stochastic matrix is not necessarily a doubly stochastic matrix. Because of their interpretation as probability distributions, doubly stochastic matrices emerge in several branches of physics, especially statistical physics. Birkhoff’s theorem says that any doubly stochastic matrix can be written as a weighted sum of permutation matrices, such that all weights are real and belong to the interval $[0,1]$ and the sum of the weights equals unity. So, every doubly stochastic matrix lies in a convex set whose corners are the permutation matrices, which gives the doubly stochastic matrices a geometric interpretation. The higher-dimensional solid containing the matrix is called Birkhoff’s polytope.

In the present paper, we aim to formulate an equivalent of Birkhoff’s theorem for unitary matrices. The importance of unitary matrices likewise follows from physics, more specifically from quantum physics and quantum information. In contrast to the $n\times n$ doubly stochastic matrices, the $n\times n$ unitary matrices form a genuine group, called the unitary group and denoted U($n$). This group contains a subgroup denoted XU($n$): the group of $n\times n$ unitary matrices with all row sums and all column sums equal to unity. As such, XU($n$) acts as a ‘doubly stochastic’ analogue within U($n$). Whereas U($n$) is an $n^{2}$-dimensional Lie group, XU($n$) is only an $(n-1)^{2}$-dimensional Lie group, isomorphic to U($n-1$). Below, we will demonstrate Birkhoff-like properties for XU matrices, giving them a geometric interpretation.

Three theorems

Theorem 1: If a U($n$) matrix can be decomposed as a weighted sum $\sum_{j}m_{j}P_{j}$ of permutation matrices $P_{j}$, then it is, up to a global phase, a member of the subgroup XU($n$).

Indeed, let us assume that the matrix $M$ can be written as $\sum_{j}m_{j}P_{j}$. Each of the permutation matrices $P_{j}$ is a matrix with all line sums equal to 1. Therefore the matrix $m_{j}P_{j}$ is a matrix with all line sums equal to $m_{j}$, and the matrix $\sum_{j}m_{j}P_{j}$ is a matrix with all line sums equal to $\sum_{j}m_{j}$. If $M$ is a member of U($n$) and has constant line sum, then this constant is an eigenvalue of $M$ (with the all-ones vector as eigenvector) and, $M$ being unitary, it can only be a number of the form $e^{i\alpha}$, where $\alpha$ is an arbitrary real number. Hence $M$ is of the form $e^{i\alpha}X$ with $X$ a member of XU($n$). $\blacksquare$ Thus $M$ belongs to the group of constant-line-sum unitary matrices $e^{i\alpha}X$, a group isomorphic to the direct product U(1) $\times$ XU($n$) and thus isomorphic to U(1) $\times$ U($n-1$).

Before introducing two more theorems, we present and prove two lemmas:

Lemma 1: A circulant XU( $n$ ) matrix can be written as a weighted sum of permutation matrices with the sum of the weights equal to 1.

The proof is trivial, the decomposition consisting of the $n$ circulant $n\times n$ permutation matrices, each with a coefficient equal to the corresponding first-row entry of the given XU matrix. $\blacksquare$
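Lemma 1 can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the circulant XU(4) matrix is built as $F\,\mathrm{diag}(1,e^{i\theta_1},e^{i\theta_2},e^{i\theta_3})\,F^{-1}$ with arbitrarily chosen phases, and the names `eig`, `entry` and `weights` are hypothetical helpers, not notation from the paper.

```python
import cmath

n = 4
w = cmath.exp(2j * cmath.pi / n)  # primitive n-th root of unity

# Hypothetical eigenvalues: 1 for the all-ones vector, arbitrary phases otherwise.
eig = [1] + [cmath.exp(1j * t) for t in (0.5, -1.2, 2.0)]

def entry(k, l):
    """Entry (k, l) of F diag(eig) F^{-1}; it depends only on (k - l) mod n."""
    return sum(eig[j] * w ** (j * (k - l)) for j in range(n)) / n

X = [[entry(k, l) for l in range(n)] for k in range(n)]

# The circulant permutation with ones at (k, k + d mod n) gets weight X[0][d].
weights = {tuple((k + d) % n for k in range(n)): X[0][d] for d in range(n)}

# Rebuild the matrix from the weighted permutation matrices.
R = [[0j] * n for _ in range(n)]
for perm, m in weights.items():
    for k, l in enumerate(perm):
        R[k][l] += m

rebuild_error = max(abs(R[k][l] - X[k][l]) for k in range(n) for l in range(n))
weight_sum = sum(weights.values())

# Unitarity check: X times its conjugate transpose should be the identity.
unitarity_error = max(
    abs(sum(X[k][j] * X[l][j].conjugate() for j in range(n)) - (k == l))
    for k in range(n) for l in range(n))

print(rebuild_error, abs(weight_sum - 1), unitarity_error)
```

All three printed numbers should be at the level of floating-point rounding; the weights are simply the first-row entries, and their sum is the row sum of the matrix.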

Lemma 2: If two matrices can both be written as a weighted sum of permutation matrices with the sum of the weights equal to 1, then also the product of the two matrices can.

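We consider two $n\times n$ matrices with the Birkhoff property, i.e.

$$a=\sum_{u}m_{u}^{a}P_{u}\ \ \text{with}\ \ \sum_{u}m_{u}^{a}=1\qquad\text{and}\qquad b=\sum_{v}m_{v}^{b}P_{v}\ \ \text{with}\ \ \sum_{v}m_{v}^{b}=1.$$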

Here, each $P_{j}$ denotes an $n\times n$ permutation matrix. The product $c=ab$ is

$$c=\sum_{u}\sum_{v}m_{u}^{a}m_{v}^{b}\,P_{u}P_{v},$$

i.e. a matrix of the form $\sum_{w}m_{w}P_{w}$, since every product of two permutation matrices is again a permutation matrix. Because, moreover, $\sum_{w}m_{w}=\sum_{u}\sum_{v}m_{u}^{a}m_{v}^{b}=\sum_{u}m_{u}^{a}\ \sum_{v}m_{v}^{b}=1$, we conclude that the product matrix $c$ also has the Birkhoff property. $\blacksquare$
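The bookkeeping in Lemma 2 can be sketched in a few lines. In this illustration a weighted sum of permutation matrices is stored as a dictionary from permutations (row-to-column maps, 0-based) to weights; the example weights and the helper names `compose`, `product` and `to_matrix` are assumptions made for the sketch only. Note that the lemma does not require the two matrices to be unitary.

```python
from collections import defaultdict

def compose(sigma, tau):
    """Row -> column map of the matrix product P_sigma P_tau."""
    return tuple(tau[s] for s in sigma)

def product(a, b):
    """Weighted-sum representation of the product of two weighted sums."""
    c = defaultdict(complex)
    for pu, mu in a.items():
        for pv, mv in b.items():
            c[compose(pu, pv)] += mu * mv
    return dict(c)

def to_matrix(ws, n):
    """Dense matrix of a weighted sum of permutation matrices."""
    M = [[0j] * n for _ in range(n)]
    for perm, m in ws.items():
        for k, l in enumerate(perm):
            M[k][l] += m
    return M

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Two hypothetical 3x3 examples, each with weights summing to 1.
a = {(0, 1, 2): 0.5 + 0.5j, (1, 2, 0): 0.5 - 0.5j}
b = {(0, 2, 1): 2.0, (2, 1, 0): -1.0}

c = product(a, b)
weight_sum = sum(c.values())
lhs = to_matrix(c, 3)
rhs = matmul(to_matrix(a, 3), to_matrix(b, 3))
mismatch = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
print(weight_sum, mismatch)
```

Storing the sum as a dictionary merges weights of coinciding products $P_{u}P_{v}$ automatically, which is why the result may contain fewer terms than the double sum.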

We now are in a position to present and prove the following theorem:

Theorem 2: If a matrix belongs to XU( $n$ ), then it can be written as a weighted sum of permutation matrices with the sum of the weights equal to 1.

The proof is by induction on $n$: we assume that the theorem is valid for $n=N$ and consider an arbitrary matrix $X$ from XU($N+1$). It can be written as follows:

$$X=F\left(\begin{array}{cc}1&\\ &U\end{array}\right)F^{-1},\qquad(1)$$

where $F$ is the $(N+1)\times(N+1)$ discrete Fourier transform and $U$ is a matrix from U($N$). The matrix $U$ can be written as follows:

$$U=a\,Z_{1}\,x\,Z_{2},$$

where $a$ is a member of U(1), i.e. a complex number with unit modulus, where $x$ is a member of XU($N$), and where both $Z_{1}$ and $Z_{2}$ are members of ZU($N$). Here, ZU($n$) is the $(n-1)$-dimensional subgroup of U($n$), consisting of all diagonal $n\times n$ unitary matrices with upper-left entry equal to unity and thus isomorphic to U(1)$^{n-1}$. Because of our induction assumption, the matrix $x$ can be written as

$$x=\sum_{j}m_{j}p_{j},$$

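where all $p_{j}$ are $N\times N$ permutation matrices and $\sum_{j}m_{j}=1$. We conclude that

$$\left(\begin{array}{cc}1&\\ &U\end{array}\right)=\left(\begin{array}{cc}1&\\ &aZ_{1}\end{array}\right)\left(\begin{array}{cc}1&\\ &x\end{array}\right)\left(\begin{array}{cc}1&\\ &Z_{2}\end{array}\right),$$

such that we have the matrix decomposition

$$X=X_{1}\,Y\,X_{2}\quad\text{with}\quad X_{1}=F\left(\begin{array}{cc}1&\\ &aZ_{1}\end{array}\right)F^{-1},\quad Y=F\left(\begin{array}{cc}1&\\ &x\end{array}\right)F^{-1},\quad X_{2}=F\left(\begin{array}{cc}1&\\ &Z_{2}\end{array}\right)F^{-1}.$$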

First, we note that both $\left(\begin{array}{cc}1&\\ &aZ_{1}\end{array}\right)$ and $\left(\begin{array}{cc}1&\\ &Z_{2}\end{array}\right)$ are members of ZU($N+1$). Any member $Z$ of ZU($N+1$) has the property that $FZF^{-1}$ is a circulant XU($N+1$) matrix. Thus, because of Lemma 1, both $X_{1}$ and $X_{2}$ can be written as a weighted sum of permutation matrices (i.e. obey the theorem-to-be-proved). Hence, by virtue of Lemma 2, to prove the theorem for $X$, it suffices to prove it for $Y$. For this purpose, we note that, because of $\sum_{j}m_{j}=1$, we have

$$Y=F\left(\begin{array}{cc}1&\\ &\sum_{j}m_{j}p_{j}\end{array}\right)F^{-1}=\sum_{j}m_{j}\ F\left(\begin{array}{cc}1&\\ &p_{j}\end{array}\right)F^{-1}.$$

One can easily verify that any product of the form $F\,{\tiny\left(\begin{array}{cc}1&\\ &p_{j}\end{array}\right)}\,F^{-1}$ is a unitary matrix with upper-left entry equal to 1. Because $F\,{\tiny\left(\begin{array}{cc}1&\\ &p_{j}\end{array}\right)}\,F^{-1}$ is of the form $F\,{\tiny\left(\begin{array}{cc}1&\\ &U\end{array}\right)}\,F^{-1}$, this product is also an XU($N+1$) matrix. A matrix with these two properties is necessarily of the form ${\tiny\left(\begin{array}{cc}1&\\ &y_{j}\end{array}\right)}$ with $y_{j}$ a member of XU($N$). Because of the induction hypothesis, we may put

$$y_{j}=\sum_{k}m_{k}^{y}\,p_{k}.$$

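Taking into account that $1=\sum_{k}m_{k}^{y}$, we find

$$F\left(\begin{array}{cc}1&\\ &p_{j}\end{array}\right)F^{-1}=\left(\begin{array}{cc}1&\\ &y_{j}\end{array}\right)=\sum_{k}m_{k}^{y}\left(\begin{array}{cc}1&\\ &p_{k}\end{array}\right).$$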

Hence, $Y$ is of the Birkhoff form: a weighted sum of $(N+1)\times(N+1)$ permutation matrices with sum-of-weights equal to 1. Hence, so is $X$. Thus the theorem holds for $n=N+1$.
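The key structural claim of the induction step, namely that conjugating a block matrix $\mathrm{diag}(1,p_{j})$ by the discrete Fourier transform again gives a block matrix $\mathrm{diag}(1,y_{j})$ with unit line sums in $y_{j}$, can be checked numerically. The following is a minimal sketch for $N=3$, assuming the convention $F_{kl}=\omega^{(k-1)(l-1)}/\sqrt{n}$; the helper names are hypothetical.

```python
import cmath
from itertools import permutations

def dft(n):
    """Unitary DFT matrix with entries w^(k*l)/sqrt(n), 0-based k and l."""
    w = cmath.exp(2j * cmath.pi / n)
    return [[w ** (k * l) / n ** 0.5 for l in range(n)] for k in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def embed(p):
    """Block matrix diag(1, P) for the permutation p (row -> column map)."""
    n = len(p) + 1
    A = [[0j] * n for _ in range(n)]
    A[0][0] = 1
    for k, l in enumerate(p):
        A[k + 1][l + 1] = 1
    return A

N = 3
F = dft(N + 1)
Finv = dagger(F)
tol = 1e-12
ok = True
for p in permutations(range(N)):
    G = matmul(matmul(F, embed(p)), Finv)
    # First row and first column should equal the first unit vector.
    border = all(abs(G[0][l] - (l == 0)) < tol and abs(G[l][0] - (l == 0)) < tol
                 for l in range(N + 1))
    # The lower-right block y should have all row and column sums equal to 1.
    y = [row[1:] for row in G[1:]]
    rows = all(abs(sum(r) - 1) < tol for r in y)
    cols = all(abs(sum(y[k][l] for k in range(N)) - 1) < tol for l in range(N))
    ok = ok and border and rows and cols
print(ok)
```

The check covers only the line-sum and block structure; unitarity of each $y_{j}$ follows from the unitarity of the conjugated matrix.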

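Finally, we consider the case $n=2$. The XU(2) matrices can be written as a weighted sum of the two $2\times 2$ permutation matrices:

$$X=m_{1}\left(\begin{array}{cc}1&0\\ 0&1\end{array}\right)+m_{2}\left(\begin{array}{cc}0&1\\ 1&0\end{array}\right)\quad\text{with}\quad m_{1}=\frac{1+e^{i\alpha}}{2}\ \ \text{and}\ \ m_{2}=\frac{1-e^{i\alpha}}{2},\qquad(5)$$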

and the sum of the two weights $m_{1}$ and $m_{2}$ equals 1.

Because the theorem holds for $n=2$ and the theorem holds for $n=N+1$ as soon as it holds for $n=N$ , the proof of Theorem 2 is complete. $\blacksquare$

Whereas the Birkhoff decomposition (5) of an XU(2) matrix is unique, the decomposition of an XU($n$) matrix with $n>2$ is not unique. We now investigate whether, among the many possible decompositions $\sum_{j}m_{j}P_{j}$, there are one or more that satisfy not only $\sum_{j}m_{j}=1$ but also $\sum_{j}|m_{j}|^{2}=1$. This is a slightly stronger formulation of the $|m_{j}|\leq 1$ constraints of the original Birkhoff theorem on doubly stochastic matrices and again defines a convex polytope in which all XU($n$) matrices lie. We start with the case where $n$ is a prime:

Theorem 3: If a matrix belongs to XU( $n$ ) with prime $n$ , then it can be written as a weighted sum of permutation matrices with the sum of the squared moduli of the weights equal to 1.

Before proving Theorem 3, it is interesting to investigate some low-dimensional examples. The theorem is trivial for $n=2$ . Indeed, above, we have shown that $m_{1}=(1+e^{i\alpha})/2$ and $m_{2}=(1-e^{i\alpha})/2$ , such that $|m_{1}|^{2}+|m_{2}|^{2}=1$ .
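The $n=2$ statement is easy to confirm numerically. The snippet below is a small sketch for one arbitrarily chosen angle `alpha` (an illustrative value, not taken from the paper); it checks the two sum rules and the unitarity of the resulting matrix.

```python
import cmath

alpha = 0.9  # arbitrary illustrative angle
m1 = (1 + cmath.exp(1j * alpha)) / 2
m2 = (1 - cmath.exp(1j * alpha)) / 2

# X = m1 * identity + m2 * swap
X = [[m1, m2], [m2, m1]]

weight_sum = m1 + m2
squared_sum = abs(m1) ** 2 + abs(m2) ** 2

# Unitarity: the rows have unit norm and are mutually orthogonal.
row_norm = abs(X[0][0]) ** 2 + abs(X[0][1]) ** 2
row_overlap = X[0][0] * X[1][0].conjugate() + X[0][1] * X[1][1].conjugate()

print(weight_sum, squared_sum, row_norm, abs(row_overlap))
```

Because there are only two permutation matrices for $n=2$, the row norm and the sum of squared moduli are the same quantity, which is why the theorem is trivial in this case.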

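The theorem is also valid for $n=3$. In fact, there exist infinitely many decompositions of $X$ as a weighted sum of the $n!=6$ permutation matrices, all satisfying $\sum_{j=1}^{6}|m_{j}|^{2}=1$. Indeed, any member $X$ of XU(3) can be written as (1), with $F$ the $3\times 3$ discrete Fourier transform and $U$ a $2\times 2$ unitary matrix. Hence

$$X=\frac{1}{3}\left(\begin{array}{ccc}1&1&1\\ 1&\omega&\omega^{2}\\ 1&\omega^{2}&\omega\end{array}\right)\left(\begin{array}{ccc}1&&\\ &U_{11}&U_{12}\\ &U_{21}&U_{22}\end{array}\right)\left(\begin{array}{ccc}1&1&1\\ 1&\omega^{2}&\omega\\ 1&\omega&\omega^{2}\end{array}\right),$$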

where $\omega$ is the primitive 3rd root of unity, i.e. $e^{i2\pi/3}=-\frac{1}{2}+i\,\frac{\sqrt{3}}{2}$. The entries of $X$ therefore look like

Each product $\omega^{a}U_{rs}$ (for all $a=0,1,2$ and all $r,s=1,2$) appears exactly once in every row and exactly once in every column of $X$. Therefore it is straightforward to check that $X$ can be written as

and $W_{3}$ is the doubly stochastic matrix with all entries identical, i.e. equal to $\frac{1}{3}$. We call $W_{3}$ the $3\times 3$ van der Waerden matrix. It can be written both as a weighted sum of the three circulant permutation matrices and as a weighted sum of the three anticirculant permutation matrices:

Here, we apply the following decomposition:

where $p$ is an arbitrary complex number.

Straightforward computations lead to $\sum_{j}m_{j}=1$ and

Taking into account that $U$ is a $2\times 2$ unitary matrix leads to

For this sum to equal 1, it suffices that

$$p\,\overline{p}+(1-p)(1-\overline{p})=1,$$

i.e. that, in the complex plane, $p$ is located on the circle with center $\frac{1}{2}$ and radius $\frac{1}{2}$ . For the particular choice $p=1$ , we obtain:

We are now in a position to prove Theorem 3 for an arbitrary prime. It suffices to demonstrate the existence of one appropriate decomposition. Any member $X$ of XU($n$) can be written as (1), where $F$ is the $n\times n$ discrete Fourier transform and $U$ is a matrix from U($n-1$). Hence, the matrix entries can be written

$$X_{kl}=\frac{1}{n}\left(1+\sum_{r=1}^{n-1}\sum_{s=1}^{n-1}\omega^{(k-1)r-(l-1)s}\,U_{rs}\right),$$

where $\omega$ is the primitive $n$th root of unity $e^{i2\pi/n}$. Thus, for any given numbers $r$ and $s$, the number $U_{rs}$ appears in the expression of every entry $X_{kl}$. Therefore, we can write $X$ as a sum of $1+(n-1)^{2}$ matrices:

$$X=W_{n}+\frac{1}{n}\sum_{r=1}^{n-1}\sum_{s=1}^{n-1}U_{rs}\,M_{rs},$$

where $W_{n}$ is the $n\times n$ van der Waerden matrix, i.e. the doubly stochastic matrix with all entries equal to $\frac{1}{n}$. We call $M_{rs}$ the transfer matrix of $U_{rs}$. It is an $n\times n$ matrix with all entries equal to some $\omega^{a}$:

$$(M_{rs})_{kl}=\omega^{(k-1)r-(l-1)s}.$$

As $n$ is prime, a given number $\omega^{a}$ appears only once in every row and only once in every column of $M_{rs}$. Moreover $M_{rs}$ has the structure of a ‘supercirculant’ matrix. A square matrix $A$ is called supercirculant if there exist two integers $x$ and $y$, such that, for all $\{k,l\}$, we have both $A_{k+1,l+x}=A_{k,l}$ and $A_{k+y,l+1}=A_{k,l}$ (where sums are modulo $n$). The numbers $x$ and $y$ (with $1\leq x\leq n-1$ and $1\leq y\leq n-1$) are called the pitches. They are interdependent, as

$$x\,y=1\ \ \text{mod}\ n.$$

If $x=1$, then $y=1$ and $A$ is simply called circulant; if $x=n-1$, then $y=n-1$ and $A$ is called anticirculant. The matrix $M_{rs}$ is supercirculant because the difference $l(K+1)-l(K)$ between the column numbers in which $\omega^{a}$ (for a given $a$) occurs in two consecutive rows $K$ and $K+1$ is a constant (modulo $n$), independent of $K$. Indeed, setting $\omega^{(k-1)r-(l-1)s}$ equal to $\omega^{a}$ for both $k=K$ and $k=K+1$ yields

$$(K-1)\,r-[\,l(K)-1\,]\,s=a\ \ \text{mod}\ n\qquad\text{and}\qquad K\,r-[\,l(K+1)-1\,]\,s=a\ \ \text{mod}\ n,$$

such that $l(K+1)-l(K)$ is a constant, say $x$. Analogously, for a given $\omega^{a}$, $k(L+1)-k(L)$ is a constant, say $y$. We can summarize that the two pitches $x$ and $y$ follow from

$$s\,x=r\ \ \text{mod}\ n\qquad\text{and}\qquad r\,y=s\ \ \text{mod}\ n.\qquad(6)$$

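Because $n$ is prime, $x$ and $n$ are coprime and so are $y$ and $n$. Therefore, $\omega^{a}$ does not appear more than once in a column or row of $M_{rs}$. As an example, for $n=5$, eqns (6) yield the following functions $x(r,s)$ and $y(r,s)$:

$$\begin{array}{c|cccc}x&s=1&s=2&s=3&s=4\\ \hline r=1&1&3&2&4\\ r=2&2&1&4&3\\ r=3&3&4&1&2\\ r=4&4&2&3&1\end{array}\qquad\text{and}\qquad\begin{array}{c|cccc}y&s=1&s=2&s=3&s=4\\ \hline r=1&1&2&3&4\\ r=2&3&1&4&2\\ r=3&2&4&1&3\\ r=4&4&3&2&1\end{array}\ ,$$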

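respectively. From the table, one can read that the pitches of $M_{12}$ for $n=5$ are $x=3$ and $y=2$, respectively, leading to the explicit form

$$M_{12}=\left(\begin{array}{ccccc}1&\omega^{3}&\omega&\omega^{4}&\omega^{2}\\ \omega&\omega^{4}&\omega^{2}&1&\omega^{3}\\ \omega^{2}&1&\omega^{3}&\omega&\omega^{4}\\ \omega^{3}&\omega&\omega^{4}&\omega^{2}&1\\ \omega^{4}&\omega^{2}&1&\omega^{3}&\omega\end{array}\right),$$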

with $\omega$ equal to the primitive 5th root of unity, i.e. $e^{i2\pi/5}=(\sqrt{5}-1+i\,\sqrt{10+2\sqrt{5}}\,)/4$.
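The pitches and the exponent pattern of a transfer matrix can be tabulated with a short script. This is a sketch under the assumption that the pitch relations read $s\,x=r$ and $r\,y=s$ (mod $n$), which reproduces the values $x=3$ and $y=2$ quoted for $M_{12}$; the function names `pitches` and `exponents` are hypothetical.

```python
n = 5  # must be prime for the modular inverses below to exist

def pitches(r, s):
    """Return (x, y) with s*x = r and r*y = s (mod n)."""
    x = r * pow(s, -1, n) % n  # pow(s, -1, n) is the modular inverse (Python 3.8+)
    y = s * pow(r, -1, n) % n
    return x, y

def exponents(r, s):
    """Exponents a with (M_rs)_{kl} = omega^a, for 1-based rows k and columns l."""
    return [[((k - 1) * r - (l - 1) * s) % n for l in range(1, n + 1)]
            for k in range(1, n + 1)]

x, y = pitches(1, 2)
E = exponents(1, 2)
for row in E:
    print(row)
print("pitches of M_12:", x, y)

# Supercirculant check: E[k+1][l+x] == E[k][l] == E[k+y][l+1], indices mod n.
is_super = all(
    E[(k + 1) % n][(l + x) % n] == E[k][l] == E[(k + y) % n][(l + 1) % n]
    for k in range(n) for l in range(n))

# Every exponent should occur exactly once per row and once per column.
latin = (all(sorted(row) == list(range(n)) for row in E) and
         all(sorted(E[k][l] for k in range(n)) == list(range(n)) for l in range(n)))
print(is_super, latin)
```

Looping `pitches` over all $r,s\in\{1,\dots,n-1\}$ gives the two pitch tables for any prime $n$.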

We thus can conclude that any transfer matrix can be written as

$$M_{rs}=\sum_{l=1}^{n}\omega^{-(l-1)s}\ C_{l,x(r,s)}.$$

Here, $C_{l,x}$, with $1\leq l\leq n$ and $1\leq x\leq n-1$, denotes the $n\times n$ supercirculant permutation matrix with a first-row unit entry in column $l$ and a pitch equal to $x$. In other words: we have $(C_{l,x})_{1,l}=(C_{l,x})_{2,l+x}=1$. (Note that $C_{lx}$ is a permutation matrix if and only if $x$ and $n$ are coprime.)

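We thus obtain the following decomposition:

$$X=W_{n}+\frac{1}{n}\sum_{x=1}^{n-1}\sum_{l=1}^{n}\left(\sum_{s=1}^{n-1}\omega^{-(l-1)s}\ U_{r(s,x),s}\right)C_{lx},\qquad(7)$$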

where we thus sum over all $n(n-1)$ supercirculant permutation matrices $C_{lx}$ . We note here that, because of eqns (6), different values of $s$ in (7) give rise to different values of $r(s,x)$ .

For $n\neq 2$, we may apply the following decomposition of the van der Waerden matrix:

$$W_{n}=\frac{1}{n}\sum_{j=1}^{n}D_{j},$$

where the $n$ permutation matrices $D_{j}$ are chosen such that they have no 1s in common and are not supercirculant, e.g. $D_{j}=Q^{j-1}D_{1}$ , where

the former being called the shift matrix. Thanks to this choice, the matrix sets $\{C_{lx}\}$ and $\{D_{j}\}$ do not overlap and the sum $\sum_{j}m_{j}P_{j}$ consists of two separate parts:

$$\sum_{j}m_{j}P_{j}=\sum_{x=1}^{n-1}\sum_{l=1}^{n}m_{lx}\,C_{lx}+\sum_{j=1}^{n}m_{j}\,D_{j}.$$

These parts have the following respective properties:

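The $n(n-1)$ weights $m_{lx}$ of the permutation matrices $C_{lx}$ equal a sum of $n-1$ products:

$$m_{lx}=\frac{1}{n}\sum_{s=1}^{n-1}\omega^{-(l-1)s}\ U_{r(s,x),s}.$$

With $\sum_{l=1}^{n}\omega^{(l-1)(t-s)}=n\,\delta_{st}$, we obtain

$$\sum_{l=1}^{n}m_{lx}\,\overline{m_{lx}}=\frac{1}{n}\sum_{s=1}^{n-1}U_{r(s,x),s}\ \overline{U_{r(s,x),s}}.$$

As $U$ is an $(n-1)\times(n-1)$ unitary matrix, we have $\sum_{s=1}^{n-1}\ U_{rs}\ \overline{U_{rs}}=1$. Hence, because for a given $s$ the index $r(s,x)$ takes each of the values $1,2,\dots,n-1$ exactly once when $x$ runs from 1 to $n-1$,

$$\sum_{x=1}^{n-1}\sum_{l=1}^{n}m_{lx}\,\overline{m_{lx}}=\frac{1}{n}\sum_{r=1}^{n-1}\sum_{s=1}^{n-1}U_{rs}\ \overline{U_{rs}}=\frac{n-1}{n}.$$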

The $n$ weights $m_{j}$ of the permutation matrices $D_{j}$ equal $\frac{1}{n}$ and therefore contribute to $\sum_{j}m_{j}\overline{m_{j}}\,$ with an amount $n$ times $|\frac{1}{n}|^{2}$ and thus $\frac{1}{n}$ .
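The construction can be tested end to end for $n=5$. The sketch below rests on several assumptions that are not taken from the paper: the unitary $U$ is an arbitrary example (a $4\times 4$ discrete Fourier transform times diagonal phases), the weights of the $C_{lx}$ are taken as $\frac{1}{n}\sum_{s}\omega^{-(l-1)s}U_{r,s}$ with $r=sx$ mod $n$, and `d1` is one hypothetical choice of a non-supercirculant $D_{1}$ whose cyclic row shifts play the role of the $D_{j}$.

```python
import cmath

def dft(n):
    w = cmath.exp(2j * cmath.pi / n)
    return [[w ** (k * l) / n ** 0.5 for l in range(n)] for k in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

n = 5
w = cmath.exp(2j * cmath.pi / n)

# Illustrative U in U(n-1): a DFT with its columns multiplied by arbitrary phases.
phases = [cmath.exp(1j * t) for t in (0.3, 1.1, -0.8, 2.5)]
U = [[f * phases[l] for l, f in enumerate(row)] for row in dft(n - 1)]

# X = F diag(1, U) F^{-1}
A = [[0j] * n for _ in range(n)]
A[0][0] = 1
for r in range(n - 1):
    for s in range(n - 1):
        A[r + 1][s + 1] = U[r][s]
F = dft(n)
X = matmul(matmul(F, A), dagger(F))

weights = {}  # permutation (row -> column map, 0-based) -> weight

# Supercirculant part: first-row column l (0-based here) and pitch x.
for x in range(1, n):
    for l in range(n):
        m = sum(w ** (-l * s) * U[(s * x) % n - 1][s - 1] for s in range(1, n)) / n
        weights[tuple((l + k * x) % n for k in range(n))] = m

# Van der Waerden part: cyclic row shifts of a non-supercirculant permutation.
d1 = (1, 0) + tuple(range(2, n))  # hypothetical choice of D_1
for j in range(n):
    perm = tuple(d1[(k + j) % n] for k in range(n))
    assert perm not in weights  # the two sets of permutations must not overlap
    weights[perm] = 1 / n

# Rebuild the matrix from the weighted permutation matrices and compare with X.
R = [[0j] * n for _ in range(n)]
for perm, m in weights.items():
    for k, l in enumerate(perm):
        R[k][l] += m

rebuild_error = max(abs(R[k][l] - X[k][l]) for k in range(n) for l in range(n))
weight_sum = sum(weights.values())
squared_sum = sum(abs(m) ** 2 for m in weights.values())
print(rebuild_error, weight_sum, squared_sum)
```

The dictionary ends up with $n^{2}=25$ entries, matching the count of $n(n-1)$ matrices $C_{lx}$ plus $n$ matrices $D_{j}$. For $n=3$ the internal assertion fails, because every $3\times 3$ permutation matrix is supercirculant.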

We note that the above construction does not work for the special case $n=3$ because, for $n=3$ , the matrices $D_{1}$ , $D_{2}$ , and $D_{3}$ are, by coincidence, anticirculant. Therefore, $D_{1}$ , $D_{2}$ , and $D_{3}$ coincide with $C_{12}$ , $C_{22}$ , and $C_{32}$ , respectively, such that the above special-purpose construction for $n=3$ was necessary. As a matter of fact, the proposed Birkhoff decomposition consists of $n(n-1)$ matrices $C_{lx}$ and $n$ matrices $D_{j}$ , thus of a total of $n^{2}$ permutation matrices $P_{j}$ . Only for $n>3$ , the relation $n^{2}\leq n!$ is valid and there exist enough permutation matrices to prove Theorem 3 in the generic way.
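For $n=3$ the special-purpose construction with the free parameter $p$ can be checked in the same spirit. The sketch assumes that $p$ is the share of $W_{3}$ assigned to the three circulant permutation matrices and $1-p$ the share assigned to the three anticirculant ones; the $2\times 2$ unitary and the helper names are illustrative choices only.

```python
import cmath
import math

n = 3
w = cmath.exp(2j * cmath.pi / n)

# Illustrative 2x2 unitary U (hypothetical parameters t, a, b).
t, a, b = 0.6, 0.4, -1.3
U = [[math.cos(t) * cmath.exp(1j * a), math.sin(t) * cmath.exp(1j * b)],
     [-math.sin(t) * cmath.exp(-1j * b), math.cos(t) * cmath.exp(-1j * a)]]

# Entries of X = F diag(1, U) F^{-1}, written out with 0-based k and l.
X = [[(1 + sum(w ** (k * r - l * s) * U[r - 1][s - 1]
               for r in (1, 2) for s in (1, 2))) / n
      for l in range(n)] for k in range(n)]

def weights(p):
    """Weights of the six permutation matrices for a given split parameter p."""
    ws = {}
    for l in range(n):
        circ = tuple((l + k) % n for k in range(n))      # pitch 1
        anti = tuple((l + 2 * k) % n for k in range(n))  # pitch 2
        ws[circ] = (p + U[0][0] * w ** (-l) + U[1][1] * w ** (-2 * l)) / n
        ws[anti] = (1 - p + U[1][0] * w ** (-l) + U[0][1] * w ** (-2 * l)) / n
    return ws

def rebuild_error(ws):
    R = [[0j] * n for _ in range(n)]
    for perm, m in ws.items():
        for k, l in enumerate(perm):
            R[k][l] += m
    return max(abs(R[k][l] - X[k][l]) for k in range(n) for l in range(n))

def squared_sum(ws):
    return sum(abs(m) ** 2 for m in ws.values())

for p in (1, 0, 0.5 + 0.5j, 2):  # the first three lie on the circle |p - 1/2| = 1/2
    ws = weights(p)
    print(p, rebuild_error(ws), squared_sum(ws))
```

Every value of $p$ reproduces $X$, but only values on the circle give a squared-modulus sum of 1; the off-circle value $p=2$ is included as a counterexample.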

If $n$ is not prime, then not all transfer matrices $M_{rs}$ are supercirculant and the key property of the decomposition, proposed in the proof, is not fulfilled. If both $r$ and $s$ are coprime with $n$ , then $M_{rs}$ is supercirculant. The other transfer matrices consist of identical blocks of size $b\times c$ with

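E.g., for $n=4$, the $4\times 4$ matrix $M_{12}$ has two identical blocks of size $b\times c=4\times 2$:

$$M_{12}=\left(\begin{array}{cccc}1&\omega^{2}&1&\omega^{2}\\ \omega&\omega^{3}&\omega&\omega^{3}\\ \omega^{2}&1&\omega^{2}&1\\ \omega^{3}&\omega&\omega^{3}&\omega\end{array}\right),$$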

where $\omega$ here is the primitive 4th root of unity, i.e. $i$.
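The block structure for composite $n$ is visible directly in the exponents of $\omega$. The following sketch uses the entry formula $(M_{rs})_{kl}=\omega^{(k-1)r-(l-1)s}$ for $n=4$; the helper names are hypothetical.

```python
n = 4

def exponents(r, s):
    """Exponents a with (M_rs)_{kl} = omega^a, for 1-based rows k and columns l."""
    return [[((k - 1) * r - (l - 1) * s) % n for l in range(1, n + 1)]
            for k in range(1, n + 1)]

def once_per_row(E):
    """True if every exponent 0..n-1 occurs exactly once in each row."""
    return all(sorted(row) == list(range(n)) for row in E)

E12 = exponents(1, 2)
for row in E12:
    print(row)

left_block = [row[:2] for row in E12]
right_block = [row[2:] for row in E12]
print(left_block == right_block)         # two identical 4x2 blocks
print(once_per_row(E12))                 # s = 2 is not coprime with 4
print(once_per_row(exponents(1, 3)))     # r = 1 and s = 3 are both coprime with 4
```

When $s$ shares a factor with $n$, the column index only enters through $(l-1)s$ mod $n$, so columns repeat and no single permutation matrix collects all occurrences of a given $\omega^{a}$.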

Whether Theorem 3 is also valid if $n$ is a composite number is left for further investigation. At least it is valid for the smallest non-prime, i.e. for $n=4$. This can be verified by checking that the decomposition

where the weights $m_{j}$ have the values as in the Appendix.

A consequence

As already mentioned in Section 2, any $n\times n$ unitary matrix $U$ can be decomposed as

$$U=e^{i\alpha}\,Z_{1}\,X\,Z_{2},$$

where $e^{i\alpha}$ is an overall phase factor, $X$ is an XU($n$) matrix, and both $Z_{1}$ and $Z_{2}$ are ZU($n$) matrices. Applying the fact that $X$ can be written as a weighted sum of permutation matrices, we can conclude that $U$ can be written as a weighted sum of complex permutation matrices. Here, we define a complex permutation matrix as a unitary matrix having one and only one non-zero entry in every row and every column.

Conclusion

We have demonstrated that all matrices of the group $e^{i\alpha}$XU($n$) can be written as a weighted sum of permutation matrices and that, among the U($n$) matrices, they are the only ones that can be decomposed in that way. The sum of the weights equals $e^{i\alpha}$. We have proved that the sum of the squared moduli of the weights can be made equal to unity whenever $n$ is prime, giving a convex geometric interpretation to the decomposition, as in the standard Birkhoff theorem. The case of non-prime $n$ is left for further investigation.

Appendix

An arbitrary member $X$ of XU(4) may be written as $\sum_{j=1}^{24}m_{j}P_{j}$ with

where the condition $\sum_{j}|m_{j}|^{2}=1$ is fulfilled. Here, the $n!=24$ permutation matrices have been ordered ‘lexicographically’ as follows:

In this ordering, the supercirculant permutation matrices are $C_{11}=P_{1}$, $C_{13}=P_{6}$, $C_{21}=P_{10}$, $C_{23}=P_{8}$, $C_{31}=P_{17}$, $C_{33}=P_{15}$, $C_{41}=P_{19}$, and $C_{43}=P_{24}$.
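The indices of the supercirculant matrices can be recomputed by enumeration. This sketch assumes that ‘lexicographically’ means the standard lexicographic order of the row-to-column maps (written 1-based, so that $P_{1}$ is the identity); since the ordering itself is not reproduced here, the result is only as reliable as that assumption. The names `index`, `supercirculant` and `table` are hypothetical.

```python
from itertools import permutations
from math import gcd

n = 4

# itertools.permutations yields tuples in lexicographic order for sorted input.
index = {p: j for j, p in enumerate(permutations(range(1, n + 1)), start=1)}

def supercirculant(l, x):
    """Row -> column map (1-based) with a one at (1, l) and pitch x."""
    return tuple((l - 1 + k * x) % n + 1 for k in range(n))

# Only pitches coprime with n give permutation matrices.
table = {(l, x): index[supercirculant(l, x)]
         for l in range(1, n + 1) for x in range(1, n) if gcd(x, n) == 1}

for (l, x), j in sorted(table.items()):
    print(f"C_{l}{x} = P_{j}")
```

For $n=4$ the admissible pitches are 1 and 3, so the script lists eight supercirculant permutation matrices.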