From Knothe's transport to Brenier's map and a continuation method for optimal transport

Guillaume Carlier, Alfred Galichon, Filippo Santambrogio

Introduction

Assumption (H-source): the measure μd\mu^{d}, as well as μd\mu^{d}-almost all the measures μxdd1\mu^{d-1}_{x_{d}}, and the measures μxd,xd1d2\mu^{d-2}_{x_{d},x_{d-1}} for μd\mu^{d}-a.e. xdx_{d} and μxdd1\mu^{d-1}_{x_{d}}-a.e. xd1x_{d-1}…up to almost all the measures μxd,,x21\mu^{1}_{x_{d},\dots,x_{2}}, which are all measures on the real line, must have no atoms.

Notice that (H-source) is satisfied as soon as μ\mu is absolutely continuous with respect to the Lebesgue measure.

The Monge-Kantorovich problem and the Brenier’s map. Optimal transportation theory provides an alternative way to transport μ\mu to ν\nu. We recall that in the case of the quadratic cost, the Monge-Kantorovich problem reads as

The Knothe-Rosenblatt as a limit of optimal transportation plans. Let us slightly modify the quadratic cost in (1.1) and replace it with the weighted quadratic cost

where the λi(ε)\lambda_{i}(\varepsilon)’s are positive scalars depending on a parameter ε>0\varepsilon>0. If μ\mu is absolutely continuous with respect to the Lebesgue measure, the corresponding optimal transportation problem admits a unique solution TεT_{\varepsilon}. When in addition, for all k{1,...,d1}k\in\{1,...,d-1\}, λk(ε)/λk+1(ε)0\lambda_{k}(\varepsilon)/\lambda_{k+1}(\varepsilon)\to 0 as ε0\varepsilon\to 0, it is natural to expect the convergence of TεT_{\varepsilon} to the Knothe transport TT. We will show that this convergence holds provided ν\nu satisfies some additional condition, and namely

Assumption (H-target): the measure νd\nu^{d}, as well as νd\nu^{d}-almost all the measures νydd1\nu^{d-1}_{y_{d}}, and the measures νyd,yd1d2\nu^{d-2}_{y_{d},y_{d-1}} for νd\nu^{d}-a.e. ydy_{d} and νydd1\nu^{d-1}_{y_{d}}-a.e. yd1y_{d-1}…up to almost all the measures νyd,,y32\nu^{2}_{y_{d},\dots,y_{3}}, which are all measures on the real line, must have no atoms.

Notice that (H-target) is not natural as (H-source) is. Yet, we will show a counter-example to the convergence result when it is not satisfied. (H-target) as well is satisfied should ν\nu be absolutely continuous (actually, this assumption is slightly weaker then (H-source), since the last disintegration measures are not concerned).

This convergence result was conjectured by Y. Brenier as a very natural one, and actually its proof is not hard. Yet, it was not known before that extra assumptions on ν\nu were needed. This makes one of the point of interest of this paper.

The other point is what we investigate later in the paper, i.e. the other direction: from Knothe to Brenier. We will study the dependence εTε\varepsilon\mapsto T_{\varepsilon} by means of the evolution with respect to ε\varepsilon of the dual variables. This will enable us, to design a numericaly strategy to approximate all the optimal transports TεT_{\varepsilon} taking as initial condition the (cheap to compute) Knothe transport TT.

which converges as ε0\varepsilon\rightarrow 0 to T=(ab2/cb/c0c)T=\begin{pmatrix}\sqrt{a-b^{2}/c}&b/\sqrt{c}\\ 0&\sqrt{c}\end{pmatrix}, which is precisely the matrix of the Knothe transport from μ\mu to ν\nu.

Organization of the paper. In section 2, we show, under suitable assumptions, that the optimal transportation maps for the cost cεc_{\varepsilon} converge to Knothe’s transport map as the parameter ε\varepsilon goes to , we will also emphasize that some conditions are to be imposed on ν\nu for the convergence to hold. In section 3, we show that the evolution of the dual variables in the optimal transportation problem for cost the cεc_{\varepsilon} is given by a well-posed ordinary differential equation. Finally in section 4, we discretize this equation and give several numerical results.

Knothe transport as a limit of quadratic optimal transports

We directly state our first result, whose proof, in the spirit of Γ\Gamma-convergence developments (see ), will require several steps.

Moreover, should the plans γε\gamma_{\varepsilon} be induced by transport maps TεT_{\varepsilon}, then these maps would converge to TT in L2(μ)L^{2}(\mu) as ε0\varepsilon\to 0.

Take the plans γε\gamma_{\varepsilon} that are optimal for the Brenier-like cost cεc_{\varepsilon} given by

(we suppose for simplicity λd(ε)=1\lambda_{d}(\varepsilon)=1 and λi(ε)/λi+1(ε)0\lambda_{i}(\varepsilon)/\lambda_{i+1}(\varepsilon)\to 0). Suppose (which is possible, up to subsequences) γεγ\gamma_{\varepsilon}\rightharpoonup\gamma. We want to prove γ=γK\gamma=\gamma_{K}.

By comparing γε\gamma_{\varepsilon} to γK\gamma_{K} and using optimality we first get

and, passing to the limit as ε0\varepsilon\to 0, since cεc_{\varepsilon} converges locally uniformly to c(d)(x,y)=(xdyd)2c^{(d)}(x,y)=(x_{d}-y_{d})^{2}, we get

Yet, the function c(d)c^{(d)} only depends on the variables xdx_{d} and ydy_{d} and this shows that the measure (πd)γ(\pi_{d})_{\sharp}\gamma gets a better result than (πd)γK(\pi_{d})_{\sharp}\gamma_{K} with respect to the quadratic cost (πd\pi_{d} being the projection onto the last coordinates, i.e. πd(x,y)=(xd,yd)\pi_{d}(x,y)=(x_{d},y_{d})). Yet, the measure γK\gamma_{K} has been chosen on purpose to get optimality from μd\mu_{d} to νd\nu_{d} with respect to this cost, and the two measures share the same marginals. Moreover, thanks to the assumptions on μd\mu_{d}, this optimal transport plan (which is actually induced by a transport map) is unique. This implies (πd)γ=(πd)γK(\pi_{d})_{\sharp}\gamma=(\pi_{d})_{\sharp}\gamma_{K}. Let us call γd\gamma^{d} this common measure.

We go back to (2.1) and go on by noticing that all the measures γε\gamma_{\varepsilon} have the same marginals as γK\gamma_{K} and hence their (separate) projection onto xdx_{d} and ydy_{d} are μd\mu_{d} and νd\nu_{d}, respectively. This implies that (πd)γε(\pi_{d})_{\sharp}\gamma_{\varepsilon} must realize a result which is worse than (πd)γK(\pi_{d})_{\sharp}\gamma_{K} as far as the quadratic cost is concerned and consequently we have

which implies, by simplifying the common term in d(πd)γKd(\pi_{d})_{\sharp}\gamma_{K}, dividing by λd1(ε)\lambda_{d-1}(\varepsilon) and passing to the limit,

(we use the general notation ck(x,y)=xkyk2c^{k}(x,y)=|x_{k}-y_{k}|^{2}). We can notice that both integrals depend on the variables xd1x_{d-1} and yd1y_{d-1} only. Anyway, we can project onto the variables (xd1,xd)(x_{d-1},x_{d}) and (yd1,yd)(y_{d-1},y_{d}) (obtaining measures (πd1)γ(\pi_{d-1})_{\sharp}\gamma and (πd1)γK(\pi_{d-1})_{\sharp}\gamma_{K}) so that we disintegrate with respect to the measure γd\gamma^{d}. We have

It is is sufficient to prove that the measures γ(xd,yd)d1\gamma^{d-1}_{(x_{d},y_{d})} share the same marginals on xd1x_{d-1} and yd1y_{d-1} as the corresponding γ(xd,yd),Kd1\gamma^{d-1}_{(x_{d},y_{d}),K} to get that their quadratic performance should be worse than the corresponding performance of γ(xd,yd),Kd1\gamma^{d-1}_{(x_{d},y_{d}),K} (this is because the Knothe measure has been chosen exactly with the intention of being quadratically optimal on (xd1,yd1)(x_{d-1},y_{d-1}) once xdx_{d} and ydy_{d} are fixed). Yet, (2.2) shows that, on average, the result given by the those measures is not worse than the results of the optimal ones. Thus, the two results coincide for almost any pair (xd,yd)(x_{d},y_{d}) and, by uniqueness of the optimal transports (this relies on the assumptions on the measures μxdd1\mu^{d-1}_{x_{d}}), we get γ(xd,yd)d1=γ(xd,yd),Kd1\gamma^{d-1}_{(x_{d},y_{d})}=\gamma^{d-1}_{(x_{d},y_{d}),K}. To let this proof work it is sufficient to prove that the projections of the two measures coincide for γd\gamma^{d}-a.e. pair (xd,yd)(x_{d},y_{d}). For fixed (xd,yd)(x_{d},y_{d}) we would like to prove, for any ϕ\phi

(and to prove an analogous equality for functions of yd1y_{d-1}). Since we accept to prove it for a.e. pair (xd,yd)(x_{d},y_{d}), it is sufficient to prove this equality:

for any ϕ\phi and any ψ\psi. This means proving

which is not trivial since we only know that the two measures γd1\gamma^{d-1} and γKd1\gamma^{d-1}_{K} have the same marginals with respect to the pairs (xd1,xd)(x_{d-1},x_{d}), (yd1,yd)(y_{d-1},y_{d}) (since they have the same projections onto xx and onto yy) and (xd,yd)(x_{d},y_{d}) (since we just proved it). But here there is a function of the three variables (xd1,xd,yd)(x_{d-1},x_{d},y_{d}). Yet, we know that the measure γd\gamma^{d} is concentrated on the set yd=Td(xd)y_{d}=T_{d}(x_{d}) for a certain map TdT_{d}, and this allows to replace the expression of ydy_{d}, thus getting rid of one variable. This proves that the function ψ(xd,yd)ϕ(xd1)\psi(x_{d},y_{d})\phi(x_{d-1}) is actually a function of (xd1,xd)(x_{d-1},x_{d}) only, and that equality holds when passing from γ\gamma to γK\gamma_{K}.The same can be performed on functions ψ(xd,yd)ϕ(yd1)\psi(x_{d},y_{d})\phi(y_{d-1}) but we have in this case to ensure that we can replace xdx_{d} with a function of ydy_{d}, i.e. that we can invert TdT_{d}. This is possible thanks to the assumption on νd\nu_{d}, since TdT_{d} is the optimal transport from μd\mu_{d} to νd\nu_{d}, but an optimal transport exists in the other direction as well and it gives the same optimal plan (thanks to uniqueness). These facts prove that the measures γ(xd,yd)d1\gamma^{d-1}_{(x_{d},y_{d})} and γ(xd,yd),Kd1\gamma^{d-1}_{(x_{d},y_{d}),K} have the same marginals and hence, since they are both optimal, they coincide for a.e. pair (xd,yd)(x_{d},y_{d}). This implies γd1=γKd1\gamma^{d-1}=\gamma^{d-1}_{K}.

Now, it is possible to go on by steps: once we have proven that γh=γKh\gamma^{h}=\gamma^{h}_{K}, let us take (2.1) and estimate all the terms with xiyi2|x_{i}-y_{i}|^{2} and ihi\geq h thanks to the optimality of γK\gamma_{K}, thus getting

and consequently, by dividing by λh1(ε)\lambda_{h-1}(\varepsilon) and passing to the limit,

We disintegrate with respect to γh\gamma^{h} and we act exacly as before: proving that the marginals of the disintegrations coincide is sufficient to prove equality of the measures. Here we will use test-functions fo the form

The same trick as before, i.e. replacing the variables yy with functions of the variables xx is again possible. To invert the trick and replace xx with yy one needs to invert part of Knothe’s transport. This is possible since our assumptions imply that all the monotone transports we get are invertible. In the end we get, as before, γh1=γKh1\gamma^{h-1}=\gamma^{h-1}_{K}. This procedure may go on up to h=2h=2, thus arriving at γ=γK\gamma=\gamma_{K}.

We have now proven γεγK\gamma_{\varepsilon}\rightharpoonup\gamma_{K}. Yet, if all these transport plans come from transport maps, it is well known that (Tε×id)μ(T×id)μ(T_{\varepsilon}\times id)_{\sharp}\mu\rightharpoonup(T\times id)_{\sharp}\mu implies TεTT_{\varepsilon}\to T in Lp(μ)L^{p}(\mu), for any p>1p>1, as far as TεT_{\varepsilon} is bounded in Lp(μ)L^{p}(\mu). Actually, weak convergence is a simple consequence of boundedness: to go on, we can look at Young’s measures. The assumption (the limit is a transport map as well) exactly means that all the Young measures are dirac masses, which implies strong convergence. In particular we get L2(μ)L^{2}(\mu) convergence and μ\mu-a.e. convergence on a subsequence. ∎

Let us remark here that if instead of considering the quadratic cost cεc_{\varepsilon}, one considers the more general separable cost

where each cic_{i} is a smooth strictly convex function (with suitable growth), then the previous convergence proof carries over.

The Knothe-Rosenblatt map is easily computed as (y1,y2)=T(x):=(2(x1+sgn(x2)),0)(y_{1},y_{2})=T(x):=(2(x_{1}+sgn(x_{2})),0). The solution of any symmetric transportation problem with λε=(ε,1)\lambda^{\varepsilon}=(\varepsilon,1) is (y1,y2)=T0(x):=(x1,0)(y_{1},y_{2})=T^{0}(x):=(x_{1},0) (no transport may do better than this one, which projects on the support of ν\nu). Therefore, in this example the optimal transportation maps fail to tend to the Knothe-Rosenblatt map. The reason is the atom in the measure ν2=δ0\nu^{2}=\delta_{0}.

Convergence even with atoms The convergence result of theorem 2.1 requires the absence of atoms in the projections of ν\nu. This is obviously not the case if ν\nu itself is purely atomic! Yet, this will precisely be the case we will consider in the algorithm we propose in the sequel. The same proof may be extended to this case under the following assumption. Keep the same assumptions on μ\mu but suppose that ν\nu is concentrated on a set SS with the property

This means that, if we restrict ourselves to SS, then all the variables yiy_{i} for i<di<d are actually a function of the last variable ydy_{d}. This is particularly useful when ν\nu is purely atomic, concentrated on a finite (or countable) set of points with different ydy_{d} components.

Just come back to the proof. The equality

only relied on ydy_{d} being a function of xdx_{d}, which is still true. The other equality, namely

gives some extra troubles. It is not any more true that xdx_{d} is a function of ydy_{d}. Yet, it is true that yd1y_{d-1} is a function of ydy_{d} and this allows us to reduce the expression to functions of (xd,yd)(x_{d},y_{d}) only, which is sufficient to get equality. The same procedure may be performed at subsequent steps as well.

An ODE for the dual variables

In this section, we consider for simplicity the case d=2d=2 (although our method extends to higher dimensions), μ\mu uniform on some convex polyhedron Ω\Omega (for the sake of simplicity we will assume Ω=1|\Omega|=1) and ν=1Ni=1Nδyi\nu=\frac{1}{N}\sum_{i=1}^{N}\delta_{y_{i}} where all the points yiΩy_{i}\in\Omega have a different second coordinate yi(2)y_{i}^{(2)}. For every ε0\varepsilon\geq 0, let AεA_{\varepsilon} be the diagonal 2×22\times 2 matrix with diagonal entries (ε,1)(\varepsilon,1) and let cεc_{\varepsilon} be the quadratic cost defined by cε(x,y)=Aε(xy)(xy)c_{\varepsilon}(x,y)=A_{\varepsilon}(x-y)(x-y). We are interested in solving the family of optimal transportation problems

for all values of the parameter ε\varepsilon\in. It is well-known, that (3.1) can be conveniently solved by the dual problem formulated in terms of prices:

where pε(x)=mini{cε(x,yi)pi}p_{\varepsilon}^{*}(x)=\min_{i}\{c_{\varepsilon}(x,y_{i})-p_{i}\} and we impose as a normalization p1=0p_{1}=0. For each ε\varepsilon, there is a unique maximizer p(ε)p(\varepsilon). For each (p,ε)(p,\varepsilon) we define C(p,ε)i={xΩ:infjcε(x,yj)pj=cε(x,yi)pi}C(p,\varepsilon)_{i}=\{x\in\Omega\,:\,\inf_{j}c_{\varepsilon}(x,y_{j})-p_{j}=c_{\varepsilon}(x,y_{i})-p_{i}\}. It is easy to check that Φε:=Φ(.,ε)\Phi_{\varepsilon}:=\Phi(.,\varepsilon) is concave differentiable and that the gradient of Φε\Phi_{\varepsilon} is given by

By concavity of Φε\Phi_{\varepsilon}, the solution p(ε)p(\varepsilon) of (3.2) is characterized by the equation Φε(p(ε))=0\nabla\Phi_{\varepsilon}(p(\varepsilon))=0. The optimal transportation between μ\mu and ν\nu for the cost cεc_{\varepsilon} is then the piecewise map taking value yiy_{i} in the cell C(p(ε),ε))iC(p(\varepsilon),\varepsilon))_{i}. Our aim is to charcacterize the evolution of p(ε)p(\varepsilon) as ε\varepsilon varies. Formally, differentiating the optimality condition Φ(p(ε),ε)=0\nabla\Phi(p(\varepsilon),\varepsilon)=0, we obtain a differential equation for the evolution of p(ε)p(\varepsilon):

Our aim now is to show that the equation (3.3) is well-posed starting with the initial condition p(0)p(0) (corresponding to horizontal cells of area 1/N1/N); this will involve computing the second derivatives of Φ\Phi in (3.3), proving their Lipschitz behavior as well as obtaining a negative bound on the larger eigenvalue of the negative semidefinite matrix Dp,p2ΦD^{2}_{p,p}\Phi .

The price vector p(ε)p(\varepsilon), along the evolution, will always be such that all the areas C(p,ε)i|C(p,\varepsilon)_{i}| are equal (and are equal to 1/N1/N). Yet, we need to prove that the differential equation is well posed and we will set it in an open set,

The initial datum of the equation will be such that C(p(0),0)i=1/N|C(p(0),0)_{i}|=1/N and we will look at the solution only up to the first moment where it exits O{\mathcal{O}}. Yet inside the set it will be well-posed and it will imply conservation of the areas. Hence we will never exit O{\mathcal{O}}.

The positions of the vertices x(p,ε)i,j±x(p,\varepsilon)_{i,j}^{\pm} depend in a Lipschitz way on pp and ε\varepsilon.

Locally it is true that the same point x(p,ε)i,j±x(p,\varepsilon)_{i,j}^{\pm} is defined either by a system of equations

in the case of a point on the intersection of two common boundaries, or by a system

in the case of intersection with the boundary Ω\partial\Omega (locally being given by the equation Lx=l0Lx=l_{0}) . The first system, after simplifying, reads as

and the second as well may be simplified the same way. This means that they are of the form

for a matrix M(ε)M(\varepsilon) which reads, in usual coordinates,

The function Φ\Phi admits pure second derivatives with respect to pp and mixed second derivatives with respect to pp and ε\varepsilon, and these second derivatives are Lipschitz continuous.

We have proven that the positions of the points x(p,ε)i,j±x(p,\varepsilon)_{i,j}^{\pm} depend Lipschitzly, with uniformly bounded Lipschitz constants, on pp and ε\varepsilon. Since the volumes of the cells C(p,ε)iC(p,\varepsilon)_{i} are Lipschitz functions of these points, this implies that pΦ\nabla_{p}\Phi is C0,1C^{0,1}. Hence it admits derivatives almost everywhere and we can compute them in the following way.

The derivative of a volume of a polygonal cell is given, side by side, by the length of the side times the average of the components which are normal to such a side of the two derivatives x˙(p,ε)i,j+\dot{x}(p,\varepsilon)_{i,j}^{+} and x˙(p,ε)i,j\dot{x}(p,\varepsilon)_{i,j}^{-} (the other terms - which are mainly the terms at the corners - are of higher order). The equation of the side D(ε,p)i,jD(\varepsilon,p)_{i,j} is, as we know,

and the normal unit vector to the side is

Let us start from the derivatives of the cell C(p,ε)iC(p,\varepsilon)_{i} with respect to a variable pjp_{j} with jij\neq i. We differentiate (3.8) with respect to pjp_{j} and we get

This formula only works for x=x(p,ε)i,j+x=x(p,\varepsilon)_{i,j}^{+} and x=x(p,ε)i,jx=x(p,\varepsilon)_{i,j}^{-}. Obviously it ony works where they are differentiable, i.e. almost everywhere. Hence the derivative, by summing up and rescaling the normal vector, is given by

where li,jl_{i,j} is the length of D(ε,p)i,jD(\varepsilon,p)_{i,j}.

As far as the derivative with respect to pip_{i} is concerned, it is not difficult to check that we have (by summing up the results on every side)

where the sum is performed on all the indices jj such that the cell C(p,ε)jC(p,\varepsilon)_{j} is in contact with the cell C(p,ε)iC(p,\varepsilon)_{i}.

Automatically, since these derivatives only depend on the values of li,jl_{i,j}, which depend in a Lipschitz manner on the positions of x=x(p,ε)i,j±x=x(p,\varepsilon)_{i,j}^{\pm}, they are Lipschitz as well. This proves that Φε\Phi_{\varepsilon} is actually C2,1C^{2,1} and that these derivatives are well defined and admit the previous expressions (3.9)-(3.10) everywhere.

The computation of the derivatives with respect to ε\varepsilon is a bit trickier. We derive again (3.8), but with respect to ε\varepsilon. Since dAε/dε=BdA_{\varepsilon}/d\varepsilon=B, we get

Then we renormalize the normal vector, sum up the results for x=x(p,ε)i,j+x=x(p,\varepsilon)_{i,j}^{+} and x=x(p,ε)i,jx=x(p,\varepsilon)_{i,j}^{-}, multiply by the lengths and sum up the results for all the sides, and get

In this case as well the result is Lipschitz in (p,ε)(p,\varepsilon) and hence pΦ\nabla_{p}\Phi is differentiable everywhere with respect to ε\varepsilon, with Lipschitz derivative.

We can come now back to the evolution of p=p(ε)p=p(\varepsilon) and consider again the differential equation (3.3). To solve this equation we need to prove that the matrix Dp,p2ΦD^{2}_{p,p}\Phi is actually invertible (for numerical purpose, we will also need to bound its eigenvalues away from zero). It is important to recall that we look at the evolution of the vector p=(p2,,pN)p=(p_{2},\dots,p_{N}), since we may assume p1(ε)=0p_{1}(\varepsilon)=0 for all ε\varepsilon. Hence, we will not look at the entries 11 in the vectors or the matrices. The matrix we consider is M:=(Dp,p2Φ)i,j=2,,NM:=-(D^{2}_{p,p}\Phi)_{i,j=2,\dots,N} has the following properties:

on each line, outside the diagonal we have negative terms Mi,j=li,j2Aε(yjyi)M_{i,j}=-\frac{l_{i,j}}{2|A_{\varepsilon}(y_{j}-y_{i})|};

each element on the diagonal is the sum of minus all the others on the same line (hence it is positive), and possibly of the term which should be in the same line at the first column;

an entry (i,j)(i,j) of the matrix is non-zero if and only if the cells C(p,ε)iC(p,\varepsilon)_{i} and C(p,ε)jC(p,\varepsilon)_{j} share a common boundary with positive length;

in particular, for any pair (i,j)(i,j), even if the entry at place (i,j)(i,j) is zero, it is possible to find a path i=i0,i1,i2,,ik=ji=i_{0},\,i_{1},\,i_{2},\dots,i_{k}=j so that the matrix has non-zero values at all the positions (ih,ih+1)(i_{h},i_{h+1});

the total of the entries of the first column (the one which is not present in the matrix) is strictly positive.

The invertibility of MM is ensured by the following:

Let the matrix MM satisfy the following properties

This implies that all inequalities are equalities and in particular xj=xix_{j}=x_{\overline{i}} whenever Mi,j0M_{\overline{i},j}\neq 0. Hence, the entries of xx on all the indices which are “neighbours” of i\overline{i} equal xix_{\overline{i}} (and they are maximal as well). This allows to repeat the argument replacing i\overline{i} with another maximizing index jj and so on… since any index is connected by a chain of neighbours to i\overline{i}, we get that all the entries are equal. But this implies that the vector in the kernel we selected must be a multiple of the vector (1,1,,1)(1,1,\dots,1). Yet, this vector is not in the kernel since the sum of the elements on each line is not zero for all lines, by assumption (H2)(H2). This proves that MM is invertible. ∎

Finding a lower bound for the modulus of the eigenvalues of MM, i.e. quantifying its inversibility is not straightforward. Indeed, the properties (H1)(H1), (H2)(H2), (H3)(H3) are not sufficient to get this bound, even if we fix the norm of the remainding column as the following counter-example shows. The determinant of the matrices

is 2ε(1ε)02\varepsilon(1-\varepsilon)\to 0, which implies that some eigenvalue as well goes to .

We will obtain a positive lower bound by a compactness argument, but we will use something stronger than simply assumptions (H1)(H1), (H2)(H2), (H3)(H3). The idea is that assumption (H3)(H3) is not closed, but it stays closed when we replace it with the stronger condition of the matrix being associated to a partition (as is the case for M=Dp,p2ΦM=-D^{2}_{p,p}\Phi). In this case if one connection degenerates (i.e. a common boundary reduces to a point), some other connections will play the role.

There is a positive uniform lower bound on the least eigenvalue of any matrix MM associated to the cell partition corresponding to a pair (p,ε)O(p,\varepsilon)\in\mathcal{O}.

The proof will be obtained by contradiction. To this aim, take a sequence of partitions of Ω\Omega into sets (Ωin)i=1,,N(\Omega_{i}^{n})_{i=1,\dots,N}. We assume these sets to be convex polygons with a bounded number of sides. This is the case for the cells associated to pairs (p,ε)(p,\varepsilon). We also know that their areas are always bounded between 1/2N1/2N and 2/N2/N). These partitions give rise to a certain topology of connections between the cells. Up to subsequences, we may suppose that this topology is always the same on all the partitions of the sequence (since the number of possible topologies is finite). Up to subsequences, we also have convergence in the Hausdorff distance. This means that for any ii we have ΩinΩi\Omega_{i}^{n}\to\Omega_{i}, and this convergence, which is the same as the convergence of all the vertices, preserves the areas, the convexity, the upper bound on the number of sides, the fact of being a partition… The matrices associated to these partitions depend continuously on these sets (with respect to this convergence, since they actually depend on the positions of the vertices). Notice that it is possible that a side reduces its length along the sequence up to becoming a single point in the limit. Yet, two cells which share a boundary along the sequence will do it along the whole sequence (thanks to our choice of not changing the topology) and at the limit, either they share a side as well, or they share a point only, but in this case the terms li,jl_{i,j} converged to zero. Hence we can associate to all the partitions their matrices and this correspondence is continuous. We have a sequence of matrices MnMM_{n}\to M and let us suppose that some eigenvalue λ1(n)\lambda_{1}^{(n)} goes to zero. This would imply that the matrix MM is associated to a partition but has a zero eigenvalue. This is not possible, thanks to the proof of lemma 3.3 above. MM is associated to a partition and hence satisfies assumptions (H1)(H1) and (H3)(H3). To check (H2)(H2) we observe that the column that we remove, the first one, is associated to the first cell and, up to some rescaling but bounded factor Aε(y1yj)|A_{\varepsilon}(y_{1}-y_{j})|, its entries are the lengths of its sides. Yet, this cell conserves the area bounds we had on the sequence and its entry cannot be all zero. ∎

From the previous results on the form and the regularity of the derivatives of pΦ\nabla_{p}\Phi, we deduce from the Cauchy-Lipschitz Theorem that the ODE (3.3) governing the evolution of the dual variables is well posed and actually characterizes the optimal prices:

Let p(ε)p(\varepsilon) be the solution of the dual problem (3.2) (recall the normalization p1(ε)=0p_{1}(\varepsilon)=0), then it is the only solution of the ODE:

with initial condition p(0)p(0) such that all the horizontal strips C(p(0),0)iC(p(0),0)_{i} have area 1/N1/N.

Numerical results

The algorithm we propose consists simply in discretizing (3.12) (together with the initial condition p(0)p(0) determined as in Theorem 3.5) by an explicit Euler scheme. Let nn be some positive integer and h:=n1h:=n^{-1} be the step size. Let us set p0=p(0)p_{0}=p(0) and define prices inductively as follows.

While (pk,kh)(p_{k},kh) belongs to the open set O{\mathcal{O}} defined by (3.4), compute:

Note that computing AkA_{k} and δk\delta_{k} by formulas (3.9)-(3.10) and (3.11), requires to construct the cells C(kh,pk)iC(kh,p_{k})_{i}.

Solve the linear system Akz=δkA_{k}z=\delta_{k}; by taking advantage of AkA_{k} being positive definite, we use the conjugate gradient algorithm for minimizing Jk(z)=Ak(z)(z)2δkzJ_{k}(z)=A_{k}(z)(z)-2\delta_{k}\cdot z to solve this system exactly in N1N-1 steps. We denote by zkz_{k} the solution.

Thanks to the Lipschitz properties established in section 3, it is easy to check that for hh small enough, (pk,kh)(p_{k},kh) always remain in O{\mathcal{O}} and then pkp_{k} is well-defined for every kk up to nn. For such an hh and since (3.12) is generated by a Lipschitz function on O{\mathcal{O}}, it is well-known that the convergence of the Euler scheme is linear (see for instance ). Denoting by php^{h} the piecewise constant function having values pkp_{k} on intervals [kh,(k+1)h)[kh,(k+1)h), we thus get the following convergence:

For hh small enough, the algorithm above is well-defined and the uniform error between php^{h} and the optimal price pp is O(h)O(h).

2 Numerical experiments

The construction of the cells at each step is achieved efficiently by an implementation in Matlab. In the setting described above where d=2d=2, Ω=[0,1]2\Omega=\left[0,1\right]^{2}, μ\mu is the uniform distribution on Ω\Omega, ν=1Nk=1Nδyk\nu=\frac{1}{N}\sum_{k=1}^{N}\delta_{y_{k}}, our algorithm computes the cells Ωkε={xΩ:Tε(x)=yk}\Omega_{k}^{\varepsilon}=\left\{x\in\Omega:T_{\varepsilon}\left(x\right)=y_{k}\right\} as well as the prices pkεp_{k}^{\varepsilon} of the cell yky_{k} for ε=0\varepsilon=0 to ε=1\varepsilon=1. For ε=0\varepsilon=0 which is the case where the transportation plan is the Knothe one, the computational task amounts to sorting the second component of the yky_{k}’s. Appropriate discretization steps are then chosen for the transition ε=0\varepsilon=0 to ε=1\varepsilon=1. At each step, the tesselation of Ω\Omega into the polyhedral cells Ωkε\Omega_{k}^{\varepsilon} is computed based on the prices pkεp_{k}^{\varepsilon}. Adjacency information on these cells is computed, as well as the length of the facet between two cells and the coordinates of its extreme points. This information allows one to formulate a discretized version of ODE (3.12) using an Euler discretization scheme.

For geometric computations we use the Multi-Parametric Toolbox library, available online at http://control.ee.ethz.ch/ mpt. In particular, polytope computes the tesselation of Ω\Omega into the polyhedral cells Ωkε\Omega_{k}^{\varepsilon}. (A slighlty modified version of) mpt_buildAdjacency extracts adjacency information on this tesselation, from which the vertices and the lengths of the sides between two cells can be deduced. The library also incorporates convenient graphical routines.

Error analysis Some numerical examples are presented below, for which relative errors in cell areas (i.e. deviation from the optimality conditions) are given as well as a comparison between the tesselations obtained with our method and the true solution. Another way to test our method is as follows. Let us consider the set

We give three instances of executions of our algorithm, with samples of respectively 5, 10, and 15 points. Taking weights (ε,ε1)(\varepsilon,\varepsilon^{-1}) with ε(0,+)\varepsilon\in(0,+\infty) (rather than (ε,1)(\varepsilon,1) with ε(0,1)\varepsilon\in(0,1)) we get the full evolution of the optimal transports from one Knothe’s transport (horizontal strips) to the other (vertical strips). Further examples as well as videos can be found at http://alfred.galichon.googlepages.com/anisotropic. It should also be pointed that our method does not approximate the solution of a single optimal transportation problem but a whole family of such problems (which actually explains relatively high running times).

Five sample points. We take as our sample set a sample of five points. We get the following errors:

for which we draw in Figure 1 the partition obtained for ε=1\varepsilon=1 using an exact method, as well as the true evolutions of the componentwise correlations of the xx’s and the yy’s from ε=0\varepsilon=0 until ε=1\varepsilon=1.

Running the continuation algorithm with 100 and 500 discretization steps we obtained partition sketched below, where the relative error on the cells area is inferior to 5%. The evolution of the componentwise correlations from ε=0\varepsilon=0 until ε=1\varepsilon=1 are also sketched and compared with their exact conterparts (the above curve). Running the algorithm with 500 discretization steps on a standard laptop took 349 seconds, and yields to the results below, where the maximal relative error on the cells area is 1%. The evolution of the componentwise correlations from ε=0\varepsilon=0 until ε=1\varepsilon=1 are also sketched and compared with their exact counterparts.

Ten sample points. Taking a sample of ten points, we obtain the following errors:

We draw in Figure 3 the partition obtained for ε=1\varepsilon=1 using an exact method, as well as the true evolutions of the componentwise correlations of the xx’s and the yy’s from ε=0\varepsilon=0 until ε=1\varepsilon=1.

Fifteen sample points. With a sample of 15 points, we get the following errors:

Acknowledgements G.C. and F.S. gratefully acknowledge the support of the Agence Nationale de la Recherche via the research project OTARIE. A.G. gratefully acknowledge the support of Chaire EDF-Calyon “Finance et Développement Durable,” of Chaire AXA “Assurance et risques majeurs,” and of Chaire Société Générale “Risques Financiers”. The authors wish to warmly thank Yann Brenier and Alessio Figalli for stimulating discussions.

References