Representation formula for the entropy and functional inequalities

Joseph Lehec

Introduction: Borell’s formula

where the supremum is taken over all random processes $u$, say bounded and adapted to the Brownian filtration. Among other applications, he easily derives the Prékopa-Leindler inequality. The name Borell’s formula may be unfair to Boué and Dupuis, who in an earlier paper obtained a stronger result, allowing the function $f$ to depend on the whole path $(B_{t})_{t\in[0,1]}$ (see Theorem 9 below for a precise statement). In any case, Borell and Boué-Dupuis agree that representation formulas such as (1) arose much earlier in optimal control theory, particularly in Fleming and Soner’s work, and Borell should definitely be credited for bringing these techniques into the context of functional inequalities.

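The present article deals with relative entropy. Let $(\Omega,\mathcal{A},m)$ be a measure space and $\mu$ a probability measure on $(\Omega,\mathcal{A})$. The relative entropy of $\mu$ with respect to $m$ is defined by
$$H(\mu\mid m)=\int_{\Omega}\log\Bigl(\frac{d\mu}{dm}\Bigr)\,d\mu$$
if $\mu$ is absolutely continuous with respect to $m$, and $H(\mu\mid m)=+\infty$ otherwise.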
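On a finite set the relative entropy reduces to a sum, which makes its conventions easy to see: points not charged by $\mu$ contribute nothing ($0\log 0=0$), and the value is $+\infty$ as soon as $\mu$ charges a point that $m$ does not. A minimal Python sketch, assuming $\Omega$ is finite and $m$ is given by nonnegative weights; the function name `relative_entropy` is ours, chosen for illustration.

```python
import math


def relative_entropy(mu, m):
    """H(mu | m) = sum_x mu(x) log(mu(x) / m(x)) on a finite set.

    mu is a probability vector and m a vector of nonnegative weights (same length).
    Returns math.inf when mu is not absolutely continuous with respect to m.
    """
    total = 0.0
    for p, q in zip(mu, m):
        if p == 0.0:
            continue  # convention 0 log 0 = 0: points not charged by mu do not contribute
        if q == 0.0:
            return math.inf  # mu charges a point that m does not
        total += p * math.log(p / q)
    return total


print(relative_entropy([1.0, 0.0], [0.5, 0.5]))  # log 2
print(relative_entropy([0.5, 0.5], [1.0, 0.0]))  # inf
```

When $m$ is itself a probability vector the result is nonnegative and vanishes only for $\mu=m$; for a general measure $m$ it can be negative.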

Representation formula for the entropy

We shall repeatedly use Girsanov’s formula; see [19, Chapter 6].

Let $B$ be a Brownian motion defined on some filtered probability space $(\Omega,\mathcal{A},\mathsf{P},\mathcal{F})$ and let $U$ be a drift. Letting $\mu$ be the law of $B+U$, we have

is a uniformly integrable martingale and Girsanov’s formula applies. Under

the process $X:=B+U$ is a Brownian motion. Therefore $X$ has law $\mu$ under $\mathsf{P}$ and law $\gamma$ under $\mathsf{Q}$. Then by (3)

which concludes the proof when $\lVert U\rVert$ is bounded. In the general case, define the stopping time

It follows immediately that when $\mathsf{E}\lVert U\rVert^{2}<+\infty$, the law of $B+U$ is absolutely continuous with respect to the Wiener measure $\gamma$. Let us point out that this is actually true for all drifts $U$, even if $\mathsf{E}\lVert U\rVert^{2}=+\infty$; see [19, Chapter 7].
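As a sanity check of the bound by $\frac12\mathsf{E}\lVert U\rVert^{2}$, consider the hypothetical deterministic drift $U_t=at$ in dimension one, for which we take $\lVert U\rVert^{2}=\int_0^1\lvert\dot U_t\rvert^{2}\,dt=a^{2}$ (our reading of the norm used here). Girsanov's formula gives the explicit density $\frac{d\mu}{d\gamma}(w)=\exp(aw_1-a^{2}/2)$, so the relative entropy of $\mu$ with respect to $\gamma$, namely $\mathsf{E}\log\frac{d\mu}{d\gamma}(B+U)$, can be estimated by Monte Carlo from the endpoint alone. For this deterministic drift it equals $\frac12a^{2}$, so the bound is attained. The function name below is hypothetical.

```python
import numpy as np


def entropy_constant_drift(a, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the relative entropy of mu = law of B + U w.r.t. Wiener measure,
    for the deterministic drift U_t = a t (dimension one).

    By Girsanov, d mu / d gamma (w) = exp(a w_1 - a^2 / 2), so only the endpoint matters.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n_samples) + a  # endpoint X_1 = B_1 + U_1 under P
    log_density = a * x1 - 0.5 * a**2        # log (d mu / d gamma) evaluated at X
    return float(log_density.mean())         # entropy = E^mu log density


a = 1.5
# The estimate should be close to 0.5 * a**2 = 1.125.
print(entropy_constant_drift(a), 0.5 * a**2)
```

The inequality can only be strict for genuinely random drifts; for deterministic ones the Girsanov density is explicit and the computation above is exact up to Monte Carlo error.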

2 Föllmer’s drift

The process $y=x-U$ is a Brownian motion.

Throughout, $\mathsf{E}^{\gamma}$ and $\mathsf{E}^{\mu}$ denote expectations with respect to $\gamma$ and $\mu$, respectively. On $\mathcal{G}_{t}$ the measure $\mu$ has density

with respect to $\gamma$. A standard martingale argument shows that

Since Brownian martingales can be represented as stochastic integrals, there exists an adapted process $v$ satisfying

which is the first assertion of the theorem. The second assertion follows from Girsanov’s formula; see [19, Theorem 6.2]. Under $\mu$, we have

Applying Itô’s formula (recall that $F$ is positive and $y$ is a Brownian motion under $\mu$), we obtain

If $\mathsf{E}^{\mu}\lVert U\rVert^{2}<+\infty$, the local martingale part in the equation above is integrable and has mean $0$, so that

Again, a localization argument shows that this equality remains valid when $\mathsf{E}^{\mu}\lVert U\rVert^{2}=+\infty$; see [15, Lemma (2.6)]. ∎
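To see the construction on the simplest example, take (hypothetically) $\mu$ with density $F(w)=\exp(a\cdot w_1-\lvert a\rvert^{2}/2)$ with respect to $\gamma$, for a fixed vector $a$. Assuming, as the proof suggests, that the density of $\mu$ on $\mathcal{G}_t$ is $F_t=\mathsf{E}^{\gamma}[F\mid\mathcal{G}_t]$, that $dF_t=v_t\cdot dw_t$, and that the drift is $u_t=v_t/F_t$, the Föllmer drift turns out to be constant and the entropy identity can be checked by hand:

```latex
\begin{aligned}
F_t &= \mathsf{E}^{\gamma}[F \mid \mathcal{G}_t]
     = \exp\bigl(a\cdot w_t - \tfrac12\lvert a\rvert^{2} t\bigr),
  \qquad dF_t = F_t\, a\cdot dw_t,\\
v_t &= a\,F_t, \qquad u_t = v_t / F_t = a,\\
\tfrac12\,\mathsf{E}^{\mu}\!\int_0^1 \lvert u_t\rvert^{2}\,dt
   &= \tfrac12\lvert a\rvert^{2}
    = \mathsf{E}^{\mu}[a\cdot w_1] - \tfrac12\lvert a\rvert^{2}
    = \mathsf{E}^{\mu}\log F .
\end{aligned}
```

The middle equality uses that, under $\mu$, $w_t=y_t+at$ with $y$ a Brownian motion, so $\mathsf{E}^{\mu}[a\cdot w_1]=\lvert a\rvert^{2}$.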

3 Optimal drift in a strong sense

Still, the question remains whether, given a probability space, a filtration and a Brownian motion, there exists a drift achieving equality in (4). In this section, we show that this is indeed the case, under some restriction on the measure $\mu$. The approach is taken from the article in which Baudoin treats the case of Brownian bridges (see subsection 2.5 below). We refer to for background on stochastic differential equations.

has the pathwise uniqueness property, then it has a unique strong solution. This solution $X$ satisfies the following.

The relative entropy of $\mu$ is given by

where $y$ is a Brownian motion. Therefore (7) has a weak solution. By Yamada and Watanabe’s theorem, if pathwise uniqueness holds then (7) has a unique strong solution. Moreover, since pathwise uniqueness implies uniqueness in law, the solution $X$ has law $\mu$. The rest of Theorem 4 concerns the law of $X$, so it is contained in Theorem 2. ∎

We end this section by showing that for a reasonably large class of measures $\mu$, the stochastic differential equation (7) does satisfy the pathwise uniqueness property.

There exists $\epsilon>0$ such that $\Phi\geq\epsilon$.

If $\mu$ belongs to $\mathcal{S}$ then equation (7) has the pathwise uniqueness property.

where $\nabla_{i}\Phi$ is the gradient of $\Phi$ in the $i$-th variable. By Lemma 3, the process associated with $\mu$ is

It is enough to prove that there is a constant $C$ such that

where $\Psi(x_{1},\dotsc,x_{k},x)$ equals

To sum up, we have the following representation formula.

where the minimum is taken over all drifts $U$ such that $B+U$ has law $\mu$.

4 The Boué and Dupuis formula

In this subsection, the previous results are translated in terms of log-Laplace transforms, using the following lemma.

where the supremum is taken over all drifts $U$.
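The proof below rests on the entropy/log-Laplace duality, which we take to be the standard Gibbs variational principle $\log\int e^{f}\,dm=\sup_{\mu}\bigl\{\int f\,d\mu-H(\mu\mid m)\bigr\}$ for a probability measure $m$, with the supremum attained at $d\mu\propto e^{f}\,dm$. A minimal finite-dimensional sketch checks both facts on a hypothetical five-point space with randomly drawn $m$ and $f$; the function names are ours.

```python
import numpy as np


def log_laplace(f, m):
    """log sum_x m(x) exp(f(x)) for a reference probability vector m (stable form)."""
    f = np.asarray(f, dtype=float)
    m = np.asarray(m, dtype=float)
    c = f.max()
    return float(c + np.log(np.sum(m * np.exp(f - c))))


def variational_value(f, mu, m):
    """E^mu f - H(mu | m), for probability vectors mu, m with full support."""
    f = np.asarray(f, dtype=float)
    mu = np.asarray(mu, dtype=float)
    m = np.asarray(m, dtype=float)
    return float(np.sum(mu * f) - np.sum(mu * np.log(mu / m)))


rng = np.random.default_rng(1)
m = rng.dirichlet(np.ones(5))  # hypothetical reference probability on 5 points
f = rng.normal(size=5)         # hypothetical test function

# Gibbs measure: its density with respect to m is proportional to exp(f); it attains the supremum.
gibbs = m * np.exp(f)
gibbs /= gibbs.sum()
print(log_laplace(f, m), variational_value(f, gibbs, m))

# Any other probability vector gives a value below log E^m exp(f).
others = [variational_value(f, rng.dirichlet(np.ones(5)), m) for _ in range(1000)]
print(max(others) <= log_laplace(f, m) + 1e-12)
```

At the Gibbs measure, $\log\frac{d\mu}{dm}=f-\log\int e^{f}\,dm$, so the two terms combine exactly into the log-Laplace transform; this is the identity the first `print` displays numerically.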

Let $U$ be a drift and $\mu$ the law of $B+U$. By Proposition 1 and the entropy/log-Laplace duality

On the other hand, given $\epsilon>0$, there exists a probability measure $\mu\in\mathcal{S}$ satisfying (10). Since $\mu\in\mathcal{S}$, Theorem 7 asserts that there exists a drift $U$ such that $B+U$ has law $\mu$ and

5 Brownian bridges

Let $\nu$ have density $\rho$ with respect to $\gamma_{d}$. Then

where the infimum is taken over all probability measures $\mu$ satisfying $\mu\circ(x_{1})^{-1}=\nu$. The infimum is attained when $\mu$ is the bridge (11).

By Lemma 3, the Föllmer process of the bridge $\mu$ is such that

Under $\mu$, the process $(u_{t})_{t\in[0,1]}$ is a martingale. In particular

Now assume that $\rho$ and $\nabla\rho$ are Lipschitz and that $\rho\geq\epsilon$, so that the bridge $\mu$ belongs to $\mathcal{S}$. It is easily seen that $u_{t}$ can also be written as

The stochastic differential equation (7) becomes

By Lemma 6, there is a unique strong solution. Combining Lemma 10 with Theorem 4 we obtain the following dual formulation of Borell’s result (1).

where the infimum is taken over all drifts $U$ such that $B_{1}+U_{1}$ has law $\nu$. The infimum is attained by the drift

where $X$ is the unique solution of (12).
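A numerical sketch of the bridge construction in the simplest case we could solve in closed form: $d=1$ and the hypothetical target $\nu=N(m,\sigma^{2})$. For a mixture of Brownian bridges the drift at time $t$ is $(\mathsf{E}[X_1\mid X_t=x]-x)/(1-t)$, which for this $\nu$ simplifies to $u_t(x)=\frac{m+(\sigma^{2}-1)x}{1+t(\sigma^{2}-1)}$. This $\nu$ lies outside the Lipschitz assumptions on $\rho$ made above (its density grows when $\sigma>1$), but the drift is affine, so the equation is still well posed. The Euler-Maruyama scheme below should produce $X_1\approx N(m,\sigma^{2})$, and $\frac12\mathsf{E}\int_0^1u_t^{2}\,dt$ should match the closed form $H(N(m,\sigma^{2})\mid N(0,1))=\frac12(\sigma^{2}+m^{2}-1-\log\sigma^{2})$, in line with the entropy identity for the optimal drift. Function names are hypothetical.

```python
import numpy as np


def gaussian_bridge_drift(t, x, m, sigma):
    """Drift u_t(x) of the bridge whose time-1 marginal is N(m, sigma^2), in dimension one.

    Obtained from u_t(x) = (E[X_1 | X_t = x] - x) / (1 - t) for a mixture of Brownian bridges.
    """
    a = sigma**2 - 1.0
    return (m + a * x) / (1.0 + a * t)


def simulate(m, sigma, n_paths=20_000, n_steps=500, seed=0):
    """Euler-Maruyama for dX_t = u_t(X_t) dt + dB_t, X_0 = 0, on [0, 1].

    Returns the endpoints X_1 and a Monte Carlo estimate of (1/2) E int_0^1 u_t^2 dt.
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = np.zeros(n_paths)
    cost = np.zeros(n_paths)  # left-point Riemann sum of u_t^2 along each path
    for k in range(n_steps):
        u = gaussian_bridge_drift(k * dt, x, m, sigma)
        cost += u**2 * dt
        x += u * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
    return x, 0.5 * cost.mean()


m, sigma = 1.0, 2.0
x1, half_energy = simulate(m, sigma)
kl = 0.5 * (sigma**2 + m**2 - 1.0 - np.log(sigma**2))  # H(N(m, sigma^2) | N(0, 1))
print(x1.mean(), x1.var(), half_energy, kl)
```

A direct computation confirms the match: under the bridge, $\mathsf{E}u_t^{2}=m^{2}+\frac{(\sigma^{2}-1)^{2}t}{1+t(\sigma^{2}-1)}$, whose integral over $[0,1]$ is $m^{2}+\sigma^{2}-1-\log\sigma^{2}$. For $\sigma=1$ the drift is the constant $m$.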

Applications

Following Borell, we now derive functional inequalities from the representation formula. Let us point out that in all but one of the applications we use Proposition 1 and Theorem 2 rather than Theorem 7.

Here is a short proof based on Theorem 2. To be fair, Feyel and Üstünel have a very similar argument.

Let us point out that Talagrand’s inequality can easily be recovered from this theorem by applying it to a Brownian bridge. Details are left to the reader.
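The finite-dimensional inequality recovered this way can be sanity-checked on one-dimensional Gaussians, where everything is explicit. We assume Talagrand's inequality in its usual form $W_2^{2}(\nu,\gamma_1)\le 2H(\nu\mid\gamma_1)$ and use the standard closed forms $W_2^{2}(N(m,\sigma^{2}),N(0,1))=m^{2}+(\sigma-1)^{2}$ and $H(N(m,\sigma^{2})\mid N(0,1))=\frac12(\sigma^{2}+m^{2}-1-\log\sigma^{2})$. This only tests the inequality on examples; it is not a proof, and the function names are ours.

```python
import math


def w2_squared_to_std_gaussian(m, sigma):
    """Squared Wasserstein-2 distance between N(m, sigma^2) and N(0, 1) on the real line."""
    return m**2 + (sigma - 1.0) ** 2


def entropy_to_std_gaussian(m, sigma):
    """Relative entropy H(N(m, sigma^2) | N(0, 1))."""
    return 0.5 * (sigma**2 + m**2 - 1.0 - math.log(sigma**2))


# Talagrand: W_2^2(nu, gamma_1) <= 2 H(nu | gamma_1); check it on a grid of Gaussians.
for m in (-2.0, 0.0, 0.5, 3.0):
    for sigma in (0.1, 0.5, 1.0, 2.0, 10.0):
        gap = 2.0 * entropy_to_std_gaussian(m, sigma) - w2_squared_to_std_gaussian(m, sigma)
        # The gap equals 2 (sigma - 1 - log sigma) >= 0, whatever the mean m.
        assert gap >= -1e-12
        assert math.isclose(gap, 2.0 * (sigma - 1.0 - math.log(sigma)), abs_tol=1e-9)
```

Within this family, equality holds exactly when $\sigma=1$, that is, for translates of $\gamma_1$, which matches the constant-drift case where the coupling of $B$ and $B+U$ is a pure shift.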

2 Logarithmic Sobolev inequality

3 Shannon’s inequality

This inequality plays a central role in information theory, see for an overview on the topic.

Let $\nu_{\theta}$ be the law of $\cos(\theta)\eta+\sin(\theta)\xi$. By Theorem 2, Lemma 10 and Lemma 11, there exist a Brownian motion $X$ and a drift $U$ such that

$\mathsf{E}(U)=\mathsf{E}(\eta)\ \mathbf{1}_{}$.

Similarly, there exist a Brownian motion $Y$ and a drift $V$ satisfying the corresponding properties for $\nu_{\pi/2}$. Moreover, we can clearly assume that $Y$ is independent of $X$. Then $\cos(\theta)X+\sin(\theta)Y$ is a Brownian motion and

has law $\nu_{\theta}$. By Proposition 1 and Lemma 10,

This is easily seen to be equivalent to (15). ∎

4 Brascamp-Lieb inequality

Let us focus on a family of inequalities dating back to Brascamp and Lieb’s article on optimal constants in Young’s inequality. Since then a number of nice alternate proofs have been discovered, see and the survey article . This subsection is inspired by the (unpublished) proof of Maurey relying on Borell’s formula.

Let $x\in E$. We then have $\lvert x\rvert^{2}=\bigl(\sum c_{i}P_{i}x\bigr)\cdot x$ and, since $P_{i}$ is an orthogonal projection,

where $\mu_{i}=\mu\circ P_{i}^{-1}$ is the push-forward of $\mu$ by the projection $P_{i}$.

According to Theorem 2, there exist a standard Brownian motion $B$ on $E$ and a drift $U$ such that $B+U$ has law $\mu$ and

Since $P_{i}$ is an orthogonal projection, the process $P_{i}B$ is a standard Brownian motion on $E_{i}$. Also $P_{i}B+P_{i}U$ has law $\mu\circ P_{i}^{-1}=\mu_{i}$. By Proposition 1,

On the other hand, the frame condition (16) easily implies that

pointwise. Taking expectation yields the result. ∎
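The frame identity and the super-additivity statement can be sanity-checked in a finite-dimensional Gaussian setting. This is only a time-1 analogue, with entropies relative to the standard Gaussian on $E$ and on each $E_i$, not the path-space statement. We assume the frame condition reads $\sum c_iP_i=\mathrm{Id}_E$ (as used later in the text), and the frame itself (three unit vectors at 120 degrees in $\mathbb{R}^2$ with $c_i=2/3$) is a hypothetical example. For Gaussian $\mu$ the claim reduces to $\log\det\Sigma\le\sum c_i\log(v_i^{\top}\Sigma v_i)$.

```python
import numpy as np

# Hypothetical frame in E = R^2: three unit vectors at 120 degrees, weights c_i = 2/3,
# E_i = span(v_i), P_i = v_i v_i^T (orthogonal projection onto E_i).
angles = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
vs = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (3, 2)
c = np.full(3, 2.0 / 3.0)
projections = [np.outer(v, v) for v in vs]

# Frame condition: sum_i c_i P_i = Id_E, hence |x|^2 = sum_i c_i |P_i x|^2.
frame = sum(ci * P for ci, P in zip(c, projections))
assert np.allclose(frame, np.eye(2))


def gaussian_entropy(mean, cov):
    """H(N(mean, cov) | N(0, I)) = (tr cov + |mean|^2 - d - log det cov) / 2."""
    mean = np.atleast_1d(mean)
    cov = np.atleast_2d(cov)
    d = len(mean)
    return 0.5 * (np.trace(cov) + mean @ mean - d - np.log(np.linalg.det(cov)))


rng = np.random.default_rng(0)
for _ in range(1000):
    A = rng.normal(size=(2, 2))
    cov = A @ A.T + 0.1 * np.eye(2)  # random positive definite covariance
    mean = rng.normal(size=2)
    h = gaussian_entropy(mean, cov)
    # mu_i is the law of the coordinate v_i . X on E_i, a one-dimensional Gaussian.
    h_marginals = sum(ci * gaussian_entropy(v @ mean, v @ cov @ v) for ci, v in zip(c, vs))
    assert h_marginals <= h + 1e-10  # super-additivity: sum c_i H(mu_i) <= H(mu)
```

In this Gaussian case the trace, mean and dimension terms split exactly along the frame, so only the log-determinant term produces the inequality; it follows from the concavity of the logarithm applied to the spectral measure of $\Sigma$ in each direction $v_i$.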

As observed by Carlen and Cordero, this super-additivity property of the relative entropy is equivalent to the following Brascamp-Lieb inequality.

When the functions $F_{i}$ depend only on the point $w_{1}$ rather than on the whole path $w$, we recover the usual Brascamp-Lieb inequality for the Gaussian measure.

5 Reversed Brascamp-Lieb inequality

Again, $E$ is a Euclidean space and $E_{1},\dotsc,E_{m}$ are subspaces satisfying the frame condition (16). Observe that if $x_{1},\dotsc,x_{m}$ belong to $E_{1},\dotsc,E_{m}$ respectively, then for any $y\in E$, the Cauchy-Schwarz inequality and (17) yield

Given $m$ probability measures $\mu_{1},\dotsc,\mu_{m}$ belonging to $\mathcal{S}_{1},\dotsc,\mathcal{S}_{m}$ respectively, there exist $m$ processes $X_{1},\dotsc,X_{m}$ (defined on the same probability space) such that

$X_{i}$ has law $\mu_{i}$ for all $i=1,\dotsc,m$.

Letting $\mu$ be the law of $\sum c_{i}X_{i}$, we have

Again let $B$ be a standard Brownian motion on $E$. For $i=1,\dotsc,m$, the process $P_{i}B$ is a standard Brownian motion on $E_{i}$. Since $\mu_{i}\in\mathcal{S}_{i}$, there exists a drift $U_{i}$ such that the process $X_{i}=P_{i}B+U_{i}$ has law $\mu_{i}$ and

Let $X=\sum c_{i}X_{i}$ and let $\mu$ be the law of $X$. Since $\sum c_{i}P_{i}$ is the identity of $E$,

On the other hand (18) easily implies that

pointwise. Taking expectation we get the result. ∎

This sub-additivity property of the entropy is a multi-marginal version of the displacement convexity property put forward by Sturm. By duality, we obtain the following reversed Brascamp-Lieb inequality.

By Lemma 8, for every $i$, there exists a measure $\mu_{i}\in\mathcal{S}_{i}$ such that

Let $X_{1},\dotsc,X_{m}$ be the random processes given by the previous theorem, let $X=\sum c_{i}X_{i}$ and let $\mu$ be the law of $X$. Then by duality and the hypothesis (19), we get

Letting $\epsilon$ tend to $0$ yields the result. ∎

Again, when the functions depend only on the value of the path at time $1$, we recover the reversed Brascamp-Lieb inequality for the Gaussian measure, which is due to Barthe.

The author is grateful to Patrick Cattiaux and Massimiliano Gubinelli for communicating references and to Christian Léonard, Bernard Maurey and Patrick Cattiaux again for valuable discussions.

References