Representation formula for the entropy and functional inequalities

Joseph Lehec

Introduction: Borell’s formula

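Let $B$ be a standard Brownian motion on $\mathbb{R}^{n}$ and let $f\colon\mathbb{R}^{n}\to\mathbb{R}$ be measurable and bounded. Borell established the representation formula
$$\log\mathsf{E}\,e^{f(B_{1})}=\sup_{u}\,\mathsf{E}\Bigl[f\Bigl(B_{1}+\int_{0}^{1}u_{s}\,ds\Bigr)-\frac{1}{2}\int_{0}^{1}\lvert u_{s}\rvert^{2}\,ds\Bigr],\qquad(1)$$
where the supremum is taken over all random processes $u$, say bounded and adapted to the Brownian filtration. Among other applications, he easily derives the Prékopa-Leindler inequality. The name Borell’s formula may be unfair to Boué and Dupuis, who in an earlier paper obtained a stronger result allowing the function $f$ to depend on the whole path $(B_{t})_{t\in[0,1]}$ (see Theorem 9 below for a precise statement). In any case, Borell and Boué-Dupuis agree that representation formulas such as (1) arose much earlier in optimal control theory, particularly in Fleming and Soner’s work, and Borell should definitely be credited for bringing these techniques into the context of functional inequalities.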

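The present article deals with relative entropy. Let $(\Omega,\mathcal{A},m)$ be a measure space and let $\mu$ be a probability measure on $(\Omega,\mathcal{A})$. The relative entropy of $\mu$ with respect to $m$ is defined by
$$H(\mu\mid m)=\int_{\Omega}\log\Bigl(\frac{d\mu}{dm}\Bigr)\,d\mu$$
if $\mu$ is absolutely continuous with respect to $m$, and $H(\mu\mid m)=+\infty$ otherwise.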

Representation formula for the entropy

We shall repeatedly use Girsanov’s formula; see [19, Chapter 6].

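Let $B$ be a Brownian motion defined on some filtered probability space $(\Omega,\mathcal{A},\mathsf{P},\mathcal{F})$ and let $U$ be a drift. Letting $\mu$ be the law of $B+U$, its relative entropy with respect to the Wiener measure $\gamma$ satisfies
$$H(\mu\mid\gamma)\leq\frac{1}{2}\,\mathsf{E}\lVert U\rVert^{2}.$$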

is a uniformly integrable martingale and Girsanov’s formula applies. Under

the process $X:=B+U$ is a Brownian motion. Therefore $X$ has law $\mu$ and $\gamma$ under $\mathsf{P}$ and $\mathsf{Q}$ , respectively. Then by (3)

which concludes the proof when $\lVert U\rVert$ is bounded. In the general case, define the stopping time

It follows immediately that when $\mathop{\mathsf{E}{}}\nolimits\lVert U\rVert^{2}<+\infty$ , the law of $B+U$ is absolutely continuous with respect to the Wiener measure $\gamma$ . Let us point out that this is actually true for all drifts $U$ , even if $\mathop{\mathsf{E}{}}\nolimits\lVert U\rVert^{2}=+\infty$ , see [19, chapter 7].
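A minimal numerical illustration of Proposition 1, under assumptions that are ours rather than the article's: the filtration is enlarged at time $0$ by a random variable $\xi\sim N(0,\sigma^{2})$ independent of $B$, and the drift is $U_{t}=t\xi$, so that $\lVert U\rVert^{2}=\xi^{2}$ and the bound reads $H(\mu\mid\gamma)\leq\frac12\sigma^{2}$. Since $X_{t}-tX_{1}=B_{t}-tB_{1}$ is a Brownian bridge independent of $X_{1}$, both under the law of $X=B+U$ and under $\gamma$, the path-space entropy reduces to the entropy of $X_{1}\sim N(0,1+\sigma^{2})$ relative to $N(0,1)$, namely $\frac12(\sigma^{2}-\log(1+\sigma^{2}))$. The helper names below are hypothetical.

```python
import math


def gaussian_entropy_vs_std(var: float) -> float:
    """Relative entropy of N(0, var) with respect to N(0, 1) on the real line."""
    return 0.5 * (var - 1.0 - math.log(var))


def randomized_drift_sides(sigma: float) -> tuple:
    """Both sides of Proposition 1 for the drift U_t = t * xi, xi ~ N(0, sigma^2).

    Assumes xi is F_0-measurable and independent of B. The law of B + U then has
    the same relative entropy as its time-1 marginal N(0, 1 + sigma^2), because
    the bridge part X_t - t X_1 = B_t - t B_1 is independent of X_1.
    """
    entropy = gaussian_entropy_vs_std(1.0 + sigma**2)
    half_energy = 0.5 * sigma**2  # (1/2) E||U||^2 = (1/2) E[xi^2]
    return entropy, half_energy


if __name__ == "__main__":
    for sigma in (0.1, 0.5, 1.0, 2.0):
        h, e = randomized_drift_sides(sigma)
        print(f"sigma={sigma}: H = {h:.4f} < (1/2) E||U||^2 = {e:.4f}")
```

The inequality is strict here: the drift spends energy on the extra randomness $\xi$, which does not reduce to an optimal use of the Brownian filtration.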

2 Föllmer’s drift

Under $\mu$, the process $y=x-U$ is a Brownian motion.

Throughout, $\mathsf{E}^{\gamma}$ and $\mathsf{E}^{\mu}$ denote expectations with respect to $\gamma$ and $\mu$, respectively. On $\mathcal{G}_{t}$, the measure $\mu$ has density

with respect to $\gamma$ . A standard martingale argument shows that

Since Brownian martingales can be represented as stochastic integrals, there exists an adapted process $v$ satisfying

which is the first assertion of the theorem. The second assertion follows from Girsanov’s formula; see [19, Theorem 6.2]. Under $\mu$, we have

Applying Itô’s formula (recall that $F$ is positive and $y$ is a Brownian motion under $\mu$ ) we obtain

If $\mathsf{E}^{\mu}\lVert U\rVert^{2}<+\infty$, the local martingale part in the equation above is integrable and has mean $0$, so that

Again, a localization argument shows that this equality remains valid when $\mathop{\mathsf{E}{}}\nolimits^{\mu}\lVert U\rVert^{2}=+\infty$ , see [15, Lemma (2.6)]. ∎
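For the equality case $H(\mu\mid\gamma)=\frac12\mathsf{E}^{\mu}\lVert U\rVert^{2}$ of Theorem 2, here is a sketch in the simplest explicit setting, again under assumptions that are ours: $\mu$ is the bridge whose time-$1$ marginal is $N(0,v)$, and the Föllmer drift is taken to be $u_{t}=\nabla\log P_{1-t}\rho(x_{t})$, where $P$ is the heat semigroup and $\rho$ the density of $N(0,v)$ with respect to $N(0,1)$. With $\beta=1-1/v$ this gives $u_{t}=\beta x_{t}/(1-\beta(1-t))$, and solving the variance equation under $\mu$ gives $\mathsf{E}^{\mu}x_{t}^{2}=t(1-\beta+\beta t)/(1-\beta)$. The code integrates $\frac12\mathsf{E}^{\mu}\lvert u_{t}\rvert^{2}$ over $[0,1]$ with Simpson's rule and compares it with $\frac12(v-1-\log v)$, the entropy of $N(0,v)$ relative to $N(0,1)$.

```python
import math


def gaussian_entropy_vs_std(var: float) -> float:
    """Relative entropy of N(0, var) with respect to N(0, 1)."""
    return 0.5 * (var - 1.0 - math.log(var))


def follmer_half_energy(var: float, n: int = 2000) -> float:
    """(1/2) int_0^1 E^mu|u_t|^2 dt for the (assumed) Föllmer drift towards N(0, var).

    With beta = 1 - 1/var, we use u_t = beta x_t / (1 - beta (1 - t)) and
    E^mu x_t^2 = t (1 - beta + beta t) / (1 - beta), hence
    E^mu|u_t|^2 = beta^2 t / ((1 - beta) (1 - beta + beta t)).
    The time integral uses composite Simpson's rule with n (even) subintervals.
    """
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    beta = 1.0 - 1.0 / var

    def drift_second_moment(t: float) -> float:
        return beta**2 * t / ((1.0 - beta) * (1.0 - beta + beta * t))

    h = 1.0 / n
    total = drift_second_moment(0.0) + drift_second_moment(1.0)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * drift_second_moment(k * h)
    return 0.5 * total * h / 3.0


if __name__ == "__main__":
    for var in (0.25, 0.5, 2.0, 5.0):
        print(var, follmer_half_energy(var), gaussian_entropy_vs_std(var))
```

Comparing with the previous illustration, the Föllmer drift towards $N(0,1+\sigma^{2})$ costs exactly $\frac12(\sigma^{2}-\log(1+\sigma^{2}))$, less than the $\frac12\sigma^{2}$ spent by the randomized drift with the same final law.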

3 Optimal drift in a strong sense

Still, it remains open whether, given a probability space, a filtration and a Brownian motion, there exists a drift achieving equality in (4). In this section, we show that this is indeed the case under some restrictions on the measure $\mu$. The approach is taken from the article in which Baudoin treats the case of Brownian bridges (see subsection 2.5 below). We refer to for the background on stochastic differential equations.

has the pathwise uniqueness property, then it has a unique strong solution. This solution $X$ satisfies the following.

The relative entropy of $\mu$ is given by

where $y$ is a Brownian motion. Therefore (7) has a weak solution. By Yamada and Watanabe’s theorem, if pathwise uniqueness holds then (7) has a unique strong solution. Moreover, since pathwise uniqueness implies uniqueness in law, the solution $X$ has law $\mu$ . The rest of Theorem 4 concerns the law of $X$ , so it is contained in Theorem 2. ∎

We end this section by showing that for a reasonably large class of measures $\mu$ , the stochastic differential equation (7) does satisfy the pathwise uniqueness property.

There exists $\epsilon>0$ such that $\Phi\geq\epsilon$ .

If $\mu$ belongs to $\mathcal{S}$ then the equation (7) has the pathwise uniqueness property.

where $\nabla_{i}\Phi$ is the gradient of $\Phi$ with respect to the $i$-th variable. By Lemma 3, the process associated with $\mu$ is

It is enough to prove that there is a constant $C$ such that

where $\Psi(x_{1},\dotsc,x_{k},x)$ equals

To sum up, we have the following representation formula.

where the minimum is taken over all drifts $U$ such that $B+U$ has law $\mu$.

4 The Boué and Dupuis formula

In this subsection, the previous results are translated into the language of log-Laplace transforms, using the following lemma.

where the supremum is taken over all drifts $U$ .
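The entropy/log-Laplace duality on which this subsection rests has a finite-state analogue that is easy to test: for a probability vector $p$ and a real vector $f$, $\log\sum_{k}p_{k}e^{f_{k}}=\max_{q}\bigl(\sum_{k}q_{k}f_{k}-H(q\mid p)\bigr)$, the maximum being attained at $q_{k}\propto p_{k}e^{f_{k}}$. The sketch below (plain Python, hypothetical function names) checks the equality at this maximizer and the inequality at random $q$; it illustrates the duality, not the path-space lemma of this subsection itself.

```python
import math
import random


def log_laplace(f, p):
    """log sum_k p_k exp(f_k), computed stably by factoring out max f."""
    m = max(f)
    return m + math.log(sum(pk * math.exp(fk - m) for fk, pk in zip(f, p)))


def relative_entropy(q, p):
    """H(q | p) = sum_k q_k log(q_k / p_k), with the convention 0 log 0 = 0."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p) if qk > 0)


def variational_value(f, q, p):
    """The functional q -> sum_k q_k f_k - H(q | p) maximized in the duality."""
    return sum(qk * fk for qk, fk in zip(q, f)) - relative_entropy(q, p)


def gibbs_measure(f, p):
    """The maximizer: q_k proportional to p_k exp(f_k)."""
    m = max(f)
    w = [pk * math.exp(fk - m) for fk, pk in zip(f, p)]
    s = sum(w)
    return [wk / s for wk in w]


if __name__ == "__main__":
    rng = random.Random(0)
    raw = [rng.random() + 0.1 for _ in range(6)]
    p = [r / sum(raw) for r in raw]
    f = [rng.uniform(-3.0, 3.0) for _ in range(6)]
    print(log_laplace(f, p), variational_value(f, gibbs_measure(f, p), p))
```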

Let $U$ be a drift and $\mu$ be the law of $B+U$. By Proposition 1 and the entropy/log-Laplace duality

On the other hand, given $\epsilon>0$ , there exists a probability measure $\mu\in\mathcal{S}$ satisfying (10). Since $\mu\in\mathcal{S}$ , Theorem 7 asserts that there exists a drift $U$ such that $B+U$ has law $\mu$ and satisfying

5 Brownian bridges

Let $\nu$ have density $\rho$ with respect to $\gamma_{d}$. Then

where the infimum is taken over all probability measures $\mu$ satisfying $\mu\circ(x_{1})^{-1}=\nu$. The infimum is attained when $\mu$ is the bridge (11).

By Lemma 3, the Föllmer process of the bridge $\mu$ is such that

Under $\mu$, the process $(u_{t})_{t\in[0,1]}$ is a martingale. In particular

Now assume that $\rho$ and $\nabla\rho$ are Lipschitz and that $\rho\geq\epsilon$ for some $\epsilon>0$, so that the bridge $\mu$ belongs to $\mathcal{S}$. It is easily seen that $u_{t}$ can also be written as

The stochastic differential equation (7) becomes

By Lemma 6, there is a unique strong solution. Combining Lemma 10 with Theorem 4 we obtain the following dual formulation of Borell’s result (1).

where the infimum is taken over all drifts $U$ such that $B_{1}+U_{1}$ has law $\nu$. The infimum is attained by the drift

where $X$ is the unique solution of (12).
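The following sketch simulates an equation of the form (12) in a case where everything is explicit. It assumes (this is not stated in the surviving text) that the Föllmer drift of the bridge is $u_{t}=\nabla\log P_{1-t}\rho(x_{t})$, with $P$ the heat semigroup. Take $d=1$ and $\nu=\frac12N(a,1)+\frac12N(-a,1)$, whose density with respect to $\gamma_{1}$ is $\rho(x)=e^{-a^{2}/2}\cosh(ax)$; then $P_{1-t}\rho(x)=e^{-a^{2}t/2}\cosh(ax)$ and the drift is $a\tanh(ax_{t})$. This $\rho$ is not Lipschitz, so the example lies outside the hypotheses above, but the drift itself is Lipschitz, which is enough for a unique strong solution. The code runs an Euler-Maruyama scheme for $dX_{t}=dB_{t}+a\tanh(aX_{t})\,dt$, $X_{0}=0$, and compares the Monte Carlo estimate of $\frac12\mathsf{E}\int_{0}^{1}\lvert u_{t}\rvert^{2}dt$ with $H(\nu\mid\gamma_{1})$ computed by quadrature; by Lemma 10 and Theorem 2 they should agree up to discretization and sampling error. Function names and parameters are hypothetical.

```python
import math
import random


def entropy_mixture(a: float, half_width: float = 12.0, n: int = 4000) -> float:
    """H(nu | gamma_1) for nu = (N(a, 1) + N(-a, 1)) / 2, by the trapezoidal rule.

    The density of nu with respect to gamma_1 is rho(x) = exp(-a^2 / 2) cosh(a x),
    so H(nu | gamma_1) is the integral of log rho against nu.
    """
    lo, hi = -half_width - a, half_width + a
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        x = lo + k * h
        density = 0.5 * (math.exp(-0.5 * (x - a) ** 2)
                         + math.exp(-0.5 * (x + a) ** 2)) / math.sqrt(2.0 * math.pi)
        y = abs(a * x)
        # log cosh(y) = y + log(1 + exp(-2y)) - log 2, stable for large y
        log_rho = y + math.log1p(math.exp(-2.0 * y)) - math.log(2.0) - 0.5 * a * a
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * density * log_rho
    return total * h


def simulate_follmer(a: float, n_paths: int = 4000, n_steps: int = 200, seed: int = 0):
    """Euler-Maruyama scheme for dX_t = dB_t + a tanh(a X_t) dt, X_0 = 0, on [0, 1].

    Returns Monte Carlo estimates of (1/2) E int_0^1 |u_t|^2 dt, where
    u_t = a tanh(a X_t), and of E[X_1^2] (which should be close to 1 + a^2).
    """
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    sqrt_dt = math.sqrt(dt)
    energy_sum = 0.0
    moment_sum = 0.0
    for _ in range(n_paths):
        x = 0.0
        path_energy = 0.0
        for _ in range(n_steps):
            u = a * math.tanh(a * x)
            path_energy += u * u * dt  # left-point rule for int |u_t|^2 dt
            x += u * dt + sqrt_dt * rng.gauss(0.0, 1.0)
        energy_sum += 0.5 * path_energy
        moment_sum += x * x
    return energy_sum / n_paths, moment_sum / n_paths


if __name__ == "__main__":
    a = 1.5
    half_energy, second_moment = simulate_follmer(a)
    print("H(nu | gamma_1)        ~", entropy_mixture(a))
    print("(1/2) E int |u_t|^2 dt ~", half_energy)
    print("E[X_1^2] vs 1 + a^2    ~", second_moment, 1.0 + a * a)
```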

Applications

Following Borell, we now derive functional inequalities from the representation formula. Let us point out that in all applications but one, we use Proposition 1 and Theorem 2 rather than Theorem 7.

Here is a short proof based on Theorem 2. To be fair, Feyel and Üstünel have a very similar argument.

Let us point out that Talagrand’s inequality can easily be recovered from this theorem by applying it to a Brownian bridge. Details are left to the reader.
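As a sanity check on the finite-dimensional statement, assume Talagrand's inequality is meant in its usual form $W_{2}(\nu,\gamma_{d})^{2}\leq 2H(\nu\mid\gamma_{d})$. For $d=1$ and $\nu=N(m,s^{2})$, both sides have standard closed forms, $W_{2}^{2}=m^{2}+(s-1)^{2}$ and $2H=m^{2}+s^{2}-1-2\log s$, so the gap is $2(s-1-\log s)\geq 0$. The snippet below (hypothetical helper names) evaluates both sides.

```python
import math


def w2_squared_to_std(m: float, s: float) -> float:
    """Squared quadratic Wasserstein distance between N(m, s^2) and N(0, 1) on R."""
    return m * m + (s - 1.0) ** 2


def entropy_to_std(m: float, s: float) -> float:
    """Relative entropy H(N(m, s^2) | N(0, 1))."""
    return 0.5 * (m * m + s * s - 1.0 - 2.0 * math.log(s))


if __name__ == "__main__":
    for m, s in [(0.0, 1.0), (1.0, 1.0), (0.0, 0.5), (2.0, 3.0)]:
        lhs, rhs = w2_squared_to_std(m, s), 2.0 * entropy_to_std(m, s)
        print(f"m={m}, s={s}: W2^2 = {lhs:.4f} <= 2H = {rhs:.4f}")
```

Equality holds exactly for translates of $\gamma_{1}$, that is, when $s=1$.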

2 Logarithmic Sobolev inequality

3 Shannon’s inequality

This inequality plays a central role in information theory, see for an overview on the topic.

Let $\nu_{\theta}$ be the law of $\cos(\theta)\eta+\sin(\theta)\xi$. By Theorem 2, Lemma 10 and Lemma 11, there exist a Brownian motion $X$ and a drift $U$ such that

$\mathsf{E}(U)=\mathsf{E}(\eta)\,\mathbf{1}_{[0,1]}$.

Similarly, there exist a Brownian motion $Y$ and a drift $V$ satisfying the corresponding properties for $\nu_{\pi/2}$. Moreover, we can clearly assume that $Y$ is independent of $X$. Then $\cos(\theta)X+\sin(\theta)Y$ is a Brownian motion and

has law $\nu_{\theta}$ . By Proposition 1 and Lemma 10

This is easily seen to be equivalent to (15). ∎
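A numerical sketch of Shannon's inequality, taking it (as an assumption about the missing display (15)) in the form $H(\nu_{\theta}\mid\gamma_{1})\leq\cos^{2}(\theta)H(\nu_{0}\mid\gamma_{1})+\sin^{2}(\theta)H(\nu_{\pi/2}\mid\gamma_{1})$ for independent centered $\eta$ and $\xi$. We choose symmetric Gaussian mixtures $\eta\sim\frac12N(a,1)+\frac12N(-a,1)$ and $\xi\sim\frac12N(b,1)+\frac12N(-b,1)$, so that $\cos(\theta)\eta+\sin(\theta)\xi$ is again a mixture of unit-variance Gaussians and every relative entropy reduces to a one-dimensional integral, computed here with the trapezoidal rule. Function names and parameters are hypothetical.

```python
import math


def mixture_entropy(means, weights, half_width=12.0, n=6000):
    """H(nu | gamma_1) for nu = sum_k w_k N(m_k, 1), by the trapezoidal rule.

    The density of nu w.r.t. gamma_1 is rho(x) = sum_k w_k exp(m_k x - m_k^2 / 2);
    we integrate log(rho) against the density of nu (log-sum-exp for stability).
    """
    spread = max(abs(m) for m in means)
    lo, hi = -half_width - spread, half_width + spread
    h = (hi - lo) / n
    total = 0.0
    for j in range(n + 1):
        x = lo + j * h
        exponents = [m * x - 0.5 * m * m for m in means]
        top = max(exponents)
        log_rho = top + math.log(sum(w * math.exp(e - top) for w, e in zip(weights, exponents)))
        density = sum(w * math.exp(-0.5 * (x - m) ** 2)
                      for w, m in zip(weights, means)) / math.sqrt(2.0 * math.pi)
        total += (0.5 if j in (0, n) else 1.0) * density * log_rho
    return total * h


def shannon_sides(a, b, theta):
    """Both sides for eta ~ (N(a,1)+N(-a,1))/2 and xi ~ (N(b,1)+N(-b,1))/2, independent."""
    c, s = math.cos(theta), math.sin(theta)
    # cos(theta) eta + sin(theta) xi is a mixture of four N(m, 1), m = +-a c +- b s
    means = [sa * a * c + sb * b * s for sa in (1, -1) for sb in (1, -1)]
    lhs = mixture_entropy(means, [0.25] * 4)
    rhs = (c * c * mixture_entropy([a, -a], [0.5, 0.5])
           + s * s * mixture_entropy([b, -b], [0.5, 0.5]))
    return lhs, rhs


if __name__ == "__main__":
    for theta in (0.3, 0.7, 1.0, 1.3):
        print(theta, shannon_sides(2.0, 0.5, theta))
```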

4 Brascamp-Lieb inequality

Let us focus on a family of inequalities dating back to Brascamp and Lieb’s article on optimal constants in Young’s inequality. Since then a number of nice alternate proofs have been discovered, see and the survey article . This subsection is inspired by the (unpublished) proof of Maurey relying on Borell’s formula.

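Let $x\in E$. Then $\lvert x\rvert^{2}=\bigl(\sum c_{i}P_{i}x\bigr)\cdot x$, and since $P_{i}$ is an orthogonal projection, $(P_{i}x)\cdot x=\lvert P_{i}x\rvert^{2}$, so that
$$\lvert x\rvert^{2}=\sum c_{i}\lvert P_{i}x\rvert^{2}.\qquad(17)$$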

where $\mu_{i}=\mu\circ P_{i}^{-1}$ is the push-forward of $\mu$ by the projection $P_{i}$ .

According to Theorem 2 there exists a standard Brownian motion $B$ on $E$ and a drift $U$ such that $B+U$ has law $\mu$ and

Since $P_{i}$ is an orthogonal projection, the process $P_{i}B$ is a standard Brownian motion on $E_{i}$ . Also $P_{i}B+P_{i}U$ has law $\mu\circ P_{i}^{-1}=\mu_{i}$ . By Proposition 1

On the other hand, the frame condition (17) implies easily that

pointwise. Taking expectation yields the result. ∎

As observed by Carlen and Cordero , this super-additivity property of the relative entropy is equivalent to the following Brascamp-Lieb inequality.

When the functions $F_{i}$ depend only on the endpoint $w_{1}$ rather than on the whole path $w$, we recover the usual Brascamp-Lieb inequality for the Gaussian measure.
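The super-additivity proved above has a finite-dimensional counterpart, $H(\mu)\geq\sum c_{i}H(\mu_{i})$ with entropies relative to the standard Gaussian measures on $E$ and $E_{i}$, which can be tested on Gaussian measures. In the sketch below, which is not from the original, $E=\mathbb{R}^{2}$, the $E_{i}$ are three lines spanned by unit vectors $u_{i}$ at angles $0,\pi/3,2\pi/3$, and $c_{i}=2/3$, so that $\sum c_{i}P_{i}$ is the identity. For $\mu=N(\mathrm{mean},S)$ the projections are $\mu_{i}=N(u_{i}\cdot\mathrm{mean},\,u_{i}\cdot Su_{i})$; by the frame condition the mean and trace terms cancel, and the comparison reduces to $\log\det S\leq\sum c_{i}\log(u_{i}\cdot Su_{i})$.

```python
import math
import random

# Hypothetical frame of E = R^2: unit vectors at 0, pi/3, 2pi/3 with weights 2/3,
# so that sum_i c_i u_i u_i^T is the identity (frame condition).
FRAME = [(math.cos(t), math.sin(t)) for t in (0.0, math.pi / 3, 2 * math.pi / 3)]
WEIGHTS = [2.0 / 3.0] * 3


def frame_defect():
    """Largest entry of sum_i c_i u_i u_i^T - I (should be ~0)."""
    m00 = sum(c * x * x for c, (x, _) in zip(WEIGHTS, FRAME))
    m01 = sum(c * x * y for c, (x, y) in zip(WEIGHTS, FRAME))
    m11 = sum(c * y * y for c, (_, y) in zip(WEIGHTS, FRAME))
    return max(abs(m00 - 1.0), abs(m01), abs(m11 - 1.0))


def entropy_2d(mean, cov):
    """H(N(mean, cov) | N(0, I_2)) = (|mean|^2 + tr cov - 2 - log det cov) / 2."""
    (a, b), (_, d) = cov
    return 0.5 * (mean[0] ** 2 + mean[1] ** 2 + a + d - 2.0 - math.log(a * d - b * b))


def entropy_1d(mean, var):
    """H(N(mean, var) | N(0, 1))."""
    return 0.5 * (mean * mean + var - 1.0 - math.log(var))


def superadditivity_sides(mean, cov):
    """Return (H(mu), sum_i c_i H(mu_i)) for mu = N(mean, cov) and its projections."""
    (a, b), (_, d) = cov
    parts = 0.0
    for c, (x, y) in zip(WEIGHTS, FRAME):
        proj_mean = x * mean[0] + y * mean[1]
        proj_var = a * x * x + 2.0 * b * x * y + d * y * y
        parts += c * entropy_1d(proj_mean, proj_var)
    return entropy_2d(mean, cov), parts


def random_covariance(rng):
    """Random SPD matrix L L^T with L lower triangular with positive diagonal."""
    l11, l22, l21 = rng.uniform(0.2, 3.0), rng.uniform(0.2, 3.0), rng.uniform(-2.0, 2.0)
    return ((l11 * l11, l11 * l21), (l11 * l21, l21 * l21 + l22 * l22))


if __name__ == "__main__":
    rng = random.Random(0)
    cov = random_covariance(rng)
    print(frame_defect(), superadditivity_sides((0.5, -1.0), cov))
```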

5 Reversed Brascamp-Lieb inequality

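Again, $E$ is a Euclidean space and $E_{1},\dotsc,E_{m}$ are subspaces satisfying the frame condition (16). Observe that if $x_{1},\dotsc,x_{m}$ belong to $E_{1},\dotsc,E_{m}$ respectively, then for any $y\in E$, the Cauchy-Schwarz inequality and (17) yield
$$\Bigl(\sum c_{i}x_{i}\Bigr)\cdot y=\sum c_{i}\,x_{i}\cdot P_{i}y\leq\Bigl(\sum c_{i}\lvert x_{i}\rvert^{2}\Bigr)^{1/2}\Bigl(\sum c_{i}\lvert P_{i}y\rvert^{2}\Bigr)^{1/2}=\Bigl(\sum c_{i}\lvert x_{i}\rvert^{2}\Bigr)^{1/2}\lvert y\rvert.$$
Choosing $y=\sum c_{i}x_{i}$, we get
$$\Bigl\lvert\sum c_{i}x_{i}\Bigr\rvert^{2}\leq\sum c_{i}\lvert x_{i}\rvert^{2}.\qquad(18)$$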

Given $m$ probability measures $\mu_{1},\dotsc,\mu_{m}$ belonging to $\mathcal{S}_{1},\dotsc,\mathcal{S}_{m}$ respectively, there exist $m$ processes $X_{1},\dotsc,X_{m}$ (defined on the same probability space) such that

$X_{i}$ has law $\mu_{i}$ for all $i=1,\dotsc,m$ .

Letting $\mu$ be the law of $\sum c_{i}X_{i}$ we have

Again let $B$ be a standard Brownian motion on $E$ . For $i=1,\dotsc,m$ , the process $P_{i}B$ is a standard Brownian motion on $E_{i}$ . Since $\mu_{i}\in\mathcal{S}_{i}$ there exists a drift $U_{i}$ such that the process $X_{i}=P_{i}B+U_{i}$ has law $\mu_{i}$ and

Let $X=\sum c_{i}X_{i}$ and let $\mu$ be the law of $X$ . Since $\sum c_{i}P_{i}$ is the identity of $E$

On the other hand (18) easily implies that

pointwise. Taking expectation we get the result. ∎
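The pointwise step above rests on (18). Reading (18) as $\lvert\sum c_{i}x_{i}\rvert^{2}\leq\sum c_{i}\lvert x_{i}\rvert^{2}$ for $x_{i}\in E_{i}$, which is what the Cauchy-Schwarz argument gives with $y=\sum c_{i}x_{i}$, it can be checked numerically on a hypothetical frame of $\mathbb{R}^{2}$: three lines spanned by unit vectors $u_{i}$ at angles $0,\pi/3,2\pi/3$, with $c_{i}=2/3$, and $x_{i}=t_{i}u_{i}$. Equality holds when $x_{i}=P_{i}y$ for a common $y\in E$.

```python
import math
import random

# Hypothetical frame of E = R^2: lines E_i spanned by unit vectors at
# 0, pi/3, 2pi/3, with weights c_i = 2/3 (so that sum_i c_i P_i = identity).
FRAME = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(3)]
WEIGHTS = [2.0 / 3.0] * 3


def inequality_18_sides(coords):
    """For x_i = coords[i] * u_i in E_i, return (|sum c_i x_i|^2, sum c_i |x_i|^2)."""
    sx = sum(c * t * u[0] for c, t, u in zip(WEIGHTS, coords, FRAME))
    sy = sum(c * t * u[1] for c, t, u in zip(WEIGHTS, coords, FRAME))
    lhs = sx * sx + sy * sy
    rhs = sum(c * t * t for c, t in zip(WEIGHTS, coords))
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(0)
    print(inequality_18_sides([rng.uniform(-3.0, 3.0) for _ in range(3)]))
```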

This sub-additivity property of the entropy is a multi-marginal version of the displacement convexity property put forward by Sturm . By duality, we obtain the following reversed Brascamp-Lieb inequality.

By Lemma 8, for every $i$ , there exists a measure $\mu_{i}\in\mathcal{S}_{i}$ such that

Let $X_{1},\dotsc,X_{m}$ be the random processes given by the previous theorem, let $X=\sum c_{i}X_{i}$ and let $\mu$ be the law of $X$ . Then by duality and the hypothesis (19) we get

Letting $\epsilon$ tend to $0$ yields the result. ∎

Again when the functions depend only on the value of the path at time $1$ , we recover the reversed Brascamp-Lieb inequality for the Gaussian measure, which is due to Barthe .

The author is grateful to Patrick Cattiaux and Massimiliano Gubinelli for communicating references and to Christian Léonard, Bernard Maurey and Patrick Cattiaux again for valuable discussions.