Subadditivity of the entropy and its relation to Brascamp-Lieb type inequalities
Eric A. Carlen, Dario Cordero-Erausquin
Introduction
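Let $(\Omega,\mathcal{S},\mu)$ be a measure space, and let $f$ be a probability density on $\Omega$. That is, $f$ is a nonnegative integrable function on $\Omega$ with $\int_\Omega f\,d\mu = 1$. On the convex subset of probability densities
$$\Big\{\, f \;:\; \int_\Omega f\,|\ln f|\,d\mu < \infty \,\Big\} \tag{1.1}$$
the entropy of $f$, $S(f)$, is defined by
$$S(f) = \int_\Omega f \ln f\,d\mu .$$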
With this sign convention for the entropy, the inequalities we derive are of superadditive type; however, the terminology “subadditivity of the entropy” is too well entrenched to use anything else.
In other words, the measure is the “push–forward” of the measure under :
(1) Given measurable functions on , and nonnegative numbers , is there a finite constant such that
for all probability densities with finite entropy (i.e. satisfying (1.1))?
(2) Given measurable functions on , and nonnegative numbers , is there a finite constant such that
It is even easier to recognize (1.4) as a classical result in this setting: It becomes
which is the classical Brascamp–Lieb inequality. A celebrated theorem of Brascamp and Lieb says that the best constant in this inequality can be computed by using only centered Gaussian functions as trial functions. A new proof based on optimal mass transport was given by Barthe, who also gave a characterization (depending on the vectors and the constants ) of when the constant is finite together with a description of the optimizers in some situations. Carlen, Lieb and Loss introduced a new approach to the Brascamp–Lieb inequalities based on heat flow (see also ). These authors also filled the gaps left by Barthe in the description of the optimizers. Bennett, Carbery, Christ and Tao used a similar approach to deal with the multidimensional versions of the Brascamp–Lieb inequality (see also for a direct approach to the finiteness of the constant ). The paper (and in the multidimensional setting) develops a “splitting procedure” that will prove useful in our situation too. But we shall see that working with entropy clarifies many technical points.
for any probability density on with finite entropy, and
for any nonnegative functions on . See for the original proofs of (1.5) and (1.6), in which (1.5) was deduced from (1.6). See for a different and direct proof of (1.5).
Since we are concerned in this paper with the relation between subadditivity of entropy and Brascamp–Lieb type inequalities, it is worth recalling the short argument from that provided the passage from (1.6) to (1.5): Let be any probability density on , and let be its marginals, as above. Then define another probability density on by
Then by positivity of the relative entropy (Jensen’s inequality), we have
since each is a probability density. Thus, , so that (1.5) now follows from (1.7). This argument may give the impression that (1.6) is a “stronger” inequality than (1.5), but as we shall see, this is not the case.
for any probability density on , and
for any nonnegative functions on . See for the proof of (1.9). One could then derive (1.8) using the exact same argument that was used to derive (1.5) from (1.6).
There are more examples of interesting specializations of (1.3) and (1.4). However, these examples suffice to illustrate the context in which the present work is set, and we now turn to the results. One basic result of this paper is the following:
The two questions concerning (1.3) and (1.4) that were raised above are in fact one and the same: We shall prove here that the answer to one question is “yes” if and only if the answer to the other question is “yes” — with the same constant , and with a complete correspondence of cases of equality.
The rest of the paper is organized as follows. In Section 2, we give the proof that (1.3) and (1.4) are dual to one another, so that once one has one inequality established with the cases of equality determined, one has the same for the other. We shall state this duality in a very general setting.
In Section 3, we prove the sharp version of the general Euclidean subadditivity of the entropy inequality.
In Section 4 we shall deduce some interesting consequences from this, including a generalization of Hadamard’s inequality for the determinant.
The final Section 5 gives another duality result showing that the superadditivity inequalities for Fisher information are dual to certain convolution type inequalities of ground state eigenvalues of Schrödinger operators. These inequalities appear to be new. They may be of some intrinsic interest, but our interest in them here is that a direct proof of the eigenvalue inequalities would yield a direct proof of Fisher information inequalities that would in turn yield entropy and Brascamp-Lieb inequalities.
Duality of the Brascamp–Lieb inequality and subadditivity of the entropy
We show that the Brascamp–Lieb inequality is dual to the subadditivity of the entropy, so that once one has proved one of these inequalities with sharp constants, one has the other with sharp constants too. In fact, we shall see that there is an exact correspondence also for cases of equality, but in the next theorem, we focus on the constants.
We shall state the result in a more general setting than the one described in the introduction. We consider a reference measure space and a family of measure spaces together with measurable functions , . For a probability density on (with respect to ), the marginal is thus defined as the probability density on (with respect to ) such that
for all bounded measurable functions on ; accordingly the entropies are given by
Let be a measure space, and for , let be a measure space together with a measurable function from to . For any probability density on , let the probability density on be defined as in (2.1). Finally, let be any set of nonnegative numbers.
For every probability density on with finite entropy, we have
The proof depends on a well-known expression for the entropy as a Legendre transform: For any probability density in , and any function such that is integrable,
On the other hand, by Jensen’s inequality,
and there is equality if and only if is a constant multiple of on the support of . We shall use that this Legendre duality nicely combines with the operation of taking marginals.
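The Legendre duality just described can be checked numerically on a finite measure space. The sketch below is only an illustration under stated assumptions: it takes (2.4) to be the inequality $\int \phi f\,d\mu \le S(f) + \ln \int e^{\phi}\,d\mu$ with $S(f)=\int f\ln f\,d\mu$, and the four-point space, the weights `mu`, the density `f` and the function `phi` are all hypothetical choices, not data from the paper.

```python
import math

def entropy(f, mu):
    """S(f) = integral of f ln f against mu (convention 0 ln 0 = 0)."""
    return sum(fx * math.log(fx) * m for fx, m in zip(f, mu) if fx > 0)

def log_partition(phi, mu):
    """ln of the integral of e^phi against mu."""
    return math.log(sum(math.exp(p) * m for p, m in zip(phi, mu)))

def pairing(f, phi, mu):
    """Integral of phi * f against mu."""
    return sum(fx * p * m for fx, p, m in zip(f, phi, mu))

# Hypothetical four-point measure space; f is a density since sum(f * mu) = 1.
mu = [0.5, 1.0, 2.0, 0.25]
f = [0.4, 0.3, 0.2, 0.4]
phi = [0.3, -1.2, 0.7, 2.0]

# Gap in the assumed form of (2.4); it is a relative entropy, hence nonnegative.
gap = entropy(f, mu) + log_partition(phi, mu) - pairing(f, phi, mu)

# Equality case: e^phi is a constant multiple of f (here the constant is e^5).
phi_opt = [math.log(fx) + 5.0 for fx in f]
gap_opt = entropy(f, mu) + log_partition(phi_opt, mu) - pairing(f, phi_opt, mu)
```

With these choices `gap` is strictly positive while `gap_opt` vanishes up to rounding, in line with the stated case of equality.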
Proof of Theorem 2.1: First, assume (2.2). Consider any probability density on , and any functions on , . Using (2.4) with defined on by
Then from the assumption (2.2) applied with ,
Now the optimal choice leads to (2.3).
Conversely, suppose that (2.3) is true. Consider functions on , , and define on as in (2.5). Suppose that is integrable, and choose to be the probability density
so that there is equality in (2.4). Then we have from (2.4) that
and so (2), and then (2.4) applied on with the probability density and the function for each , imply
Exponentiating both sides, we obtain (2.2). ∎
We next examine the relation between cases of equality in the two inequalities.
Using the notation of the previous theorem, suppose that is a probability density on for which equality holds in the subadditivity inequality (2.3). Then the marginals of yield equality in the Brascamp–Lieb inequality (2.2), and moreover, and its marginals satisfy
Conversely, suppose that are probability densities (on with respect to for , respectively) for which equality holds in the Brascamp–Lieb inequality (2.2). Then the probability density defined on by
yields equality in the subadditivity inequality (2.3) and moreover is the th marginal of ; i.e. for .
Proof: Suppose that for some probability density , . Then with this , we must have equality in the first inequality in (2), which comes from (2.4). By what we have said about the cases of equality in (2.4), this means that , defined in (2.5) is a constant multiple of . Moreover, to get equality in (2.7), we were forced to choose . This ensures that (2.12) is true.
Furthermore, to get equality in our intermediate application of the Brascamp–Lieb inequality, we must have that is a set of extremals for the Brascamp–Lieb inequality.
The other assertion follows in the same way. ∎
On the other hand, the dual inequality is the classical subadditivity of the entropy inequality
and equality occurs exactly when the coordinates form a set of independent random variables.
In this example, it may appear that the entropy inequality is the more complicated of the two inequalities. However, the fact that statistical independence enters the picture on the entropy side is quite helpful: We will make much use of simple entropy inequalities that are saturated only for independent random variables in our investigation of the cases of equality in the next section.
In general there is no finite constant for which (3.3) is true for all . There are some simple requirements on and for this to be the case.
where is the orthogonal projection onto , and .
Beyond this spanning condition, there are some simple compatibility conditions that must be satisfied by the vectors and the numbers . First of all, it follows from (3.2) that for all ,
There is a further necessary condition that is somewhat less obvious. The key observation to make is that the right hand side of (3.6) tends to infinity as tends to zero if and only if ,
Consider any subset of , and let
Let denote the Gaussian random variable defined by (3.4) when . Note that for each , , so that for such ,
which tends to infinity as tends to zero. Therefore, letting approach zero, we see that the leading term in is at least
(It is exactly this unless for some , , in which case we could have taken an even “worse” set .) Hence, if , there can be no upper bound on . Therefore, (3.3) can only hold when it is the case that for all ,
In particular, we must have for all .
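The necessary conditions derived above lend themselves to a mechanical check. The following sketch assumes, since the displayed formulas are not reproduced here, that they read: the vectors $a_1,\dots,a_N$ span $\mathbb{R}^n$, $\sum_j c_j = n$ (3.7), and $\sum_{j\in J} c_j \le \dim \mathrm{span}\{a_j : j\in J\}$ for every nonempty subset $J$ (3.8). The function names are hypothetical, and the enumeration of all subsets is exponential in $N$, so this is meant for small examples only.

```python
from fractions import Fraction
from itertools import combinations

def rank(vectors):
    """Rank of a list of vectors, by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def necessary_conditions_hold(a, c):
    """Check the assumed forms of the spanning condition, (3.7) and (3.8)."""
    n = len(a[0])
    if rank(a) != n or sum(c) != n:
        return False
    for k in range(1, len(a) + 1):
        for J in combinations(range(len(a)), k):
            if sum(c[j] for j in J) > rank([a[j] for j in J]):
                return False
    return True

# Hypothetical example: three vectors in the plane.
a = [(1, 0), (0, 1), (1, 1)]
ok_interior = necessary_conditions_hold(a, [Fraction(2, 3)] * 3)
ok_boundary = necessary_conditions_hold(a, [Fraction(1), Fraction(1, 2), Fraction(1, 2)])
bad_single = necessary_conditions_hold(a, [Fraction(3, 2), Fraction(1, 4), Fraction(1, 4)])
bad_sum = necessary_conditions_hold(a, [Fraction(1, 2)] * 3)
```

Exact rational arithmetic is used so that the rank computation and the comparison with $\sum_{j\in J} c_j$ are not affected by rounding.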
Notice that with fixed, is the pointwise supremum of a set of affine functions, and as such, it is convex. We introduce
Also define , the Gaussian analog of (3.9), by
It is clear that is also a convex function of , and that . Also, since our proof that for used a centered Gaussian random vector, it shows also that for . In fact, we have the following:
and furthermore is finite if and only if .
The proof will be accomplished in three steps:
Step 1: We shall first consider the case in which the vectors are all unit vectors satisfying the following special condition, put forward by K. Ball in the setting of Brascamp-Lieb inequalities (see e.g. ):
with . (Note that (3.7) automatically holds, as it can be seen by taking the trace, and that for all .) Under this condition, we give a simple proof of Theorem 3.1 using an elementary superadditivity property of the Fisher information and integration along the heat flow. The proof here draws on ideas from .
Step 2: We shall show that for , there is a linear change of variables that reduces this case to the one considered in the first step. While the lemma that provides the existence of the change of variables would appear to be a simple statement about linear algebra, the existence of this change of variables is intimately connected with the existence of Gaussian optimizers for the subadditivity (and hence the Brascamp–Lieb) inequality.
Remark: If one is content to prove only that is finite if and only if , there is a very expeditious route: One can easily check the finiteness of at the extreme points of (where, as shown by Barthe, each is either or ). Then the convexity of implies finiteness on all of , and we know it is infinite outside. Proving the equality on all of is more subtle: The values of and do jump as one crosses the boundary of , and we see nothing to preclude from jumping up more than on the boundary. Thus, it is not only for the classification of the cases of equality that we argue as we do in the third step: we do not know of any quick way to “pass to the boundary” of and wrap up the proof of Theorem 3.1 after the second step without developing the splitting argument.
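Before the first step is carried out, it may help to see a concrete family satisfying the special condition of Step 1. The sketch below assumes that condition (3.13) is the decomposition of the identity $\sum_j c_j\, a_j\otimes a_j = \mathrm{Id}$ for unit vectors $a_j$, and uses a hypothetical example: three unit vectors in the plane at angles of 120 degrees, each with weight $2/3$.

```python
import math

def weighted_outer_sum(vectors, weights):
    """Return the matrix sum_j c_j a_j a_j^T as a nested list."""
    n = len(vectors[0])
    m = [[0.0] * n for _ in range(n)]
    for a, c in zip(vectors, weights):
        for i in range(n):
            for j in range(n):
                m[i][j] += c * a[i] * a[j]
    return m

# Hypothetical example: three unit vectors at 120 degrees in the plane.
frame = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)) for k in range(3)]
weights = [2.0 / 3.0] * 3

m = weighted_outer_sum(frame, weights)   # numerically the 2 x 2 identity
trace = m[0][0] + m[1][1]                # equals sum of the weights, i.e. the dimension
```

Taking the trace of the identity recovers $\sum_j c_j = n$, which is the remark made in Step 1 about (3.7) holding automatically.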
We now begin with the first step. Here we shall use a simple superadditivity result for the Fisher information: If is a random vector with a differentiable density , define the Fisher information of or of by
and in particular, the right hand side is finite for all .
The basic inequality concerning the Fisher information that will yield us our subadditivity result is the fact that for any unit vector ,
with equality if and only if is the product of and a probability density on the orthogonal complement of . This was proved in ; see Theorem 2 there with . Let us include here for completeness a different proof taken from (where more abstract settings are studied). This proof requires more regularity than the one in , but that is fine for our purpose, as we shall apply the inequality along the heat flow.
Using the definition of the marginal (3.1) twice and Hölder’s inequality, we have:
From (3.15), we immediately deduce the superadditivity of information. But before stating the result, let us make a definition needed to discuss the cases of equality.
Then for all random vectors with finite Fisher information,
with equality if , and for all random vectors with finite entropy
Moreover there is equality in these inequalities if and only if for each , and are independent. Under the condition that and that is an irreducible spanning set, then there is equality in these inequalities if and only if is an isotropic Gaussian random vector.
The proof of (3.16) and (3.17) is elementary and follows . The determination of the cases of equality requires a bit more work, but it remains quite direct (compared to the analogous result on the side of the Brascamp–Lieb inequality).
Proof: Inequality (3.16) follows immediately from (3.15) and condition (3.13) rewritten in the form
Equality for is obvious as is a standard Gaussian variable and so the computation boils down to the equality . (For the same reason the right-hand side of the inequality (3.17) is zero.)
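For centered Gaussian vectors both sides of the superadditivity inequality are explicit, which gives a quick sanity check. The sketch assumes that (3.16) reads $\sum_j c_j\, I(a_j\cdot X) \le I(X)$, and uses the standard facts that a centered Gaussian vector with covariance $\Sigma$ has Fisher information $\mathrm{tr}(\Sigma^{-1})$ and that $a\cdot X$ is Gaussian with variance $a\cdot\Sigma a$. The frame, the weights and the covariance matrices are hypothetical choices.

```python
import math

def quad_form(a, m):
    """a^T m a for a 2-vector a and a 2 x 2 matrix m."""
    return sum(a[i] * m[i][j] * a[j] for i in range(2) for j in range(2))

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def fisher_gap(cov, frame, weights):
    """I(X) - sum_j c_j I(a_j . X) for a centered Gaussian X with covariance cov."""
    inv = inverse_2x2(cov)
    total = inv[0][0] + inv[1][1]                       # I(X) = tr(cov^{-1})
    marginals = sum(c / quad_form(a, cov) for a, c in zip(frame, weights))
    return total - marginals

# Hypothetical data: unit vectors at 120 degrees with weights 2/3 (Ball's condition).
frame = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)) for k in range(3)]
weights = [2.0 / 3.0] * 3

gap_generic = fisher_gap([[2.0, 0.5], [0.5, 1.0]], frame, weights)
gap_standard = fisher_gap([[1.0, 0.0], [0.0, 1.0]], frame, weights)
```

For the standard Gaussian the gap vanishes, which is the identity $\sum_j c_j = n$ mentioned above, while a non-isotropic covariance gives a strictly positive gap.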
As we have noted, the Fisher information of is related to the entropy of through . It is also easy to see (using that commutes with translations) that if is any unit vector, then , the marginal of along , has the property that where we keep the same notation of the -dimensional heat semi-group ( in dimension ); we again have (in dimension ) that
Then since , and because is invariant under dilation, i.e. under the substitution , we get
By Theorem 3.3, the integrand above is non negative for all , and so (3.17) is proved.
The condition for cases of equality in (3.15) tells us that there is equality in (3.16) for a random vector with finite Fisher information if and only if verifies the following property:
If is a standard Gaussian random vector independent of , then verifies if and only if for all , verifies . Thus for a random vector with finite entropy, there is equality in (3.17) if and only if verifies .
Writing , and for each , we have
Evidently the left hand side depends on only through and only through . But since and are linearly independent, this means that the left hand side is constant. Hence,
The following lemma will facilitate the application of the statement concerning the cases of equality in Proposition 3.3:
with for , since . Then, using that , this expression (in ) has the form
which is unbounded for large unless
This must be the case since, by hypothesis, . Thus, ∎
We have now completed the first step. We start the second by showing that the change of variables matrix does exist for . The existence of such a change of variables can be deduced from results of Bennett-Carbery-Christ-Tao . However, the flow of logic in their deduction (and in ) runs counter to ours: They first show that such a change of variables exists whenever there are Gaussian optimizers for the Brascamp–Lieb problem, and then show that Gaussian optimizers exist for . Here, we need the change of variables at the outset of our analysis, and hence need a direct proof of this result. We now provide one, using a geometric result of Barthe.
When , there is exactly one such matrix satisfying the further requirements that be positive definite, and that . On the other hand, for , no such matrix exists.
Remark: After settling the cases of equality in Theorem 3.1 we shall derive necessary and sufficient conditions for the existence of such a matrix . Though the conditions are simple and explicit, it turns out that the matrix exists if and only if the supremum in (3.12) is attained at some centered Gaussian , and our proof that the conditions we give are necessary and sufficient depends on this.
Proof: Take any diagonal matrix with positive diagonal entries , , and define the matrix by
We have what we seek if and only if for each , is a unit vector, which is the case if and only if for each , . By the definition of , this means
It has been shown (see for another proof and a statement in this formulation) that there exist positive numbers for which (3.20) is true whenever , and that in this case, when , the set of numbers is unique up to a common multiple. Thus, for , such an exists.
As for the uniqueness, note that given any such matrix , we can change variables, replacing and . Then Proposition 3.3 may be applied to deduce that the only extremizers for the new problem are isotropic Gaussians. Undoing the change of variables, we see that the only extremizers of the original problem are Gaussians whose covariance is a multiple of . Thus, under the further condition that be positive definite (instead of simply symmetric), and that the trace of is fixed, is uniquely determined.
The same change of variables argument (which is exploited systematically in Lemma 3.6 below) shows, through Proposition 3.3, that if such a matrix exists, then . As we have seen, this is impossible when . ∎
Remark: The first proof that there exists a solution, essentially unique, to (3.20) whenever is due to Barthe . However, he used a different characterization of , and did not mention the condition (3.8). Another proof of this, based directly on (3.8) was given in , together with a proof that the characterization of in Barthe’s paper is equivalent to the one based on (3.8).
With the change of variable provided by the previous lemma, we can finish the second step and describe what happens when .
and there exists a Gaussian optimizer. Moreover, if , then if and only if is Gaussian and its covariance is a constant multiple of where is the unique positive definite matrix verifying (3.19) with .
Remark: The condition “”, which has already appeared several times, is present because in one dimension, the subadditivity problem is trivial, so that Gaussians play no special role. Indeed, assume we are given with the condition that and a family of non-zero real numbers. Then, setting
Therefore and every random variable is an extremizer.
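The one-dimensional remark can be illustrated numerically. The sketch assumes the setting is $n=1$, weights with $\sum_j c_j = 1$, nonzero reals $a_j$, and that the constant obtained is $-\sum_j c_j \ln|a_j|$, which follows from the scaling rule $S(aX) = S(X) - \ln|a|$. The density $f(x) = 2x$ on $[0,1]$, the numbers `a` and `c`, and the midpoint quadrature are hypothetical choices made for the illustration.

```python
import math

def entropy(density, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f ln f over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        v = density(lo + (i + 0.5) * h)
        if v > 0:
            total += v * math.log(v) * h
    return total

def f(x):
    """Hypothetical density of X: f(x) = 2x on [0, 1]."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def scaled(a):
    """Density of a * X together with the endpoints of its support."""
    lo, hi = min(0.0, a), max(0.0, a)
    return (lambda y: f(y / a) / abs(a)), lo, hi

a = [2.0, -0.5, 3.0]      # nonzero reals (assumed example)
c = [0.5, 0.25, 0.25]     # weights summing to 1

# sum_j c_j S(a_j X) - S(X), computed by quadrature.
lhs = sum(cj * entropy(*scaled(aj)) for aj, cj in zip(a, c)) - entropy(f, 0.0, 1.0)
# The constant predicted by the scaling rule.
rhs = -sum(cj * math.log(abs(aj)) for aj, cj in zip(a, c))
```

The difference `lhs - rhs` is zero up to quadrature and rounding error for this density, and the same computation goes through for any other density, consistent with every random variable being an extremizer.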
Proof: Let be an invertible symmetric matrix verifying (3.19) provided by Lemma 3.5. Since for any random vector with finite entropy, we have
Introduce the family of vectors for , and set . The previous equality implies that
Since is a family of unit vectors verifying the decomposition of the identity (3.13), we can apply Proposition 3.3 and get that
and every isotropic Gaussian vector is an extremizer. To prove that all optimizers are Gaussian when , note first that, by Lemma 3.4, implies that is an irreducible spanning set. Therefore any optimizer of the variational problem defining is an isotropic Gaussian. (Then every optimizer for is Gaussian whose covariance is a multiple of .) ∎
Remark: Note that the proof above gives also the following statement: If there exists an invertible matrix verifying (3.19) then (with no further assumptions on and ) we have that and that is an extremizer for every standard Gaussian vector .
We now turn to the third step. When , we will pick a non-empty proper subset of of least cardinality among subsets for which equality holds in (3.8). We shall now show that the variational problem defining splits into two such problems involving fewer vectors and random variables in a lower dimensional space. Repeated splittings, and what we have already proved, will enable us to settle all questions concerning the variational problem defining . The splitting argument presented here is patterned on one developed in for the Brascamp–Lieb inequality. However, as we shall see, in the subadditivity setting, the argument leads to a clear and simple analysis of cases of equality. It relies on properties of the conditional entropy.
Let us fix the following notation. Let be a family of vectors spanning a Euclidean space ,
Note that (a priori this sum is not direct) and so . Thus we have , i.e.:
and if , then .
Suppose next that there exists an extremizing random vector ; i.e., a random vector such that
(for instance where is the covariance matrix of an extremizer , so that ), then is an extremizer (3.25) if and only if decomposes as where and are independent random vectors with values in and , and which are extremizers for $\big([Ta_j\,;\, j\in J],\, c_J\big)$ and $\big([Ta_j\,;\, j\in J^c],\, c_{J^c}\big)$, respectively.
The proof of this lemma relies on some well known identities and inequalities concerning conditional entropy that we now recall.
Let and be two Euclidean spaces (equipped with the Lebesgue measure). If and are two random vectors with values in and respectively, with a joint density on , let and be the two marginal densities on and , which are of course the densities of and respectively.
Then the conditional density of given is . The conditional entropy of given is then defined to be
Since the entropy of , , is given by
follows directly from the definitions. Furthermore, by Jensen’s inequality
and there is equality if and only if and are independent.
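The two facts about conditional entropy recalled above can be checked on a finite example. The sketch replaces Lebesgue measure by counting measure on a two-by-two table, keeps the sign convention $S = \sum p \ln p$, and assumes that (3.27) is the chain rule $S(X,Y) = S(X|Y) + S(Y)$ and that (3.28) is $S(X|Y) \ge S(X)$. The joint laws are hypothetical.

```python
import math

def xlogx(p):
    return p * math.log(p) if p > 0 else 0.0

def entropies(joint):
    """joint[i][j] = P(X = i, Y = j). Returns S(X,Y), S(X), S(Y), S(X|Y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    s_xy = sum(xlogx(p) for row in joint for p in row)
    s_x = sum(xlogx(p) for p in px)
    s_y = sum(xlogx(p) for p in py)
    # Conditional entropy: sum of p(x,y) ln p(x|y), with p(x|y) = p(x,y) / p(y).
    s_x_given_y = sum(p * math.log(p / py[j])
                      for row in joint for j, p in enumerate(row) if p > 0)
    return s_xy, s_x, s_y, s_x_given_y

# A dependent pair and an independent pair (hypothetical tables).
dependent = [[0.3, 0.1], [0.1, 0.5]]
independent = [[0.4 * 0.3, 0.4 * 0.7], [0.6 * 0.3, 0.6 * 0.7]]

d_xy, d_x, d_y, d_cond = entropies(dependent)
i_xy, i_x, i_y, i_cond = entropies(independent)
```

In both tables the chain rule holds to rounding error; the inequality is strict for the dependent table and becomes an equality for the independent one.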
so that . Then and so from (3.27),
For each , we have , so that
Now combining (3.29), (3.30) and (3.32), we have that
It is clear from (3.33) and the definition of that
To see that there is actually equality here, we use the fact that is a critical set of minimal cardinality. This implies that , and by Lemma 3.6, there is a centered Gaussian random vector for which
Pick and let be any random variable with values in that is independent of and such that
This implies that . We have implicitly assumed that (we shall later only need this case, actually), but the argument remains valid if . Thus (3.24) is established.
Now suppose that . Then we may further assume that the random variable in the previous paragraph is a centered Gaussian random variable. Combining this with the independent extremal centered Gaussian random variable , provided by Lemma 3.6, we see that we may take the random variable in the previous paragraph to be a centered Gaussian. Hence, in this case, .
It remains to prove the last statements concerning the cases of equality.
We first assume that we are given a finite entropy random variable for which (3.25) is satisfied. By making a translation, we may assume that is centered; i.e., . Furthermore, the covariance matrix is non-degenerate or else the law of would be concentrated on a proper subspace and this is inconsistent with finite entropy. Since satisfies (3.25), there must be equality in (3.33), and it must be the case that
And since is centered, so is . Next, in addition to equality in (3.37), we must have equality in (3.33). Since the only inequality used in deriving (3.33) was (3.32), this in turn requires equality in (3.32) for each . By (3.31), this means that for ,
By the condition for equality in (3.28), this implies that for , and are independent random variables. But then for any , by independence
This shows that and are orthogonal subspaces in the inner product defined in terms of the covariance. Thus their dimension sums exactly to and so (3.26) holds.
We now prove the final statement describing how extremizers split.
We go back to the beginning of the proof and note that for all : the orthogonal projection does nothing in this case ().
Assume is an extremizer (3.25) which is decomposed as before as . Then as in the argument above we must have that
with and independent for every . Since for every we have that is independent of for and so . Using this together with (3.28) for , we get, after integrating (3.39) with respect to , and applying (3.28),
By the definition of this inequality must be an equality, i.e.
and therefore, there must be equality in the application of (3.28) that we just made. This implies that and are independent, as claimed.
Proof of Theorem 3.1: By Lemma 3.6, whenever , , and there is a Gaussian optimizer.
Hence it remains to consider the case . Then taking to be a proper non-empty subset of of least cardinality for which there is equality in (3.8), we may “peel off” vectors from our set, as in the first part of Lemma 3.7, and reduce matters to the consideration of . By that Lemma, whenever . Now, if and are such that for every proper subset of the remaining indices, strict inequality holds in the analog of (3.8), i.e. , then follows from Lemma 3.6. Otherwise, we “peel off” another proper subset of indices for which equality holds in (3.8), and reduce to a problem with a strictly smaller number of vectors. In a finite number of steps, this process must end. ∎
Our next theorem concerns the cases of equality in the subadditivity inequality. As we have seen in Lemma 3.7, when there is equality, and no is zero, then either , or the variational problem can be split into two problems of the same type, but involving a reduced number of vectors, and for random variables taking values in subspaces of a reduced dimension.
Of course, each of these reduced problems must also have an optimizer, and so we can apply the same dichotomy to each of them. This leads to the following definition:
where if and only if , and
such that for each , there is no nonempty proper subset of that yields equality in (3.8). Here, may be empty, but for , is to be non empty.
Note that, if is totally reducible for , then we have, with the notation of the definition, that for ,
The analysis made so far proves the following theorem, which gives a complete analysis of the cases of equality in the subadditivity inequality.
Then the extremizers (3.41) are exactly the random vectors such that decompose as
where is an independent set of random variables with each taking values in and extremal for the corresponding problem $\big([Ta_j\,;\, j\in J_i],\, c_{J_i}\big)$. More precisely, for each , if , then can be any finite entropy random variable with values in ; however, if , then is necessarily Gaussian, and its covariance is a constant multiple of , where is the unique positive definite linear transformation on such that
Proof: The proof relies on successive applications of Lemmas 3.7 and 3.6. First of all, note that the vectors for the indices such that play no role in the inequality, and so without loss of generality, we may discard these indices without changing , the extremizers and . So we will assume that for all (this means in Definition 3.8).
with for . This shows that there exists an extremizer only when is totally reducible for . Note that we have also shown that this sum is orthogonal w.r.t. the scalar product given by the covariance of an extremizer.
Of course, there always exists such a linear map . As before the change of vectors and reduces the problem to the case and
With this orthogonal decomposition in hand, we can use Lemma 3.7 to successively “peel-off” orthogonal blocks. We first apply this Lemma to and , and then on the space to , and so on. After steps we get that and that a random vector is an extremizer if and only if it can be written as
where has values in and is extremal for , and with the property that
(Note that in order to construct an extremizer we start with an extremizer on and then add an extremal independent on in order to get an extremizer on , and so on by repeated applications of Lemma 3.7.) Observe that the independence property (3.43) is equivalent to the independence of the set of random vectors . Next, remember that for each we have . Thus Lemma 3.6 applies and when then is Gaussian and its covariance is determined as stated. Recall that in dimension the problem is trivial and all random variables are extremal (in particular Gaussian variables are extremal).
Note that the previous theorem tells us in particular that when there exist optimizers, there exist Gaussian optimizers (however this was not a needed step in our approach).
Of course, by Theorems 2.1 and 2.2, we now also know that optimizers for the classical Brascamp–Lieb inequality exist under the exact same conditions for optimality described in Theorem 3.9, and that moreover, the optimizers of the Brascamp–Lieb inequality are exactly the marginals of the optimizing probability densities for the subadditivity inequality. The full description of optimizers (in one dimensional Brascamp–Lieb inequalities) was given in , building on a previous characterization by Barthe . In the multidimensional case, building on Barthe’s work too, Bennett-Carbery-Christ-Tao obtained some description, but the problem was completely solved only recently by Valdimarsson .
There are several interesting consequences of Theorems 3.1 and 3.9. The first is a generalization of Hadamard’s inequality for determinants:
and this inequality is sharp in that the constant cannot be decreased. Moreover, for , there is a transformation with for which equality holds in (4.1), and, when , if we take to be positive, then is unique (up to multiplication by a positive scalar).
For simplicity we have stated the existence of an extremal only when , but the right condition is that is totally reducible for , just as in Theorem 3.9.
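A small numerical experiment illustrates the generalized Hadamard inequality in the simplest non-orthogonal case. The sketch assumes that (4.1) has the form $|\det T| \le e^{D}\prod_j |T a_j|^{c_j}$ and that the constant satisfies $D = 0$ when the unit vectors $a_j$ satisfy the decomposition of the identity (3.13). The frame of three unit vectors at 120 degrees with weights $2/3$, the random sampling of $T$, and the fixed seed are hypothetical choices.

```python
import math
import random

# Hypothetical data: unit vectors at 120 degrees in the plane, weights 2/3.
frame = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)) for k in range(3)]
weights = [2.0 / 3.0] * 3

def hadamard_ratio(t):
    """|det T| divided by prod_j |T a_j|^{c_j} for a 2 x 2 matrix t."""
    det = abs(t[0][0] * t[1][1] - t[0][1] * t[1][0])
    prod = 1.0
    for a, c in zip(frame, weights):
        ta = (t[0][0] * a[0] + t[0][1] * a[1], t[1][0] * a[0] + t[1][1] * a[1])
        prod *= math.hypot(ta[0], ta[1]) ** c
    return det / prod

random.seed(0)
samples = [[[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)] for _ in range(1000)]
worst = max(hadamard_ratio(t) for t in samples)          # stays at or below 1
at_identity = hadamard_ratio([[1.0, 0.0], [0.0, 1.0]])   # equality case
```

With the standard basis and unit weights in place of the frame, the same ratio is the one bounded by the classical Hadamard inequality.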
Theorem 4.1 gives us one simple variational expression for , namely
There is however a simpler variational formula for over an even lower dimensional space, as suggested by the fact that is also the sharp constant in the Brascamp–Lieb inequality. By the classical theorem of Brascamp and Lieb, may be computed by taking the functions in the Brascamp–Lieb inequality to be centered Gaussians; i.e.,
and varying the numbers . This leads directly to the variational expression (4.2) for . Let us recall that the existence of optimizers for this problem was proved by Brascamp and Lieb under the hypothesis that every set of vectors chosen from is linearly independent and later proved by Barthe for . The next theorem gives the complete result. Although the variational formula (4.2) can be deduced by duality, we give a direct proof of it starting from the subadditivity inequality.
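The Gaussian variational expression can be explored numerically in a small case. The sketch assumes that (4.2) has the usual Brascamp–Lieb form $D = \tfrac12 \sup_{s_j>0}\big[\sum_j c_j \ln s_j - \ln\det\big(\sum_j c_j s_j\, a_j\otimes a_j\big)\big]$, obtained from the trial functions $e^{-\pi s_j t^2}$. The frame of three unit vectors at 120 degrees with weights $2/3$ and the random search over $s$ are hypothetical choices, and for this frame the supremum is expected to be $0$.

```python
import math
import random

# Hypothetical data: unit vectors at 120 degrees in the plane, weights 2/3.
frame = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)) for k in range(3)]
weights = [2.0 / 3.0] * 3

def gaussian_functional(s):
    """(1/2) [ sum_j c_j ln s_j - ln det( sum_j c_j s_j a_j a_j^T ) ] in dimension 2."""
    m00 = sum(c * sj * a[0] * a[0] for a, c, sj in zip(frame, weights, s))
    m01 = sum(c * sj * a[0] * a[1] for a, c, sj in zip(frame, weights, s))
    m11 = sum(c * sj * a[1] * a[1] for a, c, sj in zip(frame, weights, s))
    det = m00 * m11 - m01 * m01
    return 0.5 * (sum(c * math.log(sj) for c, sj in zip(weights, s)) - math.log(det))

random.seed(1)
trials = [[random.uniform(0.1, 10.0) for _ in range(3)] for _ in range(1000)]
best_random = max(gaussian_functional(s) for s in trials)   # never exceeds 0
at_equal = gaussian_functional([1.0, 1.0, 1.0])             # attains the value 0
```

For this particular frame the bound reduces to the arithmetic-geometric mean inequality for the products $s_1 s_2$, $s_2 s_3$, $s_1 s_3$, which explains why equal values of the $s_j$ are optimal.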
The supremum in (4.2) is attained if and only if is totally reducible for . Moreover,
for all . Thus, by Jensen’s inequality,
with equality exactly when for all . Therefore, for all ,
Moreover, as we see from the proof of Lemma 3.5 (based on an observation by Barthe) and Lemma 3.6 and the remarks made just above, there is equality when and is the choice of (unique up to a multiple) for which (3.20) is true. Let denote the diagonal matrix whose th diagonal entry is . Then and therefore, if we define the function by
with equality, when for some choice of ’s. The function is convex (because, as mentioned at the beginning of the previous section, the function is convex by definition), and its domain (i.e. where it is ) is . Therefore we get that
Moreover, for given and , equality in (4.4) for some means that for the corresponding values , the Gaussian is an extremizer for the variational problem defining . By Theorem 3.9, this means that is totally reducible for .
Conversely, if is totally reducible for , then the variational problem in (4.2) splits into a sum of independent and orthogonal (after a suitable linear transformation ) such problems, but of the interior type (i.e. ) for which Barthe showed optimizers to exist. Equivalently, the next Theorem 4.3 ensures that we can find a positive operator for which the decomposition of the identity (3.19) holds. Then, as mentioned in the remark after the proof of Lemma 3.6, the random vector is extremal for and setting we have that and by construction (see the proof of Lemma 3.5). This guarantees equality at all steps of our computation above and thus ensures equality in (4.4). ∎
where denotes the Legendre transform of . Since , the choice minimizes , and hence . There is a misprint in in which it is stated (in slightly different notation) that this choice of minimizes itself.
We finally return to Lemma 3.5, as we are now in a position to give necessary and sufficient conditions for the existence of the change of variables provided there.
From here, it is easy to prove the following theorem, which supersedes Lemma 3.5 and gives necessary and sufficient conditions for the existence of the change of variables considered there. This result was obtained (in the more general multidimensional setting) by Bennett-Carbery-Christ-Tao in the course of their study of the Brascamp–Lieb extremizers ; here we use the extremizers of the subadditivity of entropy inequality. Though this theorem concerns a problem in linear algebra, we do not know a direct proof of it in a purely linear algebra context, though there may be one.
if and only if the set is totally reducible for
Proof: The proof of Lemma 3.6 shows that whenever such a matrix exists, there exists an optimizer for the subadditivity inequality. Thus, by Theorem 3.9, the condition that is totally reducible for is necessary.
A convolution inequality for eigenvalues
We investigate here the dual of the superadditivity of Fisher information inequality (3.16) from Proposition 3.3.
In Section 2 we have shown that the Legendre transform of the entropy provides an equivalence between subadditivity of the entropy and Brascamp-Lieb inequalities. It turns out that the Fisher information is also a convex functional and its Legendre transform is known to be the smallest eigenvalue of a Schrödinger operator. (This is used extensively in the theory of large deviations, for example). We shall use this fact to derive a subadditivity of the smallest eigenvalues of Schrödinger operators.
Then is the “ground state” eigenvalue of
provided the bottom of the spectrum is an eigenvalue, and in any case, it is the bottom of the spectrum.
where the supremum is taken over all probability densities . This gives us the analog of (2.4) for Fisher information:
with equality if and only if where . (Here, by the definition (5.1) of , is the “ground state” eigenfunction.)
The following result generalizes this to the case in which we have unit vectors satisfying (3.13):
Proof: Choose an and a probability density such that
Since is arbitrary, this proves the result. ∎
The inequality (5.3) is sharp since one can use another Legendre transform, as in the proof of Theorem 2.1, and see that it implies the sharp inequality (3.16). Inequality (5.3) could also be proved using a semi-group (or stochastic) method inspired by the one used by Borell in his study of Brunn–Minkowski type inequalities (which are, in a sense, the converse of the inequalities considered here); this would be more complicated than starting from the inequality (3.16) for the Fisher information, though.
An analogous result for functions on the sphere could be given using the sharp superadditivity of Fisher information inequality proved in .