An iterative construction of solutions of the TAP equations for the Sherrington-Kirkpatrick model
Erwin Bolthausen
Introduction
The TAP equations for the Sherrington-Kirkpatrick model describe the quenched expectations of the spin variables in a large system.
We write for the expectation under this measure. We will often drop the indices if there is no danger of confusion. We set
which have to be understood in a limiting sense, as is the solution of the equation
where is the standard normal distribution. It is known that this equation has a unique solution for (see Proposition 1.3.8). If then is the unique solution if and there are two other (symmetric) solutions when which are supposed to be the relevant ones. Mathematically, the validity of the TAP equations has only been proved in the high-temperature case, i.e. when β is small. In the physics literature it is claimed that they are also valid at low temperature, but there they have many solutions, and the Gibbs expectation has to be taken inside “pure states”. For the best mathematical results, see Chap. 1.7.
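The equation for q did not survive extraction. Assuming it has the standard form q = E tanh²(h + β√q Z) with Z standard normal, the following minimal sketch computes q numerically for h > 0. The Gaussian expectation is approximated by probabilists' Gauss-Hermite quadrature (NumPy's `hermegauss`), and the root of F(q) − q on [0, 1] is located by bisection. The helper names `overlap_map` and `solve_q` are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Probabilists' Gauss-Hermite rule (weight exp(-x**2 / 2)); normalising the
# weights to sum to 1 turns weighted sums into expectations under N(0, 1).
NODES, WEIGHTS = hermegauss(80)
WEIGHTS = WEIGHTS / WEIGHTS.sum()


def overlap_map(q, beta, h):
    """F(q) = E tanh^2(h + beta * sqrt(q) * Z), Z ~ N(0, 1)."""
    return float(np.sum(WEIGHTS * np.tanh(h + beta * np.sqrt(q) * NODES) ** 2))


def solve_q(beta, h, tol=1e-13):
    """Root of F(q) - q on [0, 1] by bisection; h > 0 is assumed (unique root)."""
    lo, hi = 0.0, 1.0  # F(0) - 0 = tanh(h)**2 > 0 and F(1) - 1 < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if overlap_map(mid, beta, h) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


q = solve_q(beta=0.5, h=1.0)
print(f"q = {q:.6f}, residual = {overlap_map(q, 0.5, 1.0) - q:.1e}")
```

Bisection is used rather than iterating q ↦ F(q) because it only needs the sign change between q = 0 and q = 1, which holds for every h > 0.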
The appearance of the so-called Onsager term is easy to understand. From standard mean-field theory, one would expect an equation
but one has to take into account the stochastic dependence between the random variables and In fact, it turns out that the above equation should be correct when one replaces by where the latter is computed under a Gibbs average dropping the interactions with the spin Therefore is independent of the and one would get
The Onsager term is an Itô-type correction, obtained by expanding the dependency of on and replacing on the right-hand side by The correction term is non-vanishing because of
i.e. exactly for the same reason as the Itô correction in stochastic calculus. We omit the details, which are explained in .
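The TAP system itself was lost in extraction. In the standard notation, which we assume here, it reads m_i = tanh(h + β Σ_j g_ij m_j − β²(1 − q) m_i), the last term being the Onsager correction. A small sketch with a hypothetical helper `tap_residual` evaluates the residual of a candidate vector m, with or without that term (the naive mean-field equation is the case without it):

```python
import numpy as np


def tap_residual(m, g, beta, h, q, onsager=True):
    """Residual m - tanh(h + beta * g @ m - beta**2 * (1 - q) * m).

    With onsager=False the correction term is dropped, which gives the
    naive mean-field equation m = tanh(h + beta * g @ m).
    """
    field = h + beta * (g @ m)
    if onsager:
        field = field - beta ** 2 * (1.0 - q) * m  # Onsager reaction term
    return m - np.tanh(field)


rng = np.random.default_rng(0)
n = 5
a = rng.normal(size=(n, n)) / np.sqrt(n)
g = np.triu(a, 1) + np.triu(a, 1).T  # symmetric, zero diagonal
m = np.full(n, np.tanh(0.3))
# At beta = 0 both versions reduce to m = tanh(h), so the residual vanishes.
print(np.max(np.abs(tap_residual(m, g, beta=0.0, h=0.3, q=0.5))))
```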
In the present paper, there are no results about SK itself. We introduce an iterative approximation scheme for solutions of the TAP equations which is shown to converge below and at the de Almeida-Thouless (AT) line, i.e. under condition (2.1) below (see ). This line is supposed to separate the high-temperature region from the low-temperature one, but although the full Parisi formula for the free energy of the SK model has been proved by Talagrand, there is no proof yet that the AT line is the correct phase separation line.
The iterative scheme we propose reveals, we believe, an interesting structure of the dependence of the on the family even below the AT line. The main technical result, Proposition 2.5, is proved at all temperatures, but beyond the AT line it does not give much information.
We finish the section by introducing some notations.
As mentioned above, we suppress N in the notation as far as possible, but this parameter is present everywhere.
is a Gaussian -matrix where the for are independent centered Gaussians with variance , and where We will exclusively reserve the notation for such a Gaussian matrix.
We will use as generic standard Gaussians. Whenever several of them appear in the same formula, they are assumed to be independent, without special mention. We then write when taking expectations with respect to them. (This notation is simply an outflow of the abhorrence probabilists have of using integral signs, as John Westwater once put it.)
provided there exists a constant such that
Clearly, if and , then
We will use as a generic positive constant, not necessarily the same at different occurrences. It may depend on and on the level of the approximation scheme appearing in the next section, but on nothing else, unless stated otherwise.
In order to avoid endless repetitions of the parameters and we use the abbreviation
We always assume and as there is a symmetry between the signs, we assume will exclusively be used for the unique solution of (1.2). In the case there is a unique solution of (1.2) which is positive. Proposition 2.5 is valid in this case, too, but this does not lead to a useful result. So, we stick to the case.
Gaussian random variables are always assumed to be centered.
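The two notational conventions above can be mirrored numerically. A minimal sketch, assuming the usual SK normalisation (the variance symbol in the definition of g is lost; we take independent N(0, 1/N) entries above the diagonal and zero diagonal) and approximating the expectation over independent standard Gaussians by tensor-product Gauss-Hermite quadrature. Both helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def sample_coupling(n, rng):
    """Symmetric n x n Gaussian matrix: zero diagonal, independent
    N(0, 1/n) entries above the diagonal (assumed normalisation)."""
    a = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))
    upper = np.triu(a, k=1)
    return upper + upper.T


def gauss_expect(f, dim=1, n_nodes=60):
    """E f(Z_1, ..., Z_dim) for independent standard Gaussians Z_i,
    via a tensor-product probabilists' Gauss-Hermite rule."""
    x, w = hermegauss(n_nodes)
    w = w / w.sum()  # weights of the N(0, 1) law
    nodes = np.meshgrid(*([x] * dim), indexing="ij")
    weights = np.ones_like(nodes[0])
    for wi in np.meshgrid(*([w] * dim), indexing="ij"):
        weights = weights * wi
    return float(np.sum(weights * f(*nodes)))


rng = np.random.default_rng(1)
g = sample_coupling(400, rng)
print(np.array_equal(g, g.T), np.all(np.diag(g) == 0))
print(gauss_expect(lambda z: z ** 4))                  # exact value: E Z^4 = 3
print(gauss_expect(lambda z, y: (z * y) ** 2, dim=2))  # exact value: 1
```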
The recursive scheme for the solutions of the TAP equations
here the vector with coordinates all and is the unique solution of (1.2). We define
will exclusively be used to number this level of the iteration. Our main result is
Assume If is below the AT-line, i.e. if
If there is strict inequality in (2.1), then there exist and such that for all
The theorem is a straightforward consequence of a computation of the inner products We explain that first. The actual computation of these inner products will be quite involved and will depend on clarifying the structural dependence of on
where as usual, are independent standard Gaussians. Remember that
satisfies and is strictly increasing and convex on
and follow directly from the definition of We compute the first two derivatives of
the second equality by Gaussian partial integration.
In both expressions, we can first integrate out getting
and the similar expression for with replaced by So, we see that is increasing and convex. Furthermore, as
If (2.1) is satisfied, then is the only fixed point of in the interval If (2.1) is not satisfied then there is a unique fixed point of inside the interval
If there is strict inequality in (2.1), then and converge to exponentially fast.
We prove by induction on that For as the statement follows.
i.e. As the statement follows.
c) Linearization of around easily shows that the convergence is exponentially fast if ∎
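The properties of ψ used in the last two proofs can be checked numerically. Since the definition of ψ is among the lost displays, this sketch assumes ψ(t) = E[tanh(h + β√t Z + β√(q−t) Z′) tanh(h + β√t Z + β√(q−t) Z″)] for t in [0, q]. Integrating out Z′ and Z″ first, as in the proof, this equals E[(E_Z′ tanh(h + β√t Z + β√(q−t) Z′))²], which we evaluate by nested Gauss-Hermite quadrature. We also assume that (2.1) reads β² E cosh⁻⁴(h + β√q Z) ≤ 1, i.e. ψ′(q) ≤ 1. Helper names and parameter values are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

X, W = hermegauss(60)
W = W / W.sum()  # expectation weights under N(0, 1)


def solve_q(beta, h, tol=1e-13):
    """Root of q = E tanh^2(h + beta sqrt(q) Z) on [0, 1] by bisection (h > 0)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.sum(W * np.tanh(h + beta * np.sqrt(mid) * X) ** 2) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def psi(t, q, beta, h):
    """psi(t) = E_Z[(E_Z' tanh(h + beta sqrt(t) Z + beta sqrt(q - t) Z'))^2]."""
    t = min(max(t, 0.0), q)  # guard against round-off outside [0, q]
    inner = np.tanh(h + beta * np.sqrt(t) * X[:, None] + beta * np.sqrt(q - t) * X[None, :])
    cond_mean = inner @ W  # E over Z' for each quadrature node of Z
    return float(np.sum(W * cond_mean ** 2))


beta, h = 0.5, 1.0
q = solve_q(beta, h)
ts = np.linspace(0.0, q, 50)
vals = np.array([psi(t, q, beta, h) for t in ts])
eps = 1e-6
left_slope = (psi(q, q, beta, h) - psi(q - eps, q, beta, h)) / eps
at_value = beta ** 2 * np.sum(W / np.cosh(h + beta * np.sqrt(q) * X) ** 4)
print(f"psi(0) = {vals[0]:.4f}, psi(q) - q = {vals[-1] - q:.1e}")
print(f"increasing: {np.all(np.diff(vals) > 0)}, convex: {np.all(np.diff(vals, 2) > -1e-10)}")
print(f"left slope at q = {left_slope:.6f}, beta^2 E cosh^-4 = {at_value:.6f}")
```

The finite-difference slope at q agrees with β² E cosh⁻⁴(h + β√q Z) because, by Gaussian interpolation, ψ′(t) is β² times the analogous expression with tanh replaced by its derivative 1/cosh².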
Remark that by a) of the above lemma, one has for all
As the variables are bounded, (2.5) implies
Taking the limit, using Proposition 2.5, this converges to From Lemma 2.4, the claim follows. ∎
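The recursion defining the numbers ρ_k is also lost. Assuming it is the fixed-point iteration ρ_{k+1} = ψ(ρ_k) (consistent with the induction and linearisation arguments in the proof of Lemma 2.4), and starting by assumption from ρ_0 = 0, the following sketch illustrates the dichotomy behind the theorem: below the AT line the ρ_k increase to q, while above it they stall at a fixed point strictly below q. ψ and (2.1) are taken in the assumed forms ψ(t) = E[(E_Z′ tanh(h + β√t Z + β√(q−t) Z′))²] and β² E cosh⁻⁴(h + β√q Z) ≤ 1; the parameters are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

X, W = hermegauss(100)
W = W / W.sum()


def solve_q(beta, h, tol=1e-13):
    """Root of q = E tanh^2(h + beta sqrt(q) Z) on [0, 1] by bisection (h > 0)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.sum(W * np.tanh(h + beta * np.sqrt(mid) * X) ** 2) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def psi(t, q, beta, h):
    """Assumed form of psi, with Z' integrated out first."""
    t = min(max(t, 0.0), q)
    inner = np.tanh(h + beta * np.sqrt(t) * X[:, None] + beta * np.sqrt(q - t) * X[None, :])
    return float(np.sum(W * (inner @ W) ** 2))


def at_value(q, beta, h):
    """beta^2 E cosh^-4(h + beta sqrt(q) Z); (2.1) is assumed to read at_value <= 1."""
    return float(beta ** 2 * np.sum(W / np.cosh(h + beta * np.sqrt(q) * X) ** 4))


def rho_sequence(beta, h, n_iter=200):
    """Iterate rho_{k+1} = psi(rho_k) from the assumed start rho_0 = 0."""
    q = solve_q(beta, h)
    rho = [0.0]
    for _ in range(n_iter):
        rho.append(psi(rho[-1], q, beta, h))
    return q, np.array(rho)


for beta, h in [(0.5, 1.0), (3.0, 0.1)]:
    q, rho = rho_sequence(beta, h)
    print(f"beta={beta}, h={h}: AT value {at_value(q, beta, h):.3f}, "
          f"q={q:.4f}, rho_200={rho[-1]:.4f}")
```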
Proposition 2.5 is true for all temperatures. However, beyond the AT-line, it does not give much information on the behavior of the for large It would be very interesting to know if these iterates satisfy some structural properties beyond the AT-line.
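The iterates themselves can be simulated for one sample of the interaction matrix. The defining display of the scheme was lost, so this sketch assumes the form m^(0) = 0, m^(1) = √q 1 and m^(k+1) = tanh(h + β g m^(k) − β²(1 − q) m^(k−1)), with the Onsager term acting on the previous iterate, and the normalisation of g used above (zero diagonal, variance 1/N). Below the AT line the normalised step ‖m^(k) − m^(k−1)‖²/N should become small and ‖m^(k)‖²/N should stay close to q; above it, the output only illustrates finite-N behaviour, about which Proposition 2.5 says little. `tap_iterates` is a hypothetical name.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def solve_q(beta, h, n_nodes=100, tol=1e-13):
    """Root of q = E tanh^2(h + beta sqrt(q) Z) by bisection (h > 0 assumed)."""
    x, w = hermegauss(n_nodes)
    w = w / w.sum()
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.sum(w * np.tanh(h + beta * np.sqrt(mid) * x) ** 2) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tap_iterates(n, beta, h, n_iter, seed=0):
    """Run the assumed scheme m^(k+1) = tanh(h + beta g m^(k) - beta^2 (1-q) m^(k-1))."""
    rng = np.random.default_rng(seed)
    a = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))
    g = np.triu(a, 1) + np.triu(a, 1).T  # symmetric, zero diagonal, variance 1/n
    q = solve_q(beta, h)
    prev, cur = np.zeros(n), np.full(n, np.sqrt(q))  # m^(0), m^(1)
    iterates = [prev, cur]
    for _ in range(n_iter):
        # The right-hand side is evaluated with the old (prev, cur) = (m^(k-1), m^(k)).
        prev, cur = cur, np.tanh(h + beta * (g @ cur) - beta ** 2 * (1.0 - q) * prev)
        iterates.append(cur)
    return q, iterates


for beta, h in [(0.5, 1.0), (3.0, 0.1)]:
    q, ms = tap_iterates(2000, beta, h, n_iter=30)
    step = np.mean((ms[-1] - ms[-2]) ** 2)  # ||m^(k) - m^(k-1)||^2 / N
    print(f"beta={beta}, h={h}: q={q:.4f}, |m|^2/N={np.mean(ms[-1] ** 2):.4f}, step={step:.2e}")
```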
The main task is to prove Proposition 2.5. It follows by an involved induction argument. We first remark that (2.7) is a consequence of (2.5) and (2.6).
implies that for all we have with
Evidently, all variables are bounded by a constant on if The constant may depend on of course. The are bounded by everywhere.
Iterative modifications of the interaction variables
Let be a sub--field of and be a random matrix. We are only interested in the case where is symmetric and on the diagonal, but this is not important for the moment. We assume that is jointly Gaussian, conditioned on i.e. there is a positive semidefinite -measurable matrix such that
(We do not assume that is Gaussian, unconditionally). Consider a -measurable random vector , and the linear space of random variables
We consider the linear projection of onto which is defined to be the unique matrix with components in which satisfy
As is assumed to be conditionally Gaussian, given it follows that is conditionally independent of the variables in given
If is symmetric, then clearly is symmetric, too.
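When the conditioning σ-field is trivial, the projection above is ordinary Gaussian regression: the linear projection of a centred Gaussian Y onto the span of X_1, ..., X_k is Σ_YX Σ_XX⁻¹ X, and the residual is uncorrelated with, hence independent of, the X_j. A Monte Carlo sketch of this special case, with an arbitrary illustrative covariance:

```python
import numpy as np

rng = np.random.default_rng(2)
# Illustrative covariance of (Y, X1, X2); any positive definite matrix works.
cov = np.array([[2.0, 0.8, -0.5],
                [0.8, 1.0, 0.3],
                [-0.5, 0.3, 1.5]])
samples = rng.multivariate_normal(np.zeros(3), cov, size=200_000)
y, x = samples[:, 0], samples[:, 1:]

# Projection coefficients Sigma_XX^{-1} Sigma_XY (a linear solve instead of an inverse).
coef = np.linalg.solve(cov[1:, 1:], cov[1:, 0])
residual = y - x @ coef

# Empirical covariance between the residual and each X_j; it should be close to zero.
print(residual @ x / len(y))
```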
If is a -measurable random variable then is conditionally Gaussian as well and
is conditionally Gaussian, given
and are -measurable
i.e. we perform the above construction with and .
In order that the construction is well defined, we have to inductively prove the properties (C1) and (C2). We actually prove a condition which is stronger than (C1):
Conditionally on is Gaussian, and conditionally independent of
(C1’) implies that is conditionally Gaussian, given and the conditional law, given is the same as given
The case is trivial. We first prove (C2) for using (C1’), (C2) up to We claim that
where stands for a generic -measurable random variable, not necessarily the same at different occurrences.
As and is -measurable, by the induction hypothesis, it follows from (3.2) that is -measurable The statements for are then trivial consequences.
We therefore have to prove (3.2). We prove by induction on that
The case follows from the definition of and the case is (3.2).
Assume that (3.3) is true for We replace by through the recursive definition
as is -measurable and therefore is -measurable
Using (3.1), one gets and therefore
This proves (3.2), and therefore (C2) for
We condition on By (C2), is -measurable As conditioned on , is Gaussian, and independent of it has the same distribution also conditioned on By the construction of this variable is, conditioned on independent of and conditionally Gaussian. ∎
The proof is by induction on For there is nothing to prove.
Assume that the statement is proved up to We want to prove for The case is covered by (3.1). For it follows by Remark 3.1, as is -measurable, that
as by the symmetry of and the induction hypothesis. ∎
We write for a generic -measurable random variable which satisfies
for some The constants here may depend on and the level and on the formula where they appear, but on nothing else, in particular not on and any further indices. For instance, if we write
we mean that there exists with
Furthermore, in such a case, it is tacitly assumed that are -measurable
Evidently, if are then is and if is and is then is
We will finally prove the validity of the following relations:
The are real numbers, not random variables, which depend on only through the type of subset which is taken. For instance, there is only one number (for every ) if all four indices are taken.
The main point with assuming is (2.9). On the variables are bounded for
Assume (4.1) - (4.3) for and (2.9). Then
a) As is -measurable, and is independent of conditionally on we get
Using (4.1), (4.2), and the boundedness of the ’s on , and for , we get
We split the sum over into the one summand , in and The one summand gives
Because for this is seen to be The same applies to
Take e.g. Then with no further dependence of this number on So we get for this part for any summand on with
Using again we get that this is This applies in the same way to all the parts. Therefore b) follows.
due to the orthogonality of the
where the -measurable coefficients satisfy
The existence of -measurable coefficients comes from linear algebra.
Therefore, we can replace the by
which satisfy the desired property (4.9).
We keep fixed for the moment and write for The requirement for them is that for all
Due to the orthonormality of the one gets
Writing for the error term in (4.5), and for the error term in (4.4), we arrive at
In the first summand, we sum now over all remarking that we have assumed that for The error for not summing over the single can be incorporated into We therefore arrive at
Write for the matrix and for Then we have to invert the matrix Remark that Therefore
We can expand the right-hand side as a Neumann series:
As we get the desired conclusion. ∎
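The matrix inverted in this proof is lost in extraction, but the underlying fact is generic: if ‖A‖ < 1 in an operator norm, then (I − A)⁻¹ = Σ_{n≥0} Aⁿ, and truncating after K terms leaves an error of at most ‖A‖^K / (1 − ‖A‖). A small sketch with a random matrix rescaled to spectral norm 0.3; `neumann_inverse` is a hypothetical helper.

```python
import numpy as np


def neumann_inverse(a, n_terms):
    """Approximate (I - a)^{-1} by the truncated Neumann series sum_{n < n_terms} a^n."""
    result = np.eye(a.shape[0])
    power = np.eye(a.shape[0])
    for _ in range(1, n_terms):
        power = power @ a
        result = result + power
    return result


rng = np.random.default_rng(3)
a = rng.normal(size=(6, 6))
a *= 0.3 / np.linalg.norm(a, 2)  # rescale so the spectral norm is 0.3 < 1
exact = np.linalg.inv(np.eye(6) - a)
for k in (2, 5, 20):
    err = np.linalg.norm(neumann_inverse(a, k) - exact, 2)
    bound = 0.3 ** k / (1 - 0.3)
    print(k, err, err <= bound)
```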
The summands involving the all only give contributions which enter the -terms. Take for instance In that case, the claimed -term is In the last summand of (4.10), there is one summand, namely where the are so this summand is only
The other summands, as well as the third and fourth summands in (4.10), behave similarly.
As another case, take where we have to get for the second to fourth summand in (4.10).
We write for the summand we get by multiplying the -th summand in the first bracket with the -th in the second. By the induction hypothesis, we get
In the -term, only the multiplication of with counts, the other part giving Therefore
gives the same. In again only the matching of with counts, so we get
The other parts are easily seen to give We have therefore proved that
c) We have here
The and -terms are clearly of the desired form, either from induction hypothesis or Lemma 4.2.
For we get for the expectation so this is of the desired form. The same applies to It therefore remains
As the whole expression is The other cases are handled similarly. ∎
Proof of Proposition 2.5
We assume and (4.1) - (4.3) for By Proposition 4.1 of the last section, this implies (4.1) - (4.3) for Using this, we prove now (2.5) and (2.6) for so that we have proved Having achieved this, the proof of Proposition 2.5 is complete.
Remark that under
From by (2.5) and (2.6) for and the fact that the are uniformly bounded on we have
Remark that by Lemma 3.2, we have Evidently
This proposition is correct for all The key point with (2.1) is that the first summand disappears for as so that for large stabilizes to but above the AT-line does not converge to Therefore, above the AT-line, in every iteration, new conditionally independent contributions appear.
The above proposition is proved by showing that implies
As implies trivially for it is then clear that implies for all As the are uniformly bounded by we get from that
for all We will then prove (Lemma 5.3) that
This will prove and therefore, this will have finished the whole induction procedure.
Together with proving (5.3), we also show
for which is not evident from (5.3) as the are not bounded.
Assume the validity of (2.5)-(2.7) and (5.4) for Then for
We prove by induction on that
and define where is replaced by Remark that
which prove the desired induction in
To switch from to we observe that by the estimates of Lemma 4.3, one has
By choosing large enough, we get for by Corollary A.2 a)
For the bound is trivial anyway. This proves (5.7) for (5.8) follows in the same way using Corollary A.2 b).
on (5.7) for then follows from Corollary A.2 c). As for (5.8), we remark that
We can then again use Corollary A.2 c) remarking that for
(5.7) for follows from the induction hypothesis (2.7), and Corollary A.2 a). Similarly with (5.8) but here, one has to use part b) of Corollary A.2.
on and one uses the induction hypothesis (5.4) for to get (5.7) for Remark that actually, one has a bound uniform in
Therefore, one also gets (5.8) using Corollary A.2. Up to now, we have obtained
and we can therefore replace on the right hand side, by for or for which is the same as replacing by Therefore, the lemma is proved. ∎
We assume .
We condition on Then is conditionally Gaussian with covariances given in Lemma 4.2 a), b). We can therefore apply Lemma A.3 which gives, conditionally on on an event which has probability
Applying now Lemma A.3 successively to , we get
The case uses a minor modification of the argument. One first uses Lemma A.3 successively to get
b) This also follows from a modification of the reasoning in a).
In the case the outcome is similar, one only has to replace the second factor by
The next observation is that by the induction hypothesis, one can replace by and we get
The important point is that the factor before is replaced by a constant, which is due to the induction hypothesis. We can now proceed in the same way by applying Lemma A.3 again, conditioned on and using the induction hypothesis. The final outcome is
For the latter case, the right hand side is simply For the case we can rewrite the expression on the right hand side as
Solving, we get and
Appendix A
then Therefore, we can represent the as
where the are i.i.d. standard Gaussians. Then
By choosing appropriate, we get the desired estimate.
To prove (A.2), we use the same representation. As
for large enough we get the desired conclusion. ∎
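Representing correlated Gaussians as linear combinations of i.i.d. standard Gaussians is standard. The representation actually used in this proof is lost in extraction; the Cholesky factorisation below is only one convenient choice and an assumption of this sketch, with an illustrative covariance.

```python
import numpy as np

rng = np.random.default_rng(4)
cov = np.array([[1.0, 0.5, 0.2],
                [0.5, 1.0, 0.4],
                [0.2, 0.4, 1.0]])
chol = np.linalg.cholesky(cov)  # cov = chol @ chol.T, chol lower triangular
z = rng.standard_normal(size=(3, 100_000))  # i.i.d. standard Gaussians
x = chol @ z  # each column is a sample from N(0, cov)
print(np.round(np.cov(x), 2))
```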
Assume and
For any there exist such that
For any there exist such that
If are -measurable with
so that we see that it suffices to consider Then we apply the lemma, part b).
As for c), we have that the conditional distribution of , given is Gaussian, with bounded variance. So the statement follows. ∎
and there are vectors fixed, which are bounded in all indices, such that
We leave out in notations, as often as possible. Consider
The constant will be specified below. Then
where is a bound on the Lipschitz constants for the and is a bound on the
We choose large enough such that the -matrix which is off diagonal, and
on the diagonal is positive definite. This is possible as
Let be a Gaussian matrix with covariance matrix Then
has the same distribution as Here we assume that is independent of the ’s. So, we assume that the are represented in this way.
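The choice of the diagonal constant above is always possible: for a symmetric matrix B with zero diagonal, B + cI is positive definite exactly when c > −λ_min(B). The entries of the matrix in the proof are lost in extraction, so this sketch uses a random symmetric B and a hypothetical helper `min_diagonal_shift`; the Cholesky factorisation succeeds only for positive definite input.

```python
import numpy as np


def min_diagonal_shift(off_diag):
    """Threshold c* with off_diag + c I positive definite iff c > c* (off_diag symmetric)."""
    return -np.linalg.eigvalsh(off_diag)[0]  # eigvalsh returns eigenvalues in ascending order


rng = np.random.default_rng(5)
b = rng.normal(size=(5, 5))
b = np.triu(b, 1) + np.triu(b, 1).T  # symmetric with zero diagonal
c = min_diagonal_shift(b) + 1e-3
np.linalg.cholesky(b + c * np.eye(5))  # raises LinAlgError if not positive definite
print(f"c = {c:.4f}")
```

Because B has zero trace, its smallest eigenvalue is negative (unless B = 0), so the threshold is always strictly positive.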
We can apply Lemma A.1 to the vector , and (A.3) to the first summand on the right-hand side, obtaining
follows by standard Gaussian isoperimetry (see e.g. ). ∎
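The Gaussian concentration inequality invoked here states that for an L-Lipschitz function F on Rⁿ and a standard Gaussian vector Z, P(|F(Z) − E F(Z)| ≥ t) ≤ 2 exp(−t²/(2L²)). A Monte Carlo sanity check with the illustrative 1-Lipschitz function F(z) = n^(−1/2) Σ_i tanh(z_i), using the empirical mean in place of E F(Z):

```python
import numpy as np

rng = np.random.default_rng(6)
n, n_samples = 50, 100_000
z = rng.standard_normal(size=(n_samples, n))
f = np.tanh(z).sum(axis=1) / np.sqrt(n)  # F(z) = n^{-1/2} sum_i tanh(z_i), 1-Lipschitz
centered = f - f.mean()  # empirical mean stands in for E F(Z)
for t in (0.5, 1.0, 1.5, 2.0):
    empirical = np.mean(np.abs(centered) >= t)
    bound = 2 * np.exp(-t ** 2 / 2)  # Lipschitz constant L = 1
    print(f"t={t}: empirical {empirical:.4f} <= bound {bound:.4f}")
```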