Stein's method for dependent random variables occurring in Statistical Mechanics

Peter Eichelsbacher, Matthias Löwe

Introduction and main result

There is a long tradition of considering mean–field models in statistical mechanics. The Curie–Weiss model is famous because it explicitly exhibits a number of properties of real substances, such as multiple phases and metastable states. The aim of this paper is to prove Berry-Esseen bounds for the sums of dependent random variables occurring in statistical mechanics under the name Curie-Weiss models. To this end, we will develop Stein’s method for exchangeable pairs (see ) for a rich class of distributional approximations. For an overview of results on the Curie–Weiss models and related models, see , , .

Here $\beta:=T^{-1}$ is the inverse temperature and $Z_{\Lambda}(\beta)$ is a normalizing constant known as the partition function and $|\Lambda|$ denotes the cardinality of $\Lambda$ . Moreover $\varrho$ is the distribution of a single spin in the limit $\beta\to 0$ . We define $S_{\Lambda}=\sum_{i\in\Lambda}X_{i}^{\Lambda}$ , the total magnetization inside $\Lambda$ . We take without loss of generality $d=1$ and $\Lambda=\{1,\ldots,n\}$ , where $n$ is a positive integer. We write $n$ , $X_{i}^{(n)}$ , $P_{n,\beta}$ and $S_{n}$ , respectively, instead of $|\Lambda|$ , $X_{i}^{\Lambda}$ , $P_{\Lambda,\beta}$ , and $S_{\Lambda}$ , respectively. In the case where $\beta$ is fixed we may even sometimes simply write $P_{n}$ .
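For orientation, we recall the finite-volume Gibbs measure in the form used by Ellis and Newman. We assume it coincides with the definition of the general case referred to below, which is not reproduced in this version. In the notation $n$, $S_n$, $P_{n,\beta}$ just introduced:

```latex
P_{n,\beta}(dx_1,\ldots,dx_n)
  = \frac{1}{Z_{n}(\beta)}
    \exp\Big(\frac{\beta}{2n}\Big(\sum_{i=1}^{n}x_i\Big)^{2}\Big)
    \prod_{i=1}^{n}\varrho(dx_i),
\qquad
Z_{n}(\beta) = \int_{\mathbb{R}^{n}}
    \exp\Big(\frac{\beta}{2n}\Big(\sum_{i=1}^{n}x_i\Big)^{2}\Big)
    \prod_{i=1}^{n}\varrho(dx_i).
```

The interaction depends on the configuration only through $S_n$. This is why the total magnetization plays the central role below.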

In the classical Curie–Weiss model, spins are distributed in $\{-1,+1\}$ according to $\varrho=\frac{1}{2}(\delta_{-1}+\delta_{1})$ . More generally, the Curie–Weiss model can carry an additional parameter $h>0$ , called the external magnetic field, which leads to the modified measure given by

The measure $P_{n,\beta,h}$ is completely determined by the value of the total magnetization, which is therefore called an order parameter; its behaviour will be studied in this paper. The non-negative external magnetic field strength may even depend on the site:

In the general case (1.1), we will see (analogously to the treatment in ) that the asymptotic behaviour of $S_{n}$ depends crucially on the extremal points of a function $G$ (which is a transform of the rate function in a corresponding large deviation principle): define

We shall drop $\beta$ in the notation for $G$ whenever there is no danger of confusion; similarly, we will suppress $\varrho$ in the notation for $\phi$ and $G$ . For any measure $\varrho\in\mathcal{B}$ , it is proved in [12, Lemma 3.1] that $G$ has global minima and that there are only finitely many of them. Define $C=C_{\varrho}$ to be the discrete, non–empty set of minima (local or global) of $G$ . If $\alpha\in C$ , then there exists a positive integer $k:=k(\alpha)$ and a positive real number $\mu:=\mu(\alpha)$ such that

The numbers $k$ and $\mu$ are called the type and strength, respectively, of the extremal point $\alpha$ . Moreover, we define the maximal type $k^{*}$ of $G$ by the formula

Note that $\mu(\alpha)$ can be calculated explicitly: one gets

An interesting point is that the global minima of $G$ of maximal type correspond to stable states: multiple minima represent a mixed phase, and a unique global minimum represents a pure phase. For details see the discussions in .
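The objects $\phi$, $G$, type and strength are the standard ones of Ellis and Newman. Since the displays are not reproduced here, we recall them in the form we assume is intended. They agree with the values $\sigma^{2}=([\phi''(\beta\alpha)]^{-1}-\beta)^{-1}$ and $\mu(0)=\beta-\beta^{2}$ (when $\phi''(0)=1$) used later in the text:

```latex
\begin{aligned}
\phi(s) &= \log\int_{\mathbb{R}} e^{sx}\,\varrho(dx), \qquad
G(s) = \frac{\beta s^{2}}{2} - \phi(\beta s),\\
G(s) &= G(\alpha) + \frac{\mu\,(s-\alpha)^{2k}}{(2k)!}
        + \mathcal{O}\big((s-\alpha)^{2k+1}\big) \quad (s\to\alpha),\\
\mu(\alpha) &= \beta - \beta^{2}\phi''(\beta\alpha) \quad (k=1), \qquad
\mu(\alpha) = -\beta^{2k}\phi^{(2k)}(\beta\alpha) \quad (k\geq 2).
\end{aligned}
```

For the symmetric Bernoulli measure, $\phi(s)=\log\cosh s = s^{2}/2 - s^{4}/12 + \ldots$, so $\phi''(0)=1$ and $\phi^{(4)}(0)=-2$. At $\beta=1$ the origin is therefore of type $k=2$ with strength $\mu=2$.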

The following is known about the fluctuation behaviour of $S_{n}$ under $P_{n}$ . In the classical model ( $\varrho$ is the symmetric Bernoulli measure) with $0<\beta<1$ , the following Central Limit Theorem is proved in :

in distribution with respect to the Curie–Weiss finite volume Gibbs states with $\sigma^{2}(\beta)=(1-\beta)^{-1}$ . Since for $\beta=1$ the variance $\sigma^{2}(\beta)$ diverges, the Central Limit Theorem fails at the critical point. In it is proved that for $\beta=1$ there exists a random variable $X$ with probability density proportional to $\exp(-\frac{1}{12}x^{4})$ such that as $n\to\infty$

in distribution with respect to the finite-volume Gibbs states. Asymptotic independence properties and propagation of chaos for blocks of size $o(n)$ have been investigated in .
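Written out in the notation above, the two classical results just quoted read:

```latex
\begin{aligned}
\frac{S_n}{\sqrt{n}} &\xrightarrow{\;d\;} N\big(0,\sigma^{2}(\beta)\big),
  \qquad \sigma^{2}(\beta)=(1-\beta)^{-1},\quad 0<\beta<1,\\
\frac{S_n}{n^{3/4}} &\xrightarrow{\;d\;} X,
  \qquad P(X\in dx)=\frac{\exp(-x^{4}/12)}{\int_{\mathbb{R}}\exp(-y^{4}/12)\,dy}\,dx,
  \quad \beta=1.
\end{aligned}
```

The exponent $x^{4}/12$ equals $\mu x^{2k}/(2k)!$ with $k=2$ and $\mu=2$. These are the type and strength of the origin for the symmetric Bernoulli measure at $\beta=1$.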

In general, given $\varrho\in\mathcal{B}$ , let $\alpha$ be one of the global minima of maximal type $k$ and strength $\mu$ of $G_{\varrho}$ . Then

in distribution, where $X_{k,\mu,\beta}$ is a random variable with probability density $f_{k,\mu,\beta}$ , defined by

Here, $\sigma^{2}=\frac{1}{\mu}-\frac{1}{\beta}$ so that for $\mu=\mu(\alpha)$ as in (1.6), $\sigma^{2}=([\phi^{\prime\prime}(\beta\alpha)]^{-1}-\beta)^{-1}$ (see , ). Moderate deviation principles have been investigated in .
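Assuming the standard form of Ellis and Newman, the limit statement above is $\frac{S_n-n\alpha}{n^{1-1/(2k)}}\to X_{k,\mu,\beta}$, and the density $f_{k,\mu,\beta}$ is

```latex
f_{k,\mu,\beta}(x) =
\begin{cases}
\dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\dfrac{x^{2}}{2\sigma^{2}}\Big), & k=1,\\[10pt]
\dfrac{\exp\big(-\mu x^{2k}/(2k)!\big)}{\int_{\mathbb{R}}\exp\big(-\mu y^{2k}/(2k)!\big)\,dy}, & k\geq 2.
\end{cases}
```

For $k=1$ the normalization is therefore the usual $\sqrt{n}$. For $k\geq 2$ the fluctuations of $S_n$ are of the larger order $n^{1-1/(2k)}$.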

In and , a class of measures $\varrho$ is described exhibiting a behaviour similar to that of the classical Curie–Weiss model. Assume that $\varrho$ is any symmetric measure that satisfies the Griffiths-Hurst-Sherman (GHS) inequality,

(see also ). One can show that in this case $G$ has the following properties. There exists a value $\beta_{c}$ , the inverse critical temperature, such that $G$ has a unique global minimum at the origin for $0<\beta\leq\beta_{c}$ and exactly two global minima, of equal type, for $\beta>\beta_{c}$ . At $\beta=\beta_{c}$ the unique global minimum is of type $k\geq 2$ , whereas for $\beta\in(0,\beta_{c})$ it is of type 1. At $\beta_{c}$ the law of large numbers still holds, but the fluctuations of $S_{n}$ are of larger order than $\sqrt{n}$ . The inverse critical temperature can be computed explicitly as $\beta_{c}=1/\phi^{\prime\prime}(0)=1/\operatorname{Var}_{\varrho}(X_{1})$ . By rescaling the $X_{i}$ we may thus assume that $\beta_{c}=1$ .

Alternatively, the GHS-inequality can be formulated in terms of $Z_{n,\beta,h_{1},\ldots,h_{n}}$ , defined in (1.3):

We denote by GHS the set of measures $\varrho\in\mathcal{B}$ such that the GHS-inequality (1.10) is valid (for $P_{n,\beta,h_{1},\ldots,h_{n}}$ in the sense of (1)). We will give examples in Section 7.

In [12, Lemma 4.1], for $\varrho\in\mathcal{B}$ it is proved that $G$ has a unique global minimum if and only if

where the right hand side of this strict inequality is the moment generating function of a standard normal random variable. Moreover, in the same Lemma it is proved that $G$ has a local minimum at the origin of type $k$ and strength $\mu$ if and only if

The aim of this paper is to prove the following theorems:

Let $\varrho=\frac{1}{2}\delta_{-1}+\frac{1}{2}\delta_{1}$ and $0<\beta<1$ . We have

where $\Phi_{\beta}$ denotes the distribution function of the normal distribution with expectation zero and variance $(1-\beta)^{-1}$ , and $C$ is a constant depending only on $\beta$ .
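In the classical model the law of $S_n$ is an explicit reweighted binomial law, so the Kolmogorov distance in Theorem 1.2 can be computed exactly for moderate $n$. The following Python sketch is our own illustration, and the function name `kolmogorov_distance_cw` is hypothetical. It uses $P_{n,\beta}(S_n=s)\propto\binom{n}{(n+s)/2}e^{\beta s^{2}/(2n)}$, which follows from the Gibbs measure with $\varrho=\frac12(\delta_{-1}+\delta_1)$, and compares the step distribution function of $S_n/\sqrt{n}$ with $\Phi_\beta$ at every atom.

```python
import math


# Illustrative sketch; the function name is hypothetical, not from the paper.
def kolmogorov_distance_cw(n: int, beta: float) -> float:
    """Exact sup_z |P(S_n/sqrt(n) <= z) - Phi_beta(z)| for the classical
    Curie-Weiss model (symmetric Bernoulli spins, 0 < beta < 1), where
    Phi_beta is the N(0, 1/(1 - beta)) distribution function."""
    # S_n takes the values -n, -n + 2, ..., n and
    # P(S_n = s) is proportional to binom(n, (n+s)/2) * exp(beta s^2 / (2n)).
    values = list(range(-n, n + 1, 2))
    log_w = [
        math.lgamma(n + 1)
        - math.lgamma((n + s) // 2 + 1)
        - math.lgamma((n - s) // 2 + 1)
        + beta * s * s / (2 * n)
        for s in values
    ]
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]  # numerically stabilised weights
    total = sum(w)
    sigma = 1.0 / math.sqrt(1.0 - beta)       # limiting standard deviation

    def normal_cdf(z: float) -> float:
        return 0.5 * math.erfc(-z / (sigma * math.sqrt(2.0)))

    dist, cdf = 0.0, 0.0
    for s, wi in zip(values, w):
        z = s / math.sqrt(n)
        target = normal_cdf(z)
        dist = max(dist, abs(cdf - target))   # left limit of the step CDF at z
        cdf += wi / total
        dist = max(dist, abs(cdf - target))   # value of the step CDF at z
    return dist


if __name__ == "__main__":
    for n in (100, 400, 1600):
        d = kolmogorov_distance_cw(n, beta=0.5)
        print(n, d, d * math.sqrt(n))
```

If the $\mathcal{O}(n^{-1/2})$ rate of Theorem 1.2 shows up numerically, the printed products $\sqrt{n}\,d$ should stay roughly bounded. Checking only the atoms suffices because the step function is constant between atoms while $\Phi_\beta$ is monotone.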

Let $\varrho=\frac{1}{2}\delta_{-1}+\frac{1}{2}\delta_{1}$ and $\beta=1$ . We have

Let $\varrho=\frac{1}{2}\delta_{-1}+\frac{1}{2}\delta_{1}$ and $0<\beta_{n}<\infty$ depend on $n$ in such a way that $\beta_{n}\to 1$ monotonically as $n\to\infty$ . Then the following assertions hold:

If $\beta_{n}-1=\frac{\gamma}{\sqrt{n}}$ for some $\gamma\not=0$ , we have

If $|\beta_{n}-1|\ll n^{-1/2}$ , $S_{n}/n^{3/4}$ converges in distribution to $F$ , given in (1.14). Moreover, if $|\beta_{n}-1|=\mathcal{O}(n^{-1})$ , (1.13) holds true.

If $|\beta_{n}-1|\gg n^{-1/2}$ , the Kolmogorov distance between the distribution of $\sqrt{\frac{1-\beta_{n}}{n}}\sum_{i=1}^{n}X_{i}$ and the standard normal distribution $N(0,1)$ converges to zero. Moreover, if $|\beta_{n}-1|\gg n^{-1/4}$ , we obtain

In , Barbour obtained distributional limit theorems, together with rates of convergence, for the equilibrium distributions of a variety of one-dimensional Markov population processes. In Section 3 he mentions that his results can be interpreted in the framework of . As far as we understand, his result (3.9) can be interpreted as the statement (1.13), but with the rate $n^{-1/4}$ .

In the first assertion of Theorem 1.4, our method of proof allows us to compare the distribution of $S_{n}/n^{3/4}$ alternatively with the distribution with Lebesgue-density proportional to

Being able to compare the distribution of interest with a distribution depending on $n$ (through $\beta_{n}$ ) is one of the advantages of Stein’s method. The proof of this statement follows immediately from the proof of Theorem 1.4.

If in Theorem 1.4 (2) $|\beta_{n}-1|\gg n^{-1}$ , the speed of convergence reduces to $\mathcal{O}(\sqrt{n}|1-\beta_{n}|)$ . Likewise, if in Theorem 1.4 (3) $|\beta_{n}-1|\ll n^{-1/4}$ , the speed of convergence is $\mathcal{O}(\frac{1}{n|1-\beta_{n}|})$ . This reduced speed of convergence reflects the influence of two potential limiting measures. Besides the “true” limit there is also the limit measure from part (1) of Theorem 1.4, which in these cases is relatively close to our measures of interest.
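The comparison with a law depending on $\beta_n$ can also be explored numerically. The sketch below is our own illustration with hypothetical helper names. It assumes the limiting law in Theorem 1.4 (1) has density proportional to $\exp(\gamma x^{2}/2-x^{4}/12)$, the density whose logarithmic derivative is $\psi(x)=\gamma x-x^{3}/3$. It then computes the exact Kolmogorov distance between the law of $S_n/n^{3/4}$ at $\beta_n=1+\gamma/\sqrt{n}$ and this law, using the trapezoidal rule for the limiting distribution function. The choice $\gamma=0$ is the critical case $\beta=1$ of Theorem 1.3.

```python
import math


# Illustrative sketch; helper names are hypothetical, not from the paper.
def limit_cdf_table(gamma: float, half_width: float = 8.0, num: int = 16001):
    """Grid and normalised cumulative integral (trapezoidal rule) of the
    assumed limiting density, proportional to exp(gamma x^2/2 - x^4/12)."""
    h = 2.0 * half_width / (num - 1)
    xs = [-half_width + i * h for i in range(num)]
    dens = [math.exp(gamma * x * x / 2.0 - x ** 4 / 12.0) for x in xs]
    cum = [0.0]
    for i in range(1, num):
        cum.append(cum[-1] + 0.5 * h * (dens[i - 1] + dens[i]))
    return xs, [c / cum[-1] for c in cum], h


def kolmogorov_critical(n: int, gamma: float = 0.0) -> float:
    """Kolmogorov distance between the law of S_n / n^{3/4} under the
    classical Curie-Weiss measure at beta_n = 1 + gamma/sqrt(n) and the law
    with density proportional to exp(gamma x^2/2 - x^4/12)."""
    beta = 1.0 + gamma / math.sqrt(n)
    xs, cdf_grid, h = limit_cdf_table(gamma)

    def limit_cdf(z: float) -> float:
        if z <= xs[0]:
            return 0.0
        if z >= xs[-1]:
            return 1.0
        i = min(int((z - xs[0]) / h), len(xs) - 2)
        t = (z - xs[i]) / h                  # linear interpolation on the grid
        return cdf_grid[i] + t * (cdf_grid[i + 1] - cdf_grid[i])

    # exact law of S_n: binomial weights reweighted by exp(beta s^2 / (2n))
    values = list(range(-n, n + 1, 2))
    log_w = [math.lgamma(n + 1) - math.lgamma((n + s) // 2 + 1)
             - math.lgamma((n - s) // 2 + 1) + beta * s * s / (2 * n)
             for s in values]
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)

    dist, cdf = 0.0, 0.0
    for s, wi in zip(values, w):
        z = s / n ** 0.75
        target = limit_cdf(z)
        dist = max(dist, abs(cdf - target))  # left limit at the atom
        cdf += wi / total
        dist = max(dist, abs(cdf - target))  # value at the atom
    return dist


if __name__ == "__main__":
    for n in (100, 1600, 6400):
        print(n, kolmogorov_critical(n), kolmogorov_critical(n, gamma=1.0))
```

The same routine can be pointed at other $\gamma$ to see how the distance behaves when $\beta_n$ approaches 1 at the rate $n^{-1/2}$.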

Results for a general class of Curie-Weiss models

More generally, we obtain Berry-Esseen bounds for sums of dependent random variables occurring in the general Curie-Weiss models, namely for $\varrho$ -a.s. bounded single-spin variables $X_{i}$ :

Given $\varrho\in\mathcal{B}$ in GHS, let $\alpha$ be the global minimum of type $k$ and strength $\mu$ of $G_{\varrho}$ . Assume that the single-spin random variables $X_{i}$ are bounded $\varrho$ -a.s. In the case $k=1$ we obtain

where $\widehat{F}_{W,k}(z):=\int_{-\infty}^{z}\widehat{f}_{W,k}(x)\,dx$ with $\widehat{f}_{W,k}$ defined by

with $W:=\frac{S_{n}-n\alpha}{n^{1-1/2k}}$ and $C_{k}$ is an absolute constant.

Let $\varrho\in\mathcal{B}$ satisfy the GHS-inequality and assume that $\beta_{c}=1$ . Let $\alpha$ be the global minimum of type $k$ with $k\geq 2$ and strength $\mu_{k}$ of $G_{\varrho}$ and let the single-spin variable $X_{i}$ be bounded. Let $0<\beta_{n}<\infty$ depend on $n$ in such a way that $\beta_{n}\to 1$ monotonically as $n\to\infty$ . Then the following assertions hold true:

If $\beta_{n}-1=\frac{\gamma}{n^{1-\frac{1}{k}}}$ for some $\gamma\not=0$ , we have

If $|\beta_{n}-1|\ll n^{-(1-1/k)}$ , $\frac{S_{n}-n\alpha}{n^{1-1/2k}}$ converges in distribution to $\widehat{F}_{W,k}$ , defined as in Theorem 1.7. Moreover, if $|\beta_{n}-1|=\mathcal{O}(n^{-1})$ , (1.17) holds true.

Since the symmetric Bernoulli law is ${\rm GHS}$ , Theorems 1.7 and 1.8 include Berry-Esseen type results for this case. However, these results differ from the results in Theorems 1.2, 1.3 and 1.4 with respect to the limiting laws: the laws in Theorems 1.7 and 1.8 depend on moments of $W$ . The bounds in Theorems 1.2-1.4 are easier to obtain; moreover, their proofs apply Corollary 2.8 and part (2) of Theorem 4.6, which are less involved versions of Stein’s method for exchangeable pairs.

For the Wasserstein distance $d_{w}$ the class of test functions $h$ is ${\rm Lip}(1)$ , the Lipschitz functions with Lipschitz constant at most 1. For the total variation distance the class ${\mathcal{H}}$ consists of the indicators of Borel sets, and for the Kolmogorov distance $d_{K}$ of the indicators of half lines.
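In symbols, each of these distances is a supremum over its class of test functions. This is the standard formulation, stated here for convenience:

```latex
\begin{aligned}
d_{\mathcal{H}}(W,Z) &= \sup_{h\in\mathcal{H}}\big|\mathbb{E}\,h(W)-\mathbb{E}\,h(Z)\big|,\\
d_{w}:\ \mathcal{H} &= \mathrm{Lip}(1), \qquad
d_{TV}:\ \mathcal{H} = \{1_{B}: B\ \text{Borel}\}, \qquad
d_{K}:\ \mathcal{H} = \{1_{(-\infty,z]}: z\in\mathbb{R}\},
\end{aligned}
```

so that, for example, $d_{K}(W,Z)=\sup_{z\in\mathbb{R}}|P(W\leq z)-P(Z\leq z)|$.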

For technical reasons only, we now consider a modified model. Let

Given the Curie-Weiss model $\widehat{P}_{n,\beta}$ and $\varrho\in\mathcal{B}$ in GHS, let $\alpha$ be the global minimum of type $k$ and strength $\mu$ of $G_{\varrho}$ . In the case $k=1$ , for any uniformly Lipschitz function $h$ we obtain for $W=S_{n}/\sqrt{n}$ that

Lebowitz proved that if $\varrho\in{\rm GHS}$ , then (1.12) is non-positive (see [10, V.13.7.(b)] and ). Stein’s method reduces to the computation of, or bounds on, low order moments, perhaps even only variances of certain quantities. Such variance computations can be very difficult. In the proofs of Theorem 1.7 and Theorem 1.8 we will see that Lebowitz’ inequality can be used successfully to bound these variances.

In the situation of Theorem 1.7 and Theorem 1.8 we can bound higher order moments as follows:

Given $\varrho\in{\mathcal{B}}$ , let $\alpha$ be one of the global minima of maximal type $k$ for $k\geq 1$ and strength $\mu$ of $G_{\varrho}$ . For

To prepare the proof of Lemma 1.13, we recall a well-known transformation of our measure of interest, sometimes called the Hubbard–Stratonovich transformation.

As shown in , Lemma 3.1, our condition (1.2) ensures that

is finite, such that the above density is well defined.

The proof of this lemma can be found in many places, e.g. in , Lemma 3.3. ∎

In Section 2, we develop in Theorem 2.5, Corollary 2.8 and Corollary 2.9 refinements of Stein’s method for exchangeable pairs in the case of normal approximation. As a first application we prove Theorem 1.2 in Section 3. In Section 4 we develop Stein’s method for exchangeable pairs for a rich class of other distributional approximations. Obtaining good bounds for the solutions of the corresponding Stein equations in the appendix, we prove Theorem 1.3 and Theorem 1.4 in Section 5, applying Theorem 4.6. In Section 6, we prove Theorems 1.7, 1.8 and 1.10, applying Corollary 2.9 and Theorem 4.7. Section 7 contains a collection of examples including the Curie-Weiss model with three states, studying liquid helium, and a continuous Curie-Weiss model, where the single spin distribution $\varrho$ is a uniform distribution.

Stein’s method with exchangeable pairs for normal approximation

for some $0<\lambda<1$ . This approach has been successfully applied in many models, see and for example and references therein. In , the range of application was extended by replacing the linear regression property by a weaker condition, allowing the regression property to hold only approximately. The exchangeable pair approach is also successful for other distributional approximations, as will be shown in Section 4. We develop Stein’s method by replacing the linear regression property by

where $\psi(x)$ depends on the continuous distribution under consideration. Before we consider the case of normal approximation in this section, we mention that this is not the first paper to study other distributional approximations via Stein’s method. For a rather large class of continuous distributions, the Stein characterization was introduced in , following [22, Chapter 6]. In , the method of exchangeable pairs was introduced for this class of distributions and used in a simulation context. Recently, the exchangeable pair approach was introduced for exponential approximation in [4, Lemma 2.1].
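The two regression conditions can be written side by side. This is our reading of the omitted displays, consistent with the condition used later in Theorem 4.6 and with the examples $\psi(x)=-x$ (normal) and $\psi(x)=-x^{3}/3$ (density proportional to $\exp(-x^{4}/12)$) in the text:

```latex
\begin{aligned}
\mathbb{E}\big[W-W'\,\big|\,W\big] &= \lambda W
  && \text{(Stein's linear regression, } 0<\lambda<1),\\
\mathbb{E}\big[W-W'\,\big|\,W\big] &= -\lambda\,\psi(W) + R(W),
  && \psi = p'/p .
\end{aligned}
```

For the standard normal density, $\psi(x)=-x$, and the second condition reduces to the first when $R\equiv 0$.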

For measuring the distance of the distribution of $W$ and the standard normal distribution (or any other distribution), we would like to bound

for a class of test functions $h\in{\mathcal{H}}$ , where $\Phi(h):=\int_{-\infty}^{\infty}h(z)\Phi(dz)$ and $\Phi$ is the standard normal distribution function. One advantage of Stein’s method is that we are able to obtain bounds for different distances like the Wasserstein distance $d_{\rm{w}}$ , the total variation distance $d_{\rm{TV}}$ or the Kolmogorov distance $d_{\rm{K}}$ . In , the exchangeable pair approach of Stein was developed for a broad class of non smooth functions $h$ , applying standard smoothing inequalities.

where $\lambda$ is a number satisfying $0<\lambda<1$ . If moreover

Rinott and Rotar also proved a bound in the case, where $|W^{\prime}-W|$ is not assumed to be bounded. In this case, the last two summands on the right hand side of (2.21) have to be replaced by

This estimate is crude: even when $W$ is a normalized sum of $n$ independent variables, it leads to a bound of order $n^{-1/4}$ . The advantage of the results in is that these bounds apply not only to indicators of half lines, but also to a broad class of non-smooth test functions, see [19, Section 1.2].

Chen and Shao introduced a concentration inequality approach, in which a concentration inequality is proved using the Stein identity (see and ). In , Shao and Su proved the following theorem for exchangeable pairs:

If $|W-W^{\prime}|\leq A$ , then the bound reduces to

When $|W-W^{\prime}|$ is bounded, (2.23) improves (2.21) with respect to the constants.

Following the lines of the proofs in and , we obtain the following refinement: Given two random variables $X$ and $Y$ defined on a common probability space, we denote by

the Kolmogorov distance of the distributions of $X$ and $Y$ .

Let $(W,W^{\prime})$ be an exchangeable pair of real-valued random variables such that

If $|W-W^{\prime}|\leq A$ for a constant $A$ , we obtain the bound

When $|W-W^{\prime}|$ is bounded, (2.24) improves (2.21) with respect to the Berry-Esseen constants.

We sketch the proof: For a function $f$ with $|f(x)|\leq C(1+|x|)$ we obtain

Let $f=f_{z}$ denote the solution of the Stein equation

Using $|f^{\prime}(x)|\leq 1$ for all real $x$ (see [6, Lemma 2.2]), we obtain the bound

Using $0<f(x)\leq\sqrt{2\pi}/4$ (see [6, Lemma 2.2]), we have

To bound $T_{3}$ we apply the concentration technique, see :

Next observe that $|U_{1}|\leq 0.82A^{3}$ , see : by the mean value theorem one gets

Under the assumptions of our Theorem we proceed as in and obtain the following concentration inequality:

see ; here $f$ is defined by $f(x):=-1.5A$ for $x\leq z-A$ , $f(x):=1.5A$ for $x\geq z+2A$ and $f(x):=x-z-A/2$ in between. Now we apply (2.25) and get

In the following corollary, we discuss the Kolmogorov-distance of the distribution of a random variable $W$ to a random variable distributed according to $N(0,\sigma^{2})$ , the normal distribution with mean zero and variance $\sigma^{2}$ .

Let $\sigma^{2}>0$ and $(W,W^{\prime})$ be an exchangeable pair of real-valued random variables such that

Let us denote by $f_{\sigma}:=f_{\sigma,z}$ the solution of the Stein equation

with $F_{\sigma}(z):=\frac{1}{\sqrt{2\pi}\sigma}\int_{-\infty}^{z}\exp\bigl{(}-\frac{y^{2}}{2\sigma^{2}}\bigr{)}\,dy$ . It is easy to see that the identity $f_{\sigma,z}(x)=\sigma f_{z/\sigma}\bigl{(}\frac{x}{\sigma}\bigr{)}$ holds true, where $f_{z/\sigma}$ is the solution of the Stein equation of the standard normal distribution with threshold $z/\sigma$ . Using [6, Lemma 2.2] we obtain $0<f_{\sigma}(x)\leq\sigma\frac{\sqrt{2\pi}}{4}$ , $|f_{\sigma}^{\prime}(x)|\leq 1$ , and $|f_{\sigma}^{\prime}(x)-f_{\sigma}^{\prime}(y)|\leq 1$ . With (2.30) we arrive at

with $T_{i}$ ’s defined in (2.27). Using the bounds of $f_{\sigma}$ and $f_{\sigma}^{\prime}$ , the bound of $T_{1}$ is the same as in the proof of Theorem 2.5, whereas the bound of $T_{2}$ changes to

Since we consider the case $|W-W^{\prime}|\leq A$ , we have to bound

Using the Stein identity (2.32), the mean value theorem as well as the concentration inequality-argument along the lines of the proof of Theorem 2.5, we obtain
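The bounds on $f_{\sigma}$ used in this proof can be checked numerically. The following Python sketch is our own illustration with hypothetical function names. It uses the explicit bounded solution $f_z(u)=\sqrt{2\pi}e^{u^{2}/2}\Phi(u)(1-\Phi(z))$ for $u\leq z$ and $f_z(u)=\sqrt{2\pi}e^{u^{2}/2}\Phi(z)(1-\Phi(u))$ for $u>z$ of $f'(u)-uf(u)=1_{\{u\leq z\}}-\Phi(z)$, and rescales it as $\sigma f_{z/\sigma}(x/\sigma)$. It assumes the Stein equation for $N(0,\sigma^{2})$ has the form $f'(x)-\frac{x}{\sigma^{2}}f(x)=1_{\{x\leq z\}}-F_{\sigma}(z)$, that is, $\psi(x)=-x/\sigma^{2}$.

```python
import math

# Illustrative sketch; function names are hypothetical, not from the paper.
SQRT2PI = math.sqrt(2.0 * math.pi)


def Phi(x: float) -> float:
    """Standard normal distribution function (erfc form, accurate in both tails)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def f_std(z: float, u: float) -> float:
    """Bounded solution of f'(u) - u f(u) = 1{u <= z} - Phi(z)."""
    if u <= z:
        return SQRT2PI * math.exp(u * u / 2.0) * Phi(u) * Phi(-z)
    return SQRT2PI * math.exp(u * u / 2.0) * Phi(z) * Phi(-u)


def f_sigma(z: float, x: float, sigma: float) -> float:
    """Assumed solution of f'(x) - x f(x)/sigma^2 = 1{x <= z} - Phi(z/sigma),
    obtained by rescaling the standard solution with threshold z/sigma."""
    return sigma * f_std(z / sigma, x / sigma)


def f_sigma_prime(z: float, x: float, sigma: float) -> float:
    """Derivative of f_sigma, read off from the Stein equation itself."""
    return x * f_sigma(z, x, sigma) / sigma ** 2 + (1.0 if x <= z else 0.0) - Phi(z / sigma)


def stein_residual(z: float, x: float, sigma: float, h: float = 1e-5) -> float:
    """|central difference f' - x f/sigma^2 - (1{x <= z} - F_sigma(z))|."""
    deriv = (f_sigma(z, x + h, sigma) - f_sigma(z, x - h, sigma)) / (2.0 * h)
    lhs = deriv - x * f_sigma(z, x, sigma) / sigma ** 2
    rhs = (1.0 if x <= z else 0.0) - Phi(z / sigma)
    return abs(lhs - rhs)


if __name__ == "__main__":
    sigma, z = 2.0, 0.0
    xs = [i * 0.01 * sigma for i in range(-600, 601)]
    print(max(f_sigma(z, x, sigma) for x in xs), sigma * SQRT2PI / 4.0)
```

The residual check confirms that the rescaled function really solves the $\sigma$-equation. The grid maxima then illustrate $0<f_{\sigma}\leq\sigma\sqrt{2\pi}/4$ (attained at $x=z=0$) and $|f_{\sigma}'|\leq 1$.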

Berry-Esseen bounds for the classical Curie-Weiss model

Let $\varrho$ be the symmetric Bernoulli measure and $0<\beta<1$ . Then

converges in distribution to a $N(0,\sigma^{2})$ with $\sigma^{2}=(1-\beta)^{-1}$ :

We consider the usual construction of an exchangeable pair. We produce a spin collection $X^{\prime}=(X_{i}^{\prime})_{i\geq 1}$ via a Gibbs sampling procedure: select a coordinate, say $i$ , at random and replace $X_{i}$ by $X_{i}^{\prime}$ drawn from the conditional distribution of the $i$ ’th coordinate given $(X_{j})_{j\not=i}$ . Let $I$ be a random variable taking values $1,2,\ldots,n$ with equal probability, and independent of all other random variables. Consider

Hence $(W,W^{\prime})$ is an exchangeable pair and

Let $\mathcal{F}:=\sigma(X_{1},\ldots,X_{n})$ . Now we obtain

The conditional distribution at site $i$ is given by

Now $\frac{1}{\sqrt{n}}\frac{1}{n}\sum_{i=1}^{n}\tanh(\beta m_{i}(X))=\frac{1}{\sqrt{n}}\frac{1}{n}\sum_{i=1}^{n}\bigl{(}\tanh(\beta m_{i}(X))-\tanh(\beta m(X))\bigr{)}+\frac{1}{\sqrt{n}}\tanh(\beta m(X))=:R_{1}+R_{2}$ with $m(X):=\frac{1}{n}\sum_{i=1}^{n}X_{i}$ . Taylor-expansion $\tanh(x)=x+\mathcal{O}(x^{3})$ leads to

To bound the first summand in (2.31), note that $(W-W^{\prime})^{2}=\frac{X_{I}^{2}}{n}-\frac{2X_{I}\,X_{I}^{\prime}}{n}+\frac{(X_{I}^{\prime})^{2}}{n}$ . Hence
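The conditional law at site $i$ in this construction is explicit. Since $x^{2}=1$ for $x\in\{-1,1\}$, we have $P(X_i'=x\mid (X_j)_{j\neq i})=e^{\beta m_i(X)x}/(2\cosh(\beta m_i(X)))$ with $m_i(X)=m(X)-X_i/n$, so $E[X_i'\mid\mathcal{F}]=\tanh(\beta m_i(X))$. The Python sketch below is our own check, and its function names are hypothetical. It computes $E[W-W'\mid\mathcal{F}]$ by averaging over $I$ and over the two values of $X_I'$. It then compares the result with $\frac{W}{n}-\frac{1}{n\sqrt{n}}\sum_i\tanh(\beta m_i(X))$ and with the linear approximation $\frac{1-\beta}{n}W$, which is $-\lambda\psi(W)$ with $\lambda=1/n$ and $\psi(x)=-(1-\beta)x$.

```python
import math
import random


# Illustrative sketch; function names are hypothetical, not from the paper.
def drift_exact(x: list, beta: float) -> float:
    """E[W - W' | X] for W = S_n / sqrt(n), where W' is obtained by choosing a
    site I uniformly at random and redrawing X_I from its conditional law."""
    n = len(x)
    s = sum(x)
    total = 0.0
    for xi in x:
        m_i = (s - xi) / n                    # m_i(X) = m(X) - X_i / n
        # P(X_i' = +1 | rest) = e^{beta m_i} / (e^{beta m_i} + e^{-beta m_i})
        p_plus = math.exp(beta * m_i) / (2.0 * math.cosh(beta * m_i))
        # average of X_i - X_i' over X_i' in {+1, -1}
        total += p_plus * (xi - 1) + (1.0 - p_plus) * (xi + 1)
    return total / (n * math.sqrt(n))         # 1/n for I, 1/sqrt(n) for W


def drift_tanh(x: list, beta: float) -> float:
    """The same quantity written as W/n - n^{-3/2} sum_i tanh(beta m_i(X))."""
    n = len(x)
    s = sum(x)
    w = s / math.sqrt(n)
    return w / n - sum(math.tanh(beta * (s - xi) / n) for xi in x) / (n * math.sqrt(n))


if __name__ == "__main__":
    rng = random.Random(0)
    spins = [rng.choice((-1, 1)) for _ in range(1000)]
    w = sum(spins) / math.sqrt(len(spins))
    print(drift_exact(spins, 0.5), drift_tanh(spins, 0.5), 0.5 * w / len(spins))
```

For typical configurations the gap between the exact drift and $\frac{1-\beta}{n}W$ is of order $n^{-2}$, which is the kind of remainder the Taylor expansion of $\tanh$ produces.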

Now we discuss the critical case $\beta=1$ , when $\varrho$ is the symmetric Bernoulli distribution. For $\beta=1$ , using the Taylor expansion $\tanh(x)=x-x^{3}/3+\mathcal{O}(x^{5})$ , (3.36) would lead to

Constructing the exchangeable pair $(W,W^{\prime})$ in the same manner as before we will obtain

with $\lambda=\frac{1}{n^{3/2}}$ and a remainder $R(W)$ presented later. Considering the density $p(x)=C\,\exp(-x^{4}/12)$ , we have

This is the starting point for developing Stein’s method for limiting distributions with a regular Lebesgue-density $p(\cdot)$ and an exchangeable pair $(W,W^{\prime})$ which satisfies the condition

with $0<\lambda<1$ . To prove (3.38), observe that

By Taylor expansion and the identity $m_{i}(X)=m(X)-\frac{X_{i}}{n}$ we obtain
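The origin of $\lambda=n^{-3/2}$ and $\psi(x)=-x^{3}/3$ can be seen from a leading-order computation. It assumes, as in the non-critical case, that $W=S_n/n^{3/4}$, $W-W'=(X_I-X_I')/n^{3/4}$ and $E[X_i'\mid\mathcal{F}]=\tanh(m_i(X))$ at $\beta=1$. It also replaces $m_i(X)$ by $m(X)=W/n^{1/4}$:

```latex
\begin{aligned}
\mathbb{E}[W-W'\mid\mathcal{F}]
 &= \frac{1}{n}\sum_{i=1}^{n}\frac{X_i-\tanh(m_i(X))}{n^{3/4}}
  = \frac{W}{n}-\frac{1}{n^{7/4}}\sum_{i=1}^{n}\tanh(m_i(X))\\
 &\approx \frac{W}{n}-\frac{1}{n^{3/4}}\Big(m(X)-\frac{m(X)^{3}}{3}\Big)
  = \frac{1}{n^{3/2}}\,\frac{W^{3}}{3}.
\end{aligned}
```

This is $-\lambda\psi(W)$ with $\lambda=n^{-3/2}$ and $\psi(x)=-x^{3}/3=p'(x)/p(x)$ for $p(x)=C\exp(-x^{4}/12)$. The neglected terms, coming from $m_i\neq m$ and from the $\mathcal{O}(x^{5})$ term of $\tanh$, make up $R(W)$.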

The exchangeable pair approach for distributional approximations

Motivated by the classical Curie-Weiss model at the critical temperature, we will develop Stein’s method with the help of exchangeable pairs as follows. For a rather large class of continuous distributions, the Stein characterization was introduced in , following the lines of [22, Chapter 6]. The densities occurring as limit laws in models of statistical mechanics belong to this class. Let $I$ be a real interval with endpoints $-\infty\leq a<b\leq\infty$ . A function $f$ is called regular if $f$ is finite on $I$ and, at any interior point of $I$ , $f$ possesses a right-hand limit and a left-hand limit. Further, $f$ possesses a right-hand limit $f(a+)$ at the point $a$ and a left-hand limit $f(b-)$ at the point $b$ .

Let us assume that the regular density $p$ satisfies the following condition:

Assumption (D) Let $p$ be a regular, strictly positive density on an interval $I=[a,b]$ . Suppose $p$ has a derivative $p^{\prime}$ that is regular on $I$ , has only countably many sign changes, and is continuous at the sign changes. Suppose moreover that $\int_{I}p(x)|\log(p(x))|\,dx<\infty$ and assume that

In [23, Proposition] it is proved that a random variable $Z$ is distributed according to the density $p$ if and only if

for a suitably chosen class $\mathcal{F}$ of functions $f$ . The proof is integration by parts. The corresponding Stein identity is

where $h$ is a measurable function for which $\int_{I}|h(x)|\,p(x)\,dx<\infty$ , $P(x):=\int_{-\infty}^{x}p(y)\,dy$ and $P(h):=\int_{I}h(y)\,p(y)\,dy$ . The solution $f:=f_{h}$ of this differential equation is given by

For the function $h(x):=1_{\{x\leq z\}}(x)$ let $f_{z}$ be the corresponding solution of (4.40). We will make the following assumptions:

Assumption (B1) Let $p$ be a density fulfilling Assumption (D). We assume that for any absolutely continuous function $h$ , the solution $f_{h}$ of (4.40) satisfies

where $c_{1},c_{2}$ and $c_{3}$ are constants.

Assumption (B2) Let $p$ be a density fulfilling Assumption (D). We assume that the solution $f_{z}$ of

for all real $x$ and $y$ , where $d_{1},d_{2},d_{3}$ and $d_{4}$ are constants.

At first glance, Condition (4.43) seems to be a rather strong or at least a rather technical condition.

In the case of the normal approximation, $\psi(x)=-x$ , we have to bound $(xf_{z}(x))^{\prime}$ for the solution $f_{z}$ of the classical Stein equation. A direct calculation shows that $|(xf_{z}(x))^{\prime}|\leq 2$ (see [6, Proof of Lemma 6.5]). However, in the normal approximation case, this bound would lead to a worse Berry-Esseen constant (compare Theorem 2.5 with Theorem 4.6). Hence in this case we only use $d_{2}=d_{3}=1$ and $d_{1}=\sqrt{2\pi}/4$ .

We will see that, for all distributions appearing as limit laws in our class of Curie-Weiss models, Condition (4.43) can be proved:

The densities $f_{k,\mu,\beta}$ in (1.8) and (1.9) and the densities in Theorem 1.4, Theorem 1.7 and Theorem 1.8 satisfy Assumptions (D), (B1) and (B2).

We defer the proofs to the appendix, since they only involve careful analysis. ∎
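To see the kind of behaviour that Lemma 4.2 asserts, one can approximate the solution $f_z$ for $p(x)\propto\exp(-x^{4}/12)$ numerically. The sketch below is our own illustration with hypothetical names. It assumes the Stein equation has the form $f'(x)+\psi(x)f(x)=1_{\{x\leq z\}}-P(z)$ with $\psi=p'/p=-x^{3}/3$, whose bounded solution is $f_z(x)=\frac{1}{p(x)}\int_{-\infty}^{x}(1_{\{y\leq z\}}-P(z))p(y)\,dy$. It checks the equation on a grid and reports the grid suprema of $|f_z|$, $|f_z'|$ and $|(\psi f_z)'|$ on $[-3,3]$. These grid values illustrate the lemma but do not prove the constants in (B2).

```python
import math


# Illustrative sketch; names are hypothetical, not from the paper.
def stein_solution_quartic(z, half_width=5.0, num=40001):
    """Grid approximation of the bounded solution f_z of
        f'(x) + psi(x) f(x) = 1{x <= z} - P(z),  psi(x) = -x**3/3,
    for p proportional to exp(-x**4/12), via
        f_z(x) = P(x)(1 - P(z))/p(x)  (x <= z),  P(z)(1 - P(x))/p(x)  (x > z)."""
    h = 2.0 * half_width / (num - 1)
    xs = [-half_width + i * h for i in range(num)]
    q = [math.exp(-x ** 4 / 12.0) for x in xs]             # unnormalised density
    seg = [0.5 * h * (q[i] + q[i + 1]) for i in range(num - 1)]
    left = [0.0]                                            # integral of q over [-L, x_i]
    for s in seg:
        left.append(left[-1] + s)
    right = [0.0]                                           # integral of q over [x_i, L]
    for s in reversed(seg):                                 # built from the right to
        right.append(right[-1] + s)                         # avoid cancellation in 1 - P
    right.reverse()
    Z = left[-1]
    j = min(max(int((z + half_width) / h), 0), num - 2)    # locate z on the grid
    t = (z - xs[j]) / h
    Pz = (left[j] + t * (left[j + 1] - left[j])) / Z
    f = []
    for x, qi, li, ri in zip(xs, q, left, right):
        if x <= z:
            f.append(li * (1.0 - Pz) / qi)                  # (li/Z)(1-Pz)/(qi/Z)
        else:
            f.append(Pz * ri / qi)                          # Pz (ri/Z)/(qi/Z)
    return xs, f, Pz


def check(z, window=3.0):
    """Max ODE residual and grid suprema of |f|, |f'|, |(psi f)'| on [-window, window]."""
    xs, f, Pz = stein_solution_quartic(z)
    h = xs[1] - xs[0]
    res = sup_f = sup_df = sup_dpsif = 0.0
    for i in range(1, len(xs) - 1):
        x = xs[i]
        if abs(x) > window or abs(x - z) < 3 * h:
            continue                                        # skip the kink of f' at z
        df = (f[i + 1] - f[i - 1]) / (2 * h)
        psi = -x ** 3 / 3.0
        rhs = (1.0 if x <= z else 0.0) - Pz
        res = max(res, abs(df + psi * f[i] - rhs))
        dpsif = ((-xs[i + 1] ** 3 / 3.0) * f[i + 1]
                 - (-xs[i - 1] ** 3 / 3.0) * f[i - 1]) / (2 * h)
        sup_f = max(sup_f, abs(f[i]))
        sup_df = max(sup_df, abs(df))
        sup_dpsif = max(sup_dpsif, abs(dpsif))
    return res, sup_f, sup_df, sup_dpsif


if __name__ == "__main__":
    for z in (-1.0, 0.0, 0.5):
        print(z, check(z))
```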

With respect to all densities which appear as limiting distributions in our theorems, we restrict ourselves to bounding solutions (and their derivatives) of the corresponding Stein equation characterizing distributions with probability densities $p$ of the form $b_{k}\exp(-a_{k}x^{2k})$ . Along the lines of the proof of Lemma 4.2, one would be able to present good bounds (in the sense that Assumptions (B1) and (B2) are fulfilled) even for measures with a probability density of the form

When comparing with an exponential distribution with parameter $\mu$ , it is easy to see that Assumptions (D) and (B2) are fulfilled, see [23, Example 1.6] for (D) and [4, Lemma 2.1] for (B2). We have $\psi(x)=-\mu$ and $\|f_{z}\|\leq 1$ , $\|f_{z}^{\prime}\|\leq 1$ and $\sup_{x,y\geq 0}|f_{z}^{\prime}(x)-f_{z}^{\prime}(y)|\leq 1$ . Thus $|(\psi(x)f_{z}(x))^{\prime}|=\mu|f_{z}^{\prime}(x)|\leq\mu$ .

Therefore one has to bound the derivative of

The following result is a refinement of Stein’s result for exchangeable pairs.

Let $p$ be a density fulfilling Assumption (D). Let $(W,W^{\prime})$ be an exchangeable pair of real-valued random variables such that

for some random variable $R=R(W)$ , $0<\lambda<1$ and $\psi$ defined in (4.39). Then

Let $Z$ be a random variable distributed according to $p$ . Under Assumption (B1), for any uniformly Lipschitz function $h$ , we obtain

Let $Z$ be a random variable distributed according to $p$ . Under Assumption (B2), we obtain for any $A>0$

for a suitably chosen class of functions.

Under Assumption (B2) we obtain for any $A>0$

Interestingly enough, the proof is a rather simple adaptation of the results in and follows the lines of the proof of Theorem 2.5. For a function $f$ with $|f(x)|\leq C(1+|x|)$ we obtain

Proof of (1): Now let $f=f_{h}$ be the solution of the Stein equation (4.40), and define

By (4.50), following the calculations on page 21 in , we simply obtain

Proof of (2): Now let $f=f_{z}$ be the solution of the Stein equation (4.42). As in (2.27), using (4.50), we obtain

With $g(x):=(\psi(x)f(x))^{\prime}$ we obtain

Since $|g(x)|\leq d_{4}$ we obtain $|U_{1}|\leq\frac{A^{3}}{2}d_{4}$ .

Analogously to the steps in the proof of Theorem 2.5, $U_{2}$ can be bounded by

The main observation is the following identity:

with $T_{3}$ defined as in the proof of Theorem 4.6. Now we can apply the Cauchy-Schwarz inequality to get

Now the proof follows the lines of the proof of Theorem 4.6. ∎

We discuss an alternative bound in Theorem 4.6 in the case that $(\psi(x)f_{z}(x))^{\prime}$ cannot be bounded uniformly. By the mean value theorem we obtain in general

Let us consider the example $\psi(x)=-x^{3}/3$ . Now

with $\Delta:=(W-W^{\prime})$ . Hence we get

We will see in Section 5 that this bound is good enough for an alternative proof of Theorem 1.3.
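For $\psi(x)=-x^{3}/3$ the difference $\psi(W)-\psi(W')$ can be expanded exactly in $\Delta=W-W'$. This elementary identity is our own illustration of the kind of expansion used above. It shows why powers of $\Delta$ and of $W$ appear in the alternative bound:

```latex
\psi(W)-\psi(W') = -\tfrac{1}{3}\big(W^{3}-(W-\Delta)^{3}\big)
 = -\Delta\,W^{2} + \Delta^{2}\,W - \tfrac{1}{3}\,\Delta^{3}.
```

In particular, if $|\Delta|\leq A$, then $|\psi(W)-\psi(W')|\leq A\,W^{2}+A^{2}|W|+A^{3}/3$, so only moments of $W$ up to order two enter.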

Berry-Esseen bound at the critical temperature

We start with (3.38), where $W$ is given by (3.37). We will calculate the remainder term $R(W)$ more carefully: By Taylor expansion and the identities $m_{i}(X)=m(X)-X_{i}/n$ and $m(X)=\frac{1}{n^{1/4}}W$ we obtain

Hence applying Theorem 4.6 we have to bound the expectation of

In Remark 4.8, we presented an alternative bound via Stein’s method without proving a uniform bound for $(\psi(x)f_{z}(x))^{\prime}$ . As we can see, the additional terms in this bound are of smaller order than $\mathcal{O}(n^{-1/2})$ , using $A=n^{-3/4}$ .

(1) Let $\beta_{n}-1=\frac{\gamma}{\sqrt{n}}$ and $W=S_{n}/n^{3/4}$ . For the distribution function $F_{\gamma}$ in Theorem 1.4 we obtain $\psi(x)=\gamma\,x-\frac{1}{3}x^{3}$ . Moreover we have

with $R(\beta_{n},W)=\mathcal{O}(n^{-2})$ . With $\beta_{n}-1=\frac{\gamma}{\sqrt{n}}$ we obtain

(2) We consider the case $|\beta_{n}-1|=\mathcal{O}(n^{-1})$ and $W=S_{n}/n^{3/4}$ . Now in (5.52), the term $\frac{1-\beta_{n}}{n}W$ will be a part of the remainder:

applying Theorem 4.6, we obtain the convergence in distribution for any $\beta_{n}$ with $|\beta_{n}-1|\ll n^{-1/2}$ , and we obtain the Berry-Esseen bound of order $\mathcal{O}(1/\sqrt{n})$ for any $|\beta_{n}-1|=\mathcal{O}(n^{-1})$ .

(3) Finally we consider $|\beta_{n}-1|\gg n^{-1/2}$ and $W=\sqrt{\frac{(1-\beta_{n})}{n}}S_{n}$ . Now we obtain

with $\lambda=\frac{(1-\beta_{n})}{n}$ and $\psi(x)=-x$ . We apply Corollary 2.8: with $A=\frac{1}{\sqrt{n}}(1-\beta_{n})^{1/2}$ , one obtains $\lambda^{-1}A^{3}=n^{-1/2}(1-\beta_{n})^{1/2}$ and

Hence with $|\beta_{n}-1|\gg n^{-1/2}$ we obtain convergence in distribution. Under the additional assumption $|\beta_{n}-1|\gg n^{-1/4}$ we obtain the Berry-Esseen result. ∎

Proof of the general case

Let $\varrho$ satisfy the GHS-inequality and let $\alpha$ be the global minimum of type $k$ and strength $\mu(\alpha)$ of $G_{\varrho}$ . In case $k=1$ it is known that the random variable $\frac{S_{n}}{\sqrt{n}}$ converges in distribution to a normal distribution $N(0,\sigma^{2})$ with $\sigma^{2}=\mu(\alpha)^{-1}-\beta^{-1}=(\sigma_{\varrho}^{-2}-\beta)^{-1}$ , see for example [10, V.13.15]. Hence in this case we will apply Corollary 2.9 (to obtain better constants for our Berry-Esseen bound in comparison to Theorem 4.7).

Consider $k\geq 1$ . We just treat the case $\alpha=0$ and denote $\mu=\mu(0)$ . The more general case can be done analogously. For $k=1$ , we consider $\psi(x)=-\frac{x}{\sigma^{2}}$ with $\sigma^{2}=\mu^{-1}-\beta^{-1}$ . For any $k\geq 2$ we consider

and $W^{\prime}$ , constructed as in Section 3, such that

Now we have to calculate the conditional distribution at site $i$ in the general case:

In the situation of Theorem 1.7, if $X_{1}$ is $\varrho$ -a.s. bounded, we obtain

with $m_{i}(X):=\frac{1}{n}\sum_{j\not=i}X_{j}=m(X)-\frac{X_{i}}{n}$ .

We compute the conditional density $g_{\beta}(x_{1}|(X_{i})_{i\geq 2})$ of $X_{1}=x_{1}$ given $(X_{i})_{i\geq 2}$ under the Curie-Weiss measure:

By computation of the derivative of $G_{\varrho}$ we see that

If we consider the Curie-Weiss model with respect to $\widehat{P}_{n,\beta}$ , the conditional density $g_{\beta}(x_{1}|(X_{i})_{i\geq 2})$ under this measure becomes

Applying Lemma 6.1 and the presentation (1.5) of $G_{\varrho}$ , it follows that

With $m_{i}(X)=m(X)-\frac{X_{i}}{n}$ and $m(X)=\frac{1}{n^{1/(2k)}}W$ we obtain

For any $k\geq 1$ the first summand ( $l=0$ ) is

To see this, let $k=1$ . Since we set $\phi^{\prime\prime}(0)=1$ , we obtain $\mu(0)=\beta-\beta^{2}$ and therefore $\frac{1}{\beta}\mu(0)W=(1-\beta)W$ . In the case $k\geq 2$ we know that $\beta=1$ . Hence in both cases, (6.53) is checked. Summarizing we obtain for any $k\geq 1$

Hence the last four summands in (4.48) of Theorem 4.7 are $\mathcal{O}(n^{-1/k})$ .

Since we assume that $\varrho\in{\rm GHS}$ , we can apply the correlation-inequality due to Lebowitz (see Remark 1.12)

The choice $i=k$ and $j=l$ leads to the bound

Using a conditional version of Jensen’s inequality we have

Hence the variance of the second term in (6.54) is of the same order as the variance of the first term. Applying (1.5) for $G_{\varrho}$ , the variance of the third term in (6.54) is of the order of the variance of $W^{2}/n^{1/k}$ . In summary, the variance of (6.54) can be bounded by 9 times the maximum of the variances of the three terms in (6.54), which is a constant times $n^{-2/k}$ , and therefore for $k\geq 1$ we obtain

Since $\alpha=0$ and $k=1$ for $\beta\not=1$ while $\alpha=0$ and $k\geq 2$ for $\beta=1$ , $G_{\varrho}(\cdot)$ can now be expanded as

Hence $\frac{1}{\beta_{n}}\,G_{\varrho}^{\prime}(s)=\frac{\mu_{1}}{\beta_{n}}s+\frac{\mu_{k}}{\beta_{n}(2k-1)!}s^{2k-1}+\mathcal{O}(s^{2k})$ . With Lemma 6.1 and $\mu_{1}=(1-\beta_{n})\beta_{n}$ we obtain

The remainder $R(\beta_{n},W)$ is the remainder in the proof of Theorem 1.7 with $\mu$ replaced by $\mu_{k}$ and $\beta$ replaced by $\beta_{n}$ .

Let $\beta_{n}-1=\frac{\gamma}{n^{1-1/k}}$ and $W=n^{1/(2k)-1}\sum_{i=1}^{n}X_{i}$ . We obtain

where $\psi(x)=\gamma x-\frac{\mu_{k}}{\beta_{n}\,(2k-1)!}x^{2k-1}$ . As in the proof of Theorem 1.7 we obtain that $R(\beta_{n},W)=\mathcal{O}(n^{-2})$ . Now we only have to adapt the proof of Theorem 1.7 step by step, applying Lemma 1.13, Lemma 4.2 and Theorem 4.7.

Let $|\beta_{n}-1|=\mathcal{O}(1/n)$ and $W=n^{1/(2k)-1}\sum_{i=1}^{n}X_{i}$. In (6.55) the term $\frac{1-\beta_{n}}{n}W$ now becomes part of the remainder:

Thus with Theorem 4.7 we obtain convergence in distribution for any $\beta_{n}$ with $|\beta_{n}-1|\ll n^{-(1-1/k)}$ . Moreover we obtain the Berry-Esseen bound of order $\mathcal{O}(n^{-1/k})$ for any $|\beta_{n}-1|=\mathcal{O}(n^{-1})$ .

Finally we consider $|\beta_{n}-1|\gg n^{-(1-1/k)}$ and $W=\sqrt{\frac{1-\beta_{n}}{n}}\,S_{n}$. A short calculation gives

with $\psi(x)=-x$ and $\lambda=\frac{1-\beta_{n}}{n}$ . Now we apply Corollary 2.9. With $A:=\frac{\rm{const.}(1-\beta_{n})^{1/2}}{\sqrt{n}}$ we obtain

which is of order $\mathcal{O}\bigl{(}\frac{\beta_{n}}{n(1-\beta_{n})}\bigr{)}$ . Hence with $|\beta_{n}-1|\gg n^{-(1-1/k)}$ we get convergence in distribution. Under the additional assumption that $|\beta_{n}-1|\gg n^{-(1/2-1/(2k))}$ we obtain the Berry-Esseen bound. ∎
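The case distinction of this proof can be condensed into a small lookup. The Python sketch below is hypothetical (the name `critical_regime` is not from the text); it assumes that $|\beta_n-1|$ is of exact order $n^{-\delta}$ and that the critical point $\beta=1$ has type $k\geq 2$, and it encodes only the thresholds for convergence in distribution stated above, not the additional conditions needed for the Berry-Esseen bounds.

```python
from fractions import Fraction


def critical_regime(k, delta):
    """Hypothetical summary of the three cases treated in the proof above.

    Assumes |beta_n - 1| is of exact order n**(-delta) and a critical point
    of type k >= 2. Returns (regime, e): the non-Gaussian regimes use the
    normalisation S_n / n**e; the Gaussian regime (with beta_n < 1) uses
    sqrt((1 - beta_n) / n) * S_n, so e is None there.
    """
    if k < 2:
        raise ValueError("the critical regimes need k >= 2")
    delta = Fraction(delta)
    threshold = 1 - Fraction(1, k)      # compare |beta_n - 1| with n^{-(1-1/k)}
    exponent = 1 - Fraction(1, 2 * k)   # W = n^{1/(2k)-1} S_n = S_n / n^{1-1/(2k)}
    if delta > threshold:               # |beta_n - 1| << n^{-(1-1/k)}
        return "critical", exponent
    if delta == threshold:              # beta_n - 1 = gamma / n^{1-1/k}
        return "mixed", exponent
    return "gaussian", None             # |beta_n - 1| >> n^{-(1-1/k)}


print(critical_regime(3, 1))                # ('critical', Fraction(5, 6))
print(critical_regime(3, Fraction(2, 3)))   # ('mixed', Fraction(5, 6))
print(critical_regime(4, Fraction(1, 2)))   # ('gaussian', None)
```

For $k=3$ and $k=4$ the lookup reproduces the thresholds $n^{-2/3}$ and $n^{-3/4}$ and the normalisations $n^{5/6}$ and $n^{7/8}$ that appear in the examples.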

Examples

It is known that the following distributions $\varrho$ are ${\rm GHS}$ (see [11, Theorem 1.2]). The symmetric Bernoulli measure is ${\rm GHS}$ , first noted in . The family of measures

for $0\leq a\leq 2/3$ is ${\rm GHS}$ , whereas the GHS-inequality fails for $2/3<a<1$ , see [21, p.153]. ${\rm GHS}$ contains all measures of the form

where $V$ is even, continuously differentiable, and unbounded above at infinity, and $V^{\prime}$ is convex on $[0,\infty)$. ${\rm GHS}$ contains all absolutely continuous measures $\varrho\in{\mathcal{B}}$ with support on $[-a,a]$ for some $0<a<\infty$, provided $g(x)=d\varrho/dx$ is continuously differentiable and strictly positive on $(-a,a)$ and $g^{\prime}(x)/g(x)$ is concave on $[0,a)$. Measures like $\varrho(dx)={\rm const.}\exp\bigl(-ax^{4}-bx^{2}\bigr)\,dx$ or $\varrho(dx)={\rm const.}\exp\bigl(-a\cosh x-bx^{2}\bigr)\,dx$ with $a>0$ and $b$ real are GHS. Both are of physical interest (see and references therein).
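The threshold $a=2/3$ for the family above can be checked numerically. The Python sketch below assumes that the family is $\varrho_a=a\,\delta_0+\frac{1-a}{2}(\delta_{-1}+\delta_{1})$ (its defining display is not reproduced here, so this form is an assumption), and it tests only the single-site inequality $\frac{d^3}{ds^3}\log\int e^{sx}\,\varrho_a(dx)\leq 0$ for $s\geq 0$, a necessary condition for GHS, on a finite grid. The function names are hypothetical. For this family the third derivative has the closed form used in the code; near $s=0$ its sign is that of $(a-2b)(a+b)=a-2(1-a)$, which changes exactly at $a=2/3$.

```python
import math


def third_log_mgf_derivative(a, s):
    """phi'''(s) for phi(s) = log(a + (1 - a) cosh s).

    phi is the log-Laplace transform of the assumed family
    rho_a = a delta_0 + (1 - a)/2 (delta_{-1} + delta_{+1}). With b = 1 - a and
    f = a + b cosh s one gets phi''' = b sinh(s) (a^2 - a b cosh s - 2 b^2) / f^3.
    """
    b = 1.0 - a
    f = a + b * math.cosh(s)
    return b * math.sinh(s) * (a * a - a * b * math.cosh(s) - 2 * b * b) / f ** 3


def single_site_ghs(a, s_max=10.0, steps=2000, tol=1e-12):
    """Grid check of phi'''(s) <= 0 for 0 <= s <= s_max (a necessary condition for GHS)."""
    return all(third_log_mgf_derivative(a, s_max * i / steps) <= tol
               for i in range(steps + 1))


for a in (0.0, 0.5, 2 / 3, 0.7, 0.9):
    print(a, single_site_ghs(a))
```

Rescaling the atoms $\pm 1$ to $\pm x_0$ replaces $s$ by $x_0 s$ and multiplies the third derivative by $x_0^{3}$, so the sign pattern, and hence the threshold, does not depend on the scale.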

We now consider the next simplest example after the classical Curie–Weiss model: a model with three states. Observe that this is not the Curie–Weiss–Potts model, since the latter has a different Hamiltonian. Indeed, the Hamiltonian considered in is of the form $\frac{1}{n}\sum_{i,j}\delta_{x_{i},x_{j}}$. It favours states with many equal spins, whereas in our case the spins also need to have large absolute values. We choose $\varrho$ to be

This model seems to be of physical relevance. It is studied in . In it was used to analyze the tri-critical point of liquid helium. A little computation shows that

for all $s\geq 0$. Hence the GHS-inequality (1.10) is fulfilled (see also [11, Theorem 1.2]), which implies that there is a critical inverse temperature $\beta_{c}$ such that $G$ has a unique minimum for $\beta\leq\beta_{c}$ and two minima for $\beta>\beta_{c}$. Since ${\rm Var}_{\varrho}(X_{1})=2\cdot\frac{1}{6}\cdot 3=1$, we see that $\beta_{c}=1$. For $\beta\leq\beta_{c}$ the minimum of $G$ is located at zero, while for $\beta>1$ the two minima are symmetric and satisfy

For $\beta<1$ the rescaled magnetization $S_{n}/\sqrt{n}$ satisfies a Central Limit Theorem and the limiting variance is $(1-\beta)^{-1}$ . Indeed, $\frac{d^{2}}{ds^{2}}\phi_{\varrho}(0)={\rm Var}_{\varrho}(X_{1})=1$ . Hence $\mu_{1}=\beta-\beta^{2}$ and $\sigma^{2}=\frac{1}{1-\beta}$ . Moreover we obtain

For $\beta=\beta_{c}=1$ the rescaled magnetization $S_{n}/n^{5/6}$ converges in distribution to $X$ which has the density $f_{3,6,1}$ . Indeed $\mu_{2}$ is computed to be 6. Moreover we obtain

If $\beta_{n}$ converges monotonically to $1$ faster than $n^{-2/3}$, then $\frac{S_{n}}{n^{5/6}}$ converges in distribution to $\widehat{F}_{3}$, whereas if $\beta_{n}$ converges monotonically to $1$ slower than $n^{-2/3}$, then $\frac{\sqrt{1-\beta_{n}}\,S_{n}}{\sqrt{n}}$ satisfies a Central Limit Theorem. Finally, if $|1-\beta_{n}|=\gamma n^{-2/3}$, then $\frac{S_{n}}{n^{5/6}}$ converges in distribution to a random variable whose distribution has the mixed Lebesgue density

For $\beta=\beta_{c}=1$ the rescaled magnetization $S_{n}/n^{7/8}$ converges in distribution to $X$ which has the density $f_{4,6/5,1}$ . Indeed $\mu_{2}$ is computed to be

If $\beta_{n}$ converges monotonically to $1$ faster than $n^{-3/4}$, then $\frac{S_{n}}{n^{7/8}}$ converges in distribution to $\widehat{F}_{4}$, whereas if $\beta_{n}$ converges monotonically to $1$ slower than $n^{-3/4}$, then $\frac{\sqrt{1-\beta_{n}}\,S_{n}}{\sqrt{n}}$ satisfies a Central Limit Theorem. Finally, if $|1-\beta_{n}|=\gamma n^{-3/4}$, then $\frac{S_{n}}{n^{7/8}}$ converges in distribution to a random variable with the mixed Lebesgue density

Note the interesting change in the limiting behaviour of all of these models at criticality. While for $\beta<1$ all of the models have the same rate of convergence in the Central Limit Theorem, at criticality the limiting distribution function, as well as the quantities depending on moments of $W$, become characteristic of the underlying distribution $\varrho$. Moreover, the rate of convergence differs at criticality (for $k\geq 3$).
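This dichotomy can be made concrete for the three-state model. The Python sketch below assumes $\varrho=\frac{2}{3}\delta_{0}+\frac{1}{6}(\delta_{-\sqrt3}+\delta_{\sqrt3})$, which is consistent with ${\rm Var}_\varrho(X_1)=2\cdot\frac16\cdot 3=1$ but is an assumption since the defining display is not reproduced, together with the Curie–Weiss measure $Z^{-1}\exp\bigl(\frac{\beta}{2n}S_n^2\bigr)\prod_i\varrho(dx_i)$ and $G_\varrho(s)=\frac{\beta}{2}s^2-\phi_\varrho(\beta s)$ (consistent with $\mu_1=\beta-\beta^2$). For $\beta<1$ it computes $\mathbb{E}[S_n^2]/n$ exactly by summing over the numbers of spins at $\pm\sqrt3$; the result should be close to $(1-\beta)^{-1}$, which involves only the variance of $\varrho$. At $\beta=1$ the expansion $G_\varrho(s)=-\sum_{j\geq 2}\kappa_{2j}\,s^{2j}/(2j)!$ involves the higher cumulants $\kappa_{2j}$ of $\varrho$: exact rational arithmetic gives $\kappa_4=0$ and $\kappa_6=-6$, so the first non-vanishing term is $6s^6/6!$, matching $k=3$, the scaling $n^{5/6}$ and the parameter $6$ in $f_{3,6,1}$. All function names are hypothetical.

```python
import math
from fractions import Fraction

# Assumed three-state law, stored as {squared location: total mass} for exact arithmetic.
RHO = {Fraction(0): Fraction(2, 3), Fraction(3): Fraction(1, 3)}


def even_cumulants(rho):
    """kappa_2, kappa_4, kappa_6 of a symmetric law (all odd moments vanish)."""
    m2, m4, m6 = (sum(p * x2 ** j for x2, p in rho.items()) for j in (1, 2, 3))
    return m2, m4 - 3 * m2 ** 2, m6 - 15 * m4 * m2 + 30 * m2 ** 3


def variance_ratio(n, beta):
    """Exact E[S_n^2]/n under Z^{-1} exp(beta S_n^2/(2n)) prod rho(dx_i)."""
    log_terms = []
    for n_plus in range(n + 1):
        for n_minus in range(n - n_plus + 1):
            n_zero = n - n_plus - n_minus
            s2 = 3 * (n_plus - n_minus) ** 2  # S_n^2 with spins in {0, +-sqrt(3)}
            log_w = (math.lgamma(n + 1) - math.lgamma(n_zero + 1)
                     - math.lgamma(n_plus + 1) - math.lgamma(n_minus + 1)
                     + n_zero * math.log(2 / 3) + (n_plus + n_minus) * math.log(1 / 6)
                     + beta * s2 / (2 * n))
            log_terms.append((log_w, s2))
    top = max(lw for lw, _ in log_terms)  # subtract the maximum to avoid overflow
    weights = [(math.exp(lw - top), s2) for lw, s2 in log_terms]
    total = sum(w for w, _ in weights)
    return sum(w * s2 for w, s2 in weights) / (n * total)  # E[S_n] = 0 by symmetry


k2, k4, k6 = even_cumulants(RHO)
print(k2, k4, k6)  # 1 0 -6
for beta in (0.25, 0.5, 0.75):
    print(beta, variance_ratio(300, beta), 1 / (1 - beta))
```

The exact sum costs $\mathcal{O}(n^2)$ terms, which is cheap for a few hundred spins and avoids any Monte Carlo error.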

Appendix

Consider a probability density of the form

Here $\psi(x)=-2k\,a_{k}\,x^{2k-1}$ . We have

with $P(z):=\int_{-\infty}^{z}p(x)\,dx$. Note that $f_{z}(x)=f_{-z}(-x)$, so we need only consider the case $z\geq 0$. For $x>0$ we obtain

So on $[0,\infty)$ the function $\exp\bigl(a_{k}x^{2k}\bigr)\int_{x}^{\infty}\exp\bigl(-a_{k}t^{2k}\bigr)\,dt$ attains its maximum at $x=0$ and therefore

So on $(-\infty,0]$ the function $\exp\bigl(a_{k}x^{2k}\bigr)\int_{-\infty}^{x}\exp\bigl(-a_{k}t^{2k}\bigr)\,dt$ attains its maximum at $x=0$ and therefore

Applying (8.61) and (8.62) gives $0<f_{z}(x)\leq\frac{1}{2\,b_{k}}$ for all $x$. Note that for $x<0$ we only have to consider the first case of (8.57), since $z\geq 0$. The constant $\frac{1}{2\,b_{k}}$ is not optimal; optimal constants could be obtained by following the proof of Lemma 2.2 in or, alternatively, of Lemma 2 in [22, Lecture II]. We omit this. It follows from (8.57) that

With (8.58) we obtain for $0<x\leq z$ that

The same argument for $x\geq z$ leads to $|f_{z}^{\prime}(x)|\leq 2$. For $x<0$ we use the first half of (8.57) and apply (8.59) to obtain $|f_{z}^{\prime}(x)|\leq 2$. This bound will be improved later. Next we calculate the derivative of $-\psi(x)\,f_{z}(x)$:

With (8.60) we obtain $(-\psi(x)f_{z}(x))^{\prime}\geq 0$, so $-\psi(x)f_{z}(x)$ is a non-decreasing function of $x$ (note that for $x<0$ we only have to consider the first half of (8.57)). Moreover, with (8.58), (8.59) and (8.60) we obtain that

Hence we have $|2k\,a_{k}\,x^{2k-1}f_{z}(x)|\leq 1$ and $|2k\,a_{k}\bigl(x^{2k-1}f_{z}(x)-u^{2k-1}f_{z}(u)\bigr)|\leq 1$ for all $x$ and $u$. From (8.58) it follows that $f_{z}^{\prime}(x)>0$ for all $x<z$ and $f_{z}^{\prime}(x)<0$ for all $x>z$. With Stein's identity $f_{z}^{\prime}(x)=-\psi(x)f_{z}(x)+1_{\{x\leq z\}}-P(z)$ and (8.65) we have

Next we bound $(-\psi(x)f_{z}(x))^{\prime}$ from above. We already know that $(-\psi(x)f_{z}(x))^{\prime}\geq 0$. Again we apply (8.58) and (8.59) to see that

for $x\geq z>0$ and all $x\leq 0$. For $0<x\leq z$ the latter bound also holds, as can be seen by applying the bound for $x\geq z$ (more precisely, the bound for $(-\psi(x)f_{z}(x))^{\prime}\,\frac{b_{k}}{P(z)}$) with $-x$ in place of $x$ in the formula for $(-\psi(x)f_{z}(x))^{\prime}$ on $x\leq z$. For some constant $c$ we can bound $(-\psi(x)f_{z}(x))^{\prime}$ by $c$ for all $|x|\geq\frac{2k-1}{c}$. Moreover, on $[-\frac{2k-1}{c},\frac{2k-1}{c}]$ the continuous function $(-\psi(x)f_{z}(x))^{\prime}$ is bounded by some constant $d$; hence we have proved

We do not address the problem of finding the optimal constant, which depends on $k$. Summarizing, Assumption (B2) is fulfilled for $p$ with $d_{2}=d_{3}=1$ and some constants $d_{1}$ and $d_{4}$.
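The properties of $f_z$ collected above for $p(x)=b_k\exp(-a_kx^{2k})$, namely $0<f_z\leq\frac{1}{2b_k}$, the monotonicity of $-\psi f_z$ together with $|2k\,a_k\,x^{2k-1}f_z(x)|\leq 1$, and Stein's identity $f_z'=-\psi f_z+1_{\{x\leq z\}}-P(z)$, can be probed numerically. The Python sketch below evaluates $f_z$ by Simpson's rule, using $b_k=k\,a_k^{1/(2k)}/\Gamma(\frac{1}{2k})$ and writing $P(x)/p(x)$ and $(1-P(x))/p(x)$ as integrals of $p(t)/p(x)$, so that $b_k$ cancels and nothing overflows. The quadrature rule, the truncation of infinite ranges where the integrand falls below $e^{-50}$, the grids and all function names are illustrative choices, not taken from the text; a grid check illustrates the bounds but does not prove them.

```python
import math
from functools import lru_cache


def simpson(g, lo, hi, n=1000):
    """Composite Simpson rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    odd = sum(g(lo + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    even = sum(g(lo + 2 * i * h) for i in range(1, n // 2))
    return (g(lo) + g(hi) + 4 * odd + 2 * even) * h / 3


def b_const(k, a):
    """Normalising constant b_k of p(x) = b_k exp(-a x^(2k))."""
    return k * a ** (1 / (2 * k)) / math.gamma(1 / (2 * k))


def cutoff(y, k, a):
    """Point beyond which exp(a (y^(2k) - t^(2k))) < e^-50."""
    return (y ** (2 * k) + 50 / a) ** (1 / (2 * k))


@lru_cache(maxsize=None)
def cdf(z, k, a):
    """P(z), the distribution function of p."""
    return b_const(k, a) * simpson(lambda t: math.exp(-a * t ** (2 * k)), -cutoff(0.0, k, a), z)


def f(x, z, k, a):
    """Stein solution: (1-P(z)) P(x)/p(x) for x <= z, P(z) (1-P(x))/p(x) for x > z."""
    def ratio(t):  # p(t) / p(x); the constant b_k cancels
        return math.exp(a * (x ** (2 * k) - t ** (2 * k)))
    if x <= z:
        return (1 - cdf(z, k, a)) * simpson(ratio, -cutoff(x, k, a), x)
    return cdf(z, k, a) * simpson(ratio, x, cutoff(x, k, a))


def minus_psi(x, k, a):
    """-psi(x) = -p'(x)/p(x) = 2k a x^(2k-1)."""
    return 2 * k * a * x ** (2 * k - 1)


def check(k, a, z, h=1e-3):
    """(0 < f <= 1/(2b_k), -psi f non-decreasing, |-psi f| <= 1, max Stein-identity residual)."""
    xs = [i / 10 for i in range(-30, 31)]
    vals = [f(x, z, k, a) for x in xs]
    g = [minus_psi(x, k, a) * v for x, v in zip(xs, vals)]
    bound_ok = all(0 < v <= 1 / (2 * b_const(k, a)) + 1e-9 for v in vals)
    monotone = all(u <= v + 1e-9 for u, v in zip(g, g[1:]))
    bounded = all(abs(v) <= 1 + 1e-9 for v in g)
    residual = max(
        abs((f(x + h, z, k, a) - f(x - h, z, k, a)) / (2 * h) - (gx + (x <= z) - cdf(z, k, a)))
        for x, gx in zip(xs, g) if abs(x - z) > 2 * h  # skip the kink of f at x = z
    )
    return bound_ok, monotone, bounded, residual


for k, z in ((1, 0.0), (2, 0.5), (3, 1.0)):
    print(k, z, check(k, 1.0, z))

# Standard normal (k = 1, a = 1/2): 1/(2 b_1) = sqrt(2 pi)/2, twice f_0(0) = sqrt(2 pi)/4.
print(f(0.0, 0.0, 1, 0.5), math.sqrt(2 * math.pi) / 4)
```

The last line illustrates the remark that $\frac{1}{2b_k}$ is not optimal: for the standard normal, $\sqrt{2\pi}/4$ is the known optimal bound on $f_z$, half of $\frac{1}{2b_1}$.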

An alternative bound is $c_{2}\,e_{1}$ with some constant $c_{2}$ depending on the $(2k-2)$-th moment of $p$. It follows from Stein's identity (4.40), which gives

We omit the details. To bound the second derivative $f_{h}^{\prime\prime}$, we differentiate (4.40) and obtain

Now we apply the fact that the quantity in (8.64) is non-negative to obtain

Moreover, the quantity in (8.64) is bounded by $\frac{2k-1}{|x|}$, hence

where $Z$ is distributed according to $p$. Summarizing, we have $|f_{h}^{\prime\prime}(x)|\leq c_{3}\sup_{x}|h^{\prime}(x)|$ for some constant $c_{3}$, using the fact that $f_{h}$, and therefore $f_{h}^{\prime}$ and $f_{h}^{\prime\prime}$, are continuous. Hence $f_{h}$ satisfies Assumption (B1). ∎

Now let $p(x)=b_{k}\exp\bigl(-a_{k}V(x)\bigr)$, where $V$ satisfies the assumptions listed in Remark 4.3. To prove that $f_{z}$ (with respect to $p$) satisfies Assumption (B2), we adapt (8.60) as well as (8.61) and (8.62), using the assumptions on $V$. We obtain for $x>0$

Estimating $(-\psi(x)f_{z}(x))^{\prime}$ gives

Acknowledgement. During the preparation of our manuscript we became aware of a preprint of S. Chatterjee and Q.-M. Shao on Stein's method with applications to the Curie–Weiss model. As far as we understand, the authors give there an alternative proof of Theorems 1.2 and 1.3.