Langevin dynamics with constraints and computation of free energy differences

Tony Lelievre, Mathias Rousset, Gabriel Stoltz

Introduction and main results

Free energy is a central concept in thermodynamics and in modern works on biochemical and physical systems. Typical examples studied by computer simulations include the solvation free energies (which is the free energy difference between a molecule in vacuo and its counterpart surrounded by solvent molecules) and the binding free energy of two molecules (which determines whether a new drug can have an efficient action on a given protein). In many applications, it is actually the free energy difference profile between the initial and the final state which is a quantity of paramount importance. It is observed by practitioners that free energy barriers are a very important element to describe transition kinetics from one state to the other. For instance, the chemical kinetics of reactions happening in solvent (such as in the cells of our bodies) are limited by free energy barriers, and can take place only when the free energy difference between the initial and the final state is negative, or at least less than the typical thermal energy. It is therefore very important to accurately compute free energy differences in order to assess the likelihood of a certain physical event to happen.

Beside these physical motivations to compute free energy differences, a more abstract motivation is to overcome sampling barriers encountered when computing canonical averages (see the discussion in [35, Section 1.3.3]). Indeed, it is often the case in practice that the trajectories generated by the numerical methods at hand remain stuck for a long time in some region of the phase space, and hop only occasionally to another region, where they also remain stuck – a behavior known as metastability. Chemical and physical intuitions may guide the practitioners of the field towards the choice of some slowly evolving degree of freedom, called reaction coordinate in the following, responsible for the metastable behavior of the system. In this case, free energy techniques can be used to accelerate the sampling. This viewpoint allows to consider applications which are not motivated by physical or biological problems, such as curing sampling issues in Bayesian statistics .

In this introductory section, we present the main results and give the outline of the paper, highlighting the three main contributions of this work. We only briefly define the concepts we need in this general introduction, and refer the reader to the following sections (in particular Section 2) for further precisions on the mathematical objects at hand.

In the present paper, the focus is on the canonical ensemble, which is the equilibrium probability distribution of microscopic states of a system at fixed temperature (fixed average energy). For systems without constraints, this ensemble is characterized by the probability distribution

where ZZ is the normalizing constantThe potential VV is assumed to be such that Z<Z<\infty. ensuring that μ\mu is indeed a probability distribution, and β=(kBT)1\beta=(k_{\rm B}T)^{-1} is proportional to the inverse temperature. One dynamics which admits the canonical measure (1.1) as an invariant measure is the Langevin dynamics (see for instance [35, Section 2.2.3] and references therein):

where WtW_{t} is a standard 3N3N-dimensional Brownian motion, and γ(q),σ(q)\gamma(q),\sigma(q) are 3N×3N3N\times 3N position dependent real matrices which are assumed to satisfy the fluctuation-dissipation identity

The Langevin dynamics can be seen as some modification of the Hamiltonian dynamics with two added components: a damping term γ(qt)M1ptdt-\gamma(q_{t})M^{-1}p_{t}\,dt and a random forcing term σ(qt)dWt\sigma(q_{t})\,dW_{t}. The energy dissipation due to the damping is compensated by the random forcing in such a way that the temperature of the system is T=(kBβ)1T=(k_{B}\beta)^{-1} (with kBk_{\rm B} Boltzmann’s constant).

We will consider positions subject to a mm-dimensional mechanical constraint denoted by

As will become clear below, constrained systems appear in computational statistical physics in two kinds of contexts (see e.g. [42, Chapter 10], and for applications to the computation of free energy differences, and for mathematical textbooks dealing with constrained Hamiltonian dynamics):

for free energy computations, where ξ\xi is a given reaction coordinate parameterizing a transition between ”states” of interest ;

when the system is subject to molecular constraints such as rigid covalent bonds, or rigid bond angles in molecular systems.

In the sequel, ξ\xi may be thought of at first reading as a reaction coordinate (case (i)). Section 4.1 explains how to handle additional molecular constraints (case (ii)) within the same formalism. In any case, the position of the system is constrained onto the submanifold of co-dimension mm:

and the associated phase space is the cotangent bundle denoted by

For a given qΣ(z)q\in\Sigma(z), the set of cotangent momenta is denoted by

The orthogonal projection on TqΣ(z)T^{*}_{q}\Sigma(z) with respect to the scalar product induced by M1M^{-1} is denoted

where GM(q)G_{M}(q) is the Gram matrix associated with the constraints

Throughout the paper, we assume that GMG_{M} is invertible everywhere on Σ(z)\Sigma(z) (for all zz). It is easily checked that PMP_{M} satisfies the projector property PM(q)2=PM(q)P_{M}(q)^{2}=P_{M}(q), and the orthogonality property

2. The constrained Langevin dynamics

For constrained systems, the associated canonical distribution is defined by

where σTΣ(z)(dqdp)\sigma_{T^{*}\Sigma(z)}(dq\,dp) is the phase space Liouville measure of TΣ(z)T^{*}\Sigma(z), and Zz,0Z_{z,0} the normalizing constant (zz refers to the position constraint, and to the velocity or momentum constraint, see (2.15) below). See Section 2.3 for precise definitions.

A dynamics admitting the constrained canonical measure (1.9) as an invariant equilibrium measure is the following Langevin process (“CL” stands for “constrained Langevin”): For a given initial condition (q0,p0)TΣ(z)(q_{0},p_{0})\in T^{*}\Sigma(z),

3. Free energy computations

In words, eβF(z)dz{\rm e}^{-\beta F(z)}\,dz is the image of the measure μ\mu by ξ\xi, and FF can be seen as an “effective potential energy” associated to ξ\xi.

Computing the free energy profile zF(z)z\mapsto F(z) (up to an additive constant independent of zz), or free energy differences between two states F(z2)F(z1)F(z_{2})-F(z_{1}) is a way to compare the relative probabilities of different ”states” parameterized by ξ\xi. This is a very important calculation for practical applications, see . A state should be understood here as the collection of all possible microscopic configurations (q,p)(q,p), distributed according to the canonical measure (1.1), and satisfying the macroscopic constraint ξ(q)=z\xi(q)=z. Since we only focus on computing free energy differences, FF is defined up to an additive constant (independent of zz, denoted by C{\rm C} below, and whose value may vary from line to line) and can be rewritten as:

However, when using constrained simulations in phase space, the momentum variable of the dynamical system is also constrained, and a modified free energy (called “rigid free energy” in the sequel, see Remark (3.3) below for a justification of the term ”rigid”) is more naturally computed, see Section 3. The latter is defined as

More precisely, the first point (i) amounts to showing that

As compared to , where a formal proof for the Hamiltonian case is proposed, we use an explicit calculation that does not require the use of the Lagrangian structure of the problem, or a change of coordinates. Once zFrgdM(z)\nabla_{z}F_{\rm rgd}^{M}(z) is obtained, FrgdM(z)F_{\rm rgd}^{M}(z) can be computed (up to an additive constant) by integration. This procedure is known as thermodynamic integration. Note that using (1.15) and thermodynamic integration, together with (1.14), allows to obtain F(z)F(z) without computing second order derivatives of ξ\xi. This is a desirable property since computing such high derivatives may be cumbersome for some reaction coordinates used in practice. Straightforward computations of the mean force using analytical expressions (see for instance (4.11)-(4.12)) usually involve such high order derivatives.

The second point (ii) is then based on a discretization of a variant of (1.15), obtained by subtracting the martingale part of the Lagrange multipliers. This amounts to averaging the two Lagrange multipliers involved in the RATTLE part of the scheme, see (4.18)).

We also discuss how these techniques can be generalized to compute the free energy for systems with molecular constraints, see Section 4.1.

4. Jarzynski-Crooks relations and nonequilibrium computations of the free energy

As explained in Section 5.3, it is possible to define the work associated with the constraints exerted on the system between time and TT as the displacement multiplied by the constraining force:

where 12βlndetGM(q)\frac{1}{2\beta}\ln\det G_{M}(q) is the so-called Fixman term due to the geometry of the position constraints (see (1.14) and Remark 3.3), and 12z˙(t)TGM1(q)z˙(t)\frac{1}{2}\dot{z}(t)^{T}G_{M}^{-1}(q)\dot{z}(t) is the kinetic energy term due to the velocity of the switching. Then, the free energy profile can be computed through the following fluctuation identity (see (5.23)):

5. Organization of the paper

Preliminaries

After making precise our notation for matrices and matrix valued functions in Section 2.1, we introduce some additional concepts required to describe constrained systems in Section 2.2, and define the phase space measures with constraints in Section 2.3.

Throughout the paper, the following notation is used:

The canonical symplectic matrix is denoted by:

2. Constraints

Contrarily to what is often done in the literature, we avoid global changes of variables and the use of generalized coordinates. We observe that global changes of variables are not required for the proofs of the theoretical results we present, and they are definitely to be avoided in practical numerical computations whenever possible.

Two useful concepts to study constrained Hamiltonian systems (in particular, to use the co-area formula in phase space, as well as the Poisson bracket formulation of the Liouville equation) are the effective velocity vξv_{\xi} and the effective momentum pξp_{\xi} associated with the constrained degrees of freedom ξ\xi:

The expression of the effective velocity is obtained by deriving the constraint ξ\xi along an unconstrained trajectory of the Hamiltonian dynamics

The last equations allow to consider GM1G_{M}^{-1} as some effective mass.

The constraints on a mechanical system can also be reformulated in the more general form

where either (i) the effective momentum is constrained, in which case Ξ=(ξ,pξ)\Xi=(\xi,p_{\xi}) and ζ=(z,pz)\zeta=(z,p_{z}); or (ii) the effective velocity is constrained, in which case Ξ=(ξ,vξ)\Xi=(\xi,v_{\xi}) and ζ=(z,vz)\zeta=(z,v_{z}). The phase space associated with such constraints is denoted by

A position qΣ(z)q\in\Sigma(z) being given, the affine space of constrained momenta verifying (2.6) is then denoted by

in the effective velocity case, and by Σpξ(q,)(pz)\Sigma_{p_{\xi}(q,\cdot)}(p_{z}) in the effective momentum case. This notation is very important for nonequilibrium methods where the constraints evolve in time according to a predefined schedule, see Section 5. Note that the phase space of mechanical constraints, defined by (1.5), is simply TΣ(z)=Σξ,vξ(z,0)=Σξ,pξ(z,0)T^{*}\Sigma(z)=\Sigma_{\xi,v_{\xi}}(z,0)=\Sigma_{\xi,p_{\xi}}(z,0).

We can now define the skew-symmetric Gram tensor of dimension 2m×2m2m\times 2m associated with the constraints:

The Gram matrix Γ\Gamma associated with the generalized constraints (2.6) can be explicitly computed by block. Indeed, for Ξ=(ξ,pξ)T\Xi=(\xi,p_{\xi})^{T},

Therefore, det(Γ)=1{\rm det}(\Gamma)=1 in this case. In the case Ξ=(ξ,vξ)T\Xi=(\xi,v_{\xi})^{T}, the Gram matrix reads

and det(Γ)=det(GM)2{\rm det}(\Gamma)={\rm det}(G_{M})^{2}. Note that in both cases det(Γ)>0{\rm det}(\Gamma)>0. The constrained symplectic (skew-symmetric) matrix is now defined by

and the Poisson bracket associated with generalized constraints (2.6) by:

This Poisson bracket is often called the Dirac bracket in the literature (in reference to the seminal work of Dirac ).

It is easily checked that {,}Ξ\left\{\cdot,\cdot\right\}_{\Xi} verifies the characteristic properties of Poisson brackets, namely the skew-symmetry, Jacobi’s identity, and Leibniz’ rule. Therefore, the flow associated with the evolution equation

defines a symplectic map. Recall that (see [22, Section VII.1.2]) a map ϕ:ΣΞ(ζ)ΣΞ(ζ)\phi\,:\,\Sigma_{\Xi}(\zeta)\to\Sigma_{\Xi}(\zeta) is symplectic if for any (q,p)ΣΞ(ζ)(q,p)\in\Sigma_{\Xi}(\zeta) and u,vT(q,p)ΣΞ(ζ)u,v\in T_{(q,p)}\Sigma_{\Xi}(\zeta),

A consequence of the symplectic structure is the divergence formula (2.25) relating the phase space measure σΣΞ(ζ)(dqdp)\sigma_{\Sigma_{\Xi}(\zeta)}(dq\,dp) on ΣΞ(ζ)\Sigma_{\Xi}(\zeta) (defined below), and the Poisson bracket (2.13). The reader is referred to Chapter 88 in , and Section VII.11 in for more material on constrained systems.

3. Phase space measures

and (u1(q,p),,u6N2m(q,p))(u_{1}(q,p),\ldots,u_{6N-2m}(q,p)) is a basis of tangential vectors of the submanifold TΣ(z)T^{\ast}\Sigma(z) (or ΣΞ(ζ)\Sigma_{\Xi}(\zeta)) at a given point (q,p)(q,p).

It is now possible to define a generalization of the canonical distribution (1.9) as follows:

The distribution (2.15) is associated with the generalized constraints Ξ=(ξ,vξ)T\Xi=\left(\xi,v_{\xi}\right)^{T}, and is used in Section 5 for nonequilibrium methods. Note that μΣξ,vξ(z,0)=μTΣ(z)\mu_{\Sigma_{\xi,v_{\xi}}(z,0)}=\mu_{T^{\ast}\Sigma(z)} defined in (1.9).

3.2. Co-area decompositions

A more concise notation for the above equalities is dqdp=δΞ(q,p)ζ(dqdp)dζdq\,dp=\delta_{\Xi(q,p)-\zeta}(dq\,dp)\,d\zeta and dq=δξ(q)z(dq)dzdq=\delta_{\xi(q)-z}(dq)\,dz.

We refer for example to Chapter 33 in for an elementary proof. An equivalent of (2.18)-(2.17) for momenta reads, for constrained effective momenta:

and for constrained effective velocities:

Using the co-area formulas (2.18)-(2.19), and the expressions of symplectic Gram matrices (2.10)-(2.11), we obtain the following expressions of the phase space measures:

The phase space measure on Σξ,pξ(z,pz)\Sigma_{\xi,p_{\xi}}(z,p_{z}) can be identified with the conditional measure defined in (2.16):

while the phase space measure on Σξ,vξ(z,vz)\Sigma_{\xi,v_{\xi}}(z,v_{z}) is related to the corresponding conditional measure as

The phase space measures are given by the product of surface measures:

Equations (2.23)-(2.24) are a consequence of the fact that

3.3. Divergence formulas

We end this section with an important formula, which is used to show the invariance of the canonical measure in the proof of Proposition 3.2.

The divergence formula (2.25) can be proved using Darboux’s theorem and internal coordinates, or directly using the co-area formula (see Section 3.3 in ).

Constrained Langevin processes and sampling

By differentiating with respect to time the constraint ξ(qt)=z\xi(q_{t})=z, the Lagrange multipliers can be computed explicitly (see for instance Section 3.3 in ):

where we introduced the notation (σP,γP):=(PMσ,PMγPMT)(\sigma_{P},\gamma_{P}):=(P_{M}\,\sigma,P_{M}\,\gamma\,P_{M}^{T}). The constraint therefore has two effects: (i) the matrices γ,σ\gamma,\sigma in the dissipation and fluctuation terms are replaced by their projected counterparts γP,σP\gamma_{P},\sigma_{P}, and (ii) an orthogonal constraining force ξfrgdM\nabla\xi f_{\rm rgd}^{M} is introduced.

The expression of LΞ\mathcal{L}_{\Xi} can be obtained using Itô calculus, as made precise in the following proposition.

where the fluctuation-dissipation part is

with (σP,γP)(\sigma_{P},\gamma_{P}) defined in (1.16). Using the fluctuation-dissipation relation (1.3), the generator LΞthm\mathcal{L}_{\Xi}^{\rm thm} can be rewritten more compactly as

Let us first consider (i), with Ξ=(ξ,vξ)\Xi=(\xi,v_{\xi}) (the case Ξ=(ξ,pξ)\Xi=(\xi,p_{\xi}) being similar). Note that

where the Hessian operator Hess{\rm Hess} is defined in (2.1). Now, (2.11) implies that

Besides, vξ(qt,pt)=0v_{\xi}(q_{t},p_{t})=0 along a trajectory, since (qt,pt)TΣ(z)(q_{t},p_{t})\in T^{*}\Sigma(z). Therefore,

Finally, for all (q,p)TΣ(z)(q,p)\in T^{*}\Sigma(z),

The operator (3.10) is the generator of the Hamiltonian part in (3.3).

We turn to (ii). The diffusive part arises from the fluctuation term σP(qt)dWt\sigma_{P}(q_{t})\,dW_{t} in (3.3), and its expression

is obtained directly from the standard Itô calculus. Similarly, the dissipation operator is

The addition of these two contributions gives the expression of LΞthm\mathcal{L}_{\Xi}^{\rm thm}. ∎

The stationarity and reversibility properties follow from the following detailed balance condition up to momentum reversal (see for instance Section 2.2 in ): for any test functions φ1\varphi_{1}, φ2\varphi_{2},

where S:(q,p)(q,p)S\,:(q,p)\mapsto(q,-p) is the momentum flip. In view of the expression (3.4) of the generator, proving (3.11) amounts to proving this property for the operators {.,H}Ξ\left\{.,H\right\}_{\Xi} and LΞthm\mathcal{L}_{\Xi}^{\rm thm}.

For the Hamiltonian part {.,H}Ξ\left\{.,H\right\}_{\Xi}, the expression (3.10) yields

and the divergence formula (2.25) yields the balance condition (3.11) for the Hamiltonian part, in view of the invariance of the distribution σTΣ(z)\sigma_{T^{*}\Sigma(z)} under the momentum flip SS.

For the thermostat part, it is easily checked, that

for any smooth test function φ\varphi, so that the detailed balance condition up to momentum reversal (3.11) follows from the following more general detailed balance condition, in the case vz=0v_{z}=0 (μΣξ,vξ(z,vz)\mu_{\Sigma_{\xi,v_{\xi}}(z,v_{z})} being defined in (2.15)):

After integration in qq, using the formula (3.5) for LΞthm\mathcal{L}_{\Xi}^{\rm thm} and (2.24), an expression symmetric in (φ1,φ2)(\varphi_{1},\varphi_{2}) is obtained:

hence the detailed balance condition (3.12).

Ergodicity is a consequence of the hypo-ellipticity of the operator LΞ\mathcal{L}_{\Xi} on TΣ(z)T^{\ast}\Sigma(z) (Hörmander’s criterion is satisfied, see ), which is itself a consequence of the fact that PM(q)γPM(q)TP_{M}(q)\gamma P_{M}(q)^{T} is strictly positive on each TqΣ(z)T^{\ast}_{q}\Sigma(z). The proof can be carried out using local coordinates and the results from . ∎

We have considered in this section the Langevin dynamics (3.3) with constraints rigidly imposed by a projection onto the submanifold TΣ(z)T^{*}\Sigma(z). This dynamics samples the canonical distribution (1.9) μTΣ(z)(dqdp)=Zz,01eβH(q,p)σTΣ(z)(dqdp)\mu_{T^{*}\Sigma(z)}(dq\,dp)=Z_{z,0}^{-1}\,{\rm e}^{-\beta H(q,p)}\,\sigma_{T^{*}\Sigma(z)}(dq\,dp), with constraints on both positions and momenta. The marginal on positions of this distribution is, in view of (2.23), proportional to

This is what we call in the following rigidly imposed constraints. The canonical distribution (1.9) with rigid constraints is naturally associated to the rigid free energy (1.13) (and this is what justifies the qualification ”rigid”), since, we recall,

Another way to impose some constraints on a system is to add a penalization term. In our context, this could be done by changing the potential energy VV to

It is easy to check that, in the limit ε0\varepsilon\to 0 (infinite stiffness limit), the canonical measure associated to this potential is the canonical distribution (1.1) with positions conditioned by ξ(q)=z\xi(q)=z. This distribution is proportional to eβH(q,p)δξ(q)z(dq)dp{\rm e}^{-\beta H(q,p)}\,\delta_{\xi(q)-z}(dq)\,dp and its marginal on positions is proportional to

This is what we call in the following softly imposed constraints. The canonical distribution with soft constraints is naturally associated with the standard free energy, since, we recall,

Note that, in view of (2.18), the marginal on positions for softly imposed constraints can be written in terms of rigidly imposed constraints through a modification of the potential:

is sometimes called the Fixman corrector (see ). Thus, if (qt,pt)(q_{t},p_{t}) satisfies (3.3) with the modified potential V+VfixV+V_{{\rm fix}} then qtq_{t} samples (in the longtime limit) the probability measure proportional to eβV(q)δξ(q)z(dq){\rm e}^{-\beta V(q)}\delta_{\xi(q)-z}(dq), and we thus refer to this dynamics as the softly constrained Langevin dynamics.

These concepts will be used in Section 4.1 to describe the computation of free energy differences for systems with molecular constraints.

Finally, let us mention that the infinite stiffness limit ε0\varepsilon\to 0 of the Langevin dynamics (1.2) with the potential VεV_{\varepsilon} is not (except for very specific forms of constraints) the softly constrained Langevin dynamics, as one would expect. We refer to for example, where it is shown that adiabatic effective potentials (derived from the conservation of the ratio of energy over frequency of fast modes) are required to describe the limiting dynamics. However, a formal argument based on “over-damping” the fast modes indeed leads to the softly constrained Langevin dynamics. \sqcap\sqcup

2. Numerical implementation

For simplicity, we restrict ourselves to constant matrices γ\gamma and σ\sigma. Generalizations to position dependent matrices are straightforward.

This equation can be explicitly integrated on [0,t][0,t] to obtain:

However, the matrix exponential etγP(q)M1{\rm e}^{-t\,\gamma_{P}(q)M^{-1}} may be difficult to compute in practice (except for certain choices of γ\gamma and MM, see the discussion at the end of Section 3.2.2). Instead of performing an exact integration, (3.14) can be discretized using a midpoint Euler scheme, which yields (3.16) and (3.18) below.

The numerical scheme we investigate, termed midpoint Euler-Verlet-midpoint Euler splitting, is therefore the following:

Note that when γ=0\gamma=0 and σ=0\sigma=0, the scheme (3.16)-(3.17)-(3.18) becomes deterministic, and reduces to (3.17), which is a scheme for the deterministic Hamiltonian equations of motion with position constraints ξ(q)=z\xi(q)=z. The latter scheme is referred to as the ”Hamiltonian scheme (3.17)” below.

The domain DΔtTΣ(z)D_{\Delta t}\subset T^{\ast}\Sigma(z) is defined as the set of configurations (qn,pn+1/4)TΣ(z)(q^{n},p^{n+1/4})\in T^{\ast}\Sigma(z) such that there is a unique solution (qn+1,pn+3/4)(q^{n+1},p^{n+3/4}) verifying (3.17).

Solving the position constraints (Cq)(C_{q}) consists in projecting onto Σ(z)\Sigma(z) a point in a Δt\Delta t-neighborhood of qnq^{n}. Thus, by the implicit function theorem, the domain DΔtD_{\Delta t} verifies:

It may happen that there is no solution if the time-step is too large, and, even for small time-steps, that several projections exist, see for instance Example 22 in Chapter 77 of . In practice, DΔtD_{\Delta t} can be chosen to be the set of (qn,pn+1/4)(q^{n},p^{n+1/4}) such that the Newton algorithm enforcing the constraints (Cq)(C_{q}) has converged within a given precision threshold and a limited number of iterations.

As for the Verlet scheme in the unconstrained case, the associated numerical flow shares two important qualitative properties with the exact flow: It is time reversible and symplectic (see ). This implies quasi-conservation of energy, in the sense that energy is conserved within a given precision threshold over exponentially long times, see .

2.2. Comments on the fluctuation-dissipation part (3.16) and (3.18)

The new momentum pn+1/4TΣ(z)p^{n+1/4}\in T^{\ast}\Sigma(z) in (3.16) (or pn+1p^{n+1} in (3.18)) may be obtained by first integrating the unconstrained dynamics with a midpoint scheme, and then computing the Lagrange multiplier λn+1/4\lambda^{n+1/4} (or λn+1\lambda^{n+1}) by solving the following linear system implied by the constraints (Cp)(C_{p}):

Besides, it can be checked (see Sections 2.3.2 and 3.3.5 in ) that the Markov chain induced by the fluctuation-dissipation part of the scheme (3.16) (or (3.18)) verifies a detailed balance equation (both in the plain sense and up to momentum reversal) with respect to the stationary measure κTqΣ(z)M1(dp)\kappa_{T^{\ast}_{q}\Sigma(z)}^{M^{-1}}(dp). The latter is defined as the kinetic probability distribution

and is the marginal in the pp-variable of the canonical distribution μTΣ(z)(dqdp)\mu_{T^{*}\Sigma(z)}(dq\,dp) conditioned by a given qΣ(z)q\in\Sigma(z). Moreover, if γP:=PMγPMT\gamma_{P}:=P_{M}\,\gamma\,P_{M}^{T} is strictly positive in the sense of symmetric linear transformations of TqΣ(z)T^{*}_{q}\Sigma(z), then the Markov chain on momentum variable induced by (3.16) (or (3.18)) alone is ergodic with respect to κTqΣ(z)M1(dp)\kappa_{T^{\ast}_{q}\Sigma(z)}^{M^{-1}}(dp).

Finally, an important simplification occurs in the integration of (3.14) in the special case when γ\gamma and MM are equal up to a multiplicative constant (so that γM1\gamma M^{-1} is proportional to identity). Indeed in this case the equality (γP(q)M1)n=PM(q)(γM1)n\left(\gamma_{P}(q)M^{-1}\right)^{n}=P_{M}(q)\left(\gamma\,M^{-1}\right)^{n} holds for any n0n\geq 0, and (3.15) simplifies to

The numerical integration of (3.14) can thus be carried out in two steps: (i) exactly integrating (3.14) without constraint, and then (ii) projecting the result onto TqΣ(z)T^{\ast}_{q}\Sigma(z).

2.3. Metropolis-Hastings correction

Usually, the invariant probability distribution sampled by the solution of a numerical scheme is biased by the time discretization. Relying on (i) the time symmetry (up to momentum reversal) and (ii) the preservation of the phase space measure σTΣ(z)(dqdp)\sigma_{T^{*}\Sigma(z)}(dq\,dp) by the solution of the RATTLE scheme (3.17), it is possible to eliminate the time discretization error in the splitting scheme (3.16)-(3.17)-(3.18) by resorting to a Generalized Hybrid Monte Carlo algorithm.

Consider an initial configuration (q0,p0)TΣ(z)(q^{0},p^{0})\in T^{*}\Sigma(z), and a sequence (Gn,Gn+1/2)n0({\mathcal{G}}^{n},{\mathcal{G}}^{n+1/2})_{n\geq 0} of independently and identically distributed standard Gaussian vectors. Iterate on n0n\geq 0:

Evolve the momentum according to the midpoint Euler scheme (3.16), and compute the energy En=H(qn,pn+1/4)E^{n}=H(q^{n},p^{n+1/4}) of the new configuration;

Integrate the Hamiltonian part according to the RATTLE scheme (3.17), denote (q~n+1,p~n+3/4)(\widetilde{q}^{n+1},\widetilde{p}^{n+3/4}) the resulting state, and set En+1=H(q~n+1,p~n+3/4)E^{n+1}=H(\widetilde{q}^{n+1},\widetilde{p}^{n+3/4}).

Accept the proposal (qn+1,pn+3/4):=(q~n+1,p~n+3/4)(q^{n+1},p^{n+3/4}):=(\widetilde{q}^{n+1},\widetilde{p}^{n+3/4}) with probability

Otherwise, reject and flip the momentum: (qn+1,pn+3/4)=(qn,pn+1/4)(q^{n+1},p^{n+3/4})=(q^{n},-p^{n+1/4}).

Evolve the momentum according to the midpoint Euler scheme (3.18).

By construction, the GHMC algorithm with constraints leaves invariant the equilibrium distribution μTΣ(z)(dqdp)\mu_{T^{*}\Sigma(z)}(dq\,dp) (see Section 3.3.5 in ).

To understand the momentum reversal required upon rejection, it is useful to write more explicitly the Markov chain as the composition of a Metropolis-Hastings part, where the proposal is obtained by a RATTLE step followed by a momentum reversal (the latter operation is needed to ensure the symmetry of the proposition), which is then accepted or rejected; and another momentum reversal (which leaves invariant the targeted probability measure μTΣ(z)(dqdp)\mu_{T^{*}\Sigma(z)}(dq\,dp)). When the proposal is accepted, the two momentum reversals cancel out each other. On the other hand, when the proposal is rejected, momenta are actually reversed. See [35, Section 2.1.4] for more background on generalized Metropolis-Hastings algorithms.

In the above, we implicitly assume that the RATTLE scheme (3.17) is everywhere well defined. In practice however, it is necessary to modify Algorithm 3.5 by restricting the sampled configurations to DΔtD_{\Delta t}. This can be achieved by introducing additional tests in steps (1), (2) and (4), and rejecting the states that have gone outside the set DΔtTΣ(z)D_{\Delta t}\subset T^{\ast}\Sigma(z) where the position constraint (Cq)(C_{q}) is well defined. By doing so, the global algorithm has an invariant equilibrium distribution given by μTΣ(z)(dqdp)\mu_{T^{*}\Sigma(z)}(dq\,dp) conditioned on the set of states DΔtD_{\Delta t}. This invariant distribution can be written explicitly as follows:

Therefore, there exists a>0a>0 small enough such that, when pn+1/4a/Δt\|p^{n+1/4}\|\leq a/\Delta t, the RATTLE scheme in (3.17) is well defined, namely there exists a unique qn+1q^{n+1} satisfying the constraint (Cq)(C_{q}). This shows that

The invariant probability distribution of the Markov chain generated by GHMC with the additional rejection steps ensuring 12pTM1pRΔt\frac{1}{2}p^{T}M^{-1}p\leq R_{\Delta t}, is given by (3.21), and actually reads

Its marginal distribution in the position variable is then exactly given by:

This is also the marginal distribution in the position variable of μTΣ(z)\mu_{T^{*}\Sigma(z)}. Note however, that if RΔtR_{\Delta t} is too small, only small momenta will be sampled in step (1) of Algorithm 3.5, and the correlation time of the sampling will be large. In practice, the threshold RΔtR_{\Delta t} should be tuned in preliminary computations so that: (i) RΔtR_{\Delta t} is small enough so that the maximal number NmaxN_{\rm max} of iterations for the Newton algorithm used to enforce (Cq)(C_{q}) in (3.17) is never reached; (ii) RΔtR_{\Delta t} is large enough so that the correlation time of the sampling is as small as possible.

Let us end this section with a warning: It is now known that the correction of the bias in discretizations of the Langevin dynamics by a Metropolization of the scheme may reduce the efficiency of the sampling, see for instance .

3. Exact sampling on a submanifold with overdamped dynamics

Constrained overdamped Langevin processes (or Brownian dynamics) are solutions of the stochastic differential equation (see also )

where λt\lambda_{t} is an adapted stochastic process. Equivalently, (3.23) can be rewritten in the Stratonovitch form as

where \circ denotes the Stratonovitch integration, and PP is the projector defined by (1.7) with the choice M=IdM={\rm Id}:

It can be shown that (3.23) verifies the detailed balance condition for (and is ergodic with respect to) the invariant distribution

Suppose that the following relation is satisfied:

where (Gn)n0({\mathcal{G}}^{n})_{n\geq 0} are independent and identically distributed centered and normalized Gaussian variables, and (λodn)n1(\lambda^{n}_{\rm od})_{n\geq 1} are the Lagrange multipliers associated with the constraints (ξ(qn)=z)n1(\xi(q^{n})=z)_{n\geq 1}. Moreover, the Lagrange multipliers in (3.17) verify:

Irrespective of pnp^{n}, the choice (3.26) in the scheme (3.16)-(3.17)-(3.18) leads to

where λn+1/4\lambda^{n+1/4} is associated with the constraints ξ(qn)Tpn+1/4=0\nabla\xi(q^{n})^{T}p^{n+1/4}=0. This gives

where λn+1/2\lambda^{n+1/2} is such that ξ(qn+1)=z\xi(q^{n+1})=z. The fluctuation-dissipation relation (1.3) can be reformulated in this context as

and the scheme (3.27) is recovered by taking the associated Lagrange multiplier equal to λodn+1=λn+1/4+2λn+1/2\lambda^{n+1}_{\rm od}=\lambda^{n+1/4}+2\lambda^{n+1/2}. Finally, remarking that GM=2ΔtGG_{M}=\frac{2}{\Delta t}G and computing explicitly λn+1/2\lambda^{n+1/2} and λn+3/4\lambda^{n+3/4} in (3.17) yields (3.28)-(3.29)-(3.30). ∎

This point of view allows to construct a Metropolis correction to the Euler scheme (3.27), using the Generalized Hybrid Monte Carlo scheme (Algorithm 3.5) with the time-step chosen according to (3.26). In this way, assuming that the position constraint (Cq)(C_{q}) in (3.17) is everywhere well defined, we obtain a Markov chain (qn)n0(q^{n})_{n\geq 0} discretizing the overdamped dynamics (3.27) which exactly samples the invariant distribution (3.25). Deriving such a Metropolis-Hastings correction to the Euler scheme (3.27) without resorting to phase-space dynamics does not seem to be natural.

Proposition 3.6 can be seen as a discrete version of the zero-mass limit of the Langevin dynamics. It is also possible to obtain a discrete version of the overdamped limit (γ\gamma\to\infty) of the Langevin dynamics by assuming that the parameters satisfy the relation

which is less restrictive than (3.26). Equation (3.27) is then obtained with Δt\Delta t replaced by

In this case, the effective discretization time-step Δs\Delta s for the overdamped Langevin dynamics is thus different from the time-step Δt\Delta t originally used for the discretization of the Langevin dynamics. This is reminiscent of the fact that the overdamped limit (at the continuous level) of the Langevin dynamics requires a change of timescale to obtain the overdamped Langevin dynamics.

where MM is not supposed to be proportional to the identity, the following numerical scheme is obtained:

where Δs~=Δt2/2\widetilde{\Delta s}=\Delta t^{2}/2. This is a discretization of the overdamped dynamics

which is a generalization of (3.23) to a scalar product on Σ(z)\Sigma(z) induced by a general positive definite mass matrix MM. \sqcap\sqcup

Thermodynamic integration with constrained Langevin dynamics

In this section, we focus on the computation of the gradient of the rigid free energy (1.13)

As explained in the introduction, we may indeed concentrate on the computation of the rigid free energy (1.13), since the standard free energy (1.12) can be computed from the latter one using (1.14). The relation (1.14) can be proved with the co-area formula (2.18). Indeed, the free energy defined in (1.12) can be rewritten as (where C denotes a constant which may vary from line to line):

where surface measures are defined in Section 2.3. Note that the rigid free energy FrgdMF_{\rm rgd}^{M} indeed depends explicitly on the mass matrix since

This section is organized as follows. First, we show how systems with molecular constraints and systems with constrained values of the reaction coordinate can be treated in a unified framework (Section 4.1). We then relate the Lagrange multipliers arising in the constrained Langevin dynamics, and the gradient of the rigid free energy (the so-called mean force) in Section 4.2. We consider the numerical computation of the mean force in Section 4.3, where we prove consistency results for the corresponding approximation formulas. Finally, some numerical results on a model system illustrate the approach in Section 4.4.

We discuss here how to generalize all the computations to systems with molecular constraints, generalizing thereby some results of . Without loss of generality, we consider rigidly imposed molecular constraints, see Remark 4.1 below for a discussion of this assumption. This section can be considered as independent of the remainder of the paper and may therefore be omitted in a first reading.

In practice, many systems are subject to molecular constraints, such as fixed lengths for covalent bonds, or fixed angles between covalent bonds. The reader is referred to for practical aspects related to the simulation of molecular constraints. In the context of free energy computations, two types of constraints are therefore considered: first, the molecular constraints,

and the submanifold associated with the reaction coordinates by

is everywhere invertible on ΣmcΣrc(zrc)\Sigma_{\rm mc}\cap\Sigma_{\rm rc}(z_{\rm rc}). Likewise, we denote

Assuming rigid mechanical constraints on the molecular constraints ξmc\xi_{\rm mc} (see Remark 4.1 below), we are led to considering the canonical distribution

to describe systems with molecular constraints at a fixed temperature. The measure σTΣmc\sigma_{T^{*}\Sigma_{\rm mc}} denotes the phase space measure on TΣmcT^{*}\Sigma_{\rm mc}, equal by (2.21) to the conditional measure δξmc(q),pξmc(q,p)(dqdp)\delta_{\xi_{\rm mc}(q),p_{\xi_{\rm mc}}(q,p)}(dq\,dp) associated with the constraints (ξmc(q)=0,pξmc(q,p)=0)(\xi_{\rm mc}(q)=0,p_{\xi_{\rm mc}}(q,p)=0), where pξmcp_{\xi_{\rm mc}} is the effective momentum (2.5) associated with ξmc\xi_{\rm mc}.

The distribution μTΣmc\mu_{T^{*}\Sigma_{\rm mc}} in (4.4) is obtained by constraining rigidly ξmc(q)\xi_{\rm mc}(q) to 0, and not ’softly’ (in which case δξmc(q),pξmc(p,q)(dqdp)\delta_{\xi_{\rm mc}(q),p_{\xi_{\rm mc}}(p,q)}(dq\,dp) would be replaced by δξmc(q)(dq)dp\delta_{\xi_{\rm mc}(q)}(dq)\,dp, see also Remark 3.3). As explained in Section 3, the distribution (4.4) is the equilibrium distribution of a Langevin process (thermostated Hamiltonian dynamics) with rigid position constraints ξmc(q)=0\xi_{\rm mc}(q)=0. Choosing whether constraints should be soft or rigid is a modeling choice, and there is no clear consensus on this issue in the current literature.

Note however that it is possible to rewrite the remainder of this section by considering the softly constrained potential rather than the rigidly constrained potential, up to an appropriate modification of (4.6) below, with the help of some Fixman corrective potential. \sqcap\sqcup

By associativity of the conditioning of measures, the distribution μTΣmc\mu_{T^{*}\Sigma_{\rm mc}} conditioned by a value of the reaction coordinates ξrc(q)=zrc\xi_{\rm rc}(q)=z_{\rm rc} is given, up to a normalizing factor, by:

Therefore, considering the marginal probability distribution of the reaction coordinates ξrc(q)\xi_{\rm rc}(q) leads to the following definition of the free energy associated with ξrc\xi_{\rm rc}:

The conditional distribution can be decomposed as follows, using the co-area formulas (2.18)-(2.20) and the definition of effective momentum (2.5):

Integrating out the momentum in the linear space TqΣmcT^{\ast}_{q}\Sigma_{\rm mc} with scalar product p1,p2M1=p1TM1p2\langle p_{1},p_{2}\rangle_{M^{-1}}=p_{1}^{T}M^{-1}p_{2}, the free energy can be rewritten as:

As a consequence, the free energy FmcF^{\rm mc} can be computed from the generalized rigid free energy:

using the following formula, similar to (1.14):

In the above, μT(Σrc(zrc)Σmc)\mu_{T^{*}(\Sigma_{\rm rc}(z_{\rm rc})\cap\Sigma_{{\rm mc}})} is defined similarly to (1.9). The case of molecular constraints can therefore be treated within the general framework considered in this paper, the sampling of the canonical measure μT(Σrc(zrc)Σmc)\mu_{T^{*}(\Sigma_{\rm rc}(z_{\rm rc})\cap\Sigma_{{\rm mc}})} and the computation of the rigid free energy (4.5) being the problems at hand.

The rigid free energy which is thus naturally computed with such a dynamics is

and the actual free energy is thus recovered by a formula similar to (4.6), namely

2. The mean force and the Lagrange multipliers

In this section, the average of the constraining force (3.2) is related to the gradient of the rigid free energy (1.13) (or mean force). We also give a similar result for the following generalized rigid free energy:

yields on average the rigid free energy derivative:

Moreover, for general constraints (2.6) and the associated generalized free energy (4.7), the formula can be extended as follows: The generalized constraining force is

is defined in (2.9), and the rigid mean force is

where Zζ=ΣΞ(ζ)eβHdσΣΞ(ζ)\displaystyle Z_{\zeta}=\int_{\Sigma_{\Xi}(\zeta)}{\rm e}^{-\beta H}d\,\sigma_{\Sigma_{\Xi}(\zeta)}, and FrgdΞ(ζ)F^{\Xi}_{\rm rgd}(\zeta) is defined in (4.7). When (q,p)(q,p) verifies pξ(q,p)=vξ(q,p)=0p_{\xi}(q,p)=v_{\xi}(q,p)=0, then gΞ(q,p)=0g^{\Xi}(q,p)=0 and fΞ(q,p)=frgdM(q,p)f^{\Xi}(q,p)=f_{\rm rgd}^{M}(q,p).

Formulas (4.9) and (4.10) are obtained directly by replacing φ\varphi by eβH{\rm e}^{-\beta H} in Lemma 6.2 (see the Appendix). The fact that (fΞ(q,p),gΞ(q,p))=(frgdM(q,p),0)(f^{\Xi}(q,p),g^{\Xi}(q,p))=(f_{\rm rgd}^{M}(q,p),0) in the tangential case (namely when pξ(q,p)=vξ(q,p)=0p_{\xi}(q,p)=v_{\xi}(q,p)=0) is a consequence of (3.8). ∎

The following lemma gives a momentum-averaged version of the constraining force (a similar formula exists in the overdamped case, see Equations (4.8)-(4.9) in , for example).

The rigid mean force (4.8) can be rewritten as:

Consider the Gaussian distribution κTqΣ(z)M1(dp){\kappa}^{M^{-1}}_{T^{*}_{q}\Sigma(z)}(dp) defined in (3.19):

which is the marginal distribution in the momentum variable of the canonical distribution μTΣ(z)(dqdp)\mu_{T^{*}\Sigma(z)}(dq\,dp), conditioned by a given qΣ(z)q\in\Sigma(z). Proving Lemma 4.4 amounts to showing that the average of the constraining force frgdMf_{\rm rgd}^{M} with respect to κTqΣ(z)M1(dp){\kappa}^{M^{-1}}_{T^{\ast}_{q}\Sigma(z)}(dp) yields frgdM\overline{f}_{\rm rgd}^{M}:

Denoting p1,p2M1=p1TM1p2\langle p_{1},p_{2}\rangle_{M^{-1}}=p_{1}^{T}M^{-1}p_{2}, this yields

Averaging (3.2) over momenta thus leads to the desired result. ∎

A similar result holds for the ‘Hamiltonian part’ of the Lagrange multipliers, defined by:

Indeed, the following almost sure convergence holds:

The estimator based on (4.16) has a smaller variance than the estimator based on (1.15) (or (4.14) above) since only the bounded variation part is retained, and the martingale part due to the Brownian increments and the dissipation term are subtracted out. Similar results on variance reduction where obtained in the overdamped case in .

Recall the expression (3.1) of the Lagrange multipliers, which can be decomposed as the sum of the constraining force, a dissipation term and a martingale (fluctuation) term:

The result follows from three facts. First, the process is ergodic with respect to the equilibrium distribution μTΣ(z)(dqdp)\mu_{T^{*}\Sigma(z)}(dq\,dp) and averaging frgdMf_{\rm rgd}^{M} yields the rigid free energy derivative in view of Proposition 4.3. This already shows (4.16).

Second, the Gaussian distribution of μTΣ(z)(dqdp)\mu_{T^{*}\Sigma(z)}(dq\,dp) with respect to momentum variables is centered, which yields:

Third, the variance of the martingale term can be uniformly bounded as

The fact that averaging the Lagrange multiplier in (4.14) indeed yields the mean force may not be intuitive. This is actually very much related to the cost interpretation of the Lagrange multipliers in optimization, see [35, Remark 3.29].

3. Numerical discretization of the mean force

Estimates of the mean force based on either (4.8), (4.11) or (4.16) can be obtained.

Free energy derivatives can be computed by averaging frgdM(q)\overline{f}_{\rm rgd}^{M}(q) or frgdM(q,p)f_{\rm rgd}^{M}(q,p) with respect to the distribution μTΣ(z)(dqdp)\mu_{T^{*}\Sigma(z)}(dq\,dp), for instance using the estimators:

The functions frgdM(q)\overline{f}_{\rm rgd}^{M}(q) and frgdM(q,p)f_{\rm rgd}^{M}(q,p) may thus be called “rigid local mean forces”. Note that using the momentum-averaged local mean force frgdM\overline{f}_{\rm rgd}^{M} instead of the original frgdMf_{\rm rgd}^{M} reduces the variance since the fluctuations of the momentum variable have been averaged out analytically. Table 1 below confirms this analysis, although the variance reduction appears to be small in our specific numerical experiment.

3.2. Averaging the Lagrange multipliers

Free energy derivatives can also be computed using the Lagrange multipliers of a Langevin constrained process according to (1.15) or (4.16). This technique avoids the possibly cumbersome computation of second order derivatives Hessq(ξ){\rm Hess}_{q}(\xi) of the reaction coordinate, which appear in the expressions of frgdMf^{M}_{\rm rgd} or frgdM\overline{f}_{\rm rgd}^{M}. Besides, the Lagrange multipliers are needed anyway for the numerical integration of the dynamics.

The computation can be performed as before with a longtime simulation of the splitting scheme (3.16)-(3.17)-(3.18) discretizing the Langevin process with constraints. The following approximation formula can for instance be used:

where (λk+1/2,λk+3/4)(\lambda^{k+1/2},\lambda^{k+3/4}) are the Lagrange multipliers in the Hamiltonian part (3.17). The consistency of this estimator is given by the following proposition.

The approximation formula (4.18) is consistent. More precisely, the Lagrange multipliers (λn+1/2,λn+3/4)(\lambda^{n+1/2},\lambda^{n+3/4}) in (3.16)-(3.17)-(3.18) are both equivalent when Δt0\Delta t\to 0 to the constraining force defined in (3.2):

Moreover, the following second order consistency holds for the sum of the Lagrange multipliers:

The variant (4.20), which involves positions and momenta at the beginning and at the end of the Hamiltonian steps only, is used in (4.22) below to estimate the time discretization error in the thermodynamic integration method based on the estimator (4.16).

For sufficiently small time-steps Δt\Delta t, the implicit function theorem ensures that the two projection steps associated with the nonlinear constraints in (3.16)-(3.17)-(3.18) have a unique smooth solution. A Taylor expansion with respect to Δt\Delta t of the position constraints gives

Then, the fact that z=ξ(qn+1)=ξ(qn)z=\xi(q^{n+1})=\xi(q^{n}) and the identity

yield the following expansion of λn+1/2\lambda^{n+1/2} in terms of (qn,pn+1/2)(q^{n},p^{n+1/2}):

By time symmetry, the same computation holds for λn+3/4\lambda^{n+3/4}, starting from (qn+1,pn+3/4)(q^{n+1},p^{n+3/4}) and by formally replacing Δt\Delta t by Δt-\Delta t. This can be double checked by Taylor expanding with respect to Δt\Delta t the position constraints, as done above for λn+1/2\lambda^{n+1/2}. It thus holds:

The sum of the multipliers therefore reads

which gives (4.19). Now, using the previous calculations, we remark that:

This gives the claimed second order consistency of the sum of the Lagrange multipliers (4.20). ∎

This shows the convergence of the estimate (4.18) of the mean force when taking first the limit Δt0\Delta t\to 0 and then TT\to\infty.

3.3. Estimates relying on the Metropolized scheme

When the scheme (3.16)-(3.17)-(3.18) is complemented with a Metropolis step (see Algorithm 3.5), it is possible to prove a result on the longtime limit of trajectorial averages (i.e. letting first the number of iterations go to infinity, and then taking the limit Δt0\Delta t\to 0), upon assuming the irreducibility of the numerical scheme.

Indeed, let us consider the Markov chain (qk,pk)(q^{k},p^{k}) generated by the GHMC scheme in Algorithm 3.5, and assume (i) the irreducibility of the Markov chain, and (ii) that appropriate rejections outside the set DΔt=Σ(z)×{12pTM1pRΔt}\overline{D}_{\Delta t}=\Sigma(z)\times\{\frac{1}{2}p^{T}M^{-1}p\leq R_{\Delta t}\} are made in the steps (1)-(2)-(4) of the algorithm. In particular, the projection steps associated with the nonlinear constraints in Step (2) of Algorithm 3.5 are well defined.

Then, by ergodicity, an average of the analytic expression of the local rigid mean force frgdM\overline{f}_{\rm rgd}^{M} given in (4.12) yields an estimate of the free energy without time discretization error:

If frgdMf_{\rm rgd}^{M} is used instead of frgdM\overline{f}_{\rm rgd}^{M}, then the mean force is computed with some exponentially small error: almost surely,

for some α>0\alpha>0. The error arising from replacing DΔt\overline{D}_{\Delta t} with TΣ(z)T^{*}\Sigma(z) is indeed exponentially small in view of (3.22) (namely RΔtAΔt2R_{\Delta t}\geq A\,\Delta t^{-2}) and using the fact that the marginal distribution in the pp-variable is Gaussian.

Likewise, for estimates based on Lagrange multipliers, the following longtime averaging holds: almost surely,

where we have used the estimate (4.20) on the Lagrange multipliers. The limit Δt0\Delta t\to 0 is obtained by a dominated convergence argument:

Note that, due to the Metropolis correction in Algorithm 3.5, the time discretization error in the sampling of the invariant measure is removed. The only remaining time discretization errors come from (i) the approximation of the local mean force by the Lagrange multipliers (this is a second order error), and (ii) the integration domain being DΔt\overline{D}_{\Delta t} instead of TΣ(z)T^{*}\Sigma(z) (as discussed above, this is an exponentially small error in Δt\Delta t). In conclusion, the left-hand side of (4.22) is an approximation of zFrgdM(z)\nabla_{z}F_{\rm rgd}^{M}(z) up to a O(Δt2){\rm O}(\Delta t^{2}) error term.

3.4. Overdamped limit

which can be seen as a variant of the variance reduced estimator proposed directly for the overdamped scheme (3.27) in :

The rigorous justification of the consistency of (4.23) in the limit Δt0\Delta t\to 0 follows from the results of . See also Section 5.5.2 below for similar results.

4. Numerical illustration

We consider a system composed of NN particles in a 2-dimensional periodic box of side length LL, interacting through the purely repulsive WCA pair potential, which is a truncated Lennard-Jones potential:

where rr denotes the distance between two particles, ε\varepsilon and σ\sigma are two positive parameters and r0=21/6σr_{0}=2^{1/6}\sigma. Among these particles, two (numbered 1 and 2 in the following) are designated to form a dimer while the others are solvent particles. Instead of the above WCA potential, the interaction potential between the two particles of the dimer is a double-well potential

See for instance for other computational studies using this model.

The potential VSV_{\rm S} exhibits two energy minima, one corresponding to the compact state where the length of the dimer is r=r0r=r_{0}, and one corresponding to the stretched state where this length is r=r0+2wr=r_{0}+2w. The energy barrier separating both states is hh. The reaction coordinate used to describe the transition from the compact to the stretched state is the normalized bond length of the dimer molecule:

where q1q_{1} and q2q_{2} are the positions of the two particles forming the dimer. The compact state (resp. the stretched state) corresponds to the value z=0z=0 (resp. z=1z=1) of the reaction coordinate.

The inverse temperature is set to β=1\beta=1, with N=100N=100 particles (N2N-2 solvent particles and the dimer) with solvent density ρ=(12/N)a2=0.436\rho=(1-2/N)a^{-2}=0.436, since there are N2N-2 solvent particles in a square box of side length L=aNL=a\sqrt{N} with a=1.5a=1.5. The parameters describing the WCA interactions are set to σ=1\sigma=1 and ε=1\varepsilon=1, and the additional parameters for the dimer are w=2w=2 and h=2h=2.

For this system, M=IdM={\rm Id} and ξ|\nabla\xi| is constant, so that the rigid free energy FrgdM(z)F^{M}_{\rm rgd}(z) is equal to the free energy F(z)F(z).

The mean force is estimated at the values zi=zmin+iΔzz_{i}=z_{\rm min}+i\Delta z, with zmin=0.2z_{\rm min}=-0.2, zmax=1.2z_{\rm max}=1.2 and Δz=0.014\Delta z=0.014, by ergodic averages obtained with the projected dynamics with Metropolis correction (Algorithm 3.5, where in the simple case considered here, the fluctuation-dissipation part can be integrated exactly). For each value of zz, we integrate the dynamics on a time T=2×104T=2\times 10^{4} with a step size Δt=0.02\Delta t=0.02, using a scalar friction coefficient γ=1\gamma=1.

The resulting mean force profile is presented in Figure 1, together with the associated free energy profile.

Figure 2 compares the analytical constraining force frgdM(qn,pn)f^{M}_{\rm rgd}(q^{n},p^{n}) and the Lagrange multipliers, see Proposition 4.6. In Figure 2, the xx-axis represents the blocks of 10510^{5} simulation steps, concatenated for the 101101 different values of ziz_{i}. It can be seen that the difference between frgdM(qn,pn)f^{M}_{\rm rgd}(q^{n},p^{n}) and the Lagrange multipliers is small in any cases, though somewhat larger for the lowest values of ξ\xi.

It can be checked numerically that the differences λn+1/2frgdM(qn,pn+1/2)Δt2|\lambda^{n+1/2}-f_{\rm rgd}^{M}(q^{n},p^{n+1/2})\frac{\Delta t}{2}| and λn+3/4frgdM(qn+1,pn+1/2)Δt2|\lambda^{n+3/4}-f_{\rm rgd}^{M}(q^{n+1},p^{n+1/2})\frac{\Delta t}{2}| are indeed of order Δt2\Delta t^{2}, and that the difference

is indeed of order Δt3\Delta t^{3} (by computing the average of these elementary differences for various step sizes). The Lagrange multipliers are in any case very good approximations to the constraining force frgdMf^{M}_{\rm rgd}.

Let us finally discuss the efficiency of the different estimators of the mean force, in terms of their variances. They can be written as the empirical average of the following random sequences:

where qn,pn,λn+1/2,λn+3/4q^{n},p^{n},\lambda^{n+1/2},\lambda^{n+3/4} are given by the numerical scheme (3.16)-(3.17)-(3.18). The correlations in time (between the iterates) are very similar for the three methods, and we therefore simply compute the variance over all the samples. Table 1 compares the so-obtained standard errors over 10510^{5} time-steps with Δt=0.02\Delta t=0.02 (simulation time T=2000T=2000 for each value of the reaction coordinate). The results show that the different estimators are more or less equivalent. This is related to the fact that the essential source of variance comes from the sampling of the positions, and not the sampling of the velocities. Note however that, for the smallest value of the reaction coordinate, the estimator based on the averaged local mean force frgdM(qn)\overline{f}^{M}_{\rm rgd}(q^{n}) appears to be better in terms of variance.

Hamiltonian and Langevin nonequilibrium dynamics

The latter free energy is associated to the normalization constant Zz(t),z˙(t)Z_{z(t),\dot{z}(t)} of the distribution μΣξ,vξ(z(t),z˙(t))\mu_{\Sigma_{\xi,v_{\xi}}(z(t),\dot{z}(t))} defined by (2.15). The generalized rigid free energy (4.7) can be explicitly related to the usual free energy as follows. First, remark that, for a fixed qq,

In the above, the change of variable ppξ(q)GM1(q)vzp\to p-\nabla\xi(q)G_{M}^{-1}(q)v_{z} has been used, in the space

Note that 12vzTGM1(q)vz\frac{1}{2}v_{z}^{T}G_{M}^{-1}(q)v_{z} can be interpreted as the kinetic energy of the reaction coordinate ξ\xi. Using the decomposition of measures (2.24) and the above calculations, an alternative expression of the generalized free energy is:

where, as usual, C{\rm C} denotes a generic constant (independent of zz) whose value may vary from line to line. As a consequence, the standard free energy (1.12) is easily recovered from the generalized free energy, using relations similar to (1.14). Indeed, using (5.1), and with computations similar to the ones leading to (4.2), the difference of the two free energies writes:

In practical nonequilibrium computations, the profile tF(z(t))t\mapsto F(z(t)) can then be computed by adding a corrector to the work value in the Jarzynski estimator computing Frgdξ,vξ(z(t),z˙(t))F^{\xi,v_{\xi}}_{\rm rgd}(z(t),\dot{z}(t)). This yields the identity (5.24) mentioned in the introduction and proved below (see the discussion after Theorem 5.3).

2. Dynamics and generators

In view of the special structure of (σP,γP)(\sigma_{P},\gamma_{P}), this leads to

obtained by using a time reversed switching tz(Tt)t^{\prime}\mapsto z(T-t^{\prime}), and by reversing the momentum first in the initial condition, and then reversing them back after the time evolution (see for more general backward dynamics). More precisely, the backward dynamics can be defined through its generator

In the following proposition, the expressions of Ltf{\mathcal{L}}^{\rm f}_{t} and Ltb{\mathcal{L}}_{t^{\prime}}^{{\rm b}} are explicitly written.

and the generator of the backward process (5.6) at time t[0,T]t^{\prime}\in[0,T] reads:

is the fluctuation-dissipation operator defined in (3.5).

The full expression of the generator Ltf\mathcal{L}_{t}^{\rm f} is then obtained by adding the terms arising from the fluctuation-dissipation. These terms are directly obtained from the terms involving γP\gamma_{P} and σP\sigma_{P} in (5.4), as in the proof of Proposition 3.1.

3. Jarzynski-Crooks identity

Before stating the main result of this section (Theorem 5.3 below), we need to introduce a notion of work. This quantity is most conveniently defined for deterministic dynamics, but the corresponding definition is also valid for stochastic dynamics.

and denote by Φt,t+h:Σξ,vξ(z(t),z˙(t))Σξ,vξ(z(t+h),z˙(t+h))\Phi_{t,t+h}:\Sigma_{\xi,v_{\xi}}(z(t),\dot{z}(t))\to\Sigma_{\xi,v_{\xi}}(z(t+h),\dot{z}(t+h)) the associated flow between time t[0,T]t\in[0,T] and t+h[0,T]t+h\in[0,T]. The work can now be written out more explicitly using the following lemma:

The infinitesimal variation of the work (5.12) reads:

where for all t[0,T]t\in[0,T] and all (q,p)Σξ,vξ(z(t),z˙(t))(q,p)\in\Sigma_{\xi,v_{\xi}}(z(t),\dot{z}(t)),

The total exchanged work is then a time integral associated with the path t(qt,pt)t\mapsto(q_{t},p_{t}), and is denoted by:

Note that the expression (5.16) can be interpreted as the energy variation of the system during the switching when the stochastic thermostat is turned off.

The expression of the Lagrange multipliers in (5.3) yields (5.15):

The last equality is obtained using the computation of the Lagrange multipliers in (5.3). This yields (5.16). ∎

We are now in position to state the main result of this section.

and by {qtb,ptb}0tT\{q^{\rm b}_{t^{\prime}},p^{\rm b}_{t^{\prime}}\}_{0\leq t^{\prime}\leq T} the solution of the backward Langevin process (5.6) with initial conditions distributed according to

Then, the following Jarzynski-Crooks identity holds on [0,T][0,T]: for any bounded path functional φ[0,T]\varphi_{[0,T]},

where ()r(\,\cdot\,)^{\rm r} denotes the composition with the operation of time reversal of paths:

Note that the theorem still holds in the Hamiltonian case, i.e. when (γP,σP)=(0,0)(\gamma_{P},\sigma_{P})=(0,0). The choice φ[0,T]=1\varphi_{[0,T]}=1 in (5.19) leads to the following work fluctuation identity:

Besides, upon choosing a path functional exp(θβW0,T)\exp(\theta\beta\mathcal{W}_{0,T}), it is possible to obtain a family of free energy estimators, parameterized by θ\theta and where both forward and backward paths are weighted by the exponential of some work. Moreover, the standard Crooks equality on ratios of probability density functions of work values is also a consequence of (5.19), see Section 4.2.2 in .

Note also that the choice φ[0,T](q,p)=ϕ(qT,pT)\varphi_{[0,T]}(q,p)=\phi(q_{T},p_{T}) leads to the following representation of the canonical distribution μΣξ,vξ(z(T),z˙(T))\mu_{\Sigma_{\xi,v_{\xi}}(z(T),\dot{z}(T))}:

The usual free energy profile zF(z)z\mapsto F(z) can therefore be computed using the relations (1.18)-(1.19) presented in the introduction. Indeed, Equation (1.19) (see (5.24) below) can be proved by combining (5.2) and (5.21)–(5.22) as follows:

where the corrector C(t,q)C(t,q) is defined in (1.18):

Estimators of the free energy based on (1.19) can then be constructed, see Chapter 4 in for a review.

Before turning to the proof of Theorem 5.3, we first give the general lemma which enables to deduce the Jarzynski-Crooks fluctuation identity from a nonequilibrium detailed balance condition (similar to the one presented in for switchings arising from a time-dependence in the Hamiltonian).

Let (qt,pt)0tT(q_{t},p_{t})_{0\leq t\leq T} (resp. (qtb,ptb)0tT(q^{\rm b}_{t},p^{\rm b}_{t})_{0\leq t\leq T}) be a Markov process with infinitesimal generator Ltf{\mathcal{L}}^{\rm f}_{t} (resp. Ltb{\mathcal{L}}^{\rm b}_{t}) and initial conditions distributed according to (5.17) (resp. (5.18)). Let us assume that the following nonequilibrium detailed balance condition is satisfied: for any two smooth test functions φ1\varphi_{1}, φ2\varphi_{2},

Then the Jarzynski-Crooks fluctuation identity (5.19) holds.

Let us introduce the following weighted transition operators: for any bounded test function φ\varphi,

where (qt,pt)0tT(q_{t},p_{t})_{0\leq t\leq T} (resp. (qtb,ptb)0tT(q^{\rm b}_{t},p^{\rm b}_{t})_{0\leq t\leq T}) is a Markov process with infinitesimal generator Ltf{\mathcal{L}}^{\rm f}_{t} (resp. Ltb{\mathcal{L}}^{\rm b}_{t}), and Wt,T=W0,TW0,t\mathcal{W}_{t,T}=\mathcal{W}_{0,T}-\mathcal{W}_{0,t}.

We assume that these operators are well defined and smooth with respect to time for sufficiently smooth test functions defined in an open neighborhood of Σξ,vξ(z(t),z˙(t))\Sigma_{\xi,v_{\xi}}(z(t),\dot{z}(t)) and Σξ,vξ(z(t),z˙(t))\Sigma_{\xi,v_{\xi}}(z(t^{\prime}),\dot{z}(t^{\prime})) respectively (for any t,t[0,T]t,t^{\prime}\in[0,T]).

The transition operators satisfy the following backward Kolmogorov evolution equations:

Consider now two test functions φ0\varphi_{0} and φT\varphi_{T}. The balance condition (5.25) implies

Integrating this equality on [0,T][0,T] yields

which is the Crooks identity (5.19) for path functionals of the form

Then, using the Markov property of the forward and backward processes, Crooks identity (5.19) can be extended to finite-dimensional path functionals of the form:

with 0=t0<t1<tK=T0=t_{0}<t_{1}\cdots<t_{K}=T by repeatedly using a variant of (5.28) on time subintervals [tk,tk+1][t_{k},t_{k+1}] (see the proof of Theorem 4.10 in for further precisions). This allows to conclude since finite dimensional time marginal laws characterize the distribution on continuous paths, see for instance . ∎

First, using Lemma 6.2, we compute the variation of the unnormalized canonical equilibrium distribution with constraints with respect to the switching:

On the other hand, (5.14) and Proposition 5.1 give

Now, (5.25) can be verified in two steps. First, the last two terms in (5.31) (the ”thermostat” terms) cancel out after integration with respect to eβHdσΣξ,vξ(z(t),z˙(t)){\rm e}^{-\beta H}\,d\sigma_{\Sigma_{\xi,v_{\xi}}(z(t),\dot{z}(t))} thanks to the detailed balance condition (3.12). Then, an integration of (5.31) with respect to eβHdσΣξ,vξ(z(t),z˙(t)){\rm e}^{-\beta H}\,d\sigma_{\Sigma_{\xi,v_{\xi}}(z(t),\dot{z}(t))} gives, in view of (5.30) and (2.25),

which is indeed (5.25). Note that the time-regularity on the evolution semi-groups (5.26)-(5.27) required to make these computations rigorous is proved in the overdamped case in the proof of Theorem A.5 in . A similar proof can be carried out for constrained Langevin equations. ∎

4. Numerical schemes

so that constraints on momenta are automatically enforced, and no Lagrange multiplier is needed in (5.32) and (5.34).

We comment in the subsequent sections on the different parts of the scheme.

The Lagrange multipliers λn+1/2\lambda^{n+1/2} are associated with the position constraints (Cq)(C_{q}), and the Lagrange multipliers λn+3/4\lambda^{n+3/4} are associated with the velocity constraints (Cp)(C_{p}). In (Cp)(C_{p}), the velocity of the switching at time tn+1t_{n+1} is discretized as:

The latter choice is motivated by the following observation: The position after one step of an unconstrained motion, given by

already satisfies (Cq)(C_{q}) up to error terms of order two with respect to Δt\Delta t. Indeed, using (5.35):

This property is useful to ensure a fast convergence of the numerical algorithm solving the nonlinear constraints (Cq)(C_{q}).

The numerical flow associated with (5.33) is denoted in the sequel as

It can be proven that Φn\Phi^{n} is a symplectic map. The proof is indeed exactly the same as for the symplecticity of the classical RATTLE scheme, see [22, Sections VII.1.3] for an explicit computation for symplectic Euler and [22, Sections VII.1.4] for an extension to RATTLE. As a consequence, Φn\Phi^{n} transports the phase space measure σΣξ,vξ(z(tn),z(tn+1)z(tn)Δt)\sigma_{\Sigma_{\xi,v_{\xi}}\left(z(t_{n}),\frac{z(t_{n+1})-z(t_{n})}{\Delta t}\right)} to the phase space measure σΣξ,vξ(z(tn+1),z(tn+2)z(tn+1)Δt)\sigma_{\Sigma_{\xi,v_{\xi}}\left(z(t_{n+1}),\frac{z(t_{n+2})-z(t_{n+1})}{\Delta t}\right)}.

4.2. Comments on the fluctuation-dissipation part (5.32)-(5.34)

In practice, (5.32) may be rewritten in a form more suited to numerical computations. Of course, similar considerations hold for (5.34). Since γP(q)=PM(q)γPM(q)T\gamma_{P}(q)=P_{M}(q)\gamma P_{M}(q)^{T} and σP(q)=PM(q)σ\sigma_{P}(q)=P_{M}(q)\sigma, (5.32) is equivalent to:

where the Lagrange multiplier λn+1/4\lambda^{n+1/4} is associated with the constraint (Cp)(C_{p}). The equivalence between (5.32) and (5.37) can be checked by multiplying (5.37) by PM(qn)P_{M}(q^{n}) and using (5.35).

The Lagrange multiplier λn+1/4\lambda^{n+1/4} in (5.37) is obtained by multiplying the above equation by ξ(qn)TM1(Id+Δt4γM1)1\nabla\xi(q^{n})^{T}M^{-1}\left({\rm Id}+\frac{\Delta t}{4}\gamma M^{-1}\right)^{-1}, and solving the following linear system:

This system is well posed. Indeed, the matrix ξ(qn)TM1(Id+Δt4γM1)1ξ(qn)\nabla\xi(q^{n})^{T}M^{-1}\left({\rm Id}+\frac{\Delta t}{4}\gamma M^{-1}\right)^{-1}\nabla\xi(q^{n}) can be rewritten as ξ(q)TSξ(q)\nabla\xi(q)^{T}S\nabla\xi(q) with S=M1(Id+Δt4γM1)1S=M^{-1}\left({\rm Id}+\frac{\Delta t}{4}\,\gamma M^{-1}\right)^{-1}. Both MM and γ\gamma are symmetric and non-negative, so that SS is symmetric, positive and invertible. Finally, the invertibility of ξ(q)TSξ(q)\nabla\xi(q)^{T}S\nabla\xi(q) follows from the invertibility of GM(q)G_{M}(q).

In the special case when γ\gamma and MM are equal up to a multiplicative constant, the numerical integration can be simplified using the explicit formula (3.20) and the method described below (3.20), which still holds for the tangential part of the momentum. See Section 5.6 below for further precisions.

4.3. Discretization of the backward process (5.6)

The splitting scheme for the backward Langevin dynamics with time-evolving constraints (5.6) reads as follows: Denote n=NTnn^{\prime}=N_{T}-n, take initial conditions (qb,0,pb,0)(q^{{\rm b},0},p^{{\rm b},0}) distributed according to μΣξ,vξ(z(tNT),z(tNT+1)z(tNT)Δt)\mu_{\Sigma_{\xi,v_{\xi}}\left(z(t_{N_{T}}),\frac{z(t_{N_{T}+1})-z(t_{N_{T}})}{\Delta t}\right)} and iterate on 0nNT10\leq n^{\prime}\leq N_{T}-1,

where we recall n=NTnn^{\prime}=N_{T}-n. Assuming that the flow Φn\Phi^{n} given by (5.36) and Φb,n\Phi^{{\rm b},n^{\prime}} are both well-defined, the following reversibility property is easily checked (extending the symmetry property of the standard RATTLE scheme, see for instance [22, Section VII.1.4]):

4.4. Work discretization and free energy computations

The work (5.12) can be approximated using the Lagrange multipliers in (5.33):

for n=0NT1n=0\ldots N_{T}-1. The (formal) consistency of the work discretization (5.42) in the time continuous limit is a direct consequence of the work expression (5.12).

An estimator of the free energy profile is then obtained by using KK independent realizations of the switching process, computing the work WNT,k\mathcal{W}^{N_{T},k} for each realization k{1,,K}k\in\{1,\ldots,K\} (with the numerical trajectories obtained from the numerical scheme (5.32)-(5.33)-(5.34) and i.i.d. initial conditions sampled according to μΣξ,vξ(z(t0),z(t1)z(t0)Δt)\mu_{\Sigma_{\xi,v_{\xi}}\left(z(t_{0}),\frac{z(t_{1})-z(t_{0})}{\Delta t}\right)}), and approximating (5.23), rewritten up to an unimportant additive constant (independent of TT), as

In the above, the discretization Cn(q)C^{n}(q) of the corrector (1.18) is

We refer to Chapter 4 in for more background on free energy estimators for nonequilibrium dynamics. In particular, it is possible to compute a work associated with the backward switching from the Lagrange multipliers in (5.6), and to resort to bridge estimators (see Section 4.2.3 in ).

However, using approximations such as (5.42) in the Jarzynski-Crooks identity introduces a time discretization error. We show in the next section how to eliminate this error.

4.5. Discrete Jarzynski-Crooks identity

It turns out that a discrete version of the Jarzynski-Crooks identity (5.19) can be obtained. This enables the estimation of free energy differences using nonequilibrium simulation without time discretization error. The discrete equality (5.48) below may be seen as an extension of the corresponding equality obtained for transitions associated with time-dependent Hamiltonians and performed with Metropolis-Hastings dynamics (see and Remark 4.5 in ).

For this purpose, we consider a discretization of the work W0,T\mathcal{W}_{0,T} using the interpretation (5.16) of the work as the energy variation of the Hamiltonian part of the Langevin dynamics. This leads to the following definition of the work at the discrete level:

for n=0NT1n=0\ldots N_{T}-1. This work discretization leads to a Jarzynski-Crooks identity without time discretization error.

Consider the distribution μΣξ,vξ(z(t),z˙(t))\mu_{\Sigma_{\xi,v_{\xi}}(z(t),\dot{z}(t))} and its normalization Zz(t),z˙(t)Z_{z(t),\dot{z}(t)} defined in (2.15). Denote by {qn,pn}0nNT\{q^{n},p^{n}\}_{0\leq n\leq N_{T}} the solution of the forward discretized Langevin dynamics (5.32)-(5.33)-(5.34) with initial conditions distributed according to

and by {qb,n,pb,n}0nNT\{q^{{\rm b},n^{\prime}},p^{{\rm b},n^{\prime}}\}_{0\leq n^{\prime}\leq N_{T}} the solution of the discretized backward Langevin dynamics (5.38)-(5.39)-(5.40) distributed according to

Then, the following Jarzynski-Crooks identity holds on [0,NT][0,N_{T}]: for any bounded discrete path functional φ[0,NT]\varphi_{[0,N_{T}]},

where Wn\mathcal{W}^{n} is computed according to (5.44), and ()r(\,\cdot\,)^{\rm r} denotes the composition with the operation of time reversal of paths:

The (formal) consistency of the work discretization (5.44) in the time continuous limit is a direct consequence of the work expression (5.16). Free energy estimators based on the identity (5.47) are obtained as described in Section 5.4.4. Let us emphasize once again that there is no error related to the finiteness of the time-step Δt\Delta t in this estimator, and that the only source of approximation is due to the statistical error.

With a slight abuse of notation, we denote in the same way the random variables (qn,pn)(q^{n},p^{n}), (qb,n,pb,n)(q^{{\rm b},n},p^{{\rm b},n}), etc. in (5.32)-(5.33)-(5.34) or (5.38)-(5.39)-(5.40), and the integration variables in the definition of probability distributions. We divide the proof into three steps.

Step 1: The phase space conservation of Φn\Phi^{n} and Φb,n\Phi^{{\rm b},n^{\prime}} and the reversibility property (5.41) imply

Step 2: The probability distribution of pn+1/4p^{n+1/4} given (qn,pn)(q^{n},p^{n}) in the discretization of the fluctuation-dissipation part (5.32) is denoted KOU(qn,pn,dpn+1/4)K^{\rm OU}(q^{n},p^{n},dp^{n+1/4}). The scheme (5.32) is a mid-point discretization of an Ornstein-Uhlenbeck process, which can be rewritten by decomposing the orthogonal and tangential updates of the momentum:

where p=PM(qn)pp_{\parallel}=P_{M}(q^{n})p, and p=(IdPM(qn))pp_{\perp}=({\rm Id}-P_{M}(q^{n}))p. The Markov chain induced by the parallel part of the momentum is the same as the one induced by the scheme (3.16) (or (3.18)) defined in Section 3.2. The latter verifies a detailed balance equation (both in the plain sense and up to momentum reversal) with respect to the stationary measure κTqΣ(z)M1(dp)\kappa_{T^{\ast}_{q}\Sigma(z)}^{M^{-1}}(dp) defined by (3.19) (see Sections 2.3.2 and 3.3.5 in ). We recall that this measure is defined as the kinetic probability distribution in the momentum variable of the canonical distribution μTΣ(z)(dqdp)\mu_{T^{*}\Sigma(z)}(dq\,dp) on the tangential space, conditioned by a given qΣ(z)q\in\Sigma(z). Adding the (invariant) orthogonal part of the momentum, the following detailed balance condition is satisfied:

Step 33: Denote by Kf(qn,pn;dqn+1,dpn+1/4,dpn+3/4,dpn+1)K^{\rm f}(q^{n},p^{n};dq^{n+1},dp^{n+1/4},dp^{n+3/4},dp^{n+1}) the probability distribution of the variables (qn+1,pn+1/4,pn+3/4,pn+1)(q^{n+1},p^{n+1/4},p^{n+3/4},p^{n+1}) given the variables (qn,pn)(q^{n},p^{n}) in the scheme (5.32)-(5.33)-(5.34); and by Kb(qb,n,pb,n;dqb,n+1,dpb,n+1/4,dpb,n+3/4,dpb,n+1)K^{\rm b}(q^{{\rm b},n^{\prime}},p^{{\rm b},n^{\prime}};dq^{{\rm b},n^{\prime}+1},dp^{{\rm b},n^{\prime}+1/4},dp^{{\rm b},n^{\prime}+3/4},dp^{{\rm b},n^{\prime}+1}) the probability distribution of the variables (qb,n+1,pb,n+1/4,pb,n+3/4,pb,n+1)(q^{{\rm b},n^{\prime}+1},p^{{\rm b},n^{\prime}+1/4},p^{{\rm b},n^{\prime}+3/4},p^{{\rm b},n^{\prime}+1}) given the variables (qb,n,pb,n)(q^{{\rm b},n^{\prime}},p^{{\rm b},n^{\prime}}) in the scheme (5.38)-(5.39)-(5.40). The splitting structure yields:

Combining the detailed balance conditions (5.49) and (5.51) of Steps 11 and 22, and using the decomposition (2.24) of phase space measures, it follows

which can be seen as the Jarzynski-Crooks identity over one time-step. Iterating the argument, it is easy to obtain:

5. The overdamped limit

The splitting scheme (5.32)-(5.33)-(5.34) can be used in the overdamped regime, using the method of Proposition 3.6, i.e. by choosing

which implies γP=2PMTPM\gamma_{P}=2P_{M}^{T}P_{M} and σP=2βPM\sigma_{P}=\frac{2}{\sqrt{\beta}}P_{M}. For this choice of parameters, the continuous limit of the numerical scheme is the following variant of the stochastic differential equation (3.23):

where λtod\lambda_{t}^{\rm od} is an adapted stochastic process such that ξ(qt)=z(t)\xi(q_{t})=z(t). We then obtain the following Jarzynski-Crooks relation for discretized overdamped dynamics, without time discretization error.

where (Gn)n0({\mathcal{G}}^{n})_{n\geq 0} are independent and identically distributed centered and normalized Gaussian variables, and (λodn)n1(\lambda^{n}_{\rm od})_{n\geq 1} are the Lagrange multipliers associated with the constraints (ξ(qn)=z(tn))0nNT(\xi(q^{n})=z(t_{n}))_{0\leq n\leq N_{T}}. In the same way, the backward process (5.38)-(5.39)-(5.40) yields the following Euler scheme:

with G=ξTξG=\nabla\xi^{T}\nabla\xi, and the scheme (5.54) is rewritten as:

Then the Jarzynski-Crooks relation (5.47) holds under the assumptions (5.45) and (5.46) on the initial conditions of the schemes (5.54) and (5.55) respectively.

The proof is a direct consequence of the reformulation of (5.54) into (5.57), and a direct application of Theorem 5.5 with the parameters (5.52).

based on the work (5.56) and the corrector

is exact (there is no time discretization error). This free energy estimator can be seen as a variant of the estimator proposed in , which was derived directly for the scheme (5.54), and reads (up to an unimportant additive constant):

and the modified corrector is defined without the kinetic energy term:

There is a bias due to the time discretization error in the estimator (5.60), which can be removed upon following the procedure described in Proposition 5.6.

5.2. Consistency analysis of three free energy estimators

In this section, we would like to discuss the consistency of three free energy estimators introduced above: (5.60)-(5.61) (based on the direct discretization of the overdamped dynamics proposed in ), (5.58)-(5.42) (which uses the Lagrange multipliers to approximate the work) and (5.58)-(5.56) (based on the discrete Jarzynski equality).

The limiting continuous-in-time version of the Jarzynski relation is:

where the work for the overdamped dynamics (5.53) reads (see ):

The consistency of (5.60)-(5.61) with (5.63)-(5.64) was already proven in .

Concerning the consistency of CnC^{n} with C~\widetilde{C} (see (5.58) and (5.60)), note that in the overdamped scaling (M=Δt2IdM=\frac{\Delta t}{2}{\rm Id}), the difference

vanishes when Δt0\Delta t\to 0. This difference can therefore be neglected when analyzing the consistency of the scheme in the continuous-in-time limit. We henceforth concentrate on the consistency of the works (5.42) and (5.56) with (5.64).

In the sequel, we denote the anticipating stochastic integration of the integrand YtY_{t} with respect to dXtdX_{t} by

where \,\circ\, is the Stratonovitch symmetric integration, and .\,.\, the Itô integration. The symbol \leadsto denotes the formal time continuous limit.

Let us justify the consistency of the work expression (5.42) with (5.64). Remark that the Lagrange multipliers in (5.57) verify:

so that λn+1/2\lambda^{n+1/2} and λn+3/4\lambda^{n+3/4} yield the same time continuous limit, that is to say

the same holding true for 2λn+3/42\lambda^{n+3/4}. The work expression (5.42) is thus formally consistent with (5.64). This concludes the proof of the consistency of (5.58)-(5.42) with (5.63)-(5.64).

We now prove the consistency of the work expression (5.56) with (5.64). Define

The expression (5.56) yields using (5.57):

First, since V(qn+1)V(qn)V(qt)dqtV(q^{n+1})-V(q^{n})\leadsto\nabla V(q_{t})\,\circ dq_{t} and 12(V(qn)+V(qn+1))(qn+1qn)V(qt)dqt\frac{1}{2}(\nabla V(q^{n})+\nabla V(q^{n+1}))\cdot(q^{n+1}-q^{n})\leadsto-\nabla V(q_{t})\,\circ dq_{t}, the limit of the terms in (5.72) is zero. Second, using the expressions (5.65) and (5.67), and similarly to (5.68) and (5.70), it holds

since (ξG1ξ(qn)+ξG1ξ(qn+1))T(qn+1qn)2ξG1(qt)z(t)dt(\nabla\xi G^{-1}\nabla\xi(q^{n})+\nabla\xi G^{-1}\nabla\xi(q^{n+1}))^{T}\left(q^{n+1}-q^{n}\right)\leadsto 2\nabla\xi G^{-1}(q_{t})z^{\prime}(t)\,dt by (5.69). Expanding in higher order powers of Δt\Delta t, it can be checked that there exists two functions aa and bb such that

Therefore, since (in the limit Δt0\Delta t\to 0) the martingale part of fn+fn+1f^{n}+f^{n+1} arises only from the term ξG1ξT(qt).dqt\nabla\xi G^{-1}\nabla\xi^{T}(q_{t}).dq_{t}, one obtains

since ξT(qt)P(qt)=0\nabla\xi^{T}(q_{t})P(q_{t})=0. In conclusion, the second term in (5.73) has a zero contribution to the continuous-in-time limit.

As a consequence, the formal time continuous limit of Wn+1Wn\mathcal{W}^{n+1}-\mathcal{W}^{n} is the same as the one of InI^{n}. Computations similar to the one performed above yield

we get in the end that the formal time continuous limit of InI^{n} and Wn+1Wn\mathcal{W}^{n+1}-\mathcal{W}^{n} is the same as:

where we have used (5.71). This concludes the proof of the consistency of (5.58)-(5.56) with (5.63)-(5.64).

6. Numerical illustration

We present some free energy profiles obtained with nonequilibrium switching dynamics for the model system and the parameters described in Section 4.4. The switching schedule reads

with zmin=0.1z_{\rm min}=-0.1 and zmax=1.1z_{\rm max}=1.1. The time-step is Δt=0.01\Delta t=0.01. The initial conditions are obtained by first subsampling a constrained dynamics with ξ(q)=zmin\xi(q)=z_{\rm min} and vξ(q,p)=0v_{\xi}(q,p)=0, with a time spacing Tsample=1T_{\rm sample}=1; and then adding the required component ξ(q)GM1z˙(0)\nabla\xi(q)G_{M}^{-1}\dot{z}(0) to the momentum variable (with z˙(0)=(zmaxzmin)/T\dot{z}(0)=(z_{\rm max}-z_{\rm min})/T).

where α=eγΔt\alpha={\rm e}^{-\gamma\Delta t}, and setting pn+1/4=p~n+1/4+λn+1/4ξ(qn)p^{n+1/4}=\widetilde{p}^{n+1/4}+\lambda^{n+1/4}\nabla\xi(q^{n}) with λn+1/4\lambda^{n+1/4} chosen such that

Figure 3 presents estimates obtained with MM independent realizations of the switching dynamics for different switching times TT, using the estimator presented in Section 5.4.4 with the work discretization (5.42). In all cases, the product MTMT is kept constant. The free energy profile becomes closer to the reference curve as TT is increased, and the profile obtained for T=100T=100 is in excellent agreement with the result obtained with thermodynamic integration. When the switching time is small, more realizations should be considered to reduce the statistical errors and obtain estimates in better agreement with the reference profile. The fact that the variance is very large when the switching time TT is too small is a well-known drawback of estimators based on the Jarzynski-Crooks identity, see the review in Sections 4.1.4 and 4.1.5 in . Roughly speaking, the difficulty is related to the fact that the free energy difference is obtained as an average of exp(βW)\exp(-\beta{\mathcal{W}}), which requires a very good sampling of the small values of the work W{\mathcal{W}}. As TT decreases, the width of the work distribution increases and the low tail part is more and more difficult to sample. Improved estimates can be obtained with estimators based on combinations of forward and backward trajectories, see for instance and Section 4.2 in .

Appendix: Some technical results

We give in this appendix two technical lemmas, used in the proof of Proposition 4.3. The first lemma can also be used to prove the divergence formula (2.25).

For any a{1,,2m}a\in\left\{1,\dots,2m\right\} :

The proof relies on the following computation rules for any family of invertible square matrices θAθ\theta\mapsto A_{\theta}:

Fix a{1,,2m}a\in\{1,\dots,2m\}. First, using (6.3) with AθA_{\theta} replaced by Γ\Gamma and ddθ\frac{d}{d\theta} replaced by {,Ξc}\left\{\cdot,\Xi_{c}\right\}, we obtain

so that by the skew-symmetry of Γ1\Gamma^{-1} and Γ\Gamma,

Jacobi’s identity for Poisson brackets and (6.2) then yield

since Γa,b(Γ1)b,c=δa,c\Gamma_{a,b}(\Gamma^{-1})_{b,c}=\delta_{a,c} where δi,j\delta_{i,j} is the Kronecker symbol. Finally, the left hand and right hand sides of the last equality can be factorized as

Since detΓ>0\left|\det\Gamma\right|>0 and Γ\Gamma is invertible, it follows

for all b=1,,2mb=1,\dots,2m, which is (6.1). ∎

where the phase space ΣΞ(ζ)\Sigma_{\Xi}(\zeta) is defined in (2.7), and the Gram matrix Γ\Gamma in (2.9).

where in the last line the following chain rule has been used:

Now an integration by parts with respect to dqdpdq\,dp, together with (6.1), leads to

References