Langevin dynamics with constraints and computation of free energy differences

Tony Lelievre, Mathias Rousset, Gabriel Stoltz

Introduction and main results

Free energy is a central concept in thermodynamics and in modern works on biochemical and physical systems. Typical examples studied by computer simulations include the solvation free energies (which is the free energy difference between a molecule in vacuo and its counterpart surrounded by solvent molecules) and the binding free energy of two molecules (which determines whether a new drug can have an efficient action on a given protein). In many applications, it is actually the free energy difference profile between the initial and the final state which is a quantity of paramount importance. It is observed by practitioners that free energy barriers are a very important element to describe transition kinetics from one state to the other. For instance, the chemical kinetics of reactions happening in solvent (such as in the cells of our bodies) are limited by free energy barriers, and can take place only when the free energy difference between the initial and the final state is negative, or at least less than the typical thermal energy. It is therefore very important to accurately compute free energy differences in order to assess the likelihood of a certain physical event to happen.

Beside these physical motivations to compute free energy differences, a more abstract motivation is to overcome sampling barriers encountered when computing canonical averages (see the discussion in [35, Section 1.3.3]). Indeed, it is often the case in practice that the trajectories generated by the numerical methods at hand remain stuck for a long time in some region of the phase space, and hop only occasionally to another region, where they also remain stuck – a behavior known as metastability. Chemical and physical intuitions may guide the practitioners of the field towards the choice of some slowly evolving degree of freedom, called reaction coordinate in the following, responsible for the metastable behavior of the system. In this case, free energy techniques can be used to accelerate the sampling. This viewpoint allows to consider applications which are not motivated by physical or biological problems, such as curing sampling issues in Bayesian statistics .

In this introductory section, we present the main results and give the outline of the paper, highlighting the three main contributions of this work. We only briefly define the concepts we need in this general introduction, and refer the reader to the following sections (in particular Section 2) for further precisions on the mathematical objects at hand.

In the present paper, the focus is on the canonical ensemble, which is the equilibrium probability distribution of microscopic states of a system at fixed temperature (fixed average energy). For systems without constraints, this ensemble is characterized by the probability distribution

where $Z$ is the normalizing constantThe potential $V$ is assumed to be such that $Z<\infty$ . ensuring that $\mu$ is indeed a probability distribution, and $\beta=(k_{\rm B}T)^{-1}$ is proportional to the inverse temperature. One dynamics which admits the canonical measure (1.1) as an invariant measure is the Langevin dynamics (see for instance [35, Section 2.2.3] and references therein):

where $W_{t}$ is a standard $3N$ -dimensional Brownian motion, and $\gamma(q),\sigma(q)$ are $3N\times 3N$ position dependent real matrices which are assumed to satisfy the fluctuation-dissipation identity

The Langevin dynamics can be seen as some modification of the Hamiltonian dynamics with two added components: a damping term $-\gamma(q_{t})M^{-1}p_{t}\,dt$ and a random forcing term $\sigma(q_{t})\,dW_{t}$ . The energy dissipation due to the damping is compensated by the random forcing in such a way that the temperature of the system is $T=(k_{B}\beta)^{-1}$ (with $k_{\rm B}$ Boltzmann’s constant).

We will consider positions subject to a $m$ -dimensional mechanical constraint denoted by

As will become clear below, constrained systems appear in computational statistical physics in two kinds of contexts (see e.g. [42, Chapter 10], and for applications to the computation of free energy differences, and for mathematical textbooks dealing with constrained Hamiltonian dynamics):

for free energy computations, where $\xi$ is a given reaction coordinate parameterizing a transition between ”states” of interest ;

when the system is subject to molecular constraints such as rigid covalent bonds, or rigid bond angles in molecular systems.

In the sequel, $\xi$ may be thought of at first reading as a reaction coordinate (case (i)). Section 4.1 explains how to handle additional molecular constraints (case (ii)) within the same formalism. In any case, the position of the system is constrained onto the submanifold of co-dimension $m$ :

and the associated phase space is the cotangent bundle denoted by

For a given $q\in\Sigma(z)$ , the set of cotangent momenta is denoted by

The orthogonal projection on $T^{*}_{q}\Sigma(z)$ with respect to the scalar product induced by $M^{-1}$ is denoted

where $G_{M}(q)$ is the Gram matrix associated with the constraints

Throughout the paper, we assume that $G_{M}$ is invertible everywhere on $\Sigma(z)$ (for all $z$ ). It is easily checked that $P_{M}$ satisfies the projector property $P_{M}(q)^{2}=P_{M}(q)$ , and the orthogonality property

2. The constrained Langevin dynamics

For constrained systems, the associated canonical distribution is defined by

where $\sigma_{T^{*}\Sigma(z)}(dq\,dp)$ is the phase space Liouville measure of $T^{*}\Sigma(z)$ , and $Z_{z,0}$ the normalizing constant ( $z$ refers to the position constraint, and to the velocity or momentum constraint, see (2.15) below). See Section 2.3 for precise definitions.

A dynamics admitting the constrained canonical measure (1.9) as an invariant equilibrium measure is the following Langevin process (“CL” stands for “constrained Langevin”): For a given initial condition $(q_{0},p_{0})\in T^{*}\Sigma(z)$ ,

3. Free energy computations

In words, ${\rm e}^{-\beta F(z)}\,dz$ is the image of the measure $\mu$ by $\xi$ , and $F$ can be seen as an “effective potential energy” associated to $\xi$ .

Computing the free energy profile $z\mapsto F(z)$ (up to an additive constant independent of $z$ ), or free energy differences between two states $F(z_{2})-F(z_{1})$ is a way to compare the relative probabilities of different ”states” parameterized by $\xi$ . This is a very important calculation for practical applications, see . A state should be understood here as the collection of all possible microscopic configurations $(q,p)$ , distributed according to the canonical measure (1.1), and satisfying the macroscopic constraint $\xi(q)=z$ . Since we only focus on computing free energy differences, $F$ is defined up to an additive constant (independent of $z$ , denoted by ${\rm C}$ below, and whose value may vary from line to line) and can be rewritten as:

However, when using constrained simulations in phase space, the momentum variable of the dynamical system is also constrained, and a modified free energy (called “rigid free energy” in the sequel, see Remark (3.3) below for a justification of the term ”rigid”) is more naturally computed, see Section 3. The latter is defined as

More precisely, the first point (i) amounts to showing that

As compared to , where a formal proof for the Hamiltonian case is proposed, we use an explicit calculation that does not require the use of the Lagrangian structure of the problem, or a change of coordinates. Once $\nabla_{z}F_{\rm rgd}^{M}(z)$ is obtained, $F_{\rm rgd}^{M}(z)$ can be computed (up to an additive constant) by integration. This procedure is known as thermodynamic integration. Note that using (1.15) and thermodynamic integration, together with (1.14), allows to obtain $F(z)$ without computing second order derivatives of $\xi$ . This is a desirable property since computing such high derivatives may be cumbersome for some reaction coordinates used in practice. Straightforward computations of the mean force using analytical expressions (see for instance (4.11)-(4.12)) usually involve such high order derivatives.

The second point (ii) is then based on a discretization of a variant of (1.15), obtained by subtracting the martingale part of the Lagrange multipliers. This amounts to averaging the two Lagrange multipliers involved in the RATTLE part of the scheme, see (4.18)).

We also discuss how these techniques can be generalized to compute the free energy for systems with molecular constraints, see Section 4.1.

4. Jarzynski-Crooks relations and nonequilibrium computations of the free energy

As explained in Section 5.3, it is possible to define the work associated with the constraints exerted on the system between time and $T$ as the displacement multiplied by the constraining force:

where $\frac{1}{2\beta}\ln\det G_{M}(q)$ is the so-called Fixman term due to the geometry of the position constraints (see (1.14) and Remark 3.3), and $\frac{1}{2}\dot{z}(t)^{T}G_{M}^{-1}(q)\dot{z}(t)$ is the kinetic energy term due to the velocity of the switching. Then, the free energy profile can be computed through the following fluctuation identity (see (5.23)):

5. Organization of the paper

Preliminaries

After making precise our notation for matrices and matrix valued functions in Section 2.1, we introduce some additional concepts required to describe constrained systems in Section 2.2, and define the phase space measures with constraints in Section 2.3.

Throughout the paper, the following notation is used:

The canonical symplectic matrix is denoted by:

2. Constraints

Contrarily to what is often done in the literature, we avoid global changes of variables and the use of generalized coordinates. We observe that global changes of variables are not required for the proofs of the theoretical results we present, and they are definitely to be avoided in practical numerical computations whenever possible.

Two useful concepts to study constrained Hamiltonian systems (in particular, to use the co-area formula in phase space, as well as the Poisson bracket formulation of the Liouville equation) are the effective velocity $v_{\xi}$ and the effective momentum $p_{\xi}$ associated with the constrained degrees of freedom $\xi$ :

The expression of the effective velocity is obtained by deriving the constraint $\xi$ along an unconstrained trajectory of the Hamiltonian dynamics

The last equations allow to consider $G_{M}^{-1}$ as some effective mass.

The constraints on a mechanical system can also be reformulated in the more general form

where either (i) the effective momentum is constrained, in which case $\Xi=(\xi,p_{\xi})$ and $\zeta=(z,p_{z})$ ; or (ii) the effective velocity is constrained, in which case $\Xi=(\xi,v_{\xi})$ and $\zeta=(z,v_{z})$ . The phase space associated with such constraints is denoted by

A position $q\in\Sigma(z)$ being given, the affine space of constrained momenta verifying (2.6) is then denoted by

in the effective velocity case, and by $\Sigma_{p_{\xi}(q,\cdot)}(p_{z})$ in the effective momentum case. This notation is very important for nonequilibrium methods where the constraints evolve in time according to a predefined schedule, see Section 5. Note that the phase space of mechanical constraints, defined by (1.5), is simply $T^{*}\Sigma(z)=\Sigma_{\xi,v_{\xi}}(z,0)=\Sigma_{\xi,p_{\xi}}(z,0)$ .

We can now define the skew-symmetric Gram tensor of dimension $2m\times 2m$ associated with the constraints:

The Gram matrix $\Gamma$ associated with the generalized constraints (2.6) can be explicitly computed by block. Indeed, for $\Xi=(\xi,p_{\xi})^{T}$ ,

Therefore, ${\rm det}(\Gamma)=1$ in this case. In the case $\Xi=(\xi,v_{\xi})^{T}$ , the Gram matrix reads

and ${\rm det}(\Gamma)={\rm det}(G_{M})^{2}$ . Note that in both cases ${\rm det}(\Gamma)>0$ . The constrained symplectic (skew-symmetric) matrix is now defined by

and the Poisson bracket associated with generalized constraints (2.6) by:

This Poisson bracket is often called the Dirac bracket in the literature (in reference to the seminal work of Dirac ).

It is easily checked that $\left\{\cdot,\cdot\right\}_{\Xi}$ verifies the characteristic properties of Poisson brackets, namely the skew-symmetry, Jacobi’s identity, and Leibniz’ rule. Therefore, the flow associated with the evolution equation

defines a symplectic map. Recall that (see [22, Section VII.1.2]) a map $\phi\,:\,\Sigma_{\Xi}(\zeta)\to\Sigma_{\Xi}(\zeta)$ is symplectic if for any $(q,p)\in\Sigma_{\Xi}(\zeta)$ and $u,v\in T_{(q,p)}\Sigma_{\Xi}(\zeta)$ ,

A consequence of the symplectic structure is the divergence formula (2.25) relating the phase space measure $\sigma_{\Sigma_{\Xi}(\zeta)}(dq\,dp)$ on $\Sigma_{\Xi}(\zeta)$ (defined below), and the Poisson bracket (2.13). The reader is referred to Chapter $8$ in , and Section VII. $1$ in for more material on constrained systems.

3. Phase space measures

and $(u_{1}(q,p),\ldots,u_{6N-2m}(q,p))$ is a basis of tangential vectors of the submanifold $T^{\ast}\Sigma(z)$ (or $\Sigma_{\Xi}(\zeta)$ ) at a given point $(q,p)$ .

It is now possible to define a generalization of the canonical distribution (1.9) as follows:

The distribution (2.15) is associated with the generalized constraints $\Xi=\left(\xi,v_{\xi}\right)^{T}$ , and is used in Section 5 for nonequilibrium methods. Note that $\mu_{\Sigma_{\xi,v_{\xi}}(z,0)}=\mu_{T^{\ast}\Sigma(z)}$ defined in (1.9).

3.2. Co-area decompositions

A more concise notation for the above equalities is $dq\,dp=\delta_{\Xi(q,p)-\zeta}(dq\,dp)\,d\zeta$ and $dq=\delta_{\xi(q)-z}(dq)\,dz$ .

We refer for example to Chapter $3$ in for an elementary proof. An equivalent of (2.18)-(2.17) for momenta reads, for constrained effective momenta:

and for constrained effective velocities:

Using the co-area formulas (2.18)-(2.19), and the expressions of symplectic Gram matrices (2.10)-(2.11), we obtain the following expressions of the phase space measures:

The phase space measure on $\Sigma_{\xi,p_{\xi}}(z,p_{z})$ can be identified with the conditional measure defined in (2.16):

while the phase space measure on $\Sigma_{\xi,v_{\xi}}(z,v_{z})$ is related to the corresponding conditional measure as

The phase space measures are given by the product of surface measures:

Equations (2.23)-(2.24) are a consequence of the fact that

3.3. Divergence formulas

We end this section with an important formula, which is used to show the invariance of the canonical measure in the proof of Proposition 3.2.

The divergence formula (2.25) can be proved using Darboux’s theorem and internal coordinates, or directly using the co-area formula (see Section 3.3 in ).

Constrained Langevin processes and sampling

By differentiating with respect to time the constraint $\xi(q_{t})=z$ , the Lagrange multipliers can be computed explicitly (see for instance Section 3.3 in ):

where we introduced the notation $(\sigma_{P},\gamma_{P}):=(P_{M}\,\sigma,P_{M}\,\gamma\,P_{M}^{T})$ . The constraint therefore has two effects: (i) the matrices $\gamma,\sigma$ in the dissipation and fluctuation terms are replaced by their projected counterparts $\gamma_{P},\sigma_{P}$ , and (ii) an orthogonal constraining force $\nabla\xi f_{\rm rgd}^{M}$ is introduced.

The expression of $\mathcal{L}_{\Xi}$ can be obtained using Itô calculus, as made precise in the following proposition.

where the fluctuation-dissipation part is

with $(\sigma_{P},\gamma_{P})$ defined in (1.16). Using the fluctuation-dissipation relation (1.3), the generator $\mathcal{L}_{\Xi}^{\rm thm}$ can be rewritten more compactly as

Let us first consider (i), with $\Xi=(\xi,v_{\xi})$ (the case $\Xi=(\xi,p_{\xi})$ being similar). Note that

where the Hessian operator ${\rm Hess}$ is defined in (2.1). Now, (2.11) implies that

Besides, $v_{\xi}(q_{t},p_{t})=0$ along a trajectory, since $(q_{t},p_{t})\in T^{*}\Sigma(z)$ . Therefore,

Finally, for all $(q,p)\in T^{*}\Sigma(z)$ ,

The operator (3.10) is the generator of the Hamiltonian part in (3.3).

We turn to (ii). The diffusive part arises from the fluctuation term $\sigma_{P}(q_{t})\,dW_{t}$ in (3.3), and its expression

is obtained directly from the standard Itô calculus. Similarly, the dissipation operator is

The addition of these two contributions gives the expression of $\mathcal{L}_{\Xi}^{\rm thm}$ . ∎

The stationarity and reversibility properties follow from the following detailed balance condition up to momentum reversal (see for instance Section 2.2 in ): for any test functions $\varphi_{1}$ , $\varphi_{2}$ ,

where $S\,:(q,p)\mapsto(q,-p)$ is the momentum flip. In view of the expression (3.4) of the generator, proving (3.11) amounts to proving this property for the operators $\left\{.,H\right\}_{\Xi}$ and $\mathcal{L}_{\Xi}^{\rm thm}$ .

For the Hamiltonian part $\left\{.,H\right\}_{\Xi}$ , the expression (3.10) yields

and the divergence formula (2.25) yields the balance condition (3.11) for the Hamiltonian part, in view of the invariance of the distribution $\sigma_{T^{*}\Sigma(z)}$ under the momentum flip $S$ .

For the thermostat part, it is easily checked, that

for any smooth test function $\varphi$ , so that the detailed balance condition up to momentum reversal (3.11) follows from the following more general detailed balance condition, in the case $v_{z}=0$ ( $\mu_{\Sigma_{\xi,v_{\xi}}(z,v_{z})}$ being defined in (2.15)):

After integration in $q$ , using the formula (3.5) for $\mathcal{L}_{\Xi}^{\rm thm}$ and (2.24), an expression symmetric in $(\varphi_{1},\varphi_{2})$ is obtained:

hence the detailed balance condition (3.12).

Ergodicity is a consequence of the hypo-ellipticity of the operator $\mathcal{L}_{\Xi}$ on $T^{\ast}\Sigma(z)$ (Hörmander’s criterion is satisfied, see ), which is itself a consequence of the fact that $P_{M}(q)\gamma P_{M}(q)^{T}$ is strictly positive on each $T^{\ast}_{q}\Sigma(z)$ . The proof can be carried out using local coordinates and the results from . ∎

We have considered in this section the Langevin dynamics (3.3) with constraints rigidly imposed by a projection onto the submanifold $T^{*}\Sigma(z)$ . This dynamics samples the canonical distribution (1.9) $\mu_{T^{*}\Sigma(z)}(dq\,dp)=Z_{z,0}^{-1}\,{\rm e}^{-\beta H(q,p)}\,\sigma_{T^{*}\Sigma(z)}(dq\,dp)$ , with constraints on both positions and momenta. The marginal on positions of this distribution is, in view of (2.23), proportional to

This is what we call in the following rigidly imposed constraints. The canonical distribution (1.9) with rigid constraints is naturally associated to the rigid free energy (1.13) (and this is what justifies the qualification ”rigid”), since, we recall,

Another way to impose some constraints on a system is to add a penalization term. In our context, this could be done by changing the potential energy $V$ to

It is easy to check that, in the limit $\varepsilon\to 0$ (infinite stiffness limit), the canonical measure associated to this potential is the canonical distribution (1.1) with positions conditioned by $\xi(q)=z$ . This distribution is proportional to ${\rm e}^{-\beta H(q,p)}\,\delta_{\xi(q)-z}(dq)\,dp$ and its marginal on positions is proportional to

This is what we call in the following softly imposed constraints. The canonical distribution with soft constraints is naturally associated with the standard free energy, since, we recall,

Note that, in view of (2.18), the marginal on positions for softly imposed constraints can be written in terms of rigidly imposed constraints through a modification of the potential:

is sometimes called the Fixman corrector (see ). Thus, if $(q_{t},p_{t})$ satisfies (3.3) with the modified potential $V+V_{{\rm fix}}$ then $q_{t}$ samples (in the longtime limit) the probability measure proportional to ${\rm e}^{-\beta V(q)}\delta_{\xi(q)-z}(dq)$ , and we thus refer to this dynamics as the softly constrained Langevin dynamics.

These concepts will be used in Section 4.1 to describe the computation of free energy differences for systems with molecular constraints.

Finally, let us mention that the infinite stiffness limit $\varepsilon\to 0$ of the Langevin dynamics (1.2) with the potential $V_{\varepsilon}$ is not (except for very specific forms of constraints) the softly constrained Langevin dynamics, as one would expect. We refer to for example, where it is shown that adiabatic effective potentials (derived from the conservation of the ratio of energy over frequency of fast modes) are required to describe the limiting dynamics. However, a formal argument based on “over-damping” the fast modes indeed leads to the softly constrained Langevin dynamics. $\sqcap$ $\sqcup$

2. Numerical implementation

For simplicity, we restrict ourselves to constant matrices $\gamma$ and $\sigma$ . Generalizations to position dependent matrices are straightforward.

This equation can be explicitly integrated on $[0,t]$ to obtain:

However, the matrix exponential ${\rm e}^{-t\,\gamma_{P}(q)M^{-1}}$ may be difficult to compute in practice (except for certain choices of $\gamma$ and $M$ , see the discussion at the end of Section 3.2.2). Instead of performing an exact integration, (3.14) can be discretized using a midpoint Euler scheme, which yields (3.16) and (3.18) below.

The numerical scheme we investigate, termed midpoint Euler-Verlet-midpoint Euler splitting, is therefore the following:

Note that when $\gamma=0$ and $\sigma=0$ , the scheme (3.16)-(3.17)-(3.18) becomes deterministic, and reduces to (3.17), which is a scheme for the deterministic Hamiltonian equations of motion with position constraints $\xi(q)=z$ . The latter scheme is referred to as the ”Hamiltonian scheme (3.17)” below.

The domain $D_{\Delta t}\subset T^{\ast}\Sigma(z)$ is defined as the set of configurations $(q^{n},p^{n+1/4})\in T^{\ast}\Sigma(z)$ such that there is a unique solution $(q^{n+1},p^{n+3/4})$ verifying (3.17).

Solving the position constraints $(C_{q})$ consists in projecting onto $\Sigma(z)$ a point in a $\Delta t$ -neighborhood of $q^{n}$ . Thus, by the implicit function theorem, the domain $D_{\Delta t}$ verifies:

It may happen that there is no solution if the time-step is too large, and, even for small time-steps, that several projections exist, see for instance Example $2$ in Chapter $7$ of . In practice, $D_{\Delta t}$ can be chosen to be the set of $(q^{n},p^{n+1/4})$ such that the Newton algorithm enforcing the constraints $(C_{q})$ has converged within a given precision threshold and a limited number of iterations.

As for the Verlet scheme in the unconstrained case, the associated numerical flow shares two important qualitative properties with the exact flow: It is time reversible and symplectic (see ). This implies quasi-conservation of energy, in the sense that energy is conserved within a given precision threshold over exponentially long times, see .

2.2. Comments on the fluctuation-dissipation part (3.16) and (3.18)

The new momentum $p^{n+1/4}\in T^{\ast}\Sigma(z)$ in (3.16) (or $p^{n+1}$ in (3.18)) may be obtained by first integrating the unconstrained dynamics with a midpoint scheme, and then computing the Lagrange multiplier $\lambda^{n+1/4}$ (or $\lambda^{n+1}$ ) by solving the following linear system implied by the constraints $(C_{p})$ :

Besides, it can be checked (see Sections 2.3.2 and 3.3.5 in ) that the Markov chain induced by the fluctuation-dissipation part of the scheme (3.16) (or (3.18)) verifies a detailed balance equation (both in the plain sense and up to momentum reversal) with respect to the stationary measure $\kappa_{T^{\ast}_{q}\Sigma(z)}^{M^{-1}}(dp)$ . The latter is defined as the kinetic probability distribution

and is the marginal in the $p$ -variable of the canonical distribution $\mu_{T^{*}\Sigma(z)}(dq\,dp)$ conditioned by a given $q\in\Sigma(z)$ . Moreover, if $\gamma_{P}:=P_{M}\,\gamma\,P_{M}^{T}$ is strictly positive in the sense of symmetric linear transformations of $T^{*}_{q}\Sigma(z)$ , then the Markov chain on momentum variable induced by (3.16) (or (3.18)) alone is ergodic with respect to $\kappa_{T^{\ast}_{q}\Sigma(z)}^{M^{-1}}(dp)$ .

Finally, an important simplification occurs in the integration of (3.14) in the special case when $\gamma$ and $M$ are equal up to a multiplicative constant (so that $\gamma M^{-1}$ is proportional to identity). Indeed in this case the equality $\left(\gamma_{P}(q)M^{-1}\right)^{n}=P_{M}(q)\left(\gamma\,M^{-1}\right)^{n}$ holds for any $n\geq 0$ , and (3.15) simplifies to

The numerical integration of (3.14) can thus be carried out in two steps: (i) exactly integrating (3.14) without constraint, and then (ii) projecting the result onto $T^{\ast}_{q}\Sigma(z)$ .

2.3. Metropolis-Hastings correction

Usually, the invariant probability distribution sampled by the solution of a numerical scheme is biased by the time discretization. Relying on (i) the time symmetry (up to momentum reversal) and (ii) the preservation of the phase space measure $\sigma_{T^{*}\Sigma(z)}(dq\,dp)$ by the solution of the RATTLE scheme (3.17), it is possible to eliminate the time discretization error in the splitting scheme (3.16)-(3.17)-(3.18) by resorting to a Generalized Hybrid Monte Carlo algorithm.

Consider an initial configuration $(q^{0},p^{0})\in T^{*}\Sigma(z)$ , and a sequence $({\mathcal{G}}^{n},{\mathcal{G}}^{n+1/2})_{n\geq 0}$ of independently and identically distributed standard Gaussian vectors. Iterate on $n\geq 0$ :

Evolve the momentum according to the midpoint Euler scheme (3.16), and compute the energy $E^{n}=H(q^{n},p^{n+1/4})$ of the new configuration;

Integrate the Hamiltonian part according to the RATTLE scheme (3.17), denote $(\widetilde{q}^{n+1},\widetilde{p}^{n+3/4})$ the resulting state, and set $E^{n+1}=H(\widetilde{q}^{n+1},\widetilde{p}^{n+3/4})$ .

Accept the proposal $(q^{n+1},p^{n+3/4}):=(\widetilde{q}^{n+1},\widetilde{p}^{n+3/4})$ with probability

Otherwise, reject and flip the momentum: $(q^{n+1},p^{n+3/4})=(q^{n},-p^{n+1/4})$ .

Evolve the momentum according to the midpoint Euler scheme (3.18).

By construction, the GHMC algorithm with constraints leaves invariant the equilibrium distribution $\mu_{T^{*}\Sigma(z)}(dq\,dp)$ (see Section 3.3.5 in ).

To understand the momentum reversal required upon rejection, it is useful to write more explicitly the Markov chain as the composition of a Metropolis-Hastings part, where the proposal is obtained by a RATTLE step followed by a momentum reversal (the latter operation is needed to ensure the symmetry of the proposition), which is then accepted or rejected; and another momentum reversal (which leaves invariant the targeted probability measure $\mu_{T^{*}\Sigma(z)}(dq\,dp)$ ). When the proposal is accepted, the two momentum reversals cancel out each other. On the other hand, when the proposal is rejected, momenta are actually reversed. See [35, Section 2.1.4] for more background on generalized Metropolis-Hastings algorithms.

In the above, we implicitly assume that the RATTLE scheme (3.17) is everywhere well defined. In practice however, it is necessary to modify Algorithm 3.5 by restricting the sampled configurations to $D_{\Delta t}$ . This can be achieved by introducing additional tests in steps (1), (2) and (4), and rejecting the states that have gone outside the set $D_{\Delta t}\subset T^{\ast}\Sigma(z)$ where the position constraint $(C_{q})$ is well defined. By doing so, the global algorithm has an invariant equilibrium distribution given by $\mu_{T^{*}\Sigma(z)}(dq\,dp)$ conditioned on the set of states $D_{\Delta t}$ . This invariant distribution can be written explicitly as follows:

Therefore, there exists $a>0$ small enough such that, when $\|p^{n+1/4}\|\leq a/\Delta t$ , the RATTLE scheme in (3.17) is well defined, namely there exists a unique $q^{n+1}$ satisfying the constraint $(C_{q})$ . This shows that

The invariant probability distribution of the Markov chain generated by GHMC with the additional rejection steps ensuring $\frac{1}{2}p^{T}M^{-1}p\leq R_{\Delta t}$ , is given by (3.21), and actually reads

Its marginal distribution in the position variable is then exactly given by:

This is also the marginal distribution in the position variable of $\mu_{T^{*}\Sigma(z)}$ . Note however, that if $R_{\Delta t}$ is too small, only small momenta will be sampled in step (1) of Algorithm 3.5, and the correlation time of the sampling will be large. In practice, the threshold $R_{\Delta t}$ should be tuned in preliminary computations so that: (i) $R_{\Delta t}$ is small enough so that the maximal number $N_{\rm max}$ of iterations for the Newton algorithm used to enforce $(C_{q})$ in (3.17) is never reached; (ii) $R_{\Delta t}$ is large enough so that the correlation time of the sampling is as small as possible.

Let us end this section with a warning: It is now known that the correction of the bias in discretizations of the Langevin dynamics by a Metropolization of the scheme may reduce the efficiency of the sampling, see for instance .

3. Exact sampling on a submanifold with overdamped dynamics

Constrained overdamped Langevin processes (or Brownian dynamics) are solutions of the stochastic differential equation (see also )

where $\lambda_{t}$ is an adapted stochastic process. Equivalently, (3.23) can be rewritten in the Stratonovitch form as

where $\circ$ denotes the Stratonovitch integration, and $P$ is the projector defined by (1.7) with the choice $M={\rm Id}$ :

It can be shown that (3.23) verifies the detailed balance condition for (and is ergodic with respect to) the invariant distribution

Suppose that the following relation is satisfied:

where $({\mathcal{G}}^{n})_{n\geq 0}$ are independent and identically distributed centered and normalized Gaussian variables, and $(\lambda^{n}_{\rm od})_{n\geq 1}$ are the Lagrange multipliers associated with the constraints $(\xi(q^{n})=z)_{n\geq 1}$ . Moreover, the Lagrange multipliers in (3.17) verify:

Irrespective of $p^{n}$ , the choice (3.26) in the scheme (3.16)-(3.17)-(3.18) leads to

where $\lambda^{n+1/4}$ is associated with the constraints $\nabla\xi(q^{n})^{T}p^{n+1/4}=0$ . This gives

where $\lambda^{n+1/2}$ is such that $\xi(q^{n+1})=z$ . The fluctuation-dissipation relation (1.3) can be reformulated in this context as

and the scheme (3.27) is recovered by taking the associated Lagrange multiplier equal to $\lambda^{n+1}_{\rm od}=\lambda^{n+1/4}+2\lambda^{n+1/2}$ . Finally, remarking that $G_{M}=\frac{2}{\Delta t}G$ and computing explicitly $\lambda^{n+1/2}$ and $\lambda^{n+3/4}$ in (3.17) yields (3.28)-(3.29)-(3.30). ∎

This point of view allows to construct a Metropolis correction to the Euler scheme (3.27), using the Generalized Hybrid Monte Carlo scheme (Algorithm 3.5) with the time-step chosen according to (3.26). In this way, assuming that the position constraint $(C_{q})$ in (3.17) is everywhere well defined, we obtain a Markov chain $(q^{n})_{n\geq 0}$ discretizing the overdamped dynamics (3.27) which exactly samples the invariant distribution (3.25). Deriving such a Metropolis-Hastings correction to the Euler scheme (3.27) without resorting to phase-space dynamics does not seem to be natural.

Proposition 3.6 can be seen as a discrete version of the zero-mass limit of the Langevin dynamics. It is also possible to obtain a discrete version of the overdamped limit ( $\gamma\to\infty$ ) of the Langevin dynamics by assuming that the parameters satisfy the relation

which is less restrictive than (3.26). Equation (3.27) is then obtained with $\Delta t$ replaced by

In this case, the effective discretization time-step $\Delta s$ for the overdamped Langevin dynamics is thus different from the time-step $\Delta t$ originally used for the discretization of the Langevin dynamics. This is reminiscent of the fact that the overdamped limit (at the continuous level) of the Langevin dynamics requires a change of timescale to obtain the overdamped Langevin dynamics.

where $M$ is not supposed to be proportional to the identity, the following numerical scheme is obtained:

where $\widetilde{\Delta s}=\Delta t^{2}/2$ . This is a discretization of the overdamped dynamics

which is a generalization of (3.23) to a scalar product on $\Sigma(z)$ induced by a general positive definite mass matrix $M$ . $\sqcap$ $\sqcup$

Thermodynamic integration with constrained Langevin dynamics

In this section, we focus on the computation of the gradient of the rigid free energy (1.13)

As explained in the introduction, we may indeed concentrate on the computation of the rigid free energy (1.13), since the standard free energy (1.12) can be computed from the latter one using (1.14). The relation (1.14) can be proved with the co-area formula (2.18). Indeed, the free energy defined in (1.12) can be rewritten as (where C denotes a constant which may vary from line to line):

where surface measures are defined in Section 2.3. Note that the rigid free energy $F_{\rm rgd}^{M}$ indeed depends explicitly on the mass matrix since

This section is organized as follows. First, we show how systems with molecular constraints and systems with constrained values of the reaction coordinate can be treated in a unified framework (Section 4.1). We then relate the Lagrange multipliers arising in the constrained Langevin dynamics, and the gradient of the rigid free energy (the so-called mean force) in Section 4.2. We consider the numerical computation of the mean force in Section 4.3, where we prove consistency results for the corresponding approximation formulas. Finally, some numerical results on a model system illustrate the approach in Section 4.4.

We discuss here how to generalize all the computations to systems with molecular constraints, generalizing thereby some results of . Without loss of generality, we consider rigidly imposed molecular constraints, see Remark 4.1 below for a discussion of this assumption. This section can be considered as independent of the remainder of the paper and may therefore be omitted in a first reading.

In practice, many systems are subject to molecular constraints, such as fixed lengths for covalent bonds, or fixed angles between covalent bonds. The reader is referred to for practical aspects related to the simulation of molecular constraints. In the context of free energy computations, two types of constraints are therefore considered: first, the molecular constraints,

and the submanifold associated with the reaction coordinates by

is everywhere invertible on $\Sigma_{\rm mc}\cap\Sigma_{\rm rc}(z_{\rm rc})$ . Likewise, we denote

Assuming rigid mechanical constraints on the molecular constraints $\xi_{\rm mc}$ (see Remark 4.1 below), we are led to considering the canonical distribution

to describe systems with molecular constraints at a fixed temperature. The measure $\sigma_{T^{*}\Sigma_{\rm mc}}$ denotes the phase space measure on $T^{*}\Sigma_{\rm mc}$ , equal by (2.21) to the conditional measure $\delta_{\xi_{\rm mc}(q),p_{\xi_{\rm mc}}(q,p)}(dq\,dp)$ associated with the constraints $(\xi_{\rm mc}(q)=0,p_{\xi_{\rm mc}}(q,p)=0)$ , where $p_{\xi_{\rm mc}}$ is the effective momentum (2.5) associated with $\xi_{\rm mc}$ .

The distribution $\mu_{T^{*}\Sigma_{\rm mc}}$ in (4.4) is obtained by constraining rigidly $\xi_{\rm mc}(q)$ to 0, and not ’softly’ (in which case $\delta_{\xi_{\rm mc}(q),p_{\xi_{\rm mc}}(p,q)}(dq\,dp)$ would be replaced by $\delta_{\xi_{\rm mc}(q)}(dq)\,dp$ , see also Remark 3.3). As explained in Section 3, the distribution (4.4) is the equilibrium distribution of a Langevin process (thermostated Hamiltonian dynamics) with rigid position constraints $\xi_{\rm mc}(q)=0$ . Choosing whether constraints should be soft or rigid is a modeling choice, and there is no clear consensus on this issue in the current literature.

Note however that it is possible to rewrite the remainder of this section by considering the softly constrained potential rather than the rigidly constrained potential, up to an appropriate modification of (4.6) below, with the help of some Fixman corrective potential. $\sqcap$ $\sqcup$

By associativity of the conditioning of measures, the distribution $\mu_{T^{*}\Sigma_{\rm mc}}$ conditioned by a value of the reaction coordinates $\xi_{\rm rc}(q)=z_{\rm rc}$ is given, up to a normalizing factor, by:

Therefore, considering the marginal probability distribution of the reaction coordinates $\xi_{\rm rc}(q)$ leads to the following definition of the free energy associated with $\xi_{\rm rc}$ :

The conditional distribution can be decomposed as follows, using the co-area formulas (2.18)-(2.20) and the definition of effective momentum (2.5):

Integrating out the momentum in the linear space $T^{\ast}_{q}\Sigma_{\rm mc}$ with scalar product $\langle p_{1},p_{2}\rangle_{M^{-1}}=p_{1}^{T}M^{-1}p_{2}$ , the free energy can be rewritten as:

As a consequence, the free energy $F^{\rm mc}$ can be computed from the generalized rigid free energy:

using the following formula, similar to (1.14):

In the above, $\mu_{T^{*}(\Sigma_{\rm rc}(z_{\rm rc})\cap\Sigma_{{\rm mc}})}$ is defined similarly to (1.9). The case of molecular constraints can therefore be treated within the general framework considered in this paper, the sampling of the canonical measure $\mu_{T^{*}(\Sigma_{\rm rc}(z_{\rm rc})\cap\Sigma_{{\rm mc}})}$ and the computation of the rigid free energy (4.5) being the problems at hand.

The rigid free energy which is thus naturally computed with such a dynamics is

and the actual free energy is thus recovered by a formula similar to (4.6), namely

2. The mean force and the Lagrange multipliers

In this section, the average of the constraining force (3.2) is related to the gradient of the rigid free energy (1.13) (or mean force). We also give a similar result for the following generalized rigid free energy:

yields on average the rigid free energy derivative:

Moreover, for general constraints (2.6) and the associated generalized free energy (4.7), the formula can be extended as follows: The generalized constraining force is

is defined in (2.9), and the rigid mean force is

where $\displaystyle Z_{\zeta}=\int_{\Sigma_{\Xi}(\zeta)}{\rm e}^{-\beta H}d\,\sigma_{\Sigma_{\Xi}(\zeta)}$ , and $F^{\Xi}_{\rm rgd}(\zeta)$ is defined in (4.7). When $(q,p)$ verifies $p_{\xi}(q,p)=v_{\xi}(q,p)=0$ , then $g^{\Xi}(q,p)=0$ and $f^{\Xi}(q,p)=f_{\rm rgd}^{M}(q,p)$ .

Formulas (4.9) and (4.10) are obtained directly by replacing $\varphi$ by ${\rm e}^{-\beta H}$ in Lemma 6.2 (see the Appendix). The fact that $(f^{\Xi}(q,p),g^{\Xi}(q,p))=(f_{\rm rgd}^{M}(q,p),0)$ in the tangential case (namely when $p_{\xi}(q,p)=v_{\xi}(q,p)=0$ ) is a consequence of (3.8). ∎

The following lemma gives a momentum-averaged version of the constraining force (a similar formula exists in the overdamped case, see Equations (4.8)-(4.9) in , for example).

The rigid mean force (4.8) can be rewritten as:

Consider the Gaussian distribution ${\kappa}^{M^{-1}}_{T^{*}_{q}\Sigma(z)}(dp)$ defined in (3.19):

which is the marginal distribution in the momentum variable of the canonical distribution $\mu_{T^{*}\Sigma(z)}(dq\,dp)$ , conditioned by a given $q\in\Sigma(z)$ . Proving Lemma 4.4 amounts to showing that the average of the constraining force $f_{\rm rgd}^{M}$ with respect to ${\kappa}^{M^{-1}}_{T^{\ast}_{q}\Sigma(z)}(dp)$ yields $\overline{f}_{\rm rgd}^{M}$ :

Denoting $\langle p_{1},p_{2}\rangle_{M^{-1}}=p_{1}^{T}M^{-1}p_{2}$ , this yields

Averaging (3.2) over momenta thus leads to the desired result. ∎

A similar result holds for the ‘Hamiltonian part’ of the Lagrange multipliers, defined by:

Indeed, the following almost sure convergence holds:

The estimator based on (4.16) has a smaller variance than the estimator based on (1.15) (or (4.14) above) since only the bounded variation part is retained, and the martingale part due to the Brownian increments and the dissipation term are subtracted out. Similar results on variance reduction where obtained in the overdamped case in .

Recall the expression (3.1) of the Lagrange multipliers, which can be decomposed as the sum of the constraining force, a dissipation term and a martingale (fluctuation) term:

The result follows from three facts. First, the process is ergodic with respect to the equilibrium distribution $\mu_{T^{*}\Sigma(z)}(dq\,dp)$ and averaging $f_{\rm rgd}^{M}$ yields the rigid free energy derivative in view of Proposition 4.3. This already shows (4.16).

Second, the Gaussian distribution of $\mu_{T^{*}\Sigma(z)}(dq\,dp)$ with respect to momentum variables is centered, which yields:

Third, the variance of the martingale term can be uniformly bounded as

The fact that averaging the Lagrange multiplier in (4.14) indeed yields the mean force may not be intuitive. This is actually very much related to the cost interpretation of the Lagrange multipliers in optimization, see [35, Remark 3.29].

3. Numerical discretization of the mean force

Estimates of the mean force based on either (4.8), (4.11) or (4.16) can be obtained.

Free energy derivatives can be computed by averaging $\overline{f}_{\rm rgd}^{M}(q)$ or $f_{\rm rgd}^{M}(q,p)$ with respect to the distribution $\mu_{T^{*}\Sigma(z)}(dq\,dp)$ , for instance using the estimators:

The functions $\overline{f}_{\rm rgd}^{M}(q)$ and $f_{\rm rgd}^{M}(q,p)$ may thus be called “rigid local mean forces”. Note that using the momentum-averaged local mean force $\overline{f}_{\rm rgd}^{M}$ instead of the original $f_{\rm rgd}^{M}$ reduces the variance since the fluctuations of the momentum variable have been averaged out analytically. Table 1 below confirms this analysis, although the variance reduction appears to be small in our specific numerical experiment.

3.2. Averaging the Lagrange multipliers

Free energy derivatives can also be computed using the Lagrange multipliers of a Langevin constrained process according to (1.15) or (4.16). This technique avoids the possibly cumbersome computation of second order derivatives ${\rm Hess}_{q}(\xi)$ of the reaction coordinate, which appear in the expressions of $f^{M}_{\rm rgd}$ or $\overline{f}_{\rm rgd}^{M}$ . Besides, the Lagrange multipliers are needed anyway for the numerical integration of the dynamics.

The computation can be performed as before with a longtime simulation of the splitting scheme (3.16)-(3.17)-(3.18) discretizing the Langevin process with constraints. The following approximation formula can for instance be used:

where $(\lambda^{k+1/2},\lambda^{k+3/4})$ are the Lagrange multipliers in the Hamiltonian part (3.17). The consistency of this estimator is given by the following proposition.

The approximation formula (4.18) is consistent. More precisely, the Lagrange multipliers $(\lambda^{n+1/2},\lambda^{n+3/4})$ in (3.16)-(3.17)-(3.18) are both equivalent when $\Delta t\to 0$ to the constraining force defined in (3.2):

Moreover, the following second order consistency holds for the sum of the Lagrange multipliers:

The variant (4.20), which involves positions and momenta at the beginning and at the end of the Hamiltonian steps only, is used in (4.22) below to estimate the time discretization error in the thermodynamic integration method based on the estimator (4.16).

For sufficiently small time-steps $\Delta t$ , the implicit function theorem ensures that the two projection steps associated with the nonlinear constraints in (3.16)-(3.17)-(3.18) have a unique smooth solution. A Taylor expansion with respect to $\Delta t$ of the position constraints gives

Then, the fact that $z=\xi(q^{n+1})=\xi(q^{n})$ and the identity

yield the following expansion of $\lambda^{n+1/2}$ in terms of $(q^{n},p^{n+1/2})$ :

By time symmetry, the same computation holds for $\lambda^{n+3/4}$ , starting from $(q^{n+1},p^{n+3/4})$ and by formally replacing $\Delta t$ by $-\Delta t$ . This can be double checked by Taylor expanding with respect to $\Delta t$ the position constraints, as done above for $\lambda^{n+1/2}$ . It thus holds:

The sum of the multipliers therefore reads

which gives (4.19). Now, using the previous calculations, we remark that:

This gives the claimed second order consistency of the sum of the Lagrange multipliers (4.20). ∎

This shows the convergence of the estimate (4.18) of the mean force when taking first the limit $\Delta t\to 0$ and then $T\to\infty$ .

3.3. Estimates relying on the Metropolized scheme

When the scheme (3.16)-(3.17)-(3.18) is complemented with a Metropolis step (see Algorithm 3.5), it is possible to prove a result on the longtime limit of trajectorial averages (i.e. letting first the number of iterations go to infinity, and then taking the limit $\Delta t\to 0$ ), upon assuming the irreducibility of the numerical scheme.

Indeed, let us consider the Markov chain $(q^{k},p^{k})$ generated by the GHMC scheme in Algorithm 3.5, and assume (i) the irreducibility of the Markov chain, and (ii) that appropriate rejections outside the set $\overline{D}_{\Delta t}=\Sigma(z)\times\{\frac{1}{2}p^{T}M^{-1}p\leq R_{\Delta t}\}$ are made in the steps (1)-(2)-(4) of the algorithm. In particular, the projection steps associated with the nonlinear constraints in Step (2) of Algorithm 3.5 are well defined.

Then, by ergodicity, an average of the analytic expression of the local rigid mean force $\overline{f}_{\rm rgd}^{M}$ given in (4.12) yields an estimate of the free energy without time discretization error:

If $f_{\rm rgd}^{M}$ is used instead of $\overline{f}_{\rm rgd}^{M}$ , then the mean force is computed with some exponentially small error: almost surely,

for some $\alpha>0$ . The error arising from replacing $\overline{D}_{\Delta t}$ with $T^{*}\Sigma(z)$ is indeed exponentially small in view of (3.22) (namely $R_{\Delta t}\geq A\,\Delta t^{-2}$ ) and using the fact that the marginal distribution in the $p$ -variable is Gaussian.

Likewise, for estimates based on Lagrange multipliers, the following longtime averaging holds: almost surely,

where we have used the estimate (4.20) on the Lagrange multipliers. The limit $\Delta t\to 0$ is obtained by a dominated convergence argument:

Note that, due to the Metropolis correction in Algorithm 3.5, the time discretization error in the sampling of the invariant measure is removed. The only remaining time discretization errors come from (i) the approximation of the local mean force by the Lagrange multipliers (this is a second order error), and (ii) the integration domain being $\overline{D}_{\Delta t}$ instead of $T^{*}\Sigma(z)$ (as discussed above, this is an exponentially small error in $\Delta t$ ). In conclusion, the left-hand side of (4.22) is an approximation of $\nabla_{z}F_{\rm rgd}^{M}(z)$ up to a ${\rm O}(\Delta t^{2})$ error term.

3.4. Overdamped limit

which can be seen as a variant of the variance reduced estimator proposed directly for the overdamped scheme (3.27) in :

The rigorous justification of the consistency of (4.23) in the limit $\Delta t\to 0$ follows from the results of . See also Section 5.5.2 below for similar results.

4. Numerical illustration

We consider a system composed of $N$ particles in a 2-dimensional periodic box of side length $L$ , interacting through the purely repulsive WCA pair potential, which is a truncated Lennard-Jones potential:

where $r$ denotes the distance between two particles, $\varepsilon$ and $\sigma$ are two positive parameters and $r_{0}=2^{1/6}\sigma$ . Among these particles, two (numbered 1 and 2 in the following) are designated to form a dimer while the others are solvent particles. Instead of the above WCA potential, the interaction potential between the two particles of the dimer is a double-well potential

See for instance for other computational studies using this model.

The potential $V_{\rm S}$ exhibits two energy minima, one corresponding to the compact state where the length of the dimer is $r=r_{0}$ , and one corresponding to the stretched state where this length is $r=r_{0}+2w$ . The energy barrier separating both states is $h$ . The reaction coordinate used to describe the transition from the compact to the stretched state is the normalized bond length of the dimer molecule:

where $q_{1}$ and $q_{2}$ are the positions of the two particles forming the dimer. The compact state (resp. the stretched state) corresponds to the value $z=0$ (resp. $z=1$ ) of the reaction coordinate.

The inverse temperature is set to $\beta=1$ , with $N=100$ particles ( $N-2$ solvent particles and the dimer) with solvent density $\rho=(1-2/N)a^{-2}=0.436$ , since there are $N-2$ solvent particles in a square box of side length $L=a\sqrt{N}$ with $a=1.5$ . The parameters describing the WCA interactions are set to $\sigma=1$ and $\varepsilon=1$ , and the additional parameters for the dimer are $w=2$ and $h=2$ .

For this system, $M={\rm Id}$ and $|\nabla\xi|$ is constant, so that the rigid free energy $F^{M}_{\rm rgd}(z)$ is equal to the free energy $F(z)$ .

The mean force is estimated at the values $z_{i}=z_{\rm min}+i\Delta z$ , with $z_{\rm min}=-0.2$ , $z_{\rm max}=1.2$ and $\Delta z=0.014$ , by ergodic averages obtained with the projected dynamics with Metropolis correction (Algorithm 3.5, where in the simple case considered here, the fluctuation-dissipation part can be integrated exactly). For each value of $z$ , we integrate the dynamics on a time $T=2\times 10^{4}$ with a step size $\Delta t=0.02$ , using a scalar friction coefficient $\gamma=1$ .

The resulting mean force profile is presented in Figure 1, together with the associated free energy profile.

Figure 2 compares the analytical constraining force $f^{M}_{\rm rgd}(q^{n},p^{n})$ and the Lagrange multipliers, see Proposition 4.6. In Figure 2, the $x$ -axis represents the blocks of $10^{5}$ simulation steps, concatenated for the $101$ different values of $z_{i}$ . It can be seen that the difference between $f^{M}_{\rm rgd}(q^{n},p^{n})$ and the Lagrange multipliers is small in any cases, though somewhat larger for the lowest values of $\xi$ .

It can be checked numerically that the differences $|\lambda^{n+1/2}-f_{\rm rgd}^{M}(q^{n},p^{n+1/2})\frac{\Delta t}{2}|$ and $|\lambda^{n+3/4}-f_{\rm rgd}^{M}(q^{n+1},p^{n+1/2})\frac{\Delta t}{2}|$ are indeed of order $\Delta t^{2}$ , and that the difference

is indeed of order $\Delta t^{3}$ (by computing the average of these elementary differences for various step sizes). The Lagrange multipliers are in any case very good approximations to the constraining force $f^{M}_{\rm rgd}$ .

Let us finally discuss the efficiency of the different estimators of the mean force, in terms of their variances. They can be written as the empirical average of the following random sequences:

where $q^{n},p^{n},\lambda^{n+1/2},\lambda^{n+3/4}$ are given by the numerical scheme (3.16)-(3.17)-(3.18). The correlations in time (between the iterates) are very similar for the three methods, and we therefore simply compute the variance over all the samples. Table 1 compares the so-obtained standard errors over $10^{5}$ time-steps with $\Delta t=0.02$ (simulation time $T=2000$ for each value of the reaction coordinate). The results show that the different estimators are more or less equivalent. This is related to the fact that the essential source of variance comes from the sampling of the positions, and not the sampling of the velocities. Note however that, for the smallest value of the reaction coordinate, the estimator based on the averaged local mean force $\overline{f}^{M}_{\rm rgd}(q^{n})$ appears to be better in terms of variance.

Hamiltonian and Langevin nonequilibrium dynamics

The latter free energy is associated to the normalization constant $Z_{z(t),\dot{z}(t)}$ of the distribution $\mu_{\Sigma_{\xi,v_{\xi}}(z(t),\dot{z}(t))}$ defined by (2.15). The generalized rigid free energy (4.7) can be explicitly related to the usual free energy as follows. First, remark that, for a fixed $q$ ,

In the above, the change of variable $p\to p-\nabla\xi(q)G_{M}^{-1}(q)v_{z}$ has been used, in the space

Note that $\frac{1}{2}v_{z}^{T}G_{M}^{-1}(q)v_{z}$ can be interpreted as the kinetic energy of the reaction coordinate $\xi$ . Using the decomposition of measures (2.24) and the above calculations, an alternative expression of the generalized free energy is:

where, as usual, ${\rm C}$ denotes a generic constant (independent of $z$ ) whose value may vary from line to line. As a consequence, the standard free energy (1.12) is easily recovered from the generalized free energy, using relations similar to (1.14). Indeed, using (5.1), and with computations similar to the ones leading to (4.2), the difference of the two free energies writes:

In practical nonequilibrium computations, the profile $t\mapsto F(z(t))$ can then be computed by adding a corrector to the work value in the Jarzynski estimator computing $F^{\xi,v_{\xi}}_{\rm rgd}(z(t),\dot{z}(t))$ . This yields the identity (5.24) mentioned in the introduction and proved below (see the discussion after Theorem 5.3).

2. Dynamics and generators

In view of the special structure of $(\sigma_{P},\gamma_{P})$ , this leads to

obtained by using a time reversed switching $t^{\prime}\mapsto z(T-t^{\prime})$ , and by reversing the momentum first in the initial condition, and then reversing them back after the time evolution (see for more general backward dynamics). More precisely, the backward dynamics can be defined through its generator

In the following proposition, the expressions of ${\mathcal{L}}^{\rm f}_{t}$ and ${\mathcal{L}}_{t^{\prime}}^{{\rm b}}$ are explicitly written.

and the generator of the backward process (5.6) at time $t^{\prime}\in[0,T]$ reads:

is the fluctuation-dissipation operator defined in (3.5).

The full expression of the generator $\mathcal{L}_{t}^{\rm f}$ is then obtained by adding the terms arising from the fluctuation-dissipation. These terms are directly obtained from the terms involving $\gamma_{P}$ and $\sigma_{P}$ in (5.4), as in the proof of Proposition 3.1.

3. Jarzynski-Crooks identity

Before stating the main result of this section (Theorem 5.3 below), we need to introduce a notion of work. This quantity is most conveniently defined for deterministic dynamics, but the corresponding definition is also valid for stochastic dynamics.

and denote by $\Phi_{t,t+h}:\Sigma_{\xi,v_{\xi}}(z(t),\dot{z}(t))\to\Sigma_{\xi,v_{\xi}}(z(t+h),\dot{z}(t+h))$ the associated flow between time $t\in[0,T]$ and $t+h\in[0,T]$ . The work can now be written out more explicitly using the following lemma:

The infinitesimal variation of the work (5.12) reads:

where for all $t\in[0,T]$ and all $(q,p)\in\Sigma_{\xi,v_{\xi}}(z(t),\dot{z}(t))$ ,

The total exchanged work is then a time integral associated with the path $t\mapsto(q_{t},p_{t})$ , and is denoted by:

Note that the expression (5.16) can be interpreted as the energy variation of the system during the switching when the stochastic thermostat is turned off.

The expression of the Lagrange multipliers in (5.3) yields (5.15):

The last equality is obtained using the computation of the Lagrange multipliers in (5.3). This yields (5.16). ∎

We are now in position to state the main result of this section.

and by $\{q^{\rm b}_{t^{\prime}},p^{\rm b}_{t^{\prime}}\}_{0\leq t^{\prime}\leq T}$ the solution of the backward Langevin process (5.6) with initial conditions distributed according to

Then, the following Jarzynski-Crooks identity holds on $[0,T]$ : for any bounded path functional $\varphi_{[0,T]}$ ,

where $(\,\cdot\,)^{\rm r}$ denotes the composition with the operation of time reversal of paths:

Note that the theorem still holds in the Hamiltonian case, i.e. when $(\gamma_{P},\sigma_{P})=(0,0)$ . The choice $\varphi_{[0,T]}=1$ in (5.19) leads to the following work fluctuation identity:

Besides, upon choosing a path functional $\exp(\theta\beta\mathcal{W}_{0,T})$ , it is possible to obtain a family of free energy estimators, parameterized by $\theta$ and where both forward and backward paths are weighted by the exponential of some work. Moreover, the standard Crooks equality on ratios of probability density functions of work values is also a consequence of (5.19), see Section 4.2.2 in .

Note also that the choice $\varphi_{[0,T]}(q,p)=\phi(q_{T},p_{T})$ leads to the following representation of the canonical distribution $\mu_{\Sigma_{\xi,v_{\xi}}(z(T),\dot{z}(T))}$ :

The usual free energy profile $z\mapsto F(z)$ can therefore be computed using the relations (1.18)-(1.19) presented in the introduction. Indeed, Equation (1.19) (see (5.24) below) can be proved by combining (5.2) and (5.21)–(5.22) as follows:

where the corrector $C(t,q)$ is defined in (1.18):

Estimators of the free energy based on (1.19) can then be constructed, see Chapter 4 in for a review.

Before turning to the proof of Theorem 5.3, we first give the general lemma which enables to deduce the Jarzynski-Crooks fluctuation identity from a nonequilibrium detailed balance condition (similar to the one presented in for switchings arising from a time-dependence in the Hamiltonian).

Let $(q_{t},p_{t})_{0\leq t\leq T}$ (resp. $(q^{\rm b}_{t},p^{\rm b}_{t})_{0\leq t\leq T}$ ) be a Markov process with infinitesimal generator ${\mathcal{L}}^{\rm f}_{t}$ (resp. ${\mathcal{L}}^{\rm b}_{t}$ ) and initial conditions distributed according to (5.17) (resp. (5.18)). Let us assume that the following nonequilibrium detailed balance condition is satisfied: for any two smooth test functions $\varphi_{1}$ , $\varphi_{2}$ ,

Then the Jarzynski-Crooks fluctuation identity (5.19) holds.

Let us introduce the following weighted transition operators: for any bounded test function $\varphi$ ,

where $(q_{t},p_{t})_{0\leq t\leq T}$ (resp. $(q^{\rm b}_{t},p^{\rm b}_{t})_{0\leq t\leq T}$ ) is a Markov process with infinitesimal generator ${\mathcal{L}}^{\rm f}_{t}$ (resp. ${\mathcal{L}}^{\rm b}_{t}$ ), and $\mathcal{W}_{t,T}=\mathcal{W}_{0,T}-\mathcal{W}_{0,t}$ .

We assume that these operators are well defined and smooth with respect to time for sufficiently smooth test functions defined in an open neighborhood of $\Sigma_{\xi,v_{\xi}}(z(t),\dot{z}(t))$ and $\Sigma_{\xi,v_{\xi}}(z(t^{\prime}),\dot{z}(t^{\prime}))$ respectively (for any $t,t^{\prime}\in[0,T]$ ).

The transition operators satisfy the following backward Kolmogorov evolution equations:

Consider now two test functions $\varphi_{0}$ and $\varphi_{T}$ . The balance condition (5.25) implies

Integrating this equality on $[0,T]$ yields

which is the Crooks identity (5.19) for path functionals of the form

Then, using the Markov property of the forward and backward processes, Crooks identity (5.19) can be extended to finite-dimensional path functionals of the form:

with $0=t_{0}<t_{1}\cdots<t_{K}=T$ by repeatedly using a variant of (5.28) on time subintervals $[t_{k},t_{k+1}]$ (see the proof of Theorem 4.10 in for further precisions). This allows to conclude since finite dimensional time marginal laws characterize the distribution on continuous paths, see for instance . ∎

First, using Lemma 6.2, we compute the variation of the unnormalized canonical equilibrium distribution with constraints with respect to the switching:

On the other hand, (5.14) and Proposition 5.1 give

Now, (5.25) can be verified in two steps. First, the last two terms in (5.31) (the ”thermostat” terms) cancel out after integration with respect to ${\rm e}^{-\beta H}\,d\sigma_{\Sigma_{\xi,v_{\xi}}(z(t),\dot{z}(t))}$ thanks to the detailed balance condition (3.12). Then, an integration of (5.31) with respect to ${\rm e}^{-\beta H}\,d\sigma_{\Sigma_{\xi,v_{\xi}}(z(t),\dot{z}(t))}$ gives, in view of (5.30) and (2.25),

which is indeed (5.25). Note that the time-regularity on the evolution semi-groups (5.26)-(5.27) required to make these computations rigorous is proved in the overdamped case in the proof of Theorem A.5 in . A similar proof can be carried out for constrained Langevin equations. ∎

4. Numerical schemes

so that constraints on momenta are automatically enforced, and no Lagrange multiplier is needed in (5.32) and (5.34).

We comment in the subsequent sections on the different parts of the scheme.

The Lagrange multipliers $\lambda^{n+1/2}$ are associated with the position constraints $(C_{q})$ , and the Lagrange multipliers $\lambda^{n+3/4}$ are associated with the velocity constraints $(C_{p})$ . In $(C_{p})$ , the velocity of the switching at time $t_{n+1}$ is discretized as:

The latter choice is motivated by the following observation: The position after one step of an unconstrained motion, given by

already satisfies $(C_{q})$ up to error terms of order two with respect to $\Delta t$ . Indeed, using (5.35):

This property is useful to ensure a fast convergence of the numerical algorithm solving the nonlinear constraints $(C_{q})$ .

The numerical flow associated with (5.33) is denoted in the sequel as

It can be proven that $\Phi^{n}$ is a symplectic map. The proof is indeed exactly the same as for the symplecticity of the classical RATTLE scheme, see [22, Sections VII.1.3] for an explicit computation for symplectic Euler and [22, Sections VII.1.4] for an extension to RATTLE. As a consequence, $\Phi^{n}$ transports the phase space measure $\sigma_{\Sigma_{\xi,v_{\xi}}\left(z(t_{n}),\frac{z(t_{n+1})-z(t_{n})}{\Delta t}\right)}$ to the phase space measure $\sigma_{\Sigma_{\xi,v_{\xi}}\left(z(t_{n+1}),\frac{z(t_{n+2})-z(t_{n+1})}{\Delta t}\right)}$ .

4.2. Comments on the fluctuation-dissipation part (5.32)-(5.34)

In practice, (5.32) may be rewritten in a form more suited to numerical computations. Of course, similar considerations hold for (5.34). Since $\gamma_{P}(q)=P_{M}(q)\gamma P_{M}(q)^{T}$ and $\sigma_{P}(q)=P_{M}(q)\sigma$ , (5.32) is equivalent to:

where the Lagrange multiplier $\lambda^{n+1/4}$ is associated with the constraint $(C_{p})$ . The equivalence between (5.32) and (5.37) can be checked by multiplying (5.37) by $P_{M}(q^{n})$ and using (5.35).

The Lagrange multiplier $\lambda^{n+1/4}$ in (5.37) is obtained by multiplying the above equation by $\nabla\xi(q^{n})^{T}M^{-1}\left({\rm Id}+\frac{\Delta t}{4}\gamma M^{-1}\right)^{-1}$ , and solving the following linear system:

This system is well posed. Indeed, the matrix $\nabla\xi(q^{n})^{T}M^{-1}\left({\rm Id}+\frac{\Delta t}{4}\gamma M^{-1}\right)^{-1}\nabla\xi(q^{n})$ can be rewritten as $\nabla\xi(q)^{T}S\nabla\xi(q)$ with $S=M^{-1}\left({\rm Id}+\frac{\Delta t}{4}\,\gamma M^{-1}\right)^{-1}$ . Both $M$ and $\gamma$ are symmetric and non-negative, so that $S$ is symmetric, positive and invertible. Finally, the invertibility of $\nabla\xi(q)^{T}S\nabla\xi(q)$ follows from the invertibility of $G_{M}(q)$ .

In the special case when $\gamma$ and $M$ are equal up to a multiplicative constant, the numerical integration can be simplified using the explicit formula (3.20) and the method described below (3.20), which still holds for the tangential part of the momentum. See Section 5.6 below for further precisions.

4.3. Discretization of the backward process (5.6)

The splitting scheme for the backward Langevin dynamics with time-evolving constraints (5.6) reads as follows: Denote $n^{\prime}=N_{T}-n$ , take initial conditions $(q^{{\rm b},0},p^{{\rm b},0})$ distributed according to $\mu_{\Sigma_{\xi,v_{\xi}}\left(z(t_{N_{T}}),\frac{z(t_{N_{T}+1})-z(t_{N_{T}})}{\Delta t}\right)}$ and iterate on $0\leq n^{\prime}\leq N_{T}-1$ ,

where we recall $n^{\prime}=N_{T}-n$ . Assuming that the flow $\Phi^{n}$ given by (5.36) and $\Phi^{{\rm b},n^{\prime}}$ are both well-defined, the following reversibility property is easily checked (extending the symmetry property of the standard RATTLE scheme, see for instance [22, Section VII.1.4]):

4.4. Work discretization and free energy computations

The work (5.12) can be approximated using the Lagrange multipliers in (5.33):

for $n=0\ldots N_{T}-1$ . The (formal) consistency of the work discretization (5.42) in the time continuous limit is a direct consequence of the work expression (5.12).

An estimator of the free energy profile is then obtained by using $K$ independent realizations of the switching process, computing the work $\mathcal{W}^{N_{T},k}$ for each realization $k\in\{1,\ldots,K\}$ (with the numerical trajectories obtained from the numerical scheme (5.32)-(5.33)-(5.34) and i.i.d. initial conditions sampled according to $\mu_{\Sigma_{\xi,v_{\xi}}\left(z(t_{0}),\frac{z(t_{1})-z(t_{0})}{\Delta t}\right)}$ ), and approximating (5.23), rewritten up to an unimportant additive constant (independent of $T$ ), as

In the above, the discretization $C^{n}(q)$ of the corrector (1.18) is

We refer to Chapter 4 in for more background on free energy estimators for nonequilibrium dynamics. In particular, it is possible to compute a work associated with the backward switching from the Lagrange multipliers in (5.6), and to resort to bridge estimators (see Section 4.2.3 in ).

However, using approximations such as (5.42) in the Jarzynski-Crooks identity introduces a time discretization error. We show in the next section how to eliminate this error.

4.5. Discrete Jarzynski-Crooks identity

It turns out that a discrete version of the Jarzynski-Crooks identity (5.19) can be obtained. This enables the estimation of free energy differences using nonequilibrium simulation without time discretization error. The discrete equality (5.48) below may be seen as an extension of the corresponding equality obtained for transitions associated with time-dependent Hamiltonians and performed with Metropolis-Hastings dynamics (see and Remark 4.5 in ).

For this purpose, we consider a discretization of the work $\mathcal{W}_{0,T}$ using the interpretation (5.16) of the work as the energy variation of the Hamiltonian part of the Langevin dynamics. This leads to the following definition of the work at the discrete level:

for $n=0\ldots N_{T}-1$ . This work discretization leads to a Jarzynski-Crooks identity without time discretization error.

Consider the distribution $\mu_{\Sigma_{\xi,v_{\xi}}(z(t),\dot{z}(t))}$ and its normalization $Z_{z(t),\dot{z}(t)}$ defined in (2.15). Denote by $\{q^{n},p^{n}\}_{0\leq n\leq N_{T}}$ the solution of the forward discretized Langevin dynamics (5.32)-(5.33)-(5.34) with initial conditions distributed according to

and by $\{q^{{\rm b},n^{\prime}},p^{{\rm b},n^{\prime}}\}_{0\leq n^{\prime}\leq N_{T}}$ the solution of the discretized backward Langevin dynamics (5.38)-(5.39)-(5.40) distributed according to

Then, the following Jarzynski-Crooks identity holds on $[0,N_{T}]$ : for any bounded discrete path functional $\varphi_{[0,N_{T}]}$ ,

where $\mathcal{W}^{n}$ is computed according to (5.44), and $(\,\cdot\,)^{\rm r}$ denotes the composition with the operation of time reversal of paths:

The (formal) consistency of the work discretization (5.44) in the time continuous limit is a direct consequence of the work expression (5.16). Free energy estimators based on the identity (5.47) are obtained as described in Section 5.4.4. Let us emphasize once again that there is no error related to the finiteness of the time-step $\Delta t$ in this estimator, and that the only source of approximation is due to the statistical error.

With a slight abuse of notation, we denote in the same way the random variables $(q^{n},p^{n})$ , $(q^{{\rm b},n},p^{{\rm b},n})$ , etc. in (5.32)-(5.33)-(5.34) or (5.38)-(5.39)-(5.40), and the integration variables in the definition of probability distributions. We divide the proof into three steps.

Step 1: The phase space conservation of $\Phi^{n}$ and $\Phi^{{\rm b},n^{\prime}}$ and the reversibility property (5.41) imply

Step 2: The probability distribution of $p^{n+1/4}$ given $(q^{n},p^{n})$ in the discretization of the fluctuation-dissipation part (5.32) is denoted $K^{\rm OU}(q^{n},p^{n},dp^{n+1/4})$ . The scheme (5.32) is a mid-point discretization of an Ornstein-Uhlenbeck process, which can be rewritten by decomposing the orthogonal and tangential updates of the momentum:

where $p_{\parallel}=P_{M}(q^{n})p$ , and $p_{\perp}=({\rm Id}-P_{M}(q^{n}))p$ . The Markov chain induced by the parallel part of the momentum is the same as the one induced by the scheme (3.16) (or (3.18)) defined in Section 3.2. The latter verifies a detailed balance equation (both in the plain sense and up to momentum reversal) with respect to the stationary measure $\kappa_{T^{\ast}_{q}\Sigma(z)}^{M^{-1}}(dp)$ defined by (3.19) (see Sections 2.3.2 and 3.3.5 in ). We recall that this measure is defined as the kinetic probability distribution in the momentum variable of the canonical distribution $\mu_{T^{*}\Sigma(z)}(dq\,dp)$ on the tangential space, conditioned by a given $q\in\Sigma(z)$ . Adding the (invariant) orthogonal part of the momentum, the following detailed balance condition is satisfied:

Step $3$ : Denote by $K^{\rm f}(q^{n},p^{n};dq^{n+1},dp^{n+1/4},dp^{n+3/4},dp^{n+1})$ the probability distribution of the variables $(q^{n+1},p^{n+1/4},p^{n+3/4},p^{n+1})$ given the variables $(q^{n},p^{n})$ in the scheme (5.32)-(5.33)-(5.34); and by $K^{\rm b}(q^{{\rm b},n^{\prime}},p^{{\rm b},n^{\prime}};dq^{{\rm b},n^{\prime}+1},dp^{{\rm b},n^{\prime}+1/4},dp^{{\rm b},n^{\prime}+3/4},dp^{{\rm b},n^{\prime}+1})$ the probability distribution of the variables $(q^{{\rm b},n^{\prime}+1},p^{{\rm b},n^{\prime}+1/4},p^{{\rm b},n^{\prime}+3/4},p^{{\rm b},n^{\prime}+1})$ given the variables $(q^{{\rm b},n^{\prime}},p^{{\rm b},n^{\prime}})$ in the scheme (5.38)-(5.39)-(5.40). The splitting structure yields:

Combining the detailed balance conditions (5.49) and (5.51) of Steps $1$ and $2$ , and using the decomposition (2.24) of phase space measures, it follows

which can be seen as the Jarzynski-Crooks identity over one time-step. Iterating the argument, it is easy to obtain:

5. The overdamped limit

The splitting scheme (5.32)-(5.33)-(5.34) can be used in the overdamped regime, using the method of Proposition 3.6, i.e. by choosing

which implies $\gamma_{P}=2P_{M}^{T}P_{M}$ and $\sigma_{P}=\frac{2}{\sqrt{\beta}}P_{M}$ . For this choice of parameters, the continuous limit of the numerical scheme is the following variant of the stochastic differential equation (3.23):

where $\lambda_{t}^{\rm od}$ is an adapted stochastic process such that $\xi(q_{t})=z(t)$ . We then obtain the following Jarzynski-Crooks relation for discretized overdamped dynamics, without time discretization error.

where $({\mathcal{G}}^{n})_{n\geq 0}$ are independent and identically distributed centered and normalized Gaussian variables, and $(\lambda^{n}_{\rm od})_{n\geq 1}$ are the Lagrange multipliers associated with the constraints $(\xi(q^{n})=z(t_{n}))_{0\leq n\leq N_{T}}$ . In the same way, the backward process (5.38)-(5.39)-(5.40) yields the following Euler scheme:

with $G=\nabla\xi^{T}\nabla\xi$ , and the scheme (5.54) is rewritten as:

Then the Jarzynski-Crooks relation (5.47) holds under the assumptions (5.45) and (5.46) on the initial conditions of the schemes (5.54) and (5.55) respectively.

The proof is a direct consequence of the reformulation of (5.54) into (5.57), and a direct application of Theorem 5.5 with the parameters (5.52).

based on the work (5.56) and the corrector

is exact (there is no time discretization error). This free energy estimator can be seen as a variant of the estimator proposed in , which was derived directly for the scheme (5.54), and reads (up to an unimportant additive constant):

and the modified corrector is defined without the kinetic energy term:

There is a bias due to the time discretization error in the estimator (5.60), which can be removed upon following the procedure described in Proposition 5.6.

5.2. Consistency analysis of three free energy estimators

In this section, we would like to discuss the consistency of three free energy estimators introduced above: (5.60)-(5.61) (based on the direct discretization of the overdamped dynamics proposed in ), (5.58)-(5.42) (which uses the Lagrange multipliers to approximate the work) and (5.58)-(5.56) (based on the discrete Jarzynski equality).

The limiting continuous-in-time version of the Jarzynski relation is:

where the work for the overdamped dynamics (5.53) reads (see ):

The consistency of (5.60)-(5.61) with (5.63)-(5.64) was already proven in .

Concerning the consistency of $C^{n}$ with $\widetilde{C}$ (see (5.58) and (5.60)), note that in the overdamped scaling ( $M=\frac{\Delta t}{2}{\rm Id}$ ), the difference

vanishes when $\Delta t\to 0$ . This difference can therefore be neglected when analyzing the consistency of the scheme in the continuous-in-time limit. We henceforth concentrate on the consistency of the works (5.42) and (5.56) with (5.64).

In the sequel, we denote the anticipating stochastic integration of the integrand $Y_{t}$ with respect to $dX_{t}$ by

where $\,\circ\,$ is the Stratonovitch symmetric integration, and $\,.\,$ the Itô integration. The symbol $\leadsto$ denotes the formal time continuous limit.

Let us justify the consistency of the work expression (5.42) with (5.64). Remark that the Lagrange multipliers in (5.57) verify:

so that $\lambda^{n+1/2}$ and $\lambda^{n+3/4}$ yield the same time continuous limit, that is to say

the same holding true for $2\lambda^{n+3/4}$ . The work expression (5.42) is thus formally consistent with (5.64). This concludes the proof of the consistency of (5.58)-(5.42) with (5.63)-(5.64).

We now prove the consistency of the work expression (5.56) with (5.64). Define

The expression (5.56) yields using (5.57):

First, since $V(q^{n+1})-V(q^{n})\leadsto\nabla V(q_{t})\,\circ dq_{t}$ and $\frac{1}{2}(\nabla V(q^{n})+\nabla V(q^{n+1}))\cdot(q^{n+1}-q^{n})\leadsto-\nabla V(q_{t})\,\circ dq_{t}$ , the limit of the terms in (5.72) is zero. Second, using the expressions (5.65) and (5.67), and similarly to (5.68) and (5.70), it holds

since $(\nabla\xi G^{-1}\nabla\xi(q^{n})+\nabla\xi G^{-1}\nabla\xi(q^{n+1}))^{T}\left(q^{n+1}-q^{n}\right)\leadsto 2\nabla\xi G^{-1}(q_{t})z^{\prime}(t)\,dt$ by (5.69). Expanding in higher order powers of $\Delta t$ , it can be checked that there exists two functions $a$ and $b$ such that

Therefore, since (in the limit $\Delta t\to 0$ ) the martingale part of $f^{n}+f^{n+1}$ arises only from the term $\nabla\xi G^{-1}\nabla\xi^{T}(q_{t}).dq_{t}$ , one obtains

since $\nabla\xi^{T}(q_{t})P(q_{t})=0$ . In conclusion, the second term in (5.73) has a zero contribution to the continuous-in-time limit.

As a consequence, the formal time continuous limit of $\mathcal{W}^{n+1}-\mathcal{W}^{n}$ is the same as the one of $I^{n}$ . Computations similar to the one performed above yield

we get in the end that the formal time continuous limit of $I^{n}$ and $\mathcal{W}^{n+1}-\mathcal{W}^{n}$ is the same as:

where we have used (5.71). This concludes the proof of the consistency of (5.58)-(5.56) with (5.63)-(5.64).

6. Numerical illustration

We present some free energy profiles obtained with nonequilibrium switching dynamics for the model system and the parameters described in Section 4.4. The switching schedule reads

with $z_{\rm min}=-0.1$ and $z_{\rm max}=1.1$ . The time-step is $\Delta t=0.01$ . The initial conditions are obtained by first subsampling a constrained dynamics with $\xi(q)=z_{\rm min}$ and $v_{\xi}(q,p)=0$ , with a time spacing $T_{\rm sample}=1$ ; and then adding the required component $\nabla\xi(q)G_{M}^{-1}\dot{z}(0)$ to the momentum variable (with $\dot{z}(0)=(z_{\rm max}-z_{\rm min})/T$ ).

where $\alpha={\rm e}^{-\gamma\Delta t}$ , and setting $p^{n+1/4}=\widetilde{p}^{n+1/4}+\lambda^{n+1/4}\nabla\xi(q^{n})$ with $\lambda^{n+1/4}$ chosen such that

Figure 3 presents estimates obtained with $M$ independent realizations of the switching dynamics for different switching times $T$ , using the estimator presented in Section 5.4.4 with the work discretization (5.42). In all cases, the product $MT$ is kept constant. The free energy profile becomes closer to the reference curve as $T$ is increased, and the profile obtained for $T=100$ is in excellent agreement with the result obtained with thermodynamic integration. When the switching time is small, more realizations should be considered to reduce the statistical errors and obtain estimates in better agreement with the reference profile. The fact that the variance is very large when the switching time $T$ is too small is a well-known drawback of estimators based on the Jarzynski-Crooks identity, see the review in Sections 4.1.4 and 4.1.5 in . Roughly speaking, the difficulty is related to the fact that the free energy difference is obtained as an average of $\exp(-\beta{\mathcal{W}})$ , which requires a very good sampling of the small values of the work ${\mathcal{W}}$ . As $T$ decreases, the width of the work distribution increases and the low tail part is more and more difficult to sample. Improved estimates can be obtained with estimators based on combinations of forward and backward trajectories, see for instance and Section 4.2 in .

Appendix: Some technical results

We give in this appendix two technical lemmas, used in the proof of Proposition 4.3. The first lemma can also be used to prove the divergence formula (2.25).

For any $a\in\left\{1,\dots,2m\right\}$ :

The proof relies on the following computation rules for any family of invertible square matrices $\theta\mapsto A_{\theta}$ :

Fix $a\in\{1,\dots,2m\}$ . First, using (6.3) with $A_{\theta}$ replaced by $\Gamma$ and $\frac{d}{d\theta}$ replaced by $\left\{\cdot,\Xi_{c}\right\}$ , we obtain

so that by the skew-symmetry of $\Gamma^{-1}$ and $\Gamma$ ,

Jacobi’s identity for Poisson brackets and (6.2) then yield

since $\Gamma_{a,b}(\Gamma^{-1})_{b,c}=\delta_{a,c}$ where $\delta_{i,j}$ is the Kronecker symbol. Finally, the left hand and right hand sides of the last equality can be factorized as

Since $\left|\det\Gamma\right|>0$ and $\Gamma$ is invertible, it follows

for all $b=1,\dots,2m$ , which is (6.1). ∎

where the phase space $\Sigma_{\Xi}(\zeta)$ is defined in (2.7), and the Gram matrix $\Gamma$ in (2.9).

where in the last line the following chain rule has been used:

Now an integration by parts with respect to $dq\,dp$ , together with (6.1), leads to