Accurate sampling using Langevin dynamics

Giovanni Bussi, Michele Parrinello

I Introduction

Langevin dynamics was first introduced in molecular simulations to calculate the properties of mesoscopic systems Turq et al. (1977). Here a dissipative force and a noise were added to the Hamilton equations to model a bath of lighter particles. The formal justification for this model can be obtained using the projection operator techniques Zwanzig (1961); Öttinger (1998). However, it was soon realized that Langevin dynamics can also be used as a thermostat Schneider and Stoll (1978), adding the dissipative forces and the noise to the Hamiltonian dynamics to allow a molecular dynamics simulation to explore an ensemble at a fixed temperature. Furthermore, it has been used to sample arbitrary distribution, for instance in the case of numerical quantum-chromodynamics Batrouni et al. (1985).

Several algorithms have been proposed for the numerical integration of the Langevin equation, see among others Refs. Helfand (1979); Greenside and Helfand (1981); van Gunsteren and Berendsen (1982); Brünger et al. (1984); Allen and Tildesley (1987); Hershkovitz (1998); Paterlini and Ferguson (1998); Forbert and Chin (2001); Skeel and Izaguirre (2002); Ricci and Ciccotti (2003); Scemama et al. (2006); Vanden-Eijnden and Ciccotti (2006); Mannella (2004). Most of them were derived with the aim of producing accurate trajectories, i.e. dynamical properties, up to a given order. Because of that, they usually break down when a high friction is applied, essentially when the velocities are varying too fast with respect to the chosen time step. Moreover, their design is not focused on the correctness of the ensemble generated. A notable exception is given by the schemes derived in Ref. Mannella (2004), where the free parameters of the algorithm are chosen so as to minimize the sampling errors. However, none of the algorithms so far proposed offer any way of checking the accuracy of the sampling during a numerical simulation. This is at variance with the numerical integration of Hamilton’s equations, where the conservation of the total energy has been traditionally used to this end Allen and Tildesley (1987); Frenkel and Smit (2002). The standard approach in molecular dynamics is thus to choose the time step by monitoring the energy conservation in a few microcanonical runs, then to adopt the same time step for the Langevin dynamics. To the best of our knowledge, only in a recent paper Scemama et al. (2006) Scemama et al. have shown how to correct exactly the discretization errors in the Langevin dynamics in the context of variational Monte Carlo, using a Metropolis procedure. However, the poor scaling of these accept-reject algorithms with respect to the number of degrees of freedom prevents their application to global moves in very large systems Duane et al. (1987); Frenkel and Smit (2002).

In a recent paper Bussi et al. (2007) we introduced a constant-temperature molecular-dynamics method. In that context, we discussed the notion of effective energy, as a measure of sampling accuracy. In Ref. Bussi et al. (2007) only one variable, the total kinetic energy, was subject to stochastic fluctuations and the response of the thermostat could be modeled so as to have a minimal effect on the dynamics. Here we apply some of the ideas developed in Ref. Bussi et al. (2007) to Langevin dynamics, where all degrees of freedom can be separately controlled and the time scale over which the thermostat reacts is defined by the friction coefficient. When used as thermostat, Langevin dynamics can be more efficient in difficult cases, but it is more disruptive of the dynamics. In an extension of Ref. Bussi et al. (2007), we integrate Langevin using a simple algorithm derived from a Trotter decomposition. The effective-energy drift allows the sampling error to be controlled during a simulation, and can be used in a rigorous way to perform reweighting or accept-reject algorithms, in a scheme that turns out to be similar to that discussed by Scemama et al. Scemama et al. (2006). The advantage of our formulation is that, for large systems, the effective energy can be simply checked against long-term drifts, in the same way as the total energy has traditionally been used to check the accuracy of microcanonical molecular dynamics. We also show the properties of the effective energy in model harmonic oscillators and in a realistic Lennard-Jones crystal.

II Theory

We consider a particle with mass $m$ subject to a potential energy $U(q)$ . The generalization to multiple degrees of freedom is straightforward. The probability density for the canonical ensemble at an inverse temperature $\beta$ is

The canonical ensemble can be sampled through the Langevin dynamics

where $f(q)=-\frac{\partial U}{\partial q}$ is the deterministic force, $\gamma$ is the friction coefficient, and $dW(t)$ is a Wiener noise in the Itoh convention Gardiner (2003), normalized as $\langle dW(t)dW(t^{\prime})\rangle=\delta(t-t^{\prime})$ . A description equivalent to the stochastic Eq. (2) can be formulated in terms of the probability density, which evolves according to the Fokker-Planck equation Gardiner (2003); Risken (1989)

The formal solution of Eq. (3) at a finite time step $\Delta t$ is

which however cannot be evaluated explicitly. Notice that for Hamiltonian dynamics, $\gamma=0$ , the operator $\hat{L}$ is anti-Hermitian and the propagator $e^{-\Delta t\hat{L}}$ is unitary. These properties hold only for a deterministic area-preserving dynamics. They do not hold in a Langevin process.

II.2 A simple integrator

As was first recognized by Tuckerman et al. Tuckerman et al. (1992) and, independently, by Sexton and Weingarten Sexton and Weingarten (1992), the Trotter formula Trotter (1959) allows an approximated propagator to be constructed as

where $M$ is the number of stages in the integrator and $\sum_{j}\hat{L}_{j}=\hat{L}$ . Since in general the $\hat{L}_{j}$ ’s do not commute among themselves, the order in which the stages are applied is relevant, and the splitting in Eq. (6) introduces some error into the propagation. The key point here is that the stages $e^{-\frac{\Delta t}{2}\hat{L}_{j}}$ are chosen so that they can be integrated analytically, and the Trotter splitting is the only source of errors.

It is natural to write $\hat{L}$ as a sum of three parts:

Several choices are now available for the Trotter splitting. We notice that the operators $e^{-\frac{\Delta t}{2}\hat{L}_{\gamma}}$ and $e^{-\Delta t\hat{L}_{pq}}$ leave the stationary distribution in Eq. (1) unchanged:

This is due to the fact that the canonical distribution is stationary not only with respect to $\hat{L}$ but also with respect to $\hat{L}_{pq}=\hat{L}_{p}+\hat{L}_{q}$ , which corresponds to Hamilton propagation, and with respect to $\hat{L}_{\gamma}$ , which introduces the combined effect of friction and noise. Thus, even if the commutator $[\hat{L}_{pq},\hat{L}_{\gamma}]\neq 0$ , the following splitting does not introduce sampling errors,

since it can be interpreted as a sequence of moves each of which has the correct limiting distribution. The $e^{-\frac{\Delta t}{2}\hat{L}_{\gamma}}$ move provides ergodicity in the momenta subspace only, while the $e^{-\Delta t\hat{L}_{pq}}$ move mixes the momenta and positions subspaces. An integrator designed to apply the propagator in Eq. (10) would provide an approximate trajectory and an exact sampling, independently of $\Delta t$ and $\gamma$ . The propagator $e^{-\frac{\Delta t}{2}\hat{L}_{\gamma}}$ can be integrated analytically. Unfortunately, the propagator $e^{-\Delta t\hat{L}_{pq}}$ cannot be integrated exactly and has to be split further. We opt here for the simplest choice, which is the same used to obtain the velocity Verlet algorithm:

In specific cases, different decompositions of $\hat{L}_{pq}$ could be adopted. For example, if the forces can be separated into contributions varying on different time scales, a multiple-time-step decomposition is expected to be more efficient Tuckerman et al. (1992).

Other possible choices for the Trotter splitting which are substantially equivalent to Eq. (11) can be obtained, based on the three operators $\hat{L}_{q}$ , $\hat{L}_{p}$ and $\hat{L}_{\gamma}$ . It is worthwhile to notice that in principle there is no need to split $\hat{L}_{p}$ and $\hat{L}_{\gamma}$ , since $\hat{L}_{p\gamma}=\hat{L}_{p}+\hat{L}_{\gamma}$ can be also evolved analytically. Ricci and Ciccotti Ricci and Ciccotti (2003) derived two integrators using splittings that, in our notation, would read $e^{-\Delta t\hat{L}}\approx e^{-\frac{\Delta t}{2}\hat{L}_{p\gamma}}e^{-\Delta t\hat{L}_{q}}e^{-\frac{\Delta t}{2}\hat{L}_{p\gamma}}$ and $e^{-\Delta t\hat{L}}\approx e^{-\frac{\Delta t}{2}\hat{L}_{q}}e^{-\Delta t\hat{L}_{p\gamma}}e^{-\frac{\Delta t}{2}\hat{L}_{q}}$ . These decompositions involve a single splitting and thus appear more accurate than Eq. (11). However, when $\gamma\Delta t$ is negligible, they do not offer any advantage, and when $\gamma\Delta t$ is not negligible, they do not sample the proper ensemble. This can be easily verified taking the limit $\gamma\Delta t\rightarrow\infty$ . On the other hand, in our scheme the only ensemble violations arise from the fact that for a finite $\Delta t$ the evolution of $\hat{L}_{pq}$ is approximated. These violations are independent of the choice of the friction. Even the infinite friction limit can be taken safely, as shown in Appendix A. Thus, when the sampling quality is an issue, our scheme offers significant advantages.

The splitting in Eq. (11) leads to an explicit integration scheme. In the derivation we use the analytical propagation formula for $\hat{L}_{\gamma}$ which can be found in Ref. Risken (1989). After some manipulation, the integrator is written as:

where $R$ and $R^{\prime}$ are two independent Gaussian numbers and the coefficients $c_{1}$ and $c_{2}$ are

Equation (13b) fixes the weight of the rescaling factor $c_{1}$ and of the amplitude of the Gaussian number $c_{2}$ in such a way that $c_{1}p+c_{2}R$ will be distributed in the same way as $p$ . Thus, Eq. (13b) alone guarantees the correctness of the sampling. On the other hand, Eq. (13a) gives the relation between the friction $\gamma$ and the rescaling factor $c_{1}$ .

In Equation (12), the combination of the two inner stages is a velocity Verlet step, and corresponds to the approximate propagation of $e^{-\Delta t\hat{L}_{pq}}$ . The first and last stages represent the action of the thermostat, i.e. the exact propagation of $e^{-\frac{\Delta}{2}t\hat{L}_{\gamma}}$ . We denote as $p(t^{+})$ and $p(t^{-})$ the momenta immediately after and immediately before the action of the thermostat. We also observe that the first and last stages can be merged as $p(t^{+}+\Delta t)=c_{1}^{2}p(t^{-}+\Delta t)+c_{2}\sqrt{c_{1}^{2}+1}R(t+\Delta t)$ so that one Gaussian random number per degree of freedom is required at each step. This allows the simulation to speed up when the calculation of the deterministic forces is particularly cheap and the generation of the Gaussian random numbers becomes computationally relevant. If one is interested in the values of the momenta at time $t$ , i.e. synchronized with the positions, they can be reconstructed afterwards.

II.3 Control of sampling errors

Our goal is to generate a sequence of points $x_{i}=(p_{i},q_{i})$ in the phase-space, so that a time average can be used in place of the ensemble average Allen and Tildesley (1987). Usually, in molecular dynamics simulations this sampling is approximate, due to the finite-time-step errors. On the other hand, in a Monte Carlo simulation the moves are accepted or refused in such a way that the exact distribution is enforced. Here, we interpret a stochastic molecular dynamics as a highly efficient Monte Carlo where all the moves are accepted. We define $M(x_{i+1}\leftarrow x_{i})~{}dx_{i+1}$ the distribution probability of the point $x_{i+1}$ to be chosen as the next point, given that the present point is $x_{i}$ . We also define the conjugate point $x^{*}=(-p,q)$ , which is obtained by inverting the momentum, and satisfies $\bar{P}(x)=\bar{P}(x^{*})$ . If Equation (2) was integrated exactly, then the detailed balance Gardiner (2003) would be satisfied, i.e., $M(x_{i+1}\leftarrow x_{i})\bar{P}(x_{i})=M(x_{i}^{*}\leftarrow x_{i+1}^{*})\bar{P}(x_{i+1}^{*})$ . However, this is not true when a finite time step is used. Thus, we introduce a weight $w_{i}$ associated to the point $x_{i}$ , which evolves as

We now proceed into an explicit derivation of the terms needed.

We recall that the random numbers $R$ and $R^{\prime}$ are drawn from a Gaussian distribution, i.e.

We notice that given the starting point $x_{i}=(p_{i},q_{i})$ and the ending point $x_{i+1}=(p_{i+1},q_{i+1})$ the value of $R$ and $R^{\prime}$ can be determined solving Eqs. (12) with respect to $R$ and $R^{\prime}$ :

where we have identified the sequence index $i$ with the time $t$ and the sequence index $i+1$ with the time $t+\Delta t$ . Now, changing the variables from $(R,R^{\prime})$ to $(q_{i+1},p_{i+1})$ one obtains the following expression for the transition probability:

At this stage we know the probability for the forward move $M((q_{i+1},p_{i+1})\leftarrow(q_{i},p_{i}))$ . With a similar procedure we can find the probability for the backward move, $M((q_{i},-p_{i})\leftarrow(q_{i+1},-p_{i+1}))$ , and, with some further manipulation, the contribution of the phase-space compression to the effective energy:

For this derivation it is crucial that the change of variables be well defined. Since we have two noise terms $(R,R^{\prime})$ and two variables $(q_{i+1},p_{i+1})$ , we have to require the Jacobian of the transformation to be different from zero. For integrators using only one noise term, it is not obvious that, given the forward trajectory, the backward trajectory is possible. If the backward trajectory is possible, then the effective-energy drift depends on the ratio between the forward and backward probabilities and gives a quantitative measure of the violation of detailed balance. If the backward trajectory is not possible, then the integrator cannot satisfy detailed balance. As an example, the second integrator introduced by Ricci and Ciccotti Ricci and Ciccotti (2003) cannot satisfy detailed balance, as was already pointed out by Scemama et al. Scemama et al. (2006). On the other hand, the modification described in Ref. Scemama et al. (2006) can satisfy detailed balance. It is interesting that in Ref. Scemama et al. (2006) the authors are using the usual formulation of detailed balance, which leads to the need for an explicit inversion of the sign of the velocities. We use a more general formulation of detailed balance Gardiner (2003) in which velocities are considered as odd variables and their inversion after an accepted step is not required. One could also object that the condition of detailed balance is not strictly necessary Manousiouthakis and Deem (1999). However, it appears to us that detailed balance is the only way to enforce or check a distribution in a local manner, i.e., using only information about the present point $x_{i}$ , the next point $x_{i+1}$ , and their conjugated points $x_{i}^{*}$ and $x_{i+1}^{*}$ .

Equations (15) and (19) can be combined, giving a final expression for the effective-energy increment as

From Equation (20) it is easy to see that when $\Delta t$ is small enough the effective energy is approximately constant, since the first and second terms tend to compensate each other and the third term vanishes on the order of $\Delta t^{2}$ . We also notice that the third term in Eq. (20) is an exact differential. Thus it contributes to the fluctuations of the effective energy but not to its drift.

III Examples

It is instructive to study the properties of the integrator in Eq. (12) when it is applied to a harmonic oscillator. We consider an energy profile

where $C$ is an arbitrary constant. The effective distribution that will be sampled by the Langevin dynamics can be obtained analytically and is

This solution can be normalized only if $\Delta t<2/\omega$ . For longer time steps, the dynamics is unstable. Although the logarithm of the distribution in Eq. (23) has an expression similar to that of the so-called shadow Hamiltonian for the harmonic oscillator Toxvaerd (1994); Gans and Shalloway (2000), our derivation of Eq. (23) is based on the stationary distribution only and does not provide any information on the effective trajectory.

The last approximation holds when $\Delta t$ is much smaller than the period of the fastest mode. In this case, it is interesting to note that to have rigorously the same accuracy, the time step has to be chosen proportional to $N^{-1/4}$ .

III.2 Lennard-Jones crystal

In the harmonic oscillator the effective energy reduces to a state function and does not exhibit drifts. In this sense, the harmonic oscillator cannot be considered as a prototype of a real molecular system. In this subsection we discuss the application of Langevin sampling and of effective-energy monitoring in the context of atomistic simulations. We use as a realistic test-case a Lennard-Jones solid, close to the melting point. We express all the quantities in reduced units Allen and Tildesley (1987). We simulate a cubic box with side 19.06 containing 6912 particles arranged according to an fcc lattice, which corresponds to a density $\rho$ =0.998. We set the temperature to $T$ =0.667. We calculate the forces using a distance cutoff of 3. We compare simulations performed using different values for the time step $\Delta t$ and the friction $\gamma$ . All the simulations were performed using a modified version of the DL POLY code Smith et al. ; Smith and Forester (1996).

To stress the fact that the effective-energy slope is a correct indicator of the integration errors, we show the same data in Fig. 3. There, we plot the value of the observable quantity as a function of the slope in the effective-energy drift. The two quantities are highly correlated, indicating that the effective-energy slope gives a realistic estimate of the errors due to the finite time step.

Up to now we have discussed the sampling accuracy, which measures the systematic errors due to the finite-time-step integration. In practical applications also the sampling efficiency, which measures the statistical error due to the finite length of the simulation, is relevant. The sampling efficiency depends on which specific observable one wishes to calculate. In particular, to optimize the efficiency, the autocorrelation time of the quantity of interest has to be as short as possible Frenkel and Smit (2002). In Fig. 4 we show the autocorrelation function of the total potential energy and of the instantaneous pressure, for different choices of the friction $\gamma$ , using a time step $\Delta t=0.0025$ . For both the considered quantities, the optimal choice for $\gamma$ is 20. This rule is far from general, but illustrates clearly the fact that too high a friction can spoil the quality of the sampling since it hinders particle motion.

IV Conclusion

In conclusion, we have studied the properties of a very simple integrator for the Langevin equation, derived employing the Trotter scheme commonly use in the derivation of multiple-time-step integrators. Moreover, we have used the concept of effective energy, introduced in a previous paper, to asses on the fly the accuracy of this integrator in practical cases, ranging from simple one-dimensional oscillators to a Lennard-Jones crystal. Finally, we have shown how to monitor the effective energy in practice. Our formalism can be easily generalized to the description of other stochastic dynamics, such as dissipative-particle dynamics Soddemann et al. (2003).

Appendix A High friction limit

When $\gamma\rightarrow\infty$ the integrator in Eq. (12) can be rewritten in terms of the position only:

Now, defining $D=\frac{\Delta t}{2\beta m}$ this equation becomes

which is exactly the Euler integrator for the overdamped Langevin equation

which is exactly the one used to calculate the acceptance in the smart Monte Carlo technique Rossky et al. (1978).