Discretization errors in molecular dynamics simulations with deterministic and stochastic thermostats
Ruslan L. Davidchack
Introduction
One of the important tasks facing a practitioner of molecular dynamics (MD) starting simulation of a new system is to choose an appropriate integration time step. Since this choice is highly system dependent, even the most sophisticated and well developed MD packages leave this choice to the user. In order to make the MD simulation efficient, the step size should be chosen as large as possible. However, too large a step size results in the instability of the numerical integration of the equations of motion. Strong instability leads to the “blow up” of the simulation, while moderate instability usually manifests itself in a drift of the measured quantities which are expected to be stationary or preserved under the exact dynamics. Even if the numerical integrator is stable and the measured quantities appear stationary, the numerical solution of the equations of motion introduces a discretization error, so that the computed averages depend on the step size. For relatively small step sizes, small systems, and/or short simulation runs the discretization error is masked by the statistical error of the measurement (which stems from the finite duration of the simulation run). However, when larger time steps are used and longer simulations of larger systems are carried out, the discretization error can become much larger than the statistical one.
A particularly clear example of this error can be found in the literature on dissipative particle dynamics (DPD) . Due to the softness of the DPD interactions, the numerical integrators employed there are stable up to relatively large time steps. However, as was recognized relatively recently, the large time steps typically employed in DPD simulations lead to the appearance of various ”artifacts” in the obtained results : the measured kinetic temperature differs from the temperature set by the thermostat, different components of an inhomogeneous system can have different measured kinetic temperatures, kinetic and configurational temperatures are not equal, pressure profiles in spatially inhomogeneous systems are not uniform, etc. All of these artifacts disappear when the step size is sufficiently reduced.
One of the purposes of this article is to point out that these and other observed artifacts are caused solely by the numerical discretization error, and therefore one must not be tempted to give them a physical interpretation. For example, the observed differences in measured temperatures should not be interpreted as a violation of equipartition or as the deviation of the modeled system from equilibrium. Similarly, the nonuniformity of the pressure profile should not be interpreted as evidence for the presence of some hidden internal forces within the system. Instead, a suitable interpretation of these artifacts should be given within the numerical analysis framework.
The terms can be expressed in terms of and its derivatives, provided is sufficiently smooth . If a numerical method is time reversible, then for odd . If the flow is Hamiltonian with Hamiltonian and the numerical method is symplectic, then the modified flow is also Hamiltonian with
where the terms can be expressed in terms of and its derivatives.
Note that, even though the above series do not converge in general, it can be shown that the difference between the exact flow and a modified flow described by a truncated series with a suitably chosen number of terms can be made exponentially small. For symplectic integrators, this implies that the energy of the suitably truncated modified Hamiltonian remains almost constant for exponentially long periods of time. A somewhat weaker conservation of the total energy in conservative systems is also exhibited by non-symplectic methods. (An extensive discussion of these issues can be found, for example, in Refs. .)
Of course, we are not interested in the properties of the shadow system, but rather in those of the original system. Therefore, we would like to determine how the phase space average of a quantity defined for the original system is modified when we sample it with respect to the statistical distribution generated by the modified flow. It can be shown that the result can also be expressed as a power series in :
where and denote the averages along ergodic trajectories of the original and the modified flow, respectively. Traditionally, the time step is chosen small enough to ensure that the difference is smaller than the statistical error in the estimated value of . In order to be able to use larger time steps, it is necessary to estimate this difference and take it into account.
In the study presented in this article we focus on the empirical investigation of discretization errors and their influence on the computed results in a system coupled to different deterministic (Nosé-Hoover and Nosé-Poincaré) and stochastic (Langevin) thermostats. We choose to study the system of water molecules modeled as rigid bodies with the TIP4P parameters . This choice is motivated by the ubiquity of water modelling in the computational chemistry and biochemistry literature, as well as our interest in exploring the interplay of discretization errors for different types of degrees of freedom (i.e., translational and rotational) within the system. We also propose a couple of practical approaches for taking into account or removing the discretization errors from the computed averages. The first approach is based on extrapolating results from simulations with different step sizes. The second approach is based on introducing a weighted coupling of the Nosé-Hoover thermostat to different types of degrees of freedom in order to remove the leading term in the discretization error for a measured quantity of interest.
The rest of the article is organized as follows. In Section 2 we give the details of the simulated model and the expressions for the measured quantities, which include kinetic and configurational temperatures evaluated separately for translational and rotational degrees of freedom, potential energy, pressure, translational and rotational diffusion coefficients, and radial distribution functions. In Section 3 we present and discuss the results of determining discretization errors in the measured quantities for the isolated system (i.e., ensemble), as well as several constant temperature simulations where the system is coupled to a Nosé-Hoover or a Langevin thermostat. In Section 4 we propose and investigate two possible approaches to correcting discretization errors: extrapolation and weighted thermostating. We summarize our results and list the main observations in Section 5. We also provide in the Appendix a detailed description of the numerical algorithms employed in this study.
Computations
The potential energy represents pairwise interaction between interaction sites within molecules:
where is the coordinate of the interaction site within molecule , with being the site coordinate relative to the center of mass of a molecule in the molecule-fixed reference frame. Here
is the rotational matrix expressed in terms of quaternion coordinates. The pairwise interaction potential
where . This degree of potential smoothness is sufficient to ensure that the term in the modified Hamiltonian for a second order numerical integrator is well defined and thus can be used to interpret the leading order term of the discretization error. To study higher order terms, more smoothness might be necessary . We set Å and Å. The screened Coulomb potential is similar to the damped potential introduced by Wolf et al. and provides a better approximation to the full electrostatic interactions compared to simple truncation .
The simulated system contained 1728 molecules in a well-equilibrated liquid state at 300 K and density 989.85 kg/m3. The simulation run for each thermostat and each step size started from the same initial state, which was further equilibrated for 20 000 steps, and then the measurements were collected during the subsequent 200 000 steps.
2 Measured quantities
The temperature of the system is measured in several different ways: translational and rotational kinetic temperatures are given by
respectively, while the total kinetic temperature is given by
The translational and rotational configurational temperatures are measured using expressions
respectively, where is the angular gradient operator for molecule . We also measure potential energy per molecule
where is the volume of the system. Angle brackets are used to represent the average over a simulation run.
2.2 Drift
2.3 Dynamic
between the diffusion coefficient and the slope of the mean square displacement as a function of . The angle brackets denote the average over molecules and over the time origins taken at uncorrelated intervals during the simulation run. The typical behavior of the mean square displacement as a function of time is illustrated in Figure 1(a). For small the function is a parabola, indicating a ballistic regime, while for larger it has a linear dependence on time indicative of the diffusive regime. The diffusion coefficient was determined from the slope of the least-squares straight line fit to this function in the range between 1 and 3 ps.
We also measure velocity and angular velocity time autocorrelation functions. However, quantitative characterization of discretization errors in these quantities is rather problematic since, unless one uses some type of interpolation, the values are available only at integer multiples of the step size. Therefore, we do not report the results of these measurements in this article and just note in passing that, when the diffusion coefficient was calculated by integrating the measured velocity autocorrelation function using the trapezoidal rule, the results were identical within statistical errors with those obtained using Eq. (11).
2.4 Structural
Results and Discussion
In this Section we report and discuss the measurements of discretization errors in the simulation of TIP4P liquid water system coupled to various thermostats. To solve the equations of motion (6) of the isolated system (i.e., in the ensemble), we use a combination of velocity-Verlet integrator for translational and NO_SQUISH integrator for rotational dynamics. The combined integrator is second order, symplectic, and time reversible. All the thermostats considered here are based on augmenting this integrator with deterministic or stochastic terms in the way that preserves the second order and, in the case of a deterministic thermostat, the time reversibility of the integrator. Therefore, we expect that the discretization errors in the measured quantities will scale with as
with measuring the size of the error for step sizes where the term can be neglected. Here for time reversible integrators and otherwise.
For the RDFs, it is expected that, at every value of , the leading order discretization error will also scale linearly with , so that the results of the measurements with different step sizes can be expressed as
The RDFs behave as predicted by Eq. (14), i.e., for each value of the measured values of scale linearly with up to about 5.5 fs step size. In the right column of plots in Figure 2 we show and , as defined in Eq. (14) and determined from the straight line least-squares fit to at each for fs. As can be seen from the plots, most of the discretization error is concentrated around the peaks of the RDFs, resulting in a reduction of the peak heights.
To facilitate a more quantitative comparison between different integrators, we list in Table 1 the values for and , as defined in Eq. (13). It shows that different measurements of the system temperature are all consistent in the limit , while the discretization errors, quantified by , are different. In particular, the errors in rotational temperatures are larger than in translational ones, which is not surprising given the fact that the rotational motion of water molecules is faster than their translational motion.
2 Nosé-Hoover Thermostat
For the simulation of the system in the ensemble using a Nosé-Hoover thermostat, the equations of motion for the momenta, Eqs. (6c) and (6d), are modified as follows:
where is a scalar dynamic variable, whose evolution is described by the differential equation
with initial condition . Even though the modified equations are no longer Hamiltonian, they preserve the ’extended energy’ of the system
Unlike in the case of the V-NSQ integrator, the drift of the extended energy , defined in Eq. (17), is not followed by the drift of other measured quantities. In fact, all of the measured quantities are stationary up to the stability threshold of about 10 fs. Closer inspection shows that the drift in the extended energy only appears in the last term of Eq. (17) due to the drift in . But since the value of this quantity has no influence on any of the other dynamical variables, the stationarity of the rest of the system is not perturbed. Therefore, contrary to the common recommendations about the monitoring of the extended energy to ensure the stability of the Nosé-Hoover dynamics, we see that such monitoring would yield a very conservative limit on the permissible values of the time step .
The size of the discretization errors for the RDFs is similar to that for the V-NSQ integrator, although the sign of the error is reversed, so that the peaks of are larger than those of .
It is interesting to note that, even though the Nosé-Hoover thermostat is designed to control the total kinetic temperature of the system, this quantity also exhibits a discretization error. The reason is that, as implemented in the NH-E integrator (see A.2), the thermostat is coupled to the total kinetic energy at half steps. Analysis of the NH-E integrator reveals the relationship between kinetic energies measured at half steps and full steps. For simplicity, we will write NH-E for a single translational degree of freedom:
Assuming a well equilibrated simulation run, all measured quantities should be stationary and, in particular, . Taking the time average of Eq. (18c) we obtain
that is, the average kinetic temperature measured at half steps is equal to the thermostat temperature . To determine the difference between this kinetic temperature and the one measured at full steps, we combine
and use again the stationarity assumption that \big{\langle}{\big{(}p^{n+\frac{1}{2}}\big{)}^{2}}\big{\rangle}=\big{\langle}{\big{(}p^{n-\frac{1}{2}}\big{)}^{2}}\big{\rangle}, to obtain
2.2 Implicit Nosé-Hoover (NH-I) Integrator
It is not difficult to derive a Nosé-Hoover integrator which controls the temperature using the momentum at full steps. One such integrator can be found in Refs. (see A.3 for details). It is slightly more complicated than NH-E in that it requires solution of a cubic equation for , which can be easily done using Newton’s iterations (hence the word ’implicit’ in the name). The results are shown in Figure 4 and Table 3. Now the total kinetic temperature is maintained equal to the thermostat temperature at all step sizes. At the same time, other temperatures display similar shift in discretization error compared to those for the NH-E integrator. In fact, comparing results in Tables 2 and 3 we see that for all temperatures are shifted by approximately 0.5 K/fs2. The discretization errors in other quantities, such as potential energy, pressure, diffusion, RDFs, etc., are shifted as well, so that these errors are now much larger than for the NH-E integrator. Since it is these quantities, rather than the temperature, that are usually of interest, the ability of the integrator to exactly control the kinetic temperature has, somewhat surprisingly, a detrimental effect on its overall performance.
2.3 Measure-Preserving Nosé-Hoover (NH-MP) Integrator
Another Nosé-Hoover integrator we have tested has the property of preserving the invariant measure in the extended space . Its complexity is similar to that of NH-E, in that it is fully explicit and easy to code, while, similar to NH-I, it controls the total kinetic energy measured at full steps. The results are shown in Figure 5 and Table 4. It turns out that these results are identical, within the statistical error, to those obtained with the NH-I integrator. The only observed difference between NH-I and NH-MP, which is probably due to the measure-preserving property of NH-MP, is that the drift of the extended energy is less pronounced than in the other two integrators and, as can be seen in Figure 5, becomes significant at about 5.5 fs step size. However, just like with NH-E and NH-I, all other measured quantities remain stationary up to the stability threshold, so the improved conservation of the extended energy doesn’t appear to be relevant to the overall performance of the integrator.
3 Nosé-Poincaré (NP) Integrator
4 Separate Nosé-Hoover thermostats
In cases when the system consists of several types of degrees of freedom that evolve on significantly different timescales, the coupling of all degrees of freedom to a single thermostat may lead to inefficient equilibration due to a very slow energy flow (weak coupling) between different types of degrees of freedom. In such cases it is often recommended that different types of degrees of freedom are coupled to separate Nosé-Hoover thermostats, thus providing independent temperature control for each type of degrees of freedom . For example, in the system of water molecules investigated in this work, the translational and rotational degrees of freedom could be coupled to two separate Nosé-Hoover thermostats. The equations of motion for the momenta, Eq. (3.2), would be modified as follows:
where and are scalar dynamic variables evolving according to equations
with initial conditions and . The new dynamics preserves the extended energy in the form
where , , and , .
We have implemented separate Nosé-Hoover thermostats using three different integrators, NH-E, NH-I, and NH-MP, and in all cases observed undesirable side effects of the discretization error on the system evolution. Even for small step sizes, where the extended energy did not exhibit any significant drift, we observed that the thermostat variables and drifted in opposite directions, indicating that the thermal energy was flowing through the system from one thermostat to the other. So, rather than modelling the system in a thermal equilibrium, such simulation appeared to model a steady non-equilibrium process, with a steady flow of energy from one type of degrees of freedom to another. With NH-E, the energy flowed from translational to rotational degrees of freedom, while with NH-I and NH-MP integrators it flowed in the opposite direction.
Another undesirable consequence of using separate Nosé-Hoover thermostats is that, with the NH-E integrator, the total linear momentum of the system, which is set equal to zero (to within machine precision) at the start of the simulation, grows exponentially with time. This indicates that the zero total momentum state, which is conserved by the exact solutions of the Nosé-Hoover equations (3.4)-(3.4), becomes linearly unstable when the equations are solved with the NH-E integrator. The instability is stronger for larger step sizes, but is present at all step sizes. With the other two integrators, we have observed instabilities related to the overall rotational dynamics of the system, but we have not investigated their precise origin.
5 Langevin Thermostat (Lan-A and Lan-B)
Another approach to simulating the canonical ensemble is to couple the system to a stochastic thermostat described by the Langevin equation, which is well known for the translational degrees of freedom and has been recently proposed for the rotational ones . The combined Langevin equations for both translational and rotational dynamics can be written in the form
where and , measured in the units of inverse time, control the coupling of the translational and rotational degrees of freedom to the thermostat, , , and and denote the standard Wiener processes in 3 and 4 dimensions, respectively. Matrices are defined in A.
We have investigated discretization errors and their dependence on and for two different integrators: Langevin A (Lan-A) and Langevin B (Lan-B) . Both integrators are second order (in the weak sense), quasi-symplectic, and exactly preserve the constraint . With , both Lan-A and Lan-B reduce to the V-NSQ integrator. Since the integrators are not time reversible, the dependence of on is expected to be described by Eq. (13) with .
Unlike the Nosé-Hoover and Nosé-Poincaré integrators, where the discretization errors appear to be independent of the thermostat parameter in a wide range of parameters values, the errors for the Langevin integrators depend on and . We have varied the two parameters independently and obtained results for ps-1, and ps-1. (Note that the values ps-1 and ps-1 were determined in Ref. to be close to optimal with respect to the speed of system equilibration for TIP4P liquid water at 270 K.) We will not report here the results of all 15 simulations for each thermostat, but rather report selected results and describe the trends of the dependence of the results on the thermostat parameters.
Contrasting Lan-A and Lan-B we see that Lan-B is better at maintaining the correct values of the kinetic temperatures, but other measured quantities have larger discretization errors than with Lan-A. In this respect, the difference between Lan-A and Lan-B is similar to that between the NH-E and NH-I (or NH-MP) integrators.
Correcting Discretization Errors
Accounting for discretization errors and correcting their influence on the results can be done using backward error analysis, as discussed in the Introduction. The work is underway to implement this approach for various systems and thermostats. Until such tools are available, we present below a couple of practical approaches that might be considered, which allow for correcting or removing the discretization errors from the measured quantities.
To correct for the discretization error in the measured quantities, we can run simulations with several different step sizes and then extrapolate the result to the limit of zero step size. This procedure is known in numerical analysis as Richardson extrapolation . Below we present the analysis of the Richardson extrapolation procedure when two simulations are performed with different step sizes and the extrapolation allows to remove the discretization error term from the results obtained with second order numerical integrators.
As we have seen in this work, if the equations of motion are solved using a second order numerical integrator, then the estimated average obtained from a simulation run with step size has the following step size dependence:
where if the integrator is time reversible and otherwise. If a second simulation run is performed with step size , , then we can also write
The two expressions can be combined to yield a higher order estimator for :
This expression is a particular case of a more general Richardson extrapolation formula, which is often used in numerical analysis in order to accelerate the rate of convergence of numerical methods .
To determine the optimal choice for and for the relative duration of the two runs, we need to minimize the estimated statistical error of in Eq. (25). In a simulation run of steps with step size , the average of along an equilibrium trajectory is given by
These results demonstrate that, in order for Eq. (25) to yield an estimate with statistical error comparable to that of a single step run, it is necessary to be able to run the simulation with the step size that is about four times larger than that of the single run. Such an estimate will be accurate if the value of the term in Eq. (25) remains negligible at this compared to the statistical error. From the results reported above we could see that in the simulations of TIP4P water with the Nosé-Hoover thermostats the higher order terms could be neglected for step sizes up to about 7 fs. Since majority of simulations of the rigid water models are performed with steps sizes of about 0.5 or 1 fs, Richardson extrapolation would allow to obtain in this case higher precision results with the same computational effort or, alternatively, would allow to reduce the computational effort.
Finally, note that steps are carried out with the time step for the time duration of , while only steps with the time step , for the time duration of , are required to find the optimal correction for the order discretization error. It is important that is large enough so that is much larger than the correlation time for any quantity of interest.
2 Weighted Thermostating
The weighted thermostating approach to removing discretization errors is based on the observation that the values of the discretization errors depend on the way the system is coupled to a thermostat. In the case of the Nosé-Hoover thermostat coupled to the system with translational and rotational degrees of freedom this can be exploited by introducing different weights for translational and rotational kinetic energies in Eq. (16) for the thermostat variable:
For this reduces to Eq. (16), while for () the Nosé-Hoover thermostat is coupled only to translational (rotational) degrees of freedom. Since now are expected to be functions of , we hope to find such value of that, for a particular measured quantity of interest , the leading term in the discretization error is zero, .
Even if the system is homogeneous with only one type of degrees of freedom, the weighted thermostating approach can be used by introducing weighted coupling to kinetic and configurational temperatures. The Nosé-Hoover thermostat coupled to the configurational temperature has already been proposed , and it should be straightforward to combine it with the kinetic temperature thermostat.
Summary
We have measured discretization errors in static, dynamic, and structural quantities in the molecular dynamics simulation of a system of TIP4P liquid water coupled to various Nosé-Hoover and Langevin thermostats. Based on the analysis of the obtained results, we list the following main observations:
All measured quantities exhibit dependence on the step size as described by Eq. (13), as expected from the backward error analysis for second order numerical integrators. The linear dependence of discretization errors on extends up to about 70% of the stability threshold of the integrators.
Moderate instability of the integrators manifests itself in a gradual drift of the total (extended) energy of the system, which is conserved by the exact dynamics. While for the V-NSQ and NP integrators this drift is also present in other measured quantities, for the Nosé-Hoover integrators the drift is localized within the thermostat variables and thus is not indicative on the non-stationary evolution within the system itself; all measured quantities of the system coupled to a Nosé-Hoover thermostat remain stationary up to the stability threshold.
To thermostat systems with several different types of degrees of freedom that evolve on significantly different timescales, it is often recommended that separate thermostats are used to control the temperature of different types of degrees of freedom. We have implemented such an approach in our system by coupling the translational and rotational degrees of freedom to two separate Nosé-Hoover thermostats and observed undesirable side effects of the discretization errors on the system evolution. In particular, even when simulating a well equilibrated system, the thermostats induced a steady flow of energy from one type of degrees of freedom to the other. Therefore, we recommend that the coupling of the system to several thermostats should be avoided.
We have proposed two possible approaches for taking the discretization errors into account in order to obtain accurate measurements in simulations with large time steps. The first approach is based on running the simulation with two different step sizes and then using Richardson extrapolation to eliminate the leading term in the discretization error. Our analysis shows that, given the total computational budget of steps, the most precise measurements can be obtained when the first simulation is performed with step size for steps and the second simulation is performed with step size for steps. Our study suggests that can be as large as 70% of the stability threshold of the integrator, which for the system investigated here is about 7 fs.
The second approach is based on a weighted coupling of the Nosé-Hoover thermostat to different types of degrees of freedom. For each quantity of interest, , it is possible to find the value of the weight parameter, , such that the leading discretization error term is zero, . The obvious drawback of this approach is that, in the simulation with a chosen weight parameter, the discretization error can be removed for only one quantity, unless the optimal weights for several quantities of interest coincide.
Finally, we would like to point out that larger step sizes should be used with caution when simulating systems close to phase or other transition regions, since it is possible that the discretization errors shift the system state across the thermodynamic phase boundary into a phase which is different from the one modeled by the exact dynamics. In this case the simple linear dependence of the errors on may break down at smaller step sizes than those demonstrated in the present study.
Acknowledgments
I would like to thank Brian Laird, Stephen Bond, Ben Leimkuhler, Mark Tuckerman, and Glen Martyna for fruitful discussions and helpful remarks. The computations have been carried out on the University of Leicester HPC cluster purchased through the HEFCE Science Research Investment Fund.
Appendix A Numerical Integrator Schemes
The rotational equations of motion are integrated using the symplectic method NO_SQUISH proposed by Miller et al.. The method is based on rewriting the rotational kinetic energy of a molecule in the form
where , , , and the constant matrices are defined as follows:
This allows to introduce a second order integrator for free rotations :
where is defined as follows:
This integrator is symplectic, time reversible, and exactly preserves the constraint .
The following numerical integrators were investigated:
This integrator combines velocity-Verlet algorithm for translational and NO_SQUISH algorithm for rotational dynamics. The combined integrator is second order, symplectic, and time reversible.
A.2 Explicit Nosé-Hoover (NH-E) integrator
This integrator is based on the one described in Refs. , which has been extended to include thermostating of rotational degrees of freedom.
A.3 Implicit Nosé-Hoover (NH-I) integrator
This is a straightforward extension of the integrator presented in Ref. , which includes rotational dynamics.
Note that the equations for , , and are coupled and must be solved together. We solve these equations by substituting the expressions for and into the equation for :
and then applying the Newton-Raphson iteration
For the system modeled in this study, three iterations are sufficient to achieve double-precision accuracy even for the largest attempted step size of fs.
A.4 Measure-Preserving Nosé-Hoover (NH-MP) integrator
This is a straightforward extension of the integrator proposed in Ref. .
A.5 Nosé-Poincaré (NP) integrator
This is a straightforward extension of the integrator proposed in Ref. .
This symplectic integrator is based on the Hamiltonian
This is an explicit method since the equation for is a quadratic equation which can be written as follows