Using nonequilibrium fluctuation theorems to understand and correct errors in equilibrium and nonequilibrium discrete Langevin dynamics simulations

David A. Sivak, John D. Chodera, Gavin E. Crooks

cond-mat.stat-mech physics.chem-ph physics.comp-ph

I Introduction

In the computational natural sciences, dynamic properties of a stochastic system are often calculated using simple numerical integrators for Langevin dynamics Langevin (1908),

where the system is driven from equilibrium by a time-dependent Hamiltonian $\mathcal{H}(t)$ . In the simplest case of a single stochastic particle, $r$ and $v$ are time-dependent position and velocity, $m$ is mass, $f$ is force, $\beta=1/k_{\text{B}}T$ , $k_{\text{B}}$ is Boltzmann’s constant, $T$ is the temperature of the environment, $\gamma$ is a friction coefficient (with dimensions of inverse time), and $\,\mathsf{W}(t)$ is a standard Wiener process. The force is determined by the derivative of the potential energy, $f\equiv-\,\partial\mathcal{H}/\partial r$ . For multidimensional, multiparticle systems, $r$ , $v$ , $f$ , and $d\,\mathsf{W}$ are vectors, and $m$ is a diagonal matrix.

In order to simulate Langevin dynamics on a digital computer, it is necessary to adopt some approximate algorithm that divides time into discrete steps Frenkel and Smit (2002). However, most such schemes have an inherent problem: Even with a time-independent Hamiltonian, they do not preserve the canonical equilibrium distribution determined by $\mathcal{H}$ nor do they satisfy microscopic reversibility. (By reversibility we mean that the probability of sampling a particular trajectory starting from equilibrium is equal to the probability of sampling the trajectory’s time reversal, reversing velocities if necessary.) We show that these pathologies arise because discrete time step integrators of Langevin dynamics can be viewed as simulations of artificial driven nonequilibrium dynamics. This perspective has the advantage that the complications generated by this unwanted but inevitable breaking of time-reversal symmetry can be understood and remedied in a controlled and systematic fashion with insights from nonequilibrium statistical thermodynamics Athènes (2004); Adjanor and Athènes (2005); Adjanor et al. (2006); Lechner et al. (2006).

We can appreciate some of the problems inherent in finite time step Langevin dynamics by first considering the zero friction limit, $\gamma=0$ , with a time-independent Hamiltonian, where Langevin dynamics reduces to deterministic Newtonian dynamics. A simple, popular integrator for Newtonian dynamics is the velocity Verlet algorithm Swope et al. (1982); Tuckerman et al. (1992),

Because of the finite time step, the trajectories generated by this algorithm are inaccurate: They do not faithfully follow the precepts of Newtonian mechanics. Also, the actual energy of the system is not conserved, but rather it fluctuates from one time step to the next. However, the velocity Verlet integration scheme is symplectic (in that the Jacobian of the transformation from old to new positions and velocities is unity, and therefore the phase-space volume is conserved Sanz-Serna (1992)), which ameliorates some problems due to the finite time step. For example, although a finite time step symplectic integrator does not conserve the energy of the system Hamiltonian, it does conserve the energy of a shadow Hamiltonian, which is close to the desired Hamiltonian if the time step is not too large Yoshida (1990); Frenkel and Smit (2002). For sufficiently small timesteps, this conservation of the shadow Hamiltonian prevents long-term drift in the system Hamiltonian over the duration of the simulation.

Essentially, a finite time step dynamics performs work on the system, over-and-above any work due to intentional perturbations from a time-dependent Hamiltonian Lechner et al. (2006). We can imagine this finite time step integration scheme in the following way. At the beginning of each time step, we first perturb the system Hamiltonian such that it becomes the shadow Hamiltonian, changing the energy of the system. The symplectic integrator then updates the position and velocity [Eq. (2)], perfectly preserving the shadow energy of the shadow Hamiltonian. We then switch the Hamiltonian back to the original one, again perturbing the energy. The net change in the energy of the system during this time step is due to work performed on the system by perturbing back and forth between the system and shadow Hamiltonian. We can determine this shadow work (also known as error work Lechner et al. (2006) or an effective energy change Bussi and Parrinello (2007)) during each time step by measuring the difference in energy using the system Hamiltonian, so we do not need to know the form of the shadow Hamiltonian. This shadow work is distinct from any protocol work applied to the system due to explicit, time-dependent perturbations of the system Hamiltonian. Note that Markov-chain Monte Carlo (MCMC) simulations do not generate shadow work Lelièvre et al. (2010) because the dynamics satisfies detailed balance explicitly, which ensures that the trajectories are microscopically reversible Tolman (1938) and that the appropriate equilibrium ensemble is preserved for a time-independent Hamiltonian Frenkel and Smit (2002).

Discretizations of continuous-time Langevin dynamics are essentially a combination of deterministic and stochastic dynamics, and, as a result, they suffer from a combination of problems. With a finite time step, the deterministic parts of the dynamics tend to pump energy into the system in the form of shadow work, driving the system away from equilibrium, whereas the stochastic parts of the dynamics relax the velocities back toward the equilibrium Maxwell-Boltzmann distribution, removing energy from the system in the form of heat. It follows that, even for a system with a Hamiltonian that is explicitly time-independent, a finite-time-step Langevin dynamics has an effective Hamiltonian alternating between the system Hamiltonian and the shadow Hamiltonian, and thus actually simulates a driven, nonequilibrium system, with a net energy flow. Microscopic time-reversal symmetry is broken, and in general we can not determine the steady-state, nonequilibrium distribution. These difficulties may be circumvented by reducing the time step, but at the cost of increasing the computational effort required to simulate the same interval of time; this hardly constitutes a satisfactory resolution.

The main point of this paper is this interpretation of the errors induced by discrete simulation of Langevin dynamics, in terms of a driven thermodynamic process. This perspective forms a bridge between the study of numerical integrators and the rapidly expanding field of nonequilibrium statistical mechanics, permitting the invocation of a wide array of nonequilibrium work fluctuation relations to characterize and correct for biases in estimates of equilibrium and nonequilibrium thermodynamic quantities.

II Concrete Integrator

We demonstrate the utility of this perspective for an integration scheme that is explicitly time-symmetric, that cleanly separates the stochastic and deterministic parts of the dynamics, and for which the deterministic parts are symplectic and the stochastic parts are detailed balanced. This construction allows a clean separation of the system’s energy change into work, shadow work, and heat, simplifying our analysis in terms of a driven nonequilibrium process. Fortunately, integrators with these properties have received recent attention Adjanor and Athènes (2005); Adjanor et al. (2006); Bussi et al. (2009); Bou-Rabee and Owhadi (2010); Lelievre et al. (2012). As a concrete example, we consider the integrator used by Bussi and Parrinello Bussi and Parrinello (2007), where we make the Hamiltonian update explicit:

Here, $\Delta t$ is the time step by which the simulation clock is advanced, $f(n)$ is the force at position $r(n)$ due to the Hamiltonian $\mathcal{H}(n)$ , $a=\exp({-\gamma\,\Delta t})$ , and $\mathcal{N}^{+}$ and $\mathcal{N}^{-}$ are independent, normally distributed random variables with zero mean and unit variance (hence, when scaled by $(\beta m)^{-1/2}$ , distributed according to the equilibrium Maxwell-Boltzmann velocity distribution). The first and last substeps (3a,3g) are stochastic, Markovian, and detailed-balanced (with respect to the canonical measure) velocity randomizations, which leave the position unchanged. The five middle substeps (3b-3f) constitute the deterministic velocity Verlet integrator (2), with the midpoint Hamiltonian update made explicit. The order of substeps and the effective Hamiltonian switches are illustrated in Fig. 1. Note that the deterministic substeps (3b,3c,3e,3f) are each individually symplectic.

III Nonequilibrium thermodynamics

A central relation of driven, nonequilibrium thermodynamics Bustamante et al. (2005); Cleuren et al. (2007); Jarzynski (2011); Spinney and Ford (2012) relates the microscopic irreversibility of trajectories to the work $W[X,\Lambda]$ performed on the system during the forward protocol Crooks (1998, 1999a, 1999b):

It is straightforward to extend this fluctuation theorem to mixed stochastic-deterministic dynamics, such as the Langevin integrator, Eq. (3), provided that the individual substeps satisfy this symmetry. It is for this reason that we insist on a clean separation of the deterministic and stochastic substeps.

The total work $W=\sum_{n}W^{(n)}$ is the sum of the contributions $W^{(n)}$ from individual steps. The total change in energy $\Delta E$ during the step $n\rightarrow n+1$ can be cleanly separated into heat $Q$ , protocol work $W_{\rm prot}$ , and shadow work $W_{\rm shad}$ :

Here, $\Delta E_{a\text{-}g}$ are the energy changes during the corresponding substeps of Eq. (3). Heat is the energy exchanged with the thermal environment, protocol work is the energy change due to deliberate manipulation of the Hamiltonian (i.e., the explicit time-dependence of the system Hamiltonian), and shadow work is the energy change due to alternation between the system and shadow Hamiltonians, resulting from the finite time step of the symplectic part of the integrator. The essential distinction between heat and work is that heat flow is change of the system energy due to change in the current distribution over microstates, whereas work is change of energy due to change in the equilibrium distribution over microstates.

The stochastic velocity randomization substeps obey Eq. (4) since they are balanced, in that they preserve the canonical equilibrium distribution Crooks (1998). The set of deterministic velocity Verlet substeps also obeys Eq. (4), so long as the total work includes the shadow work Athènes (2004); Lechner et al. (2006), since the dynamics is symplectic and microscopically reversible with respect to the shadow Hamiltonian Yoshida (1990); Frenkel and Smit (2002). Since both the deterministic and stochastic substeps are Markovian, it follows that we can safely intermix the two dynamics, and (4) still holds.

It therefore follows that the Langevin integrator obeys various derived relations of nonequilibrium statistical dynamics, such as the Jarzynski equality Jarzynski (1997), fluctuation relations Evans and Searles (1994); Crooks (1998), interrelations between path ensemble averages Crooks (1999a); Hummer and Szabo (2001) and various interrelations between dissipation and time asymmetry Gaspard (2004); Jarzynski (2006); Kawai et al. (2007); Feng and Crooks (2008). Furthermore, by its separation of protocol work and shadow work, the Langevin integrator permits the separation of the respective contributions to microscopic irreversibility of deliberate perturbation (physically meaningful) and the finite time step (a discretization artifact). Notably, the statistics of the protocol work alone systematically deviate from those of the total work, and hence lead to biased inference when using the machinery of nonequilibrium thermodynamics. In Secs. VI and VII we explicitly demonstrate this underappreciated point.

IV “Equilibrium” simulations sample perturbed distributions

It is common practice in the study of the equilibrium properties of molecular systems to use a single finite-time-step mixed stochastic and deterministic dynamical simulation to sample from an equilibrium distribution. However, this distribution departs from the true equilibrium distribution for the system Hamiltonian, a distribution that we can now understand as the steady state due to driving by the finite time step. Thus a question of significant practical interest presents itself: How far from equilibrium is the effective nonequilibrium steady state induced by this time discretization for a system with a time-independent Hamiltonian? Since the explicit system Hamiltonian is unchanging, no protocol work is performed, and thus our analysis in this section focuses on the shadow work alone. Practitioners commonly estimate artifactual errors by monitoring some essentially arbitrary, yet easily measured, observable of the system, such as the total energy. However, we can exploit recent advances in nonequilibrium statistical dynamics to provide a principled characterization of how far the system is driven from equilibrium Sivak and Crooks (2012a).

The natural measure of this instantaneous distance that the system has been driven away from equilibrium is the difference between a nonequilibrium free energy Shaw (1984); Gaveau and Schulman (1997) $F_{\rm neq}\equiv\langle E\rangle-TS$ and the corresponding equilibrium free energy $F_{\rm eq}$ for the given Hamiltonian $\mathcal{H}$ . If the Hamiltonian were held constant and the (previously driven) system were allowed to relax to equilibrium, this deviation from the equilibrium free energy would represent the heat that would be lost to the environment, or equivalently the maximum work that could be imparted to a mechanically coupled system. For the perturbations imposed by the discrete dynamics, this nonequilibrium free-energy deviation is approximated near equilibrium by Sivak and Crooks (2012a)

where $W_{\rm shad}$ is the shadow work over the whole simulation, ${\mathcal{P}}_{\rm ss}$ is the power (work per unit of time) once transients have died off and the system has settled into a nonequilibrium steady state, and $t_{\rm f}-t_{\rm i}$ is the total simulation time. Normalizing this nonequilibrium free-energy deviation by the size of the system (number of degrees of freedom) provides a natural measure of how far from equilibrium each degree of freedom is on average.

To estimate the nonequilibrium steady-state free-energy deviation for a molecular system, we simulate cubic boxes of TIP3P waters of various sizes, both with and without constraints on the water O-H and H-H interatomic distances. (See the Appendix for simulation details.) Initial coordinates and momenta are sampled from equilibrium in an isothermal-isobaric (NPT) ensemble (that is, an ensemble that maintains constant number of particles, constant pressure, and constant temperature) at 1 atm and 298 K using the generalized hybrid Monte Carlo (GHMC) integrator Lelievre et al. (2012); Horowitz (1991). These initial conditions are simulated for $M$ steps with the Langevin integrator [Eq. (3)] at constant volume (using a collision rate $\gamma=9.1$ /ps) to measure the nonequilibrium work to reach steady state, followed by an additional $M$ steps to measure the steady-state power. We have determined that, for all systems and time steps simulated, $M=1028$ steps is sufficient to reach steady state (see Fig. 4). We have also calculated the statistical uncertainty according to Eq. (A).

where the prefactor $a$ depends strongly on whether constraints are employed; see the caption of Fig. 2. This trend is consistent with earlier work observing the strong dependence of Metropolization acceptance probabilities on time step Beskos et al. (2010) and highlights how small reductions in time step can rapidly reduce the deviations of the sampled steady-state distribution from the desired equilibrium distribution defined by the system Hamiltonian $p_{\rm eq}(x)\propto\exp[-\beta\mathcal{H}(x)]$ , without unduly burdensome computational cost. We detail in Sec. VI some methods that correct for these nonequilibrium perturbations. Even in the absence of correction procedures, the above calculation represents a thermodynamically meaningful determination of the deviation from the desired equilibrium sampling associated with the continuous Langevin equation of motion, as a function of simulation parameters.

V Multivariate Fluctuation Theorem

We seek an analytical framework that describes the correlation between the shadow work (performed by integration) and the protocol work (due to explicit Hamiltonian changes). We want this framework to provide a generic method to characterize the effect that shadow work has on the distribution of protocol work, and specifically on the time-reversal symmetry [Eq. (4)] that protocol work would satisfy in its absence. Furthermore, we want this framework to suggest systematic techniques to correct for these distorting effects. We propose such a framework through the generalization of work fluctuation theorems to the context of two sources of work. These results, although formulated specifically for our situation of explicit and artifactual work, are entirely general to situations involving any two sources of work.

Rearrangement of Eq. (4) and splitting the work into two distinct work contributions $W_{1},W_{2}$ gives

Multiplication by delta functions of the two works, $\delta(W_{1}[X,\Lambda]-W_{\rm prot})\delta(W_{2}[X,\Lambda]-W_{\rm shad})$ , and integration over all trajectories produces what we refer to as the multivariate fluctuation theorem,

This is a special case of the generalized detailed fluctuation theorem for joint probabilities of Garcìa-Garcìa, et al. García-García et al. (2010, 2012). Equation (9) gives an expression in terms of the excess work $W_{\rm prot}+W_{\rm shad}-\Delta F_{\rm eq}$ for the ratio of the joint probability distributions over protocol and shadow works realized during the forward and reverse protocols, respectively.

Equation (9) can be trivially extended to arbitrary decompositions of the total work, where each component corresponds to a group of individual work steps. It thus represents a generalization of the work fluctuation theorem Crooks (1999a) to contexts with multiple sources of work. From Eq. (9), several other modified fluctuation theorems can be derived that modify a standard fluctuation theorem for one of the works with an exponential average over the other work. For example, in Sec. VI, we derive a Jarzynski equation modified by the presence of shadow work [Eq. (12)], and, in Sec. VII, we derive a similarly modified integrated transient fluctuation theorem [Eq. (15)].

VI Recovering equilibrium statistics from nonequilibrium simulations

Now that we are equipped with our new interpretation of finite-time-step Langevin dynamics as a driven nonequilibrium process even in the absence of an explicit driving force, nonequilibrium thermodynamics affords various approaches for recovering true equilibrium properties of the system.

One approach is to maintain the simulation at equilibrium by incorporating Monte Carlo moves that conditionally accept or reject candidate trajectory segments or single time steps, for example by using the Metropolis criterion $P_{\text{accept}}=\text{min}(1,\exp\{-\beta W_{\rm shad}\})$ Metropolis et al. (1953). In order to maintain detailed balance, the velocity must be inverted if the proposed state is rejected Lelièvre et al. (2010), which may lead to increased correlation times. Applied to single time steps, this is essentially the idea behind the GHMC integrator Horowitz (1991); Lelièvre et al. (2010), and when applied to trajectory segments, this is the idea behind work-bias Monte Carlo Athènes (2002) and nonequilibrium candidate Monte Carlo Nilmeier et al. (2011) simulations. In either case, Metropolization results in an MCMC process that samples the true equilibrium distribution.

Another approach to recovering equilibrium statistics is to perform a Monte Carlo sampling of trajectories Sun (2003); Atılgan and Sun (2004), generating an ensemble of trajectories weighted by the Boltzmann-weighted work over the entire trajectory, $\exp\{-\beta W_{\rm shad}\}$ . This approach allows both accurate equilibrium statistics and realistic dynamics, albeit at a potentially high computational cost.

Instead of sampling equilibrium trajectories, we can alternatively apply nonequilibrium relations, such as the Jarzynski equality Jarzynski (1997) and path ensemble averages Crooks (1999a); Hummer and Szabo (2001); Minh and Chodera (2009, 2011), to directly recover equilibrium properties from the statistics of a driven system, essentially by reweighting trajectories by $\exp\{-\beta W_{\rm tot}\}$ , where it is important that the work includes both the protocol work and the shadow work. Note that the initial configurations must be sampled from the correct equilibrium ensemble, which can be accomplished with a standard MCMC process, or with one of the approaches discussed above, such as GHMC simulation.

We now demonstrate the importance of including the shadow work by using the Jarzynski equality to estimate free energy changes in a simple model system. The Jarzynski equality Jarzynski (1997) relates the equilibrium free energy change, resulting from some perturbation of the system, to the exponential average of the work incurred during many realizations of the system response to that perturbation,

We can understand the origin of this error by analyzing our estimator in terms of the multivariate fluctuation theorem [Eq. (9)] derived above in Sec. V. Rearranging Eq. (9), decomposing the joint probability into the marginal and conditional probabilities,

and integrating over the shadow work, we find that when ignoring the contributions of shadow work, the Jarzynski estimator of the free energy $\beta\widehat{\Delta F}_{\rm eq}\equiv-\ln\left\langle e^{-\beta W_{\rm prot}}\right\rangle_{\Lambda}$ has a systematic bias from the true free energy change $\beta\Delta F_{\rm eq}$ that is a function of the distribution of shadow works:

VII Correcting nonequilibrium fluctuation theorems

In addition to these errors for equilibrium estimators during simulations with an explicitly time-independent Hamiltonian, ignoring the contribution of shadow work leads to systematic errors in estimates of nonequilibrium quantities when the Hamiltonian is explicitly time dependent: The simulated system is actually subject to a different Hamiltonian than the system one, and thus the probability distribution of protocol works does not obey the relevant time-reversal symmetry (4). We quantitate this time-reversal asymmetry by examining violations of the integrated transient fluctuation theorem (ITFT) Ayton et al. (2001), which for time-symmetric protocols relates the ratio of the probabilities of realizing a negative and a positive total work, respectively, to the exponentially-weighted total work, conditional on the total work being positive:

This relation follows directly from Eq. (4).

Manipulating Eq. (9) to a similar form produces

For this relation to hold, the work in the exponential must be the total work, not the protocol work that appears elsewhere in the equation. When one ignores the shadow work and measures only the protocol work, the ratio of the left-hand side and right-hand side,

departs from unity to the extent that the protocol work fluctuations do not obey the relevant time-reversal symmetry that the total work fluctuations do. Departure from unity in Eq. (15) quantifies the violation of the nonequilibrium time-reversal symmetry obeyed by a proper thermodynamic work encompassing all energy changes not related to heat.

Figures 3c,d show that, for the simple system described in Sec. VI, the protocol work alone (circles) does not obey the nonequilibrium fluctuation relation required of a thermodynamic work (with an error that empirically scales with the square of the time step), but the sum of the protocol and shadow works ( $\times$ s) does obey it. The correction factor $\left\langle e^{-\beta W_{\rm tot}}\right\rangle_{W_{\rm prot}>0}/\left\langle e^{-\beta W_{\rm prot}}\right\rangle_{W_{\rm prot}>0}$ ( $+$ signs) reproduces the error in the ITFT ratio neglecting shadow work. Thus, ignoring the shadow work and using the protocol work rather than the total work produces systematic biases in estimators of nonequilibrium quantities (such as the nonequilibrium free energy Sivak and Crooks (2012a) or the nonequilibrium energetic efficiency Sivak and Crooks (2012b)).

VIII Epilogue

For Hamiltonian dynamics, a finite-time-step symplectic integrator conserves a shadow Hamiltonian and is microscopically reversible. But, as we have seen, for Langevin dynamics, discretization of the dynamics leads (even for a time-independent Hamiltonian) to a mixed deterministic-stochastic nonequilibrium dynamics, which preserves the equilibrium distribution of neither the system nor the shadow Hamiltonian and which is not time-reversal symmetric. However, we can measure the work, heat, and shadow work, and thereby separate the respective contributions to time-reversal symmetry breaking of the finite time step and deliberate perturbation. This procedure allows us to apply results from nonequilibrium thermodynamics to characterize in a thermodynamically meaningful way the error produced by finite-time-step integration and to correct for such errors to recover equilibrium and nonequilibrium properties of the system.

While we focus in this paper on work distributions, we note that discrete integrators can also introduce artifacts into other aspects of a system’s dynamical evolution, for example, producing erroneous free-particle diffusion coefficients and uniform force-field terminal drifts. These artifacts can be mitigated through time-step rescaling, as discussed in Ref. Sivak et al. (2013). Where measurements of work and heat are not required, correct statistics of nonequilibrium trajectories through phase space can be recovered using the Metropolis-adjusted geometric Langevin algorithm of Bou-Rabee and Vanden-Eijnden, which under reasonable conditions on the potential energy is pathwise convergent to the distribution of trajectories for the continuous equations of motion Bou-Rabee and Vanden-Eijnden (2010).

Acknowledgments

The authors thank the anonymous referees for suggestions that substantially improved the manuscript. The authors also thank Manuel Athènes (Commissariat à l’Eńergie Atomique/Saclay), Gabriel Stoltz (CERMICS, Ecole des Ponts ParisTech), Benoît Roux (University of Chicago), Jerome P. Nilmeier (Lawrence Livermore National Laboratory), Todd Gingrich (UC Berkeley), Jesús A. Izaguirre (University of Notre Dame), Benedict Leimkuhler (University of Edinburgh), Jason Wagoner (Stanford University), Huafeng Xu and Cristian Predescu (D. E. Shaw Research) for enlightening discussions and constructive feedback on the manuscript, and Avery A. Brooks for help with the illustrations. The authors are grateful to Peter Eastman and Vijay Pande (Stanford University) for their assistance with the OpenMM molecular simulation library. J. D. C. was supported through a Distinguished Postdoctoral Fellowship from the California Institute for Quantitative Biosciences (QB3) at the University of California, Berkeley. D. A. S. and G. E. C. were funded by the Office of Basic Energy Sciences of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Water simulations were carried out on the NCSA Forge supercomputer through an allocation (TG-MCB100015) from the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation Grant No. OCI-1053575.

Appendix A Simulation details

We carried out simulations using the OpenMM GPU-accelerated molecular simulation toolkit Friedrichs et al. (2009); Eastman et al. (2012) (development revision r3314). Cubic water boxes of various sizes (220, 440, 880, 1760, and 3520 waters) were created using the OpenMM Modeller tool and parametrized with TIP3P water Jorgensen et al. (1983) using the OpenMM Forcefield tool. In constrained simulations, we used the analytical SETTLE algorithm Miyamoto and Kollman (1992) to enforce constraints on water O-H and H-H interatomic distances. This Langevin integrator maintains second-order accuracy Lelievre et al. (2012) when constrained by the RATTLE algorithm Andersen (1983), which should produce results identical (to within machine precision) to SETTLE. We truncated Lennard-Jones interactions at 9 Åand added an analytical long-range dispersion correction Shirts et al. (2007) to account for interactions beyond this cutoff. We handled electrostatics using the reaction-field algorithm Tironi et al. (1995) with an identical cutoff using an exterior dielectric of 78.5.

We sampled initial configurations and momenta from an equilibrium NPT ensemble at 1 atm and 298 K with the GHMC algorithm Lelievre et al. (2012); Horowitz (1991) using a 0.5 fs time step. We controlled pressure using a Monte Carlo molecular-scaling barostat with a proposal size automatically determined during equilibration Chow and Ferguson (1995); Åqvist et al. (2004). After initial equilibration for 250 000 steps, we sampled configurations and momenta every 10 000 GHMC steps and subjected them to Langevin simulation [Eq. (3)] at fixed volume using a collision rate of 9.1/ps. We integrated these initial conditions for a total of 4096 steps using a variety of different time steps from 0.25 fs to 7 fs, with the accumulated shadow work after $2^{n}$ steps stored ( $n=0,1,\ldots,12$ ). The limit of stability was determined by the largest time step that did not generate infinite cumulative work values in 4096 time steps in any sample; we determined the limit to be 2 fs for unconstrained simulations and 6 fs for constrained simulations.

To estimate, using Eq. (6), the nonequilibrium free energy of the steady-state ensemble sampled by discrete Langevin integration, we used the average accumulated shadow work after $M$ steps as the work to switch into steady state, while we used the average dissipated power in the next M steps as an average steady-state power:

Here, the $\left<\cdot\right>_{\rm GHMC}$ notation denotes averages computed over Langevin simulations initiated from GHMC-sampled initial configurations and momenta. Through analysis of $M=2^{n}$ for $n=0,1,\ldots,11$ , we found that the steady-state power, and hence the estimated nonequilibrium free energy, converged after $M=1024$ steps (see Fig. 4), so we used this value for all subsequent analysis.

We estimated the squared uncertainty in the nonequilibrium free energy as