Nonequilibrium candidate Monte Carlo: A new tool for efficient equilibrium simulation

Jerome P. Nilmeier, Gavin E. Crooks, David D. L. Minh, John D. Chodera

Equilibrium and Expanded Thermodynamic Ensembles

For physical systems in equilibrium, the probability of observing a microstate is given by the Boltzmann distribution,

where xΓx\in\Gamma denotes a microstate of the system (which may include coordinates, momenta, and other dynamical variables, such as simulation box dimensions), λ\lambda denotes a set of thermodynamic parameters whose values define a thermodynamic state, and ZλZ_{\lambda} is a normalizing constant known as the partition function.

The reduced potential uλ(x)u_{\lambda}(x) depends on the thermodynamic ensemble under consideration . For instance, in an isothermal-isobaric (NpTNpT) ensemble, the reduced potential will assume the form,

which depends on the Hamiltonian H(x)H(x) (which may include an external biasing potential, and is presumed to be invariant under momentum inversion) and the system volume V(x)V(x). In this ensemble, the vector of controllable thermodynamic parameters λ{β,H,p}\lambda\equiv\{\beta,H,p\} includes the inverse temperature β\beta, the Hamiltonian H(x)H(x), and external pressure pp. Other thermodynamic parameters and their conjugate variables can be included or excluded to generate alternative physical (or unphysical) ensembles.

To allow sampling from multiple thermodynamic states within a single simulation, we also define an expanded ensemble , which specifies a joint distribution for (x,λ)(x,\lambda) in a weighted mixture of thermodynamic states,

where ωλ>0\omega_{\lambda}>0 specifies an externally-imposed weight for state λ\lambda. Here, λG\lambda\in\mathcal{G} may assume values in a discrete or continuous space G\mathcal{G}. If the set G\mathcal{G} consists of a single value of λ\lambda, a single thermodynamic state is sampled, and π(x,λ)=πλ(x)\pi(x,\lambda)=\pi_{\lambda}(x). These thermodynamic states may correspond to a variety of different states of interest, such as temperatures in a simulated tempering simulation , alchemical states in a simulated scaling simulation , or protonation states in a constant-pH simulation .

Nonequilibrium Candidate Monte Carlo

Each perturbation kernel αt\alpha_{t} drives some or all of the degrees of freedom xx in a stochastic or deterministic way (e.g. by driving a torsion angle, a distance between two atoms, or the volume of the simulation cell). Similarly, each propagation kernel KtK_{t} propagates some or all of the coordinates of the system at fixed λt\lambda_{t} according to some form of MCMC or MD (e.g. Metropolis Monte Carlo , velocity Verlet deterministic dynamics, or overdamped Langevin stochastic dynamics ) that may also depend on the time index tt. Interleaving perturbation and propagation allows for energetically unfavorable interactions introduced by perturbation to be relaxed during propagation, potentially increasing acceptance rates relative to the instantaneous perturbations of standard Metropolis Monte Carlo.

The procedure by which a trajectory X(x0,x1,,xT)X\equiv(x_{0},x_{1},\ldots,x_{T}) is generated from an initial microstate x0x_{0} according to a protocol Λ\Lambda can be illustrated by the scheme,

Application of the perturbation αt\alpha_{t} to xt1x_{t-1} generates a perturbed configuration xtx_{t}^{*}, which is then propagated by the kernel KtK_{t} to obtain xtx_{t}.

The next step in NCMC is to accept or reject (xT,λT)(x_{T},\lambda_{T}) as the next sample in the chain. To ensure that the stationary distribution π(x,λ)\pi(x,\lambda) is preserved, we enforce a strict pathwise form of detailed balance,The described pathwise detailed balance condition is closely related to “super-detailed balance” (see, e.g. ), but also accounts for momentum reversal to extend its definition to include molecular dynamics integrators.

Summation of Eq. 6 over all trajectories starting with x0x_{0} and ending with xTx_{T} recovers the standard detailed balance condition (see Appendix for proof).

We define the ratio of proposal kernels as,

and the ratio of propagation kernels as the exponentiated difference in forward and backward conditional path actions as,

where Δu(XΛ)uT(xT)u0(x0)\Delta u(X|\Lambda)\equiv u_{T}(x_{T})-u_{0}(x_{0}) is the energy difference. Eq. 11 is the main result of this paper, and is highly general with regard to both the choice of protocol for perturbation and propagation. In subsequent sections, we discuss specific choices for these protocols that lead to particularly simple acceptance criteria.

Many choices of acceptance probabilities A(XΛ)A(X|\Lambda) that satisfy Eq. 11 are possible, including the well-known Metropolis-Hastings criterion ,

We note that NCMC need not be used exclusively to sample from π(x,λ)\pi(x,\lambda), but can be mixed with other MCMC moves or with MD . For example, one may reinitialize velocities from the Maxwell-Boltzmann distribution after each NCMC step; this is a Gibbs sampling MCMC move using the marginal distribution for velocities.

Perturbation Kernels

A large variety of choices are available for the perturbation kernels αt(x,y)\alpha_{t}(x,y). Through judicious selection of these kernels, a practitioner can design nonequilibrium proposals that carry some component of the system from one high-probability region to another with high acceptance rates. We briefly describe a few possibilities.

Suppose we wish to drive a torsion angle ϕ\phi (an angle subtended by four bonded atoms) stochastically by rotating it to a new torsion angle ϕ\phi^{\prime} (holding bond lengths and angles fixed)according to some probability, such as the von Mises circular distribution centered on ϕ\phi,

with I0(κ)I_{0}(\kappa) denoting the modified Bessel function of order zero and κ>0\kappa>0 taking the role of a dimensionless force constant. Because the stochastic perturbation is made in a non-Cartesian coordinate, a Jacobian J(ϕ)J(\phi) must be included to compute α(x,y)\alpha(x,y) in Cartesian coordinates, resulting in the ratio,

where J(ϕ)=J(ϕ)=1J(\phi^{\prime})=J(\phi)=1 because the transformation (a rotation about a bond vector) preserves the Cartesian phase space volume.

2 Deterministically Driven Degrees of Freedom

3 Simulation Box Scaling

4 Thermodynamic Perturbation

Propagation Kernels

The choice of propagation kernels available is also very broad. If strong driving is performed in selection of α\alpha, one may elect to choose a time-independent propagation kernel Kt(x,y)K(x,y)K_{t}(x,y)\equiv K(x,y) that samples from a stationary distribution π(x)\pi(x) of interest. Alternatively, a strongly time-dependent KtK_{t} could be selected to transiently drive the system out of equilibrium, or from the equilibrium distribution at one thermodynamic state to another. Some possible choices are described below.

One may propagate some or all of a system’s degrees of freedom (e.g. those not affected by the perturbation kernel αt\alpha_{t}) by a method that satisfies detailed balance in πt\pi_{t},

where πt(x)Zt1eut(x)\pi_{t}(x)\equiv Z_{t}^{-1}e^{-u_{t}(x)} for a specified ut(x)u_{t}(x). Many MCMC methods , including Metropolis and various hybrid Monte Carlo (HMC) algorithms that combine discrete-timestep integrators with Monte Carlo acceptance/rejection steps , obey detailed balance and are commonly used for the simulation of physical systems.

By analogy with Crooks , we define a work ww and heat qq for the nonequilibrium driven process,

such that w(XΛ)+q(XΛ)=Δu(XΛ)w(X|\Lambda)+q(X|\Lambda)=\Delta u(X|\Lambda), a restatement of the first law of thermodynamics.

The conditional path action difference can then be written in terms of the heat of the process, q(XΛ)q(X|\Lambda),

leading to an acceptance probability similar to standard MC, except that the work, w(XΛ)w(X|\Lambda), replaces the instantaneous potential energy difference,

2 Deterministic Dynamics

where Δu(XΛ)uT(xT)u0(x0)\Delta u(X|\Lambda)\equiv u_{T}(x_{T})-u_{0}(x_{0}) is the energy difference. The equivalence of the work and energy difference for volume-preserving integrators was previously recognized in the context of fluctuation theorem calculations .

Symplectic integrators include velocity Verlet . These integrators are also symplectic when utilizing constraints through the application of algorithms such as RATTLE , provided that the constraints are iterated to convergence each timestep .

3 Stochastic Dynamics

Stochastic integrators such as velocity Verlet discretizations of Langevin dynamics sample a modified distribution that differs from the desired equilibrium distribution πt\pi_{t} in a timestep-dependent manner . While this modified distribution may be difficult or impossible to compute in order to recover equilibrium properties by reweighting, computation of the relative action ΔS(XΛ)\Delta\mathcal{S}(X|\Lambda) is relatively straightforward, and the NCMC acceptance criteria ensures that the NCMC-sampled configurations are distributed according to the desired equilibrium ensembleAlternatives to using NCMC to correct stochastic integration include introducing a Metropolization correction after each step, as in the generalized hybrid Monte Carlo (GHMC) integrator we use in the example .. As examples, we compute ΔS(XΛ)\Delta\mathcal{S}(X|\Lambda) for the overdamped Langevin (Brownian) dynamics integrator of Ermak and Yeh and the Brünger-Brooks-Karplus (BBK) Langevin integrator in the Appendix.

Illustrative Application: Bistable Dimer in a WCA Fluid

The rate at which effectively uncorrelated samples are generated can be quantified in terms of the correlation time τ\tau for the dimer extension r(t)r(t) (shown in Fig. 2). This time represents the asymptotic decay time constant for the correlation function C(t)=r(0)r(t)C(t)=\left\langle r(0)r(t)\right\rangle, which will behave like

for large tt, where C0=r2C_{0}=\left\langle r^{2}\right\rangle and C=r2C_{\infty}=\left\langle r\right\rangle^{2}. The correlation time is related to the statistical inefficiency, g=1+2τg=1+2\tau, a factor that describes the number of iterations necessary to generate an effectively uncorrelated sample .

For the MD simulation in vacuum (Fig. 2, top trace), we observe slow hopping between compact and extended configurations with a correlation time τ=59.2\tau=59.2 iterations, resulting in slow convergence of the histogram. Introducing instantaneous MC dimer extension/contraction moves that modify the dimer extension by Δr=±r0\Delta r=\pm r_{0} reduces the correlation time to τ0.0\tau\approx 0.0, such that an uncorrelated configuration is generated after each iteration of 500 MD steps and one instantaneous MC step (Fig. 2, second trace from top).

When the dimer is immersed in a dense fluid of WCA particles, however, iterations consisting of 500 MD steps alone result in extremely slow barrier crossings, requiring g \approx 600 iterations to produce an uncorrelated sample (Fig. 2, middle trace). Unlike in vacuum, the introduction of instantaneous MC moves does not significantly reduce the correlation time τ\tau (Fig. 2, second trace from bottom). However, performing these same dimer expansion and contraction moves over 2048-step NCMC moves (Fig. 2, bottom trace) allows the system to rapidly mix between both compact and extended states with a correlation time of τ=4.0\tau=4.0 iterations. While each iteration requires a 5-fold increase in computational effort (500 MD steps + 2048 NCMC switching steps = 2548 force evaluations, versus 500 force evaluations for MD alone), a 67-fold reduction in correlation time is achieved, resulting in a remarkable order-of-magnitude gain in overall efficiency.

The length of the NCMC switching process is a free parameter that may be tuned to further improve efficiency. Towards this end, we estimated the acceptance probability of the extension/contraction moves in dense solvent as a function of switching length (Fig. 3). While instantaneous MC proposals of ±r0\pm r_{0} are only accepted with probability 1027\approx 10^{-27} (the error is this quantity is likely underestimated due to its extremity), dividing the move into smaller steps boosts the acceptance rate to a level useful in condensed-phase simulation. If the move is divided into a small number of steps (1 to 8), there is little to no increase in acceptance rate, but for an intermediate number of steps (16 to 1024), there is a superlinear boost in the acceptance probability relative to the length of the switching process. The acceptance probability finally reaches useful levels around 2000 steps, achieving an acceptance rate of 12% using nonequilibrium proposal trajectories of 2048 steps, or 38% for 8192 steps.

When comparing efficiency, we are most interested in the rate of generating uncorrelated configurations for a given amount of computational effort. Relative to MD alone, this rate is described by the efficiency gain,

Epilogue

We have described a procedure—nonequilibrium candidate Monte Carlo (NCMC)—by which nonequilibrium proposals can be used within MCMC simulations to enhance acceptance rates. In our illustration, we have demonstrated how its use can lead to large improvements in statistical efficiency—the rate at which uncorrelated samples are generated for a fixed amount of computational effort. In other applications, whether similarly large efficiency gains are achieved will depend on the precise nature of the system under study and the nonequilibrium proposals introduced. The most straightforward approach—borrowing Metropolis Monte Carlo proposals that are reasonable for one component of the system in isolation, and converting them to nonequilibrium proposals—is likely to be a fruitful avenue for generating efficient schemes, as was demonstrated here for a simple system.

More generally, the problem of selecting efficient nonequilibrium proposals is similar to the problem of choosing good reaction coordinates, in that it is desirable to drive the system along (possibly complex) slow collective coordinates where orthogonal degrees of freedom relax quickly. The search for such collective coordinates is a topic of active research . Given an initial nonequilibrium protocol, the issue of optimizing such a protocol to minimize dissipation (and maximize acceptance) is also a topic of active study, led by forays into the world of single-molecule measurement . Recent work has also suggested that estimating the thermodynamic metric tensor along the nonequilibrium parameter switching path , could prove useful in adaptively optimizing the switching protocol .

We note that switching trajectories contain potentially useful information. Indeed, several methods have recently been developed to estimate equilibrium properties from nonequilibrium samples through the application of statistical estimator theory to nonequilibrium fluctuation theorems ; these are particularly relevant to switching between multiple thermodynamic states. Though the development of efficient estimators that utilize both nonequilibrium switching trials and sampled equilibrium data generated in NCMC simulations remains an open challenge, it is at least straightforward to incorporate information from rejected NCMC proposals in the estimation of equilibrium averages .

WCA Dimer Simulations

The dimer system considered here consists of two particles that interact via a double-well “bonded” potential in the interatomic distance rr,

To ensure that observed differences were not due to changes in the stationary distribution of the integrator, we used generalized hybrid Monte Carlo (GHMC) () for all our simulations. GHMC is based on a velocity Verlet discretization of Langevin dynamics—the two are equivalent in the limit of small timesteps - but includes an acceptance/rejection step to correct for errors introduced by finite timesteps so that the stationary distribution is the exact equilibrium distribution. We used a timestep of 0.002τ0.002\tau, where τ=σ2m/ϵ\tau=\sqrt{\sigma^{2}m/\epsilon}, and the collision rate was set to τ1\tau^{-1}. With this timestep, the acceptance probability is 99.929±0.00199.929\pm 0.001%; the resulting dynamics closely approximates Langevin dynamics.

In simulations employing instantaneous Monte Carlo moves, a perturbation Δr\Delta r to the interatomic distance rr of the dimer was chosen according to,

where r(x)r(x) denotes the dimer separation of configuration xx. The propagation kernel Kt(x,y)K_{t}(x,y) corresponds to velocity Verlet dynamics where the dimer atoms are held fixed in space. The final configuration after the MC or NCMC rejection procedure was stored and plotted to generate Fig. 2.

The mean acceptance probabilities for each switching time τ\tau can be estimated via the sample mean

For numerical stability, logarithms of A(Xn)A(X_{n}) were stored, as anlnA(Xn)a_{n}\equiv\ln A(X_{n}). We then estimated lnAτ\ln\left\langle A\right\rangle_{\tau} (shown in Fig. 4) as

Integrated autocorrelation times were estimated using the rapid scheme described in Section 5.2 of Ref. .

The acceptance probabilities plotted in Fig. 4 were estimated from a trajectory consisting of 10 000 iterations of 2048-step NCMC, with 500 steps of GHMC dynamics in between each NCMC trial, to ensure equilibrium sampling. Prior to each 2048-step NCMC iteration, a velocity from the Maxwell-Boltzmann distribution was selected, and NCMC trial moves with varying switching times were conducted solely to accumulate statistics. The statistical error in the estimate of lnAτ\ln\left\langle A\right\rangle_{\tau} and the computed relative efficiency over instantaneous Monte Carlo was estimated by 1000 bootstrap trials, in which the dataset of 10 000 work samples was resampled with replacement in each bootstrap trial and 95% confidence intervals computed from the distribution over bootstrap replicates.

The reference distribution for the interparticle distribution P(r)\mathcal{P}(r) plotted in red on the right side of Fig. 2 was computed analytically for the vacuum system,

For the solvated system, this distribution was estimated from an umbrella sampling simulation employing a modified bonded potential intended to remove the barrier in between compact and extended states,

where rnr_{n} denotes the bond separation for sample nn, and a finite-width histogram bin was used instead of the delta function δ(r)\delta(r).

References

Appendix A Proof that NCMC preserves the equilibrium distribution

Following the proof for GHMC in Ref. , here we show that NCMC preserves the equilibrium distribution. The expected acceptance rate for NCMC moves initiated from (x,λ)(x,\lambda) is,

The latter contribution from accepting the candidate such that (x(n+1),λ(n+1))=(xT,λT)(x^{(n+1)},\lambda^{(n+1)})=(x_{T},\lambda_{T}) is,

where ρ(X,Λx0,λ0)Π(Xx0,Λ)P(Λx0,λ0)\rho(X,\Lambda|x_{0},\lambda_{0})\equiv\Pi(X|x_{0},\Lambda)P(\Lambda|x_{0},\lambda_{0}) is the probability of generating the trajectory-protocol pair (X,Λ)(X,\Lambda) from (x0,λ0)(x_{0},\lambda_{0}), and the pathwise detailed balance condition (Eq. 6) is used to produce the quantity in brackets.

Taking the sum of Eqs. 36 and 37, we find that the equilibrium distribution is preserved,

Appendix B Acceptance criteria for overdamped Langevin (Brownian) integrator of Ermak and Yeh

A common integrator for Brownian dynamics (the overdamped regime of Langevin dynamics), in which only coordinates xx are explicitly integrated, is given by Ermak and Yeh . In our notation, where the perturbed coordinates xtx_{t}^{*} are propagated by one step of the stochastic integrator to obtain xtx_{t}, application of the propagation kernel K(xt,xt)K(x_{t}^{*},x_{t}) can be written,

where mm is the particle mass, Ft(x)(/x)Ht(x)F_{t}(x)\equiv-(\partial/\partial x)H_{t}(x) is the (potentially time-dependent) systematic force, and γ\gamma is an effective collision frequency or friction coefficient with units of inverse time. The noise history ξt\xi_{t} for each degree of freedom is a normal random variate with zero mean and variance β1\beta^{-1}, drawn from the distribution

Then, the ratio of transition kernels appearing in Eq. 10 can be written in terms of noise history ξt\xi_{t} and the computed reverse noise history ξt\xi_{t}^{*},

where the tildes are dropped because the microstate xx contains no momenta. The quantity xt/ξt\left|\partial x_{t}/\partial\xi_{t}\right| represents the Jacobian for the change of variables from the ξt\xi_{t} to xtx_{t}, and the Jacobians in the numerator and denominator cancel. The quantity in Eq. 43 can easily be computed during integration.

Appendix C Acceptance criteria for Langevin integrator of Brooks, Brünger, and Karplus (BBK)

The Brünger-Brooks-Karplus (BBK) stochastic integrator is a popular integrator for simulating Langevin dynamics. In our notation, where the perturbed coordinates xtx_{t}^{*} are propagated by one step of the stochastic integrator to obtain xtx_{t}, application of the propagation kernel K(xt,xt)K(x_{t}^{*},x_{t}) can be written,

The noise history terms ξt\xi_{t} and ξt\xi^{\prime}_{t} are normal random variates with zero mean and variance β1\beta^{-1}. Their joint distribution can therefore be written,

which can be shown to be independent of ξt\xi_{t} and ξt\xi^{\prime}_{t}. The conditional path action difference can now be computed,

Appendix D Derivation of effective correlation time for mixed MD/NCMC sampling

For a 2×22\times 2 system whose time evolution is governed by the column stochastic transition matrix T{\bf T}, we can write the autocorrelation function for the dimer extension rr as

where the transition matrix T{\bf T} has unitary eigenvalue decomposition UMU1{\bf U}{\bf M}{\bf U}^{-1}, and μ\mu is the non-unit eigenvalue of T{\bf T}. The constants are C0=(1/2)(rc2+re2)C_{0}=(1/2)(r_{c}^{2}+r_{e}^{2}) and C=(1/4)(rc+re)2C_{\infty}=(1/4)(r_{c}+r_{e})^{2}.

Relating this to the autocorrelation time τ\tau estimated from a timeseries, intended to reflect the fit to

Similarly, we can write the probability that the NCMC step will carry the system from one conformational state to another in terms of the acceptance probability γ\gamma, which we assume to be symmetric,