Comparing Markov Chain Samplers for Molecular Simulation

Robert D. Skeel; Youhan Fang

doi:10.3390/e19100561

and

Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA

^*

Author to whom correspondence should be addressed.

^†

Current address: School of Mathematical and Statistical Sciences, Arizona State University, Phoenix, Tempe, AZ 85287, USA.

Entropy2017, 19(10), 561;https://doi.org/10.3390/e19100561

This article belongs to the Special Issue Understanding Molecular Dynamics via Stochastic Processes

Version Notes

Order Reprints

Abstract

Markov chain Monte Carlo sampling propagators, including numerical integrators for stochastic dynamics, are central to the calculation of thermodynamic quantities and determination of structure for molecular systems. Efficiency is paramount, and to a great extent, this is determined by the integrated autocorrelation time (IAcT). This quantity varies depending on the observable that is being estimated. It is suggested that it is the maximum of the IAcT over all observables that is the relevant metric. Reviewed here is a method for estimating this quantity. For reversible propagators (which are those that satisfy detailed balance), the maximum IAcT is determined by the spectral gap in the forward transfer operator, but for irreversible propagators, the maximum IAcT can be far less than or greater than what might be inferred from the spectral gap. This is consistent with recent theoretical results (not to mention past practical experience) suggesting that irreversible propagators generally perform better if not much better than reversible ones. Typical irreversible propagators have a parameter controlling the mix of ballistic and diffusive movement. To gain insight into the effect of the damping parameter for Langevin dynamics, its optimal value is obtained here for a multidimensional quadratic potential energy function.

Keywords:

Markov chain Monte Carlo; stochastic dynamics integrators; decorrelation time; integrated autocorrelation time

1. Summary

Thermodynamic and structural properties of a molecular system can be precisely defined as ensemble-dependent expectations of functions of its configuration. The calculation of such expectations is feasible only with the use of Markov chain Monte Carlo (MCMC) methods or approximations thereof. Considered here are sampling propagators that do not compromise the Markov property. Included are unbiased samplers that conditionally accept proposed moves and biased samplers that unconditionally accept such moves, in particular, discretized stochastic dynamics. Many such sampling propagators are proposed in the literature, and, in virtually all cases, experiments are conducted to substantiate claims of superiority. Too often though, a good metric is not used to measure the computational cost of a propagator. The aim of this article is threefold: first, to explore some practicalities related to measuring the efficiency of a propagator; second, to highlight the superior efficiency of irreversible propagators, namely, those that do not satisfy detailed balance; and third, to provide some insight into the optimal mix of diffusive and ballistic movement for Langevin dynamics.

Let

ρ_{Q} (q)

,

q = {[q_{1}, q_{2}, \dots, q_{ν}]}^{T}

denote the probability density function corresponding to the ensemble of interest. This function is assumed to be known only up to a multiplicative factor. An observable is an expectation

E [u (Q)] = \int u (q) ρ_{Q} (q) d q

for some “preobservable”

u (q)

. This can be estimated by the mean

{\bar{U}}_{N} = (1 / N) \sum_{n = 0}^{N - 1} u (Q_{n})

, where the random values

Q_{n}

are obtained from a Markov chain

Q_{0} \to Q_{1} \to \dots \to Q_{N - 1}

that samples from

ρ_{Q} (q)

. Note the use here of upper case to denote random variables.

In practice, sampling performance is improved if the configuration variables

q

are augmented with auxiliary variables

p

, e.g., momenta, yielding phase space variables

z = (q, p)

. The probablity density is extended to

ρ (q, p)

so that

\int ρ (q, p) d p = ρ_{Q} (q),

and an MCMC scheme is constructed to produce a chain

Z_{0} \to Z_{1} \to \dots \to Z_{N - 1}

, where

Z = (Q, P)

.

The samples from the chain tend to be highly correlated, and this greatly reduces the convergence rate as

N \to + \infty

. As explained in Section 2, the variance of an estimate for

E [u (Z)]

is

Var [{\bar{U}}_{N}] = \frac{τ}{N} Var [u (Z)] + O (\frac{1}{N^{2}})

(1)

where

τ

is the integrated autocorrelation time (IAcT) for

u (z)

. The effective sample size is therefore

N / τ

, and the appropriate metric for evaluating a propagator is the effective sample size divided by the computing time.

In a great many practical simulations, the effective sample size is probably close to zero. One can disagree on the significance of such simulations [1]. In any case, for the comparison of sampling algorithms, it is possible to choose molecular systems, restrained if necessary, for which it is feasible to attain a decent effective sample size.

Often, the spectral gap is cited as the relevant quantity. To understand its role, it is helpful to express ideas in a direct way as in Refs. [2,3]. As detailed in Section 2, introduce a forward transfer operator

F

to express the ratio

ρ_{n} / ρ

in terms of

ρ_{n - 1} / ρ

, where

ρ_{n} (z)

is the probablity density for

Z_{n}

. Let

F_{0} = F - E

where

E u

denote the function with constant value

E [u (Z)]

. Assume that the operator

F_{0}

has its spectrum strictly inside the unit circle, as it does in practice. The error in

(1 / N) \sum_{n = 0}^{N - 1} ρ_{n} (z)

can be shown [1] to be “proportional to”

{(1 - F_{0})}^{- 1}

and therefore to the reciprocal of the spectral gap

| 1 - λ_{2} |

, where

λ_{2}

is a nonunit eigenvalue of

F

nearest 1.

Estimates of the IAcT are obtained by summing the terms of the autocorrelation function, which is constructed from autocovariances of increasing time lags normalized by the (lag zero) covariance. Each term contributes a roughly equal statistical error but a signal that decays as the lag time increases. Therefore, in practice, the terms are weighted by a function called a lag window. The lag window must be tailored to the autocorrelation function, and choosing a suitable lag window is very difficult, as mentioned in Section 2.1.

Reliable estimates of the IAcT are impractical in general. Section 3 reviews the concept of quasi-reliability, which aims to enforce a necessary condition for reliable estimates. Informally, the goal is to ensure good coverage of those “parts” of phase space that has been explored, to reduce the possibility of missing an opening to an unexplored part of phase space. More precisely, for an arbitrary subset of phase space, we ask that the proportion of samples in that subset differ from its expectation by no more than some tolerance

tol

, with, say, 95% confidence. This is shown to be true if the IAcT

τ

for any preobservable

u (z)

satisfies

τ \leq {tol}^{2} N

. The maximum IAcT

τ_{max}

is the greatest eigenvalue of

G = 1 - E + F_{0} {(1 - F_{0})}^{- 1} + F_{0}^{†} {(1 - F_{0}^{†})}^{- 1},

(2)

where

^{†}

denotes the adjoint with respect to the inner product

⟨ v, u ⟩ = \int \bar{v (z)} u (z) ρ (z) d z .

For a reversible propagator, where

F^{†} = F

,

τ_{max}

is 1 less than twice the reciprocal of the spectral gap. However, for an irreversible propagator, it can be much smaller, as demonstrated by a simple example in Section 4.1, or larger as in Section 5. As a practical algorithm, it is suggested to estimate the maximum IAcT by discretizing the space of preobservables. Consideration of a quadratic energy potential suggests using a linear combination of phase space variables (and possibly quadratic terms in these variables).

The idea of seeking the preobservable that maximizes the IAcT is suggested already in Ref. [4], which considers a set of indicator functions as preobservables and uses the greatest IAcT of these to assess sampling thoroughness. In general, maximizing over a linear combination of preobservables can lead to a much larger result than taking the maximum of them individually, due to correlations that might exist between different preobservables. This does not necessarily apply, however, to those considered in Refs. [1,4].

Section 3.1 notes that that typical irreversible propagators, termed “quasi-reversible” here, have a forward transfer operator

F = R \bar{F}

, where each of

\bar{F}

and

R

is reversible and

R^{2} = 1

. For such propagators, the estimation of

τ_{max}

simplifies somewhat.

Theoretical results [5] indicate that adding irreversibility reduces the autocorrelation times of observables. Section 4 gives a couple of very elementary examples illustrating the dramatic increase in

τ_{max}

if an irreversible propagator

F

is replaced by its reversible “counterpart”

\frac{1}{2} (F + F^{†})

.

Discretized Langevin dynamics is a particularly effective general-purpose propagator. Unfortunately, one must specify a value for the damping coefficient

γ

. Section 5 analyzes

τ_{max}

for a quadratic potential and obtains an optimal value for the coefficient, namely, the value

{(3 / 8)}^{1 / 2} = 0.612 \dots

times the critical damping coefficient for lowest frequency

ω_{1}

. This value is such that the lowest frequency mode is moderately underdamped, with higher frequencies increasingly underdamped.

2. Preliminaries

Assuming that

Z_{0}

has probability density

ρ (z)

, the variance of the estimate

{\bar{U}}_{N}

is exactly

Var [{\bar{U}}_{N}] = \frac{1}{N} Var [u (Z)] (1 + 2 \sum_{k = 1}^{N - 1} (1 - \frac{k}{N}) \frac{C (k)}{C (0)})

where the autocovariances

C (k) = E [(u (Z_{0}) - E [u (Z_{0})]) (u (Z_{k}) - E [u (Z_{k})])] .

The limit

N \to + \infty

gives Equation (1) where the integrated autocorrelation time

τ = 1 + 2 \sum_{k = 1}^{+ \infty} \frac{C (k)}{C (0)} .

(3)

As an example of augmenting configuration space, consider

ρ (q, p) \propto exp (- β (V (q)) + \frac{1}{2} p^{T} M^{- 1} p))

, where

p = {[p_{1}, p_{2}, \dots, p_{ν}]}^{T}

. A good propagator for this is the BAOAB integrator [6] for Langevin dynamics, whose equations are

d Q_{t} = M^{- 1} P_{t} d t, d P_{t} = F (Q_{t}) d t - γ P_{t} d t + \sqrt{\frac{2 γ}{β}} M_{1 / 2} d W_{t},

(4)

where M is a matrix chosen to compress the range of vibrational frequencies,

F (q) = - \nabla V (q)

,

M_{1 / 2} M_{1 / 2}^{T} = M

, and

W_{t}

is a vector of independent standard Wiener processes. Each step of the integrator consists of the following substeps:

B: $P_{n}^{'} = P_{n - 1} + \frac{1}{2} Δ t F (Q_{n - 1})$ ,
A: $Q_{n}^{'} = Q_{n - 1} + \frac{1}{2} Δ t M^{- 1} P_{n}^{'}$ ,
O: $P_{n}^{″} = exp (- γ Δ t) P_{n}^{'} + \sqrt{1 - exp (- 2 γ Δ t)} β^{- 1 / 2} M_{1 / 2} R_{n}$ ,
A: $Q_{n} = Q_{n}^{'} + \frac{1}{2} Δ t M^{- 1} P_{n}^{″}$ ,
B: $P_{n} = P_{n}^{″} + \frac{1}{2} Δ t F (Q_{n})$ ,

where

R_{n}

is a vector of independent standard Gaussian random variables. The samples generated from this process are shown [7] to be those from a distribution that differs from the correct one by only

O (Δ t^{2})

. The special choice

γ = 1 / (2 Δ t)

is the Euler–Leimkuhler–Matthews integrator [6] for Brownian dynamics with step size

δ t = Δ t^{2} / 2

. Remarkably, the invariant density of this integrator differs from the correct one by only

O (δ t^{2})

. This integrator can be expressed as a Markov chain

Q_{1}^{'} \to Q_{2}^{'} \to \dots \to Q_{N - 1}^{'}

in configuration space, with propagator

$Q_{n} = Q_{n}^{'} + \frac{1}{2} \sqrt{2 δ t} β^{- 1 / 2} M_{1 / 2}^{- T} R_{n}$ ,
$Q_{n + 1}^{'} = Q_{n} + δ t M^{- 1} F (Q_{n}) + \frac{1}{2} \sqrt{2 δ t} β^{- 1 / 2} M_{1 / 2}^{- T} R_{n}$ ,

which is a discretization of Brownian dynamics

d Q_{t} = M^{- 1} F (Q_{t}) d t + \sqrt{\frac{2}{β}} M_{1 / 2}^{- T} d W_{t} .

(5)

The desired samples

{Q_{n}}

are available as part of the process (and, as a theoretical observation, they can be recovered from the Markov chain

{Q_{n}^{'}}

alone, by eliminating

R_{n}

in the two equations above and solving for

Q_{n}

).

For any MCMC propagator, the forward transfer operator is defined so that

u_{n} = F u_{n - 1}

where

u_{n} (z) = ρ_{n} (z) / ρ (z)

and

ρ_{n} (z)

is the probability density for

Z_{n}

. In particular,

F u_{n - 1} (z) = \frac{1}{ρ (z)} \int ρ (z | z^{'}) u_{n - 1} (z^{'}) ρ (z^{'}) d z^{'}

where

ρ (z | z^{'})

is the transition probablity for the chain. The operator

F

has an eigenfunction

φ_{1} (z) \equiv 1

for eigenvalue

λ_{1} = 1

.

A reversible propagator is one that satisfies detailed balance, which means that

ρ (z^{'} | z) ρ (z) = ρ (z | z^{'}) ρ (z^{'}) .

Detailed balance is equivalent to

F^{†} = F

, where the adjoint

F^{†}

is defined by the condition that

⟨ F v, u ⟩ = ⟨ v, F^{†} u ⟩

for arbitrary

u (z)

and

v (z)

. The BAOAB integrator is not reversible as a sampling propagator, except for the special case

γ = 1 / (2 Δ t)

; however, a scheme consisting of a fixed number of BAOAB steps followed by a momenta flip is reversible. The unmodified BAOAB integrator is, however, in a class of “quasi-reversible” integrators introduced in Section 3.1.

As a model problem for Brownian dynamics, given by Equation (5), consider

F (q) = - m ω^{2} q

. (The absence of boldface indicates scalar quantities.) Changing variables

Q_{t} = {(m β)}^{- 1 / 2} Q_{t}^{'}

and dropping the prime gives the simple stochastic differential equation

d Q_{t} = - ω^{2} Q_{t} d t + \sqrt{2} d W_{t},

(6)

where

W (t)

denotes a standard Wiener process. A perfect realization of

Q (t)

at discrete points is obtained by the reversible propagator

Q_{n} = exp (- ω^{2} δ t) Q_{n - 1} + \sqrt{1 - exp (- 2 ω^{2} δ t)} \frac{1}{ω} R_{n} .

(7)

The probablity density

ρ (q, t)

for

Q (t)

satisfies the Fokker–Planck equation

(\partial / \partial t) ρ = (\partial / \partial q) (ω^{2} q ρ) + (\partial^{2} / \partial q^{2}) ρ

. Writing

ρ (q, t) = u (q, t) ρ (q)

gives

(\partial / \partial t) u = - ω^{2} q (\partial / \partial q) u + (\partial^{2} / \partial q^{2}) u

. The operator on the right-hand side has eigenfunctions

u (q) = {He}_{k} (ω q)

with eigenvalues

- k ω^{2}

for

k = 0, 1, \dots

. The modified version of the Hermite polynomial of degree k is given by

{He}_{k} (x) = {(- 1)}^{k} exp (x^{2} / 2) (d^{n} / d x^{n}) exp (- x^{2} / 2)

. The forward transfer operator

F

for the propagator defined by Equation (7) has these same eigenfunctions and has eigenvalues

λ_{k + 1} = exp (- k ω^{2} δ t)

for

k = 1, 2, \dots

. The spectral gap is

1 - exp (- ω^{2} δ t)

. In the multidimensional case with normal mode frequencies

0 < ω_{1} \leq ω_{2} \leq \dots \leq ω_{ν}

, the time step

δ t

is some fraction, say

\frac{1}{2}

, of

1 / ω_{ν}^{2}

and the spectral gap is very nearly

\frac{1}{2} ω_{1}^{2} / ω_{ν}^{2}

. To see the applicability to practical numerical integrators, consider the Euler–Leimkuhler–Matthews discretization of Equation (6):

Q_{n}^{'} = (1 - ω^{2} δ t) Q_{n - 1}^{'} + (1 - \frac{1}{2} ω^{2} δ t) \sqrt{2 ω^{2} δ t} \frac{1}{ω} R_{n} .

(8)

Comparing Equations (8) and (7) and equating the coefficients of the two terms on their right-hand sides enables Equation (8) to be written in the form of Equation (7) with modified values for

ω

and

δ t

. In particular, the modified value of

exp (- ω^{2} δ t)

is

1 - ω^{2} δ t

, so spectral gap is exactly

ω^{2} δ t

. In the multidimensional case, where

ω_{1}^{2} δ t ≪ 1

, the spectral gap for the discrete stochastics is very nearly that of the continuous stochastics.

2.1. Estimating Integrated Autocorrelation Time

Suggested [8] as a covariance estimate is the quantity

C_{N} (k) = \frac{1}{N} \sum_{n = 0}^{N - k - 1} (u (z_{n}) - {\bar{U}}_{N}) (u (z_{n + k}) - {\bar{U}}_{N}),

with justification in Ref. [9] (pp. 323–324).

The use of all possible terms

C_{N} (k) / C_{N} (0)

in Equation (3) to estimate the IAcT does not converge in the limit

N \to + \infty

, so, in practice, a lag window

w (k)

is used to increasingly damp terms as the noise to signal ratio increases:

τ \approx 1 + \sum_{k = 1}^{N - 1} w (k) \frac{C_{N} (k)}{C_{N} (0)} .

An interesting algorithm called acor for estimating the IAcT is available on the web [10]. Estimating the IAcT can be quite difficult, and acor can give unsatisfactory results. An attempt to improve it [1] is at best marginally successful. For reversible methods, there are properties of the autocorrelation function that may be useful for improving estimates of it [8].

3. Quasi-Reliable Estimates of Sampling Thoroughness

The idea of quasi-reliability is to require that the sample size N be large enough that, with say 95% confidence, the estimated probability of any subset

Ω

of phase space differs by no more than

tol

from its correct value. More specifically, for any subset

Ω

of phase space, an estimate

{\bar{1_{Ω}}}_{, N}

of

E [1_{Ω} (Z)] = Pr (Z \in Ω)

must satisfy

Var [{\bar{1_{Ω}}}_{, N}] \leq \frac{1}{4} {tol}^{2} .

Because

Var [{\bar{1_{Ω}}}_{, N}] \approx τ_{Ω} \frac{1}{N} Var [1_{Ω} (Z)] \leq \frac{1}{4 N} τ_{Ω},

where

τ_{Ω}

is the IAcT for

1_{Ω}

, it is enough that

\frac{1}{4 N} τ_{Ω} \leq \frac{1}{4} {tol}^{2} for all Ω .

(9)

For molecular simulations, this requires that only those conformations or clusters of conformations having a probability greater than

tol

be sampled.

In practice, molecular systems have many symmetries, which dramatically reduces the amount of sampling needed. For example, water molecules are generally considered interchangeable as are many subsets of atoms on a given molecule, e.g., the 2 hydrogen atoms of any water molecule. More formally, there are permutations P of the variables

z

such that

ρ (P z) = ρ (z)

and that

P^{- 1} F P = F

where

(P u) (z) = u (P z)

. For such symmetries P, the quasi-reliablity requirement considers only those

Ω

for which

1_{Ω} (P z) = 1_{Ω} (z)

.

It is helpful to express the IAcT in terms of the forward transfer operator. It can be shown that

E [v (Z_{0}) u (Z_{k})] = ⟨ F^{k} v, u ⟩ .

and, in particular,

\begin{matrix} C (k) & = & E [u (Q_{0}) u (Q_{k})] - E [u (Q_{0})] E [u (Q_{k})] = ⟨ F^{k} u, u ⟩ - ⟨ E u, u ⟩ \\ = & \{\begin{matrix} ⟨ (1 - E) u, u ⟩, & for k = 0, \\ ⟨ F_{0}^{k} u, u ⟩, & for k \geq 1, \end{matrix} \end{matrix}

(10)

using the fact that

E F_{0} = F_{0} E = 0

. The integrated autocorrelation time can be rewritten as

τ = \frac{C (0) + 2 \sum_{k = 1}^{+ \infty} C (k)}{C (0)} = \frac{⟨ (1 - E + 2 \sum_{k = 1}^{+ \infty} F_{0}^{k}) u, u ⟩}{⟨ (1 - E) u, u ⟩} = \frac{⟨ G u, u ⟩}{⟨ (1 - E) u, u ⟩}

where

G = 1 - E + F_{0} {(1 - F_{0})}^{- 1} + {(F_{0} {(1 - F_{0})}^{- 1})}^{†},

which is a self-adjoint operator.

For a reversible propagator, for which

F

and hence

F_{0}

is self adjoint, an arbitrary preobservable u is in many cases expressible as a linear combination of the eigenfunctions

φ_{k} (z)

of

F_{0}

, corresponding to eigenvalues

1 > λ_{2} \geq λ_{3} \geq \dots > - 1

. (For a more rigorous treatment, see Ref. [8] (Section 2).) The IAcT for u is then simply a weighted average, of the values

1 + \frac{2 λ_{k}}{1 - λ_{k}} = \frac{1 + λ_{k}}{1 - λ_{k}}

with weights

{⟨ φ_{k}, u ⟩}^{2} / (⟨ φ_{k}, φ_{k} ⟩ ⟨ u, u ⟩)

. This is maximized for

u = φ_{2}

, since

(1 + λ_{2}) / (1 - λ_{2})

is the largest of these values. Note that, for

λ_{2}

negative,

τ

could be much less than 1. Having

τ < 1

may appear paradoxical until it is recognized that Equation (1) is simply an asymptotic approximation for

N \to + \infty

.

For the simple example with

F (q) = - ω^{2} q

, the eigenfunction

φ_{2} (q) = \sqrt{2} ω q

. The indicator function

1_{Ω}

that is richest in

φ_{2} (q)

is the one with

Ω = [0, + \infty]

, for which the first weight is

2 / π

. This means that the IAcT for

1_{Ω}

is at least

2 / π

of the maximum IAcT. For a multimodal distribution, the eigenfunction

φ_{2} (q)

corresponding to the subdominant eigenvalue

λ_{2}

[2,3] is closer than is the function

\sqrt{2} ω q

to an indicator function modulo a constant. Therefore, little is lost and simplicity is gained, if we use the maximun IAcT over all observables satisfying the symmetries instead of just indicator functions:

τ_{max} = \sup_{u \in W} \frac{⟨ G u, u ⟩}{⟨ (1 - E) u, u ⟩}

(11)

where

W = {u = u (z) | u (P z) = u (z) for symmetries P}

is our set of preobservables. This can be simplified to

τ_{max} = \sup_{u \in W} \frac{⟨ G u, u ⟩}{⟨ u, u ⟩} .

(12)

To see this, note that the same supremum is obtained for both objective functions if the function u is restricted so that

E u = 0

and that for such a function u the two objective functions are equal.

For the simple example of Equation (8), the IAcT is maximized by

u (q) = q

. In the case of a multidimensional Gaussian distribution, the maximum occurs for linear combination

a^{T} q

of

q

where

a

is the eigenvector of the Hessian of

V (q)

corresponding to its smallest eigenvalue

ω_{1}^{2}

. For multimodal distributions in one dimension, it appears that the maximizing preobservable

φ_{2} (q)

is qualitatively similar to q in the sense that it is monotonic [2,3].

As is customary when seeking an unknown function, one considers a finite linear combination

u (q) = a^{T} u (q)

of basis functions

u_{i} \in W

where the

a_{i}

are chosen to maximimize the IAcT. The number of basis functions is limited by computational cost. The result of Section 5 hints that the ideal number might be

O (ν^{2})

. From Equation (10), the autocovariance for such

u (q)

is

C (k) = a^{T} C_{k} a

where

C_{k} = \{\begin{matrix} ⟨ (1 - E) u, u^{T} ⟩, & for k = 0, \\ ⟨ F_{0}^{k} u, u^{T} ⟩, & for k \geq 1 . \end{matrix}

(13)

Consequently,

τ_{max} \approx max_{a} \frac{a^{T} K a}{a^{T} C_{0} a} where K = C_{0} + 2 \sum_{k = 1}^{+ \infty} C_{k},

(14)

which is a solution of the generalized eigenvalue problem

\frac{1}{2} (K + K^{T}) a = C_{0} a τ .

(15)

The approximation of Equation (14) improves with the number of basis functions.

Without information about the actual distribution

ρ (z)

, a general choice for basis functions might be linear polynomials, which are the “subdominant” eigenfunctions for a Gaussian distribution. Simple examples in Ref. [1] (consisting of a mixture of two Gaussians, a one-node neural “network”, and logistic regression) demonstrate that the use of linear basis functions can yield an IAcT much greater than that of a preobservable of “interest”. For molecular simulation, it is clear that instead of atomic coordinates, a better choice of a basis function is the distance between two atoms, each of which is uniquely distinguishable. In particular, for a protein, one might choose

α

-carbons distributed along the backbone chain of a protein. For further suggestions, consult Ref. [11], which considers the automatic construction of indicator functions for estimating

τ_{max}

, based on the dynamics of the propagator.

3.1. Quasi-Reversible Propagators

As stated above, the BAOAB integrator would be reversible if the final B substep were modified to include a momenta flip, i.e.,

BR: $P_{n} = - (P_{n}^{″} + \frac{1}{2} Δ t F (Q_{n}))$ .

However, reversing direction runs counter to intuition. Additionally, empirical evidence [12] favors irreversible propagators. If the flipped BAOAB integrator is followed by another momenta flip, the result is the original irreversible BAOAB integrator. More generally, a sampler might be said to be “quasi-reversible” if its forward transfer operator

F = R \bar{F}

where each of

\bar{F}

and

R

is reversible and

R^{2} = 1

. For a momenta flip, the operator

R

is defined by

(R u) (q, p) = u (q, - p)

.

Quasi-reversible propagators are special in that their covariance matrices satisfy a special property if basis functions

u_{i}

are chosen so that

R u = D u where D = diag (I^{'}, - I^{″})

and

I^{'}

and

I^{″}

are identity matrices of possibly different dimension. Such basis functions are easy to construct since any function u can be written as a sum of its “even part”

\frac{1}{2} (u + R u)

and its “odd part”

\frac{1}{2} (u - R u)

. In particular, using the fact that

R E = E R

,

\begin{matrix} C_{k} & = & ⟨ F^{k} u, u^{T} ⟩ - ⟨ E u, u^{T} ⟩ = ⟨ u, {(\bar{F} R)}^{k} u^{T} ⟩ - ⟨ u, E u^{T} ⟩ \\ = & ⟨ R u, {(R \bar{F})}^{k} R u^{T} ⟩ - ⟨ R u, R E u^{T} ⟩ = ⟨ D u, F^{k} u^{T} D ⟩ - ⟨ D u, E u^{T} D ⟩ = D C_{k}^{T} D . \end{matrix}

The matrix

C_{k}

thus partitions into four blocks. The symmetric part of

C_{k}

consists of the two diagonal blocks, and the skew-symmetric part consists of the two off-diagonal blocks. Empirical estimates of

C_{k}

lack these symmetry properties. The expected symmetries provides twice as many samples for estimating the sampling error in the off-diagonal elements. Additionally, since the IAcT requires only the symmetric part of

C_{k}

, it is unnecessary to compute the off-diagonal blocks, and the eigenvalue problem decouples into two smaller problems. In addition, it follows that the maximizing linear combination is either a linear combination of the even functions or of the odd functions. For molecular dynamics at least, velocities “relax” much more quickly than positions, so it seems that the long time scales are present only in the position coordinates, so only the

(1, 1)

block might be computed.

4. Irreversible Samplers and Their Superiority

Uniform sampling—if it could be used—converges like

O (1 / N)

, which is superior to the

O (1 / \sqrt{N})

convergence of random sampling. It would be desirable to combine them by promoting near-uniform sampling within layers of near-constant energy and using diffusion to move among energy layers.

4.1. A Very Simple Example

The following very simple example is a discrete analog of Example 2.8 of Ref. [5] and similar to an example of Ref. [13]. Assuming probabilities are represented as row vectors, let

F

be an n by n probability transition matrix given by

F_{i j} = \{\begin{matrix} θ, & if i = j, \\ 1 - θ, & if i = j + 1 \mod n, \\ 0, & otherwise, \end{matrix}

where

0 < θ < 1

. The stationary probability vector is

(1 / n) [1, 1, \dots, 1]

. The inner product for two row vectors

u^{T}

,

v^{T}

is

⟨ v^{T}, u^{T} ⟩ = v^{T} u

, and the adjoint of a transition matrix is its transpose. Since

F

is a circulant matrix, both it and its adjoint have as eigenvectors

u_{k}^{T} = [1, ζ^{k - 1}, \dots, {(ζ^{k - 1})}^{n - 1}]

, where

ζ

denotes the nth root of unity

exp (2 π i / n)

. A straightforward, though lengthy, calculation shows that

u_{k}^{T} F_{0} = (θ + (1 - θ) ζ^{k - 1}) u_{k}^{T} and u_{k}^{T} F_{0}^{T} = (θ + (1 - θ) ζ^{1 - k}) u_{k}^{T},

for

k = 2, 3, \dots, n

, and

u_{k}^{T} G = \frac{θ}{1 - θ} u_{k}^{T} .

Therefore,

2 / | 1 - λ_{2} | = 2 / ((1 - θ) | 1 - ζ |) = 1 / ((1 - θ) sin (π / n)),

but

τ_{max} = θ / (1 - θ) .

This a most elementary illustration of the assertion [5] that “the asymptotic variance for every observable can be made as small as desired under large irreversible drift”. If

θ ≪ 1

, the IAcT is likewise much less than 1 despite the very strong correlation between successive values of the state j. The reason for this is that the IAcT is a measure of sample quality not independence. The measure

τ

simply says that N samples are as good as

N / τ

independent random samples. The

F

operator encourages a uniform sampling, which is better than uncorrelated random samples. However, as shown in the example that follows, such a dramatic difference between (one less than twice) the reciprocal of the spectral gap and the IAcT does not hold when sampling is inhibited more by energy than by entropy barriers. A reversible propagator can be formed from this simple example by using instead the symmetric part of its propagator:

F^{rv} = \frac{1}{2} (F + F^{T})

. This time

u_{k}^{T} F_{0}^{rv} = (θ + \frac{1}{2} (1 - θ) (ζ^{k - 1} + ζ^{1 - k})) u_{k}^{T}, for k = 2, 3, \dots, n,

and

u_{k}^{T} G^{rv} = (\frac{4}{(1 - θ) (2 - ζ^{k - 1} - ζ^{1 - k})} - 1) u_{k}^{T} .

Therefore

2 / | 1 - λ_{2} | = 4 / ((1 - θ) (2 - ζ - ζ^{- 1})) = 1 / ((1 - θ) {sin}^{2} (π / n)),

and

τ_{max} = 1 / ((1 - θ) {sin}^{2} (π / n)) - 1 .

Clearly, the irreversible propagator is much better for

n ≫ 1

.

4.2. A Very Simple Example with a Barrier

Consider the

2 n

by

2 n

transition matrix given by

F = [\begin{array}{c} 0 & 1 - ε & ε \\ 1 & 0 \\ ⋱ & ⋱ \\ 1 & 0 \\ ε & 0 & 1 - ε \\ 1 & 0 \\ ⋱ & ⋱ \\ 1 & 0 \end{array}] .

If

ε

were zero, the eigenvalue 1 would have multiplicity 2, with orthogonal eigenvectors given by the vector of all ones and by

f^{T} = [1, 1, \dots, 1, - 1, - 1, \dots, - 1]

. For small

ε > 0

, it is expected that the eigenvector corresponding to the subdominant eigenvalue would be a perturbation of

f^{T}

. Indeed, it happens that

f^{T} F^{n} = (1 - 2 ε) f^{T}

. This also holds for

{(F^{n})}^{T}

, so the IAcT for

F^{n}

is

1 / ε - 1

. This suggests a value of about

n (1 / ε - 1)

for the IAcT of

F

. This is corroborated by numerical computation, as shown in Table 1a.

Table 1.

τ_{max}

is the IAcT for

F

and

τ_{max}^{rv}

is that for

F^{rv} = \frac{1}{2} (F + F^{T})

.

The excess IAcT due to making the propagator reversible is given by Table 1b.It appears to grow quadratically with n, consistent with the diffusive nature of the propagator. Summing pairs of entries from the two tables shows that for this example the advantage of irreversibility depends on the relative importance of entropy barriers to energy barriers, which is consistent with intuition from molecular dynamics.

5. Optimal Langevin Damping for a Model Problem

To analyze the effect of the damping parameter

γ

on the IAcT for Langevin dynamics, given by Equation (4), consider the standard model problem

F (q) = - m ω^{2} q

. Changing variables

Q_{t} = {(β m)}^{- 1 / 2} Q_{t}^{'}

,

P_{t} = β^{- 1 / 2} m^{1 / 2} P_{t}^{'}

, and dropping the primes gives

d Q_{t} = P_{t} d t, d P_{t} = - ω^{2} Q_{t} d t - γ P_{t} d t + \sqrt{2 γ} d W_{t} .

Assume exact integration with step size

Δ t

.

The analysis is rather lengthy, so for the benefit of the reader who wishes to omit it, the discussion and conclusions are given here:

Reaching precise conclusions is difficult for most of the analysis unless $ω Δ t$ is bounded above by some constant of order 1, which seems to be satisfied in practice. This assumption underlies the statements that follow.
The spectral gap is an increasing function of $ω$ , so for a multidimensional quadratic potential energy, the value of $γ$ that maximizes the spectral gap depends on the lowest frequency $ω_{1}$ .
The spectral gap is maximized for $γ \leq 2 ω$ , corresponding to an underdamped system, for which the spectral gap is $ω Δ t + O (Δ t^{2})$ .
The eigenfunctions of the operator $G$ can be partitioned into eigenspaces $P_{k}^{'}$ , $k = 0, 1, \dots$ , where $P_{k}^{'}$ is a linear combination of $k + 1$ specific polynomials of degree k in $ω q$ and p. The greatest eigenvalue of $G$ is $τ_{max} = {max}_{k} τ_{max}^{(k)}$ where $τ_{max}^{(k)}$ is the maximum IAcT over $P_{k}^{'}$ .
Figure 1 shows that, for fixed $Δ t$ and $γ$ , the value $τ_{max}^{(k)}$ is an increasing function of $ω$ , at least for $k = 1, 2, 3, 4$ . Hence, as for the spectral gap, it is the lowest frequency $ω_{1}$ that dictates the maximum $τ$ .

Figure 1. $γ Δ t τ_{max}^{(k)}$ vs. $ω / γ$ for $k = 1, 2, 3, 4$ .
Figure 2 indicates that, for fixed $Δ t$ and $ω$ , the maximizing $τ$ is either $τ_{max}^{(1)}$ or $τ_{max}^{(2)}$ , depending on the value of $γ$ . The optimal damping coefficient is $γ = (\sqrt{6} / 2) ω$ , for which $τ_{max}^{(1)} = τ_{max}^{(2)} = \sqrt{6} / (ω Δ t)$ .

Figure 2. $ω Δ t τ_{max}^{(k)}$ vs. $γ / ω$ for $k = 1, 2, 3, 4$ .
For the preobservable $ω q$ , and, indeed, for any odd polynomial, the IAcT $τ$ becomes arbitrary small as $γ$ goes to zero. This does not mean, however that the variance goes to 0, because Equation (1) is an asymptotic result, and the IAcT is a prefactor for the leading order $1 / N$ term in the expression for the variance. The next order $1 / N^{2}$ term would dominate if $γ$ were very small. An order $1 / N^{2}$ error is characteristic of uniform sampling, which would be the consequence of nearly ballistic movement. In addition, odd polynomials are special in that their expectation is independent of total energy, so it matters not that barely diffusive movement samples different values of total energy only at a very slow rate.
The eigenfunction for $τ_{max}^{(2)}$ is a specific linear combination of $ω^{2} q^{2} - 1$ and $p^{2} - 1$ . For quadratic polynomials, the total energy does affect its expected value, which is why it is necessary that $γ$ be large enough to sample different energies on a reasonable time scale.

5.1. The Forward Transfer Operator

The forward transfer operator is

F u = ρ^{- 1} e^{Δ t L_{K}} (ρ u)

where

ρ (q, p) \propto exp (- \frac{1}{2} p^{2} - \frac{1}{2} ω^{2} q^{2}) .

and

L_{K}

is the Fokker–Planck operator [14] (Equations (10.1)–(10.3), (10.9))

L_{K} f = - p \frac{\partial}{\partial q} f + ω^{2} q \frac{\partial}{\partial p} f + γ \frac{\partial}{\partial p} (p f) + γ \frac{\partial^{2}}{\partial p^{2}} f .

Therefore,

F = e^{Δ t L}

where

L f = \frac{1}{ρ} L_{K} (ρ f) = - p \frac{\partial}{\partial q} f + ω^{2} q \frac{\partial}{\partial p} f - γ p \frac{\partial}{\partial p} f + γ \frac{\partial^{2}}{\partial p^{2}} f .

Using the relation

L f = \frac{1}{ρ} L_{K} (ρ f)

, it is easy to show that the adjoint

L^{†} f = p \frac{\partial}{\partial q} f - ω^{2} q \frac{\partial}{\partial p} f - γ p \frac{\partial}{\partial p} f + γ \frac{\partial^{2}}{\partial p^{2}} f .

5.2. Gamma for Maximum Spectral Gap

Ref. [14] (Chapter 10, Equations (1), (2), (3b), (9), (22), (52), (60), (71), (72), (74), (77), (78), (82) and (83)) gives the eigenvalues of

L_{K}

as

μ_{k, l} = - \frac{1}{2} (k + l) γ - \frac{1}{2} (k - l) δ, k, l = 0, 1, \dots,

with eigenfunctions

ψ_{k, l} = ρ {(q, p)}^{1 / 2} {(k! l! δ^{k + l})}^{- 1 / 2} {(\sqrt{γ_{+}} B - \sqrt{γ_{-}} A)}^{k} {(- \sqrt{γ_{-}} B + \sqrt{γ_{+}} A)}^{l} ρ {(q, p)}^{1 / 2},

where

γ_{\pm} = \frac{1}{2} (γ \pm δ) and δ = \sqrt{γ^{2} - 4 ω^{2}},

with the radical sign denoting the principal square root, and operators

A

,

B

defined by

A f = - ω^{- 1} \frac{\partial}{\partial q} f + \frac{1}{2} ω q f, B f = - \frac{\partial}{\partial p} f + \frac{1}{2} p f .

The operator

L

has eigenfunctions

ρ {(q, p)}^{- 1} ψ_{k, l} (q, p)

and the same eigenvalues;

F

has these same eigenfunctions and has eigenvalues

exp (Δ t μ_{k, l})

. See Ref. [15] for a more general result.

For

k + l = 1

, one has

μ_{0, 1} = - γ_{-}

,

μ_{1, 0} = - γ_{+}

and

φ_{0, 1} = {(γ_{-} / δ)}^{1 / 2} (- p + γ_{+} q) ρ (q, p), φ_{1, 0} = {(γ_{+} / δ)}^{1 / 2} (p - γ_{-} q) ρ (q, p) .

In the case

γ_{\pm} = \frac{1}{2} γ

,

F

has a double eigenvalue

λ_{2} = λ_{3} = e^{- ω Δ t}

but one eigenfunction

φ_{2} (q, p) = p - ω q

and one generalized eigenfunction

φ_{3} (q, p) = q

, satisfying

(F - λ_{2}) φ_{3} = φ_{2}

. This results in behavior involving a linear combination of

e^{- ω t}

and

t e^{- ω t}

.

The eigenvalue nearest 1 depends on

Δ t

. For small enough

Δ t

, it is

exp (- γ_{-} Δ t)

, and the spectral gap is

ω Δ t + O (Δ t^{2})

for

γ \leq 2 ω

and

ω Δ t (2 ω / γ) {(1 + {(1 - {(2 ω / γ)}^{2})}^{1 / 2})}^{- 1} + O (Δ t^{2})

otherwise. To maximize the spectral gap, choose

γ

to be no greater than

2 ω

, corresponding to an underdamped system.

5.3. Gamma for Maximum IAcT

To obtain

τ_{max}

, one needs eigenelements for

G

, given by Equation (2) with

F = exp (Δ t L)

. Note that the subspace of polynomials of degree

\leq k

is closed under application of

L

and

L^{†}

. Moreover, this holds separately for subspace

P_{k}

of odd polynomials of degree

\leq k

for k odd and for subspace

P_{k}

of even polynomials of degree

\leq k

for k even. It also applies to operators

E

,

F

,

F^{†}

, and

G

. Hence, the

P_{k}

are eigenspaces for

G

. These eigenspaces can be further decomposed as follows. Let

P_{k}^{'} = P_{k}

for

k < 2

, and let

P_{k}^{'} = P_{k} \cap P_{k - 2}^{⊥}

for

k \geq 2

. The claim is that

P_{k}^{'}

is an eigenspace, of dimension

k + 1

. To confirm this, it is enough to show that

⟨ G u, v ⟩ = 0

for any

u \in P_{k}

,

v \in P_{k - 2}

, which follows almost immediately, since

G^{†} = G

and

G v \in P_{k - 2}

. Let

u

be a basis for

P_{k}^{'}

chosen so there are even functions of p followed by odd functions of p. In particular,

\begin{matrix} u & = & {[ω q, p]}^{T}, \\ u & = & {[ω^{2} q^{2} - 1, ω q p, p^{2} - 1]}^{T}, \\ u & = & {[ω^{3} q^{3} - 3 ω q, (ω^{2} q^{2} - 1) p, ω q (p^{2} - 1), p^{3} - 3 p]}^{T}, \\ u & = & {[ω^{4} q^{4} - 6 ω^{2} q^{2} + 3, (ω^{3} q^{3} - 3 ω q) p, (ω^{2} q^{2} - 1) (p^{2} - 1), ω q (p^{3} - 3 p), p^{4} - 6 p + 3]}^{T}, \end{matrix}

for

k = 1, 2, 3, 4

, respectively. The polynomials in

ω q

and in p are modified versions

{He}_{k}

of Hermite polynomials. We have

L u = A u

for some matrix of constants A, given by

A = [\begin{matrix} 0 & - k ω \\ ω & - γ & ⋱ \\ ⋱ & ⋱ & - ω \\ k ω & - k γ \end{matrix}] .

(16)

(cf. Ref. [14], Equations (10.96) and (10.97) for

L_{K}

.) We have

E u = 0

and

F_{0} u = F u = exp (Δ t A) u

. Therefore, from Equations (13) and (14), one has

C_{k} = ⟨ F_{0}^{k} u, u^{T} ⟩ = exp (k Δ t A) ⟨ u, u^{T} ⟩,

and

K = coth (- \frac{1}{2} Δ t A) ⟨ u, u^{T} ⟩ = coth (- \frac{1}{2} Δ t A) C_{0} .

For each value of k,

τ_{max}

is the maximum eigenvalue of

\frac{1}{2} (C_{0}^{- 1} coth (- \frac{1}{2} Δ t A) C_{0} + coth {(- \frac{1}{2} Δ t A)}^{T}) .

(17)

As explained in Section 3.1, if the basis functions are re-ordered so that those that are even functions of p precede those that are odd functions of p, the eigenvalue problem splits into two nearly equal parts.

For

k = 1

,

A = X Λ X^{- 1}

where

Λ = diag (- γ_{-}, - γ_{+})

and

X = [\begin{matrix} ω & ω \\ γ_{-} & γ_{+} \end{matrix}] .

The covariance matrix

C_{0} = ⟨ u, u^{T} ⟩ = diag (1, 1)

, and matrix (17) is

diag (τ_{+}, τ_{-})

where

τ_{\pm} = \frac{1}{δ} (\pm γ_{\pm} coth (\frac{1}{2} Δ t γ_{-}) \mp γ_{\mp} coth (\frac{1}{2} Δ t γ_{+})) .

Using the identity

coth (x \pm y) = (sinh 2 x \mp sinh 2 y) / (cosh 2 x - cosh 2 y)

, one gets that the eigenvalues of matrix (17) are

τ_{\pm} = \frac{sinh (\frac{1}{2} Δ t γ) \pm γ sinh (\frac{1}{2} Δ t δ) / δ}{cosh (\frac{1}{2} Δ t γ) - cosh (\frac{1}{2} Δ t δ)} .

The greater of

τ_{\pm}

is

τ_{+}

, and it is a straightforward exercise to show that

τ_{+}

is an decreasing function of

ω

as long as

{(ω Δ t)}^{2} \leq π^{2} + {(γ Δ t / 2)}^{2} .

(18)

In practice, a numerical integrator is used, whose step size is chosen so that

Δ t L = θ

where

θ

is some fractional value, e.g.,

\frac{1}{2}

, and L is the magnitude of largest eigenvalue of the Jacobian matrix of the right-hand side of the system of (first-order) stochastic differential equations. In particular,

ω Δ t = θ if γ \leq 2 ω;

(19)

otherwise, it holds that

ω Δ t < γ Δ t / 2

. Hence, Inequality (18) is more than satisfied. It is more complicated to analyze the behavior of values of

τ

for

k > 1

, so we exploit the smallness of

ω Δ t

and

γ Δ t

to do an asymptotic analysis. From Expression (17), one sees that for

P_{k}^{'}

, its

τ_{max}

equals

τ_{max}^{(k)} + O (Δ t)

where

τ_{max}^{(k)}

is the greatest eigenvalue of

- \frac{1}{Δ t} (C_{0}^{- 1} A^{- 1} C_{0} + A^{- T}) .

(20)

For

k = 1

, one has

C_{0} = diag (1, 1)

and

τ_{max}^{(1)} = 2 γ / (ω^{2} Δ t)

, corresponding to the eigenfunction

ω q

. For

k = 2

, one has

A^{- 1} = \frac{1}{2 γ ω^{2}} [\begin{matrix} - γ^{2} - ω^{2} & 2 γ ω & - ω^{2} \\ - γ ω & 0 & 0 \\ - ω^{2} & 0 & - ω^{2} \end{matrix}],

C_{0} = diag (2, 1, 2)

, and the largest eigenvalue for Expression (20) is

τ_{max}^{(2)} = \frac{1}{2 Δ t} (\frac{γ}{ω^{2}} + \frac{2}{γ} + \sqrt{\frac{γ^{2}}{ω^{4}} + \frac{4}{γ^{2}}}),

corresponding to an eigenfunction that is some linear combination of

ω^{2} q^{2} - 1

and

p^{2} - 1

. Again

τ_{max}^{(k)}

is a decreasing function of

ω

for small enough

Δ t

. For general k, the size of the elements of A in Equation (16) increase with

ω

, which suggests that for practical values of

Δ t

, the largest eigenvalue for Expression (20) decreases with

ω

. This can be confirmed numerically. To reduce the number of parameters, write

A = - ω B (γ / ω)

, and rewrite Expression (20) as

\frac{1}{ω Δ t} (C_{0}^{- 1} B {(γ / ω)}^{- 1} C_{0} + B {(γ / ω)}^{- T}) .

(21)

The value

τ_{max}^{(k)}

is the largest eigenvalue of Matrix (21), so

γ Δ t τ_{max}^{(k)}

is a function only of the ratio

ω / γ

. Figure 1 plots this quantity for

k = 1, 2, 3, 4

, confirming that

τ_{max}^{(k)}

is decreasing function of

ω

for fixed values of

γ

and

Δ t

.

Therefore, for a multidimensional quadratic potential,

τ_{max}

is very likely to be determined by the lowest frequency mode. (This is readily established for the spectral gap for reasonable values of

Δ t

.)

Having established that the lowest frequency mode is most likely responsible for

τ_{max}

, let

ω

denote the lowest frequency

ω_{1}

, and consider the optimal choice of

γ

. From Matrix (21), observe that

ω Δ t τ_{max}^{(k)}

is a function only of

γ / ω

. Accordingly, we choose

γ / ω

to minimize

{max}_{k} ω Δ t τ_{max}^{(k)}

. However,

Δ t

may depend on

γ

, but only if

γ > 2 ω_{ν}

, according to Equation (19). Let us initially ignore this possibility and revisit it after determining the optimal

γ

.

Assuming that the maximum of

ω Δ t τ_{max}^{(k)}

is attained for

k = 1

or

k = 2

, there are two possible locations for the best

γ

. One is the choice

γ = \sqrt{2} ω

, for which

ω Δ t τ_{max}^{(2)} = 1 + \sqrt{2}

and

ω Δ t τ_{max}^{(1)} = 2 \sqrt{2}

. The other is the choice

γ = (\sqrt{6} / 2) ω

where

ω Δ t τ_{max}^{(k)} = \sqrt{6}

for both

k = 1

and

k = 2

. The latter is the better choice. This and the assumption that the maximum occurs for

k < 3

is supported by Figure 2, which plots

τ_{max}

as a function of

γ

for fixed values of

ω

and

Δ t

. Since

(\sqrt{6} / 2) ω_{1} < 2 ω_{ν}

, we see from Equation (19) that there is no concern about

Δ t

limiting the value of

γ

.

Note that for the optimal

γ

, the dynamics is underdamped in every mode.

6. Discussion and Conclusions

It is asserted in the literature [1,4] that, for comparing MCMC propagators and for ensuring reasonably reliable simulation results, it is of great value to estimate the maximum autocorrelation time over all possible observables. Certainly, this would seem desirable for comparing propagators, since it is best to assume as little as possible about the actual observables that are to be estimated from the Markov chain. In addition, the longest IAcT would be more reliably estimated than any other IAcT, because its accurate estimation would call for a greatest number of samples; moreover, the accuracy of estimates of other IAcTs might be influenced by the longest IAcT. Unfortunately, estimating IAcTs and assessing such estimates is a difficult task, which can benefit greatly from further research.

In this article, it is suggested that the maximum IAcT be estimated by considering an arbitrary linear combination of basis functions chosen to capture the lowest frequency motions of the dynamics of the MCMC chain. It is also worth considering the approach based on a set of well chosen indicator functions [11].

Another quantity for measuring performance of an MCMC propagator is the spectral gap. It is related to equilibration time, but as a predictor of IAcTs, it can be arbitrarily too pessimistic or it can be too optimistic. Such is the case for irreversible propagators, especially for energy landscapes where entropy barriers dominate energy barriers.

Shorter integrated autocorrelation times are thus a primary consideration in the design of MCMC propagators. Simple examples are constructed to explain how irreversibility might help, though their relevance to practical propagators is unclear. Nonetheless, it is advantageous not to insist on reversibility. In particular, the ballistic component of propagators like hybrid Monte Carlo and Langevin dynamics may offer dramatic speedups for overcoming entropy barrriers, though no advantage for energy barriers.

Irreversible propagators typically have a parameter that determines the extent of diffusive behavior. For discrete Langevin dynamics applied to a multidimensional quadratic potential, the optimal value corresponds to slightly underdamped dynamics for the lowest frequency mode, with other modes experiencing even lighter dampling. However, this conclusion is not necessarily indicative of the optimal damping for the usual situation, in which there a multiplicity of energy minima, and further studies are needed.

In a nutshell, general-purpose sampling benefits greatly from the use of irreversible propagators in extended state space, with “diffusion parameter(s)” chosen to minimize the maximum integrated autocorrelation time, which for a quadratic potential energy corresponds to moderately underdamped dynamics in the lowest frequency mode.

Author Contributions

Youhan Fang performed the analysis for Section 5.2 and the computation for Section 5.3. Robert D. Skeel did the analysis for Section 3.1, Section 4, Section 5.1 and Section 5.3 and wrote the paper. Both authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

Fang, Y.; Cao, Y.; Skeel, R. Quasi-Reliable Estimates of Effective Sample Size. arXiv, 2017; arXiv:1705.03831v2. [Google Scholar]
Schütte, C.; Huisinga, W. Biomolecular Conformations as Metastable Sets of Markov Chains. In Proceedings of the 38th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 4–6 October 2000; pp. 1106–1115. [Google Scholar]
Schütte, C.; Sarich, M. Metastability and Markov State Models in Molecular Dynamics; American Mathematical Society: Providence, RI, USA, 2013. [Google Scholar]
Lyman, E.; Zuckerman, D.M. On the Structural Convergence of Biomolecular Simulations by Determination of the Effective Sample Size. J. Phys. Chem. B 2007, 111, 12876–12882. [Google Scholar] [CrossRef] [PubMed]
Rey-Bellet, L.; Spiliopoulos, K. Irreversible Langevin Samplers and Variance Reduction: A Large Deviations Approach. Nonlinearity 2015, 28, 2081–2104. [Google Scholar] [CrossRef]
Leimkuhler, B.; Matthews, C. Rational Construction of Stochastic Numerical Methods for Molecular Sampling. Appl. Math. Res. Express 2013, 2013, 34–56. [Google Scholar] [CrossRef]
Leimkuhler, B.; Matthews, C.; Stoltz, G. The Computation of Averages from Equilibrium and Nonequilibrium Langevin Molecular Dynamics. IMA J. Numer. Anal. 2016, 36, 13–79. [Google Scholar] [CrossRef]
Geyer, C.J. Practical Markov Chain Monte Carlo. Stat. Sci. 1992, 7, 473–511. [Google Scholar] [CrossRef]
Priestly, M.B. Spectral Analysis and Time Series; Academic Press: Cambridge, MA, USA, 1981. [Google Scholar]
Goodman, J. Acor, Statistical Analysis of a Time Series. 2009. Available online: http://www.math.nyu.edu/faculty/goodman/software/acor/ (accessed on 4 August 2014).
Zhang, X.; Bhatt, D.; Zuckerman, D.M. Automated Sampling Assessment for Molecular Simulations Using the Effective Sample Size. J. Chem. Theory Comput. 2010, 6, 3048–3057. [Google Scholar] [CrossRef] [PubMed]
Cancés, E.; Legoll, F.; Stoltz, G. Special Issue on Molecular Modelling: Theoretical and Numerical Comparison of Some Sampling Methods for Molecular Dynamics. ESAIM Math. Model. Numer. Anal. 2007, 41, 351–389. [Google Scholar] [CrossRef]
Diaconis, P.; Holmes, S.; Neal, R.M. Analysis of a Nonreversible Markov Chain Sampler. Ann. Appl. Probab. 2000, 10, 726–752. [Google Scholar] [CrossRef]
Risken, H. The Fokker-Planck Equation. Methods of Solution and Applications; Springer Series in Synergetics; Springer: New York, NY, USA, 1984; Volume 18. [Google Scholar]
Kozlov, S. Effective diffusion for the Fokker-Planck equation. Math. Notes 1989, 45, 360–368. [Google Scholar] [CrossRef]

Figure 1.

γ Δ t τ_{max}^{(k)}

vs.

ω / γ

for

k = 1, 2, 3, 4

.

Figure 2.

ω Δ t τ_{max}^{(k)}

vs.

γ / ω

for

k = 1, 2, 3, 4

.

Table 1.

τ_{max}

is the IAcT for

F

and

τ_{max}^{rv}

is that for

F^{rv} = \frac{1}{2} (F + F^{T})

.

Table 1.

τ_{max}

is the IAcT for

F

and

τ_{max}^{rv}

is that for

F^{rv} = \frac{1}{2} (F + F^{T})

.

(a) $τ_{\max}$
$n ∖ ε$	0.1	0.01	0.001
10	90	990	9990
100	900	9900	99,900
1000	9000	99,000	999,000
(b) $τ_{\max}^{rv} - τ_{\max}$
$n ∖ ε$	0.1	0.01	0.001
10	35	33	33
100	3910	3511	3355
1000	403,612	389,921	351,047

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Comparing Markov Chain Samplers for Molecular Simulation

Abstract

1. Summary

2. Preliminaries

2.1. Estimating Integrated Autocorrelation Time

3. Quasi-Reliable Estimates of Sampling Thoroughness

3.1. Quasi-Reversible Propagators

4. Irreversible Samplers and Their Superiority

4.1. A Very Simple Example

4.2. A Very Simple Example with a Barrier

5. Optimal Langevin Damping for a Model Problem

5.1. The Forward Transfer Operator

5.2. Gamma for Maximum Spectral Gap

5.3. Gamma for Maximum IAcT

6. Discussion and Conclusions

Author Contributions

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics