physics0508095/man.tex
1: \documentclass[prb,twocolumn,showpacs,showkeys,preprintnumbers,amsmath,amssymb]{revtex4}
2: %\documentclass[prb,preprint,showpacs,showkeys,preprintnumbers,amsmath,amssymb]{revtex4}
3: 
4: \usepackage{bm} %bold math
5: \usepackage{graphicx} %figures
6: \bibliographystyle{apsrev}
7: 
8: \newcommand{\df}{$\Delta F$}
9: \newcommand{\dfrs}{$\Delta F_{\rm ref \rightarrow phys}$}
10: \newcommand{\dfsr}{$\Delta F_{\rm phys \rightarrow ref}$}
11: 
12: \begin{document}
13: 
14: \title{Simple estimation of absolute free energies for biomolecules}
15: \author{ F.\ Marty Ytreberg\footnote{E-mail: fmy1@pitt.edu} }
16: \author{ Daniel M.\ Zuckerman\footnote{E-mail: dmz@ccbb.pitt.edu} }
17: \affiliation{Department of Computational Biology,
18:   School of Medicine, University of Pittsburgh, Pittsburgh, PA 15261}
19: \date{\today}
20: 
21: \begin{abstract}
22: One reason that free energy difference calculations are notoriously
23: difficult in molecular systems
24: is insufficient conformational overlap, or similarity, between
25: the two states or systems of interest. The degree of overlap is irrelevant,
26: however, if the absolute free energy of each state can be computed. We present
27: a method for calculating the absolute free energy
28: that employs a simple construction of an exactly computable
29: reference system which
30: possesses high overlap with the state of interest. The approach
31: requires only a physical ensemble of conformations generated via
32: simulation, and an auxiliary
33: calculation of approximately equal central-processing-unit (CPU) cost.
34: Moreover, the calculations can converge to the correct free energy value
35: even when the physical ensemble is incomplete or improperly distributed.
36: As a ``proof of principle,''
37: we use the approach to correctly predict free energies for
38: test systems where the absolute values can be calculated
39: exactly, and also to predict the
40: conformational equilibrium for leucine dipeptide in 
41: implicit solvent.
42: \end{abstract}
43: \keywords{free energy,entropy}
44: \pacs{pacs}
45: \maketitle
46: 
47: \section{Introduction}
48: Knowledge of the free energy for two different
49: states or systems of interest
50: allows the calculation of solubilities,
51: \cite{grossfield-jacs,vangunsteren-onestep}
52: determines binding affinities of ligands to proteins,
53: \cite{kollman-pnas,vangunsteren-estrogen}
54: and determines conformational equilibria
55: (e.g., Ref.\ \onlinecite{ytreberg-shift}).
56: Free energy differences (\df) therefore have potential
57: application in structure-based drug design where current
58: methods rely on {\it ad hoc} protocols to estimate binding affinities.
59: \cite{shoichet-nature,scheraga}
60: 
61: Poor ``overlap,'' the lack of configurational
62: similarity between the two states or systems of interest,
63: is a key cause of computational expense and error in \df\ calculations.
64: The most common approach to improve overlap in free energy
65: calculations (used in thermodynamic integration and free energy
66: perturbation) is
67: to simulate the system at multiple hybrid, or intermediate, stages
68: (e.g., Refs.\ \onlinecite{zwanzig,beveridge,jorgensen,karplus-jcp,mccammon}).
69: However, the simulation of intermediate stages
70: greatly increases the computational cost of the \df\ calculation.
71: 
72: Here, we address the overlap problem by calculating the absolute free
73: energy for each of the end states, thus avoiding the need for any
74: configurational overlap. Our method relies on the calculation of
75: the free energy difference between
76: a reference system (where the exact free energy
77: can be calculated, either analytically or numerically)
78: and the system of interest.
79: 
80: A reference system with a computable free energy
81: has been used successfully for solids,
82: where the reference system is generally a harmonic or
83: Einstein solid, \cite{hoover71,frenkel}
84: and for liquid systems, where the reference
85: system is usually an ideal gas. \cite{hoover67,reinhardt-absf}
86: The scheme has also been applied to molecular
87: systems by Stoessel and Nowak, using a harmonic
88: solid in Cartesian coordinates as a reference system. \cite{stoessel}
89: 
90: Other approaches to calculate the absolute free energies of
91: molecules have been developed.
92: Meirovitch and collaborators calculated
93: absolute free energies for peptides in vacuum
94: and for liquid argon and water using the hypothetical
95: scanning method. \cite{meirovitch-deca,meirovitch-argon}
96: Computational cost has thus far limited the approach to
97: peptides with sixty degrees of freedom. \cite{meirovitch-jcp}
98: The ``mining minima'' approach, developed by Gilson and collaborators,
99: estimates the absolute free energy of complex molecules
100: by attempting to enumerate the low-energy conformations 
101: and estimating the contribution to the configurational integral
102: for each. \cite{gilson-jpca,gilson-bj}
103: Anharmonic effects can be included. \cite{gilson-jacs}
104: The mining minima method can, in principle, include potential
105: correlations between the torsions and bond angles or lengths,
106: and uses an approximate method to compute local partition functions.
107: Other investigators have estimated absolute free energies for molecules
108: using harmonic or quasi-harmonic approximations;
109: \cite{karplus-deca,gilson-jacs,aqvist-absf}
110: however, as discussed in
111: Refs.\ \onlinecite{gilson-jacs} and \onlinecite{karplus-deca},
112: local minima can deviate substantially
113: from a parabolic shape.
114: 
115: We introduce, apparently for the first time, a reference system
116: which is constructed to have high overlap with fairly general
117: molecular systems. The approach
118: can make use of either {\it internal or Cartesian}
119: coordinates. For biomolecules, using internal coordinates greatly  
120: enhances the accuracy of the method since internal coordinates
121: are tailored to the description of conformations.
122: Further, {\it all degrees of freedom and their correlations}
123: are explicitly included in the method.
124: 
125: Our method differs in several ways from the important study of
126: Stoessel and Nowak: \cite{stoessel}
127: (i) we use internal coordinates
128: for molecules, which are key for optimizing the overlap between
129: the reference system and the system of interest;
130: (ii) we may use a nearly arbitrary reference potential because
131: only a numerical reference free energy value is needed,
132: not an analytic value; and
133: (iii) there is no need, in the cases we have studied,
134: to use multi-stage methodology to find
135: the desired free energy, owing to the overlap built into the
136: reference system.
137: 
138: We consider this report a ``proof of principle''
139: for our reference system method.
140: After introducing the method,
141: it is tested on single- and double-well two-dimensional systems,
142: and on a methane molecule where absolute free energy 
143: estimates can be compared to exact values.
144: The method is then used to compute the absolute free energy
145: of the alpha and beta conformations for
146: leucine dipeptide (ACE-(leu)$_2$-NME) in implicit solvent,
147: {\it using all one-hundred fifteen degrees of freedom},
148: correctly calculating the free energy difference 
149: $\Delta F_{\rm alpha \rightarrow beta}$.
150: Extensions of the method to larger systems are
151: then discussed.
152: 
153: \section{Reference system method\label{sec-method}}
154: \subsection{The fundamental relations}
155: The absolute free energy of the system of interest (``phys'' for physical)
156: is defined using the partition function $Z_{\rm phys}$
157: \begin{eqnarray}
158:     F_{\rm phys} = -k_BT \ln Z_{\rm phys} = \nonumber \\ 
159: 	-k_BT \ln \left[ 
160: 	    \int d \vec{x} \; 
161: 	    e^{-\beta \big(U_{\rm phys}(\vec{x})+K_{\rm phys}(\vec{x})\big)}
162: 	\right],
163: \end{eqnarray}
164: where $T$ is the system temperature, $\beta=1/k_BT$,
165: $U_{\rm phys}$ and $K_{\rm phys}$ are, respectively, the
166: physical potential energy (i.e., simulation forcefield)
167: and the kinetic energy,
168: and $\vec{x}$ represents the full set of
169: configurational coordinates (internal or Cartesian).
170: The kinetic energy term can be integrated exactly to obtain
171: \cite{gilson-jpcb}
172: \begin{eqnarray}
173:     Z_{\rm phys} =
174: 	\Bigg[ \frac{1}{h^{3N}}\frac{8\pi^2}{\sigma C^{\circ}}
175: 	\prod_{i=1}^N \big( 2\pi k_B T m_i \big)^{3/2} \Bigg] 
176: 	    \int d \vec{x} \; e^{-\beta U_{\rm phys}(\vec{x})},
177:     \label{eq-Zphys}
178: \end{eqnarray}
179: where  $m_i$ is the mass of atom $i$, $h$ is Planck's constant,
180: $C^{\circ}$ is the standard concentration,
181: $\sigma$ is the symmetry number, \cite{gilson-bj}
182: $N$ is the number of particles in the system,
183: and the integral is
184: defined to be the configurational partition function.
185: For the method used in this study, the absolute free energy of the system
186: of interest is calculated using
187: a reference system (``ref''), and the following relationships are used,
188: \begin{eqnarray}
189:     Z_{\rm phys} = Z_{\rm ref} \frac{Z_{\rm phys}}{Z_{\rm ref}}, \nonumber \\
190:     F_{\rm phys} = F_{\rm ref} + \Delta F_{\rm ref \rightarrow phys},
191:     \label{eq-Fphys}
192: \end{eqnarray}
193: where $F_{\rm ref}$ is the trivially computable
194: free energy of the reference system, and \dfrs\ is the free energy
195: difference between the reference and physical system which can
196: be calculated using standard techniques.
197: 
198: For this report, we include estimates of the configurational
199: integral only, i.e., the leading constant factor in square brackets in
200: Eq.\ (\ref{eq-Zphys}) is not included in our results. Ignoring
201: the constant is not a limitation since, for the conformational free
202: energies studied here, the term cancels for
203: free energy differences.
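
As a brief illustration of this cancellation (a sketch; the state labels
$A$ and $B$ and the symbol $c$ are notation introduced here only for the
example), let $c$ denote the bracketed prefactor of
Eq.\ (\ref{eq-Zphys}). For two conformational states of the same molecule,
and assuming the symmetry number is the same in both states, $c$ is a
common factor and
\begin{eqnarray}
    F_A - F_B = -k_BT \ln
	\frac{c \, Z_A^{\rm conf}}{c \, Z_B^{\rm conf}}
    = -k_BT \ln \frac{Z_A^{\rm conf}}{Z_B^{\rm conf}},
\end{eqnarray}
where $Z_S^{\rm conf}=\int_S d\vec{x}\; e^{-\beta U_{\rm phys}(\vec{x})}$
is the configurational integral restricted to state $S$.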
204: 
205: \subsection{The reference energy and its normalization}
206: The trivial identities of Eq.\ (\ref{eq-Fphys}) suggest that arbitrary
207: reference systems can be used in our approach. To be concrete and anticipate
208: the procedure used, our discussion below will assume that a finite-length
209: simulation of the system of interest has been performed---from which
210: histograms of the coordinates have been generated.
211: For the molecular systems studied in this report, ordinary
212: Langevin dynamics simulations are performed using standard
213: forcefields.
214: The reference potential energy can be constructed from a wide
215: variety of histograms, as discussed below. Denoting
216: the computed histograms over all coordinates as $P(\vec{x})$, we define
217: \begin{eqnarray}
218:     U_{\rm ref}(\vec{x}) \equiv -k_BT \ln P(\vec{x}),
219:     \label{eq-Uref}
220: \end{eqnarray}
221: where $P(\vec{x})$ is the normalized probability of a particular
222: configuration (corresponding to a set of histogram bins);
223: see Fig.\ \ref{fig-schematic}.
224: For example, if all coordinates are binned as independent, then
225: \begin{eqnarray}
226:     P(\vec{x})=\prod_{i=1}^{N_{\rm coords}} P_i(x_i),
227:     \label{eq-Pind}
228: \end{eqnarray}
229: where $P_i(x_i)$ is the binned probability distribution (histogram)
230: for the $i^{\rm th}$ coordinate, and there
231: are $N_{\rm coords}$ degrees of freedom in the system.
232: If all coordinates are binned as pairwise correlated, then
233: \begin{eqnarray}
234:     P(\vec{x})=\prod_{ \{ i,j \} } P_{ij}(x_i,x_j),
235:     \label{eq-Pcorr}
236: \end{eqnarray}
237: where $\{ i,j \}$ is a set of pairs in which each coordinate occurs exactly
238: once, and $P_{ij}(x_i,x_j)$ is the probability for two particular coordinate
239: values from the two-dimensional histogram for these coordinates.
240: It is also possible to use an arbitrary combination of independent
241: and correlated coordinates---so long as each coordinate occurs
242: in only one $P$ factor.
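
To make the binning choices concrete, note that Eq.\ (\ref{eq-Uref})
turns any product of histograms into a sum of reference energies.
For independent binning, Eq.\ (\ref{eq-Pind}) gives
\begin{eqnarray}
    U_{\rm ref}(\vec{x}) = -k_BT \sum_{i=1}^{N_{\rm coords}} \ln P_i(x_i).
\end{eqnarray}
As a purely illustrative mixed scheme for a three-coordinate system
(the choice of indices is an assumption made only for this example),
binning $x_1$ and $x_2$ as a correlated pair and $x_3$ as independent gives
\begin{eqnarray}
    P(\vec{x}) = P_{12}(x_1,x_2) \, P_3(x_3), \nonumber \\
    U_{\rm ref}(\vec{x}) = -k_BT \big[ \ln P_{12}(x_1,x_2) + \ln P_3(x_3) \big].
\end{eqnarray}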
243: 
244: We emphasize that the final computed free energy values include
245: all correlations embodied in the true potential $U_{\rm phys}$. This
246: is true regardless of whether or how coordinates are correlated in the
247: reference potential.
248: 
249: \begin{figure}
250:     \includegraphics[scale=0.35]{fig1.eps}
251:     \caption{Depiction of how the reference potential energy
252: 	$U_{\rm ref}$ is calculated for a one-coordinate system.
253: 	First the coordinate is binned, creating a 
254: 	histogram $P$ (solid bars) populated according to the physical
255: 	ensemble. Then Eq.\ (\ref{eq-Uref}) is used to
256: 	calculate reference energies for each coordinate bin (dashed bars).
257: 	A hypothetical physical potential is
258: 	shown as a dotted curve for comparison to $U_{\rm ref}$.
259: 	For a multi-coordinate system $U_{\rm ref}$
260: 	would be the sum of the single-coordinate reference
261: 	potential energies.
262: 	\label{fig-schematic}
263:     }
264: \end{figure}
265: 
266: A schematic of how $U_{\rm ref}$ is computed
267: for a one-coordinate system is shown in Fig.\ \ref{fig-schematic}.
268: The coordinate histogram is first determined (solid bars)
269: using a simulation trajectory;
270: then Eq.\ (\ref{eq-Uref}) is used to calculate
271: $U_{\rm ref}$ (dashed bars). A possible physical potential is
272: also included (dotted line) for comparison to $U_{\rm ref}$.
273: For a system containing many degrees of freedom,
274: the process is carried out for all coordinates, based
275: on Eq.\ (\ref{eq-Pind}), Eq.\ (\ref{eq-Pcorr}), or another correlation scheme.
276: $U_{\rm ref}$ is the sum
277: of all the appropriate terms,
278: consistent with Eq.\ (\ref{eq-Uref}) and the binning choice.
279: 
280: The free energy of the reference system can now
281: be calculated via the reference partition function
282: \begin{eqnarray}
283:     Z_{\rm ref} = \int d\vec{x} \; e^{-\beta U_{\rm ref}(\vec{x})}
284:         = \int d\vec{x} \; P(\vec{x}). 
285:     \label{eq-Zref}
286: \end{eqnarray}
287: In practice, we normalize the histogram for each coordinate
288: to one independently by summing
289: over all histogram bins. So, for a particular bond length $r_1$,
290: that is binned as independent, we account for the Jacobian
291: factor (see Eq.\ (\ref{eq-jacobian})) by defining $\xi = r_1^3/3$, and then
292: \begin{eqnarray}
293:     Z_{\xi} = \int d\xi \; P(\xi)
294: 	= \sum_{N_{\rm bin}} \; \Delta \xi \; P(\xi) = 1, 
295: \end{eqnarray}
296: where $\Delta \xi$ is the histogram bin size, and $N_{\rm bin}$
297: is the number of bins in the $r_1$ histogram.
298: (Binning choices are discussed below.)
299: Similar relationships are used for all coordinates.
300: Thus the reference free energy $F_{\rm ref}=0$
301: and Eq.\ (\ref{eq-Fphys}) becomes 
302: \begin{eqnarray}
303:     F_{\rm phys} = \Delta F_{\rm ref \rightarrow phys}
304:     \;\;\;\;\;\; (F_{\rm ref} \equiv 0)
305:     \label{eq-Fphys2}
306: \end{eqnarray}
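
A toy example, with numbers chosen here only for illustration, may help
to fix the normalization convention. Suppose a coordinate $\xi$ is binned
into two bins of equal width $\Delta \xi = 0.5$, and that the bins contain
$n_1=30$ and $n_2=70$ of $N_{\rm phys}=100$ snapshots. Taking the
normalized histogram value in bin $k$ to be
$P_k = n_k/(N_{\rm phys}\,\Delta\xi)$, one finds
\begin{eqnarray}
    P_1 = 0.6, \;\;\; P_2 = 1.4, \nonumber \\
    \sum_{k=1}^{2} \Delta \xi \; P_k = 0.5\,(0.6) + 0.5\,(1.4) = 1,
\end{eqnarray}
and the corresponding reference energies are
$-k_BT \ln 0.6 \approx 0.51\,k_BT$ and
$-k_BT \ln 1.4 \approx -0.34\,k_BT$.
Because each coordinate (or correlated group of coordinates) is
normalized separately, the full reference partition function is a
product of factors equal to one.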
307: 
308: \subsection{Using the physical and reference ensembles}
309: With the reference potential energy $U_{\rm ref}$
310: defined in Eq.\ (\ref{eq-Uref})
311: and the physical potential energy $U_{\rm phys}$
312: given by the forcefield, which may include implicit solvation energies,
313: Boltzmann-distributed snapshots from both the
314: reference and physical systems can be utilized
315: to calculate $F_{\rm phys}=\Delta F_{\rm ref \rightarrow phys}$.
316: Here, we simply use free energy perturbation \cite{zwanzig}
317: from the reference to the physical systems
318: \begin{eqnarray}
319:     F_{\rm phys} = -k_B T \ln \Big\langle
320: 	e^{-\beta \big( U_{\rm phys}-U_{\rm ref} \big) }
321:     \Big\rangle_{\rm ref} \nonumber \\
322:         \doteq -k_B T \ln \Bigg( \frac{1}{N_{\rm ref}} \sum_{i=1}^{N_{\rm ref}}
323:     e^{-\beta \bigl(U_{\rm phys}(\vec{x}_i)-U_{\rm ref}(\vec{x}_i)\bigr)} \Bigg),
324:     \label{eq-fep}
325: \end{eqnarray}
326: where $N_{\rm ref}$ is the number of structures $\vec{x}_i$ in the reference ensemble,
327: the ``$\doteq$'' symbol denotes a computational estimate,
328: and $\langle ... \rangle_{\rm ref}$ represents a canonical average
329: using structures from the reference ensemble only.
330: It is important to note that, while other choices for
331: computing $F_{\rm phys}$ are possible, such as Bennett's method,
332: \cite{bennett,shirts-benn,shirts-prl,crooks-pre,lu-jcc,ytreberg-shift}
333: Eq.\ (\ref{eq-fep}) is the only choice which relies solely on
334: configurations drawn from the reference ensemble which
335: are, by construction, sampled canonically and without
336: dynamical trapping.
337: We also note that ``uni-directional'' estimates like that of
338: Eq.\ (\ref{eq-fep}) have been analyzed extensively
339: (e.g., Refs.\ \onlinecite{zuckerman-prl} and \onlinecite{zuckerman-jstat})
340: and may be amenable to error-reduction techniques;
341: \cite{zuckerman-cpl,ytreberg-extrap} however, we have applied the
342: perturbation approach here to keep our initial analysis as straightforward
343: as possible.
344: Staged free energy methods like thermodynamic
345: integration \cite{straatsma-ti} and adaptive integration
346: \cite{swendsen-aim} may also be used.
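
For completeness, the identity underlying Eq.\ (\ref{eq-fep}) can be
verified in one line. The following is the standard perturbation
derivation, written here for the configurational integrals only:
\begin{eqnarray}
    \Big\langle e^{-\beta ( U_{\rm phys}-U_{\rm ref} ) }
    \Big\rangle_{\rm ref}
    = \frac{1}{Z_{\rm ref}} \int d\vec{x} \;
	e^{-\beta U_{\rm ref}(\vec{x})} \,
	e^{-\beta ( U_{\rm phys}(\vec{x})-U_{\rm ref}(\vec{x}) )}
    \nonumber \\
    = \frac{1}{Z_{\rm ref}} \int d\vec{x} \;
	e^{-\beta U_{\rm phys}(\vec{x})},
\end{eqnarray}
so that $-k_BT$ times the logarithm of the average is the free energy
difference between the reference and physical systems.
Using Eq.\ (\ref{eq-Uref}) and $Z_{\rm ref}=1$, the same average can be
written as
\begin{eqnarray}
    \Big\langle e^{-\beta ( U_{\rm phys}-U_{\rm ref} ) }
    \Big\rangle_{\rm ref}
    = \Bigg\langle
	\frac{e^{-\beta U_{\rm phys}(\vec{x})}}{P(\vec{x})}
    \Bigg\rangle_{\rm ref},
\end{eqnarray}
which suggests reading the estimate as an importance-sampling evaluation
of the configurational integral with sampling density $P(\vec{x})$.
In this reading, $P(\vec{x})$ must be nonzero wherever
$e^{-\beta U_{\rm phys}(\vec{x})}$ contributes appreciably.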
347: 
348: \subsection{The physical ensemble and construction of the reference system}
349: The method used in this report relies on simple
350: histograms for all degrees of freedom
351: (in principle, with internal or Cartesian coordinates)
352: based on a ``physical ensemble'' of
353: conformations generated via molecular dynamics,
354: Monte Carlo or other canonical simulation.
355: The histograms define a reference system with a free energy that is
356: trivially computable, as described in Sec.\ \ref{sec-method}.
357: We emphasize that an analytical
358: solution need not be available; a precise numerical evaluation is
359: more than adequate.
360: A well-sampled ensemble of reference system configurations is then
361: readily generated and used to compute the free energy
362: difference via Eq.\ (\ref{eq-fep}).
363: 
364: The first step in our approach to constructing the reference
365: system is to generate a physical
366: ensemble (i.e., a trajectory) by simulating the
367: system of interest using
368: standard molecular dynamics, Monte Carlo, or other
369: canonical sampling techniques.
370: The trajectory produced by the simulation is
371: used to generate histograms for all coordinates
372: as described below.
373: In creating histograms, note that constrained coordinates,
374: such as bond lengths involving hydrogens constrained
375: by RATTLE, \cite{rattle}
376: need not be binned since these coordinates do not change
377: between configurations.
378: Such coordinate constraints are not required in the method, however.
379: 
380: If internal coordinates are used (such as for the molecules
381: in this study), care must be taken to
382: account for the Jacobian factors.
383: Using internal coordinates with bond lengths $r$,
384: bond angles $\theta$ and dihedrals $\omega$, the
385: volume element in the configurational integral
386: of Eq.\ (\ref{eq-Zphys}) is given by \cite{gilson-jacs}
387: \begin{eqnarray}
388:     d \vec{x} = 
389: 	\prod_{i=1}^{N-1} r_i^2 dr_i \;
390: 	\prod_{i=1}^{N-2} \sin\theta_i d\theta_i \; 
391: 	\prod_{i=1}^{N-3} d\omega_i
392:     = \nonumber \\
393: 	\prod_{i=1}^{N-1} d (r_i^3/3) \;
394: 	\prod_{i=1}^{N-2} d (-\cos \theta_i) \; 
395: 	\prod_{i=1}^{N-3} d\omega_i, 
396:     \label{eq-jacobian}
397: \end{eqnarray}
398: where $N$ is the number of atoms in the system.
399: Thus, when using internal coordinates,
400: the simplest strategy to account for the
401: Jacobian is to bin according to a set of rules:
402: bond lengths are binned according to $r^3/3$,
403: bond angles are binned according to $\cos\theta$,
404: and dihedrals are binned according to $\omega$ (i.e., the
405: same as Cartesian coordinates).
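
As a sketch of what these rules imply for bond lengths (the symbols
$\xi_{\rm min}$ and $k$ are notation introduced here for illustration),
equal-width bins in $\xi=r^3/3$ with edges
$\xi_k = \xi_{\rm min} + k \, \Delta \xi$ correspond to bond-length edges
\begin{eqnarray}
    r_k = (3 \xi_k)^{1/3},
\end{eqnarray}
so that every bin carries the same Jacobian-weighted volume,
\begin{eqnarray}
    \int_{r_k}^{r_{k+1}} r^2 \, dr = \frac{r_{k+1}^3 - r_k^3}{3}
    = \Delta \xi .
\end{eqnarray}
The bins are therefore slightly narrower in $r$ at larger bond lengths.
The same construction applies to bond angles, with equal-width bins in
$\cos\theta$ in place of $r^3/3$.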
406: 
407: \subsection{Generation of the reference ensemble}
408: Once the histograms are constructed and populated using the physical
409: ensemble, the reference ensemble is generated.
410: To generate a single reference structure,
411: for each coordinate one chooses a histogram
412: bin according to the probability associated with that bin. Then a
413: coordinate value is chosen at random uniformly
414: within the bin according to
415: the Jacobian factor in Eq.\ (\ref{eq-jacobian})---e.g., for
416: a bond length $r$, one chooses uniformly in the variable $(r^3/3)$.
417: The process is repeated for every degree of freedom in the system.
418: By repeating the entire procedure, one can generate
419: as many reference structures as desired
420: (i.e., the reference ensemble).
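
One possible way to write the within-bin draw is given below, assuming
a uniform random number $u \in [0,1)$ drawn independently for each
coordinate; the bin-edge labels $a$ and $b$ and the variable
$\eta=-\cos\theta$ are notation introduced here for illustration.
For a bond-length bin $[\xi_a,\xi_b]$, an angle bin $[\eta_a,\eta_b]$,
and a dihedral bin $[\omega_a,\omega_b]$,
\begin{eqnarray}
    \xi = \xi_a + u \, (\xi_b - \xi_a), \;\;\; r = (3\xi)^{1/3},
    \nonumber \\
    \eta = \eta_a + u \, (\eta_b - \eta_a), \;\;\;
	\theta = \arccos(-\eta),
    \nonumber \\
    \omega = \omega_a + u \, (\omega_b - \omega_a).
\end{eqnarray}
With this choice, $r$ is distributed in proportion to $r^2$ and $\theta$
in proportion to $\sin\theta$ within each bin, consistent with
Eq.\ (\ref{eq-jacobian}).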
421: 
422: \subsection{Summary of the reference system method}
423: In summary, the method is implemented by first constructing
424: properly normalized histograms for all internal (or Cartesian) coordinates
425: based on a physical ensemble of structures.
426: An ensemble of reference structures is then chosen at random from the
427: histograms.
428: The reference energy ($U_{\rm ref}$ of Eq.\ (\ref{eq-Uref})) and
429: physical energy ($U_{\rm phys}$ from the forcefield) must
430: be calculated for each structure in the reference ensemble.
431: Finally, Eq.\ (\ref{eq-fep}) is used to calculate the
432: desired absolute free energy of the system of interest.
433: 
434: The CPU cost of the method, above that of the
435: initial ``physical'' trajectory, is one physical energy evaluation
436: for each of the $N_{\rm ref}$ reference structures, plus the less
437: expensive cost of generating reference structures.
438: 
439: \section{Results}
440: To test the effectiveness of the reference system method
441: we first estimated the absolute free energy for three test systems
442: where the free energy is known exactly.
443: We chose the two-dimensional potentials
444: from Ref.\ \onlinecite{ytreberg-seps}, and  a methane molecule in vacuum.
445: Finally, we used the method to estimate the absolute free energies
446: of the alpha and beta conformations of the 50-atom
447: leucine dipeptide (ACE-(leu)$_2$-NME), and compared
448: the free energy difference obtained via our method
449: with an independent estimate.
450: In all cases, the free energy estimate computed by our approach
451: is in excellent agreement with independent results.
452: 
453: \subsection{Simple test systems}
454: We first studied the two-dimensional
455: single- and double-well potentials from Ref.\ \onlinecite{ytreberg-seps},
456: \begin{eqnarray}
457:   U_{\rm phys}^{\rm single}(x,y)=(x+2)^2+y^2, \nonumber\\
458:   U_{\rm phys}^{\rm double}(x,y)=\frac{1}{10}
459:   \Bigl\{
460:   ((x-1)^2-y^2)^2+ \nonumber \\
461:   10(x^2-5)^2 + (x+y)^4+(x-y)^4
462:   \Bigr\}.
463:   \label{eq-pot}
464: \end{eqnarray}
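
As a quick check on the simpler of these two potentials, and assuming
that the integration extends over the entire plane with $k_BT=1$,
the single-well configurational integral is a product of two Gaussian
integrals:
\begin{eqnarray}
    Z^{\rm single} = \int_{-\infty}^{\infty} dx \; e^{-(x+2)^2}
	\int_{-\infty}^{\infty} dy \; e^{-y^2}
    = \sqrt{\pi} \, \sqrt{\pi} = \pi, \nonumber \\
    F^{\rm single} = -\ln \pi \approx -1.14
\end{eqnarray}
in units of $k_BT$. The shift of the minimum to $x=-2$ does not affect
the result.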
465: 
466: \begin{table}
467:     \begin{tabular}{l|c|c}
468:     \hline \hline
469:     System & Exact & Estimate \\
470:     \hline
471:     two-dimensional single-well \cite{ytreberg-seps} & -1.1443 & -1.1449 (0.0003) \\
472:     \hline
473:     two-dimensional double-well \cite{ytreberg-seps} & 5.4043 & 5.4058 (0.0003)\\
474:     \hline
475:     Methane molecule & 10.932 & 10.934 (0.002)\\
476:     \hline \hline
477:     \end{tabular}
478:     \caption{
479: 	Absolute free energy estimates obtained using our 
480: 	reference system approach for cases where the absolute free
481: 	energy can be determined exactly.
482: 	In all cases, the estimate is in excellent agreement with
483: 	the exact free energy.
484: 	The uncertainty, shown in parentheses
485: 	(e.g., $3.14 \; (0.05) = 3.14 \pm 0.05$), is
486: 	the standard deviation from five independent simulations.
487: 	The results for the two-dimensional systems are in $k_BT$ units
488: 	and methane results have units of kcal/mol.
489: 	The table shows estimates of the configurational
490: 	integral in Eq.\ (\ref{eq-Zphys}),
491: 	i.e., the constant term is not included in the estimate.
492:     \label{tab-results}
493:     }
494: \end{table}
495: 
496: Table \ref{tab-results} shows the excellent agreement
497: between the reference system estimates and the exact free energies
498: (obtained analytically) for the
499: two-dimensional potentials used in this study, Eq.\ (\ref{eq-pot}).
500: The ``physical'' simulations used Metropolis Monte Carlo
501: with $k_BT=1.0$ and one
502: million snapshots in the physical and reference ensembles.
503: For all two-dimensional simulations, both coordinates
504: were treated with full
505: correlations---i.e., two-dimensional histograms were used---and
506: the bin sizes were chosen such that the number of bins ranged from
507: 100 to 1000.
508: The error shown in Table \ref{tab-results}
509: in parentheses is the standard deviation from five independent estimates
510: using five separate physical ensembles---and thus five different
511: reference systems.
512: Good estimates were also obtained using fewer snapshots---e.g.,
513: we obtained $F=-1.142 \; (0.003)$
514: for the single-well potential
515: and $F=5.408 \; (0.007)$ for the double-well potential
516: using 10,000 snapshots
517: in both the physical and reference ensembles.
518: 
519: Table \ref{tab-results} also shows the excellent agreement between the
520: reference system estimates and the exact value of the free
521: energy for methane in vacuum.
522: Methane trajectories were generated
523: using TINKER 4.2 \cite{tinker} with the OPLS-AA forcefield. \cite{oplsaa}
524: The temperature was maintained at 300.0 K using Langevin dynamics with
525: a friction coefficient of 91.0 ${\rm ps}^{-1}$ and a time step of 0.5 fs.
526: The physical ensemble was created by generating five 10.0 ns trajectories 
527: with snapshots saved every 0.1 ps.
528: Using the 100,000 methane structures in the physical ensemble,
529: the reference system was generated by binning internal coordinates
530: into histograms. The absolute free energy was then estimated
531: by generating 100,000 structures for the reference ensemble
532: and using Eq.\ (\ref{eq-fep}).
533: All coordinates were binned as independent using
534: one-hundred bins per coordinate, thus only one-dimensional histograms
535: were required.
536: The uncertainty shown in parentheses in Table \ref{tab-results}
537: is the standard deviation from
538: five independent estimates using the five separate methane
539: trajectories---and thus five different reference systems.
540: 
541: \begin{figure}
542:     \includegraphics[scale=0.35]{fig2.eps}
543:     \caption{Absolute free energy for methane estimated by
544: 	the reference system
545: 	method as a function of the number of reference
546: 	structures $N_{\rm ref}$ used in the estimate.
547: 	The solid horizontal line 
548: 	is the exact free energy obtained by numerical integration.
549: 	Five independent simulations are shown on a log scale to clearly
550: 	show the convergence of the free energy estimate.
551: 	Results shown were obtained using Eq.\ (\ref{eq-fep})
552: 	with one-hundred bins for each degree of freedom, i.e., the estimates
553: 	for the absolute free energy of methane in Table \ref{tab-results}
554: 	are the values shown here for
555: 	$N_{\rm ref}=100,000$.
556: 	\label{fig-converge-meth}
557:     }
558: \end{figure}
559: 
560: Figure \ref{fig-converge-meth} shows the convergence
561: behavior of the reference
562: system method for methane. Five independent absolute free energy
563: estimates are shown as a function of the number of reference
564: structures used in the estimate.
565: Each of the five simulations uses the same protocol as described above,
566: i.e., the absolute free energy estimates in Table \ref{tab-results} are
567: the values shown in
568: Fig.\ \ref{fig-converge-meth} for $N_{\rm ref}=100,000$.
569: 
570: Methane was chosen as a test system because
571: intra-molecular interactions are due only to bond
572: lengths and angles. In the OPLS-AA forcefield no non-bonded terms
573: are present in the
574: potential energy $U_{\rm phys}$, and thus the exact absolute free energy can
575: be computed numerically without great difficulty.
576: For methane, a configuration is determined by:
577: (i) four bond lengths, which are independent of each other and
578: of all other coordinates in the forcefield; and
579: (ii) five bond angles which are correlated to one another but
580: not to the bond lengths.
581: Thus the exact partition function $Z_{\rm meth}$ is a product
582: of four bond length partition functions $Z_r$ and one
583: angular partition function $Z_{\theta}$,
584: \begin{eqnarray}
585:     Z_{\rm meth} = Z_r^4 Z_{\theta}, \nonumber \\
586: 	Z_r = \int_{0}^{\infty} dr\;e^{-\beta U_{\rm phys}(r)},
587: 	    \nonumber \\
588: 	Z_{\theta} = \int_{0}^{\pi}
589: 	    d\theta_1 d\theta_2 d\theta_3 d\theta_4 d\theta_5 \;
590: 	    e^{-\beta U_{\rm phys}
591: 		(\theta_1,\theta_2,\theta_3,\theta_4,\theta_5)
592: 	      }.
593: \end{eqnarray}
594: $U_{\rm phys}(r)$ is harmonic and thus $Z_r$ was computed analytically
595: using parameters from the forcefield.
596: For $U_{\rm phys}(\theta_1,\theta_2,\theta_3,\theta_4,\theta_5)$
597: the correlations between angles must be
598: taken into account, thus $Z_{\theta}$ was estimated numerically using
599: TINKER to evaluate $U_{\rm phys}$ in the five-dimensional integral.
600: We found that $F_{\rm meth}=-k_B T \ln Z_{\rm meth} = 10.932$ kcal/mol
601: as shown in Table \ref{tab-results}.
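
The analytic evaluation of $Z_r$ can be sketched as follows, taking the
integrand exactly as written above and assuming that the harmonic bond
term has the form $U_{\rm phys}(r)=k_r (r-r_0)^2$; the symbols $k_r$ and
$r_0$ stand for the forcefield force constant and equilibrium bond
length and are notation introduced here. Then
\begin{eqnarray}
    Z_r = \int_{0}^{\infty} dr \; e^{-\beta k_r (r-r_0)^2}
    = \frac{1}{2} \sqrt{\frac{\pi}{\beta k_r}}
	\Big[ 1 + {\rm erf}\big( r_0 \sqrt{\beta k_r} \big) \Big]
    \nonumber \\
    \approx \sqrt{\frac{\pi}{\beta k_r}},
\end{eqnarray}
where the last step holds when $r_0 \sqrt{\beta k_r} \gg 1$, i.e., when
the thermal fluctuation of the bond length is small compared to the
bond length itself, as is expected for a stiff covalent bond.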
602: 
603: Methane was also used to show that the method correctly computes
604: the free energy even when the physical ensemble is incorrect or incomplete.
605: In our studies we found that the correct free energy
606: is obtained using our method even when the histogram for
607: each coordinate was assumed to be flat, i.e., without the
608: use of a physical ensemble (data not shown).
609: 
610: \begin{figure}
611:     \includegraphics[scale=0.35]{fig3.eps}
612:     \caption{Absolute free energy for methane estimated by
613: 	the reference system
614: 	method as a function of the number of histogram bins used for
615: 	each degree of freedom. The plot shows the ``sweet spot'' where
616: 	histogram bins are small enough to reveal histogram features,
617: 	yet large enough to give sufficient population in each bin.
618: 	The results are shown with a vertical scale of
619: 	two kcal/mol and on a log scale to emphasize the
620: 	wide range of bin sizes that produce excellent results for the
621: 	reference system approach.
622: 	Results shown were obtained using Eq.\ (\ref{eq-fep})
623: 	for a methane molecule using $N_{\rm phys}=N_{\rm ref}=10,000$
624: 	(dashed curve)
625: 	and $N_{\rm phys}=N_{\rm ref}=100,000$ (solid curve).
626: 	The solid horizontal line shows the exact
627: 	free energy and the error bars are the standard deviations
628: 	of five independent trials.
629: 	The plot demonstrates that at least fifty bins should
630: 	be used for each independent coordinate,
631: 	and that the maximum number of bins
632: 	depends on the number of snapshots in the physical ensemble.
633: 	\label{fig-sweet}
634:     }
635: \end{figure}
636: 
637: Choosing the size of the histogram bins
638: is an important consideration.
639: Figure \ref{fig-sweet} shows the large ``sweet spot'' where bins
640: are large enough
641: to be well populated, and yet small enough to reveal
642: histogram features.
643: The figure shows results for the absolute free energy
644: for a methane molecule using ten-thousand structures
645: in both the physical and reference ensembles,
646: $N_{\rm phys}=N_{\rm ref}=10,000$ (dashed curve)
647: and $N_{\rm phys}=N_{\rm ref}=100,000$ (solid curve).
648: The small vertical scale of two kcal/mol and the logarithmic horizontal
649: scale emphasize that there
650: is a wide range of bin sizes that produce excellent results for the
651: reference system approach.
652: Error bars are the standard deviation
653: of five independent simulations. The solid horizontal line shows the exact
654: free energy and the curves are free energy estimates
655: using Eq.\ (\ref{eq-fep}),
656: as a function of the number of bins used for the histograms
657: for all degrees of freedom. From this plot it is clear that one
658: should choose at least fifty bins, and that the maximum number of bins
659: that should be used depends on the number of snapshots in the physical
660: ensemble---more snapshots in the physical ensemble
661: means one can use more bins for the reference system.
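
As a rough guide to this upper limit, one may estimate the mean
population per bin as
\[
    \bar{n}_{\rm bin} \approx \frac{N_{\rm phys}}{N_{\rm bins}},
\]
where $N_{\rm bins}$ (a symbol introduced here only for illustration)
is the number of bins per coordinate, and the snapshots are assumed,
as an idealization, to be spread evenly over the bins.
For $N_{\rm phys}=10,000$ this gives $\bar{n}_{\rm bin} \approx 200$
at fifty bins but only $\bar{n}_{\rm bin} \approx 10$ at one thousand bins,
whereas $N_{\rm phys}=100,000$ still yields $\bar{n}_{\rm bin} \approx 100$
at one thousand bins.
This estimate is only meant to suggest why the usable range of bin
numbers grows with the size of the physical ensemble; real histograms
are far from flat, so sparsely populated bins appear sooner than the
estimate implies.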
662: 
663: \begin{table}
664:     \begin{tabular}{l|c|c}
665:     \hline \hline
666:     System & Estimate (kcal/mol) & Independent Estimate\\
667:     \hline
668:     $F_{\rm alpha}$ & 87.3 (0.7) & --- \\
669:     \hline
670:     $F_{\rm beta}$  & 86.3 (0.7) & --- \\
671:     \hline
    $\Delta F_{\rm alpha \rightarrow beta}$ & $-1.0$ (0.9) & $-0.85$ (0.05) \\
673:     \hline \hline
674:     \end{tabular}
675:     \caption{
676: 	Absolute free energy estimates of
677: 	the alpha ($F_{\rm alpha}$) and beta ($F_{\rm beta}$) conformations
678: 	obtained using the 
679: 	reference system method for leucine dipeptide with
680: 	GBSA solvation, in units of kcal/mol.
681: 	The independent measurement for the free energy difference
682: 	was obtained via a 1.0 $\mu$s unconstrained simulation.
683: 	The uncertainty for the absolute free energies, 
684: 	shown in parentheses, is the standard deviation from five
	independent 10.0 ns leucine dipeptide simulations, each using
	one million structures in the reference ensemble.
687: 	The uncertainty
688: 	for the free energy differences is obtained by using every possible
689: 	combination of $F_{\rm alpha}$ and $F_{\rm beta}$,
690: 	i.e., twenty-five independent estimates.
691:         The standard error associated with the 
692: 	$\Delta F_{\rm alpha \rightarrow beta}$ reference system
693: 	estimate is 0.18 kcal/mol, reflecting the twenty-five
694: 	independent estimates.
695: 	The table shows estimates of the configurational
696: 	integral in Eq.\ (\ref{eq-Zphys}),
697: 	i.e., the constant term is not included in the estimate.
698:     \label{tab-results2}
699:     }
700: \end{table}
701: 
702: \subsection{Leucine dipeptide}
703: Table \ref{tab-results2}
704: shows the agreement for leucine dipeptide
705: (ACE-(leu)$_2$-NME) between the free energy difference
706: $\Delta F_{\rm alpha \rightarrow beta}$
707: as predicted by the reference system method, and as
708: predicted via long simulation.
709: The leucine dipeptide physical ensembles were
710: generated using TINKER 4.2 \cite{tinker} with
711: the OPLS-AA forcefield. \cite{oplsaa}
712: The temperature was maintained at 500.0 K (to enable
713: an independent $\Delta F$ estimate via
714: repeated crossing of the free energy barrier between
715: alpha and beta configurations),
716: using Langevin dynamics with a friction coefficient of
717: 5.0 ${\rm ps}^{-1}$. GBSA \cite{still} implicit
solvation was used, and RATTLE was utilized to maintain all bonds involving
hydrogens at their ideal lengths, \cite{rattle} allowing the use
of a 2.0 fs time step.
721: 
722: We calculated reference systems and
723: computed absolute free energies of the alpha and
724: beta conformations based on five
725: 10.0 ns trajectories. For all simulations, 
726: backbone torsions were constrained using a flat-bottomed
727: harmonic restraint (zero force if the torsion
728: angles were within the allowed range, and harmonic otherwise),
namely, for alpha: $-105^\circ<\phi<-45^\circ \;{\rm and}\; -70^\circ<\psi<-10^\circ$;
and for beta: $-125^\circ<\phi<-65^\circ \;{\rm and}\; 120^\circ<\psi<180^\circ$.
731: The reference system was generated using 100,000 snapshots
732: from the physical ensemble, then free energy estimates were obtained
733: by generating 1,000,000 structures for the reference ensemble for
each estimate. All one-hundred fifteen
internal coordinates (excluding the bond lengths constrained by
RATTLE \cite{rattle}) were binned independently,
with fifty bins for each coordinate.
The uncertainty shown in parentheses is
739: the standard deviation from the
740: five independent estimates using the five separate trajectories, i.e.,
741: five different physical ensembles and five different reference systems.
742: 
743: Since independent estimates of the absolute free energies 
744: of the alpha and beta conformations of leucine dipeptide
745: are not available, we calculated the free
746: energy difference
747: $\Delta F_{\rm alpha \rightarrow beta} = -0.85 \; (0.05)$ kcal/mol
748: via a 1.0 $\mu$s unconstrained simulation.
749: The uncertainty of the independent estimate was obtained using
750: block averages.
The temperature was chosen to be 500.0 K, which allowed approximately 1500
752: crossings of the free energy barrier between the alpha and
753: beta conformations, providing an accurate independent estimate.
754: As can be seen in Table \ref{tab-results2}, our estimated free
755: energy difference is in good agreement with the independent
756: value obtained via long simulation.
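
The entries of Table \ref{tab-results2} can be checked with elementary
arithmetic. The free energy difference follows directly from the two
absolute values,
\[
    \Delta F_{\rm alpha \rightarrow beta}
    = F_{\rm beta} - F_{\rm alpha}
    = 86.3 - 87.3 = -1.0 \; {\rm kcal/mol},
\]
and, if the twenty-five pairings are treated as independent (as is done
in the table caption), the quoted standard error is
\[
    \frac{0.9}{\sqrt{25}} = 0.18 \; {\rm kcal/mol}.
\]
For comparison, adding the two standard deviations in quadrature,
which assumes uncorrelated errors in $F_{\rm alpha}$ and $F_{\rm beta}$,
gives $\sqrt{0.7^2+0.7^2} \approx 1.0$ kcal/mol,
comparable to the 0.9 kcal/mol reported for the difference.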
757: 
We emphasize that the fluctuations of nearly 1 kcal/mol observed in our
leucine dipeptide estimates are completely independent of the magnitude
of the free energy difference, which here happens to be of the same order.
That is, for a similarly
sized system and similar CPU investment, one would expect similar uncertainty,
762: even for a very large free energy difference. This, indeed, is the motivation
763: for performing absolute free energy calculations. We believe, moreover, that
764: efficiency improvements will be achieved beyond the data in this initial
765: report.
766: 
767: \begin{figure}
768:     \includegraphics[scale=0.35]{fig4.eps}\\
769:     \vspace{12pt}
770:     \includegraphics[scale=0.35]{fig5.eps}
771:     \caption{Free energy for leucine dipeptide estimated by
772: 	the reference system
773: 	method as a function of the number of reference structures
774: 	$N_{\rm ref}$ used in the estimate.
775: 	Five independent simulations are shown on a log scale to demonstrate
776: 	the convergence behavior of the free energy estimate for
777: 	(a) the alpha configuration, and (b) the beta configuration.
778: 	Results shown were obtained using Eq.\ (\ref{eq-fep})
779: 	with fifty bins for each degree of freedom.
780: 	\label{fig-converge-di}
781:     }
782: \end{figure}
783: 
784: Figure \ref{fig-converge-di} shows the convergence
785: behavior of the reference
786: system method for leucine dipeptide. Five free energy
787: estimates are shown as a function of the number of reference
788: structures used in the estimate for
789: (a) the alpha configuration, and (b) the beta configuration.
Each of the five simulations uses the same protocol as described above.
791: 
792: \begin{figure}
793:     \includegraphics[scale=0.35]{fig6.eps}\\
794:     \vspace{12pt}
795:     \includegraphics[scale=0.35]{fig7.eps}
796:     \caption{Scatter plots of the two $\chi_2$ torsions
797: 	of each residue for leucine dipeptide. Results are shown
798: 	for both physical and reference ensembles containing 100,000
799: 	structures each.
800: 	The figure shows that:
801: 	(i) the reference system has good overlap with the physical system,
802: 	as can be seen by the similarity between the two plots;
803: 	and (ii) the reference system is more broadly distributed
804: 	than the physical
	system, as evidenced by the data at $(-60,-60)$ for the reference
806: 	system that is not present for the physical system.
807: 	\label{fig-scat}
808:     }
809: \end{figure}
810: 
811: The leucine dipeptide calculations also demonstrate two important
812: aspects of the particular reference system defined in this study:
813: (i) the reference system has good overlap with the physical system; and
814: (ii) the reference system is broader than the physical system.
815: Figure \ref{fig-scat} shows a scatter plot of the
816: $\chi_2$ torsions of each residue
817: for both the physical and reference ensembles. Each ensemble
818: contains 100,000 structures. The figure clearly shows the
819: excellent overlap between the reference and physical ensemble,
820: as can be seen by the similarity between the two plots. In
821: addition, the reference ensemble scatter plot has data
in the region $(-60,-60)$ which does not exist in the
823: physical ensemble, showing that the reference system is ``broader'' than
824: the physical system.
825: 
826: \begin{figure}
827:     \includegraphics[scale=0.35]{fig8.eps}
828:     \caption{Histogram of the distance between the $C_\delta$ of residue one
829: 	and the $C_\alpha$ of residue two for leucine dipeptide. Results are
830: 	shown for both reference and physical ensembles containing 100,000
831: 	structures each.
832: 	The figure shows that:
833: 	(i) the reference system has good overlap with the physical system;
834: 	and (ii) the reference system is broader than the physical system.
835: 	\label{fig-dist}
836:     }
837: \end{figure}
838: 
839: Figure \ref{fig-dist} shows a histogram of the distance between the $C_\delta$
840: atom of residue one and the $C_\alpha$ of residue two for 
the same ensembles as Fig.\ \ref{fig-scat}. The figure again shows
that the reference system both has excellent overlap with the
physical system and is broader than the physical system.
844: 
845: \section{Discussion}
846: The present results raise a number of questions regarding the reference
847: system approach to computing absolute free energies---in particular, regarding
848: the use of correlations, the importance of the physical ensemble,
849: and the potential for application to larger systems.
850: 
851: \subsection{Correlation of Coordinates}
852: How can correlations among coordinates be used to increase the method's
853: effectiveness? One may choose to 
854: bin coordinates as independent (i.e., one-dimensional
855: histograms), or with correlations
856: (i.e., multi-dimensional histograms).
857: For example, in peptides, one may choose to bin all
858: sets of backbone $\phi,\psi$ torsions as correlated, and all other
859: coordinates (bond lengths, bond angles, other torsions) as
860: independent. It might always seem advantageous
861: to bin some coordinates (at least backbone torsions)
862: as correlated, since reference structures drawn
863: randomly from the histograms 
864: will be less likely to have steric
865: clashes. On the other hand, including correlations with small bin
866: sizes is impractical. As an example, imagine that for the leucine
867: dipeptide molecule used in
868: this study, one binned the four $\phi,\psi$ backbone torsions as
correlated. If fifty bins for each torsion were used (the minimum
suggested by Fig.\ \ref{fig-sweet}), then there
would be $50^4=6,250,000$ multi-dimensional bins to populate,
which is simply not feasible.
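
To make the scale of the problem concrete, consider the
$N_{\rm phys}=100,000$ snapshots used in this study to build the
reference system. Assuming, purely for illustration, that the snapshots
are spread evenly over the bins, the mean population of a
four-dimensional bin would be
\[
    \frac{10^5}{6.25 \times 10^6} = 0.016,
\]
i.e., the vast majority of bins would be empty, whereas the same
ensemble binned independently gives
\[
    \frac{10^5}{50} = 2000
\]
snapshots per one-dimensional bin on average.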
873: 
874: There does appear to be an important advantage to eliminating at
875: least some correlations from the original ``physical'' ensemble:
876: namely, a larger portion of conformational space
877: becomes available to the reference ensemble;
878: see Figs.\ \ref{fig-scat} and \ref{fig-dist}. 
879: Since coordinates for the reference structures
880: are drawn randomly and independently, it is
881: possible to generate reference structures that are
882: in entirely different energy basins than those in
883: the physical ensemble. {\it It is thus possible to
884: overcome the inadequacies of the physical ensemble
885: by binning internal coordinates independently}.
The optimal, and presumably limited, use of correlations
will be considered in future work.
888: 
889: Regardless of the degree of correlations included in $U_{\rm ref}$,
890: we emphasize that final results fully include correlations in the physical
891: potential $U_{\rm phys}$.
892: 
893: \subsection{Quality of the physical ensemble}
894: Since the reference ensemble is generated by drawing at random from
895: histograms which, in turn, were generated from the physical ensemble,
896: a natural question to ask is: how complete does the physical
897: ensemble need to be?
898: The surprising answer is that, for our reference system method,
899: the physical ensemble does not need to
900: be complete, or even correct (properly distributed).
901: Since Eqs.\ (\ref{eq-Fphys}) and (\ref{eq-Fphys2}) are
902: valid for arbitrary reference systems,
903: the convergence of the free energy estimate to the correct value
904: is guaranteed, in the limit of infinite sampling
905: ($N_{\rm ref} \rightarrow \infty$), regardless of the
906: quality of the physical ensemble.
907: The ``trick'' is that the ensemble for the reference system must
908: be converged, which can be achieved with much less expense since
909: there is no dynamical trapping.
910: Unlike the typical case for molecular mechanics simulation,
911: we sample the reference ensemble ``perfectly''---there is no possibility
912: of being trapped in a local basin. By construction, since all coordinate
913: values are generated exactly according to the reference distributions,
914: the reference ensemble can only suffer from statistical (but not systematic)
915: error.
916: For example, it was possible to obtain the correct
917: free energy for methane based on 10,000 reference structures
918: even when the histogram for
919: each coordinate was assumed to be flat, i.e., without
920: the use of a physical ensemble (data not shown).
921: 
922: It is important to note that, while convergence to the
923: correct free energy is guaranteed for any
924: choice of reference system, the efficiency of the method could
925: be dramatically reduced if the reference system does not overlap
926: well with the physical system.
927: 
928: Given the fact that the physical ensemble need not be correct, it
929: is easy to imagine a modified method that does not require
930: simulation, but instead populates the histogram bins using the ``bare''
931: potential for each internal coordinate (e.g., Gaussian histograms
932: for bond lengths and angles). Of course,
933: the conformational state must be defined explicitly,
934: with upper and lower limits for coordinates.
935: Allowed ranges for the torsions (especially
936: $\phi,\psi$) are naturally obtainable via, e.g., Ramachandran
937: propensities (e.g., Ref.\ \onlinecite{richardson}),
938: and reasonable ranges for bond lengths and angles
939: could be chosen to be, e.g., several standard deviations
940: from the mean.
941: 
942: \subsection{Extension to larger systems}
While the initial results of our reference system method are
promising, a naive implementation of the
method will encounter difficulty with large systems (as do
all absolute and relative free energy methods).
For our method, the difficulty
with a very large number of degrees of freedom
arises because,
if one does not treat all correlations in
the backbone, steric clashes will occur frequently when
generating the reference ensemble.
953: 
954: However, it is possible to extend the method
955: to larger peptides, still include all degrees of freedom, and
956: bin all coordinates independently (important for broadening
957: configurational space, as discussed above), by using
958: a ``segmentation'' technique motivated by earlier work.
959: \cite{gibson-seg,leach-seg}
960: Consider generating reference
961: structures for a ten-residue peptide in the alpha helix conformation.
962: Due to the large number of backbone torsions,
963: most of the reference structures chosen at random
964: will not be energetically favorable.
965: However, if one breaks the peptide into two pieces, then one can
966: generate many structures for each segment, and only
967: ``keep'' energetically likely segment structures.
968: The selected structures
969: may be joined to form full structures which are reasonably likely
970: to have low energy. 
971: For example, if one generates $10^5$ structures for each of the
972: two segments and keeps only $10^3$ of those, then one only need
973: evaluate $10^3 \times 10^3 = 10^6$ full structures
974: out of a possible $10^5 \times 10^5 = 10^{10}$.
975: A statistically correct segmentation strategy
976: is currently being investigated by the authors for use in
977: large peptides.
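
The savings in this example can be stated explicitly. The fraction of
all possible segment combinations that must be evaluated is
\[
    \frac{10^3 \times 10^3}{10^5 \times 10^5} = 10^{-4}.
\]
If one further assumes, for illustration only, that screening requires
one (less expensive) energy evaluation per segment structure, then the
total number of energy evaluations is roughly
\[
    2 \times 10^5 + 10^6 = 1.2 \times 10^6 .
\]
These counts are only a sketch of the bookkeeping; they do not address
the reweighting needed to make the procedure statistically correct.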
978: 
979: Another strategy which may prove useful for larger systems
980: is to use the reference system method with multi-stage simulation.
981: Multi-stage
982: simulation requires the introduction of a hybrid potential energy
983: parameterized by $\lambda$, e.g.,
984: \begin{eqnarray}
985:     U_{\lambda} = \lambda U_{\rm phys} + (1-\lambda) U_{\rm ref}.
986: \end{eqnarray}
987: Thus, $U_0 = U_{\rm ref}$ and $U_1 = U_{\rm phys}$.
988: Simulations are performed using the hybrid potential energy
989: $U_{\lambda}$ (and thus a hybrid forcefield, if using molecular
990: dynamics) at intermediate $\lambda$ values between 0 and 1.
991: Conventional free energy methods such as thermodynamic integration
992: or free energy perturbation can then be used to
993: obtain $F_{\rm phys}$.
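
For the linear hybrid potential above, the thermodynamic integration
estimate takes a particularly simple form. Since
$\partial U_{\lambda} / \partial \lambda = U_{\rm phys} - U_{\rm ref}$,
the standard thermodynamic integration identity gives
\[
    F_{\rm phys} - F_{\rm ref}
    = \int_0^1 d\lambda \,
    \left\langle U_{\rm phys} - U_{\rm ref} \right\rangle_{\lambda},
\]
where $\langle \cdots \rangle_{\lambda}$ denotes an average over an
ensemble sampled according to $U_{\lambda}$, and $F_{\rm ref}$ is the
exactly computable free energy of the reference system (the notation
is assumed here to match that of the preceding sections).
This is the textbook expression rather than a procedure tested in this
work; in practice the integral would be approximated by quadrature over
a discrete set of $\lambda$ values.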
994: 
We also believe that including correlations, as suggested
by Eq.\ (\ref{eq-Pcorr}) and possibly in other ways, may be useful.
The inclusion of correlations should improve the overlap between
the reference and physical ensembles, thereby reducing the amount
of sampling required in the reference system and hence improving efficiency.
This will also be explored in future work.
1001: (We also remind the
1002: reader that the final free energy value includes the full correlations
1003: in $U_{\rm phys}$, regardless of $U_{\rm ref}$.)
1004: 
1005: The method could prove useful in future protein-ligand binding
1006: studies. In the simplest approach, one could freeze all degrees of
1007: freedom except for the ligand and side-chain degrees of freedom
1008: in the binding site. While the absolute free energy would be unphysical,
1009: the approach could permit comparison of ligands or protein mutations
1010: with little or no conformational similarity.
1011: 
1012: In principle, it is possible to extend the reference
1013: system method to include explicitly solvated biomolecules.
1014: However, as with all absolute free energy methods, the
1015: addition of the solvent degrees of freedom causes
1016: the free energy estimate to converge much more slowly than
1017: without explicit solvent.
1018: Thus, we feel the method described in this study will find
1019: use primarily in implicitly solvated biomolecules.
1020: 
1021: \section{Conclusions}
1022: In conclusion, we have introduced and tested a simple
1023: method for calculating absolute
1024: free energies in molecular systems.
1025: The approach relies on the construction of an ensemble of
1026: reference structures (i.e., the reference system) 
1027: that is designed to have high overlap with the physical system
1028: of interest.
1029: The method was first shown to reproduce exactly computable
1030: absolute free energies for simple systems, and
1031: then used to correctly predict the stability of leucine
1032: dipeptide conformations
1033: using all one-hundred fifteen degrees of freedom.
1034: 
1035: Some strengths of the approach are that:
1036: (i) the reference system is built to have good overlap with the system
1037: of interest by using internal coordinates and by using
1038: a single equilibrium ensemble from Monte Carlo or molecular dynamics;
1039: (ii) the absolute free energy estimate is guaranteed to converge to the
1040: correct value, whether or not the physical ensemble is complete
1041: and, in fact, it is possible to estimate the absolute free energy
1042: without the use of a physical ensemble;
1043: (iii) the method explicitly includes all degrees of freedom employed
1044: in the simulation;
1045: (iv) the reference system need only be numerically
1046: computable, i.e., the exact analytic result is not needed; and
(v) the method can be trivially extended to include the use
1048: of multi-stage simulation.
1049: The CPU cost of the approach, beyond that for short trajectories
1050: of the physical system of interest,
1051: is one energy call for each reference structure, plus
1052: the less expensive cost of generating the reference ensemble.
1053: 
1054: In the present ``proof of principle'' report,
our method was used to study conformational
equilibria; however, we feel that the method, owing to its simplicity
and flexibility, may find broad use in computational biophysics
and biochemistry for a wide variety of free energy problems.
1059: We have also described a segmentation strategy, currently being
1060: pursued, to use the approach in much larger systems.
1061: 
1062: \section*{Acknowledgments}
1063: The authors would like to thank Edward Lyman, Ronald White,
1064: Srinath Cheluvarajah and Hagai Meirovitch for many
1065: fruitful discussions.
1066: 
1067: \bibliography{}
1068: 
1069: \end{document}
1070: