1: \documentclass[prb,twocolumn,amsmath,amssymb]{revtex4}
2: %\documentclass[prb,preprint,amsmath,amssymb]{revtex4}
3:
4: \usepackage{graphicx} %figures
5: \bibliographystyle{apsrev}
6:
7: \newcommand{\taud}{\tau_{\text{dec}}}
8: \newcommand{\tsim}{t_{\text{sim}}}
9:
10: \begin{document}
11:
12: \title{Demonstrated convergence of the equilibrium ensemble for a fast
13: united-residue protein model}
14: \author{F.\ Marty Ytreberg\footnote{E-mail: ytreberg@uidaho.edu}}
15: \affiliation{Department of Physics,
16: University of Idaho, Moscow, ID 83844-0903}
17: \author{Svetlana Kh.\ Aroutiounian\footnote{The first two authors contributed
18: equally}}
19: \affiliation{Department of Physics, Dillard University,
20: 2601 Gentilly Blvd., New Orleans, LA 70122}
21: \author{Daniel M.\ Zuckerman\footnote{E-mail: dmz@ccbb.pitt.edu}}
22: \affiliation{Department of Computational Biology,
23: University of Pittsburgh, 3064 BST-3, Pittsburgh, PA 15213}
24: \date{\today}
25:
26: \begin{abstract}
27: Due to the time-scale limitations of all-atom simulation of proteins,
28: there has been substantial interest in coarse-grained approaches.
29: Some methods, like ``Resolution Exchange,''
30: [E.\ Lyman \emph{et al.}, Phys.\ Rev.\ Lett.\ {\bf 96}, 028105 (2006)]
31: can accelerate canonical
32: all-atom sampling, but require properly distributed coarse ensembles.
33: We therefore demonstrate that full sampling can indeed be achieved in a
34: sufficiently simplified protein model, as verified by a recently
35: developed convergence analysis.
36: The model accounts for protein backbone geometry in that rigid
37: peptide planes rotate according to atomistically defined dihedral
38: angles, but there are only two degrees of freedom
39: ($\phi$ and $\psi$ dihedrals) per residue.
40: Our convergence analysis indicates that small proteins
41: (up to 89 residues in our tests) can be simulated for more than
42: 50 ``structural decorrelation times'' in less than a week on
43: a single processor.
We show that the fluctuation behavior is reasonable,
and we discuss applications, limitations, and extensions of the model.
46: \end{abstract}
47:
48: \maketitle
49:
50: \section{Introduction}
51: How simplified must a molecular model of a protein be for
52: it to allow full canonical sampling?
53: This question may be important to the solution of the protein
54: sampling problem---the generation of protein structures properly
55: distributed according to statistical mechanics---because of the
56: well-known inadequacy of all-atom simulations, which
57: are limited to sub-microsecond timescales.
58: Even small peptides have proven slow to reach convergence
59: \cite{lyman-converge}.
60: Sophisticated atomistic methods, moreover, which often employ
61: elevated temperatures \cite{swendsen-repx,nemoto,hansmann,okamoto,garcia-repx},
62: have yet to show they can overcome the remaining gap in
63: timescales \cite{zuckerman-barriers}---which is
64: generally considered to be several orders of magnitude.
65: On the other hand, because of the drastically reduced numbers
66: of degrees of freedom and smoother landscapes,
67: coarse-grained models
68: (e.g., Refs.\ \onlinecite{levitt-nature,go,scheraga75,kuntz-coarse,
69: miyazawa,skolnick,wolynes,dill,thirumalai,
70: friesner,jernigan-bahar,karplus97,scheraga97a,scheraga97b,clementi-pnas,hall,
71: shakhnovich,voth-forcematching,zuckerman-cam})
72: may have the potential to aid the ultimate solution
73: to the sampling problem, particularly in light of recently developed
74: algorithms like ``Resolution Exchange''
75: \cite{lyman-resx,lyman-resx2}
76: and related methods \cite{luo-coarse,vangunsteren-resx,voth-resx}.
77:
78: Although the Resolution Exchange approach, in principle, can produce
79: properly distributed atomistic ensembles of protein configurations,
80: it requires full sampling at the coarse-grained
81: level \cite{lyman-resx,lyman-resx2}.
82: While the potential for such full sampling has been suggested by
83: some studies of folding and conformational change
84: (e.g., Refs.\ \onlinecite{clementi-jmb,zuckerman-cam}),
85: convergence has yet to be carefully quantified in equilibrium
86: sampling of folded proteins.
87: How much coarse-graining really is necessary?
88: What is the precise computational cost of different approaches?
89: This report begins to answer these questions by studying
90: a united-residue model with realistic backbone geometry.
91:
92: We will require a quantitative method for assessing sampling.
93: A number of approaches have been suggested
94: \cite{brooks-converge,thirumalai-converge,vangunsteren-converge,
95: pande-converge,lyman-converge},
96: but we rely on a recently proposed statistical approach which
97: directly probes the fundamental configuration-space distribution
98: \cite{lyman-converge,lyman-converge2}.
99: The method does not require knowledge of important
100: configurational states or any parameter fitting.
101: In essence, the approach attempts to answer the most
102: fundamental statistical question,
103: ``What is the minimum time interval between snapshots so
104: that a set of structures will behave as if each member
105: were drawn independently from the configuration-space
106: distribution exhibited by the full trajectory?''
107: This interval is termed the structural decorrelation time $\taud$,
108: and the goal is to generate simulations of length $\tsim \gg \taud$.
109:
110: In this report, we demonstrate the convergence
111: of the equilibrium ensemble for several proteins using a fast,
112: united-residue model employing rigid peptide planes.
113: The relative motion of the planes is determined by the
114: \emph{atomistic} geometry embodied in the $\phi$ and $\psi$
115: dihedral-angle rotations, as explained below.
116: We believe such realistic backbone geometry will
117: be necessary for success in Resolution Exchange studies.
118: The use of geometric look-up tables enables the rapid use of
119: only two degrees of freedom per residue ($\phi$ and $\psi$),
120: and one interaction site at the alpha-carbon.
121: The simulations are therefore extremely fast.
122: G{\=o} interactions stabilize the native state
123: while permitting substantial fluctuations in the overall backbone.
124:
125: After the model and the simulation approach are explained,
126: the fluctuations are compared with experimental data
127: from X-ray temperature factors and the diversity of NMR structure sets.
128: The simulations are then analyzed for convergence and timing.
129:
130: \begin{figure}
131: \begin{center}
132: \includegraphics[width=5.7cm,clip]{pep-plane.eps}
133: \end{center}
134: \caption{\label{fig-rigid}
The rigid peptide plane model used in this study.
136: Note that, in the coarse-grained simulations,
137: only alpha-carbons are represented,
138: and the only degrees of freedom are $\phi$ and $\psi$.
139: Other atoms are shown in the figure only to clarify
140: the geometry and our
141: assumption of rigid peptide planes.
142: }
143: \end{figure}
144:
145: \section{Coarse-grained model}
146: The coarse-grained model used for this study
147: was chosen to meet several criteria:
(i) the fewest possible degrees of freedom per residue;
149: (ii) the ability to utilize lookup tables for enhanced simulation speed;
150: (iii) the stability of the native state along with the potential
151: for substantial non-native fluctuations; and,
152: (iv) the ability to allow the addition of
153: chemical detail, as simply as possible.
154: Thus, we chose a rigid peptide plane model with G{\=o}
155: interactions \cite{go,go2,go-pnas}
156: and sterics based on alpha-carbon interaction sites as shown
157: in Fig.\ \ref{fig-rigid}.
158: The use of such a simple model, we emphasize, is consistent
with our goal of understanding both the potential and the
limitations of coarse models for statistically valid sampling.
161: Once we have understood the costs associated with the present model,
162: we can design more realistic models, as discussed below.
163: In other words, we made no attempt to design the most
164: chemically realistic coarse-grained model,
165: although we believe the use of atomistic peptide geometry
166: is an improvement over a coarse model we considered
167: previously \cite{zuckerman-cam}.
168:
The rigid peptide planes allow the use of only two degrees
170: of freedom per residue, arguably the fewest that one would consider
171: in such a model.
172: Indeed, this is fewer than in a freely rotating chain,
173: although admittedly our model requires somewhat more
174: complex simulation moves, described below.
175:
176: G{\=o} interactions were used because they simultaneously
177: stabilize the native
178: state of the protein and also permit reasonable equilibrium
179: fluctuations, as was shown in an earlier study \cite{zuckerman-cam}.
180: Given our interest in native-state fluctuations and the lack of a
181: \emph{universal} coarse-grained model capable of stabilizing the
182: native state for \emph{any} protein, G{\=o} interactions are a
183: natural choice for enforcing stability.
184: Further, beyond the reasonable ``local'' fluctuations shown below,
185: the model also exhibits partial unfolding events which are
186: expected both theoretically and
187: experimentally \cite{falke,englander,kern}.
188:
189: Because we see the present model as only a first step in the
190: development of better models, it is important that it easily
191: allows for the addition of
192: chemical detail, such as Ramachandran propensities which
193: require only the dihedral angles we use explicitly \cite{richardson}.
194: Furthermore, with a rigid peptide plane, the locations
195: of all backbone atoms---and the beta carbon---are known implicitly.
196: Thus hydrogen-bonding and hydrophobic interactions \cite{dill} can
197: be included in the model with little effort.
198: In other words, the ``extendibility'' of the
199: present simple model was a significant factor in its design.
200:
201: \subsection{Potential energy of model system}
202: The total potential used in the model is given by
203: \begin{equation}
204: U = U^{\rm nat} + U^{\rm non},
205: \label{eq-u}
206: \end{equation}
207: where $U^{\rm nat}$ is the total energy for native contacts,
208: and $U^{\rm non}$ is the total energy for non-native contacts.
209:
210: For the G{\=o} interactions, all residues that are separated by a distance
211: \emph{less} than $R_{\rm cut}$ in the experimental structure are
212: given native interaction energies defined by a square well:
213: \begin{align}
214: U^{\rm nat} &= \sum_{ \{i<j\} }^{\rm native} u^{\rm nat}(r_{ij}),
215: \nonumber \\
216: u^{\rm nat}(r_{ij}) &= \left\{
217: \begin{array}{l}
218: \infty \;\; {\rm if} \;\; r_{ij} < r_{ij}^{\rm nat}(1-\delta)\\
219: -\epsilon \;\; {\rm if} \;\;
220: r_{ij}^{\rm nat}(1-\delta) \leq r_{ij}
221: < r_{ij}^{\rm nat}(1+\delta)\\
222: 0 \;\; {\rm otherwise}
223: \end{array}
224: \right.,
225: \label{eq-un}
226: \end{align}
where $r_{ij}$ is the $C_\alpha$--$C_\alpha$
distance between residues $i$ and $j$, $r_{ij}^{\rm nat}$
is the distance between the residues in the experimental structure,
230: $\epsilon$ determines the energy scale of the native
231: G{\=o} attraction,
232: and $\delta$ is a parameter to choose the width of the well.
233: All residues that are separated by \emph{more} than $R_{\rm cut}$ in the
234: experimental structure are
235: given non-native interaction energies defined by
236: \begin{align}
U^{\rm non} &= \sum_{ \{i,j\} }^{\text{non-native}} u^{\rm non}(r_{ij}),
238: \nonumber \\
239: u^{\rm non}(r_{ij}) &= \left\{
240: \begin{array}{l}
241: \infty \;\; {\rm if} \;\; r_{ij} < (\rho_i+\rho_j)(1-\delta)\\
242: +h\epsilon \;\; {\rm if} \;\; (\rho_i+\rho_j)(1-\delta)
243: \leq r_{ij} < R_{\rm cut}\\
244: 0 \;\; {\rm otherwise}
245: \end{array}
246: \right.,
247: \label{eq-unn}
248: \end{align}
249: where $\rho_i$ is the hard-core radius of residue $i$ defined as half the
250: $C_\alpha$ distance to the nearest non-covalently-bonded residue,
251: and $h$ determines the strength of the repulsive interaction.
252:
253: For this study, parameters were chosen to be similar to those
254: in Ref.\ \onlinecite{zuckerman-cam}, i.e.,
255: $\epsilon=1.0$, $h=0.3$, $\delta=0.1$, and $R_\text{cut}=8.0$ \AA.
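
As an illustration of Eqs.\ (\ref{eq-un}) and (\ref{eq-unn}) with these
parameters, consider a hypothetical native pair with
$r_{ij}^{\rm nat}=6.0$ \AA\ (a value assumed here only for the example,
not taken from any of the proteins studied).
With $\delta=0.1$ and $\epsilon=1.0$, the square well becomes
\begin{equation*}
u^{\rm nat}(r_{ij}) = \left\{
\begin{array}{ll}
\infty & r_{ij} < 5.4\ \text{\AA} \\
-1.0 & 5.4\ \text{\AA} \leq r_{ij} < 6.6\ \text{\AA} \\
0 & \text{otherwise}
\end{array}
\right. .
\end{equation*}
Similarly, for a hypothetical non-native pair with
$\rho_i+\rho_j=4.0$ \AA\ (again an assumed value), the hard core
extends to $4.0 \times 0.9 = 3.6$ \AA, and the repulsive shoulder of
height $h\epsilon = 0.3$ spans
$3.6\ \text{\AA} \leq r_{ij} < 8.0\ \text{\AA}$.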
256:
257: \subsection{Monte Carlo simulation}
258: The protein fluctuations were generated using
259: Metropolis Monte Carlo \cite{metropolis}.
260: Trial configurations were generated by adding a random Gaussian
261: deviate to the values of three sequential pairs of backbone torsions
262: (three $\phi$ and three $\psi$ angles).
263: We found that changing six sequential backbone
264: torsions maximizes the rate of convergence of the equilibrium ensemble
265: (data not shown).
266: The energy of the trial configuration was
267: then determined using Eq.\ (\ref{eq-u}), and the conformation
268: was accepted with probability $\min (1,e^{-\Delta U/k_BT} )$,
269: where $\Delta U$ is the total change in potential energy of the system.
270: The width of the Gaussian distribution for generating random deviates
271: was chosen such that the acceptance
272: ratio was about 40\% for all simulations.
273: The choice of temperature is discussed below.
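
For orientation, the acceptance rule can be evaluated for two simple
cases, taking $k_BT=0.5$ as an example temperature together with the
parameter values $\epsilon=1.0$ and $h=0.3$ given above.
The symbol $P_{\rm acc}$ is notation introduced here for the
acceptance probability.
A trial move that breaks a single native contact and changes nothing
else has $\Delta U = +\epsilon = 1.0$, while a move that brings one
non-native pair inside $R_{\rm cut}$ (but outside the hard core)
has $\Delta U = h\epsilon = 0.3$:
\begin{align*}
P_{\rm acc}(\Delta U = 1.0) &= e^{-1.0/0.5} = e^{-2} \approx 0.14, \\
P_{\rm acc}(\Delta U = 0.3) &= e^{-0.3/0.5} = e^{-0.6} \approx 0.55 .
\end{align*}
Moves with $\Delta U \leq 0$ are always accepted, and moves that create
a hard-core overlap ($\Delta U = \infty$) are always rejected.
These single-contact cases are illustrative only, since a move of six
torsions may change several pair distances at once.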
274:
275: \subsection{Use of lookup tables}
276: The speed of the coarse-grained simulation was enhanced by using
277: lookup tables to avoid unnecessary computation.
278: In general, utilizing lookup tables increases memory
279: usage while decreasing the number of computations.
280: Since memory is inexpensive and can be expanded easily,
281: utilizing as much memory as possible can be an effective
282: technique for increasing the speed of simulations.
283:
284: In our model there are only two degrees of freedom per residue
285: ($\phi,\psi$), but $C_\alpha$ distances $r_{ij}$ must be
286: computed to determine native and non-native interaction energies
given by Eqs.\ (\ref{eq-un}) and (\ref{eq-unn}).
288: All peptide planes are considered to possess ideal,
289: rigid geometry as determined by energy minimization of
290: the all-atom OPLS forcefield \cite{oplsaa}
291: using the {\sc tinker} simulation package \cite{tinker}.
292:
293: Given a sequence of three residues (alpha carbons),
294: we employed a lookup table to provide the Cartesian coordinates
295: of the third residue---starting from the N-terminus---and
296: its normal vector as a function of
297: $\phi$ and $\psi$; see Fig.\ \ref{fig-rigid}.
298: The table values assume that the first residue is at the origin
299: and the second residue is located on the z-axis. Once the coordinates
300: for the third residue were determined via the lookup table, the fourth
301: residue position was determined using the lookup table in conjunction with
302: a coordinate rotation and shift. Continuing in this fashion, coordinates
303: for the entire protein were determined.
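
The chain-building procedure can be summarized schematically as a
recursion.
The notation below is introduced here for illustration and is not taken
from the original implementation: $\mathbf{x}_n$ is the position of
alpha carbon $n$, $\mathsf{R}_n$ is the rotation that maps the
reference frame of the table (first residue at the origin, second on
the $z$-axis) onto the local frame at residue $n$, and
$\mathbf{t}(\phi,\psi)$ and $\hat{\mathbf{n}}(\phi,\psi)$ are the
tabulated position and normal vector.
The dihedrals are assumed to be those of the middle residue of each
triplet.
\begin{align*}
\mathbf{x}_{n+2} &= \mathbf{x}_{n}
 + \mathsf{R}_{n}\,\mathbf{t}(\phi_{n+1},\psi_{n+1}), \\
\hat{\mathbf{n}}_{n+2} &=
 \mathsf{R}_{n}\,\hat{\mathbf{n}}(\phi_{n+1},\psi_{n+1}) .
\end{align*}
In this sketch, $\mathsf{R}_{n+1}$ is then rebuilt from the new bond
direction $\mathbf{x}_{n+2}-\mathbf{x}_{n+1}$ and the normal
$\hat{\mathbf{n}}_{n+2}$, so that each additional residue costs one
table lookup, one rotation, and one shift.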
304:
305: The resolution of the lookup table is an important consideration, i.e.,
306: the number of $\phi,\psi$ values for which Cartesian coordinates are stored.
307: In our simulations, we tried resolutions as high as $0.1^\circ$
308: and as low as $1.0^\circ$, and found no difference between the results.
309: Thus, all simulation results presented here use lookup tables with a
310: resolution of $1.0^\circ$.
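
A rough estimate of the memory cost at the two resolutions can be
made under assumptions that are not stated in the text: that the table
covers the full $360^\circ$ range of both angles, that each entry
stores six numbers (three Cartesian coordinates and three
normal-vector components), and that each number occupies 8 bytes.
\begin{align*}
1.0^\circ:&\quad 360^2 \times 6 \times 8\ \text{bytes}
 \approx 6.2\ \text{MB}, \\
0.1^\circ:&\quad 3600^2 \times 6 \times 8\ \text{bytes}
 \approx 620\ \text{MB}.
\end{align*}
Under these assumptions the coarser table is about 100 times smaller,
which is consistent with preferring the $1.0^\circ$ resolution given
that the results did not differ.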
311:
312: \subsection{\label{sec-equil}Initial protein relaxation}
313: One perhaps unexpected complication of utilizing a rigid peptide plane model
314: is that great care must be taken to relax the protein
315: before simulations can be performed.
316: Although initial values of $\phi,\psi$ are obtained from the
317: X-ray or NMR structure,
318: there are slight deviations from planar/ideal geometry in
319: a real protein. These deviations, while small, can accumulate rapidly to
320: become very large differences in the Cartesian coordinate positions
321: of the residues.
322: Thus, the positions of residues near the beginning of the protein
323: will be nearly correct, while the residues near the end of the protein
324: will likely have large errors---compared to the experimental structure being
325: modeled---which can create severe steric clashes or even incorrect
326: protein topology.
327: The severity of these ``errors'' necessitates the use of a relaxation
328: procedure to generate a suitable starting structure---i.e., a set of $\phi$
329: and $\psi$ angles which, with our ideal-geometry peptide planes,
330: lead to a topologically reasonable and relatively clash-free structure.
331:
332: Before we detail our relaxation procedure, we note that the need for this
333: additional calculation is an artifact of the simplicity of
334: our model which can be overcome.
With the use of lookup tables, in fact,
336: it is possible to include \emph{flexible} peptide planes
337: without significantly increasing the computational
338: cost of the model.
339: Such an approach, which does not require initial relaxation,
340: is currently under investigation with promising preliminary results
341: (data not shown).
342:
343: The relaxation procedure employed in the present study first
344: uses the $\phi,\psi$ values directly obtained from
345: the experimental structure.
346: These dihedrals provide the initial (problematic)
347: structure for a coarse-grained simulation.
348: Due to the deviations from planarity described above,
the root-mean-square deviation (RMSD)
350: between the initial structure we create
351: and the experimental structure tends to be large ($\sim 10$ \AA\
352: was not uncommon for the proteins in this study).
353: To increase the number of native contacts and reduce the number
354: of steric clashes, we next performed what we term ``RMSD Monte
355: Carlo'' to relax the protein to a low RMSD structure.
356: Trial moves for RMSD Monte Carlo were created as described above, but accepted
357: with probability $\min (1,e^{-\Delta(\text{RMSD})/k_BT_\text{RMSD}} )$,
358: where $k_BT_\text{RMSD} = 10^{-7}$ was chosen so that moves to a higher
359: RMSD were rare.
360: In other words, the energy function itself was not used in this initial phase.
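
For example, assuming the RMSD is expressed in \AA, a trial move that
raises the RMSD by only $10^{-6}$ \AA\ (an increment chosen here purely
for illustration) is accepted with probability
\begin{equation*}
e^{-\Delta(\text{RMSD})/k_BT_\text{RMSD}}
 = e^{-10^{-6}/10^{-7}} = e^{-10} \approx 4.5\times 10^{-5},
\end{equation*}
while any move that lowers the RMSD is always accepted.
With this choice of $k_BT_\text{RMSD}$ the procedure therefore acts
essentially as a stochastic minimization of the RMSD.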
361:
362: Since residues near the beginning of the protein have less
363: error in the starting structure than residues near the end, we used
364: RMSD Monte Carlo in segments. The first twenty residues were relaxed
365: until the RMSD was constant within a tolerance of 0.0001 \AA, followed
by the first forty, then the first sixty, and so on until the RMSD of the
entire protein was relaxed. The RMSD Monte Carlo simulation
typically brought the RMSD of the simulated structure to less than
0.5 \AA; however, there were generally still steric clashes,
and some native contacts were still not present.
371:
The final stage of relaxation was a regular (i.e., energy-based)
Metropolis Monte Carlo simulation at a very low temperature.
374:
375: Relaxation was performed until four criteria were met:
376: (i) the number of native contacts in the relaxed structure
377: was equal to that in the NMR or X-ray structure;
378: (ii) no steric clashes were present;
379: (iii) no non-native contacts were present,
i.e., $U^{\rm non} = 0$ in Eq.\ (\ref{eq-unn}); and
381: (iv) the RMSD was less than 1.0 \AA.
382: When these criteria were
383: met the structure was saved and used
384: as the starting configuration in all future simulations of the protein.
385:
386: \begin{figure*}
387: \begin{center}
388: \includegraphics[width=5.7cm,clip]{rmsf-bar.eps}
389: \hfill
390: \includegraphics[width=5.7cm,clip]{rmsf-cam.eps}
391: \hfill
392: \includegraphics[width=5.7cm,clip]{rmsf-pg.eps}
393: \end{center}
394: \begin{center}
395: \includegraphics[width=5.7cm,clip]{rmsd-1a19.eps}
396: \hfill
397: \includegraphics[width=5.7cm,clip]{rmsd-1cll.eps}
398: \hfill
399: \includegraphics[width=5.7cm,clip]{rmsd-1pgb.eps}
400: \end{center}
401: \caption{\label{fig-rmsf}
402: (Color online)
403: Relative alpha-carbon root mean square fluctuations for three
404: different proteins: (a) barstar, (b) calmodulin, and (c) protein G.
405: Each plot shows results for the
406: X-ray structure (dot-dash), the NMR ensemble (dash),
407: and the coarse-grained simulation (solid).
408: X-ray results were given by $\sqrt{3B/8\pi^2}$, where
409: $B$ is the temperature factor given in the PDB entry.
410: NMR and simulation data were generated using the
411: g\_rmsf program in the {\sc gromacs} molecular simulation
412: package \cite{gromacs}; each ensemble was aligned to the
413: first structure in the corresponding trajectory.
414: For each coarse-grained simulation, $2\times 10^9$ Monte
415: Carlo steps were performed with snapshots saved every
416: 1000 steps, and the potential energy
417: \eqref{eq-u} was set up using the X-ray structure.
Panels (d)--(f) show the corresponding whole-structure
419: fluctuations as indicated by the RMSD from the experimental structures.
420: }
421: \end{figure*}
422:
423: \section{Results and Discussion}
424: Using the coarse-grained protein model described above, we
425: generated and tested equilibrium ensembles for three proteins:
barstar (PDB entry 1A19, residues 1--89),
the N-terminal domain of calmodulin (PDB entry 1CLL, residues 4--75), and
the binding domain of protein G (PDB entry 1PGB, residues 1--56).
429:
430: For each protein, the initial simulation structure was generated,
431: followed by RMSD and energy relaxation, as described in
432: Sec.\ \ref{sec-equil}. Then, production runs of
433: $2 \times 10^9$ Monte Carlo moves were performed with snapshots
434: saved every 1000 moves, generating an equilibrium ensemble
435: with $2 \times 10^6$ frames.
436:
437: In an attempt to obtain consistent results for the three proteins,
438: we chose the temperature of the simulation, $k_BT$, to
439: be slightly below the unfolding temperature of the protein. The unfolding
440: temperature was determined by running simulations over a broad range
441: of temperatures and studying the RMSD as a function of simulation
442: time. The temperatures used in the simulations were $k_BT=0.6$ for barstar,
443: $k_BT=0.4$ for calmodulin and $k_BT=0.5$ for protein G.
444:
445: \subsection{Speed of simulations}
Due to the use of lookup tables for coordinate transformations,
the small number of degrees of freedom,
and the use of simple square-well potentials, the equilibrium
ensembles were generated very rapidly.
450:
451: Running on one Xeon 2.4 GHz processor, $2 \times 10^9$ Monte
452: Carlo moves with snapshots saved every 1000 steps took roughly
453: 6 days for barstar, 4 days for calmodulin, and 3 days for protein G.
454: Thus, less than a week was required to obtain well-converged
455: (see Sec.\ \ref{sec-conv}) simulations
456: of these coarse-grained proteins.
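
These timings correspond to the following approximate throughputs,
computed from the numbers quoted above with
$1\ \text{day} = 86\,400\ \text{s}$:
\begin{align*}
\text{barstar:} &\quad
 \frac{2\times10^{9}}{6 \times 86\,400\ \text{s}}
 \approx 3.9\times10^{3}\ \text{moves/s}, \\
\text{calmodulin:} &\quad
 \frac{2\times10^{9}}{4 \times 86\,400\ \text{s}}
 \approx 5.8\times10^{3}\ \text{moves/s}, \\
\text{protein G:} &\quad
 \frac{2\times10^{9}}{3 \times 86\,400\ \text{s}}
 \approx 7.7\times10^{3}\ \text{moves/s}.
\end{align*}
Because the run times are quoted only roughly, these rates should be
read as estimates.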
457:
458: \subsection{Protein fluctuations}
459: We first sought to determine whether fluctuations in the
460: coarse-grained simulation are reasonable.
461: Figure \ref{fig-rmsf} shows the alpha-carbon
462: relative root mean square fluctuation for three
463: different proteins.
464: The figures show that there is reasonable qualitative agreement
465: between the NMR, X-ray and simulation data.
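
To give a sense of scale for the X-ray curves, the conversion quoted in
the caption of Fig.\ \ref{fig-rmsf} can be evaluated for a
representative temperature factor.
The value $B = 20$ \AA$^2$ used here is hypothetical and is not taken
from the PDB entries studied:
\begin{equation*}
\sqrt{\frac{3B}{8\pi^2}}
 = \sqrt{\frac{3 \times 20\ \text{\AA}^2}{8\pi^2}}
 \approx \sqrt{0.76\ \text{\AA}^2}
 \approx 0.87\ \text{\AA}.
\end{equation*}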
466:
467: It should be noted that, in fact, \emph{none} of the three
468: data sets in Figs.\ \ref{fig-rmsf}a, b and c represents the true
469: fluctuations in the protein---for different reasons.
470: The X-ray temperature factor, in addition to thermal fluctuations,
471: includes crystal lattice artifacts and other experimental errors \cite{northrup}.
472: NMR ensembles tend to be biased, perhaps severely, toward low energy structures, and
473: thus also do not represent equilibrium ensembles \cite{spronk}.
Finally, our simulation data are
not accurate due to the lack of chemical detail in the forcefield.
476:
477: In spite of the limitations of the analysis, we conclude
478: from Fig.\ \ref{fig-rmsf} that
479: the fluctuations of the coarse-grained model are in fact
480: reasonable.
481:
482: The bottom panels of Fig.\ \ref{fig-rmsf} show the whole-molecule
483: fluctuations exhibited throughout the trajectories.
484: In addition to the ability to sample large conformational
485: fluctuations---such as in the case of calmodulin and,
486: to a lesser degree, for protein G---the trajectories are
487: visibly more converged than is typically observed in atomistic
simulations, where RMSD values rarely reach a plateau,
let alone sample around that plateau multiple times,
as would be desirable.
491:
492: \subsection{\label{sec-conv}Convergence analysis}
493: The primary purpose of this report is to demonstrate the convergence
494: of the equilibrium ensemble for a coarse-grained protein.
495: The details of the convergence analysis are described in
496: Ref.\ \onlinecite{lyman-converge2}, so we will only briefly describe
497: the method here.
498:
499: Previously, Lyman and Zuckerman \cite{lyman-converge}
500: developed an approach which groups sampled conformations
501: into structural histogram bins, using the RMSD as a metric.
502: While promising, the primary limitation of the method was
503: the lack of a quantitative measure of the convergence.
504:
505: In the method used here, convergence
506: was analyzed by studying the variance of the structural histogram bin
507: populations \cite{lyman-converge2}.
508: The new approach allows a rigorous
509: \emph{quantitative} estimation of convergence---the structural
510: decorrelation time
511: $\taud$, given by the time between frames required for the
512: variance to reach an analytically computable independent-sampling value.
513: Intuitively, and mathematically, $\taud$ is the time interval
514: between snapshots for which they behave as if each frame
515: were drawn independently.
516: If simulation times $\tsim \gg \taud$
517: are obtained, the equilibrium ensemble is considered converged.
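
The independent-sampling reference value can be made concrete with the
standard binomial result, given here only as a sketch; the exact
expression used in the analysis, including any finite-sample
correction, should be taken from Ref.\ \onlinecite{lyman-converge2}.
The symbols $f$, $n$, $\Delta t$, $\sigma^2_{\rm ideal}$, and
$\sigma^2_{\rm obs}$ are notation introduced here.
If a structural histogram bin holds a fraction $f$ of the full
trajectory and $n$ frames are drawn independently, the fraction
observed in the subsample has variance
\begin{equation*}
\sigma^2_{\rm ideal}(f) = \frac{f(1-f)}{n}.
\end{equation*}
The quantity monitored is then the ratio
$\sigma^2_{\rm obs}(f;\Delta t)/\sigma^2_{\rm ideal}(f)$ for subsamples
whose frames are separated by a time $\Delta t$, and $\taud$ is the
smallest $\Delta t$ for which this ratio reaches one.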
518:
519: Perhaps the most important feature of the convergence
520: analysis for our study is that the method does not require
521: any prior knowledge of important states.
522: Furthermore, there is no parameter-fitting or subjective analysis of any kind.
523:
524: \begin{figure*}
525: \begin{center}
526: \includegraphics[width=5.7cm,clip]{conv-bar.eps}
527: \hfill
528: \includegraphics[width=5.7cm,clip]{conv-cam.eps}
529: \hfill
530: \includegraphics[width=5.7cm,clip]{conv-pg.eps}
531: \end{center}
532: \caption{\label{fig-conv}
533: Convergence analysis for coarse-grained simulations of
534: three different proteins:
535: (a) barstar, (b) calmodulin, and (c) protein G.
536: Each plot shows the convergence properties for the same trajectories
537: as used for Fig.\ \ref{fig-rmsf},
538: analyzed using the procedure in
539: Ref.\ \onlinecite{lyman-converge2}.
540: The number of frames required to reach the value of one
541: (the solid horizontal line) is an approximation
542: of the structural decorrelation time $\taud$
543: and is shown on each plot.
544: The three curves on each plot are results for different
545: histogram sub-sample sizes \cite{lyman-converge2}
and demonstrate the robustness
547: of the value of $\taud$.
548: The plots predict that the decorrelation times are roughly
549: 40 000 frames for barstar, 20 000 frames for calmodulin
550: and 30 000 frames for protein G.
551: Note that the total number of frames generated for each protein
552: during the simulation was $2\times10^6$.
553: Thus, since each simulation was more than $50 \taud$
554: in length, we conclude that the equilibrium ensembles
555: are well-converged.
556: Error bars represent 80\% confidence intervals
557: in the expected fluctuations around the ideal value of one,
558: based on the given trajectory length and the numerical procedure
559: used to generate the solid curve.
560: }
561: \end{figure*}
562:
563: Figure \ref{fig-conv} shows the convergence properties
564: of the coarse-grained simulations using the same trajectories
565: as in Fig.\ \ref{fig-rmsf}.
566: The ratio of the observed variance to the ideal variance for independent
567: sampling is plotted as a function of the time between the configurations
568: used to compute the observed variance.
When this ratio decreases to one, the structural decorrelation time $\taud$
570: has been reached, as shown in the figure.
571: The analysis predicts that each simulation
572: is at least 50 times longer than the
573: structural decorrelation time.
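
Explicitly, using the approximate decorrelation times read from
Fig.\ \ref{fig-conv} and the $2\times10^{6}$ frames saved per
simulation, the ratios are
\begin{align*}
\text{barstar:} &\quad \tsim/\taud \approx
 \frac{2\times10^{6}}{4\times10^{4}} = 50, \\
\text{calmodulin:} &\quad \tsim/\taud \approx
 \frac{2\times10^{6}}{2\times10^{4}} = 100, \\
\text{protein G:} &\quad \tsim/\taud \approx
 \frac{2\times10^{6}}{3\times10^{4}} \approx 67 .
\end{align*}
Combined with the run times of roughly 6, 4, and 3 days, one
decorrelation time corresponds to roughly 2.9, 1.0, and 1.1 hours of
single-processor time, respectively.
These figures are estimates, since $\taud$ is read off the plots only
approximately.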
574:
575: Thus we conclude that, in less than a week of single-processor
576: time, the equilibrium ensembles for these three proteins are
577: well converged.
578:
579: \section{Conclusions}
580: We have demonstrated the convergence of the equilibrium
581: ensemble for a simple united-residue protein model.
582: The model assumes rigid peptide planes, with atomistically
583: correct geometry, and exhibits reasonable residue-level
fluctuations based on the planes' geometry, G{\=o} interactions,
585: and sterics.
586:
587: Most importantly, the results indicate \emph{quantitatively}
588: that carefully designed united-residue models have
589: the potential to fully sample protein fluctuations.
590: By using only two degrees of freedom per residue,
lookup tables for coordinate transforms, and
simple square-well potentials, we were able
to demonstrate that converged equilibrium ensembles
can be obtained in less
than a week of single-processor time.
596: The quantitative convergence analysis indicates that more than 50
597: ``decorrelation times'' were simulated in each case,
598: indicating high-precision ensembles.
599: In addition to application in Resolution Exchange sampling of
600: all-atom models \cite{lyman-resx,lyman-resx2},
601: such speed opens up the long-term possibility of large-scale
602: simulation of many proteins.
603:
604: One important practical limitation of the ideal-peptide-plane
605: geometry in the present model is the need to relax
the initial structure.
607: Proteins larger than 100 residues are difficult to relax.
608: However, we have already begun investigating a flexible-plane
609: model incorporating lookup tables which exhibits no such
610: limitation and remains computationally affordable.
611: We will report on the flexible model in the future.
612:
613: Although the intrinsic atomistic geometry of the peptide
614: plane was included in our model, it lacks chemical interactions.
615: Yet because we obtained converged ensembles in such a short
616: time, it is clear we can ``afford'' extensions
617: to the model which include realistic chemistry.
618: For instance, additional potential energy terms such as
619: Ramachandran propensities \cite{richardson},
620: hydrophobic interactions \cite{dill}
621: and hydrogen-bonding can be included at small cost.
622:
623: Aside from the potential for rigorous atomistic
624: sampling \cite{lyman-resx,lyman-resx2,ytreberg-bbrw},
625: it is important to note the general usefulness of coarse-grained
626: models for generating \emph{ad hoc} atomistic ensembles.
627: Specifically, upon generating a well-sampled ensemble of coarse-grained
628: structures, atomic detail can be added using existing software
629: such as those in Refs.\ \onlinecite{sccomp,rapper}.
630: Once minimized and relaxed,
631: these (now) atomically detailed structures form
632: an \emph{ad hoc} ensemble which
633: may be of immediate use in docking \cite{knegtel,shoichet-nature}
634: and homology modeling applications.
635: Further, in principle, such structures can be re-weighted
636: into the Boltzmann distribution
637: \cite{ytreberg-bbrw}.
638:
639: In the long term, one can imagine a day when structural databases
640: will be based not on single (static) structures but rather
641: will collect ensembles---as envisioned in the authors' scheme for an
``Ensemble Protein Database'' (http://www.epdb.pitt.edu/).
643:
644: \begin{acknowledgments}
645: We thank Edward Lyman, Bin Zhang and Artem Mamonov
646: for helpful discussions.
647: Funding was provided by the National Institutes of Health
648: under fellowship GM073517 (to F.M.Y.),
649: and grants GM070987 and ES007318,
650: and by the National Science Foundation grant MCB-0643456.
651: \end{acknowledgments}
652:
653: \bibliography{/home/marty/res/tex/my}
654:
655: \end{document}
656: