1: \documentclass[aps]{revtex4}
2: \usepackage{times}
3: \usepackage{graphicx}
4: \begin{document}
5:
6: \title{Accurate and efficient description of protein vibrational
7: dynamics: comparing molecular dynamics and Gaussian models}
8:
9: \date{\today}
10:
11: \author{Cristian Micheletti, Paolo Carloni and Amos Maritan\\
12: \small International School for Advanced Studies (S.I.S.S.A.) and INFM, Via Beirut 2-4, 34014 Trieste, Italy
13: }
15:
16:
17: \begin{abstract}
Current all-atom potential based molecular dynamics (MD) allows the
identification of a protein's functional motions on a wide range of
time-scales, up to a few tens of ns. However, functional large scale motions
21: of proteins may occur on a time-scale currently not accessible by
22: all-atom potential based molecular dynamics. To avoid the massive
23: computational effort required by this approach several simplified
24: schemes have been introduced. One of the most satisfactory is the
25: Gaussian Network approach based on the energy expansion in terms of
26: the deviation of the protein backbone from its native
27: configuration. Here we consider an extension of this model which
28: captures in a more realistic way the distribution of native
29: interactions due to the introduction of effective sidechain
30: centroids. Since their location is entirely determined by the protein
31: backbone, the model is amenable to the same exact and computationally
32: efficient treatment as previous simpler models. The
33: ability of the model to describe the correlated motion of protein
34: residues in thermodynamic equilibrium is established through a series
35: of successful comparisons with an extensive (14 ns) MD simulation
36: based on the AMBER potential of HIV-1 protease in complex with a
37: peptide substrate. Thus, the model presented here emerges as a
38: powerful tool to provide preliminary, fast yet accurate
characterizations of proteins' near-native motion.
40: \end{abstract}
41:
42: \maketitle
43:
44: \section{Introduction}
45:
46:
47: Considerable insight into the biological activity of a protein can be
48: gained by identifying its large-scale functional movements. Ideal
49: tools for a detailed characterization of such dynamical properties are
50: constituted by computational techniques such as molecular dynamics
51: (MD) simulations based on effective all-atom potentials
52: \cite{Karplus_ACR}. By these means it is possible, at present, to
53: follow numerically the dynamical evolution of a protein of a few hundred
54: residues in its surrounding solvent over time intervals of tens of
55: nanoseconds.
56:
Such time-scales make it possible to gain considerable insight into important
aspects of protein dynamics and to make quantitative connections with
experimental quantities such as NMR order parameters \cite{nmr1,nmr2}
and Trp fluorescence spectra \cite{karplus_book}. Other complex
conformational changes are, however, difficult or impossible to
observe. Examples include protein-protein molecular recognition and
rearrangements occurring upon ligand binding, which all involve
time-scales of the order of 1 $\mu$s or longer \cite{go82}. In
addition, the simulated trajectory might not be long enough for
thermodynamic averages to be legitimately replaced with dynamical
ones \cite{Hess2002}.
68:
Several studies have attempted to bridge the gap between the time
scales of feasible MD simulations and those of
biologically-relevant protein motion by resorting to a mesoscopic
rather than a microscopic approach \cite{tir96}. In fact, the
73: large-scale dynamical features encountered in MD trajectories can be
74: conveniently interpreted, at a first approximation, as a superposition
75: of independent harmonic modes \cite{go82}. This
76: observation was complemented by Tirion who pointed out that, in a
77: normal mode analysis of protein vibrations, the detailed classical
78: force-field could be replaced by suitable harmonic couplings with the
79: same spring constants \cite{tir96}. These results stimulated a variety
80: of studies where the elastic properties of proteins were described
81: through coarse-grained models where amino acids are replaced by
82: effective centroids corresponding to the $C_\alpha$ atoms and the
83: energy function is reduced to harmonic couplings between pairs of
84: spatially close centroids. These approaches, in particular the
85: Gaussian and anisotropic network models (GNM and ANM), have been found
86: to be in accord with both experimental and MD results
87: \cite{bah97,doruker2000,ani01}.
88:
Here we introduce an extended Gaussian network model which, unlike
previous approaches, incorporates effective $C_\beta$
centroids ``tethered'' to the $C_\alpha$ atoms. The presence of the
effective $C_\beta$'s allows good control of the directionality of
pairwise interactions in the protein and thus leads to an improved
vibrational description. Furthermore, the fact that the sidechain
degrees of freedom are entirely controlled by the $C_\alpha$'s has the
crucial implication that the computational effort required to
characterize the system is exactly the same as for models based only
on $C_\alpha$'s.
99:
The equilibrium dynamics of the model is compared against MD simulations
based on all-atom effective potentials. We provide several
quantitative estimates of how, and in what sense, the simplified
quadratic approaches can complement the more accurate but
also much more computationally demanding all-atom potential based MD
calculations. From a general point of view, the
ideal term of reference would be a direct experimental
determination of the quantities of interest here, such as the
correlation of residues' motion. However, since this is not presently
feasible, such detailed information can only be obtained from MD
simulations. Although such simulations do not provide an absolute term
of comparison, they are certainly an adequate reference for the much
simplified protein models and energy functionals considered here.
113:
114: The reference system considered here is the complex formed by the
115: HIV-1 protease dimer with a bound model substrate
116: (Fig. \ref{fig:hiv}). This protein appears to be suitable to serve as
117: reference for our model calculations in many respects. First, its
dynamics in aqueous solution has been extensively investigated over
more than 10 ns by MD simulations based on all-atom effective potentials
\cite{piana2002,piana2002b}. Second, a large-scale motion analysis
based on this MD simulation has already been performed. Third, this
protein/substrate complex does not contain metal ions, prosthetic
groups, cofactors or non-standard amino acids, and is therefore
suitable as a first test system for our model calculations, which
consider only standard amino acids. Finally, this protein is of
126: outstanding pharmaceutical relevance (it is one of the two targets
127: currently used in anti-AIDS therapy) and its dynamics has been
128: revealed as a key ingredient for the enzymatic function and for
129: rationalizing resistance data \cite{condra,patick,piana2002,pcr}.
130:
131: The data obtained from the MD simulation of
132: refs. \cite{piana2002,piana2002b} have been used here to evaluate all
133: the MD-related quantities used for comparison against the results for
quadratic models. In support of the general applicability of Gaussian
approaches to capture the details of proteins' essential motion we also
report a comparison of the model predictions against data from a
recent MD study of the NGF-trkA complex formed by the nerve growth
factor and the tyrosine kinase A receptor \cite{settanni_ngf}.
139:
140:
141:
142: \begin{figure}
143: \includegraphics[width=3.0in]{fig1.eps}
144: \caption{Backbone trace of the complex formed by the HIV-1 protease
homodimer and the bound substrate. Each monomer is composed of 99
residues; the model substrate consists of six amino acids
147: \protect{\cite{piana2002,piana2002b}}.}
148: \label{fig:hiv}
149: \end{figure}
150:
151:
152:
153: \section{Theory}
154:
155: \subsection{The model}
156:
157: The starting point of the present analysis is the expansion of the
158: Hamiltonian in terms of the deviations of the amino acids from
their reference native positions. The underlying
assumption is that a protein immersed in aqueous solution vibrates
around its native state with amplitudes small enough to justify the
quadratic expansion around the minimum of the potential energy
function. {\em A posteriori} this does not seem to be a drastic
restriction \cite{go82,go_gauss91,doruker2000} despite both the
large amplitude of motion observed in dynamical trajectories and the
existence of several native sub-states, instead of a unique energy
minimum \cite{substates}.
168:
169: To reduce the spatial degrees of freedom of a protein we adopt a
170: coarse-grained model where a two-particle representation is used for
171: each amino acid: besides the $C_\alpha$ atom, an effective $C_\beta$
172: centroid is employed to capture, in the simplest possible way, the
173: sidechain orientation in a given amino acid (except for GLY for which
only the $C_\alpha$ atom is retained). To distinguish the proposed
model from those based on the $C_\alpha$-only representation we
shall refer to it as the $\beta$ Gaussian model ($\beta$GM for
brevity). Since our focus is to study the concerted vibration of
178: various amino acids around the native state of the system, the
179: Hamiltonian that is adopted incorporates, accordingly, pairwise
180: interactions between all pairs of particles that are sufficiently
181: spatially close in the native state. Formally, the system energy
182: function evaluated on a trial structure, $\Gamma$, takes on the form
183:
184:
185:
186:
187: \begin{equation}
188: {\cal H} (\Gamma) = {\cal H}_{BB} (\Gamma) + {\cal H}_{\alpha\alpha} (\Gamma)
189: + {\cal H}_{\alpha\beta} (\Gamma) + {\cal H}_{\beta\beta} (\Gamma)
190: \label{eqn:ham}
191: \end{equation}
192:
193:
194: \noindent where
195: \begin{eqnarray}
196: {\cal H}_{BB} (\Gamma) &=& k \sum_i V^{CA-CA} (d_{i,i+1}^{CA-CA})\nonumber \\
197: {\cal H}_{\alpha\alpha} (\Gamma) &=& \sum_{i<j} \Delta^{CA-CA}_{ij} V^{CA-CA} (d_{i,j}^{CA-CA}) \nonumber\\
198: {\cal H}_{\alpha\beta} (\Gamma) &=& \sum_{i,j} \Delta^{CA-CB}_{ij} V^{CA-CB} (d_{i,j}^{CA-CB}) \nonumber\\
199: {\cal H}_{\beta\beta} (\Gamma) &=& \sum_{i<j} \Delta^{CB-CB}_{ij} V^{CB-CB} (d_{i,j}^{CB-CB}),
200: \label{eqn:hamb}
201: \end{eqnarray}
202:
203: \noindent In expressions (\ref{eqn:hamb}) $\Delta^{XY}_{ij}$ is the
204: native contact matrix that takes on the values of 1 [0] if the native
205: separation of the effective particles of type $X$ and $Y$, belonging
206: respectively to residues $i$ and $j$, is below [above] a certain
207: cutoff value, $R$. With $d^{XY}_{ij}$, on the other hand, we denote
208: the actual separation of the particles in the trial structure,
209: $\Gamma$. The indices $i$ and $j$ run over all integer values ranging
210: from 1 up to the protein length, $N$. In particular, the interaction
211: between particles in consecutive amino acids, $| i-j| =1$, leads to a
212: simple treatment of the protein chain connectivity. However, to
213: account for the much higher strength of the peptide bond with respect
214: to non-covalent contact interactions between amino acids, we have
215: added in (\ref{eqn:ham}) an explicit chain term, ${\cal H}_{BB}$, where
216: the interaction of consecutive $C_\alpha$'s is controlled by $k >0$.
217:
218: By construction, the minimum of the various interaction terms is
219: attained for the native separation of each pair of particles. This
220: ensures that the native state is at the global energy minimum. For
221: small fluctuations around the native structure, the potential
222: interaction energy of two particles, $i$ and $j$, can be expanded in
223: terms of the deviations from the native distance-vector,
224: $\vec{r}_{ij}$. If we indicate the deviation vector as $\vec{x}_{ij}$,
225: so that the total distance vector is $\vec{d}_{ij} = \vec{r}_{ij} +
226: \vec{x}_{ij}$, we can approximate the pairwise interaction as
227:
228: \begin{equation}
229: V(d_{ij}) \approx
230: V(r_{ij}) + { V^{\prime \prime}(r_{ij}) \over 2} \sum_{\mu,\nu}
231: {r_{ij}^\mu\, r_{ij}^\nu \over r^2_{ij}} x_{ij}^\mu\,
232: x_{ij}^\nu
233: \label{eqn:vexp}
234: \end{equation}
235:
236: \noindent where $\mu$ and $\nu$ denote the cartesian components, $x$,
237: $y$ and $z$, and $V^{\prime \prime}$ is the second derivative of $V$.
Several models have been introduced previously, where the quadratic
expansion (\ref{eqn:vexp}) was used in a context where only
interactions among $C_\alpha$ centroids were considered, such as in
the anisotropic network model (ANM), recently introduced to study the
vibrational spectrum of proteins \cite{ani01}.
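
\noindent The step leading to eq. (\ref{eqn:vexp}) can be sketched as
follows; this is the standard small-displacement argument, spelled out
here for convenience. To first order in the deviation, the length of
the distance vector is

\begin{equation}
d_{ij} = | \vec{r}_{ij} + \vec{x}_{ij} | \approx r_{ij} +
{ \vec{r}_{ij} \cdot \vec{x}_{ij} \over r_{ij}} \ .
\end{equation}

\noindent Since the minimum of $V$ is attained at the native
separation, $V^{\prime}(r_{ij})=0$ and the Taylor expansion of
$V(d_{ij})$ around $r_{ij}$ starts at second order,

\begin{equation}
V(d_{ij}) \approx V(r_{ij}) + { V^{\prime \prime}(r_{ij}) \over 2}
\left( { \vec{r}_{ij} \cdot \vec{x}_{ij} \over r_{ij}} \right)^2 ,
\end{equation}

\noindent which coincides with eq. (\ref{eqn:vexp}) once the scalar
product is written out in cartesian components. Only the component of
the deviation along the native distance vector contributes at this
order.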
243:
244: Based on this quadratic expansion the Hamiltonian of eq. (\ref{eqn:ham})
245: can be approximated as,
246:
247:
248:
249: \begin{equation}
250: \tilde {\cal H} = { 1 \over 2}\, \sum_{ij,\mu\nu} x^{CA}_{i,\mu}
251: M^{CA-CA}_{ij,\mu \nu} x^{CA}_{j,\nu}\ ,
252: \label{eqn:ham2}
253: \end{equation}
254:
\noindent where ${M}$ is a $3N \times 3N$ symmetric matrix. The elastic
256: response of the system is uniquely dictated by the eigenvalues and
257: eigenvectors of ${M}$.
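
\noindent For illustration, the elements of $M^{CA-CA}$ can be written
explicitly. The shorthand $k_{ij}$ is introduced here only for
convenience and is not part of the notation used elsewhere: it denotes
the total curvature of the $C_\alpha$-$C_\alpha$ interaction acting
between residues $i$ and $j$, i.e. $\Delta^{CA-CA}_{ij}\, V^{\prime
\prime}_{CA-CA}(r_{ij})$ plus, for consecutive residues, the chain
contribution $k\, V^{\prime \prime}_{CA-CA}(r_{i,i+1})$. Taking
$\vec{x}_{ij} = \vec{x}_j - \vec{x}_i$ in eq. (\ref{eqn:vexp}) and
summing over pairs one obtains, under these assumptions,

\begin{eqnarray}
M^{CA-CA}_{ij,\mu\nu} &=& - k_{ij}\, {r_{ij}^\mu\, r_{ij}^\nu \over
r^2_{ij}} \qquad (i \neq j) \nonumber \\
M^{CA-CA}_{ii,\mu\nu} &=& \sum_{l \neq i} k_{il}\, {r_{il}^\mu\,
r_{il}^\nu \over r^2_{il}} \ .
\end{eqnarray}

\noindent Each row of $3 \times 3$ blocks therefore sums to zero,
which is the algebraic counterpart of the invariance of the energy
under rigid translations.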
258:
259:
260: What differentiates the $\beta$GM from several previous studies is the
261: presence of the interactions between $C_\alpha$ and $C_\beta$ and
262: $C_\beta$-$C_\beta$ (besides the extra strength of the chain term).
263: The introduction of the $C_\beta$ centroids in the protein description
264: leads, in principle, to a more complicated Hamiltonian, with the
265: additional $C_\beta$'s degrees of freedom:
266:
267: \begin{eqnarray}
268: {\cal H} = &&{1 \over 2} \sum_{ij,\mu\nu} x^{CA}_{i,\mu} M^{CA-CA}_{ij,\mu\nu}
269: x^{CA}_{j,\nu} \nonumber \\
270: &&+ \sum_{ij,\mu\nu} x^{CA}_{i,\mu} M^{CA-CB}_{ij,\mu \nu}
271: x^{CB}_{j,\nu} \nonumber \\
272: &&+ {1 \over 2} \sum_{ij,\mu\nu} x^{CB}_{i,\mu} M^{CB-CB}_{ij,\mu \nu}
273: x^{CB}_{j,\nu} \ .
274: \label{eqn:ham3}
275: \end{eqnarray}
276:
\noindent However, the location of the $C_\beta$ atoms in a protein
structure is almost uniquely specified by the geometry of the peptide
chain. An accurate method that predicts the location of the $C_\beta$
atoms from the $C_\alpha$ trace of a protein is the geometric construction of
Park and Levitt \cite{cb_construct}, which assigns the $i$th $C_\beta$
location given the positions of the $C_\alpha$'s of residues $i-1$,
$i$ and $i+1$. This construction places the fictitious $C_\beta$ at a distance
of 0.3 \AA\ from the crystallographic location. Such excellent
agreement clarifies that the degrees of freedom of the $C_\beta$
centroids should not be considered independent from the $C_\alpha$
ones. On the contrary, the $C_\beta$'s can be viewed as rigidly
``tethered'' to the $C_\alpha$'s and hence the fluctuations of the
former are dictated by those of the latter.
290:
291:
292: Although in principle one could use the original rule of Park and
293: Levitt\cite{cb_construct}, we have adopted a simpler construction
294: scheme which places the $C_\beta$ exactly in the plane specified by
the local $C_\alpha$ trace. This simplifies the construction of the
$M$ matrices which remain ``diagonal'' in the cartesian components
(e.g. the $x$ component of the reconstructed $C_\beta$ depends only on
the $x$ components of the neighbouring $C_\alpha$'s). More precisely,
the location of the $i$th $C_\beta$ is given by
300:
301:
302: \begin{equation}
303: \vec{r}_{CB}(i) = \vec{r}_{CA}(i) + l {2 \, \vec{r}_{CA} (i) -
304: \vec{r}_{CA} (i+1) -\vec{r}_{CA} (i-1) \over | 2 \, \vec{r}_{CA} (i) -
305: \vec{r}_{CA} (i+1) -\vec{r}_{CA} (i-1)|}
306: \end{equation}
307:
308:
309: \noindent For reasons of self-consistency of the model, this
310: construction rule is used to determine the contact matrices involving
311: the effective $C_\beta$ centroids that are used in place of those
312: depending on the crystallographic $C_\beta$ locations in eqn
(\ref{eqn:hamb}). To leading order in the deviations of the
$C_\alpha$ atoms, the deviation of $\vec{r}_{CB}(i)$ thus becomes:
315:
316:
317: \begin{equation}
318: \vec{x}_{CB}(i) \approx l { 2 \, \vec{x}_{CA} (i) - \vec{x}_{CA} (i+1)
319: -\vec{x}_{CA} (i-1) \over | 2 \, \vec{r}_{CA} (i) - \vec{r}_{CA} (i+1)
320: -\vec{r}_{CA} (i-1)|} \ .
321: \label{eqn:cbfluc}
322: \end{equation}
323:
324:
\noindent Here $l = 3$ \AA. By using this rule one parametrizes, in
terms of the $C_\alpha$ positions, the effective $C_\beta$ location of
all residues except for GLY and for the initial and final residues,
which lack one of the flanking $C_\alpha$'s. When the resulting
expressions (\ref{eqn:cbfluc}) are substituted in equation
(\ref{eqn:ham3}) one obtains an effective quadratic Hamiltonian which,
as in equation (\ref{eqn:ham2}), involves only the $C_\alpha$
deviations but coupled through a new effective matrix, $\tilde{M}$,
which is distinguished from previous matrices by a tilde. The
book-keeping operations necessary to calculate the
elements of this matrix are conveniently implemented on a
computer. Thus, the computational cost and difficulty of characterizing
the elastic response of the protein are exactly the same as for
models with $C_\alpha$ atoms only. In spite of the same computational
cost, the $\beta$GM appears to have several advantages in terms of the
ability to capture the low-frequency motion and other vibrational
properties of proteins, as will be seen below.
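
\noindent The substitution described above can be sketched compactly
in matrix notation. The symbols $A$, $a_i$ and ${\cal A}$ are
introduced only for this illustration. Equation (\ref{eqn:cbfluc}) is
linear in the $C_\alpha$ deviations and acts identically on each
cartesian component, so that it can be written as $x^{CB}_{i,\mu} =
\sum_j A_{ij}\, x^{CA}_{j,\mu}$ with

\begin{eqnarray}
A_{ij} &=& a_i\, ( 2\, \delta_{i,j} - \delta_{i+1,j} - \delta_{i-1,j} )
\nonumber \\
a_i &=& { l \over | 2 \, \vec{r}_{CA} (i) - \vec{r}_{CA} (i+1)
-\vec{r}_{CA} (i-1)| } \ ,
\end{eqnarray}

\noindent and $a_i = 0$ for GLY and for the terminal residues, which
carry no effective $C_\beta$. Inserting this relation in
eq. (\ref{eqn:ham3}) and symmetrizing the mixed term gives

\begin{eqnarray}
\tilde{M} &=& M^{CA-CA} + M^{CA-CB} {\cal A} + {\cal A}^T \left(
M^{CA-CB} \right)^T \nonumber \\
&& + {\cal A}^T M^{CB-CB} {\cal A} \ ,
\end{eqnarray}

\noindent where ${\cal A}$ is the $3N \times 3N$ matrix with elements
${\cal A}_{ij,\mu\nu} = A_{ij}\, \delta_{\mu,\nu}$. Since ${\cal A}$
has at most three non-zero entries per row, forming $\tilde{M}$ is
expected to add little cost compared with the subsequent
diagonalization.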
342:
343: \subsection{Equilibrium properties}
344:
345:
346: The derivation of the vibrational properties of a protein from the
347: quadratic expansion of the Hamiltonian can be done, broadly speaking,
348: in two different ways: the normal mode analysis and the Langevin
349: analysis. What differentiates the two approaches is the view of the
350: role of the solvent on the system dynamics. If one assumes that the
351: motion of the protein is not significantly damped by the interaction
with the solvent, then the normal modes picture can be applied to
study the system dynamics around the native state by solving
Newton's equations of motion
355: \cite{levitt85,Karplus85,Case94,Hinsen98,go_gauss91,tirion93,hiv99,hal99}:
356:
357: \begin{equation}
m_i\, \ddot{x}_{i,\mu} = - \sum_{j,\nu} \, \tilde{M}_{ij,\mu\nu}\,
x_{j,\nu}
360: \label{eqn:m}
361: \end{equation}
362:
363:
\noindent The eigenfrequencies and eigenvectors are hence obtained by
365: diagonalizing a matrix derived from $\tilde{M}$ by an appropriate
366: mass-weighting \cite{Goldstein}.
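
\noindent As a brief sketch of the mass-weighting step (standard
textbook material; the symbols $u$, $\omega$ and $D$ are introduced
here for illustration), one looks for oscillatory solutions of the
harmonic equations of motion of the form $x_{i,\mu}(t) = m_i^{-1/2}\,
u_{i,\mu}\, e^{i \omega t}$. This leads to the symmetric eigenvalue
problem

\begin{equation}
\sum_{j,\nu} D_{ij,\mu\nu}\, u_{j,\nu} = \omega^2\, u_{i,\mu} , \qquad
D_{ij,\mu\nu} = { \tilde{M}_{ij,\mu\nu} \over \sqrt{m_i\, m_j} } \ ,
\end{equation}

\noindent so that the eigenfrequencies are the square roots of the
eigenvalues of $D$. If all effective masses are taken to be equal, $D$
and $\tilde{M}$ share the same eigenvectors.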
367:
Although the normal-mode analysis allows a straightforward dynamical
characterization, it is of dubious applicability since protein motion
in a solvent does not resemble a superposition of pure harmonic
oscillations. In fact, several theoretical, experimental and
computational studies have shown that the dynamics of a protein is
severely overdamped by the interaction with the solvent
\cite{Karplus76,Karplus82,Karplus85,Hinsen98}. The description of the
motion in terms of overdamped dynamics appears to be particularly
valid for the protein's low-frequency vibrations, which are the most
interesting ones due to their expected role in proteins' functional
activities \cite{hal99}. This observation leads to the alternative
view of a heavily damped dynamics \cite{Howard}.
380:
381: In this case, for small deviations from the reference positions, the
382: dynamics of the amino acids can be written as:
383:
384: \begin{equation}
385: \dot{x}_{i,\mu} (t) = - \sum_{j,\nu} \tilde{M}_{ij,\mu\nu} \, x_{j,\nu}(t)
386: + \eta_{i,\mu}(t)
387: \end{equation}
388:
389: \noindent where the time unit has been implicitly chosen so that the
390: viscosity coefficients (assumed to be equal for all particles) are
391: set equal to 1 and the stochastic noise terms satisfy
392: \cite{chandrasekhar}:
393:
394: \begin{eqnarray}
\langle \eta_{i,\mu}(t) \rangle &=& 0\\
\langle \eta_{i,\mu}(t)\, \eta_{j,\nu}(t^\prime) \rangle&=& \delta_{i,j}\,
\delta_{\mu,\nu}\, \delta(t-t^\prime)\, 2 k_B\, T \ .
398: \end{eqnarray}
399:
400: \noindent These two conditions ensure, in the long run, the onset of
401: canonical thermal equilibrium, so that the equilibrium probability of
402: a given configuration, $\{x\}$, for the particles in the system is
403: controlled by the Boltzmann factor:
404:
405: \begin{equation}
e^{-\beta {\cal H}(\{x\}) } = e^{ -{\beta \over 2}\, \sum_{ij,\mu\nu} x_{i,\mu} \tilde{M}_{ij,\mu \nu}
x_{j,\nu}}\ .
408: \label{eqn:boltz}
409: \end{equation}
410:
411: In this case, no periodic motion of the system can exist in the
412: absence of external periodic excitations, since any structural
413: deformation will be dissipated by a damped dynamics. The
414: standard theory of stochastic processes \cite{chandrasekhar,levitt85}
415: shows that the eigenvalues of $\tilde{M}$ are inversely proportional
416: to the system relaxation times, and the corresponding eigenvectors
417: indicate the actual shape of the associated distortion of the system.
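
\noindent The connection between eigenvalues and relaxation times can
be made explicit with a short sketch, in which $\lambda_k$ and
$\vec{v}^{(k)}$ denote the eigenvalues and orthonormal eigenvectors of
$\tilde{M}$, and $c_k$ the projection of the displacement on
$\vec{v}^{(k)}$ (notation introduced here for illustration, with
white-noise statistics assumed for $\eta$). Projecting the overdamped
equation of motion on the $k$th eigenvector decouples the modes,

\begin{equation}
\dot{c}_k (t) = - \lambda_k\, c_k(t) + \eta_k(t) ,
\end{equation}

\noindent where $\eta_k$ has the same statistics as the original
noise. For $\lambda_k > 0$ it follows that

\begin{equation}
\langle c_k(t) \rangle = c_k(0)\, e^{-\lambda_k t} , \qquad
\langle c_k^2 \rangle_{eq} = { k_B T \over \lambda_k } \ ,
\end{equation}

\noindent so that the relaxation time of mode $k$ is $1/\lambda_k$ in
the reduced time units adopted above, and the softest modes are both
the slowest and the ones with the largest equilibrium amplitude.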
418:
419:
420: \subsection{Covariance matrices and temperature-factors}
421:
Besides identifying the elementary modes of excitation of a protein,
it is important to calculate suitable thermodynamic quantities that
characterize the protein dynamics once thermal equilibrium with the
solvent has been established. The main observable that can be calculated
within the Gaussian model is the degree of correlation of the
displacements from the equilibrium (native) positions of pairs of
$C_\alpha$'s. The thermodynamic averages of the correlated
displacements are easily obtained from the inversion of the $\tilde{M}$
matrix. In fact, after setting $1/\beta = k_B T =1 $, one has
431:
432: \begin{equation}
433: \langle x_{i,\mu}\, x_{j,\nu} \rangle = {\tilde
434: M}^{-1}_{ij,\mu\nu}
435: \label{eqn:fullcij}
436: \end{equation}
437:
438: \noindent where the brackets denote usual canonical thermodynamic
439: averages with the weight of eqn. (\ref{eqn:boltz}). The inverse
440: matrix, ${\tilde M}^{-1}_{ij,\mu\nu}$, is often referred to as
441: the covariance matrix. Since it provides directional details about the
442: correlated motion of pairs of residues we shall term it {\em full}
443: covariance matrix to distinguish it from the {\em reduced} one
444: discussed below which incorporates only a measure of the degree of
445: correlation (but no directional information).
446:
447: The eigenvectors of the full covariance matrix represent the
448: three-dimensional independent modes of structural distortion for the
reference protein. The modes associated with the largest eigenvalues of
${\tilde M}^{-1}$ are the slowest to decay in a dissipative dynamics
and, hence, make the largest contribution to the mean-square
displacement of a given residue. The latter quantity is
453: straightforwardly calculated from eqn. (\ref{eqn:fullcij}):
454:
455: \begin{equation}
456: \langle |\vec{x}_i |^2 \rangle = \sum_{\mu} {\tilde
457: M}^{-1}_{ii,\mu\mu}
458: \label{eqn:bfact}
459: \end{equation}
460:
\noindent and can be directly connected to the temperature factors
(also called B-factors) reported in X-ray or
high-resolution NMR structure determinations \cite{normod}.
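
\noindent For reference, the usual relation between the isotropic
temperature factor of residue $i$, here denoted $B_i$, and its
mean-square displacement is

\begin{equation}
B_i = { 8 \pi^2 \over 3}\, \langle |\vec{x}_i |^2 \rangle \ .
\end{equation}

\noindent This is the standard crystallographic definition and is
quoted here as background. Since the Gaussian model is expressed in
reduced energy units, only the relative profile of $B_i$ along the
chain, and not its absolute scale, is meaningful in a comparison with
experiment unless the overall amplitude is separately fitted.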
464:
465: \noindent It is worth remarking that the full covariance matrix
466: provides information about the system elasticity not only in
467: conditions of isolation but also when an external force, $\vec{f}_i$,
468: is applied to a given residue, $i$. In fact, within the Gaussian
469: approximation, the average displacement of the $j$th amino acid from
470: its reference position due to the application of $\vec{f}_i$ is given
471: by
472:
473: \begin{equation}
474: \langle x_{j,\nu} \rangle \propto \sum_{\mu} {\tilde
475: M}^{-1}_{ji,\nu\mu}\ f_{i,\mu}\ \ .
476: \end{equation}
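
\noindent A short derivation of this linear-response relation, within
the Gaussian approximation, runs as follows. In the presence of the
force the energy becomes $\tilde{\cal H} - \sum_\mu f_{i,\mu}\,
x_{i,\mu}$, which in matrix notation can be rewritten by completing
the square as

\begin{eqnarray}
{1 \over 2}\, x^T \tilde{M} x - f^T x &=& {1 \over 2}\, ( x -
\tilde{M}^{-1} f )^T\, \tilde{M}\, ( x - \tilde{M}^{-1} f ) \nonumber \\
&& - {1 \over 2}\, f^T \tilde{M}^{-1} f \ .
\end{eqnarray}

\noindent Here $f$ denotes the $3N$-component vector whose only
non-zero entries are those of $\vec{f}_i$, and the inverse is
understood in the subspace orthogonal to the zero-energy modes. The
Boltzmann weight is therefore still Gaussian, with unchanged
covariance but with its centre shifted to $\tilde{M}^{-1} f$, which
gives the average displacement quoted above.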
477:
478:
479: \noindent As anticipated above, an important role in the analysis of
480: molecular dynamical trajectories is also played by the reduced
481: covariance matrix whose elements, $C_{ij}$, are defined as
482:
483: \begin{equation}
484: C_{ij} \equiv \langle \vec{x}_{i} \cdot \vec{x}_{j} \rangle =
485: \sum_{\mu} \tilde{M}^{-1}_{ij,\mu\mu}
486: \label{eqn:cij}
487: \end{equation}
488:
489: \noindent In ordinary MD simulations, the thermodynamic average in
490: (\ref{eqn:cij}) is replaced with the time average taken over the
simulated trajectory (ergodicity assumption). Because the
$C$ matrix is obtained from $\tilde{M}^{-1}$ after a summation over
the cartesian components, the linear size of $C$ is equal to the
number of protein residues, $N$, instead of $3N$ as for
$\tilde{M}^{-1}$. This nine-fold reduction in the number of matrix
elements greatly
simplifies the identification of significant correlations between
residues' motion.
498:
499: We conclude this section by discussing a technically important
500: point. The inversion of the $\tilde{M}$ matrix used in
501: eqns. (\ref{eqn:fullcij}--\ref{eqn:cij}), as well as a correct
502: interpretation of equation (\ref{eqn:boltz}) are possible only within
503: the subspace orthogonal to the eigenvectors of $\tilde{M}$ associated
with zero eigenvalues. Physically this corresponds to omitting the
structural modifications that cost no energy (zero modes). Due to the
invariance of Hamiltonian (\ref{eqn:ham}) under rotations and
translations of the Cartesian reference frame, there will always be at
least six zero modes. This number can, however, be larger if the
$\tilde{M}$ matrix is sparse. The presence of additional spurious zero modes
in Gaussian network models that incorporate only $C_\alpha$
coordinates is usually avoided by two means: either a reduction of
the dimensionality of $\tilde{M}$ (as in GNM) or the use of large
interaction cutoffs in the range 10-15 \AA\ (as in ANM). The model
discussed here allows the use of physically appealing interaction cutoffs
of the order of 7 \AA, as for GNM, while retaining the full
three-dimensional detail in the $\tilde{M}$ matrix. As will be shown
517: later, these ingredients are necessary to capture the finer aspects of
518: protein vibrations such as the correlation of residues' motion, while,
519: consistently with previous studies, the overall mobility of individual
520: residues is rather insensitive to the details of the model
\cite{coarseanm02a,coarseanm02b}.

In fact, we found that GNM, ANM
and $\beta$GM have a similar performance on the prediction of
experimental B-factors. This was established using high-resolution,
single-chain proteins taken from the non-redundant pdb-select list
\cite{pdbselect}. We restricted the set to proteins of length between 50 and 200
residues and excluded from the comparison the first and last 5
residues to avoid biases due to enhanced terminal mobility. Overall we
selected 36 proteins determined with X-ray and 31 with NMR. For
529: simplicity we summarise the level of agreement as the average of the
530: non-parametric rank correlation, $\tau$ \cite{halle2002}. This
531: analysis does not rely on the knowledge of the probability
532: distribution from which the points (pairs of data) are taken. What
533: matters is the agreement of the ranking of the points according to
534: each of the two variables. In case of perfect [anti]-correlation the
535: Kendall parameter $\tau$ takes on the value 1 [-1] and usually provides a more
536: stringent (and robust) measure than linear correlation\cite{NR}. For
537: the X-ray set and using GNM, $\tau$ ranged from 0.37 to 0.39 for
538: cutoffs in the range 7.5 - 15 \AA. For the same range $\beta$GM gave $
539: 0.34 < \tau < 0.37$, while for ANM $ 0.30 < \tau < 0.37$ for
540: interaction ranges 10-15 \AA. The comparison against NMR
541: temperature-factors provided higher correlations as already observed
542: in ref. \cite{normod}. For the same cutoff ranges reported above one
543: has for GNM: $ 0.45 < \tau < 0.47$, for $\beta$GM: $ 0.46 < \tau <
544: 0.48$ and for ANM $ 0.42 < \tau < 0.48$.
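
\noindent For completeness, a common definition of the Kendall
parameter for $n$ data points (each a pair of values) without ties is

\begin{equation}
\tau = { n_c - n_d \over n (n-1)/2 } \ ,
\end{equation}

\noindent where $n_c$ ($n_d$) is the number of pairs of points that
are ranked in the same (opposite) order by the two variables. The
symbols $n$, $n_c$ and $n_d$ are introduced here for illustration; the
treatment of ties used for the values quoted above is not specified in
the text, and variants of the definition differ in this respect.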
545:
In summary, the novel model discussed here makes it possible to incorporate, in
an effective Hamiltonian, not only backbone-backbone interactions but
also backbone-sidechain and sidechain-sidechain ones. The sidechain
549: degrees of freedom are entirely controlled by the $C_\alpha$
550: positions. This has the crucial implication that the computational
551: effort required to characterize the vibrational properties of the
552: system is exactly the same as for models that incorporate only
553: interactions between $C_\alpha$ pairs.
554:
555: \section{Results and Discussion}
556:
557: In this section we shall examine the extent to which suitable
558: topology-based harmonic models can capture the details of the
near-native vibrations of proteins in thermal equilibrium. Given the
present impossibility of probing experimentally the various
thermodynamical quantities discussed before, it is necessary to choose
as a reference the results of an all-atom molecular dynamics
calculation performed on the HIV-1 protease in complex with the TIMMNR
peptide model substrate \cite{piana2002,piana2002b}. All MD dynamical
averages were calculated after discarding an initial interval of a few
ns over which the protease complexed to the model substrate was
equilibrated \cite{piana2002,piana2002b}. The configuration obtained
at the end of the equilibration protocol was taken as the reference
structure for the Gaussian approach. The inversion of the symmetric
570: $\tilde{M}$ matrix, necessary to characterize the system equilibrium
571: dynamics, was done exploiting the Householder reduction \cite{NR} and
572: took about 10 minutes on a personal computer.
573:
574: For the comparison, it is important to remark that although molecular
575: dynamics studies can reproduce reliably a variety of experimental
576: quantities they ultimately rely on empirical potentials which may be
577: imperfectly parametrised. Besides this issue, it should also be noted
578: that, usually, it is not easy to ascertain whether the simulated
579: trajectory is sufficiently long that thermodynamic averages as in
580: eq. (\ref{eqn:cij}) can be legitimately replaced with dynamical ones,
581: though a recent study has indicated some valuable criteria for this
purpose \cite{Hess2002}. These potential limitations of general MD
approaches should therefore be borne in mind also in the present
context.
585:
The series of tests carried out to ascertain the consistency between MD
results and those of Gaussian models includes the comparison of
temperature factors, covariance matrices and essential subspaces. The
findings, summarised below, provide a direct and strong indication
that the $\beta$GM is apt for capturing several aspects of proteins'
near-native vibrational dynamics with an accuracy that rivals that of
techniques based on all-atom potentials.
593:
594:
595: The generalised energy function of eq. (\ref{eqn:ham}) contains
596: several parameters; one of these, the interaction amplitude of
597: $C_\alpha$ pairs, $V^{\prime \prime}_{CA-CA}$, can be conveniently
598: taken as the energy unit. The other parameters are the interaction
599: cutoff, $R$ (which enters in the definition of the contact matrices)
600: and the amplitudes of the $C_\beta$-$C_\beta$ and $C_\beta$-$C_\alpha$
601: interactions as well as the extra strength of the peptide term, $k$.
For simplicity, the strengths of these interactions have been chosen
to be of the same order as $V^{\prime \prime}_{CA-CA}$, namely
$V^{\prime \prime}_{CA-CA}=k=1$ and
$V^{\prime \prime}_{CA-CB}=V^{\prime \prime}_{CB-CB}=1/2$. This
choice is not particularly restrictive, since effective interactions
between sidechains are expected to be of the same order as
$C_\alpha$-$C_\alpha$ ones \cite{stabloc}; in addition, the precise
value of $k$ will mostly affect the high-frequency vibrational
spectrum of the system. Hence, it can be anticipated that the system
611: elastic response should mostly depend on the value of the interaction
612: radius, $R$, which has been accordingly varied in our analysis.
613:
614:
We first discuss the possibility of predicting the temperature factors
observed in molecular dynamics. This type of validation has been
considered before by Doruker {\em et al.} in connection with the
anisotropic Gaussian model; they reported good consistency with
dynamical simulations \cite{doruker2000}.
620:
As a measure of the agreement between the residues' mean square
fluctuations observed in MD and those predicted from the Gaussian
models, see eqn. (\ref{eqn:bfact}), we considered the linear
correlation coefficient. The degree of correlation as a function of
the interaction cutoff radius for the $\beta$GM is shown in Fig. \ref{fig:bfact}.
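
For reference, the linear correlation coefficient is taken here to be
the standard Pearson estimator. Writing $x_i$ and $y_i$ (notation
introduced only for this illustration) for the mean square
fluctuations of residue $i$ in MD and in the Gaussian model,
respectively, and $\bar{x}$, $\bar{y}$ for their averages over the $N$
residues, it reads

\begin{equation}
r = {\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) \over
\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \, \sum_{i=1}^{N} (y_i - \bar{y})^2}} .
\end{equation}

\noindent Since $r$ is invariant under a rescaling of either data set
by a positive constant, it is unaffected by the arbitrary choice of
the energy unit of the model.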
626:
627:
628: \begin{figure}
629: \includegraphics[width=3.0in]{fig2.eps}
630: \caption{Linear correlation coefficient for the temperature factors of
631: the 204 residues in the HIV-1 PR/SUB complex.}
632: \label{fig:bfact}
633: \end{figure}
634:
635:
\noindent The performance of the $\beta$GM is particularly stable beyond
an interaction cutoff of about 8.0 \AA. Given the large number of
638: residues in the system (198 for the protein and 6 for the substrate)
639: over which the B-factors are calculated, it is certainly possible to
640: conclude that the correlation coefficient approaching 0.7 visible in
641: Fig. \ref{fig:bfact} is statistically significant. A more quantitative
assessment of the statistical significance could be done using
Student's $t$-test or related methods \cite{NR}, although such
analyses usually rely on the assumption that the joint distribution of
the correlated variables is binormal (which is not necessarily
satisfied for B-factors). An alternative is to resort to the
non-parametric Kendall test mentioned before \cite{NR}, as recently
proposed by Halle \cite{halle2002}. The Kendall correlation
coefficient between the B-factors of the simulation and those of the $\beta$
Gaussian model (for $R \approx$ 7.5 \AA) is $\tau \approx 0.61$ and
amply satisfies all ordinary criteria for statistical significance.
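
As a rough indication of the margin involved, one may use the
large-sample result quoted in Ref. \cite{NR}: in the absence of any
association between the two data sets, $\tau$ is approximately
normally distributed with zero mean and variance

\begin{equation}
{\rm Var}(\tau) = {4N + 10 \over 9N(N-1)} .
\end{equation}

\noindent For $N=204$ this gives a standard deviation of about
$0.047$, so that $\tau \approx 0.61$ lies about 13 standard deviations
away from zero. This estimate treats the residues as independent data
points, which is an idealisation, since the fluctuations of residues
adjacent along the chain are correlated.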
652:
The successful comparison of the B-factors confirms the general
agreement between the overall residues' motion in MD and the
equilibrium dynamics predicted by Gaussian models \cite{doruker2000};
in fact, for the same cutoff used above and against the same MD data
of the HIV-1 PR/SUB complex, GNM provides a linear correlation
coefficient of 0.61 while the Kendall parameter $\tau$ is equal to
0.59. However, the finer details of such accord have not, to the best
of our knowledge, been explored yet and hence become the focus of our
subsequent analysis, which is based on a comparison of the MD and
$\beta$GM covariance matrices. This test is particularly important due
to the wealth of biological and chemical information that can be
extracted from the covariance (essential dynamics) analysis
\cite{Amadei93,garcia92,brooks2003}.
666:
In the context of HIV-1 PR
\cite{condra,patick,BIOCH88,gulnik,hiv1,apr,condra1,boucher1,Molla,Marko}
these motions have a direct mechanical bearing on the structural
modulation of the active site, even though they are located remotely
from it \cite{piana2002,piana2002b,pcr}.
672:
673: As a first case we consider the reduced covariance matrices. To allow
674: a straightforward comparison of the theoretical and numerical results
675: rather than working directly in terms of the reduced covariance
676: matrix, it is useful to focus on the normalised version, which is
677: dimensionless:
678:
679: \begin{equation}
680: \tilde{C}_{ij} = {\langle \vec{x}_{i} \cdot \vec{x}_{j} \rangle \over
681: \sqrt{\langle |\vec{x}_{i}|^2 \rangle\, \langle |\vec{x}_{j}|^2
682: \rangle}}
683: \label{eqn:cijnorm}
684: \end{equation}
685:
\noindent The scatter plot of Fig. \ref{fig:cov2} summarises the
degree of accord between the normalised covariance matrix of the MD
simulation and that of the $\beta$-Gaussian model, for a cutoff of 7.5
\AA. The number of entries in the plot is about $2\cdot 10^4$, equal
to the number of distinct off-diagonal entries in the $204\times 204$
$\tilde{C}_{ij}$ matrix. To avoid introducing artificial biases in the
correlation, the diagonal elements of the normalised matrices (which
are all equal to 1) have been omitted from the plot.
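
The count follows from the symmetry of the matrix: for $N$ residues
the number of distinct off-diagonal entries is

\begin{equation}
{N(N-1) \over 2} = {204 \cdot 203 \over 2} = 20706 \approx 2\cdot 10^4 .
\end{equation}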
694:
695: \begin{figure}
696: \includegraphics[width=3.0in]{fig3.eps}
697: \caption{Scatter plot of corresponding entries of the covariance
698: matrices obtained within the $\beta$ Gaussian model ($R=7.5$ \AA) and
from the 14 ns MD simulation on the HIV-1 PR/SUB complex. The linear
correlation coefficient over the $2\cdot 10^4$ data points is 0.80.}
701: \label{fig:cov2}
702: \end{figure}
703:
704:
The linear correlation coefficient between the two sets of data is 0.80
(and is stable in the neighborhood of $R=7.5$ \AA). We do not attempt
to provide a quantitative measure for the statistical significance of
the linear correlation in Fig. \ref{fig:cov2}. On the one hand,
as visible in Fig. \ref{fig:cov2}, the joint distribution for
covariance matrix elements is only approximately binormal, and hence
the traditional tests of linear regression significance are of dubious
applicability. On the other, the Kendall correlation measure requires
considering all possible pairs of points in the scatter plot: this
makes the analysis impractical and disproportionate to the main goal,
which is to ascertain the existence of the accord between topological
Gaussian models and MD results. Therefore, rather than measuring the
correlation in absolute terms, we shall compare the accord of Gaussian
models and MD simulations against the degree of ``internal''
consistency of the simulated dynamical trajectory itself.
720:
We conclude the discussion of the scatter plot of Fig. \ref{fig:cov2}
by mentioning that, in case of perfect correlation of two
$\tilde{C}_{ij}$ sets, due to the normalization condition of
eqn. (\ref{eqn:cijnorm}), the data would align along the diagonal of the
graph in Fig. \ref{fig:cov2}. Interestingly, despite the scatter
visible in the same figure, the interpolating line lies very close to
the diagonal, having a slope of $s=0.97$ (taking the $\beta$GM
covariance as the independent variable). This fact is useful in
illustrating the effects of the cutoff, $R$, on the accord with MD
covariance elements. While for $R=10$ \AA\ the slope is still good,
$s=1.05$, it deteriorates for $R = 15$ \AA, where $s=1.64$. This effect
is even more pronounced if the $C_\beta$'s are not included in the
model. For example, without the $C_\beta$'s the slope observed for
$R = 15$ \AA\ was $s = 3.41$.
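
As an aid to interpreting the slope, and assuming that the
interpolating line is an ordinary least-squares fit (an assumption
made here, since the fitting procedure is not specified above), $s$ is
related to the linear correlation coefficient $r$ by

\begin{equation}
s = r \, {\sigma_{\rm MD} \over \sigma_{\beta {\rm GM}}} ,
\end{equation}

\noindent where $\sigma_{\rm MD}$ and $\sigma_{\beta {\rm GM}}$
(symbols introduced for this illustration) are the standard deviations
of the off-diagonal entries of the two normalised matrices. Since $|r|
\le 1$, a slope $s > 1$ implies $\sigma_{\rm MD} > \sigma_{\beta {\rm
GM}}$: under this assumption, the growth of $s$ at large $R$ signals
that the normalised covariances of the model span an increasingly
narrower range than those observed in MD.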
735:
736: The quantification of the self-consistency of MD dynamical
737: trajectories is an extremely important issue since it can provide an
{\em a posteriori} indication of whether the dynamical sampling of the
phase-space was sufficient to obtain reliable thermodynamic
740: averages. The analysis usually starts from the calculation of two
741: covariance matrices pertaining to the first and second halves of the
742: MD trajectory.
743:
The two matrices could then be compared entry by entry, as done above.
However, the most appropriate procedure is not to compare the
corresponding matrix elements, but rather the physically-important
(essential) eigenspaces, that is the linear spaces spanned by the
eigenvectors of $M^{-1}$ associated with the largest eigenvalues. A
number of studies have suggested ways of measuring this consistency.
750:
The first method of comparison that we take into account is
the one introduced by Amadei {\em et al.} \cite{Amadei99}, which focuses
on the top $n$ eigenvectors of the covariance matrices under
comparison. These eigenvectors describe the most significant modes of
vibration of the molecule in the three-dimensional space. We stress
that the covariance matrix considered here is not the reduced one
of eq. (\ref{eqn:cij}), whose size is $N\times N$, but the full one (the
$M^{-1}$ matrix) of size $3N\times 3N$ that contains the three-dimensional
information about correlated motion of pairs of residues, see
e.g. eqn. (\ref{eqn:fullcij}).
761:
762: By denoting the top $n$ eigenvectors of the two matrices under
763: comparison as $\{\vec\eta\}$ and $\{\vec\nu\}$, the degree of overlap
764: of the essential subspaces is defined as the root mean square inner
765: product (RMSIP) of all pairs of eigenvectors in the two sets
766: \cite{Amadei99}:
767:
768: \begin{equation}
769: RMSIP = \sqrt{ {1 \over n} \sum_{i,j} | \vec\eta_i \cdot \vec\nu_j
770: |^2}
771: \label{eqn:amadei}
772: \end{equation}
773:
\noindent Customarily, the analysis is restricted to the top $n=10$
eigenvectors. We have measured the RMSIP in eqn. (\ref{eqn:amadei})
when $\{\vec\eta\}$ and $\{\vec\nu\}$ come from the essential subspaces of the
first and second halves of the 14 ns MD trajectory of the HIV-1 PR/SUB
complex. The calculated RMSIP value was
0.71. Amadei {\em et al.} \cite{Amadei99} have also proposed a series of
approximate tests to ascertain the statistical relevance of the
observed overlap. Based on their analysis, we can conclude that the
value obtained here, for the given system size of 204 amino acids, has
a probability of having arisen by chance that is far below
the conventional threshold of 1\%. This supports the view that the MD
trajectory was sufficiently long to contain significant physical
information about the system equilibrium dynamics.
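
Two limiting cases help to put the measured value in perspective.
Since each set of eigenvectors is orthonormal, the double sum in
eqn. (\ref{eqn:amadei}) equals ${\rm Tr}(P_\eta P_\nu)$, where $P_\eta
= \sum_i \vec\eta_i \vec\eta_i^{\,T}$ and $P_\nu = \sum_j \vec\nu_j
\vec\nu_j^{\,T}$ are the projectors onto the two subspaces (notation
introduced here for illustration), so that

\begin{equation}
0 \le RMSIP \le 1 ,
\end{equation}

\noindent with the upper bound attained only when the two subspaces
coincide. As a crude baseline, and under the idealised assumption
(ours, and distinct from the tests of Ref. \cite{Amadei99}) that the
two $n$-dimensional subspaces are drawn independently and uniformly at
random in a space of dimension $D$, one has $\langle RMSIP^2 \rangle =
n/D$. For $n=10$ and $D=3N=612$ this gives

\begin{equation}
\sqrt{n/D} = \sqrt{10/612} \approx 0.13 ,
\end{equation}

\noindent which is well below the value of 0.71 found for the two
halves of the MD trajectory.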
787:
788: Having in mind the level of internal consistency of the present
789: reference MD trajectory we have turned to measuring the RMSIP between
790: the essential spaces of the whole dynamical trajectory and those of
791: the $\beta$ Gaussian model.
792:
793:
794:
795: \begin{figure}
796: \includegraphics[width=3.0in]{fig4.eps}
797: \caption{Degree of correlation (root mean square inner product), see
798: eqn. \protect(\ref{eqn:amadei}), of the essential subspaces of the MD
799: simulation and of the gaussian models as a function of the interaction
800: cutoff, $R$. The thick curve denotes the performance of the $\beta$
801: Gaussian model, while the horizontal dotted line indicates the overlap
802: of the essential subspaces of the first and second halves of the MD
803: trajectory.}
804: \label{fig:amadei}
805: \end{figure}
806:
The resulting trend for the subspace overlap is shown in
Fig. \ref{fig:amadei} as a function of the interaction cutoff,
$R$. The best performance of the model is obtained for a cutoff of $R
\approx 7.5$ \AA. The corresponding overlap value of 0.68 is very
close to the internal overlap of the MD trajectory. We wish to remark
that such values of RMSIP are highly non-trivial due to the large size
of the full covariance matrix ($612\times 612$). This implies that the
probability to observe a given overlap, $q$, between two random unit
vectors decreases extremely rapidly as $q$ approaches 1
\cite{Amadei99}. As a consequence, the MD simulation time required to
reach a given target value for the internal RMSIP consistency,
$\bar{q}$, grows very rapidly with $\bar{q}$.
819:
This observation clarifies the utility of the Gaussian approach. With
a modest computational investment, required by the diagonalization of
a $3N\times 3N$ matrix, one obtains a description of the protein essential
dynamics that, within a molecular dynamics framework where all-atom
effective potentials are used, requires a considerably heavier
computational investment. Further improvements over the $\beta$GM
performance are obviously possible within MD, but at the price of a
rapidly growing computing time.
828:
829: The usefulness of the $\beta$GM is further supported by yet another
830: type of analysis, which concludes the series of tests carried out
831: here. This last approach follows a more precise measure for the
832: agreement of the essential subspaces recently introduced by Hess
\cite{Hess2002}. The new measure is an improvement over the definition
of eqn. (\ref{eqn:amadei}) since it removes the subjectivity of
the choice of $n$, assigns more importance to the physically-relevant
eigenspaces and deals correctly with the presence of spectral
degeneracies.
838:
839:
840: \begin{figure}
841: \includegraphics[width=3.0in]{fig5.eps}
842: \caption{Degree of overlap according to the measure introduced by Hess
843: \protect{\cite{Hess2002}} of the essential subspaces of the MD
844: simulation and of the gaussian models as a function of the interaction
845: cutoff, $R$. The thick curve denotes the performance of the $\beta$
846: Gaussian model, while the horizontal dotted line indicates the overlap
847: of the essential subspaces of the first and second halves of the MD
848: trajectory.}
849: \label{fig:berkcor}
850: \end{figure}
851:
Unlike the measure of Amadei {\em et al.}, the one of Hess is
sensitive to the actual values of the eigenvalues of $M^{-1}$. Since
the energy unit of the Gaussian models considered here has been
chosen arbitrarily, a proper normalisation of the spectrum of the
covariance matrices has to be carried out in order to use the measure
of Hess for the comparison against MD. For this reason, the degree of
overlap of the matrices was computed after having uniformly
rescaled the eigenvalues of $M^{-1}$ so that the trace of $M^{-1}$ was
equal to 1 for both systems. Physically this corresponds to a
normalization of the average mean square displacement of the residues.
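
In formulae, if $\lambda_l$ denote the eigenvalues of $M^{-1}$
(notation introduced here for illustration), the rescaling reads

\begin{equation}
\lambda_l \to {\lambda_l \over \sum_m \lambda_m} ,
\end{equation}

\noindent which leaves the eigenvectors unchanged. To illustrate its
effect we take the overlap of two covariance matrices $A$ and $B$ in
the form in which it is commonly quoted from Ref. \cite{Hess2002},
namely $1 - \sqrt{{\rm Tr}[(\sqrt{A}-\sqrt{B})^2]/({\rm Tr} A + {\rm
Tr} B)}$; the reader should consult that reference for the
authoritative definition. For ${\rm Tr} A = {\rm Tr} B = 1$ this
expression reduces to

\begin{equation}
1 - \sqrt{1 - {\rm Tr}\left(\sqrt{A}\sqrt{B}\right)} ,
\end{equation}

\noindent which equals 1 only for identical matrices and 0 when the
subspaces sampled by the two systems are orthogonal.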
862:
The results of this analysis are shown in
Fig. \ref{fig:berkcor}. The enhanced stringency of this test, which
is extended to the whole vibrational spectrum, and not just to the top
10 modes, is reflected in an overall decrease of the overlap with
respect to Fig. \ref{fig:amadei}. This finer measure also reflects
better than RMSIP the higher degree of inner consistency of the MD
trajectory, as opposed to the MD-$\beta$GM accord. Although the best
reference for our model would be provided by a simulation run
sufficiently long that the inner MD overlap approaches 1, this is not
presently feasible due to the slow (approximately logarithmic)
increase of the inner overlap with the length of the MD run. Despite
this fact, the results fully confirm the previous conclusions, namely
that the $\beta$ Gaussian model can predict, with good statistical
confidence, equilibrium dynamical properties of proteins and, in
particular, identify the relevant modes of vibration of the system.
878:
\noindent The reference system used here for the comparison between the
model and MD simulations was chosen both for its biological
significance and for the availability of MD data collected over a
rather long simulation time. The $\beta$ Gaussian model is, however,
of general applicability, and to confirm the robustness of the
884: strategy we have considered another biologically important reference
885: system, the NGF-trkA complex. This is constituted by a protein dimer,
886: the nerve growth factor (NGF), complexed with the tyrosine kinase A
887: receptor (trkA); altogether the system comprises 431 residues. The
888: dynamics of the complex in aqueous solution was recently simulated for
889: a time span of 2.6 ns using all-atom potentials \cite{settanni_ngf}.
890: The resulting normalised covariance matrix was compared with the one
891: obtained from the $\beta$ Gaussian model. The linear correlation
892: coefficient over the nearly $10^5$ corresponding distinct entries of
893: the matrices is 0.86, as visible in Fig. \ref{fig:ngf}.
894:
895: \begin{figure}
896: \includegraphics[width=3.0in]{fig6.eps}
897: \caption{Scatter plot of corresponding entries of the covariance
898: matrices obtained within the $\beta$ Gaussian model ($R=7.5$ \AA) and
from the 2.6 ns MD simulation on the NGF-trkA complex
\protect\cite{settanni_ngf}. The linear correlation coefficient over
the nearly $10^5$ data points is 0.86.}
902: \label{fig:ngf}
903: \end{figure}
904:
\noindent This result confirms the viability of the Gaussian approach
to capture the details of the large-scale protein movements. However,
due to the simplicity of the model interaction potentials (all pairs
of $C_\alpha$'s and $C_\beta$'s interact with the same strength) one
may foresee that the model is not suitable for modelling the
vibrational dynamics of proteins where electrostatic effects or
disulfide bonds play an important role for native stability or
functionality.
913:
914: \vskip 0.5cm
915:
916: {\em Biological implications}
917:
In the MD calculations by Piana {\em et al.} \cite{piana2002,piana2002b,pcr}
the essential dynamics analysis has allowed the identification of sites
that, despite being spatially distant from the active site, have a
strong mechanical influence on the structural modulation of the
regions binding the substrate. We now compare the findings obtained
with MD with those obtained with our model. The degree of mechanical
924: coupling observed in the MD trajectory between the substrate motion
925: and the HIV-1 protease subunits is visible in the top panel of Fig.
926: \ref{fig:dimercontour}. The two curves in the plot represent the
927: profile of the reduced covariance matrix, $C_{ij}$, between the two
928: central atoms of the peptide and the 198 protease residues.
929: Interestingly, the regions that correlate significantly with the
930: substrate motion are those that have been indicated as rather
931: under-constrained by studies where the theory of rigidity has been
932: applied to characterize the enzyme elasticity \cite{thorpe2001}. The
933: two facts provide a consistent picture for the HIV-1 PR mechanics
934: since any functionally-relevant mechanical coupling must intuitively
935: involve mobile (and hence under-constrained)
regions. The identification of the mobile regions of
HIV-1 protease has also been previously addressed with Gaussian
network models in a series of studies which also allowed the
identification of the residues important for protein stability
\cite{bah98,hivgnm03} or otherwise taking part in crucially important
networks of key native contacts \cite{hiv-gauss}.
942:
The bottom panel of Fig. \ref{fig:dimercontour} represents the
corresponding correlation profiles calculated within the $\beta$GM for
a cutoff $R= 7.5$ \AA\ (which we take as an optimal value from the previous
analysis). The degree of agreement of the profiles across the two
panels is remarkable, and the main difference appears to be due to an
overemphasis of negative correlations in the Gaussian profiles.
949:
In both sets of data one observes a strong positive correlation
between the substrate motion and the regions 24-30 and 45-55. This
direct mechanical coupling is readily interpreted, since
the first region comprises the cleavage site while the
second involves the tips of the protease flaps.
955:
956: \begin{figure}
957: \includegraphics[width=3.0in]{fig7.eps}
958: \caption{The two curves in each panel indicate the degree of
959: correlation of the motion between the two central atoms of the
960: substrate and the 198 residues in the subunits of the HIV-1
961: protease. The top panel reports the MD findings, while the bottom one
962: pertains to the $\beta$ Gaussian model.}
963: \label{fig:dimercontour}
964: \end{figure}
965:
966:
967: The essential-space analysis of the MD trajectory revealed that the
968: two regions embrace the substrate and involve it in a rotational
969: ``nutcracker-like'' motion. As a consequence of this rotation, the
970: regions near the flaps elbows, 37-41 and 61-73, undergo a
971: counter-movement that results in a negative correlation with the
972: substrate motion. This effect is clearly visible in the gaussian
973: profiles of Fig. \ref{fig:dimercontour}, while it is less pronounced,
974: but still significant, in the MD results.
975:
As remarked in the Theory section, the intimate connection between the
linear response theory and the covariance matrix allows one to conclude
that a force applied at sites around residues 40 and
63 should affect the protease-substrate coupling. This effect was
indeed observed in the MD simulation where the motion of the substrate
towards the cleavage site was strongly affected by the constraints on
these regions (see Fig. 6 in ref. \cite{piana2002}).
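
The argument can be made explicit for a quadratic energy. As a sketch,
assume that the energy is written as ${1 \over 2}\sum_{ij} \vec{x}_i
M_{ij} \vec{x}_j$ and that a small static force $\vec{f}_j$ acts on
site $j$ (both the notation and the normalisation are assumptions made
here; factors of $k_B T$ depend on the convention adopted for
$M$). Minimising the energy in the presence of the force gives the
average displacement of site $i$,

\begin{equation}
\langle \vec{x}_i \rangle_{f} = M^{-1}_{ij}\, \vec{f}_j ,
\end{equation}

\noindent so that a force applied at site $j$ displaces most strongly
the sites $i$ whose motion is most correlated with that of $j$ in the
unperturbed system.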
983:
It is by virtue of this mechanically-important coupling that it is
possible to rationalize the emergence of mutations causing drug
resistance at sites far from the cleavage region
(e.g. M63I, M46I-L and L47V). In fact, as the explicit MD calculation
has shown, the coupling between such sites and the cleavage site is so
strong that the detailed chemical identity of the former
strongly influences the substrate binding affinity of the latter. In
particular, the mutations observed in clinics
\cite{apr,condra1,boucher1,Molla,Marko} are arguably the result of a
chemical fine-tuning that retains the native functionality of the
enzyme while decreasing its affinity for inhibiting drugs.
995:
996:
The details of how the enzymatic reaction kinetics changes upon amino
acid mutations are beyond the reach of the topological models presented
here. The Gaussian scheme, in fact, is entirely adequate to identify
which sites mechanically influence the active site motion but, since
all amino acids (except for GLY, which lacks the
$C_\beta$ centroid) are treated equally, we cannot explore the
effects of changing the amino acid identity on the coupling between
the cleavage region and the substrate. The effect of one such
mutation (M46I) on substrate motion has instead been fully taken into
account in ref. \cite{piana2002b}.

Further improvements of the $\beta$ Gaussian model may be possible by
optimizing the distance of the $C_\beta$ centroids from their
respective $C_\alpha$'s (so as to better capture the displacement of the
sidechains' centres of mass). The model could also be extended to
include, at the simplest possible level, the effects of thermal
denaturation through a self-consistent temperature-dependent weakening
of the strength of the harmonic couplings, as done in
refs. \cite{gaussian,normod}.
1015:
1016:
1017: \section{Conclusions}
1018:
We have examined the extent to which the dynamical properties of a
protein in thermodynamic equilibrium can be accounted for through
solvable models. The starting point of our analysis is the quadratic
approximation of the free energy landscape in terms of the deviations
of amino acids from their reference positions in the known native
state. We have adopted a novel description of the amino acids which
allows one to consider the presence of effective $C_\beta$ centroids
whose degrees of freedom are entirely controlled by the $C_\alpha$
atoms. As a result, the model considered is able to account
for the directionality of amino acid sidechains while retaining the
same degree of complexity as models based on the $C_\alpha$ representation
only. Various equilibrium quantities apt to characterize the most
relevant modes of vibration of proteins have been considered. In
particular, we focussed on (in increasing order of complexity and
detail) the B-factors, the covariance matrices and the
essential dynamical subspace. Our results have been compared against
the analogous quantities obtained through a 14 ns molecular dynamics
simulation carried out on the HIV-1 PR enzyme in complex with a TIMMNR
peptide substrate.
1039:
1040:
As far as overall equilibrium dynamical properties are concerned, the
$\beta$ Gaussian model provides a picture that is in remarkable
agreement with the MD results. In fact, the essential subspace
predicted theoretically appears to have a degree of consistency with
MD results that is close to the ``inner consistency'' of a 14 ns MD
simulation with itself.
1047:
1048: This provides a strong indication that suitable quadratic models can
1049: provide a powerful and accurate tool for characterizing the
1050: vibrational motions of proteins near their native state while
1051: requiring only a modest investment of computational resources. Other
important properties of protein dynamics and functionality that
strongly depend on the sequence composition or on out-of-equilibrium
conditions are, at the moment, beyond the reach of the simplified
approaches considered here. For this reason we believe that the
1056: Gaussian approach would be ideally used in conjunction with molecular
1057: dynamics by providing, prior to investing significant computational
1058: resources in all-atom simulations, a fast but accurate
1059: characterization of a protein's near-native motion.
1060:
1061: \section{Acknowledgments}
1062:
1063: We are indebted to Stefano Piana, Michele Cascella, Giorgio Colombo,
1064: Paolo De Los Rios, Gianluca Lattanzi, Luca Marsella and Gianni
1065: Settanni for useful suggestions and advice. We acknowledge financial
1066: support from INFM and Cofin MIUR 2001.
1067:
1068: \begin{thebibliography}{10}
1069:
1070: \bibitem{Karplus_ACR}
1071: Karplus, M.
1072: \newblock Molecular dynamics simulations of biomolecules.
1073: \newblock Acc. Chem. Res. 35:321--323, 2002.
1074:
1075: \bibitem{nmr1}
1076: Lipari, G. and Szabo, A.
\newblock Model free approach to the interpretation of nuclear magnetic
1078: resonance relaxation in macromolecules. 1 theory and range of validity.
1079: \newblock J. Am. Chem. Soc. 104:4546--4559, 1982.
1080:
1081: \bibitem{nmr2}
1082: Lipari, G. and Szabo, A.
\newblock Model free approach to the interpretation of nuclear magnetic
1084: resonance relaxation in macromolecules. 2. analysis of experimental results.
1085: \newblock J. Am. Chem. Soc. 104:4559--4570, 1982.
1086:
1087: \bibitem{karplus_book}
1088: Brooks, C.~L., Karplus, M., and Pettitt, B.~M.
1089: \newblock Proteins: a theoretical perspective of dynamics, structure, and
1090: thermodynamics.
\newblock Wiley, New York, 1988.
1092:
1093: \bibitem{go82}
1094: Noguti, T. and Go, N.
1095: \newblock Collective variable description of small-amplitude conformational
1096: fluctuations in a globular protein.
1097: \newblock Nature 296:776--778, 1982.
1098:
1099: \bibitem{Hess2002}
1100: Hess, B.
1101: \newblock Convergence of sampling in protein simulations.
1102: \newblock Phys. Rev. E 65:031910, 2002.
1103:
1104: \bibitem{tir96}
1105: Tirion, M.~M.
1106: \newblock Large amplitude elastic motions in proteins from a single--parameter,
1107: atomic analysis.
1108: \newblock Physical Review Letters 77:1905--1908, 1996.
1109:
1110: \bibitem{bah97}
1111: Bahar, I., Atilgan, A.~R., and Erman, B.
1112: \newblock Direct evaluation of thermal fluctuations in proteins using a single
1113: parameter harmonic potential.
1114: \newblock Folding and Design 2:173--181, 1997.
1115:
1116: \bibitem{doruker2000}
1117: Doruker, P., Atilgan, A., and Bahar, I.
1118: \newblock Dynamics of proteins predicted by molecular dynamics simulations and
1119: analytical approaches: Application to alpha-amylase inhibitor.
1120: \newblock Proteins: Structure Function and Genetics 40:512--524, 2000.
1121:
1122: \bibitem{ani01}
1123: Atilgan, A.~R., Durell, S.~R., Jernigan, R.~L., Demirel, M.~C., Keskin, O., and
1124: Bahar, I.
1125: \newblock Anisotropy of fluctuation dynamics of proteins with an elastic
1126: network model.
1127: \newblock Biophysical Journal 80:505--515, 2001.
1128:
1129: \bibitem{piana2002}
1130: Piana, S., Carloni, P., and Parrinello, M.
1131: \newblock Role of conformational fluctuations in the enzymatic reaction of
1132: hiv-1 protease.
1133: \newblock J. Mol. Biol. 319:567--583, 2002.
1134:
1135: \bibitem{piana2002b}
1136: Piana, S., Carloni, P., and Rothlisberger, U.
1137: \newblock Drug resistance in hiv-1 protease: flexibility-assisted mechanism of
1138: compensatory mutations.
1139: \newblock Protein Science 11:2393--2402, 2002.
1140:
1141: \bibitem{condra}
1142: {Condra et~al.}, J.~H.
1143: \newblock In-vivo emergence of hiv-1 variants resistant to multiple protease
1144: inhibitors.
1145: \newblock Nature 374:569--571, 1995.
1146:
1147: \bibitem{patick}
1148: {Patick~et~al.}, A.~K.
1149: \newblock Antiviral and resistance studies of {AG1343}, an orally bioavailable
1150: inhibitor of human immunodeficiency virus protease.
1151: \newblock Antimicrob. Agents Chemother. 40:292--297, 1996.
1152:
1153: \bibitem{pcr}
1154: Piana, S., Carloni, P., and Rothlisberger, U.
1155: \newblock Reaction mechanism of hiv-1 protease by hybrid car-parrinello
1156: md/classical md simulations.
1157: \newblock submitted, 2003.
1158:
1159: \bibitem{settanni_ngf}
1160: Settanni, G., Cattaneo, A., and Carloni, P.
\newblock Molecular dynamics simulations of the ngf-trka domain 5 complex and
1162: comparison with biological data.
1163: \newblock Biophys. J. 84:2282--2292, 2003.
1164:
1165: \bibitem{go_gauss91}
1166: Horiuchi, T. and Go, N.
1167: \newblock Projection of monte carlo and molecular dynamics trajectories onto
1168: the normal mode axes: human lysozyme.
1169: \newblock Proteins: Structure Function and Genetics 10:106--116, 1991.
1170:
1171: \bibitem{substates}
1172: Frauenfelder, H., Siglar, H., and Young, R.~D.
1173: \newblock Science 254:1598, 1991.
1174:
1175: \bibitem{cb_construct}
1176: Park, B. and Levitt, M.
1177: \newblock Energy functions that discriminate x-ray and near-native folds from
1178: well-constructed decoys.
\newblock J. Mol. Biol. 258:367--392, 1996.
1180:
1181: \bibitem{levitt85}
1182: Levitt, M., Sander, C., and Stern, P.~S.
1183: \newblock Protein normal-mode dynamics: trypsin inhibitor, crambin,
1184: ribonuclease and lysozyme.
1185: \newblock J. Mol. Biol. 181:423--447, 1985.
1186:
1187: \bibitem{Karplus85}
1188: Brooks, B. and Karplus, M.
1189: \newblock Normal modes for specific motions of macromolecules: application to
1190: the hinge-bending mode of lysozyme.
1191: \newblock Proc. Natl. Acad. Sci. USA 82:4995--4999, 1985.
1192:
1193: \bibitem{Case94}
1194: Case, D.~A.
1195: \newblock Normal mode analysis of protein dynamics.
1196: \newblock Curr. Op. Str. Biol. 4:285--290, 1994.
1197:
1198: \bibitem{Hinsen98}
1199: Hinsen, K.
1200: \newblock Analysis of domain motions by approximate normal mode calculations.
1201: \newblock Proteins: Structure Function and Genetics 33:417--429, 1998.
1202:
1203: \bibitem{tirion93}
1204: Tirion, M.~M. and ben Avraham, D.
\newblock Normal mode analysis of {G}-actin.
1206: \newblock J. Mol. Biol. 230:186--195, 1993.
1207:
1208: \bibitem{hiv99}
1209: Bahar, I., Erman, B., Jernigan, R.~L., Atilgan, A.~R., and Covell, D.~G.
\newblock Collective motions in {HIV}-1 reverse transcriptase: Examination of
1211: flexibility and enzyme function.
1212: \newblock Journal of Molecular Biology 285:1023--1037, 1999.
1213:
1214: \bibitem{hal99}
1215: Haliloglu, T. and Bahar, I.
\newblock Structure based analysis of protein dynamics. {C}omparison of
theoretical results for hen lysozyme with {X}-ray diffraction and {NMR}
relaxation data.
1219: \newblock Proteins 37:654--667, 1999.
1220:
1221: \bibitem{Goldstein}
1222: Goldstein, H.
1223: \newblock Classical mechanics. 2nd ed.
\newblock Addison-Wesley, Reading, Mass., 1980.
1225:
1226: \bibitem{Karplus76}
1227: McCammon, J.~A., Gelin, B.~R., Karplus, M., and Wolynes, P.~G.
1228: \newblock The hinge-bending mode in lysozyme.
1229: \newblock Nature 262:325--326, 1976.
1230:
1231: \bibitem{Karplus82}
Swaminathan, S., Ichiye, T., van Gunsteren, W., and Karplus, M.
1233: \newblock Time dependence of atomic fluctuations in proteins: analysis of local
1234: and collective motions in bovine pancreatic trypsin inhibitor.
1235: \newblock Biochemistry 21:5230--5241, 1982.
1236:
1237: \bibitem{Howard}
1238: Howard, J.
1239: \newblock Mechanics of motor proteins and the cytoskeleton.
\newblock Sinauer Associates, Sunderland, MA, 2001.
1241:
1242: \bibitem{chandrasekhar}
1243: Chandrasekhar, S.
1244: \newblock Stochastic problems in physics and astronomy.
1245: \newblock Rev. Mod. Phys. 15:1--89, 1943.
1246:
1247: \bibitem{normod}
1248: Micheletti, C., Lattanzi, G., and Maritan, A.
1249: \newblock Elastic properties of proteins: Insight on the folding process and
1250: evolutionary selection of native structures.
1251: \newblock J. Mol. Biol. 321:909--921, 2002.
1252:
1253: \bibitem{coarseanm02a}
1254: Doruker, P., Jernigan, R., and Bahar, I.
1255: \newblock Dynamics of large proteins through hierarchical levels of
1256: coarse-grained structures.
\newblock J. Comput. Chem. 23:119--127, 2002.
1258:
1259: \bibitem{coarseanm02b}
1260: Doruker, P., Jernigan, R., Navizet, I., and Hernandez, R.
1261: \newblock Important fluctuation dynamics of large protein structures are
1262: preserved upon renormalization.
1263: \newblock Int. J. Quantum Chem. 90:822--837, 2002.
1264:
1265: \bibitem{pdbselect}
1266: Hobohm, U. and Sander, C.
1267: \newblock Enlarged representative set of protein structures.
1268: \newblock Protein Science 3:522, 1994.
1269:
1270: \bibitem{halle2002}
1271: Halle, B.
1272: \newblock Flexibility and packing in proteins.
1273: \newblock Proc. Natl. Acad. Sci. USA 99:1274--1279, 2002.
1274:
1275: \bibitem{NR}
1276: Press, W.~H., Teukolsky, S.~A., Vetterling, W.~T., and Flannery, B.~P.
1277: \newblock Numerical Recipes.
\newblock Cambridge University Press, Cambridge, 1999.
1279:
1280: \bibitem{stabloc}
1281: Micheletti, C., Seno, F., Banavar, J.~R., and Maritan, A.
1282: \newblock Learning effective amino acid interactions through iterative
1283: stochastic techniques.
1284: \newblock Proteins: Structure Function and Genetics 42:422--431, 2001.
1285:
1286: \bibitem{Amadei93}
1287: Amadei, A., Linssen, A.~B.~M., and Berendsen, H.~J.~C.
1288: \newblock Essential dynamics of proteins.
1289: \newblock Proteins: Structure Function and Genetics 17:412--425, 1993.
1290:
1291: \bibitem{garcia92}
1292: Garcia, A.
1293: \newblock Large-amplitude nonlinear motions in proteins.
1294: \newblock Phys. Rev. Lett. 68:2696--2699, 1992.
1295:
1296: \bibitem{brooks2003}
Rod, T.~H., Radkiewicz, J.~L., and Brooks~III, C.~L.
1298: \newblock Correlated motion and the effect of distal mutations in dihydrofolate
1299: reductase.
1300: \newblock Proc. Natl. Acad. Sci. USA 100:3954--3959, 2003.
1301:
1302: \bibitem{BIOCH88}
1303: Ala, P.~J., Huston, E.~E., Klabe, R.~M., Jadhav, P.~K., Lam, P. Y.~S., and
1304: Chang, C.~H.
1305: \newblock Counteracting {HIV}-1 protease drug resistance: Structural analysis
1306: of mutant proteases complexed with {XV638} and {SD146}, cyclic urea amides
1307: with broad specificities.
1308: \newblock Biochemistry 37:15042--15049, 1998.
1309:
1310: \bibitem{gulnik}
1311: Gulnik, S., Erickson, J.~W., and Xie, D.
1312: \newblock Vitamins and hormones - advances in research and applications.
1313: \newblock Vitam Horm. 58:213--256, 2000.
1314:
1315: \bibitem{hiv1}
1316: Wlodawer, A. and Erickson, J.~W.
1317: \newblock Structure-based inhibitors of {HIV}-1 protease.
1318: \newblock Annu Rev Biochem 62:543--585, 1993.
1319: \newblock and references therein.
1320:
1321: \bibitem{apr}
1322: Reddy, P. and Ross, J.
1323: \newblock Amprenavir - {A} protease inhibitor for the treatment of patients
1324: with {HIV}-1 infection.
1325: \newblock Formulary 34:567--675, 1999.
1326:
1327: \bibitem{condra1}
1328: Brown, A. J.~L., Korber, B.~T., and Condra, J.~H.
1329: \newblock Associations between amino acids in the evolution of {HIV} type 1
1330: protease sequences under indinavir therapy.
1331: \newblock AIDS Res. Hum. Retroviruses 15:247--253, 1999.
1332:
1333: \bibitem{boucher1}
1334: Boucher, C.
1335: \newblock Rational approaches to resistance: Using saquinavir.
1336: \newblock AIDS 10:S15--9, 1996.
1337:
1338: \bibitem{Molla}
{Molla, A. et~al.}
1340: \newblock Ordered accumulation of mutations in {HIV} protease confers
1341: resistance to ritonavir.
1342: \newblock Nat. Med. 2:760--766, 1996.
1343:
1344: \bibitem{Marko}
{Markowitz, M. et~al.}
\newblock Selection and analysis of human immunodeficiency virus type~{I}
variants with increased resistance to {ABT}-538, a novel protease inhibitor.
1348: \newblock J. Virol. 69:701--706, 1995.
1349:
1350: \bibitem{Amadei99}
Amadei, A., Ceruso, M.~A., and Di~Nola, A.
1352: \newblock On the convergence of the conformational coordinates basis set
obtained by the essential dynamics analysis of proteins' molecular dynamics
1354: simulations.
1355: \newblock Proteins: Structure Function and Genetics 36:419--424, 1999.
1356:
1357: \bibitem{thorpe2001}
1358: Jacobs, D., Rader, A., Kuhn, L., and Thorpe, M.
1359: \newblock Protein flexibility predictions using graph theory.
1360: \newblock Proteins: Structure Function and Genetics 44:150--165, 2001.
1361:
1362: \bibitem{bah98}
1363: Bahar, I., Atilgan, A.~R., Demirel, M.~C., and Erman, B.
1364: \newblock Vibrational dynamics of folded proteins: significance of slow and
1365: fast motions in relation to function and stability.
1366: \newblock Physical Review Letters 80:2733--2736, 1998.
1367:
1368: \bibitem{hivgnm03}
1369: Kurt, N., Scott, W., Schiffer, C., and Haliloglu, T.
\newblock Cooperative fluctuations of unliganded and substrate-bound {HIV}-1
1371: protease: a structure-based analysis on a variety of conformations from
1372: crystallography and molecular dynamics simulations.
\newblock Proteins 51:409--422, 2003.
1374:
1375: \bibitem{hiv-gauss}
1376: Micheletti, C., Cecconi, F., Flammini, A., and Maritan, A.
1377: \newblock Crucial stages of protein folding through a solvable model:
1378: Predicting target sites for enzyme-inhibiting drugs.
1379: \newblock Protein Science 11:1878--1887, 2002.
1380:
1381: \bibitem{gaussian}
1382: Micheletti, C., Banavar, J., and Maritan, A.
1383: \newblock Conformations of proteins in equilibrium.
\newblock Phys. Rev. Lett. 87:088102, 2001.
1385:
1386: \end{thebibliography}
1387:
1388: \end{document}
1389:
1390: