1: \documentclass[floatfix,a4paper,prb,twocolumn,showkeys]{revtex4}
2: %\documentclass[preprint,endfloats*,a4paper,prb,showkeys]{revtex4}
3:
4: \usepackage{pstricks}
5: \usepackage{graphicx}
6: \usepackage{amsmath,amssymb}
7:
8: \begin{document}
9:
10: \title{The protein folding network}
11:
12: \author{Francesco Rao}
13: \author{Amedeo Caflisch}
14:
15: \email[corresponding author, tel: +41 1 635 55 21,
16: fax: +41 1 635 68 62, e-mail: ]{caflisch@bioc.unizh.ch}
17:
18: \affiliation{Department of Biochemistry, University of Zurich,
Winterthurerstrasse 190, CH-8057 Zurich, Switzerland\\
20: tel: +41 1 635 55 21, fax: +41 1 635 68 62,
21: e-mail: caflisch@bioc.unizh.ch}
22:
23: %\date{\today}
24:
25: \begin{abstract}
26:
27: The conformation space of a 20-residue antiparallel $\beta$-sheet peptide,
28: sampled by molecular dynamics simulations, is mapped to a network.
29: Conformations are nodes of the network, and the transitions between them are
30: links. The conformation space network describes the significant free energy
31: minima and their dynamic connectivity without projections into arbitrarily
32: chosen reaction coordinates. As previously found for the Internet and the
33: World-Wide Web as well as for social and biological networks, the conformation
34: space network is scale-free and contains highly connected hubs like the native
35: state which is the most populated free energy basin. Furthermore, the native
36: basin exhibits a hierarchical organization which is not found for a random
37: heteropolymer lacking a predominant free-energy minimum. The network topology
38: is used to identify conformations in the folding transition state ensemble, and
39: provides a basis for understanding the heterogeneity of the transition state
and denatured state ensemble as well as the existence of multiple pathways.
41:
42: \end{abstract}
43:
44: \keywords{complex networks, protein folding, energy landscape, transition
state, denatured state ensemble, multiple pathways}
46:
47: \maketitle
48:
49: Proteins are complex macromolecules with many degrees of freedom. To fulfill their
50: function they have to fold to a unique three-dimensional structure (native
51: state). Protein folding is a complex process governed by noncovalent
52: interactions involving the entire molecule. Spontaneous folding in a time range
of microseconds to seconds \cite{Daggett:Is} can be reconciled with the large
number of conformers by using energy landscape analysis
55: \cite{Bryngelson:Random,Leopold:Protein,Karplus:The}. The main difficulty of
56: this analysis is that the free-energy has to be projected on arbitrarily chosen
reaction coordinates (or order parameters). In many cases a simplified
representation of the free-energy landscape is obtained in which important
information on the non-native conformation ensemble and the folding transition
state ensemble is hidden. Moreover, the possible transitions between
free-energy minima cannot be displayed in such projections, which hinders the
study of pathways and folding intermediates. The characterization of the
63: free-energy minima and the connectivity among them, i.e., possible transitions
64: between minima, for peptides and proteins is still an unresolved problem.
65:
66: In the last five years many complex systems, like the World-Wide Web, metabolic
67: pathways, and protein structures have been modeled as networks
68: \cite{Jeong:The,Albert:Diameter,greene:protein}. Intriguingly, common
69: topological properties have emerged from their organization \cite{newman:siam}.
70: A description of the potential energy landscape without the use of any
71: projection has been given in terms of networks for a Lennard-Jones cluster of
72: atoms \cite{Doye:scalefree}.
73:
74: Here, we introduce complex network analysis \cite{newman:siam} to study the
75: conformation space and folding of beta3s, a designed 20-residue sequence whose
76: solution conformation has been investigated by NMR spectroscopy
77: \cite{DeAlba:Denovo}. The NMR data indicate that beta3s in aqueous solution
forms a monomeric (up to more than 1~mM concentration) triple-stranded
antiparallel $\beta$-sheet (Fig.\ 1, bottom), in equilibrium with the
denatured state \cite{DeAlba:Denovo}. We have previously shown that in
81: implicit solvent \cite{Ferrara:Evaluation} molecular dynamics simulations
82: beta3s folds reversibly to the NMR solution conformation, irrespective of the
83: starting conformation \cite{Ferrara:Folding,Cavalli:Weak}.
84: %
85: We consider conformations sampled by molecular dynamics simulations and the
86: transitions between them as the network nodes and links, respectively.
87: %
The network analysis allows one to identify the topological properties that are
common to both beta3s, which folds to a unique three-dimensional structure
\cite{Cavalli:Fast,DeAlba:Denovo}, and a random heteropolymer, which lacks a
single preferential conformation like the native state even though it has the same
residue composition as beta3s. These properties include the presence
of several free-energy minima and highly connected conformations (hubs). On the other
hand, a hierarchical modularity \cite{Bara:Hier} in the proximity of the native
state is peculiar to a folding sequence.
96:
97: \begin{figure}[h]
98: \includegraphics[angle=0,width=80mm] {eps/figure1.eps}
99: %\includegraphics[angle=0,width=120mm] {eps/figure1.eps}
100: \caption{Beta3s conformation space network. The size and color
101: coding of the nodes reflect the statistical weight $w$ and average neighbor
102: connectivity $k_{nn}$, respectively. White, cyan, and red nodes have
103: $k_{nn}<30$, $30 \leq k_{nn} \leq 70$, and $k_{nn}>70$, respectively.
104: Representative conformations are shown by a pipe colored according to secondary
105: structure: white stands for coil, red for $\alpha$-helix, orange for bend, cyan
106: for strand and the N-terminus is in blue. The variable radius of the pipe
107: reflects structural variability within snapshots in a conformation. The yellow
108: diamonds are folding TS conformations (TSE1, TSE2, see text for details)
109: characterized by a connectivity/weight ratio $k/2\tilde w>0.3$, a clustering
110: coefficient $C<0.3$, and $60<k_{nn}<80$. This figure was made using
111: \textit{visone} (www.visone.de) and \textit{MOLMOL}\cite{Koradi96}
112: visualization tools.}
113:
114: \end{figure}
115:
116: \section{Model and Methods}
117:
118: \vspace{0.0cm}\noindent{\bf Molecular dynamics simulations}\hspace{0.2cm} The
119: simulations and part of the analysis of the trajectories were performed with
120: the program CHARMM {\cite{Brooks:CHARMM}}. Beta3s was modeled by explicitly
121: considering all heavy atoms and the hydrogen atoms bound to nitrogen or oxygen
122: atoms (PARAM19 force field {\cite{Brooks:CHARMM}}). A mean field approximation
123: based on the solvent accessible surface was used to describe the main effects
124: of the aqueous solvent on the solute \cite{Ferrara:Evaluation}. The two
125: parameters of the solvation model were optimized without using beta3s. The
126: same force field and implicit solvent model have been used recently in
127: molecular dynamics simulations of the early steps of ordered aggregation
128: \cite{Gsponer:Therole}, and folding of structured peptides ($\alpha$-helices
129: and $\beta$-sheets) ranging in size from 15 to 31 residues
130: \cite{Ferrara:Evaluation,Ferrara:Folding,Hiltpold:Free}, as well as small
131: proteins of about 60 residues \cite{Gsponer:Role,Gsponer:Molecular}. Despite
132: the absence of collisions with water molecules, in the simulations with
133: implicit solvent the separation of time scales is comparable with that observed
134: experimentally. Helices fold in about 1 ns \cite{Ferrara:Thermodynamics},
135: $\beta$-hairpins in about 10 ns \cite{Ferrara:Thermodynamics} and
136: triple-stranded $\beta$-sheets in about 100 ns \cite{Cavalli:Weak}, while the
137: experimental values are $\sim$0.1 $\mu$s \cite{Eaton:Fast}, $\sim$1 $\mu$s
138: \cite{Eaton:Fast} and $\sim$10 $\mu$s \cite{DeAlba:Denovo}, respectively.
139: % exactly the same force field and solvation model is able to reversibly fold
140: % to the experimental conformation several structured peptides
141: % ($\alpha$-helices and $\beta$-sheets)
142: % \cite{Ferrara:Native,Ferrara:Evaluation}.
143: Recently, four molecular dynamics simulations of beta3s were performed at 330 K
144: for a total simulation time of 12.6 $\mu$s \cite{Cavalli:Fast}. There are 72
145: folding events and 73 unfolding events and the average time required to go from
146: the denatured state to the folded conformation is 83 ns. The 12.6 $\mu$s of
147: simulation length is about two orders of magnitude longer than the average
148: folding or unfolding time, which are similar because at 330 K the native and
149: denatured states are almost equally populated \cite{Cavalli:Fast}. For the
150: network analysis the first 0.65 $\mu$s of each of the four simulations were
151: neglected so that along the 10 $\mu$s of simulations there are a total of $5
152: \times 10^{5}$ snapshots because coordinates were saved every 20 ps. The
153: sequence of the random heteropolymer is a randomly scrambled version of the
154: beta3s sequence with the same residue composition. It was simulated for 2
155: $\mu$s and $10^{5}$ snapshots were saved. The conditions for the molecular
156: dynamics simulations, i.e., force field, solvation model, temperature, and time
157: interval between saved snapshots were the same for both peptides.
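
As a consistency check on the numbers quoted above (plain arithmetic on the
stated simulation lengths and saving interval, not an additional result), the
snapshot counts follow from
\begin{align*}
12.6\ \mu\mathrm{s} - 4\times 0.65\ \mu\mathrm{s} &= 10.0\ \mu\mathrm{s}, \\
\frac{10\ \mu\mathrm{s}}{20\ \mathrm{ps}} &= 5\times 10^{5}\ \text{(beta3s)}, \\
\frac{2\ \mu\mathrm{s}}{20\ \mathrm{ps}} &= 10^{5}\ \text{(random heteropolymer)}.
\end{align*}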
158:
159: \vspace{0.5cm}\noindent{\bf Construction of the protein folding
160: network}\hspace{0.2cm} To define the nodes and links of the network the
161: secondary structure was calculated \cite{DSSP:cont} for each snapshot
162: (Cartesian coordinates of the atomic nuclei) saved along the molecular dynamics
trajectory. A ``conformation'' is a single string of secondary structure
\cite{DSSP:cont}, e.g., the most populated conformation for beta3s (FS in Fig.\
1) is {\tt -EEEESSEEEEEESSEEEE-} where ``{\tt E}'', ``{\tt S}'', and ``{\tt -}''
stand for extended, turn, and unstructured, respectively. There are 8 possible
``letters'' in the secondary structure ``alphabet''. Since the N- and C-terminal
residues are always assigned a ``{\tt -}'' \cite{DSSP:cont}, a 20-residue peptide
169: can assume $8^{18}\simeq 10^{16}$ conformations. Conformations are nodes of
170: the network and the transitions between them are links. A weight $\tilde w$ is
171: assigned to each node to take into account the free-energy of each conformation
172: and is equal to the number of snapshots with a given secondary structure
173: string. The statistical weight $w$ of a node is equal to the weight normalized
174: by the total number of snapshots in the simulations ($5 \times 10^{5}$ and
$10^{5}$ for beta3s and the random heteropolymer, respectively). Considering
all the conformations visited during a $\mu$s-scale simulation can lead to a
computationally intractable network size. For this reason the network
analysis was restricted to the 1287 conformations of beta3s with significant weight
($\tilde w\geq 20$ per conformation). Two nodes are connected by an undirected
link (and called neighbors) if they either include a pair of snapshots that are
visited within 20 ps or are separated by one or more conformations with
fewer than 20 snapshots each. For the 2 $\mu$s of the random heteropolymer a
183: threshold of $\tilde w\geq 4$ was used, so that $w\geq 4\times 10^{-5}$ as in
184: the beta3s network. The choice of a threshold value is somewhat arbitrary but
185: the network properties are robust for a large range of threshold values (see
186: Supplementary material).
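
The equivalence of the two thresholds can be verified directly. Writing $N$
for the total number of snapshots (a symbol introduced here only for
illustration), the statistical weight is $w=\tilde w/N$, so that
\begin{align*}
w_{\min}^{\text{beta3s}} &= \frac{20}{5\times 10^{5}} = 4\times 10^{-5}, \\
w_{\min}^{\text{random}} &= \frac{4}{10^{5}} = 4\times 10^{-5}.
\end{align*}
Similarly, the size of the secondary-structure string space quoted above is
$8^{18}=2^{54}\approx 1.8\times 10^{16}$, i.e., of order $10^{16}$, which is
many orders of magnitude larger than the number of snapshots.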
187:
The properties of the network are also robust with respect to the length of the
simulation and the definition of the nodes. The topological properties
are independent of the simulation length once more than 2 $\mu$s are considered.
The correlation between statistical weight and connectivity, as well as the
power-law behavior of the connectivity distribution and the $1/k$ behavior of the
clustering coefficient distribution (see below), are essentially identical after
2, 4, and 10 $\mu$s. As an example, the exponent of the power law is $2.0$
for the beta3s networks based on 2, 4, and 10 $\mu$s of simulation time.
Defining nodes by grouping snapshots according to root mean square deviations
({\sc rmsd}) in coordinates of C$_{\alpha}$-C$_{\beta}$ atoms yields the same
overall properties, i.e., power-law distribution of the links (with similar
$\gamma$ value) and $1/k$ tail of the clustering distribution. Grouping
200: snapshots according to secondary structure motifs does not require the use of
201: an arbitrarily chosen {\sc rmsd} cutoff, and is able to capture the
202: fluctuations of partially structured conformations\cite{DSSP:cont}.
203:
204: \vspace{0.5cm}\noindent{\bf Evaluation of $\bf P_{fold}$}\hspace{0.2cm} The TS
205: ensemble can be defined as the set of structures which have the same
206: probability of folding ($P_{fold}$) or unfolding in trajectories started with
varying initial conditions\cite{Du:Tcoor}. For each putative TS conformation,
the probability to fold before unfolding was calculated from 100 very short
trajectories at 330~K started from ten snapshots within a node. The only
difference between the ten runs started from a given snapshot was the seed for
the random number generator
used for the initial assignment of the atomic velocities. A trajectory was
considered to lead to folding (unfolding) if it first visited structures with a
213: fraction of native contacts $Q>22/26$ ($Q<4/26$) \cite{Ferrara:Folding}.
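
A minimal sketch of the corresponding estimator, under the assumption (not
stated explicitly in the text) that the $N=100$ trajectories started from a
node can be treated as independent trials: if $n_{\rm fold}$ of them reach the
folded criterion first, then
\begin{equation*}
P_{fold} \approx \frac{n_{\rm fold}}{N}, \qquad
\sigma_{P} \approx \sqrt{\frac{P_{fold}\,(1-P_{fold})}{N}} .
\end{equation*}
For $N=100$ and $P_{fold}=0.5$ this gives $\sigma_{P}\approx 0.05$, which
indicates the statistical resolution one may expect from 100 runs per node.
The symbols $n_{\rm fold}$, $N$, and $\sigma_{P}$ are introduced only for this
illustration.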
214:
215: \section{Results and Discussion}
216:
217: To study the conformation space network of polypeptides we concentrate
218: on the analysis of topology, i.e., on the study of the connectivity between
219: different conformations, leaving for a later study the analysis of transition
rates. We have investigated the network topologies of several peptides, but in
this paper we focus on beta3s and its randomly scrambled version.
Additional details can be found in the Supplementary material, where the network
properties of another structured peptide and a glycine homopolymer are
presented.
225:
226: \vspace{0.5cm}\noindent{\bf Conformation space network of a structured
227: peptide}\hspace{0.2cm} The conformation space network and relevant structures
228: of beta3s are shown in Fig.\ 1. The group of nodes at the bottom of Fig.\ 1
229: (red nodes) represents the native state basin (FS). The native basin is
230: connected to a wide region of nodes with significant native content (cyan
231: circles in the middle of Fig.\ 1). Although many heterogeneous routes can be
232: taken to reach the folded state (in agreement with lattice
233: simulations\cite{Onuchic:TSE,Schonbrun:Fast}), most of the folding events have
common structural features that define two average folding pathways. The less
frequented average pathway (see Ref.\ \cite{Cavalli:Weak} and the density of
transitions in Fig.\ 1, bottom right) consists of conformations that have the
N-terminal hairpin formed while the C-terminal strand is mostly unstructured
with non-native hydrogen bonds at the turn (TSE1 in Fig.\ 1). The second,
more frequented average pathway includes conformations with a well formed
C-terminal hairpin while the N-terminal strand is disordered (TSE2 in Fig.\ 1),
i.e., out-of-register or mostly unstructured. It is interesting to
242: note that the same two folding pathways were observed experimentally for a
243: 24-residue peptide with the same folded state as
244: beta3s\cite{Griffiths:Structure}. Furthermore multiple folding pathways have
245: recently been detected by kinetic analysis of a $\beta$-sandwich
246: protein\cite{Wright:Parallel}.
247:
248: The denatured state ensemble is very heterogeneous and includes high enthalpy,
249: high entropy conformations (e.g., the partially helical conformations, denoted
250: HH in Fig.\ 1) but also low enthalpy, low entropy conformations (e.g., the
251: curl-like trap, TR). The former are loosely linked clusters of conformations
252: with similar secondary structure (see Tab.\ 1) which are characterized by an
253: unfavorable effective energy (sum of peptide
254: potential energy and solvation energy) and fluctuating unstructured residues
(e.g., the termini of the helix shown at the top left of Fig.\ 1). In
contrast, low enthalpy, low entropy traps form tightly linked clusters with
257: almost identical secondary and tertiary structure, favorable effective
258: energy (similar to the one of the native structure, see Tab.\ 1) and no
259: fluctuating residues (e.g., Fig.\ 1, top right). Taken together, these results
260: indicate that FS is entropically favored over low enthalpy conformations like
261: TR, i.e., FS has more flexibility than TR. A possible explanation is that the
C-terminal carboxyl group is involved in four hydrogen bonds in TR (with the backbone
NH groups of residues 4-7), whereas both termini undergo rather large fluctuations
264: in FS. In addition, a more favorable van der Waals energy in TR is consistent
265: with a denser packing in TR than in FS.
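
These statements can be made quantitative with the values of Tab.\ 1. The
following is an illustrative calculation, assuming that the logarithm in the
definition of ${\cal F}_i$ is the natural logarithm and using
$k_BT\simeq 0.66$ kcal/mol at 330~K. It compares the most populated FS
conformation ($\left<{\cal E}\right>=-7.6$, $\Delta{\cal F}=0$) with the most
populated TR conformation ($\left<{\cal E}\right>=-7.8$,
$\Delta{\cal F}=3.4$):
\begin{align*}
\frac{w_{\rm TR}}{w_{\rm FS}} &= e^{-\Delta{\cal F}/k_BT}
  = e^{-3.4/0.66} \approx 6\times 10^{-3}, \\
T\Delta S &= \Delta\!\left<{\cal E}\right> - \Delta{\cal F}
  = (-7.8+7.6) - 3.4 = -3.6\ \text{kcal/mol}.
\end{align*}
Since the two effective energies are nearly equal, the lower population of the
trap is almost entirely entropic in origin.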
266:
267: \setcounter{table}{0}
268: \renewcommand{\thetable}{\arabic{table}}
269:
270: \begingroup
271: \squeezetable
272: \begin{table}[h]
273: \begin{ruledtabular}
\caption{Energetic comparison of the folded and denatured states. The free-energy
275: of conformation $i$ is \mbox{${\cal F}_i = -k_BT \log(w_i)$}, where $w_i$ is the
276: probability along the trajectory to find the peptide in the conformation $i$.}
277: \begin{tabular}{ccc}
278: \parbox{4.0cm}{\flushleft{\bf Folded state (FS)}}
279: & $\left < {\cal E} \right >$ \footnote{Average effective energy}
280: & $\Delta{\cal F}$ \footnote{Free-energy relative to the most populated
281: conformation. All values are in kcal/mol. The conformational entropy of the
282: peptide is equal to $(\left < {\cal E} \right > - {\cal F})/T$. Note that the
283: curl-like traps are entropically penalized with respect to the native state.}\\
284: {\tt -EEEESSEEEEEESSEEEE-} & -7.6 & 0\\
285: {\tt -EEE-STTEEEEESSEEEE-} & -8.6 & 0.1\\
286: {\tt -EEEESSEEEEE-STTEEE-} & -8.4 & 0.5\\
287: {\tt -EEE-STTEEEE-STTEEE-} & -9.2 & 0.7\\
288: \parbox{4.0cm}{\flushleft{\bf Helical conformations (HH)}} & & \\
289: {\tt ---HHHHHHHHHHS------} & 0.9 & 3.1\\
290: {\tt -HHHHHHHHHHHHS------} & -1.9 & 3.3\\
291: {\tt ---HHHHHHHHHHTT-----} & 0.7 & 3.5\\
292: {\tt ---HHHHHHHHHH-------} & 0.5 & 3.7\\
293: {\tt -HHHHHHHHHHHHTT-----} & -0.8 & 3.7\\
294: {\tt --TT--HHHHHHHHHHHHH-} & -0.8 & 3.8\\
295: \parbox{4.0cm}{\flushleft{\bf Curl-like trap (TR)}} & & \\
296: {\tt ---SSGGG-EEE-STTTEE-} & -7.8 & 3.4\\
297: {\tt ---SSSS--EEE-STTTEE-} & -7.0 & 3.5\\
298: {\tt ---S-GGG-EEE-STTTEE-} & -9.3 & 3.7\\
299: {\tt ---SSGGG-EEE-SGGGEE-} & -9.6 & 3.7\\
300: {\tt ---SSTTT-EEE-STTTEE-} & -8.4 & 3.7\\
301: \end{tabular}
302: \end{ruledtabular}
303: \end{table}
304: \endgroup
305:
306: Note that the network description of non-native conformations is more detailed
307: than the one obtained by projecting the free energy surface on progress
variables (e.g., based on fraction of native contacts). In such projections,
structures as diverse as helices and the curl-like conformations mentioned
above are not distinguished at low values of the fraction of native contacts.
Even the ensemble with half of the native contacts is
312: heterogeneous and hard to classify. Using as reaction coordinate the {\sc rmsd}
313: (with respect to a given structure) or the radius of gyration is even less
selective. Only when a clever combination of variables is used is it possible
to have a more detailed description of the free-energy landscape. The network
description of the conformation space gives a synthetic and systematic view of
all the possible conformations accessed by the system and their transitions.
By considering the statistical weight of the nodes, a thermodynamic
description of the system is obtained.
320:
321:
322: \begin{figure}[h!]
323: \includegraphics[angle=-90,width=80mm]{eps/figure2.eps}
324: \caption{Correlation between the statistical weight $w$ and the
connectivity $k$ for beta3s. The connectivity is proportional to $\log^2(w)$
326: with a correlation coefficient of 0.88 (solid line). The correlation and the
327: fit are calculated over all nodes of the network but in the figure logarithmic
328: binning is applied to reduce noise.}
329: \end{figure}
330:
331:
332: The high correlation between the statistical weight of a node and its number of
links (Fig.\ 2) shows that the most connected nodes are also low-lying minima
334: on the free-energy landscape. This indicates that the conformation space
335: network describes the significant free energy minima and their dynamic
336: connectivity, without projection, where highly populated nodes are minima of
337: free-energy and the set of nodes densely connected to them make up the basins
338: of such minima.
339:
340: \vspace{0.5cm}\noindent{\bf Folding and network topology}\hspace{0.2cm} The
341: average neighbor connectivity $k_{nn}$ of beta3s (Fig.\ 3a), i.e., the average
342: number of links of the neighbors of a given node, is rather heterogeneous,
343: highlighting the presence of different connection rules in different regions of
the network. This is not the case for the random heteropolymer (Fig.\ 3b),
whose basins are similar to one another in organization and statistical weight,
as previously found for most homopolymers \cite{Vendrusc:SW}. Note that for
347: beta3s the native state is well discriminated by $k_{nn}$ (red nodes in Fig.\ 1
348: and top band in Fig.\ 3a).
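
In formula form, and with ${\cal N}_i$ denoting the set of neighbors of node
$i$ (a notation introduced here only for illustration), the definition of the
average neighbor connectivity reads
\begin{equation*}
k_{nn,i} = \frac{1}{k_i}\sum_{j\in{\cal N}_i} k_j ,
\end{equation*}
where $k_j$ is the number of links of node $j$. For example, a hypothetical
node with three neighbors having 2, 5, and 11 links has
$k_{nn}=(2+5+11)/3=6$.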
349:
The connectivity distribution of conformation space networks shows a well
pronounced power-law tail $P(k)\sim k^{-\gamma}$ with $\gamma=2.0$ for both beta3s
and the random heteropolymer (Fig.\ 4a) as well as another structured peptide
\cite{demarest:alphalac} and homoglycine, i.e.\ (Gly)$_{20}$ (see Supplementary
material). The power law is due to the presence of a few highly connected ``hubs''
while the majority of the nodes have a relatively small number of
links\cite{Bara:Scale}. This behavior has been previously observed for several
biological\cite{Jeong:The}, social\cite{Newman:Scicol} and technological
networks\cite{Albert:Diameter}, which are known in the literature as
scale-free networks. In terms of free energy this means that only a few
low-lying minima are present but they act as ``hubs'' with a large number of routes
to access them.
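
On a double-logarithmic scale a power law is a straight line, which is how
the exponent can be read off the tail of the distribution. Schematically (the
additive constant depends on the normalization, which is not specified in the
text),
\begin{equation*}
\log P(k) = -\gamma \log k + \mathrm{const}.
\end{equation*}
With $\gamma=2.0$, a tenfold increase in connectivity therefore corresponds
to a hundredfold decrease in $P(k)$.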
362:
363: The average clustering coefficient $C$ is a measure of the probability that any
364: two neighbors of a node are connected. Beta3s and the heteropolymer have $C$
values of $0.49$ and $0.28$, respectively. These values are one order of
magnitude larger than those of random realizations of the two networks with the same
number of nodes and links. The native basin of beta3s includes the nodes with
368: the largest number of links of the network. These nodes give rise to the $1/k$
369: tail of the clustering distribution (Fig.\ 4b), i.e., an inherently
370: hierarchical organization\cite{Bara:Hier} of the conformations in the native
371: basin of beta3s. Such organization is not observed for the non-native region
372: of beta3s and the random heteropolymer. Note that the power-law scaling of the
373: connectivity distribution can be considered as a general property of
374: free-energy landscapes of polypeptides, whereas a hierarchical organization of
375: the nodes reflects a single pronounced free-energy basin of attraction (like
376: the native state).
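
For concreteness, the clustering coefficient of node $i$ with $k_i$ neighbors
and $n_i$ links among them is $C_i=2n_i/[k_i(k_i-1)]$ (see the caption of
Fig.\ 4). A hypothetical node with $k_i=4$ and $n_i=3$ has
\begin{equation*}
C_i = \frac{2\times 3}{4\times 3} = 0.5 .
\end{equation*}
The hierarchical signature is the scaling $C(k)\sim 1/k$, i.e., doubling the
number of links of a node halves its average clustering coefficient. For
comparison, if the random realizations are assumed to be
Erd\H{o}s--R\'enyi-like graphs with $N$ nodes and average connectivity
$\left<k\right>$ (an assumption made here; the text fixes only the number of
nodes and links), the expected clustering coefficient is
$C_{\rm rand}\approx\left<k\right>/N$, independent of $k$.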
377:
378: \begin{figure}[h!]
379: \includegraphics[angle=-90,width=90mm]{eps/figure3.eps}
380: \caption{Average neighbor connectivity $k_{nn}$ plotted as a function of the
381: statistical weight for the 1287 nodes of beta3s (A) and for the 2658 nodes of
382: the random heteropolymer (B). $k_{nn}$ of node $i$ is the average number of
383: links of the neighbors of node $i$. The yellow diamonds are folding transition
384: state conformations (see also Fig.\ 1 and text) characterized by a
385: connectivity/weight ratio $k/2\tilde w>0.3$, a clustering coefficient $C<0.3$,
386: and $60<k_{nn}<80$.}
387: \end{figure}
388:
389: \begin{figure}[h] \includegraphics[angle=-90,width=90mm]{eps/figure4.eps}
390: \caption{Topological properties of conformation space networks. Red and
391: blue data points are plotted for beta3s and a random heteropolymer,
392: respectively. For a direct comparison, the connectivity $k$ is normalized by
393: the average connectivity $\left<k\right>$ of each network. Logarithmic binning
394: is applied to reduce noise. (A) The connectivity distribution $P(k)$ is the
395: probability that a node (conformation) has $k$ links (neighbor conformations).
396: The straight line corresponds to a power-law fit $y=x^{-\gamma}$ on the tail of
397: the distribution with $\gamma=2.0$. (B) The clustering coefficient $C$
398: describes the cliques of a node. For node $i$ it is defined as
399: $C_i=\frac{2n_i}{k_i(k_i-1)}$, where $k_i$ is the number of neighbors of node
400: $i$ and $n_i$ is the total number of connections between them. Values of $C$
401: are averaged over the nodes with $k$ links. The straight line corresponds to a
402: power-law fit $y=x^{-1}$ on the tail of the distribution of beta3s.}
403: \end{figure}
404:
405:
406: %%% TSE %%%
407: \vspace{0.5cm}\noindent{\bf Transition state ensemble}\hspace{0.2cm} As
mentioned above, folding is a complex process with many degrees of freedom
involved, and it is difficult (or even impossible) to define a single
410: reaction coordinate to monitor folding events
411: \cite{Chan:Protein,Karplus:Aspects}. Hence, it is very difficult to isolate
412: transition state (TS) conformations from equilibrium sampling. The TS
413: conformations are saddle points, i.e., local maxima with respect to the
414: reaction coordinate for folding and local minima with respect to all other
415: coordinates. For this reason we identified the nodes with a high
416: connectivity/weight ratio $k_i/2\tilde w_i$ and low clustering coefficient
417: value $C_i$ as putative TS conformations. The former criterion guarantees that
418: these nodes are accessed and exited, most of the time, by a different route,
419: i.e., they can be directly reached from different conformations of the network
420: space. The low clustering coefficient value guarantees that
421: the neighbors of these
422: conformations are likely to be disconnected. These two conditions are
423: necessary but not sufficient because they do not distinguish folding TS
424: conformations from saddle points between unfolded conformations.
425: %Since the nodes in the native state have the largest amount of links we
426: %speculated that folding TS conformations are linked to both nodes with a high
427: %(native state) and low (denatured state) amount of links.
428: Since the folding TS conformations are linked to both nodes in the native state
429: (having large number of links) and in the
430: denatured state (small/intermediate number of links), we speculated that
431: folding TS conformations should have values of the average neighbor
432: connectivity $k_{nn}$ within a certain range. For nodes with high
433: connectivity/weight ratio and low clustering
434: coefficient, a remarkable correlation of $0.89$ was found
435: between the average neighbor connectivity $k_{nn}$ and $P_{fold}$ (Fig.\ 5),
436: which is the probability of a given conformation to fold before
437: unfolding\cite{Du:Tcoor}. A $P_{fold}$ value close to 0.5 is expected for
438: conformations on top of the folding TS barrier\cite{Gsponer:Molecular} and the
439: correlation suggests that network properties can be used to predict folding TS
440: conformations. These are shown in Fig.\ 1 and 3a with yellow diamonds. As
discussed above, two main average folding pathways are observed. The less
frequented one is characterized by a transition state ensemble of conformations
with the first hairpin in a native form (residues 1-13) and a bend at
the second native turn (residues 14-15). The C-terminal
residues form a straight structure with almost no contacts, either native or
non-native. The second average pathway shows a transition state with the second
native hairpin formed (residues 7-20) and a bend at the
first native turn (residues 5-6). Such a symmetrical behavior is presumably due to the
simplicity and symmetry of the native conformation as well as the symmetry
in the sequence (sequence identity of 67\% between the two hairpins).
The folding TS conformations of beta3s form a heterogeneous ensemble with
453: C$_{\alpha}$ root mean square deviations within contributing structures between
454: 3 and 6 \AA. In contrast to previous molecular dynamics studies in which
455: progress variables based on fraction of native contacts were used to describe
456: TS conformations \cite{Lazaridis:New,Ferrara:Folding}, the network properties
457: yield a description of the folding TS ensemble (Fig.\ 1) which does not depend
458: on the choice of reaction coordinates. Interestingly, the folding TS
459: conformations of beta3s have about one-half of the native contacts formed but
460: this is not a sufficient criterion (Table S1 in Supplementary material).
461: Moreover, there is no correlation between the fraction of native contacts and
462: the probability of folding. As a control, $P_{fold}$ values smaller than 0.15
463: were obtained for five nodes with an average fraction of native contacts
464: similar to the folding TS conformations but low connectivity/weight ratio
465: and/or high clustering coefficient.
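
Collecting the criteria quoted in the captions of Figs.\ 1 and 3, the putative
folding TS nodes can be summarized as a set (the set notation and the node
index on $k_{nn}$ are introduced here only for compactness; the thresholds are
those of the captions):
\begin{equation*}
\mathrm{TSE} = \bigl\{\, i \;:\; k_i/2\tilde w_i > 0.3,\ \ C_i < 0.3,\ \
60 < k_{nn,i} < 80 \,\bigr\}.
\end{equation*}
The first two conditions select saddle-like nodes, whereas the window on
$k_{nn}$ selects those that are linked to both the native and the denatured
regions of the network.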
466:
467: \begin{figure}[h!]
468: \includegraphics[angle=-90,width=80mm]{eps/figure5.eps}
469: \caption{Correlation between $P_{fold}$ and average neighbor
470: connectivity $k_{nn}$. Three nodes used as a control (low connectivity/weight
471: ratio and/or high clustering coefficient but similar fraction of native
472: contacts) are shown with empty circles.}
473: \end{figure}
474:
475: %\vspace{1.0cm}
476: %\noindent{\bf Conclusions}
477: %\vspace{0.5cm}
478:
479: \section{Conclusions}
480:
481: % A foldable protein must satisfy a thermodynamic requirement (stable and
482: % unique folded state) as well as a kinetic requirement (folding within a short
483: % time)\cite{Sali:Fold}. The network topology of beta3s is consistent with
484: % both requirements. The $C(k)\sim 1/k$ behavior originates from a stable
485: % native state and a funnel-like energy
486: % profile\cite{Leopold:Protein,Dill:From}. The kinetic requirement is
487: % supported by the large $C$ values, especially in the denatured state
488: % ensemble, which reflect multiple pathways. How generic are the network
489: % properties for larger proteins? Although the current simulation protocols
490: % and force fields do not allow to reversibly fold a protein on a computer, our
491: % preliminary analysis of unfolding pathways of some two-state proteins
492: % indicates that the topological properties are the same as in beta3s.
493:
Complex network theory was used to analyze the conformation space of a
structured peptide and that of a random heteropolymer with the same residue
composition. Four main results have emerged.
497: %
First, as already observed for a variety of networks as diverse as the
499: World-Wide Web and the protein interactions in a cell, the conformation space
500: network of polypeptide chains is a scale-free network (power-law behavior of
501: the degree distribution).
502: %
Second, the native basin of the structured peptide shows a hierarchical
organization of conformations. This organization is not observed for the random
heteropolymer, which lacks a native state.
506: %
Third, free energy minima and their connectivity emerge from the network
analysis without requiring projections into arbitrarily chosen reaction
coordinates. As a consequence, the denatured state ensemble is found to be
very heterogeneous: it includes high-entropy, high-enthalpy conformations as
well as low-entropy, low-enthalpy traps.
512: %
Fourth, the network properties were used to identify transition state
conformations and two main average folding pathways. The average neighbor
connectivity $k_{nn}$ was found to correlate with $P_{fold}$, the probability
of folding. Because $P_{fold}$ is computationally very expensive to evaluate,
it will be important to generalize this result by analyzing other structured
peptides, which is work in progress in our research group. In conclusion,
network analysis appears particularly useful for studying the conformation
space and folding of structured peptides, including the otherwise elusive
transition state ensemble.
522:
523: \vspace{0.5cm}\noindent{\bf Acknowledgments}\hspace{0.2cm} We thank
524: M.\ Cecchini,
525: Prof.\ P.\ De Los Rios,
526: E.\ Guarnera,
527: Dr.\ E.\ Paci,
528: Dr.\ M.\ Seeber and
529: Dr.\ G.\ Settanni
530: for interesting discussions.
531: This work was supported by the Swiss National Science Foundation
532: and the National Competence Center for Research (NCCR) in Structural Biology.
533:
534: %\bibliographystyle{unsrtnat}
535: \bibliographystyle{elsart-num}
536: \bibliography{a-bib}
537:
538: %%%%%%% \end{document}
539:
540: \clearpage
541:
542: % -----------------------------------------------------------
543: % SUPPLEMENTARY MATERIAL
544: % -----------------------------------------------------------
545:
546: \setcounter{page}{1}
547: \renewcommand{\thepage}{S-\arabic{page}}
548:
549: \setcounter{figure}{0}
550: \renewcommand{\thefigure}{S\arabic{figure}}
551:
552: \begin{figure*}
553: {\Large\sc Supplementary Material}\\
554: \includegraphics[angle=0,width=170mm] {eps/figS1.eps}
555: \caption{Dependence of the beta3s network properties on the node-weight
556: threshold. The threshold value used in the present work ($\tilde w=20$) is
557: shown as an empty circle while filled circles correspond to threshold values
558: of, from left to right, 500, 200, 100, 50, 10, 5, 2 and 1. (A) Relation between
559: the threshold value and the number of nodes. (B) Number of links as a function
560: of the number of nodes. When the threshold is very large (i.e., small number
561: of nodes) the network approaches a topology where all possible connections are
562: present (solid line, $N_{links}=N_{nodes}(N_{nodes}-1)/2$). When the threshold
563: is small (i.e., large number of nodes) the network approaches a topology with
564: only one link per node (dashed line, $N_{links}=N_{nodes}$). (C) Average number
565: of links per node $\left< k \right>$ as a function of the number of nodes. (D)
566: Average clustering coefficient $C$ as a function of the number of nodes.}
567: \end{figure*}
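
As a rough consistency check between panels (B) and (C) of Fig.~S1, assume that
links are undirected and that each link is counted once (an assumption made
here for illustration; it is not stated in the caption). Under that assumption
the average number of links per node follows directly from the link count, and
the two limiting topologies of panel (B) translate into limiting values of
$\left< k \right>$:
\begin{align*}
\left< k \right> &= \frac{2N_{links}}{N_{nodes}}, \\
N_{links}=N_{nodes} &\;\Rightarrow\; \left< k \right> = 2, \\
N_{links}=\frac{N_{nodes}(N_{nodes}-1)}{2} &\;\Rightarrow\; \left< k \right> = N_{nodes}-1 .
\end{align*}
The first case corresponds to the dashed line (small threshold) and the second
to the solid line (large threshold).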
568:
569: %\clearpage
570:
571: \begin{figure*}
572: {\Large\sc Supplementary Material}\\
573: \includegraphics[angle=-90,width=170mm] {eps/figS2.eps}
574: \caption{Dependence of the beta3s connectivity distribution (A) and clustering
575: coefficient distribution (B) on the node-weight threshold. This plot shows
576: that the scale-free behavior and the $1/k$ tail of the clustering coefficient
577: distribution are robust with respect to the choice of threshold values.}
578: \end{figure*}
579:
580: \begin{figure*}
581: \includegraphics[angle=-90,width=170mm] {eps/figS3.eps}
\caption{Connectivity distribution (left) and clustering coefficient
distribution (right) for beta3s (filled circles), another structured peptide,
i.e., residues 101--111 of $\alpha$-lactalbumin (empty diamonds, Demarest et
al.\ (1999) {\it Biochemistry}, {\bf 38}, 7380), and a 20-residue homo-glycine
peptide, which is unstructured (filled diamonds).}
587: \end{figure*}
588:
589: \setcounter{table}{0}
590: \renewcommand{\thetable}{S\arabic{table}}
591:
592: \begingroup
593: \squeezetable
594: \begin{table*}[h]
595: {\Large\sc Supplementary Material}\\
596: \begin{ruledtabular}
597: \caption{Supplementary material. Nodes used for $P_{fold}$ evaluation.}
598: \begin{tabular}{ccccccccccc}
599: \parbox{1.0cm}{Node\\number} &
600: \parbox{1.0cm}{Probability\\of folding $P_{fold}$} &
601: \parbox{1.0cm}{Standard deviation\\$\sigma_{P_{fold}}$} &
602: \parbox{1.5cm}{Neighbor connectivity\\$k_{nn}$} &
603: \parbox{1.0cm}{Weight \\ $\tilde w$} &
604: \parbox{1.0cm}{Number of links\\$k$} &
\parbox{1.0cm}{$k/(2\tilde w)$} &
606: \parbox{1.0cm}{Clustering coefficient\\$C$} &
607: \parbox{1.0cm}{Native contacts\\$Q$} &
608: \parbox{1.0cm}{Standard deviation\\$\sigma_Q$} &
609: \parbox{1.5cm}{Secondary\\ structure string} \\ \\
610: \hline
611: 432 & 0.22 & 0.17 & 55.1 & 54 & 40 & 0.37 & 0.31 & 0.38 & 0.09 & {\tt -----SS-EEEEESSEEEE- } \\
612: 218 & 0.26 & 0.20 & 45.5 & 105 & 73 & 0.35 & 0.23 & 0.42 & 0.09 & {\tt ----SSS---EEESSEEE-- } \\
613: 313 & 0.26 & 0.16 & 55.6 & 70 & 60 & 0.43 & 0.28 & 0.46 & 0.09 & {\tt -EEE-STTEEE-SSS----- } \\
614: 446 & 0.32 & 0.25 & 56.2 & 52 & 50 & 0.48 & 0.28 & 0.40 & 0.10 & {\tt --EEESSEEE---SS----- } \\
615: 308 & 0.33 & 0.28 & 65.9 & 72 & 69 & 0.48 & 0.23 & 0.44 & 0.08 & {\tt ----SSS--EEEESSEEEE- } \\
616: 315 & 0.37 & 0.26 & 52.2 & 70 & 60 & 0.43 & 0.24 & 0.47 & 0.08 & {\tt -EEE-STTEEE--SS----- } \\
617: 306 & 0.43 & 0.27 & 57.3 & 73 & 60 & 0.41 & 0.31 & 0.42 & 0.08 & {\tt -----SS--EEE-STTEEE- } \\
618: 208 & 0.51 & 0.26 & 58.1 & 115 & 87 & 0.38 & 0.23 & 0.43 & 0.08 & {\tt -----SS--EEEESSEEEE- } \\
619: 589 & 0.53 & 0.31 & 60.1 & 40 & 52 & 0.65 & 0.17 & 0.45 & 0.10 & {\tt ---SSSTT-EEEESSEEEE- } \\
620: 580 & 0.56 & 0.34 & 65.0 & 40 & 47 & 0.59 & 0.26 & 0.48 & 0.08 & {\tt -EEEESSEEEE--SSS---- } \\
621: 197 & 0.57 & 0.39 & 80.5 & 121 & 105 & 0.43 & 0.28 & 0.52 & 0.09 & {\tt -----STT-EEEESSEEEE- } \\
622: 540 & 0.60 & 0.16 & 70.3 & 44 & 49 & 0.56 & 0.28 & 0.46 & 0.07 & {\tt -----GGG-EEE-STTEEE- } \\
623: 285 & 0.62 & 0.30 & 75.7 & 80 & 68 & 0.42 & 0.30 & 0.47 & 0.07 & {\tt -----GGG-EEEESSEEEE- } \\
624: 630 & 0.65 & 0.22 & 71.3 & 38 & 56 & 0.74 & 0.29 & 0.44 & 0.11 & {\tt -----STT--EE-STTEE-- } \\
625: 426 & 0.76 & 0.20 & 97.7 & 55 & 76 & 0.69 & 0.43 & 0.55 & 0.12 & {\tt ---B-TTTB-EEESSEEE-- } \\
626: 280 & 0.88 & 0.18 & 98.2 & 82 & 81 & 0.49 & 0.43 & 0.56 & 0.09 & {\tt ---EESSEE-EE-STTEE-- } \\
627: Control simulations \\
628: 174 & 0.09 & 0.10 & 60.0 & 139 & 51 & 0.18 & 0.61 & 0.37 & 0.07 & {\tt --EE-STTEEEESTTEEEE- } \\
629: 179 & 0.09 & 0.10 & 25.0 & 135 & 19 & 0.07 & 0.33 & 0.44 & 0.07 & {\tt --EESSEE--EEESSEEE-- } \\
630: 15 & 0.10 & 0.13 & 35.8 & 3243 & 73 & 0.01 & 0.28 & 0.47 & 0.08 & {\tt --EEESSSEEEEESSEEEE- } \\
631: 200 & 0.10 & 0.27 & 62.9 & 119 & 51 & 0.21 & 0.31 & 0.31 & 0.07 & {\tt ----BSSB--EEESSEEEE- } \\
632: 475 & 0.15 & 0.17 & 61.7 & 48 & 34 & 0.35 & 0.68 & 0.43 & 0.06 & {\tt -EEE-STTEEETTTT-EEE- } \\
633: \end{tabular}
634: \end{ruledtabular}
635: \end{table*}
636: \endgroup
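
As a worked check of the connectivity/weight column of Table~S1, the tabulated
ratio is taken here to be the number of links $k$ divided by twice the weight
$\tilde w$. This reading is an assumption, but it reproduces the listed values
to within rounding. For node 432 and for control node 15 it gives
\begin{align*}
\left.\frac{k}{2\tilde w}\right|_{432} &= \frac{40}{2\times 54} \approx 0.37, \\
\left.\frac{k}{2\tilde w}\right|_{15} &= \frac{73}{2\times 3243} \approx 0.011 ,
\end{align*}
the latter being listed as 0.01. The heavily populated control node 15
therefore has a ratio roughly thirty times smaller than node 432, which
illustrates the low connectivity/weight ratio used to select control nodes.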
637:
638: \end{document}
639: