1: %\documentclass[aps,prl,twocolumn,showpacs,floatfix]{revtex4}
2: \documentclass[aps,prl,twocolumn,floatfix]{revtex4}
3: \usepackage{graphicx,graphics,epsf,psfig,epsfig}% Include figure files
4: \usepackage{subfigure}
5:
6: \def\beq{\begin{equation}}
7: \def\eeq{\end{equation}}
8: \def\beqa{\begin{eqnarray}}
9: \def\eeqa{\end{eqnarray}}
10: %\def\l{\left}
11: %\def\r{\right}
12: %\def\bdi{\begin{displaymath}}
13: %\def\edi{\end{displaymath}}
14: %\def\ds{\displaystyle}
15:
16: \begin{document}
17: %\documentclass[12pt]{article}
18: %\documentstyle[twocolumn,aps,prl,epsf]{revtex}
19: %\documentstyle[preprint,aps,prl,epsf]{revtex} % makes power law references
20: %\topmargin 0in
21: %\begin{document}
22:
23: \title{Lattice tube model of proteins}
24:
25: \author{Jayanth R. Banavar$^1$, Marek Cieplak$^2$,
26: and Amos Maritan$^3$}
27:
28: \address{
29: $^1$104 Davey Laboratory, Department of Physics,
30: The Pennsylvania State University,
31: University Park, Pennsylvania 16802\\
32: $^2$Institute of Physics, Polish Academy of Sciences,
33: Al. Lotnik{\'o}w 32/46, 02-668 Warsaw, Poland \\
34: $^3$Universit{\'a} degli Studi di Padova, Dipartimento di Fisica
35: and INFN, via
36: Mazzolo 8, 35100, Padova, Italy }
37:
38: %\maketitle
39:
40: %\vskip 40pt
41: %\noindent
42: %$^*$Correspondence to: \\
43: %Marek Cieplak,\\
44: %Institute of Physics, \\
45: %Polish Academy of Sciences, \\
46: %Al. Lotnik\'ow 32-46 \\
47: %02-668 Warsaw, Poland\\
48: %Tel: 48-22-843-7001\\
49: %Fax: 48-22-843-0926\\
50: %E-mail: mc@ifpan.edu.pl
51:
52:
53: %\vskip 40pt
54: %\noindent {\bf
55: %Keywords:
56:
57: \begin{abstract}
58:
59: {\small
60: %We present a new lattice model for proteins and demonstrate that
61: %it captures many aspects of real protein behavior.
62: We present a new lattice model for proteins that incorporates
63: a tube-like anisotropy by introducing a preference for mutually parallel
alignments in the conformations. We show that the model
captures many aspects of real proteins.
66: }
67:
68: \end{abstract}
69:
70: \maketitle
71:
72: %\vskip 40pt
73: \vspace*{-0.8cm}
74: \hspace*{1.5cm} PACS Numbers: 87.15.He, 87.15.Cc, 87.15.Aa
75:
76: \vspace*{0.5cm}
77:
78: %\newpage
79:
80:
81: There have been several physics-based attempts to distil the
82: essential features of the protein problem and notable success in
83: capturing many of the key ingredients has been achieved using
84: lattice models \cite{Dillreview}. Such coarse-grained
85: descriptions allow a virtually exact analysis of many properties
86: and provide a useful framework for understanding experimental
87: results. Indeed, valuable progress has been made within the
88: simplified description of a lattice model with just two types of
89: amino acids denoted by $H$ and $P$ representing hydrophobic and
90: polar behaviors. The principal theme of this letter is to present a
91: new lattice model of proteins, which takes into account a
92: previously overlooked key attribute of chain molecules -- the
context of amino acids within a chain. We benchmark the behavior
of this model against the well-studied HP lattice model and show that
the new model faithfully captures several attributes of real
proteins.
97:
98: %Globular proteins, that act as enzymes, play a key role in the
99: %network of life. As stated by Kornberg \cite{Kornberg}, ``What
100: %gives the cell its life and personality are enzymes. They govern
101: %all body processes; malfunction of even one enzyme can be fatal.
102: %Nothing in nature is so tangible and vital to our lives as
103: %proteins, and yet so poorly understood and appreciated by all but
104: %a few scientists." Proteins are hard to understand for several
105: %reasons: first, they are chain molecules made up of twenty
106: %different types of amino acids each with their own distinctive
107: %features; second, an important role is played by the surrounding
108: %water molecules -- proteins fold in order to create a hydrophobic
109: %core within which the hydrophobic amino acid side chains can be
110: %sequestered; third, proteins are neither long enough that simple
111: %theoretical approximations can be made to study them nor are they
112: %so short as to be amenable to an exact treatment -- they are
113: %finite size objects for which one might expect non universal
114: %behavior with some of the details being important; and finally, nature
115: %tinkers with sequences of amino acids through evolution, building
116: %on what she already has, and so the proteins that we see today are
117: %not the result of a targeted design process but rather are
118: %strongly history dependent.
119:
120: %In spite of these daunting difficulties, there
121: There are clear hints,
122: manifested by the many common characteristics of proteins
123: \cite{Banavar}, that proteins may be simpler than one might
124: expect. Protein structures are constructed in a modular manner
125: from common building blocks -- helices, hairpins and sheets
126: connected together by tight turns. Further, the total number of
127: distinct protein folds seems to be of the order of just a few
128: thousand \cite{Chothia}.
129:
130: The simplest model of an unconstrained object is a hard sphere. A
131: collection of hard spheres exhibits both fluid and crystalline
132: phases on changing the volume fraction. When objects are tethered
133: together in the form of a chain, it is no longer appropriate to
134: consider them as spheres. There is a special direction that one
135: may associate with each object which is tangent to the chain and
136: is defined by the adjoining particles along the chain. It is
137: therefore more appropriate to model the objects making up a chain
138: by means of discs or coins, for which the heads-to-tails direction
139: is distinct from the two other directions. This picture of
140: tethered coins leads to a tube-like description of a chain
141: molecule \cite{Banavar}. Just as symmetry plays a key role in
142: determining the nature of ordering of unconstrained particles (the
143: phases associated with a collection of spheres are vastly simpler
144: than the liquid crystal phases of anisotropic objects), the
anisotropy inherent in a tube leads to new behavior. Recent work
\cite{Banavar} has shown that the tube picture provides a unified
way to understand both the conventional polymer phases and the novel
phase of matter used by Nature to house protein native state
structures, and that it offers a framework for understanding the
common character of proteins.
151:
There are three key features of a tube description that one ought
to incorporate in a lattice model: (i) self-intersections of a tube
are not allowed; (ii) the local radius of curvature of a tube can be no
smaller than the tube radius;
%(there would otherwise be a sharp corner in the tube)
and (iii) in a compact state, there is a tendency
for nearby tube segments to be parallel (indeed both helices and
sheets have tube segments alongside and parallel to each other,
leading to a cooperative placement of hydrogen bonds \cite{Liwo}).
The first two features are built into a model of a self-avoiding
chain on a lattice. Our focus here is on considering the effects
of introducing the third.
164:
165: In order to illustrate the key idea, we will consider a 16 amino
166: acid (aa) self-avoiding chain on a square lattice. There have been
167: numerous previous studies \cite{Dillreview} of this system within
168: the standard HP model context and its generalizations \cite{Rios}.
In the standard $HP$ model, one
ascribes a favorable energy $-1$ to an $HH$ contact (two H aa
which are {\em not} next to each other in sequence but sit next to
each other on the lattice) and zero energy otherwise. Here, in
addition, we pay attention to the context in which the contact
occurs. Figure 1 illustrates three distinct types of contacts (denoted
by an index $m$) depending on the degree to which the segments
containing the aa in contact are parallel to each other. The
energy assigned to an $HH$ contact of type $m$ in the Tube HP (THP)
model is denoted by $e_m$. In what follows, we choose $e_m$ to
be $-3$, $-2$, and $-1$ for $m=3$, 2, and 1, respectively, thereby
favoring the parallelism of nearby segments. In the standard HP
model, $e_m= -1$ independent of $m$.
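
The contact bookkeeping described above can be sketched in a few lines
of Python. The sketch assumes that the index $m$ of a contact between
aa $i$ and $j$ is the number of nearest-neighbor lattice pairs, not
bonded along the chain, between the triples $(i-1,i,i+1)$ and
$(j-1,j,j+1)$. This reading of $m$, the function names
\texttt{contact\_index} and \texttt{energy}, and the restriction to
$HH$ contacts only are illustrative assumptions and not a transcription
of Figure 1.

\begin{verbatim}
```python
from itertools import product

# Contact energies e_m quoted in the text (THP) and the HP choice.
E_THP = {1: -1, 2: -2, 3: -3}
E_HP = {1: -1, 2: -1, 3: -1}


def adjacent(p, q):
    """True if two sites are nearest neighbours on the square lattice."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1]) == 1


def contact_index(coords, i, j):
    """Assumed definition of m: number of non-bonded nearest-neighbour
    pairs between the triples (i-1, i, i+1) and (j-1, j, j+1),
    truncated at the chain ends."""
    n = len(coords)
    tri_i = [a for a in (i - 1, i, i + 1) if 0 <= a < n]
    tri_j = [b for b in (j - 1, j, j + 1) if 0 <= b < n]
    return sum(1 for a, b in product(tri_i, tri_j)
               if abs(a - b) > 1 and adjacent(coords[a], coords[b]))


def energy(coords, seq, e_m):
    """Sum e_m over all HH contacts of a conformation."""
    total = 0
    n = len(coords)
    for i in range(n):
        for j in range(i + 3, n):  # shortest possible contact is i, i+3
            if (seq[i] == "H" and seq[j] == "H"
                    and adjacent(coords[i], coords[j])):
                total += e_m[contact_index(coords, i, j)]
    return total
```
\end{verbatim}

Under these assumptions, an eight-site hairpin with coordinates
$(0,0),(1,0),(2,0),(3,0),(3,1),(2,1),(1,1),(0,1)$ and an all-$H$
sequence has three contacts with $m=2,3,2$, giving an energy of $-7$
with the THP energies and $-3$ with the HP energies.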
182:
183: In order to understand the role of sequence heterogeneity, it is
184: useful to consider a generalized model in which the energies are
185: described by
186: \begin{equation}
187: E \;=\; \sum_{i<j} \; e_m \; \Delta (i-j,m) \; [ \delta_{i,H} \delta_{j,H}
188: \;+\;(1 - \epsilon ) \; D_{ij}] \;\;,
189: \end{equation}
190: where
191: \begin{equation}
192: D_{ij}\;=\;(\delta_{i,H} \delta_{j,P} + \delta_{i,P} \delta_{j,H}
193: +\delta_{i,P} \delta_{j,P} ) \;= \; 1\;-\;\delta_{i,H}\delta_{j,H}\;\;.
194: \end{equation}
Here, $\Delta (i-j,m)$ is equal to 1 if the amino acids $i$ and
$j$ form a contact of type $m$ and 0 otherwise. When such a contact exists,
the energy of attraction associated with it depends on the index
$m$. $\delta _{i,H}$ is defined to be equal to 1 if amino acid $i$
is hydrophobic and 0 if it is polar. Similarly, $\delta_{i,P} = 1
- \delta_{i,H}$ is equal to 1 if amino acid $i$ is polar.
Depending on the choice of the $e_m$ parameters, one obtains the
HP or THP models. The limiting cases correspond to $\epsilon =1$,
i.e., the ``standard'' THP or HP models, and $\epsilon =0$, the
case of a homopolymer made of $H$ amino acids.
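
As a short worked check of the two limits, using only the identity
$D_{ij}=1-\delta_{i,H}\delta_{j,H}$ stated above, the bracket in the
energy function can be rewritten as
\[
\delta_{i,H}\delta_{j,H}+(1-\epsilon)\,D_{ij}
\;=\;(1-\epsilon)\;+\;\epsilon\,\delta_{i,H}\delta_{j,H}\;\;,
\]
so that
\[
E\;=\;\sum_{i<j} e_m\,\Delta(i-j,m)\,
\left[(1-\epsilon)+\epsilon\,\delta_{i,H}\delta_{j,H}\right]\;\;.
\]
For $\epsilon=1$ only $HH$ contacts contribute, each with energy $e_m$.
For $\epsilon=0$ every contact contributes $e_m$ irrespective of the
sequence, which is the homopolymer limit. For intermediate $\epsilon$,
an $HH$ contact carries the full weight 1 while any contact involving a
$P$ carries the reduced weight $1-\epsilon$.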
205:
206: For the 16-aa chain, all sequences and all possible conformations
207: can be enumerated exactly. There are interesting differences in
208: the energy landscape of the HP and the THP models. One may
209: determine the sequences which have a unique ground state and the
210: number of distinct designable conformations, which house these
211: sequences, as a function of $\epsilon$ (see inset of Figure 2).
For a homopolymer ($\epsilon=0$), the HP model has no designable
structure -- all compact conformations are degenerate and have the
same energy. Thus, in the absence of sequence specificity,
there is no pre-selection of protein-like structures among
compact conformations and no protein-like behavior.
When a weak heterogeneity is introduced
by turning on a small $\epsilon$, the HP energy landscape becomes
rugged and each of the 69 maximally compact conformations becomes
designable but with a weak thermodynamic stability. Thus the
funnel-like energy landscape \cite{funnel} arises only on turning
on the full degree of sequence heterogeneity.
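
The exact enumeration of conformations mentioned above can be
illustrated with a minimal recursive sketch. It lists self-avoiding
walks on the square lattice and keeps one representative per rotation
and reflection class. The function name
\texttt{enumerate\_conformations} and the particular symmetry
convention (first step along $+x$, first turn toward $+y$) are
assumptions made for this illustration, and nothing here is tuned for
speed.

\begin{verbatim}
```python
def enumerate_conformations(n_beads):
    """Return all self-avoiding walks with n_beads sites on the square
    lattice, one representative per rotation/reflection class.

    Symmetry is removed by fixing the first step along +x and
    requiring the first step off the x axis to point along +y.
    """
    if n_beads < 2:
        raise ValueError("need at least two beads")
    results = []

    def grow(path, occupied, turned):
        if len(path) == n_beads:
            results.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in ((1, 0), (0, 1), (-1, 0), (0, -1)):
            if not turned and dy == -1:
                continue  # mirror image of a walk that turns to +y
            site = (x + dx, y + dy)
            if site in occupied:
                continue  # self-avoidance
            path.append(site)
            occupied.add(site)
            grow(path, occupied, turned or dy != 0)
            occupied.remove(site)
            path.pop()

    grow([(0, 0), (1, 0)], {(0, 0), (1, 0)}, False)
    return results
```
\end{verbatim}

For short chains the output can be checked by hand: a chain of 5 sites
has 13 symmetry-distinct conformations and a chain of 6 sites has 36.
Pairing each conformation with an energy function and scanning all
$H$/$P$ sequences then identifies the sequences whose lowest energy is
attained by exactly one conformation.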
224:
225: %FIGURE 1
226: \begin{figure}
227: %\epsfxsize=3.6in \centerline{\epsffile{defconf.eps}}
228: \epsfxsize=3.6in \centerline{\epsffile{fig1.eps}}
229: \vspace*{-3.5cm}
230: \caption{ {\small
231: %Fig. 1.
Panel a: Sketch of three contact environments in the THP model. The dashed
line denotes a contact. Panel b: The optimal
structure for the THP model. The circled sites
represent the hydrophobic core and are occupied by H aa
more than 87\% of the time for the sequences that fold into this
conformation when $\epsilon = 1$.
238: }}
239: \end{figure}
240:
241: This is in sharp contrast to the behavior of the THP model --
242: here, even for a homopolymer, one obtains a unique ground state,
243: akin to either a helix or a sheet in two dimensions
244: (see Figure 1), selected not by considerations of the chemistry of
245: the aa but rather by the overarching principles of geometry and
246: symmetry. Interestingly, in the limit of small $\epsilon$, all
247: $2^{16}$=65536 sequences have a unique ground state in the THP
248: model and none in the HP model.
When $\epsilon =1$, one obtains 10579 and 1539
designable sequences in the THP and HP models, respectively (see
Figure 2), folding into 684 and 456 distinct structures.
Furthermore, the numbers of sequences folding into the most
designable structure are 637 and 26 for the two models, respectively.
254:
255: %FIGURE 2
256: \begin{figure}
257: %\epsfxsize=4in \centerline{\epsffile{comdes.eps}}
258: \epsfxsize=3.7in \centerline{\epsffile{fig2.eps}}
259: \caption{ {\small
260: %Fig. 2.
261: Rank ordered values of the number of sequences that fold into the
262: given structure for all of the designable structures at $\epsilon
263: =1$. The inset shows the number of designable structures as a
264: function of $\epsilon$ (see text).
265: }}
266: \end{figure}
267:
268: The thermodynamic stability of a sequence is characterized by the
269: folding transition temperature, $T_f$, at which the equilibrium
270: probability of being in the native state conformation is equal to
271: $\frac{1}{2}$. The spread in the values of $T_f$ is nearly three
272: times larger in the THP model than in the HP case.
273: The most stable THP sequence folds into the structure
274: shown in Figure 1b, whereas the most stable HP sequence
folds to a structure which is not maximally compact. In order to
describe the folding kinetics, we take a sequence at a temperature
equal to its $T_f$ value, generate 10 batches of 101
trajectories, and determine the first passage time to the native
state starting from an unfolded conformation. The time evolution
\cite{monte} is a Monte Carlo process which satisfies detailed
balance. The kinetic moves consist of single-bead moves (kink
flips and rotations of the terminal segments) with probability 0.2
and of two-bead ``crankshaft'' moves with probability 0.8. A
median folding time is determined for each batch and averaged over
all batches to yield a measure of the folding time, $t_{\rm fold}$.
286: Our calculations were carried out for 12 sequences in each model
287: (the top 10 sequences in $T_f$ values and the sequences ranked 20
288: and 30). In all cases, the THP model exhibits more rapid
289: two-state folding
290: than the HP model with the ratio of the folding times for the 12
291: sequences ranging between 0.10 and 0.47.
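
The definition of $T_f$ used above lends itself to a short numerical
sketch. Given the full spectrum of a sequence as a list of (energy,
degeneracy) pairs, which exact enumeration can supply, the native state
probability is a ratio of Boltzmann weights and $T_f$ follows from
bisection. The input format, the function names, and the choice of
units with $k_B=1$ are assumptions made for this illustration.

\begin{verbatim}
```python
import math


def native_probability(levels, temperature):
    """Boltzmann probability of the lowest level (k_B = 1).

    levels: list of (energy, degeneracy) pairs. The lowest energy is
    taken to be the native state and must be non-degenerate.
    """
    e0, g0 = min(levels)
    if g0 != 1:
        raise ValueError("native state must be unique")
    z = sum(g * math.exp(-(e - e0) / temperature) for e, g in levels)
    return 1.0 / z


def folding_temperature(levels, t_lo=1e-3, t_hi=100.0, tol=1e-10):
    """Bisection for the temperature where native_probability = 1/2."""
    if native_probability(levels, t_hi) > 0.5:
        raise ValueError("t_hi is too small to bracket T_f")
    while t_hi - t_lo > tol:
        mid = 0.5 * (t_lo + t_hi)
        if native_probability(levels, mid) > 0.5:
            t_lo = mid  # still mostly folded, T_f lies above
        else:
            t_hi = mid
    return 0.5 * (t_lo + t_hi)
```
\end{verbatim}

As a sanity check, a toy spectrum with a unique ground state at energy
$-1$ and ten states at energy $0$ gives $T_f = 1/\ln 10$, since the
condition $10\,e^{-1/T_f}=1$ can be solved by hand.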
292: %more digits: 0.1058 and 0.4747.
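
The batch statistics behind $t_{fold}$ can be written down compactly.
In the sketch below, \texttt{first\_passage\_time} stands for a
user-supplied routine that runs one Monte Carlo trajectory and returns
the number of steps needed to reach the native state; that routine, the
function names, and the string labels for the two move classes are
hypothetical placeholders, and the lattice moves themselves are not
implemented here.

\begin{verbatim}
```python
import random
import statistics


def folding_time(first_passage_time, n_batches=10, batch_size=101,
                 rng=None):
    """Average of the per-batch median first passage times.

    first_passage_time(rng) is a hypothetical callable that simulates
    one trajectory and returns its first passage time.
    """
    rng = rng or random.Random(0)
    medians = []
    for _ in range(n_batches):
        times = [first_passage_time(rng) for _ in range(batch_size)]
        medians.append(statistics.median(times))
    return statistics.mean(medians)


def choose_move_type(rng, p_single=0.2):
    """Pick a move class with the probabilities quoted in the text:
    single-bead moves with 0.2, crankshaft moves otherwise."""
    return "single" if rng.random() < p_single else "crankshaft"
```
\end{verbatim}

With an odd batch size such as 101, the median of each batch is an
actually observed first passage time, so a few trajectories that get
stuck for a long time do not dominate the estimate.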
293:
294: The framework of evolution in life works through both the DNA
295: molecule and the functionally useful protein molecule. Mutations
296: of the DNA molecule lead to the possibility of new proteins, whose
297: selection, in turn, leads to an enhancement of the number of such
DNA molecules. As pointed out by Maynard Smith \cite{Maynard}, as
299: the sequence undergoes mutation, there must be a continuous
300: network that the mutated sequences can traverse without passing
301: through any intermediaries that are non-functioning. Thus, one
302: seeks a connected network in sequence space for evolution by
303: natural selection to occur. There is considerable evidence that
304: much of evolution is neutral \cite{Kimura}.
305:
306: We have investigated the topology of connections \cite{Chan} between
307: the designable structures resulting from point mutations in the sequence
308: (the change of one aa from H to P or vice versa).
Indeed, while one has a ``random walk'' in sequence space
that forms a connected network,
there is no similar continuous variation in structure space.
When $\epsilon = 1$, 605 (39.3\%)
%60.7 \% or 934 comprise
of the HP sequences do not belong to
the connected network envisioned by Maynard Smith, in sharp
contrast to the THP model, for which only 13 of the 10579 sequences,
i.e., 0.12\%, do not belong to the network.
318: The THP model is vastly better connected than the HP model,
319: as illustrated in Figure 3. The former exhibits
320: %vastly better connected than the HP model -- the former exhibits
321: approximate scale-free behavior \cite{Reka} while the latter is
322: more akin to a random network with low mean coordination number
323: (Figure 4).
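
The connectivity analysis can be illustrated at the level of sequences
with a small graph search. The sketch below encodes each designable
sequence as an integer whose bits mark $H$ (1) or $P$ (0), links
sequences that differ by one point mutation, and returns the connected
clusters. Identifying the network with the largest cluster, and the
function names, are assumptions made for this illustration. Projecting
the result onto structures would additionally require the map from
each sequence to its native conformation.

\begin{verbatim}
```python
def mutation_components(sequences, length):
    """Group sequences into clusters connected by single H<->P
    substitutions.

    sequences: iterable of ints whose bits encode H (1) or P (0).
    Returns a list of sets, largest first.
    """
    remaining = set(sequences)
    components = []
    while remaining:
        seed = remaining.pop()
        stack, comp = [seed], {seed}
        while stack:
            s = stack.pop()
            for bit in range(length):
                t = s ^ (1 << bit)  # flip one residue
                if t in remaining:
                    remaining.remove(t)
                    comp.add(t)
                    stack.append(t)
        components.append(comp)
    components.sort(key=len, reverse=True)
    return components


def degree(sequence, sequences, length):
    """Number of designable one-mutation neighbours of a sequence."""
    pool = set(sequences)
    return sum((sequence ^ (1 << bit)) in pool for bit in range(length))
```
\end{verbatim}

Sequences outside the largest cluster are the ones that cannot be
reached by a chain of point mutations through designable
intermediates, and a histogram of \texttt{degree} over all sequences
gives a sequence-level analogue of a coordination number distribution.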
324:
325: %FIGURE 3
326: \begin{figure}
327: %\epsfxsize=4in \centerline{\epsffile{networ.eps}}
328: \epsfxsize=3.7in \centerline{\epsffile{ffig3.eps}}
329: \vspace*{-1cm}
330: \caption{ {\small
331: Network topologies (using Pajek)
332: of designable structures resulting from
333: point mutations in the sequence. The top and bottom panels are for the
334: THP and HP models respectively.
335: }}
336: \end{figure}
337:
338: %FIGURE 4
339: \begin{figure}
340: %\epsfxsize=4in \centerline{\epsffile{nlas.eps}}
341: \epsfxsize=3.9in \centerline{\epsffile{fig4.eps}}
342: \caption{ {\small
343: Probability distribution, $P(z)$, of the effective coordination
344: number for the network of designable structures shown in Fig. 3.
345: % resulting from point
346: %mutations in the sequence.
The inset is a plot of the same data on
a log-log scale (the top panel) for the THP model and on a
log-linear scale for the HP model.
The results illustrate the approximate validity of
$P(z) \sim z^{-\gamma}$ and $P(z)\sim \exp(-z/\xi)$
for the THP and HP models, respectively.
353: }}
354: \end{figure}
355:
In summary, we find that the tube lattice model captures many of
the key characteristics of protein behavior better than
conventional lattice models. The key advantage of
359: studying a tube on a lattice compared to a more realistic
360: continuum analysis \cite{Banavar} is that one can often carry out
361: an exact analysis for short chains and obtain insights on real
362: protein behavior. As an illustration, we conclude with a simple
363: analysis of a few hundred proteins \cite{Dima} to determine the
364: propensity of amino acid pairs in contact \cite{Tsai} to be in
365: specific environments characterized by the $m$-index introduced
above. Specifically, we look at the type of contact between aa $k$
and aa $l$ along the sequence and categorize it according to
the specific aa pair involved in the contact, their
sequence separation $s = |k-l|$ (equal to 2, 3, 4, or
greater than 4), and the number of contacts $m$ between the two
groups of aa ($k-1,k,k+1$) and ($l-1,l,l+1$), which can range
between 1 and 9. (The geometry of the lattice model in two
dimensions allows for only three values, $1$, $2$, or $3$, of the contact
environment index $m$.) We have determined
375: \begin{equation}
376: \chi _2(k,l,s,m)\;=\;
377: \frac{[n(k,l,s,m)\;-\;p(k,l,s,m)]^2}{p(k,l,s,m)} \;\;.
378: \end{equation}
379: Here $n$ is the number of contacts and $p$ the expected number of
380: contacts based on chance: $p(k,l,s,m)= a q(k,l,s)$, where $q$ is
381: the number of the specific aa pairs at distance $s$ and
382: $a=\sum_{kl} n(k,l,s,m) / \sum_{kl} q(k,l,s)$.
A large value of $\chi_2$ corresponds to
a strong signal that aa $k$ and aa $l$ prefer to make or avoid a
contact in the environment defined by the $s$ and $m$ indices
(Table I). Such signals would be useful in the development of scoring
functions for protein structure recognition \cite{Dima}.
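
The score defined above is simple to evaluate once the counts are
tabulated. The sketch below works at fixed $s$ and $m$ and takes two
dictionaries keyed by aa pair: the observed contact counts $n$ and the
pair counts $q$. The dictionary layout, the function name
\texttt{chi2\_scores}, and the toy numbers in the note that follows are
illustrative assumptions and are not taken from the protein data set.

\begin{verbatim}
```python
def chi2_scores(n, q):
    """Score (n - p)^2 / p for each aa pair at fixed s and m.

    n: dict mapping an aa pair to its observed number of contacts.
    q: dict mapping an aa pair to the number of such pairs at
       separation s.
    Returns (scores, expected), both dicts keyed by aa pair.
    """
    a = sum(n.values()) / sum(q.values())  # overall contact fraction
    expected = {pair: a * q[pair] for pair in q}
    scores = {}
    for pair, p in expected.items():
        if p > 0:
            scores[pair] = (n.get(pair, 0) - p) ** 2 / p
    return scores, expected
```
\end{verbatim}

For two pair types with $q=100$ each and observed counts $n=30$ and
$n=10$, one finds $a=0.2$ and $p=20$ for both, so both scores equal
$5$. The sign of $n-p$ then distinguishes attraction (the first pair)
from aversion (the second).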
388:
389: The tube idea reveals a deep underlying simplicity in the protein problem.
In standard approaches, the sequence of amino acids is believed to play
a key role in sculpting the free energy landscape and determining
the native state structure. Here, instead, the free energy landscape
393: is sculpted predominantly by symmetry and geometry and the sequence
394: plays a vital role in the {\em selection}
395: of the native state from a predetermined
396: menu. Unlike sequences and functionalities, which are shaped by
397: the powerful forces of evolution, the menu of putative native state
398: structures is immutable and is determined by physical law. Indeed,
399: this fixed backdrop provides the initial basis for selection in molecular
evolution. DNA molecules that encode proteins able to fold readily into
one of the predetermined folds pass the initial screening.
402: An additional level of filtering completes the selection process of
403: proteins that are not only good folders but are also able to interact well
404: with ligands and other proteins and play a useful functional role.
405:
406: We are grateful to Istvan Albert for helpful advice. This work was
407: supported by KBN (grant 2 P03B 032 25), COFIN MURST 2003, INFM,
408: NASA, NSF IGERT grant DGE-9987589 and the NSF MRSEC at Penn State.
409:
410:
411:
412: %\begin{references}
413: \begin{thebibliography}{99}
414:
415:
416:
417: \bibitem{Dillreview} K. A. Dill, S. Bromberg, K. Z. Yue, K. M. Fiebig,
418: D. P. Yee, P. D. Thomas, H. S. Chan,
419: %Principles of protein folding - a perspective from simple exact models
420: Protein Science {\bf 4}, 561-602 (1995).
421:
422: %\bibitem{Kornberg} A. Kornberg, {\em For the love of enzymes}
423: %(Harvard University Press, Cambridge, 1989).
424:
425: \bibitem{Banavar}
426: J. R. Banavar and A. Maritan, Rev. Mod. Phys. {\bf 75}, 23-34
427: (2003); T. X. Hoang, A. Trovato, F. Seno, J. R. Banavar, and A.
428: Maritan, Proc. Natl. Acad. Sci. USA {\bf 101}, 7960-7964 (2004);
429: J. R. Banavar, T.
430: X. Hoang, A. Maritan, F. Seno, A. Trovato, Phys. Rev. E
431: (submitted).
432:
433: \bibitem{Chothia}
434: C. Chothia and A. V. Finkelstein, Annu. Rev. Biochem. {\bf 59},
435: 1007-1039 (1990); C. Chothia, Nature {\bf 357}, 543-544 (1992).
436:
437: \bibitem{Liwo}
438: A. Liwo, R. Kazmierkiewicz, C. Czaplewski, M. Groth, S. Oldziej,
439: R. J. Rackovsky, M. R. Pincus, and H. A. Scheraga, J. Comput. Chem.
440: {\bf 19}, 259-276 (1998);
441: B. Fain and M. Levitt, Proc. Natl. Acad. Sci. USA {\bf 100},
442: 10700-10705 (2003).
443:
444: \bibitem{Rios}
445: See, e.g.,
446: D. K. Klimov and D. Thirumalai, Folding and Design {\bf 3}, 127-139 (1998);
447: P. De Los Rios and G. Caldarelli, Phys. Rev. E {\bf 62}, 8449-8452
448: (2000).
449:
450: \bibitem{funnel} P. G. Wolynes, J. N. Onuchic, and D. Thirumalai,
451: Science {\bf 267}, 1619-1620 (1995); K. A. Dill and H. S. Chan, Nature
452: Struct. Biol. {\bf 4}, 10-19 (1997).
453:
454: \bibitem{monte}
455: M. Cieplak, M. Henkel, J. Karbowski, and J. R. Banavar,
456: %Master equation approach to protein folding and kinetic traps,
457: Phys. Rev. Lett. {\bf 80}, 3654-3657 (1998).
458:
459: \bibitem{Maynard}
J. Maynard Smith, {\it Evolutionary Genetics}, 2nd edition,
Oxford University Press, New York (1998).
462:
463: \bibitem{Kimura}
M. Kimura, {\it The Neutral Theory of Molecular Evolution},
Cambridge University Press, New York, Reprint Edition (1985).
466:
467: \bibitem{Chan}
468: E. Bornberg-Bauer and H. S. Chan,
469: %Modeling evolutionary landscapes: Mutational stability, topology,
470: %and superfunels in sequence space,
Proc. Natl. Acad. Sci. USA {\bf 96}, 10689-10694 (1999);
Y. Cui, W. H. Wong, E. Bornberg-Bauer, and H. S. Chan,
%Recombinatoric exploration of novel folded structures: A
%heteropolymer-based model of protein evolutionary landscapes,
Proc. Natl. Acad. Sci. USA {\bf 99}, 809-814 (2002);
H. S. Chan and E. Bornberg-Bauer,
%Perspective on protein evolution from simple exact models,
Appl. Bioinformatics {\bf 1}, 121-144 (2002).
479:
480: \bibitem{Reka}
481: R. Albert and A.-L. Barabasi, Rev. Mod. Phys. {\bf 74}, 47-98
482: (2002).
483:
484: \bibitem{Dima}
485: I. Chang, M. Cieplak, R. I. Dima, A. Maritan, and J. R. Banavar,
486: %Protein threading by learning,
Proc. Natl. Acad. Sci. USA {\bf 98}, 14351-14355 (2001).
488:
489: \bibitem{Tsai}
490: J. Tsai, R. Taylor, C. Chothia, and M. Gerstein, J. Mol. Biol.
491: {\bf 290}, 253-266 (1999); M. Cieplak and T. X. Hoang,
492: %Universality classes in folding times of proteins,
493: Biophys. J. {\bf 84}, 475-488 (2003).
494:
495:
496: \end{thebibliography}
497:
498:
499:
500: \newpage
501:
502: %\onecolumn
503:
504:
505: %\vspace*{2cm}
506:
507: \centerline{TABLE CAPTION}
508:
509: {\bf Table I}. {\small
The list of aa pairs with $\chi _2 (k,l,s,m)$ larger than
65. In the ensemble of proteins that were studied,
there are 97 918, 97 525, 97 132, and 17 506 983 aa pairs
with $s$ equal to 2, 3, 4, and greater than 4, respectively.
Of these pairs, 336 110 form contacts:
22.1, 11.9, 10.4, and 55.6\% of them correspond
to $s=2$, 3, 4, and $> 4$, respectively. For each $s$, the distribution
of the contacts over the contact type $m$ is uneven. For $s=2$ and 3,
most of the contacts, 36.5 and 59.8\%, respectively,
correspond to $m=8$.
520: These contacts typically correspond to interactions within helices.
521: Amino acids with long and/or forked side groups (L, K, Q, R, E) are
522: more likely to form local contacts with a large $m$.
523: On the other hand, the smallest amino
524: acid, G, is much less likely to form such contacts, as evidenced
525: by the aversion in the pairs G-G, G-P, and G-S for $s$=2 and $m$=8.
526: %The propensity of aa A to participate in short range contacts with
527: %a large $m$ (also for $s$=4) is again steric in nature: its small
528: %size allows for a participation in conformational twists, but the size
529: %is sufficiently big to result in contact formation.
530: The propensity of aa A to participate in short range contacts with
531: a large $m$ (also for $s$=4) is also due to its size: A is small
532: enough to allow for participation in conformational twists, but it
533: is sufficiently big to facilitate formation of many contacts.
For $s=4$, 67.3\% of the contacts occupy $m=6$. Finally, for $s > 4$,
45.4\% of the contacts occupy $m=1$ and 2 almost equally.
536: These contacts usually correspond to links between
537: secondary structures, e.g. between two helices or between
538: a helix and a turn, through a pair of hydrophobic amino acids which
539: are unlikely to be a G. The C-C covalent attraction results in
540: non-local contacts over a range of $m$ values.
541: }
542:
543: \newpage
544: %\vspace*{0.6cm}
545:
546: \hspace*{3.5cm} Table I
547:
548: %\large
549: \begin{tabbing}
550: \=xxxxxxxxxxxxxxxx\=xxxxxxxxxxxxxxxxxxxx\=xxxxxxxx\=xxxx \kill
551:
552: %\>{\bf aa pairs} \> {\bf attraction/aversion} \>
553: %{\bf s} \> {\bf m} \\
554: \> {\bf \underline{ aa pairs}} \> {\bf \underline{ attraction/aversion}}
555: \> {\bf $\;$\underline{s}} \>{\bf\underline{m}} \\
556: %\>\underline{~~~~~~~~~~~~~~~~~~~~~} \> \underline{~~~~~~~~~~~~~~~~~~~~~~~~~}
557: %\> \underline{~~~~~~~~} \> \underline{~~~~~}\\
558:
559: %\> ------------------------- \> ------------------------------ \>
560: %------------ \> ------ \\
561: \> {\bf V-I} \> \hspace*{0.9cm}attraction \> 2 \> 4 \\
562: \> {\bf AL-AEQKR} \> \hspace*{0.9cm}attraction \> 2 \> 8 \\
563: \> {\bf G-PSG } \> \hspace*{1.7cm}aversion
564: \> 2 \> 8 \\
565: \vspace*{-0.3cm}
566: \> ------------------------- \> ------------------------------ \>
567: ------------ \> ------ \\
568: \> {\bf A-AQIR $\;$ L-ALQ }\> \hspace*{0.9cm}attraction \> 3 \> 8 \\
569: \vspace*{-0.3cm}
570: \> ------------------------- \> ------------------------------ \>
571: ------------ \> ------ \\
572: \> {\bf A-A $\;$ L-LA $\;$ E-R}\>\hspace*{0.9cm}attraction \> 4 \> 6 \\
573: \> {\bf G-V} \> \hspace*{1.7cm}aversion \> 4 \> 6 \\
574: \vspace*{-0.3cm}
575: \> ------------------------- \> ------------------------------ \>
576: ------------ \> ------ \\
577: \> {\bf L-IFVLMWY } \> \hspace*{0.9cm}attraction \> $>$4 \> 1 \\
578: \> {\bf V-IFVMW $\;$ F-FWY} \> \hspace*{0.9cm}attraction \> $>$4 \> 1 \\
579: \> {\bf I-FIWM $\;$ C-C $\;$ M-FY} \> \hspace*{0.9cm}attraction \>
580: $>$4 \> 1 \\
581: \> {\bf A-G $\;$ G-DST } \> \hspace*{1.7cm}aversion \> $>$4 \> 1 \\
582:
583: \> {\bf L-IFVLMWY $\;$W-Y} \> \hspace*{0.9cm}attraction \> $>$4 \> 2 \\
584: \> {\bf VI-IF $\;$ F-FW $\;$ C-C}\>\hspace*{0.9cm}attraction\>$>$4 \> 2 \\
585: \> {\bf L-LF $\;$ I-V $\;$C-C }\> \hspace*{0.9cm}attraction \> $>$4 \> 3 \\
586: \> {\bf C-C }\> \hspace*{0.9cm}attraction \> $>$4 \> 4 \\
587:
588: \> {\bf V-VI $\;$ I-I } \> \hspace*{0.9cm}attraction \> $>$4 \>5 \\
589: \> {\bf V-LVIFT $\;$ I-I } \> \hspace*{0.9cm}attraction \> $>$4 \> 6 \\
590: \vspace*{-0.3cm}
591: %\> ------------------------- \> ------------------------------ \>
592: %------------ \> ------ \\
593:
594:
595: \end{tabbing}
596:
597: \end{document}
598:
599: