0410:q-bio0410031/ftube.tex

1: %\documentclass[aps,prl,twocolumn,showpacs,floatfix]{revtex4}

2: \documentclass[aps,prl,twocolumn,floatfix]{revtex4}

3: \usepackage{graphicx,graphics,epsf,psfig,epsfig}% Include figure files

4: \usepackage{subfigure}

5:

6: \def\beq{\begin{equation}}

7: \def\eeq{\end{equation}}

8: \def\beqa{\begin{eqnarray}}

9: \def\eeqa{\end{eqnarray}}

10: %\def\l{\left}

11: %\def\r{\right}

12: %\def\bdi{\begin{displaymath}}

13: %\def\edi{\end{displaymath}}

14: %\def\ds{\displaystyle}

15:

16: \begin{document}

17: %\documentclass[12pt]{article}

18: %\documentstyle[twocolumn,aps,prl,epsf]{revtex}

19: %\documentstyle[preprint,aps,prl,epsf]{revtex}   % makes power law references

20: %\topmargin 0in

21: %\begin{document}

22:

23: \title{Lattice tube model of proteins}

24:

25: \author{Jayanth R. Banavar$^1$, Marek Cieplak$^2$,

26:  and Amos Maritan$^3$}

27:

28: \address{

29: $^1$104 Davey Laboratory, Department of Physics,

30: The Pennsylvania State University,

31: University Park, Pennsylvania 16802\\

32: $^2$Institute of Physics, Polish Academy of Sciences,

33: Al. Lotnik{\'o}w 32/46, 02-668 Warsaw, Poland \\

34: $^3$Universit{\`a} degli Studi di Padova, Dipartimento di Fisica

35: and INFN, via

36: Marzolo 8, 35100, Padova, Italy }

37:

38: %\maketitle

39:

40: %\vskip 40pt

41: %\noindent

42: %$^*$Correspondence to: \\

43: %Marek Cieplak,\\

44: %Institute of Physics, \\

45: %Polish Academy of Sciences, \\

46: %Al. Lotnik\'ow 32-46  \\

47: %02-668 Warsaw, Poland\\

48: %Tel:  48-22-843-7001\\

49: %Fax:  48-22-843-0926\\

50: %E-mail: mc@ifpan.edu.pl

51:

52:

53: %\vskip 40pt

54: %\noindent {\bf

55: %Keywords:

56:

57: \begin{abstract}

58:

59: {\small

60: %We present a  new lattice model for proteins and demonstrate that

61: %it captures many aspects of real protein behavior.

62: We present a new lattice model for proteins that incorporates

63: a tube-like anisotropy by introducing a preference for mutually parallel

64: alignments in the conformations. The model is demonstrated to

65: capture many aspects of real proteins.

66: }

67:

68: \end{abstract}

69:

70: \maketitle

71:

72: %\vskip 40pt

73: \vspace*{-0.8cm}

74: \hspace*{1.5cm} PACS Numbers: 87.15.He, 87.15.Cc, 87.15.Aa

75:

76: \vspace*{0.5cm}

77:

78: %\newpage

79:

80:

81: There have been several physics-based attempts to distil the

82: essential features of the protein problem and notable success in

83: capturing many of the key ingredients has been achieved using

84: lattice models \cite{Dillreview}.  Such coarse-grained

85: descriptions allow a virtually exact analysis of many properties

86: and provide a useful framework for understanding experimental

87: results. Indeed, valuable progress has been made within the

88: simplified description of a lattice model with just two types of

89: amino acids denoted by $H$ and $P$ representing hydrophobic and

90: polar behaviors. The principal theme of this letter is to present a

91: new lattice model of proteins, which takes into account a

92: previously overlooked key attribute of chain molecules -- the

93: context of amino acids within a chain. We benchmark the behavior

94: of this model with the well-studied HP lattice model and show that

95: the new model faithfully captures several attributes of real

96: proteins.

97:

98: %Globular proteins, that act as enzymes, play a key role in the

99: %network of life. As stated by Kornberg \cite{Kornberg}, ``What

100: %gives the cell its life and personality are enzymes.  They govern

101: %all body processes; malfunction of even one enzyme can be fatal.

102: %Nothing in nature is so tangible and vital to our lives as

103: %proteins, and yet so poorly understood and appreciated by all but

104: %a few scientists." Proteins are hard to understand for several

105: %reasons: first, they are chain molecules made up of twenty

106: %different types of amino acids each with their own distinctive

107: %features; second, an important role is played by the surrounding

108: %water molecules -- proteins fold in order to create a hydrophobic

109: %core within which the hydrophobic amino acid side chains can be

110: %sequestered; third, proteins are neither long enough that simple

111: %theoretical approximations can be made to study them nor are they

112: %so short as to be amenable to an exact treatment -- they are

113: %finite size objects for which one might expect non universal

114: %behavior with some of the details being important; and finally, nature

115: %tinkers with sequences of amino acids through evolution, building

116: %on what she already has, and so the proteins that we see today are

117: %not the result of a targeted design process but rather are

118: %strongly history dependent.

119:

120: %In spite of these daunting difficulties, there

121: There are clear hints,

122: manifested by the many common characteristics of proteins

123: \cite{Banavar}, that proteins may be simpler than one might

124: expect. Protein structures are constructed in a modular manner

125: from common building blocks -- helices, hairpins and sheets

126: connected together by tight turns. Further, the total number of

127: distinct protein folds seems to be of the order of just a few

128: thousand \cite{Chothia}.

129:

130: The simplest model of an unconstrained object is a hard sphere. A

131: collection of hard spheres exhibits both fluid and crystalline

132: phases on changing the volume fraction.  When objects are tethered

133: together in the form of a chain, it is no longer appropriate to

134: consider them as spheres. There is a special direction that one

135: may associate with each object which is tangent to the chain and

136: is defined by the adjoining particles along the chain. It is

137: therefore more appropriate to model the objects making up a chain

138: by means of discs or coins, for which the heads-to-tails direction

139: is distinct from the two other directions. This picture of

140: tethered coins leads to a tube-like description of a chain

141: molecule \cite{Banavar}. Just as symmetry plays a key role in

142: determining the nature of ordering of unconstrained particles (the

143: phases associated with a collection of spheres are vastly simpler

144: than the liquid crystal phases of anisotropic objects), the

145: anisotropy inherent in a tube leads to new behavior.  Recent work

146: \cite{Banavar} has shown that the tube picture can be used to

147: understand the conventional polymer phases and the novel phase of

148: matter used by Nature to house protein native state structures in

149: a unified way and for the development of a framework for

150: understanding the common character of proteins.

151:

152: There are three key features of a tube description that one ought

153: to incorporate in a lattice model: self-intersections of a tube

154: are not allowed, the local radius of curvature of a tube can be no

155: smaller than the tube radius

156: %(there would otherwise be a sharp corner in the tube)

157: and in a compact state, there is a tendency

158: for nearby tube segments to be parallel (indeed both helices and

159: sheets have tube segments alongside and parallel to each other

160: leading to a cooperative placement of hydrogen bonds \cite{Liwo}).

161: The first two features are built into a model of a self-avoiding

162: chain on a lattice.  Our focus here is on considering the effects

163: of introducing the third.

164:

165: In order to illustrate the key idea, we will consider a 16 amino

166: acid (aa) self-avoiding chain on a square lattice. There have been

167: numerous previous studies \cite{Dillreview} of this system within

168: the standard HP model context and its generalizations \cite{Rios}.

169: In the standard $HP$ model, one

170: ascribes a favorable energy $-1$ for a $HH$ contact (two H aa

171: which are {\em not} next to each other in sequence but sit next to

172: each other in the lattice) and zero energy otherwise. Here, in

173: addition we pay attention to the context that the contact occurs

174: in. Figure 1 illustrates three distinct types of contacts (denoted

175: by an index $m$) depending on the degree to which the segments

176: containing the aa in contact are parallel to each other. The

177: energy assigned to a $HH$ contact of type $m$ in the Tube HP (THP)

178: model is denoted by $e_m$. In what follows, let us choose $e_m$ to

179: be $-3$, $-2$, and $-1$ for $m$=3, 2, and 1 respectively thereby

180: favoring the parallelism of nearby segments.  In the standard HP

181: model $e_m= -1$ independent of $m$.

182:

183: In order to understand the role of sequence heterogeneity, it is

184: useful to consider a generalized model in which the energies are

185: described by

186: \begin{equation}

187: E \;=\; \sum_{i<j} \; e_m \; \Delta (i-j,m) \; [ \delta_{i,H} \delta_{j,H}

188: \;+\;(1 - \epsilon ) \; D_{ij}] \;\;,

189: \end{equation}

190: where

191: \begin{equation}

192: D_{ij}\;=\;(\delta_{i,H} \delta_{j,P} + \delta_{i,P} \delta_{j,H}

193: +\delta_{i,P} \delta_{j,P} ) \;= \; 1\;-\;\delta_{i,H}\delta_{j,H}\;\;.

194: \end{equation}

195: Here, $\Delta (i-j,m)$ is equal to 1 if the amino acids $i$ and

196: $j$ form a contact and 0 otherwise. When such a contact exists,

197: the energy of attraction associated with it depends on the index

198: $m$. $\delta _{i,H}$ is defined to be equal to 1 if amino acid $i$

199: is hydrophobic and 0 if it is polar. Similarly, $\delta_{i,P} = 1

200: - \delta_{i,H}$ is equal to 1 if amino acid $i$ is polar.

201: Depending on the choice of the $e_m$ parameters, one obtains the

202: HP or THP models. The limiting cases correspond to $\epsilon =1$,

203: i.e. the `standard' THP or HP models, and $\epsilon =0$ -- the

204: case of a homopolymer made of $H$ amino acids.
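
As a check on the two limits, the bracket in the expression for $E$
can be evaluated for a single contact of type $m$, using only the
definitions above (the label $E_{\rm contact}$ is introduced here
purely for illustration):
\begin{displaymath}
E_{\rm contact} \;=\; \left\{
\begin{array}{ll}
e_m & \mbox{for an $HH$ pair,} \\
(1-\epsilon)\, e_m & \mbox{for an $HP$, $PH$ or $PP$ pair.}
\end{array} \right.
\end{displaymath}
For $\epsilon =1$ only $HH$ contacts contribute, whereas for
$\epsilon =0$ every contact contributes $e_m$ regardless of the
identities of the two amino acids, so that all sequences behave as
the homopolymer. For instance, with the THP values $e_m=-3,-2,-1$
and an intermediate value $\epsilon =0.5$ (chosen here only as an
example), a contact with $m=3$ contributes $-3$ for an $HH$ pair and
$-1.5$ for any other pair.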

205:

206: For the 16-aa chain, all sequences and all possible conformations

207: can be enumerated exactly. There are interesting differences in

208: the energy landscape of the HP and the THP models.  One may

209: determine the sequences which have a unique ground state and the

210: number of distinct designable conformations, which house these

211: sequences, as a function of $\epsilon$ (see inset of Figure 2).

212: For a homopolymer ($\epsilon $=0), the HP model has no designable

213: structure -- all compact conformations are degenerate and have the

214: same energy. Thus in the absence of sequence specificity,

215: there is no pre-selection of protein-like structures among

216: compact conformations.

218: When a weak heterogeneity is introduced

219: by turning on a small $\epsilon$, the HP energy landscape becomes

220: rugged and each of the 69 maximally compact conformations becomes

221: designable but with a weak thermodynamic stability.  Thus the

222: funnel-like energy landscape \cite{funnel} arises only on turning

223: on the full degree of sequence heterogeneity.

224:

225: %FIGURE 1

226: \begin{figure}

227: %\epsfxsize=3.6in \centerline{\epsffile{defconf.eps}}

228: \epsfxsize=3.6in \centerline{\epsffile{fig1.eps}}

229: \vspace*{-3.5cm}

230: \caption{ {\small

231: %Fig. 1.

232: Panel a: Sketch of three contact environments in the THP model. The dashed

233: line denotes a contact. Panel b:  The optimal

234: structure for the THP model. The circled sites

235: represent the hydrophobic core and have H aa in them

236: more than 87 \% of the time for the sequences that fold into this

237: conformation when $\epsilon = 1$.

238:  }}

239: \end{figure}

240:

241: This is in sharp contrast to the behavior of the THP model --

242: here, even for a homopolymer, one obtains a unique ground state,

243: akin to either a helix or a sheet in two dimensions

244: (see Figure 1), selected not by considerations of the chemistry of

245: the aa but rather by the overarching principles of geometry and

246: symmetry. Interestingly, in the limit of small $\epsilon$, all

247: $2^{16}$=65536 sequences have a unique ground state in the THP

248: model and none in the HP model.

249: When $\epsilon =1$, one obtains 10579 and 1539

250: designable sequences in the THP and HP models respectively (see

251: Figure 2) folding into 684 and 456 distinct folded structures.

252: Furthermore, the numbers of sequences folding into the most

253: designable structure are 637 and 26 for the two models.
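
These counts can be put on a common footing by simple arithmetic.
Of the $2^{16}=65536$ sequences, the designable fraction at
$\epsilon =1$ is
\begin{displaymath}
\frac{10579}{65536}\approx 16.1\,\% \;\;\mbox{(THP)}, \qquad
\frac{1539}{65536}\approx 2.3\,\% \;\;\mbox{(HP)},
\end{displaymath}
and the mean number of sequences per designable structure is
$10579/684\approx 15.5$ and $1539/456\approx 3.4$ respectively.
These ratios are derived here from the numbers quoted above and are
not separately computed results.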

254:

255: %FIGURE 2

256: \begin{figure}

257: %\epsfxsize=4in \centerline{\epsffile{comdes.eps}}

258: \epsfxsize=3.7in \centerline{\epsffile{fig2.eps}}

259: \caption{ {\small

260: %Fig. 2.

261: Rank ordered values of the number of sequences that fold into the

262: given structure for all of the designable structures at $\epsilon

263: =1$. The inset shows the number of designable structures as a

264: function of $\epsilon$ (see text).

265: }}

266: \end{figure}

267:

268: The thermodynamic stability of a sequence is characterized by the

269: folding transition temperature, $T_f$, at which the equilibrium

270: probability of being in the native state conformation is equal to

271: $\frac{1}{2}$. The spread in the values of $T_f$ is nearly three

272: times larger in the THP model than in the HP case.

273: The most stable THP sequence  folds into the structure

274: shown in Figure 1b, whereas the most stable HP sequence

275: folds to a structure which is not maximally compact. In order to

276: describe the folding kinetics, we take a sequence at a temperature

277: equal to its $T_f$ value and consider 10 batches of 101

278: trajectories and determine the first passage time to the native

279: state starting from an unfolded conformation. The time evolution

280: \cite{monte} is a Monte Carlo process which satisfies detailed

281: balance. The kinetic moves consist of single-bead moves (the kink

282: flips and rotations of the terminal segments) with probability 0.2

283: and of two-bead ``crankshaft'' moves with probability 0.8.  A

284: median folding time is determined for each batch and averaged over

285: all batches to yield a measure of the folding time, $t_{fold}$.

286: Our calculations were carried out for 12 sequences in each model

287: (the top 10 sequences in $T_f$ values and the sequences ranked 20

288: and 30). In all cases, the THP model exhibits more rapid

289: two-state folding

290: than the HP model with the ratio of the folding times for the 12

291: sequences ranging between 0.10 and 0.47.

292: %more digits:  0.1058  and 0.4747.
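
Since all conformations are enumerated, the criterion defining $T_f$
can be written explicitly. A minimal way of doing so, assuming a
canonical ensemble with the Boltzmann constant set to one and using
the symbols $P_N$, $E_N$ and $E_c$, which are introduced here only
for illustration, is
\begin{displaymath}
P_N(T)\;=\;\frac{e^{-E_N/T}}{\sum_{c} e^{-E_c/T}} \;\;, \qquad
P_N(T_f)\;=\;\frac{1}{2} \;\;,
\end{displaymath}
where the sum runs over all conformations $c$ of the chain and $E_N$
is the energy of the native conformation. In the same notation, the
folding time described above may be written as
\begin{displaymath}
t_{fold}\;=\;\frac{1}{10}\sum_{b=1}^{10}
{\rm median}\,( t_{b,1},\ldots ,t_{b,101} ) \;\;,
\end{displaymath}
where $t_{b,i}$ denotes the first passage time of trajectory $i$ in
batch $b$. With an odd number of trajectories per batch, the median
is simply the 51st shortest first passage time.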

293:

294: The framework of evolution in life works through both the DNA

295: molecule and the functionally useful protein molecule. Mutations

296: of the DNA molecule lead to the possibility of new proteins, whose

297: selection, in turn, leads to an enhancement of the number of such

298: DNA molecules. As pointed out by Maynard Smith \cite{Maynard}, as

299: the sequence undergoes mutation, there must be a continuous

300: network that the mutated sequences can traverse without passing

301: through any intermediaries that are non-functioning.  Thus, one

302: seeks a connected network in sequence space for evolution by

303: natural selection to occur.  There is considerable evidence that

304: much of evolution is neutral \cite{Kimura}.

305:

306: We have investigated the topology of connections \cite{Chan} between

307: the designable structures resulting from point mutations in the sequence

308: (the change of one aa from H to P or vice versa).

309: Indeed, while one has a ``random walk'' in sequence space

310: that forms a connected network,

311: there is no similar continuous variation in structure space.

312: When $\epsilon = 1$, 39.3 \% or 605

313: %60.7 \% or 934 comprise

314: of the HP sequences do not belong to

315: the connected network envisioned by Maynard Smith in sharp

316: contrast to the THP model for which only 13 of the 10579 sequences,

317: i.e. 0.12 \%, do not belong to the network.

318: The THP model is vastly better connected than the HP model,

319: as illustrated in Figure 3. The former exhibits

320: %vastly better connected than the HP model -- the former exhibits

321: approximate scale-free behavior \cite{Reka} while the latter is

322: more akin to a random network with low mean coordination number

323: (Figure 4).

324:

325: %FIGURE 3

326: \begin{figure}

327: %\epsfxsize=4in \centerline{\epsffile{networ.eps}}

328: \epsfxsize=3.7in \centerline{\epsffile{ffig3.eps}}

329: \vspace*{-1cm}

330: \caption{ {\small

331: Network topologies (using Pajek)

332: of designable structures resulting from

333: point mutations in the sequence. The top and bottom panels are for the

334: THP and HP models respectively.

335:  }}

336: \end{figure}

337:

338: %FIGURE 4

339: \begin{figure}

340: %\epsfxsize=4in \centerline{\epsffile{nlas.eps}}

341: \epsfxsize=3.9in \centerline{\epsffile{fig4.eps}}

342: \caption{ {\small

343: Probability distribution, $P(z)$, of the effective coordination

344: number for the network of designable structures shown in Fig. 3.

345: % resulting from point

346: %mutations in the sequence.

347: The inset is a plot of the same data on

348: a log-log scale (the top panel) for the THP model and on a

349: log-linear scale for the HP model.

350: The results illustrate the approximate validity of

351: $P(z) \sim z^{-\gamma}$ and  $P(z)\sim \exp(-z/\xi)$

352: for the THP and HP models  respectively.

353:  }}

354: \end{figure}

355:

356: In summary, we find that the tube lattice model captures many of

357: the key characteristics of protein behavior in a superior way

358: compared to conventional lattice models. The key advantage of

359: studying a tube on a lattice compared to a more realistic

360: continuum analysis \cite{Banavar} is that one can often carry out

361: an exact analysis for short chains and obtain insights on real

362: protein behavior. As an illustration, we conclude with a simple

363: analysis of a few hundred proteins \cite{Dima} to determine the

364: propensity of amino acid pairs in contact \cite{Tsai} to be in

365: specific environments characterized by the $m$-index introduced

366: above. Specifically, we look at the type of contact between aa $k$

367: and aa $l$ along the sequence and categorize it in the following

368: manner: the specific aa pair involved in the contact, their

369: sequence separation $s = |k-l|$ equal to 2, 3, 4 or

370: greater than 4 and the number of contacts $m$ between the two

371: groups of aa ($k-1,k,k+1$) and ($l-1,l,l+1$) which can range

372: between 1 and 9. (The geometry of the lattice model in two

373: dimensions allows for only three values $1$, $2$ or $3$  of the contact

374: environment index $m$.) We have determined

375: \begin{equation}

376: \chi _2(k,l,s,m)\;=\;

377: \frac{[n(k,l,s,m)\;-\;p(k,l,s,m)]^2}{p(k,l,s,m)} \;\;.

378: \end{equation}

379: Here $n$ is the number of contacts and $p$ the expected number of

380: contacts based on chance: $p(k,l,s,m)= a q(k,l,s)$, where $q$ is

381: the number of the specific aa  pairs at distance $s$ and

382: $a=\sum_{kl} n(k,l,s,m) / \sum_{kl} q(k,l,s)$.

383: A large value of $\chi_2$ corresponds to

384: a strong signal that aa $k$ and aa $l$ prefer to make or avoid a

385: contact in the environment defined by the $s$ and $m$ indices

386: (Table I) and would be useful in the development of scoring

387: functions for protein structure recognition \cite{Dima}.
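
To illustrate how this measure works, consider hypothetical counts
(not taken from the data set). If chance predicts $p=10$ contacts for
a given $(k,l,s,m)$ and $n=40$ are observed, then
\begin{displaymath}
\chi _2 \;=\; \frac{(40-10)^2}{10} \;=\; 90 \;\;,
\end{displaymath}
whereas $n=20$ would give only $\chi _2 = 10$. Since $\chi _2$ is
quadratic in $n-p$, it does not by itself distinguish a preference
from an avoidance. Assuming that the two cases are told apart by the
sign of $n-p$, $n>p$ corresponds to attraction and $n<p$ to aversion.
In the latter case $\chi _2$ cannot exceed $p$, because $0\le n<p$
implies $(n-p)^2/p \le p$, so a strong aversion signal can arise only
for pairs that are expected to be in contact often.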

388:

389: The tube idea reveals a deep underlying simplicity in the protein problem.

390: In standard approaches, the sequence of amino acids is believed to play

391: a key role in sculpting the free energy landscape and determining

392: its native state structure. Here, instead, the free energy landscape

393: is sculpted predominantly by symmetry and geometry and the sequence

394: plays a vital role in the {\em selection}

395: of the native state from a predetermined

396: menu.  Unlike sequences and functionalities, which are shaped by

397: the powerful forces of evolution, the menu of putative native state

398: structures is immutable and is determined by physical law.  Indeed,

399: this fixed backdrop provides the initial basis for selection in molecular

400: evolution. DNA molecules which make proteins that are able to fold readily into

401: one of the predetermined folds pass the initial screening.

402: An additional level of filtering completes the selection process of

403: proteins that are not only good folders but are also able to interact well

404: with ligands and other proteins and play a useful functional role.

405:

406: We are grateful to Istvan Albert for helpful advice. This work was

407: supported by KBN (grant 2 P03B 032 25), COFIN MURST 2003, INFM,

408: NASA, NSF IGERT grant DGE-9987589 and the NSF MRSEC at Penn State.

409:

410:

411:

412: %\begin{references}

413: \begin{thebibliography}{99}

414:

415:

416:

417: \bibitem{Dillreview} K. A. Dill, S. Bromberg, K. Z. Yue, K. M. Fiebig,

418: D. P. Yee, P. D. Thomas, H. S. Chan,

419: %Principles of protein folding - a perspective from simple exact models

420: Protein Science {\bf 4}, 561-602 (1995).

421:

422: %\bibitem{Kornberg} A. Kornberg, {\em For the love of enzymes}

423: %(Harvard University Press, Cambridge, 1989).

424:

425: \bibitem{Banavar}

426: J. R. Banavar and A. Maritan, Rev. Mod. Phys. {\bf 75}, 23-34

427: (2003); T. X. Hoang, A. Trovato, F. Seno, J. R. Banavar, and A.

428: Maritan, Proc. Natl. Acad. Sci. USA {\bf 101}, 7960-7964 (2004);

429: J. R. Banavar, T.

430: X. Hoang, A. Maritan, F. Seno, A. Trovato, Phys. Rev. E

431: (submitted).

432:

433: \bibitem{Chothia}

434: C. Chothia and A. V. Finkelstein, Annu. Rev. Biochem. {\bf 59},

435: 1007-1039 (1990); C. Chothia, Nature {\bf 357}, 543-544 (1992).

436:

437: \bibitem{Liwo}

438: A. Liwo, R. Kazmierkiewicz, C. Czaplewski, M. Groth, S. Oldziej,

439: R. J. Rackovsky, M. R. Pincus, and H. A. Scheraga, J. Comput. Chem.

440: {\bf 19}, 259-276 (1998);

441: B. Fain and M. Levitt, Proc. Natl. Acad. Sci. USA {\bf 100},

442: 10700-10705 (2003).

443:

444: \bibitem{Rios}

445: See, e.g.,

446: D. K. Klimov and D. Thirumalai, Folding and Design {\bf 3}, 127-139 (1998);

447: P. De Los Rios and G. Caldarelli, Phys. Rev. E {\bf 62}, 8449-8452

448: (2000).

449:

450: \bibitem{funnel} P. G. Wolynes, J. N. Onuchic, and D. Thirumalai,

451: Science {\bf 267}, 1619-1620 (1995); K. A. Dill and H. S. Chan, Nature

452: Struct. Biol. {\bf 4}, 10-19 (1997).

453:

454: \bibitem{monte}

455: M. Cieplak, M. Henkel, J. Karbowski, and J. R. Banavar,

456: %Master equation approach to protein folding and kinetic traps,

457: Phys. Rev. Lett. {\bf 80}, 3654-3657 (1998).

458:

459: \bibitem{Maynard}

460: J. Maynard Smith, Evolutionary Genetics, Oxford University Press,

461: 2nd edition New York (1998).

462:

463: \bibitem{Kimura}

464: {\it The Neutral Theory of Molecular Evolution}, M. Kimura,

465: Cambridge University Press, New York, Reprint Edition (1985).

466:

467: \bibitem{Chan}

468: E. Bornberg-Bauer and H. S. Chan,

469: %Modeling evolutionary landscapes: Mutational stability, topology,

470: %and superfunels in sequence space,

471: Proc. Natl. Acad. Sci. USA {\bf 96}, 10689-10694 (1999);

472: Y. Cui, W. H. Wong, E. Bornberg-Bauer, and H. S. Chan,

473: %Recombinatoric exploration of novel folded structures: A

474: %heteropolymer-based model of protein evolutionary landscapes,

475: Proc. Natl. Acad. Sci. USA {\bf 99}, 809-814 (2002);

476: H. S. Chan and E. Bornberg-Bauer,

477: %Perspective on protein evolution from simple exact models,

478: Appl. Bioinformatics {\bf 1}, 121-144 (2002).

479:

480: \bibitem{Reka}

481: R. Albert and A.-L. Barab{\'a}si, Rev. Mod. Phys. {\bf 74}, 47-98

482: (2002).

483:

484: \bibitem{Dima}

485: I. Chang, M. Cieplak, R. I. Dima, A. Maritan, and J. R. Banavar,

486: %Protein threading by learning,

487: Proc. Natl. Acad. Sci. USA {\bf 98}, 14351-14355 (2001).

488:

489: \bibitem{Tsai}

490: J. Tsai, R. Taylor, C. Chothia, and M. Gerstein, J. Mol. Biol.

491: {\bf 290}, 253-266 (1999); M. Cieplak and T. X. Hoang,

492: %Universality classes in folding times of proteins,

493: Biophys. J. {\bf 84}, 475-488 (2003).

494:

495:

496: \end{thebibliography}

497:

498:

499:

500: \newpage

501:

502: %\onecolumn

503:

504:

505: %\vspace*{2cm}

506:

507: \centerline{TABLE CAPTION}

508:

509: {\bf Table I}.   {\small

510: The list of aa pairs with $\chi _2 (k,l,s,m)$ larger than

511: 65. In the ensemble of proteins that were studied,

512: there are 97 918, 97 525, 97 132, and 17 506 983 aa pairs

513: with $s$ equal to 2, 3, 4, and greater than 4 respectively.

514: 336 110 of the pairs form contacts:

515: 22.1, 11.9, 10.4, and 55.6 \%  of them correspond

516: to $s$=2, 3, 4, and $> 4$ respectively. For each $s$, the distribution

517: of the contacts over the contact type $m$ is uneven. For $s$=2 and 3,

518: most of the contacts, 36.5 and 59.8 \% respectively,

519: correspond to $m$=8.

520: These contacts typically correspond to interactions within helices.

521: Amino acids with long and/or forked side groups (L, K, Q, R, E) are

522: more likely to form local contacts with a large $m$.

523: On the other hand, the smallest amino

524: acid, G, is much less likely to form such contacts, as evidenced

525: by the aversion in the pairs G-G, G-P, and G-S for $s$=2 and $m$=8.

526: %The propensity of aa A to participate in short range contacts with

527: %a large $m$ (also for $s$=4) is again steric in nature: its small

528: %size allows for a participation in conformational twists, but the size

529: %is sufficiently big to result in contact formation.

530: The propensity of aa A to participate in short range contacts with

531: a large $m$ (also for $s$=4) is also due to its size: A is small

532: enough to allow for participation in conformational twists, but it

533: is sufficiently big to facilitate formation of many contacts.

534: For $s$=4, 67.3 \% of the contacts occupy $m$=6. Finally, for $s > 4$,

535: 45.4 \% of the contacts occupy $m$=1 and 2 almost equally.

536: These contacts usually correspond to links between

537: secondary structures, e.g. between two helices or between

538: a helix and a turn, through a pair of hydrophobic amino acids which

539: are unlikely to be a G. The C-C covalent attraction results in

540: non-local contacts over a range of $m$ values.

541: }

542:

543: \newpage

544: %\vspace*{0.6cm}

545:

546: \hspace*{3.5cm} Table I

547:

548: %\large

549: \begin{tabbing}

550: \=xxxxxxxxxxxxxxxx\=xxxxxxxxxxxxxxxxxxxx\=xxxxxxxx\=xxxx \kill

551:

552: %\>{\bf aa pairs} \> {\bf attraction/aversion} \>

553: %{\bf s} \> {\bf m} \\

554: \> {\bf \underline{ aa pairs}} \> {\bf \underline{ attraction/aversion}}

555: \> {\bf $\;$\underline{s}} \>{\bf\underline{m}} \\

556: %\>\underline{~~~~~~~~~~~~~~~~~~~~~} \> \underline{~~~~~~~~~~~~~~~~~~~~~~~~~}

557: %\> \underline{~~~~~~~~} \> \underline{~~~~~}\\

558:

559: %\> ------------------------- \> ------------------------------ \>

560: %------------ \> ------ \\

561: \> {\bf  V-I} \>   \hspace*{0.9cm}attraction   \>     2 \>  4  \\

562: \> {\bf AL-AEQKR} \>  \hspace*{0.9cm}attraction  \>      2  \> 8 \\

563: \> {\bf G-PSG }  \>  \hspace*{1.7cm}aversion

564: \>  2  \> 8  \\

565: \vspace*{-0.3cm}

566: \> ------------------------- \> ------------------------------ \>

567: ------------ \> ------ \\

568: \> {\bf  A-AQIR $\;$ L-ALQ }\> \hspace*{0.9cm}attraction  \>    3  \> 8   \\

569: \vspace*{-0.3cm}

570: \> ------------------------- \> ------------------------------ \>

571: ------------ \> ------ \\

572: \> {\bf  A-A $\;$ L-LA $\;$ E-R}\>\hspace*{0.9cm}attraction  \>  4 \> 6   \\

573: \> {\bf  G-V} \> \hspace*{1.7cm}aversion  \>  4 \> 6  \\

574: \vspace*{-0.3cm}

575: \> ------------------------- \> ------------------------------ \>

576: ------------ \> ------ \\

577: \> {\bf L-IFVLMWY } \>  \hspace*{0.9cm}attraction  \>   $>$4 \> 1  \\

578: \> {\bf V-IFVMW $\;$ F-FWY} \> \hspace*{0.9cm}attraction  \>   $>$4 \> 1   \\

579: \> {\bf I-FIWM $\;$ C-C $\;$ M-FY} \> \hspace*{0.9cm}attraction \>

580:   $>$4 \> 1 \\

581: \> {\bf A-G $\;$ G-DST } \> \hspace*{1.7cm}aversion \>  $>$4 \> 1 \\

582:

583: \> {\bf L-IFVLMWY $\;$W-Y} \>  \hspace*{0.9cm}attraction  \>   $>$4 \> 2  \\

584: \> {\bf VI-IF $\;$ F-FW $\;$ C-C}\>\hspace*{0.9cm}attraction\>$>$4 \> 2   \\

585: \> {\bf L-LF $\;$ I-V $\;$C-C }\> \hspace*{0.9cm}attraction \> $>$4 \> 3 \\

586: \> {\bf C-C }\> \hspace*{0.9cm}attraction \> $>$4 \> 4 \\

587:

588: \> {\bf V-VI $\;$ I-I } \> \hspace*{0.9cm}attraction  \>  $>$4 \>5   \\

589: \> {\bf V-LVIFT $\;$ I-I }   \> \hspace*{0.9cm}attraction \>  $>$4 \> 6   \\

590: \vspace*{-0.3cm}

591: %\> ------------------------- \> ------------------------------ \>

592: %------------ \> ------ \\

593:

594:

595: \end{tabbing}

596:

597: \end{document}

598:

599: