1: %------------------------------------------------------------
% Revtex, updated July 20, 2004, first version Feb 2004
3: % Title: Medium effects on the selection of sequences folding
4: % into stable proteins in a simple model %
5: % Authors: You-Quan Li, Yong-Yun Ji, Jun-Wen Mao and Xiao-Wei Tang
6: %--------------------------------------------------------
7:
8: \documentclass[twocolumn,showpacs,preprintnumbers,amsmath,amssymb]{revtex4}
9:
10: %\documentclass[preprint,showpacs,preprintnumbers,amsmath,amssymb]{revtex4}
11: % Some other (several out of many) possibilities
12: %\documentclass[preprint,aps]{revtex4}
13: %\documentclass[preprint,aps,draft]{revtex4}
14: %\documentclass[prb]{revtex4}% Physical Review B
15:
16: \usepackage{graphicx}% Include figure files
17: \usepackage{dcolumn}% Align table columns on decimal point
18: \usepackage{bm}% bold math
19: \usepackage{amssymb}
20: \usepackage{amsmath}
21: \usepackage{latexsym}
22:
23: \begin{document}
24:
25: \title{Medium effects on the selection of sequences folding \\
26: into stable proteins in a simple model}%
27:
28: \author{You-Quan Li}\email{yqli@zimp.zju.edu.cn}
29: \author{Yong-Yun Ji}
30: \author{Jun-Wen Mao}
31: \author{Xiao-Wei Tang}
32: \address{Department of Physics, Zhejiang University, Hangzhou 310027, P.R. China.} %
33:
34: \date{\today}%
35:
36: \begin{abstract}
We study the medium effects on the selection of sequences in protein
folding by including a surface potential in the HP model. Our analysis
of the proportion of H and P monomers in the sequences gives a direct
explanation of why the lowly designable structures possess a small
average gap. Numerical calculations with our model show that the
surface potential enhances the average gap of highly designable
structures. They also show that the most stable structure may no
longer be the most stable one when the medium parameters change.
60:
61: \end{abstract}
62:
63: \pacs{87.10.+e, 87.14.Ee, 87.15.-v}%
64: %87.10.+e General theory and mathematical aspects
65: %87.14.Ee Proteins
66: %87.15.-v Biomolecules: structure and physical properties
67:
68: \maketitle
69:
70: %introduction and our motivation
Proteins are known to play a vital role in the structure and
functioning of all forms of life, and the protein folding problem
is one of the most fundamental unsolved problems.
Composed of a specific sequence of amino acids, each protein
folds into a native structure (a particular three-dimensional shape)
that determines its biological function. It is widely believed
that for most single-domain proteins the native structure is the
global free-energy minimum\cite{1}, and that the amino-acid sequence alone
encodes sufficient information\cite{1} to determine the three-dimensional
structure. Theoretical studies of protein sequence and structure
include molecular dynamics simulations\cite{2} and lattice
models\cite{3}. The latter have attracted much attention\cite{4,5},
whereas the former are computationally demanding even on large computers\cite{2}.
84:
Since the naturally occurring amino acids can be classified\cite{6}
as either hydrophobic (H) or polar (P), an HP lattice model was
introduced\cite{4} to interpret protein folding. Working with the
so-called standard HP model, in which 27 monomers occupy all sites of
a $3\times3\times3$ cube\cite{5}, Li et al.\cite{7} introduced the
notion of designability and showed that potentially good sequences
are those with a unique ground state separated by a large gap from
the first excited state. Defining the designability of a structure as
the number of sequences that possess the structure as their unique
lowest-energy state, they found that structures differ drastically in
their designabilities. The sequences that design highly designable
structures are thermodynamically more stable\cite{7,8}. Studies of
designability for a larger lattice model\cite{9} and for an
off-lattice model\cite{10} showed similar results. For many-letter
models, different parameters gave different results: Buchler et
al.\cite{11} found that the designability of a structure depends
sensitively on the size of the alphabet, whereas Li et al.\cite{12}
found that it is not sensitive to the alphabet size when a realistic
interaction potential (the MJ matrix) is employed. Ejtehadi et al.
found that if the strength of the non-additive part of the
interaction potential exceeds a critical value, the degree of
designability of structures depends on the parameters of the
potential\cite{13}.
147:
Since useful features of protein folding and stability can be
explored on the basis of lattice models, it is worthwhile to study
the effect of the medium on protein folding properties. In this
letter, we consider medium effects by introducing parameters that
characterize various concentrations of the medium solution. Our
results give some answers to the following questions. Are the
sequences associated with highly designable structures universally
good? How do they vary with the medium\cite{14} in which the protein
is placed?
172:
173: % The paragraph describing our models
174:
We investigate the effects of the medium on the category of highly
designable protein sequences, which should provide a clue to
understanding how the natural selection of protein species varies
with the medium in which the protein lives. For this purpose, we
extend the original HP model by introducing potential parameters for
the monomers at the protein's surface. The protein is represented as
a chain of beads occupying the sites of a lattice in a self-avoiding
way, so the energy of a sequence folded into a particular structure
reads
\begin{equation*}
H=\sum_{i<j}E_{\sigma_i\sigma_j}\,\delta_{|r_i-r_j|,1}\,(1-\delta_{|i-j|,1})
+\sum_{r_j\in S}U_{r_j}\,\delta_{\sigma_j,P}
\end{equation*}
where $i$ and $j$ label the successive monomers in a sequence, $r_i$
denotes the lattice position of the $i$-th monomer, $\sigma_i$ takes
the value H or P for a hydrophobic or polar monomer, and $S$ denotes
the set of surface sites. Here the Kronecker delta notation is
adopted, i.e., $\delta_{a,b}=1$ if $a=b$ and $\delta_{a,b}=0$ if
$a\ne b$. As the hydrophobic force\cite{6} drives a protein to fold
into a compact shape with as many hydrophobic monomers inside as
possible, $HH$ contacts are the most favorable in this model, which
is characterized by choosing $E_{PP}=0$, $E_{HP}=-1$, and
$E_{HH}=-2.3$ as adopted in Ref.\cite{7}. In order to include the
effects of the surrounding medium, which depend on the salt
concentration\cite{14} of the solution in which the protein is
placed, we introduce $U_V$, $U_E$, and $U_F$ to represent the
attractive potentials at the protein surface for polar (hydrophilic)
monomers at vertices, edges, and face centers, respectively. These
attractive forces are exerted by the medium (solution) on the
hydrophilic monomers. Since we cannot treat a spherical surface in
the present lattice model, we assign different weights to the
different types of surface site, namely $U_{\tau}=-\gamma_{\tau}V$
with $\tau=V,E,F$, where the parameter $V$ sets the overall strength
of the surface potential. If $\gamma_V=\gamma_E=\gamma_F\ne 0$, no
new results arise in comparison with those of Li et al.\cite{7}. This
is because the central site of the 27-site cube always holds a
hydrophobic monomer, which implies that surface potentials with equal
weights on vertices, edges, and face centers merely cause a global
shift in the energy spectrum of the 27-site model. We therefore
investigate several cases in which the $\gamma$'s are not all equal.
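To make the global-shift argument explicit, the surface term can be
written out site by site. The following is a sketch in notation
introduced here only for illustration: $n_\tau$ denotes the number of
P monomers on surface sites of type $\tau$ in a given structure,
$N_P$ the total number of P monomers in the sequence, and $\sigma_c$
the monomer on the central site. The $3\times3\times3$ cube has 8
vertex, 12 edge, and 6 face-center sites plus one central site, so
that
\begin{equation*}
\sum_{r_j\in S}U_{r_j}\delta_{\sigma_j,P}
=-V\left(\gamma_V n_V+\gamma_E n_E+\gamma_F n_F\right),
\qquad n_V+n_E+n_F=N_P-\delta_{\sigma_c,P}.
\end{equation*}
For equal weights $\gamma_V=\gamma_E=\gamma_F=\gamma$ the surface
term reduces to $-\gamma V(N_P-\delta_{\sigma_c,P})$. Whenever the
central site holds an H monomer this equals $-\gamma V N_P$, which
depends on the sequence only and not on the structure, so the energy
differences between structures, and hence the gaps, are unchanged.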
272:
273: %The details of our calculation and analysis
274:
It has been noticed\cite{7} that some structures can be designed by a
large number of sequences, while others can be designed by only a
few. The designability of a structure is measured by the number $N_s$
of sequences that take the given structure as their unique ground
state, as first introduced by Li et al.\cite{7}. Structures differ
drastically in their designability: highly designable structures have
a number of associated sequences much larger than the average. For a
particular sequence, the energy gap $\delta_s$ is the minimum energy
needed to change its ground-state structure into a different compact
structure. The average energy gap $\bar{\delta}_s$ of a given
structure is obtained by averaging the gaps over all the $N_s$
sequences that design that structure. Structures with large $N_s$
have a much larger average gap than those with small $N_s$, and there
is an apparent jump in the average energy gap around $N_s=1400$. This
feature was first noticed by Li et al.\cite{7} in the
medium-independent HP model; the highly designable structures are
thus thermodynamically more stable, and they possess protein-like
secondary structures into which the protein sequences fold faster
than into other structures\cite{7}. To interpret this feature, we
calculate the average distribution of the number of hydrophobic
monomers for the highly designable structures and for the lowly
designable structures, respectively. We plot these two distributions
together with the purely mathematical binary arrangement distribution
in Fig.~\ref{fig:binary}, where all distributions are normalized to
unity. Clearly, the distribution for highly designable structures is
shifted toward larger numbers of hydrophobic monomers in comparison
with the mathematical distribution. This leads to a lower energy
scale, because the more hydrophobic monomers there are, the lower the
energy. Conversely, the distribution for lowly designable structures
is shifted toward smaller numbers of hydrophobic monomers, which
results in a higher energy. This may explain why the lowly designable
structures possess a small average gap.
365: %fig1
366: \begin{figure}
367: \includegraphics[width=0.32\textwidth]{fig_1.eps}%
\caption{\label{fig:binary} Comparison of the distributions for the binary
arrangement (green dotted line), the lowly designable structures (red
dash-dotted line), and the highly designable structures (black solid
line).}
372: \end{figure}
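For reference, the quantities compared in Fig.~\ref{fig:binary} can
be written explicitly. This is a sketch under the assumption that the
binary arrangement curve is the binomial distribution of the number
$n_H$ of H monomers among all $2^{27}$ binary sequences of length 27;
the symbols $P_{\rm bin}$, $\mathcal{S}_s$ (the set of sequences that
design structure $s$), and $\delta_\alpha$ (the gap of an individual
sequence $\alpha$) are introduced here only for illustration:
\begin{equation*}
P_{\rm bin}(n_H)=\frac{1}{2^{27}}\binom{27}{n_H},\qquad
\sum_{n_H=0}^{27}P_{\rm bin}(n_H)=1,\qquad
\bar{\delta}_s=\frac{1}{N_s}\sum_{\alpha\in\mathcal{S}_s}\delta_\alpha .
\end{equation*}
Under this assumption the reference curve is symmetric about
$n_H=13.5$, so a shift of a measured distribution to either side of
this value indicates an excess of H or of P monomers.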
373:
Although the choice of $E_{PP}=0$, $E_{HP}=-1$, and $E_{HH}=-2.3$
adopted in Ref.\cite{7} fulfils the principle that the major driving
force for protein folding is the hydrophobic force, the difference
between H-H contacts occurring inside the protein and those occurring
at the surface was disregarded. Therefore, to explore how
designability is affected by the medium surrounding the protein, a
surface potential must be included in the model. We pointed out above
that 26 of the monomers are on the surface in the 27-site model, which
gives a trivial result when the surface potential has uniform
weights. On the other hand, increasing the number of lattice sites
would put the model beyond the capacity of present-day computers.
However, after some further tuning of the original model, we are able
to obtain nontrivial and interesting results. First, we consider a
``cubic shape approximation'' by imposing different potential weights:
${\gamma}_V=7/8$, ${\gamma}_E=6/8$,
and ${\gamma}_F=4/8$, which come from the different interfaces
between the medium solution and the monomers at a vertex, an edge, and a face center,
respectively. For this parameter choice, we find 17 additional sequences
that possess a unique ground state regardless of the magnitude of $V$
(ranging from 0.1 to 2.1), although they do not possess unique ground states in the model studied
by Li et al.\cite{7}, where the effect of the medium was neglected. Our calculation further
reveals that 14 of those 17 sequences mainly belong to the highly designable
structures and have relatively large energy gaps. We analyse all 17 sequences
and find that the 14 can be related to each other by single mutations, which
implies that they belong to the ``neutral island'' suggested by Trinquier et al.\cite{15}.
These results are consistent with the view that protein structures are selected in nature because they
are readily designed and stable against mutations, and that such a selection
simultaneously leads to thermodynamic stability and foldability. Thus, a key
point in understanding the protein-folding problem is to understand the emergence
and the properties of highly designable structures.
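The weights of the cubic shape approximation admit a simple geometric
reading, which we state here as an assumption rather than as the
definition used in the calculation. Divide the neighborhood of a
lattice site into eight octants and let $m_\tau$ (a symbol introduced
here for illustration) be the number of octants that lie inside the
$3\times3\times3$ cube: $m_V=1$ for a vertex, $m_E=2$ for an edge,
and $m_F=4$ for a face center. Taking the weight as the fraction of
octants exposed to the medium gives
\begin{equation*}
\gamma_\tau=1-\frac{m_\tau}{8}:\qquad
\gamma_V=\frac{7}{8},\quad \gamma_E=\frac{6}{8},\quad \gamma_F=\frac{4}{8},
\end{equation*}
in agreement with the values quoted above. The central site has all
eight octants inside the cube and therefore carries zero weight in
this reading.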
410:
411: %fig2
412: \begin{figure*}
413: \includegraphics[width=0.28\textwidth]{fig_2a}
414: \includegraphics[width=0.28\textwidth]{fig_2b}
415: \includegraphics[width=0.28\textwidth]{fig_2c}
\caption{\label{fig:average}Average
gap of structures versus $N_s$ of the structures in the case of
${\gamma}_V=7/8$, $\gamma_E=6/8$, $\gamma_F=0$ for (a) $V=0.0001$,
(b) $V=0.9$, and (c) $V=2.1$, respectively.}
420: \end{figure*}
421:
422: %fig3
423: \begin{figure*}
424: \includegraphics[width=0.28\textwidth]{fig_3a}%
425: \includegraphics[width=0.28\textwidth]{fig_3b}
426: \includegraphics[width=0.28\textwidth]{fig_3c}
\caption{\label{fig:largest}
The largest average gap $\bar{\delta}_{max}$ versus the parameter $V$:
(a) for $\gamma_V=7/8$, $\gamma_E=6/8$, $\gamma_F=4/8$;
(b) for $\gamma_V=7/8$, $\gamma_E=6/8$, $\gamma_F=0$;
(c) for $\gamma_V=1$, $\gamma_E=1$, $\gamma_F=0$.}
438: \end{figure*}
439:
440: %fig4
441: \begin{figure*}
442: \includegraphics[width=0.28\textwidth]{fig_4a}
443: \includegraphics[width=0.28\textwidth]{fig_4b}
\caption{\label{fig:histogram}Histogram
of the number of sequences versus the energy gap for the 60 highly
designable structures in the absence of medium (left) and in the
presence of medium with $\gamma_V=7/8$, $\gamma_E=6/8$,
$\gamma_F=0$, $V=2.1$ (right).}
453: \end{figure*}
454:
The second parametrization is
$\gamma_V=7/8$,
$\gamma_E=6/8$, and $\gamma_F=0$, which models
a protein with 7 monomers inside and 20 at the surface.
In this case, we find 48 additional
sequences that possess a unique ground state over a wider
range of $V$ (from 0.0001 to 2.1) but have no unique
ground state in the case of Li et al.\cite{7}. However,
only one of these sequences designs a highly designable
structure, while the other 47 design lowly designable
structures. The energy gaps of all these new sequences
are found to be $V/8$.
Since the ratio of the number of monomers at the surface
to that inside is of order 1 in natural proteins\cite{8},
and the ratio in our model is 26:1 in the first case but
20:7 in the second case, the latter ought to be closer to
natural proteins. Fig.~\ref{fig:average} shows the
average energy gap for different potential parameters.
Clearly, the surface potential enhances the average gap
of highly designable structures, which illustrates that
the highly designable structures selected by nature are
more stable in proper media than in ``vacuum''. A recent
experiment\cite{16} revealed that the additional
stability of a thermophilic protein comes from just a few
residues at the protein surface. Thus our theoretical
results may draw more attention to the dependence of
stability on medium effects in further model studies.
502:
As a third case, we assume that the potentials at
the vertices and at the edges have the same weight,
i.e., $\gamma_V=1$, $\gamma_E=1$, and $\gamma_F=0$.
We find no sequences beyond those of Ref.\cite{7} that design
the highly designable structures. Just as in the result of Ref.\cite{14}, there are
also 60 structures that possess a large average gap. When the
effects of the medium are taken into account, the average gap of the highly
designable structures increases appreciably as the potential
parameter increases, whereas the average gap of the lowly designable
structures does not change much. In all the aforementioned
cases, the average gap of a single highly designable structure
increases linearly with $V$. Furthermore, we find that
the structure with the largest average gap is not the same
for all values of the potential parameter. Crossings
between energy levels always take place as the
potential parameter changes. It is therefore
worthwhile to point out that the gain in stability
varies between structures, and that the most stable
protein structure in one surrounding medium may no
longer be the most stable one in another medium. The
largest average gap versus the parameter $V$
is plotted in Fig.~\ref{fig:largest}
for the three cases of the weights
$\gamma$'s discussed above. To make the change
visible to the eye, the vertical axis in
Fig.~\ref{fig:largest} shows the largest
average gap minus $0.21V$,
$0.5V$, and $0.6V$ for the
cases (a), (b), and (c),
respectively. In each case
there is a critical value
of $V$ at which the
plot passes from one straight
line to another. The critical values
of $V$ differ from case to case, but the largest
average gap at the
transition point takes the
same value,
$\bar{\delta}_s=1.4137$.
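The kink in each panel is what one expects from the maximum of linear
functions. As a sketch, suppose that two structures, labeled 1 and 2,
compete for the largest average gap, with average gaps
$\bar{\delta}_k(V)=a_k+b_kV$ for $k=1,2$; the intercepts $a_k$ and
slopes $b_k$ are illustrative symbols and not values taken from the
calculation. If $a_1>a_2$ and $b_1<b_2$, then
\begin{equation*}
\bar{\delta}_{max}(V)=\max\{a_1+b_1V,\;a_2+b_2V\},\qquad
V_c=\frac{a_1-a_2}{b_2-b_1},
\end{equation*}
so that structure 1 has the largest average gap for $V<V_c$ and
structure 2 for $V>V_c$, and the plot consists of two straight lines
meeting at $V_c$. Subtracting a term proportional to $V$ from the
vertical axis changes both slopes by the same amount and leaves $V_c$
unchanged.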
559:
We analyze all the sequences that design the
60 highly designable structures. In
the absence of medium,
${\gamma}_V={\gamma}_E={\gamma}_F=0$,
the energy gaps ${\delta}_s$ of those
sequences range from 0.3 to
2.6 (see Fig.~\ref{fig:histogram}). Almost
half of them have small
energy gaps (around 0.3).
In the presence of medium,
the energy gap of most of
the sequences with a larger
(over 1) energy gap rises
as the parameter $V$ increases,
while that of the
sequences with a small energy
gap does not rise
appreciably. For the cases
(a) ${\gamma}_V=7/8$,
${\gamma}_E=6/8$,
${\gamma}_F=4/8$, (b)
${\gamma}_V=7/8$,
${\gamma}_E=6/8$, ${\gamma}_F=0$, and
(c) ${\gamma}_V={\gamma}_E=1$, ${\gamma}_F=0$,
the increments in the energy gaps are mainly $3V/8$, $7V/8$, and
$V$, respectively. However, there is also a small portion of
sequences whose energy gaps decrease in the medium, e.g., 276
sequences in the case
${\gamma}_V=7/8$,
${\gamma}_E=6/8$,
${\gamma}_F=4/8$.
Considering some particular
structures among the 60
highly designable ones, we
analyze the sequences that
design them. The energy gap
of the sequences with a
larger energy gap
mostly increases when the
sequence is placed in a
medium, which leads to the
linear increase of the average
gap. Our results also
show that the shapes of the
distributions are
similar for those three
structures. In addition,
the total number of
sequences in (b) is smaller
than in (c), but there are
many more sequences
possessing a large energy gap
in (b) than in (c).
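It may be noted that the dominant increments quoted for the cases
(a), (b), and (c) coincide with the difference between the vertex and
face-center weights in each case. We record this as an arithmetic
observation only; reading it as the cost of exchanging a polar
monomer between a vertex site and a face-center site is an assumption
and is not established by the data:
\begin{equation*}
(\gamma_V-\gamma_F)V=
\begin{cases}
(7/8-4/8)V=3V/8, & \text{case (a)},\\
(7/8-0)V=7V/8, & \text{case (b)},\\
(1-0)V=V, & \text{case (c)}.
\end{cases}
\end{equation*}
Similarly, $(\gamma_V-\gamma_E)V=V/8$ for $\gamma_V=7/8$ and
$\gamma_E=6/8$, which matches the gap $V/8$ found for the additional
sequences in the second parametrization.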
617:
In summary, our simple analysis of the average
distribution of the number of hydrophobic monomers can
explain why the lowly designable structures
possess a small average gap. Our model study shows
that the surface potential enhances the average gap of
highly designable structures, which implies
that the highly designable structures selected by
nature are more stable in proper media than in
``vacuum''. We found that the energy gap of the
sequences with a larger energy gap mostly
increases when the sequence is placed in a medium, which
leads to the linear increase of the average gap.
We also noticed that there is a critical value of the
surface potential parameter, which means that
the most stable structure may no longer be the most
stable one when the medium parameters change. Since
many studies have shown that several properties of
natural proteins can be captured by simple models,
our discussion may motivate the inclusion of medium
effects in theoretical studies in which
the medium potential has been ignored.
639:
640: %Acknowledgments
This work is supported by NSFC Grants No.~10225419 and No.~90103022.
642:
643: \begin{references}
644: \bibitem{1} C. Anfinsen, Science {\bf 181}, 223 (1973).
645: \bibitem{2} T. Lazaridis and M. Karplus, Science {\bf 278}, 1928 (1997).
\bibitem{3} H. Taketomi, Y. Ueda, and N. Go, Int. J. Pept. Protein Res. {\bf 7}, 445 (1975).
647: \bibitem{4} K. A. Dill, Biochemistry {\bf 24}, 1501 (1985).
\bibitem{5} M. E. Shakhnovich and A. Gutin, J. Chem. Phys. {\bf 93}, 5967 (1990).
649: \bibitem{6} W. Kauzmann, Adv. Protein Chem. {\bf 14}, 1 (1959).
650: \bibitem{7} H. Li, R. Helling, C. Tang, and N. S. Wingreen, Science {\bf 273}, 666 (1996).
651: \bibitem{8} H. Li, C. Tang, and N. S. Wingreen, Proc. Natl. Acad. Sci. USA {\bf 95}, 4987 (1998).
652: \bibitem{9} H. Cejtin, J. Edler, A. Gottlieb, R. Helling, H. Li, J. Philbin, N. Wingreen, and C. Tang, J. Chem. Phys. {\bf 116}, 352 (2002).
653: \bibitem{10} J. Miller, C. Zeng, N. S. Wingreen and C. Tang, Proteins {\bf 47}, 506 (2002).
654: \bibitem{11} N. E. G. Buchler and R. A. Goldstein, Proteins {\bf 34}, 113 (1999).
655: \bibitem{12} H. Li, C. Tang, and N. S. Wingreen, Proteins {\bf 49}, 403 (2002).
656: \bibitem{13} M. R. Ejtehadi, N. Hamedani, H. Seyed-Allaei, V. Shahrezaei, and M. Yahyanejad, J. Phys. A {\bf 31}, 6141 (1998).
\bibitem{14} B. N. Dominy, D. Perl, F. X. Schmid, and C. L. Brooks III, J. Mol. Biol. {\bf 319}, 541 (2002).
658: \bibitem{15} G. Trinquier and Y. H. Sanejouand, Phys. Rev. E {\bf 59}, 942 (1999).
659: \bibitem{16} D. Perl, U. Mueller, U. Heinemann, and F. X. Schmid, Nature Struct. Biol. {\bf 7}, 380 (2000).
660: \end{references}
661: %\bibliography{media}% Produces the bibliography via BibTeX.
662:
663: \end{document}
664: