1: 
2: \documentclass[aps,prb,twocolumn,floatfix]{revtex4}
3: \usepackage{graphicx}% Include figure files
4: 
5: \begin{document}
6: %\tightenlines
7: 
8: \title{Proteins and polymers}
9: 
10: 
11: \author{Jayanth R. Banavar}
12: \affiliation{Department of Physics, 104 Davey Lab, The
13: Pennsylvania State University, University Park PA 16802, USA}
14: 
15: \author{Trinh Xuan Hoang}
16: \affiliation{Institute of Physics and Electronics, Vietnamese
17: Academy of Science and Technology, 10 Dao Tan, Hanoi, Vietnam}
18: 
19: \author{Amos Maritan}
20: \affiliation{Dipartimento di Fisica `G. Galilei' and INFN,
21: Universit\`a di Padova, Via Marzolo 8, 35131 Padova, Italy}
22: 
23: \begin{abstract}
24: 
25: Proteins, chain molecules of amino acids, behave in ways which are
26: similar to each other yet quite distinct from standard compact
27: polymers. We demonstrate that the Flory theorem, derived for
28: polymer melts, holds for compact protein native state structures
29: and is not incompatible with the existence of structured building
30: blocks such as $\alpha$-helices and $\beta$-strands. We present a
31: discussion on how the notion of the thickness of a polymer chain,
32: besides being useful in describing a chain molecule in the
33: continuum limit, plays a vital role in interpolating between
34: conventional polymer physics and the phase of matter associated
35: with protein structures.
36: 
37: \end{abstract}
38: 
39: \pacs{\underline{87.15.-v}, 89.75.Fb, 05.20.-y}
40: 
41: \maketitle
42: 
43: %\newpage
44: \newcounter{ctr}
45: \setcounter{ctr}{1}
46: 
47: Proteins are chain molecules made up of small chemical entities
48: called amino acids.  In spite of their small size, the diverse
49: physical and chemical attributes of the twenty types of naturally
50: occurring amino acids and the history-dependent role played by
51: evolution, globular proteins exhibit a range of striking common
52: characteristics \cite{RMP}. Traditional attempts at creating a
53: framework for understanding proteins using ideas from polymer
54: physics have been largely unsuccessful as stated by
Flory~\cite{Flory}: ``Synthetic analogs of globular  proteins are
unknown. The capability  of adopting a dense  globular
configuration stabilized by self-interactions and of transforming
reversibly to the random coil are  peculiar to the chain molecules
of globular proteins alone.''  The standard models of polymer
physics do not explain why there is only a
relatively small number (of the order of a thousand) of native state folds
\cite{Chothia}, why they are inevitably made up of helices and
sheets \cite{Creighton}, and how these folds are adapted for
biological function, especially enzymatic activity.
65: 
66: In this paper, we seek to bridge this apparent gap between
67: polymer physics and the physics of compact biomolecules.  We do
68: this in two complementary ways: first, we study the average
69: behavior of compact protein native state structures and show that,
70: in spite of being made up mainly of $\alpha$-helices and
71: $\beta$-strands, the Flory theorem derived for polymer melts
72: \cite{polymerbooks,Orland} holds reasonably well
73: for native state protein structures as well; second, we
74: demonstrate that the notion of an anisotropic chain of non-zero
75: thickness is valuable for extrapolating from conventional polymer
76: physics to the phase used by nature to house protein structures.
77: 
78: 
79: Let us begin with an analysis of protein native state structures
80: from the protein data bank \cite{PDB} to assess the validity of
the Flory theorem. We consider a coarse-grained description in
which each amino acid is represented by its $C^{\alpha}$ atom; these
atoms act as the hinges of the protein backbone. It is well known from Flory's work
in polymer physics that a polymer melt, or even a long compact
polymer, has an interesting sub-structure
\cite{polymerbooks,Govorun,Lua}. The
87: basic idea is that a short labelled piece of a polymer chain from
88: within such a dense melt exhibits statistics (distributions and
89: an end-to-end distance) which are characteristic of random walk behavior.
90: Physically, the effective absence of any interaction
91: is believed to arise from the inability of the chain to
92: discern whether it is making contacts
93: with itself or with other chains.  Does the presumed
94: validity of the Flory theorem and the existence of Gaussian random
95: walk statistics for short chain segments preclude structures
96: built up from helices and sheets? Interestingly, it has been
97: suggested recently \cite{Rose} that model denatured proteins can
98: exhibit random coil statistics in spite of having significant
99: secondary structure.
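
As a point of reference for the Gaussian random walk statistics invoked
above, the textbook ideal-chain result can be written down explicitly.
This is only a sketch: the effective step length $a$ is an illustrative
symbol introduced here and is not a parameter extracted from the data.
For a segment of $l$ steps, the magnitude $R$ of the end-to-end vector
is distributed as
\begin{equation}
P_l(R) = 4\pi R^2 \left(\frac{3}{2\pi l a^2}\right)^{3/2}
\exp\left(-\frac{3R^2}{2 l a^2}\right),
\end{equation}
so that
\begin{equation}
\langle R^2 \rangle = l a^2 .
\end{equation}
The root-mean-square end-to-end distance therefore grows as $l^{1/2}$,
and the distributions for different $l$ collapse onto a single curve
when plotted against $R/l^{1/2}$.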
100: 
Our principal results are summarized in Figures 1--5. They
demonstrate that, for
compact proteins, whose radius of gyration
scales approximately as the cube root of the protein size (see
Figure 1):
106: 
1) The Flory theorem is found to hold (Figure 2) for protein
108: segments made up of more than 48 amino acids.  The existence of
109: secondary motifs results in an effective persistence length of this order
110: beyond which one obtains Gaussian statistics (Figure 3)
111: accompanied by random walk behavior.
112: 
113: 2) The validity of the Flory theorem is {\em not} incompatible
114: with the existence of secondary motifs \cite{Lua}.
115: 
116: 3) One can understand the crossover in Figure 2 by studying
117: correlation functions of the tangent and the binormal vectors
118: along the chain (Figures 4 and 5).
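
The two scaling regimes referred to in the list above can be related by
a simple estimate. What follows is a sketch that assumes a globule of
uniform density and measures lengths in units of a typical inter-residue
distance, with constants of order unity dropped. If each amino acid
occupies a roughly constant volume, the volume of a compact protein of
$L$ amino acids is proportional to $L$, so that its linear size scales as
\begin{equation}
R_g \sim L^{1/3}.
\end{equation}
A segment of $l$ amino acids obeying random walk statistics has a
typical extent
\begin{equation}
R(l) \sim l^{1/2},
\end{equation}
which can remain smaller than the size of the globule only if
$l^{1/2} < L^{1/3}$, that is,
\begin{equation}
l < L^{2/3} \quad \mbox{or, equivalently,} \quad L > l^{3/2}.
\end{equation}
This estimate is consistent with the condition, used in Figure 2, that
segments of length $l$ be taken only from proteins whose lengths exceed
$l^{3/2}$.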
119: 
120: Our results vividly demonstrate that proteins exhibit properties
121: that are not incompatible with those of generic compact polymers.
122: However, as stated before, the standard models of polymer physics
123: do not account for the rich phase of matter associated with
124: protein native state structures.  In order to proceed, let us
125: recall that a dominant structural motif used in biomolecular
126: structures is the helix \cite{WC,Pauling1}. An everyday object
127: which, on compaction, can be coiled naturally and efficiently into
128: a helical shape is a garden hose or a tube \cite{MaritanNature}. A
129: tube can be thought of as a thick polymer, a polymer chain endowed
130: with a natural thickness. We will proceed to study the attributes
131: of a tube and its relationship with conventional descriptions of
132: polymers.
133: 
134: In the continuum, a non-zero chain thickness serves a valuable
135: purpose. Consider first a polymer chain of vanishing thickness in
the continuum. It is well-known \cite{polymerbooks} that the
end-to-end distance, $R$, of a swollen, self-avoiding chain scales
approximately as the $3/5$-th power of its length, $L$. In the
139: absence of any other length scale in the problem (recall that we
140: are dealing with a chain of zero thickness in the continuum), one
141: is led to a fundamental problem in simple dimensional analysis in
142: expressing the relationship $R \sim L^{0.6}$ -- both $R$ and $L$
143: have units of length and there is no other length scale in the
144: problem which can be used to fix the correct dimension in the
145: scaling relation. In order to study a chain molecule in the
146: continuum, the traditional approach has been to use the powerful
147: machinery of renormalization group theory \cite{Wilson}. A tube of
148: non-zero thickness circumvents this problem by providing the
149: required additional length scale naturally, even in the continuum.
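Indeed, one may write a scaling form $R(L, b, \Delta) = L
F(L/\Delta,b/\Delta)$, where $\Delta$ is the tube thickness and $b$ is
a microscopic length such as the spacing between successive points of
a discretized chain. The
continuum limit can be safely taken by letting $b$ go to $0$,
leading to $R = L F(L/\Delta,0) \sim \Delta^{1-\nu} L^\nu$, where $\nu$
is the swollen-chain exponent, $\nu \simeq 3/5$.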
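
The last step can be spelled out as follows. This is a sketch of the
standard scaling argument, in which the large-argument form of $F$ is
assumed rather than derived. Both arguments of $F$ are dimensionless, so
$R = L\,F(L/\Delta,b/\Delta)$ has the dimensions of a length for any
choice of $F$. If, for $L \gg \Delta$, the limiting function behaves as
a power law,
\begin{equation}
F(x,0) \sim x^{\nu-1} \qquad (x \gg 1),
\end{equation}
then
\begin{equation}
R = L\,F(L/\Delta,0) \sim L \left(\frac{L}{\Delta}\right)^{\nu-1}
= \Delta^{1-\nu} L^{\nu}.
\end{equation}
For $\nu = 3/5$ this gives $R \sim \Delta^{2/5} L^{3/5}$, which is
dimensionally consistent: the thickness supplies the length scale that
is missing for a chain of zero thickness.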
154: 
155: 
156: An interesting issue in polymer physics is the description, in the
157: continuum, of a closed chain with certain knot topologies. One, of
158: course, requires physically that the knot number be preserved in
any dynamics.  A string described in the standard continuum
approach is necessarily characterized by an infinitesimal
thickness and allows changes in the knot topology at a finite
energy cost, rendering the model somewhat unphysical in this
regard. This problem is cured by the tube description. Hard
164: spheres have been studied for centuries and their self-avoidance
165: is ensured by considering all pairs of spheres and requiring that
166: their centers are no closer than the sphere diameter. Strikingly,
167: the generalization of this result to a tube entails a simple
168: modification of the standard pair-wise interactions \cite{BGMM}.
169: For each pair of points along the tube axis, one draws two circles
170: both passing through the two points and each one tangential to the
171: axis at one or the other location. One then simply requires that
172: none of the radii is smaller than the tube radius \cite{GM,BGMM}.
173: The use of many-body potentials is an essential ingredient for
174: describing a tube in the continuum \cite{BGMM}. The many-body
175: potential replaces the pairwise self-interaction potential and
176: ought not to be thought of as a higher order correction.
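
The circle construction described above can be made explicit. The
notation below is introduced here for illustration and need not coincide
with that of Refs.~\cite{GM,BGMM}. Let $\mathbf{x}$ and $\mathbf{y}$ be
two points on the tube axis, let $\hat{\mathbf{t}}_{\mathbf{x}}$ be the
unit tangent to the axis at $\mathbf{x}$, and let $\theta$ be the angle
between $\hat{\mathbf{t}}_{\mathbf{x}}$ and the chord
$\mathbf{y}-\mathbf{x}$. By the tangent-chord theorem, the circle
passing through both points and tangent to the axis at $\mathbf{x}$ has
radius
\begin{equation}
r(\mathbf{x},\mathbf{y}) = \frac{|\mathbf{y}-\mathbf{x}|}{2\sin\theta}
= \frac{|\mathbf{y}-\mathbf{x}|^2}
{2\,|\hat{\mathbf{t}}_{\mathbf{x}} \times (\mathbf{y}-\mathbf{x})|},
\end{equation}
and $r(\mathbf{y},\mathbf{x})$ follows by exchanging the roles of the
two points. Writing $r_{\rm tube}$ for the tube radius (again an
illustrative symbol), the self-avoidance condition reads
\begin{equation}
\min\left[\,r(\mathbf{x},\mathbf{y}),\, r(\mathbf{y},\mathbf{x})\right]
\geq r_{\rm tube}
\quad \mbox{for all pairs } \mathbf{x} \neq \mathbf{y}.
\end{equation}
Two limits serve as checks. When the tangent is parallel to the chord,
$\theta \to 0$ and the radius diverges, so two points on a straight
piece of the axis impose no constraint on each other. When $\mathbf{y}$
approaches $\mathbf{x}$ along the axis, $r(\mathbf{x},\mathbf{y})$ tends
to the local radius of curvature, so the same condition also bounds the
local bending of the tube.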
177: 
178: 
179: The coarse-grained flexible tube model captures two essential
180: ingredients of proteins  -- the space within a tube roughly allows
181: for the packing of the protein atoms and local steric effects are
182: encapsulated by constraints on the local radius of curvature; the
183: effects of the geometrical constraints imposed by the chemistry of
184: backbone hydrogen bonds are represented by the inherent anisotropy
185: of a tube (a  tube, when discretized, may be imagined to be a chain of
186: discs).  The generic compact polymer phase arises for long tubes with a
187: thickness much smaller than the range of attractive interactions
188: promoting compaction.
189: 
190: Recent work \cite{HoangPNAS} has shown that the low energy conformations
191: adopted by tube-like polymers with certain constraints on symmetry and
192: geometry are made up of helices and sheets akin to marginally compact
193: protein secondary structures.
194: For classes of short homopolymers characterized by generic
195: geometrical constraints arising from backbone hydrogen bonds and
196: sterics and with mild variations in their overall hydrophobicity
197: and local curvature energy penalty parameters, one obtains a free
198: energy landscape\cite{HoangPNAS}, determined by geometry and
199: symmetry, with multiple minima corresponding to the menu of folds.
200: We have generated a thousand structures with low energies of a
201: homopolymer of length $N=48$. The structures are local energy
202: minima in simulated annealing simulations. A refined set of about
203: 320 protein-like structures is obtained by choosing only those
that are marginally compact ($7.6$~\AA\ $< R_g < 12$~\AA) and have a
205: sufficient amount of secondary structure content (the fraction of
206: residues participating in either a helix or a sheet is larger than
207: 60\% of the total number of residues). 
Strikingly, Figure 6a shows that short segments of real
proteins and of the model structures behave in qualitatively similar ways.
The deviation from Gaussian behavior in both cases is due to the presence
of secondary structures, whose characteristic length scale is smaller for
the model structures than for real proteins. Interestingly, even for
relatively short segment lengths ($l = 8$, 12) in the model structures, one
214: observes statistical behavior somewhat similar to that of Gaussian chains
215: (Figure 6b) along with significant deviations, most notably a peak due to
216: the presence of the secondary structures.  Due to the limited chain length
217: that one can reliably study in the model we are not able to observe the
218: crossover to the regime predicted by Flory.
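
For reference, the radius of gyration entering the compactness criterion
above can be written in its standard form. The $C^{\alpha}$-only version
given here is an assumption consistent with the coarse-grained
description, not a statement of the exact procedure used to select the
structures. With $\mathbf{r}_i$ the position of the $i$-th $C^{\alpha}$
atom in a chain of $N$ residues,
\begin{equation}
R_g^2 = \frac{1}{N}\sum_{i=1}^{N}
\left|\mathbf{r}_i - \mathbf{r}_{\rm cm}\right|^2,
\qquad
\mathbf{r}_{\rm cm} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{r}_i ,
\end{equation}
or, equivalently,
\begin{equation}
R_g^2 = \frac{1}{2N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}
\left|\mathbf{r}_i - \mathbf{r}_j\right|^2 .
\end{equation}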
219: 
220: In summary, we have shown that there is a natural bridge, provided
221: by the chain thickness, between polymer physics and the physics of
222: biomolecular structures.  The thickness provides a physically
223: motivated cut-off length scale which allows for a well-defined
224: continuum limit. The Flory theorem is found to hold for proteins
225: in spite of the structured building blocks of protein native state
226: structures. Our results suggest that the powerful arsenal of
227: techniques of polymer physics can be brought to bear on the
protein problem and, conversely, that viewing chain molecules
as inherently anisotropic objects of non-zero thickness provides a
new perspective in the field of polymer physics.
231: 
232: 
233: This work was supported by PRIN 2003, INFN, NASA, NSF IGERT grant
234: DGE-9987589, NSF MRSEC at Penn State, and the NSC of Vietnam (grant 
235: No. 410704).
236: 
237: \begin{thebibliography}{99}
238: 
239: \bibitem{RMP}
240: J. R. Banavar and A. Maritan, Rev. Mod. Phys. {\bf 75}, 23 (2003).
241: 
242: \bibitem{Flory}
243: P. J. Flory, {\it Statistical Mechanics of Chain Molecules}
244: (Wiley, New York, 1969).
245: 
246: \bibitem{Chothia}
247: C. Chothia, Nature (London) {\bf 357}, 543 (1992).
248: 
249: \bibitem{Creighton}
250: T. E. Creighton, {\it Proteins, Structure and Molecular Properties,
251: 2nd ed.} (Freeman, New York, 1993).
252: 
253: \bibitem{polymerbooks}
254: P. J. Flory, {\it Principles of polymer chemistry}
255: (Cornell University Press, Ithaca, 1953);
256: A. Y. Grosberg and A. R. Khokhlov, {\it Statistical Physics of
257: Macromolecules} (AIP Press, New York, 1994);
258: M. Rubinstein and R. Colby, {\it Polymer physics}
259: (Oxford University Press, New York, 2003);
260: C. Vanderzande, {\it Lattice models of polymers}
(Cambridge University Press, Cambridge, 1998).
262: 
263: \bibitem{Orland}
264: H. Orland, Journal de Physique I {\bf 4}, 101 (1994).
265: 
266: \bibitem{PDB}
H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat,
H. Weissig, I. N. Shindyalov, and P. E. Bourne,
Nucleic Acids Research {\bf 28}, 235 (2000).
270: 
271: \bibitem{Govorun}
272: E. N. Govorun, V. A. Ivanov, A. R. Khokhlov, P. G. Khalatur,
273: A. L. Borovinsky, A. Y. Grosberg, Phys. Rev. E {\bf 64}, 040903 (2001).
274: 
275: \bibitem{Lua}
R. Lua, A. L. Borovinsky, A. Y. Grosberg, Polymer {\bf 45}, 717 (2004).
277: 
278: \bibitem{Rose}
N. C. Fitzkee and G. D. Rose, Proc. Natl. Acad. Sci. USA (in press).
280: 
281: \bibitem{WC} J.~D. Watson and F.~H.~C. Crick, Nature {\bf 171}, 737 (1953).
282: 
283: \bibitem{Pauling1}
284: L. Pauling, R.~B. Corey, and H.~R. Branson,
Proc. Natl. Acad. Sci. USA {\bf 37}, 205 (1951).
286: 
287: \bibitem{MaritanNature}
288: A. Maritan, C. Micheletti, A. Trovato, and J.~R. Banavar,
289: Nature {\bf 406}, 287 (2000);
A. Stasiak and J.~H. Maddocks, Nature {\bf 406}, 251 (2000).
291: 
292: \bibitem{Wilson}
293: K. G. Wilson, Rev. Mod. Phys. {\bf 55}, 583 (1983).
294: 
295: \bibitem{BGMM}
296: J. R. Banavar, O. Gonzalez, J. H. Maddocks and A. Maritan,
297: J. Stat. Phys. {\bf 110}, 35 (2003).
298: 
299: %\bibitem{Stasiak}
300: %A. Stasiak, V. Katritch and L. H. Kauffman, eds., {\it Ideal Knots}
301: %(World Scientific Publishing, Singapore, 1998).
302: 
303: \bibitem{GM}
304: O. Gonzalez and J. H. Maddocks, Proc. Natl. Acad. Sci. USA
305: {\bf 96}, 4769 (1999).
306: 
307: \bibitem{Edwards}
308: M. Doi and S. F. Edwards, {\it The Theory of Polymer Dynamics}
309: (Clarendon Press, New York, 1993).
310: 
311: \bibitem{PRE}
312: J. R. Banavar, T. X. Hoang, A. Maritan, F. Seno and A. Trovato,
Phys. Rev. E {\bf 70}, 041905 (2004).
314: 
315: \bibitem{BMMT}
316: J. R. Banavar, A. Maritan, C. Micheletti and A. Trovato,
317: Prot. Struct. Func. Gen. {\bf 47}, 315 (2002).
318: 
319: \bibitem{HoangPNAS}
320: T. X. Hoang, A. Trovato, F. Seno, J. R. Banavar and A. Maritan,
321: Proc. Natl. Acad. Sci. USA {\bf 101}, 7960 (2004).
322: 
323: 
324: \end{thebibliography}
325: 
326: \newpage
327: 
328: \section*{Figure Captions}
329: 
330: \begin{description}
331: \item[Fig. 1.]Log-log
332: plot of the radius of gyration $R_g$ of a set of 700 proteins versus their
333: length $L$ or the number of constituent amino acids. The proteins
334: used in our study were selected based on several criteria: the
335: sequences chosen have less than 50\% overlap with each other, the
336: structures have been obtained with high resolution X-ray
337: diffraction and the proteins are substantially compact without
dangling ends so that $R_g/L^{1/3} \leq 3.02$~\AA. The straight line
339: has a slope of $1/3$ as a guide to the eye.
340: \item[Fig. 2.]Log-log
plot of the end-to-end distance $R$ versus $l$ for protein
342: segments. The plot was obtained by averaging over all segments of
343: length $l$ selected from the data set depicted in Figure 1. For a
344: given $l$, $R$ was determined as an average over all segments of
345: that length in proteins whose lengths are greater than $l^{3/2}$,
346: in order to avoid finite size effects \cite{Lua}. The error bars
are of the order of the size of the symbols. Note the plateau
around $l=24$, where $R$ increases only slowly with $l$.
For values of $l$ larger than 48, we find that $R \sim
l^{1/2}$.
351: \item[Fig. 3.]Statistics of the end-to-end distance of segments of
352: proteins of length $l$. For $l=48$, 64 and 80, the distributions
353: show a nice collapse to the form expected for Gaussian statistics:
354: the solid line denotes the function
$P(x)=\frac{1}{\sigma^3}\sqrt{\frac{2}{\pi
l}}\,x^2\exp\left(-\frac{x^2}{2\sigma^2}\right)$, where $\sigma=2.164$~\AA.
For $l=16$, where the presence of secondary motifs plays a major
358: role, the distribution is qualitatively different from the other
359: sizes and exhibits a peak arising from the presence of
360: $\alpha$-helices.
361: \item[Fig. 4.]Plot of
362: the tangent-tangent and binormal-binormal correlation functions
363: along the protein sequence derived from our data set. The tangent
vector at location $i$ is defined as a unit vector pointing along
the line joining the positions of the $(i-1)$-th and the $(i+1)$-th
amino acids.
The normal vector is defined by joining the $i$-th location to the
center of the circle drawn through the locations of three consecutive
amino acids ($i-1$, $i$, $i+1$). The binormal is perpendicular to the
370: plane defined by the tangent and the normal. Note that: a) the
371: negative tangent-tangent correlation at sequence separation $k$
372: around 13 corresponds to a turning back,  on average, of the chain
373: direction and is related to the cross-over shown in Figure 2; b)
374: the binormal-binormal correlation remains non-zero for large
375: separations.
376: \item[Fig. 5.]Histogram of the magnitudes of the average tangent and
377: binormal vectors for each protein in our data set. For each
protein, we measured the magnitude as $\frac {1} {N}
\left|\sum_{i=1}^N \vec{v}_i\right|$, where $\vec{v}_i$ is either the
380: unit tangent or the unit binormal vector at location $i$ and $N$
381: is the number of such vectors for a given protein. For comparison,
382: a histogram of the magnitudes of the average of randomly oriented
383: vectors is shown as the shaded histogram. (Here $\vec{v}_i$ was
384: selected to be a randomly oriented unit vector.) Note that several
385: proteins have a significant non-zero mean binormal vector due to
386: the presence of $\alpha$-helices.
387: \item[Fig. 6.](a) Statistics of the end-to-end distance of
388: segments of length $l=6$ taken from model protein structures 
389: \cite{HoangPNAS} and from PDB structures. The peak in the
390: distributions arises from the presence of $\alpha$-helices.
391: (b) Same as Figure 3 but for segments of the model structures
392: of lengths $l=8$ and $l=12$. The fits to the
393: Gaussian form given in the caption of Figure 3 yield
$\sigma=2.61$~\AA\ for $l=8$ and $\sigma=2.08$~\AA\ for $l=12$.
395: \end{description}
396: 
397: \clearpage
398: 
399: 
400: \begin{figure}
401: \centerline{\includegraphics[width=3.2in]{fig1.eps}}
402: %\vspace{30pt}
403: %\centerline{J. R. Banavar et al., Fig. 1}
404: \caption{}
405: \end{figure}
406: %\clearpage
407: 
408: \begin{figure}
409: \centerline{\includegraphics[width=3.2in]{fig2.eps}}
410: %\vspace{30pt}
411: %\centerline{J. R. Banavar et al., Fig. 2}
412: \caption{}
413: \end{figure}
414: %\clearpage
415: 
416: \begin{figure}
417: \centerline{\includegraphics[width=3.2in]{fig3.eps}}
418: %\vspace{30pt}
419: %\centerline{J. R. Banavar et al., Fig. 3}
420: \caption{}
421: \end{figure}
422: %\clearpage
423: 
424: \begin{figure}
425: \centerline{\includegraphics[width=3.2in]{fig4.eps}}
426: %\vspace{30pt}
427: %\centerline{J. R. Banavar et al., Fig. 4}
428: \caption{}
429: \end{figure}
430: %\clearpage
431: 
432: \begin{figure}
433: \centerline{\includegraphics[width=3.2in]{fig5.eps}}
434: %\vspace{30pt}
435: %\centerline{J. R. Banavar et al., Fig. 5}
436: \caption{}
437: \end{figure}
438: %\clearpage
439: 
440: \begin{figure}
441: \centerline{\includegraphics[width=3.2in]{fig6.eps}}
442: %\vspace{30pt}
443: %\centerline{J. R. Banavar et al., Fig. 6}
444: \caption{ }
445: \end{figure}
446: 
447: %\eject
448: \end{document}
449: