0508:q-bio0508009/flory.tex

1:

2: \documentclass[aps,prb,twocolumn,floatfix]{revtex4}

3: \usepackage{graphicx}% Include figure files

4:

5: \begin{document}

6: %\tightenlines

7:

8: \title{Proteins and polymers}

9:

10:

11: \author{Jayanth R. Banavar}

12: \affiliation{Department of Physics, 104 Davey Lab, The

13: Pennsylvania State University, University Park PA 16802, USA}

14:

15: \author{Trinh Xuan Hoang}

16: \affiliation{Institute of Physics and Electronics, Vietnamese

17: Academy of Science and Technology, 10 Dao Tan, Hanoi, Vietnam}

18:

19: \author{Amos Maritan}

20: \affiliation{Dipartimento di Fisica `G. Galilei' and INFN,

21: Universit\`a di Padova, Via Marzolo 8, 35131 Padova, Italy}

22:

23: \begin{abstract}

24:

25: Proteins, chain molecules of amino acids, behave in ways which are

26: similar to each other yet quite distinct from standard compact

27: polymers. We demonstrate that the Flory theorem, derived for

28: polymer melts, holds for compact protein native state structures

29: and is not incompatible with the existence of structured building

30: blocks such as $\alpha$-helices and $\beta$-strands. We present a

31: discussion on how the notion of the thickness of a polymer chain,

32: besides being useful in describing a chain molecule in the

33: continuum limit, plays a vital role in interpolating between

34: conventional polymer physics and the phase of matter associated

35: with protein structures.

36:

37: \end{abstract}

38:

39: \pacs{\underline{87.15.-v}, 89.75.Fb, 05.20.-y}

40:

41: \maketitle

42:

43: %\newpage

44: \newcounter{ctr}

45: \setcounter{ctr}{1}

46:

47: Proteins are chain molecules made up of small chemical entities

48: called amino acids.  In spite of their small size, the diverse

49: physical and chemical attributes of the twenty types of naturally

50: occurring amino acids and the history-dependent role played by

51: evolution, globular proteins exhibit a range of striking common

52: characteristics \cite{RMP}. Traditional attempts at creating a

53: framework for understanding proteins using ideas from polymer

54: physics have been largely unsuccessful as stated by

55: Flory\cite{Flory}: ``Synthetic analogs of globular  proteins are

56: unknown. The capability  of adopting a dense  globular

57: configuration stabilized by self-interactions and of transforming

58: reversibly to the random coil are  peculiar to the chain molecules

59: of globular proteins alone.''  The standard models of polymer

60: physics do not provide an explanation for why there are a

61: relatively small number (of order a thousand) of native state folds

62: \cite{Chothia}, why they are inevitably made up of helices and

63: sheets \cite{Creighton} and how these folds are adapted for

64: biological function, especially enzymatic activity.

65:

66: In this paper, we seek to bridge this apparent gap between

67: polymer physics and the physics of compact biomolecules.  We do

68: this in two complementary ways: first, we study the average

69: behavior of compact protein native state structures and show that,

70: in spite of being made up mainly of $\alpha$-helices and

71: $\beta$-strands, the Flory theorem derived for polymer melts

72: \cite{polymerbooks,Orland} holds reasonably well

73: for native state protein structures as well; second, we

74: demonstrate that the notion of an anisotropic chain of non-zero

75: thickness is valuable for extrapolating from conventional polymer

76: physics to the phase used by nature to house protein structures.

77:

78:

79: Let us begin with an analysis of protein native state structures

80: from the protein data bank \cite{PDB} to assess the validity of

81: the Flory theorem. We consider a coarse-grained description in

82: which each amino acid is represented by its $C^{\alpha}$ atom, a

83: hinge of the protein backbone. It is well known from Flory's work

84: in polymer physics that a polymer melt, or even a long compact

85: polymer, has a very interesting sub-structure

86: \cite{polymerbooks,Govorun,Lua}. The

87: basic idea is that a short labelled piece of a polymer chain from

88: within such a dense melt exhibits statistics (distributions and

89: an end-to-end distance) which are characteristic of random walk behavior.

90: Physically, the effective absence of any interaction

91: is believed to arise from the inability of the chain to

92: discern whether it is making contacts

93: with itself or with other chains.  Does the presumed

94: validity of the Flory theorem and the existence of Gaussian random

95: walk statistics for short chain segments preclude structures

96: built up from helices and sheets? Interestingly, it has been

97: suggested recently \cite{Rose} that model denatured proteins can

98: exhibit random coil statistics in spite of having significant

99: secondary structure.
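
As a hedged sketch of the scaling picture behind the Flory theorem, consider a segment of $l$ monomers inside a compact globule of $L$ monomers. The effective step length $a$ and the crossover length $l^{*}$ are symbols introduced here for illustration only, and all relations hold up to numerical factors of order one:
\begin{equation}
R(l) \sim a\, l^{1/2} \quad (l \ll l^{*}), \qquad
R(l) \sim a\, L^{1/3} \quad (l \gg l^{*}).
\end{equation}
Matching the two regimes fixes the crossover,
\begin{equation}
a\,(l^{*})^{1/2} \sim a\, L^{1/3} \quad \Longrightarrow \quad l^{*} \sim L^{2/3}.
\end{equation}
Random walk statistics for a segment of length $l$ can therefore be expected only when $L$ exceeds roughly $l^{3/2}$, which is the finite-size criterion applied in the caption of Figure 2.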

100:

101: Our principal results are summarized in Figures 1--5 and

102: demonstrate that for

103: compact proteins, characterized by an end-to-end distance

104: scaling approximately as the cube root of the protein size (see

105: Figure 1):

106:

107: 1) The Flory theorem is found to hold (Figure 2) for protein

108: segments made up of more than 48 amino acids.  The existence of

109: secondary motifs results in an effective persistence length of this order

110: beyond which one obtains Gaussian statistics (Figure 3)

111: accompanied by random walk behavior.

112:

113: 2) The validity of the Flory theorem is {\em not} incompatible

114: with the existence of secondary motifs \cite{Lua}.

115:

116: 3) One can understand the crossover in Figure 2 by studying

117: correlation functions of the tangent and the binormal vectors

118: along the chain (Figures 4 and 5).
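
To make the link between item 3 and the crossover explicit, one possible set of definitions is sketched here. The symbols $\vec{r}_i$ (position of the $i$-th $C^{\alpha}$ atom), $\vec{u}_i$, $C_t(k)$ and $C_b(k)$ are notation assumed for this sketch, and the averaging convention (over all positions $i$ and all proteins in the data set) is likewise an assumption. With $\hat{n}_i$ the unit normal described in the caption of Figure 4,
\begin{equation}
\hat{t}_i = \frac{\vec{r}_{i+1}-\vec{r}_{i-1}}{|\vec{r}_{i+1}-\vec{r}_{i-1}|}, \qquad
\hat{b}_i = \hat{t}_i \times \hat{n}_i ,
\end{equation}
\begin{equation}
C_t(k) = \langle \hat{t}_i \cdot \hat{t}_{i+k} \rangle, \qquad
C_b(k) = \langle \hat{b}_i \cdot \hat{b}_{i+k} \rangle .
\end{equation}
For the bond vectors $\vec{u}_i = \vec{r}_{i+1}-\vec{r}_i$ (rather than the tangents above), the squared end-to-end distance of a segment spanning $l$ consecutive bonds obeys the exact identity
\begin{equation}
R^2 = \Big| \sum_{i=1}^{l} \vec{u}_i \Big|^2
    = \sum_{i=1}^{l} |\vec{u}_i|^2 + 2 \sum_{i<j} \vec{u}_i \cdot \vec{u}_j ,
\end{equation}
so negative orientational correlations at a given sequence separation reduce the growth of $R$ with $l$, in line with the plateau seen in Figure 2.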

119:

120: Our results vividly demonstrate that proteins exhibit properties

121: that are not incompatible with those of generic compact polymers.

122: However, as stated before, the standard models of polymer physics

123: do not account for the rich phase of matter associated with

124: protein native state structures.  In order to proceed, let us

125: recall that a dominant structural motif used in biomolecular

126: structures is the helix \cite{WC,Pauling1}. An everyday object

127: which, on compaction, can be coiled naturally and efficiently into

128: a helical shape is a garden hose or a tube \cite{MaritanNature}. A

129: tube can be thought of as a thick polymer, a polymer chain endowed

130: with a natural thickness. We will proceed to study the attributes

131: of a tube and its relationship with conventional descriptions of

132: polymers.

133:

134: In the continuum, a non-zero chain thickness serves a valuable

135: purpose. Consider first a polymer chain of vanishing thickness in

136: the continuum. It is well known \cite{polymerbooks} that the

137: end-to-end distance, $R$, of a swollen, self-avoiding chain scales

138: approximately as the $3/5$-th power of its length, $L$. In the

139: absence of any other length scale in the problem (recall that we

140: are dealing with a chain of zero thickness in the continuum), one

141: is led to a fundamental problem in simple dimensional analysis in

142: expressing the relationship $R \sim L^{0.6}$ -- both $R$ and $L$

143: have units of length and there is no other length scale in the

144: problem which can be used to fix the correct dimension in the

145: scaling relation. In order to study a chain molecule in the

146: continuum, the traditional approach has been to use the powerful

147: machinery of renormalization group theory \cite{Wilson}. A tube of

148: non-zero thickness circumvents this problem by providing the

149: required additional length scale naturally, even in the continuum.

150: Indeed, one may write a scaling form $R(L, b, \Delta) = L

151: F(L/\Delta,b/\Delta)$, where $\Delta$ is the tube thickness and $b$ is the microscopic discretization length (e.g., the bond length). The

152: continuum limit can be safely taken by letting $b$ go to $0$

153: leading to $R = L F(L/\Delta,0) \sim \Delta^{1-\nu} L^\nu$.
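
The last step can be spelled out as follows. This is a sketch that assumes the scaling function has a power-law form at large argument, with $\nu \approx 3/5$ the swollen-chain exponent quoted above. For $L \gg \Delta$ one requires
\begin{equation}
F(x,0) \sim x^{\nu-1} \qquad (x = L/\Delta \gg 1),
\end{equation}
so that
\begin{equation}
R = L\,F(L/\Delta,0) \sim L \left(\frac{L}{\Delta}\right)^{\nu-1} = \Delta^{1-\nu} L^{\nu}.
\end{equation}
Both sides now carry units of length, with $\Delta$ supplying the factor that the relation $R \sim L^{0.6}$ alone could not.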

154:

155:

156: An interesting issue in polymer physics is the description, in the

157: continuum, of a closed chain with certain knot topologies. One, of

158: course, requires physically that the knot number be preserved in

159: any dynamics.  A string described in the standard continuum

160: approach is necessarily characterized by an infinitesimal

161: thickness and allows changes in the knot topology with a finite

162: energy cost rendering the model somewhat unphysical in this

163: regard. This problem is cured by the tube description. Hard

164: spheres have been studied for centuries and their self-avoidance

165: is ensured by considering all pairs of spheres and requiring that

166: their centers are no closer than the sphere diameter. Strikingly,

167: the generalization of this result to a tube entails a simple

168: modification of the standard pair-wise interactions \cite{BGMM}.

169: For each pair of points along the tube axis, one draws two circles

170: both passing through the two points and each one tangential to the

171: axis at one or the other location. One then simply requires that

172: none of the radii is smaller than the tube radius \cite{GM,BGMM}.

173: The use of many-body potentials is an essential ingredient for

174: describing a tube in the continuum \cite{BGMM}. The many-body

175: potential replaces the pairwise self-interaction potential and

176: ought not to be thought of as a higher order correction.
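
The construction above can be written compactly. The notation is introduced here for illustration: $\vec{x}$ and $\vec{y}$ are two points on the tube axis, $\hat{t}(\vec{x})$ is the unit tangent at $\vec{x}$, $\theta_{\vec{x}}$ is the angle between the chord $\vec{x}-\vec{y}$ and that tangent, and $r_t$ is the circle radius; we also assume that $\Delta$ denotes the tube radius. From elementary geometry, the circle through $\vec{x}$ and $\vec{y}$ that is tangent to the axis at $\vec{x}$ has radius
\begin{equation}
r_t(\vec{x},\vec{y}) = \frac{|\vec{x}-\vec{y}|}{2 \sin\theta_{\vec{x}}}
 = \frac{|\vec{x}-\vec{y}|^2}{2\,|(\vec{x}-\vec{y}) \times \hat{t}(\vec{x})|},
\end{equation}
and the self-avoidance condition reads
\begin{equation}
\min\{\, r_t(\vec{x},\vec{y}),\; r_t(\vec{y},\vec{x}) \,\} \geq \Delta
\quad \mbox{for all pairs of points on the axis}.
\end{equation}
As $\vec{y} \to \vec{x}$ along a smooth axis, $r_t$ tends to the local radius of curvature, so the same condition also limits how tightly the tube can bend.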

177:

178:

179: The coarse-grained flexible tube model captures two essential

180: ingredients of proteins  -- the space within a tube roughly allows

181: for the packing of the protein atoms and local steric effects are

182: encapsulated by constraints on the local radius of curvature; the

183: effects of the geometrical constraints imposed by the chemistry of

184: backbone hydrogen bonds are represented by the inherent anisotropy

185: of a tube (a  tube, when discretized, may be imagined to be a chain of

186: discs).  The generic compact polymer phase arises for long tubes with a

187: thickness much smaller than the range of attractive interactions

188: promoting compaction.

189:

190: Recent work \cite{HoangPNAS} has shown that the low energy conformations

191: adopted by tube-like polymers with certain constraints on symmetry and

192: geometry are made up of helices and sheets akin to marginally compact

193: protein secondary structures.

194: For classes of short homopolymers characterized by generic

195: geometrical constraints arising from backbone hydrogen bonds and

196: sterics and with mild variations in their overall hydrophobicity

197: and local curvature energy penalty parameters, one obtains a free

198: energy landscape\cite{HoangPNAS}, determined by geometry and

199: symmetry, with multiple minima corresponding to the menu of folds.

200: We have generated a thousand structures with low energies of a

201: homopolymer of length $N=48$. The structures are local energy

202: minima in simulated annealing simulations. A refined set of about

203: 320 protein-like structures is obtained by choosing only those

204: that are marginally compact ($7.6\AA < R_g < 12\AA$) and have a

205: sufficient amount of secondary structure content (the fraction of

206: residues participating in either a helix or a sheet is larger than

207: 60\% of the total number of residues).

208: Strikingly, Figure 6a shows that the behavior of short segments of real

209: proteins and the model structures are qualitatively similar to each other.

210: The deviation from Gaussian behavior in both cases is due to the presence

211: of secondary structures, whose characteristic length scale is smaller for

212: the model structures than for real proteins. Interestingly, even for

213: relatively short segment lengths ($l = 8$, 12) in the model structures, one

214: observes statistical behavior somewhat similar to that of Gaussian chains

215: (Figure 6b) along with significant deviations, most notably a peak due to

216: the presence of the secondary structures.  Due to the limited chain length

217: that one can reliably study in the model we are not able to observe the

218: crossover to the regime predicted by Flory.
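
For reference, the radius of gyration used in the selection above is presumably the standard one; this definition, and the notation $\vec{r}_i$ for the position of the $i$-th residue, are assumed here rather than stated in the text:
\begin{equation}
R_g^2 = \frac{1}{N}\sum_{i=1}^{N} |\vec{r}_i - \vec{r}_{\rm cm}|^2, \qquad
\vec{r}_{\rm cm} = \frac{1}{N}\sum_{i=1}^{N}\vec{r}_i .
\end{equation}
As a rough consistency check, the compactness bound used for the PDB set in the caption of Figure 1, applied to a chain of $N=48$, gives
\begin{equation}
R_g \leq 3.02~\mbox{\AA} \times 48^{1/3} \approx 11.0~\mbox{\AA},
\end{equation}
which lies inside the window between $7.6$~\AA\ and $12$~\AA\ adopted for the model structures.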

219:

220: In summary, we have shown that there is a natural bridge, provided

221: by the chain thickness, between polymer physics and the physics of

222: biomolecular structures.  The thickness provides a physically

223: motivated cut-off length scale which allows for a well-defined

224: continuum limit. The Flory theorem is found to hold for proteins

225: in spite of the structured building blocks of protein native state

226: structures. Our results suggest that the powerful arsenal of

227: techniques of polymer physics can be brought to bear on the

228: protein problem and conversely, the notion that chain molecules

229: are inherently anisotropic and have a non-zero thickness provides a

230: new perspective in the field of polymer physics.

231:

232:

233: This work was supported by PRIN 2003, INFN, NASA, NSF IGERT grant

234: DGE-9987589, NSF MRSEC at Penn State, and the NSC of Vietnam (grant

235: No. 410704).

236:

237: \begin{thebibliography}{99}

238:

239: \bibitem{RMP}

240: J. R. Banavar and A. Maritan, Rev. Mod. Phys. {\bf 75}, 23 (2003).

241:

242: \bibitem{Flory}

243: P. J. Flory, {\it Statistical Mechanics of Chain Molecules}

244: (Wiley, New York, 1969).

245:

246: \bibitem{Chothia}

247: C. Chothia, Nature (London) {\bf 357}, 543 (1992).

248:

249: \bibitem{Creighton}

250: T. E. Creighton, {\it Proteins, Structure and Molecular Properties,

251: 2nd ed.} (Freeman, New York, 1993).

252:

253: \bibitem{polymerbooks}

254: P. J. Flory, {\it Principles of polymer chemistry}

255: (Cornell University Press, Ithaca, 1953);

256: A. Y. Grosberg and A. R. Khokhlov, {\it Statistical Physics of

257: Macromolecules} (AIP Press, New York, 1994);

258: M. Rubinstein and R. Colby, {\it Polymer physics}

259: (Oxford University Press, New York, 2003);

260: C. Vanderzande, {\it Lattice models of polymers}

261: (Cambridge University Press, Cambridge, 1998).

262:

263: \bibitem{Orland}

264: H. Orland, Journal de Physique I {\bf 4}, 101 (1994).

265:

266: \bibitem{PDB}

267: H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat,

268: H. Weissig, I. N. Shindyalov, and P. E. Bourne,

269: Nucleic Acids Research {\bf 28}, 235 (2000).

270:

271: \bibitem{Govorun}

272: E. N. Govorun, V. A. Ivanov, A. R. Khokhlov, P. G. Khalatur,

273: A. L. Borovinsky, A. Y. Grosberg, Phys. Rev. E {\bf 64}, 040903 (2001).

274:

275: \bibitem{Lua}

276: R. Lua, A. L. Borovinsky, A. Y. Grosberg, Polymer {\bf 45}, 717 (2004).

277:

278: \bibitem{Rose}

279: N. C. Fitzkee and G. D. Rose, Proc. Natl. Acad. Sci. USA (in press).

280:

281: \bibitem{WC} J.~D. Watson and F.~H.~C. Crick, Nature {\bf 171}, 737 (1953).

282:

283: \bibitem{Pauling1}

284: L. Pauling, R.~B. Corey, and H.~R. Branson,

285: Proc. Natl. Acad. Sci. USA {\bf 37}, 205 (1951).

286:

287: \bibitem{MaritanNature}

288: A. Maritan, C. Micheletti, A. Trovato, and J.~R. Banavar,

289: Nature {\bf 406}, 287 (2000);

290: A. Stasiak, and J.~H. Maddocks, Nature {\bf 406}, 251 (2000).

291:

292: \bibitem{Wilson}

293: K. G. Wilson, Rev. Mod. Phys. {\bf 55}, 583 (1983).

294:

295: \bibitem{BGMM}

296: J. R. Banavar, O. Gonzalez, J. H. Maddocks and A. Maritan,

297: J. Stat. Phys. {\bf 110}, 35 (2003).

298:

299: %\bibitem{Stasiak}

300: %A. Stasiak, V. Katritch and L. H. Kauffman, eds., {\it Ideal Knots}

301: %(World Scientific Publishing, Singapore, 1998).

302:

303: \bibitem{GM}

304: O. Gonzalez and J. H. Maddocks, Proc. Natl. Acad. Sci. USA

305: {\bf 96}, 4769 (1999).

306:

307: \bibitem{Edwards}

308: M. Doi and S. F. Edwards, {\it The Theory of Polymer Dynamics}

309: (Clarendon Press, New York, 1993).

310:

311: \bibitem{PRE}

312: J. R. Banavar, T. X. Hoang, A. Maritan, F. Seno and A. Trovato,

313: Phys. Rev. E {\bf 70}, 041905 (2004).

314:

315: \bibitem{BMMT}

316: J. R. Banavar, A. Maritan, C. Micheletti and A. Trovato,

317: Prot. Struct. Func. Gen. {\bf 47}, 315 (2002).

318:

319: \bibitem{HoangPNAS}

320: T. X. Hoang, A. Trovato, F. Seno, J. R. Banavar and A. Maritan,

321: Proc. Natl. Acad. Sci. USA {\bf 101}, 7960 (2004).

322:

323:

324: \end{thebibliography}

325:

326: \newpage

327:

328: \section*{Figure Captions}

329:

330: \begin{description}

331: \item[Fig. 1.]Log-log

332: plot of the radius of gyration $R_g$ of a set of 700 proteins versus their

333: length $L$ or the number of constituent amino acids. The proteins

334: used in our study were selected based on several criteria: the

335: sequences chosen have less than 50\% overlap with each other, the

336: structures have been obtained with high resolution X-ray

337: diffraction and the proteins are substantially compact without

338: dangling ends so that $R_g/L^{1/3} \leq 3.02\AA$. The straight line

339: has a slope of $1/3$ as a guide to the eye.

340: \item[Fig. 2.]Log-log

341: plot of the end-to-end distance $R$ versus $l$ for protein

342: segments. The plot was obtained by averaging over all segments of

343: length $l$ selected from the data set depicted in Figure 1. For a

344: given $l$, $R$ was determined as an average over all segments of

345: that length in proteins whose lengths are greater than $l^{3/2}$,

346: in order to avoid finite size effects \cite{Lua}. The error bars

347: are of the order of the size of the symbols. Note the plateau

348: which indicates that $R$ is only slowly increasing with $l$ around

349: 24. For values of $l$ larger than 48, we find that $R \sim

350: l^{1/2}$.

351: \item[Fig. 3.]Statistics of the end-to-end distance of segments of

352: proteins of length $l$. For $l=48$, 64 and 80, the distributions

353: show a nice collapse to the form expected for Gaussian statistics:

354: the solid line denotes the function

355: $P(x)=\frac{1}{\sigma^3}\sqrt{\frac{2}{\pi

356: l}}\,x^2\exp(-\frac{x^2}{2\sigma^2})$, where $\sigma=2.164$~\AA.

357: For $l=16$, where the presence of secondary motifs plays a major

358: role, the distribution is qualitatively different from the other

359: sizes and exhibits a peak arising from the presence of

360: $\alpha$-helices.

361: \item[Fig. 4.]Plot of

362: the tangent-tangent and binormal-binormal correlation functions

363: along the protein sequence derived from our data set. The tangent

364: vector at location $i$ is defined as a unit vector pointing along

365: the line joining the positions of the $(i-1)$-th and the $(i+1)$-th

366: amino acids.

367: The normal vector is defined by joining the $i$-th location to the

368: center of the circle drawn through three amino acid

369: ($i-1$,$i$,$i+1$) locations. The binormal is perpendicular to the

370: plane defined by the tangent and the normal. Note that: a) the

371: negative tangent-tangent correlation at sequence separation $k$

372: around 13 corresponds to a turning back,  on average, of the chain

373: direction and is related to the cross-over shown in Figure 2; b)

374: the binormal-binormal correlation remains non-zero for large

375: separations.

376: \item[Fig. 5.]Histogram of the magnitudes of the average tangent and

377: binormal vectors for each protein in our data set. For each

378: protein, we measured the magnitude as $\frac {1} {N}

379: \left|\sum_{i=1}^N \vec{v}_i\right|$, where $\vec{v}_i$ is either the

380: unit tangent or the unit binormal vector at location $i$ and $N$

381: is the number of such vectors for a given protein. For comparison,

382: a histogram of the magnitudes of the average of randomly oriented

383: vectors is shown as the shaded histogram. (Here $\vec{v}_i$ was

384: selected to be a randomly oriented unit vector.) Note that several

385: proteins have a significant non-zero mean binormal vector due to

386: the presence of $\alpha$-helices.

387: \item[Fig. 6.](a) Statistics of the end-to-end distance of

388: segments of length $l=6$ taken from model protein structures

389: \cite{HoangPNAS} and from PDB structures. The peak in the

390: distributions arises from the presence of $\alpha$-helices.

391: (b) Same as Figure 3 but for segments of the model structures

392: of lengths $l=8$ and $l=12$. The fits to the

393: Gaussian form given in the caption of Figure 3 yield

394: $\sigma=2.61\AA$ for $l=8$ and $\sigma=2.08\AA$ for $l=12$.

395: \end{description}

396:

397: \clearpage

398:

399:

400: \begin{figure}

401: \centerline{\includegraphics[width=3.2in]{fig1.eps}}

402: %\vspace{30pt}

403: %\centerline{J. R. Banavar et al., Fig. 1}

404: \caption{}

405: \end{figure}

406: %\clearpage

407:

408: \begin{figure}

409: \centerline{\includegraphics[width=3.2in]{fig2.eps}}

410: %\vspace{30pt}

411: %\centerline{J. R. Banavar et al., Fig. 2}

412: \caption{}

413: \end{figure}

414: %\clearpage

415:

416: \begin{figure}

417: \centerline{\includegraphics[width=3.2in]{fig3.eps}}

418: %\vspace{30pt}

419: %\centerline{J. R. Banavar et al., Fig. 3}

420: \caption{}

421: \end{figure}

422: %\clearpage

423:

424: \begin{figure}

425: \centerline{\includegraphics[width=3.2in]{fig4.eps}}

426: %\vspace{30pt}

427: %\centerline{J. R. Banavar et al., Fig. 4}

428: \caption{}

429: \end{figure}

430: %\clearpage

431:

432: \begin{figure}

433: \centerline{\includegraphics[width=3.2in]{fig5.eps}}

434: %\vspace{30pt}

435: %\centerline{J. R. Banavar et al., Fig. 5}

436: \caption{}

437: \end{figure}

438: %\clearpage

439:

440: \begin{figure}

441: \centerline{\includegraphics[width=3.2in]{fig6.eps}}

442: %\vspace{30pt}

443: %\centerline{J. R. Banavar et al., Fig. 6}

444: \caption{ }

445: \end{figure}

446:

447: %\eject

448: \end{document}

449: