
%------------------------------------------------------------
%  Revtex, updated July 20, 2004, first version Feb 2004
%  Title: Medium effects on the selection of sequences folding
%         into stable proteins in a simple model                 %
%  Authors: You-Quan Li, Yong-Yun Ji, Jun-Wen Mao and Xiao-Wei Tang
%--------------------------------------------------------

\documentclass[twocolumn,showpacs,preprintnumbers,amsmath,amssymb]{revtex4}

%\documentclass[preprint,showpacs,preprintnumbers,amsmath,amssymb]{revtex4}
% Some other (several out of many) possibilities
%\documentclass[preprint,aps]{revtex4}
%\documentclass[preprint,aps,draft]{revtex4}
%\documentclass[prb]{revtex4}% Physical Review B

\usepackage{graphicx}% Include figure files
\usepackage{dcolumn}% Align table columns on decimal point
\usepackage{bm}% bold math
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{latexsym}

\begin{document}

\title{Medium effects on the selection of sequences folding \\
into stable proteins in a simple model}%

\author{You-Quan Li}\email{yqli@zimp.zju.edu.cn}
\author{Yong-Yun Ji}
\author{Jun-Wen Mao}
\author{Xiao-Wei Tang}
\address{Department of Physics, Zhejiang University, Hangzhou 310027, P.R. China.} %

\date{\today}%


\begin{abstract}
We study medium effects on the selection of sequences in protein
folding by taking account of a surface potential in the HP model.
Our analysis of the proportion of H and P monomers in the sequences
gives a direct interpretation of why lowly designable structures
possess a small average gap. Numerical calculations with our model
show that the surface potential enhances the average gap of highly
designable structures. They also show that the most stable structure
may no longer be the most stable one when the medium parameters
change.
\end{abstract}


\pacs{87.10.+e, 87.14.Ee, 87.15.-v}%
%87.10.+e General theory and mathematical aspects
%87.14.Ee Proteins
%87.15.-v Biomolecules: structure and physical properties

\maketitle


%introduction and our motivation
Proteins are known to play a vital role in the structure and
functioning of all forms of life, and the protein folding problem
is one of the most fundamental unsolved problems.
Composed of a specific sequence of amino acids, each protein
folds into a native structure (a particular three-dimensional shape)
that determines its biological function, and it is widely believed
that for most single-domain proteins the native structure is the
global free-energy minimum\cite{1}. The amino-acid sequence alone
encodes sufficient information\cite{1} to determine the
three-dimensional structure. Theoretical studies of protein sequence
and structure include molecular dynamics simulations\cite{2} and
lattice models\cite{3}. The latter have attracted much
attention\cite{4,5}, while the former demand much CPU time even on
large computers\cite{2}.


Because the naturally occurring amino acids can be
classified\cite{6} as either hydrophobic (H) or polar (P), an
HP lattice model was introduced to interpret protein
folding\cite{4}. Based on the so-called standard HP model, in which
27 monomers occupy all sites of a cube\cite{5}, Li et al.\cite{7}
introduced the designability to show that potentially good sequences
are those with a unique ground state separated by a large gap from
the first excited state. Defining the designability of a structure
as the number of sequences that possess the structure as their
unique lowest-energy state, they found that structures differ
drastically in their designabilities. The sequences that design the
highly designable structures are thermodynamically more
stable\cite{7,8}. Studies of the designability for a larger lattice
model\cite{9} and for an off-lattice model\cite{10} gave similar
results. For many-letter models, different parameters gave different
results: Buchler et al.\cite{11} found that the designability of a
structure depends sensitively on the size of the alphabet, whereas
Li et al.\cite{12} found that it is not sensitive to the alphabet
size when a realistic interaction potential (the MJ matrix) is
employed. Ejtehadi et al. found that if the strength of the
non-additive part of the interaction potential becomes larger than a
critical value, the degree of designability of structures depends on
the parameters of the potential\cite{13}.


Since useful features of protein folding and stability can be
explored on the basis of lattice models, it is worthwhile to study
the effect of the medium on protein folding properties. In this
letter, we consider medium effects by introducing parameters that
characterize various concentrations of the medium solution. Our
results give some answers to the following questions: are the
sequences associated with highly designable structures universally
good? How do they vary with the medium\cite{14} in which the protein
is placed?


% The paragraph describing our models

We investigate the effects of the medium on the category of highly
designable protein sequences, which should provide a clue to
understanding variations in the natural selection of protein species
caused by the medium in which the protein lives. For this purpose we
extend the original HP model by introducing potential parameters for
the monomers at the protein's surface. The protein is represented as
a chain of beads occupying the sites of a lattice in a self-avoiding
way, so the energy of a sequence folded into a particular structure
reads
\begin{equation*}
H=\sum_{i<j}E_{\sigma_i\sigma_j}\delta_{|r_i-r_j|,1}(1-\delta_{|i-j|,1})
+\sum_{{r_j}{\in}S}U_{r_j}\delta_{\sigma_j,P},
\end{equation*}
where $i$ and $j$ label successive monomers along the sequence,
$r_i$ is the position of the $i$-th monomer on the lattice,
$\sigma_i$ is H or P for a hydrophobic or polar monomer, and $S$
denotes the set of surface sites. Here the Kronecker delta notation
is adopted, i.e., $\delta_{a,b}=1$ if $a=b$ and $\delta_{a,b}=0$ if
$a\ne b$. The first delta selects pairs of monomers on
nearest-neighbor lattice sites, and the factor
$(1-\delta_{|i-j|,1})$ excludes pairs that are adjacent along the
chain. As the hydrophobic force\cite{6} drives the protein to fold
into a compact shape with as many hydrophobic monomers inside as
possible, $HH$ contacts are favored in this model, which is realized
by choosing $E_{PP}=0$, $E_{HP}=-1$, and $E_{HH}=-2.3$ as adopted in
Ref.\cite{7}. To include the effects of the surrounding medium,
which depend on the salt concentration\cite{14} of the solution in
which the protein is placed, we introduce $U_V$, $U_E$, and $U_F$ to
represent the attractive potentials at the protein surface for polar
(hydrophilic) monomers at vertices, edges, and face centers,
respectively. These attractive forces are exerted by the medium
(solution) on the hydrophilic monomers. Since we cannot treat a
spherical surface in the present lattice model, we assign different
weights at the surface, namely $U_{\tau}=-\gamma_{\tau}V$ with
$\tau=V,E,F$, where $V$ sets the strength of the potential. If
$\gamma_V=\gamma_E=\gamma_F\ne 0$, no new results arise compared
with those of Li et al.\cite{7}. This is because the central site of
the cube in the 27-site model is always occupied by a hydrophobic
monomer, so the surface potentials merely cause a global shift of
the energy spectrum if equal weights are imposed on vertices, edges,
and face centers. We therefore investigate several choices of
unequal weights $\gamma_{\tau}$ below.
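
To make the argument for uniform weights explicit, the following
short calculation may help. It is only a sketch: it assumes, as
stated above, that the central site is occupied by an H monomer, and
it introduces $n_P$ (the number of P monomers in the sequence) as
notation that is not used elsewhere in the text. With
$\gamma_V=\gamma_E=\gamma_F=\gamma$, each of the 26 surface sites
carries the same potential $-\gamma V$, so that
\begin{equation*}
\sum_{{r_j}{\in}S}U_{r_j}\delta_{\sigma_j,P}
=-\gamma V\sum_{{r_j}{\in}S}\delta_{\sigma_j,P}
=-\gamma V\,n_P .
\end{equation*}
Under this assumption the shift depends only on the sequence and not
on the structure, so energy differences between structures, and
hence ground states and gaps, are unchanged.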


%The details of our calculation and analysis

It has been noticed\cite{7} that some structures can be designed by
a large number of sequences, while others can be designed by only a
few. The designability of a structure is measured by the number
$N_s$ of sequences that take the given structure as their unique
ground state, as first introduced by Li et al.\cite{7}. Structures
differ drastically in their designability, i.e., highly designable
structures emerge with a number of associated sequences much larger
than the average. For a particular sequence, the energy gap
$\delta_s$ is the minimum energy needed to change its ground-state
structure into a different compact structure. The average energy gap
$\bar{\delta}_s$ of a given structure is obtained by averaging the
gaps over all $N_s$ sequences that design that structure. Structures
with large $N_s$ have a much larger average gap than those with
small $N_s$, and there is an apparent jump in the average energy gap
around $N_s=1400$. This feature was first noticed by Li et
al.\cite{7} in the medium-independent HP model; these highly
designable structures are thermodynamically more stable and possess
protein-like secondary structures, into which the protein sequences
fold faster than into other structures\cite{7}. To interpret this
feature, we calculate the average distribution of the number of
hydrophobic monomers for the highly designable structures and for
the lowly designable structures, respectively. We plot these two
distributions together with the purely mathematical
binary-arrangement distribution in Fig.~\ref{fig:binary}, where all
distributions are normalized to unity. Clearly, the distribution for
highly designable structures is shifted toward larger numbers of
hydrophobic monomers compared with the mathematical distribution.
This leads to a lower energy scale, because the more hydrophobic
monomers there are, the lower the energy will be. Conversely, the
distribution for lowly designable structures is shifted toward
smaller numbers of hydrophobic monomers compared with the
mathematical distribution, which causes a higher energy. This may
explain why the lowly designable structures possess a small average
gap.
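
In formulas, the definitions above can be restated as follows. The
notation is assumed for illustration only: $H(s,\Gamma)$ is the
energy of sequence $s$ on compact structure $\Gamma$, $\Gamma_0(s)$
is its unique ground-state structure, and $n_H$ is the number of H
monomers in a sequence. The binary-arrangement distribution is taken
here to be the binomial distribution over all $2^{27}$ sequences,
which is one reading of the phrase ``pure mathematical binary
arrangement''.
\begin{align*}
\delta_s &= \min_{\Gamma\neq\Gamma_0(s)}
\bigl[H(s,\Gamma)-H(s,\Gamma_0(s))\bigr],\\
\bar{\delta}_s &= \frac{1}{N_s}\sum_{s:\,\Gamma_0(s)=\Gamma}\delta_s,\\
P_{\rm bin}(n_H) &= \binom{27}{n_H}\,2^{-27}.
\end{align*}
In the second line the sum runs over the $N_s$ sequences that design
the structure $\Gamma$.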

%fig1
\begin{figure}
\includegraphics[width=0.32\textwidth]{fig_1.eps}%
\caption{\label{fig:binary} Comparison of the distributions for the binary
arrangement (green dotted line), the lowly designable structures (red
dash-dotted line), and the highly designable structures (black solid
line).}
\end{figure}


Although the choice $E_{PP}=0$, $E_{HP}=-1$, and $E_{HH}=-2.3$
adopted in Ref.\cite{7} respects the principle that the major
driving force for protein folding is the hydrophobic force, the
difference between H-H contacts occurring inside the protein and
those occurring at its surface was disregarded. Therefore, to
explore how designability is affected by the medium surrounding the
protein, a surface potential must be included in the model. We
pointed out above that 26 of the monomers are on the surface in the
27-site model, which gives a trivial result for uniform surface
weights. On the other hand, increasing the number of lattice sites
would put the model beyond the capacity of present-day computers.
However, after some further tuning of the original model, we are
able to obtain nontrivial and interesting results. First, we
consider a ``cubic shape approximation'' by imposing different
potential weights, ${\gamma}_V=7/8$, ${\gamma}_E=6/8$, and
${\gamma}_F=4/8$, which come from the different interfaces between
the medium solution and the monomers at a vertex, an edge, and a
face center, respectively. For this choice of parameters we find 17
additional sequences that possess a unique ground state regardless
of the magnitude of $V$ (ranging from 0.1 to 2.1), although they do
not possess unique ground states in the model studied by Li et al.,
where the effect of the medium was neglected\cite{7}. Our
calculation further shows that 14 of those 17 sequences belong to
highly designable structures and have relatively large energy gaps.
Analysing all 17 sequences, we find that these 14 can be related to
each other by single mutations, which implies that they belong to a
``neutral island'' as suggested by Trinquier et al.\cite{15}. These
results support the view that protein structures are selected in
nature because they are readily designed and stable against
mutations, and that such a selection simultaneously leads to
thermodynamic stability and foldability. Thus, a key to
understanding the protein-folding problem is to understand the
emergence and the properties of highly designable structures.
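
One way to read these weights, offered here only as a plausible
geometric interpretation that is not stated in the text above, is as
the fraction of the eight octants around a lattice site that face
the medium rather than the protein. The symbol $n_{\rm occ}$ (the
number of octants filled by the cube) is introduced purely for this
illustration:
\begin{equation*}
\gamma_{\tau}=1-\frac{n_{\rm occ}(\tau)}{8},\qquad
n_{\rm occ}(V)=1,\quad n_{\rm occ}(E)=2,\quad n_{\rm occ}(F)=4,
\end{equation*}
which reproduces $\gamma_V=7/8$, $\gamma_E=6/8$, and $\gamma_F=4/8$.
The $3\times3\times3$ cube has 8 vertex sites, 12 edge sites, and 6
face-center sites, which together account for the 26 surface sites.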


%fig2
\begin{figure*}
\includegraphics[width=0.28\textwidth]{fig_2a}
\includegraphics[width=0.28\textwidth]{fig_2b}
\includegraphics[width=0.28\textwidth]{fig_2c}
\caption{\label{fig:average}Average gap of structures versus $N_s$
of the structures in the case of ${\gamma}_V=7/8$, $\gamma_E=6/8$,
$\gamma_F=0$ for (a) $V=0.0001$, (b) $V=0.9$, and (c) $V=2.1$,
respectively.}
\end{figure*}

%fig3
\begin{figure*}
\includegraphics[width=0.28\textwidth]{fig_3a}%
\includegraphics[width=0.28\textwidth]{fig_3b}
\includegraphics[width=0.28\textwidth]{fig_3c}
\caption{\label{fig:largest}
The largest average gap $\bar{\delta}_{max}$ versus the parameter
$V$: (a) for $\gamma_V=7/8$, $\gamma_E=6/8$, $\gamma_F=4/8$; (b) for
$\gamma_V=7/8$, $\gamma_E=6/8$, $\gamma_F=0$; (c) for $\gamma_V=1$,
$\gamma_E=1$, $\gamma_F=0$.}
\end{figure*}

%fig4
\begin{figure*}
\includegraphics[width=0.28\textwidth]{fig_4a}
\includegraphics[width=0.28\textwidth]{fig_4b}
\caption{\label{fig:histogram}Histogram of the number of sequences
versus the energy gap for the 60 highly designable structures in the
absence of a medium (left) and in the presence of a medium with
$\gamma_V=7/8$, $\gamma_E=6/8$, $\gamma_F=0$, $V=2.1$ (right).}
\end{figure*}


The second parametrization is $\gamma_V=7/8$, $\gamma_E=6/8$, and
$\gamma_F=0$, which models a protein with 7 monomers inside and 20
at the surface. In this case we find 48 additional sequences that
possess a unique ground state over a wider range of $V$ (from 0.0001
to 2.1) but have no unique ground state in the case of Li et
al.\cite{7}. However, only one of these sequences designs a highly
designable structure, while the other 47 design lowly designable
structures. The energy gaps of all those new sequences are found to
be $V/8$. Since the ratio of the number of monomers at the surface
to that inside is of order 1 in natural proteins\cite{8}, and this
ratio is 26:1 in the first case but 20:7 in the second, the latter
ought to be closer to natural proteins. Fig.~\ref{fig:average} shows
the average energy gap for different values of the potential
parameter. Clearly, the surface potential enhances the average gap
of highly designable structures, which illustrates that the highly
designable structures selected by nature are more stable in a proper
medium than in ``vacuum''. A recent experiment\cite{16} revealed
that the additional stability of a thermophilic protein comes from
just a few residues at the protein surface. Thus our theoretical
results may draw more attention to the dependence of stability on
medium effects in further model studies.


We also consider the case in which the potentials at vertices and
at edges have the same weight, i.e., $\gamma_V=1$, $\gamma_E=1$, and
$\gamma_F=0$. We find no sequences beyond those of Ref.\cite{7} that
design the highly designable structures. Just as in Ref.\cite{7},
there are 60 structures that possess a large average gap. When the
effects of the medium are taken into account, the average gap of the
highly designable structures increases markedly as the potential
parameter increases, whereas the average gap of the lowly designable
structures does not change much. In all the aforementioned cases,
the average gap of a single highly designable structure increases
linearly with $V$. Furthermore, we find that the structure with the
largest average gap is not the same for all values of the potential
parameter. Crossings between energy levels take place as the
potential parameter changes. It is therefore worth pointing out that
the gain in stability varies between structures, and the most stable
protein structure in one surrounding medium may no longer be the
most stable one in another medium. Plots of the largest average gap
versus the parameter $V$ are shown in Fig.~\ref{fig:largest} for the
three choices of the weights $\gamma$ discussed above. To make the
change visible to the eye, the vertical axis in
Fig.~\ref{fig:largest} shows the largest average gap minus $0.21V$,
$0.5V$, and $0.6V$ for cases (a), (b), and (c), respectively. In
each case there is a critical value of $V$ at which the plot changes
from one straight line to another. The critical values of $V$ differ
between the cases, but the largest average gap at the transition
point takes the same value, $\bar{\delta}_s=1.4137$.
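
The change of slope can be understood with a short sketch. Assume,
consistently with the linear growth noted above, that the average
gap of structure $k$ has the form $\bar{\delta}_k(V)=a_k+b_kV$. The
intercepts $a_k$, the slopes $b_k$, and the labels 1 and 2 for the
two competing structures are notation introduced here for
illustration only. Then
\begin{equation*}
\bar{\delta}_{max}(V)=\max_k\,(a_k+b_kV),\qquad
V_c=\frac{a_1-a_2}{b_2-b_1}\quad (b_1\neq b_2),
\end{equation*}
so that the largest average gap follows one straight line on one
side of $V_c$ and another straight line on the other side, and the
structure realizing the maximum changes at $V_c$. Subtracting a
common linear term $cV$ from both lines, as is done for the vertical
axis of Fig.~\ref{fig:largest}, leaves $V_c$ unchanged and only
makes the change of slope easier to see.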


We analyze all the sequences that design the 60 highly designable
structures. In the absence of a medium,
${\gamma}_V={\gamma}_E={\gamma}_F=0$, the energy gaps ${\delta}_s$
of those sequences range from 0.3 to 2.6 (see
Fig.~\ref{fig:histogram}). Almost half of them have small energy
gaps (around 0.3). In the presence of a medium, the energy gaps of
most sequences with larger gaps (over 1) rise as the parameter $V$
increases, while those of sequences with small gaps do not rise
appreciably. For the cases (a) ${\gamma}_V=7/8$, ${\gamma}_E=6/8$,
${\gamma}_F=4/8$, (b) ${\gamma}_V=7/8$, ${\gamma}_E=6/8$,
${\gamma}_F=0$, and (c) ${\gamma}_V={\gamma}_E=1$, ${\gamma}_F=0$,
the increments in the energy gaps are mainly $3V/8$, $7V/8$, and
$V$, respectively. There is also a small portion of sequences whose
energy gaps decrease in the medium, e.g., 276 sequences in the case
${\gamma}_V=7/8$, ${\gamma}_E=6/8$, ${\gamma}_F=4/8$. Considering
some particular structures among the 60 highly designable ones, we
analyze the sequences that design them. The energy gap of the
sequences with larger energy gaps mostly increases when the sequence
is placed in a medium, which leads to the linear increase of the
average gap. Our results also illustrate that the shapes of the
distributions are similar for those three structures. In addition,
the total number of sequences in (b) is smaller than in (c), but
many more sequences possess a large energy gap in (b) than in (c).
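
The dominant increments quoted above coincide with the difference
between the vertex weight and the face-center weight in each case,
and the gap $V/8$ of the new sequences in the second parametrization
coincides with $\gamma_V-\gamma_E$. The arithmetic below only
records this observation. Reading it as the cost of moving one P
monomer between two kinds of site is an assumption made here for
illustration, not a result of the calculation itself.
\begin{align*}
\text{(a)}\quad (\gamma_V-\gamma_F)V
&=\left(\tfrac{7}{8}-\tfrac{4}{8}\right)V=\tfrac{3}{8}V,\\
\text{(b)}\quad (\gamma_V-\gamma_F)V
&=\left(\tfrac{7}{8}-0\right)V=\tfrac{7}{8}V,\\
\text{(c)}\quad (\gamma_V-\gamma_F)V
&=(1-0)\,V=V,\\
\text{(b)}\quad (\gamma_V-\gamma_E)V
&=\left(\tfrac{7}{8}-\tfrac{6}{8}\right)V=\tfrac{1}{8}V.
\end{align*}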


In summary, our simple analysis of the average distribution of the
number of hydrophobic monomers can explain why the lowly designable
structures possess a small average gap. Our model study shows that
the surface potential enhances the average gap of highly designable
structures, which implies that the highly designable structures
selected by nature are more stable in a proper medium than in
``vacuum''. We found that the energy gap of sequences with larger
energy gaps mostly increases when the sequence is placed in a
medium, which leads to the linear increase of the average gap. We
also noticed that there is a critical value of the surface-potential
parameter, which means that the most stable structure may no longer
be the most stable one when the medium parameters change. Since many
studies have shown that several properties of natural proteins can
be captured by simple models, the above discussion may motivate the
inclusion of medium effects in theoretical studies where the medium
potential has so far been ignored.

%Acknowledgments
This work is supported by NSFC Grants No. 10225419 and No. 90103022.


\begin{references}
\bibitem{1} C. Anfinsen, Science {\bf 181}, 223 (1973).
\bibitem{2} T. Lazaridis and M. Karplus, Science {\bf 278}, 1928 (1997).
\bibitem{3} H. Taketomi, Y. Ueda, and N. Go, Int. J. Pept. Protein Res. {\bf 7}, 445 (1975).
\bibitem{4} K. A. Dill, Biochemistry {\bf 24}, 1501 (1985).
\bibitem{5} E. Shakhnovich and A. Gutin, J. Chem. Phys. {\bf 93}, 5967 (1990).
\bibitem{6} W. Kauzmann, Adv. Protein Chem. {\bf 14}, 1 (1959).
\bibitem{7} H. Li, R. Helling, C. Tang, and N. S. Wingreen, Science {\bf 273}, 666 (1996).
\bibitem{8} H. Li, C. Tang, and N. S. Wingreen, Proc. Natl. Acad. Sci. USA {\bf 95}, 4987 (1998).
\bibitem{9} H. Cejtin, J. Edler, A. Gottlieb, R. Helling, H. Li, J. Philbin, N. Wingreen, and C. Tang, J. Chem. Phys. {\bf 116}, 352 (2002).
\bibitem{10} J. Miller, C. Zeng, N. S. Wingreen, and C. Tang, Proteins {\bf 47}, 506 (2002).
\bibitem{11} N. E. G. Buchler and R. A. Goldstein, Proteins {\bf 34}, 113 (1999).
\bibitem{12} H. Li, C. Tang, and N. S. Wingreen, Proteins {\bf 49}, 403 (2002).
\bibitem{13} M. R. Ejtehadi, N. Hamedani, H. Seyed-Allaei, V. Shahrezaei, and M. Yahyanejad, J. Phys. A {\bf 31}, 6141 (1998).
\bibitem{14} B. N. Dominy, D. Perl, F. X. Schmid, and C. L. Brooks III, J. Mol. Biol. {\bf 319}, 541 (2002).
\bibitem{15} G. Trinquier and Y. H. Sanejouand, Phys. Rev. E {\bf 59}, 942 (1999).
\bibitem{16} D. Perl, U. Mueller, U. Heinemann, and F. X. Schmid, Nature Struct. Biol. {\bf 7}, 380 (2000).
\end{references}
%\bibliography{media}% Produces the bibliography via BibTeX.

\end{document}