
1: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

2: %\\

3: %Title: 	Prediction of RNA pseudoknots by Monte Carlo simulations

4: %Authors: 	Graziano Vernizzi, Henri Orland, Anthony Zee

5: %Comments: 	LaTeX, 22 pages

6: %Report-no:	SPhT-T04/061

7: %\\

8: %Abstract:

9: % In this paper we consider the problem of RNA folding with pseudoknots.

10: % We use a graphical representation in which the secondary structures

11: % are described by planar diagrams. Pseudoknots are identified as

12: % non-planar diagrams. We analyze the non-planar topologies of RNA

13: % structures and propose a classification of RNA pseudoknots according

14: % to the minimal genus of the surface on which the RNA structure can be

15: % embedded.  This classification provides a simple and natural way to

16: % tackle the problem of RNA folding prediction in presence of

17: % pseudoknots. Based on that approach, we describe a Monte Carlo

18: % algorithm for the prediction of pseudoknots in an RNA molecule.

19: %

20:

21: \documentclass[11pt]{article}

22: \usepackage{amssymb}

23: \usepackage{epsfig}

24: \newlength{\bredde}

25: \def\slash#1{\settowidth{\bredde}{$#1$}\ifmmode\,\raisebox{.15ex}{/}

26: \hspace*{-\bredde} #1\else$\,\raisebox{.15ex}{/}\hspace*{-\bredde} #1$\fi}

27: \textwidth 170mm

28: \textheight 230mm

29: \topmargin -0.8cm

30: \oddsidemargin -0.8cm

31: \evensidemargin -0.8cm

32:

33:

34: \newcommand{\vs}[1]{\rule[- #1 mm]{0mm}{#1 mm}}

35: \newcommand{\nn}{\nonumber}

36:

37: \newcommand{\sect}[1]{\setcounter{equation}{0}\section{#1}}

38: \renewcommand{\theequation}{\thesection.\arabic{equation}}

39:

40: \def\zc{{z^\ast}}

41: \def\tr{{\mbox{tr}}}

42: \def\re{{\Re\mbox{e}}}

43: \def\im{{\Im\mbox{m}}}

44:

45:

46: \begin{document}

47: \topmargin -1.4cm

48: \oddsidemargin -0.8cm

49: \evensidemargin -0.8cm

50: \title{\Large{{\bf

51: Prediction of RNA pseudoknots by Monte Carlo simulations

52: }}}

53:

54: \vspace{1.5cm}

55: \author{~\\{\sc G. Vernizzi}$^1$, {\sc H. Orland}$^1$

56: and {\sc A. Zee}$^2$\\~\\

57: $^1$Service de Physique Th\'eorique, CEA/DSM/SPhT Saclay\\

58: Unit\'e de recherche associ\'ee au CNRS\\

59: F-91191 Gif-sur-Yvette Cedex, France\\~\\

60: $^2$Institute of Theoretical Physics and Department of Physics\\

61: University of California, Santa Barbara, CA 93106, USA

62: }

63:

64:

65: \date{}

66: \maketitle

67: \vfill

68: \begin{abstract}

69: In this paper we consider the problem of RNA folding with pseudoknots.

70: We use a graphical representation in which the secondary structures

71: are described by planar diagrams. Pseudoknots are identified as

72: non-planar diagrams. We analyze the non-planar topologies of RNA

73: structures and propose a classification of RNA pseudoknots according

74: to the minimal genus of the surface on which the RNA structure can be

75: embedded.  This classification provides a simple and natural way to

76: tackle the problem of RNA folding prediction in presence of

77: pseudoknots. Based on that approach, we describe a Monte Carlo

78: algorithm for the prediction of pseudoknots in an RNA molecule.

79: \end{abstract}

80: %PACS   02.40.Pc General topology                               **

81: %       82.35.Pq Biopolymers, biopolymerization                 **

82: %       82.39.Pj Nucleic acids, DNA and RNA bases

83: %       87.14.Gg DNA, RNA                                       **

84: %       87.15.Aa Theory and modeling; computer simulation

85: %       87.15.Cc Folding and sequence analysis                  **

86: %       87.15.He Dynamics and conformational changes

87:

88:

89: %keywords: 	General topology, RNA, pseudoknot, structure prediction,

90: %		Monte Carlo simulations

91:

92:

93: \vfill

94:

95:

96: \begin{flushleft}

97: SPhT-T04/061\\

98: q-bio.BM/0405014

99: \end{flushleft}

100: \thispagestyle{empty}

101: \newpage

102:

103:

104:

105: \renewcommand{\thefootnote}{\arabic{footnote}}

106: \setcounter{footnote}{0}

107:

108:

109: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

110: \sect{Introduction}

111: \label{introsection}

112: In recent years the quest for an algorithm which can predict the

113: spatial structure of an RNA molecule given its chemical sequence has

114: received considerable attention from molecular biologists

115: \cite{Science}. In fact the three-dimensional structure of an RNA

116: molecule is intimately connected to its specific biological function

117: in the cell (e.g. for protein synthesis and transport, catalysis,

118: chromosome replication and regulation) \cite{TB}. It is determined by

119: the sequence of nucleotides along the sugar-phosphate backbone of the

120: RNA. The chemical formula or sequence of covalently linked nucleotides

121: along the molecule from the 5' to the 3' end is called the {\it

122: primary structure}. The four basic types of nucleotides are adenine

123: (A), cytosine (C), guanine (G) and uracil (U), but it is known that

124: modified bases may appear \cite{LCC}.

125:

126: At high enough temperatures, or under high-denaturant conditions, RNA

127: molecules have the three-dimensional structure of a free

128: single-stranded swollen polymer. At room temperature, different

129: nucleotides can pair by means of saturating hydrogen bonds. The

130: standard Watson-Crick pairs are A$\bullet$U and C$\bullet$G with two

131: and three hydrogen bonds respectively, whereas G$\bullet$U is a wobble

132: pair with two hydrogen bonds. Comparative methods showed that

133: ``non-canonical'' pairings are also possible \cite{Nagaswamy}, as well

134: as higher-order interactions such as triplets, or quartets.  In this

135: paper we will consider only canonical base-pair interactions. Adjacent

136: base pairs can stack, providing an additional binding energy which is

137: actually the origin of the formation of stable A-form helices, one of

138: the main structural characteristics of folded RNAs. Helices may embed

139: unpaired sections of RNA, in the form of hairpins, loops and bulges.

140: It is all these pairings, stackings of bases and structural motifs

141: which bring the RNA into its folded three-dimensional

142: configuration. One of the main open problems of molecular biology is

143: the prediction of the actual spatial molecular structure of RNA

144: (i.e. its {\it shape}) given its primary structure.

145:

146: As we shall see in Section \ref{representsection}, it is possible to

147: define {\it secondary structures} of RNA as structures in which the

148: pairings between canonical base pairs do not cross in a certain

149: representation (planar graphs).  One can also define the {\it tertiary

150: structure} of RNA which is the actual three-dimensional arrangement of

151: the base sequence.  This classification corresponds to the fact that

152: the secondary structure of RNA carries the main contribution to the

153: free energy of a fully folded RNA configuration, including also some

154: of the sterical constraints. For that reason one can attempt to

155: describe the folding process hierarchically \cite{TB,BJT,BW,LD}.

156: However, since the secondary structure describes just the topology of

157: binary contacts of the bases, most of the information about distances

158: in real three-dimensional space is lost.  The importance of the

159: secondary structure lies in the fact that it may provide the

160: ``skeleton'' of the final tertiary structure.

161:

162: Over the past twenty years several algorithms have been proposed for

163: the prediction of RNA folding.  They are based on: deterministic or

164: stochastic minimization of a free energy function \cite{ZS,Monte},

165: phylogenetic comparison \cite{phyl1,phyl2,phyl3,phyl4}, kinetic

166: folding \cite{FFHS,mironovD,isa1,isa2}, maximal weighted matching

167: method \cite{MWM}, and several others (for a survey see \cite{Z2}).

168: It is fair to say that despite the large number of tools available for

169: the prediction of RNA structures, no reliable algorithms exist for the

170: prediction of the full tertiary structure of RNA. Most of the

171: algorithms listed above deal with the prediction of just the RNA

172: secondary structure.  To describe the full folding it is important to

173: introduce the concept of RNA pseudoknot \cite{PRB}. One says that two

174: base pairs form a pseudoknot when the parts of the RNA sequence

175: spanned by those two base pairs are neither disjoint, nor have one

176: contained in the other.  Thus RNA secondary structures without

177: pseudoknots can be represented by planar diagrams, whereas

178: pseudoknots appear when two base pairs ``cross'', leading to

179: non-planar diagrams (a more precise definition is given in the next

180: Section). Pseudoknots play an important role in natural RNAs

181: \cite{WJ}, for structural, regulatory and catalytic

182: functions. Pseudoknots are excluded in the definition of RNA secondary

183: structure and many authors consider them as part of the tertiary

184: structure. This restriction is due to the fact that RNA secondary

185: structures without pseudoknots can be predicted easily. One should

186: also note that pseudoknots very often involve base-pairing from

187: distant parts of the RNA, and are thus quite sensitive to the ionic

188: strength of the solution. It has been shown that the number of

189: pseudoknots depends on the concentration of Mg$^{++}$ ions, and can be

190: strongly suppressed by decreasing the ionic strength (thus enhancing

191: electrostatic repulsion) \cite{MD,Dave,MD2}.  The most popular and

192: successful technique for predicting secondary structures is dynamic

193: programming \cite{ZS,waterman,williams,NJ,Z,Z1,HH2,Wuchty}, for which

194: the memory and CPU requirements scale with the sequence length $L$ as

195: $O(L^2)$ and $O(L^3)$, respectively.

196:

197: Recently, new deterministic algorithms that deal with pseudoknots have

198: been formulated

199: \cite{RE,Uemura,Akutsu,LP,giege,deogun,OZ,POZ,PTOZ}. In this case the

200: memory and CPU requirements generally scale as $O(L^4)$ and $O(L^6)$

201: respectively ($O(L^4)$ and $O(L^5)$ in

202: \cite{LP}, or $O(L^3)$ and $O(L^4)$ for a restricted model in

203: \cite{Akutsu}), which can be a very demanding computational effort

204: even for short RNA sequences ($L\sim100$). Moreover, the main

205: limitation of these algorithms is the lack of precise experimental

206: information about the contribution of pseudoknots to the RNA free

207: energy, which is often excluded a priori in the data analysis (as also

208: pointed out in \cite{isa1,MironovL,gultyaev}).  The

209: increase of computational complexity does not come as a surprise. In

210: fact the RNA-folding problem with pseudoknots has been proven to be

211: NP-complete for some classes of pseudoknots \cite{Akutsu,LP}.

212: For that reason, stochastic algorithms might be a better choice to

213: predict secondary structures with pseudoknots in a reasonable time and

214: for long enough sequences.
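
To give a rough sense of scale (an order-of-magnitude estimate added
here for illustration, ignoring all prefactors and implementation
details), take $L=100$ in the generic scalings quoted above:
\[
\begin{array}{lll}
\mbox{without pseudoknots:} & \mbox{memory}\sim L^2=10^{4}\, , & \mbox{CPU}\sim L^3=10^{6}\, ,\\
\mbox{with pseudoknots:} & \mbox{memory}\sim L^4=10^{8}\, , & \mbox{CPU}\sim L^6=10^{12}\, .
\end{array}
\]
Doubling the sequence length multiplies the CPU cost by $2^3=8$ in the
first case and by $2^6=64$ in the second.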

215:

216: In \cite{Monte,abrah,gultyMonte,Ivo} stochastic Monte Carlo algorithms

217: for the prediction of RNA pseudoknots have been proposed.  In these

218: stochastic approaches, the very irregular structure of the energy

219: landscape (glassy-like) is the main obstacle: configurations with

220: small differences in energy may be separated by high energy barriers,

221: and the system may very easily get trapped in metastable states. Among

222: the stochastic methods, the direct simulation of the RNA-folding

223: dynamics (including pseudoknots) with kinetic folding algorithms

224: \cite{isa1,isa2} is the most successful. This technique allows one to

225: describe the succession of secondary structures with pseudoknots

226: during the folding process. The approach we follow in this paper is

227: close in spirit to that one, with a stronger emphasis on the

228: topological character of the RNA pseudoknots.  It is based on a

229: correspondence (first noticed by E. Rivas and S.R. Eddy in \cite{RE})

230: between a graphical representation of RNA secondary structures with

231: pseudoknots and Feynman diagrams.  In \cite{RE} the authors consider

232: only a particular class of pseudoknots.  Along the same direction, the

233: authors of \cite{OZ} made the correspondence between RNA folding and

234: Feynman diagrams more explicit by formulating a {\it matrix field

235: theory} model whose Feynman diagrams give exactly all the RNA

236: secondary structures with pseudoknots. The remarkable feature of this

237: new approach is that it provides an analytic tool for the prediction

238: of pseudoknots, and all the diagrams appear to be naturally organized

239: in a series of terms, called the {\it topological expansion}, where

240: the first term corresponds to planar secondary structures without

241: pseudoknots, and higher-order terms correspond to structures with

242: pseudoknots.

243:

244: In this paper we explore in more detail this topological expansion

245: and its potential predictive power.  We also propose a numerical

246: stochastic algorithm for dealing with this expansion in a systematic

247: way, which in principle allows the prediction of all kinds of RNA

248: pseudoknots. The paper is organized as follows.  In Section

249: \ref{representsection} we review some well-known graphical

250: representations of RNA structures, with special emphasis on the

251: so-called ``disk diagram'' representation. In such a representation

252: one can uniquely associate to each RNA secondary structure with (or

253: without) pseudoknots, a circle diagram which is planar (or not planar,

254: respectively).  In Section \ref{toposection} we show how one can

255: characterize the ``degree of non-planarity'' of a given disk diagram.

256: In fact, one can always associate an integer number to each RNA disk

257: diagram, called the {\it genus}, and we will describe its topological

258: meaning and information content. We thus propose to classify

259: RNA pseudoknots according to their genus.  Following this idea, in Section

260: \ref{statsection} we generalize the standard thermodynamic model for the

261: description of RNA structures to the inclusion of pseudoknots. The

262: generalized model we propose is very natural, in the same spirit as

263: going from the Canonical Ensemble to the Grand Canonical Ensemble in

264: statistical mechanics. Our model can control the topological

265: fluctuations i.e. the formation of pseudoknots in the RNA molecule,

266: and we will describe the general features of its phase diagram.  In

267: Section \ref{montesection} we describe a Monte Carlo algorithm for the

268: actual calculation of thermodynamical quantities in our generalized

269: model. In particular we will list in detail the Monte Carlo moves,

270: the free-energy updating algorithm and the simulated annealing method

271: we propose for dealing with the problem of high energy

272: barriers. Section \ref{conclsection} contains the concluding remarks,

273: and the Appendix is devoted to the explicit description of a part of

274: the Monte Carlo algorithm of Section

275: \ref{montesection}.

276:

277:

278:

279:

280:

281:

282: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

283: \sect{Representation of RNA structures}

284: \label{representsection}

285:

286: Any RNA sequence can be represented as the list of nucleotides

287: $r_i\in$(A,C,G,U), $i=1,\ldots,L$, where $r_i$ is the $i$-th

288: nucleotide along the oriented sugar backbone (from 5' to 3'). The

289: ordered list $\{r_1,r_2,\ldots,r_L\}$ is called the primary structure

290: of the RNA.

291:

292: The RNA secondary structure requires a more graphical representation.

293: Actually there are several equivalent ways to represent an RNA

294: secondary structure with a given primary structure. The most commonly

295: used representation is perhaps the {\it bracket} notation, where two

296: paired bases, $r_i$ and $r_j$ ($i<j$), are represented by parentheses

297: ``('' and ``)'', and unpaired bases are represented by a dot '.' or a

298: colon ':' (see Figure

299: \ref{bracketplot}). Pseudoknots can be described in a similar fashion,

300: but one needs to introduce several different kinds of brackets (like

301: square brackets '[', ']' or braces '$\{$', '$\}$', as for example in

302: the database \cite{pseudobase,pseudobase1}), and this is not a

303: very efficient representation for complicated structures.

304: \begin{figure}[-t]

305: \centerline{

306: \epsfig{figure=data/bracket.eps,width=25pc}

307: }

308: \caption{

309: An RNA configuration without pseudoknots (left column) and an RNA

310: configuration with a simple ``H'' pseudoknot (right column).  From the

311: top to the bottom: the RNA configuration, arc representation and

312: bracket representation. Note that the arc diagram for pseudoknotted

313: RNA has crossing arcs and the bracket representation requires two

314: kinds of parentheses.}

315: \label{bracketplot}

316: \end{figure}

317:

318: %A closely related graphical representation is the {\it mountain plot},

319: %where a list of rectangles are superimposed, each rectangle having

320: %a height proportional to the strength of the bond between $r_i$ and $r_j$ and %a width equal to the segment $i-j$ (see Figure \ref{bracketplot}). RNA seconda%ry structures without pseudoknots can be represe%nted also

321: %by tree graphs and the ones with pseudoknots with non-tree graphs. This

322: %approach its consequences from the point of view of graph theory

323: % are analyzed in \cite{RNAgraph1}.

324:

325:

326: Among several other representations (e.g. mountain diagrams

327: \cite{HH}, tree diagrams \cite{FKSS}, graphs \cite{RNAgraph1}),

328: a very general and widely used representation is the so-called {\it

329: dot plot} diagram. It is an array where a dot is placed in the row $i$

330: and column $j$ if the bases $r_i$ and $r_j$ are actually paired (see

331: Figure

332: \ref{rnadotplot}). This plot is the graphical representation of the $L

333: \times L$ {\it contact matrix} $C$ with elements

334: \begin{equation}

335: C_{ij}=

336: \left\{

337: \begin{array}{ll}

338: 1 & \mbox{if } i \mbox{ and } j  \mbox{ are paired} \, ,\\

339: 0 & \mbox{ otherwise.}

340: \end{array}

341: \right.

342: \label{Contact}

343: \end{equation}

344: In mathematical terms, the contact matrix $C$ is the matrix

345: of the permutation involution associated to the

346: given set of pairings. In fact, one can always interpret the base pairing

347: $i-j$ as a transposition of the elements $\{i,j\}$ and therefore one

348: can uniquely associate a permutation $\sigma$ to any structure by:

349: \begin{equation}

350: \sigma(i)=

351: \left\{

352: \begin{array}{ll}

353: j & \mbox{if } i \mbox{ and } j  \mbox{ are paired} \, ,\\

354: i & \mbox{ otherwise.}

355: \end{array}

356: \right.

357: \end{equation}

358: For example, if the primary structure is $\{5'-CUUCAUCAGGAAAUGAC-3'

359: \}$ and the pseudoknotted secondary structure is: $.(((.[[[)))..]]].$,

360: one can associate to it the permutation:

361: \begin{equation}

362: \sigma=\left(

363: \begin{array}{ccccccccccccccccc}

364: .&(&(&(&.&[&[&[&)&)&)&.&.&]&]&]&.\\

365: 1& 2 &3 &4 &5 &6 &7 &8 &9 &10& 11& 12& 13& 14& 15& 16& 17  \\

366: 1&11 &10&9&5 &16&15&14 &4 &3 &2  & 12& 13& 8 & 7 & 6 & 17

367: \end{array}

368: \right) \, ,

369: \end{equation}

370: which is also an involution since $\sigma^2$ is the identity

371: permutation.  The matrix representation of $\sigma$ is the matrix $D$

372: with $D_{i, \sigma(i)}=1$ and $0$ otherwise. Since $\sigma(i)=i$ for unpaired bases, $D=C+P$,

373: where $P$ is the $L \times L$ diagonal matrix with $P_{ii}=1$ if base $i$ is unpaired and $P_{ii}=0$ otherwise. This notation is

374: very useful for numerical implementations of the algorithm we propose

375: in Section \ref{montesection}. The advantage of the dot plot diagram

376: is that it allows the comparison between different RNA secondary

377: structures, just by superimposition, as is necessary for

378: comparative analysis. Moreover it can be used for representing

379: RNA structure with any kind of pseudoknots.
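
As a quick check of the example above (a worked sketch added for
illustration, using only the pairing list of the permutation shown),
$\sigma$ can be written as a product of disjoint transpositions, one per
base pair:
\[
\sigma=(2\;11)(3\;10)(4\;9)(6\;16)(7\;15)(8\;14)\, ,
\]
with the unpaired bases $1,5,12,13,17$ as fixed points. Disjoint
transpositions commute and each one squares to the identity, hence
$\sigma^2$ is the identity. Two pairs $(i,j)$ and $(k,l)$, with $i<j$,
$k<l$ and $i<k$, cross precisely when
\[
i<k<j<l \, ,
\]
which is the pseudoknot condition of Section \ref{introsection} (the
spanned segments are neither disjoint nor nested). Here, for instance,
the pairs $(4,9)$ and $(6,16)$ satisfy $4<6<9<16$ and therefore cross,
whereas the pairs $(2,11)$ and $(3,10)$ of the same helix are nested,
$2<3<10<11$. In this example every round-bracket pair crosses every
square-bracket pair.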

380: \begin{figure}[-t]

381: \centerline{

382: %\epsfig{figure=data/rnaplot.ps,width=15pc}

383: \epsfig{figure=data/dotplot2.eps,width=30pc}

384: %\put(5,0){$\bar{m}$}

385: }

386: \caption{Representation of an RNA secondary structure with an ``H'' pseudoknot (left), and the corresponding dot plot diagram (right).}

387: \label{rnadotplot}

388: \end{figure}

389:

390: A representation which is completely equivalent to the dot plot

391: diagram is the {\it disk diagram} (also called {\it circle plot} or

392: {\it circular plot}). In this case the RNA sequence is represented as

393: an oriented circle (from 5' to 3') by virtually linking the first

394: nucleotide to the last one. Each base pairing is represented as an arc

395: inside the circle, connecting the two paired bases. Figure

396: \ref{circleplot} shows a typical disk diagram.  In this representation

397: secondary structures without pseudoknots are purely planar diagrams,

398: i.e. diagrams that can be drawn without crossing arcs, whereas

399: pseudoknots correspond to structures which are not planar.

400: \begin{figure}[-ht]

401: \centerline{

402: \epsfig{figure=data/circles.eps,width=15pc}

403: }

404: \caption{

405: Typical disk (circle) diagram representation of an RNA secondary

406: structure without pseudoknots. The circle is anticlockwise oriented

407: from $5'$ to $3'$.  Note that there are no crossing arcs.}

408: \label{circleplot}

409: \end{figure}

410: This fact has already been observed by E. Rivas and S.R. Eddy in

411: \cite{RE}, where they consider diagrams with arcs inside

412: {\it and} outside the disk\footnote{More precisely, they represent the

413: RNA sequence as an oriented straight line, and the pairings as arcs

414: above and below that line. This is of course equivalent to the disk

415: representation.}. Crossing arcs are allowed but only inside or only

416: outside but not both at the same time (so-called ``overlapping

417: pseudoknots'', see diagrams a) and b) of Figure

418: \ref{rivaseddyplot}). As it was shown in

419: \cite{POZ,PTOZ}, several general classes of pseudoknots cannot be

420: described in such a simple way (such as the diagram on the right of Figure

421: \ref{rivaseddyplot}). It is then more convenient to draw the

422: arcs always inside the disk (or outside, but not both) and to consider

423: all the corresponding diagrams as non planar.  It is precisely

424: following this approach that the authors of \cite{OZ} found an

425: algorithm for computing pseudoknots with matrix field theory in a

426: completely general fashion.  In this paper we pursue the same analysis

427: by considering the diagrams themselves and not the associated matrix

428: field theory model.\\

429: \begin{figure}[-ht]

430: \centerline{

431: \epsfig{figure=data/eddyrivas1.eps,width=11pc}

432: \epsfig{figure=data/eddyrivas2.eps,width=9pc}

433: \epsfig{figure=data/eddyrivas3.eps,width=9pc}

434: \put(-290,-13){$a)$}

435: \put(-170,-13){$b)$}

436: \put(-55,-13){$c)$}

437: }

438: \caption{

439: Three kinds of disk diagrams for RNA secondary structures with

440: pseudoknots. The authors of \cite{RE} consider cases of the form a)

441: (``overlapping pseudoknots''), b) (pseudoknot present in {\it

442: Escherichia coli $\alpha$} mRNA \cite{Gluick}) but not c) (parallel

443: $\beta$-sheet protein interaction). The technique in

444: \cite{OZ} can deal with all three cases.}

445: \label{rivaseddyplot}

446: \end{figure}

447:

448:

449:

450:

451:

452:

453: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

454: \sect{The topological character of RNA pseudoknots}

455: \label{toposection}

456:

457: There is a very natural way for classifying the ``degree of

458: non-planarity'' of a given disk diagram, which we review here

459: briefly. It is based on a topological analysis introduced long ago by

460: Euler. We emphasize that this characterization is a well-known

461: classical result of algebraic topology

462: and it has already been introduced in \cite{OZ} for RNA secondary

463: structures. We repeat it here in more detail for the convenience

464: of the reader.

465:

466: As we have shown, in any disk diagram the RNA sequence is represented

467: by an oriented circle. When the circle is drawn on a sphere, its

468: orientation allows one to distinguish an ``inside'' and an ``outside'' of

469: the circle.  One says that the circle is a ``boundary'' or

470: ``puncture'' (as it can be drawn smaller and smaller, in a continuous

471: fashion down to a single point) on the sphere.  Hence any (disk) planar

472: diagram can be drawn on a sphere without crossing lines, simply by

473: drawing the arcs on the same side (see Figure \ref{sphereplot}). The key

474: observation is that the sphere is naturally partitioned in several

475: parts by the diagram. As explained in \cite{OZ} it is useful to draw

476: the arcs with a ``double-line notation'' (see Figure \ref{sphereplot}). In

477: this way it is clear that the sphere is partitioned into several

478: polygons. Note that all the lines have an orientation induced by

479: that of the circle.

480: \begin{figure}[-t]

481: \centerline{

482: \epsfig{figure=data/sphere.eps,width=25pc}

483: }

484: \caption{Example of a planar disk diagram. In a) the disk diagram of a double hairpin (like the one in Figure \ref{bracketplot}) is drawn on a sphere. In b) the arcs are all drawn outside the circle. In c) the sphere is partitioned into 6 patches (5 faces and one ``hole'', i.e. the RNA circle). In d) the diagram of c) is shown in double line notation (black thick arcs). Here $\#F=5$, $\#V=4$, $\#E=8$, and therefore $\chi=1$, i.e. $g=0$.}

485: \label{sphereplot}

486: \end{figure}

487:

488: The Euler characteristic $\chi$ of a  diagram is defined as

489: \begin{equation}

490: \chi=\#V-\#E+\#F \, ,

491: \label{genus}

492: \end{equation}

493: where $\#V$, $\#E$ and $\#F$ are the numbers of vertices,

494: edges, and faces, respectively. A vertex is just a nucleotide, an edge is any line

495: connecting two nucleotides (either an arc joining the nucleotides, or

496: the RNA sequence) and a face is that part of the surface within a closed

497: loop of edges. Obviously, if there are $n$ arcs then $\#E=\#V+n$.  A

498: famous theorem of Euler states that any polyhedron homeomorphic to a

499: sphere with a boundary (puncture) has an Euler characteristic

500: $\chi=1$. Therefore all RNA secondary structures without pseudoknots

501: are described by disk diagrams with $\chi=1$.
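
To make the counting explicit (a short check added here for
illustration, in the notation of eq. (\ref{genus})), substituting
$\#E=\#V+n$ gives
\[
\chi=\#V-(\#V+n)+\#F=\#F-n \, ,
\]
so the number of nucleotides drops out and only the arcs and the faces
matter. In a planar diagram each of the $n$ non-crossing arcs cuts one
region of the disk into two, so that $\#F=n+1$ and $\chi=1$, in
agreement with Euler's theorem. The result does not depend on whether
an arc is drawn as a single line or as a double line: the double line
adds one edge and one face (the strip between the two lines), and
these two contributions cancel in $\chi$.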

502:

503:

504: \begin{figure}[-ht]

505: \centerline{

506: \epsfig{figure=data/khplot1.eps,width=12pc}

507: \epsfig{figure=data/khplot2.eps,width=10pc}

508: %\put(5,0){$\bar{m}$}

509: }

510: \caption{A ``kissing hairpin'' pseudoknot. The corresponding disk diagram

511: necessarily has crossing arcs when the arcs are all drawn inside (or all

512: outside) the RNA circle.}

513: \label{kisshairpinplot}

514: \end{figure}

515: Let us discuss the case when there is a pseudoknot. For simplicity, we

516: consider a ``kissing hairpin'' pseudoknot.  In this case the

517: corresponding disk diagram is not planar, and has crossing arcs (see

518: for instance Figure \ref{kisshairpinplot}). After drawing the disk

519: diagram in double line notation, and counting the number of vertices,

520: edges and faces, one gets that $\chi=-1$ this time. This value has a

521: precise geometrical meaning.  In fact, the Euler characteristic of a

522: surface (or of a manifold in general) is closely related to its {\it

523: genus} $g$, i.e. the number of ``handles'' of the surface.  Namely if

524: the manifold is orientable (as the disk diagram is, since the oriented

525: circle line defines naturally an orientation of all the elements of

526: the diagram), then one has $\chi = 2 - 2g-c$ where $c$ is the number

527: of punctures ($c=1$ in the case we consider here, with only one RNA

528: strand). It follows that a kissing hairpin is represented by a disk

529: diagram with genus $g=1$. One concludes then, that such a disk diagram

530: can be drawn on an oriented manifold with one handle, that is a {\it

531: torus} (which is a doughnut-shaped surface formed by taking a cylinder

532: and joining the two circular ends together, see Figure \ref{torusplot}).  This

533: procedure can be extended easily to cases with more complex

534: pseudoknots. For instance, the three diagrams of Figure

535: \ref{rivaseddyplot} have genus $g=2$, $g=1$ and $g=2$,

536: respectively. In Figure \ref{eightplot} there is a graphical

537: representation of all 8 types of irreducible pseudoknots with genus

538: $g=1$ (from \cite{POZ}) and in Figure \ref{higherplot}

539: there are some examples of pseudoknots with a higher genus.\\
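
As an illustration of this counting (a worked sketch added for the
reader; the face-tracing rule below is one convenient convention,
assumed here and not taken from \cite{OZ}), combine $\#E=\#V+n$ with
$\chi=2-2g-c$ and $c=1$:
\[
\#F-n=1-2g \quad\Longrightarrow\quad g=\frac{1+n-\#F}{2}\, .
\]
For the kissing hairpins drawn on the torus in Figure \ref{torusplot},
$n=10$ and $\#F=9$, so that $g=(1+10-9)/2=1$. The faces can also be
counted directly from the permutation $\sigma$ of Section
\ref{representsection}: denote by $e_i$ the backbone edge from
nucleotide $i$ to nucleotide $i+1$ (with $e_L$ closing the circle) and
follow the boundary of a face with the rule $e_i\to e_{\sigma(i+1)}$,
indices being taken modulo $L$. Each cycle of this map is one face.
For the H pseudoknot $.(((.[[[)))..]]].$ with $L=17$ and $n=6$ the
cycles are
\[
(e_1\,e_{11}\,e_{12}\,e_{13}\,e_{8}\,e_{4}\,e_{5}\,e_{16}\,e_{17}),\quad
(e_2\,e_{10}),\quad (e_3\,e_{9}),\quad (e_6\,e_{15}),\quad (e_7\,e_{14})\, ,
\]
hence $\#F=5$ and $g=(1+6-5)/2=1$, as expected for a simple
pseudoknot.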

540: \begin{figure}[-ht]

541: \centerline{

542: \epsfig{figure=data/torus.eps,width=20pc}

543: }

544: \caption{The ``kissing hairpins'' of Figure \ref{kisshairpinplot} can be drawn on a torus without intersections. In this example $\#F=9$, $\#V=20$, $\#E=20+10$, and therefore $\chi=-1$, i.e. $g=1$.}

545: \label{torusplot}

546: \end{figure}

547:

548: \begin{figure}[-ht]

549: \centerline{

550: \epsfig{figure=data/eightloop.eps,width=30pc}

551: %\put(5,0){$\bar{m}$}

552: }

553: \caption{List of all eight irreducible diagrams with genus $g=1$

554: (from \cite{PTOZ}) and their representation with double line notation, on the left column and right column, respectively.}

555: \label{eightplot}

556: \end{figure}

557: \begin{figure}[-ht]

558: \centerline{

559: \epsfig{figure=data/highergenus.eps,width=30pc}

560: }

561: \caption{Examples of RNA pseudoknots with higher genus. The first two plots correspond to the diagrams a) and c) of Figure \ref{rivaseddyplot} with genus $g=2$. The third plot is an example with genus $g=3$.}

562: \label{higherplot}

563: \end{figure}

564: Thus we have a simple way to classify pseudoknots. This classification

565: corresponds exactly to the series expansion of the partition function

566: of the matrix model proposed in \cite{OZ}. There, the series is in

567: powers of the form $N^{-2g}$, where $N$ is the size of the matrix, and

568: $g$ is the genus of the corresponding set of diagrams. In the next

569: section, we will exploit the same idea and show how one can

570: control the topological character of pseudoknots in a statistical

571: mechanical model for RNA secondary structures with pseudoknots.

572:

573:

574:

575:

576:

577: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

578: \sect{Statistical mechanics model of RNA structures with pseudoknots}

579: \label{statsection}

580:

581: In almost all the energy models for RNA which have been proposed in

582: recent years, the thermodynamical properties of a single stranded RNA

583: are studied by means of a partition function of the form

584: \begin{eqnarray}

585: {\cal Z}_{RNA}&=&\int \prod_{k=1}^L \, d \mathbf

586: {r}_k \,\, \sum_{C_{ij}} f(\{ \mathbf{r}\})

587: e^{-\frac{1}{k_B T} U(C_{ij}, \{  \mathbf{r} \} )}

588: \sim\sum_{C_{ij}} \omega(C) e^{-\frac{1}{k_B T} E(C) }= \nonumber \\

589: &=&

590: \sum_{C_{ij}} e^{-\frac{1}{k_B T}\left[ E(C)-T S(T,C) \right]} \, ,

591: \label{ZRNA}

592: \end{eqnarray}

593: where $T$ is the temperature, $k_B$ is the Boltzmann constant,

594: $\mathbf{r}_k$ is the three-dimensional position vector of the $k$-th

595: nucleotide, $f(\{ \mathbf{r}\})$ takes into account the geometry and

596: the constraints of the chain of nucleotides, the function $U$ takes

597: into account the energetics coming from the pairing and stacking of

598: base pairs, and the sum over $C_{ij}$ is the sum over all possible

599: contact matrices for a given primary structure. The function

600: $\omega(C)$ is proportional to the number of configurations having the

601: same contact matrix $C$, and therefore its logarithm is just the

602: entropy factor associated to the polymeric nature of the

sugar-phosphate backbone. The free energy of a given configuration,
${\cal F}(C) \equiv E(C)-T S(T,C)$, is the sum of several
contributions of both energetic ($E(C)$) and entropic ($S(T,C)$)
nature: Watson-Crick and wobble base pairs, stacking energies,
terminal mismatches and dangling energies, special triloops and
tetraloops, entropy contributions (internal loops, bulges, hairpin
loops),
%usually they are modeled from self-avoiding polymers theory
penalty factors for terminal AU pairs in helices, for asymmetries, etc.  All
these terms have been determined empirically, and they are known as the
``Turner energy rules'' \cite{Tur}. For more details see
\cite{ZTM}. When pseudoknots are excluded, the sum in eq. (\ref{ZRNA}) is
restricted to contact matrices that correspond to planar diagrams
only.  As we already mentioned in the introduction, the partition

617: function ${\cal Z}_{RNA}$ without pseudoknots can be calculated

618: efficiently by deterministic algorithms (dynamic programming)

\cite{caskill}: the most popular ones are perhaps the ``mfold
package'' by M.~Zuker et al. \cite{z2003,mathews} and the ``RNA
Vienna package'' by I.~Hofacker et al. \cite{ivovienna}\footnote{They
are available on-line at
\texttt{www.bioinfo.rpi.edu/applications/mfold/} and
\texttt{www.tbi.univie.ac.at/}, respectively.}.  When pseudoknots are

625: included, the sum in eq. (\ref{ZRNA}) is unrestricted and, as we

626: described in the previous Section, this leads to topology

fluctuations. This situation is also very common in other areas of
physics (e.g. dynamical triangulations \cite{Mau}, random surfaces or

629: quantum gravity \cite{Amb}, quantum field theory \cite{thooft}), and

630: there are now standard ways to deal with it. The idea is to introduce

631: an additional parameter $\mu$, which is a topological ``chemical

632: potential'', and to consider the partition function:

633: \begin{equation}

634: {\cal Z}_{RNA}(\mu)= \sum_{C_{ij}}

635: e^{-\frac{1}{k_B T}\left[ E(C)-T S(T,C)+\mu g(C) \right]} \, ,

636: \label{ourZRNA}

637: \end{equation}

638: where $g(C)$ is the genus of the configuration associated to the contact

639: matrix $C$. The ``chemical potential'' $\mu$ allows a simple control over the

640: topological character of the pseudoknots in the statistical ensemble

641: at thermal equilibrium. It is also directly related to $N$ (the size

642: of the matrix) in the matrix model formulation of \cite{OZ}:

643: \[

644: {\cal Z}_{Matrix} \sim 1+\frac{Z_1}{N^2}+\frac{Z_2}{N^4}+\ldots \, ,

645: \]

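with $\mu=2 k_B T \log N$, so that each unit of genus carries the weight
$e^{-\mu/(k_B T)}=N^{-2}$ and the planar limit $N\to\infty$ corresponds to
$\mu\to+\infty$.  The advantage here is that the energy
function $E(C)$ can be more realistic than the one in \cite{OZ}.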

648:

649: The model without chemical potential, i.e. $\mu=0$, corresponds to the

650: case where there are no restrictions on the possible fluctuations of

651: the topology.  On the other hand when $\mu$ is very large, all the

652: configurations with $g>0$ are suppressed by the Boltzmann weight, and

653: in this case one recovers the planar limit (i.e. RNA secondary

structures without pseudoknots). One might then expect a phase
transition associated with the formation of pseudoknots. A natural order
parameter is the average genus of an RNA structure with pseudoknots,
which can be obtained by taking the logarithmic derivative of
the partition function

659: \begin{equation}

660: \langle g(\mu) \rangle

661: =

662: -k_B T  \frac{\partial}{\partial \mu} \log {\cal Z}_{RNA}(\mu) \, .

663: \end{equation}
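As a short check of the sign convention, which uses nothing beyond
eq. (\ref{ourZRNA}), differentiating with respect to $\mu$ gives
\begin{eqnarray}
-k_B T \frac{\partial}{\partial \mu} \log {\cal Z}_{RNA}(\mu) &=&
\frac{1}{{\cal Z}_{RNA}(\mu)} \sum_{C_{ij}} g(C) \,
e^{-\frac{1}{k_B T}\left[ E(C)-T S(T,C)+\mu g(C) \right]} \, , \nonumber \\
\frac{\partial \langle g(\mu) \rangle}{\partial \mu} &=&
-\frac{1}{k_B T} \left( \langle g^2 \rangle - \langle g \rangle^2 \right)
\leq 0 \, . \nonumber
\end{eqnarray}
The second line shows that the average genus can only decrease as $\mu$
grows. The genus fluctuations, which are directly measurable in a
simulation, play the role of a susceptibility that one could monitor when
looking for a possible transition.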

To our knowledge there are no experimental data available on
the dependence of the genus of RNA molecules on the
temperature. Information and input from experiments would be highly
desirable.

668:

669: Figure \ref{phaseplot} displays the expected phase diagram in the

670: plane $\{\mu,T\}$ of our model. At high temperature, the RNA is always

in a fully denatured phase. At lower temperature and large $\mu$ the

672: secondary structures without pseudoknots are the dominating

673: configurations.  The interesting part of the diagram is for lower

674: values of $\mu$, where possibly $ \langle g(\mu) \rangle \; \neq 0$

675: and pseudoknots are present.

\begin{figure}[ht]
\centerline{
\epsfig{figure=data/phase.eps,width=15pc}
\put(-190,120){$T$}
\put(-35,-9) {$\mu$}
\put(-95,120) {denatured phase}

682: \put(-5,50) {planar limit}

683: \put(-170,10) {phase with pseudoknots}

684: }

685: \caption{Qualitative structure of the phase diagram in the  $\{\mu,T\}$ plane.}

686: \label{phaseplot}

687: \end{figure}

688:

689: Even if eq. (\ref{ourZRNA}) can in principle deal with pseudoknotted

690: RNA molecules, it is fair to say that for any realistic energy

function, the model is unlikely to be amenable to an analytic
solution. Moreover, dynamic programming approaches have been shown to
be computationally very demanding even for pseudoknots with genus
$g=1$ \cite{PTOZ}. Hence a stochastic algorithm is probably the only
feasible way to study the model of eq. (\ref{ourZRNA}). In the next
Section we describe in detail a Monte Carlo algorithm for the
simulation of this model.

698:

699: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

700: \sect{A Monte Carlo algorithm for RNA pseudoknots prediction}

701: \label{montesection}

702:

703: %The assumption of the RNA being in thermodynamic

704: %equilibrium may well be wrong, and it certainly is in some cases

705: %(such as during the synthesis of the RNA) /cite{S.R. Morgan and P.G.Higgs, J.%Chem.Phys. {\bf 105} (1996) 7152.}. In fact this observation led several

706: %groups to develop kinetic folding algorithms for RNA secondary structures

707: %(for predicting pseudoknots \cite{hofacker}, or recostruting folding

708: %pathways

709: %analysis of kinetic folding of RNA  has been introduce by Martinez(1984

710: %Mironov et al(1985) Mironov &Kister (1986), proposing montecarlo algorithms

711: %construction of secondary structures.

712:

713: A well-known method for generating a set of configurations which are

714: distributed according to a given Boltzmann weight is the Monte Carlo

715: method.  It is a standard method of modern computational analysis and

we refer to \cite{MC,frenkel} for a review and an introduction to the
subject.  In recent years, it has also been used for the prediction of

718: RNA secondary structures in various contexts. In particular, our

719: proposal can be thought of as a generalization of the Monte Carlo

720: method described in \cite{Monte} where the authors considered only RNA

721: secondary structures without pseudoknots. We aim to apply this method

722: to the statistical ensemble defined by eq. (\ref{ourZRNA}).

723:

724: The sum over all the RNA configurations in eq. (\ref{ourZRNA})

725: contains many terms. In general, the total number of RNA

726: configurations (planar and non planar configurations) grows like $L!$

727: for a sequence of length $L$.  The number of RNA configurations with a

728: fixed genus $g$ grows exponentially with $L$: a detailed analysis

729: for the number of diagrams with genus $g=0$ (i.e. planar diagrams) can

730: be found in \cite{combinatorics}.\footnote{An analysis for structure

731: with higher genus similar to the one in \cite{combinatorics} is still

732: lacking. We expect that the matrix field theory model introduced in

733: \cite{OZ} can shed some light on this issue.}

Since the number of secondary structures on a surface with fixed genus
grows exponentially, one expects that a brute-force Monte Carlo
importance sampling would be rather ineffective, within a reasonable
amount of computation time, unless the RNA sequence is very short.  For that
reason we decided to use the standard Metropolis method \cite{Metro}. The
Metropolis method is an efficient and simple scheme for generating a
set of configurations distributed according to a given probability
function, by means of a random walk in the configuration space. In our
case, the Metropolis Monte Carlo method generates a sequence of RNA
configurations $\{C^{(0)},C^{(1)},\ldots,C^{(n)}\}$, such that
$\lim_{n \to \infty} n_{C}/n=P(C)$, where $P(C)$ is the given
probability distribution (e.g., the Boltzmann distribution $P(C) =
{\cal Z}^{-1} \exp[-\left( E-T S+\mu g \right) / (k_B T) ]$) and $n_C$ is
the number of times the configuration $C$ occurs in the
sequence. Each element $C^{(k)}$ of the sequence is generated by
accepting or rejecting a randomly proposed configuration. In the following we
give a complete description of the Metropolis Monte Carlo algorithm
for the prediction of RNA pseudoknots:

753:

754: \begin{itemize}

755:

\item Step 1: Pick an initial configuration $C^{(0)}$.
A simple initial configuration can be the fully denatured state of
RNA, i.e.  the contact matrix is the matrix with all zero entries and
the corresponding permutation involution is the identity permutation

760: $\sigma=\left(

761: \begin{array}{cccc}

762: 1&2&\ldots&L\\

763: 1&2&\ldots&L

764: \end{array}

765: \right)$. Set the variable $n=1$.

766:

767:

768: \item Step 2: Pick a trial configuration $C^{(n)}$ (by deforming

769: the configuration $C^{(n-1)}$). Such an operation is called ``Monte

770: Carlo move'' $C^{(n-1)} \to C^{(n)}$. Compute the probability ratio

771: \begin{equation}

772: \rho=\frac{P(C^{(n)})}{P(C^{(n-1)})} \, .

773: \label{rho}

774: \end{equation}

Pick a random number $x$ uniformly distributed between 0 and 1.  If $x \leq \rho$,
accept the configuration $C^{(n)}$ as the new configuration.
Otherwise reject it and keep $C^{(n-1)}$ as the new configuration,
i.e. put $C^{(n)}=C^{(n-1)}$. Increase the variable $n$ by one.

779:

780:

\item  Step 3: Repeat Step 2 $n_{max}$ times, where $n_{max}$ is a
sufficiently large number.

783:

784: \end{itemize}
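The three steps above can be sketched in a few lines of Python. This is
only a minimal illustration under stated assumptions and is not meant to
reproduce any actual implementation: the names \texttt{metropolis},
\texttt{propose} and \texttt{free\_energy} are hypothetical,
\texttt{propose} is assumed to be a symmetric move generator, and
\texttt{free\_energy(C)} stands for $E(C)-T S(T,C)+\mu g(C)$ in the same
units as $k_B T$.

\begin{verbatim}
import math
import random

def metropolis(initial, propose, free_energy, kT, n_max, rng):
    """Return the chain [C(0), ..., C(n_max)] generated by the Metropolis rule.

    propose(C, rng) -> trial configuration (assumed symmetric proposal).
    free_energy(C)  -> E(C) - T*S(T,C) + mu*g(C), in the same units as kT.
    """
    chain = [initial]
    current, f_current = initial, free_energy(initial)
    for _ in range(n_max):
        trial = propose(current, rng)
        f_trial = free_energy(trial)
        # rho capped at 1: downhill moves are always accepted.
        rho = math.exp(min(0.0, -(f_trial - f_current) / kT))
        if rng.random() <= rho:          # accept the trial configuration
            current, f_current = trial, f_trial
        chain.append(current)            # on rejection the old one is repeated
    return chain

# Toy usage: two states with free energies 0 and 1 (in units of kT).
chain = metropolis(0, lambda c, rng: 1 - c, float, 1.0, 10000, random.Random(1))
\end{verbatim}

Only free energy differences enter the acceptance test, so the
normalization ${\cal Z}$ never has to be computed.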

785:

786:

787: The most relevant aspect of this method is that, at large $n_{max}$,

788: one can generate an ensemble of configurations with the probability

789: distribution $P(C) = {\cal Z}^{-1} \exp[-(E(C)-T S(T,C) +\mu

790: g(C))/(k_B T)]$, simply by computing probability ratios.  Therefore,

791: this method is extremely useful as it avoids the need of computing the

792: partition function ${\cal Z}$ of the system, a computational task that would be

793: surely intractable for long RNA sequences.

\begin{figure}[ht]
\centerline{
\epsfig{figure=data/energy.eps,width=15pc}
\put(-90,85) {$P=1$}
\put(-70,125) {$P=e^{-\Delta E/k_B T}$}
}
\caption{The Metropolis algorithm accepts a configuration with lower energy
with probability $P=1$. It can also accept a configuration with higher
energy, with probability $P=e^{-\Delta E/k_B T}$, where $\Delta E$ is
the energy difference.}

804: \label{energyplot}

805: \end{figure}

806:

807:

808: \subsection{Configurational changes (Monte Carlo moves)}

809:

810: At large $n_{max}$, the above algorithm is guaranteed to generate a

set of configurations with the probability distribution $P(C)$, under
a few assumptions. Two essential requirements are that the

813: Monte Carlo moves have to be {\it ergodic} and satisfy the so-called

814: {\it detailed balance condition}. Ergodicity essentially means that every

815: point in the configuration space can be reached in a finite number of

816: Monte Carlo steps from any other point. The detailed balance condition

817: here simply means that the Monte Carlo moves are symmetric, i.e.  the

818: probability of proposing a Monte Carlo move $C \to C'$ is the same as

819: of proposing the move $C' \to C$.

820:

We now describe the Monte Carlo moves for
RNA folding. First, at the beginning of the
simulation, it is useful to do some book-keeping by storing in memory the list of
all the allowed base pairs (i.e. only those of the type
A$\bullet$U, C$\bullet$G or G$\bullet$U).  Such information can be
stored in $L$ vectors $l_i$, $i=1,\ldots,L$, as follows: the
nucleotide in position $i$ can be paired with $n_i$
other nucleotides, namely the ones in positions $l_i(1),
l_i(2),\ldots,l_i(n_i)$, and with no others. For example, if the primary
structure is $\{AGCU\}$ then we have:

831: \begin{equation}

832: l_1=[4] \, , \quad l_2=[3,4] \, , \quad l_3=[2] \, , \quad l_4=[1,2]

833: \, .

834: \end{equation}
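For illustration, such lists can be built as in the following minimal
Python sketch. The function name \texttt{pair\_lists} and the set
\texttt{ALLOWED} are hypothetical choices made here; positions are
1-based as in the text, and no minimal hairpin length is imposed, in
agreement with the example above.

\begin{verbatim}
# Allowed pairings: Watson-Crick (A-U, C-G) and wobble (G-U), in both orders.
ALLOWED = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}

def pair_lists(sequence):
    """Return l, where l[i-1] lists the 1-based positions base i may pair with."""
    L = len(sequence)
    return [[j + 1 for j in range(L) if (sequence[i], sequence[j]) in ALLOWED]
            for i in range(L)]

print(pair_lists("AGCU"))  # [[4], [3, 4], [2], [1, 2]]
\end{verbatim}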

The creation of such a list of possible base pairs does not slow down
the overall algorithm, since it is an $O(L^2)$ operation which is done
only once. Now we want to extract one element from the list of $L$
vectors with uniform probability. This can be done as follows. Let $T
\equiv \sum_h n_h$ (in this subsection $T$ denotes this total, not the
temperature), and pick a uniform random integer $\tau$
with $1 \leq \tau \leq T$. Then take the smallest integer $i$
such that $\sum_{h=1}^{i} n_h \geq \tau$, and define
$y\equiv\tau-\sum_{h=1}^{i-1} n_h$. By construction $1 \leq y \leq n_i$ holds
true. For the example above, $n=(1,2,1,2)$ and $T=6$, so that $\tau=5$ gives
$i=4$ and $y=1$. Consider the pair of bases $i$ and $j \equiv l_i(y)$.  The
base pair $i-j$ has been extracted randomly with uniform probability
from the set of all possible base pairs for the given RNA sequence.  The
Monte Carlo move $C\to C'$ is then generated as follows:

847: \begin{itemize}

848:

849: \item If the configuration $C$ is such that both the base in $i$ and in $j$ are

850: free (i.e. $\sigma_{C}(i)=i$ and $\sigma_{C}(j)=j$) then add the link

851: $i-j$ (i.e. put $\sigma_{C'}(i)=j$ and $\sigma_{C'}(j)=i$). We call

852: this Monte Carlo move ``add a base pair'' (see case 1 of figure

853: \ref{MCmovesplot}).

854:

\item If the configuration $C$ is such that there is an arc between $i$ and $j$
(i.e. $\sigma_{C}(i)=j$) then remove the link $i-j$ (i.e. put
$\sigma_{C'}(i)=i$ and $\sigma_{C'}(j)=j$). We call this Monte Carlo
move ``remove a base pair'' (see case 2 of figure \ref{MCmovesplot}).

859:

\item If the configuration $C$ is such that exactly one of the bases $i$ and
$j$ is linked to some other base (i.e. $\sigma_{C}(i)=i$ and
$\sigma_{C}(j) \neq j$, or $\sigma_{C}(j)=j$ and $\sigma_{C}(i) \neq i$),
then move the link to $i-j$, overriding the former link and leaving the
former partner free (i.e. put $\sigma_{C'}(i)=j$ and $\sigma_{C'}(j)=i$).  We call this
Monte Carlo move ``shift a base pair'' (see cases 3 and 4 of figure
\ref{MCmovesplot}).

867:

\item If the configuration $C$ is such that the base $i$ is linked to another
base $k_1$ and $j$ is linked to another base $k_2$ and the base pair
$k_1-k_2$ is possible, then swap the links (i.e. put
$\sigma_{C'}(i)=j$, $\sigma_{C'}(j)=i$, $\sigma_{C'}(k_1)=k_2$,
$\sigma_{C'}(k_2)=k_1$).  We call this Monte Carlo move ``swap a base
pair'' (see cases 5 and 6 of figure
\ref{MCmovesplot}).

875:

876: \item If none of the above cases applies then do not update the

877: configuration, i.e. put $C'=C$.

878:

879: \end{itemize}
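The selection of a base pair and the five cases above can be sketched as
follows. This is a minimal illustration under stated assumptions, not a
definitive implementation: the names \texttt{sample\_pair} and
\texttt{move} are hypothetical, the involution $\sigma$ is stored as a
Python dictionary with $\sigma(k)=k$ for a free base, the lists $l_i$ are
assumed non-empty in total, and one entry is drawn uniformly among the
$T=\sum_h n_h$ entries by bisection on the cumulative sums.

\begin{verbatim}
import random
from bisect import bisect_left
from itertools import accumulate

def sample_pair(l, rng):
    """Draw (i, j) uniformly among all entries of the lists l (1-based)."""
    cumulative = list(accumulate(len(li) for li in l))  # partial sums of n_h
    tau = rng.randint(1, cumulative[-1])                # uniform in 1..T
    idx = bisect_left(cumulative, tau)                  # first partial sum >= tau
    y = tau - (cumulative[idx - 1] if idx > 0 else 0)   # 1 <= y <= n_i
    return idx + 1, l[idx][y - 1]

def move(sigma, i, j, l):
    """Return the involution obtained from sigma by the move on the pair (i, j)."""
    new = dict(sigma)
    k1, k2 = sigma[i], sigma[j]
    if k1 == i and k2 == j:          # both free: add the base pair
        new[i], new[j] = j, i
    elif k1 == j:                    # i and j paired together: remove the pair
        new[i], new[j] = i, j
    elif k1 == i or k2 == j:         # exactly one is paired elsewhere: shift
        old = k2 if k1 == i else k1  # the former partner becomes free
        new[old] = old
        new[i], new[j] = j, i
    elif k2 in l[k1 - 1]:            # both paired elsewhere: swap if k1-k2 allowed
        new[i], new[j] = j, i
        new[k1], new[k2] = k2, k1
    return new                       # otherwise the configuration is unchanged

# Usage on the sequence AGCU, starting from the fully unpaired state.
l = [[4], [3, 4], [2], [1, 2]]
sigma = {k: k for k in range(1, 5)}
i, j = sample_pair(l, random.Random(0))
sigma = move(sigma, i, j, l)
\end{verbatim}

Returning a new dictionary, rather than modifying $\sigma$ in place, keeps
the old configuration available in case the Metropolis test rejects the
move.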

See Figure \ref{MCmovesplot} for a summary of these Monte Carlo moves, and Figure \ref{confplot} for a simple example.
\begin{figure}[ht]

882: \centerline{

883: \epsfig{figure=data/MCmoves.eps,width=40pc}

884: }

\caption{Monte Carlo moves for an allowed base pair $i-j$ of an
RNA secondary structure with pseudoknots. Move $1)$ adds a
base pair. Move $2)$ removes a base pair. Moves $3)$ and $4)$
shift a base pair.  Moves $5)$ and $6)$ swap two base pairs, when
possible.}

890: \label{MCmovesplot}

891: \end{figure}

892:

893:

These Monte Carlo moves satisfy the detailed balance
condition. In fact, the probability of proposing the creation of a link between $i$ and
$j$ when both $i$ and $j$ are free (or at least one of them is), or of
proposing the removal of the link when they are already linked, is always $P_{ij}=2/T$,
since the pair appears once in $l_i$ and once in $l_j$;
thus it is symmetric. In the case where $i$ and $j$ are already
linked to different bases $\sigma(i)$ and $\sigma(j)$, a link is
put between $i$ and $j$ only if $\sigma(i)$ can be connected to
$\sigma(j)$ as well. In this case the reverse move also occurs with
the same probability, thus it is a symmetric move. Moreover, the set of
Monte Carlo moves is ergodic. The key

904: observation is that such moves correspond to transpositions in the

905: space of permutation involutions.  Since any configuration of RNA

906: secondary structure with pseudoknots can be uniquely represented by a

907: permutation involution, and since any permutation can be obtained by a

908: suitable finite sequence of transpositions

909: \cite{knuth3}, it follows that any RNA secondary structure with pseudoknots can be

910: generated with a finite number of the Monte Carlo moves described

911: above.
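As a worked illustration of these statements (obtained simply by applying
the rules stated above), consider the four-base sequence $\{A,U,A,U\}$ of
Figure \ref{confplot}. The lists of allowed partners are
\[
l_1=[2,4] \, , \quad l_2=[1,3] \, , \quad l_3=[2,4] \, , \quad l_4=[1,3] \, ,
\]
so that $\sum_h n_h=8$ and each of the four allowed base pairs $1-2$,
$1-4$, $2-3$, $3-4$ is selected with probability $2/8=1/4$. The
configuration space consists of $7$ permutation involutions: the fully
unpaired state, the four configurations with a single base pair, and the
two configurations with two base pairs, $\{1-2,\,3-4\}$ and
$\{1-4,\,2-3\}$, which are connected to each other by a swap move.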

912:

\begin{figure}[ht]

914: \centerline{

915: \epsfig{figure=data/configuration.eps,width=25pc}

916: %\put(5,0){$\bar{m}$}

917: }

918: \caption{Space of the configurations for the sequence $\{A,U,A,U\}$. The

919: arrows indicate the Monte Carlo moves and their probability rate.}

920: \label{confplot}

921: \end{figure}

922:

923:

924:

A few comments are in order.  First of all, other sets of Monte Carlo
moves are of course possible. Several authors have introduced collective
moves, where several links are updated at the same time (as opposed to
one by one, as we propose).  The advantages are a general reduction of
the computing time, and a more effective simulation as far as overcoming
energy barriers is concerned.  In the present work, we prefer to keep the code

931: as simple as possible by using a set of ``local'' moves, and to focus

932: on testing its effectiveness when dealing with RNA pseudoknots.

933: Second, both the generation of the Monte Carlo moves and the

934: Metropolis method require a good (pseudo)random number generator, in

935: order to avoid biases in the output which may be very difficult to

936: detect.  For a good introduction to random number generators we refer

937: the reader to

938: \cite{knuth2} and \cite{NumRecipes}.

939: Finally, as in all stochastic algorithms, one has to be able to

940: estimate the statistical errors of the Monte Carlo prediction.  As

941: this method generates an ensemble of configurations

942: $\{C^{(0)},C^{(1)},\ldots, C^{(n_{max})}\}$, distributed according to

943: the probability distribution $P(C)$, then one can compute ensemble

944: averages of any quantity ${\cal A}(C)$ simply by:

945: \begin{equation}

946: \langle {\cal A} \rangle =\frac{1}{n_{max}}\sum_{i=1}^{n_{max}} {\cal A}(C^{(i)})  \, .

947: \end{equation}

948:

The error associated with this observable scales like $1/\sqrt{N}$, where
$N$ is the number of independent measurements. It is important to
note that $N$ is usually not equal to $n_{max}$, since in general the

952: configurations generated by any Monte Carlo algorithm are

953: correlated. One can deal with this issue in two ways.  One can compute

954: the autocorrelation length $\xi$ of the sequence of the RNA

955: configurations generated by the Monte Carlo algorithm and then

956: subsample the same set of configurations, keeping one configuration

957: every $\xi$ and skipping all the configurations in between

\cite{Geyer92}.  Another possibility is to keep all the
configurations of the sequence, and compute the statistical error by
taking into account the existence of correlations (the error is
usually larger than the naive estimate that treats the data as independent). A

962: well-known technique for computing the statistical error of a set of

963: correlated data is the so-called {\it jackknife method}. For an

964: introduction to this method, we refer the reader to

965: \cite{jackknife}. There are also other techniques which can be found in

966: \cite{Liu}.
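A minimal Python sketch of a delete-one-block jackknife estimate for the
mean of a correlated series is given below. It is an illustration under
stated assumptions rather than a prescription: the function name
\texttt{jackknife\_error} is hypothetical, and the blocks are assumed to
be longer than the autocorrelation length $\xi$, so that different blocks
can be treated as approximately independent.

\begin{verbatim}
import math

def jackknife_error(samples, n_blocks):
    """Delete-one-block jackknife estimate of the mean and its standard error."""
    block = len(samples) // n_blocks
    data = samples[:block * n_blocks]      # drop the remainder: equal blocks
    n, total = len(data), sum(data)
    # Mean of the data with block b left out, for every block b.
    partial = [(total - sum(data[b * block:(b + 1) * block])) / (n - block)
               for b in range(n_blocks)]
    mean = total / n
    variance = (n_blocks - 1) / n_blocks * sum((p - mean) ** 2 for p in partial)
    return mean, math.sqrt(variance)

# Usage: a short series split into four blocks of two measurements each.
mean, error = jackknife_error([1, 1, 3, 3, 5, 5, 7, 7], 4)
\end{verbatim}

Repeating the estimate with increasing block length until the error stops
growing is a common practical check that the blocks are long enough.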

967:

968: \subsection{Energy update}

969: According to the Metropolis method, the ratio $\rho$ (Step 2 of the

970: algorithm, eq. (\ref{rho})) is given by

971: \begin{equation}

972: \rho=\exp\left[-\frac{1}{k_B T}(\Delta E-T \Delta S+\mu \Delta g )\right] \, ,

973: \end{equation}

974: where

975: \begin{eqnarray}

976: \Delta E&=&E(C^{(n)})-E(C^{(n-1)}) \nonumber \, ,\\

977: \Delta S&=&S(C^{(n)})-S(C^{(n-1)}) \nonumber \, , \\

978: \Delta g&=&g(C^{(n)})-g(C^{(n-1)}) \, . \nonumber

979: \end{eqnarray}

980: Since the Monte Carlo moves are local (i.e. they involve only a small

981: part of the RNA sequence) the computation of $\Delta E$, $\Delta S$

982: and $\Delta g$ is usually easier and faster than computing the full

983: functions $E(C)$, $S(C)$ and $g(C)$.
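As a purely illustrative numerical example (the values below are
assumptions chosen for the arithmetic, not fitted parameters), take
$k_B T \simeq 0.616$ kcal/mol, which corresponds to $T \simeq 310$ K, a
move with $\Delta E-T\Delta S=-1.5$ kcal/mol that raises the genus by one
unit, $\Delta g=1$, and $\mu=2$ kcal/mol. Then
\[
\rho=\exp\left[-\frac{-1.5+2\times 1}{0.616}\right]
=\exp(-0.81)\simeq 0.44 \, ,
\]
so that the move is accepted in less than half of the attempts, whereas
for $\mu=0$ the same move has $\rho>1$ and is always accepted.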

984:

985: We consider first $\Delta g$, and we provide an efficient algorithm

986: for computing it.  According to eq. (\ref{genus}), the genus is given

by $g=(1-\#V+\#E-\#F)/2=
(1-L+(L+n_{arcs})-n_{loops})/2
=(1+n_{arcs}-n_{loops})/2$.
The constant term cancels in the difference between two configurations. Therefore:
\begin{equation}
\Delta g=\frac{\Delta n_{arcs}-\Delta n_{loops}}{2} \, ,

993: \end{equation}

994: where

995: \begin{equation}

996: \Delta n_{arcs}=

997: \left\{

998: \begin{array}{ll}

-1& \mbox{ for a ``remove the base pair $i-j$'' move,}\\
1 & \mbox{ for an ``add the base pair $i-j$'' move,}\\
0 & \mbox{ for a ``shift or swap the base pair $i-j$'' move.}

1002: \end{array}

1003: \right.

1004: \end{equation}

1005: The difference $\Delta n_{loops}$ can be computed by considering the

1006: loops containing the bases $i$ and $j$. In principle there are 4

1007: possible independent loops, two about $i$ and two about $j$ (see

1008: Figure \ref{loopsplot}). The connectivity of the RNA molecule can be

1009: such that the four loops are not independent. In Appendix

1010: \ref{appendix} we describe an algorithm for computing the actual number of

1011: independent loops. Then it is sufficient to run the algorithm over

1012: $C^{(n)}$ and $C^{(n-1)}$ and compute the difference of loops, that is

1013: $\Delta n_{loops}$.
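For small sequences, or as a cross-check of a local update, the genus can
also be recomputed from scratch. The following minimal Python sketch is a
straightforward $O(L)$ recomputation and not the local algorithm of the
Appendix. It assumes that the loops are the cycles of the map ``follow
$\sigma$, then shift by one position along the RNA circle'', and the names
\texttt{count\_loops} and \texttt{genus} are hypothetical.

\begin{verbatim}
def count_loops(sigma):
    """Number of loops: cycles of pos -> shift(sigma(pos)) on positions 1..L."""
    L = len(sigma)
    seen = set()
    loops = 0
    for start in range(1, L + 1):
        if start in seen:
            continue
        loops += 1
        pos = start
        while pos not in seen:
            seen.add(pos)
            pos = sigma[pos] % L + 1   # follow the arc, then shift cyclically
    return loops

def genus(sigma):
    """Genus from g = (1 + n_arcs - n_loops) / 2."""
    n_arcs = sum(1 for k, partner in sigma.items() if k < partner)
    return (1 + n_arcs - count_loops(sigma)) // 2

nested = {1: 4, 4: 1, 2: 3, 3: 2}      # two nested arcs: planar
crossing = {1: 3, 3: 1, 2: 4, 4: 2}    # two crossing arcs: pseudoknot
print(genus(nested), genus(crossing))  # 0 1
\end{verbatim}

For instance, adding the arc $2-4$ to a four-base configuration that
already contains the arc $1-3$ changes $n_{arcs}$ from $1$ to $2$ and
$n_{loops}$ from $2$ to $1$, so that the genus goes from $0$ to $1$.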

\begin{figure}[ht]

1015: \centerline{

1016: \epsfig{figure=data/loops.eps,width=20pc}

1017: }

1018: \caption{Two given bases $i$ and $j$ usually belong to $4$ loops, when drawing the arc with the double-line notation. The loops

1019: are not always independent. Here there are two examples: a case with

1020: $2$ independent loops (top), and a case with only one

1021: independent loop (bottom).}

1022: \label{loopsplot}

1023: \end{figure}

1024:

Secondly, the calculation of $\Delta E$ is more problematic. There is
not yet an RNA energy model valid for arbitrary topology. The most studied
and best-defined model, from both a theoretical and an experimental point
of view, is for the spherical topology (i.e. genus $g=0$, that is RNA
secondary structures without pseudoknots) \cite{ZukEn}. The set of

1030: empirical ``Turner'' energy rules can be generalized in order to

1031: describe some simple class of pseudoknots \cite{gultyaev} but the

1032: general case for {\it any} topology is still lacking.  For $\Delta S$

1033: the situation is slightly better, as one can model the configurational

1034: entropy of the RNA structure by using the theory of polymers (as

1035: already presented in \cite{isa1} or \cite{gultyaev}) and the inclusion

1036: of pseudoknots is in principle feasible. Therefore, in our

1037: numerical simulation we will use the ``Turner'' energy model, and even

1038: if this is not quite appropriate for higher topology, we expect the

1039: corrections to be small with respect to the over-all energy scale. We

1040: remind the reader that the purpose of this paper is to propose a new

approach for the study of pseudoknot formation in RNA secondary
structure. Thus at this point, it is reasonable to perform a
preliminary analysis based on an approximate energy model.
When a more complete energy model (including all the
topologies) becomes available, it will be sufficient to include it in the
calculation of $\Delta E$ in our algorithm.

1047:

\subsection{The ``simulated annealing'' method}

One of the major problems with the Monte Carlo simulation of RNA
folding is that the energy landscape is usually very rough, with
metastable valleys separated by energy barriers which are ``high''
compared to the energy involved in each Monte Carlo move. This is a
general situation in thermodynamic systems with many degrees of
freedom (e.g.  glasses, polymers, proteins, etc.), where in addition
to the global minimum energy configuration there may be many local
minima separated by high energy barriers. The worst consequence is
that the system can be trapped for a long time in such local minima
and the Monte Carlo exploration of the energy landscape is no longer
effective. In order to overcome this problem for RNA, M.~Schmitz and
G.~Steger \cite{Monte} proposed the use of a computational
technique called the ``simulated annealing'' method. It is a classical
method which was introduced for finding the minimum energy
configuration of a system with a very rough energy landscape
\cite{kirk}. We briefly describe the algorithm:

1065:

1066: \begin{itemize}

1067: \item Step 1: generalize the partition function eq. (\ref{ourZRNA}) to the form:

1068: \begin{equation}

1069: {\cal Z}_{RNA}= \sum_{C_{ij}} e^{-\frac{1}{k_B \Theta}\left[ E(C)-T

1070: S(T,C)+\mu g(C) \right]} \, ,

1071: \label{ourZRNATheta}

1072: \end{equation}

1073: and initialize $\Theta=\Theta_{max}>T$.

\item Step 2: Starting from an initial configuration $C^{(0)}$ (e.g.,
the fully denatured RNA configuration) sample $n$ configurations by
means of the Metropolis Monte Carlo method applied to
eq. (\ref{ourZRNATheta}).
\item Step 3: Replace $\Theta$ by a lower value and
$C^{(0)}$ by $C^{(n)}$, and go back to Step 2. Repeat this step until $\Theta$
is equal to $T$. Throughout the Monte Carlo process, keep track
of the configuration with the lowest energy.

1082: \end{itemize}
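The annealing loop above can be sketched as follows. This is a minimal
Python illustration under stated assumptions, not a definitive
implementation: the names \texttt{anneal}, \texttt{propose} and
\texttt{free\_energy} are hypothetical, the proposal is assumed to be
symmetric, \texttt{free\_energy(C)} stands for $E(C)-T S(T,C)+\mu g(C)$,
and an exponential (geometric) schedule from $\Theta_{max}$ down to
$\Theta_{min}$ is chosen only for concreteness.

\begin{verbatim}
import math
import random

def anneal(initial, propose, free_energy, theta_max, theta_min,
           n_stages, n_sweeps, rng, k_B=1.0):
    """Simulated annealing in Theta; returns the best configuration found.

    n_stages >= 2 temperatures are visited, from theta_max down to theta_min,
    with n_sweeps Metropolis steps at each temperature.
    """
    ratio = (theta_min / theta_max) ** (1.0 / (n_stages - 1))
    current, f_current = initial, free_energy(initial)
    best, f_best = current, f_current
    theta = theta_max
    for _ in range(n_stages):
        for _ in range(n_sweeps):
            trial = propose(current, rng)
            delta = free_energy(trial) - f_current
            if delta <= 0 or rng.random() <= math.exp(-delta / (k_B * theta)):
                current, f_current = trial, f_current + delta
                if f_current < f_best:          # remember the lowest value seen
                    best, f_best = current, f_current
        theta *= ratio                          # lower Theta for the next stage
    return best, f_best

# Toy usage: a one-dimensional landscape on the integers 0..20.
def toy_step(x, rng):
    y = x + rng.choice((-1, 1))
    return y if 0 <= y <= 20 else x             # stay put at the boundaries

best, f_best = anneal(0, toy_step, lambda x: (x - 13) ** 2,
                      50.0, 0.1, 20, 200, random.Random(0))
\end{verbatim}

The final configuration of each stage is used as the initial configuration
of the next one, as in Step 3.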

One can show that the global minimum can usually be obtained by
lowering $\Theta$ at a logarithmic rate \cite{annea}.  In practice, other annealing
schedules are possible: linear, hyperbolic, exponential and power-law
schedules are often implemented.

1087:

1088: Assuming that at low temperature an RNA molecule assumes a

1089: configuration which corresponds to the minimum energy, we can find

1090: such a configuration by using the simulated annealing method, starting

1091: the simulation with a value of $\Theta$ well above the melting

1092: temperature (say a few hundred degrees Celsius).  A first check of

1093: this method is whether we can reproduce the results produced by

1094: deterministic algorithms such as ``mfold'' \cite{z2003} or the

1095: ``Vienna Package'' \cite{ivovienna}. For that purpose, it is

1096: sufficient to use the ``Turner'' energy model and run our algorithm

1097: with a large value of the chemical potential $\mu$. Our preliminary

1098: tests showed that the minimum can be easily found for sequences with

1099: length up to around 300 bases. For longer RNA sequences, the

simulation time increases and the minimum may be harder to find. In
these cases we can use an additional feature of our model. In fact, our
approach also offers the interesting possibility of using the chemical

1103: potential for overcoming the energy barriers. It means that we can

1104: apply a ``simulated annealing'' method on $\mu$ rather than on $T$.

1105: Thus, starting with a low value of $\mu$ (where all the topologies

1106: with any genus are allowed) the Monte Carlo simulation can quickly

1107: explore regions which are very distant from each other in the energy

1108: landscape. Then by slowly increasing the value of $\mu$ we gradually

constrain the simulation to select only planar configurations
(i.e. secondary structures without pseudoknots), and eventually the minimum
energy configuration.  During this process, that is for

1112: intermediate values of $\mu$, many configurations in thermal

1113: equilibrium are generated, and in general they correspond to diagrams

1114: with $\langle g \rangle \neq0$, i.e. RNA configurations with

1115: pseudoknots. These configurations are the prediction of our algorithm

1116: and they should be compared with the experimental data. It is at this

1117: level that the value of $\mu$ can be tuned, in order to fit the

1118: data. The results of our preliminary investigations in this region of

1119: the phase diagram are very encouraging and promising. However, in this

1120: paper we limit ourselves to the description of this new method and of

1121: our algorithm. The results of our simulation will be published

1122: shortly.

1123:

1124:

1125:

1126: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1127: \sect{Conclusions}\label{conclsection}

1128:

1129:

1130: In this paper we propose a new approach to the problem of RNA folding

with pseudoknots. We start from a classification of RNA pseudoknots
based on their graphical representation by means of disk diagrams.  A
generic disk diagram is usually not planar, i.e. it cannot be drawn on a
plane surface without crossing lines. However, if the surface has a
high enough genus (i.e. a sufficient number of ``handles''), the
diagram can always be drawn on that surface without any crossing.  The
precise correspondence is obtained by using a famous theorem by Euler,
and it coincides with the topological classification of RNA
pseudoknots already introduced in \cite{OZ}. Then we propose a
statistical mechanics model where the formation of RNA pseudoknots is
associated with fluctuations of the topology (eq. (\ref{ourZRNA})). In
order to do that we introduce a parameter, the topological ``chemical
potential'', which controls the rate of pseudoknot formation, and can

1144: be obtained by fitting experimental data. We then discuss the

1145: qualitative structure of the phase diagram for the RNA molecule in the

1146: plane $\{\mu,T\}$ and its interpretation. Finally we describe a Monte

1147: Carlo algorithm for the prediction of RNA pseudoknots. It is based on

1148: a standard Metropolis algorithm coupled to the ``simulated annealing''

1149: method and we provide an explicit description of its implementation

1150: and use.  A numerical investigation of this technique and the phase

1151: diagram is under way and will be published shortly.

1152:

1153:

1154: \indent

1155:

1156: \noindent

1157: \underline{Acknowledgments}:

1158: We wish to thank the organizers of the EUROGRID meeting on ``Random

geometry: theory and applications'' at Les Houches (France) in March
2004, where this work was first presented. GV acknowledges the

1161: support of the European Fellowship MEIF-CT-2003-501547.

1162:

1163:

1164:

1165: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1166: \begin{appendix}

1167:

1168: \sect{An algorithm for computing $n_{loops}$}

1169: \label{appendix}

1170:

1171: In this Appendix we describe an algorithm for computing the number of

1172: independent loops adjacent to the $i$-th and $j$-th nucleotides. It is

1173: useful for computing the variation of the genus when one of the Monte

1174: Carlo moves described in Section \ref{montesection} is performed.  The algorithm

1175: we propose for counting the number of independent loops is based on

1176: tracking a walk along the diagram starting from the base $i$ and

1177: marking the loop with an identifying number (or color). Namely, we

1178: represent the configuration $C$ by means of the permutation involution

1179: $\sigma_{C}$ (as described in Section \ref{representsection}), and the

1180: algorithm is:

1181:

1182: \begin{verbatim}

1183: START

1184: v(1)=v(2)=v(3)=v(4)=0           % set the four flags to zero

1185: pos=i                           % the start position is i-th base

1186: color=1                         % using color 1

1187: v(1)=color                      % mark the first flag with the color in use

1188: do{                             % start the first loop

1189:    pos=sigma(pos)               % follow the permutation involution

1190:    if pos==i then v(2)=color    % check if it is either in i or j

1191:    if pos==j then v(4)=color

1192:    pos=shift(pos)               % shift move (along the RNA circle)

1193:    if pos==j then v(3)=color    % check if it is in j

1194:   } while (pos!=i)              % repeat until it returns at the starting point

1195:

1196: if v(2)==0 then{                % check if the second loop is still unmarked

1197:   color=color+1                 % if so, change color

1198:   pos=i                         % start again from i-th base

1199:   v(2)=color                    % mark the second flag

1200:   do{                           % repeat all the above for the second loop (at i)

1201:      pos=shift(pos)

1202:      if pos==j then v(3)=color

1203:      pos=sigma(pos)

1204:      if pos==j then v(4)=color

1205:      } while (pos!=i)

1206: }

1207:

1208:

1209: if v(3)==0 then{                % repeat all the above for the third loop (at j)

1210:    color=color+1

1211:    pos=j

1212:    v(3)=color

1213:    do{

1214:       pos=sigma(pos)

1215:       if pos==j then v(4)=color

1216:       pos=shift(pos)

1217:       } while (pos!=j)

1218: }

1219:

1220: if v(4)==0 then{                % the fourth loop at j is independent from the

1221:    color=color+1                % previous ones if and only if it has not been

1222: }                               % marked yet

1223:

1224: nloops=color                    % the number of independent loops is the number

1225:                                 % of used colors

1226: END

1227: \end{verbatim}

1228: $L$ is the length of the RNA sequence and the function

1229: \texttt{shift(pos)} is just the increment by 1 of the variable \texttt{pos}

1230: with period $L$, i.e. \texttt{shift(pos)=remainder(pos,L)+1}. At the

1231: end of the algorithm the variable \texttt{color} contains the number of

1232: independent loops $n_{loops}$. The algorithm runs in

1233: $O(L)$ time.
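
As an illustration, the pseudocode above can be transcribed into Python roughly as follows. This is only a minimal sketch under stated assumptions, not the implementation used in this work: the names \texttt{involution} and \texttt{n\_loops}, and the storage of $\sigma_C$ as a 1-based list in which an unpaired base is mapped to itself, are choices made here for the example. Each do-while loop is taken to exit when the walk returns to its starting base.

```python
def involution(L, pairs):
    """Build sigma as a 1-based list (hypothetical helper): sigma[k] is the
    partner of base k, or k itself if unpaired. Index 0 is unused padding."""
    sigma = list(range(L + 1))
    for a, b in pairs:
        sigma[a], sigma[b] = b, a
    return sigma


def shift(pos, L):
    """Move one step along the circular backbone (1-based, period L)."""
    return pos % L + 1


def n_loops(sigma, i, j):
    """Number of independent loops adjacent to bases i and j (1-based)."""
    L = len(sigma) - 1
    v = [0, 0, 0, 0]          # the four flags v(1)..v(4) of the pseudocode
    color = 1
    v[0] = color

    pos = i                   # first loop, started at i
    while True:
        pos = sigma[pos]      # follow the permutation involution
        if pos == i:
            v[1] = color
        if pos == j:
            v[3] = color
        pos = shift(pos, L)   # shift move along the RNA circle
        if pos == j:
            v[2] = color
        if pos == i:          # back at the starting point
            break

    if v[1] == 0:             # second loop at i not yet marked
        color += 1
        v[1] = color
        pos = i
        while True:
            pos = shift(pos, L)
            if pos == j:
                v[2] = color
            pos = sigma[pos]
            if pos == j:
                v[3] = color
            if pos == i:
                break

    if v[2] == 0:             # third loop, started at j
        color += 1
        v[2] = color
        pos = j
        while True:
            pos = sigma[pos]
            if pos == j:
                v[3] = color
            pos = shift(pos, L)
            if pos == j:
                break

    if v[3] == 0:             # fourth loop is new only if still unmarked
        color += 1

    return color              # number of colors used
```

For a chain of $L=4$ bases with the single pairing $(1,3)$, \texttt{n\_loops(involution(4, [(1, 3)]), 1, 3)} gives 2, one loop on each side of the arc. Adding the crossing pairing $(2,4)$ gives 1, since the two crossing arcs are bordered by a single loop.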

1234: \end{appendix}

1235:

1236: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1237:

1238: %\indent

1239:

1240: \begin{thebibliography}{99}

1241:

1242: \bibitem{Science} J. Couzin, ``Breakthrough of the Year: Small RNAs Make Big Splash'', Science {\bf 298} (2002) 2296.

1243:

1244: \bibitem{TB} I. Tinoco Jr. and C. Bustamante, J. Mol. Biol. {\bf 293} (1999) 271.

1245:

1246: \bibitem{LCC} P.A. Limbach, P.F. Crain and J.A. McCloskey, Nucleic Acids Res. {\bf 22} (1994)  2183.

1247: %2196.

1248: %Summary: the modified nucleosides of RNA.

1249:

1250: \bibitem{Nagaswamy} U. Nagaswamy, N. Voss, Z. Zhang and G.E. Fox, Nucleic Acids Res. {\bf 28} (2000) 375.

1251: %376.

1252:

1253: \bibitem{BJT} A. Banerjee, J. Jaeger and D. Turner, Biochemistry {\bf 32} (1993) 153.

1254:

1255: \bibitem{BW} P. Brion and E. Westhof, Ann. Rev. Biophys. Biomol. Struct. {\bf 26} (1997) 113.

1256:

1257: \bibitem{LD} L. Laing and D. Draper, J. Mol. Biol. {\bf 237} (1994) 560.

1258:

1259: \bibitem{ZS} M. Zuker and D. Sankoff, Bull. Math. Biol. {\bf 46} (1984) 591.

1260: %621.

1261:

1262: \bibitem{Monte} M. Schmitz and G. Steger, J. Mol. Biol. {\bf 255} (1996) 254.

1263:

1264: \bibitem{phyl1} R.R. Gutell, N. Larsen and C.R. Woese, Microbiol Rev. {\bf 58} (1994) 10.

1265:

1266: \bibitem{phyl2} C.R. Woese, R.R. Gutell, R. Gupta and H.F. Noller, Microbiol. Rev. {\bf 47} (1983) 621.

1267:

1268: \bibitem{phyl3} P. Higgs, Phys. Rev. Lett. {\bf 76} (1996) 704.

1269:

1270: \bibitem{phyl4} R. Nussinov, G. Pieczenik, J. Griggs and D. Kleitman, SIAM J. Appl. Math. {\bf 35} (1978) 68.

1271: %-82.

1272:

1273: \bibitem{FFHS} C. Flamm, W. Fontana, I.L. Hofacker and P. Schuster, RNA {\bf 6} (2000) 325.

1274:

1275: %\bibitem{FHMSZ} Christoph Flamm, Ivo L. Hofacker, Sebastian Maurer-Stroh, Peter F. Stadler, Martin Zehl.

1276: %Design of multistable RNA molecules. RNA 7:325-338, 2001

1277:

1278: \bibitem{mironovD} A.A. Mironov, L.P. Dyakonova and A.E. Kister, J. Biomol. Struct. Dyn. {\bf 2} (1985) 953.

1279: %962.

1280:

1281: \bibitem{isa1} H. Isambert and E.D. Siggia, Proc. Natl. Acad. Sci. USA {\bf 97} (2000) 6515.

1282: %6520.

1283:

1284: \bibitem{isa2} A. Xayaphoummine, T. Bucher, F. Thalmann and H. Isambert, Proc. Natl. Acad. Sci. USA {\bf 100} (2003) 15310.

1285:

1286: \bibitem{MWM} J.E. Tabaska, R.B. Cary, H.N. Gabow and G.D. Stormo, Bioinformatics {\bf 14} (1998) 691.

1287: %699

1288: %An RNA folding method capable of identifying pseudoknots and base triples

1289:

1290: \bibitem{Z2} M. Zuker, Curr. Opin. Struct. Biol. {\bf 10} (2000) 303.

1291: % {\it Calculating nucleic acid secondary structure}

1292:

1293: \bibitem{PRB} C.W. Pleij, K. Rietveld and L. Bosch, Nucleic Acids Res. {\bf 13} (1985) 1717.

1294: %-1731.

1295: %A new principle of RNA folding based on pseudoknotting

1296:

1297: \bibitem{WJ} E. Westhof and L. Jaeger,  Current Opinion Struct. Biol. {\bf 2} (1992) 327.

1298: %-333.

1299: %``RNA pseudoknots''

1300:

1301: \bibitem{MD} V.K. Misra and D.E. Draper, Biopolymers {\bf 48} (1998) 113.

1302:

1303: \bibitem{Dave} J. Pan, D. Thirumalai and S.A. Woodson, Proc. Natl. Acad. Sci. USA {\bf 96} (1999) 6149.

1304: %6154

1305: %Biochemistry

1306: %Magnesium-dependent folding of self-splicing RNA: Exploring the link between cooperativity, thermodynamics, and kinetics

1307:

1308: \bibitem{MD2} V.K. Misra, R. Shiman and  D.E. Draper, Biopolymers {\bf 69}

1309: (2003) 118.

1310: %-136.

1311: %A thermodynamic framework for the magnesium-dependent folding of RNA.

1312:

1313: \bibitem{waterman} M.S. Waterman, and T.H. Byers, Mathematical Biosciences, {\bf 77} (1985) 179.

1314: %-188.

1315: %A dynamic programming algorithm to find all solutions in a neighborhood of the optimum.

1316:

1317: \bibitem{williams} A.L. Williams and I. Tinoco Jr.,  Nucleic Acids Res. {\bf 14} (1986) 299.

1318: %-315.

1319: %A dynamic programming algorithm for finding alternate RNA secondary structures.

1320:

1321: \bibitem{NJ} R. Nussinov and A.B. Jacobson, PNAS {\bf 77} (1980) 6309.

1322:

1323: \bibitem{Z} M. Zuker, Science {\bf 244} (1989) 48.

1324: %-52.

1325: %On finding all suboptimal foldings of an RNA molecule

1326:

1327: \bibitem{Z1} M. Zuker and P. Stiegler, Nucleic Acids Res. {\bf 9} (1981) 133.

1328: %-148.

1329: %Optimal computer folding of large RNA sequences using thermodynamic and auxiliary information

1330:

1331: %\bibitem{HH1} I.L. Hofacker, M. Fekete and P.F. Stadler, J. Mol. Biol. {\bf 319} (2002), 1059-1066.

1332: %Secondary Structure Prediction for Aligned RNA Sequences

1333:

1334: \bibitem{HH2} I.L. Hofacker, W. Fontana,  P.F. Stadler, S. Bonhoeffer, M. Tacker and P. Schuster, Monatshefte f. Chemie {\bf 125} (1994) 167.

1335: %-188.

1336: %Fast Folding and Comparison of RNA Secondary Structures.

1337:

1338: \bibitem{Wuchty} S. Wuchty, W. Fontana, I.L. Hofacker, P. Schuster, Biopolymers {\bf 49} (1999) 145.

1339: %-165.

1340: %Complete suboptimal folding of RNA and the stability of secondary structures.

1341:

1342: \bibitem{RE} E. Rivas and S.R. Eddy, J. Mol. Biol. {\bf 285} (1999) 2053.

1343: %-2068.

1344:

1345: \bibitem{Uemura} Y. Uemura, A. Hasegawa, S. Kobayashi and T. Yokomori, Theoretical Computer Science {\bf 210}  (1999) 277.

1346: %-303.

1347: %Tree adjoining grammars for RNA structure prediction

1348:

1349: \bibitem{Akutsu} T. Akutsu, Discr. Appl. Math. {\bf 104} (2001) 45.

1350: %-62.

1351: %Dynamic programming algorithm for RNA secondary structure prediction with pseudoknots

1352:

1353: \bibitem{LP} R.B. Lyngs{\o} and C.N.S. Pedersen, J. Comp. Biol. {\bf 7} (2000) 409.

1354: %-427.

1355:

1356: \bibitem{giege} R. Giegerich and J. Reeder, Technical Report 2003-03 (2003) University of Bielefeld.

1357:

1358: \bibitem{deogun} J.S. Deogun, R. Donis, O. Komina and F. Ma, Proc. Second

1359: Asia-Pacific Bioinformatics Conference (APBC2004), Dunedin, New

1360: Zealand CRPIT, 29. Chen, Y.P.P., Ed. ACS. 239-246.

1361: %RNA Secondary Structure Prediction with Simple Pseudoknots.

1362:

1363: \bibitem{OZ} H. Orland and A. Zee, Nucl. Phys. {\bf B620} (2002) 456.

1364:

1365: \bibitem{POZ} M. Pillsbury, H. Orland, A. Zee, http://arXiv.org/physics/0207110.

1366:

1367: \bibitem{PTOZ}  M. Pillsbury, J.A. Taylor, H. Orland and A. Zee, http://arXiv.org/cond-mat/0310505.

1368: %An Algorithm for RNA Pseudoknots

1369:

1370: \bibitem{MironovL} A.A. Mironov and V.F. Lebedev, BioSystems {\bf 30} (1993) 49.

1371: %-56.

1372:

1373: \bibitem{gultyaev} A.P. Gultyaev, F.H.D. van Batenburg and C.W.A. Pleij, RNA  {\bf 5} (1999) 609.

1374: %-617.

1375:

1376: \bibitem{abrah} J.P. Abrahams, M. van den Berg, E. van Batenburg and C.W.A. Pleij, Nucleic Acids Res. {\bf 18} (1990) 3035.

1377: %3044.

1378: %Prediction of RNA secondary structure, including pseudoknotting, by computer simulation.

1379:

1380: \bibitem{gultyMonte} A.P. Gultyaev, Nucleic Acids Res. {\bf 19} (1991) 2489.

1381: %-2494.

1382: %The computer simulation of RNA folding involving pseudoknot formation.

1383:

1384: \bibitem{Ivo} I.L. Hofacker, in ``Monte Carlo Approach to Biopolymers and Protein Folding'', P. Grassberger, G. Barkema and W. Nadler (eds.), World Scientific, Singapore (1998) 171.

1385: %-182.

1386: %RNA Secondary Structures: A Tractable Model of Biopolymer Folding

1387:

1388: \bibitem{pseudobase} F. H. D. van Batenburg, A. P. Gultyaev and C. W. A. Pleij, Nucleic Acids Res. {\bf 29}  (2001) 194.

1389: %-195.

1390:

1391: \bibitem{pseudobase1} F.H.D. van Batenburg, A.P. Gultyaev, C.W.A. Pleij and J. Oliehoek, Nucleic Acids Res. {\bf 28} (2000) 201.

1392: %-204.

1393: The database is at http://wwwbio.LeidenUniv.nl/$\sim$Batenburg/PKB.html.

1394:

1395: \bibitem{HH} P. Hogeweg and B. Hesper, Nucleic Acids Res. {\bf 12} (1984) 67.

1396: %-74.

1397: %Energy directed folding of RNA sequences

1398:

1399: \bibitem{FKSS} W. Fontana, A.M. Konings,  P.F. Stadler and P. Schuster, Biopolymers {\bf 33} (1993) 1389.

1400: %-1404.

1401: %Statistics of RNA secondary structures

1402:

1403: \bibitem{RNAgraph1} H.H. Gan, S. Pasquali and T. Schlick, Nucleic Acids Res. {\bf 31} (2003) 2926.

1404: %-2943.

1405: %Exploring the repertoire of RNA secondary motifs using graph theory

1406: %with implications for RNA design .

1407:

1408: \bibitem{Gluick} T.C. Gluick and D.E. Draper, J. Mol. Biol. {\bf 241} (1994) 246.

1409: %-262.

1410: %thermodynamics of folding a pseudoknotted mRNA fragment.

1411:

1412: \bibitem{Tur} M.J. Serra, D.H. Turner and S.M. Freier, Methods Enzymol. {\bf 259} (1995) 243.

1413: %-261.

1414:

1415: \bibitem{ZTM} M. Zuker, D.H. Mathews and D.H. Turner, in RNA Biochemistry and Biotechnology, J. Barciszewski and B.F.C. Clark (eds.), NATO ASI Series, Kluwer Academic Publishers (1999).

1416:

1417: \bibitem{caskill} J.S. McCaskill, Biopolymers {\bf 29} (1990) 1105.

1418: %-1119.

1419:

1420: \bibitem{z2003} M. Zuker, Nucleic Acids Res. {\bf 31} (2003) 3406.

1421: %-15.

1422: %Mfold web server for nucleic acid folding and hybridization prediction.

1423:

1424: \bibitem{mathews} D.H. Mathews, J. Sabina, M. Zuker and  D.H. Turner, J. Mol. Biol. {\bf 288} (1999) 911.

1425: %-940.

1426: %Expanded Sequence Dependence of Thermodynamic Parameters Improves Prediction of RNA Secondary Structure

1427:

1428: \bibitem{ivovienna} I.L. Hofacker, Nucleic Acids Res. {\bf 31} (2003) 3429.

1429: %-3431.

1430:

1431: \bibitem{Mau} J. Ambj{\o}rn, M. Carfora, and A. Marzuoli, {\it The Geometry of Dynamical Triangulations}, Springer-Verlag, Berlin, 1998.

1432:

1433: \bibitem{Amb}  J. Ambj{\o}rn, B. Durhuus, T. Jonsson, {\it Quantum Geometry : A Statistical Field Theory Approach}, Cambridge University Press, 1997.

1434:

1435: \bibitem{thooft}  G. 't Hooft, Nucl. Phys. {\bf B72} (1974) 461.

1436: %A planar diagram theory for strong interactions, Nucl. Phys. B72 (1974)

1437:

1438: \bibitem{MC} K. Binder and D.W. Heermann, {\it Monte Carlo Simulation in Statistical Physics}, Springer-Verlag, Berlin, 1992.

1439:

1440: \bibitem{frenkel} D. Frenkel and B. Smit, {\it Understanding Molecular Simulation}, 2nd edition, Academic Press,  2002.

1441:

1442: \bibitem{combinatorics} I.L. Hofacker, P. Schuster, P.F. Stadler,

1443: Discr. Appl. Math. {\bf 88} (1998) 207.

1444: %-237.

1445: %Combinatorics Of RNA Secondary Structures

1446:

1447: \bibitem{Metro} N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller, Jour. Chemical Physics {\bf 21} (1953) 1087.

1448: %-1092.

1449: %Equation of State Calculations by Fast Computing Machines

1450:

1451: \bibitem{knuth3} D.E. Knuth, {\it The Art of Computer Programming, Volume 3: Sorting and Searching}, 2nd ed. Reading, MA: Addison-Wesley, 1998.

1452:

1453: \bibitem{knuth2} D.E. Knuth, {\it The Art of Computer Programming, Volume 2: Seminumerical Algorithms}, 3rd ed. Reading, MA: Addison-Wesley, 1997.

1454:

1455: \bibitem{NumRecipes} W.H. Press, B.P. Flannery, S.A. Teukolsky, W.T. Vetterling, {\it Numerical Recipes in C : The Art of Scientific Computing}, Cambridge University Press, Cambridge, 1992.

1456:

1457: \bibitem{Geyer92} C.J. Geyer, Statist. Sci. {\bf 7} (1992) 473.

1458: %-511.

1459:

1460: \bibitem{jackknife} J. Shao and D. Tu, {\it The Jackknife and Bootstrap}, Springer Verlag, 1995.

1461:

1462: \bibitem{Liu} J.S. Liu, {\it  Monte Carlo Strategies in Scientific Computing},  Chapter 2, Springer New York, 2001.

1463:

1464: \bibitem{ZukEn} M. Zuker, Methods Enzymol. {\bf 180} (1989) 262.

1465: %288.

1466:

1467: %%%ENERGYPARAMETERS

1468: %A. Walter, D Turner, J Kim, M Lyttle, P M\"uller, D Mathews, M Zuker "Coaxial stacking of helices enhances binding of oligoribonucleotides.." PNAS, 91, pp

1469: %9218-9222, 1994

1470:

1471: %algorithm for loop matching

1472: %\bibitem{phyl5}  R.R. Gutell, Curr. Opin. Struct. Biol. {\bf 6} (1993), 313.

1473:

1474: \bibitem{kirk} S. Kirkpatrick, C.D. Gelatt and M.P. Vecchi, Science {\bf 220} (1983) 671.

1475: %-680.

1476:

1477: \bibitem{annea} S. Geman and D. Geman, IEEE Trans. on Pattern Analysis and Machine Intelligence,

1478: {\bf 6} (1984) 721.

1479: %-741.

1480:

1481: \end{thebibliography}

1482: \end{document}

1483:

1484:

1485:

1486: %%% Local Variables:

1487: %%% mode: latex

1488: %%% TeX-master: t

1489: %%% End:

1490: