1: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2: %\\
3: %Title: Prediction of RNA pseudoknots by Monte Carlo simulations
4: %Authors: Graziano Vernizzi, Henri Orland, Anthony Zee
5: %Comments: LaTeX, 22 pages
6: %Report-no: SPhT-T04/061
7: %\\
8: %Abstract:
9: % In this paper we consider the problem of RNA folding with pseudoknots.
10: % We use a graphical representation in which the secondary structures
11: % are described by planar diagrams. Pseudoknots are identified as
12: % non-planar diagrams. We analyze the non-planar topologies of RNA
13: % structures and propose a classification of RNA pseudoknots according
14: % to the minimal genus of the surface on which the RNA structure can be
15: % embedded. This classification provides a simple and natural way to
16: % tackle the problem of RNA folding prediction in presence of
17: % pseudoknots. Based on that approach, we describe a Monte Carlo
18: % algorithm for the prediction of pseudoknots in an RNA molecule.
19: %
20:
21: \documentclass[11pt]{article}
22: \usepackage{amssymb}
23: \usepackage{epsfig}
24: \newlength{\bredde}
25: \def\slash#1{\settowidth{\bredde}{$#1$}\ifmmode\,\raisebox{.15ex}{/}
26: \hspace*{-\bredde} #1\else$\,\raisebox{.15ex}{/}\hspace*{-\bredde} #1$\fi}
27: \textwidth 170mm
28: \textheight 230mm
29: \topmargin -0.8cm
30: \oddsidemargin -0.8cm
31: \evensidemargin -0.8cm
32:
33:
34: \newcommand{\vs}[1]{\rule[- #1 mm]{0mm}{#1 mm}}
35: \newcommand{\nn}{\nonumber}
36:
37: \newcommand{\sect}[1]{\setcounter{equation}{0}\section{#1}}
38: \renewcommand{\theequation}{\thesection.\arabic{equation}}
39:
40: \def\zc{{z^\ast}}
41: \def\tr{{\mbox{tr}}}
42: \def\re{{\Re\mbox{e}}}
43: \def\im{{\Im\mbox{m}}}
44:
45:
46: \begin{document}
47: \topmargin -1.4cm
48: \oddsidemargin -0.8cm
49: \evensidemargin -0.8cm
50: \title{\Large{{\bf
51: Prediction of RNA pseudoknots by Monte Carlo simulations
52: }}}
53:
54: \vspace{1.5cm}
55: \author{~\\{\sc G. Vernizzi}$^1$, {\sc H. Orland}$^1$
56: and {\sc A. Zee}$^2$\\~\\
57: $^1$Service de Physique Th\'eorique, CEA/DSM/SPhT Saclay\\
58: Unit\'e de recherche associ\'ee au CNRS\\
59: F-91191 Gif-sur-Yvette Cedex, France\\~\\
60: $^2$Institute of Theoretical Physics and Department of Physics\\
61: University of California, Santa Barbara, CA 93106, USA
62: }
63:
64:
65: \date{}
66: \maketitle
67: \vfill
68: \begin{abstract}
69: In this paper we consider the problem of RNA folding with pseudoknots.
70: We use a graphical representation in which the secondary structures
71: are described by planar diagrams. Pseudoknots are identified as
72: non-planar diagrams. We analyze the non-planar topologies of RNA
73: structures and propose a classification of RNA pseudoknots according
74: to the minimal genus of the surface on which the RNA structure can be
75: embedded. This classification provides a simple and natural way to
76: tackle the problem of RNA folding prediction in presence of
77: pseudoknots. Based on that approach, we describe a Monte Carlo
78: algorithm for the prediction of pseudoknots in an RNA molecule.
79: \end{abstract}
80: %PACS 02.40.Pc General topology **
81: % 82.35.Pq Biopolymers, biopolymerization **
82: % 82.39.Pj Nucleic acids, DNA and RNA bases
83: % 87.14.Gg DNA, RNA **
84: % 87.15.Aa Theory and modeling; computer simulation
85: % 87.15.Cc Folding and sequence analysis **
86: % 87.15.He Dynamics and conformational changes
87:
88:
89: %keywords: General topology, RNA, pseudoknot, structure prediction,
90: % Monte Carlo simulations
91:
92:
93: \vfill
94:
95:
96: \begin{flushleft}
97: SPhT-T04/061\\
98: q-bio.BM/0405014
99: \end{flushleft}
100: \thispagestyle{empty}
101: \newpage
102:
103:
104:
105: \renewcommand{\thefootnote}{\arabic{footnote}}
106: \setcounter{footnote}{0}
107:
108:
109: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
110: \sect{Introduction}
111: \label{introsection}
112: In recent years the quest for an algorithm which can predict the
113: spatial structure of an RNA molecule given its chemical sequence has
114: received considerable attention from molecular biologists
115: \cite{Science}. In fact the three-dimensional structure of an RNA
116: molecule is intimately connected to its specific biological function
117: in the cell (e.g. for protein synthesis and transport, catalysis,
118: chromosome replication and regulation) \cite{TB}. It is determined by
119: the sequence of nucleotides along the sugar-phosphate backbone of the
120: RNA. The chemical formula or sequence of covalently linked nucleotides
121: along the molecule from the 5' to the 3' end is called the {\it
122: primary structure}. The four basic types of nucleotides are adenine
123: (A), cytosine (C), guanine (G) and uracil (U), but it is known that
124: modified bases may appear \cite{LCC}.
125:
At high enough temperatures, or under high-denaturant conditions, RNA
127: molecules have the three-dimensional structure of a free
128: single-stranded swollen polymer. At room temperature, different
129: nucleotides can pair by means of saturating hydrogen bonds. The
130: standard Watson-Crick pairs are A$\bullet$U and C$\bullet$G with two
131: and three hydrogen bonds respectively, whereas G$\bullet$U is a wobble
132: pair with two hydrogen bonds. Comparative methods showed that
133: ``non-canonical'' pairings are also possible \cite{Nagaswamy}, as well
134: as higher-order interactions such as triplets, or quartets. In this
135: paper we will consider only canonical base-pair interactions. Adjacent
base pairs can stack, providing an additional binding energy which is
137: actually the origin of the formation of stable A-form helices, one of
138: the main structural characteristics of folded RNAs. Helices may embed
139: unpaired sections of RNA, in the form of hairpins, loops and bulges.
140: It is all these pairings, stackings of bases and structural motifs
141: which bring the RNA into its folded three-dimensional
142: configuration. One of the main open problems of molecular biology is
143: the prediction of the actual spatial molecular structure of RNA
144: (i.e. its {\it shape}) given its primary structure.
145:
146: As we shall see in Section \ref{representsection}, it is possible to
147: define {\it secondary structures} of RNA as structures in which the
148: pairings between canonical base pairs do not cross in a certain
149: representation (planar graphs). One can also define the {\it tertiary
150: structure} of RNA which is the actual three-dimensional arrangement of
151: the base sequence. This classification corresponds to the fact that
152: the secondary structure of RNA carries the main contribution to the
153: free energy of a fully folded RNA configuration, including also some
154: of the sterical constraints. For that reason one can attempt to
155: describe the folding process hierarchically \cite{TB,BJT,BW,LD}.
156: However, since the secondary structure describes just the topology of
157: binary contacts of the bases, most of the information about distances
158: in real three-dimensional space is lost. The importance of the
secondary structure lies in the fact that it may provide the
160: ``skeleton'' of the final tertiary structure.
161:
162: Over the past twenty years several algorithms have been proposed for
163: the prediction of RNA folding. They are based on: deterministic or
164: stochastic minimization of a free energy function \cite{ZS,Monte},
165: phylogenetic comparison \cite{phyl1,phyl2,phyl3,phyl4}, kinetic
166: folding \cite{FFHS,mironovD,isa1,isa2}, maximal weighted matching
167: method \cite{MWM}, and several others (for a survey see \cite{Z2}).
168: It is fair to say that despite the large number of tools available for
169: the prediction of RNA structures, no reliable algorithms exist for the
170: prediction of the full tertiary structure of RNA. Most of the
171: algorithms listed above deal with the prediction of just the RNA
172: secondary structure. To describe the full folding it is important to
173: introduce the concept of RNA pseudoknot \cite{PRB}. One says that two
174: base pairs form a pseudoknot when the parts of the RNA sequence
175: spanned by those two base pairs are neither disjoint, nor have one
176: contained in the other. Thus RNA secondary structures without
pseudoknots can be represented by planar diagrams, whereas
pseudoknots appear when two base pairs ``cross'', leading to
179: non-planar diagrams (a more precise definition is given in the next
180: Section). Pseudoknots play an important role in natural RNAs
181: \cite{WJ}, for structural, regulatory and catalytic
182: functions. Pseudoknots are excluded in the definition of RNA secondary
183: structure and many authors consider them as part of the tertiary
184: structure. This restriction is due to the fact that RNA secondary
185: structures without pseudoknots can be predicted easily. One should
186: also note that pseudoknots very often involve base-pairing from
187: distant parts of the RNA, and are thus quite sensitive to the ionic
188: strength of the solution. It has been shown that the number of
pseudoknots depends on the concentration of Mg$^{++}$ ions, and can be
190: strongly suppressed by decreasing the ionic strength (thus enhancing
191: electrostatic repulsion) \cite{MD,Dave,MD2}. The most popular and
192: successful technique for predicting secondary structures is dynamic
193: programming \cite{ZS,waterman,williams,NJ,Z,Z1,HH2,Wuchty}, for which
194: the memory and CPU requirements scale with the sequence length $L$ as
195: $O(L^2)$ and $O(L^3)$, respectively.
196:
197: Recently, new deterministic algorithms that deal with pseudoknots have
198: been formulated
199: \cite{RE,Uemura,Akutsu,LP,giege,deogun,OZ,POZ,PTOZ}. In this case the
200: memory and CPU requirements generally scale as $O(L^4)$ and $O(L^6)$
201: respectively ($O(L^4)$ and $O(L^5)$ in
\cite{LP}, or $O(L^3)$ and $O(L^4)$ for a restricted model in
203: \cite{Akutsu}), which can be a very demanding computational effort
204: even for short RNA sequences ($L\sim100$). Moreover, the main
limitation of these algorithms is the lack of precise experimental
information about the contribution of pseudoknots to the RNA free
207: energy, which is often excluded a priori in the data analysis (as also
208: pointed out in \cite{isa1,MironovL,gultyaev}). The
209: increase of computational complexity does not come as a surprise. In
210: fact the RNA-folding problem with pseudoknots has been proven to be
211: NP-complete for some classes of pseudoknots \cite{Akutsu,LP}.
212: For that reason, stochastic algorithms might be a better choice to
213: predict secondary structures with pseudoknots in a reasonable time and
214: for long enough sequences.
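
To give a rough idea of these scalings, consider a sequence of length
$L=100$ and ignore all prefactors (an assumption made here purely for
illustration, since the actual cost depends on the implementation):
\[
\begin{array}{lll}
 & \mbox{memory} & \mbox{CPU} \\
\mbox{without pseudoknots} & L^2 = 10^{4} & L^3 = 10^{6} \\
\mbox{with pseudoknots} & L^4 = 10^{8} & L^6 = 10^{12}
\end{array}
\]
Under this crude estimate, including pseudoknots multiplies the CPU
cost by $L^3=10^6$ and the memory by $L^2=10^4$ already at $L=100$.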
215:
216: In \cite{Monte,abrah,gultyMonte,Ivo} stochastic Monte Carlo algorithms
217: for the prediction of RNA pseudoknots have been proposed. In these
218: stochastic approaches, the very irregular structure of the energy
219: landscape (glassy-like) is the main obstacle: configurations with
220: small differences in energy may be separated by high energy barriers,
221: and the system may very easily get trapped in metastable states. Among
222: the stochastic methods, the direct simulation of the RNA-folding
223: dynamics (including pseudoknots) with kinetic folding algorithms
\cite{isa1,isa2} is the most successful. This technique allows one to
225: describe the succession of secondary structures with pseudoknots
226: during the folding process. The approach we follow in this paper is
227: close in spirit to that one, with a stronger emphasis on the
228: topological character of the RNA pseudoknots. It is based on a
229: correspondence (first noticed by E. Rivas and S.R. Eddy in \cite{RE})
230: between a graphical representation of RNA secondary structures with
231: pseudoknots and Feynman diagrams. In \cite{RE} the authors consider
232: only a particular class of pseudoknots. Along the same direction, the
233: authors of \cite{OZ} made the correspondence between RNA folding and
234: Feynman diagrams more explicit by formulating a {\it matrix field
235: theory} model whose Feynman diagrams give exactly all the RNA
secondary structures with pseudoknots. The remarkable feature of this
237: new approach is that it provides an analytic tool for the prediction
238: of pseudoknots, and all the diagrams appear to be naturally organized
239: in a series of terms, called the {\it topological expansion}, where
240: the first term corresponds to planar secondary structures without
241: pseudoknots, and higher-order terms correspond to structures with
242: pseudoknots.
243:
244: In this paper we explore in more detail this topological expansion
245: and its potential predictive power. We also propose a numerical
246: stochastic algorithm for dealing with this expansion in a systematic
247: way, which in principle allows the prediction of all kinds of RNA
248: pseudoknots. The paper is organized as follows. In Section
249: \ref{representsection} we review some well-known graphical
representations of RNA structures, with special emphasis on the
251: so-called ``disk diagram'' representation. In such a representation
252: one can uniquely associate to each RNA secondary structure with (or
253: without) pseudoknots, a circle diagram which is planar (or not planar,
254: respectively). In Section \ref{toposection} we show how one can
255: characterize the ``degree of non-planarity'' of a given disk diagram.
256: In fact, one can always associate an integer number to each RNA disk
257: diagram, called the {\it genus}, and we will describe its topological
258: meaning and information content. We thus propose to classify
259: RNA pseudoknots according to their genus. Following this idea, in Section
260: \ref{statsection} we generalize the standard thermodynamic model for the
261: description of RNA structures to the inclusion of pseudoknots. The
generalized model we propose is very natural, in the same spirit as
263: going from the Canonical Ensemble to the Grand Canonical Ensemble in
264: statistical mechanics. Our model can control the topological
fluctuations, i.e. the formation of pseudoknots in the RNA molecule,
266: and we will describe the general features of its phase diagram. In
267: Section \ref{montesection} we describe a Monte Carlo algorithm for the
268: actual calculation of thermodynamical quantities in our generalized
model. In particular we will list in detail the Monte Carlo moves,
270: the free-energy updating algorithm and the simulated annealing method
271: we propose for dealing with the problem of high energy
272: barriers. Section \ref{conclsection} contains the concluding remarks,
273: and the Appendix is devoted to the explicit description of a part of
274: the Monte Carlo algorithm of Section
275: \ref{montesection}.
276:
277:
278:
279:
280:
281:
282: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
283: \sect{Representation of RNA structures}
284: \label{representsection}
285:
286: Any RNA sequence can be represented as the list of nucleotides
$r_i\in\{\mbox{A,C,G,U}\}$, $i=1,\ldots,L$, where $r_i$ is the $i$-th
288: nucleotide along the oriented sugar backbone (from 5' to 3'). The
289: ordered list $\{r_1,r_2,\ldots,r_L\}$ is called the primary structure
290: of the RNA.
291:
292: The RNA secondary structure requires a more graphical representation.
293: Actually there are several equivalent ways to represent an RNA
294: secondary structure with a given primary structure. The most commonly
295: used representation is perhaps the {\it bracket} notation, where two
paired bases, $r_i$ and $r_j$ ($i<j$), are represented by parentheses
``('' and ``)'', and unpaired bases are represented by a dot `.' or a
colon `:' (see Figure
\ref{bracketplot}). Pseudoknots can be described in a similar fashion,
but one needs to introduce several different kinds of brackets (like
square brackets `[', `]' or braces `$\{$', `$\}$', as for example in
302: the database \cite{pseudobase,pseudobase1}), and this is not a
303: very efficient representation for complicated structures.
\begin{figure}[t]
305: \centerline{
306: \epsfig{figure=data/bracket.eps,width=25pc}
307: }
308: \caption{
309: An RNA configuration without pseudoknots (left column) and RNA
310: configuration with a simple ``H'' pseudoknot (right column). From the
311: top to the bottom: the RNA configuration, arc representation and
312: bracket representation. Note that the arc diagram for pseudoknotted
313: RNA has crossing arcs and the bracket representation requires two
kinds of parentheses.}
315: \label{bracketplot}
316: \end{figure}
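
The need for a second kind of bracket can be stated as a simple
condition on the indices. As a minimal illustration (the pair labels
$(i,j)$ and $(k,l)$ are notation introduced here, with $i<j$, $k<l$
and $i<k$), two base pairs are compatible with a single kind of
parenthesis if they are disjoint or nested, and they cross otherwise:
\[
\begin{array}{lll}
i<j<k<l & \mbox{(disjoint)} & \mbox{\tt ()()} \\
i<k<l<j & \mbox{(nested)} & \mbox{\tt (())} \\
i<k<j<l & \mbox{(crossing)} & \mbox{\tt ([)]}
\end{array}
\]
In the last case the pairs $(1,3)$ and $(2,4)$ satisfy $1<2<3<4$. With
a single kind of parenthesis they would be written as {\tt (())},
which is read as the different pairing $(1,4)$, $(2,3)$.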
317:
318: %A closely related graphical representation is the {\it mountain plot},
319: %where a list of rectangles are superimposed, each rectangle having
320: %a height proportional to the strength of the bond between $r_i$ and $r_j$ and %a width equal to the segment $i-j$ (see Figure \ref{bracketplot}). RNA seconda%ry structures without pseudoknots can be represe%nted also
321: %by tree graphs and the ones with pseudoknots with non-tree graphs. This
322: %approach its consequences from the point of view of graph theory
323: % are analyzed in \cite{RNAgraph1}.
324:
325:
326: Among several other representations (e.g. mountain diagrams
327: \cite{HH}, tree diagrams \cite{FKSS}, graphs \cite{RNAgraph1}),
328: a very general and widely used representation is the so-called {\it
329: dot plot} diagram. It is an array where a dot is placed in the row $i$
330: and column $j$ if the bases $r_i$ and $r_j$ are actually paired (see
331: Figure
332: \ref{rnadotplot}). This plot is the graphical representation of the $L
333: \times L$ {\it contact matrix} $C$ with elements
334: \begin{equation}
335: C_{ij}=
336: \left\{
337: \begin{array}{ll}
338: 1 & \mbox{if } i \mbox{ and } j \mbox{ are paired} \, ,\\
339: 0 & \mbox{ otherwise.}
340: \end{array}
341: \right.
342: \label{Contact}
343: \end{equation}
In mathematical terms, the contact matrix $C$ encodes the
involution (a permutation that is its own inverse) associated to the
given set of pairings. In fact, one can always interpret the base pairing
347: $i-j$ as a transposition of the elements $\{i,j\}$ and therefore one
348: can uniquely associate a permutation $\sigma$ to any structure by:
349: \begin{equation}
350: \sigma(i)=
351: \left\{
352: \begin{array}{ll}
353: j & \mbox{if } i \mbox{ and } j \mbox{ are paired} \, ,\\
354: i & \mbox{ otherwise.}
355: \end{array}
356: \right.
357: \end{equation}
358: For example, if the primary structure is $\{5'-CUUCAUCAGGAAAUGAC-3'
359: \}$ and the pseudoknotted secondary structure is: $.(((.[[[)))..]]].$,
360: one can associate to it the permutation:
361: \begin{equation}
362: \sigma=\left(
363: \begin{array}{ccccccccccccccccc}
364: .&(&(&(&.&[&[&[&)&)&)&.&.&]&]&]&.\\
365: 1& 2 &3 &4 &5 &6 &7 &8 &9 &10& 11& 12& 13& 14& 15& 16& 17 \\
366: 1&11 &10&9&5 &16&15&14 &4 &3 &2 & 12& 13& 8 & 7 & 6 & 17
367: \end{array}
368: \right) \, ,
369: \end{equation}
370: which is also an involution since $\sigma^2$ is the identity
371: permutation. The matrix representation of $\sigma$ is the matrix $D$
with $D_{i, \sigma(i)}=1$ and $0$ otherwise. Since $\sigma$ leaves the
unpaired bases fixed, $D=C+{\cal I}_u$, where ${\cal I}_u$ is the
$L \times L$ diagonal matrix with entries $1$ at the unpaired positions
and $0$ elsewhere. This notation is
very useful for numerical implementations of the algorithm we propose
in Section \ref{montesection}. The advantage of the dot plot diagram
is that it allows the comparison between different RNA secondary
structures simply by superimposition, as is necessary for
comparative analysis. Moreover it can be used for representing
RNA structures with any kind of pseudoknots.
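
As a small illustration of these definitions, consider a toy structure
of length $L=4$ in which only the bases $1$ and $4$ are paired
(bracket notation {\tt (..)}; this example is chosen here for its
size only and ignores any steric constraint on the loop length). Then
\[
C=\left(
\begin{array}{cccc}
0&0&0&1\\
0&0&0&0\\
0&0&0&0\\
1&0&0&0
\end{array}
\right) , \qquad
\sigma=\left(
\begin{array}{cccc}
1&2&3&4\\
4&2&3&1
\end{array}
\right) , \qquad
D=\left(
\begin{array}{cccc}
0&0&0&1\\
0&1&0&0\\
0&0&1&0\\
1&0&0&0
\end{array}
\right) .
\]
The matrix $C$ is symmetric with a vanishing diagonal, and one checks
directly that $D^2$ is the identity matrix, which is the matrix form
of $\sigma^2$ being the identity permutation.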
\begin{figure}[t]
381: \centerline{
382: %\epsfig{figure=data/rnaplot.ps,width=15pc}
383: \epsfig{figure=data/dotplot2.eps,width=30pc}
384: %\put(5,0){$\bar{m}$}
385: }
386: \caption{Representation of an RNA secondary structure with an ``H'' pseudoknot (left), and the corresponding dot plot diagram (right).}
387: \label{rnadotplot}
388: \end{figure}
389:
390: A representation which is completely equivalent to the dot plot
391: diagram is the {\it disk diagram} (also called {\it circle plot} or
392: {\it circular plot}). In this case the RNA sequence is represented as
393: an oriented circle (from 5' to 3') by virtually linking the first
394: nucleotide to the last one. Each base pairing is represented as an arc
395: inside the circle, connecting the two paired bases. Figure
396: \ref{circleplot} shows a typical disk diagram. In this representation
397: secondary structures without pseudoknots are purely planar diagrams,
398: i.e. diagrams that can be drawn without crossing arcs, whereas
399: pseudoknots correspond to structures which are not planar.
\begin{figure}[ht]
401: \centerline{
402: \epsfig{figure=data/circles.eps,width=15pc}
403: }
404: \caption{
405: Typical disk (circle) diagram representation of an RNA secondary
406: structure without pseudoknots. The circle is anticlockwise oriented
407: from $5'$ to $3'$. Note that there are no crossing arcs.}
408: \label{circleplot}
409: \end{figure}
This fact has already been observed by E. Rivas and S.R. Eddy in
411: \cite{RE}, where they consider diagrams with arcs inside
412: {\it and} outside the disk\footnote{More precisely, they represent the
413: RNA sequence as an oriented straight line, and the pairings as arcs
414: above and below that line. This is of course equivalent to the disk
representation.}. Crossing arcs are allowed either only inside or only
outside the disk, but not both at the same time (so-called ``overlapping
pseudoknots'', see diagrams a) and b) of Figure
\ref{rivaseddyplot}). As was shown in
\cite{POZ,PTOZ}, several general classes of pseudoknots cannot be
described in such a simple way (such as diagram c) of Figure
\ref{rivaseddyplot}). It is then more convenient to draw the
arcs always inside the disk (or outside, but not both) and to consider
all the corresponding diagrams as non-planar. It is precisely
424: following this approach that the authors of \cite{OZ} found an
425: algorithm for computing pseudoknots with matrix field theory in a
426: completely general fashion. In this paper we pursue the same analysis
427: by considering the diagrams themselves and not the associated matrix
428: field theory model.\\
\begin{figure}[ht]
430: \centerline{
431: \epsfig{figure=data/eddyrivas1.eps,width=11pc}
432: \epsfig{figure=data/eddyrivas2.eps,width=9pc}
433: \epsfig{figure=data/eddyrivas3.eps,width=9pc}
434: \put(-290,-13){$a)$}
435: \put(-170,-13){$b)$}
436: \put(-55,-13){$c)$}
437: }
438: \caption{
Three kinds of disk diagrams for RNA secondary structures with
440: pseudoknots. The authors of \cite{RE} consider cases of the form a)
441: (``overlapping pseudoknots''), b) (pseudoknot present in {\it
442: Escherichia coli $\alpha$} mRNA \cite{Gluick}) but not c) (parallel
443: $\beta$-sheet protein interaction). The technique in
\cite{OZ} can deal with all three cases.}
445: \label{rivaseddyplot}
446: \end{figure}
447:
448:
449:
450:
451:
452:
453: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
454: \sect{The topological character of RNA pseudoknots}
455: \label{toposection}
456:
457: There is a very natural way for classifying the ``degree of
458: non-planarity'' of a given disk diagram, which we review here
459: briefly. It is based on a topological analysis introduced long ago by
460: Euler. We emphasize that this characterization is a well-known
461: classical result of algebraic topology
and it has already been introduced in \cite{OZ} for RNA secondary
463: structures. We repeat it here in more detail for the convenience
464: of the reader.
465:
466: As we have shown, in any disk diagram the RNA sequence is represented
467: by an oriented circle. When the circle is drawn on a sphere, its
orientation allows one to distinguish an ``inside'' and an ``outside'' of
the circle. One says that the circle is a ``boundary'' or
``puncture'' (as it can be shrunk continuously down to a single
point) on the sphere. Hence any (disk) planar
diagram can be drawn on a sphere without crossing lines, simply by
drawing all the arcs on the same side (see Figure \ref{sphereplot}). The key
observation is that the sphere is naturally partitioned into several
475: parts by the diagram. As explained in \cite{OZ} it is useful to draw
476: the arcs with a ``double-line notation'' (see Figure \ref{sphereplot}). In
477: this way it is clear that the sphere is partitioned into several
478: polygons. Note that all the lines have an orientation induced by the
479: one of the circle.
\begin{figure}[t]
481: \centerline{
482: \epsfig{figure=data/sphere.eps,width=25pc}
483: }
\caption{Example of a planar disk diagram. In a) the disk diagram of a double hairpin (like the one in Figure \ref{bracketplot}) is drawn on a sphere. In b) the arcs are all drawn outside the circle. In c) the sphere is partitioned into 6 patches (5 faces and one ``hole'', i.e. the RNA circle). Panel d) shows the representation of c) in double-line notation (black thick arcs). Here $\#F=5$, $\#V=4$, $\#E=8$, and therefore $\chi=1$, i.e. $g=0$.}
485: \label{sphereplot}
486: \end{figure}
487:
488: The Euler characteristic $\chi$ of a diagram is defined as
489: \begin{equation}
490: \chi=\#V-\#E+\#F \, ,
491: \label{genus}
492: \end{equation}
493: where $\#V$, $\#E$ and $\#F$ are the numbers of vertices,
494: edges, and faces, respectively. A vertex is just a nucleotide, an edge is any line
495: connecting two nucleotides (either an arc joining the nucleotides, or
496: the RNA sequence) and a face is that part of the surface within a closed
497: loop of edges. Obviously, if there are $n$ arcs then $\#E=\#V+n$. A
498: famous theorem of Euler states that any polyhedron homeomorphic to a
499: sphere with a boundary (puncture) has an Euler characteristic
500: $\chi=1$. Therefore all RNA secondary structures without pseudoknots
501: are described by disk diagrams with $\chi=1$.
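
As a quick check of this statement, one can count the faces directly.
The following sketch assumes that the $n$ arcs are all drawn on the
same side of the circle and that a face is a region bounded by arcs
and by pieces of the RNA circle. Each non-crossing arc cuts one region
into two, so that $n$ non-crossing arcs produce $\#F=n+1$ faces, and
\[
\chi=\#V-\#E+\#F=\#V-(\#V+n)+(n+1)=1 \, ,
\]
independently of the sequence length and of the number of arcs.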
502:
503:
\begin{figure}[ht]
505: \centerline{
506: \epsfig{figure=data/khplot1.eps,width=12pc}
507: \epsfig{figure=data/khplot2.eps,width=10pc}
508: %\put(5,0){$\bar{m}$}
509: }
\caption{A ``kissing hairpin'' pseudoknot. The corresponding disk diagram
necessarily has crossing arcs when the arcs are all drawn inside (or all
outside) the RNA circle.}
513: \label{kisshairpinplot}
514: \end{figure}
515: Let us discuss the case when there is a pseudoknot. For simplicity, we
516: consider a ``kissing hairpin'' pseudoknot. In this case the
517: corresponding disk diagram is not planar, and has crossing arcs (see
518: for instance Figure \ref{kisshairpinplot}). After drawing the disk
519: diagram in double line notation, and counting the number of vertices,
edges and faces, one finds $\chi=-1$ this time. This value has a
521: precise geometrical meaning. In fact, the Euler characteristic of a
522: surface (or of a manifold in general) is closely related to its {\it
523: genus} $g$, i.e. the number of ``handles'' of the surface. Namely if
524: the manifold is orientable (as the disk diagram is, since the oriented
525: circle line defines naturally an orientation of all the elements of
526: the diagram), then one has $\chi = 2 - 2g-c$ where $c$ is the number
527: of punctures ($c=1$ in the case we consider here, with only one RNA
528: strand). It follows that a kissing hairpin is represented by a disk
diagram with genus $g=1$. One then concludes that such a disk diagram
can be drawn on an oriented surface with one handle, that is a {\it
531: torus} (which is a doughnut-shaped surface formed by taking a cylinder
532: and joining the two circular ends together, see Figure \ref{torusplot}). This
533: procedure can be extended easily to cases with more complex
534: pseudoknots. For instance, the three diagrams of Figure
535: \ref{rivaseddyplot} have genus $g=2$, $g=1$ and $g=2$,
536: respectively. In Figure \ref{eightplot} there is a graphical
537: representation of all 8 types of irreducible pseudoknots with genus
538: $g=1$ (from \cite{POZ}) and in Figure \ref{higherplot}
there are some examples of pseudoknots with a higher genus.\\
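
For practical purposes the two relations for $\chi$ given above can be
combined into a single formula for the genus. The following sketch
assumes a single strand ($c=1$) and uses $\#E=\#V+n$ for a diagram
with $n$ arcs, so that $\chi=\#F-n$ and
\[
g=\frac{1-\chi}{2}=\frac{1+n-\#F}{2} \, .
\]
For the kissing hairpin drawn in Figure \ref{torusplot}, $n=10$ and
$\#F=9$ give $g=1$. The simplest crossing configuration, two arcs
joining the bases $(1,3)$ and $(2,4)$ of a four-base circle, has a
single face ($n=2$, $\#F=1$), and therefore also $g=1$.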
\begin{figure}[ht]
541: \centerline{
542: \epsfig{figure=data/torus.eps,width=20pc}
543: }
544: \caption{The ``kissing hairpins'' of Figure \ref{kisshairpinplot} can be drawn on a torus without intersections. In this example $\#F=9$, $\#V=20$, $\#E=20+10$, and therefore $\chi=-1$, i.e. $g=1$.}
545: \label{torusplot}
546: \end{figure}
547:
\begin{figure}[ht]
549: \centerline{
550: \epsfig{figure=data/eightloop.eps,width=30pc}
551: %\put(5,0){$\bar{m}$}
552: }
553: \caption{List of all eight irreducible diagrams with genus $g=1$
554: (from \cite{PTOZ}) and their representation with double line notation, on the left column and right column, respectively.}
555: \label{eightplot}
556: \end{figure}
\begin{figure}[ht]
558: \centerline{
559: \epsfig{figure=data/highergenus.eps,width=30pc}
560: }
\caption{Examples of RNA pseudoknots with higher genus. The first two plots correspond to the diagrams a) and c) of Figure \ref{rivaseddyplot} with genus $g=2$. The third plot is an example with genus $g=3$.}
562: \label{higherplot}
563: \end{figure}
564: Thus we have a simple way to classify pseudoknots. This classification
565: corresponds exactly to the series expansion of the partition function
566: of the matrix model proposed in \cite{OZ}. There, the series is in
567: powers of the form $N^{-2g}$, where $N$ is the size of the matrix, and
568: $g$ is the genus of the corresponding set of diagrams. In the next
569: section, we will exploit the same idea and show how one can
570: control the topological character of pseudoknots in a statistical
571: mechanical model for RNA secondary structures with pseudoknots.
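
Schematically, and up to an overall normalization, such an expansion
can be written as
\[
{\cal Z}(N)=\sum_{g\ge 0} N^{-2g}\, {\cal Z}_g
={\cal Z}_0+\frac{{\cal Z}_1}{N^2}+\frac{{\cal Z}_2}{N^4}+\ldots \, ,
\]
where ${\cal Z}_g$ (a notation assumed here for illustration, not
taken from \cite{OZ}) denotes the sum over all diagrams of genus
$g$. The term ${\cal Z}_0$ collects the planar secondary structures,
${\cal Z}_1$ the pseudoknots that can be drawn on a torus, and so on,
so that for large $N$ the diagrams of higher genus are increasingly
suppressed.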
572:
573:
574:
575:
576:
577: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
578: \sect{Statistical mechanics model of RNA structures with pseudoknots}
579: \label{statsection}
580:
581: In almost all the energy models for RNA which have been proposed in
recent years, the thermodynamical properties of a single-stranded RNA
583: are studied by means of a partition function of the form
584: \begin{eqnarray}
585: {\cal Z}_{RNA}&=&\int \prod_{k=1}^L \, d \mathbf
586: {r}_k \,\, \sum_{C_{ij}} f(\{ \mathbf{r}\})
587: e^{-\frac{1}{k_B T} U(C_{ij}, \{ \mathbf{r} \} )}
\sim\sum_{C_{ij}} \omega(C) e^{-\frac{1}{k_B T} E(C) } \nonumber \\
589: &=&
590: \sum_{C_{ij}} e^{-\frac{1}{k_B T}\left[ E(C)-T S(T,C) \right]} \, ,
591: \label{ZRNA}
592: \end{eqnarray}
593: where $T$ is the temperature, $k_B$ is the Boltzmann constant,
594: $\mathbf{r}_k$ is the three-dimensional position vector of the $k$-th
595: nucleotide, $f(\{ \mathbf{r}\})$ takes into account the geometry and
596: the constraints of the chain of nucleotides, the function $U$ takes
597: into account the energetics coming from the pairing and stacking of
598: base pairs, and the sum over $C_{ij}$ is the sum over all possible
599: contact matrices for a given primary structure. The function
600: $\omega(C)$ is proportional to the number of configurations having the
601: same contact matrix $C$, and therefore its logarithm is just the
602: entropy factor associated to the polymeric nature of the
603: sugar-phosphate backbone. The free energy of a given configuration
${\cal F}(C) \equiv E(C)-T S(T,C)$ is the sum of several
contributions, both of energetic ($E(C)$) and entropic nature
($S(T,C)$): Watson-Crick and wobble base pairs, stacking energies,
607: terminal mismatches and dangling energies, special triloops and
608: tetraloops, entropy contributions (internal loops, bulges, hairpin
609: loops),
610: %usually they are modeled from self-avoiding polymers theory
611: penalty factors for terminal-AU in helices, for asymmetries etc. All
612: these terms have been determined empirically, and they are called
613: ``Turner energy rules'' \cite{Tur}. For more details see
\cite{ZTM}. When pseudoknots are excluded, the sum in eq. (\ref{ZRNA}) is
restricted to contact matrices that correspond to planar diagrams
only. As we mentioned already in the introduction, the partition
function ${\cal Z}_{RNA}$ without pseudoknots can be calculated
efficiently by deterministic algorithms (dynamic programming)
\cite{caskill}: the most popular ones are perhaps the ``mfold
package'' by M.~Zuker et al. \cite{z2003,mathews} and the ``RNA
Vienna package'' by I.~Hofacker et al. \cite{ivovienna}\footnote{They
are available on-line at \texttt{www.bioinfo.rpi.edu/applications/mfold/}
and \texttt{www.tbi.univie.ac.at/}, respectively.}. When pseudoknots are
included, the sum in eq. (\ref{ZRNA}) is unrestricted and, as we
described in the previous Section, this leads to topology
fluctuations. This situation is also very common in other areas of
628: Physics (e.g. dynamical triangulations \cite{Mau}, random surfaces or
629: quantum gravity \cite{Amb}, quantum field theory \cite{thooft}), and
630: there are now standard ways to deal with it. The idea is to introduce
631: an additional parameter $\mu$, which is a topological ``chemical
632: potential'', and to consider the partition function:
633: \begin{equation}
634: {\cal Z}_{RNA}(\mu)= \sum_{C_{ij}}
635: e^{-\frac{1}{k_B T}\left[ E(C)-T S(T,C)+\mu g(C) \right]} \, ,
636: \label{ourZRNA}
637: \end{equation}
638: where $g(C)$ is the genus of the configuration associated to the contact
639: matrix $C$. The ``chemical potential'' $\mu$ allows a simple control over the
640: topological character of the pseudoknots in the statistical ensemble
641: at thermal equilibrium. It is also directly related to $N$ (the size
642: of the matrix) in the matrix model formulation of \cite{OZ}:
643: \[
644: {\cal Z}_{Matrix} \sim 1+\frac{Z_1}{N^2}+\frac{Z_2}{N^4}+\ldots \, ,
645: \]
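with $\mu=2 k_B T \log(N)$, so that each unit of genus is weighted by
$e^{-\mu/(k_B T)}=1/N^2$ and the planar limit $N \to \infty$ corresponds
to $\mu \to \infty$. The advantage here is that the energy
function $E(C)$ can be more realistic than the one in \cite{OZ}.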
648:
649: The model without chemical potential, i.e. $\mu=0$, corresponds to the
650: case where there are no restrictions on the possible fluctuations of
651: the topology. On the other hand when $\mu$ is very large, all the
652: configurations with $g>0$ are suppressed by the Boltzmann weight, and
653: in this case one recovers the planar limit (i.e. RNA secondary
structures without pseudoknots). One might then expect a phase
transition associated with the formation of pseudoknots. A natural order
parameter is the average genus of an RNA structure with pseudoknots,
which can be obtained by taking the logarithmic derivative of
the partition function
659: \begin{equation}
660: \langle g(\mu) \rangle
661: =
662: -k_B T \frac{\partial}{\partial \mu} \log {\cal Z}_{RNA}(\mu) \, .
663: \end{equation}
To our knowledge there are no experimental data available on
the dependence of the genus of RNA molecules on the
temperature. Information and input from experiments would be highly
desirable.
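
The relation between the average genus and the logarithmic derivative can be made explicit by grouping the configurations according to their genus. The shorthand ${\cal Z}_g$ used below is introduced here only for illustration; it denotes the sum of eq. (\ref{ZRNA}) restricted to contact matrices of genus $g$:
\begin{eqnarray*}
{\cal Z}_{RNA}(\mu) &=& \sum_{g \geq 0} e^{-\frac{\mu g}{k_B T}} \, {\cal Z}_g \, ,
\qquad
{\cal Z}_g \equiv \sum_{C_{ij}: \, g(C)=g}
e^{-\frac{1}{k_B T}\left[ E(C)-T S(T,C) \right]} \, , \\
-k_B T \frac{\partial}{\partial \mu} \log {\cal Z}_{RNA}(\mu) &=&
\frac{\sum_{g \geq 0} g \, e^{-\frac{\mu g}{k_B T}} \, {\cal Z}_g}
{\sum_{g \geq 0} e^{-\frac{\mu g}{k_B T}} \, {\cal Z}_g}
= \langle g(\mu) \rangle \, .
\end{eqnarray*}
In this form it is apparent that every unit of genus costs a factor $e^{-\mu/(k_B T)}$, so that $\langle g(\mu) \rangle \to 0$ when $\mu \to \infty$ at fixed $T$.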
668:
Figure \ref{phaseplot} displays the expected phase diagram of our model
in the plane $\{\mu,T\}$. At high temperature, the RNA is always
in a fully denatured phase. At lower temperature and large $\mu$ the
secondary structures without pseudoknots are the dominating
configurations. The interesting part of the diagram is for lower
values of $\mu$, where possibly $ \langle g(\mu) \rangle \; \neq 0$
and pseudoknots are present.
\begin{figure}[ht]
\centerline{
\epsfig{figure=data/phase.eps,width=15pc}
\put(-190,120){$T$}
\put(-35,-9)  {$\mu$}
\put(-95,120)  {denatured phase}
\put(-5,50)  {planar limit}
\put(-170,10)  {phase with pseudoknots}
}
685: \caption{Qualitative structure of the phase diagram in the $\{\mu,T\}$ plane.}
686: \label{phaseplot}
687: \end{figure}
688:
Even if eq. (\ref{ourZRNA}) can in principle deal with pseudoknotted
RNA molecules, it is fair to say that for any realistic energy
function the model is unlikely to be amenable to an analytic
solution. Moreover, dynamic programming approaches have been shown to
be computationally very demanding even for pseudoknots with genus
$g=1$ \cite{PTOZ}. Hence a stochastic algorithm is probably the only
feasible way of studying the model of eq. (\ref{ourZRNA}). In the next
Section we describe in detail a Monte Carlo algorithm for the
simulation of this model.
698:
699: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
700: \sect{A Monte Carlo algorithm for RNA pseudoknots prediction}
701: \label{montesection}
702:
703: %The assumption of the RNA being in thermodynamic
704: %equilibrium may well be wrong, and it certainly is in some cases
705: %(such as during the synthesis of the RNA) /cite{S.R. Morgan and P.G.Higgs, J.%Chem.Phys. {\bf 105} (1996) 7152.}. In fact this observation led several
706: %groups to develop kinetic folding algorithms for RNA secondary structures
707: %(for predicting pseudoknots \cite{hofacker}, or recostruting folding
708: %pathways
709: %analysis of kinetic folding of RNA has been introduce by Martinez(1984
710: %Mironov et al(1985) Mironov &Kister (1986), proposing montecarlo algorithms
711: %construction of secondary structures.
712:
713: A well-known method for generating a set of configurations which are
714: distributed according to a given Boltzmann weight is the Monte Carlo
method. It is a standard method of modern computational analysis and
we refer to \cite{MC,frenkel} for a review and an introduction to the
subject. In recent years, it has also been used for the prediction of
RNA secondary structures in various contexts. In particular, our
719: proposal can be thought of as a generalization of the Monte Carlo
720: method described in \cite{Monte} where the authors considered only RNA
721: secondary structures without pseudoknots. We aim to apply this method
722: to the statistical ensemble defined by eq. (\ref{ourZRNA}).
723:
724: The sum over all the RNA configurations in eq. (\ref{ourZRNA})
725: contains many terms. In general, the total number of RNA
726: configurations (planar and non planar configurations) grows like $L!$
727: for a sequence of length $L$. The number of RNA configurations with a
728: fixed genus $g$ grows exponentially with $L$: a detailed analysis
729: for the number of diagrams with genus $g=0$ (i.e. planar diagrams) can
730: be found in \cite{combinatorics}.\footnote{An analysis for structure
731: with higher genus similar to the one in \cite{combinatorics} is still
732: lacking. We expect that the matrix field theory model introduced in
733: \cite{OZ} can shed some light on this issue.}
Since the number of secondary structures on a surface with fixed genus
grows exponentially, one expects that a brute-force Monte Carlo
sampling would be rather ineffective, within a reasonable amount of
computation time, for all but very short RNA sequences. For that
reason we decided to use the standard Metropolis method \cite{Metro}. The
Metropolis method is an efficient and simple scheme for generating a
set of configurations distributed according to a given probability
function, by means of a random walk in the configuration space. In our
case, the Metropolis Monte Carlo method generates a sequence of RNA
configurations $\{C^{(0)},C^{(1)},\ldots,C^{(n)}\}$, such that
$\lim_{n \to \infty} n_{C}/n=P(C)$, where $P(C)$ is the given
probability distribution (e.g., the Boltzmann distribution $P(C) =
{\cal Z}^{-1} \exp[-\left( E-T S+\mu g \right) / (k_B T) ]$) and $n_C$ is
the number of configurations of type $C$ in the generated
sequence. Each element $C^{(k)}$ of the sequence is generated by
accepting or rejecting a random trial configuration. In the following we
give a complete description of the Metropolis Monte Carlo algorithm
for the prediction of RNA pseudoknots:
753:
754: \begin{itemize}
755:
\item Step 1: Pick an initial configuration $C^{(0)}$.
A simple initial configuration is the fully denatured state of the
RNA, i.e. the contact matrix is the matrix with all zero entries and
the corresponding permutation involution is the identity permutation
760: $\sigma=\left(
761: \begin{array}{cccc}
762: 1&2&\ldots&L\\
763: 1&2&\ldots&L
764: \end{array}
765: \right)$. Set the variable $n=1$.
766:
767:
768: \item Step 2: Pick a trial configuration $C^{(n)}$ (by deforming
769: the configuration $C^{(n-1)}$). Such an operation is called ``Monte
770: Carlo move'' $C^{(n-1)} \to C^{(n)}$. Compute the probability ratio
771: \begin{equation}
772: \rho=\frac{P(C^{(n)})}{P(C^{(n-1)})} \, .
773: \label{rho}
774: \end{equation}
Pick a random number $x$ uniformly distributed between 0 and 1. If $x \leq \rho$
accept the configuration $C^{(n)}$ as the new configuration.
Otherwise reject it and keep $C^{(n-1)}$ as the new configuration,
i.e. put $C^{(n)}=C^{(n-1)}$. Increase the variable $n$ by one.
779:
780:
\item Step 3: Repeat Step 2 $n_{max}$ times, where $n_{max}$ is a
sufficiently large number.
783:
784: \end{itemize}
785:
786:
787: The most relevant aspect of this method is that, at large $n_{max}$,
788: one can generate an ensemble of configurations with the probability
789: distribution $P(C) = {\cal Z}^{-1} \exp[-(E(C)-T S(T,C) +\mu
790: g(C))/(k_B T)]$, simply by computing probability ratios. Therefore,
791: this method is extremely useful as it avoids the need of computing the
792: partition function ${\cal Z}$ of the system, a computational task that would be
793: surely intractable for long RNA sequences.
\begin{figure}[ht]
\centerline{
\epsfig{figure=data/energy.eps,width=15pc}
\put(-90,85)  {$P=1$}
\put(-70,125)  {$P=e^{-\Delta E/k_B T}$}
}
\caption{The Metropolis algorithm accepts a configuration with lower energy
with probability $P=1$. It can also accept a configuration with higher
energy, with probability $P=e^{-\Delta E/k_B T}$, where $\Delta E$ is
the energy difference.}
804: \label{energyplot}
805: \end{figure}
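
Steps 1 to 3 above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code used for our simulations: the names \texttt{metropolis\_chain}, \texttt{propose} and \texttt{reduced\_free\_energy} are hypothetical, the proposal is assumed to be symmetric, and the RNA configurations are replaced by a toy two-state system whose exact distribution is known.

\begin{verbatim}
```python
import math
import random


def metropolis_chain(initial, propose, reduced_free_energy, n_max, rng):
    """Return the list [C^(0), ..., C^(n_max)] produced by Steps 1-3.

    propose(C, rng) must return a trial configuration drawn with a
    symmetric proposal probability.  reduced_free_energy(C) stands for
    (E - T*S + mu*g) / (k_B*T), so that P(C) is proportional to
    exp(-reduced_free_energy(C)).
    """
    chain = [initial]
    current = initial
    f_current = reduced_free_energy(current)
    for _ in range(n_max):
        trial = propose(current, rng)
        f_trial = reduced_free_energy(trial)
        # rho = P(trial) / P(current); capping the exponent at 0 leaves the
        # acceptance test unchanged (x <= 1 always) and avoids overflow.
        rho = math.exp(min(0.0, f_current - f_trial))
        if rng.random() <= rho:
            current, f_current = trial, f_trial
        chain.append(current)  # a rejected trial repeats the old state
    return chain


# Toy check on a two-state system with P(1) / P(0) = 1/3 (hypothetical).
toy_f = {0: 0.0, 1: math.log(3.0)}
toy_chain = metropolis_chain(
    initial=0,
    propose=lambda c, rng: 1 - c,
    reduced_free_energy=lambda c: toy_f[c],
    n_max=20000,
    rng=random.Random(12345),
)
toy_frequency = sum(toy_chain) / len(toy_chain)
```
\end{verbatim}

For the toy system the exact probability of state 1 is $1/4$, and \texttt{toy\_frequency} should approach this value for long chains. Only ratios of probabilities enter the loop, so the partition function is never needed.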
806:
807:
808: \subsection{Configurational changes (Monte Carlo moves)}
809:
At large $n_{max}$, the above algorithm is guaranteed to generate a
set of configurations with the probability distribution $P(C)$, under
a few assumptions. Two essential requirements are that the
Monte Carlo moves be {\it ergodic} and satisfy the so-called
{\it detailed balance condition}. Ergodicity essentially means that every
point in the configuration space can be reached in a finite number of
Monte Carlo steps from any other point. With the acceptance rule of
Step 2, the detailed balance condition
simply requires the Monte Carlo moves to be symmetric, i.e. the
probability of proposing a Monte Carlo move $C \to C'$ is the same as
that of proposing the move $C' \to C$.
820:
We now describe the Monte Carlo moves for
RNA folding. First, at the beginning of the
simulation, it is useful to do some book-keeping by storing in memory the list of
all the allowed base pairs (i.e. only those of the type
A$\bullet$U, C$\bullet$G or G$\bullet$U). Such information can be
stored in $L$ vectors $l_i$, $i=1,\ldots,L$, as follows: the
nucleotide in position $i$ can be paired to $n_i$
other nucleotides, namely the ones in positions $l_i(1),
l_i(2),\ldots,l_i(n_i)$, and to nothing else. For example, if the primary
structure is $\{AGCU\}$ then we have:
831: \begin{equation}
832: l_1=[4] \, , \quad l_2=[3,4] \, , \quad l_3=[2] \, , \quad l_4=[1,2]
833: \, .
834: \end{equation}
The creation of such a list of possible base pairs does not slow down
the overall algorithm since it is an $O(L^2)$ operation which is done
only once. Now we want to extract one element from the list of $L$
vectors with uniform probability. This can be done as follows. Let $T
\equiv \sum_h n_h$ (not to be confused with the temperature), and pick a
uniform random integer $\tau$
with $1 \leq \tau \leq T$. Then take the smallest integer $i$
such that $\sum_{h=1}^{i} n_h \geq \tau$, and define
$y\equiv\tau-\sum_{h=1}^{i-1} n_h$. Obviously $1 \leq y \leq n_i$ holds
true. Consider the pair of bases $i$ and $j \equiv l_i(y)$. In the
example above $T=6$, and $\tau=3$ gives $i=2$, $y=2$, i.e. the pair
$2-4$. The
base pair $i-j$ has been extracted randomly with uniform probability
from the set of all possible base pairs for the given RNA sequence. The
Monte Carlo move $C\to C'$ is then generated as follows:
847: \begin{itemize}
848:
849: \item If the configuration $C$ is such that both the base in $i$ and in $j$ are
850: free (i.e. $\sigma_{C}(i)=i$ and $\sigma_{C}(j)=j$) then add the link
851: $i-j$ (i.e. put $\sigma_{C'}(i)=j$ and $\sigma_{C'}(j)=i$). We call
852: this Monte Carlo move ``add a base pair'' (see case 1 of figure
853: \ref{MCmovesplot}).
854:
\item If the configuration $C$ is such that there is an arc between $i$ and $j$
(i.e. $\sigma_{C}(i)=j$) then remove the link $i-j$ (i.e. put
857: $\sigma_{C'}(i)=i$ and $\sigma_{C'}(j)=j$). We call this Monte Carlo
858: move ``remove a base pair'' (see case 2 of figure \ref{MCmovesplot}).
859:
\item If the configuration $C$ is such that exactly one of the bases $i$ and
$j$ is linked to some other base $k$ (i.e. $\sigma_{C}(i)=i$ and
$\sigma_{C}(j)=k \neq j$, or $\sigma_{C}(j)=j$ and $\sigma_{C}(i)=k \neq i$)
then move the link to $i-j$, overriding the former link and leaving $k$ free
(i.e. put $\sigma_{C'}(i)=j$, $\sigma_{C'}(j)=i$ and $\sigma_{C'}(k)=k$). We call this
Monte Carlo move ``shift a base pair'' (see cases 3 and 4 of figure
\ref{MCmovesplot}).
867:
\item If the configuration $C$ is such that the base $i$ is linked to another
base $k_1 \neq j$ and $j$ is linked to another base $k_2 \neq i$, and the base pair
$k_1-k_2$ is allowed, then swap the links (i.e. put
$\sigma_{C'}(i)=j$, $\sigma_{C'}(j)=i$, $\sigma_{C'}(k_1)=k_2$,
$\sigma_{C'}(k_2)=k_1$). We call this Monte Carlo move ``swap a base
pair'' (see cases 5 and 6 of figure
\ref{MCmovesplot}).
875:
876: \item If none of the above cases applies then do not update the
877: configuration, i.e. put $C'=C$.
878:
879: \end{itemize}
880: See Figure \ref{MCmovesplot} for a summary of these Monte Carlo moves, and Figure \ref{confplot} for a simple example.\\
\begin{figure}[ht]
\centerline{
\epsfig{figure=data/MCmoves.eps,width=40pc}
}
\caption{Monte Carlo moves for an allowed base pair $i-j$ of an
RNA secondary structure with pseudoknots. Move $1)$ adds a
base pair. Move $2)$ removes a base pair. Moves $3)$ and $4)$
shift a base pair. Moves $5)$ and $6)$ swap two base pairs, when
possible.}
890: \label{MCmovesplot}
891: \end{figure}
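
The book-keeping of the allowed base pairs and the uniform extraction of one of them can be sketched as follows. The function names are hypothetical and the sketch assumes, as in the $\{AGCU\}$ example, that no constraint on the minimal distance between paired bases is imposed. The extraction selects the first list $l_i$ whose cumulative size reaches $\tau$ and then the entry at offset $y$ inside it.

\begin{verbatim}
```python
import random

# Allowed pairs: Watson-Crick (A-U, C-G) and wobble (G-U).
ALLOWED = {("A", "U"), ("U", "A"), ("C", "G"),
           ("G", "C"), ("G", "U"), ("U", "G")}


def build_pair_lists(sequence):
    """Return {i: l_i} with 1-based positions; an O(L^2) operation."""
    L = len(sequence)
    return {
        i: [j for j in range(1, L + 1)
            if j != i and (sequence[i - 1], sequence[j - 1]) in ALLOWED]
        for i in range(1, L + 1)
    }


def pick_pair(lists, tau):
    """Map an integer 1 <= tau <= T onto the tau-th entry (i, l_i(y))."""
    offset = 0  # number of entries in l_1, ..., l_{i-1}
    for i in sorted(lists):
        n_i = len(lists[i])
        if tau <= offset + n_i:
            y = tau - offset  # 1 <= y <= n_i
            return i, lists[i][y - 1]
        offset += n_i
    raise ValueError("tau must satisfy 1 <= tau <= T")


def random_pair(lists, rng):
    """Extract one entry of the lists with uniform probability."""
    total = sum(len(l_i) for l_i in lists.values())
    return pick_pair(lists, rng.randint(1, total))


lists = build_pair_lists("AGCU")
all_picks = [pick_pair(lists, tau) for tau in range(1, 7)]
```
\end{verbatim}

Each unordered base pair appears twice in the lists, once as $(i,j)$ and once as $(j,i)$, so it is selected with probability $2/T$.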
892:
893:
These Monte Carlo moves satisfy the detailed balance
condition. In fact the probability of proposing the creation of a link
between $i$ and $j$ when at least one of them is link-free, or of
proposing the removal of the link when they are already linked, is
always $P_{ij}=2/T$ (the pair appears twice in the lists, once in $l_i$
and once in $l_j$),
and thus it is symmetric. In the case where $i$ and $j$ are already
linked to different bases $\sigma(i)$ and $\sigma(j)$, a link is
put between $i$ and $j$ only if $\sigma(i)$ can be connected to
$\sigma(j)$ as well. In this case the reverse move also occurs with
the same probability rate, thus it is a symmetric move. Moreover, the set of
Monte Carlo moves is ergodic. The key
observation is that such moves correspond to transpositions in the
space of permutation involutions. Since any configuration of an RNA
secondary structure with pseudoknots can be uniquely represented by a
permutation involution, and since any permutation can be obtained by a
suitable finite sequence of transpositions
\cite{knuth3}, it follows that any RNA secondary structure with pseudoknots can be
generated with a finite number of the Monte Carlo moves described
above.
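
For a short sequence, both properties can be checked by brute force. The following sketch (hypothetical function names; configurations are stored as tuples representing the permutation involution $\sigma$, with an unused entry at index 0) applies the add, remove, shift and swap moves, explores every configuration reachable from the open chain, and counts how many of the $T$ ordered pairs lead from each configuration to each other one.

\begin{verbatim}
```python
from collections import Counter, deque

ALLOWED = {("A", "U"), ("U", "A"), ("C", "G"),
           ("G", "C"), ("G", "U"), ("U", "G")}


def allowed_pairs(sequence):
    """All ordered pairs (i, j), 1-based, that may form a base pair."""
    L = len(sequence)
    return [(i, j) for i in range(1, L + 1) for j in range(1, L + 1)
            if i != j and (sequence[i - 1], sequence[j - 1]) in ALLOWED]


def apply_move(sigma, i, j, pairs):
    """Return the involution after the move selected by the pair (i, j).

    sigma[k] is the partner of base k (sigma[k] == k if k is free).
    """
    s = list(sigma)
    ki, kj = s[i], s[j]
    if ki == i and kj == j:            # add a base pair
        s[i], s[j] = j, i
    elif ki == j:                      # remove a base pair
        s[i], s[j] = i, j
    elif ki == i or kj == j:           # shift a base pair
        k = kj if ki == i else ki      # former partner, left free
        s[k] = k
        s[i], s[j] = j, i
    elif (ki, kj) in pairs:            # swap two base pairs
        s[i], s[j] = j, i
        s[ki], s[kj] = kj, ki
    return tuple(s)                    # unchanged if no case applies


def proposal_counts(sequence):
    """Explore all configurations reachable from the open chain."""
    pairs = allowed_pairs(sequence)
    pair_set = set(pairs)
    start = tuple(range(len(sequence) + 1))
    counts = {}
    queue = deque([start])
    while queue:
        c = queue.popleft()
        if c in counts:
            continue
        counts[c] = Counter(apply_move(c, i, j, pair_set)
                            for i, j in pairs)
        queue.extend(counts[c])
    return counts


counts = proposal_counts("AUAU")
symmetric = all(counts[c][d] == counts[d][c]
                for c in counts for d in counts[c])
```
\end{verbatim}

For the sequence $\{A,U,A,U\}$ the exploration finds the seven configurations compatible with the allowed base pairs, and the number of proposals $C \to C'$ equals the number of proposals $C' \to C$ for every pair of configurations.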
912:
\begin{figure}[ht]
914: \centerline{
915: \epsfig{figure=data/configuration.eps,width=25pc}
916: %\put(5,0){$\bar{m}$}
917: }
918: \caption{Space of the configurations for the sequence $\{A,U,A,U\}$. The
919: arrows indicate the Monte Carlo moves and their probability rate.}
920: \label{confplot}
921: \end{figure}
922:
923:
924:
A few comments are in order. First of all, other sets of Monte Carlo
moves are of course possible. Several authors introduced collective
moves, where several links are updated at the same time (as opposed to
one by one as we propose). The advantages are a general speed-up of
the computation, and a more effective simulation as far as overcoming
energy barriers is concerned. In the present work, we prefer to keep the code
as simple as possible by using a set of ``local'' moves, and to focus
on testing its effectiveness when dealing with RNA pseudoknots.
933: Second, both the generation of the Monte Carlo moves and the
934: Metropolis method require a good (pseudo)random number generator, in
935: order to avoid biases in the output which may be very difficult to
936: detect. For a good introduction to random number generators we refer
937: the reader to
938: \cite{knuth2} and \cite{NumRecipes}.
939: Finally, as in all stochastic algorithms, one has to be able to
940: estimate the statistical errors of the Monte Carlo prediction. As
941: this method generates an ensemble of configurations
942: $\{C^{(0)},C^{(1)},\ldots, C^{(n_{max})}\}$, distributed according to
943: the probability distribution $P(C)$, then one can compute ensemble
944: averages of any quantity ${\cal A}(C)$ simply by:
945: \begin{equation}
946: \langle {\cal A} \rangle =\frac{1}{n_{max}}\sum_{i=1}^{n_{max}} {\cal A}(C^{(i)}) \, .
947: \end{equation}
948:
949: The error associated to this observable scales like $1/\sqrt{N}$ where
950: $N$ is the number of independent measurements. It is important to
951: note that $N$ is not usually equal to $n_{max}$ since in general the
952: configurations generated by any Monte Carlo algorithm are
953: correlated. One can deal with this issue in two ways. One can compute
954: the autocorrelation length $\xi$ of the sequence of the RNA
955: configurations generated by the Monte Carlo algorithm and then
956: subsample the same set of configurations, keeping one configuration
957: every $\xi$ and skipping all the configurations in between
\cite{Geyer92}. Another possibility is to keep all the
configurations of the sequence, and compute the statistical error by
taking into account the existence of correlations (the error is then
usually larger than the naive standard error computed as if the data were independent). A
962: well-known technique for computing the statistical error of a set of
963: correlated data is the so-called {\it jackknife method}. For an
964: introduction to this method, we refer the reader to
965: \cite{jackknife}. There are also other techniques which can be found in
966: \cite{Liu}.
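
As an illustration of the second option, a blocked jackknife estimate of the error on an ensemble average can be sketched as follows. The function name and the choice of the block size are assumptions of this sketch: the block size should be taken larger than the autocorrelation length, which is not estimated here.

\begin{verbatim}
```python
import math


def jackknife_mean_error(samples, block_size):
    """Blocked jackknife estimate of the mean of samples and its error."""
    n_blocks = len(samples) // block_size
    if n_blocks < 2:
        raise ValueError("at least two blocks are needed")
    block_sums = [sum(samples[b * block_size:(b + 1) * block_size])
                  for b in range(n_blocks)]
    total = sum(block_sums)
    n_used = n_blocks * block_size  # trailing samples are discarded
    # Mean recomputed with one block left out at a time.
    leave_one_out = [(total - s) / (n_used - block_size)
                     for s in block_sums]
    centre = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (m - centre) ** 2 for m in leave_one_out)
    return total / n_used, math.sqrt(variance)


mean, error = jackknife_mean_error([1.0, 2.0, 3.0, 4.0], block_size=1)
```
\end{verbatim}

With a block size of one the result reduces to the usual standard error of the mean of independent data; larger blocks absorb the correlations between successive configurations.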
967:
968: \subsection{Energy update}
969: According to the Metropolis method, the ratio $\rho$ (Step 2 of the
970: algorithm, eq. (\ref{rho})) is given by
971: \begin{equation}
972: \rho=\exp\left[-\frac{1}{k_B T}(\Delta E-T \Delta S+\mu \Delta g )\right] \, ,
973: \end{equation}
974: where
975: \begin{eqnarray}
976: \Delta E&=&E(C^{(n)})-E(C^{(n-1)}) \nonumber \, ,\\
977: \Delta S&=&S(C^{(n)})-S(C^{(n-1)}) \nonumber \, , \\
978: \Delta g&=&g(C^{(n)})-g(C^{(n-1)}) \, . \nonumber
979: \end{eqnarray}
980: Since the Monte Carlo moves are local (i.e. they involve only a small
981: part of the RNA sequence) the computation of $\Delta E$, $\Delta S$
982: and $\Delta g$ is usually easier and faster than computing the full
983: functions $E(C)$, $S(C)$ and $g(C)$.
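
A minimal sketch of the corresponding acceptance test is given below. The units are an assumption of this sketch (energies in kcal/mol, entropies in kcal/(mol K), temperature in kelvin, with the gas constant playing the role of $k_B$), and the function names are hypothetical.

\begin{verbatim}
```python
import math

# Assumption: energies in kcal/mol, entropies in kcal/(mol K), T in kelvin.
K_B = 0.0019872  # gas constant in kcal/(mol K), playing the role of k_B


def metropolis_ratio(delta_e, delta_s, delta_g, temperature, mu):
    """rho = exp[-(dE - T dS + mu dg) / (k_B T)] for a trial move."""
    delta_f = delta_e - temperature * delta_s + mu * delta_g
    return math.exp(-delta_f / (K_B * temperature))


def accept(rho, x):
    """Step 2 of the algorithm: accept when the random number x <= rho."""
    return x <= rho


T_BODY = 310.15                       # 37 degrees Celsius
mu_half = K_B * T_BODY * math.log(2)  # each handle halves the weight
```
\end{verbatim}

With the value \texttt{mu\_half}, a move that raises the genus by one at constant $E$ and $S$ has $\rho=1/2$, while the reverse move has $\rho=2$ and is always accepted.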
984:
We consider first $\Delta g$, and we provide an efficient algorithm
for computing it. According to eq. (\ref{genus}), the genus is given
by $g=(1-\#V+\#E-\#F)/2=
(1-L+(L+n_{arcs})-n_{loops})/2=(1+n_{arcs}-n_{loops})/2$.
Since the constant term cancels in the difference, we have:
\begin{equation}
\Delta g=\frac{\Delta n_{arcs}-\Delta n_{loops}}{2} \, ,
\end{equation}
994: where
995: \begin{equation}
996: \Delta n_{arcs}=
997: \left\{
998: \begin{array}{ll}
-1& \mbox{ for a ``remove the base pair $i-j$'' move,}\\
1 & \mbox{ for an ``add the base pair $i-j$'' move,}\\
0 & \mbox{ for a ``shift or swap the base pair $i-j$'' move.}
1002: \end{array}
1003: \right.
1004: \end{equation}
1005: The difference $\Delta n_{loops}$ can be computed by considering the
1006: loops containing the bases $i$ and $j$. In principle there are 4
1007: possible independent loops, two about $i$ and two about $j$ (see
1008: Figure \ref{loopsplot}). The connectivity of the RNA molecule can be
1009: such that the four loops are not independent. In Appendix
1010: \ref{appendix} we describe an algorithm for computing the actual number of
independent loops. It is then sufficient to run the algorithm on
$C^{(n)}$ and on $C^{(n-1)}$ and take the difference of the two loop counts, that is
$\Delta n_{loops}$.
\begin{figure}[ht]
\centerline{
\epsfig{figure=data/loops.eps,width=20pc}
}
\caption{Two given bases $i$ and $j$ usually belong to $4$ loops, when the arcs are drawn in the double-line notation. The loops
are not always independent. Two examples are shown: a case with
$2$ independent loops (top), and a case with only one
independent loop (bottom).}
1022: \label{loopsplot}
1023: \end{figure}
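
As a simple cross-check of the genus variation, one can also count all the loops of a configuration rather than only those adjacent to $i$ and $j$. The sketch below is such an alternative, not the algorithm of Appendix \ref{appendix}: it follows the arc and then the backbone from every base and counts the closed walks, which also takes $O(L)$ time. The function names and the list representation of $\sigma$ (with an unused entry at index 0) are assumptions of this sketch.

\begin{verbatim}
```python
def count_loops(sigma):
    """Number of loops, i.e. cycles of the map k -> shift(sigma(k)).

    sigma[k] is the partner of base k (sigma[k] == k for a free base),
    with 1-based positions and sigma[0] an unused placeholder.
    """
    L = len(sigma) - 1
    visited = [False] * (L + 1)
    n_loops = 0
    for start in range(1, L + 1):
        if visited[start]:
            continue
        n_loops += 1
        pos = start
        while not visited[pos]:
            visited[pos] = True
            pos = sigma[pos] % L + 1  # follow the arc, then the backbone
    return n_loops


def genus(sigma):
    """g = (1 + n_arcs - n_loops) / 2 for the configuration sigma."""
    L = len(sigma) - 1
    n_arcs = sum(1 for k in range(1, L + 1) if sigma[k] > k)
    return (1 + n_arcs - count_loops(sigma)) // 2


def delta_genus(sigma_old, sigma_new):
    """Genus variation of a move, from two full O(L) loop counts."""
    return genus(sigma_new) - genus(sigma_old)


open_chain = [0, 1, 2, 3, 4]
nested = [0, 4, 3, 2, 1]      # pairs 1-4 and 2-3
crossing = [0, 3, 4, 1, 2]    # pairs 1-3 and 2-4
```
\end{verbatim}

The open chain and the nested configuration have genus $0$, while the two crossing arcs have genus $1$. Because every loop is counted, the same functions can be applied to shift and swap moves, where a third base also changes its partner.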
1024:
The calculation of $\Delta E$ is more problematic, since there is
not yet an RNA energy model valid for an arbitrary topology. The most studied
and best-defined model, both from a theoretical and an experimental point
of view, is for the spherical topology (i.e. genus$=0$, that is RNA
secondary structure without pseudoknots) \cite{ZukEn}. The set of
1030: empirical ``Turner'' energy rules can be generalized in order to
1031: describe some simple class of pseudoknots \cite{gultyaev} but the
1032: general case for {\it any} topology is still lacking. For $\Delta S$
1033: the situation is slightly better, as one can model the configurational
1034: entropy of the RNA structure by using the theory of polymers (as
1035: already presented in \cite{isa1} or \cite{gultyaev}) and the inclusion
1036: of pseudoknots is in principle feasible. Therefore, in our
1037: numerical simulation we will use the ``Turner'' energy model, and even
1038: if this is not quite appropriate for higher topology, we expect the
1039: corrections to be small with respect to the over-all energy scale. We
1040: remind the reader that the purpose of this paper is to propose a new
1041: approach for the study of pseudoknots formation in RNA secondary
1042: structure. Thus at this point, it is reasonable to perform a
1043: preliminary analysis based on an approximate energy model.
When a more complete energy model (including all the
topologies) becomes available, it will be sufficient to include it in the
calculation of $\Delta E$ in our algorithm.
1047:
\subsection{The ``simulated annealing'' method}
One of the major problems with the Monte Carlo simulation of RNA
folding is that the energy landscape is usually very rough, with
metastable valleys separated by energy barriers which are ``high''
compared to the energy involved in each Monte Carlo move. This is a
general situation in thermodynamic systems with many degrees of
freedom (e.g. glasses, polymers, proteins etc.), where in addition
to the global minimum energy configuration there may be many local
minima separated by high energy barriers. The worst consequence is
that the system can be trapped for a long time in such local minima
and the Monte Carlo exploration of the energy landscape is no longer
effective. In order to overcome this problem for RNA, M.~Schmitz and
G.~Steger proposed in \cite{Monte} the use of a computational
technique called the ``simulated annealing'' method. It is a classical
method which was introduced for finding the minimum energy
configuration of a system with a very rough energy landscape
\cite{kirk}. We briefly describe the algorithm:
1065:
1066: \begin{itemize}
1067: \item Step 1: generalize the partition function eq. (\ref{ourZRNA}) to the form:
1068: \begin{equation}
1069: {\cal Z}_{RNA}= \sum_{C_{ij}} e^{-\frac{1}{k_B \Theta}\left[ E(C)-T
1070: S(T,C)+\mu g(C) \right]} \, ,
1071: \label{ourZRNATheta}
1072: \end{equation}
1073: and initialize $\Theta=\Theta_{max}>T$.
\item Step 2: Starting from an initial configuration $C^{(0)}$ (e.g.,
the fully denatured RNA configuration) sample $n$ configurations by
means of the Metropolis Monte Carlo method applied to
eq. (\ref{ourZRNATheta}).
\item Step 3: Replace $\Theta$ by a lower value and
$C^{(0)}$ by $C^{(n)}$, and go back to Step 2. Repeat this step until
$\Theta$ is equal to $T$. Throughout the Monte Carlo process, keep track
of the configuration with the lowest energy.
1082: \end{itemize}
One can show that, under suitable conditions, the global minimum is
reached when $\Theta$ is lowered at
a logarithmic rate \cite{annea}. In practice, other annealing
schedules are possible: linear, hyperbolic, exponential and power-law
schedules are often implemented.
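
The three steps above, combined with an exponential schedule, can be sketched as follows. This is a toy illustration under stated assumptions: the RNA configurations are replaced by ten states on a ring with a hypothetical rugged landscape, \texttt{free\_energy} stands for the bracket $E-TS+\mu g$ of eq. (\ref{ourZRNATheta}), and \texttt{theta} stands for $k_B \Theta$ in the same units. All names are hypothetical.

\begin{verbatim}
```python
import math
import random


def geometric_schedule(theta_max, theta_min, n_stages):
    """Exponentially decreasing values from theta_max down to theta_min."""
    ratio = (theta_min / theta_max) ** (1.0 / (n_stages - 1))
    return [theta_max * ratio ** k for k in range(n_stages)]


def anneal(free_energy, propose, initial, schedule, n_per_stage, rng):
    """Steps 1-3: Metropolis sampling at each theta, tracking the minimum.

    free_energy(C) stands for E - T*S + mu*g and theta for k_B*Theta,
    both in the same units; propose(C, rng) must be a symmetric move.
    """
    current, f_current = initial, free_energy(initial)
    best, f_best = current, f_current
    for theta in schedule:
        for _ in range(n_per_stage):
            trial = propose(current, rng)
            f_trial = free_energy(trial)
            delta = f_trial - f_current
            if delta <= 0 or rng.random() <= math.exp(-delta / theta):
                current, f_current = trial, f_trial
                if f_current < f_best:
                    best, f_best = current, f_current
    return best, f_best


# Toy rugged landscape on a ring of ten states; the minimum is at index 5.
LANDSCAPE = [2.0, 1.0, 1.5, 0.5, 1.8, 0.0, 1.2, 0.7, 1.9, 1.1]
schedule = geometric_schedule(theta_max=5.0, theta_min=0.1, n_stages=20)
best_state, best_value = anneal(
    free_energy=lambda c: LANDSCAPE[c],
    propose=lambda c, rng: (c + rng.choice((-1, 1))) % len(LANDSCAPE),
    initial=0,
    schedule=schedule,
    n_per_stage=500,
    rng=random.Random(2004),
)
```
\end{verbatim}

The configuration reached at the end of one stage is the starting point of the next one, as in Step 3, and the lowest value met along the way is stored separately since the final configuration need not be the best one.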
1087:
1088: Assuming that at low temperature an RNA molecule assumes a
1089: configuration which corresponds to the minimum energy, we can find
1090: such a configuration by using the simulated annealing method, starting
1091: the simulation with a value of $\Theta$ well above the melting
1092: temperature (say a few hundred degrees Celsius). A first check of
1093: this method is whether we can reproduce the results produced by
1094: deterministic algorithms such as ``mfold'' \cite{z2003} or the
1095: ``Vienna Package'' \cite{ivovienna}. For that purpose, it is
1096: sufficient to use the ``Turner'' energy model and run our algorithm
1097: with a large value of the chemical potential $\mu$. Our preliminary
1098: tests showed that the minimum can be easily found for sequences with
length up to around 300 bases. For longer RNA sequences, the
simulation time increases and the minimum may be harder to find. In
these cases we can use an additional feature of our model: our
approach also offers the interesting possibility of using the chemical
potential for overcoming the energy barriers. This means that we can
apply a ``simulated annealing'' method to $\mu$ rather than to the temperature.
Thus, starting with a low value of $\mu$ (where topologies
of any genus are allowed) the Monte Carlo simulation can quickly
explore regions which are very distant from each other in the energy
landscape. Then by slowly increasing the value of $\mu$ we gradually
constrain the simulation to select only planar configurations
(i.e. secondary structures without pseudoknots), and eventually the minimum
energy configuration. During this process, that is for
1112: intermediate values of $\mu$, many configurations in thermal
1113: equilibrium are generated, and in general they correspond to diagrams
1114: with $\langle g \rangle \neq0$, i.e. RNA configurations with
1115: pseudoknots. These configurations are the prediction of our algorithm
1116: and they should be compared with the experimental data. It is at this
1117: level that the value of $\mu$ can be tuned, in order to fit the
1118: data. The results of our preliminary investigations in this region of
1119: the phase diagram are very encouraging and promising. However, in this
1120: paper we limit ourselves to the description of this new method and of
1121: our algorithm. The results of our simulation will be published
1122: shortly.
1123:
1124:
1125:
1126: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1127: \sect{Conclusions}\label{conclsection}
1128:
1129:
In this paper we propose a new approach to the problem of RNA folding
with pseudoknots. We start from a classification of RNA pseudoknots
based on their graphical representation by means of disk diagrams. A
generic disk diagram is usually not planar, i.e. it cannot be drawn on a
plane surface without crossing lines. However, if the surface has a
high enough genus (i.e. a sufficient number of ``handles''), the
diagram can always be drawn on that surface without any crossing. The
precise correspondence is obtained by using a famous theorem by Euler,
and it coincides with the topological classification of RNA
pseudoknots already introduced in \cite{OZ}. Then we propose a
1140: statistical mechanics model where the formation of RNA pseudoknots is
1141: associated with fluctuations of the topology (eq. (\ref{ourZRNA})). In
1142: order to do that we introduce a parameter, the topological ``chemical
1143: potential'', which controls the rate of pseudoknots formation, and can
1144: be obtained by fitting experimental data. We then discuss the
1145: qualitative structure of the phase diagram for the RNA molecule in the
1146: plane $\{\mu,T\}$ and its interpretation. Finally we describe a Monte
1147: Carlo algorithm for the prediction of RNA pseudoknots. It is based on
1148: a standard Metropolis algorithm coupled to the ``simulated annealing''
1149: method and we provide an explicit description of its implementation
1150: and use. A numerical investigation of this technique and the phase
1151: diagram is under way and will be published shortly.
1152:
1153:
1154: \indent
1155:
1156: \noindent
1157: \underline{Acknowledgments}:
We wish to thank the organizers of the EUROGRID meeting on ``Random
geometry: theory and applications'' at Les Houches (France) in March
2004, where this work was first presented. GV acknowledges the
1161: support of the European Fellowship MEIF-CT-2003-501547.
1162:
1163:
1164:
1165: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1166: \begin{appendix}
1167:
1168: \sect{An algorithm for computing $n_{loops}$}
1169: \label{appendix}
1170:
1171: In this Appendix we describe an algorithm for computing the number of
independent loops adjacent to the $i$-th and $j$-th nucleotides. It is
useful for computing the variation of the genus when one of the Monte
Carlo moves described in Section \ref{montesection} is applied. The algorithm
1175: we propose for counting the number of independent loops is based on
1176: tracking a walk along the diagram starting from the base $i$ and
1177: marking the loop with an identifying number (or color). Namely, we
1178: represent the configuration $C$ by means of the permutation involution
1179: $\sigma_{C}$ (as described in Section \ref{representsection}), and the
1180: algorithm is:
1181:
1182: \begin{verbatim}
START
v(1)=v(2)=v(3)=v(4)=0        % set the four flags to zero
pos=i                        % the start position is the i-th base
color=1                      % using color 1
v(1)=color                   % mark the first flag with the color in use
do{                          % walk along the first loop (at i)
   pos=sigma(pos)            % follow the involution
   if pos==i then v(2)=color % check if the walk is at i or at j
   if pos==j then v(4)=color
   pos=shift(pos)            % shift move (along the RNA circle)
   if pos==j then v(3)=color % check if the walk is at j
} while (pos!=i)             % repeat until it returns to the starting point

if v(2)==0 then{             % if the second loop has not been marked yet
   color=color+1             % use a new color
   pos=i                     % start again from the i-th base
   v(2)=color                % mark the second flag
   do{                       % repeat all the above for the second loop (at i)
      pos=shift(pos)
      if pos==j then v(3)=color
      pos=sigma(pos)
      if pos==j then v(4)=color
   } while (pos!=i)
}

if v(3)==0 then{             % repeat all the above for the third loop (at j)
   color=color+1
   pos=j
   v(3)=color
   do{
      pos=sigma(pos)
      if pos==j then v(4)=color
      pos=shift(pos)
   } while (pos!=j)
}

if v(4)==0 then{             % the fourth loop at j is independent from the
   color=color+1             % previous ones if and only if it has not been
}                            % marked yet

nloops=color                 % the number of independent loops is the number
                             % of used colors
END
1227: \end{verbatim}
Here $L$ is the length of the RNA sequence and the function
\texttt{shift(pos)} simply increments the variable \texttt{pos} by 1
with period $L$, i.e. \texttt{shift(pos)=remainder(pos,L)+1}. At the
end of the algorithm the variable \texttt{color} (and hence
\texttt{nloops}) contains the number of independent loops
$n_{loops}$. Each of the three walks takes at most $L$ steps, so the
algorithm runs in $O(L)$ time.
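
For illustration only, the pseudocode above can be transcribed into
Python as in the following minimal sketch. It is not the implementation
used for this work: the function name \texttt{count\_loops}, the storage
of $\sigma_{C}$ as a list with an unused entry at index 0 (so that bases
are numbered from 1 to $L$), and the convention that an unpaired base
$k$ has $\sigma_{C}(k)=k$ are assumptions made for the example.

\begin{verbatim}
```python
def shift(pos, L):
    """Cyclic increment along the backbone: base L is followed by base 1."""
    return pos % L + 1


def count_loops(sigma, i, j):
    """Number of independent loops adjacent to bases i and j (i != j).

    sigma[k] is the partner of base k, or k itself if base k is unpaired
    (assumed convention); sigma[0] is a dummy entry so bases run 1..L.
    """
    L = len(sigma) - 1
    v = [0, 0, 0, 0]            # v[0..3] play the role of v(1)..v(4)
    color = 1
    v[0] = color
    pos = i
    while True:                 # first loop at i: sigma first, then shift
        pos = sigma[pos]
        if pos == i:
            v[1] = color
        if pos == j:
            v[3] = color
        pos = shift(pos, L)
        if pos == j:
            v[2] = color
        if pos == i:
            break

    if v[1] == 0:               # second loop at i not marked yet
        color += 1
        v[1] = color
        pos = i
        while True:             # shift first, then sigma
            pos = shift(pos, L)
            if pos == j:
                v[2] = color
            pos = sigma[pos]
            if pos == j:
                v[3] = color
            if pos == i:
                break

    if v[2] == 0:               # third loop (at j) not marked yet
        color += 1
        v[2] = color
        pos = j
        while True:
            pos = sigma[pos]
            if pos == j:
                v[3] = color
            pos = shift(pos, L)
            if pos == j:
                break

    if v[3] == 0:               # fourth loop (at j) is a new one
        color += 1
    return color


if __name__ == "__main__":
    # Single pair (1,4) on a chain of length 4: two loops.
    print(count_loops([0, 4, 2, 3, 1], 1, 4))   # prints 2
    # Crossing pairs (1,3) and (2,4), the simplest pseudoknot: one loop.
    print(count_loops([0, 3, 4, 1, 2], 1, 3))   # prints 1
```
\end{verbatim}

In the second example the two arcs cross and all four sides of the
bases $i=1$ and $j=3$ lie on the same loop, whereas a single
non-crossing arc separates two loops.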
1234: \end{appendix}
1235:
1236: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1237:
1238: %\indent
1239:
1240: \begin{thebibliography}{99}
1241:
\bibitem{Science} J. Couzin, ``Breakthrough of the Year: Small RNAs Make Big Splash'', Science {\bf 298} (2002) 2296.
1243:
1244: \bibitem{TB} I. Tinoco Jr. and C. Bustamante, J. Mol. Biol. {\bf 293} (1999) 271.
1245:
1246: \bibitem{LCC} P.A. Limbach, P.F. Crain and J.A. McCloskey, Nucleic Acids Res. {\bf 22} (1994) 2183.
1247: %2196.
1248: %Summary: the modified nucleosides of RNA.
1249:
1250: \bibitem{Nagaswamy} U. Nagaswamy, N. Voss, Z. Zhang and G.E. Fox, Nucleic Acids Res. {\bf 28} (2000) 375.
1251: %376.
1252:
1253: \bibitem{BJT} A. Banerjee, J. Jaeger and D. Turner, Biochemistry {\bf 32} (1993) 153.
1254:
1255: \bibitem{BW} P. Brion and E. Westhof, Ann. Rev. Biophys. Biomol. Struct. {\bf 26} (1997) 113.
1256:
1257: \bibitem{LD} L. Laing and D. Draper, J. Mol. Biol. {\bf 237} (1994) 560.
1258:
1259: \bibitem{ZS} M. Zuker and D. Sankoff, Bull. Math. Biol. {\bf 46} (1984) 591.
1260: %621.
1261:
1262: \bibitem{Monte} M. Schmitz and G. Steger, J. Mol. Biol. {\bf 255} (1996) 254.
1263:
1264: \bibitem{phyl1} R.R. Gutell, N. Larsen and C.R. Woese, Microbiol Rev. {\bf 58} (1994) 10.
1265:
\bibitem{phyl2} C.R. Woese, R.R. Gutell, R. Gupta and H.F. Noller, Microbiol. Rev. {\bf 47} (1983) 621.
1267:
\bibitem{phyl3} P. Higgs, Phys. Rev. Lett. {\bf 76} (1996) 704.
1269:
\bibitem{phyl4} R. Nussinov, G. Pieczenik, J. Griggs and D. Kleitman, SIAM J. Appl. Math. {\bf 35} (1978) 68.
1271: %-82.
1272:
1273: \bibitem{FFHS} C. Flamm, W. Fontana, L. Hofacker and P. Schuster, RNA {\bf 6} (2000) 325.
1274:
1275: %\bibitem{FHMSZ} Christoph Flamm, Ivo L. Hofacker, Sebastian Maurer-Stroh, Peter F. Stadler, Martin Zehl.
1276: %Design of multistable RNA molecules. RNA 7:325-338, 2001
1277:
1278: \bibitem{mironovD} A.A. Mironov, L.P. Dyakonova and A.E. Kister, J. Biomol. Struct. Dyn. {\bf 2} (1985) 953.
1279: %962.
1280:
1281: \bibitem{isa1} H. Isambert and E.D. Siggia, Proc. Natl. Acad. Sci. USA {\bf 97} (2000), 6515.
1282: %6520.
1283:
1284: \bibitem{isa2} A. Xayaphoummine, T. Bucher, F. Thalmann and H. Isambert, Proc. Natl. Acad. Sci. USA {\bf 100} (2003) 15310.
1285:
\bibitem{MWM} J.E. Tabaska, R.B. Cary, H.N. Gabow and G.D. Stormo, Bioinformatics {\bf 14} (1998) 691.
1287: %699
1288: %An RNA folding method capable of identifying pseudoknots and base triples
1289:
1290: \bibitem{Z2} M. Zuker, Curr. Opin. Struct. Biol. {\bf 10} (2000) 303.
1291: % {\it Calculating nucleic acid secondary structure}
1292:
1293: \bibitem{PRB} C.W. Pleij, K. Rietveld and L. Bosch, Nucleic Acids Res. {\bf 13} (1985) 1717.
1294: %-1731.
1295: %A new principle of RNA folding based on pseudoknotting
1296:
1297: \bibitem{WJ} E. Westhof and L. Jaeger, Current Opinion Struct. Biol. {\bf 2} (1992) 327.
1298: %-333.
1299: %``RNA pseudoknots''
1300:
\bibitem{MD} V.K. Misra and D.E. Draper, Biopolymers {\bf 48} (1998) 113.
1302:
\bibitem{Dave} J. Pan, D. Thirumalai and S.A. Woodson, Proc. Natl. Acad. Sci. USA {\bf 96} (1999) 6149.
%6154
%Biochemistry
%Magnesium-dependent folding of self-splicing RNA: Exploring the link between cooperativity, thermodynamics, and kinetics
1307:
1308: \bibitem{MD2} V.K. Misra, R. Shiman and D.E. Draper, Biopolymers {\bf 69}
1309: (2003) 118.
1310: %-136.
1311: %A thermodynamic framework for the magnesium-dependent folding of RNA.
1312:
\bibitem{waterman} M.S. Waterman and T.H. Byers, Math. Biosci. {\bf 77} (1985) 179.
1314: %-188.
1315: %A dynamic programming algorithm to find all solutions in a neighborhood of the optimum.
1316:
1317: \bibitem{williams} A.L. Williams and I. Tinoco Jr., Nucleic Acids Res. {\bf 14} (1986) 299.
1318: %-315.
1319: %A dynamic programming algorithm for finding alternate RNA secondary structures.
1320:
\bibitem{NJ} R. Nussinov and A.B. Jacobson, Proc. Natl. Acad. Sci. USA {\bf 77} (1980) 6309.
1322:
1323: \bibitem{Z} M. Zuker, Science {\bf 244} (1989) 48.
1324: %-52.
1325: %On finding all suboptimal foldings of an RNA molecule
1326:
1327: \bibitem{Z1} M. Zuker and P. Stiegler, Nucleic Acids Res. {\bf 9} (1981), 133.
1328: %-148.
1329: %Optimal computer folding of large RNA sequences using thermodynamic and auxiliary information
1330:
1331: %\bibitem{HH1} I.L. Hofacker, M. Fekete and P.F. Stadler, J. Mol. Biol. {\bf 319} (2002), 1059-1066.
1332: %Secondary Structure Prediction for Aligned RNA Sequences
1333:
1334: \bibitem{HH2} I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker and P. Schuster, Monatshefte f. Chemie {\bf 125} (1994) 167.
1335: %-188.
1336: %Fast Folding and Comparison of RNA Secondary Structures.
1337:
1338: \bibitem{Wuchty} S. Wuchty, W. Fontana, I.L. Hofacker, P. Schuster, Biopolymers {\bf 49} (1999) 145.
1339: %-165.
1340: %Complete suboptimal folding of RNA and the stability of secondary structures.
1341:
1342: \bibitem{RE} E. Rivas and S.R. Eddy, J. Mol. Biol. {\bf 285} (1999) 2053.
1343: %-2068.
1344:
1345: \bibitem{Uemura} Y. Uemura, A. Hasegawa, S. Kobayashi and T. Yokomori, Theoretical Computer Science {\bf 210} (1999) 277.
1346: %-303.
1347: %Tree adjoining grammars for RNA structure prediction
1348:
\bibitem{Akutsu} T. Akutsu, Discr. Appl. Math. {\bf 104} (2001) 45.
1350: %-62.
1351: %Dynamic programming algorithm for RNA secondary structure prediction with pseudoknots
1352:
1353: \bibitem{LP} R.B. Lyngs\o, C.N.S. Pedersen, J. Comp. Biol., {\bf 7} (2000) 409.
1354: %-427.
1355:
1356: \bibitem{giege} R. Giegerich and J. Reeder, Technical Report 2003-03 (2003) University of Bielefeld.
1357:
1358: \bibitem{deogun} J.S. Deogun, R. Donis, O. Komina and F. Ma, Proc. Second
1359: Asia-Pacific Bioinformatics Conference (APBC2004), Dunedin, New
1360: Zealand CRPIT, 29. Chen, Y.P.P., Ed. ACS. 239-246.
1361: %RNA Secondary Structure Prediction with Simple Pseudoknots.
1362:
1363: \bibitem{OZ} H. Orland and A. Zee, Nucl. Phys. {\bf B620} (2002) 456.
1364:
1365: \bibitem{POZ} M. Pillsbury, H. Orland, A. Zee, http://arXiv.org/physics/0207110.
1366:
1367: \bibitem{PTOZ} M. Pillsbury, J.A. Taylor, H. Orland and A. Zee, http://arXiv.org/cond-mat/0310505.
1368: %An Algorithm for RNA Pseudoknots
1369:
1370: \bibitem{MironovL} A.A. Mironov and V.F. Lebedev, BioSystems {\bf 30} (1993), 49.
1371: %-56.
1372:
1373: \bibitem{gultyaev} A.P. Gultyaev, F.H.D. van Batenburg and C.W.A. Pleij, RNA {\bf 5} (1999) 609.
1374: %-617.
1375:
\bibitem{abrah} J.P. Abrahams, M. van den Berg, E. van Batenburg and C.W.A. Pleij, Nucleic Acids Res. {\bf 18} (1990) 3035.
1377: %3044.
1378: %Prediction of RNA secondary structure, including pseudoknotting, by computer simulation.
1379:
1380: \bibitem{gultyMonte} A.P. Gultyaev, Nucleic Acids Res. {\bf 19} (1991), 2489.
1381: %-2494.
1382: %The computer simulation of RNA folding involving pseudoknot formation.
1383:
\bibitem{Ivo} I.L. Hofacker, in ``Monte Carlo Approach to Biopolymers and Protein Folding'', P. Grassberger, G. Barkema and W. Nadler (eds.), World Scientific, Singapore (1998), 171.
1385: %-182.
1386: %RNA Secondary Structures: A Tractable Model of Biopolymer Folding
1387:
1388: \bibitem{pseudobase} F. H. D. van Batenburg, A. P. Gultyaev and C. W. A. Pleij, Nucleic Acids Res. {\bf 29} (2001) 194.
1389: %-195.
1390:
\bibitem{pseudobase1} F.H.D. van Batenburg, A.P. Gultyaev, C.W.A. Pleij and J. Oliehoek, Nucleic Acids Res. {\bf 28} (2000) 201.
1392: %-204.
1393: The database is at http://wwwbio.LeidenUniv.nl/$\sim$Batenburg/PKB.html.
1394:
1395: \bibitem{HH} P. Hogeweg and B. Hesper, Nucleic Acids Res. {\bf 12} (1984) 67.
1396: %-74.
1397: %Energy directed folding of RNA sequences
1398:
1399: \bibitem{FKSS} W. Fontana, A.M. Konings, P.F. Stadler and P. Schuster, Biopolymers {\bf 33} (1993) 1389.
1400: %-1404.
1401: %Statistics of RNA secondary structures
1402:
1403: \bibitem{RNAgraph1} H.H. Gan, S. Pasquali and T. Schlick, Nucleic Acids Res. {\bf 31} (2003) 2926.
1404: %-2943.
1405: %Exploring the repertoire of RNA secondary motifs using graph theory
1406: %with implications for RNA design .
1407:
\bibitem{Gluick} T.C. Gluick and D.E. Draper, J. Mol. Biol. {\bf 241} (1994) 246.
1409: %-262.
1410: %thermodynamics of folding a pseudoknotted mRNA fragment.
1411:
1412: \bibitem{Tur} M.J. Serra, D.H. Turner and S.M. Freier, Methods Enzymol. {\bf 259} (1995) 243.
1413: %-261.
1414:
\bibitem{ZTM} M. Zuker, D.H. Mathews and D.H. Turner, in {\it RNA Biochemistry and Biotechnology}, J. Barciszewski and B.F.C. Clark (eds.), NATO ASI Series, Kluwer Academic Publishers (1999).
1416:
1417: \bibitem{caskill} J.S. McCaskill, Biopolymers {\bf 29} (1990) 1105.
1418: %-1119.
1419:
1420: \bibitem{z2003} M. Zuker, Nucleic Acids Res. {\bf 31} (2003) 3406.
1421: %-15.
1422: %Mfold web server for nucleic acid folding and hybridization prediction.
1423:
1424: \bibitem{mathews} D.H. Mathews, J. Sabina, M. Zuker and D.H. Turner, J. Mol. Biol. {\bf 288} (1999) 911.
1425: %-940.
1426: %Expanded Sequence Dependence of Thermodynamic Parameters Improves Prediction of RNA Secondary Structure
1427:
1428: \bibitem{ivovienna} I.L. Hofacker, Nucleic Acids Res. {\bf 31} (2003) 3429.
1429: %-3431.
1430:
1431: \bibitem{Mau} J. Ambj{\o}rn, M. Carfora, and A. Marzuoli, {\it The Geometry of Dynamical Triangulations}, Springer-Verlag, Berlin, 1998.
1432:
1433: \bibitem{Amb} J. Ambj{\o}rn, B. Durhuus, T. Jonsson, {\it Quantum Geometry : A Statistical Field Theory Approach}, Cambridge University Press, 1997.
1434:
1435: \bibitem{thooft} G. 't Hooft, Nucl. Phys. {\bf B72} (1974) 461.
1436: %A planar diagram theory for strong interactions, Nucl. Phys. B72 (1974)
1437:
\bibitem{MC} K. Binder and D.W. Heermann, {\it Monte Carlo Simulation in Statistical Physics}, Springer-Verlag, Berlin, 1992.
1439:
1440: \bibitem{frenkel} D. Frenkel and B. Smit, {\it Understanding Molecular Simulation}, 2nd edition, Academic Press, 2002.
1441:
1442: \bibitem{combinatorics} I.L. Hofacker, P. Schuster, P.F. Stadler,
1443: Discr. Appl. Math. {\bf 88} (1998) 207.
1444: %-237.
1445: %Combinatorics Of RNA Secondary Structures
1446:
1447: \bibitem{Metro} N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller, Jour. Chemical Physics {\bf 21} (1953) 1087.
1448: %-1092.
1449: %Equation of State Calculations by Fast Computing Machines
1450:
1451: \bibitem{knuth3} D.E. Knuth, {\it The Art of Computer Programming, Volume 3: Sorting and Searching}, 2nd ed. Reading, MA: Addison-Wesley, 1998.
1452:
1453: \bibitem{knuth2} D.E. Knuth, {\it The Art of Computer Programming, Volume 2: Seminumerical Algorithms}, 3rd ed. Reading, MA: Addison-Wesley, 1997.
1454:
1455: \bibitem{NumRecipes} W.H. Press, B.P. Flannery, S.A. Teukolsky, W.T. Vetterling, {\it Numerical Recipes in C : The Art of Scientific Computing}, Cambridge University Press, Cambridge, 1992.
1456:
1457: \bibitem{Geyer92} C.J. Geyer, Statist. Sci. {\bf 7} (1992) 473.
1458: %-511.
1459:
1460: \bibitem{jackknife} J. Shao and D. Tu, {\it The Jackknife and Bootstrap}, Springer Verlag, 1995.
1461:
1462: \bibitem{Liu} J.S. Liu, {\it Monte Carlo Strategies in Scientific Computing}, Chapter 2, Springer New York, 2001.
1463:
1464: \bibitem{ZukEn} M. Zuker, Methods Enzymol. {\bf 180} (1989) 262.
1465: %288.
1466:
1467: %%%ENERGYPARAMETERS
1468: %A. Walter, D Turner, J Kim, M Lyttle, P Müller, D Mathews, M Zuker "Coaxial stacking of helices enhances binding of oligoribonucleotides.." PNAS, 91, pp
1469: %9218-9222, 1994
1470:
1471: %algorithm for loop matching
1472: %\bibitem{phyl5} R.R. Gutell, Curr. Opin. Struct. Biol. {\bf 6} (1993), 313.
1473:
1474: \bibitem{kirk} S. Kirkpatrick, C.D. Gelatt and M.P. Vecchi, Science {\bf 220} (1983) 671.
1475: %-680.
1476:
1477: \bibitem{annea} S. Geman and D. Geman, IEEE Trans. on Pattern Analysis and Machine Intelligence,
1478: {\bf 6} (1984) 721.
1479: %-741.
1480:
1481: \end{thebibliography}
1482: \end{document}
1483:
1484:
1485:
1486: %%% Local Variables:
1487: %%% mode: latex
1488: %%% TeX-master: t
1489: %%% End:
1490: