q-bio0510011/Main.tex
1: \documentclass[12pt]{article}
2: \usepackage{amssymb,amsmath}
3: \usepackage{epsfig}
4: 
5: \begin{document}
6: 
7: \title{A new formalism for calculation of the partition function of single stranded nucleic acids}
8: 
9: \author{Roumen A. Dimitrov \\
10: University of Sofia, Faculty of Physics,\\ Department of Theoretical Physics, \\
11: 5, James Bouchier Blvd., 1164 Sofia, Bulgaria, \\e-mail: dimitrov@phys.uni-sofia.bg
12: }
13: 
14: 
15: %\begin{document}
16: 
17: \maketitle
18: 
19: \begin{abstract}
A new formalism for calculating the partition function of single-stranded nucleic acids is presented.
The level of resolution is that of secondary structures and the topology of structure elements.
The folding model accounts for matches, mismatches, symmetric and asymmetric interior loops, stacked pairs in loop
and dangling-end regions, multi-branched loops, bulges, and single-base stacking that may occur at duplex ends or at the ends of helices.
Calculations on short and long sequences show that, for short oligonucleotides, duplex formation often displays
a two-state transition. For longer oligonucleotides, however, the thermodynamic properties of the single-strand
self-folding transition affect the nature of the duplex transition, resulting in a population of
intermediate hairpin species in solution. The role of these intermediate hairpin species is analyzed for the case
in which short oligonucleotides (molecular beacons) have to reliably identify and hybridize to
accessible nucleotides within their targeted mRNA sequences. It is shown that the enhanced specificity of molecular beacons
is a result of their constrained conformational flexibility and the all-or-none mechanism of their hybridization to the target
sequence.
32: \end{abstract}
33: 
34: 
35: \section{Introduction}
36: Nucleic acids hold great promise as a design medium for the
37: construction of nanoscale devices with novel mechanical or chemical function \cite{SNC}. Efforts are currently underway in
38: many laboratories to use DNA and RNA molecules for applications in transport,
39: switching \cite{GASRRB,BYATAMFSJN,HYXZSNS}, circuitry \cite{MNSDS}, DNA computing \cite{RBNCCJPRLA} and DNA chips 
40: \cite{DSDLDMRD,SMJGDDSSM}. Conformational switches or diversity of conformations have been proven or
41: are suspected to be involved in several important processes such as regulation of gene expression, 
42: translational regulation, mutation and repair, and others \cite{GWMG,SG,GS}. 
During these processes several types of interactions occur
through a network of RNA-RNA, RNA-DNA, RNA(DNA)-protein, RNA(DNA) self-folding, or RNA(DNA)-small-molecule
contacts.
46: 
47: Comparison of short RNAs/DNAs with different base pairs,
48: loop sequences, bulges, etc.  has yielded an extremely useful
49: database of thermodynamic parameters from which the stabilities of conformational states of larger 
50: nucleic acid sequences can be estimated \cite{FRES8301,SUGN8701,HICD8501,PUGJ8901,BLAR7201}. 
The estimation of the thermodynamic parameters is based on the
nearest-neighbor approximation, in which inter-residue interactions are restricted to
nucleotide residues that are adjacent along the sequence \cite{BORP7401}.

There have been several major improvements in the calculation of the
partition function of single-stranded nucleic acids based on the McCaskill
algorithm \cite{MCCJ9001,HOFI9401,MATO9601}, and in the estimation of the
free energy based on free energy minimization and the
corresponding sub-ensemble around the minimum free energy
conformation \cite{ZUKM8901,WILA8601,WATM8301,WATM8502,ZUKM8902}.
61: 
62: 
63: In this work secondary structures and the topology of structure elements are the level of resolution that is used. 
64: However, atomic coordinates are also taken into account in the general expressions. 
65: Unlike proteins \cite{VDAF}, whose secondary structures usually
66: depend on the global amino acid sequence, DNA/RNA molecules
67: are currently thought to assemble in a hierarchical manner \cite{BATRT9901,EDRBBMJD,TRSTP}.
68: The folding can be conceptually partitioned in the two steps of formation 
69: of the secondary structure and the spatial structure \cite{ITCB}.
70: As a result DNA/RNA molecules exhibit a modular structure with individual 
71: structural motifs demonstrating independent characteristics. 
72: 
Investigating the overall properties of DNA/RNA molecules by exploring the variety of local
structural motifs, their interactions, and their distribution along the sequence therefore requires appropriate theoretical approaches.
This is especially important given the recent increased interest in
predicting target sites for antisense oligonucleotides in
highly structured DNA/RNA molecules \cite{SWGSMYCR,GBSTALFK,DMMBSFJWDT,SWGNSMYMR,TVJWSF}.
Because of its economic value and short experimental cycle, antisense technology has been widely accepted as a tool
for studying gene function and validating drug targets. Antisense oligonucleotides can
potentially suppress the expression of a particular gene through mechanisms such as RNase H-mediated mRNA cleavage, destabilization of the
target mRNA, or aberration of translation or splicing. Understanding the conformational constraints and the transformations between
different local structural motifs is of great practical importance. For example, conformational switches of hairpin-shaped oligonucleotide
primers can be useful for enhancing the specificity of nucleic acid amplification reactions. Interactions with short
oligonucleotides or small metabolic molecules can lead to conformational switches in the DNA/RNA target molecules
\cite{MTMPAS,TSCSNB}. These conformational switches can be used for sensing and modulating complex biochemical networks in
a variety of important biological processes \cite{MJWR,GS}.
87: 
With this local-structural-motif approach in mind, we use as a starting point our previous work \cite{RADMZ},
where we presented a new formalism for hybridization processes between DNA and RNA molecules.
There, hybridization accounted only for stacked pairs, interior loops, bulges and, at the
ends, dangling bases. We did not consider stacked pairs in loop and
dangling-end regions, or multi-branch loops. The formalism was applied only to
short DNA/RNA sequences. Another limitation was that the formalism was not
applied to the estimation of the partition function of self-folding. Instead, the self-folding of individual
DNA/RNA molecules was based on free energy minimization and the
corresponding sub-ensemble around the minimum free energy conformation at each temperature,
as given by the mfold program of Zuker \cite{ZUKM8902}. This led to some inconsistency in the overall calculations:
for sequences with non-two-state transitions, the populations of some intermediate species were poorly predicted.
Recently, mfold has been updated using the McCaskill algorithm \cite{MCCJ9001, ZUKM0305}, and it is now able to calculate not only the
low-energy conformations but also the ensemble free energy. It will be interesting to compare mfold
with the formalism developed here in the future.
102: 
In this work we present a new formalism for estimating the partition function for self-folding.
The formalism uses an approach based on the left-right recursion algorithm that we developed for the free energy
calculation of duplexes \cite{RADMZ}.
All possible conformations of single-stranded DNA or RNA sequences in solution are explored. The folding model
accounts for matches, mismatches, symmetric and asymmetric interior loops, stacked pairs in loop and dangling-end regions,
multi-branched loops, bulges, and single-base stacking that may occur at duplex ends or at the ends of helices.
Calculations on short and long sequences show that, for short oligonucleotides, duplex formation often displays
a two-state transition. For longer oligonucleotides, however, the thermodynamic properties of the single-strand
self-folding transition affect the nature of the duplex transition, resulting in a population of
intermediate hairpin species in solution. The advantage of the new formalism is demonstrated
in the case where one needs to design relatively short oligonucleotides (molecular beacons) that have to
reliably identify and hybridize to accessible nucleotides within their targeted mRNA sequences.
It is shown that the specificity of molecular beacons is enhanced if they form a stem-and-loop structure
with constrained conformational flexibility and hybridize to the target sequence by an all-or-none mechanism.
117: 
118: \section{Methods}
119: 
120: 
121: \subsection{Recursive calculation }
122: 
As the temperature increases, the overwhelming majority of
single-stranded conformations tend toward
their corresponding unfolded states. At each temperature there is an ensemble
of conformational states, where each conformation is characterized by the
fraction of its base pairs that are melted at that temperature and by their
location along the sequence. Thus, along the sequence we have a variety of local
structural motifs characterized by alternating
loops (single-stranded regions) and double-stranded regions. The
location and the length of these local structural motifs depend on their relative
Boltzmann statistical weights. In this work we calculate the partition
functions of the single-stranded forms using the method
developed for double-stranded forms.
135: 
136: 
In our previous work \cite{RADMZ} (fig.1), the polynucleotide sequences of the double-stranded
forms are described as follows: sequence $1$ is represented by $S_{1}=r_{11}, r_{12},
r_{13}, \dots, r_{1i}, \dots, r_{1N_{1}}$ and sequence $2$ by
$S_{2}=r_{21}, r_{22}, r_{23}, \dots, r_{2j}, \dots, r_{2N_{2}}$,
where $N_{1}$ and $N_{2}$ are their
lengths and $r_{1i}$ and $r_{2j}$ are the space coordinates of the
corresponding nucleotides of sequences $1$ and $2$. The recursive calculation is
based on the condition that at least two nucleotides, one on sequence
$1$ and one on sequence $2$, are in contact, $r_{1i}-r_{2j}$ with $1\leq i\leq N_{1}$,
$1\leq j\leq N_{2}$. The sequences are enumerated from the $5^{'}$- to
the $3^{'}$-end. The contact $r_{1i}-r_{2j}$ includes
an initiation free energy term $F^{initiation}$ necessary to bring the two sequences
together. Each nucleotide pair $r_{1i}-r_{2j}$
formally divides the hybridized form $S_{1}S_{2}$ of sequences
$1$ and $2$ into two parts, left $L$ and right $R$, in such a way that
the free energy $F\left( S_{1}S_{2}\right)$ of $S_{1}S_{2}$ is the
sum of the free energies of the left $FL\left( r_{1i},r_{2j}\right)$
and right $FR\left( r_{1i},r_{2j}\right)$ parts plus the initiation
free energy $F^{initiation}$, which is assumed to be the same for all
possible pairs $r_{1i}-r_{2j}$. Thus,
157: 
158: 
159: 
160: \begin{equation}
161: F\left( S_{1}S_{2}\right) = {F\!L}\left( r_{1i},r_{2j}\right) + {F\!R}\left(
162: r_{1i},r_{2j}\right) +F^{initiation}
163: \end{equation}
164: 
165: 
This additive property of the energy rules, based on the nearest-neighbor
approximation, forms the basis of the recursive calculation of the
partition function of $S_{1}S_{2}$. The additivity of the free
energy leads to a multiplication of the partition functions of the
left ${Z\!L}$ and right ${Z\!R}$ parts \cite{RADMZ}.
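
To make the step from additive free energies to multiplicative partition functions explicit, the following short derivation may help. It is only a sketch: it assumes, by analogy with the single-strand relations used later in this section, that the left and right parts satisfy ${F\!L}=-RT\ln {Z\!L}$ and ${F\!R}=-RT\ln {Z\!R}$, and the symbol $w$ for the statistical weight associated with the contact $r_{1i}-r_{2j}$ is introduced here purely for illustration.

\[
w\left( r_{1i},r_{2j}\right) = \exp \left( -\frac{F\left( S_{1}S_{2}\right)}{RT}\right)
= {Z\!L}\left( r_{1i},r_{2j}\right)\, {Z\!R}\left( r_{1i},r_{2j}\right)\,
\exp \left( -\frac{F^{initiation}}{RT}\right)
\]

Under these assumptions the sum in the exponent turns into a product of independent factors, so the left and right parts can be accumulated separately and combined only at the end.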
171: 
172: 
173: 
174: \begin{figure}
175: \begin{center}
176: \includegraphics{f3.eps}
177: \caption{Additive property of the free energy rules based on nearest-neighbor approximation: A- self-folding, B- hybridization \cite{RADMZ}.}
178: \end{center}
179: \end{figure}
180: 
181: 
182: 
Our main focus in this work is the partition function of the single-stranded form which,
as for the double-stranded form, is described in terms of left and right parts.
The sequence is represented by $S=r_{1}, r_{2}, r_{3}, \dots, r_{i}, \dots, r_{N}$,
where $N$ is its length and $r_{i}$ are the space coordinates
of the corresponding nucleotides of sequence $S$.
As previously, the recursive calculation is based on the condition that at least
two nucleotides along the sequence are in contact, $r_{i}-r_{j}$.

In contrast to the double-stranded form, the initiation free energy term now represents the
formation of a loop between positions $i$ and $j$ (fig.1).
The sequence is enumerated from the $5^{'}$- to the $3^{'}$-end.
Each nucleotide pair $r_{i}-r_{j}$ formally divides the self-hybridized form
of the sequence into three parts, left $L$, middle $M$ and right $R$, in such a way that
the free energy $F\left( S\right)$ of $S$ is the
sum of the free energies of the left $FL\left( r_{1},r_{i}\right)$, middle
$FM\left( r_{i},r_{j}\right)$ and right $FR\left( r_{j},r_{N}\right)$ parts:
199: 
200: 
201: 
202: \begin{eqnarray}
203: F\left( S\right) = {F\!L}\left( r_{1},r_{i}\right) + {F\!M}\left(r_{i},r_{j}\right) + {F\!R}\left(r_{j}, r_{N}\right)
204: \end{eqnarray}
205: 
The partition functions of the left, middle and right parts obey the following recursions.
207: 
208: 
209: Left part:
210: 
211: \begin{eqnarray}
212: {Z\!L}\left( r_{1}, r_{i}\right)  & = & {Z\!L}\left( r_{1}, r_{i-1}\right) + \nonumber \\
213: & & \sum_{1\leq k<i}{Z\!L}\left( r_{1}, r_{k}\right) 
214: \exp \left( -\frac{{F\!M}\left( r_{k},r_{i}\right)}{RT}\right) \\
{F\!L}\left( r_{1}, r_{i}\right)  & = & -RT \ln \left[ {Z\!L}\left( r_{1}, r_{i}\right)\right]
216: \end{eqnarray}
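
As an illustration of how the left recursion unfolds, its first few terms can be written out. This is a sketch under an assumed boundary condition ${Z\!L}\left( r_{1},r_{1}\right)=1$ (a single unpaired nucleotide), which is not stated explicitly in the text; the shorthand $b_{ki}$ is introduced here only for brevity.

\[
b_{ki} = \exp \left( -\frac{{F\!M}\left( r_{k},r_{i}\right)}{RT}\right)
\]

\begin{eqnarray*}
{Z\!L}\left( r_{1}, r_{2}\right) & = & {Z\!L}\left( r_{1}, r_{1}\right) + {Z\!L}\left( r_{1}, r_{1}\right) b_{12} \; = \; 1 + b_{12} \\
{Z\!L}\left( r_{1}, r_{3}\right) & = & {Z\!L}\left( r_{1}, r_{2}\right) + {Z\!L}\left( r_{1}, r_{1}\right) b_{13} + {Z\!L}\left( r_{1}, r_{2}\right) b_{23} \\
 & = & \left( 1 + b_{12}\right) \left( 1 + b_{23}\right) + b_{13}
\end{eqnarray*}

Each new position thus reuses the values already computed for shorter prefixes. Contacts that cannot form, for example a pair that would close a loop with too few nucleotides, would presumably enter with ${F\!M}\rightarrow \infty$ and hence $b_{ki}=0$.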
217: 
218: 
219: Middle part:
220: 
221: 
222: \begin{eqnarray}
223: {Z\!M}\left( r_{i},r_{j}\right)  & = & {Z\!M}^{open}\left( r_{i},r_{j}\right) + \nonumber \\
224:  & & \sum _{i<k<l}\sum _{j>l>k}{Z\!M}\left( r_{k},r_{l}\right)
225: \exp \left( -\frac{F\left( r_{i},r_{j},r_{k},r_{l}\right)}{RT}\right) 
226: \end{eqnarray}
227: 
228: \begin{eqnarray}
229: F\left( r_{i},r_{j},r_{k},r_{l}\right) & = & {F\!L}\left( r_{i},r_{k}\right)+{F\!R}\left( r_{l},r_{j}\right)
230: \end{eqnarray}
231: 
232: \begin{eqnarray}
{F\!M}\left( r_{i},r_{j}\right)  & = & -RT \ln \left[ {Z\!M}\left( r_{i},r_{j}\right)\right]
234: \end{eqnarray}
235: 
236: 
237: Right part:
238: 
239: \begin{eqnarray}
240: {Z\!R}\left( r_{j}, r_{N}\right)  & = & {Z\!R}\left( r_{j+1}, r_{N}\right) + \nonumber \\
241: & & \sum_{N\geq k>j}{Z\!R}\left( r_{k}, r_{N}\right) 
242: \exp \left( -\frac{{F\!M}\left( r_{j},r_{k}\right)}{RT}\right) \\
{F\!R}\left( r_{j}, r_{N}\right)  & = & -RT \ln \left[ {Z\!R}\left( r_{j}, r_{N}\right)\right]
244: \end{eqnarray}
245: 
246: 
${F\!L}\left( r_{1}, r_{i}\right)$ and ${F\!R}\left( r_{j}, r_{N}\right)$ correspond to
the free energies of self-folding of the $5'$ and $3'$ dangling ends of the sequence.
Obviously, ${F\!L}\left( r_{1}, r_{N}\right) = {F\!R}\left( r_{1}, r_{N}\right)$.
The term ${F\!M}\left(r_{i},r_{j}\right)$ corresponds to the initiation of a loop in the middle part.
Thus, ${F\!M}^{open}\left(r_{i},r_{j}\right) = -RT\ln\left[{Z\!M}^{open}\left( r_{i},r_{j}\right)\right]$ represents the free energy of initiating a loop without internal base pair
contacts, while $F\left( r_{i},r_{j},r_{k},r_{l}\right)$ takes into account
the summation over all possible distributions of structural motifs
(stacked pairs, bulges, symmetric and asymmetric loops, single-stranded regions, hairpins and multibranches)
along the sequences of the interior regions $(i,k)$ and $(l,j)$.
For example, when $\left| k-i\right| = 1$ and $\left| l-j\right| =1$ the free energy
$F\left( r_{i},r_{j},r_{k},r_{l}\right)$ represents a stacked pair
belonging to a secondary structure; when $\left| k-i\right| =2$
and $\left| l-j\right| =1$, or $\left| k-i\right| =1$ and
$\left| l-j\right| =2$, we have a bulge. When
$\left| k-i\right| \neq \left| l-j\right|$ and there are no base pair contacts in the loop regions,
the free energy $F\left( r_{i},r_{j},r_{k},r_{l}\right)$
represents an asymmetric internal loop (including the case of a bulge on one of the
strand segments and a loop on the other, and vice versa), while
$\left| k-i\right| =\left| l-j\right|$ leads to a symmetric loop
(including the case of a bulge on both strand segments). The
presence of internal base pair contacts in the loop regions leads to hairpins and multibranches.
For a detailed description of the free energies of bulges, symmetric and
asymmetric internal loops and dangling ends we refer the reader to the
review by Zuker \cite{ZUKM8904}.
271: 
272: 
Finally, based on the multiplication property of the partition functions of the left, middle and
right parts, the total partition function is:
275: 
276: 
277: \begin{eqnarray}
278: Z\left( S\right) =\sum _{1\leq i<j\leq
279: N}\left[{{Z\!L}\left( r_{1}, r_{i}\right) {Z\!M}\left( r_{i},r_{j}\right){Z\!R}\left( r_{j}, r_{N}\right)}\right]
280: \end{eqnarray}
281: 
282: 
283: \subsubsection{Pair probabilities}
284: 
Having calculated the partition function, we can derive the
probability distribution of various conformational properties. Before that, however, we need a
recursive form for the free energy term $FL\left( r_{1i},r_{2j}\right)$ in equation
(1), which represents the free energy of the left part in the case of hybridization. In our
previous work \cite{RADMZ} we gave an expression for $FL\left( r_{1i},r_{2j}\right)$ in which
we did not consider stacked pairs in loop and dangling-end regions, or multi-branch loops. Based on the formalism
developed above, a general recursive form for the left partition function ${Z\!L}^{h}\left( r_{i},r_{j}\right)$
in the case of hybridization can be written as follows:
293: 
294: 
295: \begin{eqnarray}
296: {Z\!L}^{h}\left( r_{i},r_{j}\right)  & = & {Z\!L}\left( r_{1},r_{i}\right){Z\!R}\left( r_{j}, r_{N}\right) + \nonumber \\
297:  & & \sum _{1\leq k<i}\sum _{N\geq l>j}{Z\!L}^{h}\left( r_{k},r_{l}\right)
298: \exp \left( -\frac{F\left( r_{i},r_{j},r_{k},r_{l}\right)}{RT}\right) 
299: \end{eqnarray}
300: 
301: \begin{eqnarray}
302: {F\!L}^{h}\left( r_{i},r_{j}\right)  & = & -RT \ln \left[ {Z\!L}^{h}\left( r_{i},r_{j}\right)\right]
303: \end{eqnarray}
304: 
305: 
Now we can turn to the calculation of the base pairing probabilities. For example, the probabilities $P(r_{i},r_{j})$ and
$P(r_{i},r_{j},r_{i+1},r_{j-1})$ for a single base pair $r_{i}-r_{j}$ and for two adjacent base pairs $r_{i}-r_{j}$, $r_{i+1}-r_{j-1}$
are:
309: 
310: 
311: \begin{equation}
312: P\left(r_{i},r_{j}\right) = \frac{{Z\!L}^{h}\left( r_{i},r_{j}\right){Z\!M}\left( r_{i},r_{j}\right)} 
313: {Z\left( S\right)}
314: \end{equation}
315: 
316: 
317: \begin{equation}
318: P\left(r_{i},r_{j},r_{i+1},r_{j-1}\right) = \frac{{Z\!L}^{h}\left(r_{i},r_{j}\right)  
319: {\exp \left( -\frac{F\left(r_{i},r_{j},r_{i+1},r_{j-1}\right) }{RT}\right)}{{Z\!M}\left( r_{i+1},r_{j-1}\right)}} 
320: {Z\left( S\right)}
321: \end{equation}
322: 
323: 
324: \begin{figure}
325: \begin{center}
326: \includegraphics{f4.eps}
\caption{Base pair contacts and their free energy contributions in the case of an open loop and
a branched hairpin. An example is also given of conformational switching between the loop and the hairpin as a result
of the interaction of the loop with a short oligo. At the same time, the subregion $\{p,\dots,q\}$ (involved in a multibranched loop)
has to unfold before it hybridizes with the short oligo.}
331: \end{center}
332: \end{figure}
333: 
334: where $ {F\left(r_{i},r_{j},r_{i+1},r_{j-1}\right) }$ is the free energy of base pairing of two nearest-neighbor nucleotides.
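
Dividing the two-pair probability by the single-pair probability gives a quantity that can be convenient when following helix growth: the conditional probability that the pair $r_{i+1}-r_{j-1}$ is stacked on $r_{i}-r_{j}$, given that $r_{i}-r_{j}$ is formed. The notation $P(\cdot \mid \cdot)$ is introduced here for illustration only; the result follows directly from the two probability expressions given earlier, since ${Z\!L}^{h}\left( r_{i},r_{j}\right)$ and $Z\left( S\right)$ cancel.

\[
P\left( r_{i+1},r_{j-1} \mid r_{i},r_{j}\right)
= \frac{P\left( r_{i},r_{j},r_{i+1},r_{j-1}\right)}{P\left( r_{i},r_{j}\right)}
= \exp \left( -\frac{F\left( r_{i},r_{j},r_{i+1},r_{j-1}\right)}{RT}\right)
\frac{{Z\!M}\left( r_{i+1},r_{j-1}\right)}{{Z\!M}\left( r_{i},r_{j}\right)}
\]

In this form the ratio depends only on the middle part enclosed by $r_{i}-r_{j}$, not on the flanking left and right parts.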
335: 
336: 
Of particular importance is the ability to monitor, as a function of temperature,
the transition between the folded and unfolded structures, as well
as the partially folded conformational intermediates, through any physical
property that depends on the number of base pairs formed.
Fortunately, absorption spectra, like the thermodynamics, are
physical properties that are consistent with nearest-neighbor
models \cite{PUGJ8901,BLAR7201}. In other words, given
nearest neighbors must have identical absorption values
or melting free energies regardless of whether they are located in the
interior or at the ends of the sequence. The property
monitored as a function of temperature is therefore proportional to the
fraction of base pairs that are stacked as the nucleic acid molecule
melts \cite{RADMZ}.
350: 
Using the base pairing probabilities, we can express the equilibrium fraction of paired bases $\theta$
as follows:
353: 
354: \begin{eqnarray}
355: \theta  & = & \sum _{ij}P(r_{i},r_{j}) 
356: \end{eqnarray}
357: 
358: 
To calculate the extinction, we should take into account
that it is determined by the contribution of the
melted or mismatched loop regions along the constituent sequences of
the self-folded species \cite{CRCIT}. At each
temperature there is an ensemble of conformations with a narrow or
broad distribution of such loops. The contribution of each of them
is proportional to its relative Boltzmann statistical weight. It
follows that the extinction $\epsilon(T)$ of the self-folded species can be represented in the form \cite{RADMZ}:
367: 
368: \begin{equation}
369: \epsilon (T)=\sum ^{N-1}_{i=1}2(1-P(r_{i})- P(r_{i+1})+
370: P\left(r_{i}, r_{i+1}\right))\xi (i,i+1)-\sum ^{N-1}_{i=1}(1-P(r_{i}))\xi(i)
371: \end{equation}
372: 
where $1-P(r_{i})- P(r_{i+1}) + P\left(r_{i}, r_{i+1}\right)$ is the probability that the two nucleotides
at adjacent positions $i$ and $i+1$ are both melted and therefore contribute $\xi (i,i+1)$ to the total
absorbance. The probabilities $P(r_{i})$ and $P\left(r_{i}, r_{i+1}\right)$ are given by:
376: 
377: \begin{equation}
P(r_{i}) = \sum_{i<n\leq N}{P\left(r_{i},r_{n}\right)}+\sum_{1\leq n<i}{P\left(r_{n},r_{i}\right)}  \nonumber
379: \end{equation}
380: 
381: \begin{eqnarray}
382: P\left(r_{i},r_{i+1}\right) & = & \sum_{i+1<n<m}\sum_{n<m\leq N}{P\left(r_{i},r_{i+1},r_{m},r_{n}\right)} + \nonumber \\
383: & & \sum_{1\leq
384: n<m}\sum_{n<m<i}{P\left(r_{i},r_{i+1},r_{m},r_{n}\right)} + \nonumber \\
385: & &  \sum_{i+1<n\leq N}\sum_{1\leq m<i}{P\left(r_{i},r_{i+1},r_{m},r_{n}\right)} 
386: \end{eqnarray}
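
The combination $1-P(r_{i})-P(r_{i+1})+P\left(r_{i},r_{i+1}\right)$ used in the extinction formula is an instance of the inclusion-exclusion principle. As a brief sketch, let $A_{i}$ denote the event that nucleotide $i$ is paired (a label introduced here for illustration only), so that $P(A_{i})=P(r_{i})$ and $P(A_{i}\cap A_{i+1})=P\left(r_{i},r_{i+1}\right)$. Then the probability that neither nucleotide is paired is

\[
P\left( \bar{A}_{i}\cap \bar{A}_{i+1}\right) = 1 - P\left( A_{i}\cup A_{i+1}\right)
= 1 - P(A_{i}) - P(A_{i+1}) + P\left( A_{i}\cap A_{i+1}\right)
\]

The two limiting cases behave as expected: if both nucleotides are always paired, all three probabilities equal $1$ and the expression vanishes, and if both are always melted, all three probabilities equal $0$ and the expression equals $1$.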
387: 
The formalism developed in this work also allows the incorporation of several types of interactions
through a network of RNA-RNA, RNA-DNA, RNA(DNA)-protein or RNA(DNA)-small-molecule
contacts. The additional free energy terms, which depend on
the type of interaction (for example hybridization with short oligos or binding of protein molecules), have to be incorporated into
the free energy term ${F\!M}\left( r_{i},r_{j}\right)$ (fig.2).
393: 
394: \section{Results and discussions}
395: 
396: 
397: \begin{figure}
398: \begin{center}
399: \includegraphics{f7.eps}
\caption{Chemical potential versus temperature for the hairpin species formed after dissociation of the three dsDNAs: S1S2, S3S4,
and S5S6.}
402: \end{center}
403: \end{figure}
404: 
405: 
Understanding the molecular forces that control the various sequence- and
solvent-specific conformational forms found within DNA and RNA
oligonucleotides is of great importance. Melting experiments have been the most useful way to
measure a variety of thermodynamic parameters from which the stabilities of larger structures under different conditions can be estimated.
The estimation of the thermodynamic parameters is based on the assumption that the stability of a base pair
depends only on the identity of the adjacent base pairs, because the major interactions involved in the transformation between different
conformations of the polynucleotide sequence are stacking and hydrogen bonding \cite{SDATD,NSRKDHT,DRHDHT,JDPDHT}.
This additive property of the energy rules, based on the nearest-neighbor
approximation, forms the basis of the recursive calculation of the
partition function. The additivity of the free energy leads to a multiplication of the partition functions \cite{RADMZ}.
416: 
417: 
418: \begin{figure}
419: \begin{center}
420: \includegraphics{CP.eps}
\caption{Calorimetric excess heat capacity, $\Delta C_{p}$, versus temperature profiles for the three dsDNAs.
Experimental plots for the duplex strand transition are labeled as follows \cite{PWNS}: S1S2 (A), S3S4 (B), and S5S6 (C). The calculated curves
are drawn as lines and are labeled as follows: S1S2 (a), S3S4 (b), and S5S6 (c).}
424: \end{center}
425: \end{figure}
426: 
427: 
Based on the multiplication property of the partition function, we present here a new formalism for calculating the
partition function of single-stranded nucleic acids. The self-folding model deals with matches, mismatches, symmetric and asymmetric
interior loops, bulges and single-base stacking that may occur at duplex ends or at the ends of helices. The formalism also takes into
account base pair contacts in the loop regions and dangling ends in the double helix and single hairpin species, as well as multi-branches.
This allows calculations on both short and long sequences. The self-folding calculation explores all possible conformations of the single-strand species.
433: 
We performed calculations on non-self-complementary DNA sequences with melting temperatures between
$50\,^{\circ}$C and $90\,^{\circ}$C. The sequences, grouped by length, are as follows: 9-mers S1, d(GCTTGTTGC) and S2, d(GCAACAAGC);
15-mers S3, d(GCAGGTTGTTTCCGC) and S4, d(GCGGAAACAACCTGC); 21-mers S5, d(GCAACAGGTTGTTTCCGTTGC) and S6, d(GCAACGGAAACAACCTGTTGC) \cite{PWNS}.
The calculation of self-folding and hybridization between DNA and RNA sequences takes into account the whole ensemble of single- and
double-strand species in the solution and their fractional extents at different temperatures \cite{RADMZ}.
We assume that the solution can be described as an ensemble of ideally mixed species.
This assumption is based on the experimental evidence that, to very good accuracy, the single-strand self-folding transition
and the double-strand association are independent transition processes, and that the thermodynamic properties and transition
characteristics of each transition in a mixed solution are identical to those in the isolated systems \cite{PWNS}. The calculated
chemical potentials of the intermediate hairpin species show that for short oligonucleotides (S1, S2; fig.3) there is only a small thermodynamic
contribution of the single-strand self-folding transition to the overall transition. As a result, the duplex formation of short oligonucleotides
shows a perfectly symmetric two-state shape in the curve of calorimetric excess heat capacity versus temperature (fig.4). For longer oligonucleotides (S3, S4, S5, S6; fig.3), however, the calculated chemical potentials
show that the thermodynamic properties of the single-strand self-folding transition affect the nature of the duplex transition, resulting
in a population of intermediate hairpin species in the solution. The deviation of the calculated curves of calorimetric excess heat capacity versus
temperature from a perfectly symmetric shape can be seen for duplexes S3S4 and S5S6 in fig.4. Here the melting of the intermediate
hairpin species is superimposed on the melting of the duplex species, which leads to the deviation from the two-state shape of the heat capacity curve.
450: 
451: \begin{figure}
452: \begin{center}
453: \includegraphics{f6.eps}
454: \caption{Schematic representation of the phase transitions in solutions containing molecular beacons.
455: At low temperature (phase A) molecular beacons and their targets spontaneously form duplexes. In this 
456: state molecular beacons are open and fluorescent. At higher temperature (phase B) duplexes 
457: are destabilized and molecular beacons are released, returning to their closed hairpin conformation, and fluorescence
458: decreases. As the temperature is raised further (phase C), the closed molecular beacons melt into fluorescent random coils.}
459: \end{center}
460: \end{figure}
461: 
We now analyze in detail the nature of the duplex formation or dissociation transition and the role
of the intermediate hairpin species.
The role of hairpin intermediates during the dissociation or formation of duplex species in solution
is of great importance in the case where short oligonucleotides
(molecular beacons) have to reliably identify and hybridize to accessible nucleotides within their targeted mRNA sequences.
Molecular beacons are DNA probes that form a stem-and-loop intermediate structure and possess an internally
quenched fluorophore. When they bind to complementary nucleic acids, they undergo a conformational transition that
switches on their fluorescence. Molecular beacons are commonly used to identify complementary strands in the presence of
unrelated nucleic acids. Understanding the thermodynamic basis of the enhanced specificity of molecular beacons for their
target sequences, and the underlying conformational transformations, is of great importance. A simple picture based on a
detailed thermodynamic analysis of the underlying phase transitions in solutions containing molecular beacons is given in fig.5 \cite{GBSTALFK}.
Experimental data give evidence for three phases: phase A, the probe-target duplex; phase B, the target-free molecular beacon in the form of a stem-loop
structure together with the coiled target; and phase C, in which the molecular beacon and the target are both coiled. An all-or-none mechanism is assumed for the
transitions between the phases. To understand the basis of molecular beacon specificity from first principles, we apply our formalism
to calculate a variety of thermodynamic characteristics such as the free energy, enthalpy and entropy. The idea was to compare the behavior
of molecular beacons in the presence of perfectly complementary target oligonucleotides with their behavior in the presence of targets
whose sequence creates a single mismatched base pair in the probe-target duplex. The sequence of the molecular beacon
used in this work is CGCTCCCAAAAAAAAAAACCGAGCG, and the complementary target is GGTTTTTTTTTTTGG.
In our calculations we do not restrict ourselves to the case of two-state transitions, in which only two types of conformational
species, fully folded and fully unfolded, are present in solution during the temperature scan. Rather, we consider the
ensemble of all possible intermediate states and thus obtain the most detailed possible picture of the melting process between the
folded and unfolded states of the single- and double-stranded forms.
The results of our calculations, together with the experimental data, are given in Table 1. Our calculations are in very good agreement
with the experimental data \cite{GBSTALFK}. Analysis of the calculated melting curves and intermediates reveals that the enhanced
specificity of molecular beacons is a result of their constrained conformational flexibility and the all-or-none mechanism of
their hybridization to the target sequence.
488: 
\begin{table}
\caption{Standard enthalpies and standard entropies for solutions containing 50 nM molecular beacons and
1 M target oligonucleotides in the presence of 100 mM KCl and 1 mM $\mathrm{MgCl_{2}}$ \cite{GBSTALFK}. Melting temperatures are for solutions with
50 nM molecular beacons and 300 nM target oligonucleotides. Experimental (exp) and calculated (cal) values are given for different mismatches at the 
same position (marked with 0) and for the same mismatch at the nearest left (marked with $-1$) and right (marked with $+1$) positions.} 
\fontsize{9}{10pt}\selectfont
\begin{tabular}{|l|c|c|c|c|c|c|c|}
\hline Mismatch & Position & \multicolumn{2}{c|}{$-\Delta H^0$ (kcal/mol)} & \multicolumn{2}{c|}{$-\Delta S^0$ (eu)} &
\multicolumn{2}{c|}{$T_m$ ($^{\circ}$C)}\\
\hline     &   & exp  & cal  & exp  & cal  & exp & cal\\
\hline T-A & 0 & 84  & 80  & 237  & 238  & 42 & 42 \\
\hline A-A & 0 & 69  & 62  & 201  & 202  & 27 & 28\\
\hline C-A & 0 & 61  & 61.2  & 175  & 202  & 23 & 28\\
\hline G-A & 0 & 65  & 61  & 185  & 202  & 28 & 28\\
\hline G-A & $-1$ & 72  & 65  & 208  & 218  & 29 & 27\\
\hline G-A & $+1$ & 74  & 65  & 213  & 217  & 29 & 27\\
\hline 
\end{tabular}
\end{table}
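
As a rough consistency check on Table 1, the tabulated enthalpies and entropies can be related to the melting temperature under a two-state approximation, which the full ensemble calculation does not assume. This is only a sketch under stated assumptions: the tabulated values are taken to describe formation of the probe-target duplex from the closed beacon and the free target, the target is in excess, and concentrations are referred to a 1 M standard state. The symbols $B_0$ and $T_0$ for the total beacon and target concentrations, and the value used for the gas constant $R$, are introduced here for illustration only. At the melting temperature half of the beacons are bound, so the equilibrium constant for duplex formation is $K = 1/(T_0 - B_0/2)$, and $\Delta H^0 - T_m \Delta S^0 = -R T_m \ln K$ gives
\begin{equation*}
T_m = \frac{\Delta H^0}{\Delta S^0 + R \ln (T_0 - B_0/2)} .
\end{equation*}
For the experimental values of the perfectly matched duplex (T-A row), with $\Delta H^0 = -84$ kcal/mol, $\Delta S^0 = -237$ eu, $B_0 = 50$ nM, $T_0 = 300$ nM and $R = 1.987$ cal mol$^{-1}$ K$^{-1}$,
\begin{equation*}
T_m \approx \frac{-84000}{-237 + 1.987 \ln (2.75 \times 10^{-7})}\ \mathrm{K} \approx \frac{-84000}{-267.0}\ \mathrm{K} \approx 314.6\ \mathrm{K} \approx 41\,^{\circ}\mathrm{C},
\end{equation*}
which is close to the measured value of $42\,^{\circ}$C.
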
512:    
513: 
514: 
Thus, the calculations show that, for a perfect match between the probe and target sequences, the 
free energy of phase A is dominated by a 
single conformational state of the probe-target duplex. The contributions from bulges, interior loops and dangling ends are 
negligible. The main contributions to 
the free energy of phase B come from the entropy of the coiled target and the free energy of the stem-loop structure 
of the molecular beacon. The flexibility of the molecular beacon around its hairpin structure is the main means 
of modulating the stability of phase B. Long stems increase the difference between the melting temperatures of perfectly 
complementary duplexes and mismatched duplexes. However, stems that are too long make the hairpin 
stable not only in phase B but also in phase A. On the 
other hand, hairpin loops that are too long decrease the stability of the hairpin, which can lead to the disappearance of phase B. 
Moreover, as the length of the molecular beacon increases, the free energy penalty resulting 
from a mismatched base pair in the probe-target duplex becomes relatively negligible, which decreases the sensitivity to the presence of 
a mismatch. Finally, the free energy of phase C is determined by the sum of the entropies of the random coils of the molecular beacon and its target. 
Our calculations are in full agreement with the experimental data and their thermodynamic analysis (fig. 5) \cite{GBSTALFK}.  
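
The effect of the constrained probe on mismatch discrimination can be illustrated with a simple linear sketch of the phase free energies. This is an illustration under stated assumptions, not part of the formalism itself: free energies are measured relative to phase C, each phase is described by temperature-independent effective enthalpies and entropies (with concentration terms absorbed into the entropies), and the symbols $\Delta H_A$, $\Delta S_A$, $\Delta H_B$, $\Delta S_B$ and $\delta G$ are introduced here for illustration only. With
\begin{equation*}
G_A(T) = \Delta H_A - T \Delta S_A, \qquad G_B(T) = \Delta H_B - T \Delta S_B, \qquad G_C(T) = 0,
\end{equation*}
the transition between phases A and B occurs at the temperature $\theta$ where $G_A = G_B$, that is
\begin{equation*}
\theta = \frac{\Delta H_A - \Delta H_B}{\Delta S_A - \Delta S_B} .
\end{equation*}
If a mismatch raises the free energy of phase A by $\delta G > 0$, assumed constant near the transition, the crossing point shifts to lower temperature by
\begin{equation*}
\Delta \theta = \frac{\delta G}{|\Delta S_A - \Delta S_B|},
\end{equation*}
whereas for an unstructured probe, whose duplex melts directly into the coiled state, the corresponding shift is $\Delta \theta^{'} = \delta G / |\Delta S_A|$. Since $\Delta S_A$ and $\Delta S_B$ are both negative and $|\Delta S_B| < |\Delta S_A|$, one finds $\Delta \theta > \Delta \theta^{'}$: the smaller entropy difference between the duplex and the closed hairpin makes the melting temperature more sensitive to the mismatch penalty.
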
529: 
530: 
531: 
532: \begin{figure}
533: \begin{center}
534: \includegraphics{FRE.eps}
\caption{Experimental and calculated free energy of a solution of molecular beacons in equilibrium with target oligonucleotides. The experimental
plots \cite{GBSTALFK} are labeled as follows: 1p, free energy of the perfectly matched duplex (phase A); 1m, free energy 
of the mismatched duplex (phase A); 2, free energy of the closed form of the molecular beacon and the coiled target (phase B). 
The calculated free energy curves are labeled as follows: A, free energy of the perfectly matched duplex (phase A); B, free energy 
of the mismatched duplex (phase A). Since molecular beacons are conformationally more constrained than unstructured probes, 
line 2 crosses lines 1p and 1m in such a way that the difference $\Delta \theta$ between the melting temperatures of perfectly complementary 
duplexes and mismatched duplexes is larger than the corresponding difference $\Delta \theta^{'}$ for an intermediate state of unstructured probe 
and target.}
543: \end{center}
544: \end{figure}
545: 
546: 
In conclusion, we have presented a general statistical
mechanical approach for describing the self-folding and hybridization of DNA and RNA sequences. 
The folding model deals with matches, mismatches, symmetric and asymmetric interior loops, stacked pairs in loop 
and dangling end regions, multi-branched loops, bulges and single base stacking that might exist at duplex ends or at the ends of helices. 
This allows calculations on both short and long sequences.

Calculations on short and long sequences show that, for short oligonucleotides, duplex formation often displays 
a two-state transition. However, for longer oligonucleotides, the thermodynamic properties of the single-strand 
self-folding transition affect the nature of the duplex formation transition, resulting in a population of 
intermediate hairpin species in the solution. The advantage of the new formalism is most evident 
when one needs to design relatively short oligonucleotides (molecular beacons) that must 
reliably identify and hybridize to accessible nucleotides within their targeted mRNA sequences. 
It is shown that the specificity of molecular beacons is enhanced if they form a stem-and-loop structure
with constrained conformational flexibility and hybridize to the target sequence by an all-or-none mechanism.

In recent years, a class of diverse regulatory RNAs (often denoted riboregulators) has emerged that regulate expression
at the posttranscriptional level. These regulatory RNAs fine tune cellular responses to stress conditions, integrating
environmental signals into global regulation. It seems that the structural constraints that enhance the specificity of molecular 
recognition are also a general feature of the mechanism of action of riboregulators. Thus, the formalism developed in this work 
can serve as a first step toward a general approach that takes into account both the affinity and the specificity 
of several types of intermolecular interactions through a network of RNA-RNA, RNA-DNA, RNA(DNA)-protein 
or RNA(DNA)-small molecule contacts. 
568: 
569: 
570: \bibliography{hybrid}
571: 
572: 
573: \begin{thebibliography}{99}
574: 
575: 
576: \bibitem{SNC} N. C. Seeman (1999) {\em Trends Biotechnol.} {\bf 17} 437.
577: 
578: \bibitem{SG} S. Gottesman (2002) {\em GENES and DEVELOPMENT} {\bf 16} 2829.
579: 
\bibitem{FRES8301} S. Freier, D. Alkema, A. Sinclair, T. Neilson and D. H. Turner (1983) {\em Biochemistry} {\bf 22} 6198.

\bibitem{GASRRB} G. A. Soukup and R. R. Breaker (1999) {\em Proc. Natl Acad. Sci. USA} {\bf 96} 3584.

\bibitem{BYATAMFSJN} B. Yurke, A. J. Turberfield, A. P. Jr. Mills, F. C. Simmel and J. L. Neumann (2000) {\em Nature} {\bf 406} 605.

\bibitem{HYXZSNS} H. Yan, X. Zhang, Z. Shen and N. C. Seeman (2002) {\em Nature} {\bf 415} 62.
587: 
588: \bibitem{MNSDS} M. N. Stojanovic and D. Stefanovic (2003) {\em Nat. Biotechnol.} {\bf 21} 1069.
589: 
590: \bibitem{RBNCCJPRLA} R. S. Braich, N. Chelyapov, C. Johnson, P. W. K. Rothemund and L. Adleman (2002) {\em Science} {\bf 296} 499.
591: 
\bibitem{DSDLDMRD} D. D. Shoemaker, D. A. Lashkari, D. Morris, M. Mittman and R. W. Davis (1996) {\em Nature Genet.} {\bf 16} 450.
593: 
594: \bibitem{SMJGDDSSM} S. Brenner, M. Johnson, J. Bridgham, G. Golda, D. H. Lloyd, D. Johnson, S. Luo, S. McCurdy, M. Foy, M. Ewan et al. (2000) 
595: {\em Nat. Biotechnol.} {\bf 18} 630.
596: 
597: \bibitem{GWMG} G. Werstuck and M. R. Green (1998) {\em Science} {\bf 282} 296.
598: 
599: \bibitem{ITCB} I. Jr. Tinoco and C. Bustamante (1999) {\em J. Mol. Biol.} {\bf 293} 271.
600: 
601: \bibitem{DMMBSFJWDT} D. H. Mathews, M. E. Burkard, S. M. Freier, J. R. Wyatt and D. H. Turner (1999) {\em  RNA} {\bf 5} 1458.
602: 
603: \bibitem{GS} G. Stormo (2003) {\em Molecular Cell} {\bf 11} 1419.
604: 
605: \bibitem{MJWR}  M. Mandal, B. Boese, J. E. Barrick, W. C. Winkler, and R. R. Breaker. (2003) {\em Cell} {\bf 113} 577.
606: 
607: \bibitem{MTMPAS}  M. T. McManus and P. A. Sharp (2002) {\em Nature Rev. Genet.} {\bf 3} 737.
608: 
609: \bibitem{TSCSNB}  T. A. Vickers, S. Koo, C. F. Bennett, S. T. Crooke, N. M. Dean, and B. F. Baker (2003) {\em J. Biol. Chem.} {\bf 278} 7108.
610: 
611: \bibitem{RADMZ}  R. A. Dimitrov and M. Zuker (2003) {\em Biophysical J.} {\bf 87} 215.
612: 
\bibitem{SUGN8701} N. Sugimoto, R. Kierzek and D. H. Turner (1987) {\em Biochemistry} {\bf 26} 4554.

\bibitem{HICD8501} D. R. Hickey and D. H. Turner (1985) {\em Biochemistry} {\bf 24} 2086.

\bibitem{PUGJ8901} J. D. Puglisi and I. Jr. Tinoco (1989) {\em Methods in Enzymology}
{\bf 180} 304.

\bibitem{BLAR7201} R. D. Blake (1972) {\em Biopolymers} {\bf 11} 913.

\bibitem{BORP7401} P. N. Borer, B. Dengler, I. Jr. Tinoco and O. C. Uhlenbeck (1974) {\em J Mol Biol}
{\bf 86} 843.
624: 
\bibitem{MCCJ9001} J. S. McCaskill (1990) {\em Biopolymers} {\bf 29} 1105.

\bibitem{HOFI9401} I. L. Hofacker, W. Fontana, P. F. Stadler, S. Bonhoeffer, M. Tacker and P. Schuster (1994) {\em Monatshefte f\"{u}r Chemie}
{\bf 125} 167.
629: 
630: \bibitem{MATO9601}  O. Matzura and A. Wennborg (1996) {\em Comput Appl Biosci} {\bf
631: 12} 247.
632: 
633: \bibitem{CRCIT} C. R. Cantor and I. Jr. Tinoco (1965) {\em J Mol Biol} {\bf 13} 65.
634: 
635: 
\bibitem{ZUKM8901} M. Zuker (1989) {\em Methods Enzymol} {\bf 180} 262.

\bibitem{WILA8601}  A. L. Williams and I. Jr. Tinoco (1986) {\em Nucleic Acids Res} {\bf 14} 299.

\bibitem{WATM8301}  M. S. Waterman (1983) {\em Proc Natl Acad Sci USA} {\bf 80} 3123.

\bibitem{WATM8502} M. S. Waterman and T. H. Byers (1985) {\em Math Biosci} {\bf 77} 179.

\bibitem{PWNS} P. Wu and N. Sugimoto (2000) {\em Nucleic Acids Res} {\bf 28} 4762.
645: 
646: 
647: \bibitem{ZUKM8902}  M. Zuker (1989) {\em Science} {\bf 244} 48.
648: 
649: \bibitem{ZUKM8903}  M. Zuker (1989) {\em J. Mol. Biol.} {\bf 288} 911.
650: 
651: \bibitem{ZUKM8904}  M. Zuker (2000) {\em Curr. Opin. Struct. Biol.} {\bf 10} 303.
652: 
\bibitem{ZUKM0305}  N. R. Markham and M. Zuker (2005) {\em Nucleic Acids Res} {\bf 33} W577.
654: 
655: 
\bibitem{BATRT9901} R. T. Batey, R. P. Rambo, and J. A. Doudna (1999) {\em Angew. Chem. Int. Ed.} {\bf 38} 2326.
657: 
658: \bibitem{EDRBBMJD} E. A. Doherty, R. T. Batey, B. Masquida and J. A. Doudna (2001) {\em Nature Structural Biology}
659: {\bf 8} 339.
660: 
661: \bibitem{TRSTP} T. R. Sosnick and T. Pan (2003) {\em Current Opinion in Structural Biology}
662: {\bf 13} 309.
663: 
664: \bibitem{VDAF} V. Daggett and A. Fersht (2003) {\em Nature Rev. Mol. Cell Biol.} {\bf 4} 497.
665: 
666: \bibitem{SWGSMYCR}  S. P. Walton, G. N. Stephanopoulos, M. L. Yarmush, and C. M. Roth (2002) {\em Biophysical J.}
667: {\bf 82} 366.
668: 
669: \bibitem{SWGNSMYMR}  S. P. Walton, G. N. Stephanopoulos, M. L. Yarmush, and C. M. Roth (1999) {\em Biotechnol. Bioeng.}
670: {\bf 65} 1.
671: 
\bibitem{TVJWSF}  T. A. Vickers, J. R. Wyatt and S. M. Freier (2000) {\em Nucleic Acids Research} {\bf 28} 1340.
673: 
674: 
\bibitem{GBSTALFK}  G. Bonnet, S. Tyagi, A. Libchaber, and F. R. Kramer (1999) {\em Proc. Natl. Acad. Sci. USA}
{\bf 96} 6171.
677: 
\bibitem{SDATD} S. Freier, D. Alkema, A. Sinclair, T. Neilson, and D. H. Turner (1983) {\em Biochemistry}
{\bf 22} 6198.

\bibitem{NSRKDHT} N. Sugimoto, R. Kierzek and D. H. Turner (1987) {\em Biochemistry}
{\bf 26} 4554.

\bibitem{DRHDHT} D. R. Hickey and D. H. Turner (1985) {\em Biochemistry}
{\bf 24}  2086.
686: 
\bibitem{JDPDHT} J. D. Puglisi and I. Jr. Tinoco (1989) {\em Methods in Enzymology}
{\bf 180}  304.
689: 
690: \end{thebibliography}
691: 
692: 
693: 
694: 
695: \end{document}
699: