1: \documentclass[12pt]{article}
2: \usepackage{amssymb,amsmath}
3: \usepackage{epsfig}
4:
5: \begin{document}
6:
7: \title{A new formalism for calculation of the partition function of single stranded nucleic acids}
8:
9: \author{Roumen A. Dimitrov \\
10: University of Sofia, Faculty of Physics,\\ Department of Theoretical Physics, \\
11: 5, James Bouchier Blvd., 1164 Sofia, Bulgaria, \\e-mail: dimitrov@phys.uni-sofia.bg
12: }
13:
14:
16:
17: \maketitle
18:
19: \begin{abstract}
20: A new formalism for calculation of the partition function of single stranded nucleic acids is presented.
Secondary structures and the topology of structural elements are the level of resolution used.
The folding model deals with matches, mismatches, symmetric and asymmetric interior loops, stacked pairs in loop
and dangling-end regions, multi-branched loops, bulges and single-base stacking that might exist at duplex ends or at the ends of helices.
Calculations on short and long sequences show that, for short oligonucleotides, duplex formation often displays
a two-state transition. However, for longer oligonucleotides, the thermodynamic properties of the single-strand
self-folding transition affect the nature of the duplex transition, resulting in a population of
intermediate hairpin species in solution. The role of intermediate hairpin species is analyzed for the case
in which short oligonucleotides (molecular beacons) must reliably identify and hybridize to
accessible nucleotides within their target mRNA sequences. It is shown that the enhanced specificity of molecular beacons
is a result of their constrained conformational flexibility and the all-or-none mechanism of their hybridization to the target
sequence.
32: \end{abstract}
33:
34:
35: \section{Introduction}
36: Nucleic acids hold great promise as a design medium for the
37: construction of nanoscale devices with novel mechanical or chemical function \cite{SNC}. Efforts are currently underway in
38: many laboratories to use DNA and RNA molecules for applications in transport,
39: switching \cite{GASRRB,BYATAMFSJN,HYXZSNS}, circuitry \cite{MNSDS}, DNA computing \cite{RBNCCJPRLA} and DNA chips
\cite{DSDLDMRD,SMJGDDSSM}. Conformational switches, or a diversity of conformations, have been shown or
are suspected to be involved in several important processes, such as regulation of gene expression,
translational regulation, mutation and repair \cite{GWMG,SG,GS}.
These processes involve several types of interactions
through a network of RNA-RNA, RNA-DNA and RNA(DNA)-protein contacts, RNA(DNA) self-folding, and RNA(DNA)-small-molecule
contacts.
46:
47: Comparison of short RNAs/DNAs with different base pairs,
48: loop sequences, bulges, etc. has yielded an extremely useful
49: database of thermodynamic parameters from which the stabilities of conformational states of larger
50: nucleic acid sequences can be estimated \cite{FRES8301,SUGN8701,HICD8501,PUGJ8901,BLAR7201}.
The estimation of these thermodynamic parameters is based on the
nearest-neighbor approximation, which considers only interactions between
residues that are adjacent along the sequence \cite{BORP7401}.
54:
There have been several major improvements in the calculation of the
partition function of single-stranded nucleic acids based on the McCaskill
algorithm \cite{MCCJ9001,HOFI9401,MATO9601} and in the estimation of the
free energy based on free energy minimization and the
corresponding sub-ensemble around the minimum free energy
conformation \cite{ZUKM8901,WILA8601,WATM8301,WATM8502,ZUKM8902}.
61:
62:
In this work, the level of resolution is that of secondary structures and the topology of structural elements.
However, atomic coordinates are also taken into account in the general expressions.
Unlike proteins \cite{VDAF}, whose secondary structures usually
depend on the global amino acid sequence, DNA/RNA molecules
are currently thought to assemble in a hierarchical manner \cite{BATRT9901,EDRBBMJD,TRSTP}.
Folding can be conceptually partitioned into two steps: formation
of the secondary structure and formation of the spatial structure \cite{ITCB}.
As a result, DNA/RNA molecules exhibit a modular structure in which individual
structural motifs show independent characteristics.
72:
Therefore, investigating the overall properties of DNA/RNA molecules through the variety of local
structural motifs, their interactions and their distributions along the sequence requires an appropriate theoretical approach.
This is particularly important given the recently increased interest in
predicting target sites for antisense oligonucleotides in
highly structured DNA/RNA molecules \cite{SWGSMYCR,GBSTALFK,DMMBSFJWDT,SWGNSMYMR,TVJWSF}.
Because of its economic value and short experimental cycle, antisense technology has been widely accepted as a tool
to study gene function and to validate drug targets. Antisense oligonucleotides can
potentially suppress the expression of a particular gene through mechanisms such as RNase H-mediated mRNA cleavage, destabilization of the
target mRNA, or aberration of translation or splicing. Understanding the conformational constraints and the transformations between
different local structural motifs is therefore of great practical importance. For example, conformational switches of hairpin-shaped oligonucleotide
primers can be useful for enhancing the specificity of nucleic acid amplification reactions. Interactions with short
oligonucleotides or small metabolic molecules can lead to conformational switches in the DNA/RNA target molecules
\cite{MTMPAS,TSCSNB}. These conformational switches can be used for sensing and modulating complex biochemical networks in
a variety of important biological processes \cite{MJWR,GS}.
87:
With this local-structural-motif approach in mind, we take as a starting point our previous work \cite{RADMZ},
where we presented a new formalism for hybridization between DNA and RNA molecules.
There, hybridization accounted only for stacked pairs, interior loops, bulges and, at the
ends, dangling bases; stacked pairs in loop and dangling-end regions, as well as multi-branched loops,
were not considered. The formalism was applied only to
short DNA/RNA sequences. Another limitation was that it was not
applied to the estimation of the partition function for self-folding. The self-folding of individual
DNA/RNA molecules was instead based on free energy minimization and the
corresponding sub-ensemble around the minimum free energy conformation at each temperature,
as given by Zuker's mfold program \cite{ZUKM8902}. This led to some inconsistency in the overall calculations:
for sequences with non-two-state transitions, the populations of some intermediate species were poorly predicted.
Recently, mfold has been updated using the McCaskill algorithm \cite{MCCJ9001, ZUKM0305} and can now calculate the
ensemble free energy in addition to low-energy conformations. A comparison of mfold
with the formalism developed here would be of interest in future work.
102:
In this work we present a new formalism for estimating the partition function for self-folding.
The formalism uses an approach based on the left-right recursion algorithm that we developed for the free energy
calculation of duplexes \cite{RADMZ}.
All possible conformations of single-stranded DNA or RNA sequences in solution are explored. The folding model
deals with matches, mismatches, symmetric and asymmetric interior loops, stacked pairs in loop and dangling-end regions,
multi-branched loops, bulges and single-base stacking that might exist at duplex ends or at the ends of helices.
Calculations on short and long sequences show that, for short oligonucleotides, duplex formation often displays
a two-state transition. However, for longer oligonucleotides, the thermodynamic properties of the single-strand
self-folding transition affect the nature of the duplex transition, resulting in a population of
intermediate hairpin species in solution. The advantage of the new formalism is demonstrated especially clearly
when one needs to design relatively short oligonucleotides (molecular beacons) that must
reliably identify and hybridize to accessible nucleotides within their target mRNA sequences.
It is shown that the specificity of molecular beacons is enhanced if they are designed to form a stem-and-loop structure
with constrained conformational flexibility and an all-or-none mechanism of hybridization to the target sequence.
117:
118: \section{Methods}
119:
120:
121: \subsection{Recursive calculation }
122:
As the temperature increases, the overwhelming majority of
single-stranded conformations tend toward
their unfolded states. At each temperature there is an ensemble
of conformational states, each characterized by the
fraction of its base pairs that are melted at that temperature and by
their locations along the sequence. Thus, along the sequence there is a variety of local
structural motifs, characterized by alternating
loops (single-stranded regions) and double-stranded regions. The
location and length of these local structural motifs depend on their relative
Boltzmann statistical weights. In this work we calculate the partition
function of the single-stranded form using the method
developed for double-stranded forms.
135:
136:
In our previous work \cite{RADMZ} (Fig.~1), the polynucleotide sequences of the double-stranded
forms are described as follows: sequence $1$ is represented by $S_{1}=r_{11}, r_{12},
r_{13}, \dots, r_{1i}, \dots, r_{1N_{1}}$ and sequence $2$ by
$S_{2}=r_{21}, r_{22}, r_{23}, \dots, r_{2j}, \dots, r_{2N_{2}}$,
where $N_{1}$ and $N_{2}$ are their lengths and $r_{1i}$ and $r_{2j}$ are the spatial coordinates of the
corresponding nucleotides of sequences $1$ and $2$. The recursive calculation is
based on the condition that at least one nucleotide of sequence $1$ and one nucleotide of
sequence $2$ are in contact, $r_{1i}-r_{2j}$, with $1\leq i\leq N_{1}$ and
$1\leq j\leq N_{2}$. The sequences are enumerated from the $5^{'}$- to
the $3^{'}$-end. The contact $r_{1i}-r_{2j}$ includes
an initiation free energy term, $F^{initiation}$, needed to bring the two sequences
together. Each nucleotide pair $r_{1i}-r_{2j}$
formally divides the hybridized form $S_{1}S_{2}$ of sequences
$1$ and $2$ into two parts, left $L$ and right $R$, such that
the free energy $F\left( S_{1}S_{2}\right)$ of $S_{1}S_{2}$ is the
sum of the free energies of the left, ${F\!L}\left( r_{1i},r_{2j}\right)$,
and right, ${F\!R}\left( r_{1i},r_{2j}\right)$, parts plus the initiation
free energy $F^{initiation}$, which is assumed to be the same for all
possible pairs $r_{1i}-r_{2j}$. Thus,
157:
158:
159:
160: \begin{equation}
161: F\left( S_{1}S_{2}\right) = {F\!L}\left( r_{1i},r_{2j}\right) + {F\!R}\left(
162: r_{1i},r_{2j}\right) +F^{initiation}
163: \end{equation}
164:
165:
This additive property of the energy rules, based on the nearest-neighbor
approximation, forms the basis of the recursive calculation of the
partition function of $S_{1}S_{2}$. The additivity of the free
energy leads to a multiplication of the partition functions of the
left, ${Z\!L}$, and right, ${Z\!R}$, parts \cite{RADMZ}.
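Substituting $F=-RT\ln Z$ into Eq.~(1) shows explicitly how the additivity of the free energies turns into a product of partition functions for a fixed contact $r_{1i}-r_{2j}$. This is only a restatement of Eq.~(1), assuming ${Z\!L}=\exp(-{F\!L}/RT)$ and ${Z\!R}=\exp(-{F\!R}/RT)$:

\begin{eqnarray*}
\exp\left(-\frac{F\left(S_{1}S_{2}\right)}{RT}\right) & = & \exp\left(-\frac{{F\!L}\left(r_{1i},r_{2j}\right)+{F\!R}\left(r_{1i},r_{2j}\right)+F^{initiation}}{RT}\right) \\
& = & {Z\!L}\left(r_{1i},r_{2j}\right)\,{Z\!R}\left(r_{1i},r_{2j}\right)\,\exp\left(-\frac{F^{initiation}}{RT}\right)
\end{eqnarray*}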
171:
172:
173:
174: \begin{figure}
175: \begin{center}
176: \includegraphics{f3.eps}
177: \caption{Additive property of the free energy rules based on nearest-neighbor approximation: A- self-folding, B- hybridization \cite{RADMZ}.}
178: \end{center}
179: \end{figure}
180:
181:
182:
Our main focus in this work is the partition function of the single-stranded form, which,
as for the double-stranded form, is described in terms of separate parts along the sequence.
The sequence is represented by $S=r_{1}, r_{2}, r_{3}, \dots, r_{i}, \dots, r_{N}$,
where $N$ is its length and $r_{i}$ are the spatial coordinates
of the nucleotides of $S$.
As before, the recursive calculation is based on the condition that at least
two nucleotides of the sequence, $r_{i}-r_{j}$, are in contact.
190:
In contrast to the double-stranded form, the initiation free energy term now represents the
formation of a loop between positions $i$ and $j$ (Fig.~1).
The sequence is enumerated from the $5^{'}$- to the $3^{'}$-end.
Each nucleotide pair $r_{i}-r_{j}$ formally divides the self-hybridized form
of the sequence into three parts, left, middle and right, such that
the free energy $F\left( S\right)$ of $S$ is the
sum of the free energies of the left ${F\!L}\left( r_{1},r_{i}\right)$, middle
${F\!M}\left( r_{i},r_{j}\right)$ and right ${F\!R}\left( r_{j},r_{N}\right)$ parts:
199:
200:
201:
202: \begin{eqnarray}
203: F\left( S\right) = {F\!L}\left( r_{1},r_{i}\right) + {F\!M}\left(r_{i},r_{j}\right) + {F\!R}\left(r_{j}, r_{N}\right)
204: \end{eqnarray}
205:
The partition functions of the left, middle and right parts are given by the following recursions:
207:
208:
209: Left part:
210:
211: \begin{eqnarray}
212: {Z\!L}\left( r_{1}, r_{i}\right) & = & {Z\!L}\left( r_{1}, r_{i-1}\right) + \nonumber \\
213: & & \sum_{1\leq k<i}{Z\!L}\left( r_{1}, r_{k}\right)
214: \exp \left( -\frac{{F\!M}\left( r_{k},r_{i}\right)}{RT}\right) \\
{F\!L}\left( r_{1}, r_{i}\right) & = & -RT \ln \left[ {Z\!L}\left( r_{1}, r_{i}\right)\right]
216: \end{eqnarray}
217:
218:
219: Middle part:
220:
221:
222: \begin{eqnarray}
223: {Z\!M}\left( r_{i},r_{j}\right) & = & {Z\!M}^{open}\left( r_{i},r_{j}\right) + \nonumber \\
224: & & \sum _{i<k<l}\sum _{j>l>k}{Z\!M}\left( r_{k},r_{l}\right)
225: \exp \left( -\frac{F\left( r_{i},r_{j},r_{k},r_{l}\right)}{RT}\right)
226: \end{eqnarray}
227:
228: \begin{eqnarray}
229: F\left( r_{i},r_{j},r_{k},r_{l}\right) & = & {F\!L}\left( r_{i},r_{k}\right)+{F\!R}\left( r_{l},r_{j}\right)
230: \end{eqnarray}
231:
232: \begin{eqnarray}
{F\!M}\left( r_{i},r_{j}\right) & = & -RT \ln \left[ {Z\!M}\left( r_{i},r_{j}\right)\right]
234: \end{eqnarray}
235:
236:
237: Right part:
238:
239: \begin{eqnarray}
240: {Z\!R}\left( r_{j}, r_{N}\right) & = & {Z\!R}\left( r_{j+1}, r_{N}\right) + \nonumber \\
241: & & \sum_{N\geq k>j}{Z\!R}\left( r_{k}, r_{N}\right)
242: \exp \left( -\frac{{F\!M}\left( r_{j},r_{k}\right)}{RT}\right) \\
{F\!R}\left( r_{j}, r_{N}\right) & = & -RT \ln \left[ {Z\!R}\left( r_{j}, r_{N}\right)\right]
244: \end{eqnarray}
245:
246:
${F\!L}\left( r_{1}, r_{i}\right)$ and ${F\!R}\left( r_{j}, r_{N}\right)$ correspond to
the free energies of self-folding of the $5'$ and $3'$ dangling ends of the sequence.
Clearly, ${F\!L}\left( r_{1}, r_{N}\right) = {F\!R}\left( r_{1}, r_{N}\right)$, since both describe the folding of the whole sequence.
The term ${F\!M}\left(r_{i},r_{j}\right)$ corresponds to the initiation of a loop in the middle part.
Thus, ${F\!M}^{open}\left(r_{i},r_{j}\right) = -RT\ln[{Z\!M}^{open}\left( r_{i},r_{j}\right)]$ represents the free energy of initiating a loop without internal base-pair
contacts, whereas $F\left( r_{i},r_{j},r_{k},r_{l}\right)$ accounts for
all possible distributions of structural motifs
(stacked pairs, bulges, symmetric and asymmetric loops, single-stranded regions, hairpins and multi-branched loops)
along the interior regions $(i,k)$ and $(l,j)$.
For example, when $\left| k-i\right| = 1$ and $\left| l-j\right| =1$, the free energy
$F\left( r_{i},r_{j},r_{k},r_{l}\right)$ represents a stacked pair
belonging to a secondary structure; when $\left| k-i\right| =2$
and $\left| l-j\right| =1$, or $\left| k-i\right| =1$ and
$\left| l-j\right| =2$, we have a bulge. When
$\left| k-i\right| \neq \left| l-j\right|$ and there are no base-pair contacts in the loop regions,
$F\left( r_{i},r_{j},r_{k},r_{l}\right)$
represents an asymmetric internal loop (including the case of a bulge on one strand
and a loop on the other, and vice versa), while
$\left| k-i\right| =\left| l-j\right|$ leads to a symmetric loop
(including the case of a bulge on both strands). The
presence of internal base-pair contacts in the loop regions leads to hairpins and multi-branched loops.
For a detailed description of the free energies of bulges, symmetric and
asymmetric internal loops and dangling ends, we refer the reader to the
recent review by Zuker \cite{ZUKM8904}.
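The classification rules above can be sketched as a small Python function. The helper \texttt{classify\_motif} is hypothetical and not part of the original formalism; it assumes $i<k<l<j$ with no base pairs inside the enclosed regions, and, as a generalization of the single-base case described in the text, it treats any loop with unpaired bases on only one side as a bulge.

```python
def classify_motif(i, j, k, l):
    """Classify the loop closed by the outer pair (i, j) and the inner pair (k, l).

    Assumes i < k < l < j and no other base pairs between the two pairs,
    following the |k - i| and |l - j| rules described in the text.
    """
    if not (i < k < l < j):
        raise ValueError("expected i < k < l < j")
    left, right = k - i, j - l      # steps along the 5' side and the 3' side
    if left == 1 and right == 1:
        return "stacked pair"       # no unpaired bases on either side
    if min(left, right) == 1:
        return "bulge"              # unpaired bases on one side only
    if left == right:
        return "symmetric interior loop"
    return "asymmetric interior loop"
```

For example, the outer pair (1, 10) with inner pair (3, 9) leaves one unpaired base on the $5'$ side only and is classified as a bulge.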
271:
272:
Finally, using the multiplicative property of the partition functions of the left, middle and
right parts, the total partition function is:
275:
276:
277: \begin{eqnarray}
278: Z\left( S\right) =\sum _{1\leq i<j\leq
279: N}\left[{{Z\!L}\left( r_{1}, r_{i}\right) {Z\!M}\left( r_{i},r_{j}\right){Z\!R}\left( r_{j}, r_{N}\right)}\right]
280: \end{eqnarray}
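The recursions above are specific to this formalism. As a hedged illustration of the underlying idea only (additive free energies turning into sums and products of partition functions), the Python sketch below uses a much simpler toy model of the Nussinov/McCaskill type, which is an assumption and not the energy model of this work: every allowed base pair contributes the same hypothetical free energy \texttt{E\_PAIR}, hairpin loops must contain at least \texttt{MIN\_LOOP} unpaired bases, and pseudoknots are excluded. The recursive result is checked against brute-force enumeration of all non-crossing structures.

```python
import itertools
import math
from functools import lru_cache

# Toy model parameters (assumptions for illustration, not values from this work).
RT = 0.6163      # R*T in kcal/mol at 37 C (R = 1.987e-3 kcal/(mol K), T = 310.15 K)
E_PAIR = -1.0    # hypothetical free energy contributed by every base pair, kcal/mol
MIN_LOOP = 3     # hypothetical minimum number of unpaired bases in a hairpin loop
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def can_pair(seq, i, j):
    """True if positions i < j may form a base pair in the toy model."""
    return j - i > MIN_LOOP and (seq[i], seq[j]) in PAIRS


def partition_function(seq):
    """Recursive toy partition function over pseudoknot-free secondary structures."""
    w = math.exp(-E_PAIR / RT)  # Boltzmann weight of one base pair

    @lru_cache(maxsize=None)
    def z(i, j):
        if j <= i:                   # empty or one-nucleotide segment: open state only
            return 1.0
        total = z(i, j - 1)          # nucleotide j unpaired
        for k in range(i, j):        # nucleotide j paired with k; free energies add,
            if can_pair(seq, k, j):  # so the weights of the two sub-segments multiply
                total += z(i, k - 1) * w * z(k + 1, j - 1)
        return total

    return z(0, len(seq) - 1)


def brute_force(seq):
    """Sum Boltzmann weights over every set of compatible, non-crossing pairs."""
    n = len(seq)
    candidates = [(i, j) for i in range(n) for j in range(i + 1, n) if can_pair(seq, i, j)]
    w = math.exp(-E_PAIR / RT)
    total = 0.0
    for r in range(len(candidates) + 1):
        for subset in itertools.combinations(candidates, r):
            used = [b for pair in subset for b in pair]
            if len(used) != len(set(used)):
                continue             # some nucleotide would be paired twice
            if any(i < k < j < l for (i, j) in subset for (k, l) in subset):
                continue             # crossing pairs (pseudoknots) are excluded
            total += w ** len(subset)
    return total


if __name__ == "__main__":
    seq = "GGGAAACCC"
    print(partition_function(seq), brute_force(seq))
```

The brute-force sum grows exponentially with the number of candidate pairs, which is why a recursion is needed for realistic sequence lengths.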
281:
282:
283: \subsubsection{Pair probabilities}
284:
Once the partition function is known, the probability distributions of various
conformational properties can be derived. First, however, we need a
recursive form for the free energy term ${F\!L}\left( r_{1i},r_{2j}\right)$ in Eq.~(1),
which represents the free energy of the left part in the case of hybridization. In our
previous work \cite{RADMZ} we gave an expression for ${F\!L}\left( r_{1i},r_{2j}\right)$ that
did not consider stacked pairs in loop and dangling-end regions or multi-branched loops. Based on the new formalism
developed above, a general recursive form for the left partition function ${Z\!L}^{h}\left( r_{i},r_{j}\right)$
in the case of hybridization can be written as follows:
293:
294:
295: \begin{eqnarray}
296: {Z\!L}^{h}\left( r_{i},r_{j}\right) & = & {Z\!L}\left( r_{1},r_{i}\right){Z\!R}\left( r_{j}, r_{N}\right) + \nonumber \\
297: & & \sum _{1\leq k<i}\sum _{N\geq l>j}{Z\!L}^{h}\left( r_{k},r_{l}\right)
298: \exp \left( -\frac{F\left( r_{i},r_{j},r_{k},r_{l}\right)}{RT}\right)
299: \end{eqnarray}
300:
301: \begin{eqnarray}
302: {F\!L}^{h}\left( r_{i},r_{j}\right) & = & -RT \ln \left[ {Z\!L}^{h}\left( r_{i},r_{j}\right)\right]
303: \end{eqnarray}
304:
305:
We can now turn to the calculation of base-pairing probabilities. For example, the probabilities $P(r_{i},r_{j})$ and
$P(r_{i},r_{j},r_{i+1},r_{j-1})$ of a single base pair $r_{i}-r_{j}$ and of a double (stacked) base pair $r_{i}-r_{j},r_{i+1}-r_{j-1}$
are:
309:
310:
311: \begin{equation}
312: P\left(r_{i},r_{j}\right) = \frac{{Z\!L}^{h}\left( r_{i},r_{j}\right){Z\!M}\left( r_{i},r_{j}\right)}
313: {Z\left( S\right)}
314: \end{equation}
315:
316:
317: \begin{equation}
318: P\left(r_{i},r_{j},r_{i+1},r_{j-1}\right) = \frac{{Z\!L}^{h}\left(r_{i},r_{j}\right)
319: {\exp \left( -\frac{F\left(r_{i},r_{j},r_{i+1},r_{j-1}\right) }{RT}\right)}{{Z\!M}\left( r_{i+1},r_{j-1}\right)}}
320: {Z\left( S\right)}
321: \end{equation}
322:
323:
324: \begin{figure}
325: \begin{center}
326: \includegraphics{f4.eps}
\caption{Base-pair contacts and their free energy contributions in the case of an open loop and a
branched hairpin. An example is also given of conformational switching between the loop and the hairpin as a result
of interaction of the loop with a short oligonucleotide. At the same time, the subregion $\{p,\dots,q\}$ (involved in a multi-branched loop)
has to unfold before it can hybridize with the short oligonucleotide.}
331: \end{center}
332: \end{figure}
333:
where $F\left(r_{i},r_{j},r_{i+1},r_{j-1}\right)$ is the free energy of the stack formed by the two nearest-neighbor base pairs $r_{i}-r_{j}$ and $r_{i+1}-r_{j-1}$.
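To make the meaning of these probabilities concrete, the Python sketch below computes $P(r_i,r_j)$ and $P(r_i,r_j,r_{i+1},r_{j-1})$ as Boltzmann-weighted averages over an explicitly enumerated ensemble. It uses an assumed toy energy model (constant hypothetical pair energy \texttt{E\_PAIR}, minimum loop size \texttt{MIN\_LOOP}, no pseudoknots) rather than the recursions of this work, so it is only practical for very short sequences; indices are 0-based.

```python
import itertools
import math

RT = 0.6163      # R*T in kcal/mol at 37 C (assumed temperature)
E_PAIR = -1.0    # hypothetical free energy per base pair, kcal/mol
MIN_LOOP = 3     # hypothetical minimum hairpin loop size
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def structures(seq):
    """Yield every set of compatible, non-crossing base pairs (toy model)."""
    n = len(seq)
    cand = [(i, j) for i in range(n) for j in range(i + MIN_LOOP + 1, n)
            if (seq[i], seq[j]) in PAIRS]
    for r in range(len(cand) + 1):
        for s in itertools.combinations(cand, r):
            used = [b for p in s for b in p]
            if len(used) == len(set(used)) and not any(
                    i < k < j < l for (i, j) in s for (k, l) in s):
                yield frozenset(s)


def pair_probabilities(seq):
    """Return Z, P(i, j) and the stacked-pair probability P(i, j, i+1, j-1)."""
    weights = {s: math.exp(-E_PAIR * len(s) / RT) for s in structures(seq)}
    z = sum(weights.values())
    p_single, p_stack = {}, {}
    for s, w in weights.items():
        for (i, j) in s:
            p_single[(i, j)] = p_single.get((i, j), 0.0) + w / z
            if (i + 1, j - 1) in s:          # pair (i, j) stacked on (i+1, j-1)
                p_stack[(i, j)] = p_stack.get((i, j), 0.0) + w / z
    return z, p_single, p_stack
```

By construction, the probability of a stacked pair can never exceed the probability of its outer pair, which is a useful sanity check for any implementation of the two equations above.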
335:
336:
Of particular importance is also the ability to monitor
the transition between the folded and unfolded structures, as well
as the partially folded conformational intermediates, as a function of temperature, by any physical
property that depends on the number of base pairs formed.
Fortunately, both the absorption spectra and the thermodynamics are
physical properties that are consistent with nearest-neighbor
models \cite{PUGJ8901,BLAR7201}. In other words, a given
nearest-neighbor doublet must have the same absorbance
or melting free energy regardless of whether it is located in the
interior or at the ends of the sequence. In this way, the property
monitored as a function of temperature is proportional to the
fraction of base pairs that are stacked as the nucleic acid molecule
melts \cite{RADMZ}.
350:
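Using the base-pairing probabilities, we can express the equilibrium extent of base pairing, $\theta$, as a sum
over all pairs $i<j$; this sum is the expected number of base pairs, and dividing it by $N/2$ gives the fraction of nucleotides that are paired:

\begin{eqnarray}
\theta & = & \sum _{1\leq i<j\leq N}P(r_{i},r_{j})
\end{eqnarray}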
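As a hedged sketch, the sum for $\theta$ and the per-nucleotide pairing probability $P(r_i)$, obtained by summing $P(r_i,r_n)$ over all partners $n$, can be computed from a table of pair probabilities as follows. The helper \texttt{pairing\_summary} and its dictionary input are assumptions for illustration; indices are 0-based, unlike the 1-based notation of the equations.

```python
def pairing_summary(pair_prob, n):
    """Return theta (sum of P(r_i, r_j)) and the list of per-nucleotide P(r_i).

    pair_prob maps (i, j) with 0 <= i < j < n to the probability that i pairs with j.
    """
    theta = sum(pair_prob.values())   # expected number of base pairs
    p_nt = [0.0] * n
    for (i, j), p in pair_prob.items():
        p_nt[i] += p                  # i paired with a downstream partner j
        p_nt[j] += p                  # j paired with an upstream partner i
    return theta, p_nt
```

Since every pair involves two nucleotides, the per-nucleotide probabilities always sum to $2\theta$.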
357:
358:
The extinction is determined by the contributions of the
melted or mismatched loop regions along the constituent sequences of
the self-folded species \cite{CRCIT}. At each
temperature there is an ensemble of conformations with a narrow or
broad distribution of such loops, and the contribution of each loop
is proportional to its relative Boltzmann statistical weight. It
follows that the extinction $\epsilon(T)$ of the self-folded species can be written in the form \cite{RADMZ}:
367:
368: \begin{equation}
369: \epsilon (T)=\sum ^{N-1}_{i=1}2(1-P(r_{i})- P(r_{i+1})+
370: P\left(r_{i}, r_{i+1}\right))\xi (i,i+1)-\sum ^{N-1}_{i=1}(1-P(r_{i}))\xi(i)
371: \end{equation}
372:
where $1-P(r_{i})- P(r_{i+1}) + P\left(r_{i}, r_{i+1}\right)$ is, by inclusion-exclusion, the probability that the two adjacent
nucleotides at positions $i$ and $i+1$ are both melted (unpaired), in which case they contribute $\xi (i,i+1)$ to the total
absorbance. Here $P(r_{i})$ is the probability that nucleotide $i$ is paired and $P\left(r_{i}, r_{i+1}\right)$ is the probability that nucleotides $i$ and $i+1$ are both paired; they are given by:
376:
377: \begin{equation}
P(r_{i}) = \sum_{i<n\leq N}{P\left(r_{i},r_{n}\right)}+\sum_{1\leq n<i}{P\left(r_{n},r_{i}\right)} \nonumber
379: \end{equation}
380:
381: \begin{eqnarray}
382: P\left(r_{i},r_{i+1}\right) & = & \sum_{i+1<n<m}\sum_{n<m\leq N}{P\left(r_{i},r_{i+1},r_{m},r_{n}\right)} + \nonumber \\
383: & & \sum_{1\leq
384: n<m}\sum_{n<m<i}{P\left(r_{i},r_{i+1},r_{m},r_{n}\right)} + \nonumber \\
385: & & \sum_{i+1<n\leq N}\sum_{1\leq m<i}{P\left(r_{i},r_{i+1},r_{m},r_{n}\right)}
386: \end{eqnarray}
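A direct transcription of the extinction equation for $\epsilon(T)$ at one temperature can be sketched in Python. The function and argument names are hypothetical; the coefficients $\xi(i,i+1)$ and $\xi(i)$ must be supplied from nearest-neighbor extinction data, and the probabilities from a partition function calculation. Indices are 0-based here, so the loop over $i=0,\dots,N-2$ corresponds to $i=1,\dots,N-1$ in the equation.

```python
def extinction(p_paired, p_joint, xi_pair, xi_single):
    """Evaluate eps(T) from the extinction equation at a single temperature.

    p_paired[i]  : P(r_i), probability that nucleotide i is paired (length N)
    p_joint[i]   : P(r_i, r_{i+1}), probability that i and i+1 are both paired (length N-1)
    xi_pair[i]   : xi(i, i+1), extinction contribution of the melted doublet (length N-1)
    xi_single[i] : xi(i), extinction contribution of nucleotide i (length >= N-1)
    """
    n = len(p_paired)
    eps = 0.0
    for i in range(n - 1):
        # inclusion-exclusion: probability that both i and i+1 are unpaired
        both_melted = 1.0 - p_paired[i] - p_paired[i + 1] + p_joint[i]
        eps += 2.0 * both_melted * xi_pair[i]
        eps -= (1.0 - p_paired[i]) * xi_single[i]
    return eps
```

In the fully paired limit ($P(r_i)=1$ and $P(r_i,r_{i+1})=1$ for all $i$) both sums vanish, while in the fully melted limit the result reduces to the nearest-neighbor sum for the random coil.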
387:
The formalism developed in this work also allows the incorporation of several types of interactions
through a network of RNA-RNA, RNA-DNA, RNA(DNA)-protein or RNA(DNA)-small-molecule
contacts. The additional free energy terms, which depend on
the type of interaction (for example, hybridization with short oligonucleotides or binding of protein molecules), have to be incorporated into
the free energy term ${F\!M}\left( r_{i},r_{j}\right)$ (Fig.~2).
393:
\section{Results and discussion}
395:
396:
397: \begin{figure}
398: \begin{center}
399: \includegraphics{f7.eps}
\caption{Chemical potential versus temperature for the hairpin species formed after dissociation of the three dsDNAs S1S2, S3S4
and S5S6.}
402: \end{center}
403: \end{figure}
404:
405:
Understanding the molecular forces that control the various sequence- and
solvent-specific conformational forms of DNA and RNA
oligonucleotides is of great importance. Melting experiments have been the most useful way to
measure a variety of thermodynamic parameters from which the stabilities of larger structures under different conditions can be estimated.
The estimation of these parameters is based on the assumption that the stability of a base pair
depends only on the identity of the adjacent base pairs, because the major interactions involved in transitions between different
conformations of a polynucleotide sequence are stacking and hydrogen bonding \cite{SDATD,NSRKDHT,DRHDHT,JDPDHT}.
This additive property of the energy rules, based on the nearest-neighbor
approximation, forms the basis of the recursive calculation of the
partition function: the additivity of the free energy leads to a multiplication of the partition functions \cite{RADMZ}.
416:
417:
418: \begin{figure}
419: \begin{center}
420: \includegraphics{CP.eps}
\caption{Calorimetric excess heat capacity, $\Delta C_{p}$, versus temperature for the three dsDNAs.
Experimental curves for the duplex-to-strand transition \cite{PWNS} are S1S2 (A), S3S4 (B) and S5S6 (C). The calculated curves
are shown as lines: S1S2 (a), S3S4 (b) and S5S6 (c).}
424: \end{center}
425: \end{figure}
426:
427:
Based on the multiplicative property of the partition function, here we present a new formalism for calculating the
partition function of single-stranded nucleic acids. The self-folding model deals with matches, mismatches, symmetric and asymmetric
interior loops, bulges and single-base stacking that might exist at duplex ends or at the ends of helices. The formalism also takes into
account base-pair contacts in the loop regions and dangling ends in the double helix and in single hairpin species, as well as multi-branched loops.
This allows calculations on both short and long sequences. The self-folding calculation explores all possible conformations of the single-stranded species.
433:
We performed calculations on non-self-complementary DNA sequences with melting temperatures between
50~$^{\circ}$C and 90~$^{\circ}$C. The sequences, grouped by length, are: 9 nt, S1 d(GCTTGTTGC) and S2 d(GCAACAAGC);
15 nt, S3 d(GCAGGTTGTTTCCGC) and S4 d(GCGGAAACAACCTGC); 21 nt, S5 d(GCAACAGGTTGTTTCCGTTGC) and S6 d(GCAACGGAAACAACCTGTTGC) \cite{PWNS}.
The self-folding and hybridization calculations take into account the whole ensemble of single- and
double-stranded species in solution and their fractional extents at different temperatures \cite{RADMZ}.
We assume that the solution can be described as an ensemble of ideally mixed species.
This assumption is based on experimental evidence that, to very good accuracy, the single-strand self-folding transition
and the double-strand association are independent transition processes, and that the thermodynamic properties and transition
characteristics of each transition in a mixed solution are identical to those in the isolated systems \cite{PWNS}. The calculated
chemical potentials of the intermediate hairpin species show that for short oligonucleotides (S1 and S2, Fig.~3) the single-strand
self-folding transition makes only a small thermodynamic contribution to the overall transition. As a result, duplex formation by short oligonucleotides
shows a perfectly symmetric two-state shape of the calorimetric excess heat capacity curve versus temperature (Fig.~4). However, for longer oligonucleotides (S3, S4, S5 and S6, Fig.~3), the calculated chemical potentials
show that the thermodynamic properties of the single-strand self-folding transition affect the nature of the duplex transition, resulting
in a population of intermediate hairpin species in solution. The deviation of the calculated calorimetric excess heat capacity curves versus
temperature from a perfectly symmetric shape can be seen for duplexes S3S4 and S5S6 in Fig.~4. Here, the melting of the intermediate
hairpin species is superimposed on the melting of the duplex species, leading to a deviation from the two-state shape of the heat capacity curve.
450:
451: \begin{figure}
452: \begin{center}
453: \includegraphics{f6.eps}
454: \caption{Schematic representation of the phase transitions in solutions containing molecular beacons.
455: At low temperature (phase A) molecular beacons and their targets spontaneously form duplexes. In this
456: state molecular beacons are open and fluorescent. At higher temperature (phase B) duplexes
457: are destabilized and molecular beacons are released, returning to their closed hairpin conformation, and fluorescence
458: decreases. As the temperature is raised further (phase C), the closed molecular beacons melt into fluorescent random coils.}
459: \end{center}
460: \end{figure}
461:
Next, we analyze in detail the nature of the transition during duplex formation or dissociation and the role
of the intermediate hairpin species.
The role of hairpin intermediates during dissociation or formation of the duplex species in solution
is of great importance when short oligonucleotides
(molecular beacons) must reliably identify and hybridize to accessible nucleotides within their target mRNA sequences.
Molecular beacons are DNA probes that form a stem-and-loop structure and carry an internally
quenched fluorophore. When they bind to complementary nucleic acids, they undergo a conformational transition that
switches on their fluorescence. Molecular beacons are commonly used to identify complementary strands in the presence of
unrelated nucleic acids. Understanding the thermodynamic basis and the underlying conformational transformations responsible for the
enhanced specificity of molecular beacons for their target sequences is therefore of great importance. A simple picture, based on a
detailed thermodynamic analysis of the phase transitions in solutions containing molecular beacons \cite{GBSTALFK}, is given in Fig.~5.
Experimental data give evidence for three phases: phase A, the probe-target duplex; phase B, the free molecular beacon in its stem-loop
structure and the coiled target; and phase C, in which both the molecular beacon and the target are coiled. An all-or-none mechanism is assumed for the
transitions between the phases. To understand the basis of molecular beacon specificity from first principles, we apply our formalism
to calculate a variety of thermodynamic characteristics, such as the free energy, enthalpy and entropy. The idea is to compare the behavior
of molecular beacons in the presence of perfectly complementary target oligonucleotides with their behavior in the presence of targets
whose sequence creates a single mismatched base pair in the probe-target duplex. The sequence of the molecular beacon
used in this work is CGCTCCCAAAAAAAAAAACCGAGCG, and the complementary target is GGTTTTTTTTTTTGG.
In our calculations we do not restrict ourselves to a two-state transition, in which only two types of conformational species,
fully folded and fully unfolded, are present in solution during the temperature scan. Rather, we consider the
ensemble of all possible intermediate states, which gives the most detailed possible picture of the melting process between the
folded and unfolded states of the single- and double-stranded forms.
The results of our calculations, together with the experimental data \cite{GBSTALFK}, are given in Table~1. The calculated values agree
well with experiment, most closely for the perfectly matched duplex; the largest deviations occur for the C-A mismatch (27 eu in $-\Delta S^0$ and 5~$^{\circ}$C in $T_m$).
Analysis of the calculated melting curves and intermediates reveals that the enhanced
specificity of molecular beacons is a result of their constrained conformational flexibility and the all-or-none mechanism of
their hybridization to the target sequence.
488:
489: \begin{table}
\caption{Standard enthalpies and standard entropies for solutions containing 50 nM molecular beacons and
1 M target oligonucleotides in the presence of 100 mM KCl and 1 mM MgCl$_{2}$ \cite{GBSTALFK}. Melting temperatures are for solutions with
50 nM molecular beacons and 300 nM target oligonucleotides. Results are given for different mismatches at the
same position (marked 0) and for the same mismatch at the nearest left (marked $-1$) and right (marked $+1$) positions.}
\fontsize{9}{10pt}\selectfont
\begin{tabular}{|l|c|c|c|c|c|c|c|}
\hline
Pair & Position & \multicolumn{2}{c|}{$-\Delta H^0$ (kcal/mol)} & \multicolumn{2}{c|}{$-\Delta S^0$ (eu)} & \multicolumn{2}{c|}{$T_m$ ($^{\circ}$C)}\\
\hline
 & & exp & cal & exp & cal & exp & cal\\
\hline
T-A & 0 & 84 & 80 & 237 & 238 & 42 & 42\\
\hline
A-A & 0 & 69 & 62 & 201 & 202 & 27 & 28\\
\hline
C-A & 0 & 61 & 61.2 & 175 & 202 & 23 & 28\\
\hline
G-A & 0 & 65 & 61 & 185 & 202 & 28 & 28\\
\hline
G-A & $-1$ & 72 & 65 & 208 & 218 & 29 & 27\\
\hline
G-A & $+1$ & 74 & 65 & 213 & 217 & 29 & 27\\
\hline
\end{tabular}
511: \end{table}
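To show how the entries of Table 1 relate to one another, the sketch below converts a tabulated pair $-\Delta H^0$, $-\Delta S^0$ into $\Delta G^0$ at 37~$^{\circ}$C and into a melting temperature. It assumes a simple two-state bimolecular reaction, beacon + target $\rightleftharpoons$ duplex, with temperature-independent $\Delta H^0$ and $\Delta S^0$ and with the target in excess. At $T_m$ half of the beacons are hybridized, so $K = 1/(C_t - C_b/2)$ and $T_m = \Delta H^0/[\Delta S^0 + R\ln(C_t - C_b/2)]$, where $C_t$ and $C_b$ are the total target and beacon concentrations. This two-state treatment is a simplification of the ensemble calculation used in this work, and the function names \texttt{delta\_g} and \texttt{melting\_temp\_c} are hypothetical.

```python
import math

R = 1.98720  # gas constant, cal/(mol K)


def delta_g(dh_kcal, ds_eu, temp_c=37.0):
    """dG (kcal/mol) at temp_c from dH (kcal/mol) and dS (eu = cal/(mol K))."""
    temp_k = temp_c + 273.15
    return dh_kcal - temp_k * ds_eu / 1000.0


def melting_temp_c(dh_kcal, ds_eu, beacon_m, target_m):
    """Two-state Tm (deg C) for beacon + target <-> duplex, target in excess.

    At Tm half of the beacons are hybridized, so the equilibrium constant
    equals 1 / (free target) = 1 / (target_m - beacon_m / 2).
    """
    free_target = target_m - beacon_m / 2.0
    tm_k = dh_kcal * 1000.0 / (ds_eu + R * math.log(free_target))
    return tm_k - 273.15


# Experimental T-A row of Table 1 (signs restored: the table lists -dH, -dS).
print(delta_g(-84.0, -237.0))                        # about -10.5 kcal/mol
print(melting_temp_c(-84.0, -237.0, 50e-9, 300e-9))  # about 41.4 deg C
```

For the experimental T-A row this two-state estimate gives about 41.4~$^{\circ}$C at 50 nM beacon and 300 nM target, close to the tabulated 42~$^{\circ}$C.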
512:
513:
514:
Thus, the calculations show that, for a perfect match between the probe and target sequences, the
free energy of phase A is practically determined by a single conformational state of the probe-target duplex; the contributions from bulges, interior loops and dangling ends are
negligible. The main contributions to
the free energy of phase B come from the entropy of the coiled target and the free energy of the stem-loop structure
of the molecular beacon. The conformational flexibility of the molecular beacon around its hairpin structure is the main means
of modulating the stability of phase B. Longer stems increase the difference between the melting temperatures of perfectly
complementary and mismatched duplexes. However, stems that are too long make the hairpin
stable not only in phase B but also in phase A. On the
other hand, hairpin loops that are too long decrease the stability of the hairpin, which can lead to the disappearance of phase B.
Moreover, as the length of the molecular beacon increases, the free energy penalty resulting
from a mismatched base pair in the probe-target duplex becomes negligible compared with the total free energy of the duplex, which decreases the sensitivity to the presence of
a mismatch. Finally, the free energy of phase C is determined by the conformational entropies of the random coils of both the molecular beacon and its target.
Our calculations are consistent with the experimental data and with their thermodynamic analysis (Fig.~5) \cite{GBSTALFK}.
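The statement that phase A is practically represented by a single conformational state can be made quantitative. If the free energy of a phase is obtained from a sum over its conformations, $G_{\rm phase} = -RT\ln\sum_s e^{-G_s/RT}$, then conformations lying a few kcal/mol above the lowest one shift $G_{\rm phase}$ only slightly. The Python sketch below illustrates this with hypothetical conformational free energies; the function \texttt{phase\_free\_energy} and the numbers in the example are assumptions for illustration, not the implementation used in this work.

```python
import math

R = 1.98720e-3  # gas constant, kcal/(mol K)


def phase_free_energy(conf_energies, temp_k):
    """Phase free energy -RT ln sum_s exp(-G_s/RT) in kcal/mol."""
    rt = R * temp_k
    g_min = min(conf_energies)
    # Log-sum-exp with the lowest free energy factored out for stability.
    total = sum(math.exp(-(g - g_min) / rt) for g in conf_energies)
    return g_min - rt * math.log(total)


# Hypothetical phase: one dominant duplex conformation plus minor ones
# (for example with a bulge or a frayed end) lying 4-6 kcal/mol higher.
confs = [-12.0, -8.0, -7.0, -6.0]
g_phase = phase_free_energy(confs, 310.15)
print(g_phase - min(confs))  # small negative correction from the minor states
```

With these numbers the minor conformations lower the phase free energy by only about 0.001 kcal/mol relative to the dominant state.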
529:
530:
531:
532: \begin{figure}
533: \begin{center}
534: \includegraphics{FRE.eps}
\caption{Experimental and calculated free energies of a solution of molecular beacons in equilibrium with target oligonucleotides. The experimental
plots \cite{GBSTALFK} of the free energies are: 1p, free energy of the perfectly matched duplex (phase A); 1m, free energy
of the mismatched duplex (phase A); 2, free energy of the closed molecular beacon and the coiled target (phase B).
The calculated free energy curves are: A, free energy of the perfectly matched duplex (phase A); B, free energy
of the mismatched duplex (phase A). Because molecular beacons are conformationally more constrained than unstructured probes,
line 2 crosses lines 1p and 1m in such a way that the difference $\Delta \theta$ between the melting temperatures of perfectly complementary
and mismatched duplexes is larger than the corresponding difference $\Delta \theta^{'}$ for an intermediate state of unstructured probe
and target.}
543: \end{center}
544: \end{figure}
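The reading of Fig.~5 rests on a simple geometric fact. If the free energy of each phase is approximated by a straight line $G_i(T) = H_i - T S_i$, two phases exchange stability at the temperature where their lines cross, $T^{*} = (H_1 - H_2)/(S_1 - S_2)$, and $\Delta\theta$ is the difference between the crossing of line 2 with line 1p and its crossing with line 1m. The sketch below assumes linear free energy curves with temperature-independent $H_i$ and $S_i$; for a bimolecular phase a concentration-dependent contribution would have to be absorbed into the entropy term. The function names and the line parameters in the example are hypothetical.

```python
def crossing_temperature(h1, s1, h2, s2):
    """Temperature at which G1(T) = h1 - T*s1 equals G2(T) = h2 - T*s2."""
    if s1 == s2:
        raise ValueError("parallel free energy lines never cross")
    return (h1 - h2) / (s1 - s2)


def delta_theta(perfect, mismatch, hairpin):
    """Difference between the two transition temperatures of the plot.

    Each argument is an (H, S) pair describing one straight free energy line:
    perfect -> line 1p, mismatch -> line 1m, hairpin -> line 2 (phase B).
    """
    t_perfect = crossing_temperature(*perfect, *hairpin)
    t_mismatch = crossing_temperature(*mismatch, *hairpin)
    return t_perfect - t_mismatch


# Hypothetical lines in kcal/mol and kcal/(mol K), with phase B as reference.
print(delta_theta((-10.0, -0.030), (-8.0, -0.025), (0.0, 0.0)))  # about 13.3 K
```

In this picture, making line 2 steeper or shifting it changes both crossings at once, which is how a constrained hairpin can widen $\Delta\theta$.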
545:
546:
In conclusion, we have presented a general statistical
mechanical approach for describing the self-folding and hybridization of DNA and RNA sequences.
The folding model deals with matches, mismatches, symmetric and asymmetric interior loops, stacked pairs in loop
and dangling end regions, multi-branched loops, bulges, and single-base stacking that may exist at duplex ends or at the ends of helices.
This allows calculations on both short and long sequences.
552:
Calculations on short and long sequences show that, for short oligonucleotides, duplex formation often displays
a two-state transition. For longer oligonucleotides, however, the thermodynamic properties of the single-strand
self-folding transition affect the nature of the duplex formation transition, resulting in a population of
intermediate hairpin species in solution. The advantage of this new formalism is demonstrated especially clearly
when one needs to design relatively short oligonucleotides (molecular beacons) that have to
reliably identify and hybridize to accessible nucleotides within their targeted mRNA sequences.
We have shown that the specificity of molecular beacons is enhanced when they are designed to form a stem-loop structure
with constrained conformational flexibility and to hybridize to the target sequence by an all-or-none mechanism.
In recent years, a diverse class of regulatory RNAs (often denoted riboregulators) has emerged that regulate expression
at the posttranscriptional level. These regulatory RNAs fine-tune cellular responses to stress conditions, integrating
environmental signals into global regulation. It seems that the structural constraints that enhance the specificity of molecular
recognition are also a general feature of the mechanism of action of riboregulators. Thus, the formalism developed in this work
can serve as a first step toward a general approach that takes into account both the affinity and the specificity
of several types of intermolecular interactions through a network of RNA-RNA, RNA-DNA, RNA(DNA)-protein,
or RNA(DNA)-small-molecule contacts.
568:
569:
571:
572:
573: \begin{thebibliography}{99}
574:
575:
576: \bibitem{SNC} N. C. Seeman (1999) {\em Trends Biotechnol.} {\bf 17} 437.
577:
\bibitem{SG} S. Gottesman (2002) {\em Genes Dev.} {\bf 16} 2829.

\bibitem{FRES8301} S. Freier, D. Alkema, A. Sinclair, T. Neilson and D. H. Turner (1983) {\em Biochemistry} {\bf 22} 6198.

\bibitem{GASRRB} G. A. Soukup and R. R. Breaker (1999) {\em Proc. Natl. Acad. Sci. USA} {\bf 96} 3584.

\bibitem{BYATAMFSJN} B. Yurke, A. J. Turberfield, A. P. Mills Jr., F. C. Simmel and J. L. Neumann (2000) {\em Nature} {\bf 406} 605.

\bibitem{HYXZSNS} H. Yan, X. Zhang, Z. Shen and N. C. Seeman (2002) {\em Nature} {\bf 415} 62.
587:
588: \bibitem{MNSDS} M. N. Stojanovic and D. Stefanovic (2003) {\em Nat. Biotechnol.} {\bf 21} 1069.
589:
590: \bibitem{RBNCCJPRLA} R. S. Braich, N. Chelyapov, C. Johnson, P. W. K. Rothemund and L. Adleman (2002) {\em Science} {\bf 296} 499.
591:
\bibitem{DSDLDMRD} D. D. Shoemaker, D. A. Lashkari, D. Morris, M. Mittman and R. W. Davis (1996) {\em Nature Genet.} {\bf 16} 450.
593:
594: \bibitem{SMJGDDSSM} S. Brenner, M. Johnson, J. Bridgham, G. Golda, D. H. Lloyd, D. Johnson, S. Luo, S. McCurdy, M. Foy, M. Ewan et al. (2000)
595: {\em Nat. Biotechnol.} {\bf 18} 630.
596:
597: \bibitem{GWMG} G. Werstuck and M. R. Green (1998) {\em Science} {\bf 282} 296.
598:
\bibitem{ITCB} I. Tinoco Jr. and C. Bustamante (1999) {\em J. Mol. Biol.} {\bf 293} 271.
600:
601: \bibitem{DMMBSFJWDT} D. H. Mathews, M. E. Burkard, S. M. Freier, J. R. Wyatt and D. H. Turner (1999) {\em RNA} {\bf 5} 1458.
602:
\bibitem{GS} G. Stormo (2003) {\em Mol. Cell} {\bf 11} 1419.

\bibitem{MJWR} M. Mandal, B. Boese, J. E. Barrick, W. C. Winkler and R. R. Breaker (2003) {\em Cell} {\bf 113} 577.
606:
607: \bibitem{MTMPAS} M. T. McManus and P. A. Sharp (2002) {\em Nature Rev. Genet.} {\bf 3} 737.
608:
609: \bibitem{TSCSNB} T. A. Vickers, S. Koo, C. F. Bennett, S. T. Crooke, N. M. Dean, and B. F. Baker (2003) {\em J. Biol. Chem.} {\bf 278} 7108.
610:
\bibitem{RADMZ} R. A. Dimitrov and M. Zuker (2004) {\em Biophys. J.} {\bf 87} 215.
612:
\bibitem{SUGN8701} N. Sugimoto, R. Kierzek and D. H. Turner (1987) {\em Biochemistry} {\bf 26} 4554.

\bibitem{HICD8501} D. R. Hickey and D. H. Turner (1985) {\em Biochemistry} {\bf 24} 2086.

\bibitem{PUGJ8901} J. D. Puglisi and I. Tinoco Jr. (1989) {\em Methods Enzymol.} {\bf 180} 304.

\bibitem{BLAR7201} R. D. Blake (1972) {\em Biopolymers} {\bf 11} 913.

\bibitem{BORP7401} P. N. Borer, B. Dengler, I. Tinoco Jr. and O. C. Uhlenbeck (1974) {\em J. Mol. Biol.} {\bf 86} 843.

\bibitem{MCCJ9001} J. S. McCaskill (1990) {\em Biopolymers} {\bf 29} 1105.

\bibitem{HOFI9401} I. L. Hofacker, W. Fontana, P. F. Stadler, S. Bonhoeffer, M. Tacker and P. Schuster (1994) {\em Monatshefte f\"{u}r Chemie} {\bf 125} 167.

\bibitem{MATO9601} O. Matzura and A. Wennborg (1996) {\em Comput. Appl. Biosci.} {\bf 12} 247.
632:
\bibitem{CRCIT} C. R. Cantor and I. Tinoco Jr. (1965) {\em J. Mol. Biol.} {\bf 13} 65.
634:
635:
\bibitem{ZUKM8901} M. Zuker (1989) {\em Methods Enzymol.} {\bf 180} 262.

\bibitem{WILA8601} A. L. Williams and I. Tinoco Jr. (1986) {\em Nucleic Acids Res.} {\bf 14} 299.

\bibitem{WATM8301} M. S. Waterman (1983) {\em Proc. Natl. Acad. Sci. USA} {\bf 80} 3123.

\bibitem{WATM8502} M. S. Waterman and T. H. Byers (1985) {\em Math. Biosci.} {\bf 77} 179.

\bibitem{PWNS} P. Wu and N. Sugimoto (2000) {\em Nucleic Acids Res.} {\bf 28} 4762.
645:
646:
647: \bibitem{ZUKM8902} M. Zuker (1989) {\em Science} {\bf 244} 48.
648:
649: \bibitem{ZUKM8903} M. Zuker (1989) {\em J. Mol. Biol.} {\bf 288} 911.
650:
651: \bibitem{ZUKM8904} M. Zuker (2000) {\em Curr. Opin. Struct. Biol.} {\bf 10} 303.
652:
\bibitem{ZUKM0305} N. R. Markham and M. Zuker (2005) {\em Nucleic Acids Res.} {\bf 33} W577.
654:
655:
\bibitem{BATRT9901} R. T. Batey, R. P. Rambo and J. A. Doudna (1999) {\em Angew. Chem. Int. Ed.} {\bf 38} 2326.

\bibitem{EDRBBMJD} E. A. Doherty, R. T. Batey, B. Masquida and J. A. Doudna (2001) {\em Nat. Struct. Biol.} {\bf 8} 339.

\bibitem{TRSTP} T. R. Sosnick and T. Pan (2003) {\em Curr. Opin. Struct. Biol.} {\bf 13} 309.
663:
664: \bibitem{VDAF} V. Daggett and A. Fersht (2003) {\em Nature Rev. Mol. Cell Biol.} {\bf 4} 497.
665:
\bibitem{SWGSMYCR} S. P. Walton, G. N. Stephanopoulos, M. L. Yarmush and C. M. Roth (2002) {\em Biophys. J.} {\bf 82} 366.

\bibitem{SWGNSMYMR} S. P. Walton, G. N. Stephanopoulos, M. L. Yarmush and C. M. Roth (1999) {\em Biotechnol. Bioeng.} {\bf 65} 1.

\bibitem{TVJWSF} T. A. Vickers, J. R. Wyatt and S. M. Freier (2000) {\em Nucleic Acids Res.} {\bf 28} 1340.
673:
674:
\bibitem{GBSTALFK} G. Bonnet, S. Tyagi, A. Libchaber and F. R. Kramer (1999) {\em Proc. Natl. Acad. Sci. USA} {\bf 96} 6171.

\bibitem{SDATD} S. Freier, D. Alkema, A. Sinclair, T. Neilson and D. H. Turner (1983) {\em Biochemistry} {\bf 22} 6198.

\bibitem{NSRKDHT} N. Sugimoto, R. Kierzek and D. H. Turner (1987) {\em Biochemistry} {\bf 26} 4554.

\bibitem{DRHDHT} D. R. Hickey and D. H. Turner (1985) {\em Biochemistry} {\bf 24} 2086.

\bibitem{JDPDHT} J. D. Puglisi and I. Tinoco Jr. (1989) {\em Methods Enzymol.} {\bf 180} 304.
689:
690: \end{thebibliography}
691:
692:
693:
694:
695: \end{document}
696:
697:
699: