1: \documentclass[11pt]{article}
2:
3: \usepackage{amstext}
4: \usepackage{enumerate}
5: \usepackage{citesort}
6: \usepackage[mathscr]{eucal}
7: \usepackage{epsfig}
8: \usepackage{amsmath}
9: \usepackage{theorem}
10:
11: \renewcommand{\baselinestretch}{1.1}
12:
13: \addtolength{\textheight}{1.4in}
14: \addtolength{\textwidth}{1in}
15: \addtolength{\topmargin}{-0.7in}
16: %\addtolength{\evensidemargin}{-0.7in}
17: \addtolength{\oddsidemargin}{-0.3in}
18:
19: \def\myendproof{{\ \vbox{\hrule\hbox{%
20: \vrule height1.3ex\hskip0.8ex\vrule}\hrule }}\par}
21: \newtheorem{theorem}{Theorem}[section]
22: \newtheorem{lemma}[theorem]{Lemma}
23: \newtheorem{corollary}[theorem]{Corollary}
24: \newtheorem{fact}[theorem]{Fact}
25: \newtheorem{definition}{Definition}
26: \newenvironment{proof}{{\it Proof. }}{\myendproof}
27:
28:
29: % shorthands
30: \newcommand{\bigO}{\mathscr{O}}
31:
32: \newcommand{\mnote}[1]{\marginpar{\scriptsize\it #1}}
33:
34: \newcommand{\comment}[1]{}
35:
36:
37: \title{Predicting RNA Secondary Structures with Arbitrary Pseudoknots
38: by Maximizing the Number of Stacking Pairs}
39:
40: \author{\hspace*{.5in}
41: Samuel Ieong\thanks{Department of Computer Science,
42: Yale University, New Haven, CT 06520.}
43: \and
44: Ming-Yang Kao\thanks{Department of Computer Science,
45: Northwestern University, Evanston, IL 60201 (kao@cs.northwestern.edu).
46: This research was supported in part by NSF Grant EIA-0112934.}
47: \and
48: Tak-Wah Lam\thanks{Department of Computer Science,
49: The University of Hong Kong, Hong Kong
50: (\{twlam, smyiu\}@cs.hku.hk).
51: This research was supported in part by Hong Kong RGC grant HKU-7027/98E.}
52: \hspace*{.5in}
53: \and
54: Wing-Kin Sung\thanks{Department of Computer Science,
55: National University of Singapore, 3 Science Drive 2,
56: Singapore 117543 (ksung@comp.nus.edu.sg).}
57: \and
58: Siu-Ming Yiu\footnotemark[3]
59: }
60:
61: \begin{document}
62: \date{}
63: \maketitle
64:
65: \begin{abstract}
66:
67: The paper investigates the computational problem of predicting
68: RNA secondary structures.
69: The general belief is that allowing pseudoknots makes the problem hard.
70: Existing polynomial-time algorithms are heuristic algorithms
71: with no performance guarantee and can only handle limited
72: types of pseudoknots.
73: In this paper we initiate the study of
74: predicting RNA secondary structures with
75: a maximum number of stacking pairs while allowing arbitrary pseudoknots.
76: We obtain two approximation algorithms with worst-case approximation ratios
77: of $1/2$ and $1/3$ for planar and general secondary structures,
78: respectively. For an RNA sequence of $n$ bases,
79: the approximation algorithm for planar secondary structures
80: runs in
81: $O(n^3)$ time while that for the general case runs in linear time.
82: Furthermore, we prove that allowing
83: pseudoknots makes it NP-hard to maximize
84: the number of stacking pairs in a planar secondary structure.
This result contrasts with the recent
NP-hardness results on pseudoknots,
which are based on optimizing some general and complicated
energy functions.
89:
90:
91: \end{abstract}
92:
93: \section{Introduction}
94: Ribonucleic acids (RNAs) are molecules that are responsible for regulating many genetic
95: and metabolic activities in cells.
96: An RNA is single-stranded and
97: can be considered as a sequence of nucleotides (also known as bases). There are four
98: basic nucleotides, namely, Adenine (A), Cytosine (C), Guanine (G), and Uracil (U). An
99: RNA folds into a 3-dimensional structure by forming pairs of bases. Paired bases tend to
100: stabilize the RNA (i.e., have negative free energy). Yet base pairing does not occur
101: arbitrarily. In particular, A-U and C-G form stable pairs and
102: are known as the {\em Watson-Crick}
103: base pairs. Other base pairings are less stable and often ignored. An example of a
104: folded RNA is shown in Figure~\ref{fig:RNA-structure}.
105: Note that this figure is just schematic;
106: in practice, RNAs are 3-dimensional molecules.
107:
108: \begin{figure}[t]
109: \begin{centering}
110: \epsfig{file=secondary_structure.eps, height=2.3in}
111: \caption{Example of a folded RNA}
112: \label{fig:RNA-structure}
113: \end{centering}
114: \end{figure}
115:
116: The 3-dimensional structure is related to the function of the RNA.
117: Yet existing experimental techniques for determining the
118: 3-dimensional structures of RNAs are
often very costly and time-consuming (see, e.g., \cite{Meidanis:1997:ICM}).
120: The secondary structure of an RNA is the set of base pairings formed
121: in its 3-dimensional structure.
122: To determine the 3-dimensional structure of a given RNA sequence,
123: it is useful to determine the corresponding secondary structure.
124: As a result, it is important to design efficient algorithms to
125: predict the secondary structure with computers.
126:
127: {From} a computational viewpoint, the challenge of the RNA secondary
128: structure prediction
129: problem arises from some special structures called pseudoknots,
130: which are defined as follows.
131: Let $S$ be an RNA sequence $s_1, s_2, \cdots, s_n$. A
132: {\it pseudoknot} is composed of two interleaving base
133: pairs, i.e., $(s_i, s_j)$ and $(s_k, s_\ell)$
134: such that $i < k < j < \ell$. See Figure~\ref{fig:simple-pk} for examples.
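
For instance, with indices chosen here purely for illustration, the base pairs
$(s_1, s_5)$ and $(s_3, s_8)$ form a pseudoknot, because taking
$i=1$, $k=3$, $j=5$, and $\ell=8$ gives
\[
i < k < j < \ell, \quad\mbox{i.e.,}\quad 1 < 3 < 5 < 8.
\]
In contrast, the nested pairs $(s_1, s_8)$ and $(s_3, s_5)$, as well as the
disjoint pairs $(s_1, s_3)$ and $(s_5, s_8)$, do not interleave and hence do not
form a pseudoknot.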
135:
136:
137: If we assume that the secondary structure of an RNA contains no pseudoknots,
138: the secondary structure can be decomposed into a few types of loops: stacking pairs,
139: hairpins, bulges, internal loops, and multiple loops
140: (see, e.g., Tompa's lecture notes \cite{Tompa:2000:LNB} or
141: Waterman's book \cite{Waterman:1995:ICB}). A
142: {\it stacking pair} is a loop formed by two pairs of consecutive
143: bases $(s_i, s_j)$ and $(s_{i+1}, s_{j-1})$ with $i+4 \leq j$.
144: See Figure~\ref{fig:RNA-structure} for an example.
By definition, a stacking pair contains no unpaired bases, and any other kind
of loop contains one or more unpaired bases. Since
147: unpaired bases are destabilizing and have positive free energy,
148: stacking pairs are the only type of
149: loops that have negative free energy and stabilize the secondary structure.
150: It is also natural to
151: assume that the free energies of loops are independent.
152: Then an optimal pseudoknot-free secondary structure can be
153: computed using dynamic programming in $O(n^3)$ time
154: \cite{Lyngso:1999:FEI,Lyngso:1999:ILR,Zuker:1984:RSS,Zuker:1989:UDP}.
155:
156: \begin{figure}[t]
157: \begin{centering}
158: \epsfig{file=pseudoknots.eps, height=1.4in}
159: \caption{Examples of pseudoknots}
160: \label{fig:simple-pk}
161: \end{centering}
162: \end{figure}
163:
164:
165: However, pseudoknots are known to exist in some RNAs.
166: For predicting secondary structures with pseudoknots,
167: Nussinov et al.~\cite{Nussinov:1978:ALM} have studied the case where
168: the energy function is minimized when the number of base pairs is
169: maximized and have obtained an $O(n^3)$-time algorithm
170: for predicting secondary structures.
171: Based on some special energy functions, Lyngso and Pedersen
172: \cite{Lyngso:2000:RPP} have proven that determining the optimal secondary structure
173: possibly with
174: pseudoknots is NP-hard. Akutsu \cite{Akutsu:2000:DPA}
175: has shown that it is NP-hard to determine an optimal
176: planar secondary structure, where a secondary structure is {\it planar}
177: if the
178: graph formed by the base pairings and the backbone connections of adjacent bases is
179: planar (see Section 2 for a more detailed definition).
180: Rivas and Eddy \cite{Rivas:1999:DPA}, Uemura et al. \cite{Uemura:1999:TAG},
181: and
182: Akutsu \cite{Akutsu:2000:DPA} have also proposed polynomial-time algorithms that can
183: handle limited types of pseudoknots; note that the exact types of
184: such pseudoknots are implicit in these algorithms and difficult to
185: determine.
186:
187: Although it might be desirable to have a better classification of pseudoknots and
188: better algorithms that
189: can handle a wider class of pseudoknots,
190: this paper approaches the problem in a different general direction.
191: We initiate the study of predicting RNA secondary
192: structures that allow arbitrary pseudoknots while maximizing
193: the number of stacking pairs.
194: Such a simple energy function is meaningful as
195: stacking pairs are the only loops that stabilize
196: secondary structures. We obtain two approximation algorithms with worst-case ratios of
$1/2$ and $1/3$ for planar and general secondary structures, respectively.
198: The planar
199: approximation algorithm makes use of a geometric observation
200: that allows us to
201: visualize the planarity of stacking pairs on a rectangular grid;
202: interestingly, such an
203: observation does not hold if our aim is to maximize the number of base pairs.
204: This algorithm runs in $O(n^3)$ time.
205: The second approximation algorithm is more complicated and
206: is based on a combination of multiple ``greedy'' strategies.
A straightforward analysis does not yield the approximation ratio of $1/3$;
we instead amortize over different steps to obtain the desired
ratio. This algorithm runs in $O(n)$ time.
210:
211: To complement these two algorithms, we also prove
212: that allowing pseudoknots makes it NP-hard to
213: find the planar secondary structure with the
214: largest number of stacking pairs.
215: The proof makes use of a reduction from a
216: well-known NP-complete problem called Tripartite Matching
217: \cite{Garey:1979:CIG}.
218: This result indicates that the hardness of the RNA secondary
219: structure prediction problem may be inherent in the pseudoknot structures
220: and may not be necessarily due to the complication of the energy functions.
221: This is in contrast to the other NP-hardness results discussed
222: earlier.
223:
The rest of this paper is organized into five sections. Section 2
225: discusses some basic properties.
226: Sections 3 and 4 present the approximation algorithms for
227: planar and general secondary structures, respectively.
228: Section 5 details the NP-hardness result.
229: Section 6 concludes the paper with open problems.
230:
231: \section{Preliminaries}
232:
233: Let $S=s_1 s_2 \cdots s_n$ be an RNA sequence of $n$ bases.
A {\it secondary structure} ${\cal P}$ of $S$ is a set of
Watson-Crick pairs $(s_{i_1}, s_{j_1}),\ldots, (s_{i_p},
s_{j_p})$, where $i_r+2 \leq j_r$ for all $r=1, \ldots, p$
and no two pairs share a base.
238: We denote $q$ ($q \geq 1$)
239: consecutive stacking pairs ($s_i, s_j$), ($s_{i+1}, s_{j-1}$);
240: ($s_{i+1}, s_{j-1}$), ($s_{i+2}, s_{j-2}$)
241: $\ldots$ ($s_{i+q-1}, s_{j-q+1}$), ($s_{i+q}, s_{j-q}$) of
242: ${\cal P}$ by ($s_i,s_{i+1}, \ldots, s_{i+q};$
243: \linebreak[4] $s_{j-q}, \ldots, s_{j-1}, s_j$).
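
As an illustration of this notation, with indices chosen arbitrarily,
take $q=2$, $i=1$, and $j=10$.
Then $(s_1, s_2, s_3; s_8, s_9, s_{10})$ stands for the two
consecutive stacking pairs
\[
(s_1, s_{10}),\ (s_2, s_9)
\quad\mbox{and}\quad
(s_2, s_9),\ (s_3, s_8),
\]
which together involve $q+1 = 3$ base pairs.
In particular, a single stacking pair $(s_i, s_j)$, $(s_{i+1}, s_{j-1})$
is written $(s_i, s_{i+1}; s_{j-1}, s_j)$.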
244:
245: \begin{definition}
246: Given a secondary structure ${\cal P}$,
247: we define an undirected
248: graph $G({\cal P})$ such that the bases
249: of $S$ are the nodes of $G({\cal P})$ and $(s_i, s_j)$ is
250: an edge of $G({\cal P})$ if $j = i+1$ or $(s_i, s_j)$ is
251: a base pair in ${\cal P}$.
252: \end{definition}
253:
254: \begin{definition}
255: A secondary structure ${\cal P}$ is planar if $G({\cal P})$ is
256: a planar graph.
257: \end{definition}
258:
259: \begin{definition}
260: A secondary structure ${\cal P}$ is said to contain an
261: {\it interleaving block} if ${\cal P}$ contains three
262: stacking pairs
263: $(s_i, s_{i+1}; s_{j-1}, s_j)$, $(s_{i'}, s_{i'+1}; s_{j'-1}, s_{j'})$,
264: $(s_{i''}, s_{i''+1}; s_{j''-1},s_{j''})$ where $i < i' < i'' < j < j' < j''$.
265: \end{definition}
266:
267: \begin{lemma}
268: \label{interleavingblock}
269: If a secondary structure ${\cal P}$ contains
270: an interleaving block, ${\cal P}$ is non-planar.
271: \end{lemma}
272:
273: \begin{proof}
274: Suppose ${\cal P}$ contains an interleaving block. Without
275: loss of generality, we assume that ${\cal P}$ contains
276: the stacking pairs ($s_1, s_2; s_7, s_8$),
277: ($s_3, s_4; s_9, s_{10}$), and ($s_5, s_6; s_{11}, s_{12}$).
278: Figure \ref{interblock}(a) shows the
279: subgraph of $G({\cal P})$ corresponding to these
280: stacking pairs. Since this subgraph contains a homeomorphic copy of
281: $K_{3,3}$ (see Figure \ref{interblock}(b)),
282: $G({\cal P})$ and ${\cal P}$ are non-planar.
283: \end{proof}
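
To make the last step of the proof concrete, we sketch one possible choice of
the homeomorphic copy of $K_{3,3}$; this particular choice is ours and need not
coincide with the drawing in Figure~\ref{interblock}(b).
Take $\{s_2, s_4, s_{10}\}$ and $\{s_3, s_6, s_9\}$ as the two sides, and
connect every node on one side to every node on the other side by the
following nine paths in $G({\cal P})$:
\[
\begin{array}{lll}
s_2 \mbox{--} s_3, & s_2 \mbox{--} s_7 \mbox{--} s_6, & s_2 \mbox{--} s_1 \mbox{--} s_8 \mbox{--} s_9, \\
s_4 \mbox{--} s_3, & s_4 \mbox{--} s_5 \mbox{--} s_6, & s_4 \mbox{--} s_9, \\
s_{10} \mbox{--} s_3, & s_{10} \mbox{--} s_{11} \mbox{--} s_6, & s_{10} \mbox{--} s_9.
\end{array}
\]
Every step on these paths is either a backbone edge $(s_i, s_{i+1})$ or one of
the base pairs $(s_1, s_8)$, $(s_2, s_7)$, $(s_3, s_{10})$, $(s_4, s_9)$, and
$(s_6, s_{11})$.
The internal nodes $s_1$, $s_5$, $s_7$, $s_8$, and $s_{11}$ each lie on exactly
one path, so the nine paths are internally node-disjoint.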
284:
285: \begin{figure*}[hbtp]
286: \begin{center}
287: \scalebox{0.5}[0.5]{\includegraphics{interblock2.eps}}
288: \caption{Interleaving block}
289: \label{interblock}
290: \end{center}
291: \end{figure*}
292:
293: \section{An Approximation Algorithm for Planar Secondary Structures}
294: We present an algorithm which,
295: given an RNA sequence $S = s_1 s_2 \ldots s_n$,
296: constructs a {\it planar} secondary structure of $S$
297: to approximate one with the maximum number of stacking pairs
298: with a ratio of at least $1/2$. This
299: approximation algorithm is based on the subtle
300: observation in Lemma \ref{planarembedding}
301: that if a secondary structure ${\cal P}$ is planar,
302: the subgraph of $G({\cal P})$ which contains {\it only} the stacking pairs
303: of ${\cal P}$ can be embedded in a grid with a useful property.
This property enables us to consider only the secondary structures of
$S$ {\it without pseudoknots} in order to achieve an approximation
ratio of $1/2$.
307:
308: \begin{definition}
309: Given a secondary structure ${\cal P}$, we define a
310: {\it stacking pair embedding} of
311: ${\cal P}$ on a grid as follows.
312: Represent the bases of $S$ as $n$ consecutive grid points on the
313: same horizontal grid line $L$ such that $s_i$ and $s_{i+1}$
314: $(1 \leq i < n)$ are connected directly by a horizontal grid edge.
315: If $(s_i, s_{i+1}; s_{j-1}, s_j)$ is a stacking pair of ${\cal P}$,
316: $s_i$ and $s_{i+1}$ are connected to $s_j$ and $s_{j-1}$ respectively
317: by a sequence of grid edges such that the two sequences must
318: be either both above or both below $L$.
319: \end{definition}
320:
321: Figure \ref{embedding-eg} shows a stacking pair embedding
322: (Figure \ref{embedding-eg}(b))
323: of a given secondary structure (Figure \ref{embedding-eg}(a)).
Note that ($s_3,s_9$)
does not form a stacking pair with any other base pair, so $s_3$
is not connected to $s_9$ in the stacking pair embedding.
327: Similarly, $s_4$ is not connected to $s_{10}$ in the
328: embedding.
329:
330: \begin{figure*}[hbtp]
331: \begin{center}
332: \scalebox{0.5}[0.5]{\includegraphics{embedding.eps}}
333: \caption{An example of a stacking pair embedding}
334: \label{embedding-eg}
335: \end{center}
336: \end{figure*}
337:
338: \begin{definition}
339: A stacking pair embedding is said to be {\it planar} if
340: it can be drawn in such a way that
341: no lines cross or overlap with each other in the grid.
342: \end{definition}
343:
344: The embedding shown in Figure \ref{embedding-eg}(b) is planar.
345:
346: \begin{lemma}
347: \label{planarembedding}
348: Let ${\cal P}$ be a secondary structure of an RNA sequence $S$.
349: Let $E$ be a stacking pair embedding of ${\cal P}$.
350: If ${\cal P}$ is planar, then $E$ must be planar.
351: \end{lemma}
352:
353: \begin{proof}
354: If ${\cal P}$ does not have a planar stacking
355: pair embedding, we claim that ${\cal P}$ contains an
356: interleaving block. Let $L$ be the horizontal grid line
357: that contains the bases of $S$ in $E$.
Since ${\cal P}$ does not have a planar
stacking pair embedding, we can assume that $E$ has
two stacking pairs that intersect
above $L$ (see Figure \ref{non-planar-sec-struct}(a)).
362:
363: \begin{figure*}[hbtp]
364: \begin{center}
365: \scalebox{0.5}[0.5]{\includegraphics{nonplanar2.eps}}
366: \caption{Non-planar stacking pair embedding}
367: \label{non-planar-sec-struct}
368: \end{center}
369: \end{figure*}
370:
371:
372: If there is no other stacking pair underneath these two
373: pairs, we can flip one of the pairs below $L$ as shown
374: in Figure \ref{non-planar-sec-struct}(b). So, there must be
375: at least one stacking pair underneath these two
376: pairs. By checking all
377: possible cases (all non-symmetric cases are shown in
378: Figures \ref{non-planar-sec-struct}(c) to (i)), it can be
379: shown that $E$ cannot be redrawn without crossing or overlapping
380: lines only if it contains an interleaving block
381: (Figures \ref{non-planar-sec-struct}(h) and (i)). So, by
382: Lemma \ref{interleavingblock}, ${\cal P}$ is non-planar.
383: \end{proof}
384:
385: By Lemma \ref{planarembedding},
386: we can relate two secondary structures having the maximum
387: number of stacking pairs with and without pseudoknots
388: in the following lemma.
389:
390: \begin{lemma}
391: \label{1/2-ratio}
392: Given an RNA sequence $S$, let $N^*$ be the maximum number of
393: stacking pairs that can be formed by a planar secondary
394: structure of $S$ and let $W$ be the maximum
395: number of stacking pairs that can be formed by $S$ without
396: pseudoknots. Then, $W \geq \frac{N^*}{2}$.
397: \end{lemma}
398:
399: \begin{proof}
400: Let ${\cal P}^*$ be a planar secondary structure of $S$ with $N^*$
401: stacking pairs. Since ${\cal P}^*$ is planar, by Lemma
402: \ref{planarembedding}, any stacking pair embedding of ${\cal P}^*$
403: is planar.
404:
405: Let $E$ be a stacking pair embedding of ${\cal P}^*$
406: such that no lines cross each other in the grid.
407: Let $L$ be the horizontal grid line of $E$ which
408: contains all bases of $S$.
409: Let $n_1$ and $n_2$ be the number of stacking pairs which
410: are drawn above and below $L$, respectively.
411: Without loss of generality,
412: assume that $n_1 \geq n_2$. Now, we construct another planar
413: secondary structure ${\cal P}$ from $E$ by deleting all stacking
414: pairs which are drawn below $L$.
415: Obviously, ${\cal P}$ is a planar secondary structure of $S$ without
416: pseudoknots. Since $n_1 \geq n_2$, $n_1 \geq \frac{N^*}{2}$.
417: As $W \geq n_1$, $W \geq \frac{N^*}{2}$.
418: \end{proof}
419:
420: Based on Lemma \ref{1/2-ratio}, we now present the dynamic programming
algorithm $MaxSP$ which computes the maximum number of
422: stacking pairs that can be formed by an RNA
423: sequence $S=s_1 s_2 \ldots s_n$ without pseudoknots.
424:
425: \vspace{5pt}
426: \noindent
427: {\bf Algorithm $MaxSP$}
428:
429: Define $V(i,j)$ (for $j \geq i$) as the maximum number of stacking
430: pairs without pseudoknots that can be formed by $s_i \ldots s_j$
431: {\it if $s_i$ and $s_j$ form a Watson-Crick pair}.
432: Let $W(i,j)$ ($j \geq i$) be the maximum number
433: of stacking pairs without pseudoknots that can be formed by
434: $s_i \ldots s_j$. Obviously, $W(1,n)$ gives the maximum
435: number of stacking pairs that can be formed by $S$ without
436: pseudoknots.
437:
438: \noindent \fbox{Basis:}
439:
440: For $j = i, i+1, i+2 \mbox{~or~} i+3$ ($j \leq n$),
441: \[
442: \begin{array}{lll}
443: V(i,j) & = 0 & \mbox{ if $s_i, s_{j}$ form a Watson-Crick pair;} \\
444: W(i,j) & = 0. &
445: \end{array}
446: \]
447:
448: \noindent \fbox{Recurrence:}
449:
For $j > i+3$,
\[
\begin{array}{llll}
W(i,j) & = & \max \left\{
   \begin{array}{ll}
     V(i,j) & \mbox{ if $s_i$, $s_j$ form a Watson-Crick pair} \\
     W(i+1, j) & \\
     W(i, j-1) & \\
     \max_{i+1 \leq k \leq j-2}{\{W(i,k)+W(k+1,j)\}} &
   \end{array}
 \right\}; \\
&&\\
V(i,j) & = & \max \left\{
   \begin{array}{l}
   V(i+1, j-1) + 1 \mbox{~~~~if $s_{i+1}$, $s_{j-1}$ form a Watson-Crick pair} \\
     \max_{i+1 \leq k \leq j-2}{\{W(i+1,k)+W(k+1,j-1)\}}
   \end{array}
  \right\}.
\end{array}
\]

The last term in the maximum for $W(i,j)$ handles the case in which
$s_i$ and $s_j$ are both paired, but not with each other, so that the
structure on $s_i \ldots s_j$ splits into two independent parts.

\begin{lemma}
Given an RNA sequence $S$ of length $n$, Algorithm $MaxSP$
computes the maximum number of stacking pairs that can be
formed by $S$ without pseudoknots in $O(n^3)$ time and
$O(n^2)$ space.
\end{lemma}

\begin{proof}
There are $O(n^2)$ entries $V(i,j)$ and $W(i,j)$ to be
filled. To fill an entry of $V(i,j)$ or $W(i,j)$, we check
at most $O(n)$ values. The total time complexity for filling all entries
is $O(n^3)$. Storing all entries requires $O(n^2)$ space.
\end{proof}
485:
Although Algorithm $MaxSP$ as presented above only
computes the number of stacking pairs, it can be easily modified
to compute the secondary structure.
489: Thus we have the following theorem.
490:
491: \begin{theorem}
Algorithm $MaxSP$ is a $(1/2)$-approximation algorithm
for the problem of constructing a planar secondary structure which
maximizes the number of stacking pairs for an RNA sequence $S$.
495: \end{theorem}
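
As a small illustration of Algorithm $MaxSP$, consider the sequence
$S = \mbox{GGAACC}$ with $n=6$, which is chosen here only as an example.
The only Watson-Crick pairs available are G-C pairs between
$\{s_1, s_2\}$ and $\{s_5, s_6\}$, so the only possible stacking pair is
$(s_1, s_2; s_5, s_6)$, and every entry $W(i,j)$ on a proper substring of $S$
equals $0$.
Since $s_2$ and $s_5$ form a Watson-Crick pair and $5 = 2+3$, the basis gives
$V(2,5) = 0$, and the recurrence yields
\[
V(1,6) = \max \left\{ V(2,5)+1,\
\max_{2 \leq k \leq 4}\{W(2,k)+W(k+1,5)\} \right\}
= \max\{1, 0\} = 1.
\]
All other candidates in the maximum for $W(1,6)$ are built from entries on
proper substrings and hence equal $0$, so $W(1,6) = V(1,6) = 1$, which matches
the single stacking pair $(s_1, s_2; s_5, s_6)$.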
496:
497: \section{An Approximation Algorithm for General Secondary Structures}
498: We present Algorithm $GreedySP()$ which,
499: given an RNA sequence $S = s_1 s_2 \ldots s_n$,
500: constructs a secondary structure of $S$ (not necessarily planar)
501: with at least $1/3$ of the maximum possible number of
502: stacking pairs.
503: The approximation algorithm uses a greedy approach.
504: Figure \ref{1/3-approx-alg} shows
505: the algorithm $GreedySP()$.
506:
507: \begin{figure}[htbp]
508: \fbox{
509: \begin{minipage}{.95\textwidth}
510: \noindent // Let $S=s_1 s_2 \ldots s_n$ be the input RNA sequence.
511: Initially, all $s_j$ are unmarked.
512:
513: \noindent // Let $E$ be the set of base pairs output by the algorithm.
514: Initially, $E = \emptyset$.
515:
516: \vspace{5pt}
517: \noindent $GreedySP(S, i)$
518: \hspace{10pt}
519: // $i \geq 3$
520:
521: \begin{enumerate}
522: \item Repeatedly find the {\it leftmost} $i$ consecutive stacking pairs
523: $SP$ (i.e., find $(s_p,\ldots,s_{p+i};s_{q-i},\ldots,s_q)$ such that
524: $p$ is as small as possible) formed by unmarked bases.
525: Add $SP$ to $E$ and mark all these bases.
526: \item For $k = i-1$ downto $2$, \\
527: Repeatedly find {\it any} $k$ consecutive stacking pairs $SP$
528: formed by unmarked bases.
529: Add $SP$ to $E$ and mark all these bases.
530: \item Repeatedly find the {\it leftmost} stacking pair $SP$ formed
531: by unmarked bases.
532: Add $SP$ to $E$ and mark all these bases.
533: \end{enumerate}
534: \end{minipage}
535: } %% end of fbox
536: \caption{A 1/3-Approximation Algorithm}
537: \label{1/3-approx-alg}
538: \end{figure}
539:
540: In the following, we analyze the approximation ratio of
541: this algorithm.
542: The algorithm $GreedySP(S, i)$ will generate a sequence of $SP$'s
543: denoted by $SP_1, SP_2, \ldots, SP_h$.
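
As an illustration, consider running $GreedySP(S,3)$ on the sequence
$S = \mbox{GGGGAAACCCCAUAAAU}$ with $n = 17$; this sequence is a hypothetical
example of ours and is used only to trace the three steps.
\begin{itemize}
\item In Step 1, the leftmost $3$ consecutive stacking pairs are
$SP_1 = (s_1, s_2, s_3, s_4; s_8, s_9, s_{10}, s_{11})$, formed by the four
G-C pairs $(s_1, s_{11})$, $(s_2, s_{10})$, $(s_3, s_9)$, and $(s_4, s_8)$.
After these bases are marked, no further $3$ consecutive stacking pairs exist.
\item In Step 2, with $k = 2$, the unmarked bases are
$s_5, s_6, s_7$ and $s_{12}, \ldots, s_{17}$.
No two disjoint windows of three consecutive unmarked bases pair up, so nothing
is added.
\item In Step 3, the leftmost stacking pair formed by unmarked bases is
$SP_2 = (s_{12}, s_{13}; s_{16}, s_{17})$, formed by the A-U pairs
$(s_{12}, s_{17})$ and $(s_{13}, s_{16})$.
The remaining unmarked bases are all A, so the algorithm stops.
\end{itemize}
Hence $h = 2$ and the output contains $|SP_1| + |SP_2| = 3 + 1 = 4$ stacking
pairs.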
544:
545: \begin{fact}
546: \label{spdisjoint}
547: For any $SP_j$ and $SP_k$ $(j \neq k)$, the
548: stacking pairs in $SP_j$ do not share any base with those
549: in $SP_k$.
550: \end{fact}
551:
552: For each $SP_j = (s_p, \ldots, s_{p+t}; s_{q-t}, \ldots, s_q)$,
553: we define two intervals of indexes, ${\cal I}_j$ and
554: ${\cal J}_j$, as $[p .. p+t]$
555: and $[q-t .. q]$, respectively.
556: In order to compare
557: the number of stacking pairs formed with that in the optimal
558: case, we have the following definition.
559:
560: %\begin{fact}
561: %We have the following facts:
562: %\begin{itemize}
563: %\item All ${\cal I}_j$ and ${\cal J}_j$ (for all $j$) intervals are disjoint.
564: %\item $|SP_j| = |{\cal I}_j|-1$ where $|{\cal I}_j|$ denotes the
565: % number of bases in the interval.
566: %\end{itemize}
567: %\end{fact}
568:
569: \begin{definition}
570: \label{xpi}
571: Let ${\cal P}$ be an optimal secondary structure of $S$ with
572: the maximum number of stacking pairs. Let
573: ${\cal F}$ be the set of all stacking pairs of ${\cal P}$.
574: For each $SP_j$ computed by
575: $GreedySP(S,i)$ and $\beta = {\cal I}_j$ or ${\cal J}_j$,
576: \[\mbox{let~}{\cal X}_{\beta} = \{ (s_k, s_{k+1}; s_{w-1}, s_w) \in
577: {\cal F} | \mbox{~at least one of indexes~} k, k+1, w-1, w
578: \mbox{~is in~} \beta\}. \]
579: \end{definition}
580:
581: Note that ${\cal X}_\beta$'s may not be disjoint.
582:
583: \begin{lemma}
584: \label{complete}
585: $\bigcup_{1 \leq j \leq h} \{{\cal X}_{{\cal I}_j} \cup
586: {\cal X}_{{\cal J}_j}\} = {\cal F}$.
587: \end{lemma}
588:
589: \begin{proof}
590: We prove this lemma by contradiction. Suppose that there exists a
591: stacking pair ($s_k,s_{k+1};s_{w-1},s_w$) in ${\cal F}$ but not in
592: any of ${\cal X}_{{\cal I}_j}$ and ${\cal X}_{{\cal J}_j}$.
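By Definition \ref{xpi}, none of the indexes $k,k+1,w-1,w$
is in any of ${\cal I}_j$ and ${\cal J}_j$, so the bases
$s_k, s_{k+1}, s_{w-1}, s_w$ are all unmarked when the algorithm terminates.
This contradicts Step 3 of Algorithm $GreedySP(S,i)$, which
repeats until no stacking pair can be formed by unmarked bases.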
596: \end{proof}
597:
598: \begin{definition}
599: \label{x'pi}
600: For each ${\cal X}_{{\cal I}_j}$,
601: \[
602: \mbox{let~} {\cal X}'_{{\cal I}_j} =
603: {\cal X}_{{\cal I}_j} -
604: \bigcup_{k<j} \{ {\cal X}_{{\cal I}_k} \cup
605: {\cal X}_{{\cal J}_k} \},
606: \mbox{~and let~} {\cal X}'_{{\cal J}_j} =
607: {\cal X}_{{\cal J}_j} -
608: \bigcup_{k<j} \{ {\cal X}_{{\cal I}_k} \cup
609: {\cal X}_{{\cal J}_k} \} - {\cal X}_{{\cal I}_j} \]
610: \end{definition}
611:
612: Let $|SP_j|$ be the number of stacking pairs represented by
613: $SP_j$. Let $|{\cal I}_j|$ and $|{\cal J}_j|$ be the numbers
614: of indexes in the intervals ${\cal I}_j$ and ${\cal J}_j$,
615: respectively.
616:
617: \begin{lemma}
618: \label{sumfraction}
619: Let $N$ be the number of stacking pairs computed by
620: Algorithm $GreedySP(S,i)$ and $N^*$ be the maximum number of
621: stacking pairs that can be formed by $S$.
622: If for all $j$, we have
623: $|SP_j| \geq \frac{1}{r} \times
624: |({\cal X}'_{{\cal I}_j} \cup {\cal X}'_{{\cal J}_j})|$, then
625: $N \geq \frac{1}{r} \times N^*$.
626: \end{lemma}
627:
628: \begin{proof}
629: By Definition \ref{x'pi},
630: $\bigcup_k \{{\cal X}_{{\cal I}_k} \cup {\cal X}_{{\cal J}_k}\} =
631: \bigcup_k \{{\cal X}'_{{\cal I}_k} \cup {\cal X}'_{{\cal J}_k}\}$.
632: Then by Fact \ref{spdisjoint}, $N = \sum_j |SP_j|$. Thus,
633: $N \geq \frac{1}{r} \times
634: |\bigcup_k \{{\cal X}_{{\cal I}_k} \cup {\cal X}_{{\cal J}_k}\}|$.
635: By Lemma \ref{complete}, $N \geq \frac{1}{r} \times N^*$.
636: \end{proof}
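
Spelled out, the argument is the following chain, where
${\cal Y}_j = {\cal X}'_{{\cal I}_j} \cup {\cal X}'_{{\cal J}_j}$ is a
shorthand introduced only for this display:
\begin{align*}
N = \sum_{j=1}^{h} |SP_j|
&\geq \frac{1}{r} \sum_{j=1}^{h} |{\cal Y}_j|
\geq \frac{1}{r} \Bigl| \bigcup_{j=1}^{h} {\cal Y}_j \Bigr| \\
&= \frac{1}{r} \Bigl| \bigcup_{j=1}^{h}
\{ {\cal X}_{{\cal I}_j} \cup {\cal X}_{{\cal J}_j} \} \Bigr|
= \frac{1}{r} |{\cal F}|
= \frac{1}{r} N^*.
\end{align*}
The first inequality is the hypothesis of the lemma applied to each $j$,
the second holds for any family of finite sets, the next two equalities
follow from Definition \ref{x'pi} and Lemma \ref{complete}, and
$|{\cal F}| = N^*$ because ${\cal P}$ is optimal.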
637:
638: \begin{lemma}
639: \label{boundforspi}
640: For each $SP_j$ computed by $GreedySP(S,i)$, we have
641: $|SP_j| \geq \frac{1}{3}
642: \times
643: |({\cal X}'_{{\cal I}_j} \cup {\cal X}'_{{\cal J}_j})|$.
644: \end{lemma}
645:
646: \begin{proof}
647: There are three cases as follows.
648:
649: \vspace{5pt}
650: \noindent
651: {\it Case 1:} $SP_j$ is computed by $GreedySP(S, i)$ in Step 1.
652: Note that $SP_j = (s_p, \ldots, s_{p+i};$ $s_{q-i}, \ldots, s_q)$ is
653: the leftmost $i$ consecutive stacking pairs, i.e., $p$ is the
654: smallest possible.
655: By definition, $|{\cal X}'_{{\cal I}_j}|, |{\cal X}'_{{\cal J}_j}| \leq i+2$.
656: We further claim that $|{\cal X}'_{{\cal I}_j}| \leq i+1$.
657: Then $|SP_j| / | {\cal X}'_{{\cal I}_j} \cup {\cal X}'_{{\cal J}_j}|
658: \geq i/((i+1)+(i+2)) \geq 1/3$ (as $i \geq 3$).
659:
660: We prove the claim by contradiction. Assume that
661: $|{\cal X}'_{{\cal I}_j}| = i+2$. That is,
662: for some integer $t$, ${\cal F}$ has $i+2$ consecutive stacking pairs
663: $(s_{p-1}, \ldots, s_{p+i+1}; s_{t-i-1}, \ldots, s_{t+1})$.
Furthermore, none of the bases $s_{p-1}, \ldots, s_{p+i+1}, s_{t-i-1}, \ldots, s_{t+1}$
is marked before $SP_j$ is chosen; otherwise,
suppose one such base, say $s_a$, is marked
when the algorithm chooses $SP_\ell$ for $\ell < j$.
Then a stacking pair adjacent to $s_a$ does not belong to
${\cal X}'_{{\cal I}_j}$; it belongs to ${\cal X}'_{{\cal I}_\ell}$
or ${\cal X}'_{{\cal J}_\ell}$ instead.
Therefore, $(s_{p-1}, \ldots, s_{p+i-1}; s_{t-i+1}, \ldots, s_{t+1})$
consists of $i$ consecutive stacking pairs formed by unmarked bases
before $SP_j$ is chosen, and it starts to the left of $SP_j$.
Hence $SP_j$ is not the leftmost $i$ consecutive stacking pairs,
which contradicts the selection criterion of $SP_j$.
The claim follows.
677:
678: \vspace{5pt}
679: \noindent
680: {\it Case 2:} $SP_j$ is computed by $GreedySP(S, i)$ in Step 2.
681: Let $|SP_j| = k \geq 2$. Let
682: $SP_j = (s_p, \ldots, s_{p+k}; s_{q-k}, \ldots, s_q)$.
683: By definition, $|{\cal X}'_{{\cal I}_j}|, |{\cal X}'_{{\cal J}_j}| \leq k+2$.
684: We claim that $|{\cal X}'_{{\cal I}_j}|, |{\cal X}'_{{\cal J}_j}| \leq k+1$.
685: Then $|SP_j| / | {\cal X}'_{{\cal I}_j} \cup {\cal X}'_{{\cal J}_j}|
686: \geq k/((k+1)+(k+1))$,
687: which is at least $1/3$ as $k \geq 2$.
688:
689: To show that $|{\cal X}'_{{\cal I}_j}| \leq k+1$ by contradiction,
690: assume $|{\cal X}'_{{\cal I}_j}| = k+2$. Thus, for some integer $t$,
691: there exist $k+2$ consecutive stacking pairs
692: $(s_{p-1}, \ldots, s_{p+k+1}; s_{t-k-1}, \ldots, s_{t+1})$.
As in Case 1, we can show that
none of the bases $s_{p-1}, \ldots, s_{p+k+1}, s_{t-k-1}, \ldots, s_{t+1}$
is marked before $SP_j$ is chosen.
696: Thus, $GreedySP(S, i)$ should select some $k+1$ or $k+2$ consecutive
697: stacking pairs
698: instead of the chosen $k$ consecutive stacking pairs,
699: reaching a contradiction.
700: Similarly, we can show $|{\cal X}'_{{\cal J}_j}| \leq k+1$.
701:
702: \vspace{5pt}
703: \noindent
704: {\it Case 3:} $SP_j$ is computed by $GreedySP(S, i)$ in Step 3.
705: $SP_j$ is the leftmost stacking pair when it is chosen.
706: Let $SP_j = (s_p, s_{p+1}; s_{q-1}, s_q)$.
707: By the same approach as in Case 2,
708: we can show $|{\cal X}'_{{\cal I}_j}|, |{\cal X}'_{{\cal J}_j}| \leq 2$.
709: We further claim $|{\cal X}'_{{\cal I}_j}| \leq 1$.
710: Then $|SP_j| / | {\cal X}'_{{\cal I}_j} \cup {\cal X}'_{{\cal J}_j}| \geq 1/(1+2) = 1/3$.
711:
712: To verify $|{\cal X}'_{{\cal I}_j}| \leq 1$,
713: we consider all possible cases with $|{\cal X}'_{{\cal I}_j}| = 2$
714: while there are no two consecutive stacking pairs.
715: The only possible case is that for some integers $r, t$,
716: both $(s_{p-1}, s_p; s_{r-1}, s_r)$
717: and $(s_p, s_{p+1}; s_{t-1}, s_t)$ belong to ${\cal X}'_{{\cal I}_j}$.
Then, $SP_j$ cannot be the leftmost stacking pair formed by unmarked bases,
contradicting the selection criterion of $SP_j$.
720: \end{proof}
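
For reference, the three ratio bounds used in the proof reduce to the
following elementary inequalities:
\begin{align*}
\frac{i}{(i+1)+(i+2)} \geq \frac{1}{3}
&\iff 3i \geq 2i+3 \iff i \geq 3, \\
\frac{k}{(k+1)+(k+1)} \geq \frac{1}{3}
&\iff 3k \geq 2k+2 \iff k \geq 2, \\
\frac{1}{1+2} &= \frac{1}{3}.
\end{align*}
This indicates why the leftmost rule is needed in Step 3: with the weaker
bound $|{\cal X}'_{{\cal I}_j}|, |{\cal X}'_{{\cal J}_j}| \leq 2$ alone, the
ratio for a single stacking pair would only be $1/(2+2) = 1/4$.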
721:
722: \begin{theorem}
723: Let $S$ be an RNA sequence. Let $N^*$ be the maximum number of stacking
724: pairs that can be formed by any secondary structure of $S$. Let
725: $N$ be the number of stacking pairs output by $GreedySP(S,i)$. Then,
726: $N \geq \frac{N^*}{3}$.
727: \end{theorem}
728:
729: \begin{proof}
730: By Lemmas \ref{sumfraction} and \ref{boundforspi}, the result follows.
731: \end{proof}
732:
733: We remark that by setting $i=3$ in $GreedySP(S,i)$, we can already
734: achieve the approximation ratio of 1/3. The following theorem gives
735: the time and space complexity of the algorithm.
736:
737: \begin{theorem}
738: Given an RNA sequence $S$ of length $n$ and a constant $k$,
739: Algorithm \linebreak[4] $GreedySP(S,k)$
740: can be implemented in $O(n)$ time and $O(n)$ space.
741: \end{theorem}
742:
743: \begin{proof}
744: Recall that the bases of an RNA sequence are chosen from the
alphabet $\{A,U,G,C\}$. If $k$ is a constant, there
is only a constant number of different patterns of consecutive
stacking pairs that we must consider. For any $1 \leq j \leq k$,
748: there are only $4^j$ different strings that can be formed by
749: the four characters $\{A,U,G,C\}$. So, the locations of the
750: occurrences of these possible strings in the
751: RNA sequence can be recorded in an array of linked lists
752: indexed by the pattern of the string using $O(n)$ time preprocessing.
753: There are at most $4^j$ linked lists for any fixed $j$ and
754: there are at most $n$ entries in these linked lists. In total,
755: there are at most $kn$ entries in all linked lists for all
756: possible values of $j$.
757:
758: Now, we fix a constant $j$.
759: To locate all $j$ consecutive
760: stacking pairs,
761: we scan the RNA sequence from left to right. For each substring of
762: $j$ consecutive characters, we look up the array to see whether
763: we can form $j$ consecutive stacking pairs. By simple
bookkeeping, we can keep track of which bases have been used
765: already. Each entry in the linked lists will only be
766: scanned at most once, so
767: the whole procedure takes only $O(n)$ time. Since $k$ is a constant,
768: we can repeat the whole procedure for $k$ different values of $j$, and the
total running time is still $O(n)$.
770: \end{proof}
771:
772: \newcommand{\encode}[1]{\langle #1 \rangle}
773:
774: \section{NP-completeness}
775:
776: In this section, we show that it is NP-hard to find a planar
777: secondary structure with the largest number of stacking pairs.
778: We consider the following decision problem.
779: Given an RNA sequence $S$ and an integer $h$, we wish to determine
780: whether the largest possible number of stacking pairs in a planar
secondary structure of $S$, denoted sp($S$), is at least $h$. Below we show
that this decision problem is NP-complete by a reduction from the tripartite
matching problem \cite{Garey:1979:CIG}, which is defined as follows.
784:
785: Given three node sets $X$, $Y$, and $Z$ with the same cardinality
786: $n$ and
787: an edge set $E \subseteq X \times Y \times Z$ of size $m$,
788: the {\it tripartite matching problem} is to
789: determine whether $E$ contains a perfect matching, i.e.,
790: a set of $n$ edges which touches every node of $X$, $Y$, and $Z$
791: exactly once.
792:
793: The remainder of this section is organized as follows.
794: Section~\ref{sec-construction} shows how we construct in polynomial
795: time an RNA sequence $S_E$ and an integer $h$ from a given instance
796: $(X,Y,Z, E)$ of the tripartite matching problem, where $h$
797: depends on $n$ and $m$. Section~\ref{sec-if} shows that if $E$
798: contains a perfect matching, then sp($S_E$) $\ge h$.
799: Section~\ref{sec-only-if} is the non-trivial part, showing that if $E$
800: does not contain a perfect matching, then sp($S_E$) $< h$. Combining
801: these three sections, we can conclude that it is NP-hard to
802: maximize the
803: number of stacking pairs for planar RNA secondary structures.
804:
805: \subsection{Construction of the RNA sequence $S_E$} \label{sec-construction}
806:
807: Consider any instance $(X,Y,Z, E)$ of the tripartite matching problem.
808: We construct an RNA sequence $S_E$ and an integer $h$ as follows.
809: Let $X = \{x_1, \cdots, x_n\}$, $Y = \{y_1, \cdots, y_n\}$, and $Z =
810: \{z_1, \cdots, z_n\}$. Furthermore, let $E = \{ e_1, e_2, \cdots, e_m
811: \}$, where each edge $e_j = (x_{p_j}, y_{q_j}, z_{r_j})$. Recall that
812: an RNA sequence contains characters chosen from the alphabet $\{A, U,
813: G, C\}$. Below we denote $A^i$, where $i$ is any positive integer, as
814: the sequence of $i$ $A$'s. Furthermore, $A^+$ means a sequence of one
815: or more $A$'s.
816:
817: \newcommand{\od}[1]{\overline{\delta(#1)}}
818: \newcommand{\op}[1]{\overline{\pi(#1)}}
819:
820: Let $d = \max\{ 6n, 4(m+1) \} + 1$. Define the following four RNA
821: sequences for every positive integer $k < d$.
822: \begin{itemize}
823: \item $\delta(k)$ is the sequence $U^dA^kGU^dA^{d-k}$, and
824: $\overline{\delta(k)}$ is the sequence $U^{d-k}A^dGU^kA^d$.
825: \item $\pi(k)$ is the sequence $C^{2d+2k} AG C^{4d-2k}$, and
826: $\overline{\pi(k)}$ is the sequence $G^{4d-2k}A G^{2d+2k}$.
827: \end{itemize}
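
As a quick consistency check on these definitions (a sketch added for
illustration, not part of the construction itself), the lengths of the four
sequences are
\begin{align*}
|\delta(k)| &= d + k + 1 + d + (d-k) = 3d+1, &
|\overline{\delta(k)}| &= (d-k) + d + 1 + k + d = 3d+1,\\
|\pi(k)| &= (2d+2k) + 2 + (4d-2k) = 6d+2, &
|\overline{\pi(k)}| &= (4d-2k) + 1 + (2d+2k) = 6d+1.
\end{align*}
Moreover, reading a block backwards while exchanging $A \leftrightarrow U$
and $C \leftrightarrow G$ turns $U^dA^k$ into $U^kA^d$ and $U^dA^{d-k}$ into
$U^{d-k}A^d$, which are the two $U^+A^+$ blocks of $\overline{\delta(k)}$.
Likewise, $C^{2d+2k}$ and $C^{4d-2k}$ turn into $G^{2d+2k}$ and $G^{4d-2k}$,
the two $G^+$ blocks of $\overline{\pi(k)}$.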
828:
829: {\small\bf Fragments:} Note that the sequences
830: $\delta(k)$ and $\od{k}$ are each composed of
831: two substrings in the form of $U^+ A^+$, separated by a character $G$.
832: Each of these two substrings is called a {\it fragment}. Similarly,
833: the two substrings of the form $C^+$ separated by $AG$ in $\pi(k)$
834: and the two substrings of the form $G^+$ separated by the character
835: $A$ in $\overline{\pi(k)}$ are also called fragments.
836:
837: {\small\bf Node Encoding:} Each node in the three node sets $X$, $Y$,
838: and $Z$ is associated with a unique sequence. For $1 \le i \le n$,
839: let $\encode{x_i}$, $\encode{y_i}$, $\encode{z_i}$ denote the
840: sequences $\delta(i)$, $\delta(n+i)$, $\delta(2n+i)$, respectively.
841: Intuitively, $\encode{x_i}$ is the encoding of the node $x_i$, and
842: similarly $\encode{y_i}$ and $\encode{z_i}$ are for the nodes $y_i$ and
843: $z_i$, respectively. Furthermore, define $\encode{\overline{x_i}} =
844: \od{i}$, $\encode{\overline{y_i}} = \od{n+i}$, and
845: $\encode{\overline{z_i}} = \od{2n+i}$.
846:
847: The node set $X$ is associated with two sequences $\cal X$ =
848: $\encode{x_1} G \encode{x_2} G \cdots G \encode{x_n}$ and
849: $\overline{\cal X}$ = $\encode{\overline{x_n}} G
850: \encode{\overline{x_{n-1}}} G \cdots G \encode{\overline{x_1}}$.
851: Let ${\cal X} - x_i$
852: = $\encode{x_1} G \cdots G \encode{x_{i-1}} G
\encode{x_{i+1}} G \cdots G \encode{x_n}$ and $\overline{{\cal X} -
854: x_i}$ = $\encode{\overline{x_n}} G \cdots G
855: \encode{\overline{x_{i+1}}} G \encode{\overline{x_{i-1}}} G \cdots G
856: \encode{\overline{x_1}}$, where $x_i$ is any node in $X$.
857: Similarly, the node sets $Y$ and $Z$ are
858: associated with sequences ${\cal Y}$, $\overline{\cal Y}$, and
859: $\cal Z$, $\overline{\cal Z}$, respectively.
860:
861: {\small\bf Edge Encoding:} For each edge $e_j$ (where $1 \le j \le
862: m$), we define four delimiter sequences, namely,
863: $V_j = \pi(j)$, $W_j = \pi(m+1+j)$, $\overline{V_j} = \overline{\pi(j)}$,
864: and $\overline{W_j} = \overline{\pi(m+1+j)}$.
865: Assume that $e_j = (x_{p_j}, y_{q_j},
866: z_{r_j})$. Then $e_j$ is encoded by the sequence $S_j$ defined as
867: \[
868: AG~V_j~AG~W_j~AG~{\cal X}~G~{\cal Y}~G~{\cal Z}~G~
869: \overline{({\cal Z} - z_{r_j})}~G~\overline{({\cal Y} - y_{q_j})}~G~
870: \overline{({\cal X} - x_{p_j})}~\overline{V_j}~A~\overline{W_j}.
871: \]
872: Let $S_{m+1}$ be a special sequence defined as $AG~V_{m+1}~AG~W_{m+1}~AG~
873: \overline{\cal Z}~G~\overline{\cal Y}~G~\overline{\cal X}~\;
874: \overline{V_{m+1}}~A~\overline{W_{m+1}}$. In the following
875: discussion, each $S_j$ is referred to as a {\em region}.
876:
877: Finally, we define $S_E$ to be the sequence $S_{m+1} S_m \cdots
878: S_1$.
879: Let $\sigma = 3n(3d-2) + 6d - 1$ and
880: let $h = m \sigma + n (6d - 4) + 12 d - 5$. Note that $S_E$ has $O((n+m)^3)$
881: characters and can be constructed in $O(|S_E|)$ time.
882: In Sections \ref{sec-if} and \ref{sec-only-if}, we show that
883: sp($S_E$) $\ge h$ if and only if $E$ contains a perfect matching.
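
The size bound can be seen from the following rough count, which is a sketch
based on the definitions above rather than an exact formula. Each region
$S_j$ with $j \le m$ contains $6n-3$ node encodings (and $S_{m+1}$ contains
$3n$), each of length $3d+1$, together with four delimiter sequences of
length at most $6d+2$ and $O(n)$ separator characters. Hence
\[
|S_j| \le (6n-3)(3d+1) + O(n+d) = O(nd),
\qquad
|S_E| = \sum_{j=1}^{m+1} |S_j| = O(mnd) = O((n+m)^3),
\]
where the last step uses $d = \max\{6n, 4(m+1)\} + 1 = O(n+m)$.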
884:
885: \subsection{Correctness of the if-part} \label{sec-if}
886: This section shows that if $E$ has a perfect matching,
887: we can construct a planar secondary structure for $S_E$
888: containing at least $h$ stacking pairs. Therefore,
889: sp($S_E$) $\geq h$.
890:
891:
892: First of all, we establish several basic steps for constructing
893: stacking pairs on $S_E$.
894: \begin{itemize}
895: \setlength{\itemsep}{-1pt}
896: \item $\delta(i)$ or $\overline{\delta(i)}$ itself can form
897: $d-1$ stacking pairs, while
898: $\delta(i)$ and $\overline{\delta(i)}$ together can form
899: $3d - 2$ stacking pairs.
900: \item
901: $\pi(i)$ and $\overline{\pi(i)}$ together can form
902: $6d - 2$ stacking pairs.
903: \item
904: For any $i \neq j$,
905: $\pi(i)$ and $\overline{\pi(j)}$ together can form
906: $6d - 3$ stacking pairs.
907: \end{itemize}
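
These counts can be checked directly. The pairings below are one possible
choice, given only as an illustrative sketch. A run of $t$ consecutive nested
base pairs contributes $t-1$ stacking pairs. Pairing the block $U^dA^i$ of
$\delta(i)$ with the block $U^iA^d$ of $\overline{\delta(i)}$, and
$U^dA^{d-i}$ with $U^{d-i}A^d$, gives
\[
(d+i-1) + (2d-i-1) = 3d-2 .
\]
Within $\delta(i) = U^dA^iGU^dA^{d-i}$ alone, pairing the last $d-i$ $U$'s of
the first block with $A^{d-i}$, and $A^i$ with the last $i$ $U$'s of the second
block, gives a single run of $d$ nested base pairs and hence $d-1$ stacking
pairs. For the delimiters, pairing the $6d$ $C$'s of $\pi(i)$ with the $6d$
$G$'s of $\overline{\pi(j)}$ in nested order gives $6d-1$ candidate stacking
pairs, of which one is lost at the $AG$ gap of $\pi(i)$ and one at the $A$ gap
of $\overline{\pi(j)}$. The two losses coincide exactly when $i = j$, so
\[
6d-1-1 = 6d-2 \quad (i = j), \qquad 6d-1-2 = 6d-3 \quad (i \neq j).
\]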
908:
909:
910: \begin{lemma}
911: If $E$ has a perfect matching, then sp($S_E$) $\geq h$.
912: \end{lemma}
913: \begin{proof}
914: Let $M = \{ e_{j_1}, e_{j_2}, \ldots, e_{j_n} \}$ be a perfect matching.
915: Without loss of generality, we assume that $1 \le j_1 < j_2 < \ldots < j_n
916: \le m$. Define $j_{n+1} = m+1$.
917: To obtain a planar secondary structure
918: for $S_E$ with at least $h$ stacking pairs,
919: we consider the regions one by one. There are three cases.
920:
921: \noindent
922: {\it Case 1:} We consider any region $S_j$ such that $e_j \not\in M$.
923: Our goal is to show that $\sigma = 3n(3d-2) +6d -1$
924: stacking pairs can be formed within $S_j$. Note that
925: there are $(m-n)$ edges not in $M$. Thus, we can obtain a total
926: of $(m-n)\sigma$ stacking pairs in this case. Details are as follows.
927: Assume that $e_j = (x_{p_j}, y_{q_j}, z_{r_j})$.
928: \begin{itemize}
929: \item $6d-2$ stacking pairs can be formed between $V_j$ and $\overline{V_j}$,
930: and between $W_j$ and $\overline{W_j}$.
931: \item $3d-2$ stacking pairs can be formed
932: between $\encode{x_i}$ and $\encode{\overline{x_i}}$
933: for all $i \neq p_j$,
934: and between $\encode{y_i}$ and $\encode{\overline{y_i}}$
935: for all $i \neq q_j$,
936: and between $\encode{z_i}$ and $\encode{\overline{z_i}}$
937: for all $i \neq r_j$.
\item $\encode{x_{p_j}}$, $\encode{y_{q_j}}$, and
$\encode{z_{r_j}}$ can each
form $d-1$ stacking pairs.
941: \end{itemize}
942: The total number of stacking pairs that can be formed within $S_j$
943: is $2(6d-2) + 3(n-1)(3d-2) + 3(d-1)$
944: = $3n(3d - 2) + 6d - 1$ = $\sigma$.
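
For convenience, the arithmetic behind this total can be expanded as follows
(a routine check):
\begin{align*}
2(6d-2) + 3(n-1)(3d-2) + 3(d-1)
 &= (12d-4) + 3n(3d-2) - (9d-6) + (3d-3)\\
 &= 3n(3d-2) + 6d - 1 = \sigma .
\end{align*}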
945:
946: \noindent
947: {\it Case 2:} We consider the edges $e_{j_1}, e_{j_2}, \ldots, e_{j_n}$
948: in $M$. Our goal is to
949: show that each corresponding region accounts for $\sigma + 6d -4$
950: stacking pairs. Thus, we obtain a total of $n\sigma + n(6d -4)$ stacking
951: pairs in this case. Details are as follows.
952: Unlike Case 1, each region $S_{j_k}$, where $1 \le k \le n$,
953: may have some of its bases paired with that of $S_{j_{k+1}}$.
954: \begin{itemize}
955: \item $6d-3$ stacking pairs can be formed between $W_{j_k}$ in $S_{j_k}$
956: and $\overline{W_{j_{k+1}}}$ in $S_{j_{k+1}}$.
957: \item $6d-2$ stacking pairs can be formed between $V_{j_k}$ in $S_{j_k}$
958: and $\overline{V_{j_k}}$ in $S_{j_k}$.
959:
960:
\item $3d-2$ stacking pairs can be formed between $\encode{x_i}$ in
962: $S_{j_k}$
963: and $\encode{\overline{x_i}}$ in $S_{j_k}$ for any
964: $i \neq p_{j_1}, \ldots, p_{j_k}$,
965: and between $\encode{y_i}$ in $S_{j_k}$
966: and $\encode{\overline{y_i}}$ in $S_{j_k}$ for any
967: $i \neq q_{j_1}, \ldots, q_{j_k}$, and
968: between $\encode{z_i}$ in $S_{j_k}$
969: and $\encode{\overline{z_i}}$ in $S_{j_k}$ for any $i \neq r_{j_1},
970: \ldots, r_{j_k}$.
971:
972: \item
$3d-2$ stacking pairs can be formed between $\encode{x_i}$ in
974: $S_{j_k}$ and $\encode{\overline{x_i}}$ in $S_{j_{k+1}}$ for any
975: $i = p_{j_1}, \ldots, p_{j_k}$,
976: and between $\encode{y_i}$ in $S_{j_k}$
977: and $\encode{\overline{y_i}}$ in $S_{j_{k+1}}$ for any
978: $i = q_{j_1}, \ldots, q_{j_k}$, and
between $\encode{z_i}$ in $S_{j_k}$
980: and $\encode{\overline{z_i}}$ in $S_{j_{k+1}}$ for any $i = r_{j_1},
981: \ldots, r_{j_k}$.
982:
983: \end{itemize}
984: The total number of stacking pairs charged to $S_{j_k}$ is
985: $6d-3 + 6d -2 + 3n (3d -2)$ = $\sigma + 6d - 4$.
986:
987: \noindent
988: {\it Case 3:} We consider $S_{m+1}$.
989: We can form $6d-2$ stacking pairs between $V_{m+1}$ and
990: $\overline{V_{m+1}}$, and
991: $6d-3$ stacking pairs between $W_{m+1}$ and $\overline{W_{j_1}}$.
992: The number of such stacking pairs is $12d - 5$.
993:
994: Combining the three cases, the number of stacking pairs that
995: can be formed on $S_E$ is $(m-n)\sigma + n(\sigma + 6d - 4) + 12d - 5$,
996: which is exactly $h$. Notice that no two stacking pairs formed
997: cross each other. Thus, sp($S_E$) $\ge h$.
998: \end{proof}
999:
1000: \subsection{Correctness of the only-if part} \label{sec-only-if}
1001:
1002: This section shows that if $E$ has no perfect matching, then
1003: sp($S_E$)$<h$. We first give the framework of the proof in
1004: Section~\ref{sec-only-if-framework}.
1005: Then, some basic definitions and concepts are
1006: presented in Section~\ref{sec-only-if-definition}.
1007: The proof of the only-if part
1008: is given in Section~\ref{sec-only-if-proof}.
1009:
1010: \newcommand{\opt}{\mbox{\rm OPT}}
1011:
1012: \subsubsection{Framework of the proof} \label{sec-only-if-framework}
1013: Let $\opt$ be a secondary structure of $S_E$ with the maximum
1014: number of stacking pairs. Let $\#\opt$ be the number of stacking pairs
1015: in $\opt$. That is, $\#\opt =$ sp($S_E$). In this section,
1016: we will establish
1017: an upper bound for $\#\opt$. Recall that we only consider
1018: Watson-Crick base pairs, i.e., $A-U$ and $C-G$ pairs.
1019: We define a conjugate of a
1020: substring in $S_E$ as follows.
1021:
1022: \vspace{5pt}
1023: \noindent
1024: {\bf Conjugates:}
1025: For every substring $R = s_1 s_2 \ldots s_k$ of $S_E$,
1026: the {\it conjugate} of $R$ is
1027: $\hat{R} = \hat{s_k} \ldots \hat{s_1}$,
1028: where $\hat{A} = U$, $\hat{U} = A$, $\hat{C} = G$, and $\hat{G} = C$.
1029:
1030: \vspace{5pt}
1031: For example, $AA$'s conjugate is $UU$ and $UA$'s conjugate is $UA$.
1032: To form a stacking pair, two adjacent bases must be paired
1033: with another two adjacent bases. So, we concentrate on the possible
1034: patterns of adjacent bases in $S_E$.
1035:
1036: \vspace{5pt}
1037: \noindent
1038: {\bf 2-substrings:}
1039: In $S_E$, any two adjacent characters are referred to as a 2-substring.
1040: By construction, $S_E$ has only ten different types of 2-substrings:
1041: $UU$, $AA$, $UA$, $GG$, $CC$, $GC$, $AG$, $GA$, $GU$, and $CA$-substrings.
1042: A 2-substring can only form a stacking pair with its conjugate.
If they actually form a stacking pair in $\opt$, they are said to
1044: be {\it paired}.
1045:
1046: \vspace{5pt}
1047: Since the conjugates of $AG$, $GA$, $GU$, and $CA$-substrings do not
1048: exist in $S_E$,
1049: there is no stacking pair in $S_E$ which involves these 2-substrings.
1050: We only need to consider $AA$, $UU$, $UA$, $GG$, $CC$, $GC$-substrings.
1051: Table \ref{occ-2substrings} shows the numbers of
1052: occurrences of these 2-substrings
1053: in $S_j$ ($1 \leq j \le m+1$) and the total occurrences of these
1054: substrings in $S_E$.
1055:
1056: {\begin{table*}
1057: \footnotesize
1058: \begin{center}
1059: \begin{tabular}{|l|l|l||l|}
1060: \hline
1061: Substring & \multicolumn{3}{c|}{Total number of occurrences of $t$ in} \\ \cline{2-4}
1062: ($t$) & $S_j$ ($j=1,2, \ldots, m$) & $S_{m+1}$ & $S_E$ \\ \hline
1063: AA & $3n(d-2)+(3n-3)(2d-2)$ & $3n(2d-2)$ & $m(3n(d-2) + (3n-3)(2d-2)) + 3n(2d-2)$\\
1064: UU & $3n(2d-2)+(3n-3)(d-2)$ & $3n(d-2)$ & $m(3n(2d-2) + (3n-3)(d-2)) + 3n(d-2)$\\
1065: UA & $2(6n-3)$ & $6n$ & $2m(6n-3) + 6n$\\
1066: GG & $2(6d-2)$ & $2(6d-2)$ & $2(m+1)(6d-2)$\\
1067: CC & $2(6d-2)$ & $2(6d-2)$ & $2(m+1)(6d-2)$\\
1068: GC & $4$ & $4$ & $4m+4$\\ \hline
1069: \end{tabular}
1070: \caption{Number of occurrences of different 2-substrings}
1071: \label{occ-2substrings}
1072: \end{center}
1073: \end{table*}
1074: }
1075:
1076:
1077: Let $\#AA$ denote the number of occurrences of $AA$-substrings in $S_E$.
We use the $\#$ notation for other types of 2-substrings in $S_E$ similarly.
1079: The following fact gives a straightforward upper bound for $\#\opt$.
1080:
1081:
1082: \begin{fact} \label{lem-interval-very-basic}
1083: \begin{tabbing}
1084: ABCDEF \= $\#\opt$ \= $\le$ \= \kill
1085: \> $\#\opt$ \> $\le$ \> $\min\{\#AA, \#UU\} + \min\{\#GG, \#CC\} + \#UA / 2 + \#GC / 2$ \\
1086: \> \> $=$ \> $h + n + 1 + (2m+2)$.
1087: \end{tabbing}
1088: \end{fact}
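
The closed form in the second line can be recovered from
Table~\ref{occ-2substrings}. The following check is a sketch that assumes
$m \ge n$ (otherwise $E$ trivially has no perfect matching), so that
$\#AA - \#UU = 3(n-m)d \le 0$ and $\min\{\#AA, \#UU\} = \#AA$. Note also that
$\#GG = \#CC$.
\begin{align*}
\#AA &= m(9nd - 12n - 6d + 6) + 6nd - 6n,\\
\#GG &= m(12d-4) + 12d - 4,\\
\#UA/2 &= m(6n-3) + 3n,\\
\#AA + \#GG + \#UA/2 &= m(9nd - 6n + 6d - 1) + 6nd - 3n + 12d - 4\\
 &= m\sigma + n(6d-4) + 12d - 5 + (n+1) = h + n + 1 .
\end{align*}
Adding $\#GC/2 = 2m+2$ gives $h + n + 1 + (2m+2)$.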
1089:
1090:
Note that $\opt$ may not pair all $AA$-substrings with $UU$-substrings.
Let $\diamondsuit AA$ be the number of $AA$-substrings that
are not paired in $\opt$. Again, we use the $\diamondsuit$ notation
for other types of 2-substrings.
1095: Fact~\ref{lem-interval-very-basic} can be strengthened as follows.
1096:
1097: \begin{fact} \label{lem-interval-basic}
1098: $\#\opt \le \min\{\#AA-\diamondsuit AA, \#UU-\diamondsuit UU\} +
1099: \min\{\#GG-\diamondsuit GG, \#CC-\diamondsuit CC\} +
1100: (\#UA-\diamondsuit UA)/2 + (\#GC-\diamondsuit GC) / 2$.
1101: \end{fact}
1102:
1103: The upper bound given in Fact \ref{lem-interval-basic} forms
1104: the basis of our proof for showing that $\#\opt < h$.
1105: In the following sections, we consider the possible structure of
1106: $\opt$. For each possible case, we show that the lower
1107: bounds for some $\diamondsuit$ values, such as
1108: $\diamondsuit AA$ and $\diamondsuit CC$, are sufficiently
large so that $\#\opt$ can be shown to be less than $h$.
1110: In particular, in one of the cases, we must make use of the fact
1111: that $E$ does not have a perfect matching in order to prove the
1112: lower bound for $\diamondsuit AA$, $\diamondsuit UA$, and $\diamondsuit
1113: UU$. We give some basic definitions and concepts in Section
1114: \ref{sec-only-if-definition}. The lower bounds and the
1115: proof are given in Section \ref{sec-only-if-proof}.
1116:
1117: \subsubsection{Definitions and concepts} \label{sec-only-if-definition}
1118: In this section, we give some definitions and concepts which are
1119: useful in deriving lower bounds for $\diamondsuit$ values.
1120: We first classify each region $S_j$ in $S_E$
1121: as either {\it open} or {\it closed} with
1122: respect to $\opt$. Then, extending the definitions of fragments and
1123: conjugates, we introduce {\it conjugate fragments} and
1124: {\it delimiter fragments}. Finally, we present a property
1125: of delimiter fragments in open regions.
1126:
1127: \paragraph{Open and closed regions:}
1128: With respect to $\opt$, a region
1129: $S_j$ in $S_E$ is said to be an {\it open region}
1130: if some $UU$, $AA$, or $UA$-substrings in $S_j$ are paired
1131: with some 2-substrings outside $S_j$;
1132: otherwise, it is a {\it closed region}.
1133:
1134: \begin{lemma} \label{lem-s_m+1}
1135: If $S_{m+1}$ is a closed region, then $\#\opt < h$.
1136: \end{lemma}
1137: \begin{proof}
1138: $S_{m+1}$ has $3nd$ more $AA$-substrings than $UU$-substrings.
1139: If $S_{m+1}$ is a closed region, these $3nd$ $AA$-substrings
1140: are not paired by $\opt$.
1141: Thus, $\diamondsuit AA \geq 3nd$.
By Fact~\ref{lem-interval-basic}, $\#\opt \le h+(n+1) + (2m+2) - 3nd < h$.
1143: \end{proof}
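
The two numerical claims in this proof can be verified as follows (a short
check using Table~\ref{occ-2substrings}, $n \ge 1$, and
$d > \max\{6n, 4(m+1)\}$):
\[
3n(2d-2) - 3n(d-2) = 3nd,
\qquad
3nd \ge 3d > 6n + 8(m+1) > (n+1) + (2m+2).
\]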
1144:
1145: %By Lemma \ref{lem-s_m+1}, it suffices to assume that
1146: %$S_{m+1}$ is an open region.
1147: Recall that $S_E$ is a sequence
1148: composed of $\delta$'s, $\overline{\delta}$'s,
1149: $\pi$'s, and $\overline{\pi}$'s.
Each $\delta(k)$ (respectively $\overline{\delta(k)}$) consists of
two substrings of the form $U^+ A^+$; each of these substrings
is called a {\em fragment}. Furthermore,
each $\pi(k)$ (respectively $\overline{\pi(k)}$) consists of
two substrings of the form $C^+$ (respectively $G^+$); each of these
substrings is also called a fragment.
1156:
1157: \paragraph{Conjugate fragments and delimiter fragments:}
1158: Consider any fragment $F$ in $S_E$.
1159: Another fragment $F'$ in $S_E$ is called a {\em conjugate fragment}\/
1160: of $F$ if $F'$ is the conjugate of $F$.
Note that if $F$ is a fragment of a certain $\delta(k)$ (respectively $\pi(k)$), then
1162: $F'$ appears only in some $\overline{\delta(k)}$ (respectively
1163: $\overline{\pi(k)}$),
1164: and vice versa.
1165: By construction, if $F$ is a fragment of some delimiter sequence
1166: $V_j$ or $W_j$, then
1167: $F$ has a unique conjugate fragment in $S_E$, which
1168: is located in $\overline{V_j}$ or $\overline{W_j}$, respectively.
However, if $F$ is a fragment of some non-delimiter sequence,
say, $\encode{x_i}$, then for every instance of $\encode{\overline{x_i}}$ in $S_E$,
$F$ has one conjugate fragment in $\encode{\overline{x_i}}$.
1172:
1173: A fragment $F$ is said to be {\em paired}\/ with
1174: its conjugate fragment $F'$ by $\opt$ if $\opt$ includes
1175: all the pairs of bases between $F$ and $F'$.
1176:
1177: For $1 \leq j \leq m+1$,
1178: the fragment $F$ in $V_j$ or $W_j$
1179: is called a {\it delimiter fragment}.
Note that every delimiter fragment $F$ is of
1181: the form $C^{2d+k}$ for $2d > k > 0$.
1182:
1183: The following lemma shows a property of delimiter fragments
1184: in open regions.
1185:
1186: \begin{lemma} \label{lem-delimiter-fragment}
1187: If $S_j$ is an open region, then both delimiter
1188: fragments of either $V_j$ or $W_j$
1189: must not pair with their conjugate fragments in $\opt$.
1190: \end{lemma}
1191: \begin{proof}
1192: We prove the statement by contradiction.
1193: Suppose one fragment of $V_j$ and one fragment of $W_j$
1194: are paired with their conjugate fragments.
1195: Let $(s_x, s_{x+1}; s_{y-1}, s_y)$ and $(s_{x'}, s_{x'+1}; s_{y'-1}, s_{y'})$
1196: be some particular stacking pairs in $V_j$ and $W_j$, respectively.
1197: Since $S_j$ is an open region,
1198: we can identify a stacking pair $(s_{x''}, s_{x''+1}; s_{y''-1}, s_{y''})$
1199: where $s_{x''} s_{x''+1}$ and $s_{y''-1} s_{y''}$
1200: are 2-substrings within and outside $S_j$, respectively.
1201: Note that these three stacking pairs form an interleaving block.
1202: By Lemma~\ref{interleavingblock}, ${\opt}$ is not planar,
1203: reaching a contradiction.
1204: \end{proof}
1205:
1206:
1207: \subsubsection{Proof of the only-if part} \label{sec-only-if-proof}
1208: By Lemma~\ref{lem-s_m+1}, it suffices to assume that
1209: $S_{m+1}$ is an open region.
1210: Before we give the proof of the only-if part, let us consider the
1211: following lemma.
1212:
1213: \begin{lemma} \label{lem-open-delimiter}
1214: Let $\alpha$ be the number of delimiter fragments that
1215: are not paired with their conjugate fragments.
1216: Then,
1217: $\diamondsuit CC + \diamondsuit GG \geq \alpha + (\#GC - \diamondsuit GC)$.
1218: \end{lemma}
1219: \begin{proof}
1220: By construction, a $GC$-substring
1221: must be next to the left end of a delimiter fragment $F$, which is
1222: of the form $C^+$.
1223: No other $GC$-substrings can exist. If this $GC$-substring is
1224: paired, the leftmost $CC$-substring of $F$
1225: must not be paired as there is no $GGC$ pattern in $S_E$.
1226: Thus, $F$ must be one of the $\alpha$ delimiter fragments
1227: that are not paired with their conjugate fragments.
1228: Based on this observation, we classify
1229: the $\alpha$ delimiter fragments into two groups:
(1) the $(\#GC - \diamondsuit GC)$ delimiter fragments whose
$GC$-substrings at the left end are paired; and
(2) the remaining $\alpha - (\#GC - \diamondsuit GC)$ delimiter fragments whose
$GC$-substrings at the left end are not paired.
1234:
1235: For each delimiter fragment $F = C^{2d+k}$ in group (1),
1236: since the $GC$-substring on the left of $F$ is paired,
1237: the leftmost $CC$-substring of $F$ must not be paired by $\opt$.
1238: For the remaining $2d+k-2$ $CC$-substrings,
1239: we either find a $CC$-substring which is not paired by $\opt$;
1240: or these $2d+k-2$ $CC$-substrings are paired to
1241: $GG$-substrings in some fragment $F' = G^{2d+k'}$ with $2d > k' > k$,
1242: and thus, some $GG$-substring of $F'$ is not paired.
1243: Therefore, each delimiter fragment in group (1) introduces
1244: either (i) two unpaired $CC$-substrings or
1245: (ii) one unpaired $CC$-substring and one unpaired $GG$-substring.
1246: Hence, the total number of unpaired $CC$ and $GG$-substrings due to
1247: delimiter fragments in group (1) $\geq 2 (\#GC - \diamondsuit GC)$.
1248:
1249: For each delimiter fragment $F = C^{2d+k}$ in group (2), consider
1250: the $CC$-substrings in $F$. With a similar argument, we can show
1251: that
1252: each delimiter fragment in group (2) introduces
1253: either (i) one unpaired $CC$-substring
1254: or (ii) one unpaired $GG$-substring.
1255: Hence, the total number of unpaired $CC$ and $GG$-substrings due to
1256: delimiter fragments in group (2) $\geq \alpha - (\#GC - \diamondsuit GC)$.
1257:
1258: In total, we have
1259: $\diamondsuit CC + \diamondsuit GG
1260: \geq \alpha + (\# GC - \diamondsuit GC)$.
1261: \end{proof}
1262:
1263: Now, we state a lemma which shows the
1264: lower bounds for some $\diamondsuit$ values in terms of
1265: the number of open regions in $\opt$.
1266:
1267: \begin{lemma} \label{diamondlowerbounds}
1268: Let $\ell \ge 1$ be the number of open regions in $\opt$.
1269:
1270: \vspace{3pt}
1271: \noindent
1272: (1) If $S_{m+1}$ is an open region, then $\diamondsuit UU \geq 3(m+1-\ell) d$.
1273:
1274: \vspace{3pt}
1275: \noindent
1276: (2) $\max \{ \diamondsuit CC, \diamondsuit GG \} \geq
1277: \ell + (\# GC - \diamondsuit GC) / 2$.
1278:
1279: \vspace{3pt}
1280: \noindent
1281: (3) If $\ell = n+1$, $S_{m+1}$ is an open region,
1282: and $E$ does not have a perfect matching,
1283: then either (a) $\diamondsuit UU \geq 3(m-n)d + 1$,
1284: (b) $\diamondsuit AA \geq 1$, or (c) $\diamondsuit UA \geq 2$.
1285: \end{lemma}
1286:
1287: \begin{proof}
1288:
1289: \noindent
1290: {\small \bf Statement 1.}
Within each closed region $S_j$ where $j \neq m+1$,
$3d$ $UU$-substrings cannot be paired in $\opt$.
1293: As there are $m+1-\ell$ such closed regions, $3(m+1-\ell)d$
1294: $UU$-substrings are not
1295: paired in $\opt$. Thus, $\diamondsuit UU \geq 3(m+1-\ell)d$.
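
As a brief check, the figure $3d$ comes from Table~\ref{occ-2substrings}:
in a region $S_j$ with $j \le m$, the $UU$-substrings outnumber the
$AA$-substrings by
\[
[3n(2d-2) + (3n-3)(d-2)] - [3n(d-2) + (3n-3)(2d-2)]
 = 3nd - (3n-3)d = 3d ,
\]
and, by the definition of a closed region, a $UU$-substring of such a region
can only be paired with an $AA$-substring of the same region.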
1296:
1297: \vspace{5pt}
1298: \noindent
1299: {\small \bf Statement 2.}
1300: By Lemma~\ref{lem-delimiter-fragment}, we can identify $2 \ell$ fragments
1301: in $V_j$ and $W_j$ of all open regions
1302: which are not paired with their conjugate fragments.
1303: Then, by Lemma \ref{lem-open-delimiter}, we have
1304: $\diamondsuit CC + \diamondsuit GG \geq 2\ell + (\# GC - \diamondsuit GC)$.
1305: Thus, $\max\{ \diamondsuit CC, \diamondsuit GG \} \geq
1306: \ell + (\# GC - \diamondsuit GC) / 2$.
1307:
1308: \vspace{5pt}
1309: \noindent
1310: {\small \bf Statement 3.}
1311: By a similar argument to the proof for Statement 1,
1312: within the $m+1-\ell = m-n$ closed regions,
1313: $3(m-n)d$ $UU$-substrings are not paired in $\opt$.
1314:
1315: For the $\ell = n+1$ open regions,
1316: one of them must be $S_{m+1}$.
1317: Let
1318: $S_{j_1}, \ldots, S_{j_n}$ be the remaining $n$ open regions.
1319: Recall that $e_{j_1}, \ldots, e_{j_n}$
1320: are the corresponding edges of these $n$ open regions.
1321: Since these $n$ edges cannot form a perfect matching,
some node, say $x_k$, is adjacent to these $n$
1323: edges more than once.
1324: Thus, within $S_{j_1}, \ldots, S_{j_n}, S_{m+1}$,
1325: we have more $\encode{x_k}$ than
1326: $\encode{\overline{x_k}}$.
1327: Therefore, at least two of the fragments in all $\encode{x_k}$
1328: are not paired
1329: with their conjugate fragments.
1330:
1331: Let $F$ be one of such fragments.
1332: Note that $F$ is of the form $U^d A^k$.
1333: Since $F$ is not paired with its conjugate fragment,
1334: one of the following three cases occurs in $\opt$:
1335:
1336: \vspace{3pt}
1337: \noindent
Case 1: A $UU$-substring of $F$ is not paired.
1339:
1340: \vspace{3pt}
1341: \noindent
1342: Case 2: An $AA$-substring of $F$ is not paired.
1343:
1344: \vspace{3pt}
1345: \noindent
Case 3: All $UU$-substrings and $AA$-substrings of $F$ are paired.
1347: In this case, $U^d$ of $F$ is paired with $A^d$ of a fragment
1348: $F' = U^{k'}A^d$;
1349: and $A^k$ of $F$ is paired with some substring $U^k$ of some fragment $F''$.
1350: As $F'$ and $F''$ are not the same fragment, the $UA$-substrings of both $F$
1351: and $F'$ are not paired.
1352:
1353: \vspace{3pt}
1354: In summary, we have either
1355: (1) $\diamondsuit UU \geq 3(m-n)d + 1$, or
1356: (2) $\diamondsuit AA \geq 1$, or
1357: (3) $\diamondsuit UA \geq 2$.
1358: \end{proof}
1359:
1360: Based on Lemma \ref{diamondlowerbounds}, we prove the only-if part
1361: by a case analysis in the following lemma.
1362:
1363: \begin{lemma}
If $E$ does not have a perfect matching,
1365: then $\# \opt < h$.
1366: \end{lemma}
1367: \begin{proof}
1368: Recall that if $S_{m+1}$ is a closed region, then
1369: $\#\opt < h$. Now, suppose that $S_{m+1}$ is an
1370: open region. We show
1371: $\# \opt < h$ in three cases $\ell < n+1$, $\ell > n+1$ and $\ell = n+1$.
1372:
1373: \vspace{5pt}
1374: \noindent
{\it Case 1:} $\ell < n+1$. By Lemma~\ref{diamondlowerbounds} (1),
$\diamondsuit UU \geq 3(m+1-\ell)d$.
By Fact~\ref{lem-interval-basic},
we can conclude that $\#\opt \leq h + n+1 + (2m+2) - 3(n+1-\ell)d
\leq h + n+1 + (2m+2) - 3d < h$.
1380:
1381: \vspace{5pt}
1382: \noindent
1383: {\it Case 2:} $\ell > n+1$. By Lemma~\ref{diamondlowerbounds} (2),
1384: $\max \{ \diamondsuit CC, \diamondsuit GG \} \geq \ell + (\# GC - \diamondsuit GC)/2$.
1385: By Fact~\ref{lem-interval-basic},
1386: $\#\opt \leq h + n + 1 - \ell$, which is smaller than $h$
1387: because $\ell > n+1$.
1388:
1389: \vspace{5pt}
1390: \noindent
1391: {\it Case 3:} $\ell = n+1$. By Lemma~\ref{diamondlowerbounds} (3),
1392: either
1393: (a) $\diamondsuit UU \geq 3(m-n)d + 1$, or
1394: (b) $\diamondsuit AA \geq 1$, or
1395: (c) $\diamondsuit UA \geq 2$.
1396: By Fact~\ref{lem-interval-basic},
1397: $\#\opt \leq h + n - \max \{ \diamondsuit CC, \diamondsuit GG \}
1398: + (\#GC - \diamondsuit GC) / 2$.
1399: By Lemma~\ref{diamondlowerbounds} (2),
1400: we have $\#\opt < h$.
1401: \end{proof}
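
The bounds used in Cases 2 and 3 can be unpacked as follows. This is a sketch
that assumes $m \ge n$, so that $\min\{\#AA, \#UU\} = \#AA$, and uses
$\#GG = \#CC$ together with $\#AA + \#GG + \#UA/2 = h + n + 1$, which follows
from Fact~\ref{lem-interval-very-basic} and $\#GC/2 = 2m+2$. Writing
$D = \max\{\diamondsuit CC, \diamondsuit GG\}$ and
$g = (\#GC - \diamondsuit GC)/2$ (symbols introduced here only for brevity),
Fact~\ref{lem-interval-basic} gives
\[
\#\opt \le \#AA + (\#GG - D) + \#UA/2 + g
 = h + n + 1 - D + g \le h + n + 1 - \ell ,
\]
where the last step is Lemma~\ref{diamondlowerbounds}(2). In Case 3, each of
(a), (b), and (c) lowers the contribution of the $AA$/$UU$ and $UA$ terms by
at least $1$, so
$\#\opt \le h + n - D + g \le h + n - \ell = h - 1$.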
1402:
We conclude that if $E$ does not have a perfect matching,
then $\#\opt < h$. Equivalently,
if $\#\opt \geq h$, then
$E$ has a perfect matching.
1407:
1408: \section{Conclusions}
1409: In this paper, we have studied the problem of predicting RNA secondary
1410: structures that allow arbitrary pseudoknots with a simple free
1411: energy function that is minimized when the number of stacking
1412: pairs is maximized. We have proved that this problem is NP-hard if the
1413: secondary structure is required to be planar. We conjecture that
1414: the problem is also NP-hard for the general case.
1415: We have also given two approximation algorithms for this problem with
1416: worst-case approximation ratios of 1/2 and 1/3 for planar and general
1417: secondary structures, respectively. It would be of interest to
1418: improve these approximation ratios.
1419:
Another direction is to study the problem using
an energy function that is minimized when the number of base pairs is
1422: maximized. It is known that this problem can be solved in cubic time
1423: if the secondary structure can be non-planar \cite{Nussinov:1978:ALM}.
1424: However, the computational complexity of the problem is still open if the
1425: secondary structure is required to be planar. We conjecture that
1426: the problem becomes NP-hard under this additional condition.
We would like to point out that the observation that has
enabled us to visualize the planarity of stacking pairs on a rectangular
grid does not hold in the case of maximizing base pairs.
1430:
1431: \bibliographystyle{plain}
1432: \bibliography{rnastruct}
1433:
1434: \end{document}
1435: