q-bio0410012/bm3.tex
1: \documentstyle[prl,aps,epsfig,multicol]{revtex}
2: \begin{document}
3: \title{\bf Exact Asymptotic Results for a Model of Sequence Alignment}
4: \author{Satya N. Majumdar $^{1,2}$ and Sergei Nechaev $^{2,3}$}
5: \address{\small \it $^1$Laboratoire de Physique Th\'eorique (UMR C5152 du CNRS), Universit\'e
6: Paul Sabatier, 31062 Toulouse Cedex. France \\ $^2$Laboratoire de Physique
7: Th\'eorique et Mod\`eles Statistiques, Universit\'e Paris-Sud. B\^at. 100. 91405
8: Orsay Cedex. France \\ $^3$L.D.Landau Institute for Theoretical Physics, 117334
9: Moscow. Russia}
10: \date{\today}
11: 
12: \maketitle
13: 
14: \begin{abstract}
15: Finding analytically the statistics of the longest common subsequence (LCS) of a
16: pair of random sequences drawn from an alphabet of $c$ letters is a challenging problem in
17: computational evolutionary biology. We present exact asymptotic results for the
18: distribution of the LCS in a simpler, yet nontrivial, variant of the original model
19: called the Bernoulli matching (BM) model which reduces to the original model in
20: the $c\to \infty$ limit. We show that in the BM model, for all $c$, the distribution
21: of the asymptotic length of the LCS, suitably scaled, is identical to the Tracy-Widom
22: distribution of the largest eigenvalue of a random matrix drawn
23: from the Gaussian unitary ensemble. In particular, in the $c\to \infty$ limit, this
24: provides an exact expression for the asymptotic length distribution in the original
25: LCS problem.
26: 
27: \noindent
28: 
29: \medskip\noindent {PACS numbers: 87.10.+e, 87.15.Cc, 02.50.-r, 05.40.-a}
30: \end{abstract}
31: 
32: 
33: \begin{multicols}{2}
34: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
35: %\section{Introduction}
36: 
37: Sequence alignment is one of the most useful quantitative methods used in
38: evolutionary molecular biology\cite{W1,Gusfield,DEKM}. The goal of an alignment
39: algorithm is to search for similarities in patterns in different sequences. A
40: classic and much studied alignment problem is the so-called `longest common
41: subsequence' (LCS) problem. The input to this problem is a pair of sequences
42: $\alpha=\{\alpha_1, \alpha_2,\dots, \alpha_i\}$ (of length $i$) and
43: $\beta=\{\beta_1, \beta_2,\dots, \beta_j\}$ (of length $j$). For example, $\alpha$
44: and $\beta$ can be two random sequences of the $4$ bases $A$, $C$, $G$, $T$ of
45: a DNA molecule, e.g., $\alpha=\{A, C, G, C, T, A, C\}$ and $\beta=\{C, T, G, A,
46: C\}$. A subsequence of $\alpha$ is an ordered sublist of $\alpha$ (entries of which
47: need not be consecutive in $\alpha$), e.g., $\{C, G, T, C\}$, but not $\{T, G, C\}$.
48: A common subsequence of two sequences $\alpha$ and $\beta$ is a subsequence of both
49: of them. For example, the subsequence $\{C, G, A, C\}$ is a common subsequence of
50: both $\alpha$ and $\beta$. There can be many possible common subsequences of a pair
51: of sequences. The aim of the LCS problem is to find the longest of such common
52: subsequences. This problem and its variants have been widely studied in
53: biology\cite{NW,SW,WGA,AGMML}, computer science\cite{SK,AG,WF,Gusfield}, probability
54: theory\cite{CS,Deken,Steele,DP,Alex,KLM} and more recently in statistical
55: physics\cite{ZM,Hwa,Monvel}. A particularly important application of the LCS problem
56: is to quantify the closeness between two DNA sequences. In evolutionary biology, the
57: genes responsible for building specific proteins evolve with time and by finding the
58: LCS of the same gene in different species, one can learn what has been conserved in
59: time. Also, when a new DNA molecule is sequenced {\it in vitro}, it is important to
60: know whether it is really new or already exists. This is assessed quantitatively
61: by computing the LCS of the new sequence with those already in the
62: database.
63: 
64: For a pair of fixed sequences of length $i$ and $j$ respectively, the length
65: $L_{i,j}$ of their LCS is just a number. However, in the stochastic version of the
66: LCS problem one compares two random sequences drawn from an alphabet of $c$ letters, and hence the
67: length $L_{i,j}$ is a random variable. A major challenge over the last three decades
68: has been to determine the statistics of $L_{i,j}$\cite{CS,Deken,Steele,DP,Alex}. For
69: equally long sequences ($i=j=n$), it has been proved that $\langle L_{n,n}\rangle
70: \approx \gamma_c n$ for $n\gg 1$, where the averaging is performed over all
71: realizations of the random sequences. The constant $\gamma_c$ is known as the
72: Chv\'atal-Sankoff constant which, to date, remains undetermined, though there exist
73: several bounds\cite{Deken,DP,Alex}, a conjecture due to Steele\cite{Steele} that
74: $\gamma_c=2/(1+\sqrt{c})$ and a recent proof\cite{KLM} that $\gamma_c\to 2/\sqrt{c}$
75: as $c\to \infty$. Unfortunately, no exact results are available for the finite size
76: corrections to the leading behavior of the average $\langle L_{n,n}\rangle$, for the
77: variance, and also for the full probability distribution of $L_{n,n}$. Thus, despite
78: tremendous analytical and numerical efforts, an exact solution of the random LCS
79: problem has, so far, remained elusive. Therefore it is important to find other
80: variants of this LCS problem that may be analytically tractable.
81: 
82: Computationally, the easiest way to determine the length $L_{i,j}$ of the LCS of two
83: arbitrary sequences of lengths $i$ and $j$ (in polynomial time $\sim O(ij)$) is to
84: use the recursive algorithm\cite{Gusfield,Monvel}
85: \begin{equation}
86: L_{i,j} = \max\left[L_{i-1,j}, L_{i,j-1}, L_{i-1,j-1} + \eta_{i,j}\right],
87: \label{recur1}
88: \end{equation}
89: subject to the initial conditions $L_{i,0}=L_{0,j}=L_{0,0}=0$. The variable
90: $\eta_{i,j}$ is either 1 when the characters at the positions $i$ (in the sequence
91: $\alpha$) and $j$ (in the sequence $\beta$) match each other, or 0 if they do not.
92: Note that the variables $\eta_{i,j}$ are not independent of each other. To see
93: this, consider a simple example: matching the two strings $\alpha={\rm AB}$ and
94: $\beta={\rm AA}$. One has by definition $\eta_{1,1}=\eta_{1,2}=1$ and
95: $\eta_{2,1}=0$. The knowledge of these three variables is sufficient to predict that
96: the last two letters will not match, i.e., $\eta_{2,2}=0$. Thus, $\eta_{2,2}$
97: cannot take its value independently of $\eta_{1,1},\,\eta_{1,2},\,\eta_{2,1}$. These
98: residual correlations between the $\eta_{i,j}$ variables make the LCS problem rather
99: complicated. Note however that for two random sequences drawn from an alphabet of $c$ letters,
100: these correlations between the $\eta_{i,j}$ variables vanish in the $c\to \infty$
101: limit.
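As an illustration of Eq. (\ref{recur1}) (a worked sketch that uses only the two-letter strings above), take $\alpha={\rm AB}$ and $\beta={\rm AA}$, for which $\eta_{1,1}=\eta_{1,2}=1$ and $\eta_{2,1}=\eta_{2,2}=0$. Iterating the recursion from the boundary values $L_{i,0}=L_{0,j}=0$ gives
\begin{eqnarray*}
L_{1,1} &=& \max[L_{0,1},\,L_{1,0},\,L_{0,0}+\eta_{1,1}]=1, \\
L_{1,2} &=& \max[L_{0,2},\,L_{1,1},\,L_{0,1}+\eta_{1,2}]=1, \\
L_{2,1} &=& \max[L_{1,1},\,L_{2,0},\,L_{1,0}+\eta_{2,1}]=1, \\
L_{2,2} &=& \max[L_{1,2},\,L_{2,1},\,L_{1,1}+\eta_{2,2}]=1,
\end{eqnarray*}
so that $L_{2,2}=1$, which is the length of the only nonempty common subsequence $\{{\rm A}\}$.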
102: 
103: A simpler but natural variant of this LCS problem is the Bernoulli matching (BM)
104: model where one ignores the correlations between $\eta_{i,j}$'s for all
105: $c$\cite{Monvel}. The BM model reduces to the original LCS problem only in the $c\to
106: \infty$ limit. The length $L_{i,j}^{BM}$ of the BM model satisfies the same
107: recursion relation in Eq. (\ref{recur1}) except that $\eta_{i,j}$'s are now
108: independent and each drawn from the bimodal distribution: $p(\eta)=
109: (1/c)\delta_{\eta,1}+ (1-1/c)\delta_{\eta,0}$. The BM model, though simpler than the
110: original LCS problem, is still nontrivial due to the nonlinear recursion relation in
111: Eq. (\ref{recur1}). Using the cavity method of spin glass physics\cite{MPV}, the
112: asymptotic behavior of the average length in the BM model was determined
113: analytically\cite{Monvel},
114: \begin{equation}
115: \langle L_{n,n}^{BM}\rangle  \approx \gamma_c^{BM} n
116: \label{bm1}
117: \end{equation}
118: where $\gamma_c^{BM}= 2/(1+\sqrt{c})$, the same as the conjectured value of the
119: Chv\'atal-Sankoff constant $\gamma_c$ for the original LCS model. However, other
120: properties such as the variance or the distribution of $L_{n,n}^{BM}$ remained
121: intractable even in the BM model.
122: 
123: The purpose of this Letter is to present an exact asymptotic formula for the
124: distribution of the length $L_{n,n}^{BM}$ in the BM model for all $c$. Our main
125: result is that for large $n$,
126: \begin{equation}
127: L_{n,n}^{BM}\to \gamma_c^{BM} n + f(c)\, n^{1/3}\, \chi \label{asymp1}
128: \end{equation}
129: where $\chi$ is a random variable with an $n$-independent distribution, ${\rm Prob}
130: (\chi\le x)= F_{\rm TW}(x)$, which is the well-studied Tracy-Widom distribution for
131: the largest eigenvalue of a random matrix drawn from the Gaussian unitary
132: ensemble\cite{TW}. For a detailed form of the function $F_{\rm TW}(x)$, see
133: \cite{TW}. We show that for all $c$,
134: \begin{equation}
135: f(c)=\frac{c^{1/6}(\sqrt{c}-1)^{1/3}}{\sqrt{c}+1}.
136: \label{fc1}
137: \end{equation}
138: This allows us to calculate the average including the subleading finite size
139: correction term and the variance of $L_{n,n}^{BM}$ for large $n$,
140: \begin{eqnarray}
141: \langle L_{n,n}^{BM}\rangle &\approx & \gamma_c^{BM} n + \left<\chi\right> f(c)
142: n^{1/3} \nonumber \\
143: {\rm Var}\, L_{n,n}^{BM} &\approx &
144: \left(\langle\chi^2\rangle-{\langle\chi\rangle}^2\right)\, f^2(c)\, n^{2/3},
145: \label{eq:expvar}
146: \end{eqnarray}
147: where one can use the known exact values\cite{TW}, $\langle \chi\rangle=
148: -1.7711\dots$ and $\langle \chi^2\rangle- {\langle \chi\rangle}^2= 0.8132\dots$. In
149: particular, we note that in the limit $c\to \infty$, Eqs.
150: (\ref{asymp1})-(\ref{eq:expvar}) provide
151: exact asymptotic results for the original LCS model as well.
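As an illustrative evaluation (the value of $c$ is chosen here only as an example), consider $c=4$, the size of the DNA alphabet. Then $\gamma_4^{BM}=2/3$ and $f(4)=2^{1/3}/3\approx 0.420$, so that Eq. (\ref{eq:expvar}) gives
\begin{eqnarray*}
\langle L_{n,n}^{BM}\rangle &\approx & 0.667\, n - 0.744\, n^{1/3}, \\
{\rm Var}\, L_{n,n}^{BM} &\approx & 0.143\, n^{2/3}.
\end{eqnarray*}
In the opposite limit $c\to \infty$, one has $\gamma_c^{BM}\to 2/\sqrt{c}$ and $f(c)\to c^{-1/6}$.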
152: 
153: In the BM model, the length $L_{i,j}^{BM}$ can be interpreted as the height of a
154: surface over the $2$-d $(i,j)$ plane constructed via the recursion relation in Eq.
155: (\ref{recur1}). A typical surface, shown in Fig. (1a), has terrace-like structures.
156: \begin{figure}[ht]
157: %\begin{center}
158: \centerline{\epsfig{file=bm_f1.eps,width=8cm}}
159: %\end{center}
160: \caption{Examples of (a) BM surface
161: $L_{i,j}^{BM}\equiv {\tilde h}(x,y)$ and (b) ADP surface $L_{i,j}^{ADP}\equiv
162: h(x,y)$.} \label{fig:1}
163: \end{figure}
164: 
165: It is useful to consider the projection of the level lines separating the adjacent
166: terraces whose heights differ by $1$ (see Fig.2) onto the $2$-d $(i,j)$ plane. Note
167: that, by the rule Eq. (\ref{recur1}), these level lines never overlap each other,
168: i.e., no two paths have any common edge. The statistical weight of such a projected
169: $2$-d configuration is the product of weights associated with the vertices of the
170: $2$-d plane. There are five types of possible vertices with nonzero weights as shown
171: in Fig.2, where $p=1/c$ and $q=1-p$. Since the level lines never cross each other,
172: the weight of the first vertex in Fig. (2) is 0.
173: %The height $L_{i,j}^{BM}$ at any point $(i,j)$ on this $2$-d plane is just the
174: %number of level lines that one crosses in going from the origin to $(i,j)$.
175: \begin{figure}[ht]
176: %\begin{center}
177: \centerline{\epsfig{file=bm_f2.eps,width=6.5cm}}
178: %\end{center}
179: \caption{Projected $2$-d level lines separating adjacent terraces of unit height
180: difference in the BM surface in Fig.(1a). The adjacent table shows the weights of
181: all vertices on the $2$-d plane.} \label{fig:2}
182: \end{figure}
183: 
184: Consider first the limit $c\to \infty$ (i.e., $p\to 0$). The weights of all allowed
185: vertices are $1$, except the ones shown by black dots in Fig.(2), whose associated
186: weights are $p\to 0$. The number $N$ of these black dots inside a rectangle of area
187: $A=ij$ can be easily estimated. For large $A$ and $p\to 0$, this number is Poisson
188: distributed with mean ${\overline N}= pA$. A Bethe ansatz analysis shows that
189: the BM model corresponds to the sector of the 5-vertex model\cite{Wu} where the density
190: $\alpha$ of empty edges in a row of vertical edges is close to the boundary
191: $\alpha\approx 1^{-}$. A careful examination of the free energy near this boundary
192: allows one to conclude that the leading contribution in $p$ (for $p\to 0$) to
193: ${\overline N}$ comes exactly from the line of phase transitions in the 5-vertex
194: model. The subleading corrections to ${\overline N}$ are of order $\sim p^{3/2}$ and
195: arise from small deviations from the critical line, which lie beyond the Poisson
196: approximation\cite{MN}.
197: 
198: The height $L_{i,j}^{BM}$ is just the number of level lines $\cal N$ inside this
199: rectangle of area $A=ij$. The problem of estimating $\cal N$ has recently appeared
200: in a number of interface models such as a polynuclear growth model\cite{PS} and a
201: ballistic deposition model\cite{BD}. By using a mapping to the longest increasing
202: subsequence (LIS) of the equally likely permutations of a set of integers and then,
203: by applying a celebrated result due to Baik, Deift and Johansson (BDJ)\cite{BDJ}, it
204: was shown\cite{PS,BD} that the number of level lines ${\cal N}$ inside the rectangle
205: (for large $A$), appropriately scaled, has a limiting behavior, ${\cal N}\to
206: 2\sqrt{\overline N} + {\overline N}^{1/6}\, \chi$, where $\chi$ is a random variable
207: with Tracy-Widom distribution. Using ${\overline N}=pA=ij/c$, one then obtains in
208: the limit $p\to 0$,
209: \begin{equation}
210: L_{i,j}^{BM}= {\cal N} \to \frac{2}{\sqrt c}\sqrt{ij} +
211: {\left( \frac{ij}{c}\right)}^{1/6}\, \chi.
212: \label{p01}
213: \end{equation}
214: In particular, for large equal length sequences $i=j=n$, we get for $c\to \infty$
215: \begin{equation}
216: L_{n,n}^{BM}\to \frac{2}{\sqrt{c}}\, n + c^{-1/6} \, n^{1/3}\, \chi .
217: \label{p02}
218: \end{equation}
219: Note that since the BM and the original LCS model are equivalent in the limit $c\to
220: \infty$, the exact results in Eqs. (\ref{p01})-(\ref{p02}) also hold for the LCS
221: model. Note that only the leading behavior of the average $\langle L_{n,n}\rangle$
222: was known before\cite{KLM} in the $c\to \infty$ limit of the original LCS model.
223: 
224: For finite $c$, while the above mapping to the LIS problem still works, the
225: corresponding permutations of the LIS problem are not generated with equal
226: probability and hence one can no longer use the BDJ results. To make progress for
227: finite $c$, we map the BM model exactly to a $3$-d anisotropic directed percolation
228: (ADP) model first considered by Rajesh and Dhar\cite{RD}. This ADP model can further
229: be mapped to a $(1+1)$-d directed polymer problem studied by
230: Johansson\cite{Johansson}. For this specific directed polymer problem, Johansson
231: derived an exact asymptotic result for the distribution of the polymer energy.
232: Translating these results back to the BM model, we derive our main results in Eqs.
233: (\ref{asymp1})-(\ref{eq:expvar}). Note that the recursion relation in Eq.
234: (\ref{recur1}) can also be viewed as a $(1+1)$-d directed polymer
235: problem\cite{Hwa,Monvel} and some asymptotic results (such as the $O(n^{2/3})$
236: behavior of the variance of $L_{n,n}$ for large $n$) can be obtained using the
237: arguments of universality\cite{Hwa}. However, this does not provide precise results
238: for the full distribution which are obtained here.
239: 
240: Let us consider directed bond percolation on a simple cubic lattice. The bonds are
241: occupied with probabilities $p_x$, $p_y$, and $p_z$ along the $x$, $y$ and $z$ axes
242: and are all directed towards increasing coordinates. Imagine a source of fluid at
243: the origin which spreads along the occupied directed bonds. The sites that get wet by the
244: fluid form a $3$-d cluster. In the ADP problem, the bond occupation probabilities are
245: anisotropic, $p_x=p_y=1$ (all bonds aligned along the $x$ and $y$ axes are occupied)
246: and $p_z=p$. Hence, if the point $(x,y,z)$ gets wet by the fluid then all the points
247: $(x',y', z)$ on the same plane with $x'\ge x$ and $y'\ge y$ also get wet. Such a wet
248: cluster is compact and can be characterized by its bounding surface height $h(x,y)$
249: as shown in Fig.(1b). It is not difficult to see that the height $h(x,y)$ satisfies
250: the following recursion relation\cite{RD},
251: \begin{equation}
252: h(x,y) = \max \left[ h(x-1,y), h(x, y-1)\right] + \xi_{x,y},
253: \label{recur2}
254: \end{equation}
255: where the $\xi_{x,y}$ are i.i.d. random variables taking nonnegative integer values
256: with ${\rm Prob}(\xi_{x,y}=k)= (1-p)\, p^k$ for $k=0,1,2,\dots$. One can also
257: interpret the height $h(x,y)$ in Eq. (\ref{recur2}) as the energy of a directed
258: polymer in the $(x,y)$ plane. Precisely this particular version of the polymer
259: problem was studied by Johansson\cite{Johansson} who obtained the asymptotic
260: distribution of the height for large $x$ and $y$,
261: \begin{eqnarray}
262: h(x,y) &\to& \frac{2\sqrt{pxy}+p(x+y)}{q} \nonumber \\
263:        &+&   \frac{(pxy)^{1/6}}{q}\,\left[(1+p)+\sqrt{\frac{p}{xy}}\,(x+y)\right]^{2/3}
264:        \, \chi,
265: \label{j1}
266: \end{eqnarray}
267: where $q=1-p$ and $\chi$ is a random variable with a Tracy-Widom distribution.
268: 
269: While the terrace-like structures of the ADP surface look similar to the BM surfaces
270: (compare Figs.(1a) and (1b)), there is an important difference between the two. In
271: the ADP model, the level lines separating two adjacent terraces can overlap with
272: each other\cite{RD}, which does not happen in the BM model. However, by making the
273: following change of coordinates in the ADP model\cite{RD}
274: \begin{equation}
275: \zeta= x+ h(x,y); \,\,\, \eta=y+ h(x,y)
276: \label{ct1}
277: \end{equation}
278: one gets a configuration of the surface where the level lines no longer overlap.
279: Moreover, it is not difficult to show that the projected $2$-d configuration of
280: level lines of this shifted ADP surface has exactly the same statistical weight as
281: the projected $2$-d configuration of the BM surface. Denoting the BM height by
282: ${\tilde h}(x,y)= L_{x,y}^{BM}$, one then has the identity, ${\tilde h}(\zeta,
283: \eta)= h(x,y)$, which holds for each configuration. Using Eq. (\ref{ct1}), one can
284: rewrite this identity as
285: \begin{equation}
286: {\tilde h}(\zeta, \eta)= h\left( \zeta- {\tilde h}(\zeta, \eta),
287: \eta- {\tilde h}(\zeta, \eta)\right).
288: \label{conv1}
289: \end{equation}
290: Thus, for any given height function $h(x,y)$ of the ADP model, one can, in
291: principle, obtain the corresponding height function ${\tilde h}(x,y)$ for all
292: $(x,y)$ of the BM model by solving the nonlinear equation (\ref{conv1}). This is
293: however very difficult in practice. Fortunately, one can make progress for large
294: $(x,y)$ where one can replace the integer valued discrete heights by continuous
295: functions $h(x,y)$ and ${\tilde h}(x,y)$. Using the notation $\partial_x\equiv
296: \partial/{\partial x}$ it is easy to derive from Eq. (\ref{ct1}) the following pair
297: of identities,
298: \begin{equation}
299: \partial_x h = \frac{\partial_{\zeta} {\tilde h}}{1-\partial_{\zeta}
300: {\tilde h}-\partial_{\eta} {\tilde h}};
301: \,\,\,
302: \partial_y h = \frac{\partial_{\eta} {\tilde h}}{1-\partial_{\zeta}
303: {\tilde h}-\partial_{\eta} {\tilde h}}.
304: \label{der1}
305: \end{equation}
306: In a similar way, one can show that
307: \begin{equation}
308: \partial_{\zeta} {\tilde h} = \frac{\partial_x h}{1+\partial_x h+\partial_y h};\,\,\,
309: \partial_{\eta} {\tilde h} = \frac{\partial_y h}{1+\partial_x h+\partial_y h}.
310: \label{der2}
311: \end{equation}
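A sketch of the intermediate step, treating $h$ and ${\tilde h}$ as smooth functions: differentiating the identity ${\tilde h}(\zeta,\eta)=h(x,y)$ with respect to $x$ and $y$, and using Eq. (\ref{ct1}) for $\zeta$ and $\eta$, gives
\begin{eqnarray*}
\partial_x h &=& (1+\partial_x h)\,\partial_{\zeta}{\tilde h}
+ \partial_x h\,\partial_{\eta}{\tilde h}, \\
\partial_y h &=& \partial_y h\,\partial_{\zeta}{\tilde h}
+ (1+\partial_y h)\,\partial_{\eta}{\tilde h}.
\end{eqnarray*}
Solving each line for the derivative of $h$ yields Eq. (\ref{der1}). Adding the two relations in Eq. (\ref{der1}) gives $1+\partial_x h+\partial_y h= 1/(1-\partial_{\zeta}{\tilde h}-\partial_{\eta}{\tilde h})$, from which Eq. (\ref{der2}) follows.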
312: We then observe that Eqs. (\ref{der1}) and (\ref{der2}) are invariant under the
313: simultaneous transformations
314: \begin{equation}
315: \zeta\to -x ; \,\, \eta\to -y; \,\, \tilde h \to h \, .
316: \label{invar1}
317: \end{equation}
318: Since the height is built up by integrating the derivatives, this leads to a simple
319: result for large $\zeta$ and $\eta$,
320: \begin{equation}
321: {\tilde h}(\zeta, \eta) = h(-\zeta, -\eta).
322: \label{res1}
323: \end{equation}
324: Thus, if we know exactly the functional form of the ADP surface $h(x,y)$, then the
325: functional form of the BM surface ${\tilde h}(x,y)$ for large $x$ and $y$ is simply
326: obtained by ${\tilde h}(x,y)=h(-x,-y)$. Changing $x\to -x$ and $y\to -y$ in
327: Johansson's expression for the ADP surface in Eq. (\ref{j1}) we thus arrive at our
328: main asymptotic result for the BM model
329: \begin{eqnarray}
330: L_{x,y}^{BM}&=& {\tilde h}(x,y) \to \frac{2\sqrt{pxy}-p(x+y)}{q} \nonumber \\
331: &+&\frac{(pxy)^{1/6}}{q}\,\left[(1+p)-\sqrt{\frac{p}{xy}}\,(x+y)\right]^{2/3} \,
332: \chi, \label{res2}
333: \end{eqnarray}
334: where $p=1/c$ and $q=1-1/c$. For equal length sequences $x=y=n$, Eq. (\ref{res2})
335: then reduces to Eq. (\ref{asymp1}).
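As a short check of this reduction, set $x=y=n$ in Eq. (\ref{res2}) and use $q=(1-\sqrt{p})(1+\sqrt{p})$. The leading term becomes
\[
\frac{2\sqrt{p}\,(1-\sqrt{p})}{q}\, n = \frac{2\sqrt{p}}{1+\sqrt{p}}\, n
= \frac{2}{1+\sqrt{c}}\, n,
\]
while the bracket reduces to $(1+p)-2\sqrt{p}=(1-\sqrt{p})^2$, so that the coefficient of $n^{1/3}\,\chi$ is
\[
\frac{p^{1/6}(1-\sqrt{p})^{4/3}}{q} = \frac{p^{1/6}(1-\sqrt{p})^{1/3}}{1+\sqrt{p}}
= \frac{c^{1/6}(\sqrt{c}-1)^{1/3}}{\sqrt{c}+1},
\]
in agreement with Eqs. (\ref{bm1}) and (\ref{fc1}).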
336: 
337: To check the consistency of our asymptotic results, we further computed the
338: difference between the left- and the right-hand sides of Eq. (\ref{conv1}),
339: \begin{equation}
340: \Delta h (\zeta, \eta)= {\tilde h}(\zeta, \eta)- h\left( \zeta- {\tilde h}(\zeta,
341: \eta), \eta- {\tilde h}(\zeta, \eta)\right), \label{conv2}
342: \end{equation}
343: with the functions $h(x,y)$ and ${\tilde h}(x,y)$ given respectively by Eqs.
344: (\ref{j1}) and (\ref{res2}). For large $\zeta=\eta$ one gets
345: \begin{equation}
346: \Delta h(\zeta,\zeta) \to \frac{p^{1/3}\chi^2}{3 (1-\sqrt{p})^{4/3}}\,
347: {\zeta}^{-1/3} . \label{cons1}
348: \end{equation}
349: Thus the discrepancy falls off as a power law for large $\zeta$, indicating that
350: indeed our solution is asymptotically exact. We have also performed numerical
351: simulations of the BM model using the recursion relation in Eq. (\ref{recur1}) for
352: $c=2,\,4,\,9,\,16,\,100$. Our preliminary results\cite{MN} for relatively small
353: system sizes (up to $n=5000$) are consistent with our exact results in Eqs.
354: (\ref{asymp1})-(\ref{eq:expvar}).
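At leading order the consistency check can also be done by hand (a sketch that keeps only the deterministic terms). For $x=y$, Eq. (\ref{j1}) gives $h(x,x)\simeq 2\sqrt{p}\,x/(1-\sqrt{p})$, and Eq. (\ref{res2}) gives ${\tilde h}(\zeta,\zeta)\simeq 2\sqrt{p}\,\zeta/(1+\sqrt{p})$. Hence
\[
\zeta-{\tilde h}(\zeta,\zeta) \simeq \frac{1-\sqrt{p}}{1+\sqrt{p}}\,\zeta ,
\]
and inserting this argument into $h$ gives
\[
\frac{2\sqrt{p}}{1-\sqrt{p}}\,\frac{1-\sqrt{p}}{1+\sqrt{p}}\,\zeta
= \frac{2\sqrt{p}}{1+\sqrt{p}}\,\zeta \simeq {\tilde h}(\zeta,\zeta),
\]
so that the $O(\zeta)$ part of $\Delta h$ in Eq. (\ref{conv2}) vanishes identically.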
355: 
356: The Tracy-Widom distribution of random matrix theory has appeared recently in a
357: number of problems\cite{TW,AD,Johansson,PS,BD}. In this Letter, we have shown that
358: it also describes the asymptotic distribution of the length of the longest common
359: subsequence in a sequence matching problem. While a possible link
360: between the two problems was speculated before\cite{AD}, a precise
361: connection, so far, was missing and is provided here.
362: 
363: \vspace*{-0.3cm}
364: 
365: \begin{references}
366: 
367: \vspace*{-1.2cm}
368: 
369: \bibitem{W1} M.S. Waterman, {\em Introduction to Computational Biology} (Chapman \& Hall,
370: London, 1994).
371: 
372: \bibitem{Gusfield} D. Gusfield, {\em Algorithms on Strings, Trees, and Sequences} (Cambridge
373: University Press, Cambridge, 1997).
374: 
375: \bibitem{DEKM} R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, {\em Biological Sequence
376: Analysis} (Cambridge University Press, Cambridge, 1998).
377: 
378: \bibitem{NW} S.B. Needleman and C.D. Wunsch, J. Mol. Biol. {\bf 48}, 443 (1970).
379: 
380: \bibitem{SW} T.F. Smith and M.S. Waterman, J. Mol. Biol. {\bf 147}, 195 (1981); Adv. Appl.
381: Math. {\bf 2}, 482 (1981).
382: 
383: \bibitem{WGA} M.S. Waterman, L. Gordon, and R. Arratia, Proc. Natl. Acad. Sci. USA,
384: {\bf 84}, 1239 (1987).
385: 
386: \bibitem{AGMML} S.F. Altschul et al., J. Mol. Biol. {\bf 215}, 403 (1990).
387: 
388: \bibitem{SK} D. Sankoff and J. Kruskal, {\em Time Warps, String Edits, and Macromolecules:
389: The theory and practice of sequence comparison} (Addison Wesley, Reading, Massachusetts,
390: 1983).
391: 
392: \bibitem{AG} A. Apostolico and C. Guerra, Algorithmica, {\bf 2}, 315 (1987).
393: 
394: \bibitem{WF} R. Wagner and M. Fischer, J. Assoc. Comput. Mach. {\bf 21}, 168 (1974).
395: 
396: \bibitem{CS} V. Chv\'atal and D. Sankoff, J. Appl. Probab. {\bf 12}, 306 (1975).
397: 
398: \bibitem{Deken} J. Deken, Discrete Math. {\bf 26}, 17 (1979).
399: 
400: \bibitem{Steele} J.M. Steele, SIAM J. Appl. Math. {\bf 42}, 731 (1982).
401: 
402: \bibitem{DP} V. Dancik and M. Paterson, in STACS94, Lecture Notes in Computer Science, {\bf
403: 775}, 306 (Springer, New York, 1994).
404: 
405: \bibitem{Alex} K.S. Alexander, Ann. Appl. Probab. {\bf 4}, 1074 (1994).
406: 
407: \bibitem{KLM} M. Kiwi, M. Loebl, and J. Matousek, math.CO/0308234.
408: 
409: \bibitem{ZM} M. Zhang and T. Marr, J. Theor. Biol. {\bf 174}, 119 (1995).
410: 
411: \bibitem{Hwa} T. Hwa and M. L\"assig, Phys. Rev. Lett. {\bf 76}, 2591 (1996); R. Bundschuh
412: and T. Hwa, Discrete Appl. Math. {\bf 104}, 113 (2000).
413: 
414: \bibitem{Monvel} J. Boutet de Monvel, European Phys. J. B {\bf 7}, 293 (1999); Phys. Rev. E
415: {\bf 62}, 204 (2000).
416: 
417: \bibitem{MPV} M. M\'ezard, G. Parisi, and M.A. Virasoro, eds., {\em Spin Glass Theory
418: and Beyond} (World Scientific, Singapore, 1987).
419: 
420: \bibitem{TW} C.A. Tracy and H. Widom, Comm. Math. Phys. {\bf 159}, 151 (1994); see also
421: Proc. of ICM, Beijing, Vol. I, 587 (2002).
422: 
423: \bibitem{Wu} H.Y. Huang, F.Y. Wu, H. Kunz, and D. Kim, Physica A {\bf 228}, 1 (1996).
424: 
425: \bibitem{MN} S.N. Majumdar and S. Nechaev, unpublished.
426: 
427: \bibitem{PS} M. Praehofer and H. Spohn, Phys. Rev. Lett. {\bf 84}, 4882 (2000); Physica
428: A, {\bf 279}, 342 (2000).
429: 
430: \bibitem{BD} S.N. Majumdar and S. Nechaev, Phys. Rev. E {\bf 69}, 011103 (2004).
431: 
432: \bibitem{BDJ} J. Baik, P. Deift, and K. Johansson, J. Amer. Math. Soc. {\bf 12}, 1119 (1999).
433: 
434: \bibitem{RD} R. Rajesh and D. Dhar, Phys. Rev. Lett. {\bf 81}, 1646 (1998).
435: 
436: \bibitem{Johansson} K. Johansson, Comm. Math. Phys. {\bf 209}, 437 (2000).
437: 
438: \bibitem{AD} D. Aldous and P. Diaconis, Bull. Amer. Math. Soc. {\bf 36}, 413 (1999).
439: 
440: 
441: \end{references}
442: \end{multicols}
443: 
444: 
445: 
446: \end{document}
447: