
1: \documentstyle[prl,aps,epsfig,multicol]{revtex}

2: \begin{document}

3: \title{\bf Exact Asymptotic Results for a Model of Sequence Alignment}

4: \author{Satya N. Majumdar $^{1,2}$ and Sergei Nechaev $^{2,3}$}

5: \address{\small \it $^1$Laboratoire de Physique Th\'eorique (UMR C5152 du CNRS), Universit\'e

6: Paul Sabatier, 31062 Toulouse Cedex. France \\ $^2$Laboratoire de Physique

7: Th\'eorique et Mod\`eles Statistiques, Universit\'e Paris-Sud. B\^at. 100. 91405

8: Orsay Cedex. France \\ $^3$L.D.Landau Institute for Theoretical Physics, 117334

9: Moscow. Russia}

10: \date{\today}

11:

12: \maketitle

13:

14: \begin{abstract}

15: Finding analytically the statistics of the longest common subsequence (LCS) of a

16: pair of random sequences drawn from an alphabet of $c$ letters is a challenging problem in

17: computational evolutionary biology. We present exact asymptotic results for the

18: distribution of the LCS in a simpler, yet nontrivial, variant of the original model

19: called the Bernoulli matching (BM) model which reduces to the original model in

20: the $c\to \infty$ limit. We show that in the BM model, for all $c$, the distribution

21: of the asymptotic length of the LCS, suitably scaled, is identical to the Tracy-Widom

22: distribution of the largest eigenvalue of a random matrix drawn

23: from the Gaussian unitary ensemble. In particular, in the $c\to \infty$ limit, this

24: provides an exact expression for the asymptotic length distribution in the original

25: LCS problem.

26:

27: \noindent

28:

29: \medskip\noindent {PACS numbers: 87.10.+e, 87.15.Cc, 02.50.-r, 05.40.-a}

30: \end{abstract}

31:

32:

33: \begin{multicols}{2}

34: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

35: %\section{Introduction}

36:

37: Sequence alignment is one of the most useful quantitative methods used in

38: evolutionary molecular biology\cite{W1,Gusfield,DEKM}. The goal of an alignment

39: algorithm is to search for similarities in patterns in different sequences. A

40: classic and much-studied alignment problem is the so-called `longest common

41: subsequence' (LCS) problem. The input to this problem is a pair of sequences

42: $\alpha=\{\alpha_1, \alpha_2,\dots, \alpha_i\}$ (of length $i$) and

43: $\beta=\{\beta_1, \beta_2,\dots, \beta_j\}$ (of length $j$). For example, $\alpha$

44: and $\beta$ can be two random sequences of the $4$ bases $A$, $C$, $G$, $T$ of

45: a DNA molecule, e.g., $\alpha=\{A, C, G, C, T, A, C\}$ and $\beta=\{C, T, G, A,

46: C\}$. A subsequence of $\alpha$ is an ordered sublist of $\alpha$ (entries of which

47: need not be consecutive in $\alpha$), e.g., $\{C, G, T, C\}$, but not $\{T, G, C\}$.

48: A common subsequence of two sequences $\alpha$ and $\beta$ is a subsequence of both

49: of them. For example, the subsequence $\{C, G, A, C\}$ is a common subsequence of

50: both $\alpha$ and $\beta$. There can be many possible common subsequences of a pair

51: of sequences. The aim of the LCS problem is to find the longest of such common

52: subsequences. This problem and its variants have been widely studied in

53: biology\cite{NW,SW,WGA,AGMML}, computer science\cite{SK,AG,WF,Gusfield}, probability

54: theory\cite{CS,Deken,Steele,DP,Alex,KLM} and more recently in statistical

55: physics\cite{ZM,Hwa,Monvel}. A particularly important application of the LCS problem

56: is to quantify the closeness between two DNA sequences. In evolutionary biology, the

57: genes responsible for building specific proteins evolve with time and by finding the

58: LCS of the same gene in different species, one can learn what has been conserved in

59: time. Also, when a new DNA molecule is sequenced {\it in vitro}, it is important to

60: know whether it is really new or already exists. This is assessed quantitatively

61: by measuring the LCS of the new molecule with sequences already present in the

62: database.

63:

64: For a pair of fixed sequences of lengths $i$ and $j$, respectively, the length

65: $L_{i,j}$ of their LCS is just a number. However, in the stochastic version of the

66: LCS problem one compares two random sequences drawn from an alphabet of $c$ letters, and hence the

67: length $L_{i,j}$ is a random variable. A major challenge over the last three decades

68: has been to determine the statistics of $L_{i,j}$\cite{CS,Deken,Steele,DP,Alex}. For

69: equally long sequences ($i=j=n$), it has been proved that $\langle L_{n,n}\rangle

70: \approx \gamma_c n$ for $n\gg 1$, where the averaging is performed over all

71: realizations of the random sequences. The constant $\gamma_c$ is known as the

72: Chv\'atal-Sankoff constant which, to date, remains undetermined, though there exist

73: several bounds\cite{Deken,DP,Alex}, a conjecture due to Steele\cite{Steele} that

74: $\gamma_c=2/(1+\sqrt{c})$ and a recent proof\cite{KLM} that $\gamma_c\to 2/\sqrt{c}$

75: as $c\to \infty$. Unfortunately, no exact results are available for the finite size

76: corrections to the leading behavior of the average $\langle L_{n,n}\rangle$, for the

77: variance, and also for the full probability distribution of $L_{n,n}$. Thus, despite

78: tremendous analytical and numerical efforts, an exact solution of the random LCS

79: problem has, so far, remained elusive. Therefore it is important to find other

80: variants of this LCS problem that may be analytically tractable.

81:

82: Computationally, the easiest way to determine the length $L_{i,j}$ of the LCS of two

83: arbitrary sequences of lengths $i$ and $j$ (in polynomial time $\sim O(ij)$) is via

84: the recursive algorithm\cite{Gusfield,Monvel}

85: \begin{equation}

86: L_{i,j} = \max\left[L_{i-1,j}, L_{i,j-1}, L_{i-1,j-1} + \eta_{i,j}\right],

87: \label{recur1}

88: \end{equation}

89: subject to the initial conditions $L_{i,0}=L_{0,j}=L_{0,0}=0$. The variable

90: $\eta_{i,j}$ is either 1 when the characters at the positions $i$ (in the sequence

91: $\alpha$) and $j$ (in the sequence $\beta$) match each other, or 0 if they do not.

92: Note that the variables $\eta_{i,j}$ are not independent of each other. To see

93: this, consider the simple example of matching two strings $\alpha={\rm AB}$ and

94: $\beta={\rm AA}$. One has by definition: $\eta_{1,1}=\eta_{1,2}=1$ and

95: $\eta_{2,1}=0$. The knowledge of these three variables is sufficient to predict that

96: the last two letters will not match, i.e., $\eta_{2,2}=0$. Thus, $\eta_{2,2}$ cannot

97: take its value independently of $\eta_{1,1},\,\eta_{1,2},\,\eta_{2,1}$. These

98: residual correlations between the $\eta_{i,j}$ variables make the LCS problem rather

99: complicated. Note, however, that for two random sequences drawn from an alphabet of $c$ letters,

100: these correlations between the $\eta_{i,j}$ variables vanish in the $c\to \infty$

101: limit.
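
As a concrete illustration of Eq. (\ref{recur1}), the following minimal Python sketch fills the table $L_{i,j}$ row by row for two fixed strings. The function name \texttt{lcs\_length} is a hypothetical helper introduced only for this illustration; the inputs are the example sequences $\alpha$ and $\beta$ quoted above and the strings AB and AA.

```python
def lcs_length(alpha, beta):
    """Length of the longest common subsequence via the recursion (recur1)."""
    # prev[j] holds L_{i-1,j}; the boundary row is L_{0,j} = 0.
    prev = [0] * (len(beta) + 1)
    for a in alpha:
        curr = [0]  # boundary value L_{i,0} = 0
        for j, b in enumerate(beta, start=1):
            eta = 1 if a == b else 0  # match variable eta_{i,j}
            curr.append(max(prev[j], curr[j - 1], prev[j - 1] + eta))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Example sequences from the text; C, G, A, C is one longest common subsequence.
    print(lcs_length("ACGCTAC", "CTGAC"))  # 4
    # The strings AB and AA used to illustrate the correlated eta variables.
    print(lcs_length("AB", "AA"))  # 1
```

Only two rows of the table are kept in memory, so the cost is $O(ij)$ in time and $O(j)$ in space.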

102:

103: A simpler but natural variant of this LCS problem is the Bernoulli matching (BM)

104: model where one ignores the correlations between $\eta_{i,j}$'s for all

105: $c$\cite{Monvel}. The BM model reduces to the original LCS problem only in the $c\to

106: \infty$ limit. The length $L_{i,j}^{BM}$ of the BM model satisfies the same

107: recursion relation in Eq. (\ref{recur1}) except that $\eta_{i,j}$'s are now

108: independent and each drawn from the bimodal distribution: $p(\eta)=

109: (1/c)\delta_{\eta,1}+ (1-1/c)\delta_{\eta,0}$. The BM model, though simpler than the

110: original LCS problem, is still nontrivial due to the nonlinear recursion relation in

111: Eq. (\ref{recur1}). Using the cavity method of spin glass physics\cite{MPV}, the

112: asymptotic behavior of the average length in the BM model was determined

113: analytically\cite{Monvel},

114: \begin{equation}

115: \langle L_{n,n}^{BM}\rangle  \approx \gamma_c^{BM} n

116: \label{bm1}

117: \end{equation}

118: where $\gamma_c^{BM}= 2/(1+\sqrt{c})$, the same as the conjectured value of the

119: Chv\'atal-Sankoff constant $\gamma_c$ for the original LCS model. However, other

120: properties such as the variance or the distribution of $L_{n,n}^{BM}$ remained

121: intractable even in the BM model.
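
The BM model is straightforward to sample numerically. The sketch below (an illustration under our own assumptions, not the simulation code referred to later in the text) draws independent $\eta_{i,j}$ with ${\rm Prob}(\eta_{i,j}=1)=1/c$, iterates Eq. (\ref{recur1}), and compares the sample mean of $L_{n,n}^{BM}/n$ with $\gamma_c^{BM}$ of Eq. (\ref{bm1}). The names \texttt{bm\_length} and \texttt{mean\_ratio}, the seed, and the sizes used are hypothetical choices.

```python
import random


def bm_length(n, c, rng):
    """One sample of L_{n,n}^{BM}: recursion (recur1) with i.i.d. eta_{i,j}."""
    p = 1.0 / c  # probability that eta_{i,j} = 1
    prev = [0] * (n + 1)  # row L_{i-1,.}, starting from L_{0,j} = 0
    for _ in range(n):
        curr = [0]  # L_{i,0} = 0
        for j in range(1, n + 1):
            eta = 1 if rng.random() < p else 0
            curr.append(max(prev[j], curr[j - 1], prev[j - 1] + eta))
        prev = curr
    return prev[n]


def mean_ratio(n, c, samples, seed=0):
    """Sample mean of L_{n,n}^{BM} / n over independent realizations."""
    rng = random.Random(seed)
    return sum(bm_length(n, c, rng) for _ in range(samples)) / (samples * n)


if __name__ == "__main__":
    c = 4
    print("simulated ratio:", mean_ratio(n=200, c=c, samples=20))
    print("gamma_c^BM     :", 2.0 / (1.0 + c ** 0.5))
```

At such modest $n$ the simulated ratio is expected to lie somewhat below $\gamma_c^{BM}$, since Eq. (\ref{bm1}) captures only the leading behavior.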

122:

123: The purpose of this Letter is to present an exact asymptotic formula for the

124: distribution of the length $L_{n,n}^{BM}$ in the BM model for all $c$. Our main

125: result is that for large $n$,

126: \begin{equation}

127: L_{n,n}^{BM}\to \gamma_c^{BM} n + f(c)\, n^{1/3}\, \chi \label{asymp1}

128: \end{equation}

129: where $\chi$ is a random variable with an $n$-independent distribution, ${\rm Prob}

130: (\chi\le x)= F_{\rm TW}(x)$ which is the well studied Tracy-Widom distribution for

131: the largest eigenvalue of a random matrix drawn from the Gaussian unitary

132: ensemble\cite{TW}. For a detailed form of the function $F_{\rm TW}(x)$, see

133: \cite{TW}. We show that for all $c$,

134: \begin{equation}

135: f(c)=\frac{c^{1/6}(\sqrt{c}-1)^{1/3}}{\sqrt{c}+1}.

136: \label{fc1}

137: \end{equation}

138: This allows us to calculate the average including the subleading finite size

139: correction term and the variance of $L_{n,n}^{BM}$ for large $n$,

140: \begin{eqnarray}

141: \langle L_{n,n}^{BM}\rangle &\approx & \gamma_c^{BM} n + \left<\chi\right> f(c)

142: n^{1/3} \nonumber \\

143: {\rm Var}\, L_{n,n}^{BM} &\approx &

144: \left(\langle\chi^2\rangle-{\langle\chi\rangle}^2\right)\, f^2(c)\, n^{2/3},

145: \label{eq:expvar}

146: \end{eqnarray}

147: where one can use the known exact values\cite{TW}, $\langle \chi\rangle=

148: -1.7711\dots$ and $\langle \chi^2\rangle- {\langle \chi\rangle}^2= 0.8132\dots$. In

149: particular, we note that in the limit $c\to \infty$, Eqs.

150: (\ref{asymp1})-(\ref{eq:expvar}) provide

151: exact asymptotic results for the original LCS model as well.
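
As an illustrative evaluation of Eqs. (\ref{fc1}) and (\ref{eq:expvar}), take $c=4$ (a value chosen here only as an example). Then $\gamma_4^{BM}=2/3$ and $f(4)=4^{1/6}/3\approx 0.4200$, so that with the quoted values of $\langle\chi\rangle$ and of the variance of $\chi$,
\begin{eqnarray}
\langle L_{n,n}^{BM}\rangle &\approx & 0.6667\, n - 0.7438\, n^{1/3} \nonumber \\
{\rm Var}\, L_{n,n}^{BM} &\approx & 0.1434\, n^{2/3}. \nonumber
\end{eqnarray}
The numerical coefficients are $1.7711\times 0.4200$ and $0.8132\times (0.4200)^2$, rounded to four digits.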

152:

153: In the BM model, the length $L_{i,j}^{BM}$ can be interpreted as the height of a

154: surface over the $2$-d $(i,j)$ plane constructed via the recursion relation in Eq.

155: (\ref{recur1}). A typical surface, shown in Fig. (1a), has terrace-like structures.

156: \begin{figure}[ht]

157: %\begin{center}

158: \centerline{\epsfig{file=bm_f1.eps,width=8cm}}

159: %\end{center}

160: \caption{Examples of (a) BM surface

161: $L_{i,j}^{BM}\equiv {\tilde h}(x,y)$ and (b) ADP surface $L_{i,j}^{ADP}\equiv

162: h(x,y)$.} \label{fig:1}

163: \end{figure}

164:

165: It is useful to consider the projection of the level lines separating the adjacent

166: terraces whose heights differ by $1$ (see Fig.2) onto the $2$-d $(i,j)$ plane. Note

167: that, by the rule Eq. (\ref{recur1}), these level lines never overlap each other,

168: i.e., no two paths have any common edge. The statistical weight of such a projected

169: $2$-d configuration is the product of weights associated with the vertices of the

170: $2$-d plane. There are five types of possible vertices with nonzero weights as shown

171: in Fig.2, where $p=1/c$ and $q=1-p$. Since the level lines never cross each other,

172: the weight of the first vertex in Fig. (2) is 0.

173: %The height $L_{i,j}^{BM}$ at any point $(i,j)$ on this $2$-d plane is just the

174: %number of level lines that one crosses in going from the origin to $(i,j)$.

175: \begin{figure}[ht]

176: %\begin{center}

177: \centerline{\epsfig{file=bm_f2.eps,width=6.5cm}}

178: %\end{center}

179: \caption{Projected $2$-d level lines separating adjacent terraces of unit height

180: difference in the BM surface in Fig.(1a). The adjacent table shows the weights of

181: all vertices on the $2$-d plane.} \label{fig:2}

182: \end{figure}

183:

184: Consider first the limit $c\to \infty$ (i.e., $p\to 0$). The weights of all allowed

185: vertices are $1$, except the ones shown by black dots in Fig.(2), whose associated

186: weights are $p\to 0$. The number $N$ of these black dots inside a rectangle of area

187: $A=ij$ can be easily estimated. For large $A$ and $p\to 0$, this number is Poisson

188: distributed with the mean ${\overline N}= pA$. The Bethe ansatz analysis shows that

189: BM corresponds to the sector of the 5-vertex model\cite{Wu} where the density

190: $\alpha$ of empty edges in a row of vertical edges is close to the boundary

191: $\alpha\approx 1^{-}$. A careful examination of the free energy near this boundary

192: allows one to conclude that the leading contribution in $p$ (for $p\to 0$) to

193: ${\overline N}$ comes exactly from the line of phase transitions in a 5-vertex

194: model. The subleading corrections to ${\overline N}$ are of order $\sim p^{3/2}$ and

195: arise from small deviations from the critical line, which lie beyond the Poisson

196: approximation\cite{MN}.

197:

198: The height $L_{i,j}^{BM}$ is just the number of level lines $\cal N$ inside this

199: rectangle of area $A=ij$. The problem of estimating $\cal N$ has recently appeared

200: in a number of interface models such as a polynuclear growth model\cite{PS} and a

201: ballistic deposition model\cite{BD}. By using a mapping to the longest increasing

202: subsequence (LIS) of the equally likely permutations of a set of integers and then,

203: by applying a celebrated result due to Baik, Deift and Johansson (BDJ)\cite{BDJ}, it

204: was shown\cite{PS,BD} that the number of level lines ${\cal N}$ inside the rectangle

205: (for large $A$), appropriately scaled, has a limiting behavior, ${\cal N}\to

206: 2\sqrt{\overline N} + {\overline N}^{1/6}\, \chi$, where $\chi$ is a random variable

207: with Tracy-Widom distribution. Using ${\overline N}=pA=ij/c$, one then obtains in

208: the limit $p\to 0$,

209: \begin{equation}

210: L_{i,j}^{BM}= {\cal N} \to \frac{2}{\sqrt c}\sqrt{ij} +

211: {\left( \frac{ij}{c}\right)}^{1/6}\, \chi.

212: \label{p01}

213: \end{equation}

214: In particular, for large equal length sequences $i=j=n$, we get for $c\to \infty$

215: \begin{equation}

216: L_{n,n}^{BM}\to \frac{2}{\sqrt{c}}\, n + c^{-1/6} \, n^{1/3}\, \chi .

217: \label{p02}

218: \end{equation}

219: Note that since the BM and the original LCS model are equivalent in the limit $c\to

220: \infty$, the exact results in Eqs. (\ref{p01})-(\ref{p02}) also hold for the LCS

221: model. Note that only the leading behavior of the average $\langle L_{n,n}\rangle$

222: was known before\cite{KLM} in the $c\to \infty$ limit of the original LCS model.
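
As a consistency check between Eq. (\ref{p02}) and the finite-$c$ expressions in Eqs. (\ref{bm1}) and (\ref{fc1}), one can expand the latter in powers of $c^{-1/2}$ (a sketch of the algebra only):
\begin{eqnarray}
\gamma_c^{BM} &=& \frac{2}{\sqrt{c}}\,\frac{1}{1+c^{-1/2}}
= \frac{2}{\sqrt{c}}\left[1-c^{-1/2}+O(c^{-1})\right], \nonumber \\
f(c) &=& c^{-1/6}\,\frac{(1-c^{-1/2})^{1/3}}{1+c^{-1/2}}
= c^{-1/6}\left[1-\frac{4}{3}\,c^{-1/2}+O(c^{-1})\right]. \nonumber
\end{eqnarray}
The leading terms reproduce the coefficients $2/\sqrt{c}$ and $c^{-1/6}$ of Eq. (\ref{p02}).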

223:

224: For finite $c$, while the above mapping to the LIS problem still works, the

225: corresponding permutations of the LIS problem are not generated with equal

226: probability and hence one can no longer use the BDJ results. To make progress for

227: finite $c$, we map the BM model exactly to a $3$-d anisotropic directed percolation

228: (ADP) model first considered by Rajesh and Dhar\cite{RD}. This ADP model can further

229: be mapped to a $(1+1)$-d directed polymer problem studied by

230: Johansson\cite{Johansson}. For this specific directed polymer problem, Johansson

231: derived an exact asymptotic result for the distribution of the polymer energy.

232: Translating these results back to the BM model, we derive our main results in Eqs.

233: (\ref{asymp1})-(\ref{eq:expvar}). Note that the recursion relation in Eq.

234: (\ref{recur1}) can also be viewed as a $(1+1)$-d directed polymer

235: problem\cite{Hwa,Monvel} and some asymptotic results (such as the $O(n^{2/3})$

236: behavior of the variance of $L_{n,n}$ for large $n$) can be obtained using the

237: arguments of universality\cite{Hwa}. However, this does not provide precise results

238: for the full distribution which are obtained here.

239:

240: Let us consider directed bond percolation on a simple cubic lattice. The bonds are

241: occupied with probabilities $p_x$, $p_y$, and $p_z$ along the $x$, $y$ and $z$ axes

242: and are all directed towards increasing coordinates. Imagine a source of fluid at

243: the origin which spreads along the occupied directed bonds. The sites that get wet by the

244: fluid form a $3$-d cluster. In the ADP problem, the bond occupation probabilities are

245: anisotropic, $p_x=p_y=1$ (all bonds aligned along the $x$ and $y$ axes are occupied)

246: and $p_z=p$. Hence, if the point $(x,y,z)$ gets wet by the fluid then all the points

247: $(x',y', z)$ on the same plane with $x'\ge x$ and $y'\ge y$ also get wet. Such a wet

248: cluster is compact and can be characterized by its bounding surface height $h(x,y)$

249: as shown in Fig.(1b). It is not difficult to see that the height $h(x,y)$ satisfies

250: the following recursion relation\cite{RD},

251: \begin{equation}

252: h(x,y) = \max \left[ h(x-1,y), h(x, y-1)\right] + \xi_{x,y},

253: \label{recur2}

254: \end{equation}

255: where $\xi_{x,y}$'s are i.i.d. random variables taking nonnegative integer values

256: with ${\rm Prob}(\xi_{x,y}=k)= (1-p)\, p^k$ for $k=0,1,2,\dots$. One can also

257: interpret the height $h(x,y)$ in Eq. (\ref{recur2}) as the energy of a directed

258: polymer in the $(x-y)$ plane. Precisely this particular version of the polymer

259: problem was studied by Johansson\cite{Johansson} who obtained the asymptotic

260: distribution of the height for large $x$ and $y$,

261: \begin{eqnarray}

262: h(x,y) &\to& \frac{2\sqrt{pxy}+p(x+y)}{q}+ \nonumber \\

263:        &+&   \frac{(pxy)^{1/6}}{q}\,\left[(1+p)+\sqrt{\frac{p}{xy}}\,(x+y)\right]^{2/3}

264:        \, \chi,

265: \label{j1}

266: \end{eqnarray}

267: where $q=1-p$ and $\chi$ is a random variable with a Tracy-Widom distribution.
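
The recursion in Eq. (\ref{recur2}) can be sampled in the same way as Eq. (\ref{recur1}). The following Python sketch is an illustration under stated assumptions: the boundary values $h(x,0)=h(0,y)=0$, the helper names \texttt{geometric} and \texttt{adp\_height}, the seed, and the parameters are our own choices. It draws the geometric variables with ${\rm Prob}(\xi=k)=(1-p)\,p^k$ and compares one sample of $h(n,n)$ with the leading term of Eq. (\ref{j1}) at $x=y=n$.

```python
import random


def geometric(p, rng):
    """Sample k = 0, 1, 2, ... with probability (1 - p) * p**k."""
    k = 0
    while rng.random() < p:
        k += 1
    return k


def adp_height(n, p, rng):
    """One sample of h(n, n) from recursion (recur2), assuming h = 0 on both axes."""
    prev = [0] * (n + 1)  # row h(x-1, .)
    for _ in range(n):
        curr = [0]  # h(x, 0) = 0
        for y in range(1, n + 1):
            curr.append(max(prev[y], curr[y - 1]) + geometric(p, rng))
        prev = curr
    return prev[n]


if __name__ == "__main__":
    p, n = 0.25, 200
    rng = random.Random(1)
    q = 1.0 - p
    # Leading (chi-independent) term of Eq. (j1) at x = y = n.
    leading = (2.0 * (p * n * n) ** 0.5 + 2.0 * p * n) / q
    print("simulated h(n,n):", adp_height(n, p, rng))
    print("leading term    :", leading)
```

Because $\langle\chi\rangle<0$, a typical sample is expected to fall below the leading term by an amount of order $n^{1/3}$.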

268:

269: While the terrace-like structures of the ADP surface look similar to the BM surfaces

270: (compare Figs.(1a) and (1b)), there is an important difference between the two. In

271: the ADP model, the level lines separating two adjacent terraces can overlap with

272: each other\cite{RD}, which does not happen in the BM model. However, by making the

273: following change of coordinates in the ADP model\cite{RD}

274: \begin{equation}

275: \zeta= x+ h(x,y); \,\,\, \eta=y+ h(x,y)

276: \label{ct1}

277: \end{equation}

278: one gets a configuration of the surface where the level lines no longer overlap.

279: Moreover, it is not difficult to show that the projected $2$-d configuration of

280: level lines of this shifted ADP surface has exactly the same statistical weight as

281: the projected $2$-d configuration of the BM surface. Denoting the BM height by

282: ${\tilde h}(x,y)= L_{x,y}^{BM}$, one then has the identity, ${\tilde h}(\zeta,

283: \eta)= h(x,y)$, which holds for each configuration. Using Eq. (\ref{ct1}), one can

284: rewrite this identity as

285: \begin{equation}

286: {\tilde h}(\zeta, \eta)= h\left( \zeta- {\tilde h}(\zeta, \eta),

287: \eta- {\tilde h}(\zeta, \eta)\right).

288: \label{conv1}

289: \end{equation}

290: Thus, for any given height function $h(x,y)$ of the ADP model, one can, in

291: principle, obtain the corresponding height function ${\tilde h}(x,y)$ for all

292: $(x,y)$ of the BM model by solving the nonlinear equation (\ref{conv1}). This is

293: however very difficult in practice. Fortunately, one can make progress for large

294: $(x,y)$ where one can replace the integer valued discrete heights by continuous

295: functions $h(x,y)$ and ${\tilde h}(x,y)$. Using the notation $\partial_x\equiv

296: \partial/{\partial x}$ it is easy to derive from Eq. (\ref{ct1}) the following pair

297: of identities,

298: \begin{equation}

299: \partial_x h = \frac{\partial_{\zeta} {\tilde h}}{1-\partial_{\zeta}

300: {\tilde h}-\partial_{\eta} {\tilde h}};

301: \,\,\,

302: \partial_y h = \frac{\partial_{\eta} {\tilde h}}{1-\partial_{\zeta}

303: {\tilde h}-\partial_{\eta} {\tilde h}}.

304: \label{der1}

305: \end{equation}
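For completeness, the chain-rule step behind Eq. (\ref{der1}) can be sketched as follows. Differentiating the identity $h(x,y)={\tilde h}(\zeta,\eta)$ with respect to $x$ at fixed $y$, and using $\partial_x \zeta = 1+\partial_x h$ and $\partial_x \eta=\partial_x h$ from Eq. (\ref{ct1}), one finds
\begin{eqnarray}
\partial_x h &=& \partial_{\zeta}{\tilde h}\,\left(1+\partial_x h\right)
+ \partial_{\eta}{\tilde h}\;\partial_x h, \nonumber
\end{eqnarray}
which, solved for $\partial_x h$, gives the first identity in Eq. (\ref{der1}). The second follows in the same way by differentiating with respect to $y$.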

306: In a similar way, one can show that

307: \begin{equation}

308: \partial_{\zeta} {\tilde h} = \frac{\partial_x h}{1+\partial_x h+\partial_y h};\,\,\,

309: \partial_{\eta} {\tilde h} = \frac{\partial_y h}{1+\partial_x h+\partial_y h}.

310: \label{der2}

311: \end{equation}

312: We then observe that Eqs. (\ref{der1}) and (\ref{der2}) are invariant under the

313: simultaneous transformations

314: \begin{equation}

315: \zeta\to -x ; \,\, \eta\to -y; \,\, \tilde h \to h \, .

316: \label{invar1}

317: \end{equation}
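This can be checked explicitly if Eq. (\ref{invar1}) is read as the full exchange $(\zeta,\eta,{\tilde h})\leftrightarrow (-x,-y,h)$, which is our interpretation of the transformation. Under it, $\partial_x\to -\partial_{\zeta}$ and $\partial_y\to -\partial_{\eta}$ act on ${\tilde h}$, while $\partial_{\zeta}\to -\partial_x$ and $\partial_{\eta}\to -\partial_y$ act on $h$, so the first identity in Eq. (\ref{der1}) becomes
\begin{eqnarray}
-\partial_{\zeta}{\tilde h} &=& \frac{-\partial_x h}{1+\partial_x h+\partial_y h}, \nonumber
\end{eqnarray}
which is the first identity in Eq. (\ref{der2}). The same substitution maps Eq. (\ref{der2}) back onto Eq. (\ref{der1}).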

318: Since the height is built up by integrating the derivatives, this leads to a simple

319: result for large $\zeta$ and $\eta$,

320: \begin{equation}

321: {\tilde h}(\zeta, \eta) = h(-\zeta, -\eta).

322: \label{res1}

323: \end{equation}

324: Thus, if we know exactly the functional form of the ADP surface $h(x,y)$, then the

325: functional form of the BM surface ${\tilde h}(x,y)$ for large $x$ and $y$ is simply

326: obtained by ${\tilde h}(x,y)=h(-x,-y)$. Changing $x\to -x$ and $y\to -y$ in

327: Johansson's expression for the ADP surface in Eq. (\ref{j1}) we thus arrive at our

328: main asymptotic result for the BM model

329: \begin{eqnarray}

330: L_{x,y}^{BM}&=& {\tilde h}(x,y) \to \frac{2\sqrt{pxy}-p(x+y)}{q}+ \nonumber \\

331: &+&\frac{(pxy)^{1/6}}{q}\,\left[(1+p)-\sqrt{\frac{p}{xy}}\,(x+y)\right]^{2/3} \,

332: \chi, \label{res2}

333: \end{eqnarray}

334: where $p=1/c$ and $q=1-1/c$. For equal length sequences $x=y=n$, Eq. (\ref{res2})

335: then reduces to Eq. (\ref{asymp1}).
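
The reduction can be sketched explicitly. Setting $x=y=n$ in Eq. (\ref{res2}) and using $q=(1-\sqrt{p})(1+\sqrt{p})$ together with $1+p-2\sqrt{p}=(1-\sqrt{p})^2$, the two coefficients become
\begin{eqnarray}
\frac{2\sqrt{p}\,n-2p\,n}{q} &=& \frac{2\sqrt{p}}{1+\sqrt{p}}\; n
= \frac{2}{1+\sqrt{c}}\; n, \nonumber \\
\frac{p^{1/6}\,n^{1/3}}{q}\,(1-\sqrt{p})^{4/3} &=&
\frac{p^{1/6}(1-\sqrt{p})^{1/3}}{1+\sqrt{p}}\; n^{1/3}
= \frac{c^{1/6}(\sqrt{c}-1)^{1/3}}{\sqrt{c}+1}\; n^{1/3}, \nonumber
\end{eqnarray}
i.e., $\gamma_c^{BM}\, n$ and $f(c)\, n^{1/3}$ with $f(c)$ as in Eq. (\ref{fc1}).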

336:

337: To check the consistency of our asymptotic results, we further computed the

338: difference between the left- and the right-hand sides of Eq. (\ref{conv1}),

339: \begin{equation}

340: \Delta h (\zeta, \eta)= {\tilde h}(\zeta, \eta)- h\left( \zeta- {\tilde h}(\zeta,

341: \eta), \eta- {\tilde h}(\zeta, \eta)\right), \label{conv2}

342: \end{equation}

343: with the functions $h(x,y)$ and ${\tilde h}(x,y)$ given respectively by Eqs.

344: (\ref{j1}) and (\ref{res2}). For large $\zeta=\eta$ one gets

345: \begin{equation}

346: \Delta h(\zeta,\zeta) \to \left[{p^{1/3}\chi^2}/{3 (1-\sqrt{p})^{4/3}}\right]\,

347: {\zeta}^{-1/3} . \label{cons1}

348: \end{equation}

349: Thus the discrepancy falls off as a power law for large $\zeta$, indicating that

350: indeed our solution is asymptotically exact. We have also performed numerical

351: simulations of the BM model using the recursion relation in Eq. (\ref{recur1}) for

352: $c=2,\,4,\,9,\,16,\,100$. Our preliminary results\cite{MN} for relatively small

353: system sizes (up to $n=5000$) are consistent with our exact results in Eqs.

354: (\ref{asymp1})-(\ref{eq:expvar}).
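
The consistency check in Eq. (\ref{cons1}) can also be reproduced numerically. The sketch below is an illustration that assumes $\chi$ is held at the same fixed value in Eqs. (\ref{j1}) and (\ref{res2}); the function names and the parameter values are hypothetical. It evaluates $\Delta h(\zeta,\zeta)$ from Eq. (\ref{conv2}) on the diagonal and compares it with the right-hand side of Eq. (\ref{cons1}).

```python
def h_adp(x, chi, p):
    """Eq. (j1) on the diagonal x = y, with chi treated as a fixed number."""
    q = 1.0 - p
    mean = (2.0 * (p * x * x) ** 0.5 + 2.0 * p * x) / q
    bracket = ((1.0 + p) + 2.0 * p ** 0.5) ** (2.0 / 3.0)
    return mean + (p * x * x) ** (1.0 / 6.0) / q * bracket * chi


def h_bm(z, chi, p):
    """Eq. (res2) on the diagonal x = y = z, with the same fixed chi."""
    q = 1.0 - p
    mean = (2.0 * (p * z * z) ** 0.5 - 2.0 * p * z) / q
    bracket = ((1.0 + p) - 2.0 * p ** 0.5) ** (2.0 / 3.0)
    return mean + (p * z * z) ** (1.0 / 6.0) / q * bracket * chi


def delta_h(z, chi, p):
    """Left-hand side minus right-hand side of Eq. (conv1) at zeta = eta = z."""
    ht = h_bm(z, chi, p)
    return ht - h_adp(z - ht, chi, p)


def delta_h_predicted(z, chi, p):
    """Right-hand side of Eq. (cons1)."""
    prefactor = p ** (1.0 / 3.0) * chi ** 2 / (3.0 * (1.0 - p ** 0.5) ** (4.0 / 3.0))
    return prefactor * z ** (-1.0 / 3.0)


if __name__ == "__main__":
    p, chi = 0.25, 1.0
    for z in (1e3, 1e4, 1e5, 1e6):
        ratio = delta_h(z, chi, p) / delta_h_predicted(z, chi, p)
        print(z, ratio)  # the ratio should approach 1 as z grows
```

The ratio of the two expressions approaches $1$ as $\zeta$ grows, in line with the $\zeta^{-1/3}$ decay stated in Eq. (\ref{cons1}).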

355:

356: The Tracy-Widom distribution of random matrix theory has appeared recently in a

357: number of problems\cite{TW,AD,Johansson,PS,BD}. In this Letter, we have shown that

358: it also describes the asymptotic distribution of the length of the longest common

359: subsequence in a sequence matching problem. While a possible link

360: between the two problems was speculated before\cite{AD}, a precise

361: connection, so far, was missing and is provided here.

362:

363: \vspace*{-0.3cm}

364:

365: \begin{references}

366:

367: \vspace*{-1.2cm}

368:

369: \bibitem{W1} M.S. Waterman, {\em Introduction to Computational Biology} (Chapman \& Hall,

370: London, 1994).

371:

372: \bibitem{Gusfield} D. Gusfield, {\em Algorithms on Strings, Trees, and Sequences} (Cambridge

373: University Press, Cambridge, 1997).

374:

375: \bibitem{DEKM} R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, {\em Biological Sequence

376: Analysis} (Cambridge University Press, Cambridge, 1998).

377:

378: \bibitem{NW} S.B. Needleman and C.D. Wunsch, J. Mol. Biol. {\bf 48}, 443 (1970).

379:

380: \bibitem{SW} T.F. Smith and M.S. Waterman, J. Mol. Biol. {\bf 147}, 195 (1981); Adv. Appl.

381: Math. {\bf 2}, 482 (1981).

382:

383: \bibitem{WGA} M.S. Waterman, L. Gordon, and R. Arratia, Proc. Natl. Acad. Sci. USA,

384: {\bf 84}, 1239 (1987).

385:

386: \bibitem{AGMML} S.F. Altschul et al., J. Mol. Biol. {\bf 215}, 403 (1990).

387:

388: \bibitem{SK} D. Sankoff and J. Kruskal, {\em Time Warps, String Edits, and Macromolecules:

389: The theory and practice of sequence comparison} (Addison Wesley, Reading, Massachusetts,

390: 1983).

391:

392: \bibitem{AG} A. Apostolico and C. Guerra, Algorithmica, {\bf 2}, 315 (1987).

393:

394: \bibitem{WF} R. Wagner and M. Fischer, J. Assoc. Comput. Mach. {\bf 21}, 168 (1974).

395:

396: \bibitem{CS} V. Chv\'atal and D. Sankoff, J. Appl. Probab. {\bf 12}, 306 (1975).

397:

398: \bibitem{Deken} J. Deken, Discrete Math. {\bf 26}, 17 (1979).

399:

400: \bibitem{Steele} J.M. Steele, SIAM J. Appl. Math. {\bf 42}, 731 (1982).

401:

402: \bibitem{DP} V. Dancik and M. Paterson, in STACS94, Lecture Notes in Computer Science, {\bf

403: 775}, 306 (Springer, New York, 1994).

404:

405: \bibitem{Alex} K.S. Alexander, Ann. Appl. Probab. {\bf 4}, 1074 (1994).

406:

407: \bibitem{KLM} M. Kiwi, M. Loebl, and J. Matousek, math.CO/0308234.

408:

409: \bibitem{ZM} M. Zhang and T. Marr, J. Theor. Biol. {\bf 174}, 119 (1995).

410:

411: \bibitem{Hwa} T. Hwa and M. Lassig, Phys. Rev. Lett. {\bf 76}, 2591 (1996); R. Bundschuh

412: and T. Hwa, Discrete Appl. Math. {\bf 104}, 113 (2000).

413:

414: \bibitem{Monvel} J. Boutet de Monvel, European Phys. J. B {\bf 7}, 293 (1999); Phys. Rev. E

415: {\bf 62}, 204 (2000).

416:

417: \bibitem{MPV} M. M\'ezard, G. Parisi, and M.A. Virasoro, eds., {\em Spin Glass Theory

418: and Beyond} (World Scientific, Singapore, 1987).

419:

420: \bibitem{TW} C.A. Tracy and H. Widom, Comm. Math. Phys. {\bf 159}, 151 (1994); see also

421: Proc. of ICM, Beijing, Vol. I, 587 (2002).

422:

423: \bibitem{Wu} H.Y. Huang, F.Y. Wu, H. Kunz, D. Kim, Physica A {\bf 228}, 1 (1996).

424:

425: \bibitem{MN} S.N. Majumdar and S. Nechaev, unpublished.

426:

427: \bibitem{PS} M. Praehofer and H. Spohn, Phys. Rev. Lett. {\bf 84}, 4882 (2000); Physica

428: A, {\bf 279}, 342 (2000).

429:

430: \bibitem{BD} S.N. Majumdar and S. Nechaev, Phys. Rev. E {\bf 69}, 011103 (2004).

431:

432: \bibitem{BDJ} J. Baik, P. Deift, and K. Johansson, J. Amer. Math. Soc. {\bf 12}, 1119 (1999).

433:

434: \bibitem{RD} R. Rajesh and D. Dhar, Phys. Rev. Lett. {\bf 81}, 1646 (1998).

435:

436: \bibitem{Johansson} K. Johansson, Comm. Math. Phys. {\bf 209}, 437 (2000).

437:

438: \bibitem{AD} D. Aldous and P. Diaconis, Bull. Amer. Math. Soc. {\bf 36}, 413 (1999).

439:

440:

441: \end{references}

442: \end{multicols}

443:

444:

445:

446: \end{document}

447: