1: \documentclass[a4paper,10pt]{article}

2:

3:

4: %% --------------------------------------------------------------

5: %% BEGIN -- DEFINITIONS AND PACKAGES

6:

7: \pagestyle{plain}

8:

9: \usepackage{latexsym}

10:

11: %\usepackage{psfig}

12: \usepackage{graphicx}

13: \usepackage{pstricks}

14: \usepackage{amssymb}

15: \usepackage{amsmath}

16: \usepackage{fullname}

17:

18: \textheight    220mm

19: \textwidth     155mm

20: \headheight     -2mm

21: \oddsidemargin   0mm

22: \topmargin       0mm

23:

24: \input{ntvit.def}

25:

26: %% END -- DEFINITIONS

27: %% --------------------------------------------------------------

28:

29: \begin{document}

30:

31:

32: %% --------------------------------------------------------------

33: %%  TITLE AND ABSTRACT

34: %% --------------------------------------------------------------

35:

36: \title{ \vspace{-2ex}

37:   Viterbi Algorithm Generalized for $n$-Tape Best-Path Search

38: }

39:

40: \author{

41:   Andr\'e Kempe \vspace{2ex} \\

42:   Xerox Research Centre Europe ~--~ Grenoble Laboratory \\

43:   6 chemin de Maupertuis ~--~ 38240 Meylan ~--~ France \\

44: }

45:

46: \date{

47:   March 9, 2006

48: }

49:

50: \maketitle

51:

52: \begin{abstract}

53:   %

54:   We present a generalization of the Viterbi algorithm

55:   for identifying the path with minimal (resp. maximal) weight

56:   in an {\it $n$-tape weighted finite-state machine}\/ ($n$-WFSM),

57:   that accepts a given $n$-tuple of input strings $\aTuple{s_1,\ldots s_n}$.

58:   %

59:   It also allows us to compile the best transduction of a given input $n$-tuple

60:   by a weighted $(n\!+\!m)$-WFSM (transducer) with $n$ input and $m$ output tapes.

61:   %

62:   Our algorithm has a worst-case time complexity of

63:   $\complexity\left(\, |s|^n|E|\log|s|^n|Q| \,\right)$,

64:   where $n$ and $|s|$ are the number and average length of the strings in the $n$-tuple,

65:   and $|Q|$ and $|E|$ the number of states and transitions in the $n$-WFSM,

66:   respectively.

67:   %

68:   A straightforward alternative,

69:   consisting of intersection followed by classical shortest-distance search,

70:   operates in $\complexity\left(\, |s|^n(|E|+|Q|)\log|s|^n|Q| \,\right)$ time.

71:   %

72: \end{abstract}

73:

74:

75: %% --------------------------------------------------------------

76: %%  INTRODUCTION

77: %% --------------------------------------------------------------

78:

79:

80: \section{Introduction

81:   \label{sec:intro}}

82:

83:

84: The topic of this paper is situated in the areas of

85: {\it multi-tape}\/ or {\it $n$-tape weighted finite-state machines}\/ ($n$-WFSMs)

86: and shortest-path problems.

87:

88:

89: %% --------------------------------------------------------------

90:

91: $n$-WFSMs

92: \cite{rabin+scott:1959,elgot+mezei:1965,kay:1987,harju+karhumaki:1991,kaplan+kay:1994}

93: are a natural generalization of the familiar

94: finite-state acceptors (one tape) and transducers (two tapes).

95: The $n$-ary relation defined by an $n$-WFSM is a weighted {\em rational\/} relation.

96: %

97: Finite relations are of particular interest since they

98: can be viewed as relational databases.

99: %

100: A finite-state transducer ($n=2$) can be seen as a database of string pairs,

101: such as $\aTuple{\WordD{spelling}, \WordD{pronunciation}}$ or

102: $\aTuple{\WordD{French word}, \WordD{English word}}$.

103: %

104: Unlike a classical database, a transducer may even define infinitely many pairs.

105: For example, it may characterize the pattern of the spelling-pronunciation

106: relationship in such a way that it can map even the spelling of an unknown word

107: to zero or more possible pronunciations (with various weights),

108: and vice-versa.

109: %

110: $n$-WFSMs have been used in the morphological analysis of Semitic languages,

111: to synchronize the vowels, consonants, and

112: templatic pattern into a surface form \cite{kay:1987,kiraz:2000}.

113:

114:

115: %% --------------------------------------------------------------

116:

117: Classical shortest-path algorithms can be separated into two groups,

118: addressing either single-source shortest-path (SSSP) problems,

119: such as Dijkstra's algorithm \cite{dijsktra:1959}

120: or Bellman-Ford's \cite{bellman:1958,ford+fulkerson:1956},

121: or all-pairs shortest-path (APSP) problems,

122: such as Floyd-Warshall's \cite{floyd:1962,warshall:1962}.

123: SSSP algorithms determine a minimum-weight path from a source vertex

124: of a real- or integer-weighted graph to all its other vertices.

125: APSP algorithms find shortest paths between all pairs of vertices.

126: %

127: For details of shortest-path problems in graphs see \cite{pettie:2003},

128: and in semiring-weighted finite-state automata see \cite{mohri:2002b}.

129:

130:

131: %% --------------------------------------------------------------

132:

133: \smallskip

134:

135: We address the following problem:

136: in a given $n$-WFSM we want to identify the path with minimal (resp. maximal) weight

137: that accepts a given $n$-tuple of input strings $\aTuple{s_1,\ldots s_n}$.

138: %

139: This is of particular interest because it also allows us

140: to compile the best transduction of a given input $n$-tuple

141: by a weighted $(n\!+\!m)$-WFSM (transducer) with $n$ input and $m$ output tapes.

142: For this, we identify the best path accepting the input $n$-tuple on its input tapes,

143: and take the label of the path's output tapes as the best output $m$-tuple.

144:

145:

146: A known straightforward method for solving our problem is

147: to intersect the $n$-WFSM with another one that contains a single path

148: labeled with the input $n$-tuple,

149: and then to apply a classical SSSP algorithm, ignoring the labels.

150: %

151: We show that such an intersection together with Dijkstra's algorithm has

152: a worst-case time complexity of

153: $ \complexity\left(\, |s|^n(|E|+|Q|)\log|s|^n|Q| \,\right) $,

154: where $n$ and $|s|$ are the number and average length of the strings in the $n$-tuple,

155: and $|Q|$ and $|E|$ the number of states and transitions of the $n$-WFSM,

156: respectively.

157:

158:

159: We propose an alternative approach with lower complexity.

160: It is based on the Viterbi algorithm

161: which is generally used for detecting the most likely path

162: in a {\it Hidden Markov Model}\/ (HMM)

163: for an observed sequence of symbols emitted by the HMM

164: \cite{viterbi:1967,rabiner:1990,manning+schuetze:1999}.

165: %

166: Our algorithm is a generalization of Viterbi's algorithm

167: such that it deals with an $n$-tuple of input strings rather than with a single input string.

168: %

169: In the worst case,

170: it operates in $\complexity\left(\, |s|^n|E|\log|s|^n|Q| \,\right)$ time.

171:

172:

173: %% --------------------------------------------------------------

174:

175: \smallskip

176:

177: This paper is structured as follows.

178: %

179: Basic definitions of weighted $n$-ary relations, $n$-WFSMs, HMMs, and the Viterbi algorithm

180: are recalled in Section~\ref{sec:prelim}.

181: %

182: Section~\ref{sec:vit-1tape}

183: adapts the Viterbi algorithm

184: to the search for the best path in a $1$-WFSM that accepts a given input string,

185: %

186: and Section~\ref{sec:vit-ntape} generalizes it

187: to the search for the best path in an $n$-WFSM that accepts an $n$-tuple of strings.

188: %

189: Section~\ref{sec:align}

190: illustrates our algorithm on a practical example,

191: the alignment of word pairs (i.e., $n\!=\!2$),

192: and provides test results that show a slightly higher than

193: $\complexity\left(\, |s|^2 \,\right)$ time complexity.

194: %

195: The above-mentioned classical method for solving our problem

196: is discussed in Section~\ref{sec:alternatives}.

197: %

198: Section~\ref{sec:conclusion}

199: concludes the paper.

200:

201:

202: %% --------------------------------------------------------------

203: %%  PRELIMINARIES

204: %% --------------------------------------------------------------

205:

206:

207: \section{Preliminaries

208:   \label{sec:prelim}}

209:

210:

211: We recall some definitions about

212: $n$-ary weighted relations and their machines,

213: following the usual definitions for multi-tape

214: automata \cite{elgot+mezei:1965,eilenberg:1974},

215: with semiring weights added just as for acceptors and transducers

216: \cite{kuich+salomaa:1986,mohri+al:1998}.

217: For more details see \cite{kempe+champarnaud+eisner:2004a}.

218: %

219: We also briefly recall Hidden Markov Models and the Viterbi algorithm,

220: and point the reader to

221: \cite{viterbi:1967,rabiner:1990,manning+schuetze:1999}

222: for further details.

223:

224:

225: %% --------------------------------------------------------------

226:

227: \subsection{Weighted $n$-ary relations}

228:

229: A weighted $n$-ary relation is a function from $(\Sigma^*)^n$ to

230: $\srSetK$, for a given finite alphabet $\Sigma$ and a given weight

231: semiring $\srK = \aTuple{\srSetK, \srPlus, \srTimes, \srZero,

232:   \srOne}$.  A relation assigns a weight to any

233: $n$-tuple of strings.  A weight of $\srZero$ can be interpreted as

234: meaning that the tuple is not in the relation.

235: %

236: We are especially interested in {\it rational} (or {\it regular})

237: $n$-ary relations, i.e. relations that can be encoded by $n$-tape

238: weighted finite-state machines, which we now define.

239: %

240:

241:

242: We adopt the convention that variable names referring to $n$-tuples of

243: strings include a superscript $\tapnum{n}$.

244: Thus we write $s\tapnum{n}$ rather than

245: ${\mathop{s}\limits^\rightarrow}$

246: for a tuple of strings $\aTuple{s_1, \dots  s_n}$.

247: We also use this convention for the names of

248: objects that contain $n$-tuples of strings,

249: such as $n$-tape machines and their transitions and paths.

250:

251:

252: %% --------------------------------------------------------------

253:

254: \subsection{Multi-tape weighted finite-state machines}

255:

256: An {\it $n$-tape weighted finite-state machine} (WFSM or $n$-WFSM)

257: $A\tapnum{n}$ is defined by a six-tuple

258: $A\tapnum{n} = \aTuple{\Sigma, Q, \srK, E\tapnum{n}, \wgtInit, \wgtFin}$,

259: with

260: $\Sigma$ being a finite alphabet,

261: $Q$ a finite set of states,

262: $\srK\!=\!\aTuple{\srSetK,\srPlus,\srTimes,\srZero,\srOne}$ the

263: semiring of weights,

264: $E\tapnum{n}\!\subseteq ( Q\times (\Sigma^*)^n \times \srSetK \times Q )$

265: 		a finite set of weighted $n$-tape transitions,

266: $\wgtInit : Q \rightarrow \srSetK$ a function that assigns initial

267: weights to states,

268: and $\wgtFin : Q \rightarrow \srSetK$ a function that assigns final

269: weights to states.

270:

271: Any transition $e\tapnum{n}\!\in\! E\tapnum{n}$ has the form

272: $e\tapnum{n}\!=\!\aTuple{\eSrc,\lab\tapnum{n},w,\eTrg}$.

273: We refer to these four components as the transition's source state

274: $\eSrc(e\tapnum{n})\!\in\!Q$, its label

275: $\lab(e\tapnum{n})\!\in\!(\Sigma^*)^n$, its weight

276: $w(e\tapnum{n})\!\in\!\srSetK$, and its target state

277: $\eTrg(e\tapnum{n})\!\in\!Q$.

278: %

279: We denote by $E(q)$ the set of outgoing transitions of a state $q\!\in\!Q$

280: ~(with $E(q)\!\subseteq\!E\tapnum{n}$).

281:

282:

283: A {\it path}\/ $\path\tapnum{n}$ of length $k \geq 0$

284: is a sequence of transitions

285: $e_1\tapnum{n} e_2\tapnum{n} \cdots e_k\tapnum{n}$

286: such that $\eTrg(e_i\tapnum{n})\!=\!\eSrc(e_{i+1}\tapnum{n})$

287: for all $i\!\in\!\aRange{1, k\!-\!1}$.

288: %

289: The label of a path is the element-wise concatenation of

290: the labels of its transitions.

291: The weight of a path $\path\tapnum{n}$ is

292: %

293: %\vspace{-1ex}

294: \begin{equation}

295:   w(\path\tapnum{n})	\DefAs

296: 	\wgtInit(\eSrc(e_1\tapnum{n})) \srTimes

297: 	\left(\srBigTimes_{j\in\aRange{1,k}}

298: 		\spc{-1ex}w\left(e_j\tapnum{n}\right)\right) \srTimes

299: 	\wgtFin(\eTrg(e_k\tapnum{n}))

300: \end{equation}

301:

302: %\vspace{-1ex}

303:

304: \noindent

305: The path is said to be {\it successful}, and to {\it accept}

306: its label, if $w(\path\tapnum{n})\neq\srZero$.
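As an illustration only, suppose that $\srK$ is the tropical semiring
$\aTuple{\mathbb{R}\cup\{\infty\}, \min, +, \infty, 0}$
(an assumption made for this example; the definitions above hold for any semiring).
Then $\srTimes$ is ordinary addition, and a path $\path\tapnum{n} = e_1\tapnum{n} e_2\tapnum{n}$ with the hypothetical weights
$\wgtInit(\eSrc(e_1\tapnum{n}))=0$, $w(e_1\tapnum{n})=1.5$, $w(e_2\tapnum{n})=2$, and $\wgtFin(\eTrg(e_2\tapnum{n}))=0.5$
has the weight
\[
  w(\path\tapnum{n}) = 0 + (1.5 + 2) + 0.5 = 4 .
\]
This path is successful, since $4 \neq \srZero = \infty$.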

307:

308:

309: %% --------------------------------------------------------------

310:

311: \subsection{Hidden Markov Models}

312:

313: A {\it Hidden Markov Model}\/ (HMM) is defined by a five-tuple

314: $\aTuple{\Sigma,Q,\hmIniVec,\hmTraMtx,\hmOutMtx}$, where

315: %

316: $\Sigma\!=\!\aSet{\sigma_k}$ is the output alphabet,

317: $Q\!=\!\aSet{q_i}$ a finite set of states,

318: $\hmIniVec\!=\!\aSet{\hmIniPrb_i}$ a vector of initial state probabilities

319:   $\hmIniPrb_i = p(x_1\!=\!q_i) : Q\rightarrow\aRange{0,1}\,$,~

320: $\hmTraMtx\!=\!\aSet{\hmTraPrb_{ij}}$ a matrix of state transition probabilities

321:   $\hmTraPrb_{ij} = p(x_t\!=\!q_j | x_{t-1}\!=\!q_i) : Q\!\times\!Q\rightarrow\aRange{0,1}\,$,~

322: and $\hmOutMtx\!=\!\aSet{\hmOutPrb_{jk}}$ a matrix of state emission probabilities

323:   $\hmOutPrb_{jk} = p(\hmPthOut_t\!=\!\sigma_k | x_t\!=\!q_j) : Q\!\times\!\Sigma\rightarrow\aRange{0,1}\,$.

324: %

325: A {\it path}\/ of length $T$ in an HMM is a non-observable (i.e., hidden) state sequence

326: $\hmStaSeq = \hmPthSta_1\cdots\hmPthSta_T$,

327: emitting an observable output sequence

328: $\hmOutSeq = \hmPthOut_1\cdots\hmPthOut_T$

329: which is a probabilistic function of $\hmStaSeq$.

330: %

331:

332:

333: %% --------------------------------------------------------------

334:

335: \subsection{Viterbi Algorithm}

336:

337: The {\it Viterbi algorithm}\/

338: finds the most likely path

339: $\widehat\hmStaSeq = \argmax_{\hmStaSeq} p(\hmStaSeq | \hmOutSeq, \mu)$

340: for an observed output sequence $\hmOutSeq$

341: and given model parameters $\mu=\aTuple{\hmIniVec,\hmTraMtx,\hmOutMtx}$,

342: using a trellis similar to that in Figure~\ref{fig:vit-1tape}.

343: %

344: It has $\complexity(T\, |Q|^2)$ time and $\complexity(T\, |Q|)$ space complexity.
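For reference, the standard recursion behind this trellis can be sketched as follows.
The symbols $\delta_t(j)$ and $k_t$ are notation introduced here for illustration only,
with $k_t$ being the index such that $\hmPthOut_t\!=\!\sigma_{k_t}$:
\begin{align*}
  \delta_1(j) &= \hmIniPrb_j \, \hmOutPrb_{j k_1} , \\
  \delta_t(j) &= \max_{i} \; \delta_{t-1}(i) \, \hmTraPrb_{ij} \, \hmOutPrb_{j k_t}
	\qquad (2 \leq t \leq T) .
\end{align*}
Here $\delta_t(j)$ is the probability of the most likely state sequence
that ends in $q_j$ at time $t$ and emits $\hmPthOut_1\cdots\hmPthOut_t$.
Each of the $T\,|Q|$ values is a maximum over $|Q|$ predecessors, which gives the time bound;
storing one value and one back-pointer per trellis node gives the space bound.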

345:

346:

347: %% --------------------------------------------------------------

348: %%  VITERBI ON 1-TAPE

349: %% --------------------------------------------------------------

350:

351:

352: \section{$1$-Tape Best-Path Search

353:   \label{sec:vit-1tape}}

354:

355:

356: The Viterbi algorithm

357: \cite{viterbi:1967,rabiner:1990,manning+schuetze:1999}

358: can be easily adapted for searching for the best of all paths of a $1$-WFSM, $A\tapnum{1}$,

359: that accept a given input string.

360: %

361: We use a notation that will facilitate the subsequent generalization

362: of the algorithm to $n$-tape best-path search (Section~\ref{sec:vit-ntape}).

363: Only the search for the path with minimal weight is explained.

364: An adaptation to maximal weight search is trivial.

365:

366:

367: %% --------------------------------------------------------------

368:

369:

370: \begin{figure}[htb]

371:   \begin{center}

372:     \includegraphics[scale=0.5,angle=0]{vit-1tape.eps}

373:     \caption{Modified trellis for $1$-tape best-path search

374: 	\label{fig:vit-1tape}}

375:   \end{center}

376:   %\vspace{-3ex}

377: \end{figure}

378:

379:

380: \subsection{Structures}

381:

382: We use a reading pointer $\vtPtr\in\vtPtrSet=\aSet{0,\ldots|s|}$

383: that is initially positioned before the first letter of the input string $s$,

384: ~$\vtPtr\!=\!0$,

385: and then increased with the reading of $s$

386: until it reaches the position after the last letter,

387: $\,\vtPtr\!=\!|s|$.

388: %

389: At any moment, $\vtPtr$ equals the length of the prefix of $s$

390: that has already been read.

391:

392:

393: As is usual for the Viterbi algorithm,

394: we use a trellis $\vtNodeSet\!=Q\times\vtPtrSet$,

395: consisting of nodes $\vtNode\!=\!\aTuple{q,\vtPtr}$

396: which express that a state $q\!\in\!Q$ is reached after reading $\vtPtr$ letters of $s$

397: (Figure~\ref{fig:vit-1tape}).

398: %

399: We divide the trellis into several node sets

400: $\vtNodeSet_{\vtPtr} = \aSet{\vtNode\!=\!\aTuple{q,\vtPtr}} \subseteq \vtNodeSet$,

401: each corresponding to a pointer position $\vtPtr$ or to a column of the trellis.

402: %

403: For each node $\vtNode$,

404: we maintain three variables referring to $\vtNode$'s best prefix:

405: 	$\vtPreWgt_{\vtNode}$ being its weight,

406: 	$\vtPreNode_{\vtNode}$ its last node (immediately preceding $\vtNode$), and

407: 	$\vtPreArc_{\vtNode}$ its last transition $e\!\in\!E$ of $A\tapnum{1}$.

408: %

409: The $\vtPreNode_{\vtNode}$ are back-pointers

410: that fully define the best prefix of each node $\vtNode$.

411: All $\vtPreWgt_{\vtNode}$, $\vtPreNode_{\vtNode}$, and $\vtPreArc_{\vtNode}$

412: are initially undefined ($\,=\!\Null\,$).\footnote{

413:   %

414:   The variables $\vtPreWgt_{\vtNode}$, $\vtPreNode_{\vtNode}$, and $\vtPreArc_{\vtNode}$

415:   can be formally regarded as elements of the vectors

416:   $\vtPreWgtArr$, $\vtPreNodeArr$, and $\vtPreArcArr$,

417:   respectively, that are indexed by values of $\vtNode$.

418:   In a practical implementation it is, however, sensible to store these variables

419:   directly on the node that they refer to.

420:   %

421: }

422:

423:

424: %% --------------------------------------------------------------

425:

426:

427: \begin{figure}[htb]

428:   \begin{center}

429:     {\small \input{vit-1tape.pc} }

430:     \vspace{-2ex}

431:     \caption{Pseudocode of $1$-tape best-path search

432:       \label{pc:vit-1tape}}

433:     %\vspace{-1ex}

434:   \end{center}

435: \end{figure}

436:

437:

438: \subsection{Algorithm}

439:

440: The algorithm \FUNCT{FsaViterbi}{~}

441: returns from all paths $\path$ of the $1$-WFSM $A\tapnum{1}$ that accept the string $s$,

442: the one with minimal weight

443: (Figure~\ref{pc:vit-1tape}).

444: %

445: $A\tapnum{1}$ must not contain any transitions labeled with $\eps$ (the empty string).

446: %

447: At least a partial order must be defined on the semiring of weights.

448: %

449: Nothing else is required concerning the labels, weights, or structure of $A\tapnum{1}$.\footnote{

450:   %

451:   Cycles are, e.g., not required to have non-negative weights (as for Dijkstra's algorithm)

452:   because all paths of interest are constrained by the input string.

453:   %

454: }

455:

456:

457: The algorithm starts by creating an initial node set $\vtNodeSet_{\sf initial}=\vtNodeSet_0$

458: for the initial position $\vtPtr=0$ of the reading pointer.

459: The set $\vtNodeSet_{\sf initial}$ contains a node for each initial state of $A\tapnum{1}$

460: (Lines~\ref{pc1:L101}--\ref{pc1:L104}).

461: %

462: The prefix weights $\vtPreWgt_\vtNode$ of these nodes are set to the initial weight $\wgtInit(q)$

463: of the respective states $q$.

464: %

465: The set of node sets $\vtNodeSetSet$ contains only $\vtNodeSet_{\sf initial}$ at this point

466: (Line~\ref{pc1:L105}).

467:

468:

469: In the subsequent iteration (Lines~\ref{pc1:L201}--\ref{pc1:L215}),

470: ranging from the first to the last-but-one pointer position, $p=0,\ldots|s|\!-\!1$,

471: we inspect all outgoing transitions $e\!\in\!E(q)$

472: of all states $q\!\in\!Q$

473: for which there is a node $\vtNode\!=\!\aTuple{q,\vtPtr}$ in $\vtNodeSet_{\vtPtr}$.

474: %

475: If the label $\lab(e)$ of $e$ matches $s$ at position $p$,

476: we create a new node $\vtNode^\prime=\aTuple{\eTrg(e),{\vtPtr^\prime}}$

477: for the target $\eTrg(e)$ of $e$ (Line~\ref{pc1:L204}).

478: Its prefix weight $\vtPreWgt^\prime$ equals the current node's weight $\vtPreWgt_{\vtNode}$

479: multiplied by the weight $w(e)$ of $e$.

480: %

481: The node set $\vtNodeSet_{\vtPtr^\prime}$ for the new $\vtNode^\prime$

482: is created and inserted into the set of node sets $\vtNodeSetSet$

483: ~(if it does not exist yet; Line~\ref{pc1:L211}).

484: %

485: Then $\vtNode^\prime$ is inserted into $\vtNodeSet_{\vtPtr^\prime}$

486: ~(if it is not yet a member of it; Line~\ref{pc1:L213}).

487: %

488: If the prefix weight of $\vtNode^\prime$ is still undefined,

489: $\vtPreWgt_{\vtNode^\prime}=\Null$

490: ~(because no prefix of $\vtNode^\prime$ has been analyzed yet),

491: or if it is higher than the weight of the currently analyzed new prefix,

492: $\vtPreWgt_{\vtNode^\prime} > \vtPreWgt^\prime$,

493: then the variables $\vtPreWgt_{\vtNode^\prime}$, $\vtPreNode_{\vtNode^\prime}$,

494: and $\vtPreArc_{\vtNode^\prime}$ of $\vtNode^\prime$

495: are assigned values of the new prefix (Lines~\ref{pc1:L214}--\ref{pc1:L215}).
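The update described above can be summarized, as a sketch, by the following recurrence.
It assumes that weights are compared with $\min$
and that the pointer advances by the length of the label, $\vtPtr^\prime = \vtPtr + |\lab(e)|$,
which the text does not state explicitly.
For each initial state $q$ and each node $\aTuple{q^\prime,\vtPtr^\prime}$ with $\vtPtr^\prime > 0$:
\begin{align*}
  \vtPreWgt_{\aTuple{q,0}} &= \wgtInit(q) , \\
  \vtPreWgt_{\aTuple{q^\prime,\vtPtr^\prime}} &=
	\min_{\substack{e \,\in\, E ,\;\; \eTrg(e) \,=\, q^\prime \\
	      \lab(e) \text{ matches } s \text{ at } \vtPtr \,=\, \vtPtr^\prime - |\lab(e)|}}
	\left( \vtPreWgt_{\aTuple{\eSrc(e),\vtPtr}} \srTimes w(e) \right) .
\end{align*}
The weight of the best path is then
$\min_{q} \left( \vtPreWgt_{\aTuple{q,|s|}} \srTimes \wgtFin(q) \right)$,
as selected in Line~\ref{pc1:L301}.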

496:

497:

498: The algorithm terminates by selecting the node $\widehat\vtNode$,

499: corresponding to the path with the minimal weight,

500: from the final node set $\vtNodeSet_{\sf final}=\vtNodeSet_{|s|}$.

501: This weight is the product of the node's prefix weight $\vtPreWgt_{\vtNode}$

502: and the final weight $\wgtFin(q)$ of the corresponding state $q\!\in\!Q$

503: (Line~\ref{pc1:L301}).

504: %

505: The function \FCT{getPath}{~} identifies the best path $\path$

506: by following all back-pointers $\vtPreNode_{\vtNode}$,

507: from the node $\widehat\vtNode\in\vtNodeSet_{\sf final}$

508: to some node $\vtNode\in\vtNodeSet_{\sf initial}$,

509: and collecting all transitions $e\!=\!\vtPreArc_{\vtNode}$ it encounters.

510: %

511: Finally, $\path$ is returned.

512:

513:

514: %% --------------------------------------------------------------

515:

516: \subsection{$\eps$-Transitions}

517:

518: The algorithm can be extended to allow for $\eps$-transitions (but not for $\eps$-cycles).

519: The source and target node, $\vtNode$ and $\vtNode^\prime$, of an $\eps$-transition

520: would be in the same $\vtNodeSet_{\vtPtr}$.

521: %

522: If $\vtNode^\prime\!=\!\aTuple{q^\prime, p^\prime}$ is actually inserted into $\vtNodeSet_{\vtPtr}$

523: (Line~\ref{pc1:L213})

524: or if its variables $\vtPreWgt_{\vtNode^\prime}$, $\vtPreNode_{\vtNode^\prime}$,

525: and $\vtPreArc_{\vtNode^\prime}$ change their values

526: (Lines~\ref{pc1:L214}--\ref{pc1:L215}),

527: then we have to (re-)``include'' $\vtNode^\prime$ into the iteration over all nodes

528: of the currently inspected $\vtNodeSet_{\vtPtr}$

529: (Line~\ref{pc1:L204}).

530: %

531: The algorithm will still terminate

532: since there can be only finite sequences of $\eps$-transitions

533: (as long as we have no $\eps$-cycles).

534:

535:

536: %% --------------------------------------------------------------

537:

538: \subsection{Best transduction}

539:

540: The algorithm \FUNCT{FsaViterbi}{~}

541: can be used for compiling the best transduction of a given input string $s$

542: by a $2$-WFSM (weighted transducer).

543: For this, we identify the best path $\path$ accepting $s$ on its input tape

544: and take the label of $\path$'s output tape as the best output string $v$.

545:

546:

547: %% --------------------------------------------------------------

548: %%  VITERBI ON n-TAPE

549: %% --------------------------------------------------------------

550:

551:

552: \section{$n$-Tape Best-Path Search

553:   \label{sec:vit-ntape}}

554:

555:

556: We now come to the central topic of this paper:

557: the generalization of the Viterbi algorithm

558: for searching for the best of all paths of an $n$-WFSM, $A\tapnum{n}$,

559: that accept a given $n$-tuple of input strings,

560: $s\tapnum{n}\!=\!\aTuple{s_1,\ldots s_n}$.

561: %

562: This requires relatively few modifications to the structures

563: and algorithm explained above (Section~\ref{sec:vit-1tape}).

564:

565:

566: %% --------------------------------------------------------------

567:

568: \subsection{Structures}

569:

570: The main difference with respect to the previous structures is that now

571: our reading pointer is a vector of $n$ natural numbers,

572: $

573:   \vtPtr\tapnum{n}\!=\!\aTuple{\vtPtr_1,\ldots\vtPtr_n}  \in

574:   \left(\vspc{2ex} \aRange{0,\dots|s_1|} \times\ldots

575:   \times \aRange{0,\dots|s_n|} \;\right)

576:   \subset \Nat^n

577: $.

578: The pointer is initially positioned before the first letter

579: of each $s_i$ ~($\forall i\!\in\!\aRange{1,n}$),

580: ~$\,\vtPtr\tapnum{n}\!=\!\aTuple{0,\ldots 0}\,$.

581: Its elements $p_i$ are then increased according to the non-synchronized reading of the $s_i$

582: on the tapes $i$ ~($\forall i\!\in\!\aRange{1,n}$),

583: until the pointer reaches its final position after the last letter of each $s_i$,

584: ~$\,\vtPtr\tapnum{n}\!=\!\aTuple{|s_1|,\ldots |s_n|}\,$.

585:

586: More precisely, a pointer is an element of the monoid $\aTuple{\Nat^n, +, {\bf 0}}$

587: with $+$ being vector addition and ${\bf 0}$ the vector of $n$ $0$'s.

588: %

589: We define a partial order on pointers.

590: Let $ \vtLess\; : \Nat^n\!\times\!\Nat^n \rightarrow \aSet{\srTrue, \srFalse} $.

591: Let $ a,b\in\Nat^n $,

592: then $ a \vtLess b  \biimplies  \left(\; \exists c\in\Nat^n, c\not={\bf 0} : a+c=b \right)\; $.

593: We say $a$ {\it precedes} $b$.

594: %

595: It holds that $ a \vtLess b  \implies  \left(\; \sum_{i=1}^n a_i < \sum_{i=1}^n b_i \right)\; $

596: where $a_i$ and $b_i$ are the vector elements.
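For example, with $n\!=\!2$ we have $\aTuple{1,2} \vtLess \aTuple{2,2}$,
since $\aTuple{1,2}+\aTuple{1,0}=\aTuple{2,2}$.
The pointers $\aTuple{1,2}$ and $\aTuple{2,1}$ do not precede each other,
because no $c\in\Nat^2$ maps one onto the other by addition.
The converse of the implication does not hold:
$\aTuple{0,2}$ has a smaller element sum than $\aTuple{2,1}$ (namely $2 < 3$),
but $\aTuple{0,2} \not\vtLess \aTuple{2,1}$,
since this would require $c=\aTuple{2,-1}\notin\Nat^2$.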

597:

598:

599: % ------------------------

600:

601: In the trellis (Figure~\ref{fig:vit-ntape})

602: we still have one node set $\vtNodeSet_{\vtPtr\tapnum{n}}$

603: per pointer position $\vtPtr\tapnum{n}$,

604: a single initial node set $\vtNodeSet_{\sf initial}\!=\!\vtNodeSet_{\aTuple{0,\dots 0}}$

605: and a single final node set $\vtNodeSet_{\sf final}\!=\!\vtNodeSet_{\aTuple{|s_1|,\dots|s_n|}}$.

606: There are, however, several node sets in parallel between the two

607: (corresponding to pointers $\vtPtr\tapnum{n},{\vtPtr^\prime}\tapnum{n}$

608:  not preceding each other, i.e.,

609:  $\vtPtr\tapnum{n}\!\not\vtLess\!{\vtPtr^\prime}\tapnum{n}  \logAnd

610:   {\vtPtr^\prime}\tapnum{n}\!\not\vtLess\!\vtPtr\tapnum{n}$).

611:

612:

613: \begin{figure}[ht]

614:   \begin{center}

615:     \includegraphics[scale=0.5,angle=0]{vit-ntape.eps}

616:     \caption{Modified trellis for $n$-tape best-path search

617: 	\label{fig:vit-ntape}}

618:   \end{center}

619:   %\vspace{-3ex}

620: \end{figure}

621:

622:

623: %% --------------------------------------------------------------

624:

625: \subsection{Algorithm}

626:

627: The algorithm \FUNCT{FsmViterbi}{~}

628: returns from all paths $\path\tapnum{n}$ of the $n$-WFSM $A\tapnum{n}$

629: that accept the string tuple $s\tapnum{n}$, the one with minimal weight

630: (Figure~\ref{pc:vit-ntape}).

631: $A\tapnum{n}$ must not contain any transitions labeled with $\aTuple{\eps,\ldots\eps}$.\footnote{

632:   %

633:   The algorithm can be extended to allow for $\aTuple{\eps,\ldots\eps}$-transitions

634:   (but not for $\aTuple{\eps,\ldots\eps}$-cycles)

635:   as described in Section~\ref{sec:vit-1tape}.

636:   %

637: }

638:

639:

640: The initial node set $\vtNodeSet_{\sf initial}\!=\!\vtNodeSet_{\aTuple{0,\dots 0}}$

641: is created as before, and inserted into the set of node sets $\vtNodeSetSet$

642: (Lines~\ref{pc2:L101}--\ref{pc2:L105}).

643: In addition, it is inserted into a Fibonacci heap\footnote{

644:   % -----------------------

645:   Alternatively, one could use a binary heap.

646:   Tests on a concrete example have, however, shown that the algorithm performs slightly better

647:   with a Fibonacci heap

648:   (Table~\ref{tab:AlignResults}).

649:   % -----------------------

650: }

651: $\vtNodeSetHeap$

652: ~(Line~\ref{pc2:L105})

653: \cite{fredman+tarjan:1987}.

654: This heap contains node sets $\vtNodeSet_{\vtPtr\tapnum{n}}$

655: that have not yet been processed,

656: and uses $\sum_{i=1}^n \vtPtr_i$ as sorting key.

657:

658:

659: The subsequent iteration continues as long as $\vtNodeSetHeap$ is not empty

660: (Lines~\ref{pc2:L201}--\ref{pc2:L215}).

661: %

662: The function \FCT{extractMinElement}{~}

663: extracts the (or a) minimal element $\vtNodeSet_{\vtPtr\tapnum{n}}$ from $\vtNodeSetHeap$

664: ~(Line~\ref{pc2:L202}).

665: Due to our sorting key,

666: none of the remaining $\vtNodeSet_{{\vtPtr^\prime}\tapnum{n}}$ in $\vtNodeSetHeap$

667: is a predecessor to $\vtNodeSet_{\vtPtr\tapnum{n}}$~:~

668: $

669: \forall\vtNodeSet_{{\vtPtr^\prime}\tapnum{n}}\!\in\!\vtNodeSetHeap \,,\;

670: {\vtPtr^\prime}\tapnum{n}\!\not\vtLess\!\vtPtr\tapnum{n}

671: $.

672: This property prevents the compilation of suffixes

673: of a $\vtNodeSet_{\vtPtr\tapnum{n}}$ that still has unanalyzed prefixes

674: (which could lead to wrong choices).

675: %

676: The extracted $\vtNodeSet_{\vtPtr\tapnum{n}}$ is

677: handled almost as in the previous algorithm (Figure~\ref{pc:vit-1tape}).

678: %

679: Transition labels $\lab(e\tapnum{n})$ are required to match a factor of $s\tapnum{n}$

680: at position $\vtPtr\tapnum{n}$

681: (Line~\ref{pc2:L206}).

682: %

683: New $\vtNodeSet_{{\vtPtr^\prime}\tapnum{n}}$ are inserted both into $\vtNodeSetSet$

684: and $\vtNodeSetHeap$

685: ~(Lines~\ref{pc2:L210}--\ref{pc2:L211}).
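To illustrate the effect of the sorting key,
consider $n\!=\!2$ and $|s_1|\!=\!|s_2|\!=\!1$ (values chosen here for illustration).
The pointers $\aTuple{0,0}$, $\aTuple{1,0}$, $\aTuple{0,1}$, and $\aTuple{1,1}$
have the keys $0$, $1$, $1$, and $2$, respectively.
Any extraction order consistent with these keys,
e.g., $\aTuple{0,0}, \aTuple{0,1}, \aTuple{1,0}, \aTuple{1,1}$,
processes every predecessor of a pointer before the pointer itself,
because $a \vtLess b$ implies a strictly smaller key for $a$.
The two pointers with key $1$ do not precede each other,
so their relative order does not matter.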

686:

687:

688: \begin{figure}[ht]

689:   \begin{center}

690:     {\small \input{vit-ntape.pc} }

691:     \vspace{-2ex}

692:     \caption{Pseudocode of $n$-tape best-path search

693:       \label{pc:vit-ntape}}

694:   \end{center}

695: \end{figure}

696:

697:

698: %% --------------------------------------------------------------

699:

700: \subsection{Best transduction}

701:

702: The algorithm \FUNCT{FsmViterbi}{~}

703: can be used for obtaining from a weighted $(n\!+\!m)$-WFSM (transducer)

704: with $n$ input and $m$ output tapes,

705: the best transduction of a given input $n$-tuple $s\tapnum{n}$.

706: For this, we identify the best path $\path\tapnum{n\!+\!m}$

707: accepting $s\tapnum{n}$ on its $n$ input tapes

708: and take the label of $\path$'s $m$ output tapes as the best output $m$-tuple $v\tapnum{m}$.

709: Input and output tapes can be in any order.

710:

711:

712: %% --------------------------------------------------------------

713:

714: \subsection{Complexity}

715:

716: The trellis (Figure~\ref{fig:vit-ntape})

717: consists of at most $|\vtPtrSet|=\prod_{i=1}^{n}(|s_i|+1)$

718: node sets $\vtNodeSet_{\vtPtr\tapnum{n}}\!\in\!\vtNodeSetSet$.

719: Assuming approximately equal length $|s|$ for all $s_i$ of $s\tapnum{n}$,

720: we can simplify: $|\vtPtrSet|\approx(|s|+1)^n$.

721: %

722: For each node set $\vtNodeSet_{\vtPtr\tapnum{n}}$

723: we have to create at most $|Q|$ nodes $\vtNode\!\in\!\vtNodeSet_{\vtPtr\tapnum{n}}$,

724: which leads to a $\complexity\left( |s|^n |Q| \right)$ space complexity

725: for our algorithm.
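For instance, for a pair of strings ($n\!=\!2$) of length $4$ each
(values chosen here for illustration), the bound on the number of node sets is
\[
  |\vtPtrSet| = (|s_1|+1)\,(|s_2|+1) = 5 \cdot 5 = 25 ,
\]
so that the trellis contains at most $25\,|Q|$ nodes.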

726:

727: Each $\vtNodeSet_{\vtPtr\tapnum{n}}$ is extracted once from the Fibonacci heap $\vtNodeSetHeap$

728: in $\complexity(\log|P|)$ time.

729: We analyze for $\vtNodeSet_{\vtPtr\tapnum{n}}$ at most $|E|$ transitions $e\!\in\!E$

730: of $A\tapnum{n}$.

731: For the target of each $e$ we find a $\vtNodeSet_{{\vtPtr^\prime}\tapnum{n}}\!\in\!\vtNodeSetSet$

732: in $\complexity(\log|P|)$ time

733: and a node $\vtNode^\prime\!\in\!\vtNodeSet_{{\vtPtr^\prime}\tapnum{n}}$

734: in $\complexity(\log|Q|)$ time.

735: %

736: Thus, \FUNCT{FsmViterbi}{~} has a worst-case overall time complexity of

737: $

738: \complexity\left(\; |P| (\log|P| + |E| (\log|P| + \log|Q|)) \;\right)

739: = \complexity\left(\, |P||E|\log|P||Q| \,\right)

740: = \complexity\left(\, |s|^n|E|\log|s|^n|Q| \,\right)

741: $~.
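The simplification can be spelled out as follows,
assuming $|E| \geq 1$ and treating $n$ as a constant
(both assumptions are made here and are not stated in the text):
\begin{align*}
  |P| \left( \log|P| + |E| \left( \log|P| + \log|Q| \right) \right)
	&= |P| \log|P| \;+\; |P|\,|E| \log\left( |P|\,|Q| \right) \\
	&\leq 2\, |P|\,|E| \log\left( |P|\,|Q| \right) .
\end{align*}
With $|P| \approx (|s|+1)^n \leq 2^n |s|^n$ for $|s| \geq 1$,
the last expression is in
$\complexity\left(\, |s|^n|E|\log|s|^n|Q| \,\right)$.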

742:

743:

744: An HMM has exactly one transition per state pair, so that $|E|\!=\!|Q|^2$,

745: and an arity of $n\!=\!1$.

746: There would also never be more than one $\vtNodeSet_{\vtPtr\tapnum{n}}$ on the heap,

747: extractable in constant time.

748: In this case, our algorithm has a $\complexity\left( |s| |Q| \right)$ space

749: and a $\complexity\left( |s| |Q|^2 \right)$ time complexity,

750: as has the classical version of the Viterbi algorithm

751: (Section~\ref{sec:prelim}).

752:

753:

754: %% --------------------------------------------------------------

755: %%  VITERBI ON n-TAPE

756: %% --------------------------------------------------------------

757:

758:

759: \section{Example: Word Alignment

760:   \label{sec:align}}

761:

762:

763: In this section we illustrate our $n$-tape best path search on a practical example:

764: the alignment of word pairs.

765:

766: Suppose we want to create a (non-weighted) transducer, $D\tapnum{2}$,

767: from a list of word pairs $s\tapnum{2}$

768: of the form $\aTuple{\WordD{inflected form}, \WordD{lemma}}$,

769: e.g., $\aTuple{\Word{swum}, \Word{swim}}$,

770: such that each path of the transducer is labeled with one of the pairs.

771: %

772: We want to use only transition labels of the form

773: $\aTuple{\sigma,\sigma}$, $\aTuple{\sigma,\eps}$, or $\aTuple{\eps,\sigma}$ ~($\forall\sigma\in\Sigma$),

774: while keeping paths as short as possible.

775: For example,

776: $\aTuple{\Word{swum}, \Word{swim}}$ should be encoded either by the sequence

777: $\aTuple{\Word{s},\Word{s}}\aTuple{\Word{w},\Word{w}}%

778: \aTuple{\Word{u},\eps}\aTuple{\eps,\Word{i}}\aTuple{\Word{m},\Word{m}}$

779: or by

780: $\aTuple{\Word{s},\Word{s}}\aTuple{\Word{w},\Word{w}}%

781: \aTuple{\eps,\Word{i}}\aTuple{\Word{u},\eps}\aTuple{\Word{m},\Word{m}}$,

782: rather than by the ill-formed

783: $\aTuple{\Word{s},\Word{s}}\aTuple{\Word{w},\Word{w}}%

784: \aTuple{\Word{u},\Word{i}}\aTuple{\Word{m},\Word{m}}$,

785: or the sub-optimal  %\linebreak

786: $\aTuple{\Word{s},\eps}\aTuple{\Word{w},\eps}\aTuple{\Word{u},\eps}\aTuple{\Word{m},\eps}%

787: \aTuple{\eps,\Word{s}}\aTuple{\eps,\Word{w}}\aTuple{\eps,\Word{i}}\aTuple{\eps,\Word{m}}$.

788: %

789: To achieve this, we perform for each word pair an alignment based on minimal edit distance.

790:

791:

792: %% --------------------------------------------------------------

793:

794: \subsection{Standard solution with edit distance matrix}

795:

796: A well-known standard solution for word alignment is based on edit distance,

797: which is a string similarity measure

798: defined as the minimum cost needed to convert one string into another

799: \cite{wagner+fischer:1974,pirkola+al:2003}.

800: %%++ For the sake of space, we only briefly recall this method.

801:

802:

803: For two words, $a\!=\!a_1\ldots a_n$ and $b\!=\!b_1\ldots b_m$,

804: the edit distance can be compiled with a matrix

805: $X\!=\!\{x_{i,j}\}$ ~($i\!\in\!\aRange{0,n}$, $j\!\in\!\aRange{0,m}$)

806: (Figures~\ref{fig:EditDistanceMatrix} and~\ref{pc:EditDistanceMatrix}).

807: %

808: A horizontal move in $X$ at a cost $c_I$ expresses an {\it insertion}\/,

809: a vertical move at a cost $c_D$ a {\it deletion}\/,

810: and a diagonal move at a cost $c_S$ a {\it substitution}\/ if $a_i\!\not=\!b_j$

811: or no edit operation if $a_i\!=\!b_j$.

812: %

813: We set $c_I\!=\!c_D\!=\!1$,

814: $c_S\!=\!\infty$ for $a_i\!\not=\!b_j$ (to disable substitutions),

815: and $c_S\!=\!0$ for $a_i\!=\!b_j$.

816: %

817: The element $x_{0,0}$ is set to $0$ and all other $x_{i,j}$ to

818: $\min(x_{i,j-1}+c_I\,,\; x_{i-1,j}+c_D\,,\; x_{i-1,j-1}+c_S)$,

819: insofar as these choices are available,

820: proceeding top-down and left-to-right.

821: %

822: The choices made to go from $x_{0,0}$ to $x_{n,m}$ describe the set of paths with (the same) minimal cost.

823: Each of these paths defines a sequence of edit operations for transforming $a$ into $b$.

824:

825:

826: The algorithm runs in $\complexity(|a||b|)$ time and space.
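
As a small worked instance, consider $a\!=\!\Word{swum}$ and $b\!=\!\Word{swim}$
with the costs chosen above.
Filling $X$ row by row should give the following matrix
(rows are indexed by the symbols of $a$, columns by those of $b$;
it is only an illustration and is not necessarily the matrix depicted in
Figure~\ref{fig:EditDistanceMatrix}):
\[
  \begin{array}{c|ccccc}
             & \eps & \Word{s} & \Word{w} & \Word{i} & \Word{m} \\ \hline
    \eps     & 0 & 1 & 2 & 3 & 4 \\
    \Word{s} & 1 & 0 & 1 & 2 & 3 \\
    \Word{w} & 2 & 1 & 0 & 1 & 2 \\
    \Word{u} & 3 & 2 & 1 & 2 & 3 \\
    \Word{m} & 4 & 3 & 2 & 3 & 2
  \end{array}
\]
The entry $x_{4,4}\!=\!2$ is the minimal cost:
one deletion (of \Word{u}) and one insertion (of \Word{i}), in either order,
since substitutions are disabled by $c_S\!=\!\infty$.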

827:

828:

829: \begin{figure}[ht]

830:   \vspace{1ex}

831:   \begin{center}

832:     \begin{minipage}{65mm}

833:       \begin{center}

834: 	\includegraphics[scale=0.5,angle=0]{edit-matrix.eps}

835:

836: 	\caption{Edit distance matrix

837: 		$X\!=\!\{x_{i,j}\}$

838: 		(choices are indicated by arrows; minimum cost paths by thick arrows and circles)

839: 		\label{fig:EditDistanceMatrix}}

840:       \end{center}

841:     \end{minipage}

842:     %

843:     \spc{10mm}

844:     %

845:     \begin{minipage}{60mm}

846:       \begin{center}

847: 	{\small

848: 	  \input{align.pc}

849: 	}

850:

851: 	\vspace{-3ex}

852: 	\caption{Pseudocode of compiling an {\it edit distance matrix}

853: 		\label{pc:EditDistanceMatrix}}

854:       \end{center}

855:     \end{minipage}

856:   \end{center}

857: \end{figure}

858:

859:

860: %% --------------------------------------------------------------

861:

862: %\newpage

863:

864: \subsection{Solution with 2-tape best path search}

865:

866: Alternatively, word alignment can be performed by best path search on an $n$-WFSM,

867: such as $A\tapnum{5}$ generated from the expression

868: \cite{isabelle+kempe:2004}

869: %

870: %%++ \begin{equation}

871: %%++  \scalebox{0.85 1.0}{

872: \begin{eqnarray}

873:   A\tapnum{5} & \;=\; & \left(\;

874: 			\aTuple{\aTuple{\Any,\Any,\Any,\Any,\algnK}_{\aSet{1=2=3=4}} , 0}

875: 			 \right.	 \nonumber \\

876: 		& &	 \left. \spc{5ex}

877: 			\;\cup\; \aTuple{\aTuple{\eps,\Any,\algnEps,\Any,\algnI}_{\aSet{2=4}} , 1}

878: 			 \;\cup\; \aTuple{\aTuple{\Any,\eps,\Any,\algnEps,\algnD}_{\aSet{1=3}} , 1} \;\right)^*

879: 	\label{eq:align}

880: \end{eqnarray}

881: %%++  }

882: %%++ \end{equation}

883:

884: \noindent

885: where $\Any$~ can be instantiated by any symbol $\sigma\!\in\!\Sigma$,

886: $\algnEps$ is a special symbol representing $\eps$ in an alignment,

887: $\aSet{1\!=\!2\!=\!3\!=\!4}$ a constraint

888: requiring the $\Any$'s on tapes $1$ to $4$ to be instantiated by the same symbol

889: \cite{nicart+al:2006a},\footnote{

890:   %

891:   Roughly following \cite{kempe+champarnaud+eisner:2004a},

892:   we employ here a simpler notation for constraints than in~\cite{nicart+al:2006a}.

893:   %

894: }

895: and $0$ and $1$ are weights over the semiring $\aTuple{\srSetN\cup\aSet{\infty}, \min, +, \infty, 0}$.

896:

897: Input word pairs $s\tapnum{2}\!=\!\aTuple{s_1,s_2}$ will be matched on tapes 1 and 2,

898: and aligned output word pairs generated from tapes 3 and 4.

899: %

900: A symbol pair $\aTuple{\Any,\Any}$ read on tapes 1 and 2

901: is identically mapped to $\aTuple{\Any,\Any}$ on tapes 3 and 4,

902: a $\aTuple{\eps,\Any}$ is mapped to $\aTuple{\algnEps,\Any}$,

903: and a $\aTuple{\Any,\eps}$ to $\aTuple{\Any,\algnEps}$.

904: %

905: $A\tapnum{5}$ will introduce $\algnEps$'s in $s_1$ (resp. in $s_2$) at positions

906: where $D\tapnum{2}$ shall have $\aTuple{\eps,\sigma}$-

907: (resp. $\aTuple{\sigma,\eps}$-) transitions.

908: %

909: (Later, we simply replace in $D\tapnum{2}$ all $\algnEps$ by $\eps$.)

910:

911:

912: Thus, we obtain the full set of all possible alignments between $s_1$ and $s_2$.

913: The best alignment is the one with the lowest weight.

914: %

915: For example, $\aTuple{\Word{swum}, \Word{swim}}$ is mapped to a set of alignments,

916: including the two best ones,

917: $\aTuple{\Word{sw\algnEps um}, \Word{swi\algnEps m}}$

918: and $\aTuple{\Word{swu\algnEps m}, \Word{sw\algnEps im}}$, both with weight 2.

919: %

920: The (or a) best alignment can be found without generating all alignments,

921: by means of our $n$-tape best path search (with $n\!=\!2$).
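
To make the weights concrete:
in the semiring above, the weight of a path is the sum of its transition weights,
and competing paths are compared with $\min$.
Writing only the tape-5 symbols of each transition
(the function name $w$ is notation introduced here for illustration),
the alignment $\aTuple{\Word{sw\algnEps um}, \Word{swi\algnEps m}}$
and the sub-optimal alignment that first deletes all of \Word{swum}
and then inserts all of \Word{swim} have the weights
\begin{eqnarray*}
  w(\algnK\,\algnK\,\algnI\,\algnD\,\algnK) & = & 0+0+1+1+0 \;=\; 2 \\
  w(\algnD\,\algnD\,\algnD\,\algnD\,\algnI\,\algnI\,\algnI\,\algnI) & = & 8\cdot 1 \;=\; 8
\end{eqnarray*}
so that $\min(2,8)\!=\!2$ selects the former.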

922:

923:

924: So far, we have not used tape 5.

925: It can serve for excluding certain paths.

926: For example, joining $A\tapnum{5}$ on tape 5 with $C\tapnum{1}$

927: %\cite{kempe+champarnaud+eisner:2004a}  %% REPLACE THIS REFERENCE IN THE FINAL VERSION

928: \cite{kempe+al:2005a,kempe+al:2005b}

929: built from the expression $\neg(\Any^*\;\algnI\;\algnD\;\Any^*)$,

930: prohibiting an insertion ($\algnI$) from being immediately followed by a deletion ($\algnD$),

931: would leave only $\aTuple{\Word{swu\algnEps m}, \Word{sw\algnEps im}}$ as a best path.

932:

933:

934: %%++ Our algorithm operates on this example

935: %%++ with a $\complexity(\,|s_1||s_2|\log|s_1||s_2|\,)$ time

936: %%++ and a $\complexity(\,|s_1||s_2|\,)$ space complexity.

937:

938: The 5-WFSM from Equation~\eqref{eq:align}

939: has 1 state and 3 transitions.

940: Input is read on 2 tapes.

941: Our algorithm works on this example

942: with a worst-case time complexity of

943: $

944:   \complexity(\, |s_1||s_2|\cdot 3\cdot \log(|s_1||s_2|\cdot 1) \,)

945:   = \complexity(\,|s_1||s_2|\log|s_1||s_2|\,)

946: $

947: and a worst-case space complexity of

948: $

949:   \complexity(\,|s_1||s_2|\cdot 1 \,)

950:   = \complexity(\,|s_1||s_2|\,)

951: $~.

952:

953:

954: %% --------------------------------------------------------------

955:

956:

957: \subsection{Test results}

958:

959: We tested our $n$-tape best-path algorithm on the alignment of the German word pair

960: $\aTuple{\Word{gemacht}, \Word{machen}}$ ~(English: $\aTuple{\WordD{done}, \WordD{do}}$),

961: leading to  % \linebreak

962: $\aTuple{\Word{gemacht\algnEps\algnEps}, \Word{\algnEps\algnEps{mach}\algnEps{en}}}$.

963: %

964: We repeated this test for the word pairs $\aTuple{s_1^r, s_2^r}$

965: with $s_1=$``\Word{gemacht}'' and $s_2=$``\Word{machen}'',

966: and $r\!\in\!\aRange{1,8}$.\footnote{

967:   % --------------------------------

968:   For example, for $r\!=\!2$ we have

969:   $\aTuple{{\sf\small gemachtgemacht}, {\sf\small machenmachen}}$.

970:   % --------------------------------

971: }

972:

973:

974: \def\S{\spc{1.1ex}}

975: \def\D{\spc{0.6ex}}

976:

977: \begin{table}[ht]

978:   \vspace{1ex}

979:   \begin{center}

980:     \begin{math} %\tabcolsep1ex

981:       \begin{tabular}{ c | r r r | c } \hline

982: 	% ------------------------------

983: 	\spc{2ex}$r$\spc{2ex}	& \spc{2ex}A\spc{0.5ex}	& \spc{3ex}B\spc{2.5ex}	& \spc{3ex}C\spc{2.8ex}	& \spc{4ex}D\spc{3ex}	\\ \hline

984: 	% ------------------------------

985: 	1	&  1	&   1\D\S\S	&   1\D\S\S	& 1.056	\\

986: 	2	&  4	&   4.12	&   5.48	& 1.041	\\

987: 	3	&  9	&   9.41	&  14.3\S	& 1.057	\\

988: 	4	& 16	&  17.1\S	&  27.9\S	& 1.029	\\

989: 	5	& 25	&  27.2\S	&  46.5\S	& 1.059	\\

990: 	6	& 36	&  39.8\S	&  70.5\S	& 1.016	\\

991: 	7	& 49	&  54.1\S	& 100\D\S\S	& 1.005	\\

992: 	8	& 64	&  70.8\S	& 135\D\S\S	& 1.006	\\ \hline

993: 	% ------------------------------

994:       \end{tabular}

995:     \end{math}

996:

997:     \caption{Test results for word pair alignment with 2-tape best path search

998: 	\label{tab:AlignResults}}

999:   \vspace{-1ex}

1000:   \end{center}

1001: \end{table}

1002:

1003:

1004: \noindent

1005: The columns of Table~\ref{tab:AlignResults} show for different $r$~:

1006: %

1007: \begin{itemize}

1008: \item[(A)]

1009:   an estimated time ratio of $r^2$ for the classical approach with an edit distance matrix,

1010:   %

1011: \item[(B)]

1012:    the measured time ratio for 2-tape best path search (relative to 3.93 milliseconds for $r\!=\!1$)

1013:    using a Fibonacci heap,

1014:    %

1015: \item[(C)]

1016:   an estimated worst-case time ratio of

1017:   $

1018:   \frac{(7r\cdot 6r) \log (7r\cdot 6r)}{(7\cdot 6) \log (7\cdot 6)}

1019:   = r^2(1\!+\!2\frac{\log r}{\log 42})

1020:   $

1021:   corresponding to the worst-case complexity of $\complexity(7r\cdot 6r \log (7r\cdot 6r))$

1022:   for the two words of length $7r$ and $6r$, respectively, and

1023:   %

1024: \item[(D)]

1025:   the measured time increase factor when using a binary instead of a Fibonacci heap.

1026:   %

1027: \end{itemize}
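
The closed form in (C) follows from $\log(42\,r^2) = \log 42 + 2\log r$.
A short derivation, with a rounded numerical check for $r\!=\!2$, is:
\begin{eqnarray*}
  \frac{(7r\cdot 6r)\log(7r\cdot 6r)}{(7\cdot 6)\log(7\cdot 6)}
    & = & \frac{42\,r^2\,(\log 42 + 2\log r)}{42\,\log 42}
    \;=\; r^2\left(1 + 2\,\frac{\log r}{\log 42}\right) \\
  r\!=\!2: & & 4\left(1 + 2\,\frac{\log 2}{\log 42}\right)
    \;\approx\; 4\cdot 1.371 \;\approx\; 5.48
\end{eqnarray*}
which agrees with the entry for $r\!=\!2$ in column C.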

1028:

1029:

1030: Comparing columns A and B shows, for our algorithm on this example,

1031: a time complexity slightly above $\complexity(r^2) = \complexity(\,|s_1^r||s_2^r|\,)$,

1032: which is much lower than the worst-case time complexity

1033: estimated in column C.

1034:

1035:

1036: \pagebreak

1037:

1038: %% --------------------------------------------------------------

1039: %%  ALTERNATIVES

1040: %% --------------------------------------------------------------

1041:

1042:

1043: \section{An Alternative Approach

1044:   \label{sec:alternatives}}

1045:

1046:

1047: A well-known straightforward alternative to the above $n$-tape best-path search

1048: on an $n$-WFSM $A\tapnum{n}$ is

1049: to intersect $A\tapnum{n}$ with an $n$-WFSM $I\tapnum{n}$,

1050: containing a single path labeled with the input $n$-tuple $s\tapnum{n}$,

1051: and then to apply a classical shortest-distance algorithm, ignoring the labels.

1052:

1053:

1054: %% --------------------------------------------------------------

1055:

1056: \subsection{Intersection}

1057:

1058: The intersection $B\tapnum{n} = I\tapnum{n} \cap A\tapnum{n}$

1059: can be compiled as the join $I\tapnum{n} \JOIN{1=1,\ldots n=n} A\tapnum{n}$

1060: \cite{kempe+champarnaud+eisner:2004a}.

1061: %

1062: In general, it has undecidable emptiness and rationality \cite{rabin+scott:1959}.

1063: In our case, however,

1064: with $A\tapnum{n}$ being $\aTuple{\eps,\ldots\eps}$-cycle free

1065: and $I\tapnum{n}$ acyclic,

1066: it is always rational, even for non-commutative semirings.\footnote{

1067:   % ----------------------------------

1068:   The intersection of two $n$-WFSM over non-commutative semirings

1069:   is in general not rational (even for $n\!=\!1$).

1070:   % ----------------------------------

1071: }

1072:

1073:

1074: Actually, the trellis $\vtNodeSet$ in Figure~\ref{fig:vit-ntape}

1075: corresponds partially to $B\tapnum{n}$.

1076: Each node $\vtNode\!\in\!\vtNodeSet$

1077: corresponds to a state $q\!\in\!Q_B$ of $B\tapnum{n}$

1078: (and vice versa);

1079: however, only those transitions $e\!\in\!E_B$ of $B\tapnum{n}$

1080: that correspond to a state's best prefix

1081: %%++ occur as back-pointers $\vtPreNode_{\vtNode}$ in $\vtNodeSet$.\footnote{

1082: occur as ``best transitions'' $e_{\vtNode}$ in $\vtNodeSet$.\footnote{

1083:   % ---------------------------------

1084:   Due to this analogy, one can easily derive an $n$-tape intersection (or join) algorithm,

1085:   for precisely our case, from the algorithm in Figure~\ref{pc:vit-ntape}.

1086:   Trellis nodes would become states of the resulting $n$-WFSM.

1087:   All of their incoming transitions would be constructed,

1088:   rather than only those that correspond to a best prefix.

1089:   The state set would be partitioned like the trellis.

1090:   The Fibonacci heap can be replaced by a stack

1091:   (which does not decrease the overall time complexity),

1092:   because the order in which partitions are treated would be irrelevant.

1093:   % ---------------------------------

1094: }

1095:

1096:

1097: From this analogy we deduce that compiling the intersection $B\tapnum{n}$

1098: has a worst-case time and space complexity of

1099: $\complexity\left(\, |P||E|\log|P||Q| \,\right)$, with $|P|\!=\!(|s|+1)^n$,

1100: equal to the time complexity for constructing the trellis.

1101: %

1102: The result, $B\tapnum{n}$, has at most

1103: $\nu \leq |P||Q|$ states and $\mu \leq |P||E|$ transitions.

1104:

1105:

1106: %% --------------------------------------------------------------

1107:

1108: \subsection{Shortest-distance algorithms}

1109:

1110: Since any $n$-WFSM with multiple initial states can be transformed

1111: into one with a single initial state,

1112: we can use any algorithm that solves a single-source shortest-distance problem,

1113: %

1114: such as Dijkstra's algorithm \cite{dijsktra:1959}

1115: combined with Fibonacci heaps \cite{fredman+tarjan:1987},

1116: that operates in $\complexity(\mu + \nu\log\nu)$ time,

1117: %

1118: or Bellman-Ford's algorithm \cite{bellman:1958,ford+fulkerson:1956}

1119: operating in $\complexity(\mu\nu)$ time,

1120: %

1121: with $\nu$ being the number of states and $\mu$ the number of transitions.

1122:

1123:

1124: Recently, it has been shown that any single-source shortest-distance algorithm on

1125: directed graphs has a lower bound of $\Omega(\mu + \min(\nu\log\nu ,\; \nu\log\rho))$

1126: where $\rho$ is the ratio of the maximal to minimal transition weight

1127: \cite{pettie:2003}.

1128: Since we cannot make any assumption concerning $\rho$ in general,

1129: we consider $\widehat\Omega(\mu + \nu\log\nu)$ as a ``worst-case lower bound''.

1130: It equals the upper bound of Dijkstra's algorithm.

1131:

1132:

1133: On the intersection $B\tapnum{n} = I\tapnum{n} \cap A\tapnum{n}$,

1134: Dijkstra's algorithm requires $\complexity(|P||E|+|P||Q|\log|P||Q|)$ time,

1135: and Bellman-Ford's $\complexity(|P|^2|E||Q|)$ time, in the worst case.

1136: %

1137: The sets $E$ and $Q$ refer to $A\tapnum{n}$.

1138:

1139:

1140: %% --------------------------------------------------------------

1141:

1142: \subsection{Complete estimate}

1143:

1144: Intersection and Dijkstra's algorithm have together

1145: a worst-case time complexity of  \linebreak

1146: $

1147: \complexity\left(\, |P||E|\log|P||Q| + |P||E| + |P||Q|\log|P||Q| \,\right)

1148: = \complexity\left(\, |P|(|E|+|Q|) \log|P||Q| \,\right)

1149: $.

1150: For intersection and Bellman-Ford's algorithm it is

1151: $

1152: \complexity\left(\, |P||E|\log|P||Q| + |P|^2|E||Q| \,\right) =

1153: \complexity\left(\, |P||E|\,(|P||Q|\!+\!\log|P||Q|) \,\right)

1154: $.

1155: Both combinations exceed the complexity of our algorithm.

1156:

1157:

1158: This result is not surprising since

1159: building only the trellis $\vtNodeSet$ should take less time

1160: than building the intersection $B\tapnum{n}$

1161: (which is a kind of ``superset'' of $\vtNodeSet$)

1162: and then performing a best-path search.

1163:

1164:

1165: %% --------------------------------------------------------------

1166:

1167:

1168: \pagebreak

1169:

1170: %% --------------------------------------------------------------

1171: %%  CONCLUSION

1172: %% --------------------------------------------------------------

1173:

1174:

1175: \section{Conclusion

1176:    \label{sec:conclusion}}

1177:

1178:

1179: We presented an algorithm for identifying the path with minimal (resp. maximal) weight

1180: in a given {\it $n$-tape weighted finite-state machine}\/ ($n$-WFSM), $A\tapnum{n}$,

1181: that accepts a given $n$-tuple of input strings,

1182: $s\tapnum{n}\!=\!\aTuple{s_1,\ldots s_n}$.

1183: %

1184: This problem is of particular interest because it also allows us

1185: to compile the best transduction of a given input $n$-tuple $s\tapnum{n}$

1186: by a weighted $(n\!+\!m)$-WFSM (transducer), $A\tapnum{n+m}$, with $n$ input and $m$ output tapes.

1187: For this, we identify the best path accepting $s\tapnum{n}$ on its $n$ input tapes,

1188: and take the label of its output tapes as best output $m$-tuple $v\tapnum{m}$.

1189: (Input and output tapes can be in any order.)

1190:

1191:

1192: Our algorithm is a generalization of the Viterbi algorithm

1193: which is generally used for detecting the most likely path

1194: in a {\it Hidden Markov Model}\/ (HMM)

1195: for an observed sequence of symbols emitted by the HMM.

1196: %

1197: In the worst case,

1198: it operates in $\complexity\left(\, |s|^n|E|\log|s|^n|Q| \,\right)$ time,

1199: where $n$ and $|s|$ are the number and average length of the strings in $s\tapnum{n}$,

1200: and $|Q|$ and $|E|$ the number of states and transitions of $A\tapnum{n}$,

1201: respectively.

1202:

1203:

1204: We illustrated our $n$-tape best path search on a practical example,

1205: the alignment of word pairs (i.e., $n\!=\!2$),

1206: and provided test results that show a time complexity slightly higher than

1207: $\complexity\left(\, |s|^2 \,\right)$.

1208:

1209:

1210: Finally, we discussed a straightforward alternative approach for solving our problem,

1211: that consists in intersecting $A\tapnum{n}$ with an $n$-WFSM $I\tapnum{n}$,

1212: that has a single path labeled with the input $n$-tuple $s\tapnum{n}$,

1213: and then applying a classical shortest-distance algorithm, ignoring the labels.

1214: %

1215: This has, however, a worst-case time complexity of

1216: $ \complexity\left(\, |s|^n(|E|+|Q|)\log|s|^n|Q| \,\right) $,

1217: which is higher than that of our algorithm.

1218:

1219:

1220: %% --------------------------------------------------------------

1221:

1222:

1223:

1224: %% --------------------------------------------------------------

1225: %%  BIBLIOGRAPHY

1226: %% --------------------------------------------------------------

1227:

1228: \bibliographystyle{fullname}

1229:

1230: \bibliography{ntvit}

1231:

1232:

1233: %% --------------------------------------------------------------

1234:

1235: \end{document}

1236:

1237: