\documentclass[a4paper,10pt]{article}
2:
3:
4: %% --------------------------------------------------------------
5: %% BEGIN -- DEFINITIONS AND PACKAGES
6:
7: \pagestyle{plain}
8:
9: \usepackage{latexsym}
10:
11: %\usepackage{psfig}
12: \usepackage{graphicx}
13: \usepackage{pstricks}
14: \usepackage{amssymb}
15: \usepackage{amsmath}
16: \usepackage{fullname}
17:
18: \textheight 220mm
19: \textwidth 155mm
20: \headheight -2mm
21: \oddsidemargin 0mm
22: \topmargin 0mm
23:
24: \input{ntvit.def}
25:
26: %% END -- DEFINITIONS
27: %% --------------------------------------------------------------
28:
29: \begin{document}
30:
31:
32: %% --------------------------------------------------------------
33: %% TITLE AND ABSTRACT
34: %% --------------------------------------------------------------
35:
36: \title{ \vspace{-2ex}
37: Viterbi Algorithm Generalized for $n$-Tape Best-Path Search
38: }
39:
40: \author{
41: Andr\'e Kempe \vspace{2ex} \\
42: Xerox Research Centre Europe ~--~ Grenoble Laboratory \\
43: 6 chemin de Maupertuis ~--~ 38240 Meylan ~--~ France \\
44: }
45:
46: \date{
47: March 9, 2006
48: }
49:
50: \maketitle
51:
52: \begin{abstract}
53: %
54: We present a generalization of the Viterbi algorithm
55: for identifying the path with minimal (resp. maximal) weight
in an {\it $n$-tape weighted finite-state machine}\/ ($n$-WFSM)
57: that accepts a given $n$-tuple of input strings $\aTuple{s_1,\ldots s_n}$.
58: %
59: It also allows us to compile the best transduction of a given input $n$-tuple
60: by a weighted $(n\!+\!m)$-WFSM (transducer) with $n$ input and $m$ output tapes.
61: %
62: Our algorithm has a worst-case time complexity of
63: $\complexity\left(\, |s|^n|E|\log|s|^n|Q| \,\right)$,
64: where $n$ and $|s|$ are the number and average length of the strings in the $n$-tuple,
65: and $|Q|$ and $|E|$ the number of states and transitions in the $n$-WFSM,
66: respectively.
67: %
A straightforward alternative,
consisting of intersection followed by classical shortest-distance search,
70: operates in $\complexity\left(\, |s|^n(|E|+|Q|)\log|s|^n|Q| \,\right)$ time.
71: %
72: \end{abstract}
73:
74:
75: %% --------------------------------------------------------------
76: %% INTRODUCTION
77: %% --------------------------------------------------------------
78:
79:
80: \section{Introduction
81: \label{sec:intro}}
82:
83:
This paper lies at the intersection of two areas:
{\it multi-tape}\/ or {\it $n$-tape weighted finite-state machines}\/ ($n$-WFSMs)
and shortest-path problems.
87:
88:
89: %% --------------------------------------------------------------
90:
91: $n$-WFSMs
92: \cite{rabin+scott:1959,elgot+mezei:1965,kay:1987,harju+karhumaki:1991,kaplan+kay:1994}
93: are a natural generalization of the familiar
94: finite-state acceptors (one tape) and transducers (two tapes).
95: The $n$-ary relation defined by an $n$-WFSM is a weighted {\em rational\/} relation.
96: %
97: Finite relations are of particular interest since they
98: can be viewed as relational databases.
99: %
100: A finite-state transducer ($n=2$) can be seen as a database of string pairs,
101: such as $\aTuple{\WordD{spelling}, \WordD{pronunciation}}$ or
102: $\aTuple{\WordD{French word}, \WordD{English word}}$.
103: %
104: Unlike a classical database, a transducer may even define infinitely many pairs.
105: For example, it may characterize the pattern of the spelling-pronunciation
106: relationship in such a way that it can map even the spelling of an unknown word
107: to zero or more possible pronunciations (with various weights),
108: and vice-versa.
109: %
110: $n$-WFSMs have been used in the morphological analysis of Semitic languages,
111: to synchronize the vowels, consonants, and
112: templatic pattern into a surface form \cite{kay:1987,kiraz:2000}.
113:
114:
115: %% --------------------------------------------------------------
116:
Classical shortest-path algorithms can be separated into two groups:
those addressing single-source shortest-path (SSSP) problems,
such as Dijkstra's algorithm \cite{dijsktra:1959}
or the Bellman-Ford algorithm \cite{bellman:1958,ford+fulkerson:1956},
and those addressing all-pairs shortest-path (APSP) problems,
such as the Floyd-Warshall algorithm \cite{floyd:1962,warshall:1962}.
SSSP algorithms determine a minimum-weight path from a source vertex
of a real- or integer-weighted graph to each of its other vertices.
APSP algorithms find shortest paths between all pairs of vertices.
126: %
127: For details of shortest-path problems in graphs see \cite{pettie:2003},
128: and in semiring-weighted finite-state automata see \cite{mohri:2002b}.
129:
130:
131: %% --------------------------------------------------------------
132:
133: \smallskip
134:
135: We address the following problem:
136: in a given $n$-WFSM we want to identify the path with minimal (resp. maximal) weight
137: that accepts a given $n$-tuple of input strings $\aTuple{s_1,\ldots s_n}$.
138: %
This is of particular interest because it also allows us
to compile the best transduction of a given input $n$-tuple
by a weighted $(n\!+\!m)$-WFSM (transducer) with $n$ input and $m$ output tapes.
For this, we identify the best path accepting the input $n$-tuple on its input tapes,
and take the label of the path's output tapes as the best output $m$-tuple.
144:
145:
A known straightforward method for solving our problem is
to intersect the $n$-WFSM with another one that contains a single path
labeled with the input $n$-tuple,
and then to apply a classical SSSP algorithm, ignoring the labels.
%
We show that such an intersection followed by Dijkstra's algorithm has
152: a worst-case time complexity of
153: $ \complexity\left(\, |s|^n(|E|+|Q|)\log|s|^n|Q| \,\right) $,
154: where $n$ and $|s|$ are the number and average length of the strings in the $n$-tuple,
155: and $|Q|$ and $|E|$ the number of states and transitions of the $n$-WFSM,
156: respectively.
157:
158:
159: We propose an alternative approach with lower complexity.
It is based on the Viterbi algorithm,
which is generally used for finding the most likely path
162: in a {\it Hidden Markov Model}\/ (HMM)
163: for an observed sequence of symbols emitted by the HMM
164: \cite{viterbi:1967,rabiner:1990,manning+schuetze:1999}.
165: %
Our algorithm generalizes Viterbi's algorithm
to handle an $n$-tuple of input strings rather than a single input string.
168: %
169: In the worst case,
170: it operates in $\complexity\left(\, |s|^n|E|\log|s|^n|Q| \,\right)$ time.
171:
172:
173: %% --------------------------------------------------------------
174:
175: \smallskip
176:
177: This paper is structured as follows.
178: %
179: Basic definitions of weighted $n$-ary relations, $n$-WFSMs, HMMs, and the Viterbi algorithm
180: are recalled in Section~\ref{sec:prelim}.
181: %
Section~\ref{sec:vit-1tape}
adapts the Viterbi algorithm
to the search for the best path in a $1$-WFSM that accepts a given input string,
%
and Section~\ref{sec:vit-ntape} generalizes it
to the search for the best path in an $n$-WFSM that accepts an $n$-tuple of strings.
188: %
189: Section~\ref{sec:align}
190: illustrates our algorithm on a practical example,
191: the alignment of word pairs (i.e., $n\!=\!2$),
and provides test results that show a time complexity slightly higher than
$\complexity\left(\, |s|^2 \,\right)$.
194: %
The above-mentioned classical method for solving our problem
196: is discussed in Section~\ref{sec:alternatives}.
197: %
198: Section~\ref{sec:conclusion}
199: concludes the paper.
200:
201:
202: %% --------------------------------------------------------------
203: %% PRELIMINARIES
204: %% --------------------------------------------------------------
205:
206:
207: \section{Preliminaries
208: \label{sec:prelim}}
209:
210:
211: We recall some definitions about
212: $n$-ary weighted relations and their machines,
213: following the usual definitions for multi-tape
214: automata \cite{elgot+mezei:1965,eilenberg:1974},
215: with semiring weights added just as for acceptors and transducers
216: \cite{kuich+salomaa:1986,mohri+al:1998}.
217: For more details see \cite{kempe+champarnaud+eisner:2004a}.
218: %
219: We also briefly recall Hidden Markov Models and the Viterbi algorithm,
220: and point the reader to
221: \cite{viterbi:1967,rabiner:1990,manning+schuetze:1999}
222: for further details.
223:
224:
225: %% --------------------------------------------------------------
226:
227: \subsection{Weighted $n$-ary relations}
228:
229: A weighted $n$-ary relation is a function from $(\Sigma^*)^n$ to
230: $\srSetK$, for a given finite alphabet $\Sigma$ and a given weight
231: semiring $\srK = \aTuple{\srSetK, \srPlus, \srTimes, \srZero,
232: \srOne}$. A relation assigns a weight to any
233: $n$-tuple of strings. A weight of $\srZero$ can be interpreted as
234: meaning that the tuple is not in the relation.
235: %
We are especially interested in {\it rational} (or {\it regular})
$n$-ary relations, i.e., relations that can be encoded by $n$-tape
weighted finite-state machines, which we now define.
239: %
240:
241:
242: We adopt the convention that variable names referring to $n$-tuples of
243: strings include a superscript $\tapnum{n}$.
244: Thus we write $s\tapnum{n}$ rather than
245: ${\mathop{s}\limits^\rightarrow}$
246: for a tuple of strings $\aTuple{s_1, \dots s_n}$.
247: We also use this convention for the names of
248: objects that contain $n$-tuples of strings,
249: such as $n$-tape machines and their transitions and paths.
250:
251:
252: %% --------------------------------------------------------------
253:
254: \subsection{Multi-tape weighted finite-state machines}
255:
256: An {\it $n$-tape weighted finite-state machine} (WFSM or $n$-WFSM)
257: $A\tapnum{n}$ is defined by a six-tuple
258: $A\tapnum{n} = \aTuple{\Sigma, Q, \srK, E\tapnum{n}, \wgtInit, \wgtFin}$,
259: with
260: $\Sigma$ being a finite alphabet,
261: $Q$ a finite set of states,
262: $\srK\!=\!\aTuple{\srSetK,\srPlus,\srTimes,\srZero,\srOne}$ the
263: semiring of weights,
264: $E\tapnum{n}\!\subseteq ( Q\times (\Sigma^*)^n \times \srSetK \times Q )$
265: a finite set of weighted $n$-tape transitions,
266: $\wgtInit : Q \rightarrow \srSetK$ a function that assigns initial
267: weights to states,
268: and $\wgtFin : Q \rightarrow \srSetK$ a function that assigns final
269: weights to states.
270:
271: Any transition $e\tapnum{n}\!\in\! E\tapnum{n}$ has the form
272: $e\tapnum{n}\!=\!\aTuple{\eSrc,\lab\tapnum{n},w,\eTrg}$.
273: We refer to these four components as the transition's source state
274: $\eSrc(e\tapnum{n})\!\in\!Q$, its label
275: $\lab(e\tapnum{n})\!\in\!(\Sigma^*)^n$, its weight
276: $w(e\tapnum{n})\!\in\!\srSetK$, and its target state
277: $\eTrg(e\tapnum{n})\!\in\!Q$.
278: %
We denote by $E(q)$ the set of outgoing transitions of a state $q\!\in\!Q$
280: ~(with $E(q)\!\subseteq\!E\tapnum{n}$).
281:
282:
283: A {\it path}\/ $\path\tapnum{n}$ of length $k \geq 0$
284: is a sequence of transitions
285: $e_1\tapnum{n} e_2\tapnum{n} \cdots e_k\tapnum{n}$
286: such that $\eTrg(e_i\tapnum{n})\!=\!\eSrc(e_{i+1}\tapnum{n})$
287: for all $i\!\in\!\aRange{1, k\!-\!1}$.
288: %
289: The label of a path is the element-wise concatenation of
290: the labels of its transitions.
291: The weight of a path $\path\tapnum{n}$ is
292: %
293: %\vspace{-1ex}
294: \begin{equation}
295: w(\path\tapnum{n}) \DefAs
296: \wgtInit(\eSrc(e_1\tapnum{n})) \srTimes
297: \left(\srBigTimes_{j\in\aRange{1,k}}
298: \spc{-1ex}w\left(e_j\tapnum{n}\right)\right) \srTimes
299: \wgtFin(\eTrg(e_k\tapnum{n}))
300: \end{equation}
301:
302: %\vspace{-1ex}
303:
304: \noindent
305: The path is said to be {\it successful}, and to {\it accept}
306: its label, if $w(\path\tapnum{n})\neq\srZero$.
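
As a concrete illustration, assume the tropical semiring
$\aTuple{\mathbb{R}\cup\aSet{\infty}, \min, +, \infty, 0}$,
so that $\srTimes$ is ordinary addition and $\srZero\!=\!\infty$.
This choice, like the states $q_0,q_1,q_2$ and all weights below,
is an assumption made for this example only;
the definitions above hold for any semiring.
Consider a path $\path\tapnum{2}\!=\!e_1\tapnum{2}e_2\tapnum{2}$ with
$e_1\tapnum{2}\!=\!\aTuple{q_0,\aTuple{\Word{a},\eps},0.5,q_1}$ and
$e_2\tapnum{2}\!=\!\aTuple{q_1,\aTuple{\Word{b},\Word{c}},1.2,q_2}$,
and let $\wgtInit(q_0)\!=\!0$ and $\wgtFin(q_2)\!=\!0.3$.
Its label is the element-wise concatenation $\aTuple{\Word{ab},\Word{c}}$,
and its weight is
%
\begin{equation*}
w(\path\tapnum{2})
= \wgtInit(q_0) \srTimes w(e_1\tapnum{2}) \srTimes w(e_2\tapnum{2}) \srTimes \wgtFin(q_2)
= 0 + 0.5 + 1.2 + 0.3 = 2.0
\end{equation*}

\noindent
Since $2.0\neq\infty$, the path is successful and accepts $\aTuple{\Word{ab},\Word{c}}$.
If $q_2$ were not final, i.e., $\wgtFin(q_2)\!=\!\srZero\!=\!\infty$,
the weight would be $\infty$ and the path would not be successful.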
307:
308:
309: %% --------------------------------------------------------------
310:
311: \subsection{Hidden Markov Models}
312:
313: A {\it Hidden Markov Model}\/ (HMM) is defined by a five-tuple
314: $\aTuple{\Sigma,Q,\hmIniVec,\hmTraMtx,\hmOutMtx}$, where
315: %
316: $\Sigma\!=\!\aSet{\sigma_k}$ is the output alphabet,
317: $Q\!=\!\aSet{q_i}$ a finite set of states,
318: $\hmIniVec\!=\!\aSet{\hmIniPrb_i}$ a vector of initial state probabilities
319: $\hmIniPrb_i = p(x_1\!=\!q_i) : Q\rightarrow\aRange{0,1}\,$,~
320: $\hmTraMtx\!=\!\aSet{\hmTraPrb_{ij}}$ a matrix of state transition probabilities
321: $\hmTraPrb_{ij} = p(x_t\!=\!q_j | x_{t-1}\!=\!q_i) : Q\!\times\!Q\rightarrow\aRange{0,1}\,$,~
322: and $\hmOutMtx\!=\!\aSet{\hmOutPrb_{jk}}$ a matrix of state emission probabilities
323: $\hmOutPrb_{jk} = p(\hmPthOut_t\!=\!\sigma_k | x_t\!=\!q_j) : Q\!\times\!\Sigma\rightarrow\aRange{0,1}\,$.
324: %
325: A {\it path}\/ of length $T$ in an HMM is a non-observable (i.e., hidden) state sequence
326: $\hmStaSeq = \hmPthSta_1\cdots\hmPthSta_T$,
327: emitting an observable output sequence
328: $\hmOutSeq = \hmPthOut_1\cdots\hmPthOut_T$
329: which is a probabilistic function of $\hmStaSeq$.
330: %
331:
332:
333: %% --------------------------------------------------------------
334:
335: \subsection{Viterbi Algorithm}
336:
337: The {\it Viterbi algorithm}\/
338: finds the most likely path
339: $\widehat\hmStaSeq = \argmax_{\hmStaSeq} p(\hmStaSeq | \hmOutSeq, \mu)$
340: for an observed output sequence $\hmOutSeq$
341: and given model parameters $\mu=\aTuple{\hmIniVec,\hmTraMtx,\hmOutMtx}$,
342: using a trellis similar to that in Figure~\ref{fig:vit-1tape}.
343: %
344: It has a $\complexity(T\, |Q|^2)$ time and a $\complexity(T\, |Q|)$ space complexity.
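
These bounds follow from the standard recursion, recalled here as a sketch
in the notation of \cite{rabiner:1990}.
The symbols $\delta$ and $\psi$ are auxiliary and not used elsewhere in this paper,
and $\hmOutPrb_j(\hmPthOut_t)$ abbreviates $\hmOutPrb_{jk}$ for $\hmPthOut_t\!=\!\sigma_k$.
%
\begin{align*}
\delta_1(j) &= \hmIniPrb_j \, \hmOutPrb_j(\hmPthOut_1) \\
\delta_t(j) &= \max_{i} \; \delta_{t-1}(i) \, \hmTraPrb_{ij} \, \hmOutPrb_j(\hmPthOut_t)
\qquad (2 \leq t \leq T) \\
\psi_t(j) &= \argmax_{i} \; \delta_{t-1}(i) \, \hmTraPrb_{ij}
\qquad (2 \leq t \leq T)
\end{align*}

\noindent
Here $\delta_t(j)$ is the probability of the best state sequence that ends in $q_j$ at time $t$
and emits $\hmPthOut_1\cdots\hmPthOut_t$,
and $\psi_t(j)$ is a back-pointer to the preceding state on that sequence.
The best path ends in the state maximizing $\delta_T(j)$
and is recovered by following the back-pointers.
Each of the $T\,|Q|$ values $\delta_t(j)$ is stored once
and requires a maximum over $|Q|$ predecessors,
which gives the stated space and time bounds.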
345:
346:
347: %% --------------------------------------------------------------
348: %% VITERBI ON 1-TAPE
349: %% --------------------------------------------------------------
350:
351:
352: \section{$1$-Tape Best-Path Search
353: \label{sec:vit-1tape}}
354:
355:
The Viterbi algorithm
\cite{viterbi:1967,rabiner:1990,manning+schuetze:1999}
can easily be adapted to search for the best of all paths of a $1$-WFSM, $A\tapnum{1}$,
that accept a given input string.
%
We use a notation that will facilitate the subsequent generalization
of the algorithm to $n$-tape best-path search (Section~\ref{sec:vit-ntape}).
We explain only the search for the path with minimal weight;
the adaptation to maximal-weight search is straightforward.
365:
366:
367: %% --------------------------------------------------------------
368:
369:
370: \begin{figure}[htb]
371: \begin{center}
372: \includegraphics[scale=0.5,angle=0]{vit-1tape.eps}
373: \caption{Modified trellis for $1$-tape best-path search
374: \label{fig:vit-1tape}}
375: \end{center}
376: %\vspace{-3ex}
377: \end{figure}
378:
379:
380: \subsection{Structures}
381:
382: We use a reading pointer $\vtPtr\in\vtPtrSet=\aSet{0,\ldots|s|}$
383: that is initially positioned before the first letter of the input string $s$,
384: ~$\vtPtr\!=\!0$,
and then advanced as $s$ is read
386: until it reaches the position after the last letter,
387: $\,\vtPtr\!=\!|s|$.
388: %
389: At any moment, $\vtPtr$ equals the length of the prefix of $s$
390: that has already been read.
391:
392:
As is usual for the Viterbi algorithm,
394: we use a trellis $\vtNodeSet\!=Q\times\vtPtrSet$,
395: consisting of nodes $\vtNode\!=\!\aTuple{q,\vtPtr}$
396: which express that a state $q\!\in\!Q$ is reached after reading $\vtPtr$ letters of $s$
397: (Figure~\ref{fig:vit-1tape}).
398: %
399: We divide the trellis into several node sets
400: $\vtNodeSet_{\vtPtr} = \aSet{\vtNode\!=\!\aTuple{q,\vtPtr}} \subseteq \vtNodeSet$,
401: each corresponding to a pointer position $\vtPtr$ or to a column of the trellis.
402: %
403: For each node $\vtNode$,
404: we maintain three variables referring to $\vtNode$'s best prefix:
405: $\vtPreWgt_{\vtNode}$ being its weight,
406: $\vtPreNode_{\vtNode}$ its last node (immediately preceding $\vtNode$), and
407: $\vtPreArc_{\vtNode}$ its last transition $e\!\in\!E$ of $A\tapnum{1}$.
408: %
409: The $\vtPreNode_{\vtNode}$ are back-pointers
410: that fully define the best prefix of each node $\vtNode$.
411: All $\vtPreWgt_{\vtNode}$, $\vtPreNode_{\vtNode}$, and $\vtPreArc_{\vtNode}$
412: are initially undefined ($\,=\!\Null\,$).\footnote{
413: %
414: The variables $\vtPreWgt_{\vtNode}$, $\vtPreNode_{\vtNode}$, and $\vtPreArc_{\vtNode}$
415: can be formally regarded as elements of the vectors
416: $\vtPreWgtArr$, $\vtPreNodeArr$, and $\vtPreArcArr$,
417: respectively, that are indexed by values of $\vtNode$.
In a practical implementation it is, however, more sensible to store these variables
directly on the node that they refer to.
420: %
421: }
422:
423:
424: %% --------------------------------------------------------------
425:
426:
427: \begin{figure}[htb]
428: \begin{center}
429: {\small \input{vit-1tape.pc} }
430: \vspace{-2ex}
431: \caption{Pseudocode of $1$-tape best-path search
432: \label{pc:vit-1tape}}
433: %\vspace{-1ex}
434: \end{center}
435: \end{figure}
436:
437:
438: \subsection{Algorithm}
439:
440: The algorithm \FUNCT{FsaViterbi}{~}
returns, among all paths $\path$ of the $1$-WFSM $A\tapnum{1}$ that accept the string $s$,
the one with minimal weight
443: (Figure~\ref{pc:vit-1tape}).
444: %
445: $A\tapnum{1}$ must not contain any transitions labeled with $\eps$ (the empty string).
446: %
447: At least a partial order must be defined on the semiring of weights.
448: %
449: Nothing else is required concerning the labels, weights, or structure of $A\tapnum{1}$.\footnote{
450: %
For example, cycles are not required to have non-negative weights
(as they are for Dijkstra's algorithm)
because all paths of interest are constrained by the input string.
453: %
454: }
455:
456:
The algorithm starts by creating an initial node set $\vtNodeSet_{\sf initial}=\vtNodeSet_0$
458: for the initial position $\vtPtr=0$ of the reading pointer.
459: The set $\vtNodeSet_{\sf initial}$ contains a node for each initial state of $A\tapnum{1}$
460: (Lines~\ref{pc1:L101}--\ref{pc1:L104}).
461: %
462: The prefix weights $\vtPreWgt_\vtNode$ of these nodes are set to the initial weight $\wgtInit(q)$
463: of the respective states $q$.
464: %
465: The set of node sets $\vtNodeSetSet$ contains only $\vtNodeSet_{\sf initial}$ at this point
466: (Line~\ref{pc1:L105}).
467:
468:
In the subsequent iteration (Lines~\ref{pc1:L201}--\ref{pc1:L215}),
ranging from the first to the last-but-one pointer position, $\vtPtr=0,\ldots|s|\!-\!1$,
we inspect all outgoing transitions $e\!\in\!E(q)$
of all states $q\!\in\!Q$
for which there is a node $\vtNode\!=\!\aTuple{q,\vtPtr}$ in $\vtNodeSet_{\vtPtr}$.
%
If the label $\lab(e)$ of $e$ matches $s$ at position $\vtPtr$,
476: we create a new node $\vtNode^\prime=\aTuple{\eTrg(e),{\vtPtr^\prime}}$
477: for the target $\eTrg(e)$ of $e$ (Line~\ref{pc1:L204}).
478: Its prefix weight $\vtPreWgt^\prime$ equals the current node's weight $\vtPreWgt_{\vtNode}$
479: multiplied by the weight $w(e)$ of $e$.
480: %
481: The node set $\vtNodeSet_{\vtPtr^\prime}$ for the new $\vtNode^\prime$
482: is created and inserted into the set of node sets $\vtNodeSetSet$
483: ~(if it does not exist yet; Line~\ref{pc1:L211}).
484: %
485: Then $\vtNode^\prime$ is inserted into $\vtNodeSet_{\vtPtr^\prime}$
486: ~(if it is not yet a member of it; Line~\ref{pc1:L213}).
487: %
488: If the prefix weight of $\vtNode^\prime$ is still undefined,
489: $\vtPreWgt_{\vtNode^\prime}=\Null$
490: ~(because no prefix of $\vtNode^\prime$ has been analyzed yet),
491: or if it is higher than the weight of the currently analyzed new prefix,
492: $\vtPreWgt_{\vtNode^\prime} > \vtPreWgt^\prime$,
493: then the variables $\vtPreWgt_{\vtNode^\prime}$, $\vtPreNode_{\vtNode^\prime}$,
494: and $\vtPreArc_{\vtNode^\prime}$ of $\vtNode^\prime$
495: are assigned values of the new prefix (Lines~\ref{pc1:L214}--\ref{pc1:L215}).
496:
497:
498: The algorithm terminates by selecting the node $\widehat\vtNode$,
499: corresponding to the path with the minimal weight,
500: from the final node set $\vtNodeSet_{\sf final}=\vtNodeSet_{|s|}$.
501: This weight is the product of the node's prefix weight $\vtPreWgt_{\vtNode}$
502: and the final weight $\wgtFin(q)$ of the corresponding state $q\!\in\!Q$
503: (Line~\ref{pc1:L301}).
504: %
505: The function \FCT{getPath}{~} identifies the best path $\path$
506: by following all back-pointers $\vtPreNode_{\vtNode}$,
507: from the node $\widehat\vtNode\in\vtNodeSet_{\sf final}$
508: to some node $\vtNode\in\vtNodeSet_{\sf initial}$,
509: and collecting all transitions $e\!=\!\vtPreArc_{\vtNode}$ it encounters.
510: %
511: Finally, $\path$ is returned.
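
The following trace is a small worked example.
The automaton, the tropical semiring
($\srTimes\!=\!+$, $\srZero\!=\!\infty$, $\srOne\!=\!0$),
and all weights are assumptions chosen for illustration.
Let $s\!=\!\Word{ab}$, $Q\!=\!\aSet{q_0,q_1}$,
$\wgtInit(q_0)\!=\!0$, $\wgtInit(q_1)\!=\!\infty$,
$\wgtFin(q_0)\!=\!\infty$, $\wgtFin(q_1)\!=\!0$,
and let $E$ consist of
$e_1\!=\!\aTuple{q_0,\Word{a},1,q_0}$,
$e_2\!=\!\aTuple{q_0,\Word{a},3,q_1}$,
$e_3\!=\!\aTuple{q_0,\Word{b},4,q_1}$, and
$e_4\!=\!\aTuple{q_1,\Word{b},1,q_1}$.
We assume that nodes and transitions are inspected in the order shown.

\begin{center}
\begin{tabular}{llll}
\hline
$\vtPtr$ & node, transition & candidate weight & effect \\
\hline
$0$ & $\aTuple{q_0,0}$, $e_1$ & $0+1=1$ & create $\aTuple{q_0,1}$ with weight $1$ \\
$0$ & $\aTuple{q_0,0}$, $e_2$ & $0+3=3$ & create $\aTuple{q_1,1}$ with weight $3$ \\
$1$ & $\aTuple{q_0,1}$, $e_3$ & $1+4=5$ & create $\aTuple{q_1,2}$ with weight $5$ \\
$1$ & $\aTuple{q_1,1}$, $e_4$ & $3+1=4$ & $4<5$: update $\aTuple{q_1,2}$ to weight $4$ \\
\hline
\end{tabular}
\end{center}

\noindent
Transitions whose label does not match $s$ at the current position,
such as $e_3$ at $\vtPtr\!=\!0$, are skipped.
The final node set contains only $\aTuple{q_1,2}$, with total weight $4+\wgtFin(q_1)=4$.
Following the back-pointers from $\aTuple{q_1,2}$ via $\aTuple{q_1,1}$ to $\aTuple{q_0,0}$
yields the best path $e_2\,e_4$,
which beats the only other accepting path, $e_1\,e_3$, of weight $5$.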
512:
513:
514: %% --------------------------------------------------------------
515:
516: \subsection{$\eps$-Transitions}
517:
518: The algorithm can be extended to allow for $\eps$-transitions (but not for $\eps$-cycles).
519: The source and target node, $\vtNode$ and $\vtNode^\prime$, of an $\eps$-transition
520: would be in the same $\vtNodeSet_{\vtPtr}$.
521: %
522: If $\vtNode^\prime\!=\!\aTuple{q^\prime, p^\prime}$ is actually inserted into $\vtNodeSet_{\vtPtr}$
523: (Line~\ref{pc1:L213})
524: or if its variables $\vtPreWgt_{\vtNode^\prime}$, $\vtPreNode_{\vtNode^\prime}$,
525: and $\vtPreArc_{\vtNode^\prime}$ change their values
526: (Lines~\ref{pc1:L214}--\ref{pc1:L215}),
527: then we have to (re-)``include'' $\vtNode^\prime$ into the iteration over all nodes
528: of the currently inspected $\vtNodeSet_{\vtPtr}$
529: (Line~\ref{pc1:L204}).
530: %
531: The algorithm will still terminate
532: since there can be only finite sequences of $\eps$-transitions
533: (as long as we have no $\eps$-cycles).
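
For instance, with illustrative weights in the tropical semiring (an assumption of this example),
suppose that the nodes $\aTuple{q_0,1}$ and $\aTuple{q_1,1}$ of $\vtNodeSet_1$
currently have the prefix weights $1$ and $3$,
and that $e\!=\!\aTuple{q_0,\eps,0.5,q_1}$ is an $\eps$-transition.
Inspecting $e$ from $\aTuple{q_0,1}$ gives
%
\begin{equation*}
\vtPreWgt^\prime = \vtPreWgt_{\aTuple{q_0,1}} \srTimes w(e) = 1 + 0.5 = 1.5
< 3 = \vtPreWgt_{\aTuple{q_1,1}}
\end{equation*}

\noindent
so the variables of $\aTuple{q_1,1}$ are updated.
If $\aTuple{q_1,1}$ had already been inspected with the weight $3$,
its outgoing transitions must be inspected again,
because every candidate weight derived from it is now lower by $1.5$.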
534:
535:
536: %% --------------------------------------------------------------
537:
538: \subsection{Best transduction}
539:
540: The algorithm \FUNCT{FsaViterbi}{~}
541: can be used for compiling the best transduction of a given input string $s$
542: by a $2$-WFSM (weighted transducer).
543: For this, we identify the best path $\path$ accepting $s$ on its input tape
544: and take the label of $\path$'s output tape as best output string $v$.
545:
546:
547: %% --------------------------------------------------------------
548: %% VITERBI ON n-TAPE
549: %% --------------------------------------------------------------
550:
551:
552: \section{$n$-Tape Best-Path Search
553: \label{sec:vit-ntape}}
554:
555:
We now come to the central topic of this paper:
the generalization of the Viterbi algorithm
to search for the best of all paths of an $n$-WFSM, $A\tapnum{n}$,
that accept a given $n$-tuple of input strings,
$s\tapnum{n}\!=\!\aTuple{s_1,\ldots s_n}$.
%
This requires relatively few modifications to the structures and algorithm
explained above (Section~\ref{sec:vit-1tape}).
564:
565:
566: %% --------------------------------------------------------------
567:
568: \subsection{Structures}
569:
The main difference with respect to the previous structures is that
our reading pointer is now a vector of $n$ natural numbers,
572: $
573: \vtPtr\tapnum{n}\!=\!\aTuple{\vtPtr_1,\ldots\vtPtr_n} \in
574: \left(\vspc{2ex} \aRange{0,\dots|s_1|} \times\ldots
575: \times \aRange{0,\dots|s_n|} \;\right)
576: \subset \Nat^n
577: $.
578: The pointer is initially positioned before the first letter
579: of each $s_i$ ~($\forall i\!\in\!\aRange{1,n}$),
580: ~$\,\vtPtr\tapnum{n}\!=\!\aTuple{0,\ldots 0}\,$.
Its elements $\vtPtr_i$ are then increased according to the non-synchronized reading of the $s_i$
582: on the tapes $i$ ~($\forall i\!\in\!\aRange{1,n}$),
583: until the pointer reaches its final position after the last letter of each $s_i$,
584: ~$\,\vtPtr\tapnum{n}\!=\!\aTuple{|s_1|,\ldots |s_n|}\,$.
585:
586: More precisely, a pointer is an element of the monoid $\aTuple{\Nat^n, +, {\bf 0}}$
587: with $+$ being vector addition and ${\bf 0}$ the vector of $n$ $0$'s.
588: %
We define a partial order on pointers.
590: Let $ \vtLess\; : \Nat^n\!\times\!\Nat^n \rightarrow \aSet{\srTrue, \srFalse} $.
591: Let $ a,b\in\Nat^n $,
592: then $ a \vtLess b \biimplies \left(\; \exists c\in\Nat^n, c\not={\bf 0} : a+c=b \right)\; $.
593: We say $a$ {\it precedes} $b$.
594: %
595: It holds that $ a \vtLess b \implies \left(\; \sum_{i=1}^n a_i < \sum_{i=1}^n b_i \right)\; $
596: where $a_i$ and $b_i$ are the vector elements.
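
A few instances for $n\!=\!2$ may help; they follow directly from the definition:
%
\begin{align*}
\aTuple{1,2} &\vtLess \aTuple{2,2}
  && \text{since } \aTuple{1,2}+\aTuple{1,0}=\aTuple{2,2} \\
\aTuple{1,2} &\not\vtLess \aTuple{2,1}
  && \text{since no } c\in\Nat^2 \text{ satisfies } 2+c_2=1 \\
\aTuple{2,1} &\not\vtLess \aTuple{1,2}
  && \text{since no } c\in\Nat^2 \text{ satisfies } 2+c_1=1
\end{align*}

\noindent
The pointers $\aTuple{1,2}$ and $\aTuple{2,1}$ are thus incomparable,
which is why the order is only partial.
The converse of the implication above does not hold:
$\aTuple{0,3}$ has the element sum $3<4$, yet $\aTuple{0,3}\not\vtLess\aTuple{2,2}$.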
597:
598:
599: % ------------------------
600:
In the trellis (Figure~\ref{fig:vit-ntape})
we still have one node set $\vtNodeSet_{\vtPtr\tapnum{n}}$
per pointer position $\vtPtr\tapnum{n}$,
a single initial node set $\vtNodeSet_{\sf initial}\!=\!\vtNodeSet_{\aTuple{0,\dots 0}}$
and a single final node set $\vtNodeSet_{\sf final}\!=\!\vtNodeSet_{\aTuple{|s_1|,\dots|s_n|}}$.
There are, however, several node sets in parallel between the two
607: (corresponding to pointers $\vtPtr\tapnum{n},{\vtPtr^\prime}\tapnum{n}$
608: not preceding each other, i.e.,
609: $\vtPtr\tapnum{n}\!\not\vtLess\!{\vtPtr^\prime}\tapnum{n} \logAnd
610: {\vtPtr^\prime}\tapnum{n}\!\not\vtLess\!\vtPtr\tapnum{n}$).
611:
612:
613: \begin{figure}[ht]
614: \begin{center}
615: \includegraphics[scale=0.5,angle=0]{vit-ntape.eps}
616: \caption{Modified trellis for $n$-tape best-path search
617: \label{fig:vit-ntape}}
618: \end{center}
619: %\vspace{-3ex}
620: \end{figure}
621:
622:
623: %% --------------------------------------------------------------
624:
625: \subsection{Algorithm}
626:
627: The algorithm \FUNCT{FsmViterbi}{~}
returns, among all paths $\path\tapnum{n}$ of the $n$-WFSM $A\tapnum{n}$
that accept the string tuple $s\tapnum{n}$, the one with minimal weight
630: (Figure~\ref{pc:vit-ntape}).
631: $A\tapnum{n}$ must not contain any transitions labeled with $\aTuple{\eps,\ldots\eps}$.\footnote{
632: %
633: The algorithm can be extended to allow for $\aTuple{\eps,\ldots\eps}$-transitions
634: (but not for $\aTuple{\eps,\ldots\eps}$-cycles)
635: as described in Section~\ref{sec:vit-1tape}.
636: %
637: }
638:
639:
640: The initial node set $\vtNodeSet_{\sf initial}\!=\!\vtNodeSet_{\aTuple{0,\dots 0}}$
641: is created as before, and inserted into the set of node sets $\vtNodeSetSet$
642: (Lines~\ref{pc2:L101}--\ref{pc2:L105}).
643: In addition, it is inserted into a Fibonacci heap\footnote{
644: % -----------------------
645: Alternatively, one could use a binary heap.
646: Tests on a concrete example have, however, shown that the algorithm performs slightly better
647: with a Fibonacci heap
648: (Table~\ref{tab:AlignResults}).
649: % -----------------------
650: }
651: $\vtNodeSetHeap$
652: ~(Line~\ref{pc2:L105})
653: \cite{fredman+tarjan:1987}.
654: This heap contains node sets $\vtNodeSet_{\vtPtr\tapnum{n}}$
655: that have not yet been processed,
656: and uses $\sum_{i=1}^n \vtPtr_i$ as sorting key.
657:
658:
659: The subsequent iteration continues as long as $\vtNodeSetHeap$ is not empty
660: (Lines~\ref{pc2:L201}--\ref{pc2:L215}).
661: %
662: The function \FCT{extractMinElement}{~}
663: extracts the (or a) minimal element $\vtNodeSet_{\vtPtr\tapnum{n}}$ from $\vtNodeSetHeap$
664: ~(Line~\ref{pc2:L202}).
665: Due to our sorting key,
666: none of the remaining $\vtNodeSet_{{\vtPtr^\prime}\tapnum{n}}$ in $\vtNodeSetHeap$
667: is a predecessor to $\vtNodeSet_{\vtPtr\tapnum{n}}$~:~
668: $
669: \forall\vtNodeSet_{{\vtPtr^\prime}\tapnum{n}}\!\in\!\vtNodeSetHeap \,,\;
670: {\vtPtr^\prime}\tapnum{n}\!\not\vtLess\!\vtPtr\tapnum{n}
671: $.
This property ensures that no suffixes are computed
for a $\vtNodeSet_{\vtPtr\tapnum{n}}$ that still has unanalyzed prefixes
(which could lead to wrong choices).
675: %
676: The extracted $\vtNodeSet_{\vtPtr\tapnum{n}}$ is
677: handled almost as in the previous algorithm (Figure~\ref{pc:vit-1tape}).
678: %
Transition labels $\lab(e\tapnum{n})$ are required to match a factor of $s\tapnum{n}$
680: at position $\vtPtr\tapnum{n}$
681: (Line~\ref{pc2:L206}).
682: %
683: New $\vtNodeSet_{{\vtPtr^\prime}\tapnum{n}}$ are inserted both into $\vtNodeSetSet$
684: and $\vtNodeSetHeap$
685: ~(Lines~\ref{pc2:L210}--\ref{pc2:L211}).
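
As a sketch of the processing order, take $n\!=\!2$ and
$s\tapnum{2}\!=\!\aTuple{\Word{ab},\Word{c}}$ (an input chosen for illustration).
There are at most $(2\!+\!1)(1\!+\!1)=6$ pointer positions,
and the sorting key $\sum_{i=1}^n \vtPtr_i$ groups them as follows:
%
\begin{center}
\begin{tabular}{cl}
\hline
key & pointer positions $\vtPtr\tapnum{2}$ \\
\hline
$0$ & $\aTuple{0,0}$ \\
$1$ & $\aTuple{1,0}$, $\aTuple{0,1}$ \\
$2$ & $\aTuple{2,0}$, $\aTuple{1,1}$ \\
$3$ & $\aTuple{2,1}$ \\
\hline
\end{tabular}
\end{center}

\noindent
Node sets with equal keys may be extracted in any order,
which is harmless because pointers with equal element sums cannot precede each other.
We read the matching condition as follows (the pseudocode remains authoritative on the details):
a transition labeled $\aTuple{\Word{a},\eps}$ matches at $\aTuple{0,0}$
and leads to a node in $\vtNodeSet_{\aTuple{1,0}}$,
whereas a transition labeled $\aTuple{\Word{b},\Word{c}}$ matches at $\aTuple{1,0}$
and leads to a node in $\vtNodeSet_{\aTuple{2,1}}$;
on each tape, the pointer advances by the length of the corresponding label component.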
686:
687:
688: \begin{figure}[ht]
689: \begin{center}
690: {\small \input{vit-ntape.pc} }
691: \vspace{-2ex}
692: \caption{Pseudocode of $n$-tape best-path search
693: \label{pc:vit-ntape}}
694: \end{center}
695: \end{figure}
696:
697:
698: %% --------------------------------------------------------------
699:
700: \subsection{Best transduction}
701:
702: The algorithm \FUNCT{FsmViterbi}{~}
703: can be used for obtaining from a weighted $(n\!+\!m)$-WFSM (transducer)
704: with $n$ input and $m$ output tapes,
705: the best transduction of a given input $n$-tuple $s\tapnum{n}$.
706: For this, we identify the best path $\path\tapnum{n\!+\!m}$
707: accepting $s\tapnum{n}$ on its $n$ input tapes
708: and take the label of $\path$'s $m$ output tapes as best output $m$-tuple $v\tapnum{m}$.
709: Input and output tapes can be in any order.
710:
711:
712: %% --------------------------------------------------------------
713:
714: \subsection{Complexity}
715:
716: The trellis (Figure~\ref{fig:vit-ntape})
717: consists of at most $|\vtPtrSet|=\prod_{i=1}^{n}(|s_i|+1)$
718: node sets $\vtNodeSet_{\vtPtr\tapnum{n}}\!\in\!\vtNodeSetSet$.
719: Assuming approximately equal length $|s|$ for all $s_i$ of $s\tapnum{n}$,
720: we can simplify: $|\vtPtrSet|\approx(|s|+1)^n$.
721: %
722: For each node set $\vtNodeSet_{\vtPtr\tapnum{n}}$
723: we have to create at most $|Q|$ nodes $\vtNode\!\in\!\vtNodeSet_{\vtPtr\tapnum{n}}$,
724: which leads to a $\complexity\left( |s|^n |Q| \right)$ space complexity
725: for our algorithm.
726:
Each $\vtNodeSet_{\vtPtr\tapnum{n}}$ is extracted once from the Fibonacci heap $\vtNodeSetHeap$
in $\complexity(\log|\vtPtrSet|)$ time.
For each $\vtNodeSet_{\vtPtr\tapnum{n}}$ we analyze at most $|E|$ transitions $e\!\in\!E$
of $A\tapnum{n}$.
For the target of each $e$ we find a $\vtNodeSet_{{\vtPtr^\prime}\tapnum{n}}\!\in\!\vtNodeSetSet$
in $\complexity(\log|\vtPtrSet|)$ time
and a node $\vtNode^\prime\!\in\!\vtNodeSet_{{\vtPtr^\prime}\tapnum{n}}$
in $\complexity(\log|Q|)$ time.
%
Thus, \FUNCT{FsmViterbi}{~} has a worst-case overall time complexity of
$
\complexity\left(\; |\vtPtrSet| (\log|\vtPtrSet| + |E| (\log|\vtPtrSet| + \log|Q|)) \;\right)
= \complexity\left(\, |\vtPtrSet||E|\log|\vtPtrSet||Q| \,\right)
= \complexity\left(\, |s|^n|E|\log|s|^n|Q| \,\right)
$~.
742:
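The simplification in the last step can be spelled out as follows,
assuming $|E|\geq 1$ and $|Q|\geq 1$:
%
\begin{align*}
|\vtPtrSet| \left( \log|\vtPtrSet| + |E| \left( \log|\vtPtrSet| + \log|Q| \right) \right)
&= |\vtPtrSet| \log|\vtPtrSet| + |\vtPtrSet|\,|E| \log\left(|\vtPtrSet|\,|Q|\right) \\
&\leq 2\, |\vtPtrSet|\,|E| \log\left(|\vtPtrSet|\,|Q|\right)
\end{align*}

\noindent
since $\log|\vtPtrSet| \leq |E| \log(|\vtPtrSet|\,|Q|)$ under these assumptions.
As a purely illustrative instance (the figures are not measurements),
$n\!=\!2$, $|s_1|\!=\!|s_2|\!=\!10$, and $|Q|\!=\!5$ give
$|\vtPtrSet|=11^2=121$ node sets and at most $121\cdot 5=605$ nodes.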
743:
744: An HMM has exactly one transition per state pair, so that $|E|\!=\!|Q|^2$,
745: and an arity of $n\!=\!1$.
There would also never be more than one $\vtNodeSet_{\vtPtr\tapnum{n}}$ on the heap,
747: extractable in constant time.
748: In this case, our algorithm has a $\complexity\left( |s| |Q| \right)$ space
749: and a $\complexity\left( |s| |Q|^2 \right)$ time complexity,
750: as has the classical version of the Viterbi algorithm
751: (Section~\ref{sec:prelim}).
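
Spelled out, and assuming in addition that node sets and nodes are found in constant time
(e.g., by direct indexing; this assumption is ours and is not stated above),
each of the $|s|+1$ node sets analyzes at most $|E|$ transitions at constant cost each, so that
%
\begin{equation*}
\complexity\left( (|s|+1) \cdot |E| \right) = \complexity\left( |s|\,|Q|^2 \right)
\qquad \text{with } |E|=|Q|^2
\end{equation*}

\noindent
and the trellis holds at most $(|s|+1)\,|Q|$ nodes,
i.e., $\complexity\left( |s|\,|Q| \right)$ space.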
752:
753:
754: %% --------------------------------------------------------------
%% EXAMPLE: WORD ALIGNMENT
756: %% --------------------------------------------------------------
757:
758:
759: \section{Example: Word Alignment
760: \label{sec:align}}
761:
762:
In this section we illustrate our $n$-tape best-path search on a practical example:
764: the alignment of word pairs.
765:
Suppose we want to create a (non-weighted) transducer, $D\tapnum{2}$,
767: from a list of word pairs $s\tapnum{2}$
768: of the form $\aTuple{\WordD{inflected form}, \WordD{lemma}}$,
769: e.g., $\aTuple{\Word{swum}, \Word{swim}}$,
770: such that each path of the transducer is labeled with one of the pairs.
771: %
772: We want to use only transition labels of the form
773: $\aTuple{\sigma,\sigma}$, $\aTuple{\sigma,\eps}$, or $\aTuple{\eps,\sigma}$ ~($\forall\sigma\in\Sigma$),
774: while keeping paths as short as possible.
775: For example,
776: $\aTuple{\Word{swum}, \Word{swim}}$ should be encoded either by the sequence
777: $\aTuple{\Word{s},\Word{s}}\aTuple{\Word{w},\Word{w}}%
778: \aTuple{\Word{u},\eps}\aTuple{\eps,\Word{i}}\aTuple{\Word{m},\Word{m}}$
779: or by
780: $\aTuple{\Word{s},\Word{s}}\aTuple{\Word{w},\Word{w}}%
781: \aTuple{\eps,\Word{i}}\aTuple{\Word{u},\eps}\aTuple{\Word{m},\Word{m}}$,
782: rather than by the ill-formed
783: $\aTuple{\Word{s},\Word{s}}\aTuple{\Word{w},\Word{w}}%
784: \aTuple{\Word{u},\Word{i}}\aTuple{\Word{m},\Word{m}}$,
785: or the sub-optimal %\linebreak
786: $\aTuple{\Word{s},\eps}\aTuple{\Word{w},\eps}\aTuple{\Word{u},\eps}\aTuple{\Word{m},\eps}%
787: \aTuple{\eps,\Word{s}}\aTuple{\eps,\Word{w}}\aTuple{\eps,\Word{i}}\aTuple{\eps,\Word{m}}$.
788: %
To achieve this, we align each word pair on the basis of minimal edit distance.
790:
791:
792: %% --------------------------------------------------------------
793:
794: \subsection{Standard solution with edit distance matrix}
795:
A well-known standard solution for word alignment is based on edit distance,
which is a string similarity measure
798: defined as the minimum cost needed to convert one string into another
799: \cite{wagner+fischer:1974,pirkola+al:2003}.
800: %%++ For the sake of space, we only briefly recall this method.
801:
802:
803: For two words, $a\!=\!a_1\ldots a_n$ and $b\!=\!b_1\ldots b_m$,
804: the edit distance can be compiled with a matrix
805: $X\!=\!\{x_{i,j}\}$ ~($i\!\in\!\aRange{0,n}$, $j\!\in\!\aRange{0,m}$)
806: (Figures~\ref{fig:EditDistanceMatrix} and~\ref{pc:EditDistanceMatrix}).
807: %
808: A horizontal move in $X$ at a cost $c_I$ expresses an {\it insertion}\/,
809: a vertical move at a cost $c_D$ a {\it deletion}\/,
810: and a diagonal move at a cost $c_S$ a {\it substitution}\/ if $a_i\!\not=\!b_j$
811: or no edit operation if $a_i\!=\!b_j$.
812: %
813: We set $c_I\!=\!c_D\!=\!1$,
814: $c_S\!=\!\infty$ for $a_i\!\not=\!b_j$ (to disable substitutions),
815: and $c_S\!=\!0$ for $a_i\!=\!b_j$.
816: %
817: The element $x_{0,0}$ is set to $0$ and all other $x_{i,j}$ to
818: $\min(x_{i,j-1}+c_I\,,\; x_{i-1,j}+c_D\,,\; x_{i-1,j-1}+c_S)$,
819: insofar as these choices are available,
820: proceeding top-down and left-to-right.
821: %
822: The choices made to go from $x_{0,0}$ to $x_{n,m}$ describe the set of paths with (the same) minimal cost.
823: Each of these paths defines a sequence of edit operations for transforming $a$ into $b$.
824:
825:
The algorithm has $\complexity(|a||b|)$ time and space complexity.
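
\noindent
For illustration, the following matrix is a hand-computed sketch
for $a\!=\!\Word{swum}$ and $b\!=\!\Word{swim}$ under the costs chosen above.
It is not taken from Figure~\ref{fig:EditDistanceMatrix},
and the layout (rows indexed by $i$ over the symbols of $a$,
columns indexed by $j$ over the symbols of $b$) is an assumption made here:
\[
\begin{array}{c|ccccc}
         & \eps & \Word{s} & \Word{w} & \Word{i} & \Word{m} \\ \hline
\eps     & 0 & 1 & 2 & 3 & 4 \\
\Word{s} & 1 & 0 & 1 & 2 & 3 \\
\Word{w} & 2 & 1 & 0 & 1 & 2 \\
\Word{u} & 3 & 2 & 1 & 2 & 3 \\
\Word{m} & 4 & 3 & 2 & 3 & 2
\end{array}
\]
For instance,
$x_{3,3} = \min(x_{3,2}+c_I\,,\; x_{2,3}+c_D) = \min(1+1\,,\; 1+1) = 2$,
the diagonal move being disabled because $\Word{u}\!\not=\!\Word{i}$,
and $x_{4,4} = x_{3,3} + 0 = 2$.
The two ways of reaching $x_{3,3}$ from $x_{2,2}$
correspond to the two minimal-cost alignments of this word pair.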
827:
828:
829: \begin{figure}[ht]
830: \vspace{1ex}
831: \begin{center}
832: \begin{minipage}{65mm}
833: \begin{center}
834: \includegraphics[scale=0.5,angle=0]{edit-matrix.eps}
835:
836: \caption{Edit distance matrix
837: $X\!=\!\{x_{i,j}\}$
838: (choices are indicated by arrows; minimum cost paths by thick arrows and circles)
839: \label{fig:EditDistanceMatrix}}
840: \end{center}
841: \end{minipage}
842: %
843: \spc{10mm}
844: %
845: \begin{minipage}{60mm}
846: \begin{center}
847: {\small
848: \input{align.pc}
849: }
850:
851: \vspace{-3ex}
\caption{Pseudocode for compiling an {\it edit distance matrix}
853: \label{pc:EditDistanceMatrix}}
854: \end{center}
855: \end{minipage}
856: \end{center}
857: \end{figure}
858:
859:
860: %% --------------------------------------------------------------
861:
862: %\newpage
863:
864: \subsection{Solution with 2-tape best path search}
865:
866: Alternatively, word alignment can be performed by best path search on an $n$-WFSM,
867: such as $A\tapnum{5}$ generated from the expression
868: \cite{isabelle+kempe:2004}
869: %
870: %%++ \begin{equation}
871: %%++ \scalebox{0.85 1.0}{
872: \begin{eqnarray}
873: A\tapnum{5} & \;=\; & \left(\;
874: \aTuple{\aTuple{\Any,\Any,\Any,\Any,\algnK}_{\aSet{1=2=3=4}} , 0}
875: \right. \nonumber \\
876: & & \left. \spc{5ex}
877: \;\cup\; \aTuple{\aTuple{\eps,\Any,\algnEps,\Any,\algnI}_{\aSet{2=4}} , 1}
878: \;\cup\; \aTuple{\aTuple{\Any,\eps,\Any,\algnEps,\algnD}_{\aSet{1=3}} , 1} \;\right)^*
879: \label{eq:align}
880: \end{eqnarray}
881: %%++ }
882: %%++ \end{equation}
883:
884: \noindent
885: where $\Any$~ can be instantiated by any symbol $\sigma\!\in\!\Sigma$,
886: $\algnEps$ is a special symbol representing $\eps$ in an alignment,
887: $\aSet{1\!=\!2\!=\!3\!=\!4}$ a constraint
888: requiring the $\Any$'s on tapes $1$ to $4$ to be instantiated by the same symbol
889: \cite{nicart+al:2006a},\footnote{
890: %
891: Roughly following \cite{kempe+champarnaud+eisner:2004a},
892: we employ here a simpler notation for constraints than in~\cite{nicart+al:2006a}.
893: %
894: }
895: and $0$ and $1$ are weights over the semiring $\aTuple{\srSetN\cup\aSet{\infty}, \min, +, \infty, 0}$.
896:
Input word pairs $s\tapnum{2}\!=\!\aTuple{s_1,s_2}$ will be matched on tapes 1 and 2,
and aligned output word pairs generated from tapes 3 and 4.
%
A symbol pair $\aTuple{\Any,\Any}$ read on tapes 1 and 2
is identically mapped to $\aTuple{\Any,\Any}$ on tapes 3 and 4,
902: a $\aTuple{\eps,\Any}$ is mapped to $\aTuple{\algnEps,\Any}$,
903: and a $\aTuple{\Any,\eps}$ to $\aTuple{\Any,\algnEps}$.
904: %
$A\tapnum{5}$ will introduce $\algnEps$'s in $s_1$ (resp. in $s_2$) at positions
where $D\tapnum{2}$ shall have $\aTuple{\eps,\sigma}$-
(resp. $\aTuple{\sigma,\eps}$-) transitions.
%
(Later, we simply replace all $\algnEps$ in $D\tapnum{2}$ by $\eps$.)
910:
911:
Thus, we obtain the set of all possible alignments between $s_1$ and $s_2$.
The best alignment is the one with the lowest weight.
%
For example, $\aTuple{\Word{swum}, \Word{swim}}$ is mapped to a set of alignments,
including the two best ones,
$\aTuple{\Word{sw\algnEps um}, \Word{swi\algnEps m}}$
and $\aTuple{\Word{swu\algnEps m}, \Word{sw\algnEps im}}$, both with weight 2.
919: %
920: The (or a) best alignment can be found without generating all alignments,
921: by means of our $n$-tape best path search (with $n\!=\!2$).
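
\noindent
As a small worked check, using only the weights of Equation~\eqref{eq:align}
(the shorthand $w(\cdot)$ for the weight of the path producing an alignment
is introduced here for illustration only),
the weight of a path is the sum of its transition weights:
\begin{align*}
w\bigl(\aTuple{\Word{swu\algnEps m}, \Word{sw\algnEps im}}\bigr)
  &= 0 + 0 + 1 + 1 + 0 \;=\; 2 ~, \\
w\bigl(\aTuple{\Word{swum\algnEps\algnEps\algnEps\algnEps},
               \Word{\algnEps\algnEps\algnEps\algnEps swim}}\bigr)
  &= 4\cdot 1 + 4\cdot 1 \;=\; 8 ~.
\end{align*}
The second alignment deletes all four symbols of $s_1$ and then inserts all four symbols of $s_2$.
Under the $\min$ operation of the semiring, the first alignment is therefore preferred.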
922:
923:
So far, we have not used tape 5.
925: It can serve for excluding certain paths.
926: For example, joining $A\tapnum{5}$ on tape 5 with $C\tapnum{1}$
927: %\cite{kempe+champarnaud+eisner:2004a} %% REPLACE THIS REFERENCE IN THE FINAL VERSION
928: \cite{kempe+al:2005a,kempe+al:2005b}
929: built from the expression $\neg(\Any^*\;\algnI\;\algnD\;\Any^*)$,
which prohibits an insertion ($\algnI$) from being immediately followed by a deletion ($\algnD$),
931: would leave only $\aTuple{\Word{swu\algnEps m}, \Word{sw\algnEps im}}$ as a best path.
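
\noindent
To make this concrete, the tape-5 label sequences of the two best paths
can be read off Equation~\eqref{eq:align} (a sketch worked out by hand):
\begin{align*}
\aTuple{\Word{swu\algnEps m}, \Word{sw\algnEps im}} &:\quad
  \algnK\;\algnK\;\algnD\;\algnI\;\algnK ~, \\
\aTuple{\Word{sw\algnEps um}, \Word{swi\algnEps m}} &:\quad
  \algnK\;\algnK\;\algnI\;\algnD\;\algnK ~.
\end{align*}
Only the second sequence contains the factor $\algnI\;\algnD$,
so only the second path is removed by the join with $C\tapnum{1}$.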
932:
933:
934: %%++ Our algorithm operates on this example
935: %%++ with a $\complexity(\,|s_1||s_2|\log|s_1||s_2|\,)$ time
936: %%++ and a $\complexity(\,|s_1||s_2|\,)$ space complexity.
937:
938: The 5-WFSM from Equation~\eqref{eq:align}
939: has 1 state and 3 transitions.
940: Input is read on 2 tapes.
941: Our algorithm works on this example
942: with a worst-case time complexity of
943: $
944: \complexity(\, |s_1||s_2|\cdot 3\cdot \log(|s_1||s_2|\cdot 1) \,)
945: = \complexity(\,|s_1||s_2|\log|s_1||s_2|\,)
946: $
947: and a worst-case space complexity of
948: $
949: \complexity(\,|s_1||s_2|\cdot 1 \,)
950: = \complexity(\,|s_1||s_2|\,)
951: $~.
952:
953:
954: %% --------------------------------------------------------------
955:
956:
957: \subsection{Test results}
958:
959: We tested our $n$-tape best-path algorithm on the alignment of the German word pair
960: $\aTuple{\Word{gemacht}, \Word{machen}}$ ~(English: $\aTuple{\WordD{done}, \WordD{do}}$),
961: leading to % \linebreak
962: $\aTuple{\Word{gemacht\algnEps\algnEps}, \Word{\algnEps\algnEps{mach}\algnEps{en}}}$.
963: %
964: We repeated this test for the word pairs $\aTuple{s_1^r, s_2^r}$
with $s_1=$``\Word{gemacht}'' and $s_2=$``\Word{machen}'',
966: and $r\!\in\!\aRange{1,8}$.\footnote{
967: % --------------------------------
968: For example, for $r\!=\!2$ we have
969: $\aTuple{{\sf\small gemachtgemacht}, {\sf\small machenmachen}}$.
970: % --------------------------------
971: }
972:
973:
974: \def\S{\spc{1.1ex}}
975: \def\D{\spc{0.6ex}}
976:
977: \begin{table}[ht]
978: \vspace{1ex}
979: \begin{center}
980: \begin{math} %\tabcolsep1ex
981: \begin{tabular}{ c | r r r | c } \hline
982: % ------------------------------
983: \spc{2ex}$r$\spc{2ex} & \spc{2ex}A\spc{0.5ex} & \spc{3ex}B\spc{2.5ex} & \spc{3ex}C\spc{2.8ex} & \spc{4ex}D\spc{3ex} \\ \hline
984: % ------------------------------
985: 1 & 1 & 1\D\S\S & 1\D\S\S & 1.056 \\
986: 2 & 4 & 4.12 & 5.48 & 1.041 \\
987: 3 & 9 & 9.41 & 14.3\S & 1.057 \\
988: 4 & 16 & 17.1\S & 27.9\S & 1.029 \\
989: 5 & 25 & 27.2\S & 46.5\S & 1.059 \\
990: 6 & 36 & 39.8\S & 70.5\S & 1.016 \\
991: 7 & 49 & 54.1\S & 100\D\S\S & 1.005 \\
992: 8 & 64 & 70.8\S & 135\D\S\S & 1.006 \\ \hline
993: % ------------------------------
994: \end{tabular}
995: \end{math}
996:
997: \caption{Test results for word pair alignment with 2-tape best path search
998: \label{tab:AlignResults}}
999: \vspace{-1ex}
1000: \end{center}
1001: \end{table}
1002:
1003:
1004: \noindent
1005: The columns of Table~\ref{tab:AlignResults} show for different $r$~:
1006: %
1007: \begin{itemize}
1008: \item[(A)]
1009: an estimated time ratio of $r^2$ for the classical approach with an edit distance matrix,
1010: %
1011: \item[(B)]
the measured time ratio for 2-tape best path search (relative to 3.93 milliseconds for $r=1$)
1013: using a Fibonacci heap,
1014: %
1015: \item[(C)]
1016: an estimated worst-case time ratio of
1017: $
1018: \frac{(7r\cdot 6r) \log (7r\cdot 6r)}{(7\cdot 6) \log (7\cdot 6)}
1019: = r^2(1\!+\!2\frac{\log r}{\log 42})
1020: $
corresponding to the worst-case complexity of $\complexity(7r\cdot 6r \log (7r\cdot 6r))$
1022: for the two words of length $7r$ and $6r$, respectively, and
1023: %
1024: \item[(D)]
1025: the measured time increase factor when using a binary instead of a Fibonacci heap.
1026: %
1027: \end{itemize}
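
\noindent
As a quick numerical check of the estimate in column C
(a sketch; the ratio does not depend on the base of the logarithm,
and natural logarithms with $\ln 42 \approx 3.7377$ are used here):
\begin{align*}
r=2: &\quad 4\,\Bigl(1 + 2\,\frac{0.6931}{3.7377}\Bigr) \;\approx\; 4\cdot 1.371 \;\approx\; 5.48 ~, \\
r=8: &\quad 64\,\Bigl(1 + 2\,\frac{2.0794}{3.7377}\Bigr) \;\approx\; 64\cdot 2.113 \;\approx\; 135 ~,
\end{align*}
which agrees with the corresponding entries of Table~\ref{tab:AlignResults}.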
1028:
1029:
Comparing columns A and B shows that, on this example,
the measured running time of our algorithm grows only slightly faster than
$\complexity(r^2) = \complexity(\,|s_1^r||s_2^r|\,)$,
and much more slowly than the worst-case estimate in column C.
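
\noindent
Dividing the entries of column B by those of column A makes this more tangible
(ratios computed here from Table~\ref{tab:AlignResults}, rounded to two decimals):
\[
\frac{4.12}{4} = 1.03 ~, \qquad
\frac{17.1}{16} \approx 1.07 ~, \qquad
\frac{70.8}{64} \approx 1.11 ~,
\]
for $r\!=\!2$, $4$, and $8$, respectively,
whereas the worst-case estimate gives $\frac{135}{64} \approx 2.11$ for $r\!=\!8$.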
1034:
1035:
1036: \pagebreak
1037:
1038: %% --------------------------------------------------------------
1039: %% ALTERNATIVES
1040: %% --------------------------------------------------------------
1041:
1042:
1043: \section{An Alternative Approach
1044: \label{sec:alternatives}}
1045:
1046:
A well-known straightforward alternative to the above $n$-tape best-path search
1048: on an $n$-WFSM $A\tapnum{n}$ is
1049: to intersect $A\tapnum{n}$ with an $n$-WFSM $I\tapnum{n}$,
1050: containing a single path labeled with the input $n$-tuple $s\tapnum{n}$,
1051: and then to apply a classical shortest-distance algorithm, ignoring the labels.
1052:
1053:
1054: %% --------------------------------------------------------------
1055:
1056: \subsection{Intersection}
1057:
1058: The intersection $B\tapnum{n} = I\tapnum{n} \cap A\tapnum{n}$
1059: can be compiled as the join $I\tapnum{n} \JOIN{1=1,\ldots n=n} A\tapnum{n}$
1060: \cite{kempe+champarnaud+eisner:2004a}.
1061: %
1062: In general, it has undecidable emptiness and rationality \cite{rabin+scott:1959}.
1063: In our case, however,
1064: with $A\tapnum{n}$ being $\aTuple{\eps,\ldots\eps}$-cycle free
1065: and $I\tapnum{n}$ acyclic,
it is always rational, even for non-commutative semirings.\footnote{
% ----------------------------------
The intersection of two $n$-WFSMs over non-commutative semirings
is in general not rational (even for $n\!=\!1$).
1070: % ----------------------------------
1071: }
1072:
1073:
1074: Actually, the trellis $\vtNodeSet$ in Figure~\ref{fig:vit-ntape}
1075: corresponds partially to $B\tapnum{n}$.
1076: Each node $\vtNode\!\in\!\vtNodeSet$
1077: corresponds to a state $q\!\in\!Q_B$ of $B\tapnum{n}$
1078: (and vice versa);
1079: however, only those transitions $e\!\in\!E_B$ of $B\tapnum{n}$
1080: that correspond to a state's best prefix,
1081: %%++ occur as back-pointers $\vtPreNode_{\vtNode}$ in $\vtNodeSet$.\footnote{
1082: occur as ``best transitions'' $e_{\vtNode}$ in $\vtNodeSet$.\footnote{
1083: % ---------------------------------
1084: Due to this analogy, one can easily derive an $n$-tape intersection (or join) algorithm,
1085: for precisely our case, from the algorithm in Figure~\ref{pc:vit-ntape}.
1086: Trellis nodes would become states of the resulting $n$-WFSM.
1087: All of their incoming transitions would be constructed,
1088: rather than only those that correspond to a best prefix.
1089: The state set would be partitioned like the trellis.
1090: The Fibonacci heap can be replaced by a stack
1091: (which does not decrease the overall time complexity),
1092: because the order in which partitions are treated would be irrelevant.
1093: % ---------------------------------
1094: }
1095:
1096:
1097: From this analogy we deduce that compiling the intersection $B\tapnum{n}$
1098: has a worst-case time and space complexity of
1099: $\complexity\left(\, |P||E|\log|P||Q| \,\right)$, with $|P|\!=\!(|s|+1)^n$,
1100: equal to the time complexity for constructing the trellis.
1101: %
The result, $B\tapnum{n}$, has
$\nu \leq |P||Q|$ states and $\mu \leq |P||E|$ transitions.
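
\noindent
For instance, consider the alignment example of Section~\ref{sec:align}
(a sketch, assuming the formula for $|P|$ is applied with the actual string lengths).
For $\aTuple{\Word{swum}, \Word{swim}}$ we have $n\!=\!2$ and $|s_1|\!=\!|s_2|\!=\!4$,
and the 5-WFSM of Equation~\eqref{eq:align} has $|Q|\!=\!1$ and $|E|\!=\!3$, so that
\[
|P| = (4+1)^2 = 25 ~, \qquad
\nu \leq 25\cdot 1 = 25 ~, \qquad
\mu \leq 25\cdot 3 = 75 ~.
\]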
1104:
1105:
1106: %% --------------------------------------------------------------
1107:
1108: \subsection{Shortest-distance algorithms}
1109:
1110: Since any $n$-WFSM with multiple initial states can be transformed
1111: into one with a single initial state,
1112: we can use any algorithm that solves a single-source shortest-distance problem,
1113: %
1114: such as Dijkstra's algorithm \cite{dijsktra:1959}
1115: combined with Fibonacci heaps \cite{fredman+tarjan:1987},
1116: that operates in $\complexity(\mu + \nu\log\nu)$ time,
1117: %
1118: or Bellman-Ford's algorithm \cite{bellman:1958,ford+fulkerson:1956}
1119: operating in $\complexity(\mu\nu)$ time,
1120: %
1121: with $\nu$ being the number of states and $\mu$ the number of transitions.
1122:
1123:
1124: Recently, it has been shown that any single-source shortest-distance algorithm on
1125: directed graphs has a lower bound of $\Omega(\mu + \min(\nu\log\nu ,\; \nu\log\rho))$
1126: where $\rho$ is the ratio of the maximal to minimal transition weight
1127: \cite{pettie:2003}.
1128: Since we cannot make any assumption concerning $\rho$ in general,
1129: we consider $\widehat\Omega(\mu + \nu\log\nu)$ as a ``worst-case lower bound''.
1130: It equals the upper bound of Dijkstra's algorithm.
1131:
1132:
1133: On the intersection $B\tapnum{n} = I\tapnum{n} \cap A\tapnum{n}$,
1134: Dijkstra's algorithm requires $\complexity(|P||E|+|P||Q|\log|P||Q|)$ time,
1135: and Bellman-Ford's $\complexity(|P|^2|E||Q|)$ time, in the worst case.
1136: %
1137: The sets $E$ and $Q$ refer to $A\tapnum{n}$.
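
\noindent
Spelled out as a sketch, these bounds follow by substituting
$\nu \leq |P||Q|$ and $\mu \leq |P||E|$ into the general complexities:
\begin{align*}
\mu + \nu\log\nu &\;\leq\; |P||E| + |P||Q|\log(|P||Q|) ~, \\
\mu\,\nu &\;\leq\; |P||E| \cdot |P||Q| \;=\; |P|^2|E||Q| ~.
\end{align*}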
1138:
1139:
1140: %% --------------------------------------------------------------
1141:
1142: \subsection{Complete estimate}
1143:
1144: Intersection and Dijkstra's algorithm have together
1145: a worst-case time complexity of \linebreak
1146: $
1147: \complexity\left(\, |P||E|\log|P||Q| + |P||E| + |P||Q|\log|P||Q| \,\right)
1148: \approx \complexity\left(\, |P|(|E|+|Q|) \log|P||Q| \,\right)
1149: $.
1150: For intersection and Bellman-Ford's algorithm it is
1151: $
1152: \complexity\left(\, |P||E|\log|P||Q| + |P|^2|E||Q| \,\right) =
1153: \complexity\left(\, |P||E|\,(|P||Q|\!+\!\log|P||Q|) \,\right)
1154: $.
1155: Both combinations exceed the complexity of our algorithm.
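
\noindent
The approximation in the first estimate can be justified as follows
(a sketch; the abbreviation $L = \log(|P||Q|)$ is introduced here for illustration,
and we assume $L \geq 1$ so that the middle term is absorbed):
\begin{align*}
|P||E|\,L + |P||E| + |P||Q|\,L
  &\;\leq\; 2\,|P||E|\,L + |P||Q|\,L \\
  &\;\leq\; 2\,|P|\,(|E|+|Q|)\,L ~.
\end{align*}
Compared to the $|P||E|\,L$ term of our algorithm,
the additional terms are $|P||E| + |P||Q|\,L$ with Dijkstra's algorithm
and $|P|^2|E||Q|$ with Bellman-Ford's algorithm.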
1156:
1157:
This result is not surprising, since
building only the trellis $\vtNodeSet$ should take less time
1160: than building the intersection $B\tapnum{n}$
1161: (which is a kind of ``superset'' of $\vtNodeSet$)
1162: and then performing a best-path search.
1163:
1164:
1165: %% --------------------------------------------------------------
1166:
1167:
1168: \pagebreak
1169:
1170: %% --------------------------------------------------------------
1171: %% CONCLUSION
1172: %% --------------------------------------------------------------
1173:
1174:
1175: \section{Conclusion
1176: \label{sec:conclusion}}
1177:
1178:
1179: We presented an algorithm for identifying the path with minimal (resp. maximal) weight
1180: in a given {\it $n$-tape weighted finite-state machine}\/ ($n$-WFSM), $A\tapnum{n}$,
1181: that accepts a given $n$-tuple of input strings,
1182: $s\tapnum{n}\!=\!\aTuple{s_1,\ldots s_n}$.
1183: %
This problem is of particular interest because it also allows us
1185: to compile the best transduction of a given input $n$-tuple $s\tapnum{n}$
1186: by a weighted $(n\!+\!m)$-WFSM (transducer), $A\tapnum{n+m}$, with $n$ input and $m$ output tapes.
1187: For this, we identify the best path accepting $s\tapnum{n}$ on its $n$ input tapes,
1188: and take the label of its output tapes as best output $m$-tuple $v\tapnum{m}$.
1189: (Input and output tapes can be in any order.)
1190:
1191:
1192: Our algorithm is a generalization of the Viterbi algorithm
1193: which is generally used for detecting the most likely path
1194: in a {\it Hidden Markov Model}\/ (HMM)
1195: for an observed sequence of symbols emitted by the HMM.
1196: %
1197: In the worst case,
1198: it operates in $\complexity\left(\, |s|^n|E|\log|s|^n|Q| \,\right)$ time,
1199: where $n$ and $|s|$ are the number and average length of the strings in $s\tapnum{n}$,
1200: and $|Q|$ and $|E|$ the number of states and transitions of $A\tapnum{n}$,
1201: respectively.
1202:
1203:
1204: We illustrated our $n$-tape best path search on a practical example,
1205: the alignment of word pairs (i.e., $n\!=\!2$),
1206: and provided test results that show a time complexity slightly higher than
1207: $\complexity\left(\, |s|^2 \,\right)$.
1208:
1209:
Finally, we discussed a straightforward alternative approach for solving our problem,
1211: that consists in intersecting $A\tapnum{n}$ with an $n$-WFSM $I\tapnum{n}$,
1212: that has a single path labeled with the input $n$-tuple $s\tapnum{n}$,
1213: and then applying a classical shortest-distance algorithm, ignoring the labels.
1214: %
1215: This has, however, a worst-case time complexity of
1216: $ \complexity\left(\, |s|^n(|E|+|Q|)\log|s|^n|Q| \,\right) $,
1217: which is higher than that of our algorithm.
1218:
1219:
1220: %% --------------------------------------------------------------
1221:
1222:
1223:
1224: %% --------------------------------------------------------------
1225: %% BIBLIOGRAPHY
1226: %% --------------------------------------------------------------
1227:
1228: \bibliographystyle{fullname}
1229:
1230: \bibliography{ntvit}
1231:
1232:
1233: %% --------------------------------------------------------------
1234:
1235: \end{document}
1236:
1237: