1: % Automatic Pattern Detection---
2: % An Algorithm for Constructing Regular Language Filters
3: % cm: 4/6/04, 5/7/04
4: % jpc: 4/6/04, 5/18/04
5:
6: \documentclass[pre,twocolumn,showpacs,showkeys,superscriptaddress,preprintnumbers,floatfix,dvips]{revtex4}
7:
8: \usepackage{newcent}
9: \usepackage{varioref}
10: \usepackage{oubraces}
11:
12: \usepackage{array,graphicx,amsfonts,amsmath,amsthm}% ,amssymb,amsmath}
13: \usepackage{url}
14: \urlstyle{same}
15:
16: \makeatletter
17:
18: % Literate Haskell stuff:
19: \usepackage{amstext}
20: \usepackage{amssymb}
21: \usepackage{stmaryrd}
22: \DeclareFontFamily{OT1}{cmtex}{}
23: \DeclareFontShape{OT1}{cmtex}{m}{n}
24: {<5><6><7><8>cmtex8
25: <9>cmtex9
26: <10><10.95><12><14.4><17.28><20.74><24.88>cmtex10}{}
27: \DeclareFontShape{OT1}{cmtex}{m}{it}
28: {<-> ssub * cmtt/m/it}{}
29: \newcommand{\texfamily}{\fontfamily{cmtex}\selectfont}
30: \DeclareFontShape{OT1}{cmtt}{bx}{n}
31: {<5><6><7><8>cmtt8
32: <9>cmbtt9
33: <10><10.95><12><14.4><17.28><20.74><24.88>cmbtt10}{}
34: \DeclareFontShape{OT1}{cmtex}{bx}{n}
35: {<-> ssub * cmtt/bx/n}{}
\newcommand{\tex}[1]{\text{\texfamily#1}} % NEW
37:
38: \newcommand{\Sp}{\hskip.33334em\relax}
39:
40: \newlength{\lwidth}\setlength{\lwidth}{4.5cm}
41: \newlength{\cwidth}\setlength{\cwidth}{8mm} % 3mm
42:
43: \newcommand{\Conid}[1]{\mathit{#1}}
44: \newcommand{\Varid}[1]{\mathit{#1}}
45: \newcommand{\anonymous}{\kern0.06em \vbox{\hrule\@width.5em}}\newcommand{\plus}{\mathbin{+\!\!\!+}}
46: \newcommand{\bind}{\mathbin{>\!\!\!>\mkern-6.7mu=}}
47: \newcommand{\sequ}{\mathbin{>\!\!\!>}}
48: \renewcommand{\leq}{\leqslant}
49: \renewcommand{\geq}{\geqslant}
50: \newcommand{\NB}{\textbf{NB}}
51: \newcommand{\Todo}[1]{$\langle$\textbf{To do:}~#1$\rangle$}
52:
53: \newcommand{\sub}{\ensuremath{\mrm{sub}}}
54:
55: %\newcommand{\Figure}{Figure}
56: \newcommand{\Figure}{Fig.}
57: \renewcommand{\vref}{\ref}
58:
59: \makeatother
60:
61: % end of Haskell stuff.
62:
63: \usepackage[pdfauthor={Carl S. McTague and James
64: P. Crutchfield},pdftitle={Automated Pattern
65: Detection},bookmarks=true]{hyperref}
66:
67: %\journal{Theoretical Computer Science}
68:
69: \newcommand{\Nn}{\ensuremath{\mathbb{N}}}
70: \newcommand{\Zr}{\ensuremath{\mathbb{Z}}}
71: \newcommand{\Qf}{\ensuremath{\mathbb{Q}}}
72: \newcommand{\Rf}{\ensuremath{\mathbb{R}}}
73: \newcommand{\Cf}{\ensuremath{\mathbb{C}}}
74:
75: \newcommand{\Qfp}{\ensuremath{\mathbb{Q}^+}}
76: \newcommand{\Rfp}{\ensuremath{\mathbb{R}^+}}
77:
78: \newcommand{\RP}{\ensuremath{\mathbb{R}\mathrm{P}}}
79: \newcommand{\CP}{\ensuremath{\mathbb{C}\mathrm{P}}}
80: \newcommand{\id}{\ensuremath{\mathrm{id}}}
81:
82: \newcommand{\bd}{\ensuremath{\partial}}
83:
84: \newcommand{\mc}[1]{\mathcal{#1}}
85: \newcommand{\mrm}[1]{\mathrm{#1}}
86: \newcommand{\seq}[1]{\ensuremath{\langle #1 \rangle}}
87:
88: \newcommand{\Input}{\ensuremath{\mathrm{In}}}
89: \newcommand{\Det}{\ensuremath{\mathrm{Det}}}
90: \newcommand{\Filter}{\ensuremath{\mathrm{Filter}}}
91: \newcommand{\Optimize}{\ensuremath{\mathrm{Optimize}}}
92: \newcommand{\Output}{\ensuremath{\mathrm{Out}}}
93: \newcommand{\Lang}{\ensuremath{\mathrm{Lang}}}
94: \newcommand{\Hom}{\ensuremath{\mathrm{Hom}}}
95: \newcommand{\Ext}{\ensuremath{\mathrm{Ext}}}
96:
97: \newcommand{\CA}{\textsc{ca}}
98: \newcommand{\CAs}{\textsc{ca}s}
99: \newcommand{\CapitalCAs}{C\textsc{a}s}
100: \newcommand{\CapitalCA}{C\textsc{a}}
101: \newcommand{\CapitalECA}{E\textsc{ca}}
102: \newcommand{\ECA}{\textsc{eca}}
103: \newcommand{\ECAs}{\textsc{eca}s}
104:
105: \newtheorem{prop}{Proposition}
106: \newtheorem{lem}{Lemma}
107: \newtheorem{alg}{Algorithm}
108:
109: % To make footnotesize text-style subscripts in math formulas
110: \newcommand{\textsub}[1]{\mbox{\footnotesize #1}}
111:
112: % Notation for regular domains
113: \newcommand{\domain}[1]{\ensuremath{\mc{D}_{#1}}}
114:
115: % \newcommand{\trans}[3]{\ensuremath{#1 \stackrel{#2}{\rightarrow} #3}}
116: \newcommand{\trans}[3]{\ensuremath{(#1,#2,#3)}}
117:
118: \begin{document}
119:
120: \title[Automated Pattern Detection]{Automated Pattern Detection---\\
121: An Algorithm for Constructing Optimally\\
122: Synchronizing Multi-Regular Language Filters}
123:
124: \author{Carl S. McTague}
125: \homepage{www.mctague.org/carl}
126: \email{mctague@santafe.edu}
127: %\email{cm434@cam.ac.uk}
128: \affiliation{Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA}
129: \affiliation{Department of Mathematical Sciences, University of
130: Cincinnati, Cincinnati, OH 45221-0025, USA}
131:
132: \author{James P. Crutchfield}
133: \homepage{www.santafe.edu/~chaos}
134: \email{chaos@santafe.edu}
135: \affiliation{Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA}
136:
137: \date{\today}
138:
139: \begin{abstract}
140: In the computational-mechanics structural analysis of
one-dimensional cellular automata, the following automata-theoretic
142: analogue of the \emph{change-point problem} from time series
143: analysis arises: \emph{Given a string $\sigma$ and a collection
144: $\{\mc{D}_i\}$ of finite automata, identify the regions of
145: $\sigma$ that belong to each $\mc{D}_i$ and, in particular, the
146: boundaries separating them.} We present two methods for solving
147: this \emph{multi-regular language filtering problem}. The first,
148: although providing the ideal solution, requires a stack, has a
149: worst-case compute time that grows quadratically in $\sigma$'s
length, and conditions its output at any point on arbitrarily long
151: windows of future input. The second method is to algorithmically
152: construct a transducer that approximates the first algorithm. In
153: contrast to the stack-based algorithm, however, the transducer
154: requires only a finite amount of memory, runs in linear time, and
155: gives immediate output for each letter read; it is, moreover, the
156: best possible finite-state approximation with these three features.
157: \end{abstract}
158:
159: \pacs{
160: 05.45.Tp, % Time series analysis
161: 89.70.+c, % Information science
162: 05.45.-a, % Nonlinear dynamics and nonlinear dynamical systems
163: 89.75.Kd % Complex Systems: Patterns
164: }
165:
166: \keywords{cellular automata; regular languages; computational
167: mechanics; domains; particles; pattern detection; transducer;
168: filter; synchronization; change-point problem}
169:
170: \preprint{Santa Fe Institute Working Paper 04-09-027}
171: \preprint{arxiv.org/abs/cs.CV/0409XXX}
172: % arxiv.org CS categories
173: % Computer Vision and Pattern Recognition CV
174: % Cross list to these:
175: % Computation and Language CL
176: % Data Structures and Algorithms DS
% Discrete Mathematics DM
178: % Learning LG
179: % Cross list to these math and physics:
180: %
181:
182: \maketitle
183:
184: %\tableofcontents
185:
186: %% \begin{quote}
187: %% \small \emph{I bleach. In time I shall bleach that garment you are
188: %% wearing. For I take the colour out of all things. Thus you see
189: %% these stuffs here, as they are now. Clotho spun the glowing
190: %% threads, and Lachesis wove them, as you observe, in curious
191: %% patterns, very marvellous to see : but when I am done with these
192: %% stuffs there will be no more colour or beauty or strangeness
193: %% anywhere apparent than in so many dish-clouts.}
194: %% \flushright{---Mother Sereda in James Cabell's \emph{Jurgen}.}
195: %% \end{quote}
196:
197: \section{Introduction}
198:
199: Imagine you are confronted with an immense one-dimensional dataset in
200: the form of a string $\sigma$ of letters from a finite alphabet
201: $\Sigma$. Suppose moreover that you discover that vast expanses of
202: $\sigma$ are regular in the sense that they are recognized by simple
203: finite automata~$\mc{D}_1, \dots, \mc{D}_n$. You might wish to bleach
204: out these regular substrings so that only the boundaries separating
205: them remain, for this reduced presentation might illuminate $\sigma$'s
206: more subtle, larger-scale structure.
207:
208: This \emph{multi-regular language filtering problem} is the
209: automata-theoretic analogue of several, more statistical, problems
210: that arise in a wide range of disciplines. Examples include
211: estimating stationary epochs within time series (known as the
212: \emph{change-point problem} \cite{Zack83a}), distinguishing gene
213: sequences and promoter regions from enveloping junk \textsc{dna}
214: \cite{Bald01a}, detecting phonemes in sampled speech \cite{Furu89a},
215: and identifying regular segments within line-drawings \cite{Free74a},
216: to mention a few.
217:
218: The multi-regular language filtering problem arises directly in the
219: computational-mechanics structural analysis of cellular automata
220: \cite{Crut92c}. There, finite automata recognizing temporally invariant
221: sets of strings are identified and then filtered from space-time diagrams
222: to reveal systems of particles whose interactions capture the essence of
223: how a cellular automaton processes spatially distributed information.
224:
225: We present two methods for solving the multi-regular language
226: filtering problem. The first covers $\sigma$ with maximal substrings
227: recognized by the automata~$\{\mc{D}_i\}$. The interesting parts of
228: $\sigma$ are then located where these segments overlap or abut.
229: Although this approach provides the ideal solution to the problem, it
230: unfortunately requires an arbitrarily deep stack to compute, has a
231: worst-case compute time that grows quadratically in $\sigma$'s length,
232: and conditions its output at any point on arbitrarily long windows of
233: future input. As a result, this method becomes extremely expensive to
234: compute for large data sets, including the expansive space-time
235: diagrams that researchers of cellular automata often scrutinize.
236:
237: The second method---and our primary focus---is to
238: algorithmically construct a finite transducer that approximates the
239: first, stack-based algorithm by printing sequences of labels $i$ over
240: segments of $\sigma$ recognized by the automaton~$\mc{D}_i$. When, at
241: the end of such a segment, the transducer encounters a letter
242: forbidden by the prevailing automaton~$\mc{D}_i$, it prints special
243: symbols until it resynchronizes to a new automaton $\mc{D}_j$. In
244: this way, the transducer approximates the stack-based algorithm by
245: jumping from one maximal substring to the next, printing a few special
246: symbols in between. Since it does not jump to a new maximal substring
247: until the preceding one ends, however, the transducer can miss the true
248: beginning of any maximal substring that overlaps with the preceding one.
249: Typically, the benefits of the finite transducer outweigh the occurrence
250: of such errors.
251:
252: In contrast with the stack-based algorithm it approximates, however,
253: the transducer requires only a finite amount of memory, runs in linear
254: time, and gives immediate output for each letter read---significant
255: improvements for cellular automata structural analysis and, we
256: suspect, for other applications as well. Put more precisely, the
257: transducer is Lipschitz-continuous (with Lipschitz constant one) under
258: the cylinder-set topology, whereas the stack-based algorithm, which
259: conditions its output on arbitrarily long windows of future input, is
260: generally not even continuous.
261:
262: It is also worth noting that the transducers thus produced are the
263: best possible approximations with these three features and are
264: identical to those that researchers have historically constructed by
265: hand. Our algorithm thus relieves researchers of the tedium of
266: constructing ever more complicated transducers.
267:
268: \subsection*{Cellular Automata}
269: \label{ca}
270:
271: Before presenting our two filtering methods, we introduce cellular
272: automata in order to highlight an important setting where the
273: multi-regular language filtering problem arises, as well as to give
274: some visual intuition to our approach.
275:
276: Let $\Sigma$ be a discrete alphabet of $k$~symbols. A \emph{local
277: update rule} of radius~$r$ is any function $\phi : \Sigma^{2r+1}
278: \rightarrow \Sigma$. Given such a function, we can construct a global
279: mapping of bi-infinite strings $\Phi : \Sigma^\Zr \rightarrow
280: \Sigma^\Zr$, called a \emph{one-dimensional cellular automaton}~(\CA),
281: by setting:
282: \begin{align*}
283: \Phi(\sigma)_i := \phi(\sigma_{i-r} \dots
284: \sigma_i \dots \sigma_{i+r}) ~,
285: \end{align*}
286: where $\sigma_i$ denotes the $i$th letter of the string $\sigma$.
287: Since the image under $\Phi$ of any period-$N$ bi-infinite string
288: also has period $N$, it is common to regard $\Phi$ as a mapping of
289: finite strings, $\Sigma^N \rightarrow \Sigma^N$. When regarded in
290: this way, a \CA\ is said to have \emph{periodic boundary conditions}.
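The global map can be sketched in a few lines of Python. This is an illustration under our own conventions, not code from the original work: symbols are encoded as the integers $0, \dots, k-1$, the local rule is passed as a function \texttt{phi} on tuples of $2r+1$ symbols, and indices wrap modulo $N$ to implement periodic boundary conditions. The helper names \texttt{ca\_step} and \texttt{xor\_rule} are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

def ca_step(phi: Callable[[Tuple[int, ...]], int], r: int,
            sigma: Sequence[int]) -> List[int]:
    """One application of Phi to a length-N configuration.

    Site i is updated from the window sigma[i-r], ..., sigma[i+r]; indices
    are reduced modulo N, which implements periodic boundary conditions.
    """
    N = len(sigma)
    return [phi(tuple(sigma[(i + d) % N] for d in range(-r, r + 1)))
            for i in range(N)]

# Hypothetical example rule of radius 1: XOR of the two outer neighbors.
def xor_rule(nbhd: Tuple[int, ...]) -> int:
    return nbhd[0] ^ nbhd[2]

print(ca_step(xor_rule, 1, [0, 0, 1, 0, 0]))  # [0, 1, 0, 1, 0]
```

Iterating \texttt{ca\_step} $t$ times on $\sigma^0$ yields row $t$ of a space-time diagram.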
291:
For $k=2$ and $r=1$, there are precisely $2^{2^3}=256$ local update
rules, and the resulting \CAs\ are called the \emph{elementary} \CAs\
(or \ECAs). Wolfram~\cite{Wolfram83} introduced a numbering scheme for
them: List the neighborhoods $\eta \in \Sigma^{3}$ in reverse
lexicographic order ($111, 110, \dots, 000$) and read the outputs
$\phi(\eta)$, in that order, as the binary representation of an
integer between 0 and 255.
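Equivalently, in Wolfram's convention the neighborhood $\eta = abc$ determines bit number $4a+2b+c$ of the rule number, so that the rule number equals $\sum_{\eta} \phi(\eta)\, 2^{4a+2b+c}$. The following Python sketch (function names are ours) converts between a rule number and its lookup table; for rule 110 it recovers the outputs $0,1,1,0,1,1,1,0$ for the neighborhoods $111, 110, \dots, 000$.

```python
from itertools import product
from typing import Dict, Tuple

Table = Dict[Tuple[int, int, int], int]

def eca_table(rule: int) -> Table:
    """Map each neighborhood (a, b, c) to phi(a, b, c) for elementary rule `rule`.

    Bit 4a + 2b + c of the rule number is the output for (a, b, c), so the
    neighborhood 111 corresponds to the most significant bit.
    """
    if not 0 <= rule <= 255:
        raise ValueError("elementary rules are numbered 0..255")
    return {(a, b, c): (rule >> (4 * a + 2 * b + c)) & 1
            for a, b, c in product((0, 1), repeat=3)}

def rule_number(table: Table) -> int:
    """Inverse of eca_table: read the outputs back as a binary integer."""
    return sum(bit << (4 * a + 2 * b + c) for (a, b, c), bit in table.items())

t110 = eca_table(110)
# Neighborhoods in reverse lexicographic order: 111, 110, ..., 000.
print([t110[nb] for nb in sorted(t110, reverse=True)])  # [0, 1, 1, 0, 1, 1, 1, 0]
```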
298:
299: \begin{figure}
300: \includegraphics[width=3in]{autocafigure-1.ps}
301: \caption{A space-time diagram illustrating the typical behavior of
302: \ECA~110. Black squares correspond to 1s, and white squares to 0s.
303: \label{eca110}}
304: \end{figure}
305:
306: By interpreting a string's letters as values assumed by the
307: \emph{sites} of a discrete \emph{lattice}, a \CA\ can be viewed as a
308: spatially extended dynamical system---discrete in time, space, and
309: local state. Its behavior as such is often illustrated through
310: so-called \emph{space-time diagrams}, in which the iterates $\{
311: \Phi^t(\sigma^0) \}_{t = 0, 1, 2, \dots}$ of an initial string
312: $\sigma^0$ are plotted as a function of time. Figure~\vref{eca110},
313: for example, depicts \ECA~110 acting iteratively on an initial string
314: of length $N=150$.
315:
Owing to their appealingly simple architecture, \CAs\ have been studied
not only as abstract mathematical objects, but also as models for
318: physical, chemical, biological, and social phenomena such as fluid
319: flow, galaxy formation, earthquakes, chemical pattern formation,
320: biological morphogenesis, and vehicular traffic dynamics.
321: Additionally, they have been used as parallel computing devices, both
322: for the high-speed simulation of scientific models and for
323: computational tasks such as image processing. More generally, \CAs\
324: have provided a simplified setting for studying the ``emergence'' of
325: cooperative or collective behavior in complex systems. The literature
326: for all these applications is vast and includes Refs.~\cite{Burks70a,
327: FarmerEtAL84a, FogelmanSoulieEtAl87, Gutowitz90a, Kara94a, Nage96a,
328: PARCELLA94, Crut98c, Toffoli&Margolus87, Wolfram86a}.
329:
330: \subsection*{Computational-Mechanics Structural Analysis of \CapitalCAs}
331:
332: The computational-mechanics \cite{Crutchfield&Young89,Crut98d}
333: structural analysis of a \CA\ rests on the discovery of a ``pattern
334: basis''---a collection $\{\mc{D}_i\}$ of automata that describe the
335: emergent structural components in the \CA's space-time behavior
336: \cite{Crut93a,Hans90a}. Once such a pattern basis is found, conforming
337: regions of space-time can be seen as background \emph{domains} through
338: which coherent structures not fitting the basis move. In this way,
339: structural features set against the domains can be identified and
340: analyzed.
341:
342: \begin{figure*}
343: \includegraphics[scale=0.4]{autocafigure-2.ps}
344: \includegraphics[scale=0.4]{autocafigure-3.ps}
345: \caption{(Left)~Space-time diagram illustrating the typical behavior of
346: \ECA~18---a \CA\ exhibiting apparently random behavior, i.e., the
347: set of length-$L$ spatial strings has a positive entropy density as
348: $L \rightarrow \infty$. (Right)~The same space-time diagram
349: filtered with the regular domain $\mc{D}=\sub \left([0(0+1)]^*
350: \right)$. (After Ref.~\cite{Crut92a}.)
351: \label{eca18}}
352: \end{figure*}
353:
354: More formally, Crutchfield and Hanson define a \emph{regular domain}
355: \domain{} to be a regular language (the collection of strings
356: recognized by some finite automaton) that is:
357: \begin{enumerate}
\item \emph{temporally invariant}: some iterate of the \CA\ maps
\domain{} \emph{onto} itself, that is, $\Phi^n[\domain{}] = \domain{}$
for some $n>0$; and
\item \emph{spatially homogeneous}: the same pattern can occur at any
site, that is, the recurrent states in the minimal finite automaton
recognizing $\domain{}$ are strongly connected.
364: \end{enumerate}
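Spatial homogeneity can be checked mechanically. The Python sketch below is our own illustration, not a procedure from the text: it takes an automaton as a dictionary of labelled transitions, regards a state as recurrent when it lies on some cycle (an assumption about the intended definition), and tests whether every recurrent state can reach every other. The two-state automaton \texttt{domain18}, in which state E means `the next letter must be 0' and state F means `the next letter is free', is our hand-built presentation of the \ECA~18 domain $\sub([0(0+1)]^*)$ shown in \Figure~\vref{eca18}.

```python
from typing import Dict, Hashable, Set, Tuple

Edges = Dict[Tuple[Hashable, str], Hashable]  # (state, letter) -> next state

def reachable(edges: Edges, src: Hashable) -> Set[Hashable]:
    """States reachable from src by a path of length at least one."""
    succ: Dict[Hashable, Set[Hashable]] = {}
    for (s, _), t in edges.items():
        succ.setdefault(s, set()).add(t)
    seen: Set[Hashable] = set()
    stack = list(succ.get(src, ()))
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(succ.get(u, ()))
    return seen

def is_spatially_homogeneous(edges: Edges) -> bool:
    """True if all states lying on a cycle are mutually reachable."""
    states = {s for s, _ in edges} | set(edges.values())
    reach = {s: reachable(edges, s) for s in states}
    recurrent = {s for s in states if s in reach[s]}  # s lies on a cycle
    return all(t in reach[s] for s in recurrent for t in recurrent)

domain18 = {("E", "0"): "F", ("F", "0"): "E", ("F", "1"): "E"}
two_cycles = {("a", "0"): "a", ("b", "1"): "b"}  # two disconnected loops

print(is_spatially_homogeneous(domain18), is_spatially_homogeneous(two_cycles))
```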
365:
366: Once we discover a \CA's regular domains---either through visual
367: inspection or by an automated induction method such as the
368: $\epsilon$-machine reconstruction algorithm
369: \cite{Crut92c}---the corresponding space-time regions are, in
370: a sense, understood. Given this level of discovered regularity, we
371: bleach out the domain-conforming regions from space-time diagrams,
372: leaving only ``un-modeled'' deviations, whose dynamics can then be
studied. Sometimes, as is the case for the \CAs\ we exhibit here,
these deviations resemble particles, and by studying how these
particle-like deviations move and what happens when they collide, we
hope to understand the \CA's (possibly hidden) computational
capabilities.
378:
379: Consider, for example, the apparently random behavior of \ECA~18,
380: illustrated in \Figure~\vref{eca18}. Although no coherent structures
381: present themselves to the eye, computational-mechanics structural
382: analysis lays bare particles hidden within its output: Filtering
383: its space-time diagrams with the regular domain $\mc{D} =
384: \sub([0(0+1)]^*)$---where $\sub(\mc{L})$ denotes the regular language
385: consisting of all subwords of strings belonging to the regular
386: language $\mc{L}$---reveals a system of particles that follow random
387: walks and pairwise annihilate whenever they
388: touch~\cite{Crut92a,Eloranta94,Hans90a}. Thus, by blurring the \CA's
389: deterministic behavior on strings, we discover higher-level stochastic
390: particle dynamics. Although this loss of deterministic detail may at
391: first seem conceptually unsatisfying, the resulting view is more
392: structurally detailed than the vague classification of \ECA~18 as
393: ``chaotic''.
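For this particular domain, membership has a simple description: a binary string belongs to $\sub([0(0+1)]^*)$ exactly when all of its 1s occupy positions of the same parity, since one of the two interleaved sublattices must consist entirely of 0s. The Python sketch below (names and automaton are ours) checks this characterization exhaustively against a hand-built deterministic recognizer whose states q, E, and F mean `no 1 seen yet', `the next letter must be 0', and `the next letter is free'.

```python
from itertools import product

def in_eca18_domain(w: str) -> bool:
    """w is in sub([0(0+1)]*) iff every '1' sits at positions of one parity."""
    ones_parity = {k % 2 for k, c in enumerate(w) if c == "1"}
    return set(w) <= {"0", "1"} and len(ones_parity) <= 1

# Hand-built partial DFA; a missing entry is a forbidden transition.
DELTA = {("q", "0"): "q", ("q", "1"): "E",
         ("E", "0"): "F",
         ("F", "0"): "E", ("F", "1"): "E"}

def dfa_accepts(w: str, start: str = "q") -> bool:
    s = start
    for c in w:
        if (s, c) not in DELTA:
            return False
        s = DELTA[(s, c)]
    return True

# The two descriptions agree on every binary string of length <= 10.
assert all(in_eca18_domain("".join(p)) == dfa_accepts("".join(p))
           for n in range(11) for p in product("01", repeat=n))
```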
394:
In short, discovering domains and filtering them from space-time
diagrams is essential to understanding the information processing
embedded within a \CA's output.
398:
399: \section{Method 1---Filtering with a Stack}
400:
401: We now present the first method for solving the general multi-regular
402: language filtering problem with which we began. Although
403: the following method is perhaps the most thorough and easiest to
404: describe, it requires an arbitrarily deep stack to compute. Its
405: description will rest upon a few basic ideas from automata theory.
406: (Please refer to the first few paragraphs of App.~\vref{automata-review},
407: up to and including Lemma~\ref{subset-construction}, where these
408: preliminaries are reviewed.)
409:
410: To filter a string $\sigma$, this method identifies the collection of
411: its maximal substrings that the automata $\{\mc{D}_i\}$ accept. More
412: formally, given a string $\sigma$, let $\sigma_{a,b}$ denote the
413: substring $\sigma_a \sigma_{a+1} \cdots \sigma_b$ for $a, b \in \Zr$.
414: If $\sigma$ is bi-infinite, extend this notation so that $a=-\infty$
415: and $b=\infty$ denote the intuitive infinite substrings. Place a
strict partial ordering $\prec$ on all such substrings by setting
$\sigma_{a,b} \prec \sigma_{a',b'}$ if $a' \le a \le b \le b'$ and
$(a,b) \ne (a',b')$. Then
418: let $\mc{P}_\mrm{max}(\{\mc{D}_i\},\sigma)$ denote the collection of
419: maximal substrings $\sigma_{a,b}$ (with respect to $\prec$) that the
420: $\{\mc{D}_i\}$ accept---or, in symbols, let:
421: \begin{align*}
422: \mc{P}_\mrm{max}(\{\mc{D}_i\},\sigma) := \{ \sigma_{a,b} \in \mc{P}
423: :\ & \text{there is no $\sigma' \in \mc{P}$} \\ &\text{with
424: $\sigma_{a,b} \prec \sigma'$} \} ~,
425: \end{align*}
426: where $\mc{P} := \{ \sigma_{a,b} : \text{$\mc{D}_i$ accepts
427: $\sigma_{a,b}$ for some $i$}\}$.
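For short strings, $\mc{P}_\mrm{max}$ can be evaluated directly from this definition. The brute-force Python sketch below is meant only as an executable restatement: it represents $\sigma_{a,b}$ by the 1-based inclusive pair $(a,b)$, uses a membership predicate \texttt{accepts} as a stand-in for `some $\mc{D}_i$ accepts', and checks all $\mc{O}(N^2)$ candidate substrings. The example predicate for $\sub([0(0+1)]^*)$ relies on the fact that a binary string lies in this language exactly when all of its 1s occupy positions of the same parity.

```python
from typing import Callable, List, Tuple

Pair = Tuple[int, int]

def p_max(accepts: Callable[[str], bool], sigma: str) -> List[Pair]:
    """1-based inclusive pairs (a, b) of the maximal accepted substrings."""
    N = len(sigma)
    P = [(a, b) for a in range(1, N + 1) for b in range(a, N + 1)
         if accepts(sigma[a - 1:b])]

    def strictly_inside(x: Pair, y: Pair) -> bool:
        # x precedes y: y's interval contains x's interval and x != y
        return x != y and y[0] <= x[0] and x[1] <= y[1]

    return sorted(x for x in P if not any(strictly_inside(x, y) for y in P))

def in_eca18_domain(w: str) -> bool:
    """Membership in sub([0(0+1)]*) for binary w: all 1s share one parity."""
    return len({k % 2 for k, c in enumerate(w) if c == "1"}) <= 1

print(p_max(in_eca18_domain, "0101101"))  # [(1, 4), (5, 7)]
```

On $\sigma = 0101101$ the two maximal substrings are $\sigma_{1,4} = 0101$ and $\sigma_{5,7} = 101$; every other accepted substring lies inside one of them.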
428:
429: The following algorithm can be used to compute
430: $\mc{P}_{\mrm{max}}(\{\mc{D}_i\},\sigma)$.
431:
432: \begin{alg}
433: \label{alg:stack-based}
434: \textbf{Input:} The automata $\mc{D}_1, \dots, \mc{D}_n$ and the
435: length-$N$ string $\sigma$.
436: \begin{tabbing}
437: \quad \= \quad \= \quad \= \quad \= \quad \= \quad \= \quad \+ \kill
438: \textbf{Let} $\mc{A} := \Det(\mc{D}_1 \sqcup \cdots \sqcup
439: \mc{D}_n)$. \\
440: \textbf{Let} $s_0$ be $\mc{A}$'s unique start state. \\
441: \textbf{Let} $\mathbf{S}$ and $\mathbf{M}$ be empty stacks. \\
442: \textbf{For} $j = 1 \dots N$ \textbf{do} \+ \\
443: Push $(s_0,j)$ onto $\mathbf{S}$. \\
444: \textbf{For} each $(s,i) \in \mathbf{S}$ \textbf{do} \+ \\
445: \textbf{If} there is a transition $\trans{s}{\sigma_j}{s'} \in
446: T(\mc{A})$ \+ \\
447: \textbf{then} replace $(s,i)$ with $(s',i)$ in $\mathbf{S}$. \\
448: \textbf{Otherwise}, remove $(s,i)$ from $\mathbf{S}$. \+ \\
449: \textbf{If}, in addition, $(s,i)$ was at the bottom
450: of $\mathbf{S}$ \\
451: \> \textbf{then} push the pair $(i,j-1)$ onto $\mathbf{M}$. \- \-
452: \- \- \\
453: \textbf{Let} $(s_f,i_f)$ be the pair at the bottom of
454: $\mathbf{S}$. \\
455: Push $(i_f,N)$ onto $\mathbf{M}$. \- \\
456: \textbf{Output:} $\mathbf{M}$.
457: \end{tabbing}
458: \end{alg}
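A Python transcription of Algorithm~\ref{alg:stack-based}, offered as a sketch rather than a reference implementation. It assumes that $\mc{A}$ is given as a partial transition dictionary with a single start state, that every state of $\mc{A}$ is accepting, and that the domain languages are closed under taking substrings (as languages of the form $\sub(\mc{L})$ are). We read `was at the bottom of $\mathbf{S}$' as `was the oldest pair at the start of step $j$', and we skip the empty substring that would otherwise be recorded when a freshly pushed pair fails immediately. The automaton with states q, E, F is our own hand-built deterministic recognizer for the \ECA~18 domain $\sub([0(0+1)]^*)$.

```python
from typing import Dict, Hashable, List, Tuple

State = Hashable
Delta = Dict[Tuple[State, str], State]  # partial transition function T(A)

def stack_filter(delta: Delta, s0: State, sigma: str) -> List[Tuple[int, int]]:
    """Return M: 1-based inclusive pairs (a, b) of the maximal substrings."""
    S: List[Tuple[State, int]] = []  # S[0] is the bottom: oldest live start
    M: List[Tuple[int, int]] = []
    N = len(sigma)
    for j in range(1, N + 1):
        S.append((s0, j))
        letter = sigma[j - 1]
        bottom_i = S[0][1]  # start index at the bottom of S before step j
        survivors: List[Tuple[State, int]] = []
        for s, i in S:
            nxt = delta.get((s, letter))
            if nxt is not None:
                survivors.append((nxt, i))  # replace (s, i) with (s', i)
            elif i == bottom_i and i <= j - 1:
                M.append((i, j - 1))  # the oldest accepted run just ended
        S = survivors
    if S:
        M.append((S[0][1], N))  # the run still open at the end of sigma
    return M

DELTA = {("q", "0"): "q", ("q", "1"): "E",
         ("E", "0"): "F",
         ("F", "0"): "E", ("F", "1"): "E"}

print(stack_filter(DELTA, "q", "0101101"))  # [(1, 4), (5, 7)]
```

For this string the output agrees with $\mc{P}_\mrm{max}$ computed directly from its definition, as Prop.~\ref{stack-local-prop} asserts. Keeping only the oldest failing pair matters: when several pairs fail on the same letter, the younger ones describe substrings strictly contained in the oldest one's.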
459:
460: The following proposition is easily verified, and we state it without
461: proof.
462: \begin{prop}
463: \label{stack-local-prop}
464: If $\sigma$ is a \emph{finite string} and if $\mathbf{M}_\sigma$ is
465: the output of the above algorithm when applied to $\sigma$, then
466: $\mc{P}_\mrm{max}(\{\mc{D}_i\},\sigma) = \{ \sigma_{a,b} : (a,b) \in
467: \mathbf{M}_\sigma\}$.
468: \end{prop}
469:
470: We summarize Prop.~\ref{stack-local-prop} by saying that
471: Algorithm~\ref{alg:stack-based} solves the \emph{local} filtering
472: problem in the sense that it can compute
473: $\mc{P}_\mrm{max}(\{\mc{D}_i\}, w)$ over a finite, contractible
474: window~$w$. (By \emph{contractible} we mean that periodic boundary
475: conditions along the boundary of $w$ are ignored.)
476:
477: The \emph{global} filtering problem, which takes into account periodic
478: boundary conditions, is considerably more subtle. A somewhat pedantic
479: example is filtering the bi-infinite string $0^\Zr$ consisting
480: entirely of $0$s with the language $\sub[(0^m 1)^*]$. (Recall that
481: $\sub(\mc{L})$ is our notation for the collection of substrings of
482: strings belonging to $\mc{L}$.) The local approach applied to a
483: finite length-$N$ window $0^N$, where $N<m$, will return $0^N$ itself
484: as its single maximal substring; i.e., $\mc{P}_\mrm{max}(\{\sub[(0^m
485: 1)^*]\},0^N) = \{ 0^N \}$. In contrast, the global filter of $0^\Zr$
486: will consist of heavily overlapping length-$m$ substrings beginning
487: and ending at every position within $0^\Zr$:
488: \begin{align*}
489: \mc{P}_\mrm{max}(\{\sub[(0^m 1)^*]\},0^\Zr) = \{ 0^\Zr_{a+1}
490: 0^\Zr_{a+2} \cdots 0^\Zr_{a+m} : a \in \Zr\}.
491: \end{align*}
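The contrast can be checked numerically on finite windows. The Python sketch below is our own illustration. It uses a hand-written membership test for $\sub[(0^m 1)^*]$, under which a binary string belongs to the language exactly when its first and last runs of 0s have length at most $m$ and every run of 0s between two 1s has length exactly $m$. It then evaluates the definition of $\mc{P}_\mrm{max}$ by brute force on windows $0^L$. The function names are hypothetical.

```python
from typing import List, Tuple

def in_sub_0m1(w: str, m: int) -> bool:
    """Membership in sub[(0^m 1)^*], assuming m >= 1."""
    if not set(w) <= {"0", "1"}:
        return False
    runs = [len(r) for r in w.split("1")]  # runs of 0s separated by the 1s
    return runs[0] <= m and runs[-1] <= m and all(n == m for n in runs[1:-1])

def maximal_windows(w: str, m: int) -> List[Tuple[int, int]]:
    """Brute-force maximal accepted substrings, as 1-based inclusive pairs."""
    N = len(w)
    P = [(a, b) for a in range(1, N + 1) for b in range(a, N + 1)
         if in_sub_0m1(w[a - 1:b], m)]
    return sorted(x for x in P
                  if not any(y != x and y[0] <= x[0] and x[1] <= y[1] for y in P))

print(maximal_windows("0" * 3, 5))  # [(1, 3)]: a short window is accepted whole
print(maximal_windows("0" * 7, 3))  # [(1, 3), (2, 4), (3, 5), (4, 6), (5, 7)]
```

Once the window is longer than $m$, the maximal substrings are exactly the overlapping length-$m$ windows, which is the finite shadow of the global answer above.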
492:
493: Fortunately, by examining sufficiently large finite windows,
494: Algorithm~\ref{alg:stack-based} can also be used to solve this more
495: subtle global filtering problem in the case of a bi-infinite string
496: that is periodic. The following Lemma captures the essential
497: observation.
498:
499: \begin{lem}
500: \label{lem:pumping}
501: Suppose $\sigma$ is a \emph{period-$N$ bi-infinite string}. Then
502: every maximal substring $\sigma_{a,b} \in
503: \mc{P}_\mrm{max}(\{\mc{D}_i\},\sigma)$ must have length $\le m \cdot
504: N$, where $m := \max \{ |S(\mc{D}_i)| \}_i$, or else
505: $\mc{P}_\mrm{max}(\{\mc{D}_i\},\sigma)$ must consist of
506: $\sigma_{-\infty,\infty}=\sigma$, alone.
507: \end{lem}
508: \begin{proof}
509: Our argument is a variation on the proof of the classical Pumping
510: Lemma from automata theory. Suppose that $\sigma_{a,b} \in
511: \mc{P}_\mrm{max}(\{\mc{D}_i\},\sigma)$, $a$ and $b$ are finite,
512: and $b-a+1 > m \cdot N$. Then one of the domains, say
513: $\mc{D}_i$, accepts $\sigma_{a,b}$. By definition, this means there
514: is a sequence of transitions in $T(\mc{D}_i)$ of the form
515: $\trans{s_a}{\sigma_a}{s_{a+1}},
516: \trans{s_{a+1}}{\sigma_{a+1}}{s_{a+2}}, \dots,
517: \trans{s_b}{\sigma_b}{s_{b+1}}$. Consider the sequence of pairs:
518: \begin{align*}
\{(s_k,k \bmod N)\}_{k=a}^b \subset S(\mc{D}_i) \times \Zr_N.
520: \end{align*}
521: Since:
522: \begin{align*}
523: b-a+1 > m \cdot N \ge |S(\mc{D}_i) \times \Zr_N|,
524: \end{align*}
the Pigeonhole Principle implies that this sequence must
repeat, say $(s_l, l \bmod N) = (s_{l'}, l' \bmod N)$ for integers
$a \le l < l' \le b$. Reading $\sigma_l \cdots \sigma_{l'-1}$ then
leads from $s_l$ back to $s_{l'} = s_l$, so $\mc{D}_i$ must also
accept any string of the form:
\begin{align*}
\sigma_a \sigma_{a+1} \cdots \sigma_{l-1} (\sigma_l \cdots
\sigma_{l'-1})^* \sigma_{l'} \cdots \sigma_b.
\end{align*}
Since $\sigma$ has period $N$ and $l \bmod N = l' \bmod N$, the
string with $q \ge 1$ copies of the repeated block is precisely the
substring $\sigma_{a,\,b+(q-1)(l'-l)}$ of the original bi-infinite
string $\sigma$; for $q \ge 2$ these substrings strictly contain
$\sigma_{a,b}$ and are arbitrarily long. As a result, $\sigma_{a,b}$
cannot be maximal. This
536: contradiction implies that either (i)~$a$ and $b$ are not both
537: finite or (ii)~$b-a+1 \le m \cdot N$. A straightforward
538: generalization of our argument in fact shows that either
539: (i)~\emph{both $a$ and $b$ are infinite} or (ii)~$b-a+1 \le m \cdot
540: N$.
541: \end{proof}
542:
543: A consequence of Lemma~\ref{lem:pumping} is that we can solve the
544: global filtering problem by applying Algorithm~\ref{alg:stack-based}
545: to a window of length $m N+1$.
546:
547: \begin{prop}
548: \label{stack-global-prop}
549: Suppose $\sigma$ is a \emph{period-$N$ bi-infinite string} and that
550: $\mathbf{M}_{\sigma'}$ is the output of
551: Algorithm~\ref{alg:stack-based} when applied to the finite string
552: $\sigma' := \sigma_1 \sigma_2 \cdots \sigma_{m N+1}$, where $m :=
553: \max \{ |S(\mc{D}_i)| \}_i$. Then:
554: \begin{align*}
555: \mc{P}_\mrm{max}(\{\mc{D}_i\},\sigma)
556: = \{ \sigma_{a+qN,b+qN} : (a,b) \in
557: \mathbf{M}_{\sigma'}, q \in \Zr \} ~,
558: \end{align*}
559: unless $\mathbf{M}_{\sigma'}$ consists of $(1,mN+1)$ alone, in which
560: case $\mc{P}_\mrm{max}(\{\mc{D}_i\},\sigma) =
561: \{\sigma_{-\infty,\infty}=\sigma\}$.
562: \end{prop}
563:
564: The major drawback of Algorithm~\ref{alg:stack-based}, however, is its
565: worst-case compute time.
566:
567: \begin{prop}
568: \label{prop:stack-runtime}
The worst-case running time of the stack-based filtering algorithm
(Algorithm~\ref{alg:stack-based}) is of order $\mc{O}(N^2)$, where $N$
is the length of the input string~$\sigma$.
572: \end{prop}
573: \begin{proof}
574: For each $j=1 \dots N$, the algorithm pushes a new pair $(s_0,j)$
575: onto the stack $\mathbf{S}$ and then advances each pair on
$\mathbf{S}$. Since $\mathbf{S}$ holds at most $j$ pairs during step
$j$, at most $\sum_{j=1}^N j = \tfrac{1}{2}N(N+1)$ pairs are advanced
in total. This bound is attained when $\mc{A}$ accepts the entire
string $\sigma$, for then the algorithm never removes any pairs from
$\mathbf{S}$. The proposition follows since, for a fixed automaton
$\mc{A}$, each pair can be advanced in constant time.
581: \end{proof}
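The quadratic bound is easy to observe. The Python sketch below instruments the pair-advancing loop of Algorithm~\ref{alg:stack-based}; the partial transition dictionary (our hand-built recognizer for $\sub([0(0+1)]^*)$) and the counter are our own illustration. On the all-0 string, which this domain accepts in full, no pair is ever removed and the count equals $\tfrac{1}{2}N(N+1)$; on the all-1 string every older pair dies at once and the count is only $N$.

```python
from typing import Dict, List, Tuple

DELTA: Dict[Tuple[str, str], str] = {("q", "0"): "q", ("q", "1"): "E",
                                     ("E", "0"): "F",
                                     ("F", "0"): "E", ("F", "1"): "E"}

def count_advances(delta: Dict[Tuple[str, str], str], s0: str, sigma: str) -> int:
    """Number of (state, start) pairs successfully advanced by the stack filter."""
    S: List[Tuple[str, int]] = []
    advances = 0
    for j, letter in enumerate(sigma, start=1):
        S.append((s0, j))
        survivors = []
        for s, i in S:
            nxt = delta.get((s, letter))
            if nxt is not None:
                survivors.append((nxt, i))
                advances += 1
        S = survivors
    return advances

for N in (10, 100, 1000):
    print(N, count_advances(DELTA, "q", "0" * N), N * (N + 1) // 2)
```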
582:
583: \section{Method 2---Filtering with a Transducer}
584:
585: The second method---and our primary focus---is to algorithmically
586: construct a finite transducer that approximates the stack-based
587: Algorithm~\ref{alg:stack-based} by printing sequences of labels
588: $i$ over segments of $\sigma$ recognized by the automaton~$\mc{D}_i$.
589: When, at the end of such a segment, the transducer encounters a letter
590: forbidden by the prevailing automaton~$\mc{D}_i$, it prints special
591: symbols until it resynchronizes to a new automaton $\mc{D}_j$. The
special symbols comprise labels for the various kinds of
domain-to-domain transitions, together with $\lambda$, which
indicates that the classification is ambiguous.
595:
596: In this way, the transducer approximates the stack-based algorithm by
597: jumping from one maximal substring to the next, printing a few special
598: symbols in between. Because it does not jump to a new maximal
599: substring until the preceding one ends, however, the transducer can
600: miss the true beginning of any maximal substring that overlaps with
601: the preceding one. But if no more than two maximal substrings overlap
602: at any given point of $\sigma$, then it is possible to combine the
603: output of two transducers, one reading left-to-right and the other
604: reading right-to-left, to obtain the same output as the stack-based
605: algorithm.
606:
This shortcoming is typically minor, and in exchange the transducer gains
608: several significant advantages over the stack-based algorithm it
609: approximates: It requires only a finite amount of memory, runs in
610: linear time, and gives immediate output for each letter read.
611:
612: Although finite transducers are generally considered less sophisticated
613: than stack-based algorithms in the sense of computational complexity, the
614: construction of this transducer is considerably more intricate than the
615: preceding stack-based algorithm and is, in fact, our principal aim in
616: the following.
617:
618: Our approach will be to construct a transducer $\Filter(\{\mc{D}_i\})$
619: by `filling in' the forbidden transitions of the automaton $\mc{A} :=
620: \Det(\mc{D}_1 \sqcup \cdots \sqcup \mc{D}_n)$. We will thus tie our
621: hands behind our backs at the outset by permitting the transducer to
622: remember \emph{only} as much about past input as does the automaton
623: $\mc{A}$ while recognizing domain strings.
624:
625: Unfortunately, $\mc{A}$'s states will generally preserve too little
626: information to facilitate optimal resynchronization. It is possible,
627: however, to begin with elaborately constructed, equivalent,
628: non-minimal domains $\mc{D}'_i$ that yield an automaton $\mc{A}'
629: := \Det(\mc{D}'_1 \sqcup \dots \sqcup \mc{D}'_n)$ whose states do
630: preserve just enough information to facilitate optimal
631: resynchronization. The transducer obtained by `filling in' the
632: forbidden transitions of this automaton $\mc{A}'$ represents the best
633: possible (transducer) approximation of the stack-based algorithm. We
present a preprocessing algorithm that produces these equivalent,
635: non-minimal domains $\{\mc{D}'_i\} = \Optimize(\{\mc{D}_i\})$ at the
636: end of our
637: discussion of Method-2 filtering.% \vpageref{domain-preprocessing}.
638:
639: The idea underlying our construction is the following. Suppose that
640: while reading the string $\sigma$ we are recognizing an increasingly
641: long string accepted by $\mc{D}_i$ when we encounter a forbidden
642: letter $a$. In accepting $\sigma$ up to this point, the automaton
643: $\mc{A}$ will have reached a certain state $s \in S(\mc{A})$ that has
644: no outgoing transition corresponding to the letter $a$. Our goal is
645: to create such a transition by examining the collection of all
646: possible strings that could have placed us in the state $s$ and to
647: resynchronize to the state of $\mc{A}$ that is most compatible with
the potentially foreign strings obtained by appending the forbidden
letter~$a$ to these strings.
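To make `the collection of all possible strings that could have placed us in the state $s$' concrete, here is a small Python sketch under our own assumptions: $\mc{A}$ is a partial transition dictionary with start state $s_0$, and imagined pasts are enumerated only for one fixed length $l$. For each imagined past $u$ it reports the longest suffix of $u a$ that $\mc{A}$ can still read, together with the state reached. This only illustrates the intuition; it is not the rule based on the partial ordering of resynchronization states $\{S_{i,l}\}$ introduced next. The automaton (states q, E, F, recognizing $\sub([0(0+1)]^*)$) and all function names are hypothetical.

```python
from itertools import product
from typing import Dict, Optional, Set, Tuple

Delta = Dict[Tuple[str, str], str]  # partial transition function of A

DELTA: Delta = {("q", "0"): "q", ("q", "1"): "E",
                ("E", "0"): "F",
                ("F", "0"): "E", ("F", "1"): "E"}

def run(delta: Delta, s0: str, w: str) -> Optional[str]:
    """State reached from s0 after reading w, or None on a forbidden letter."""
    s = s0
    for c in w:
        if (s, c) not in delta:
            return None
        s = delta[(s, c)]
    return s

def imagined_pasts(delta: Delta, s0: str, s: str, l: int,
                   alphabet: str = "01") -> Set[str]:
    """All length-l strings that drive the automaton from s0 to state s."""
    words = ("".join(p) for p in product(alphabet, repeat=l))
    return {u for u in words if run(delta, s0, u) == s}

def suffix_resync(delta: Delta, s0: str, s: str, a: str,
                  l: int) -> Dict[str, Tuple[str, str]]:
    """For each imagined past u: the longest suffix v of u + a readable from s0,
    and the state it reaches (pasts with no such suffix are omitted)."""
    out: Dict[str, Tuple[str, str]] = {}
    for u in imagined_pasts(delta, s0, s, l):
        w = u + a
        for k in range(len(w)):  # suffixes w[k:], longest first
            t = run(delta, s0, w[k:])
            if t is not None:
                out[u] = (w[k:], t)
                break
    return out

print(sorted(imagined_pasts(DELTA, "q", "E", 3)))  # ['001', '100', '101']
print(suffix_resync(DELTA, "q", "E", "1", 3))
```

Here the letter 1 is forbidden in state E, and the three imagined pasts of length 3 retain suffixes of different lengths after the 1 is appended, which is the tension between specificity and reliance on the imagined past described below.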
650:
651: In this situation there will be two natural desires. On the one hand,
652: we wish to unambiguously resynchronize to as \emph{specific} a domain
653: state as possible; but, on the other, we wish to rely on as little of
654: the imagined past as possible. (We use the term \emph{imagined} because
655: our transducer remembers only the state $s \in S(\mc{A})$ we have
656: reached---not the particular string that placed us there.) To reflect
these desires, we sort the potential resynchronization states into
sets $\{S_{i,l}\}$, where $i$ measures the specificity of
resynchronization and $l$ the length of imagined past, and we order
these sets.
661:
We now implement this intuition in full detail. Our exposition relies
heavily on ideas from automata theory, so we urge the reader to review
App.~\ref{automata-review} in its entirety before continuing.
665:
As above, let $\mc{A} := \Det(\mc{D}_1 \sqcup \dots \sqcup \mc{D}_n)$
and let $\psi_\mc{A} : S(\mc{A}) \hookrightarrow \{ S \subset
S(\mc{D}_1 \sqcup \dots \sqcup \mc{D}_n) \}$ be the canonical injection
provided by Lemma~\vref{subset-construction} in App.~\ref{automata-review}.
670: Assume that there is a canonical injection $S(\mc{D}_1) \sqcup \dots
671: \sqcup S(\mc{D}_n) \hookrightarrow S(\mc{A})$ and that we can
672: therefore regard the sets $S(\mc{D}_i)$ as subsets of $S(\mc{A})$. An
673: example of this situation is depicted in \Figure~\vref{domains}. A
674: sufficient condition for the existence of such an injection is that
675: each $\mc{D}_i$ is minimal and that $\Lang(\mc{D}_i) \not\subset
\Lang(\mc{D}_{l})$ for $i \ne l$. Minimality is far from necessary,
however, and the assumption holds for a much larger class of
678: domains. (Put informally, it suffices if we can associate to each
679: state $s \in S(\mc{D}_1 \sqcup \dots \sqcup \mc{D}_n)$ a string that
680: corresponds to a unique path through $\mc{D}_1 \sqcup \cdots \sqcup
681: \mc{D}_n$---one that leads to~$s$.)
682:
683: \begin{figure}[here]
684: \begin{minipage}[t]{2.3in}
685: \includegraphics[width=2.3in]{autofigure-1.ps}
686: \end{minipage}
687: \begin{minipage}[t]{2.4in}
688: \includegraphics[width=2.4in]{autofigure-2.ps}
689: \end{minipage}
690: \caption{\label{domains} The domains $\mc{D}_1$ and $\mc{D}_2$ (top)
691: and the automaton $\mc{A} = \Det(\mc{D}_1 \sqcup \mc{D}_2)$
692: (bottom). Start states are indicated by dotted arrows from the
693: word ``Start'', and final states are darkened. Notice that the
states of $\mc{A}$ correspond to collections of states of $\mc{D}_1$
and $\mc{D}_2$, and that the states of $\mc{D}_1$ and $\mc{D}_2$ are,
in turn, canonically injected into those of $\mc{A}$, here by the map
$n \mapsto [n]$.
697: }
698: \end{figure}
699:
700: Let $\mc{T}$ be a transducer with the same states, start state, and
701: final states as $\mc{A}$, but with the transitions:
702: \begin{align*}
703: T(\mc{T}) :=& \{ \trans{s}{a|f(s')}{s'} : \trans{s}{a}{s'} \in
704: T(\mc{A}) \} ~,
705: \end{align*}
706: where:
707: \begin{align*}
708: f(s') =
709: \begin{cases}
i & \text{if $\psi_\mc{A}(s') \subset S(\mc{D}_i)$} ~,\\
711: \lambda & \text{otherwise} ~,
712: \end{cases}
713: \end{align*}
and where $\lambda$ is a new symbol in the output alphabet $\Sigma'$
indicating that domain labeling was not possible, for example,
because the partial string read so far belongs to more than one of
the automata $\{\mc{D}_i\}$ or to none of them. To recapitulate, the
transducer's output alphabet $\Sigma'$ consists of three kinds of
symbols: domain labels $\{1, \ldots, n\}$, domain-domain transition
types $\{1, 2, \ldots, p\}$, and the ambiguity symbol $\lambda$.
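The labeling rule $f$ can be illustrated with a short Python sketch; it is not the authors' implementation. It assumes (hypothetically) that automata are given as sets of triples $(p, a, q)$, that each state of $\mc{D}_1 \sqcup \cdots \sqcup \mc{D}_n$ is a pair $(i, \text{name})$ tagged with its domain index, and that every domain state is a start state; final states are omitted. The toy domains $\sub(0^*)$ and $\sub[(01)^*]$ are chosen only for illustration. The subset construction keeps reachable, nonempty subsets, so a missing table entry marks a forbidden transition.

```python
from collections import deque

LAMBDA = "lambda"  # stand-in for the ambiguity symbol

def determinize(alphabet, trans, start):
    """Subset construction keeping only reachable, nonempty subsets.
    A (subset, letter) pair absent from the result is forbidden."""
    delta = {}
    for p, a, q in trans:
        delta.setdefault((p, a), set()).add(q)
    s0 = frozenset(start)
    seen, queue, dtrans = {s0}, deque([s0]), {}
    while queue:
        S = queue.popleft()
        for a in alphabet:
            T = frozenset(q for p in S for q in delta.get((p, a), ()))
            if not T:
                continue  # no successor: forbidden transition of Det(...)
            dtrans[(S, a)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    return seen, dtrans, s0

def f(S):
    """Output i if every state in S belongs to domain D_i, else LAMBDA."""
    labels = {i for (i, _) in S}
    return labels.pop() if len(labels) == 1 else LAMBDA

# Hypothetical toy domains: D_1 = sub(0*), D_2 = sub((01)*).
D1 = {((1, "a"), "0", (1, "a"))}
D2 = {((2, "b"), "0", (2, "c")), ((2, "c"), "1", (2, "b"))}
all_states = {(1, "a"), (2, "b"), (2, "c")}

A_states, A_trans, A_start = determinize("01", D1 | D2, all_states)
# Transducer T: same states as A; each transition s -a-> s' emits f(s').
T = {(S, a): (f(S2), S2) for (S, a), S2 in A_trans.items()}
```

Here the transition from the start state on $0$ emits $\lambda$, because the state it reaches mixes states of both toy domains.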
721:
722: The transducer $\mc{T}$'s input, $\Input(\mc{T})$, recognizes
723: precisely those strings recognized by the given domains. Our goal is
724: to extend $\mc{T}$ by introducing transitions of the form:
725: \begin{align*}
726: \{ \trans{s}{a|h(s,a)}{g(s,a)} \ : \ & s \in S(\mc{T}) = S(\mc{A}),
727: a \in \Sigma, \\ & \text{and there are no transitions} \\ & \text{of
728: the form $\trans{s}{a}{\cdot} \in T(\mc{A})$} \} ~,
729: \end{align*}
730: where the functions $g(s,a)$ and $h(s,a)$ are defined in the following
paragraphs. The transducer $\Filter(\{\mc{D}_i\})$ obtained by adding
these transitions to $\mc{T}$ will then have the desired property that
its input $\Input(\Filter(\{\mc{D}_i\}))$ accepts every string in
$\Sigma^*$.\footnote{Accepting $\Sigma^*$ is rather generous: the filter
will take any string and label its symbols according to the
hypothesized patterns $\{\mc{D}_i\}$. This is not required for
cellular automata, since their configurations often contract to
strict subsets of $\Sigma^*$ over time.}
739:
740: Let $W_l$ denote the collection of strings corresponding to length-$l$
741: paths through $\mc{A}$ beginning in any of its states, but
\emph{ending} in state~$s$, and let $W'_{l+1}$ denote the collection
of strings obtained by appending the letter $a$ to the strings of
$W_l$; set $W'_0 := \{\varepsilon\}$, where $\varepsilon$ is the empty
string. The strings $\bigcup_{l \ge 0} W'_l$ are accepted by the
745: finite automaton $\mc{A}^{s,a}$ obtained by adding a new state $f$ and
746: a transition $\trans{s}{a}{f}$ to $\mc{A}$, and by setting
747: $\mathrm{Start}(\mc{A}^{s,a}) := S(\mc{A}^{s,a})$ and
748: $\mathrm{Final}(\mc{A}^{s,a}) := \{f\}$. An example is shown in
749: \Figure~\vref{deadend}, where the four-state domain has a transition
750: added from state~[2] on symbol~$1$, which was originally forbidden.
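A minimal sketch of the construction of $\mc{A}^{s,a}$, under a hypothetical encoding of transitions as triples, follows. The three-state automaton standing in for $\mc{A}$ (a DFA for $\sub[(01)^*]$ with an extra state \texttt{S} from which both letters are allowed) is an invented example. Because $f$ is itself a start state, the empty string is accepted, and every other accepted string has the form $w \cdot a$ for a string $w$ labeling a path that ends in state $s$.

```python
from itertools import product

def dead_end_automaton(states, trans, s, a, f="f"):
    """Build A^{s,a}: add a new state f and the edge s -a-> f; every
    state (f included) is a start state, and f is the only final state."""
    assert f not in states, "the new state must be fresh"
    new_states = set(states) | {f}
    return new_states, set(trans) | {(s, a, f)}, set(new_states), {f}

def accepts(trans, start, final, word):
    """Nondeterministic simulation: track every reachable state."""
    current = set(start)
    for x in word:
        current = {q for (p, b, q) in trans if p in current and b == x}
    return bool(current & final)

# Hypothetical A: a DFA for sub((01)*) plus an extra start state "S".
A_states = {"S", "b", "c"}
A_trans = {("S", "0", "c"), ("S", "1", "b"),
           ("b", "0", "c"), ("c", "1", "b")}

# (c, 0) is forbidden in A; build A^{c,0} and list short accepted strings.
_, T, start, final = dead_end_automaton(A_states, A_trans, "c", "0")
accepted = {"".join(w) for n in range(5) for w in product("01", repeat=n)
            if accepts(T, start, final, w)}
```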
751:
752: \begin{figure}[here]
753: \begin{minipage}[t]{3.5in}
754: \includegraphics[width=3.5in]{autofigure-3.ps}
755: \end{minipage}
756: \begin{minipage}[t]{2.4in}
757: \includegraphics[width=2.4in]{autofigure-4.ps}
758: \end{minipage}
759: \caption{\label{deadend}The semi-deterministic automaton
760: $\mc{A}^{[2],1}$ (top) obtained by adding a state $f=[9]$ and its
761: deterministic version $\Det(\mc{A}^{[2],1})$ (bottom) with states
762: relabeled with the integers $1\dots 17$ in order to simplify later
763: diagrams.
764: }
765: \end{figure}
766:
767: In order to choose the resynchronization state $g(s,a)$ for the
768: forbidden transition $(s,a)$, we examine the strings of $\bigcup_{l
769: \ge 0} W'_l$ that also belong to one or more of the domains
770: $\{\mc{D}_i\}$. We do this by constructing the automaton
771: $\Det(\mc{A}^{s,a}) \cap \mc{A}$, which we call the
772: \emph{resynchronization automaton}. By
773: Lemma~\vref{intersection-construction}, there is a canonical, although
774: not necessarily injective, association:
775: \begin{equation*}
776: \phi : S(\Det(\mc{A}^{s,a}) \cap \mc{A}) \rightarrow S(\mc{A})
777: \end{equation*}
778: given by the composition:
779: \begin{equation*}
780: S(\Det(\mc{A}^{s,a}) \cap \mc{A}) \ \ \hookrightarrow \ \
781: S(\Det(\mc{A}^{s,a})) \times S(\mc{A}) \ \ \rightarrow \ \ S(\mc{A}) ~,
782: \end{equation*}
783: where the right-most map is the second-factor projection, $(s,s')
784: \mapsto s'$.
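The product construction and the projection $\phi$ can be sketched as follows, assuming (our choice, not the paper's) that deterministic automata with forbidden transitions are stored as dictionaries mapping (state, letter) to state. The two factors below, a parity counter and a DFA for $\sub[(01)^*]$, are hypothetical stand-ins for $\Det(\mc{A}^{s,a})$ and $\mc{A}$; they show that $\phi$ need not be injective.

```python
from collections import deque

def intersect(alphabet, d1, s1, F1, d2, s2, F2):
    """Reachable product of two partial DFAs given as dicts
    (state, letter) -> state. A product transition exists only when
    both factors have one."""
    start = (s1, s2)
    seen, queue, delta = {start}, deque([start]), {}
    while queue:
        p, q = queue.popleft()
        for a in alphabet:
            if (p, a) in d1 and (q, a) in d2:
                nxt = (d1[(p, a)], d2[(q, a)])
                delta[((p, q), a)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    final = {(p, q) for (p, q) in seen if p in F1 and q in F2}
    return seen, delta, start, final

def phi(pair):
    """Second-factor projection onto the states of the right factor."""
    return pair[1]

# Hypothetical factors: parity of 1s, and a DFA for sub((01)*).
d1 = {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}
d2 = {("S", "0"): "c", ("S", "1"): "b", ("b", "0"): "c", ("c", "1"): "b"}
states, delta, start, final = intersect(
    "01", d1, "e", {"e"}, d2, "S", {"S", "b", "c"})
```

Both $(e,c)$ and $(o,c)$ are mapped to $c$ by $\phi$ here.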
785:
The resynchronization automaton $\Det(\mc{A}^{s,a}) \cap \mc{A}$ may
reveal several possible resynchronization states. To help distinguish
among them, we sort them into sets $\{S_{i,l}\}$, where $i$
measures the \emph{specificity} of resynchronization and $l$ the length
of imagined past. More precisely, let $S_{i,l}$ denote those states $t
\in S(\mc{A})$ to which $\phi$ associates at least one state $t' \in
\mathrm{Final}(\Det(\mc{A}^{s,a}) \cap \mc{A})$ (i.e., $t=\phi(t')$)
satisfying the following two conditions: (1)~$t$ corresponds, under
Lemma~\vref{subset-construction}, to precisely $i$ states of $\mc{D}_1
\sqcup \cdots \sqcup \mc{D}_n$ and (2)~there is a length-$l$ path
from the unique start state of $\Det(\mc{A}^{s,a}) \cap \mc{A}$ to
$t'$.
798:
Give the sets $\{S_{i,l}\}$ the dictionary ordering; that is, let
$S_{i,l} < S_{i',l'}$ if $i<i'$, or if $i=i'$ and $l<l'$. The set
$S_{|S(\mc{A})|,0}$ consists of the image under $\phi$ of the unique
start state of $\Det(\mc{A}^{s,a}) \cap \mc{A}$. Thus, since the
dictionary ordering on pairs of natural numbers is a well ordering,
there must be a unique, least set among the sets $\{S_{i,l}\}$ that
consist of a single state, say $\{s'\}$. Let
$g(s,a) := s'$, and let $h(s,a):=h'(s,s')=h'(s,g(s,a))$, where $h'$ is
any injection $S(\mc{T}) \times S(\mc{T}) \hookrightarrow \Sigma'$
(chosen independently of $s$ and $a$). An example of this construction
is shown in \Figure~\vref{intersect}.
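Selecting the resynchronization state from the sets $S_{i,l}$ amounts to scanning them in dictionary order. The Python sketch below assumes a finite table of these sets (the computability argument that follows in the text explains why finitely many suffice) and uses one arbitrary choice of the injection $h'$, numbering pairs of states; all table entries and state names are hypothetical.

```python
def choose_resync(S):
    """S maps index pairs (i, l) to sets of states of A. Return the sole
    member of the least singleton S[(i, l)] in dictionary order, or None
    if no set in the table is a singleton."""
    for key in sorted(S):  # Python tuples compare lexicographically
        if len(S[key]) == 1:
            (state,) = S[key]
            return state
    return None

def make_h_prime(states):
    """One possible injection h': S(T) x S(T) -> {1, 2, ...}."""
    order = sorted(states)
    pairs = [(p, q) for p in order for q in order]
    return {pair: k for k, pair in enumerate(pairs, start=1)}

# Hypothetical table: S_{1,1} is empty, S_{1,2} is ambiguous, S_{1,3}
# is a singleton, and S_{2,0} is never examined.
S = {(1, 1): set(), (1, 2): {"[3]", "[4]"}, (1, 3): {"[6]"},
     (2, 0): {"[1]"}}
g = choose_resync(S)
h_prime = make_h_prime(["[1]", "[2]", "[3]", "[4]", "[6]"])
h = h_prime[("[2]", g)]  # output symbol for the new transition from [2]
```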
809:
810: The transducer is completed by repeating the above steps for all
811: forbidden transitions.
812:
813: \begin{figure}[here]
814: \begin{minipage}[t]{2.5in}
815: \includegraphics[width=2.5in]{autofigure-5.ps}
816: \end{minipage}
817: %\begin{minipage}[t]{1.86in}
818: %\includegraphics[width=1.86in]{autofigure-6-touchup.ps}
819: \begin{minipage}[t]{2.5in}
820: \includegraphics[width=2.5in]{autofigure-6-touchup.2.eps}
821: \end{minipage}
822: \caption{\label{intersect}The resynchronization automaton
$\Det(\mc{A}^{[2],1}) \cap \mc{A}$ (top). Here $S_{1,3} = \{[6]\}$
arises from the state $(13,[6])$ alone, and all other $S_{1,\bullet}$ are
825: empty. So we choose $s' = [6]$ and add a transition
826: $\trans{[2]}{1|h'([2],[6])}{[6]}$ to~$\mc{T}$ (bottom).}
827: \end{figure}
828:
829: \subsection*{Computability of the transducer $\Filter(\{\mc{D}_i\})$}
830:
831: Although the transducer $\Filter(\{\mc{D}_i\})$ is well defined, it is
832: perhaps not immediately clear that it is computable. After all, we
833: appealed to the well ordering principle to obtain a least singleton
set $\{s'\}$ among the sets $\{S_{i,l}\}$. In fact, infinitely many
sets $S_{i,l}$ precede the stated upper bound
$S_{|S(\mc{A})|,0}$; for instance, all of the sets $S_{1,\Nn}$ do,
provided $|S(\mc{A})|>1$.

The construction is nevertheless computable, because for each $i$ the
sequence of sets $S_{i,0}, S_{i,1}, S_{i,2}, \ldots$ is eventually
periodic. In fact, we can compute this sequence of sets exactly by
automata-theoretic means.
842:
843: \begin{prop}
844: The transducer $\Filter(\{\mc{D}_i\})$ is computable.
845: \end{prop}
846: \begin{proof}
Let $\mc{Z}[\mc{C}]$ denote the automaton obtained by relabeling all
of the automaton $\mc{C}$'s transitions with $0$s. This automaton
will generally be nondeterministic. The equivalent
deterministic automaton $\Det(\mc{Z}[\mc{C}])$ is useful, because
the state it reaches on reading the string $0^l$ corresponds
precisely, under Lemma~\ref{subset-construction}, to the collection
of states that can be reached by length-$l$ paths through $\mc{C}$
beginning at a start state.
854:
855: Moreover, since $\Det(\mc{Z}[\mc{C}])$ is defined over a single
856: letter, yet deterministic and finite, it must have a special
857: graphical structure: its single start state $s_0$ must lead to a
858: finite loop after a finite chain of non-recurrent states.
(If no cycle of $\mc{C}$ is reachable, the chain simply terminates and
there is no loop at all.) Thus, its states have a linear ordering: $s_0
861: \stackrel{0}{\rightarrow} s_1 \stackrel{0}{\rightarrow} \cdots
862: \stackrel{0}{\rightarrow} s_m \stackrel{0}{\rightarrow} s_{m+1}
863: \stackrel{0}{\rightarrow} \cdots \stackrel{0}{\rightarrow} s_{m+m'}
864: \stackrel{0}{\rightarrow} s_m$. An example is illustrated in
865: \Figure~\vref{squash}, where $m=4$ and $m'=0$.
866:
867: By Lemma~\vref{subset-construction} the states $\{s_k\}$ correspond
868: to collections of states of $\mc{C}$ under an injection:
869: \begin{align*}
870: \psi_{\mc{Z}[\mc{C}]} : S(\Det(\mc{Z}[\mc{C}]))
871: \hookrightarrow & \{ S \subset
872: S(\mc{Z}[\mc{C}]) \}\\ =& \{ S \subset
873: S(\mc{C}) \}.
874: \end{align*}
875:
876: Let $\mc{C} := \Det(\mc{A}^{s,a}) \cap \mc{A}$ in the preceding
877: discussion. As before, by Lemma~\vref{intersection-construction},
878: there is a function:
879: \begin{align*}
880: \phi : S(\Det(\mc{A}^{s,a}) \cap \mc{A}) \rightarrow S(\mc{A}) ~.
881: \end{align*}
882: Let $S_{*,l} \subset S(\mc{A})$ denote those states defined by
883: the formula:
884: \begin{align*}
885: S_{*,l} := \phi[\psi_{\mc{Z}[\mc{C}]}(s_l) \cap
886: \mathrm{Final}(\Det(\mc{A}^{s,a}) \cap \mc{A})] ~.
887: \end{align*}
888:
889: \begin{figure}[here]
890: \begin{minipage}[t]{2.6in}
891: \includegraphics[width=2.6in]{autofigure-7.ps}
892: \end{minipage}
893: \begin{minipage}[t]{3.1in}
894: \includegraphics[width=3.1in]{autofigure-8-touchup.2.eps}
895: \end{minipage}
896: \caption{\label{squash} The automaton
897: $\mc{Z}[\Det(\mc{D}_1^{[2],1}) \cap \mc{A}]$ (top) and its
898: deterministic version $\Det(\mc{Z}[\Det(\mc{D}_1^{[2],1}) \cap
899: \mc{A}])$ (bottom).}
900: \end{figure}
901:
902: Finally, let $S_{i,*}$ denote those states of $\mc{A}$ that
903: correspond to precisely $i$~states of $\mc{D}_1 \sqcup \cdots \sqcup
904: \mc{D}_n$; that is, let:
905: \begin{align*}
S_{i,*} := \{ s \in S(\mc{A}) : |\psi_{\mc{A}}(s)| = i \} ~,
\end{align*}
where $\psi_{\mc{A}} : S(\mc{A}) \hookrightarrow \{ S \subset
S(\mc{D}_1 \sqcup \cdots \sqcup \mc{D}_n) \}$ is the injection
provided by Lemma~\vref{subset-construction}.
911:
912: The sets $S_{i,l}$ can then be computed as the intersections
913: $S_{i,*} \cap S_{*,l}$, and we need only examine these for $1 \le i
914: \le |S(\mc{D}_1)|+\dots+|S(\mc{D}_n)|$ and $0 \le l \le m+m'$ to
915: discover the least one under the dictionary ordering that is a
916: singleton $\{s'\}$.
917: \end{proof}
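The proof suggests a direct computation, sketched below in Python under assumptions of our own: $\mc{C}$ is a set of triples $(p,a,q)$ whose letters are simply ignored (which is exactly the relabeling $\mc{Z}$), \texttt{phi} is a dictionary giving $\phi$, and \texttt{size} gives $|\psi_\mc{A}(t)|$ for each state $t$ of $\mc{A}$. The function \texttt{zero\_orbit} returns the chain $s_0, s_1, \ldots, s_{m+m'}$ as sets of states of $\mc{C}$ together with the index $m$ at which the loop re-enters, and \texttt{resync\_state} scans $S_{i,*} \cap S_{*,l}$ in dictionary order. The example data are hypothetical.

```python
def zero_orbit(trans, start):
    """States of Det(Z[C]) in their linear order s_0, s_1, ...:
    orbit[l] is the set of states of C reachable from a start state by a
    length-l path. Returns (orbit, m); the last state loops back to
    orbit[m], or m is None when the chain dies out (no reachable cycle)."""
    succ = {}
    for p, _, q in trans:  # Z[C]: the letter is forgotten
        succ.setdefault(p, set()).add(q)
    orbit, index = [], {}
    current = frozenset(start)
    while current and current not in index:
        index[current] = len(orbit)
        orbit.append(current)
        current = frozenset(q for p in current for q in succ.get(p, ()))
    return orbit, (index[current] if current else None)

def resync_state(orbit, final, phi, size, max_i):
    """Least singleton S_{i,l} = S_{i,*} & S_{*,l} in dictionary order,
    for 1 <= i <= max_i and 0 <= l <= m + m' (= len(orbit) - 1)."""
    S_star = [{phi[x] for x in block & final} for block in orbit]
    for i in range(1, max_i + 1):
        S_i = {t for t, n in size.items() if n == i}
        for S_l in S_star:
            S_il = S_i & S_l
            if len(S_il) == 1:
                return next(iter(S_il))
    return None

# Example 1: a chain 1 -> 2 -> 3 -> 4 -> 5 with a self-loop on 5,
# so m = 4 and m' = 0.
chain = {(1, "0", 2), (2, "1", 3), (3, "0", 4), (4, "1", 5), (5, "0", 5)}
orbit, m = zero_orbit(chain, {1})

# Example 2 (hypothetical data): S_{1,1} = {A1, A2} is ambiguous, but
# S_{1,2} = {A1} is a singleton, so A1 is chosen.
branch = {(1, "0", 2), (1, "1", 3), (2, "0", 4), (3, "0", 5),
          (4, "0", 4), (5, "0", 5)}
b_orbit, b_m = zero_orbit(branch, {1})
phi = {1: "A0", 2: "A1", 3: "A2", 4: "A1", 5: "A3"}
size = {"A0": 3, "A1": 1, "A2": 1, "A3": 2}
choice = resync_state(b_orbit, {2, 3, 4, 5}, phi, size, max_i=3)
```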
918:
919: We summarize the entire algorithm.
920:
921: {
922: \newenvironment{entry}
923: {\begin{list}{--}{
924: \setlength{\topsep}{0pt}
925: \setlength{\itemsep}{0pt}
926: \setlength{\parsep}{0pt}
927: \setlength{\labelwidth}{5pt}
928: \setlength{\itemindent}{0pt}}}{\end{list}}
929:
930: \begin{alg}
931: \label{alg:transducer-building}
932: % \ \\
933: \textbf{Input:} The regular domains $\mc{D}_1, \dots, \mc{D}_n$.
934: \begin{entry}
935: \item Let $\mc{A} := \Det(\mc{D}_1 \sqcup \cdots \sqcup \mc{D}_n)$.
936: \item Choose any injection $h' : S(\mc{A}) \times S(\mc{A})
937: \hookrightarrow \Sigma'$.
938: \item Make $\mc{A}$ into a transducer $\mc{T}$ by adding the symbol
939: $i$ as output to any transition ending in a state corresponding to
940: states of only one domain $\mc{D}_i$ and by adding $\lambda$s as
941: output symbols to all other transitions.
942: \item \textbf{For each} forbidden transition $(s,a) \in S(\mc{A})
943: \times \Sigma(\mc{A})$, add a transition to $\mc{T}$ through the
944: following procedure \textbf{do}
945: \begin{entry}
\item Construct the automaton $\mc{A}^{s,a}$ by adding to $\mc{A}$
the transition $\trans{s}{a}{f}$, where $f$ is a new state, by
making every state (including $f$) a start state, and by letting $f$
be its only final state.
949: \item Construct the automaton $\Det(\mc{Z}[\Det(\mc{A}^{s,a}) \cap
950: \mc{A}])$, where $\mc{Z}[\mc{C}]$ is the automaton obtained by
951: relabeling all of $\mc{C}$'s transitions with $0$s. Its states
952: will have a natural linear ordering $s_0 \rightarrow s_1
953: \rightarrow \cdots \rightarrow s_{m+m'}.$
954: \item Let $S_{i,*}$ and $S_{*,l}$ be the subsets of $S(\mc{A})$
955: defined by:
956: \begin{align*}
957: S_{*,l} &:= \phi[\psi_{\Det(\mc{Z}[\Det(\mc{A}^{s,a}) \cap
958: \mc{A}])}(s_l) \\
959: & \; \; \; \; \; \; \; \; \; \; \cap
960: \mathrm{Final}(\Det(\mc{A}^{s,a}) \cap \mc{A})] \text{\ and} \\
961: S_{i,*} &:= \{ s \in S(\mc{A}) : |\psi_{\mc{A}}(s)| = i \}.
962: \end{align*}
963: \item Find the singleton set $\{s'\}$ among the sets:
964: \begin{align*}
965: \{S_{i,*} \cap S_{*,l} : \; & 1 \le i \le |\mc{D}_1| + \cdots +
966: |\mc{D}_n|, \\ & 0 \le l \le m+m' \}
967: \end{align*}
968: that occurs first under the dictionary ordering.
969: \item Add the transition $\trans{s}{a|h'(s,s')}{s'}$ to~$\mc{T}$.
970: \end{entry}
971: \end{entry}
972: \textbf{Output:} $\Filter(\{\mc{D}_i\}) := \mc{T}$.
973: \end{alg}
974: }
975:
976: \subsubsection*{Algorithmic Complexity}
977:
978: \begin{prop}
979: The worst-case performance of the transducer-constructing algorithm
980: (Algorithm~\ref{alg:transducer-building}) has order no greater than:
981: \begin{align*}
982: |\mc{A}| \cdot (|\Sigma|-1) \cdot \exp \circ \exp \left( 2 \cdot
983: |\mc{A}| + 1 \right) ~,
984: \end{align*}
985: where $|\mc{A}|$ has order $\exp( |\mc{D}_1| + \cdots +
986: |\mc{D}_n|)$.
987: \end{prop}
988: \begin{proof}
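The algorithm's most expensive step is the computation of
$\Det(\mc{Z}[\Det(\mc{A}^{s,a}) \cap \mc{A}])$. Computing
$\Det(\mc{G})$ has order $\exp(|\mc{G}|)$, and computing $\mc{G} \cap
\mc{H}$ has order $|\mc{G}| \cdot |\mc{H}|$. Since $|\mc{A}^{s,a}| =
|\mc{A}|+1$, the automaton $\Det(\mc{A}^{s,a}) \cap \mc{A}$ has order
at most $|\mc{A}| \cdot \exp(|\mc{A}|+1) \le \exp(2 \cdot |\mc{A}| +
1)$ states, using $|\mc{A}| \le \exp(|\mc{A}|)$. Relabeling by
$\mc{Z}$ does not change the number of states, so this computation has
order $\exp \circ \exp \left( 2 \cdot |\mc{A}| + 1 \right)$.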
995:
996: Finally, recall that the algorithm computes
997: $\Det(\mc{Z}[\Det(\mc{A}^{s,a}) \cap \mc{A}])$ for every forbidden
998: transition $(s,a)$ of $\mc{A}$. A rough upper bound for the number
999: of such transitions is $|\mc{A}| \cdot (|\Sigma|-1)$.
1000: From these two upper bounds the proposition follows.
1001: \end{proof}
1002:
Although this analysis may at first seem to condemn the
transducer-constructing algorithm, the reader should realize that,
once computed, $\mc{T}$ can be used very efficiently to filter
arbitrarily long strings. That is, unlike the stack-based algorithm,
its performance is linear in string length. Thus, one pays during the
filter design phase for an efficient run-time algorithm, a trade-off
familiar, for example, from data compression.
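To make the run-time claim concrete: once $\mc{T}$ is built, filtering is a single left-to-right pass with one table lookup per symbol. The Python sketch below assumes $\mc{T}$ is stored as a dictionary from (state, letter) to (output, next state); the two-state transducer and its defect labels \texttt{"x"} and \texttt{"y"} are hypothetical, not one produced by Algorithm~\ref{alg:transducer-building}.

```python
def run_filter(T, start, string):
    """Filter a string with a deterministic transducer given as a dict
    (state, letter) -> (output symbol, next state). One lookup per input
    symbol, so the running time is linear in len(string)."""
    state, out = start, []
    for x in string:
        symbol, state = T[(state, x)]
        out.append(symbol)
    return out

# Hypothetical filter for the domain sub((01)*): state "b" expects 0 and
# state "c" expects 1; the filled-in transitions emit defect labels.
T = {("b", "0"): (1, "c"), ("c", "1"): (1, "b"),
     ("b", "1"): ("x", "b"), ("c", "0"): ("y", "c")}
labels = run_filter(T, "b", "010110")
```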
1010:
1011: \subsection*{Constructing optimal transducers from non-minimal
1012: domains, a preprocessing step to
1013: Algorithm~\ref{alg:transducer-building}}
1014:
1015: \label{domain-preprocessing}
1016:
1017: Recall that we constructed the transducer $\Filter(\{\mc{D}_i\})$ by
1018: `filling in' the forbidden transitions of the automaton $\mc{A} :=
1019: \Det(\mc{D}_1 \sqcup \dots \sqcup \mc{D}_n)$. This proved somewhat
1020: problematic, however, because $\mc{A}$'s states do not always preserve
1021: enough information about past input to unambiguously resynchronize to
1022: a unique, recurrent domain state. In order to help discriminate among
the several possible resynchronization states, we introduced the
ordered family of sets $\{S_{i,l}\}$. But even so, several attractive
1025: resynchronization states often fell into the same set $S_{i,l}$. So,
1026: lacking any objective way to choose among them, we resigned ourselves
1027: to a less attractive resynchronization state occurring in a later set
1028: $S_{i',l'}$, simply because it appeared alone there, making our choice
1029: unambiguous. If only the states of the automaton $\mc{A}$ preserved
1030: slightly more information about past input, then such compromises
1031: could be avoided.
1032:
1033: In this section we present an algorithm that splits the states of a
1034: given collection $\{\mc{D}_i\}$ of domains to obtain an equivalent
1035: collection $\{\mc{D}'_i\} = \Optimize(\{\mc{D}_i\})$ of domains that
1036: preserve just enough information about past input to enable unambiguous
1037: resynchronization in the transducer obtained by \emph{filling in} the
1038: forbidden transitions of the automaton
1039: $\mc{A}' := \Det(\mc{D}'_1 \sqcup \dots \sqcup \mc{D}'_n)$.
1040:
1041: We will accomplish this by associating to each state of the original
1042: domains $\mc{D}_i$ a collection of automata that partition past
1043: input strings into equivalence classes corresponding to individual
resynchronization states. We will then refine these partitions so
that the transition structure of each $\mc{D}_i$ lifts to them, and
thereby obtain the desired domains $\{\mc{D}'_i\} =
\Optimize(\{\mc{D}_i\})$.
1048:
1049: This procedure, taken as a preprocessing step to
1050: Algorithm~\ref{alg:transducer-building}, will thus produce the best
1051: possible transducer for Method-2 multi-regular language filtering.
1052:
1053: We now state our construction formally. If $s' \in S(\mc{A})$, then
1054: let $\mc{A}_{s'}$ denote the automaton that is identical to the
1055: automaton $\mc{A}$ except that its only final state is $s'$.
1056: Additionally, if $(s,a)$ is a forbidden transition of the automaton
1057: $\mc{D}_1 \sqcup \dots \sqcup \mc{D}_n$, then let $\mc{B}(s,a,s')$
1058: denote the automaton satisfying the formula:
1059: \begin{align*}
1060: \mc{B}(s,a,s') \cdot a = \Det(\mc{A}^{s,a}) \cap \mc{A}_{s'} ~,
1061: \end{align*}
1062: where $\cdot$ denotes concatenation. That is, let $\mc{B}(s,a,s')$
1063: denote the automaton that is identical to the automaton
1064: $\Det(\mc{A}^{s,a}) \cap \mc{A}_{s'}$ except that its final states are
1065: given by $\{ s_f : \trans{s_f}{a}{s_f'} \in T(\diamond), s_f' \in
1066: \mrm{Final}(\diamond)\}$, where $\diamond := \Det(\mc{A}^{s,a}) \cap
1067: \mc{A}_{s'}$. Note that in most cases $\Lang(\mc{B}(s,a,s'))$ will be
1068: empty.
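The final states of $\mc{B}(s,a,s')$ can be read off directly from the transitions of $\diamond$. In the Python sketch below (a hypothetical encoding with transitions as triples), a state becomes final exactly when it has an $a$-labeled transition into a final state of $\diamond$; for a deterministic $\diamond$ this accepts precisely the strings $w$ with $w \cdot a \in \Lang(\diamond)$. The toy automaton, which accepts the strings ending in $10$, is invented for illustration.

```python
from itertools import product

def strip_last_letter(trans, final, a):
    """Final states of B: states with an a-labeled edge into a final state."""
    return {p for (p, b, q) in trans if b == a and q in final}

def accepts(trans, start, final, word):
    current = {start}
    for x in word:
        current = {q for (p, b, q) in trans if p in current and b == x}
    return bool(current & final)

# Toy DFA accepting the strings that end in "10"; state names record the
# relevant suffix ("e": neither, "1": ends in 1, "g": ends in 10).
diamond = {("e", "0", "e"), ("e", "1", "1"), ("1", "0", "g"),
           ("1", "1", "1"), ("g", "0", "e"), ("g", "1", "1")}
B_final = strip_last_letter(diamond, {"g"}, "0")

# Brute-force check on all strings of length <= 5: w is accepted by B
# exactly when w + "0" is accepted by the original automaton.
ok = all(accepts(diamond, "e", B_final, w) ==
         accepts(diamond, "e", {"g"}, w + ("0",))
         for n in range(6) for w in product("01", repeat=n))
```

With $a = 1$ instead, no state qualifies and the language of $\mc{B}$ is empty, the common case noted above.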
1069:
1070: Next we associate to each state $s \in S(\mc{D}_1 \sqcup \dots \sqcup
1071: \mc{D}_n)$ a collection $\Gamma(s)$ of automata. If the state $s$ has
1072: no forbidden transitions, let $\Gamma(s) := \{ \Sigma^* \}$. If the
1073: state $s$ has at least one forbidden transition, however, then let
1074: $\Gamma(s)$ denote the collection of automata:
1075: \begin{align*}
1076: \Gamma(s) := \mrm{Disjoin}(\{ & \Sigma^* \cdot \mc{B}(s,a,s') : \\ &
1077: \trans{s}{a}{\cdot} \not\in T(\mc{A}), s' \in S(\mc{A})\}) ~,
1078: \end{align*}
1079: where $\mrm{Disjoin}(\{\mc{C}_\gamma\})$ denotes the coarsest
1080: partition of $\bigcup_\gamma \Lang(\mc{C}_\gamma)$ by automata
1081: $\{\mc{E}_\epsilon\}$ that is compatible with the automata
1082: $\{\mc{C}_\gamma\}$. That is, $\mrm{Disjoin}(\{\mc{C}_\gamma\})$
1083: denotes the smallest collection $\{\mc{E}_\epsilon\}$ of automata
1084: satisfying (i)~$\bigcup_\epsilon \Lang(\mc{E}_\epsilon) =
\bigcup_\gamma \Lang(\mc{C}_\gamma)$ and (ii)~$\Lang(\mc{C}_\gamma) \cap
1086: \Lang(\mc{E}_\epsilon)$ is either empty or equal to
1087: $\Lang(\mc{E}_\epsilon)$ for all $\gamma$ and $\epsilon$.
1088:
It is possible to compute $\mrm{Disjoin}(\{\mc{C}_\gamma\})$
inductively, starting from $\mrm{Disjoin}(\{\mc{C}_1\}) =
\{\mc{C}_1\}$, with the formula:
\begin{align*}
\mrm{Disjoin}&(\{\mc{C}_1, \mc{C}_2, \dots, \mc{C}_m\}) = \\ & \{
\mc{C}_1 \setminus (\mc{C}_2 \sqcup \cdots \sqcup \mc{C}_m)\} \;
\cup \\ &\{ \mc{C}_1 \cap \mc{C}' : \mc{C}' \in
\mrm{Disjoin}(\{\mc{C}_2, \dots, \mc{C}_m\}) \} \; \cup \\ & \{
\mc{C}' \setminus \mc{C}_1 \, : \mc{C}' \in \mrm{Disjoin}(\{\mc{C}_2,
\dots, \mc{C}_m\}) \},
\end{align*}
discarding any automata whose languages are empty.
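The inductive computation of $\mrm{Disjoin}$ can be tried out with finite sets standing in for languages (a simplification of ours: with actual automata, $\cap$ and $\setminus$ become product constructions). The sketch splits each block of $\mrm{Disjoin}(\{\mc{C}_2, \dots, \mc{C}_m\})$ by $\mc{C}_1$, adds the part of $\mc{C}_1$ that no other $\mc{C}_\gamma$ covers, and drops empty blocks.

```python
def disjoin(sets):
    """Coarsest partition of the union of `sets` compatible with each
    set, computed inductively; empty blocks are dropped."""
    sets = [frozenset(c) for c in sets]
    if not sets:
        return set()
    first, rest = sets[0], sets[1:]
    tail = disjoin(rest)                 # partition of the union of rest
    covered = frozenset().union(*rest)
    blocks = {first - covered}           # the part of C_1 alone
    blocks |= {first & c for c in tail}  # each tail block, inside C_1 ...
    blocks |= {c - first for c in tail}  # ... and outside C_1
    return {b for b in blocks if b}

parts = disjoin([{1, 2, 3}, {3, 4}, {4, 5}])
```

Here $1$ and $2$ lie only in the first set, $3$ in the first two, $4$ in the last two, and $5$ only in the last, so each of these groups becomes its own block.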
1099:
1100: %% (An algorithm for computing $\mrm{Disjoin}(\bullet)$ is provided in
1101: %% Appendix~\vref{implementation}.)
1102:
Note that $\bigcup_{\mc{E} \in \Gamma(s)} \Lang(\mc{E}) = \Sigma^*$
for all states $s \in S(\mc{D}_1 \sqcup \dots \sqcup \mc{D}_n)$. This
is because $\Lang(\mc{B}(s,a,s'))$ contains the empty string, and hence
$\Lang(\Sigma^* \cdot \mc{B}(s,a,s')) = \Sigma^*$, whenever $s' \in
S(\mc{A})$ is the unique state reached on input $a$ from $\mc{A}$'s
start state, that is, whenever $(s_0,a,s') \in T(\mc{A})$, where
$\{s_0\} = \mrm{Start}(\mc{A})$.
1109:
1110: Our goal is to create for each original domain $\mc{D}_i$ an
1111: equivalent domain $\mc{D}'_i$ by splitting each state $s \in
1112: S(\mc{D}_i)$ into states of the form $(s,\mc{E})$, where $\mc{E} \in
1113: \Gamma(s)$. But to endow these split states with a transition
1114: structure equivalent to $\mc{D}_i$'s, we typically must refine the
1115: sets $\Gamma(s)$ further. We must construct a refinement $\Gamma'(s)$
1116: of each $\Gamma(s)$ with the property that if $\trans{s}{a}{s'}$ is a
1117: transition of $\mc{D}_i$, then to each $\mc{E} \in \Gamma'(s)$ there
1118: corresponds a unique $\mc{E}' \in \Gamma'(s')$ with $\Lang(\mc{E}
1119: \cdot a) \subset \Lang(\mc{E}')$. Given such refinements
1120: $\Gamma'(s)$, we can take the pairs $\{(s,\mc{E}) : s \in S(\mc{D}_i),
1121: \mc{E} \in \Gamma'(s)\}$ as the states of $\mc{D}'_i$ and equip them
1122: with transitions of the form $(s,\mc{E}) \stackrel{a}{\rightarrow}
1123: (s',\mc{E}')$, and thus obtain an equivalent, but non-minimal,
1124: domain~$\mc{D}'_i$.
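Once the refinements $\Gamma'(s)$ are known, building $\mc{D}'_i$ is mechanical. The Python sketch below rests on an assumption taken from the termination argument for Algorithm~\ref{alg:optimize-domains}: every automaton $\mc{E}$ is represented by a block of states of a fixed deterministic automaton $\mc{F}$ without forbidden transitions, all of whose states are reachable, so that $\Lang(\mc{E} \cdot a) \subset \Lang(\mc{E}')$ reduces to checking that the $a$-successors of $\mc{E}$'s block lie in $\mc{E}'$'s block. The one-state domain and the three-state $\mc{F}$ (remembering the last letter read) are hypothetical.

```python
def lift(D_trans, gamma, F_delta):
    """States (s, E) and transitions (s, E) -a-> (s2, E2) of D'.
    gamma[s] is a partition of F's states into frozensets; F_delta maps
    (F-state, letter) -> F-state. Assuming every F-state is reachable,
    E . a lies inside E2 exactly when all a-successors of E lie in E2."""
    states = {(s, E) for s, blocks in gamma.items() for E in blocks}
    trans = set()
    for s, a, s2 in D_trans:
        for E in gamma[s]:
            image = {F_delta[(q, a)] for q in E}
            targets = [E2 for E2 in gamma[s2] if image <= E2]
            if len(targets) != 1:
                raise ValueError("gamma is not compatible with D")
            trans.add(((s, E), a, (s2, targets[0])))
    return states, trans

# Hypothetical F remembering the last letter: "x" (nothing read yet),
# "z" (last letter 0), "o" (last letter 1).
F_delta = {(q, a): ("z" if a == "0" else "o") for q in "xzo" for a in "01"}
D_trans = {("p", "0", "p"), ("p", "1", "p")}  # a one-state domain
gamma = {"p": {frozenset({"x", "z"}), frozenset({"o"})}}
D_states, D_prime = lift(D_trans, gamma, F_delta)
```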
1125:
1126: The following algorithm can be used to compute the desired refinements
1127: $\Gamma'(s)$.
1128:
1129: {
1130: \newenvironment{entry}
1131: {\begin{list}{--}{
1132: \setlength{\topsep}{0pt}
1133: \setlength{\itemsep}{0pt}
1134: \setlength{\parsep}{0pt}
1135: \setlength{\labelwidth}{5pt}
1136: \setlength{\itemindent}{0pt}}}{\end{list}}
1137:
1138: \begin{alg}
1139: \label{alg:optimize-domains}
1140: \textbf{Input:} The domain $\mc{D}$ and the function~$\Gamma$ that
1141: assigns to each state $s \in S(\mc{D})$ a collection $\Gamma(s)$ of
1142: automata that partition $\Sigma^*$.
1143: \begin{entry}
1144: \item \textbf{For each} state $s \in S(\mc{D})$, \textbf{let}:
1145: \begin{align*}
1146: \Gamma'(s) &:= \mrm{Disjoin}\left( \bigcup
1147: \{ \Gamma'(s,a,s') : (s,a,s') \in T(\mc{D}) \} \right) ~,
1148: \end{align*}
1149: where:
1150: \begin{align*}
1151: \Gamma'(s,a,s') &:= \{ \mc{E}_\alpha'' \}_\alpha \text{\ and} \\
1152: \{\mc{E}_\alpha'' \cdot a\}_\alpha &:= \{ (\mc{E} \cdot a) \cap
1153: \mc{E}' : \mc{E} \in \Gamma(s), \mc{E}' \in \Gamma(s') \}.
1154: \end{align*}
1155: \item If $\Gamma'(s) \ne \Gamma(s)$ for some state $s \in
1156: S(\mc{D})$, then \textbf{repeat} with $\Gamma'$ in place
1157: of $\Gamma$. Otherwise: \\
1158: \end{entry}
1159: \textbf{Output:} $\Gamma'$.
1160: \end{alg}
1161: }
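Under the same representation (each automaton $\mc{E}$ a block of states of a deterministic automaton $\mc{F}$ without forbidden transitions, an assumption borrowed from the proof below), the refinement loop of Algorithm~\ref{alg:optimize-domains} becomes a partition-refinement iteration: $\mc{E}''_\alpha = \mc{E} \cap \delta_a^{-1}(\mc{E}')$, and $\mrm{Disjoin}$ of several partitions of the same set is their common refinement. This Python sketch is ours, and the parity automaton and two-state domain are hypothetical.

```python
def refine(D_trans, gamma, F_states, F_delta):
    """Fixed-point iteration of the refinement step: each language is
    represented by a block of states of a deterministic automaton F with
    no forbidden transitions. Returns the refined gamma."""
    def block_of(partition, q):
        return next(E for E in partition if q in E)

    while True:
        new = {}
        for s in gamma:
            out = sorted((a, s2) for (p, a, s2) in D_trans if p == s)
            # q's block in Gamma'(s) is fixed by its block in Gamma(s)
            # and, for each edge s -a-> s2, by the block of Gamma(s2)
            # containing the a-successor of q.
            groups = {}
            for q in F_states:
                key = (block_of(gamma[s], q),) + tuple(
                    block_of(gamma[s2], F_delta[(q, a)]) for a, s2 in out)
                groups.setdefault(key, set()).add(q)
            new[s] = {frozenset(g) for g in groups.values()}
        if new == gamma:
            return gamma
        gamma = new

# Hypothetical example: F tracks the parity of 1s read ("e" or "o").
F_delta = {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}
D_trans = {("p", "0", "q"), ("q", "1", "p")}
gamma0 = {"p": {frozenset({"e", "o"})},
          "q": {frozenset({"e"}), frozenset({"o"})}}
result = refine(D_trans, gamma0, {"e", "o"}, F_delta)
```

The split at state \texttt{q} propagates back to \texttt{p} along the $0$-transition, after which the iteration is stable.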
1162:
1163: \begin{prop}
1164: \label{prop:optimization-terminates}
1165: Algorithm~\vref{alg:optimize-domains} eventually terminates,
1166: producing the coarsest possible refinements $\Gamma'(s)$ of
1167: $\Gamma(s)$ compatible with $\mc{D}$'s transition structure.
1168: \end{prop}
1169: \begin{proof}
1170: We construct fine, but finite, refinements that are compatible with
1171: $\mc{D}$'s transition structure, then use this result to conclude
1172: that Algorithm~\vref{alg:optimize-domains} must eventually terminate.
We also conclude that, when Algorithm~\ref{alg:optimize-domains}
terminates, it produces the coarsest possible refinements that are
compatible with $\mc{D}$'s transition structure.
1176:
1177: Let $\{\mc{E}_i\}$ denote the potentially large, but finite,
1178: collection of automata:
1179: \begin{align*}
1180: \{\mc{E}_i\}_{i=1}^N := \mrm{Disjoin}\left( \bigcup_{s \in
1181: S(\mc{D})} \! \! \! \Gamma(s) \right) ~,
1182: \end{align*}
1183: which partition $\Sigma^*$.
1184:
1185: We refine the partition $\{\mc{E}_i\}$ to make it compatible with
1186: $\mc{D}$'s transitions by examining the automaton $\mc{F} :=
1187: \Det(\mc{E}_1 \sqcup \dots \sqcup \mc{E}_N)$. Since the automata
1188: $\{\mc{E}_i\}$ cover $\Sigma^*$, the deterministic automaton
1189: $\mc{F}$ can have no forbidden transitions, and all its states must
1190: be final. Moreover, because the automata $\{\mc{E}_i\}$ are
1191: disjoint, each of $\mc{F}$'s states must correspond (under the
1192: canonical injection $\psi_\mc{F}$ of
1193: Lemma~\vref{subset-construction}) to final states of precisely one
1194: automaton $\mc{E}_i$. In this way, the automata $\{\mc{E}_i\}$
1195: correspond to a partition of the states of $\mc{F}$.
1196:
1197: Since each automaton $\mc{E}_i$ is equivalent to the automaton
1198: obtained by restricting $\mc{F}$'s final states to those states
1199: corresponding (under $\psi_\mc{F}$) to final states of $\mc{E}_i$,
1200: we can refine the partition $\{\mc{E}_i\}$ by refining this
1201: partition of $\mc{F}$'s states.
1202:
1203: Although a coarser refinement may suffice, we can always choose the
1204: partition consisting of single states. That is, if $s \in
1205: S(\mc{F})$, let $\mc{F}_s$ denote the automaton that is identical to
1206: the automaton $\mc{F}$ except that its only final state is~$s$.
Then $\{ \mc{F}_s : s \in S(\mc{F})\}$ is a refinement of the
partition $\{\mc{E}_i\}$ with the special property that for each
automaton $\mc{F}_s$ and $a \in \Sigma(\mc{F})$, there is a unique
automaton $\mc{F}_{s'}$ such that $\Lang(\mc{F}_s \cdot a) \subset
\Lang(\mc{F}_{s'})$. Indeed, since $\mc{F}$ is deterministic and has
no forbidden transitions, $s'$ is the unique state with
$\trans{s}{a}{s'} \in T(\mc{F})$.
1213:
1214: If we let $\Gamma''(s) := \{ \mc{F}_{s'} : s' \in S(\mc{F}) \}$ for
1215: each state $s \in S(\mc{D})$, then we obtain finite refinements of
1216: $\Gamma(s)$ compatible with $\mc{D}$'s transition structure, as
1217: desired.
1218:
This result implies that Algorithm~\ref{alg:optimize-domains} must
eventually terminate. Every refinement that
Algorithm~\ref{alg:optimize-domains} performs must already be
reflected in $\Gamma''(s)$, and each iteration that does not terminate
strictly refines at least one $\Gamma(s)$; since the refinements
$\Gamma''(s)$ are finite, only finitely many such iterations are
possible. Moreover, since every refinement that
the algorithm performs is essential to compatibility with $\mc{D}$'s
transition structure, the algorithm must, upon termination, produce
the coarsest (smallest) compatible refinement possible.
1226: \end{proof}
1227:
1228: \begin{figure}[here]
1229: \includegraphics[width=2.6in]{autofigure-11.ps}
1230: \caption{\label{fig:chaotic-domains} The positive-entropy domains
1231: $\mc{D}_1$ and $\mc{D}_2$ of the binary, next-to-nearest neighbor
1232: \CA~2614700074. (After Ref.~\cite{Crut93a}.)}
1233: \end{figure}
1234:
1235: %% The equivalent, non-minimal domains $\{\mc{D}'_1, \mc{D}'_2\} =
1236: %% \Optimize(\{\mc{D}_1, \mc{D}_2\})$ obtained by applying
1237: %% Algorithm~\ref{alg:optimize-domains} to the chaotic domains in
1238: %% \Figure~\vref{fig:chaotic-domains} are shown in
1239: %% \Figure~\vref{fig:optimized-chaotic-domains}. Some of their
1240: %% transitions are drawn with dashed arrows to help the reader
1241: %% distinguish the non-recurrent states.
1242:
1243: When applied to the domains $\mc{D}_1$ and $\mc{D}_2$ in
1244: \Figure~\vref{fig:chaotic-domains}, for example,
1245: Algorithm~\ref{alg:optimize-domains} produces the equivalent,
1246: non-minimal domains $\{\mc{D}'_1, \mc{D}'_2\} = \Optimize(\{\mc{D}_1,
1247: \mc{D}_2\})$ shown in \Figure~\vref{fig:optimized-chaotic-domains}.
1248: Notice these domains' many non-recurrent states. These have almost no
1249: effect on the automaton $\mc{A}' := \Det(\bigsqcup_i \mc{D}'_i)$.
1250:
1251: \begin{figure}[here]
1252: \begin{minipage}[t]{2.4in}
1253: \includegraphics[width=2.4in]{autofigure-9-touchup.ps}
1254: \end{minipage}
1255: \begin{minipage}[t]{2.5in}
1256: \includegraphics[width=2.5in]{autofigure-10-touchup.ps}
1257: \end{minipage}
1258: \caption{\label{fig:optimized-chaotic-domains} The equivalent,
1259: non-minimal domains $\{\mc{D}'_1, \mc{D}'_2\} =
1260: \Optimize(\{\mc{D}_1, \mc{D}_2\})$ obtained by applying
1261: Algorithm~\ref{alg:optimize-domains} to the positive-entropy
1262: domains $\mc{D}_1$ and $\mc{D}_2$ in
1263: \Figure~\vref{fig:chaotic-domains}. ($\mc{D}'_1$ (top) and
$\mc{D}'_2$ (bottom).) The ``Start'' arrows are
omitted for clarity (all states are start states), and some of the
1266: transitions are drawn with dashed arrows to help the reader
1267: distinguish the recurrent states.}
1268: \end{figure}
1269:
1270: \section{Applications}
1271:
1272: We now present four applications to illustrate how the stack-based
1273: Algorithm~\ref{alg:stack-based} and its transducer approximation
1274: (Algorithms~\ref{alg:transducer-building}
1275: and~\ref{alg:optimize-domains}) solve the multi-regular language
1276: filtering problem. The first is the cellular automaton \ECA~110,
1277: shown previously. Its rather large filtering transducer is quite
1278: tedious to construct by hand, but
1279: Algorithm~\ref{alg:transducer-building} produces it handily. The
1280: second example, \ECA~18, which we have also already seen, illustrates
1281: the stack-based Algorithm~\ref{alg:stack-based}'s ability to detect
overlapping domains. The third example shows our methods' power to
detect structure in the midst of apparent randomness: the domains and
the sharp boundaries between them are identified easily, even though
the domains themselves have positive entropy and their boundaries
move stochastically. This example also shows the use of, and need
for, domain preprocessing (Algorithm~\ref{alg:optimize-domains}):
rapid resynchronization is achieved using a filter built from
optimized, non-minimal domains. The final example demonstrates the
1290: transducer (constructed by Algorithms~\ref{alg:transducer-building}
1291: and~\ref{alg:optimize-domains}) detecting domains in a
1292: multi-stationary process---what is called the change-point problem in
1293: statistical time-series analysis. This example emphasizes that the
1294: methods developed here are not limited to cellular automata. More
1295: importantly, it highlights several of the subtleties of multi-regular
1296: language filtering and clearly illustrates the need for the
1297: domain-preprocessing Algorithm~\ref{alg:optimize-domains}.
1298:
1299: \subsection*{\CapitalECA~110}
1300:
1301: First consider \ECA~110, illustrated earlier in \Figure~\vref{eca110}.
Its domains are easy to see; each has the form $\sub(w^*)$
1303: for some finite word~$w$. Its dominant domain is $\sub(w^*) =
1304: \sub[(00010011011111)^*]$, illustrated in
1305: \Figure~\vref{fig:eca110-spacetime-domain}.
1306: \begin{figure}
1307: \includegraphics[width=3in]{autocafigure-8.ps}
1308: \caption{\label{fig:eca110-spacetime-domain} \ECA~110's principal domain,
1309: $\sub[(00010011011111)^*]$.}
1310: \end{figure}
1311: In fact, the transducer $\Filter(\{ \sub[(00010011011111)^*] \})$,
1312: constructed from this single domain, filters \ECA~110's space-time
1313: behavior well; see \Figure~\vref{eca110-big}.
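As an illustration only, the Haskell sketch below tests whether a
finite spatial configuration belongs to the domain
$\sub[(00010011011111)^*]$. It relies on the observation that a string
lies in $\sub(w^*)$ exactly when it is a substring of enough
repetitions of $w$ to cover every phase offset. The names
\texttt{period110}, \texttt{inSubStar}, and \texttt{inDomain110} are
hypothetical; this is not part of the implementation in
Appendix~\ref{implementation}.

\begin{verbatim}
import Data.List (isInfixOf)

-- Spatial period of ECA 110's principal domain.
period110 :: String
period110 = "00010011011111"

-- s lies in sub(p*) exactly when s is a substring
-- of p repeated often enough to cover every phase
-- offset; k copies suffice once k*|p| >= |s|+|p|.
-- Assumes p is non-empty.
inSubStar :: String -> String -> Bool
inSubStar p s = s `isInfixOf` concat (replicate k p)
  where
    k = length s `div` length p + 2

inDomain110 :: String -> Bool
inDomain110 = inSubStar period110
\end{verbatim}

For example, \texttt{inDomain110 "1110001"} holds (the string wraps
around the end of the period), whereas \texttt{inDomain110 "111111"}
does not, since the period contains no run of more than five 1s.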
1314:
The filtered diagram in that figure lays bare a wide variety of
particle-like domain defects, and these particles move and collide
according to consistent rules. These
1318: particles are important to \ECA~110's computational properties; a
1319: subset can be used to implement a Post Tag system \cite{Mins67} and
1320: thus simulate arbitrary Turing machines \cite{Cook04}.
1321:
1322: \begin{figure*}
1323: %\centerline{\psfig{figure=eca110.eps}}
1324: \includegraphics[scale=0.3]{autocafigure-4.ps}
1325: \includegraphics[scale=0.3]{autocafigure-5.ps}
1326: \caption{An \ECA~110 space-time diagram (left) filtered by the transducer
1327: $\Filter(\{\sub[(00010011011111)^*]\})$ (right).
1328: \label{eca110-big}}
1329: \end{figure*}
1330:
1331: \subsection*{\CapitalECA~18}
1332:
Next, consider \ECA~18, illustrated earlier in
1334: \Figure~\vref{eca18}. It is somewhat more challenging to filter,
1335: because its domain $\mc{D}=\sub \left( [0(0+1)]^* \right)$ has
1336: positive entropy. As a result, its particles are
1337: difficult---although by no means impossible---to see with the naked
1338: eye. Nevertheless, the stack-based algorithm filters its space-time
1339: diagrams extremely well, as illustrated in \Figure~\ref{eca18}
1340: (right). There, black rectangles are drawn where maximal substrings
1341: overlap, and vertical bars are drawn where maximal substrings abut.
1342: As mentioned earlier, these particles, whose precise location is
1343: somewhat ambiguous, follow random walks and pairwise annihilate
1344: whenever they touch~\cite{Crut92a,Eloranta94,Hans90a}.
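To make maximal substrings concrete for \ECA~18, the following
brute-force Haskell sketch uses the fact that a string lies in
$\sub([0(0+1)]^*)$ exactly when all of its 1s occupy positions of the
same parity. It extends each start position as far as the domain
allows and then discards nested intervals. This is a hypothetical
illustration that takes at least quadratic time; it is not the
stack-based Algorithm~\ref{alg:stack-based}, and the function names
are our own.

\begin{verbatim}
import Data.List (nub)

-- Membership in ECA 18's domain sub([0(0+1)]*):
-- every other cell must be 0, so all 1s must
-- occupy positions of a single parity.
inDomain18 :: String -> Bool
inDomain18 s = length (nub ps) <= 1
  where
    ps = [ i `mod` 2
         | (i, '1') <- zip [0 :: Int ..] s ]

-- Maximal domain substrings as half-open index
-- intervals [i, j), found by brute force: extend
-- each start position as far as possible, then
-- drop intervals nested inside another one.
maximalSubstrings :: String -> [(Int, Int)]
maximalSubstrings s = filter (not . nested) spans
  where
    n = length s
    slice i j = take (j - i) (drop i s)
    longest i =
      last [ j | j <- [i .. n]
               , inDomain18 (slice i j) ]
    spans = [ (i, longest i) | i <- [0 .. n - 1] ]
    nested (i, j) =
      any (\(a, b) -> (a, b) /= (i, j)
                      && a <= i && j <= b) spans
\end{verbatim}

On the defect \texttt{1001}, which has the form $1\,0^{2n}\,1$ with
$n = 1$, \texttt{maximalSubstrings} returns \texttt{[(0,3),(1,4)]}:
the two maximal substrings \texttt{100} and \texttt{001} overlap on
the central \texttt{00}.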
1345:
1346: It is worth mentioning that the transducer $\Filter(\{\mc{D}\})$
1347: produces a less precise filtrate in this case---and that
1348: $\Filter(\Optimize(\{\mc{D}\}))$ does no better. Indeed, since breaks
1349: in \ECA~18's domain have the form $\cdots 1 (0^{2 n}) 1 \cdots$, the
precise location of the domain break is ambiguous: when reading
left to right, the break is not detected until the $1$ to the right
of $0^{2n}$ is read, whereas when reading right to left, it is not
detected until the $1$ on the left is read. In other words, if reading
1354: left-to-right, the transducer $\Filter(\{\mc{D}\})$ detects only the
1355: right edges of the black triangles of \Figure~\ref{eca18} (right).
1356: Similarly, if reading right-to-left, it detects only the left edges of
1357: these triangles. In this case it is possible to fill in the space
1358: between these pairs of edges to obtain the output of the stack-based
1359: algorithm.
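The left/right asymmetry described above can be reproduced with a
small hypothetical scanner. The Haskell sketch below tracks the set of
states of a two-state automaton for $\sub([0(0+1)]^*)$ that are
consistent with the input, and reports a break where that set becomes
empty. After a break it simply restarts from the states the offending
letter leads to; this restart rule is a simplification of ours, not
the resynchronization performed by
Algorithm~\ref{alg:transducer-building}.

\begin{verbatim}
import Data.List (nub, sort)

-- Hypothetical two-state domain for ECA 18: in
-- state 'M' the next cell must be 0; in state 'F'
-- it may be 0 or 1.
step :: Char -> Char -> String
step 'M' '0' = "F"
step 'F' _   = "M"
step _   _   = ""

-- Advance a set of candidate states by one letter.
advance :: String -> Char -> String
advance qs a = sort (nub (concatMap (`step` a) qs))

-- Left-to-right scan: report the positions where
-- the candidate set empties, i.e. where a domain
-- break is first detected.  After a break, restart
-- from the states the offending letter leads to
-- (a simplification).
breaksLR :: String -> [Int]
breaksLR = go "FM" . zip [0 ..]
  where
    go _ [] = []
    go qs ((i, a) : rest)
      | null qs'  = i : go (advance "FM" a) rest
      | otherwise = go qs' rest
      where
        qs' = advance qs a

-- The domain is closed under reversal, so the same
-- automaton scans the reversed string; positions
-- are then mapped back to the original indexing.
breaksRL :: String -> [Int]
breaksRL s =
  reverse [ length s - 1 - k
          | k <- breaksLR (reverse s) ]
\end{verbatim}

On \texttt{1001}, \texttt{breaksLR} reports position~3 (the right-hand
$1$) and \texttt{breaksRL} reports position~0 (the left-hand $1$),
matching the two edges described above.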
1360:
1361: \subsection*{\CapitalCA~2614700074}
1362:
Now consider the binary, next-to-nearest neighbor (i.e., $k = r = 2$)
1364: \CA~2614700074, shown in \Figure~\vref{chaotic-ca-big}. Crutchfield
1365: and Hanson constructed it expressly to have the positive-entropy domains
1366: $\mc{D}_1$ and $\mc{D}_2$ in \Figure~\vref{fig:chaotic-domains} \cite{Crut93a}.
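For readers who wish to regenerate space-time diagrams such as those
in this section, the following Haskell sketch iterates a binary,
radius-$r$ cellular automaton on a ring. It assumes the standard
Wolfram rule numbering, in which the new value of a cell is bit~$n$ of
the rule number and $n$ is the neighborhood read as a binary number
with the leftmost cell most significant; we take this convention to
cover \ECA~110, \ECA~18, and \CA~2614700074 alike, but that is our
assumption. The names \texttt{stepCA} and \texttt{spaceTime} are
hypothetical.

\begin{verbatim}
import Data.Bits (testBit)

-- One synchronous update of a binary, radius-r
-- cellular automaton on a ring of cells.  Assumes
-- Wolfram's numbering: the new value of a cell is
-- bit n of the rule number, where n is the
-- neighborhood read as a binary number with the
-- leftmost cell most significant.
stepCA :: Integer -> Int -> [Int] -> [Int]
stepCA rule r cells = map update [0 .. len - 1]
  where
    len = length cells
    -- Periodic boundary; (!!) keeps the sketch
    -- short at the cost of efficiency.
    cell i = cells !! (i `mod` len)
    update i
      | testBit rule n = 1
      | otherwise      = 0
      where
        n = foldl (\acc j -> 2 * acc + cell (i + j))
                  0 [-r .. r]

-- The first t rows of a space-time diagram.
spaceTime :: Integer -> Int -> Int -> [Int]
          -> [[Int]]
spaceTime rule r t = take t . iterate (stepCA rule r)
\end{verbatim}

For example, \texttt{stepCA 110 1 [0,0,1,0,0]} yields
\texttt{[0,1,1,0,0]}, and \texttt{spaceTime 2614700074 2 t row} would
produce $t$ rows of the $k = r = 2$ rule from an initial row
\texttt{row}.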
1367:
1368: \begin{figure*}
1369: \includegraphics[scale=0.3]{autocafigure-6.ps}
1370: \includegraphics[scale=0.3]{autocafigure-7.ps}
1371: \caption{Binary, next-to-nearest neighbor \CA~2614700074 space-time
1372: diagram (left) filtered by the transducer
1373: $\Filter(\Optimize(\{\mc{D}_1,\mc{D}_2\}))$ (right). The white
1374: regions on the right correspond to the domain $\mc{D}_1$, the gray
1375: to the domain $\mc{D}_2$. The black squares separating these
1376: regions correspond to the interruption symbols $h'(s,s')$ that the
1377: transducer emits between domains.
1378: \label{chaotic-ca-big}}
1379: \end{figure*}
1380:
1381: As illustrated in \Figure~\vref{chaotic-ca-big}, the optimal
1382: transducer $\Filter(\Optimize(\{\mc{D}_1,\mc{D}_2\}))$ filters this
1383: \CA's output well. This illustrates a practical advantage of
1384: multi-regular language filtering: it can detect structure embedded in
1385: randomness. Notice how the filter easily identifies the domains and
1386: sharp boundaries separating them, even though the domains themselves
1387: have positive entropy and their boundaries move stochastically.
1388:
1389: It is worth noting that in place of the gray regions of
1390: \Figure~\ref{chaotic-ca-big} so clearly identified by the optimal
1391: transducer as corresponding to the second domain~$\mc{D}_2$, the
1392: simpler transducer $\Filter(\{\mc{D}_1,\mc{D}_2\})$ produces a regular
1393: checkering of false domain breaks (not pictured). This is because,
1394: when examining the sole forbidden transition $(s,a) = (2,1)$ of the
1395: first domain~$\mc{D}_1$, Algorithm~\ref{alg:transducer-building}
1396: discovers that the first non-empty set $S_{i=1,l=4} = \{2,4,5\}$
contains three candidate resynchronization states. Unfortunately, it
abandons both states~4 and~5, which belong to the second domain, and
instead resynchronizes to the original state~2 itself, because that
state occurs alone in the next set $S_{1,5}$. As a result, the transducer
1401: $\Filter(\{\mc{D}_1,\mc{D}_2\})$ has no transitions leaving the first
1402: domain whatsoever and is therefore incapable of detecting jumps from
1403: the first domain to the second. This is why it prints a checkering of
1404: domain breaks instead of correctly resynchronizing to the second
1405: domain. The optimal transducer does not suffer from this problem,
1406: because Algorithm~\ref{alg:optimize-domains} splits state~2 into
1407: several new ones, from which unambiguous resynchronization to the
1408: appropriate state---2, 4, or 5---is possible.
1409:
1410: \subsection*{Change-Point Problem: Filtering Multi-Stationary Sources}
1411:
1412: Leaving cellular automata behind, consider a binary information source
1413: that hops with low probability between the two three-state domains
1414: $\mc{D}_1$ and $\mc{D}_2$ in \Figure~\vref{fig:nasty-domains} (top).
This source illustrates how subtle multi-regular language filtering,
and in particular the construction of the optimal transducer
$\Filter(\Optimize(\{\mc{D}_i\}))$, can be.
1418:
1419: \begin{figure}[here]
1420: \begin{minipage}[t]{2.4in}
1421: \includegraphics[width=1.9in]{autofigure-12.ps}
1422: \end{minipage}
1423: \begin{minipage}[t]{2.3in}
1424: \includegraphics[width=2.5in]{autofigure-13-touchup.ps}
1425: \end{minipage}
1426: \caption{\label{fig:nasty-domains}Two similar three-state domains
1427: $\mc{D}_1$ (top left) and $\mc{D}_2$ (top right) illustrate how
1428: subtle the construction of the optimal transducer
1429: $\Filter(\Optimize(\{\mc{D}_i\}))$ can be: the automaton $\mc{A}'
1430: := \Det(\bigsqcup \Optimize(\{\mc{D}_1, \mc{D}_2\}))$ (below),
from which the optimal transducer is constructed, has 69
states, whereas the unoptimized automaton $\mc{A} := \Det(\mc{D}_1 \sqcup
\mc{D}_2)$ (not pictured) has 30.}
1434: \end{figure}
1435:
1436: To appreciate how subtle filtering with the domains $\mc{D}_1$ and
1437: $\mc{D}_2$ is---and why the extra states of
1438: $\Optimize(\{\mc{D}_1,\mc{D}_2\})$ are needed to do it---consider the
1439: following. First choose any finite word $w$ of the form:
1440: \begin{align*}
1441: (0^6 + 0^3 1^2)^* 0^3 1^2 0.
1442: \end{align*}
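Words $w$ of this form are easy to recognize mechanically: the blocks
$0^6$ and $0^3 1^2$ both begin with $000$ and are distinguished by
their fourth letter, so every candidate string has at most one parse.
The hypothetical Haskell sketch below checks membership by stripping
the final $0$ and parsing the rest into blocks, the last of which must
be $0^3 1^2$.

\begin{verbatim}
import Data.List (stripPrefix)

-- Split a string into the blocks 0^6 and 0^3 1^2.
-- Both blocks start with 000 and differ in their
-- fourth letter, so the parse, if any, is unique.
blocks :: String -> Maybe [String]
blocks "" = Just []
blocks s =
  case (stripPrefix b6 s, stripPrefix b5 s) of
    (Just rest, _) -> (b6 :) <$> blocks rest
    (_, Just rest) -> (b5 :) <$> blocks rest
    _              -> Nothing
  where
    b6 = "000000"
    b5 = "00011"

-- w = u ++ "0", where u is a non-empty sequence
-- of blocks whose last block is 0^3 1^2.
isW :: String -> Bool
isW s = case reverse s of
  '0' : r -> case blocks (reverse r) of
    Just bs@(_ : _) -> last bs == "00011"
    _               -> False
  _ -> False
\end{verbatim}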
1443:
1444: As the ambitious reader can verify, both of the strings $101111w$ and
1445: $110w$ belong to the domain $\mc{D}_2$. In fact, both correspond to
1446: unique paths through $\mc{D}_1 \sqcup \mc{D}_2$ ending in state~5 of
1447: \Figure~\vref{fig:nasty-domains} (top).
1448:
1449: On the other hand, the strings $01111w1$ and $10w1$ are also domain
1450: words---the first belonging to $\mc{D}_2$, but the second belonging to
1451: $\mc{D}_1$. In fact, $01111w1$ corresponds to a unique path through
1452: $\mc{D}_1 \sqcup \mc{D}_2$ ending in \emph{state~6}, while $10w1$
1453: corresponds to a unique path ending in \emph{state~3}.
1454:
1455: As a result, these four strings are the maximal substrings of the
1456: non-domain strings $101111w1$ and $110w1$, as indicated by the
1457: brackets below:
1458:
1459: \begin{align*}
1460: \overunderbraces{&\br{2}{\parbox{1.7in}{\small corresponds to a
1461: unique path through $\mc{D}_2$ ending in state 5}}&}
1462: {&1\;&0\;1\;1\;1\;1\;w&\;1&} {&&\br{2}{\text{\parbox{1.7in}{\small
1463: corresponds to a unique path through $\mc{D}_2$ ending in
1464: \emph{state 6}}}}}
1465: \end{align*}
1466: \begin{align*}
1467: \overunderbraces{&\br{2}{\text{\parbox{1.7in}{\small corresponds to
1468: a unique path through $\mc{D}_2$ ending in state 5}}}&}
1469: {&1\;&1\;0\;w&\;1&} {&&\br{2}{\text{\parbox{1.7in}{\small
1470: corresponds to a unique path through $\mc{D}_1$ ending in
1471: \emph{state 3}}}}}
1472: \end{align*}
1473:
1474: This example illustrates several important points. First of all, it
1475: shows that when the naive transducer $\Filter(\{\mc{D}_1, \mc{D}_2\})$
1476: reaches the forbidden letter 1 at the end of either of these two
strings, the state~2 reached does not preserve enough information to
resynchronize to the appropriate state (6 for the first string and 3
for the second). As a
1479: result, it must either make a guess---at the risk of choosing incorrectly
1480: and then later reporting an artificial domain break (as in the preceding
1481: cellular automaton example)---or else jump to one of its non-recurrent
1482: states, emitting a potentially long chain of $\lambda$s until it can
1483: re-infer from future input what was already determined by past input.
1484:
As unsettling as this may be, the example illustrates something far
more troubling. Since an arbitrarily long word $w$ can be chosen, it
is impossible to fix the problem by splitting the states of
$\Filter(\{\mc{D}_1,\mc{D}_2\})$ so as to buffer finite windows of
past input. In fact, because $w$ is chosen from a language with
positive entropy, the number of windows that would need to be buffered
grows exponentially with the window length.
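The exponential growth can be made concrete with a short count.
Because $\{0^6, 0^3 1^2\}$ is a prefix code, the number $c(n)$ of
strings of length $n$ in $(0^6 + 0^3 1^2)^*$ satisfies
$c(n) = c(n-6) + c(n-5)$ with $c(0) = 1$, and there are $c(n-6)$ words
$w$ of length $n$. The Haskell sketch below (function names
hypothetical) computes these counts; the ratio of successive counts
tends to the largest real root of $x^6 = x + 1$, roughly $1.13$.

\begin{verbatim}
-- c(n): number of strings of length n in the block
-- language (0^6 + 0^3 1^2)^*.  The blocks form a
-- prefix code, so a non-empty word ends in a unique
-- block of length 6 or 5:  c(n) = c(n-6) + c(n-5).
blockCounts :: [Integer]
blockCounts = map count [0 ..]
  where
    count :: Int -> Integer
    count 0 = 1
    count n = at (n - 6) + at (n - 5)
    at k
      | k < 0     = 0
      | otherwise = blockCounts !! k

-- Words w = u ++ "000110" with u in the block
-- language: there are c(n - 6) of length n.
wCount :: Int -> Integer
wCount n
  | n < 6     = 0
  | otherwise = blockCounts !! (n - 6)
\end{verbatim}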
1492:
1493: At this point achieving optimal resynchronization might seem hopeless,
1494: but it actually is possible. This is what makes
1495: Algorithm~\ref{alg:optimize-domains}---and in particular the proof
1496: that it terminates
1497: (Prop.~\ref{prop:optimization-terminates})---not only
1498: surprising, but extremely useful.
1499:
1500: Indeed, recall that instead of splitting states according to finite
1501: windows, Algorithm~\ref{alg:optimize-domains} splits them according to
1502: entire regular languages of past input and that, by
1503: Prop.~\ref{prop:optimization-terminates}, a finite number of
1504: these regular languages will always suffice to achieve optimal
1505: resynchronization. And so, instead of reaching the same original
1506: state~2 when reading the strings $101111w$ and $110w$, the optimal
1507: transducer $\Filter(\Optimize(\{\mc{D}_1,\mc{D}_2\}))$ reaches two
1508: distinct states $(2,\mc{E})$ and $(2,\mc{E}')$, where $101111w \in
1509: \Lang(\mc{E})$ and $110w \in \Lang(\mc{E}')$. These two split states
1510: are labeled with the enlarged integers 15 and 13, respectively, in
1511: \Figure~\ref{fig:nasty-domains} (bottom), which shows $\mc{A}' :=
1512: \Det(\bigsqcup \Optimize(\{\mc{D}_1, \mc{D}_2\}))$---the automaton
1513: from which $\Filter(\Optimize(\{\mc{D}_1,\mc{D}_2\}))$ is constructed.
As illustrated in that figure, the optimal transducer has $69$
states, whereas the unoptimized automaton $\mc{A} := \Det(\mc{D}_1 \sqcup
\mc{D}_2)$ (not pictured) has $30$.
1517:
1518: \section{Conclusion}
1519:
1520: We posed the multi-regular language filtering problem and presented
1521: two methods for solving it. The first, although providing the ideal
solution, requires a stack, has a worst-case compute time that grows
quadratically in string length, and conditions its output at any
point on arbitrarily long windows of future input. The second
algorithmically constructs a transducer that approximates the first
algorithm. In contrast to the stack-based algorithm it approximates,
1527: however, the transducer requires only a finite amount of memory, runs
1528: in linear time, and gives immediate output for each letter
1529: read---significant improvements for cellular automata structural
1530: analysis and, we suspect, for other applications as well. It is,
1531: moreover, the best possible approximation with these three features.
1532: Finally, we applied both methods to the computational-mechanics
1533: structural analysis of cellular automata and to a version of the
1534: change-point problem from time-series analysis.
1535:
1536: Future directions for this work include generalization both to
1537: probabilistic patterns and transducers and to higher dimensions.
Although both seem difficult, the latter seems more daunting---at
1539: least from the standpoint of transducer construction---because there
1540: is as yet no consensus on how to approach the subtleties of
1541: high-dimensional automata theory. (See, for example,
1542: Refs.~\cite{Feld02b} and \cite{Lindgren98} for discussions of
1543: two-dimensional generalizations of regular languages and patterns.)
1544: Note, however, that the basic notion of maximal substrings underlying
1545: the stack-based algorithm is easily generalized to a broader notion of
1546: higher-dimensional maximal connected subregions, although we suspect
1547: that this generalization will be much more difficult to compute.
1548:
1549: In the introduction we alluded to a range of additional applications
1550: of multi-regular language filtering. Segmenting time series into
1551: structural components was illustrated by the change-point example.
1552: This type of time series problem occurs in many areas, however, such
1553: as in speech processing where the structural components are hidden
1554: Markov models of phonemes, for example, and in image segmentation
1555: where the structural components are objects or even textures. One of
1556: the more promising areas, though, is genomics. In genomics there is
1557: often quite a bit of prior biochemical knowledge about structural
1558: regions in biosequences. Finally, when coupled with statistical
1559: inference of stationary domains, so that the structural components are
1560: estimated from a data stream, multi-regular language filtering should
1561: provide a powerful and broadly applicable pattern detection tool.
1562:
1563: \section*{Acknowledgments}
1564:
1565: This work was supported at the Santa Fe Institute under the
1566: Networks Dynamics Program funded by the Intel Corporation and under
1567: the Computation, Dynamics, and Inference Program via SFI's core
1568: grants from the National Science and MacArthur Foundations. Direct
1569: support was provided by DARPA Agreement F30602-00-2-0583.
1570:
1571: \appendix
1572:
1573: \section{Automata Theory Preliminaries}
1574:
1575: \label{automata-review}
1576:
1577: In this appendix we review the definitions and results from automata
1578: theory that are essential to our exposition. A good source for these
1579: preliminaries is Ref.~\cite{Hopc79}, although its authors employ
1580: altogether different notation, which does not suit our needs.
1581:
1582: \subsection*{Automata}
1583:
1584: An \emph{automaton} $\mc{A}$ over an alphabet~$\Sigma(\mc{A})$ is a
1585: collection of states $S(\mc{A})$, together with subsets
1586: $\mathrm{Start}(\mc{A}), \mathrm{Final}(\mc{A}) \subset S(\mc{A})$,
1587: and a collection of transitions $T(\mc{A}) \subset S(\mc{A}) \times
1588: \Sigma(\mc{A}) \times S(\mc{A})$. We call an automaton \emph{finite}
1589: if both $S(\mc{A})$ and $T(\mc{A})$ are.
1590:
An automaton $\mc{A}$ \emph{accepts} a string $\sigma = a_1 a_2 \cdots
a_n$ if there is a sequence of transitions $\trans{s_0}{a_1}{s_1},
\trans{s_1}{a_2}{s_2}, \dots, \trans{s_{n-1}}{a_n}{s_n} \in T(\mc{A})$
such that $s_0 \in \mathrm{Start}(\mc{A})$ and $s_n \in
\mathrm{Final}(\mc{A})$. Denote the collection of all strings that
1596: $\mc{A}$ accepts by $\Lang(\mc{A})$. Two automata $\mc{A}$ and
1597: $\mc{B}$ are said to be \emph{equivalent} if
1598: $\Lang(\mc{A})=\Lang(\mc{B})$.
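The definitions above translate directly into code. The following
Haskell sketch, independent of the implementation in
Appendix~\ref{implementation} and with hypothetical names throughout,
represents a finite automaton as a record and decides acceptance by
tracking the set of states reachable along paths labeled by the input
read so far. As an example it encodes a two-state domain for
$\sub([0(0+1)]^*)$, the domain of \ECA~18, with hypothetical state
labels.

\begin{verbatim}
import Data.List (nub)

-- Hypothetical record for a finite automaton:
-- states, start and final subsets, and labeled
-- transitions (s, a, s').
data Automaton s a = Automaton
  { states      :: [s]
  , starts      :: [s]
  , finals      :: [s]
  , transitions :: [(s, a, s)]
  }

-- Track the set of states reachable by the input
-- read so far; the string is accepted when that
-- set meets the final states.  This also handles
-- non-deterministic automata.
accepts :: (Eq s, Eq a)
        => Automaton s a -> [a] -> Bool
accepts m =
  any (`elem` finals m) . foldl next (starts m)
  where
    next qs x = nub [ q' | (q, y, q') <- transitions m
                         , q `elem` qs, y == x ]

-- Example: a two-state domain for ECA 18; in
-- state 'M' the next cell must be 0, in state 'F'
-- it may be 0 or 1.
eca18Domain :: Automaton Char Char
eca18Domain = Automaton "MF" "MF" "MF"
  [('M', '0', 'F'), ('F', '0', 'M'), ('F', '1', 'M')]
\end{verbatim}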
1599:
1600: We can think of an automaton as a directed graph whose edges are
1601: labeled with symbols from $\Sigma(\mc{A})$. In this view, an
1602: automaton accepts precisely those strings that correspond to paths
1603: through its graph beginning in its start states and ending in its
1604: final ones.
1605:
An automaton $\mc{A}$ is said to be \emph{semi-deterministic} if any
two of its transitions that agree in their first two slots are
identical; that is, any pair of transitions
$\trans{s_1}{a}{s_2}, \trans{s_1}{a}{s'_2} \in T(\mc{A})$ satisfies $s_2 =
s'_2$. A \emph{deterministic} automaton is one that is
1611: semi-deterministic and that has a single start state. If $\mc{A}$ is
1612: deterministic, then each string of $\Lang(\mc{A})$ corresponds to
1613: precisely one path through $\mc{A}$'s graph.
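Both conditions can be checked directly from the transition list. The
hypothetical Haskell sketch below repeats a minimal automaton record
so that it is self-contained; \texttt{semiDeterministic} compares every
pair of transitions, which is quadratic in their number but mirrors
the definition literally.

\begin{verbatim}
import Data.List (nub)

-- Minimal automaton record (hypothetical).
data Automaton s a = Automaton
  { states      :: [s]
  , starts      :: [s]
  , finals      :: [s]
  , transitions :: [(s, a, s)]
  }

-- Semi-deterministic: any two transitions that
-- agree on source state and letter also agree on
-- the target state.
semiDeterministic :: (Eq s, Eq a)
                  => Automaton s a -> Bool
semiDeterministic m = and
  [ t == t' | (s, x, t)    <- ts
            , (s', x', t') <- ts
            , s == s', x == x' ]
  where
    ts = transitions m

-- Deterministic: semi-deterministic with a
-- single start state.
deterministic :: (Eq s, Eq a)
              => Automaton s a -> Bool
deterministic m =
  semiDeterministic m
    && length (nub (starts m)) == 1
\end{verbatim}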
1614:
1615: For two automata $\mc{A}$ and $\mc{B}$, let $\mc{A} \sqcup \mc{B}$
1616: denote their \emph{disjoint union}---the automaton over the alphabet
1617: $\Sigma(\mc{A}) \cup \Sigma(\mc{B})$ whose states are the disjoint
1618: union of the states of $\mc{A}$ and $\mc{B}$, i.e. $S(\mc{A} \sqcup
1619: \mc{B}) = S(\mc{A}) \sqcup S(\mc{B})$ (and similarly for its start
1620: and final states) and whose transitions are the union of the
1621: transitions of $\mc{A}$ and $\mc{B}$. In this way, $\Lang(\mc{A}
1622: \sqcup \mc{B}) = \Lang(\mc{A}) \cup \Lang(\mc{B})$.
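In code, the disjoint union is most naturally expressed by tagging
states. In the hypothetical Haskell sketch below, the first
automaton's states are wrapped in \texttt{Left} and the second's in
\texttt{Right}, so that states sharing a name in both automata stay
distinct; both automata are assumed to use the same letter type.

\begin{verbatim}
-- Minimal automaton record (hypothetical).
data Automaton s a = Automaton
  { states      :: [s]
  , starts      :: [s]
  , finals      :: [s]
  , transitions :: [(s, a, s)]
  }

-- Tag the first automaton's states with Left and
-- the second's with Right.
both :: [s] -> [t] -> [Either s t]
both xs ys = map Left xs ++ map Right ys

disjointUnion :: Automaton s a -> Automaton t a
              -> Automaton (Either s t) a
disjointUnion m n = Automaton
  { states      = both (states m) (states n)
  , starts      = both (starts m) (starts n)
  , finals      = both (finals m) (finals n)
  , transitions =
      [ (Left p, x, Left q)
      | (p, x, q) <- transitions m ] ++
      [ (Right p, x, Right q)
      | (p, x, q) <- transitions n ]
  }
\end{verbatim}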
1623:
1624: In this terminology, a \emph{domain} is a semi-deterministic finite
1625: automaton $\mc{D}$ whose states are all start and final states, i.e.
1626: $\mathrm{Start}(\mc{D}) = S(\mc{D}) = \mathrm{Final}(\mc{D})$, and
1627: whose graph is strongly connected---i.e., there is a path from any
1628: one state to any other.
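The three defining conditions of a domain can be checked mechanically.
The following hypothetical Haskell sketch tests semi-determinism,
checks that every state is both a start and a final state, and
verifies strong connectivity by confirming, via breadth-first search,
that every state reaches all states.

\begin{verbatim}
import Data.List (nub, sort)

-- Minimal automaton record (hypothetical).
data Automaton s a = Automaton
  { states      :: [s]
  , starts      :: [s]
  , finals      :: [s]
  , transitions :: [(s, a, s)]
  }

-- States reachable from q (including q itself),
-- by breadth-first search over the transitions.
reachable :: Eq s => Automaton s a -> s -> [s]
reachable m q = go [q] [q]
  where
    go seen [] = seen
    go seen (p : todo) = go (seen ++ new) (todo ++ new)
      where
        new = nub [ r | (p', _, r) <- transitions m
                      , p' == p, r `notElem` seen ]

-- A domain: semi-deterministic, every state both a
-- start and a final state, and a strongly
-- connected transition graph.
isDomain :: (Ord s, Eq a) => Automaton s a -> Bool
isDomain m = semiDet && allStartFinal && strong
  where
    ts = transitions m
    qs = sort (nub (states m))
    sameAs xs = sort (nub xs) == qs
    semiDet = and [ t == t' | (s, x, t)    <- ts
                            , (s', x', t') <- ts
                            , s == s', x == x' ]
    allStartFinal =
      sameAs (starts m) && sameAs (finals m)
    strong = all (sameAs . reachable m) qs
\end{verbatim}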
1629:
1630: Finally, a domain $\mc{D}$ is said to be \emph{minimal} if all
1631: equivalent domains $\mc{D}'$ satisfy $|S(\mc{D})| \le |S(\mc{D}')|$.
1632:
1633:
1634: \subsection*{Standard Results}
1635:
1636: \begin{lem}
1637: \label{subset-construction}
1638: Every automaton $\mc{A}$ is equivalent to a deterministic automaton
1639: $\Det(\mc{A})$. Moreover, $\Det(\mc{A})$'s states correspond
1640: uniquely to collections of $\mc{A}$'s states; in other words, there
1641: is a canonical injection $S(\Det(\mc{A}))
1642: \stackrel{\psi_{\Det(\mc{A})}}{\hookrightarrow} \{ S : S \subset
1643: S(\mc{A}) \}$.
1644: \end{lem}
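The automaton $\Det(\mc{A})$ of Lemma~\ref{subset-construction} is
given by the classical subset construction. The hypothetical Haskell
sketch below represents each state of $\Det(\mc{A})$ as a sorted list
of $\mc{A}$'s states, which makes the injection $\psi_{\Det(\mc{A})}$
explicit, and builds only subsets reachable from the start set. It
omits the empty subset, which is harmless because semi-determinism
allows missing transitions.

\begin{verbatim}
import Data.List (nub, sort)

-- Minimal automaton record (hypothetical).
data Automaton s a = Automaton
  { states      :: [s]
  , starts      :: [s]
  , finals      :: [s]
  , transitions :: [(s, a, s)]
  }

-- Subset construction: a state of Det(A) is a
-- sorted, duplicate-free list of A's states; only
-- non-empty subsets reachable from the start set
-- are built.
det :: (Ord s, Ord a)
    => Automaton s a -> Automaton [s] a
det m = Automaton qs [start] (filter isFinal qs) ts
  where
    norm :: Ord b => [b] -> [b]
    norm = sort . nub
    start    = norm (starts m)
    alphabet = norm [ x | (_, x, _) <- transitions m ]
    move qset x = norm [ r | (p, y, r) <- transitions m
                           , p `elem` qset, y == x ]
    isFinal = any (`elem` finals m)
    (qs, ts) = explore [start] [] []
    explore [] seen acc = (seen, acc)
    explore (q : todo) seen acc
      | q `elem` seen = explore todo seen acc
      | otherwise =
          explore (todo ++ [ t | (_, _, t) <- out ])
                  (seen ++ [q]) (acc ++ out)
      where
        out = [ (q, x, t) | x <- alphabet
                          , let t = move q x
                          , not (null t) ]

-- Example: a two-state domain for ECA 18.
eca18Domain :: Automaton Char Char
eca18Domain = Automaton "MF" "MF" "MF"
  [('M', '0', 'F'), ('F', '0', 'M'), ('F', '1', 'M')]
\end{verbatim}

Applied to \texttt{eca18Domain}, \texttt{det} yields the three states
\texttt{"FM"}, \texttt{"M"}, and \texttt{"F"} with five transitions.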
1645:
1646: % {\bf Define, discuss, or at least recall \emph{canonical injection}.}
1647:
1648: \begin{lem}
1649: \label{intersection-construction}
1650: If $\mc{A}$ and $\mc{B}$ are automata, then there is an automaton
1651: $\mc{A} \cap \mc{B}$ that accepts precisely those strings accepted
1652: by both $\mc{A}$ and $\mc{B}$; that is, $\Lang(\mc{A} \cap \mc{B}) =
1653: \Lang(\mc{A}) \cap \Lang(\mc{B})$. If $\mc{A}$ and $\mc{B}$ are
1654: deterministic, then so is $\mc{A} \cap \mc{B}$. Moreover, there is
1655: a canonical injection $S(\mc{A} \cap \mc{B}) \hookrightarrow
1656: S(\mc{A}) \times S(\mc{B})$, which restricts to injections
1657: $\mathrm{Start}(\mc{A} \cap \mc{B}) \hookrightarrow
1658: \mathrm{Start}(\mc{A}) \times \mathrm{Start}(\mc{B})$ and
1659: $\mathrm{Final}(\mc{A} \cap \mc{B}) \hookrightarrow
1660: \mathrm{Final}(\mc{A}) \times \mathrm{Final}(\mc{B})$.
1661: \end{lem}
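Lemma~\ref{intersection-construction} is realized by the standard
product construction. In the hypothetical Haskell sketch below, states
of $\mc{A} \cap \mc{B}$ are pairs of states, a pair moves on a letter
exactly when both components do, a pair is final when both components
are, and only pairs reachable from the start pairs are built. If both
inputs are semi-deterministic with a single start state, so is the
result.

\begin{verbatim}
-- Minimal automaton record (hypothetical).
data Automaton s a = Automaton
  { states      :: [s]
  , starts      :: [s]
  , finals      :: [s]
  , transitions :: [(s, a, s)]
  }

-- Product construction: states are reachable
-- pairs (p, q), stepping on x when both do.
intersection :: (Eq s, Eq t, Eq a)
             => Automaton s a -> Automaton t a
             -> Automaton (s, t) a
intersection m n =
  Automaton qs st (filter isFinal qs) ts
  where
    st = [ (p, q) | p <- starts m, q <- starts n ]
    isFinal (p, q) =
      p `elem` finals m && q `elem` finals n
    out (p, q) =
      [ ((p, q), x, (p', q'))
      | (p0, x, p') <- transitions m, p0 == p
      , (q0, y, q') <- transitions n, q0 == q
      , x == y ]
    (qs, ts) = explore st [] []
    explore [] seen acc = (seen, acc)
    explore (r : todo) seen acc
      | r `elem` seen = explore todo seen acc
      | otherwise =
          explore (todo ++ [ r' | (_, _, r') <- os ])
                  (seen ++ [r]) (acc ++ os)
      where
        os = out r
\end{verbatim}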
1662:
1663: %% \begin{lem}
1664: %% Every domain $\mc{D}$ is equivalent to a unique minimal domain
1665: %% $\mrm{Min}(\mc{D})$. Moreover, $\mrm{Min}(\mc{D})$ is isomorphic to
1666: %% the recurrent component of the automaton
1667: %% $\Det(\mrm{Reverse}(\Det(\mc{D})))$ where $\mrm{Reverse}(\bullet)$
1668: %% is the automaton obtained by reversing the directions $\bullet$'s
1669: %% transitions (although there are much faster ways to compute it).
1670: %% \end{lem}
1671:
1672: \subsection*{Transducers}
1673:
1674: A \emph{transducer} $\mc{T}$ from an alphabet $\Sigma(\mc{T})$ to an
1675: alphabet $\Sigma'(\mc{T})$ is an automaton on the alphabet
1676: $\Sigma(\mc{T}) \times \Sigma'(\mc{T})$. We will use the more
1677: traditional notation $(s,b|c,s')$ in place of $(s,(b,c),s') \in
1678: T(\mc{T})$.
1679:
1680: The \emph{input} of a transducer $\mc{T}$ is the automaton
1681: $\Input(\mc{T})$ whose states, start states, and final states are the
1682: same as $\mc{T}$'s, but whose transitions are given by
1683: $T(\Input(\mc{T})) := \{ \trans{s}{b}{s'} : \trans{s}{b|c}{s'} \in
1684: T(\mc{T}) \}$. Similarly, the \emph{output} of a transducer $\mc{T}$
1685: is the automaton $\Output(\mc{T})$ whose transitions are given by
1686: $T(\Output(\mc{T})) := \{ \trans{s}{c}{s'} : \trans{s}{b|c}{s'} \in
1687: T(\mc{T}) \}$.
1688:
A transducer $\mc{T}$ is said to be \emph{well defined} if
$\Input(\mc{T})$ is deterministic; such a transducer
determines a function from $\Lang(\Input(\mc{T}))$ onto
$\Lang(\Output(\mc{T}))$.
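Under these definitions a transducer is an automaton over pairs of
letters, and $\Input(\mc{T})$ and $\Output(\mc{T})$ are projections.
The hypothetical Haskell sketch below encodes this with a type
synonym, the two projections, and a function that runs a transducer
with deterministic input by following its unique path and collecting
output letters. The example transducer, which outputs the running
parity of the 1s read, is ours and is unrelated to the filters of
this paper.

\begin{verbatim}
-- Minimal automaton record (hypothetical).
data Automaton s a = Automaton
  { states      :: [s]
  , starts      :: [s]
  , finals      :: [s]
  , transitions :: [(s, a, s)]
  }

-- A transducer: an automaton whose letters are
-- input/output pairs (b, c).
type Transducer s b c = Automaton s (b, c)

inputOf :: Transducer s b c -> Automaton s b
inputOf m =
  Automaton (states m) (starts m) (finals m)
    [ (p, b, q) | (p, (b, _), q) <- transitions m ]

outputOf :: Transducer s b c -> Automaton s c
outputOf m =
  Automaton (states m) (starts m) (finals m)
    [ (p, c, q) | (p, (_, c), q) <- transitions m ]

-- Follow the unique path labeled by the input,
-- taking the first matching transition at each
-- step; Nothing if the input is not accepted.
runT :: (Eq s, Eq b)
     => Transducer s b c -> [b] -> Maybe [c]
runT m input = case starts m of
  [q0] -> go q0 input
  _    -> Nothing
  where
    go q [] | q `elem` finals m = Just []
            | otherwise         = Nothing
    go q (b : bs) =
      case [ (c, q') | (p, (b', c), q') <- transitions m
                     , p == q, b' == b ] of
        (c, q') : _ -> (c :) <$> go q' bs
        []          -> Nothing

-- Example: running parity of the 1s read so far.
parityT :: Transducer Char Char Char
parityT = Automaton "01" "0" "01"
  [ ('0', ('0', '0'), '0'), ('0', ('1', '1'), '1')
  , ('1', ('0', '1'), '1'), ('1', ('1', '0'), '0') ]
\end{verbatim}

For instance, \texttt{runT parityT "0110"} evaluates to
\texttt{Just "0100"}.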
1693:
1694: % In fact, if $\mc{A}$ is an automaton and $\Lang(\mc{A}) \subset
1695: % \Lang(\Input(\mc{T}))$, then there is an automaton $\mc{T}[\mc{A}]$
1696: % accepting the images under $\mc{T}$ of the strings accepted by
1697: % $\mc{A}$, i.e. $\Lang(\mc{T}[\mc{A}]) = \{ \mc{T}(\sigma) : \sigma \in
1698: % \Lang(\mc{A})\}$. Although this automaton is not generally
1699: % deterministic, there is a canonical injection $S(\mc{T}[\mc{A}])
1700: % \hookrightarrow S(\mc{T}) \times S(\mc{A})$.
1701:
1702: \label{automata-review-end}
1703:
1704: \section{Implementation}
1705:
1706: \label{implementation}
1707:
1708: % NOTE: AutomataCode.tex is generated from the literate Haskell file
1709: % AutomataCode.lhs by the program lhs2TeX. Therefore, you should make changes
1710: % only to AutomataCode.lhs.
1711:
1712: \input{AutomataCode.tex}
1713:
1714: %\bibliographystyle{amsplain}
1715: \bibliographystyle{unsrt}
1716: \bibliography{apd.refs}
1717:
1718: \end{document}
1719: