1: \documentclass[twocolumn,10pt]{article}
2:
3: \usepackage{latex8,times,latexsym,epsfig,fancyheadings,alltt,a4wide}
4:
5: \pretolerance 250
6: \tolerance 500
7: \hyphenpenalty 100
8: \exhyphenpenalty 100
9: \doublehyphendemerits 7500
10: \finalhyphendemerits 7500
11: \brokenpenalty 10000
12: \lefthyphenmin 2
13: \righthyphenmin 3
14: \widowpenalty 10000
15: \clubpenalty 10000
16: \displaywidowpenalty 10000
17: \looseness 1
18:
19: \usepackage{url}
20: \def\sburl#1{[\url{#1}]}
21:
22: \def\expr#1{\texttt{#1}}
23: \def\predicate#1{\texttt{#1}}
24:
25: % the sequence operator in an emu query
26: \newcommand{\queryseq}{-$>$\ }
27: % the domination operator
28: \newcommand{\querydom}{\^{}\ }
29: % the association operator
30: \newcommand{\queryassoc}{$=>$\ }
31: % and disjunction
32: \newcommand{\queryor}{$|$}
33:
34: \pagestyle{empty}
35:
36: \def\smtt#1{{\small\tt #1}}
37: \def\note#1{{\bf (#1)}}
38: \newenvironment{sv}{\small\begin{alltt}}{\end{alltt}\normalsize}
39: \def\mb#1{{\mbox{\scriptsize #1}}}
40: \def\rn#1#2{\delta_{#1\rightarrow #2}}
41: \def\arXivhack{\vspace{-6pt}}
42:
43: \title{Querying Databases of Annotated Speech}
44: \author{Steve Cassidy\\
45: Department of Linguistics\\
46: Macquarie University\\
47: Sydney, NSW 2109,\\
48: Australia\\
49: \smtt{steve.cassidy@mq.edu.au}\\
50: \and
51: Steven Bird\\
52: Linguistic Data Consortium,\\
53: University of Pennsylvania, \\
54: 3615 Market St, Suite 200, \\
55: Philadelphia, PA 19104-2608, USA \\
56: \smtt{steven.bird@ldc.upenn.edu}
57: }
58:
59: %\date{\today}
60:
61: \begin{document}
62:
63: \maketitle
64: \thispagestyle{empty}
65:
66: \begin{abstract}
67: Annotated speech corpora are databases consisting of signal data
68: along with time-aligned symbolic `transcriptions'. Such databases
69: are typically multidimensional, heterogeneous and dynamic. These
70: properties present a number of tough challenges for representation
and query. The temporal nature of the data adds a further layer
72: of complexity. This paper presents and harmonises two independent
73: efforts to model annotated speech databases, one at Macquarie
74: University and one at the University of Pennsylvania.
75: Various query languages are described, along
76: with illustrative applications to a variety of analytical problems.
77: The research reported here forms a part of several ongoing projects
78: to develop platform-independent open-source tools for creating,
79: browsing, searching, querying and transforming linguistic databases,
80: and to disseminate large linguistic databases over the internet.
81: \end{abstract}
82:
83: \Section{Databases of Annotated Speech Recordings}
84:
85: Annotated corpora have been an essential component of research and
86: development in language-related technologies for some years.
87: Text corpora have been used for developing information retrieval
88: and summarisation software (e.g. MUC \cite{MUC7}, TREC \cite{Voorhees98}),
89: automatic taggers and parsers and machine translation systems
90: \cite{ChurchMercer93}. In a similar way, annotated
91: speech corpora have proliferated and have found uses across a rapidly
92: expanding set of languages, disciplines and technologies
93: \sburl{www.ldc.upenn.edu/annotation/}.
94: Over the last 7 years, the Linguistic Data Consortium (LDC)
95: has published over 150 text and speech databases
96: \sburl{www.ldc.upenn.edu/Catalog/}.
97:
98: Typically, such databases are specified at the level of file
99: formats. Linguistic content is annotated with a variety of
100: tags, attributes and values, with a specified syntax and semantics.
101: Tools are developed for each new format and linguistic domain
102: on an ad hoc basis. These systems are
103: akin to the databases of the 1960s. There is a physical
104: representation along with a hand-crafted program offering a
105: single view on the data. Recently, the authors have shown how
106: the three-level architecture and the relational model can be
107: applied to annotated speech databases
108: \cite{BirdLiberman99,Cassidy99}. The goal of this paper is
109: to illustrate our two approaches and to describe ongoing research
110: on query algebras.
111:
112: Before presenting the models we give an example of a collection
113: of speech annotations. This illustrates the diversity of
114: the physical formats and gives an idea of the challenge involved
115: in providing a general-purpose logical characterisation of the
116: data. The Boston University Radio Speech Corpus consists of
117: 7 hours of radio news stories
118: \sburl{www.ldc.upenn.edu/Catalog/LDC96S36.html}.
119: The annotations include four
120: types of information: orthographic transcripts, broad phonetic
121: transcripts (including main word stress), and two kinds of prosodic
122: annotation, all time-aligned to the digital audio files. The two kinds
123: of prosodic annotation implement the system known as ToBI --
124: Tones and Break Indices
125: \sburl{www.ling.ohio-state.edu/phonetics/E_ToBI/}.
126: We have added three further annotations: coreference annotation and
127: named entity annotation in the style of MUC-7
128: \sburl{www.muc.saic.com/proceedings/muc_7_toc.html}, and
129: syntactic structures in the style of the Penn TreeBank \cite{Marcus93}.
130: Fragments of the physical data are shown in Figure~\ref{fig:bu-speech}.
131:
132: \begin{figure*}
133: {\scriptsize\setlength{\tabcolsep}{.5\tabcolsep}
134: \begin{tabular}{l|l|l}
135: \begin{minipage}[t]{.325\linewidth}
136: {\small Coreference Annotation}
137:
138: {\tiny
139: \begin{alltt}
140: <COREF ID="2" MIN="woman">
141: This woman</COREF>
142: receives three hundred dollars
143: a month under
144: <COREF ID="5">
145: General Relief</COREF>, plus
146: <COREF ID="16"
147: MIN="four hundred dollars">
148: four hundred dollars a month in
149: <COREF ID="17"
150: MIN="benefits" REF="16">
151: A.F.D.C. benefits</COREF>
152: </COREF> for
153: <COREF ID="9" MIN="son">
154: <COREF ID="3" REF="2">
155: her</COREF> son
156: </COREF>, who is
157: <COREF ID="10" MIN="citizen" REF="9">
158: a U.S. citizen</COREF>.
159: <COREF ID="4" REF="2">
160: She</COREF>'s among
161: <COREF ID="18" MIN="aliens">
162: an estimated five hundred illegal
163: aliens on
164: <COREF ID="6" REF="5">
165: General Relief</COREF>
166: out of
167: <COREF ID="11" MIN="population">
168: <COREF ID="13" MIN="state">
169: the state</COREF>'s
170: total illegal immigrant
171: population of
172: <COREF ID="12" REF="11">
173: one hundred thousand
174: </COREF>
175: </COREF>
176: </COREF>.
177: <COREF ID="7" REF="5">
178: General Relief</COREF>
179: is for needy families and
180: unemployable adults who
181: \end{alltt}}
182: \end{minipage}
183: &
184: \begin{minipage}[t]{.25\linewidth}
185: {\small Named Entity\\ Annotation}
186:
187: {\tiny
188: \begin{alltt}
189: This woman receives
190: <b_numex TYPE="MONEY">
191: three hundred dollars
192: <e_numex>
193: a month under General
194: Relief, plus
195: <b_numex TYPE="MONEY">
196: four hundred dollars
197: <e_numex>
198: a month in A.F.D.C.
199: benefits for her
200: son, who is a
201: <b_enamex TYPE="LOCATION">
202: U.S.
203: <e_enamex>
204: citizen. brth She's among
205: an estimated five hundred
206: illegal aliens on General
207: Relief brth out of the
208: state's total illegal
209: immigrant population of
210: one hundred thousand. brth
211: General Relief is for
212: needy families and
213: unemployable adults brth
214: who don't qualify for other
215: public assistance. brth
216: <b_enamex TYPE="ORGANIZATION">
217: Welfare Department
218: <e_enamex>
219: spokeswoman
220: <b_enamex TYPE="PERSON">
221: Michael Reganburg
222: <e_enamex>
223: brth says the state will
224: save about
225: <b_numex TYPE="MONEY">
226: one million dollars
227: <e_numex>
228: a year if illegal aliens
229: are denied General Relief.
230: \end{alltt}}
231: \end{minipage}
232: &
233: \begin{minipage}[t]{.325\linewidth}
234: {\small Penn Treebank Annotation}
235:
236: {\tiny
237: \begin{alltt}
238: ((S
239: (NP-SBJ This woman)
240: (VP receives
241: (NP
242: (NP
243: (NP (QP three hundred) dollars)
244: (NP-ADV a month)
245: (PP under
246: (NP General Relief))) , plus
247: (NP
248: (NP (QP four hundred) dollars
249: )
250: (NP-ADV a month)
251: (PP in
252: (NP A.F.D.C. benefits))))
253: (PP for
254: (NP
255: (NP her son) ,
256: (SBAR (WHNP-1 who)
257: (S (NP-SBJ *T*-1)
258: (VP is
259: (NP-PRD a U.S. citizen)))))))
260: .
261: ))
262: ((S
263: (NP-SBJ She)
264: (VP 's
265: (PP-PRD among
266: (NP (NP an estimated
267: (QP five hundred) illegal aliens)
268: (PP on
269: (NP General Relief))
270: (PP out of
271: (NP
272: (NP
273: (NP the state 's)
274: total illegal immigrant population)
275: (PP of
276: (NP
277: (QP one hundred thousand))))))))
278: .
279: \end{alltt}}
280: \end{minipage}
281: \end{tabular}
282:
283: \vspace*{2ex}\hrule\vspace*{2ex}
284:
285: \begin{tabular}{l|l|l|l}
286: \begin{minipage}[t]{.21\linewidth}
287: {\small Word-Level Annotation}
288:
289: \begin{alltt}
290: 0.320000 This
291: 0.620000 woman
292: 1.120000 receives
293: 1.370000 three
294: 1.670000 {hundred
295: 2.020000 }dollars
296: 2.060000 a
297: 2.450000 month
298: 2.740000 under
299: 3.280000 General
300: 3.800000 Relief
301: 4.310000 plus
302: 4.520000 four
303: 4.800000 hundred
304: 5.160000 dollars
305: 5.190000 a
306: 5.480000 month
307: 5.610000 in
308: 6.340000 A.F.D.C.
309: 6.870000 benefits
310: 7.060000 for
311: 7.190000 her
312: 7.620000 son
313: 7.830000 who
314: 7.970000 is
315: 8.020000 a
316: \end{alltt}
317: \end{minipage}
318: &
319: \begin{minipage}[t]{.25\linewidth}
320: {\small Syllable Annotation}
321:
322: \begin{alltt}
323: H# 0 2
324: H# 2 3
325: >endsil
326: DH 5 14 4.182398
327: IH 19 6 -0.184139
328: S 25 8 -0.387113
329: >This
330: W 33 6 -0.495798
331: UH+1 39 3 -0.792806
332: M 42 7 0.042605
333: >
334: EN 49 14 0.395379
335: >woman
336: R 63 3 -0.996359
337: IY 66 7 -0.658371
338: >
339: S 73 12 0.865892
340: IY+1 85 13 0.815127
341: V 98 9 0.815878
342: Z 107 6 -0.563102
343: >receives
344: TH 113 9 0.506469
345: R 122 5 -0.359288
346: IY+1 127 11 0.323961
347: >three
348: HH 138 3 -0.905714
349: \end{alltt}
350: \end{minipage}
351: &
352: \begin{minipage}[t]{.33\linewidth}
353: {\small Tonal Annotation}
354:
355: \begin{alltt}
356: 0.373684 HiF0
357: 0.493698 H*
358: 0.915000 !H*
359: 1.100000 !H-
360: 1.325000 L+H*
361: 1.389472 HiF0
362: 1.716865 L*
363: 2.178711 !H*
364: 2.434735 L-L%
365: 2.969376 H*
366: 3.552627 HiF0
367: 3.630000 H* ; !HL%, maybe LL% ?
368: 3.770074 H-L%
369: 4.440000 H*
370: 4.478946 HiF0
371: 5.330000 L*
372: 5.445000 L-H%
373: 5.709989 H*
374: 6.300000 H*
375: 6.331575 HiF0
376: 6.740000 L-H%
377: 7.336837 HiF0
378: 7.402120 H*
379: 7.607943 L-L%
380: 8.301393 H*
381: 8.510248 HiF0
382: 10.105260 HiF0
383: \end{alltt}
384: \end{minipage}
385: &
386: \begin{minipage}[t]{.2\linewidth}
387: {\small Part-of-speech\\ Annotation}
388:
389: \begin{alltt}
390: This DT
391: woman NN
392: receives VBZ
393: three CD
394: hundred CD
395: dollars NNS
396: a DT
397: month NN
398: under IN
399: General NP
400: Relief, NP
401: plus CC
402: four CD
403: hundred CD
404: dollars NNS
405: a DT
406: month NN
407: in IN
408: A.F.D.C. NP
409: benefits NNS
410: for IN
411: her PP\$
412: son, NN
413: who WP
414: is VBZ
415: \end{alltt}
416: \end{minipage}
417: \end{tabular}}
418:
419: \caption{Multiple Annotations of the Boston University Radio Speech Corpus}\label{fig:bu-speech}
420: \vspace*{2ex}\hrule
421: \end{figure*}
422:
423: %\begin{figure}
424: %\centerline{\epsfig{figure=bu4.ps,width=\linewidth}}
425: %\vspace{3cm}
426: %\caption{Visualization for BU Example}\label{fig:bu-ag}
427: %\vspace*{2ex}\hrule
428: %\end{figure}
429:
430:
Coreference annotation (Figure~\ref{fig:bu-speech}, top left) assigns a
unique identifier to each noun phrase, together with a reference attribute
which links an expression such as a pronoun to its antecedent. Each set of
coreferring expressions is considered to be an equivalence class. Named-entity
annotation (top centre) identifies and classifies numerical and name
expressions. Penn Treebank annotation (top right) provides a syntactic parse of
each sentence. The word-level annotation (bottom left) gives the end
time of each word (an offset in seconds into the associated signal data).
The syllable annotation gives the Arpabet phonetic symbols
(see \sburl{www.ldc.upenn.edu/doc/timit/phoncode.doc}).
The tonal annotation provides time points and intonational units, and the
part-of-speech annotation (bottom right) specifies the syntactic
category of each word. This is but a small sample of the bazaar of
data formats.
445: \arXivhack
446:
447: \Section{Data Models for Speech Databases}
448:
449: Two database models for multi-layered speech annotations have been
450: developed by the authors. The Emu model (Macquarie) organises the data
451: primarily in terms of its hierarchical structure, while the annotation
452: graph model (Penn) foregrounds the temporal structure. In separate
453: work we demonstrate the expressive equivalence of the two models
454: \cite{BirdLiberman99,CassidyHarrington99}. Here we give a brief
455: overview of both models. In the remainder of this paper we will
456: consider mainly the annotation graph data model, while the Emu system
457: serves as an example of a working speech database system.
458:
459: \SubSection{The Emu model}
460:
461: The Emu speech database system \sburl{www.shlrc.mq.edu.au/emu}
462: \cite{CassidyHarrington96,CassidyHarrington99} provides tools for creation, query
463: and analysis of data from annotated speech databases. Emu is
464: implemented as a core C++ library and a set of extensions to the Tcl
465: scripting language which provide a set of basic operations on speech
466: annotations. Emu provides a flexible annotation model into which a
467: number of existing label file formats can be read.
468:
469: The Emu annotation model is based on a set of \emph{levels} which
470: represent different types of linguistic data such as words, phonemes or
471: pitch events. Each level contains a set of \emph{tokens} which have
472: one or more \emph{labels} and optionally a start and end time relative
to an associated speech signal. Within a level, tokens are stored as a
partial order representing their sequence in the annotation: each token
may have zero or more previous and next tokens. The partial ordering
must respect timing information if it is present in the tokens: that
is, a token cannot follow a token with a later start time.
478:
479: Within and between levels, tokens may be related by either
480: \emph{domination} or \emph{association} relations. Domination
481: relations relate a parent token to an ordered sequence of constituent
482: child tokens and imply that the start and end times of the parent could
be inferred from those of the children. Association relations have no
built-in semantics and can be used for any application-specific
relation, such as that between a word and a tone target which denotes
the point at which word stress is realised
(Figure~\ref{fig:emu-timit}). Relations may be defined between any
pair of levels, which allows Emu to handle intersecting hierarchies such
as that illustrated in Figure~\ref{fig:emu-timit}.
490:
491: \begin{figure*}[tbp]
492: \centerline{\epsfig{file=emu-timit,width=0.75\linewidth}}
\caption{An example utterance from the TIMIT database which has been
  augmented with both a syntactic annotation and a ToBI-style
  intonational annotation. The names of the levels are shown on the
  left; the Word level has been duplicated to show the links to both
  the syntactic and intonational hierarchies. The single Tone event H*
  is associated with the word `dark'. Time information at the phoneme
  level is used to derive times for all higher levels.}
500: \label{fig:emu-timit}
501: \vspace*{2ex}\hrule
502: \end{figure*}
503:
504:
505: \SubSection{The annotation graph model}
506:
507: A second general purpose model supporting multiple independent
508: hierarchical transcriptions of the same signal data is known as the
509: {\it annotation graph} \cite{BirdLiberman99dtag,BirdLiberman99}.
510: This model forms the heart of a joint initiative between LDC, NIST
511: \sburl{www.nist.gov} and MITRE \sburl{www.mitre.org}
512: to develop an architecture and tools for linguistic
513: analysis systems (ATLAS), and an NSF-sponsored project between
514: LDC, the Penn database group, and the CMU Psychology and Informedia
515: departments, to develop a multimodal database of communicative
516: interaction called Talkbank \sburl{www.talkbank.org}.
517:
Annotation graphs are labelled directed acyclic graphs (DAGs) with time references on some of the
519: nodes. Bird and Liberman have demonstrated that annotation graphs are
520: sufficiently expressive to encompass the full range of current speech
521: annotation practice. A simple example of an annotation graph is shown
522: in Figure~\ref{fig:ag-timit}, for a corpus known as TIMIT \cite{TIMIT86}.
523: Annotation graphs (AGs) have the following structure.
524: Let $L = \bigotimes L_i$ be the label data which occurs on the arcs of
525: an AG. The nodes $N$ of an AG reference signal data by virtue of a
526: function mapping nodes to time offsets $T$. AGs are now defined as
527: follows:
528:
529: \newtheorem{defn}{Definition}
530: \newtheorem{ex}{Example}
531:
532: \begin{defn}
533: An \textbf{annotation graph} $G$ over a label set $L$ and a
534: timeline $T$ is a 3-tuple
535: $\left< N, A, \tau \right>$ consisting of a node set $N$,
536: a collection of arcs $A$ labelled with elements of $L$,
537: and a time function $\tau$, which satisfies the following conditions:
538:
539: \begin{enumerate}\setlength{\itemsep}{0pt}
540:
\item $\left< N, A \right>$ is an acyclic digraph
labelled with elements of $L$, and
containing no nodes of degree zero;

\item $\tau: N \rightharpoonup T$ is a partial function
such that, for any path from node $n_1$ to $n_2$ in $A$,
if $\tau(n_1)$ and $\tau(n_2)$ are defined, then
$\tau(n_1) \leq \tau(n_2)$.
549:
550: \end{enumerate}
551: \end{defn}
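
As a small worked check of condition~2 (an illustration using the node
times tabulated in Figure~\ref{fig:graph-table}, not part of the formal
definition):

\begin{ex}
The phoneme arcs \expr{sh} and \expr{iy} form a path from node 1 through
node 2 to node 3, with $\tau(1) = 2360$, $\tau(2) = 3270$ and
$\tau(3) = 5200$, so that
\[
\tau(1) \leq \tau(2) \leq \tau(3).
\]
The tone arc \expr{H*} runs from node 19 to node 20, where
$\tau(19) = \tau(20) = 13650$; since the condition requires only
$\leq$, this zero-duration arc is also admissible.
\end{ex}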
552:
553: \begin{figure*}[tbp]
554: \centerline{\epsfig{file=ag-timit,width=\linewidth}}
555: \caption{TIMIT Graph Structure}\label{fig:ag-timit}
556: \vspace*{2ex}\hrule
557: \end{figure*}
558: %% note I've modified the emu example to associate H* with 'dark'
559: %% instead of aa -- I think this is fits ToBI better
560:
561: Note that AGs may be disconnected or empty, and that they must
562: not have orphan nodes. The AG corresponding to the Emu annotation
563: structure in Figure~\ref{fig:emu-timit}, for the first five
564: words of a TIMIT annotation, is given in Figure~\ref{fig:ag-timit}.
565: The arc types are interpreted as follows:
566: \expr{S} -- syntax;
567: \expr{W} -- word;
568: \expr{P} -- phoneme;
569: \expr{T} -- tone;
570: \expr{Imt} -- intermediate phrase;
571: \expr{Itl} -- intonational phrase.
572: \arXivhack
573:
574: \Section{Annotations as Relational Tables}
575:
576: Annotation data expressed in either the Emu or annotation graph data
577: models can be trivially recast as a set of relational tables
578: \cite{Cassidy99}. For the purposes of this paper it is instructive to
579: consider the relational form of annotation data in order to explore the
580: requirements for a query language for these databases.
581:
An annotation graph can be represented as a pair of tables, one for the
arc relation and one for the time relation. Each tuple of the arc relation
has six fields: an arc id, a source node id, a target node id, and three
labels taken from the sets $L_1, L_2, L_3$ respectively. The choice of
three label positions is somewhat arbitrary, but it seems to be
both necessary and sufficient for the various annotation structures
considered here.
589:
590: We let $L_1$ be the set of types of transcript information
591: (e.g.\ `word', `syllable', `phoneme'), and let
592: $L_2$ be the substantive transcript element (e.g.\ particular
593: words, phonetic symbols, and so on). We let $L_3$ be the names
594: of equivalence classes, used here to model so-called
595: `phonological association'. (This kind of association is
596: discussed in depth in \cite{Bird95}.)
597: Let $T$ be the set
598: of non-negative integers, the sample numbers.
Figure~\ref{fig:graph-table} gives an example for the TIMIT data of
Figures~\ref{fig:emu-timit} and~\ref{fig:ag-timit}.
601:
602: \begin{figure*}[tbp]
603: {\scriptsize
604: \begin{minipage}{\textwidth}
605: \begin{tabular}[t]{c|cccccc}
606: {\it Arc} &
607: $id$ &
608: $X$ &
609: $Y$ &
610: $L_1$ &
611: $L_2$ &
612: $L_3$ \\
613: \cline{2-7}
614:
615: &1 & 0 & 1 & P & h\# & \\
616: &2 & 1 & 2 & P & sh & \\
617: &3 & 2 & 3 & P & iy & \\
618: &4 & 3 & 4 & P & hv & \\
619: &5 & 4 & 5 & P & ae & \\
620: &6 & 5 & 6 & P & dcl & \\
621: &7 & 6 & 7 & P & y & \\
622: &8 & 7 & 8 & P & axr & \\
623: &9 & 8 & 9 & P & dcl & \\
624: &10 & 9 & 10 & P & d & \\
625: &11 & 10 & 11 & P & aa & \\
626: &12 & 11 & 12 & P & r & \\
627: &13 & 12 & 13 & P & kcl & \\
628: &14 & 13 & 14 & P & k & \\
629: &15 & 14 & 15 & P & s & \\
630: &16 & 15 & 16 & P & uw & \\
631: &17 & 16 & 17 & P & q &
632: \end{tabular}\hfil
633: \begin{tabular}[t]{c|cccccc}
634: {\it Arc} &
635: $id$ &
636: $X$ &
637: $Y$ &
638: $L_1$ &
639: $L_2$ &
640: $L_3$ \\
641: \cline{2-7}
642:
643: &18 & 1 & 3 & W & she & \\
644: &19 & 3 & 6 & W & had & \\
645: &20 & 6 & 8 & W & your & \\
646: &21 & 8 & 14 & W & dark & 1 \\
647: &22 & 14 & 17 & W & suit & \\
648: &23 & 1 & 18 & S & S & \\
649: &24 &3 & 18 & S & VP & \\
650: &25 &1 & 3 & S & NP & \\
651: &26 &3 & 6 & S & V & \\
652: &27 &6 & 17 & S & NP & \\
653:
654: &28 &1 & 17 & Imt & L- & \\
655: &29 &1 & 18 & Itl & L\% & \\
656:
657: &30 &1 & 19 & T & 0 & \\
658: &31 &19 & 20 & T & H* & 1
659: \end{tabular}\hfil
660: \begin{tabular}[t]{c|cc}
661: {\it Time} &
662: $N$ &
663: $T$\\
664: \cline{2-3}
665: & 0 & 0 \\
666: & 1 & 2360 \\
667: & 2 & 3270 \\
668: & 3 & 5200 \\
669: & 4 & 6160 \\
670: & 5 & 8720 \\
671: & 6 & 9680 \\
672: & 7 & 10173\\
673: & 8 & 11077\\
674: & 9 & 12019\\
675: & 10 & 12257\\
676: & 11 & 14120\\
677: & 12 & 15240\\
678: & 13 & 16200\\
679: & 14 & 16626\\
680: & 15 & 18480\\
681: & 16 & 20685\\
682: & 17 & 22179\\
683: & 18 & 57040\\
684: & 19 & 13650\\
685: & 20 & 13650
686: \end{tabular}
687: \end{minipage}
688: \caption{The Arc and Time Relations}
689: \label{fig:graph-table}
690: }
691: \vspace{2ex}\hrule
692: \end{figure*}
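
To make the tabular encoding concrete, a few rows of
Figure~\ref{fig:graph-table} might be written as ground facts as shown
here. This is only a sketch: the lower-case type constants
(\smtt{phoneme}, \smtt{word}, \smtt{tone}, standing for the P, W and T of
the figure) and the constant \smtt{nil} for an empty $L_3$ field are
assumptions made for this illustration, not part of either model.

\begin{sv}
% arc(Id, X, Y, L1, L2, L3)
arc(2,  1,  2,  phoneme, sh,   nil)
arc(3,  2,  3,  phoneme, iy,   nil)
arc(18, 1,  3,  word,    she,  nil)
arc(21, 8,  14, word,    dark, 1)
arc(31, 19, 20, tone,    'H*', 1)

% time(Node, SampleNumber)
time(1, 2360)
time(2, 3270)
time(3, 5200)
\end{sv}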
693:
We form the reflexive transitive closure of the (unlabelled) graph relation to
define a structural (graph-wise) precedence relation using a datalog program:
696:
697: \begin{sv}
698: s_prec(X,X) :-
699: s_prec(X,Y) :- arc(_,X,Y,_,_,_)
700: s_prec(X,Y) :- s_prec(X,Z),
701: arc(_,Z,Y,_,_,_)
702: \end{sv}
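
For example, reading the arcs off Figure~\ref{fig:graph-table}, the
following facts are among those derivable (a hand-worked illustration;
the comments name the arcs used):

\begin{sv}
s_prec(1,3)    % arc 18; or arcs 2, 3
s_prec(1,6)    % s_prec(1,3), arc 19
s_prec(19,20)  % arc 31
\end{sv}

\noindent
By contrast, \predicate{s\_prec(3,1)} is not derivable, since every arc in
the table runs from a lower-numbered node to a higher-numbered one.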
703:
704: Now we further define a temporal precedence relation, where {\tt leq} is
705: the $\leq$ relation (minimally defined on the times used by the graph):
706:
707: \begin{sv}
t_prec0(X,Y) :- time(X,T1),
                time(Y,T2),
                leq(T1,T2)
711:
712: t_prec(X,Y) :- t_prec0(X,Y)
713: t_prec(X,Y) :- t_prec(X,Z),
714: t_prec0(Z,Y)
715: \end{sv}
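
The same relations can be used to check that a table respects the timing
condition on annotation graphs, namely that structural precedence never
runs against the clock. The rule below is a sketch only: \smtt{lt} is a
hypothetical strict less-than predicate on times, assumed by analogy with
\smtt{leq}, and \smtt{time\_clash} is a name introduced here.

\begin{sv}
% X structurally precedes Y, yet Y is
% timed strictly earlier than X
time_clash(X,Y) :- s_prec(X,Y),
                   time(X,T1),
                   time(Y,T2),
                   lt(T2,T1)
\end{sv}

\noindent
For the data of Figure~\ref{fig:graph-table} this relation is empty, since
every arc joins nodes whose times are non-decreasing.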
716: \arXivhack
717:
718: \Section{Exploring Annotated Linguistic Databases}
719:
720: \SubSection{General architecture}
721:
722: In our experience with the analysis of linguistic databases, we have
723: found a recurrent pattern of use having three components
724: which we will call query, report generation, and analysis.
725:
726: The query system proper can be viewed as a function from annotation
727: graphs to sets of subgraphs, i.e. those meeting some (perhaps complex)
728: condition.
729: The report generation phase is able to access these query
730: results, but also the signals underlying the annotations. For
731: example, the report generation phase can calculate such things as
732: `mean F$_2$ in signal S during time interval $(t_1,t_2)$.'
733: Each hit constitutes an `observation' in the statistician's sense,
734: and we extract a vector of specified values for each observation, to
735: be passed along to the analysis system.
The analysis phase is then carried out in some general-purpose
data-crunching system such as Splus or Matlab.
738:
739: This architecture saves us from having to incorporate all possible
740: calculations over annotated signals into the query language.
741: The report generation phase can perform such calculations, as well
742: as compute properties of the annotation data itself.
743: This seems to simplify the query system a good deal;
744: now things like `count the number of syllables to the end of the
745: current phrase' (which we do need to be able to do) are tasks for the
746: report generator, not the query system proper.
747:
748: In general, the result of a query is a set of sub-graphs, each of which
749: forms one matching instance. If we use the relational model proposed
750: above, these would be returned as a result table having the
751: same structure as the arc relation of Figure~\ref{fig:graph-table},
752: but containing just the tuples which took part in each matching instance.
We are then faced with the problem of how to differentiate the matching
instances. For example, if we wished to collect together the word
labels for the query `find all words dominated by noun phrases', we would
need some way of treating each sub-graph separately. Hence, we would prefer
the result to be a set of tables rather than a single table containing
all matching tuples.
759:
760: In a sense, then, the only role of the query is to define an iterator
761: for the report generator over a set of sub-graphs of the overall
762: annotation graph.
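
One way to keep matching instances apart within a single relation,
sketched here under assumed type and label constants (\smtt{syn},
\smtt{np}, \smtt{word}), is to return the id of the enclosing arc
alongside each matched arc. The report generator can then group the
answer tuples by that id. The rule uses only \predicate{arc/6} and the
\predicate{s\_prec} relation defined earlier; the name \smtt{np\_word} is
introduced here for illustration.

\begin{sv}
% one tuple per (noun phrase, word) pair;
% the NP arc id I identifies the instance
np_word(I,J) :-
    arc(I,W,Z,syn,np,_),
    arc(J,X,Y,word,_,_),
    s_prec(W,X), s_prec(Y,Z)
\end{sv}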
763:
764: \subsubsection*{The Emu query language}
765:
766: The Emu query language uses simple conditions on token labels which
767: match only tokens at a specified level, for example:
768: \texttt{Phonetic=A|I|O|U|E|V}. These conditions can be combined by
769: sequence, domination or association operators to constrain the
770: relational structure of the tokens of interest. Examples of each are:
771:
772: Find a sequence of \texttt{vowel} followed by \texttt{stop} at the
773: phoneme level:\\
774: \texttt{[Phoneme=vowel \queryseq Phoneme=stop]}
775:
Find words not labelled \texttt{x} dominating \texttt{vowel}
777: phonemes:\\
778: \texttt{[Word!=x \querydom Phoneme=vowel]}
779:
780: Find words associated with \texttt{H*} tones:\\
781: \texttt{[Word!=x \queryassoc Tone=H*]}
782:
The condition \texttt{Word!=x} is used to match any word, since the
query language has no construct for matching an arbitrary label string.
785:
786: Each query matches either a token or, in the case of the sequence
787: query, a sequence of tokens. The result of a domination or association
788: query is the result of the left hand side of the bracketed term; this
789: can be changed by marking the right hand side term with a hash (\texttt{\#}).
790: Compound queries can be arbitrarily nested to specify complex
constraints on tokens. As an example, the following query finds
sequences of a stop followed by a vowel dominated by strong syllables where
the vowel is associated with an \texttt{H*} tone target; the result is a
list of the vowel labels with associated start and end times.
795: \begin{center}
796: \begin{sv}
797: [Syllable=S ^
798: [Phoneme=stop ->
799: [Phoneme=vowel => Tone=H*]]]
800: \end{sv}
801: \end{center}
802:
803: The result of an Emu query is a table with one entry per matching
804: token:
805: \begin{sv}
806: database:timit
807: query:Phoneme!=x
808: type:segment
809: #
810: h# 0 147.5 fjsp0:sa1
811: sh 147.5 232.5 fjsp0:sa1
812: iy 232.5 325 fjsp0:sa1
813: hv 325 385 fjsp0:sa1
814: ...
815: \end{sv}
This table is used to extract any of the time-series data
associated with the database, an operation usually carried out from an
818: analysis environment such as Splus or XlispStat. Emu provides
819: libraries of analysis functions for these environments which
820: facilitate, for example, mapping signal processing operations over each
821: token in a query result or overlaying plots of the time series data for
822: each token.
823:
Although this query system has proved useful and usable in the
environment of acoustic phonetics research, it is now evident that
there are a number of shortcomings which prevent its wider use. The
827: query syntax is unable to express some queries, such as those involving
828: disjunction or optional elements, and the query result is only really
829: useful for data extraction. It is for these reasons that we are now
830: looking more formally at the requirements for a query language for
831: annotation data.
832:
833: \SubSection{A query language on annotation graphs}
834:
835: A high-level query language for annotation graphs, founded on
836: an interval-based tense logic, is currently being developed and
837: will be reported in a later version of this paper.
838:
839: Here we describe a variety of useful queries on annotation
840: graphs and formulate them as datalog programs. As we shall see,
841: it turns out that datalog is insufficiently expressive for the
842: range of queries we have in mind. Finding a more expressive
843: yet tractable query language is the focus of ongoing research.
844:
845: A number of simple operations, extending our two relations
846: \predicate{arc/6} and \predicate{time/2},
847: will be necessary for succinct queries.
848: The first and most obvious is for hierarchy. Observe in
849: Figure~\ref{fig:ag-timit} that there is a notion of structural
850: inclusion defined by the arcs. We formulate this as follows:
851:
852: \begin{sv}
853: s_incl(I,J) :-
854: arc(I,W,Z,_,_,_),
855: arc(J,X,Y,_,_,_),
856: s_prec(W,X), s_prec(Y,Z)
857: \end{sv}
858:
859: Now, since \predicate{s\_prec} is reflexive, so is \predicate{s\_incl}.
860: Observe that nodes 3 and 6 in Figure~\ref{fig:ag-timit} are connected
861: by both an \smtt{S/V} arc and a \smtt{W/had} arc. The syntactic
862: verb arc \smtt{S/V} should dominate the word arc \smtt{W/had}, but not
863: vice versa. Therefore we need to have a hierarchy defined over the
864: types. We achieve this with a (domain-specific) ordering on the
865: type names:
866:
867: \begin{sv}
868: type_hierarchy(word,syl)
869: type_hierarchy(syl,seg)
870: \end{sv}
871:
872: \noindent
873: Now dominance is expressed by the predicate:
874:
875: \begin{sv}
876: dom(I,J) :-
877: arc(I,_,_,L1,_,_),
878: arc(J,_,_,L2,_,_),
879: type_hierarchy(L1,L2),
880: s_incl(I,J)
881: \end{sv}
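
As written, \predicate{dom} relates only arcs whose types are adjacent in
the ordering (\smtt{word} and \smtt{syl}, or \smtt{syl} and \smtt{seg}).
If dominance across several levels is wanted, one option, offered here as
a sketch rather than as part of the model, is to close the type ordering
transitively. The names \smtt{type\_below} and \smtt{dom\_any} are
introduced for this illustration.

\begin{sv}
type_below(T1,T2) :-
    type_hierarchy(T1,T2)
type_below(T1,T3) :-
    type_hierarchy(T1,T2),
    type_below(T2,T3)

dom_any(I,J) :-
    arc(I,_,_,L1,_,_),
    arc(J,_,_,L2,_,_),
    type_below(L1,L2),
    s_incl(I,J)
\end{sv}

\noindent
With the two \smtt{type\_hierarchy} facts given above,
\smtt{type\_below(word,seg)} is derivable, so a word arc is related by
\smtt{dom\_any} to each segment arc it structurally includes.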
882:
883: In some cases it is necessary to have an intransitive dominance
884: relation that is sensitive to phrase structure rules. For simplicity
885: of presentation, we assume binary branching structures. The first
886: of the rules below states that a sentence arc \smtt{s} will
887: immediately and exhaustively dominate an \smtt{np} arc followed
888: by a \smtt{vp} arc.
889:
890: \begin{sv}
891: ps_rule(s,np,vp)
892: ps_rule(np,det,n)
893: ps_rule(vp,v,np)
894: \end{sv}
895:
896: \noindent
897: Now we define immediate dominance over the syntax arcs \smtt{syn} as
898: follows:
899:
900: \begin{sv}
901: i_dom(I,J) :-
902: arc(I,X,Z,syn,P,_),
903: ps_rule(P,C1,C2),
904: arc(J,X,Y,syn,C1,_),
905: arc(_,Y,Z,syn,C2,_)
906:
907: i_dom(I,J) :-
908: arc(I,X,Z,syn,P,_),
909: ps_rule(P,C1,C2),
910: arc(_,X,Y,syn,C1,_),
911: arc(J,Y,Z,syn,C2,_)
912: \end{sv}
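
As a hand-worked illustration, suppose the syntax arcs of
Figure~\ref{fig:graph-table} are entered with type \smtt{syn} and
lower-case category labels \smtt{s}, \smtt{np}, \smtt{vp} and \smtt{v}
(an assumption made here to match the rules above; the figure writes these
labels in upper case). Arc 23 (\smtt{s}, nodes 1 to 18), arc 25
(\smtt{np}, nodes 1 to 3) and arc 24 (\smtt{vp}, nodes 3 to 18) then
satisfy \smtt{ps\_rule(s,np,vp)}, giving:

\begin{sv}
i_dom(23,25)  % S over NP
i_dom(23,24)  % S over VP
\end{sv}

\noindent
No fact is derived for arc 24 from \smtt{ps\_rule(vp,v,np)}, however: the
\smtt{v} arc 26 ends at node 6, but the \smtt{np} arc 27 runs from node 6
to node 17 rather than to node 18, so the two arcs do not exhaustively
cover the \smtt{vp} arc.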
913:
Another widely used relation between arcs is association. In the
instance of the AG model in Figure~\ref{fig:graph-table}, association
amounts to sharing the value of $L_3$, as in the tuples
for \smtt{dark} and \smtt{H*}.
The \predicate{assoc}
predicate simply does a join on the third label field:
920:
921: \begin{sv}
922: assoc(I,J) :-
923: arc(I,_,_,_,_,A),
924: arc(J,_,_,_,_,A)
925: \end{sv}
926:
Finally, it is convenient to have a Kleene star relation.
%%Unfortunately it is unable to collect up the arbitrary length
Unfortunately, in datalog we are unable to collect up the arbitrary-length
sequence it matches. Here we have it return the two nodes which
bound the sequence, which is often enough to identify the
sequence uniquely in practice.
933:
934: \begin{sv}
935: node(N) :- arc(_,N,_,_,_,_)
936: node(N) :- arc(_,_,N,_,_,_)
937:
938: kleene1(X,X,_) :- node(X)
939: kleene1(X,Y,L) :-
940: arc(_,X,Z,L,_,_),
941: kleene1(Z,Y,L)
942:
943: kleene2(X,X,_) :- node(X)
944: kleene2(X,Y,L) :-
945: arc(_,X,Z,_,L,_),
946: kleene2(Z,Y,L)
947:
948: kleene3(X,X,_) :- node(X)
949: kleene3(X,Y,L) :-
950: arc(_,X,Z,_,_,L),
951: kleene3(Z,Y,L)
952: \end{sv}
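
\noindent
To make the behaviour of these relations concrete, the following
sketch gives a hypothetical chain of three syllable arcs. The arc
identifiers, the node numbers, the labels \smtt{s} and \smtt{w} and
the placeholder \smtt{nil} are all assumptions made for this example.
The comments list what the clauses above derive from these facts:

\begin{sv}
% illustrative chain over nodes 0..3
arc(1,0,1,syl,s,nil).
arc(2,1,2,syl,w,nil).
arc(3,2,3,syl,s,nil).

% kleene1(0,Y,syl) holds for Y = 0,1,2,3
% kleene1(1,Y,syl) holds for Y = 1,2,3
% kleene2(0,Y,s) holds for Y = 0,1 only,
%   since the second arc is labelled w
\end{sv}

\noindent
The pair of bounding nodes thus stands in for the matched sequence,
including the empty sequence when the two nodes coincide.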
953:
954: With this simple machinery we can start defining some annotation
955: queries.
956:
957: %\note{Now, we want these queries to construct a graph/6 table where each
958: %tuple that takes part in the match is represented. Clearly these
959: %queries aren't going to do that since they are just logical statements
960: %about various properties, so they really combine our query and
961: %reporting steps so...}
962: % I don't think this is a problem now, given the way queries
963: % always return arc ids - SB
964: %
965: %We define a query here as a datalog predicate. The result of a query
966: %is the set of graph/6 tuples involved in deriving the truth value for
967: %the predicate. This set is marked somehow to group together those
968: %tuples involved in each instance of a query match.
969: %
970: %\note{But, even this isn't enough, since we want to have the query
971: % result annotated in some way so that we can tell which bit matched
972: % what -- ie which is the vowel and which is the stop in this first
973: % query. So, what do we do? }
974:
975: \noindent
976: Find a sequence of vowel followed by stop at the phoneme level
977: (assumes suitably defined {\tt vowel} and {\tt stop} unary relations):
978: \begin{sv}
979: vowel_stop(I,J) :-
980: arc(I,_,Y,phoneme,V,_),
981: arc(J,Y,_,phoneme,S,_),
982: vowel(V), stop(S)
983: \end{sv}
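
\noindent
As a worked example, the sketch below gives phoneme arcs for the word
\smtt{had}. The phoneme labels are those of the TIMIT example; the arc
identifiers, the node numbers, the placeholder \smtt{nil} and the
membership of \smtt{ae} and \smtt{dcl} in the \smtt{vowel} and
\smtt{stop} relations are assumptions made for illustration:

\begin{sv}
% illustrative facts for the word "had"
arc(24,3,4,phoneme,hv,nil).
arc(25,4,5,phoneme,ae,nil).
arc(26,5,6,phoneme,dcl,nil).
vowel(ae).
stop(dcl).

% derivable: vowel_stop(25,26)
\end{sv}

\noindent
The join on the shared node \smtt{Y} is what enforces adjacency: arc
25 ends at node 5 and arc 26 begins there.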
984:
985: \noindent
986: If we do not want both the vowel and the stop, but just the vowel,
987: we could write:
988: \begin{sv}
989: vowel_stop(I) :-
990: arc(I,_,Y,phoneme,V,_),
991: arc(_,Y,_,phoneme,S,_),
992: vowel(V), stop(S)
993: \end{sv}
994:
995: \noindent
Find strong words (those labelled \smtt{s}) dominating vowel phonemes:
997: \begin{sv}
998: strongWrdDomVowels(I) :-
999: arc(I,_,_,word,s,_),
1000: arc(J,_,_,phoneme,V,_),
1001: vowel(V),
1002: dom(I,J)
1003: \end{sv}
1004:
1005: \noindent
1006: Find words associated with H* tones:
1007: %% [Word!=x => Tone=H*]
1008: \begin{sv}
1009: sylHtone(I) :-
1010: arc(I,_,_,word,_,A),
1011: arc(_,_,_,tone,h*,A)
1012: \end{sv}
1013:
1014: \noindent
1015: Find stop-vowel sequences dominated by words in noun phrases where the word
1016: is associated with an H* tone target.
1017: %% [[Phoneme=stop -> Phoneme=vowel] ^ [[Word!=x => Tone=H*] ^ Syntax=np]]
1018: \begin{sv}
1019: stop_vowel_seq(I,J) :-
1020: arc(I,_,Y,phoneme,S,_), stop(S),
1021: arc(J,Y,_,phoneme,V,_), vowel(V),
1022: arc(W,_,_,word,_,_),
1023: arc(N,_,_,syn,np,_),
1024: dom(N,W), dom(W,I), dom(W,J),
  arc(T,_,_,tone,h*,_), assoc(W,T)
1026: \end{sv}
1027:
1028: \noindent
1029: Find the intermediate phrase containing the main verb of a sentence:
1030: %% [Intermediate!=x ^ [Syntax=s ^ [Syntax=vp ^ Syntax=v]]]
1031: \begin{sv}
1032: imt_phrase(P) :-
1033: arc(K, _, _, syn, s, _),
1034: arc(J, _, _, syn, vp, _),
1035: arc(I, _, _, syn, v, _),
1036: i_dom(K,J), i_dom(J,I),
1037: dom(P, I),
1038: arc(P, _, _, imt, _, _)
1039: \end{sv}
1040:
1041: \noindent
1042: Return the set of syllables between an H* and an L\% tone (inclusive).
1043: %% Emu can't do this, doesn't have kleene star
1044: \begin{sv}
1045: syls(K) :-
1046: arc(_, _, N, tone, h*, A1),
1047: arc(_, N, _, tone, l%, A2),
  arc(I, N1, _, syl, _, A1),
  arc(J, _, N4, syl, _, A2),
1050: kleene1(N1, N2, syl),
1051: arc(K, N2, N3, syl,_,_),
1052: kleene1(N3, N4, syl)
1053: \end{sv}
1054:
The above query shows how the datalog model breaks down. We would
like it to return sets of sets of syllable arcs; instead, it returns
a flat set.
In many cases we will know that some arc participating in
the query expression can be used to recover the nested structure. For
example, if the head of the above clause were changed from
\predicate{syls(K)} to \predicate{syls(I,K)}, then \predicate{I} would
aggregate \predicate{K} in just the right way.
1063: \arXivhack
1064:
1065: \Section{Applying XML Query Languages to Annotations}
1066:
1067: %\begin{figure*}[t]
1068: %\begin{tabular}{l|l}
1069: %{\scriptsize
1070: %\begin{minipage}[t]{.4\linewidth}
1071: %\begin{alltt}
1072: %<clauses>
1073: % <clause id=1 label='s'>
1074: % <clause id=2 label='np'>
1075: % <word id=3 label='she'/>
1076: % </clause>
1077: % <clause id=4 label='vp'>
1078: % <word id=5 label='had'/>
1079: % <clause id=6 label='np'>
1080: % <word id=7 label='your'/>
1081: % <word id=8 label='dark'/>
1082: % <word id=9 label='suit'/>
1083: % </clause>
1084: % <clause id=10 label='pp'>
1085: % ...
1086: % </clause>
1087: % </clause>
1088: % </clause>
1089: %</clauses>
1090: %\end{alltt}
1091: %\end{minipage}}
1092: %&
1093: %{\scriptsize
1094: %\begin{minipage}[t]{.5\linewidth}
1095: %\begin{alltt}
1096: %<tobi>
1097: % <intonational id=20 label='L\%'>
1098: % <intermediate id=21 label='L-'>
1099: % <word id=3 label='she'>
1100: % <phoneme id=22 label='sh'>
1101: % <phoneme id=23 label='iy'>
1102: % </word>
1103: % <word id=5 label='had'>
1104: % <phoneme id=24 label='hv'/>
1105: % <phoneme id=25 label='ae'/>
1106: % <phoneme id=26 label='dcl'/>
1107: % </word>
1108: % <word id=7 label='your'/>
1109: % <word id=8 label='dark' toneref=101/>
1110: % <word id=9 label='suit'>
1111: % </word>
1112: % </intermediate>
1113: %...
1114: % </intonational>
1115: % <tones>
1116: % <tone id=101 label='H*'/>
1117: % <tone id=102 label='H*'/>
1118: % </tones>
1119: %</tobi>
1120: %\end{alltt}
1121: %\end{minipage}}
1122: %\end{tabular}
1123: %\caption{The TIMIT Example as a Pair of XML Documents}
1124: %\label{fig:xml-eg}
1125: %\vspace{2ex}\hrule
1126: %\end{figure*}
1127:
1128: It is worth briefly considering the suitability of existing XML
1129: query languages such as XML-QL \cite{xml-ql} and XQL \cite{xql}
for the domain of annotated speech. At first glance, querying
annotated speech data resembles querying XML, since both involve
a hierarchical data
model. A number of formulations of annotation data as XML are
possible; indeed, some projects use XML/SGML-based formats
entirely (e.g.\ MATE \sburl{mate.nis.sdu.dk},
1136: LACITO \sburl{lacito.vjf.cnrs.fr/ARCHIVAG/ENGLISH.htm}).
1137: %
1138: % This is quite problematic, and I propose to omit it:
1139: % The major problem is the common occurence of multiple
1140: % intersecting hierarchies, such as that presented in the earlier
1141: % augmented TIMIT example. In order to represent this kind of
1142: % annotation, two or more XML documents (or sub-documents) are required
1143: % which share structure, perhaps via common attributes; an example is
1144: % given in Figure~\ref{fig:xml-eg}.
1145: %
1146: XML can represent trees using properly nested tags, in the
1147: obvious way. In order to represent multiple independent
1148: hierarchies built on top of the same material one must construct trees
1149: using IDREF pointers. This idea was proposed by the
1150: Text Encoding Initiative \cite{TEI-P3} and recently
1151: adopted by the MATE project. We believe this approach is
1152: vastly more expressive than necessary for representing speech
1153: annotations, and we prefer a more constrained approach having
1154: desirable computational properties with respect to creation,
1155: validation and query.
1156:
1157: The XQL proposal \cite{xql} describes a query language which is
1158: intended to select elements from
1159: within XML documents according to various criteria; for example, the
1160: query \texttt{text/author} returns all author elements that are
1161: children of text elements. The XQL data model ignores the order of
1162: elements within a parent element and has no obvious way to query for
1163: sequences of tokens.
1164:
1165: The XML-QL proposal \cite{xml-ql} provides for a data model
1166: where the order of elements is respected. A query for a word-internal
1167: vowel-stop sequence could be expressed as follows (assuming
1168: suitably tagged annotation data for TIMIT):
1169:
1170: \begin{sv}
1171: <word>
1172: <phoneme label=&vowel;/>
1173: <phoneme label=&stop;/>
1174: </word>
1175: \end{sv}
1176:
1177: \noindent
1178: The result of this query would have the following form:
1179:
1180: \begin{sv}
1181: <word label=had>
1182: <phoneme label=ae/>
1183: <phoneme label=dcl/>
1184: </word>
1185: <word label=dark>
1186: <phoneme label=ar/>
1187: <phoneme label=k/>
1188: </word>
1189: \end{sv}
1190:
1191: Queries which refer to two independent
1192: hierarchies, such as syntactic and intonational phrase
1193: structure, need to use joins.
1194: For example, to find words that are simultaneously
1195: at the end of both clauses and intermediate phrases,
1196: we could have the following query:
1197: %%In Emu-QL:
1198: %%End(Intermediate,Word)=1 & End(Clause,Word)=1
1199: \begin{sv}
1200: <intermediate>
1201: <word id=\$i></>[end()]
1202: </intermediate>
1203:
1204: <clause>
1205: <word id=\$i></>[end()]
1206: </clause>
1207: \end{sv}
1208:
We assume the existence of some mechanism, written \texttt{[end()]}
here, to pick out the last child element.
Binding the ID attribute to the same variable \texttt{\$i} in both
patterns ensures that the words are the same in each
part of the join.
1213:
Perhaps either of these approaches could be made to work for a useful range
of query needs. However, they do not appear to be sufficiently
general. For example, it is often useful to have query expressions
involving Kleene star: `select all pairs of consonants, ignoring
1218: any intervening vowels' (CV*C). Such queries may ignore hierarchical
1219: structure, finding sequences across (say) word boundaries.
1220: Using regular expressions over paths, XML-QL could provide access to
1221: strings of terminal symbols ignoring intervening levels of hierarchy.
1222: Yet it does not provide regular-expression matching over those
1223: sequences. Alternatively, sequences at each level of a hierarchy
1224: could be chained together using IDREF pointers, but it is unclear
1225: how we would manage closures over such pointer structures.
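
By way of contrast, a query of this kind can be sketched in the
datalog notation used earlier. The fragment below is only a sketch
under stated assumptions: \predicate{consonant},
\predicate{vowel\_star} and \predicate{cvc} are hypothetical names
introduced for illustration, \predicate{vowel} is the unary relation
assumed earlier, and \predicate{node} is the auxiliary relation
defined alongside the Kleene star predicates. Clauses are terminated
with periods as a Prolog or datalog interpreter would require:

\begin{sv}
% zero or more vowel arcs from X to Y
vowel_star(X,X) :- node(X).
vowel_star(X,Y) :-
  arc(_,X,Z,phoneme,V,_), vowel(V),
  vowel_star(Z,Y).

% consonant arcs I and J separated
% only by vowels (CV*C)
cvc(I,J) :-
  arc(I,_,X,phoneme,C1,_),
  consonant(C1),
  vowel_star(X,Y),
  arc(J,Y,_,phoneme,C2,_),
  consonant(C2).
\end{sv}

\noindent
Because the phoneme arcs are chained through shared nodes
independently of the word arcs above them, such a query finds
sequences across word boundaries without further effort.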
1226:
1227: % This is really problematic - so I'm omitting it:
1228: % While it may be possible to express many queries in XML-QL or similar
1229: % XML query languages, the poor fit of the XML data model to annotated
1230: % speech data ensures that many of these queries will be awkward and
1231: % unnatural. Hence, although these efforts are informative and the
1232: % semistructured data model is clearly appropriate, the query needs we
1233: % describe here appear to fall outside the capabilities of these
1234: % existing XML query languages. Databases of annotated speech
1235: % present an interesting new challenge for research on query languages.
1236: \arXivhack
1237:
1238: \Section{Conclusions}
1239:
1240: Annotated speech corpora are an essential component of speech
1241: research, and the variety of formats in which they are distributed has
1242: become a barrier to their wider adoption. To address this issue,
1243: we have developed two data
1244: models for speech annotations which seem to be sufficiently expressive
1245: to encompass the full range of practice in this area. We have shown
1246: how the models can be stored in a simple relational format, and how
1247: many useful queries in this domain are first-order. However, existing
1248: query languages lack sufficient expressive power for the full range of
queries we would like to be able to express, and we hope to stimulate new
research into the design of general-purpose query languages for
1251: databases of annotated speech recordings.
1252: \arXivhack
1253:
1254: \section*{Acknowledgements}
1255:
1256: We are grateful to Peter Buneman, Mark Liberman
1257: and Gary Simons for helpful discussions concerning
1258: the research reported here.
1259:
1260: \raggedright\small
1261: \bibliographystyle{latex8}
1262:
1263: \begin{thebibliography}{10}\setlength{\itemsep}{-1ex}\small
1264:
1265: \bibitem{Bird95}
1266: S.~Bird.
1267: \newblock {\em Computational Phonology: A Constraint-Based Approach}.
1268: \newblock Studies in Natural Language Processing. Cambridge University Press,
1269: 1995.
1270:
1271: \bibitem{BirdLiberman99dtag}
1272: S.~Bird and M.~Liberman.
1273: \newblock Annotation graphs as a framework for multidimensional linguistic data
1274: analysis.
1275: \newblock In {\em Towards Standards and Tools for Discourse Tagging --
1276: Proceedings of the Workshop}, pages 1--10. Somerset, NJ: Association for
1277: Computational Linguistics, 1999.
1278: \newblock [xxx.lanl.gov/abs/cs.CL/9907003].
1279:
1280: \bibitem{BirdLiberman99}
1281: S.~Bird and M.~Liberman.
1282: \newblock A formal framework for linguistic annotation.
1283: \newblock Technical Report MS-CIS-99-01, Department of Computer and Information
1284: Science, University of Pennsylvania, 1999.
1285: \newblock [xxx.lanl.gov/abs/cs.CL/9903003], expanded from version presented at
1286: ICSLP-98, Sydney, revised version to appear in {\it Speech Communication}.
1287:
1288: \bibitem{Cassidy99}
1289: S.~Cassidy.
1290: \newblock Compiling multi-tiered speech databases into the relational model:
1291: experiments with the {Emu} system.
1292: \newblock In {\em Proceedings of the 6th European Conference on Speech
1293: Communication and Technology}, 1999.
1294: \newblock \url{http://www.shlrc.mq.edu.au/emu/eurospeech99.shtml}.
1295:
1296: \bibitem{CassidyHarrington96}
1297: S.~Cassidy and J.~Harrington.
1298: \newblock Emu: An enhanced hierarchical speech data management system.
1299: \newblock In {\em Proceedings of the Sixth Australian International Conference
1300: on Speech Science and Technology}, pages 361--366, 1996.
1301: \newblock \url{http://www.shlrc.mq.edu.au/emu/}.
1302:
1303: \bibitem{CassidyHarrington99}
1304: S.~Cassidy and J.~Harrington.
1305: \newblock Multi-level annotation of speech: an overview of the emu speech
1306: database management system.
1307: \newblock manuscript, 1999.
1308:
1309: \bibitem{ChurchMercer93}
1310: K.~W. Church and R.~L. Mercer, editors.
1311: \newblock {\em Special Issue on Computational Linguistics Using Large Corpora},
1312: volume 19(1,2).
1313: \newblock MIT Press, 1993.
1314:
1315: \bibitem{xml-ql}
1316: A.~Deutsch, M.~Fernandez, D.~Florescu, A.~Levy, and D.~Suciu.
1317: \newblock {XML-QL}: A query language for {XML}, 1998.
1318: \newblock \url{http://www.w3.org/TR/NOTE-xml-ql/}.
1319:
1320: \bibitem{TIMIT86}
1321: J.~S. Garofolo, L.~F. Lamel, W.~M. Fisher, J.~G. Fiscus, D.~S. Pallett, and
1322: N.~L. Dahlgren.
1323: \newblock {\em The {DARPA TIMIT} Acoustic-Phonetic Continuous Speech Corpus
1324: {CDROM}}.
1325: \newblock NIST, 1986.
1326: \newblock \url{http://www.ldc.upenn.edu/Catalog/LDC93S1.html}.
1327:
1328: \bibitem{Marcus93}
1329: M.~P. Marcus, B.~Santorini, and M.~A. Marcinkiewicz.
1330: \newblock Building a large annotated corpus of {English}: The {Penn}
1331: {Treebank}.
\newblock {\em Computational Linguistics}, 19(2):313--330, 1993.
1333: \newblock \url{http://www.cis.upenn.edu/~treebank/home.html}.
1334:
1335: \bibitem{MUC7}
1336: {\em Message Understanding Conference Proceedings (MUC-7)}. Science
1337: Applications International Corporation, 1998.
1338: \newblock \url{http://www.muc.saic.com/proceedings/muc_7_toc.html}.
1339:
1340: \bibitem{xql}
1341: J.~Robie, J.~Lapp, and D.~Schach.
1342: \newblock {XML} query language ({XQL}), 1998.
1343: \newblock \url{http://www.w3.org/TandS/QL/QL98/pp/xql.html}.
1344:
1345: \bibitem{TEI-P3}
1346: {Text Encoding Initiative}.
1347: \newblock {\em Guidelines for Electronic Text Encoding and Interchange (TEI
1348: P3)}.
1349: \newblock Oxford University Computing Services, 1994.
1350: \newblock \url{http://www.uic.edu/orgs/tei/}.
1351:
1352: \bibitem{Voorhees98}
1353: E.~M. Voorhees and D.~K. Harman, editors.
1354: \newblock {\em NIST Special Publication 500-242: The Seventh Text REtrieval
1355: Conference (TREC-7)}. NIST, Government Printing Office, 1998.
1356: \newblock [trec.nist.gov/pubs/trec7/t7\_proceedings.html].
1357:
1358: \end{thebibliography}
1359:
1360: \end{document}
1361: