cs0204026/arxiv.tex
1: \documentclass[twocolumn,10pt]{article}
2: 
3: \usepackage{latex8,times,latexsym,epsfig,fancyheadings,alltt,a4wide}
4: 
5: \pretolerance 250
6: \tolerance 500
7: \hyphenpenalty 100
8: \exhyphenpenalty 100
9: \doublehyphendemerits 7500
10: \finalhyphendemerits 7500
11: \brokenpenalty 10000
12: \lefthyphenmin 2
13: \righthyphenmin 3
14: \widowpenalty 10000
15: \clubpenalty 10000
16: \displaywidowpenalty 10000
17: \looseness 1
18: 
19: \usepackage{url}
20: \def\sburl#1{[\url{#1}]}
21: 
22: \def\expr#1{\texttt{#1}}
23: \def\predicate#1{\texttt{#1}}
24: 
25: % the sequence operator in an emu query
26: \newcommand{\queryseq}{-$>$\ }
27: % the domination operator
28: \newcommand{\querydom}{\^{}\ }
29: % the association operator
30: \newcommand{\queryassoc}{$=>$\ }
31: % and disjunction
32: \newcommand{\queryor}{$|$}
33: 
34: \pagestyle{empty}
35: 
36: \def\smtt#1{{\small\tt #1}}
37: \def\note#1{{\bf (#1)}}
38: \newenvironment{sv}{\small\begin{alltt}}{\end{alltt}\normalsize}
39: \def\mb#1{{\mbox{\scriptsize #1}}}
40: \def\rn#1#2{\delta_{#1\rightarrow #2}}
41: \def\arXivhack{\vspace{-6pt}}
42: 
43: \title{Querying Databases of Annotated Speech}
44: \author{Steve Cassidy\\
45: Department of Linguistics\\
46: Macquarie University\\
47: Sydney, NSW 2109,\\
48: Australia\\
49: \smtt{steve.cassidy@mq.edu.au}\\
50: \and 
51: Steven Bird\\
52: Linguistic Data Consortium,\\
53: University of Pennsylvania, \\
54: 3615 Market St, Suite 200,  \\
55: Philadelphia, PA 19104-2608, USA \\
56: \smtt{steven.bird@ldc.upenn.edu}
57: }
58: 
59: %\date{\today}
60: 
61: \begin{document}
62: 
63: \maketitle
64: \thispagestyle{empty}
65: 
66: \begin{abstract}
67:   Annotated speech corpora are databases consisting of signal data
68:   along with time-aligned symbolic `transcriptions'.  Such databases
69:   are typically multidimensional, heterogeneous and dynamic.  These
70:   properties present a number of tough challenges for representation
71:   and query.  The temporal nature of the data adds a further layer
72:   of complexity.  This paper presents and harmonises two independent
73:   efforts to model annotated speech databases, one at Macquarie
74:   University and one at the University of Pennsylvania.
75:   Various query languages are described, along
76:   with illustrative applications to a variety of analytical problems.
77:   The research reported here forms a part of several ongoing projects
78:   to develop platform-independent open-source tools for creating,
79:   browsing, searching, querying and transforming linguistic databases,
80:   and to disseminate large linguistic databases over the internet.
81: \end{abstract}
82: 
83: \Section{Databases of Annotated Speech Recordings}
84: 
85: Annotated corpora have been an essential component of research and
86: development in language-related technologies for some years.
87: Text corpora have been used for developing information retrieval
88: and summarisation software (e.g. MUC \cite{MUC7}, TREC \cite{Voorhees98}),
89: automatic taggers and parsers and machine translation systems
90: \cite{ChurchMercer93}.  In a similar way, annotated
91: speech corpora have proliferated and have found uses across a rapidly
92: expanding set of languages, disciplines and technologies
93: \sburl{www.ldc.upenn.edu/annotation/}.
94: Over the last 7 years, the Linguistic Data Consortium (LDC)
95: has published over 150 text and speech databases
96: \sburl{www.ldc.upenn.edu/Catalog/}.
97: 
98: Typically, such databases are specified at the level of file
99: formats.  Linguistic content is annotated with a variety of
100: tags, attributes and values, with a specified syntax and semantics.
101: Tools are developed for each new format and linguistic domain
102: on an ad hoc basis.  These systems are
103: akin to the databases of the 1960s.  There is a physical
104: representation along with a hand-crafted program offering a
105: single view on the data.  Recently, the authors have shown how
106: the three-level architecture and the relational model can be
107: applied to annotated speech databases
108: \cite{BirdLiberman99,Cassidy99}.  The goal of this paper is
109: to illustrate our two approaches and to describe ongoing research
110: on query algebras.
111: 
112: Before presenting the models we give an example of a collection
113: of speech annotations.  This illustrates the diversity of
114: the physical formats and gives an idea of the challenge involved
115: in providing a general-purpose logical characterisation of the
116: data.  The Boston University Radio Speech Corpus consists of
117: 7 hours of radio news stories
118: \sburl{www.ldc.upenn.edu/Catalog/LDC96S36.html}.
119: The annotations include four
120: types of information: orthographic transcripts, broad phonetic
121: transcripts (including main word stress), and two kinds of prosodic
122: annotation, all time-aligned to the digital audio files. The two kinds
123: of prosodic annotation implement the system known as ToBI --
124: Tones and Break Indices
125: \sburl{www.ling.ohio-state.edu/phonetics/E_ToBI/}.
126: We have added three further annotations: coreference annotation and
127: named entity annotation in the style of MUC-7
128: \sburl{www.muc.saic.com/proceedings/muc_7_toc.html}, and
129: syntactic structures in the style of the Penn TreeBank \cite{Marcus93}.
130: Fragments of the physical data are shown in Figure~\ref{fig:bu-speech}.
131: 
132: \begin{figure*}
133: {\scriptsize\setlength{\tabcolsep}{.5\tabcolsep}
134: \begin{tabular}{l|l|l}
135: \begin{minipage}[t]{.325\linewidth}
136: {\small Coreference Annotation}
137: 
138: {\tiny
139: \begin{alltt}
140: <COREF ID="2" MIN="woman">
141:   This woman</COREF>
142: receives three hundred dollars
143: a month under
144: <COREF ID="5">
145:   General Relief</COREF>, plus
146: <COREF ID="16"
147:        MIN="four hundred dollars">
148:   four hundred dollars a month in
149:   <COREF ID="17"
150:          MIN="benefits" REF="16">
151:     A.F.D.C. benefits</COREF>
152: </COREF> for
153: <COREF ID="9" MIN="son">
154:   <COREF ID="3" REF="2">
155:     her</COREF> son
156: </COREF>, who is
157: <COREF ID="10" MIN="citizen" REF="9">
158:   a U.S. citizen</COREF>.
159: <COREF ID="4" REF="2">
160:   She</COREF>'s among
161: <COREF ID="18" MIN="aliens">
162:   an estimated five hundred illegal
163:   aliens on
164:   <COREF ID="6" REF="5">
165:     General Relief</COREF>
166:   out of
167:   <COREF ID="11" MIN="population">
168:     <COREF ID="13" MIN="state">
169:       the state</COREF>'s
170:     total illegal immigrant
171:     population of
172:     <COREF ID="12" REF="11">
173:       one hundred thousand
174:     </COREF>
175:   </COREF>
176: </COREF>.
177: <COREF ID="7" REF="5">
178:   General Relief</COREF>
179: is for needy families and
180: unemployable adults who
181: \end{alltt}}
182: \end{minipage}
183: &
184: \begin{minipage}[t]{.25\linewidth}
185: {\small Named Entity\\ Annotation}
186: 
187: {\tiny
188: \begin{alltt}
189: This woman receives
190: <b_numex TYPE="MONEY">
191:   three hundred dollars
192: <e_numex>
193: a month under General
194: Relief, plus
195: <b_numex TYPE="MONEY">
196:   four hundred dollars
197: <e_numex>
198: a month in A.F.D.C.
199: benefits for her
200: son, who is a
201: <b_enamex TYPE="LOCATION">
202:   U.S.
203: <e_enamex>
204: citizen. brth She's among
205: an estimated five hundred
206: illegal aliens on General
207: Relief brth out of the
208: state's total illegal
209: immigrant population of
210: one hundred thousand. brth
211: General Relief is for
212: needy families and
213: unemployable adults brth
214: who don't qualify for other
215:  public assistance. brth
216: <b_enamex TYPE="ORGANIZATION">
217:   Welfare Department
218: <e_enamex>
219: spokeswoman
220: <b_enamex TYPE="PERSON">
221:   Michael Reganburg
222: <e_enamex>
223: brth says the state will
224: save about
225: <b_numex TYPE="MONEY">
226:   one million dollars
227: <e_numex>
228: a year if illegal aliens
229: are denied General Relief.
230: \end{alltt}}
231: \end{minipage}
232: &
233: \begin{minipage}[t]{.325\linewidth}
234: {\small Penn Treebank Annotation}
235: 
236: {\tiny
237: \begin{alltt}
238: ((S
239:   (NP-SBJ This woman)
240:    (VP receives
241:     (NP
242:      (NP
243:       (NP (QP three hundred) dollars)
244:       (NP-ADV a month)
245:       (PP under
246:        (NP General Relief))) , plus
247:      (NP
248:       (NP (QP four hundred) dollars
249:       )
250:       (NP-ADV a month)
251:       (PP in
252:        (NP A.F.D.C. benefits))))
253:     (PP for
254:      (NP
255:       (NP her son) ,
256:       (SBAR (WHNP-1 who)
257:        (S (NP-SBJ *T*-1)
258:         (VP is
259:          (NP-PRD a U.S. citizen)))))))
260:   .
261: ))
262: ((S
263:  (NP-SBJ She)
264:  (VP 's
265:   (PP-PRD among
266:    (NP (NP an estimated
267:     (QP five hundred) illegal aliens)
268:    (PP on
269:     (NP General Relief))
270:    (PP out of
271:     (NP
272:      (NP
273:       (NP the state 's)
274:        total illegal immigrant population)
275:        (PP of
276:         (NP
277:          (QP one hundred thousand))))))))
278:   .
279: \end{alltt}}
280: \end{minipage}
281: \end{tabular}
282: 
283: \vspace*{2ex}\hrule\vspace*{2ex}
284: 
285: \begin{tabular}{l|l|l|l}
286: \begin{minipage}[t]{.21\linewidth}
287: {\small Word-Level Annotation}
288: 
289: \begin{alltt}
290: 0.320000 This
291: 0.620000 woman
292: 1.120000 receives
293: 1.370000 three
294: 1.670000 {hundred
295: 2.020000 }dollars
296: 2.060000 a
297: 2.450000 month
298: 2.740000 under
299: 3.280000 General
300: 3.800000 Relief
301: 4.310000 plus
302: 4.520000 four
303: 4.800000 hundred
304: 5.160000 dollars
305: 5.190000 a
306: 5.480000 month
307: 5.610000 in
308: 6.340000 A.F.D.C.
309: 6.870000 benefits
310: 7.060000 for
311: 7.190000 her
312: 7.620000 son
313: 7.830000 who
314: 7.970000 is
315: 8.020000 a
316: \end{alltt}
317: \end{minipage}
318: &
319: \begin{minipage}[t]{.25\linewidth}
320: {\small Syllable Annotation}
321: 
322: \begin{alltt}
323: H#   0    2
324: H#   2    3
325: >endsil
326: DH   5    14   4.182398
327: IH   19   6   -0.184139
328: S    25   8   -0.387113
329: >This
330: W    33   6   -0.495798
331: UH+1 39   3   -0.792806
332: M    42   7    0.042605
333: >
334: EN   49   14   0.395379
335: >woman
336: R    63   3   -0.996359
337: IY   66   7   -0.658371
338: >
339: S    73   12   0.865892
340: IY+1 85   13   0.815127
341: V    98   9    0.815878
342: Z    107  6   -0.563102
343: >receives
344: TH   113  9    0.506469
345: R    122  5   -0.359288
346: IY+1 127  11   0.323961
347: >three
348: HH   138  3   -0.905714
349: \end{alltt}
350: \end{minipage}
351: &
352: \begin{minipage}[t]{.33\linewidth}
353: {\small Tonal Annotation}
354: 
355: \begin{alltt}
356: 0.373684 HiF0
357: 0.493698 H*
358: 0.915000 !H*
359: 1.100000 !H-
360: 1.325000 L+H*
361: 1.389472 HiF0
362: 1.716865 L*
363: 2.178711 !H*
364: 2.434735 L-L%
365: 2.969376 H*
366: 3.552627 HiF0
367: 3.630000 H* ; !HL%, maybe LL% ?
368: 3.770074 H-L%
369: 4.440000 H*
370: 4.478946 HiF0
371: 5.330000 L*
372: 5.445000 L-H%
373: 5.709989 H*
374: 6.300000 H*
375: 6.331575 HiF0
376: 6.740000 L-H%
377: 7.336837 HiF0
378: 7.402120 H*
379: 7.607943 L-L%
380: 8.301393 H*
381: 8.510248 HiF0
382: 10.105260 HiF0
383: \end{alltt}
384: \end{minipage}
385: &
386: \begin{minipage}[t]{.2\linewidth}
387: {\small Part-of-speech\\ Annotation}
388: 
389: \begin{alltt}
390: This DT
391: woman NN
392: receives VBZ
393: three CD
394: hundred CD
395: dollars NNS
396: a DT
397: month NN
398: under IN
399: General NP
400: Relief, NP
401: plus CC
402: four CD
403: hundred CD
404: dollars NNS
405: a DT
406: month NN
407: in IN
408: A.F.D.C. NP
409: benefits NNS
410: for IN
411: her PP\$
412: son, NN
413: who WP
414: is VBZ
415: \end{alltt}
416: \end{minipage}
417: \end{tabular}}
418: 
419: \caption{Multiple Annotations of the Boston University Radio Speech Corpus}\label{fig:bu-speech}
420: \vspace*{2ex}\hrule
421: \end{figure*}
422: 
423: %\begin{figure}
424: %\centerline{\epsfig{figure=bu4.ps,width=\linewidth}}
425: %\vspace{3cm}
426: %\caption{Visualization for BU Example}\label{fig:bu-ag}
427: %\vspace*{2ex}\hrule
428: %\end{figure}
429: 
430: 
431: Coreference annotation (Figure~\ref{fig:bu-speech}, top left) assigns a
432: unique identifier to each noun phrase, along with a reference attribute which
433: links each pronoun to its antecedent.  The set of coreferring
434: expressions is considered to be an equivalence class.  Named-entity
435: annotation (top centre) identifies and classifies numerical and name
436: expressions.  Penn Treebank annotation provides a syntactic parse of
437: each sentence.  The word-level annotation (bottom left) gives the end
438: time of each word (an offset in seconds into the associated signal data).
439: The syllable annotation gives the Arpabet phonetic symbols
440: (see \sburl{www.ldc.upenn.edu/doc/timit/phoncode.doc}).
441: The tonal annotation provides time points and intonational units, and the
442: part of speech annotation (bottom right) specifies the syntactic
443: category of each word.  This is but a small sample of the bazaar of
444: data formats.
445: \arXivhack
446: 
447: \Section{Data Models for Speech Databases}
448: 
449: Two database models for multi-layered speech annotations have been
450: developed by the authors.  The Emu model (Macquarie) organises the data
451: primarily in terms of its hierarchical structure, while the annotation
452: graph model (Penn) foregrounds the temporal structure.  In separate
453: work we demonstrate the expressive equivalence of the two models
454: \cite{BirdLiberman99,CassidyHarrington99}.  Here we give a brief
455: overview of both models. In the remainder of this paper we will
456: consider mainly the annotation graph data model, while the Emu system
457: serves as an example of a working speech database system.
458: 
459: \SubSection{The Emu model}
460: 
461: The Emu speech database system \sburl{www.shlrc.mq.edu.au/emu}
462: \cite{CassidyHarrington96,CassidyHarrington99} provides tools for creation, query
463: and analysis of data from annotated speech databases.  Emu is
464: implemented as a core C++ library and a set of extensions to the Tcl
465: scripting language which provide a set of basic operations on speech
466: annotations.  Emu provides a flexible annotation model into which a
467: number of existing label file formats can be read.
468: 
469: The Emu annotation model is based on a set of \emph{levels} which
470: represent different types of linguistic data such as words, phonemes or
471: pitch events.  Each level contains a set of \emph{tokens} which have
472: one or more \emph{labels} and optionally a start and end time relative
473: to an associated speech signal.  Within a level, tokens are stored as a
474: partial order representing their sequence in the annotation: each token
475: may have zero or more previous and next tokens.  The partial ordering
476: must respect timing information if it is present in the tokens: that
477: is, a token cannot follow a token with a later start time.
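
The level structure just described can be sketched in a few lines of
Python.  This is an illustration under stated assumptions, not Emu's
C++ implementation: the class \texttt{Token}, its field names and the
function \texttt{respects\_timing} are hypothetical, and the check only
examines direct successor links.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Token:
    """One token on a level (hypothetical structure, not Emu's own)."""
    labels: List[str]
    start: Optional[float] = None   # start and end times are optional
    end: Optional[float] = None
    next: List[int] = field(default_factory=list)  # indices of following tokens


def respects_timing(level: List[Token]) -> bool:
    """True if no token is directly followed by a token that starts earlier."""
    for tok in level:
        for j in tok.next:
            succ = level[j]
            if (tok.start is not None and succ.start is not None
                    and succ.start < tok.start):
                return False
    return True


# Three phoneme tokens; the times are those of the Emu query result
# for the TIMIT utterance quoted in the section on the Emu query language.
phonemes = [
    Token(["h#"], 0.0, 147.5, next=[1]),
    Token(["sh"], 147.5, 232.5, next=[2]),
    Token(["iy"], 232.5, 325.0),
]
```

Tokens without times are simply skipped by the check, which reflects
the statement that timing information is optional.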
478: 
479: Within and between levels, tokens may be related by either
480: \emph{domination} or \emph{association} relations.  Domination
481: relations relate a parent token to an ordered sequence of constituent
482: child tokens and imply that the start and end times of the parent could
483: be inferred from those of the children. Association relations have no
484: in-built semantics and can be used for any application-specific
485: relation, such as that between a word and a tone target which denotes
486: the point at which word stress is realised
487: (Figure~\ref{fig:emu-timit}).  Relations may be defined between any
488: pair of levels, which allows Emu to handle intersecting hierarchies such
489: as that illustrated in Figure~\ref{fig:emu-timit}.
490: 
491: \begin{figure*}[tbp]
492: \centerline{\epsfig{file=emu-timit,width=0.75\linewidth}}
493: \caption{An example utterance from the TIMIT database which has been
494:   augmented with both a syntactic annotation and a ToBI style
495:   intonational annotation.  The names of the levels are shown on the
496:   left; the Word level has been duplicated to show the links to both
497:   the syntactic and intonational hierarchies. The single Tone event H*
498:   is associated with the word `dark'. Time information at the phoneme
499:   level is used to derive times for all higher levels.}
500: \label{fig:emu-timit}
501: \vspace*{2ex}\hrule
502: \end{figure*}
503: 
504: 
505: \SubSection{The annotation graph model}
506: 
507: A second general purpose model supporting multiple independent
508: hierarchical transcriptions of the same signal data is known as the
509: {\it annotation graph} \cite{BirdLiberman99dtag,BirdLiberman99}.
510: This model forms the heart of a joint initiative between LDC, NIST
511: \sburl{www.nist.gov} and MITRE \sburl{www.mitre.org}
512: to develop an architecture and tools for linguistic
513: analysis systems (ATLAS), and an NSF-sponsored project between
514: LDC, the Penn database group, and the CMU Psychology and Informedia
515: departments, to develop a multimodal database of communicative
516: interaction called Talkbank \sburl{www.talkbank.org}.
517: 
518: Annotation graphs are labelled DAGs with time references on some of the
519: nodes.  Bird and Liberman have demonstrated that annotation graphs are
520: sufficiently expressive to encompass the full range of current speech
521: annotation practice.  A simple example of an annotation graph is shown
522: in Figure~\ref{fig:ag-timit}, for a corpus known as TIMIT \cite{TIMIT86}.
523: Annotation graphs (AGs) have the following structure.
524: Let $L = \bigotimes L_i$ be the label data which occurs on the arcs of
525: an AG.  The nodes $N$ of an AG reference signal data by virtue of a
526: function mapping nodes to time offsets $T$.  AGs are now defined as
527: follows:
528: 
529: \newtheorem{defn}{Definition}
530: \newtheorem{ex}{Example}
531: 
532: \begin{defn}
533: An \textbf{annotation graph} $G$ over a label set $L$ and a
534: timeline $T$ is a 3-tuple
535: $\left< N, A, \tau \right>$ consisting of a node set $N$,
536: a collection of arcs $A$ labelled with elements of $L$,
537: and a time function $\tau$, which satisfies the following conditions:
538: 
539: \begin{enumerate}\setlength{\itemsep}{0pt}
540: 
541: \item $\left< N, A \right>$ is an acyclic digraph
542:   labeled with elements of $L$, and
543:   containing no nodes of degree zero;
544: 
545: \item $\tau: N \rightharpoonup T$,
546:   such that, for any path from node $n_1$ to $n_2$ in $A$,
547:   if $\tau(n_1)$ and $\tau(n_2)$ are defined, then
548:   $\tau(n_1) \leq \tau(n_2)$.
549: 
550: \end{enumerate}
551: \end{defn}
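
The two conditions of the definition can be checked mechanically.  The
following Python sketch is an illustration only: the function name
\texttt{is\_annotation\_graph} and the representation of arcs as
(source, target, label) triples are assumptions made for this example.

```python
from collections import defaultdict


def is_annotation_graph(nodes, arcs, tau):
    """Check the two conditions of the definition.

    nodes: set of node ids
    arcs:  list of (source, target, label) triples
    tau:   dict giving a time offset for some of the nodes (partial function)
    """
    succ = defaultdict(set)
    touched = set()
    for src, dst, _label in arcs:
        succ[src].add(dst)
        touched.update((src, dst))

    # Condition 1: no node of degree zero (and no undeclared end points).
    if touched != set(nodes):
        return False

    # Condition 1: acyclic.  The depth-first search also collects, for each
    # node, the set of nodes reachable from it.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in nodes}
    reach = {n: set() for n in nodes}

    def visit(n):
        colour[n] = GREY
        for m in succ[n]:
            if colour[m] == GREY:          # back edge, hence a cycle
                return False
            if colour[m] == WHITE and not visit(m):
                return False
            reach[n] |= {m} | reach[m]
        colour[n] = BLACK
        return True

    for n in nodes:
        if colour[n] == WHITE and not visit(n):
            return False

    # Condition 2: times never decrease along a path.
    for n1 in nodes:
        for n2 in reach[n1]:
            if n1 in tau and n2 in tau and tau[n1] > tau[n2]:
                return False
    return True


# A fragment of the TIMIT example: three phoneme arcs and one word arc.
NODES = {0, 1, 2, 3}
ARCS = [(0, 1, "P/h#"), (1, 2, "P/sh"), (2, 3, "P/iy"), (1, 3, "W/she")]
TAU = {0: 0, 1: 2360, 2: 3270, 3: 5200}
```

The empty graph passes the check, while a graph with an isolated node
does not.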
552: 
553: \begin{figure*}[tbp]
554: \centerline{\epsfig{file=ag-timit,width=\linewidth}}
555: \caption{TIMIT Graph Structure}\label{fig:ag-timit}
556: \vspace*{2ex}\hrule
557: \end{figure*}
558: %% note I've modified the emu example to associate H* with 'dark'
559: %% instead of aa -- I think this is fits ToBI better
560: 
561: Note that AGs may be disconnected or empty, and that they must
562: not have orphan nodes.  The AG corresponding to the Emu annotation
563: structure in Figure~\ref{fig:emu-timit}, for the first five
564: words of a TIMIT annotation, is given in Figure~\ref{fig:ag-timit}.
565: The arc types are interpreted as follows:
566: \expr{S} -- syntax;
567: \expr{W} -- word;
568: \expr{P} -- phoneme;
569: \expr{T} -- tone;
570: \expr{Imt} -- intermediate phrase;
571: \expr{Itl} -- intonational phrase.
572: \arXivhack
573: 
574: \Section{Annotations as Relational Tables}
575: 
576: Annotation data expressed in either the Emu or annotation graph data
577: models can be trivially recast as a set of relational tables
578: \cite{Cassidy99}.  For the purposes of this paper it is instructive to
579: consider the relational form of annotation data in order to explore the
580: requirements for a query language for these databases.
581: 
582: An annotation graph can be represented as a pair of tables, one for the arc
583: relation and one for the time relation.  Each row of the arc relation is a six-tuple
584: containing an arc id, a source node id, a target node id, and three
585: labels taken from the sets $L_1, L_2, L_3$ respectively.  The choice of
586: three label positions is somewhat arbitrary, but it seems to be
587: both necessary and sufficient for the various annotation structures
588: considered here.
589: 
590: We let $L_1$ be the set of types of transcript information
591: (e.g.\ `word', `syllable', `phoneme'), and let
592: $L_2$ be the substantive transcript element (e.g.\ particular
593: words, phonetic symbols, and so on).  We let $L_3$ be the names
594: of equivalence classes, used here to model so-called
595: `phonological association'.  (This kind of association is
596: discussed in depth in \cite{Bird95}.)
597: Let $T$ be the set
598: of non-negative integers, the sample numbers.
599: Figure~\ref{fig:graph-table} gives an example for the TIMIT data of
600: Figures~\ref{fig:emu-timit} and~\ref{fig:ag-timit}.
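
As an informal illustration of the relational form, the sketch below
loads a few rows of the two relations into an in-memory SQLite database
and joins them to recover the time span of each word.  The table and
column names (\texttt{arc}, \texttt{node\_time}, \texttt{x}, \texttt{y},
\texttt{l1} and so on) are choices made for this example, and empty
$L_3$ fields are stored as NULL; neither is prescribed by the model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE arc (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER,
                      l1 TEXT, l2 TEXT, l3 TEXT);
    CREATE TABLE node_time (n INTEGER PRIMARY KEY, t INTEGER);
""")

# The five word arcs and the times of their end points, copied from
# the Arc and Time relations of the TIMIT example.
conn.executemany("INSERT INTO arc VALUES (?, ?, ?, ?, ?, ?)", [
    (18, 1, 3, "W", "she", None),
    (19, 3, 6, "W", "had", None),
    (20, 6, 8, "W", "your", None),
    (21, 8, 14, "W", "dark", "1"),
    (22, 14, 17, "W", "suit", None),
])
conn.executemany("INSERT INTO node_time VALUES (?, ?)", [
    (1, 2360), (3, 5200), (6, 9680), (8, 11077), (14, 16626), (17, 22179),
])


def word_spans(conn):
    """Join each word arc with the times of its source and target nodes."""
    return conn.execute("""
        SELECT a.l2, t1.t, t2.t
        FROM arc AS a
        JOIN node_time AS t1 ON a.x = t1.n
        JOIN node_time AS t2 ON a.y = t2.n
        WHERE a.l1 = 'W'
        ORDER BY t1.t
    """).fetchall()
```

Calling \texttt{word\_spans(conn)} yields one (label, start, end) row
per word, with times expressed as sample numbers.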
601: 
602: \begin{figure*}[tbp]
603: {\scriptsize
604: \begin{minipage}{\textwidth}
605: \begin{tabular}[t]{c|cccccc}
606: {\it Arc} &
607:  $id$ &
608:  $X$  &
609:  $Y$  &
610:  $L_1$ &
611:  $L_2$ &
612:  $L_3$ \\
613: \cline{2-7}
614: 
615: &1 & 0 & 1 & P & h\# & \\
616: &2 & 1 & 2 & P & sh  & \\
617: &3 & 2 & 3 & P & iy  & \\
618: &4 & 3 & 4 & P & hv  & \\
619: &5 & 4 & 5 & P & ae  & \\
620: &6 & 5 & 6 & P & dcl & \\
621: &7 & 6 & 7 & P & y   & \\
622: &8 & 7 & 8 & P & axr & \\
623: &9 & 8 & 9 & P & dcl & \\
624: &10 & 9 & 10 & P & d & \\
625: &11 & 10 & 11 & P & aa & \\
626: &12 & 11 & 12 & P & r & \\
627: &13 & 12 & 13 & P & kcl & \\
628: &14 & 13 & 14 & P & k & \\
629: &15 & 14 & 15 & P & s & \\
630: &16 & 15 & 16 & P & uw & \\
631: &17 & 16 & 17 & P & q &
632: \end{tabular}\hfil
633: \begin{tabular}[t]{c|cccccc}
634: {\it Arc} &
635:  $id$ &
636:  $X$  &
637:  $Y$  &
638:  $L_1$ &
639:  $L_2$ &
640:  $L_3$ \\
641: \cline{2-7}
642: 
643: &18 & 1 & 3 & W & she & \\
644: &19 & 3 & 6 & W & had & \\
645: &20 & 6 & 8 & W & your & \\
646: &21 & 8 & 14 & W & dark & 1 \\
647: &22 & 14 & 17 & W & suit & \\
648: &23 & 1 & 18 & S & S & \\
649: &24 &3 & 18 & S & VP & \\
650: &25 &1 & 3 & S & NP & \\
651: &26 &3 & 6 & S & V & \\
652: &27 &6 & 17 & S & NP & \\
653: 
654: &28 &1 & 17 & Imt & L- & \\
655: &29 &1 & 18 & Itl & L\% & \\
656: 
657: &30 &1 & 19 & T & 0 & \\
658: &31 &19 & 20 & T & H* & 1
659: \end{tabular}\hfil
660: \begin{tabular}[t]{c|cc}
661: {\it Time} &
662:  $N$ &
663:  $T$\\
664: \cline{2-3}
665: & 0 & 0    \\
666: & 1 & 2360 \\
667: & 2 & 3270 \\
668: & 3 & 5200 \\
669: & 4 & 6160 \\
670: & 5 & 8720 \\
671: & 6 & 9680 \\
672: & 7 & 10173\\
673: & 8 & 11077\\
674: & 9 & 12019\\
675: & 10 & 12257\\
676: & 11 & 14120\\
677: & 12 & 15240\\
678: & 13 & 16200\\
679: & 14 & 16626\\
680: & 15 & 18480\\
681: & 16 & 20685\\
682: & 17 & 22179\\
683: & 18 & 57040\\
684: & 19 & 13650\\
685: & 20 & 13650
686: \end{tabular}
687: \end{minipage}    
688:     \caption{The Arc and Time Relations}
689:     \label{fig:graph-table}
690: }
691: \vspace{2ex}\hrule
692: \end{figure*}
693: 
694: We form the transitive closure of the (unlabeled) graph relation to
695: define a structural (graph-wise) precedence relation using a datalog program:
696: 
697: \begin{sv}
698: s_prec(X,X) :- 
699: s_prec(X,Y) :- arc(_,X,Y,_,_,_)
700: s_prec(X,Y) :- s_prec(X,Z),
701:                arc(_,Z,Y,_,_,_)
702: \end{sv}
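
For readers who prefer to experiment outside a datalog engine, the
same closure can be computed by naive fixpoint iteration.  The Python
sketch below is an illustration only.  It assumes that arcs are given
as six-tuples in the column order of Figure~\ref{fig:graph-table}, and
that the reflexive rule ranges over the nodes that occur in some arc.

```python
def s_prec(arcs):
    """Reflexive-transitive closure of the arc relation over nodes.

    arcs: list of (id, x, y, l1, l2, l3) tuples.
    Returns the set of node pairs (x, y) such that x structurally precedes y.
    """
    nodes = {n for a in arcs for n in (a[1], a[2])}
    closure = {(n, n) for n in nodes}              # s_prec(X,X)
    closure |= {(a[1], a[2]) for a in arcs}        # s_prec(X,Y) :- arc(_,X,Y,...)
    while True:
        # s_prec(X,Y) :- s_prec(X,Z), arc(_,Z,Y,...)
        new = {(x, a[2]) for (x, z) in closure for a in arcs if a[1] == z}
        if new <= closure:                         # nothing new: fixpoint reached
            return closure
        closure |= new


# Three phoneme arcs and one word arc from the TIMIT example.
ARCS = [
    (1, 0, 1, "P", "h#", None),
    (2, 1, 2, "P", "sh", None),
    (3, 2, 3, "P", "iy", None),
    (18, 1, 3, "W", "she", None),
]
PREC = s_prec(ARCS)
```

Here node 0 precedes node 3 although no single arc connects them.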
703: 
704: Now we further define a temporal precedence relation, where {\tt leq} is
705: the $\leq$ relation (minimally defined on the times used by the graph):
706: 
707: \begin{sv}
708: t_prec0(X,Y) :- time(X,T1),
709:                 time(Y,T2),
710:                 leq(T1,T2)
711: 
712: t_prec(X,Y) :-  t_prec0(X,Y)
713: t_prec(X,Y) :-  t_prec(X,Z),
714:                 t_prec0(Z,Y)
715: \end{sv}
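
A small Python sketch of temporal precedence follows.  It assumes that
\texttt{leq} is the full $\leq$ relation on the time values of anchored
nodes; under that assumption the closure step adds no new pairs, since
$\leq$ is already transitive.  The function names are hypothetical.

```python
def t_prec0(time):
    """time: dict mapping anchored nodes to their time offsets."""
    return {(x, y)
            for x, tx in time.items()
            for y, ty in time.items()
            if tx <= ty}


def transitive_closure(pairs):
    """Smallest transitive relation containing `pairs`."""
    closure = set(pairs)
    while True:
        new = {(x, y) for (x, z) in closure for (w, y) in closure if z == w}
        if new <= closure:
            return closure
        closure |= new


# Four nodes from the Time relation; nodes 19 and 20 share a time.
TIME = {10: 12257, 11: 14120, 19: 13650, 20: 13650}
BASE = t_prec0(TIME)
T_PREC = transitive_closure(BASE)
```

Nodes with equal times, such as 19 and 20, precede each other, and
nodes without a time reference take no part in the relation.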
716: \arXivhack
717: 
718: \Section{Exploring Annotated Linguistic Databases}
719: 
720: \SubSection{General architecture}
721: 
722: In our experience with the analysis of linguistic databases, we have
723: found a recurrent pattern of use having three components
724: which we will call query, report generation, and analysis.
725: 
726: The query system proper can be viewed as a function from annotation
727: graphs to sets of subgraphs, i.e. those meeting some (perhaps complex)
728: condition.
729: The report generation phase is able to access these query
730: results, but also the signals underlying the annotations.  For
731: example, the report generation phase can calculate such things as
732: `mean F$_2$ in signal S during time interval $(t_1,t_2)$.'
733: Each hit constitutes an `observation' in the statistician's sense,
734: and we extract a vector of specified values for each observation, to
735: be passed along to the analysis system.
736: The analysis phase is then some general-purpose data
737: crunching system such as Splus or Matlab.
738: 
739: This architecture saves us from having to incorporate all possible
740: calculations over annotated signals into the query language.
741: The report generation phase can perform such calculations, as well
742: as compute properties of the annotation data itself.
743: This seems to simplify the query system a good deal;
744: now things like `count the number of syllables to the end of the
745: current phrase' (which we do need to be able to do) are tasks for the
746: report generator, not the query system proper.
747: 
748: In general, the result of a query is a set of sub-graphs, each of which 
749: forms one matching instance.  If we use the relational model proposed
750: above, these would be returned as a result table having the 
751: same structure as the arc relation of Figure~\ref{fig:graph-table},
752: but containing just the tuples which took part in each matching instance. 
753: We are then faced with the problem of how to differentiate the matching 
754: instances.  For example, to collect together the word
755: labels for the query `find all words dominated by noun phrases', we need 
756: some way of treating each sub-graph separately.  Hence, we would prefer 
757: the result to be a set of tables rather than a single table containing
758: all matching tuples. 
759: 
760: In a sense, then, the only role of the query is to define an iterator
761: for the report generator over a set of sub-graphs of the overall
762: annotation graph.
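
The division of labour between query and report generation can be
sketched as follows.  This is an illustration under strong
simplifications: the query is hard-wired, temporal containment stands
in for dominance, and the function names \texttt{np\_word\_hits} and
\texttt{report} are hypothetical.

```python
def np_word_hits(arcs, time):
    """Query: one sub-table per noun phrase, holding the NP arc
    followed by the word arcs whose time span lies inside it."""
    hits = []
    for np in arcs:
        if np[3] == "S" and np[4] == "NP":
            words = [w for w in arcs
                     if w[3] == "W"
                     and time[np[1]] <= time[w[1]]
                     and time[w[2]] <= time[np[2]]]
            hits.append([np] + words)
    return hits


def report(hits, time):
    """Report generation: one observation per hit, here the duration
    of the noun phrase in samples and the labels of its words."""
    return [(time[hit[0][2]] - time[hit[0][1]], [w[4] for w in hit[1:]])
            for hit in hits]


# Word and syntax arcs with their node times, from the TIMIT example.
ARCS = [
    (18, 1, 3, "W", "she", None),
    (19, 3, 6, "W", "had", None),
    (20, 6, 8, "W", "your", None),
    (21, 8, 14, "W", "dark", "1"),
    (22, 14, 17, "W", "suit", None),
    (25, 1, 3, "S", "NP", None),
    (26, 3, 6, "S", "V", None),
    (27, 6, 17, "S", "NP", None),
]
TIME = {1: 2360, 3: 5200, 6: 9680, 8: 11077, 14: 16626, 17: 22179}
OBSERVATIONS = report(np_word_hits(ARCS, TIME), TIME)
```

Because each hit is kept as its own table, the report generator can
treat the two noun phrases separately.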
763: 
764: \subsubsection*{The Emu query language}
765: 
766: The Emu query language uses simple conditions on token labels which
767: match only tokens at a specified level, for example:
768: \texttt{Phonetic=A|I|O|U|E|V}.  These conditions can be combined by
769: sequence, domination or association operators to constrain the
770: relational structure of the tokens of interest.  Examples of each are:
771: 
772:   Find a sequence of \texttt{vowel} followed by \texttt{stop} at the
773:   phoneme level:\\
774:       \texttt{[Phoneme=vowel \queryseq Phoneme=stop]}
775:  
776:   Find Words not labelled \texttt{x} dominating \texttt{vowel}
777:   phonemes:\\
778:       \texttt{[Word!=x \querydom Phoneme=vowel]}
779: 
780:   Find words associated with \texttt{H*} tones:\\
781:       \texttt{[Word!=x \queryassoc Tone=H*]}     
782:      
783: The \texttt{Word!=x} query is intended to match any word in lieu of a
784: query language construct which allows matching any label string.  
785: 
786: Each query matches either a token or, in the case of the sequence
787: query, a sequence of tokens.  The result of a domination or association
788: query is the result of the left hand side of the bracketed term; this
789: can be changed by marking the right hand side term with a hash (\texttt{\#}).
790: Compound queries can be arbitrarily nested to specify complex
791: constraints on tokens.  As an example, the following query finds
792: sequences of stops and vowels dominated by strong syllables where the
793: vowel is associated with an \texttt{H*} tone target; the result is a
794: list of the vowel labels with associated start and end times.
795: \begin{center}
796: \begin{sv}
797:  [Syllable=S ^ 
798:      [Phoneme=stop -> 
799:          [Phoneme=vowel => Tone=H*]]]
800: \end{sv}
801: \end{center}
802: 
803: The result of an Emu query is a table with one entry per matching
804: token:
805: \begin{sv}
806: database:timit
807: query:Phoneme!=x
808: type:segment
809: #
810: h#      0       147.5   fjsp0:sa1
811: sh      147.5   232.5   fjsp0:sa1
812: iy      232.5   325     fjsp0:sa1
813: hv      325     385     fjsp0:sa1
814: ...
815: \end{sv}
816: This table is used to extract any of the time-series data
817: associated with the database, an operation usually carried out from an
818: analysis environment such as Splus or XlispStat.  Emu provides
819: libraries of analysis functions for these environments which
820: facilitate, for example, mapping signal processing operations over each
821: token in a query result or overlaying plots of the time series data for
822: each token.
823: 
824: Although this query system has proved useful and useable in the
825: environment of acoustic phonetics research, it is now evident that
826: there are a number of shortcomings which prevent its wider use. The
827: query syntax is unable to express some queries, such as those involving 
828: disjunction or optional elements, and the query result is only really
829: useful for data extraction.  It is for these reasons that we are now
830: looking more formally at the requirements for a query language for
831: annotation data. 
832: 
833: \SubSection{A query language on annotation graphs}
834: 
835: A high-level query language for annotation graphs, founded on
836: an interval-based tense logic, is currently being developed and
837: will be reported in a later version of this paper.
838: 
839: Here we describe a variety of useful queries on annotation
840: graphs and formulate them as datalog programs.  As we shall see,
841: it turns out that datalog is insufficiently expressive for the
842: range of queries we have in mind.  Finding a more expressive
843: yet tractable query language is the focus of ongoing research.
844: 
845: A number of simple operations, extending our two relations
846: \predicate{arc/6} and \predicate{time/2},
847: will be necessary for succinct queries.
848: The first and most obvious is for hierarchy.  Observe in
849: Figure~\ref{fig:ag-timit} that there is a notion of structural
850: inclusion defined by the arcs.  We formulate this as follows:
851: 
852: \begin{sv}
853: s_incl(I,J) :- 
854:    arc(I,W,Z,_,_,_), 
855:    arc(J,X,Y,_,_,_), 
856:    s_prec(W,X), s_prec(Y,Z)
857: \end{sv}
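
The inclusion rule can be tried out with a short Python sketch.  It is
an illustration only: arcs are assumed to be six-tuples in the column
order of Figure~\ref{fig:graph-table}, and \predicate{s\_prec} is
recomputed by naive iteration.

```python
def s_prec(arcs):
    """Reflexive-transitive closure of the arc relation over nodes."""
    nodes = {n for a in arcs for n in (a[1], a[2])}
    closure = {(n, n) for n in nodes} | {(a[1], a[2]) for a in arcs}
    while True:
        new = {(x, a[2]) for (x, z) in closure for a in arcs if a[1] == z}
        if new <= closure:
            return closure
        closure |= new


def s_incl(arcs):
    """Pairs of arc ids (i, j) such that arc i structurally includes arc j."""
    prec = s_prec(arcs)
    return {(i[0], j[0])
            for i in arcs for j in arcs
            # arc i runs from W to Z, arc j from X to Y:
            # require s_prec(W,X) and s_prec(Y,Z)
            if (i[1], j[1]) in prec and (j[2], i[2]) in prec}


# The word `had', its three phonemes and the verb arc spanning it.
ARCS = [
    (4, 3, 4, "P", "hv", None),
    (5, 4, 5, "P", "ae", None),
    (6, 5, 6, "P", "dcl", None),
    (19, 3, 6, "W", "had", None),
    (26, 3, 6, "S", "V", None),
]
INCL = s_incl(ARCS)
```

The word arc includes each of its phoneme arcs but not vice versa,
whereas the word arc and the verb arc include each other.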
858: 
859: Now, since \predicate{s\_prec} is reflexive, so is \predicate{s\_incl}.
860: Observe that nodes 3 and 6 in Figure~\ref{fig:ag-timit} are connected
861: by both an \smtt{S/V} arc and a \smtt{W/had} arc.  The syntactic
862: verb arc \smtt{S/V} should dominate the word arc \smtt{W/had}, but not
863: vice versa.  Therefore we need to have a hierarchy defined over the
864: types.  We achieve this with a (domain-specific) ordering on the
865: type names:
866: 
867: \begin{sv}
868: type_hierarchy(word,syl)
869: type_hierarchy(syl,seg)
870: \end{sv}
871: 
872: \noindent
873: Now dominance is expressed by the predicate:
874: 
875: \begin{sv}
876: dom(I,J) :- 
877:    arc(I,_,_,L1,_,_), 
878:    arc(J,_,_,L2,_,_), 
879:    type_hierarchy(L1,L2), 
880:    s_incl(I,J)
881: \end{sv}
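
The effect of the type ordering can be seen in the following Python
sketch.  It is an illustration only, and it assumes an ordering stated
over the type names used in Figure~\ref{fig:graph-table}
(\expr{S} above \expr{W} above \expr{P}) rather than the names
\texttt{word}, \texttt{syl} and \texttt{seg}.

```python
def s_prec(arcs):
    """Reflexive-transitive closure of the arc relation over nodes."""
    nodes = {n for a in arcs for n in (a[1], a[2])}
    closure = {(n, n) for n in nodes} | {(a[1], a[2]) for a in arcs}
    while True:
        new = {(x, a[2]) for (x, z) in closure for a in arcs if a[1] == z}
        if new <= closure:
            return closure
        closure |= new


def s_incl(arcs):
    """Pairs of arc ids (i, j) such that arc i structurally includes arc j."""
    prec = s_prec(arcs)
    return {(i[0], j[0]) for i in arcs for j in arcs
            if (i[1], j[1]) in prec and (j[2], i[2]) in prec}


def dom(arcs, type_hierarchy):
    """Inclusion restricted to pairs whose types are ordered."""
    by_id = {a[0]: a for a in arcs}
    return {(i, j) for (i, j) in s_incl(arcs)
            if (by_id[i][3], by_id[j][3]) in type_hierarchy}


# Assumed ordering over the type names of the TIMIT example.
TYPE_HIERARCHY = {("S", "W"), ("W", "P")}

ARCS = [
    (4, 3, 4, "P", "hv", None),
    (5, 4, 5, "P", "ae", None),
    (6, 5, 6, "P", "dcl", None),
    (19, 3, 6, "W", "had", None),
    (26, 3, 6, "S", "V", None),
]
DOM = dom(ARCS, TYPE_HIERARCHY)
```

With the ordering listed as direct pairs only, the verb arc dominates
the word arc but not the phoneme arcs; the pair for \expr{S} over
\expr{P} would have to be added, or the ordering closed under
transitivity, for that to hold.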
882: 
883: In some cases it is necessary to have an intransitive dominance
884: relation that is sensitive to phrase structure rules.  For simplicity
885: of presentation, we assume binary branching structures.  The first
886: of the rules below states that a sentence arc \smtt{s} will
887: immediately and exhaustively dominate an \smtt{np} arc followed
888: by a \smtt{vp} arc.
889: 
890: \begin{sv}
891: ps_rule(s,np,vp)
892: ps_rule(np,det,n)
893: ps_rule(vp,v,np)
894: \end{sv}
895: 
896: \noindent
897: Now we define immediate dominance over the syntax arcs \smtt{syn} as
898: follows:
899: 
900: \begin{sv}
901: i_dom(I,J) :- 
902:    arc(I,X,Z,syn,P,_), 
903:    ps_rule(P,C1,C2), 
904:    arc(J,X,Y,syn,C1,_), 
905:    arc(_,Y,Z,syn,C2,_)
906: 
907: i_dom(I,J) :- 
908:    arc(I,X,Z,syn,P,_), 
909:    ps_rule(P,C1,C2), 
910:    arc(_,X,Y,syn,C1,_), 
911:    arc(J,Y,Z,syn,C2,_)
912: \end{sv}
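
The relation \predicate{i\_dom} holds only between a parent arc and its
immediate children.  Where unbounded syntactic dominance is needed, it
can be recovered by recursion, which datalog supports.  A minimal sketch
follows, in which \predicate{syn\_dom} is a hypothetical name:

\begin{sv}
syn_dom(I,J) :- i_dom(I,J)
syn_dom(I,K) :- 
   i_dom(I,J), 
   syn_dom(J,K)
\end{sv}

\noindent
With the phrase structure rules above, an \smtt{s} arc immediately
dominates its \smtt{vp} arc, which immediately dominates its \smtt{v}
arc, so \predicate{syn\_dom} relates the \smtt{s} arc to the \smtt{v} arc.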
913: 
Another widely used relation between arcs is association.  In the
instance of the AG model in Figure~\ref{fig:graph-table}, association
amounts to sharing the value of $L_3$, as we saw in the tuples
for \smtt{dark} and \smtt{H*}.
The \predicate{assoc}
predicate simply performs a join on the third label field:
920: 
921: \begin{sv}
922: assoc(I,J) :- 
923:    arc(I,_,_,_,_,A), 
924:    arc(J,_,_,_,_,A)
925: \end{sv}
926: 
Finally, it is convenient to have a Kleene star relation.
%%Unfortunately it is unable to collect up the arbitrary length
Unfortunately, in datalog we cannot collect the arbitrary-length
sequence it matches.  Instead, the relation returns the two nodes that
bound the sequence, which in practice is often enough to identify the
sequence uniquely.
933: 
934: \begin{sv}
935: node(N) :- arc(_,N,_,_,_,_)
936: node(N) :- arc(_,_,N,_,_,_)
937: 
938: kleene1(X,X,_) :- node(X)
939: kleene1(X,Y,L) :- 
940:    arc(_,X,Z,L,_,_),
941:    kleene1(Z,Y,L)
942: 
943: kleene2(X,X,_) :- node(X)
944: kleene2(X,Y,L) :- 
945:    arc(_,X,Z,_,L,_), 
946:    kleene2(Z,Y,L)
947: 
948: kleene3(X,X,_) :- node(X)
949: kleene3(X,Y,L) :- 
950:    arc(_,X,Z,_,_,L), 
951:    kleene3(Z,Y,L)
952: \end{sv}
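
As an illustration of how these relations might be used, the following
sketch relates a word arc to every later word arc reachable through a
chain of zero or more intervening word arcs.  The predicate name
\predicate{word\_before} is ours, and the sketch assumes that word arcs
are stored with the type \smtt{word} in the first label field, as in the
queries below:

\begin{sv}
word_before(I,J) :- 
   arc(I,_,X,word,_,_), 
   kleene1(X,Y,word), 
   arc(J,Y,_,word,_,_)
\end{sv}

\noindent
When \predicate{X} and \predicate{Y} coincide, the base clause of
\predicate{kleene1} applies and the two words are adjacent.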
953: 
954: With this simple machinery we can start defining some annotation
955: queries.
956: 
957: %\note{Now, we want these queries to construct a graph/6 table where each
958: %tuple that takes part in the match is represented. Clearly these
959: %queries aren't going to do that since they are just logical statements
960: %about various properties, so they really combine our query and
961: %reporting steps so...}
962: % I don't think this is a problem now, given the way queries
963: % always return arc ids - SB
964: %
965: %We define a query here as a datalog predicate.  The result of a query
966: %is the set of graph/6 tuples involved in deriving the truth value for
967: %the predicate.  This set is marked somehow to group together those
968: %tuples involved in each instance of a query match. 
969: %
970: %\note{But, even this isn't enough, since we want to have the query
971: %  result annotated in some way so that we can tell which bit matched
972: %  what -- ie which is the vowel and which is the stop in this first
973: %  query. So, what do we do? }
974: 
975: \noindent
976: Find a sequence of vowel followed by stop at the phoneme level
977: (assumes suitably defined {\tt vowel} and {\tt stop} unary relations):
978: \begin{sv}
979: vowel_stop(I,J) :- 
980:    arc(I,_,Y,phoneme,V,_), 
981:    arc(J,Y,_,phoneme,S,_), 
982:    vowel(V), stop(S)
983: \end{sv}
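
To make the query concrete, consider a small hand-built fact base for
the three phonemes of the word \emph{had}.  The arc identifiers, the
node names \smtt{n1} to \smtt{n4} and the third-label constants are
invented for illustration (each arc is given its own third label so
that no association holds among them), and we assume that \smtt{ae} is
classed as a vowel and \smtt{dcl} as a stop:

\begin{sv}
arc(24,n1,n2,phoneme,hv,a24)
arc(25,n2,n3,phoneme,ae,a25)
arc(26,n3,n4,phoneme,dcl,a26)
vowel(ae)
stop(dcl)
\end{sv}

\noindent
Over these facts the only derivable tuple is
\predicate{vowel\_stop(25,26)}: arc 25 ends at node \smtt{n3}, where
arc 26 begins, and their labels satisfy \predicate{vowel} and
\predicate{stop} respectively.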
984: 
985: \noindent
986: If we do not want both the vowel and the stop, but just the vowel,
987: we could write:
988: \begin{sv}
989: vowel_stop(I) :- 
990:    arc(I,_,Y,phoneme,V,_), 
991:    arc(_,Y,_,phoneme,S,_), 
992:    vowel(V), stop(S)
993: \end{sv}
994: 
995: \noindent
Find strong words (word arcs labelled \smtt{s}) dominating vowel phonemes:
997: \begin{sv}
998: strongWrdDomVowels(I) :-
999:     arc(I,_,_,word,s,_),
1000:     arc(J,_,_,phoneme,V,_),
1001:     vowel(V),
1002:     dom(I,J)
1003: \end{sv}
1004: 
1005: \noindent
1006: Find words associated with H* tones:
1007: %% [Word!=x => Tone=H*]
1008: \begin{sv}
1009: sylHtone(I) :-
1010:     arc(I,_,_,word,_,A),
1011:     arc(_,_,_,tone,h*,A)
1012: \end{sv}
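
\noindent
The same query can also be phrased with the \predicate{assoc} relation
defined earlier, which makes the join on the third label field
explicit.  This is only an alternative sketch, and the name
\predicate{wordHtone} is ours:

\begin{sv}
wordHtone(I) :-
    arc(I,_,_,word,_,_),
    arc(T,_,_,tone,h*,_),
    assoc(I,T)
\end{sv}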
1013: 
1014: \noindent
1015: Find stop-vowel sequences dominated by words in noun phrases where the word
1016: is associated with an H* tone target.
1017: %% [[Phoneme=stop -> Phoneme=vowel] ^ [[Word!=x => Tone=H*] ^ Syntax=np]]
1018: \begin{sv}
1019: stop_vowel_seq(I,J) :-
1020:     arc(I,_,Y,phoneme,S,_), stop(S),
1021:     arc(J,Y,_,phoneme,V,_), vowel(V),
1022:     arc(W,_,_,word,_,_),
1023:     arc(N,_,_,syn,np,_),
1024:     dom(N,W), dom(W,I), dom(W,J),
    arc(T,_,_,tone,h*,_), assoc(W,T)
1026: \end{sv}
1027: 
1028: \noindent
1029: Find the intermediate phrase containing the main verb of a sentence:
1030: %% [Intermediate!=x ^ [Syntax=s ^ [Syntax=vp ^ Syntax=v]]]
1031: \begin{sv}
1032: imt_phrase(P) :-
1033:     arc(K, _, _, syn, s, _),
1034:     arc(J, _, _, syn, vp, _),
1035:     arc(I, _, _, syn, v, _),
1036:     i_dom(K,J), i_dom(J,I),
1037:     dom(P, I),
1038:     arc(P, _, _, imt, _, _)
1039: \end{sv}
1040: 
1041: \noindent
1042: Return the set of syllables between an H* and an L\% tone (inclusive).
1043: %% Emu can't do this, doesn't have kleene star
1044: \begin{sv}
1045: syls(K) :-
1046:     arc(_, _, N, tone, h*, A1),
1047:     arc(_, N, _, tone, l%, A2),
    arc(I, N1, _, syl, _, A1),
    arc(J, _, N4, syl, _, A2),
1050:     kleene1(N1, N2, syl),
1051:     arc(K, N2, N3, syl,_,_),
1052:     kleene1(N3, N4, syl)
1053: \end{sv}
1054: 
The above query shows where the datalog model breaks down.  We would
like it to return a set of sets of syllable arcs, one set for each
H*/L\% pair.  Instead it returns a single flat set.
In many cases, however, some arc participating in
the query expression can be used to recover the nested structure.  For
example, if the head of the above clause were changed from
\predicate{syls(K)} to \predicate{syls(I,K)}, then each value of
\predicate{I} would group the corresponding values of \predicate{K}
in just the right way.
1063: \arXivhack
1064: 
1065: \Section{Applying XML Query Languages to Annotations}
1066: 
1067: %\begin{figure*}[t]
1068: %\begin{tabular}{l|l}
1069: %{\scriptsize
1070: %\begin{minipage}[t]{.4\linewidth}
1071: %\begin{alltt}
1072: %<clauses>
1073: %  <clause id=1 label='s'>
1074: %    <clause id=2 label='np'>
1075: %      <word id=3 label='she'/>
1076: %    </clause>
1077: %    <clause id=4 label='vp'>
1078: %      <word id=5 label='had'/>
1079: %      <clause id=6 label='np'>
1080: %        <word id=7 label='your'/>
1081: %        <word id=8 label='dark'/>
1082: %        <word id=9 label='suit'/>
1083: %      </clause>
1084: %      <clause id=10 label='pp'>
1085: %      ...
1086: %      </clause>
1087: %    </clause>
1088: %  </clause>
1089: %</clauses>
1090: %\end{alltt}
1091: %\end{minipage}}
1092: %&
1093: %{\scriptsize
1094: %\begin{minipage}[t]{.5\linewidth}
1095: %\begin{alltt}
1096: %<tobi>
1097: %  <intonational id=20  label='L\%'>
1098: %    <intermediate id=21 label='L-'>
1099: %      <word  id=3 label='she'>
1100: %        <phoneme id=22 label='sh'>
1101: %        <phoneme id=23 label='iy'>
1102: %      </word>
1103: %      <word id=5 label='had'>
1104: %        <phoneme id=24 label='hv'/>
1105: %        <phoneme id=25 label='ae'/>
1106: %        <phoneme id=26 label='dcl'/>
1107: %      </word>
1108: %      <word id=7 label='your'/>
1109: %      <word id=8 label='dark' toneref=101/>
1110: %      <word id=9 label='suit'>
1111: %      </word>
1112: %    </intermediate>
1113: %...
1114: %  </intonational>
1115: %  <tones>
1116: %    <tone id=101 label='H*'/>
1117: %    <tone id=102 label='H*'/>
1118: %  </tones>
1119: %</tobi>
1120: %\end{alltt}
1121: %\end{minipage}}
1122: %\end{tabular}
1123: %\caption{The TIMIT Example as a Pair of XML Documents}
1124: %\label{fig:xml-eg}
1125: %\vspace{2ex}\hrule
1126: %\end{figure*}
1127: 
It is worth briefly considering the suitability of existing XML
query languages such as XML-QL \cite{xml-ql} and XQL \cite{xql}
for the domain of annotated speech.  At first glance, the problems
we face in querying annotated speech data resemble those that arise
with XML queries, since both involve a hierarchical data
model.  A number of formulations of annotation data as XML are
possible; indeed, some projects use XML/SGML-based formats
exclusively (e.g.\ MATE \sburl{mate.nis.sdu.dklpq},
LACITO \sburl{lacito.vjf.cnrs.fr/ARCHIVAG/ENGLISH.htm}).
1137: %
1138: % This is quite problematic, and I propose to omit it:
1139: % The major problem is the common occurence of multiple
1140: % intersecting hierarchies, such as that presented in the earlier
1141: % augmented TIMIT example.  In order to represent this kind of
1142: % annotation, two or more XML documents (or sub-documents) are required
1143: % which share structure, perhaps via common attributes; an example is
1144: % given in Figure~\ref{fig:xml-eg}.
1145: %
1146: XML can represent trees using properly nested tags, in the
1147: obvious way.  In order to represent multiple independent
1148: hierarchies built on top of the same material one must construct trees
1149: using IDREF pointers.  This idea was proposed by the
1150: Text Encoding Initiative \cite{TEI-P3} and recently
1151: adopted by the MATE project.  We believe this approach is
1152: vastly more expressive than necessary for representing speech
1153: annotations, and we prefer a more constrained approach having
1154: desirable computational properties with respect to creation,
1155: validation and query.
1156: 
1157: The XQL proposal \cite{xql} describes a query language which is
1158: intended to select elements from
1159: within XML documents according to various criteria; for example, the
1160: query \texttt{text/author} returns all author elements that are
1161: children of text elements.  The XQL data model ignores the order of
1162: elements within a parent element and has no obvious way to query for
1163: sequences of tokens. 
1164: 
1165: The XML-QL proposal \cite{xml-ql} provides for a data model
1166: where the order of elements is respected.  A query for a word-internal
1167: vowel-stop sequence could be expressed as follows (assuming
1168: suitably tagged annotation data for TIMIT):
1169: 
1170: \begin{sv}
1171: <word>
1172:   <phoneme label=&vowel;/>
1173:   <phoneme label=&stop;/>
1174: </word>
1175: \end{sv}
1176: 
1177: \noindent
1178: The result of this query would have the following form:
1179: 
1180: \begin{sv}
1181: <word label=had>
1182:   <phoneme label=ae/>
1183:   <phoneme label=dcl/>
1184: </word>
1185: <word label=dark>
1186:   <phoneme label=ar/>
1187:   <phoneme label=k/>
1188: </word>
1189: \end{sv}
1190: 
1191: Queries which refer to two independent
1192: hierarchies, such as syntactic and intonational phrase
1193: structure, need to use joins.
1194: For example, to find words that are simultaneously
1195: at the end of both clauses and intermediate phrases,
1196: we could have the following query:
1197: %%In Emu-QL:
1198: %%End(Intermediate,Word)=1 & End(Clause,Word)=1
1199: \begin{sv}
1200: <intermediate>
1201:   <word id=\$i></>[end()]
1202: </intermediate>
1203: 
1204: <clause>
1205:   <word id=\$i></>[end()]
1206: </clause>
1207: \end{sv}
1208: 
1209: We assume the existence of some mechanism to pick out the last child
1210: element.
1211: The ID attribute ensures that the words are the same in each
1212: part of the join.
1213: 
Perhaps either of these approaches could be made to work for a useful range
of query needs.  However, neither appears to be sufficiently
general.  For example, it is often useful to have query expressions
involving Kleene star: `select all pairs of consonants, ignoring
any intervening vowels' (CV*C).  Such queries may ignore hierarchical
structure, finding sequences across (say) word boundaries.
1220: Using regular expressions over paths, XML-QL could provide access to
1221: strings of terminal symbols ignoring intervening levels of hierarchy.
1222: Yet it does not provide regular-expression matching over those
1223: sequences.  Alternatively, sequences at each level of a hierarchy
1224: could be chained together using IDREF pointers, but it is unclear
1225: how we would manage closures over such pointer structures.
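
By contrast, over the \predicate{arc/6} relation the CV*C query can be
sketched directly in datalog, in the style of the \predicate{kleene}
predicates defined earlier.  We assume unary relations
\predicate{consonant} and \predicate{vowel} over phoneme labels; the
names \predicate{vowel\_run} and \predicate{cvc} are introduced here
for illustration only.  As before, the query returns just the two
bounding consonant arcs, not the intervening vowel sequence:

\begin{sv}
vowel_run(X,X) :- node(X)
vowel_run(X,Y) :- 
   arc(_,X,Z,phoneme,V,_), 
   vowel(V), 
   vowel_run(Z,Y)

cvc(I,J) :- 
   arc(I,_,X,phoneme,C1,_), 
   consonant(C1), 
   vowel_run(X,Y), 
   arc(J,Y,_,phoneme,C2,_), 
   consonant(C2)
\end{sv}

\noindent
Because the phoneme arcs form a chain that is independent of the word
arcs, matches may span word boundaries without any special treatment.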
1226: 
1227: % This is really problematic - so I'm omitting it:
1228: % While it may be possible to express many queries in XML-QL or similar
1229: % XML query languages, the poor fit of the XML data model to annotated
1230: % speech data ensures that many of these queries will be awkward and
1231: % unnatural.  Hence, although these efforts are informative and the
1232: % semistructured data model is clearly appropriate, the query needs we
1233: % describe here appear to fall outside the capabilities of these
1234: % existing XML query languages.  Databases of annotated speech
1235: % present an interesting new challenge for research on query languages.
1236: \arXivhack
1237: 
1238: \Section{Conclusions}
1239: 
1240: Annotated speech corpora are an essential component of speech
1241: research, and the variety of formats in which they are distributed has
1242: become a barrier to their wider adoption.  To address this issue,
1243: we have developed two data
1244: models for speech annotations which seem to be sufficiently expressive
1245: to encompass the full range of practice in this area.  We have shown
1246: how the models can be stored in a simple relational format, and how
1247: many useful queries in this domain are first-order.  However, existing
1248: query languages lack sufficient expressive power for the full range of
queries we would like to be able to express, and we hope to stimulate new
1250: research into the design of general purpose query languages for
1251: databases of annotated speech recordings.
1252: \arXivhack
1253: 
1254: \section*{Acknowledgements}
1255: 
1256: We are grateful to Peter Buneman, Mark Liberman
1257: and Gary Simons for helpful discussions concerning
1258: the research reported here.
1259: 
1260: \raggedright\small
1261: \bibliographystyle{latex8}
1262: 
1263: \begin{thebibliography}{10}\setlength{\itemsep}{-1ex}\small
1264: 
1265: \bibitem{Bird95}
1266: S.~Bird.
1267: \newblock {\em Computational Phonology: A Constraint-Based Approach}.
1268: \newblock Studies in Natural Language Processing. Cambridge University Press,
1269:   1995.
1270: 
1271: \bibitem{BirdLiberman99dtag}
1272: S.~Bird and M.~Liberman.
1273: \newblock Annotation graphs as a framework for multidimensional linguistic data
1274:   analysis.
1275: \newblock In {\em Towards Standards and Tools for Discourse Tagging --
1276:   Proceedings of the Workshop}, pages 1--10. Somerset, NJ: Association for
1277:   Computational Linguistics, 1999.
1278: \newblock [xxx.lanl.gov/abs/cs.CL/9907003].
1279: 
1280: \bibitem{BirdLiberman99}
1281: S.~Bird and M.~Liberman.
1282: \newblock A formal framework for linguistic annotation.
1283: \newblock Technical Report MS-CIS-99-01, Department of Computer and Information
1284:   Science, University of Pennsylvania, 1999.
1285: \newblock [xxx.lanl.gov/abs/cs.CL/9903003], expanded from version presented at
1286:   ICSLP-98, Sydney, revised version to appear in {\it Speech Communication}.
1287: 
1288: \bibitem{Cassidy99}
1289: S.~Cassidy.
1290: \newblock Compiling multi-tiered speech databases into the relational model:
1291:   experiments with the {Emu} system.
1292: \newblock In {\em Proceedings of the 6th European Conference on Speech
1293:   Communication and Technology}, 1999.
1294: \newblock \url{http://www.shlrc.mq.edu.au/emu/eurospeech99.shtml}.
1295: 
1296: \bibitem{CassidyHarrington96}
1297: S.~Cassidy and J.~Harrington.
1298: \newblock Emu: An enhanced hierarchical speech data management system.
1299: \newblock In {\em Proceedings of the Sixth Australian International Conference
1300:   on Speech Science and Technology}, pages 361--366, 1996.
1301: \newblock \url{http://www.shlrc.mq.edu.au/emu/}.
1302: 
1303: \bibitem{CassidyHarrington99}
1304: S.~Cassidy and J.~Harrington.
\newblock Multi-level annotation of speech: an overview of the {Emu} speech
  database management system.
\newblock Manuscript, 1999.
1308: 
1309: \bibitem{ChurchMercer93}
1310: K.~W. Church and R.~L. Mercer, editors.
1311: \newblock {\em Special Issue on Computational Linguistics Using Large Corpora},
1312:   volume 19(1,2).
1313: \newblock MIT Press, 1993.
1314: 
1315: \bibitem{xml-ql}
1316: A.~Deutsch, M.~Fernandez, D.~Florescu, A.~Levy, and D.~Suciu.
1317: \newblock {XML-QL}: A query language for {XML}, 1998.
1318: \newblock \url{http://www.w3.org/TR/NOTE-xml-ql/}.
1319: 
1320: \bibitem{TIMIT86}
1321: J.~S. Garofolo, L.~F. Lamel, W.~M. Fisher, J.~G. Fiscus, D.~S. Pallett, and
1322:   N.~L. Dahlgren.
1323: \newblock {\em The {DARPA TIMIT} Acoustic-Phonetic Continuous Speech Corpus
1324:   {CDROM}}.
1325: \newblock NIST, 1986.
1326: \newblock \url{http://www.ldc.upenn.edu/Catalog/LDC93S1.html}.
1327: 
1328: \bibitem{Marcus93}
1329: M.~P. Marcus, B.~Santorini, and M.~A. Marcinkiewicz.
1330: \newblock Building a large annotated corpus of {English}: The {Penn}
1331:   {Treebank}.
1332: \newblock {\em Computational Linguistics}, 19(2):313--30, 1993.
1333: \newblock \url{http://www.cis.upenn.edu/~treebank/home.html}.
1334: 
1335: \bibitem{MUC7}
1336: {\em Message Understanding Conference Proceedings (MUC-7)}. Science
1337:   Applications International Corporation, 1998.
1338: \newblock \url{http://www.muc.saic.com/proceedings/muc_7_toc.html}.
1339: 
1340: \bibitem{xql}
1341: J.~Robie, J.~Lapp, and D.~Schach.
1342: \newblock {XML} query language ({XQL}), 1998.
1343: \newblock \url{http://www.w3.org/TandS/QL/QL98/pp/xql.html}.
1344: 
1345: \bibitem{TEI-P3}
1346: {Text Encoding Initiative}.
1347: \newblock {\em Guidelines for Electronic Text Encoding and Interchange (TEI
1348:   P3)}.
1349: \newblock Oxford University Computing Services, 1994.
1350: \newblock \url{http://www.uic.edu/orgs/tei/}.
1351: 
1352: \bibitem{Voorhees98}
1353: E.~M. Voorhees and D.~K. Harman, editors.
1354: \newblock {\em NIST Special Publication 500-242: The Seventh Text REtrieval
1355:   Conference (TREC-7)}. NIST, Government Printing Office, 1998.
1356: \newblock [trec.nist.gov/pubs/trec7/t7\_proceedings.html].
1357: 
1358: \end{thebibliography}
1359: 
1360: \end{document}
1361: