
1: \documentclass[twocolumn,10pt]{article}

2:

3: \usepackage{latex8,times,latexsym,epsfig,fancyheadings,alltt,a4wide}

4:

5: \pretolerance 250

6: \tolerance 500

7: \hyphenpenalty 100

8: \exhyphenpenalty 100

9: \doublehyphendemerits 7500

10: \finalhyphendemerits 7500

11: \brokenpenalty 10000

12: \lefthyphenmin 2

13: \righthyphenmin 3

14: \widowpenalty 10000

15: \clubpenalty 10000

16: \displaywidowpenalty 10000

17: \looseness 1

18:

19: \usepackage{url}

20: \def\sburl#1{[\url{#1}]}

21:

22: \def\expr#1{\texttt{#1}}

23: \def\predicate#1{\texttt{#1}}

24:

25: % the sequence operator in an emu query

26: \newcommand{\queryseq}{-$>$\ }

27: % the domination operator

28: \newcommand{\querydom}{\^{}\ }

29: % the association operator

30: \newcommand{\queryassoc}{$=>$\ }

31: % and disjunction

32: \newcommand{\queryor}{$|$}

33:

34: \pagestyle{empty}

35:

36: \def\smtt#1{{\small\tt #1}}

37: \def\note#1{{\bf (#1)}}

38: \newenvironment{sv}{\small\begin{alltt}}{\end{alltt}\normalsize}

39: \def\mb#1{{\mbox{\scriptsize #1}}}

40: \def\rn#1#2{\delta_{#1\rightarrow #2}}

41: \def\arXivhack{\vspace{-6pt}}

42:

43: \title{Querying Databases of Annotated Speech}

44: \author{Steve Cassidy\\

45: Department of Linguistics\\

46: Macquarie University\\

47: Sydney, NSW 2109,\\

48: Australia\\

49: \smtt{steve.cassidy@mq.edu.au}\\

50: \and

51: Steven Bird\\

52: Linguistic Data Consortium,\\

53: University of Pennsylvania, \\

54: 3615 Market St, Suite 200,  \\

55: Philadelphia, PA 19104-2608, USA \\

56: \smtt{steven.bird@ldc.upenn.edu}

57: }

58:

59: %\date{\today}

60:

61: \begin{document}

62:

63: \maketitle

64: \thispagestyle{empty}

65:

66: \begin{abstract}

67:   Annotated speech corpora are databases consisting of signal data

68:   along with time-aligned symbolic `transcriptions'.  Such databases

69:   are typically multidimensional, heterogeneous and dynamic.  These

70:   properties present a number of tough challenges for representation

71:   and query.  The temporal nature of the data adds a further layer

72:   of complexity.  This paper presents and harmonises two independent

73:   efforts to model annotated speech databases, one at Macquarie

74:   University and one at the University of Pennsylvania.

75:   Various query languages are described, along

76:   with illustrative applications to a variety of analytical problems.

77:   The research reported here forms a part of several ongoing projects

78:   to develop platform-independent open-source tools for creating,

79:   browsing, searching, querying and transforming linguistic databases,

80:   and to disseminate large linguistic databases over the internet.

81: \end{abstract}

82:

83: \Section{Databases of Annotated Speech Recordings}

84:

85: Annotated corpora have been an essential component of research and

86: development in language-related technologies for some years.

87: Text corpora have been used for developing information retrieval

88: and summarisation software (e.g. MUC \cite{MUC7}, TREC \cite{Voorhees98}),

89: automatic taggers and parsers and machine translation systems

90: \cite{ChurchMercer93}.  In a similar way, annotated

91: speech corpora have proliferated and have found uses across a rapidly

92: expanding set of languages, disciplines and technologies

93: \sburl{www.ldc.upenn.edu/annotation/}.

94: Over the last 7 years, the Linguistic Data Consortium (LDC)

95: has published over 150 text and speech databases

96: \sburl{www.ldc.upenn.edu/Catalog/}.

97:

98: Typically, such databases are specified at the level of file

99: formats.  Linguistic content is annotated with a variety of

100: tags, attributes and values, with a specified syntax and semantics.

101: Tools are developed for each new format and linguistic domain

102: on an ad hoc basis.  These systems are

103: akin to the databases of the 1960s.  There is a physical

104: representation along with a hand-crafted program offering a

105: single view on the data.  Recently, the authors have shown how

106: the three-level architecture and the relational model can be

107: applied to annotated speech databases

108: \cite{BirdLiberman99,Cassidy99}.  The goal of this paper is

109: to illustrate our two approaches and to describe ongoing research

110: on query algebras.

111:

112: Before presenting the models we give an example of a collection

113: of speech annotations.  This illustrates the diversity of

114: the physical formats and gives an idea of the challenge involved

115: in providing a general-purpose logical characterisation of the

116: data.  The Boston University Radio Speech Corpus consists of

117: 7 hours of radio news stories

118: \sburl{www.ldc.upenn.edu/Catalog/LDC96S36.html}.

119: The annotations include four

120: types of information: orthographic transcripts, broad phonetic

121: transcripts (including main word stress), and two kinds of prosodic

122: annotation, all time-aligned to the digital audio files. The two kinds

123: of prosodic annotation implement the system known as ToBI --

124: Tones and Break Indices

125: \sburl{www.ling.ohio-state.edu/phonetics/E_ToBI/}.

126: We have added three further annotations: coreference annotation and

127: named entity annotation in the style of MUC-7

128: \sburl{www.muc.saic.com/proceedings/muc_7_toc.html}, and

129: syntactic structures in the style of the Penn TreeBank \cite{Marcus93}.

130: Fragments of the physical data are shown in Figure~\ref{fig:bu-speech}.

131:

132: \begin{figure*}

133: {\scriptsize\setlength{\tabcolsep}{.5\tabcolsep}

134: \begin{tabular}{l|l|l}

135: \begin{minipage}[t]{.325\linewidth}

136: {\small Coreference Annotation}

137:

138: {\tiny

139: \begin{alltt}

140: <COREF ID="2" MIN="woman">

141:   This woman</COREF>

142: receives three hundred dollars

143: a month under

144: <COREF ID="5">

145:   General Relief</COREF>, plus

146: <COREF ID="16"

147:        MIN="four hundred dollars">

148:   four hundred dollars a month in

149:   <COREF ID="17"

150:          MIN="benefits" REF="16">

151:     A.F.D.C. benefits</COREF>

152: </COREF> for

153: <COREF ID="9" MIN="son">

154:   <COREF ID="3" REF="2">

155:     her</COREF> son

156: </COREF>, who is

157: <COREF ID="10" MIN="citizen" REF="9">

158:   a U.S. citizen</COREF>.

159: <COREF ID="4" REF="2">

160:   She</COREF>'s among

161: <COREF ID="18" MIN="aliens">

162:   an estimated five hundred illegal

163:   aliens on

164:   <COREF ID="6" REF="5">

165:     General Relief</COREF>

166:   out of

167:   <COREF ID="11" MIN="population">

168:     <COREF ID="13" MIN="state">

169:       the state</COREF>'s

170:     total illegal immigrant

171:     population of

172:     <COREF ID="12" REF="11">

173:       one hundred thousand

174:     </COREF>

175:   </COREF>

176: </COREF>.

177: <COREF ID="7" REF="5">

178:   General Relief</COREF>

179: is for needy families and

180: unemployable adults who

181: \end{alltt}}

182: \end{minipage}

183: &

184: \begin{minipage}[t]{.25\linewidth}

185: {\small Named Entity\\ Annotation}

186:

187: {\tiny

188: \begin{alltt}

189: This woman receives

190: <b_numex TYPE="MONEY">

191:   three hundred dollars

192: <e_numex>

193: a month under General

194: Relief, plus

195: <b_numex TYPE="MONEY">

196:   four hundred dollars

197: <e_numex>

198: a month in A.F.D.C.

199: benefits for her

200: son, who is a

201: <b_enamex TYPE="LOCATION">

202:   U.S.

203: <e_enamex>

204: citizen. brth She's among

205: an estimated five hundred

206: illegal aliens on General

207: Relief brth out of the

208: state's total illegal

209: immigrant population of

210: one hundred thousand. brth

211: General Relief is for

212: needy families and

213: unemployable adults brth

214: who don't qualify for other

215:  public assistance. brth

216: <b_enamex TYPE="ORGANIZATION">

217:   Welfare Department

218: <e_enamex>

219: spokeswoman

220: <b_enamex TYPE="PERSON">

221:   Michael Reganburg

222: <e_enamex>

223: brth says the state will

224: save about

225: <b_numex TYPE="MONEY">

226:   one million dollars

227: <e_numex>

228: a year if illegal aliens

229: are denied General Relief.

230: \end{alltt}}

231: \end{minipage}

232: &

233: \begin{minipage}[t]{.325\linewidth}

234: {\small Penn Treebank Annotation}

235:

236: {\tiny

237: \begin{alltt}

238: ((S

239:   (NP-SBJ This woman)

240:    (VP receives

241:     (NP

242:      (NP

243:       (NP (QP three hundred) dollars)

244:       (NP-ADV a month)

245:       (PP under

246:        (NP General Relief))) , plus

247:      (NP

248:       (NP (QP four hundred) dollars

249:       )

250:       (NP-ADV a month)

251:       (PP in

252:        (NP A.F.D.C. benefits))))

253:     (PP for

254:      (NP

255:       (NP her son) ,

256:       (SBAR (WHNP-1 who)

257:        (S (NP-SBJ *T*-1)

258:         (VP is

259:          (NP-PRD a U.S. citizen)))))))

260:   .

261: ))

262: ((S

263:  (NP-SBJ She)

264:  (VP 's

265:   (PP-PRD among

266:    (NP (NP an estimated

267:     (QP five hundred) illegal aliens)

268:    (PP on

269:     (NP General Relief))

270:    (PP out of

271:     (NP

272:      (NP

273:       (NP the state 's)

274:        total illegal immigrant population)

275:        (PP of

276:         (NP

277:          (QP one hundred thousand))))))))

278:   .

279: \end{alltt}}

280: \end{minipage}

281: \end{tabular}

282:

283: \vspace*{2ex}\hrule\vspace*{2ex}

284:

285: \begin{tabular}{l|l|l|l}

286: \begin{minipage}[t]{.21\linewidth}

287: {\small Word-Level Annotation}

288:

289: \begin{alltt}

290: 0.320000 This

291: 0.620000 woman

292: 1.120000 receives

293: 1.370000 three

294: 1.670000 {hundred

295: 2.020000 }dollars

296: 2.060000 a

297: 2.450000 month

298: 2.740000 under

299: 3.280000 General

300: 3.800000 Relief

301: 4.310000 plus

302: 4.520000 four

303: 4.800000 hundred

304: 5.160000 dollars

305: 5.190000 a

306: 5.480000 month

307: 5.610000 in

308: 6.340000 A.F.D.C.

309: 6.870000 benefits

310: 7.060000 for

311: 7.190000 her

312: 7.620000 son

313: 7.830000 who

314: 7.970000 is

315: 8.020000 a

316: \end{alltt}

317: \end{minipage}

318: &

319: \begin{minipage}[t]{.25\linewidth}

320: {\small Syllable Annotation}

321:

322: \begin{alltt}

323: H#   0    2

324: H#   2    3

325: >endsil

326: DH   5    14   4.182398

327: IH   19   6   -0.184139

328: S    25   8   -0.387113

329: >This

330: W    33   6   -0.495798

331: UH+1 39   3   -0.792806

332: M    42   7    0.042605

333: >

334: EN   49   14   0.395379

335: >woman

336: R    63   3   -0.996359

337: IY   66   7   -0.658371

338: >

339: S    73   12   0.865892

340: IY+1 85   13   0.815127

341: V    98   9    0.815878

342: Z    107  6   -0.563102

343: >receives

344: TH   113  9    0.506469

345: R    122  5   -0.359288

346: IY+1 127  11   0.323961

347: >three

348: HH   138  3   -0.905714

349: \end{alltt}

350: \end{minipage}

351: &

352: \begin{minipage}[t]{.33\linewidth}

353: {\small Tonal Annotation}

354:

355: \begin{alltt}

356: 0.373684 HiF0

357: 0.493698 H*

358: 0.915000 !H*

359: 1.100000 !H-

360: 1.325000 L+H*

361: 1.389472 HiF0

362: 1.716865 L*

363: 2.178711 !H*

364: 2.434735 L-L%

365: 2.969376 H*

366: 3.552627 HiF0

367: 3.630000 H* ; !HL%, maybe LL% ?

368: 3.770074 H-L%

369: 4.440000 H*

370: 4.478946 HiF0

371: 5.330000 L*

372: 5.445000 L-H%

373: 5.709989 H*

374: 6.300000 H*

375: 6.331575 HiF0

376: 6.740000 L-H%

377: 7.336837 HiF0

378: 7.402120 H*

379: 7.607943 L-L%

380: 8.301393 H*

381: 8.510248 HiF0

382: 10.105260 HiF0

383: \end{alltt}

384: \end{minipage}

385: &

386: \begin{minipage}[t]{.2\linewidth}

387: {\small Part-of-speech\\ Annotation}

388:

389: \begin{alltt}

390: This DT

391: woman NN

392: receives VBZ

393: three CD

394: hundred CD

395: dollars NNS

396: a DT

397: month NN

398: under IN

399: General NP

400: Relief, NP

401: plus CC

402: four CD

403: hundred CD

404: dollars NNS

405: a DT

406: month NN

407: in IN

408: A.F.D.C. NP

409: benefits NNS

410: for IN

411: her PP\$

412: son, NN

413: who WP

414: is VBZ

415: \end{alltt}

416: \end{minipage}

417: \end{tabular}}

418:

419: \caption{Multiple Annotations of the Boston University Radio Speech Corpus}\label{fig:bu-speech}

420: \vspace*{2ex}\hrule

421: \end{figure*}

422:

423: %\begin{figure}

424: %\centerline{\epsfig{figure=bu4.ps,width=\linewidth}}

425: %\vspace{3cm}

426: %\caption{Visualization for BU Example}\label{fig:bu-ag}

427: %\vspace*{2ex}\hrule

428: %\end{figure}

429:

430:

431: Coreference annotation (Figure~\ref{fig:bu-speech}, top left) associates a

432: unique identifier with each noun phrase, along with a reference attribute which

433: links each pronoun to its antecedent.  The set of coreferring

434: expressions is considered to be an equivalence class.  Named-entity

435: annotation (top centre) identifies and classifies numerical and name

436: expressions.  Penn Treebank annotation provides a syntactic parse of

437: each sentence.  The word-level annotation (bottom left) gives the end

438: time of each word (an offset in seconds into the associated signal data).

439: The syllable annotation gives the Arpabet phonetic symbols

440: (see \sburl{www.ldc.upenn.edu/doc/timit/phoncode.doc}).

441: The tonal annotation provides time points and intonational units, and the

442: part of speech annotation (bottom right) specifies the syntactic

443: category of each word.  This is but a small sample of the bazaar of

444: data formats.

445: \arXivhack

446:

447: \Section{Data Models for Speech Databases}

448:

449: Two database models for multi-layered speech annotations have been

450: developed by the authors.  The Emu model (Macquarie) organises the data

451: primarily in terms of its hierarchical structure, while the annotation

452: graph model (Penn) foregrounds the temporal structure.  In separate

453: work we demonstrate the expressive equivalence of the two models

454: \cite{BirdLiberman99,CassidyHarrington99}.  Here we give a brief

455: overview of both models. In the remainder of this paper we will

456: consider mainly the annotation graph data model, while the Emu system

457: serves as an example of a working speech database system.

458:

459: \SubSection{The Emu model}

460:

461: The Emu speech database system \sburl{www.shlrc.mq.edu.au/emu}

462: \cite{CassidyHarrington96,CassidyHarrington99} provides tools for creation, query

463: and analysis of data from annotated speech databases.  Emu is

464: implemented as a core C++ library and a set of extensions to the Tcl

465: scripting language which provide a set of basic operations on speech

466: annotations.  Emu provides a flexible annotation model into which a

467: number of existing label file formats can be read.

468:

469: The Emu annotation model is based on a set of \emph{levels} which

470: represent different types of linguistic data such as words, phonemes or

471: pitch events.  Each level contains a set of \emph{tokens} which have

472: one or more \emph{labels} and optionally a start and end time relative

473: to an associated speech signal.  Within a level, tokens are stored as a

474: partial order representing their sequence in the annotation: each token

475: may have zero or more previous and next tokens.  The partial ordering

476: must respect timing information if it is present in the tokens: that

477: is, a token cannot follow a token with a later start time.

478:

479: Within and between levels, tokens may be related by either

480: \emph{domination} or \emph{association} relations.  Domination

481: relations relate a parent token to an ordered sequence of constituent

482: child tokens and imply that the start and end times of the parent could

483: be inferred from those of the children. Association relations have no

484: in-built semantics and can be used for any application specific

485: relation, such as that between a word and a tone target which denotes

486: the point at which word stress is realised

487: (Figure~\ref{fig:emu-timit}).  Relations may be defined between any

488: pair of levels which allows Emu to handle intersecting hierarchies such

489: as that illustrated in Figure~\ref{fig:emu-timit}.

490:

491: \begin{figure*}[tbp]

492: \centerline{\epsfig{file=emu-timit,width=0.75\linewidth}}

493: \caption{An example utterance from the TIMIT database which has been

494:   augmented with both a syntactic annotation and a ToBI style

495:   intonational annotation.  The names of the levels are shown on the

496:   left, the Word level has been duplicated to show the links to both

497:   the syntactic and intonational hierarchies. The single Tone event H*

498:   is associated with the word `dark'. Time information at the phoneme

499:   level is used to derive times for all higher levels.}

500: \label{fig:emu-timit}

501: \vspace*{2ex}\hrule

502: \end{figure*}

503:

504:

505: \SubSection{The annotation graph model}

506:

507: A second general purpose model supporting multiple independent

508: hierarchical transcriptions of the same signal data is known as the

509: {\it annotation graph} \cite{BirdLiberman99dtag,BirdLiberman99}.

510: This model forms the heart of a joint initiative between LDC, NIST

511: \sburl{www.nist.gov} and MITRE \sburl{www.mitre.org}

512: to develop an architecture and tools for linguistic

513: analysis systems (ATLAS), and an NSF-sponsored project between

514: LDC, the Penn database group, and the CMU Psychology and Informedia

515: departments, to develop a multimodal database of communicative

516: interaction called Talkbank \sburl{www.talkbank.org}.

517:

518: Annotation graphs are labelled DAGs with time references on some of the

519: nodes.  Bird and Liberman have demonstrated that annotation graphs are

520: sufficiently expressive to encompass the full range of current speech

521: annotation practice.  A simple example of an annotation graph is shown

522: in Figure~\ref{fig:ag-timit}, for a corpus known as TIMIT \cite{TIMIT86}.

523: Annotation graphs (AGs) have the following structure.

524: Let $L = \bigotimes L_i$ be the label data which occurs on the arcs of

525: an AG.  The nodes $N$ of an AG reference signal data by virtue of a

526: function mapping nodes to time offsets $T$.  AGs are now defined as

527: follows:

528:

529: \newtheorem{defn}{Definition}

530: \newtheorem{ex}{Example}

531:

532: \begin{defn}

533: An \textbf{annotation graph} $G$ over a label set $L$ and a

534: timeline $T$ is a 3-tuple

535: $\left< N, A, \tau \right>$ consisting of a node set $N$,

536: a collection of arcs $A$ labelled with elements of $L$,

537: and a time function $\tau$, which satisfies the following conditions:

538:

539: \begin{enumerate}\setlength{\itemsep}{0pt}

540:

541: \item $\left< N, A \right>$ is an acyclic digraph

542:   labeled with elements of $L$, and

543:   containing no nodes of degree zero;

544:

545: \item $\tau: N \rightharpoonup T$,

546:   such that, for any path from node $n_1$ to $n_2$ in $A$,

547:   if $\tau(n_1)$ and $\tau(n_2)$ are defined, then

548:   $\tau(n_1) \leq \tau(n_2)$;

549:

550: \end{enumerate}

551: \end{defn}
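As a concrete check of condition 2, consider the node times listed in
Figure~\ref{fig:graph-table} (this is our reading of that table, not an
additional result).  The path through nodes 1, 19 and 20 gives
\[
  \tau(1) = 2360 \;\leq\; \tau(19) = 13650 \;\leq\; \tau(20) = 13650 .
\]
Because the condition uses $\leq$ rather than $<$, an arc whose two
endpoints carry the same time, such as the tone arc from node 19 to
node 20, is permitted.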

552:

553: \begin{figure*}[tbp]

554: \centerline{\epsfig{file=ag-timit,width=\linewidth}}

555: \caption{TIMIT Graph Structure}\label{fig:ag-timit}

556: \vspace*{2ex}\hrule

557: \end{figure*}

558: %% note I've modified the emu example to associate H* with 'dark'

559: %% instead of aa -- I think this is fits ToBI better

560:

561: Note that AGs may be disconnected or empty, and that they must

562: not have orphan nodes.  The AG corresponding to the Emu annotation

563: structure in Figure~\ref{fig:emu-timit}, for the first five

564: words of a TIMIT annotation, is given in Figure~\ref{fig:ag-timit}.

565: The arc types are interpreted as follows:

566: \expr{S} -- syntax;

567: \expr{W} -- word;

568: \expr{P} -- phoneme;

569: \expr{T} -- tone;

570: \expr{Imt} -- intermediate phrase;

571: \expr{Itl} -- intonational phrase.

572: \arXivhack

573:

574: \Section{Annotations as Relational Tables}

575:

576: Annotation data expressed in either the Emu or annotation graph data

577: models can be trivially recast as a set of relational tables

578: \cite{Cassidy99}.  For the purposes of this paper it is instructive to

579: consider the relational form of annotation data in order to explore the

580: requirements for a query language for these databases.

581:

582: An annotation graph can be represented as a pair of tables, for the arc

583: and time relations.  The arc relation is a six-tuple

584: containing an arc id, a source node id, a target node id, and three

585: labels taken from the sets $L_1, L_2, L_3$ respectively.  The choice of

586: three label positions is somewhat arbitrary, but it seems to be

587: both necessary and sufficient for the various annotation structures

588: considered here.

589:

590: We let $L_1$ be the set of types of transcript information

591: (e.g.\ `word', `syllable', `phoneme'), and let

592: $L_2$ be the substantive transcript element (e.g.\ particular

593: words, phonetic symbols, and so on).  We let $L_3$ be the names

594: of equivalence classes, used here to model so-called

595: `phonological association'.  (This kind of association is

596: discussed in depth in \cite{Bird95}.)

597: Let $T$ be the set

598: of non-negative integers, the sample numbers.

599: Figure~\ref{fig:graph-table} gives an example for the TIMIT data of

600: Figures~\ref{fig:emu-timit}, \ref{fig:ag-timit}.
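Read as datalog facts, a few rows of these tables might be written as in
the sketch below.  The quoting of upper-case labels and the constant
\smtt{nil} for an empty $L_3$ field are conventions assumed here for
illustration only; they are not prescribed by either model.
\begin{sv}
arc(11,10,11,'P',aa,nil)
arc(21,8,14,'W',dark,1)
arc(31,19,20,'T','H*',1)
time(8,11077)
time(14,16626)
time(19,13650)
\end{sv}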

601:

602: \begin{figure*}[tbp]

603: {\scriptsize

604: \begin{minipage}{\textwidth}

605: \begin{tabular}[t]{c|cccccc}

606: {\it Arc} &

607:  $id$ &

608:  $X$  &

609:  $Y$  &

610:  $L_1$ &

611:  $L_2$ &

612:  $L_3$ \\

613: \cline{2-7}

614:

615: &1 & 0 & 1 & P & h\# & \\

616: &2 & 1 & 2 & P & sh  & \\

617: &3 & 2 & 3 & P & iy  & \\

618: &4 & 3 & 4 & P & hv  & \\

619: &5 & 4 & 5 & P & ae  & \\

620: &6 & 5 & 6 & P & dcl & \\

621: &7 & 6 & 7 & P & y   & \\

622: &8 & 7 & 8 & P & axr & \\

623: &9 & 8 & 9 & P & dcl & \\

624: &10 & 9 & 10 & P & d & \\

625: &11 & 10 & 11 & P & aa & \\

626: &12 & 11 & 12 & P & r & \\

627: &13 & 12 & 13 & P & kcl & \\

628: &14 & 13 & 14 & P & k & \\

629: &15 & 14 & 15 & P & s & \\

630: &16 & 15 & 16 & P & uw & \\

631: &17 & 16 & 17 & P & q &

632: \end{tabular}\hfil

633: \begin{tabular}[t]{c|cccccc}

634: {\it Arc} &

635:  $id$ &

636:  $X$  &

637:  $Y$  &

638:  $L_1$ &

639:  $L_2$ &

640:  $L_3$ \\

641: \cline{2-7}

642:

643: &18 & 1 & 3 & W & she & \\

644: &19 & 3 & 6 & W & had & \\

645: &20 & 6 & 8 & W & your & \\

646: &21 & 8 & 14 & W & dark & 1 \\

647: &22 & 14 & 17 & W & suit & \\

648: &23 & 1 & 18 & S & S & \\

649: &24 &3 & 18 & S & VP & \\

650: &25 &1 & 3 & S & NP & \\

651: &26 &3 & 6 & S & V & \\

652: &27 &6 & 17 & S & NP & \\

653:

654: &28 &1 & 17 & Imt & L- & \\

655: &29 &1 & 18 & Itl & L\% & \\

656:

657: &30 &1 & 19 & T & 0 & \\

658: &31 &19 & 20 & T & H* & 1

659: \end{tabular}\hfil

660: \begin{tabular}[t]{c|cc}

661: {\it Time} &

662:  $N$ &

663:  $T$\\

664: \cline{2-3}

665: & 0 & 0    \\

666: & 1 & 2360 \\

667: & 2 & 3270 \\

668: & 3 & 5200 \\

669: & 4 & 6160 \\

670: & 5 & 8720 \\

671: & 6 & 9680 \\

672: & 7 & 10173\\

673: & 8 & 11077\\

674: & 9 & 12019\\

675: & 10 & 12257\\

676: & 11 & 14120\\

677: & 12 & 15240\\

678: & 13 & 16200\\

679: & 14 & 16626\\

680: & 15 & 18480\\

681: & 16 & 20685\\

682: & 17 & 22179\\

683: & 18 & 57040\\

684: & 19 & 13650\\

685: & 20 & 13650

686: \end{tabular}

687: \end{minipage}

688:     \caption{The Arc and Time Relations}

689:     \label{fig:graph-table}

690: }

691: \vspace{2ex}\hrule

692: \end{figure*}

693:

694: We form the transitive closure of the (unlabeled) graph relation to

695: define a structural (graph-wise) precedence relation using a datalog program:

696:

697: \begin{sv}

698: s_prec(X,X) :-

699: s_prec(X,Y) :- arc(_,X,Y,_,_,_)

700: s_prec(X,Y) :- s_prec(X,Z),

701:                arc(_,Z,Y,_,_,_)

702: \end{sv}
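As an illustration, evaluating this program against the arc relation of
Figure~\ref{fig:graph-table} should yield facts such as those listed
below.  The listing is our own reading of the table, given as a sketch;
the comments name the arcs involved.
\begin{sv}
% derivable
s_prec(1,3)    % arcs 2 and 3, or arc 18
s_prec(0,17)   % chain of phoneme arcs 1..17
s_prec(19,20)  % arc 31
% not derivable
s_prec(17,18)  % no arc leaves node 17
s_prec(10,19)  % node 19 is entered only
               % from node 1 (arc 30)
\end{sv}
Structural precedence is therefore only a partial order on the nodes:
nodes 17 and 18 are unordered by it, even though both carry times.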

703:

704: Now we further define a temporal precedence relation, where {\tt leq} is

705: the $\leq$ relation (minimally defined on the times used by the graph):

706:

707: \begin{sv}

708: t_prec0(X,Y) :- time(X,T1),

709:                 time(Y,T2),

710:                 leq(T1,T2)

711:

712: t_prec(X,Y) :-  t_prec0(X,Y)

713: t_prec(X,Y) :-  t_prec(X,Z),

714:                 t_prec0(Z,Y)

715: \end{sv}
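For the times of Figure~\ref{fig:graph-table}, and taking \smtt{leq} to
compare the time values of the two nodes, we would expect facts such as
the following (again a reading of the table rather than a new result):
\begin{sv}
t_prec(10,19)  % 12257 <= 13650
t_prec(19,11)  % 13650 <= 14120
t_prec(17,18)  % 22179 <= 57040
t_prec(19,20)  % 13650 <= 13650
t_prec(20,19)  % equal times
\end{sv}
The tone node 19 is thus temporally ordered between nodes 10 and 11, the
endpoints of the \smtt{aa} arc, although no path of arcs connects it to
either of them.  Nodes with equal times precede each other, so the
relation is not antisymmetric.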

716: \arXivhack

717:

718: \Section{Exploring Annotated Linguistic Databases}

719:

720: \SubSection{General architecture}

721:

722: In our experience with the analysis of linguistic databases, we have

723: found a recurrent pattern of use having three components

724: which we will call query, report generation, and analysis.

725:

726: The query system proper can be viewed as a function from annotation

727: graphs to sets of subgraphs, i.e. those meeting some (perhaps complex)

728: condition.

729: The report generation phase is able to access these query

730: results, but also the signals underlying the annotations.  For

731: example, the report generation phase can calculate such things as

732: `mean F$_2$ in signal S during time interval $(t_1,t_2)$.'

733: Each hit constitutes an `observation' in the statistician's sense,

734: and we extract a vector of specified values for each observation, to

735: be passed along to the analysis system.

736: The analysis phase is then some general-purpose data

737: crunching system such as Splus or Matlab.

738:

739: This architecture saves us from having to incorporate all possible

740: calculations over annotated signals into the query language.

741: The report generation phase can perform such calculations, as well

742: as compute properties of the annotation data itself.

743: This seems to simplify the query system a good deal;

744: now things like `count the number of syllables to the end of the

745: current phrase' (which we do need to be able to do) are tasks for the

746: report generator, not the query system proper.

747:

748: In general, the result of a query is a set of sub-graphs, each of which

749: forms one matching instance.  If we use the relational model proposed

750: above, these would be returned as a result table having the

751: same structure as the arc relation of Figure~\ref{fig:graph-table},

752: but containing just the tuples which took part in each matching instance.

753: We are then faced with the problem of how to differentiate the matching

754: instances.  For example, if we wished to collect together the word

755: labels for the query `find all words dominated by noun phrases', we would need

756: some way of treating each sub-graph separately.  Hence, we would prefer

757: the result to be a set of tables rather than a single table containing

758: all matching tuples.

759:

760: In a sense, then, the only role of the query is to define an iterator

761: for the report generator over a set of sub-graphs of the overall

762: annotation graph.

763:

764: \subsubsection*{The Emu query language}

765:

766: The Emu query language uses simple conditions on token labels which

767: match only tokens at a specified level, for example:

768: \texttt{Phonetic=A|I|O|U|E|V}.  These conditions can be combined by

769: sequence, domination or association operators to constrain the

770: relational structure of the tokens of interest.  Examples of each are:

771:

772:   Find a sequence of \texttt{vowel} followed by \texttt{stop} at the

773:   phoneme level:\\

774:       \texttt{[Phoneme=vowel \queryseq Phoneme=stop]}

775:

776:   Find Words not labelled \texttt{x} dominating \texttt{vowel}

777:   phonemes:\\

778:       \texttt{[Word!=x \querydom Phoneme=vowel]}

779:

780:   Find words associated with \texttt{H*} tones:\\

781:       \texttt{[Word!=x \queryassoc Tone=H*]}

782:

783: The \texttt{Word!=x} condition is intended to match any word, since the

784: query language has no construct that matches an arbitrary label string.

785:

786: Each query matches either a token or, in the case of the sequence

787: query, a sequence of tokens.  The result of a domination or association

788: query is the result of the left hand side of the bracketed term; this

789: can be changed by marking the right hand side term with a hash (\texttt{\#}).

790: Compound queries can be arbitrarily nested to specify complex

791: constraints on tokens.  As an example, the following query finds

792: sequences of stops and vowels dominated by strong syllables where the

793: vowel is associated with an \texttt{H*} tone target; the result is a

794: list of the vowel labels with associated start and end times.

795: \begin{center}

796: \begin{sv}

797:  [Syllable=S ^

798:      [Phoneme=stop ->

799:          [Phoneme=vowel => Tone=H*]]]

800: \end{sv}

801: \end{center}

802:

803: The result of an Emu query is a table with one entry per matching

804: token:

805: \begin{sv}

806: database:timit

807: query:Phoneme!=x

808: type:segment

809: #

810: h#      0       147.5   fjsp0:sa1

811: sh      147.5   232.5   fjsp0:sa1

812: iy      232.5   325     fjsp0:sa1

813: hv      325     385     fjsp0:sa1

814: ...

815: \end{sv}

816: This table is used to extract any of the time-series data

817: associated with the database, an operation usually carried out from an

818: analysis environment such as Splus or XlispStat.  Emu provides

819: libraries of analysis functions for these environments which

820: facilitate, for example, mapping signal processing operations over each

821: token in a query result or overlaying plots of the time series data for

822: each token.

823:

824: Although this query system has proved useful and useable in the

825: environment of acoustic phonetics research, it is now evident that

826: there are a number of shortcomings which prevent its wider use. The

827: query syntax is unable to express some queries, such as those involving

828: disjunction or optional elements, and the query result is only really

829: useful for data extraction.  It is for these reasons that we are now

830: looking more formally at the requirements for a query language for

831: annotation data.

832:

833: \SubSection{A query language on annotation graphs}

834:

835: A high-level query language for annotation graphs, founded on

836: an interval-based tense logic, is currently being developed and

837: will be reported in a later version of this paper.

838:

839: Here we describe a variety of useful queries on annotation

840: graphs and formulate them as datalog programs.  As we shall see,

841: it turns out that datalog is insufficiently expressive for the

842: range of queries we have in mind.  Finding a more expressive

843: yet tractable query language is the focus of ongoing research.

844:

845: A number of simple operations, extending our two relations

846: \predicate{arc/6} and \predicate{time/2},

847: will be necessary for succinct queries.

848: The first and most obvious is for hierarchy.  Observe in

849: Figure~\ref{fig:ag-timit} that there is a notion of structural

850: inclusion defined by the arcs.  We formulate this as follows:

851:

852: \begin{sv}

853: s_incl(I,J) :-

854:    arc(I,W,Z,_,_,_),

855:    arc(J,X,Y,_,_,_),

856:    s_prec(W,X), s_prec(Y,Z)

857: \end{sv}
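Applied to the arcs of Figure~\ref{fig:graph-table}, this definition
should behave as sketched below; the arc ids are those of the table, and
the listing is an illustration rather than output from an implementation.
\begin{sv}
% derivable
s_incl(21,11)  % dark (8,14) includes
               % aa (10,11)
s_incl(27,21)  % NP (6,17) includes
               % dark (8,14)
% not derivable
s_incl(21,31)  % the tone arc (19,20) is
               % not reachable from node 8
\end{sv}
The last case shows that a tone arc is not structurally included in the
word it belongs to, so a separate mechanism is needed to relate the two.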

858:

859: Now, since \predicate{s\_prec} is reflexive, so is \predicate{s\_incl}.

860: Observe that nodes 3 and 6 in Figure~\ref{fig:ag-timit} are connected

861: by both an \smtt{S/V} arc and a \smtt{W/had} arc.  The syntactic

862: verb arc \smtt{S/V} should dominate the word arc \smtt{W/had}, but not

863: vice versa.  Therefore we need to have a hierarchy defined over the

864: types.  We achieve this with a (domain-specific) ordering on the

865: type names:

866:

867: \begin{sv}

868: type_hierarchy(word,syl)

869: type_hierarchy(syl,seg)

870: \end{sv}

871:

872: \noindent

873: Now dominance is expressed by the predicate:

874:

875: \begin{sv}

876: dom(I,J) :-

877:    arc(I,_,_,L1,_,_),

878:    arc(J,_,_,L2,_,_),

879:    type_hierarchy(L1,L2),

880:    s_incl(I,J)

881: \end{sv}
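
Note that \predicate{dom} consults \predicate{type\_hierarchy} directly, so it
relates only types that are adjacent in the ordering.  If dominance is also
wanted across non-adjacent levels (for example \smtt{word} over \smtt{seg}),
one option is to close the ordering transitively.  The following is a sketch
only; the names \predicate{type\_dom} and \predicate{dom\_any} are hypothetical
and are not part of the model described above:

\begin{sv}
% transitive closure of the type ordering
type_dom(L1,L2) :- type_hierarchy(L1,L2)
type_dom(L1,L3) :-
   type_hierarchy(L1,L2),
   type_dom(L2,L3)

% dominance across any number of levels
dom_any(I,J) :-
   arc(I,_,_,L1,_,_),
   arc(J,_,_,L2,_,_),
   type_dom(L1,L2),
   s_incl(I,J)
\end{sv}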

882:

883: In some cases it is necessary to have an intransitive dominance

884: relation that is sensitive to phrase structure rules.  For simplicity

885: of presentation, we assume binary branching structures.  The first

886: of the rules below states that a sentence arc \smtt{s} will

887: immediately and exhaustively dominate an \smtt{np} arc followed

888: by a \smtt{vp} arc.

889:

890: \begin{sv}

891: ps_rule(s,np,vp)

892: ps_rule(np,det,n)

893: ps_rule(vp,v,np)

894: \end{sv}

895:

896: \noindent

897: Now we define immediate dominance over the syntax arcs \smtt{syn} as

898: follows:

899:

900: \begin{sv}

901: i_dom(I,J) :-

902:    arc(I,X,Z,syn,P,_),

903:    ps_rule(P,C1,C2),

904:    arc(J,X,Y,syn,C1,_),

905:    arc(_,Y,Z,syn,C2,_)

906:

907: i_dom(I,J) :-

908:    arc(I,X,Z,syn,P,_),

909:    ps_rule(P,C1,C2),

910:    arc(_,X,Y,syn,C1,_),

911:    arc(J,Y,Z,syn,C2,_)

912: \end{sv}
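
Since \predicate{i\_dom} is intransitive, a structure-sensitive dominance
relation over any depth of the tree can be recovered by closing it
recursively.  A minimal sketch, where \predicate{ps\_dom} is a hypothetical
name introduced here for illustration:

\begin{sv}
% one or more steps of immediate dominance
ps_dom(I,J) :- i_dom(I,J)
ps_dom(I,K) :-
   i_dom(I,J),
   ps_dom(J,K)
\end{sv}

\noindent
With this, a sentence arc is related to the verb arc inside its verb
phrase without naming the intermediate \smtt{vp} arc.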

913:

914: Another widely used relation between arcs is association.  In the

915: instance of the AG model in Figure~\ref{fig:graph-table}, association

916: amounts to sharing the value of $L_3$, as we saw in the tuples

917: for \smtt{dark} and \smtt{H*} in Figure~\ref{fig:graph-table}.

918: The \predicate{assoc}

919: predicate simply does a join on the third label field:

920:

921: \begin{sv}

922: assoc(I,J) :-

923:    arc(I,_,_,_,_,A),

924:    arc(J,_,_,_,_,A)

925: \end{sv}

926:

927: Finally, it is convenient to have a Kleene star relation.

928: %%Unfortunately it is unable to collect up the arbitrary length

929: Unfortunately, in datalog we are unable to collect up the arbitrary-length

930: sequence it matches.  Here we have it returning the two nodes which

931: bound the sequence, which is often enough to uniquely identify the

932: sequence in practice.

933:

934: \begin{sv}

935: node(N) :- arc(_,N,_,_,_,_)

936: node(N) :- arc(_,_,N,_,_,_)

937:

938: kleene1(X,X,_) :- node(X)

939: kleene1(X,Y,L) :-

940:    arc(_,X,Z,L,_,_),

941:    kleene1(Z,Y,L)

942:

943: kleene2(X,X,_) :- node(X)

944: kleene2(X,Y,L) :-

945:    arc(_,X,Z,_,L,_),

946:    kleene2(Z,Y,L)

947:

948: kleene3(X,X,_) :- node(X)

949: kleene3(X,Y,L) :-

950:    arc(_,X,Z,_,_,L),

951:    kleene3(Z,Y,L)

952: \end{sv}
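
Where the bounding arcs are more convenient than the bounding nodes, a
non-empty variant of \predicate{kleene1} can return the first and last arc
of the sequence instead.  This is a sketch under the assumption that arcs
of the same type are chained through shared nodes; \predicate{run1} is a
hypothetical name:

\begin{sv}
% a single arc of type L is a run
run1(I,I,L) :- arc(I,_,_,L,_,_)
% extend the run by the arc K that
% starts where I ends
run1(I,J,L) :-
   arc(I,_,Z,L,_,_),
   arc(K,Z,_,L,_,_),
   run1(K,J,L)
\end{sv}

\noindent
Like the node-based versions, this identifies the sequence by its
boundaries and does not collect its members.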

953:

954: With this simple machinery we can start defining some annotation

955: queries.

956:

957: %\note{Now, we want these queries to construct a graph/6 table where each

958: %tuple that takes part in the match is represented. Clearly these

959: %queries aren't going to do that since they are just logical statements

960: %about various properties, so they really combine our query and

961: %reporting steps so...}

962: % I don't think this is a problem now, given the way queries

963: % always return arc ids - SB

964: %

965: %We define a query here as a datalog predicate.  The result of a query

966: %is the set of graph/6 tuples involved in deriving the truth value for

967: %the predicate.  This set is marked somehow to group together those

968: %tuples involved in each instance of a query match.

969: %

970: %\note{But, even this isn't enough, since we want to have the query

971: %  result annotated in some way so that we can tell which bit matched

972: %  what -- ie which is the vowel and which is the stop in this first

973: %  query. So, what do we do? }

974:

975: \noindent

976: Find a sequence of vowel followed by stop at the phoneme level

977: (assumes suitably defined {\tt vowel} and {\tt stop} unary relations):

978: \begin{sv}

979: vowel_stop(I,J) :-

980:    arc(I,_,Y,phoneme,V,_),

981:    arc(J,Y,_,phoneme,S,_),

982:    vowel(V), stop(S)

983: \end{sv}
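
The \predicate{vowel} and \predicate{stop} relations assumed above might be
supplied as ground facts over the phoneme labels.  The facts below are
illustrative assumptions only, using labels that occur in the TIMIT
example, and do not form a complete classification:

\begin{sv}
% illustrative facts, not exhaustive
vowel(iy)
vowel(ae)
stop(dcl)
stop(k)
\end{sv}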

984:

985: \noindent

986: If we do not want both the vowel and the stop, but just the vowel,

987: we could write:

988: \begin{sv}

989: vowel_stop(I) :-

990:    arc(I,_,Y,phoneme,V,_),

991:    arc(_,Y,_,phoneme,S,_),

992:    vowel(V), stop(S)

993: \end{sv}

994:

995: \noindent

996: Find strong words (label \smtt{s}) dominating vowel phonemes:

997: \begin{sv}

998: strongWrdDomVowels(I) :-

999:     arc(I,_,_,word,s,_),

1000:     arc(J,_,_,phoneme,V,_),

1001:     vowel(V),

1002:     dom(I,J)

1003: \end{sv}

1004:

1005: \noindent

1006: Find words associated with H* tones:

1007: %% [Word!=x => Tone=H*]

1008: \begin{sv}

1009: sylHtone(I) :-

1010:     arc(I,_,_,word,_,A),

1011:     arc(_,_,_,tone,h*,A)

1012: \end{sv}
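
The same query can be phrased with the \predicate{assoc} predicate defined
earlier, which keeps the join on the third label field out of the query
itself.  A sketch, in which the name \predicate{wordHtone} is introduced
here purely for illustration:

\begin{sv}
wordHtone(I) :-
    arc(I,_,_,word,_,_),
    arc(T,_,_,tone,h*,_),
    assoc(I,T)
\end{sv}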

1013:

1014: \noindent

1015: Find stop-vowel sequences dominated by words in noun phrases where the word

1016: is associated with an H* tone target.

1017: %% [[Phoneme=stop -> Phoneme=vowel] ^ [[Word!=x => Tone=H*] ^ Syntax=np]]

1018: \begin{sv}

1019: stop_vowel_seq(I,J) :-

1020:     arc(I,_,Y,phoneme,S,_), stop(S),

1021:     arc(J,Y,_,phoneme,V,_), vowel(V),

1022:     arc(W,_,_,word,_,_),

1023:     arc(N,_,_,syn,np,_),

1024:     dom(N,W), dom(W,I), dom(W,J),

1025:     arc(T,_,_,tone,h*,_), assoc(W,T)

1026: \end{sv}

1027:

1028: \noindent

1029: Find the intermediate phrase containing the main verb of a sentence:

1030: %% [Intermediate!=x ^ [Syntax=s ^ [Syntax=vp ^ Syntax=v]]]

1031: \begin{sv}

1032: imt_phrase(P) :-

1033:     arc(K, _, _, syn, s, _),

1034:     arc(J, _, _, syn, vp, _),

1035:     arc(I, _, _, syn, v, _),

1036:     i_dom(K,J), i_dom(J,I),

1037:     dom(P, I),

1038:     arc(P, _, _, imt, _, _)

1039: \end{sv}

1040:

1041: \noindent

1042: Return the set of syllables between an H* and an L\% tone (inclusive).

1043: %% Emu can't do this, doesn't have kleene star

1044: \begin{sv}

1045: syls(K) :-

1046:     arc(_, _, N, tone, h*, A1),

1047:     arc(_, N, _, tone, l%, A2),

1048:     arc(I, N1, _, syl, _, A1),

1049:     arc(J, _, N2, syl, _, A2),

1050:     kleene1(N1, N3, syl),

1051:     arc(K, N3, N4, syl,_,_),

1052:     kleene1(N4, N2, syl)

1053: \end{sv}

1054:

1055: The above query shows how the datalog model breaks down.  We would

1056: like it to return sets of sets of syllable arcs.  Instead it returns

1057: a flat set structure.

1058: In many cases we will know that some arc participating in

1059: the query expression can be used to recover the nested structure.  For

1060: example, if the head of the above clause were changed from

1061: \predicate{syls(K)} to \predicate{syls(I,K)}, then \predicate{I} would

1062: aggregate \predicate{K} in just the right way.
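
Under that change the query might read as follows.  This is a sketch
which assumes that \predicate{K} should range over the syllable arcs on a
path from the start of \predicate{I} to the end of \predicate{J}:

\begin{sv}
syls(I,K) :-
    arc(_, _, N, tone, h*, A1),
    arc(_, N, _, tone, l%, A2),
    arc(I, N1, _, syl, _, A1),
    arc(J, _, N2, syl, _, A2),
    kleene1(N1, N3, syl),
    arc(K, N3, N4, syl, _, _),
    kleene1(N4, N2, syl)
\end{sv}

\noindent
Grouping the result tuples by their first component then yields one set
of syllable arcs per H* syllable.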

1063: \arXivhack

1064:

1065: \Section{Applying XML Query Languages to Annotations}

1066:

1067: %\begin{figure*}[t]

1068: %\begin{tabular}{l|l}

1069: %{\scriptsize

1070: %\begin{minipage}[t]{.4\linewidth}

1071: %\begin{alltt}

1072: %<clauses>

1073: %  <clause id=1 label='s'>

1074: %    <clause id=2 label='np'>

1075: %      <word id=3 label='she'/>

1076: %    </clause>

1077: %    <clause id=4 label='vp'>

1078: %      <word id=5 label='had'/>

1079: %      <clause id=6 label='np'>

1080: %        <word id=7 label='your'/>

1081: %        <word id=8 label='dark'/>

1082: %        <word id=9 label='suit'/>

1083: %      </clause>

1084: %      <clause id=10 label='pp'>

1085: %      ...

1086: %      </clause>

1087: %    </clause>

1088: %  </clause>

1089: %</clauses>

1090: %\end{alltt}

1091: %\end{minipage}}

1092: %&

1093: %{\scriptsize

1094: %\begin{minipage}[t]{.5\linewidth}

1095: %\begin{alltt}

1096: %<tobi>

1097: %  <intonational id=20  label='L\%'>

1098: %    <intermediate id=21 label='L-'>

1099: %      <word  id=3 label='she'>

1100: %        <phoneme id=22 label='sh'>

1101: %        <phoneme id=23 label='iy'>

1102: %      </word>

1103: %      <word id=5 label='had'>

1104: %        <phoneme id=24 label='hv'/>

1105: %        <phoneme id=25 label='ae'/>

1106: %        <phoneme id=26 label='dcl'/>

1107: %      </word>

1108: %      <word id=7 label='your'/>

1109: %      <word id=8 label='dark' toneref=101/>

1110: %      <word id=9 label='suit'>

1111: %      </word>

1112: %    </intermediate>

1113: %...

1114: %  </intonational>

1115: %  <tones>

1116: %    <tone id=101 label='H*'/>

1117: %    <tone id=102 label='H*'/>

1118: %  </tones>

1119: %</tobi>

1120: %\end{alltt}

1121: %\end{minipage}}

1122: %\end{tabular}

1123: %\caption{The TIMIT Example as a Pair of XML Documents}

1124: %\label{fig:xml-eg}

1125: %\vspace{2ex}\hrule

1126: %\end{figure*}

1127:

1128: It is worth briefly considering the suitability of existing XML

1129: query languages such as XML-QL \cite{xml-ql} and XQL \cite{xql}

1130: for the domain of annotated speech.  At first glance the problems

1131: we face in querying annotated speech data are similar to those that arise

1132: in querying XML, in that both involve a hierarchical data

1133: model.  A number of formulations of annotation data as XML are

1134: possible; indeed, some projects make use of XML/SGML-based formats

1135: entirely (e.g.\ MATE \sburl{mate.nis.sdu.dk},

1136: LACITO \sburl{lacito.vjf.cnrs.fr/ARCHIVAG/ENGLISH.htm}).

1137: %

1138: % This is quite problematic, and I propose to omit it:

1139: % The major problem is the common occurence of multiple

1140: % intersecting hierarchies, such as that presented in the earlier

1141: % augmented TIMIT example.  In order to represent this kind of

1142: % annotation, two or more XML documents (or sub-documents) are required

1143: % which share structure, perhaps via common attributes; an example is

1144: % given in Figure~\ref{fig:xml-eg}.

1145: %

1146: XML can represent trees using properly nested tags, in the

1147: obvious way.  In order to represent multiple independent

1148: hierarchies built on top of the same material one must construct trees

1149: using IDREF pointers.  This idea was proposed by the

1150: Text Encoding Initiative \cite{TEI-P3} and recently

1151: adopted by the MATE project.  We believe this approach is

1152: vastly more expressive than necessary for representing speech

1153: annotations, and we prefer a more constrained approach having

1154: desirable computational properties with respect to creation,

1155: validation and query.

1156:

1157: The XQL proposal \cite{xql} describes a query language which is

1158: intended to select elements from

1159: within XML documents according to various criteria; for example, the

1160: query \texttt{text/author} returns all author elements that are

1161: children of text elements.  The XQL data model ignores the order of

1162: elements within a parent element and has no obvious way to query for

1163: sequences of tokens.

1164:

1165: The XML-QL proposal \cite{xml-ql} provides for a data model

1166: where the order of elements is respected.  A query for a word-internal

1167: vowel-stop sequence could be expressed as follows (assuming

1168: suitably tagged annotation data for TIMIT):

1169:

1170: \begin{sv}

1171: <word>

1172:   <phoneme label=&vowel;/>

1173:   <phoneme label=&stop;/>

1174: </word>

1175: \end{sv}

1176:

1177: \noindent

1178: The result of this query would have the following form:

1179:

1180: \begin{sv}

1181: <word label=had>

1182:   <phoneme label=ae/>

1183:   <phoneme label=dcl/>

1184: </word>

1185: <word label=dark>

1186:   <phoneme label=ar/>

1187:   <phoneme label=k/>

1188: </word>

1189: \end{sv}

1190:

1191: Queries which refer to two independent

1192: hierarchies, such as syntactic and intonational phrase

1193: structure, need to use joins.

1194: For example, to find words that are simultaneously

1195: at the end of both clauses and intermediate phrases,

1196: we could have the following query:

1197: %%In Emu-QL:

1198: %%End(Intermediate,Word)=1 & End(Clause,Word)=1

1199: \begin{sv}

1200: <intermediate>

1201:   <word id=\$i></>[end()]

1202: </intermediate>

1203:

1204: <clause>

1205:   <word id=\$i></>[end()]

1206: </clause>

1207: \end{sv}

1208:

1209: We assume the existence of some mechanism to pick out the last child

1210: element.

1211: The ID attribute ensures that the words are the same in each

1212: part of the join.
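
For comparison, in the datalog setting used earlier the same join reduces
to sharing an end node.  A sketch, assuming that clauses are \smtt{syn}
arcs labelled \smtt{s} and that intermediate phrases are \smtt{imt} arcs;
\predicate{final\_word} is a hypothetical name:

\begin{sv}
% W ends where both an intermediate
% phrase and a clause end
final_word(W) :-
    arc(W,_,Y,word,_,_),
    arc(_,_,Y,imt,_,_),
    arc(_,_,Y,syn,s,_)
\end{sv}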

1213:

1214: Perhaps either of these approaches could be made to work for a useful range

1215: of query needs.  However, they do not appear to be sufficiently

1216: general.  For example, it is often useful to have query expressions

1217: involving Kleene star: `select all pairs of consonants, ignoring

1218: any intervening vowels' (CV*C).  Such queries may ignore hierarchical

1219: structure, finding sequences across (say) word boundaries.

1220: Using regular expressions over paths, XML-QL could provide access to

1221: strings of terminal symbols ignoring intervening levels of hierarchy.

1222: Yet it does not provide regular-expression matching over those

1223: sequences.  Alternatively, sequences at each level of a hierarchy

1224: could be chained together using IDREF pointers, but it is unclear

1225: how we would manage closures over such pointer structures.

1226:

1227: % This is really problematic - so I'm omitting it:

1228: % While it may be possible to express many queries in XML-QL or similar

1229: % XML query languages, the poor fit of the XML data model to annotated

1230: % speech data ensures that many of these queries will be awkward and

1231: % unnatural.  Hence, although these efforts are informative and the

1232: % semistructured data model is clearly appropriate, the query needs we

1233: % describe here appear to fall outside the capabilities of these

1234: % existing XML query languages.  Databases of annotated speech

1235: % present an interesting new challenge for research on query languages.

1236: \arXivhack

1237:

1238: \Section{Conclusions}

1239:

1240: Annotated speech corpora are an essential component of speech

1241: research, and the variety of formats in which they are distributed has

1242: become a barrier to their wider adoption.  To address this issue,

1243: we have developed two data

1244: models for speech annotations which seem to be sufficiently expressive

1245: to encompass the full range of practice in this area.  We have shown

1246: how the models can be stored in a simple relational format, and how

1247: many useful queries in this domain are first-order.  However, existing

1248: query languages lack sufficient expressive power for the full range of

1249: queries we would like to be able to express, and we hope to stimulate new

1250: research into the design of general purpose query languages for

1251: databases of annotated speech recordings.

1252: \arXivhack

1253:

1254: \section*{Acknowledgements}

1255:

1256: We are grateful to Peter Buneman, Mark Liberman

1257: and Gary Simons for helpful discussions concerning

1258: the research reported here.

1259:

1260: \raggedright\small

1261: \bibliographystyle{latex8}

1262:

1263: \begin{thebibliography}{10}\setlength{\itemsep}{-1ex}\small

1264:

1265: \bibitem{Bird95}

1266: S.~Bird.

1267: \newblock {\em Computational Phonology: A Constraint-Based Approach}.

1268: \newblock Studies in Natural Language Processing. Cambridge University Press,

1269:   1995.

1270:

1271: \bibitem{BirdLiberman99dtag}

1272: S.~Bird and M.~Liberman.

1273: \newblock Annotation graphs as a framework for multidimensional linguistic data

1274:   analysis.

1275: \newblock In {\em Towards Standards and Tools for Discourse Tagging --

1276:   Proceedings of the Workshop}, pages 1--10. Somerset, NJ: Association for

1277:   Computational Linguistics, 1999.

1278: \newblock [xxx.lanl.gov/abs/cs.CL/9907003].

1279:

1280: \bibitem{BirdLiberman99}

1281: S.~Bird and M.~Liberman.

1282: \newblock A formal framework for linguistic annotation.

1283: \newblock Technical Report MS-CIS-99-01, Department of Computer and Information

1284:   Science, University of Pennsylvania, 1999.

1285: \newblock [xxx.lanl.gov/abs/cs.CL/9903003], expanded from version presented at

1286:   ICSLP-98, Sydney, revised version to appear in {\it Speech Communication}.

1287:

1288: \bibitem{Cassidy99}

1289: S.~Cassidy.

1290: \newblock Compiling multi-tiered speech databases into the relational model:

1291:   experiments with the {Emu} system.

1292: \newblock In {\em Proceedings of the 6th European Conference on Speech

1293:   Communication and Technology}, 1999.

1294: \newblock \url{http://www.shlrc.mq.edu.au/emu/eurospeech99.shtml}.

1295:

1296: \bibitem{CassidyHarrington96}

1297: S.~Cassidy and J.~Harrington.

1298: \newblock Emu: An enhanced hierarchical speech data management system.

1299: \newblock In {\em Proceedings of the Sixth Australian International Conference

1300:   on Speech Science and Technology}, pages 361--366, 1996.

1301: \newblock \url{http://www.shlrc.mq.edu.au/emu/}.

1302:

1303: \bibitem{CassidyHarrington99}

1304: S.~Cassidy and J.~Harrington.

1305: \newblock Multi-level annotation of speech: an overview of the emu speech

1306:   database management system.

1307: \newblock manuscript, 1999.

1308:

1309: \bibitem{ChurchMercer93}

1310: K.~W. Church and R.~L. Mercer, editors.

1311: \newblock {\em Special Issue on Computational Linguistics Using Large Corpora},

1312:   volume 19(1,2).

1313: \newblock MIT Press, 1993.

1314:

1315: \bibitem{xml-ql}

1316: A.~Deutsch, M.~Fernandez, D.~Florescu, A.~Levy, and D.~Suciu.

1317: \newblock {XML-QL}: A query language for {XML}, 1998.

1318: \newblock \url{http://www.w3.org/TR/NOTE-xml-ql/}.

1319:

1320: \bibitem{TIMIT86}

1321: J.~S. Garofolo, L.~F. Lamel, W.~M. Fisher, J.~G. Fiscus, D.~S. Pallett, and

1322:   N.~L. Dahlgren.

1323: \newblock {\em The {DARPA TIMIT} Acoustic-Phonetic Continuous Speech Corpus

1324:   {CDROM}}.

1325: \newblock NIST, 1986.

1326: \newblock \url{http://www.ldc.upenn.edu/Catalog/LDC93S1.html}.

1327:

1328: \bibitem{Marcus93}

1329: M.~P. Marcus, B.~Santorini, and M.~A. Marcinkiewicz.

1330: \newblock Building a large annotated corpus of {English}: The {Penn}

1331:   {Treebank}.

1332: \newblock {\em Computational Linguistics}, 19(2):313--30, 1993.

1333: \newblock \url{http://www.cis.upenn.edu/~treebank/home.html}.

1334:

1335: \bibitem{MUC7}

1336: {\em Message Understanding Conference Proceedings (MUC-7)}. Science

1337:   Applications International Corporation, 1998.

1338: \newblock \url{http://www.muc.saic.com/proceedings/muc_7_toc.html}.

1339:

1340: \bibitem{xql}

1341: J.~Robie, J.~Lapp, and D.~Schach.

1342: \newblock {XML} query language ({XQL}), 1998.

1343: \newblock \url{http://www.w3.org/TandS/QL/QL98/pp/xql.html}.

1344:

1345: \bibitem{TEI-P3}

1346: {Text Encoding Initiative}.

1347: \newblock {\em Guidelines for Electronic Text Encoding and Interchange (TEI

1348:   P3)}.

1349: \newblock Oxford University Computing Services, 1994.

1350: \newblock \url{http://www.uic.edu/orgs/tei/}.

1351:

1352: \bibitem{Voorhees98}

1353: E.~M. Voorhees and D.~K. Harman, editors.

1354: \newblock {\em NIST Special Publication 500-242: The Seventh Text REtrieval

1355:   Conference (TREC-7)}. NIST, Government Printing Office, 1998.

1356: \newblock [trec.nist.gov/pubs/trec7/t7\_proceedings.html].

1357:

1358: \end{thebibliography}

1359:

1360: \end{document}

1361: