
1: \documentclass[runningheads]{llncs}

2:

3: \input{psfig.sty}

4:

5: \begin{document}

6:

7: \pagestyle{headings}

8:

9: \mainmatter

10:

11: \title{Speech-Driven Text Retrieval: Using Target IR

12: Collections for Statistical Language Model Adaptation in Speech

13: Recognition}

14:

15: \titlerunning{Speech-Driven Text Retrieval}

16:

17: \author{Atsushi Fujii\inst{1} \and Katunobu Itou\inst{2}

18: \and Tetsuya Ishikawa\inst{1}}

19:

20: \authorrunning{Atsushi Fujii et al.}

21:

22: \institute{University of Library and Information Science\\

23: 1-2 Kasuga, Tsukuba, 305-8550, Japan\\

24: \email{\{fujii,ishikawa\}@ulis.ac.jp}\\

25: \and

26: National Institute of Advanced Industrial Science and Technology\\

27: 1-1-1 Chuuou Daini Umezono, Tsukuba, 305-8568, Japan\\

28: \email{itou@ni.aist.go.jp}}

29:

30: \maketitle

31:

32: \begin{abstract}

33:   Speech recognition has recently become a practical technology for

34:   real world applications. Aiming at speech-driven text retrieval,

35:   which facilitates retrieving information with spoken queries, we

36:   propose a method to integrate speech recognition and retrieval

37:   methods. Since users speak contents related to a target collection,

38:   we adapt statistical language models used for speech recognition

39:   based on the target collection, so as to improve both the

40:   recognition and retrieval accuracy. Experiments using existing test

41:   collections combined with dictated queries showed the effectiveness

42:   of our method.

43: \end{abstract}

44:

45: \newcommand{\etal}{et~al.}

46: \newcommand{\etaleos}{et~al}

47: \newcommand{\eq}[1]{(\ref{#1})}

48:

49: \section{Introduction}

50: \label{sec:introduction}

51:

52: Automatic speech recognition, which decodes human voice to generate

53: transcriptions, has recently become a practical technology.  It is

54: now feasible to use speech recognition in real world computer-based

55: applications, specifically, those associated with human language.  In

56: fact, a number of speech-based methods have been explored in the

57: information retrieval community, which can be classified into the

58: following two fundamental categories:

59: \begin{itemize}

60: \item spoken document retrieval, in which written queries are used to

61:   search speech (e.g., broadcast news audio) archives for relevant

62:   speech information~\cite{johnson:icassp-99,jones:sigir-96,sheridan:sigir-97,singhal:sigir-99,srinivasan:sigir-2000,wechsler:sigir-98,whittaker:sigir-99},

63: \item speech-driven (spoken query) retrieval, in which spoken queries

64:   are used to retrieve relevant textual information~\cite{barnett:eurospeech-97,crestani:fqas-2000}.

65: \end{itemize}

66:

67: Initiated partially by the TREC-6 spoken document retrieval (SDR)

68: track~\cite{garofolo:trec-97}, various methods have been proposed for

69: spoken document retrieval.  However, a relatively small number of

70: methods have been explored for speech-driven text retrieval, although

71: they are associated with numerous keyboard-less retrieval

72: applications, such as telephone-based retrieval, car navigation

73: systems, and user-friendly interfaces.

74:

75: Barnett~\etal~\cite{barnett:eurospeech-97} performed comparative

76: experiments related to speech-driven retrieval, where an existing

77: speech recognition system was used as an input interface for the

78: INQUERY text retrieval system.  They used as test inputs 35 queries

79: collected from the TREC 101-135 topics, dictated by a single male

80: speaker.  Crestani~\cite{crestani:fqas-2000} also used the above 35

81: queries and showed that conventional relevance feedback techniques

82: marginally improved the accuracy for speech-driven text retrieval.

83:

84: The above cases focused solely on improving text retrieval methods

85: and did not address problems of improving speech recognition accuracy.

86: In fact, an existing speech recognition system was used with no

87: enhancement. In other words, speech recognition and text retrieval

88: modules were fundamentally independent and were simply connected by

89: way of an input/output protocol.

90:

91: However, since most speech recognition systems are trained based on

92: specific domains, the accuracy of speech recognition across domains is

93: not satisfactory. Thus, as can easily be predicted, in the cases of

94: Barnett~\etal~\cite{barnett:eurospeech-97} and

95: Crestani~\cite{crestani:fqas-2000}, a relatively high speech

96: recognition error rate considerably decreased the retrieval accuracy.

97: Additionally, speech recognition with a high accuracy is crucial for

98: interactive retrieval.

99:

100: Motivated by these problems, in this paper we integrate (not simply

101: connect) speech recognition and text retrieval to improve both

102: recognition and retrieval accuracy in the context of speech-driven

103: text retrieval.

104:

105: Unlike general-purpose speech recognition, which aims to decode any

106: spontaneous speech, in the case of speech-driven text retrieval, users

107: usually speak contents associated with a target collection, from which

108: documents relevant to their information need are retrieved.  In a

109: stochastic speech recognition framework, the accuracy depends

110: primarily on acoustic and language models~\cite{bahl:ieee-tpami-1983}.

111: While acoustic models are related to phonetic properties, language

112: models, which represent linguistic contents to be spoken, are strongly

113: related to target collections.  Thus, it is intuitively reasonable that

114: language models should be produced based on target collections.

115:

116: To sum up, our belief is that by adapting a language model based on a

117: target IR collection, we can improve the speech recognition and text

118: retrieval accuracy, simultaneously.

119:

120: Section~\ref{sec:system} describes our prototype speech-driven text

121: retrieval system, which is currently implemented for Japanese.

122: Section~\ref{sec:experimentation} elaborates on comparative

123: experiments, in which existing test collections for Japanese text

124: retrieval are used to evaluate the effectiveness of our system.

125:

126: \section{System Description}

127: \label{sec:system}

128:

129:

130: \subsection{Overview}

131: \label{subsec:system_overview}

132:

133: Figure~\ref{fig:system} depicts the overall design of our

134: speech-driven text retrieval system, which consists of speech

135: recognition, text retrieval and adaptation modules. We explain the

136: retrieval process based on this figure.

137:

138: In the off-line process, the adaptation module uses the entire target

139: collection (from which relevant documents are retrieved) to produce a

140: language model, so that user speech related to the collection can be

141: recognized with a high accuracy.  On the other hand, an acoustic model

142: is produced independent of the target collection.

143:

144: In the on-line process, given an information need spoken by a user,

145: the speech recognition module uses the acoustic and language models to

146: generate a transcription for the user speech.  Then, the text

147: retrieval module searches the collection for documents relevant to the

148: transcription, and outputs a specific number of top-ranked documents

149: according to the degree of relevance, in descending order.

150:

151: These documents are, in principle, the final outputs. However, in the case

152: where the target collection consists of multiple domains, a language

153: model produced in the off-line adaptation process is not necessarily

154: precisely adapted to a specific information need.  Thus, we optionally

155: use top-ranked documents obtained in the initial retrieval process for

156: an on-line adaptation, because these documents are associated with the

157: user speech more than the entire collection.  We then re-perform

158: speech recognition and text retrieval processes to obtain final

159: outputs.

160:

161: In other words, our system is based on the two-stage retrieval

162: principle~\cite{kwok:sigir-98}, where top-ranked documents retrieved

163: in the first stage are intermediate results, and are used to improve

164: the accuracy for the second (final) stage.  From a different

165: perspective, while the off-line adaptation process produces the {\it

166: global\/} language model for a target collection, the on-line

167: adaptation process produces a {\it local\/} language model based on

168: the user speech.

169:

170: In the following sections, we explain the speech recognition, adaptation,

171: and text retrieval modules in

172: Figure~\ref{fig:system}, respectively.

173:

174: \begin{figure}[htbp]

175:   \begin{center}

176:     \leavevmode \psfig{file=system.eps,height=2.5in}

177:   \end{center}

178:   \caption{The overall design of our speech-driven text retrieval system.}

179:   \label{fig:system}

180: \end{figure}

181:

182: \subsection{Speech Recognition}

183: \label{subsec:speech_recognition}

184:

185: The speech recognition module generates word sequence $W$, given

186: phoneme sequence $X$.  In the stochastic speech recognition framework,

187: the task is to output the $W$ maximizing $P(W|X)$, which is

188: transformed as in equation~\eq{eq:bayes} through use of Bayes'

189: theorem.

190: \begin{equation}

191:   \label{eq:bayes}

192:   \arg\max_{W}P(W|X) = \arg\max_{W}P(X|W)\cdot P(W)

193: \end{equation}

194: Here, $P(X|W)$ models a probability that word sequence $W$ is

195: transformed into phoneme sequence $X$, and $P(W)$ models a probability

196: that $W$ is linguistically acceptable. These factors are usually

197: called acoustic and language models, respectively.
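
The transformation in equation~\eq{eq:bayes} can be spelled out in one
step. By Bayes' theorem, $P(W|X)$ equals $P(X|W)\cdot P(W)$ divided by
$P(X)$. Because $P(X)$ does not depend on the candidate word sequence
$W$, it can be dropped from the maximization:
\[
  \arg\max_{W}P(W|X)
  = \arg\max_{W}\frac{P(X|W)\cdot P(W)}{P(X)}
  = \arg\max_{W}P(X|W)\cdot P(W)
\]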

198:

199: For the speech recognition module, we use the Japanese dictation

200: toolkit~\cite{kawahara:icslp-2000}\footnote{http://winnie.kuis.kyoto-u.ac.jp/dictation/},

201: which includes the ``Julius'' recognition engine and acoustic/language

202: models trained based on newspaper articles. This toolkit also includes

203: development software, so that acoustic and language models can be

204: produced and replaced depending on the application.  While we use the

205: acoustic model provided in the toolkit, we use new language models

206: produced by way of the adaptation process (see

207: Section~\ref{subsec:lm_adaptation}).

208:

209: \subsection{Language Model Adaptation}

210: \label{subsec:lm_adaptation}

211:

212: The basis of the adaptation module is to produce a word-based $N$-gram

213: (in our case, a combination of bigram and trigram) model by way of

214: source documents.
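
As an illustration, and assuming the standard $N$-gram formulation
(smoothing and back-off details are not specified here), a trigram
model approximates the probability of a word sequence
$W = w_{1},\ldots,w_{n}$ as
\[
  P(W) \approx \prod_{i=1}^{n} P(w_{i}|w_{i-2},w_{i-1})
\]
where each conditional probability is estimated from word-sequence
counts in the source documents. The bigram model has the same form but
conditions on $w_{i-1}$ alone.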

215:

216: In the off-line (global) adaptation process, we use the ChaSen

217: morphological analyzer~\cite{matsumoto:chasen-99} to extract words

218: from the entire target collection, and produce the global $N$-gram

219: model.

220:

221: On the other hand, in the on-line (local) adaptation process, only

222: top-ranked documents retrieved in the first stage are used as source

223: documents, from which word-based $N$-grams are extracted as performed

224: in the off-line process.  However, unlike the case of the off-line

225: process, we do not produce the entire language model. Instead, we

226: re-estimate only statistics associated with top-ranked documents, for

227: which we use the MAP (Maximum A-posteriori Probability) estimation

228: method~\cite{masataki:icassp-97}.
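
As a rough sketch only, and not necessarily the exact formulation
of~\cite{masataki:icassp-97}, MAP adaptation of an $N$-gram
probability is often written as a count-merging interpolation. The
symbols below are introduced purely for illustration: $C_{g}$ and
$C_{l}$ denote counts in the entire collection and in the top-ranked
documents, $h$ denotes the word history, and $\lambda$ is a
hypothetical weight on the local counts.
\[
  P(w|h) = \frac{C_{g}(h,w) + \lambda\cdot C_{l}(h,w)}
                {C_{g}(h) + \lambda\cdot C_{l}(h)}
\]
Under this form, probabilities whose history $h$ does not occur in the
top-ranked documents are left unchanged, which is consistent with
re-estimating only the statistics associated with those documents.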

229:

230: Although the on-line adaptation theoretically improves the retrieval

231: accuracy, for real-time usage, the trade-off between the retrieval

232: accuracy and computational time required for the on-line process has

233: to be considered.

234:

235: Our method is similar to the one proposed by Seymore and

236: Rosenfeld~\cite{seymore:eurospeech-97} in the sense that both methods

237: adapt language models based on a small number of documents related to

238: a specific domain (or topic). However, unlike their method, our method

239: does not require corpora manually annotated with topic tags.

240:

241: \subsection{Text Retrieval}

242: \label{subsec:text_retrieval}

243:

244: The text retrieval module is based on an existing probabilistic

245: retrieval method~\cite{robertson:sigir-94}, which computes the

246: relevance score between the transcribed query and each document in the

247: collection.  The relevance score for document $i$ is computed based on

248: equation~\eq{eq:okapi}.

249: \begin{equation}

250:   \label{eq:okapi}

251:   \sum_{t} \left(\frac{\textstyle TF_{t,i}}{\textstyle

252:     \frac{\textstyle DL_{i}}{\textstyle avglen} +

253:     TF_{t,i}}\cdot\log\frac{\textstyle N}{\textstyle DF_{t}}\right)

254: \end{equation}

255: Here, $t$'s denote terms in transcribed queries.  $TF_{t,i}$ denotes

256: the frequency that term $t$ appears in document $i$. $DF_{t}$ and $N$

257: denote the number of documents containing term $t$ and the total

258: number of documents in the collection. $DL_{i}$ denotes the length of

259: document $i$ (i.e., the number of characters contained in $i$), and

260: $avglen$ denotes the average length of documents in the collection.
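
As a worked example with hypothetical values (not taken from the
collections used in this paper), and assuming the natural logarithm,
suppose a query consists of a single term $t$ with $TF_{t,i}=3$,
$DL_{i}=200$, $avglen=100$, $N=1000$ and $DF_{t}=10$. The score for
document $i$ is then
\[
  \frac{3}{\frac{200}{100}+3}\cdot\log\frac{1000}{10}
  = 0.6\cdot\log 100 \approx 2.76
\]
For a document twice as long ($DL_{i}=400$) with the same term
frequency, the first factor becomes $3/7$ and the score drops to
approximately $1.97$, which illustrates the effect of the length
normalization.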

261:

262: We use content words extracted from documents as terms, and perform a

263: word-based indexing. For this purpose, we use the ChaSen morphological

264: analyzer~\cite{matsumoto:chasen-99} to extract content words. We

265: extract terms from transcribed queries using the same method.

266:

267: \section{Experimentation}

268: \label{sec:experimentation}

269:

270: \subsection{Test Collections}

271: \label{subsec:test_collection}

272:

273: We investigated the performance of our system based on the NTCIR

274: workshop evaluation methodology, which resembles the one in the TREC

275: ad hoc retrieval track. In other words, each system outputs 1,000 top

276: documents, and the TREC evaluation software was used to plot

277: recall-precision curves and calculate non-interpolated average

278: precision values.
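
For reference, and assuming the usual definition, non-interpolated
average precision for a single query can be written as follows. The
symbols are notation introduced here: $R$ is the number of relevant
documents for the query, $P(k)$ is the precision over the top $k$
retrieved documents, and $rel(k)$ is 1 if the document at rank $k$ is
relevant and 0 otherwise.
\[
  AP = \frac{1}{R}\sum_{k=1}^{1000} P(k)\cdot rel(k)
\]
The upper limit of the sum reflects the 1,000 top documents output for
each query, and the values are averaged over all queries.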

279:

280: The NTCIR workshop was held twice (in 1999 and 2001), for which two

281: different test collections were produced: the NTCIR-1 and 2

282: collections~\cite{ntcir-99,ntcir-2001}\footnote{http://research.nii.ac.jp/\~{}ntcadm/index-en.html}.

283: However, since these collections do not include spoken queries, we

284: asked four speakers (two males and two females) to dictate information needs

285: in the NTCIR collections, and simulated speech-driven text retrieval.

286:

287: The NTCIR collections include documents collected from technical

288: papers published by 65 Japanese associations for various fields. Each

289: document consists of the document ID, title, name(s) of author(s),

290: name/date of conference, hosting organization, abstract and author

291: keywords, from which we used titles, abstracts and keywords for the

292: indexing. The numbers of documents in the NTCIR-1 and 2 collections are

293: 332,918 and 736,166, respectively (the NTCIR-1 documents are a subset

294: of the NTCIR-2).

295:

296: The NTCIR-1 and 2 collections also include 53 and 49 topics,

297: respectively. Each topic consists of the topic ID, title of the topic,

298: description, and narrative.  Figure~\ref{fig:topic} shows an English

299: translation for a fragment of the NTCIR topics\footnote{The NTCIR-2

300: collection contains Japanese topics and their English translations.},

301: where each field is tagged in an SGML form. In general, titles are not

302: informative for the retrieval. On the other hand, narratives, which

303: usually consist of several sentences, are too long to speak. Thus,

304: only descriptions, which consist of a single phrase or sentence, were

305: dictated by each speaker, so as to produce four different sets of 102

306: spoken queries.

307:

308: \begin{figure*}[htbp]

309:   \begin{center}

310:     \leavevmode

311:     \small

312:     \begin{quote}

313:       \tt

314:       <TOPIC q=0118>\\

315:       <TITLE>TV conferencing</TITLE>\\

316:       <DESCRIPTION>Distance education support systems using TV

317:       conferencing</DESCRIPTION>\\

318:       <NARRATIVE>A relevant document will provide information on the

319:       development of distance education support systems using TV

320:       conferencing. Preferred documents would present examples of

321:       using TV conferencing and discuss the results. Any reported

322:       methods of aiding remote teaching are relevant documents (for

323:       example, ways of utilizing satellite communication, the

324:       Internet, and ISDN circuits).</NARRATIVE>\\

325:       </TOPIC>

326:     \end{quote}

327:     \caption{An English translation for an example topic in the NTCIR

328:       collections.}

329:     \label{fig:topic}

330:   \end{center}

331: \end{figure*}

332:

333: In the NTCIR collections, relevance assessment was performed based on

334: the pooling method~\cite{voorhees:sigir-98}. First, candidates for

335: relevant documents were obtained with multiple retrieval

336: systems. Then, for each candidate document, human expert(s) assigned

337: one of three ranks of relevance: ``relevant,'' ``partially relevant''

338: and \mbox{``irrelevant.''} The NTCIR-2 collection also includes

339: ``highly relevant'' documents. In our evaluation, ``highly relevant''

340: and ``relevant'' documents were regarded as relevant ones.

341:

342: \subsection{Comparative Evaluation}

343: \label{subsec:comparison}

344:

345: In order to investigate the effectiveness of the off-line language

346: model adaptation, we compared the performance of the following

347: different retrieval methods:

348: \begin{itemize}

349: \item text-to-text retrieval, which used written descriptions

350:   as queries, and can be seen as the perfect speech-driven text retrieval,

351: \item speech-driven text retrieval, in which a language model produced

352:   based on the NTCIR-2 collection was used,

353: \item speech-driven text retrieval, in which a language model produced

354:   based on ten years' worth of {\it Mainichi Shimbun\/} Japanese newspaper

355:   articles (1991-2000) was used.

356: \end{itemize}

357: The only difference in producing the two language models (i.e.,

358: those based on the NTCIR-2 collection and newspaper articles) is the

359: source documents. In other words, both language models have the same

360: vocabulary size (20,000), and were produced using the same software.

361:

362: Table~\ref{tab:lang_model} shows statistics related to word

363: tokens/types in two different source corpora for language modeling,

364: where the line ``Coverage'' denotes the ratio of word tokens contained

365: in the resultant language model. Most word tokens were covered in

366: both language models.

367:

368: \begin{table}[htbp]

369:   \begin{center}

370:     \caption{Statistics associated with source words for language

371:     modeling.}

372:     \medskip

373:     \leavevmode

374:     \small

375:     \tabcolsep=3pt

376:     \begin{tabular}{lcc} \hline\hline

377:       & NTCIR & News \\ \hline

378:       \# of Types & 454K & 315K \\

379:       \# of Tokens & 175M & 262M \\

380:       Coverage & 97.9\% & 96.5\% \\

381:       \hline

382:     \end{tabular}

383:     \label{tab:lang_model}

384:   \end{center}

385: \end{table}

386:

387: In the case of the speech-driven text retrieval methods, queries dictated by

388: four speakers were used individually. Thus, in practice we compared

389: nine different retrieval methods. Although the Julius decoder outputs

390: more than one transcription candidate for a single speech input, we

391: used only the one with the greatest probability score. The results did

392: not significantly change depending on whether or not we used

393: lower-ranked transcriptions as queries.

394:

395: Table~\ref{tab:results} shows the non-interpolated average precision

396: values and word error rate in speech recognition, for different

397: retrieval methods. As with existing experiments for speech

398: recognition, word error rate (WER) is the ratio between the number of

399: word errors (i.e., deletion, insertion, and substitution) and the

400: total number of words. In addition, we also investigated error rate

401: with respect to query terms (i.e., keywords used for retrieval), which

402: we shall call ``term error rate (TER).''
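
In symbols, WER can be written as follows, where $D$, $I$ and $S$
denote the numbers of deletions, insertions and substitutions, and
$N_{w}$ denotes the total number of words. Here $N_{w}$ is assumed to
be counted in the reference transcription, which is the usual
convention; the symbols themselves are introduced for illustration.
\[
  WER = \frac{D + I + S}{N_{w}}
\]
For instance, a hypothetical ten-word query recognized with one
deletion and one substitution gives $WER = (1+0+1)/10 = 0.2$. TER is
computed in the same manner, with the counts restricted to query
terms.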

403:

404: In Table~\ref{tab:results}, the first line denotes results of the

405: text-to-text retrieval, which were relatively high compared with

406: existing results reported in the NTCIR

407: workshops~\cite{ntcir-99,ntcir-2001}.

408:

409: \begin{table*}[htbp]

410:   \begin{center}

411:     \caption{Results for different retrieval methods (AP: average

412:     precision, WER: word error rate, TER: term error rate).}

413:     \medskip

414:     \leavevmode

415:     \small

416:     \tabcolsep=5pt

417:     \begin{tabular}{lcccccc} \hline\hline

418:       & \multicolumn{3}{c}{NTCIR-1} & \multicolumn{3}{c}{NTCIR-2} \\

419:       \cline{2-7}

420:       {\hfill\centering Method\hfill}

421:       & AP & WER & TER

422:       & AP & WER & TER \\ \hline

423:       Text & 0.3320 & --- & --- & 0.3118 & --- & --- \\

424:       M1 (NTCIR) & 0.2708 & 0.1659 & 0.2190 & 0.2504 & 0.1532 & 0.2313 \\

425:       M2 (NTCIR) & 0.2471 & 0.2034 & 0.2381 & 0.2114 & 0.2180 & 0.2799 \\

426:       F1 (NTCIR) & 0.2276 & 0.1961 & 0.2857 & 0.1873 & 0.1885 & 0.2500 \\

427:       F2 (NTCIR) & 0.2642 & 0.1477 & 0.2222 & 0.2376 & 0.1635 & 0.2388 \\

428:       M1 (News) &  0.1076 & 0.3547 & 0.5143 & 0.0790 & 0.3594 & 0.5149 \\

429:       M2 (News) &  0.1257 & 0.4044 & 0.5460 & 0.0691 & 0.5022 & 0.6343 \\

430:       F1 (News) &  0.1156 & 0.3801 & 0.5238 & 0.0798 & 0.4418 & 0.5709 \\

431:       F2 (News) &  0.1225 & 0.3317 & 0.5016 & 0.0917 & 0.4080 & 0.5858 \\

432:       \hline

433:     \end{tabular}

434:     \label{tab:results}

435:   \end{center}

436: \end{table*}

437:

438: The remaining lines denote results of speech-driven text retrieval

439: combined with the NTCIR-based language model (lines 2-5) and the

440: newspaper-based model (lines 6-9), respectively.  Here, ``Mx'' and

441: ``Fx'' denote male/female speakers, respectively. The following

442: observations can be derived from these results.

443:

444: First, for both language models, results did not significantly change

445: depending on the speaker. The best average precision values for

446: speech-driven text retrieval were obtained with a combination of

447: queries dictated by a male speaker (M1) and the NTCIR-based language

448: model, which were approximately 80\% of those with the text-to-text

449: retrieval.
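
Specifically, the ratios of the average precision values in
Table~\ref{tab:results} for M1 (NTCIR) and for the text-to-text
retrieval are
\[
  \frac{0.2708}{0.3320} \approx 0.82 \quad\mbox{(NTCIR-1)}, \qquad
  \frac{0.2504}{0.3118} \approx 0.80 \quad\mbox{(NTCIR-2)}
\]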

450:

451: Second, by comparing results of different language models for each

452: speaker, one can see that the NTCIR-based model significantly

453: decreased WER and TER obtained with the newspaper-based model, and

454: that the retrieval method using the NTCIR-based model significantly

455: outperformed one using the newspaper-based model. In addition, these

456: results were observable, irrespective of the speaker.  Thus, we

457: conclude that adapting language models based on target collections was

458: quite effective for speech-driven text retrieval.
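
To give a sense of the magnitude, consider speaker M1 in
Table~\ref{tab:results}. The relative reduction in WER obtained by
replacing the newspaper-based model with the NTCIR-based model is
\[
  \frac{0.3547 - 0.1659}{0.3547} \approx 0.53 \quad\mbox{(NTCIR-1)}, \qquad
  \frac{0.3594 - 0.1532}{0.3594} \approx 0.57 \quad\mbox{(NTCIR-2)}
\]
and the corresponding ratios of average precision are
\[
  \frac{0.2708}{0.1076} \approx 2.5 \quad\mbox{(NTCIR-1)}, \qquad
  \frac{0.2504}{0.0790} \approx 3.2 \quad\mbox{(NTCIR-2)}
\]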

459:

460: Third, TER was generally higher than WER irrespective of the speaker.

461: In other words, speech recognition for content words was more

462: difficult than for function words, which were not contained in query

463: terms.

464:

465: We analyzed transcriptions for dictated queries, and found that speech

466: recognition error was mainly caused by the out-of-vocabulary

467: problem. In the case where major query terms are mistakenly

468: recognized, the retrieval accuracy substantially decreases.  In

469: addition, descriptions in the NTCIR topics often contain expressions

470: which do not appear in the documents, such as ``I want papers

471: about...''  Although these expressions usually do not affect the

472: retrieval accuracy, misrecognized words affect the recognition

473: accuracy for remaining words including major query

474: terms. Consequently, the retrieval accuracy decreases due to the

475: partial misrecognition.

476:

477: Finally, we investigated the trade-off between recall and precision.

478: Figures~\ref{fig:ntcir1} and \ref{fig:ntcir2} show recall-precision

479: curves of different retrieval methods, for the NTCIR-1 and 2

480: collections, respectively. In these figures, the relative superiority

481: for precision values due to different language models in

482: Table~\ref{tab:results} was also observable, regardless of the recall.

483:

484: However, the effectiveness of the on-line adaptation remains an open

485: question and needs to be explored.

486:

487: \begin{figure}[htbp]

488:   \begin{center}

489:     \leavevmode \psfig{file=ntcir-1.ps,height=3in}

490:   \end{center}

491:   \caption{Recall-precision curves for different retrieval methods

492:   using the NTCIR-1 collection.}

493:   \label{fig:ntcir1}

494: \end{figure}

495:

496: \begin{figure}[htbp]

497:   \begin{center}

498:     \leavevmode \psfig{file=ntcir-2.ps,height=3in}

499:   \end{center}

500:   \caption{Recall-precision curves for different retrieval methods

501:   using the NTCIR-2 collection.}

502:   \label{fig:ntcir2}

503: \end{figure}

504:

505: \section{Conclusion}

506: \label{sec:conclusion}

507:

508: Aiming at speech-driven text retrieval with a high accuracy, we

509: proposed a method to integrate speech recognition and text retrieval

510: methods, in which target text collections are used to adapt

511: statistical language models for speech recognition.  We also showed

512: the effectiveness of our method by way of experiments, where dictated

513: information needs in the NTCIR collections were used as queries to

514: retrieve technical abstracts.  Future work will include experiments

515: on various collections, such as newspaper articles and Web pages.

516:

517: \section{Acknowledgments}

518:

519: The authors would like to thank the National Institute of Informatics

520: for their support with the NTCIR collections.

521:

522: \bibliographystyle{abbrv}

523: \begin{thebibliography}{10}

524:

525: \bibitem{bahl:ieee-tpami-1983}

526: L.~R. Bahl, F.~Jelinek, and R.~L. Mercer.

527: \newblock A maximum likelihood approach to continuous speech recognition.

528: \newblock {\em IEEE Transactions on Pattern Analysis and Machine Intelligence},

529:   5(2):179--190, 1983.

530:

531: \bibitem{barnett:eurospeech-97}

532: J.~Barnett, S.~Anderson, J.~Broglio, M.~Singh, R.~Hudson, and S.~W. Kuo.

533: \newblock Experiments in spoken queries for document retrieval.

534: \newblock In {\em Proceedings of Eurospeech97}, pages 1323--1326, 1997.

535:

536: \bibitem{crestani:fqas-2000}

537: F.~Crestani.

538: \newblock Word recognition errors and relevance feedback in spoken query

539:   processing.

540: \newblock In {\em Proceedings of the Fourth International Conference on

541:   Flexible Query Answering Systems}, pages 267--281, 2000.

542:

543: \bibitem{garofolo:trec-97}

544: J.~S. Garofolo, E.~M. Voorhees, V.~M. Stanford, and K.~S. Jones.

545: \newblock {TREC-6} 1997 spoken document retrieval track overview and results.

546: \newblock In {\em Proceedings of the 6th Text REtrieval Conference}, pages

547:   83--91, 1997.

548:

549: \bibitem{johnson:icassp-99}

550: S.~Johnson, P.~Jourlin, G.~Moore, K.~S. Jones, and P.~Woodland.

551: \newblock The {Cambridge} {University} spoken document retrieval system.

552: \newblock In {\em Proceedings of ICASSP'99}, pages 49--52, 1999.

553:

554: \bibitem{jones:sigir-96}

555: G.~Jones, J.~Foote, K.~S. Jones, and S.~Young.

556: \newblock Retrieving spoken documents by combining multiple index sources.

557: \newblock In {\em Proceedings of the 19th Annual International ACM SIGIR

558:   Conference on Research and Development in Information Retrieval}, pages

559:   30--38, 1996.

560:

561: \bibitem{kawahara:icslp-2000}

562: T.~Kawahara, A.~Lee, T.~Kobayashi, K.~Takeda, N.~Minematsu, S.~Sagayama,

563:   K.~Itou, A.~Ito, M.~Yamamoto, A.~Yamada, T.~Utsuro, and K.~Shikano.

564: \newblock Free software toolkit for {Japanese} large vocabulary continuous

565:   speech recognition.

566: \newblock In {\em Proceedings of the 6th International Conference on Spoken

567:   Language Processing}, pages 476--479, 2000.

568:

569: \bibitem{kwok:sigir-98}

570: K.~Kwok and M.~Chan.

571: \newblock Improving two-stage ad-hoc retrieval for short queries.

572: \newblock In {\em Proceedings of the 21st Annual International ACM SIGIR

573:   Conference on Research and Development in Information Retrieval}, pages

574:   250--256, 1998.

575:

576: \bibitem{masataki:icassp-97}

577: H.~Masataki, Y.~Sagisaka, K.~Hisaki, and T.~Kawahara.

578: \newblock Task adaptation using {MAP} estimation in n-gram language modeling.

579: \newblock In {\em Proceedings of ICASSP'97}, pages 783--786, 1997.

580:

581: \bibitem{matsumoto:chasen-99}

582: Y.~Matsumoto, A.~Kitauchi, T.~Yamashita, Y.~Hirano, H.~Matsuda, and M.~Asahara.

583: \newblock {Japanese} morphological analysis system {ChaSen} version 2.0 manual

584:   2nd edition.

585: \newblock Technical Report NAIST-IS-TR99009, NAIST, 1999.

586:

587: \bibitem{ntcir-99}

588: {National Center for Science Information Systems}.

589: \newblock {\em Proceedings of the 1st NTCIR Workshop on Research in Japanese

590:   Text Retrieval and Term Recognition}, 1999.

591:

592: \bibitem{ntcir-2001}

593: {National Institute of Informatics}.

594: \newblock {\em Proceedings of the 2nd NTCIR Workshop Meeting on Evaluation of

595:   Chinese \& Japanese Text Retrieval and Text Summarization}, 2001.

596:

597: \bibitem{robertson:sigir-94}

598: S.~Robertson and S.~Walker.

599: \newblock Some simple effective approximations to the 2-poisson model for

600:   probabilistic weighted retrieval.

601: \newblock In {\em Proceedings of the 17th Annual International ACM SIGIR

602:   Conference on Research and Development in Information Retrieval}, pages

603:   232--241, 1994.

604:

605: \bibitem{seymore:eurospeech-97}

606: K.~Seymore and R.~Rosenfeld.

607: \newblock Using story topics for language model adaptation.

608: \newblock In {\em Proceedings of Eurospeech97}, 1997.

609:

610: \bibitem{sheridan:sigir-97}

611: P.~Sheridan, M.~Wechsler, and P.~Sch\"{a}uble.

612: \newblock Cross-language speech retrieval: Establishing a baseline performance.

613: \newblock In {\em Proceedings of the 20th Annual International ACM SIGIR

614:   Conference on Research and Development in Information Retrieval}, pages

615:   99--108, 1997.

616:

617: \bibitem{singhal:sigir-99}

618: A.~Singhal and F.~Pereira.

619: \newblock Document expansion for speech retrieval.

620: \newblock In {\em Proceedings of the 22nd Annual International ACM SIGIR

621:   Conference on Research and Development in Information Retrieval}, pages

622:   34--41, 1999.

623:

624: \bibitem{srinivasan:sigir-2000}

625: S.~Srinivasan and D.~Petkovic.

626: \newblock Phonetic confusion matrix based spoken document retrieval.

627: \newblock In {\em Proceedings of the 23rd Annual International ACM SIGIR

628:   Conference on Research and Development in Information Retrieval}, pages

629:   81--87, 2000.

630:

631: \bibitem{voorhees:sigir-98}

632: E.~M. Voorhees.

633: \newblock Variations in relevance judgments and the measurement of retrieval

634:   effectiveness.

635: \newblock In {\em Proceedings of the 21st Annual International ACM SIGIR

636:   Conference on Research and Development in Information Retrieval}, pages

637:   315--323, 1998.

638:

639: \bibitem{wechsler:sigir-98}

640: M.~Wechsler, E.~Munteanu, and P.~Sch\"{a}uble.

641: \newblock New techniques for open-vocabulary spoken document retrieval.

642: \newblock In {\em Proceedings of the 21st Annual International ACM SIGIR

643:   Conference on Research and Development in Information Retrieval}, pages

644:   20--27, 1998.

645:

646: \bibitem{whittaker:sigir-99}

647: S.~Whittaker, J.~Hirschberg, J.~Choi, D.~Hindle, F.~Pereira, and A.~Singhal.

648: \newblock {SCAN}: Designing and evaluating user interfaces to support retrieval

649:   from speech archives.

650: \newblock In {\em Proceedings of the 22nd Annual International ACM SIGIR

651:   Conference on Research and Development in Information Retrieval}, pages

652:   26--33, 1999.

653:

654: \end{thebibliography}

655:

656:

657: \end{document}

658: