1: %%

2: %% AMTA-2000 camera-ready

3: %%

4: \documentstyle{llncs}

5:

6: \title{Applying Machine Translation to Two-Stage Cross-Language

7: Information Retrieval}

8:

9: \author{\Large Atsushi Fujii and Tetsuya Ishikawa}

10:

11: \institute{University of Library and Information Science \\ 1-2

12: Kasuga, Tsukuba, 305-8550, Japan \\ \smallskip {\normalsize\tt

13: E-mail:~fujii@ulis.ac.jp}}

14:

15: \newcommand{\etal}{et~al.}

16: \newcommand{\etaleos}{et~al}

17: \newcommand{\eq}[1]{(\ref{#1})}

18:

19: \newcommand{\shortcite}[1]{\cite{#1}}

20: \renewcommand{\nocite}[1]{\shortcite{#1}}

21:

22: \input{psfig.tex}

23:

24: \begin{document}

25:

26: \maketitle

27:

28: \begin{abstract}

29:   Cross-language information retrieval (CLIR), where queries and

30:   documents are in different languages, needs a translation of queries

31:   and/or documents, so as to standardize both of them into a common

32:   representation. For this purpose, the use of machine translation is

33:   an effective approach. However, computational cost is prohibitive in

34:   translating large-scale document collections. To resolve this

35:   problem, we propose a two-stage CLIR method. First, we translate a

36:   given query into the document language, and retrieve a limited

37:   number of foreign documents. Second, we machine translate only those

38:   documents into the user language, and re-rank them based on the

39:   translation result. We also show the effectiveness of our method by

40:   way of experiments using Japanese queries and English technical

41:   documents.

42: \end{abstract}

43:

44: \section{Introduction}

45: \label{sec:introduction}

46:

47: The number of machine readable texts accessible via CD-ROMs and the

48: World Wide Web has been rapidly growing. However, since the content of

49: each text is usually provided in a limited number of languages, the

50: notion of information retrieval (IR) has been expanded so that users

51: can retrieve textual information (i.e., documents) across

52: languages. One application, commonly termed ``cross-language

53: information retrieval (CLIR)'', is the retrieval task where the user

54: presents queries in one language to retrieve documents in another

55: language. Thus, as can be predicted, CLIR needs to standardize queries

56: and documents into a common representation, so that monolingual IR

57: techniques can be applied. From this point of view, existing CLIR can

58: be classified into three approaches.

59:

60: The first approach translates queries into the document

61: language~\cite{ballesteros:sigir-98,davis:sigir-97,fujii:emnlp-vlc-99,nie:sigir-99},

62: while the second approach translates documents into the query

63: language~\cite{mccarley:acl-99,oard:amta-98}. The third approach

64: projects both queries and documents into a language-independent

65: representation by way of thesaurus

66: classes~\cite{gonzalo:chum-98,salton:jasis-70} and latent semantic

67: indexing~\cite{carbonell:ijcai-97,littman:clir-98}.

68:

69: Although rigorous, large-scale comparisons among these approaches are

70: difficult and expensive, a few comparative studies can be found in the

71: CLIR literature.

72:

73: Oard~\nocite{oard:amta-98} compared the query and document translation

74: methods. For the purpose of English-German CLIR experiments, he used

75: the 21 English queries and SDA/NZZ German collection consisting of

76: 251,840 newswire articles, contained in the TREC-6 CLIR collection.

77: Then, he showed that the MT-based query translation with the Logos

78: system was more effective than various types of dictionary-based query

79: translation methods, and that the MT-based document translation method

80: further outperformed the MT-based query translation method. These

81: findings were especially salient for long queries.

82:

83: McCarley~\nocite{mccarley:acl-99} conducted English/French

84: bidirectional CLIR experiments, where the 141,656 AP English documents

85: and 212,918 SDA French documents in the TREC-6 and TREC-7 collections

86: were used, and applied a statistical MT method to both query and

87: document translation methods. He showed that the relative superiority

88: between query and document translation methods varied depending on the

89: source and target language pair. To put it more precisely, in his

90: case, the quality of French-English translation was better than that

91: of English-French translation, for both query and document

92: translations.

93:

94: In addition, he showed that a hybrid method, where the relevance

95: degree of each document (i.e., the ``score'') is the mean of those

96: obtained with query and document translation methods, outperformed

97: methods based on either query or document translation, irrespective of

98: the source and target language pair.  One possible rationale is that,

99: since machine translation is not an invertible operation, query and

100: document translations complement each other, increasing the chance that query

101: terms correspond to appropriate translations in documents.

102:

103: To sum up, the MT-based document translation approach is potentially

104: effective in terms of retrieval accuracy.  Besides this, since

105: retrieved documents are mostly in a user's non-native language, the

106: document translation approach is significantly effective for browsing

107: and interactive retrieval.

108:

109: However, a major drawback of this approach is that full

110: translation of large-scale collections is prohibitive in terms of

111: computational cost. In fact, Oard~\nocite{oard:amta-98}, for example,

112: spent approximately ten machine-months in translating the SDA/NZZ

113: collection. This problem is especially crucial in the case where the

114: number of user languages is large, and documents are frequently

115: updated as in the Web. Although a fast MT

116: method~\cite{mccarley:amta-98} was proposed, this method is currently

117: limited to MT within European languages, which are relatively similar

118: to one another.

119:

120: In view of the above discussions, we propose a method to minimize the

121: computational cost required for the MT-based document translation,

122: which is fundamentally twofold. First, we translate the query into the

123: document language, and retrieve a fixed number of top-ranked documents

124: (one thousand, for example). Second, we machine translate those

125: documents into the query language, and then re-rank those documents

126: based on the score, combining those individually obtained with query

127: and document translation methods. Consequently, it is expected that

128: the retrieval accuracy is improved with a minimal MT cost.

129:

130: From a different perspective, our method can be classified as a {\em

131: two-stage\/} retrieval method. However, in monolingual

132: two-stage IR, the second stage usually involves re-calculation of term

133: weights and local feedback so as to increase the number of relevant

134: documents in the final result~\cite{kwok:sigir-98}, and in

135: existing two-stage CLIR, multiple stages are used to improve

136: the quality of query

137: translation~\cite{ballesteros:sigir-97,davis:sigir-97}. In our method, by contrast, the second stage re-ranks documents based on their machine translations.

138:

139: Section~\ref{sec:system} describes our two-stage CLIR system, where we

140: elaborate mainly on the MT-based re-ranking method.

141: Section~\ref{sec:experimentation} then evaluates the performance of

142: our system, using the NACSIS test collection~\cite{kando:sigir-99},

143: which consists of 39 Japanese queries and approximately 330,000

144: technical abstracts in English and Japanese.

145:

146: \section{System Description}

147: \label{sec:system}

148:

149: \subsection{Overview}

150: \label{subsec:system_overview}

151:

152: Figure~\ref{fig:system} depicts the overall design of our

153: Japanese/English bidirectional CLIR system, in which we combined query

154: and document translation modules with a monolingual retrieval

155: system. In this section, we explain the retrieval process based on

156: this figure.

157:

158: First, given a query in the source language (S), a query translation

159: is performed to output a translation in the target language (T). In

160: this phase, we use two alternative methods. The first method is the

161: use of an MT system, for which we use the Transer Japanese/English MT

162: system.\footnote{Developed by NOVA, Inc.} This MT system uses a

163: general bilingual dictionary consisting of 230,000 entries, and 19

164: optional technical dictionaries, among which a computer terminology

165: dictionary consisting of 100,000 entries is combined with our system.

166:

167: However, since in most cases, queries consist of a small number of

168: keywords and phrases, word/phrase-based translation methods are

169: expected to be comparable with MT systems, in terms of query

170: translation. Thus, for the second method, we use the Japanese/English

171: phrase-based translation method proposed by Fujii and

172: Ishikawa~\cite{fujii:emnlp-vlc-99}, which uses general/technical

173: dictionaries to derive possible word/phrase translations, and resolves

174: translation ambiguity based on statistical information obtained from

175: the target document collection. In addition, for words unlisted in

176: dictionaries, transliteration is performed to identify phonetic

177: equivalents in the target language.

178:

179: Second, the monolingual retrieval system searches a collection for

180: documents relevant to the translated query, and sorts them according

181: to the degree of relevance (i.e., the score), in descending order.

182: For English documents, we use the SMART system~\cite{salton:71}, where

183: the augmented TF$\cdot$IDF term weighting method (``atc'') is used for

184: both queries and documents, and the score is computed based on the

185: similarity between the query and each document in a term vector space.

186: For Japanese documents, we implemented a retrieval system based on the

187: vector space model.

188:

189: Consequently, only the top $N$ documents are selected as an

190: intermediate retrieval result, where $N$ is a parametric constant.

191:

192: Third, the top $N$ documents are translated into the source

193: language. Note that unlike the query translation phase, we use solely

194: the Transer MT system, because translations are aimed primarily at

195: human users, and thus the phrase-based translation method potentially

196: degrades readability of retrieval results.

197:

198: Finally, the $N$ documents translated are {\em re\/}-ranked according

199: to the new score. To accomplish this task, we compute the similarity

200: score between the source query (submitted by the user) and each

201: translated document in the term vector space, as performed in the

202: first retrieval stage. We then compute the new score by averaging

203: those obtained independently with English and Japanese monolingual

204: similarity computations.  We will elaborate on this process in

205: Section~\ref{subsec:re-ranking}.

206:

207: Note that by decreasing the value of $N$, we can decrease the

208: computational cost required for machine translation. However, this

209: also decreases the number of relevant documents contained in the top

210: $N$ set, and potentially dilutes the effectiveness of the re-ranking.

211: For example, in an extreme case where the top $N$ set contains no

212: relevant document, the re-ranking procedure does not change the

213: retrieval accuracy.

214:

215: The re-ranking procedure is similar to McCarley's hybrid

216: method~\cite{mccarley:acl-99}, in the sense that his method also

217: combines scores obtained with query and document

218: translations. However, unlike McCarley's method, which needs to

219: translate the entire document collection prior to the retrieval, in

220: our method the overhead for translating documents is minimized and can

221: be distributed to each user. In other words, the second stage can be

222: performed on each client (i.e., users' computers or Web browsers). In

223: fact, there are a number of commercial Web browsers combined with MT

224: systems, and thus it is feasible to additionally introduce the

225: re-ranking function to those browsers.  Besides this, we can easily

226: replace the MT system with a newer version or those for other language

227: pairs.

228:

229: \begin{figure}[htbp]

230:   \begin{center}

231:     \leavevmode

232:     \psfig{file=system.eps,height=3.3in}

233:   \end{center}

234:   \caption{The overall design of our CLIR system.}

235:   \label{fig:system}

236: \end{figure}

237:

238: \subsection{MT-based Re-ranking Method}

239: \label{subsec:re-ranking}

240:

241: Given the top $N$ documents retrieved and translated into the

242: source language, we first compute the similarity score between each

243: document and the source query provided by the user. Following the

244: vector space model, both queries and documents are represented by a

245: vector consisting of statistical factors associated with indexed terms

246: (i.e., term weights).

247:

248: In conventional retrieval systems, documents are indexed to produce an

249: inverted file, prior to the retrieval, so that documents containing

250: query terms can efficiently be retrieved even from a large-scale

251: collection.  However, in the case of our re-ranking process, since (a)

252: the number of target documents is limited, and (b) real-time indexing

253: degrades the time efficiency, we prefer to use a simple pattern

254: matching method, instead of the inverted file.

255:

256: For term weighting, we tentatively use a variation of

257: TF$\cdot$IDF~\cite{salton:ipm-88,zobel:sigir-forum-98}, as shown in

258: Equation~\eq{eq:tf_idf}.

259: \begin{equation}

260:   \label{eq:tf_idf}

261:   \begin{array}{lll}

262:     TF & = & 1 + \log(f_{t,d}) \\

263:     \noalign{\vskip 1.2ex}

264:     IDF & = & \log\frac{\textstyle N}{\textstyle n_{t}}

265:   \end{array}

266: \end{equation}

267: Here, $f_{t,d}$ denotes the frequency with which term $t$ appears in

268: document $d$. Note that unlike the common IDF formula, $N$ denotes the

269: number of documents retrieved in the first stage (see

270: Section~\ref{subsec:system_overview}), and $n_{t}$ denotes the number

271: of documents containing term $t$, out of $N$ documents.
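As a worked illustration of Equation~\eq{eq:tf_idf}, consider a
hypothetical term that appears three times in a document
($f_{t,d}=3$) and occurs in 50 of the \mbox{$N=1,000$} documents
retrieved in the first stage ($n_{t}=50$). These values are chosen
purely for illustration, and natural logarithms are assumed here
because the base of the logarithm is not stated above:
\[
  \begin{array}{lllll}
    TF & = & 1 + \ln 3 & \approx & 2.099 \\
    \noalign{\vskip 1.2ex}
    IDF & = & \ln\frac{\textstyle 1000}{\textstyle 50} = \ln 20 &
    \approx & 2.996 \\
    \noalign{\vskip 1.2ex}
    TF \cdot IDF & \approx & 2.099 \times 2.996 & \approx & 6.29
  \end{array}
\]
Under the same formula, a term that occurs in all $N$ retrieved
documents receives $IDF = \log 1 = 0$, and thus does not contribute to
the score.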

272:

273: One may argue that, since in our case the number of target

274: documents is considerably smaller than that of the entire collection,

275: a different term weighting method is needed. For example, the IDF

276: formula proposed for large-scale document collections may be less

277: effective for a limited number of documents. However, a preliminary

278: experiment showed that the use of IDF marginally improved the

279: performance obtained without IDF. On the other hand, since the

280: preliminary experiment also showed that normalizing by document length

281: considerably degraded the performance, we compute the similarity

282: between the query and each document, as the inner product (instead of

283: the cosine of the angle) between their associated vectors.
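In vector-space notation, this choice can be sketched as follows,
where $w_{t,q}$ and $w_{t,d}$ denote the weights of term $t$ in query
$q$ and document $d$, respectively (symbols introduced here only for
illustration):
\begin{eqnarray*}
  sim_{\rm inner}(q,d) & = & \sum_{t} w_{t,q} \cdot w_{t,d} \\
  sim_{\rm cos}(q,d) & = & \frac{\sum_{t} w_{t,q} \cdot w_{t,d}}
  {\sqrt{\sum_{t} w_{t,q}^{2}} \cdot \sqrt{\sum_{t} w_{t,d}^{2}}}
\end{eqnarray*}
For a fixed query, the two scores differ only by the norm of the
document vector, so using the inner product amounts to omitting the
document length normalization.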

284:

285: Thereafter, for each document, we combine two similarity scores

286: obtained in English-English and Japanese-Japanese retrieval processes.

287: We shall call them $ESIM$ and $JSIM$, respectively.  Since those two

288: similarity scores have different ranges, we combine them as a weighted

289: product (geometric-mean style), instead of an arithmetic mean, as shown in

290: Equation~\eq{eq:new_similarity}.

291: \begin{eqnarray}

292:   \label{eq:new_similarity}

293:   SIM & = & ESIM^\alpha \cdot JSIM^\beta

294: \end{eqnarray}

295: Here, $SIM$ is the final similarity score with which we re-rank the

296: top $N$ documents, and $\alpha$ and $\beta$ are parametric constants

297: used to control the degree to which $ESIM$ and $JSIM$ affect the

298: computation of $SIM$. However, in the case where either $ESIM$ or

299: $JSIM$ is zero, the value of $SIM$ always becomes zero, regardless of

300: the value of the other similarity score. To avoid this problem, in

301: such a case we arbitrarily assign the value 0.0001 to whichever of $ESIM$ and

302: $JSIM$ is zero.
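The zero-handling rule above can be written compactly as follows. The
symbols $\epsilon$, $ESIM'$ and $JSIM'$ are introduced here only for
illustration:
\begin{eqnarray*}
  x' & = & \left\{
    \begin{array}{ll}
      x & \mbox{if $x > 0$} \\
      \epsilon & \mbox{if $x = 0$}
    \end{array}
  \right. \qquad (\epsilon = 0.0001) \\
  SIM & = & (ESIM')^{\alpha} \cdot (JSIM')^{\beta}
\end{eqnarray*}
As a hypothetical example with \mbox{$\alpha=\beta=1$}, a document
with $ESIM=12.0$ and $JSIM=0$ receives
$SIM = 12.0 \times 0.0001 = 0.0012$, whereas a document with
$ESIM=5.0$ and $JSIM=0$ receives $SIM = 0.0005$, so the non-zero score
still determines their relative order. Taking logarithms gives
$\log SIM = \alpha \log ESIM' + \beta \log JSIM'$, that is, a weighted
sum in the log domain.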

303:

304: Possible factors to set values of $\alpha$ and $\beta$ include the

305: quality of Japanese-English and English-Japanese translations. In the

306: case where the quality of one of the translations is considerably

307: lower, $\alpha$ and $\beta$ must be properly set so as to decrease the

308: effect of the similarity score through the lower quality

309: translation. Generally speaking, the quality of English-Japanese

310: translation is higher than that of Japanese-English translation,

311: because morphological and syntactic analyses for Japanese are usually

312: more difficult than those for English.  However, we empirically set

313: \mbox{$\alpha=\beta=1$}, that is, we consider $ESIM$ and $JSIM$

314: equally in the re-ranking process.

315:

316: \medskip

317: \section{Experimentation}

318: \label{sec:experimentation}

319:

320: \subsection{Methodology}

321: \label{subsec:eval_overview}

322:

323: We investigated the performance of several versions of our system in

324: terms of Japanese-English CLIR, where each system outputs the top

325: 1,000 documents, and the TREC evaluation software was used

326: to calculate non-interpolated average precision values.
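For reference, non-interpolated average precision for a single query
is commonly defined as shown below. The notation is introduced here
only as a reminder of the standard definition, and is not meant to
describe the internals of the evaluation software:
\[
  AP = \frac{1}{R} \sum_{k=1}^{K} P(k) \cdot rel(k)
\]
Here, $R$ is the number of relevant documents for the query, $K$ is
the number of documents returned (1,000 in our case), $P(k)$ is the
precision at rank $k$, and $rel(k)$ is 1 if the document at rank $k$
is relevant and 0 otherwise. For example, if a hypothetical query has
$R=2$ relevant documents and they are retrieved at ranks 1 and 4, then
$AP = (1/1 + 2/4)/2 = 0.75$.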

327:

328: For the purpose of our experiments, we used the official version

329: of the NACSIS test collection~\cite{kando:sigir-99}. This collection

330: consists of 39 Japanese queries and approximately 330,000 documents

331: (in either a combination of English and Japanese or either of the

332: languages individually), collected from technical papers published by

333: 65 Japanese associations for various fields.

334:

335: Each document consists of the document ID, title, name(s) of

336: author(s), name/date of conference, hosting organization, abstract and

337: keywords, from which titles, abstracts and keywords were indexed by

338: the SMART system. We used as target documents 187,081 entries that are

339: in both English and Japanese.

340:

341: Each query consists of the query ID, title of the topic, description,

342: narrative and list of synonyms, from which we used only the

343: description.  Figure~\ref{fig:query} shows example descriptions

344: (translated into English by one of the authors).

345:

346: The NACSIS collection was produced for a TREC-type (CL)IR workshop

347: held by NACSIS (National Center for Science Information Systems,

348: Japan) in 1999.\footnote{See {\tt

349: http://www.rd.nacsis.ac.jp/\~{}ntcadm/workshop/work-en.html} for

350: details of the NACSIS workshop.} In this workshop, each participant

351: was allowed to submit more than one retrieval result using different

352: methods. However, at least one result had to be gained with only the

353: description field in queries. According to experimental results

354: reported in the proceedings of the workshop~\cite{ntcir-99}, in the

355: case where only the description field was used, average precision

356: values ranged from 0.021 to 0.182.

357:

358: Relevance assessment was performed based on the pooling

359: method~\cite{voorhees:sigir-98}. To put it more precisely, candidates

360: for relevant documents were first pooled by multiple retrieval systems

361: (primarily systems that participated in the NACSIS

362: workshop). Thereafter, for each candidate document, human expert(s)

363: assigned one of three ranks of relevance, that is, ``relevant'',

364: ``partially relevant'' and \mbox{``irrelevant''.} The average number

365: of candidate documents pooled for each query is 2,509, among which the

366: numbers of relevant and partially relevant documents are approximately

367: 21 and 6, respectively.  In our experiments, we did not regard

368: ``partially relevant'' documents as relevant ones, because

369: interpretation of ``partially relevant'' is not fully clear to the

370: authors. Note that since the NACSIS collection does not contain

371: English queries, we cannot estimate a baseline for Japanese-English

372: CLIR performance using English-English IR.

373:

374: In the following two sections, we will show experimental results in

375: terms of the first and second stages (i.e., query translation methods

376: and the MT-based re-ranking method), respectively.

377:

378: \begin{figure}[htbp]

379:   \begin{center}

380:     \leavevmode

381:     \small

382:     \begin{tabular}{cl} \hline\hline

383:       ID & {\hfill\centering Description\hfill} \\ \hline

384:       0032 & middleware construction in network collaboration \\

385:       0035 & digital libraries in distributed systems \\

386:       0036 & problems related to groupwares in mobile communication \\

387:       0062 & life-long education and volunteer \\

388:       0065 & image retrieval based on genetic algorithm \\

389:       \hline

390:     \end{tabular}

391:     \caption{Example query descriptions in the NACSIS collection.}

392:     \label{fig:query}

393:   \end{center}

394: \end{figure}

395:

396: \medskip

397: \subsection{Evaluation of Query Translation Methods}

398: \label{subsec:eval_query_translation}

399:

400: The primary objective of this section is to compare the effectiveness

401: of the phrase-based translation method proposed by Fujii and

402: Ishikawa~\nocite{fujii:emnlp-vlc-99} and one based on the Transer MT

403: system, in terms of Japanese-English query translation. While the

404: former method is aimed solely at words and phrases, the MT system can

405: also be used for full sentences. In addition, since both methods are,

406: to some extent, complementary to each other, we theoretically gain a

407: query expansion effect, combining query terms translated by individual

408: methods. In view of those above factors, we compared the following

409: query translation methods:

410: \begin{itemize}

411: \item the use of the Transer MT system for full sentences contained in

412:   the description field (``MTS''),

413: \item the use of the Transer MT system for content words and phrases

414:   extracted from the description field, for which the ChaSen

415:   morphological analyzer~\cite{matsumoto:chasen-97} was used

416:   (``MTP''),

417: \item the phrase-based translation method applied to the same words

418:   and phrases as used for the MTP method (``PBT''),

419: \item the use of query terms obtained with both MTP and PBT, where

420:   terms output by both methods are considered to appear twice in the

421:   query (``MPBT'').

422: \end{itemize}

423: Table~\ref{tab:avg_pre} shows the non-interpolated average

424: precision values, averaged over the 39 queries, for different query

425: translation methods listed above.  The second column denotes the

426: average number of query terms provided with each translation method,

427: some of which were potentially discarded as stopwords by the SMART

428: system. The third column denotes average precision values for

429: different query translation methods. We will explain the fourth and

430: fifth columns in Section~\ref{subsec:eval_re-ranking}.

431:

432: Looking at this table, one can see that the two MT-based methods,

433: that is, MTS and MTP, were quite comparable in performance, and that

434: PBT outperformed both of them. In the case of PBT, the transliteration

435: successfully identified English equivalents for {\it katakana\/} words

436: unlisted in the word dictionary, such as ``{\it

437: coraboreishon\/}~(collaboration)'' and ``{\it mobairu\/}~(mobile)'',

438: which the MT-based methods failed to translate.  Another reason was

439: the difference in the dictionaries used.  Generally speaking, PBT

440: tended to output technical words more often than the MT-based methods. For

441: example, for Japanese phrases ``{\it fukusuu-deeta\/}'' and ``{\it

442: sekitsui-doubutsu}'', PBT output ``multiple data'' and ``craniate'',

443: while MTS/MTP output ``more than one data'' and ``vertebrate'',

444: respectively. Note that this effect was evident partially because the

445: NACSIS collection consists of technical documents. In addition, MPBT

446: further improved the performance of PBT. Although the difference

447: between PBT and MPBT was marginal, it is worth utilizing both the

448: MT-based and phrase-based methods, if available, for query

449: translation.

450:

451: \begin{table}[htbp]

452:   \begin{center}

453:     \caption{Non-interpolated average precision values,

454:     averaged over the 39 queries.}

455:     \medskip

456:     \leavevmode

457:     \small

458:     \tabcolsep=3pt

459:     \begin{tabular}{lrcll} \hline\hline

460:       Query Translation & & &

461:       \multicolumn{2}{c}{Avg. Precision with Re-ranking} \\

462:       \cline{4-5}

463:       Method & \# of Terms &

464:       Avg. Precision & {\hfill\centering MT\hfill} & {\hfill\centering

465:       HT\hfill} \\ \hline

466:       ~~~~~~~~MTS  & 16.6~~~~ & 0.1124 & 0.1770 (+57.5\%) & 0.2297

467:       (+104.3\%) \\

468:       ~~~~~~~~MTP  & 8.7~~~~ & 0.1134 & 0.1746 (+54.0\%) & 0.2217

469:       (+95.5\%) \\

470:       ~~~~~~~~PBT  & 6.1~~~~ & 0.1403 & 0.2013 (+43.5\%) & 0.2295

471:       (+63.6\%) \\

472:       ~~~~~~~~MPBT & 13.1~~~~ & 0.1426 & 0.1986 (+39.3\%) & 0.2356

473:       (+65.2\%) \\

474:       \hline

475:     \end{tabular}

476:     \label{tab:avg_pre}

477:   \end{center}

478: \end{table}

479:

480: To validate the above results more rigorously, we used the

481: non-parametric Wilcoxon matched-pairs signed-ranks test for statistical

482: testing (at the 5\% level), which investigates whether the difference

483: in average precision is meaningful or simply due to

484: chance~\cite{hull:sigir-93,keen:ipm-92,srinivasan:ipm-90}. We found

485: that differences in average precision values for pairs ``MTP versus

486: MTS'', ``MPBT versus MTS'', and ``MPBT versus MTP'' were significant,

487: although for other pairs, we could not obtain sufficient evidence to

488: conclude statistical significance. To sum up, we concluded that in

489: query translation, a combination of MT-based and phrase-based

490: translation methods was more effective than a method relying solely on

491: the MT system.

492:

493: \medskip

494: \subsection{Evaluation of the MT-based Re-ranking Method}

495: \label{subsec:eval_re-ranking}

496:

497: First, we consider Table~\ref{tab:avg_pre} again, where the fourth

498: column ``MT'' denotes the average precision values for each query

499: translation method, combined with the MT-based re-ranking

500: method. Throughout our experimentation in this paper, the best average

501: precision value by an automatic method was 0.2013 (i.e., one obtained

502: by PBT combined with the MT-based re-ranking method), which is

503: relatively high, when compared with average precision values reported

504: in the NACSIS workshop (ranging from 0.021 to 0.182).

505:

506: For each query translation method, the improvement in average

507: precision over the same method without re-ranking, which is generally

508: noticeable, is indicated in parentheses.  In fact, we used the

509: Wilcoxon test again, as conducted in

510: Section~\ref{subsec:eval_query_translation}, and confirmed that every

511: improvement was statistically significant. To sum up, the MT-based

512: re-ranking method we proposed was generally effective, irrespective of

513: the query translation method combined, in terms of CLIR performance.
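The percentages in parentheses are consistent with the relative
improvement over the score obtained without re-ranking. As a check
using the MTS row of Table~\ref{tab:avg_pre} (the symbols
$AP_{\rm base}$ and $AP_{\rm rerank}$ are introduced here only for
illustration):
\[
  \frac{AP_{\rm rerank} - AP_{\rm base}}{AP_{\rm base}}
  = \frac{0.1770 - 0.1124}{0.1124} \approx 0.575
\]
that is, $+57.5\%$. Small discrepancies in the last digit for other
cells may arise because the tabulated precision values are themselves
rounded.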

514:

515: Second, we conducted an error analysis for queries for which the

516: re-ranking method degraded the average precision, and found that

517: roughly two thirds of errors were due to ambiguity in the document

518: translation.  For example, the English word ``library'' was often

519: incorrectly translated into ``{\em raiburari\/}~(library as

520: software)'', whereas the original query intended ``{\em

521: toshokan\/}~(library as an institution)''.

522:

523: Third, to estimate the upper bound of the re-ranking method, as

524: denoted in the fifth column ``HT'', we used the Japanese documents

525: comparable to the English ones in the NACSIS collection as human

526: translations. By comparing the results of ``MT'' and ``HT'', one can see

527: that MT systems with a higher quality, if available, are expected to

528: further improve our CLIR system. In fact, when we manually corrected

529: inappropriate translations in translated documents, such as

530: ``library~({\em raiburari/toshokan\/})'' above, the average precision

531: of ``MT'' became almost equivalent to that of ``HT''.

532:

533: Note that when combined with the re-ranking method, differences among

534: query translation methods in average precision were relatively

535: overshadowed.  In the case of ``MT'', the Wilcoxon test showed that

536: differences in only pairs ``MPBT versus MTS'' and ``MPBT versus MTP''

537: were significant, while in the case of ``HT'', none of the differences

538: were identified as significant.

539:

540: Fourth, we investigated how the number of documents retrieved in the

541: first stage (i.e., the value of $N$ in Section~\ref{sec:system})

542: affected the performance of the re-ranking method. As discussed in

543: Section~\ref{subsec:system_overview}, in real world usage, one has to

544: consider the trade-off between the retrieval accuracy (i.e., average

545: precision in our case) and overhead required for the document

546: translation.

547:

548: Table~\ref{tab:docnum_avgpre} shows the results, where average

549: precision values in the column \mbox{``1,000''} correspond to those in

550: Table~\ref{tab:avg_pre}. By comparing average precision values for

551: each of four query translation methods (i.e., MTS, MTP, PBT and MPBT)

552: and those suffixed with ``+MT'' and ``+HT'' in

553: Table~\ref{tab:docnum_avgpre}, one can see that the re-ranking methods

554: were effective, irrespective of the number of documents retrieved.  In

555: other words, it is expected that we can reduce the overhead in

556: translating documents, while still improving on retrieval without re-ranking.

557:

558: Table~\ref{tab:xtime} shows CPU time (sec.) required for the document

559: translation and re-ranking procedures, averaged over four different

560: query translation methods. In the case of \mbox{$N=1,000$}, the total

561: CPU time was approximately three minutes, which is perhaps not

562: tolerable for real-time usage. However, for small values of $N$

563: (e.g., 50 and 100), the CPU time was more acceptable and practical,

564: while the improvement in retrieval accuracy was maintained.
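To make this trade-off concrete, the following arithmetic uses only
values from Tables~\ref{tab:docnum_avgpre} and~\ref{tab:xtime}, taking
the PBT method at \mbox{$N=100$} as an example. Note that the CPU times
are averaged over the four query translation methods, so the
comparison is indicative rather than exact:
\begin{eqnarray*}
  \mbox{relative improvement} & = & \frac{0.1723 - 0.1301}{0.1301}
  \approx 0.324 \\
  \mbox{CPU time ratio} & = & \frac{18.0}{178.1} \approx 0.10
\end{eqnarray*}
That is, re-ranking only the top 100 documents still improves average
precision by roughly 32\%, at about one tenth of the CPU time needed
for \mbox{$N=1,000$}. The translation cost is close to linear in $N$:
$9.5/50 = 0.19$ and $175.1/1000 \approx 0.18$ sec.\ per document.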

565:

566: \begin{table}[htbp]

567:   \begin{center}

568:     \caption{The relation between the number of documents retrieved in

569:     the first stage and non-interpolated average precision

570:     values, averaged over the 39 queries.}

571:     \medskip

572:     \leavevmode

573:     \small

574:     \tabcolsep=4pt

575:     \begin{tabular}{lccccccc} \hline\hline

576:       & \multicolumn{7}{c}{\# of Documents Retrieved ($N$)} \\

577:       \cline{2-8}

578:       {\hfill\centering Method\hfill} & 50 & 100 & 200 & 400 & 600 &

579:       800 & 1,000 \\ \hline

580:       MTS & 0.0949 & 0.1017 & 0.1074 & 0.1101 & 0.1112 & 0.1119 &

581:       0.1124 \\

582:       MTS+MT & 0.1341 & 0.1556 & 0.1673 & 0.1698 & 0.1720 & 0.1736 &

583:       0.1770 \\

584:       MTS+HT & 0.1666 & 0.1901 & 0.2070 & 0.2173 & 0.2230 & 0.2259 &

585:       0.2297 \\

586:       \hline

587:       MTP & 0.0953 & 0.1020 & 0.1085 & 0.1113 & 0.1123 & 0.1131 &

588:       0.1134 \\

589:       MTP+MT & 0.1449 & 0.1584 & 0.1692 & 0.1711 & 0.1728 & 0.1750 &

590:       0.1746 \\

591:       MTP+HT & 0.1619 & 0.1819 & 0.2017 & 0.2105 & 0.2165 & 0.2203 &

592:       0.2217 \\

593:       \hline

594:       PBT & 0.1215 & 0.1301 & 0.1355 & 0.1385 & 0.1394 & 0.1399 &

595:       0.1403 \\

596:       PBT+MT & 0.1553 & 0.1723 & 0.1866 & 0.1954 & 0.1978 & 0.2005 &

597:       0.2013 \\

598:       PBT+HT & 0.1722 & 0.1915 & 0.2097 & 0.2212 & 0.2241 & 0.2279 &

599:       0.2295 \\

600:       \hline

601:       MPBT & 0.1229 & 0.1305 & 0.1376 & 0.1405 & 0.1416 & 0.1421 &

602:       0.1426 \\

603:       MPBT+MT & 0.1690 & 0.1766 & 0.1901 & 0.1946 & 0.1958 & 0.1967 &

604:       0.1986 \\

605:       MPBT+HT & 0.1814 & 0.1968 & 0.2142 & 0.2242 & 0.2301 & 0.2319 &

606:       0.2356 \\

607:       \hline

608:     \end{tabular}

609:     \label{tab:docnum_avgpre}

610:   \end{center}

611: \end{table}

612:

613: \begin{table}[htbp]

614:   \begin{center}

615:     \caption{CPU time for document translation and re-ranking (sec.).}

616:     \medskip

617:     \leavevmode

618:     \small

619:     \tabcolsep=4pt

620:     \begin{tabular}{lrrrrrrr} \hline\hline

621:       & \multicolumn{7}{c}{\# of Documents Retrieved ($N$)} \\

622:       \cline{2-8}

623:       & {\hfill\centering 50 \hfill} &

624:       {\hfill\centering 100 \hfill} &

625:       {\hfill\centering 200 \hfill} &

626:       {\hfill\centering 400 \hfill} &

627:       {\hfill\centering 600 \hfill} &

628:       {\hfill\centering 800 \hfill} &

629:       {\hfill\centering 1,000 \hfill}

630:       \\ \hline

631:       translation & 9.5 & 17.7 & 33.3 & 65.6 & 106.2 & 139.3 & 175.1 \\

632:       re-ranking  & 0.2 &  0.3 &  0.6 &  1.2 &   1.8 &   2.4 &   3.0 \\

633:       total       & 9.7 & 18.0 & 33.9 & 66.8 & 108.0 & 141.7 & 178.1 \\

634:       \hline

635:       \multicolumn{8}{r}{(Pentium III 700MHz)}

636:     \end{tabular}

637:     \label{tab:xtime}

638:   \end{center}

639: \end{table}

640:

641: \section{Conclusion}

642: \label{sec:conclusion}

643:

644: Reflecting the rapid growth in the use of machine readable texts,

645: cross-language information retrieval (CLIR) has been explored in

646: various ways in order to facilitate retrieving information across

647: languages.

648:

649: In brief, existing CLIR systems can be classified into three approaches:

650: (a) translating queries into the document language, (b) translating

651: documents into the query language, and (c) representing both queries

652: and documents in a language-independent space. Among these approaches,

653: the second approach, based on machine translation, is effective in

654: terms of retrieval accuracy and user interaction.  However, the

655: computational cost of translating large-scale document collections is

656: prohibitive.

657:

658: To resolve this problem, we proposed a two-stage CLIR method, in which

659: we first used a query translation method to retrieve a fixed number of

660: documents, and then applied machine translation only to those

661: documents, instead of the entire collection, to improve the document

662: ranking.

663:

664: Through Japanese-English CLIR experiments using the NACSIS collection,

665: we showed that our two-stage method significantly improved on the average

666: precision values obtained with query translation methods alone. We

667: also showed that our method remained effective even when

668: the number of documents retrieved in the first stage was relatively small.

669:

670: \section*{Acknowledgments}

671:

672: The authors would like to thank NOVA, Inc. for their support with the

673: Transer MT system, and Noriko Kando (National Institute of

674: Informatics, Japan) for her support with the NACSIS collection.

675:

676:

677: \bibliographystyle{jplain}

678:

679: \begin{thebibliography}{10}

680:

681: \bibitem{ballesteros:sigir-97}

682: Lisa Ballesteros and W.~Bruce Croft.

683: \newblock Phrasal translation and query expansion techniques for cross-language

684:   information retrieval.

685: \newblock In {\em Proceedings of the 20th Annual International ACM SIGIR

686:   Conference on Research and Development in Information Retrieval}, pp. 84--91,

687:   1997.

688:

689: \bibitem{ballesteros:sigir-98}

690: Lisa Ballesteros and W.~Bruce Croft.

691: \newblock Resolving ambiguity for cross-language retrieval.

692: \newblock In {\em Proceedings of the 21st Annual International ACM SIGIR

693:   Conference on Research and Development in Information Retrieval}, pp. 64--71,

694:   1998.

695:

696: \bibitem{carbonell:ijcai-97}

697: Jaime~G. Carbonell, Yiming Yang, Robert~E. Frederking, Ralf~D. Brown, Yibing

698:   Geng, and Danny Lee.

699: \newblock Translingual information retrieval: A comparative evaluation.

700: \newblock In {\em Proceedings of the 15th International Joint Conference on

701:   Artificial Intelligence}, pp. 708--714, 1997.

702:

703: \bibitem{davis:sigir-97}

704: Mark~W. Davis and William~C. Ogden.

705: \newblock {QUILT}: Implementing a large-scale cross-language text retrieval

706:   system.

707: \newblock In {\em Proceedings of the 20th Annual International ACM SIGIR

708:   Conference on Research and Development in Information Retrieval}, pp. 92--98,

709:   1997.

710:

711: \bibitem{fujii:emnlp-vlc-99}

712: Atsushi Fujii and Tetsuya Ishikawa.

713: \newblock Cross-language information retrieval for technical documents.

714: \newblock In {\em Proceedings of the Joint ACL SIGDAT Conference on Empirical

715:   Methods in Natural Language Processing and Very Large Corpora}, pp. 29--37,

716:   1999.

717:

718: \bibitem{gonzalo:chum-98}

719: Julio Gonzalo, Felisa Verdejo, Carol Peters, and Nicoletta Calzolari.

720: \newblock Applying {EuroWordNet} to cross-language text retrieval.

721: \newblock {\em Computers and the Humanities}, Vol.~32, pp. 185--207, 1998.

722:

723: \bibitem{hull:sigir-93}

724: David Hull.

725: \newblock Using statistical testing in the evaluation of retrieval experiments.

726: \newblock In {\em Proceedings of the 16th Annual International ACM SIGIR

727:   Conference on Research and Development in Information Retrieval}, pp.

728:   329--338, 1993.

729:

730: \bibitem{kando:sigir-99}

731: Noriko Kando, Kazuko Kuriyama, and Toshihiko Nozue.

732: \newblock {NACSIS} test collection workshop ({NTCIR-1}).

733: \newblock In {\em Proceedings of the 22nd Annual International ACM SIGIR

734:   Conference on Research and Development in Information Retrieval}, pp.

735:   299--300, 1999.

736:

737: \bibitem{keen:ipm-92}

738: E.~Michael Keen.

739: \newblock Presenting results of experimental retrieval comparisons.

740: \newblock {\em Information Processing \& Management}, Vol.~28, No.~4, pp.

741:   491--502, 1992.

742:

743: \bibitem{kwok:sigir-98}

744: K.~L. Kwok and M.~Chan.

745: \newblock Improving two-stage ad-hoc retrieval for short queries.

746: \newblock In {\em Proceedings of the 21st Annual International ACM SIGIR

747:   Conference on Research and Development in Information Retrieval}, pp.

748:   250--256, 1998.

749:

750: \bibitem{littman:clir-98}

751: Michael~L. Littman, Susan~T. Dumais, and Thomas~K. Landauer.

752: \newblock Automatic cross-language information retrieval using latent semantic

753:   indexing.

754: \newblock In Gregory Grefenstette, editor, {\em Cross-Language Information

755:   Retrieval}, chapter~5, pp. 51--62. Kluwer Academic Publishers, 1998.

756:

757: \bibitem{matsumoto:chasen-97}

758: Yuji Matsumoto, Akira Kitauchi, Tatsuo Yamashita, Osamu Imaichi, and Tomoaki

759:   Imamura.

760: \newblock {Japanese} morphological analysis system {ChaSen} manual.

761: \newblock Technical Report NAIST-IS-TR97007, NAIST, 1997.

762: \newblock (In Japanese).

763:

764: \bibitem{mccarley:acl-99}

765: J.~Scott McCarley.

766: \newblock Should we translate the documents or the queries in cross-language

767:   information retrieval?

768: \newblock In {\em Proceedings of the 37th Annual Meeting of the Association for

769:   Computational Linguistics}, pp. 208--214, 1999.

770:

771: \bibitem{mccarley:amta-98}

772: J.~Scott McCarley and Salim Roukos.

773: \newblock Fast document translation for cross-language information retrieval.

774: \newblock In {\em Proceedings of the 3rd Conference of the Association for

775:   Machine Translation in the Americas}, pp. 150--157, 1998.

776:

777: \bibitem{ntcir-99}

778: {National Center for Science Information Systems}.

779: \newblock {\em Proceedings of the 1st NTCIR Workshop on Research in Japanese

780:   Text Retrieval and Term Recognition}, 1999.

781:

782: \bibitem{nie:sigir-99}

783: Jian-Yun Nie, Michel Simard, Pierre Isabelle, and Richard Durand.

784: \newblock Cross-language information retrieval based on parallel texts and

785:   automatic mining of parallel texts from the {Web}.

786: \newblock In {\em Proceedings of the 22nd Annual International ACM SIGIR

787:   Conference on Research and Development in Information Retrieval}, pp. 74--81,

788:   1999.

789:

790: \bibitem{oard:amta-98}

791: Douglas~W. Oard.

792: \newblock A comparative study of query and document translation for

793:   cross-language information retrieval.

794: \newblock In {\em Proceedings of the 3rd Conference of the Association for

795:   Machine Translation in the Americas}, pp. 472--483, 1998.

796:

797: \bibitem{salton:jasis-70}

798: Gerard Salton.

799: \newblock Automatic processing of foreign language documents.

800: \newblock {\em Journal of the American Society for Information Science},

801:   Vol.~21, No.~3, pp. 187--194, 1970.

802:

803: \bibitem{salton:71}

804: Gerard Salton.

805: \newblock {\em The {SMART} Retrieval System: Experiments in Automatic Document

806:   Processing}.

807: \newblock Prentice-Hall, 1971.

808:

809: \bibitem{salton:ipm-88}

810: Gerard Salton and Christopher Buckley.

811: \newblock Term-weighting approaches in automatic text retrieval.

812: \newblock {\em Information Processing \& Management}, Vol.~24, No.~5, pp.

813:   513--523, 1988.

814:

815: \bibitem{srinivasan:ipm-90}

816: Padmini Srinivasan.

817: \newblock A comparison of two-{Poisson}, inverse document frequency and

818:   discrimination value models of document representation.

819: \newblock {\em Information Processing \& Management}, Vol.~26, No.~2, pp.

820:   269--278, 1990.

821:

822: \bibitem{voorhees:sigir-98}

823: Ellen~M. Voorhees.

824: \newblock Variations in relevance judgments and the measurement of retrieval

825:   effectiveness.

826: \newblock In {\em Proceedings of the 21st Annual International ACM SIGIR

827:   Conference on Research and Development in Information Retrieval}, pp.

828:   315--323, 1998.

829:

830: \bibitem{zobel:sigir-forum-98}

831: Justin Zobel and Alistair Moffat.

832: \newblock Exploring the similarity space.

833: \newblock {\em ACM SIGIR Forum}, Vol.~32, No.~1, pp. 18--34, 1998.

834:

835: \end{thebibliography}

836:

837: \end{document}

838: