1: %%
2: %% AMTA-2000 camera-ready
3: %%
4: \documentstyle{llncs}
5:
6: \title{Applying Machine Translation to Two-Stage Cross-Language
7: Information Retrieval}
8:
9: \author{\Large Atsushi Fujii and Tetsuya Ishikawa}
10:
11: \institute{University of Library and Information Science \\ 1-2
12: Kasuga, Tsukuba, 305-8550, Japan \\ \smallskip {\normalsize\tt
13: E-mail:~fujii@ulis.ac.jp}}
14:
15: \newcommand{\etal}{et~al.}
16: \newcommand{\etaleos}{et~al}
17: \newcommand{\eq}[1]{(\ref{#1})}
18:
19: \newcommand{\shortcite}[1]{\cite{#1}}
20: \renewcommand{\nocite}[1]{\shortcite{#1}}
21:
22: \input{psfig.tex}
23:
24: \begin{document}
25:
26: \maketitle
27:
28: \begin{abstract}
29: Cross-language information retrieval (CLIR), where queries and
30: documents are in different languages, needs a translation of queries
31: and/or documents, so as to standardize both of them into a common
32: representation. For this purpose, the use of machine translation is
33: an effective approach. However, computational cost is prohibitive in
34: translating large-scale document collections. To resolve this
35: problem, we propose a two-stage CLIR method. First, we translate a
36: given query into the document language, and retrieve a limited
37: number of foreign documents. Second, we machine translate only those
38: documents into the user language, and re-rank them based on the
39: translation result. We also show the effectiveness of our method by
40: way of experiments using Japanese queries and English technical
41: documents.
42: \end{abstract}
43:
44: \section{Introduction}
45: \label{sec:introduction}
46:
47: The number of machine readable texts accessible via CD-ROMs and the
48: World Wide Web has been rapidly growing. However, since the content of
49: each text is usually provided in a limited number of languages, the
50: notion of information retrieval (IR) has been expanded so that users
51: can retrieve textual information (i.e., documents) across
52: languages. One application, commonly termed ``cross-language
53: information retrieval (CLIR)'', is the retrieval task where the user
54: presents queries in one language to retrieve documents in another
55: language. Thus, as can be predicted, CLIR needs to standardize queries
56: and documents into a common representation, so that monolingual IR
57: techniques can be applied. From this point of view, existing CLIR can
58: be classified into three approaches.
59:
60: The first approach translates queries into the document
61: language~\cite{ballesteros:sigir-98,davis:sigir-97,fujii:emnlp-vlc-99,nie:sigir-99},
62: while the second approach translates documents into the query
63: language~\cite{mccarley:acl-99,oard:amta-98}. The third approach
64: projects both queries and documents into a language-independent
65: representation by way of thesaurus
66: classes~\cite{gonzalo:chum-98,salton:jasis-70} and latent semantic
67: indexing~\cite{carbonell:ijcai-97,littman:clir-98}.
68:
Although rigorous and extensive comparative experiments among
different approaches are difficult and expensive, a few cases can be
found in the past CLIR literature.
72:
73: Oard~\nocite{oard:amta-98} compared the query and document translation
74: methods. For the purpose of English-German CLIR experiments, he used
75: the 21 English queries and SDA/NZZ German collection consisting of
76: 251,840 newswire articles, contained in the TREC-6 CLIR collection.
77: Then, he showed that the MT-based query translation with the Logos
78: system was more effective than various types of dictionary-based query
79: translation methods, and that the MT-based document translation method
80: further outperformed the MT-based query translation method. Those
81: findings were salient especially when the length of queries was large.
82:
83: McCarley~\nocite{mccarley:acl-99} conducted English/French
84: bidirectional CLIR experiments, where the 141,656 AP English documents
85: and 212,918 SDA French documents in the TREC-6 and TREC-7 collections
86: were used, and applied a statistical MT method to both query and
87: document translation methods. He showed that the relative superiority
88: between query and document translation methods varied depending on the
89: source and target language pair. To put it more precisely, in his
90: case, the quality of French-English translation was better than that
91: of English-French translation, for both query and document
92: translations.
93:
94: In addition, he showed that a hybrid method, where the relevance
95: degree of each document (i.e., the ``score'') is the mean of those
96: obtained with query and document translation methods, outperformed
97: methods based on either query or document translation, irrespective of
the source and target language pair. One possible rationale is that,
since machine translation is not an invertible operation, query and
document translations complement each other and increase the chance
that query terms correspond to appropriate translations in documents.
102:
103: To sum up, the MT-based document translation approach is potentially
104: effective in terms of retrieval accuracy. Besides this, since
105: retrieved documents are mostly in a user's non-native language, the
106: document translation approach is significantly effective for browsing
107: and interactive retrieval.
108:
109: However, a major drawback of this approach is that the full
110: translation on large-scale collections is prohibitive in terms of
111: computational cost. In fact, Oard~\nocite{oard:amta-98}, for example,
112: spent approximately ten machine-months in translating the SDA/NZZ
113: collection. This problem is especially crucial in the case where the
114: number of user languages is large, and documents are frequently
115: updated as in the Web. Although a fast MT
116: method~\cite{mccarley:amta-98} was proposed, this method is currently
117: limited to MT within European languages, which are relatively similar
118: to one another.
119:
In view of the above discussions, we propose a two-stage method that
minimizes the computational cost required for MT-based document
translation. First, we translate the query into the
document language, and retrieve a fixed number of top-ranked documents
(one thousand, for example). Second, we machine translate those
documents into the query language, and then re-rank them
based on a score that combines the scores individually obtained with
the query and document translation methods. Consequently, the
retrieval accuracy is expected to improve at a minimal MT cost.
129:
From a different perspective, our method can be regarded as an
instance of the {\em two-stage\/} retrieval principle. However, it
differs from existing two-stage methods. In monolingual
two-stage IR, the second stage usually involves re-calculation of term
weights and local feedback so as to increase the number of relevant
documents in the final result~\cite{kwok:sigir-98}, and in
existing two-stage CLIR, multiple stages are used to improve
the quality of query
translation~\cite{ballesteros:sigir-97,davis:sigir-97}.
138:
139: Section~\ref{sec:system} describes our two-stage CLIR system, where we
140: elaborate mainly on the MT-based re-ranking method.
141: Section~\ref{sec:experimentation} then evaluates the performance of
142: our system, using the NACSIS test collection~\cite{kando:sigir-99},
143: which consists of 39 Japanese queries and approximately 330,000
144: technical abstracts in English and Japanese.
145:
146: \section{System Description}
147: \label{sec:system}
148:
149: \subsection{Overview}
150: \label{subsec:system_overview}
151:
152: Figure~\ref{fig:system} depicts the overall design of our
153: Japanese/English bidirectional CLIR system, in which we combined query
154: and document translation modules with a monolingual retrieval
155: system. In this section, we explain the retrieval process based on
156: this figure.
157:
158: First, given a query in the source language (S), a query translation
159: is performed to output a translation in the target language (T). In
160: this phase, we use two alternative methods. The first method is the
161: use of an MT system, for which we use the Transer Japanese/English MT
162: system.\footnote{Developed by NOVA, Inc.} This MT system uses a
163: general bilingual dictionary consisting of 230,000 entries, and 19
164: optional technical dictionaries, among which a computer terminology
165: dictionary consisting of 100,000 entries is combined with our system.
166:
167: However, since in most cases, queries consist of a small number of
keywords and phrases, word/phrase-based translation methods are
169: expected to be comparable with MT systems, in terms of query
170: translation. Thus, for the second method, we use the Japanese/English
171: phrase-based translation method proposed by Fujii and
172: Ishikawa~\cite{fujii:emnlp-vlc-99}, which uses general/technical
173: dictionaries to derive possible word/phrase translations, and resolves
174: translation ambiguity based on statistical information obtained from
175: the target document collection. In addition, for words unlisted in
176: dictionaries, transliteration is performed to identify phonetic
177: equivalents in the target language.
178:
179: Second, the monolingual retrieval system searches a collection for
180: documents relevant to the translated query, and sorts them according
181: to the degree of relevance (i.e., the score), in descending order.
182: For English documents, we use the SMART system~\cite{salton:71}, where
183: the augmented TF$\cdot$IDF term weighting method (``atc'') is used for
184: both queries and documents, and the score is computed based on the
185: similarity between the query and each document in a term vector space.
186: For Japanese documents, we implemented a retrieval system based on the
187: vector space model.
188:
189: Consequently, only the top $N$ documents are selected as an
190: intermediate retrieval result, where $N$ is a parametric constant.
191:
192: Third, the top $N$ documents are translated into the source
193: language. Note that unlike the query translation phase, we use solely
194: the Transer MT system, because translations are aimed primarily at
195: human users, and thus the phrase-based translation method potentially
196: degrades readability of retrieval results.
197:
198: Finally, the $N$ documents translated are {\em re\/}-ranked according
199: to the new score. To accomplish this task, we compute the similarity
200: score between the source query (submitted by the user) and each
201: translated document in the term vector space, as performed in the
202: first retrieval stage. We then compute the new score by averaging
203: those obtained independently with English and Japanese monolingual
204: similarity computations. We will elaborate on this process in
205: Section~\ref{subsec:re-ranking}.
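
The four steps above can be sketched as follows. This is a minimal
Python illustration rather than the actual implementation: every
function argument is a hypothetical stand-in for a component in
Figure~\ref{fig:system} (query translation, first-stage retrieval,
document MT, source-language scoring and score combination).

\begin{verbatim}
```python
from typing import Callable, List, Tuple

# (doc_id, target-language text, first-stage score)
Hit = Tuple[str, str, float]


def two_stage_search(
    query: str,
    translate_query: Callable[[str], str],
    retrieve: Callable[[str, int], List[Hit]],
    translate_doc: Callable[[str], str],
    score_source: Callable[[str, str], float],
    combine: Callable[[float, float], float],
    n: int = 1000,
) -> List[Tuple[str, float]]:
    """Return (doc_id, new score) pairs, best first.

    All callables are hypothetical placeholders for real components.
    """
    # Stage 1: translate the query (S -> T) and keep only the top-n documents.
    hits = retrieve(translate_query(query), n)
    reranked = []
    for doc_id, text, target_score in hits:
        # Stage 2: translate each retained document (T -> S) and score it
        # against the original, untranslated query.
        source_score = score_source(query, translate_doc(text))
        reranked.append((doc_id, combine(target_score, source_score)))
    # Re-rank the n documents by the combined score, in descending order.
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)
```
\end{verbatim}

In this sketch only the loop body invokes document translation, so the
MT cost grows with $N$ rather than with the size of the collection.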
206:
207: Note that by decreasing the value of $N$, we can decrease the
208: computational cost required for machine translation. However, this
209: also decreases the number of relevant documents contained in the top
210: $N$ set, and potentially dilutes the effectiveness of the re-ranking.
211: For example, in an extreme case where the top $N$ set contains no
212: relevant document, the re-ranking procedure does not change the
213: retrieval accuracy.
214:
215: The re-ranking procedure is similar to McCarley's hybrid
216: method~\cite{mccarley:acl-99}, in the sense that his method also
217: combines scores obtained with query and document
218: translations. However, unlike McCarley's method, which needs to
219: translate the entire document collection prior to the retrieval, in
220: our method the overhead for translating documents is minimized and can
221: be distributed to each user. In other words, the second stage can be
222: performed on each client (i.e., users' computers or Web browsers). In
223: fact, there are a number of commercial Web browsers combined with MT
224: systems, and thus it is feasible to additionally introduce the
225: re-ranking function to those browsers. Besides this, we can easily
226: replace the MT system with a newer version or those for other language
227: pairs.
228:
229: \begin{figure}[htbp]
230: \begin{center}
231: \leavevmode
232: \psfig{file=system.eps,height=3.3in}
233: \end{center}
234: \caption{The overall design of our CLIR system.}
235: \label{fig:system}
236: \end{figure}
237:
238: \subsection{MT-based Re-ranking Method}
239: \label{subsec:re-ranking}
240:
Given the top $N$ documents retrieved and translated into the
source language, we first compute the similarity score between each
document and the source query provided by the user. Following the
244: vector space model, both queries and documents are represented by a
245: vector consisting of statistical factors associated with indexed terms
246: (i.e., term weights).
247:
248: In conventional retrieval systems, documents are indexed to produce an
249: inverted file, prior to the retrieval, so that documents containing
250: query terms can efficiently be retrieved even from a large-scale
251: collection. However, in the case of our re-ranking process, since (a)
252: the number of target documents is limited, and (b) real-time indexing
253: degrades the time efficiency, we prefer to use a simple pattern
254: matching method, instead of the inverted file.
255:
256: For term weighting, we tentatively use a variation of
257: TF$\cdot$IDF~\cite{salton:ipm-88,zobel:sigir-forum-98}, as shown in
258: Equation~\eq{eq:tf_idf}.
259: \begin{equation}
260: \label{eq:tf_idf}
261: \begin{array}{lll}
262: TF & = & 1 + \log(f_{t,d}) \\
263: \noalign{\vskip 1.2ex}
264: IDF & = & \log\frac{\textstyle N}{\textstyle n_{t}}
265: \end{array}
266: \end{equation}
Here, $f_{t,d}$ denotes the frequency with which term $t$ appears in
document $d$. Note that unlike the common IDF formula, $N$ denotes the
269: number of documents retrieved in the first stage (see
270: Section~\ref{subsec:system_overview}), and $n_{t}$ denotes the number
271: of documents containing term $t$, out of $N$ documents.
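
As a worked illustration, assume natural logarithms (the base is not
specified above) and hypothetical counts $f_{t,d}=3$, $N=1000$ and
$n_{t}=10$. The weight of term $t$ in document $d$ would then be
\[
TF \cdot IDF = (1 + \log 3) \cdot \log\frac{1000}{10}
\approx 2.0986 \times 4.6052 \approx 9.66 .
\]
Note also that a term contained in all $N$ documents has $n_{t}=N$,
and thus $IDF=\log 1=0$, so that it receives zero weight.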
272:
One may argue that a different term weighting method is needed,
because in our case the number of target documents is considerably
smaller than that of the entire collection. For example, the IDF
formula proposed for large-scale document collections may be less
effective for a limited number of documents. However, a preliminary
experiment showed that the use of IDF marginally improved the
performance over that obtained without IDF. On the other hand, since
the preliminary experiment also showed that normalization by document
length considerably degraded the performance, we compute the similarity
between the query and each document as the inner product (instead of
the cosine of the angle) between their associated vectors.
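
The weighting and scoring described above can be sketched in Python as
follows. The sketch assumes that documents are already segmented into
lists of terms, that logarithms are natural, and that query term
weights are supplied by the caller. None of these details is specified
above, and the function names are hypothetical.

\begin{verbatim}
```python
import math
from typing import Dict, List


def term_weights(docs: List[List[str]], terms: List[str]) -> List[Dict[str, float]]:
    """TF.IDF weights of the given terms in each of the N retained documents."""
    n_docs = len(docs)
    # n_t is counted over the N retained documents, not the whole collection.
    doc_freq = {t: sum(1 for d in docs if t in d) for t in set(terms)}
    weights = []
    for d in docs:
        w = {}
        for t, n_t in doc_freq.items():
            f_td = d.count(t)  # simple matching instead of an inverted file
            if f_td > 0:
                w[t] = (1.0 + math.log(f_td)) * math.log(n_docs / n_t)
        weights.append(w)
    return weights


def inner_product(query_w: Dict[str, float], doc_w: Dict[str, float]) -> float:
    """Similarity without length normalization (no cosine)."""
    return sum(q * doc_w.get(t, 0.0) for t, q in query_w.items())
```
\end{verbatim}

Because only the $N$ retained documents are scanned, the cost of this
matching is linear in $N$ and in the number of query terms.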
284:
285: Thereafter, for each document, we combine two similarity scores
286: obtained in English-English and Japanese-Japanese retrieval processes.
287: We shall call them $ESIM$ and $JSIM$, respectively. Since those two
288: similarity scores have different ranges, we use a geometric mean,
289: instead of an arithmetic mean, as shown in
290: Equation~\eq{eq:new_similarity}.
291: \begin{eqnarray}
292: \label{eq:new_similarity}
293: SIM & = & ESIM^\alpha \cdot JSIM^\beta
294: \end{eqnarray}
295: Here, $SIM$ is the final similarity score with which we re-rank the
296: top $N$ documents, and $\alpha$ and $\beta$ are parametric constants
297: used to control the degree to which $ESIM$ and $JSIM$ affect the
computation of $SIM$. However, in the case where either $ESIM$ or
$JSIM$ is zero, the value of $SIM$ always becomes zero, regardless of
the value of the other similarity score. To avoid this problem, in
such a case we arbitrarily assign the value 0.0001 to whichever of
$ESIM$ and $JSIM$ is zero.
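
Equation~\eq{eq:new_similarity} and the zero-score rule can be sketched
in Python as follows. The function and constant names are hypothetical;
only the formula and the value 0.0001 are taken from the text.

\begin{verbatim}
```python
from typing import List, Tuple

ZERO_FLOOR = 1e-4  # value substituted for a zero score, as stated in the text


def combined_score(esim: float, jsim: float,
                   alpha: float = 1.0, beta: float = 1.0) -> float:
    """SIM = ESIM^alpha * JSIM^beta, with zero scores replaced by ZERO_FLOOR."""
    esim = esim if esim > 0.0 else ZERO_FLOOR
    jsim = jsim if jsim > 0.0 else ZERO_FLOOR
    return (esim ** alpha) * (jsim ** beta)


def rerank(scored: List[Tuple[str, float, float]]) -> List[str]:
    """Order document IDs by descending SIM; items are (doc_id, ESIM, JSIM)."""
    ranked = sorted(scored,
                    key=lambda item: combined_score(item[1], item[2]),
                    reverse=True)
    return [doc_id for doc_id, _, _ in ranked]
```
\end{verbatim}

With this rule, a document that matches in only one language is still
ranked above a document that matches in neither.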
303:
304: Possible factors to set values of $\alpha$ and $\beta$ include the
305: quality of Japanese-English and English-Japanese translations. In the
306: case where the quality of one of the translations is considerably
307: lower, $\alpha$ and $\beta$ must be properly set so as to decrease the
308: effect of the similarity score through the lower quality
309: translation. Generally speaking, the quality of English-Japanese
310: translation is higher than that of Japanese-English translation,
311: because morphological and syntactic analyses for Japanese are usually
312: more crucial than those for English. However, we empirically set
313: \mbox{$\alpha=\beta=1$}, that is, we consider $ESIM$ and $JSIM$
314: equally in the re-ranking process.
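
The effect of $\alpha$ and $\beta$ can be seen by taking logarithms of
Equation~\eq{eq:new_similarity}, for positive scores:
\[
\log SIM = \alpha \log ESIM + \beta \log JSIM .
\]
That is, ranking by $SIM$ is equivalent to ranking by a weighted sum
of log-scores. If $ESIM$ is multiplied by a constant $c > 0$ for every
document, $\log SIM$ shifts by $\alpha \log c$ for all documents alike,
and the ranking is unchanged, which is consistent with the remark above
that the two scores have different ranges. With $\alpha=\beta=1$,
$SIM = ESIM \cdot JSIM$ is the square of the geometric mean
$\sqrt{ESIM \cdot JSIM}$, and yields the same ranking.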
315:
316: \medskip
317: \section{Experimentation}
318: \label{sec:experimentation}
319:
320: \subsection{Methodology}
321: \label{subsec:eval_overview}
322:
323: We investigated the performance of several versions of our system in
324: terms of Japanese-English CLIR, where each system outputs the top
325: 1,000 documents, and the TREC evaluation software was used
326: to calculate non-interpolated average precision values.
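
For reference, non-interpolated average precision for a single query
can be sketched as below. This is a Python illustration assuming the
usual definition (the mean, over all relevant documents, of the
precision at the rank of each retrieved relevant document), not the
TREC evaluation software itself. The function name is hypothetical.

\begin{verbatim}
```python
from typing import Iterable, Set


def average_precision(ranked_ids: Iterable[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision of one ranked list."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    # Relevant documents that are never retrieved contribute zero.
    return precision_sum / len(relevant) if relevant else 0.0
```
\end{verbatim}

The values reported below would correspond to this quantity averaged
over the queries.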
327:
328: For the purpose of our experiments, we used the official version
329: of the NACSIS test collection~\cite{kando:sigir-99}. This collection
330: consists of 39 Japanese queries and approximately 330,000 documents
331: (in either a combination of English and Japanese or either of the
332: languages individually), collected from technical papers published by
333: 65 Japanese associations for various fields.
334:
335: Each document consists of the document ID, title, name(s) of
336: author(s), name/date of conference, hosting organization, abstract and
337: keywords, from which titles, abstracts and keywords were indexed by
338: the SMART system. We used as target documents 187,081 entries that are
339: in both English and Japanese.
340:
341: Each query consists of the query ID, title of the topic, description,
342: narrative and list of synonyms, from which we used only the
343: description. Figure~\ref{fig:query} shows example descriptions
344: (translated into English by one of the authors).
345:
346: The NACSIS collection was produced for a TREC-type (CL)IR workshop
347: held by NACSIS (National Center for Science Information Systems,
348: Japan) in 1999.\footnote{See {\tt
349: http://www.rd.nacsis.ac.jp/\~{}ntcadm/workshop/work-en.html} for
350: details of the NACSIS workshop.} In this workshop, each participant
351: was allowed to submit more than one retrieval result using different
352: methods. However, at least one result had to be gained with only the
353: description field in queries. According to experimental results
354: reported in the proceedings of the workshop~\cite{ntcir-99}, in the
355: case where only the description field was used, average precision
356: values ranged from 0.021 to 0.182.
357:
358: Relevance assessment was performed based on the pooling
359: method~\cite{voorhees:sigir-98}. To put it more precisely, candidates
360: for relevant documents were first pooled by multiple retrieval systems
361: (primarily systems that participated in the NACSIS
362: workshop). Thereafter, for each candidate document, human expert(s)
363: assigned one of three ranks of relevance, that is, ``relevant'',
364: ``partially relevant'' and \mbox{``irrelevant''.} The average number
of candidate documents pooled for each query is 2,509, among which the
numbers of relevant and partially relevant documents are approximately
21 and 6, respectively. In our experiments, we did not regard
368: ``partially relevant'' documents as relevant ones, because
369: interpretation of ``partially relevant'' is not fully clear to the
370: authors. Note that since the NACSIS collection does not contain
371: English queries, we cannot estimate a baseline for Japanese-English
372: CLIR performance using English-English IR.
373:
374: In the following two sections, we will show experimental results in
375: terms of the first and second stages (i.e., query translation methods
376: and the MT-based re-ranking method), respectively.
377:
378: \begin{figure}[htbp]
379: \begin{center}
380: \leavevmode
381: \small
382: \begin{tabular}{cl} \hline\hline
383: ID & {\hfill\centering Description\hfill} \\ \hline
384: 0032 & middleware construction in network collaboration \\
385: 0035 & digital libraries in distributed systems \\
386: 0036 & problems related to groupwares in mobile communication \\
387: 0062 & life-long education and volunteer \\
388: 0065 & image retrieval based on genetic algorithm \\
389: \hline
390: \end{tabular}
391: \caption{Example query descriptions in the NACSIS collection.}
392: \label{fig:query}
393: \end{center}
394: \end{figure}
395:
396: \medskip
397: \subsection{Evaluation of Query Translation Methods}
398: \label{subsec:eval_query_translation}
399:
The primary objective in this section is to compare the effectiveness
of the phrase-based translation method proposed by Fujii and
Ishikawa~\nocite{fujii:emnlp-vlc-99} and one based on the Transer MT
system, in terms of Japanese-English query translation. While the
former method is aimed solely at words and phrases, the MT system can
also be used for full sentences. In addition, since both methods are,
to some extent, complementary to each other, we can in theory gain a
query expansion effect by combining query terms translated by the
individual methods. In view of these factors, we compared the following
query translation methods:
410: \begin{itemize}
411: \item the use of the Transer MT system for full sentences contained in
412: the description field (``MTS''),
413: \item the use of the Transer MT system for content words and phrases
414: extracted from the description field, for which the ChaSen
415: morphological analyzer~\cite{matsumoto:chasen-97} was used
416: (``MTP''),
417: \item the phrase-based translation method applied to the same words
418: and phrases as used for the MTP method (``PBT''),
\item the use of query terms obtained with both MTP and PBT, where
terms output by both methods are considered to appear twice in the
query (``MPBT'').
422: \end{itemize}
423: Table~\ref{tab:avg_pre} shows the non-interpolated average
424: precision values, averaged over the 39 queries, for different query
425: translation methods listed above. The second column denotes the
426: average number of query terms provided with each translation method,
427: some of which were potentially discarded as stopwords by the SMART
428: system. The third column denotes average precision values for
429: different query translation methods. We will explain the fourth and
430: fifth columns in Section~\ref{subsec:eval_re-ranking}.
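
The MPBT combination listed above can be sketched in Python as below.
The sketch assumes that each method returns a list of English terms and
that a term repeated within one list keeps its count. The helper name
is hypothetical.

\begin{verbatim}
```python
from collections import Counter
from typing import List


def merge_query_terms(mtp_terms: List[str], pbt_terms: List[str]) -> Counter:
    """Pool two translated term lists into one query (hypothetical helper)."""
    # Counter addition sums per-term counts, so a term that each method
    # outputs once ends up with count 2, i.e., it "appears twice".
    return Counter(mtp_terms) + Counter(pbt_terms)
```
\end{verbatim}

Terms on which both methods agree thus receive a higher term frequency
in the query, while terms proposed by only one method are still kept.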
431:
Looking at this table, one can see that the two MT-based methods,
that is, MTS and MTP, were quite comparable in performance, and that
PBT outperformed both of them. In the case of PBT, the transliteration
successfully identified English equivalents for {\it katakana\/} words
unlisted in the word dictionary, such as ``{\it
coraboreishon\/}~(collaboration)'' and ``{\it mobairu\/}~(mobile)'',
which the MT-based methods failed to translate. Another reason was
the difference in the dictionaries used. Generally speaking, PBT
tended to output more technical words than the MT-based methods. For
example, for the Japanese phrases ``{\it fukusuu-deeta\/}'' and ``{\it
sekitsui-doubutsu\/}'', PBT output ``multiple data'' and ``craniate'',
while MTS/MTP output ``more than one data'' and ``vertebrate'',
respectively. Note that this effect was evident partially because the
NACSIS collection consists of technical documents. In addition, MPBT
further improved the performance of PBT. Although the difference
between PBT and MPBT was marginal, it is worth utilizing both the
MT-based and phrase-based methods, if available, for query
translation.
450:
451: \begin{table}[htbp]
452: \begin{center}
453: \caption{Non-interpolated average precision values,
454: averaged over the 39 queries.}
455: \medskip
456: \leavevmode
457: \small
458: \tabcolsep=3pt
459: \begin{tabular}{lrcll} \hline\hline
460: Query Translation & & &
461: \multicolumn{2}{c}{Avg. Precision with Re-ranking} \\
462: \cline{4-5}
463: Method & \# of Terms &
464: Avg. Precision & {\hfill\centering MT\hfill} & {\hfill\centering
465: HT\hfill} \\ \hline
466: ~~~~~~~~MTS & 16.6~~~~ & 0.1124 & 0.1770 (+57.5\%) & 0.2297
467: (+104.3\%) \\
468: ~~~~~~~~MTP & 8.7~~~~ & 0.1134 & 0.1746 (+54.0\%) & 0.2217
469: (+95.5\%) \\
470: ~~~~~~~~PBT & 6.1~~~~ & 0.1403 & 0.2013 (+43.5\%) & 0.2295
471: (+63.6\%) \\
472: ~~~~~~~~MPBT & 13.1~~~~ & 0.1426 & 0.1986 (+39.3\%) & 0.2356
473: (+65.2\%) \\
474: \hline
475: \end{tabular}
476: \label{tab:avg_pre}
477: \end{center}
478: \end{table}
479:
To validate the above results, we used the
non-parametric Wilcoxon matched-pairs signed-ranks test for statistical
testing (at the 5\% level), which investigates whether the difference
in average precision is meaningful or simply due to
chance~\cite{hull:sigir-93,keen:ipm-92,srinivasan:ipm-90}. We found
that differences in average precision values for the pairs ``MTP versus
MTS'', ``MPBT versus MTS'', and ``MPBT versus MTP'' were significant,
although for the other pairs, we could not obtain sufficient evidence
to conclude statistical significance. To sum up, we concluded that in
query translation, a combination of MT-based and phrase-based
translation methods was more effective than a method relying solely on
the MT system.
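
The test statistic can be sketched in Python as follows. This is an
illustration under stated assumptions rather than the procedure
actually used: zero differences are dropped, tied absolute differences
receive average ranks, and significance is judged with the normal
approximation without tie or continuity correction (at the 5\% level,
two-sided, $|z| > 1.96$). The function name is hypothetical.

\begin{verbatim}
```python
import math
from typing import List, Tuple


def wilcoxon_signed_rank(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (W, z) for paired samples, using the normal approximation."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Tied absolute differences share the average of their ranks.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    return w, (w - mean) / sd
```
\end{verbatim}

Here {\tt x} and {\tt y} would hold the per-query average precision
values of the two methods being compared, one pair per query.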
492:
493: \medskip
494: \subsection{Evaluation of the MT-based Re-ranking Method}
495: \label{subsec:eval_re-ranking}
496:
497: First, we consider Table~\ref{tab:avg_pre} again, where the fourth
498: column ``MT'' denotes the average precision values for each query
499: translation method, combined with the MT-based re-ranking
500: method. Throughout our experimentation in this paper, the best average
501: precision value by an automatic method was 0.2013 (i.e., one obtained
502: by PBT combined with the MT-based re-ranking method), which is
503: relatively high, when compared with average precision values reported
504: in the NACSIS workshop (ranging from 0.021 to 0.182).
505:
506: For each query translation method, the improvement in average
507: precision from one without the re-ranking, which is generally
508: noticeable, is indicated in parentheses. In fact, we used the
509: Wilcoxon test again, as conducted in
510: Section~\ref{subsec:eval_query_translation}, and confirmed that every
511: improvement was statistically significant. To sum up, the MT-based
512: re-ranking method we proposed was generally effective, irrespective of
513: the query translation method combined, in terms of CLIR performance.
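
For instance, the figure in parentheses for MTS in the ``MT'' column of
Table~\ref{tab:avg_pre} is the relative improvement over the average
precision without re-ranking:
\[
\frac{0.1770 - 0.1124}{0.1124} \approx 0.575 ,
\]
that is, $+57.5\%$. The other entries can be read in the same way,
e.g., $(0.2013 - 0.1403)/0.1403 \approx 0.435$ for PBT.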
514:
515: Second, we conducted an error analysis for queries for which the
516: re-ranking method degraded the average precision, and found that
517: roughly two thirds of errors were due to ambiguity in the document
translation. For example, the English word ``library'' was often
incorrectly translated into ``{\em raiburari\/}~(library as
software)'', whereas the original query intended ``{\em
toshokan\/}~(library as an institution)''.
522:
523: Third, to estimate the upper bound of the re-ranking method, as
524: denoted in the fifth column ``HT'', we used as human translations
525: Japanese documents comparable to English ones in the NACSIS
526: collection. By comparing the results of ``MT'' and ``HT'', one can see
527: that MT systems with a higher quality, if available, are expected to
528: further improve our CLIR system. In fact, when we manually corrected
529: inappropriate translations in translated documents, such as
530: ``library~({\em raiburari/toshokan\/})'' above, the average precision
531: of ``MT'' became almost equivalent to that of ``HT''.
532:
Note that when combined with the re-ranking method, differences among
query translation methods in average precision became less
pronounced. In the case of ``MT'', the Wilcoxon test showed that
536: differences in only pairs ``MPBT versus MTS'' and ``MPBT versus MTP''
537: were significant, while in the case of ``HT'', none of the differences
538: were identified as significant.
539:
540: Fourth, we investigated how the number of documents retrieved in the
541: first stage (i.e., the value of $N$ in Section~\ref{sec:system})
542: affected the performance of the re-ranking method. As discussed in
543: Section~\ref{subsec:system_overview}, in real world usage, one has to
544: consider the trade-off between the retrieval accuracy (i.e., average
545: precision in our case) and overhead required for the document
546: translation.
547:
548: Table~\ref{tab:docnum_avgpre} shows the results, where average
549: precision values in the column \mbox{``1,000''} correspond to those in
550: Table~\ref{tab:avg_pre}. By comparing average precision values for
551: each of four query translation methods (i.e., MTS, MTP, PBT and MPBT)
552: and those suffixed with ``+MT'' and ``+HT'' in
553: Table~\ref{tab:docnum_avgpre}, one can see that the re-ranking methods
were effective, irrespective of the number of documents retrieved. In
other words, we can expect to reduce the overhead in translating
documents, while still improving on the retrieval accuracy obtained
without re-ranking.
557:
558: Table~\ref{tab:xtime} shows CPU time (sec.) required for the document
559: translation and re-ranking procedures, averaged over four different
query translation methods. In the case of \mbox{$N=1,000$}, the total
CPU time was approximately three minutes, which is perhaps not
tolerable for real-time usage. However, for small values of $N$
(e.g., 50 and 100), the CPU time was below 20 seconds, which is more
practical, while the re-ranking still improved the retrieval accuracy.
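
The trade-off can be made concrete with a small Python sketch that
picks the largest measured $N$ fitting a response-time budget. The
timings are the totals from Table~\ref{tab:xtime}; the budget values
and the function name are hypothetical.

\begin{verbatim}
```python
from typing import Dict, Optional

# Total CPU seconds (translation + re-ranking) per N, from the timing table.
TOTAL_CPU_SEC: Dict[int, float] = {
    50: 9.7, 100: 18.0, 200: 33.9, 400: 66.8,
    600: 108.0, 800: 141.7, 1000: 178.1,
}


def largest_n_within(budget_sec: float,
                     timings: Dict[int, float] = TOTAL_CPU_SEC) -> Optional[int]:
    """Largest measured N whose total CPU time fits the budget, or None."""
    feasible = [n for n, sec in timings.items() if sec <= budget_sec]
    return max(feasible) if feasible else None
```
\end{verbatim}

For example, a hypothetical budget of 30 seconds would select $N=100$
on the machine used for the measurements.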
565:
566: \begin{table}[htbp]
567: \begin{center}
568: \caption{The relation between the number of documents retrieved in
569: the first stage and non-interpolated average precision
570: values, averaged over the 39 queries.}
571: \medskip
572: \leavevmode
573: \small
574: \tabcolsep=4pt
575: \begin{tabular}{lccccccc} \hline\hline
576: & \multicolumn{7}{c}{\# of Documents Retrieved ($N$)} \\
577: \cline{2-8}
578: {\hfill\centering Method\hfill} & 50 & 100 & 200 & 400 & 600 &
579: 800 & 1,000 \\ \hline
580: MTS & 0.0949 & 0.1017 & 0.1074 & 0.1101 & 0.1112 & 0.1119 &
581: 0.1124 \\
582: MTS+MT & 0.1341 & 0.1556 & 0.1673 & 0.1698 & 0.1720 & 0.1736 &
583: 0.1770 \\
584: MTS+HT & 0.1666 & 0.1901 & 0.2070 & 0.2173 & 0.2230 & 0.2259 &
585: 0.2297 \\
586: \hline
587: MTP & 0.0953 & 0.1020 & 0.1085 & 0.1113 & 0.1123 & 0.1131 &
588: 0.1134 \\
589: MTP+MT & 0.1449 & 0.1584 & 0.1692 & 0.1711 & 0.1728 & 0.1750 &
590: 0.1746 \\
591: MTP+HT & 0.1619 & 0.1819 & 0.2017 & 0.2105 & 0.2165 & 0.2203 &
592: 0.2217 \\
593: \hline
594: PBT & 0.1215 & 0.1301 & 0.1355 & 0.1385 & 0.1394 & 0.1399 &
595: 0.1403 \\
596: PBT+MT & 0.1553 & 0.1723 & 0.1866 & 0.1954 & 0.1978 & 0.2005 &
597: 0.2013 \\
598: PBT+HT & 0.1722 & 0.1915 & 0.2097 & 0.2212 & 0.2241 & 0.2279 &
599: 0.2295 \\
600: \hline
601: MPBT & 0.1229 & 0.1305 & 0.1376 & 0.1405 & 0.1416 & 0.1421 &
602: 0.1426 \\
603: MPBT+MT & 0.1690 & 0.1766 & 0.1901 & 0.1946 & 0.1958 & 0.1967 &
604: 0.1986 \\
605: MPBT+HT & 0.1814 & 0.1968 & 0.2142 & 0.2242 & 0.2301 & 0.2319 &
606: 0.2356 \\
607: \hline
608: \end{tabular}
609: \label{tab:docnum_avgpre}
610: \end{center}
611: \end{table}
612:
613: \begin{table}[htbp]
614: \begin{center}
615: \caption{CPU time for document translation and re-ranking (sec.).}
616: \medskip
617: \leavevmode
618: \small
619: \tabcolsep=4pt
620: \begin{tabular}{lrrrrrrr} \hline\hline
621: & \multicolumn{7}{c}{\# of Documents Retrieved ($N$)} \\
622: \cline{2-8}
623: & {\hfill\centering 50 \hfill} &
624: {\hfill\centering 100 \hfill} &
625: {\hfill\centering 200 \hfill} &
626: {\hfill\centering 400 \hfill} &
627: {\hfill\centering 600 \hfill} &
628: {\hfill\centering 800 \hfill} &
629: {\hfill\centering 1,000 \hfill}
630: \\ \hline
631: translation & 9.5 & 17.7 & 33.3 & 65.6 & 106.2 & 139.3 & 175.1 \\
632: re-ranking & 0.2 & 0.3 & 0.6 & 1.2 & 1.8 & 2.4 & 3.0 \\
633: total & 9.7 & 18.0 & 33.9 & 66.8 & 108.0 & 141.7 & 178.1 \\
634: \hline
    \multicolumn{8}{r}{(Pentium III 700\,MHz)}
636: \end{tabular}
637: \label{tab:xtime}
638: \end{center}
639: \end{table}
640:
641: \section{Conclusion}
642: \label{sec:conclusion}
643:
Reflecting the rapid growth in the use of machine readable texts,
cross-language information retrieval (CLIR) has been widely explored
as a means of retrieving information across languages.
648:
In brief, existing CLIR systems fall into three approaches:
(a) translating queries into the document language, (b) translating
documents into the query language, and (c) representing both queries
and documents in a language-independent space. Among these, the
second approach, based on machine translation, is effective in
terms of retrieval accuracy and user interaction. However, the
computational cost of translating large-scale document collections is
prohibitive.
657:
To resolve this problem, we proposed a two-stage CLIR method: we
first use a query translation method to retrieve a fixed number of
documents, and then apply machine translation only to those
documents, rather than to the entire collection, and re-rank them
based on the translation result.
663:
Through Japanese-English CLIR experiments using the NACSIS collection,
we showed that our two-stage method significantly improved on the
average precision values obtained with query translation methods
alone. We also showed that our method remained effective even when
the number of documents retrieved in the first stage was relatively
small (e.g., $N=50$ or $100$).
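
As an illustrative calculation from Table~\ref{tab:docnum_avgpre}
(simple arithmetic on the reported values, not a new result), the
relative gain in average precision from re-ranking with machine
translated documents at $N=50$ is
\begin{eqnarray*}
  \mbox{MTS $\rightarrow$ MTS+MT:} & &
    \frac{0.1341 - 0.0949}{0.0949} \approx 0.413, \\
  \mbox{MPBT $\rightarrow$ MPBT+MT:} & &
    \frac{0.1690 - 0.1229}{0.1229} \approx 0.375,
\end{eqnarray*}
that is, gains of roughly 41\% and 38\%, respectively, at a
second-stage cost of under ten seconds of CPU time
(Table~\ref{tab:xtime}).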
669:
670: \section*{Acknowledgments}
671:
672: The authors would like to thank NOVA, Inc. for their support with the
673: Transer MT system, and Noriko Kando (National Institute of
674: Informatics, Japan) for her support with the NACSIS collection.
675:
676:
677: \bibliographystyle{jplain}
678:
679: \begin{thebibliography}{10}
680:
681: \bibitem{ballesteros:sigir-97}
682: Lisa Ballesteros and W.~Bruce Croft.
683: \newblock Phrasal translation and query expansion techniques for cross-language
684: information retrieval.
685: \newblock In {\em Proceedings of the 20th Annual International ACM SIGIR
686: Conference on Research and Development in Information Retrieval}, pp. 84--91,
687: 1997.
688:
689: \bibitem{ballesteros:sigir-98}
690: Lisa Ballesteros and W.~Bruce Croft.
691: \newblock Resolving ambiguity for cross-language retrieval.
692: \newblock In {\em Proceedings of the 21st Annual International ACM SIGIR
693: Conference on Research and Development in Information Retrieval}, pp. 64--71,
694: 1998.
695:
696: \bibitem{carbonell:ijcai-97}
697: Jaime~G. Carbonell, Yiming Yang, Robert~E. Frederking, Ralf~D. Brown, Yibing
698: Geng, and Danny Lee.
699: \newblock Translingual information retrieval: A comparative evaluation.
700: \newblock In {\em Proceedings of the 15th International Joint Conference on
701: Artificial Intelligence}, pp. 708--714, 1997.
702:
703: \bibitem{davis:sigir-97}
704: Mark~W. Davis and William~C. Ogden.
705: \newblock {QUILT}: Implementing a large-scale cross-language text retrieval
706: system.
707: \newblock In {\em Proceedings of the 20th Annual International ACM SIGIR
708: Conference on Research and Development in Information Retrieval}, pp. 92--98,
709: 1997.
710:
711: \bibitem{fujii:emnlp-vlc-99}
712: Atsushi Fujii and Tetsuya Ishikawa.
713: \newblock Cross-language information retrieval for technical documents.
714: \newblock In {\em Proceedings of the Joint ACL SIGDAT Conference on Empirical
715: Methods in Natural Language Processing and Very Large Corpora}, pp. 29--37,
716: 1999.
717:
718: \bibitem{gonzalo:chum-98}
719: Julio Gonzalo, Felisa Verdejo, Carol Peters, and Nicoletta Calzolari.
720: \newblock Applying {EuroWordNet} to cross-language text retrieval.
721: \newblock {\em Computers and the Humanities}, Vol.~32, pp. 185--207, 1998.
722:
723: \bibitem{hull:sigir-93}
724: David Hull.
725: \newblock Using statistical testing in the evaluation of retrieval experiments.
726: \newblock In {\em Proceedings of the 16th Annual International ACM SIGIR
727: Conference on Research and Development in Information Retrieval}, pp.
728: 329--338, 1993.
729:
730: \bibitem{kando:sigir-99}
731: Noriko Kando, Kazuko Kuriyama, and Toshihiko Nozue.
732: \newblock {NACSIS} test collection workshop ({NTCIR-1}).
733: \newblock In {\em Proceedings of the 22nd Annual International ACM SIGIR
734: Conference on Research and Development in Information Retrieval}, pp.
735: 299--300, 1999.
736:
737: \bibitem{keen:ipm-92}
738: E.~Michael Keen.
739: \newblock Presenting results of experimental retrieval comparisons.
740: \newblock {\em Information Processing \& Management}, Vol.~28, No.~4, pp.
741: 491--502, 1992.
742:
743: \bibitem{kwok:sigir-98}
K.~L. Kwok and M.~Chan.
745: \newblock Improving two-stage ad-hoc retrieval for short queries.
746: \newblock In {\em Proceedings of the 21st Annual International ACM SIGIR
747: Conference on Research and Development in Information Retrieval}, pp.
748: 250--256, 1998.
749:
750: \bibitem{littman:clir-98}
751: Michael~L. Littman, Susan~T. Dumais, and Thomas~K. Landauer.
752: \newblock Automatic cross-language information retrieval using latent semantic
753: indexing.
754: \newblock In Gregory Grefenstette, editor, {\em Cross-Language Information
755: Retrieval}, chapter~5, pp. 51--62. Kluwer Academic Publishers, 1998.
756:
757: \bibitem{matsumoto:chasen-97}
758: Yuji Matsumoto, Akira Kitauchi, Tatsuo Yamashita, Osamu Imaichi, and Tomoaki
759: Imamura.
760: \newblock {Japanese} morphological analysis system {ChaSen} manual.
761: \newblock Technical Report NAIST-IS-TR97007, NAIST, 1997.
762: \newblock (In Japanese).
763:
764: \bibitem{mccarley:acl-99}
765: J.~Scott McCarley.
766: \newblock Should we translate the documents or the queries in cross-language
767: information retrieval?
768: \newblock In {\em Proceedings of the 37th Annual Meeting of the Association for
769: Computational Linguistics}, pp. 208--214, 1999.
770:
771: \bibitem{mccarley:amta-98}
772: J.~Scott McCarley and Salim Roukos.
773: \newblock Fast document translation for cross-language information retrieval.
774: \newblock In {\em Proceedings of the 3rd Conference of the Association for
775: Machine Translation in the Americas}, pp. 150--157, 1998.
776:
777: \bibitem{ntcir-99}
778: {National Center for Science Information Systems}.
779: \newblock {\em Proceedings of the 1st NTCIR Workshop on Research in Japanese
780: Text Retrieval and Term Recognition}, 1999.
781:
782: \bibitem{nie:sigir-99}
783: Jian-Yun Nie, Michel Simard, Pierre Isabelle, and Richard Durand.
784: \newblock Cross-language information retrieval based on parallel texts and
785: automatic mining of parallel texts from the {Web}.
786: \newblock In {\em Proceedings of the 22nd Annual International ACM SIGIR
787: Conference on Research and Development in Information Retrieval}, pp. 74--81,
788: 1999.
789:
790: \bibitem{oard:amta-98}
791: Douglas~W. Oard.
792: \newblock A comparative study of query and document translation for
793: cross-language information retrieval.
794: \newblock In {\em Proceedings of the 3rd Conference of the Association for
795: Machine Translation in the Americas}, pp. 472--483, 1998.
796:
797: \bibitem{salton:jasis-70}
798: Gerard Salton.
799: \newblock Automatic processing of foreign language documents.
800: \newblock {\em Journal of the American Society for Information Science},
801: Vol.~21, No.~3, pp. 187--194, 1970.
802:
803: \bibitem{salton:71}
804: Gerard Salton.
805: \newblock {\em The {SMART} Retrieval System: Experiments in Automatic Document
806: Processing}.
807: \newblock Prentice-Hall, 1971.
808:
809: \bibitem{salton:ipm-88}
810: Gerard Salton and Christopher Buckley.
811: \newblock Term-weighting approaches in automatic text retrieval.
812: \newblock {\em Information Processing \& Management}, Vol.~24, No.~5, pp.
813: 513--523, 1988.
814:
815: \bibitem{srinivasan:ipm-90}
816: Padmini Srinivasan.
\newblock A comparison of two-{Poisson}, inverse document frequency and
818: discrimination value models of document representation.
819: \newblock {\em Information Processing \& Management}, Vol.~26, No.~2, pp.
820: 269--278, 1990.
821:
822: \bibitem{voorhees:sigir-98}
823: Ellen~M. Voorhees.
824: \newblock Variations in relevance judgments and the measurement of retrieval
825: effectiveness.
826: \newblock In {\em Proceedings of the 21st Annual International ACM SIGIR
827: Conference on Research and Development in Information Retrieval}, pp.
828: 315--323, 1998.
829:
830: \bibitem{zobel:sigir-forum-98}
831: Justin Zobel and Alistair Moffat.
832: \newblock Exploring the similarity space.
\newblock {\em ACM SIGIR Forum}, Vol.~32, No.~1, pp. 18--34, 1998.
834:
835: \end{thebibliography}
836:
837: \end{document}
838: