1: \documentstyle[lrec2000]{article}
2:
3: \title{PRIME: A System for Multi-lingual Patent Retrieval}
4:
5: \name{Shigeto Higuchi$^{\dagger}$, Masatoshi Fukui$^{\dagger}$, %\\
6: \large\bf Atsushi Fujii$^{\dagger\dagger,\dagger\dagger\dagger}$, and Tetsuya
7: Ishikawa$^{\dagger\dagger}$}
8:
9: \address{$^{\dagger}$PATOLIS Corporation \\
10: 2-4-29 Shiohama Koto-ku, 135-0043, Japan \\
11: $^{\dagger\dagger}$University of Library and Information Science \\
12: 1-2 Kasuga Tsukuba, 305-8550, Japan \\
13: $^{\dagger\dagger\dagger}$CREST, Japan Science and Technology Corporation \\
14: fujii@ulis.ac.jp}
15:
16: \abstract{Given the growing number of patents filed in multiple
17: countries, users are interested in retrieving patents across
18: languages. We propose a multi-lingual patent retrieval system, which
19: translates a user query into the target language, searches a
20: multilingual database for patents relevant to the query, and improves
21: the browsing efficiency by way of machine translation and
22: clustering. Our system also extracts new translations from patent
23: families consisting of comparable patents, to enhance the translation
24: dictionary.}
25:
26: \keywords{multi-lingual patent retrieval, machine translation,
27: document clustering, translation extraction, patent families}
28:
29: \newcommand{\etal}{et~al.}
30: \newcommand{\etaleos}{et~al}
31: \newcommand{\eq}[1]{(\ref{#1})}
32:
33: \input{psfig.tex}
34:
35: \begin{document}
36:
37: \maketitleabstract
38:
39: \section{Introduction}
40: \label{sec:introduction}
41:
Given the growing number of patents filed in multiple countries, users
are likely to be interested in retrieving patent information
across languages. However, many users find it difficult to perform
45: patent retrieval (i.e., formulating queries, searching databases for
46: relevant patents, and browsing retrieved patents) in foreign
47: languages.
48:
49: To counter this problem, cross-language information retrieval (CLIR),
50: where queries in one language are submitted to retrieve documents in
51: another language, can be an effective solution. CLIR has of late
52: become one of the major topics within the information retrieval and
53: natural language processing communities. In fact, a number of
54: methods/systems for CLIR have been proposed.
55:
56: Since by definition queries and documents are in different languages,
57: queries and documents need to be standardized into a common
58: representation, so that monolingual retrieval techniques can be
59: applied. From this point of view, existing CLIR methods are classified
60: into the following three fundamental categories.
61:
62: The first method translates queries into the document
63: language~\cite{ballesteros:sigir-98,fujii:chum-x,nie:sigir-99}, and
64: the second method translates documents into the query
language~\cite{mccarley:acl-99,oard:amta-98}. The third method
projects both queries and documents into a language-independent space
by way of thesaurus classes~\cite{gonzalo:chum-98,salton:jasis-70} or
latent semantic indexing~\cite{carbonell:ijcai-97,littman:clir-98}.
69:
Among the above methods, the first one (i.e., the query translation
method) is preferable in terms of implementation cost, because this
approach can simply be combined with existing monolingual retrieval
systems.
74:
75: Following a query translation
76: method~\cite{fujii:emnlp-vlc-99,fujii:chum-x}, we previously proposed
77: a Japanese/English cross-language patent retrieval
78: system~\cite{fukui:sigir-ws-pr-2000}, where users submit queries in
79: either Japanese or English to retrieve patents in the other language.
80: In either case, the target database is monolingual.
81:
82: However, since users are not always sure as to which language database
83: contains patents relevant to their information need, it is effective
84: to retrieve patents in multiple languages {\em simultaneously\/}.
85: This process, which we shall call ``multi-lingual information
retrieval (MLIR)'', is an extension of CLIR. In this paper, we propose
a Japanese/English multi-lingual patent retrieval system called
``PRIME'' (Patent Retrieval In Multi-lingual Environment).
89:
90: The design of our system is based on that for technical
91: documents~\cite{fujii:ntcir-2-2001}, which combines query translation,
92: document retrieval, document translation and clustering modules
93: (Section~\ref{sec:system}).
94:
95: Additionally, in this paper we newly introduce a module for enhancing
96: a dictionary used for the query translation module. For this purpose,
97: we propose a method to extract Japanese/English translations from
98: patent families consisting of comparable patents filed in Japan and
99: the United States (Section~\ref{sec:extraction}).
100:
101: \section{System Description}
102: \label{sec:system}
103:
104: \subsection{Overview}
105: \label{subsec:system_overview}
106:
107: Figure~\ref{fig:system} depicts the overall design of PRIME, which
108: retrieves documents in response to user queries in either Japanese or
English. However, unlike the case of CLIR, retrieved documents can be
in Japanese, in English, or in a combination of both. We briefly
explain the entire on-line process based on this figure.
113:
114: First, a user query is translated into the foreign language (i.e.,
115: either Japanese or English) by way of a query translation module.
116:
117: Second, a document retrieval module uses both the source (user) and
118: translated queries to search a Japanese/English bilingual patent
119: collection for relevant documents.
120:
121: In real world usage, Japanese and English patents are not comparable
122: in the collection (this is the major reason why cross/multi-lingual
123: retrieval is needed). However, for the purpose of research and
124: development, we currently target a comparable collection.
125:
More precisely, the collection contains approximately
1,750,000 pairs of Japanese abstracts and their English translations,
which were provided on PAJ (Patent Abstracts of Japan) CD-ROMs in
1995--1999\footnote{Copyright by Japan Patent Office.}.
130:
131: Third, among retrieved documents, only those that are in the foreign
132: language are translated into the user language through a document
133: translation module.
134:
In principle, only the above three modules are needed to realize
multi-lingual patent retrieval, in the sense that users can
retrieve/browse foreign documents through their native
language. However, to improve the browsing efficiency, a clustering
module finally divides retrieved documents into a specific number of
groups.
141:
142: Additionally, in the off-line process, a translation extraction module
143: identifies Japanese/English translations in the database, to enhance
144: the query translation module.
145:
146: \begin{figure}[htbp]
147: \begin{center}
148: \leavevmode
149: \psfig{file=system.eps,height=3.3in}
150: \end{center}
151: \caption{The design of PRIME: our multi-lingual patent retrieval system
152: (dashed arrows denote the off-line process).}
153: \label{fig:system}
154: \end{figure}
155:
156: \subsection{Query Translation}
157: \label{subsec:query_translation}
158:
159: The query translation module is based on the method proposed by Fujii
160: and Ishikawa~\shortcite{fujii:emnlp-vlc-99,fujii:chum-x}, which has
161: been applied to Japanese/English CLIR for the NTCIR collection
162: consisting of technical abstracts~\cite{kando:sigir-99}.
163:
164: This method translates words and phrases (compound words) in a given
165: query, maintaining the word order in the source language. A
166: preliminary study showed that approximately 95\% of compound technical
167: terms defined in a bilingual dictionary~\cite{ferber:89} maintain the
168: same word order in both Japanese and English.
169:
170: Then, the Nova dictionary\footnote{Developed by NOVA,
171: Inc. http://www.nova.co.jp/} is used to derive possible word/phrase
172: translations, and a probabilistic method is used to resolve
173: translation ambiguity.
174:
175: The Nova dictionary includes approximately one million
176: Japanese-English translations related to 19 technical fields as listed
177: below:
178: \begin{quote}
179: aeronautics, biotechnology, business, chemistry, computers,
180: construction, defense, ecology, electricity, energy, finance, law,
181: mathematics, mechanics, medicine, metals, oceanography, plants,
182: trade.
183: \end{quote}
184:
185: In addition, for words unlisted in the Nova dictionary,
186: transliteration is performed to identify phonetic equivalents in the
187: target language. Since Japanese often represents loanwords (i.e.,
188: technical terms and proper nouns imported from foreign languages)
189: using its special phonetic alphabet (or phonogram) called ``{\it
190: katakana}'', with which new words can be spelled out, transliteration
is effective in improving the translation quality.
192:
We represent the user query and one translation candidate in the
document language by $U$ and $D$, respectively. From the viewpoint of
probability theory, our task here is to select the candidates $D$ with
the greatest probability $P(D|U)$, which can be transformed as in
Equation~\eq{eq:query_translation} through Bayes' theorem.
198: \begin{equation}
199: \label{eq:query_translation}
200: P(D|U) = \frac{\textstyle P(U|D)\cdot P(D)}{\textstyle P(U)}
201: \end{equation}
202: In practice, $P(U)$ can be omitted because this factor is a constant
203: with respect to the given query, and thus does not affect the relative
204: probability for different translation candidates.
205:
$P(D)$ is estimated by a word-based bi-gram language model produced
from the target collection. $P(U|D)$ is estimated based on the word
frequency obtained from the Nova dictionary. These two factors are
commonly termed the language and translation models, respectively (see
Figure~\ref{fig:system}).
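
To illustrate how Equation~\eq{eq:query_translation} ranks candidates,
the following minimal Python sketch scores every combination of word
translations, kept in the source word order, by
$\log P(U|D) + \log P(D)$. It is not the PRIME implementation: the
function name, the romanized toy entries, all probability values, the
start symbol \verb+<s>+ and the floor probability for unseen bi-grams
are assumptions made only for this example.

```python
import math
from itertools import product


def rank_translations(query_words, translation_model, bigram_model, floor=1e-6):
    """Rank candidate translations D of a query U by log P(U|D) + log P(D)."""
    # One list of (target word, P(source word | target word)) per source word,
    # kept in the source word order.
    options = [list(translation_model[w].items()) for w in query_words]
    scored = []
    for combo in product(*options):
        target = [t for t, _ in combo]
        # Translation model: sum of per-word log-probabilities.
        log_p = sum(math.log(p) for _, p in combo)
        # Language model: word-based bi-gram probabilities, floored if unseen.
        prev = "<s>"
        for t in target:
            log_p += math.log(bigram_model.get((prev, t), floor))
            prev = t
        scored.append((log_p, target))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Hypothetical toy models (all values invented for illustration).
translation_model = {
    "joho": {"information": 0.9, "intelligence": 0.1},
    "kensaku": {"retrieval": 0.5, "search": 0.5},
}
bigram_model = {
    ("<s>", "information"): 0.02,
    ("<s>", "intelligence"): 0.01,
    ("information", "retrieval"): 0.3,
    ("information", "search"): 0.05,
}

ranked = rank_translations(["joho", "kensaku"], translation_model, bigram_model)
best_translation = ranked[0][1]
```

With these toy values, the translation model alone cannot separate
``retrieval'' from ``search''; the bi-gram factor is what places
``information retrieval'' first.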
211:
212: \subsection{Document Retrieval}
213: \label{subsec:retrieval}
214:
215: The retrieval module is based on an existing probabilistic retrieval
216: method~\cite{robertson:sigir-94}, which computes the relevance score
217: between the translated query and each document in the collection. The
218: relevance score for document $i$ is computed based on
219: Equation~\eq{eq:okapi}.
220: \begin{equation}
221: \label{eq:okapi}
222: \sum_{t} \left(\frac{\textstyle TF_{t,i}}{\textstyle
223: \frac{\textstyle DL_{i}}{\textstyle avglen} +
224: TF_{t,i}}\cdot\log\frac{\textstyle N}{\textstyle DF_{t}}\right)
225: \end{equation}
Here, $TF_{t,i}$ denotes the number of times term $t$ appears in
document $i$. $DF_{t}$ and $N$ denote the number of documents
containing term $t$ and the total number of documents in the
collection, respectively. $DL_{i}$ denotes the length of document $i$
(i.e., the number of characters contained in $i$), and $avglen$
denotes the average length of documents in the collection.
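
As an illustration of Equation~\eq{eq:okapi}, the following Python
sketch computes the score of a single document. It assumes that the
sum runs over the query terms and that a term absent from the document
contributes nothing; the function name, the toy document and the
frequency values are hypothetical.

```python
import math
from collections import Counter


def relevance_score(query_terms, doc_terms, doc_length, avglen, df, n_docs):
    """Score one document: sum of TF / (DL / avglen + TF) * log(N / DF)."""
    tf = Counter(doc_terms)
    score = 0.0
    for t in query_terms:
        # Terms missing from the document or the collection add nothing.
        if tf[t] == 0 or df.get(t, 0) == 0:
            continue
        tf_part = tf[t] / (doc_length / avglen + tf[t])
        idf_part = math.log(n_docs / df[t])
        score += tf_part * idf_part
    return score


# Toy example: a document of average length (values are invented).
doc_terms = ["patent", "retrieval", "patent"]
df = {"patent": 10, "retrieval": 50}
score = relevance_score(
    ["patent", "retrieval", "clustering"],
    doc_terms,
    doc_length=120,
    avglen=120.0,
    df=df,
    n_docs=100,
)
```

In this example the rarer term ``patent'' dominates the score, both
because it occurs twice in the document and because it appears in
fewer documents of the collection.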
232:
233: For both Japanese and English collections, we use content words
234: extracted from documents as terms, and perform a word-based indexing.
For the Japanese collection, we use the ChaSen morphological
analyzer~\cite{matsumoto:chasen-99} to extract content words. For the
English collection, we extract content words based on
parts-of-speech as defined in WordNet~\cite{fellbaum:wordnet-98}.
239:
240: \subsection{Document Translation}
241: \label{subsec:document_translation}
242:
The document translation module consists of the Transer
Japanese/English MT system, which uses the same dictionary used for
the query translation module.
246:
In practice, since machine translation is computationally expensive
and slows down the response, we perform machine translation on a
phrase-by-phrase basis. In brief, phrases are sequences of content
words in documents, which we generate using rules based on
part-of-speech information. This method is practical
because even a word/phrase-based translation can potentially improve
the efficiency with which users find relevant foreign documents in
the whole retrieval result~\cite{oard:ipm-99}.
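
The phrase generation rules themselves are not listed here. As a rough
illustration of treating phrases as sequences of content words, the
following Python sketch groups maximal runs of consecutive content
words in a part-of-speech tagged sentence. The tag names, the set of
content tags and the function name are hypothetical and do not
reproduce the actual rules.

```python
from itertools import groupby

# Assumed set of part-of-speech tags that mark content words.
CONTENT_TAGS = {"NOUN", "ADJ", "VERB"}


def content_phrases(tagged_words):
    """Return maximal runs of consecutive content words as phrases."""
    phrases = []
    runs = groupby(tagged_words, key=lambda pair: pair[1] in CONTENT_TAGS)
    for is_content, run in runs:
        if is_content:
            phrases.append(" ".join(word for word, _ in run))
    return phrases


tagged = [
    ("a", "DET"),
    ("multi-lingual", "ADJ"),
    ("patent", "NOUN"),
    ("retrieval", "NOUN"),
    ("system", "NOUN"),
    ("for", "ADP"),
    ("users", "NOUN"),
]
phrases = content_phrases(tagged)
```

Only the resulting phrases would then be passed to the translation
step, which keeps the amount of text to be translated small.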
255:
256: \subsection{Clustering}
257: \label{subsec:clustering}
258:
259: For the purpose of clustering retrieved documents, we use the
260: Hierarchical Bayesian Clustering (HBC) method~\cite{iwayama:ijcai-95},
261: which merges similar items (i.e., documents in our case) in a
262: bottom-up manner, until all the items are merged into a single
263: cluster. Thus, a specific number of clusters can be obtained by
264: splitting the resultant hierarchy at a predetermined level.
265:
266: The HBC method also determines the most representative item (centroid)
267: for each cluster. Thus, we can enhance the browsing efficiency by
268: presenting only those centroids to users.
269:
270: The similarity between documents is computed based on feature vectors
271: that characterize each document. In our case, vectors for each
272: document consist of frequencies of content words appearing in the
273: document. We extract content words from documents as performed in
274: word-based indexing (see Section~\ref{subsec:retrieval}).
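
The bottom-up merging over word-frequency vectors can be sketched as
follows. This is not HBC: the Bayesian merging criterion is replaced
here by cosine similarity between summed frequency vectors, purely as
a stand-in, and merging simply stops once the requested number of
clusters is reached, which corresponds to splitting the hierarchy at
that level. All names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, k):
    """Merge the most similar pair of clusters until k clusters remain."""
    # Each cluster is (member indices, summed word-frequency vector).
    clusters = [([i], Counter(words)) for i, words in enumerate(docs)]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i][1], clusters[j][1])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] + clusters[j][1])
        rest = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters = rest + [merged]
    return sorted(sorted(members) for members, _ in clusters)


docs = [
    ["laser", "printer", "toner"],
    ["printer", "toner", "drum"],
    ["engine", "fuel", "valve"],
    ["fuel", "valve", "pump"],
]
groups = cluster(docs, 2)
```

For these four toy documents, the two printer-related documents and
the two engine-related documents end up in separate clusters.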
275:
Given the clustering module, the system can facilitate interactive
retrieval. More precisely, through the interface, users can
discard clusters that they judge irrelevant by browsing representative
documents, and re-cluster the remaining documents. By performing this
process recursively, only relevant documents eventually remain.
281:
282: \section{Extracting Translations Using Patent Families}
283: \label{sec:extraction}
284:
285: \subsection{Overview}
286: \label{subsec:extraction_overview}
287:
288: Since patents are usually associated with new words, it is crucial to
289: translate out-of-dictionary words. The transliteration method used in
290: the query translation module is one solution for this problem (see
291: Section~\ref{subsec:query_translation}).
292:
293: On the other hand, it is also effective to update the translation
294: dictionary. For this purpose, a number of methods to extract
295: translations from bilingual (parallel/comparable)
296: corpora~\cite{smadja:cl-96,yamamoto:coling-2000} are
applicable. However, it is expensive to obtain bilingual
corpora with a sufficient volume of alignment information.
299:
To resolve this problem, we use patent families, which are patent sets
filed for the same/related contents in multiple countries, as
comparable corpora. Patents contained in the same family are not
necessarily parallel, but are closely comparable.
304:
305: Among a number of ways to apply for patents in multiple countries, we
306: focus solely on patents claiming priority under the Paris Convention,
307: because we can easily identify patent families by the identification
308: number assigned to each patent.
309:
310: In addition, the number of patent families is still increasing. Thus,
311: we can easily update a large-scale bilingual comparable corpus based
312: on patent families. To the best of our knowledge no research has
313: utilized patent families for extracting translations.
314:
315: \subsection{Methodology}
316: \label{subsec:extraction_method}
317:
318: Since patents are structured with a number of fields (e.g., titles,
319: abstracts, and claims), our method first identifies corresponding
320: fragments based on the document structure, to improve the extraction
321: accuracy.
322:
323: However, structures of paired patents are not always the same. For
324: example, the number of fields claimed in a single patent family often
varies depending on the language. Thus, we use only the title and
abstract fields, which are usually parallel in Japanese and English
patents. In other words, unlike most existing extraction
methods, our method does not need sentence-aligned corpora.
329:
330: We use the ChaSen morphological analyzer~\cite{matsumoto:chasen-99}
331: and Brill tagger~\cite{brill:cl-95} to extract content words from
332: Japanese and English fragments, respectively. In addition, we combine
333: more than one word into phrases, for which we developed rules to
334: generate phrases based on the part-of-speech information.
335:
336: We then compute the association score for all the possible
337: combinations of Japanese/English phrases co-occurring in the same
338: fragment, and select those with greater score as the final
339: translations. For this purpose, we use the weighted Dice
340: coefficient~\cite{yamamoto:coling-2000} as shown in
341: Equation~\eq{eq:wdice}.
342: \begin{equation}
343: \label{eq:wdice}
344: score(W_{j}, W_{e}) = \log F_{je}\cdot\frac{\textstyle 2
345: F_{je}}{\textstyle F_{j} + F_{e}}
346: \end{equation}
Here, $W_{j}$ and $W_{e}$ are Japanese and English phrases,
respectively. $F_{j}$ and $F_{e}$ denote the number of times $W_{j}$
and $W_{e}$ appear in the entire corpus, respectively. $F_{je}$
denotes the number of times $W_{j}$ and $W_{e}$ co-occur in the same
fragment. The logarithm factor is effective in discarding infrequent
co-occurrences, which usually decrease the extraction accuracy.
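
Equation~\eq{eq:wdice} can be illustrated with the following Python
sketch, which counts phrase frequencies over aligned fragment pairs
and keeps the pairs whose score exceeds a threshold. Several points
are assumptions made for the example only: each phrase is counted once
per fragment, the natural logarithm is used because the base is not
stated above, and the function name and romanized toy phrases are
hypothetical.

```python
import math
from collections import Counter
from itertools import product


def extract_translations(fragment_pairs, threshold, base=math.e):
    """Score co-occurring phrase pairs with the weighted Dice coefficient."""
    f_j, f_e, f_je = Counter(), Counter(), Counter()
    for ja_phrases, en_phrases in fragment_pairs:
        # Count each phrase once per fragment (an assumption).
        ja, en = set(ja_phrases), set(en_phrases)
        f_j.update(ja)
        f_e.update(en)
        f_je.update(product(ja, en))
    scored = {}
    for (wj, we), co in f_je.items():
        score = math.log(co, base) * 2 * co / (f_j[wj] + f_e[we])
        if score > threshold:
            scored[(wj, we)] = score
    return scored


# Toy corpus of (Japanese phrases, English phrases) per fragment pair.
fragment_pairs = [
    (["kensaku", "tokkyo"], ["retrieval", "patent"]),
    (["kensaku", "jisho"], ["retrieval", "dictionary"]),
    (["kensaku"], ["retrieval"]),
]
translations = extract_translations(fragment_pairs, threshold=1.0)
```

In this toy corpus every pair that co-occurs only once receives a
score of zero, since $\log 1 = 0$, so only the pair co-occurring three
times survives the threshold.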
353:
354: \subsection{Experimentation}
355: \label{subsec:extraction_experimentation}
356:
A preliminary study showed that out of approximately 1,750,000 patents
filed in Japan (1995--1999), approximately 32,000 patents were paired
with those filed in the United States as patent families. Thus, in
practice we obtained a bilingual comparable corpus consisting of
32,000 Japanese/English pairs. From this corpus, our method extracted
1,234,347 phrase-based translations, which were to be judged as either
correct or incorrect.

However, we selected only those translations whose association score
was above 1.5, and manually judged their correctness, because a)
judging all of the extracted translations would be considerably
expensive, and b) translations with small association scores are
usually incorrect. The total number of selected translations was
37,669.
370:
We then evaluated the accuracy of our extraction method. The accuracy
is the ratio of the number of correct translations to the number of
translations whose association score is above a specific threshold. As
the threshold was raised, the accuracy increased while the number of
extracted translations decreased, as shown in
Table~\ref{tab:extraction}. According to this table, the accuracy rose
from 15.6\% at a threshold of 1.5 to 67.4\% at a threshold of 5.0,
while the number of extracted translations fell from 37,669 to 356.
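
For instance, using the counts in Table~\ref{tab:extraction}, the
accuracy for a threshold of 3.0 is obtained as
\begin{displaymath}
\frac{1{,}399}{4{,}419} \approx 0.317,
\end{displaymath}
that is, 31.7\%, and likewise $564 / 962 \approx 0.586$ for a
threshold of 4.0.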
379:
We spent only four man-days in judging the 37,669 translations and
identifying 5,879 correct translations. In other words, our method
made it possible to produce bilingual lexicons semi-automatically at a
small cost.
384:
385: \begin{table}[htbp]
386: \begin{center}
387: \caption{Accuracy for translation extraction.}
388: \medskip
389: \leavevmode
390: \small
391: \tabcolsep=2pt
392: \begin{tabular}{lrrrrr} \hline\hline
393: Threshold for Score & 1.5 & 2.0 & 3.0 & 4.0 & 5.0 \\ \hline
394: \# of Translations & 37,669 & 24,869 & 4,419 & 962 & 356 \\
395: \# of Correct Translations & 5,879 & 4,129 & 1,399 & 564 & 240 \\
396: Accuracy (\%) & 15.6 & 16.6 & 31.7 & 58.6 & 67.4 \\
397: \hline
398: \end{tabular}
399: \label{tab:extraction}
400: \end{center}
401: \end{table}
402:
403: \section{Conclusion}
404: \label{sec:conclusion}
405:
406: In this paper, we proposed a multi-lingual system for Japanese/English
407: patent retrieval. For this purpose, we used a query translation method
408: explored in cross-language information retrieval (CLIR).
409:
410: However, unlike the case of CLIR, our system retrieves bilingual
411: patents simultaneously in response to a monolingual query. Our system
412: also summarizes retrieved patents by way of machine translation and
413: clustering to improve the browsing efficiency.
414:
415: In addition, our system includes an extraction module which produces
416: new translations from patent families consisting of comparable
417: patents, and updates the translation dictionary.
418:
Future work will include improving the existing modules in our system
and applying our framework to other languages.
421:
422: \section*{Acknowledgments}
423:
424: The authors would like to thank NOVA, Inc. for their support with the
425: Nova dictionary and Transer system, and Makoto Iwayama for his support
426: with the HBC software.
427:
428: \small
429: \bibliographystyle{acl}
430:
431: \begin{thebibliography}{}
432:
433: \bibitem[\protect\citename{Ballesteros and Croft}1998]{ballesteros:sigir-98}
434: Lisa Ballesteros and W.~Bruce Croft.
435: \newblock 1998.
436: \newblock Resolving ambiguity for cross-language retrieval.
437: \newblock In {\em Proceedings of the 21st Annual International ACM SIGIR
438: Conference on Research and Development in Information Retrieval}, pages
439: 64--71.
440:
441: \bibitem[\protect\citename{Brill}1995]{brill:cl-95}
442: Eric Brill.
443: \newblock 1995.
444: \newblock Transformation-based error-driven learning and natural language
445: processing: A case study in part-of-speech tagging.
446: \newblock {\em Computational Linguistics}, 21(4):543--565.
447:
448: \bibitem[\protect\citename{Carbonell \bgroup et al.\egroup
449: }1997]{carbonell:ijcai-97}
450: Jaime~G. Carbonell, Yiming Yang, Robert~E. Frederking, Ralf~D. Brown, Yibing
451: Geng, and Danny Lee.
452: \newblock 1997.
453: \newblock Translingual information retrieval: A comparative evaluation.
454: \newblock In {\em Proceedings of the 15th International Joint Conference on
455: Artificial Intelligence}, pages 708--714.
456:
457: \bibitem[\protect\citename{Fellbaum}1998]{fellbaum:wordnet-98}
458: Christiane Fellbaum, editor.
459: \newblock 1998.
460: \newblock {\em {WordNet}: An Electronic Lexical Database}.
461: \newblock MIT Press.
462:
463: \bibitem[\protect\citename{Ferber}1989]{ferber:89}
464: Gene Ferber.
465: \newblock 1989.
466: \newblock {\em {English-Japanese}, {Japanese-English} Dictionary of Computer
467: and Data-Processing Terms}.
468: \newblock MIT Press.
469:
470: \bibitem[\protect\citename{Fujii and Ishikawa}1999]{fujii:emnlp-vlc-99}
471: Atsushi Fujii and Tetsuya Ishikawa.
472: \newblock 1999.
473: \newblock Cross-language information retrieval for technical documents.
474: \newblock In {\em Proceedings of the Joint ACL SIGDAT Conference on Empirical
475: Methods in Natural Language Processing and Very Large Corpora}, pages 29--37.
476:
477: \bibitem[\protect\citename{Fujii and Ishikawa}2001]{fujii:ntcir-2-2001}
478: Atsushi Fujii and Tetsuya Ishikawa.
479: \newblock 2001.
480: \newblock Evaluating multi-lingual information retrieval and clustering at
481: {ULIS}.
482: \newblock In {\em Proceedings of the 2nd NTCIR Workshop Meeting on Evaluation
483: of Chinese \& Japanese Text Retrieval and Text Summarization}.
484:
485: \bibitem[\protect\citename{Fujii and Ishikawa}To appear]{fujii:chum-x}
486: Atsushi Fujii and Tetsuya Ishikawa.
487: \newblock (To appear).
488: \newblock {Japanese/English} cross-language information retrieval: Exploration
489: of query translation and transliteration.
490: \newblock {\em Computers and the Humanities}.
491:
492: \bibitem[\protect\citename{Fukui \bgroup et al.\egroup
493: }2000]{fukui:sigir-ws-pr-2000}
494: Masatoshi Fukui, Shigeto Higuchi, Youichi Nakatani, Masao Tanaka, Atsushi
495: Fujii, and Tetsuya Ishikawa.
496: \newblock 2000.
497: \newblock Applying a hybrid query translation method to {Japanese/English}
498: cross-language patent retrieval.
499: \newblock In {\em ACM SIGIR Workshop on Patent Retrieval}.
500:
501: \bibitem[\protect\citename{Gonzalo \bgroup et al.\egroup
502: }1998]{gonzalo:chum-98}
503: Julio Gonzalo, Felisa Verdejo, Carol Peters, and Nicoletta Calzolari.
504: \newblock 1998.
505: \newblock Applying {EuroWordNet} to cross-language text retrieval.
506: \newblock {\em Computers and the Humanities}, 32:185--207.
507:
508: \bibitem[\protect\citename{Iwayama and Tokunaga}1995]{iwayama:ijcai-95}
509: Makoto Iwayama and Takenobu Tokunaga.
510: \newblock 1995.
511: \newblock Hierarchical {Bayesian} clustering for automatic text classification.
512: \newblock In {\em Proceedings of the 14th International Joint Conference on
513: Artificial Intelligence}, pages 1322--1327.
514:
515: \bibitem[\protect\citename{Kando \bgroup et al.\egroup }1999]{kando:sigir-99}
516: Noriko Kando, Kazuko Kuriyama, and Toshihiko Nozue.
517: \newblock 1999.
518: \newblock {NACSIS} test collection workshop ({NTCIR-1}).
519: \newblock In {\em Proceedings of the 22nd Annual International ACM SIGIR
520: Conference on Research and Development in Information Retrieval}, pages
521: 299--300.
522:
523: \bibitem[\protect\citename{Littman \bgroup et al.\egroup
524: }1998]{littman:clir-98}
525: Michael~L. Littman, Susan~T. Dumais, and Thomas~K. Landauer.
526: \newblock 1998.
527: \newblock Automatic cross-language information retrieval using latent semantic
528: indexing.
529: \newblock In Gregory Grefenstette, editor, {\em Cross-Language Information
530: Retrieval}, chapter~5, pages 51--62. Kluwer Academic Publishers.
531:
532: \bibitem[\protect\citename{Matsumoto \bgroup et al.\egroup
533: }1999]{matsumoto:chasen-99}
534: Yuji Matsumoto, Akira Kitauchi, Tatsuo Yamashita, Yoshitaka Hirano, Hiroshi
535: Matsuda, and Masayuki Asahara.
536: \newblock 1999.
537: \newblock {Japanese} morphological analysis system {ChaSen} version 2.0 manual
538: 2nd edition.
539: \newblock Technical Report NAIST-IS-TR99009, NAIST.
540:
541: \bibitem[\protect\citename{McCarley}1999]{mccarley:acl-99}
542: J.~Scott McCarley.
543: \newblock 1999.
544: \newblock Should we translate the documents or the queries in cross-language
545: information retrieval?
546: \newblock In {\em Proceedings of the 37th Annual Meeting of the Association for
547: Computational Linguistics}, pages 208--214.
548:
549: \bibitem[\protect\citename{Nie \bgroup et al.\egroup }1999]{nie:sigir-99}
550: Jian-Yun Nie, Michel Simard, Pierre Isabelle, and Richard Durand.
551: \newblock 1999.
552: \newblock Cross-language information retrieval based on parallel texts and
553: automatic mining of parallel texts from the {Web}.
554: \newblock In {\em Proceedings of the 22nd Annual International ACM SIGIR
555: Conference on Research and Development in Information Retrieval}, pages
556: 74--81.
557:
558: \bibitem[\protect\citename{Oard and Resnik}1999]{oard:ipm-99}
559: Douglas~W. Oard and Philip Resnik.
560: \newblock 1999.
561: \newblock Support for interactive document selection in cross-language
562: information retrieval.
563: \newblock {\em Information Processing \& Management}, 35(3):363--379.
564:
565: \bibitem[\protect\citename{Oard}1998]{oard:amta-98}
566: Douglas~W. Oard.
567: \newblock 1998.
568: \newblock A comparative study of query and document translation for
569: cross-language information retrieval.
570: \newblock In {\em Proceedings of the 3rd Conference of the Association for
571: Machine Translation in the Americas}, pages 472--483.
572:
573: \bibitem[\protect\citename{Robertson and Walker}1994]{robertson:sigir-94}
574: S.~E. Robertson and S.~Walker.
575: \newblock 1994.
\newblock Some simple effective approximations to the {2-Poisson} model for
577: probabilistic weighted retrieval.
578: \newblock In {\em Proceedings of the 17th Annual International ACM SIGIR
579: Conference on Research and Development in Information Retrieval}, pages
580: 232--241.
581:
582: \bibitem[\protect\citename{Salton}1970]{salton:jasis-70}
583: Gerard Salton.
584: \newblock 1970.
585: \newblock Automatic processing of foreign language documents.
586: \newblock {\em Journal of the American Society for Information Science},
587: 21(3):187--194.
588:
589: \bibitem[\protect\citename{Smadja \bgroup et al.\egroup }1996]{smadja:cl-96}
590: Frank Smadja, Kathleen~R. McKeown, and Vasileios Hatzivassiloglou.
591: \newblock 1996.
592: \newblock Translating collocations for bilingual lexicons: A statistical
593: approach.
594: \newblock {\em Computational Linguistics}, 22(1):1--38.
595:
596: \bibitem[\protect\citename{Yamamoto and Matsumoto}2000]{yamamoto:coling-2000}
597: Kaoru Yamamoto and Yuji Matsumoto.
598: \newblock 2000.
599: \newblock Acquisition of phrase-level bilingual correspondence using dependency
600: structure.
601: \newblock In {\em Proceedings of the 18th International Conference on
602: Computational Linguistics}, pages 933--939.
603:
604: \end{thebibliography}
605:
606: \end{document}
607: