0206:cs0206035/main.tex

1: \documentstyle[lrec2000]{article}

2:

3: \title{PRIME: A System for Multi-lingual Patent Retrieval}

4:

5: \name{Shigeto Higuchi$^{\dagger}$, Masatoshi Fukui$^{\dagger}$, %\\

6: \large\bf Atsushi Fujii$^{\dagger\dagger,\dagger\dagger\dagger}$, and Tetsuya

7: Ishikawa$^{\dagger\dagger}$}

8:

9: \address{$^{\dagger}$PATOLIS Corporation \\

10: 2-4-29 Shiohama Koto-ku, 135-0043, Japan \\

11: $^{\dagger\dagger}$University of Library and Information Science \\

12: 1-2 Kasuga Tsukuba, 305-8550, Japan \\

13: $^{\dagger\dagger\dagger}$CREST, Japan Science and Technology Corporation \\

14: fujii@ulis.ac.jp}

15:

16: \abstract{Given the growing number of patents filed in multiple

17: countries, users are interested in retrieving patents across

18: languages.  We propose a multi-lingual patent retrieval system, which

19: translates a user query into the target language, searches a

20: multilingual database for patents relevant to the query, and improves

21: the browsing efficiency by way of machine translation and

22: clustering. Our system also extracts new translations from patent

23: families consisting of comparable patents, to enhance the translation

24: dictionary.}

25:

26: \keywords{multi-lingual patent retrieval, machine translation,

27: document clustering, translation extraction, patent families}

28:

29: \newcommand{\etal}{et~al.}

30: \newcommand{\etaleos}{et~al}

31: \newcommand{\eq}[1]{(\ref{#1})}

32:

33: \input{psfig.tex}

34:

35: \begin{document}

36:

37: \maketitleabstract

38:

39: \section{Introduction}

40: \label{sec:introduction}

41:

42: Given the growing number of patents filed in multiple countries, it is

43: likely that users are interested in retrieving patent information

44: across languages. However, many users find it difficult to perform

45: patent retrieval (i.e., formulating queries, searching databases for

46: relevant patents, and browsing retrieved patents) in foreign

47: languages.

48:

49: To counter this problem, cross-language information retrieval (CLIR),

50: where queries in one language are submitted to retrieve documents in

51: another language, can be an effective solution.  CLIR has of late

52: become one of the major topics within the information retrieval and

53: natural language processing communities. In fact, a number of

54: methods/systems for CLIR have been proposed.

55:

56: Since by definition queries and documents are in different languages,

57: queries and documents need to be standardized into a common

58: representation, so that monolingual retrieval techniques can be

59: applied. From this point of view, existing CLIR methods are classified

60: into the following three fundamental categories.

61:

62: The first method translates queries into the document

63: language~\cite{ballesteros:sigir-98,fujii:chum-x,nie:sigir-99}, and

64: the second method translates documents into the query

65: language~\cite{mccarley:acl-99,oard:amta-98}. The third method

66: projects both queries and documents into a language-independent space

67: by way of thesaurus classes~\cite{gonzalo:chum-98,salton:jasis-70} and

68: latent semantic indexing~\cite{carbonell:ijcai-97,littman:clir-98}.

69:

70: Among the above methods, the first one (i.e., query translation

71: method) is preferable in terms of implementation cost, because this

72: approach can simply be combined with existing monolingual retrieval

73: systems.

74:

75: Following a query translation

76: method~\cite{fujii:emnlp-vlc-99,fujii:chum-x}, we previously proposed

77: a Japanese/English cross-language patent retrieval

78: system~\cite{fukui:sigir-ws-pr-2000}, where users submit queries in

79: either Japanese or English to retrieve patents in the other language.

80: In either case, the target database is monolingual.

81:

82: However, since users are not always sure as to which language database

83: contains patents relevant to their information need, it is effective

84: to retrieve patents in multiple languages {\em simultaneously\/}.

85: This process, which we shall call ``multi-lingual information

86: retrieval (MLIR)'', is an extension of CLIR. In this paper, we propose

87: a Japanese/English multi-lingual patent retrieval system called

88: ``PRIME'' (Patent Retrieval In Multi-lingual Environment).

89:

90: The design of our system is based on that for technical

91: documents~\cite{fujii:ntcir-2-2001}, which combines query translation,

92: document retrieval, document translation and clustering modules

93: (Section~\ref{sec:system}).

94:

95: Additionally, in this paper we newly introduce a module for enhancing

96: a dictionary used for the query translation module. For this purpose,

97: we propose a method to extract Japanese/English translations from

98: patent families consisting of comparable patents filed in Japan and

99: the United States (Section~\ref{sec:extraction}).

100:

101: \section{System Description}

102: \label{sec:system}

103:

104: \subsection{Overview}

105: \label{subsec:system_overview}

106:

107: Figure~\ref{fig:system} depicts the overall design of PRIME, which

108: retrieves documents in response to user queries in either Japanese or

109: English. However, unlike the case of CLIR, retrieved documents can

110: potentially be in either a combination of Japanese and English or

111: either of the languages individually. We briefly explain the entire

112: on-line process based on this figure.

113:

114: First, a user query is translated into the foreign language (i.e.,

115: either Japanese or English) by way of a query translation module.

116:

117: Second, a document retrieval module uses both the source (user) and

118: translated queries to search a Japanese/English bilingual patent

119: collection for relevant documents.

120:

121: In real world usage, Japanese and English patents are not comparable

122: in the collection (this is the major reason why cross/multi-lingual

123: retrieval is needed). However, for the purpose of research and

124: development, we currently target a comparable collection.

125:

126: To put it more precisely, the collection contains approximately

127: 1,750,000 pairs of Japanese abstracts and their English translations,

128: which were provided on PAJ (Patent Abstracts of Japan) CD-ROMs in

129: 1995-1999\footnote{Copyright by Japan Patent Office.}.

130:

131: Third, among retrieved documents, only those that are in the foreign

132: language are translated into the user language through a document

133: translation module.

134:

135: In principle, we need only the above three modules to realize

136: multi-lingual patent retrieval in the sense that users can

137: retrieve/browse foreign documents through their native

138: language. However, to improve the browsing efficiency, a clustering

139: module finally divides retrieved documents into a specific number of

140: groups.

141:

142: Additionally, in the off-line process, a translation extraction module

143: identifies Japanese/English translations in the database, to enhance

144: the query translation module.

145:

146: \begin{figure}[htbp]

147:   \begin{center}

148:     \leavevmode

149:     \psfig{file=system.eps,height=3.3in}

150:   \end{center}

151:   \caption{The design of PRIME: our multi-lingual patent retrieval system

152:   (dashed arrows denote the off-line process).}

153:   \label{fig:system}

154: \end{figure}

155:

156: \subsection{Query Translation}

157: \label{subsec:query_translation}

158:

159: The query translation module is based on the method proposed by Fujii

160: and Ishikawa~\shortcite{fujii:emnlp-vlc-99,fujii:chum-x}, which has

161: been applied to Japanese/English CLIR for the NTCIR collection

162: consisting of technical abstracts~\cite{kando:sigir-99}.

163:

164: This method translates words and phrases (compound words) in a given

165: query, maintaining the word order in the source language.  A

166: preliminary study showed that approximately 95\% of compound technical

167: terms defined in a bilingual dictionary~\cite{ferber:89} maintain the

168: same word order in both Japanese and English.

169:

170: Then, the Nova dictionary\footnote{Developed by NOVA,

171: Inc. http://www.nova.co.jp/} is used to derive possible word/phrase

172: translations, and a probabilistic method is used to resolve

173: translation ambiguity.

174:

175: The Nova dictionary includes approximately one million

176: Japanese-English translations related to 19 technical fields as listed

177: below:

178: \begin{quote}

179:   aeronautics, biotechnology, business, chemistry, computers,

180:   construction, defense, ecology, electricity, energy, finance, law,

181:   mathematics, mechanics, medicine, metals, oceanography, plants,

182:   trade.

183: \end{quote}

184:

185: In addition, for words unlisted in the Nova dictionary,

186: transliteration is performed to identify phonetic equivalents in the

187: target language. Since Japanese often represents loanwords (i.e.,

188: technical terms and proper nouns imported from foreign languages)

189: using its special phonetic alphabet (or phonogram) called ``{\it

190: katakana}'', with which new words can be spelled out, transliteration

191: is effective to improve the translation quality.

192:

193: We represent the user query and one translation candidate in the

194: document language by $U$ and $D$, respectively.  From the viewpoint of

195: probability theory, our task here is to select $D$'s with greater

196: probability, $P(D|U)$, which can be transformed as in

197: Equation~\eq{eq:query_translation} through Bayes' theorem.

198: \begin{equation}

199:   \label{eq:query_translation}

200:   P(D|U) = \frac{\textstyle P(U|D)\cdot P(D)}{\textstyle P(U)}

201: \end{equation}

202: In practice, $P(U)$ can be omitted because this factor is a constant

203: with respect to the given query, and thus does not affect the relative

204: probability for different translation candidates.

205:

206: $P(D)$ is estimated by a word-based bi-gram language model produced

207: from the target collection. $P(U|D)$ is estimated based on the word

208: frequency obtained from the Nova dictionary.  Those two factors are

209: commonly termed language and translation models, respectively (see

210: Figure~\ref{fig:system}).
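
The ranking described above can be illustrated with a minimal sketch. It is not the implementation used in PRIME: the add-one smoothing, the toy corpus, the function names, and the translation probabilities are all assumptions made for illustration. Each candidate $D$ is scored by $\log P(U|D) + \log P(D)$, with $P(U)$ dropped because it is constant for a given query.

```python
import math
from collections import Counter


def bigram_log_prob(words, unigrams, bigrams, vocab_size):
    """Return log P(D) under a word bi-gram model (add-one smoothing is assumed)."""
    logp = 0.0
    prev = "<s>"  # hypothetical sentence-start symbol
    for w in words:
        num = bigrams[(prev, w)] + 1
        den = unigrams[prev] + vocab_size
        logp += math.log(num / den)
        prev = w
    return logp


def rank_candidates(candidates, unigrams, bigrams, vocab_size):
    """Sort candidates D by log P(U|D) + log P(D), best first."""
    scored = []
    for words, trans_prob in candidates:
        lm = bigram_log_prob(words, unigrams, bigrams, vocab_size)
        scored.append((math.log(trans_prob) + lm, words))
    scored.sort(reverse=True)
    return [words for _, words in scored]


# Toy target-language collection used to estimate the language model.
corpus = [
    ["data", "retrieval", "system"],
    ["information", "retrieval", "system"],
    ["information", "retrieval"],
]
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    prev = "<s>"
    unigrams[prev] += 1
    for w in sentence:
        bigrams[(prev, w)] += 1
        unigrams[w] += 1
        prev = w
vocab_size = len(unigrams)

# Each candidate pairs a word sequence D with an assumed value of P(U|D).
candidates = [
    (["information", "search"], 0.5),
    (["information", "retrieval"], 0.5),
]
ranked = rank_candidates(candidates, unigrams, bigrams, vocab_size)
print(ranked[0])
```

When the translation model gives both candidates the same probability, as in this toy example, the language model alone decides the ranking, favoring the word sequence that is attested in the target collection.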

211:

212: \subsection{Document Retrieval}

213: \label{subsec:retrieval}

214:

215: The retrieval module is based on an existing probabilistic retrieval

216: method~\cite{robertson:sigir-94}, which computes the relevance score

217: between the translated query and each document in the collection. The

218: relevance score for document $i$ is computed based on

219: Equation~\eq{eq:okapi}.

220: \begin{equation}

221:   \label{eq:okapi}

222:   \sum_{t} \left(\frac{\textstyle TF_{t,i}}{\textstyle

223:     \frac{\textstyle DL_{i}}{\textstyle avglen} +

224:     TF_{t,i}}\cdot\log\frac{\textstyle N}{\textstyle DF_{t}}\right)

225: \end{equation}

226: Here, $TF_{t,i}$ denotes the frequency with which term $t$ appears in

227: document $i$. $DF_{t}$ and $N$ denote the number of documents

228: containing term $t$ and the total number of documents in the

229: collection. $DL_{i}$ denotes the length of document $i$ (i.e., the

230: number of characters contained in $i$), and $avglen$ denotes the

231: average length of documents in the collection.
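
As an illustration, Equation~\eq{eq:okapi} can be computed as in the following sketch. The function name, the toy documents, and the choice to sum over query terms only are assumptions made here, since the equation leaves the range of $t$ implicit. Document length is measured in characters, as stated above.

```python
import math
from collections import Counter


def relevance_score(query_terms, doc_terms, doc_len, avg_len, doc_freq, num_docs):
    """Sum TF / (DL / avglen + TF) * log(N / DF) over the query terms."""
    tf = Counter(doc_terms)
    score = 0.0
    for t in query_terms:
        if tf[t] == 0 or doc_freq.get(t, 0) == 0:
            continue  # a term absent from the document contributes nothing
        tf_part = tf[t] / (doc_len / avg_len + tf[t])
        idf_part = math.log(num_docs / doc_freq[t])
        score += tf_part * idf_part
    return score


# Toy collection of already-extracted content words (hypothetical data).
docs = {
    "d1": ["patent", "retrieval", "system"],
    "d2": ["machine", "translation", "system"],
    "d3": ["patent", "patent", "claim"],
}
# Document length in characters, approximated from the joined terms.
lengths = {name: len(" ".join(terms)) for name, terms in docs.items()}
avg_len = sum(lengths.values()) / len(docs)
# DF counts each term once per document.
doc_freq = Counter(t for terms in docs.values() for t in set(terms))

scores = {
    name: relevance_score(["patent"], terms, lengths[name], avg_len,
                          doc_freq, len(docs))
    for name, terms in docs.items()
}
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy run the document that mentions the query term twice and is shorter than average ranks first, which reflects the length normalization in the denominator.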

232:

233: For both Japanese and English collections, we use content words

234: extracted from documents as terms, and perform a word-based indexing.

235: For the Japanese collection, we use the ChaSen morphological

236: analyzer~\cite{matsumoto:chasen-99} to extract content words. However,

237: for the English collection, we extract content words based on

238: parts-of-speech as defined in WordNet~\cite{fellbaum:wordnet-98}.

239:

240: \subsection{Document Translation}

241: \label{subsec:document_translation}

242:

243: The document translation module consists of the Transer

244: Japanese/English MT system, which uses the same dictionary used for

245: the query translation module.

246:

247: In practice, since machine translation is computationally expensive

248: and degrades the time efficiency, we perform machine translation on a

249: phrase-by-phrase basis.  In brief, phrases are sequences of content

250: words in documents, for which we developed rules to generate phrases

251: based on the part-of-speech information. This method is practical

252: because even a word/phrase-based translation can potentially improve

253: the efficiency with which users find relevant foreign documents in

254: the whole retrieval result~\cite{oard:ipm-99}.

255:

256: \subsection{Clustering}

257: \label{subsec:clustering}

258:

259: For the purpose of clustering retrieved documents, we use the

260: Hierarchical Bayesian Clustering (HBC) method~\cite{iwayama:ijcai-95},

261: which merges similar items (i.e., documents in our case) in a

262: bottom-up manner, until all the items are merged into a single

263: cluster. Thus, a specific number of clusters can be obtained by

264: splitting the resultant hierarchy at a predetermined level.

265:

266: The HBC method also determines the most representative item (centroid)

267: for each cluster. Thus, we can enhance the browsing efficiency by

268: presenting only those centroids to users.

269:

270: The similarity between documents is computed based on feature vectors

271: that characterize each document. In our case, vectors for each

272: document consist of frequencies of content words appearing in the

273: document. We extract content words from documents as performed in

274: word-based indexing (see Section~\ref{subsec:retrieval}).
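
To make the bottom-up procedure concrete, the sketch below clusters documents represented as content-word frequency vectors. Note that it is not HBC: it substitutes a generic agglomerative scheme that merges the two clusters with the highest cosine similarity between their summed vectors, whereas HBC uses a probabilistic merging criterion. The function names and toy documents are assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerate(docs, num_clusters):
    """Merge the most similar pair of clusters until num_clusters remain."""
    clusters = [[i] for i in range(len(docs))]
    vectors = [Counter(d) for d in docs]
    while len(clusters) > max(num_clusters, 1):
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = cosine(vectors[x], vectors[y])
                if best is None or sim > best[0]:
                    best = (sim, x, y)
        _, x, y = best
        clusters[x] += clusters[y]             # merge member lists
        vectors[x] = vectors[x] + vectors[y]   # summed frequency vector
        del clusters[y], vectors[y]
    return clusters


# Toy retrieved documents, given as lists of content words (hypothetical).
docs = [
    ["patent", "retrieval", "query"],
    ["patent", "retrieval", "index"],
    ["engine", "fuel", "valve"],
    ["engine", "valve", "piston"],
]
result = agglomerate(docs, 2)
print(result)
```

Stopping the merging loop once the requested number of clusters remains is equivalent to cutting the full hierarchy at the corresponding level.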

275:

276: Given the clustering module, the system can facilitate an interactive

277: retrieval. To put it more precisely, through the interface, users can

278: discard irrelevant clusters determined by browsing representative

279: documents, and re-cluster the remaining documents. By performing this

280: process recursively, only relevant documents eventually remain.

281:

282: \section{Extracting Translations Using Patent Families}

283: \label{sec:extraction}

284:

285: \subsection{Overview}

286: \label{subsec:extraction_overview}

287:

288: Since patents are usually associated with new words, it is crucial to

289: translate out-of-dictionary words. The transliteration method used in

290: the query translation module is one solution for this problem (see

291: Section~\ref{subsec:query_translation}).

292:

293: On the other hand, it is also effective to update the translation

294: dictionary.  For this purpose, a number of methods to extract

295: translations from bilingual (parallel/comparable)

296: corpora~\cite{smadja:cl-96,yamamoto:coling-2000} are

297: applicable. However, it is considerably expensive to obtain bilingual

298: corpora with a sufficient volume of alignment information.

299:

300: To resolve this problem, we use patent families, which are patent sets

301: filed for the same/related contents in multiple countries, as

302: comparable corpora. Thus, patents contained in the same family are not

303: necessarily parallel, but quite comparable.

304:

305: Among a number of ways to apply for patents in multiple countries, we

306: focus solely on patents claiming priority under the Paris Convention,

307: because we can easily identify patent families by the identification

308: number assigned to each patent.

309:

310: In addition, the number of patent families is still increasing. Thus,

311: we can easily update a large-scale bilingual comparable corpus based

312: on patent families. To the best of our knowledge no research has

313: utilized patent families for extracting translations.

314:

315: \subsection{Methodology}

316: \label{subsec:extraction_method}

317:

318: Since patents are structured with a number of fields (e.g., titles,

319: abstracts, and claims), our method first identifies corresponding

320: fragments based on the document structure, to improve the extraction

321: accuracy.

322:

323: However, structures of paired patents are not always the same. For

324: example, the number of fields claimed in a single patent family often

325: varies depending on the language. Thus, we use only the title and

326: abstract fields, which are usually parallel in Japanese and English

327: patents. In other words, unlike the case of most existing extraction

328: methods, our method does not need sentence-aligned corpora.

329:

330: We use the ChaSen morphological analyzer~\cite{matsumoto:chasen-99}

331: and Brill tagger~\cite{brill:cl-95} to extract content words from

332: Japanese and English fragments, respectively. In addition, we combine

333: more than one word into phrases, for which we developed rules to

334: generate phrases based on the part-of-speech information.

335:

336: We then compute the association score for all the possible

337: combinations of Japanese/English phrases co-occurring in the same

338: fragment, and select those with greater score as the final

339: translations.  For this purpose, we use the weighted Dice

340: coefficient~\cite{yamamoto:coling-2000} as shown in

341: Equation~\eq{eq:wdice}.

342: \begin{equation}

343:   \label{eq:wdice}

344:   score(W_{j}, W_{e}) = \log F_{je}\cdot\frac{\textstyle 2

345:   F_{je}}{\textstyle F_{j} + F_{e}}

346: \end{equation}

347: Here, $W_{j}$ and $W_{e}$ are Japanese and English phrases,

348: respectively. $F_{j}$ and $F_{e}$ denote the frequency that $W_{j}$

349: and $W_{e}$ appear in the entire corpus, respectively.  $F_{je}$

350: denotes the frequency that $W_{j}$ and $W_{e}$ co-occur in the same

351: fragment. The logarithm factor is effective to discard infrequent

352: co-occurrences, which usually decrease the extraction accuracy.
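
The scoring in Equation~\eq{eq:wdice} can be sketched as follows. Several points are assumptions made for illustration: the logarithm base (base 2 here, since the text does not state it), counting each phrase at most once per fragment, the function names, and the romanized toy phrases.

```python
import math
from collections import Counter
from itertools import product


def weighted_dice(f_je, f_j, f_e, log_base=2.0):
    """Return log(F_je) * 2 * F_je / (F_j + F_e); zero if no co-occurrence."""
    if f_je == 0:
        return 0.0
    return math.log(f_je, log_base) * (2.0 * f_je) / (f_j + f_e)


def extract_translations(fragment_pairs, threshold, log_base=2.0):
    """Score every co-occurring phrase pair and keep those above threshold."""
    f_j, f_e, f_je = Counter(), Counter(), Counter()
    for ja_phrases, en_phrases in fragment_pairs:
        ja_set, en_set = set(ja_phrases), set(en_phrases)
        f_j.update(ja_set)
        f_e.update(en_set)
        f_je.update(product(ja_set, en_set))  # all pairs within the fragment
    scored = {}
    for (wj, we), n in f_je.items():
        s = weighted_dice(n, f_j[wj], f_e[we], log_base)
        if s > threshold:
            scored[(wj, we)] = s
    return scored


# Toy aligned fragments: (Japanese phrases, English phrases), romanized.
fragments = [
    (["tokkyo", "kensaku"], ["patent", "retrieval"]),
    (["tokkyo", "honyaku"], ["patent", "translation"]),
    (["tokkyo", "kensaku"], ["patent", "retrieval"]),
    (["kikai", "honyaku"], ["machine", "translation"]),
]
pairs = extract_translations(fragments, threshold=0.9)
for pair, score in sorted(pairs.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

A pair that co-occurs only once receives a score of zero regardless of its Dice value, because $\log 1 = 0$; this is how the logarithm factor discards infrequent co-occurrences.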

353:

354: \subsection{Experimentation}

355: \label{subsec:extraction_experimentation}

356:

357: A preliminary study showed that out of approximately 1,750,000 patents

358: filed in Japan (1995-1999), approximately 32,000 patents were paired

359: with those filed in the United States as patent families. Thus, in

360: practice we obtained a bilingual comparable corpus consisting of

361: 32,000 Japanese/English pairs.  From this corpus, our method extracted

362: 1,234,347 phrase-based translations, which were to be judged as correct or

363: incorrect.

364:

365: However, we selected only the translations whose association score was

366: above 1.5, and manually judged their correctness, because a) judging

367: all the extracted translations would be considerably expensive, and b)

368: translations with small association scores are usually incorrect.  The

369: total number of selected translations was 37,669.

370:

371: We then evaluated the accuracy of our extraction method. The accuracy

372: is the ratio of the number of correct translations to the number of

373: extracted translations, both counted among those whose association score is

374: above a specific threshold.  By raising the value of the threshold,

375: the accuracy also increased, while the number of extracted

376: translations decreased, as shown in Table~\ref{tab:extraction}.

377: According to this table, we could achieve a high accuracy by limiting

378: the number of translations extracted.
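
As a worked instance of this definition, using the counts reported in Table~\ref{tab:extraction} for the thresholds 1.5 and 4.0:
\begin{displaymath}
  \frac{5{,}879}{37{,}669} \approx 0.156 \;\; (15.6\%), \qquad
  \frac{564}{962} \approx 0.586 \;\; (58.6\%)
\end{displaymath}
That is, raising the threshold from 1.5 to 4.0 reduces the number of extracted translations from 37,669 to 962, while the proportion of correct ones rises from 15.6\% to 58.6\%.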

379:

380: We spent only four man-days in judging the 37,669 translations and

381: identifying 5,879 correct translations.  In other words, our method

382: facilitated the semi-automatic production of bilingual lexicons at a

383: trivial cost.

384:

385: \begin{table}[htbp]

386:   \begin{center}

387:     \caption{Accuracy for translation extraction.}

388:     \medskip

389:     \leavevmode

390:     \small

391:     \tabcolsep=2pt

392:     \begin{tabular}{lrrrrr} \hline\hline

393:       Threshold for Score & 1.5 & 2.0 & 3.0 & 4.0 & 5.0 \\ \hline

394:       \# of Translations & 37,669 & 24,869 & 4,419 & 962 & 356 \\

395:       \# of Correct Translations & 5,879 & 4,129 & 1,399 & 564 & 240 \\

396:       Accuracy (\%) & 15.6 & 16.6 & 31.7 & 58.6 & 67.4 \\

397:       \hline

398:     \end{tabular}

399:     \label{tab:extraction}

400:   \end{center}

401: \end{table}

402:

403: \section{Conclusion}

404: \label{sec:conclusion}

405:

406: In this paper, we proposed a multi-lingual system for Japanese/English

407: patent retrieval. For this purpose, we used a query translation method

408: explored in cross-language information retrieval (CLIR).

409:

410: However, unlike the case of CLIR, our system retrieves bilingual

411: patents simultaneously in response to a monolingual query.  Our system

412: also summarizes retrieved patents by way of machine translation and

413: clustering to improve the browsing efficiency.

414:

415: In addition, our system includes an extraction module which produces

416: new translations from patent families consisting of comparable

417: patents, and updates the translation dictionary.

418:

419: Future work would include improving existing modules in our system,

420: and the application of our framework to other languages.

421:

422: \section*{Acknowledgments}

423:

424: The authors would like to thank NOVA, Inc. for their support with the

425: Nova dictionary and Transer system, and Makoto Iwayama for his support

426: with the HBC software.

427:

428: \small

429: \bibliographystyle{acl}

430:

431: \begin{thebibliography}{}

432:

433: \bibitem[\protect\citename{Ballesteros and Croft}1998]{ballesteros:sigir-98}

434: Lisa Ballesteros and W.~Bruce Croft.

435: \newblock 1998.

436: \newblock Resolving ambiguity for cross-language retrieval.

437: \newblock In {\em Proceedings of the 21st Annual International ACM SIGIR

438:   Conference on Research and Development in Information Retrieval}, pages

439:   64--71.

440:

441: \bibitem[\protect\citename{Brill}1995]{brill:cl-95}

442: Eric Brill.

443: \newblock 1995.

444: \newblock Transformation-based error-driven learning and natural language

445:   processing: A case study in part-of-speech tagging.

446: \newblock {\em Computational Linguistics}, 21(4):543--565.

447:

448: \bibitem[\protect\citename{Carbonell \bgroup et al.\egroup

449:   }1997]{carbonell:ijcai-97}

450: Jaime~G. Carbonell, Yiming Yang, Robert~E. Frederking, Ralf~D. Brown, Yibing

451:   Geng, and Danny Lee.

452: \newblock 1997.

453: \newblock Translingual information retrieval: A comparative evaluation.

454: \newblock In {\em Proceedings of the 15th International Joint Conference on

455:   Artificial Intelligence}, pages 708--714.

456:

457: \bibitem[\protect\citename{Fellbaum}1998]{fellbaum:wordnet-98}

458: Christiane Fellbaum, editor.

459: \newblock 1998.

460: \newblock {\em {WordNet}: An Electronic Lexical Database}.

461: \newblock MIT Press.

462:

463: \bibitem[\protect\citename{Ferber}1989]{ferber:89}

464: Gene Ferber.

465: \newblock 1989.

466: \newblock {\em {English-Japanese}, {Japanese-English} Dictionary of Computer

467:   and Data-Processing Terms}.

468: \newblock MIT Press.

469:

470: \bibitem[\protect\citename{Fujii and Ishikawa}1999]{fujii:emnlp-vlc-99}

471: Atsushi Fujii and Tetsuya Ishikawa.

472: \newblock 1999.

473: \newblock Cross-language information retrieval for technical documents.

474: \newblock In {\em Proceedings of the Joint ACL SIGDAT Conference on Empirical

475:   Methods in Natural Language Processing and Very Large Corpora}, pages 29--37.

476:

477: \bibitem[\protect\citename{Fujii and Ishikawa}2001]{fujii:ntcir-2-2001}

478: Atsushi Fujii and Tetsuya Ishikawa.

479: \newblock 2001.

480: \newblock Evaluating multi-lingual information retrieval and clustering at

481:   {ULIS}.

482: \newblock In {\em Proceedings of the 2nd NTCIR Workshop Meeting on Evaluation

483:   of Chinese \& Japanese Text Retrieval and Text Summarization}.

484:

485: \bibitem[\protect\citename{Fujii and Ishikawa}To appear]{fujii:chum-x}

486: Atsushi Fujii and Tetsuya Ishikawa.

487: \newblock (To appear).

488: \newblock {Japanese/English} cross-language information retrieval: Exploration

489:   of query translation and transliteration.

490: \newblock {\em Computers and the Humanities}.

491:

492: \bibitem[\protect\citename{Fukui \bgroup et al.\egroup

493:   }2000]{fukui:sigir-ws-pr-2000}

494: Masatoshi Fukui, Shigeto Higuchi, Youichi Nakatani, Masao Tanaka, Atsushi

495:   Fujii, and Tetsuya Ishikawa.

496: \newblock 2000.

497: \newblock Applying a hybrid query translation method to {Japanese/English}

498:   cross-language patent retrieval.

499: \newblock In {\em ACM SIGIR Workshop on Patent Retrieval}.

500:

501: \bibitem[\protect\citename{Gonzalo \bgroup et al.\egroup

502:   }1998]{gonzalo:chum-98}

503: Julio Gonzalo, Felisa Verdejo, Carol Peters, and Nicoletta Calzolari.

504: \newblock 1998.

505: \newblock Applying {EuroWordNet} to cross-language text retrieval.

506: \newblock {\em Computers and the Humanities}, 32:185--207.

507:

508: \bibitem[\protect\citename{Iwayama and Tokunaga}1995]{iwayama:ijcai-95}

509: Makoto Iwayama and Takenobu Tokunaga.

510: \newblock 1995.

511: \newblock Hierarchical {Bayesian} clustering for automatic text classification.

512: \newblock In {\em Proceedings of the 14th International Joint Conference on

513:   Artificial Intelligence}, pages 1322--1327.

514:

515: \bibitem[\protect\citename{Kando \bgroup et al.\egroup }1999]{kando:sigir-99}

516: Noriko Kando, Kazuko Kuriyama, and Toshihiko Nozue.

517: \newblock 1999.

518: \newblock {NACSIS} test collection workshop ({NTCIR-1}).

519: \newblock In {\em Proceedings of the 22nd Annual International ACM SIGIR

520:   Conference on Research and Development in Information Retrieval}, pages

521:   299--300.

522:

523: \bibitem[\protect\citename{Littman \bgroup et al.\egroup

524:   }1998]{littman:clir-98}

525: Michael~L. Littman, Susan~T. Dumais, and Thomas~K. Landauer.

526: \newblock 1998.

527: \newblock Automatic cross-language information retrieval using latent semantic

528:   indexing.

529: \newblock In Gregory Grefenstette, editor, {\em Cross-Language Information

530:   Retrieval}, chapter~5, pages 51--62. Kluwer Academic Publishers.

531:

532: \bibitem[\protect\citename{Matsumoto \bgroup et al.\egroup

533:   }1999]{matsumoto:chasen-99}

534: Yuji Matsumoto, Akira Kitauchi, Tatsuo Yamashita, Yoshitaka Hirano, Hiroshi

535:   Matsuda, and Masayuki Asahara.

536: \newblock 1999.

537: \newblock {Japanese} morphological analysis system {ChaSen} version 2.0 manual

538:   2nd edition.

539: \newblock Technical Report NAIST-IS-TR99009, NAIST.

540:

541: \bibitem[\protect\citename{McCarley}1999]{mccarley:acl-99}

542: J.~Scott McCarley.

543: \newblock 1999.

544: \newblock Should we translate the documents or the queries in cross-language

545:   information retrieval?

546: \newblock In {\em Proceedings of the 37th Annual Meeting of the Association for

547:   Computational Linguistics}, pages 208--214.

548:

549: \bibitem[\protect\citename{Nie \bgroup et al.\egroup }1999]{nie:sigir-99}

550: Jian-Yun Nie, Michel Simard, Pierre Isabelle, and Richard Durand.

551: \newblock 1999.

552: \newblock Cross-language information retrieval based on parallel texts and

553:   automatic mining of parallel texts from the {Web}.

554: \newblock In {\em Proceedings of the 22nd Annual International ACM SIGIR

555:   Conference on Research and Development in Information Retrieval}, pages

556:   74--81.

557:

558: \bibitem[\protect\citename{Oard and Resnik}1999]{oard:ipm-99}

559: Douglas~W. Oard and Philip Resnik.

560: \newblock 1999.

561: \newblock Support for interactive document selection in cross-language

562:   information retrieval.

563: \newblock {\em Information Processing \& Management}, 35(3):363--379.

564:

565: \bibitem[\protect\citename{Oard}1998]{oard:amta-98}

566: Douglas~W. Oard.

567: \newblock 1998.

568: \newblock A comparative study of query and document translation for

569:   cross-language information retrieval.

570: \newblock In {\em Proceedings of the 3rd Conference of the Association for

571:   Machine Translation in the Americas}, pages 472--483.

572:

573: \bibitem[\protect\citename{Robertson and Walker}1994]{robertson:sigir-94}

574: S.~E. Robertson and S.~Walker.

575: \newblock 1994.

576: \newblock Some simple effective approximations to the 2-{Poisson} model for

577:   probabilistic weighted retrieval.

578: \newblock In {\em Proceedings of the 17th Annual International ACM SIGIR

579:   Conference on Research and Development in Information Retrieval}, pages

580:   232--241.

581:

582: \bibitem[\protect\citename{Salton}1970]{salton:jasis-70}

583: Gerard Salton.

584: \newblock 1970.

585: \newblock Automatic processing of foreign language documents.

586: \newblock {\em Journal of the American Society for Information Science},

587:   21(3):187--194.

588:

589: \bibitem[\protect\citename{Smadja \bgroup et al.\egroup }1996]{smadja:cl-96}

590: Frank Smadja, Kathleen~R. McKeown, and Vasileios Hatzivassiloglou.

591: \newblock 1996.

592: \newblock Translating collocations for bilingual lexicons: A statistical

593:   approach.

594: \newblock {\em Computational Linguistics}, 22(1):1--38.

595:

596: \bibitem[\protect\citename{Yamamoto and Matsumoto}2000]{yamamoto:coling-2000}

597: Kaoru Yamamoto and Yuji Matsumoto.

598: \newblock 2000.

599: \newblock Acquisition of phrase-level bilingual correspondence using dependency

600:   structure.

601: \newblock In {\em Proceedings of the 18th International Conference on

602:   Computational Linguistics}, pages 933--939.

603:

604: \end{thebibliography}

605:

606: \end{document}

607: