
1: %

2: % COLING2002

3: %

4:

5: \documentclass[11pt]{article}

6:

7: \usepackage{colacl}

8: \usepackage{graphicx}

9:

10: \title{A Probabilistic Method for Analyzing Japanese Anaphora Integrating Zero Pronoun Detection and Resolution}

11:

12: \author{Kazuhiro Seki$^{\,\dag}$, Atsushi Fujii$^{\,\dag\dag,\,\dag\dag\dag}$

13:   and Tetsuya Ishikawa$^{\,\dag\dag}$\\

14:   \dag National Institute of Advanced Industrial Science and Technology\\

15:   {\normalsize 1-1-1, Chuuou Daini Umezono, Tsukuba 305-8568, Japan}\\

16:   \dag\dag University of Library and Information Science \\

17:   {\normalsize 1-2, Kasuga, Tsukuba, 305-8550, Japan}\\

18:   \dag\dag\dag  CREST, Japan Science \& Technology Corporation \\

19:   {\normalsize\tt k.seki@aist.go.jp\ \ \ fujii@ulis.ac.jp\ \ \ ishikawa@ulis.ac.jp}}

20:

21: \begin{document}

22: \maketitle

23:

24: \begin{abstract}

25:   This paper proposes a method to analyze Japanese anaphora, in which

26:   zero pronouns (omitted obligatory cases) are used to refer to

27:   preceding entities (antecedents). Unlike the case of general

28:   coreference resolution, zero pronouns have to be detected prior to

29:   resolution because they are not expressed in discourse. Our method

30:   integrates two probability parameters to perform zero pronoun

31:   detection and resolution in a single framework. The first parameter

32:   quantifies the degree to which a given case is a zero pronoun. The

33:   second parameter quantifies the degree to which a given entity is

34:   the antecedent for a detected zero pronoun. To compute these

35:   parameters efficiently, we use corpora with/without annotations of

36:   anaphoric relations.  We show the effectiveness of our method by way

37:   of experiments.

38: \end{abstract}

39:

40: \section{Introduction}

41: \label{sec:introduction}

42:

43: Anaphora resolution is crucial in natural language processing (NLP),

44: specifically, discourse analysis. In the case of English, partially

45: motivated by Message Understanding Conferences

46: (MUCs)~\cite{grishman96}, a number of coreference resolution methods

47: have been proposed.

48:

49: In other languages such as Japanese and Spanish, anaphoric expressions

50: are often omitted. Ellipses related to obligatory cases are usually

51: termed zero pronouns. Since zero pronouns are not expressed in

52: discourse, they have to be detected prior to identifying their

antecedents. Thus, although pleonastic pronouns in English must also be
checked for whether they are anaphoric before resolution, the process
of analyzing Japanese zero pronouns differs from general coreference
resolution in English.

57:

58: For identifying anaphoric relations, existing methods are classified

59: into two fundamental approaches: rule-based and statistical approaches.

60:

61: In rule-based

62: approaches~\cite{gros95,hobbs78,mitk98,naka96-2,okum96,palomar01,walk94},

63: anaphoric relations between anaphors and their antecedents are

64: identified by way of hand-crafted rules, which typically rely on

65: syntactic structures, gender/number agreement, and selectional

66: restrictions.  However, it is difficult to produce rules exhaustively,

67: and rules that are developed for a specific language are not

68: necessarily effective for other languages. For example, gender/number

69: agreement in English cannot be applied to Japanese.

70:

71: Statistical approaches~\cite{aone95,ge98,kim95,soon01} use statistical

72: models produced based on corpora annotated with anaphoric relations.

73: However, only a few attempts have been made in corpus-based anaphora

74: resolution for Japanese zero pronouns. One of the reasons is that it

75: is costly to produce a sufficient volume of training corpora annotated

76: with anaphoric relations.

77:

In addition, the above methods focus mainly on identifying
antecedents, and few attempts have been made to detect zero pronouns.

80:

81: Motivated by the above background, we propose a probabilistic model

82: for analyzing Japanese zero pronouns combined with a detection

83: method. In brief, our model consists of two parameters associated with

zero pronoun detection and antecedent identification. We focus on zero
pronouns whose antecedents appear in the preceding context, because
they are the major referential expressions in Japanese.

87:

88: Section~\ref{sec:proposed approach} explains our proposed method

89: (system) for analyzing  Japanese zero pronouns.

90: Section~\ref{sec:evaluation} evaluates our method by way of

91: experiments using newspaper articles.  Section~\ref{sec:related works}

92: discusses related research literature.

93:

94: \section{A System for Analyzing Japanese Zero Pronouns}

95: \label{sec:proposed approach}

96:

97: \subsection{Overview}

98: \label{sec:overview}

99:

100: Figure~\ref{fig:overview} depicts the overall design of our system to

101: analyze Japanese zero pronouns. We explain the entire process based on

102: this figure.

103:

104: First, given an input Japanese text, our system performs morphological

105: and syntactic analyses. In the case of Japanese, morphological

106: analysis involves word segmentation and part-of-speech tagging because

107: Japanese sentences lack lexical segmentation, for which we use the

108: JUMAN morphological analyzer~\cite{juman98e}. Then, we use the KNP

109: parser~\cite{knp98e} to identify syntactic relations between segmented

110: words.

111:

112: Second, in a zero pronoun detection phase, the system uses syntactic

113: relations to detect omitted cases (nominative, accusative, and dative)

as zero pronoun candidates. To avoid overdetecting zero pronouns, we
use the IPAL verb dictionary~\cite{ipal87e}, which includes case frames
for 911 Japanese verbs. We discard zero pronoun candidates whose cases
are not listed in the case frames of the verb in question.

118:

119: For verbs unlisted in the IPAL dictionary, only nominative cases are

120: regarded as obligatory. The system also computes a probability that

121: case $c$ related to target verb $v$ is a zero pronoun,

122: $P_{zero}(c|v)$, to select plausible zero pronoun candidates.

123:

124: Ideally, in the case where a verb in question is polysemous, word

125: sense disambiguation is needed to select the appropriate case frame,

126: because different verb senses often correspond to different case

frames. However, we currently merge multiple case frames for a verb
into a single frame to sidestep the polysemy problem. This issue
needs to be explored further.

130:

131: Third, in a zero pronoun resolution  (i.e., antecedent identification)

132: phase, for each zero pronoun the system extracts antecedent candidates

133: from the preceding contexts, which are ordered according to the extent

134: to which they can be the antecedent for the target zero pronoun. From

135: the viewpoint of probability theory, our task here is to compute a

136: probability that zero pronoun $\phi$ refers to antecedent $a_i$,

137: $P(a_i|\phi)$, and select the candidate that maximizes the probability

138: score. For the purpose of computing this score, we model zero pronouns

139: and antecedents in Section~\ref{sec:features}.

140:

141: Finally, the system outputs texts containing anaphoric relations.  In

142: addition, the number of zero pronouns analyzed by the system can

143: optionally be controlled based on the certainty score described in

144: Section~\ref{sec:certainty}.

145:

146: \begin{figure}[tb]

147: \begin{center}

148: \includegraphics[scale=0.9]{overview2.eps}

149: \caption{The overall design of our system to analyze Japanese zero pronouns.}

150: \label{fig:overview}

151: \end{center}

152: \end{figure}

153:

154: \subsection{Modeling Zero Pronouns and Antecedents}

155: \label{sec:features}

156:

157: According to past literature associated with zero pronoun resolution

158: and our preliminary study, we use the following six features to model

159: zero pronouns and antecedents.

160:

161: \vspace{3mm}

162: \noindent

163: $\bullet$ Features for zero pronouns

164: \begin{itemize}

165: \item[--] Verbs that govern zero pronouns ($v$), which denote verbs

166:   whose cases are omitted.

167:

168: \item[--] Surface cases related to zero pronouns ($c$), for which

169:   possible values are Japanese case marker suffixes, {\it ga\/}

170:   (nominative), {\it wo\/} (accusative), and {\it ni\/} (dative).  Those

171:   values indicate which cases are omitted.

172: \end{itemize}

173: \noindent

174: $\bullet$ Features for antecedents

175: \begin{itemize}

176: \item[--] Post-positional particles ($p$), which play crucial roles in

177:   resolving Japanese zero pronouns~\cite{kame86,walk94}.

178:

179: \item[--] Distance ($d$), which denotes the distance (proximity)

  between a zero pronoun and an antecedent candidate in an input text. If
  they occur in the same sentence, the value is $0$. If the antecedent
  candidate occurs $n$ sentences before the sentence that includes the
  zero pronoun, the value is $n$.

184:

185: \item[--] Constraint related to relative clauses ($r$), which denotes

186:   whether an antecedent is included in a relative clause or not. In

187:   the case where it is included, the value of $r$ takes \textit{true},

188:   otherwise \textit{false}.  The rationale behind this feature is that

189:   Japanese zero pronouns tend {\em not} to refer to noun phrases in

190:   relative clauses.

191:

192: \item[--] Semantic classes ($n$), which represent semantic classes

193:   associated with antecedents.  We use 544 semantic classes defined in

194:   the Japanese \textit{Bunruigoihyou} thesaurus~\cite{koku64e}, which

195:   contains 55,443 Japanese nouns.

196:

197: \end{itemize}

198:

199: \subsection{Our Probabilistic Model for Zero Pronoun Detection and Resolution}

200: \label{sec:probabilistic model}

201:

202: We consider probabilities that unsatisfied case $c$ related to verb

203: $v$ is a zero pronoun, $P_{zero}(c|v)$, and that zero pronoun $\phi_c$

204: refers to antecedent $a_i$, $P(a_i|\phi_c)$. Thus, a probability that

205: case $c$ ($\phi_c$) is zero-pronominalized and refers to candidate

206: $a_i$ is formalized as in Equation~(\ref{eq:product}).

207: \begin{eqnarray}

208:   \label{eq:product}

209:   P(a_i|\phi_c)\cdot P_{zero}(c|v)

210: \end{eqnarray}

211: Here, $P_{zero}(c|v)$ and $P(a_i|\phi_c)$ are computed in the

212: detection and resolution phases, respectively (see

213: Figure~\ref{fig:overview}).

214:

215: Since zero pronouns are omitted obligatory cases, whether or not case

216: $c$ is a zero pronoun depends on the extent to which case $c$ is

217: obligatory for verb $v$. Case $c$ is likely to be obligatory for verb

218: $v$ if $c$ frequently co-occurs with $v$. Thus, we compute

219: $P_{zero}(c|v)$ based on the co-occurrence frequency of $\langle

220: v,c\rangle$ pairs, which can be extracted from unannotated corpora.

221: $P_{zero}(c|v)$ takes 1 in the case where $c$ is $ga$ (nominative)

222: regardless of the target verb, because $ga$ is obligatory for most

223: Japanese verbs.
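
\noindent
The exact estimator for $P_{zero}(c|v)$ is not spelled out here. As a
minimal sketch only, assuming a relative-frequency estimate, one
possible form is the following, where $F(v)$ (the number of
occurrences of verb $v$) and $F(v,c)$ (the number of those occurrences
in which case $c$ is overtly filled) are notation introduced purely
for this illustration:
\begin{eqnarray*}
  P_{zero}(c|v) \approx \left\{
  \begin{array}{ll}
    1 & \mbox{if $c$ is {\it ga\/}}\\
    \displaystyle\frac{F(v,c)}{F(v)} & \mbox{otherwise}
  \end{array}\right.
\end{eqnarray*}
With hypothetical counts $F(v)=1000$, $F(v,wo)=800$, and
$F(v,ni)=50$, this sketch would give $P_{zero}(wo|v)=0.8$ and
$P_{zero}(ni|v)=0.05$, so that an unfilled {\it wo\/} case of $v$
would be a far more plausible zero pronoun than an unfilled {\it ni\/}
case.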

224:

225: Given the formal representation for zero pronouns and antecedents in

Section~\ref{sec:features}, the probability, $P(a_i|\phi)$, is expressed

227: as in Equation~(\ref{eq:paz1}).

228: \begin{eqnarray}

229:   \label{eq:paz1}

230:   P(a_i|\phi) = P(p_i,d_i,r_i,n_i|v,c)

231: \end{eqnarray}

232:

233: \noindent

234: To improve the efficiency of probability estimation, we decompose the

235: right-hand side of Equation~(\ref{eq:paz1}) as follows.

236:

237: Since a preliminary study showed that $d_i$ and $r_i$ were relatively

238: independent of the other features, we approximate

239: Equation~(\ref{eq:paz1}) as in Equation~(\ref{eq:paz2}).

240: \begin{eqnarray}

241:   \label{eq:paz2}

242:   \begin{array}{@{}r@{~}c@{~}l@{}}

243:     \vspace*{1mm}

244:     P(a_i|\phi) & \approx & P(p_i,n_i|v,c)\cdot P(d_i)\cdot P(r_i)\\

245:     \vspace*{1mm}

246:     &=& P(p_i|n_i,v,c)\cdot P(n_i|v,c)\\

247:     && \mbox{}\cdot P(d_i)\cdot P(r_i)

248:   \end{array}

249: \end{eqnarray}

Assuming that $p_i$ is independent of $v$ and $n_i$, we can further

251: approximate Equation~(\ref{eq:paz2}) to derive

252: Equation~(\ref{eq:paz}).

253: \begin{eqnarray}

254:   \label{eq:paz}

255:   P(a_i|\phi_c) \approx P(p_i|c)\!\cdot\! P(d_i)\!\cdot\! P(r_i)\!\cdot\! P(n_i|v,c)

256: \end{eqnarray}

257: Here, the first three factors, $P(p_i|c)\cdot P(d_i)\cdot P(r_i)$, are

258: related to syntactic properties, and $P(n_i|v,c)$ is a semantic

259: property associated with zero pronouns and antecedents. We shall call

260: the former and latter ``syntactic'' and ``semantic'' models,

261: respectively.
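
\noindent
As an illustration of Equation~(\ref{eq:paz}), consider two antecedent
candidates, $a_1$ and $a_2$, for the same zero pronoun. All values
below are hypothetical and chosen only to show how the factors
interact. In each product the factors are, in order, $P(p_i|c)$,
$P(d_i)$, $P(r_i)$, and $P(n_i|v,c)$:
\begin{eqnarray*}
  P(a_1|\phi_c) &\approx& 0.4\cdot 0.5\cdot 0.9\cdot 0.02 = 0.0036\\
  P(a_2|\phi_c) &\approx& 0.1\cdot 0.3\cdot 0.9\cdot 0.10 = 0.0027
\end{eqnarray*}
In this invented case $a_1$ is ranked first even though $a_2$ fits the
verb better semantically ($0.10$ versus $0.02$), because the particle
and distance of $a_1$ are more typical of antecedents.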

262:

Each parameter in Equation~(\ref{eq:paz}) is computed as in
Equation~(\ref{eq:ppc}), where $F(x)$ denotes the frequency of $x$ in
corpora annotated with anaphoric relations.

266: \begin{eqnarray}

267:   \label{eq:ppc}

268:   \begin{array}{rcl}

269:     \vspace*{1mm}

270:     P(p_i|c)&=&\displaystyle\frac{F(p_i,c)}{\sum_{j}F(p_j,c)}\\

271:     \vspace*{1mm}

272:     \label{eq:pd}

273:     P(d_i)&=&\displaystyle\frac{F(d_i)}{\sum_{j}F(d_j)}\\

274:     \vspace*{1mm}

275:     \label{eq:pm}

276:     P(r_i)&=&\displaystyle\frac{F(r_i)}{\sum_{j}F(r_j)}\\

277:     \label{eq:pns}

278:     P(n_i|v,c)&=&\displaystyle\frac{F(n_i,v,c)}{\sum_{j}F(n_j,v,c)}

279:   \end{array}

280: \end{eqnarray}

However, estimating the semantic model, $P(n_i|v,c)$, from annotated
corpora alone would require large-scale annotation, so data sparseness
is a serious problem. Thus, we explore the use of unannotated corpora.

284:

285: For $P(n_i|v,c)$, $v$ and $c$ are features for a zero pronoun, and

286: $n_i$ is a feature for an antecedent. However, we can regard $v$, $c$,

287: and $n_i$ as features for a verb and its case noun because zero

288: pronouns are omitted case nouns. Thus, it is possible to estimate the

289: probability based on co-occurrences of verbs and their case nouns,

290: which can be extracted automatically from large-scale unannotated

291: corpora.
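
\noindent
As a sketch of this estimation, suppose that an unannotated corpus
yields the following hypothetical co-occurrence counts for a verb $v$
and the case {\it wo\/}: 60 case nouns in semantic class $n_1$, 30 in
$n_2$, and 10 in $n_3$ (the classes and counts are invented for
illustration). Applying the relative-frequency form of $P(n_i|v,c)$ in
Equation~(\ref{eq:ppc}) to these counts gives
\begin{eqnarray*}
  P(n_1|v,wo) &=& \frac{60}{60+30+10} = 0.6\\
  P(n_2|v,wo) &=& \frac{30}{100} = 0.3\\
  P(n_3|v,wo) &=& \frac{10}{100} = 0.1
\end{eqnarray*}
so that an antecedent candidate in class $n_1$ would receive the
highest semantic score when the {\it wo\/} case of $v$ is omitted.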

292:

293: \subsection{Computing Certainty Score}

294: \label{sec:certainty}

295:

296: Since zero pronoun analysis is not a stand-alone application, our

297: system is used as a module in other NLP applications, such as machine

298: translation. In those applications, it is desirable that erroneous

299: anaphoric relations are not generated. Thus, we propose a notion of

300: certainty to output only zero pronouns that are detected and resolved

301: with a high certainty score.

302:

303: We formalize the certainty score, $C(\phi_c)$, for each zero pronoun

304: as in Equation (\ref{eq:certainty}), where $P_1(\phi_c)$ and

305: $P_2(\phi_c)$ denote probabilities computed by

306: Equation~(\ref{eq:product}) for the first and second ranked

307: candidates, respectively. In addition, $t$ is a parametric constant,

308: which is experimentally set to $0.5$.

309: \begin{eqnarray}

310:   \label{eq:certainty}

311:   C(\phi_c) = t\!\cdot\! P_1(\phi_c)+(1\!-\!t)(P_1(\phi_c)\!-\!P_2(\phi_c))

312: \end{eqnarray}

The certainty score is high when $P_1(\phi_c)$ is sufficiently large
and significantly greater than $P_2(\phi_c)$.
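
\noindent
Expanding Equation~(\ref{eq:certainty}) shows that the score is the
top-ranked probability penalized by a fraction of the runner-up:
\begin{eqnarray*}
  C(\phi_c) &=& t\cdot P_1(\phi_c) + (1-t)\cdot P_1(\phi_c)\\
            & & \mbox{} - (1-t)\cdot P_2(\phi_c)\\
            &=& P_1(\phi_c) - (1-t)\cdot P_2(\phi_c)
\end{eqnarray*}
For instance, with $t=0.5$ and the hypothetical values
$P_1(\phi_c)=0.6$ and $P_2(\phi_c)=0.2$, the score is
$0.6-0.5\cdot 0.2=0.5$. If the runner-up is instead nearly tied,
$P_2(\phi_c)=0.55$, the score drops to $0.6-0.5\cdot 0.55=0.325$. An
illustrative threshold of $0.4$ would therefore keep the first zero
pronoun and discard the second.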

315:

316: \section{Evaluation}

317: \label{sec:evaluation}

318:

319: \subsection{Methodology}

320: \label{sec:methodology}

321:

322: To investigate the performance of our system, we used

\textit{Kyotodaigaku} Text Corpus version 2.0~\cite{kuro98}, in which
20,000 articles from the 1995 \textit{Mainichi Shimbun} newspaper
were analyzed by JUMAN and KNP (i.e., the morphological and syntactic
analyzers used in our system) and revised manually. From this corpus, we

327: randomly selected 30 general articles (e.g., politics and sports) and

328: manually annotated those articles with anaphoric relations for zero

329: pronouns. The number of zero pronouns contained in those articles was

330: 449.

331:

332: We used a leave-one-out cross-validation evaluation method: we

333: conducted 30 trials in each of which  one article was used as a test

334: input and the remaining 29 articles were used for producing a

syntactic model. We used six years' worth of \textit{Mainichi Shimbun}

336: newspaper articles~\cite{mainichi-e} to produce a semantic model based

337: on co-occurrences of verbs and their case nouns.

338:

339: To extract verbs and their case noun pairs from newspaper articles, we

340: performed a morphological analysis by JUMAN and extracted dependency

341: relations using a relatively simple rule: we assumed that each noun

342: modifies the verb of highest proximity.  As a result, we obtained 12

343: million co-occurrences associated with 6,194 verb types. Then, we

344: generalized the extracted nouns into semantic classes in the Japanese

345: {\it Bunruigoihyou\/} thesaurus. In the case where a noun was

346: associated with multiple classes, the noun was assigned to all

347: possible classes.

348: In the case where a noun was not listed in the thesaurus, the noun

349: itself was regarded as a single semantic class.

350:

351: \begin{table*}[htb]

352:   \begin{center}

353:     \caption{Experimental results for zero pronoun resolution.}

354:     \label{tab:riyousosei}

355:     \footnotesize

356:     \smallskip

357:     \begin{tabular}{cr@{~}lccr@{~}lcc} \hline\hline

358:       & \multicolumn{8}{c}{\# of Correct cases (Accuracy)} \\

359:       \cline{2-9}

360:       $k$ & \multicolumn{2}{c}{$Sem1$} & $Sem2$ & $Syn$ & \multicolumn{2}{c}{$Both1$} & $Both2$ & $Rule$\\

361:       \hline

      1 & 25 & (6.2\%)  & 119 (29.5\%) & 185 (45.8\%) & 30 & (7.4\%) & \textbf{205 (50.7\%)} & 162 (40.1\%) \\
      2 & 46 & (11.4\%) & 193 (47.8\%) & 227 (56.2\%) & 49 & (12.1\%) & \textbf{250 (61.9\%)} & 213 (52.7\%) \\
      3 & 72 & (17.8\%) & 230 (56.9\%) & 262 (64.9\%) & 75 & (18.6\%) & \textbf{280 (69.3\%)} & 237 (58.6\%) \\

365:       \hline

366:     \end{tabular}

367:   \end{center}

368: \end{table*}

369:

370: \subsection{Comparative Experiments}

371: \label{sec:results}

372:

373: Fundamentally, our evaluation is two-fold: we evaluated only zero

374: pronoun resolution (antecedent identification) and a combination of

375: detection and resolution. In the former case, we assumed that all the

376: zero pronouns are correctly detected, and investigated the

377: effectiveness of the resolution model, $P(a_i|\phi)$. In the latter

378: case, we investigated the effectiveness of the combined model,

379: $P(a_i|\phi_c)\cdot P_{zero}(c|v)$.

380:

381: First, we compared the performance of the following different models

382: for zero pronoun resolution, $P(a_i|\phi)$:

383: \begin{list}{$\bullet$}{\itemsep 3pt \parsep 0pt}

384: \item a semantic model produced based on annotated corpora ($Sem1$),

385: \item a semantic model produced based on unannotated corpora, using

386:   co-occurrences of verbs and their case nouns ($Sem2$),

387: \item a syntactic model ($Syn$),

388: \item a combination of $Syn$ and $Sem1$ ($Both1$),

389: \item a combination of $Syn$ and $Sem2$ ($Both2$), which is our

390:   complete model for zero pronoun resolution,

391: \item a rule-based model ($Rule$).

392: \end{list}

393: As a control (baseline) model, we took approximately two man-months to

394: develop a rule-based model ($Rule$) through an analysis on ten

395: articles in \textit{Kyotodaigaku} Text Corpus. This model uses rules

396: typically used in existing rule-based methods: 1) post-positional

397: particles that follow antecedent candidates, 2) proximity between zero

398: pronouns and antecedent candidates, and 3) conjunctive particles. We

399: did not use semantic properties in the rule-based method because they

400: decreased the system accuracy in a preliminary study.

401:

402: Table~\ref{tab:riyousosei} shows the results, where we regarded the

403: $k$-best antecedent candidates as the final output and compared

404: results for different values of $k$.  In the case where the correct

405: answer was included in the $k$-best candidates, we judged it

406: correct. In addition, ``Accuracy'' is the ratio between the number of

407: zero pronouns whose antecedents were correctly identified and the

408: number of zero pronouns correctly detected by the system (404 for all

409: the models). Bold figures denote the highest performance for each

410: value of $k$ across different models. Here, the average number of

411: antecedent candidates per zero pronoun was 27 regardless of the model,

412: and thus the accuracy was 3.7\% in the case where the system randomly

413: selected antecedents.
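
\noindent
As a check on how these figures are derived, the accuracy of $Both2$
with $k=1$ in Table~\ref{tab:riyousosei} and the random baseline
follow directly from the counts given above:
\begin{eqnarray*}
  \frac{205}{404} &\approx& 0.507 \quad (50.7\%)\\
  \frac{1}{27} &\approx& 0.037 \quad (3.7\%)
\end{eqnarray*}
The random baseline treats each of the 27 candidates as equally likely
to be selected.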

414:

415: Looking at the results for two different semantic models, $Sem2$

outperformed $Sem1$, which indicates that the use of co-occurrences of
verbs and their case nouns was effective in identifying antecedents and
in avoiding the data sparseness problem when producing a semantic model.

419:

The syntactic model, $Syn$, outperformed each of the two semantic
models, which suggests that the syntactic features used in our model
were more effective than the semantic features in identifying
antecedents. When both syntactic and semantic models were used in
$Both2$, the accuracy was further improved.  While the rule-based
method, $Rule$, achieved a relatively high accuracy, our complete
model, $Both2$, outperformed $Rule$ irrespective of the value of $k$.
To sum up, we conclude that both syntactic and semantic models were
effective in identifying appropriate anaphoric relations.

429:

430: At the same time, since our method requires annotated corpora, the

431: relation between the corpus size and accuracy is crucial. Thus, we

432: performed two additional experiments associated with $Both2$.

433:

434: In the first experiment, we varied the number of annotated articles

435: used to produce a syntactic model, where a semantic model was produced

based on six years' worth of newspaper articles. In the second

437: experiment, we varied the number of unannotated articles used to

438: produce a semantic model, where a syntactic model was produced based

439: on 29 annotated articles. In Figure~\ref{fig:both}, we show two {\em

440:   independent\/} results as space is limited: the dashed and solid

441: graphs correspond to the results of the first and second experiments,

442: respectively. Given all the articles for modeling, the resultant

443: accuracy for each experiment was 50.7\%, which corresponds to that for

444: $Both2$ with $k=1$ in Table~\ref{tab:riyousosei}.

445:

446: \begin{figure}[tb]

447: \begin{center}

448:   \includegraphics[scale=0.62]{training-en.eps}

449: \caption{The relation between the corpus size and accuracy for a combination of syntactic and semantic models ($Both2$).}

450: \label{fig:both}

451: \end{center}

452: \end{figure}

453:

454: In the case where the number of articles was varied in producing a

syntactic model, the accuracy improved rapidly over the first five
articles. This indicates that a high accuracy can be obtained with a
relatively small number of annotated articles.  In the case where the
amount of unannotated corpora was varied in producing a semantic
model, the accuracy improved marginally as the corpus size
increased. Note, however, that no human supervision is needed to
produce a semantic model.

462:

463: Finally, we evaluated the effectiveness of the combination of zero

464: pronoun detection and resolution in Equation~(\ref{eq:product}).  To

465: investigate the contribution of the detection model, $P_{zero}(c|v)$,

466: we used $P(a_i|\phi_c)$ for comparison. Both cases used $Both2$ to

467: compute the probability for zero pronoun resolution.  We varied a

468: threshold for the certainty score to plot coverage-accuracy graphs for

469: zero pronoun detection (Figure~\ref{fig:detection}) and antecedent

470: identification (Figure~\ref{fig:sikii}).

471:

472: \begin{figure}

473: \begin{center}

474: \includegraphics[scale=0.60]{recall-precisionAna-i.eps}

475: \caption{The relation between coverage and accuracy for zero pronoun detection ({\it Both\/}2).}

476: \label{fig:detection}

477: \end{center}

478: \end{figure}

479:

480: \begin{figure}

481: \begin{center}

482:   \includegraphics[scale=0.60]{coverage-accuracy-i.eps}

483: \caption{The relation between coverage and accuracy for antecedent identification ({\it Both\/}2).}

484: \label{fig:sikii}

485: \end{center}

486: \end{figure}

487:

488: In Figure~\ref{fig:detection}, ``coverage'' is the ratio between the

489: number of zero pronouns correctly detected by the system and the total

490: number of zero pronouns in input texts, and ``accuracy'' is the ratio

491: between the number of zero pronouns correctly detected and the total

492: number of zero pronouns detected by the system. Note that since our

493: system failed to detect a number of zero pronouns, the coverage could

494: not be 100\%.
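
\noindent
As a rough indication of this ceiling, and assuming that the 404
correctly detected zero pronouns reported for
Table~\ref{tab:riyousosei} are all that the system detects correctly
at the lowest threshold (an assumption made only for this
illustration), the maximum attainable coverage over the 449 annotated
zero pronouns of Section~\ref{sec:methodology} would be
\begin{eqnarray*}
  \frac{404}{449} \approx 0.900 \quad (90.0\%)
\end{eqnarray*}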

495:

Figure~\ref{fig:detection} shows that as the coverage
decreased, the accuracy improved irrespective of the model used.  When
compared with the case of $P(a_i|\phi_c)$ alone, our model,
$P(a_i|\phi_c)\cdot P_{zero}(c|v)$, achieved a higher accuracy
regardless of the coverage.

500:

501: In Figure~\ref{fig:sikii}, ``coverage'' is the ratio between the

502: number of zero pronouns whose antecedents were generated and the

503: number of zero pronouns correctly detected by the system.  The

accuracy was improved by decreasing the coverage, and our model
marginally improved on the accuracy of $P(a_i|\phi_c)$ alone.

506:

These results show that our model was effective in improving
the accuracy of zero pronoun detection and had no adverse side effect
on the antecedent identification process. As a result, the overall
accuracy of zero pronoun detection and resolution was improved.

511:

512:

513: \section{Related Work}

514: \label{sec:related works}

515:

516: Kim and Ehara~\shortcite{kim95} proposed a probabilistic model to

resolve subject zero pronouns for the purpose of Japanese/English

518: machine translation. In their model, the search scope for possible

519: antecedents was limited to the sentence containing zero pronouns.  In

contrast, our method can resolve both intra-sentential and
inter-sentential zero pronouns.

522:

523: Aone and Bennett~\shortcite{aone95} used a decision tree to determine

524: appropriate antecedents for zero pronouns. They focused on proper and

525: definite nouns used in anaphoric expressions as well as zero pronouns.

526: However, their method resolves only anaphors that refer to

527: organization names (e.g., private companies), which are generally

528: easier to resolve than our case.

529:

Both of the above methods require annotated corpora for statistical
modeling, whereas we use corpora both with and without annotations of
anaphoric relations. Unannotated corpora are easy to obtain on a large
scale, which helps avoid the data sparseness problem.

534:

535: Nakaiwa~\shortcite{nakaiwa00} used Japanese/English bilingual corpora

536: to identify anaphoric relations of Japanese zero pronouns by comparing

537: J/E sentence pairs. The rationale behind this method is that

538: obligatory cases zero-pronominalized in Japanese are usually expressed

in English. However, where the corresponding English
expressions are themselves pronouns or other anaphors, this method is
not effective. Additionally, bilingual corpora are more expensive to
obtain than the monolingual corpora used in our method.

543:

544: Finally, our method integrates a parameter for zero pronoun detection

545: in computing the certainty score. Thus, we can improve the accuracy of

546: our system by discarding extraneous outputs with a small certainty

547: score.

548:

549: \section{Conclusion}

550: \label{sec:conclusion}

551:

552: We proposed a probabilistic model to analyze Japanese zero pronouns

553: that refer to antecedents in the previous context. Our model consists

554: of two probabilistic parameters corresponding to detecting zero

555: pronouns and identifying their antecedents, respectively. The latter

556: is decomposed into syntactic and semantic properties. To estimate

557: those parameters efficiently, we used annotated/unannotated

558: corpora. In addition, we formalized the certainty score to improve the

accuracy. Through experiments, we showed that the use of unannotated
corpora was effective in avoiding the data sparseness problem and that
the certainty score further improved the accuracy.

562:

563: Future work would include word sense disambiguation for polysemous

564: predicate verbs to select appropriate case frames in the zero pronoun

565: detection process.

566:

567: \bibliographystyle{acl}

568: \small

569: \begin{thebibliography}{}

570:

571: \bibitem[\protect\citename{Aone and Bennett}1995]{aone95}

572: Chinatsu Aone and Scott~William Bennett.

573: \newblock 1995.

574: \newblock Evaluating automated and manual acquisition of anaphora resolution

575:   strategies.

\newblock In {\em Proceedings of the 33rd Annual Meeting of the Association for

577:   Computational Linguistics}, pages 122--129.

578:

579: \bibitem[\protect\citename{Ge \bgroup et al.\egroup }1998]{ge98}

580: Niyu Ge, John Hale, and Eugene Charniak.

581: \newblock 1998.

582: \newblock A statistical approach to anaphora resolution.

583: \newblock In {\em Proceedings of the Sixth Workshop on Very Large Corpora},

584:   pages 161--170.

585:

586: \bibitem[\protect\citename{Grishman and Sundheim}1996]{grishman96}

587: Ralph Grishman and Beth Sundheim.

588: \newblock 1996.

589: \newblock Message {Understanding} {Conference} - 6: A brief history.

590: \newblock In {\em Proceedings of the 16th International Conference on

591:   Computational Linguistics}, pages 466--471.

592:

593: \bibitem[\protect\citename{Grosz \bgroup et al.\egroup }1995]{gros95}

594: Barbara~J. Grosz, Aravind~K. Joshi, and Scott Weinstein.

595: \newblock 1995.

596: \newblock Centering: A framework for modeling the local coherence of discourse.

597: \newblock {\em Computational Linguistics}, 21(2):203--226.

598:

599: \bibitem[\protect\citename{Hobbs}1978]{hobbs78}

600: Jerry~R. Hobbs.

601: \newblock 1978.

602: \newblock Resolving pronoun references.

603: \newblock {\em Lingua}, 44:311--338.

604:

605: \bibitem[\protect\citename{{Information-technology Promotion

606:   Agency}}1987]{ipal87e}

607: {Information-technology Promotion Agency}, 1987.

608: \newblock {\em IPA Lexicon of the {Japanese} language for computers (Basic

609:   Verbs)}.

610: \newblock (in Japanese).

611:

612: \bibitem[\protect\citename{Kameyama}1986]{kame86}

613: Megumi Kameyama.

614: \newblock 1986.

615: \newblock A property-sharing constraint in centering.

616: \newblock In {\em Proceedings of the 24th Annual Meeting of the Association for

617:   Computational Linguistics}, pages 200--206.

618:

619: \bibitem[\protect\citename{Kim and Ehara}1995]{kim95}

620: Yeun-Bae Kim and Terumasa Ehara.

621: \newblock 1995.

622: \newblock Zero-subject resolution method based on probabilistic inference with

623:   evaluation function.

624: \newblock In {\em Proceedings of the 3rd Natural Language Processing

625:   Pacific-Rim Symposium}, pages 721--727.

626:

627: \bibitem[\protect\citename{Kurohashi and Nagao}1998a]{kuro98}

628: Sadao Kurohashi and Makoto Nagao.

629: \newblock 1998a.

630: \newblock Building a {Japanese} parsed corpus while improving the parsing

631:   system.

632: \newblock In {\em Proceedings of The 1st International Conference on Language

633:   Resources \& Evaluation}, pages 719--724.

634:

635: \bibitem[\protect\citename{Kurohashi and Nagao}1998b]{juman98e}

636: Sadao Kurohashi and Makoto Nagao, 1998b.

637: \newblock {\em {Japanese} morphological analysis system {JUMAN} version 3.6

638:   manual}.

639: \newblock Department of Informatics, Kyoto University.

640: \newblock (in Japanese).

641:

642: \bibitem[\protect\citename{Kurohashi}1998]{knp98e}

643: Sadao Kurohashi, 1998.

644: \newblock {\em Japanese Dependency/Case Structure Analyzer {KNP} version

645:   2.0b6}.

646: \newblock Department of Informatics, Kyoto University.

647: \newblock (in Japanese).

648:

649: \bibitem[\protect\citename{{Mainichi Shimbunsha}}1994--1999]{mainichi-e}

650: {Mainichi Shimbunsha}.

651: \newblock {1994--1999}.

652: \newblock {Mainichi Shimbun CD-ROM}.

653:

654: \bibitem[\protect\citename{Mitkov \bgroup et al.\egroup }1998]{mitk98}

655: Ruslan Mitkov, Lamia Belguith, and Malgorzata Stys.

656: \newblock 1998.

657: \newblock Multilingual robust anaphora resolution.

658: \newblock In {\em Proceedings of the 3rd Conference on Empirical Methods in

659:   Natural Language Processing}, pages 7--16.

660:

661: \bibitem[\protect\citename{Nakaiwa and Shirai}1996]{naka96-2}

662: Hiromi Nakaiwa and Satoshi Shirai.

663: \newblock 1996.

664: \newblock Anaphora resolution of {Japanese} zero pronouns with deictic

665:   reference.

666: \newblock In {\em Proceedings of the 16th International Conference on

667:   Computational Linguistics}, pages 812--817.

668:

669: \bibitem[\protect\citename{Nakaiwa}2000]{nakaiwa00}

670: Hiromi Nakaiwa.

671: \newblock 2000.

672: \newblock An environment for extracting resolution rules of zero pronouns from

673:   corpora.

674: \newblock In {\em COLING-2000 Workshop on Semantic Annotation and Intelligent

675:   Content}, pages 44--52.

676:

677: \bibitem[\protect\citename{{National Language Research

678:   Institute}}1964]{koku64e}

679: {National Language Research Institute}.

680: \newblock 1964.

681: \newblock {\em Bunruigoihyou}.

682: \newblock Shuei publisher.

683: \newblock (in Japanese).

684:

685: \bibitem[\protect\citename{Okumura and Tamura}1996]{okum96}

686: Manabu Okumura and Kouji Tamura.

687: \newblock 1996.

688: \newblock Zero pronoun resolution in {Japanese} discourse based on centering

689:   theory.

690: \newblock In {\em Proceedings of the 16th International Conference on

691:   Computational Linguistics}, pages 871--876.

692:

693: \bibitem[\protect\citename{Palomar \bgroup et al.\egroup }2001]{palomar01}

694: Manuel Palomar, Antonio Ferr\'{a}ndez, Lidia Moreno, Patricio

  Mart\'{\i}nez-Barco, Jes\'{u}s Peral, Maximiliano Saiz-Noeda, and Rafael
  Mu\~{n}oz.

697: \newblock 2001.

698: \newblock An algorithm for anaphora resolution in {Spanish} texts.

699: \newblock {\em Computational Linguistics}, 27(4):545--568.

700:

701: \bibitem[\protect\citename{Soon \bgroup et al.\egroup }2001]{soon01}

702: Wee~Meng Soon, Hwee~Tou Ng, and Daniel Chung~Yong Lim.

703: \newblock 2001.

704: \newblock A machine learning approach to coreference resolution of noun

705:   phrases.

706: \newblock {\em Computational Linguistics}, 27(4):521--544.

707:

708: \bibitem[\protect\citename{Walker \bgroup et al.\egroup }1994]{walk94}

709: Marilyn Walker, Masayo Iida, and Sharon Cote.

710: \newblock 1994.

711: \newblock Japanese discourse and the process of centering.

712: \newblock {\em Computational Linguistics}, 20(2):193--233.

713:

714: \end{thebibliography}

715:

716: \end{document}

717: