1: %
2: % COLING2002
3: %
4:
5: \documentclass[11pt]{article}
6:
7: \usepackage{colacl}
8: \usepackage{graphicx}
9:
10: \title{A Probabilistic Method for Analyzing Japanese Anaphora Integrating Zero Pronoun Detection and Resolution}
11:
12: \author{Kazuhiro Seki$^{\,\dag}$, Atsushi Fujii$^{\,\dag\dag,\,\dag\dag\dag}$
13: and Tetsuya Ishikawa$^{\,\dag\dag}$\\
14: \dag National Institute of Advanced Industrial Science and Technology\\
15: {\normalsize 1-1-1, Chuuou Daini Umezono, Tsukuba 305-8568, Japan}\\
16: \dag\dag University of Library and Information Science \\
17: {\normalsize 1-2, Kasuga, Tsukuba, 305-8550, Japan}\\
18: \dag\dag\dag CREST, Japan Science \& Technology Corporation \\
19: {\normalsize\tt k.seki@aist.go.jp\ \ \ fujii@ulis.ac.jp\ \ \ ishikawa@ulis.ac.jp}}
20:
21: \begin{document}
22: \maketitle
23:
24: \begin{abstract}
25: This paper proposes a method to analyze Japanese anaphora, in which
26: zero pronouns (omitted obligatory cases) are used to refer to
27: preceding entities (antecedents). Unlike the case of general
28: coreference resolution, zero pronouns have to be detected prior to
29: resolution because they are not expressed in discourse. Our method
30: integrates two probability parameters to perform zero pronoun
31: detection and resolution in a single framework. The first parameter
32: quantifies the degree to which a given case is a zero pronoun. The
33: second parameter quantifies the degree to which a given entity is
34: the antecedent for a detected zero pronoun. To compute these
35: parameters efficiently, we use corpora with/without annotations of
36: anaphoric relations. We show the effectiveness of our method by way
37: of experiments.
38: \end{abstract}
39:
40: \section{Introduction}
41: \label{sec:introduction}
42:
Anaphora resolution is crucial in natural language processing (NLP),
particularly in discourse analysis. In the case of English, partially
45: motivated by Message Understanding Conferences
46: (MUCs)~\cite{grishman96}, a number of coreference resolution methods
47: have been proposed.
48:
49: In other languages such as Japanese and Spanish, anaphoric expressions
50: are often omitted. Ellipses related to obligatory cases are usually
51: termed zero pronouns. Since zero pronouns are not expressed in
52: discourse, they have to be detected prior to identifying their
antecedents. Thus, the process of analyzing Japanese zero pronouns
differs from general coreference resolution in English, although
English has a related problem: pleonastic pronouns have to be
distinguished from anaphoric ones prior to resolution.
57:
58: For identifying anaphoric relations, existing methods are classified
59: into two fundamental approaches: rule-based and statistical approaches.
60:
61: In rule-based
62: approaches~\cite{gros95,hobbs78,mitk98,naka96-2,okum96,palomar01,walk94},
63: anaphoric relations between anaphors and their antecedents are
64: identified by way of hand-crafted rules, which typically rely on
65: syntactic structures, gender/number agreement, and selectional
66: restrictions. However, it is difficult to produce rules exhaustively,
67: and rules that are developed for a specific language are not
68: necessarily effective for other languages. For example, gender/number
69: agreement in English cannot be applied to Japanese.
70:
71: Statistical approaches~\cite{aone95,ge98,kim95,soon01} use statistical
72: models produced based on corpora annotated with anaphoric relations.
73: However, only a few attempts have been made in corpus-based anaphora
74: resolution for Japanese zero pronouns. One of the reasons is that it
75: is costly to produce a sufficient volume of training corpora annotated
76: with anaphoric relations.
77:
In addition, the above methods focus mainly on identifying
antecedents, and few attempts have been made to detect zero pronouns.
80:
81: Motivated by the above background, we propose a probabilistic model
82: for analyzing Japanese zero pronouns combined with a detection
83: method. In brief, our model consists of two parameters associated with
zero pronoun detection and antecedent identification. We focus on zero
pronouns whose antecedents appear in the preceding context, because
they are the major referential expressions in Japanese.
87:
88: Section~\ref{sec:proposed approach} explains our proposed method
89: (system) for analyzing Japanese zero pronouns.
90: Section~\ref{sec:evaluation} evaluates our method by way of
91: experiments using newspaper articles. Section~\ref{sec:related works}
92: discusses related research literature.
93:
94: \section{A System for Analyzing Japanese Zero Pronouns}
95: \label{sec:proposed approach}
96:
97: \subsection{Overview}
98: \label{sec:overview}
99:
100: Figure~\ref{fig:overview} depicts the overall design of our system to
101: analyze Japanese zero pronouns. We explain the entire process based on
102: this figure.
103:
First, given an input Japanese text, our system performs morphological
and syntactic analyses. Because Japanese sentences lack lexical
segmentation, morphological analysis involves word segmentation as
well as part-of-speech tagging, for which we use the JUMAN
morphological analyzer~\cite{juman98e}. Then, we use the KNP
parser~\cite{knp98e} to identify syntactic relations between segmented
words.
111:
112: Second, in a zero pronoun detection phase, the system uses syntactic
113: relations to detect omitted cases (nominative, accusative, and dative)
as zero pronoun candidates. To avoid overdetecting zero pronouns, we
use the IPAL verb dictionary~\cite{ipal87e}, which includes case
frames associated with 911 Japanese verbs. We discard zero pronoun
candidates that are not listed in the case frames associated with the
verb in question.
118:
119: For verbs unlisted in the IPAL dictionary, only nominative cases are
120: regarded as obligatory. The system also computes a probability that
121: case $c$ related to target verb $v$ is a zero pronoun,
122: $P_{zero}(c|v)$, to select plausible zero pronoun candidates.
123:
124: Ideally, in the case where a verb in question is polysemous, word
125: sense disambiguation is needed to select the appropriate case frame,
126: because different verb senses often correspond to different case
frames. However, we currently merge multiple case frames for a verb
into a single frame so as to sidestep the polysemy problem. This issue
129: needs to be further explored.
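
The detection step described above can be sketched as follows. This
is an illustrative sketch only, not the implementation evaluated in
this paper: the dictionary entries are invented toy data rather than
IPAL content, and the names \texttt{CASE\_FRAMES},
\texttt{merged\_frame}, and \texttt{zero\_pronoun\_candidates} are
hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical case-frame dictionary: verb -> one case frame per verb sense.
# The entries are made up for illustration; they are NOT taken from IPAL.
CASE_FRAMES: Dict[str, List[Set[str]]] = {
    "taberu": [{"ga", "wo"}],                       # "eat"
    "ataeru": [{"ga", "wo", "ni"}, {"ga", "ni"}],   # "give" (two toy senses)
}

# For verbs that are not in the dictionary, only the nominative is obligatory.
DEFAULT_OBLIGATORY: Set[str] = {"ga"}


def merged_frame(verb: str) -> Set[str]:
    """Union of all case frames of a verb (no word sense disambiguation)."""
    frames = CASE_FRAMES.get(verb)
    if frames is None:
        return set(DEFAULT_OBLIGATORY)
    merged: Set[str] = set()
    for frame in frames:
        merged |= frame
    return merged


def zero_pronoun_candidates(verb: str, filled_cases: Iterable[str]) -> Set[str]:
    """Cases listed for the verb that are not expressed in the parsed sentence."""
    return merged_frame(verb) - set(filled_cases)
```

For instance, \texttt{zero\_pronoun\_candidates("ataeru", ["ga"])}
yields the accusative and dative cases as candidates, whereas an
unlisted verb can only yield the nominative case.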
130:
131: Third, in a zero pronoun resolution (i.e., antecedent identification)
132: phase, for each zero pronoun the system extracts antecedent candidates
133: from the preceding contexts, which are ordered according to the extent
134: to which they can be the antecedent for the target zero pronoun. From
135: the viewpoint of probability theory, our task here is to compute a
136: probability that zero pronoun $\phi$ refers to antecedent $a_i$,
137: $P(a_i|\phi)$, and select the candidate that maximizes the probability
138: score. For the purpose of computing this score, we model zero pronouns
139: and antecedents in Section~\ref{sec:features}.
140:
141: Finally, the system outputs texts containing anaphoric relations. In
142: addition, the number of zero pronouns analyzed by the system can
143: optionally be controlled based on the certainty score described in
144: Section~\ref{sec:certainty}.
145:
146: \begin{figure}[tb]
147: \begin{center}
148: \includegraphics[scale=0.9]{overview2.eps}
149: \caption{The overall design of our system to analyze Japanese zero pronouns.}
150: \label{fig:overview}
151: \end{center}
152: \end{figure}
153:
154: \subsection{Modeling Zero Pronouns and Antecedents}
155: \label{sec:features}
156:
157: According to past literature associated with zero pronoun resolution
158: and our preliminary study, we use the following six features to model
159: zero pronouns and antecedents.
160:
161: \vspace{3mm}
162: \noindent
163: $\bullet$ Features for zero pronouns
164: \begin{itemize}
165: \item[--] Verbs that govern zero pronouns ($v$), which denote verbs
166: whose cases are omitted.
167:
168: \item[--] Surface cases related to zero pronouns ($c$), for which
169: possible values are Japanese case marker suffixes, {\it ga\/}
170: (nominative), {\it wo\/} (accusative), and {\it ni\/} (dative). Those
171: values indicate which cases are omitted.
172: \end{itemize}
173: \noindent
174: $\bullet$ Features for antecedents
175: \begin{itemize}
176: \item[--] Post-positional particles ($p$), which play crucial roles in
177: resolving Japanese zero pronouns~\cite{kame86,walk94}.
178:
179: \item[--] Distance ($d$), which denotes the distance (proximity)
180: between a zero pronoun and an antecedent candidate in an input text. In
the case where they occur in the same sentence, $d$ takes the value
$0$. In the case where an antecedent occurs $n$ sentences before the
sentence including a zero pronoun, $d$ takes the value $n$.
184:
185: \item[--] Constraint related to relative clauses ($r$), which denotes
186: whether an antecedent is included in a relative clause or not. In
187: the case where it is included, the value of $r$ takes \textit{true},
188: otherwise \textit{false}. The rationale behind this feature is that
189: Japanese zero pronouns tend {\em not} to refer to noun phrases in
190: relative clauses.
191:
192: \item[--] Semantic classes ($n$), which represent semantic classes
193: associated with antecedents. We use 544 semantic classes defined in
194: the Japanese \textit{Bunruigoihyou} thesaurus~\cite{koku64e}, which
195: contains 55,443 Japanese nouns.
196:
197: \end{itemize}
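
One possible way to hold these six features in code is sketched
below. The class and field names are hypothetical and chosen only for
readability; the \texttt{surface} field is an extra illustrative
attribute that is not one of the six features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZeroPronoun:
    verb: str  # v: verb whose case is omitted
    case: str  # c: "ga" (nominative), "wo" (accusative), or "ni" (dative)


@dataclass(frozen=True)
class AntecedentCandidate:
    surface: str              # the candidate noun phrase (illustrative extra field)
    particle: str             # p: post-positional particle following the candidate
    distance: int             # d: 0 = same sentence, n = n sentences earlier
    in_relative_clause: bool  # r: True if the candidate is inside a relative clause
    semantic_class: str       # n: semantic class of the candidate


def sentence_distance(zero_pronoun_sentence: int, candidate_sentence: int) -> int:
    """Distance d between sentence indices; candidates must not follow the zero pronoun."""
    if candidate_sentence > zero_pronoun_sentence:
        raise ValueError("antecedent candidates must come from the preceding context")
    return zero_pronoun_sentence - candidate_sentence
```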
198:
199: \subsection{Our Probabilistic Model for Zero Pronoun Detection and Resolution}
200: \label{sec:probabilistic model}
201:
202: We consider probabilities that unsatisfied case $c$ related to verb
203: $v$ is a zero pronoun, $P_{zero}(c|v)$, and that zero pronoun $\phi_c$
204: refers to antecedent $a_i$, $P(a_i|\phi_c)$. Thus, a probability that
205: case $c$ ($\phi_c$) is zero-pronominalized and refers to candidate
206: $a_i$ is formalized as in Equation~(\ref{eq:product}).
207: \begin{eqnarray}
208: \label{eq:product}
209: P(a_i|\phi_c)\cdot P_{zero}(c|v)
210: \end{eqnarray}
211: Here, $P_{zero}(c|v)$ and $P(a_i|\phi_c)$ are computed in the
212: detection and resolution phases, respectively (see
213: Figure~\ref{fig:overview}).
214:
215: Since zero pronouns are omitted obligatory cases, whether or not case
216: $c$ is a zero pronoun depends on the extent to which case $c$ is
217: obligatory for verb $v$. Case $c$ is likely to be obligatory for verb
218: $v$ if $c$ frequently co-occurs with $v$. Thus, we compute
219: $P_{zero}(c|v)$ based on the co-occurrence frequency of $\langle
220: v,c\rangle$ pairs, which can be extracted from unannotated corpora.
221: $P_{zero}(c|v)$ takes 1 in the case where $c$ is $ga$ (nominative)
222: regardless of the target verb, because $ga$ is obligatory for most
223: Japanese verbs.
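
As a rough sketch of how such a parameter could be estimated, the
code below uses the relative frequency with which a case is expressed
together with a verb. The exact estimator is not specified in the
text above, so the formula $F(v,c)/F(v)$, the zero value returned for
unseen verbs, and all function names are assumptions made only for
illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def build_counts(observations: Iterable[Tuple[str, Tuple[str, ...]]]):
    """Count verbs and <verb, case> pairs.

    Each observation is (verb, cases expressed with that verb occurrence).
    """
    verb_count: Counter = Counter()
    pair_count: Counter = Counter()
    for verb, cases in observations:
        verb_count[verb] += 1
        for case in set(cases):
            pair_count[(verb, case)] += 1
    return verb_count, pair_count


def p_zero(case: str, verb: str, verb_count: Counter, pair_count: Counter) -> float:
    """Assumed estimator of P_zero(c|v): F(v, c) / F(v), with ga fixed to 1."""
    if case == "ga":
        return 1.0  # nominative is treated as obligatory regardless of the verb
    if verb_count[verb] == 0:
        return 0.0  # assumption: no evidence for verbs unseen in the corpus
    return pair_count[(verb, case)] / verb_count[verb]
```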
224:
225: Given the formal representation for zero pronouns and antecedents in
Section~\ref{sec:features}, the probability, $P(a_i|\phi)$, is expressed
227: as in Equation~(\ref{eq:paz1}).
228: \begin{eqnarray}
229: \label{eq:paz1}
230: P(a_i|\phi) = P(p_i,d_i,r_i,n_i|v,c)
231: \end{eqnarray}
232:
233: \noindent
234: To improve the efficiency of probability estimation, we decompose the
235: right-hand side of Equation~(\ref{eq:paz1}) as follows.
236:
237: Since a preliminary study showed that $d_i$ and $r_i$ were relatively
238: independent of the other features, we approximate
239: Equation~(\ref{eq:paz1}) as in Equation~(\ref{eq:paz2}).
240: \begin{eqnarray}
241: \label{eq:paz2}
242: \begin{array}{@{}r@{~}c@{~}l@{}}
243: \vspace*{1mm}
244: P(a_i|\phi) & \approx & P(p_i,n_i|v,c)\cdot P(d_i)\cdot P(r_i)\\
245: \vspace*{1mm}
246: &=& P(p_i|n_i,v,c)\cdot P(n_i|v,c)\\
247: && \mbox{}\cdot P(d_i)\cdot P(r_i)
248: \end{array}
249: \end{eqnarray}
Assuming that $p_i$ is independent of $v$ and $n_i$ given $c$, we can further
251: approximate Equation~(\ref{eq:paz2}) to derive
252: Equation~(\ref{eq:paz}).
253: \begin{eqnarray}
254: \label{eq:paz}
255: P(a_i|\phi_c) \approx P(p_i|c)\!\cdot\! P(d_i)\!\cdot\! P(r_i)\!\cdot\! P(n_i|v,c)
256: \end{eqnarray}
257: Here, the first three factors, $P(p_i|c)\cdot P(d_i)\cdot P(r_i)$, are
258: related to syntactic properties, and $P(n_i|v,c)$ is a semantic
259: property associated with zero pronouns and antecedents. We shall call
260: the former and latter ``syntactic'' and ``semantic'' models,
261: respectively.
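
The factorization in Equation~(\ref{eq:paz}) can be illustrated with
a short sketch that multiplies the four factors and ranks candidates
by the result. All probability values below are invented toy numbers,
and the table and function names are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    particle: str             # p
    distance: int             # d
    in_relative_clause: bool  # r
    semantic_class: str       # n


# Toy probability tables (made-up values, for illustration only).
P_PARTICLE = {("wa", "ga"): 0.5, ("ga", "ga"): 0.3, ("wo", "ga"): 0.2}   # P(p|c)
P_DISTANCE = {0: 0.5, 1: 0.3, 2: 0.2}                                    # P(d)
P_RELATIVE = {False: 0.9, True: 0.1}                                     # P(r)
P_SEMANTIC = {("human", "taberu", "ga"): 0.8,
              ("food", "taberu", "ga"): 0.2}                             # P(n|v,c)


def resolution_score(cand: Candidate, verb: str, case: str) -> float:
    """P(p|c) * P(d) * P(r) * P(n|v,c); unseen events get probability 0 here."""
    syntactic = (P_PARTICLE.get((cand.particle, case), 0.0)
                 * P_DISTANCE.get(cand.distance, 0.0)
                 * P_RELATIVE.get(cand.in_relative_clause, 0.0))
    semantic = P_SEMANTIC.get((cand.semantic_class, verb, case), 0.0)
    return syntactic * semantic


def rank_candidates(cands: List[Candidate], verb: str, case: str) -> List[Candidate]:
    """Order candidates from most to least probable antecedent."""
    return sorted(cands, key=lambda c: resolution_score(c, verb, case), reverse=True)
```

Assigning probability zero to unseen events is a simplification of
this sketch; a practical system would presumably need some form of
smoothing.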
262:
Each parameter in Equation~(\ref{eq:paz}) is computed as in
Equation~(\ref{eq:ppc}), where $F(x)$ denotes the frequency of $x$ in
corpora annotated with anaphoric relations.
266: \begin{eqnarray}
267: \label{eq:ppc}
268: \begin{array}{rcl}
269: \vspace*{1mm}
270: P(p_i|c)&=&\displaystyle\frac{F(p_i,c)}{\sum_{j}F(p_j,c)}\\
271: \vspace*{1mm}
272: \label{eq:pd}
273: P(d_i)&=&\displaystyle\frac{F(d_i)}{\sum_{j}F(d_j)}\\
274: \vspace*{1mm}
275: \label{eq:pm}
276: P(r_i)&=&\displaystyle\frac{F(r_i)}{\sum_{j}F(r_j)}\\
277: \label{eq:pns}
278: P(n_i|v,c)&=&\displaystyle\frac{F(n_i,v,c)}{\sum_{j}F(n_j,v,c)}
279: \end{array}
280: \end{eqnarray}
However, since estimating the semantic model, $P(n_i|v,c)$, requires
large-scale annotated corpora, data sparseness is a serious problem.
Thus, we explore the use of unannotated corpora.
284:
285: For $P(n_i|v,c)$, $v$ and $c$ are features for a zero pronoun, and
286: $n_i$ is a feature for an antecedent. However, we can regard $v$, $c$,
287: and $n_i$ as features for a verb and its case noun because zero
288: pronouns are omitted case nouns. Thus, it is possible to estimate the
289: probability based on co-occurrences of verbs and their case nouns,
290: which can be extracted automatically from large-scale unannotated
291: corpora.
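
A minimal sketch of this relative-frequency estimation is given
below, assuming that co-occurrences are already available as
(semantic class, verb, case) triples. The function name and the input
format are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # (semantic class n, verb v, case c)


def estimate_semantic_model(triples: Iterable[Triple]) -> Dict[Triple, float]:
    """Estimate P(n|v,c) = F(n,v,c) / sum_j F(n_j,v,c) from co-occurrence triples."""
    joint: Counter = Counter()
    context: Counter = Counter()
    for n, v, c in triples:
        joint[(n, v, c)] += 1
        context[(v, c)] += 1
    return {(n, v, c): count / context[(v, c)] for (n, v, c), count in joint.items()}
```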
292:
293: \subsection{Computing Certainty Score}
294: \label{sec:certainty}
295:
296: Since zero pronoun analysis is not a stand-alone application, our
297: system is used as a module in other NLP applications, such as machine
298: translation. In those applications, it is desirable that erroneous
299: anaphoric relations are not generated. Thus, we propose a notion of
300: certainty to output only zero pronouns that are detected and resolved
301: with a high certainty score.
302:
303: We formalize the certainty score, $C(\phi_c)$, for each zero pronoun
304: as in Equation (\ref{eq:certainty}), where $P_1(\phi_c)$ and
305: $P_2(\phi_c)$ denote probabilities computed by
306: Equation~(\ref{eq:product}) for the first and second ranked
307: candidates, respectively. In addition, $t$ is a parametric constant,
308: which is experimentally set to $0.5$.
309: \begin{eqnarray}
310: \label{eq:certainty}
311: C(\phi_c) = t\!\cdot\! P_1(\phi_c)+(1\!-\!t)(P_1(\phi_c)\!-\!P_2(\phi_c))
312: \end{eqnarray}
The certainty score is high in the case where $P_1(\phi_c)$ is
sufficiently high and significantly greater than $P_2(\phi_c)$.
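
Equation~(\ref{eq:certainty}) translates directly into code. The
sketch below also shows how a threshold on the score could be used to
suppress uncertain outputs; the function names and the threshold
value in the usage note are illustrative assumptions.

```python
from typing import Dict, List


def certainty(p1: float, p2: float, t: float = 0.5) -> float:
    """C = t * P1 + (1 - t) * (P1 - P2) for the first- and second-ranked candidates."""
    return t * p1 + (1.0 - t) * (p1 - p2)


def filter_by_certainty(scores: Dict[str, List[float]], threshold: float,
                        t: float = 0.5) -> List[str]:
    """Keep zero pronouns whose certainty reaches the threshold.

    `scores` maps a zero-pronoun identifier to its candidate scores in
    descending order; a missing second candidate is treated as score 0
    (an assumption of this sketch).
    """
    kept = []
    for zp_id, ranked in scores.items():
        if not ranked:
            continue
        p1 = ranked[0]
        p2 = ranked[1] if len(ranked) > 1 else 0.0
        if certainty(p1, p2, t) >= threshold:
            kept.append(zp_id)
    return kept
```

With $t=0.5$, hypothetical scores $P_1=0.6$ and $P_2=0.2$ give a
certainty of about $0.5$, whereas two tied candidates with
$P_1=P_2=0.6$ give only $0.3$.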
315:
316: \section{Evaluation}
317: \label{sec:evaluation}
318:
319: \subsection{Methodology}
320: \label{sec:methodology}
321:
To investigate the performance of our system, we used the
\textit{Kyotodaigaku} Text Corpus version 2.0~\cite{kuro98}, in which
20,000 articles from the \textit{Mainichi Shimbun} newspaper in
1995 were analyzed by JUMAN and KNP (i.e., the morphological and
syntactic analyzers used in our system) and revised manually. From
this corpus, we
327: randomly selected 30 general articles (e.g., politics and sports) and
328: manually annotated those articles with anaphoric relations for zero
329: pronouns. The number of zero pronouns contained in those articles was
330: 449.
331:
332: We used a leave-one-out cross-validation evaluation method: we
333: conducted 30 trials in each of which one article was used as a test
334: input and the remaining 29 articles were used for producing a
syntactic model. We used six years' worth of \textit{Mainichi Shimbun}
336: newspaper articles~\cite{mainichi-e} to produce a semantic model based
337: on co-occurrences of verbs and their case nouns.
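
The leave-one-out procedure can be sketched as follows, with articles
represented by arbitrary Python objects. The function name is
hypothetical, and the model-building and testing steps are omitted.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

Article = TypeVar("Article")


def leave_one_out(articles: Sequence[Article]) -> Iterator[Tuple[Article, List[Article]]]:
    """Yield (test article, training articles) once for every article."""
    for i, test_article in enumerate(articles):
        training = list(articles[:i]) + list(articles[i + 1:])
        yield test_article, training
```

With 30 annotated articles, this yields 30 trials, each with one test
article and 29 training articles.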
338:
339: To extract verbs and their case noun pairs from newspaper articles, we
340: performed a morphological analysis by JUMAN and extracted dependency
341: relations using a relatively simple rule: we assumed that each noun
342: modifies the verb of highest proximity. As a result, we obtained 12
343: million co-occurrences associated with 6,194 verb types. Then, we
344: generalized the extracted nouns into semantic classes in the Japanese
345: {\it Bunruigoihyou\/} thesaurus. In the case where a noun was
346: associated with multiple classes, the noun was assigned to all
347: possible classes.
348: In the case where a noun was not listed in the thesaurus, the noun
349: itself was regarded as a single semantic class.
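
The extraction rule and the generalization into semantic classes can
be sketched as below. The token format, the toy thesaurus entry, and
the tie-breaking rule (preferring the following verb when two verbs
are equally close) are assumptions of this sketch and are not
specified above.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (surface form, coarse part of speech)
CASE_PARTICLES = {"ga", "wo", "ni"}

# Hypothetical thesaurus fragment: noun -> semantic classes (not real thesaurus data).
THESAURUS: Dict[str, List[str]] = {"ringo": ["food", "plant"]}


def extract_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (verb, case, noun) triples, attaching each case-marked noun
    to the verb of highest proximity."""
    verb_positions = [i for i, (_, pos) in enumerate(tokens) if pos == "verb"]
    triples = []
    for i, (surface, pos) in enumerate(tokens[:-1]):
        next_surface, next_pos = tokens[i + 1]
        if pos != "noun" or next_pos != "particle" or next_surface not in CASE_PARTICLES:
            continue
        if not verb_positions:
            continue
        # Nearest verb by token distance; ties go to the following verb (an assumption).
        nearest = min(verb_positions, key=lambda j: (abs(j - i), j < i))
        triples.append((tokens[nearest][0], next_surface, surface))
    return triples


def semantic_classes(noun: str) -> List[str]:
    """All classes of a listed noun; an unlisted noun forms its own class."""
    return THESAURUS.get(noun, [noun])
```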
350:
351: \begin{table*}[htb]
352: \begin{center}
353: \caption{Experimental results for zero pronoun resolution.}
354: \label{tab:riyousosei}
355: \footnotesize
356: \smallskip
357: \begin{tabular}{cr@{~}lccr@{~}lcc} \hline\hline
358: & \multicolumn{8}{c}{\# of Correct cases (Accuracy)} \\
359: \cline{2-9}
360: $k$ & \multicolumn{2}{c}{$Sem1$} & $Sem2$ & $Syn$ & \multicolumn{2}{c}{$Both1$} & $Both2$ & $Rule$\\
361: \hline
1 & 25 & (6.2\%) & 119 (29.5\%) & 185 (45.8\%) & 30 & (7.4\%) & {\bf 205 (50.7\%)} & 162 (40.1\%) \\
2 & 46 & (11.4\%) & 193 (47.8\%) & 227 (56.2\%) & 49 & (12.1\%) & {\bf 250 (61.9\%)} & 213 (52.7\%) \\
3 & 72 & (17.8\%) & 230 (56.9\%) & 262 (64.9\%) & 75 & (18.6\%) & {\bf 280 (69.3\%)} & 237 (58.6\%) \\
365: \hline
366: \end{tabular}
367: \end{center}
368: \end{table*}
369:
370: \subsection{Comparative Experiments}
371: \label{sec:results}
372:
373: Fundamentally, our evaluation is two-fold: we evaluated only zero
374: pronoun resolution (antecedent identification) and a combination of
375: detection and resolution. In the former case, we assumed that all the
376: zero pronouns are correctly detected, and investigated the
377: effectiveness of the resolution model, $P(a_i|\phi)$. In the latter
378: case, we investigated the effectiveness of the combined model,
379: $P(a_i|\phi_c)\cdot P_{zero}(c|v)$.
380:
381: First, we compared the performance of the following different models
382: for zero pronoun resolution, $P(a_i|\phi)$:
383: \begin{list}{$\bullet$}{\itemsep 3pt \parsep 0pt}
384: \item a semantic model produced based on annotated corpora ($Sem1$),
385: \item a semantic model produced based on unannotated corpora, using
386: co-occurrences of verbs and their case nouns ($Sem2$),
387: \item a syntactic model ($Syn$),
388: \item a combination of $Syn$ and $Sem1$ ($Both1$),
389: \item a combination of $Syn$ and $Sem2$ ($Both2$), which is our
390: complete model for zero pronoun resolution,
391: \item a rule-based model ($Rule$).
392: \end{list}
393: As a control (baseline) model, we took approximately two man-months to
394: develop a rule-based model ($Rule$) through an analysis on ten
395: articles in \textit{Kyotodaigaku} Text Corpus. This model uses rules
396: typically used in existing rule-based methods: 1) post-positional
397: particles that follow antecedent candidates, 2) proximity between zero
398: pronouns and antecedent candidates, and 3) conjunctive particles. We
399: did not use semantic properties in the rule-based method because they
400: decreased the system accuracy in a preliminary study.
401:
402: Table~\ref{tab:riyousosei} shows the results, where we regarded the
403: $k$-best antecedent candidates as the final output and compared
404: results for different values of $k$. In the case where the correct
405: answer was included in the $k$-best candidates, we judged it
406: correct. In addition, ``Accuracy'' is the ratio between the number of
407: zero pronouns whose antecedents were correctly identified and the
408: number of zero pronouns correctly detected by the system (404 for all
409: the models). Bold figures denote the highest performance for each
410: value of $k$ across different models. Here, the average number of
411: antecedent candidates per zero pronoun was 27 regardless of the model,
412: and thus the accuracy was 3.7\% in the case where the system randomly
413: selected antecedents.
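
As a worked example of these definitions, the accuracy of $Both2$
with $k=1$ in Table~\ref{tab:riyousosei} and the accuracy of random
selection among 27 candidates are obtained as follows.
\begin{eqnarray*}
\mbox{Accuracy}(Both2,\,k\!=\!1) &=& \frac{205}{404} \;\approx\; 50.7\% \\
\mbox{Accuracy}(\mbox{random}) &=& \frac{1}{27} \;\approx\; 3.7\%
\end{eqnarray*}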
414:
Looking at the results for the two semantic models, $Sem2$
outperformed $Sem1$, which indicates that the use of co-occurrences of
verbs and their case nouns was effective in identifying antecedents
and in avoiding the data sparseness problem in producing a semantic
model.
419:
The syntactic model, $Syn$, outperformed each of the two semantic
models, which indicates that the syntactic features used in our model
were more effective than the semantic features for identifying
antecedents. When both syntactic and semantic models were used in
424: $Both2$, the accuracy was further improved. While the rule-based
425: method, $Rule$, achieved a relatively high accuracy, our complete
426: model, $Both2$, outperformed $Rule$ irrespective of the value of $k$.
To sum up, we conclude that both syntactic and semantic models were
effective in identifying appropriate anaphoric relations.
429:
430: At the same time, since our method requires annotated corpora, the
431: relation between the corpus size and accuracy is crucial. Thus, we
432: performed two additional experiments associated with $Both2$.
433:
434: In the first experiment, we varied the number of annotated articles
435: used to produce a syntactic model, where a semantic model was produced
based on six years' worth of newspaper articles. In the second
437: experiment, we varied the number of unannotated articles used to
438: produce a semantic model, where a syntactic model was produced based
439: on 29 annotated articles. In Figure~\ref{fig:both}, we show two {\em
440: independent\/} results as space is limited: the dashed and solid
441: graphs correspond to the results of the first and second experiments,
442: respectively. Given all the articles for modeling, the resultant
443: accuracy for each experiment was 50.7\%, which corresponds to that for
444: $Both2$ with $k=1$ in Table~\ref{tab:riyousosei}.
445:
446: \begin{figure}[tb]
447: \begin{center}
448: \includegraphics[scale=0.62]{training-en.eps}
449: \caption{The relation between the corpus size and accuracy for a combination of syntactic and semantic models ($Both2$).}
450: \label{fig:both}
451: \end{center}
452: \end{figure}
453:
454: In the case where the number of articles was varied in producing a
455: syntactic model, the accuracy improved rapidly in the first five
456: articles. This indicates that a high accuracy can be obtained by a
457: relatively small number of supervised articles. In the case where the
458: amount of unannotated corpora was varied in producing a semantic
model, the accuracy improved marginally as the corpus size
increased. However, note that we do not need human supervision to
461: produce a semantic model.
462:
463: Finally, we evaluated the effectiveness of the combination of zero
464: pronoun detection and resolution in Equation~(\ref{eq:product}). To
465: investigate the contribution of the detection model, $P_{zero}(c|v)$,
466: we used $P(a_i|\phi_c)$ for comparison. Both cases used $Both2$ to
467: compute the probability for zero pronoun resolution. We varied a
468: threshold for the certainty score to plot coverage-accuracy graphs for
469: zero pronoun detection (Figure~\ref{fig:detection}) and antecedent
470: identification (Figure~\ref{fig:sikii}).
471:
472: \begin{figure}
473: \begin{center}
474: \includegraphics[scale=0.60]{recall-precisionAna-i.eps}
475: \caption{The relation between coverage and accuracy for zero pronoun detection ({\it Both\/}2).}
476: \label{fig:detection}
477: \end{center}
478: \end{figure}
479:
480: \begin{figure}
481: \begin{center}
482: \includegraphics[scale=0.60]{coverage-accuracy-i.eps}
483: \caption{The relation between coverage and accuracy for antecedent identification ({\it Both\/}2).}
484: \label{fig:sikii}
485: \end{center}
486: \end{figure}
487:
488: In Figure~\ref{fig:detection}, ``coverage'' is the ratio between the
489: number of zero pronouns correctly detected by the system and the total
490: number of zero pronouns in input texts, and ``accuracy'' is the ratio
491: between the number of zero pronouns correctly detected and the total
492: number of zero pronouns detected by the system. Note that since our
493: system failed to detect a number of zero pronouns, the coverage could
494: not be 100\%.
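
These two measures can be computed as in the following sketch, in
which each zero pronoun is identified by a hashable key such as a
(sentence, verb, case) tuple. The identifiers and the function name
are hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def detection_coverage_accuracy(gold: Iterable[Hashable],
                                detected: Iterable[Hashable]) -> Tuple[float, float]:
    """Return (coverage, accuracy) for zero pronoun detection.

    coverage = correctly detected / zero pronouns in the input texts
    accuracy = correctly detected / zero pronouns detected by the system
    """
    gold_set, detected_set = set(gold), set(detected)
    correct = len(gold_set & detected_set)
    coverage = correct / len(gold_set) if gold_set else 0.0
    accuracy = correct / len(detected_set) if detected_set else 0.0
    return coverage, accuracy
```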
495:
Figure~\ref{fig:detection} shows that as the coverage
decreased, the accuracy improved irrespective of the model used. When
compared with the case of $P(a_i|\phi_c)$, our model,
$P(a_i|\phi_c)\cdot P_{zero}(c|v)$, achieved a higher accuracy
regardless of the coverage.
500:
501: In Figure~\ref{fig:sikii}, ``coverage'' is the ratio between the
502: number of zero pronouns whose antecedents were generated and the
503: number of zero pronouns correctly detected by the system. The
accuracy was improved by decreasing the coverage, and our model
marginally improved the accuracy over $P(a_i|\phi_c)$.
506:
According to the above results, our model was effective in improving
the accuracy of zero pronoun detection and did not have a side effect
on the antecedent identification process. As a result, the overall
510: accuracy of zero pronoun detection and resolution was improved.
511:
512:
513: \section{Related Work}
514: \label{sec:related works}
515:
516: Kim and Ehara~\shortcite{kim95} proposed a probabilistic model to
resolve subject zero pronouns for the purpose of Japanese/English
518: machine translation. In their model, the search scope for possible
519: antecedents was limited to the sentence containing zero pronouns. In
contrast, our method can resolve zero pronouns in both
intra- and inter-sentential anaphora.
522:
523: Aone and Bennett~\shortcite{aone95} used a decision tree to determine
524: appropriate antecedents for zero pronouns. They focused on proper and
525: definite nouns used in anaphoric expressions as well as zero pronouns.
526: However, their method resolves only anaphors that refer to
527: organization names (e.g., private companies), which are generally
528: easier to resolve than our case.
529:
Both of the above methods require annotated corpora for statistical
531: modeling, while we used corpora with/without annotations related to
532: anaphoric relations, and thus we can easily obtain large-scale corpora
533: to avoid the data sparseness problem.
534:
535: Nakaiwa~\shortcite{nakaiwa00} used Japanese/English bilingual corpora
536: to identify anaphoric relations of Japanese zero pronouns by comparing
537: J/E sentence pairs. The rationale behind this method is that
538: obligatory cases zero-pronominalized in Japanese are usually expressed
in English. However, in the case where the corresponding English
expressions are pronouns or other anaphors, this method is not
effective. Additionally, bilingual corpora are more expensive to
obtain than the monolingual corpora used in our method.
543:
544: Finally, our method integrates a parameter for zero pronoun detection
545: in computing the certainty score. Thus, we can improve the accuracy of
546: our system by discarding extraneous outputs with a small certainty
547: score.
548:
549: \section{Conclusion}
550: \label{sec:conclusion}
551:
552: We proposed a probabilistic model to analyze Japanese zero pronouns
553: that refer to antecedents in the previous context. Our model consists
554: of two probabilistic parameters corresponding to detecting zero
555: pronouns and identifying their antecedents, respectively. The latter
556: is decomposed into syntactic and semantic properties. To estimate
557: those parameters efficiently, we used annotated/unannotated
558: corpora. In addition, we formalized the certainty score to improve the
accuracy. Through experiments, we showed that the use of unannotated
corpora was effective in avoiding the data sparseness problem and that
the certainty score further improved the accuracy.
562:
563: Future work would include word sense disambiguation for polysemous
564: predicate verbs to select appropriate case frames in the zero pronoun
565: detection process.
566:
567: \bibliographystyle{acl}
568: \small
569: \begin{thebibliography}{}
570:
571: \bibitem[\protect\citename{Aone and Bennett}1995]{aone95}
572: Chinatsu Aone and Scott~William Bennett.
573: \newblock 1995.
574: \newblock Evaluating automated and manual acquisition of anaphora resolution
575: strategies.
\newblock In {\em Proceedings of the 33rd Annual Meeting of the Association for
577: Computational Linguistics}, pages 122--129.
578:
579: \bibitem[\protect\citename{Ge \bgroup et al.\egroup }1998]{ge98}
580: Niyu Ge, John Hale, and Eugene Charniak.
581: \newblock 1998.
582: \newblock A statistical approach to anaphora resolution.
583: \newblock In {\em Proceedings of the Sixth Workshop on Very Large Corpora},
584: pages 161--170.
585:
586: \bibitem[\protect\citename{Grishman and Sundheim}1996]{grishman96}
587: Ralph Grishman and Beth Sundheim.
588: \newblock 1996.
589: \newblock Message {Understanding} {Conference} - 6: A brief history.
590: \newblock In {\em Proceedings of the 16th International Conference on
591: Computational Linguistics}, pages 466--471.
592:
593: \bibitem[\protect\citename{Grosz \bgroup et al.\egroup }1995]{gros95}
594: Barbara~J. Grosz, Aravind~K. Joshi, and Scott Weinstein.
595: \newblock 1995.
596: \newblock Centering: A framework for modeling the local coherence of discourse.
597: \newblock {\em Computational Linguistics}, 21(2):203--226.
598:
599: \bibitem[\protect\citename{Hobbs}1978]{hobbs78}
600: Jerry~R. Hobbs.
601: \newblock 1978.
602: \newblock Resolving pronoun references.
603: \newblock {\em Lingua}, 44:311--338.
604:
605: \bibitem[\protect\citename{{Information-technology Promotion
606: Agency}}1987]{ipal87e}
607: {Information-technology Promotion Agency}, 1987.
608: \newblock {\em IPA Lexicon of the {Japanese} language for computers (Basic
609: Verbs)}.
610: \newblock (in Japanese).
611:
612: \bibitem[\protect\citename{Kameyama}1986]{kame86}
613: Megumi Kameyama.
614: \newblock 1986.
615: \newblock A property-sharing constraint in centering.
616: \newblock In {\em Proceedings of the 24th Annual Meeting of the Association for
617: Computational Linguistics}, pages 200--206.
618:
619: \bibitem[\protect\citename{Kim and Ehara}1995]{kim95}
620: Yeun-Bae Kim and Terumasa Ehara.
621: \newblock 1995.
622: \newblock Zero-subject resolution method based on probabilistic inference with
623: evaluation function.
624: \newblock In {\em Proceedings of the 3rd Natural Language Processing
625: Pacific-Rim Symposium}, pages 721--727.
626:
627: \bibitem[\protect\citename{Kurohashi and Nagao}1998a]{kuro98}
628: Sadao Kurohashi and Makoto Nagao.
629: \newblock 1998a.
630: \newblock Building a {Japanese} parsed corpus while improving the parsing
631: system.
632: \newblock In {\em Proceedings of The 1st International Conference on Language
633: Resources \& Evaluation}, pages 719--724.
634:
635: \bibitem[\protect\citename{Kurohashi and Nagao}1998b]{juman98e}
636: Sadao Kurohashi and Makoto Nagao, 1998b.
637: \newblock {\em {Japanese} morphological analysis system {JUMAN} version 3.6
638: manual}.
639: \newblock Department of Informatics, Kyoto University.
640: \newblock (in Japanese).
641:
642: \bibitem[\protect\citename{Kurohashi}1998]{knp98e}
643: Sadao Kurohashi, 1998.
644: \newblock {\em Japanese Dependency/Case Structure Analyzer {KNP} version
645: 2.0b6}.
646: \newblock Department of Informatics, Kyoto University.
647: \newblock (in Japanese).
648:
649: \bibitem[\protect\citename{{Mainichi Shimbunsha}}1994--1999]{mainichi-e}
650: {Mainichi Shimbunsha}.
651: \newblock {1994--1999}.
652: \newblock {Mainichi Shimbun CD-ROM}.
653:
654: \bibitem[\protect\citename{Mitkov \bgroup et al.\egroup }1998]{mitk98}
655: Ruslan Mitkov, Lamia Belguith, and Malgorzata Stys.
656: \newblock 1998.
657: \newblock Multilingual robust anaphora resolution.
658: \newblock In {\em Proceedings of the 3rd Conference on Empirical Methods in
659: Natural Language Processing}, pages 7--16.
660:
661: \bibitem[\protect\citename{Nakaiwa and Shirai}1996]{naka96-2}
662: Hiromi Nakaiwa and Satoshi Shirai.
663: \newblock 1996.
664: \newblock Anaphora resolution of {Japanese} zero pronouns with deictic
665: reference.
666: \newblock In {\em Proceedings of the 16th International Conference on
667: Computational Linguistics}, pages 812--817.
668:
669: \bibitem[\protect\citename{Nakaiwa}2000]{nakaiwa00}
670: Hiromi Nakaiwa.
671: \newblock 2000.
672: \newblock An environment for extracting resolution rules of zero pronouns from
673: corpora.
674: \newblock In {\em COLING-2000 Workshop on Semantic Annotation and Intelligent
675: Content}, pages 44--52.
676:
677: \bibitem[\protect\citename{{National Language Research
678: Institute}}1964]{koku64e}
679: {National Language Research Institute}.
680: \newblock 1964.
681: \newblock {\em Bunruigoihyou}.
682: \newblock Shuei publisher.
683: \newblock (in Japanese).
684:
685: \bibitem[\protect\citename{Okumura and Tamura}1996]{okum96}
686: Manabu Okumura and Kouji Tamura.
687: \newblock 1996.
688: \newblock Zero pronoun resolution in {Japanese} discourse based on centering
689: theory.
690: \newblock In {\em Proceedings of the 16th International Conference on
691: Computational Linguistics}, pages 871--876.
692:
693: \bibitem[\protect\citename{Palomar \bgroup et al.\egroup }2001]{palomar01}
694: Manuel Palomar, Antonio Ferr\'{a}ndez, Lidia Moreno, Patricio
Mart\'{\i}nez-Barco, Jes\'{u}s Peral, Maximiliano Saiz-Noeda, and Rafael
Mu\~{n}oz.
697: \newblock 2001.
698: \newblock An algorithm for anaphora resolution in {Spanish} texts.
699: \newblock {\em Computational Linguistics}, 27(4):545--568.
700:
701: \bibitem[\protect\citename{Soon \bgroup et al.\egroup }2001]{soon01}
702: Wee~Meng Soon, Hwee~Tou Ng, and Daniel Chung~Yong Lim.
703: \newblock 2001.
704: \newblock A machine learning approach to coreference resolution of noun
705: phrases.
706: \newblock {\em Computational Linguistics}, 27(4):521--544.
707:
708: \bibitem[\protect\citename{Walker \bgroup et al.\egroup }1994]{walk94}
709: Marilyn Walker, Masayo Iida, and Sharon Cote.
710: \newblock 1994.
711: \newblock Japanese discourse and the process of centering.
712: \newblock {\em Computational Linguistics}, 20(2):193--233.
713:
714: \end{thebibliography}
715:
716: \end{document}
717: