cs0206030/main.tex
1: %
2: % COLING2002
3: % 
4: 
5: \documentclass[11pt]{article}
6: 
7: \usepackage{colacl}
8: \usepackage{graphicx}
9: 
10: \title{A Probabilistic Method for Analyzing Japanese Anaphora Integrating Zero Pronoun Detection and Resolution}
11: 
12: \author{Kazuhiro Seki$^{\,\dag}$, Atsushi Fujii$^{\,\dag\dag,\,\dag\dag\dag}$
13:   and Tetsuya Ishikawa$^{\,\dag\dag}$\\
14:   \dag National Institute of Advanced Industrial Science and Technology\\
15:   {\normalsize 1-1-1, Chuuou Daini Umezono, Tsukuba 305-8568, Japan}\\
16:   \dag\dag University of Library and Information Science \\
17:   {\normalsize 1-2, Kasuga, Tsukuba, 305-8550, Japan}\\
18:   \dag\dag\dag  CREST, Japan Science \& Technology Corporation \\
19:   {\normalsize\tt k.seki@aist.go.jp\ \ \ fujii@ulis.ac.jp\ \ \ ishikawa@ulis.ac.jp}}
20: 
21: \begin{document}
22: \maketitle
23: 
24: \begin{abstract}
25:   This paper proposes a method to analyze Japanese anaphora, in which
26:   zero pronouns (omitted obligatory cases) are used to refer to
27:   preceding entities (antecedents). Unlike the case of general
28:   coreference resolution, zero pronouns have to be detected prior to
29:   resolution because they are not expressed in discourse. Our method
30:   integrates two probability parameters to perform zero pronoun
31:   detection and resolution in a single framework. The first parameter
32:   quantifies the degree to which a given case is a zero pronoun. The
33:   second parameter quantifies the degree to which a given entity is
34:   the antecedent for a detected zero pronoun. To compute these
35:   parameters efficiently, we use corpora with/without annotations of
36:   anaphoric relations.  We show the effectiveness of our method by way
37:   of experiments.
38: \end{abstract}
39: 
40: \section{Introduction}
41: \label{sec:introduction}
42: 
43: Anaphora resolution is crucial in natural language processing (NLP),
44: particularly in discourse analysis. In the case of English, partially
45: motivated by Message Understanding Conferences
46: (MUCs)~\cite{grishman96}, a number of coreference resolution methods
47: have been proposed.
48: 
49: In other languages such as Japanese and Spanish, anaphoric expressions
50: are often omitted. Ellipses related to obligatory cases are usually
51: termed zero pronouns. Since zero pronouns are not expressed in
52: discourse, they have to be detected prior to identifying their
53: antecedents. Thus, although in English it must also be determined
54: whether pleonastic pronouns are anaphoric expressions prior to
55: resolution, the process of analyzing Japanese zero pronouns
56: differs from general coreference resolution in English.
57: 
58: For identifying anaphoric relations, existing methods are classified
59: into two fundamental approaches: rule-based and statistical approaches.
60: 
61: In rule-based
62: approaches~\cite{gros95,hobbs78,mitk98,naka96-2,okum96,palomar01,walk94},
63: anaphoric relations between anaphors and their antecedents are
64: identified by way of hand-crafted rules, which typically rely on
65: syntactic structures, gender/number agreement, and selectional
66: restrictions.  However, it is difficult to produce rules exhaustively,
67: and rules that are developed for a specific language are not
68: necessarily effective for other languages. For example, gender/number
69: agreement in English cannot be applied to Japanese.
70: 
71: Statistical approaches~\cite{aone95,ge98,kim95,soon01} use statistical
72: models produced based on corpora annotated with anaphoric relations.
73: However, only a few attempts have been made in corpus-based anaphora
74: resolution for Japanese zero pronouns. One of the reasons is that it
75: is costly to produce a sufficient volume of training corpora annotated
76: with anaphoric relations.
77: 
78: In addition, the above methods focus mainly on identifying
79: antecedents, and few attempts have been made to detect zero pronouns.
80: 
81: Motivated by the above background, we propose a probabilistic model
82: for analyzing Japanese zero pronouns combined with a detection
83: method. In brief, our model consists of two parameters associated with
84: zero pronoun detection and antecedent identification. We focus on zero
85: pronouns whose antecedents appear in the context preceding them,
86: because they are major referential expressions in Japanese.
87: 
88: Section~\ref{sec:proposed approach} explains our proposed method
89: (system) for analyzing  Japanese zero pronouns.
90: Section~\ref{sec:evaluation} evaluates our method by way of
91: experiments using newspaper articles.  Section~\ref{sec:related works}
92: discusses related research literature.
93: 
94: \section{A System for Analyzing Japanese Zero Pronouns}
95: \label{sec:proposed approach}
96: 
97: \subsection{Overview}
98: \label{sec:overview}
99: 
100: Figure~\ref{fig:overview} depicts the overall design of our system to
101: analyze Japanese zero pronouns. We explain the entire process based on
102: this figure.
103: 
104: First, given an input Japanese text, our system performs morphological
105: and syntactic analyses. In the case of Japanese, morphological
106: analysis involves word segmentation and part-of-speech tagging because
107: Japanese sentences lack lexical segmentation, for which we use the
108: JUMAN morphological analyzer~\cite{juman98e}. Then, we use the KNP
109: parser~\cite{knp98e} to identify syntactic relations between segmented
110: words.
111: 
112: Second, in a zero pronoun detection phase, the system uses syntactic
113: relations to detect omitted cases (nominative, accusative, and dative)
114: as zero pronoun candidates. To avoid overdetecting zero pronouns, we
115: use the IPAL verb dictionary~\cite{ipal87e}, which includes case frames
116: associated with 911 Japanese verbs. We discard zero pronoun candidates
117: unlisted in the case frames associated with the verb in question.
118: 
119: For verbs unlisted in the IPAL dictionary, only nominative cases are
120: regarded as obligatory. The system also computes a probability that
121: case $c$ related to target verb $v$ is a zero pronoun,
122: $P_{zero}(c|v)$, to select plausible zero pronoun candidates.
123: 
124: Ideally, in the case where a verb in question is polysemous, word
125: sense disambiguation is needed to select the appropriate case frame,
126: because different verb senses often correspond to different case
127: frames. However, we currently merge multiple case frames for a verb
128: into a single frame to sidestep the polysemy problem.  This issue
129: needs to be further explored.
130: 
131: Third, in a zero pronoun resolution  (i.e., antecedent identification)
132: phase, for each zero pronoun the system extracts antecedent candidates
133: from the preceding contexts, which are ordered according to the extent
134: to which they can be the antecedent for the target zero pronoun. From
135: the viewpoint of probability theory, our task here is to compute a
136: probability that zero pronoun $\phi$ refers to antecedent $a_i$,
137: $P(a_i|\phi)$, and select the candidate that maximizes the probability
138: score. For the purpose of computing this score, we model zero pronouns
139: and antecedents in Section~\ref{sec:features}.
140: 
141: Finally, the system outputs texts containing anaphoric relations.  In
142: addition, the number of zero pronouns analyzed by the system can
143: optionally be controlled based on the certainty score described in
144: Section~\ref{sec:certainty}.
145: 
146: \begin{figure}[tb]
147: \begin{center}
148: \includegraphics[scale=0.9]{overview2.eps}
149: \caption{The overall design of our system to analyze Japanese zero pronouns.}
150: \label{fig:overview}
151: \end{center}
152: \end{figure}
153: 
154: \subsection{Modeling Zero Pronouns and Antecedents}
155: \label{sec:features}
156: 
157: According to past literature associated with zero pronoun resolution
158: and our preliminary study, we use the following six features to model
159: zero pronouns and antecedents.
160: 
161: \vspace{3mm}
162: \noindent
163: $\bullet$ Features for zero pronouns
164: \begin{itemize}
165: \item[--] Verbs that govern zero pronouns ($v$), which denote verbs
166:   whose cases are omitted.
167:   
168: \item[--] Surface cases related to zero pronouns ($c$), for which
169:   possible values are Japanese case marker suffixes, {\it ga\/}
170:   (nominative), {\it wo\/} (accusative), and {\it ni\/} (dative).  Those
171:   values indicate which cases are omitted.
172: \end{itemize}
173: \noindent
174: $\bullet$ Features for antecedents
175: \begin{itemize}
176: \item[--] Post-positional particles ($p$), which play crucial roles in
177:   resolving Japanese zero pronouns~\cite{kame86,walk94}.
178:   
179: \item[--] Distance ($d$), which denotes the distance (proximity)
180:   between a zero pronoun and an antecedent candidate in an input text. In
181:   the case where they occur in the same sentence, $d$ takes the value $0$.
182:   In the case where an antecedent occurs $m$ sentences before
183:   the sentence including a zero pronoun, $d$ takes the value $m$.
184:     
185: \item[--] Constraint related to relative clauses ($r$), which denotes
186:   whether an antecedent is included in a relative clause or not. In
187:   the case where it is included, $r$ takes the value \textit{true},
188:   and otherwise \textit{false}.  The rationale behind this feature is that
189:   Japanese zero pronouns tend {\em not} to refer to noun phrases in
190:   relative clauses.
191:     
192: \item[--] Semantic classes ($n$), which represent semantic classes
193:   associated with antecedents.  We use 544 semantic classes defined in
194:   the Japanese \textit{Bunruigoihyou} thesaurus~\cite{koku64e}, which
195:   contains 55,443 Japanese nouns.
196:     
197: \end{itemize}
198: 
199: \subsection{Our Probabilistic Model for Zero Pronoun Detection and Resolution}
200: \label{sec:probabilistic model}
201: 
202: We consider probabilities that unsatisfied case $c$ related to verb
203: $v$ is a zero pronoun, $P_{zero}(c|v)$, and that zero pronoun $\phi_c$
204: refers to antecedent $a_i$, $P(a_i|\phi_c)$. Thus, a probability that
205: case $c$ ($\phi_c$) is zero-pronominalized and refers to candidate
206: $a_i$ is formalized as in Equation~(\ref{eq:product}).
207: \begin{eqnarray}
208:   \label{eq:product}
209:   P(a_i|\phi_c)\cdot P_{zero}(c|v)
210: \end{eqnarray}
211: Here, $P_{zero}(c|v)$ and $P(a_i|\phi_c)$ are computed in the
212: detection and resolution phases, respectively (see
213: Figure~\ref{fig:overview}).
214: 
215: Since zero pronouns are omitted obligatory cases, whether or not case
216: $c$ is a zero pronoun depends on the extent to which case $c$ is
217: obligatory for verb $v$. Case $c$ is likely to be obligatory for verb
218: $v$ if $c$ frequently co-occurs with $v$. Thus, we compute
219: $P_{zero}(c|v)$ based on the co-occurrence frequency of $\langle
220: v,c\rangle$ pairs, which can be extracted from unannotated corpora.
221: $P_{zero}(c|v)$ is set to 1 in the case where $c$ is {\it ga\/} (nominative)
222: regardless of the target verb, because {\it ga\/} is obligatory for most
223: Japanese verbs.
224: 
225: Given the formal representation for zero pronouns and antecedents in
226: Section~\ref{sec:features}, the probability, $P(a_i|\phi)$, is expressed
227: as in Equation~(\ref{eq:paz1}).
228: \begin{eqnarray}
229:   \label{eq:paz1}
230:   P(a_i|\phi) = P(p_i,d_i,r_i,n_i|v,c)
231: \end{eqnarray}
232: 
233: \noindent
234: To improve the efficiency of probability estimation, we decompose the
235: right-hand side of Equation~(\ref{eq:paz1}) as follows.
236: 
237: Since a preliminary study showed that $d_i$ and $r_i$ were relatively
238: independent of the other features, we approximate
239: Equation~(\ref{eq:paz1}) as in Equation~(\ref{eq:paz2}).
240: \begin{eqnarray}
241:   \label{eq:paz2}
242:   \begin{array}{@{}r@{~}c@{~}l@{}}
243:     \vspace*{1mm}
244:     P(a_i|\phi) & \approx & P(p_i,n_i|v,c)\cdot P(d_i)\cdot P(r_i)\\
245:     \vspace*{1mm}
246:     &=& P(p_i|n_i,v,c)\cdot P(n_i|v,c)\\
247:     && \mbox{}\cdot P(d_i)\cdot P(r_i)
248:   \end{array}
249: \end{eqnarray}
250: Assuming that $p_i$ is independent of $v$ and $n_i$, we can further
251: approximate Equation~(\ref{eq:paz2}) to derive
252: Equation~(\ref{eq:paz}).
253: \begin{eqnarray}
254:   \label{eq:paz}
255:   P(a_i|\phi_c) \approx P(p_i|c)\!\cdot\! P(d_i)\!\cdot\! P(r_i)\!\cdot\! P(n_i|v,c)
256: \end{eqnarray}
257: Here, the first three factors, $P(p_i|c)\cdot P(d_i)\cdot P(r_i)$, are
258: related to syntactic properties, and $P(n_i|v,c)$ is a semantic
259: property associated with zero pronouns and antecedents. We shall call
260: the former and latter ``syntactic'' and ``semantic'' models,
261: respectively.
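As an illustration with hypothetical values (chosen only for
exposition and not taken from our experiments), suppose that a
candidate $a_1$ has $P(p_1|c)=0.4$, $P(d_1)=0.5$, $P(r_1)=0.9$, and
$P(n_1|v,c)=0.1$. Equation~(\ref{eq:paz}) then gives
\[
  P(a_1|\phi_c) \approx 0.4 \times 0.5 \times 0.9 \times 0.1 = 0.018 .
\]
Under the same assumptions, a competing candidate $a_2$ with identical
syntactic factors but $P(n_2|v,c)=0.05$ would score $0.009$ and would
therefore be ranked below $a_1$.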
262: 
263: Each parameter in Equation~(\ref{eq:paz}) is computed as in
264: Equation~(\ref{eq:ppc}), where $F(x)$ denotes the frequency of $x$ in corpora
265: annotated with anaphoric relations.
266: \begin{eqnarray}
267:   \label{eq:ppc}
268:   \begin{array}{rcl}
269:     \vspace*{1mm}
270:     P(p_i|c)&=&\displaystyle\frac{F(p_i,c)}{\sum_{j}F(p_j,c)}\\
271:     \vspace*{1mm}
272:     \label{eq:pd}
273:     P(d_i)&=&\displaystyle\frac{F(d_i)}{\sum_{j}F(d_j)}\\
274:     \vspace*{1mm}
275:     \label{eq:pm}
276:     P(r_i)&=&\displaystyle\frac{F(r_i)}{\sum_{j}F(r_j)}\\
277:     \label{eq:pns}
278:     P(n_i|v,c)&=&\displaystyle\frac{F(n_i,v,c)}{\sum_{j}F(n_j,v,c)}
279:   \end{array}
280: \end{eqnarray}
281: However, since estimating a semantic model, $P(n_i|v,c)$, requires
282: large-scale annotated corpora, the data sparseness problem is severe.
283: Thus, we explore the use of unannotated corpora.
284: 
285: For $P(n_i|v,c)$, $v$ and $c$ are features for a zero pronoun, and
286: $n_i$ is a feature for an antecedent. However, we can regard $v$, $c$,
287: and $n_i$ as features for a verb and its case noun because zero
288: pronouns are omitted case nouns. Thus, it is possible to estimate the
289: probability based on co-occurrences of verbs and their case nouns,
290: which can be extracted automatically from large-scale unannotated
291: corpora.
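For example, with hypothetical counts used purely for illustration,
suppose that nouns co-occur with the pair $\langle v,c\rangle$ 200
times in the unannotated corpora and that 50 of those nouns belong to
semantic class $n_1$. The relative-frequency estimate for the semantic
model then yields
\[
  P(n_1|v,c) = \frac{F(n_1,v,c)}{\sum_{j}F(n_j,v,c)}
             = \frac{50}{200} = 0.25 .
\]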
292: 
293: \subsection{Computing Certainty Score}
294: \label{sec:certainty}
295: 
296: Since zero pronoun analysis is not a stand-alone application, our
297: system is used as a module in other NLP applications, such as machine
298: translation. In those applications, it is desirable that erroneous
299: anaphoric relations are not generated. Thus, we propose a notion of
300: certainty to output only zero pronouns that are detected and resolved
301: with a high certainty score.
302: 
303: We formalize the certainty score, $C(\phi_c)$, for each zero pronoun
304: as in Equation (\ref{eq:certainty}), where $P_1(\phi_c)$ and
305: $P_2(\phi_c)$ denote probabilities computed by
306: Equation~(\ref{eq:product}) for the first and second ranked
307: candidates, respectively. In addition, $t$ is a parametric constant,
308: which is experimentally set to $0.5$.
309: \begin{eqnarray}
310:   \label{eq:certainty}
311:   C(\phi_c) = t\!\cdot\! P_1(\phi_c)+(1\!-\!t)(P_1(\phi_c)\!-\!P_2(\phi_c))
312: \end{eqnarray}
313: The certainty score is high in the case where $P_1(\phi_c)$ is
314: sufficiently high and significantly higher than $P_2(\phi_c)$.
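Note that Equation~(\ref{eq:certainty}) simplifies algebraically to
$C(\phi_c)=P_1(\phi_c)-(1-t)\,P_2(\phi_c)$. As an illustration with
hypothetical probabilities (not taken from our experiments) and
$t=0.5$, consider $P_1(\phi_c)=0.6$ with either $P_2(\phi_c)=0.2$ or
$P_2(\phi_c)=0.5$:
\begin{eqnarray*}
  C(\phi_c) &=& 0.5\cdot 0.6 + 0.5\cdot(0.6-0.2) = 0.5,\\
  C(\phi_c) &=& 0.5\cdot 0.6 + 0.5\cdot(0.6-0.5) = 0.35 .
\end{eqnarray*}
The top-ranked candidate has the same probability in both cases, but
the score is lower when the second-ranked candidate is a close
competitor.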
315: 
316: \section{Evaluation}
317: \label{sec:evaluation}
318: 
319: \subsection{Methodology}
320: \label{sec:methodology}
321: 
322: To investigate the performance of our system, we used
323: \textit{Kyotodaigaku} Text Corpus version 2.0~\cite{kuro98}, in which
324: 20,000 articles in \textit{Mainichi Shimbun} newspaper articles in
325: 1995 were analyzed by JUMAN and KNP (i.e., the morph/syntax analyzers
326: used in our system) and revised manually. From this corpus, we
327: randomly selected 30 general articles (e.g., politics and sports) and
328: manually annotated those articles with anaphoric relations for zero
329: pronouns. The number of zero pronouns contained in those articles was
330: 449.
331: 
332: We used a leave-one-out cross-validation evaluation method: we
333: conducted 30 trials in each of which  one article was used as a test
334: input and the remaining 29 articles were used for producing a
335: syntactic model. We used six years' worth of \textit{Mainichi Shimbun}
336: newspaper articles~\cite{mainichi-e} to produce a semantic model based
337: on co-occurrences of verbs and their case nouns.
338: 
339: To extract verbs and their case noun pairs from newspaper articles, we
340: performed a morphological analysis by JUMAN and extracted dependency
341: relations using a relatively simple rule: we assumed that each noun
342: modifies its nearest verb.  As a result, we obtained 12
343: million co-occurrences associated with 6,194 verb types. Then, we
344: generalized the extracted nouns into semantic classes in the Japanese
345: {\it Bunruigoihyou\/} thesaurus. In the case where a noun was
346: associated with multiple classes, the noun was assigned to all
347: possible classes.
348: In the case where a noun was not listed in the thesaurus, the noun
349: itself was regarded as a single semantic class.
350: 
351: \begin{table*}[htb]
352:   \begin{center}
353:     \caption{Experimental results for zero pronoun resolution.}
354:     \label{tab:riyousosei}
355:     \footnotesize
356:     \smallskip
357:     \begin{tabular}{cr@{~}lccr@{~}lcc} \hline\hline
358:       & \multicolumn{8}{c}{\# of Correct cases (Accuracy)} \\
359:       \cline{2-9}
360:       $k$ & \multicolumn{2}{c}{$Sem1$} & $Sem2$ & $Syn$ & \multicolumn{2}{c}{$Both1$} & $Both2$ & $Rule$\\
361:       \hline
362:       1 & 25 & (6.2\%)  & 119 (29.5\%) & 185 (45.8\%) & 30 & (7.4\%) & \textbf{205 (50.7\%)} & 162 (40.1\%) \\
363:       2 & 46 & (11.4\%) & 193 (47.8\%) & 227 (56.2\%) & 49 & (12.1\%) & \textbf{250 (61.9\%)} & 213 (52.7\%) \\
364:       3 & 72 & (17.8\%) & 230 (56.9\%) & 262 (64.9\%) & 75 & (18.6\%) & \textbf{280 (69.3\%)} & 237 (58.6\%) \\
365:       \hline
366:     \end{tabular}
367:   \end{center}
368: \end{table*}
369: 
370: \subsection{Comparative Experiments}
371: \label{sec:results}
372: 
373: Our evaluation is two-fold: we evaluated zero pronoun
374: resolution (antecedent identification) alone and a combination of
375: detection and resolution. In the former case, we assumed that all the
376: zero pronouns are correctly detected, and investigated the
377: effectiveness of the resolution model, $P(a_i|\phi)$. In the latter
378: case, we investigated the effectiveness of the combined model,
379: $P(a_i|\phi_c)\cdot P_{zero}(c|v)$.
380: 
381: First, we compared the performance of the following different models
382: for zero pronoun resolution, $P(a_i|\phi)$:
383: \begin{list}{$\bullet$}{\itemsep 3pt \parsep 0pt}
384: \item a semantic model produced based on annotated corpora ($Sem1$),
385: \item a semantic model produced based on unannotated corpora, using
386:   co-occurrences of verbs and their case nouns ($Sem2$),
387: \item a syntactic model ($Syn$),
388: \item a combination of $Syn$ and $Sem1$ ($Both1$),
389: \item a combination of $Syn$ and $Sem2$ ($Both2$), which is our
390:   complete model for zero pronoun resolution,
391: \item a rule-based model ($Rule$).
392: \end{list}
393: As a control (baseline) model, we took approximately two man-months to
394: develop a rule-based model ($Rule$) through an analysis of ten
395: articles in the \textit{Kyotodaigaku} Text Corpus. This model uses rules
396: typically used in existing rule-based methods: 1) post-positional
397: particles that follow antecedent candidates, 2) proximity between zero
398: pronouns and antecedent candidates, and 3) conjunctive particles. We
399: did not use semantic properties in the rule-based method because they
400: decreased the system accuracy in a preliminary study.
401: 
402: Table~\ref{tab:riyousosei} shows the results, where we regarded the
403: $k$-best antecedent candidates as the final output and compared
404: results for different values of $k$.  In the case where the correct
405: answer was included in the $k$-best candidates, we judged it
406: correct. In addition, ``Accuracy'' is the ratio between the number of
407: zero pronouns whose antecedents were correctly identified and the
408: number of zero pronouns correctly detected by the system (404 for all
409: the models). Bold figures denote the highest performance for each
410: value of $k$ across different models. Here, the average number of
411: antecedent candidates per zero pronoun was 27 regardless of the model,
412: and thus the accuracy was 3.7\% in the case where the system randomly
413: selected antecedents.
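Concretely, the random baseline and, for instance, the accuracy of
$Both2$ with $k=1$ in Table~\ref{tab:riyousosei} follow from
\[
  \frac{1}{27} \approx 0.037 = 3.7\%, \qquad
  \frac{205}{404} \approx 0.507 = 50.7\% .
\]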
414: 
415: Looking at the results for the two semantic models, $Sem2$
416: outperformed $Sem1$, which indicates that the use of co-occurrences of
417: verbs and their case nouns was effective in identifying antecedents and
418: avoiding the data sparseness problem in producing a semantic model.
419: 
420: The syntactic model, $Syn$, outperformed each of the two semantic
421: models, which suggests that the syntactic features used in our model
422: were more effective than the semantic features for identifying
423: antecedents. When both syntactic and semantic models were used in
424: $Both2$, the accuracy was further improved.  While the rule-based
425: method, $Rule$, achieved a relatively high accuracy, our complete
426: model, $Both2$, outperformed $Rule$ irrespective of the value of $k$.
427: To sum up, we conclude that both syntactic and semantic models were
428: effective in identifying appropriate anaphoric relations.
429: 
430: At the same time, since our method requires annotated corpora, the
431: relation between the corpus size and accuracy is crucial. Thus, we
432: performed two additional experiments associated with $Both2$.
433: 
434: In the first experiment, we varied the number of annotated articles
435: used to produce a syntactic model, where a semantic model was produced
436: based on six years' worth of newspaper articles. In the second
437: experiment, we varied the number of unannotated articles used to
438: produce a semantic model, where a syntactic model was produced based
439: on 29 annotated articles. In Figure~\ref{fig:both}, we show two {\em
440:   independent\/} results as space is limited: the dashed and solid
441: graphs correspond to the results of the first and second experiments,
442: respectively. Given all the articles for modeling, the resultant
443: accuracy for each experiment was 50.7\%, which corresponds to that for
444: $Both2$ with $k=1$ in Table~\ref{tab:riyousosei}.
445: 
446: \begin{figure}[tb]
447: \begin{center}
448:   \includegraphics[scale=0.62]{training-en.eps}
449: \caption{The relation between the corpus size and accuracy for a combination of syntactic and semantic models ($Both2$).}
450: \label{fig:both}
451: \end{center}
452: \end{figure}
453: 
454: In the case where the number of articles was varied in producing a
455: syntactic model, the accuracy improved rapidly in the first five
456: articles. This indicates that a high accuracy can be obtained with a
457: relatively small number of annotated articles.  In the case where the
458: amount of unannotated corpora was varied in producing a semantic
459: model, the accuracy marginally improved as the corpus size
460: increased. However, note that we do not need human supervision to
461: produce a semantic model.
462: 
463: Finally, we evaluated the effectiveness of the combination of zero
464: pronoun detection and resolution in Equation~(\ref{eq:product}).  To
465: investigate the contribution of the detection model, $P_{zero}(c|v)$,
466: we used $P(a_i|\phi_c)$ for comparison. Both cases used $Both2$ to
467: compute the probability for zero pronoun resolution.  We varied a
468: threshold for the certainty score to plot coverage-accuracy graphs for
469: zero pronoun detection (Figure~\ref{fig:detection}) and antecedent
470: identification (Figure~\ref{fig:sikii}).
471: 
472: \begin{figure}
473: \begin{center}
474: \includegraphics[scale=0.60]{recall-precisionAna-i.eps}
475: \caption{The relation between coverage and accuracy for zero pronoun detection ($Both2$).}
476: \label{fig:detection}
477: \end{center}
478: \end{figure}
479: 
480: \begin{figure}
481: \begin{center}
482:   \includegraphics[scale=0.60]{coverage-accuracy-i.eps}
483: \caption{The relation between coverage and accuracy for antecedent identification ($Both2$).}
484: \label{fig:sikii}
485: \end{center}
486: \end{figure}
487: 
488: In Figure~\ref{fig:detection}, ``coverage'' is the ratio between the
489: number of zero pronouns correctly detected by the system and the total
490: number of zero pronouns in input texts, and ``accuracy'' is the ratio
491: between the number of zero pronouns correctly detected and the total
492: number of zero pronouns detected by the system. Note that since our
493: system failed to detect a number of zero pronouns, the coverage could
494: not be 100\%.
495: 
496: Figure~\ref{fig:detection} shows that as the coverage
497: decreased, the accuracy improved irrespective of the model used.  When
498: compared with the case of $P(a_i|\phi_c)$, our model, $P(a_i|\phi_c)\cdot
499: P_{zero}(c|v)$, achieved a higher accuracy regardless of the coverage.
500: 
501: In Figure~\ref{fig:sikii}, ``coverage'' is the ratio between the
502: number of zero pronouns whose antecedents were generated and the
503: number of zero pronouns correctly detected by the system.  The
504: accuracy was improved by decreasing the coverage, and our model
505: marginally improved the accuracy over that for $P(a_i|\phi_c)$ alone.
506: 
507: According to the above results, our model was effective in improving
508: the accuracy of zero pronoun detection and had no adverse effect
509: on the antecedent identification process. As a result, the overall
510: accuracy of zero pronoun detection and resolution was improved.
511: 
512: 
513: \section{Related Work}
514: \label{sec:related works}
515: 
516: Kim and Ehara~\shortcite{kim95} proposed a probabilistic model to
517: resolve subject zero pronouns for the purpose of Japanese/English
518: machine translation. In their model, the search scope for possible
519: antecedents was limited to the sentence containing zero pronouns.  In
520: contrast, our method can resolve zero pronouns in both
521: intra- and inter-sentential anaphora.
522: 
523: Aone and Bennett~\shortcite{aone95} used a decision tree to determine
524: appropriate antecedents for zero pronouns. They focused on proper and
525: definite nouns used in anaphoric expressions as well as zero pronouns.
526: However, their method resolves only anaphors that refer to
527: organization names (e.g., private companies), which are generally
528: easier to resolve than our case.
529: 
530: Both of the above methods require annotated corpora for statistical
531: modeling, while we used corpora with/without annotations related to
532: anaphoric relations, and thus we can easily obtain large-scale corpora
533: to avoid the data sparseness problem.
534: 
535: Nakaiwa~\shortcite{nakaiwa00} used Japanese/English bilingual corpora
536: to identify anaphoric relations of Japanese zero pronouns by comparing
537: J/E sentence pairs. The rationale behind this method is that
538: obligatory cases zero-pronominalized in Japanese are usually expressed
539: in English. However, in the case where corresponding English
540: expressions are pronouns and anaphors, this method is not
541: effective. Additionally, bilingual corpora are more expensive to
542: obtain than monolingual corpora used in our method.
543: 
544: Finally, our method integrates a parameter for zero pronoun detection
545: in computing the certainty score. Thus, we can improve the accuracy of
546: our system by discarding extraneous outputs with a small certainty
547: score.
548: 
549: \section{Conclusion}
550: \label{sec:conclusion}
551: 
552: We proposed a probabilistic model to analyze Japanese zero pronouns
553: that refer to antecedents in the previous context. Our model consists
554: of two probabilistic parameters corresponding to detecting zero
555: pronouns and identifying their antecedents, respectively. The latter
556: is decomposed into syntactic and semantic properties. To estimate
557: those parameters efficiently, we used annotated/unannotated
558: corpora. In addition, we formalized the certainty score to improve the
559: accuracy. Through experiments, we showed that the use of unannotated
560: corpora was effective in avoiding the data sparseness problem and that
561: the certainty score further improved the accuracy.
562: 
563: Future work includes word sense disambiguation for polysemous
564: predicate verbs to select appropriate case frames in the zero pronoun
565: detection process.
566: 
567: \bibliographystyle{acl}
568: \small
569: \begin{thebibliography}{}
570: 
571: \bibitem[\protect\citename{Aone and Bennett}1995]{aone95}
572: Chinatsu Aone and Scott~William Bennett.
573: \newblock 1995.
574: \newblock Evaluating automated and manual acquisition of anaphora resolution
575:   strategies.
576: \newblock In {\em Proceedings of the 33rd Annual Meeting of the Association for
577:   Computational Linguistics}, pages 122--129.
578: 
579: \bibitem[\protect\citename{Ge \bgroup et al.\egroup }1998]{ge98}
580: Niyu Ge, John Hale, and Eugene Charniak.
581: \newblock 1998.
582: \newblock A statistical approach to anaphora resolution.
583: \newblock In {\em Proceedings of the Sixth Workshop on Very Large Corpora},
584:   pages 161--170.
585: 
586: \bibitem[\protect\citename{Grishman and Sundheim}1996]{grishman96}
587: Ralph Grishman and Beth Sundheim.
588: \newblock 1996.
589: \newblock Message {Understanding} {Conference} - 6: A brief history.
590: \newblock In {\em Proceedings of the 16th International Conference on
591:   Computational Linguistics}, pages 466--471.
592: 
593: \bibitem[\protect\citename{Grosz \bgroup et al.\egroup }1995]{gros95}
594: Barbara~J. Grosz, Aravind~K. Joshi, and Scott Weinstein.
595: \newblock 1995.
596: \newblock Centering: A framework for modeling the local coherence of discourse.
597: \newblock {\em Computational Linguistics}, 21(2):203--226.
598: 
599: \bibitem[\protect\citename{Hobbs}1978]{hobbs78}
600: Jerry~R. Hobbs.
601: \newblock 1978.
602: \newblock Resolving pronoun references.
603: \newblock {\em Lingua}, 44:311--338.
604: 
605: \bibitem[\protect\citename{{Information-technology Promotion
606:   Agency}}1987]{ipal87e}
607: {Information-technology Promotion Agency}, 1987.
608: \newblock {\em IPA Lexicon of the {Japanese} language for computers (Basic
609:   Verbs)}.
610: \newblock (in Japanese).
611: 
612: \bibitem[\protect\citename{Kameyama}1986]{kame86}
613: Megumi Kameyama.
614: \newblock 1986.
615: \newblock A property-sharing constraint in centering.
616: \newblock In {\em Proceedings of the 24th Annual Meeting of the Association for
617:   Computational Linguistics}, pages 200--206.
618: 
619: \bibitem[\protect\citename{Kim and Ehara}1995]{kim95}
620: Yeun-Bae Kim and Terumasa Ehara.
621: \newblock 1995.
622: \newblock Zero-subject resolution method based on probabilistic inference with
623:   evaluation function.
624: \newblock In {\em Proceedings of the 3rd Natural Language Processing
625:   Pacific-Rim Symposium}, pages 721--727.
626: 
627: \bibitem[\protect\citename{Kurohashi and Nagao}1998a]{kuro98}
628: Sadao Kurohashi and Makoto Nagao.
629: \newblock 1998a.
630: \newblock Building a {Japanese} parsed corpus while improving the parsing
631:   system.
632: \newblock In {\em Proceedings of The 1st International Conference on Language
633:   Resources \& Evaluation}, pages 719--724.
634: 
635: \bibitem[\protect\citename{Kurohashi and Nagao}1998b]{juman98e}
636: Sadao Kurohashi and Makoto Nagao, 1998b.
637: \newblock {\em {Japanese} morphological analysis system {JUMAN} version 3.6
638:   manual}.
639: \newblock Department of Informatics, Kyoto University.
640: \newblock (in Japanese).
641: 
642: \bibitem[\protect\citename{Kurohashi}1998]{knp98e}
643: Sadao Kurohashi, 1998.
644: \newblock {\em Japanese Dependency/Case Structure Analyzer {KNP} version
645:   2.0b6}.
646: \newblock Department of Informatics, Kyoto University.
647: \newblock (in Japanese).
648: 
649: \bibitem[\protect\citename{{Mainichi Shimbunsha}}1994--1999]{mainichi-e}
650: {Mainichi Shimbunsha}.
651: \newblock {1994--1999}.
652: \newblock {Mainichi Shimbun CD-ROM}.
653: 
654: \bibitem[\protect\citename{Mitkov \bgroup et al.\egroup }1998]{mitk98}
655: Ruslan Mitkov, Lamia Belguith, and Malgorzata Stys.
656: \newblock 1998.
657: \newblock Multilingual robust anaphora resolution.
658: \newblock In {\em Proceedings of the 3rd Conference on Empirical Methods in
659:   Natural Language Processing}, pages 7--16.
660: 
661: \bibitem[\protect\citename{Nakaiwa and Shirai}1996]{naka96-2}
662: Hiromi Nakaiwa and Satoshi Shirai.
663: \newblock 1996.
664: \newblock Anaphora resolution of {Japanese} zero pronouns with deictic
665:   reference.
666: \newblock In {\em Proceedings of the 16th International Conference on
667:   Computational Linguistics}, pages 812--817.
668: 
669: \bibitem[\protect\citename{Nakaiwa}2000]{nakaiwa00}
670: Hiromi Nakaiwa.
671: \newblock 2000.
672: \newblock An environment for extracting resolution rules of zero pronouns from
673:   corpora.
674: \newblock In {\em COLING-2000 Workshop on Semantic Annotation and Intelligent
675:   Content}, pages 44--52.
676: 
677: \bibitem[\protect\citename{{National Language Research
678:   Institute}}1964]{koku64e}
679: {National Language Research Institute}.
680: \newblock 1964.
681: \newblock {\em Bunruigoihyou}.
682: \newblock Shuei publisher.
683: \newblock (in Japanese).
684: 
685: \bibitem[\protect\citename{Okumura and Tamura}1996]{okum96}
686: Manabu Okumura and Kouji Tamura.
687: \newblock 1996.
688: \newblock Zero pronoun resolution in {Japanese} discourse based on centering
689:   theory.
690: \newblock In {\em Proceedings of the 16th International Conference on
691:   Computational Linguistics}, pages 871--876.
692: 
693: \bibitem[\protect\citename{Palomar \bgroup et al.\egroup }2001]{palomar01}
694: Manuel Palomar, Antonio Ferr\'{a}ndez, Lidia Moreno, Patricio
695:   Mart\'{\i}nez-Barco, Jes\'{u}s Peral, Maximiliano Saiz-Noeda, and
696:   Rafael~Mu\~{n}oz.
697: \newblock 2001.
698: \newblock An algorithm for anaphora resolution in {Spanish} texts.
699: \newblock {\em Computational Linguistics}, 27(4):545--568.
700: 
701: \bibitem[\protect\citename{Soon \bgroup et al.\egroup }2001]{soon01}
702: Wee~Meng Soon, Hwee~Tou Ng, and Daniel Chung~Yong Lim.
703: \newblock 2001.
704: \newblock A machine learning approach to coreference resolution of noun
705:   phrases.
706: \newblock {\em Computational Linguistics}, 27(4):521--544.
707: 
708: \bibitem[\protect\citename{Walker \bgroup et al.\egroup }1994]{walk94}
709: Marilyn Walker, Masayo Iida, and Sharon Cote.
710: \newblock 1994.
711: \newblock Japanese discourse and the process of centering.
712: \newblock {\em Computational Linguistics}, 20(2):193--233.
713: 
714: \end{thebibliography}
715: 
716: \end{document}
717: