0304:cs0304006/cmplg.tex

1: \documentclass[10pt]{article}

2:

3:

4: \usepackage{hltnaacl03}

5: \usepackage{times}

6: \usepackage{latexsym}

7: \usepackage{epsfig}

8: \usepackage{xspace}

9: \newcommand{\epsfscaledbox}[2]{\centerline{\psfig{figure=#1,width=#2}}}

10: \newcommand{\omt}[1]{}

11: \newcommand{\bibsnip}{\vspace*{-.11in}}

12: \newcommand{\proc}{Proc.\xspace}

13: \newcommand{\U}[1]{\underline{#1}}

14: \newcommand{\UU}[1]{\underline{\underline{#1}}}

15: \newcommand{\comment}[1]{{\bf !!- - - #1 - - -  !!}}

16: \newcommand{\Lattice}{Lattice\xspace}

17: \newcommand{\lattice}{lattice\xspace}

18: \newcommand{\lattices}{lattices\xspace}

19: \newcommand{\slotlat}{slotted \lattice}

20: \newcommand{\slotlats}{slotted \lattices}

21: \newcommand{\template}[1]{{\sf #1}}

22: \newcommand{\corpus}{C}

23:

24: \newenvironment{frameit}[1]

25:   {\begin{tabular}{|p{#1}|}\hline}{\\\hline\end{tabular}}

26:

27: \newcommand{\textexample}[1]{

28:   {\noindent

29:     \begin{center}

30:       \fbox{\parbox{0.45\textwidth}{\small\sf #1}}

31:     \end{center}}}

32:

33: \setlength\titlebox{6.5cm}

34: \title{\vspace{-75pt}

35: {\normalsize {\it \hfill Proceedings of HLT/NAACL 2003}} \\ \mbox{}\\Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment}

36:

37: \author{Regina Barzilay \and Lillian Lee \\

38: Department of Computer Science \\

39: Cornell University \\

40: Ithaca, NY 14853-7501 \\

41: \{regina,llee\}@cs.cornell.edu}

42:

43: \date{}

44:

45: \begin{document}

46: \maketitle

47: \begin{abstract}

48:   We address the text-to-text generation problem of sentence-level paraphrasing

49:   --- a phenomenon distinct from and more difficult than word- or phrase-level

50:   paraphrasing.  Our approach applies {\em multiple-sequence alignment} to

51:   sentences gathered from unannotated comparable corpora: it learns a set of

52:   paraphrasing patterns represented by {\em word lattice} pairs and

53:   automatically determines how to apply these patterns to rewrite new

54:   sentences.  The results of our evaluation experiments show that the system

55:   derives accurate paraphrases, outperforming baseline systems.

56: \end{abstract}

57:

58:

59:

60: \section{Introduction}

61:

62: \begin{quote}

63: {\em This is a late parrot! It's a stiff! Bereft of life, it rests in

64: peace! If you hadn't nailed him to the perch he would be pushing up

65: the daisies! Its metabolical processes are of interest only to

66: historians! It's hopped the twig! It's shuffled off this mortal coil!

67: It's rung down the curtain and joined the choir invisible! This is

68: an EX-PARROT!} --- Monty Python, ``Pet Shop''

69: \end{quote}

70:

71: A mechanism for automatically generating multiple paraphrases of a

72: given sentence would be of significant practical import for

73: text-to-text generation systems.  Applications include summarization

74: \cite{Knight&Marcu:2000a} and rewriting

75: \cite{Chandrasekar+Srinivas:97a}: both could employ such a mechanism

76: to produce candidate sentence paraphrases

77: that other

78: system components  would filter for length, sophistication level, and

79: so forth.\footnote{Another interesting application,

80:   somewhat tangential to generation, would be

81: to expand existing corpora by providing several versions of their

82: component sentences.

83: This could, for example, aid machine-translation evaluation, where it has

84: become common to evaluate systems by comparing their output against a bank of

85: several reference translations for the same sentences

86: \cite{Papineni&al:2002a}.

87: See \newcite{Bangalore&Murdock&Riccardi:2002a} and

88: \newcite{Barzilay&Lee:2002a} for other uses of such data.}

89: Not surprisingly, therefore,

90: paraphrasing has been a focus of generation

91: research for quite some time

92: \cite{McKeown:79a,Meteer+Shaked:88a,Dras:1999a}.

93:

94: One might initially suppose that sentence-level paraphrasing is simply the

95: result of word-for-word or phrase-by-phrase substitution applied in a domain-

96: and context-independent fashion.  However, in studies of paraphrases across

97: several domains

98: \cite{Iordanskaja&Kittredge&Polguere:1991a,Robin-phd,McKeown&Kukich&Shaw:1994a},

99: this was generally not the case.

100: For instance, consider the following two sentences (similar to

101: examples found in  \newcite{Smadja&McKeown:1991a}):

102:   \begin{center}

103:     \begin{frameit}{0.9\columnwidth}

104:     {\small    After the latest Fed rate cut, stocks rose across the board.}

105:       \\\hline

106:       {\small Winners strongly outpaced losers after Greenspan cut

107:       interest rates again.}

108:     \end{frameit}

109:   \end{center}

110:   Observe that ``Fed'' (Federal Reserve) and ``Greenspan'' are interchangeable

111:   only in the domain of US financial matters.  Also, note that one cannot draw

112:   one-to-one correspondences between single words or phrases.  For instance,

113:   nothing in the second sentence is really equivalent to ``across the board'';

114:   we can only say that the entire clauses ``stocks rose across the board'' and

115:   ``winners strongly outpaced losers'' are paraphrases.  This evidence suggests

116:   two consequences: (1) we cannot rely solely on generic domain-independent

117:   lexical resources for the task of paraphrasing, and (2) {\em sentence-level}

118:   paraphrasing is an important problem extending beyond that of paraphrasing

119:   smaller lexical units.

120:

121:   {\em Our work presents a novel knowledge-lean algorithm that uses {\em

122:       multiple-sequence alignment} (MSA) to {\em learn} to generate

123:     sentence-level paraphrases essentially from unannotated corpus data alone.}

124:   In contrast to previous work using MSA for generation

125:   \cite{Barzilay&Lee:2002a}, we need neither parallel data nor explicit

126:   information about sentence semantics.  Rather, we use two {\em comparable

127:     corpora}, in our case, collections of articles produced by two different

128:   newswire agencies about the same events.  The use of related corpora is key:

129:   we can capture paraphrases that on the surface bear little resemblance but

130:   that, by the nature of the data, must be descriptions of the same

131:   information.  Note that we also acquire paraphrases from each of the

132:   individual corpora; but the lack of clues as to sentence equivalence in

133:   single corpora means that we must be more conservative, only selecting as

134:   paraphrases items that are structurally very similar.

135:

136:   Our approach has three main steps.  First, working on each of the comparable

137:   corpora separately, we compute {\em \lattices} --- compact graph-based

138:   representations --- to find commonalities within (automatically derived)

139:   groups of structurally similar sentences.  Next, we identify pairs of

140:   lattices from the two different corpora that are paraphrases of each other;

141:   the identification process checks whether the lattices take similar

142:   arguments.  Finally, given an input sentence to be paraphrased, we match it

143:   to a lattice and use a paraphrase from the matched lattice's mate to generate

144:   an output sentence.  The key features of this approach are:

145:

146: \noindent

147: \textbf{Focus on paraphrase generation.} In contrast to earlier work, we not

148: only extract paraphrasing rules, but also automatically determine which of the

149: potentially relevant rules to apply to an input sentence and produce a revised

150: form using them.

151:

152: \noindent

153: \textbf{Flexible paraphrase types.} Previous approaches to paraphrase

154: acquisition focused on certain rigid types of paraphrases, for instance,

155: limiting the number of arguments.  In contrast, our method is not limited to a

156: set of {\it a priori}-specified paraphrase types.

157:

158: \noindent

159: \textbf{Use of comparable corpora and minimal use of knowledge resources.}  In

160: addition to the advantages mentioned above, comparable corpora can be easily

161: obtained for many domains, whereas previous approaches to paraphrase

162: acquisition (and the related problem of phrase-based machine translation

163: \cite{Wang:1998a,Och&Tillman&Ney:1999a,Vogel&Ney:2000a}) required parallel

164: corpora.  We point out that one such approach, recently proposed by

165: \newcite{Pang+Knight+Marcu:03a}, also represents paraphrases by lattices,

166: similarly to our method, although their lattices are derived using parse

167: information.

168:

169:

170: Moreover, our algorithm does not employ knowledge resources such as parsers or

171: lexical databases, which may not be available or appropriate for all domains

172: --- a key issue since paraphrasing is typically domain-dependent.  Nonetheless,

173: our algorithm achieves good performance.

174:

175:

176:

177: \section{Related work}

178: Previous work on automated paraphrasing has considered different levels of

179: paraphrase granularity.  Learning synonyms via distributional similarity has

180: been well-studied \cite{Pereira&Tishby&Lee:1993a,Grefenstette:94a,Lin:1998a}.

181: \newcite{Jacquemin:l999a} and \newcite{Barzilay&McKeown:01a} identify

182: phrase-level paraphrases, while \newcite{Lin&Pantel:2001a} and

183: \newcite{Shinyama&al:2002a} acquire structural paraphrases encoded as

184: templates.  These latter are the most closely related to the sentence-level

185: paraphrases we desire, and so we focus in this section on template-induction

186: approaches.

187:

188: \newcite{Lin&Pantel:2001a} extract inference rules, which are related to

189: paraphrases (for example, \template{X wrote Y} implies \template{X is the

190:   author of Y}), to improve question answering.  They assume that {\em paths}

191: in dependency trees that take similar arguments (leaves) are close in meaning.

192: However, only two-argument templates are considered.

193: \newcite{Shinyama&al:2002a} also use dependency-tree information to extract

194: templates of a limited form (in their case, determined by the underlying

195: information extraction application).  Like us (and unlike Lin and Pantel, who

196: employ a single large corpus), they use articles written about the same event

197: in different newspapers as data.

198:

199: Our approach shares two characteristics with the two methods just described:

200: pattern comparison by analysis of the patterns' respective arguments, and use

201: of non-parallel corpora as a data source.  However, {\em extraction} methods

202: are not easily extended to {\em generation} methods.  One problem is that their

203: templates often only match small fragments of a sentence.  While this is

204: appropriate for other applications, deciding whether to use a given template to

205: generate a paraphrase requires information about the surrounding context

206: provided by the entire sentence.

207:

208:

209: \newcommand{\slot}{slot\xspace}

210: \newcommand{\slots}{slots\xspace}

211: \newcommand{\findclusters}{Sentence clustering}

212: \newcommand{\families}{clusters\xspace}

213: \newcommand{\Families}{Clusters\xspace}

214: \newcommand{\family}{cluster\xspace}

215: \newcommand{\famlat}{\lattice}

216: \newcommand{\famlats}{\lattices}

217: \newcommand{\msg}{pattern\xspace}

218:

219: \newcommand{\patterninformal}{pattern\xspace}

220: \newcommand{\patternsinformal}{patterns\xspace}

221: \newcommand{\Patterninformal}{Pattern\xspace}

222: \newcommand{\surprise}{surprise\xspace}

223: \newcommand{\backbone}{backbone\xspace}

224: \newcommand{\numtoken}{NUM}

225: \newcommand{\nametoken}{NAME}

226: \newcommand{\datetoken}{DATE}

227:

228:

229:

230: \section{Algorithm}

231:

232:

233: \paragraph{Overview} We first sketch the algorithm's broad outlines. The subsequent subsections provide

234: more detailed descriptions of the individual steps.

235:

236: The major goals of our algorithm are to learn:

237: \begin{itemize}

238: \item  recurring {\patternsinformal} in the data, such as  \template{X

239: (injured/wounded) Y people, Z of them seriously}, where the capital letters

240: represent variables;

241: \item

242: pairings between such \patternsinformal that represent paraphrases, for

243: example, between the \patterninformal \template{X (injured/wounded) Y people,

244: Z of them seriously} and the \patterninformal \template{Y were

245: (wounded/hurt) by X, among them Z were in serious condition}.

246: \end{itemize}

247:

248: Figure~\ref{fig:arch} illustrates the main stages of our approach.  During

249: training, \patterninformal induction is first applied independently to the two

250: datasets making up a pair of {comparable corpora}.  Individual

251: \patternsinformal are learned by applying {\em multiple-sequence alignment} to

252: \families of sentences describing approximately similar events; these

253: \patternsinformal are represented compactly by {\em \lattices} (see Figure

254: \ref{fig:lattice}).  We then check for \lattices from the two different corpora

255: that tend to take the same arguments; these \lattice pairs are taken to be

256: paraphrase \patternsinformal.

257:

258: \begin{figure}

259: \begin{center}

260: \epsfscaledbox{arch.eps}{2.2in}

261: \end{center}

262: \vspace*{-.2in}

263: \caption{\label{fig:arch} System architecture.}

264: \end{figure}

265:

266: Once training is done, we can generate paraphrases as follows: given the

267: sentence ``The \surprise bombing injured twenty people, five of them

268: seriously'', we match it to the lattice \template{X (injured/wounded) Y people,

269:   Z of them seriously} which can be rewritten as \template{Y were

270:   (wounded/hurt) by X, among them Z were in serious condition}, and so by

271: substituting arguments we can generate ``Twenty were wounded by the \surprise

272: bombing, among them five were in serious condition'' or ``Twenty were hurt by

273: the \surprise bombing, among them five were in serious condition''.

274:

275: \begin{figure}

276: \newcounter{sentexample}\setcounter{sentexample}{1}

277: \newcommand{\sentex}[1]{{\footnotesize (\thesentexample)~#1 \stepcounter{sentexample}}}

278: \fbox{

279: \begin{minipage}{3in}

280:   \sentex{\textbf{A Palestinian suicide bomber blew himself up in} a southern

281:     city Wednesday, \textbf{killing} two other \textbf{people}

282:     \textbf{and wounding} 27.} \\

283:   \sentex{\textbf{A suicide bomber blew himself up in} the settlement of Efrat,

284:     on Sunday, \textbf{killing} himself \textbf{and injuring}

285:     seven people.} \\

286:   \sentex{\textbf{A suicide bomber blew himself up in} the coastal resort of

287:     Netanya on Monday, \textbf{killing} three other \textbf{people}

288:     \textbf{and wounding} dozens more.} \\

289:   \sentex{\textbf{A Palestinian suicide bomber blew himself up in} a garden

290:     cafe on Saturday, \textbf{killing} 10 \textbf{people}  \textbf{and wounding}

291:     54.} \\

292:   \sentex{\textbf{A suicide bomber blew himself up in} the centre of Netanya on

293:     Sunday, \textbf{killing} three \textbf{people} as well as himself

294:     \textbf{and injuring} 40. }

295: \end{minipage}

296: }

297: \caption{\label{fig:cluster} Five sentences (without date, number,

298:   and name substitution) from a \family of 49, similarities emphasized.  }

299: \end{figure}

300:

301:

302:

303: \begin{figure*}

304:   \psfig{figure=msa-new.ps,width=6.5in}

305: \caption{\label{fig:lattice} \Lattice and

306:   \slotlat for the five sentences from Figure \ref{fig:cluster}.  Punctuation

307:   and articles removed for clarity.}

308: \end{figure*}

309:

310: \subsection{\findclusters}

311:

312: Our first step is to cluster sentences into groups from which to learn useful

313: patterns; for the multiple-sequence techniques we will use, this means that the

314: sentences within \families should describe similar events and have similar

315: structure, as in the sentences of Figure \ref{fig:cluster}.  This is

316: accomplished by applying hierarchical complete-link clustering to the sentences

317: using a similarity metric based on word n-gram overlap ($n=1,2,3,4$).  The only

318: subtlety is that we do not want mismatches on sentence details (e.g., the

319: location of a raid) to cause sentences describing the same type of occurrence

320: (e.g., a raid) to be separated, as this might yield \families too

321: fragmented for effective learning to take place. (Moreover, variability in the

322: {\em arguments} of the sentences in a cluster is needed for our learning

323: algorithm to succeed; see below.)  We therefore first

324: replace all appearances of dates, numbers, and proper names\footnote{Our crude

325:   proper-name identification method was to flag every phrase (extracted by a

326:   noun-phrase chunker) appearing capitalized in a non-sentence-initial position

327:   sufficiently often.  }  with generic tokens.  \Families with fewer than ten

328: sentences are discarded.
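
\noindent As a sketch of the clustering criterion (the notation is introduced
here only for illustration), let $\mathrm{sim}(s,t)$ be the n-gram-overlap
similarity of sentences $s$ and $t$.  Complete-link clustering scores a
candidate merge of two groups $A$ and $B$ by their least similar pair,
\[
  \mathrm{sim}_{\mathrm{CL}}(A,B) \;=\; \min_{s \in A,\; t \in B} \mathrm{sim}(s,t),
\]
so that groups are merged only when every cross-group sentence pair is similar.
The exact form of $\mathrm{sim}$ is not fixed by the description above; one
plausible instantiation, stated purely as an assumption, averages the overlap
over the four n-gram orders:
\[
  \mathrm{sim}(s,t) \;=\; \frac{1}{4} \sum_{n=1}^{4}
    \frac{|G_n(s) \cap G_n(t)|}{|G_n(s) \cup G_n(t)|},
\]
where $G_n(s)$ is the set of word n-grams of $s$ after date, number, and name
substitution (both sentences are assumed to contain at least four words, so
that no denominator vanishes).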

329:

330:

331: \newcommand{\art}[1]{}

332: \newcommand{\monthtoken}{MONTH\xspace}

333: \newcommand{\mayseven}{\datetoken~\numtoken \xspace}

334: \newcommand{\palestinian}{\nametoken\xspace}

335: \newcommand{\southta}{\nametoken\xspace}

336: \newcommand{\marchnine}{\datetoken~\numtoken \xspace}

337: \newcommand{\jerusalem}{\nametoken\xspace}

338: \newcommand{\ipmasharon}{\nametoken\xspace}

339: \newcommand{\marchten}{\datetoken~\numtoken \xspace}

340: \newcommand{\saturday}{\datetoken\xspace}

341: \newcommand{\marchthirtyone}{\datetoken~\numtoken \xspace}

342: \newcommand{\afpsource}{\nametoken\xspace}

343: \newcommand{\jewish}{\nametoken\xspace}

344: \newcommand{\efratwwbbeth}{\nametoken\xspace}

345: \newcommand{\sunday}{\datetoken\xspace}

346: \newcommand{\juneeighteen}{\datetoken~\numtoken \xspace}

347: \newcommand{\fifteen}{\numtoken1\xspace}

348: \newcommand{\tuesday}{\datetoken\xspace}

349: \newcommand{\seven}{{\numtoken1}\xspace}

350: \newcommand{\eleven}{{\numtoken1}\xspace}

351: \newcommand{\fifty}{\numtoken2\xspace}

352: \newcommand{\eighteen}{\numtoken1\xspace}

353: \newcommand{\fortyeight}{\numtoken2\xspace}

354: \newcommand{\locone}{{in \art{a} crowded hall in \southta}\xspace}

355: \newcommand{\loconeshort}{in \art{a} crowded hall$\ldots$\xspace}

356: \newcommand{\loctwo}{into \art{a} crowded \jerusalem cafe [sic] \ipmasharon's residence\xspace}

357: \newcommand{\loctwoshort}{into \art{a} crowded $\ldots$ residence\xspace}

358: \newcommand{\locthree}{{in \art{the} \jewish

359:  settlement of \efratwwbbeth}\xspace}

360: \newcommand{\locthreeshort}{in \art{the} \jewish

361:  settlement $\ldots$ \xspace}

362: \newcommand{\locfour}{{aboard \art{a} crowded bus in \jerusalem}\xspace}

363: \newcommand{\locfourshort}{{aboard $\ldots$ \jerusalem}\xspace}

364: \newcommand{\synone}{injuring\xspace}

365: \newcommand{\syntwo}{wounding\xspace}

366:

367:

368:

369:

370: \subsection{Inducing \patternsinformal}

371:

372: \newcommand{\simfn}{\textrm{sim}} \newcommand{\alphabet}{\Sigma}

373: \newcommand{\underscore}{\underline{~}} In order to learn \patternsinformal, we

374: first compute a {\em multiple-sequence alignment} (MSA) of the sentences in a

375: given \family.  Pairwise alignment takes two sentences and a scoring function giving

376: the similarity between words; it determines the highest-scoring way to perform

377: insertions, deletions, and changes to transform one of the sentences into the

378: other.  Pairwise alignment can be extended efficiently to multiple sequences via

379: iterative pairwise alignment, a polynomial-time method commonly used in

380: computational biology \cite{Durbin+Eddy+al:98a}.\footnote{Scoring function:

381:   aligning two identical words scores 1; inserting a word scores $-0.01$, and

382:   aligning two different words scores $-0.5$ (parameter values taken from

383:   \newcite{Barzilay&Lee:2002a}).}  \omt{ $$\simfn(x,y) = 1 & $x = y$, $x \in

384:   \alphabet$; \cr -0.01 & exactly one of $x,y$ is $\underscore$~; \cr -0.5 &

385:   otherwise (mismatch)$$

386:   1 if the two words $x$ and $y$ are the same, -0.01 }

387: The results can be represented in an intuitive form via a word {\em \lattice}

388: (see Figure \ref{fig:lattice}), which compactly represents (n-gram) structural

389: similarities between the \family's sentences.
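
\noindent In symbols, writing $\underscore$ for a gap, the scoring function
given in the footnote is
\[
  \simfn(x,y) \;=\; \cases{
    1     & if $x = y$, $x \in \alphabet$; \cr
    -0.01 & if exactly one of $x, y$ is $\underscore$; \cr
    -0.5  & otherwise (mismatch). \cr}
\]
A pairwise alignment can then be computed with the usual dynamic program (a
standard formulation that is assumed here; it is not spelled out in the text
above).  For sentences $a_1 \ldots a_m$ and $b_1 \ldots b_n$, let $A(i,j)$ be
the best score for aligning $a_1 \ldots a_i$ with $b_1 \ldots b_j$; then
$A(0,0) = 0$ and
\[
  A(i,j) \;=\; \max \left\{
    \begin{array}{l}
      A(i-1,j-1) + \simfn(a_i, b_j), \\
      A(i-1,j) + \simfn(a_i, \underscore), \\
      A(i,j-1) + \simfn(\underscore, b_j)
    \end{array}
  \right\},
\]
with terms that refer to negative indices omitted.  As a small illustrative
example (not drawn from the corpus), aligning ``bomber killed two'' with
``bomber wounded two'' by pairing ``killed'' with ``wounded'' scores
$1 - 0.5 + 1 = 1.5$, whereas leaving both words unaligned scores
$1 - 0.01 - 0.01 + 1 = 1.98$.  Under these parameter values, then, differing
words tend to be placed on separate parallel branches rather than being forced
into a mismatch.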

390:

391: Transforming \lattices into generation-suitable \patternsinformal requires some

392: understanding of the possible varieties of \lattice structures.  The most

393: important part of the transformation is to determine which words are actually

394: instances of arguments, and so should be replaced by {\em slots} (representing

395: variables).  The key intuition is that because the sentences in the \family

396: represent the same {\em type} of event, such as a bombing, but generally refer

397: to different {\em instances} of said event (e.g., a bombing in Jerusalem versus

398: in Gaza), areas of large variability in the \lattice should correspond to

399: arguments.

400:

401: To quantify this notion of variability, we first formalize its opposite:

402: commonality.  We define {\em \backbone} nodes as those shared by more than 50\%

403: of the \family's sentences.  The choice of 50\% is not arbitrary --- it can be

404: proved using the pigeonhole principle that our strict-majority criterion

405: imposes a unique linear ordering of the backbone nodes that respects the word

406: ordering within the sentences, thus guaranteeing at least a degree of

407: well-formedness and avoiding the problem of how to order backbone nodes

408: occurring on parallel ``branches'' of the lattice.
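
\noindent The counting step behind this claim can be sketched as follows (the
notation is introduced here for illustration).  Let the \family contain $n$
sentences and let $S(v)$ be the set of sentences whose path passes through
node $v$, so that $v$ is a \backbone node exactly when $|S(v)| > n/2$.  For any
two \backbone nodes $u$ and $v$,
\[
  |S(u) \cap S(v)| \;\ge\; |S(u)| + |S(v)| - n \;>\; \frac{n}{2} + \frac{n}{2} - n \;=\; 0,
\]
so at least one sentence passes through both, and its word order places one
before the other.  Assuming the \lattice is acyclic, every pair of \backbone
nodes is thus comparable under reachability, which yields a single linear
order.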

409:

410:

411: Once we have identified the \backbone nodes as points of strong commonality,

412: the next step is to identify the regions of variability (or, in \lattice terms,

413: many parallel disjoint paths) between them as (probably) corresponding to the

414: arguments of the propositions that the sentences represent.  For example, in

415: the top of Figure \ref{fig:lattice}, the words ``southern city'', ``settlement of

416: NAME'', ``coastal resort of NAME'', etc.\ all correspond to the location of an

417: event and could be replaced by a single {\slot}.

418: Figure \ref{fig:lattice} shows an example of a \lattice and the derived

419: \slotlat; we give the details of the slot-induction process in the Appendix.

420:

421:

422: \subsection{Matching \famlats}

423:

424: Now, if we were using a parallel corpus, we could employ

425: sentence-alignment information to determine which lattices correspond

426: to paraphrases.  Since we do not have this information, we essentially

427: approximate the parallel-corpus situation by correlating information

428: from descriptions of (what we hope are) the same event occurring in

429: the two different corpora.

430:

431: Our method works as follows.  Once \lattices for each corpus in our

432: comparable-corpus pair are computed, we identify \lattice paraphrase pairs,

433: using the idea that paraphrases will tend to take the same values as arguments

434: \cite{Shinyama&al:2002a,Lin&Pantel:2001a}. More specifically, we take a pair of

435: \lattices from different corpora, look back at the sentence clusters from which

436: the two lattices were derived, and compare the slot values of those

437: cross-corpus sentence pairs that appear in articles written on the {\em same

438:   day} on the same topic; we pair the \lattices if the degree of matching is

439: over a threshold tuned on held-out data.  For example, suppose we have two

440: (linearized) lattices \template{{slot1} bombed slot2} and \template{slot3 was

441:   bombed by slot4} drawn from different corpora.  If in the first lattice's

442: sentence cluster we have the sentence ``the plane bombed the town'', and in the

443: second lattice's sentence cluster we have a sentence written on the same day

444: reading ``the town was bombed by the plane'', then the corresponding lattices

445: may well be paraphrases, where \template{slot1} is identified with

446: \template{slot4} and \template{slot2} with \template{slot3}.

447:

448:

449: To compare the sets of argument values of two lattices, we simply count their

450: word overlap, giving double weight to proper names and numbers and discarding

451: auxiliaries (we purposely ignore order because paraphrases can consist of word

452: re-orderings).
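
\noindent One way to write this comparison down (a sketch; the treatment of
repeated words and any normalization are assumptions, since they are not
specified above) is as a weighted overlap between the sets $A$ and $B$ of
argument-value words of the two \lattices:
\[
  \mathrm{overlap}(A,B) \;=\; \sum_{w \in A \cap B} \omega(w), \qquad
  \omega(w) \;=\; \cases{
    2 & if $w$ is a proper name or a number; \cr
    0 & if $w$ is an auxiliary; \cr
    1 & otherwise. \cr}
\]
For instance, for the hypothetical value sets
$A = \{\mbox{Netanya}, 40, \mbox{people}\}$ and
$B = \{\mbox{Netanya}, 40, \mbox{others}\}$, the shared words are ``Netanya''
and ``40'', giving an overlap of $2 + 2 = 4$.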

453:

454: \subsection{Generating paraphrase sentences}

455:

456: Given a sentence to paraphrase, we first need to identify which, if any, of our

457: previously-computed sentence \families the new sentence belongs most strongly

458: to. We do this by finding the best alignment of the sentence to the existing

459: \famlats.\footnote{ To facilitate this process, we add ``insert'' nodes between

460:   \backbone nodes; these nodes can match any word sequence and thus account for

461:   new words in the input sentence.  Then, we perform multiple-sequence

462:   alignment where insertions score \mbox{-0.1} and all other node alignments

463:   receive a score of unity.}  If a matching \famlat is found, we choose one of

464: its comparable-corpus paraphrase \lattices to rewrite the sentence,

465: substituting in the argument values of the original sentence.  This yields as

466: many paraphrases as there are lattice paths.
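
\noindent The number of paths, and hence of candidate paraphrases, can be
counted by a simple recurrence (a sketch, assuming the \lattice is acyclic with
a single start node and a single end node $t$).  Writing $P(v)$ for the number
of paths from node $v$ to $t$,
\[
  P(t) = 1, \qquad P(v) \;=\; \sum_{(v,w) \in E} P(w),
\]
where $E$ is the edge set; the count for the whole \lattice is $P$ evaluated at
its start node.  In the example of the Overview, the target pattern
\template{Y were (wounded/hurt) by X, among them Z were in serious condition}
has a single two-way branch, hence $2$ paths, which matches the two output
sentences shown there.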

467:

468:

469:

470: \section{Evaluation}

471: \label{sec:eval}

472:

473:

474: All evaluations involved judgments by native speakers of

475: English who were not familiar with the paraphrasing systems

476: under consideration.

477:

478: \begin{figure*}

479: \epsfscaledbox{templateeval4.eps}{6.4in}

480: \caption{\label{msa-dirt-accuracy} Correctness and agreement results.

481: Columns = instances; each grey box represents a judgment of ``valid''

482: for the instance.  For each method, a good, middling, and poor

483: instance is shown.  (Results separated by algorithm for clarity; the

484: blind evaluation presented instances from the two algorithms in random

485: order.)

486: }

487: \end{figure*}

488:

489: We implemented our system on a pair of comparable corpora consisting of

490: articles produced between September 2000 and August 2002 by the Agence

491: France-Presse (AFP) and Reuters news agencies.  Given our interest in

492: domain-dependent paraphrasing, we limited attention to 9MB of articles,

493: collected using a TDT-style document clustering system, concerning individual

494: acts of violence in Israel and army raids on the

495: Palestinian territories.  From this data (after removing 120 articles as a

496: held-out parameter-training set), we extracted 43 \slotlats from the AFP corpus

497: and 32 \slotlats from the Reuters corpus, and found 25 cross-corpus matching

498: pairs; since \lattices contain multiple paths, these yielded 6,534 template

499: pairs.\footnote{The extracted paraphrases are available at \texttt{http://www.cs.cornell.edu/Info/Projects/\\NLP/statpar.html}}

500:

501:

502: \subsection{Template Quality Evaluation}

503:

504: Before evaluating the quality of the rewritings produced by our templates and

505: \lattices, we first tested the quality of a random sample of just the template

506: pairs.  In our instructions to the judges, we defined two {text units} (such as

507: sentences or snippets) to be paraphrases if one of them can generally be

508: substituted for the other without great loss of information (but not

509: necessarily vice versa).\footnote{We switched to this ``one-sided''

510:   definition because in initial tests judges found it excruciating to decide on

511:   equivalence.

513:   Also, in applications such as summarization some information loss is

514:   acceptable.}  Given a pair of {\em templates} produced by a system, the

515: judges marked them as paraphrases if for many instantiations of the templates'

516: variables, the resulting text units were paraphrases.  (Several labelled

517: examples were provided to supply further guidance).

518:

519: To put the evaluation results into context, we wanted to compare against

520: another system, but we are not aware of any previous work creating templates

521: precisely for the task of generating paraphrases.  Instead, we made a

522: good-faith effort to adapt the DIRT system \cite{Lin&Pantel:2001a} to the

523: problem, selecting the 6,534 highest-scoring templates it produced when run on

524: our datasets. (The system of \newcite{Shinyama&al:2002a} was unsuitable for

525: evaluation purposes because their paraphrase extraction component is too

526: tightly coupled to the underlying information extraction system.)  This

527: comparison comes with some important caveats, the most

528: prominent being that DIRT was not designed with sentence-paraphrase generation

529: in mind --- its templates are much shorter than ours, which may have affected

530: the evaluators' judgments --- and was originally implemented on much larger

531: data sets.\footnote{To cope with the corpus-size issue, DIRT was trained on an

532:   84MB corpus of Middle-East news articles, a strict superset of the 9MB we

533:   used.  Other issues include the fact that DIRT's output needed to be

534:   converted into English: it produces paths like ``N:of:N

535:   $\langle$tide$\rangle$ N:nn:N'', which we transformed into ``Y tide of X'' so

536:   that its output format would be the same as ours.  } The point of this

537: evaluation is simply to determine whether another corpus-based

538: paraphrase-focused approach could easily achieve the same performance level.

539:

540:

541: In brief, the DIRT system works as follows. Dependency trees are

542: constructed from parsing a large corpus.  Leaf-to-leaf paths are

543: extracted from these dependency trees, with the leaves serving as

544: slots.  Then, pairs of paths in which the slots tend to be filled by

545: similar values, where the similarity measure is based on the mutual

546: information between the value and the slot, are deemed to be

547: paraphrases.

548:

549:

550: We randomly extracted 500 pairs from the two algorithms' output sets.  Of

551: these, 100 paraphrases (50 per system) made up a ``common'' set evaluated by

552: all four judges, allowing us to compute agreement rates; in addition, each

553: judge evaluated an ``individual'' set, seen only by him- or herself,

554: consisting of another 100 pairs (50 per system), for $100 + 4 \times 100 = 500$ pairs in all. The ``individual'' sets

555: allowed us to broaden our sample's coverage of the corpus.\footnote{Each judge

556:   took several hours at the task, making it infeasible to expand the sample

557:   size further.}

558: The pairs were presented in random order, and the judges were

559: not told which system produced a given pair.

560:

561: As Figure~\ref{msa-dirt-accuracy} shows, our system outperforms the DIRT

562: system, with a consistent performance gap for all the judges of about 38\%,

563: although the absolute scores vary (for example, Judge 4 seems lenient).  The

564: judges' assessment of correctness was fairly constant between the full

565: 100-instance set and the 50-instance common set alone.

566:

567:  In terms of agreement, the Kappa value (measuring pairwise agreement

568: discounting chance occurrences\footnote{One issue is that the Kappa

569: statistic doesn't account for varying difficulty among instances.  For

570: this reason, we actually asked judges to indicate for each instance

571: whether making the validity decision was difficult.  However, the

572: judges generally did not agree on difficulty.  Post hoc analysis

573: indicates that perception of difficulty depends on each judge's

574: individual ``threshold of similarity'', not just the instance itself.

575: }) on the common set was 0.54, which corresponds to moderate

576: agreement~\cite{Landis&Koch:1977a}.  Multiway agreement is depicted in

577: Figure~\ref{msa-dirt-accuracy} --- there, we see that in 86 of 100

578: cases, at least three of the judges gave the same correctness

579: assessment, and in 60 cases all four judges concurred.
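For readers who wish to compute comparable agreement figures, the following sketch shows one common way to do so. It assumes Cohen's kappa averaged over all pairs of judges; the text above does not specify which Kappa variant was used, so this choice, the function names, and the input layout (one list of labels per judge) are assumptions.

```python
from collections import Counter
from itertools import combinations

def cohen_kappa(a, b):
    """Cohen's kappa for two judges' labels over the same instances."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both judges pick the same label independently.
    expected = sum(count_a[l] * count_b[l] for l in set(count_a) | set(count_b)) / (n * n)
    if expected == 1:
        return 1.0  # both judges used a single identical label throughout
    return (observed - expected) / (1 - expected)

def mean_pairwise_kappa(judgments):
    """Average Cohen's kappa over all pairs of judges."""
    pairs = list(combinations(judgments, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)

def majority_sizes(judgments):
    """For each instance, the size of the largest group of agreeing judges."""
    return [max(Counter(labels).values()) for labels in zip(*judgments)]
```

Counting the instances whose majority size is at least three, or exactly four, gives multiway agreement figures of the kind reported above.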

580:

581:

582: \subsection{Evaluation of the generated paraphrases}

583:

584: Finally, we evaluated the quality of the paraphrase sentences generated by our

585: system, thus (indirectly) testing all the system components: pattern selection,

586: paraphrase acquisition, and generation.  We are not aware of another system

587: generating sentence-level paraphrases.  Therefore, we used as a baseline a

588: simple paraphrasing system that just replaces words with one of their

589: randomly-chosen WordNet synonyms (using the most frequent sense of the word

590: that WordNet listed synonyms for). The number of substitutions was set

591: proportional to the number of words our method replaced in the same sentence.

592: The point of this comparison is to check whether simple synonym substitution

593: yields results comparable to those of our algorithm.\footnote{We chose not

594:   to employ a language model to re-rank either system's output because such an

595:   addition would make it hard to isolate the contribution of the paraphrasing

596:   component itself.  }
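A baseline of this kind can be sketched in a few lines. The sketch below is an illustration only: the small \texttt{SYNONYMS} table is a hypothetical stand-in for a WordNet lookup restricted to the most frequent sense, and the function name and parameters are assumptions rather than the code actually used.

```python
import random

# Hypothetical stand-in for a WordNet lookup (most frequent sense only).
SYNONYMS = {
    "group": ["grouping"],
    "attack": ["onslaught", "onset"],
    "claimed": ["laid claim"],
}

def synonym_baseline(words, n_substitutions, synonyms=SYNONYMS, rng=None):
    """Replace up to n_substitutions words with a randomly chosen synonym."""
    rng = rng or random.Random()
    # Only words that have at least one listed synonym can be replaced.
    candidates = [i for i, w in enumerate(words) if synonyms.get(w.lower())]
    chosen = rng.sample(candidates, min(n_substitutions, len(candidates)))
    output = list(words)
    for i in chosen:
        output[i] = rng.choice(synonyms[words[i].lower()])
    return output

sentence = "A spokesman for the group claimed responsibility for the attack".split()
print(" ".join(synonym_baseline(sentence, 2, rng=random.Random(0))))
```

Here \texttt{n\_substitutions} would be set in proportion to the number of words that the lattice-based method replaced in the same sentence.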

597:

598:

599:

600: \begin{figure*}[htpb]\footnotesize

601: \hspace*{-.2in}

602:    \begin{tabular}{|l|l|}    \hline

603:       Original (1) & {\em The caller identified the bomber as Yussef Attala, 20, from the

604:       Balata refugee camp near Nablus.}                  \\\hline

605:       MSA  &  The caller named the bomber as 20-year old Yussef Attala from the

606:       Balata refugee camp near Nablus.                    \\\hline

607:       Baseline & The company placed the bomber as Yussef Attala, 20, from the

608:       Balata refugee camp near Nablus.    \\\hline \hline

609:       Original (2) & {\em A spokesman for the group claimed responsibility for the attack

610:       in a phone call to AFP in this northern West Bank town}. \\\hline

611:       MSA  & The attack in a phone call to AFP in this northern West Bank town

612:       was claimed by a spokesman of the group.                     \\\hline

613:       Baseline & \parbox[t]{6in}{A spokesman for the grouping laid claim

614:       responsibility for the onslaught in a phone call to AFP

615:       in this northern West Bank town. } \\\hline

616:     \end{tabular}

617:     \caption{Example sentences and generated paraphrases. Both judges felt

618:     MSA preserved the meaning of (1) but not (2), and that neither

619:     baseline paraphrase was meaning-preserving.}

620:     \label{fig:WordNet}

621: \end{figure*}

622:

623:

624:

625: \newcommand{\results}[2]{#2\xspace} For this experiment, we randomly selected

626: 20 AFP articles about violence in the Middle East published later than the

627: articles in our training corpus.  Out of 484 sentences in this set, our system

628: was able to paraphrase 59 (12.2\%).  (We chose parameters that optimized

629: precision rather than recall on our small held-out set.)  We found that after

630: proper name substitution, only seven sentences in the test set appeared in the

631: training set,\footnote{Since we are doing unsupervised paraphrase acquisition,

632:   train-test overlap is allowed.}  which implies that \lattices boost the

633: generalization power of our method significantly: from seven to 59 sentences.

634: Interestingly, the coverage of the system varied significantly with article

635: length.  For the eight articles of ten or fewer sentences, we paraphrased

636: 60.8\% of the sentences per article on average, but for longer articles only

637: 9.3\% of the sentences per article on average were paraphrased.  Our analysis

638: revealed that long articles tend to include large portions that are unique to

639: the article, such as personal stories of the event participants, which explains

640: why our algorithm had a lower paraphrasing rate for such articles.

641:

642:

643:

644: All 118 instances (59 per system) were presented in random order to two judges,

645: who were asked to indicate whether the meaning had been preserved.  Of the

646: paraphrases generated by our system, the two evaluators deemed

647: \results{59-11}{81.4\%} and \results{59-13}{78.0\%}, respectively, to be valid,

648: whereas for the baseline system, the correctness results were

649: \results{59-18}{69.5\%} and \results{59-20}{66.1\%}, respectively. Agreement

650: according to the Kappa statistic was 0.6.  Note that judging full sentences is

651: inherently easier than judging templates, because template comparison requires

652: considering a variety of possible slot values, while sentences are

653: self-contained units.
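The coverage and validity percentages in this subsection follow from simple counts, as the short check below shows. It assumes that the first argument of each \verb|\results| call (for example, 59-11) records the number of instances minus the number judged invalid; that reading is an assumption, as are the variable names.

```python
TOTAL_SENTENCES = 484   # sentences in the 20 test articles
PARAPHRASED = 59        # sentences the system was able to paraphrase

# Number of paraphrases each of the two judges rejected (assumed reading
# of the 59-11, 59-13, 59-18, and 59-20 annotations in the source).
REJECTED = {"MSA": (11, 13), "baseline": (18, 20)}

def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

coverage = percent(PARAPHRASED, TOTAL_SENTENCES)
validity = {
    system: tuple(percent(PARAPHRASED - r, PARAPHRASED) for r in rejected)
    for system, rejected in REJECTED.items()
}

print(coverage)
print(validity)
```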

654:

655: Figure~\ref{fig:WordNet} shows two example sentences, one where our MSA-based

656: paraphrase was deemed correct by both judges, and one where both judges deemed

657: the MSA-generated paraphrase incorrect.  Examination of the results indicates

658: that the two systems make essentially orthogonal types of errors. The baseline

659: system's relatively poor performance supports our claim that whole-sentence

660: paraphrasing is a hard task even when accurate word-level paraphrases are

661: given.

662:

663:

664: \section{Conclusions}

665:

666: We presented an approach for generating sentence-level

667: paraphrases, a task not addressed previously. Our method learns

668: structurally similar patterns of expression from data and identifies

669: paraphrasing pairs among them using a comparable corpus. A flexible

670: pattern-matching procedure allows us to paraphrase an unseen sentence by

671: matching it to one of the induced patterns. Our approach

672: generates both lexical and structural paraphrases.

673:

674: Another contribution is the induction of MSA lattices from non-parallel data.

675: Lattices have proven advantageous in a number of NLP contexts

676: \cite{Mangu&Brill&Stolcke:00a,Bangalore&Murdock&Riccardi:2002a,Barzilay&Lee:2002a,Pang+Knight+Marcu:03a},

677: but were usually produced from \mbox{(multi-)p}arallel data, which may not be

678: readily available for many applications.  We showed that word lattices can be

679: induced from a type of corpus that can be easily obtained for many domains,

680: broadening the applicability of this useful representation.

681:

682: \vspace*{-.1in}

683:

684:

685: \section*{Acknowledgments}

686:

687: {\footnotesize{

688:     We are grateful to many people for helping us in this work.  We thank

689:     Stuart Allen, Itai Balaban, Hubie Chen, Tom Heyerman, Evelyn Kleinberg,

690:     Carl Sable, and Alex Zubatov for acting as judges.  Eric Breck helped us

691:     with translating the output of the DIRT system.  We had numerous very

692:     useful conversations with all those mentioned above and with Eli Barzilay,

693:     Noemie Elhadad, Jon Kleinberg (who made the ``pigeonhole'' observation),

694:     Mirella Lapata, Smaranda Muresan and Bo Pang.  We are very grateful to

695:     Dekang Lin for providing us with DIRT's output.  We thank the Cornell NLP

696:     group, especially Eric Breck, Claire Cardie, Amanda Holland-Minkley, and Bo

697:     Pang, for helpful comments on previous drafts.  This paper is based upon

698:     work supported in part by the National Science Foundation under ITR/IM

699:     grant IIS-0081334 and a Sloan Research Fellowship.  Any opinions, findings,

700:     and conclusions or recommendations expressed above are those of the authors

701:     and do not necessarily reflect the views of the National Science Foundation

702:     or the Sloan Foundation.

703:

704: \vspace*{-.2in}

705:

706:

707:

708:

709: \bibliographystyle{llee-fullname}

710:

711: \begin{thebibliography}{}

712:

713: \bibitem[\protect\citename{Bangalore, Murdock, and

714:   Riccardi}2002]{Bangalore&Murdock&Riccardi:2002a}

715: Bangalore, Srinivas, Vanessa Murdock, and Giuseppe Riccardi.

716: \newblock 2002.

717: \newblock Bootstrapping bilingual data using consensus translation for a

718:   multilingual instant messaging system.

719: \newblock In {\em \proc of COLING}.

720:

721: \bibsnip

722:

723: \bibitem[\protect\citename{Barzilay and Lee}2002]{Barzilay&Lee:2002a}

724: Barzilay, Regina and Lillian Lee.

725: \newblock 2002.

726: \newblock Bootstrapping lexical choice via multiple-sequence alignment.

727: \newblock In {\em \proc of EMNLP}, pages 164--171.

728:

729: \bibsnip

730:

731: \bibitem[\protect\citename{Barzilay and McKeown}2001]{Barzilay&McKeown:01a}

732: Barzilay, Regina and Kathleen McKeown.

733: \newblock 2001.

734: \newblock Extracting paraphrases from a parallel corpus.

735: \newblock In {\em \proc of the ACL/EACL}, pages 50--57.

736:

737: \bibsnip

738:

739: \bibitem[\protect\citename{Chandrasekar and

740:   Bangalore}1997]{Chandrasekar+Srinivas:97a}

741: Chandrasekar, Raman and Srinivas Bangalore.

742: \newblock 1997.

743: \newblock Automatic induction of rules for text simplification.

744: \newblock {\em Knowledge-Based Systems}, 10(3):183--190.

745:

746: \bibsnip

747:

748: \bibitem[\protect\citename{Dras}1999]{Dras:1999a}

749: Dras, Mark.

750: \newblock 1999.

751: \newblock {\em Tree Adjoining Grammar and the Reluctant Paraphrasing of Text}.

752: \newblock {Ph.D.} thesis, Macquarie University.

753:

754: \bibsnip

755:

756: \bibitem[\protect\citename{Durbin \bgroup et al.\egroup

757:   }1998]{Durbin+Eddy+al:98a}

758: Durbin, Richard, Sean Eddy, Anders Krogh, and Graeme Mitchison.

759: \newblock 1998.

760: \newblock {\em Biological Sequence Analysis}.

761: \newblock Cambridge University Press, Cambridge, UK.

762:

763: \bibsnip

764:

765: \bibitem[\protect\citename{Grefenstette}1994]{Grefenstette:94a}

766: Grefenstette, Gregory.

767: \newblock 1994.

768: \newblock {\em Explorations in Automatic Thesaurus Discovery}, volume 278.

769: \newblock Kluwer.

770:

771: \bibsnip

772:

773: \bibitem[\protect\citename{Iordanskaja, Kittredge, and

774:   Polguere}1991]{Iordanskaja&Kittredge&Polguere:1991a}

775: Iordanskaja, L., R.~Kittredge, and A.~Polguere.

776: \newblock 1991.

777: \newblock Lexical selection and paraphrase in a meaning-text generation model.

778: \newblock In C.~Paris, W.~Swartout, and W.~Mann, editors, {\em Natural Language

779:   Generation in Artificial Intelligence and Computational Linguistics}. Kluwer,

780:   chapter~11.

781:

782: \bibsnip

783:

784: \bibitem[\protect\citename{Jacquemin}1999]{Jacquemin:l999a}

785: Jacquemin, Christian.

786: \newblock 1999.

787: \newblock Syntagmatic and paradigmatic representations of term variations.

788: \newblock In {\em \proc of the ACL}, pages 341--349.

789:

790: \bibsnip

791:

792: \bibitem[\protect\citename{Knight and Marcu}2000]{Knight&Marcu:2000a}

793: Knight, Kevin and Daniel Marcu.

794: \newblock 2000.

795: \newblock Statistics-based summarization --- {Step} one: Sentence compression.

796: \newblock In {\em \proc of AAAI}.

797:

798: \bibsnip

799:

800: \bibitem[\protect\citename{Landis and Koch}1977]{Landis&Koch:1977a}

801: Landis, J.~Richard and Gary~G. Koch.

802: \newblock 1977.

803: \newblock The measurement of observer agreement for categorical data.

804: \newblock {\em Biometrics}, 33:159--174.

805:

806: \bibsnip

807:

808: \bibitem[\protect\citename{Lin}1998]{Lin:1998a}

809: Lin, Dekang.

810: \newblock 1998.

811: \newblock {Automatic retrieval and clustering of similar words}.

812: \newblock In {\em \proc of ACL/COLING}, pages 768--774.

813:

814: \bibsnip

815:

816: \bibitem[\protect\citename{Lin and Pantel}2001]{Lin&Pantel:2001a}

817: Lin, Dekang and Patrick Pantel.

818: \newblock 2001.

819: \newblock Discovery of inference rules for question-answering.

820: \newblock {\em Natural Language Engineering}, 7(4):343--360.

821:

822: \bibsnip

823:

824: \bibitem[\protect\citename{Mangu, Brill, and

825:   Stolcke}2000]{Mangu&Brill&Stolcke:00a}

826: Mangu, Lidia, Eric Brill, and Andreas Stolcke.

827: \newblock 2000.

828: \newblock Finding consensus in speech recognition: Word error minimization and

829:   other applications of confusion networks.

830: \newblock {\em Computer Speech and Language}, 14(4):373--400.

831:

832: \bibsnip

833:

834: \bibitem[\protect\citename{McKeown}1979]{McKeown:79a}

835: McKeown, Kathleen~R.

836: \newblock 1979.

837: \newblock Paraphrasing using given and new information in a question-answer

838:   system.

839: \newblock In {\em \proc of the ACL}, pages 67--72.

840:

841: \bibsnip

842:

843: \bibitem[\protect\citename{McKeown, Kukich, and

844:   Shaw}1994]{McKeown&Kukich&Shaw:1994a}

845: McKeown, Kathleen~R., Karen Kukich, and James Shaw.

846: \newblock 1994.

847: \newblock Practical issues in automatic documentation generation.

848: \newblock In {\em \proc of ANLP}, pages 7--14.

849:

850: \bibsnip

851:

852: \bibitem[\protect\citename{Meteer and Shaked}1988]{Meteer+Shaked:88a}

853: Meteer, Marie and Varda Shaked.

854: \newblock 1988.

855: \newblock Strategies for effective paraphrasing.

856: \newblock In {\em \proc of COLING}, pages 431--436.

857:

858: \bibsnip

859:

860: \bibitem[\protect\citename{Och, Tillmann, and Ney}1999]{Och&Tillman&Ney:1999a}

861: Och, Franz~Josef, Christoph Tillmann, and Hermann Ney.

862: \newblock 1999.

863: \newblock Improved alignment models for statistical machine translation.

864: \newblock In {\em \proc of EMNLP}, pages 20--28.

865:

866: \bibsnip

867:

868: \bibitem[\protect\citename{Pang, Knight, and Marcu}2003]{Pang+Knight+Marcu:03a}

869: Pang, Bo, Kevin Knight, and Daniel Marcu.

870: \newblock 2003.

871: \newblock Syntax-based alignment of multiple translations: Extracting

872:   paraphrases and generating new sentences.

873: \newblock In {\em \proc of HLT/NAACL}.

874:

875: \bibsnip

876:

877: \bibitem[\protect\citename{Papineni \bgroup et al.\egroup

878:   }2002]{Papineni&al:2002a}

879: Papineni, Kishore~A., Salim Roukos, Todd Ward, and Wei-Jing Zhu.

880: \newblock 2002.

881: \newblock Bleu: A method for automatic evaluation of machine translation.

882: \newblock In {\em \proc of the ACL}, pages 311--318.

883:

884: \bibsnip

885:

886: \bibitem[\protect\citename{Pereira, Tishby, and

887:   Lee}1993]{Pereira&Tishby&Lee:1993a}

888: Pereira, Fernando, Naftali Tishby, and Lillian Lee.

889: \newblock 1993.

890: \newblock Distributional clustering of {English} words.

891: \newblock In {\em \proc of the ACL}, pages 183--190.

892:

893: \bibsnip

894:

895: \bibitem[\protect\citename{Robin}1994]{Robin-phd}

896: Robin, Jacques.

897: \newblock 1994.

898: \newblock {\em Revision-Based Generation of Natural Language Summaries

899:   Providing Historical Background: Corpus-Based Analysis, Design,

900:   Implementation, and Evaluation}.

901: \newblock {Ph.D.} thesis, Columbia University.

902:

903: \bibsnip

904:

905: \bibitem[\protect\citename{Shinyama \bgroup et al.\egroup

906:   }2002]{Shinyama&al:2002a}

907: Shinyama, Yusuke, Satoshi Sekine, Kiyoshi Sudo, and Ralph Grishman.

908: \newblock 2002.

909: \newblock Automatic paraphrase acquisition from news articles.

910: \newblock In {\em \proc of HLT}, pages 40--46.

911:

912: \bibsnip

913:

914: \bibitem[\protect\citename{Smadja and McKeown}1991]{Smadja&McKeown:1991a}

915: Smadja, Frank and Kathleen McKeown.

916: \newblock 1991.

917: \newblock Using collocations for language generation.

918: \newblock {\em Computational Intelligence}, 7(4).

919: \newblock Special issue on natural language generation.

920:

921: \bibsnip

922:

923: \bibitem[\protect\citename{Vogel and Ney}2000]{Vogel&Ney:2000a}

924: Vogel, Stephan and Hermann Ney.

925: \newblock 2000.

926: \newblock Construction of a hierarchical translation memory.

927: \newblock In {\em \proc of COLING}, pages 1131--1135.

928:

929: \bibsnip

930:

931: \bibitem[\protect\citename{Wang}1998]{Wang:1998a}

932: Wang, Ye-Yi.

933: \newblock 1998.

934: \newblock {\em Grammar Inference and Statistical Machine Translation}.

935: \newblock {Ph.D.} thesis, CMU.

936:

937: \end{thebibliography}

938:

939: }

940: }

941:

942: \section*{Appendix}

943:

944: In this appendix, we describe how we insert slots into \lattices to

945: form \slotlats.

946:

947: Recall that the backbone nodes in our \lattices represent words appearing in

948: many of the sentences from which the lattice was built.  As mentioned above,

949: the intuition is that areas of high variability between backbone nodes may

950: correspond to arguments, or slots.  But the key thing to note is that there are

951: actually two different phenomena giving rise to multiple parallel paths: {\em

952:   argument variability}, described above, and {\em synonym variability}.  For

953: example, Figure~\ref{fig:variability}(b) contains parallel paths corresponding

954: to the synonyms ``injured'' and ``wounded''.  Note that we want to {\em remove}

955: argument variability so that we can generate paraphrases of sentences with

956: arbitrary arguments; but we want to {\em preserve} synonym variability in order

957: to generate a variety of sentence rewritings.

958:

959: To distinguish  these two situations, we analyze the {\em split

960: level} of \backbone nodes that begin regions with multiple paths. The

961: basic intuition is that there is probably more variability associated

962: with arguments than with

963: synonymy: for example, as datasets increase, the number of locations

964: mentioned rises faster than the number of synonyms appearing. We make

965: use of a

966: {\em synonymy threshold} $s$ (set by held-out parameter-tuning

967:  to  30), as follows.

968:

969: \begin{itemize}

970: \item If no more than $s$\% of all the edges out of a \backbone node

971:  lead to the same next node, we have high enough variability to

972: warrant inserting a {\slot} node.

973: \item Otherwise, we incorporate reliable synonyms\footnote{While our original

974:     implementation, evaluated in Section~\ref{sec:eval}, identified only

975:     single-word synonyms, phrase-level synonyms can similarly be acquired by

976:     considering chains of nodes connecting backbone nodes.}  into the \backbone

977:   structure by preserving all nodes that are reached by at least $s$\% of the

978:   sentences passing through the two neighboring \backbone nodes.

979: \end{itemize}

980: Furthermore, all \backbone nodes

981: labelled with our special generic tokens are

982: also replaced with \slot nodes, since they, too, probably represent arguments

983: (we condense adjacent \slots into one).  Nodes with in-degree lower than the

984: synonymy threshold are removed under the assumption that they probably

985: represent idiosyncrasies of individual sentences.  See Figure

986: \ref{fig:variability} for examples.
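The threshold test described above can be sketched as follows. This is a simplified illustration, not the original implementation: it assumes that each sentence contributes exactly one edge out of the backbone node, so that edge shares and sentence shares coincide, and the function name, return format, and toy counts are hypothetical (the counts are not those of the figure).

```python
def classify_region(next_node_counts, s=30.0):
    """Decide whether the region after a backbone node becomes a slot.

    next_node_counts maps each next node to the number of sentences whose
    path leaves the backbone node through it; s is the synonymy threshold
    expressed as a percentage.
    """
    total = sum(next_node_counts.values())
    shares = {node: 100.0 * count / total
              for node, count in next_node_counts.items()}
    if max(shares.values()) <= s:
        # No next node attracts more than s% of the edges: argument variability.
        return "slot", []
    # Synonym variability: keep only nodes reached by at least s% of sentences.
    kept = sorted(node for node, share in shares.items() if share >= s)
    return "synonyms", kept

# Toy seven-sentence regions with made-up counts.
print(classify_region({"station": 1, "school": 1, "market": 2, "cafe": 1, "bus": 2}))
print(classify_region({"injured": 3, "wounded": 3, "hurt": 1}))
```

In the first toy region no next node exceeds 30\% of the edges, so a slot is inserted; in the second, the two frequent alternatives are kept as synonyms and the rare one is dropped.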

987:

988: Figure~\ref{fig:lattice} shows an example of a

989: \lattice and the  \slotlat derived via the process just described.

990:

991:

992: \begin{figure}[h]

993: \epsfscaledbox{variability.eps}{2.8in}

994: \caption{\label{fig:variability} Simple seven-sentence examples of two types of

995: variability.  The double-boxed nodes are \backbone nodes; edges show

996: consecutive words in some sentence. The synonymy threshold (set to 30\%

997: in this example)

998: determines the type of variability. }

999: \end{figure}

1000:

1001:

1002:

1003:

1004:

1005: \end{document}

1006: