0207:cs0207083/eval.tex

1: \documentclass{article}

2: \usepackage{proceedings}

3: \usepackage{latexsym}

4: \usepackage{amsfonts}

5: \usepackage{named}

6:

7: \newtheorem{definition}{Definition}

8: \newtheorem{theorem}{Theorem}

9: \newtheorem{ex}{Example}

10: \newenvironment{example}{\begin{ex} \rm}{$\Box$ \end{ex}}

11:

12: \newcommand{\Th}{{\bf Th}}

13: \newcommand{\default}[3]{\frac{#1:#2}{#3}}

14:

15: \newcommand{\lc}{\ulcorner}

16: \newcommand{\rc}{\urcorner}

17:

18:

19: \newcommand{\note}[1]{\bigskip[{\em #1}]\bigskip}

20:

21: \title{Evaluating Defaults}

22: \author{

23: {\bf Henry E. Kyburg, Jr.}$^{1,2}$\\

24: {\tt kyburg@cs.rochester.edu}

25: \vspace{1ex}\\

26: $^{1}$Computer Science and Philosophy\\

27: University of Rochester\\

28: Rochester NY 14627, USA

29: \And

30: {\bf Choh Man Teng}$^{2}$\\

31: {\tt cmteng@ai.uwf.edu}

32: \vspace{1ex}\\

33: $^{2}$Institute for Human and Machine Cognition\\

34: University of West Florida\\

35: Pensacola FL 32501, USA

36: }

37:

38: \begin{document}

39:

40: \maketitle

41:

42: \begin{abstract}

43: We seek to find normative criteria of adequacy for nonmonotonic logic similar

44: to the criterion of validity for deductive logic.  Rather than stipulating that

45: the conclusion of an inference be true in all models in which the premises are

46: true, we require that the conclusion of a nonmonotonic inference be true in

47: ``almost all'' models of a certain sort in which the premises are true.  This

48: ``certain sort'' specification picks out the models that are relevant to the

49: inference, taking into account factors such as specificity and vagueness,

50: and previous inferences.  The frequencies

51: characterizing the relevant models reflect known frequencies in our actual

52: world.  The criteria of adequacy for a default

53: inference can be extended by thresholding to criteria of adequacy for an

54: extension.  We show that this avoids the implausibilities that might otherwise

55: result from the chaining of default inferences.  The model proportions,

56: when construed in terms of frequencies, provide a verifiable grounding of

57: default rules, and can become the basis for generating default rules from

58: statistics.

59:

60:

61: \vskip 2ex

62: \noindent

63: {\em Keywords}:

64: probability, frequency, default logic

65: \end{abstract}

66:

67:

68: \section{Introduction}

69:

70: Nonmonotonic reasoning, for example default logic~\cite{Reiter80},

71: models the intuitive process of making non-deductive

72: inferences in the face of certain supportive but not conclusive evidence.

73: Given a default theory $\Delta=\langle D,F \rangle$, we can obtain its

74: extensions by following a prescribed set of steps.

75: However, on what grounds do we employ a particular default rule?

76: Some writers would

77: regard this as an inappropriate question, since they take as their goal the

78: representation of human inference.

79: To this end defaults represent rules that we take to be intuitively

80: appropriate.  But then when we apply these rules,

81: we may be led to counterintuitive

82: results~\cite{ReiterC81,Lukaszewicz88,Poole89}.

83: The underlying principle seems to be circular: the original default

84: rules are ``intuitively good'' at first glance, but when we discover that they

85: do not give rise to the desired results, we tweak the rules until they give us

86: those results. It seems that we have to know what results we want first before

87: constructing the default theory, rather than having the default theory tell us

88: what conclusions are warranted. This is precisely the reason why we need an

89: independent measure of validity for default rules and default extensions.

90: We think of nonmonotonic

91: logic as sharing the normative character of other logics.  From this point of

92: view default rules require some defense.

93: We will concentrate on default logic here, though much of

94: what we have to say will apply to other nonmonotonic approaches as well.

95: Much

96: of the work on nonmonotonic logic has concerned the syntactic manipulation

97: of the nonmonotonic rules,

98: rather than their basic justification.

99:

100:

101:

102: \subsection{Selective Preference}

103: For a {\em default rule\/}

104: $d=\default{\alpha}{\beta_1, \ldots, \beta_n}{\gamma}$,

105: $\alpha$ is the {\em prerequisite\/},

106: $\beta_1, \ldots, \beta_n$ are the {\em justifications\/},

107: and $\gamma$ is the {\em consequent\/} of $d$.

108: Loosely speaking, the rule

109: conveys the idea that if $\alpha$ is provable, and

110: $\neg\beta_1, \ldots, \neg\beta_n$ are

111: each

112: not provable,

113: then  by default  we conclude that $\gamma$ is true.

114: A {\em default theory\/} is an ordered pair

115: $\langle D,F \rangle$, where

116: $D$ is a set of default rules and $F$ is a set of

117: ``facts''.

118: A theory extended from $F$ by applying the default rules in $D$

119: is known as an {\em extension\/} of the default theory.

120:

121: Consider the following canonical example.

122:

123:

124: \begin{example}

125: We have a default theory $\Delta=\langle D, F \rangle$, where

126: \[\begin{array}{rcl}

127: D &=& \{\default{R(x)}{T(x)}{T(x)},

128:      \default{S(x)}{\neg T(x)}{\neg T(x)}\}, \\

129: F &=& \{R(a),S(a)\}.

130: \end{array}\]

131:

132: We get two extensions, one containing $T(a)$ and the other containing

133: $\neg T(a)$.

134: If we take ``$R(x)$'' to mean that $x$ is a bird, ``$S(x)$'' to mean that

135: $x$ is a penguin, and ``$T(x)$'' to mean that $x$ flies, then we would like

136: to reject the extension containing ``$T(a)$'' ($a$ flies) in favor of the

137: extension containing ``$\neg T(a)$'' ($a$ does not fly).

138: However, if we take ``$S(x)$'' to mean that $x$ is an animal instead and

139: keep ``$R(x)$'' and ``$T(x)$'' the same, we would want to reverse our preference.

140: Now the extension containing ``$T(a)$'' ($a$ flies) seems better.

141: \end{example}

142: Note that each of the default rules involved in the example above

143: is intuitively appealing

144: when viewed by itself against our background knowledge:

145:  birds fly;

146: penguins do not fly;

147: and animals in general do not fly either.

148: Moreover, both instantiations (penguins and animals) are syntactically

149: identical.  Thus, we cannot decide to prefer one default rule over

150: the other simply by looking at their syntactic structures.

151:

152: It is the interaction between the default rules and evidence

153: that gives rise to the selective preference above.  We have

154: the evidence that ``$a$ is a bird''.

155: If in addition we also have ``$a$ is a penguin'',

156: we prefer the penguin rule.

157: If instead we have ``$a$ is an animal'', we prefer the bird rule.

158:

159: There are several approaches to circumventing this conceptual difficulty.

160: The first is to revise the default theory so that the desired result

161: is achieved~\cite{ReiterC81}.  We can amend the default rules

162: by adding the exceptions as justifications, for example

163: $\default{B(x)}{F(x), \neg P(x)}{F(x)}$ and

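164: $\default{A(x)}{\neg F(x), \neg B(x)}{\neg F(x)}$, where $B$, $P$, $A$, and $F$ stand for ``is a bird'', ``is a penguin'', ``is an animal'', and ``flies''.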

165: With this approach we have to constantly revise the default rules

166: to take into account additional exceptions.

167: We have little guidance in constructing the list of justifications

168: except that the resulting default rule has to produce the ``right''

169: answer in the given situation.

170:

171:

172: Another approach is to establish some priority structure over the

173: set of defaults.

174: For example, we can refer to a specificity or inheritance hierarchy

175: to determine which default rule should be used in case of a

176: conflict~\cite{Touretzky84,HortyTT87}.

177: The penguin rule is more specific than the bird rule, when both are

178: applicable, and therefore we use the penguin rule and not the bird rule.

179: However, conflicting rules do not always fit into neat hierarchies

180: (for example, adults are employed, students are not, how about adult

181: students?~\cite{ReiterC81}).  It is not obvious how we can extend

182: the hierarchical structure without resorting to explicitly enumerating

183: the priority relations between the default rules~\cite{Brewka89,Brewka94}.

184:

185: The third approach is to appeal to probabilistic

186: analysis.  Defaults are interpreted as

187:  representing  properties of conditional probabilities.

188: For example, the conditional probability of $a$ being able to fly given

189: that $a$ is a bird is ``high''~\cite{Pearl88,Pearl90}

190: or increases from the prior probability~\cite{NeufeldPA90},

191: while the conditional probability of $a$ being able to fly given that $a$ is

192: a penguin

193: is ``low'' or decreases.  This approach provides a probabilistic semantics

194: for default rules, but in a way which does not represent the fact that

195:  the conclusions are {\em accepted\/}.  The default conclusion is ``Tweety

196: flies,'' not ``Probably Tweety flies.''

197:   This is in contrast to the spirit of

198: nonmonotonic reasoning: default conclusions should be accepted as new facts,

199: and we should be able to chain default rules

200: and build upon the conclusions of previous default applications

201: to obtain further conclusions.

202:

203:

204: \subsection{Justifying Nonmonotonic Inference}

205:

206: The justification of beliefs is a long-standing issue in epistemology.  There

207: is

208: not much that is problematic about the justification of beliefs obtained by

209: deductive inference (though there are plenty of problems that surround

210: deduction --- see

211: \cite{kyburg.justification,haack.justification,dummett.justification}, not to

212: mention the voluminous literature on paraconsistent logic \cite{priest}).  The

213: reason is that we can show that the ordinary rules of deduction lead from

214: premises to conclusions that are true in every model in which the premises are

215: true.  This is exactly what is not true of ampliative inference, and it is what

216: has led some writers (e.g., \cite{morgan}) to deny that there {\em is} any such

217: thing as a nonmonotonic logic.  This has been disputed in \cite{kyburg.2001}.

218:

219: But other kinds of justifications of beliefs have been proposed.  Isaac Levi

220: \cite{levi.gambling,levi.enterprise,levi.argument} has argued for many years

221: that the way to understand ampliative (inductive, nonmonotonic) argument is in

222: terms of decision theory: we choose (decide) to accept a hypothesis in a given

223: context provided that the expected epistemic utility of doing so in that

224: context

225: is greater than the expected utility of any other epistemic act, such as

226: suspending belief totally, or accepting a stronger hypothesis.

227:

228: Levi's approach employs a rich and detailed structure for acceptance, and

229: allows drawing many important distinctions.    This structure requires three

230: things that make it less than perfect as a vehicle for ordinary nonmonotonic

231: inferential systems.  First, in keeping with a long tradition in pragmatism

232: \cite{dewey,peirce,james} the context of inquiry must be tied to a specific

233: problem: We need the answer to a question.  Second, the epistemic expectation of an

234: answer is the expected value of the {\em information} contained in that

235: answer.  Thus we need to presuppose an information measure on the language of

236: our inquiry \cite[p. 169]{levi.argument}. Third, we need to have available a

237: credal or inductive probability,  based on a measure (or

238: convex set of measures) on the sentences of the language, in terms of which a

239: conditional probability (or convex set of conditional probabilities)

240: can be defined  \cite[p. 52]{levi.enterprise}.

241:

242: We believe that,

243: in some contexts in which we might wish to use nonmonotonic

244: mechanisms, this overhead is unnecessary, and perhaps itself difficult to

245: justify.  We would also like to be able to explicate the justification of

246: inference in a less context-dependent way.

247:

248: Another approach that has attracted considerable

249: attention in the philosophical community in recent years is that of

250: ``reliabilism'' whose best known exponent is Alvin Goldman \cite{goldman}.

251: According to this view, what justifies a belief is the fact that it is obtained

252: by a ``reliable cognitive process\ldots'' \cite[p. 20]{goldman}.  Of course there

253: are a number of additional hedges to the view that are required for philosophical

254: accuracy, and even with those hedges there remains a certain vagueness in the

255: view.  These details need not detain us, since we are seeking inspiration

256: rather than philosophical precision.

257:

258: What does ``reliable'' mean?

259: We will construe reliability in terms of frequency or propensity to yield truth

260:  when applied.

261: Specifically, we will say that the belief $\phi$ is nonmonotonically justified

262: by a default rule if the rule would frequently lead to truth and

263: rarely to error, given what we know --- given our background knowledge.

264:

265: A deductive argument is justified (valid) if its conclusion is true in every

266: model of its premises.

267: We will attempt to provide an analog of the justification of deductive rules:

268: a default argument is justified if its

269: conclusion is true in a high proportion of the relevant models in which its premises are

270: true.

271: To make this idea precise requires an excursion into model theory.

272:

273:

274: \section{Model Theory}

275:

276: We will suppose that the underlying object language is a first order

277:  language that does not involve such intensional predicates

278: as ``know'' or ``believe.''  A number of nonmonotonic formalisms

279: (specifically autoepistemic logic~\cite{Moore85})

280: do involve such locutions within the object

281: language, but they can be dispensed with

282:  in default logic.  The default rule

283: $\default{\alpha}{\beta_1,\ldots, \beta_n}{\gamma}$ can be read in terms

284: of the nonmembership of $\ulcorner \neg \beta_i \urcorner$ in a specified set

285: of

286: expressions $\Gamma$. In original default logic, $\Gamma$ would

287: just be an extension.

288:

289:

290:  There are a number of immediate problems associated

291: with the idea of looking at the ``proportion'' of models.

292: The least of them is choosing a {\em level\/} at

293: which to regard the evidence as adequate.  Should we require that the

294: proportion be 0.95?  Or 0.99?  Or 0.995?  This is

295: just the sort of question that

296: arises in statistical hypothesis testing or in confidence interval estimation.

297: We shall suppose that in a given context there is some

298: agreed-upon level of security $\delta$; we will accept a conclusion if the

299: proportion of models in which we could be committing an error is no greater

300: than $\delta$.

301:

302: This approach is to be contrasted with those of Adams \cite{adams,adams.1966},

303: Pearl \cite{Pearl88}, and Bacchus et al.\ \cite{BacchusGHK93}.  Adams requires that

304: for

305: $A$ to be a reasonable consequence of the set of sentences $S$, for {\em any\/}

306: $\epsilon$ there must be a positive $\delta$ such that for every probability

307: function, if the probability of every sentence in $S$ is greater than $1 -

308: \delta$, then the probability of $A$ is at least $1 - \epsilon$ \cite[p.

309: 274]{adams.1966}.  Pearl's approach similarly involves quantification over

310: possible probability functions.

311: Bacchus et al.\ again take the degree of belief of a statement to be

312: the limiting proportion of first order models in which the statement is true.

313: All of these approaches involve matters that go well beyond what we may

314: reasonably suppose to be available to us as empirical enquirers.  Our $\delta$, on

315: the other hand, serves much like the $\alpha$ of statistical testing.

316:

317: We must restrict the number of models

318: under consideration to a finite  number so that the idea

319: of looking at proportions makes

320: sense.\footnote{We could, instead, seek to develop a

321: way of proceeding to a limit; this still would require restrictions to arrive

322: at a countable number of models, and would entail a large expository cost for

323: little gain in plausibility.}

324: We will be taking account of statistical information, and to this end will want

325: each model to have a finite domain.

326:   Roughly speaking, we take as a model

327: of our language one in which the domain of empirical individuals is of finite

328: cardinality.  This may be regarded as problematic (it

329: entails the falsity of ``every person has two parents  and

330: nobody is his own ancestor'') but with reasonable spatial and temporal bounding

331: it can be rendered plausible.

332:

333: Even so, to ensure that the set of models is finite

334: we must restrict the empirical domain even further.  Not only must it consist

335: of

336: a finite set of physical entities, but this same set of physical entities

337: $\mathcal D$ must be taken to be the empirical domain of every model.

338:

339: We assume that it is possible to express statistical knowledge in this

340: language.  For example, if ``$B(x)$'' is the predicate ``is a bird''

341: and

342: ``$F(x)$'' is ``can fly'', we can express the fact that

343: between 85\% and 95\% of birds fly by the formula $0.85 <

344: \frac{|\{x:B(x)

345: \land F(x)\}|}{|\{x:B(x)\}|} < 0.95$.  Employing the notation of

346: \cite{kyburg.teng} we write this as ``$\%x(F(x),B(x),0.85,0.95)$.''

347: This renders ``\%''  a variable binding operator on 4-sequences of expressions:

348: two formulas and two fractions.

349:

350:

351: We distinguish, as do Pearl and Geffner \cite[p. 70]{pearl.geffner} between

352: immediate evidence, represented by a finite set of sentences $E$ concerning

353: particular facts (to be distinguished  from the general body of factual

354: knowledge $F$ invoked by classical default logic), and a finitely axiomatizable

355: set of sentences

356: $K$ representing general background knowledge.

357:  What defaults are plausible depends, of course, on background knowledge.

358: If it were not for what we take to be the typical (or natural, or frequent)

359: behavior of birds, the world's best known example of a default rule would not

360: be

361: plausible.  On the other hand no one has proposed the default rule $\frac

362: {\mathrm{fish}(a):\mathrm{mackerel}(a)}{\mathrm{can-talk}( a)}$.

363:

364: Thus in general

365: we will represent the set of default rules of a default theory as $\Delta_K$

366: rather than $D$, since we take them to be a function of our body of general

367: knowledge $K$. Given an error tolerance

368: $\delta$, we will take a default rule to be {\em $\delta$-valid} if, for every

369: set of possible input sentences $E$ consistent with $K$, the application of the

370: rule to $E$ leads to a false conclusion in a

371: proportion of at most

372: $\delta$ of the relevant models.

373: More precisely, a default rule is $\delta$-valid if and only if for every set of

374: input sentences $E$ consistent with $K$ to which the rule is applicable, the

375: proportion of models of $E \cup K$ in which the conclusion of the rule is false

376: is no more than $\delta$.

377:
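Stated as a formula, and writing $\mathcal M(E \cup K)$ for the (finite) set of
models of $E \cup K$ over the fixed domain $\mathcal D$ (the symbol $\mathcal M$
is introduced here only for illustration), the condition for a rule with
consequent $\gamma$ that is applicable to $E$ reads roughly as follows:
\[
\frac{|\{m \in \mathcal M(E \cup K) : \gamma \mbox{ is false in } m\}|}
     {|\mathcal M(E \cup K)|} \leq \delta .
\]
The rule is $\delta$-valid when this bound holds for every such $E$ consistent
with $K$.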

378: To fix our ideas, let us begin with a simple example.  Suppose

379: $K$ includes a statement to the effect that at least $1 -

380: \delta$ and not more than $1 - \epsilon$ of birds fly, and nothing else; that is,

381: ``$\%x(F(x),B(x),1-\delta,1-\epsilon)$.''  Consider the rule

382: $\default{B(x)}{F(x)}{F(x)}$.

383: This

384: rule is ``applicable'' to immediate evidence

385: $E$ only if $E \cup K$ entails a sentence of the form

386: $\ulcorner B(a) \urcorner$ and no corresponding sentence of the form

387: $\ulcorner \neg F(a) \urcorner$.

388:

389: Our models have a single domain $\mathcal D$ of finite cardinality.

390: We will write ``$\mathcal I_m(\phi)$'' for the interpretation of $\phi$ in

391: the model $m$.  The constraint imposed by

392: $K$ is that for every model $m$ the proportion of objects in $\mathcal I_m(B)$ that

393: are

394: also in $\mathcal I_m(F)$ lies in

395: $[1-\delta,1-\epsilon]$.

396:

397: There are three cases.  First, suppose $E \cup K$ does not entail a sentence of

398: the form $\ulcorner B(a) \urcorner$.  Then the rule is inapplicable.  Second,

399: suppose that for some term $a$, $E \cup K$ entails

400: ``$B(a)$'' and also entails ``$\neg F(a)$''.  The rule is again inapplicable, because

401: it is blocked by the failure of a justification.  Third, suppose

402: for some term $a$, $E \cup K$ entails

403: ``$B(a)$'' but not ``$\neg F(a)$''.

404: Then $\mathcal I_m(a) \in \mathcal I_m(B)$.  There are

405: $|\mathcal I_m(B)|$ interpretations of $a$ that make $E \cup K$ true;

406: of these at least

407: a fraction $1-\delta$ make ``$F(a)$'' true.  We have said nothing about interpreting the

408: rest of the language, but however many interpretations there are (we have seen

409: to it that there are only a finite number) the proportion that renders

410: ``$F(a)$'' true will remain unchanged; it will be at least $1-\delta$.

411:  Thus, given the background knowledge that we have

412: posited, the rule is $\delta$-valid: if it is applicable it will lead to error no

413: more than $\delta$ of the time.

414:
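As a numerical illustration (the figures are hypothetical and chosen only for
convenience), take $\delta = 0.10$ and $\epsilon = 0.02$, and consider a model
$m$ with $|\mathcal I_m(B)| = 200$.  The constraint imposed by $K$ then requires
\[
0.90 \leq \frac{|\mathcal I_m(B) \cap \mathcal I_m(F)|}{200} \leq 0.98 ,
\]
so that between 180 and 196 of the 200 admissible interpretations of $a$ make
``$F(a)$'' true.  At most 20 of the 200, a proportion of $0.10 = \delta$, make
the conclusion of the rule false.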

415:

416: Now let us consider a somewhat more complex example:  Suppose we know

417: that typically birds fly, and that typically penguins don't.  If that is in our

418: background knowledge $K$, as well as ``$\forall x(P(x) \supset B(x))$'', then

419: the

420: flying default becomes $\default{B(x)}{F(x),\neg P(x)}{F(x)}$, and

421: we also have the default

422: $\default{P(x)}{\neg F(x)}{\neg F(x)}.$

423: If $E$ entails ``$P(a)$'', only the second default is applicable.  In no more

424: than

425: $\delta$ of the models of $E \cup K$ will $a$ fly, unless $E \cup K$ entails that $a$ can

426: fly.

427:

428:

429: Another example: Suppose $K$ contains vague information about the

430: frequency with which red birds fly (perhaps because we have

431: encountered few red birds).

432: Say that we know the frequency to be between

433: 0.50 and 1.0.

434: Since the interval for birds in general

435: $[1-\delta,1-\epsilon]$ is included in $[0.5, 1.0]$,

436: this additional piece of information

437: should not interfere with our inference of flying ability.

438: There is no {\em conflict\/} between the two intervals,

439: just less precision in one.  The rule about birds in general

440: can be applied to red birds.  However, if $K$ contains the knowledge

441: that

442: between 0.5 and

443: $r$ of red birds fly, where

444: $r$ is less than $1-\epsilon$,

445: then this information

446: {\em should\/} interfere.

447: In this case the general rule should be so construed that it does not

448: apply to red birds.  If $b$ is a red bird and not a penguin, no conclusion about

449:  flying ability is justified.

450:
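To illustrate with hypothetical numbers, suppose the interval for birds in
general is $[1-\delta, 1-\epsilon] = [0.90, 0.98]$.  In the first case the two
intervals are nested,
\[
[0.90, 0.98] \subseteq [0.50, 1.00] ,
\]
so the statistics for red birds are merely less precise and the general rule
still applies.  In the second case, with, say, $r = 0.95$, we have
\[
0.50 < 0.90 \quad \mbox{and} \quad 0.95 < 0.98 ,
\]
so neither $[0.50, 0.95]$ nor $[0.90, 0.98]$ includes the other.  The two
intervals conflict, and the general rule is withheld from red birds.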

451: We can arrange this by judiciously adding or deleting

452: justifications in the general bird rule, in accordance with the statistical

453: information in $K$:

454: in the first case we allow red birds; in the second we must require that we do

455: not know $a$ is red: ``$\neg R(x)$'' must be among the justifications of

456: the rule.  This statistical approach provides exactly the normative guidance

457: that is lacking in the ad-hoc

458: approach of tweaking default rules in order to arrive at the ``intuitive''

459: results.

460:

461: More generally, we can give recipes for constructing $\delta$-valid defaults

462: for conclusions of the form $\lc \phi (a) \rc$ from background knowledge $K$

463: and immediate evidence $E$.\footnote{

464: Of course any default conclusion can be given this form, particularly if we allow

465: the term $a$ to be an $n$-sequence of terms.  Furthermore, any such conclusion

466: can be taken to be an instance of the consequent of a statistical generalization,

467: in virtue of the fact that statistical generalizations merely impose bounds. We

468: are  not imposing serious limitations on default rules.  For details, see

469: \cite{kyburg.teng}.}

470: Let $K$ contain $\ulcorner

471: \%x(\phi(x),\psi(x),p,q) \urcorner$ and

472: $\ulcorner

473: \%x(\phi(x),\psi'(x),p',q') \urcorner$.

474: We consider three cases:

475: \begin{enumerate}

476: \item  $K$ entails $\ulcorner\forall x

477: (\psi(x) \supset \psi'(x)) \urcorner$.

478: There are three subcases

479: according to the relation among $p$, $p'$, $q$, $q'$:

480: \begin{enumerate}

481: \item ($p \leq p'$ and $q \leq q'$) or ($p' \leq p$ and $q' \leq q$)\\

482: $\default{\psi(x)}{\phi(x)}{\phi(x)}$ and $\default{\psi'(x)}{\neg

483: \psi(x),\phi(x)}{\phi(x)}$ are candidate defaults.

484: \item $p \leq p'$ and $q' \leq q$\\

485: $\default{\psi'(x)}{\phi(x)}{\phi(x)}$ is the only candidate default,

486: since

487: the justification $\neg \psi'(x)$

488: of $\default{\psi(x)}{\phi(x),\neg \psi'(x)}{\phi(x)}$ is

489: inconsistent with the prerequisite $\psi(x)$ and $K$.

490: \item $p' \leq p$ and $q \leq q'$\\

491: $\default{\psi(x)}{\phi(x)}{\phi(x)}$ and $\default{\psi'(x)}{\neg

492: \psi(x),\phi(x)}{\phi(x)}$ are candidate defaults.

493: \end{enumerate}

494: \item $K$ entails $\ulcorner\forall x

495: (\psi'(x) \supset \psi(x)) \urcorner$.

496: This is symmetrical to case 1.

497: \item $K$ entails neither $\ulcorner\forall x

498: (\psi(x) \supset \psi'(x)) \urcorner$ nor  $\ulcorner\forall x

499: (\psi'(x) \supset \psi(x)) \urcorner$.

500: Again there are three subcases:

501: \begin{enumerate}

502: \item ($p \leq p'$ and $q \leq q'$) or ($p' \leq p$ and $q' \leq q$) \\

503: The candidate defaults are $\default{\psi(x)}{\phi(x),\neg \psi'(x)}{\phi(x)}$

504: and

505: $\default{\psi'(x)}{\neg \psi(x),\phi(x)}{\phi(x)}$.

506: \item $p \leq p'$ and $q' \leq q$\\

507: $\default{\psi(x)}{\phi(x),\neg \psi'(x)}{\phi(x)}$ and

508: $\default{\psi'(x)}{\phi(x)}{\phi(x)}$ are candidate default rules.

509: \item $p' \leq p$ and $q \leq q'$:

510: This is symmetrical to case 3(b).

511: \end{enumerate}

512: \end{enumerate}

513:

514: Having generated a list of candidate default rules based on our background

515: knowledge $K$, we delete those rules derived from statistics whose lower

516: bound is less than $1 -\delta$.  The remainder is the set of

517: defaults $\Delta_K$.

518:
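As a sketch of how the recipe and the deletion step fit together, consider the
penguin example with hypothetical statistics and $\delta = 0.10$.  Take
$\phi = F$, $\psi = P$ and $\psi' = B$, with
\[
\%x(F(x),P(x),0,0.02) \quad \mbox{and} \quad \%x(F(x),B(x),0.90,0.98)
\]
in $K$, together with ``$\forall x(P(x) \supset B(x))$''.  Since
$p = 0 \leq p' = 0.90$ and $q = 0.02 \leq q' = 0.98$, this is case 1(a), and
the candidate defaults are
\[
\default{P(x)}{F(x)}{F(x)} \quad \mbox{and} \quad
\default{B(x)}{\neg P(x),F(x)}{F(x)} .
\]
The first is derived from a statistic whose lower bound $0$ is less than
$1-\delta = 0.90$, so it is deleted; the second survives.  Under these assumed
figures the recipe thus yields the flying default with ``$\neg P(x)$'' among
its justifications.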

519: We have not taken account of relations among default conclusions that may be

520: entailed by $K$.  If $K$ contains $\ulcorner \forall x

521: (\phi(x) \equiv \phi'(x)) \urcorner$ then the default conclusion $\ulcorner

522: \phi(a) \rc$ behaves just like the default conclusion $\lc \phi'(a) \rc$.  If

523: $K$ contains $\lc \forall x(\phi(x) \supset \phi'(x)) \rc$, then since $\lc

524: \phi(x) \rc$ is equivalent to $\lc \phi(x) \wedge \phi'(x) \rc$ and $\lc

525: \phi'(x) \rc$ is equivalent to $\lc \phi(x) \vee \phi'(x) \rc$ we can make use

526: of the obvious entailment relations.

527:

528: Soundness of a system of deductive logic requires that the conclusion of any

529: inference be true in every model in which the premises are true.  Clearly

530: nonmonotonic inference should not be sound.  But there is a property that is

531: {\em like\/} soundness that applies to default inference.  It is the property

532: that the conclusion is false in at most a fraction $\delta$ of the models of the

533: premises $K \cup E$.

534:

535: \begin{theorem}[Default Soundness]

536: For every set of observations $E$, if $d \in \Delta_K$ is applicable to $E$,

537: the proportion of models of $E \cup K$ in which the conclusion of $d$ is false

538: is at most $\delta$.

539:

540: {\rm The proof of this theorem is provided by the soundness theorem for

541: evidential probability \cite[p. 241]{kyburg.teng}, since the rules for deriving

542: defaults are a subset of the rules for computing evidential probabilities.

543: $\Box$}

544: \end{theorem}

545:

546:

547: \section{Interactions within an Extension}

548:

549: Having determined which default rules are justified with respect to the

550: background knowledge, the next step is to investigate the interaction between

551: default rules in generating an extension.

552: A default extension is a minimal deductively closed set that

553: contains the given facts and the consequents of all applicable default rules.

554: Given an evidence set, we need to determine

555: how to control the compound effects of multiple defaults in an extension.

556:

557: Take, for example, a default version~\cite{Poole89} of

558: the probabilistic lottery paradox~\cite{Kyburg61}.  There are $n$ species

559: of birds, $S_1, \ldots, S_n$.

560: We can say that penguins are atypical in that they cannot fly;

561: hummingbirds are atypical in that they have very fine motor control;

562: parrots are atypical in that they can talk; and so on.

563: If we apply this train of thought to all $n$ species of birds,

564: there is no typical bird left, as for each species

565: there is always at least one aspect in which it is atypical.

566: A parallel scenario is formulated below.

567:

568: \begin{example}

569: \label{ex:birds}

570: $K$ contains

571: \begin{quote}

572: $B(x) \equiv S_1(x) \vee \ldots \vee S_n(x)$\\

573: \mbox{\ } \hfill

574: [an exhaustive list of bird species]\\

575: $S_i(x) \supset \neg S_j(x)$, for all $j \neq i$\\

576: \mbox{\ } \hfill

577: [species are mutually exclusive]\\

578: $\%x(S_i(x), B(x), \epsilon_i, \delta_i), \mbox{for } 1 \leq i \leq n$\\

579: \mbox{\ } \hfill

580: [the proportion of each $S_i$ species of birds is ``small'']

581: \end{quote}

582: From $K$ we can derive $n$ $\delta^*$-valid default rules for $\Delta_K$:

583: \[d_i=\default{B(x)}{\neg S_i(x)}{\neg S_i(x)}, \mbox{for } 1 \leq i \leq n\]

584: where $\delta^*$ is  the maximum of $\delta_1, \ldots, \delta_n$.

585:

586: Now consider the evidence set $E=\{B(a)\}$.

587: In the original formulation of default logic,

588: we would get $n$ extensions, each one containing one $S_i(a)$ and the

589: negations of all the other $S_j(a)$'s.

590: Thus, for each extension, we would conclude that $a$ is a particular species

591: of bird, which seems to be an overcommitment, considering we have

592: $\%x(S_i(x), B(x), \epsilon_i, \delta_i)$ in $K$.

593: \end{example}

594: Note that each of the $n$ default rules is $\delta^*$-valid when considered

595: individually, but in an extension the rules

596: interact to sanction a set of conclusions that when taken together

597: seems implausible according to our knowledge of model frequencies.

598: The definition of an extension dictates that we must keep applying

599: rules until all ``applicable'' ones are exhausted.  The

600: ``applicability'' condition is based on maximizing logical strength:

601: for $d=\default{\alpha}{\beta_1, \ldots, \beta_m}{\gamma}$,

602: as long as $\alpha$ is derivable, and the $\beta$'s are consistent with

603: the extension, we must apply $d$ and add $\gamma$ to the extension.

604: Thus, for each of the extensions above, we have to keep applying the rules

605: until we have drawn $n-1$ conclusions: $\neg S_j(a)$ for all $j \neq i$.

606: Then the consistency requirement blocks the last default rule $d_i$,

607: as $B(x) \supset S_1(x) \vee \ldots \vee S_n(x)$ together with

608: $\neg S_j(a)$ for all $j \neq i$ gives us $S_i(a)$, contradicting the

609: $\beta$ of $d_i$.

610: From $\%(S_i(x), B(x), \epsilon_i, \delta_i)$, we know the proportion

611: of models in which $S_i(a)$ is true, and thus the proportion of models

612: satisfying this extension, given $E$, is at most $\delta_i$, a small ratio.
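To illustrate with hypothetical figures (assumptions made here, not part of the example), suppose $n=20$ and every species accounts for exactly $1/20$ of the birds, so that we may take $\delta_i=0.06$ for each $i$ and hence $\delta^*=0.06$.  Assuming the model proportions mirror these frequencies, each rule taken alone is safe:
\[
\mbox{proportion of models of $K$ and $E$ satisfying } \neg S_j(a)
  \;=\; 1-\frac{1}{20} \;=\; 0.95 \;>\; 1-\delta^* \;=\; 0.94 ,
\]
whereas each extension as a whole is not.  By mutual exclusivity, $S_i(a)$ already entails $\neg S_j(a)$ for all $j \neq i$, so
\[
\mbox{proportion of models satisfying } S_i(a) \wedge \bigwedge_{j \neq i} \neg S_j(a)
  \;=\; \frac{1}{20} \;=\; 0.05 \;\leq\; \delta_i .
\]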

613:

614: \subsection{Sequential Thresholding}

615: The validity criteria for individual default rules can be extended to

616: extensions resulting from the application of a chain of default rules.

617: We can think of the task of regulating the compound effect of multiple

618: default rules

619: as adjusting the set of relevant models

620: by taking into account

621: the default conclusions of all previously applied rules in the chain of

622: reasoning.

623:

624:

625: One way to accomplish this is by {\em sequential

626: thresholding\/}~\cite{Teng97c}.  The applicability condition

627: of a default rule $\default{\alpha}{\beta_1, \ldots, \beta_m}{\gamma}$

628: in an extension can be modified to take into account the validity of the rule.

629: In addition to requiring that $\alpha$ is provable and that none of

630: $\neg\beta_1, \ldots, \neg\beta_m$ are

631: provable,

632: we require that the default rule

633: be ``above threshold'', that is, the proportion of relevant

634: models satisfying the consequent $\gamma$ be greater

635: than a threshold $1-\epsilon^*$.

636:

637: The set of relevant models shrinks in a stepwise fashion.

638: We start out with all the models satisfying the background knowledge and

639: evidence we have.

640: As default rules are applied sequentially, the consequent of the

641: applied rule at each step is taken as true in all subsequent steps.

642: The relevant models at a particular step

643: are then those that are consistent with the given

644: facts and all the consequents of the rules applied in the previous steps.

645: A default rule, even if it is $\delta$-valid with respect to the background

646: knowledge, would be blocked from

647: application if it does not satisfy the thresholding criterion.
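The stepwise shrinking can be sketched in symbols as follows.  This is only an illustration: it assumes finitely many models, so that proportions are ratios of counts, it sets aside any further restriction of the relevant models by a rule's own applicability conditions, and the indexed notation ${\cal M}_k$ and $\gamma_k$ is introduced here rather than taken from~\cite{Teng97c}.  Writing $\gamma_k$ for the consequent of the rule applied at step $k$,
\[
{\cal M}_0 = \{ M : M \models K \cup E \}, \qquad
{\cal M}_{k+1} = \{ M \in {\cal M}_k : M \models \gamma_k \} .
\]
A candidate rule with consequent $\gamma$ is then above threshold at step $k$ when
\[
\frac{|\{ M \in {\cal M}_k : M \models \gamma \}|}{|{\cal M}_k|} \;>\; 1-\epsilon^* .
\]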

648:

649: In~\cite{Teng97c}, the thresholding metric

650: is based on a simple probability measure of possible worlds.

651: We can easily extend this metric to employ the same measure as that used for

652: evaluating the $\delta$-validity of default rules.

653:

654: \begin{example}

655: Reconsider Example~\ref{ex:birds}.

656: Let us take $\epsilon^* \geq \delta^*$.

657: We start out with the set ${\cal M}$ of all models satisfying $K$ and $E$.

658: From $\%(S_1(x), B(x), \epsilon_1, \delta_1)$ we know that $d_1$ is

659: above threshold, and it satisfies the other conditions for applicability.

660: Therefore we apply $d_1$ and conclude $\neg S_1(a)$.

661:

662: Now consider $d_2$.

663: The set ${\cal M}'$ of relevant models

664: at this point is a subset of ${\cal M}$; it contains only those

665: models in ${\cal M}$ that satisfy our new conclusion $\neg S_1(a)$ as well.

666: We have eliminated the models in which

667: $S_1(a)$ is true.  Since $S_1(x) \supset \neg S_2(x)$,

668: and $S_1(x) \supset B(x)$,

669: all the models eliminated satisfy $B(a)$,

670: and none satisfies $S_2(a)$.

671: Thus, in ${\cal M}'$, the number of models satisfying $S_2(a)$

672: is the same as in ${\cal M}$.  However, the number of models

673: satisfying $B(a)$ is lower in ${\cal M}'$ as a result of

674: the addition of $\neg S_1(a)$.  This gives rise to a higher proportion

675: of models satisfying $S_2(a)$ in ${\cal M}'$ ($\delta_2'$)

676: than in ${\cal M}$ ($\delta_2$).

677: If $\delta_2' \leq \epsilon^*$, $d_2$ is still above threshold

678: after the application of $d_1$, and we can apply it to obtain $\neg S_2(a)$.

679: Otherwise, $d_2$ is below threshold, and we cannot apply it even though

680: it was above threshold before the application of $d_1$.
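In symbols, reading $\delta_i$ as the proportion of models in ${\cal M}$ that satisfy $S_i(a)$ (as this example does) and assuming finitely many models, the step from ${\cal M}$ to ${\cal M}'$ can be sketched as follows.  The notation ${\cal M}_{S_i}$ for the set of models in ${\cal M}$ satisfying $S_i(a)$ is introduced only for this illustration:
\[
\delta_2' \;=\; \frac{|{\cal M}_{S_2}|}{|{\cal M}| - |{\cal M}_{S_1}|}
          \;=\; \frac{\delta_2}{1-\delta_1} \;>\; \delta_2
\qquad \mbox{whenever } 0 < \delta_1 < 1 \mbox{ and } \delta_2 > 0 .
\]
For instance, with the hypothetical values $\delta_1=\delta_2=0.05$, we get $\delta_2' = 0.05/0.95 \approx 0.0526$.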

681:

682: After each step of applying a rule, the set of relevant models

683: shrinks, and the proportion of models satisfying $S_i(a)$ for any unapplied rule $d_i$

684: increases.  After a number of steps, all the remaining rules would be

685: below threshold, and we thus obtain an extension containing only

686: a portion of the conclusions that would otherwise be present in the

687: non-thresholding version of the extension.

688: \end{example}

689: Note that the size of $\epsilon^*$

690: determines how much risk is tolerated in an extension.  The higher

691: the $\epsilon^*$, the more rules can be applied

692: and the longer they can stay above threshold.

693: Reiter's non-thresholding version corresponds to the case when $\epsilon^*=1$;

694: that is, every rule whose associated proportion is above 0 is allowed,

695: and logical consistency alone determines the rule's admissibility.
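As a rough illustration of this dependence, take a hypothetical uniform instance of Example~\ref{ex:birds} with $n=20$, in which each $S_i(a)$ holds in exactly $1/20$ of the models of $K$ and $E$ (these figures are assumptions made here, not part of the example).  After $k$ rules have been applied, $S_j(a)$ for an unapplied $d_j$ holds in $1/(20-k)$ of the remaining models, so $d_j$ stays above threshold as long as $1/(20-k) < \epsilon^*$.  If $r$ rules are applied in total, the resulting extension holds in $(20-r)/20$ of the original models:
\begin{center}
\begin{tabular}{ccc}
$\epsilon^*$ & rules applied ($r$) & proportion of models satisfying the extension \\ \hline
$0.06$ & $4$  & $16/20 = 0.80$ \\
$0.12$ & $12$ & $8/20 = 0.40$ \\
$0.30$ & $17$ & $3/20 = 0.15$ \\
$1$    & $19$ & $1/20 = 0.05$ \\
\end{tabular}
\end{center}
In the last row the threshold no longer blocks anything, and only the consistency requirement stops the twentieth rule.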

696:

697:

698: \section{Concluding Remarks}

699:

700: We have developed a notion of validity for default inference based on

701: model proportions.  A rule is $\delta$-valid if the proportion of models

702: in which the consequent of the rule is satisfied is greater than $1-\delta$

703: in the relevant models picked out by the background knowledge, the evidence,

704: and the

705: applicability conditions of the default rule.

706: Given a body of background knowledge $K$, we can systematically generate

707: candidate default rules and determine which ones are $\delta$-valid

708: based on the statistical facts known in $K$.

709: Conflicts between default rules stemming from multiple inheritance

710: are resolved as a consequence of the validation process.

711: The result is a set of $\delta$-valid default rules which are

712: ``pre-compiled'' for a given background knowledge base, and can be

713: reused for different evidence sets without change.

714:

715: This idea of evaluating the validity of a default rule using model

716: proportions is extended to extensions generated by

717: a combination of rules.  The compound effect is

718: regulated by a sequential thresholding process,

719: which blocks the rules whose associated model proportions with respect

720: to the ``current'' (shrinking) set of models fall below

721: a particular comfort threshold.  This allows us to use a more reasonable

722: ``closure condition'' for extensions than the usual maximal logical strength:

723: we can refrain from applying rules that would make the extension

724: satisfiable in only a small set of models, even if the consequent of

725: the rule is logically consistent with the extension.

726:

727: Grounding the justification of default rules in model proportions provides

728: a way to validate the rules empirically, and is a first step towards

729: automating the learning of default rules from (statistical) data.

730: One might ask why we need the default rules when we can reason with

731: the statistics directly.  Default rules provide a succinct and more

732: understandable characterization of the import of the data, as well as

733: a smooth articulation of the  information that

734: may exist in the knowledge base.

735:

736:

737:

738: \section*{Acknowledgement}

739: This work was supported by the National Science Foundation under STS-9906128

740: and IIS-0082928, and by NASA under NCC2-1239.

741:

742:

743: \begin{thebibliography}{}

744:

745: \bibitem[\protect\citeauthoryear{Adams}{1966}]{adams.1966}

746: Ernest~W. Adams.

747: \newblock Probability and the logic of conditionals.

748: \newblock In Jaakko Hintikka and Patrick Suppes, editors, {\em Aspects of

749:   Inductive Logic}, pages 265--316. North Holland, Amsterdam, 1966.

750:

751: \bibitem[\protect\citeauthoryear{Adams}{1975}]{adams}

752: Ernest~W. Adams.

753: \newblock {\em The Logic of Conditionals}.

754: \newblock Reidel, Dordrecht, 1975.

755:

756: \bibitem[\protect\citeauthoryear{Bacchus \bgroup \em et al.\egroup

757:   }{1993}]{BacchusGHK93}

758: Fahiem Bacchus, Adam~J. Grove, Joseph~Y. Halpern, and Daphne Koller.

759: \newblock Statistical foundations for default reasoning.

760: \newblock In {\em Proceedings of the Thirteenth International Joint Conference

761:   on Artificial Intelligence}, pages 563--569, 1993.

762:

763: \bibitem[\protect\citeauthoryear{Brewka}{1989}]{Brewka89}

764: Gerhard Brewka.

765: \newblock Preferred subtheories---an extended logical framework for default

766:   reasoning.

767: \newblock In {\em Proceedings of the Eleventh International Joint Conference on

768:   Artificial Intelligence}, 1989.

769:

770: \bibitem[\protect\citeauthoryear{Brewka}{1994}]{Brewka94}

771: Gerhard Brewka.

772: \newblock Reasoning about priorities in default logic.

773: \newblock In {\em Proceedings of the Twelfth National Conference on Artificial

774:   Intelligence}, pages 940--945, 1994.

775:

776: \bibitem[\protect\citeauthoryear{Dewey}{1938}]{dewey}

777: John Dewey.

778: \newblock {\em Logic: the Theory of Inquiry}.

779: \newblock Henry Holt, 1938.

780:

781: \bibitem[\protect\citeauthoryear{Dummett}{1978}]{dummett.justification}

782: Michael Dummett.

783: \newblock The justification of deduction.

784: \newblock In {\em Truth and Other Enigmas}, pages 290--318. Duckworth, London,

785:   1978.

786:

787: \bibitem[\protect\citeauthoryear{Goldman}{1979}]{goldman}

788: Alvin~I. Goldman.

789: \newblock What is justified belief?

790: \newblock In George~S. Pappas, editor, {\em Justification and Knowledge}, pages

791:   1--24. D. Reidel, Dordrecht, 1979.

792:

793: \bibitem[\protect\citeauthoryear{Haack}{1976}]{haack.justification}

794: Susan Haack.

795: \newblock The justification of deduction.

796: \newblock {\em Mind}, 85:112--119, 1976.

797:

798: \bibitem[\protect\citeauthoryear{Horty \bgroup \em et al.\egroup

799:   }{1987}]{HortyTT87}

800: J.~F. Horty, D.~S. Touretzky, and R.~H. Thomason.

801: \newblock A clash of intuitions: The current state of nonmonotonic multiple

802:   inheritance systems.

803: \newblock In {\em Proceedings of the Tenth International Joint Conference on

804:   Artificial Intelligence}, pages 476--482, 1987.

805:

806: \bibitem[\protect\citeauthoryear{James}{1948}]{james}

807: William James.

808: \newblock {\em Essays in Pragmatism}.

809: \newblock Hafner, New York, 1948.

810:

811: \bibitem[\protect\citeauthoryear{Kyburg and Teng}{2001}]{kyburg.teng}

812: Henry~E. Kyburg, Jr. and Choh~Man Teng.

813: \newblock {\em Uncertain Inference}.

814: \newblock Cambridge University Press, New York, 2001.

815:

816: \bibitem[\protect\citeauthoryear{Kyburg}{1958}]{kyburg.justification}

817: Henry~E. Kyburg, Jr.

818: \newblock The justification of deduction.

819: \newblock {\em Review of Metaphysics}, 12:19--25, 1958.

820:

821: \bibitem[\protect\citeauthoryear{Kyburg}{1961}]{Kyburg61}

822: Henry~E. Kyburg, Jr.

823: \newblock {\em Probability and the Logic of Rational Belief}.

824: \newblock Wesleyan University Press, 1961.

825:

826: \bibitem[\protect\citeauthoryear{Kyburg}{2001}]{kyburg.2001}

827: Henry~E. Kyburg, Jr.

828: \newblock Real logic is nonmonotonic.

829: \newblock {\em Minds and Machines}, 11:577--595, 2001.

830:

831: \bibitem[\protect\citeauthoryear{Levi}{1967}]{levi.gambling}

832: Isaac Levi.

833: \newblock {\em Gambling with Truth}.

834: \newblock Knopf, New York, 1967.

835:

836: \bibitem[\protect\citeauthoryear{Levi}{1980}]{levi.enterprise}

837: Isaac Levi.

838: \newblock {\em The Enterprise of Knowledge}.

839: \newblock MIT Press, Cambridge, 1980.

840:

841: \bibitem[\protect\citeauthoryear{Levi}{1996}]{levi.argument}

842: Isaac Levi.

843: \newblock {\em For the Sake of the Argument}.

844: \newblock Cambridge University Press, Cambridge, 1996.

845:

846: \bibitem[\protect\citeauthoryear{{\L}ukaszewicz}{1988}]{Lukaszewicz88}

847: Witold {\L}ukaszewicz.

848: \newblock Considerations on default logic: An alternative approach.

849: \newblock {\em Computational Intelligence}, 4(1):1--16, 1988.

850:

851: \bibitem[\protect\citeauthoryear{Moore}{1985}]{Moore85}

852: Robert~C. Moore.

853: \newblock Semantical considerations on nonmonotonic logic.

854: \newblock {\em Artificial Intelligence}, 25:75--94, 1985.

855:

856: \bibitem[\protect\citeauthoryear{Morgan}{1998}]{morgan}

857: Charles Morgan.

858: \newblock Non-monotonic logic is impossible.

859: \newblock {\em Canadian Artificial Intelligence Magazine}, 42:18--25, 1998.

860:

861: \bibitem[\protect\citeauthoryear{Neufeld \bgroup \em et al.\egroup

862:   }{1990}]{NeufeldPA90}

863: Eric Neufeld, David Poole, and Romas Aleliunas.

864: \newblock Probabilistic semantics and defaults.

865: \newblock In {\em Uncertainty in Artificial Intelligence}, volume~4, pages

866:   121--131. North-Holland, 1990.

867:

868: \bibitem[\protect\citeauthoryear{Pearl and Geffner}{1990}]{pearl.geffner}

869: Judea Pearl and Hector Geffner.

870: \newblock A framework for reasoning with defaults.

871: \newblock In Henry~E. Kyburg, Jr., Ronald~P. Loui, and Greg~N. Carlson,

872:   editors, {\em Knowledge Representation and Defeasible Reasoning}, pages

873:   69--87. Kluwer, 1990.

874:

875: \bibitem[\protect\citeauthoryear{Pearl}{1988}]{Pearl88}

876: Judea Pearl.

877: \newblock {\em Probabilistic Reasoning in Intelligent Systems}.

878: \newblock Morgan Kaufmann, 1988.

879:

880: \bibitem[\protect\citeauthoryear{Pearl}{1990}]{Pearl90}

881: Judea Pearl.

882: \newblock System {Z}: A natural ordering of defaults with tractable

883:   applications to default reasoning.

884: \newblock In {\em Theoretical Aspects of Reasoning about Knowledge}, pages

885:   121--135, 1990.

886:

887: \bibitem[\protect\citeauthoryear{Peirce}{1903}]{peirce}

888: Charles~S. Peirce.

889: \newblock The fixation of belief.

890: \newblock In J.~Buchler, editor, {\em The Philosophy of Peirce}, pages 5--22.

891:   Harcourt Brace and Company, New York, 1903.

892:

893: \bibitem[\protect\citeauthoryear{Poole}{1989}]{Poole89}

894: David Poole.

895: \newblock What the lottery paradox tells us about default reasoning.

896: \newblock In {\em Proceedings of the First International Conference on

897:   Principles of Knowledge Representation and Reasoning}, pages 333--340, 1989.

898:

899: \bibitem[\protect\citeauthoryear{Priest}{1989}]{priest}

900: Graham Priest.

901: \newblock Reasoning about truth.

902: \newblock {\em Artificial Intelligence}, 39:231--244, 1989.

903:

904: \bibitem[\protect\citeauthoryear{Reiter and Criscuolo}{1981}]{ReiterC81}

905: Raymond Reiter and Giovanni Criscuolo.

906: \newblock On interacting defaults.

907: \newblock In {\em Proceedings of the Seventh International Joint Conference on

908:   Artificial Intelligence}, pages 270--276, 1981.

909:

910: \bibitem[\protect\citeauthoryear{Reiter}{1980}]{Reiter80}

911: R.~Reiter.

912: \newblock A logic for default reasoning.

913: \newblock {\em Artificial Intelligence}, 13:81--132, 1980.

914:

915: \bibitem[\protect\citeauthoryear{Teng}{1997}]{Teng97c}

916: Choh~Man Teng.

917: \newblock Sequential thresholds: Context sensitive default extensions.

918: \newblock In {\em Proceedings of the Thirteenth Conference on Uncertainty in

919:   Artificial Intelligence}, pages 437--444, 1997.

920:

921: \bibitem[\protect\citeauthoryear{Touretzky}{1984}]{Touretzky84}

922: David~S. Touretzky.

923: \newblock Implicit ordering of defaults in inheritance systems.

924: \newblock In {\em Proceedings of the Fifth National Conference on Artificial

925:   Intelligence}, pages 322--325, 1984.

926:

927: \end{thebibliography}

928:

929: \end{document}

930:

931: