1: \documentclass{article}
2: \usepackage{proceedings}
3: \usepackage{latexsym}
4: \usepackage{amsfonts}
5: \usepackage{named}
6:
7: \newtheorem{definition}{Definition}
8: \newtheorem{theorem}{Theorem}
9: \newtheorem{ex}{Example}
10: \newenvironment{example}{\begin{ex} \rm}{$\Box$ \end{ex}}
11:
12: \newcommand{\Th}{{\bf Th}}
13: \newcommand{\default}[3]{\frac{#1:#2}{#3}}
14:
15: \newcommand{\lc}{\ulcorner}
16: \newcommand{\rc}{\urcorner}
17:
18:
19: \newcommand{\note}[1]{\bigskip[{\em #1}]\bigskip}
20:
21: \title{Evaluating Defaults}
22: \author{
23: {\bf Henry E. Kyburg, Jr.}$^{1,2}$\\
24: {\tt kyburg@cs.rochester.edu}
25: \vspace{1ex}\\
26: $^{1}$Computer Science and Philosophy\\
27: University of Rochester\\
28: Rochester NY 14627, USA
29: \And
30: {\bf Choh Man Teng}$^{2}$\\
31: {\tt cmteng@ai.uwf.edu}
32: \vspace{1ex}\\
33: $^{2}$Institute for Human and Machine Cognition\\
34: University of West Florida\\
35: Pensacola FL 32501, USA
36: }
37:
38: \begin{document}
39:
40: \maketitle
41:
42: \begin{abstract}
43: We seek to find normative criteria of adequacy for nonmonotonic logic similar
44: to the criterion of validity for deductive logic. Rather than stipulating that
45: the conclusion of an inference be true in all models in which the premises are
46: true, we require that the conclusion of a nonmonotonic inference be true in
47: ``almost all'' models of a certain sort in which the premises are true. This
48: ``certain sort'' specification picks out the models that are relevant to the
49: inference, taking into account factors such as specificity and vagueness,
50: and previous inferences. The frequencies
51: characterizing the relevant models reflect known frequencies in our actual
52: world. The criteria of adequacy for a default
53: inference can be extended by thresholding to criteria of adequacy for an
54: extension. We show that this avoids the implausibilities that might otherwise
55: result from the chaining of default inferences. The model proportions,
56: when construed in terms of frequencies, provide a verifiable grounding of
57: default rules, and can become the basis for generating default rules from
58: statistics.
59:
60:
61: \vskip 2ex
62: \noindent
63: {\em Keywords}:
64: probability, frequency, default logic
65: \end{abstract}
66:
67:
68: \section{Introduction}
69:
Nonmonotonic reasoning, for example default logic~\cite{Reiter80},
71: models the intuitive process of making non-deductive
72: inferences in the face of certain supportive but not conclusive evidence.
73: Given a default theory $\Delta=\langle D,F \rangle$, we can obtain its
74: extensions by following a prescribed set of steps.
75: However, on what grounds do we employ a particular default rule?
76: Some writers would
77: regard this as an inappropriate question, since they take as their goal the
78: representation of human inference.
79: To this end defaults represent rules that we take to be intuitively
80: appropriate. But then when we apply these rules,
81: we may be led to counterintuitive
82: results~\cite{ReiterC81,Lukaszewicz88,Poole89}.
83: The underlying principle seems to be circular: the original default
84: rules are ``intuitively good'' at first glance, but when we discover that they
85: do not give rise to the desired results, we tweak the rules until they give us
86: those results. It seems that we have to know what results we want first before
87: constructing the default theory, rather than having the default theory tell us
88: what conclusions are warranted. This is precisely the reason why we need an
89: independent measure of validity for default rules and default extensions.
90: We think of nonmonotonic
91: logic as sharing the normative character of other logics. From this point of
92: view default rules require some defense.
93: We will concentrate on default logic here, though much of
94: what we have to say will apply to other nonmonotonic approaches as well.
95: Much
96: of the work on nonmonotonic logic has concerned the syntactic manipulation
97: of the nonmonotonic rules,
98: rather than their basic justification.
99:
100:
101:
102: \subsection{Selective Preference}
103: For a {\em default rule\/}
104: $d=\default{\alpha}{\beta_1, \ldots, \beta_n}{\gamma}$,
105: $\alpha$ is the {\em prerequisite\/},
106: $\beta_1, \ldots, \beta_n$ are the {\em justifications\/},
107: and $\gamma$ is the {\em consequent\/} of $d$.
108: Loosely speaking, the rule
109: conveys the idea that if $\alpha$ is provable, and
110: $\neg\beta_1, \ldots, \neg\beta_n$ are
111: each
112: not provable,
113: then by default we conclude that $\gamma$ is true.
114: A {\em default theory\/} is an ordered pair
115: $\langle D,F \rangle$, where
116: $D$ is a set of default rules and $F$ is a set of
117: ``facts''.
118: A theory extended from $F$ by applying the default rules in $D$
119: is known as an {\em extension\/} of the default theory.
120:
121: Consider the following canonical example.
122:
123:
124: \begin{example}
125: We have a default theory $\Delta=\langle D, F \rangle$, where
126: \[\begin{array}{rcl}
127: D &=& \{\default{R(x)}{T(x)}{T(x)},
128: \default{S(x)}{\neg T(x)}{\neg T(x)}\}, \\
129: F &=& \{R(a),S(a)\}.
130: \end{array}\]
131:
132: We get two extensions, one containing $T(a)$ and the other containing
133: $\neg T(a)$.
134: If we take ``$R(x)$'' to mean that $x$ is a bird, ``$S(x)$'' to mean that
135: $x$ is a penguin, and ``$T(x)$'' to mean that $x$ flies, then we would like
136: to reject the extension containing ``$T(a)$'' ($a$ flies) in favor of the
137: extension containing ``$\neg T(a)$'' ($a$ does not fly).
138: However, if we take ``$S(x)$'' to mean that $x$ is an animal instead and
139: keep ``$R(x)$'' and ``$T(x)$'' the same, we would want to reverse our preference.
140: Now the extension containing ``$T(a)$'' ($a$ flies) seems better.
141: \end{example}
142: Note that each of the default rules involved in the example above
143: is intuitively appealing
144: when viewed by itself against our background knowledge:
145: birds fly;
146: penguins do not fly;
147: and animals in general do not fly either.
Moreover, both instantiations (penguins and animals) are syntactically
identical. Thus, we cannot decide which default rule to prefer
simply by looking at their syntactic structures.
151:
152: It is the interaction between the default rules and evidence
153: that gives rise to the selective preference above. We have
154: the evidence that ``$a$ is a bird''.
155: If in addition we also have ``$a$ is a penguin'',
156: we prefer the penguin rule.
157: If instead we have ``$a$ is an animal'', we prefer the bird rule.
158:
159: There are several approaches to circumventing this conceptual difficulty.
160: The first is to revise the default theory so that the desired result
is achieved~\cite{ReiterC81}. We can amend the default rules
by adding the exceptions as justifications. Writing $B$, $P$, $A$ and $F$
for ``is a bird'', ``is a penguin'', ``is an animal'' and ``flies'',
we get for example
$\default{B(x)}{F(x), \neg P(x)}{F(x)}$ and
$\default{A(x)}{\neg F(x), \neg B(x)}{\neg F(x)}$.
165: With this approach we have to constantly revise the default rules
166: to take into account additional exceptions.
167: We have little guidance in constructing the list of justifications
168: except that the resulting default rule has to produce the ``right''
169: answer in the given situation.
170:
171:
172: Another approach is to establish some priority structure over the
173: set of defaults.
174: For example, we can refer to a specificity or inheritance hierarchy
175: to determine which default rule should be used in case of a
176: conflict~\cite{Touretzky84,HortyTT87}.
177: The penguin rule is more specific than the bird rule, when both are
178: applicable, and therefore we use the penguin rule and not the bird rule.
179: However, conflicting rules do not always fit into neat hierarchies
(for example, adults are employed and students are not, but what about adult
students?~\cite{ReiterC81}). It is not obvious how we can extend
182: the hierarchical structure without resorting to explicitly enumerating
183: the priority relations between the default rules~\cite{Brewka89,Brewka94}.
184:
185: The third approach is to appeal to probabilistic
186: analysis. Defaults are interpreted as
187: representing properties of conditional probabilities.
188: For example, the conditional probability of $a$ being able to fly given
189: that $a$ is a bird is ``high''~\cite{Pearl88,Pearl90}
190: or increases from the prior probability~\cite{NeufeldPA90},
191: while the conditional probability of $a$ being able to fly given that $a$ is
192: a penguin
193: is ``low'' or decreases. This approach provides a probabilistic semantics
194: for default rules, but in a way which does not represent the fact that
195: the conclusions are {\em accepted\/}. The default conclusion is ``Tweety
196: flies,'' not ``Probably Tweety flies.''
197: This is in contrast to the spirit of
198: nonmonotonic reasoning: default conclusions should be accepted as new facts,
199: and we should be able to chain default rules
200: and build upon the conclusions of previous default applications
201: to obtain further conclusions.
202:
203:
204: \subsection{Justifying Nonmonotonic Inference}
205:
The justification of beliefs is a long-standing issue in epistemology. There
207: is
208: not much that is problematic about the justification of beliefs obtained by
209: deductive inference (though there are plenty of problems that surround
210: deduction --- see
211: \cite{kyburg.justification,haack.justification,dummett.justification}, not to
212: mention the voluminous literature on paraconsistent logic \cite{priest}). The
213: reason is that we can show that the ordinary rules of deduction lead from
214: premises to conclusions that are true in every model in which the premises are
215: true. This is exactly what is not true of ampliative inference, and it is what
216: has led some writers (e.g., \cite{morgan}) to deny that there {\em is} any such
217: thing as a nonmonotonic logic. This has been disputed in \cite{kyburg.2001}.
218:
219: But other kinds of justifications of beliefs have been proposed. Isaac Levi
220: \cite{levi.gambling,levi.enterprise,levi.argument} has argued for many years
221: that the way to understand ampliative (inductive, nonmonotonic) argument is in
222: terms of decision theory: we choose (decide) to accept a hypothesis in a given
223: context provided that the expected epistemic utility of doing so in that
224: context
225: is greater than the expected utility of any other epistemic act, such as
226: suspending belief totally, or accepting a stronger hypothesis.
227:
228: Levi's approach employs a rich and detailed structure for acceptance, and
229: allows drawing many important distinctions. This structure requires three
230: things that make it less than perfect as a vehicle for ordinary nonmonotonic
231: inferential systems. First, in keeping with a long tradition in pragmatism
232: \cite{dewey,peirce,james} the context of inquiry must be tied to a specific
233: problem: We need the answer to a question. Second, the epistemic expectation of an
234: answer is the expected value of the {\em information} contained in that
235: answer. Thus we need to presuppose an information measure on the language of
236: our inquiry \cite[p. 169]{levi.argument}. Third, we need to have available a
237: credal or inductive probability, based on a measure (or
238: convex set of measures) on the sentences of the language, in terms of which a
239: conditional probability (or convex set of conditional probabilities)
240: can be defined \cite[p. 52]{levi.enterprise}.
241:
We believe that in some contexts in which we might wish to use nonmonotonic
mechanisms this overhead is unnecessary, and perhaps itself difficult to
justify. We would also like to be able to explicate the justification of
inference in a less context-dependent way.
247:
248: Another approach that has attracted considerable
249: attention in the philosophical community in recent years is that of
250: ``reliabilism'' whose best known exponent is Alvin Goldman \cite{goldman}.
251: According to this view, what justifies a belief is the fact that it is obtained
by a ``reliable cognitive process...''~\cite[p. 20]{goldman}. Of course there
253: are a number of additional hedges to the view that are required for philosophical
254: accuracy, and even with those hedges there remains a certain vagueness in the
255: view. These details need not detain us, since we are seeking inspiration
256: rather than philosophical precision.
257:
258: What does ``reliable'' mean?
259: We will construe reliability in terms of frequency or propensity to yield truth
260: when applied.
261: Specifically, we will say that the belief $\phi$ is nonmonotonically justified
262: by a default rule if the rule would frequently lead to truth and
263: rarely to error, given what we know --- given our background knowledge.
264:
265: A deductive argument is justified (valid) if its conclusion is true in every
266: model of its premises.
267: We will attempt to provide an analog of the justification of deductive rules:
268: a default argument is justified if its
269: conclusion is true in a high proportion of the relevant models in which its premises are
270: true.
271: To make this idea precise requires an excursion into model theory.
272:
273:
274: \section{Model Theory}
275:
276: We will suppose that the underlying object language is a first order
277: language that does not involve such intensional predicates
278: as ``know'' or ``believe.'' A number of nonmonotonic formalisms
279: (specifically autoepistemic logic~\cite{Moore85})
280: do involve such locutions within the object
281: language, but they can be dispensed with
282: in default logic. The default rule
283: $\default{\alpha}{\beta_1,\ldots, \beta_n}{\gamma}$ can be read in terms
284: of the nonmembership of $\ulcorner \neg \beta_i \urcorner$ in a specified set
285: of
286: expressions $\Gamma$. In original default logic, $\Gamma$ would
287: just be an extension.
288:
289:
290: There are a number of immediate problems associated
291: with the idea of looking at the ``proportion'' of models.
292: The least of them is choosing a {\em level\/} at
293: which to regard the evidence as adequate. Should we require that the
294: proportion be 0.95? Or 0.99? Or 0.995? This is
295: just the sort of question that
296: arises in statistical hypothesis testing or in confidence interval estimation.
297: We shall suppose that in a given context there is some
298: agreed-upon level of security $\delta$; we will accept a conclusion if the
299: proportion of models in which we could be committing an error is no greater
300: than $\delta$.
301:
302: This approach is to be contrasted with those of Adams \cite{adams,adams.1966},
Pearl \cite{Pearl88} and Bacchus et al.\ \cite{BacchusGHK93}. Adams requires that
304: for
305: $A$ to be a reasonable consequence of the set of sentences $S$, for {\em any\/}
306: $\epsilon$ there must be a positive $\delta$ such that for every probability
307: function, if the probability of every sentence in $S$ is greater than $1 -
308: \delta$, then the probability of $A$ is at least $1 - \epsilon$ \cite[p.
309: 274]{adams.1966}. Pearl's approach similarly involves quantification over
310: possible probability functions.
Bacchus et al.\ in turn take the degree of belief of a statement to be
312: the limiting proportion of first order models in which the statement is true.
313: All of these approaches involve matters that go well beyond what we may
314: reasonably suppose to be available to us as empirical enquirers. Our $\delta$, on
315: the other hand, serves much like the $\alpha$ of statistical testing.
316:
317: We must restrict the number of models
318: under consideration to a finite number so that the idea
319: of looking at proportions makes
320: sense.\footnote{We could, instead, seek to develop a
321: way of proceeding to a limit; this still would require restrictions to arrive
322: at a countable number of models, and would entail a large expository cost for
323: little gain in plausibility.}
324: We will be taking account of statistical information, and to this end will want
325: each model to have a finite domain.
326: Roughly speaking, we take as a model
327: of our language one in which the domain of empirical individuals is of finite
328: cardinality. This may be regarded as problematic (it
329: entails the falsity of ``every person has two parents and
330: nobody is his own ancestor'') but with reasonable spatial and temporal bounding
331: it can be rendered plausible.
332:
333: Even so, to ensure that the set of models is finite
334: we must restrict the empirical domain even further. Not only must it consist
335: of
336: a finite set of physical entities, but this same set of physical entities
337: $\mathcal D$ must be taken to be the empirical domain of every model.
338:
339: We assume that it is possible to express statistical knowledge in this
340: language. For example, if ``$B(x)$'' is the predicate ``is a bird''
341: and
342: ``$F(x)$'' is ``can fly'', we can express the fact that
343: between 85\% and 95\% of birds fly by the formula $0.85 <
344: \frac{|\{x:B(x)
345: \land F(x)\}|}{|\{x:B(x)\}|} < 0.95$. Employing the notation of
346: \cite{kyburg.teng} we write this as ``$\%x(F(x),B(x),0.85,0.95)$.''
347: This renders ``\%'' a variable binding operator on 4-sequences of expressions:
348: two formulas and two fractions.
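
As a concrete illustration (the numbers are ours and purely hypothetical),
suppose that in some model $m$ the domain contains 200 birds, of which 180
can fly. Then
\[
\frac{|\{x:B(x) \land F(x)\}|}{|\{x:B(x)\}|} = \frac{180}{200} = 0.90,
\]
and since $0.85 < 0.90 < 0.95$ the statement ``$\%x(F(x),B(x),0.85,0.95)$''
is true in $m$. It would be false in a model with the same 200 birds of which
only 160 fly, since $160/200 = 0.80 < 0.85$.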
349:
350:
We distinguish, as do Pearl and Geffner~\cite[p. 70]{pearl.geffner}, between
352: immediate evidence, represented by a finite set of sentences $E$ concerning
353: particular facts (to be distinguished from the general body of factual
354: knowledge $F$ invoked by classical default logic), and a finitely axiomatizable
355: set of sentences
356: $K$ representing general background knowledge.
357: What defaults are plausible depends, of course, on background knowledge.
358: If it were not for what we take to be the typical (or natural, or frequent)
359: behavior of birds, the world's best known example of a default rule would not
360: be
plausible. On the other hand no one has proposed the default rule
$\default{\mathrm{fish}(a)}{\mathrm{mackerel}(a)}{\mathrm{can\mbox{-}talk}(a)}$.
363:
364: Thus in general
365: we will represent the set of default rules of a default theory as $\Delta_K$
366: rather than $D$, since we take them to be a function of our body of general
367: knowledge $K$. Given an error tolerance
368: $\delta$, we will take a default rule to be {\em $\delta$-valid} if, for every
369: set of possible input sentences $E$ consistent with $K$, the application of the
370: rule to $E$ leads to a false conclusion in a
371: proportion of at most
372: $\delta$ of the relevant models.
373: More precisely, a default rule is $\delta$-valid if and only if for every set of
374: input sentences $E$ consistent with $K$ to which the rule is applicable, the
375: proportion of models of $E \cup K$ in which the conclusion of the rule is false
376: is no more than $\delta$.
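
In symbols, and as a sketch only: write $\mathcal M(E \cup K)$ for the finite
set of models of $E \cup K$ over the domain $\mathcal D$ (this notation is
ours, introduced for illustration). A default rule with consequent $\gamma$,
applied to a term $a$, satisfies the condition above for the evidence $E$
just in case
\[
\frac{|\{m \in \mathcal M(E \cup K) : m \not\models \gamma(a)\}|}
{|\mathcal M(E \cup K)|} \leq \delta .
\]
The rule is $\delta$-valid when this inequality holds for every $E$
consistent with $K$ to which the rule is applicable. With $\delta = 0.05$,
for instance, a rule whose conclusion fails in 3 of 100 models of $E \cup K$
meets the bound for that $E$, while one whose conclusion fails in 8 of 100
does not.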
377:
378: To fix our ideas, let us begin with a simple example. Suppose
$K$ includes a statement to the effect that at least $1 - \delta$
and not more than $1 - \epsilon$ of birds fly, and nothing else; that is,
381: ``$\%x(F(x),B(x),1-\delta,1-\epsilon)$.'' Consider the rule
382: $\default{B(x)}{F(x)}{F(x)}$.
383: This
384: rule is ``applicable'' to immediate evidence
385: $E$ only if $E \cup K$ entails a sentence of the form
386: $\ulcorner B(a) \urcorner$ and no corresponding sentence of the form
387: $\ulcorner \neg F(a) \urcorner$.
388:
389: Our models have a single domain $\mathcal D$ of finite cardinality.
390: We will write ``$\mathcal I_m(\phi)$'' for the interpretation of $\phi$ in
391: the model $m$. The constraint imposed by
392: $K$ is that for every model $m$ the proportion of objects in $\mathcal I_m(B)$ that
393: are
394: also in $\mathcal I_m(F)$ lies in
395: $[1-\delta,1-\epsilon]$.
396:
397: There are three cases. First, suppose $E \cup K$ does not entail a sentence of
398: the form $\ulcorner B(a) \urcorner$. Then the rule is inapplicable. Second,
399: suppose that for some term $a$, $E \cup K$ entails
400: ``$B(a)$'' and also entails ``$\neg F(a)$''. The rule is again inapplicable, because
401: it is blocked by the failure of a justification. Third, suppose
402: for some term $a$, $E \cup K$ entails
403: ``$B(a)$'' but not ``$\neg F(a)$''.
404: Then $\mathcal I_m(a) \in \mathcal I_m(B)$. There are
405: $|\mathcal I_m(B)|$ interpretations of $a$ that make $E \cup K$ true;
of these a proportion of at least
$1-\delta$ make ``$F(a)$'' true. We have said nothing about interpreting the
408: rest of the language, but however many interpretations there are (we have seen
409: to it that there are only a finite number) the proportion that renders
410: ``$F(a)$'' true will remain unchanged; it will be at least $1-\delta$.
411: Thus, given the background knowledge that we have
posited, the rule is $\delta$-valid: if it is applicable it will lead to error no
more than $\delta$ of the time.
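
The counting argument can be made explicit with hypothetical numbers. Fix
interpretations of $B$ and $F$ that satisfy $K$, and let
$b = |\mathcal I_m(B)|$ and $f = |\mathcal I_m(B) \cap \mathcal I_m(F)|$, so
that $1-\delta \leq f/b \leq 1-\epsilon$. The term $a$ may denote any of the
$b$ birds, and ``$F(a)$'' is true for $f$ of these choices. If the rest of the
language can be interpreted in $N$ ways for each choice (an independence
assumption that we make for the sake of the illustration), then
\[
\frac{\mbox{interpretations making ``$F(a)$'' true}}
{\mbox{interpretations making $E \cup K$ true}}
= \frac{f \cdot N}{b \cdot N} = \frac{f}{b} \geq 1-\delta .
\]
Since the bound holds for each admissible pair of interpretations of $B$ and
$F$, it holds for the aggregate as well. For example, with $\delta = 0.05$,
$b = 200$ and $f = 192$ we get $f/b = 0.96$, so the conclusion ``$F(a)$'' is
false in a proportion $8/200 = 0.04 \leq \delta$ of the cases.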
414:
415:
416: Now let us consider a somewhat more complex example: Suppose we know
417: that typically birds fly, and that typically penguins don't. If that is in our
418: background knowledge $K$, as well as ``$\forall x(P(x) \supset B(x))$'', then
419: the
420: flying default becomes $\default{B(x)}{F(x),\neg P(x)}{F(x)}$, and
421: we also have the default
422: $\default{P(x)}{\neg F(x)}{\neg F(x)}.$
423: If $E$ entails ``$P(a)$'', only the second default is applicable. In no more
424: than
$\delta$ of the models of $E \cup K$ will $a$ fly, unless $E \cup K$ entails that $a$ can
426: fly.
427:
428:
429: Another example: Suppose $K$ contains vague information about the
430: frequency with which red birds fly (perhaps because we have
431: encountered few red birds).
432: Say that we know the frequency to be between
433: 0.50 and 1.0.
434: Since the interval for birds in general
435: $[1-\delta,1-\epsilon]$ is included in $[0.5, 1.0]$,
436: this additional piece of information
437: should not interfere with our inference of flying ability.
438: There is no {\em conflict\/} between the two intervals,
439: just less precision in one. The rule about birds in general
440: can be applied to red birds. However, if $K$ contains the knowledge
441: that
442: between 0.5 and
443: $r$ of red birds fly, where
444: $r$ is less than $1-\epsilon$,
445: then this information
446: {\em should\/} interfere.
447: In this case the general rule should be so construed that it does not
448: apply to red birds. If $b$ is a red bird and not a penguin, no conclusion about
449: flying ability is justified.
450:
451: We can arrange this by judiciously adding or deleting
452: justifications in the general bird rule, in accordance with the statistical
453: information in $K$:
454: in the first case we allow red birds; in the second we must require that we do
455: not know $a$ is red: ``$\neg R(x)$'' must be among the justifications of
456: the rule. This statistical approach provides exactly the normative guidance
that is lacking in the ad hoc
458: approach of tweaking default rules in order to arrive at the ``intuitive''
459: results.
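
To illustrate with hypothetical values, let $\delta = 0.05$ and
$\epsilon = 0.02$, so that the interval for birds in general is
$[0.95, 0.98]$, and compare two possible intervals for red birds.
\[\begin{array}{lll}
\mbox{red birds} & \mbox{relation to } [0.95,0.98] & \mbox{bird rule} \\[1ex]
\hline
{[0.50, 1.00]} & \mbox{includes it: no conflict} &
\default{B(x)}{F(x), \neg P(x)}{F(x)} \\[2ex]
{[0.50, 0.90]} & \mbox{neither includes the other: conflict} &
\default{B(x)}{F(x), \neg P(x), \neg R(x)}{F(x)}
\end{array}\]
In the second row $r = 0.90 < 1 - \epsilon = 0.98$, so ``$\neg R(x)$'' is
added to the justifications and the rule is blocked for any individual known
to be red.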
460:
461: More generally, we can give recipes for constructing $\delta$-valid defaults
462: for conclusions of the form $\lc \phi (a) \rc$ from background knowledge $K$
463: and immediate evidence $E$.\footnote{
464: Of course any default conclusion can be given this form, particularly if we allow
465: the term $a$ to be an $n$-sequence of terms. Furthermore, any such conclusion
466: can be taken to be an instance of the consequent of a statistical generalization,
467: in virtue of the fact that statistical generalizations merely impose bounds. We
468: are not imposing serious limitations on default rules. For details, see
469: \cite{kyburg.teng}.}
470: Let $K$ contain $\ulcorner
471: \%x(\phi(x),\psi(x),p,q) \urcorner$ and
472: $\ulcorner
473: \%x(\phi(x),\psi'(x),p',q') \urcorner$.
474: We consider three cases:
475: \begin{enumerate}
476: \item $K$ entails $\ulcorner\forall x
477: (\psi(x) \supset \psi'(x)) \urcorner$.
478: There are three subcases
479: according to the relation among $p$, $p'$, $q$, $q'$:
480: \begin{enumerate}
481: \item ($p \leq p'$ and $q \leq q'$) or ($p' \leq p$ and $q' \leq q$)\\
482: $\default{\psi(x)}{\phi(x)}{\phi(x)}$ and $\default{\psi'(x)}{\neg
483: \psi(x),\phi(x)}{\phi(x)}$ are candidate defaults.
484: \item $p \leq p'$ and $q' \leq q$\\
485: $\default{\psi'(x)}{\phi(x)}{\phi(x)}$ is the only candidate default,
486: since
487: the justification $\neg \psi'(x)$
488: of $\default{\psi(x)}{\phi(x),\neg \psi'(x)}{\phi(x)}$ is
489: inconsistent with the prerequisite $\psi(x)$ and $K$.
490: \item $p' \leq p$ and $q \leq q'$\\
491: $\default{\psi(x)}{\phi(x)}{\phi(x)}$ and $\default{\psi'(x)}{\neg
492: \psi(x),\phi(x)}{\phi(x)}$ are candidate defaults.
493: \end{enumerate}
494: \item $K$ entails $\ulcorner\forall x
495: (\psi'(x) \supset \psi(x)) \urcorner$.
496: This is symmetrical to case 1.
497: \item $K$ entails neither $\ulcorner\forall x
498: (\psi(x) \supset \psi'(x)) \urcorner$ nor $\ulcorner\forall x
499: (\psi'(x) \supset \psi(x)) \urcorner$.
500: Again there are three subcases:
501: \begin{enumerate}
502: \item ($p \leq p'$ and $q \leq q'$) or ($p' \leq p$ and $q' \leq q$) \\
503: The candidate defaults are $\default{\psi(x)}{\phi(x),\neg \psi'(x)}{\phi(x)}$
504: and
505: $\default{\psi'(x)}{\neg \psi(x),\phi(x)}{\phi(x)}$.
506: \item $p \leq p'$ and $q' \leq q$\\
507: $\default{\psi(x)}{\phi(x),\neg \psi'(x)}{\phi(x)}$ and
508: $\default{\psi'(x)}{\phi(x)}{\phi(x)}$ are candidate default rules.
509: \item $p' \leq p$ and $q \leq q'$:
510: This is symmetrical to case 3(b).
511: \end{enumerate}
512: \end{enumerate}
513:
514: Having generated a list of candidate default rules based on our background
knowledge $K$, we delete those rules derived from statistics whose lower
bound is less than $1 -\delta$. The remainder is the set of
517: defaults $\Delta_K$.
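
As an illustration with hypothetical numbers, take $\delta = 0.05$ and let
$K$ contain ``$\forall x(P(x) \supset B(x))$'', ``$\%x(F(x),P(x),0,0.01)$''
and ``$\%x(F(x),B(x),0.95,0.98)$''. With $\phi = F$, $\psi = P$ and
$\psi' = B$ we are in case 1, and since $p = 0 \leq p' = 0.95$ and
$q = 0.01 \leq q' = 0.98$, subcase (a) applies. The candidate defaults are
\[
\default{P(x)}{F(x)}{F(x)} \quad \mbox{and} \quad
\default{B(x)}{\neg P(x), F(x)}{F(x)} .
\]
The first is derived from a statistic whose lower bound $0$ is less than
$1 - \delta = 0.95$, and so it is deleted; the second survives. Applying the
same recipe to the complementary statistics
``$\%x(\neg F(x),P(x),0.99,1)$'' and ``$\%x(\neg F(x),B(x),0.02,0.05)$'',
which follow arithmetically from the two above, leaves only
$\default{P(x)}{\neg F(x)}{\neg F(x)}$. These are the two defaults of the
penguin example discussed earlier.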
518:
519: We have not taken account of relations among default conclusions that may be
520: entailed by $K$. If $K$ contains $\ulcorner \forall x
521: (\phi(x) \equiv \phi'(x)) \urcorner$ then the default conclusion $\ulcorner
522: \phi(a) \rc$ behaves just like the default conclusion $\lc \phi'(a) \rc$. If
523: $K$ contains $\lc \forall x(\phi(x) \supset \phi'(x)) \rc$, then since $\lc
524: \phi(x) \rc$ is equivalent to $\lc \phi(x) \wedge \phi'(x) \rc$ and $\lc
525: \phi'(x) \rc$ is equivalent to $\lc \phi(x) \vee \phi'(x) \rc$ we can make use
526: of the obvious entailment relations.
527:
528: Soundness of a system of deductive logic requires that the conclusion of any
529: inference be true in every model in which the premises are true. Clearly
530: nonmonotonic inference should not be sound. But there is a property that is
531: {\em like\/} soundness that applies to default inference. It is the property
532: that the conclusion is false in at most a fraction $\delta$ of the models of the
533: premises $K \cup E$.
534:
535: \begin{theorem}[Default Soundness]
536: For every set of observations $E$, if $d \in \Delta_K$ is applicable to $E$,
537: the proportion of models of $E \cup K$ in which the conclusion of $d$ is false
is no more than $\delta$.
539:
540: {\rm The proof of this theorem is provided by the soundness theorem for
541: evidential probability \cite[p. 241]{kyburg.teng}, since the rules for deriving
542: defaults are a subset of the rules for computing evidential probabilities.
543: $\Box$}
544: \end{theorem}
545:
546:
547: \section{Interactions within an Extension}
548:
549: Having determined which default rules are justified with respect to the
550: background knowledge, the next step is to investigate the interaction between
551: default rules in generating an extension.
552: A default extension is a minimal deductively closed set that
553: contains the given facts and the consequents of all applicable default rules.
554: Given an evidence set, we need to determine
555: how to control the compound effects of multiple defaults in an extension.
556:
Take, for example, a default version~\cite{Poole89} of
558: the probabilistic lottery paradox~\cite{Kyburg61}. There are $n$ species
559: of birds, $S_1, \ldots, S_n$.
560: We can say that penguins are atypical in that they cannot fly;
561: hummingbirds are atypical in that they have very fine motor control;
parrots are atypical in that they can talk; and so on.
563: If we apply this train of thought to all $n$ species of birds,
564: there is no typical bird left, as for each species
565: there is always at least one aspect in which it is atypical.
566: A parallel scenario is formulated below.
567:
568: \begin{example}
569: \label{ex:birds}
570: $K$ contains
571: \begin{quote}
572: $B(x) \equiv S_1(x) \vee \ldots \vee S_n(x)$\\
573: \mbox{\ } \hfill
574: [an exhaustive list of bird species]\\
575: $S_i(x) \supset \neg S_j(x)$, for all $j \neq i$\\
576: \mbox{\ } \hfill
577: [species are mutually exclusive]\\
$\%x(S_i(x), B(x), \epsilon_i, \delta_i), \mbox{for } 1 \leq i \leq n$\\
579: \mbox{\ } \hfill
580: [the proportion of each $S_i$ species of birds is ``small'']
581: \end{quote}
582: From $K$ we can derive $n$ $\delta^*$-valid default rules for $\Delta_K$:
583: \[d_i=\default{B(x)}{\neg S_i(x)}{\neg S_i(x)}, \mbox{for } 1 \leq i \leq n\]
584: where $\delta^*$ is the maximum of $\delta_1, \ldots, \delta_n$.
585:
586: Now consider the evidence set $E=\{B(a)\}$.
587: In the original formulation of default logic,
588: we would get $n$ extensions, each one containing one $S_i(a)$ and the
589: negations of all the other $S_j(a)$'s.
590: Thus, for each extension, we would conclude that $a$ is a particular species
of bird, which seems to be an overcommitment, considering we have
$\%x(S_i(x), B(x), \epsilon_i, \delta_i)$ in $K$.
593: \end{example}
Note that each of the $n$ default rules is $\delta^*$-valid when considered
595: individually, but in an extension the rules
596: interact to sanction a set of conclusions that when taken together
597: seems implausible according to our knowledge of model frequencies.
598: The definition of an extension dictates that we must keep applying
599: rules until all ``applicable'' ones are exhausted. The
600: ``applicability'' condition is based on maximizing logical strength:
601: for $d=\default{\alpha}{\beta_1, \ldots, \beta_m}{\gamma}$,
602: as long as $\alpha$ is derivable, and the $\beta$'s are consistent with
603: the extension, we must apply $d$ and add $\gamma$ to the extension.
604: Thus, for each of the extensions above, we have to keep applying the rules
605: until we have drawn $n-1$ conclusions: $\neg S_j(a)$ for all $j \neq i$.
Then the consistency requirement blocks the last default rule $d_i$,
as $B(x) \supset S_1(x) \vee \ldots \vee S_n(x)$ together with $B(a)$ and
$\neg S_j(a)$ for all $j \neq i$ gives us $S_i(a)$, contradicting the
justification $\neg S_i(a)$ of $d_i$.
From $\%x(S_i(x), B(x), \epsilon_i, \delta_i)$, we know the proportion
611: of models in which $S_i(a)$ is true, and thus the proportion of models
612: satisfying this extension, given $E$, is at most $\delta_i$, a small ratio.
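
A numerical instance, with values chosen by us for illustration, makes the
tension concrete. Let $n = 20$ and suppose each species accounts for between
4\% and 6\% of birds, so that $\epsilon_i = 0.04$ and
$\delta_i = \delta^* = 0.06$ for every $i$. Each conclusion $\neg S_j(a)$ is
then false in at most a proportion $0.06$ of the models of $E \cup K$. But
the $n-1 = 19$ conclusions of an extension jointly hold exactly when $S_i(a)$
holds:
\[
\bigwedge_{j \neq i} \neg S_j(a) \;\equiv\; S_i(a)
\quad \mbox{(given $B(a)$ and $K$)},
\]
so the extension as a whole is satisfied in at most a proportion
$\delta_i = 0.06$ of those models, although each of its default conclusions
taken singly is satisfied in at least a proportion $0.94$.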
613:
614: \subsection{Sequential Thresholding}
615: The validity criteria for individual default rules can be extended to
extensions resulting from the application of a chain of default rules.
617: We can think of the task of regulating the compound effect of multiple
618: default rules
619: as adjusting the set of relevant models
620: by taking into account
621: the default conclusions of all previously applied rules in the chain of
622: reasoning.
623:
624:
625: One way to accomplish this is by {\em sequential
626: thresholding\/}~\cite{Teng97c}. The applicability condition
627: of a default rule $\default{\alpha}{\beta_1, \ldots, \beta_m}{\gamma}$
628: in an extension can be modified to take into account the validity of the rule.
629: In addition to requiring that $\alpha$ is provable and that none of
$\neg\beta_1, \ldots, \neg\beta_m$ are
631: provable,
632: we require that the default rule
633: be ``above threshold'', that is, the proportion of relevant
634: models satisfying the consequent $\gamma$ be greater
635: than a threshold $1-\epsilon^*$.
636:
637: The set of relevant models shrinks in a stepwise fashion.
638: We start out with all the models satisfying the background knowledge and
639: evidence we have.
640: As default rules are applied sequentially, the consequent of the
641: applied rule at each step is taken as true in all subsequent steps.
642: The relevant models at a particular step
643: are then those that are consistent with the given
644: facts and all the consequents of the rules applied in the previous steps.
645: A default rule, even if it is $\delta$-valid with respect to the background
646: knowledge, would be blocked from
647: application if it does not satisfy the thresholding criterion.
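
The stepwise construction can be sketched as follows. The symbols
${\cal M}_k$, $\gamma_k$, and $\mu_k$ are our own shorthand, introduced only
for this sketch. Let ${\cal M}_0$ be the set of models satisfying the
background knowledge and the evidence, and let $\gamma_k$ be the consequent
of the $k$th rule applied. Then
\[
{\cal M}_k = \{ m \in {\cal M}_{k-1} : m \models \gamma_k \}, \qquad k \geq 1,
\]
and a rule with consequent $\gamma$ passes the thresholding test after $k$
rules have been applied only if
\[
\mu_k(\gamma) > 1 - \epsilon^* ,
\]
where $\mu_k(\gamma)$ denotes the proportion of models in ${\cal M}_k$ that
satisfy $\gamma$. Since ${\cal M}_k \subseteq {\cal M}_{k-1}$, the proportion
$\mu_k(\gamma)$ can change from one step to the next, so a rule that passes
the test at one step may fail it at a later one.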
648:
649: In~\cite{Teng97c}, the thresholding metric
650: is based on a simple probability measure of possible worlds.
651: We can easily extend this metric to employ the same measure as that used for
652: evaluating the $\delta$-validity of default rules.
653:
654: \begin{example}
655: Reconsider Example~\ref{ex:birds}.
656: Let us take $\epsilon^* \geq \delta^*$.
657: We start out with the set ${\cal M}$ of all models satisfying $K$ and $E$.
658: From $\%(S_1(x), B(x), \epsilon_1, \delta_1)$ we know that $d_1$ is
659: above threshold, and it satisfies the other conditions for applicability.
660: Therefore we apply $d_1$ and conclude $\neg S_1(a)$.
661:
662: Now consider $d_2$.
663: The set ${\cal M}'$ of relevant models
664: at this point is a subset of ${\cal M}$; it contains only those
665: models in ${\cal M}$ that satisfy our new conclusion $\neg S_1(a)$ as well.
666: We have eliminated the models in which
667: $S_1(a)$ is true. Since $S_1(x) \supset \neg S_2(x)$,
668: and $S_1(x) \supset B(x)$,
669: all the models eliminated satisfy $B(a)$,
670: and none satisfies $S_2(a)$.
671: Thus, in ${\cal M}'$, the number of models satisfying $S_2(a)$
672: is the same as in ${\cal M}$. However, the number of models
673: satisfying $B(a)$ is lower in ${\cal M}'$ as a result of
674: the addition of $\neg S_1(a)$. This gives rise to a higher proportion
675: of models satisfying $S_2(a)$ in ${\cal M}'$ ($\delta_2'$)
676: than in ${\cal M}$ ($\delta_2$).
If $\delta_2' < \epsilon^*$, $d_2$ is still above threshold
678: after the application of $d_1$, and we can apply it to obtain $\neg S_2(a)$.
679: Otherwise, $d_2$ is below threshold, and we cannot apply it even though
680: it was above threshold before the application of $d_1$.
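
The size of the increase can be estimated under a simplifying assumption of
our own, made only for illustration: that the measure is a simple count and
that the proportions of models in ${\cal M}$ satisfying $S_1(a)$ and $S_2(a)$
are exactly $\delta_1$ and $\delta_2$. Writing $N$ for the number of models
in ${\cal M}$ that satisfy $B(a)$, the application of $d_1$ removes
$\delta_1 N$ of them and leaves the $\delta_2 N$ models satisfying $S_2(a)$
untouched, so
\[
\delta_2' = \frac{\delta_2 N}{N - \delta_1 N} = \frac{\delta_2}{1 - \delta_1} .
\]
With the hypothetical values $\delta_1 = \delta_2 = 0.1$, this gives
$\delta_2' = 0.1/0.9 \approx 0.111$.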
681:
682: After each step of applying a rule, the set of relevant models
shrinks, and, for any unapplied rule $d_i$, the proportion of models satisfying $S_i(a)$
684: increases. After a number of steps, all the remaining rules would be
685: below threshold, and we thus obtain an extension containing only
686: a portion of the conclusions that would otherwise be present in the
687: non-thresholding version of the extension.
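
To illustrate with hypothetical numbers, suppose that there are $n = 10$
species, that exactly one tenth of the birds belong to each, that the
measure is a simple count, and that $\epsilon^* = 0.15$. Because the species
are mutually exclusive, after $d_1, \ldots, d_k$ have been applied the
proportion of remaining models satisfying $S_{k+1}(a)$ is
\[
\delta_{k+1}' = \frac{0.1}{1 - 0.1\,k} ,
\]
which is $0.1$, approximately $0.111$, $0.125$, and approximately $0.143$
for $k = 0, 1, 2, 3$, and approximately $0.167$ for $k = 4$. If the rules
are tried in the order $d_1, d_2, \ldots$, the first four are above
threshold, while $d_5$ and, by symmetry, every other remaining rule are not.
The resulting extension contains the four conclusions
$\neg S_1(a), \ldots, \neg S_4(a)$ rather than the nine conclusions of the
non-thresholding version.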
688: \end{example}
689: Note that the size of $\epsilon^*$
690: determines how much risk is tolerated in an extension. The higher
$\epsilon^*$ is, the more rules can be applied
692: and the longer they can stay above threshold.
693: Reiter's non-thresholding version corresponds to the case when $\epsilon^*=1$;
694: that is, every rule whose associated proportion is above 0 is allowed,
695: and logical consistency alone determines the rule's admissibility.
696:
697:
698: \section{Concluding Remarks}
699:
700: We have developed a notion of validity for default inference based on
701: model proportions. A rule is $\delta$-valid if the proportion of models
702: in which the consequent of the rule is satisfied is greater than $1-\delta$
703: in the relevant models picked out by the background knowledge, the evidence,
704: and the
705: applicability conditions of the default rule.
706: Given a body of background knowledge $K$, we can systematically generate
707: candidate default rules and determine which ones are $\delta$-valid
708: based on the statistical facts known in $K$.
709: Conflicts between default rules stemming from multiple inheritance
710: are resolved as a consequence of the validation process.
711: The result is a set of $\delta$-valid default rules which are
712: ``pre-compiled'' for a given background knowledge base, and can be
713: reused for different evidence sets without change.
714:
715: This idea of evaluating the validity of a default rule using model
716: proportions is extended to extensions generated by
a combination of rules. The compound effect is
718: regulated by a sequential thresholding process,
719: which blocks the rules whose associated model proportions with respect
720: to the ``current'' (shrinking) set of models fall below
721: a particular comfort threshold. This allows us to use a more reasonable
``closure condition'' for extensions than the usual maximal logical strength:
723: we can refrain from applying rules that would make the extension
724: satisfiable in only a small set of models, even if the consequent of
725: the rule is logically consistent with the extension.
726:
727: Grounding the justification of default rules in model proportions provides
728: a way to validate the rules empirically, and is a first step towards
729: automating the learning of default rules from (statistical) data.
730: One might ask why we need the default rules when we can reason with
731: the statistics directly. Default rules provide a succinct and more
732: understandable characterization of the import of the data, as well as
733: a smooth articulation of the information that
734: may exist in the knowledge base.
735:
736:
737:
738: \section*{Acknowledgement}
This work was supported by the National Science Foundation (STS-9906128 and
IIS-0082928) and by NASA (NCC2-1239).
741:
742:
743: \begin{thebibliography}{}
744:
745: \bibitem[\protect\citeauthoryear{Adams}{1966}]{adams.1966}
746: Ernest~W. Adams.
747: \newblock Probability and the logic of conditionals.
748: \newblock In Jaakko Hintikka and Patrick Suppes, editors, {\em Aspects of
749: Inductive Logic}, pages 265--316. North Holland, Amsterdam, 1966.
750:
751: \bibitem[\protect\citeauthoryear{Adams}{1975}]{adams}
752: Ernest~W. Adams.
753: \newblock {\em The Logic of Conditionals}.
754: \newblock Reidel, Dordrecht, 1975.
755:
756: \bibitem[\protect\citeauthoryear{Bacchus \bgroup \em et al.\egroup
757: }{1993}]{BacchusGHK93}
Fahiem Bacchus, Adam~J. Grove, Joseph~Y. Halpern, and Daphne Koller.
759: \newblock Statistical foundations for default reasoning.
760: \newblock In {\em Proceedings of the Thirteenth International Joint Conference
761: on Artificial Intelligence}, pages 563--569, 1993.
762:
763: \bibitem[\protect\citeauthoryear{Brewka}{1989}]{Brewka89}
764: Gerhard Brewka.
765: \newblock Preferred subtheories---an extended logical framework for default
766: reasoning.
767: \newblock In {\em Proceedings of the Eleventh International Joint Conference on
768: Artificial Intelligence}, 1989.
769:
770: \bibitem[\protect\citeauthoryear{Brewka}{1994}]{Brewka94}
771: Gerhard Brewka.
772: \newblock Reasoning about priorities in default logic.
773: \newblock In {\em Proceedings of the Twelfth National Conference on Artificial
774: Intelligence}, pages 940--945, 1994.
775:
776: \bibitem[\protect\citeauthoryear{Dewey}{1938}]{dewey}
777: John Dewey.
778: \newblock {\em Logic: the Theory of Inquiry}.
779: \newblock Henry Holt, 1938.
780:
781: \bibitem[\protect\citeauthoryear{Dummett}{1978}]{dummett.justification}
782: Michael Dummett.
783: \newblock The justification of deduction.
784: \newblock In {\em Truth and Other Enigmas}, pages 290--318. Duckworth, London,
785: 1978.
786:
787: \bibitem[\protect\citeauthoryear{Goldman}{1979}]{goldman}
788: Alvin~I. Goldman.
789: \newblock What is justified belief?
790: \newblock In George~S. Pappas, editor, {\em Justification and Knowledge}, pages
791: 1--24. D. Reidel, Dordrecht, 1979.
792:
793: \bibitem[\protect\citeauthoryear{Haack}{1976}]{haack.justification}
794: Susan Haack.
795: \newblock The justification of deduction.
796: \newblock {\em Mind}, 85:112--119, 1976.
797:
798: \bibitem[\protect\citeauthoryear{Horty \bgroup \em et al.\egroup
799: }{1987}]{HortyTT87}
800: J.~F. Horty, D.~S. Touretzky, and R.~H. Thomason.
801: \newblock A clash of intuitions: The current state of nonmonotonic multiple
802: inheritance systems.
803: \newblock In {\em Proceedings of the Tenth International Joint Conference on
804: Artificial Intelligence}, pages 476--482, 1987.
805:
806: \bibitem[\protect\citeauthoryear{James}{1948}]{james}
807: William James.
808: \newblock {\em Essays in Pragmatism}.
809: \newblock Hafner, New York, 1948.
810:
811: \bibitem[\protect\citeauthoryear{Kyburg and Teng}{2001}]{kyburg.teng}
812: Henry~E. Kyburg, Jr. and Choh~Man Teng.
813: \newblock {\em Uncertain Inference}.
814: \newblock Cambridge University Press, New York, 2001.
815:
816: \bibitem[\protect\citeauthoryear{Kyburg}{1958}]{kyburg.justification}
Henry~E. Kyburg, Jr.
818: \newblock The justification of deduction.
819: \newblock {\em Review of Metaphysics}, 12:19--25, 1958.
820:
821: \bibitem[\protect\citeauthoryear{Kyburg}{1961}]{Kyburg61}
822: Henry~E. Kyburg, Jr.
823: \newblock {\em Probability and the Logic of Rational Belief}.
824: \newblock Wesleyan University Press, 1961.
825:
826: \bibitem[\protect\citeauthoryear{Kyburg}{2001}]{kyburg.2001}
827: Henry~E. Kyburg, Jr.
828: \newblock Real logic is nonmonotonic.
829: \newblock {\em Minds and Machines}, 11:577--595, 2001.
830:
831: \bibitem[\protect\citeauthoryear{Levi}{1967}]{levi.gambling}
832: Isaac Levi.
833: \newblock {\em Gambling with Truth}.
834: \newblock Knopf, New York, 1967.
835:
836: \bibitem[\protect\citeauthoryear{Levi}{1980}]{levi.enterprise}
837: Isaac Levi.
\newblock {\em The Enterprise of Knowledge}.
839: \newblock MIT Press, Cambridge, 1980.
840:
841: \bibitem[\protect\citeauthoryear{Levi}{1996}]{levi.argument}
842: Isaac Levi.
843: \newblock {\em For the Sake of the Argument}.
844: \newblock Cambridge University Press, Cambridge, 1996.
845:
846: \bibitem[\protect\citeauthoryear{{\L}ukaszewicz}{1988}]{Lukaszewicz88}
847: Witold {\L}ukaszewicz.
848: \newblock Considerations on default logic: An alternative approach.
849: \newblock {\em Computational Intelligence}, 4(1):1--16, 1988.
850:
851: \bibitem[\protect\citeauthoryear{Moore}{1985}]{Moore85}
852: Robert~C. Moore.
853: \newblock Semantical considerations on nonmonotonic logic.
854: \newblock {\em Artificial Intelligence}, 25:75--94, 1985.
855:
856: \bibitem[\protect\citeauthoryear{Morgan}{1998}]{morgan}
857: Charles Morgan.
858: \newblock Non-monotonic logic is impossible.
859: \newblock {\em Canadian Artificial Intelligence Magazine}, 42:18--25, 1998.
860:
861: \bibitem[\protect\citeauthoryear{Neufeld \bgroup \em et al.\egroup
862: }{1990}]{NeufeldPA90}
863: Eric Neufeld, David Poole, and Romas Aleliunas.
864: \newblock Probabilistic semantics and defaults.
865: \newblock In {\em Uncertainty in Artificial Intelligence}, volume~4, pages
866: 121--131. North-Holland, 1990.
867:
868: \bibitem[\protect\citeauthoryear{Pearl and Geffner}{1990}]{pearl.geffner}
869: Judea Pearl and Hector Geffner.
870: \newblock A framework for reasoning with defaults.
871: \newblock In Henry~E. Kyburg, Jr., Ronald~P. Loui, and Greg~N. Carlson,
872: editors, {\em Knowledge Representation and Defeasible Reasoning}, pages
873: 69--87. Kluwer, 1990.
874:
875: \bibitem[\protect\citeauthoryear{Pearl}{1988}]{Pearl88}
876: Judea Pearl.
877: \newblock {\em Probabilistic Reasoning in Intelligent Systems}.
878: \newblock Morgan Kaufmann, 1988.
879:
880: \bibitem[\protect\citeauthoryear{Pearl}{1990}]{Pearl90}
881: Judea Pearl.
882: \newblock System {Z}: A natural ordering of defaults with tractable
883: applications to default reasoning.
884: \newblock In {\em Theoretical Aspects of Reasoning about Knowledge}, pages
885: 121--135, 1990.
886:
887: \bibitem[\protect\citeauthoryear{Peirce}{1903}]{peirce}
888: Charles~S. Peirce.
889: \newblock The fixation of belief.
\newblock In J.~Buchler, editor, {\em The Philosophy of Peirce}, pages 5--22.
891: Harcourt Brace and Company, New York, 1903.
892:
893: \bibitem[\protect\citeauthoryear{Poole}{1989}]{Poole89}
894: David Poole.
895: \newblock What the lottery paradox tells us about default reasoning.
896: \newblock In {\em Proceedings of the First International Conference on
897: Principles of Knowledge Representation and Reasoning}, pages 333--340, 1989.
898:
899: \bibitem[\protect\citeauthoryear{Priest}{1989}]{priest}
900: Graham Priest.
901: \newblock Reasoning about truth.
902: \newblock {\em Artificial Intelligence}, 39:231--244, 1989.
903:
904: \bibitem[\protect\citeauthoryear{Reiter and Criscuolo}{1981}]{ReiterC81}
905: Raymond Reiter and Giovanni Criscuolo.
906: \newblock On interacting defaults.
907: \newblock In {\em Proceedings of the Seventh International Joint Conference on
908: Artificial Intelligence}, pages 270--276, 1981.
909:
910: \bibitem[\protect\citeauthoryear{Reiter}{1980}]{Reiter80}
911: R.~Reiter.
912: \newblock A logic for default reasoning.
913: \newblock {\em Artificial Intelligence}, 13:81--132, 1980.
914:
915: \bibitem[\protect\citeauthoryear{Teng}{1997}]{Teng97c}
916: Choh~Man Teng.
917: \newblock Sequential thresholds: Context sensitive default extensions.
\newblock In {\em Proceedings of the Thirteenth Conference on Uncertainty in
919: Artificial Intelligence}, pages 437--444, 1997.
920:
921: \bibitem[\protect\citeauthoryear{Touretzky}{1984}]{Touretzky84}
922: David~S. Touretzky.
923: \newblock Implicit ordering of defaults in inheritance systems.
924: \newblock In {\em Proceedings of the Fifth National Conference on Artificial
925: Intelligence}, pages 322--325, 1984.
926:
927: \end{thebibliography}
928:
929: \end{document}
930:
931: