cs0207083/eval.tex
1: \documentclass{article} 
2: \usepackage{proceedings}
3: \usepackage{latexsym}
4: \usepackage{amsfonts}
5: \usepackage{named}
6: 
7: \newtheorem{definition}{Definition}
8: \newtheorem{theorem}{Theorem}
9: \newtheorem{ex}{Example}
10: \newenvironment{example}{\begin{ex} \rm}{$\Box$ \end{ex}}
11: 
12: \newcommand{\Th}{{\bf Th}}
13: \newcommand{\default}[3]{\frac{#1:#2}{#3}}
14: 
15: \newcommand{\lc}{\ulcorner}
16: \newcommand{\rc}{\urcorner}
17: 
18: 
19: \newcommand{\note}[1]{\bigskip[{\em #1}]\bigskip}
20: 
21: \title{Evaluating Defaults}
22: \author{
23: {\bf Henry E. Kyburg, Jr.}$^{1,2}$\\
24: {\tt kyburg@cs.rochester.edu}
25: \vspace{1ex}\\
26: $^{1}$Computer Science and Philosophy\\
27: University of Rochester\\
28: Rochester NY 14627, USA
29: \And
30: {\bf Choh Man Teng}$^{2}$\\
31: {\tt cmteng@ai.uwf.edu}
32: \vspace{1ex}\\
33: $^{2}$Institute for Human and Machine Cognition\\
34: University of West Florida\\
35: Pensacola FL 32501, USA
36: }
37: 
38: \begin{document}
39: 
40: \maketitle
41: 
42: \begin{abstract}
43: We seek to find normative criteria of adequacy for nonmonotonic logic similar
44: to the criterion of validity for deductive logic.  Rather than stipulating that
45: the conclusion of an inference be true in all models in which the premises are
46: true, we require that the conclusion of a nonmonotonic inference be true in
47: ``almost all'' models of a certain sort in which the premises are true.  This
48: ``certain sort'' specification picks out the models that are relevant to the
49: inference, taking into account factors such as specificity and vagueness,
50: and previous inferences.  The frequencies
51: characterizing the relevant models reflect known frequencies in our actual
52: world.  The criteria of adequacy for a default
53: inference can be extended by thresholding to criteria of adequacy for an
54: extension.  We show that this avoids the implausibilities that might otherwise
55: result from the chaining of default inferences.  The model proportions,
56: when construed in terms of frequencies, provide a verifiable grounding of
57: default rules, and can become the basis for generating default rules from
58: statistics.
59: 
60: 
61: \vskip 2ex
62: \noindent
63: {\em Keywords}: 
64: probability, frequency, default logic
65: \end{abstract}
66: 
67: 
68: \section{Introduction}
69: 
70: Nonmonotonic reasoning, for example default logic~\cite{Reiter80},
71: models the intuitive process of making non-deductive
72: inferences in the face of certain supportive but not conclusive evidence.
73: Given a default theory $\Delta=\langle D,F \rangle$, we can obtain its
74: extensions by following a prescribed set of steps.
75: However, on what grounds do we employ a particular default rule?
76: Some writers would
77: regard this as an inappropriate question, since they take as their goal the
78: representation of human inference.
79: To this end defaults represent rules that we take to be intuitively
80: appropriate.  But then when we apply these rules,
81: we may be led to counterintuitive
82: results~\cite{ReiterC81,Lukaszewicz88,Poole89}.
83: The underlying principle seems to be circular: the original default
84: rules are ``intuitively good'' at first glance, but when we discover that they
85: do not give rise to the desired results, we tweak the rules until they give us
86: those results. It seems that we have to know what results we want first before
87: constructing the default theory, rather than having the default theory tell us
88: what conclusions are warranted. This is precisely the reason why we need an
89: independent measure of validity for default rules and default extensions.
90: We think of nonmonotonic
91: logic as sharing the normative character of other logics.  From this point of
92: view default rules require some defense.
93: We will concentrate on default logic here, though much of
94: what we have to say will apply to other nonmonotonic approaches as well.
95: Much
96: of the work on nonmonotonic logic has concerned the syntactic manipulation
97: of the nonmonotonic rules,
98: rather than their basic justification.
99: 
100: 
101: 
102: \subsection{Selective Preference}
103: For a {\em default rule\/}
104: $d=\default{\alpha}{\beta_1, \ldots, \beta_n}{\gamma}$,
105: $\alpha$ is the {\em prerequisite\/},
106: $\beta_1, \ldots, \beta_n$ are the {\em justifications\/},
107: and $\gamma$ is the {\em consequent\/} of $d$.
108: Loosely speaking, the rule
109: conveys the idea that if $\alpha$ is provable, and
110: $\neg\beta_1, \ldots, \neg\beta_n$ are
111: each
112: not provable,
113: then  by default  we conclude that $\gamma$ is true.
114: A {\em default theory\/} is an ordered pair
115: $\langle D,F \rangle$, where
116: $D$ is a set of default rules and $F$ is a set of
117: ``facts''.
118: A theory extended from $F$ by applying the default rules in $D$
119: is known as an {\em extension\/} of the default theory.
120: 
121: Consider the following canonical example.
122: 
123: 
124: \begin{example}
125: We have a default theory $\Delta=\langle D, F \rangle$, where
126: \[\begin{array}{rcl}
127: D &=& \{\default{R(x)}{T(x)}{T(x)},
128:      \default{S(x)}{\neg T(x)}{\neg T(x)}\}, \\
129: F &=& \{R(a),S(a)\}.
130: \end{array}\]
131: 
132: We get two extensions, one containing $T(a)$ and the other containing
133: $\neg T(a)$.
134: If we take ``$R(x)$'' to mean that $x$ is a bird, ``$S(x)$'' to mean that
135: $x$ is a penguin, and ``$T(x)$'' to mean that $x$ flies, then we would like
136: to reject the extension containing ``$T(a)$'' ($a$ flies) in favor of the
137: extension containing ``$\neg T(a)$'' ($a$ does not fly).
138: However, if we take ``$S(x)$'' to mean that $x$ is an animal instead and
139: keep ``$R(x)$'' and ``$T(x)$'' the same, we would want to reverse our preference.
140: Now the extension containing ``$T(a)$'' ($a$ flies) seems better.
141: \end{example}
142: Note that each of the default rules involved in the example above
143: is intuitively appealing
144: when viewed by itself against our background knowledge:
145:  birds fly;
146: penguins do not fly;
147: and animals in general do not fly either.
148: Moreover, both instantiations (penguins and animals) are syntactically
149: identical.  Thus, we cannot base our decision to prefer one default rule over
150: the other simply on their syntactic structures.
151: 
152: It is the interaction between the default rules and evidence
153: that gives rise to the selective preference above.  We have
154: the evidence that ``$a$ is a bird''.
155: If in addition we also have ``$a$ is a penguin'',
156: we prefer the penguin rule.
157: If instead we have ``$a$ is an animal'', we prefer the bird rule.
158: 
159: There are several approaches to circumventing this conceptual difficulty.
160: The first is to revise the default theory so that the desired result
161: is achieved~\cite{ReiterC81}.  We can amend the default rules
162: by adding the exceptions as justifications (writing $B$, $P$, $A$, and $F$ for bird, penguin, animal, and flies), for example
163: $\default{B(x)}{F(x), \neg P(x)}{F(x)}$ and
164: $\default{A(x)}{\neg F(x), \neg B(x)}{\neg F(x)}$.
165: With this approach we have to constantly revise the default rules
166: to take into account additional exceptions.
167: We have little guidance in constructing the list of justifications
168: except that the resulting default rule has to produce the ``right''
169: answer in the given situation.
170: 
171: 
172: Another approach is to establish some priority structure over the
173: set of defaults.
174: For example, we can refer to a specificity or inheritance hierarchy
175: to determine which default rule should be used in case of a
176: conflict~\cite{Touretzky84,HortyTT87}.
177: The penguin rule is more specific than the bird rule, when both are
178: applicable, and therefore we use the penguin rule and not the bird rule.
179: However, conflicting rules do not always fit into neat hierarchies
180: (for example, adults are employed and students are not, but what about adult
181: students?~\cite{ReiterC81}).  It is not obvious how we can extend
182: the hierarchical structure without resorting to explicitly enumerating
183: the priority relations between the default rules~\cite{Brewka89,Brewka94}.
184: 
185: The third approach is to appeal to probabilistic
186: analysis.  Defaults are interpreted as
187:  representing  properties of conditional probabilities.
188: For example, the conditional probability of $a$ being able to fly given
189: that $a$ is a bird is ``high''~\cite{Pearl88,Pearl90}
190: or increases from the prior probability~\cite{NeufeldPA90},
191: while the conditional probability of $a$ being able to fly given that $a$ is 
192: a penguin
193: is ``low'' or decreases.  This approach provides a probabilistic semantics
194: for default rules, but in a way which does not represent the fact that
195:  the conclusions are {\em accepted\/}.  The default conclusion is ``Tweety 
196: flies,'' not ``Probably Tweety flies.''
197:   Stopping at a probability is contrary to the spirit of
198: nonmonotonic reasoning: default conclusions should be accepted as new facts,
199: and we should be able to chain default rules
200: and build upon the conclusions of previous default applications
201: to obtain further conclusions.
202: 
203: 
204: \subsection{Justifying Nonmonotonic Inference}
205: 
206: The justification of beliefs is a long-standing issue in epistemology.  There
207: is
208: not much that is problematic about the justification of beliefs obtained by
209: deductive inference (though there are plenty of problems that surround
210: deduction --- see
211: \cite{kyburg.justification,haack.justification,dummett.justification}, not to
212: mention the voluminous literature on paraconsistent logic \cite{priest}).  The
213: reason is that we can show that the ordinary rules of deduction lead from
214: premises to conclusions that are true in every model in which the premises are
215: true.  This is exactly what is not true of ampliative inference, and it is what
216: has led some writers (e.g., \cite{morgan}) to deny that there {\em is} any such
217: thing as a nonmonotonic logic.  This has been disputed in \cite{kyburg.2001}.
218: 
219: But other kinds of justifications of beliefs have been proposed.  Isaac Levi
220: \cite{levi.gambling,levi.enterprise,levi.argument} has argued for many years
221: that the way to understand ampliative (inductive, nonmonotonic) argument is in
222: terms of decision theory: we choose (decide) to accept a hypothesis in a given
223: context provided that the expected epistemic utility of doing so in that
224: context
225: is greater than the expected utility of any other epistemic act, such as
226: suspending belief totally, or accepting a stronger hypothesis.
227: 
228: Levi's approach employs a rich and detailed structure for acceptance, and
229: allows drawing many important distinctions.    This structure requires three
230: things that make it less than perfect as a vehicle for ordinary nonmonotonic
231: inferential systems.  First, in keeping with a long tradition in pragmatism
232: \cite{dewey,peirce,james} the context of inquiry must be tied to a specific
233: problem: We need the answer to a question.  Second, the epistemic expectation of an
234: answer is the expected value of the {\em information} contained in that
235: answer.  Thus we need to presuppose an information measure on the language of
236: our inquiry \cite[p. 169]{levi.argument}. Third, we need to have available a
237: credal or inductive probability,  based on a measure (or
238: convex set of measures) on the sentences of the language, in terms of which a
239: conditional probability (or convex set of conditional probabilities)
240: can be defined  \cite[p. 52]{levi.enterprise}.
241: 
242: It is our belief that, in some contexts in which we might wish to use
243: nonmonotonic mechanisms, this overhead is unnecessary and perhaps itself
244: difficult to justify.  We would also like to be able to explicate the
245: justification of
246: inference in a less context-dependent way.
247: 
248: Another approach that has attracted considerable
249: attention in the philosophical community in recent years is that of
250: ``reliabilism'' whose best known exponent is Alvin Goldman \cite{goldman}.
251: According to this view, what justifies a belief is the fact that it is obtained
252: by a ``reliable cognitive process...'' \cite[p. 20]{goldman}.  Of course there
253: are a number of additional hedges to the view that are required for philosophical
254: accuracy, and even with those hedges there remains a certain vagueness in the
255: view.  These details need not detain us, since we are seeking inspiration
256: rather than philosophical precision.
257: 
258: What does ``reliable'' mean?  
259: We will construe reliability in terms of frequency or propensity to yield truth
260:  when applied.
261: Specifically, we will say that the belief $\phi$ is nonmonotonically justified
262: by a default rule if the rule would frequently lead to truth and
263: rarely to error, given what we know --- given our background knowledge.
264: 
265: A deductive argument is justified (valid) if its conclusion is true in every
266: model of its premises.
267: We will attempt to provide an analog of the justification of deductive rules:
268: a default argument is justified if its
269: conclusion is true in a high proportion of the relevant models in which its premises are
270: true.
271: To make this idea precise requires an excursion into model theory.
272: 
273: 
274: \section{Model Theory}
275: 
276: We will suppose that the underlying object language is a first order
277:  language that does not involve such intensional predicates
278: as ``know'' or ``believe.''  A number of nonmonotonic formalisms
279: (specifically autoepistemic logic~\cite{Moore85})
280: do involve such locutions within the object
281: language, but they can be dispensed with
282:  in default logic.  The default rule
283: $\default{\alpha}{\beta_1,\ldots, \beta_n}{\gamma}$ can be read in terms
284: of the nonmembership of $\ulcorner \neg \beta_i \urcorner$ in a specified set
285: of
286: expressions $\Gamma$. In the original default logic, $\Gamma$ would
287: just be an extension. 
288: 
289: 
290:  There are a number of immediate problems associated
291: with the idea of looking at the ``proportion'' of models.
292: The least of them is choosing a {\em level\/} at
293: which to regard the evidence as adequate.  Should we require that the
294: proportion be 0.95?  Or 0.99?  Or 0.995?  This is
295: just the sort of question that
296: arises in statistical hypothesis testing or in confidence interval estimation.
297: We shall suppose that in a given context there is some
298: agreed-upon level of security $\delta$; we will accept a conclusion if the
299: proportion of models in which we could be committing an error is no greater
300: than $\delta$.
301: 
302: This approach is to be contrasted with those of Adams \cite{adams,adams.1966},
303: Pearl \cite{Pearl88} and Bacchus et al.\ \cite{BacchusGHK93}.  Adams requires that
304: for
305: $A$ to be a reasonable consequence of the set of sentences $S$, for {\em any\/}
306: $\epsilon$ there must be a positive $\delta$ such that for every probability
307: function, if the probability of every sentence in $S$ is greater than $1 -
308: \delta$, then the probability of $A$ is at least $1 - \epsilon$ \cite[p.
309: 274]{adams.1966}.  Pearl's approach similarly involves quantification over
310: possible probability functions.  
311: Bacchus et al., in turn, take the degree of belief of a statement to be
312: the limiting proportion of first order models in which the statement is true.
313: All of these approaches involve matters that go well beyond what we may
314: reasonably suppose to be available to us as empirical enquirers.  Our $\delta$, on
315: the other hand, serves much like the $\alpha$ of statistical testing.
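\par\medskip\noindent
Written out, Adams's requirement as paraphrased above reads as follows.  The
symbolization is ours, with $P$ ranging over probability functions and
$\epsilon$ taken to range over positive reals:
\[
\forall \epsilon > 0 \;\; \exists \delta > 0 \;\; \forall P : \;
\Bigl( \forall s \in S, \; P(s) > 1 - \delta \Bigr) \supset
P(A) \geq 1 - \epsilon .
\]
In this formula $\delta$ is bound by a quantifier, whereas our $\delta$ is a
single level fixed in advance by the context.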
316: 
317: We must restrict the number of models
318: under consideration to a finite  number so that the idea
319: of looking at proportions makes
320: sense.\footnote{We could, instead, seek to develop a
321: way of proceeding to a limit; this still would require restrictions to arrive
322: at a countable number of models, and would entail a large expository cost for
323: little gain in plausibility.}
324: We will be taking account of statistical information, and to this end will want
325: each model to have a finite domain.
326:   Roughly speaking, we take as a model
327: of our language one in which the domain of empirical individuals is of finite
328: cardinality.  This may be regarded as problematic (it
329: entails the falsity of ``every person has two parents  and
330: nobody is his own ancestor'') but with reasonable spatial and temporal bounding
331: it can be rendered plausible.
332: 
333: Even so, to ensure that the set of models is finite
334: we must restrict the empirical domain even further.  Not only must it consist
335: of
336: a finite set of physical entities, but this same set of physical entities
337: $\mathcal D$ must be taken to be the empirical domain of every model.
338: 
339: We assume that it is possible to express statistical knowledge in this
340: language.  For example, if ``$B(x)$'' is the predicate ``is a bird''
341: and
342: ``$F(x)$'' is ``can fly'', we can express the fact that
343: between 85\% and 95\% of birds fly by the formula $0.85 <
344: \frac{|\{x:B(x)
345: \land F(x)\}|}{|\{x:B(x)\}|} < 0.95$.  Employing the notation of
346: \cite{kyburg.teng} we write this as ``$\%x(F(x),B(x),0.85,0.95)$.''
347: This renders ``\%''  a variable binding operator on 4-sequences of expressions: 
348: two formulas and two fractions.
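\par\medskip\noindent
As a small illustration (the numbers are ours, chosen only for concreteness),
suppose that in some model $m$ the domain $\mathcal D$ contains 20 birds, of
which 18 fly.  Then
\[
\frac{|\{x:B(x) \land F(x)\}|}{|\{x:B(x)\}|} = \frac{18}{20} = 0.90,
\]
which lies strictly between $0.85$ and $0.95$, so
``$\%x(F(x),B(x),0.85,0.95)$'' is true in $m$.  Had only 16 of the 20 birds
flown, the ratio would be $0.80$ and the statement would be false in $m$.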
349: 
350: 
351: We distinguish, as do Pearl and Geffner \cite[p. 70]{pearl.geffner}, between
352: immediate evidence, represented by a finite set of sentences $E$ concerning
353: particular facts (to be distinguished  from the general body of factual
354: knowledge $F$ invoked by classical default logic), and a finitely axiomatizable
355: set of sentences
356: $K$ representing general background knowledge.
357:  What defaults are plausible depends, of course, on background knowledge.
358: If it were not for what we take to be the typical (or natural, or frequent)
359: behavior of birds, the world's best known example of a default rule would not
360: be
361: plausible.  On the other hand no one has proposed the default rule $\frac
362: {\mathrm{fish}(a):\mathrm{mackerel}(a)}{\mathrm{can-talk}( a)}$.
363: 
364: Thus in general
365: we will represent the set of default rules of a default theory as $\Delta_K$
366: rather than $D$, since we take them to be a function of our body of general
367: knowledge $K$. Given an error tolerance 
368: $\delta$, we will take a default rule to be {\em $\delta$-valid} if, for every
369: set of possible input sentences $E$ consistent with $K$, the application of the
370: rule to $E$ leads to a false conclusion in a
371: proportion of at most
372: $\delta$ of the relevant models.  
373: More precisely, a default rule is $\delta$-valid if and only if for every set of
374: input sentences $E$ consistent with $K$ to which the rule is applicable, the
375: proportion of models of $E \cup K$ in which the conclusion of the rule is false
376: is no more than $\delta$.
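\par\medskip\noindent
The condition can also be stated in symbols.  The notation
${\cal M}(E \cup K)$ for the finite set of models of $E \cup K$, and $\gamma$
for the conclusion the rule yields on $E$, are introduced here only for this
restatement.  A rule is $\delta$-valid if, for every $E$ consistent with $K$
to which it is applicable,
\[
\frac{|\{m \in {\cal M}(E \cup K) : \gamma \mbox{ is false in } m\}|}
     {|{\cal M}(E \cup K)|} \leq \delta .
\]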
377: 
378: To fix our ideas, let us begin with a simple example.  Suppose
379: $K$ includes a statement to the effect that at least $1 -
380: \delta$ and not more than $1 - \epsilon$ of birds fly, and nothing else; that is,
381: ``$\%x(F(x),B(x),1-\delta,1-\epsilon)$.''  Consider the rule
382: $\default{B(x)}{F(x)}{F(x)}$.
383: This
384: rule is ``applicable'' to immediate evidence 
385: $E$ only if $E \cup K$ entails a sentence of the form
386: $\ulcorner B(a) \urcorner$ and no corresponding sentence of the form
387: $\ulcorner \neg F(a) \urcorner$.
388: 
389: Our models have a single domain $\mathcal D$ of finite cardinality.
390: We will write ``$\mathcal I_m(\phi)$'' for the interpretation of $\phi$ in
391: the model $m$.  The constraint imposed by
392: $K$ is that for every model $m$ the proportion of objects in $\mathcal I_m(B)$ that
393: are
394: also in $\mathcal I_m(F)$ lies in
395: $[1-\delta,1-\epsilon]$.
396: 
397: There are three cases.  First, suppose $E \cup K$ does not entail a sentence of
398: the form $\ulcorner B(a) \urcorner$.  Then the rule is inapplicable.  Second, 
399: suppose that for some term $a$, $E \cup K$ entails
400: ``$B(a)$'' and also entails ``$\neg F(a)$''.  The rule is again inapplicable, because
401: it is blocked by the failure of a justification.  Third, suppose
402: for some term $a$, $E \cup K$ entails
403: ``$B(a)$'' but not ``$\neg F(a)$''.
404: Then $\mathcal I_m(a) \in \mathcal I_m(B)$.  There are
405: $|\mathcal I_m(B)|$ interpretations of $a$ that make $E \cup K$ true;
406: of these a proportion of at least
407: $1-\delta$ makes ``$F(a)$'' true.  We have said nothing about interpreting the
408: rest of the language, but however many interpretations there are (we have seen
409: to it that there are only a finite number) the proportion that renders
410: ``$F(a)$'' true will remain unchanged; it will be at least $1-\delta$. 
411:  Thus, given the background knowledge that we have
412: posited, the rule is $\delta$-valid: if it is applicable it will lead to error no 
413: more than $\delta$ of the time.
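\par\medskip\noindent
For a concrete reading of the third case (the figures are illustrative
assumptions, not data), let $\delta = 0.10$ and $\epsilon = 0.02$, and suppose
$|\mathcal I_m(B)| = 100$.  The constraint imposed by $K$ gives
\[
90 \leq |\mathcal I_m(B) \cap \mathcal I_m(F)| \leq 98 ,
\]
so of the 100 admissible interpretations of $a$ at least 90 make ``$F(a)$''
true and at most 10 make it false.  The proportion of error is therefore at
most $10/100 = \delta$.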
414: 
415: 
416: Now let us consider a somewhat more complex example:  Suppose we know
417: that typically birds fly, and that typically penguins don't.  If that is in our
418: background knowledge $K$, as well as ``$\forall x(P(x) \supset B(x))$'', then
419: the
420: flying default becomes $\default{B(x)}{F(x),\neg P(x)}{F(x)}$, and
421: we also have the default
422: $\default{P(x)}{\neg F(x)}{\neg F(x)}.$
423: If $E$ entails ``$P(a)$'', only the second default is applicable.  In no more
424: than
425: $\delta$ of the models of $E \cup K$ will $a$ fly, unless $E \cup K$ entails that $a$ can
426: fly.
427: 
428: 
429: Another example: Suppose $K$ contains vague information about the
430: frequency with which red birds fly (perhaps because we have 
431: encountered few red birds).  
432: Say that we know the frequency to be between
433: 0.50 and 1.0. 
434: Since the interval for birds in general
435: $[1-\delta,1-\epsilon]$ is included in $[0.5, 1.0]$,
436: this additional piece of information
437: should not interfere with our inference of flying ability.
438: There is no {\em conflict\/} between the two intervals,
439: just less precision in one.  The rule about birds in general
440: can be applied to red birds.  However, if $K$ contains the knowledge
441: that
442: between 0.5 and
443: $r$ of red birds fly, where
444: $r$ is less than $1-\epsilon$, 
445: then this information
446: {\em should\/} interfere.  
447: In this case the general rule should be so construed that it does not 
448: apply to red birds.  If $b$ is a red bird and not a penguin, no conclusion about
449:  flying ability is justified.
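\par\medskip\noindent
To see the two situations side by side, take the illustrative values
$\delta = 0.10$ and $\epsilon = 0.02$ (an assumption of ours), so that the
interval for birds in general is $[0.90, 0.98]$.  In the first situation
$[0.90, 0.98] \subseteq [0.50, 1.00]$: the red-bird statistic is merely less
precise, and the general rule still applies.  In the second, taking
$r = 0.95$, the red-bird interval is $[0.50, 0.95]$; it overlaps
$[0.90, 0.98]$, but neither interval includes the other, since
$0.50 < 0.90$ and $0.95 < 0.98$.  This is the situation in which the general
rule must not be applied to red birds.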
450: 
451: We can arrange this by judiciously adding or deleting
452: justifications in the general bird rule, in accordance with the statistical
453: information in $K$:
454: in the first case we allow red birds; in the second we must require that we do
455: not know $a$ is red: ``$\neg R(x)$'' must be among the justifications of
456: the rule.  This statistical approach provides exactly the normative guidance
457: that is lacking in the ad-hoc
458: approach of tweaking default rules in order to arrive at the ``intuitive''
459: results.
460: 
461: More generally, we can give recipes for constructing $\delta$-valid defaults
462: for conclusions of the form $\lc \phi (a) \rc$ from background knowledge $K$ 
463: and immediate evidence $E$.\footnote{
464: Of course any default conclusion can be given this form, particularly if we allow
465: the term $a$ to be an $n$-sequence of terms.  Furthermore, any such conclusion
466: can be taken to be an instance of the consequent of a statistical generalization,
467: in virtue of the fact that statistical generalizations merely impose bounds. We
468: are  not imposing serious limitations on default rules.  For details, see
469: \cite{kyburg.teng}.}
470: Let $K$ contain $\ulcorner
471: \%x(\phi(x),\psi(x),p,q) \urcorner$ and 
472: $\ulcorner
473: \%x(\phi(x),\psi'(x),p',q') \urcorner$.
474: We consider three cases:
475: \begin{enumerate}
476: \item  $K$ entails $\ulcorner\forall x
477: (\psi(x) \supset \psi'(x)) \urcorner$.
478: There are three subcases
479: according to the relation among $p$, $p'$, $q$, $q'$:
480: \begin{enumerate}
481: \item ($p \leq p'$ and $q \leq q'$) or ($p' \leq p$ and $q' \leq q$)\\
482: $\default{\psi(x)}{\phi(x)}{\phi(x)}$ and $\default{\psi'(x)}{\neg
483: \psi(x),\phi(x)}{\phi(x)}$ are candidate defaults.
484: \item $p \leq p'$ and $q' \leq q$\\
485: $\default{\psi'(x)}{\phi(x)}{\phi(x)}$ is the only candidate default,
486: since
487: the justification $\neg \psi'(x)$ 
488: of $\default{\psi(x)}{\phi(x),\neg \psi'(x)}{\phi(x)}$ is
489: inconsistent with the prerequisite $\psi(x)$ and $K$.
490: \item $p' \leq p$ and $q \leq q'$\\
491: $\default{\psi(x)}{\phi(x)}{\phi(x)}$ and $\default{\psi'(x)}{\neg
492: \psi(x),\phi(x)}{\phi(x)}$ are candidate defaults.
493: \end{enumerate}
494: \item $K$ entails $\ulcorner\forall x
495: (\psi'(x) \supset \psi(x)) \urcorner$.
496: This is symmetrical to case 1.
497: \item $K$ entails neither $\ulcorner\forall x
498: (\psi(x) \supset \psi'(x)) \urcorner$ nor  $\ulcorner\forall x
499: (\psi'(x) \supset \psi(x)) \urcorner$.
500: Again there are three subcases:
501: \begin{enumerate}
502: \item ($p \leq p'$ and $q \leq q'$) or ($p' \leq p$ and $q' \leq q$) \\
503: The candidate defaults are $\default{\psi(x)}{\phi(x),\neg \psi'(x)}{\phi(x)}$
504: and
505: $\default{\psi'(x)}{\neg \psi(x),\phi(x)}{\phi(x)}$.
506: \item $p \leq p'$ and $q' \leq q$\\
507: $\default{\psi(x)}{\phi(x),\neg \psi'(x)}{\phi(x)}$ and
508: $\default{\psi'(x)}{\phi(x)}{\phi(x)}$ are candidate default rules.
509: \item $p' \leq p$ and $q \leq q'$:
510: This is symmetrical to case 3(b).
511: \end{enumerate}
512: \end{enumerate}
513: 
514: Having generated a list of candidate default rules based on our background
515: knowledge $K$, we delete those rules derived from statistics whose lower
516: bound is less than $1 -\delta$.  The remainder is the set of
517: defaults $\Delta_K$.
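\par\medskip\noindent
As a sketch of how the recipe and the deletion step combine (the statistics
are illustrative assumptions of ours), let $\phi = F$, $\psi = P$, and
$\psi' = B$, and let $K$ contain ``$\forall x(P(x) \supset B(x))$'',
``$\%x(F(x),P(x),0,0.01)$'', and ``$\%x(F(x),B(x),0.90,0.98)$''.  Then
$p = 0$, $q = 0.01$, $p' = 0.90$, and $q' = 0.98$, so $p \leq p'$ and
$q \leq q'$, and case 1(a) yields the candidates
\[
\default{P(x)}{F(x)}{F(x)} \quad \mbox{and} \quad
\default{B(x)}{\neg P(x),F(x)}{F(x)} .
\]
With $\delta = 0.10$ the first candidate is deleted, since it is derived from
a statistic whose lower bound $0$ is less than $1 - \delta = 0.90$.  The
second is retained, since its lower bound $0.90$ is not less than
$1 - \delta$.  What remains is the flying default with the penguin exception
among its justifications.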
518: 
519: We have not taken account of relations among default conclusions that may be
520: entailed by $K$.  If $K$ contains $\ulcorner \forall x
521: (\phi(x) \equiv \phi'(x)) \urcorner$ then the default conclusion $\ulcorner
522: \phi(a) \rc$ behaves just like the default conclusion $\lc \phi'(a) \rc$.  If
523: $K$ contains $\lc \forall x(\phi(x) \supset \phi'(x)) \rc$, then since $\lc
524: \phi(x) \rc$ is equivalent to $\lc \phi(x) \wedge \phi'(x) \rc$ and $\lc
525: \phi'(x) \rc$ is equivalent to $\lc \phi(x) \vee \phi'(x) \rc$ we can make use
526: of the obvious entailment relations.
527: 
528: Soundness of a system of deductive logic requires that the conclusion of any 
529: inference be true in every model in which the premises are true.  Clearly
530: nonmonotonic inference should not be sound.  But there is a property that is
531: {\em like\/} soundness that applies to default inference.  It is the property
532: that the conclusion is false in at most a fraction $\delta$ of the models of the
533: premises $K \cup E$.
534: 
535: \begin{theorem}[Default Soundness]
536: For every set of observations $E$, if $d \in \Delta_K$ is applicable to $E$,
537: the proportion of models of $E \cup K$ in which the conclusion of $d$ is false
538: is at most $\delta$.
539: 
540: {\rm The proof of this theorem is provided by the soundness theorem for
541: evidential probability \cite[p. 241]{kyburg.teng}, since the rules for deriving
542: defaults are a subset of the rules for computing evidential probabilities.
543: $\Box$}
544: \end{theorem}
545: 
546: 
547: \section{Interactions within an Extension}
548: 
549: Having determined which default rules are justified with respect to the
550: background knowledge, the next step is to investigate the interaction between
551: default rules in generating an extension.
552: A default extension is a minimal deductively closed set that
553: contains the given facts and the consequents of all applicable default rules.
554: Given an evidence set, we need to determine
555: how to control the compound effects of multiple defaults in an extension.
556: 
557: Take, for example, a default version~\cite{Poole89} of
558: the probabilistic lottery paradox~\cite{Kyburg61}.  There are $n$ species
559: of birds, $S_1, \ldots, S_n$.
560: We can say that penguins are atypical in that they cannot fly;
561: hummingbirds are atypical in that they have very fine motor control;
562: parrots are atypical in that they can talk; and so on.
563: If we apply this train of thought to all $n$ species of birds,
564: there is no typical bird left, as for each species
565: there is always at least one aspect in which it is atypical.
566: A parallel scenario is formulated below.
567: 
568: \begin{example}
569: \label{ex:birds}
570: $K$ contains
571: \begin{quote}
572: $B(x) \equiv S_1(x) \vee \ldots \vee S_n(x)$\\
573: \mbox{\ } \hfill
574: [an exhaustive list of bird species]\\
575: $S_i(x) \supset \neg S_j(x)$, for all $j \neq i$\\
576: \mbox{\ } \hfill
577: [species are mutually exclusive]\\
578: $\%x(S_i(x), B(x), \epsilon_i, \delta_i), \mbox{for } 1 \leq i \leq n$\\
579: \mbox{\ } \hfill
580: [the proportion of each $S_i$ species of birds is ``small'']
581: \end{quote}
582: From $K$ we can derive $n$ $\delta^*$-valid default rules for $\Delta_K$:
583: \[d_i=\default{B(x)}{\neg S_i(x)}{\neg S_i(x)}, \mbox{for } 1 \leq i \leq n\]
584: where $\delta^*$ is  the maximum of $\delta_1, \ldots, \delta_n$.
585: 
586: Now consider the evidence set $E=\{B(a)\}$.
587: In the original formulation of default logic,
588: we would get $n$ extensions, each one containing one $S_i(a)$ and the
589: negations of all the other $S_j(a)$'s.
590: Thus, for each extension, we would conclude that $a$ is a particular species
591: of bird, which seems to be an overcommitment, considering we have
592: $\%x(S_i(x), B(x), \epsilon_i, \delta_i)$ in $K$.
593: \end{example}
594: Note that each of the $n$ default rules is $\delta^*$-valid when considered
595: individually, but in an extension the rules
596: interact to sanction a set of conclusions that when taken together
597: seems implausible according to our knowledge of model frequencies.
598: The definition of an extension dictates that we must keep applying
599: rules until all ``applicable'' ones are exhausted.  The
600: ``applicability'' condition is based on maximizing logical strength:
601: for $d=\default{\alpha}{\beta_1, \ldots, \beta_m}{\gamma}$,
602: as long as $\alpha$ is derivable, and the $\beta$'s are consistent with
603: the extension, we must apply $d$ and add $\gamma$ to the extension.
604: Thus, for each of the extensions above, we have to keep applying the rules
605: until we have drawn $n-1$ conclusions: $\neg S_j(a)$ for all $j \neq i$.
606: Then the consistency requirement blocks the last default rule $d_i$,
607: as $B(x) \supset S_1(x) \vee \ldots \vee S_n(x)$ together with
608: $\neg S_j(a)$ for all $j \neq i$ gives us $S_i(a)$, contradicting the
609: justification $\neg S_i(a)$ of $d_i$.
610: From $\%x(S_i(x), B(x), \epsilon_i, \delta_i)$, we know the proportion
611: of models in which $S_i(a)$ is true, and thus the proportion of models
612: satisfying this extension, given $E$, is at most $\delta_i$, a small ratio.
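\par\medskip\noindent
To put numbers on this (the values are illustrative assumptions of ours), let
$n = 100$ and $\epsilon_i = 0.005$, $\delta_i = 0.02$ for every $i$, so that
$\delta^* = 0.02$.  Each rule $d_i$ taken alone concludes ``$\neg S_i(a)$'',
which is false in at most $0.02$ of the models of $E \cup K$.  An extension,
however, contains ``$\neg S_j(a)$'' for all 99 values of $j \neq i$ and hence
``$S_i(a)$'', and
\[
\mbox{the proportion of models of } E \cup K \mbox{ in which } S_i(a)
\mbox{ is true} \; \leq \; \delta_i = 0.02 .
\]
Every individual step errs in at most $2\%$ of the models, yet the set of
conclusions as a whole holds in at most $2\%$ of them.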
613: 
614: \subsection{Sequential Thresholding}
615: The validity criteria for individual default rules can be extended to
616: extensions resulted from the application of a chain of default rules.
617: We can think of the task of regulating the compound effect of multiple
618: default rules
619: as adjusting the set of relevant models
620: by taking into account
621: the default conclusions of all previously applied rules in the chain of
622: reasoning.
623: 
624: 
625: One way to accomplish this is by {\em sequential
626: thresholding\/}~\cite{Teng97c}.  The applicability condition
627: of a default rule $\default{\alpha}{\beta_1, \ldots, \beta_m}{\gamma}$
628: in an extension can be modified to take into account the validity of the rule.
In addition to requiring that $\alpha$ is provable and that none of
$\neg\beta_1, \ldots, \neg\beta_m$ is
provable,
632: we require that the default rule
633: be ``above threshold'', that is, the proportion of relevant
634: models satisfying the consequent $\gamma$ be greater
635: than a threshold $1-\epsilon^*$.
636: 
637: The set of relevant models shrinks in a stepwise fashion.
638: We start out with all the models satisfying the background knowledge and
639: evidence we have.
640: As default rules are applied sequentially, the consequent of the
641: applied rule at each step is taken as true in all subsequent steps.
642: The relevant models at a particular step
643: are then those that are consistent with the given
644: facts and all the consequents of the rules applied in the previous steps.
645: A default rule, even if it is $\delta$-valid with respect to the background
646: knowledge, would be blocked from
647: application if it does not satisfy the thresholding criterion.
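
The stepwise construction can be sketched as follows.  The symbols
${\cal M}_k$, $\gamma_k$ and $\mu_k$ are shorthand introduced only for this
sketch; they are not notation taken from~\cite{Teng97c}.  Let ${\cal M}_0$ be
the set of models satisfying the background knowledge and the evidence, and
let $\gamma_k$ be the consequent of the $k$th rule applied.  Then
\[
{\cal M}_k = \{\, m \in {\cal M}_{k-1} : m \models \gamma_k \,\},
\qquad k \geq 1,
\]
and a rule with consequent $\gamma$ passes the thresholding test after $k$
rules have been applied only if
\[
\mu_k(\gamma) > 1 - \epsilon^*,
\]
where $\mu_k(\gamma)$ denotes the proportion of models in ${\cal M}_k$ that
satisfy $\gamma$.  Because ${\cal M}_k \subseteq {\cal M}_{k-1}$, a rule that
passes the test at one step may fail it at a later step.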
648: 
649: In~\cite{Teng97c}, the thresholding metric
650: is based on a simple probability measure of possible worlds.
651: We can easily extend this metric to employ the same measure as that used for
652: evaluating the $\delta$-validity of default rules.
653: 
654: \begin{example}
655: Reconsider Example~\ref{ex:birds}.
656: Let us take $\epsilon^* \geq \delta^*$.
657: We start out with the set ${\cal M}$ of all models satisfying $K$ and $E$.
658: From $\%(S_1(x), B(x), \epsilon_1, \delta_1)$ we know that $d_1$ is
659: above threshold, and it satisfies the other conditions for applicability.
660: Therefore we apply $d_1$ and conclude $\neg S_1(a)$.
661: 
662: Now consider $d_2$.
663: The set ${\cal M}'$ of relevant models
664: at this point is a subset of ${\cal M}$; it contains only those
665: models in ${\cal M}$ that satisfy our new conclusion $\neg S_1(a)$ as well.
666: We have eliminated the models in which
667: $S_1(a)$ is true.  Since $S_1(x) \supset \neg S_2(x)$,
668: and $S_1(x) \supset B(x)$,
669: all the models eliminated satisfy $B(a)$,
670: and none satisfies $S_2(a)$.
671: Thus, in ${\cal M}'$, the number of models satisfying $S_2(a)$
672: is the same as in ${\cal M}$.  However, the number of models
673: satisfying $B(a)$ is lower in ${\cal M}'$ as a result of
674: the addition of $\neg S_1(a)$.  This gives rise to a higher proportion
675: of models satisfying $S_2(a)$ in ${\cal M}'$ ($\delta_2'$)
676: than in ${\cal M}$ ($\delta_2$).
677: If $\delta_2' \leq \epsilon^*$, $d_2$ is still above threshold
678: after the application of $d_1$, and we can apply it to obtain $\neg S_2(a)$.
679: Otherwise, $d_2$ is below threshold, and we cannot apply it even though
680: it was above threshold before the application of $d_1$.
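
To make the shift explicit, suppose the proportions of models in ${\cal M}$
satisfying $S_1(a)$ and $S_2(a)$ are exactly $\delta_1$ and $\delta_2$
(an assumption made for illustration; $K$ only states interval bounds).
Since no model satisfies both $S_1(a)$ and $S_2(a)$, removing the models of
$S_1(a)$ rescales the remaining proportion:
\[
\delta_2' = \frac{\delta_2}{1-\delta_1} > \delta_2
\quad \mbox{whenever } 0 < \delta_1 < 1 \mbox{ and } \delta_2 > 0 .
\]
For instance, with the hypothetical values $\delta_1 = \delta_2 = 0.2$ we
get $\delta_2' = 0.2/0.8 = 0.25$.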
681: 
After each step of applying a rule, the set of relevant models
shrinks, and the proportion of models satisfying $S_i(a)$, for any
unapplied rule $d_i$,
increases.  After a number of steps, all the remaining rules would be
685: below threshold, and we thus obtain an extension containing only
686: a portion of the conclusions that would otherwise be present in the
687: non-thresholding version of the extension.
688: \end{example}
Note that the size of $\epsilon^*$
determines how much risk is tolerated in an extension.  The higher
$\epsilon^*$ is, the more rules can be applied
and the longer they can stay above threshold.
693: Reiter's non-thresholding version corresponds to the case when $\epsilon^*=1$;
694: that is, every rule whose associated proportion is above 0 is allowed,
695: and logical consistency alone determines the rule's admissibility.
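
A small numerical sketch may help to compare the two regimes.  The figures
are hypothetical: we assume $n=5$ species in Example~\ref{ex:birds}, each
satisfied in exactly $1/5$ of the models of $K$ and $E$.  After $k$ rules
have been applied, each remaining $S_j(a)$ holds in $1/(5-k)$ of the relevant
models, so the consequent $\neg S_j(a)$ of any unapplied rule holds in the
proportion shown below:
\[
\begin{array}{c|ccccc}
k & 0 & 1 & 2 & 3 & 4 \\ \hline
1 - \frac{1}{5-k} & 0.80 & 0.75 & 0.67 & 0.50 & 0
\end{array}
\]
With $\epsilon^* = 0.3$ the threshold is $1-\epsilon^* = 0.7$: rules are
applied at $k=0$ and $k=1$, and every remaining rule is blocked from $k=2$
onwards.  The resulting extension contains two negative conclusions and is
satisfied in $1 - 2/5 = 0.6$ of the original models.  With $\epsilon^* = 1$
the threshold is $0$: rules are applied at $k=0,\ldots,3$ and only the last
rule is blocked, at $k=4$, which yields an extension satisfied in just $0.2$
of the original models.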
696: 
697: 
698: \section{Concluding Remarks}
699: 
700: We have developed a notion of validity for default inference based on
701: model proportions.  A rule is $\delta$-valid if the proportion of models
702: in which the consequent of the rule is satisfied is greater than $1-\delta$
703: in the relevant models picked out by the background knowledge, the evidence,
704: and the
705: applicability conditions of the default rule.
706: Given a body of background knowledge $K$, we can systematically generate
707: candidate default rules and determine which ones are $\delta$-valid
708: based on the statistical facts known in $K$.
709: Conflicts between default rules stemming from multiple inheritance
710: are resolved as a consequence of the validation process.
711: The result is a set of $\delta$-valid default rules which are
712: ``pre-compiled'' for a given background knowledge base, and can be
713: reused for different evidence sets without change.
714: 
715: This idea of evaluating the validity of a default rule using model
proportions is extended to extensions generated by
a combination of rules.  The compound effect is
718: regulated by a sequential thresholding process,
719: which blocks the rules whose associated model proportions with respect
720: to the ``current'' (shrinking) set of models fall below
721: a particular comfort threshold.  This allows us to use a more reasonable
``closure condition'' for extensions than the usual maximal logical strength:
723: we can refrain from applying rules that would make the extension
724: satisfiable in only a small set of models, even if the consequent of
725: the rule is logically consistent with the extension.
726: 
727: Grounding the justification of default rules in model proportions provides
728: a way to validate the rules empirically, and is a first step towards
729: automating the learning of default rules from (statistical) data.
730: One might ask why we need the default rules when we can reason with
731: the statistics directly.  Default rules provide a succinct and more
732: understandable characterization of the import of the data, as well as
733: a smooth articulation of the  information that
734: may exist in the knowledge base.
735: 
736: 
737: 
738: \section*{Acknowledgement}
This work was supported by National Science Foundation grants STS-9906128
and IIS-0082928, and by NASA grant NCC2-1239.
741: 
742: 
743: \begin{thebibliography}{}
744: 
745: \bibitem[\protect\citeauthoryear{Adams}{1966}]{adams.1966}
746: Ernest~W. Adams.
747: \newblock Probability and the logic of conditionals.
748: \newblock In Jaakko Hintikka and Patrick Suppes, editors, {\em Aspects of
749:   Inductive Logic}, pages 265--316. North Holland, Amsterdam, 1966.
750: 
751: \bibitem[\protect\citeauthoryear{Adams}{1975}]{adams}
752: Ernest~W. Adams.
753: \newblock {\em The Logic of Conditionals}.
754: \newblock Reidel, Dordrecht, 1975.
755: 
756: \bibitem[\protect\citeauthoryear{Bacchus \bgroup \em et al.\egroup
757:   }{1993}]{BacchusGHK93}
Fahiem Bacchus, Adam~J. Grove, Joseph~Y. Halpern, and Daphne Koller.
759: \newblock Statistical foundations for default reasoning.
760: \newblock In {\em Proceedings of the Thirteenth International Joint Conference
761:   on Artificial Intelligence}, pages 563--569, 1993.
762: 
763: \bibitem[\protect\citeauthoryear{Brewka}{1989}]{Brewka89}
764: Gerhard Brewka.
765: \newblock Preferred subtheories---an extended logical framework for default
766:   reasoning.
767: \newblock In {\em Proceedings of the Eleventh International Joint Conference on
768:   Artificial Intelligence}, 1989.
769: 
770: \bibitem[\protect\citeauthoryear{Brewka}{1994}]{Brewka94}
771: Gerhard Brewka.
772: \newblock Reasoning about priorities in default logic.
773: \newblock In {\em Proceedings of the Twelfth National Conference on Artificial
774:   Intelligence}, pages 940--945, 1994.
775: 
776: \bibitem[\protect\citeauthoryear{Dewey}{1938}]{dewey}
777: John Dewey.
778: \newblock {\em Logic: the Theory of Inquiry}.
779: \newblock Henry Holt, 1938.
780: 
781: \bibitem[\protect\citeauthoryear{Dummett}{1978}]{dummett.justification}
782: Michael Dummett.
783: \newblock The justification of deduction.
784: \newblock In {\em Truth and Other Enigmas}, pages 290--318. Duckworth, London,
785:   1978.
786: 
787: \bibitem[\protect\citeauthoryear{Goldman}{1979}]{goldman}
788: Alvin~I. Goldman.
789: \newblock What is justified belief?
790: \newblock In George~S. Pappas, editor, {\em Justification and Knowledge}, pages
791:   1--24. D. Reidel, Dordrecht, 1979.
792: 
793: \bibitem[\protect\citeauthoryear{Haack}{1976}]{haack.justification}
794: Susan Haack.
795: \newblock The justification of deduction.
796: \newblock {\em Mind}, 85:112--119, 1976.
797: 
798: \bibitem[\protect\citeauthoryear{Horty \bgroup \em et al.\egroup
799:   }{1987}]{HortyTT87}
800: J.~F. Horty, D.~S. Touretzky, and R.~H. Thomason.
801: \newblock A clash of intuitions: The current state of nonmonotonic multiple
802:   inheritance systems.
803: \newblock In {\em Proceedings of the Tenth International Joint Conference on
804:   Artificial Intelligence}, pages 476--482, 1987.
805: 
806: \bibitem[\protect\citeauthoryear{James}{1948}]{james}
807: William James.
808: \newblock {\em Essays in Pragmatism}.
809: \newblock Hafner, New York, 1948.
810: 
811: \bibitem[\protect\citeauthoryear{Kyburg and Teng}{2001}]{kyburg.teng}
812: Henry~E. Kyburg, Jr. and Choh~Man Teng.
813: \newblock {\em Uncertain Inference}.
814: \newblock Cambridge University Press, New York, 2001.
815: 
816: \bibitem[\protect\citeauthoryear{Kyburg}{1958}]{kyburg.justification}
Henry~E. Kyburg, Jr.
818: \newblock The justification of deduction.
819: \newblock {\em Review of Metaphysics}, 12:19--25, 1958.
820: 
821: \bibitem[\protect\citeauthoryear{Kyburg}{1961}]{Kyburg61}
822: Henry~E. Kyburg, Jr.
823: \newblock {\em Probability and the Logic of Rational Belief}.
824: \newblock Wesleyan University Press, 1961.
825: 
826: \bibitem[\protect\citeauthoryear{Kyburg}{2001}]{kyburg.2001}
827: Henry~E. Kyburg, Jr.
828: \newblock Real logic is nonmonotonic.
829: \newblock {\em Minds and Machines}, 11:577--595, 2001.
830: 
831: \bibitem[\protect\citeauthoryear{Levi}{1967}]{levi.gambling}
832: Isaac Levi.
833: \newblock {\em Gambling with Truth}.
834: \newblock Knopf, New York, 1967.
835: 
836: \bibitem[\protect\citeauthoryear{Levi}{1980}]{levi.enterprise}
837: Isaac Levi.
\newblock {\em The Enterprise of Knowledge}.
839: \newblock MIT Press, Cambridge, 1980.
840: 
841: \bibitem[\protect\citeauthoryear{Levi}{1996}]{levi.argument}
842: Isaac Levi.
843: \newblock {\em For the Sake of the Argument}.
844: \newblock Cambridge University Press, Cambridge, 1996.
845: 
846: \bibitem[\protect\citeauthoryear{{\L}ukaszewicz}{1988}]{Lukaszewicz88}
847: Witold {\L}ukaszewicz.
848: \newblock Considerations on default logic: An alternative approach.
849: \newblock {\em Computational Intelligence}, 4(1):1--16, 1988.
850: 
851: \bibitem[\protect\citeauthoryear{Moore}{1985}]{Moore85}
852: Robert~C. Moore.
853: \newblock Semantical considerations on nonmonotonic logic.
854: \newblock {\em Artificial Intelligence}, 25:75--94, 1985.
855: 
856: \bibitem[\protect\citeauthoryear{Morgan}{1998}]{morgan}
857: Charles Morgan.
858: \newblock Non-monotonic logic is impossible.
859: \newblock {\em Canadian Artificial Intelligence Magazine}, 42:18--25, 1998.
860: 
861: \bibitem[\protect\citeauthoryear{Neufeld \bgroup \em et al.\egroup
862:   }{1990}]{NeufeldPA90}
863: Eric Neufeld, David Poole, and Romas Aleliunas.
864: \newblock Probabilistic semantics and defaults.
865: \newblock In {\em Uncertainty in Artificial Intelligence}, volume~4, pages
866:   121--131. North-Holland, 1990.
867: 
868: \bibitem[\protect\citeauthoryear{Pearl and Geffner}{1990}]{pearl.geffner}
869: Judea Pearl and Hector Geffner.
870: \newblock A framework for reasoning with defaults.
871: \newblock In Henry~E. Kyburg, Jr., Ronald~P. Loui, and Greg~N. Carlson,
872:   editors, {\em Knowledge Representation and Defeasible Reasoning}, pages
873:   69--87. Kluwer, 1990.
874: 
875: \bibitem[\protect\citeauthoryear{Pearl}{1988}]{Pearl88}
876: Judea Pearl.
877: \newblock {\em Probabilistic Reasoning in Intelligent Systems}.
878: \newblock Morgan Kaufmann, 1988.
879: 
880: \bibitem[\protect\citeauthoryear{Pearl}{1990}]{Pearl90}
881: Judea Pearl.
882: \newblock System {Z}: A natural ordering of defaults with tractable
883:   applications to default reasoning.
884: \newblock In {\em Theoretical Aspects of Reasoning about Knowledge}, pages
885:   121--135, 1990.
886: 
887: \bibitem[\protect\citeauthoryear{Peirce}{1903}]{peirce}
888: Charles~S. Peirce.
889: \newblock The fixation of belief.
\newblock In J.~Buchler, editor, {\em The Philosophy of Peirce}, pages 5--22.
891:   Harcourt Brace and Company, New York, 1903.
892: 
893: \bibitem[\protect\citeauthoryear{Poole}{1989}]{Poole89}
894: David Poole.
895: \newblock What the lottery paradox tells us about default reasoning.
896: \newblock In {\em Proceedings of the First International Conference on
897:   Principles of Knowledge Representation and Reasoning}, pages 333--340, 1989.
898: 
899: \bibitem[\protect\citeauthoryear{Priest}{1989}]{priest}
900: Graham Priest.
901: \newblock Reasoning about truth.
902: \newblock {\em Artificial Intelligence}, 39:231--244, 1989.
903: 
904: \bibitem[\protect\citeauthoryear{Reiter and Criscuolo}{1981}]{ReiterC81}
905: Raymond Reiter and Giovanni Criscuolo.
906: \newblock On interacting defaults.
907: \newblock In {\em Proceedings of the Seventh International Joint Conference on
908:   Artificial Intelligence}, pages 270--276, 1981.
909: 
910: \bibitem[\protect\citeauthoryear{Reiter}{1980}]{Reiter80}
911: R.~Reiter.
912: \newblock A logic for default reasoning.
913: \newblock {\em Artificial Intelligence}, 13:81--132, 1980.
914: 
915: \bibitem[\protect\citeauthoryear{Teng}{1997}]{Teng97c}
916: Choh~Man Teng.
917: \newblock Sequential thresholds: Context sensitive default extensions.
\newblock In {\em Proceedings of the Thirteenth Conference on Uncertainty in
  Artificial Intelligence}, pages 437--444, 1997.
920: 
921: \bibitem[\protect\citeauthoryear{Touretzky}{1984}]{Touretzky84}
922: David~S. Touretzky.
923: \newblock Implicit ordering of defaults in inheritance systems.
924: \newblock In {\em Proceedings of the Fifth National Conference on Artificial
925:   Intelligence}, pages 322--325, 1984.
926: 
927: \end{thebibliography}
928: 
929: \end{document}
930: 
931: