1: \documentclass{statsoc}
2:
3: \usepackage{amssymb}
4: \renewcommand\theenumi{\arabic{enumi}}
5: \title[Marginalization paradox]{The marginalization paradox does not \\
6: imply inconsistency for improper priors}
7:
8: \author[T. C. Wallstrom]{Timothy C. Wallstrom}
9:
10: \coaddress{Timothy C.\ Wallstrom,
11: Theoretical Division, MS-B213, Los Alamos National
12: Laboratory, Los Alamos, New Mexico, 87545, USA}
13:
14: \email{tcw@lanl.gov}
15:
16: \address{Los Alamos National Laboratory, Los Alamos, New Mexico, USA}
17:
18:
19: \begin{document}
20:
21: \begin{abstract}
22: \keywords{Marginalization paradox; improper prior; reduction
23: principle; noninformative prior}
24:
25: The marginalization paradox involves a disagreement between two
26: Bayesians who use two different procedures for calculating a
27: posterior in the presence of an improper prior. We show that the
28: argument used to justify the procedure of one of the Bayesians is
29: inapplicable. There is therefore no reason to expect agreement, no
30: paradox, and no evidence that improper priors are inherently
31: inconsistent. We show further that the procedure in question can be
32: interpreted as the cancellation of infinities in the formal
33: posterior. We suggest that the implicit use of this formal
34: procedure is the source of the observed disagreement.
35:
36: \end{abstract}
37:
38: \section{Introduction.}
39: \label{sec:intro}
40:
41: An important question in statistics is whether Bayesian inference can
42: be extended to the setting of improper priors in a consistent and
43: intuitively viable manner. The use of improper priors was common
44: throughout much of the twentieth century, and appears to be a useful
45: idealization for many applications. In the 1970s, however, two
46: influential arguments appeared against the use of improper priors: the
47: ``marginalization paradox,'' and ``strong inconsistency.'' These
48: arguments appear to have convinced most statisticians that improper
49: priors must be abandoned.
50:
51: In this paper we discuss the marginalization paradox, due
52: to~\citet*{DSZ73}~(DSZ73). Let $p(x|\theta)$ be a normalized sampling
53: distribution with parameter $\theta=(\eta, \zeta)$ and data $x=(y,z)$,
54: and let $p(\theta)$ be a prior, which may be improper, \textit{i.e.},
of infinite total mass. The marginalization paradox concerns
56: the problem of calculating $p(\zeta|z)$, under a certain set of
57: assumptions. A first Bayesian, $B_1$, eliminates $\eta$ and then $y$;
58: a second Bayesian, $B_2$, eliminates $y$ and then $\eta$. The details
59: of the procedures are given in DSZ73. It is claimed that these
60: procedures rely only on principles that would have to hold in any
61: intuitively viable theory of inference. If $p(\theta)$ is improper,
62: however, $B_1$ and $B_2$ generally get incompatible answers. It has
63: been widely inferred that any extension of Bayesian inference to the
64: context of improper priors will be inconsistent.
65:
66: The purpose of this paper is to show that the marginalization paradox
67: does not imply that the use of improper priors will lead to
68: inconsistency. First, we show that the argument used to justify
69: $B_1$'s elimination of $y$ is invalid, because it is based on the
70: application of probabilistic intuitions to a formal quantity whose
71: probabilistic meaning has not been justified. The ``paradox'' is
72: thereby resolved, since we now have no reason to believe that $B_1$'s
73: answer is correct, and no reason to insist that the answers of $B_1$
74: and $B_2$ be compatible.
75:
Second, we analyze $B_1$'s procedure on its own terms, to get a better
sense of what is being assumed. The posterior $p(\zeta|z)$ is defined
78: as a ratio, which is only formal when the prior is improper because
79: there are infinities in the numerator and denominator. $B_1$'s
80: procedure is equivalent to the assumption that these infinities will
81: cancel. What DSZ73 have shown, therefore, is that there is no
82: consistent extension of Bayesian inference in which the cancellation
83: law, assumed implicitly by $B_1$, holds when the prior is improper.
84: But this is only to be expected: it is analogous to the well-known
85: fact that there is no consistent extension of arithmetic to the
86: extended real numbers in which the cancellation law holds for
87: infinity. The proposal that we abandon improper priors because of the
88: marginalization paradox is analogous to the proposal that we abandon
89: the use of infinity because it does not obey the laws of arithmetic.
90:
91: In brief, the inconsistency of the marginalization paradox is based on
92: an assumption that has not been justified intuitively and that is
93: unreasonable mathematically. There is nothing in the marginalization
94: paradox to preclude the existence of a formalism that justifies the
95: careful use of improper priors.
96:
97:
98: \section{The intuitive argument.}
99: \label{sec:ia}
100:
101: In this section we show that the validity of $B_1$'s argument has not
102: been established, because it is based on an intuitive probabilistic
103: argument, and the distribution to which it is applied has not been
104: shown to have a probabilistic meaning. In other words, we show that
105: DSZ73 have not made their case, because their argument contains a gap.
106:
107: In addition to the assumptions described in Section~\ref{sec:intro},
108: we assume the following:
109: \begin{enumerate}
110: \item The formal posterior, defined as
111: \begin{equation}
112: \label{eq:post}
113: p(\zeta|y,z) = {\int p(y,z|\eta,\zeta) \,p(\eta,\zeta)
114: \,d\eta\over \int p(y,z|\eta,\zeta) \,p(\eta,\zeta)
115: \,d\eta\,d\zeta},
116: \end{equation}
117: is independent of $y$. We denote the common value by $p_1(\zeta|z)$.
118: Note that the value of $p(\zeta|y,z)$ and the validity of the
119: assumption itself depend on the prior. \bigskip
120:
121: \item The marginalized sampling distribution,
122: \begin{displaymath}
123: p(z|\eta,\zeta) = \int p(y,z|\eta,\zeta)\,dy
124: \end{displaymath}
125: is independent of $\eta$. We denote the common value by
126: $p_2(z|\zeta)$. \bigskip
127:
128: \item For each value of $\zeta$, the prior is improper in $\eta$: $\int
129: p(\eta,\zeta) \, d\eta = \infty$.
130:
131: \end{enumerate}
Assumptions~1 and~2 enable $B_1$ and $B_2$, respectively, to invoke
133: intuitive arguments to determine $p(\zeta|z)$, even though the formal
134: calculations would lead to infinities. Assumption~3 is
135: satisfied by all of the examples in DSZ73, and reflects the fact
136: that we are really interested in impropriety in $\eta$.
137:
138: We focus on only one aspect of the analysis in DSZ73, because we
139: believe that aspect to be the source of all of the difficulties. The
140: aspect in question is $B_1$'s elimination of $y$, which occurs after
he has already marginalized over $\eta$. $B_1$ assumes that, since
$p(\zeta|y,z)$ is independent of $y$, $p(\zeta|z)$ must be equal
to the $y$-independent value of this function.
144:
145: The justification that DSZ73 give for this assumption is intuitive,
146: and has been formalized as the ``reduction principle,'' which is
147: stated as follows in~\cite*{DSZ96}: ``Suppose that a general method of
148: inference, applied to data $(y,z)$, leads to an answer that in fact
149: depends on $z$ alone. Then we should obtain the same answer if we
150: apply the method to $z$ alone.'' The principle enables one to
151: determine the answer to the problem with data $z$ from the answer to
152: the problem with data $(y,z)$, provided that the latter answer depends
153: only on $z$. We have no objection to this principle as stated. We
154: wish to emphasize, however, that in order to apply the principle (or
155: invoke the intuition behind the principle), we must first have the
156: ``answer'' to a problem of inference, given data $(y,z)$.
157:
158: The problem with $B_1$'s argument is that $p(\zeta|y,z)$ has not been
159: shown to be the ``answer'' to a problem of inference, so the reduction
160: principle is inapplicable. We show below that in the context of the
161: marginalization paradox, any sampling distribution $p(y,z|\zeta)$
162: associated with $p(\zeta|y,z)$ is necessarily \textit{improper}, so
163: that it has no inherent probabilistic meaning. There is no reason to
164: assume that the associated formal posterior will have any
165: probabilistic meaning, even if that posterior is proper. In the
166: absence of such a meaning, $p(\zeta|y,z)$ is not the answer to a
167: problem of inference, $B_1$ is unable to use the reduction principle
168: to complete his argument, and the inconsistency vanishes.
169:
170: We are not claiming that it is impossible to provide a meaning for an
171: improper distribution. Indeed, such an assumption would preclude the
172: use of improper priors and prejudge the whole issue. We are merely
173: observing that in order to use the reduction principle, a
174: probabilistic meaning must be provided for $p(y,z|\zeta)$, and this
175: has not been done. Even if a meaning is provided, any manipulations
176: of the distribution must be justified in terms of that meaning, and
177: there is no guarantee that the resulting procedures will be the formal
178: analogs of valid procedures for proper distributions.
179:
180: We now establish the impropriety of the sampling distribution.
181:
182: \noindent\textbf{Proposition:\ \ }
183: Let $p(\eta,\zeta)$ be given, and let $p(\eta,\zeta)= p(\eta|\zeta)\,
184: p(\zeta) $ be any factorization of $p(\eta,\zeta)$ such that
185: $0<p(\zeta)<\infty$. Under the above assumptions we have, for
186: each $\zeta$,
187: \begin{equation}
188: \int p(y,z|\zeta)\, dy = \infty.
189: \label{eq:result}
190: \end{equation}
191:
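\noindent\textit{Proof:\ \ }
Since $p(y,z|\zeta) = \int p(y,z|\eta,\zeta)\, p(\eta|\zeta)\,d\eta$
and $p(\eta|\zeta) = p(\eta,\zeta)/p(\zeta)$, we have
\begin{displaymath}
\int p(y,z|\zeta) \,dy = {1\over p(\zeta)} \int p(y,z|\eta,\zeta)\,
p(\eta,\zeta)\,d\eta
\,dy = {p_2(z|\zeta)\over p(\zeta)}
\int p(\eta,\zeta) \,d\eta = \infty.
\end{displaymath}
The second equality uses Assumption~2 after an interchange in the
order of integration, which is justified by Tonelli's theorem because
the integrand is nonnegative. The last equality uses Assumption~3 and
holds for any $z$ with $p_2(z|\zeta)>0$. $\square$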
201:
202: An immediate corollary is that $\int p(y,z|\zeta)\, dy\,dz = \infty$.
203: The factorization of $p(\eta,\zeta)$ is nonunique, and this implies a
204: nonuniqueness in the definition of $p(y,z|\zeta)$. The proposition
205: shows, however, that impropriety of the conditional distribution is
206: independent of the choice of factorization. Note also that although we
are evaluating $B_1$'s argument, the proof depends on Assumption~2,
208: which was made for $B_2$'s benefit.
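
As a concrete illustration of the proposition, consider the following
toy model, which is our own choice for illustration and is not one of
the examples in DSZ73. Let $y$ and $z$ be independent given
$(\eta,\zeta)$, with $y\sim N(\eta,1)$ and $z\sim N(\zeta,1)$, let
$\phi$ denote the standard normal density, and take the prior
$p(\eta,\zeta)=\pi(\zeta)$, flat in $\eta$, where $\pi$ is assumed to
be a proper density with $\pi(\zeta)>0$. Then
\begin{displaymath}
p(z|\eta,\zeta) = \int \phi(y-\eta)\,\phi(z-\zeta)\,dy = \phi(z-\zeta),
\qquad
\int p(\eta,\zeta)\,d\eta = \pi(\zeta)\int d\eta = \infty,
\end{displaymath}
so that Assumptions~2 and~3 hold. The numerator of
Eq.~(\ref{eq:post}) is $\int \phi(y-\eta)\,\phi(z-\zeta)\,\pi(\zeta)
\,d\eta = \phi(z-\zeta)\,\pi(\zeta)$, which does not depend on $y$, so
that Assumption~1 holds as well. With the factorization
$p(\eta|\zeta)=1$, $p(\zeta)=\pi(\zeta)$, we find
\begin{displaymath}
p(y,z|\zeta) = \int \phi(y-\eta)\,\phi(z-\zeta)\,d\eta = \phi(z-\zeta),
\qquad
\int p(y,z|\zeta)\,dy = \phi(z-\zeta)\int dy = \infty,
\end{displaymath}
in agreement with Eq.~(\ref{eq:result}). In this particular model
$B_2$ can reproduce $B_1$'s answer by using the prior $\pi(\zeta)$, so
no disagreement arises; the example is intended only to show how the
impropriety of the sampling distribution comes about.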
209:
210: \section{The formal argument.}
211: \label{sec:fa}
212:
213: We now consider $B_1$'s procedure on its own terms, as a formal
214: procedure. We find that in the case of a proper prior, $B_1$'s use of
215: the reduction principle is equivalent to the cancellation of a finite
216: factor in a ratio defining $p(\zeta|z)$, and in the case of an
217: improper prior, to the cancellation of an infinite factor. It is
well known that the formal cancellation of infinities will generally
219: lead to inconsistencies. We conclude that when viewed formally,
220: $B_1$'s procedure is highly suspect.
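
The standard arithmetic illustration, which we include as a reminder
and which is not part of the argument of DSZ73, runs as follows. In
the extended real numbers, $a\cdot\infty=\infty$ for every $a>0$, so
that
\begin{displaymath}
1\cdot\infty = \infty = 2\cdot\infty,
\end{displaymath}
and a cancellation law for $\infty$ would yield $1=2$. Equivalently,
if the formal ratio $(a\cdot\infty)/(b\cdot\infty)$ were assigned the
value $a/b$, then $\infty/\infty$ would have to equal $a/b$ for every
pair $a,b>0$.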
221:
222: In general, the posteriors of $\zeta$ given $(y,z)$ and given $z$ are
223: given formally by the following expressions:
224: \begin{eqnarray}
225: \label{eq:postx}
226: p(\zeta|y,z) &=& {p(y,z,\zeta)\over \int p(y,z,\zeta) \,d\zeta},
227: \qquad\qquad \mathrm{and} \\
228: \label{eq:postz}
229: p(\zeta|z) &=& {\int p(y,z,\zeta)\,dy\over \int p(y,z,\zeta)
230: \,dy\,d\zeta}.
231: \end{eqnarray}
232: Under Assumption~1, $p(\zeta|y,z)$ is independent of $y$. Then
233: \begin{equation}
234: \label{eq:fact}
235: p(y,z,\zeta) = p(y,z)\, p_1(\zeta|z),
236: \end{equation}
237: where $p(y,z) =\int p(y,z,\zeta) d\zeta$.
238: Substituting Eq.~(\ref{eq:fact}) into Eq.~(\ref{eq:postz}), we obtain
239: \begin{equation}
240: p(\zeta|z) = {\int p(y,z) \,p_1(\zeta|z)\, dy \over \int
241: p(y,z) \,p_1(\zeta|z) \,dy\,d\zeta}.
242: \end{equation}
When $\int p(y,z)\,dy$ is finite and nonzero, this common factor
cancels and $p(\zeta|z) = p_1(\zeta|z)$.
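
To make the cancellation explicit, write $I(z)=\int p(y,z)\,dy$; the
symbol $I(z)$ is our shorthand and does not appear in DSZ73. Since
$p_1(\zeta|z)$ does not depend on $y$ and $\int p_1(\zeta|z)\,d\zeta
=1$, the preceding equation reads
\begin{displaymath}
p(\zeta|z) = {I(z)\,p_1(\zeta|z) \over I(z)\int p_1(\zeta|z)\,d\zeta}
= {I(z)\,p_1(\zeta|z)\over I(z)},
\end{displaymath}
which reduces to $p_1(\zeta|z)$ whenever $0<I(z)<\infty$.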
244:
245: If we also make Assumptions~2 and~3, the proposition implies that
246: $\int p(y,z)\,dy = \infty$. The assumption that
247: $p(\zeta|z)=p_1(\zeta|z)$ is now equivalent, as claimed, to the
248: assumption that it is permissible to cancel infinite factors of
249: $\int p(y,z)\, dy$ from the ratio defining $p(\zeta|z)$.
250:
251: \section{Discussion.}
252:
253: We have observed that the inconsistencies uncovered in DSZ73 depend on
254: formal manipulation on the part of $B_1$. We have shown, in
255: Sections~\ref{sec:ia} and~\ref{sec:fa}, respectively, that $B_1$'s
256: procedure has not been justified intuitively, and is suspect
257: mathematically. We therefore see no reason to accept $B_1$'s
258: reasoning, or to regard the validity of this reasoning as necessary or
259: desirable in any extension of Bayesian inference to improper priors.
260: Once $B_1$'s reasoning is rejected, the marginalization paradox
261: disappears.
262:
263: The core of our argument is the observation that $B_1$'s argument is
264: formal because the sampling distribution $p(y,z|\zeta)$ is improper.
265: To the best of our knowledge, this observation has not been made
266: previously. The impropriety of the sampling distribution has perhaps
267: been obscured by its nonuniqueness and by the fact that the formal
268: posterior can be calculated from Eq.~(\ref{eq:post}) without ever
269: computing the sampling distribution explicitly.
270:
Previous analyses of the marginalization paradox have generally accepted
272: the validity of both Bayesians' arguments. The problem then becomes
273: one of understanding when and why the two Bayesians will agree. This
274: analysis was initiated in DSZ73, which is mostly dedicated to this
275: question. It turns out that for problems amenable to group analysis,
276: consistency may be achieved by a uniquely determined prior. The
277: priors determined by this constraint, however, are unsatisfactory for
278: a variety of reasons, which DSZ73 explore in detail. They conclude
279: that an acceptable theory is elusive or unachievable.
280:
281: The most persistent and insightful critic of the marginalization
282: paradox has been the late E.~T.~Jaynes.
See~\cite{Jaynes80a,DSZ80,Jaynes80b,DSZ96,Jaynes03} for his extended
debate with the authors of DSZ73. We believe that at the conceptual
285: level, Jaynes' critique was fundamentally correct, in that he
286: identified the source of the inconsistencies as the formal
287: manipulation of completed infinities. A particularly elegant
288: statement of this view can be found in~\cite{Jaynes03}. At the
289: technical level, Jaynes did not recognize that $B_1$'s argument was
290: invalid, so he was forced to try to determine how the two Bayesians
291: could be reconciled. His thesis was that the disagreement between the
292: Bayesians reflected differences in their prior information. In our
293: opinion, this analysis was not entirely successful, and the correct
294: approach is to reject $B_1$'s reasoning.
295:
296: For general background on the marginalization paradox and related
297: issues, we refer the reader to the excellent review article
298: of~\cite{KW96}.
299:
300: \section*{Acknowledgments} I thank Harry Martz, Brad Plohr, and
301: Arnold Zellner for helpful comments. I also acknowledge support from
302: the Department of Energy under contract W-7405-ENG-36.
303:
304: \begin{thebibliography}{7}
305: \expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi
306: \expandafter\ifx\csname url\endcsname\relax
307: \def\url#1{{\tt #1}}\fi
308:
309: \bibitem[Dawid et~al.(1973)Dawid, Stone, and Zidek]{DSZ73}
310: A.~P. Dawid, M.~Stone, and J.~V. Zidek.
311: \newblock Marginalization paradoxes in {B}ayesian and structural inference.
312: \newblock {\em J. R. Statist. Soc. B}, 35\penalty0 (2):\penalty0 189--233,
313: 1973.
314: \newblock With discussion. Available online at {\tt http://www.jstor.org}.
315:
316: \bibitem[Dawid et~al.(1980)Dawid, Stone, and Zidek]{DSZ80}
317: A.~P. Dawid, M.~Stone, and J.~V. Zidek.
318: \newblock Comments on {J}aynes's paper ``{M}arginalization and prior
319: probabilities''.
320: \newblock In A.~Zellner, editor, {\em Bayesian Analysis in Econometrics and
321: Statistics}, pages 79--82. North-Holland, Amsterdam, 1980.
322:
323: \bibitem[Dawid et~al.(1996)Dawid, Stone, and Zidek]{DSZ96}
324: A.~P. Dawid, M.~Stone, and J.~V. Zidek.
325: \newblock Critique of {E}.~{T}. {J}aynes's `{P}aradoxes of {P}robability
326: {T}heory', 1996.
327: \newblock Research Report 172, Department of Statistical Science, University
328: College London. Available online at {\tt
329: http://www.ucl.ac.uk/Stats/research/Resrprts/abstracts.html}.
330:
331: \bibitem[Jaynes(1980{\natexlab{a}})]{Jaynes80a}
332: Edwin~T. Jaynes.
333: \newblock Marginalization and prior probabilities.
334: \newblock In A.~Zellner, editor, {\em Bayesian Analysis in Econometrics and
335: Statistics}, pages 43--78. North-Holland, Amsterdam, 1980{\natexlab{a}}.
336: \newblock Reprinted in ``E. T. Jaynes: Papers on Probability, Statistics and
337: Statistical Physics,'' R. D. Rosenkrantz, ed., D. Reidel Publishing Company,
338: Dordrecht.
339:
340: \bibitem[Jaynes(1980{\natexlab{b}})]{Jaynes80b}
341: Edwin~T. Jaynes.
342: \newblock Reply to {D}awid, {S}tone, and {Z}idek.
343: \newblock In A.~Zellner, editor, {\em Bayesian Analysis in Econometrics and
344: Statistics}, pages 83--87. North-Holland, Amsterdam, 1980{\natexlab{b}}.
345:
346: \bibitem[Jaynes(2003)]{Jaynes03}
347: Edwin~T. Jaynes.
348: \newblock {\em Probability Theory: The Logic of Science}.
349: \newblock Cambridge University Press, 2003.
350:
351: \bibitem[Kass and Wasserman(1996)]{KW96}
352: Robert~E. Kass and Larry Wasserman.
353: \newblock The selection of prior distributions by formal rules.
\newblock {\em J. Am. Statist. Ass.}, 91\penalty0 (435):\penalty0 1343--1370, 1996.
355:
356: \end{thebibliography}
357:
358:
359:
360:
361: \end{document}
362:
363: