
\documentclass{statsoc}

\usepackage{amssymb}
\renewcommand\theenumi{\arabic{enumi}}
\title[Marginalization paradox]{The marginalization paradox does not \\
imply inconsistency for improper priors}

\author[T. C. Wallstrom]{Timothy C. Wallstrom}

\coaddress{Timothy C.\ Wallstrom,
Theoretical Division, MS-B213, Los Alamos National
Laboratory, Los Alamos, New Mexico, 87545, USA}

\email{tcw@lanl.gov}

\address{Los Alamos National Laboratory, Los Alamos, New Mexico, USA}


\begin{document}

\begin{abstract}
  \keywords{Marginalization paradox; improper prior; reduction
    principle; noninformative prior}

  The marginalization paradox involves a disagreement between two
  Bayesians who use two different procedures for calculating a
  posterior in the presence of an improper prior. We show that the
  argument used to justify the procedure of one of the Bayesians is
  inapplicable. There is therefore no reason to expect agreement, no
  paradox, and no evidence that improper priors are inherently
  inconsistent. We show further that the procedure in question can be
  interpreted as the cancellation of infinities in the formal
  posterior.  We suggest that the implicit use of this formal
  procedure is the source of the observed disagreement.

\end{abstract}

\section{Introduction.}
\label{sec:intro}

An important question in statistics is whether Bayesian inference can
be extended to the setting of improper priors in a consistent and
intuitively viable manner. The use of improper priors was common
throughout much of the twentieth century, and appears to be a useful
idealization for many applications. In the 1970s, however, two
influential arguments appeared against the use of improper priors: the
``marginalization paradox,'' and ``strong inconsistency.''  These
arguments appear to have convinced most statisticians that improper
priors must be abandoned.

In this paper we discuss the marginalization paradox, due
to~\citet*{DSZ73}~(DSZ73).  Let $p(x|\theta)$ be a normalized sampling
distribution with parameter $\theta=(\eta, \zeta)$ and data $x=(y,z)$,
and let $p(\theta)$ be a prior, which may be improper, \textit{i.e.},
of infinite total probability. The marginalization paradox concerns
the problem of calculating $p(\zeta|z)$, under a certain set of
assumptions. A first Bayesian, $B_1$, eliminates $\eta$ and then $y$;
a second Bayesian, $B_2$, eliminates $y$ and then $\eta$. The details
of the procedures are given in DSZ73. It is claimed that these
procedures rely only on principles that would have to hold in any
intuitively viable theory of inference.  If $p(\theta)$ is improper,
however, $B_1$ and $B_2$ generally get incompatible answers. It has
been widely inferred that any extension of Bayesian inference to the
context of improper priors will be inconsistent.

The purpose of this paper is to show that the marginalization paradox
does not imply that the use of improper priors will lead to
inconsistency. First, we show that the argument used to justify
$B_1$'s elimination of $y$ is invalid, because it is based on the
application of probabilistic intuitions to a formal quantity whose
probabilistic meaning has not been justified. The ``paradox'' is
thereby resolved, since we now have no reason to believe that $B_1$'s
answer is correct, and no reason to insist that the answers of $B_1$
and $B_2$ be compatible.

Second, we analyze $B_1$'s procedure on its own terms, to get a better
sense of what is being assumed. The posterior $p(\zeta|z)$ is defined
as a ratio, which is only formal when the prior is improper because
there are infinities in the numerator and denominator. $B_1$'s
procedure is equivalent to the assumption that these infinities will
cancel. What DSZ73 have shown, therefore, is that there is no
consistent extension of Bayesian inference in which the cancellation
law, assumed implicitly by $B_1$, holds when the prior is improper.
But this is only to be expected: it is analogous to the well-known
fact that there is no consistent extension of arithmetic to the
extended real numbers in which the cancellation law holds for
infinity. The proposal that we abandon improper priors because of the
marginalization paradox is analogous to the proposal that we abandon
the use of infinity because it does not obey the laws of arithmetic.
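
To make the arithmetic analogy concrete, here is a minimal illustration,
assuming only the usual convention that $a\cdot\infty=\infty$ for every
real $a>0$ in the extended real numbers:
\begin{displaymath}
  1\cdot\infty \;=\; \infty \;=\; 2\cdot\infty .
\end{displaymath}
If a cancellation law allowed us to divide both sides by $\infty$, we
would conclude that $1=2$. The standard remedy is to leave
$\infty/\infty$ undefined, not to give up the use of $\infty$.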

In brief, the inconsistency of the marginalization paradox is based on
an assumption that has not been justified intuitively and that is
unreasonable mathematically. There is nothing in the marginalization
paradox to preclude the existence of a formalism that justifies the
careful use of improper priors.


\section{The intuitive argument.}
\label{sec:ia}

In this section we show that the validity of $B_1$'s argument has not
been established, because it is based on an intuitive probabilistic
argument, and the distribution to which it is applied has not been
shown to have a probabilistic meaning.  In other words, we show that
DSZ73 have not made their case, because their argument contains a gap.

107: In addition to the assumptions described in Section~\ref{sec:intro},

108: we assume the following:

109: \begin{enumerate}

110: \item The formal posterior, defined as

111: \begin{equation}

112:     \label{eq:post}

113:     p(\zeta|y,z) = {\int p(y,z|\eta,\zeta) \,p(\eta,\zeta)

114:                  \,d\eta\over \int p(y,z|\eta,\zeta) \,p(\eta,\zeta)

115:                  \,d\eta\,d\zeta},

116: \end{equation}

117: is independent of $y$. We denote the common value by $p_1(\zeta|z)$.

118: Note that the value of $p(\zeta|y,z)$ and the validity of the

119: assumption itself depend on the prior. \bigskip

120:

121: \item  The marginalized sampling distribution,

122:    \begin{displaymath}

123:     p(z|\eta,\zeta) = \int p(y,z|\eta,\zeta)\,dy

124:   \end{displaymath}

125:   is independent of $\eta$. We denote the common value by

126:   $p_2(z|\zeta)$. \bigskip

127:

128: \item For each value of $\zeta$, the prior is improper in $\eta$: $\int

129:   p(\eta,\zeta) \, d\eta = \infty$.

130:

131: \end{enumerate}

132: Assumptions 1 and 2 enable $B_1$ and $B_2$, respectively, to invoke

133: intuitive arguments to determine $p(\zeta|z)$, even though the formal

134: calculations would lead to infinities. Assumption~3 is

135: satisfied by all of the examples in DSZ73, and reflects the fact

136: that we are really interested in impropriety in $\eta$.
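
The two routes can be summarized schematically as follows. This is only
a sketch in notation introduced here for illustration: the symbol
$\pi(\zeta)$, standing for whatever prior $B_2$ places on $\zeta$, does
not appear in the text above, and the actual procedures are those
described in DSZ73.
\begin{eqnarray*}
  B_1: \quad p(\zeta|z) &=& p_1(\zeta|z), \quad
     \mbox{the $y$-independent value of Eq.~(\ref{eq:post})}, \\
  B_2: \quad p(\zeta|z) &\propto& p_2(z|\zeta)\,\pi(\zeta).
\end{eqnarray*}
In this notation, the two answers are compatible only if
$p_1(\zeta|z)$, viewed as a function of $\zeta$, is proportional to
$p_2(z|\zeta)\,\pi(\zeta)$ for some choice of $\pi$.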

We focus on only one aspect of the analysis in DSZ73, because we
believe that aspect to be the source of all of the difficulties. The
aspect in question is $B_1$'s elimination of $y$, which occurs after
he has already marginalized over $\eta$.  $B_1$ assumes that since
$p(\zeta|y,z)$ is independent of $y$, then $p(\zeta|z)$ must be equal
to the $y$-independent value of this function.

The justification that DSZ73 give for this assumption is intuitive,
and has been formalized as the ``reduction principle,'' which is
stated as follows in~\cite*{DSZ96}: ``Suppose that a general method of
inference, applied to data $(y,z)$, leads to an answer that in fact
depends on $z$ alone.  Then we should obtain the same answer if we
apply the method to $z$ alone.'' The principle enables one to
determine the answer to the problem with data $z$ from the answer to
the problem with data $(y,z)$, provided that the latter answer depends
only on $z$.  We have no objection to this principle as stated.  We
wish to emphasize, however, that in order to apply the principle (or
invoke the intuition behind the principle), we must first have the
``answer'' to a problem of inference, given data $(y,z)$.

The problem with $B_1$'s argument is that $p(\zeta|y,z)$ has not been
shown to be the ``answer'' to a problem of inference, so the reduction
principle is inapplicable. We show below that in the context of the
marginalization paradox, any sampling distribution $p(y,z|\zeta)$
associated with $p(\zeta|y,z)$ is necessarily \textit{improper}, so
that it has no inherent probabilistic meaning.  There is no reason to
assume that the associated formal posterior will have any
probabilistic meaning, even if that posterior is proper.  In the
absence of such a meaning, $p(\zeta|y,z)$ is not the answer to a
problem of inference, $B_1$ is unable to use the reduction principle
to complete his argument, and the inconsistency vanishes.

We are not claiming that it is impossible to provide a meaning for an
improper distribution. Indeed, such an assumption would preclude the
use of improper priors and prejudge the whole issue. We are merely
observing that in order to use the reduction principle, a
probabilistic meaning must be provided for $p(y,z|\zeta)$, and this
has not been done.  Even if a meaning is provided, any manipulations
of the distribution must be justified in terms of that meaning, and
there is no guarantee that the resulting procedures will be the formal
analogs of valid procedures for proper distributions.

We now establish the impropriety of the sampling distribution.

180: We now establish the impropriety of the sampling distribution.

181:

182: \noindent\textbf{Proposition:\ \ }

183: Let $p(\eta,\zeta)$ be given, and let $p(\eta,\zeta)= p(\eta|\zeta)\,

184: p(\zeta) $ be any factorization of $p(\eta,\zeta)$ such that

185: $0<p(\zeta)<\infty$. Under the above assumptions we have, for

186: each $\zeta$,

187: \begin{equation}

188:   \int p(y,z|\zeta)\, dy = \infty.

189:     \label{eq:result}

190: \end{equation}

191:

192: \noindent\textit{Proof:\ \ }

193: \begin{eqnarray*}

194:   \int p(y,z|\zeta) \,dy = {1\over p(\zeta)} \int p(y,z|\eta,\zeta)

195:     p(\eta,\zeta)\,d\eta

196:    \,dy = {p_2(z|\zeta)\over p(\zeta)}

197:    \int p(\eta,\zeta) \,d\eta =    \infty. \qquad

198: \end{eqnarray*}

199: The interchange in the order of integration is justified by

200: Tonelli's theorem. $\square$

201:

202: An immediate corollary is that $\int p(y,z|\zeta)\, dy\,dz = \infty$.

203: The factorization of $p(\eta,\zeta)$ is nonunique, and this implies a

204: nonuniqueness in the definition of $p(y,z|\zeta)$. The proposition

205: shows, however, that impropriety of the conditional distribution is

206: independent of the choice of factorization. Note also that although we

207: are evaluating $B_1$'s argument, the proof depends on assumption~(2),

208: which was made for $B_2$'s benefit.
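
As a sanity check on the proposition, consider a deliberately simple
example of our own. It is not one of the DSZ73 examples, and the
symbols $\phi$ and $\pi$ are introduced here only for illustration.
Let $y$ and $z$ be independent with $y\sim N(\eta,1)$ and
$z\sim N(\zeta,1)$, let $\phi$ denote the standard normal density, and
take $p(\eta,\zeta)=\pi(\zeta)$ with $0<\pi(\zeta)<\infty$, so that the
prior is flat in $\eta$ and Assumption~3 holds. Then
$p(y,z|\eta,\zeta)=\phi(y-\eta)\,\phi(z-\zeta)$, Assumption~2 holds
with $p_2(z|\zeta)=\phi(z-\zeta)$, and with the factorization
$p(\eta|\zeta)=1$, $p(\zeta)=\pi(\zeta)$ we obtain
\begin{displaymath}
  p(y,z|\zeta) = \int \phi(y-\eta)\,\phi(z-\zeta)\,d\eta
               = \phi(z-\zeta),
  \qquad
  \int p(y,z|\zeta)\,dy = \phi(z-\zeta)\int dy = \infty,
\end{displaymath}
in agreement with Eq.~(\ref{eq:result}). In this example the formal
posterior is proportional to $\phi(z-\zeta)\,\pi(\zeta)$, which is
independent of $y$ whenever its normalizing integral over $\zeta$ is
finite, so Assumption~1 holds as well. Since this posterior already has
the form $p_2(z|\zeta)\,\pi(\zeta)$, no disagreement arises here; the
example is meant to illustrate the impropriety of the sampling
distribution, not the paradox itself.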

\section{The formal argument.}
\label{sec:fa}

We now consider $B_1$'s procedure on its own terms, as a formal
procedure. We find that in the case of a proper prior, $B_1$'s use of
the reduction principle is equivalent to the cancellation of a finite
factor in a ratio defining $p(\zeta|z)$, and in the case of an
improper prior, to the cancellation of an infinite factor.  It is
well known that the formal cancellation of infinities will generally
lead to inconsistencies. We conclude that when viewed formally,
$B_1$'s procedure is highly suspect.

In general, the posteriors of $\zeta$ given $(y,z)$ and given $z$ are
given formally by the following expressions:
\begin{eqnarray}
  \label{eq:postx}
  p(\zeta|y,z) &=& {p(y,z,\zeta)\over \int p(y,z,\zeta) \,d\zeta},
    \qquad\qquad \mathrm{and} \\
  \label{eq:postz}
  p(\zeta|z) &=& {\int p(y,z,\zeta)\,dy\over \int p(y,z,\zeta)
  \,dy\,d\zeta}.
\end{eqnarray}
Under Assumption~1, $p(\zeta|y,z)$ is independent of $y$. Then
\begin{equation}
  \label{eq:fact}
  p(y,z,\zeta) = p(y,z)\, p_1(\zeta|z),
\end{equation}
where $p(y,z) =\int p(y,z,\zeta) \,d\zeta$.
Substituting Eq.~(\ref{eq:fact}) into Eq.~(\ref{eq:postz}), we obtain
\begin{equation}
  p(\zeta|z) = {\int p(y,z) \,p_1(\zeta|z)\, dy \over \int
         p(y,z) \,p_1(\zeta|z) \,dy\,d\zeta}.
\end{equation}
When $\int p(y,z) \,dy$ is finite, then $p(\zeta|z) = p_1(\zeta|z)$.
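
The step behind this last claim can be written out as follows; this is
a sketch in the notation already introduced. Since $p_1(\zeta|z)$ does
not depend on $y$, it factors out of the $y$-integrals:
\begin{displaymath}
  p(\zeta|z) = {p_1(\zeta|z) \int p(y,z)\,dy \over
        \int p(y,z)\,dy \, \int p_1(\zeta|z)\,d\zeta}
             = p_1(\zeta|z).
\end{displaymath}
Here we have used $\int p_1(\zeta|z)\,d\zeta = 1$, which follows from
the normalization in Eq.~(\ref{eq:postx}), and have cancelled the
common factor $\int p(y,z)\,dy$, which is legitimate when that factor
is finite and nonzero.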

If we also make Assumptions~2 and~3, the proposition implies that
$\int p(y,z)\,dy = \infty$. The assumption that
$p(\zeta|z)=p_1(\zeta|z)$ is now equivalent, as claimed, to the
assumption that it is permissible to cancel infinite factors of
$\int p(y,z)\, dy$ from the ratio defining $p(\zeta|z)$.

\section{Discussion.}

We have observed that the inconsistencies uncovered in DSZ73 depend on
formal manipulation on the part of $B_1$. We have shown, in
Sections~\ref{sec:ia} and~\ref{sec:fa}, respectively, that $B_1$'s
procedure has not been justified intuitively, and is suspect
mathematically. We therefore see no reason to accept $B_1$'s
reasoning, or to regard the validity of this reasoning as necessary or
desirable in any extension of Bayesian inference to improper priors.
Once $B_1$'s reasoning is rejected, the marginalization paradox
disappears.

The core of our argument is the observation that $B_1$'s argument is
formal because the sampling distribution $p(y,z|\zeta)$ is improper.
To the best of our knowledge, this observation has not been made
previously. The impropriety of the sampling distribution has perhaps
been obscured by its nonuniqueness and by the fact that the formal
posterior can be calculated from Eq.~(\ref{eq:post}) without ever
computing the sampling distribution explicitly.

Previous analyses of the marginalization paradox generally accepted
the validity of both Bayesians' arguments. The problem then becomes
one of understanding when and why the two Bayesians will agree. This
analysis was initiated in DSZ73, which is mostly dedicated to this
question.  It turns out that for problems amenable to group analysis,
consistency may be achieved by a uniquely determined prior.  The
priors determined by this constraint, however, are unsatisfactory for
a variety of reasons, which DSZ73 explore in detail.  They conclude
that an acceptable theory is elusive or unachievable.

The most persistent and insightful critic of the marginalization
paradox has been the late E.~T.~Jaynes.
See~\cite{Jaynes80a,DSZ80,Jaynes80b,DSZ96,Jaynes03} for his extended
debate with the authors of DSZ73.  We believe that at the conceptual
level, Jaynes' critique was fundamentally correct, in that he
identified the source of the inconsistencies as the formal
manipulation of completed infinities.  A particularly elegant
statement of this view can be found in~\cite{Jaynes03}.  At the
technical level, Jaynes did not recognize that $B_1$'s argument was
invalid, so he was forced to try to determine how the two Bayesians
could be reconciled. His thesis was that the disagreement between the
Bayesians reflected differences in their prior information. In our
opinion, this analysis was not entirely successful, and the correct
approach is to reject $B_1$'s reasoning.

For general background on the marginalization paradox and related
issues, we refer the reader to the excellent review article
of~\cite{KW96}.

\section*{Acknowledgments}  I thank Harry Martz, Brad Plohr, and
Arnold Zellner for helpful comments. I also acknowledge support from
the Department of Energy under contract W-7405-ENG-36.

\begin{thebibliography}{7}
\expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi
\expandafter\ifx\csname url\endcsname\relax
  \def\url#1{{\tt #1}}\fi

\bibitem[Dawid et~al.(1973)Dawid, Stone, and Zidek]{DSZ73}
A.~P. Dawid, M.~Stone, and J.~V. Zidek.
\newblock Marginalization paradoxes in {B}ayesian and structural inference.
\newblock {\em J. R. Statist. Soc. B}, 35\penalty0 (2):\penalty0 189--233,
  1973.
\newblock With discussion. Available online at {\tt http://www.jstor.org}.

\bibitem[Dawid et~al.(1980)Dawid, Stone, and Zidek]{DSZ80}
A.~P. Dawid, M.~Stone, and J.~V. Zidek.
\newblock Comments on {J}aynes's paper ``{M}arginalization and prior
  probabilities''.
\newblock In A.~Zellner, editor, {\em Bayesian Analysis in Econometrics and
  Statistics}, pages 79--82. North-Holland, Amsterdam, 1980.

\bibitem[Dawid et~al.(1996)Dawid, Stone, and Zidek]{DSZ96}
A.~P. Dawid, M.~Stone, and J.~V. Zidek.
\newblock Critique of {E}.~{T}. {J}aynes's `{P}aradoxes of {P}robability
  {T}heory', 1996.
\newblock Research Report 172, Department of Statistical Science, University
  College London. Available online at {\tt
  http://www.ucl.ac.uk/Stats/research/Resrprts/abstracts.html}.

\bibitem[Jaynes(1980{\natexlab{a}})]{Jaynes80a}
Edwin~T. Jaynes.
\newblock Marginalization and prior probabilities.
\newblock In A.~Zellner, editor, {\em Bayesian Analysis in Econometrics and
  Statistics}, pages 43--78. North-Holland, Amsterdam, 1980{\natexlab{a}}.
\newblock Reprinted in ``E. T. Jaynes: Papers on Probability, Statistics and
  Statistical Physics,'' R. D. Rosenkrantz, ed., D. Reidel Publishing Company,
  Dordrecht.

\bibitem[Jaynes(1980{\natexlab{b}})]{Jaynes80b}
Edwin~T. Jaynes.
\newblock Reply to {D}awid, {S}tone, and {Z}idek.
\newblock In A.~Zellner, editor, {\em Bayesian Analysis in Econometrics and
  Statistics}, pages 83--87. North-Holland, Amsterdam, 1980{\natexlab{b}}.

\bibitem[Jaynes(2003)]{Jaynes03}
Edwin~T. Jaynes.
\newblock {\em Probability Theory: The Logic of Science}.
\newblock Cambridge University Press, 2003.

\bibitem[Kass and Wasserman(1996)]{KW96}
Robert~E. Kass and Larry Wasserman.
\newblock The selection of prior distributions by formal rules.
\newblock {\em JASA}, 91\penalty0 (435):\penalty0 1343--1370, Sep 1996.

\end{thebibliography}

\end{document}