% hep-ph/0006356: BayesianAnalysis.tex

1: \documentclass{cernrep}

2: \usepackage{graphicx,here}

3:

4: \begin{document}

5:

6: \title{BAYESIAN ANALYSIS}

7:

8: \author{Harrison B. Prosper}

9:

10: \institute{Department of Physics, Florida  State University, Tallahassee,

11: Florida 32306, USA}

12:

13: %---------------------------------------------------------------------------

\def\etal{{\sl et al.}}                 %et al. - no preceding comma

15: \def\vs{{\sl vs.}}                      %vs.

16:

17: % A useful Journal macro

18: \def\Journal#1#2#3#4{{#1} {\bf #2} (#4) #3}

19: % Some useful journal names

20: \def\NCA{Nuovo Cimento}

21: \def\NIM{Nucl. Instrum. Methods}

22: \def\NIMA{{Nucl. Instrum. Methods} A}

23: \def\NPB{{Nucl. Phys.} B}

24: \def\PLB{{Phys. Lett.}  B}

25: \def\PRL{Phys. Rev. Lett.}

26: \def\APP{Astro. Part. Phys.}

27: \def\PRD{{Phys. Rev.} D}

28: \def\PRC{{Phys. Rev.} C}

29: \def\ZPC{{Z. Phys.} C}

30: \def\REM{Rev. Mod. Phys.}

31: %----------------------------------------------------

32: % Some useful commands

33: %----------------------------------------------------

34: \newcommand{\seqn}{\begin{equation}}

35: \newcommand{\eeqn}{\end{equation}}

36: \newcommand{\seqna}{\begin{eqnarray}}

37: \newcommand{\eeqna}{\end{eqnarray}}

38: \newcommand{\Eq}[1]{Eq.\ (\ref{eq:#1})}

39: \newcommand{\Eqs}[2]{Eqs.\ (\ref{eq:#1}) and (\ref{eq:#2})}

40: \newcommand{\lr}[1]{\left ( #1 \right )}

41: \newcommand{\lrb}[1]{\left [ #1 \right ]}

42: \newcommand{\bold}[1]{\mbox{\bf #1}}

43: \newcommand{\clist}[2]{(#1_{1},\ldots,#1_{#2})}

44: \newcommand{\comb}[2]{\pmatrix{#1\cr#2\cr}}

45: \newcommand{\R}{{\cal R}}

46: \newcommand{\N}{{\cal N}}

47: \newcommand{\C}{{\cal C}}

48: \newcommand{\Cl}{\mbox{$^{37}Cl$}}

49: \newcommand{\B}{\mbox{$^{8}B$}}

50: \newcommand{\Be}{\mbox{$^{7}Be$}}

51:

52: \newcommand{\prior}[1]{{\rm Prior}(#1|I)}

53: \newcommand{\like}[2]{{\rm Pr}(#1|#2,I)}

54: \newcommand{\post}[2]{{\rm Post}(#1|#2,I)}

55: \newcommand{\prob}[3]{{\rm #1}(#2|#3)}

56: \newcommand{\Enu}{E_{\nu}}

57: \newcommand{\ps}{p(\nu|\Enu)}

58: \newcommand{\psa}{p(\nu|\Enu,a)}

59: %---------------------------------------------------------------------------

60:

61: \date{\today}

62:

63: \maketitle

64:

65: \begin{abstract}

66: After making some general remarks, I consider two examples that

67: illustrate the use of Bayesian Probability Theory. The first is a

68: simple one, the physicist's favorite ``toy," that provides a forum

69: for a discussion of the key conceptual issue of Bayesian analysis:

70: the assignment of prior probabilities. The other example

71: illustrates the use of Bayesian ideas in the real world of

72: experimental physics.

73: \end{abstract}

74:

75: \section{INTRODUCTION}

76: \begin{quote}

77: ``We don't know all about the world to start with; our knowledge

78: by experience consists simply of a rather scattered lot of

79: sensations, and we cannot get any further without some {\em a

80: priori} postulates. My problem is to get these stated as clearly

81: as possible." \vspace{0.2cm}

82:

83: Sir Harold Jeffreys, in a letter to Sir Ronald Fisher dated 1

84: March, 1934.

85: \end{quote}

86:

Scientific inference has led to the surest knowledge we have; yet,
paradoxically, there is still disagreement about how to perform
it. The disagreement is both within and between camps, the
principal ones being frequentist and Bayesian. If pressed, the
majority of physicists would claim to belong to the frequentist
camp. In practice, we belong to both camps: we are frequentists
when we wish to appear ``objective," but Bayesian when to be
otherwise is either too hard, or makes no sense.
Until fairly recently, relatively few of us
have been party to the frequentist--Bayesian debate.
And society is all the better for it!
It is our pragmatism that has cut through the Gordian Knot and allowed
scientific progress.
However, we find ourselves performing
ever more complex inferences that, in some cases, have real-world
consequences, and we can no longer regard the debate as mere
philosophical musing.
Indeed, this workshop is a testimony to this loss of innocence.

107:

108: All parties appear, at least, to agree on one thing: probability

109: theory is a reasonable basis for a theory of inference. But notice

110: the use of the word ``reasonable." That word highlights the chief

111: cause of the disagreement: any theory of inference is inevitably

112: {\em subjective} in the following sense: what one person regards

113: as reasonable may be considered unreasonable by another and,

114: unlike scientific theories, we cannot appeal to Nature to decide

115: which of the many inference theories is best, nor which criteria

116: are to be used.

I used to think that
biased estimates were bad. But while some of us strive mightily to
create unbiased ones, others look on bewildered, wondering why on earth we
work so hard to achieve a characteristic they consider irrelevant.

121:

122: Physicists, quite properly, are deeply concerned about delivering

123: to the world objective results. Therefore, anything that openly

124: declares itself to be subjective is viewed with suspicion. Since

125: Neyman's theory of inference is billed as objective many of us

126: regard it as reasonable and the Bayesian theory as unfit for

127: scientific use.  However, when one scrutinizes the Neyman theory,

128: its ``objectivity" proves to be of a very peculiar sort, as I hope

129: to show. I then discuss the difficult issue of prior probabilities

130: by way of a simple model. In the last section, I describe a

131: realistic Bayesian analysis to illustrate a point: Bayesian

132: methods are not only fit for scientific use, they are precisely

133: what is needed to make maximal use of data.

134:

135:  But first here are some remarks about

136: probability.

137:

138: \subsection{What is Probability?}

139: Probability theory is a mathematical theory about abstractions

140: called {\em probabilities}. Therefore, to put this theory to work

141: we are obliged to {\em interpret} these abstractions. At least three

142: interpretations have been suggested:

143: \begin{itemize}

144: \item propensity (Popper)

145: \item degree of belief (Bayes, Laplace, Gauss, Jeffreys, de Finetti)

146: \item relative frequency (Venn, Fisher, Neyman, von Mises).

147: \end{itemize}

148: In parentheses I have given the names of a few of the proponents.

149: According to Karl Popper, an unbiased coin, when tossed, has a

150: propensity of 1/2 to land heads or tails. The 1/2 is claimed to be

151: a property of the coin. According to Laplace probability is a

152: measure of the degree of belief in a proposition: given that you

153: believe the coin to be unbiased your degree of belief in the

154: proposition ``the coin will land heads" is 1/2. Finally, according

155: to Venn if the coin is unbiased the relative frequency with which

156: heads appears in an infinite sequence of coin tosses is 1/2. Venn

157: seems to have the edge on the other two interpretations since it

158: is a matter of experience that a coin tossed repeatedly lands

159: heads about 1/2 the time as the number of tosses, that is, trials,

160: increases. Every physicist who performs repeated controlled

161: experiments, either real ones or virtual ones on a computer,

162: provides overwhelming evidence in support of Venn's

163: interpretation.

164:

165: So, which is it to be: degree of belief or relative frequency? The

166: answer, I believe, is both, which prompts another question: is one

interpretation more fundamental than the other and, if so, which?
The answer is yes: degree of belief. It is yes for two very
important reasons: one is practical, the other foundational. The

170: practical reason is that we use probability in a much broader

171: context than that to which the relative frequency interpretation

172: pertains. It has been amply demonstrated that we perform

173: inferential reasoning according to rules that are isomorphic to

174: those of probability theory. Any theory of inference that

175: dismisses the ``degree of belief" interpretation would be expected

176: to suffer a severely restricted domain of applicability relative

177: to the large domain in which probability is used in everyday life.

178:

The second reason is that the Venn limit---the convergence of the ratio
of the number of successes to the number of trials---cannot be proved without
appealing to the notion of degree of belief\cite{Jeffreys}.
The issue here is one of epistemology. Empirical evidence, even when
overwhelming, does not prove that a thing is true; only that it is
very likely, which is just another way of saying it is very probable. It is
easy to see why a mathematical proof, as commonly understood, cannot be
established. Consider a sequence of trials to test the Standard
Model. Suppose each trial to be a proton--antiproton collision
at the Tevatron. Each trial ends in success
(a top quark is created) or failure. Let $T$ be the number of
trials and $S$ the number of successes. Given the top quark mass, the
Standard Model predicts the probability $p$ of success.
The Standard Model, we note, is a quantum theory. Therefore,
the sequence of successes is strictly non-deterministic,
in a sense in which a coin toss and a pseudo-random number generator
are not.

201:

202: However, a necessary (but of course not sufficient) basis for a

203: mathematical proof of convergence of a sequence to a limit is the

existence of a rule that connects term $T+1$ {\em
deterministically} to term $T$. But for quantum theory it is believed

206: that no such rule exists. What can be and has been proved, by

207: several people starting with James Bernoulli, is this:

208: \begin{quote}

209: If the order of trials is unimportant (that is, the sequence of

210: trials is {\em exchangeable}), and if the {\em probability} of success

211: at each trial is the

212: same, then $S/T \rightarrow p$, as $T

213: \rightarrow \infty$ with {\em probability} one.

214: \end{quote}

215: At this point, I can adopt two attitudes regarding this theorem:

216: one is that clarity of thought is a virtue; the second is that

217: clarity of thought is nice but less important than pragmatism. As

218: a pragmatist I would say that this theorem proves that the Venn

219: limit exists. But in this case I prefer clarity. Let us,

220: therefore, be clear about what this theorem actually proves and

221: what it does not. Bernoulli's theorem does not prove that $S/T$

222: converges to $p$. Rather it is a statement about 1) the {\em

223: probability} that $S/T$ converges to $p$ as 2) the number of

224: trials increases without limit, provided that 3) the order of

225: trials does not matter and that 4) the {\em probability} at each

226: trial is the same. Lurking behind these four seemingly innocuous

227: statements are deep issues that are far beyond the scope of what I

228: wish to say in this paper. Let me just note that the word

229: ``probability" occurs twice in the statement of Bernoulli's

230: theorem. If we insist that all probabilities are relative

231: frequencies then we would have to interpret ``probability of

232: success at each trial" and ``probability one" as the ``limit with

233: probability one" of other exchangeable sequences in order to be

234: consistent. This leads into the abyss of an infinitely recursive

235: definition. Doubtless, von Mises was well aware of this

236: difficulty, which may be why he took the existence of the Venn

237: ``limit" as an axiom. However, even if one is prepared to accept

238: this axiom, I do not think it circumvents the epistemological

239: difficulty of defining a thing, probability, by making use of the

240: thing {\em twice} in its definition. As de Finetti\cite{deFinetti}

241: puts it

242: \begin{quote}

243: ``In order for the results concerning frequencies to make sense,

244: it is necessary that the concept of probability, and the concepts

245: deriving from it which appear in the statements and proofs of

246: these results, should have been defined and given meaning

247: beforehand. In particular, a result which depends on certain

248: events being uncorrelated, or having equal probabilities, does not

249: make sense unless one has defined in advance what one means by the

250: probabilities of the individual events."

251: \end{quote}

252: I agree.

253:

254: The alternative interpretation of probability is {\em degree of

255: belief}. Thus the probability $p$ is our

256: assessment of the probability of success at each trial, based on

257: our current state of knowledge. That state of knowledge

258: could be informed, for example, by the predictions of the Standard

259: Model. Bernoulli's theorem says that

260: if our assessment of the probability of success at each trial is

261: correct, and if our assessment does not change,

262: then it is reasonable to expect $S/T \rightarrow p$ as

263: $T \rightarrow \infty$.

264:

265: But what if our assessment, initially, is incorrect? This poses no

266: difficulty. As our state of knowledge changes, by virtue of data acquired,

267: our assessment of the probability of success changes accordingly.

268: Bayes' theorem shows

269: how the degree of belief of a coherent reasoner will be updated

270: to the point

271: where it closely matches the relative frequency $S/T$.
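
As a minimal illustration of this updating, here is a sketch under an
assumption not made in the text: that the initial state of knowledge
about $p$ is encoded by a prior uniform on $[0,1]$. Bayes' theorem
applied to $S$ successes in $T$ exchangeable trials then gives
\[
{\rm Post}(p|S,T,I) = \frac{(T+1)!}{S!\,(T-S)!}\, p^{S}(1-p)^{T-S}\, dp ,
\]
whose mean is
\[
\langle p \rangle = \frac{S+1}{T+2} .
\]
For $S=3$ and $T=10$ this gives $\langle p \rangle = 1/3$, whereas for
$S=300$ and $T=1000$ it gives $301/1002 \approx 0.300$: as data
accumulate, the degree of belief approaches the relative frequency
$S/T$ and the influence of the initial assignment fades.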

272:

273:

274: \subsection{Neyman's Theory}

275: Neyman rejected the Bayesian use of Bayes' theorem arguing that

276: the prior probability for a parameter ``has no meaning" when the

277: latter is an unknown constant. He further argued that even if the

278: parameters to be estimated could be considered as random variables

279: we usually do not know the prior probability. With the benefit of

hindsight, we can see that these arguments betray a confusion
about the notion of degree of belief. Jeffreys\cite{Jeffreys}

282: frequently lamented the failure of his contemporaries to really

283: understand what he was talking about. I would note that even

284: amongst this illustrious gathering the confusion persists. So let

285: me belabor a point: when one assigns a probability to a parameter

286: it is not because one deems it sensible to think of the parameter

287: as if it were a random variable---this is clearly nonsense if the

288: parameter is in fact a constant. The probability assignments

289: merely encode one's knowledge (or that of an idealized reasoner)

290: of the possible values of the parameter.

291:

292: In his classic paper of 1937\cite{Neyman}, Neyman introduced his

293: theory of confidence intervals, which he believed provided an

294: important element of an objective theory of inference. He not only

295: specified the property that confidence intervals had to satisfy

296: but he also gave a particular rule for constructing them, although

297: he left considerable freedom that can be creatively

298: exploited\cite{Feldman}. Neyman's theory is elegant and powerful.

299: Nonetheless, his theory is open to criticism. But in order to

300: raise objections we need to understand what Neyman said.

301:

302: Imagine an ensemble of trials, or experiments, $\{ E \}$ to each

303: of which we associate an interval

304: $[\underline{\theta}(E),\overline{\theta}(E)]$. The ensemble of

305: experiments yields an ensemble of intervals. Neyman required the

306: ensemble of confidence intervals to satisfy the following

307: condition:

308: \begin{quote}

309: For every possible {\em fixed} point $(\theta, \alpha)$ in the

310: parameter space of the problem, where $\theta$ is the parameter of interest

311: and $\alpha$ denotes all other parameters of the problem

312: \begin{equation}

313: {\rm Prob}\{ \theta \in [\underline{\theta}(E),\overline{\theta}(E)]

314: \} \geq \beta.

315: \end{equation}

316: \end{quote}

317: According to Neyman this probability is to be interpreted as a

318: relative frequency. Thus,

319: any set of intervals is an ensemble of {\em

320: confidence intervals} if the relative frequency with which

321: the intervals contain the

322: point $\theta$ is greater than or equal to $\beta$,

323: for every possible {\em fixed} point in the parameter space regardless of

324: its dimensionality.

Neyman's idea is intuitively clear: an interval picked at
{\em random} from such an ensemble, the proverbial urn of
sampling theory, will have at least a $100\beta$\% chance
of containing the fixed point $\theta$, whatever the
values of $\theta$ and $\alpha$.
This is a remarkable requirement. Here is an example.

331:

332: Suppose we wish to measure a cross section. Our inference problem

333: depends upon the following parameters: the cross section $\sigma$,

334: the efficiency $\epsilon$, the background $b$ and the integrated

335: luminosity $L$. Consider a {\em fixed} point $(\sigma, \epsilon,

336: b, L)$ in the parameter space. To this point we associate an

337: ensemble of confidence intervals, induced by an ensemble of

338: possible experimental results. Some of these intervals

339: $[\underline{\sigma}(E),\overline{\sigma}(E)]$ will contain

340: $\sigma$, others will not. The fraction of intervals, in the

341: ensemble, that contain $\sigma$ is called the {\em coverage

342: probability} of the ensemble of intervals. A coverage probability

343: is associated with every point $(\sigma, \epsilon, b, L)$ of the

344: parameter space. Moreover, the value of the coverage probability

345: may vary from point to point. Neyman's key idea is that the

346: ensembles of intervals should be constructed so that, over the

347: allowed parameter space, the coverage probability never falls

348: below some number $\beta$, called the confidence level. Both the

349: coverage probability and the confidence level are to be

350: interpreted as relative frequencies.
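
For a problem with a single parameter and no nuisance parameters, the
definition above can be written out explicitly. The following is only
a sketch, under assumptions introduced here for illustration: a
counting experiment with observed count $n$, Poisson mean $\mu$, and
some rule that assigns the interval
$[\underline{\mu}(n),\overline{\mu}(n)]$ to each count. The coverage
probability at the fixed point $\mu$ is then
\[
C(\mu) = \sum_{n=0}^{\infty} \frac{e^{-\mu}\mu^{n}}{n!}\,
\Theta(n,\mu), \qquad
\Theta(n,\mu) = \left\{ \begin{array}{ll}
1 & \mbox{if } \underline{\mu}(n) \leq \mu \leq \overline{\mu}(n), \\
0 & \mbox{otherwise,}
\end{array} \right.
\]
and Neyman's condition reads $C(\mu) \geq \beta$ for every allowed
$\mu$. Because $n$ is discrete, $C(\mu)$ generally jumps as $\mu$
varies, which is why the condition is an inequality rather than an
equality.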

351:

352:  The parameter space and its set of ensembles form what

353: mathematicians call a {\em fibre bundle}. The parameter space is

354: the base space to each point of which is attached a fibre, that

355: is, another space, here the ensemble of intervals associated with

356: that parameter point. Each fibre has a coverage probability, and

357: none falls below the confidence level $\beta$. Since the fibres

358: may vary in a non-trivial way from point to point it is not

359: possible, in general, to construct the fibre bundle as a simple

360: Cartesian product of the parameter space and a single ensemble of

361: intervals. In general, a non-trivial fibre bundle is the natural

mathematical description of Neyman's construction. Well, natural
if, like me, you like to think of things geometrically!

364:

365: There are two difficulties with Neyman's idea. The first is

366: technical. For one-dimensional problems, or for problems in which

367: we wish to set bounds on {\em all} parameters simultaneously, the

368: construction of confidence intervals is straightforward. But when

369: the parameter space is multi-dimensional and our interest is to

370: set limits on a single parameter no general algorithm is known for

371: constructing intervals. That is, no general algorithm is known for

372: eliminating nuisance parameters. In our example, we care only

373: about the cross-section; we have no interest in setting bounds on

374: the integrated luminosity. What we do, in practice, is to replace

375: the nuisance parameters with their maximum likelihood estimates.

376: The justification for this procedure is the following theorem:

\begin{equation}
-2\log
\frac{{\rm Pr}(x|\theta,\hat{\alpha})}{{\rm Pr}(x|\hat{\theta},\hat{\alpha})}
\rightarrow \chi^2, \label{eq:loglike}
\end{equation}

382: \begin{quote}

383: as the data sample $x$ grows without

384: limit, and provided that the maximum likelihood estimates

385: $\hat{\theta}$

386: and

387: $\hat{\alpha}$ lie within the parameter space minus its boundary.

388: \end{quote}

389: If our data sample is sufficiently large its likelihood becomes

390: effectively a (non-truncated) multi-variate Gaussian, and

391: consequently the distribution of the log-likelihood ratio is

392: $\chi^2$. Since that distribution is independent of the true

393: values of the parameters a probability statement about the

394: log-likelihood ratio can be re-stated as one about the parameter

395: $\theta$. But, and this is the crucial point, the theorem is

396: silent about what to do for small samples. Unfortunately, we high

397: energy physicists insist on looking for new things, so our data

398: samples are often small. So what are we, in fact, to do? We must

399: after all publish. Today, with our surfeit of computer time, we

400: can contemplate a brute-force approach: start with an approximate

401: set of intervals, computed using \Eq{loglike}, and adjust them

iteratively until they make Neyman happy. But because of the
second difficulty, which I now discuss, the effort seems hardly worth the
trouble.
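
As a concrete instance of \Eq{loglike}, consider the simplest case,
which is an assumption made here purely for illustration: a single
Poisson count $n>0$ with mean $\mu$ and no nuisance parameters, so
that $\hat{\mu}=n$. Then
\[
-2\log\frac{{\rm Pr}(n|\mu)}{{\rm Pr}(n|\hat{\mu})}
= 2\left[ \mu - n + n\log\frac{n}{\mu} \right] ,
\]
and an approximate 68.3\% interval is the set of $\mu$ for which this
quantity is at most $1$, the 68.3\% point of a $\chi^2$ distribution
with one degree of freedom. For small $n$ nothing guarantees that the
coverage of such an interval is close to 68.3\%, which is exactly the
gap the theorem leaves open.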

405:

406: The second difficulty is conceptual.

It has been argued at this workshop, and elsewhere\cite{Cousins},
that the set of published 95\%
intervals constitutes a bona fide ensemble of approximately 95\%
confidence intervals. Here is the argument. Each published interval
is

412: drawn from an urn (that is, an ensemble of experiments if you

413: prefer a more cheerful allusion) whose confidence level is 95\%.

414: The fact that each urn is completely different is irrelevant

415: provided that the sampling probability from each is the same,

416: namely 95\%. Thus 95\% of the set of published intervals will be

417: found to yield true statements. And herein lies the beauty of

418: coverage! The flaw in this argument is this: each published

419: interval is drawn from an urn that does not objectively exist,

420: because the ensemble into which an actual experiment is embedded

421: is a purely conceptual construct not open to empirical scrutiny.

422: Fisher\cite{Fisher}, not known for fawning over Bayesians, made a

423: similar point a long time ago:

424: \begin{quote}

425: ``.. if we possess a unique sample on which significance tests are

426: to be performed, there is always ... a multiplicity of populations

427: to each of which we can legitimately regard our sample as

428: belonging; so the phrase `repeated sampling' from the same

429: population does not enable us to determine which population is to

430: be used to define the probability level, for no one of them has

431: objective reality, all being products of the statistician's

432: imagination.''

433: \end{quote}

434: This is true of our ensemble of experiments.

435:  Consequently, a few

436: troublesome physicists, bent on giving the Particle Data Group a

437: hard time, need merely imagine a different set of urns from which

438: the published results could legitimately have been drawn and

439: thereby alter the confidence level of each result!

440:

441: Of course, the published intervals do have a coverage probability.

442: My claim is that its value is a matter to be decided by actual

443: inspection---provided, of course, we know the right answers! It is

444: not one that can be deduced {\em a priori} for the reason just

445: given. The fact that I am able to construct ensembles of

446: confidence intervals on my computer, by whatever procedure, and

447: verify that they satisfy Neyman's criterion is certainly

448: satisfying, but in no way does it prove anything empirically

449: verifiable about the interval I publish. Forgive me for flogging a

450: sincerely dead horse, but let me state this another way: Since I

451: do not repeat my experiment, any statement to the effect that the

452: virtual ensemble simulated on my computer mimics the potential

453: ensemble to which my published interval belongs is tantamount to

454: my claiming that if I were to repeat my experiment, then I would

455: do so such that the virtual and real ensembles matched. Maybe, or

456: maybe not!

457:

To summarize: a frequentist confidence level is a property of an
ensemble; therefore, its objectivity, or lack thereof, is on a par
with that of the ensemble that defines it.

461:

462: This whole discussion may strike you as a tad surreal, but I think

463: it goes to the heart of the matter: many physicists, for sensible

464: reasons, reject the Bayesian theory and embrace coverage because

465: it is widely viewed as objective. But as argued above confidence

466: levels may or may not be objective depending on the circumstances.

467: Therefore, when confronted with a difficult inference problem our

468: choice is not between an ``objective" and ``subjective" theory of

469: inference, but rather between two different subjective theories.

470: It may be reasonable to continue to insist upon coverage, but not

471: because it is objective.

472:

473: After this somewhat philosophical detour it is time to turn to the

474: real world. But en route to the real world, lest Bayesians begin

475: to feel uncontrollably smug, I'd like to discuss an instructive

476: ``toy" model that highlights the fact that for a Bayesian life is

477: hardly a bed of roses\cite{Wasserman}.

478:

479:

480: \section{THE PHYSICIST'S FAVOURITE TOY}

The typical high energy physics experiment consists of doing a
large number $T$ of similar things (for example, proton--antiproton
collisions) and searching for $n$ interesting
outcomes (for example, $t\bar{t}$ production). We invariably
assume that the order of the collisions is irrelevant and that
each interesting outcome occurs with equal probability. Then we
may avail ourselves of the well-known fact that the probability
assigned to $n$ outcomes out of $T$ trials, with our assumptions,
is binomial. Since $n \ll T$, this probability can be approximated
by a Poisson distribution

491: \begin{equation}

492: \like{n}{\mu} = \frac{e^{-\mu} \mu^{n}}{n!}, \label{eq:poisson}

493: \end{equation}

494: and thus do we arrive at the physicist's favourite toy. The symbol

495: $I$ denotes all prior information and assumptions that led us to

496: this probability assignment. Here, it is introduced for

497: pedagogical reasons; to remind us of the fact that {\em all}

498: probabilities are conditional. We shall assume that our aim is to

499: infer something about the Poisson parameter $\mu$, given that we

500: have observed $n$ events. Just for fun, we'll give this problem to

501: each workshop member. Naturally, being physicists, each of us

502: insists on parameterizing this problem as we see fit, but in the

503: end when we compare notes we shall do so in terms of the parameter

504: $\mu$,  by transforming to that parameter.
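
The Poisson limit invoked above can be made explicit. Writing the
per-trial probability as $\mu/T$ (notation introduced here only for
this sketch), the binomial probability is
\[
\frac{T!}{n!\,(T-n)!} \left(\frac{\mu}{T}\right)^{n}
\left(1-\frac{\mu}{T}\right)^{T-n}
= \frac{\mu^{n}}{n!}\,
\frac{T(T-1)\cdots(T-n+1)}{T^{n}}
\left(1-\frac{\mu}{T}\right)^{T-n} .
\]
As $T \rightarrow \infty$ with $\mu$ and $n$ held fixed, the middle
factor tends to $1$ and the last to $e^{-\mu}$, which yields
\Eq{poisson}.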

505:

506: There are, of course, infinitely many ways to parameterize a

507: likelihood function and the Poisson likelihood is no exception.

508: For simplicity, however, let's assume that each of us uses a

509: parameter $\mu_p$ related to $\mu$ as follows

510: \begin{equation}

511: \mu_p = \mu^p.

512: \label{eq:mup}

513: \end{equation}

514: ``$p$" for physicist if you like! In terms of the parameter

515: $\mu_p$ \Eq{poisson} becomes

516: \begin{equation}

517: \like{n}{\mu_p} = \frac{e^{-\mu_p^{1/p}} \mu_p^{n/p}}{n!},

518: \label{eq:poissonp}

519: \end{equation}

520: which, we note, does not alter the probability assigned to $n$.

521:

522: From Bayes' theorem

523: \begin{equation}

524: \post{\mu_p}{n} = \frac{\like{n}{\mu_p}

525: \prior{\mu_p}}{\int_{\mu_p} \like{n}{\mu_p} \prior{\mu_p}},

526:  \label{eq:defpostp}

527: \end{equation}

528: each of us can make inferences about our parameter $\mu_p$, and

529: hence $\mu$. Of course, no one can proceed without specifying a

530: prior probability $\prior{\mu_p}$. Unfortunately, being mere

531: physicists we do not know what its form should be. But since we

532: are all in the same state of knowledge regarding our parameter,

533: coherence would seem to demand that we use the same functional

534: form. So without a shred of motivation let's try the following

535: form for the prior probability

536: \begin{equation}

537: \prior{\mu_p} = \mu_p^{-q} d\mu_p. \label{eq:aprior}

538: \end{equation}

539: Although this prior is plucked out of thin air, it is actually

540: more general than it appears because, in principle, $q$ could be

541: an arbitrarily complicated function of $p$. Now each of us is in a

542: position to calculate, assuming that the allowed parameter space

543: for $\mu_p$ is $[0,\infty)$. We each find that

544: \begin{equation}

545: \post{\mu_p}{n} = \frac{e^{-\mu_p^{1/p}} \mu_p^{n/p-q}

546: d\mu_p}{p\Gamma(n-pq+p)}.

547:  \label{eq:postp}

548: \end{equation}

549: But as agreed, each of us transforms our posterior probability to

550: the parameter $\mu$ using \Eq{mup}. Thus we obtain, from

551: \Eq{postp},

552: \begin{equation}

553: \post{\mu}{n} = \frac{e^{-\mu} \mu^{n-pq+p-1}

554: d\mu}{\Gamma(n-pq+p)}.

555:  \label{eq:postmu}

556: \end{equation}

557: Unfortunately, something is seriously amiss with the family of

558: posterior probabilities represented by \Eq{postmu}: each of us has

559: ended up  making a different inference about the same parameter

560: $\mu$! We can see this more clearly by computing the $r$th moment

561: \begin{eqnarray}

562: \label{eq:mr}

563:  m_r & \equiv & \int_{\mu} \mu^r \post{\mu}{n} \\ \nonumber

564:     & = & \Gamma(n-pq+p+r)/\Gamma(n-pq+p),

565: \end{eqnarray}

566: of the posterior probability $\post{\mu}{n}$. The moments clearly

567: depend on $p$, that is, on how we have chosen to parameterize the

568: problem.
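
The normalization in \Eq{postp} and the dependence on $p$ can be
checked directly. Assuming $p>0$ and substituting $\mu_p = \mu^p$, so
that $d\mu_p = p\,\mu^{p-1} d\mu$, one finds
\[
\int_0^{\infty} e^{-\mu_p^{1/p}} \mu_p^{n/p-q}\, d\mu_p
= p \int_0^{\infty} e^{-\mu} \mu^{n-pq+p-1}\, d\mu
= p\,\Gamma(n-pq+p) ,
\]
provided $n-pq+p>0$. As an illustration (the particular choices are
made here, not in the original argument), take $q=0$, that is, a prior
flat in $\mu_p$. The first moment from \Eq{mr} is then $m_1 = n+p$:
the physicist who works with $\mu$ itself ($p=1$) reports a posterior
mean of $n+1$, while the one who works with $\mu^2$ ($p=2$) reports
$n+2$, from the same data and the same functional form of prior.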

569:

570: What does a Bayesian have to say about this state of affairs? Is

571: it a problem? I would say yes, it is. But there are some Bayesians

572: who call themselves ``subjective Bayesians" and others who believe

573: themselves to be ``objective Bayesians." I confess that these

574: terms leave me a bit baffled. The latter term because it seems to

575: be an oxymoron and the former because it seems to be superfluous.

576: The fundamental Bayesian pact is this: The prior probability is an

577: encoding of a state of knowledge; as such it is a subjective

578: construct. That construct may encode one's personal state of

579: knowledge or belief, and that's a fine thing to do and is very

580: powerful. But it may also encode a state of knowledge that is not

581: specifically yours and that too is just fine. The issue is one of

582: encoding a state of knowledge: Are there any desiderata that

583: should be respected? The subjectivist is probably inclined to say

584: no: simply choose the parameterization that makes sense for you

585: and associate a prior, declare it to be supreme, and force all

586: other priors to differ from yours in just the right way to render

587: an inference about $\mu$ unique. So a ``subjective" Bayesian would

588: presumably reject \Eq{aprior}.

589:

590: I believe that to make headway, we should entertain some further

591: principles. They should not degenerate into dogma but should serve

592: as a lantern in the dark. Here are two possible principles:

593: \begin{itemize}

594: \item Possible Principle 1: For the same likelihood and the same

595: form of prior we should obtain the same inferences.

596: \item Possible Principle 2: The moments of the posterior

597: probability should be finite.

598: \end{itemize}

599: Let's apply these tentative principles to the moments in \Eq{mr}.

600: Principle 1 says that each of us should make the same inferences

601: about $\mu$, that is, the moments ought not to depend on the whim

602: of a workshop member; they ought not to depend on $p$. Principle 2

603: says that $m_r < \infty$. Together these principles imply that

604: \begin{equation}

605: -pq + p = a > 0,

606: \end{equation}

607: where $a$ is a constant. This leads to the following prior

608: \begin{equation}

609: \prior{\mu_p} = \mu_p^{a/p-1} d\mu_p.

610: \end{equation}
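
To make this step explicit, here is a short sketch. It assumes, as the form of the moments in \Eq{mr} suggests, that the parameterizations are $\mu_p = \mu^p$ and that the prior in \Eq{aprior} is $\mu_p^{-q} d\mu_p$; both definitions lie outside this passage, so they should be read as assumptions. Solving the constraint for $q$ gives $q = 1 - a/p$, hence $\mu_p^{-q} = \mu_p^{a/p-1}$. Changing variables to $\mu$,
\[
\mu_p^{a/p-1} d\mu_p = \mu^{a-p} \, p \, \mu^{p-1} d\mu = p \, \mu^{a-1} d\mu ,
\]
so that, up to a constant factor, every parameterization yields the same prior for $\mu$. With the Poisson likelihood $e^{-\mu}\mu^n/n!$ the posterior probability is proportional to $e^{-\mu}\mu^{n+a-1} d\mu$ and its moments are
\[
m_r = \Gamma(n+a+r)/\Gamma(n+a),
\]
which are independent of $p$ and, for $a > 0$, finite for every $n \geq 0$.
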

611: But we didn't quite make it; our principles are insufficient to

612: uniquely specify a value for the constant $a$. We need something

613: more. Here is something more, suggested by Vijay

614: Balasubramanian\cite{Bala}:

615: \begin{itemize}

616: \item Possible Principle 3: When in doubt, choose a prior that

617: gives equal weight to

618: all likelihoods indexed by the same parameters.

619: \end{itemize}

620: That is, impose a {\em uniform} prior on the space of

621: distributions. This requirement is a much more reasonable one

622: (here is that word again) than imposing uniformity on the space of

623: parameters because the space of distributions is invariant,

624: whereas that of parameters is not. The space of distributions is

625: akin to a space containing invariant objects like the vectors in a

626: vector space, whereas the parameter space is analogous to the

627: non-invariant space of vector coordinates. In our case, we impose

628: a uniform prior on the space inhabited by Poisson distributions.

629: Balasubramanian has shown that a uniform prior on the space of

630: distributions induces, locally, a Riemannian metric whose

631: invariant measure is determined by the Fisher Information, $F$.

632: For our toy model the invariant measure is

633: \begin{equation}

634: \label{eq:jeffprior} \prior{\mu_p} = F^{1/2} d\mu_p,

635: \end{equation}

636: where

637: \begin{equation}

638: F(\mu_p) = -\left < \frac{d^2 \log \like{n}{\mu_p}}{d\mu_p^2}

639: \right >.

640: \end{equation}
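
For the Poisson likelihood this expectation can be worked out in closed form. The following is a sketch under the assumption, inferred from the moments in \Eq{mr} rather than stated in this passage, that $\mu_p = \mu^p$. In terms of $\mu$, $\log \like{n}{\mu} = -\mu + n\log\mu - \log n!$, so
\[
F(\mu) = -\left< \frac{d^2}{d\mu^2} \lr{-\mu + n\log\mu} \right>
       = \frac{\left< n \right>}{\mu^2} = \frac{1}{\mu}.
\]
The Fisher information transforms as $F(\mu_p) = F(\mu)\,(d\mu/d\mu_p)^2$, and with $\mu = \mu_p^{1/p}$ this gives
\[
F(\mu_p) = \frac{1}{p^2} \, \mu_p^{1/p-2}, \qquad
F^{1/2} d\mu_p = \frac{1}{p} \, \mu_p^{\frac{1}{2p}-1} d\mu_p .
\]
Comparing the exponent with $a/p - 1$ fixes the constant $a$.
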

641: Equation~(\ref{eq:jeffprior}) is called the {\em Jeffreys prior}.

642: It gives $a = 1/2$ and thus uniquely specifies the form of the

643: prior probability. Possible Principle 3 is a generalization of

644: Possible Principle 1. Thus we conclude that the prior probability

645: that forces us all to make the same inference, regardless of how

646: we choose to parameterize the problem, is

647: \begin{equation}

648: \label{eq:prior} \prior{\mu_p} = \mu_p^{\frac{1}{2p}-1} d\mu_p.

649: \end{equation}

650:

651: This is all very tidy. However, when Jeffreys\cite{Jeffreys}

652: applied his general prior probability to the Gaussian, treating

653: both its mean and standard deviation together, he got a result he

654: did not like. He therefore suggested another principle:

655: \begin{itemize}

656: \item Possible Principle 4: If the parameter space can be

657: partitioned into subspaces that, {\em a priori}, are considered

658: independent then the general prior should be applied to each

659: subspace separately.

660: \end{itemize}

661: This gave him a prior he liked. Alas, for a Bayesian, life is not

662: easy. While the frequentist struggles with justifying the use of a

663: particular non-objective ensemble, the Bayesian struggles to

664: justify why some set of additional principles for encoding minimal

665: prior knowledge is reasonable. Meanwhile, the ``subjective

666: Bayesian'' says this is all a mere chasing after shadows. And so it

667: goes!

668:

669: \section{THE REAL WORLD}

670: The foregoing discussion might suggest that we ``abandon all hope, ye

671: who enter'' the real world of inference problems. Fortunately, it

672: is not quite so bleak. The real world imposes some very severe

673: constraints on what we can reasonably be expected to do. For one

674: thing, the lifetime of a physicist is finite, indeed, short when

675: compared with the age of the universe. Technical resources are

676: also finite. And then there is competition from fellow physicists.

677: Finally, uncertainty in abundance is the norm. Perhaps with enough

678: deep thought all inference problems can be solved in a pristine

679: manner. In practice, we are forced to exercise a modicum of

680: judgement when undertaking any realistic analysis. We introduce

681: approximations as needed, we side-step difficult issues by

682: accepting some conventions and we rely upon our ability not to get

683: lost amongst the trees. But when I reflect on what must be done to

684: measure, say, the top quark mass, a problem replete with

685: uncertainties in the jet energy scale, acceptance, background,

686: luminosity and Monte Carlo modeling, to name but a few, it strikes me

687: as desirable to have a coherent and intuitive framework to think

688: about such problems. Bayesian Probability Theory provides

689: precisely such a framework. Moreover, it is a framework that

690: mitigates our propensity to get confused about statistics when the

691: going gets tough. The second example I discuss shows that real

692: science can be done in spite of prior anxiety\cite{Wasserman}.

693:

694:

695: \subsection{Measuring the Solar Neutrino Survival Probability}

696:

697: It has been known for over a quarter of a century that fewer

698: electron neutrinos are received from the Sun than expected on the

699: basis of the Standard Solar Model

700: (SSM)\cite{review,history,bp98,bp95,ssm93}.

701:  This is the famous solar neutrino

702: problem. Figure~\ref{fig:rates} summarizes the situation as of

703: Neutrino 98.

704: \begin{figure}[htbp]

705: \begin{center}

706: \includegraphics[width=8.5cm,angle=-90]{bp98rates.eps}

707: \caption{Predictions of the 1998 Standard Solar Model of Bahcall

708: and Pinsonneault relative to data presented at Neutrino 98.

709: Courtesy J.N.~Bahcall.} \label{fig:rates}

710: \end{center}

711: \end{figure}

712: If the SSM is correct, and there is very strong evidence in its

713: favour\cite{helio}, then the inevitable conclusion is that a

714: fraction of the electron neutrinos created in the solar core are

715: lost before they reach detectors on Earth. The loss of electron

716: neutrinos is parameterized by the {\em neutrino survival

717: probability}, $\ps$, which is the probability that a solar

718: neutrino $\nu$ of energy $\Enu$ arrives at the Earth.

719:

720: Several loss mechanisms have been suggested, such as the

721: oscillation of electron neutrinos to less readily observed states

722: such as muon, tau  or sterile neutrinos\cite{gribov,wolfenstein}.

723: Many $\chi^2$-based analyses have been performed to estimate model

724: parameters\cite{hata,lui,parke}. To the degree that a fit to the

725: solar neutrino data is good, it provides evidence in favour of the

726: particular new physics that has been assumed. From this

727: perspective, solar neutrino physics is yet another way to probe

728: physics beyond the Standard Model.

729:

730:  But I'd like to address a more modest question:

731:  What do the data tell us about the solar neutrino

732:  survival probability independently of any particular model of new physics?

733:   We can provide a complete answer by computing

734:  the posterior probability of different

735:  hypotheses about the value of the survival probability, for a given neutrino

736:  energy\cite{bhat,gates}. Our Bayesian analysis comprises four components:

737: \begin{itemize}

738: \item The model

739: \item The data

740: \item The likelihood

741: \item The prior

742: \end{itemize}

743: First we sketch the model. (See Ref.~\cite{bhat} for details.)

744:

745: The solar neutrino capture rate $S_i$ on chlorine and gallium can

746: be written as

747: \begin{equation}

748: S_i = {\sum_j} \Phi_j \int  \ps \sigma_i(E_{\nu}) \phi_j(E_{\nu})

749: d\Enu, \label{eq:radio}

750: \end{equation}

751: where $\Phi_j$ is the total flux from neutrino source $j$,

752: $\phi_j$ is the normalized neutrino energy spectrum and $\sigma_i$

753: is the cross section for experiment $i$. The predicted spectrum,

754: plus experimental energy thresholds, are shown in

755: Fig.~\ref{fig:spectrum}. The full spectrum consists of eight

756: components (of which six are shown in Fig.~\ref{fig:spectrum}),

757: with total fluxes $\Phi_1$ to $\Phi_8$\cite{bp98}.

758: \begin{figure}[htbp]

759: \begin{center}

760: \includegraphics[width=8.5cm,angle=-90]{bp98spectrum.eps}

761: \caption{Solar neutrino energy spectrum as predicted by the

762: Bahcall-Pinsonneault 1998 Standard Solar Model, including the

763: neutrino energy thresholds for different solar neutrino

764: experiments. Courtesy J.N.~Bahcall.} \label{fig:spectrum}

765: \end{center}

766: \end{figure}

767:

768: The Super-Kamiokande experiment\cite{neutrino98} measures the

769: electron recoil spectrum arising from the scattering of the $^8B$

770: neutrinos (plus higher energy neutrinos) off atomic electrons.  We

771: shall use the electron recoil spectrum reported at Neutrino 98.

772: The spectrum spans the range 6.5 to 20~MeV. Light water

773: experiments, like Super-Kamiokande, are sensitive to all neutrino

774: flavors but do not distinguish between them. There are, therefore,

775: two possibilities: the $\nu_e$ deficit could be caused by $\nu_e$

776: conversions to $\nu_x$, where $x$ is either $\mu$ or $\tau$. If so,

777: the measured neutrino flux would be the sum of these flavors. If,

778: however, the $\nu_e$ are simply lost without a trace, for example

779: because of conversion into sterile neutrinos, then the measured

780: flux would consist of $\nu_e$ only. Like the rates for the

781: radiochemical experiments, the measured electron recoil spectrum

782: is linear in the neutrino survival probability. The data are shown

783: in Fig.~\ref{fig:recoil}.

784: \begin{figure}[htbp]

785: \begin{center}

786: \includegraphics[width=8.5cm]{bp98recoil.eps}

787: \caption{Electron recoil spectrum measured by Super-Kamiokande

788: compared with the spectrum predicted by the Bahcall-Pinsonneault 1998

789: Standard Solar Model. From Ref.~\cite{superk}.} \label{fig:recoil}

790: \end{center}

791: \end{figure}

792:

793:

794: For solar neutrino experiments, a reasonable definition of

795: sensitivity is the product of the cross section times the

796: spectrum\cite{bhat}. This quantity is plotted in

797: Fig.~\ref{fig:sensitivity}. Two points are noteworthy: each

798: experiment is sensitive to different parts of the neutrino energy

799: spectrum and there are regions in neutrino energy where the

800: sensitivity is essentially zero. We should anticipate that these

801: facts will constrain what we are able to learn about the neutrino

802: survival probability from the current solar neutrino data.

803: %

804: \begin{figure}[htbp]

805: \begin{center}

806: \includegraphics[width=8.5cm]{bp98sensitivity.eps}

807: \caption{Spectral sensitivity as a function of the neutrino

808: energy. From Ref.~\cite{bhat}.} \label{fig:sensitivity}

809: \end{center}

810: \end{figure}

811: %

812:

813: Since we do not know the cause of the solar neutrino deficit,

814: let's adopt a purely phenomenological approach to the survival

815: probability. Guided by the results from previous

816: analyses~\cite{hata,lui,parke,cbhat} we write the survival

817: probability as a sum of two finite Fourier series:

818: \begin{eqnarray}

819: \label{eq:parametric}

820:  \psa & = & \sum_{r=0}^{7}

821: a_{r+1} \cos(r\pi E_{\nu}/L_1)  / (1 +

822: \exp[(E_{\nu}-L_1)/b])

823: \\ \nonumber

824: & + & \sum_{r=0}^{3} a_{r+9} \cos(r \pi E_{\nu}/L_2),

825: \end{eqnarray}

826: where now we explicitly note the fact that the survival

827: probability depends upon the set of parameters $a$.

828: %

829: The first term in \Eq{parametric} is defined in the interval 0.0

830: to $L_1$~MeV and is suppressed beyond $L_1$ by the exponential. The

831: second term spans the interval 0.0 to $L_2$ MeV. We have divided

832: the function this way to model a survival probability that varies

833: rapidly in the interval 0.0 to $L_1$ and less so elsewhere. The

834: parameters $L_1$, $L_2$ and $b$ are set to 1.0, 15.0 and 0.1~MeV,

835: respectively.
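
As a quick check of how sharp the suppression is, consider the cutoff factor in the first term of \Eq{parametric},
\[
f(E_{\nu}) = \frac{1}{1 + \exp[(E_{\nu}-L_1)/b]},
\]
where the symbol $f$ is introduced here only for illustration. With the values quoted above, $f = 1/2$ at $E_{\nu} = L_1 = 1$~MeV, while $f(0) = 1/(1+e^{-10}) \approx 1 - 4.5\times10^{-5}$ and $f(2~{\rm MeV}) = 1/(1+e^{10}) \approx 4.5\times10^{-5}$. The first Fourier series is therefore essentially switched off within a few tenths of an MeV above $L_1$.
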

836:

837: We now consider the likelihood function $\like{D}{H}$, where $H$

838: denotes the hypothesis under consideration. The likelihood is

839: assumed to be proportional to a multi-variate Gaussian

840: $g(D|S,\Sigma)$, where $D \equiv (D_1,\ldots,D_{19})$ represents

841: the 19 data: 3 rates from the chlorine and gallium experiments

842: plus 16 rates from the binned Super-Kamiokande electron recoil

843: spectrum (Fig.~\ref{fig:recoil}); $\Sigma $ denotes the

844: $19\times19$ error matrix for the experimental data and $S \equiv

845: (S_1,\ldots,S_{19})$ represents the predicted rates.
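
Written out, and assuming the standard form of a multi-variate Gaussian (the normalization is not spelled out in the text), the function $g$ is
\[
g(D|S,\Sigma) = \frac{1}{\sqrt{(2\pi)^{19} \det\Sigma}}
\exp\lrb{-\frac{1}{2} (D-S)^{T} \Sigma^{-1} (D-S)},
\]
where the predicted rates $S$ depend on the hypothesis $H$ through the total fluxes and the survival probability. The exponent is $-\chi^2/2$ in the language of the $\chi^2$-based analyses mentioned earlier.
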

846:

847: The remaining ingredient is the prior probability. First we assess

848: our state of knowledge. There are two sets of parameters to be

849: considered: the total fluxes $(\Phi_1,\ldots,\Phi_8)$ and the

850: survival probability parameters $(a_1,\ldots,a_{12})$. The

851: hypotheses under consideration concern the values of these two

852: sets of parameters. The Standard Solar Model provides predictions

853: $\Phi^0 \equiv (\Phi_1^0,\ldots,\Phi_8^0)$ for the total fluxes,

854: together with estimates of their {\em theoretical} uncertainties.

855: So here is an analysis that must deal with theoretical

856: uncertainties in some sensible way. I do not know how such a thing

857: can be addressed in a manner consistent with frequentist precepts.

858: For a Bayesian, uncertainty is, well, uncertainty, regardless of

859: provenance; therefore, every sort can be treated identically. We

860: represent our state of knowledge regarding the fluxes by a

861: multi-variate Gaussian prior probability $\prior{\Phi} =$

862: $g(\Phi|\Phi^0,\Sigma_{\Phi})$, where $\Phi^0$ is the vector of

863: flux predictions and $\Sigma_{\Phi}$ is the corresponding error

864: matrix\cite{bp98}.

865:

866:  Unfortunately, we know very

867: little about the parameters $a_1, \ldots,a_{12}$, so we shall

868: short-circuit discussion by taking, as a matter of convention, the

869: prior probability for $a$ to be uniform. In practice, any other

870: plausible choice makes very little difference to our conclusions.

871: We may even find that a uniform prior for $a$ is consistent with

872: the generalized Jeffreys prior. Thus we arrive at the following

873: prior for this inference problem:

874: \begin{eqnarray}

875: \prior{a,\Phi} & = & {\rm Prior}(a|\Phi,I) \prior{\Phi} \\

876: \nonumber

877:     & = & da \prior{\Phi},

878: \end{eqnarray}

879: where $I$ now includes the prior information from the Standard

880: Solar Model.

881:

882: Now we can calculate! The posterior probability is given by

883: \begin{equation}

884: \post{a,\Phi}{D} = \frac{\like{D}{a,\Phi} \prior{a,\Phi}}

885:             {\int_{a,\Phi}\like{D}{a,\Phi} \prior{a,\Phi}  }.

886: \end{equation}

887: But since we aren't really interested in the total fluxes,

888: probability theory dictates that we just marginalize  (that is,

889: integrate) them away to arrive at the quantity of interest

890: $\post{a}{D}$. Actually, what we really want is the probability of

891: the survival probability for a given neutrino energy $\Enu$! That

892: is, we want

893: \begin{equation}

894:     \post{p}{D} = \int_{a} \delta(p - \psa) \post{a}{D}.

895: \end{equation}
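
A brief sketch of the two integrations may help; the sample index $k$, the sample size $K$ and the symbol $p^{(k)}$ are illustrative and not part of the original analysis. Marginalizing over the fluxes gives
\[
\post{a}{D} = \int_{\Phi} \post{a,\Phi}{D}.
\]
If points $a^{(1)},\ldots,a^{(K)}$ can be drawn from $\post{a}{D}$ by some Monte Carlo method, and $p^{(k)}$ denotes \Eq{parametric} evaluated at $a^{(k)}$ for a fixed neutrino energy, then the probability that the survival probability lies in an interval $[p_1,p_2]$ may be estimated by the fraction of points falling in that interval,
\[
\int_{p_1}^{p_2} \post{p}{D} \approx \frac{1}{K} \sum_{k=1}^{K}
\theta\lr{p_1 \leq p^{(k)} \leq p_2},
\]
where $\theta$ is 1 if its argument is true and 0 otherwise. This is only one possible numerical route; the text does not specify how the integrals were actually evaluated.
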

896:  Figure~\ref{fig:parme} shows contour plots of $\post{p}{D}$ for the two

897:  cases considered, conversion to sterile and active neutrinos.

898:

899: Our Bayesian analysis has produced a result that, intuitively,

900: makes a lot of sense. As expected, given the sensitivity plot in

901: Fig.~\ref{fig:sensitivity}, our knowledge of the survival

902: probability is very uncertain between 1 and 5 MeV. In fact, the

903: survival probability is tightly constrained in only two narrow

904: regions: in the \Be\ region just below 1 MeV and another at around

905: 8 MeV, near the peak of the \B\ neutrino spectrum. For neutrino

906: energies above 12 MeV or so, the survival probability is basically

907: unconstrained by current data.

908:

909: \begin{figure}[htbp]

910: \begin{center}

911:     \begin{minipage}[t]{0.46\linewidth}

912:     \includegraphics[width=7cm,angle=-90]{bp98sterile.eps}

913:     \end{minipage}

914:     \begin{minipage}[t]{0.46\linewidth}

915:     \includegraphics[width=7cm,angle=-90]{bp98active.eps}

916:     \end{minipage}

917: \end{center}

918: \caption{Survival probability {\it vs.} neutrino energy assuming

919: the neutrino flux consists of $\nu_e$ only (left plot) and assuming

920: conversion of $\nu_e$ to active neutrinos (right plot).} \label{fig:parme}

921: \end{figure}

922:

923: \section{SUMMARY}

924: It has been claimed by some at this workshop that Bayesian methods

925: are of limited use in physics research. This, of course, is not true,

926: as I hope to have shown. Bayesian methods are, however, explicitly

927: subjective and this may give one pause. I have argued that

928: frequentist methods are not nearly as objective as claimed. While

929: Bayesians cannot avoid the irreducible subjectivism of prior

930: probabilities, frequentists cannot avoid the use of ensembles that

931: do not objectively exist. Frequentists struggle with any

932: uncertainty that does not arise from repeated sampling, like

933: theoretical errors, while for Bayesians uncertainty in all its

934: forms is treated identically. On the other hand, some Bayesians

935: struggle to convince us that a particular choice of prior is

936: reasonable, while frequentists look on in amusement. The point is

937: that neither approach is free from warts. But, of the two approaches to

938: inference, I would say that the Bayesian one has more to offer, is

939: easier to understand, has greater conceptual cohesion and, the

940: most important point of all, more closely accords with the way we

941: physicists think\cite{bayes}. And this is the real reason why it

942: should be embraced.

943:

944:

945: \vskip1cm \noindent

946:

947: \section*{ACKNOWLEDGEMENTS}

948:

949:

950:  I wish to thank the organizers for hosting this most enjoyable workshop.

951:  It was a particular pleasure for me to meet again my dear friend, and

952:  intellectual sparring partner, Fred James who must take all the

953:  credit for arousing my interest in this arcane subject.

954:  I thank my colleagues Chandra Bhat, Pushpa Bhat

955: and Marc Paterno with whom the solar neutrino work was done, John

956: Bahcall for providing the latest theoretical information and

957: Robert Svoboda for providing the 1998 Super-Kamiokande data in

958: electronic form. This work was supported in part by the U.S.

959: Department of Energy.

960:

961: \begin{thebibliography}{99}

962:

963: \bibitem{Jeffreys} H.~Jeffreys, Theory of Probability, 3rd edition,

964: Oxford University Press (1961). Chapters I, VII and VIII should be

965: required reading for anyone who values clear thinking.

966:

967: \bibitem{deFinetti} B.~de~Finetti, Theory of Probability, Vol. 1,

968: John Wiley \& Sons Ltd. (1990).

969:

970: \bibitem{Neyman} J.~Neyman, Phil. Trans. R. Soc. London {\bf A236}

971: (1937) 333. A beautiful paper, not nearly as daunting as one might

972: imagine.

973:

974: \bibitem{Feldman} G. Feldman and R. Cousins, Phys. Rev. {\bf D57}

975:   (1998) 3873. A clear paper free of

976:   frequentist/Bayesian muddle! The authors

977:   make a sharp distinction between Bayesian and frequentist ideas

978:   and then opt for a principled frequentism.

979:

980: \bibitem{Cousins}

981: R.~Cousins, Am. J. Phys. {\bf 63} (1995) 398. An excellent

982: accessible discussion about limits. And yes Bob, every physicist

983: is a Bayesian, but many don't know it! Ok, maybe Fred isn't!

984:

985: \bibitem{Fisher}  R.~A.~Fisher: An Appreciation (Lecture Notes on

986: Statistics, Vol. 1), S.~E.~Fienberg and D.~V.~Hinkley, eds.

987: Springer Verlag (1990). Lots of interesting historical stuff about

988: Sir Ronald.

989:

990: \bibitem{Bala}

991: V.~Balasubramanian, Statistical Inference, Occam's Razor and

992: Statistical Mechanics on The Space of Probability Distributions,

993: Princeton University Physics Preprint PUPT-1587 (1996). Also

994: available electronically as preprint cond-mat/9601030. The

995: mathematics is a bit tricky, but the main ideas are not too hard

996: to grasp. It's worth a read.

997:

998: \bibitem{Wasserman}

999: R.~E.~Kass and L.~Wasserman, J. Am. Stat. Assoc., {\bf 91} (1996)

1000: 1343. Life is tough!

1001:

1002: \bibitem{review}

1003: T.~A.~Kirsten, \Journal{\REM}{71}{1213}{1999}.

1004:

1005: \bibitem{history}

1006: J.~N.~Bahcall and Raymond Davis, Jr., An account of the

1007: development of the solar neutrino problem, Essays in Nuclear

1008: Astrophysics, eds. C.~A.~Barnes, D.~D.~Clayton and D.~Schramm,

1009: Cambridge University Press (1982) pp. 243-285.\\ See also,

1010: http://www.sns.ias.edu/$\sim$jnb/Papers/Popular/snhistory.html

1011:

1012: \bibitem{bp98}

1013: J.~N.~Bahcall \etal, \Journal{\PLB}{433}{1}{1998}.

1014:

1015: \bibitem{bp95} J.~N.~Bahcall and M.~Pinsonneault,

1016: \Journal{\REM}{67}{781}{1995}.

1017:

1018: \bibitem{ssm93}

1019: S.~Turck-Chi\`eze and I.~Lopes, Astrophys. J. {\bf

1020: 408} (1993) 347.

1021:

1022: \bibitem{helio}

1023: J.~N.~Bahcall, S.~Basu and M.~H.~Pinsonneault,

1024: \Journal{\PLB}{433}{1}{1998}.

1025:

1026: \bibitem{gribov} V.N. Gribov and B.M.~Pontecorvo,

1027: \Journal{\PLB}{28}{493}{1969};\\ J.N.~Bahcall and S.C.~Frautschi,

1028: \Journal{\PLB}{29}{623}{1969}; \\ S.L.~Glashow and L.M.~Krauss,

1029: \Journal{\PLB}{190}{199}{1987}.

1030:

1031: \bibitem{wolfenstein} L.~Wolfenstein,

1032: \Journal{\PRD}{17}{2369}{1978};\\ S.P.~Mikheyev and A.Yu.~Smirnov,

1033: \Journal{Sov. J. Nucl. Phys.}{42}{913}{1986}; \\ S.P.~Mikheyev and

1034: A.Yu.~Smirnov, \Journal{Nuovo Cimento C}{9}{17}{1986}.

1035:

1036: \bibitem{hata} N.~Hata and P.~Langacker,

1037: \Journal{\PRD}{50}{632}{1994}; N.~Hata and P.~Langacker,

1038: \Journal{\PRD}{56}{6107}{1997}.

1039:

1040: \bibitem{lui} Q.~Y.~Liu and S.~T.~Petcov,

1041: \Journal{\PRD}{56}{7392}{1997}; \\

1042: A.B.~Balantekin, J.F.~Beacom and

1043: J.M.~Fetter, \Journal{\PLB}{427}{317}{1998}.

1044:

1045: \bibitem{parke}

1046: S.~Parke, \Journal{\PRL}{74}{839}{1995}.

1047:

1048: \bibitem{cbhat}

1049: C.M.~Bhat, 8th Lomonosov Conference on Elementary Particle

1050: Physics, Moscow, Russia, August 1997, FERMILAB-Conf-98/066;\\

1051: C.M.~Bhat \etal, Proceedings of the 9th  Meeting  of the DPF of

1052: the American Physical Society, ed. K.~Heller \etal, World

1053: Scientific (1996) 1220.

1054:

1055: \bibitem{bhat}

1056: C.~M.~Bhat, P.~C.~Bhat, M.~Paterno and H.~B.~Prosper,

1057: \Journal{\PRL}{81}{5056}{1998}.

1058:

1059: \bibitem{gates}

1060: E.~Gates, L.M.~Krauss and M.~White, \Journal{\PRD}{51}{2631}{1995}.

1061:

1062: \bibitem{neutrino98}

1063: K.~Lande (Homestake), V.N.~Gavrin (SAGE), T. Kirsten (GALLEX) and

1064: Y.~Suzuki (Super-Kamiokande), Neutrino 98, Proceedings XVIIIth

1065: International Conference on Neutrino Physics and Astrophysics,

1066: Takayama, Japan, June 1998, eds. Y.~Suzuki and Y.~Totsuka;\\

1067: Robert Svoboda, private communication 1998.

1068:

1069: \bibitem{superk}

1070: The Super-Kamiokande Collaboration,

1071: \Journal{\PRL}{82}{2644}{1999}.

1072:

1073: \bibitem{bayes}

1074: See for example, G. D'Agostini, Bayesian Reasoning In High-Energy

1075: Physics: Principles And Applications, CERN-99-03 (1999) 183.

1076:

1077: \end{thebibliography}

1078:

1079: \end{document}

1080:

1081: