1: \documentclass{cernrep}
2: \usepackage{graphicx,here}
3:
4: \begin{document}
5:
6: \title{BAYESIAN ANALYSIS}
7:
8: \author{Harrison B. Prosper}
9:
10: \institute{Department of Physics, Florida State University, Tallahassee,
11: Florida 32306, USA}
12:
13: %---------------------------------------------------------------------------
\def\etal{{\sl et al.}} %et al. - no preceding comma
15: \def\vs{{\sl vs.}} %vs.
16:
17: % A useful Journal macro
18: \def\Journal#1#2#3#4{{#1} {\bf #2} (#4) #3}
19: % Some useful journal names
20: \def\NCA{Nuovo Cimento}
21: \def\NIM{Nucl. Instrum. Methods}
22: \def\NIMA{{Nucl. Instrum. Methods} A}
23: \def\NPB{{Nucl. Phys.} B}
24: \def\PLB{{Phys. Lett.} B}
25: \def\PRL{Phys. Rev. Lett.}
26: \def\APP{Astro. Part. Phys.}
27: \def\PRD{{Phys. Rev.} D}
28: \def\PRC{{Phys. Rev.} C}
29: \def\ZPC{{Z. Phys.} C}
30: \def\REM{Rev. Mod. Phys.}
31: %----------------------------------------------------
32: % Some useful commands
33: %----------------------------------------------------
34: \newcommand{\seqn}{\begin{equation}}
35: \newcommand{\eeqn}{\end{equation}}
36: \newcommand{\seqna}{\begin{eqnarray}}
37: \newcommand{\eeqna}{\end{eqnarray}}
38: \newcommand{\Eq}[1]{Eq.\ (\ref{eq:#1})}
39: \newcommand{\Eqs}[2]{Eqs.\ (\ref{eq:#1}) and (\ref{eq:#2})}
40: \newcommand{\lr}[1]{\left ( #1 \right )}
41: \newcommand{\lrb}[1]{\left [ #1 \right ]}
42: \newcommand{\bold}[1]{\mbox{\bf #1}}
43: \newcommand{\clist}[2]{(#1_{1},\ldots,#1_{#2})}
44: \newcommand{\comb}[2]{\pmatrix{#1\cr#2\cr}}
45: \newcommand{\R}{{\cal R}}
46: \newcommand{\N}{{\cal N}}
47: \newcommand{\C}{{\cal C}}
48: \newcommand{\Cl}{\mbox{$^{37}Cl$}}
49: \newcommand{\B}{\mbox{$^{8}B$}}
50: \newcommand{\Be}{\mbox{$^{7}Be$}}
51:
52: \newcommand{\prior}[1]{{\rm Prior}(#1|I)}
53: \newcommand{\like}[2]{{\rm Pr}(#1|#2,I)}
54: \newcommand{\post}[2]{{\rm Post}(#1|#2,I)}
55: \newcommand{\prob}[3]{{\rm #1}(#2|#3)}
56: \newcommand{\Enu}{E_{\nu}}
57: \newcommand{\ps}{p(\nu|\Enu)}
58: \newcommand{\psa}{p(\nu|\Enu,a)}
59: %---------------------------------------------------------------------------
60:
61: \date{\today}
62:
63: \maketitle
64:
65: \begin{abstract}
66: After making some general remarks, I consider two examples that
67: illustrate the use of Bayesian Probability Theory. The first is a
68: simple one, the physicist's favorite ``toy," that provides a forum
69: for a discussion of the key conceptual issue of Bayesian analysis:
70: the assignment of prior probabilities. The other example
71: illustrates the use of Bayesian ideas in the real world of
72: experimental physics.
73: \end{abstract}
74:
75: \section{INTRODUCTION}
76: \begin{quote}
77: ``We don't know all about the world to start with; our knowledge
78: by experience consists simply of a rather scattered lot of
79: sensations, and we cannot get any further without some {\em a
80: priori} postulates. My problem is to get these stated as clearly
81: as possible." \vspace{0.2cm}
82:
83: Sir Harold Jeffreys, in a letter to Sir Ronald Fisher dated 1
84: March, 1934.
85: \end{quote}
86:
Scientific inference has led to the surest knowledge we have; yet,
paradoxically, there is still disagreement about how to perform
it. The disagreement is both within and between camps, the
90: principal ones being frequentist and Bayesian. If pressed, the
91: majority of physicists would claim to belong to the frequentist
92: camp. In practice, we belong to both camps: we are frequentists
93: when we wish to appear ``objective," but Bayesian when to be
94: otherwise is either too hard, or makes no sense.
Until fairly recently, relatively few of us
have been party to the frequentist--Bayesian debate.
97: And society is
98: all the better for it!
99: It is
100: our pragmatism that has cut through the Gordian Knot and allowed
101: scientific progress.
102: However, we find ourselves performing
103: ever more complex inferences that, in some cases, have real world
104: consequences and we can no longer regard the debate as mere
105: philosophical musings.
106: Indeed, this workshop is a testimony to this loss of innocence.
107:
108: All parties appear, at least, to agree on one thing: probability
109: theory is a reasonable basis for a theory of inference. But notice
110: the use of the word ``reasonable." That word highlights the chief
111: cause of the disagreement: any theory of inference is inevitably
112: {\em subjective} in the following sense: what one person regards
113: as reasonable may be considered unreasonable by another and,
114: unlike scientific theories, we cannot appeal to Nature to decide
115: which of the many inference theories is best, nor which criteria
116: are to be used.
I used to think that
biased estimates were bad. But while some of us strive mightily to
create unbiased estimates, others look on bewildered, wondering why on earth we
work so hard to achieve a characteristic they consider irrelevant.
121:
122: Physicists, quite properly, are deeply concerned about delivering
123: to the world objective results. Therefore, anything that openly
124: declares itself to be subjective is viewed with suspicion. Since
Neyman's theory of inference is billed as objective, many of us
126: regard it as reasonable and the Bayesian theory as unfit for
127: scientific use. However, when one scrutinizes the Neyman theory,
128: its ``objectivity" proves to be of a very peculiar sort, as I hope
129: to show. I then discuss the difficult issue of prior probabilities
130: by way of a simple model. In the last section, I describe a
131: realistic Bayesian analysis to illustrate a point: Bayesian
132: methods are not only fit for scientific use, they are precisely
133: what is needed to make maximal use of data.
134:
135: But first here are some remarks about
136: probability.
137:
138: \subsection{What is Probability?}
139: Probability theory is a mathematical theory about abstractions
140: called {\em probabilities}. Therefore, to put this theory to work
141: we are obliged to {\em interpret} these abstractions. At least three
142: interpretations have been suggested:
143: \begin{itemize}
144: \item propensity (Popper)
145: \item degree of belief (Bayes, Laplace, Gauss, Jeffreys, de Finetti)
146: \item relative frequency (Venn, Fisher, Neyman, von Mises).
147: \end{itemize}
148: In parentheses I have given the names of a few of the proponents.
149: According to Karl Popper, an unbiased coin, when tossed, has a
150: propensity of 1/2 to land heads or tails. The 1/2 is claimed to be
151: a property of the coin. According to Laplace probability is a
152: measure of the degree of belief in a proposition: given that you
153: believe the coin to be unbiased your degree of belief in the
154: proposition ``the coin will land heads" is 1/2. Finally, according
155: to Venn if the coin is unbiased the relative frequency with which
heads appears in an infinite sequence of coin tosses is 1/2. Venn's
interpretation seems to have the edge over the other two, since it
is a matter of experience that a coin tossed repeatedly lands
heads closer and closer to 1/2 the time as the number of tosses, that is, trials,
increases. Every physicist who performs repeated controlled
161: experiments, either real ones or virtual ones on a computer,
162: provides overwhelming evidence in support of Venn's
163: interpretation.
164:
165: So, which is it to be: degree of belief or relative frequency? The
166: answer, I believe, is both, which prompts another question: is one
167: interpretation more fundamental than the other and if so which?
168: The answer is yes, degree of belief. It is yes for two very
important reasons: one is practical, the other foundational. The
170: practical reason is that we use probability in a much broader
171: context than that to which the relative frequency interpretation
172: pertains. It has been amply demonstrated that we perform
173: inferential reasoning according to rules that are isomorphic to
174: those of probability theory. Any theory of inference that
175: dismisses the ``degree of belief" interpretation would be expected
176: to suffer a severely restricted domain of applicability relative
177: to the large domain in which probability is used in everyday life.
178:
179: The
second reason is that the Venn limit (the convergence of the ratio
of the number of successes to the number of trials) cannot be proved without
182: appealing to the notion of degree of belief\cite{Jeffreys}.
183: The issue here is one
184: of epistemology. Empirical evidence,
185: even when
186: overwhelming, does not prove that a thing is true; only that it is
187: very likely, which is just another way of saying it is very probable. It is
188: easy to see why a mathematical proof, as commonly understood, cannot be
189: established. Consider a sequence of trials to test the Standard
Model. Suppose each trial to be a proton--antiproton
collision at the Tevatron. Each trial ends in success
(a top quark is created) or failure. Let $T$ be the number of
trials and $S$ the number of
successes. Given the top quark mass, the
Standard Model predicts the probability $p$ of success.
196: The Standard Model, we note, is a quantum theory. Therefore,
197: the sequence of successes is strictly non-deterministic,
198: in a sense in
199: which a coin toss and a pseudo-random number generator
200: are not.
201:
202: However, a necessary (but of course not sufficient) basis for a
203: mathematical proof of convergence of a sequence to a limit is the
204: existence of a rule that connects term $T+1$ {\em
205: deterministically} to $T$. But for quantum theory it is believed
206: that no such rule exists. What can be and has been proved, by
207: several people starting with James Bernoulli, is this:
208: \begin{quote}
209: If the order of trials is unimportant (that is, the sequence of
210: trials is {\em exchangeable}), and if the {\em probability} of success
211: at each trial is the
212: same, then $S/T \rightarrow p$, as $T
213: \rightarrow \infty$ with {\em probability} one.
214: \end{quote}
215: At this point, I can adopt two attitudes regarding this theorem:
216: one is that clarity of thought is a virtue; the second is that
217: clarity of thought is nice but less important than pragmatism. As
218: a pragmatist I would say that this theorem proves that the Venn
219: limit exists. But in this case I prefer clarity. Let us,
220: therefore, be clear about what this theorem actually proves and
221: what it does not. Bernoulli's theorem does not prove that $S/T$
222: converges to $p$. Rather it is a statement about 1) the {\em
223: probability} that $S/T$ converges to $p$ as 2) the number of
224: trials increases without limit, provided that 3) the order of
225: trials does not matter and that 4) the {\em probability} at each
226: trial is the same. Lurking behind these four seemingly innocuous
227: statements are deep issues that are far beyond the scope of what I
228: wish to say in this paper. Let me just note that the word
229: ``probability" occurs twice in the statement of Bernoulli's
230: theorem. If we insist that all probabilities are relative
231: frequencies then we would have to interpret ``probability of
232: success at each trial" and ``probability one" as the ``limit with
233: probability one" of other exchangeable sequences in order to be
234: consistent. This leads into the abyss of an infinitely recursive
235: definition. Doubtless, von Mises was well aware of this
236: difficulty, which may be why he took the existence of the Venn
237: ``limit" as an axiom. However, even if one is prepared to accept
238: this axiom, I do not think it circumvents the epistemological
239: difficulty of defining a thing, probability, by making use of the
240: thing {\em twice} in its definition. As de Finetti\cite{deFinetti}
241: puts it
242: \begin{quote}
243: ``In order for the results concerning frequencies to make sense,
244: it is necessary that the concept of probability, and the concepts
245: deriving from it which appear in the statements and proofs of
246: these results, should have been defined and given meaning
247: beforehand. In particular, a result which depends on certain
248: events being uncorrelated, or having equal probabilities, does not
249: make sense unless one has defined in advance what one means by the
250: probabilities of the individual events."
251: \end{quote}
252: I agree.
253:
254: The alternative interpretation of probability is {\em degree of
255: belief}. Thus the probability $p$ is our
256: assessment of the probability of success at each trial, based on
257: our current state of knowledge. That state of knowledge
258: could be informed, for example, by the predictions of the Standard
259: Model. Bernoulli's theorem says that
260: if our assessment of the probability of success at each trial is
261: correct, and if our assessment does not change,
262: then it is reasonable to expect $S/T \rightarrow p$ as
263: $T \rightarrow \infty$.
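
The probabilistic character of this statement can be made explicit with the
weak form of the theorem. As a sketch, assuming independent trials with a
common success probability $p$ (the tolerance $\delta$ is a symbol introduced
here purely for illustration), Chebyshev's inequality gives, for any
$\delta > 0$,
\begin{equation}
{\rm Pr}\left( \left| \frac{S}{T} - p \right| \geq \delta \right)
\leq \frac{p(1-p)}{T\delta^2} \rightarrow 0
\quad \mbox{as} \quad T \rightarrow \infty.
\end{equation}
The left-hand side is itself a probability. The bound constrains how strongly
we should believe that $S/T$ differs appreciably from $p$; it does not assert
that any particular sequence of trials converges.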
264:
265: But what if our assessment, initially, is incorrect? This poses no
266: difficulty. As our state of knowledge changes, by virtue of data acquired,
267: our assessment of the probability of success changes accordingly.
268: Bayes' theorem shows
269: how the degree of belief of a coherent reasoner will be updated
270: to the point
271: where it closely matches the relative frequency $S/T$.
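
As an illustration, suppose (an assumption made here only for concreteness)
that the initial assessment of $p$ is uniform on $[0,1]$. After $S$ successes
in $T$ exchangeable trials, Bayes' theorem gives
\begin{equation}
\post{p}{S,T} = \frac{(T+1)!}{S!\,(T-S)!} \, p^{S} (1-p)^{T-S} dp,
\end{equation}
whose mean is $(S+1)/(T+2)$, Laplace's rule of succession. This mean differs
from the relative frequency $S/T$ by at most $1/(T+2)$, so the two agree ever
more closely as data accumulate. A different initial assessment would change
the details, but, provided it does not exclude the relevant values of $p$
outright, one would expect the same qualitative behaviour.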
272:
273:
274: \subsection{Neyman's Theory}
275: Neyman rejected the Bayesian use of Bayes' theorem arguing that
276: the prior probability for a parameter ``has no meaning" when the
277: latter is an unknown constant. He further argued that even if the
278: parameters to be estimated could be considered as random variables
279: we usually do not know the prior probability. With the benefit of
280: hindsight, we can see that these arguments betray a confusion
about the notion of degree of belief. Jeffreys\cite{Jeffreys}
282: frequently lamented the failure of his contemporaries to really
283: understand what he was talking about. I would note that even
284: amongst this illustrious gathering the confusion persists. So let
285: me belabor a point: when one assigns a probability to a parameter
286: it is not because one deems it sensible to think of the parameter
287: as if it were a random variable---this is clearly nonsense if the
288: parameter is in fact a constant. The probability assignments
289: merely encode one's knowledge (or that of an idealized reasoner)
290: of the possible values of the parameter.
291:
292: In his classic paper of 1937\cite{Neyman}, Neyman introduced his
293: theory of confidence intervals, which he believed provided an
294: important element of an objective theory of inference. He not only
295: specified the property that confidence intervals had to satisfy
296: but he also gave a particular rule for constructing them, although
297: he left considerable freedom that can be creatively
298: exploited\cite{Feldman}. Neyman's theory is elegant and powerful.
299: Nonetheless, his theory is open to criticism. But in order to
300: raise objections we need to understand what Neyman said.
301:
302: Imagine an ensemble of trials, or experiments, $\{ E \}$ to each
303: of which we associate an interval
304: $[\underline{\theta}(E),\overline{\theta}(E)]$. The ensemble of
305: experiments yields an ensemble of intervals. Neyman required the
306: ensemble of confidence intervals to satisfy the following
307: condition:
308: \begin{quote}
309: For every possible {\em fixed} point $(\theta, \alpha)$ in the
310: parameter space of the problem, where $\theta$ is the parameter of interest
311: and $\alpha$ denotes all other parameters of the problem
312: \begin{equation}
313: {\rm Prob}\{ \theta \in [\underline{\theta}(E),\overline{\theta}(E)]
314: \} \geq \beta.
315: \end{equation}
316: \end{quote}
317: According to Neyman this probability is to be interpreted as a
318: relative frequency. Thus,
319: any set of intervals is an ensemble of {\em
320: confidence intervals} if the relative frequency with which
321: the intervals contain the
322: point $\theta$ is greater than or equal to $\beta$,
323: for every possible {\em fixed} point in the parameter space regardless of
324: its dimensionality.
325: Neyman's idea is intuitively clear: an interval picked at
326: {\em random} from such an ensemble, the proverbial urn of
327: sampling theory, will have a $100\beta$\% chance
328: of containing the fixed point $\theta$, whatever the
329: value of $\theta$ and $\alpha$.
330: This is a remarkable requirement. Here is an example.
331:
332: Suppose we wish to measure a cross section. Our inference problem
333: depends upon the following parameters: the cross section $\sigma$,
334: the efficiency $\epsilon$, the background $b$ and the integrated
335: luminosity $L$. Consider a {\em fixed} point $(\sigma, \epsilon,
336: b, L)$ in the parameter space. To this point we associate an
337: ensemble of confidence intervals, induced by an ensemble of
338: possible experimental results. Some of these intervals
339: $[\underline{\sigma}(E),\overline{\sigma}(E)]$ will contain
340: $\sigma$, others will not. The fraction of intervals, in the
341: ensemble, that contain $\sigma$ is called the {\em coverage
342: probability} of the ensemble of intervals. A coverage probability
343: is associated with every point $(\sigma, \epsilon, b, L)$ of the
344: parameter space. Moreover, the value of the coverage probability
345: may vary from point to point. Neyman's key idea is that the
346: ensembles of intervals should be constructed so that, over the
347: allowed parameter space, the coverage probability never falls
348: below some number $\beta$, called the confidence level. Both the
349: coverage probability and the confidence level are to be
350: interpreted as relative frequencies.
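
To make the definition concrete, here is a sketch under an assumption not
stated above: that the experiment is a counting experiment in which the
observed count $n$ is Poisson distributed with mean $\epsilon \sigma L + b$.
Each possible count $n$ yields an interval
$[\underline{\sigma}(n),\overline{\sigma}(n)]$, and the coverage probability
at a fixed point is
\begin{equation}
C(\sigma,\epsilon,b,L) = \sum_{n=0}^{\infty} w_n \,
\frac{e^{-(\epsilon\sigma L + b)} (\epsilon\sigma L + b)^n}{n!},
\end{equation}
where $w_n = 1$ if $\underline{\sigma}(n) \leq \sigma \leq
\overline{\sigma}(n)$ and $w_n = 0$ otherwise. In this notation Neyman's
criterion is the requirement that $C(\sigma,\epsilon,b,L) \geq \beta$ at
every point of the parameter space, not merely at the point that happens to
be the true one.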
351:
352: The parameter space and its set of ensembles form what
353: mathematicians call a {\em fibre bundle}. The parameter space is
354: the base space to each point of which is attached a fibre, that
355: is, another space, here the ensemble of intervals associated with
356: that parameter point. Each fibre has a coverage probability, and
357: none falls below the confidence level $\beta$. Since the fibres
358: may vary in a non-trivial way from point to point it is not
359: possible, in general, to construct the fibre bundle as a simple
360: Cartesian product of the parameter space and a single ensemble of
361: intervals. In general, a non-trivial fibre bundle is the natural
mathematical description of Neyman's construction. Well, natural
363: if, like me, you like to think of things geometrically!
364:
365: There are two difficulties with Neyman's idea. The first is
366: technical. For one-dimensional problems, or for problems in which
367: we wish to set bounds on {\em all} parameters simultaneously, the
368: construction of confidence intervals is straightforward. But when
369: the parameter space is multi-dimensional and our interest is to
370: set limits on a single parameter no general algorithm is known for
371: constructing intervals. That is, no general algorithm is known for
372: eliminating nuisance parameters. In our example, we care only
373: about the cross-section; we have no interest in setting bounds on
374: the integrated luminosity. What we do, in practice, is to replace
375: the nuisance parameters with their maximum likelihood estimates.
376: The justification for this procedure is the following theorem:
377: \begin{equation}
378: -2\log
\frac{{\rm Pr}(x|\theta,\hat{\alpha})}{{\rm Pr}(x|\hat{\theta},\hat{\alpha})}
380: \rightarrow \chi^2, \label{eq:loglike}
381: \end{equation}
382: \begin{quote}
383: as the data sample $x$ grows without
384: limit, and provided that the maximum likelihood estimates
385: $\hat{\theta}$
386: and
387: $\hat{\alpha}$ lie within the parameter space minus its boundary.
388: \end{quote}
389: If our data sample is sufficiently large its likelihood becomes
390: effectively a (non-truncated) multi-variate Gaussian, and
391: consequently the distribution of the log-likelihood ratio is
392: $\chi^2$. Since that distribution is independent of the true
393: values of the parameters a probability statement about the
394: log-likelihood ratio can be re-stated as one about the parameter
395: $\theta$. But, and this is the crucial point, the theorem is
396: silent about what to do for small samples. Unfortunately, we high
397: energy physicists insist on looking for new things, so our data
398: samples are often small. So what are we, in fact, to do? We must
399: after all publish. Today, with our surfeit of computer time, we
400: can contemplate a brute-force approach: start with an approximate
401: set of intervals, computed using \Eq{loglike}, and adjust them
iteratively until they make Neyman happy. But because of the
second difficulty, which I now discuss, the effort hardly seems worth the
trouble.
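
Before doing so, it may help to see \Eq{loglike} at work in the simplest
case. Consider a single Poisson count $n > 0$ with mean $\mu$ and no nuisance
parameters (a special case chosen here for illustration only). The maximum
likelihood estimate is $\hat{\mu} = n$, and
\begin{equation}
-2\log \frac{{\rm Pr}(n|\mu)}{{\rm Pr}(n|\hat{\mu})}
= 2\left[ \mu - n + n \log \frac{n}{\mu} \right].
\end{equation}
Requiring the right-hand side to be at most 1, the 68.3\% point of a $\chi^2$
distribution with one degree of freedom, yields an approximate 68.3\%
interval for $\mu$. The approximation is only as good as the large-sample
premise of the theorem; for small $n$ the actual coverage can differ from the
nominal value.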
405:
406: The second difficulty is conceptual.
407: It has been argued at this workshop, and elsewhere\cite{Cousins},
that the set of published 95\%
intervals constitutes a bona fide ensemble of approximately 95\%
410: confidence intervals. Here is the argument. Each published interval
411: is
412: drawn from an urn (that is, an ensemble of experiments if you
413: prefer a more cheerful allusion) whose confidence level is 95\%.
414: The fact that each urn is completely different is irrelevant
415: provided that the sampling probability from each is the same,
416: namely 95\%. Thus 95\% of the set of published intervals will be
417: found to yield true statements. And herein lies the beauty of
418: coverage! The flaw in this argument is this: each published
419: interval is drawn from an urn that does not objectively exist,
420: because the ensemble into which an actual experiment is embedded
421: is a purely conceptual construct not open to empirical scrutiny.
422: Fisher\cite{Fisher}, not known for fawning over Bayesians, made a
423: similar point a long time ago:
424: \begin{quote}
425: ``.. if we possess a unique sample on which significance tests are
426: to be performed, there is always ... a multiplicity of populations
427: to each of which we can legitimately regard our sample as
428: belonging; so the phrase `repeated sampling' from the same
429: population does not enable us to determine which population is to
430: be used to define the probability level, for no one of them has
431: objective reality, all being products of the statistician's
432: imagination.''
433: \end{quote}
434: This is true of our ensemble of experiments.
435: Consequently, a few
436: troublesome physicists, bent on giving the Particle Data Group a
437: hard time, need merely imagine a different set of urns from which
438: the published results could legitimately have been drawn and
439: thereby alter the confidence level of each result!
440:
441: Of course, the published intervals do have a coverage probability.
442: My claim is that its value is a matter to be decided by actual
443: inspection---provided, of course, we know the right answers! It is
444: not one that can be deduced {\em a priori} for the reason just
445: given. The fact that I am able to construct ensembles of
446: confidence intervals on my computer, by whatever procedure, and
447: verify that they satisfy Neyman's criterion is certainly
448: satisfying, but in no way does it prove anything empirically
449: verifiable about the interval I publish. Forgive me for flogging a
450: sincerely dead horse, but let me state this another way: Since I
451: do not repeat my experiment, any statement to the effect that the
452: virtual ensemble simulated on my computer mimics the potential
453: ensemble to which my published interval belongs is tantamount to
454: my claiming that if I were to repeat my experiment, then I would
455: do so such that the virtual and real ensembles matched. Maybe, or
456: maybe not!
457:
To summarize: A frequentist confidence level is a property of an
ensemble; therefore, its objectivity, or lack thereof, is on a par
with that of the ensemble that defines it.
461:
462: This whole discussion may strike you as a tad surreal, but I think
463: it goes to the heart of the matter: many physicists, for sensible
464: reasons, reject the Bayesian theory and embrace coverage because
465: it is widely viewed as objective. But as argued above confidence
466: levels may or may not be objective depending on the circumstances.
467: Therefore, when confronted with a difficult inference problem our
468: choice is not between an ``objective" and ``subjective" theory of
469: inference, but rather between two different subjective theories.
470: It may be reasonable to continue to insist upon coverage, but not
471: because it is objective.
472:
473: After this somewhat philosophical detour it is time to turn to the
474: real world. But en route to the real world, lest Bayesians begin
475: to feel uncontrollably smug, I'd like to discuss an instructive
476: ``toy" model that highlights the fact that for a Bayesian life is
477: hardly a bed of roses\cite{Wasserman}.
478:
479:
480: \section{THE PHYSICIST'S FAVOURITE TOY}
The typical high energy physics experiment consists of doing a
large number $T$ of similar things (for example, proton--antiproton
collisions) and searching for $n$ interesting
outcomes (for example, $t\bar{t}$ production). We invariably
485: assume that the order of the collisions is irrelevant and that
486: each interesting outcome occurs with equal probability. Then we
487: may avail ourselves of the well-known fact that the probability
488: assigned to $n$ outcomes out of $T$ trials, with our assumptions,
is binomial. Since $n \ll T$, this probability can be approximated
490: by a Poisson distribution
491: \begin{equation}
492: \like{n}{\mu} = \frac{e^{-\mu} \mu^{n}}{n!}, \label{eq:poisson}
493: \end{equation}
494: and thus do we arrive at the physicist's favourite toy. The symbol
495: $I$ denotes all prior information and assumptions that led us to
this probability assignment. Here, it is introduced for
pedagogical reasons: to remind us of the fact that {\em all}
498: probabilities are conditional. We shall assume that our aim is to
499: infer something about the Poisson parameter $\mu$, given that we
500: have observed $n$ events. Just for fun, we'll give this problem to
501: each workshop member. Naturally, being physicists, each of us
502: insists on parameterizing this problem as we see fit, but in the
503: end when we compare notes we shall do so in terms of the parameter
504: $\mu$, by transforming to that parameter.
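
The Poisson approximation invoked above follows from a standard limit. As a
sketch, write the probability of an interesting outcome per trial as $\mu/T$,
so that $\mu$ is the expected number of such outcomes, and let
$T \rightarrow \infty$ with $n$ and $\mu$ held fixed:
\begin{eqnarray}
{T \choose n} \lr{\frac{\mu}{T}}^{n} \lr{1-\frac{\mu}{T}}^{T-n}
& = & \frac{\mu^n}{n!} \, \frac{T(T-1)\cdots(T-n+1)}{T^n} \,
\lr{1-\frac{\mu}{T}}^{T-n} \nonumber \\
& \rightarrow & \frac{e^{-\mu}\mu^n}{n!},
\end{eqnarray}
since the middle factor tends to 1 and the last factor tends to $e^{-\mu}$.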
505:
506: There are, of course, infinitely many ways to parameterize a
507: likelihood function and the Poisson likelihood is no exception.
508: For simplicity, however, let's assume that each of us uses a
509: parameter $\mu_p$ related to $\mu$ as follows
510: \begin{equation}
511: \mu_p = \mu^p.
512: \label{eq:mup}
513: \end{equation}
514: ``$p$" for physicist if you like! In terms of the parameter
515: $\mu_p$ \Eq{poisson} becomes
516: \begin{equation}
517: \like{n}{\mu_p} = \frac{e^{-\mu_p^{1/p}} \mu_p^{n/p}}{n!},
518: \label{eq:poissonp}
519: \end{equation}
520: which, we note, does not alter the probability assigned to $n$.
521:
522: From Bayes' theorem
523: \begin{equation}
524: \post{\mu_p}{n} = \frac{\like{n}{\mu_p}
525: \prior{\mu_p}}{\int_{\mu_p} \like{n}{\mu_p} \prior{\mu_p}},
526: \label{eq:defpostp}
527: \end{equation}
528: each of us can make inferences about our parameter $\mu_p$, and
529: hence $\mu$. Of course, no one can proceed without specifying a
530: prior probability $\prior{\mu_p}$. Unfortunately, being mere
531: physicists we do not know what its form should be. But since we
532: are all in the same state of knowledge regarding our parameter,
533: coherence would seem to demand that we use the same functional
534: form. So without a shred of motivation let's try the following
535: form for the prior probability
536: \begin{equation}
537: \prior{\mu_p} = \mu_p^{-q} d\mu_p. \label{eq:aprior}
538: \end{equation}
539: Although this prior is plucked out of thin air, it is actually
540: more general than it appears because, in principle, $q$ could be
541: an arbitrarily complicated function of $p$. Now each of us is in a
542: position to calculate, assuming that the allowed parameter space
543: for $\mu_p$ is $[0,\infty)$. We each find that
544: \begin{equation}
545: \post{\mu_p}{n} = \frac{e^{-\mu_p^{1/p}} \mu_p^{n/p-q}
546: d\mu_p}{p\Gamma(n-pq+p)}.
547: \label{eq:postp}
548: \end{equation}
549: But as agreed, each of us transforms our posterior probability to
550: the parameter $\mu$ using \Eq{mup}. Thus we obtain, from
551: \Eq{postp},
552: \begin{equation}
553: \post{\mu}{n} = \frac{e^{-\mu} \mu^{n-pq+p-1}
554: d\mu}{\Gamma(n-pq+p)}.
555: \label{eq:postmu}
556: \end{equation}
557: Unfortunately, something is seriously amiss with the family of
558: posterior probabilities represented by \Eq{postmu}: each of us has
559: ended up making a different inference about the same parameter
560: $\mu$! We can see this more clearly by computing the $r$th moment
561: \begin{eqnarray}
562: \label{eq:mr}
563: m_r & \equiv & \int_{\mu} \mu^r \post{\mu}{n} \\ \nonumber
564: & = & \Gamma(n-pq+p+r)/\Gamma(n-pq+p),
565: \end{eqnarray}
566: of the posterior probability $\post{\mu}{n}$. The moments clearly
567: depend on $p$, that is, on how we have chosen to parameterize the
568: problem.
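
The normalization in \Eq{postp} and the dependence on $p$ can be checked
directly. Assuming $p > 0$, the substitution $\mu_p = \mu^p$,
$d\mu_p = p\mu^{p-1}d\mu$ gives
\begin{equation}
\int_0^\infty e^{-\mu_p^{1/p}} \mu_p^{n/p-q} d\mu_p
= p \int_0^\infty e^{-\mu} \mu^{n-pq+p-1} d\mu = p\,\Gamma(n-pq+p),
\end{equation}
provided that $n-pq+p > 0$. As an illustrative special case (one not singled
out in the discussion above), take $q = 0$, that is, a prior flat in
$\mu_p$. The posterior mean is then $m_1 = n + p$: the physicist who works
with $\mu$ itself ($p=1$) reports $n+1$, while the one who works with
$\mu^2$ ($p=2$) reports $n+2$, from the same data.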
569:
570: What does a Bayesian have to say about this state of affairs? Is
571: it a problem? I would say yes, it is. But there are some Bayesians
572: who call themselves ``subjective Bayesians" and others who believe
573: themselves to be ``objective Bayesians." I confess that these
574: terms leave me a bit baffled. The latter term because it seems to
575: be an oxymoron and the former because it seems to be superfluous.
576: The fundamental Bayesian pact is this: The prior probability is an
577: encoding of a state of knowledge; as such it is a subjective
578: construct. That construct may encode one's personal state of
579: knowledge or belief, and that's a fine thing to do and is very
580: powerful. But it may also encode a state of knowledge that is not
581: specifically yours and that too is just fine. The issue is one of
582: encoding a state of knowledge: Are there any desiderata that
583: should be respected? The subjectivist is probably inclined to say
584: no: simply choose the parameterization that makes sense for you
585: and associate a prior, declare it to be supreme, and force all
586: other priors to differ from yours in just the right way to render
587: an inference about $\mu$ unique. So a ``subjective" Bayesian would
588: presumably reject \Eq{aprior}.
589:
590: I believe that to make headway, we should entertain some further
591: principles. They should not degenerate into dogma but should serve
592: as a lantern in the dark. Here are two possible principles:
593: \begin{itemize}
594: \item Possible Principle 1: For the same likelihood and the same
595: form of prior we should obtain the same inferences.
596: \item Possible Principle 2: The moments of the posterior
597: probability should be finite.
598: \end{itemize}
599: Let's apply these tentative principles to the moments in \Eq{mr}.
600: Principle 1 says that each of us should make the same inferences
about $\mu$, that is, the moments ought not to depend on the whim
of a workshop member; they ought not to depend on $p$. Principle 2
603: says that $m_r < \infty$. Together these principles imply that
604: \begin{equation}
605: -pq + p = a > 0,
606: \end{equation}
607: where $a$ is a constant. This leads to the following prior
608: \begin{equation}
609: \prior{\mu_p} = \mu_p^{a/p-1} d\mu_p.
610: \end{equation}
611: But we didn't quite make it; our principles are insufficient to
612: uniquely specify a value for the constant $a$. We need something
613: more. Here is something more, suggested by Vijay
614: Balasubramanian\cite{Bala}:
615: \begin{itemize}
616: \item Possible Principle 3: When in doubt, choose a prior that
617: gives equal weight to
618: all likelihoods indexed by the same parameters.
619: \end{itemize}
620: That is, impose a {\em uniform} prior on the space of
621: distributions. This requirement is a much more reasonable one
622: (here is that word again) than imposing uniformity on the space of
623: parameters because the space of distributions is invariant,
624: whereas that of parameters is not. The space of distributions is
625: akin to a space containing invariant objects like the vectors in a
626: vector space, whereas the parameter space is analogous to the
627: non-invariant space of vector coordinates. In our case, we impose
628: a uniform prior on the space inhabited by Poisson distributions.
629: Balasubramanian has shown that a uniform prior on the space of
630: distributions induces, locally, a Riemannian metric whose
631: invariant measure is determined by the Fisher Information, $F$.
632: For our toy model the invariant measure is
633: \begin{equation}
634: \label{eq:jeffprior} \prior{\mu_p} = F^{1/2} d\mu_p,
635: \end{equation}
636: where
637: \begin{equation}
638: F(\mu_p) = -\left < \frac{d^2 \log \like{n}{\mu_p}}{d\mu_p^2}
639: \right >.
640: \end{equation}
641: Equation~(\ref{eq:jeffprior}) is called the {\em Jeffreys prior}.
642: It gives $a = 1/2$ and thus uniquely specifies the form of the
643: prior probability. Possible Principle 3 is a generalization of
644: Possible Principle 1. Thus we conclude that the prior probability
645: that forces us all to make the same inference, regardless of how
646: we choose to parameterize the problem, is
647: \begin{equation}
\label{eq:prior} \prior{\mu_p} = \mu_p^{\frac{1}{2p}-1} d\mu_p.
649: \end{equation}
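As a check on the value $a = 1/2$, take the simplest case $p = 1$, so
that $\mu_p = \mu$, and assume the Poisson likelihood
$\like{n}{\mu} = e^{-\mu}\mu^n/n!$ of the toy model. Then
\[
\log \like{n}{\mu} = n \log \mu - \mu - \log n!, \qquad
\frac{d^2 \log \like{n}{\mu}}{d\mu^2} = -\frac{n}{\mu^2},
\]
and, since $\left < n \right > = \mu$,
\[
F(\mu) = \frac{\left < n \right >}{\mu^2} = \frac{1}{\mu}, \qquad
F^{1/2} d\mu = \mu^{-1/2} d\mu = \mu^{a-1} d\mu \quad \mbox{with}
\quad a = \frac{1}{2}.
\]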
650:
651: This is all very tidy. However, when Jeffreys\cite{Jeffreys}
applied his general prior probability to the Gaussian, treating
both its mean and standard deviation together, he got a result he
did not like. He therefore suggested another principle:
655: \begin{itemize}
656: \item Possible Principle 4: If the parameter space can be
657: partitioned into subspaces that, {\em a priori}, are considered
658: independent then the general prior should be applied to each
659: subspace separately.
660: \end{itemize}
This gave him a prior he liked. Alas, for a Bayesian, life is not
easy. While the frequentist struggles with justifying the use of a
particular non-objective ensemble, the Bayesian struggles to
justify why some set of additional principles for encoding minimal
prior knowledge is reasonable. Meanwhile, the ``subjective
Bayesian'' says this is all a mere chasing after shadows. And so it
goes!
668:
\section{THE REAL WORLD}
The foregoing discussion might suggest that we should ``abandon all
hope, ye who enter'' the real world of inference problems. Fortunately, it
672: is not quite so bleak. The real world imposes some very severe
673: constraints on what we can reasonably be expected to do. For one
674: thing, the lifetime of a physicist is finite, indeed, short when
675: compared with the age of the universe. Technical resources are
676: also finite. And then there is competition from fellow physicists.
677: Finally, uncertainty in abundance is the norm. Perhaps with enough
678: deep thought all inference problems can be solved in a pristine
679: manner. In practice, we are forced to exercise a modicum of
680: judgement when undertaking any realistic analysis. We introduce
681: approximations as needed, we side-step difficult issues by
682: accepting some conventions and we rely upon our ability not to get
683: lost amongst the trees. But when I reflect on what must be done to
684: measure, say, the top quark mass, a problem replete with
uncertainties in the jet energy scale, acceptance, background,
luminosity and Monte Carlo modeling, to name but a few, it strikes me
687: as desirable to have a coherent and intuitive framework to think
688: about such problems. Bayesian Probability Theory provides
689: precisely such a framework. Moreover, it is a framework that
690: mitigates our propensity to get confused about statistics when the
691: going gets tough. The second example I discuss shows that real
692: science can be done in spite of prior anxiety\cite{Wasserman}.
693:
694:
695: \subsection{Measuring the Solar Neutrino Survival Probability}
696:
697: It has been known for over a quarter of a century that fewer
698: electron neutrinos are received from the Sun than expected on the
699: basis of the Standard Solar Model
700: (SSM)\cite{review,history,bp98,bp95,ssm93}.
701: This is the famous solar neutrino
702: problem. Figure~\ref{fig:rates} summarizes the situation as of
703: Neutrino 98.
704: \begin{figure}[htbp]
705: \begin{center}
706: \includegraphics[width=8.5cm,angle=-90]{bp98rates.eps}
707: \caption{Predictions of the 1998 Standard Solar Model of Bahcall
708: and Pinsonneault relative to data presented at Neutrino 98.
709: Courtesy J.N.~Bahcall.} \label{fig:rates}
710: \end{center}
711: \end{figure}
If the SSM is correct, and there is very strong evidence in its
favour\cite{helio}, then the inevitable conclusion is that a
fraction of the electron neutrinos created in the solar core are
lost before they reach detectors on Earth. The loss of electron
neutrinos is parameterized by the {\em neutrino survival
probability}, $\ps$, which is the probability that a solar
electron neutrino $\nu$ of energy $\Enu$ arrives at the Earth as an
electron neutrino.
719:
720: Several loss mechanisms have been suggested, such as the
721: oscillation of electron neutrinos to less readily observed states
722: such as muon, tau or sterile neutrinos\cite{gribov,wolfenstein}.
723: Many $\chi^2$-based analyses have been performed to estimate model
724: parameters\cite{hata, lui, parke}. To the degree that a fit to the
725: solar neutrino data is good it provides evidence in favour of the
726: particular new physics that has been assumed. From this
727: perspective, solar neutrino physics is yet another way to probe
728: physics beyond the Standard Model.
729:
730: But I'd like to address a more modest question:
731: What do the data tell us about the solar neutrino
732: survival probability independently of any particular model of new physics?
733: We can provide a complete answer by computing
734: the posterior probability of different
735: hypotheses about the value of the survival probability, for a given neutrino
energy\cite{bhat,gates}. Our Bayesian analysis comprises four components:
737: \begin{itemize}
738: \item The model
739: \item The data
740: \item The likelihood
741: \item The prior
742: \end{itemize}
743: First we sketch the model. (See Ref.~\cite{bhat} for details.)
744:
745: The solar neutrino capture rate $S_i$ on chlorine and gallium can
746: be written as
747: \begin{equation}
748: S_i = {\sum_j} \Phi_j \int \ps \sigma_i(E_{\nu}) \phi_j(E_{\nu})
749: d\Enu, \label{eq:radio}
750: \end{equation}
751: where $\Phi_j$ is the total flux from neutrino source $j$,
752: $\phi_j$ is the normalized neutrino energy spectrum and $\sigma_i$
753: is the cross section for experiment $i$. The predicted spectrum,
754: plus experimental energy thresholds, are shown in
755: Fig.~\ref{fig:spectrum}. The full spectrum consists of eight
756: components (of which six are shown in Fig.~\ref{fig:spectrum}),
757: with total fluxes $\Phi_1$ to $\Phi_8$\cite{bp98}.
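
One simple consequence of \Eq{radio} is worth making explicit. If,
purely for illustration, the survival probability were a constant
$\bar{p}$, independent of energy, it would factor out of the integral:
\[
S_i = \bar{p} \, {\sum_j} \Phi_j \int \sigma_i(E_{\nu})
\phi_j(E_{\nu}) d\Enu = \bar{p} \, S_i^{0},
\]
where $S_i^{0}$ (notation introduced here) denotes the rate predicted
for $\ps = 1$. A single measured rate can therefore fix only one
weighted average of $\ps$; information about its energy dependence
requires experiments with different cross sections $\sigma_i$.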
758: \begin{figure}[htbp]
759: \begin{center}
760: \includegraphics[width=8.5cm,angle=-90]{bp98spectrum.eps}
761: \caption{Solar neutrino energy spectrum as predicted by the
762: Bahcall-Pinsonneault 1998 Standard Solar Model, including the
763: neutrino energy thresholds for different solar neutrino
764: experiments. Courtesy J.N.~Bahcall.} \label{fig:spectrum}
765: \end{center}
766: \end{figure}
767:
768: The Super-Kamiokande experiment\cite{neutrino98} measures the
electron recoil spectrum arising from the scattering of the \B\ neutrinos
(plus higher energy neutrinos) off atomic electrons. We
771: shall use the electron recoil spectrum reported at Neutrino 98.
772: The spectrum spans the range 6.5 to 20~MeV. Light water
773: experiments, like Super-Kamiokande, are sensitive to all neutrino
774: flavors but do not distinguish between them. There are, therefore,
two possibilities. The $\nu_e$ deficit could be caused by $\nu_e$
conversions to $\nu_x$, where $x$ is either $\mu$ or $\tau$. If so,
the measured neutrino flux would be the sum of these flavors. If,
however, the $\nu_e$ are simply lost without a trace, for example
because of conversion into sterile neutrinos, then the measured
flux would consist of $\nu_e$ only. Like the rates for the
781: radiochemical experiments, the measured electron recoil spectrum
782: is linear in the neutrino survival probability. The data are shown
783: in Fig.~\ref{fig:recoil}.
784: \begin{figure}[htbp]
785: \begin{center}
786: \includegraphics[width=8.5cm]{bp98recoil.eps}
787: \caption{Electron recoil spectrum measured by Super-Kamiokande
788: compared to spectrum predicted by the Bahcall-Pinsonneault 1998
789: Standard Solar Model. From Ref.~\cite{superk}.} \label{fig:recoil}
790: \end{center}
791: \end{figure}
792:
793:
794: For solar neutrino experiments, a reasonable definition of
sensitivity is the product of the cross section and the
spectrum\cite{bhat}. This quantity is plotted in
797: Fig.~\ref{fig:sensitivity}. Two points are noteworthy: each
798: experiment is sensitive to different parts of the neutrino energy
799: spectrum and there are regions in neutrino energy where the
800: sensitivity is essentially zero. We should anticipate that these
801: facts will constrain what we are able to learn about the neutrino
802: survival probability from the current solar neutrino data.
803: %
804: \begin{figure}[htbp]
805: \begin{center}
806: \includegraphics[width=8.5cm]{bp98sensitivity.eps}
807: \caption{Spectral sensitivity as a function of the neutrino
808: energy. From Ref.~\cite{bhat}.} \label{fig:sensitivity}
809: \end{center}
810: \end{figure}
811: %
812:
813: Since we do not know the cause of the solar neutrino deficit,
814: let's adopt a purely phenomenological approach to the survival
815: probability. Guided by the results from previous
816: analyses~\cite{hata,lui,parke,cbhat} we write the survival
817: probability as a sum of two finite Fourier series:
818: \begin{eqnarray}
819: \label{eq:parametric}
\psa & = & \sum_{r=0}^{7}
a_{r+1} \cos(r\pi E_{\nu}/L_1) / (1 +
\exp[(E_{\nu}-L_1)/b])
\\ \nonumber
& + & \sum_{r=0}^{3} a_{r+9} \cos(r \pi E_{\nu}/L_2),
825: \end{eqnarray}
826: where now we explicitly note the fact that the survival
827: probability depends upon the set of parameters $a$.
828: %
The first term in \Eq{parametric} is defined on the interval 0.0
to $L_1$~MeV and is suppressed beyond $L_1$ by the exponential. The
second term spans the interval 0.0 to $L_2$~MeV. We have divided
832: the function this way to model a survival probability that varies
833: rapidly in the interval 0.0 to $L_1$ and less so elsewhere. The
834: parameters $L_1$, $L_2$ and $b$ are set to 1.0, 15.0 and 0.1~MeV,
835: respectively.
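
Because \Eq{parametric} is linear in the parameters $a$, so are the
predicted radiochemical rates. Write $\psa = \sum_{k=1}^{12} a_k
f_k(\Enu)$, where $f_k$ (a label introduced here) stands for the
$k$th basis function, that is, a cosine multiplied by the suppression
factor for $k \leq 8$ and a plain cosine for $k \geq 9$. Substituting
into \Eq{radio} gives
\[
S_i = \sum_{k=1}^{12} a_k \, C_{ik}(\Phi), \qquad
C_{ik}(\Phi) \equiv {\sum_j} \Phi_j \int f_k(E_{\nu})
\sigma_i(E_{\nu}) \phi_j(E_{\nu}) d\Enu.
\]
The integrals in the coefficients $C_{ik}$ (also a label introduced
here) do not depend on $a$, so they need to be evaluated only once
for each flux component.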
836:
837: We now consider the likelihood function $\like{D}{H}$, where $H$
838: denotes the hypothesis under consideration. The likelihood is
839: assumed to be proportional to a multi-variate Gaussian
840: $g(D|S,\Sigma)$, where $D \equiv (D_1,\ldots,D_{19})$ represents
the 19 data (3 rates from the chlorine and gallium experiments
plus 16 rates from the binned Super-Kamiokande electron recoil
spectrum, Fig.~\ref{fig:recoil}), $\Sigma $ denotes the
844: $19\times19$ error matrix for the experimental data and $S \equiv
845: (S_1,\ldots,S_{19})$ represents the predicted rates.
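Written out, and assuming the standard normalization of a
multi-variate Gaussian in 19 dimensions, this is
\[
g(D|S,\Sigma) = \frac{1}{(2\pi)^{19/2} |\Sigma|^{1/2}}
\exp \lrb{ -\frac{1}{2} (D - S)^{T} \Sigma^{-1} (D - S) },
\]
where $|\Sigma|$ is the determinant of the error matrix. The exponent
is $-\chi^2/2$, with $\chi^2 = (D - S)^{T} \Sigma^{-1} (D - S)$, the
same quantity that is minimized in a $\chi^2$-based fit. The
parameters under study enter only through the predicted rates $S$.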
846:
847: The remaining ingredient is the prior probability. First we assess
848: our state of knowledge. There are two sets of parameters to be
849: considered: the total fluxes $(\Phi_1,\ldots,\Phi_8)$ and the
850: survival probability parameters $(a_1,\ldots,a_{12})$. The
851: hypotheses under consideration concern the values of these two
852: sets of parameters. The Standard Solar Model provides predictions
853: $\Phi^0 \equiv (\Phi_1^0,\ldots,\Phi_8^0)$ for the total fluxes,
854: together with estimates of their {\em theoretical} uncertainties.
855: So here is an analysis that must deal with theoretical
856: uncertainties in some sensible way. I do not know how such a thing
857: can be addressed in a manner consistent with frequentist precepts.
For a Bayesian, uncertainty is, well, uncertainty, regardless of
provenance; therefore, every sort can be treated identically. We
represent our state of knowledge regarding the fluxes by a
multi-variate Gaussian prior probability
$\prior{\Phi} = g(\Phi|\Phi^0,\Sigma_{\Phi})$, where $\Phi^0$ is the vector of
flux predictions and $\Sigma_{\Phi}$ is the corresponding error
matrix\cite{bp98}.
865:
866: Unfortunately, we know very
867: little about the parameters $a_1, \ldots,a_{12}$, so we shall
868: short-circuit discussion by taking, as a matter of convention, the
869: prior probability for $a$ to be uniform. In practice, any other
870: plausible choice makes very little difference to our conclusions.
871: We may even find that a uniform prior for $a$ is consistent with
872: the generalized Jeffreys prior. Thus we arrive at the following
873: prior for this inference problem:
874: \begin{eqnarray}
875: \prior{a,\Phi} & = & {\rm Prior}(a|\Phi,I) \prior{\Phi} \\
876: \nonumber
877: & = & da \prior{\Phi},
878: \end{eqnarray}
879: where $I$ now includes the prior information from the Standard
880: Solar Model.
881:
882: Now we can calculate! The posterior probability is given by
883: \begin{equation}
884: \post{a,\Phi}{D} = \frac{\like{D}{a,\Phi} \prior{a,\Phi}}
885: {\int_{a,\Phi}\like{D}{a,\Phi} \prior{a,\Phi} }.
886: \end{equation}
But since we aren't really interested in the total fluxes,
888: probability theory dictates that we just marginalize (that is,
889: integrate) them away to arrive at the quantity of interest
890: $\post{a}{D}$. Actually, what we really want is the probability of
891: the survival probability for a given neutrino energy $\Enu$! That
892: is, we want
893: \begin{equation}
\post{p}{D} = \int_{a} \delta(p - \psa) \post{a}{D}.
895: \end{equation}
896: Figure~\ref{fig:parme} shows contour plots of $\post{p}{D}$ for the two
897: cases considered, conversion to sterile and active neutrinos.
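
Spelled out in the same notation, the marginalization over the fluxes
is
\[
\post{a}{D} = \int_{\Phi} \post{a,\Phi}{D}.
\]
The delta function in the expression for $\post{p}{D}$ simply changes
variables from $a$ to $p$. As an illustration only (the numerical
method is not specified here), if $K$ parameter points
$a^{(1)},\ldots,a^{(K)}$ could be sampled from $\post{a}{D}$ then, for
a fixed energy $\Enu$,
\[
{\rm Pr}(p_1 < p < p_2|D,I) \approx \frac{1}{K} \sum_{k=1}^{K}
\theta \lr{p(\nu|\Enu,a^{(k)}) - p_1} \,
\theta \lr{p_2 - p(\nu|\Enu,a^{(k)})},
\]
where $\theta$ is the unit step function. In words: evaluate
\Eq{parametric} at each sampled point and histogram the results.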
898:
899: Our Bayesian analysis has produced a result that, intuitively,
900: makes a lot of sense. As expected, given the sensitivity plot in
901: Fig.~\ref{fig:sensitivity}, our knowledge of the survival
902: probability is very uncertain between 1 and 5 MeV. In fact, the
903: survival probability is tightly constrained in only two narrow
904: regions: in the \Be\ region just below 1 MeV and another at around
905: 8 MeV, near the peak of the \B\ neutrino spectrum. For neutrino
906: energies above 12 MeV or so, the survival probability is basically
907: unconstrained by current data.
908:
909: \begin{figure}[htbp]
910: \begin{center}
911: \begin{minipage}[t]{0.46\linewidth}
912: \includegraphics[width=7cm,angle=-90]{bp98sterile.eps}
913: \end{minipage}
914: \begin{minipage}[t]{0.46\linewidth}
915: \includegraphics[width=7cm,angle=-90]{bp98active.eps}
916: \end{minipage}
917: \end{center}
\caption{Survival probability {\it vs.} neutrino energy, assuming
conversion of $\nu_e$ to sterile neutrinos, so that the measured flux
consists of $\nu_e$ only (left plot), and conversion of
$\nu_e $ to active neutrinos (right plot).} \label{fig:parme}
921: \end{figure}
922:
923: \section{SUMMARY}
924: It has been claimed by some at this workshop that Bayesian methods
are of limited use in physics research. This, of course, is not true
926: as I hope to have shown. Bayesian methods are, however, explicitly
927: subjective and this may give one pause. I have argued that
928: frequentist methods are not nearly as objective as claimed. While
929: Bayesians cannot avoid the irreducible subjectivism of prior
930: probabilities, frequentists cannot avoid the use of ensembles that
931: do not objectively exist. Frequentists struggle with any
932: uncertainty that does not arise from repeated sampling, like
933: theoretical errors, while for Bayesians uncertainty in all its
934: forms is treated identically. On the other hand, some Bayesians
935: struggle to convince us that a particular choice of prior is
936: reasonable, while frequentists look on in amusement. The point is
937: neither approach is free from warts. But, of the two approaches to
938: inference, I would say that the Bayesian one has more to offer, is
939: easier to understand, has greater conceptual cohesion and, the
940: most important point of all, more closely accords with the way we
physicists think\cite{bayes}. And this is the real reason why it
942: should be embraced.
943:
944:
945: \vskip1cm \noindent
946:
947: \section*{ACKNOWLEDGEMENTS}
948:
949:
950: I wish to thank the organizers for hosting this most enjoyable workshop.
951: It was a particular pleasure for me to meet again my dear friend, and
intellectual sparring partner, Fred James, who must take all the
953: credit for arousing my interest in this arcane subject.
954: I thank my colleagues Chandra Bhat, Pushpa Bhat
955: and Marc Paterno with whom the solar neutrino work was done, John
956: Bahcall for providing the latest theoretical information and
957: Robert Svoboda for providing the 1998 Super-Kamiokande data in
958: electronic form. This work was supported in part by the U.S.
959: Department of Energy.
960:
961: \begin{thebibliography}{99}
962:
\bibitem{Jeffreys} H.~Jeffreys, Theory of Probability, 3rd edition,
964: Oxford University Press (1961). Chapters I, VII and VIII should be
965: required reading for anyone who values clear thinking.
966:
\bibitem{deFinetti} B.~de~Finetti, Theory of Probability, Vol. 1,
968: John Wiley \& Sons Ltd. (1990).
969:
970: \bibitem{Neyman} J.~Neyman, Phil. Trans. R. Soc. London {\bf A236}
971: (1937) 333. A beautiful paper, not nearly as daunting as one might
972: imagine.
973:
974: \bibitem{Feldman} G. Feldman and R. Cousins, Phys. Rev. {\bf D57}
975: (1998) 3873. A clear paper free of
976: frequentist/Bayesian muddle! The authors
977: make a sharp distinction between Bayesian and frequentist ideas
978: and then opt for a principled frequentism.
979:
980: \bibitem{Cousins}
981: R.~Cousins, Am. J. Phys. {\bf 63} (1995) 398. An excellent
982: accessible discussion about limits. And yes Bob, every physicist
983: is a Bayesian, but many don't know it! Ok, maybe Fred isn't!
984:
985: \bibitem{Fisher} R.~A.~Fisher: An Appreciation (Lecture Notes on
986: Statistics, Vol. 1), S.~E.~Fienberg and D.~V.~Hinkley, eds.
987: Springer Verlag (1990). Lots of interesting historical stuff about
988: Sir Ronald.
989:
990: \bibitem{Bala}
991: V.~Balasubramanian, Statistical Inference, Occam's Razor and
992: Statistical Mechanics on The Space of Probability Distributions,
993: Princeton University Physics Preprint PUPT-1587 (1996). Also
994: available electronically as preprint cond-mat/9601030. The
995: mathematics is a bit tricky, but the main ideas are not too hard
996: to grasp. It's worth a read.
997:
998: \bibitem{Wasserman}
999: R.~E.~Kass and L.~Wasserman, J. Am. Stat. Assoc., {\bf 91} (1996)
1000: 1343. Life is tough!
1001:
1002: \bibitem{review}
1003: T.~A.~Kirsten, \Journal{\REM}{71}{1213}{1999}.
1004:
1005: \bibitem{history}
1006: J.~N.~Bahcall and Raymond Davis, Jr., An account of the
1007: development of the solar neutrino problem, Essays in Nuclear
Astrophysics, eds. C.~A.~Barnes, D.~D.~Clayton and D.~Schramm,
Cambridge University Press (1982) pp. 243-285.\\ See also
http://www.sns.ias.edu/$\sim$jnb/Papers/Popular/snhistory.html
1011:
1012: \bibitem{bp98}
1013: J.~N.~Bahcall \etal, \Journal{\PLB}{433}{1}{1998}.
1014:
1015: \bibitem{bp95} J.~N.~Bahcall and M.~Pinsonneault,
1016: \Journal{\REM}{67}{781}{1995}.
1017:
1018: \bibitem{ssm93}
S.~Turck-Chi\`eze and I.~Lopes, Astrophys. J. {\bf
1020: 408} (1993) 347.
1021:
1022: \bibitem{helio}
1023: J.~N.~Bahcall, S.~Basu and M.~H.~Pinsonneault,
1024: \Journal{\PLB}{433}{1}{1998}.
1025:
1026: \bibitem{gribov} V.N. Gribov and B.M.~Pontecorvo,
1027: \Journal{\PLB}{28}{493}{1969};\\ J.N.~Bahcall and S.C.~Frautschi,
1028: \Journal{\PLB}{29}{623}{1969}; \\ S.L.~Glashow and L.M.~Krauss,
1029: \Journal{\PLB}{190}{199}{1987}.
1030:
1031: \bibitem{wolfenstein} L.~Wolfenstein,
1032: \Journal{\PRD}{17}{2369}{1978};\\ S.P.~Mikheyev and A.Yu.~Smirnov,
1033: \Journal{Sov. J. Nucl. Phys.}{42}{913}{1986}; \\ S.P.~Mikheyev and
1034: A.Yu.~Smirnov, \Journal{Nuovo Cimento C}{9}{17}{1986}.
1035:
\bibitem{hata} N.~Hata and P.~Langacker,
\Journal{\PRD}{50}{632}{1994}; N.~Hata and P.~Langacker,
\Journal{\PRD}{56}{6107}{1997}.
1039:
1040: \bibitem{lui} Q.~Y.~Liu and S.~T.~Petcov,
1041: \Journal{\PRD}{56}{7392}{1997}; \\
1042: A.B.~Balantekin, J.F.~Beacom,
1043: J.M.~Fetter, \Journal{\PLB}{427}{317}{1998}.
1044:
1045: \bibitem{parke}
1046: S.~Parke, \Journal{\PRL}{74}{839}{1995}.
1047:
1048: \bibitem{cbhat}
1049: C.M.~Bhat, 8th Lomonosov Conference on Elementary Particle
1050: Physics, Moscow, Russia, August 1997, FERMILAB-Conf-98/066;\\
1051: C.M.~Bhat \etal, Proceedings of the 9th Meeting of the DPF of
1052: the American Physical Society, ed. K.~Heller \etal, World
Scientific (1996) 1220.
1054:
1055: \bibitem{bhat}
1056: C.~M.~Bhat, P.~C.~Bhat, M.~Paterno and H.~B.~Prosper,
1057: \Journal{\PRL}{81}{5056}{1998}.
1058:
1059: \bibitem{gates}
1060: E.~Gates, L.M.~Krauss, M.~White, \Journal{\PRD}{51}{2631}{1995}.
1061:
1062: \bibitem{neutrino98}
1063: K.~Lande (Homestake), V.N.~Gavrin (SAGE), T. Kirsten (GALLEX) and
1064: Y.~Suzuki (Super-Kamiokande), Neutrino 98, Proceedings XVIIIth
1065: International Conference on Neutrino Physics and Astrophysics,
1066: Takayama, Japan, June 1998, eds. Y.~Suzuki and Y.~Totsuka;\\
1067: Robert Svoboda, private communication 1998.
1068:
1069: \bibitem{superk}
1070: The Super-Kamiokande Collaboration,
1071: \Journal{\PRL}{82}{2644}{1999}.
1072:
1073: \bibitem{bayes}
1074: See for example, G. D'Agostini, Bayesian Reasoning In High-Energy
1075: Physics: Principles And Applications, CERN-99-03 (1999) 183.
1076:
1077: \end{thebibliography}
1078:
1079: \end{document}
1080:
1081: