1:
2: \documentclass[12pt,letterpaper]{article}
3:
4: \newif\ifpdf % creates \ifpdf , \pdffalse and \pdftrue
5:
6: \ifx\pdfoutput\undefined
7: \pdffalse
8: \else
9: \pdfoutput=1
10: \pdftrue
11: \fi
12:
13:
14:
15:
16: \ifpdf
17: % \topmargin=-20mm
18: \topmargin=-38mm
19: \usepackage{graphicx,here,theorem,amssymb,amsmath,url,thumbpdf}
20: \pdfimageresolution=300
21: \DeclareGraphicsExtensions{.pdf,.jpg,.jpeg}
22: \usepackage[pdftex,pdfpagemode=None,
23: pdfstartview={XYZ},]{hyperref} % specify last
24: \else
25: % \topmargin=0mm
26: \topmargin=-18mm
27: \usepackage{graphicx,here,theorem,amssymb,amsmath,url}
28: \DeclareGraphicsExtensions{.eps,.ps,.eps.gz,.ps.gz}
29: \newcommand{\href}[2]{{#2}}
30: \fi
31:
32: %
33: \oddsidemargin=5mm
34: \evensidemargin=5mm
35: \textwidth=155mm
36: \textheight=230mm
37: % Allow the page size to vary a bit ...
38: \raggedbottom
% Keep LaTeX from being too fussy about line breaking ...
40: \sloppy
41:
42:
43:
44: %%%%%%%% new commands for s_true and epsilon_true %%%%%%%%%%
45: \newcommand{\st}{\ensuremath{s_\mathrm{true}}}
46: \newcommand{\et}{\ensuremath{\epsilon_\mathrm{true}}}
47:
48: \newcommand{\penn}{$^\mathrm{a}$}
49: \newcommand{\brandeis}{$^\mathrm{b}$}
50: \newcommand{\rutgers}{$^\mathrm{c}$}
51: \newcommand{\rockefeller}{$^\mathrm{d}$}
52: \newcommand{\oxford}{$^\mathrm{e}$}
53: \newcommand{\pisa}{$^\mathrm{f}$}
54: \newcommand{\toronto}{$^\mathrm{g}$}
55:
56:
57: \title{{\hfill\small CDF/MEMO/STATISTICS/PUBLIC/7117}\\\hfill\\
58: \bf\large Interval estimation in the presence of nuisance parameters.
59: 1.~Bayesian approach.}
60:
61: \pagenumbering{arabic}
62:
63: \author{
64: %\LARGE \phantom{xxxxx} CDF Statistics Committee \phantom{xxxxx} \and
65: Joel Heinrich\penn,
66: Craig Blocker\brandeis,
67: John Conway\rutgers,
68: Luc Demortier\rockefeller, \and
69: Louis Lyons\oxford,
70: Giovanni Punzi\pisa,
71: Pekka K.~Sinervo\toronto \and
72: \scriptsize\it\penn%
73: University of Pennsylvania, Philadelphia, Pennsylvania 19104 \and
74: \scriptsize\it\brandeis%
75: Brandeis University, Waltham, Massachusetts 02254 \and
76: \scriptsize\it \phantom{xxxxx} \rutgers%
77: Rutgers University, Piscataway, New Jersey 08855 \phantom{xxxxx} \and
78: \scriptsize\it\rockefeller%
79: Rockefeller University, New York, New York 10021 \and
80: \scriptsize\it\oxford%
81: University of Oxford, Oxford OX1 3RH, United Kingdom \and
82: \scriptsize\it\pisa%
83: Istituto Nazionale di Fisica Nucleare,
84: University and Scuola Normale Superiore of Pisa, I-56100 Pisa, Italy \and
85: \scriptsize\it\toronto%
86: University of Toronto, Toronto M5S 1A7, Canada}
87:
88: \date{September 27, 2004}
89:
90: \begin{document}
91:
92: \maketitle
93:
94:
95: \begin{abstract}
96: We address the common problem of calculating intervals in the presence
97: of systematic uncertainties. We aim to investigate several
98: approaches, but here describe just a Bayesian technique for setting
99: upper limits. The particular example we study is that of inferring
100: the rate of a Poisson process when there are uncertainties on the
101: acceptance and the background. Limit calculating software associated
102: with this work is available in the form of C~functions.
103: %%We address the common problem of setting limits on the rate of a
104: %%process using a counting experiment in the presence of uncertainties
105: %%on acceptances and backgrounds. We aim to investigate several
106: %%different approaches, but here describe just a Bayesian
107: %%technique. Limit calculating software associated with this study is
108: %%available in the form of C~functions.
109:
110: \end{abstract}
111:
112: \section{The problem}
113: \label{problem}
114: %A very common statistical problem of relevance to Particle Physics is
115: %the extraction of an upper limit on some hypothesised process,
116: A very common statistical procedure is obtaining a confidence interval
117: for a physics parameter of interest,
118: when there are uncertainties in quantities such as the acceptance of the
119: detector and/or the analysis procedure, the beam intensity, and the
120: estimated background. These are known in statistics as nuisance
121: parameters, or in Particle Physics as sources of systematic
122: uncertainty. We assume that estimates of these quantities are
123: available from subsidiary measurements.\footnote{
There are other possibilities. For example, it may be that all that is known is
that a nuisance parameter is contained within a certain range:
126: $\mu_\mathrm{l}\le\mu\le\mu_\mathrm{u}$;
127: that is not enough information for a Bayesian
128: approach. Alternatively the data relevant for the physics and nuisance
129: parameters could be bound up in the main measurement, and not require a
130: subsidiary one.}
131: A variant of
132: this procedure which is particularly relevant for Particle Physics is
133: the extraction of an upper limit on the rate of some hypothesized
134: process or on a physical parameter, again with systematic uncertainties.
135:
136:
137: To specify the problem in more detail, we assume that we are
138: performing a counting experiment in which we observe $n$ counts, and
139: that the acceptance has been estimated as $\epsilon_0 \pm
140: \sigma_\epsilon$ and the background as $b_0 \pm \sigma_b$. For a
141: signal rate $s$, $n$ is Poisson distributed with mean $s\epsilon + b$.
142: Here $\epsilon$ contains factors like the intensity of the accelerator
143: beam(s), the running time, and various efficiencies. It is constrained
144: to be non-negative, but can be larger than unity.
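
As a hedged illustration of this model (the numerical values below are
chosen purely for illustration and are not taken from any measurement),
the probability of observing $n$ counts can be written explicitly as
\[
P(n|s,\epsilon,b) = e^{-(s\epsilon+b)}\,\frac{(s\epsilon+b)^n}{n!} .
\]
For instance, with $s=2$, $\epsilon=1$ and $b=3$ the Poisson mean is
$s\epsilon+b=5$, and the probability of observing $n=5$ counts is
$e^{-5}\,5^5/5!\simeq0.175$.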
145:
146: We aim to study and compare different approaches
147: for determining confidence intervals for
148: this problem. In
149: general we are interested in pathologies in these areas:
150: \begin{itemize}
151:
152:
153: \item
154: {\bf Coverage.}
155: This is a measure of how often the limits that we deduce would
156: in fact include the true value of the parameter. This requires consideration of an ensemble
157: of experiments like the one we actually performed, and hence is an essentially
158: frequentist concept. Nevertheless, it can be applied to a Bayesian technique.
159:
160: Coverage is a property of the technique, and not of the particular
161: limit deduced from a given measurement. It can, however, be a function
162: of the true value of the parameter, which is in general unknown in a
163: real measurement.
164:
165: Undercoverage (i.e.\ the probability of containing
166: the true value is less than the stated confidence level) is regarded by frequentists as a
167: serious defect.
168: Usually coverage is required for all possible values of the physical
169: parameter.\footnote{The argument is that the parameter is unknown, and
170: so we wish to have coverage, whatever its value. This ensures that, if
171: we repeat our specific experiment many times, we should include the
172: true value within our confidence ranges in (at least) the stated
fraction of cases. This argument may, however, be over-cautious. The
dips in a coverage plot like that of
Fig.~\ref{fig:Bayes} occur at values which are not fixed in $s$, but
176: which depend on the details of our experiment (such as the values of
177: $\epsilon$ and $b$). These details vary from experiment to
178: experiment. Thus we could achieve `no undercoverage for the ensemble
179: of experiments measuring the parameter $s$', even if the individual
180: coverage plots did fall below the nominal coverage occasionally. Thus
181: in some sense `average coverage' would be sufficient (see
182: for example reference \cite{interplay}), although it is
183: hard to quantify the exact meaning of `average'. It should be stated
184: that this is not the accepted position of most High Energy Physics
185: frequentists.}
186: In contrast, overcoverage is permissible, but the larger intervals result
187: in less stringent tests of models that predict the value of the parameter. For
188: measurements involving quantised data (e.g.\ Poisson counting), most
189: methods have coverage which varies with the true value of the parameter of interest,
190: and hence if undercoverage is to be avoided, overcoverage is inevitable.
191:
192: Frequentist methods by construction will not undercover for any values
193: of the parameters. This is not guaranteed for other approaches. For
194: example, even though the Bayesian intervals shown here do not
195: undercover, in other problems Bayesian 95\% credible intervals could
196: even have zero coverage for some values of the parameter of
interest~\cite{zeroc}.
198: It should also be remarked that, although coverage is a very important
199: property for frequentists, on its own exact coverage does not
200: guarantee that intervals have desirable properties (for many examples,
see Refs.~\cite{clifford} and \cite{strong}).
202:
203:
204:
205:
206: %\item
207: %%How the limits on $s$ vary with the observation $n$. Low limits can
208: %%provide tighter rejection of incorrect values, but empty or very short
209: %%intervals are undesirable:
210: %Narrow intervals or low upper limits
211: %for the signal rate $s$
212: %can provide tighter rejection of
213: %incorrect values, but empty or very short intervals---intervals
214: %with very small Bayesian credibility---are undesirable.
215: %Although such intervals may formally enjoy frequentist coverage%
216: %%and/or Bayesian credibility
217: %, they simply lack a kind of conditional
218: %plausibility: {\em given} the type of measurement we are making, the
219: %resolution of the apparatus, etc.,\ it is very unlikely that the true
220: %value of the parameter we are interested in lies in the calculated
221: %interval.
222:
223: %We will also look at the mean interval length,
224: %even though it is not invariant with respect to reparametrisations of
225: %$s$, the variable of interest, e.g.\ $1/s$ or $\ln s$, or $s^2$,
226: %etc.\footnote{The {\bf median} upper limit is invariant with respect
227: %to monotonic reparametrisations, but is not commonly used.}
228:
229: \item
230: {\bf Interval length.}
231: This is sometimes used as a criterion of accuracy of
232: intervals, in the sense that shorter intervals have less probability of
233: covering false values of the parameter of interest. However, one should
234: keep in mind that short intervals are only desirable if they contain the
235: true value of the parameter. Thus empty intervals, which do occur in
236: some frequentist constructions, are generally undesirable, even when their
237: construction formally enjoys frequentist coverage.
238:
239: Intervals that fill the entire physically allowed range of the
240: parameter of interest may also occur in some situations. Examples of
241: this behavior are given in \cite{clifford} and \cite{zech}. An
242: experimenter who requests a 68\% confidence interval, but receives what
243: appears to be a 100\% confidence interval instead, may not be satisfied
244: with the explanation that he is performing a useful service in helping
245: to keep the coverage probability---averaged over his measurement and
246: his competitor's measurements---from dropping below 68\%.
247:
248: %This is sometimes used as a criterion of
249: %accuracy of intervals, but one should keep in mind that short
250: %intervals are only desirable if they contain the true value of the
251: %parameter. Thus empty intervals, which do occur in some frequentist
252: %constructions, are generally undesirable, even when their construction
253: %formally enjoys frequentist coverage.
254: %This problem is not necessarily
255: %restricted to empty intervals. Suppose we make a histogram of
256: %interval lengths over the relevant ensemble of measurements, and in
257: %each non-empty bin we calculate the average coverage probability for
258: %that bin. Ideally we would like the average coverage to be constant
259: %over all bins. On the other hand, it may happen that the average
260: %coverage increases as the interval widens; this could be seriously
261: %misleading. It is easy to see that upper limits suffer from this
262: %defect: the average coverage is zero for upper limits below the true
263: %value, and one above. In practice of course we do not know the true
264: %value, so that calculation of an upper limit does not affect our
265: %knowledge of the coverage. However, with two-sided intervals and in
266: %the presence of systematic uncertainties the situation may not be as
267: %straightforward.
268:
269:
270:
271: \item
272: {\bf Bayesian credibility.}
273: In some situations it may be relevant to calculate the Bayesian
274: credibility of an interval, even when the latter was constructed by
275: frequentist methods. This would of course require one to choose a
276: prior for all the unknown parameters. The question is one of
277: plausibility: given the type of measurement we are making, the
278: resolution of the apparatus, etc., how likely is it that the true
279: value of the parameter we are interested in lies in the calculated
280: interval? Does this
281: %more or less agree with
282: differ dramatically from
283: the nominal coverage probability of the interval?
284: In fact for different values of the observable(s), frequentist ranges
285: are very likely to have different credibilities. Some examples
286: of this behavior are noted in Ref.~\cite{karlen}.
287:
288: When calculating the Bayesian credibility of frequentist intervals,
289: ``uninformative'' priors appear advisable. Note that severe interval
290: length pathologies will automatically produce a large inconsistency
291: between the nominal coverage and the Bayesian credibility of an
292: interval. Except in a handful of very special cases, it is not
293: possible to construct an interval scheme that has simultaneously
294: constant Bayesian credibility and constant frequentist coverage, even
295: if one has total freedom in choosing the prior(s). Although it is not
296: at all clear exactly how large a level of disagreement is
297: pathological, nevertheless it may be instructive to know how severely
298: an interval scheme deviates from constant Bayesian credibility (and
299: how sensitive this is to the choice of prior).
300:
301:
302: \item
303: {\bf Bias.}
304: In the context of interval selection, this means having a larger
305: coverage $B(s',\st)$ for an incorrect value $s'$ of the parameter than
306: for the true value \st. This requires plots of coverage versus $s'$
307: for different values of \st. For upper limits, $B(s_1',\st) \ge
308: B(s_2',\st)$ if $s_1'$ is less than $s_2'$, so methods are necessarily
309: biassed for low $s'$.
310: %Bias is more interesting for 2--sided intervals.
311: Bias thus is not very interesting for upper limits.
312: %This is because almost
313: %all methods are biassed in giving larger coverage for $s'=0$ (where the
314: %coverage is usually 100\%) than for \st.
315: It will be discussed in
316: later notes dealing with two-sided intervals.
317:
318: \item
319: {\bf Transformation under reparametrisation.}
320: %Suppose that
321: %$\theta\in\Theta$ is the parameter we are interested in, and let $f$
322: %be an arbitrary transformation of $\Theta$. The calculated interval
323: %is then called ``transformation-respecting'' if the interval for
324: %$f(\theta)$ can be obtained by applying $f$ to the endpoints of the
325: %interval for $\theta$.
326: Intervals that are not transformation-respecting can be problematic. For
327: example, it is possible for the predicted value of the lifetime of a
328: particle to be contained within the 90\% interval determined from the
329: data, but for the corresponding predicted value of the decay rate (equal
330: to the reciprocal of the predicted lifetime) to be outside the 90\%
331: interval when the data is analysed by the same procedure, but in terms
332: of decay rate. This would result in unwanted ambiguities about the
333: compatibility of the data with the prediction.
334:
335: \item
336: %Range preservation.
337: {\bf Unphysical ranges.}
338: The question here is whether the interval
339: construction procedure can be made to respect the physical boundaries
340: of the problem.
341: For example, branching fractions should be in the range zero to one,
342: masses should not be negative, etc. Statements about the {\em true}
343: value of a parameter should respect any physical bounds. In contrast,
344: some methods give {\em estimates} of parameters which can themselves
345: be unphysical, or which include unphysical values when the errors are
346: taken into account. We do not recommend truncating ranges of {\em
347: estimates} of parameters to obey such bounds. Thus the fact that a
348: branching fraction is estimated as $1.1\pm0.2$ conveys more
349: information about the experimental result than does the statement that
350: it lies in the range 0.9 to 1.
351:
352:
353:
354: \item
355: {\bf Behavior with respect to nuisance parameter.}
356: We would normally expect that the limits on a physical parameter would
357: tighten as the uncertainty on a nuisance parameter decreases; and that
358: as this uncertainty tends to zero, the limits should agree with those
359: obtained under the assumption that the ``nuisance parameter'' was
360: exactly known. (Otherwise we could sometimes obtain a tighter limit
361: simply by pretending that we knew less about the nuisance parameter
362: than in fact is the case.) These desiderata are not always satisfied
363: by non-Bayesian methods (see \cite{ch} and \cite{feldman}).
364:
365:
366:
367: \end{itemize}
368:
369: Although we are ultimately interested in comparing different
370: approaches to this problem, in this note we investigate a Bayesian
371: technique for determining upper limits. Our purpose is to spell out in
372: some detail how this approach is used, and to discuss some of the
373: properties of the resulting limits in this specific example. We
374: believe that, for variants of this problem (e.g.\ different choice of
375: prior for $s$; alternative assumptions about the information on the
376: nuisance parameters; etc.), the reader could readily adapt the
377: techniques described here (and the associated software) to their
378: particular situation.
379:
380: We will report on two-sided intervals and also compare with
381: %Here we investigate a Bayesian technique. We will report on and
382: %compare with
383: other methods (e.g.\ Cousins--Highland, pure frequentist,
384: profiled frequentist) in later notes.
385:
386: \section{Reminder of Bayesian approach}
387: \label{Bayes}
388:
389: Before dealing with the problem of extracting and studying the limits on $s$ as deduced
390: from observing $n$ events from a
391: Poisson distribution with mean $s\epsilon + b$
392: in the presence of an uncertainty on $\epsilon$, we recall the way
393: the Bayesian approach works for the simpler problem of a counting experiment with no
394: background and with $\epsilon$ exactly known. Then $n$ is Poisson
395: distributed with mean $s\epsilon$,
396: and Bayes' Theorem\footnote{We follow the common convention whereby
397: lower case $\pi$'s denote prior p.d.f.'s, lower case $p$'s denote
398: other p.d.f.'s, upper case $\Pi$'s denote prior probabilities, and
399: upper case $P$'s denote other
400: probabilities. Equation~(\ref{eqn:BayesTh}) is true for probabilities,
401: p.d.f.'s, or mixtures depending on whether $B$ and/or $C$ are
402: discrete or continuous variables.}
403: \begin{equation}
404: P(B|C) = P(C|B)P(B)/P(C)
405: \label{eqn:BayesTh}
406: \end{equation}
407: gives
408: \begin{equation}
409: p(s|n) = \frac{P(n|s) \pi(s)}{\int P(n|s) \pi(s) \ ds}
410: \label{eqn:P(s|n)}
411: \end{equation}
412: where $\pi(s)$ is the prior probability density for $s$;
413: $p(s|n)$ is the posterior probability density function (p.d.f.)\ for $s$, given
414: the observed $n$; and
415: $P(n|s)$ is the probability of observing $n$, given $s$.
416:
417: We assume a constant prior for $s$,\footnote{This is an assumption, not a necessity, and
418: is in some ways unsatisfactory. (It is implausible, cannot be normalised, and
419: creates divergences for the posterior if used with a (truncated) Gaussian prior
420: for the acceptance $\epsilon$.)} and that $P(n|s)$ is given by the Poisson
421: \begin{equation}
422: P(n|s) = e^{-s\epsilon}(s\epsilon)^n/n!
423: \label{eqn:Poisson}
424: \end{equation}
425: Then\footnote{It turns out that the sum over $n$ of the discrete distribution
426: (\ref{eqn:Poisson}) and the integral over $s$ of the continuous distribution
427: (\ref{eqn:Poisson2}) are both equal to unity.
428: This means that the probability $P(n|s)$ and the probability density
429: $p(s|n)$ are correctly normalised.}
430: \begin{equation}
431: p(s|n) = \epsilon e^{-s\epsilon}(s\epsilon)^n/n!
432: \label{eqn:Poisson2}
433: \end{equation}
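The normalisation can be checked directly (a short verification, using
only the standard integral $\int_0^\infty e^{-x}x^n\,dx=n!$). Substituting
$x=s\epsilon$,
\[
\int_0^\infty\! P(n|s)\,ds =
\frac{1}{\epsilon}\int_0^\infty\! e^{-x}\,\frac{x^n}{n!}\,dx =
\frac{1}{\epsilon},
\]
so that dividing $P(n|s)\,\pi(s)$ by this integral, with $\pi(s)$
constant, supplies the leading factor $\epsilon$ in
eqn.~(\ref{eqn:Poisson2}).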
434: The limit is now obtained by integrating this posterior p.d.f.\ for $s$ until we
435: achieve the required fraction $\beta$ of the total integral from zero to infinity. If
436: $\beta$ is $90\%$, the upper limit $s_\mathrm{u}$ is given by
437: \begin{equation}
438: \int^{s_\mathrm{u}}_0\!\! p(s|n)\ ds =0.9
439: \label{eqn:0.9}
440: \end{equation}
Here $\beta$ is termed the credible level, or Bayesian confidence level, of the limit.
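
As a worked illustration of eqn.~(\ref{eqn:0.9}) (a sketch for the
simplest case only), take $n=0$, for which eqn.~(\ref{eqn:Poisson2})
reduces to $p(s|0)=\epsilon e^{-s\epsilon}$. Then
\[
\int^{s_\mathrm{u}}_0\!\! \epsilon e^{-s\epsilon}\,ds =
1-e^{-s_\mathrm{u}\epsilon} = 0.9
\qquad\Longrightarrow\qquad
s_\mathrm{u} = \frac{\ln 10}{\epsilon} \simeq \frac{2.3026}{\epsilon},
\]
which for $\epsilon=1$ reproduces the $n=0$ entries in the last two
columns of Table~\ref{ltablex}. For $n\ge1$ the corresponding equation
is transcendental and is in practice solved numerically.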
442:
443: For different observed $n$, the upper limits are shown in the last two
444: columns of Table~\ref{ltablex}, for $b = 0$ and for $b = 3$
445: respectively.
446: The Gaussian
447: approximation for the case $b = 0$, $n=20$, would yield
448: $s_\mathrm{u}\simeq20+1.28\sqrt{20}\simeq25.7$, which is roughly
449: comparable to the corresponding $s_\mathrm{u}=27.0451$ of the Table.
For $b = 0$ it turns out, coincidentally for this
particular example, that the Bayesian upper limits are identical to those
452: obtained in a frequentist calculation with the Neyman construction and
453: a simple ordering rule (see later note on the frequentist approach to
454: this problem). In general this is not so.
455: Other priors sometimes used for $s$ are $1/\sqrt s$ \cite{roots} or
456: $1/s$ \cite{ref:Jeffreys}. Having a prior peaked at smaller values of
457: $s$ in general results in tighter limits for a given observed $n$.
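
To illustrate this last point (a hedged example that assumes $n=0$,
$b=0$ and $\epsilon=1$, and takes the prior $\pi(s)\propto1/\sqrt s$),
the posterior is $p(s|0)\propto s^{-1/2}e^{-s}$, so that $2s$ follows a
$\chi^2$ distribution with one degree of freedom. The 90\% upper limit
is then
\[
s_\mathrm{u} = {\textstyle\frac{1}{2}}\,\chi^2_{1;\,0.90}
\simeq {\textstyle\frac{1}{2}}\times 2.706 \simeq 1.35,
\]
to be compared with $s_\mathrm{u}=2.3026$ for the constant prior.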
458:
459: If the whole procedure is now repeated with a background $b$ and a
460: flat prior, the upper limits not surprisingly decrease for increasing
461: $b$ at fixed $n$ (except for the case $n = 0$ where the limits can
462: trivially be seen to be independent of $b$). This is not inconsistent
463: with the fact that the mean limit for a series of measurements
464: increases with $b$, i.e.\ experiments with larger expected backgrounds
465: have poorer sensitivity.
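
The independence from $b$ at $n=0$ can be verified explicitly (a short
check, assuming $b$ exactly known and a constant prior for $s$). With
background, $P(n|s)=e^{-(s\epsilon+b)}(s\epsilon+b)^n/n!$, and for $n=0$
the factor $e^{-b}$ cancels between the numerator and the normalisation
integral of the posterior:
\[
p(s|0) = \frac{e^{-b}\,e^{-s\epsilon}}
{e^{-b}\int_0^\infty e^{-s\epsilon}\,ds} = \epsilon e^{-s\epsilon} .
\]
For general $n$, integrating the posterior above $s_\mathrm{u}$ gives the
condition
\[
1-\beta = \frac{\displaystyle e^{-(s_\mathrm{u}\epsilon+b)}
\sum_{k=0}^n{(s_\mathrm{u}\epsilon+b)^k\over k!}}
{\displaystyle e^{-b}\sum_{k=0}^n{b^k\over k!}} ,
\]
which for $n=1$, $b=3$, $\epsilon=1$ and $\beta=0.9$ is satisfied by the
tabulated $s_\mathrm{u}=2.8389$: the right-hand side evaluates to
$0.01992/0.19915\simeq0.100$.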
466:
467:
468:
469:
470: \begin{table}
471: \begin{center}
472: \scriptsize
473: \begin{tabular}{@{}r|rrrrrrrrr|rr@{}}
474: \hline
475: &\multicolumn{9}{c|}{$\epsilon=1.0\pm0.1$}&
476: \multicolumn{2}{c}{$\epsilon=1\pm0$}\\
477: $n$&\multicolumn{1}{c}{$b=0$}&
478: \multicolumn{1}{c}{1}&
479: \multicolumn{1}{c}{2}&
480: \multicolumn{1}{c}{3}&
481: \multicolumn{1}{c}{4}&
482: \multicolumn{1}{c}{5}&
483: \multicolumn{1}{c}{6}&
484: \multicolumn{1}{c}{7}&
485: \multicolumn{1}{c|}{8}&
486: \multicolumn{1}{c}{$b=0$}&
487: \multicolumn{1}{c}{3}\\
488: \hline
489: 0&2.3531&2.3531&2.3531&2.3531&2.3531&2.3531&2.3531&2.3531&2.3531&2.3026&2.3026\\
490: 1&3.9868&3.3470&3.0620&2.9019&2.8000&2.7297&2.6783&2.6391&2.6083&3.8897&2.8389\\
491: 2&5.4669&4.5520&3.9676&3.6026&3.3623&3.1953&3.0736&2.9816&2.9099&5.3223&3.5228\\
492: 3&6.8745&5.8618&5.0463&4.4644&4.0571&3.7666&3.5534&3.3922&3.2671&6.6808&4.3624\\
493: 4&8.2380&7.1964&6.2451&5.4751&4.8914&4.4569&4.1313&3.8832&3.6904&7.9936&5.3447\\
494: 5&9.5714&8.5213&7.5063&6.6022&5.8579&5.2719&4.8180&4.4660&4.1904&9.2747&6.4371\\
495: 6&10.8826&9.8288&8.7885&7.8047&6.9344&6.2066&5.6184&5.1499&4.7772&10.5321&7.5993\\
496: 7&12.1766&11.1203&10.0703&9.0460&8.0904&7.2450&6.5289&5.9387&5.4586&11.7709&8.7958\\
497: 8&13.4570&12.3984&11.3441&10.3014&9.2952&8.3635&7.5374&6.8300&6.2380&12.9947&10.0030\\
498: 9&14.7261&13.6655&12.6085&11.5575&10.5247&9.5365&8.6250&7.8142&7.1136&14.2060&11.2085\\
499: 10&15.9858&14.9233&13.8641&12.8090&11.7630&10.7415&9.7701&8.8758&8.0775&15.4066&12.4073\\
500: 11&17.2375&16.1732&15.1121&14.0542&13.0017&11.9621&10.9525&9.9966&9.1170&16.5981&13.5983\\
501: 12&18.4823&17.4163&16.3533&15.2934&14.2371&13.1881&12.1560&11.1582&10.2162&17.7816&14.7816\\
502: 13&19.7210&18.6535&17.5887&16.5269&15.4682&14.4139&13.3692&12.3452&11.3588&18.9580&15.9580\\
503: 14&20.9545&19.8854&18.8191&17.7554&16.6946&15.6373&14.5856&13.5459&12.5302&20.1280&17.1280\\
504: 15&22.1832&21.1127&20.0448&18.9795&17.9169&16.8572&15.8014&14.7528&13.7187&21.2924&18.2924\\
505: 16&23.4078&22.3359&21.2665&20.1996&19.1353&18.0737&17.0151&15.9612&14.9161&22.4516&19.4516\\
506: 17&24.6286&23.5553&22.4845&21.4161&20.3502&19.2868&18.2261&17.1689&16.1172&23.6061&20.6061\\
507: 18&25.8459&24.7714&23.6992&22.6294&21.5619&20.4969&19.4344&18.3747&17.3189&24.7563&21.7563\\
508: 19&27.0601&25.9844&24.9109&23.8397&22.7708&21.7042&20.6400&19.5784&18.5198&25.9025&22.9025\\
509: 20&28.2715&27.1946&26.1199&25.0474&23.9770&22.9090&21.8432&20.7799&19.7191&27.0451&24.0451\\
510: \hline
511: \end{tabular}
512: \end{center}
513: \caption{Upper 90\% limits for $n$ observed events with $b$ background
514: and $\epsilon=1.0\pm0.1$ ($\kappa=100$ and $m=99$,
515: as defined in section~\ref{submeas}). Also shown
516: are limits for $b=0$ and $b=3$ with fixed $\epsilon=1$.
517: \label{ltablex}}
518: \end{table}
519:
520:
521:
522:
523: %\begin{table}
524: %\begin{center}
525: %\begin{tabular}{c|ccc|cc}
526: %\hline
527: %Observed number & \multicolumn{3}{c|}{Bayes upper limit}
528: %& \multicolumn{2}{c}{Frequentist upper limit} \\
529: %\cline{2-6}
530: %\rule[-2.5mm]{0mm}{7.5mm}
531: % & $B1$ & $B2$ & $B3$
532: % & $Neyman$ & Feldman--Cousins \\
533: %\hline\hline
534: %0 & 2.30 & **** & **** & 2.30 & **** \\
535: %1 & 3.89 & **** & **** & 3.89 & **** \\
536: %2 & 5.32 & **** & **** & 5.32 & **** \\
537: %3 & 6.68 & **** & **** & 6.68 & **** \\
538: %4 & 7.99 & **** & **** & 7.99 & **** \\
539: %\hline
540: %\end{tabular}
541: %\end{center}
542: %\caption{Upper limits at the $90\%$ confidence (or credibility)
543: % level for $\epsilon = 1$ and no background.
544: %The upper limit is shown as a function on the Poisson observable $n$.
545: %*****This Table will change.**********}
546: %\label{tab:Bayes_limits}
547: %\end{table}
548:
549:
550: \subsection{Coverage}
551:
552: Next we can investigate the frequentist coverage $C(\st)$\footnote
553: {This is the coverage at $s = \st$ when the Poisson variable is generated with
554: $s = \st$. This differs from $B(s',\st)$ where the coverage is checked at
555: $s = s'$ when the generation value is \st.} of this Bayesian approach.
556: That is, we can ask what the probability is, for a given value of \st, of our upper
557: limit being larger than \st, and hence being consistent with it. This is equivalent
558: to adding up the Poisson probabilities
of eqn.~(\ref{eqn:Poisson}) for those values of $n$ for which $s_\mathrm{u}(n) \ge \st$, i.e.
560: \begin{equation}
561: C(\st) = \sum_{\mathrm{relevant}\ n}\!\!\!e^{-\st\epsilon}(\st\epsilon)^n / n!
562: \label{csum}
563: \end{equation}
564: As \st\ increases through any of the values of $s_\mathrm{u}$ of
565: the last two columns of
566: Table~\ref{ltablex}, the coverage drops sharply.
567: For example, for the case of zero background and efficiency known to
568: be unity, the 90\% Bayesian upper limits will include $\st=3.8896$ for
569: $n = 1$ or larger. But $\st=3.8898$ is no longer below the upper limit
570: for $n = 1$. Thus one term drops out of the summation of
571: eqn.~(\ref{csum}) for the calculation of the coverage at $\st=3.8898$,
572: while the remaining terms change but little for the small change in
573: \st; this produces the abrupt fall in coverage. The coverage is
574: plotted in Fig.~\ref{fig:Bayes}, where the drop at $\st=3.8897$ can be
575: seen.
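
The size of this drop can be evaluated directly from eqn.~(\ref{csum})
(a worked check for $\epsilon=1$, $b=0$ and $\beta=0.9$). Just below
$\st=3.8897$ all $n\ge1$ contribute, while just above it only $n\ge2$
contribute:
\[
C(3.8897^-) = 1-e^{-3.8897} \simeq 0.980,
\qquad
C(3.8897^+) = 1-e^{-3.8897}\,(1+3.8897) \simeq 0.900 .
\]
The coverage therefore falls from about 98\% to the nominal 90\%, the
size of the step being the Poisson probability of $n=1$,
$3.8897\,e^{-3.8897}\simeq0.080$.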
576: %The coverage is plotted in Fig.~\ref{fig:Bayes}.
577:
578: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
579:
580: {\footnotesize
581: \begin{quotation}
582: The calculation of $C(\st)$ can be done as follows:
583: The identity
584: \begin{equation}
585: f'(x)=
586: e^{-x}\left[\sum_{k=0}^{n-1}{x^k\over k!}-\sum_{k=0}^n{x^k\over k!}\right]=
587: -e^{-x}{x^n\over n!}
588: \qquad\mathrm{for}\qquad f(x)=e^{-x}\sum_{k=0}^n{x^k\over k!}
589: \end{equation}
590: allows us to write (integrating $-f'(x)$)
591: \begin{equation}
592: \int^{\st}_0\!\!\!\! p(s|n)\,ds =
593: \int^{\st\epsilon}_0\!\!\!\!e^{-x}{x^n\over n!}dx =
594: 1-e^{-\st\epsilon}\sum_{k=0}^n{(\st\epsilon)^k\over k!}
595: \end{equation}
596: From this, it follows that
597: ``relevant $n$'' is equivalent to any one of these
598: inequalities:
599: \begin{equation}
600: s_\mathrm{u}(n) \ge \st\Leftrightarrow
601: \int^{s_\mathrm{u}}_0\!\! p(s|n)\,ds\ge\int^{\st}_0\!\!\!\! p(s|n)\,ds\Leftrightarrow
602: \beta\ge1-e^{-\st\epsilon}\sum_{k=0}^n{(\st\epsilon)^k\over k!}
603: \end{equation}
604: and our expression for the coverage becomes
605: \begin{equation}
606: C(\st) = 1 - {\sum_{n=0}}'e^{-\st\epsilon}{(\st\epsilon)^n \over n!}
607: \end{equation}
608: where ${\sum}'$ means ``sum until the next term would cause the sum to
609: exceed $1-\beta$''. This result proves that $C(\st)\ge\beta$
610: for all values of \st\ in this simple example.
611:
612: \end{quotation}
613: }
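
As a hedged numerical illustration of this prescription (values chosen
only for illustration: $\st=3$, $\epsilon=1$, $\beta=0.9$), the first
Poisson term is $e^{-3}\simeq0.0498<1-\beta$, while including the $n=1$
term, $3e^{-3}\simeq0.1494$, would bring the sum to $0.1991>1-\beta$.
The primed sum therefore stops at $n=0$, giving
\[
C(3) = 1-e^{-3} \simeq 0.950,
\]
in agreement with Table~\ref{ltablex}: of the limits for fixed
$\epsilon$ and $b=0$, only $s_\mathrm{u}(0)=2.3026$ lies below $\st=3$.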
614:
615: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
616:
617:
618:
It is seen that the coverage starts at $100\%$ for small \st. This is because
even for $n = 0$ the Bayesian upper limit exceeds such values of \st, and the
limits for larger $n$ are larger still.
622:
623: Bayesian methods can be shown to achieve average coverage. By this we
624: mean that when the coverage is averaged over the parameter $s$,
625: weighted by the prior in $s$, the result will agree with the nominal
626: value $\beta$, i.e.
627: \begin{equation}
628: \frac{\int C(s)\ \pi(s)\ ds}{\int \pi(s)\ ds} = \beta
629: \end{equation}
630: A proof of this theorem is given in the second appendix, section~\ref{ac} of this note.
631:
632: For a constant prior, the region at large $s$ tends to dominate the
633: average, while in general we will be interested in the coverage at
634: small $s$. Thus the ``average coverage'' result is of academic rather
635: than practical interest, especially for the case of a flat prior.
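Indeed it is possible to have a situation where the average coverage
is, say, $90\%$, while the coverage as a function of $s$ is always
larger than or equal to $90\%$. This is not a contradiction when the prior
cannot be normalised, since the average is then determined entirely by the
behavior at arbitrarily large $s$.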
639:
640:
641: \section{The actual problem}
Our actual problem differs from the simple case of Section~\ref{Bayes}
in that (a) we have a background $b$, assumed for the time being to
644: be accurately known; and (b) we have an acceptance $\epsilon$
645: estimated in a subsidiary experiment as $\epsilon_0 \pm
646: \sigma_\epsilon$.
647:
648: What we are going to do is to use a multidimensional version of Bayes' Theorem
649: to express $p(s,\epsilon|n)$ in terms of $P(n|s,\epsilon)$ and the
650: priors for $s$ and $\epsilon$. The relationship is\footnote{
651: %This is deduced
652: For the case where the probabilities have a frequency ratio
653: interpretation, this is seen
654: from the mathematical identities\\ \\
655: $P(X\ \mathrm{and}\ Y\ \mathrm{and}\ Z) = \frac{N(X\ \mathrm{and}\ Y\ \mathrm{and}\ Z)}{N(Z)}\frac{N(Z)}{N_\mathrm{tot}} =P(X,Y|Z)\
656: P(Z)$
657: and \\ \\
658: $P(X\ \mathrm{and}\ Y\ \mathrm{and}\ Z) = \frac{N(X\ \mathrm{and}\ Y\ \mathrm{and}\ Z)}{N(X\ \mathrm{and}\ Y)}\frac{N(X\ \mathrm{and}\ Y)}{N_\mathrm{tot}} =
659: P(Z|X,Y)\ P(X,Y)$.\\ \\
660: So with $X$, $Y$ and $Z$ identified with $s$, $\epsilon$ and $n$ respectively, and with the prior for
661: $s$ and $\epsilon$ factorising into two separate priors for $s$ and for $\epsilon$, we obtain
662: $p(s,\epsilon|n)\ P(n) = P(n|s,\epsilon)\ \pi(s)\ \pi(\epsilon)$.
663: }
664: \begin{equation}
665: p(s,\epsilon|n) = \frac{P(n|s,\epsilon)\pi(s)\pi(\epsilon)}{\int\!\!\int
666: P(n|s,\epsilon)\pi(s)\pi(\epsilon)\ ds\ d\epsilon}
667: \label{eqn:extended}
668: \end{equation}
669:
670: To obtain the posterior p.d.f.\ for $s$, we now integrate this over $\epsilon$:
671: \begin{equation}
672: p(s|n) = \int_0^\infty\!\! p(s,\epsilon|n)\ d\epsilon,
673: \label{eqn:posterior}
674: \end{equation}
675: and finally we use this to set a limit on $s$ as in eqn.~(\ref{eqn:0.9}).
676:
677: The coverage for this procedure needs to be calculated as a function of
678: \st\ and \et.
679: The average coverage theorem of the previous section must be
680: generalized to
681: \begin{equation}
682: {\int\!\!\int C(s,\epsilon)\pi(s)\pi(\epsilon)\,ds\,d\epsilon
683: \over
684: \int\!\!\int \pi(s)\pi(\epsilon)\,ds\,d\epsilon}=\beta
685: \end{equation}
686:
687:
688:
689:
690:
691: \subsection{Priors}
692: To implement the above procedure we need priors for $s$ and $\epsilon$. As in
693: the simple example of Section \ref{Bayes}, for simplicity we assume
694: that the prior for $s$ is constant. It will be interesting to look at
695: the way the properties of this method change as other priors for $s$ are used.
696:
697: We assume that the prior for $\epsilon$ is extracted from some
698: subsidiary measurement $\epsilon_0 \pm \sigma_\epsilon$. We do {\bf
699: not} assume that this implies that our belief about \et\ is
700: represented by a {\bf Gaussian} distribution centred on $\epsilon_0$,
701: as this would give trouble with the lower end of the Gaussian
702: extending to negative $\epsilon$.
703: %A Gaussian truncated at zero could also run into difficulties because
704: %of its finite probability density at zero acceptance.
705: Instead, we specify some particular form of the
706: subsidiary experiment that provides information about $\epsilon$, and
707: then assume that a Bayesian analysis of this yields a posterior
708: p.d.f.\ for $\epsilon$. Slightly confusingly, this posterior from the
709: subsidiary experiment is used as the prior for the application of
710: Bayes' Theorem to extract the limit on $s$ (see eqns.\
711: (\ref{eqn:extended}) and (\ref{eqn:posterior})).
712: %We aim to have this posterior/prior for $\epsilon$ having zero
713: %probability density at $\epsilon = 0$.
714:
715: \subsection{The subsidiary measurement}
716: \label{submeas}
717: Somewhat arbitrarily, we assume that, for a true acceptance \et, the
718: probability for the measured value $\epsilon_0$ in the subsidiary experiment
719: is given by a Poisson distribution
720: \begin{equation}
721: P(\epsilon_0|\et) = e^{-\kappa\et}\kappa^m\et^m/m!
722: \label{eqn:pdf}
723: \end{equation}
724: where $\epsilon_0 = (m+1)/\kappa$, $\sigma_\epsilon^2 =
725: (m+1)/\kappa^2$ and $\kappa$ is a scaling constant\footnote{Here we define
726: $\epsilon_0$ and $\sigma^2_{\epsilon}$ as the mean and variance of the
727: posterior p.d.f.\ of eqn.~(\ref{eqn:posterior_e}).}. We interpret this
728: as the probability for $\epsilon_0$. This is discrete because the
729: observable $m$ is discrete, but the allowed values become closely
730: spaced for large $\kappa$. For small $\sigma_\epsilon/\epsilon$
731: (i.e.\ for large $m$), these probabilities approximate to a narrow Gaussian
732: (see Fig.~\ref{fig:comparison}).
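
As a numerical illustration of this parameterization (a sketch using the relations $\kappa=\epsilon_0/\sigma_\epsilon^2$ and $m+1=(\epsilon_0/\sigma_\epsilon)^2$, which follow from the definitions above), a subsidiary measurement quoted as $\epsilon_0\pm\sigma_\epsilon=1.0\pm0.1$ corresponds to
\[
\kappa=\frac{1.0}{(0.1)^2}=100,\qquad
m=\left(\frac{1.0}{0.1}\right)^2-1=99,
\]
while $1.0\pm0.2$ corresponds to $\kappa=25$ and $m=24$. Since $m$ is an integer, only values of $\epsilon_0/\sigma_\epsilon$ whose square is an integer can arise exactly in this way.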
733:
734: Given our choice of probability in eqn.~(\ref{eqn:pdf}),
735: the likelihood for the parameter $\epsilon$, given measured $\epsilon_0$,
736: is
737: %a likelihood approach to deducing
738: %\et\ from a measured $\epsilon_0$ would make use of
739: \begin{equation}
740: \mathcal{L}(\epsilon|\epsilon_0) = e^{-\kappa\epsilon}\kappa^m\epsilon^m/m!
741: \label{eqn:likelihood}
742: \end{equation}
743: This is the same function of $\epsilon$ and $\epsilon_0$ as eqn.~(\ref{eqn:pdf}), but
744: now $m$
745: is regarded as fixed, and $\epsilon$ is the variable. The likelihood is a continuous function of
746: $\epsilon$. It is compared with a Gaussian in Fig.~\ref{fig:comparison2}.
747:
Finally, in the Bayesian approach, with the choice of a constant prior for $\epsilon$, the posterior
probability density for $\epsilon$
after our subsidiary measurement is
751: \begin{equation}
752: p(\epsilon|m) \propto \ e^{-\kappa\epsilon}\kappa^m\epsilon^m/m!
753: \label{eqn:posterior_e}
754: \end{equation}
755: which is obtained by multiplying the right-hand side of eqn.~(\ref{eqn:likelihood}) by unity.
756: This {\bf posterior} probability density for $\epsilon$ will be used as our {\bf prior}
757: for $\epsilon$ in the next step of deducing the limit for $s$.
758:
759: \section{Results}
760:
761: The details of the necessary analytical calculations\footnote{This
762: example can be handled analytically. More complicated cases might require
763: numerical integration, which can be done via numerical quadrature or
764: Monte Carlo methods.} are presented in the Appendix of this note. In
765: this section we investigate the behavior of the Bayesian limits in
766: this example, especially the shape of the frequentist coverage
767: probability as a function of
768: \st.
769:
770:
771: \subsection{Shape of the posterior}
772:
773: The posterior p.d.f.\ for $s$ has the form
774: \begin{equation}
775: p(s|b,n)ds\propto
776: \left[\int_0^\infty
777: {e^{-(\epsilon s+b)}(\epsilon s+b)^n\over n!}
778: {\kappa (\kappa\epsilon)^{m} e^{-\kappa\epsilon} \over\Gamma(m+1)}d\epsilon
779: \right]1\,ds
780: \end{equation}
781: where the likelihood, the prior for $\epsilon$, the (constant) prior
782: for $s$, and the marginalization integral over $\epsilon$ are all
783: prominently displayed.
784:
785: The posterior probability density for $s$ gives the complete summary
786: of the outcome of the measurement in the Bayesian approach. It is
787: therefore important to understand its shape before proceeding to use
788: it to compute a limit (or extract a central value and error-bars).
789:
790: Figure~\ref{pdfs} illustrates the shape of the posterior for $s$
791: (i.e.\ marginalized over $\epsilon$) in the case of a nominal 10\%
792: uncertainty on $\epsilon$, and an expectation of 3 background
793: events. Plots are shown for 1, 3, 5, and 10 observed events. The
794: posterior evolves gracefully from being strongly peaked at $s=0$ to a
795: roughly Gaussian shape that excludes the neighborhood near $s=0$ with
796: high probability. Technically, the posterior would be described as a
797: mixture of $n+1$ Beta distributions of the 2nd kind\footnote{The 2nd Beta
798: distribution is also known as ``$\mathrm{Beta}'$''
799: (i.e.\ ``Beta prime''), ``inverted Beta'', ``Pearson Type~VI'',
800: ``Variance-Ratio'', ``Gamma-Gamma'', ``F'', ``Snedecor'',
801: ``Fisher-Snedecor''\ldots.}, giving
802: it a tail at high $s$ that is heavier than that of a Gaussian.
803:
804: \subsection{Upper limits}
805:
In this note, our main goal is to obtain a
Bayesian upper limit $s_\mathrm{u}$ from our observation of $n$
events. An upper limit is calculated by integrating the posterior p.d.f.\ out to
$s=s_\mathrm{u}$: a $\beta=90\%$
upper limit is defined so that the integral of the posterior from
$s=0$ to $s=s_\mathrm{u}$ is 0.9. The probability (in the Bayesian
sense) of $\st<s_\mathrm{u}$ is then exactly $\beta$.
813:
814: Table~\ref{ltablex} shows the upper limits ($\beta=0.9$) for $n=0$--20
815: observed events with $b=0$--8 and $\epsilon=1.0\pm0.1$.
816: (Integer values of $b$ are chosen for illustration purposes only;
817: $b$ can, of course, take any real value $\ge0$.)
818:
819: One notices
820: that when $n=0$, the limit is independent of the expected background
821: $b$. This is required in the Bayesian approach: we know that exactly
822: zero background events were produced (when no events at all were
823: produced), and this knowledge of what {\em did} happen makes what
824: might have happened superfluous. An interesting corollary
825: is, in the case of no events observed, uncertainties in estimating
826: the background rate are of no consequence in the Bayesian approach,
827: and must not contribute any systematic uncertainty to the limit.
828: This reasoning does not hold in the frequentist framework,
829: where what might have happened definitely does influence the limit.
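
The $b$-independence at $n=0$ can be checked directly in the simplest setting, sketched here for a fixed acceptance $\epsilon$ and a flat prior for $s$ (an illustrative special case, not the full calculation with the $\epsilon$ prior). The likelihood factorizes as $P(0|s)=e^{-(\epsilon s+b)}=e^{-b}e^{-\epsilon s}$, so the factor $e^{-b}$ cancels in the normalized posterior:
\[
p(s|n=0)=\frac{e^{-b}e^{-\epsilon s}}{\int_0^\infty e^{-b}e^{-\epsilon s'}\,ds'}
=\epsilon\,e^{-\epsilon s},
\qquad
s_\mathrm{u}=-\frac{\ln(1-\beta)}{\epsilon}.
\]
For $\beta=0.9$ and $\epsilon=1$ this gives $s_\mathrm{u}=\ln 10\simeq2.30$, whatever the value of $b$.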
830:
831:
832:
833: For comparison, limits for fixed $\epsilon=1$ with $b=0$ or $b=3$ are
834: also shown in Table~\ref{ltablex}. It is interesting that these two
835: columns start out equal at $n=0$ and differ by almost exactly 3 for
836: $n>11$. In contrast, the difference between the $b=0$ and $b=3$
837: columns for $\epsilon=1.0\pm0.1$ is already greater than 3 at $n=6$, and
838: continues to grow as $n$ increases; it is not clear whether
839: the difference approaches a finite value as $n\to\infty$.
840: In any case, the limits for $\epsilon=1$ exactly are all smaller
841: than the corresponding limits for $\epsilon=1.0\pm0.1$,
842: as expected.
843:
844:
845: \subsection{Coverage}
846:
847: The main quantity of interest in this subsection is the frequentist
848: coverage probability $C$ as a function of \st\ (for fixed \et\ and
849: $b$). Because both the main and the subsidiary measurements involve
850: observing a discrete number of events, the function $C(\st)$ will have
851: many discontinuities. On the other hand, $C(\et)$ will be continuous
852: (for fixed \st). The explanation of this effect is as follows:
853:
854: \begin{quotation}
855: \footnotesize
856:
857: The measured data are $n$ events in the main measurement and $m$ events
858: in the subsidiary measurement. For each observed outcome $(n,m)$
859: there is a limit $s_\mathrm{u}(n,m)$. This limit includes the effect
860: of marginalization over $\epsilon$.
861:
862: All $(n,m)$ with $n\ge0$ and $m\ge0$ are possible, and the probability
863: $P$ of observing $(n,m)$ can be calculated as the product of two
864: Poissons. (It will depend on \st, \et, \ldots) If we look at all the
865: possible limits we can obtain,
866: \begin{equation}
867: \{ s_\mathrm{u}(n,m) | n\ge0\ \mathrm{and}\ m\ge0\}
868: \end{equation}
869: and sort them in increasing $s_\mathrm{u}$, the $s_\mathrm{u}$ are
870: countably infinite in number and dense in the same way that rational
871: numbers are dense in the reals.
872:
873: To compute the coverage as a function of \st, we simply add up all
874: the probabilities of obtaining $(n,m)$ with $s_\mathrm{u}(n,m)\ge\st$:
875: \begin{equation}
876: C=\!\!\sum_{(n,m)\in\mathcal{A}}\!\!P(n,m)\qquad\qquad
877: \mathcal{A}=\{(n,m) | \st\le
878: s_\mathrm{u}(n,m)\ \mathrm{and}\ n\ge0\ \mathrm{and}\ m\ge0\}
879: \end{equation}
880: This sum is over a countably infinite number of terms. If we
881: increase $\st$ slightly to $\st+ds$ and recalculate the coverage, we
882: have to drop all the terms
883: \begin{equation}
884: \{ (n,m) | \st \le s_\mathrm{u}(n,m) \le \st+ds \}
885: \end{equation}
886: from the previous sum (the $P(n,m)$ for each term also changes
887: continuously with $\st$, but this is no problem). If there are $M>0$
888: such terms, there are $M$ discontinuities in the coverage in the
889: interval $[\st,\st+ds]$, since $P(n,m)$ for each of these is finite,
890: and we lose them one by one as we sweep across the interval
891: $[\st,\st+ds]$.
892:
893: But it seems that, in general, we can always find a solution to
894: $\st\le s_\mathrm{u}(n,m)\le\st+ds$ for finite $ds$ by going out to larger and
895: larger $n$ and $m$. So, although the discontinuity may be tiny, we can
896: always find a finite discontinuity in any finite interval of $\st$.
897:
898: On the other hand, if we keep $\st$ fixed and vary $\et$, we always
899: sum over the same set of $(n,m)$, since the definition of
900: $\mathcal{A}$ does not involve \et, and $P(n,m)$ is continuous in
901: \et. So the coverage is continuous as a function of \et\ for \st\
902: fixed.
903:
904: \end{quotation}
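
Written out explicitly, under the Poisson models assumed in this note for the main and subsidiary measurements, the probability entering the sum is
\[
P(n,m)=\frac{e^{-(\et\st+b)}(\et\st+b)^n}{n!}\;
\frac{e^{-\kappa\et}(\kappa\et)^m}{m!},
\]
which is a smooth function of both \st\ and \et\ for every fixed $(n,m)$. The discontinuities in $C(\st)$ therefore come entirely from changes in the membership of $\mathcal{A}$, not from the individual terms.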
905:
Plotting a curve whose discontinuities are dense is somewhat
problematic. The solution adopted here is to plot the coverage as
908: straight line segments between the discontinuities, ignoring any
909: discontinuities with $|\Delta C|<10^{-4}$. Figure~\ref{l100} shows
910: $C(\st)$ for the case $\beta=90\%$, $\et=1$, nominal 10\% uncertainty
911: of the subsidiary measurement of $\epsilon$, and $b=3$. We observe
912: that $C(\st)>\beta$ in this range, and it is not clear numerically
913: whether $C(\st)\to\beta$ as $\st\to\infty$. The same conclusions hold
914: for Fig.~\ref{l25}, which illustrates the same situation with a 20\%
915: nominal uncertainty for the $\epsilon$-measurement.
916:
917: Figure~\ref{ecov} shows $C(\et)$ for $\beta=90\%$, $\st=10$,
918: $\kappa=100$, and $b=3$---continuous as advertised. The shape of the
919: curve is quite similar to that of Figs.~\ref{l100} and \ref{l25}, so
920: it seems that the coverage probability (with $b$ fixed) is
921: approximately a function of just the product of \et\ and \st. This
922: approximate rule is likely to fail in the limit as $\et\to0$ and
923: $\st\to\infty$, for example, but it seems to hold when \et\ and
924: \st\ are at least of the same order of magnitude.
925:
926: When $\et\st$ is small, of order 1 or less, the coverage is
927: $\sim$100\%, as in the simple case of Fig.~\ref{fig:Bayes}.
928: Otherwise, the behavior of coverage in Figs.~\ref{l100}--\ref{ecov}
929: is superior to
930: that of Fig.~\ref{fig:Bayes}, which has
931: a much larger amplitude of oscillation.
932:
933: Another frequentist quantity that characterizes the performance of a
934: limit scheme is the sensitivity, defined as the mean of
935: $s_\mathrm{u}$. Figure~\ref{sens} shows the sensitivity as a function
936: of \st\ for the case of Fig.~\ref{l100}; $\langle
937: s_\mathrm{u}\rangle$ is observed to be nearly linearly dependent on
938: \st. There is one complication here: when the subsidiary
939: measurement observes $m=0$ events, and the prior for $s$ is flat,
$s_\mathrm{u}=\infty$. Since the Poisson probability of obtaining
$m=0$ is always nonzero, $\langle s_\mathrm{u}\rangle$ is consequently
infinite. So we must exclude the $m=0$ case from the
943: definition of $\langle s_\mathrm{u}\rangle$. (In Fig.~\ref{sens}
944: the probability of obtaining $m=0$ is $e^{-100}\simeq4\times10^{-44}$.)
945:
946:
947:
948: \subsection{Other priors for $s$}
949:
950: A weakness of the Bayesian approach is that there is no
951: universally accepted method to obtain a unique ``non-informative''
952: or ``objective'' prior p.d.f. Reference~\cite{roots}, for example,
953: states:
954: \begin{quote}
955: Put bluntly: data cannot ever speak entirely for themselves; every
956: prior specification has {\em some} informative posterior or predictive
957: implications; and ``vague'' is itself much too vague an idea to be
958: useful. There is no ``objective'' prior that represents ignorance.
959: \end{quote}
960: Nevertheless, Ref.~\cite{roots} does derive a $1/\sqrt{s}$ ``reference
961: prior'' for the simple Poisson case, which is claimed to have ``a
962: minimal effect, relative to the data, on the final inference''. This
963: is to be considered a ``{\em default} option when there are
964: insufficient resources for detailed elicitation of actual prior
965: knowledge''.
966:
967: Reference~\cite{ref:Jeffreys} attempts to discover the optimal form for
968: prior ignorance by considering the behavior of the prior under
969: reparameterizations. For the case in question, the form $1/s$ clearly
970: has the best properties in this respect.
971:
We are using a flat ($s^0$) prior for this study, which
seems to be the most popular choice, but the Appendix works out the
form of the posterior using an $s^{\alpha-1}$ prior, so we can briefly
summarize here the results for the $1/s$ and $1/\sqrt{s}$ cases:
976:
977: The $1/s$ prior leads to an unnormalizable posterior for all observed
978: $n$ when $b>0$. The posterior becomes a $\delta$-function at $s=0$,
979: $s_\mathrm{u}=0$ for any $\beta$, and the coverage is consequently
980: zero for all $\st>0$. This clearly is a disaster.
981:
The $1/\sqrt{s}$ prior results in a posterior p.d.f.\ qualitatively
similar in shape to those of Fig.~\ref{pdfs}, except that the p.d.f.\
984: is always infinite at $s=0$. For $n\gg b$, this produces an extremely
985: thin ``spike'' at $s=0$, which has a negligible contribution to the
986: integral of the posterior p.d.f. A more significant difference (for
987: frequentists) between the $1/\sqrt{s}$ and the $s^0$ case is that the
988: coverage probability is significantly reduced: for the case of
989: Fig.~\ref{l100} the $1/\sqrt{s}$ prior pushes the minimum coverage
990: down to $\sim$0.87. So the $1/\sqrt{s}$ prior leads to violation of
991: the frequentist coverage requirement; it undercovers for some
992: values of \st.
993:
994: %From the practical point of view---trying to upset as few people as
995: %possible---the $s^0$ prior for the case considered in this note seems
996: %more universally acceptable than the $1/\sqrt{s}$ prior, which is
997: %objectionable from a frequentist point of view.
998: One might also seek to
999: further improve the coverage properties by adopting an intermediate
1000: prior. For example, an $s^{-0.25}$ prior would reduce the level of
overcoverage obtained with the $s^0$ prior. How acceptable this
approach would be within the Bayesian statistical community is an
interesting question.
1004:
It should be noted that all the prior p.d.f.'s considered in this note
are ``improper priors'': they cannot be correctly normalized. In the
case of the $s^0$ and $1/\sqrt{s}$ priors, the integral from 0 to any
1008: value $s_0$ is finite, while the integral from $s_0$ to infinity is
1009: infinite. The corresponding integrals of the $1/s$ prior are infinite
1010: on both sides for all $s_0>0$. Improper priors are dangerous but often
1011: useful; ``improper posteriors'' are generally pathological. Extra
1012: care must be taken when employing improper priors to verify the
1013: normalizability of the resulting posterior---when using a numerical
1014: method to obtain the posterior, it is very easy to miss the fact that
1015: its integral is infinite.
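
The statements about the three priors can be verified with elementary integrals; the following is a short check, with $s_0>0$ an arbitrary splitting point. For the $s^{\alpha-1}$ family with $\alpha>0$,
\[
\int_0^{s_0}s^{\alpha-1}\,ds=\frac{s_0^{\alpha}}{\alpha}<\infty,
\qquad
\int_{s_0}^{\infty}s^{\alpha-1}\,ds=\infty,
\]
so the lower integral is $s_0$ for the flat prior ($\alpha=1$) and $2\sqrt{s_0}$ for the $1/\sqrt{s}$ prior ($\alpha=1/2$). For the $1/s$ prior ($\alpha=0$),
\[
\int_{a}^{s_0}\frac{ds}{s}=\ln\frac{s_0}{a}\to\infty\quad(a\to0),
\qquad
\int_{s_0}^{A}\frac{ds}{s}=\ln\frac{A}{s_0}\to\infty\quad(A\to\infty),
\]
so both pieces diverge logarithmically.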
1016:
1017:
1018:
1019: \subsection{Restrictions}
1020:
1021: We summarize here the restrictions forced on the priors for $s$ and
1022: $\epsilon$---see the Appendix for the analytical causes. The
1023: discussion below assumes $b>0$. The prior for $s$ being of the form
1024: $s^{\alpha-1}$, we must require $\alpha>0$, as discussed above.
1025:
1026: As specified in this note, the prior for $\epsilon$, being taken from
1027: the posterior from the subsidiary measurement with a flat prior, has
1028: been given no freedom. Should the subsidiary measurement observe $m=0$
1029: events, the posterior for $s$ is not normalizable when $\alpha\ge1$:
1030: $s_\mathrm{u}=\infty$ when $m=0$ and $\alpha\ge1$.
1031:
1032: This behavior is due to a well known effect: the $\epsilon$ prior
1033: becomes $\kappa e^{-\kappa\epsilon}$ when $m=0$, which remains finite
1034: as $\epsilon\to0$. All such cases\footnote{A Gaussian truncated at
1035: $\epsilon=0$ is the standard example.} yield $s_\mathrm{u}=\infty$ when
1036: $\alpha\ge1$; any positive $\alpha<1$ cuts off the posterior at large
1037: $s$ sufficiently rapidly to render it normalizable. From this point of
1038: view, a $1/\sqrt{s}$ prior may seem preferable, but on the other hand,
1039: having $s_\mathrm{u}=\infty$ when $m=0$ seems intuitively reasonable.
1040: (In general, we have $s_\mathrm{u}=\infty$ for $m\le\alpha-1$, but
1041: $\alpha\ge2$ are not popular choices.)
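
The normalizability conditions can be read off from the limiting behavior of the marginalized posterior derived in Section~\ref{postpdf}; the following is a sketch of that argument rather than a full proof. Up to constant factors,
\[
p(s|b,n)\propto\frac{s^{\alpha+n-1}}{(s+\kappa)^{\mu+n}}\,
M\!\left(-n,1-n-\mu,\frac{b(s+\kappa)}{s}\right),\qquad\mu=m+1 .
\]
As $s\to\infty$ the argument of $M$ tends to $b$, so $M$ tends to a finite positive constant and
\[
p(s|b,n)\sim s^{\alpha-\mu-1},
\]
whose integral to infinity converges only if $\mu>\alpha$. For $m=0$ and a flat prior ($\mu=\alpha=1$) the tail falls as $1/s$, and the integral diverges logarithmically. As $s\to0$ with $b>0$, the polynomial $M$ grows as $s^{-n}$, giving $p(s|b,n)\sim s^{\alpha-1}$, which is integrable at the origin only if $\alpha>0$.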
1042:
1043: There is another approach possible to the gamma prior for $\epsilon$:
1044: we may simply specify by fiat the form of the prior as
1045: \begin{equation}
1046: p(\epsilon|\mu)d\epsilon =
1047: {\kappa (\kappa\epsilon)^{\mu-1} e^{-\kappa\epsilon} \over\Gamma(\mu)}
1048: d\epsilon
1049: \end{equation}
1050: where $\mu$ is no longer required to be an integer. In practice, one
1051: then might obtain $\mu$ and $\kappa$ from a subsidiary measurement
1052: whose result is approximated by the gamma distribution.
1053: In such cases,
1054: one must require $\mu>\alpha$ to keep the posterior normalizable. Note
1055: that in this form, $\mu/\kappa$ is the mean of the $\epsilon$ prior,
1056: $(\mu-1)/\kappa$ is the mode, and $\mu/\kappa^2$ is the variance.
The subsidiary measurement is often analysed by other experimenters,
who choose which statistics to quote for their central value and uncertainty
(omitting additional likelihood information).
1060: It is then important to obtain $\mu$ and $\kappa$ in a consistent way
1061: from the information supplied by the subsidiary measurement. If
1062: $\epsilon$, for example, were estimated by a maximum likelihood
1063: method, one would identify the estimate with $(\mu-1)/\kappa$ rather
1064: than $\mu/\kappa$.
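
As an illustration of how much the identification matters, consider a subsidiary result quoted as $1.0\pm0.1$, and assume (hypothetically) that the quoted uncertainty is to be identified with the square root of the prior variance $\mu/\kappa^2$. If the central value is taken as the mean $\mu/\kappa$, then
\[
\kappa=100,\qquad\mu=100 .
\]
If instead the central value is taken as the mode $(\mu-1)/\kappa$, then $\mu=\kappa+1$ and $\kappa^2=100(\kappa+1)$, so that
\[
\kappa=50+\sqrt{2600}\simeq100.99,\qquad\mu\simeq101.99 ,
\]
and the mean of the prior becomes $\mu/\kappa\simeq1.01$.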
1065:
1066:
1067: \section{Conclusions}
1068:
1069:
1070: Results have been presented on the performance of a purely Bayesian
1071: approach to the issue of setting upper limits on the rate of a
1072: process, when $n$ events have been observed in a situation where the
1073: expected background is $b$ and where the efficiency/acceptance factor
1074: $\epsilon \pm \sigma_{\epsilon}$ has been determined in a subsidiary
1075: experiment.
1076: We find that this approach, when using a flat prior for the rate,
1077: results in modest overcoverage.
1078: Plots of the expected sensitivity of such a measurement
1079: and of the coverage of the upper limits are given.
1080: It will be
1081: interesting to compare these with the corresponding plots for other
1082: methods of extracting upper limits, to be given in future notes.
1083: Reference~\cite{software} provides the limit calculating software
1084: associated with this study in the form of C~functions.
1085:
1086:
1087: \section{Appendix A---Analytical Details}
1088:
1089: %%%%%%%%%%%%%%%%%%%%%%%%%%%%5
1090:
1091:
1092: Here we present the details of the analytical calculation of the
1093: posterior p.d.f.\ for $s$. For generality, we work through the
calculation with an $s^{\alpha-1}$ prior; a flat prior is then the
1095: special case $\alpha=1$.
1096:
1097: \subsection{Posterior for $s$ with $\epsilon$ and $b$ fixed }
1098:
1099: We measure $n$ events from a process with Poisson rate $\epsilon s+b$,
1100: and we want the Bayesian posterior for $s$, given improper prior
1101: $s^{\alpha-1}$. We compute the posterior for fixed $\epsilon$ and $b$
1102: in this subsection; the calculation with our prior
1103: for $\epsilon$ follows in the next subsection. We have
1104: \[
1105: \mbox{posterior:}\quad p(s|\epsilon,b,n)ds={1\over\mathcal{N}_s}
1106: e^{-\epsilon s}(\epsilon s+b)^ns^{\alpha-1}ds
1107: \]
1108: where all factors not depending on $s$ have already been absorbed
1109: into the normalization constant $\mathcal{N}_s$, which is defined by
1110: \[
1111: \mathcal{N}_s=
1112: \int_0^\infty e^{-\epsilon s}(\epsilon s+b)^ns^{\alpha-1}ds=
1113: {b^{n+\alpha}\over\epsilon^\alpha}
1114: \int_0^\infty e^{-bu}u^{\alpha-1}(1+u)^ndu\qquad(u=s\epsilon/b)
1115: \]
1116: where we have performed the indicated change of variable.
1117:
1118: Expanding
1119: $(1+u)^n$ in powers of $u$ using the binomial theorem, we get
1120: \[
1121: (1+u)^n=n!\sum_{k=0}^n{u^{n-k}\over(n-k)!k!}\qquad\Rightarrow\qquad
1122: \mathcal{N}_s=
1123: n!\epsilon^{-\alpha}
1124: \sum_{k=0}^n{\Gamma(\alpha+n-k)b^k\over(n-k)!k!}
1125: \]
1126: Recognizing this as of the general hypergeometric form, we write it as
1127: \[
1128: \mathcal{N}_s=
1129: \epsilon^{-\alpha}
1130: \Gamma(\alpha+n)\left[1+{n\over\alpha+n-1}{b\over1!}+
1131: {n(n-1)\over(\alpha+n-1)(\alpha+n-2)}{b^2\over2!}+\cdots\right]
1132: \]
to make the hypergeometric nature more explicit. Using the modern
notation~\cite{ff} for the falling factorial
1135: \[
1136: z^{\underline{k}}\equiv
1137: {\Gamma(z+1)\over\Gamma(z-k+1)}=z(z-1)(z-2)\cdots(z-k+1)
1138: \]
1139: this is expressed as
1140: \[
1141: \mathcal{N}_s=
1142: \epsilon^{-\alpha}\Gamma(\alpha+n)
1143: \sum_{k=0}^n{n^{\underline{k}}\over(\alpha+n-1)^{\underline{k}}}{b^k\over k!}=
1144: \epsilon^{-\alpha}\Gamma(\alpha+n)M(-n,1-n-\alpha,b)
1145: \]
where $M$ follows the notation of Ref.~\cite{as}.
1147: ($M$, a confluent hypergeometric function, is often written $_1F_1$,
1148: and the relation given here is only valid for integer $n\ge0$.) Note
1149: that $M(-n,1-n-\alpha,b)$ is a polynomial of order $n$ in $b$ (for $n$
1150: a non-negative integer), and is related to the Laguerre polynomials.
1151: When $\alpha=1$, we get $M(-n,-n,b)$, which is related to the Incomplete
1152: Gamma Function.
1153: When $\alpha=0$, we get $M(-n,1-n,b)$, which is infinite, so we require
1154: that $\alpha>0$.
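
As a quick check of this result (a worked special case, with $\alpha=1$ and $n=2$ chosen only for illustration), the falling-factorial sum gives
\[
M(-2,-2,b)=1+\frac{2}{2}\,b+\frac{2\cdot1}{2\cdot1}\,\frac{b^2}{2!}
=1+b+\frac{b^2}{2},
\qquad
\mathcal{N}_s=\frac{\Gamma(3)}{\epsilon}\left(1+b+\frac{b^2}{2}\right)
=\frac{2+2b+b^2}{\epsilon},
\]
which agrees with direct integration using $x=\epsilon s$:
\[
\int_0^\infty e^{-\epsilon s}(\epsilon s+b)^2\,ds=
\frac{1}{\epsilon}\int_0^\infty e^{-x}(x^2+2bx+b^2)\,dx
=\frac{2+2b+b^2}{\epsilon}.
\]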
1155:
1156:
1157: Our posterior probability density for fixed $\epsilon$
1158: is then given by
1159: \[
1160: p(s|\epsilon,b,n)ds=
1161: {\epsilon^\alpha e^{-\epsilon s}(\epsilon s+b)^ns^{\alpha-1}
1162: \over\Gamma(\alpha+n)M(-n,1-n-\alpha,b)}ds
1163: \]
1164:
1165:
1166: \subsection{Posterior for $\epsilon$ of the subsidiary measurement}
1167:
1168: The subsidiary measurement observes an integer number of events $m$,
1169: Poisson distributed as:
1170: \[
1171: P(m|\epsilon) = {e^{-\kappa\epsilon} (\kappa\epsilon)^m\over m!}
1172: \]
1173: where $\kappa$ is a real number (connecting the subsidiary measurement to the
1174: main measurement) whose uncertainty is negligible, so $\kappa$ can safely be
1175: treated as a fixed constant. $\kappa$ might be thought of, for example, as
1176: based on a cross section that is exactly calculable by theory. There
1177: is negligible (i.e.\ zero) background in the subsidiary measurement.
1178:
1179: The prior for $\epsilon$ is specified to be flat.
1180: The Bayesian posterior p.d.f.\ for $\epsilon$ is then
1181: \[
1182: p(\epsilon|m) = {\kappa (\kappa\epsilon)^m e^{-\kappa\epsilon} \over m!}
1183: \]
1184: (or $\Gamma(m+1)$ instead of $m!$ in the denominator if you prefer).
1185: This is known as a gamma distribution.
1186:
1187: The mean and rms of this posterior p.d.f.\ summarize the result of the
1188: subsidiary measurement as:
1189: \[
1190: \epsilon = {m + 1\over\kappa} \pm
1191: {\sqrt{m + 1}\over\kappa} = \epsilon_0 \pm \sigma_\epsilon
1192: \]
1193: Note that the observed data quantity in the subsidiary measurement is
1194: an integer $m$, while the quantity being measured by the subsidiary
1195: measurement is a positive real number $\epsilon$.
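
These two numbers follow from the moments of the gamma distribution; the short derivation is
\[
\langle\epsilon\rangle=\int_0^\infty\epsilon\,p(\epsilon|m)\,d\epsilon
=\frac{\Gamma(m+2)}{\kappa\,\Gamma(m+1)}=\frac{m+1}{\kappa},
\qquad
\langle\epsilon^2\rangle=\frac{\Gamma(m+3)}{\kappa^2\,\Gamma(m+1)}
=\frac{(m+2)(m+1)}{\kappa^2},
\]
so that
\[
\sigma_\epsilon^2=\langle\epsilon^2\rangle-\langle\epsilon\rangle^2
=\frac{m+1}{\kappa^2}.
\]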
1196:
1197: \subsection{Posterior for $s$ with gamma prior for $\epsilon$ ($b$ fixed)\label{postpdf}}
1198:
1199:
1200: Next we compute the joint posterior $p(s,\epsilon|b,n)dsd\epsilon$
1201: using the $s^{\alpha-1}$ prior for $s$ and
1202: our gamma distribution prior (i.e.\ the posterior derived
1203: from the subsidiary measurement) for $\epsilon\ge0$
1204: \[
1205: \mbox{prior for $\epsilon$:}\quad \pi(\epsilon)d\epsilon=
1206: {(\kappa\epsilon)^\mu e^{-\kappa\epsilon}\over\Gamma(\mu)}
1207: {d\epsilon\over\epsilon}
1208: \qquad\qquad\mu=m+1=(\epsilon_0/\sigma_\epsilon)^2\qquad
1209: \kappa=\epsilon_0/{\sigma_\epsilon}^2
1210: \]
1211: where it is convenient to write $\mu$ for $m+1$.
1212: We have for the joint posterior p.d.f.
1213: \[
1214: p(s,\epsilon|b,n)dsd\epsilon={1\over\mathcal{N}_{s,\epsilon}}
1215: \pi(\epsilon)e^{-\epsilon s}(\epsilon s+b)^ns^{\alpha-1}dsd\epsilon
1216: \]
1217: where
1218: \[
1219: \mathcal{N}_{s,\epsilon}=\int_0^\infty\!\!\!\int_0^\infty\!\!
1220: \pi(\epsilon)e^{-\epsilon s}(\epsilon s+b)^ns^{\alpha-1}dsd\epsilon=
1221: \int_0^\infty\!\!\pi(\epsilon)\mathcal{N}_sd\epsilon
1222: \]
1223: We calculated $\mathcal{N}_s$ above, so we have
1224: \[
1225: \mathcal{N}_{s,\epsilon}=
1226: \Gamma(\alpha+n)M(-n,1-n-\alpha,b)
1227: \int_0^\infty\!\!\epsilon^{-\alpha}\pi(\epsilon)d\epsilon
1228: \]
1229: \[
1230: \mathcal{N}_{s,\epsilon}=
1231: \kappa^\alpha\Gamma(\mu-\alpha)\Gamma(\alpha+n)M(-n,1-n-\alpha,b)/\Gamma(\mu)
1232: \]
1233: \[
1234: p(s,\epsilon|b,n)dsd\epsilon=
1235: {\kappa^{\mu-\alpha}\epsilon^{\mu-1}s^{\alpha-1}(\epsilon s+b)^n
1236: e^{-(s+\kappa)\epsilon}\over
1237: \Gamma(\mu-\alpha)\Gamma(\alpha+n)M(-n,1-n-\alpha,b)}dsd\epsilon
1238: \]
1239: The marginalized posterior for $s$ can then be expressed as
1240: \[
1241: p(s|b,n)ds=\left[\int_0^\infty\!\!p(s,\epsilon|b,n)d\epsilon\right]ds=
1242: {s^{\alpha-1}\kappa^{\mu-\alpha}\mathcal{I}_\epsilon\over
1243: \Gamma(\mu-\alpha)\Gamma(\alpha+n)M(-n,1-n-\alpha,b)}ds
1244: \]
1245: where the integral $\mathcal{I}_\epsilon$ is given by
1246: \[
1247: \mathcal{I}_\epsilon=
1248: \int_0^\infty\epsilon^{\mu-1}e^{-(s+\kappa)\epsilon}
1249: (\epsilon s+b)^nd\epsilon
1250: \]
1251: The same procedure that was used for the normalization integral can
1252: be applied here, producing
1253: \[
1254: \mathcal{I}_\epsilon=
1255: {b^{\mu+n}\over s^\mu}
1256: \int_0^\infty u^{\mu-1}e^{-b(1+\kappa/s)u}(1+u)^ndu
1257: \]
1258: \[
1259: \mathcal{I}_\epsilon=
1260: {s^nn!\over(s+\kappa)^{\mu+n}}
1261: \sum_{k=0}^n{\Gamma(\mu+n-k)\over
1262: (n-k)!k!}\left[b(s+\kappa)\over s\right]^k
1263: \]
1264: \[
1265: \mathcal{I}_\epsilon=
1266: {s^n\over(s+\kappa)^{\mu+n}}\Gamma(\mu+n)
1267: M(-n,1-n-\mu,b(s+\kappa)/s)
1268: \]
1269: \[
1270: p(s|b,n)ds=
1271: {\Gamma(\mu+n)\over\Gamma(\mu-\alpha)\Gamma(\alpha+n)}
1272: {s^{\alpha+n-1}\kappa^{\mu-\alpha}\over(s+\kappa)^{\mu+n}}
1273: {M(-n,1-n-\mu,b(s+\kappa)/s)\over M(-n,1-n-\alpha,b)}ds
1274: \]
1275: which has a particularly simple form when the background term is zero:
1276: \[
1277: p(s|b=0,n)ds=
1278: {\Gamma(\mu+n)\over\Gamma(\mu-\alpha)\Gamma(\alpha+n)}
1279: {s^{\alpha+n-1}\kappa^{\mu-\alpha}\over(s+\kappa)^{\mu+n}}ds
1280: \]
1281: a Beta distribution of the 2nd kind. Note that we must require
1282: $\mu>\alpha>0$ to obtain a normalizable posterior.
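
For orientation, the mean of this zero-background posterior is available in closed form (a standard property of the Beta distribution of the 2nd kind, quoted here rather than derived). Writing $s=\kappa x$, the density becomes proportional to $x^{\alpha+n-1}(1+x)^{-(\mu+n)}$, whose mean exists for $\mu-\alpha>1$ and gives
\[
\langle s\rangle=\kappa\,\frac{\alpha+n}{\mu-\alpha-1}.
\]
For a flat prior ($\alpha=1$) with $\mu=\kappa=100$ and, say, $n=5$, this is $600/98\simeq6.12$, slightly larger than the value $n+1=6$ obtained when $\epsilon=1$ is known exactly.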
1283:
1284: Our posterior p.d.f.\ for $s$ with $\epsilon$ (and $b$) fixed is
1285: recovered exactly by taking the limit of $p(s|b,n)$ as
1286: $\sigma_\epsilon\to0$. This means that the limit of $s_{\mathrm{u}}$
1287: as $\sigma_\epsilon\to0$ is identical to the value of $s_{\mathrm{u}}$
1288: when $\epsilon$ is known exactly. This property may seem obvious, but
1289: it is violated by some frequentist methods of setting limits,
1290: so it is worth mentioning.
1291:
1292: \subsection{Calculating the limit \label{intpostpdf}}
1293:
1294: We need to integrate $p(s|b,n)$ up to some limit $s_\mathrm{u}$, which
1295: can be done analytically as follows.
1296: \[
1297: \int_0^{s_\mathrm{u}}\!\!p(s|b,n)ds=
1298: {\Gamma(\mu+n)\over\Gamma(\mu-\alpha)\Gamma(\alpha+n)}
1299: \int_0^{s_\mathrm{u}\over s_\mathrm{u}+\kappa}
1300: t^{\alpha+n-1}(1-t)^{\mu-\alpha-1}
1301: {M(-n,1-n-\mu,b/t)\over M(-n,1-n-\alpha,b)}dt
1302: \]
1303: where the substitution $t={s\over s+\kappa}$ has been performed.
1304: Re-expanding the polynomial $M$ and integrating term by term yields
1305: \[
1306: \int_0^{s_\mathrm{u}}\!\!p(s|b,n)ds=
1307: \sum_{k=0}^n
1308: {I_x(\alpha+n-k,\mu-\alpha)n^{\underline{k}}\over(\alpha+n-1)^{\underline{k}}}
1309: {b^k\over k!}\Bigg/\sum_{k=0}^n
1310: {n^{\underline{k}}\over(\alpha+n-1)^{\underline{k}}}
1311: {b^k\over k!}\qquad\left(x={s_\mathrm{u}\over s_\mathrm{u}+\kappa}\right)
1312: \]
1313: where $I_x$ is the standard notation for the Incomplete Beta Function
1314: \[
1315: I_x(q,r)\equiv
1316: {\Gamma(q+r)\over\Gamma(q)\Gamma(r)}
1317: \int_0^xt^{q-1}(1-t)^{r-1}dt
1318: \]
1319: which also satisfies the following recursion:
1320: \[
1321: I_x(q,r)={\Gamma(q+r)\over\Gamma(q+1)\Gamma(r)}x^q(1-x)^r+I_x(q+1,r)
1322: \]
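
As a simple closed-form check of these expressions (a sketch for the special case $n=0$ with a flat prior, $\alpha=1$): the sums reduce to their $k=0$ terms, so the background $b$ drops out and
\[
\int_0^{s_\mathrm{u}}\!\!p(s|b,0)\,ds=I_x(1,\mu-1)=1-(1-x)^{\mu-1}
=1-\left(\frac{\kappa}{s_\mathrm{u}+\kappa}\right)^{\mu-1}.
\]
Setting this equal to $\beta$ gives
\[
s_\mathrm{u}=\kappa\left[(1-\beta)^{-1/(\mu-1)}-1\right].
\]
For $\beta=0.9$ and $\mu=\kappa=100$ (i.e.\ $\epsilon=1.0\pm0.1$) this evaluates to $s_\mathrm{u}=100\,(10^{1/99}-1)\simeq2.353$, compared with $\ln10\simeq2.303$ in the limit $\kappa=\mu\to\infty$ of an exactly known $\epsilon=1$.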
1323:
1324: %It is interesting to also look at the quantity
1325: %\[
1326: %p(\epsilon|b,n)d\epsilon=
1327: %\left[\int_0^\infty\!\!p(s,\epsilon|b,n)ds\right]d\epsilon
1328: %\]
1329: %which is the marginalized posterior for $\epsilon$. Once again,
1330: %the necessary integral has already been done above, and we
1331: %obtain
1332: %\[
1333: %p(\epsilon|b,n)d\epsilon=
1334: %{(\kappa\epsilon)^{\mu-\alpha}
1335: %e^{-\kappa\epsilon}\over\Gamma(\mu-\alpha)}
1336: %{d\epsilon\over\epsilon}
1337: %\]
1338: %which is independent of $b$ and $n$, and has the same form as the prior
1339: %for $\epsilon$, except that $\mu$ is replaced by $\mu-\alpha$.
1340: %If we had picked $\alpha=0$ (i.e.,\ a Jeffreys prior for $s$),
1341: %we would have the nice property that $p(\epsilon|b,n)=p(\epsilon)$,
1342: %but $\alpha=0$ has already been rejected above.
1343:
1344:
1345: %Another quantity of interest is the probability distribution
1346: %of a mixture of Poissons where the $\epsilon$ parameter
1347: %is given the Gamma distribution
1348: %\[
1349: %\mathcal{P}(n|b,s)=
1350: %\int_0^\infty {e^{-(s\epsilon+b)}(s\epsilon+b)^n\over n!}
1351: %{(\kappa\epsilon)^\mu e^{-\kappa\epsilon}\over\Gamma(\mu)}
1352: %{d\epsilon\over\epsilon}=
1353: %{e^{-b}\kappa^\mu\mathcal{I}_\epsilon\over n!\Gamma(\mu)}
1354: %\]
1355: %As indicated, the necessary integral here is
1356: %an integral we did above, so we get
1357: %\[
1358: %\mathcal{P}(n|b,s)=
1359: %{e^{-b}\kappa^\mu s^n\Gamma(\mu+n)M(-n,1-n-\mu,b(s+\kappa)/s)\over
1360: %n!\Gamma(\mu)(s+\kappa)^{\mu+n}}
1361: %\]
1362: %whose zero-background special case
1363: %\[
1364: %\mathcal{P}(n|b=0,s)=
1365: %{\kappa^\mu s^n\Gamma(\mu+n)\over
1366: %n!\Gamma(\mu)(s+\kappa)^{\mu+n}}
1367: %\]
1368: %is a negative binomial distribution.
1369:
1370: %There is also an interesting expression for $\sum_{j=0}^n\mathcal{P}(j|b,s)$.
1371: %We start with
1372: %\[
1373: %\mathcal{P}(j|b,s)=e^{-b}\left[f(j-1)-f(j)+b^j/j!\right]
1374: %\]
1375: %where
1376: %\[
1377: %f(j)=\sum_{k=0}^jI_{s\over s+\kappa}(1+j-k,\mu){b^k\over k!}
1378: %\]
1379: %which follows from the recursion relation for $I_x(q,r)$ given above.
1380: %Summing, we obtain
1381: %\[
1382: %\sum_{j=0}^n\mathcal{P}(j|b,s)=e^{-b}M(-n,-n,b)-e^{-b}f(n)
1383: %\]
1384: %or equivalently
1385: %\[
1386: %1-{\sum_{j=0}^n\mathcal{P}(j|b,s)\over e^{-b}M(-n,-n,b)}=
1387: %{f(n)\over M(-n,-n,b)}
1388: %\]
1389:
1390: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%5
1391:
1392:
1393:
1394: \subsection{Integer moments of the marginalized posterior}
1395: Using the same technique as above, we can calculate the $j$th moment
1396: of the posterior p.d.f.\ as
1397: \[
1398: \langle s^j\rangle=\int_0^\infty\!\!s^jp(s|b,n)ds=
1399: {(\alpha+n)^{\overline{j}}\kappa^j\over(\mu-\alpha-1)^{\underline{j}}}
1400: {M(-n,1-n-\alpha-j,b)\over M(-n,1-n-\alpha,b)}
1401: \]
1402: where we utilize the rising factorial notation\cite{ff}
1403: \[
1404: z^{\overline{k}}\equiv
1405: {\Gamma(z+k)\over\Gamma(z)}=z(z+1)(z+2)\cdots(z+k-1)
1406: \]
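
For orientation, two special cases follow directly from this formula
(a sketch, using only the definitions above). Since $M(-n,r,0)=1$,
the zero-background case is
\[
\langle s^j\rangle|_{b=0}=
{(\alpha+n)^{\overline{j}}\kappa^j\over(\mu-\alpha-1)^{\underline{j}}}
\]
and the mean ($j=1$) for general $b$ is
\[
\langle s\rangle=
{(\alpha+n)\kappa\over\mu-\alpha-1}
{M(-n,-n-\alpha,b)\over M(-n,1-n-\alpha,b)}
\]
Note that the $j$th moment is finite only if $\mu-\alpha>j$.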
1407:
1408: The expression for the mean of the posterior when $\alpha=1$
1409: can be simplified using the identity
1410: \[
1411: M(-n,-n-1,b)=\left(1-{b\over n+1}\right)M(-n,-n,b)+{b^{n+1}\over(n+1)!}
1412: \]
1413: obtaining
1414: \[
1415: \mathrm{mean}(\alpha=1)=\langle s\rangle|_{\alpha=1}=
1416: {\kappa(n+1-b)\over\mu-2}+
1417: {\kappa b^{n+1}\over(\mu-2)n!M(-n,-n,b)}
1418: \]
1419: Note that the 2nd term is very small when $n\gg b$.
1420:
1421: The recurrence relation\cite{as}
1422: \[
1423: r(r-1)M(q,r-1,z)+r(1-r-z)M(q,r,z)+z(r-q)M(q,r+1,z)=0
1424: \]
1425: leads to a recurrence relation between moments
1426: \[
1427: \langle s^j\rangle=
1428: {\kappa(\alpha+n+j-1-b)\over\mu-\alpha-j}\langle s^{j-1}\rangle+
1429: {\kappa^2b(\alpha+j-2)\over(\mu-\alpha-j+1)(\mu-\alpha-j)}
1430: \langle s^{j-2}\rangle
1431: \]
1432: The special case $\alpha=1$ then yields
1433: \[
1434: \langle s^2\rangle|_{\alpha=1}=
1435: {\kappa^2\over(\mu-2)(\mu-3)}\left[
1436: (2+n-b)(1+n-b)+b+{(2+n-b)b^{n+1}\over n!M(-n,-n,b)}\right]
1437: \]
1438: which leads to this approximation for the variance of the posterior
1439: \[
1440: \mathrm{variance}(\alpha=1)\simeq
1441: {\kappa^2(1+n)\over(\mu-2)(\mu-3)}+
1442: {\kappa^2(1+n-b)^2\over(\mu-2)^2(\mu-3)}\qquad\qquad(n\gg b)
1443: \]
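
The intermediate algebra, spelled out here as a sketch, is as follows.
For $n\gg b$ the terms proportional to $b^{n+1}/n!$ are dropped from
both $\langle s\rangle$ and $\langle s^2\rangle$, and the identities
\[
(2+n-b)(1+n-b)+b=(1+n-b)^2+(1+n),\qquad
{1\over(\mu-2)(\mu-3)}-{1\over(\mu-2)^2}={1\over(\mu-2)^2(\mu-3)}
\]
are applied to $\langle s^2\rangle-\langle s\rangle^2$.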
1444:
1445: \subsection{Posterior for $s$ with gamma priors for $\epsilon$ and $b$}
1446:
Here we briefly consider the case where the background parameter
$b$ also has an uncertainty. This case is more general than the
fixed $b$ case that is the main subject of this note. The fixed $b$ case
will also be the subject of additional studies employing various popular
frequentist techniques, with the goal of comparing their performance.
We judge the more general case to be more complicated than necessary
for that comparison, but it is instructive to document the fact that the
Bayesian method can easily handle it.
1456:
1457: We assume a 2nd subsidiary measurement observing $r$ events (Poisson,
1458: as was the case for $\epsilon$), which, when combined with a flat
1459: prior for $b$, results in a gamma posterior for $b$ of the form
1460: \[
1461: p(b|r)db = {\omega (\omega b)^r e^{-\omega b} \over r!}db
1462: \]
1463: where $\omega$ is a calibration constant (analogous to $\kappa$ in the
1464: subsidiary measurement for~$\epsilon$).
1465:
1466: The posterior for $b$ becomes the prior for $b$ in the measurement of
1467: $s$. After determining the joint posterior $p(s,\epsilon,b|n)$ by
1468: using our priors for $s$, $\epsilon$ and $b$, we marginalize with
1469: respect to $\epsilon$ and $b$, resulting in
1470: \[
1471: p(s|n)ds=
1472: {\Gamma(\mu+n)\over\Gamma(\mu-\alpha)\Gamma(\alpha+n)}
1473: {s^{\alpha+n-1}\kappa^{\mu-\alpha}\over(s+\kappa)^{\mu+n}}
1474: {F(-n,\rho;1-n-\mu;(s+\kappa)/(s\omega))\over F(-n,\rho;1-n-\alpha;1/\omega)}ds
1475: \]
1476: where we write $\rho=r+1$ for convenience, and $F$ is the
1477: hypergeometric function\cite{as2}. As long as $n$ is a non-negative
integer and $\alpha>0$, $F(-n,\rho;1-n-\alpha;x)$ is a polynomial of
degree $n$ in $x$ (closely related to Jacobi polynomials).
1480:
1481: This marginalized posterior for $s$ can then be integrated, with the result
1482: \[
1483: \int_0^{s_\mathrm{u}}\!\!p(s|n)ds=
1484: \sum_{k=0}^n
1485: {I_x(\alpha+n-k,\mu-\alpha)n^{\underline{k}}\rho^{\overline{k}}
1486: \over(\alpha+n-1)^{\underline{k}}}
1487: {\omega^{-k}\over k!}\Bigg/\sum_{k=0}^n
1488: {n^{\underline{k}}\rho^{\overline{k}}\over(\alpha+n-1)^{\underline{k}}}
1489: {\omega^{-k}\over k!}\quad
1490: \left(x={s_\mathrm{u}\over s_\mathrm{u}+\kappa}\right)
1491: \]
1492:
1493: These two equations closely resemble the main results of sections
1494: \ref{postpdf} and \ref{intpostpdf}: to recover the fixed $b$
1495: results, simply substitute $b\omega$ for $\rho$ above,
1496: and take the limit $\omega\to\infty$.
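
To see why this works (a short sketch of the limit), write the rising
factorial with $\rho=b\omega$ as
\[
\rho^{\overline{k}}\omega^{-k}=\prod_{i=0}^{k-1}\left(b+{i\over\omega}\right)
\longrightarrow b^k\qquad(\omega\to\infty)
\]
so each term of the sums above reduces to the corresponding term
containing $b^k/k!$ in the fixed $b$ result.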
1497:
1498:
1499: \section{Appendix B---Average Coverage Theorem\label{ac}}
1500:
1501: In this appendix we prove that Bayesian credible intervals have average frequentist
1502: coverage, where the average is calculated with respect to the prior density.
1503: We start from the Bayesian posterior density:
1504: \begin{equation}
1505: p(s\,|\,n)\;=\;\frac{P(n\,|\,s)\,\pi(s)}{\int_{0}^{\infty}\!P(n\,|\,s)\,\pi(s)\,ds}.
1506: \end{equation}
1507: For a given observed value of $n$, a credibility-$\beta$ Bayesian interval
1508: for $s$ is any interval $[s_\mathrm{L}(n), s_\mathrm{U}(n)]$ that encloses a fraction $\beta$
1509: of the total area under the posterior density. Such an interval must therefore
1510: satisfy:
1511: \begin{equation}
1512: \beta \;=\; \int_{s_\mathrm{L}(n)}^{s_\mathrm{U}(n)}\! p(s\,|\,n)\,ds,
1513: \end{equation}
1514: or, using the definition of the posterior density:
1515: \begin{equation}
1516: \int_{s_\mathrm{L}(n)}^{s_\mathrm{U}(n)}\! P(n\,|\,s)\,\pi(s)\,ds\;=\;\beta\;\int_{0}^{\infty}
1517: \! P(n\,|\,s)\,\pi(s)\,ds.
1518: \label{eq:acbci1}
1519: \end{equation}
1520: Now for coverage. Given a true value $s_\mathrm{t}$ of $s$, the coverage $C(s_\mathrm{t})$ of
1521: $[s_\mathrm{L}(n),s_\mathrm{U}(n)]$ is the frequentist probability that $s_\mathrm{t}$ is
1522: included in that interval. We can write this as:
1523: \begin{equation}
C(s_\mathrm{t})\;=\; \sum_{\substack{\text{all $n$ such that:}\\[1mm]
                     s_\mathrm{L}(n)\le s_\mathrm{t}\le s_\mathrm{U}(n)}} P(n\,|\,s_\mathrm{t}).
1526: \label{eq:acbci2}
1527: \end{equation}
1528: Next we calculate the average coverage $\overline{C}$, weighted by the prior $\pi(s)$:
1529: \addtocounter{footnote}{1}
1530: \protect\footnotetext{The best way to understand this step is to draw a diagram of
1531: $s$ versus $n$: one is integrating and summing over the area between the curves
1532: $s_\mathrm{L}(n)$ and $s_\mathrm{U}(n)$. The limits on the sum and integral depend on the order
1533: in which one does these operations and can be derived from the diagram.}
1534: \addtocounter{footnote}{-1}
1535: \begin{align*}
1536: \overline{C} & \;=\; \int_{0}^{\infty}\! C(s)\,\pi(s)\,ds , && \displaybreak[0]\\[6mm]
1537: & \;=\; \int_{0}^{\infty}\sum_{\substack{\text{all $n$ such that:}\\[1mm]
1538: s_\mathrm{L}(n)\le s\le s_\mathrm{U}(n)}}
1539: P(n\,|\,s)\,\pi(s)\, ds,
1540: && \text{using equation (\protect\ref{eq:acbci2}),} \displaybreak[0]\\[6mm]
1541: & \;=\; \sum_{n=0}^{\infty}\;\int_{s_\mathrm{L}(n)}^{s_\mathrm{U}(n)}\! P(n\,|\,s)\,\pi(s)\,ds ,
1542: && \text{interchanging integral and sum,\protect\footnotemark}\displaybreak[0]\\[6mm]
1543: & \;=\; \beta\; \sum_{n=0}^{\infty}\;\int_{0}^{\infty}\! P(n\,|\,s)\,\pi(s)\,ds ,
1544: && \text{using equation (\protect\ref{eq:acbci1}),} \displaybreak[0]\\[6mm]
1545: & \;=\; \beta\; \int_{0}^{\infty}\sum_{n=0}^{\infty} P(n\,|\,s)\,\pi(s)\,ds ,
1546: && \text{interchanging sum and integral,}\displaybreak[0]\\[6mm]
1547: & \;=\; \beta\; \int_{0}^{\infty}\!\pi(s)\, ds ,
1548: && \text{by the normalization of }P(n\,|\,s), \displaybreak[0]\\[6mm]
1549: & \;=\; \beta ,
1550: && \text{by the normalization of }\pi(s) .
1551: \end{align*}
1552: This completes the proof. We have assumed here that the prior $\pi(s)$ is
1553: proper and normalized to 1, but the proof can be generalized to improper priors
such as those we considered in this note. A constant prior, for example, can be
1555: regarded as the limit for $s_{\max}\rightarrow\infty$ of the proper prior:
1556: \begin{equation}
1557: \pi(s\,|\,s_{\max}) \;=\; \frac{\vartheta(s_{\max}-s)}{s_{\max}},
1558: \end{equation}
1559: where $\vartheta(x)$ is $0$ if $x<0$ and $1$ otherwise. We then {\em define}
1560: the average coverage for the constant prior as the limit:
1561: \begin{equation}
1562: \overline{C}\;=\;\lim_{s_{\max}\rightarrow\, +\infty} \;
1563: \int_{0}^{\infty}\! C(s)\,\pi(s\,|\,s_{\max})\,ds.
1564: \end{equation}
1565: The previous proof can now be applied to the argument of the limit and leads
1566: to the same result.
1567:
1568: The average coverage theorem remains valid when $s$ is multidimensional,
1569: for example when it consists of a parameter of interest and one or more
1570: nuisance parameters. In that case one needs to average the coverage over
1571: {\em all} the parameters.
1572: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1573:
1574:
1575:
1576:
1577: \newpage
1578:
1579:
1580:
1581:
1582: \begin{thebibliography}{99}
1583:
1584: \bibitem{interplay}
1585: M.~J.~Bayarri and J.~O.~Berger,
1586: ``The Interplay of Bayesian and Frequentist Analysis'',
1587: Statistical Science 19, p 58 (2004),\\
1588: \url{projecteuclid.org/Dienst/UI/1.0/Summarize/euclid.ss/1089808273},\\
1589: \url{www.isds.duke.edu/~berger/papers/interplay.html}.
1590:
1591: \bibitem{zeroc}
1592: Giovanni Punzi,
1593: ``Example of Bayesian intervals with zero coverage'',
1594: CDF Internal Note 6689 (2001),\\
1595: \url{www-cdf.fnal.gov/publications/cdf6689_Bayes_zero_coverage.pdf}.
1596:
\bibitem{clifford}
Peter Clifford,
1600: ``Interval estimation as viewed from the world of mathematical
1601: statistics'',
1602: CERN Yellow Report CERN 2000-005, p 157 (2000), {\it
1603: Proceedings of the
1604: Workshop on Confidence Limits at CERN, 17--18 January 2000}, edited by
1605: L.~Lyons, Y.~Perrin, and F.~James,\\
1606: %\url{cdsweb.cern.ch/search.py?recid=411537}
1607: \url{doc.cern.ch/yellowrep/2000/2000-005/p157.pdf}.
1608: %\url{ph-dep.web.cern.ch/ph-dep/Events/CLW/PAPERS/PS/clifford.ps}
1609:
1610: \bibitem{strong}
1611: Giovanni Punzi,
1612: ``A stronger classical definition of Confidence Limits'',
1613: hep-ex/9912048,
1614: %\url{ph-dep.web.cern.ch/ph-dep/Events/CLW/PAPERS/PS/punzi.ps}
1615: \url{www.arxiv.org/abs/hep-ex/9912048}.
1616:
1617:
1618:
1619: \bibitem{zech}
1620: G\"{u}nter Zech,
1621: ``Confronting classical and Bayesian confidence limits to examples'',
1622: CERN Yellow Report CERN 2000-005, p 141 (2000), {\it
1623: Proceedings of the
1624: Workshop on Confidence Limits at CERN, 17--18 January 2000}, edited by
L.~Lyons, Y.~Perrin, and F.~James,\\
1626: %\url{cdsweb.cern.ch/search.py?recid=411537}
1627: \url{doc.cern.ch/yellowrep/2000/2000-005/p141.pdf}.
1628: %\url{ph-dep.web.cern.ch/ph-dep/Events/CLW/PAPERS/PS/zech.ps}
1629:
1630:
1631:
1632: \bibitem{karlen}
1633: D.~Karlen,
1634: ``Credibility of confidence intervals'', in
1635: {\em Proceedings of the Conference on Advanced Techniques in Particle Physics,
1636: Durham, 18--22 March 2002},
1637: edited by M.~Whalley and L.~Lyons, p 53, (2002),\\
1638: \url{www.ippp.dur.ac.uk/Workshops/02/statistics/proceedings/karlen.pdf}.
1639:
1640:
1641: \bibitem{ch}
1642: R.~D.~Cousins and V.~L.~Highland,
1643: ``Incorporating systematic uncertainties into an upper limit'',
1644: Nucl.\ Instrum.\ Meth.\ A {\bf 320}, p 331 (1992).
1645:
1646:
1647: \bibitem{feldman}
1648: Gary Feldman,
1649: ``Multiple measurements and parameters in the unified approach'',
1650: {\it Fermilab Workshop on Confidence Limits 27--28 March, 2000}, p 11,\\
1651: \url{conferences.fnal.gov/cl2k/copies/feldman2.pdf},\\
1652: \url{huhepl.harvard.edu/~feldman/CL2k.pdf}.
1653:
1654: \bibitem{roots}
1655: J.~M.~Bernardo and A.~F.~M.~Smith, ``Bayesian Theory'', (John
1656: Wiley and Sons, Chichester, UK, 1993), \S5.4 and \S A.2.
1657:
1658: \bibitem{ref:Jeffreys}
1659: Harold Jeffreys, ``Theory of Probability'', 3rd ed., (Oxford
1660: University Press, Oxford, 1961), \S3.1.
1661:
1662: \bibitem{software}
1663: Joel Heinrich, ``User Guide to Bayesian-Limit Software Package'',
1664: CDF Internal Note 7232, (2004),\\
1665: \url{www-cdf.fnal.gov/publications/cdf7232_blimitguide.pdf};\\
1666: CDF Statistics Committee Software Page,\\
1667: \url{www-cdf.fnal.gov/physics/statistics/statistics_software.html}.
1668:
1669:
1670: \bibitem{ff}
1671: R.~L.~Graham, D.~E.~Knuth, and O.~Patashnik, ``Concrete Mathematics:
1672: A Foundation for Computer Science'', 2nd ed., (Addison-Wesley, Reading,
1673: MA, 1994); PlanetMath Mathematics Encyclopedia,\\
1674: \url{planetmath.org/encyclopedia/FallingFactorial.html}.
1675:
1676:
1677: \bibitem{as}
M.~Abramowitz and I.~A.~Stegun, editors, ``Handbook of Mathematical
Functions'', (United
States Department of Commerce, National Bureau of Standards,
Washington, D.C., 1964; and Dover Publications, New York, 1968),
chapter 13.
1683:
1684: \bibitem{as2}
M.~Abramowitz and I.~A.~Stegun, ibid., chapter 15;
William H.~Press, et al.,\ ``Numerical Recipes'', 2nd edition,
(Cambridge University Press, Cambridge, 1992), \S5.14 and \S6.12,\\
1688: \url{lib-www.lanl.gov/numerical/bookcpdf/c5-14.pdf},\\
1689: \url{lib-www.lanl.gov/numerical/bookcpdf/c6-12.pdf}.
1690:
1691:
1692:
1693: \end{thebibliography}
1694:
1695: %\clearpage
1696: \begin{figure}[p]
1697: \begin{center}
1698: \includegraphics[width=\textwidth]{Bayes}
1699: \caption{
Coverage as a function of the true signal rate $s$ for Bayes 90\%
limits, for the simple case of no background and no uncertainty on
$\epsilon$ (fixed at $\epsilon = 1$). The dotted line at $C=0.9$ is given to
show that the coverage never falls below 90\% (in this
simple case).}
1705: \label{fig:Bayes}
1706: \end{center}
1707: \end{figure}
1708:
1709:
1710: \begin{figure}[p]
1711: \begin{center}
1712: \includegraphics[width=\textwidth]{pcomparison}
1713: \caption{
1714: Comparison of our discrete probability for $\epsilon_0$ (shown as
1715: a histogram, see eqn.~(\ref{eqn:pdf})) and Gaussian (continuous curve)
1716: for the case $\epsilon=1.0\pm0.1$.}
1717: \label{fig:comparison}
1718: \end{center}
1719: \end{figure}
1720:
1721:
1722: \begin{figure}[p]
1723: \begin{center}
1724: \includegraphics[width=\textwidth]{comparison2}
1725: \caption{Comparison of our likelihood
1726: (dashed, see eqn.~(\ref{eqn:likelihood}))
1727: and Gaussian (solid) for the case
1728: $\epsilon=1.0\pm0.1$.}
1729: \label{fig:comparison2}
1730: \end{center}
1731: \end{figure}
1732:
1733:
1734: \begin{figure}[p]
1735: \begin{center}
1736: \includegraphics[width=3.15in]{pdf1}\hskip-0.2in
1737: \includegraphics[width=3.15in]{pdf2}\\
1738: \includegraphics[width=3.15in]{pdf3}\hskip-0.2in
1739: \includegraphics[width=3.15in]{pdf4}\\
1740: \caption{Posterior densities $p(s|b,n)$ vs $s$ for $n=1$, 3, 5, 10.
In each case, $b=3$ and $\epsilon=1.0\pm0.1$ (i.e.\ $\kappa=100$ and $m=99$).}
1742: \label{pdfs}
1743: \end{center}
1744: \end{figure}
1745:
1746: \begin{figure}[p]
1747: \begin{center}
1748: \includegraphics[width=\textwidth]{l100}
1749: \caption{Coverage of 90\% upper limits as a function of \st\ for
1750: $\et=1$, nominal 10\% uncertainty of the subsidiary
1751: measurement of $\epsilon$, and expected background $b=3$.}
1752: \label{l100}
1753: \end{center}
1754: \end{figure}
1755:
1756:
1757:
1758: \begin{figure}[p]
1759: \begin{center}
1760: \includegraphics[width=\textwidth]{l25}
1761: \caption{Coverage of 90\% upper limits as a function of \st\ for
1762: $\et=1$, nominal 20\% uncertainty of the subsidiary
1763: measurement of $\epsilon$, and expected background $b=3$.}
1764: \label{l25}
1765: \end{center}
1766: \end{figure}
1767:
1768:
1769: \begin{figure}[p]
1770: \begin{center}
1771: \includegraphics[width=\textwidth]{ecov}
1772: \caption{Coverage of 90\% upper limits as a function of \et\ for
1773: $\st=10$, nominal 10\% uncertainty of the subsidiary
1774: measurement of $\epsilon$, and expected background $b=3$.}
1775: \label{ecov}
1776: \end{center}
1777: \end{figure}
1778:
1779:
1780: \begin{figure}[p]
1781: \begin{center}
1782: \includegraphics[width=\textwidth]{sens}
1783: \caption{Sensitivity of 90\% upper limits as a function of \st\ for
1784: $\et=1$, nominal 10\% uncertainty of the subsidiary
1785: measurement of $\epsilon$, and expected background $b=3$.
1786: For reference, the sensitivity for $\sigma_\epsilon=0$ is also
1787: given (dashed).}
1788: \label{sens}
1789: \end{center}
1790: \end{figure}
1791:
1792:
1793:
1794:
1795:
1796:
1797:
1798:
1799: \end{document}
1800: