
1:

2: \documentclass[12pt,letterpaper]{article}

3:

4: \newif\ifpdf  % creates \ifpdf , \pdffalse and \pdftrue

5:

6: \ifx\pdfoutput\undefined

7:   \pdffalse

8: \else

9:   \pdfoutput=1

10:   \pdftrue

11: \fi

12:

13:

14:

15:

16: \ifpdf

17: %  \topmargin=-20mm

18:   \topmargin=-38mm

19:   \usepackage{graphicx,here,theorem,amssymb,amsmath,url,thumbpdf}

20:   \pdfimageresolution=300

21:   \DeclareGraphicsExtensions{.pdf,.jpg,.jpeg}

22:   \usepackage[pdftex,pdfpagemode=None,

23:      pdfstartview={XYZ},]{hyperref}  % specify last

24: \else

25: %  \topmargin=0mm

26:   \topmargin=-18mm

27:   \usepackage{graphicx,here,theorem,amssymb,amsmath,url}

28:   \DeclareGraphicsExtensions{.eps,.ps,.eps.gz,.ps.gz}

29:   \newcommand{\href}[2]{{#2}}

30: \fi

31:

32: %

33: \oddsidemargin=5mm

34: \evensidemargin=5mm

35: \textwidth=155mm

36: \textheight=230mm

37: % Allow the page size to vary a bit ...

38: \raggedbottom

% Keep LaTeX from being too fussy about line breaking ...

40: \sloppy

41:

42:

43:

44: %%%%%%%% new commands for s_true and epsilon_true %%%%%%%%%%

45: \newcommand{\st}{\ensuremath{s_\mathrm{true}}}

46: \newcommand{\et}{\ensuremath{\epsilon_\mathrm{true}}}

47:

48: \newcommand{\penn}{$^\mathrm{a}$}

49: \newcommand{\brandeis}{$^\mathrm{b}$}

50: \newcommand{\rutgers}{$^\mathrm{c}$}

51: \newcommand{\rockefeller}{$^\mathrm{d}$}

52: \newcommand{\oxford}{$^\mathrm{e}$}

53: \newcommand{\pisa}{$^\mathrm{f}$}

54: \newcommand{\toronto}{$^\mathrm{g}$}

55:

56:

57: \title{{\hfill\small CDF/MEMO/STATISTICS/PUBLIC/7117}\\\hfill\\

58: \bf\large Interval estimation in the presence of nuisance parameters.

59: 1.~Bayesian approach.}

60:

61: \pagenumbering{arabic}

62:

63: \author{

64: %\LARGE \phantom{xxxxx} CDF Statistics Committee \phantom{xxxxx} \and

65: Joel Heinrich\penn,

66: Craig Blocker\brandeis,

67: John Conway\rutgers,

68: Luc Demortier\rockefeller, \and

69: Louis Lyons\oxford,

70: Giovanni Punzi\pisa,

71: Pekka K.~Sinervo\toronto \and

72: \scriptsize\it\penn%

73: University of Pennsylvania, Philadelphia, Pennsylvania 19104 \and

74: \scriptsize\it\brandeis%

75: Brandeis University, Waltham, Massachusetts 02254 \and

76: \scriptsize\it \phantom{xxxxx} \rutgers%

77: Rutgers University, Piscataway, New Jersey 08855 \phantom{xxxxx} \and

78: \scriptsize\it\rockefeller%

79: Rockefeller University, New York, New York 10021 \and

80: \scriptsize\it\oxford%

81: University of Oxford, Oxford OX1 3RH, United Kingdom \and

82: \scriptsize\it\pisa%

83: Istituto Nazionale di Fisica Nucleare,

84: University and Scuola Normale Superiore of Pisa, I-56100 Pisa, Italy \and

85: \scriptsize\it\toronto%

86: University of Toronto, Toronto M5S 1A7, Canada}

87:

88: \date{September 27, 2004}

89:

90: \begin{document}

91:

92: \maketitle

93:

94:

95: \begin{abstract}

96: We address the common problem of calculating intervals in the presence

97: of systematic uncertainties.  We aim to investigate several

98: approaches, but here describe just a Bayesian technique for setting

99: upper limits.  The particular example we study is that of inferring

100: the rate of a Poisson process when there are uncertainties on the

101: acceptance and the background.  Limit calculating software associated

102: with this work is available in the form of C~functions.

103: %%We address the common problem of setting limits on the rate of a

104: %%process using a counting experiment in the presence of uncertainties

105: %%on acceptances and backgrounds. We aim to investigate several

106: %%different approaches, but here describe just a Bayesian

107: %%technique. Limit calculating software associated with this study is

108: %%available in the form of C~functions.

109:

110: \end{abstract}

111:

112: \section{The problem}

113: \label{problem}

114: %A very common statistical problem of relevance to Particle Physics is

115: %the extraction of an upper limit on some hypothesised process,

116: A very common statistical procedure is obtaining a confidence interval

117: for a physics parameter of interest,

118: when there are uncertainties in quantities such as the acceptance of the

119: detector and/or the analysis procedure, the beam intensity, and the

120: estimated background. These are known in statistics as nuisance

121: parameters, or in Particle Physics as sources of systematic

122: uncertainty. We assume that estimates of these quantities are

123: available from subsidiary measurements.\footnote{

124: There are other possibilities. Thus it may be that all that is known is

125: that a nuisance parameter is contained within a certain range:

126: $\mu_\mathrm{l}\le\mu\le\mu_\mathrm{u}$;

127: that is not enough information for a Bayesian

128: approach. Alternatively the data relevant for the physics and nuisance

129: parameters could be bound up in the main measurement, and not require a

130: subsidiary one.}

131: A variant of

132: this procedure which is particularly relevant for Particle Physics is

133: the extraction of an upper limit on the rate of some hypothesized

134: process or on a physical parameter, again with systematic uncertainties.

135:

136:

137: To specify the problem in more detail, we assume that we are

138: performing a counting experiment in which we observe $n$ counts, and

139: that the acceptance has been estimated as $\epsilon_0 \pm

140: \sigma_\epsilon$ and the background as $b_0 \pm \sigma_b$.  For a

141: signal rate $s$, $n$ is Poisson distributed with mean $s\epsilon + b$.

142: Here $\epsilon$ contains factors like the intensity of the accelerator

143: beam(s), the running time, and various efficiencies. It is constrained

144: to be non-negative, but can be larger than unity.

145:

146: We aim to study and compare different approaches

147: for determining confidence intervals for

148: this problem. In

149: general we are interested in pathologies in these areas:

150: \begin{itemize}

151:

152:

153: \item

154: {\bf  Coverage.}

155: This is a measure of how often the limits that we deduce would

156: in fact include the true value of the parameter. This requires consideration of an ensemble

157: of experiments like the one we actually performed, and hence is an essentially

158: frequentist concept. Nevertheless, it can be applied to a Bayesian technique.

159:

160: Coverage is a property of the technique, and not of the particular

161: limit deduced from a given measurement. It can, however, be a function

162: of the true value of the parameter, which is in general unknown in a

163: real measurement.

164:

165: Undercoverage (i.e.\ the probability of containing

166: the true value is less than the stated confidence level) is regarded by frequentists as a

167: serious defect.

168: Usually coverage is required for all possible values of the physical

169: parameter.\footnote{The argument is that the parameter is unknown, and

170: so we wish to have coverage, whatever its value. This ensures that, if

171: we repeat our specific experiment many times, we should include the

172: true value within our confidence ranges in (at least) the stated

173: fraction of cases. This argument may, however, be over-cautious. The

dips in a coverage plot like that of
Fig.~\ref{fig:Bayes} occur at values which are not fixed in $s$, but

176: which depend on the details of our experiment (such as the values of

177: $\epsilon$ and $b$). These details vary from experiment to

178: experiment. Thus we could achieve `no undercoverage for the ensemble

179: of experiments measuring the parameter $s$', even if the individual

180: coverage plots did fall below the nominal coverage occasionally.  Thus

181: in some sense `average coverage' would be sufficient (see

182: for example reference \cite{interplay}), although it is

183: hard to quantify the exact meaning of `average'.  It should be stated

184: that this is not the accepted position of most High Energy Physics

185: frequentists.}

186: In contrast, overcoverage is permissible, but the larger intervals result

187: in less stringent tests of models that predict the value of the parameter. For

188: measurements involving quantised data (e.g.\ Poisson counting), most

189: methods have coverage which varies with the true value of the parameter of interest,

190: and hence if undercoverage is to be avoided, overcoverage is inevitable.

191:

192: Frequentist methods by construction will not undercover for any values

193: of the parameters. This is not guaranteed for other approaches. For

194: example, even though the Bayesian intervals shown here do not

195: undercover, in other problems Bayesian 95\% credible intervals could

even have zero coverage for some values of the parameter of
interest~\cite{zeroc}.

198: It should also be remarked that, although coverage is a very important

199: property for frequentists, on its own exact coverage does not

200: guarantee that intervals have desirable properties (for many examples,

see Refs.~\cite{clifford} and \cite{strong}).

202:

203:

204:

205:

206: %\item

207: %%How the limits on $s$ vary with the observation $n$. Low limits can

208: %%provide tighter rejection of incorrect values, but empty or very short

209: %%intervals are undesirable:

210: %Narrow intervals or low upper limits

211: %for the signal rate $s$

212: %can provide tighter rejection of

213: %incorrect values, but empty or very short intervals---intervals

214: %with very small Bayesian credibility---are undesirable.

215: %Although such intervals may formally enjoy frequentist coverage%

216: %%and/or Bayesian credibility

217: %, they simply lack a kind of conditional

218: %plausibility: {\em given} the type of measurement we are making, the

219: %resolution of the apparatus, etc.,\ it is very unlikely that the true

220: %value of the parameter we are interested in lies in the calculated

221: %interval.

222:

223: %We will also look at the mean interval length,

224: %even though it is not invariant with respect to reparametrisations of

225: %$s$, the variable of interest, e.g.\ $1/s$ or $\ln s$, or $s^2$,

226: %etc.\footnote{The {\bf median} upper limit is invariant with respect

227: %to monotonic reparametrisations, but is not commonly used.}

228:

229: \item

230: {\bf Interval length.}

231: This is sometimes used as a criterion of accuracy of

232: intervals, in the sense that shorter intervals have less probability of

233: covering false values of the parameter of interest.  However, one should

234: keep in mind that short intervals are only desirable if they contain the

235: true value of the parameter.  Thus empty intervals, which do occur in

236: some frequentist constructions, are generally undesirable, even when their

237: construction formally enjoys frequentist coverage.

238:

239: Intervals that fill the entire physically allowed range of the

240: parameter of interest may also occur in some situations.  Examples of

241: this behavior are given in \cite{clifford} and \cite{zech}.  An

242: experimenter who requests a 68\% confidence interval, but receives what

243: appears to be a 100\% confidence interval instead, may not be satisfied

244: with the explanation that he is performing a useful service in helping

245: to keep the coverage probability---averaged over his measurement and

246: his competitor's measurements---from dropping below 68\%.

247:

248: %This is sometimes used as a criterion of

249: %accuracy of intervals, but one should keep in mind that short

250: %intervals are only desirable if they contain the true value of the

251: %parameter.  Thus empty intervals, which do occur in some frequentist

252: %constructions, are generally undesirable, even when their construction

253: %formally enjoys frequentist coverage.

254: %This problem is not necessarily

255: %restricted to empty intervals.  Suppose we make a histogram of

256: %interval lengths over the relevant ensemble of measurements, and in

257: %each non-empty bin we calculate the average coverage probability for

258: %that bin.  Ideally we would like the average coverage to be constant

259: %over all bins.  On the other hand, it may happen that the average

260: %coverage increases as the interval widens; this could be seriously

261: %misleading.  It is easy to see that upper limits suffer from this

262: %defect: the average coverage is zero for upper limits below the true

263: %value, and one above.  In practice of course we do not know the true

264: %value, so that calculation of an upper limit does not affect our

265: %knowledge of the coverage.  However, with two-sided intervals and in

266: %the presence of systematic uncertainties the situation may not be as

267: %straightforward.

268:

269:

270:

271: \item

272: {\bf Bayesian credibility.}

273: In some situations it may be relevant to calculate the Bayesian

274: credibility of an interval, even when the latter was constructed by

275: frequentist methods.  This would of course require one to choose a

276: prior for all the unknown parameters.  The question is one of

277: plausibility: given the type of measurement we are making, the

278: resolution of the apparatus, etc., how likely is it that the true

279: value of the parameter we are interested in lies in the calculated

280: interval?  Does this

281: %more or less agree with

282: differ dramatically from

283: the nominal coverage probability of the interval?

284: In fact for different values of the observable(s), frequentist ranges

285: are very likely to have different credibilities. Some examples

286: of this behavior are noted in Ref.~\cite{karlen}.

287:

288: When calculating the Bayesian credibility of frequentist intervals,

289: ``uninformative'' priors appear advisable.  Note that severe interval

290: length pathologies will automatically produce a large inconsistency

291: between the nominal coverage and the Bayesian credibility of an

292: interval.  Except in a handful of very special cases, it is not

293: possible to construct an interval scheme that has simultaneously

294: constant Bayesian credibility and constant frequentist coverage, even

295: if one has total freedom in choosing the prior(s).  Although it is not

296: at all clear exactly how large a level of disagreement is

297: pathological, nevertheless it may be instructive to know how severely

298: an interval scheme deviates from constant Bayesian credibility (and

299: how sensitive this is to the choice of prior).

300:

301:

302: \item

303: {\bf  Bias.}

304: In the context of interval selection, this means having a larger

305: coverage $B(s',\st)$ for an incorrect value $s'$ of the parameter than

306: for the true value \st. This requires plots of coverage versus $s'$

307: for different values of \st. For upper limits, $B(s_1',\st) \ge

308: B(s_2',\st)$ if $s_1'$ is less than $s_2'$, so methods are necessarily

309: biassed for low $s'$.

310: %Bias is more interesting for 2--sided intervals.

311: Bias thus is not very interesting for upper limits.

312: %This is because almost

313: %all methods are biassed in giving larger coverage for $s'=0$ (where the

314: %coverage is usually 100\%) than for \st.

315: It will be discussed in

316: later notes dealing with two-sided intervals.

317:

318: \item

319: {\bf Transformation under reparametrisation.}

320: %Suppose that

321: %$\theta\in\Theta$ is the parameter we are interested in, and let $f$

322: %be an arbitrary transformation of $\Theta$.  The calculated interval

323: %is then called ``transformation-respecting'' if the interval for

324: %$f(\theta)$ can be obtained by applying $f$ to the endpoints of the

325: %interval for $\theta$.

326: Intervals that are not transformation-respecting can be problematic. For

327: example, it is possible for the predicted value of the lifetime of a

328: particle to be contained within the 90\% interval determined from the

329: data, but for the corresponding predicted value of the decay rate (equal

330: to the reciprocal of the predicted lifetime) to be outside the 90\%

331: interval when the data is analysed by the same procedure, but in terms

332: of decay rate. This would result in unwanted ambiguities about the

333: compatibility of the data with the prediction.

334:

335: \item

336: %Range preservation.

337: {\bf Unphysical ranges.}

338: The question here is whether the interval

339: construction procedure can be made to respect the physical boundaries

340: of the problem.

341: For example, branching fractions should be in the range zero to one,

342: masses should not be negative, etc. Statements about the {\em true}

343: value of a parameter should respect any physical bounds.  In contrast,

344: some methods give {\em estimates} of parameters which can themselves

345: be unphysical, or which include unphysical values when the errors are

346: taken into account. We do not recommend truncating ranges of {\em

347: estimates} of parameters to obey such bounds. Thus the fact that a

348: branching fraction is estimated as $1.1\pm0.2$ conveys more

349: information about the experimental result than does the statement that

350: it lies in the range 0.9 to 1.

351:

352:

353:

354: \item

{\bf Behavior with respect to nuisance parameters.}

356: We would normally expect that the limits on a physical parameter would

357: tighten as the uncertainty on a nuisance parameter decreases; and that

358: as this uncertainty tends to zero, the limits should agree with those

359: obtained under the assumption that the ``nuisance parameter'' was

360: exactly known. (Otherwise we could sometimes obtain a tighter limit

361: simply by pretending that we knew less about the nuisance parameter

362: than in fact is the case.) These desiderata are not always satisfied

363: by non-Bayesian methods (see \cite{ch} and \cite{feldman}).

364:

365:

366:

367: \end{itemize}

368:

369: Although we are ultimately interested in comparing different

370: approaches to this problem, in this note we investigate a Bayesian

371: technique for determining upper limits. Our purpose is to spell out in

372: some detail how this approach is used, and to discuss some of the

373: properties of the resulting limits in this specific example. We

374: believe that, for variants of this problem (e.g.\ different choice of

375: prior for $s$; alternative assumptions about the information on the

376: nuisance parameters; etc.), the reader could readily adapt the

377: techniques described here (and the associated software) to their

378: particular situation.

379:

380: We will report on two-sided intervals and also compare with

381: %Here we investigate a Bayesian technique. We will report on and

382: %compare with

383: other methods (e.g.\ Cousins--Highland, pure frequentist,

384: profiled frequentist) in later notes.

385:

386: \section{Reminder of Bayesian approach}

387: \label{Bayes}

388:

389: Before dealing with the problem of extracting and studying the limits on $s$ as deduced

390: from observing $n$ events from a

391: Poisson distribution with mean $s\epsilon + b$

392: in the presence of an uncertainty on $\epsilon$, we recall the way

393: the Bayesian approach works for the simpler problem of a counting experiment with no

394: background and with $\epsilon$ exactly known. Then $n$ is Poisson

395: distributed with mean $s\epsilon$,

396: and Bayes' Theorem\footnote{We follow the common convention whereby

397: lower case $\pi$'s denote prior p.d.f.'s, lower case $p$'s denote

398: other p.d.f.'s, upper case $\Pi$'s denote prior probabilities, and

399: upper case $P$'s denote other

400: probabilities. Equation~(\ref{eqn:BayesTh}) is true for probabilities,

401: p.d.f.'s, or mixtures depending on whether $B$ and/or $C$ are

402: discrete or continuous variables.}

403: \begin{equation}

404: P(B|C) = P(C|B)P(B)/P(C)

405: \label{eqn:BayesTh}

406: \end{equation}

407: gives

408: \begin{equation}

409: p(s|n) = \frac{P(n|s)  \pi(s)}{\int P(n|s)  \pi(s) \ ds}

410: \label{eqn:P(s|n)}

411: \end{equation}

412: where $\pi(s)$ is the prior probability density for $s$;

413: $p(s|n)$ is the posterior probability density function (p.d.f.)\ for $s$, given

414: the observed $n$; and

415: $P(n|s)$ is the probability of observing $n$, given $s$.

416:

417: We assume a constant prior for $s$,\footnote{This is an assumption, not a necessity, and

418: is in some ways unsatisfactory. (It is implausible, cannot be normalised, and

419: creates divergences for the posterior if used with a (truncated) Gaussian prior

for the acceptance $\epsilon$.)} and that $P(n|s)$ is given by the Poisson distribution

421: \begin{equation}

422: P(n|s) = e^{-s\epsilon}(s\epsilon)^n/n!

423: \label{eqn:Poisson}

424: \end{equation}

425: Then\footnote{It turns out that the sum over $n$ of the discrete distribution

426: (\ref{eqn:Poisson}) and the integral over $s$ of the continuous distribution

427: (\ref{eqn:Poisson2}) are both equal to unity.

428: This means that the probability $P(n|s)$ and the probability density

429: $p(s|n)$ are correctly normalised.}

430: \begin{equation}

431: p(s|n) = \epsilon e^{-s\epsilon}(s\epsilon)^n/n!

432: \label{eqn:Poisson2}

433: \end{equation}

434: The limit is now obtained by integrating this posterior p.d.f.\ for $s$ until we

435: achieve the required fraction $\beta$ of the total integral from zero to infinity. If

436: $\beta$ is $90\%$, the upper limit $s_\mathrm{u}$ is given by

437: \begin{equation}

438: \int^{s_\mathrm{u}}_0\!\! p(s|n)\ ds =0.9

439: \label{eqn:0.9}

440: \end{equation}

441: $\beta$ is termed the credible or Bayesian confidence level for the limit.
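
As a short worked illustration (not part of the original derivation), the
case $n=0$ can be solved in closed form.  The posterior of
eqn.~(\ref{eqn:Poisson2}) then reduces to $p(s|0)=\epsilon e^{-s\epsilon}$,
and requiring a fraction $\beta$ of its integral to lie below $s_\mathrm{u}$ gives
\begin{equation}
\int_0^{s_\mathrm{u}}\!\!\epsilon e^{-s\epsilon}\,ds
= 1-e^{-s_\mathrm{u}\epsilon}=\beta
\quad\Longrightarrow\quad
s_\mathrm{u} = -\frac{\ln(1-\beta)}{\epsilon}
\end{equation}
so that $s_\mathrm{u}=\ln 10\simeq 2.3026$ for $\beta=0.9$ and $\epsilon=1$.
For $n>0$ the corresponding equation is usually solved numerically.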

442:

443: For different observed $n$, the upper limits are shown in the last two

444: columns of Table~\ref{ltablex}, for $b = 0$ and for $b = 3$

445: respectively.

446: The Gaussian

447: approximation for the case $b = 0$, $n=20$, would yield

448: $s_\mathrm{u}\simeq20+1.28\sqrt{20}\simeq25.7$, which is roughly

449: comparable to the corresponding $s_\mathrm{u}=27.0451$ of the Table.

450: For $b = 0$, it coincidentally turns out that, for this

451: particular example, the Bayesian upper limits are identical with those

452: obtained in a frequentist calculation with the Neyman construction and

453: a simple ordering rule (see later note on the frequentist approach to

454: this problem). In general this is not so.

455: Other priors sometimes used for $s$ are $1/\sqrt s$ \cite{roots} or

456: $1/s$ \cite{ref:Jeffreys}. Having a prior peaked at smaller values of

457: $s$ in general results in tighter limits for a given observed $n$.

458:

459: If the whole procedure is now repeated with a background $b$ and a

460: flat prior, the upper limits not surprisingly decrease for increasing

461: $b$ at fixed $n$ (except for the case $n = 0$ where the limits can

462: trivially be seen to be independent of $b$). This is not inconsistent

463: with the fact that the mean limit for a series of measurements

464: increases with $b$, i.e.\ experiments with larger expected backgrounds

465: have poorer sensitivity.
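
The limits for known $\epsilon$ and $b$ can be reproduced with a few lines of code. The following is a minimal Python sketch, not the C software associated with this note; the function names `poisson_cdf` and `bayes_upper_limit` are hypothetical. It assumes a flat prior in $s$ and uses the fact that, for the mean $s\epsilon+b$, the posterior integral from $0$ to $s_\mathrm{u}$ equals $1 - F(n; s_\mathrm{u}\epsilon+b)/F(n; b)$, where $F(n;\mu)$ is the Poisson cumulative probability of observing at most $n$ counts. The root is found by bisection.

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson distribution with mean mu."""
    term = math.exp(-mu)          # probability of 0 counts
    total = term
    for k in range(1, n + 1):
        term *= mu / k            # probability of k counts
        total += term
    return total

def bayes_upper_limit(n, b=0.0, eps=1.0, beta=0.9):
    """Upper limit on s for a flat prior in s, with eps and b exactly known."""
    norm = poisson_cdf(n, b)      # normalises the posterior over s >= 0

    def posterior_cdf(s):
        return 1.0 - poisson_cdf(n, s * eps + b) / norm

    # Bracket the root, then bisect.
    lo, hi = 0.0, 1.0
    while posterior_cdf(hi) < beta:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if posterior_cdf(mid) < beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for n in (0, 1, 20):
        print(n, round(bayes_upper_limit(n), 4), round(bayes_upper_limit(n, b=3.0), 4))
```

Under these assumptions the values can be compared directly with the last two columns of Table~\ref{ltablex}; for $n=0$ the result is independent of $b$ because the factor $e^{-b}$ cancels in the ratio.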

466:

467:

468:

469:

470: \begin{table}

471: \begin{center}

472: \scriptsize

473: \begin{tabular}{@{}r|rrrrrrrrr|rr@{}}

474: \hline

475: &\multicolumn{9}{c|}{$\epsilon=1.0\pm0.1$}&

476: \multicolumn{2}{c}{$\epsilon=1\pm0$}\\

477: $n$&\multicolumn{1}{c}{$b=0$}&

478: \multicolumn{1}{c}{1}&

479: \multicolumn{1}{c}{2}&

480: \multicolumn{1}{c}{3}&

481: \multicolumn{1}{c}{4}&

482: \multicolumn{1}{c}{5}&

483: \multicolumn{1}{c}{6}&

484: \multicolumn{1}{c}{7}&

485: \multicolumn{1}{c|}{8}&

486: \multicolumn{1}{c}{$b=0$}&

487: \multicolumn{1}{c}{3}\\

488: \hline

489: 0&2.3531&2.3531&2.3531&2.3531&2.3531&2.3531&2.3531&2.3531&2.3531&2.3026&2.3026\\

490: 1&3.9868&3.3470&3.0620&2.9019&2.8000&2.7297&2.6783&2.6391&2.6083&3.8897&2.8389\\

491: 2&5.4669&4.5520&3.9676&3.6026&3.3623&3.1953&3.0736&2.9816&2.9099&5.3223&3.5228\\

492: 3&6.8745&5.8618&5.0463&4.4644&4.0571&3.7666&3.5534&3.3922&3.2671&6.6808&4.3624\\

493: 4&8.2380&7.1964&6.2451&5.4751&4.8914&4.4569&4.1313&3.8832&3.6904&7.9936&5.3447\\

494: 5&9.5714&8.5213&7.5063&6.6022&5.8579&5.2719&4.8180&4.4660&4.1904&9.2747&6.4371\\

495: 6&10.8826&9.8288&8.7885&7.8047&6.9344&6.2066&5.6184&5.1499&4.7772&10.5321&7.5993\\

496: 7&12.1766&11.1203&10.0703&9.0460&8.0904&7.2450&6.5289&5.9387&5.4586&11.7709&8.7958\\

497: 8&13.4570&12.3984&11.3441&10.3014&9.2952&8.3635&7.5374&6.8300&6.2380&12.9947&10.0030\\

498: 9&14.7261&13.6655&12.6085&11.5575&10.5247&9.5365&8.6250&7.8142&7.1136&14.2060&11.2085\\

499: 10&15.9858&14.9233&13.8641&12.8090&11.7630&10.7415&9.7701&8.8758&8.0775&15.4066&12.4073\\

500: 11&17.2375&16.1732&15.1121&14.0542&13.0017&11.9621&10.9525&9.9966&9.1170&16.5981&13.5983\\

501: 12&18.4823&17.4163&16.3533&15.2934&14.2371&13.1881&12.1560&11.1582&10.2162&17.7816&14.7816\\

502: 13&19.7210&18.6535&17.5887&16.5269&15.4682&14.4139&13.3692&12.3452&11.3588&18.9580&15.9580\\

503: 14&20.9545&19.8854&18.8191&17.7554&16.6946&15.6373&14.5856&13.5459&12.5302&20.1280&17.1280\\

504: 15&22.1832&21.1127&20.0448&18.9795&17.9169&16.8572&15.8014&14.7528&13.7187&21.2924&18.2924\\

505: 16&23.4078&22.3359&21.2665&20.1996&19.1353&18.0737&17.0151&15.9612&14.9161&22.4516&19.4516\\

506: 17&24.6286&23.5553&22.4845&21.4161&20.3502&19.2868&18.2261&17.1689&16.1172&23.6061&20.6061\\

507: 18&25.8459&24.7714&23.6992&22.6294&21.5619&20.4969&19.4344&18.3747&17.3189&24.7563&21.7563\\

508: 19&27.0601&25.9844&24.9109&23.8397&22.7708&21.7042&20.6400&19.5784&18.5198&25.9025&22.9025\\

509: 20&28.2715&27.1946&26.1199&25.0474&23.9770&22.9090&21.8432&20.7799&19.7191&27.0451&24.0451\\

510: \hline

511: \end{tabular}

512: \end{center}

\caption{90\% upper limits on $s$ for $n$ observed events with background $b$
and $\epsilon=1.0\pm0.1$ ($\kappa=100$ and $m=99$,
as defined in Section~\ref{submeas}). The last two columns show
the limits for $b=0$ and $b=3$ with fixed $\epsilon=1$.
\label{ltablex}}

518: \end{table}

519:

520:

521:

522:


548:

549:

550: \subsection{Coverage}

551:

552: Next we can investigate the frequentist coverage $C(\st)$\footnote

553: {This is the coverage at $s = \st$ when the Poisson variable is generated with

554: $s = \st$. This differs from $B(s',\st)$ where the coverage is checked at

555: $s = s'$ when the generation value is \st.} of this Bayesian approach.

556: That is, we can ask what the probability is, for a given value of \st, of our upper

557: limit being larger than \st, and hence being consistent with it. This is equivalent

558: to adding up the Poisson probabilities

of eqn.~(\ref{eqn:Poisson}) for those values of $n$ for which $s_\mathrm{u}(n) \ge \st$, i.e.

560: \begin{equation}

561: C(\st) = \sum_{\mathrm{relevant}\ n}\!\!\!e^{-\st\epsilon}(\st\epsilon)^n / n!

562: \label{csum}

563: \end{equation}

564: As \st\ increases through any of the values of $s_\mathrm{u}$ of

565: the last two columns of

566: Table~\ref{ltablex}, the coverage drops sharply.

567: For example, for the case of zero background and efficiency known to

568: be unity, the 90\% Bayesian upper limits will include $\st=3.8896$ for

569: $n = 1$ or larger. But $\st=3.8898$ is no longer below the upper limit

570: for $n = 1$. Thus one term drops out of the summation of

571: eqn.~(\ref{csum}) for the calculation of the coverage at $\st=3.8898$,

572: while the remaining terms change but little for the small change in

573: \st; this produces the abrupt fall in coverage. The coverage is

574: plotted in Fig.~\ref{fig:Bayes}, where the drop at $\st=3.8897$ can be

575: seen.

576: %The coverage is plotted in Fig.~\ref{fig:Bayes}.

577:

578: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

579:

580: {\footnotesize

581: \begin{quotation}

The calculation of $C(\st)$ can be done as follows.
The identity

584: \begin{equation}

585: f'(x)=

586: e^{-x}\left[\sum_{k=0}^{n-1}{x^k\over k!}-\sum_{k=0}^n{x^k\over k!}\right]=

587: -e^{-x}{x^n\over n!}

588: \qquad\mathrm{for}\qquad f(x)=e^{-x}\sum_{k=0}^n{x^k\over k!}

589: \end{equation}

590: allows us to write (integrating $-f'(x)$)

591: \begin{equation}

592: \int^{\st}_0\!\!\!\! p(s|n)\,ds =

593: \int^{\st\epsilon}_0\!\!\!\!e^{-x}{x^n\over n!}dx =

594: 1-e^{-\st\epsilon}\sum_{k=0}^n{(\st\epsilon)^k\over k!}

595: \end{equation}

596: From this, it follows that

597: ``relevant $n$'' is equivalent to any one of these

598: inequalities:

599: \begin{equation}

600: s_\mathrm{u}(n) \ge \st\Leftrightarrow

601: \int^{s_\mathrm{u}}_0\!\! p(s|n)\,ds\ge\int^{\st}_0\!\!\!\! p(s|n)\,ds\Leftrightarrow

602: \beta\ge1-e^{-\st\epsilon}\sum_{k=0}^n{(\st\epsilon)^k\over k!}

603: \end{equation}

604: and our expression for the coverage becomes

605: \begin{equation}

606: C(\st) = 1 - {\sum_{n=0}}'e^{-\st\epsilon}{(\st\epsilon)^n \over n!}

607: \end{equation}

where ${\sum}'$ means ``sum until the next term would cause the sum to
exceed $1-\beta$''. Since the primed sum therefore never exceeds
$1-\beta$, this result proves that $C(\st)\ge\beta$
for all values of \st\ in this simple example.

611:

612: \end{quotation}

613: }
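
The coverage calculation described above can be sketched in Python as follows. This is an illustrative sketch only, for $b=0$ and a flat prior in $s$; the names `poisson_pmf` and `coverage` are hypothetical. It uses the equivalence derived above: $n$ contributes to the coverage exactly when the Poisson cumulative probability up to $n$ is at least $1-\beta$, so the coverage is one minus the summed probability of the smaller $n$.

```python
import math

def poisson_pmf(n, mu):
    """Poisson probability of n counts for mean mu, evaluated in log space."""
    if mu == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))

def coverage(s_true, eps=1.0, beta=0.9):
    """Coverage C(s_true) of the flat-prior Bayesian upper limit, b = 0."""
    mu = s_true * eps
    excluded = 0.0                 # probability of the n with s_u(n) < s_true
    n = 0
    term = poisson_pmf(n, mu)
    # Keep adding terms while the cumulative sum stays below 1 - beta.
    while excluded + term < 1.0 - beta:
        excluded += term
        n += 1
        term = poisson_pmf(n, mu)
    return 1.0 - excluded

if __name__ == "__main__":
    for s in (1.0, 3.8896, 3.8898):
        print(s, coverage(s))
```

Evaluating the function on either side of $\st=3.8897$ shows the abrupt drop discussed in the text: just below that value only the $n=0$ term is excluded, while just above it the $n=1$ term is excluded as well.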

614:

615: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

616:

617:

618:

Fig.~\ref{fig:Bayes} shows that the coverage starts at $100\%$ for small \st. This is because
even for $n = 0$ the Bayesian upper limit lies above such values of \st, and the limits
for larger $n$ are larger still.

622:

623: Bayesian methods can be shown to achieve average coverage. By this we

624: mean that when the coverage is averaged over the parameter $s$,

625: weighted by the prior in $s$, the result will agree with the nominal

626: value $\beta$, i.e.

627: \begin{equation}

628: \frac{\int C(s)\ \pi(s)\ ds}{\int \pi(s)\ ds} = \beta

629: \end{equation}

630: A proof of this theorem is given in the second appendix, section~\ref{ac} of this note.

631:

632: For a constant prior, the region at large $s$ tends to dominate the

633: average, while in general we will be interested in the coverage at

634: small $s$. Thus the ``average coverage'' result is of academic rather

635: than practical interest, especially for the case of a flat prior.

636: Indeed it is possible to have a situation where the average coverage

637: is, say, $90\%$, while the coverage as a function of $s$ is always

638: larger than or equal to $90\%$.

639:

640:

641: \section{The actual problem}

Our actual problem differs from the simple case of Section~\ref{Bayes}
in that (a) we have a background $b$, assumed for the time being to
be accurately known; and (b) we have an acceptance $\epsilon$
estimated in a subsidiary experiment as $\epsilon_0 \pm
\sigma_\epsilon$.

647:

We now use a multidimensional version of Bayes' Theorem
to express $p(s,\epsilon|n)$ in terms of $P(n|s,\epsilon)$ and the

650: priors for $s$ and $\epsilon$. The relationship is\footnote{

651: %This is deduced

652: For the case where the probabilities have a frequency ratio

653: interpretation, this is seen

654: from the mathematical identities\\ \\

655: $P(X\ \mathrm{and}\ Y\ \mathrm{and}\ Z) = \frac{N(X\ \mathrm{and}\ Y\ \mathrm{and}\ Z)}{N(Z)}\frac{N(Z)}{N_\mathrm{tot}} =P(X,Y|Z)\

656: P(Z)$

657: and \\ \\

658: $P(X\ \mathrm{and}\ Y\ \mathrm{and}\ Z) = \frac{N(X\ \mathrm{and}\ Y\ \mathrm{and}\ Z)}{N(X\ \mathrm{and}\ Y)}\frac{N(X\ \mathrm{and}\ Y)}{N_\mathrm{tot}} =

659: P(Z|X,Y)\ P(X,Y)$.\\ \\

660: So with $X$, $Y$ and $Z$ identified with $s$, $\epsilon$ and $n$ respectively, and with the prior for

661: $s$ and $\epsilon$ factorising into two separate priors for $s$ and for $\epsilon$, we obtain

662: $p(s,\epsilon|n)\ P(n) = P(n|s,\epsilon)\ \pi(s)\ \pi(\epsilon)$.

663: }

664: \begin{equation}

665: p(s,\epsilon|n) = \frac{P(n|s,\epsilon)\pi(s)\pi(\epsilon)}{\int\!\!\int

666: P(n|s,\epsilon)\pi(s)\pi(\epsilon)\ ds\ d\epsilon}

667: \label{eqn:extended}

668: \end{equation}

669:

670: To obtain the posterior p.d.f.\ for $s$, we now integrate this over $\epsilon$:

671: \begin{equation}

672: p(s|n) = \int_0^\infty\!\! p(s,\epsilon|n)\ d\epsilon,

673: \label{eqn:posterior}

674: \end{equation}

675: and finally we use this to set a limit on $s$ as in eqn.~(\ref{eqn:0.9}).

676:

677: The coverage for this procedure needs to be calculated as a function of

678: \st\ and \et.

679: The average coverage theorem of the previous section must be

680: generalized to

681: \begin{equation}

682: {\int\!\!\int C(s,\epsilon)\pi(s)\pi(\epsilon)\,ds\,d\epsilon

683: \over

684: \int\!\!\int \pi(s)\pi(\epsilon)\,ds\,d\epsilon}=\beta

685: \end{equation}

686:

687:

688:

689:

690:

691: \subsection{Priors}

692: To implement the above procedure we need priors for $s$ and $\epsilon$. As in

693: the simple example of Section \ref{Bayes}, for simplicity we assume

694: that the prior for $s$ is constant. It will be interesting to look at

695: the way the properties of this method change as other priors for $s$ are used.

696:

697: We assume that the prior for $\epsilon$ is extracted from some

698: subsidiary measurement $\epsilon_0 \pm \sigma_\epsilon$. We do {\bf

699: not} assume that this implies that our belief about \et\ is

700: represented by a {\bf Gaussian} distribution centred on $\epsilon_0$,

701: as this would give trouble with the lower end of the Gaussian

702: extending to negative $\epsilon$.

703: %A Gaussian truncated at zero could also run into difficulties because

704: %of its finite probability density at zero acceptance.

705: Instead, we specify some particular form of the

706: subsidiary experiment that provides information about $\epsilon$, and

707: then assume that a Bayesian analysis of this yields a posterior

708: p.d.f.\ for $\epsilon$. Slightly confusingly, this posterior from the

709: subsidiary experiment is used as the prior for the application of

710: Bayes' Theorem to extract the limit on $s$ (see eqns.\

711: (\ref{eqn:extended}) and (\ref{eqn:posterior})).

712: %We aim to have this posterior/prior for $\epsilon$ having zero

713: %probability density at $\epsilon = 0$.

714:

715: \subsection{The subsidiary measurement}

716: \label{submeas}

717: Somewhat arbitrarily, we assume that, for a true acceptance \et, the

718: probability  for the measured value $\epsilon_0$ in the subsidiary experiment

719: is given by a Poisson distribution

720: \begin{equation}

721: P(\epsilon_0|\et) = e^{-\kappa\et}\kappa^m\et^m/m!

722: \label{eqn:pdf}

723: \end{equation}

724: where $\epsilon_0 = (m+1)/\kappa$, $\sigma_\epsilon^2 =

725: (m+1)/\kappa^2$ and $\kappa$ is a scaling constant\footnote{Here we define

726: $\epsilon_0$ and $\sigma^2_{\epsilon}$ as the mean and variance of the

727: posterior p.d.f.\ of eqn.~(\ref{eqn:posterior_e}).}. We interpret this

728: as the probability for $\epsilon_0$. This is discrete because the

729: observable $m$ is discrete, but the allowed values become closely

730: spaced for large $\kappa$. For small $\sigma_\epsilon/\epsilon$

731: (i.e.\ for large $m$), these probabilities approximate to a narrow Gaussian

732: (see Fig.~\ref{fig:comparison}).
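
As a worked illustration (a sketch that simply inverts the two relations above; the numerical inputs are chosen here for illustration), a nominal measurement $\epsilon_0\pm\sigma_\epsilon$ fixes the two constants through
\[
\kappa = {\epsilon_0\over\sigma_\epsilon^2},\qquad
m+1 = \left({\epsilon_0\over\sigma_\epsilon}\right)^2
\]
so that $\epsilon_0=1.0$, $\sigma_\epsilon=0.1$ corresponds to $\kappa=100$, $m=99$, while $\epsilon_0=1.0$, $\sigma_\epsilon=0.2$ corresponds to $\kappa=25$, $m=24$. Only combinations for which $m$ is a non-negative integer can arise from the subsidiary experiment as specified.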

733:

734: Given our choice of probability in eqn.~(\ref{eqn:pdf}),

735: the likelihood for the parameter $\epsilon$, given measured $\epsilon_0$,

736: is

737: %a likelihood approach to deducing

738: %\et\ from a measured $\epsilon_0$ would make use of

739: \begin{equation}

740: \mathcal{L}(\epsilon|\epsilon_0) = e^{-\kappa\epsilon}\kappa^m\epsilon^m/m!

741: \label{eqn:likelihood}

742: \end{equation}

743: This is the same function of $\epsilon$ and $\epsilon_0$ as eqn.~(\ref{eqn:pdf}), but

744: now $m$

745: is regarded as fixed, and $\epsilon$ is the variable. The likelihood is a continuous function of

746: $\epsilon$. It is compared with a Gaussian in Fig.~\ref{fig:comparison2}.

747:

748: Finally in the Bayes approach, with the choice of a constant prior for $\epsilon$, the posterior

749: probability density for $\epsilon$

750:  after our subsidiary measurement is

751: \begin{equation}

752: p(\epsilon|m) \propto \ e^{-\kappa\epsilon}\kappa^m\epsilon^m/m!

753: \label{eqn:posterior_e}

754: \end{equation}

755: which is obtained by multiplying the right-hand side of eqn.~(\ref{eqn:likelihood}) by unity.

756: This {\bf posterior} probability density for $\epsilon$ will be used as our {\bf prior}

757: for $\epsilon$ in the next step of deducing the limit for $s$.

758:

759: \section{Results}

760:

761: The details of the necessary analytical calculations\footnote{This

762: example can be handled analytically. More complicated cases might require

763: numerical integration, which can be done via numerical quadrature or

764: Monte Carlo methods.}  are presented in the Appendix of this note. In

765: this section we investigate the behavior of the Bayesian limits in

766: this example, especially the shape of the frequentist coverage

767: probability as a function of

768: \st.

769:

770:

771: \subsection{Shape of the posterior}

772:

773: The posterior p.d.f.\ for $s$ has the form

774: \begin{equation}

775: p(s|b,n)ds\propto

776: \left[\int_0^\infty

777: {e^{-(\epsilon s+b)}(\epsilon s+b)^n\over n!}

778: {\kappa  (\kappa\epsilon)^{m}  e^{-\kappa\epsilon} \over\Gamma(m+1)}d\epsilon

779: \right]1\,ds

780: \end{equation}

781: where the likelihood, the prior for $\epsilon$, the (constant) prior

782: for $s$, and the marginalization integral over $\epsilon$ are all

783: prominently displayed.

784:

785: The posterior probability density for $s$ gives the complete summary

786: of the outcome of the measurement in the Bayesian approach.  It is

787: therefore important to understand its shape before proceeding to use

788: it to compute a limit (or extract a central value and error-bars).

789:

790: Figure~\ref{pdfs} illustrates the shape of the posterior for $s$

791: (i.e.\ marginalized over $\epsilon$) in the case of a nominal 10\%

792: uncertainty on $\epsilon$, and an expectation of 3 background

793: events. Plots are shown for 1, 3, 5, and 10 observed events.  The

794: posterior evolves gracefully from being strongly peaked at $s=0$ to a

795: roughly Gaussian shape that excludes the neighborhood near $s=0$ with

796: high probability. Technically, the posterior would be described as a

797: mixture of $n+1$ Beta distributions of the 2nd kind\footnote{The Beta

798: distribution of the 2nd kind is also known as ``$\mathrm{Beta}'$''

799: (i.e.\ ``Beta prime''), ``inverted Beta'', ``Pearson Type~VI'',

800: ``Variance-Ratio'', ``Gamma-Gamma'', ``F'', ``Snedecor'',

801: ``Fisher-Snedecor''\ldots.}, giving

802: it a tail at high $s$ that is heavier than that of a Gaussian.

803:

804: \subsection{Upper limits}

805:

806: In this note, our main goal is to obtain a

807: Bayesian upper limit $s_\mathrm{u}$ from our observation of $n$

808: events.  The upper limit is calculated by integrating the posterior p.d.f.\ out to

809: $s=s_\mathrm{u}$: a $\beta=90\%$

810: upper limit is defined so that the integral of the posterior from

811: $s=0$ to $s=s_\mathrm{u}$ is 0.9.  The probability (in the Bayesian

812: sense) of $\st<s_\mathrm{u}$ is then exactly $\beta$.

813:

814: Table~\ref{ltablex} shows the upper limits ($\beta=0.9$) for $n=0$--20

815: observed events with $b=0$--8 and $\epsilon=1.0\pm0.1$.

816: (Integer values of $b$ are chosen for illustration purposes only;

817: $b$ can, of course, take any real value $\ge0$.)

818:

819: One notices

820: that when $n=0$, the limit is independent of the expected background

821: $b$. This is required in the Bayesian approach: we know that exactly

822: zero background events were produced (when no events at all were

823: produced), and this knowledge of what {\em did} happen makes what

824: might have happened superfluous.  An interesting corollary

825: is that, when no events are observed, uncertainties in estimating

826: the background rate are of no consequence in the Bayesian approach,

827: and must not contribute any systematic uncertainty to the limit.

828: This reasoning does not hold in the frequentist framework,

829: where what might have happened definitely does influence the limit.
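
As a cross-check of the $n=0$ behaviour (a sketch using the marginalized posterior derived in the Appendix, with the flat prior $\alpha=1$ and the notation $\mu=m+1$ introduced there), the posterior for $n=0$ contains no $b$ dependence:
\[
p(s|b,n=0)\,ds = {(\mu-1)\,\kappa^{\mu-1}\over(s+\kappa)^{\mu}}\,ds
\qquad\Rightarrow\qquad
\int_0^{s_\mathrm{u}}\!\!p(s|b,n=0)\,ds = 1-\left({\kappa\over s_\mathrm{u}+\kappa}\right)^{\mu-1}=\beta
\]
Solving for the limit gives $s_\mathrm{u}=\kappa\left[(1-\beta)^{-1/(\mu-1)}-1\right]$. For $\beta=0.9$ and $\epsilon=1.0\pm0.1$ (i.e.\ $\kappa=\mu=100$) this evaluates to $s_\mathrm{u}\simeq2.353$ for any $b$, compared with $s_\mathrm{u}=\ln10\simeq2.303$ when $\epsilon=1$ is known exactly.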

830:

831:

832:

833: For comparison, limits for fixed $\epsilon=1$ with $b=0$ or $b=3$ are

834: also shown in Table~\ref{ltablex}. It is interesting that these two

835: columns start out equal at $n=0$ and differ by almost exactly 3 for

836: $n>11$. In contrast, the difference between the $b=0$ and $b=3$

837: columns for $\epsilon=1.0\pm0.1$ is already greater than 3 at $n=6$, and

838: continues to grow as $n$ increases; it is not clear whether

839: the difference approaches a finite value as $n\to\infty$.

840: In any case, the limits for $\epsilon=1$ exactly are all smaller

841: than the corresponding limits for $\epsilon=1.0\pm0.1$,

842: as expected.

843:

844:

845: \subsection{Coverage}

846:

847: The main quantity of interest in this subsection is the frequentist

848: coverage probability $C$ as a function of \st\ (for fixed \et\ and

849: $b$).  Because both the main and the subsidiary measurements involve

850: observing a discrete number of events, the function $C(\st)$ will have

851: many discontinuities. On the other hand, $C(\et)$ will be continuous

852: (for fixed \st). The explanation of this effect is as follows:

853:

854: \begin{quotation}

855: \footnotesize

856:

857: The measured data are $n$ events in the main measurement and $m$ events

858: in the subsidiary measurement.  For each observed outcome $(n,m)$

859: there is a limit $s_\mathrm{u}(n,m)$. This limit includes the effect

860: of marginalization over $\epsilon$.

861:

862: All $(n,m)$ with $n\ge0$ and $m\ge0$ are possible, and the probability

863: $P$ of observing $(n,m)$ can be calculated as the product of two

864: Poissons.  (It will depend on \st, \et, \ldots)  If we look at all the

865: possible limits we can obtain,

866: \begin{equation}

867:            \{ s_\mathrm{u}(n,m) | n\ge0\ \mathrm{and}\ m\ge0\}

868: \end{equation}

869: and sort them in increasing $s_\mathrm{u}$, the $s_\mathrm{u}$ are

870: countably infinite in number and dense in the same way that rational

871: numbers are dense in the reals.

872:

873: To compute the coverage as a function of \st, we simply add up all

874: the probabilities of obtaining $(n,m)$ with $s_\mathrm{u}(n,m)\ge\st$:

875: \begin{equation}

876: C=\!\!\sum_{(n,m)\in\mathcal{A}}\!\!P(n,m)\qquad\qquad

877: \mathcal{A}=\{(n,m) | \st\le

878: s_\mathrm{u}(n,m)\ \mathrm{and}\ n\ge0\ \mathrm{and}\ m\ge0\}

879: \end{equation}

880: This sum is over a countably infinite number of terms.  If we

881: increase $\st$ slightly to $\st+ds$ and recalculate the coverage, we

882: have to drop all the terms

883: \begin{equation}

884:    \{ (n,m) | \st \le s_\mathrm{u}(n,m) \le \st+ds \}

885: \end{equation}

886: from the previous sum (the $P(n,m)$ for each term also changes

887: continuously with $\st$, but this is no problem).  If there are $M>0$

888: such terms, there are $M$ discontinuities in the coverage in the

889: interval $[\st,\st+ds]$, since $P(n,m)$ for each of these is finite,

890: and we lose them one by one as we sweep across the interval

891: $[\st,\st+ds]$.

892:

893: But it seems that, in general, we can always find a solution to

894: $\st\le s_\mathrm{u}(n,m)\le\st+ds$ for finite $ds$ by going out to larger and

895: larger $n$ and $m$.  So, although the discontinuity may be tiny, we can

896: always find a nonzero discontinuity in any finite interval of $\st$.

897:

898: On the other hand, if we keep $\st$ fixed and vary $\et$, we always

899: sum over the same set of $(n,m)$, since the definition of

900: $\mathcal{A}$ does not involve \et, and $P(n,m)$ is continuous in

901: \et.  So the coverage is continuous as a function of \et\ for \st\

902: fixed.

903:

904: \end{quotation}
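
Written out explicitly (a sketch combining the Poisson mean $\et\st+b$ of the main measurement with the Poisson mean $\kappa\et$ of the subsidiary measurement), the probability entering the coverage sum is
\[
P(n,m)={e^{-(\et\st+b)}(\et\st+b)^n\over n!}\;
{e^{-\kappa\et}(\kappa\et)^m\over m!}
\]
which is continuous in both \st\ and \et; the discontinuities of $C(\st)$ arise only from changes in the membership of $\mathcal{A}$.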

905:

906: Plotting a curve whose discontinuities are dense is somewhat

907: problematic. The solution adopted here is to plot the coverage as

908: straight line segments between the discontinuities, ignoring any

909: discontinuities with $|\Delta C|<10^{-4}$.  Figure~\ref{l100} shows

910: $C(\st)$ for the case $\beta=90\%$, $\et=1$, nominal 10\% uncertainty

911: of the subsidiary measurement of $\epsilon$, and $b=3$. We observe

912: that $C(\st)>\beta$ in this range, and it is not clear numerically

913: whether $C(\st)\to\beta$ as $\st\to\infty$. The same conclusions hold

914: for Fig.~\ref{l25}, which illustrates the same situation with a 20\%

915: nominal uncertainty for the $\epsilon$-measurement.

916:

917: Figure~\ref{ecov} shows $C(\et)$ for $\beta=90\%$, $\st=10$,

918: $\kappa=100$, and $b=3$---continuous as advertised.  The shape of the

919: curve is quite similar to that of Figs.~\ref{l100} and \ref{l25}, so

920: it seems that the coverage probability (with $b$ fixed) is

921: approximately a function of just the product of \et\ and \st. This

922: approximate rule is likely to fail in the limit as $\et\to0$ and

923: $\st\to\infty$, for example, but it seems to hold when \et\ and

924: \st\ are at least of the same order of magnitude.

925:

926: When $\et\st$ is small, of order 1 or less, the coverage is

927: $\sim$100\%, as in the simple case of Fig.~\ref{fig:Bayes}.

928: Otherwise, the behavior of coverage in Figs.~\ref{l100}--\ref{ecov}

929: is superior to

930: that of Fig.~\ref{fig:Bayes}, which has

931: a much larger amplitude of oscillation.

932:

933: Another frequentist quantity that characterizes the performance of a

934: limit scheme is the sensitivity, defined as the mean of

935: $s_\mathrm{u}$.  Figure~\ref{sens} shows the sensitivity as a function

936: of \st\ for the case of Fig.~\ref{l100}; $\langle

937: s_\mathrm{u}\rangle$ is observed to be nearly linearly dependent on

938: \st. There is one complication here: when the subsidiary

939: measurement observes $m=0$ events, and the prior for $s$ is flat,

940: $s_\mathrm{u}=\infty$. Since the Poisson probability of obtaining

941: $m=0$ is always finite, $\langle s_\mathrm{u}\rangle$ is consequently

942: infinite. So we must exclude the $m=0$ case from the

943: definition of $\langle s_\mathrm{u}\rangle$. (In Fig.~\ref{sens}

944: the probability of obtaining $m=0$ is $e^{-100}\simeq4\times10^{-44}$.)

945:

946:

947:

948: \subsection{Other priors for $s$}

949:

950: A weakness of the Bayesian approach is that there is no

951: universally accepted method to obtain a unique ``non-informative''

952: or ``objective'' prior p.d.f. Reference~\cite{roots}, for example,

953: states:

954: \begin{quote}

955: Put bluntly: data cannot ever speak entirely for themselves; every

956: prior specification has {\em some} informative posterior or predictive

957: implications; and ``vague'' is itself much too vague an idea to be

958: useful.  There is no ``objective'' prior that represents ignorance.

959: \end{quote}

960: Nevertheless, Ref.~\cite{roots} does derive a $1/\sqrt{s}$ ``reference

961: prior'' for the simple Poisson case, which is claimed to have ``a

962: minimal effect, relative to the data, on the final inference''.  This

963: is to be considered a ``{\em default} option when there are

964: insufficient resources for detailed elicitation of actual prior

965: knowledge''.

966:

967: Reference~\cite{ref:Jeffreys} attempts to discover the optimal form for

968: prior ignorance by considering the behavior of the prior under

969: reparameterizations.  For the case in question, the form $1/s$ clearly

970: has the best properties in this respect.

971:

972: We are using a flat ($s^0$) prior for this study, which

973: seems to be the most popular choice, but the Appendix works out the

974: form of the posterior using an $s^{\alpha-1}$ prior, so we can briefly

975: summarize here the results for the $1/s$ and $1/\sqrt{s}$ cases:

976:

977: The $1/s$ prior leads to an unnormalizable posterior for all observed

978: $n$ when $b>0$. The posterior becomes a $\delta$-function at $s=0$,

979: $s_\mathrm{u}=0$ for any $\beta$, and the coverage is consequently

980: zero for all $\st>0$. This clearly is a disaster.
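
The origin of the problem can be seen from the behaviour of the integrand near $s=0$ (a sketch for fixed $\epsilon$; marginalizing over $\epsilon$ does not change the conclusion). With the $1/s$ prior and $b>0$,
\[
{e^{-(\epsilon s+b)}(\epsilon s+b)^n\over n!}\,{1\over s}
\;\longrightarrow\;
{e^{-b}b^n\over n!}\,{1\over s}\qquad(s\to0)
\]
and $\int_0^{s_0}ds/s$ diverges logarithmically for every $s_0>0$, so all of the posterior probability is concentrated at $s=0$. When $b=0$ and $n\ge1$, the factor $(\epsilon s)^n$ cancels the singularity and the posterior is normalizable.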

981:

982: The $1/\sqrt{s}$ prior results in a posterior p.d.f.\ qualitatively

983: similar in shape to those of Fig.~\ref{pdfs}, except that the p.d.f.\

984: is always infinite at $s=0$. For $n\gg b$, this produces an extremely

985: thin ``spike'' at $s=0$, which has a negligible contribution to the

986: integral of the posterior p.d.f. A more significant difference (for

987: frequentists) between the $1/\sqrt{s}$ and the $s^0$ case is that the

988: coverage probability is significantly reduced: for the case of

989: Fig.~\ref{l100} the $1/\sqrt{s}$ prior pushes the minimum coverage

990: down to $\sim$0.87.  So the $1/\sqrt{s}$ prior leads to violation of

991: the frequentist coverage requirement; it undercovers for some

992: values of \st.

993:

994: %From the practical point of view---trying to upset as few people as

995: %possible---the $s^0$ prior for the case considered in this note seems

996: %more universally acceptable than the $1/\sqrt{s}$ prior, which is

997: %objectionable from a frequentist point of view.

998: One might also seek to

999: further improve the coverage properties by adopting an intermediate

1000: prior. For example, an $s^{-0.25}$ prior would reduce the level of

1001: overcoverage obtained with the $s^0$ prior.  How acceptable this

1002: approach would be within the Bayesian statistical community is an

1003: interesting question.

1004:

1005: It should be noted that all the prior p.d.f.'s considered in this note

1006: are ``improper priors''---they cannot be correctly normalized: in the

1007: case of the $s^0$ and $1/\sqrt{s}$ priors, the integral from 0 to any

1008: value $s_0$ is finite, while the integral from $s_0$ to infinity is

1009: infinite. The corresponding integrals of the $1/s$ prior are infinite

1010: on both sides for all $s_0>0$. Improper priors are dangerous but often

1011: useful; ``improper posteriors'' are generally pathological.  Extra

1012: care must be taken when employing improper priors to verify the

1013: normalizability of the resulting posterior---when using a numerical

1014: method to obtain the posterior, it is very easy to miss the fact that

1015: its integral is infinite.

1016:

1017:

1018:

1019: \subsection{Restrictions}

1020:

1021: We summarize here the restrictions forced on the priors for $s$ and

1022: $\epsilon$---see the Appendix for the analytical causes.  The

1023: discussion below assumes $b>0$.  The prior for $s$ being of the form

1024: $s^{\alpha-1}$, we must require $\alpha>0$, as discussed above.

1025:

1026: As specified in this note, the prior for $\epsilon$, being taken from

1027: the posterior from the subsidiary measurement with a flat prior, has

1028: been given no freedom. Should the subsidiary measurement observe $m=0$

1029: events, the posterior for $s$ is not normalizable when $\alpha\ge1$:

1030: $s_\mathrm{u}=\infty$ when $m=0$ and $\alpha\ge1$.

1031:

1032: This behavior is due to a well-known effect: the $\epsilon$ prior

1033: becomes $\kappa e^{-\kappa\epsilon}$ when $m=0$, which remains finite

1034: as $\epsilon\to0$. All such cases\footnote{A Gaussian truncated at

1035: $\epsilon=0$ is the standard example.} yield $s_\mathrm{u}=\infty$ when

1036: $\alpha\ge1$; any positive $\alpha<1$ cuts off the posterior at large

1037: $s$ sufficiently rapidly to render it normalizable. From this point of

1038: view, a $1/\sqrt{s}$ prior may seem preferable, but on the other hand,

1039: having $s_\mathrm{u}=\infty$ when $m=0$ seems intuitively reasonable.

1040: (In general, we have $s_\mathrm{u}=\infty$ for $m\le\alpha-1$, but

1041: $\alpha\ge2$ are not popular choices.)
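
The divergence can be made explicit in the simplest case (a sketch for $n=0$, $m=0$ and the flat prior $\alpha=1$). Marginalizing the likelihood over the prior $\kappa e^{-\kappa\epsilon}$ gives
\[
\int_0^\infty e^{-(\epsilon s+b)}\,\kappa e^{-\kappa\epsilon}\,d\epsilon
={\kappa\,e^{-b}\over s+\kappa}
\]
which falls only as $1/s$ at large $s$, so its integral over $s$ diverges logarithmically. Multiplying by an $s^{\alpha-1}$ prior with $0<\alpha<1$ changes the large-$s$ behaviour to $s^{\alpha-2}$, which is integrable.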

1042:

1043: There is another approach possible to the gamma prior for $\epsilon$:

1044: we may simply specify by fiat the form of the prior as

1045: \begin{equation}

1046: p(\epsilon|\mu)d\epsilon =

1047: {\kappa  (\kappa\epsilon)^{\mu-1}  e^{-\kappa\epsilon} \over\Gamma(\mu)}

1048: d\epsilon

1049: \end{equation}

1050: where $\mu$ is no longer required to be an integer.  In practice, one

1051: then might obtain $\mu$ and $\kappa$ from a subsidiary measurement

1052: whose result is approximated by the gamma distribution.

1053: In such cases,

1054: one must require $\mu>\alpha$ to keep the posterior normalizable. Note

1055: that in this form, $\mu/\kappa$ is the mean of the $\epsilon$ prior,

1056: $(\mu-1)/\kappa$ is the mode, and $\mu/\kappa^2$ is the variance.

1057: The subsidiary measurement is often analysed by other experimenters,

1058: who choose which statistics to quote for their central value and uncertainty

1059: (omitting additional likelihood information).

1060: It is then important to obtain $\mu$ and $\kappa$ in a consistent way

1061: from the information supplied by the subsidiary measurement.  If

1062: $\epsilon$, for example, were estimated by a maximum likelihood

1063: method, one would identify the estimate with $(\mu-1)/\kappa$ rather

1064: than $\mu/\kappa$.
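
As a hypothetical illustration (the numbers and the choice of quoted statistics are assumptions made here, not taken from any measurement), suppose the subsidiary result is quoted as a maximum likelihood estimate $\hat\epsilon=1.0$ together with a standard deviation $\sigma_\epsilon=0.1$, and that these are identified with the mode and the square root of the variance of the gamma prior. Then
\[
{\mu-1\over\kappa}=\hat\epsilon,\quad
{\mu\over\kappa^2}=\sigma_\epsilon^2
\qquad\Rightarrow\qquad
\sigma_\epsilon^2\kappa^2-\hat\epsilon\,\kappa-1=0,\quad
\kappa={\hat\epsilon+\sqrt{\hat\epsilon^2+4\sigma_\epsilon^2}\over2\sigma_\epsilon^2}
\]
giving $\kappa\simeq100.99$ and $\mu=1+\kappa\hat\epsilon\simeq101.99$, rather than the $\kappa=\mu=100$ obtained by identifying $\hat\epsilon$ with the mean.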

1065:

1066:

1067: \section{Conclusions}

1068:

1069:

1070: Results have been presented on the performance of a purely Bayesian

1071: approach to the issue of setting upper limits on the rate of a

1072: process, when $n$ events have been observed in a situation where the

1073: expected background is $b$ and where the efficiency/acceptance factor

1074: $\epsilon \pm \sigma_{\epsilon}$ has been determined in a subsidiary

1075: experiment.

1076: We find that this approach, when using a flat prior for the rate,

1077: results in modest overcoverage.

1078: Plots of the expected sensitivity of such a measurement

1079: and of the coverage of the upper limits are given.

1080: It will be

1081: interesting to compare these with the corresponding plots for other

1082: methods of extracting upper limits, to be given in future notes.

1083: Reference~\cite{software} provides the limit calculating software

1084: associated with this study in the form of C~functions.

1085:

1086:

1087: \section{Appendix A---Analytical Details}

1088:

1089: %%%%%%%%%%%%%%%%%%%%%%%%%%%%5

1090:

1091:

1092: Here we present the details of the analytical calculation of the

1093: posterior p.d.f.\ for $s$. For generality, we work through the

1094: calculation with an $s^{\alpha-1}$ prior; a flat prior is then the

1095: special case $\alpha=1$.

1096:

1097: \subsection{Posterior for $s$ with $\epsilon$ and $b$ fixed }

1098:

1099: We measure $n$ events from a process with Poisson rate $\epsilon s+b$,

1100: and we want the Bayesian posterior for $s$, given improper prior

1101: $s^{\alpha-1}$.  We compute the posterior for fixed $\epsilon$ and $b$

1102: in this subsection; the calculation with our prior

1103: for $\epsilon$ follows in the next subsection. We have

1104: \[

1105: \mbox{posterior:}\quad p(s|\epsilon,b,n)ds={1\over\mathcal{N}_s}

1106: e^{-\epsilon s}(\epsilon s+b)^ns^{\alpha-1}ds

1107: \]

1108: where all factors not depending on $s$ have already been absorbed

1109: into the normalization constant $\mathcal{N}_s$, which is defined by

1110: \[

1111: \mathcal{N}_s=

1112: \int_0^\infty e^{-\epsilon s}(\epsilon s+b)^ns^{\alpha-1}ds=

1113: {b^{n+\alpha}\over\epsilon^\alpha}

1114: \int_0^\infty e^{-bu}u^{\alpha-1}(1+u)^ndu\qquad(u=s\epsilon/b)

1115: \]

1116: where we have performed the indicated change of variable.

1117:

1118: Expanding

1119: $(1+u)^n$ in powers of $u$ using the binomial theorem, we get

1120: \[

1121: (1+u)^n=n!\sum_{k=0}^n{u^{n-k}\over(n-k)!k!}\qquad\Rightarrow\qquad

1122: \mathcal{N}_s=

1123: n!\epsilon^{-\alpha}

1124: \sum_{k=0}^n{\Gamma(\alpha+n-k)b^k\over(n-k)!k!}

1125: \]

1126: Recognizing this as of the general hypergeometric form, we write it as

1127: \[

1128: \mathcal{N}_s=

1129: \epsilon^{-\alpha}

1130: \Gamma(\alpha+n)\left[1+{n\over\alpha+n-1}{b\over1!}+

1131: {n(n-1)\over(\alpha+n-1)(\alpha+n-2)}{b^2\over2!}+\cdots\right]

1132: \]

1133: to make the hypergeometric nature more explicit.  Using the modern

1134: notation\cite{ff} for the falling factorial

1135: \[

1136: z^{\underline{k}}\equiv

1137: {\Gamma(z+1)\over\Gamma(z-k+1)}=z(z-1)(z-2)\cdots(z-k+1)

1138: \]

1139: this is expressed as

1140: \[

1141: \mathcal{N}_s=

1142: \epsilon^{-\alpha}\Gamma(\alpha+n)

1143: \sum_{k=0}^n{n^{\underline{k}}\over(\alpha+n-1)^{\underline{k}}}{b^k\over k!}=

1144: \epsilon^{-\alpha}\Gamma(\alpha+n)M(-n,1-n-\alpha,b)

1145: \]

1146: where $M$ is the notation of \cite{as}.

1147: ($M$, a confluent hypergeometric function, is often written $_1F_1$,

1148: and the relation given here is only valid for integer $n\ge0$.)  Note

1149: that $M(-n,1-n-\alpha,b)$ is a polynomial of degree $n$ in $b$ (for $n$

1150: a non-negative integer), and is related to the Laguerre polynomials.

1151: When $\alpha=1$, we get $M(-n,-n,b)$, which is related to the Incomplete

1152: Gamma Function.

1153: When $\alpha=0$, we get $M(-n,1-n,b)$, which is infinite, so we require

1154: that $\alpha>0$.

1155:

1156:

1157: Our posterior probability density for fixed $\epsilon$

1158: is then given by

1159: \[

1160: p(s|\epsilon,b,n)ds=

1161: {\epsilon^\alpha e^{-\epsilon s}(\epsilon s+b)^ns^{\alpha-1}

1162: \over\Gamma(\alpha+n)M(-n,1-n-\alpha,b)}ds

1163: \]
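
As a simple check (a sketch for the special case $n=0$ with the flat prior $\alpha=1$), $M(0,\cdot,b)=1$ and $\Gamma(1)=1$, so the posterior reduces to an exponential that does not depend on $b$:
\[
p(s|\epsilon,b,n=0)\,ds=\epsilon\,e^{-\epsilon s}\,ds
\qquad\Rightarrow\qquad
1-e^{-\epsilon s_\mathrm{u}}=\beta
\qquad\Rightarrow\qquad
s_\mathrm{u}=-{\ln(1-\beta)\over\epsilon}
\]
For $\beta=0.9$ and $\epsilon=1$ this gives $s_\mathrm{u}=\ln10\simeq2.303$.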

1164:

1165:

1166: \subsection{Posterior for $\epsilon$ of the subsidiary measurement}

1167:

1168: The subsidiary measurement observes an integer number of events $m$,

1169: Poisson distributed as:

1170: \[

1171: P(m|\epsilon) = {e^{-\kappa\epsilon}  (\kappa\epsilon)^m\over m!}

1172: \]

1173: where $\kappa$ is a real number (connecting the subsidiary measurement to the

1174: main measurement) whose uncertainty is negligible, so $\kappa$ can safely be

1175: treated as a fixed constant.  $\kappa$ might be thought of, for example, as

1176: based on a cross section that is exactly calculable by theory.  There

1177: is negligible (i.e.\ zero) background in the subsidiary measurement.

1178:

1179: The prior for $\epsilon$ is specified to be flat.

1180: The Bayesian posterior p.d.f.\ for $\epsilon$ is then

1181: \[

1182: p(\epsilon|m) = {\kappa  (\kappa\epsilon)^m  e^{-\kappa\epsilon} \over m!}

1183: \]

1184: (or $\Gamma(m+1)$ instead of $m!$ in the denominator if you prefer).

1185: This is known as a gamma distribution.

1186:

1187: The mean and rms of this posterior p.d.f.\ summarize the result of the

1188: subsidiary measurement as:

1189: \[

1190: \epsilon = {m + 1\over\kappa}   \pm

1191: {\sqrt{m + 1}\over\kappa} = \epsilon_0 \pm \sigma_\epsilon

1192: \]

1193: Note that the observed data quantity in the subsidiary measurement is

1194: an integer $m$, while the quantity being measured by the subsidiary

1195: measurement is a positive real number $\epsilon$.

1196:

1197: \subsection{Posterior for $s$ with gamma prior for $\epsilon$ ($b$ fixed)\label{postpdf}}

1198:

1199:

1200: Next we compute the joint posterior $p(s,\epsilon|b,n)dsd\epsilon$

1201: using the $s^{\alpha-1}$ prior for $s$ and

1202: our gamma distribution prior (i.e.\ the posterior derived

1203: from the subsidiary measurement) for $\epsilon\ge0$

1204: \[

1205: \mbox{prior for $\epsilon$:}\quad \pi(\epsilon)d\epsilon=

1206: {(\kappa\epsilon)^\mu e^{-\kappa\epsilon}\over\Gamma(\mu)}

1207: {d\epsilon\over\epsilon}

1208: \qquad\qquad\mu=m+1=(\epsilon_0/\sigma_\epsilon)^2\qquad

1209: \kappa=\epsilon_0/{\sigma_\epsilon}^2

1210: \]

1211: where it is convenient to write $\mu$ for $m+1$.

1212: We have for the joint posterior p.d.f.

1213: \[

1214: p(s,\epsilon|b,n)dsd\epsilon={1\over\mathcal{N}_{s,\epsilon}}

1215: \pi(\epsilon)e^{-\epsilon s}(\epsilon s+b)^ns^{\alpha-1}dsd\epsilon

1216: \]

1217: where

1218: \[

1219: \mathcal{N}_{s,\epsilon}=\int_0^\infty\!\!\!\int_0^\infty\!\!

1220: \pi(\epsilon)e^{-\epsilon s}(\epsilon s+b)^ns^{\alpha-1}dsd\epsilon=

1221: \int_0^\infty\!\!\pi(\epsilon)\mathcal{N}_sd\epsilon

1222: \]

1223: We calculated $\mathcal{N}_s$ above, so we have

1224: \[

1225: \mathcal{N}_{s,\epsilon}=

1226: \Gamma(\alpha+n)M(-n,1-n-\alpha,b)

1227: \int_0^\infty\!\!\epsilon^{-\alpha}\pi(\epsilon)d\epsilon

1228: \]

1229: \[

1230: \mathcal{N}_{s,\epsilon}=

1231: \kappa^\alpha\Gamma(\mu-\alpha)\Gamma(\alpha+n)M(-n,1-n-\alpha,b)/\Gamma(\mu)

1232: \]

1233: \[

1234: p(s,\epsilon|b,n)dsd\epsilon=

1235: {\kappa^{\mu-\alpha}\epsilon^{\mu-1}s^{\alpha-1}(\epsilon s+b)^n

1236: e^{-(s+\kappa)\epsilon}\over

1237: \Gamma(\mu-\alpha)\Gamma(\alpha+n)M(-n,1-n-\alpha,b)}dsd\epsilon

1238: \]

1239: The marginalized posterior for $s$ can then be expressed as

1240: \[

1241: p(s|b,n)ds=\left[\int_0^\infty\!\!p(s,\epsilon|b,n)d\epsilon\right]ds=

1242: {s^{\alpha-1}\kappa^{\mu-\alpha}\mathcal{I}_\epsilon\over

1243: \Gamma(\mu-\alpha)\Gamma(\alpha+n)M(-n,1-n-\alpha,b)}ds

1244: \]

1245: where the integral $\mathcal{I}_\epsilon$ is given by

1246: \[

1247: \mathcal{I}_\epsilon=

1248: \int_0^\infty\epsilon^{\mu-1}e^{-(s+\kappa)\epsilon}

1249: (\epsilon s+b)^nd\epsilon

1250: \]

1251: The same procedure that was used for the normalization integral can

1252: be applied here, producing

1253: \[

1254: \mathcal{I}_\epsilon=

1255: {b^{\mu+n}\over s^\mu}

1256: \int_0^\infty u^{\mu-1}e^{-b(1+\kappa/s)u}(1+u)^ndu

1257: \]

1258: \[

1259: \mathcal{I}_\epsilon=

1260: {s^nn!\over(s+\kappa)^{\mu+n}}

1261: \sum_{k=0}^n{\Gamma(\mu+n-k)\over

1262: (n-k)!k!}\left[b(s+\kappa)\over s\right]^k

1263: \]

1264: \[

1265: \mathcal{I}_\epsilon=

1266: {s^n\over(s+\kappa)^{\mu+n}}\Gamma(\mu+n)

1267: M(-n,1-n-\mu,b(s+\kappa)/s)

1268: \]

1269: \[

1270: p(s|b,n)ds=

1271: {\Gamma(\mu+n)\over\Gamma(\mu-\alpha)\Gamma(\alpha+n)}

1272: {s^{\alpha+n-1}\kappa^{\mu-\alpha}\over(s+\kappa)^{\mu+n}}

1273: {M(-n,1-n-\mu,b(s+\kappa)/s)\over M(-n,1-n-\alpha,b)}ds

1274: \]

1275: which has a particularly simple form when the background term is zero:

1276: \[

1277: p(s|b=0,n)ds=

1278: {\Gamma(\mu+n)\over\Gamma(\mu-\alpha)\Gamma(\alpha+n)}

1279: {s^{\alpha+n-1}\kappa^{\mu-\alpha}\over(s+\kappa)^{\mu+n}}ds

1280: \]

1281: a Beta distribution of the 2nd kind. Note that we must require

1282: $\mu>\alpha>0$ to obtain a normalizable posterior.
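
Expanding the polynomial in the numerator makes the mixture structure of the general ($b>0$) posterior explicit (a sketch obtained by re-expanding $M(-n,1-n-\mu,b(s+\kappa)/s)$ term by term; the weights $w_k$ are notation introduced here):
\[
p(s|b,n)\,ds=\sum_{k=0}^n w_k\,
{\Gamma(\mu+n-k)\over\Gamma(\mu-\alpha)\Gamma(\alpha+n-k)}
{s^{\alpha+n-k-1}\kappa^{\mu-\alpha}\over(s+\kappa)^{\mu+n-k}}\,ds
\]
\[
w_k={n^{\underline{k}}\over(\alpha+n-1)^{\underline{k}}}{b^k\over k!}
\Bigg/\sum_{j=0}^n{n^{\underline{j}}\over(\alpha+n-1)^{\underline{j}}}{b^j\over j!}
\]
Each term is the $b=0$ posterior with $n$ replaced by $n-k$, and the weights sum to unity. For the flat prior ($\alpha=1$) the weights simplify to $w_k\propto b^k/k!$.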

1283:

1284: Our posterior p.d.f.\ for $s$ with $\epsilon$ (and $b$) fixed is

1285: recovered exactly by taking the limit of $p(s|b,n)$ as

1286: $\sigma_\epsilon\to0$. This means that the limit of $s_{\mathrm{u}}$

1287: as $\sigma_\epsilon\to0$ is identical to the value of $s_{\mathrm{u}}$

1288: when $\epsilon$ is known exactly. This property may seem obvious, but

1289: it is violated by some frequentist methods of setting limits,

1290: so it is worth mentioning.

1291:

1292: \subsection{Calculating the limit \label{intpostpdf}}

1293:

1294: We need to integrate $p(s|b,n)$ up to some limit $s_\mathrm{u}$, which

1295: can be done analytically as follows.

1296: \[

1297: \int_0^{s_\mathrm{u}}\!\!p(s|b,n)ds=

1298: {\Gamma(\mu+n)\over\Gamma(\mu-\alpha)\Gamma(\alpha+n)}

1299: \int_0^{s_\mathrm{u}\over s_\mathrm{u}+\kappa}

1300: t^{\alpha+n-1}(1-t)^{\mu-\alpha-1}

1301: {M(-n,1-n-\mu,b/t)\over M(-n,1-n-\alpha,b)}dt

1302: \]

1303: where the substitution $t={s\over s+\kappa}$ has been performed.

1304: Re-expanding the polynomial $M$ and integrating term by term yields

1305: \[

1306: \int_0^{s_\mathrm{u}}\!\!p(s|b,n)ds=

1307: \sum_{k=0}^n

1308: {I_x(\alpha+n-k,\mu-\alpha)n^{\underline{k}}\over(\alpha+n-1)^{\underline{k}}}

1309: {b^k\over k!}\Bigg/\sum_{k=0}^n

1310: {n^{\underline{k}}\over(\alpha+n-1)^{\underline{k}}}

1311: {b^k\over k!}\qquad\left(x={s_\mathrm{u}\over s_\mathrm{u}+\kappa}\right)

1312: \]

1313: where $I_x$ is the standard notation for the Incomplete Beta Function

1314: \[

1315: I_x(q,r)\equiv

1316: {\Gamma(q+r)\over\Gamma(q)\Gamma(r)}

1317: \int_0^xt^{q-1}(1-t)^{r-1}dt

1318: \]

1319: which also satisfies the following recursion:

1320: \[

1321: I_x(q,r)={\Gamma(q+r)\over\Gamma(q+1)\Gamma(r)}x^q(1-x)^r+I_x(q+1,r)

1322: \]
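For orientation, the simplest case can be worked out in closed form.
The sketch below takes $n=0$ and $\alpha=1$; the symbol $\beta$ for
the credibility level and the numerical values $\kappa=\mu=100$ are
illustrative choices of ours. With $n=0$ both sums reduce to their
$k=0$ term, and $I_x(1,r)=1-(1-x)^r$, so
\[
\int_0^{s_\mathrm{u}}\!\!p(s|b,0)ds=I_x(1,\mu-1)=
1-\left({\kappa\over s_\mathrm{u}+\kappa}\right)^{\mu-1}
\]
independent of $b$. Setting this equal to $\beta$ and solving gives
\[
s_\mathrm{u}=\kappa\left[(1-\beta)^{-1/(\mu-1)}-1\right]
\]
For $\beta=0.9$ and $\kappa=\mu=100$ this evaluates to
$s_\mathrm{u}\simeq2.35$, slightly above the value
$-\ln(0.1)\simeq2.30$ obtained in the limit $\mu=\kappa\to\infty$.
For $n>0$, unrolling the recursion $k$ times expresses every
$I_x(q-k,r)$ needed in the sums, with $q=\alpha+n$ and $r=\mu-\alpha$,
in terms of the single value $I_x(q,r)$:
\[
I_x(q-k,r)=I_x(q,r)+\sum_{j=1}^{k}
{\Gamma(q-j+r)\over\Gamma(q-j+1)\Gamma(r)}x^{q-j}(1-x)^r
\]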

1323:

1324: %It is interesting to also look at the quantity

1325: %\[

1326: %p(\epsilon|b,n)d\epsilon=

1327: %\left[\int_0^\infty\!\!p(s,\epsilon|b,n)ds\right]d\epsilon

1328: %\]

1329: %which is the marginalized posterior for $\epsilon$. Once again,

1330: %the necessary integral has already been done above, and we

1331: %obtain

1332: %\[

1333: %p(\epsilon|b,n)d\epsilon=

1334: %{(\kappa\epsilon)^{\mu-\alpha}

1335: %e^{-\kappa\epsilon}\over\Gamma(\mu-\alpha)}

1336: %{d\epsilon\over\epsilon}

1337: %\]

1338: %which is independent of $b$ and $n$, and has the same form as the prior

1339: %for $\epsilon$, except that $\mu$ is replaced by $\mu-\alpha$.

1340: %If we had picked $\alpha=0$ (i.e.,\ a Jeffreys prior for $s$),

1341: %we would have the nice property that $p(\epsilon|b,n)=p(\epsilon)$,

1342: %but $\alpha=0$ has already been rejected above.

1343:

1344:

1345: %Another quantity of interest is the probability distribution

1346: %of a mixture of Poissons where the $\epsilon$ parameter

1347: %is given the Gamma distribution

1348: %\[

1349: %\mathcal{P}(n|b,s)=

1350: %\int_0^\infty {e^{-(s\epsilon+b)}(s\epsilon+b)^n\over n!}

1351: %{(\kappa\epsilon)^\mu e^{-\kappa\epsilon}\over\Gamma(\mu)}

1352: %{d\epsilon\over\epsilon}=

1353: %{e^{-b}\kappa^\mu\mathcal{I}_\epsilon\over n!\Gamma(\mu)}

1354: %\]

1355: %As indicated, the necessary integral here is

1356: %an integral we did above, so we get

1357: %\[

1358: %\mathcal{P}(n|b,s)=

1359: %{e^{-b}\kappa^\mu s^n\Gamma(\mu+n)M(-n,1-n-\mu,b(s+\kappa)/s)\over

1360: %n!\Gamma(\mu)(s+\kappa)^{\mu+n}}

1361: %\]

1362: %whose zero-background special case

1363: %\[

1364: %\mathcal{P}(n|b=0,s)=

1365: %{\kappa^\mu s^n\Gamma(\mu+n)\over

1366: %n!\Gamma(\mu)(s+\kappa)^{\mu+n}}

1367: %\]

1368: %is a negative binomial distribution.

1369:

1370: %There is also an interesting expression for $\sum_{j=0}^n\mathcal{P}(j|b,s)$.

1371: %We start with

1372: %\[

1373: %\mathcal{P}(j|b,s)=e^{-b}\left[f(j-1)-f(j)+b^j/j!\right]

1374: %\]

1375: %where

1376: %\[

1377: %f(j)=\sum_{k=0}^jI_{s\over s+\kappa}(1+j-k,\mu){b^k\over k!}

1378: %\]

1379: %which follows from the recursion relation for $I_x(q,r)$ given above.

1380: %Summing, we obtain

1381: %\[

1382: %\sum_{j=0}^n\mathcal{P}(j|b,s)=e^{-b}M(-n,-n,b)-e^{-b}f(n)

1383: %\]

1384: %or equivalently

1385: %\[

1386: %1-{\sum_{j=0}^n\mathcal{P}(j|b,s)\over e^{-b}M(-n,-n,b)}=

1387: %{f(n)\over M(-n,-n,b)}

1388: %\]

1389:

1390: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%5

1391:

1392:

1393:

1394: \subsection{Integer moments of the marginalized posterior}

1395: Using the same technique as above, we can calculate the $j$th moment

1396: of the posterior p.d.f.\ as

1397: \[

1398: \langle s^j\rangle=\int_0^\infty\!\!s^jp(s|b,n)ds=

1399: {(\alpha+n)^{\overline{j}}\kappa^j\over(\mu-\alpha-1)^{\underline{j}}}

1400: {M(-n,1-n-\alpha-j,b)\over M(-n,1-n-\alpha,b)}

1401: \]

1402: where we utilize the rising factorial notation\cite{ff}

1403: \[

1404: z^{\overline{k}}\equiv

1405: {\Gamma(z+k)\over\Gamma(z)}=z(z+1)(z+2)\cdots(z+k-1)

1406: \]

1407:

1408: The expression for the mean of the posterior when $\alpha=1$

1409: can be simplified using the identity

1410: \[

1411: M(-n,-n-1,b)=\left(1-{b\over n+1}\right)M(-n,-n,b)+{b^{n+1}\over(n+1)!}

1412: \]

1413: obtaining

1414: \[

1415: \mathrm{mean}(\alpha=1)=\langle s\rangle|_{\alpha=1}=

1416: {\kappa(n+1-b)\over\mu-2}+

1417: {\kappa b^{n+1}\over(\mu-2)n!M(-n,-n,b)}

1418: \]

1419: Note that the 2nd term is very small when $n\gg b$.
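
Two quick checks of this expression are offered here as a sketch (the
numerical values are illustrative choices of ours). For $n=0$ the
polynomial $M(-n,-n,b)$ equals 1, and the two terms combine to
\[
\langle s\rangle|_{\alpha=1,n=0}=
{\kappa(1-b)\over\mu-2}+{\kappa b\over\mu-2}={\kappa\over\mu-2}
\]
independent of $b$, as it must be, since the posterior for $n=0$ does
not depend on $b$. For $n=10$, $b=3$ and $\kappa=\mu=100$, the first
term is $100\times8/98\simeq8.16$, while the second term is
$\simeq0.0025$.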

1420:

1421: The recurrence relation\cite{as}

1422: \[

1423: r(r-1)M(q,r-1,z)+r(1-r-z)M(q,r,z)+z(r-q)M(q,r+1,z)=0

1424: \]

1425: leads to a recurrence relation between moments

1426: \[

1427: \langle s^j\rangle=

1428: {\kappa(\alpha+n+j-1-b)\over\mu-\alpha-j}\langle s^{j-1}\rangle+

1429: {\kappa^2b(\alpha+j-2)\over(\mu-\alpha-j+1)(\mu-\alpha-j)}

1430: \langle s^{j-2}\rangle

1431: \]

1432: The special case $\alpha=1$ then yields

1433: \[

1434: \langle s^2\rangle|_{\alpha=1}=

1435: {\kappa^2\over(\mu-2)(\mu-3)}\left[

1436: (2+n-b)(1+n-b)+b+{(2+n-b)b^{n+1}\over n!M(-n,-n,b)}\right]

1437: \]

1438: which leads to this approximation for the variance of the posterior

1439: \[

1440: \mathrm{variance}(\alpha=1)\simeq

1441: {\kappa^2(1+n)\over(\mu-2)(\mu-3)}+

1442: {\kappa^2(1+n-b)^2\over(\mu-2)^2(\mu-3)}\qquad\qquad(n\gg b)

1443: \]
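The intermediate algebra is short enough to record. In the following
sketch the terms proportional to $b^{n+1}/n!M(-n,-n,b)$ are dropped,
which is the approximation indicated by $n\gg b$:
\[
\langle s^2\rangle\simeq
{\kappa^2\left[(1+n-b)^2+(1+n)\right]\over(\mu-2)(\mu-3)},\qquad
\langle s\rangle^2\simeq{\kappa^2(1+n-b)^2\over(\mu-2)^2}
\]
using $(2+n-b)(1+n-b)+b=(1+n-b)^2+(1+n)$. Subtracting, and noting that
\[
{1\over(\mu-2)(\mu-3)}-{1\over(\mu-2)^2}={1\over(\mu-2)^2(\mu-3)}
\]
reproduces the two terms of the approximate variance.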

1444:

1445: \subsection{Posterior for $s$ with gamma priors for $\epsilon$ and $b$}

1446:

Here we briefly consider the case in which the background parameter
$b$ is also uncertain. This case is more general than the
fixed $b$ case that is the main subject of this note; the fixed $b$ case
will also be the subject of additional studies employing various popular
frequentist techniques, with the goal of comparing their performance.
We judge the more general case considered in this subsection to be
more complicated than necessary for the purpose of comparing the
various methods, but it is instructive to document that the
Bayesian method handles the more general case easily.

1456:

1457: We assume a 2nd subsidiary measurement observing $r$ events (Poisson,

1458: as was the case for $\epsilon$), which, when combined with a flat

1459: prior for $b$, results in a gamma posterior for $b$ of the form

1460: \[

1461: p(b|r)db = {\omega  (\omega b)^r  e^{-\omega b} \over r!}db

1462: \]

1463: where $\omega$ is a calibration constant (analogous to $\kappa$ in the

1464: subsidiary measurement for~$\epsilon$).
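
For completeness, here is a sketch of how this form arises. We assume,
by analogy with the subsidiary measurement for $\epsilon$, that $r$ is
Poisson distributed with mean $\omega b$ (the explicit form of the
mean is our assumption). With a flat prior, Bayes' theorem gives
\[
p(b|r)=
{e^{-\omega b}(\omega b)^r/r!\over
\int_0^\infty e^{-\omega b'}(\omega b')^r/r!\,db'}=
{\omega(\omega b)^re^{-\omega b}\over r!}
\]
since $\int_0^\infty e^{-\omega b'}(\omega b')^rdb'=r!/\omega$. This
is a gamma density with mean $(r+1)/\omega$ and variance
$(r+1)/\omega^2$.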

1465:

1466: The posterior for $b$ becomes the prior for $b$ in the measurement of

1467: $s$. After determining the joint posterior $p(s,\epsilon,b|n)$ by

1468: using our priors for $s$, $\epsilon$ and $b$, we marginalize with

1469: respect to $\epsilon$ and $b$, resulting in

1470: \[

1471: p(s|n)ds=

1472: {\Gamma(\mu+n)\over\Gamma(\mu-\alpha)\Gamma(\alpha+n)}

1473: {s^{\alpha+n-1}\kappa^{\mu-\alpha}\over(s+\kappa)^{\mu+n}}

1474: {F(-n,\rho;1-n-\mu;(s+\kappa)/(s\omega))\over F(-n,\rho;1-n-\alpha;1/\omega)}ds

1475: \]

where we write $\rho=r+1$ for convenience, and $F$ is the
hypergeometric function\cite{as2}. As long as $n$ is a non-negative
integer and $\alpha>0$, $F(-n,\rho;1-n-\alpha;x)$ is a polynomial of
degree $n$ in $x$ (closely related to Jacobi polynomials).

1480:

1481: This marginalized posterior for $s$ can then be integrated, with the result

1482: \[

1483: \int_0^{s_\mathrm{u}}\!\!p(s|n)ds=

1484: \sum_{k=0}^n

1485: {I_x(\alpha+n-k,\mu-\alpha)n^{\underline{k}}\rho^{\overline{k}}

1486: \over(\alpha+n-1)^{\underline{k}}}

1487: {\omega^{-k}\over k!}\Bigg/\sum_{k=0}^n

1488: {n^{\underline{k}}\rho^{\overline{k}}\over(\alpha+n-1)^{\underline{k}}}

1489: {\omega^{-k}\over k!}\quad

1490: \left(x={s_\mathrm{u}\over s_\mathrm{u}+\kappa}\right)

1491: \]

1492:

1493: These two equations closely resemble the main results of sections

1494: \ref{postpdf} and \ref{intpostpdf}: to recover the fixed $b$

1495: results, simply substitute $b\omega$ for $\rho$ above,

1496: and take the limit $\omega\to\infty$.
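
The limit can be verified term by term. As a sketch, with
$\rho=b\omega$ the $\omega$-dependent factor of the $k$th term is
\[
\rho^{\overline{k}}\omega^{-k}=
\prod_{i=0}^{k-1}{b\omega+i\over\omega}=
\prod_{i=0}^{k-1}\left(b+{i\over\omega}\right)\to b^k
\qquad(\omega\to\infty)
\]
so each term
$n^{\underline{k}}\rho^{\overline{k}}\omega^{-k}/(\alpha+n-1)^{\underline{k}}k!$
reduces to $n^{\underline{k}}b^k/(\alpha+n-1)^{\underline{k}}k!$, the
corresponding term of the fixed $b$ sums. In the same limit the gamma
density for the background, which has mean $\rho/\omega$ and variance
$\rho/\omega^2$, acquires mean $b$ and variance $b/\omega\to0$.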

1497:

1498:

1499: \section{Appendix B---Average Coverage Theorem\label{ac}}

1500:

1501: In this appendix we prove that Bayesian credible intervals have average frequentist

1502: coverage, where the average is calculated with respect to the prior density.

1503: We start from the Bayesian posterior density:

1504: \begin{equation}

1505: p(s\,|\,n)\;=\;\frac{P(n\,|\,s)\,\pi(s)}{\int_{0}^{\infty}\!P(n\,|\,s)\,\pi(s)\,ds}.

1506: \end{equation}

1507: For a given observed value of $n$, a credibility-$\beta$ Bayesian interval

1508: for $s$ is any interval $[s_\mathrm{L}(n), s_\mathrm{U}(n)]$ that encloses a fraction $\beta$

1509: of the total area under the posterior density.  Such an interval must therefore

1510: satisfy:

1511: \begin{equation}

1512: \beta \;=\; \int_{s_\mathrm{L}(n)}^{s_\mathrm{U}(n)}\! p(s\,|\,n)\,ds,

1513: \end{equation}

1514: or, using the definition of the posterior density:

1515: \begin{equation}

1516: \int_{s_\mathrm{L}(n)}^{s_\mathrm{U}(n)}\! P(n\,|\,s)\,\pi(s)\,ds\;=\;\beta\;\int_{0}^{\infty}

1517: \! P(n\,|\,s)\,\pi(s)\,ds.

1518: \label{eq:acbci1}

1519: \end{equation}

1520: Now for coverage.  Given a true value $s_\mathrm{t}$ of $s$, the coverage $C(s_\mathrm{t})$ of

1521: $[s_\mathrm{L}(n),s_\mathrm{U}(n)]$ is the frequentist probability that $s_\mathrm{t}$ is

1522: included in that interval.  We can write this as:

\begin{equation}
C(s_\mathrm{t})\;=\; \sum_{\substack{\text{all $n$ such that:}\\[1mm]
                            s_\mathrm{L}(n)\le s_\mathrm{t}\le s_\mathrm{U}(n)}}   P(n\,|\,s_\mathrm{t}).
\label{eq:acbci2}
\end{equation}

1528: Next we calculate the average coverage $\overline{C}$, weighted by the prior $\pi(s)$:

1529: \addtocounter{footnote}{1}

1530: \protect\footnotetext{The best way to understand this step is to draw a diagram of

1531: $s$ versus $n$: one is integrating and summing over the area between the curves

1532: $s_\mathrm{L}(n)$ and $s_\mathrm{U}(n)$.  The limits on the sum and integral depend on the order

1533: in which one does these operations and can be derived from the diagram.}

1534: \addtocounter{footnote}{-1}

1535: \begin{align*}

1536: \overline{C} & \;=\;  \int_{0}^{\infty}\! C(s)\,\pi(s)\,ds ,  && \displaybreak[0]\\[6mm]

1537:         & \;=\;  \int_{0}^{\infty}\sum_{\substack{\text{all $n$ such that:}\\[1mm]

1538:                                                   s_\mathrm{L}(n)\le s\le s_\mathrm{U}(n)}}

1539:                  P(n\,|\,s)\,\pi(s)\, ds,

1540:               && \text{using equation (\protect\ref{eq:acbci2}),} \displaybreak[0]\\[6mm]

1541:         & \;=\;  \sum_{n=0}^{\infty}\;\int_{s_\mathrm{L}(n)}^{s_\mathrm{U}(n)}\! P(n\,|\,s)\,\pi(s)\,ds ,

1542:               && \text{interchanging integral and sum,\protect\footnotemark}\displaybreak[0]\\[6mm]

1543:         & \;=\;  \beta\; \sum_{n=0}^{\infty}\;\int_{0}^{\infty}\! P(n\,|\,s)\,\pi(s)\,ds ,

1544:               && \text{using equation (\protect\ref{eq:acbci1}),} \displaybreak[0]\\[6mm]

1545:         & \;=\;  \beta\; \int_{0}^{\infty}\sum_{n=0}^{\infty} P(n\,|\,s)\,\pi(s)\,ds ,

1546:               && \text{interchanging sum and integral,}\displaybreak[0]\\[6mm]

1547:         & \;=\;  \beta\; \int_{0}^{\infty}\!\pi(s)\, ds ,

1548:               && \text{by the normalization of }P(n\,|\,s), \displaybreak[0]\\[6mm]

1549:         & \;=\;  \beta ,

1550:               && \text{by the normalization of }\pi(s) .

1551: \end{align*}

This completes the proof.  We have assumed here that the prior $\pi(s)$ is
proper and normalized to 1, but the proof can be generalized to improper priors
such as those we considered in this note.  A constant prior, for example, can be
regarded as the limit for $s_{\max}\rightarrow\infty$ of the proper prior:

1556: \begin{equation}

1557: \pi(s\,|\,s_{\max}) \;=\; \frac{\vartheta(s_{\max}-s)}{s_{\max}},

1558: \end{equation}

1559: where $\vartheta(x)$ is $0$ if $x<0$ and $1$ otherwise.  We then {\em define}

1560: the average coverage for the constant prior as the limit:

1561: \begin{equation}

1562: \overline{C}\;=\;\lim_{s_{\max}\rightarrow\, +\infty} \;

1563:                  \int_{0}^{\infty}\! C(s)\,\pi(s\,|\,s_{\max})\,ds.

1564: \end{equation}

1565: The previous proof can now be applied to the argument of the limit and leads

1566: to the same result.

1567:

1568: The average coverage theorem remains valid when $s$ is multidimensional,

1569: for example when it consists of a parameter of interest and one or more

1570: nuisance parameters.  In that case one needs to average the coverage over

1571: {\em all} the parameters.
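
The interchange of integral and sum used in the proof can also be
written without a diagram. As a sketch, introduce the indicator
$\chi_n(s)$ (our notation), equal to $1$ if
$s_\mathrm{L}(n)\le s\le s_\mathrm{U}(n)$ and $0$ otherwise. Then
\[
\int_{0}^{\infty}\sum_{n=0}^{\infty}\chi_n(s)\,P(n\,|\,s)\,\pi(s)\,ds
\;=\;\sum_{n=0}^{\infty}\int_{0}^{\infty}\!\chi_n(s)\,P(n\,|\,s)\,\pi(s)\,ds
\;=\;\sum_{n=0}^{\infty}\int_{s_\mathrm{L}(n)}^{s_\mathrm{U}(n)}\!
P(n\,|\,s)\,\pi(s)\,ds ,
\]
where the exchange of sum and integral is permitted because every term
is non-negative.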

1572: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1573:

1574:

1575:

1576:

1577: \newpage

1578:

1579:

1580:

1581:

1582: \begin{thebibliography}{99}

1583:

1584: \bibitem{interplay}

1585: M.~J.~Bayarri and J.~O.~Berger,

1586: ``The Interplay of Bayesian and Frequentist Analysis'',

1587: Statistical Science 19, p 58 (2004),\\

1588: \url{projecteuclid.org/Dienst/UI/1.0/Summarize/euclid.ss/1089808273},\\

1589: \url{www.isds.duke.edu/~berger/papers/interplay.html}.

1590:

1591: \bibitem{zeroc}

1592: Giovanni Punzi,

1593: ``Example of Bayesian intervals with zero coverage'',

1594: CDF Internal Note 6689 (2001),\\

1595: \url{www-cdf.fnal.gov/publications/cdf6689_Bayes_zero_coverage.pdf}.

1596:

1597: \bibitem{clifford}

1598:

1599: Peter Clifford,

1600: ``Interval estimation as viewed from the world of mathematical

1601: statistics'',

1602: CERN Yellow Report CERN 2000-005, p 157 (2000), {\it

1603: Proceedings of the

1604: Workshop on Confidence Limits at CERN, 17--18 January 2000}, edited by

1605: L.~Lyons, Y.~Perrin, and F.~James,\\

1606: %\url{cdsweb.cern.ch/search.py?recid=411537}

1607: \url{doc.cern.ch/yellowrep/2000/2000-005/p157.pdf}.

1608: %\url{ph-dep.web.cern.ch/ph-dep/Events/CLW/PAPERS/PS/clifford.ps}

1609:

1610: \bibitem{strong}

1611: Giovanni Punzi,

1612: ``A stronger classical definition of Confidence Limits'',

1613: hep-ex/9912048,

1614: %\url{ph-dep.web.cern.ch/ph-dep/Events/CLW/PAPERS/PS/punzi.ps}

1615: \url{www.arxiv.org/abs/hep-ex/9912048}.

1616:

1617:

1618:

\bibitem{zech}
G\"{u}nter Zech,
``Confronting classical and Bayesian confidence limits to examples'',
CERN Yellow Report CERN 2000-005, p 141 (2000), {\it
Proceedings of the
Workshop on Confidence Limits at CERN, 17--18 January 2000}, edited by
L.~Lyons, Y.~Perrin, and F.~James,\\
%\url{cdsweb.cern.ch/search.py?recid=411537}
\url{doc.cern.ch/yellowrep/2000/2000-005/p141.pdf}.
%\url{ph-dep.web.cern.ch/ph-dep/Events/CLW/PAPERS/PS/zech.ps}

1629:

1630:

1631:

1632: \bibitem{karlen}

1633: D.~Karlen,

1634: ``Credibility of confidence intervals'', in

1635: {\em Proceedings of the Conference on Advanced Techniques in Particle Physics,

1636: Durham, 18--22 March 2002},

1637: edited by M.~Whalley and L.~Lyons, p 53, (2002),\\

1638: \url{www.ippp.dur.ac.uk/Workshops/02/statistics/proceedings/karlen.pdf}.

1639:

1640:

1641: \bibitem{ch}

1642: R.~D.~Cousins and V.~L.~Highland,

1643: ``Incorporating systematic uncertainties into an upper limit'',

1644: Nucl.\ Instrum.\ Meth.\ A {\bf 320}, p 331 (1992).

1645:

1646:

1647: \bibitem{feldman}

1648: Gary Feldman,

1649: ``Multiple measurements and parameters in the unified approach'',

1650: {\it Fermilab Workshop on Confidence Limits 27--28 March, 2000}, p 11,\\

1651: \url{conferences.fnal.gov/cl2k/copies/feldman2.pdf},\\

1652: \url{huhepl.harvard.edu/~feldman/CL2k.pdf}.

1653:

1654: \bibitem{roots}

1655: J.~M.~Bernardo and A.~F.~M.~Smith, ``Bayesian Theory'', (John

1656: Wiley and Sons, Chichester, UK, 1993), \S5.4 and \S A.2.

1657:

1658: \bibitem{ref:Jeffreys}

1659: Harold Jeffreys, ``Theory of Probability'', 3rd ed., (Oxford

1660: University Press, Oxford, 1961), \S3.1.

1661:

1662: \bibitem{software}

1663: Joel Heinrich, ``User Guide to Bayesian-Limit Software Package'',

1664: CDF Internal Note 7232, (2004),\\

1665: \url{www-cdf.fnal.gov/publications/cdf7232_blimitguide.pdf};\\

1666: CDF Statistics Committee Software Page,\\

1667: \url{www-cdf.fnal.gov/physics/statistics/statistics_software.html}.

1668:

1669:

1670: \bibitem{ff}

1671: R.~L.~Graham, D.~E.~Knuth, and O.~Patashnik, ``Concrete Mathematics:

1672: A Foundation for Computer Science'', 2nd ed., (Addison-Wesley, Reading,

1673: MA, 1994); PlanetMath Mathematics Encyclopedia,\\

1674: \url{planetmath.org/encyclopedia/FallingFactorial.html}.

1675:

1676:

\bibitem{as}
M.~Abramowitz and I.~A.~Stegun, editors, ``Handbook of Mathematical
Functions'', (United
States Department of Commerce, National Bureau of Standards,
Washington, D.C., 1964; and Dover Publications, New York, 1968),
chapter 13.

\bibitem{as2}
M.~Abramowitz and I.~A.~Stegun, ibid., chapter 15;
William H.~Press, et al.,\ ``Numerical Recipes'', 2nd edition,
(Cambridge University Press, Cambridge, 1992), \S5.14 and \S6.12,\\
\url{lib-www.lanl.gov/numerical/bookcpdf/c5-14.pdf},\\
\url{lib-www.lanl.gov/numerical/bookcpdf/c6-12.pdf}.

1690:

1691:

1692:

1693: \end{thebibliography}

1694:

1695: %\clearpage

1696: \begin{figure}[p]

1697: \begin{center}

1698: \includegraphics[width=\textwidth]{Bayes}

\caption{
Coverage as a function of the true signal rate $s$ for Bayes 90\%
limits, for the simple case of no background and $\epsilon=1$ known
exactly. The dotted line at $C=0.9$ shows that the coverage never
falls below 90\% (in this simple case).}

1705: \label{fig:Bayes}

1706: \end{center}

1707: \end{figure}

1708:

1709:

1710: \begin{figure}[p]

1711: \begin{center}

1712: \includegraphics[width=\textwidth]{pcomparison}

1713: \caption{

1714: Comparison of our discrete probability for $\epsilon_0$ (shown as

1715: a histogram, see eqn.~(\ref{eqn:pdf})) and Gaussian (continuous curve)

1716: for the case $\epsilon=1.0\pm0.1$.}

1717: \label{fig:comparison}

1718: \end{center}

1719: \end{figure}

1720:

1721:

1722: \begin{figure}[p]

1723: \begin{center}

1724: \includegraphics[width=\textwidth]{comparison2}

1725: \caption{Comparison of our likelihood

1726: (dashed, see eqn.~(\ref{eqn:likelihood}))

1727: and Gaussian (solid) for the case

1728: $\epsilon=1.0\pm0.1$.}

1729: \label{fig:comparison2}

1730: \end{center}

1731: \end{figure}

1732:

1733:

1734: \begin{figure}[p]

1735: \begin{center}

1736: \includegraphics[width=3.15in]{pdf1}\hskip-0.2in

1737: \includegraphics[width=3.15in]{pdf2}\\

1738: \includegraphics[width=3.15in]{pdf3}\hskip-0.2in

1739: \includegraphics[width=3.15in]{pdf4}\\

\caption{Posterior densities $p(s|b,n)$ vs.\ $s$ for $n=1$, 3, 5, 10.
In each case, $b=3$ and $\epsilon=1.0\pm0.1$ (i.e.\ $\kappa=100$ and $m=99$).}

1742: \label{pdfs}

1743: \end{center}

1744: \end{figure}

1745:

1746: \begin{figure}[p]

1747: \begin{center}

1748: \includegraphics[width=\textwidth]{l100}

1749: \caption{Coverage of 90\% upper limits as a function of \st\ for

1750: $\et=1$, nominal 10\% uncertainty of the subsidiary

1751: measurement of $\epsilon$, and expected background $b=3$.}

1752: \label{l100}

1753: \end{center}

1754: \end{figure}

1755:

1756:

1757:

1758: \begin{figure}[p]

1759: \begin{center}

1760: \includegraphics[width=\textwidth]{l25}

1761: \caption{Coverage of 90\% upper limits as a function of \st\ for

1762: $\et=1$, nominal 20\% uncertainty of the subsidiary

1763: measurement of $\epsilon$, and expected background $b=3$.}

1764: \label{l25}

1765: \end{center}

1766: \end{figure}

1767:

1768:

1769: \begin{figure}[p]

1770: \begin{center}

1771: \includegraphics[width=\textwidth]{ecov}

1772: \caption{Coverage of 90\% upper limits as a function of \et\ for

1773: $\st=10$, nominal 10\% uncertainty of the subsidiary

1774: measurement of $\epsilon$, and expected background $b=3$.}

1775: \label{ecov}

1776: \end{center}

1777: \end{figure}

1778:

1779:

1780: \begin{figure}[p]

1781: \begin{center}

1782: \includegraphics[width=\textwidth]{sens}

1783: \caption{Sensitivity of 90\% upper limits as a function of \st\ for

1784: $\et=1$, nominal 10\% uncertainty of the subsidiary

1785: measurement of $\epsilon$, and expected background $b=3$.

1786: For reference, the sensitivity for $\sigma_\epsilon=0$ is also

1787: given (dashed).}

1788: \label{sens}

1789: \end{center}

1790: \end{figure}

1791:

1792:

1793:

1794:

1795:

1796:

1797:

1798:

1799: \end{document}

1800: