0711:0711.3937/ms.tex

1: %%

2: %% Beginning of file 'sample.tex'

3: %%

4: %% Modified 2005 December 5

5: %%

6: %% This is a sample manuscript marked up using the

7: %% AASTeX v5.x LaTeX 2e macros.

8:

9: %% The first piece of markup in an AASTeX v5.x document

10: %% is the \documentclass command. LaTeX will ignore

11: %% any data that comes before this command.

12:

13: %% The command below calls the preprint style

14: %% which will produce a one-column, single-spaced document.

15: %% Examples of commands for other substyles follow. Use

16: %% whichever is most appropriate for your purposes.

17: %%

18: \documentclass[12pt,preprint]{aastex}

19:

20: %% manuscript produces a one-column, double-spaced document:

21:

22: %% \documentclass[manuscript]{aastex}

23:

24: %% preprint2 produces a double-column, single-spaced document:

25:

26: %% \documentclass[preprint2]{aastex}

27:

28: %% Sometimes a paper's abstract is too long to fit on the

29: %% title page in preprint2 mode. When that is the case,

30: %% use the longabstract style option.

31:

32: %% \documentclass[preprint2,longabstract]{aastex}

33:

34: %% If you want to create your own macros, you can do so

35: %% using \newcommand. Your macros should appear before

36: %% the \begin{document} command.

37: %%

38: %% If you are submitting to a journal that translates manuscripts

39: %% into SGML, you need to follow certain guidelines when preparing

40: %% your macros. See the AASTeX v5.x Author Guide

41: %% for information.

42:

43: %%\usepackage{amsmath}

44:

45: \newcommand{\vdag}{(v)^\dagger}

46: \newcommand{\myemail}{skywalker@galaxy.far.far.away}

47:

48: %% Bibliography styles.

49: \citestyle{aa}

50: \bibliographystyle{apj}

51:

52: %% You can insert a short comment on the title page using the command below.

53:

54: \slugcomment{}

55:

56: %% If you wish, you may supply running head information, although

57: %% this information may be modified by the editorial offices.

58: %% The left head contains a list of authors,

59: %% usually a maximum of three (otherwise use et al.).  The right

60: %% head is a modified title of up to roughly 44 characters.

61: %% Running heads will not print in the manuscript style.

62:

63: \shorttitle{Sequential Analysis in Particle Astronomy}

64: \shortauthors{BenZvi et al.}

65:

66: %% This is the end of the preamble.  Indicate the beginning of the

67: %% paper itself with \begin{document}.

68:

69: \begin{document}

70:

71: %% LaTeX will automatically break titles if they run longer than

72: %% one line. However, you may use \\ to force a line break if

73: %% you desire.

74:

75:

76: \title{Sequential Analysis Techniques for Correlation Studies \\

77:        in Particle Astronomy}

78:

79: \author{S.Y. BenZvi\altaffilmark{1}, B.M. Connolly\altaffilmark{2}, and S. Westerhoff\altaffilmark{1}}

80: \altaffiltext{1}{University of Wisconsin-Madison, Department of Physics, 1150 University

81:               Avenue, Madison, WI 53706, USA}

82:

83: \altaffiltext{2}{University of Pennsylvania, Department of Physics and Astronomy,

84:               209 South $33^{\mathrm{rd}}$ Street, Philadelphia, PA 19104, USA; brianco@sas.upenn.edu}

85:

86: %\author{S.Y. BenZvi}

87: %\affil{Columbia University, Department of Physics and Nevis Laboratories,

88: %                   538 West $\it 120^{th}$ Street, New York, NY 10027, USA}

89:

90: %\author{B.M. Connolly}

91: %\affil{University of Pennsylvania, Department of Physics and Astronomy,

92: %                209 South $\it 33^{rd}$ Street, Philadelphia, PA 19104, USA}

93:

94: %\author{S. Westerhoff}

95: %\affil{University of Wisconsin-Madison, Department of Physics, 1150 University

96: %              Avenue, Madison, WI 53706, USA}

97:

98:

99: \begin{abstract}

100:

101: Searches for statistically significant correlations between arrival directions

102: of ultra-high energy cosmic rays and classes of astrophysical objects are

103: common in astroparticle physics.  We present a method to test potential

104: correlation signals of \textit{a priori} unknown strength and evaluate their

105: statistical significance sequentially, i.e., after each incoming new event in a

106: running experiment.  The method can be applied to data taken after the test has

107: concluded, allowing for further monitoring of the signal significance.  It

108: adheres to the likelihood principle and rigorously accounts for our ignorance

109: of the signal strength.

110:

111: \end{abstract}

112:

113: \keywords{cosmic rays --- methods: statistical}

114:

115:

116: % -----------------------------------------------------------------------------

117: \section{Introduction}\label{sec:intro}

118: % -----------------------------------------------------------------------------

119:

120: One of the major goals in astroparticle physics is the identification and the

121: study of sources of ultra-high energy cosmic rays, defined as cosmic rays with

122: energies larger than $10^{18}$\,eV.  The discovery of discrete sources would

123: answer longstanding questions about how and where particles are accelerated to

124: such energies.  So far, no discrete sources have been positively identified.

125: One major obstacle for the identification of potential sources is the small

126: number of detected events.  Until a few years ago, the published world data set

127: of cosmic rays with energies above $4\,\times 10^{19}$\,eV consisted of little

128: more than 100 events, mainly recorded with the Akeno Giant Air Shower Array

129: (AGASA) in Japan between 1984 and 2003~\citep{Takeda:1999sg}, and the High

130: Resolution Fly's Eye (HiRes) Experiment in Utah between 1997 and

131: 2006~\citep{Abbasi:2004ib}.

132:

133: Nevertheless, the small data set has been subjected to exhaustive searches for

134: deviations from isotropy.  These include searches for point sources; searches

135: for an excess of clustering in the distribution of arrival directions on

136: various angular scales; and searches for correlations with classes of known

137: astrophysical objects that were considered likely sites of cosmic ray

138: acceleration.  Some of these searches resulted in potential signals, but

139: because of the small size of the data set, the statistical significance could

140: not be established in a reliable manner.  Consequently, while the discovery of

141: discrete sources was claimed repeatedly, statistically independent data

142: routinely failed to support earlier claims.  An example is the search for

143: correlations of cosmic ray arrival directions with objects of the BL Lac

144: class~\citep{Tinyakov:2001nr,Gorbunov:2004bs,Abbasi:2005qy}.

145:

146: With a new generation of large-aperture astroparticle physics detectors like

147: the Pierre Auger Observatory nearing completion in Malarg\"ue, Argentina and

148: the Telescope Array detector under construction in Utah, the amount of

149: ultra-high energy data is now growing at an unprecedented pace.  The Pierre

150: Auger Observatory, for instance, began scientific data taking in January 2004

151: and has already accumulated over

152: $9\times10^3\,\mathrm{km}^{2}\,\mathrm{sr}\,\mathrm{yr}$ of integrated

153: exposure, more than any previous experiment.

154:

155: \subsection{Basic Search Techniques in Cosmic Ray Physics}

156:

157: The fact that previous experiments have failed to find statistically

158: significant deviations from isotropy in skymaps of ultra-high energy cosmic

159: rays can be seen as an indication that the sources are weak.  In this case,

160: the most promising correlation searches are not those which aim at finding

161: sources individually, but rather those conducted on a statistical basis;

162: i.e., searches for significant correlations of cosmic ray arrival

163: directions with catalogs of astrophysical objects.

164:

165: When studying correlations with objects from a source catalog, one tests

166: whether the probability $p$ of a given event to arrive from the direction of an

167: object in the catalog is significantly larger than the probability $p_0$ of the

168: correlation occurring by chance.  These analyses are typically binned, so an

169: event is said to correlate with an object from the catalog if the angle between

170: its arrival direction and the object's position is smaller than some angle

171: $\theta$.  If the particles are neutral, $\theta$ could be chosen to reflect

172: the point spread function of the detector.  In the case of cosmic rays,

173: however, the particles are most likely charged and therefore deflected by

174: Galactic and intergalactic magnetic fields of (unknown) strength.

175: Consequently, $\theta$ is usually chosen to be larger than the resolution of

176: the detector to account for magnetic smearing.

177:

178: Typically, potential signals are identified after intensive searches using

179: different angular scales, different energy thresholds, different source

180: catalogs, and other parameters that are found to maximize the signal strength.

181: Therefore, an unbiased chance probability for the observed signal can only be

182: established by discarding the data set used to find the signal and testing the

183: signal with statistically independent data.  For the test, the source catalog

184: and all analysis parameters are fixed {\it a priori} to obtain an unbiased

185: chance probability for the signal.

186:

187: Once the \textsl{a priori} analysis parameters are identified, the problem is

188: easily formulated in terms of a classical hypothesis test, in which new data

189: are checked for compatibility with a null hypothesis $\mathcal{H}_0$ (``the

190: data exhibit no significant correlation'') or an alternative ``signal''

191: hypothesis $\mathcal{H}_1$.  There are several ways to perform such a test.

192: For example, one can run the test after the new data set has reached a certain

193: size $n$, or after the experiment has run for a certain fixed amount of time.

194:

195: Formally, the size of the data set and the acceptance or rejection of the null

196: hypothesis are determined by two probabilities, $\alpha$ and $\beta$, which are

197: usually chosen before the start of the test.  These values define the

198: experimenter's tolerance for different sorts of experimental errors: $\alpha$

199: is the probability of wrongly rejecting the null hypothesis when

200: $\mathcal{H}_0$ is true (a type-1 or ``false positive'' error); and $\beta$ is

201: the probability of wrongly accepting the null hypothesis when $\mathcal{H}_0$

202: is false (a type-2 or ``false negative'' error).  In a classical one-sided

203: hypothesis test, where a p-value $P$ is used to estimate the agreement of the

204: data with the null hypothesis, the result $P<\alpha$ implies rejection of

205: $\mathcal{H}_0$ at the ``confidence level'' $1-\alpha$.  Meanwhile, the desired

206: probability of rejecting a false null hypothesis $(1-\beta)$ fixes the required

207: size of the data set $(n)$.

208:

209: \subsection{One-Shot vs. Sequential Testing}

210:

211: If one chooses to evaluate $P$ after a predefined number of events has been

212: recorded, or a predefined amount of time has elapsed, then the significance of

213: the signal is tested only once.  However, it is often desirable to evaluate and

214: test the signal sequentially, i.e., after each new event, rather than

215: at the end of the test.  This approach allows for the possibility of claiming a

216: statistically significant result earlier than with methods that check the

217: signal only once, a distinct advantage when event rates are quite low.  It also

218: avoids another practical disadvantage of hypothesis tests that arises when the

219: experiment, for one reason or another, has to discontinue data taking before

220: the predefined number of events is taken.  In that case, the ``one-shot''

221: analysis does not lead to a conclusion.

222:

223: A sequential analysis can be performed in several ways.  If $P$ is evaluated

224: after every incoming event and not just once after all $n$ events are

225: collected, a ``penalty'' factor has to be inserted to account for the fact that

226: there are now more opportunities to satisfy the test by

227: chance~\citep{Anscombe:1954,Armitage:1969}.  This penalty

228: factor can be evaluated with simulations and will depend on $n$.  The

229: dependence of $P$ on $n$ is an undesirable feature of the method; rather than

230: depending on the data that were actually recorded, $P$ now depends on the

231: number of events that an observer would have recorded had he decided to perform

232: a ``one-shot'' test.  The interpretation of the data therefore depends on data

233: not actually taken.  This feature of the test violates the likelihood

234: principle~\citep{Berry:1987}.
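
As a rough illustration of why a penalty is needed, Boole's inequality bounds
the chance that at least one of the $n$ looks satisfies the test when
$\mathcal{H}_0$ is true.  The notation $P_i$ for the p-value after the
$i^{\mathrm{th}}$ event is introduced here only for this sketch:
\[
  \mathrm{Pr}\left(\min_{1\le i\le n} P_i<\alpha\,\Big|\,\mathcal{H}_0\right)
  \le \sum_{i=1}^{n}\mathrm{Pr}\left(P_i<\alpha\,|\,\mathcal{H}_0\right)
  \le n\,\alpha~~.
\]
For example, with $\alpha=0.001$ and $n=100$ the bound is $0.1$.  Because
successive looks share most of their data, the bound is generally far from
tight, which is consistent with evaluating the penalty by simulation.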

235:

236: In addition, the inclusion of the penalty factor means that data arriving after

237: the test has ended cannot be used to calculate $P$ for the entire data

238: set.  It is therefore not possible to include new data in the calculation

239: of the probability.  In many practical situations, data taking continues after

240: the test has ended, and it is highly desirable to monitor the signal

241: probability with new data.

242:

243: The classical sequential likelihood ratio test developed by~\citet{Wald:1945,Wald:1947}

244: avoids the limitations that arise when using the p-value $P$.  Wald defines the

245: likelihood ratio evaluated after the $n^{\mathrm{th}}$ event as

246: %

247: \begin{equation}

248: \mathcal{R}_n=\frac{P(\mathcal{D}|\mathcal{H}_1)}{P(\mathcal{D}|\mathcal{H}_0)}~~,

249: \label{eq:likelihood_ratio}

250: \end{equation}

251: %

252: where the denominator and numerator represent the probability of observing a

253: data set $\mathcal{D}$ given a null hypothesis (no correlation) and an

254: alternative (correlation), respectively.  The ratio $\mathcal{R}_n$ can be evaluated after each

255: incoming event (i.e., after the $n^{\mathrm{th}}$ event) without statistical penalty, and the test stops with the

256: acceptance or rejection of the null hypothesis when $\mathcal{R}_n$ falls below

257: or exceeds a predefined value (details will be given in

258: Section~\ref{sec:method}). Moreover, the evaluation of $\mathcal{R}_n$ can

259: continue after the decision to see whether new data continue to favor or

260: disfavor the selected hypothesis.

261:

262: The probabilities $P(\mathcal{D}|\mathcal{H}_0)$ and

263: $P(\mathcal{D}|\mathcal{H}_1)$ in eq.\,(\ref{eq:likelihood_ratio}) depend on the

264: expected correlations in case of random coincidences and true signals,

265: respectively.  In correlation studies, the strength of the signal is typically

266: not known before the test is complete; so in the analysis proposed

267: by~\citet{Wald:1945,Wald:1947}, one simply takes a ``best guess'' at the lower

268: bound of the signal strength.  In this paper, we extend Wald's technique to

269: marginalize the signal strength, which more rigorously accounts for our

270: ignorance of the true signal.  As in the classical likelihood ratio test, this

271: extended test can be applied after each new event without statistical penalty,

272: so that it adheres to the likelihood principle.  It also allows for the

273: evaluation of the significance of the signal after the test has been fulfilled,

274: as well as in cases where the test stops prematurely.

275:

276: We note that the usefulness of this test is not limited to cosmic ray physics.

277: It can be applied in many other areas of astroparticle physics or astrophysics

278: where event rates are low, for example in searches for discrete sources of high

279: energy neutrinos or $\gamma$-rays.

280:

281: %This paper is organized as follows.  After a description of the method in

282: %Section\,\ref{sec:method}, we analyze the behavior of the test with simulated

283: %data sets in Section\,\ref{sec:test}.  Section~\ref{sec:summary} summarizes

284: %the results.

285:

286: % -----------------------------------------------------------------------------

287: \section{The Method}\label{sec:method}

288: % -----------------------------------------------------------------------------

289:

290: We consider the case of an analysis searching for correlations between cosmic

291: ray arrival directions and objects from a catalog.  The background probability

292: $p_0$ is the probability that a given event correlates by chance.  We want to

293: test the signal probability $p_1$ against $p_0$.  If two point hypotheses are

294: tested against each other, $p_0$ and $p_1$ are single numbers; but in general,

295: $p_1$ can also have a range of values.  If, for example, the ``signal''

296: corresponds to a stronger correlation than can be expected by chance, then

297: $p_1>p_0$.

298:

299: Since an event can either be correlated with an object from the catalog or not,

300: the probability of observing a data set $\mathcal{D}$ in which $k$ out of $n$

301: events correlate with sources is given by the binomial distribution

302: %

303: \begin{equation}

304: P(\mathcal{D}|p) = P(n,k|p) = {n \choose k}\ p^k\ (1-p)^{n-k}

305: \label{eq:binomial_distribution}

306: \end{equation}

307: %

308: where $p$ is the probability of a given event to correlate.  If the data show

309: no significant correlations in addition to those occurring by chance, then

310: $p=p_0$.
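
As a concrete illustration of eq.\,(\ref{eq:binomial_distribution}), take
$n=10$, $k=6$, and $p_0=0.1$ (values chosen here to match the example of
Fig.\,\ref{fig:R_vs_p1}):
\[
  P(10,6|0.1) = {10 \choose 6}\ (0.1)^6\ (0.9)^{4}
              = 210\times10^{-6}\times0.6561
              \approx 1.38\times10^{-4}~~.
\]
Summing the same expression over $k=6,\ldots,10$ gives the classical one-sided
p-value for this data set, approximately $1.47\times10^{-4}$.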

311:

312: In a sequential analysis that tests hypothesis $\mathcal{H}_1$ against

313: $\mathcal{H}_0$ with data $\mathcal{D}$, the probability ratio $\mathcal{R}_n$ of

314: eq.\,(\ref{eq:likelihood_ratio}) is calculated after each incoming event, and is

315: then compared to two positive constants $A$ and $B$ (where $B<A$).  During each

316: step in the sequence, the experimenter is presented with the following possible

317: outcomes:

318: %

319: \begin{enumerate}

320:   \item $\mathcal{R}_n\ge A$: the test terminates with the rejection of

321:         $\mathcal{H}_0$.

322:   \item $\mathcal{R}_n\le B$: the test terminates with the acceptance of

323:         $\mathcal{H}_0$.

324:   \item $B<\mathcal{R}_n<A$: the test continues to record data.

325: \end{enumerate}

326: %

327: \citet{Wald:1945,Wald:1947} showed that the constants

328: $A$ and $B$ are closely related to the probabilities $\alpha$ and $\beta$ of

329: type-1 and type-2 errors:

330: \begin{equation}

331:   A\leq\frac{1-\beta}{\alpha}~~~\mathrm{and}~~~

332:   B\geq\frac{\beta}{1-\alpha}~~.

333: \end{equation}

334: %

335: While it is difficult in most practical situations to estimate exact values for

336: $A$ and $B$, Wald showed that simply choosing

337: %

338: \begin{equation}

339:   A = \frac{1-\beta}{\alpha}~~~\mathrm{and}~~~

340:   B = \frac{\beta}{1-\alpha}~~,

341: \end{equation}

342: %

343: as the test boundaries leads to adequate results if $\alpha$ and $\beta$ are

344: small (typically, they are not larger than 0.05).  By adequate, we mean that

345: the true type-1 and type-2 rates cannot exceed $\alpha/(1-\beta)$ and $\beta/(1-\alpha)$,

346: respectively, and their sum cannot exceed $\alpha+\beta$.  In fact, the true error rates will often be smaller than the nominal

347: $\alpha$ and $\beta$ specified before the start of the experiment.
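
For orientation, the boundaries can be evaluated for two choices of the error
rates.  With $\alpha=\beta=0.001$,
\[
  A = \frac{0.999}{0.001} = 999~~~\mathrm{and}~~~
  B = \frac{0.001}{0.999} \approx 1.0\times10^{-3}~~,
\]
while $\alpha=\beta=0.05$ gives $A=19$ and $B\approx0.053$.  Since $B=1/A$
whenever $\alpha=\beta$, the boundaries are symmetric in $\ln\mathcal{R}_n$:
for $\alpha=\beta=0.001$ the test continues as long as
$|\ln\mathcal{R}_n|<\ln 999\approx6.9$.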

348:

349: %The test can terminate at any time and still provide valuable information,

350: %or it can continue even after a decision is made to see whether additional

351: %data further supports the decision or not.  No penalty factor is required,

352: %but still, the probabilities are evaluated after each incoming event.

353:

354: For a data set that contains $n$ events and $k$ correlations, the likelihood

355: ratio is given by

356: %

357: \begin{equation}

358:   \mathcal{R}^\prime _n

359:     =\frac{P(\mathcal{D}|p_1)}{P(\mathcal{D}|p_0)}

360:     =\frac{p_1^k (1-p_1)^{n-k}}{p_0^k (1-p_0)^{n-k}}~~.

361:   \label{eq:likeli}

362: \end{equation}
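
A short worked example may help here.  Eq.\,(\ref{eq:likeli}) implies a simple
update rule: starting from $\mathcal{R}^\prime_0=1$, each correlating event
multiplies the ratio by $p_1/p_0$ and each non-correlating event multiplies it
by $(1-p_1)/(1-p_0)$.  Taking, purely for illustration, $p_0=0.1$ and
$p_1=0.3$, these factors are $3$ and $7/9$, so that for $n=10$ and $k=6$
\[
  \mathcal{R}^\prime_{10} = 3^6\left(\frac{7}{9}\right)^{4}
                          = \frac{2401}{9} \approx 267~~.
\]
With $\alpha=\beta=0.001$, i.e., $A=999$ and $B\approx10^{-3}$, this value lies
between the two boundaries and the test would continue.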

363:

364: In practice, the signal strength $p_1$ is often not known.  We consider here

365: the common case of a one-sided test where $p_0 < p_1 \leq 1$.  The confidence

366: in rejecting $\mathcal{H}_0$ typically increases with increasing $p$.  To

367: evaluate $\mathcal{R}_n$ in this case, we can expand the numerator and

368: denominator of eq.\,(\ref{eq:likelihood_ratio}) in terms of $p$:

369: %

370: \begin{equation}

371:   \mathcal{R}_n = \frac{\int_0^1 P(\mathcal{D}|p)\ P(p|\mathcal{H}_1)\ dp}

372:                      {\int_0^1 P(\mathcal{D}|p)\ P(p|\mathcal{H}_0)\ dp}~.

373: \end{equation}

374:

375: The quantities $P(p|\mathcal{H}_1)$ and $P(p|\mathcal{H}_0)$ represent our

376: prior assumptions about $p$ in the cases of true signal vs. chance

377: correlations.  In cosmic ray studies, the probability $p_0$ of a chance

378: correlation with a catalog object is estimated from the \textsl{a priori}

379: parameters of the test: e.g., the detector exposure to the catalog,

380: the angular bin size $\theta$, etc.  In contrast, it is fairly uncommon to have

381: a reliable estimate of the signal probability $p_1$ beyond the fact that

382: $p_1>p_0$.  Absent further knowledge of the signal, we can therefore treat the

383: probability as uniformly distributed on the interval $[p_1,1]$.  Hence, we

384: summarize our prior knowledge of the two cases by

385: %

386: \begin{eqnarray}

387:   P(p|\mathcal{H}_1) & = & \frac{\Theta(p-p_1)}{1-p_1}~~, \\

388:   P(p|\mathcal{H}_0) & = & \delta(p-p_0)~~.

389: \end{eqnarray}

390: %

391: Note that $p$ is not time-dependent, although we do not see

392: anything inherently problematic in inserting a time-dependence.  Although not

393: many ultra-high energy cosmic ray models propose a time-dependence,

394: if a time-dependent model is inserted for $\mathcal{H}_0$, the probability

395: of each successive event is evaluated based on what is expected

396: at the time it was measured.  However, if $\mathcal{H}_0$

397: and $\mathcal{H}_1$ are simply wrong (that is, the

398: hypotheses do not properly reflect what could happen in nature),

399: then any result is possible.  This hazard exists for any hypothesis test.
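
The effect of the two priors on the marginalized ratio can be written out
explicitly using the binomial form of
eq.\,(\ref{eq:binomial_distribution}).  The delta function selects $p=p_0$ in
the denominator, and the step function restricts the range of integration in
the numerator:
\begin{eqnarray*}
  \int_0^1 P(\mathcal{D}|p)\ \delta(p-p_0)\ dp
    & = & {n \choose k}\ p_0^k\ (1-p_0)^{n-k}~~, \\
  \int_0^1 P(\mathcal{D}|p)\ \frac{\Theta(p-p_1)}{1-p_1}\ dp
    & = & \frac{1}{1-p_1}\ {n \choose k}\int_{p_1}^1 p^k\ (1-p)^{n-k}\ dp~~.
\end{eqnarray*}
The binomial coefficient is common to both expressions and cancels in the
ratio.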

400:

401: Solving for the likelihood ratio $\mathcal{R}_n$, we have

402: %

403: \begin{eqnarray}

404:   \mathcal{R}_n & = & \frac{\int_{p_1}^1 p^k\ (1-p)^{n-k}\ dp}

405:                          {p_0^k\ (1-p_0)^{n-k}\ (1-p_1)}\\

406:               & = & \frac{\mathrm{B}(k+1, n-k+1) -

407:                           \mathrm{B}(p_1; k+1, n-k+1)}

408:                          {p_0^k\ (1-p_0)^{n-k}\ (1-p_1)}~,

409:   \label{eq:final_ratio}

410: \end{eqnarray}

411: %

412: where $\mathrm{B}(a,b)$ and $\mathrm{B}(x;a,b)$ are the complete and incomplete

413: beta functions.  Note that eq.\,(\ref{eq:final_ratio}) is a convenient form for

414: the numerical computation of $\mathcal{R}_n$.
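
For integer $n$ and $k$, the beta functions reduce to elementary expressions,
which may be useful as a cross-check of a numerical implementation.  The
complete beta function is
\[
  \mathrm{B}(k+1, n-k+1) = \frac{k!\ (n-k)!}{(n+1)!}~~,
\]
and the standard relation between the incomplete beta function and the
binomial distribution gives
\[
  \mathrm{B}(k+1, n-k+1) - \mathrm{B}(p_1; k+1, n-k+1)
    = \frac{k!\ (n-k)!}{(n+1)!}\
      \sum_{j=0}^{k} {n+1 \choose j}\ p_1^j\ (1-p_1)^{n+1-j}~~.
\]
In the limit $p_1\rightarrow0$ only the $j=0$ term survives and the sum tends
to 1, recovering the complete beta function as expected.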

415:

416: When nothing is known {\it a priori} about the strength of the signal, $p_1$

417: will be chosen close to $p_0$ to test as large a signal space $p$ as possible.

418: If more information on $p$ were available --- for example, if it were known

419: that $p$ is larger than some value $p_{\mbox{\scriptsize min}}$ --- then the range of

420: integration could be made smaller.  To illustrate the merits of improved

421: knowledge, Fig.\,\ref{fig:R_vs_p1} shows $\mathcal{R}_n$ as a function of $p_1$

422: for $n=10$, $k=6$, and $p_0=0.1$.  Since the ``true'' probability for an event

423: to correlate is $p=6/10=0.6$, choosing $p_1$ close to $p$ increases

424: $\mathcal{R}_n$ and therefore minimizes the time necessary to confirm the signal.

425: As $p_1$ continues to increase beyond the true signal probability,

426: $\mathcal{R}_n$ decreases, as expected.
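
The trend at small $p_1$ can be checked by hand for two illustrative values of
$p_1$ (the numbers below are our own evaluation of
eq.\,(\ref{eq:final_ratio}) for $n=10$, $k=6$, and $p_0=0.1$, rounded to the
digits shown).  For $p_1=0.1$,
\[
  \mathcal{R}_{10} \approx
  \frac{4.329\times10^{-4}}{(0.1)^6\ (0.9)^4\ (0.9)}
  = \frac{4.329\times10^{-4}}{5.905\times10^{-7}} \approx 7.3\times10^{2}~~,
\]
whereas for $p_1=0.3$,
\[
  \mathcal{R}_{10} \approx
  \frac{4.235\times10^{-4}}{(0.1)^6\ (0.9)^4\ (0.7)}
  = \frac{4.235\times10^{-4}}{4.593\times10^{-7}} \approx 9.2\times10^{2}~~.
\]
Raising $p_1$ from 0.1 to 0.3 barely changes the integral in the numerator but
reduces the normalization factor $(1-p_1)$, so $\mathcal{R}_n$ increases.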

427:

428: Fig.\,\ref{fig:R_vs_n} shows the results of the sequential analysis described

429: above when applied to simulated data sets.  The background probability is

430: $p_0=0.1$; $p_1=0.3$ is the minimum signal we choose to distinguish from the

431: background; and $\alpha=\beta=0.001$.  The upper plot shows the result of the

432: test for data sets with a correlation probability of $p=0.5$ ($\mathcal{H}_0$

433: is false), whereas for the bottom plot, $p=0.1$ ($\mathcal{H}_0$ is true).  For

434: both plots, the analysis is performed for $10^5$ Monte Carlo data sets, and the

435: dark and light grey areas indicate the range that includes 68\% and 95\% of the

436: data sets.

437:

438: % -----------------------------------------------------------------------------

439: \section{The Ratio of Likelihoods, the Ratio of Posteriors, and the Meaning

440: of $\alpha$ and $\beta$}

441: % -----------------------------------------------------------------------------

442:

443: Here, $\mathcal{R}_n$ is defined as a ratio of likelihoods, but

444: one could just as easily define $\mathcal{R}_n$ as a ratio of

445: posterior probabilities as suggested by~\citet{Wald:1945,Wald:1947}.

446: However, changing the definition

447: of $\mathcal{R}_n$ carries consequences in the interpretation

448: of $\alpha$ and $\beta$.  To understand how, we first review

449: what $\alpha$ and $\beta$ mean in the context of the likelihood ratio.

450:

451: The meaning of the probabilities in the numerator and denominator of

452: $\mathcal{R}_n$ is obviously connected to the meaning of $\alpha$ and $\beta$.

453: One could argue that, since we are marginalizing parameters anyway,

454: we might as well calculate the posterior probabilities as suggested in

455: Wald's original paper~\citep{Wald:1945}.

456: This has certain advantages.

457: For instance, the ratio would be defined as

458: \begin{eqnarray}

459: \mathcal{R}^{\mathrm{post}}_n = \frac{P(\mathcal{H}_1|\mathcal{D})}{P(\mathcal{H}_0|\mathcal{D})}

460: = \frac{P(\mathcal{D}|\mathcal{H}_1)P(\mathcal{H}_1)}{P(\mathcal{D}|\mathcal{H}_0)P(\mathcal{H}_0)}.

461: \end{eqnarray}

462: One could choose priors for $P(\mathcal{H}_1)$ and $P(\mathcal{H}_0)$.

463: $A$ and $B$ then become thresholds for ``degrees of belief'' that

464: we must hold for one hypothesis over another before we claim one or the

465: other to be true.

466: For instance, given that $\mathcal{H}_1$ is true,

467: $1-\beta$

468: becomes the required confidence for $P(\mathcal{H}_1|\mathcal{D})$

469: and $\alpha$ the required confidence

470: for $P(\mathcal{H}_0|\mathcal{D})$ to claim that $\mathcal{H}_1$ is true, i.e.,

471: $A=(1-\beta)/\alpha$.
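
To see how the hypothesis priors enter, note that the ratio of posteriors is
the likelihood ratio of eq.\,(\ref{eq:likelihood_ratio}) multiplied by the
prior odds.  As a purely hypothetical example, suppose one assigned
$P(\mathcal{H}_1)/P(\mathcal{H}_0)=1/10$.  The stopping condition on the
posterior ratio would then read
\[
  \frac{P(\mathcal{D}|\mathcal{H}_1)}{P(\mathcal{D}|\mathcal{H}_0)}
  \times\frac{1}{10}\ \ge\ A
  ~~~\Longleftrightarrow~~~
  \frac{P(\mathcal{D}|\mathcal{H}_1)}{P(\mathcal{D}|\mathcal{H}_0)}\ \ge\ 10\,A~~,
\]
so that with $\alpha=\beta=0.001$ the data would have to favor
$\mathcal{H}_1$ by a factor of $9990$ rather than $999$.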

472:

473: However, as noted by~\citet{Wald:1945,Wald:1947}, the likelihood ratio also

474: has its merits.  First, the likelihood ratio has some precedent.  Even those

475: who subscribe to the Bayesian formalism use marginalized likelihood ratios

476: (i.e. Bayes Factors)~\citep{Jeffreys:1939,Kass:1995}; using a likelihood

477: ratio avoids the use of priors $P(\mathcal{H}_0)$ and $P(\mathcal{H}_1)$ which

478: can strongly influence the result.  Further, likelihood ratios provide

479: like comparisons with likelihood ratios used in other analyses with fixed $p_0$

480: and $p_1$.  However, the definitions of $A$ and $B$ become cumbersome even

481: in the circumstance here where we

482: are unconcerned whether or not the test ever terminates.

483: For instance, given that $\mathcal{H}_1$ is true,

484: $A$ parameterizes how much more likely the data must be in a universe

485: where $\mathcal{H}_1$ is true as opposed to $\mathcal{H}_0$ before

486: we claim that $\mathcal{H}_1$ is indeed true.

487:

488: In short, using a ratio of posteriors allows

489: $\alpha$ and

490: $\beta$ to be conceptualized intuitively as degrees of belief

491: in one hypothesis or another.  Using likelihood ratios is common and, while one

492: does not have to contend with defining priors for $\mathcal{H}_1$ and $\mathcal{H}_0$,

493: $\alpha$ and $\beta$ can no longer be conceptualized in terms of degrees of belief

494: for $\mathcal{H}_0$ and $\mathcal{H}_1$.

495: Here, we opt for the more traditional calculation of the likelihood ratio,

496: or what could be thought of as a ratio of posteriors

497: if $P(\mathcal{H}_1)=P(\mathcal{H}_0)$.

498:

499:

500: % -----------------------------------------------------------------------------

501: \section{Testing the Method}\label{sec:test}

502: % -----------------------------------------------------------------------------

503:

504: \subsection{Test Convergence and the Error Rates $\alpha$ and $\beta$}

505:

506: To account for our ignorance of the true correlation probability $p$ of the

507: given data set, $p$ is marginalized in the likelihoods in eq.\,(\ref{eq:likeli}).

508: As mentioned in the previous section, we assume that the signal

509: probability $p$ that we want to test against the null hypothesis is uniformly

510: distributed on $\left[p_1,1\right]$.  With no prior knowledge of the signal

511: other than $p>p_0$, we choose $p_1=p_0$.

512:

513: In practice, this approach has an important consequence if one were to

514: interpret the results of the hypothesis test in terms of the probabilities

515: $\alpha$ and $\beta$, for example by using $(1-\alpha)$ as a confidence

516: level for the rejection of the null hypothesis.  Since the numerator now

517: allows for $p_1<p<1$, $\alpha$ and $\beta$ are, strictly speaking,

518: meaningful only for a data set that has similar properties, i.e., has a correlation

519: probability that is not a single value, but spread over the interval

520: $\left[p_1,1\right]$.  However, in reality, any given data set has some fixed

521: probability $p$ to correlate with objects of a catalog.

522:

523: Therefore, we must test whether in the case of a fixed $p$ the method returns

524: probabilities for type-1 and type-2 errors lower than $\alpha$ and $\beta$.  In

525: general, we expect the type-2 error to be smaller than $\beta$ if the

526: correlation probability in the data is larger than some minimum value

527: $p_{\mbox{\scriptsize min}}$.

528:

529: A second practical issue is the convergence of the sequential likelihood ratio

530: test to a conclusion in favor of $\mathcal{H}_0$ or $\mathcal{H}_1$.  When

531: $p_1=p_0$ and the null hypothesis is true $(p=p_0)$, the ratio test will often

532: fail to reach a conclusion even as the number of events $n$ becomes quite

533: large.

534: This problem can be avoided in two ways.  One would be to terminate the test after

535: accumulating some number of events, $n_0$.  The acceptance or rejection of

536: $\mathcal{H}_0$ would then depend on whether $\mathcal{R}_n$ was greater or less than 1.

537: However, making a decision in this way would require a modification of

538: the type-1 and type-2 errors (see Appendix\,A).

539: Another would be to choose $p_1=p_0+\delta$, where

540: $\delta$ is a positive constant.

541: The particular choice of $\delta$ is somewhat

542: \textsl{ad hoc}, since it mainly reflects the experimenter's degree of belief

543: about the strength of the signal.  However, for those uncomfortable with this

544: kind of inference, we present a simple procedure to find $\delta$ such that:

545: the likelihood ratio $\mathcal{R}_n$ converges to a conclusion while still

546: satisfying a large number of signal hypotheses; and the type-1 and

547: type-2 rates of the sequential analysis are consistent with the

548: classical interpretations of the probabilities $\alpha$ and $\beta$.

549:

550: %Performing the likelihood ratio test as described above with $p_1=p_0$ leads to

551: %another practical problem.  In cases where the null hypothesis is true, the

552: %ratio test often does not come to a conclusion even for large numbers of events

553: %$n$.  This problem can be avoided by choosing $p_1$ to be larger than $p_0$ by

554: %some amount $\delta$.  One could go even further by choosing a $\delta$ such

555: %that the method would not only finish with a finite data set $n$, but with

556: %type-1 error probabilities smaller than $\alpha$ in data sets where $p$ was

557: %fixed.  This latter requirement for $\delta$ can be viewed as superfluous for

558: %two reasons.  First, strictly speaking, $\delta$ is not a parameter that is

559: %usually found, but rather chosen {\it a priori}; ideally, it should be governed

560: %by nothing more than the experimenter's degree of belief.  Second, as will be

561: %discussed below, we could simply pick a $p_1$ above which $\mathcal{R}_n$ has the

562: %correct $\alpha$ and $\beta$ when the signal probability is larger than $p$.

563: %This removes the need for scanning over $\delta$ to find an $\alpha$ and

564: %$\beta$ that behave in the desired fashion.  However, here we discuss a

565: %procedure to find $\delta$ to satisfy those who seek the best of both worlds: a

566: %likelihood ratio that leaves the option for a number of signal hypothesis and a

567: %sequential analysis method that returns an $\alpha$ and $\beta$ that can be

568: %interpreted intuitively.

569:

570: In this section, we test these expectations with simulated data sets and

571: determine values for $\delta$ and $p_{\mbox{\scriptsize min}}$ for some typical values for

572: $p_0$, $\alpha$, and $\beta$.  If we find $\delta$ to be small and

573: $p_{\mbox{\scriptsize min}}$ to be close to $p_0$, then the test will terminate with type-1

574: and type-2 error rates that are smaller than $\alpha$ and $\beta$, giving the

575: result an intuitive interpretation.  For each of the following tests, we

576: produce $10^5$ simulated data sets\footnote{We will use $\alpha=\beta=0.001$, and therefore

577: test the method on $100\times 1/0.001$ data sets.} with a correlation probability $p$ and

578: subject these data sets to a sequential analysis with predefined values for

579: $\alpha$ and $\beta$.

580:

581: \textbf{Case 1: $\mathcal{H}_0$ is True:} First consider the case where the

582: null hypothesis is true, so that the correlation probability $p$ of the data is

583: equal to $p_0$.  The dark grey area in Fig.\,\ref{fig:zones} indicates, as a

584: function of $p_{0}$,  the range $p_1>p_0$ for which the ratio test terminates

585: with a type-1 error probability greater than $\alpha$.  Note that when

586: $p_1\simeq p_0$, there is a large fraction of data sets in which the test does

587: not come to a conclusion (rejection or acceptance of the null hypothesis) even

588: when the number of events $n$ exceeds 1000.  The fraction of undecided tests is

589: added to the type-1 error rate to give a conservative limit on $p_1$.

590: For all $p_1$ that fall above the dark grey area, the test terminates with a

591: type-1 error rate less than $\alpha$.  As expected, the dark grey range

592: is narrow, so the test is ``well-behaved'' if $p_1$ is chosen not too

593: close to $p_0$.  As an example, if the random correlation probability

594: $p_0=0.1$, then $p_1=0.14$ ($\delta=0.04$) is sufficient.  Any value of $p_1$ larger than

595: 0.14 will of course also be well-behaved.

596:

597: \textbf{Case 2: $\mathcal{H}_0$ is False:} We now consider the case where the

598: null hypothesis is false.  Choosing the values for $p_1$ determined with the

599: procedure outlined in ``Case 1,'' we use simulated data to find the minimum

600: signal probability $p_{\mbox{\scriptsize min}}$ for which the ratio test terminates with a

601: type-2 error probability less than $\beta$.  The light grey area in

602: Fig.\,\ref{fig:zones} depicts, as a function of $p_0$, the range of

603: $p_{\mbox{\scriptsize min}}>p_1$ for which the ratio test terminates with a type-2

604: error probability greater than $\beta$.  For instance, when $p_0=0.1$ and

605: $\alpha=\beta=0.01$, for all signal probabilities $p>p_{\mbox{\scriptsize min}}=0.18$ the

606: ratio test will terminate with a type-2 error probability less than

607: $\beta$.  Note that the $p_{\mbox{\scriptsize min}}$ values given here are conservative,

608: since they not only require a type-2 error below $\beta$ in case of a

609: signal with strength $p_{\mbox{\scriptsize min}}$, but also a type-1 rate below

610: $\alpha$ \textit{and} a rejection or acceptance of $\mathcal{H}_0$ before the

611: sample size $n$ reaches 1000 when $\mathcal{H}_0$ is true.  This last

612: requirement slightly inflates the value of $p_{\mbox{\scriptsize min}}$.

613:

614: The simulations of Cases 1 and 2 indicate that $p$ and $p_1$ must be larger

615: than $p_0$ if the test is to arrive at a decision in a reasonable amount of

616: time, and if the results are to be consistent with the error probabilities

617: $\alpha$ and $\beta$.  (To a much lesser extent, this second issue also exists

618: in Wald's original formulation of the ratio test, in which $p_1$ is treated as

619: a single alternative probability~\citep{Wald:1945,Wald:1947}.) Even so, the

620: amounts by which $p$ and $p_1$ should differ from $p_0$ are small enough that

621: they do not appreciably limit the usefulness of the method when a ``classical''

622: interpretation of $\alpha$ and $\beta$ is required.  We note that the existence

623: of small intervals above $p_0$ where such an interpretation is not possible is

624: a typical feature of sequential tests; see, for

625: example,~\citet{Wald:1945,Wald:1947,Lewis:1994}.  It should be stressed,

626: however, that we have not demonstrated a circumstance where we are obtaining

627: some undesired values for $\alpha$ and $\beta$.  Rather, we have demonstrated

628: that marginalizing the likelihood is not the equivalent of inserting the right

629: value for $p$.

630:

631: %In his original paper, Wald suggests a different approach for testing a point

632: %null hypothesis against a single-sided alternative.  Here, rather than

633: %marginalizing over the unknown correlation probability $p$, one chooses a

634: %single value $p_s>p_0$ and proceeds with the ratio test as if the point null

635: %hypothesis is tested against a {\it single} alternative probability $p_s$.  As

636: %in the method that uses a marginalized likelihood ratio, the fact that the data

637: %has a correlation probability $p$ that is in most cases not equal $p_s$ may

638: %result in type-1 error probabilities greater than $\alpha$ and type-2 error

639: %probabilities greater than $\beta$ for some $p_s$.  Again, $p_s$ can be chosen

640: %such that the probabilities do not exceed $\alpha$ and $\beta$.  This is

641: %typically the case if $p_s$ is chosen not too close to $p_0$.

642:

643: \subsection{Efficiency of the Ratio Test}

644:

645: An important aspect of a sequential test is its length, i.e., the number of

646: events $n$ necessary to reach a decision.  Fig.\,\ref{fig:median} shows an

647: example of the typical length of the test as a function of the signal

648: probability $p$.  In this example, the background probability is chosen as

649: $p_0=0.1$, the lower boundary of the marginalization is  $p_1=0.3$, and

650: $\alpha=\beta=0.001$.  For $10^5$ simulated data sets,

651: Fig.\,\ref{fig:median}\,(top) shows the median number of events required for a

652: termination of the test.  The error bars indicate the range that includes

653: 68\,\% of the data sets.  In this example, the median size of a data set

654: required to accept the null hypothesis if it is true ($p_0=0.1$) is 27.  The

655: median size of a data set required to reject the null hypothesis if it is wrong

656: depends on $p$ and is large when $p$ is close to $p_0$.  Above $p\simeq 0.6$,

657: the median number reaches a plateau of about 7 events.
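As a rough cross-check of these numbers, the two extreme data sets can be
worked out by hand.  This is only a sketch under two assumptions that are not
restated in this section: that $\mathcal{R}_n$ has the marginalized form used
in Appendix\,A with $p_1=p_0+\delta$, and that the boundaries take the
standard Wald values $A=(1-\beta)/\alpha$ and $B=\beta/(1-\alpha)$, i.e.,
$A=999$ and $B\simeq 1.0\times 10^{-3}$ for $\alpha=\beta=0.001$.  If all $n$
events are correlated ($k=n$), or none are ($k=0$), the integral is elementary:
\begin{eqnarray}
\mathcal{R}_n(k=n) & = & \frac{1-p_1^{n+1}}{(n+1)(1-p_1)\,p_0^{n}}, \\
\mathcal{R}_n(k=0) & = & \frac{1}{n+1}\left(\frac{1-p_1}{1-p_0}\right)^{n}.
\end{eqnarray}
For $p_0=0.1$ and $p_1=0.3$, the first expression gives
$\mathcal{R}_3\simeq 354$ and $\mathcal{R}_4\simeq 2850$, so at least 4
events are needed before $\mathcal{H}_0$ can be rejected.  The second gives
$\mathcal{R}_{16}\simeq 1.05\times 10^{-3}$ and
$\mathcal{R}_{17}\simeq 7.7\times 10^{-4}$, so at least 17 events are needed
before $\mathcal{H}_0$ can be accepted.  Under these assumptions, both minima
lie below the simulated medians of about 7 and 27 events quoted above, as
they must.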

658:

659: Fig.\,\ref{fig:median}\,(bottom) shows which decision is actually made,

660: depicting the fraction of data sets for which the null hypothesis ($p_0=0.1$)

661: is accepted and the fraction for which it is rejected, as a function of the

662: signal probability.

663:

664: Comparing the length of the test with the marginalized likelihood to Wald's

665: original test is not straightforward, since the length of each test depends on

666: the specifics of the problem, and because the probability $p_1$ has quite a

667: different meaning for the two methods.  However, we find that the marginalized

668: test tends to require fewer events when $p_1$ is the same in both tests.  For

669: the above example, the median number of events the Wald test requires to accept the null

670: hypothesis if it is true is 55, about twice as large as for the marginalized

671: likelihood ratio.  For signal probabilities $p>0.6$, the Wald test reaches a

672: plateau that is roughly comparable to the marginalized test.

673: Fig.\,\ref{fig:median_wald} shows the median number of events required for the

674: Wald test for $p_1=0.3$ and $\alpha=\beta=0.001$.

675:

676: %Comparing the length of the test with the marginalized likelihood to Wald's

677: %original test which compares $p_0$ to a fixed signal probability $p_s$ is not

678: %straightforward, as the length of the test depends on our choice of $p_s$ just

679: %like the length of the marginalized test depends on our choice of $p_1$.  Both

680: %parameters can be chosen independently, and the Wald sequential test can end

681: %sooner or later than the marginalized test depending on what values are chosen.

682: %However, we find that the marginalized test tends to require fewer events if

683: %$p_1=p_s$.  For the above example, the median number of events required to

684: %accept the null hypothesis if it is true is 55 and thus twice as large as for

685: %the marginalized likelihood ratio.  For signal probabilities $p>0.6$, the Wald

686: %test reaches a plateau that is roughly comparable to the marginalized test.

687: %Fig.\,\ref{fig:median_wald} shows the median number of events required for the

688: %Wald test for $p_s=0.3$ and $\alpha=\beta=0.001$.

689:

690:

691: % -----------------------------------------------------------------------------

692: \section{Summary}\label{sec:summary}

693: % -----------------------------------------------------------------------------

694:

695: We have outlined a sequential analysis technique for testing a point

696: null hypothesis with probability $p_0$ against a signal probability

697: $p$.  The method is based on the sequential analysis proposed

698: in~\citet{Wald:1945,Wald:1947}, but replaces the likelihood ratio used

699: to evaluate the significance of a signal with one that marginalizes

700: the signal strength.

701:

702: In many sequential tests, the signal strength is unknown when the test

703: starts.  Typically, the signal probability $p$ can in principle have any

704: value in the interval $\left[p_0,1\right]$.  Rather than choosing a fixed

705: threshold for $p$, as suggested in~\citet{Wald:1945,Wald:1947}, we have

706: argued that, in general, the better alternative is to marginalize $p$ and

707: account for our ignorance exactly.  In the marginalization of the signal

708: likelihood, the integration starts at some value $p_1=p_0+\delta$, where

709: $\delta$ is an \textsl{ad hoc} parameter reflecting the experimenter's belief

710: about the strength of the signal, the capability of the experiment,

711: and other \textsl{a priori} knowledge.

712:

713: Because of the integration of the signal likelihood over a range in $p$, the

714: parameters $\alpha$ and $\beta$ have lost their intuitive meaning if the

715: method is applied to data sets where $p$ is fixed, as is typically the

716: case for real data.  However, we have shown that for most values of $\delta$

717: and $p$ that occur in correlation searches, the type-1 and type-2

718: error rates of the sequential analysis are consistent with the classical

719: interpretations of the probabilities $\alpha$ and $\beta$.

720:

721: Note that we have run a test with one of two outcomes

722: (i.e., an acceptance or rejection of $\mathcal{H}_0$), defining $\alpha$ and

723: $\beta$, rather than one outcome (say, only a rejection of $\mathcal{H}_0$)

724: such as in~\citet{Darling:1968}.  The latter case supposes that we

725: are only concerned about reporting a signal.

726: However, it is important to state a null

727: result at some point in the interest of reducing reporting bias.

728: That is, it is important to ensure that results claiming an excess of

729: events at the 1\% level are indeed 1\% effects.

730:

731: The sequential analysis technique proposed here is efficient, allows the

732: signal significance to be evaluated after the test has been fulfilled,

733: adheres to the likelihood principle, and rigorously accounts for our

734: ignorance of the signal strength.

735:

736:

737: \acknowledgments

738:

739: We thank Diego Harari, Antoine Letessier-Selvon, and John A.J. Matthews for

740: valuable discussions and help.  This work is supported by the National Science

741: Foundation under contract numbers NSF-PHY-0500492 and NSF-PHY-0636875.

742:

743: \appendix

744: \section{The Truncated Sequential Analysis Test}

745:

746: In practice, the test must end.  It is supposed that a decision to

747: accept or reject the null hypothesis must

748: be made when $n=n_0$ if it has not been made already for $n\le n_0$.

749: Following the derivation of the modified errors for truncated tests

750: in~\citet{Wald:1945}, $\alpha(n_0)$ and $\beta(n_0)$

751: are defined as the probabilities of errors of the first and second

752: kinds if the test is truncated at $n=n_0$.  The objective is then

753: to derive upper bounds on $\alpha(n_0)$ and $\beta(n_0)$ for the case in

754: which (1) the test is truncated at $n=n_0$ and (2)

755: $\mathcal{H}_1$ is accepted if $R_{n_0}>1$ and $\mathcal{H}_0$ is accepted if

756: $R_{n_0}\le 1$.  In doing so,

757: we find a suitable $\delta$ and $n_0$ where $\alpha$

758: and $\beta$ are small.

759:

760: First, $\rho_0(n_0)$ is defined as the probability that, under the null hypothesis,

761: \begin{enumerate}

762: \item $B<R_{n_0-1}<A$

763: \item $1<R_{n_0}<A$

764: \item The sequential analysis would terminate with an acceptance

765: of $\mathcal{H}_0$ if allowed to continue.

766: \end{enumerate}

767: For the truncated test, we reject the null hypothesis if

768: $1<R_{n_0}<A$.  In other words, $\rho_0(n_0)$ is the

769: probability of wrongly rejecting the null hypothesis at truncation,

770: because $1<R_{n_0}<A$, in cases where the test would have terminated with an

771: acceptance of the null hypothesis

772: had we let it continue.  This is

773: added to the probability that the test would terminate wrongly if we let

774: it continue.

775: Therefore, the upper bound on $\alpha(n_0)$ can be expressed as

776: \begin{eqnarray}

777: \alpha(n_0)\le \alpha + \rho_0(n_0).

778: \end{eqnarray}

779: Now if $\bar{\rho}_0(n_0)$ is

780: simply the probability under the null hypothesis that $1<R_{n_0}<A$,

781: then $\rho_0(n_0)\le\bar{\rho}_0(n_0)$

782: and therefore

783: \begin{eqnarray}

784: \alpha(n_0)\le \alpha + \bar{\rho}_0(n_0).

785: \end{eqnarray}

786: Similarly, $\rho_1(n_0)$ is defined as the probability that,

787: under the ``signal'' hypothesis,

788: \begin{enumerate}

789: \item $B<R_{n_0-1}<A$

790: \item $B<R_{n_0}\le 1$

791: \item The sequential analysis would terminate with an acceptance

792: of $\mathcal{H}_1$ if allowed to continue.

793: \end{enumerate}

794: and

795: \begin{eqnarray}

796: \beta(n_0)\le \beta + \bar{\rho}_1(n_0),

797: \end{eqnarray}

798: where $\bar{\rho}_1(n_0)$ is defined to be

799: the probability under the signal hypothesis that $B<R_{n_0}\le 1$.

800:

801: We then calculate $\bar{\rho}_0(n_0)$ explicitly.

802: The probability of obtaining $1<R_{n_0}<A$ if the null hypothesis is true is

803: \begin{eqnarray}

804: \bar{\rho}_0(n_0)=

805: \sum_{k=k_{1+}}^{k_A} {n_0 \choose k} p_0^k(1-p_0)^{n_0-k},

806: \end{eqnarray}

807: where $k_{1+}$ is the minimum integer $k$ for which

808: \begin{eqnarray}

809: \frac{\frac{1}{1-p_0-\delta}\int_{p_0+\delta}^1p^k(1-p)^{n_0-k}\,dp}{p_0^{k}(1-p_0)^{n_0-k}}>1

810: \end{eqnarray}

811: and $k_{A}$ is the maximum integer $k$ for which

812: \begin{eqnarray}

813: \frac{\frac{1}{1-p_0-\delta}\int_{p_0+\delta}^1p^k(1-p)^{n_0-k}\,dp}{p_0^{k}(1-p_0)^{n_0-k}}<A

814: \end{eqnarray}
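For evaluating these conditions numerically, it may help to note that for
integer $k$ the integral reduces to a finite sum.  The following is a standard
identity for the incomplete beta function rather than part of the derivation
above; the shorthand $I(n_0,k)$ is introduced here only for illustration:
\begin{eqnarray}
I(n_0,k) & \equiv & \int_{p_0+\delta}^{1}p^{k}(1-p)^{n_0-k}\,dp \nonumber \\
 & = & \frac{1}{(n_0+1){n_0 \choose k}}
\sum_{j=0}^{k}{n_0+1 \choose j}\,(p_0+\delta)^{j}\,(1-p_0-\delta)^{n_0+1-j}.
\end{eqnarray}
As a check, $k=n_0$ gives $[1-(p_0+\delta)^{n_0+1}]/(n_0+1)$ and $k=0$ gives
$(1-p_0-\delta)^{n_0+1}/(n_0+1)$, in agreement with direct integration.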

815:

816: Similarly,

817: \begin{eqnarray}

818: \bar{\rho}_1(n_0)=

819: \frac{\sum_{k=k_B}^{k_{1-}} {n_0 \choose k} \frac{1}{1-p_0-\delta}\int_{p_0+\delta}^1p^{k}(1-p)^{n_0-k}\,dp}

820: {\sum_{k=0}^{n_0} {n_0 \choose k} \frac{1}{1-p_0-\delta}\int_{p_0+\delta}^1p^{k}(1-p)^{n_0-k}\,dp}

821: \end{eqnarray}

822: where $k_{1-}$ is the maximum integer $k$ for which

823: \begin{eqnarray}

824: \frac{\frac{1}{1-p_0-\delta}\int_{p_0+\delta}^1p^k(1-p)^{n_0-k}\,dp}{p_0^{k}(1-p_0)^{n_0-k}}\le 1

825: \end{eqnarray}

826: and $k_B$ is the minimum integer $k$ for which

827: \begin{eqnarray}

828: \frac{\frac{1}{1-p_0-\delta}\int_{p_0+\delta}^1p^k(1-p)^{n_0-k}\,dp}{p_0^{k}(1-p_0)^{n_0-k}}>B

829: \end{eqnarray}
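The denominator in the expression for $\bar{\rho}_1(n_0)$ is a normalization
that evaluates to unity, as the following short check shows.  Exchanging the
sum and the integral and applying the binomial theorem gives
\begin{eqnarray}
\sum_{k=0}^{n_0}{n_0 \choose k}\frac{1}{1-p_0-\delta}\int_{p_0+\delta}^{1}p^{k}(1-p)^{n_0-k}\,dp
& = & \frac{1}{1-p_0-\delta}\int_{p_0+\delta}^{1}\left[p+(1-p)\right]^{n_0}dp \nonumber \\
& = & 1,
\end{eqnarray}
so that $\bar{\rho}_1(n_0)$ is given by the numerator alone.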

830:

831: Under this scheme, Fig.\,\ref{fig:rho} shows $\bar{\rho}_0(n_0)$

832: and $\bar{\rho}_1(n_0)$ as a function of $\delta$ and $n_0$.

833: It shows that a rather large $\delta$ ($\sim 0.7$) is required to bring

834: $\bar{\rho}_0(n_0)$ and $\bar{\rho}_1(n_0)$ to be less than

835: $\alpha = \beta = 0.001$.  Further, if the calculation is

836: extended we find that

837: it would take $\sim 180$ events to bring

838: $\bar{\rho}_0(n_0)$ and $\bar{\rho}_1(n_0)$ to be $\sim 0$

839: for any $\delta$.

840:

841:

842:

843: \begin{thebibliography}{11}

844: \expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi

845:

846: \bibitem[{Abbasi {et~al.}(2004)}]{Abbasi:2004ib}

847: Abbasi, R.~U. {et~al.} 2004, Astrophys. J., 610, L73

848:

849: \bibitem[{Abbasi {et~al.}(2006)}]{Abbasi:2005qy}

850: ---. 2006, Astrophys. J., 636, 680

851:

852: \bibitem[{Anscombe (1954)}]{Anscombe:1954}

853: Anscombe, F.~J. 1954, Biometrics, 10, 89

854:

855: \bibitem[{Armitage {et~al.}(1969)}]{Armitage:1969}

856: Armitage, P., McPherson, C.~K., \& Rowe, B.~C. 1969, J. Roy. Stat. Soc. A, 132, 235

857:

858: \bibitem[{Berry (1987)}]{Berry:1987}

859: Berry, D.~A. 1987, Amer. Stat., 41, 117

860:

861: \bibitem[{Darling \& Robbins (1968)}]{Darling:1968}

862: Darling, D.~A. \& Robbins, H. 1968, Proc. Nat. Acad. Sci. USA, 61, 804

863:

864: \bibitem[{Gorbunov {et~al.}(2004)}]{Gorbunov:2004bs}

865: Gorbunov, D.~S., Tinyakov, P.~G., Tkachev, I.~I., \& Troitsky, S.~V. 2004, JETP

866:   Lett., 80, 145

867:

868: \bibitem[{Jeffreys (1939)}]{Jeffreys:1939}

869: Jeffreys, H. 1939, Theory of Probability (London: Oxford University Press)

870:

871: \bibitem[{Lewis \& Berry (1994)}]{Lewis:1994}

872: Lewis, R.~J. \& Berry, D.~A. 1994, J. Amer. Stat. Assoc., 89, 1528

873:

874: \bibitem[{Kass \& Raftery (1995)}]{Kass:1995}

875: Kass, R.~E. \& Raftery, A.~E. 1995, J. Amer. Stat. Assoc., 90, 773

876:

877: \bibitem[{Takeda {et~al.}(1999)}]{Takeda:1999sg}

878: Takeda, M. {et~al.} 1999, Astrophys. J., 522, 225

879:

880: \bibitem[{Tinyakov \& Tkachev (2001)}]{Tinyakov:2001nr}

881: Tinyakov, P.~G. \& Tkachev, I.~I. 2001, JETP Lett., 74, 445

882:

883: \bibitem[{Wald (1945)}]{Wald:1945}

884: Wald, A. 1945, Ann. Math. Stat., 16, 117

885:

886: \bibitem[{Wald (1947)}]{Wald:1947}

887: ---. 1947, Sequential Analysis (New York, NY: John Wiley and Sons)

888:

889: \end{thebibliography}

890:

891: \clearpage

892:

893: \begin{figure}

894: \plotone{f1.eps}

895: \caption{Likelihood ratio as a function of $p_1$ for $n=10$, $k=6$,

896: and $p_0=0.1$.\label{fig:R_vs_p1}}

897: \end{figure}

898:

899: \clearpage

900:

901: \begin{figure}

902: \plotone{f2a.eps}

903: \plotone{f2b.eps}

904: \caption{Likelihood ratio as a function of the number of events for a background

905: probability $p_0=0.1$, $p_1=0.3$, and a signal probability $p=0.5$ (top) and $p=0.1$ (bottom).

906: The ratio is calculated for $10^5$ random data sets.  The plots show the median (dark grey dots)

907: together with the range that includes 68\,\% and 95\,\% of the data sets (dark and light grey

908: areas).  The values for the test boundaries $A$ and $B$ for $\alpha=\beta=0.001$ are indicated

909: as dashed and dotted lines.\label{fig:R_vs_n}}

910: \end{figure}

911:

912: \clearpage

913:

914: \begin{figure}

915: \plotone{f3a.eps}

916: \plotone{f3b.eps}

917: \caption{Range for $p_1>p_0$ for which the ratio test terminates with

918: type-1 error probabilities greater than $\alpha$ (dark grey), as a

919: function of $p_0$.  Range for $p>p_1$ for which the ratio test terminates

920: with type-2 error probabilities greater than $\beta$, as a function of $p_0$ (light

921: grey).  The upper plot is for $\alpha=\beta=0.01$, the lower plot for

922: $\alpha=\beta=0.001$.\label{fig:zones}}

923: \end{figure}

924:

925: \clearpage

926:

927: \begin{figure}

928: \plotone{f4a.eps}

929: \plotone{f4b.eps}

930: \caption{{\it Top:}  Median number of events necessary for the sequential test

931: to come to a conclusion, as a function of the signal probability $p$.  In this

932: example, the background probability is $p_0=0.1$, and $p_1=0.3$, $\alpha=\beta=0.001$.

933: Error bars indicate the range that includes 68\,\% of the simulated data sets.

934: {\it Bottom:}  For the same simulated data sets, fraction of data sets for which

935: the null hypothesis is accepted (solid line) and rejected (dotted line) as a

936: function of the signal probability $p$ for a background probability $p_0=0.1$.

937: \label{fig:median}}

938: \end{figure}

939:

940: \clearpage

941:

942: \begin{figure}

943: \plotone{f5.eps}

944: \caption{Median number of events necessary for the Wald sequential test

945: to come to a conclusion (open circles), as a function of the signal probability

946: $p$, compared to the marginalized likelihood ratio test (filled circles).  The

947: fixed point $p_1$ is the same in both cases.  For this example, the background

948: probability is $p_0=0.1$, and $p_1=0.3$, $\alpha=\beta=0.001$.  Error bars

949: indicate the range that includes 68\,\% of the simulated data sets.

950: \label{fig:median_wald}}

951: \end{figure}

952:

953: \clearpage

954:

955: \begin{figure}

956: \plotone{f6.eps}

957: \caption{The added errors $\bar{\rho}_0(n_0)$ (for $\alpha$) and $\bar{\rho}_1(n_0)$ (for $\beta$)

958: as a function of $\delta$, where $p$ is integrated from $p_0+\delta$ to 1,

959: and of the number of events at which the test is truncated, $n_0$.  }

960: \label{fig:rho}

961: \end{figure}

962:

963: \end{document}

964: