1: %%
2: %% Beginning of file 'sample.tex'
3: %%
4: %% Modified 2005 December 5
5: %%
6: %% This is a sample manuscript marked up using the
7: %% AASTeX v5.x LaTeX 2e macros.
8: 
9: %% The first piece of markup in an AASTeX v5.x document
10: %% is the \documentclass command. LaTeX will ignore
11: %% any data that comes before this command.
12: 
13: %% The command below calls the preprint style
14: %% which will produce a one-column, single-spaced document.
15: %% Examples of commands for other substyles follow. Use
16: %% whichever is most appropriate for your purposes.
17: %%
18: \documentclass[12pt,preprint]{aastex}
19: 
20: %% manuscript produces a one-column, double-spaced document:
21: 
22: %% \documentclass[manuscript]{aastex}
23: 
24: %% preprint2 produces a double-column, single-spaced document:
25: 
26: %% \documentclass[preprint2]{aastex}
27: 
28: %% Sometimes a paper's abstract is too long to fit on the
29: %% title page in preprint2 mode. When that is the case,
30: %% use the longabstract style option.
31: 
32: %% \documentclass[preprint2,longabstract]{aastex}
33: 
34: %% If you want to create your own macros, you can do so
35: %% using \newcommand. Your macros should appear before
36: %% the \begin{document} command.
37: %%
38: %% If you are submitting to a journal that translates manuscripts
39: %% into SGML, you need to follow certain guidelines when preparing
40: %% your macros. See the AASTeX v5.x Author Guide
41: %% for information.
42: 
43: %%\usepackage{amsmath}
44: 
45: \newcommand{\vdag}{(v)^\dagger}
46: \newcommand{\myemail}{skywalker@galaxy.far.far.away}
47: 
48: %% Bibliography styles.
49: \citestyle{aa}
50: \bibliographystyle{apj}
51: 
52: %% You can insert a short comment on the title page using the command below.
53: 
54: \slugcomment{}
55: 
56: %% If you wish, you may supply running head information, although
57: %% this information may be modified by the editorial offices.
58: %% The left head contains a list of authors,
59: %% usually a maximum of three (otherwise use et al.).  The right
60: %% head is a modified title of up to roughly 44 characters.
61: %% Running heads will not print in the manuscript style.
62: 
63: \shorttitle{Sequential Analysis in Particle Astronomy}
64: \shortauthors{BenZvi et al.}
65: 
66: %% This is the end of the preamble.  Indicate the beginning of the
67: %% paper itself with \begin{document}.
68: 
69: \begin{document}
70: 
71: %% LaTeX will automatically break titles if they run longer than
72: %% one line. However, you may use \\ to force a line break if
73: %% you desire.
74: 
75: 
76: \title{Sequential Analysis Techniques for Correlation Studies \\
77:        in Particle Astronomy}
78: 
79: \author{S.Y. BenZvi\altaffilmark{1}, B.M. Connolly\altaffilmark{2}, and S. Westerhoff\altaffilmark{1}}
80: \altaffiltext{1}{University of Wisconsin-Madison, Department of Physics, 1150 University
81:               Avenue, Madison, WI 53706, USA}
82: 
83: \altaffiltext{2}{University of Pennsylvania, Department of Physics and Astronomy,
84:               209 South $33^{\mathrm{rd}}$ Street, Philadelphia, PA 19104, USA; brianco@sas.upenn.edu}
85: 
86: %\author{S.Y. BenZvi}
87: %\affil{Columbia University, Department of Physics and Nevis Laboratories,
88: %                   538 West $\it 120^{th}$ Street, New York, NY 10027, USA}
89: 
90: %\author{B.M. Connolly}
91: %\affil{University of Pennsylvania, Department of Physics and Astronomy,
92: %                209 South $\it 33^{rd}$ Street, Philadelphia, PA 19104, USA}
93: 
94: %\author{S. Westerhoff}
95: %\affil{University of Wisconsin-Madison, Department of Physics, 1150 University
96: %              Avenue, Madison, WI 53706, USA}
97: 
98: 
99: \begin{abstract}
100: 
101: Searches for statistically significant correlations between arrival directions
102: of ultra-high energy cosmic rays and classes of astrophysical objects are
103: common in astroparticle physics.  We present a method to test potential
104: correlation signals of \textit{a priori} unknown strength and evaluate their
105: statistical significance sequentially, i.e., after each incoming new event in a
106: running experiment.  The method can be applied to data taken after the test has
107: concluded, allowing for further monitoring of the signal significance.  It
108: adheres to the likelihood principle and rigorously accounts for our ignorance
109: of the signal strength.
110: 
111: \end{abstract}
112: 
113: \keywords{cosmic rays --- methods: statistical}
114: 
115: 
116: % -----------------------------------------------------------------------------
117: \section{Introduction}\label{sec:intro}
118: % -----------------------------------------------------------------------------
119: 
120: One of the major goals in astroparticle physics is the identification and the
121: study of sources of ultra-high energy cosmic rays, defined as cosmic rays with
122: energies larger than $10^{18}$\,eV.  The discovery of discrete sources would
123: answer longstanding questions about how and where particles are accelerated to
124: such energies.  So far, no discrete sources have been positively identified.
125: One major obstacle for the identification of potential sources is the small
126: number of detected events.  Until a few years ago, the published world data set
127: of cosmic rays with energies above $4\,\times 10^{19}$\,eV consisted of little
128: more than 100 events, mainly recorded with the Akeno Giant Air Shower Array
129: (AGASA) in Japan between 1984 and 2003~\citep{Takeda:1999sg}, and the High
130: Resolution Fly's Eye (HiRes) Experiment in Utah between 1997 and
131: 2006~\citep{Abbasi:2004ib}.
132: 
133: Nevertheless, the small data set has been subjected to exhaustive searches for
134: deviations from isotropy.  These include searches for point sources; searches
135: for an excess of clustering in the distribution of arrival directions on
136: various angular scales; and searches for correlations with classes of known
137: astrophysical objects that were considered likely sites of cosmic ray
138: acceleration.  Some of these searches resulted in potential signals, but
139: because of the small size of the data set, the statistical significance could
140: not be established in a reliable manner.  Consequently, while the discovery of
141: discrete sources was claimed repeatedly, statistically independent data
142: routinely failed to support earlier claims.  An example is the search for
143: correlations of cosmic ray arrival directions with objects of the BL Lac
144: class~\citep{Tinyakov:2001nr,Gorbunov:2004bs,Abbasi:2005qy}.
145: 
146: With a new generation of large-aperture astroparticle physics detectors like
147: the Pierre Auger Observatory nearing completion in Malarg\"ue, Argentina and
148: the Telescope Array detector under construction in Utah, the amount of
149: ultra-high energy data is now growing at an unprecedented pace.  The Pierre
150: Auger Observatory, for instance, began scientific data taking in January 2004
151: and has already accumulated over
152: $9\times10^3\,\mathrm{km}^{2}\,\mathrm{sr}\,\mathrm{yr}$ of integrated
153: exposure, more than any previous experiment.
154: 
155: \subsection{Basic Search Techniques in Cosmic Ray Physics}
156: 
157: The fact that previous experiments have failed to find statistically
158: significant deviations from isotropy in skymaps of ultra-high energy cosmic
159: rays can be seen as an indication that the sources are weak.  In this case,
160: the most promising correlation searches are not those which aim at finding
161: sources individually, but rather those conducted on a statistical basis;
162: i.e., searches for significant correlations of cosmic ray arrival
163: directions with catalogs of astrophysical objects.   
164: 
165: When studying correlations with objects from a source catalog, one tests
166: whether the probability $p$ of a given event to arrive from the direction of an
167: object in the catalog is significantly larger than the probability $p_0$ of the
168: correlation occurring by chance.  These analyses are typically binned, so an
169: event is said to correlate with an object from the catalog if the angle between
170: its arrival direction and the object's position is smaller than some angle
171: $\theta$.  If the particles are neutral, $\theta$ could be chosen to reflect
172: the point spread function of the detector.  In the case of cosmic rays,
173: however, the particles are most likely charged and therefore deflected by
174: Galactic and intergalactic magnetic fields of (unknown) strength.
175: Consequently, $\theta$ is usually chosen to be larger than the resolution of
176: the detector to account for magnetic smearing.
177: 
Typically, potential signals are identified after intensive searches over
different angular scales, energy thresholds, source catalogs, and other
parameters, with the final values chosen to maximize the signal strength.
Therefore, an unbiased chance probability for the observed signal can only be
established by discarding the data set used to find the signal and testing the
signal with statistically independent data.  For this test, the source catalog
and all analysis parameters are fixed \textit{a priori}.
186: 
187: Once the \textsl{a priori} analysis parameters are identified, the problem is
188: easily formulated in terms of a classical hypothesis test, in which new data
189: are checked for compatibility with a null hypothesis $\mathcal{H}_0$ (``the
190: data exhibit no significant correlation'') or an alternative ``signal''
191: hypothesis $\mathcal{H}_1$.  There are several ways to perform such a test.
192: For example, one can run the test after the new data set has reached a certain
193: size $n$, or after the experiment has run for a certain fixed amount of time.
194: 
195: Formally, the size of the data set and the acceptance or rejection of the null
196: hypothesis are determined by two probabilities, $\alpha$ and $\beta$, which are
197: usually chosen before the start of the test.  These values define the
198: experimenter's tolerance for different sorts of experimental errors: $\alpha$
199: is the probability of wrongly rejecting the null hypothesis when
200: $\mathcal{H}_0$ is true (a type-1 or ``false positive'' error); and $\beta$ is
201: the probability of wrongly accepting the null hypothesis when $\mathcal{H}_0$
202: is false (a type-2 or ``false negative'' error).  In a classical one-sided
203: hypothesis test, where a p-value $P$ is used to estimate the agreement of the
204: data with the null hypothesis, the result $P<\alpha$ implies rejection of
205: $\mathcal{H}_0$ at the ``confidence level'' $1-\alpha$.  Meanwhile, the desired
206: probability of rejecting a false null hypothesis $(1-\beta)$ fixes the required
207: size of the data set $(n)$.
208: 
209: \subsection{One-Shot vs. Sequential Testing}
210: 
211: If one chooses to evaluate $P$ after a predefined number of events has been
212: recorded, or a predefined amount of time has elapsed, then the significance of
213: the signal is tested only once.  However, it is often desirable to evaluate and
214: test the signal sequentially, i.e., after each new event, rather than
215: at the end of the test.  This approach allows for the possibility of claiming a
216: statistically significant result earlier than with methods that check the
217: signal only once, a distinct advantage when event rates are quite low.  It also
218: avoids another practical disadvantage of hypothesis tests that arises when the
219: experiment, for one reason or another, has to discontinue data taking before
220: the predefined number of events is taken.  In that case, the ``one-shot''
221: analysis does not lead to a conclusion.
222: 
223: A sequential analysis can be performed in several ways.  If $P$ is evaluated
224: after every incoming event and not just once after all $n$ events are
225: collected, a ``penalty'' factor has to be inserted to account for the fact that
226: there are now more opportunities to satisfy the test by
227: chance~\citep{Anscombe:1954,Armitage:1969}.  This penalty
228: factor can be evaluated with simulations and will depend on $n$.  The
229: dependence of $P$ on $n$ is an undesirable feature of the method; rather than
230: depending on the data that were actually recorded, $P$ now depends on the
231: number of events that an observer would have recorded had he decided to perform
232: a ``one-shot'' test.  The interpretation of the data therefore depends on data
233: not actually taken.  This feature of the test violates the likelihood
234: principle~\citep{Berry:1987}.   
235: 
In addition, the inclusion of the penalty factor means that data arriving after
the test has ended cannot be folded into the calculation of $P$ for the entire
data set.  This is a serious limitation: in many practical situations, data
taking continues after the test has ended, and it is highly desirable to keep
monitoring the signal probability with new data.
242: 
The classical sequential likelihood ratio test developed by~\citet{Wald:1945,Wald:1947}
avoids the limitations that arise when using the p-value $P$.  Wald defines the
likelihood ratio evaluated after the $n^{\mathrm{th}}$ event as
246: % 
247: \begin{equation}
248: \mathcal{R}_n=\frac{P(\mathcal{D}|\mathcal{H}_1)}{P(\mathcal{D}|\mathcal{H}_0)}~~,
249: \label{eq:likelihood_ratio}
250: \end{equation}
251: %
where the numerator and denominator are the probabilities of observing a
data set $\mathcal{D}$ given the alternative hypothesis (correlation) and the
null hypothesis (no correlation), respectively.  The ratio $\mathcal{R}_n$ can
be evaluated after each incoming event (i.e., after the $n^{\mathrm{th}}$
event) without statistical penalty.  The test stops with the rejection of the
null hypothesis when $\mathcal{R}_n$ exceeds a predefined upper threshold, and
with its acceptance when $\mathcal{R}_n$ falls below a predefined lower
threshold (details are given in Section~\ref{sec:method}).  Moreover, the
evaluation of $\mathcal{R}_n$ can continue after the decision to see whether
new data continue to favor or disfavor the selected hypothesis.
261: 
262: The probabilities $P(\mathcal{D}|\mathcal{H}_0)$ and
263: $P(\mathcal{D}|\mathcal{H}_1)$ in eq.\,(\ref{eq:likelihood_ratio}) depend on the
264: expected correlations in case of random coincidences and true signals,
265: respectively.  In correlation studies, the strength of the signal is typically
266: not known before the test is complete; so in the analysis proposed
267: by~\citet{Wald:1945,Wald:1947}, one simply takes a ``best guess'' at the lower
268: bound of the signal strength.  In this paper, we extend Wald's technique to
269: marginalize the signal strength, which more rigorously accounts for our
270: ignorance of the true signal.  As in the classical likelihood ratio test, this
271: extended test can be applied after each new event without statistical penalty,
272: so that it adheres to the likelihood principle.  It also allows for the
273: evaluation of the significance of the signal after the test has been fulfilled,
274: as well as in cases where the test stops prematurely.
275: 
276: We note that the usefulness of this test is not limited to cosmic ray physics.
277: It can be applied in many other areas of astroparticle physics or astrophysics
278: where event rates are low, for example in searches for discrete sources of high
279: energy neutrinos or $\gamma$-rays.
280: 
281: %This paper is organized as follows.  After a description of the method in
282: %Section\,\ref{sec:method}, we analyze the behavior of the test with simulated
283: %data sets in Section\,\ref{sec:test}.  Section~\ref{sec:summary} summarizes
284: %the results.
285: 
286: % -----------------------------------------------------------------------------
287: \section{The Method}\label{sec:method}
288: % -----------------------------------------------------------------------------
289: 
290: We consider the case of an analysis searching for correlations between cosmic
291: ray arrival directions and objects from a catalog.  The background probability
292: $p_0$ is the probability that a given event correlates by chance.  We want to
293: test the signal probability $p_1$ against $p_0$.  If two point hypotheses are
294: tested against each other, $p_0$ and $p_1$ are single numbers; but in general,
295: $p_1$ can also have a range of values.  If, for example, the ``signal''
296: corresponds to a stronger correlation than can be expected by chance, then
297: $p_1>p_0$.
298: 
299: Since an event can either be correlated with an object from the catalog or not,
300: the probability of observing a data set $\mathcal{D}$ in which $k$ out of $n$
301: events correlate with sources is given by the binomial distribution
302: %
303: \begin{equation}
304: P(\mathcal{D}|p) = P(n,k|p) = {n \choose k}\ p^k\ (1-p)^{n-k}
305: \label{eq:binomial_distribution}
306: \end{equation}
307: %
308: where $p$ is the probability of a given event to correlate.  If the data show
309: no significant correlations in addition to those occurring by chance, then
310: $p=p_0$.
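
As a purely illustrative evaluation of eq.\,(\ref{eq:binomial_distribution}),
consider $n=10$ events of which $k=6$ correlate, and compare the chance
probability $p=p_0=0.1$ with a hypothetical signal probability $p=0.6$ that is
assumed here only for the sake of the example:
%
\begin{eqnarray*}
  P(10,6\,|\,0.1) & = & 210\,(0.1)^6\,(0.9)^4 \simeq 1.4\times10^{-4}~, \\
  P(10,6\,|\,0.6) & = & 210\,(0.6)^6\,(0.4)^4 \simeq 0.25~.
\end{eqnarray*}
%
The binomial coefficient ${10 \choose 6}=210$ is common to both probabilities
and cancels in any ratio of the two, which in this example is about
$1.8\times10^{3}$.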
311: 
312: In a sequential analysis that tests hypothesis $\mathcal{H}_1$ against
313: $\mathcal{H}_0$ with data $\mathcal{D}$, the probability ratio $\mathcal{R}_n$ of
314: eq.\,(\ref{eq:likelihood_ratio}) is calculated after each incoming event, and is
315: then compared to two positive constants $A$ and $B$ (where $B<A$).  During each
316: step in the sequence, the experimenter is presented with the following possible
317: outcomes:
318: %
319: \begin{enumerate}
320:   \item $\mathcal{R}_n\ge A$: the test terminates with the rejection of
321:         $\mathcal{H}_0$.
322:   \item $\mathcal{R}_n\le B$: the test terminates with the acceptance of
323:         $\mathcal{H}_0$.
324:   \item $B<\mathcal{R}_n<A$: the test continues to record data.
325: \end{enumerate}
326: %
327: \citet{Wald:1945,Wald:1947} showed that the constants
328: $A$ and $B$ are closely related to the probabilities $\alpha$ and $\beta$ of
329: type-1 and type-2 errors:
330: \begin{equation}
331:   A\leq\frac{1-\beta}{\alpha}~~~\mathrm{and}~~~
332:   B\geq\frac{\beta}{1-\alpha}~~.
333: \end{equation}
334: %
335: While it is difficult in most practical situations to estimate exact values for
336: $A$ and $B$, Wald showed that simply choosing 
337: %
338: \begin{equation}
339:   A = \frac{1-\beta}{\alpha}~~~\mathrm{and}~~~
340:   B = \frac{\beta}{1-\alpha}~~, 
341: \end{equation}
342: %
as the test boundaries leads to adequate results if $\alpha$ and $\beta$ are
small (typically, they are not larger than 0.05).  By adequate, we mean that
the true type-1 and type-2 error rates cannot exceed the nominal values by any
appreciable amount: Wald's bounds guarantee that they are at most
$\alpha/(1-\beta)$ and $\beta/(1-\alpha)$, respectively, and that their sum is
at most $\alpha+\beta$.  In fact, the true error rates will often be smaller
than the nominal $\alpha$ and $\beta$ specified before the start of the
experiment.
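
As an illustration, for the values $\alpha=\beta=0.001$ used for the
simulations shown later in this paper, the boundaries evaluate to
%
\begin{displaymath}
  A = \frac{0.999}{0.001} = 999~~~\mathrm{and}~~~
  B = \frac{0.001}{0.999} \simeq 1.001\times10^{-3}~~,
\end{displaymath}
%
so that the test continues as long as
$|\ln\mathcal{R}_n| < \ln 999 \simeq 6.91$.  Whenever $\alpha=\beta$, the two
boundaries are reciprocal, $B=1/A$, and hence symmetric in
$\ln\mathcal{R}_n$.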
348: 
349: %The test can terminate at any time and still provide valuable information, 
350: %or it can continue even after a decision is made to see whether additional 
351: %data further supports the decision or not.  No penalty factor is required, 
352: %but still, the probabilities are evaluated after each incoming event.
353: 
354: For a data set that contains $n$ events and $k$ correlations, the likelihood
355: ratio is given by
356: %
357: \begin{equation}
358:   \mathcal{R}^\prime _n
359:     =\frac{P(\mathcal{D}|p_1)}{P(\mathcal{D}|p_0)}
360:     =\frac{p_1^k (1-p_1)^{n-k}}{p_0^k (1-p_0)^{n-k}}~~.
361:   \label{eq:likeli}
362: \end{equation}
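It can be convenient to track the logarithm of eq.\,(\ref{eq:likeli}), since
each new event then simply adds a constant to a running total.  The following
is a sketch of that bookkeeping under illustrative parameter values, not an
additional result:
%
\begin{displaymath}
  \ln\mathcal{R}^\prime_n = k\,\ln\frac{p_1}{p_0}
                          + (n-k)\,\ln\frac{1-p_1}{1-p_0}~.
\end{displaymath}
%
For example, with $p_0=0.1$ and $p_1=0.3$, each correlated event contributes
$\ln 3\simeq+1.10$ and each uncorrelated event contributes
$\ln(0.7/0.9)\simeq-0.25$.  Assuming a rejection threshold of $A=999$ (i.e.,
$\alpha=\beta=0.001$), an unbroken run of correlated events would need 7
events to reject $\mathcal{H}_0$, since $3^6=729<999\le3^7=2187$.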
363: 
364: In practice, the signal strength $p_1$ is often not known.  We consider here
365: the common case of a one-sided test where $p_0 < p_1 \leq 1$.  The confidence
366: in rejecting $\mathcal{H}_0$ typically increases with increasing $p$.  To
367: evaluate $\mathcal{R}_n$ in this case, we can expand the numerator and
368: denominator of eq.\,(\ref{eq:likelihood_ratio}) in terms of $p$:
369: %
\begin{equation}
  \mathcal{R}_n = \frac{\int_0^1 P(\mathcal{D}|p)\ P(p|\mathcal{H}_1)\ dp}
                     {\int_0^1 P(\mathcal{D}|p)\ P(p|\mathcal{H}_0)\ dp}~.
\end{equation}
374: 
375: The quantities $P(p|\mathcal{H}_1)$ and $P(p|\mathcal{H}_0)$ represent our
376: prior assumptions about $p$ in the cases of true signal vs. chance
377: correlations.  In cosmic ray studies, the probability $p_0$ of a chance
378: correlation with a catalog object is estimated from the \textsl{a priori}
379: parameters of the test: e.g., the detector exposure to the catalog,
the angular bin size $\theta$, etc.  In contrast, it is fairly uncommon to have
a reliable estimate of the signal probability beyond the fact that it exceeds
$p_0$.  Absent further knowledge of the signal, we therefore treat the signal
probability $p$ as uniformly distributed on the interval $[p_1,1]$, where $p_1$
now denotes the smallest signal probability under consideration.  Hence, we
summarize our prior knowledge of the two cases by
385: %
386: \begin{eqnarray}
387:   P(p|\mathcal{H}_1) & = & \frac{\Theta(p-p_1)}{1-p_1}~~, \\
388:   P(p|\mathcal{H}_0) & = & \delta(p-p_0)~~.
389: \end{eqnarray}
390: %
Here $\Theta$ is the Heaviside step function and $\delta$ is the Dirac delta
function.  Note that $p$ is taken to be independent of time, although we see
nothing inherently problematic in introducing a time dependence.  Few models
of ultra-high energy cosmic rays propose one, but if a time-dependent model is
inserted for $\mathcal{H}_0$, the probability of each successive event is
evaluated according to what is expected at the time it was measured.  However,
if $\mathcal{H}_0$ and $\mathcal{H}_1$ are simply wrong, that is, if the
hypotheses do not properly reflect what could happen in nature, then any
result is possible.  This hazard exists for any hypothesis test.
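
For completeness, the intermediate step can be written out explicitly.  It is
a direct substitution of the priors above and
eq.\,(\ref{eq:binomial_distribution}) into the marginalized ratio:
%
\begin{eqnarray*}
  \int_0^1 P(\mathcal{D}|p)\ P(p|\mathcal{H}_1)\ dp & = &
      \frac{1}{1-p_1}\,{n \choose k}\int_{p_1}^1 p^k\ (1-p)^{n-k}\ dp~, \\
  \int_0^1 P(\mathcal{D}|p)\ P(p|\mathcal{H}_0)\ dp & = &
      {n \choose k}\ p_0^k\ (1-p_0)^{n-k}~.
\end{eqnarray*}
%
The step function restricts the first integral to $p\ge p_1$, the delta
function collapses the second integral to $p=p_0$, and the binomial
coefficients cancel in the ratio.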
400: 
401: Solving for the likelihood ratio $\mathcal{R}_n$, we have
402: %
403: \begin{eqnarray}
404:   \mathcal{R}_n & = & \frac{\int_{p_1}^1 p^k\ (1-p)^{n-k}\ dp}
405:                          {p_0^k\ (1-p_0)^{n-k}\ (1-p_1)}\\
406:               & = & \frac{\mathrm{B}(k+1, n-k+1) - 
407:                           \mathrm{B}(p_1; k+1, n-k+1)}
408:                          {p_0^k\ (1-p_0)^{n-k}\ (1-p_1)}~,
409:   \label{eq:final_ratio}
410: \end{eqnarray}
411: %
where $\mathrm{B}(a,b)$ and
$\mathrm{B}(x;a,b)=\int_0^x t^{a-1}(1-t)^{b-1}\,dt$ are the complete and
incomplete beta functions, respectively.  Eq.\,(\ref{eq:final_ratio}) is a
convenient form for the numerical computation of $\mathcal{R}_n$.
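
As a numerical cross-check of eq.\,(\ref{eq:final_ratio}), consider the
illustrative case $n=10$, $k=6$, and $p_1=p_0=0.1$ (the choice $p_1=p_0$ is an
assumption made here only for the sake of the example).  The beta functions
are
%
\begin{eqnarray*}
  \mathrm{B}(7,5) & = & \frac{6!\,4!}{11!} = \frac{1}{2310}
                        \simeq 4.329\times10^{-4}~, \\
  \mathrm{B}(0.1;7,5) & = & \int_0^{0.1} p^6\,(1-p)^4\,dp
                        \simeq 9.9\times10^{-9}~,
\end{eqnarray*}
%
and the denominator is $(0.1)^6\,(0.9)^4\,(0.9)\simeq5.905\times10^{-7}$, so
that
%
\begin{displaymath}
  \mathcal{R}_{10} \simeq \frac{4.329\times10^{-4}}{5.905\times10^{-7}}
                   \simeq 7.3\times10^{2}~.
\end{displaymath}
%
In this example the incomplete beta function is negligible compared with the
complete one, so the ratio is dominated by the full integral over $p$.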
415: 
When nothing is known \textit{a priori} about the strength of the signal, $p_1$
should be chosen close to $p_0$ so that the test covers as large a range of signal probabilities $p$ as possible.
418: If more information on $p$ were available --- for example, if it were known
419: that $p$ is larger than some value $p_{\mbox{\scriptsize min}}$ --- then the range of
420: integration could be made smaller.  To illustrate the merits of improved
421: knowledge, Fig.\,\ref{fig:R_vs_p1} shows $\mathcal{R}_n$ as a function of $p_1$
422: for $n=10$, $k=6$, and $p_0=0.1$.  Since the ``true'' probability for an event
423: to correlate is $p=6/10=0.6$, choosing $p_1$ close to $p$ increases
424: $\mathcal{R}_n$ and therefore minimizes the time necessary to confirm the signal.
425: As $p_1$ continues to increase beyond the true signal probability,
426: $\mathcal{R}_n$ decreases, as expected.
427: 
428: Fig.\,\ref{fig:R_vs_n} shows the results of the sequential analysis described
429: above when applied to simulated data sets.  The background probability is
430: $p_0=0.1$; $p_1=0.3$ is the minimum signal we choose to distinguish from the
431: background; and $\alpha=\beta=0.001$.  The upper plot shows the result of the
432: test for data sets with a correlation probability of $p=0.5$ ($\mathcal{H}_0$
433: is false), whereas for the bottom plot, $p=0.1$ ($\mathcal{H}_0$ is true).  For
434: both plots, the analysis is performed for $10^5$ Monte Carlo data sets, and the
435: dark and light grey areas indicate the range that includes 68\% and 95\% of the
436: data sets.  
437: 
438: % -----------------------------------------------------------------------------
439: \section{The Ratio of Likelihoods, the Ratio of Posteriors, and the Meaning
440: of $\alpha$ and $\beta$}
441: % -----------------------------------------------------------------------------
442: 
Here, $\mathcal{R}_n$ is defined as a ratio of likelihoods, but
one could just as easily define it as a ratio of posterior probabilities,
as suggested by~\citet{Wald:1945,Wald:1947}.  Changing the definition of
$\mathcal{R}_n$, however, has consequences for the interpretation of $\alpha$
and $\beta$, because their meaning is tied to the meaning of the probabilities
in the numerator and denominator of $\mathcal{R}_n$.

One could argue that, since we are marginalizing parameters anyway,
we might as well calculate the posterior probabilities.
This has certain advantages.
The ratio would then be defined as
\begin{equation}
\mathcal{R}^{\mathrm{post}}_n
= \frac{P(\mathcal{H}_1|\mathcal{D})}{P(\mathcal{H}_0|\mathcal{D})}
= \frac{P(\mathcal{D}|\mathcal{H}_1)\,P(\mathcal{H}_1)}
       {P(\mathcal{D}|\mathcal{H}_0)\,P(\mathcal{H}_0)}~.
\end{equation}
The priors $P(\mathcal{H}_1)$ and $P(\mathcal{H}_0)$ are chosen by the
experimenter.  $A$ and $B$ then become thresholds for the ``degrees of
belief'' that we must hold for one hypothesis over the other before we claim
one of them to be true.  For instance, given that $\mathcal{H}_1$ is true,
$1-\beta$ becomes the required confidence for $P(\mathcal{H}_1|\mathcal{D})$
and $\alpha$ the required confidence for $P(\mathcal{H}_0|\mathcal{D})$ before
we claim that $\mathcal{H}_1$ is true, i.e., $A=(1-\beta)/\alpha$.
472: 
However, as noted by~\citet{Wald:1945,Wald:1947}, the likelihood ratio also
has its merits.  First, it has precedent: even those who subscribe to the
Bayesian formalism use marginalized likelihood ratios (i.e., Bayes
factors)~\citep{Jeffreys:1939,Kass:1995}.  Second, using a likelihood ratio
avoids the priors $P(\mathcal{H}_0)$ and $P(\mathcal{H}_1)$, which can
strongly influence the result.  Third, likelihood ratios allow like-for-like
comparisons with the likelihood ratios used in other analyses with fixed $p_0$
and $p_1$.  The interpretation of $A$ and $B$, however, becomes more
cumbersome, even in the present circumstance where we are unconcerned whether
or not the test ever terminates.  For instance, given that $\mathcal{H}_1$ is
true, $A$ parameterizes how much more likely it must be that the data come
from a universe where $\mathcal{H}_1$ is true, as opposed to $\mathcal{H}_0$,
before we claim that $\mathcal{H}_1$ is indeed true.
487: 
In short, using a ratio of posteriors allows $\alpha$ and $\beta$ to be
conceptualized intuitively as degrees of belief in one hypothesis or the
other.  Likelihood ratios are in common use and do not require priors for
$\mathcal{H}_1$ and $\mathcal{H}_0$, but with them $\alpha$ and $\beta$ can no
longer be conceptualized as degrees of belief in $\mathcal{H}_0$ and
$\mathcal{H}_1$.  Here, we opt for the more traditional likelihood ratio,
which can also be thought of as a ratio of posteriors with
$P(\mathcal{H}_1)=P(\mathcal{H}_0)$.
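
The relation between the two ratios can be made explicit.  By Bayes' theorem,
the posterior ratio is the likelihood ratio multiplied by the prior odds,
%
\begin{displaymath}
  \mathcal{R}^{\mathrm{post}}_n
    = \mathcal{R}_n\,\frac{P(\mathcal{H}_1)}{P(\mathcal{H}_0)}~,
\end{displaymath}
%
so a threshold $A$ on $\mathcal{R}^{\mathrm{post}}_n$ corresponds to a
threshold $A\,P(\mathcal{H}_0)/P(\mathcal{H}_1)$ on $\mathcal{R}_n$.  As a
purely hypothetical illustration, prior odds of
$P(\mathcal{H}_1)/P(\mathcal{H}_0)=1/9$ together with $A=999$ would require
$\mathcal{R}_n\ge8991$ before $\mathcal{H}_1$ is claimed, whereas equal priors
leave the threshold at $\mathcal{R}_n\ge999$.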
498:  
499: 
500: % -----------------------------------------------------------------------------
501: \section{Testing the Method}\label{sec:test}
502: % -----------------------------------------------------------------------------
503: 
504: \subsection{Test Convergence and the Error Rates $\alpha$ and $\beta$}
505: 
To account for our ignorance of the true correlation probability $p$ of the
given data set, $p$ is marginalized in the likelihood ratio, which replaces
eq.\,(\ref{eq:likeli}) by eq.\,(\ref{eq:final_ratio}).
As described in Section~\ref{sec:method}, we assume that the signal
probability $p$ that we want to test against the null hypothesis is uniformly
distributed on $\left[p_1,1\right]$.  With no prior knowledge of the signal
other than $p>p_0$, we choose $p_1=p_0$.
512: 
In practice, this approach has an important consequence if one interprets the
results of the hypothesis test in terms of the probabilities
$\alpha$ and $\beta$, for example by using $(1-\alpha)$ as a confidence
level for the rejection of the null hypothesis.  Since the numerator now
allows for $p_1<p<1$, $\alpha$ and $\beta$ have a meaning, strictly speaking,
only for a data set with similar properties, i.e., one whose correlation
probability is not a single value but is spread over the interval
$\left[p_1,1\right]$.  In reality, however, any given data set has some fixed
probability $p$ to correlate with objects of a catalog.
522: 
523: Therefore, we must test whether in the case of a fixed $p$ the method returns
524: probabilities for type-1 and type-2 errors lower than $\alpha$ and $\beta$.  In
525: general, we expect the type-2 error to be smaller than $\beta$ if the
526: correlation probability in the data is larger than some minimum value
527: $p_{\mbox{\scriptsize min}}$.  
528: 
529: A second practical issue is the convergence of the sequential likelihood ratio
530: test to a conclusion in favor of $\mathcal{H}_0$ or $\mathcal{H}_1$.  When
531: $p_1=p_0$ and the null hypothesis is true $(p=p_0)$, the ratio test will often
532: fail to reach a conclusion even as the number of events $n$ becomes quite
533: large.  
This problem can be avoided in two ways.  One would be to terminate the test
after accumulating some number of events, $n_0$.  The null hypothesis would
then be rejected or accepted depending on whether $\mathcal{R}_n$ was greater
or less than 1.  However, making a decision in this way would require a
modification of the type-1 and type-2 errors (see Appendix\,A).
Another would be to choose $p_1=p_0+\delta$, where $\delta$ is a positive
constant.  The particular choice of $\delta$ is somewhat
\textsl{ad hoc}, since it mainly reflects the experimenter's degree of belief
about the strength of the signal.  However, for those uncomfortable with this
kind of inference, we present a simple procedure to find $\delta$ such that
(i) the likelihood ratio $\mathcal{R}_n$ converges to a conclusion while still
accommodating a wide range of signal hypotheses, and (ii) the type-1 and
type-2 error rates of the sequential analysis are consistent with the
classical interpretations of the probabilities $\alpha$ and $\beta$.
549: 
550: %Performing the likelihood ratio test as described above with $p_1=p_0$ leads to
551: %another practical problem.  In cases where the null hypothesis is true, the
552: %ratio test often does not come to a conclusion even for large numbers of events
553: %$n$.  This problem can be avoided by choosing $p_1$ to be larger than $p_0$ by
554: %some amount $\delta$.  One could go even further by choosing a $\delta$ such
555: %that the method would not only finish with a finite data set $n$, but with
556: %type-1 error probabilities smaller than $\alpha$ in data sets where $p$ was
557: %fixed.  This latter requirement for $\delta$ can be viewed as superfluous for
558: %two reasons.  First, strictly speaking, $\delta$ is not a parameter that is
559: %usually found, but rather chosen {\it a priori}; ideally, it should be governed
560: %by nothing more than the experimenter's degree of belief.  Second, as will be
561: %discussed below, we could simply pick a $p_1$ above which $\mathcal{R}_n$ has the
562: %correct $\alpha$ and $\beta$ when the signal probability is larger than $p$.
563: %This removes the need for scanning over $\delta$ to find an $\alpha$ and
564: %$\beta$ that behave in the desired fashion.  However, here we discuss a
565: %procedure to find $\delta$ to satisfy those who seek the best of both worlds: a
566: %likelihood ratio that leaves the option for a number of signal hypothesis and a
567: %sequential analysis method that returns an $\alpha$ and $\beta$ that can be
568: %interpreted intuitively.
569: 
570: In this section, we test these expectations with simulated data sets and
571: determine values for $\delta$ and $p_{\mbox{\scriptsize min}}$ for some typical values for
572: $p_0$, $\alpha$, and $\beta$.  If we find $\delta$ to be small and
573: $p_{\mbox{\scriptsize min}}$ to be close to $p_0$, then the test will terminate with type-1
574: and type-2 error rates that are smaller than $\alpha$ and $\beta$, giving the
575: result an intuitive interpretation.  For each of the following tests, we
576: produce $10^5$ simulated data sets\footnote{We will use $\alpha=\beta=0.001$, and therefore
577: test the method on $10\times 1/0.001$.} with a correlation probability $p$ and
578: subject these data sets to a sequential analysis with predefined values for
579: $\alpha$ and $\beta$.  
580: 
581: \textbf{Case 1: $\mathcal{H}_0$ is True:} First consider the case where the
582: null hypothesis is true, so that the correlation probability $p$ of the data is
583: equal to $p_0$.  The dark grey area in Fig.\,\ref{fig:zones} indicates, as a
584: function of $p_{0}$,  the range $p_1>p_0$ for which the ratio test terminates
585: with a type-1 error probability greater than $\alpha$.  Note that when
586: $p_1\simeq p_0$, there is a large fraction of data sets in which the test does
587: not come to a conclusion (rejection or acceptance of the null hypothesis) even
588: when the number of events $n$ exceeds 1000.  The fraction of undecided tests is
589: added to the type-1 error rate to give a conservative limit on $p_1$.
590: For all $p_1$ that fall above the dark grey area, the test terminates with a
591: type-1 error rate less than $\alpha$.  As expected, the dark grey range
592: is narrow, so the test is ``well-behaved'' if $p_1$ is chosen not too
close to $p_0$.  As an example, if the random correlation probability is
$p_0=0.1$, then $p_1=0.14$ ($\delta=0.04$) is sufficient.  Any value of $p_1$ larger than
0.14 will of course also be well-behaved.
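
The conservative estimate described above can be written compactly.  As a
sketch, let $N$ be the number of simulated data sets, $N_{\mbox{\scriptsize rej}}$
the number in which $\mathcal{H}_0$ is wrongly rejected, and
$N_{\mbox{\scriptsize und}}$ the number still undecided when $n$ exceeds 1000
(these symbols are introduced here for illustration only).  The estimated
type-1 rate is then
\begin{eqnarray*}
\hat{\alpha} & = & \frac{N_{\mbox{\scriptsize rej}}+N_{\mbox{\scriptsize und}}}{N}.
\end{eqnarray*}
If the true rate were exactly $\alpha=0.001$, then $N=10^5$ data sets would
yield $N\alpha=100$ such outcomes on average, with a binomial standard
deviation of $\sqrt{N\alpha(1-\alpha)}\simeq 10$, so the simulation resolves
the rate to roughly 10\,\%.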
596: 
597: \textbf{Case 2: $\mathcal{H}_0$ is False:} We now consider the case where the
598: null hypothesis is false.  Choosing the values for $p_1$ determined with the
599: procedure outlined in ``Case 1,'' we use simulated data to find the minimum
600: signal probability $p_{\mbox{\scriptsize min}}$ for which the ratio test terminates with a
601: type-2 error probability less than $\beta$.  The light grey area in
Fig.\,\ref{fig:zones} depicts, as a function of $p_0$, the range of signal
probabilities $p>p_1$ for which the ratio test terminates with a type-2 error
probability greater than $\beta$; $p_{\mbox{\scriptsize min}}$ is the upper edge of this range.  For instance, when $p_0=0.1$ and
605: $\alpha=\beta=0.01$, for all signal probabilities $p>p_{\mbox{\scriptsize min}}=0.18$ the
606: ratio test will terminate with a type-2 error probability less than
607: $\beta$.  Note that the $p_{\mbox{\scriptsize min}}$ values given here are conservative,
608: since they not only require a type-2 error below $\beta$ in case of a
609: signal with strength $p_{\mbox{\scriptsize min}}$, but also a type-1 rate below
610: $\alpha$ \textit{and} a rejection or acceptance of $\mathcal{H}_0$ before the
611: sample size $n$ reaches 1000 when $\mathcal{H}_0$ is true.  This last
612: requirement slightly inflates the value of $p_{\mbox{\scriptsize min}}$.
613: 
614: The simulations of Cases 1 and 2 indicate that $p$ and $p_1$ must be larger
615: than $p_0$ if the test is to arrive at a decision in a reasonable amount of
616: time, and if the results are to be consistent with the error probabilities
617: $\alpha$ and $\beta$.  (To a much lesser extent, this second issue also exists
618: in Wald's original formulation of the ratio test, in which $p_1$ is treated as
619: a single alternative probability~\citep{Wald:1945,Wald:1947}.) Even so, the
620: amounts by which $p$ and $p_1$ should differ from $p_0$ are small enough that
621: they do not appreciably limit the usefulness of the method when a ``classical''
622: interpretation of $\alpha$ and $\beta$ is required.  We note that the existence
of small intervals above $p_0$ where such an interpretation is not possible is
624: a typical feature of sequential tests; see, for
625: example~\citep{Wald:1945,Wald:1947,Lewis:1994}.  It should be stressed,
626: however, that we have not demonstrated a circumstance where we are obtaining
627: some undesired values for $\alpha$ and $\beta$.  Rather, we have demonstrated
628: that marginalizing the likelihood is not the equivalent of inserting the right
629: value for $p$.
630: 
631: %In his original paper, Wald suggests a different approach for testing a point
632: %null hypothesis against a single-sided alternative.  Here, rather than
633: %marginalizing over the unknown correlation probability $p$, one chooses a
634: %single value $p_s>p_0$ and proceeds with the ratio test as if the point null
635: %hypothesis is tested against a {\it single} alternative probability $p_s$.  As
636: %in the method that uses a marginalized likelihood ratio, the fact that the data
637: %has a correlation probability $p$ that is in most cases not equal $p_s$ may
638: %result in type-1 error probabilities greater than $\alpha$ and type-2 error
639: %probabilities greater than $\beta$ for some $p_s$.  Again, $p_s$ can be chosen
640: %such that the probabilities do not exceed $\alpha$ and $\beta$.  This is
641: %typically the case if $p_s$ is chosen not too close to $p_0$.  
642: 
643: \subsection{Efficiency of the Ratio Test}
644: 
645: An important aspect of a sequential test is its length, i.e., the number of
events $n$ necessary to reach a decision.  Fig.\,\ref{fig:median} shows an
example of the typical length of the test as a function of the signal
648: probability $p$.  In this example, the background probability is chosen as
649: $p_0=0.1$, the lower boundary of the marginalization is  $p_1=0.3$, and
650: $\alpha=\beta=0.001$.  For $10^5$ simulated data sets,
651: Fig.\,\ref{fig:median}\,(top) shows the median number of events required for a
652: termination of the test.  The error bars indicate the range that includes
653: 68\,\% of the data sets.  In this example, the median size of a data set
required to accept the null hypothesis if it is true ($p=p_0=0.1$) is 27.  The
median size of a data set required to reject the null hypothesis if it is false
656: depends on $p$ and is large when $p$ is close to $p_0$.  Above $p\simeq 0.6$,
657: the median number reaches a plateau of about 7 events.
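
As a rough consistency check on these numbers, the shortest possible tests can
be worked out by hand.  The sketch below assumes that the marginalized ratio
has the form used in Appendix\,A and that the test rejects $\mathcal{H}_0$ when
$\mathcal{R}_n\ge A$ and accepts it when $\mathcal{R}_n\le B$, with Wald's
approximate thresholds $A\simeq(1-\beta)/\alpha\simeq 999$ and
$B\simeq\beta/(1-\alpha)\simeq 1.0\times10^{-3}$; these thresholds are an
assumption of the sketch.  If every one of $n$ events is a correlation ($k=n$),
or none is ($k=0$), then
\begin{eqnarray*}
\mathcal{R}_n(k=n) & = & \frac{1-p_1^{\,n+1}}{(n+1)(1-p_1)\,p_0^{\,n}}, \\
\mathcal{R}_n(k=0) & = & \frac{1}{n+1}\left(\frac{1-p_1}{1-p_0}\right)^{n}.
\end{eqnarray*}
For $p_0=0.1$ and $p_1=0.3$, $\mathcal{R}_3(k=3)\simeq 354$ and
$\mathcal{R}_4(k=4)\simeq 2850$, so at least 4 events are needed to reject
$\mathcal{H}_0$.  Likewise, $\mathcal{R}_{16}(k=0)\simeq 1.05\times10^{-3}$ and
$\mathcal{R}_{17}(k=0)\simeq 7.7\times10^{-4}$, so at least 17 events are needed
to accept it.  Both bounds lie below the corresponding medians of about 7 and
27 events quoted above.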
658: 
659: Fig.\,\ref{fig:median}\,(bottom) shows which decision is actually made,
660: depicting the fraction of data sets for which the null hypothesis ($p_0=0.1$)
661: is accepted and the fraction for which it is rejected, as a function of the
662: signal probability.  
663: 
664: Comparing the length of the test with the marginalized likelihood to Wald's
665: original test is not straightforward, since the length of each test depends on
666: the specifics of the problem, and because the probability $p_1$ has quite a
667: different meaning for the two methods.  However, we find that the marginalized
668: test tends to require fewer events when $p_1$ is the same in both tests.  For
the above example, the median number of events the Wald test requires to accept the null
hypothesis if it is true is 55, roughly twice as large as for the marginalized
likelihood ratio test (27).  For signal probabilities $p>0.6$, the Wald test reaches a
672: plateau that is roughly comparable to the marginalized test.
673: Fig.\,\ref{fig:median_wald} shows the median number of events required for the
674: Wald test for $p_1=0.3$ and $\alpha=\beta=0.001$.
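
The Wald-test median can be cross-checked against Wald's approximation for the
expected number of events, which neglects the overshoot of the boundaries.
This is only a sketch: it gives a mean rather than a median, and it assumes the
thresholds $A\simeq(1-\beta)/\alpha$ and $B\simeq\beta/(1-\alpha)$.  With $z$
the log-likelihood ratio contributed by a single event,
\begin{eqnarray*}
E_0[z] & = & p_0\ln\frac{p_1}{p_0}+(1-p_0)\ln\frac{1-p_1}{1-p_0}, \\
E_0[n] & \simeq & \frac{(1-\alpha)\ln B+\alpha\ln A}{E_0[z]},
\end{eqnarray*}
where the subscript 0 denotes an expectation under $\mathcal{H}_0$.  For
$p_0=0.1$, $p_1=0.3$, and $\alpha=\beta=0.001$, $E_0[z]\simeq -0.116$ and the
numerator is $\simeq -6.89$, giving $E_0[n]\simeq 59$, comparable to the
simulated median of 55.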
675: 
676: %Comparing the length of the test with the marginalized likelihood to Wald's
677: %original test which compares $p_0$ to a fixed signal probability $p_s$ is not
678: %straightforward, as the length of the test depends on our choice of $p_s$ just
679: %like the length of the marginalized test depends on our choice of $p_1$.  Both
680: %parameters can be chosen independently, and the Wald sequential test can end
681: %sooner or later than the marginalized test depending on what values are chosen.
682: %However, we find that the marginalized test tends to require fewer events if
683: %$p_1=p_s$.  For the above example, the median number of events required to
684: %accept the null hypothesis if it is true is 55 and thus twice as large as for
685: %the marginalized likelihood ratio.  For signal probabilities $p>0.6$, the Wald
686: %test reaches a plateau that is roughly comparable to the marginalized test.
687: %Fig.\,\ref{fig:median_wald} shows the median number of events required for the
688: %Wald test for $p_s=0.3$ and $\alpha=\beta=0.001$.
689: 
690: 
691: % -----------------------------------------------------------------------------
692: \section{Summary}\label{sec:summary}
693: % -----------------------------------------------------------------------------
694: 
695: We have outlined a sequential analysis technique for testing a point
696: null hypothesis with probability $p_0$ against a signal probability
$p$.  The method is based on the sequential analysis proposed
by~\citet{Wald:1945,Wald:1947}, but replaces the likelihood ratio used
699: to evaluate the significance of a signal with one that marginalizes
700: the signal strength.
701: 
702: In many sequential tests, the signal strength is unknown when the test
703: starts.  Typically, the signal probability $p$ can in principle have any
704: value in the interval $\left[p_0,1\right]$.  Rather than choosing a fixed
705: threshold for $p$, as suggested in~\citet{Wald:1945,Wald:1947}, we have
706: argued that, in general, the better alternative is to marginalize $p$ and
707: account for our ignorance exactly.  In the marginalization of the signal
708: likelihood, the integration starts at some value $p_1=p_0+\delta$, where
709: $\delta$ is an \textsl{ad hoc} parameter reflecting the experimenter's belief
710: about the strength of the signal, the capability of his experiment,
711: and other \textsl{a priori} knowledge.
712: 
713: Because of the integration of the signal likelihood over a range in $p$, the
714: parameters $\alpha$ and $\beta$ have lost their intuitive meaning if the
715: method is applied to data sets where $p$ is fixed, as is typically the
716: case for real data.  However, we have shown that for most values of $\delta$
717: and $p$ that occur in correlation searches, the type-1 and type-2
718: error rates of the sequential analysis are consistent with the classical
719: interpretations of the probabilities $\alpha$ and $\beta$.
720: 
721: Note that we have run a test with one of two outcomes
722: (i.e., an acceptance or rejection of $\mathcal{H}_0$), defining $\alpha$ and 
723: $\beta$, rather than one outcome (say, only a rejection of $\mathcal{H}_0$) 
724: such as in~\citet{Darling:1968}.  The latter case supposes that we 
725: are only concerned about reporting a signal.  
726: However, it is important to state a null
727: result at some point in the interest of reducing reporting bias.
That is, it is important to ensure that results claiming an excess of events
at the 1\% level are indeed 1\% effects.
730: 
731: The sequential analysis technique proposed here is efficient, allows the
signal significance to be evaluated after the test has been completed,
733: adheres to the likelihood principle, and rigorously accounts for our
734: ignorance of the signal strength.
735: 
736: 
737: \acknowledgments
738: 
739: We thank Diego Harari, Antoine Letessier-Selvon, and John A.J. Matthews for
740: valuable discussions and help.  This work is supported by the National Science
741: Foundation under contract numbers NSF-PHY-0500492 and NSF-PHY-0636875.
742: 
743: \appendix
744: \section{The Truncated Sequential Analysis Test}
745: 
In practice, the test must end.  We suppose that a decision to
accept or reject the null hypothesis must
be made at $n=n_0$ if it has not already been made for some $n<n_0$.
Following the derivation of the modified errors for truncated tests
in~\citet{Wald:1945}, $\alpha(n_0)$ and $\beta(n_0)$
are defined as the probabilities of errors of the first and second
kinds if the test is truncated at $n=n_0$.  The objective is then
to derive upper bounds on $\alpha(n_0)$ and $\beta(n_0)$ for a test
that (1) is truncated at $n_0$ and (2) at truncation accepts
$\mathcal{H}_1$ if $R_{n_0}>1$ and $\mathcal{H}_0$ if
$R_{n_0}\le 1$.  In doing so,
we look for a suitable $\delta$ and $n_0$ for which $\alpha(n_0)$
and $\beta(n_0)$ are small.
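
For orientation, the inequalities below are stated in terms of the boundaries
$A$ and $B$ of the untruncated test.  As a sketch, assuming these are Wald's
standard approximate thresholds \citep{Wald:1945},
\begin{eqnarray*}
A \simeq \frac{1-\beta}{\alpha}, & \qquad & B \simeq \frac{\beta}{1-\alpha},
\end{eqnarray*}
so that $\alpha=\beta=0.001$ gives $A\simeq 999$ and $B\simeq 1.0\times10^{-3}$,
and $\alpha=\beta=0.01$ gives $A=99$ and $B\simeq 1.0\times10^{-2}$.  Under this
assumption $B=1/A$ whenever $\alpha=\beta$, so the truncation threshold
$R_{n_0}=1$ lies midway between the two boundaries on a logarithmic scale.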
759: 
760: First, $\rho_0(n_0)$ is defined as the probability that, under the null hypothesis,
761: \begin{enumerate}
762: \item $B<R_{n_0-1}<A$
763: \item $1<R_{n_0}<A$
764: \item The sequential analysis would terminate with an acceptance
765: of $\mathcal{H}_0$ if allowed to continue.
766: \end{enumerate}
For the truncated test, we reject the null hypothesis if
$1<R_{n_0}<A$.  In other words, $\rho_0(n_0)$ is the
probability of wrongly rejecting the null hypothesis at truncation,
because $1<R_{n_0}<A$, when the test would have terminated with an
acceptance of the null hypothesis
had we let it continue.  This is
added to the probability $\alpha$ that the test would terminate wrongly if we let
it continue.
775: Therefore, the upper bound on $\alpha(n_0)$ can be expressed as
776: \begin{eqnarray}
777: \alpha(n_0)\le \alpha + \rho_0(n_0).
778: \end{eqnarray}
779: Now if $\bar{\rho}_0(n_0)$ is 
780: simply the probability under the null hypothesis that $1<R_{n_0}<A$, 
then $\rho_0(n_0)\le\bar{\rho}_0(n_0)$
782: and therefore 
783: \begin{eqnarray}
784: \alpha(n_0)\le \alpha + \bar{\rho}_0(n_0).
785: \end{eqnarray}
786: Similarly, $\rho_1(n_0)$ is defined as the probability that, 
787: under the ``signal'' hypothesis,
788: \begin{enumerate}
789: \item $B<R_{n_0-1}<A$
790: \item $B<R_{n_0}\le 1$
791: \item The sequential analysis would terminate with an acceptance
792: of $\mathcal{H}_1$ if allowed to continue.
793: \end{enumerate}
so that
\begin{eqnarray}
\beta(n_0)\le \beta + \bar{\rho}_1(n_0),
\end{eqnarray}
798: where $\bar{\rho}_1(n_0)$ is defined to be 
799: the probability under the signal hypothesis that $B<R_{n_0}\le 1$.  
800: 
We then calculate $\bar{\rho}_0(n_0)$ explicitly.
The probability of obtaining $1<R_{n_0}<A$ if the null hypothesis is true is
\begin{eqnarray}
\bar{\rho}_0(n_0)=
\sum_{k=k_{1+}}^{k_A} {n_0 \choose k} p_0^k(1-p_0)^{n_0-k},
\end{eqnarray}
where $k_{1+}$ is the minimum integer $k$ for which
\begin{eqnarray}
\frac{\frac{1}{1-p_0-\delta}\int_{p_0+\delta}^1p^k(1-p)^{n_0-k}\,dp}{p_0^{k}(1-p_0)^{n_0-k}}>1
\end{eqnarray}
and $k_{A}$ is the maximum integer $k$ for which
\begin{eqnarray}
\frac{\frac{1}{1-p_0-\delta}\int_{p_0+\delta}^1p^k(1-p)^{n_0-k}\,dp}{p_0^{k}(1-p_0)^{n_0-k}}<A.
\end{eqnarray}
815: 
Similarly,
\begin{eqnarray}
\bar{\rho}_1(n_0)=
\frac{\sum_{k=k_B}^{k_{1-}} {n_0 \choose k} \frac{1}{1-p_0-\delta}\int_{p_0+\delta}^1p^{k}(1-p)^{n_0-k}\,dp}
{\sum_{k=0}^{n_0} {n_0 \choose k} \frac{1}{1-p_0-\delta}\int_{p_0+\delta}^1p^{k}(1-p)^{n_0-k}\,dp},
\end{eqnarray}
where $k_{1-}$ is the maximum integer $k$ for which
\begin{eqnarray}
\frac{\frac{1}{1-p_0-\delta}\int_{p_0+\delta}^1p^k(1-p)^{n_0-k}\,dp}{p_0^{k}(1-p_0)^{n_0-k}}\le 1
\end{eqnarray}
and $k_B$ is the minimum integer $k$ for which
\begin{eqnarray}
\frac{\frac{1}{1-p_0-\delta}\int_{p_0+\delta}^1p^k(1-p)^{n_0-k}\,dp}{p_0^{k}(1-p_0)^{n_0-k}}>B.
\end{eqnarray}
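
The integrals in the expressions above can be reduced to finite sums, which
may be convenient when evaluating $k_{1\pm}$, $k_A$, and $k_B$ numerically.  As
a sketch, writing $p_1=p_0+\delta$ and using the standard relation between the
incomplete Beta function and the binomial distribution,
\begin{eqnarray*}
\int_{p_1}^{1}p^{k}(1-p)^{n_0-k}\,dp & = &
\frac{k!\,(n_0-k)!}{(n_0+1)!}\sum_{j=0}^{k}{n_0+1 \choose j}\,p_1^{\,j}(1-p_1)^{n_0+1-j}.
\end{eqnarray*}
Two quick checks: for $k=n_0$ the right-hand side reduces to
$(1-p_1^{\,n_0+1})/(n_0+1)$, as direct integration gives, and summing both
sides over $k$ with weights ${n_0 \choose k}$ gives $1-p_1$, so the denominator
of $\bar{\rho}_1(n_0)$ is equal to unity.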
830: 
831: Under this scheme, Fig.\,\ref{fig:rho} shows $\bar{\rho}_0(n_0)$ 
832: and $\bar{\rho}_1(n_0)$ as a function of $\delta$ and $n_0$.
It shows that a rather large $\delta$ ($\sim 0.7$) is required to bring
$\bar{\rho}_0(n_0)$ and $\bar{\rho}_1(n_0)$ below
$\alpha = \beta = 0.001$.  Further, if the calculation is
extended, we find that
it would take $\sim 180$ events to bring
$\bar{\rho}_0(n_0)$ and $\bar{\rho}_1(n_0)$ to $\sim 0$
for any $\delta$.
840: 
841: 
842: 
843: \begin{thebibliography}{11}
844: \expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi
845: 
846: \bibitem[{Abbasi {et~al.}(2004)}]{Abbasi:2004ib}
847: Abbasi, R.~U. {et~al.} 2004, Astrophys. J., 610, L73
848: 
849: \bibitem[{Abbasi {et~al.}(2006)}]{Abbasi:2005qy}
850: ---. 2006, Astrophys. J., 636, 680
851: 
852: \bibitem[{Anscombe (1954)}]{Anscombe:1954}
853: Anscombe, F.~J. 1954, Biometrics, 10, 89
854: 
855: \bibitem[{Armitage {et~al.}(1969)}]{Armitage:1969}
856: Armitage, P., McPherson, C.~K., \& Rowe, B.~C. 1969, J. Roy. Stat. Soc. A, 132, 235
857: 
858: \bibitem[{Berry (1987)}]{Berry:1987}
859: Berry, D.~A. 1987, Amer. Stat., 41, 117
860: 
\bibitem[{Darling \& Robbins (1968)}]{Darling:1968}
862: Darling, D.~A. \& Robbins, H. 1968, Proc. Nat. Acad. Sci. USA, 61, 804
863: 
864: \bibitem[{Gorbunov {et~al.}(2004)}]{Gorbunov:2004bs}
865: Gorbunov, D.~S., Tinyakov, P.~G., Tkachev, I.~I., \& Troitsky, S.~V. 2004, JETP
866:   Lett., 80, 145
867: 
868: \bibitem[{Jeffreys (1939)}]{Jeffreys:1939}
869: Jeffreys, H. 1939, Theory of Probability (London: Oxford University Press) 
870: 
871: \bibitem[{Lewis \& Berry (1994)}]{Lewis:1994}
872: Lewis, R.~J. \& Berry, D.~A. 1994, J. Amer. Stat. Assoc., 89, 1528
873: 
874: \bibitem[{Kass \& Raftery (1995)}]{Kass:1995}
Kass, R.~E. \& Raftery, A.~E. 1995, J. Amer. Stat. Assoc., 90, 773
876: 
877: \bibitem[{Takeda {et~al.}(1999)}]{Takeda:1999sg}
878: Takeda, M. {et~al.} 1999, Astrophys. J., 522, 225
879: 
880: \bibitem[{Tinyakov \& Tkachev (2001)}]{Tinyakov:2001nr}
881: Tinyakov, P.~G. \& Tkachev, I.~I. 2001, JETP Lett., 74, 445
882: 
883: \bibitem[{Wald (1945)}]{Wald:1945}
884: Wald, A. 1945, Ann. Math. Stat., 16, 117
885: 
886: \bibitem[{Wald (1947)}]{Wald:1947}
887: ---. 1947, Sequential Analysis (New York, NY: John Wiley and Sons)
888: 
889: \end{thebibliography}
890: 
891: \clearpage
892: 
893: \begin{figure}
894: \plotone{f1.eps}
895: \caption{Likelihood ratio as a function of $p_1$ for $n=10$, $k=6$, 
896: and $p_0=0.1$.\label{fig:R_vs_p1}}
897: \end{figure}
898: 
899: \clearpage
900: 
901: \begin{figure}
902: \plotone{f2a.eps}
903: \plotone{f2b.eps}
904: \caption{Likelihood ratio as a function of the number of events for a background 
905: probability $p_0=0.1$, $p_1=0.3$, and a signal probability $p=0.5$ (top) and $p=0.1$ (bottom). 
906: The ratio is calculated for $10^5$ random data sets.  The plots show the median (dark grey dots)
907: together with the range that includes 68\,\% and 95\,\% of the data sets (dark and light grey
908: areas).  The values for the test boundaries $A$ and $B$ for $\alpha=\beta=0.001$ are indicated 
909: as dashed and dotted lines.\label{fig:R_vs_n}}
910: \end{figure}
911: 
912: \clearpage
913: 
914: \begin{figure}
915: \plotone{f3a.eps}
916: \plotone{f3b.eps}
917: \caption{Range for $p_1>p_0$ for which the ratio test terminates with 
918: type-1 error probabilities greater than $\alpha$ (dark grey), as a 
919: function of $p_0$.  Range for $p>p_1$ for which the ratio test terminates 
920: with type-2 error probabilities greater than $\beta$, as a function of $p_0$ (light 
921: grey).  The upper plot is for $\alpha=\beta=0.01$, the lower plot for 
922: $\alpha=\beta=0.001$.\label{fig:zones}}
923: \end{figure}
924: 
925: \clearpage
926: 
927: \begin{figure}
928: \plotone{f4a.eps}
929: \plotone{f4b.eps}
930: \caption{{\it Top:}  Median number of events necessary for the sequential test 
931: to come to a conclusion, as a function of the signal probability $p$.  In this
932: example, the background probability is $p_0=0.1$, and $p_1=0.3$, $\alpha=\beta=0.001$.  
933: Error bars indicate the range that includes 68\,\% of the simulated data sets.  
934: {\it Bottom:}  For the same simulated data sets, fraction of data sets for which 
935: the null hypothesis is accepted (solid line) and rejected (dotted line) as a 
936: function of the signal probability $p$ for a background probability $p_0=0.1$. 
937: \label{fig:median}}
938: \end{figure}
939: 
940: \clearpage
941: 
942: \begin{figure}
943: \plotone{f5.eps}
944: \caption{Median number of events necessary for the Wald sequential test 
945: to come to a conclusion (open circles), as a function of the signal probability
946: $p$, compared to the marginalized likelihood ratio test (filled circles).  The
947: fixed point $p_1$ is the same in both cases.  For this example, the background
948: probability is $p_0=0.1$, and $p_1=0.3$, $\alpha=\beta=0.001$.  Error bars 
949: indicate the range that includes 68\,\% of the simulated data sets.  
950: \label{fig:median_wald}}
951: \end{figure}
952: 
953: \clearpage
954: 
955: \begin{figure}
956: \plotone{f6.eps}
\caption{The added errors $\bar{\rho}_0(n_0)$ (for $\alpha$) and $\bar{\rho}_1(n_0)$ (for $\beta$)
as a function of $\delta$, where $p$ is integrated from $p_0+\delta$ to 1,
and of $n_0$, the number of events at which the test is truncated.
\label{fig:rho}}
961: \end{figure}
962: 
963: \end{document}
964: