gr-qc0210032/ms.tex
1: \documentclass[twocolumn,aps,prd,amssymb,eqsecnum,floatfix,nofootinbib]{revtex4}
2: %\documentclass[twocolumn,aps,prd,amssymb,eqsecnum,floatfix]{revtex4}
3: %\documentclass[preprint,aps,prd,amssymb,eqsecnum,floatfix]{revtex4}
4: \usepackage{epsfig}
5: \usepackage{amsmath}
6: \DeclareMathOperator{\erfc}{erfc}
7: \DeclareMathOperator{\erf}{erf}
8: 
9: \begin{document}
10: 
11: \title{Detection methods for non-Gaussian gravitational wave stochastic backgrounds}
12: \author{Steve Drasco\footnote{sd68@cornell.edu}}
13: %\email{sd68@cornell.edu}
14: \author{\'{E}anna \'{E}. Flanagan\footnote{eef3@cornell.edu.  Also
15:     Radcliffe Institute for Advanced Study, Putnam House, 10 Garden
16:     Street, Cambridge, MA 02138.}}
17: %\email{eef3@cornell.edu}
18: \affiliation{Newman Laboratory of Nuclear Studies, Cornell University, Ithaca, New York 14853}
19: %\affiliation{Center for Radiophysics and Space Research, Cornell University, Ithaca, New York 14853}
20: \date{\today}
21: 
22: \begin{abstract}
23: A gravitational wave stochastic background can be produced by a
24: collection of independent gravitational wave events.  There are two
25: classes of such backgrounds, one for which the ratio of the average
26: time between events to the average duration of an event is small
27: (i.e., many events are on at once), and one for which the ratio is
28: large.  In the first case the signal is continuous, sounds something
29: like a constant {\em hiss}, and has a Gaussian probability
30: distribution.  In the second case, the discontinuous or intermittent
31: signal sounds something like popcorn popping, and is described by a
32: non-Gaussian probability distribution.  In this paper we address the
33: issue of finding an optimal detection method for such a non-Gaussian
34: background.  As a first step, we examine the idealized situation in
35: which the event durations are short compared to the detector sampling
36: time, so that the time structure of the events cannot be resolved, and
37: we assume white, Gaussian noise in two collocated, aligned
38: detectors.  For this situation we derive an appropriate version of the
39: maximum likelihood detection statistic.  We compare the performance of
40: this statistic to that of the standard cross-correlation statistic
41: both analytically and with Monte Carlo simulations.  
42: In general the maximum likelihood statistic performs better than the
43: cross-correlation statistic when the stochastic background is
44: sufficiently non-Gaussian, resulting in a gain factor in the minimum
45: gravitational-wave energy density necessary for detection. 
46: This gain factor ranges roughly between 1 and 3, depending on the duty
47: cycle of the background, for realistic observing times and signal
48: strengths for both ground and space based detectors.
49: The computational cost of the statistic, although significantly greater 
50: than that of the cross-correlation statistic, is not unreasonable.
51: Before the statistic can be used in practice with real detector data,
52: further work is required to generalize our analysis to accommodate
53: separated, misaligned detectors with realistic, colored, non-Gaussian noise.
54: \end{abstract}
55: 
56: \pacs{04.80.Nn, 04.30.Db, 95.55.Ym, 07.05.Kf}
57: 
58: \keywords{gravitational waves; stochastic background}
59: 
60: \maketitle
61: 
62: \section{Introduction and summary}
63: \label{s:Introduction and summary}
64: 
65: Along with a new generation of gravitational wave detectors around the world \cite{ligo,virgo,geo,tama}, detection 
66: algorithms for a variety of sources are nearing completion. If the
67: signals from these sources are  
68: detected, physicists stand to harvest unprecedented quantities of observational information concerning the 
69: nature of gravitation and the cosmos as a whole.  The fruit of this harvest will be the outputs of detection 
70: algorithms.  In this paper we introduce an algorithm designed for nearly optimal detection of a class of 
71: gravitational wave stochastic backgrounds. Because this class of backgrounds is non-Gaussian,
72: the algorithm presented here differs from the well-studied cross-correlation-based algorithms, which are
73: nearly optimal for Gaussian backgrounds.
74: 
75: \subsection{Gravitational wave stochastic backgrounds}
76: \label{ss:Gravitational stochastic backgrounds}
77: 
78: Consider a large collection of similar gravitational wave sources.  If
79: we cannot resolve the individual signals produced by these sources and
80: know only their statistical properties, the signals form a stochastic background.
81: A wide variety of candidate sources of gravitational wave stochastic backgrounds have been studied 
82: (for an excellent general review see Ref.~\cite{Allen Review}).
83: These include high redshift supernovae \cite{gaussian supernovae, non gaussian supernovae}, 
84: the first stars or so-called population III objects \cite{first stars}, 
85: rapidly rotating young neutron stars \cite{gaussian neutron stars 1, gaussian neutron stars 2}, 
86: early universe phase transitions and cosmic strings \cite{cosmic strings, bubbles}, 
87: inflation \cite{inflation}, 
88: and high redshift compact binaries \cite{binaries}.
89: 
90: Detecting a gravitational wave stochastic background produced by any one of these candidate sources could 
91: provide information on a variety of topics ranging from the evolution
92: of the star formation rate \cite{Coward} to the numbers and sizes of posited extra dimensions \cite{Hogan}.  
93: Because of this, stochastic backgrounds have long been thought to be among the most interesting 
94: possible types of gravitational radiation.
95: 
96: \subsection{Gaussian stochastic backgrounds}
97: \label{ss:Gaussian stochastic backgrounds}
98: 
99: In order to develop detection methods, it is traditionally assumed that the individual events making up a
100: background are uncorrelated and sufficiently frequent for the background to be Gaussian.  That is, it is
101: assumed that the conditions for applicability of the central limit theorem are satisfied. 
102: 
103: Unlike electromagnetic waves, gravitational waves cannot be screened from a detector. 
104: Using a single gravitational wave detector,  there is no practical way to distinguish between 
105: detector noise and a stochastic background of gravitational waves.  
106: As a consequence, the sensitivity of a single detector to gravitational wave backgrounds is severely limited.
107: The sensitivity can be enhanced by comparing the outputs of multiple detectors.
108: Michelson \cite{Michelson} was the first to give a detailed description of such a detection 
109: method for a Gaussian stochastic background of gravitational waves in the presence of Gaussian 
110: detector noise.  His detection strategy and its later refinements \cite{Christensen,Flanagan,Allen Romano} are 
111: often referred to as the cross-correlation method.  
112: Recently the cross-correlation method has been modified to treat more realistic detectors
113: which themselves have sources of non-Gaussian noise \cite{robust
114: gaussian, robust gaussian II, Klimenko and Mitselmakher}.
115: 
116: We now briefly review the cross-correlation method.
117: Consider two gravitational wave detectors.   
118: The output of each detector is a collection of dimensionless strain measurements.  
119: Suppose that $N$ such measurements are made with each detector at regular time intervals. Denote these measurements 
120: by an $N \times 2$ matrix $h$ with components $h_i^k$,
121: where $i=1,2$ labels the detector, and $k=1,2,\ldots,N$ is a time index.  
122: To determine whether or not the data $h$ contains some desired signal,
123: one usually  
124: compares the value of some detection statistic $\Gamma(h)$ to a threshold value $\Gamma_*$.  That is, 
125: if $\Gamma(h) > \Gamma_*$ one concludes that a signal is present and otherwise
126: one concludes that no signal is present.
127: A detection statistic is said to be optimal if it yields the smallest probability of mistakenly concluding a signal 
128: is present (false alarm probability) after choosing a threshold which fixes the probability for 
129: mistakenly concluding a signal is absent (false dismissal probability).
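In practice the threshold $\Gamma_*$ and the two error probabilities are often estimated from Monte Carlo trials. The Python sketch below is only an illustration under our own assumptions, not the procedure of this paper: given values of a statistic $\Gamma$ computed on simulated noise-only data, it picks $\Gamma_*$ as an empirical quantile so that at most a fraction \texttt{false\_alarm} of those values exceed it, and then estimates the false dismissal probability from signal-present trials. The function names are hypothetical.

```python
import math

def threshold_for_false_alarm(noise_only_values, false_alarm):
    """Pick Gamma_* so that at most a fraction `false_alarm` of the
    noise-only statistic values lie strictly above it (ignoring ties)."""
    vals = sorted(noise_only_values)
    m = len(vals)
    # number of noise-only trials allowed to exceed the threshold
    k = min(int(math.floor(false_alarm * m)), m - 1)
    return vals[m - k - 1]

def false_dismissal(signal_values, threshold):
    """Fraction of signal-present trials with Gamma <= Gamma_* (signal missed)."""
    missed = sum(1 for g in signal_values if g <= threshold)
    return missed / len(signal_values)
```

With both sets of trials in hand, one can check whether a given signal strength meets chosen false alarm and false dismissal targets.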
130: 
131: Assume that the two detectors are collocated and aligned, and that each detector has white Gaussian noise with vanishing 
132: mean, with no correlations between the noise in the two detectors. Then the standard cross-correlation detection
133: statistic $\Lambda_{\text{CC}}$ for a Gaussian signal is 
134: \begin{equation} \label{cross correlation}
135: \Lambda_{\text{CC}}(h) = \frac{\hat\alpha^2 }{ \bar \sigma_1 \bar \sigma_2},
136: \end{equation}
137: where 
138: \begin{eqnarray}
139: \hat\alpha^2 &=& {\bar \alpha}^2 \theta({\bar \alpha}^2), \\
140: \bar\alpha^2 &=& \frac{1}{N}\sum_{k=1}^N h_1^{k} h_2^{k}, \\ 
141: \bar\sigma_i^2 &=& \frac{1}{N} \sum_{k=1}^N \left(h_i^k\right)^2, \label{intro bar sigma}
142: \end{eqnarray}
143: for $i=1,2$, and $\theta(x)$ is the Heaviside step function defined by
144: \begin{equation} 
145: \label{stepfunction}
146: \theta(x) = \left\{ 
147: \begin{array}{ll}
148:         1 & \text{ if } x \ge 0 \\
149:         0 & \text{ if } x  < 0
150: \end{array}
151: \right. .
152: \end{equation} 
153: This statistic is nearly optimal and can be derived 
154: from a maximum likelihood framework (see Sec.~\ref{ss:Gaussian
155: signal}). The subscript CC in $\Lambda_{\text{CC}}$
156: denotes ``cross correlation''.  The generalization of this statistic
157: to allow for colored noise and non-collocated, non-aligned detectors is
158: discussed in Refs.\ \cite{Michelson,Christensen,Flanagan,Allen Romano}.
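For concreteness, a minimal pure-Python transcription of Eqs.~(\ref{cross correlation})--(\ref{stepfunction}) for the idealized case above (collocated, aligned detectors with white noise) is sketched here; the function name is our own, and the code makes no attempt at efficiency.

```python
import math

def cross_correlation_statistic(h1, h2):
    """Lambda_CC(h) = alpha_hat^2 / (sigma_bar_1 * sigma_bar_2)."""
    n = len(h1)
    if n == 0 or len(h2) != n:
        raise ValueError("h1 and h2 must be non-empty and of equal length")
    alpha_bar_sq = sum(a * b for a, b in zip(h1, h2)) / n
    # Heaviside step: theta(x) = 1 for x >= 0, else 0
    alpha_hat_sq = alpha_bar_sq if alpha_bar_sq >= 0 else 0.0
    sigma_bar_1 = math.sqrt(sum(a * a for a in h1) / n)
    sigma_bar_2 = math.sqrt(sum(b * b for b in h2) / n)
    return alpha_hat_sq / (sigma_bar_1 * sigma_bar_2)
```

By the Cauchy-Schwarz inequality, $|\bar\alpha^2| \le \bar\sigma_1 \bar\sigma_2$, so the result always lies in $[0,1]$.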
159: 
160: 
161: 
162: % latex bug (?) causes start of next subsection to be squashed up
163: % against above text, the ~ below fixes it.
164: 
165: ~
166: 
167: 
168: 
169: \subsection{Non-Gaussian stochastic backgrounds}
170: \label{ss:Non-Gaussian stochastic backgrounds}
171: 
172: 
173: A particular class of events will produce a Gaussian background
174: if, on average, at any given moment, many individual events are arriving at the detector. 
175: However, if the ratio of the average time between events to the average duration
176: of an event is large, then there are long stretches of ``silence'', that is, intervals during which no events arrive at
177: the detector.  The resulting stochastic background is non-Gaussian, since the conditions for the applicability
178: of the central limit theorem are not satisfied. Recent work has suggested that some candidate
179: gravitational wave stochastic backgrounds, of both cosmological and astrophysical origin, may be
180: non-Gaussian \cite{non gaussian supernovae, cosmic strings, first stars}.  However,
181: predictions concerning the properties of most gravitational wave background sources rely heavily on theoretical
182: arguments which extrapolate well beyond observational support.  Such extrapolations are always in some sense
183: speculative. It is conceivable that backgrounds predicted to be Gaussian may in fact turn out to be non-Gaussian, 
184: or vice versa.  
185: 
186: In Sec.~\ref{ss:Non-Gaussian signal} below, we apply a maximum
187: likelihood framework to derive a detection statistic for a particular
188: model of non-Gaussian stochastic background, which we now describe.
189: Let $h_i^k$ be the outputs of two collocated aligned gravitational
190: wave detectors with white, zero-mean, Gaussian noise with no
191: correlations between the two detectors.  The detector outputs $h_i^k$
192: consist of noise $n_i^k$ together with a common signal $s^k$:
193: \begin{eqnarray}
194: h_1^k &=& n_1^k + s^k \label{eq:common} \\
195: h_2^k &=& n_2^k + s^k. \nonumber
196: \end{eqnarray}
197: We wish to detect a non-Gaussian signal $s^k$ composed of long stretches of
198: silence separating short bursts whose amplitudes are Gaussian
199: distributed and whose durations are shorter than the detector
200: resolution time (see Fig.~\ref{signal sketch}).  We therefore assume that the signal samples $s^k$ are
201: statistically independent, each with probability distribution [cf.\ Eq.\
202: (\ref{signal prior}) below]
203: \begin{equation}
204: p(s) = \xi {1 \over \sqrt{2 \pi} \alpha} \exp \left[-{s^2 \over 2
205: \alpha^2} \right] + (1 - \xi) \delta(s).
206: \label{eq:sigg}
207: \end{equation}
208: The parameter $\xi$ is what we call the {\it 
209: Gaussianity parameter} of the
210: stochastic background; it is the probability that, at any randomly
211: chosen time, a burst is present in the detector.  Thus $\xi$ takes
212: values in the 
213: interval $0 \le \xi \le 1$, and if $\xi=1$ then the background is
214: Gaussian.  The parameter $\xi$ can also be thought of as the duty
215: cycle of the background.  The parameter $\alpha$ in Eq.\ (\ref{eq:sigg}) is
216: the rms amplitude of the bursts.
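The signal model (\ref{eq:common})--(\ref{eq:sigg}) is straightforward to simulate, which is useful for testing any of the statistics discussed here. The sketch below is our own illustration, not the simulation code used for the figures; the noise standard deviations $\sigma_1$, $\sigma_2$, the seed, and the function name are hypothetical inputs. Each $s^k$ is zero with probability $1-\xi$ and is otherwise drawn from a zero-mean Gaussian with standard deviation $\alpha$.

```python
import random

def simulate_detector_outputs(n, xi, alpha, sigma1, sigma2, seed=0):
    """Draw h_1^k = n_1^k + s^k and h_2^k = n_2^k + s^k for k = 1..n.

    s^k is 0 with probability 1 - xi (silence) and N(0, alpha^2) with
    probability xi (a burst); n_i^k is white N(0, sigma_i^2) noise,
    independent between the two detectors.
    """
    rng = random.Random(seed)
    h1, h2, s = [], [], []
    for _ in range(n):
        sk = rng.gauss(0.0, alpha) if rng.random() < xi else 0.0
        s.append(sk)
        h1.append(rng.gauss(0.0, sigma1) + sk)
        h2.append(rng.gauss(0.0, sigma2) + sk)
    return h1, h2, s
```

For example, \texttt{simulate\_detector\_outputs(10**4, 1e-2, 1.0, 1.0, 1.0)} produces data with, on average, about 100 bursts buried in unit-variance noise.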
217: 
218: 
219: Our nearly-optimal detection statistic
220: $\Lambda_{\text{ML}}^{\text{NG}}$ for the signal model (\ref{eq:sigg})
221: is given by [cf.\ Eq.\ (\ref{main result2}) below]
222: \begin{widetext}
223: \begin{eqnarray}  \label{main result}
224: \Lambda_{\text{ML}}^{\text{NG}}(h) &=&
225: \max_{0<\xi\le 1}~ \max_{\alpha > 0}~ \max_{\sigma_1 \ge 0}~ \max_{\sigma_2 \ge 0}~ \prod_{k=1}^N
226: \left\{  
227:         \frac{ \bar\sigma_1 \bar\sigma_2 \xi}{\sqrt{\sigma^2_1 \sigma^2_2 + \sigma^2_1 \alpha^2 + \sigma^2_2 \alpha^2}} 
228:         \exp \left[ \frac{\left( \frac{h_1^k}{\sigma^2_1} + \frac{h_2^k}{\sigma^2_2}\right)^2}
229:         {2\left( \frac{1}{\sigma^2_1} + \frac{1}{\sigma^2_2} + \frac{1}{\alpha^2} \right)} 
230:         - \frac{\left( h_1^k\right)^2}{2\sigma^2_1} - \frac{\left( h_2^k\right)^2}{2\sigma^2_2} + 1\right]  \right. \nonumber \\ 
231: &+& \left. \frac{\bar\sigma_1 \bar\sigma_2}{\sigma_1 \sigma_2}  (1-\xi)
232:         \exp \left[ - \frac{\left( h_1^k\right)^2}{2\sigma^2_1} - \frac{\left( h_2^k\right)^2}{2\sigma^2_2} + 1\right]\right\}.
233: \end{eqnarray}
234: \end{widetext}
235: Here the quantities $\bar\sigma_1$ and $\bar\sigma_2$ are defined by Eq.~(\ref{intro bar sigma}). 
236: The values of $\xi$, $\alpha^2$, $\sigma^2_1$ and $\sigma^2_2$ which achieve the maximum in Eq.~(\ref{main result}) are, respectively,  estimators of 
237: the signal's Gaussianity parameter, the variance of the signal events, and the variances of the noise in the two detectors.
238: If we calculate the quantity (\ref{main result}) at $\xi = 1$, instead of maximizing over $\xi$, the result is a statistic which is 
239: equivalent to the standard cross-correlation statistic
240: $\Lambda_{\text{CC}}$.
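Because the product in Eq.~(\ref{main result}) has $N$ factors, it will overflow or underflow in floating point for realistic $N$, so in practice one would work with its logarithm. A minimal sketch, assuming strictly positive $\alpha$, $\sigma_1$, $\sigma_2$ and $0 \le \xi \le 1$, evaluates the logarithm of the product at fixed parameters, combining the two terms of each factor with a log-sum-exp. The outer maximizations are not included, and the function names are our own.

```python
import math

def _log_add(a, b):
    """Numerically stable log(exp(a) + exp(b)); either argument may be -inf."""
    m = max(a, b)
    if m == -math.inf:
        return -math.inf
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def _safe_log(x):
    """log(x), with log(0) = -inf so that xi = 0 or xi = 1 can be evaluated."""
    return math.log(x) if x > 0 else -math.inf

def log_ml_ng_objective(h1, h2, xi, alpha, sigma1, sigma2):
    """Logarithm of the product over k in Eq. (main result) at fixed
    (xi, alpha, sigma1, sigma2); assumes alpha, sigma1, sigma2 > 0."""
    n = len(h1)
    sbar1 = math.sqrt(sum(x * x for x in h1) / n)
    sbar2 = math.sqrt(sum(x * x for x in h2) / n)
    v1, v2, va = sigma1 ** 2, sigma2 ** 2, alpha ** 2
    # log of the k-independent prefactor of the burst ("event present") term
    log_pref_on = (_safe_log(xi) + math.log(sbar1 * sbar2)
                   - 0.5 * math.log(v1 * v2 + v1 * va + v2 * va))
    # log of the k-independent prefactor of the silence ("event absent") term
    log_pref_off = _safe_log(1.0 - xi) + math.log(sbar1 * sbar2 / (sigma1 * sigma2))
    inv_sum = 1.0 / v1 + 1.0 / v2 + 1.0 / va
    total = 0.0
    for x1, x2 in zip(h1, h2):
        common = -x1 * x1 / (2.0 * v1) - x2 * x2 / (2.0 * v2) + 1.0
        on = log_pref_on + (x1 / v1 + x2 / v2) ** 2 / (2.0 * inv_sum) + common
        off = log_pref_off + common
        total += _log_add(on, off)  # log of the k-th factor
    return total
```

Maximizing this function over its four parameters gives $\ln \Lambda_{\text{ML}}^{\text{NG}}$; since the logarithm is monotonic, the maximizing parameters are the same as for the product itself. As a sanity check, at $\xi = 0$ and $\sigma_i = \bar\sigma_i$ every factor's exponent averages to zero, so the function returns $0$.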
241: 
242: The subscript ML on $\Lambda_{\text{ML}}^{\text{NG}}$ stands for
243: ``maximum likelihood'', while the superscript NG stands for
244: ``non-Gaussian statistic''.  The superscript NG does {\it not}
245: necessarily mean that one is considering a non-Gaussian signal; both
246: of the statistics $\Lambda_{\rm CC}$ and 
247: $\Lambda_{\text{ML}}^{\text{NG}}$ can be applied to data containing
248: either a Gaussian signal or a non-Gaussian signal.
249: 
250: If the burst-amplitude parameter $\alpha$ is sufficiently large 
251: and the bursts are well separated in time, then the
252: individual bursts can 
253: be seen in the detector output.  In this 
254: case one could use, for example, the simple burst statistic
255: \footnote{In reality the statistic (\ref{eq:lambdaBdef}) would
256: be especially susceptible to non-Gaussian noise bursts in the detector
257: and so would not be used in practice; instead one would need to search
258: for events where $|h_1^k|$ and $|h_2^k|$ are simultaneously large.  In
259: this paper we restrict attention for simplicity to Gaussian detector
260: noise; it will be important for future, more general analyses
261: to allow for (uncorrelated) non-Gaussian noise components in the two
262: detectors.} 
263: \begin{equation}
264: \Lambda_\text{B} \equiv \max_{1 \le k \le N} \ \left| h_1^k \right|.
265: \label{eq:lambdaBdef}
266: \end{equation}
267: on the data from detector 1 to detect the signal.  The burst statistic
268: (\ref{eq:lambdaBdef}) and the cross-correlation statistic
269: $\Lambda_\text{CC}$ serve below as benchmarks against which
270: the maximum likelihood statistic is compared.
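The burst statistic (\ref{eq:lambdaBdef}) is a one-line computation. The sketch below also includes a coincidence variant in the spirit of the footnote, requiring $|h_1^k|$ and $|h_2^k|$ to be large at the same $k$; that variant, and both function names, are our own hypothetical illustration and are not used elsewhere in this paper.

```python
def burst_statistic(h1):
    """Lambda_B = max over k of |h_1^k| (detector 1 only)."""
    if not h1:
        raise ValueError("need at least one sample")
    return max(abs(x) for x in h1)

def coincident_burst_statistic(h1, h2):
    """Hypothetical coincidence variant: max over k of min(|h_1^k|, |h_2^k|),
    which is large only when both detectors are simultaneously large."""
    if not h1 or len(h1) != len(h2):
        raise ValueError("need equal-length, non-empty inputs")
    return max(min(abs(a), abs(b)) for a, b in zip(h1, h2))
```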
271: 
272: 
273: \subsection{Main results}
274: \label{ss:Main results}
275: 
276: There are two main results in this paper.  The first result is the detection statistic $\Lambda_{\text{ML}}^{\text{NG}}$ given by Eq.~(\ref{main result}),
277: which is derived in Sec.~\ref{ss:Non-Gaussian signal}. This statistic is nearly optimal for 
278: the detection of a class of non-Gaussian gravitational wave stochastic backgrounds incident on a pair of 
279: idealized detectors.  
280: 
281: 
282: The second main result, summarized in Figs.~\ref{omega gain} and \ref{fig:theoretical}, is a
283: comparison of  
284: the performances of the maximum likelihood statistic
285: $\Lambda_{\text{ML}}^{\text{NG}}$, the  
286: cross-correlation statistic $\Lambda_{\text{CC}}$, and the burst
287: statistic $\Lambda_\text{B}$.  
288: \begin{figure}
289: \begin{center}
290: \epsfig{file=Figure1.eps,width=8.5cm}
291: \caption{
292: This plot shows the minimum gravitational-wave energy density
293: $\Omega_{\rm detectable}$ necessary for detection, for several
294: different detection statistics, as a function of the
295: background's Gaussianity parameter $\xi$.  
296: The Gaussianity parameter $\xi$ is the probability that, 
297: at any randomly chosen time, the waves from an event are incident on
298: the detectors, and thus takes values in the interval $0 \le \xi \le
299: 1$.  For a Gaussian background $\xi=1$.
300: The circles are the results of our Monte Carlo simulations for the
301: maximum likelihood statistic $\Lambda_\text{ML}^\text{NG}$, and the 
302: solid curve shows the approximate theoretical prediction (\ref{eq:ansA}) and
303: (\ref{eq:ansB}) for 
304: this statistic (expected to be accurate only to within a few tens of
305: percent).   
306: The crosses are the Monte Carlo results for the
307: cross-correlation statistic $\Lambda_\text{CC}$, and the 
308: dashed curve shows the theoretical prediction \protect{(\ref{analytic
309: detectable})} for 
310: this statistic.  Finally the squares are the Monte Carlo results for the
311: burst statistic \protect{(\ref{eq:lambdaBdef})}, and the dotted curve shows the
312: corresponding theoretical prediction given by Eqs.\ \protect{(\ref{burstans1})}
313: and \protect{(\ref{burstans2})}.
314: For each statistic, the vertical error bars on the Monte Carlo
315: simulation results give the fluctuations computed from 4 different
316: runs, each with 2000 trials.  
317: The number of data points is $N = 10^4$, and the false alarm and false
318: dismissal 
319: probabilities are both $0.1$.  
320: A detailed description of the
321: simulations and the analytical predictions can be found in
322: Sec.~\ref{s:Performance comparison}.  
323: }
324: \label{omega gain}
325: \end{center}
326: \end{figure}
327: That comparison is quantified in terms of the minimum
328: gravitational-wave energy density  $\Omega_\text{detectable}$ 
329: necessary for detection.   The values of this quantity for the three
330: different statistics $\Lambda_\text{ML}^\text{NG}$,
331: $\Lambda_\text{CC}$ and $\Lambda_\text{B}$ we will denote by
332: $\Omega_\text{detectable}^\text{ML}$, $\Omega_{\rm
333: detectable}^\text{CC}$, and $\Omega_\text{detectable}^\text{B}$,
334: respectively.  Results for these three quantities obtained from Monte
335: Carlo simulations are shown in Fig.\ \ref{omega gain}, which gives
336: $\Omega_\text{detectable}$ as a function of $\xi$ for $N = 10^4$ data
337: points.  The Monte Carlo simulations are described in Sec.\
338: \ref{ss:Description of the simulation algorithm}
339: below.  The figure shows that in the limit $\xi \to 1$ of Gaussian
340: signals, the statistics $\Lambda_\text{ML}^\text{NG}$ and
341: $\Lambda_\text{CC}$ perform approximately equivalently (the
342: cross-correlation statistic is slightly better).  As the Gaussianity
343: parameter $\xi$ is decreased, the performance of
344: $\Lambda_\text{ML}^\text{NG}$ improves, until at $\xi \sim 10^{-2.5}$ it
345: is better than that of $\Lambda_\text{CC}$ by about a factor of $3$ in
346: energy density.  Finally, in the 
347: limit $\xi \to 0$, the individual bursts become visible and the burst
348: statistic $\Lambda_\text{B}$ becomes the best statistic.
349: 
350: 
351: 
352: 
353: {}Figure \ref{omega gain} also shows theoretical curves for the three
354: quantities $\Omega_\text{detectable}^\text{ML}$, $\Omega_{\rm
355: detectable}^\text{CC}$, and $\Omega_\text{detectable}^\text{B}$.
356: These curves are derived and discussed in Sec.\ \ref{s:Performance
357: comparison} below.  For the burst and cross-correlation statistics,
358: the theoretical curves should have a fractional accuracy $\sim
359: 1/\sqrt{N}$. For the maximum likelihood statistic, the theoretical
360: prediction is expected to be accurate to a few tens of percent.  These
361: expected accuracies are confirmed by the Monte Carlo simulations, as
362: seen in Fig.\ \ref{omega gain}.
363: 
364: 
365: The value $N = 10^4$ of the number of data points is roughly
366: appropriate for a space based detector like LISA, for which the
367: duration of a measurement might be $\sim 1 $ year and the effective bandwidth
368: $\sim 10^{-3}$ Hz.  However, for year-long observations with 
369: ground based detectors, the effective bandwidth will be $\sim 100$ Hz
370: and consequently the appropriate value of $N$ is $ \sim 10^9$.  We were 
371: unable to perform Monte Carlo simulations for this large value of $N$ due to
372: limitations in available computing power.  However, we show in Fig.\
373: \ref{fig:theoretical} the theoretical curves for the three different
374: statistics as functions of $\xi$ for $N=10^9$.  In this case, the
375: maximum likelihood statistic starts to outperform the
376: cross-correlation statistic at $\xi \sim 10^{-3}$, and the maximum
377: gain factor in energy density is of order $\sim 2$.  
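The quoted values of $N$ are consistent with the rough scaling $N \sim T\,\Delta f$, where $T$ is the observation time and $\Delta f$ the effective bandwidth. This scaling is our reading of the estimate above, and factors of order unity (such as the factor of 2 in Nyquist sampling) are ignored. The arithmetic, as a short Python sketch:

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, about 3.16e7 s

def approx_num_samples(duration_s, bandwidth_hz):
    """Order-of-magnitude sample count N ~ T * Delta f (O(1) factors dropped)."""
    return duration_s * bandwidth_hz

for label, bw in [("space-based (LISA-like)", 1e-3), ("ground-based", 100.0)]:
    n = approx_num_samples(SECONDS_PER_YEAR, bw)
    print(f"{label}: N ~ {n:.1e} (order 10^{math.floor(math.log10(n))})")
```

This gives $N \approx 3 \times 10^4$ and $N \approx 3 \times 10^9$, in line with the orders of magnitude $10^4$ and $10^9$ used in the figures.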
378: 
379: 
380: \begin{figure}
381: \begin{center}
382: \epsfig{file=Figure2.eps,width=8.5cm}
383: \caption{
384: The minimum gravitational-wave energy density $\Omega_{\rm
385: detectable}$ necessary for detection as a function of the
386: background's Gaussianity parameter $\xi$ for $N = 10^9$ data points,
387: which is a realistic number of data points for ground based detectors.
388: The false alarm and false dismissal probabilities are both 0.01.
389: The solid line is the theoretical prediction (\ref{eq:ansA}) and
390: (\ref{eq:ansB}) for the maximum
391: likelihood statistic, which is expected to be accurate to a few tens
392: of  percent.  The dashed line is the theoretical prediction
393: (\protect{\ref{analytic}}) for the cross correlation statistic, and
394: the dotted line is the theoretical prediction
395: (\ref{burstans1})--(\ref{burstans2}) for the burst 
396: statistic; see caption to Fig.\ \protect{\ref{omega gain}}.
397: This plot indicates a maximum gain factor of $\sim 2$ in energy
398: density for duty cycles  
399: in a narrow band near $\xi \sim 10^{-4}$.}
400: \label{fig:theoretical}
401: \end{center}
402: \end{figure}
403: 
404: 
405: We next discuss the computational cost of the maximum likelihood
406: statistic $\Lambda_{\rm ML}^{\rm NG}$.  As is well known, the
407: computational cost of trying to detect a stochastic background using
408: the cross-correlation statistic $\Lambda_{\text{CC}}$ is   
409: negligible when compared to, say, matched-filter-based inspiral
410: waveform searches.  However, because of the non-trivial maximization
411: in Eq.~(\ref{main result}), the maximum likelihood statistic  
412: $\Lambda_{\text{ML}}^{\text{NG}}$ 
413: is computationally intensive.  In fact, every evaluation of the function
414: to be maximized over the four parameters 
415: $\xi$, $\alpha$, $\sigma_1$, and $\sigma_2$ requires computing a
416: length-$N$ sum or product, where $N$ is the number of data points,
417: and takes longer than the
418: entire cross-correlation detection method.  Depending on the method of calculation,
419: the computational cost of computing $\Lambda_{\text{ML}}^{\text{NG}}$ is larger than that
420: of computing $\Lambda_{\text{CC}}$ by a factor anywhere from $10^2$ to $10^4$.
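The text above leaves the maximization method open. To make the cost scaling explicit, here is a sketch using brute-force grid search, which is only one possible choice (nested one-dimensional optimizers or local refinement around a coarse grid are alternatives). The objective is evaluated once per grid point, so with $G$ points per parameter the four-parameter search costs $G^4$ evaluations, each of which is itself a length-$N$ sum. The helper is generic and hypothetical; for $\Lambda_{\text{ML}}^{\text{NG}}$ the objective would be the logarithm of the product in Eq.~(\ref{main result}).

```python
import itertools
import math

def grid_maximize(objective, grids):
    """Brute-force maximization of objective(*params) over a Cartesian grid.

    Returns (best_value, best_params, n_evaluations); n_evaluations is the
    product of the grid sizes, e.g. G**4 for four parameters with G points each.
    """
    best_value, best_params, n_evals = -math.inf, None, 0
    for params in itertools.product(*grids):
        value = objective(*params)
        n_evals += 1
        if value > best_value:
            best_value, best_params = value, params
    return best_value, best_params, n_evals
```

Even $G = 20$ points per parameter means $20^4 = 1.6 \times 10^5$ evaluations of a length-$N$ sum, which is why a smarter search than a uniform grid would be preferable in practice.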
421: 
422: 
423: 
424: 
425: To summarize, under the idealized assumptions of this paper, if one
426: searches for a stochastic background using the standard
427: cross-correlation statistic, then one might not detect a signal that
428: would have been detectable using our maximum likelihood statistic.
429: This conclusion probably generalizes to realistic detector
430: noise models and detector orientations.
431: 
432: 
433: \subsection{Outline of this paper}
434: \label{ss:Outline of this paper}
435: 
436: In Sec.~\ref{s:General theory of detection statistics and parameters estimator} we introduce notation, 
437: review the general theory of signal detection and parameter measurement, and derive a general form of the maximum 
438: likelihood detection statistic.  Then, in Sec.~\ref{s:Application to stochastic background searches}, we derive the maximum likelihood 
439: statistics for both a Gaussian background (Sec.~\ref{ss:Gaussian
440: signal}) and for the model (\ref{eq:sigg}) of a non-Gaussian background 
441: (Sec.~\ref{ss:Non-Gaussian signal}), assuming two idealized detectors.  
442: In Sec.~\ref{s:Performance comparison} we discuss analytical calculations and Monte Carlo simulations 
443: comparing the performance of 
444: the maximum likelihood and cross-correlation detection statistics.
445: Also in Sec.~\ref{s:Performance comparison} we show how the signal
446: parameters $\xi$ and $\alpha$ can be estimated, with reasonable
447: accuracy, for a strong non-Gaussian background. We conclude in
448: Sec.~\ref{s:Conclusions} with a  
449: discussion of the results.  
450: 
451: \section{General theory of detection statistics and parameter estimation}
452: \label{s:General theory of detection statistics and parameters estimator}
453: 
454: In this section we review various formal aspects of the theory of
455: signal detection and measurement. 
456: We derive a form of the maximum likelihood detection statistic that is
457: more general than those considered previously in
458: the context of gravitational wave data analysis \cite{Allen Romano, general method, excess power, sam joe, sam unpublished}.
459: The material in this section can be found in a variety of texts
460: \cite{maximum likelihood}; we include this section for completeness
461: and to introduce notation. 
462: 
463: \subsection{Notational conventions}
464: \label{ss:Notational conventions}
465: 
466: We use calligraphic letters $\mathcal{A, B, C, \ldots}$ to denote
467: random variables.
468: As described in Sec.~\ref{ss:Gaussian stochastic backgrounds}, given
469: $D$ detectors we can assemble an $N \times D$ detector 
470: output matrix $\mathcal{H}$ with components $\mathcal{H}_i^k$ where $k=1,2,\ldots,N$ is
471: a time index, and $i=1,2,\dots,D$ labels the detector.
472: We assume that the detector outputs are made up of noise $\mathcal{N}$ and signal $\mathcal{S}$ 
473: with components $\mathcal{N}_i^k$ and $\mathcal{S}_i^k$ respectively, such that
474: \begin{equation} \label{detector output matrices} 
475: \mathcal{H = N + S}.
476: \end{equation}
477: Specific realizations of random variables will be denoted by lower case
478: Roman symbols.  For example,  
479: $h=n+s$ is a specific realization of Eq.~(\ref{detector output
480: matrices}), where the components of $h$ are $h_i^k$.
481: 
482: Probability densities for random variables will always be denoted by a lowercase $p$ and will carry a subscript
483: to indicate which random variable is being described.  For example, $p_\mathcal{N}(n)d^{ND}n$
484: is the probability that $n<\mathcal{N}< n+dn$, where $d^{ND}n$ is the product
485: \begin{equation}  \label{differential product}
486: d^{ND}n = \prod_{k=1}^{N}\prod_{i=1}^{D}dn_i^k.
487: \end{equation}
488: We write the normalization requirement for $p_\mathcal{N}(n)$ as
489: \begin{equation} \label{normalization} 
490: 1 = \int d^{ND}n~ p_\mathcal{N}(n).
491: \end{equation}
492: Unless otherwise specified, integrals are over $\mathbb{R}^{ND}$ where $\mathbb{R}$ is the set of real numbers.
493: 
494: We assume a detector noise model with $Q_n$ parameters.  Let
495: $\mathcal{V}_n$ be a vector of length $Q_n$   
496: whose components are the parameters characterizing the noise in the
497: detectors.  We denote by  
498: $\Theta_n$ the space of all possible values of $\mathcal{V}_n$. Here
499: the subscript $n$ is not an index; it is merely short for ``noise''. 
500: We denote joint probabilities in the usual way.  For example, $p_{{\mathcal N},{\mathcal V}_n}(n,{\bf v}_n)d^{ND}n~d^{Q_n}v_n$ is 
501: the probability that $n<\mathcal{N}< n+dn$ and ${\bf v}_n<\mathcal{V}_n< {\bf v}_n+d{\bf v}_n$, where $d^{Q_n}v_n$ is defined by
502: \begin{equation} 
503: d^{Q_n}v_n = \prod_{l=1}^{Q_n} dv_n^l,
504: \end{equation}
505: and $dv_n^l$ is the $l$th component of $d{\bf v}_n$.
506: We also use vertical bars to denote conditional 
507: probabilities.  For example
508: \begin{equation} \label{conditional joint}  
509: p_{\mathcal{N|V}_n}(n|{\bf v}_n) d^{ND}n =\frac{ p_{\mathcal{N,V}_n}(n,{\bf v}_n) d^{Q_n}v_n}{ p_{\mathcal{V}_n}({\bf v}_n) d^{Q_n}v_n}d^{ND}n
510: \end{equation}
511: is the probability that $n<\mathcal{N}< n+dn$ given that ${\bf
512: v}_n<\mathcal{V}_n< {\bf v}_n+d{\bf v}_n$. 
513: 
514: We will often use the so-called total probability theorem \cite{Papoulis} to write probability densities 
515: for a specific random variable as an integral over the parameters on which that random variable depends.
516: An example is
517: \begin{equation} \label{total probability} 
518: p_\mathcal{N}(n) =\int_{\Theta_n}d^{Q_n}v_n~ p_{\mathcal{N|V}_n}(n|{\bf v}_n) p_{\mathcal{V}_n}({\bf v}_n).
519: \end{equation}
520: Expanding probability densities in this way allows us to treat
521: parameters, such as the noise parameters $\mathcal{V}_n$ in 
522: Eq.~(\ref{total probability}), as unknowns.  In fact, such a treatment of
523: the noise parameters
524: is the crucial difference between the derivations of this work and those in previous studies of 
525: gravitational wave data analysis techniques \cite{Allen Romano, general method, excess power, sam joe, sam unpublished}.
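As a small numerical illustration of Eq.~(\ref{total probability}), consider a hypothetical example of our own: a single noise sample ($N = D = 1$) whose only noise parameter is its standard deviation $\sigma$ (so $Q_n = 1$), with a prior $p_{\mathcal{V}_n}$ uniform on $[\sigma_{\min}, \sigma_{\max}]$. The marginal density $p_\mathcal{N}(n)$ is then a prior-weighted average of Gaussians, which the sketch below approximates with the midpoint rule.

```python
import math

def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density p_{N|V}(x | sigma)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def marginal_noise_density(n, sigma_min, sigma_max, num_points=200):
    """p_N(n) = integral of p_{N|V}(n|sigma) p_V(sigma) d sigma for a uniform
    prior on [sigma_min, sigma_max], approximated by the midpoint rule.

    With a uniform prior each weight p_V(sigma) d sigma equals 1/num_points,
    so the integral reduces to an average of Gaussian densities.
    """
    width = (sigma_max - sigma_min) / num_points
    total = 0.0
    for j in range(num_points):
        sigma = sigma_min + (j + 0.5) * width
        total += gaussian_pdf(n, sigma)
    return total / num_points
```

The result is a normalized but non-Gaussian density: marginalizing over an unknown noise level broadens the tails relative to any single Gaussian.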
526: 
527: We assume that the signal model contains $Q_s$ parameters, which we
528: will treat as random variables  
529: like the noise parameters.  We will denote by ${\mathcal V}_s$ the
530: random vector of length $Q_s$ containing the signal parameters, 
531: and by $\Theta_s$ the space of all possible values of ${\mathcal V}_s$.
532: 
533: We define the notions of ``signal present'' and ``signal absent'' in terms of a partition of the space
534: $\Theta_s$ of signal parameters into a disjoint union
535: \begin{equation} 
536: \Theta_s = \Theta_{s0} \cup \Theta_{s1},
537: \end{equation} 
538: where $\Theta_{s0}$ corresponds to the signal being absent, and $\Theta_{s1}$ the signal being present.
539: We define the random variable $\mathcal{T}$, taking values $\mathcal{T}=0$ or $\mathcal{T}=1$, according to 
540: \begin{equation} 
541: \mathcal{T} = \left\{ 
542: \begin{array}{ll}
543:         1 & \text{ if } \mathcal{V}_s \in \Theta_{s1} \\
544:         0 & \text{ if } \mathcal{V}_s \in \Theta_{s0}
545: \end{array} \right. .
546: \end{equation}
547: Thus $\mathcal{T}=1$ corresponds to a signal being present, and
548: $\mathcal{T}=0$ to no signal being present.  We define
549: \begin{equation} 
550: p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,0) = \left\{
551: \begin{array}{ll}
552:         0  & \text{ if } {\bf v}_s \in \Theta_{s1} \\
553:         \delta^{ND}(s) & \text{ if } {\bf v}_s \in \Theta_{s0}
554: \end{array} \right. ,
555: \end{equation} 
556: where $\delta^{ND}(s)$ is the $N \times D$ dimensional Dirac delta function.
557: We denote by  $p_\mathcal{T,H}(t,h)d^{ND}h$ the probability that
558: $\mathcal{T}=t$ and that $h < \mathcal{H} < h+dh$, where $t=0$ or $1$.
559: Similarly 
560: \begin{equation} 
561: p_\mathcal{H|T}(h|t) d^{ND}h = \frac{ p_\mathcal{H,T}(h,t) }{ P_\mathcal{T}(t)}d^{ND}h
562: \end{equation} 
563: is the probability that $h < \mathcal{H} < h+dh$ given that $\mathcal{T}=t$.
564: 
565: 
566: We denote probabilities (as opposed to probability densities) with an uppercase $P$. For example $P_\mathcal{T}(1)$ is the probability that a signal 
567: is present, and $P_\mathcal{T}(0)$ is the probability that a signal is
568: absent.
569: 
570: 
571: Before examining the detector outputs, we may have some idea, say from previous experiments,  of the probability 
572: that a signal will be present. We denote this prior probability by $P^{(0)}$. We denote by $P^{(1)}$ the posterior 
573: probability that the signal is present after examining $\mathcal{H}$ in the context of all prior experiments etc.  
574: All posterior quantities have an implicit dependence on the detector outputs.  To simplify the notation 
575: we will not explicitly show this dependence.  For example, we write $P^{(1)}$ rather than the more cumbersome 
576: $P^{(1)}(\mathcal{H})$ for the posterior probability that a signal is present.
577: 
578: There are prior and posterior versions of all probability densities. When necessary we will append superscripts
579: of $(0)$ and $(1)$ to distinguish priors and posteriors respectively.
580: For example $p^{(1)}_{\mathcal{V}_n}({\bf v}_n) = p_{\mathcal{V}_n|\mathcal{H}}({\bf v}_n|h)$ is the posterior probability density for
581: $\mathcal{V}_n$. The posterior distribution for the noise can be expanded in terms of $p^{(1)}_{\mathcal{V}_n}({\bf v}_n)$ as
582: \begin{equation} 
583: p^{(1)}_\mathcal{N}(n) =\int_{\Theta_n}d^{Q_n}v_n~ p_{\mathcal{N|V}_n}(n|{\bf v}_n) p^{(1)}_{\mathcal{V}_n}({\bf v}_n).  
584: \end{equation} 
585: 
586: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
587: 
588: The conventions and symbols which have been introduced above are summarized in tables 
589: \ref{conventions} and \ref{symbols} respectively.  
590: 
591: \begin{table*}
592: \caption{\label{conventions} A summary of conventions introduced in Sec.~\ref{ss:Notational conventions}.}
593: \begin{ruledtabular}
594: \begin{tabular}{p{8.5cm}p{8.5cm}}
595: Convention & Example \\ \hline
596: 
597: Random variables are denoted by upper case calligraphic letters. & 
598: The detector output matrix is denoted by $\mathcal{H}$. \\
599: 
600: Specific realizations of random variables are denoted by lower 
601: case Roman letters. (see next convention)& 
602: A specific observation run may result in a specific detector output 
603: matrix $h$ or say $x$. These results would be denoted $\mathcal{H}=h$ 
604: and $\mathcal{H}=x$ respectively. \\ 
605: 
606: A lower case $p$ denotes a probability density function (PDF). 
607: It's subscript determines the quantities with which it is associated. &
608: The PDF for the detector output $\mathcal{H}$ as a function of $h$, or say $x$, 
609: is denoted by $p_\mathcal{H}(h)$ and $p_\mathcal{H}(x)$ respectively. \\
610: 
611: A comma in a PDF subscript and argument indicates a joint PDF. &
612: The joint PDF for $\mathcal{N}$ and $\mathcal{V}_n$ as a function of 
613: $n$ and ${\bf v}_n$ respectively is denoted by $p_{\mathcal{N},\mathcal{V}_n}(n,{\bf v}_n)$. \\
614: 
615: A vertical bar in a PDF subscript and argument indicates a conditional PDF. &
616: The conditional PDF for $\mathcal{N}$ and $\mathcal{V}_n$ as a function of
617: $n$ and ${\bf v}_n$ respectively is denoted by $p_{\mathcal{N}|\mathcal{V}_n}(n|{\bf v}_n)$. \\
618: 
619: An upper case $P$ denotes a probability. & 
620: 
621: The probability that $\mathcal{T}=1$ is denoted by $P_\mathcal{T}(1)$. \\
622: 
623: Prior and posterior quantities are denoted by superscripts of $(0)$ and $(1)$ respectively. &
624: 
625: The prior probability that a signal is present is denoted by $P^{(0)}$, while the posterior 
626: probability that a signal is present, after an observation $\mathcal{H}=h$, is denoted by 
627: $P^{(1)} = P_{\mathcal{T}|\mathcal{H}}(1|h)$.
628: \end{tabular}
629: \end{ruledtabular}
630: \end{table*}
631: 
632: \begin{table}
633: \caption{\label{symbols} A summary of symbols introduced in Sec.~\ref{ss:Notational conventions}.}
634: \begin{ruledtabular}
635: \begin{tabular}{cp{7cm}}
636: Symbol & {Meaning} \\ \hline
637: $\mathcal{H},h$ & detector output matrix \\
638: $\mathcal{N},n$ & noise contribution to detector output matrix \\
639: $\mathcal{S},s$ & signal contribution to detector output matrix \\
640: $N$ & number of strain samples taken from one detector  \\
641: $D$ & number of detectors \\
642: $Q_n$ & number of parameters in the model noise PDF \\
643: $Q_s$ & number of parameters in the model signal PDF \\
644: $\mathcal{V}_n,{\bf v}_n$ & the parameters of the model noise PDF \\
645: $\mathcal{V}_s,{\bf v}_s$ & the parameters of the model signal PDF \\
646: $\Theta_n$ & the space of all possible values of $\mathcal{V}_n$\\
647: $\Theta_s$ & the space of all possible values of $\mathcal{V}_s$\\
648: $\Theta_{s0}$ & the subspace of $\Theta_s$ for which a signal is absent\\
649: $\Theta_{s1}$ & the subspace of $\Theta_s$ for which a signal is present \\
650: $\mathcal{T},t$ & 1 if a signal is present ($\mathcal{V}_s \in \Theta_{s1}$), otherwise 0\\
651: $P^{(0)}$ & prior probability that a signal is present\\
652: $P^{(1)}$ & posterior probability that a signal is present\\
653: \end{tabular}
654: \end{ruledtabular}
655: \end{table}
656: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
657: 
658: \subsection{Detection statistics}
659: \label{ss:Detection statistics}
660: 
661: To detect a signal one uses a detection statistic, say $\Gamma=\Gamma(\mathcal{H})$, that is some function
662: of the detector outputs ${\cal H}$.  A signal is said to have been
663: detected when 
664: $\Gamma$ exceeds some threshold value $\Gamma_*$.  
665: 
666: Denote by $P_\text{FD}(\Gamma_*)$ the probability of false dismissal, that is, the probability
667: that we fail to detect a signal which is actually present.  Similarly, let $P_\text{FA}(\Gamma_*)$ be
668: the probability that we claim to have detected a signal which in fact is absent---the probability of false alarm.
669: For given signal and noise models and for a given statistic $\Gamma$, the
670: false alarm and false dismissal probabilities generate a curve in the
671: $P_\text{FA}$-$P_\text{FD}$ plane parametrized by the threshold $\Gamma_*$.
672: Such curves depend on the number of detectors $D$, the number of data points $N$, 
673: the signal parameters $\mathcal{V}_s$, and the noise parameters $\mathcal{V}_n$.
674: 
675: \begin{figure}
676: \begin{center}
677: \epsfig{file=FalseExpectations.eps,width=8.5cm}
678: \caption{False dismissal versus false alarm curves for typical detection statistics.}
679: \label{expected plots}
680: \end{center}
681: \end{figure}
682: 
683: Suppose that the statistic $\Gamma$ is bounded in the sense that 
684: there exist numbers $\Gamma_{\min}$ and $\Gamma_{\max}$ such that
685: $\Gamma_{\min} < \Gamma < \Gamma_{\max}$ for all ${\cal H}$. 
686: Then it is clear that $P_\text{FD}(\Gamma_{\min}) = 0$ and that
687: $P_\text{FA}(\Gamma_{\min})=1$.  As the threshold $\Gamma_*$
688: increases toward $\Gamma_{\max}$,  $P_\text{FD}(\Gamma_*)$ will
689: increase while $P_\text{FA}(\Gamma_*)$ 
690: decreases, until finally at $\Gamma_* = \Gamma_{\rm max}$, $P_\text{FD} = 1$, 
691: and $P_\text{FA} = 0$.  Thus, false dismissal-false alarm curves generally look 
692: something like those sketched in Fig.~\ref{expected plots}.
693: 
694: 
695: 
696: Note that if one uses a different statistic $f(\Gamma)$, where $f$ is
697: any function, then the shape of the $P_\text{FA}$-$P_\text{FD}$ curve does
698: not change as long as $f$ is monotonic in the sense that 
699: \begin{equation} \label{transformation}
700: \Gamma > \Gamma_* ~\Rightarrow~ f(\Gamma) > f(\Gamma_*).
701: \end{equation}
702: Only the parametrization of the curve changes under such a
703: transformation.  Statistics related by transformations $f$ satisfying
704: the monotonicity property (\ref{transformation}) have identical false
705: dismissal versus false alarm curves.
706: 
707: In 1933 Neyman and Pearson considered a simple signal detection scenario 
708: where the sets $\Theta_n$, $\Theta_{s1}$, and $\Theta_{s0}$ each contain a single element \cite{Neyman and Pearson}.  
709: They showed that for this scenario the detection statistic which minimizes $P_\text{FD}$ for any $P_\text{FA}$ 
710: is the so-called \emph{likelihood ratio} $\Lambda$, defined by
711: \begin{equation} \label{def1}
712: \Lambda = \frac{p_\mathcal{H|T}(h|1)}{p_\mathcal{H|T}(h|0)}.
713: \end{equation}
714: One notion of optimality for detection statistics is that the
715: statistic should minimize the false dismissal probability
716: at a fixed value of the false alarm probability.  For the simple
717: scenario above, this criteria, known as the  
718: Neyman-Pearson criteria, uniquely determines the likelihood ratio as the optimal statistic
719: \cite{Ferguson}.  However in general, when any of $\Theta_n$, $\Theta_{s1}$, or $\Theta_{s0}$ contains more than
720: one element, the statistic selected by this criteria is a function of the unknown parameters $\mathcal{V}_s$
721: and $\mathcal{V}_n$.  Thus, as is well known, the Neyman-Pearson
722: criteria does not single out a unique statistic in such cases.  
723: 
724: 
725: In this paper we will obtain our detection statistics from Bayesian
726: considerations, but we will quantify their  
727: effectiveness using the Neyman and Pearson criteria of comparing false
728: dismissal probabilities at fixed false alarm probabilities.
729: 
730: \subsection{Likelihood ratio and likelihood function}
731: \label{ss:Likelhood ratio and likelihood function}
732: 
733: {}From a Bayesian point of view, a natural criterion for 
734: deciding that a signal is present  
735: is for the posterior probability $P^{(1)}$ to
736: exceed some threshold \cite{Bayes}. 
737: The posterior probability $P^{(1)}$ is related to the prior
738: probability $P^{(0)}$ and to the likelihood ratio $\Lambda$ defined by
739: Eq.~(\ref{def1}) by 
740: \begin{equation} \label{def2} 
741: \frac{P^{(1)}}{1-P^{(1)}} = \Lambda \frac{P^{(0)}}{1-P^{(0)}}.
742: \end{equation}
743: See appendix \ref{s:appendixA} for a derivation of Eq.~(\ref{def2}) in the most general context where the sets $\Theta_n$, 
744: $\Theta_{s1}$, and $\Theta_{s0}$ are all non-trivial. It follows from Eq.~(\ref{def2}) that $P^{(1)}$ is a monotonic function
745: of $\Lambda$, so thresholding on $P^{(1)}$ is equivalent to thresholding on $\Lambda$.
746: This makes $\Lambda$, or approximate versions of it, the natural choice for a detection statistic.
747: 
748: 
749: We derive in Appendix \ref{s:appendixA} the following general formula
750: for the likelihood ratio as a function of the data ${\cal H} = h$:
751: \begin{widetext}
752: \begin{equation} \label{general likelihood ratio}
753: \Lambda = \frac{\displaystyle  \int_{\Theta_{s1}} d^{Q_s}v_s~ \int d^{ND}s~ \int_{\Theta_n} d^{Q_n}v_n~ p_{\mathcal{N|V}_n}(h-s|{\bf v}_n) p_{\mathcal{V}_n}({\bf v}_n) 
754:                  p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,1) p_{\mathcal{V}_s|\mathcal{T}}({\bf v}_s|1) }
755:                {\displaystyle                       \int_{\Theta_n} d^{Q_n}v_n'~             p_{\mathcal{N|V}_n}(h|{\bf v}_n')  p_{\mathcal{V}_n}({\bf v}_n') }.
756: \end{equation}
757: \end{widetext}
758: The various probability distributions that appear in Eq.\ (\ref{general
759: likelihood ratio}) are (i) the prior distribution
760: $p_{\mathcal{V}_s|\mathcal{T}}({\bf v}_s|1)$ for the signal
761: parameters ${\bf v}_s$; (ii) the distribution 
762: $p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,1)$ for the signal $s$
763: given the signal parameters ${\bf v}_s$; (iii) the prior distribution 
764: $p_{\mathcal{V}_n}({\bf v}_n)$ for the noise parameters ${\bf v}_n$;
765: and (iv) the distribution $p_{\mathcal{N|V}_n}(h|{\bf v}_n)$ for the
766: noise $n$ given the noise parameters ${\bf v}_n$.
767: 
768: 
769: We can interpret Eq.\ (\ref{general likelihood ratio}) as follows.  
770: In the simple signal detection scenario, we choose between a pair of
771: simple claims: 
772: (i) $\mathcal{V}_s = {\bf v}_{s0}$ or (ii) $\mathcal{V}_s = {\bf v}_{s1}$.
773: In general we choose between a pair of complicated, or composite, claims:
774: (i)  $\mathcal{V}_s \in \Theta_{s0}$ or (ii) $\mathcal{V}_s \in
775: \Theta_{s1}$, where both $\Theta_{s0}$  
776: and $\Theta_{s1}$ contain many elements.
777: Equation (\ref{general likelihood ratio}) says that the best way to
778: chose between a pair of complicated claims is 
779: to 
780: first break the complicated pair of claims into pairs of simple
781: claims, then compute the likelihood ratio for each pair of simple claims,  
782: and sum the results of each choice. That is, the likelihood ratio can
783: be written as an integral over the parameters of the composite claims
784: \begin{equation} \label{likelihood function def}
785: \Lambda = \int_{\Theta_{s1}} d^{Q_s}v_s~ \int_{\Theta_n} d^{Q_n}v_n ~\Lambda({\bf v}_s,{\bf v}_n),
786: \end{equation} 
787: where the integrand $\Lambda({\bf v}_s,{\bf v}_n)$, which we refer to
788: as the \emph{likelihood function}, 
789: can be read off from Eq.\ (\ref{general likelihood ratio}):
790: \begin{widetext}
791: \begin{equation} \label{likelihood function} 
792: \Lambda({\bf v}_s,{\bf v}_n) = \frac{\displaystyle \int d^{ND}s~  p_{\mathcal{N|V}_n}(h-s|{\bf v}_n) p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,1) 
793:                                          p_{\mathcal{V}_n}({\bf v}_n) p_{\mathcal{V}_s|\mathcal{T}}({\bf v}_s,1) }
794:                     {\displaystyle \int_{\Theta_n} d^{Q_n}v_n'~ p_{\mathcal{N|V}_n}(h|{\bf v}_n')  p_{\mathcal{V}_n}({\bf v}_n') }.
795: \end{equation} 
796: \end{widetext}
797: 
798: The likelihood function\footnote{There are two different conventions for the definition of the likelihood function.
799: Some authors include the probability distributions for $\mathcal{V}_s$ and $\mathcal{V}_n$ in the definition
800: of $\Lambda({\bf v}_s,{\bf v}_n)$ as we have in Eq.~(\ref{likelihood function}), while others leave these out 
801: of $\Lambda({\bf v}_s,{\bf v}_n)$ and would show these distributions explicitly in 
802: Eq.~(\ref{likelihood function def}).}
803: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
804: $\Lambda({\bf v}_s,{\bf v}_n)$ can be used to
805: compute the posterior probability density 
806: $p^{(1)}_{\mathcal{V}_s,\mathcal{V}_n|\mathcal{T}}({\bf v}_s,{\bf
807: v}_n|1)$ for the signal and noise parameters given that a signal is
808: present, via the formula
809: \begin{equation} \label{distribution relation} 
810: \frac{ P^{(1)} }{ 1 - P^{(1)} } p^{(1)}_{\mathcal{V}_s,\mathcal{V}_n|\mathcal{T}}({\bf v}_s,{\bf v}_n|1)
811: = \Lambda({\bf v}_s,{\bf v}_n) \frac{ P^{(0)} }{ 1 - P^{(0)} }.
812: \end{equation}
813: A derivation of Eq.~(\ref{distribution relation}) can be found in appendix \ref{s:appendixA}.
814: 
815: \subsection{Maximum likelihood detection statistics and parameter estimators}
816: \label{ss:Maximum likelihood detection statistics and parameter estimators}
817: 
818: In many applications, it is impractical to compute the detection
819: statistic (\ref{general likelihood ratio}) because of the
820: multi-dimensional integrals involved \cite{Loredo}.  However,
821: approximate versions of the statistic are often easier to compute and
822: useful.  If a signal is present with sufficiently large amplitude, then
823: the integrand in the numerator of Eq.\ (\ref{general likelihood ratio})
824: will be sharply peaked.  The integrand in the denominator of 
825: Eq.\ (\ref{general likelihood ratio}) will also be sharply peaked when
826: there is sufficient data that the noise is well characterized.  Under
827: these circumstances, the integrals can be written as the values of the
828: corresponding integrands at the peaks multiplied by ``width
829: factors'', where the width factors depend only weakly
830: on the data $h$ and can be neglected without affecting much the
831: performance of the statistic.  [The width factors from the integrals
832: over the noise parameters will tend to cancel between the numerator
833: and denominator].  Also, frequently the prior distributions for
834: $\mathcal{V}_s$ and $\mathcal{V}_n$ are slowly varying, and neglecting
835: those distributions 
836: has a negligible effect on the performance of the statistic.   
837: Under these conditions the maximum likelihood detection statistic
838: $\Lambda_\text{ML}$ defined by  
839: \begin{widetext}
840: \begin{equation} \label{general likelihood estimator}
841: \Lambda_\text{ML} = \frac{ \displaystyle \max_{{\bf v}_s\in\Theta_{s1}}~\max_{{\bf v}_n\in\Theta_n}~ \int d^{ND}s ~p_{\mathcal{N|V}_n}(h-s|{\bf v}_n) 
842: 			   p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,1) }
843:                          { \displaystyle \max_{{\bf v'}\in\Theta_n}~ p_{\mathcal{N|V}_n}(h|{\bf v}_n') }
844: \end{equation}
845: \end{widetext}
846: is a natural approximate version of $\Lambda$ 
847: % FOOTNOTE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
848: \footnote{In the event that the priors for $\mathcal{V}_s$
849: and $\mathcal{V}_n$ restrict these parameters to regions $\Theta_{s1}' \subset \Theta_{s1}$ and
850: $\Theta_n' \subset \Theta_n$, the bounds of the maximizations in Eq.~(\ref{general likelihood estimator})
851: should be changed to $\Theta_{s1} \rightarrow \Theta_{s1}'$ and $\Theta_n \rightarrow \Theta_n'$.}. 
852: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
853: The subscript ML denotes that (\ref{general likelihood estimator}) is the maximum likelihood approximate 
854: version of $\Lambda$.
855: See Ref.~\cite{maximum likelihood} for further discussion of
856: $\Lambda_\text{ML}$ as an approximate version of $\Lambda$ \footnote{Note that $\Lambda_{\rm ML}$ is an 
857: approximate version of $\Lambda$ only in the sense that the false dismissal versus false alarm curves
858: of the two statistics will be close to one another.  The numerical
859: values of $\Lambda_{\rm ML}$ and $\Lambda$ will in general differ
860: significantly, due to the width factors and priors.  Therefore the
861: statistic $\Lambda_{\rm ML}$ cannot be used in Eq.\ (\ref{def2}) to
862: compute Bayesian thresholds for detection given a desired value of
863: $P^{(1)}$.}.  
864: 
865: 
866: 
867: A particular special case of the detection
868: statistic (\ref{general likelihood estimator}), which is widely used,
869: is the following.  Assume that the noise   
870: parameters have some known values $\mathcal{V}_n = {\bf v}_n$. Then the noise priors and the $\Theta_n$ integrals 
871: in Eq.~(\ref{general likelihood ratio}) are trivial, and one obtains
872: the detection statistic
873: \begin{equation} \label{known likelihood estimator}
874: \tilde\Lambda_\text{ML}=\frac{\displaystyle \max_{{\bf v}_s\in\Theta_{s1}}~ \int d^{ND}s ~p_{\mathcal{N|V}_n}(h-s|{\bf v}_n) p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,1)}
875: 			     {\displaystyle p_{\mathcal{N|V}_n}(h|{\bf v}_n)}.
876: \end{equation}
877: See Ref.~\cite{sam joe} for an exploration of the statistic (\ref{known likelihood estimator}) in the 
878: context of stochastic backgrounds.  We will show below that for a Gaussian stochastic background, 
879: $\Lambda_\text{ML}$ reduces to the standard cross-correlation
880: statistic while the more specialized statistic
881: $\tilde\Lambda_\text{ML}$ does not.  Thus for stochastic backgrounds,
882: treating the noise parameters as unknowns is crucial \cite{robust
883: gaussian II}.
884: 
885: 
886: When the noise and signal parameters $\mathcal{V}_n$ and
887: $\mathcal{V}_s$ can take on many values, one naturally would like to
888: know which  
889: values are realized. Equation (\ref{distribution relation}) suggests
890: using the values $\hat {\bf v}_n$ and $\hat {\bf v}_s$ defined by
891: \begin{equation} \label{ML estimates}
892: \Lambda(\hat {\bf v}_s, \hat {\bf v}_n) = \max_{{\bf v}_s\in\Theta_{s1}}~ \max_{{\bf v}_n\in\Theta_n}~ \Lambda({\bf v}_s,{\bf v}_n).
893: \end{equation}
894: The estimators $\hat {\bf v}_n$ and $\hat {\bf v}_s$ are known as maximum likelihood estimators. 
895: Note that ${\bf v}_s=\hat {\bf v}_s$ and ${\bf v}_n=\hat {\bf v}_n$ also maximize the numerator in Eq.~(\ref{general likelihood estimator}).
896: For the remainder of this paper we will use $\Lambda_\text{ML}$, defined by Eq.~(\ref{general likelihood estimator}), as our 
897: detection statistic, and $\hat {\bf v}_s$ and $\hat {\bf v}_n$, defined by Eq.~(\ref{ML estimates}), as parameter estimators.
898: 
899: \section{Application to stochastic background searches}
900: \label{s:Application to stochastic background searches}
901: 
902: In this section we derive the maximum likelihood detection statistic (\ref{general likelihood estimator})
903: for a simplified model of the detection problem for stochastic gravitational waves, and for a specific simple model 
904: of a non-Gaussian stochastic background.
905: 
906: \subsection{Assumptions}
907: \label{ss:Assumptions}
908: 
909: We assume two detectors with outputs $\mathcal{H}_i^k$, where $i=1,2$
910: labels the detector
911: and $k=1,2,\ldots,N$ is a time index. 
912: We assume that the noise in detector one is uncorrelated with the
913: noise in detector two.
914: We will require the noise in both detectors to have vanishing mean and to be both Gaussian and white, so that
915: \begin{equation} \label{assumption2}
916: p_{\mathcal{N|V}_n}\left[n|(\sigma_1,\sigma_2)\right] = 
917: \prod_{k=1}^N \frac{ 1 }{ 2 \pi \sigma_1\sigma_2 } \exp\left[- \frac{ (n_1^k)^2 }{ 2 \sigma^2_1 } - \frac{ (n_2^k)^2 }{ 2 \sigma^2_2 } \right].
918: \end{equation}
919: The parameters $\sigma_1$ and $\sigma_2$ in Eq.~(\ref{assumption2})
920: are the square roots of the variances of the noise 
921: in the two detectors. 
922: For this model ${\bf v}_n = (\sigma_1,\sigma_2)$ and $\Theta_n =
923: \left\{ (\sigma_1,\sigma_2) ~|~ \sigma_1 \ge 0 \text{ and } \sigma_2
924: \ge 0\right\}$. 
925: 
926: We assume that the detectors are collocated and aligned, so that the
927: same signal is present in both detectors
928: \begin{equation}
929: \mathcal{S}_1^k = \mathcal{S}_2^k = \mathcal{S}^k.
930: \end{equation}
931: Lastly we assume that the individual signal samples are uncorrelated
932: and identically distributed, i.e., the signal is white, so that
933: \begin{equation} \label{assumption4}
934: p_\mathcal{S}(s) = \prod_{k=1}^N p_{\mathcal{S}^k}(s^k).
935: \end{equation} 
936: Our assumptions (\ref{assumption2})-(\ref{assumption4})
937: are unrealistic for both ground-based and space-based detectors: we
938: expect the noise to be colored with significant non-Guasssian
939: components, and in general detectors will not be co-located and
940: aligned.  Our analysis is therefore just a first step, and will need
941: to be generalized.  However, we expect that our central conclusion ---
942: the existence of statistics which outperform the standard
943: cross-correlation statistic for nonGaussian signals --- is robust, and
944: will not be altered when these complications are taken into account.  
945: 
946: 
947: 
948: We now derive a general formula for the maximum likelihood statistic 
949: (\ref{general likelihood estimator}), which we apply in both the
950: Gaussian and non-Gaussian cases in the following two subsections. 
951: The denominator in Eq.~(\ref{general likelihood estimator}) can be
952: written, from Eq.~(\ref{assumption2}), as
953: \begin{equation} \label{denominator to maximize} 
954: \max_{\sigma_1 \ge 0}~ \max_{\sigma_2 \ge 0}~ \left\{  \left(2 \pi \sigma_1\sigma_2 \right)^{-N} 
955: 	                         	\exp\left[ -\frac{N}{2} \left( \frac{\bar\sigma^2_1}{\sigma^2_1} 
956: 					+ \frac{\bar\sigma^2_2}{\sigma^2_2} \right)\right] \right\},
957: \end{equation}
958: where $\bar\sigma_1^2$ and $\bar\sigma_2^2$ are defined by 
959: \begin{equation} \label{sigma bar def}
960: \bar\sigma^2_i = \frac{1}{N} \sum_{k=1}^N \left(h_i^k\right)^2 
961: \end{equation} 
962: for $i=1,2$.
963: It is easily shown that the maximum in Eq.~(\ref{denominator to maximize}) is achieved at
964: $\sigma_i = \bar\sigma_i$.
965: From Eq.~(\ref{general likelihood estimator}) this yields
966: \begin{widetext}
967: \begin{equation} 
968: \Lambda_\text{ML} = \frac{\displaystyle \max_{{\bf v}_s\in\Theta_{s1}}~\max_{{\bf v}_n\in\Theta_n}~ \int d^{ND}s ~p_{\mathcal{N|V}_n}(h-s|{\bf v}_n) p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,1) }
969:                          {\displaystyle \left( 2\pi \bar\sigma_1 \bar\sigma_2 \right)^{-N} \exp\left( -N\right) }.
970: \end{equation}
971: Combining this with Eq.~(\ref{assumption4}) yields the following final general expression for the maximum likelihood statistic:
972: \begin{equation} \label{special likelihood estimator}
973: \Lambda_\text{ML} = \max_{{\bf v}_s\in\Theta_{s1}}~\max_{\sigma_1 \ge 0}~\max_{\sigma_2 \ge 0}~
974: 		    \prod_{k=1}^N  \frac{ \bar\sigma_1 \bar\sigma_2 }{ \sigma_1 \sigma_2 }   
975: 		    \int_{-\infty}^\infty ds^k~ p_{\mathcal{S}^k|\mathcal{V}_s,\mathcal{T}}(s^k|{\bf v}_s,1) 
976: 	            \exp\left[ -\frac{ \left(h_1^k - s^k \right)^2 }{ 2\sigma^2_1 } 
977: 	                       - \frac{ \left( h_2^k - s^k \right)^2 }{ 2\sigma^2_2 } 
978: 	       	               + 1 \right].
979: \end{equation} 
980: \end{widetext}
981: 
982: 
983: \subsection{Gaussian signal}
984: \label{ss:Gaussian signal}
985: 
986: We now consider the case where the signal is Gaussian and has a
987: vanishing mean.  We denote by $\alpha^2$ the variance of the signal,
988: so the prior for $\mathcal{S}$ is given by
989: \begin{equation} \label{Gaussian signal}
990: p_{\mathcal{S}^k|\mathcal{V}_s,\mathcal{T}}(s^k|\alpha,1) = \frac{ 1 }{ \sqrt{2 \pi} \alpha } 
991: 			      \exp\left[- \frac{ \left( s^k \right)^2 }{ 2 \alpha^2 } \right].
992: \end{equation}
993: For this model ${\bf v}_s = (\alpha)$ has only one component, and
994: $\Theta_{s1}=\{ \alpha ~|~ \alpha > 0\}$. 
995: 
996: 
997: 
998: 
999: Substituting the signal probability distribution (\ref{Gaussian signal}) into the general expression (\ref{special likelihood estimator}) 
1000: for $\Lambda_\text{ML}$ yields a Gaussian integral which is straightforward to evaluate.  The result is
1001: \begin{eqnarray} 
1002: \label{long Gaussian stat}
1003: \Lambda^\text{G}_\text{ML} &=&  \max_{\alpha > 0}~\max_{\sigma_1 \ge 0}~\max_{\sigma_2 \ge 0} 
1004: \left\{ 
1005:  \frac{\bar\sigma_1 \bar\sigma_2}{\sqrt{\sigma_1^2 \sigma_2^2 + \sigma_1^2 \alpha^2 + \sigma_2^2 \alpha^2}}   \right. \\
1006: &\times & \left. \exp\left[ \frac{ \frac{\bar\sigma_1^2}{\sigma_1^4} + \frac{\bar\sigma_2^2}{\sigma_2^4} + \frac{2\bar\alpha^2}{\sigma_1^2\sigma_2^2} }
1007: 	                   {2 \left( \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} + \frac{1}{\alpha^2} \right)} 
1008: 		           - \frac{\bar\sigma_1^2}{2\sigma_1^2} - \frac{\bar\sigma_2^2}{2\sigma_2^2} + 1 \right] \right\}^N ,\nonumber 
1009: \end{eqnarray} 
1010: where 
1011: \begin{equation} 
1012: \bar \alpha^2 = \frac{1}{N}\sum_{k=1}^N h_1^k h_2^k,
1013: \label{baralphadef}
1014: \end{equation} 
1015: and we have appended a superscript G on $\Lambda^\text{G}_\text{ML}$ to indicate the maximum likelihood detection 
1016: statistic for a Gaussian signal.  
1017: 
1018: One can show that the maximum in Eq.~(\ref{long Gaussian stat}) is achieved at $\alpha = \hat\alpha$, $\sigma_1  = \hat\sigma_1$, and
1019: $\sigma_2  = \hat\sigma_2$, where
1020: \begin{eqnarray}
1021: \hat\alpha^2 &=& \bar\alpha^2 ~\theta(\bar\alpha^2) ,  \label{gaussian estimator1}\\
1022: \hat\sigma^2_i &=& (\bar\sigma^2_i - \hat\alpha^2) ~\theta\left( \bar\sigma^2_i - \hat\alpha^2 \right) , \label{gaussian estimator2}
1023: \end{eqnarray} 
1024: for $i=1,2$, and $\bar\sigma_1$ and $\bar\sigma_2$ are given by Eq.~(\ref{sigma bar def}). 
1025: Here $\theta(x)$ is the step function (\ref{stepfunction}). 
1026: The quantities (\ref{gaussian estimator1}) and (\ref{gaussian estimator2}) are the maximum likelihood estimators for 
1027: the variance $\alpha^2$ of the signal and the variances $\sigma_1^2$
1028: and $\sigma_2^2$ of the noise in the two detectors. The step functions  
1029: in Eqs.~(\ref{gaussian estimator1}) and (\ref{gaussian estimator2})
1030: arise as a result of the bounds of the maximization 
1031: in Eq.~(\ref{special likelihood estimator}).
1032: 
1033: The corresponding detection statistic is, from Eq.~(\ref{long Gaussian stat})
1034: % FOOTNOTE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1035: \footnote{To simplify the formula for $\Lambda_\text{ML}^\text{G}$ we assume that $\bar\sigma_i^2 - \bar\alpha^2> 0$.  
1036: This will be true for any realistic value of $N$ since $\bar\sigma_i^2
1037: - \bar\alpha^2 = \sigma_{i,{\rm true}}^2 + O(1/\sqrt{N})$, where
1038: $\sigma_{i,{\rm true}}$ is the true value of $\sigma_i$ and the second term
1039: describes the statistical fluctuations.},
1040: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1041: \begin{equation} \label{Gaussian statistic}
1042: \Lambda^\text{G}_\text{ML} = \left[ 1 - \frac{\hat\alpha^4}{\bar\sigma_1^2 \bar\sigma_2^2} \right]^{-N/2}.
1043: \end{equation} 
1044: The cross-correlation statistic $\Lambda_\text{CC}$ can be obtained
1045: from $\Lambda_\text{ML}^\text{G}$ via a monotonic transformation which
1046: preserves false dismissal versus false alarm curves [cf.\ Eq.\
1047: (\ref{transformation}) above]:
1048: \begin{eqnarray}
1049: \Lambda_\text{CC} &=& \sqrt{1 -(\Lambda^\text{G}_\text{ML})^{-2/N}}
1050: \nonumber \\
1051:  &=& \frac{\hat\alpha^2}{\bar\sigma_1 \bar\sigma_2}.
1052: \label{standard cross corr}
1053: \end{eqnarray}
1054: 
1055: 
1056: 
1057: 
1058: Note that if we had assumed the noise parameters ${\bf v}_n =
1059: (\sigma_1,\sigma_2)$ were known, and derived a statistic from
1060: Eq.~(\ref{known likelihood estimator}) rather than Eq.~(\ref{general
1061: likelihood estimator}), we would have found instead the detection
1062: statistic $\tilde\Lambda_\text{ML}^\text{G} =
1063: \bar\Lambda_\text{ML}^\text{G}~\theta(\bar\Lambda_\text{ML}^\text{G})$,
1064: where 
1065: \begin{equation} \label{known varriance}
1066: \bar\Lambda_\text{ML}^\text{G} = \bar\alpha^2 + \frac{1}{2}\left[
1067:                     \frac{\sigma_2^2}{\sigma_1^2}(\bar\sigma_1^2 -
1068:                     \sigma_1^2)                     +
1069:                     \frac{\sigma_1^2}{\sigma_2^2}(\bar\sigma_2^2 -
1070:                     \sigma_2^2)  \right], 
1071: \end{equation}
1072: which is different from the standard cross-correlation statistic. This
1073: non-standard result is obtained because of the unrealistic assumption 
1074: that the noise parameters ${\bf v}_n = (\sigma_1,\sigma_2)$ are
1075: known.  Different derivations of the result (\ref{known varriance})
1076: can be found in Refs.~\cite{sam joe,robust gaussian II}. 
1077: 
1078: 
1079: It is often useful to characterize the ``strength'' of a stochastic
1080: background in terms of the signal-to-noise ratio of the
1081: cross-correlation statistic (\ref{standard cross corr}), which we now
1082: define.  First note that for large $N$, the fractional fluctuations in 
1083: $\hat\alpha^2$ will be much larger than those in
1084: $\bar\sigma_1\bar\sigma_2$
1085: \footnote{This is true at fixed signal-to-noise ratio $\rho$.}.  
1086: For the purpose of defining the signal-to-noise ratio, we assume that $N$
1087: is large enough that  
1088: $\bar\sigma_1$ and $\bar\sigma_2$ in Eq.~(\ref{standard cross corr})
1089: can be taken to be independent of $h$, 
1090: so that $\Lambda_\text{CC}$ and $\hat\alpha^2$ are equivalent
1091: detection statistics.   
1092: We also use ${\bar \alpha}^2$ instead of
1093: ${\hat \alpha}^2$ in the computations that follow, as is conventional
1094: when defining 
1095: signal-to-noise ratios.  If a signal is present, then the expected
1096: value of $\bar\alpha^2$ is, from Eqs.\ (\ref{detector output
1097: matrices}), (\ref{assumption2})--(\ref{assumption4}), (\ref{Gaussian
1098: signal}) and (\ref{baralphadef}),
1099: \begin{equation} \label{expected value 1}
1100: \left< \bar \alpha^2 \right> = \alpha^2.
1101: \end{equation} 
1102: If no signal is present, so that $\alpha^2=0$, then the fluctuations in $\bar\alpha^2$ are given by
1103: \begin{equation} \label{fluctuations 1}
1104: \Delta \left( \bar \alpha^2 \right) = \frac{\sigma_1\sigma_2}{\sqrt{N}}.
1105: \end{equation} 
1106: The signal-to-noise ratio $\rho$ is defined to be the ratio of these two quantities:
1107: \begin{equation} \label{rho}
1108: \rho = \frac{\alpha^2\sqrt{N}}{\sigma_1\sigma_2}.
1109: \end{equation} 
1110: 
1111: \subsection{Non-Gaussian signal}
1112: \label{ss:Non-Gaussian signal}
1113: 
1114: As mentioned in the introduction, the traditional assumption that a
1115: gravitational wave stochastic background will be Gaussian  
1116: requires the individual events to be sufficiently frequent and
1117: uncorrelated.  Our model for a non-Gaussian signal assumes instead that the
1118: events are infrequent.  
1119: 
1120: Consider a collection of similar events generating a stochastic background $\mathcal{S}$. 
1121: Let $\xi$ be the probability that, at any randomly chosen time, the waves from an event 
1122: are arriving at the detectors.  We assume that
1123: the time structure of individual
1124: events cannot be resolved by the detectors.  That is, 
1125: we assume that the events occur over timescales smaller than the
1126: detectors' resolution time, as illustrated in Fig.~\ref{signal sketch}.
1127: \begin{figure}
1128: \begin{center}
1129: \epsfig{file=signal.eps,width=8cm}
1130: \caption{
1131: Sketched segment of the signal produced by a model non-Gaussian stochastic
1132: background of events unresolved by the detectors. Here we show two events.  The solid curve is the 
1133: exact signal.  This exact signal's contributions to the detector outputs, shown as stemmed {\sf o}'s, 
1134: are averages of the exact signal over the detector resolution timescale.
1135: }
1136: \label{signal sketch}
1137: \end{center}
1138: \end{figure}
1139: We assume that the distribution of the amplitudes of the events is
1140: Gaussian with variance $\alpha^2$.
1141: The probability distribution for the signal given the signal
1142: parameters $(\xi,\alpha)$ is therefore given by
1143: \begin{eqnarray} \label{signal model}
1144: p_{\mathcal{S}^k|\mathcal{V}_s,\mathcal{T}}[s^k| (\xi,\alpha),1] &=& 
1145: \frac{\xi}{\sqrt{2\pi}\alpha}\exp \left[ -\frac{\left( s^k\right) ^2}{2\alpha^2} \right] \nonumber \\
1146: &+& (1-\xi) \delta \left( s^k \right) \label{signal prior},
1147: \end{eqnarray} 
1148: together with Eq.~(\ref{assumption4}).
1149: Thus the signal model parameters are ${\bf v}_s=(\xi,\alpha)$, which
1150: give respectively the  ``event probability'' and ``event variance''
1151: characterizing the stochastic background.  The parameter space  
1152: $\Theta_s$ for this model is 
1153: \begin{equation} 
1154: \Theta_{s} = \left\{ (\xi,\alpha) ~|~ 0 \le \xi \le 1 \text{ and } \alpha \ge 0 \right\},
1155: \end{equation} 
1156: and the subset corresponding to a signal being present is
1157: \begin{equation} 
1158: \Theta_{s1} = \left\{ (\xi,\alpha) ~|~ 0 < \xi \le 1 \text{ and } \alpha > 0 \right\}.
1159: \end{equation} 
1160: 
1161: 
1162: 
1163: 
1164: Note that our assumption that the time structure of events is not resolved by the detector is unrealistic.  Detector resolution times 
1165: can be as small as 0.1 ms in the case of ground-based detectors like LIGO
1166: \footnote{For ground-based detectors, the effective resolution time in a cross-correlation between
1167: two detectors can be considerably longer than $0.1$ ms \cite{Allen
1168:   Romano}, which may help with this issue.},
1169: and even supernova bursts are expected to
1170: have time scales $\gtrsim 10$ ms \cite{waveform catalog,new waveform catalog}.  
1171: It will be important for future studies to relax this assumption.
1172: 
1173: 
1174: 
1175: We now compute the maximum likelihood detection statistic $\Lambda^\text{NG}_\text{ML}$ for our simple non-Gaussian signal model
1176: by substituting Eq.~(\ref{signal prior}) into Eq.~(\ref{special likelihood estimator}).
1177: This yields
1178: \begin{widetext}  
1179: \begin{eqnarray}  \label{main result2}
1180: \Lambda_{\text{ML}}^{\text{NG}} &=&
1181: \max_{0<\xi\le 1}~ \max_{\alpha > 0}~ \max_{\sigma_1 \ge 0}~ \max_{\sigma_2 \ge 0}~ \prod_{k=1}^N
1182: \left\{  
1183:         \frac{ \bar\sigma_1 \bar\sigma_2 \xi}{\sqrt{\sigma^2_1 \sigma^2_2 + \sigma^2_1 \alpha^2 + \sigma^2_2 \alpha^2}} 
1184:         \exp \left[ \frac{\left( \frac{h_1^k}{\sigma^2_1} + \frac{h_2^k}{\sigma^2_2}\right)^2}
1185:         {2\left( \frac{1}{\sigma^2_1} + \frac{1}{\sigma^2_2} + \frac{1}{\alpha^2} \right)} 
1186:         - \frac{\left( h_1^k\right)^2}{2\sigma^2_1} - \frac{\left( h_2^k\right)^2}{2\sigma^2_2} + 1\right]  \right. \nonumber \\ 
1187: &+& \left. \frac{\bar\sigma_1 \bar\sigma_2}{\sigma_1 \sigma_2}  (1-\xi)
1188:         \exp \left[ - \frac{\left( h_1^k\right)^2}{2\sigma^2_1} - \frac{\left( h_2^k\right)^2}{2\sigma^2_2} + 1\right]\right\}.
1189: \end{eqnarray}
1190: \end{widetext}
1191: The values of $\xi$, $\alpha^2$, $\sigma_1^2$, and $\sigma_2^2$ which achieve the maximum in Eq.~(\ref{main result2})
1192: are, respectively, estimators of the signal's Gaussianity parameter,
1193: the variance of the signal events, 
1194: and the noise variances in the two detectors\footnote{See Ref.~\cite{MG9} for a derivation of a statistic similar to $\Lambda_\text{ML}^\text{NG}$ and
1195: designed for the same non-Gaussian signals which is based on Eq.~(\ref{known likelihood estimator}) rather
1196: than Eq.~(\ref{general likelihood estimator}).}.
1197: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1198: Note that if we evaluate Eq.~(\ref{main result2}) at $\xi=1$, rather than maximizing over $\xi$, 
1199: we recover Eq.~(\ref{long Gaussian stat}) and the statistic $\Lambda_\text{ML}^\text{G}$.
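In practice the maximization in Eq.\ (\ref{main result2}) must be done numerically. The Python sketch below evaluates the logarithm of the product at trial parameters, using \texttt{numpy.logaddexp} to combine the event and no-event terms stably. It then maximizes over a coarse, user-supplied grid. A production implementation would presumably use a continuous optimizer instead. The grid search, the helper names, the scale factors applied to $\bar\sigma_i$, and the definition $\bar\sigma_i^2 = N^{-1}\sum_k (h_i^k)^2$ are all assumptions made for illustration. Because only a finite grid is searched, the returned value is a lower bound on $\ln\Lambda_\text{ML}^\text{NG}$.

```python
from itertools import product

import numpy as np


def log_ng_ratio(h1, h2, xi, alpha, s1, s2):
    """ln of the product over k in Eq. (main result2) at fixed trial values
    of (xi, alpha, sigma_1, sigma_2).  Hypothetical helper; bar-sigma_i is
    assumed to be the sample rms of h_i."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    log_bar = 0.5 * np.log(np.mean(h1 ** 2) * np.mean(h2 ** 2))  # ln(bar_s1 bar_s2)
    a2, v1, v2 = alpha ** 2, s1 ** 2, s2 ** 2
    quiet = -h1 ** 2 / (2 * v1) - h2 ** 2 / (2 * v2) + 1.0
    big_a = 1 / v1 + 1 / v2 + 1 / a2
    big_b = h1 / v1 + h2 / v2
    # event term of Eq. (main result2), in logarithmic form
    log_event = (np.log(xi) + log_bar - 0.5 * np.log(v1 * v2 + v1 * a2 + v2 * a2)
                 + big_b ** 2 / (2 * big_a) + quiet)
    # no-event term; xi = 1 gives log(0) = -inf, which logaddexp handles
    with np.errstate(divide="ignore"):
        log_quiet = np.log1p(-xi) + log_bar - np.log(s1 * s2) + quiet
    return float(np.sum(np.logaddexp(log_event, log_quiet)))


def lambda_ng_grid(h1, h2, xis, alphas, scales=(0.8, 0.9, 1.0)):
    """Coarse grid maximization; sigma_i is searched as scale * bar-sigma_i."""
    bar1 = np.sqrt(np.mean(np.square(h1)))
    bar2 = np.sqrt(np.mean(np.square(h2)))
    best_val, best_par = -np.inf, None
    for xi, a, c1, c2 in product(xis, alphas, scales, scales):
        val = log_ng_ratio(h1, h2, xi, a, c1 * bar1, c2 * bar2)
        if val > best_val:
            best_val, best_par = val, (xi, a, c1 * bar1, c2 * bar2)
    return best_val, best_par


rng = np.random.default_rng(2)
n = 5000
s = np.where(rng.random(n) < 0.05, 3.0 * rng.standard_normal(n), 0.0)
h1, h2 = s + rng.standard_normal(n), s + rng.standard_normal(n)
val, par = lambda_ng_grid(h1, h2, xis=(0.02, 0.05, 0.1, 0.5, 1.0),
                          alphas=(1e-8, 1.0, 2.0, 3.0, 4.0))
print(val, par)
```

With the assumed definition of $\bar\sigma_i$, letting $\alpha \to 0$ with $\sigma_i = \bar\sigma_i$ reduces each factor in Eq.\ (\ref{main result2}) to $\exp[-(h_1^k)^2/2\bar\sigma_1^2 - (h_2^k)^2/2\bar\sigma_2^2 + 1]$. The product of these factors over $k$ is exactly one. This provides a convenient sanity check, and it also shows that the maximum is never below zero in the logarithm.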
1200: 
1201: We mention in passing an approximate version of the statistic
1202: (\ref{main result2}) which is significantly easier to compute.
1203: Expanding the logarithm of the 
1204: quantity to be maximized in Eq.~(\ref{main result2}) as a power 
1205: series in $\alpha^2$ to fourth order about $\alpha^2=0$ yields the 
1206: approximate statistic $\hat\Lambda_\text{ML}^\text{NG}$ given by 
1207: \begin{eqnarray} \label{expanded}
1208: \ln \hat\Lambda_\text{ML}^\text{NG} &=& \max_{0<\xi\le 1}~ \max_{\alpha > 0}~ \max_{\sigma_1 \ge 0}~ \max_{\sigma_2 \ge 0}~
1209:                         \sum_{n=0}^4 
1210:                         \sum_{l = 0}^{8} 
1211:                         \sum_{m=0}^8 
1212:                         \left(\frac{\alpha^2}{\sigma_1 \sigma_2}\right)^n \nonumber \\
1213: &\times&		C_{nlm}\left(\xi,\sigma_1,\sigma_2\right) 
1214: 			\sum_{k=1}^N (h_1^k)^l (h_2^k)^m,
1215: \end{eqnarray} 
1216: where the coefficients $C_{nlm}(\xi,\sigma_1^2,\sigma_2^2)$ 
1217: %are tabulated in Appendix \ref{s:coeffs}.  These coefficients 
1218: vanish
1219: unless $l+m$ is even and $l+m \le 8$. 
1220: In evaluating the statistic (\ref{expanded}), one can first evaluate the 24 sums
1221: \begin{equation} 
1222: \sum_{k=1}^N (h_1^k)^l (h_2^k)^m
1223: \end{equation} 
1224: for the required values of $l$ and $m$, and subsequently numerically maximize over the parameters $\xi$, 
1225: $\alpha$, $\sigma_1$, and $\sigma_2$. Thus the length-$N$ sums need only be performed once, rather than each time one tries a new 
1226: set of values for $\xi$, $\alpha$, 
1227: $\sigma_1$, and $\sigma_2$. Therefore the computational cost of $\hat\Lambda_\text{ML}^\text{NG}$ is only about an order of 
1228: magnitude greater than that of the cross correlation statistic
1229: $\Lambda_\text{CC}$, and this statistic may be useful to explore.
1230: 
1231: 
1232: 
1233: 
1234: We now derive the signal-to-noise ratio $\rho$ for the cross-correlation
1235: statistic and for the non-Gaussian signal (\ref{signal model}).
1236: If the signal is present, then from Eqs.\ (\ref{detector output
1237: matrices}), (\ref{assumption4}),
1238: (\ref{baralphadef}), (\ref{gaussian estimator1}) and (\ref{signal model})
1239: the expected value
1240: of $\bar\alpha^2$ is
1241: \begin{equation} \label{expected value 2}
1242: \left< \bar \alpha^2 \right> = \xi\alpha^2.
1243: \end{equation}
1244: If no signal is present then the fluctuations in $\bar\alpha^2$ are given by
1245: \begin{equation} \label{fluctuations 2}
1246: \Delta \left( \bar \alpha^2 \right) = \frac{\sigma_1\sigma_2}{\sqrt{N}}.
1247: \end{equation}
1248: Therefore, taking the ratio of Eqs.\ (\ref{expected value 2}) and
1249: (\ref{fluctuations 2}), the signal-to-noise ratio $\rho$ is
1250: \begin{equation} \label{rho2}
1251: \rho = \frac{\xi\alpha^2\sqrt{N}}{\sigma_1\sigma_2}.
1252: \end{equation}
1253: 
1254: 
1255: \section{Performance comparison}
1256: \label{s:Performance comparison}
1257: 
1258: 
1259: In this section we compare the performances of the cross-correlation
1260: statistic (\ref{standard cross corr}), the burst statistic
1261: (\ref{eq:lambdaBdef}), and the maximum
1262: likelihood statistic (\ref{main result2})
1263: for our model non-Gaussian signal described in Sec.~\ref{ss:Non-Gaussian signal}.   
1264: The comparison is quantified in terms of the false alarm versus false
1265: dismissal curves, as discussed in Sec.\ \ref{s:General theory of detection
1266: statistics and parameters estimator} above.
1267: In Sec.\ \ref{ss:analytic} we discuss analytic predictions for these curves
1268: for the three different statistics.  Section \ref{ss:Description of the
1269: simulation algorithm} describes our Monte Carlo simulation algorithm,
1270: and Secs.\ \ref{ss:Results for detection} and \ref{ss:Results for
1271:   parameter estimation} describe the results.
1272: 
1273: 
1274: 
1275: 
1276: \subsection{Analytic computation of asymptotic behavior of statistics}
1277: \label{ss:analytic}
1278: 
1279: 
1280: We start by discussing the set of parameters on which the false dismissal versus false 
1281: alarm curves can depend.  As before, we assume two detectors with noise characterized by
1282: Eq.~(\ref{assumption2}) with $\mathcal{V}_n=(\sigma_1,\sigma_2)$,  and a non-Gaussian 
1283: signal characterized by Eqs.~(\ref{assumption4}) and (\ref{signal prior}) with 
1284: $\mathcal{V}_s=(\xi,\alpha)$.
1285: The curves for each statistic are given by some function
1286: \begin{equation}\label{dependance1}
1287: P_\text{FD} = P_\text{FD}(P_\text{FA},\xi,\alpha,\sigma_1,\sigma_2,N)
1288: \end{equation}
1289: of the false alarm probability $P_{\rm FA}$, the Gaussianity parameter
1290: $\xi$, the rms amplitude $\alpha$ of events, the noise variances
1291: $\sigma_1^2$ and $\sigma_2^2$, and the number of data points $N$.
1292: We can simplify Eq.~(\ref{dependance1}) by replacing $\alpha$ with the
1293: signal-to-noise ratio $\rho$ using the definition (\ref{rho2}), and
1294: noting from dimensional analysis that $P_\text{FA}$ depends on $\sigma_1$ 
1295: and $\sigma_2$ at fixed $\rho$ only through the ratio
1296: $\sigma_1/\sigma_2$.  This gives  
1297: \begin{equation}\label{dependance2}
1298: P_\text{FD} = P_\text{FD}(P_\text{FA},\xi,\rho,\sigma_1/\sigma_2,N).
1299: \end{equation}
1300: For simplicity, we specialize to $\sigma_1=\sigma_2$ for the remainder of this paper.  
1301: This implies that
1302: \begin{equation}\label{dependance3}
1303: P_\text{FD} = P_\text{FD}(P_\text{FA},\xi,\rho,N).
1304: \end{equation}
1305: 
1306: 
1307: \subsubsection{Cross correlation statistic}
1308: 
1309: 
1310:  
1311: The false dismissal versus false alarm curves for the cross-correlation statistic can be computed 
1312: analytically in the large $N$ limit, as we now describe.  Our derivation generalizes the analysis of 
1313: Ref.~\cite{Allen Romano} from Gaussian to non-Gaussian signals. For any detection statistic $\Gamma$, 
1314: we can express $P_\text{FA}$ and $P_\text{FD}$ in terms of the detection threshold $\Gamma_*$ as
1315: \begin{eqnarray} 
1316: P_\text{FA}(\Gamma_*,\sigma_1,\sigma_2,N) &=& 
1317:          \int_{\Gamma_*}^\infty dx~p_{\Gamma|\mathcal{T}}(x|0) , \label{simple Pfa}\\
1318: P_\text{FD}(\Gamma_*,\xi,\rho,\sigma_1,\sigma_2,N) &=& 
1319:          1 - \int_{\Gamma_*}^\infty dx~p_{\Gamma|\mathcal{T}}(x|1)\nonumber  . \label{simple Pfd}\\
1320: \end{eqnarray}
1321: Here the definition of the random variable $\mathcal{T}$ is such that
1322: if $\mathcal{T}=0$ then no signal is present ($\xi = \rho = 0$), and  
1323: if $\mathcal{T}=1$ then a signal is present ($\xi \ne 0$ and $\rho \ne
1324: 0$); cf.\ Sec.\ \ref{ss:Notational conventions} above.
1325: Note that by eliminating $\Gamma_*$ between
1326: Eqs.~(\ref{simple Pfa}) and (\ref{simple Pfd}), we recover Eq.~(\ref{dependance1}).
1327: 
1328: In the large $N$ limit, the distribution
1329: $p_{\Lambda_\text{CC}|\mathcal{T}}(x|t)$ is a Gaussian by the
1330: central limit theorem, and the integrals
1331: (\ref{simple Pfa}) and (\ref{simple Pfd}) can be evaluated
1332: analytically (see Appendix \ref{s:appendixB}) to give
1333: \begin{eqnarray} \label{analytic}
1334: &&P_\text{FD} \left(P_\text{FA},\xi,\rho,N \right) =  1  \\
1335: &&-\frac{1}{2} \erfc \left[ \frac{\displaystyle \erfc^{-1} \left(2P_\text{FA}\right) - \frac{\rho}{\sqrt{2}} }  
1336:                            {\sqrt{\displaystyle \frac{\rho^2}{N}\left( \frac{3}{\xi} - 1 \right) + \frac{2\rho}{\sqrt{N}} + 1}} 
1337: 		      \right] + O \left( {1 \over \sqrt{N}} \right). \nonumber
1338: \end{eqnarray} 
1339: Here the function $\erfc(x)$ (known as the compliment of the error
1340: function) is defined by 
1341: \begin{equation} 
1342: \erfc(x) = \frac{2}{\sqrt{\pi}}\int_x^\infty dy~e^{-y^2},
1343: \end{equation}
1344: and $\erfc^{-1}(x)$ is the inverse of $\erfc(x)$.  
1345: The formula (\ref{analytic}) is valid only for $P_{\rm FA} < 1/2$; 
1346: $P_\text{FD}$ is undefined for $1/2 \le P_\text{FA} < 1$.  
1347: In deriving Eq.~(\ref{analytic}), we assumed 
1348: that the statistics $\Lambda_\text{CC}$ and $\hat\alpha^2$ 
1349: are equivalent, and that the distribution for $\bar\alpha^2$ is
1350: Gaussian. Those assumptions  
1351: are only valid up to fractional correction terms of order
1352: $1/\sqrt{N}$; hence the indicated correction term in Eq.\ (\ref{analytic}).
1353: 
1354: 
1355: In the regime where $\rho^2\ll N \xi$ in addition to $N \gg 1$, the
1356: result (\ref{analytic}) simplifies to 
1357: \begin{eqnarray} 
1358: P_\text{FD} \left(P_\text{FA},\xi,\rho,N \right) &=&  1 - \frac{1}{2} 
1359: \erfc \left[ \erfc^{-1} \left(2P_\text{FA}\right) -
1360: \frac{\rho}{\sqrt{2}} \right]  \nonumber \\ 
1361: &+& O\left( \frac{1}{ \sqrt{N} }\right) + O\left({ \rho \over
1362: \sqrt{N}} \right) + O\left({ \rho^2 \over
1363: N \xi } \right). \nonumber \\
1364:  \label{specialized analytic}
1365: \end{eqnarray}
1366: Note that the false dismissal versus false alarm relation 
1367: (\ref{specialized analytic}) is independent of both $N$ and $\xi$.
1368: Sample curves from Eq.~(\ref{specialized analytic}) are shown in
1369: Fig.~\ref{analytical curves}.   
1370: \begin{figure}
1371: \begin{center}
1372: \epsfig{file=analytical.eps,width=8.5cm}
1373: \caption{Sample false dismissal versus false alarm curves for
1374: the cross correlation statistic $\Lambda_\text{CC}$ 
1375: in the large $N$ limit, as prescribed by Eq.~(\ref{specialized
1376:   analytic}).  For these curves  
1377: the signal-to-noise ratio $\rho$ has equally spaced values from 0.01
1378: to 1. Note that here $P_\text{FD}$  
1379: is undefined for $1/2 \le P_\text{FA} <  1$.}
1380: \label{analytical curves}
1381: \end{center}
1382: \end{figure}
1383: The discontinuities at $P_\text{FA} = 1/2$ are a result of the step
1384: functions in the definition (\ref{gaussian estimator1})
1385: of $\hat\alpha^2$.   
1386: 
1387: 
1388: \subsubsection{Burst statistic}
1389: 
1390: By combining the definition (\ref{eq:lambdaBdef}) of the burst
1391: statistic together with the decomposition (\ref{eq:common}), the
1392: noise and signal distributions (\ref{assumption2}) and (\ref{signal
1393: prior}), and the change of variables (\ref{rho2})
1394: it is straightforward to derive the exact false alarm versus
1395: false dismissal relation.  The result is given by
1396: \begin{eqnarray}
1397: (1 - P_{\rm FA})^{1/N} = {\rm erf}\left({\Lambda_* \over \sqrt{2}}\right)
1398: \label{burstans1}
1399: \end{eqnarray}
1400: and
1401: \begin{equation} 
1402: P_{\rm FD}^{1/N} = \xi \, {\rm erf} \left[ { \Lambda_* \over \sqrt{
1403:        2 + {2 \rho \over \xi \sqrt{N}} }} \right] + (1 -
1404:       \xi) \, {\rm erf} \left( {\Lambda_* \over \sqrt{2}} \right),
1405: \label{burstans2}
1406: \end{equation}
1407: where $\Lambda_*$ is the value of the threshold.
1408: 
1409: \subsubsection{Maximum likelihood statistic}
1410: \label{s:MLS}
1411: 
1412: 
1413: We start by discussing the
1414: different regimes present in the space of signal 
1415: parameters $\xi$, $\rho$ and $N$, treating the 
1416: false alarm probability $P_\text{FA}$ as fixed.  There are several
1417: different constraints  
1418: on the three parameters $\xi$,
1419: $\rho$, and $N$ that define the regime in parameter space where we
1420: expect our maximum likelihood statistic to work well. First, it is clear that the total 
1421: number of events $\sim \xi N$ in the data set must be large compared to one:
1422: \begin{equation}
1423: \xi \gg \frac{1}{N}.
1424: \label{eq:constraint2a}
1425: \end{equation}
1426: 
1427: 
1428: Second, if the signal-to-noise ratio $\alpha^2 / (\sigma_1 \sigma_2)$
1429: of individual burst events is large compared to one, then one can detect the
1430: individual events using the burst statistic (\ref{eq:lambdaBdef})
1431: and the method of this paper is not needed.  From
1432: Eq.\ (\ref{rho}) we can write the constraint $\alpha^2 / (\sigma_1
1433: \sigma_2) \alt 1$ as
1434: \begin{equation}
1435: \xi \agt \frac{\rho}{\sqrt{N}}.
1436: \end{equation}
1437: A more precise version of this requirement can be obtained by noting
1438: that the detection threshold for the signal-to-noise ratio 
1439: $\alpha^2 / (\sigma_1 \sigma_2)$ is $\sim \sqrt{2 \ln N}$, since there
1440: are $N$ independent trials.  This yields the constraint
1441: \begin{equation}
1442: \xi \agt \frac{\rho}{\sqrt{2 N \ln N}}.
1443: \label{eq:constraint2}
1444: \end{equation}
1445: The regime $\xi \sim \rho / \sqrt{2 N \ln N}$ is where the 
1446: burst statistic $\Lambda_\text{B}$ starts becoming as sensitive as the
1447: cross correlation statistic, as can be seen by combining Eqs.\
1448: (\ref{specialized analytic}), (\ref{burstans1}) and (\ref{burstans2})
1449: above.  This behavior can also be seen  
1450: in Figs. \ref{omega gain} and \ref{fig:theoretical} above.
1451: 
1452: 
1453: A third constraint on the space of signal parameters is derived as
1454: follows.  Consider the statistic 
1455: \begin{equation}
1456: \eta = \frac{1}{N}\sum_{k=1}^N (h_1^k)^2(h_2^k)^2.
1457: \label{etadef}
1458: \end{equation} 
1459: We can use this statistic to estimate the Gaussianity parameter $\xi$
1460: in the following way.
1461: The mean value of $\eta$ when a signal is present is given by
1462: \begin{equation} \label{eta mean}
1463: \left< \eta \right> = 3\xi\alpha^4 + \xi\alpha^2(\sigma_1^2+\sigma_2^2) + \sigma_1^2 \sigma_2^2,
1464: \end{equation} 
1465: and the variance when a signal is absent is 
1466: \begin{equation} \label{eta var}
1467: (\Delta \eta)^2 = \sigma^4_1\sigma^4_2 \frac{8}{N}.
1468: \end{equation} 
1469: It follows from Eqs.\ (\ref{eta mean}), (\ref{eta var}), and the
1470: relation $\left<
1471: \hat\alpha^2 \right> = \xi\alpha^2$ that the estimator ${\hat \xi}$ of
1472: $\xi$ defined by
1473: \begin{equation}
1474: \hat \xi = \frac{3 \hat\alpha^4}{\eta - \hat\alpha^2 (\hat\sigma_1^2 + \hat\sigma_2^2) - \hat\sigma_1^2\hat\sigma_2^2} 
1475: \end{equation} 
1476: has a fractional accuracy of order
1477: \begin{equation}
1478: \frac{\Delta \xi}{\xi} \sim \frac{\xi \sqrt{N}}{\rho^2}.
1479: \label{Deltaxi}
1480: \end{equation}
1481: Now in the regime $\Delta \xi / \xi \ll 1$, we expect our maximum
1482: likelihood detection statistic to work well, since one's first guess
1483: for a nonlinear statistic (\ref{etadef}) can be used to detect the
1484: non-Gaussianity of the signal to high accuracy.
1485: In the regime $\Delta \xi / \xi \gg 1$, it is not obvious how the
1486: maximum likelihood detection statistic will perform, since it could
1487: have a performance much better than that of the statistic $\eta$.
1488: However, our Monte Carlo simulations [Sec.\ \ref{ss:Description of the
1489: simulation algorithm} below] and analytic 
1490: computations [Appendix 
1491: \ref{s:appendixC}] indicate that the maximum likelihood statistic does
1492: indeed perform poorly in the regime $\Delta \xi / \xi \gg 1$.
1493: Thus, our third constraint is $\Delta \xi / \xi \alt 1$, which from
1494: Eq.\ (\ref{Deltaxi}) can be written as 
1495: \begin{equation}
1496: \xi \alt \frac{\rho^2}{\sqrt{N}}.
1497: \label{constraint3}
1498: \end{equation}
1499: Our Monte Carlo simulations show that for $\rho^2/\sqrt{N} \alt \xi
1500: \alt 1$, the maximum likelihood and cross-correlation statistics
1501: perform roughly equivalently, and that once $\xi$ becomes smaller than
1502: $\rho^2 / \sqrt{N}$, the maximum likelihood statistic starts to
1503: perform significantly better than the cross-correlation statistic; see 
1504: Figs. \ref{omega gain} and \ref{fig:theoretical} above.
1505: 
1506: 
1507: In Appendix \ref{s:appendixC} we derive analytically the approximate
1508: expression (\ref{eq:ansA}) for the false dismissal
1509: probability for the maximum likelihood statistic, which we expect to
1510: be accurate up to corrections of order $1/\rho^4$ or a few tens of
1511: percent.  We also derive the expression (\ref{eq:ansB}) for the false
1512: alarm probability using a combination of analytical and numerical
1513: techniques.  Combining these results gives the curves which are associated with the
1514: maximum likelihood statistic $\Lambda_\text{ML}^\text{NG}$ and labeled ``analytic'' in 
1515: Figs.~\ref{omega gain}, \ref{fig:theoretical}, \ref{detection curves}, and \ref{LargeN}.
1516: 
1517: 
1518: 
1519: 
1520: \subsection{Description of the Monte Carlo simulation algorithm}
1521: \label{ss:Description of the simulation algorithm}
1522: 
1523: Next we describe our Monte Carlo simulations of the performances of the various statistics.
1524: We numerically estimate the false dismissal and false alarm probabilities
1525: $P_\text{FD}$ and $P_\text{FA}$ by conducting an ensemble
1526: of $N_E$ simulated experiments.  
1527: For each experiment we simulate a detector output matrix, half of which 
1528: have a signal present, and half of which do not.  Since we know in advance whether or not a signal is present, we can 
1529: easily estimate $P_\text{FA}$ and $P_\text{FD}$.  More specifically,
1530: our algorithm for simulating false  
1531: dismissal versus false alarm curves, for an arbitrary statistic
1532: $\Gamma$, is as follows:
1533: \begin{enumerate}
1534: \item Choose values for $\xi$, $\alpha$, $\sigma_1$, $\sigma_2$, and $N$.
1535: \item Choose the total number of trials $N_E$.
1536: \item For $r=1,2,\ldots,N_E/2$:
1537: 	\begin{enumerate}
1538:                 \item Generate a data train $h(\sigma_1,\sigma_2,N)$ of noise only.
1539:                 \item Compute $\Gamma$ and store result as $\Gamma_{r0}$.
1540: 		\item Generate a data train $h(\xi,\alpha,\sigma_1,\sigma_2,N)$ which has a signal present.
1541: 		\item Compute $\Gamma$ and store result as $\Gamma_{r1}$.
1542: 	\end{enumerate}
1543: \item Choose a discretization $\Gamma_{*j}$ of the set of thresholds,
1544: where $j=1,2,\ldots,M$.
1545: \item Set $P_\text{FA}(\Gamma_{*j}) = P_\text{FD}(\Gamma_{*j}) = 0$, for each $j$.
1546: \item For $r=1,2,\ldots,N_E/2$:
1547: 	\begin{enumerate}
1548: 		\item for each $j$, if $\Gamma_{r0} > \Gamma_{*j}$, increment $P_\text{FA}(\Gamma_{*j})$ by $2/N_E$.
1549: 		\item for each $j$, if $\Gamma_{r1} \le \Gamma_{*j}$, increment $P_\text{FD}(\Gamma_{*j})$ by $2/N_E$.
1550: 	\end{enumerate}
1551: \item Repeat steps 3-6 above several times to estimate the fluctuations in $P_\text{FA}(\Gamma_{*j})$ and $P_\text{FD}(\Gamma_{*j})$.
1552: \end{enumerate}
1553: 
1554: We use the above algorithm to simulate false dismissal versus false
1555: alarm curves for the three statistics $\Lambda_\text{CC}$,
1556: $\Lambda_{\rm B}$ and 
1557: $\Lambda_\text{ML}^\text{NG}$.  The analytical expressions
1558: (\ref{analytic}) and (\ref{burstans1}) -- (\ref{burstans2}) for the
1559: cross-correlation and burst statistics are 
1560: used as a check of the numerical method.
1561: 
1562: 
1563: 
1564: 
1565: \subsection{Simulation results}
1566: \label{ss:Results for detection}
1567: 
1568: A family of simulated false dismissal versus false alarm curves for
1569: the cross correlation statistic $\Lambda_\text{CC}$ and
1570: the maximum likelihood statistic $\Lambda_\text{ML}^\text{NG}$ is  
1571: shown in Fig.~\ref{compare}. 
1572: \begin{figure}
1573: \begin{center}
1574: \epsfig{file=compare.eps,width=8.5cm}
1575: \caption{
1576: Plots of false dismissal probability ($P_\text{FD}$) versus false alarm
1577: probability ($P_\text{FA}$) for the standard cross-correlation
1578: statistic $\Lambda_\text{CC}$ and 
1579: our maximum likelihood statistic $\Lambda_\text{ML}^\text{NG}$.  Each
1580: of these curves is characterized by a total number of trials $N_E =
1581: 2\times 10^4$, number of data points $N = 5\times10^4$, noise variances
1582: $\sigma_1 = \sigma_2 = 1$, and by the signal-to-noise ratio $\rho = 1$.
1583: The values of the Gaussianity parameter $\xi$ are 0.02, 0.012, and
1584: 0.01.  The solid curves are the results for $\Lambda_{\rm ML}^{\rm
1585: G}$; these curves are bunched together because  
1586: $\rho$ is fixed.  The dashed curves are the results for $\Lambda_{\rm
1587: ML}^{\rm NG}$.  For the dashed curves, the lowest curve is for $\xi =
1588: 0.01$, while the highest curve is for $\xi = 0.02$. 
1589: We estimate error bars for each of these curves by separating the $2 \times
1590: 10^4$ runs into 10 bins of $2 \times 10^3$, and generating 10 separate
1591: plots; the resulting fluctuations are $\alt 10^{-3}$.  The curves for
1592: the cross correlation statistic $\Lambda_{\rm ML}^{\rm G}$ agree with
1593: the analytic prediction (\ref{analytic}) to within $\sim 10^{-3}$.
1594: This plot shows that $\Lambda_\text{ML}^\text{NG}$ can perform 
1595: significantly better than $\Lambda_\text{CC}$.
1596: }
1597: \label{compare}
1598: \end{center}
1599: \end{figure}
1600: We see that at fixed $\rho$, as the Gaussianity $\xi$ of the signal
1601: decreases, $\Lambda_\text{ML}^\text{NG}$ performs increasingly better
1602: than $\Lambda_\text{CC}$.  
1603: The curves for $\Lambda_\text{CC}$ are almost indistinguishable from
1604: each other because $\rho$ is fixed, and the curves depend only on
1605: $\rho$ and not on $\xi$ for this detection statistic in the large $N$
1606: limit [cf.\ Eq.\ (\ref{specialized analytic}) above].
1607: 
1608: If we maintain the same value for $\rho$ as in Fig.~\ref{compare}, but
1609: take $\xi \gtrsim 0.03$, the  
1610: curves for $\Lambda_\text{CC}$ and $\Lambda_\text{ML}^\text{NG}$ cannot be distinguished from each other.  
1611: We find in general that for \emph{any} values of $N$, $\sigma_1$, $\sigma_2$, and $\rho$, 
1612: as $\xi \rightarrow 1$, the false dismissal versus false alarm curves for $\Lambda_\text{CC}$ 
1613: and $\Lambda_\text{ML}^\text{NG}$ cannot be distinguished from each other.
1614: Thus, the two statistics are nearly equivalent for Gaussian
1615: signals, as expected.  However, for $\xi \ll 1$, 
1616: Fig.\ \ref{compare} demonstrates that $\Lambda_\text{ML}^\text{NG}$ performs noticeably better than
1617: $\Lambda_\text{CC}$. 
1618: 
1619: 
1620: We now discuss a comparison of the two statistics in terms of the
1621: minimum gravitational wave energy density necessary for detection,
1622: instead of in terms of the false dismissal versus false alarm curves.  
1623: For a stochastic background with rms strain amplitude 
1624: $h_\text{rms}$, we have $\Omega \propto h^2_\text{rms}$ \cite{Allen
1625: Review}, where $\Omega$ is the gravitational wave energy density.  
1626: For our model signal (\ref{signal model}) we have $h_{\rm rms}^2
1627: \propto \xi \alpha^2$, and comparing this with the 
1628: formula $\rho \propto \xi \alpha^2$ from Eq.\ (\ref{rho2}) shows that
1629: we can interpret the signal to 
1630: noise ratio $\rho$ as the energy density in the stochastic background,
1631: even for non-Gaussian signals.
1632: 
1633: 
1634: We compute the minimum detectable energy density or signal-to-noise
1635: ratio $\rho_{\rm detectable}$ as follows.  First, we choose 
1636: thresholds $P_{\text{FA}*}$ and $P_{\text{FD}*}$ for the false alarm
1637: and false dismissal probabilities.  We refer to the pair
1638: $(P_{\text{FA}*},P_{\text{FD}*})$ as the \emph{detection point}. 
1639: For any statistic $\Gamma$, the choice of detection point determines
1640: the detection threshold $\Gamma_*$, and inverting Eq.~(\ref{dependance3})
1641: gives the minimum detectable signal-to-noise ratio 
1642: \begin{equation} 
1643: \rho = \rho_\text{detectable}(P_{\text{FA}*},P_{\text{FD}*},\xi,N),
1644: \end{equation} 
1645: as illustrated in Fig.~\ref{rho detectable}. 
1646: \begin{figure}
1647: \begin{center}
1648: \epsfig{file=critical.eps,width=8.5cm}
1649: \caption{A family of false dismissal versus false alarm curves for fixed $\xi$.  
1650: Here the detection point, at $P_{\text{FD}*} = P_{\text{FA}*} = 0.1$, is marked with $*$.}
1651: \label{rho detectable}
1652: \end{center}
1653: \end{figure}
1654: For the cross-correlation statistic $\Lambda_\text{CC}$ the
1655: result is, from Eq.~(\ref{analytic}), 
1656: \begin{eqnarray}
1657: \rho_\text{detectable}^\text{CC} &=& 
1658: \frac{2\sqrt{2}\gamma \left[1 + \gamma\sqrt{2/N}\right]}{ 1 +
1659:   2\gamma^2\left(1 - \frac{3}{\xi} 
1660:   \right)/N } \left[ 1 + O\left( {1 \over \sqrt{N} }\right) \right] \nonumber \\
1661: & & \label{analytic detectable} \\
1662: &=& 2\sqrt{2}\gamma + O\left(\frac{1}{\sqrt{N}}\right) + O\left(
1663:      {\gamma \over \sqrt{N}} \right), \nonumber \\
1664: & & \label{specialized detectable}
1665: \end{eqnarray} 
1666: where $\gamma = \erfc^{-1}(2P_{\text{FA}*})$ and we have assumed that
1667: $P_{\text{FA}*}=P_{\text{FD}*}$.
1668: This relation is plotted in Fig.~\ref{min rho^G_detectable}. 
1669: 
1670: 
1671: \begin{figure}
1672: \begin{center}
1673: \epsfig{file=MinDetect.eps,width=8.5cm}
1674: \caption{The minimum detectable signal-to-noise ratio $\rho_\text{detectable}^\text{CC}$ for the cross-correlation statistic $\Lambda_\text{CC}$
1675: as a function of the false alarm probability threshold $P_{\text{FA}*}$. Note that we assume the false dismissal probability
1676: threshold $P_{\text{FD}*} = P_{\text{FA}*}$.}
1677: \label{min rho^G_detectable}
1678: \end{center}
1679: \end{figure}
1680: 
1681: From the results of our simulations, we determine 
1682: $\rho_\text{detectable}(P_{\text{FA}*},P_{\text{FD}*},\xi,N)$ by
1683: numerically solving the equation
1684: \begin{equation} \label{root}
1685: P_\text{FD}(P_{\text{FA}*},\xi,\rho,N) - P_{\text{FD}*} = 0
1686: \end{equation} 
1687: for $\rho$.
1688: Unfortunately, evaluating the function on the left hand side of
1689: Eq.~(\ref{root}) 
1690: is computationally expensive.  Each evaluation involves simulating the
1691: false dismissal versus false alarm curve which 
1692: is itself a computationally intensive task.  Moreover, 
1693: it is only feasible for us to solve Eq.~(\ref{root}) for values of $N$
1694: $\alt 10^4$ while 
1695: a realistic detection scenario for ground based detectors would
1696: involve a year's worth of data sampled at $\sim 100$ 
1697: Hz for which $N\sim 10^9$.  Therefore our conclusions about the
1698: applicability of the method to ground based detectors are based on our
1699: analytic results, as discussed in the Introduction.
1700: 
1701: 
1702: 
1703: Figure \ref{detection curves} shows the results obtained from
1704: numerically solving Eq.~(\ref{root}) for $\rho_{\rm detectable}$ for
1705: the parameter values $\xi = 0.02$, $P_{\text{FA}*} = P_{\text{FD}*} = 0.1$,
1706: and Fig.~\ref{LargeN} shows the corresponding results for $\xi = 4.3
1707: \times 10^{-3}$.  For the cross-correlation statistic, the results are
1708: in good agreement with the analytic prediction 
1709: (\ref{analytic detectable}).
1710: \begin{figure}
1711: \begin{center}
1712: \epsfig{file=detection.eps,width=8.5cm}
1713: \caption{The minimum detectable signal strength
1714:   $\rho_\text{detectable}$ as a function of
1715: the number of data points $N$,
1716: for the false alarm probability threshold $P_{\text{FA}*}=0.1$, false
1717: dismissal probability threshold $P_{\text{FD}*} = 0.1$, and
1718: Gaussianity parameter $\xi = 0.02$.
1719: The circles are the simulation results, and the error bars 
1720: are estimated from ten
1721: different runs.  The solid curve is the analytical prediction (\ref{analytic detectable}) for 
1722: $\Lambda_\text{CC}$, and the dotted line is the $N \to \infty$ limit
1723:   (\ref{specialized detectable}).
1724: The dashed line is the analytic prediction for $\Lambda_{\rm
1725:   ML}^{\rm NG}$ given by Eqs.\ (\ref{eq:ansA}) and (\ref{eq:ansB}).
1726: }
1727: \label{detection curves}
1728: \end{center}
1729: \end{figure}
1730: \begin{figure}
1731: \begin{center}
1732: \epsfig{file=LargeN.eps,width=8.5cm}
1733: \caption{Same as Fig.\ \protect{\ref{detection curves}} but with $\xi
1734:   = 4.3 \times 10^{-3}$.
1735: }
1736: \label{LargeN}
1737: \end{center}
1738: \end{figure}
1739: 
1740: 
1741: Figure \ref{omega gain} shows the minimum detectable energy density 
1742: as a function of the Gaussianity parameter $\xi$ for $N=10^4$ (corresponding to space based detectors), for the
1743: cross-correlation and maximum likelihood statistics and also for the
1744: burst statistic (\ref{eq:lambdaBdef}).  We again use the values   
1745: $P_{\text{FA}*} =  P_{\text{FD}*} = 0.1$.  The figure shows that the
1746: maximum likelihood statistic performs better than the other statistics
1747: by a factor which is roughly 3 for $\xi$ of order 1\%.  For smaller values
1748: of $\xi$, the maximum likelihood performs increasingly better than the 
1749: cross-correlation statistic, but is eventually comparable to the burst statistic.
1750: Thus the maximum likelihood statistic gives an improvement in sensitivity to backgrounds
1751: composed of roughly $10$ to $10^{3}$ events per year.
1752: 
1753: Figure \ref{fig:theoretical} is a similar plot, without the Monte Carlo simulation results,
for $N = 10^9$ (corresponding to ground-based detectors). Here we use $P_{\text{FA}*} =  P_{\text{FD}*} = 0.01$.
The results are similar to those in Fig.~\ref{omega gain}, except that here the gain in sensitivity
occurs in the band $10^{-5} < \xi < 10^{-3}$, which corresponds to $10^4$ to $10^6$ events per year.
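The conversions between $N$, $\xi$ and event counts quoted above are simple arithmetic, sketched below in Python. The sketch assumes, in keeping with the signal model (\ref{signal model}), that each of the $N$ samples independently contains a burst with probability $\xi$. The expected number of events in the data set is then $\xi N$. The helper names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, about 3.16e7 s


def samples_per_year(sample_rate_hz):
    """Number of data points N in one year of data at the given sampling rate."""
    return SECONDS_PER_YEAR * sample_rate_hz


def expected_events(xi, n):
    """Expected number of bursts when each sample holds one with probability xi."""
    return xi * n


print(f"{samples_per_year(100.0):.2e}")   # 3.16e+09, i.e. N ~ 10^9
for xi in (1e-5, 1e-3):                   # edges of the ground-based band
    print(f"xi = {xi:g}: {expected_events(xi, 1e9):.0e} events")
```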
1757: 
1758: 
1759: 
1760: 
1761: 
1762: 
1763: \subsection{Parameter estimation}
1764: \label{ss:Results for parameter estimation}
1765: 
1766: 
1767: The computation of the maximum likelihood statistic also serves to
1768: measure the parameters of the signal.
1769: The statistic $\Lambda_\text{ML}^\text{NG}$, from Eq.~(\ref{main result2}), can be written as
1770: \begin{equation} \label{main result simple form}
1771: \Lambda_\text{ML}^\text{NG} = \max_{0<\xi\le 1}~ \max_{\alpha^2>0}~ 
1772: 			\max_{\sigma^2_1 \ge 0}~ \max_{\sigma^2_2 \ge 0}~ 
1773: 			\lambda(\xi, \alpha^2, \sigma^2_1,\sigma^2_2).
1774: \end{equation} 
1775: The point $(\hat \xi,\hat \alpha^2,\hat \sigma_1^2,\hat \sigma^2_2)$ where this maximum is achieved 
1776: is the maximum likelihood estimator for $(\xi,\alpha^2,\sigma_1^2,\sigma^2_2)$. In Fig.~\ref{contours} 
1777: we show contours of the function $\ln \lambda$ for a strong ($\rho = 20$) signal.
1778: \begin{figure}
1779: \begin{center}
1780: \epsfig{file=contour.eps,width=8.5cm}
1781: \caption{
1782: Representative contours of $\ln \lambda(\xi,\alpha^2,\hat \sigma_1^2,\hat \sigma_2^2)$.
1783: Here $\rho= 20$ and $N = 1.6 \times 10^5$.  The simulated signal is characterized by $\xi = 0.2$ and 
1784: $\alpha^2 = 0.25$, marked with an $\times$.  The noise is characterized by $\sigma_1^2 = \sigma_2^2 = 1$.
1785: The maximum, marked with a $+$, is found at $\ln \lambda(0.207, 0.251, 0.993, 0.993) = 229$, 
1786: while $\ln \lambda(0.2,0.25,1,1) = 227$.  
1787: }
1788: \label{contours}
1789: \end{center}
1790: \end{figure}
1791: This figure shows that both $\xi$ and $\alpha^2$ can be measured with
1792: good accuracy. 
1793: 
1794: Note that the main benefit of using $\Lambda_{\rm ML}^{\rm NG}$
1795: is that it allows  
1796: us to detect signals that are too weak to be seen using $\Lambda_\text{CC}$.  Using 
1797: $\Lambda_\text{ML}^\text{NG}$ also allows one to test if a detected signal is Gaussian, as obtained 
1798: above, but this is not the main benefit of the method, as there are other, simpler, methods to test 
1799: for non-Gaussianity.
1800: 
1801: \section{Conclusions}
1802: \label{s:Conclusions}
1803: 
1804: The use of our maximum likelihood statistic in searches for a
1805: non-Gaussian background gives a gain in sensitivity over the
1806: standard cross-correlation statistic.  Figures \ref{omega gain} and \ref{fig:theoretical} show
1807: that the gain factor can be significant for sufficiently non-Gaussian signals.
1808: However, computing the maximum likelihood statistic requires significantly more
1809: computational power than the cross-correlation statistic. 
1810: 
1811: 
1812: 
1813: The analysis presented here must be generalized in several ways before
1814: being usable in gravitational wave detectors.  These generalizations,
1815: listed in order of importance, are:
1816: 
1817: \begin{itemize}
1818: 
1819: \item Our signal model (\ref{signal model}) assumes a Gaussian
1820: distribution of amplitudes of the burst events.  This assumption simplified
1821: our analysis and resulted in a statistic with the useful property of 
1822: being nearly equivalent to the cross-correlation statistic in the Gaussian 
1823: signal limit.  In practice however, the distribution of the events
1824: should instead be based on the candidate sources.  For
1825: example, a popcorn-like stochastic background produced by a spatially
1826: uniform distribution 
1827: of standard-candle sources out to some maximum redshift would have a
1828: signal distribution of the form (\ref{signal model}) with the Gaussian
1829: term replaced by a term proportional to $s^{-4}\theta(s-s_{\min})$,
1830: where $\theta$ is the step function and $s_{\rm min}$ is a cutoff
1831: signal strength.
1832: 
1833: \item One should allow the burst durations to be longer than the
1834: detector resolution time.  For this situation one possibility would
1835: be to preprocess the data with a lowpass filter, and then apply
1836: the techniques developed here.  Another possibility would be to try to
1837: combine the analysis of this paper with the excess power detection
1838: method of Ref.\ \cite{excess power}.
1839: 
1840: \item Real detector noise always contains non-Gaussian components, so
1841: one needs to generalize the analysis to allow for this.  Such a
1842: generalization for a Gaussian stochastic background can be found in
1843: Refs.\ \cite{robust gaussian,robust gaussian II}.
1844: 
1845: \item It would be useful to consider a more general signal model which
1846: consists of a superposition of a Gaussian background and a
1847: non-Gaussian background, since the true gravitational wave background
1848: might consist of such a superposition.
1849: 
1850: \item The analysis needs to be generalized to allow for colored
1851: detector noise, and separated, misaligned detectors.  This
1852: generalization should be fairly straightforward.
1853: 
1854: \end{itemize}
1855: 
1856: 
1857: 
1858: 
1859: \begin{acknowledgments}
1860: We thank Wolfgang Tichy, Tom Loredo, Teviet Creighton, and Bernard Whiting for helpful
1861: discussions, and the web site {\it google.com} for providing useful
1862: references on the generalized central limit theorem.
1863: The analytic computations in Appendix \ref{s:appendixC}
1864: were carried out using the software package {\it Mathematica}.
1865: This work was supported in part by National Science
1866: Foundation awards PHY-9722189 and PHY-0140209, the Alfred P. Sloan
1867: foundation, the Radcliffe Institute for Advanced Study, and the
1868: NASA/New York Space Grant Consortium.  
1869: \end{acknowledgments}
1870: 
1871: \appendix
1872: 
1873: \section{General form of the likelihood ratio}
1874: \label{s:appendixA}
1875: 
1876: In this appendix we give two derivations of the general formula
1877: (\ref{general likelihood ratio}) for the likelihood ratio. 
1878: The first derivation is based on Eq.~(\ref{def1}) while the second is based on Eq.~(\ref{def2}).  
1879: We also derive the formula (\ref{distribution relation}) for the posterior probability density 
1880: $p^{(1)}_{\mathcal{V}_s,\mathcal{V}_n|\mathcal{T}}({\bf v}_s,{\bf v}_n|1)$.
1881: 
1882: \subsection{First derivation}
1883: \label{ss:First derivation}
1884: 
1885: We can derive Eq.~(\ref{general likelihood ratio}) by using the total probability theorem to 
1886: expand the distributions in the numerator and denominator of Eq.~(\ref{def1}). Note that all 
1887: distributions in this derivation are priors.  
1888: 
1889: First expand $p_\mathcal{H}(h)$ just in terms of the random variable
1890: $\mathcal{T}$ 
1891: \begin{equation} \label{expansion goal} 
1892: p_\mathcal{H}(h) = P_\mathcal{T}(1)p_\mathcal{H|T}(h|1) 
1893:                  + P_\mathcal{T}(0) p_\mathcal{H|T}(h|0).
1894: \end{equation} 
1895: Expanding $p_\mathcal{H}(h)$ in terms of all the degrees of freedom yields
1896: \begin{eqnarray} 
1897: && p_\mathcal{H}(h) = \sum_{t=0}^1~ \int_{\Theta_s} d^{Q_s}v_s~ \int d^{ND}s~ 
1898: 		\int_{\Theta_n} d^{Q_n}v_n \label{expand1} \\
&&		~\times~  p_{\mathcal{H|T,V}_s,\mathcal{S,V}_n}(h|t,{\bf v}_s,s,{\bf v}_n) 
			  p_{\mathcal{T,V}_s,\mathcal{S,V}_n}(t,{\bf v}_s,s,{\bf v}_n). \nonumber
1900: 			  p_{\mathcal{T,V}_s,\mathcal{S,V}_n}(t,{\bf v}_s,s,{\bf v}_n), \nonumber.
1901: \end{eqnarray} 
1902: The ratio of the coefficients of $P_\mathcal{T}(1)$ and $P_\mathcal{T}(0)$ in Eq.~(\ref{expand1}) will 
1903: give the general expression for the likelihood ratio by Eq.~(\ref{def1}).
1904: 
1905: The conditional distribution for $\mathcal{H}$ in Eq.~(\ref{expand1}) can be translated into a conditional distribution 
1906: for $\mathcal{N}$.  From Eq.~(\ref{detector output matrices}) it
1907: follows that
1908: \begin{equation} \label{trick1}
1909: p_\mathcal{H|S}(h|s)=p_\mathcal{N+S|S}(h|s) = p_\mathcal{N|S}(h-s|s),
1910: \end{equation}
1911: and since $\mathcal{S}$ and $\mathcal{N}$ are statistically independent we obtain
1912: \begin{equation} \label{trick3}
1913: P_\mathcal{H|S}(h|s)=P_\mathcal{N}(h-s).
1914: \end{equation}
1915: Generalizing this argument gives
1916: \begin{equation}  \label{simplify1}
1917: p_{\mathcal{H|T,V}_s,\mathcal{S,V}_n}(h|t,{\bf v}_s,s,{\bf v}_n) = 
1918: p_{\mathcal{N|V}_n}(h-s|{\bf v}_n),
1919: \end{equation} 
1920: since a priori $\mathcal{T}$, $\mathcal{V}_s$, and $\mathcal{S}$ are statistically independent 
1921: of $\mathcal{N}$ and  $\mathcal{V}_n$. For the same reason we can
1922: write the joint distribution that appears in Eq.~(\ref{expand1}) as
1923: \begin{equation} \label{simplify2}
1924: p_{\mathcal{T,V}_s,\mathcal{S,V}_n}(t,{\bf v}_s,s,{\bf v}_n) = 
1925: p_{\mathcal{T,V}_s,\mathcal{S}}(t,{\bf v}_s,s)p_{\mathcal{V}_n}({\bf v}_n).
1926: \end{equation} 
1927: 
1928: Substituting Eqs.~(\ref{simplify1}) and (\ref{simplify2}) into Eq.~(\ref{expand1}) yields
1929: \begin{eqnarray} 
1930: p_\mathcal{H}(h) &=& \sum_{t=0}^1~ \int_{\Theta_s} d^{Q_s}v_s~ \int d^{ND}s~ 
1931: 		\int_{\Theta_n} d^{Q_n}v_n \label{expand2} \\
1932:                 &\times&  p_{\mathcal{N|V}_n}(h-s|{\bf v}_n) p_{\mathcal{T,V}_s,\mathcal{S}}(t,{\bf v}_s,s)
1933: 		p_{\mathcal{V}_n}({\bf v}_n) .\nonumber
1934: \end{eqnarray}
1935: We can also rewrite the distribution
1936: $p_{\mathcal{T,V}_s,\mathcal{S}}(t,{\bf v}_s,s)$ as
1937: \begin{equation} \label{expand3} 
1938: p_{\mathcal{T,V}_s,\mathcal{S}}(t,{\bf v}_s,s) = p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,t) 
1939: p_{\mathcal{V}_s|T}({\bf v}_s,t) P_\mathcal{T}(t),
1940: \end{equation} 
1941: by Eq.~(\ref{conditional joint}).
1942: Substituting Eq.~(\ref{expand3}) into Eq.~(\ref{expand2}) and explicitly evaluating the sum over $t$ yields
1943: \begin{widetext}
1944: \begin{eqnarray} 
1945: p_\mathcal{H}(h) &=& P_\mathcal{T}(1)\int_{\Theta_{s1}}d^{Q_s}v_s~ \int d^{ND}s~ \int_{\Theta_n} d^{Q_n}v_n~
1946: 		 p_{\mathcal{N|V}_n}(h-s|{\bf v}_n) 
1947: 		 p_{\mathcal{V}_n}({\bf v}_n)
1948: 		 p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,1) 
1949: 		 p_{\mathcal{V}_s|\mathcal{T}}({\bf v_s}|1) \nonumber \\
1950: 		 &+& P_\mathcal{T}(0)\int_{\Theta_n} d^{Q_n}v_n~ 
1951: 		 p_{\mathcal{N|V}_s}(h|{\bf v}_n). \label{expand4}
1952: \end{eqnarray}
1953: \end{widetext}
1954: Here we have used the following relations:
1955: \begin{eqnarray} 
1956: p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s\in \Theta_{s1},0) &=& \delta^{ND}(s) \\
1957: p_{\mathcal{V}_s|\mathcal{T}}({\bf v_s}\in \Theta_{s0}|1) &=& 0  \\
1958: p_{\mathcal{V}_s|\mathcal{T}}({\bf v_s}\in \Theta_{s1}|0) &=& 0  \\
1959: \int_{\Theta_{s0}} d^{Q_s}v_s~ p_{\mathcal{V}_s|\mathcal{T}}({\bf v}_s|0) &=& 1.
1960: \end{eqnarray}
1961: By comparing Eqs.~(\ref{expansion goal}) and (\ref{expand4}) we can read off the distributions 
1962: $p_\mathcal{H|T}(h|t)$ and construct Eq.~(\ref{general likelihood
1963: ratio}) from Eq.~(\ref{def1}).  Note that the expression (\ref{def1})
1964: is independent of the space $\Theta_{s0}$ of signal parameters
1965: corresponding to ``no signal present''. 
1966: 
1967: 
1968: \subsection{Second derivation}
1969: \label{Second derivation}
1970: 
1971: Here we derive Eq.~(\ref{general likelihood ratio}), and also Eq.~(\ref{distribution relation}), from Eq.~(\ref{def2}).
1972: Consider the distribution 
1973: \begin{equation} \label{simple1} 
1974: p_{\mathcal{T,V}_s,\mathcal{V}_n|\mathcal{H}}(1,{\bf v}_s,{\bf v}_n|h) = 
1975: 	\frac{ p_{\mathcal{T,V}_s,\mathcal{V}_n,\mathcal{H}}(1,{\bf v}_s,{\bf v}_n,h) }{ p_\mathcal{H}(h) }.
1976: \end{equation} 
1977: We will justify Eq.~(\ref{general likelihood ratio}) by the defining relation Eq.~(\ref{def2}), which explicitly refers 
1978: to priors and posteriors.  Therefore we now append the appropriate superscripts as bookkeeping devices.  
1979: Eq.~(\ref{simple1}) then reads
1980: \begin{equation} \label{simple2} 
1981: p^{(1)}_{\mathcal{T,V}_s,\mathcal{V}_n}(1,{\bf v}_s,{\bf v}_n) = 
1982: \frac{ p^{(0)}_{\mathcal{T,V}_s,\mathcal{V}_n,\mathcal{H}}(1,{\bf v}_s,{\bf v}_n,h) }
1983:      { p^{(0)}_\mathcal{H}(h) }.
1984: \end{equation}
1985: 
1986: Using the expansion of $p_\mathcal{H}(h)$ given by Eq.~(\ref{expand4}), and what we will justify is the 
1987: likelihood ratio $\Lambda$ given by Eq.~(\ref{general likelihood ratio}), we have
1988: \begin{equation} \label{simple3} 
1989: p^{(1)}_{\mathcal{T,V}_s,\mathcal{V}_n}(1,{\bf v}_s,{\bf v}_n) = 
1990: \frac{ \left[ \frac{p^{(0)}_{\mathcal{T,V}_s,\mathcal{V}_n,\mathcal{H}}(1,{\bf v}_s,{\bf v}_n,h)}
1991: 			{\int_{\Theta_n} d^{Q_n}v_n'~ p^{(0)}_{\mathcal{H|V}_n}(h|{\bf v}_n)
1992: 			p^{(0)}_{\mathcal{V}_n}({\bf v}_n)} \right] }
1993: 			{ \Lambda P^{(0)} + 1 - P^{(0)} }.
1994: \end{equation} 
1995: Expanding the uppermost numerator in Eq.~(\ref{simple3}) over
1996: $\mathcal{S}$ by the total probability theorem gives
1997: \begin{eqnarray}  \label{simple4}
1998: p^{(0)}_{\mathcal{T,V}_s,\mathcal{V}_n,\mathcal{H}}(1,{\bf v}_s,{\bf v}_n,h) &=& \int d^{ND}s \\
1999: &\times& p^{(0)}_{\mathcal{T,V}_s,\mathcal{V}_n,\mathcal{H,S}}(1,{\bf v}_s,{\bf v_n},h,s), \nonumber
2000: \end{eqnarray} 
2001: and rewriting this gives
2002: \begin{eqnarray} \label{simple expand}
2003: && p^{(0)}_{\mathcal{T,V}_s,\mathcal{V}_n,\mathcal{H,S}}(1,{\bf v}_s,{\bf v_n},h,s) = 
2004: 			p^{(0)}_{\mathcal{N|V}_s}(h-s|{\bf v}_s) \nonumber \\
2005: && \times		p^{(0)}_{\mathcal{V}_n}({\bf v}_n) 
2006: 			p^{(0)}_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,1) 
2007: 			p^{(0)}_{\mathcal{V}_s|\mathcal{T}}({\bf v}_s,1)P^{(0)}. 
2008: \end{eqnarray} 
2009: After putting Eq.~(\ref{simple expand}) into Eq.~(\ref{simple4}), substitute the result into Eq.~(\ref{simple3}).
2010: Using $\Lambda({\bf v}_s,{\bf v}_n)$ given by Eq.~(\ref{likelihood function}) then yields
2011: \begin{equation} \label{simple5} 
2012: P^{(1)}p^{(1)}_{\mathcal{V}_s,\mathcal{V}_n|\mathcal{T}}({\bf v}_s,{\bf v}_n|1) = 
2013: \frac{ \Lambda({\bf v}_s,{\bf v}_n)P^{(0)} }{ \Lambda P^{(0)} + 1 - P^{(0)} }.
2014: \end{equation} 
2015: On the left hand side of Eq.~(\ref{simple5}) we have used 
2016: \begin{equation} 
2017: p^{(1)}_{\mathcal{T,V}_s,\mathcal{V}_n}(1,{\bf v}_s,{\bf v}_n) = P^{(1)}
2018: p^{(1)}_{\mathcal{V}_s,\mathcal{V}_n|\mathcal{T}}({\bf v}_s,{\bf v}_n|1).
2019: \end{equation} 
2020: 
2021: Integrate Eq.~(\ref{simple5}) over $\Theta_n$ and $\Theta_{s1}$ using Eq.~(\ref{likelihood function def}) 
2022: and the normalization requirement 
2023: \begin{equation} \label{simple normalization}
2024: \int_{\Theta_{s1}}d^{Q_s}v_s \int_{\Theta_n}d^{Q_n}v_n~ 
2025: p^{(1)}_{\mathcal{V}_s,\mathcal{V}_n|\mathcal{T}}({\bf v}_s,{\bf v}_n|1) = 1
2026: \end{equation}  
2027: to get 
2028: \begin{equation} \label{p1}
2029: P^{(1)} = \frac{\Lambda P^{(0)}}{\Lambda P^{(0)} + 1 - P^{(0)}}.
2030: \end{equation} 
2031: Use Eq.~(\ref{p1}) and Eq.~(\ref{simple5}) to form the ratio on the left hand side of 
2032: Eq.~(\ref{distribution relation}) .  This justifies Eq.~(\ref{distribution relation}). 
2033: 
2034: Integrate Eq.~(\ref{distribution relation}) over $\Theta_n$ and $\Theta_{s1}$ using 
2035: Eq.~(\ref{likelihood function def}) and Eq.(~\ref{simple normalization}) to see that the 
2036: defining relation Eq.~(\ref{def2}) is satisfied and thus Eq.~(\ref{general likelihood ratio}) 
2037: is justified.
2038: 
2039: \medskip
2040: 
2041: 
2042: 
2043: \section{Analytical expressions for false dismissal versus false alarm
2044: curves for cross-correlation statistic}
2045: \label{s:appendixB}
2046: This appendix derives the analytical form (\ref{analytic})
2047: of the false dismissal versus false alarm curves for the
2048: cross-correlation statistic $\Lambda_\text{CC}$ in the large
2049: $N$ limit, for both Gaussian and non-Gaussian signals.  
2050: A derivation for Gaussian signals can be found in Sec.~IV of
2051: Ref.~\cite{Allen Romano}. 
2052: 
2053: As noted in Sec.~\ref{ss:Gaussian signal}, the statistics
2054: $\Lambda_\text{CC}$ and $\hat\alpha^2$ 
2055: are equivalent in the large $N$ limit. Thus, in this limit, the false
2056: dismissal versus false alarm curves 
2057: can be found by evaluating Eqs.~(\ref{simple Pfa}) and (\ref{simple
2058: Pfd}) with $\Gamma$ replaced by $\hat\alpha^2$.  
2059: The relation (\ref{gaussian estimator1}) between the statistics ${\bar
2060: \alpha}^2$ and ${\hat \alpha}^2$ implies the following relation
2061: between their probability distributions 
2062: $p_{\hat\alpha^2|\mathcal{T}}(x|t)$ and $p_{\bar\alpha^2|\mathcal{T}}(x|t)$:
2063: \begin{equation}
2064: p_{\hat\alpha^2|\mathcal{T}}(x|t)=
2065: \theta(x) p_{\bar\alpha^2|\mathcal{T}}(x|t)
2066: +\delta(x) \, \int_{-\infty}^0
2067: dy \, p_{\bar\alpha^2|\mathcal{T}}(y|t). 
2068: \end{equation}
2069: Inserting this formula into Eqs.\ (\ref{simple Pfa}) and (\ref{simple
2070: Pfd}) gives 
2071: \begin{eqnarray}  
2072: P_\text{FA}(\hat\alpha^2_*) &=&\left\{ 
2073: \begin{array}{ll} 
2074:  \displaystyle \int_{\hat\alpha^2_*}^\infty dx~p_{\bar\alpha^2|\mathcal{T}}(x|0) 
2075: 	& \text{ if } \hat\alpha^2_* > 0 \\
2076:  \displaystyle 1  
2077: 	& \text{ if } \hat\alpha^2_* \le 0 \\
2078: \end{array} \right. ,\label{Pfa} \\
2079: P_\text{FD}(\hat\alpha^2_*) &=&\left\{ 
2080: \begin{array}{ll} 
2081:  \displaystyle 1 - \int_{\hat\alpha^2_*}^\infty dx~p_{\bar\alpha^2|\mathcal{T}}(x|1) 
2082: 	& \text{ if } \hat\alpha^2_* > 0 \\
2083:  \displaystyle 0 
2084: 	& \text{ if } \hat\alpha^2_*\le 0 \\
2085: \end{array} \right. \nonumber . \\ && \label{Pfd}
2086: \end{eqnarray}
2087: 
2088: In the large $N$ limit, the distribution
2089: $p_{\bar\alpha^2_s|\mathcal{T}}(x|t)$ must be Gaussian by the central
2090: limit theorem, and   
2091: therefore this distribution is characterized entirely by its mean $\left<
2092: \bar\alpha^2_t \right>$ and variance  
2093: $[\Delta(\bar\alpha^2_t)]^2$.  From Eqs.\ (\ref{detector output
2094: matrices}), (\ref{assumption4}), (\ref{baralphadef}), (\ref{gaussian
2095: estimator1}) and (\ref{signal model}), these are given by
2096: \begin{eqnarray} 
2097: \left< \bar\alpha^2_0 \right> &=& 0 \label{mean0}\\
2098: \Delta(\bar\alpha^2_0)  &=& \frac{\sigma_1\sigma_2}{\sqrt{N}}  \label{var0} \\
2099: \left< \bar\alpha^2_1 \right> &=& \xi\alpha^2 \label{mean1} \\
2100: \Delta(\bar\alpha^2_1) &=& \sqrt{ \frac{ \xi\alpha^4(3-\xi) + \xi\alpha^2(\sigma_1^2+\sigma_2^2) 
2101: 	+ \sigma_1^2 \sigma_2^2} {N} }. 
2102: \nonumber \\ && \label{var1}
2103: \end{eqnarray} 
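The moments (\ref{mean1}) and (\ref{var1}) can be spot-checked by simulation, as in the Python sketch below. The sketch assumes, as our reading of Eqs.~(\ref{baralphadef}) and (\ref{gaussian estimator1}), that $\bar\alpha^2$ is the sample mean of $h_1^k h_2^k$. It also assumes that $s^k$ is Gaussian with variance $\alpha^2$ with probability $\xi$ and zero otherwise. The term $\xi\alpha^4(3-\xi)$ then comes from $\langle s^4\rangle = 3\xi\alpha^4$ minus $\langle s^2\rangle^2=\xi^2\alpha^4$. Dividing the per-sample variance by $N$ gives $[\Delta(\bar\alpha^2_1)]^2$. The parameter values are arbitrary.

```python
import math
import random


def product_moments(xi, alpha, sigma1, sigma2, samples, seed=0):
    """Sample mean and variance of h1*h2 under the burst signal model."""
    rng = random.Random(seed)
    products = []
    for _ in range(samples):
        s = rng.gauss(0.0, alpha) if rng.random() < xi else 0.0
        h1 = rng.gauss(0.0, sigma1) + s
        h2 = rng.gauss(0.0, sigma2) + s
        products.append(h1 * h2)
    mean = math.fsum(products) / samples
    var = math.fsum((p - mean) ** 2 for p in products) / samples
    return mean, var


def predicted_moments(xi, alpha, sigma1, sigma2):
    """Per-sample mean and variance of h1*h2 implied by Eqs. (mean1) and (var1)."""
    a2 = alpha**2
    mean = xi * a2
    var = xi * a2**2 * (3.0 - xi) + xi * a2 * (sigma1**2 + sigma2**2) + (sigma1 * sigma2) ** 2
    return mean, var


m_sim, v_sim = product_moments(0.2, 1.0, 1.0, 1.5, 200_000)
m_th, v_th = predicted_moments(0.2, 1.0, 1.0, 1.5)
print(f"mean: simulated {m_sim:.3f}, predicted {m_th:.3f}")
print(f"variance: simulated {v_sim:.3f}, predicted {v_th:.3f}")
```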
2104: Substituting Gaussian distributions, with means and variances determined by 
2105: Eqs.~(\ref{mean0})-(\ref{var1}), into Eqs.~(\ref{Pfa}) and (\ref{Pfd}) yields
2106: \begin{widetext}
2107: \begin{eqnarray} 
2108: P_\text{FA}(\hat\alpha^2_*,\sigma_1,\sigma_2,N) &=& 
2109: \left\{\begin{array}{ll} 
2110:  \displaystyle \frac{1}{2} \erfc \left( \frac{\hat\alpha^2_*}{\sigma_1\sigma_2}\sqrt{\frac{N}{2}} \right)  
2111: 	& \text{ if } \hat\alpha^2_* > 0 \\
2112:  \displaystyle 1  
2113: 	& \text{ if } \hat\alpha^2_* \le 0 \\
2114: \end{array} , \right. \label{analytic1} \\
2115: P_\text{FD}(\hat\alpha^2_*,\xi,\alpha,\sigma_1,\sigma_2,N) &=& 
2116: \left\{\begin{array}{ll}
2117:  \displaystyle 1 - \frac{1}{2} \erfc \left[ \left( \hat\alpha^2_*-\xi\alpha^2 \right)
2118:               \sqrt{ \frac{N}{2 \left[ \xi\alpha^4(3-\xi) + \xi\alpha^2(\sigma_1^2+\sigma_2^2) 
2119:               + \sigma_1^2 \sigma_2^2 \right]} }\right]  
2120: 	& \text{ if } \hat\alpha^2_* > 0 \\
2121:  \displaystyle 0  
2122: 	& \text{ if } \hat\alpha^2_* \le 0 \\
2123: \end{array} \right. .\nonumber \\ &&  
2124: \label{analytic2}
2125: \end{eqnarray} 
2126: \end{widetext}
2127: If we now eliminate $\hat\alpha^2_*$ between Eqs.~(\ref{analytic1})
2128: and (\ref{analytic2}), change variables from $\alpha$ to $\rho$
2129: using Eq.~(\ref{rho2}), and set $\sigma_1 = \sigma_2$, 
2130: we obtain Eq.~(\ref{analytic}).
2131: 
2132: \section{Asymptotic behavior of maximum likelihood statistic}
2133: \label{s:appendixC}
2134: 
2135: 
2136: In this appendix we derive the large-$N$ behavior of 
2137: the maximum likelihood statistic $\Lambda^{\rm NG}_{\rm ML}$.
2138: From Eq. (\ref{main result2}), we can write the statistic in the form
2139: \begin{equation}
2140: \Lambda_{\rm ML}^{\rm NG}(h) = \exp \left[ N {\cal L}(h) \right]
2141: \label{lambdadef}
2142: \end{equation}
2143: with
2144: \begin{equation}
2145: {\cal L}(h) = \max_{\sigma_1,\sigma_2,\xi,\alpha} \, 
2146: g(\sigma_1,\sigma_2,\xi,\alpha,h)
2147: \label{lambdadef1a}
2148: \end{equation}
2149: where
2150: \begin{equation}
2151: g = \frac{1}{N}
2152: \sum_{k=1}^N \, g_k(\sigma_1,\sigma_2,\xi,\alpha),
2153: \label{lambdadef1}
2154: \end{equation}
2155: and the function $g_k = g_k(\sigma_1,\sigma_2,\xi,\alpha)$ is
2156: given by 
2157: \begin{equation}
2158: e^{g_k} = \xi A_k(\alpha) + (1 - \xi) A_k(0)
2159: \end{equation}
2160: with
2161: \begin{eqnarray}
2162: A_k(\alpha) &=&  
2163:  { \exp \left[ \frac{\left( \frac{h_1^k}{\sigma^2_1} + \frac{h_2^k}{\sigma^2_2}\right)^2}
2164:           {2\left( \frac{1}{\sigma^2_1} + \frac{1}{\sigma^2_2} + \frac{1}{\alpha^2} \right)} 
2165:           - \frac{\left( h_1^k\right)^2}{2\sigma^2_1} - \frac{\left(
2166: 	h_2^k\right)^2}{2\sigma^2_2} +1 \right]} \nonumber \\
2167:  && \times { {\bar \sigma}_1 {\bar \sigma}_2  \over 
2168: \sqrt{\sigma^2_1 \sigma^2_2 + \sigma^2_1 \alpha^2 + \sigma^2_2 \alpha^2}}.
2169: \label{f def}
2170: \end{eqnarray}
2171: We denote by ${\tilde \sigma}_1$, ${\tilde \sigma}_2$, ${\tilde \xi}$ and
2172: ${\tilde \alpha}$ the ``true'' parameters governing the distribution of
2173: the quantities $h_1^k$ and $h_2^k$ according to
2174: Eqs. (\ref{detector output matrices}), (\ref{assumption2}),
2175: (\ref{assumption4}), and (\ref{signal model}), with untilded
2176: quantities replaced by the corresponding tilded quantities.
2177: [These ``true parameters'' were denoted by $\sigma_1$, $\sigma_2$,
2178: $\xi$ and $\alpha$ in the body of the paper.]
2179: We define ${\tilde \rho}$ to
2180: be the signal-to-noise ratio (\ref{rho2}) with untilded
2181: quantities replaced by tilded quantities:
2182: \begin{equation}
2183: {\tilde \rho} \equiv \frac{ {\tilde \xi} {\tilde \alpha}^2 \sqrt{N} }{ {\tilde
2184: \sigma}_1 {\tilde \sigma}_2}.  
2185: \label{rho2bar}
2186: \end{equation}
2187: For simplicity, in this appendix we restrict attention to the case
2188: ${\tilde \sigma}_1 = {\tilde \sigma}_2$.  Then, without loss of
2189: generality, we can take ${\tilde \sigma}_1 = {\tilde \sigma}_2 =1$ by
2190: rescaling our units of strain amplitude.
2191: 
2192: 
2193: 
2194: We discuss separately the computation of the false alarm and false
2195: dismissal probabilities, as different techniques are required to
2196: compute each.
2197: 
2198: \subsection{False dismissal probability}
2199: 
2200: 
2201: 
2202: The false dismissal probability for the statistic (\ref{lambdadef1a})
2203: will be some function
2204: \begin{equation}
2205: P_{\rm FD} = P_{\rm FD}({\cal L}_*, N, {\tilde \xi}, {\tilde \rho})
2206: \end{equation}
2207: of the threshold ${\cal L}_*$ on ${\cal L}$, the number of data points
2208: $N$, the Gaussianity parameter ${\tilde \xi}$ and signal-to-noise
2209: ratio ${\tilde \rho}$ of the signal.  For applications to ground based
2210: detectors, we will have ${\tilde \rho} \sim $ (a few), in order that
2211: the signal be detectable, $N \sim 10^9$, and $10^{-3} \alt {\tilde
2212: \xi}\le 1$.  Therefore it would be useful to find approximate analytic
2213: expressions for the false alarm probability in the limit of large
2214: $N$.  There are actually several different, large $N$ regimes in the
2215: three dimensional parameter space $(N, {\tilde \xi}, {\tilde \rho})$
2216: that one might explore: 
2217: \begin{itemize}
2218: \item The limit $N \to \infty$ with ${\tilde \alpha}$ and ${\tilde
2219: \xi}$ held fixed.  This corresponds to fixing the stochastic
2220: background signal and going to a limit of long observation times.  In
2221: this limit we have ${\tilde \rho} \propto \sqrt{N}$ which diverges.
2222: This is not a very realistic limit to explore.
2223: 
2224: \item The limit $N \to \infty$ with ${\tilde \rho}$ and ${\tilde \xi}$
2225: held fixed.  In this limit, the signal-to-noise ratio is held fixed,
2226: and correspondingly the amplitude ${\tilde \alpha}$ of the stochastic
2227: background signal goes to zero, from Eq.\ (\ref{rho2bar}).  This would
2228: be the most natural 
2229: limit to explore.  However, in this limit the statistical error
2230: $\Delta {\tilde \xi}$ in our measurement of the Gaussianity parameter
2231: would diverge, from Eq.\ (\ref{Deltaxi}), and therefore in this limit
2232: we do not expect to be able to compute analytically the value of the
2233: parameter $\xi$ which achieves the maximum in Eq.\ (\ref{lambdadef1a}).
2234: The analytic approximation methods which we discuss below do not work in this
2235: regime.  [In addition our Monte Carlo simulations show that the maximum
2236: likelihood statistic itself does not perform any better than the
2237: cross-correlation statistic in this regime, as discussed in the
2238: Introduction.]
2239: 
2240: \item The limit we actually explore is the limit $N \to \infty$ with
2241: ${\tilde \xi}$ fixed and ${\tilde \rho}$ scaling $\propto N^{1/4}$,
2242: corresponding to ${\tilde \alpha} \propto N^{-1/8}$.  The reason for
2243: our choosing to explore this particular limit is simply that it is
2244: amenable to analytic computations.  Fractional corrections to our
2245: analytic results should scale like $1/N$ or as $1 / {\tilde \rho}^4$.  
2246: Since ${\tilde \rho} \sim $ (a few) at the threshold for detection, the
2247: approximation should be good to $10\% - 20\%$ or so.
2248: 
2249: \end{itemize}
2250: 
2251: 
2252: 
2253: We now turn to a discussion of the computational technique.  
2254: We write 
2255: \begin{equation}
2256: {\tilde \alpha} = {\tilde \alpha}_0 N^{-1/8},
2257: \label{alpha0def}
2258: \end{equation}
2259: where ${\tilde \alpha}_0$ is independent of $N$.
2260: Correspondingly, from Eq.\ (\ref{signal model}) we can write
2261: \begin{equation}
2262: s^k = N^{-1/8} {\hat s}^k,
2263: \end{equation}
2264: where the distribution of ${\hat s}^k$ is given by Eq.\ (\ref{signal
2265: model}) with $\xi$ replaced by ${\tilde \xi}$ and $\alpha$ replaced
2266: by ${\tilde \alpha}_0$.  In particular, the distribution of ${\hat
2267: s}^k$ is independent of $N$.
2268: In computing the maximum over $(\xi,\alpha,\sigma_1, \sigma_2)$ in
2269: Eq.\ (\ref{lambdadef1a}), it is useful 
2270: change variables from $\alpha$ to $\kappa$ defined by
2271: \begin{equation}
2272: \kappa = \rho N^{-1/4} = {\xi \alpha^2 N^{1/4}  \over \sigma_1
2273:   \sigma_2},
2274: \label{kappadef}
2275: \end{equation}
2276: which we expect to be independent of $N$ to leading order in the large
2277: $N$ limit.  The value of the variable $\kappa$ that characterizes the
2278: signal is 
2279: \begin{equation}
2280: {\tilde \kappa} = {\tilde \rho} N^{-1/4} = {{\tilde \xi} {\tilde
2281:     \alpha}_0^2  \over {\tilde 
2282:     \sigma}_1 {\tilde \sigma}_2};
2283: \end{equation}
2284: cf.\ Eqs.\ (\ref{alpha0def}) and (\ref{kappadef}).  
2285: 
2286: 
2287: We now consider fixed realizations of the infinite sequences of random
2288: variables $n_1^k$, $n_2^k$ and ${\hat s}^k$, and 
2289: $1 \le k < \infty$, and examine the limiting
2290: behavior of ${\cal L}(h)$ as $N \to \infty$.   
2291: We compute this limiting behavior by substituting into 
2292: the right hand side of Eq. (\ref{lambdadef})
2293: the relations 
2294: \begin{equation}
2295: h_1^k = n_1^k + N^{-1/8} {\hat s}^k \ \ \ \ \ 
2296: h_2^k = n_2^k + N^{-1/8} {\hat s}^k,
2297: \end{equation}
2298: writing $\alpha$ in terms of $\kappa$ using Eq.\ (\ref{kappadef}), and
2299: expanding in powers of $N^{-1/8}$.
2300: The result is an expression which can be written in terms of 
2301: the sums $Q_{abc}$ defined by 
2302: \begin{equation}
2303: \label{sums0}
2304: Q_{abc} = \frac{1}{N}\sum_{k=1}^N \left( \hat s^k \right)^a 
2305: 				  \left( n_1^k    \right)^b 
2306: 				  \left( n_2^k    \right)^c,
2307: \end{equation}
2308: where $a$, $b$, and $c$ are non-negative integers.
2309: From the central limit theorem we can write
2310: \begin{equation}\label{sums}
2311: Q_{abc} = \mu_{abc} + \frac{1}{\sqrt{N}} \Delta_{abc},
2312: \end{equation}
2313: where $\mu_{abc}=\left< Q_{abc} \right>$ are computable functions of
${\tilde \xi}$ and ${\tilde \alpha}_0$, and where the random variables 
2315: $(\Delta_{100},\Delta_{010},\ldots)$
2316: converge in distribution 
2317: % FOOTNOTE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2318: \footnote{See chapter 8 of Ref.\ \cite{Papoulis} for definitions of different
2319: notions of convergence for sequences of random variables. }
2320: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2321: as $N \rightarrow \infty$ to a multivariate Gaussian of zero mean whose variance-covariance 
matrix is independent of $N$.  Thus, in particular, the joint distribution of all the
$\Delta_{abc}$'s is independent of $N$ in the limit $N \rightarrow \infty$.
2324: 
2325: 
2326: 
2327: 
2328: 
2329: 
2330: We define the vector
2331: \begin{equation} 
{\bf v} = (v^1,v^2,v^3,v^4) = (\xi,\kappa,\sigma_1^2, \sigma_2^2).
2333: \end{equation} 
2334: We denote the value of ${\bf v}$ that achieves the maximum in Eq.\
2335: (\ref{lambdadef1a}) by $\hat{{\bf v}}$:
2336: \begin{equation} 
2337: g( \hat{{\bf v}}) = \max_{{\bf v}}~g({\bf v}),
2338: \end{equation} 
2339: where $\hat{{\bf v}} = (\hat \xi, \hat \kappa,\widehat{\sigma_1^2},\widehat{\sigma_2^2})$.
These estimators satisfy the system of four equations\footnote{Here we are assuming that the maximum is achieved as a local maximum in the interior of the four-dimensional parameter space.  Cases in which the maximum is achieved on the boundary are discussed below.}
2341: \begin{equation} \label{hard eq system}
\left. \frac{\partial g }{\partial v^l} \right|_{{\bf v} = \hat{{\bf v}}} = 0, \qquad l = 1, \ldots, 4.
2343: \end{equation} 
2344: We solve Eq.~(\ref{hard eq system}) perturbatively.  
We first assume that the estimators can be expanded in the form
2346: \begin{equation} \label{assume expand}
2347: \widehat{v^l} = \sum_{j=0}^{\infty} \widehat{v^{l}}^{[j]} \epsilon^j,
2348: \end{equation} 
2349: where for ease of notation we have defined
2350: $\epsilon = N^{-1/8}$.  
2351: We define the expansion coefficients $v^{l[j]}$ analogously by an
2352: expansion of the form (\ref{assume expand}) 
2353: but without the hats.  Now using Eq.\ (\ref{sums}) the function $g$
2354: can be expanded as a power 
2355: series in $\epsilon$ whose coefficients are functions of
2356: $v^{l[k]}$, $\mu_{abc}$, and $\Delta_{abc}$:
2357: \begin{equation} 
2358: g({\bf v}) = \sum_{j=0}^{\infty} 
2359: g^{[j]}\left[ v^{l[k]}, \mu_{abc},\Delta_{abc} \right] \epsilon^j.
2360: \label{gexpand}
2361: \end{equation} 
2362: Substituting the expansions (\ref{assume expand}) and (\ref{gexpand}) into
2363: the condition 
2364: (\ref{hard eq system}) for a local extremum 
gives an infinite set of equations that must collectively be satisfied by
the coefficients $\widehat{v^{l}}^{[j]}$:
2367: \begin{equation} \label{easy set}
2368: \left. \frac{\partial g^{[j]}}{\partial v^{l[k]}} \right|_{v^{m[n]} = \widehat{v^m}^{[n]}} = 0.
2369: \end{equation}
2370: We solve these equations order by order to determine the coefficients 
2371: $\widehat{{v^l}}^{[j]}$, and thereby justify a posteriori the ansatz
2372: (\ref{assume expand}).
2373: 
2374: 
2375: 
2376: 
2377: We find that in order to compute the leading order expression for
2378: ${\cal L}$, we must obtain the expansion for ${\hat \xi}$ to zeroth
2379: order in $\epsilon$, the expansion for ${\hat \kappa}$ to fourth order
in $\epsilon$, and the expansions of $\widehat{\sigma_1^2}$ and
$\widehat{\sigma_2^2}$ to sixth order in $\epsilon$.  
2382: The leading order results are
2383: \begin{eqnarray}
2384: \label{kappaans}
2385: {\hat \kappa} &=& {\tilde \kappa} + \epsilon^2 X + O(\epsilon^3), \\
2386: \label{xians}
2387: {1 \over {\hat \xi}} &=& {1 \over {\tilde \xi}} + {Y \over \sqrt{6}
2388:   {\tilde \kappa}^2} + O(\epsilon), \\
2389: \widehat{\sigma_1^2} &=& 1 + O(\epsilon^2), \\
2390: \widehat{\sigma_2^2} &=& 1 + O(\epsilon^2),
2391: \label{sigmaans}
2392: \end{eqnarray}
2393: where
\begin{equation}
\label{eq:Xdef}
X = \Delta_{011}
\end{equation}
2398: and
2399: \begin{eqnarray}
2400: \label{eq:Ydef}
2401:  Y &=& {1 \over 8 \sqrt{6}} \bigg[ 4 (\Delta_{031} + \Delta_{013}) - 12
2402: (\Delta_{002} + \Delta_{020}) \nonumber \\
2403: && - 24 \Delta_{011} + \Delta_{040} + \Delta_{004} + 6 \Delta_{022} \bigg].
2404: \end{eqnarray}
2405: Using Eqs.\ (\ref{assumption2}), (\ref{signal model}), (\ref{sums0}) and
2406: (\ref{sums}) one can show that the random variables $X$ and $Y$ are
2407: independent Gaussian random variables of zero mean and unit variance.
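As an illustration (not part of the original analysis), the claim about $X$ and $Y$ can be checked numerically.  Because $X$ and $Y$ involve only sums with $a = 0$, the signal samples ${\hat s}^k$ do not enter, and since each $\Delta_{abc}$ is $\sqrt{N}$ times a centered sample mean, the variances and covariance of $X$ and $Y$ equal those of a single centered summand.  The Python sketch below assumes unit-variance noise, ${\tilde \sigma}_1 = {\tilde \sigma}_2 = 1$, builds these summands from Eqs.\ (\ref{eq:Xdef}) and (\ref{eq:Ydef}), and estimates their second moments; the helper name and sample size are illustrative choices.  One can also verify by hand that the summand of $Y$ reduces to $(w^4 - 6 w^2 + 3)/\sqrt{24}$ with $w = (n_1 + n_2)/\sqrt{2}$, the normalized fourth Hermite polynomial, which makes the unit variance and the vanishing correlation with $n_1 n_2$ transparent.

```python
import numpy as np

def xy_summands(num_samples, seed=0):
    """Per-sample centered summands whose normalized sums give X and Y.

    Assumes unit-variance noise, so the noise moments are
    mu_020 = mu_002 = mu_022 = 1, mu_040 = mu_004 = 3 and
    mu_011 = mu_031 = mu_013 = 0.  Only a = 0 sums appear, so the
    signal samples are not needed.
    """
    rng = np.random.default_rng(seed)
    n1 = rng.standard_normal(num_samples)
    n2 = rng.standard_normal(num_samples)

    # Summand of Delta_{0bc}: n1^b n2^c minus its mean mu_{0bc}
    d011 = n1 * n2
    d020, d002 = n1**2 - 1.0, n2**2 - 1.0
    d031, d013 = n1**3 * n2, n1 * n2**3
    d040, d004 = n1**4 - 3.0, n2**4 - 3.0
    d022 = n1**2 * n2**2 - 1.0

    x = d011                                            # Eq. (eq:Xdef)
    y = (4.0 * (d031 + d013) - 12.0 * (d002 + d020)     # Eq. (eq:Ydef)
         - 24.0 * d011 + d040 + d004 + 6.0 * d022) / (8.0 * np.sqrt(6.0))
    w = (n1 + n2) / np.sqrt(2.0)
    return x, y, w

x, y, w = xy_summands(10**6)
print(np.var(x), np.var(y), np.cov(x, y)[0, 1])
```

The printed variances should be close to unity and the covariance close to zero, up to Monte Carlo errors of a few times $10^{-2}$.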
2408: 
2409: 
2410: In deriving Eqs.\ (\ref{kappaans}) -- (\ref{sigmaans}) we assumed that
2411: the value of ${\bf v}$ which achieves the maximum in Eq.\
(\ref{lambdadef1a}) corresponds to a local maximum.  However, if the
right-hand side of Eq.\ (\ref{kappaans}) is negative, the maximum will
instead be achieved on the boundary of the parameter space at ${\hat
\kappa} =0$, since the variable $\kappa$ must be non-negative.
Similarly, if the right-hand side of Eq.\ (\ref{xians}) is less than
2417: 1, the maximum will be achieved at ${\hat \xi} =1$, since $1/\xi$ must
2418: lie in the interval $[1,\infty)$.
2419: 
2420: Substituting the results (\ref{kappaans}) -- (\ref{sigmaans}) [together
2421: with the higher order corrections to those results which we have not
2422: shown] into the expansion for the statistic ${\cal L}$, and taking
2423: into account the various special cases discussed in the last
2424: paragraph, gives
2425: \begin{eqnarray}
2426: {\cal L} &=&  \bigg[
2427: {1 \over 2} \left(Y + \sqrt{6} q {\tilde \kappa}^2 \right)^2 \epsilon^8 
2428: \, \theta
2429: \left(Y + \sqrt{6} q {\tilde \kappa}^2 \right) \nonumber \\
2430: && + {1 \over 2} ({\tilde \kappa} + \epsilon^2 X)^2 \epsilon^4 
2431: - {\tilde \kappa}^3 \epsilon^6 + {7 \over 4} {\tilde \kappa}^4
2432: \epsilon^8 \nonumber \\
2433:  && 
2434: + {\tilde \kappa} U \epsilon^7 + {\tilde \kappa} V \epsilon^8
2435: \bigg] \theta({\tilde \kappa} + \epsilon^2 X) + O(\epsilon^9).
2436: \label{eq:ML1}
2437: \end{eqnarray}
2438: Here $\theta(x)$ is the step function and
2439: \begin{eqnarray}
2440: \label{eq:qdef}
2441: q &=& {1 \over {\tilde \xi}}-1, \\
2442: U &=& \Delta_{101} + \Delta_{110}, \\
2443: V &=& \Delta_{200} - {1 \over 2} {\tilde \kappa} (\Delta_{002} +
2444: \Delta_{020}) - 2 {\tilde \kappa} \Delta_{011}.
2445: \end{eqnarray}
2446: We note that the corresponding expression for the statistic
2447: $(\ln \Lambda_{\rm ML}^{\rm G})/N$ [which is equivalent to the
2448: cross-correlation statistic by Eq.\ (\ref{Gaussian statistic})] is
2449: given by Eq.\ (\ref{eq:ML1}) with the first term in the square
2450: brackets dropped.
2451: 
2452: 
Next we drop all the
terms in the square brackets in Eq.\ (\ref{eq:ML1}) other than the
first two.  The 
reason is that the dropped terms give corrections that are smaller than
the terms retained (both in expected value and in fluctuations) by a
factor of
\[
{\tilde \kappa} \epsilon^2 = {{\tilde \rho} \over \sqrt{N}},
\]
which will be small compared to unity in all cases of interest.
This gives for the false dismissal probability the expression
2464: \begin{eqnarray}
P_{\rm FD} &=& P({\cal L} < {\cal L}_*) \nonumber \\
 &=& \int_{\cal R} {dx \, dy  \over
2 \pi} \exp \left[ -{(x-x_0)^2 \over 2} - {(y-y_0)^2 \over 2} \right], \nonumber \\ & & 
2468: \end{eqnarray}
2469: where
2470: \begin{eqnarray}
2471: \label{eq:def1}
2472: x_0 &=& {\tilde \kappa}/\epsilon^2 \\
2473: y_0 &=& \sqrt{6} q {\tilde \kappa}^2  \\
2474: r_0 &=& \sqrt{2 N {\cal L}_*}.
2475: \label{eq:def4}
2476: \end{eqnarray}
2477: Here the region ${\cal R}$ in the $x,y$ plane is the union of the
2478: two regions
2479: \begin{eqnarray}
2480: x &\ge& 0 \nonumber \\
2481: y &\ge& 0 \nonumber \\
2482: x^2 +  y^2 &\le& r_0^2
2483: \label{eq:region1a}
2484: \end{eqnarray}
2485: and
2486: \begin{eqnarray}
2487: y &\le& 0 \nonumber \\
2488: 0 \le x &\le& r_0.
2489: \label{eq:region2a}
2490: \end{eqnarray}
2491: 
2492: 
2493: The integral over the region (\ref{eq:region2a}) is
2494: \begin{equation}
2495: P_{\rm FD}^{(1)} = 
2496: {\cal P}(-y_0)[ {\cal P}(r_0 - x_0) - {\cal P}(-x_0)],
2497: \label{eq:1}
2498: \end{equation}
2499: where 
2500: \begin{equation}
2501: {\cal P}(x) \equiv 1 - {1 \over 2} \erfc (x/\sqrt{2}) = \int_{-\infty}^x
2502: dt {1 \over \sqrt{2 \pi}} \exp[-t^2/2].
2503: \end{equation}
2504: The integral over the region (\ref{eq:region1a}) can be written as
2505: \begin{eqnarray}
2506: P_{\rm FD}^{(2)} &=& {1 \over 2 \pi} \int_0^{\pi/2} d\theta \,
2507: \int_{0}^{r_0} dr \, r 
2508: \nonumber \\ &\times&
2509: \exp \left[ - {1 \over 2} (r \cos \theta -
2510: x_0)^2 - {1 \over 2} (r \sin \theta - y_0)^2 \right]. \nonumber \\ 
2511: & & \label{eq:pFD1}
2512: \end{eqnarray}
The integrand in Eq.\ (\ref{eq:pFD1}) peaks
at $r \cos \theta = x_0$, $r \sin \theta = y_0$.  In order for $P_{\rm FD}$ to be
small, it is necessary that this peak lie outside the domain of
integration, at $r > r_0$.  So we must have
2519: \begin{equation}
2520: x_0^2 + y_0^2 \ge r_0^2.
2521: \label{eq:constraint}
2522: \end{equation}
2523: The criterion $x_0 \ge r_0$ is, in order of magnitude, just the usual
2524: criterion for detectability with the cross-correlation statistic.  The
2525: criterion $y_0 \agt r_0$ reduces to, in order of magnitude,
2526: \begin{equation}
2527: \xi \alt {\rho^2 \over \sqrt{N}}
2528: \end{equation}
2529: which is what we claimed earlier to be the regime where the maximum
2530: likelihood statistic starts
2531: to work well, cf.\ Sec.\ \ref{s:MLS} above.  
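Explicitly, combining Eqs.\ (\ref{eq:def1})--(\ref{eq:def4}) with
$\epsilon = N^{-1/8}$ and ${\tilde \kappa} = {\tilde \rho} N^{-1/4}$ gives
\begin{eqnarray}
x_0 &=& {\tilde \rho}, \nonumber \\
y_0 &=& \sqrt{6} \left( {1 \over {\tilde \xi}} - 1 \right) {{\tilde \rho}^2 \over \sqrt{N}}.
\end{eqnarray}
Thus $x_0 \ge r_0$ is a threshold on ${\tilde \rho}$ alone, while for
${\tilde \xi} \ll 1$ the condition $y_0 \agt r_0$ becomes
${\tilde \xi} \alt \sqrt{6}\, {\tilde \rho}^2 / (r_0 \sqrt{N})$, which
agrees with the order-of-magnitude criterion above when $r_0$ is of
order unity.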
2532: 
2533: Evaluating the integral (\ref{eq:pFD1}) using the Laplace
2534: approximation gives
2535: \begin{eqnarray}
2536: P_{\rm FD}^{(2)} &=& {1 \over r_0 (\lambda-1) \sqrt{2 \pi \lambda}}
2537: \exp \left[ 
2538: - {1 \over 2} r_0^2 (\lambda-1)^2 \right] \nonumber \\
2539:  && \times \left[ 1 + O\left({1
2540: \over r_0}\right)\right],
2541: \label{eq:final}
2542: \end{eqnarray}
2543: where we define the variables $\lambda$ and $\gamma$ by
2544: \begin{equation}
2545: (x_0,y_0) = r_0 \lambda (\cos \gamma, \sin \gamma).
2546: \label{eq:lambdadef}
2547: \end{equation}
2548: However, the result (\ref{eq:final}) is not very accurate for small
2549: $r_0$.  Alternatively we can integrate over $r$ in Eq.\
2550: (\ref{eq:pFD1}) to obtain
2551: \begin{widetext}
2552: \begin{eqnarray}
2553: && P_{\rm FD}^{(2)} = \int_0^{\pi/2} d\theta \left\{ 
\frac{1}{2\pi} e^{-\frac{{{r_0}}^2\,\left( 1 + {\lambda }^2 \right) }{2}} 
2555: \left[ e^{\frac{{{r_0}}^2}{2}} - e^{{{r_0}}^2\,\lambda \,\cos (\gamma  - \theta )} \right] \right. \nonumber \\
2556: && + \left. 
2557: \frac{r_0 \lambda}{2\sqrt{2\pi}} 
2558: e^{\frac{ {{r_0}}^2 \lambda^2 }{4} 
2559:    \left[ \cos (2\,\left\{ \gamma  - \theta  \right\} ) -1 \right] }
2560: \cos \left( \gamma  - \theta \right)
2561:      \left[ \erf\left( \frac{{r_0}\,\lambda \cos \{\gamma  - \theta \}}{{\sqrt{2}}} \right) +
2562:             \erf\left( \frac{{r_0}\,\left\{ 1 - \lambda \,\cos [\gamma  - \theta ]  \right\} }{{\sqrt{2}}} \right) \right] 
2563: \right\}, \label{eq:c}
2564: \end{eqnarray}
2565: \end{widetext}
2566: where 
2567: \begin{equation} 
2568: \erf(x) = \frac{2}{\sqrt{\pi}}\int_0^x dy~ e^{-y^2}.
2569: \end{equation} 
2570: The integral (\ref{eq:c}) can be evaluated numerically.
2571: The false dismissal probability is then given by
2572: \begin{equation}
2573: P_{\rm FD} = P_{\rm FD}^{(1)} + P_{\rm FD}^{(2)},
2574: \label{eq:ansA}
2575: \end{equation}
2576: with $P_{\rm FD}^{(1)}$ given by Eq.\ (\ref{eq:1})
2577: and $P_{\rm FD}^{(2)}$ given by Eq.\ (\ref{eq:c}).
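A minimal Python sketch of how Eq.\ (\ref{eq:ansA}) might be evaluated is given below.  For simplicity it integrates Eq.\ (\ref{eq:pFD1}) directly with a midpoint rule on an $n \times n$ grid in $(r, \theta)$ rather than using the one-dimensional form (\ref{eq:c}); the grid size and function names are illustrative assumptions.  Two cross-checks are included: a Monte Carlo estimate of the Gaussian mass in the region ${\cal R}$ defined by Eqs.\ (\ref{eq:region1a}) and (\ref{eq:region2a}), and the Laplace approximation (\ref{eq:final}).

```python
import math
import numpy as np

def cal_p(x):
    """Standard normal CDF: the function cal P(x) defined in the text."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def p_fd_1(x0, y0, r0):
    """Gaussian mass in region (eq:region2a), y <= 0 and 0 <= x <= r0: Eq. (eq:1)."""
    return cal_p(-y0) * (cal_p(r0 - x0) - cal_p(-x0))

def p_fd_2(x0, y0, r0, n=400):
    """Gaussian mass in region (eq:region1a): midpoint rule for Eq. (eq:pFD1)."""
    dr, dth = r0 / n, (math.pi / 2.0) / n
    r = (np.arange(n) + 0.5) * dr
    th = (np.arange(n) + 0.5) * dth
    rr, tt = np.meshgrid(r, th, indexing="ij")
    f = rr * np.exp(-0.5 * (rr * np.cos(tt) - x0) ** 2
                    - 0.5 * (rr * np.sin(tt) - y0) ** 2)
    return float(f.sum()) * dr * dth / (2.0 * math.pi)

def p_fd(x0, y0, r0):
    """False dismissal probability, Eq. (eq:ansA)."""
    return p_fd_1(x0, y0, r0) + p_fd_2(x0, y0, r0)

def p_fd_2_laplace(x0, y0, r0):
    """Laplace approximation (eq:final), with lambda defined by Eq. (eq:lambdadef)."""
    lam = math.hypot(x0, y0) / r0
    return (math.exp(-0.5 * (r0 * (lam - 1.0)) ** 2)
            / (r0 * (lam - 1.0) * math.sqrt(2.0 * math.pi * lam)))

def p_fd_monte_carlo(x0, y0, r0, num=10**6, seed=1):
    """Fraction of draws (x, y) ~ N((x0, y0), identity) falling in the region R."""
    rng = np.random.default_rng(seed)
    x = x0 + rng.standard_normal(num)
    y = y0 + rng.standard_normal(num)
    in_1a = (x >= 0) & (y >= 0) & (x**2 + y**2 <= r0**2)
    in_2a = (y <= 0) & (x >= 0) & (x <= r0)
    return float(np.mean(in_1a | in_2a))

print(p_fd(3.0, 3.0, 2.0), p_fd_monte_carlo(3.0, 3.0, 2.0))
x_big = 1.5 * 10.0 / math.sqrt(2.0)   # lambda = 1.5, gamma = pi/4, r0 = 10
print(p_fd_2(x_big, x_big, 10.0), p_fd_2_laplace(x_big, x_big, 10.0))
```

With $x_0 = y_0 = 3$ and $r_0 = 2$ the quadrature and Monte Carlo estimates should agree to within the Monte Carlo error, while for $r_0 = 10$, $\lambda = 1.5$, $\gamma = \pi/4$ the quadrature value of $P_{\rm FD}^{(2)}$ should lie within several percent of Eq.\ (\ref{eq:final}).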
2578: 
2579: 
2580: \subsection{False alarm probability}
2581: 
2582: The false alarm probability is some function
2583: \begin{equation}
2584: P_{\rm FA} = P_{\rm FA}({\cal L}_*,N)
2585: \end{equation}
of the threshold value ${\cal L}_*$ of the detection statistic
2587: (\ref{lambdadef1a}) and of the number of data points $N$.  It does not
2588: depend on the signal parameters ${\tilde \rho}$ and ${\tilde \xi}$
2589: because no signal is present.  We would like to evaluate this quantity
2590: in the large $N$ limit.
2591: 
2592: We start by rewriting the statistic (\ref{lambdadef1a}) in the form
2593: \begin{equation}
2594: {\cal L} = \max_{\bf v} \left\{ {1 \over N} \sum_{k=1}^N \ln A_k(0) 
2595: + {1 \over N} \sum_{k=1}^N \ln \left[ 1 + \xi {\cal D}_k(\alpha)
2596:   \right] \right\},  
2597: \label{lambdadef4}
2598: \end{equation}
2599: where
2600: \begin{equation}
2601: {\cal D}_k(\alpha) = { A_k(\alpha) \over A_k(0)} -1.
2602: \label{calDdef}
2603: \end{equation}
We begin with the first term in Eq.\ (\ref{lambdadef4}).  Using the
2605: definition (\ref{f def}) of $A_k(\alpha)$ and the definition
2606: (\ref{intro bar sigma})
2607: of ${\bar \sigma}_1$ and ${\bar \sigma}_2$ we can
2608: write this term as  
2609: \begin{equation}
{1 \over N} \sum_{k=1}^N \ln A_k(0) = - { (\Delta \sigma_1)^2 \over
  {\bar \sigma}_1^2}  - { (\Delta \sigma_2)^2 \over
  {\bar \sigma}_2^2} + O( \Delta \sigma_1^3, \Delta \sigma_2^3),
2613: \end{equation}
2614: where $\Delta \sigma_1 = \sigma_1 - {\bar \sigma}_1$, $\Delta \sigma_2
2615: = \sigma_2 - {\bar \sigma}_2$.
2616: Therefore the first term is maximized at $\sigma_1 = {\bar \sigma}_1$,
2617: $\sigma_2 = {\bar \sigma}_2$.  Below we shall show that the second
2618: term in Eq.\ (\ref{lambdadef4}) is of order $O(\epsilon^2)$, where in this
2619: subsection we define $\epsilon = 1/\sqrt{N}$.  Therefore the values of
2620: $\sigma_1$ and $\sigma_2$ that achieve the maximum are
2621: \begin{eqnarray}
2622: {\hat \sigma}_1 &=& {\bar \sigma}_1 \left[ 1 + O(\epsilon^2) \right] \nonumber
2623: \\ 
2624: {\hat \sigma}_2 &=& {\bar \sigma}_2 \left[ 1 + O(\epsilon^2) \right].
2625: \end{eqnarray}
2626: Moreover, in analyzing the second term it suffices to take $\sigma_1 =
2627: {\bar \sigma}_1$, $\sigma_2 = {\bar \sigma}_2$ in order to obtain the
2628: statistic to the leading $O(\epsilon^2)$ order.  Lastly, since we have
2629: assumed that ${\tilde \sigma}_1 = {\tilde \sigma}_2 =1$ and no signal
2630: is present, we have ${\bar \sigma}_{1,2} = 1 + O(\epsilon)$.  Hence,
2631: in analyzing the second term, it is sufficient to take $\sigma_1 =
2632: \sigma_2 = 1$.  
2633: 
2634: 
2635: The statistic (\ref{lambdadef4}) therefore reduces to 
2636: \begin{equation}
2637: {\cal L} = \max_{\alpha,\xi} 
2638: {1 \over N} \sum_{k=1}^N \, \ln \left[ 1 + \xi {\cal D}_k(\alpha) \right] + O(\epsilon),
2639: \label{lambdadef5}
2640: \end{equation}
2641: where from Eqs.\ (\ref{f def}) and (\ref{calDdef}) 
2642: \begin{equation}
2643: {\cal D}_k(\alpha) = {1 \over \sqrt{1 + 2 \alpha}} \exp \left[ {w_k^2
2644:     \over 2 + {1 \over \alpha}} \right] -1.
2645: \end{equation}
2646: Here $w_k = (n_1^k + n_2^k)/\sqrt{2}$, $1 \le k \le N$, are
2647: independent Gaussian random variables of zero mean and unit variance.   
2648: 
2649: 
2650: 
It is straightforward to compute the distribution of the
statistic (\ref{lambdadef5}) numerically, by generating the Gaussian variables
$w_k$ and numerically maximizing over $\xi$ and $\alpha$.  
The result is shown in Fig.\ \ref{fig:fa}.  We find
that at large $N$ the distribution of $N {\cal L}$ becomes
independent of $N$, and is approximately given by 
\begin{equation}
P(N {\cal L} > x) = \alpha_0 e^{-\beta_0 x}
\end{equation}
for $x > 0$, where $\alpha_0 \approx 0.42$ and $\beta_0 \approx
2661: 1.08$.  Therefore the false alarm probability is approximately given
2662: by
2663: \begin{equation}
2664: P_{\rm FA} = \alpha_0 \exp \left[ - \beta_0 N {\cal L}_* \right].
2665: \label{eq:ansB}
2666: \end{equation}
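The Monte Carlo procedure just described can be sketched in Python as follows.  This is not the code used to produce Fig.\ \ref{fig:fa}: the maximization over $\alpha$ and $\xi$ is replaced by a search over hypothetical logarithmic grids ($10^{-3} \le \alpha \le 10^{2}$, $10^{-4} \le \xi \le 1$), so the sketch can underestimate the maximum and should not be expected to reproduce $\alpha_0$ and $\beta_0$ precisely.  It uses the form of ${\cal D}_k(\alpha)$ given above, valid for $\sigma_1 = \sigma_2 = 1$.

```python
import numpy as np

def n_times_stat(w, alphas, xis):
    """N * L from Eq. (lambdadef5), maximized over a grid in alpha and xi."""
    best = 0.0  # xi -> 0 gives a sum of zero, so N*L >= 0
    for alpha in alphas:
        # D_k(alpha) = exp[w_k^2 / (2 + 1/alpha)] / sqrt(1 + 2 alpha) - 1, always > -1
        d = np.exp(w**2 / (2.0 + 1.0 / alpha)) / np.sqrt(1.0 + 2.0 * alpha) - 1.0
        # sum_k ln[1 + xi D_k] for every xi on the grid at once
        sums = np.log1p(np.outer(xis, d)).sum(axis=1)
        best = max(best, float(sums.max()))
    return best

def simulate_n_stat(n, trials, seed=0):
    """Draw `trials` noise-only realizations of N*L, each with n samples w_k."""
    rng = np.random.default_rng(seed)
    alphas = np.logspace(-3, 2, 40)   # illustrative grid, not from the text
    xis = np.logspace(-4, 0, 40)      # 1e-4 <= xi <= 1
    return np.array([n_times_stat(rng.standard_normal(n), alphas, xis)
                     for _ in range(trials)])

samples = simulate_n_stat(n=1000, trials=200)
for threshold in (0.5, 1.0, 2.0):
    # empirical P(N L > threshold) next to alpha_0 exp(-beta_0 threshold)
    print(threshold, np.mean(samples > threshold), 0.42 * np.exp(-1.08 * threshold))
```

Comparing the empirical tail fractions with $\alpha_0 e^{-\beta_0 x}$ gives only a rough consistency check; more trials and a finer or adaptive maximizer would be needed for a quantitative fit.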
2667: 
2668: 
2669: \begin{figure}
2670: \begin{center}
2671: \epsfig{file=fa.eps,width=8.5cm}
2672: \caption{The cumulative distribution function for the leading order
2673: expression \protect{(\ref{lambdadef5})} for the statistic when no
2674: signal is present, obtained numerically.  The solid line is for $N =
2675: 1000$, and the dashed line for $N = 5000$.
2676: }
2677: \label{fig:fa}
2678: \end{center}
2679: \end{figure}
2680: 
2681: 
2682: 
2683: 
Finally, we explain why it is plausible to expect the distribution of
2685: $N {\cal L}$ to be independent of $N$ in the large $N$ limit.  The
2686: numerical maximizations over $\xi$ and $\alpha$ in Eq.\
2687: (\ref{lambdadef5}) show that the maximum is nearly always achieved at
2688: $\alpha \ll 1$ or $\xi \ll 1$.  In both these regimes, one can obtain
2689: some information about the $N$-dependence of the statistic.
2690: 
2691: Consider first the regime $\xi \ll 1$.  In this regime we can expand
2692: the expression (\ref{lambdadef5}) as a power series in $\xi$ to obtain
2693: \begin{equation}
2694: {\cal L} = \max_{\alpha,\xi} 
2695: {1 \over N} \sum_{k=1}^N \, \left[ \xi {\cal D}_k(\alpha) - {1 \over
2696:     2} \xi^2 {\cal D}_k(\alpha)^2 + O(\xi^3) \right] + O(\epsilon).
2697: \label{lambdadef6}
2698: \end{equation}
2699: The generalized central limit theorem (reviewed in Appendix
2700: \ref{app:gclt}) implies that
2701: \begin{equation}
2702: {1 \over N} \sum_{k=1}^N \, {\cal D}_k(\alpha) = N^{1 - \gamma_1 \over 
2703:   \gamma_1} \left( \ln N \right)^{\delta_1} {\cal
2704:   F}_N(\alpha),
2705: \label{levy1}
2706: \end{equation}
2707: where for each fixed $\alpha$, the distribution of the random variable ${\cal
2708: F}_N(\alpha)$ becomes independent of $N$ in the large $N$ limit.
2709: Here 
2710: \begin{equation}
2711: \gamma_1 = \left\{ \begin{array}{ll} 2 & 0 <
2712:         \alpha \le 1/2 \\ 
2713:         1 + {1 \over 2 \alpha} & 
2714:         1/2 \le \alpha \\ \end{array} \right.
2715: \end{equation}
2716: and
2717: \begin{equation}
2718: \delta_1 = \left\{ \begin{array}{ll} 0 & 0 <
2719:         \alpha \le 1/2 \\ 
2720:          { -\alpha \over 1 + 2 \alpha} & 
2721:         1/2 \le \alpha.\\ \end{array} \right.
2722: \end{equation}
2723: The limiting distribution is a Levy distribution with parameters $p =
2724: 1$ and $\gamma = \gamma_1$.
2725: Similarly we have
2726: \begin{equation}
2727: {1 \over N} \sum_{k=1}^N \, {\cal D}_k(\alpha)^2 = N^{1 - \gamma_2 \over
2728:   \gamma_2} \left( \ln N \right)^{\delta_2} {\cal 
2729:   G}_N(\alpha),
2730: \label{levy2}
2731: \end{equation}
2732: where as $N \to \infty$ at each fixed $\alpha$ the distribution of the
2733: random variable ${\cal G}_N(\alpha)$ tends to a Levy distribution with
2734: parameters $p=1$ and $\gamma = \gamma_2$, with
2735: \begin{equation}
2736: \gamma_2 = \left\{ \begin{array}{ll} 2 & 0 <
2737:         \alpha \le 1/6 \\ 
2738:          {1 + 2 \alpha \over 4 \alpha} & 
2739:         1/6 \le \alpha.\\ \end{array} \right.
2740: \end{equation}
2741: and 
2742: \begin{equation}
2743: \delta_2 = \left\{ \begin{array}{ll} 0 & 0 <
2744:         \alpha \le 1/6 \\ 
2745:          { - 2 \alpha \over 1 + 2 \alpha} & 
2746:         1/6 \le \alpha. \\ \end{array} \right.
2747: \end{equation}
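The exponents $\gamma_1$ and $\gamma_2$ can be traced to the tails of ${\cal D}_k(\alpha)$.  For $t > -1$ one has ${\cal D}_k(\alpha) > t$ exactly when $w_k^2 > (2 + 1/\alpha) \ln[(1+t)\sqrt{1+2\alpha}]$, so $P({\cal D}_k > t)$ falls off as $t^{-[1 + 1/(2\alpha)]}$ times a slowly varying logarithmic factor; the variance is therefore infinite precisely when $\alpha > 1/2$.  Since $P({\cal D}_k^2 > t) = P({\cal D}_k > \sqrt{t})$ for $t \ge 1$, the tail exponent of ${\cal D}_k^2$ is $(1+2\alpha)/(4\alpha)$, which exceeds 2 for $\alpha < 1/6$.  The Python sketch below evaluates this exact survival function and its log-log slope far out in the tail; the evaluation points $t_1 = 10^8$ and $t_2 = 10^{10}$ are arbitrary illustrative choices, and the slope approaches $-\gamma_1$ only slowly because of the logarithmic factor.

```python
import math

def survival_of_d(t, alpha):
    """Exact P(D_k(alpha) > t) for w_k ~ N(0, 1), alpha > 0 and t > -1."""
    # D_k > t  <=>  w_k^2 > u  with  u = (2 + 1/alpha) ln[(1 + t) sqrt(1 + 2 alpha)]
    u = (2.0 + 1.0 / alpha) * math.log((1.0 + t) * math.sqrt(1.0 + 2.0 * alpha))
    if u <= 0.0:
        return 1.0  # the inequality holds for (almost) every w_k
    return math.erfc(math.sqrt(u / 2.0))  # P(|w_k| > sqrt(u))

def tail_slope(alpha, t1=1e8, t2=1e10):
    """Log-log slope of P(D_k > t) between t1 and t2."""
    return ((math.log(survival_of_d(t2, alpha)) - math.log(survival_of_d(t1, alpha)))
            / (math.log(t2) - math.log(t1)))

for alpha in (0.75, 1.0, 2.0, 5.0):
    gamma1 = 1.0 + 1.0 / (2.0 * alpha)
    # the tail exponent of D_k^2 is half that of D_k, i.e. (1 + 2 alpha) / (4 alpha)
    print(alpha, tail_slope(alpha), -gamma1)
```

For these values of $\alpha$ the measured slopes differ from $-\gamma_1$ by a few hundredths.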
2748: 
2749: 
2750: 
2751: We now substitute the results (\ref{levy1}) and (\ref{levy2}) into the
2752: expression (\ref{lambdadef6}) for the statistic, and maximize analytically over
2753: the quadratic dependence on $\xi$.  For $\alpha \ge 1/2$, the value of
2754: $\xi$ which achieves the maximum goes to zero as $N \to \infty$,
2755: consistent with the assumption $\xi \ll 1$, and the result is
2756: \footnote{For $\alpha < 1/2$ this argument fails, which is why we must
2757: numerically verify that the distribution of $N {\cal L}$ is
2758: asymptotically independent of $N$.}
2759: \begin{equation}
2760: N {\cal L} = {1 \over 2 } \max_\alpha {{\cal F}_N(\alpha)^2 \over {\cal
2761:     G}_N(\alpha)} + O(\epsilon).
2762: \end{equation}
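The analytic maximization over $\xi$ uses $\max_{\xi}(\xi A - \xi^2 B/2) = A^2/(2B)$ (for $A > 0$), attained at $\xi = A/B$, with $A$ and $B$ the sums (\ref{levy1}) and (\ref{levy2}).  The bookkeeping of powers of $N$ and $\ln N$ can be sketched in Python as follows; it uses the piecewise exponents listed above, taking the first branch at the shared endpoints purely for definiteness.  For $\alpha > 1/2$ it confirms that $A^2/B$ carries exactly $N^{-1}$ with no logarithm, so that $N {\cal L}$ is independent of $N$, and that $\xi = A/B$ scales as $N^{-1/\gamma_1}$ up to logarithms and hence goes to zero.

```python
def levy_exponents(alpha):
    """(gamma_1, delta_1, gamma_2, delta_2) as given in the text, for alpha > 0."""
    if alpha <= 0.5:
        gamma1, delta1 = 2.0, 0.0
    else:
        gamma1, delta1 = 1.0 + 1.0 / (2.0 * alpha), -alpha / (1.0 + 2.0 * alpha)
    if alpha <= 1.0 / 6.0:
        gamma2, delta2 = 2.0, 0.0
    else:
        gamma2 = (1.0 + 2.0 * alpha) / (4.0 * alpha)
        delta2 = -2.0 * alpha / (1.0 + 2.0 * alpha)
    return gamma1, delta1, gamma2, delta2

def scaling_of_maximum(alpha):
    """Powers of N and ln N in xi_hat = A/B and in A^2/(2B), where
    A = N^{(1-g1)/g1} (ln N)^{d1} F and B = N^{(1-g2)/g2} (ln N)^{d2} G."""
    g1, d1, g2, d2 = levy_exponents(alpha)
    pa, pb = (1.0 - g1) / g1, (1.0 - g2) / g2
    return {
        "xi_hat_N_power": pa - pb,
        "max_N_power": 2.0 * pa - pb,
        "max_log_power": 2.0 * d1 - d2,
    }

for alpha in (0.6, 1.0, 3.0, 10.0):
    print(alpha, scaling_of_maximum(alpha))
```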
2763: 
2764: 
2765: In the regime $\alpha \ll 1$, if we expand the expression
2766: (\ref{lambdadef5}) to quadratic order in $\alpha$, the result is an
2767: expression which is a linear function of $1/\xi$ at fixed $\alpha
2768: \xi$.  Hence, when one 
2769: maximizes over values of $\xi$ in the range $0 \le \xi \le 1$, the
2770: maximum is always achieved either at $\xi =0$ or $\xi =1$.  One can
2771: show that the maximum to this order is always achieved at $\xi = 1$, and the
2772: resulting expression is
2773: \begin{equation}
2774: N {\cal L} = {1 \over 4 } {\cal G}^2 + O(\epsilon),
2775: \end{equation}
2776: where 
2777: \begin{equation}
2778: {\cal G} = \sqrt{N} \left[ {1 \over N} \sum_{k=1}^N w_k^2 \ -1 \right]
2779: \end{equation}
2780: has a distribution that is independent of $N$ in the large $N$ limit.
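The small-$\alpha$ result can be compared with an exact calculation along the line $\xi = 1$.  There, each term of Eq.\ (\ref{lambdadef5}) is $\ln[1 + {\cal D}_k(\alpha)] = -\frac{1}{2}\ln(1+2\alpha) + \alpha w_k^2/(1+2\alpha)$, so the statistic depends on the data only through $m = N^{-1}\sum_k w_k^2$.  For $m > 1$ the maximum over $\alpha \ge 0$ is attained at $1 + 2\alpha = m$ and equals $(m-1)/2 - \frac{1}{2}\ln m$, while for $m \le 1$ it is attained at $\alpha = 0$.  Writing $m = 1 + {\cal G}/\sqrt{N}$ then gives $N{\cal L} = {\cal G}^2/4 - {\cal G}^3/(6\sqrt{N}) + \ldots$ for ${\cal G} > 0$.  The short Python sketch below, with illustrative function names, compares the two expressions.

```python
import math

def n_stat_xi_one(m, n):
    """N times the maximum over alpha >= 0 of Eq. (lambdadef5) restricted to xi = 1.

    m is the sample mean (1/N) sum_k w_k^2.  At xi = 1 each term is
    ln[1 + D_k(alpha)] = -ln(1 + 2 alpha)/2 + alpha w_k^2 / (1 + 2 alpha),
    whose average is stationary in alpha at 1 + 2 alpha = m.
    """
    if m <= 1.0:
        return 0.0  # the average decreases in alpha, so the maximum is at alpha = 0
    return n * (0.5 * (m - 1.0) - 0.5 * math.log(m))

def n_stat_leading(m, n):
    """Leading-order small-alpha result G^2 / 4, with G = sqrt(N) (m - 1)."""
    g = math.sqrt(n) * (m - 1.0)
    return 0.25 * g * g

n = 10**6
m = 1.0 + 2.0 / math.sqrt(n)   # corresponds to G = 2
print(n_stat_xi_one(m, n), n_stat_leading(m, n))   # differ by about G^3 / (6 sqrt(N))
```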
2781: 
2782: \section{Generalized central limit theorem}
2783: \label{app:gclt}
2784: 
2785: In this appendix we review the generalized central limit theorem that
2786: can be found on p.~574 of Ref.~\cite{Feller}.  First we define a
2787: particular distribution function called the Levy distribution.  It
depends on three real parameters: a positive constant $C$, a parameter
$\gamma$ in the range $0 < \gamma \le 2$, and a constant $p$ in the
range $0 \le p \le 1$.\footnote{The parameter $\gamma$ is conventionally denoted by $\alpha$.  We use $\gamma$ here to avoid confusion with the variable
$\alpha$ defined in Eq.\ (\ref{eq:sigg}).}  We say a random 
2792: variable $X$ has a Levy distribution with parameters $C$, $\gamma$ and
2793: $p$ if the characteristic function of $X$ is given by
2794: \begin{eqnarray}
2795: \left< e^{i \zeta X} \right> &=& \exp \bigg\{ | \zeta|^\gamma { C
2796:     \Gamma(3-\gamma) \over \gamma (\gamma-1)} \bigg[ \cos(\pi
2797:     \gamma/2) \nonumber \\
2798:   &&  + i \, {\rm sgn}(\zeta) (p-q) \sin(\pi \gamma/2) \bigg] \bigg\},
2799: \end{eqnarray}
2800: where $q = 1 - p$.
2801: The corresponding probability distribution function  is obtained by
2802: taking a Fourier transform and decays like $x^{-(1+\gamma)}$ at large
2803: $x$ for $\gamma < 2$ ($\gamma =2$ is the Gaussian case).
2804: 
2805: Consider now a random
2806: variable $X$ with probability
2807: distribution function $f(x)$ whose variance is infinite.  Let
2808: \begin{equation}
2809: F(x) = \int_{-\infty}^x dy \, f(y)
2810: \end{equation}
2811: be the cumulative distribution function
2812: and define 
2813: \begin{equation}
\mu(x) = \int_{-x}^x dy \, y^2 f(y).
2815: \end{equation}
2816: Suppose that
2817: the distribution satisfies the following conditions:
2818: (i)  As $x \to \infty$ we have $\mu(x) \sim x^{2 - \gamma} L(x)$,
2819: where $0 < \gamma \le 2$, and $L(x)$ varies slowly in the sense that
2820: $L(tx)/L(t) \to 1$ as $t \to \infty$ for all $x>0$.  (ii) We have
2821: \begin{equation}
2822: {1 - F(x) \over F(-x) + 1 - F(x)} \to p \ \ \ \ \ {F(-x) \over F(-x) +
2823:   1 - F(x)} \to q
2824: \end{equation}
2825: as $x \to \infty$, where $0 \le p \le 1$, $0 \le q \le 1$ and $p+q=1$.
2826: (iii) For $1 < \gamma \le 2$, we assume that the expected value $\int
2827: dx \, x f(x)$ vanishes; this can be enforced by making a
2828: transformation of the form $X \to X + {\rm constant}$.
2829: 
2830: We define the sequence of random variables
2831: \begin{equation}
2832: S_N = {1 \over a_N} \sum_{i=1}^N \, X_i,
2833: \end{equation}
2834: where the $X_i$ are independent, identically distributed random
2835: variables with distribution function $f$, and the constants
2836: $a_N$ are 
2837: chosen to satisfy
2838: \begin{equation}
2839: {N \mu(a_N)  \over a_N^2} \to C
2840: \end{equation}
2841: as $N \to \infty$, where $C$ is a positive constant.  Then, the
2842: distribution functions of the random
2843: variables $S_N$ converge to a Levy distribution with parameters $C$,
2844: $\gamma$ and $p$ as $N \to \infty$.
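To illustrate how the normalization condition fixes $a_N$, the Python sketch below uses a hypothetical symmetric Pareto density $f(y) = (\gamma/2)|y|^{-(1+\gamma)}$ for $|y| \ge 1$ (an example chosen here, not taken from Ref.~\cite{Feller}).  For it, $\mu(x) = \gamma (x^{2-\gamma} - 1)/(2-\gamma)$ when $x \ge 1$, so condition (i) holds with constant $L$, and $p = q = 1/2$.  Since $N\mu(a)/a^2$ first rises and then falls, the sketch takes the larger root of $N \mu(a_N)/a_N^2 = C$ by bisection.  The result grows like $N^{1/\gamma}$, which is the origin of the factor $N^{(1-\gamma)/\gamma}$ multiplying the sample means in Eqs.\ (\ref{levy1}) and (\ref{levy2}), since $N^{-1}\sum_i X_i = (a_N/N) S_N$.

```python
import math

def mu_pareto(x, gamma):
    """mu(x) for the hypothetical density f(y) = (gamma/2)|y|^-(1+gamma), |y| >= 1."""
    if x <= 1.0:
        return 0.0
    return gamma * (x ** (2.0 - gamma) - 1.0) / (2.0 - gamma)

def normalization(n, gamma, c=1.0, iters=200):
    """Larger root a_N of N mu(a_N) / a_N^2 = C, by bisection (0 < gamma < 2)."""
    def excess(a):
        return n * mu_pareto(a, gamma) / a**2 - c
    lo = (2.0 / gamma) ** (1.0 / (2.0 - gamma))  # where N mu(a)/a^2 peaks
    if excess(lo) <= 0.0:
        raise ValueError("N is too small for this C")
    hi = 2.0 * lo
    while excess(hi) > 0.0:       # bracket the root on the decreasing branch
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for gamma in (0.8, 1.5):
    a1, a2 = normalization(10**6, gamma), normalization(2 * 10**6, gamma)
    print(gamma, a2 / a1, 2.0 ** (1.0 / gamma))
```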
2845: 
2846: 
2847: \begin{thebibliography}{0}
2848: 
2849: \bibitem{ligo}
2850: A.  Abramovici \emph{et al.}, Science {\bf 256}, 325 (1992).
2851: %A.  Abramovici, W. E.  Althouse, R. W. P. Drever, Y. G\"{u}rsel, S. Kawamura, F. J. Raab,
2852: %D. Shoemaker, L. Siewers, R. E. Spero, K. S. Thorne, R. E. Vogt, R. Weiss, S. E. Whitcomb, 
2853: %and M. E. Zucker, 
2854: %\emph{LIGO: The laser interferometer gravitational-wave observatory}, Science, 256, 325--333, (1992)
2855: 
2856: \bibitem{virgo}
C. Bradaschia \emph{et al.}, Nucl. Instrum. Methods Phys. Res. A {\bf 289}, 518 (1990).
2858: %C. Bradaschia, R. del Fabbro,  A. Virgilio, A. Giazotto, H. Kautzky,  V. Montelatici, D. Passuello,  
2859: %A. Brillet, O. Cregut, P. Hello, C. N. Man, P. T. Manh, A. Marraud, D. Shoemaker, J. Y. Vinet, F. Barone, 
2860: %L. di Fiore, L. Milano, G. Russo, J. M. Aguirregabiria, H. Bel, J. P. Duruisseau, G. le Denmat, P. Tourrenc, 
2861: %M. Capozzi, M. Longo, M. Lops, I. Pinto, G. Rotoli, T. Damour, S. Bonazzola, J. A. Marck, Y. Gourghoulon, 
2862: %L. E. Holloway, F. E. Fuligni, V. Iafolla, and G. Natale, 
2863: %\emph{The VIRGO project: a wide band antenna for gravitational wave detection}
2864: 
2865: \bibitem{geo}
2866: R. Schilling, AIP Conf. Proc. {\bf 456}, 217 (1998).
2867: % Second international LISA symposium on the detection and observation of gravitational waves in space 
2868: %\emph{The GEO 600 ground-based interferometer for the detection of gravitational waves}
2869: % edited by William M. Folkner, Jet Propulsion Laboratory, California Institute of Technology,
2870: % Pasadena CA, December 1998
2871: 
2872: \bibitem{tama}
2873: M. K. Fujimoto, Journal of the Communications Research Laboratory {\bf 46}, 437 (1999).
2874: %\emph{Japanese gravitational wave detector-TAMA 300},
2875: %Journal of the Communications Research Laboratory, 46, 3, 437-440, (1999)
2876: 
2877: \bibitem{Allen Review}
2878: B. Allen, in
2879: \emph{Relativistic Gravitation and Gravitational Radiation, Proceedings of the 
Les Houches School of Physics, Les Houches, 1995}, edited by J. A. Marck and J. P. Lasota (CNRS, Observatoire de Paris, Meudon, 1997), 
2881: p. 373. 
2882: %\emph{The stochastic gravity-wave background: sources and detection},
2883: %Proceedings of the Les Houches School on Astrophysical Sources of Gravitational 
2884: %waves, Cambridge University Press, (1996)
2885: 
2886: \bibitem{gaussian supernovae}
2887: D. Blair and L. Ju, Mon. Not. R. Astron. Soc. {\bf 283}, 648 (1996).
2888: %\emph{A cosmological background of gravitational waves produced by supernovae in the early universe},
2889: 
2890: \bibitem{non gaussian supernovae}
2891: V. Ferrari, S. Matarrese, and R. Schneider, Mon. Not. R. Astron. Soc. {\bf 303}, 247 (1999).
2892: %\emph{Gravitational wave background from a cosmological population of core-collapse supernovae},
2893: 
2894: \bibitem{first stars}
2895: R. Schneider \emph{et al.}, Mon. Not. R. Astron. Soc. {\bf 317}, 385 (2000).
2896: %\emph{Gravitational wave signals from the collapse of the first stars},
2897: 
2898: \bibitem{gaussian neutron stars 1}
2899: V. Ferrari, S. Matarrese, and R. Schneider, Mon. Not. R. Astron. Soc. {\bf 303}, 258 (1999).
2900: %\emph{Stochastic background of gravitational waves generated by a cosmological population
2901: %of young, rapidly rotating neutron stars},
2902: 
2903: \bibitem{gaussian neutron stars 2}
2904: T. Regimbau and J. A. de Freitas Pacheco, astro-ph/0105260.
2905: %\emph{Cosmic background of gravitational waves from rotating neutron stars},
2906: 
2907: \bibitem{cosmic strings}
T. Damour and A. Vilenkin, Phys. Rev. D {\bf 64}, 064008 (2001) (also
gr-qc/0104026).
2910: %\emph{Gravitational wave bursts from cusps and kinks on cosmic strings},
2911: 
2912: \bibitem{bubbles}
2913: M. Kamionkowski, A. Kosowsky, and M. S. Turner, Phys. Rev. D {\bf 49}, 2837 (1994).
2914: %\emph{Gravitational radiation from first-order phase transitions},
2915: 
2916: \bibitem{inflation} 
2917: L.P. Grishchuk, Zh. Eksp. Teor. Fiz. {\bf 67}, 825
2918: (1974) [Sov. Phys. JETP {\bf 40}, 409 (1975)];
2919: E.W. Kolb and M.S. Turner, {\it The Early Universe} (Addison-Wesley,
2920: Redwood, CA, 1990), and references therein.
2921: 
2922: \bibitem{binaries}
R. Schneider, V. Ferrari, S. Matarrese, and S. F. Portegies Zwart, 
Mon. Not. R. Astron. Soc. {\bf 324}, 797 (2001)
2925: (also astro-ph/0002055).
2926: %\emph{Gravitational waves from cosmological compact binaries}, 
2927: 
2928: \bibitem{Coward}
2929: D. M. Coward, R. R. Burman, and D. G. Blair, Mon. Not. R. Astron. Soc. {\bf 324}, 1015 (2001).
2930: %\emph{Simulating a stochastic background of gravitational waves from neutron star formation at cosmological distances},
2931: 
2932: \bibitem{Hogan}
2933: C. J. Hogan, Phys. Rev. D {\bf 62}, 121302 (2000).
2934: %\emph{Scales of the extra dimensions and their gravitational wave backgrounds},
2935: 
2936: \bibitem{Michelson}
2937: P. F. Michelson, Mon. Not. R. Astron. Soc. {\bf 227}, 933 (1987).
2938: %\emph{On detecting stochastic background gravitational radiation with terrestrial detectors},
2939: 
2940: \bibitem{Christensen}
2941: N. Christensen, Phys. Rev. D {\bf 46}, 5250 (1992).
2942: %\emph{Measuring the stochastic gravitational-radiation background with laser interferometric antennas},
2943: 
2944: \bibitem{Flanagan}
2945: \'{E}. \'{E}. Flanagan, Phys. Rev. D {\bf 48}, 2389 (1993).
2946: %\emph{Sensitivity of the Laser Interferometer Gravitational Wave Observatory to a stochastic background and its dependence
2947: %on the detector orientations}
2948: 
2949: \bibitem{Allen Romano}
2950: B. Allen and J. D. Romano, Phys. Rev. D {\bf 59}, 102001 (1999) (also
2951: gr-qc/9710117).  Note that the criticism in this paper of the upper
2952: limit formula given in Eq. (6.5) of Ref.\ \cite{Flanagan} is incorrect
2953: as it misinterprets that formula as a frequentist upper limit rather
2954: than a Bayesian upper limit.
2955: %\emph{Detecting a stochastic background of gravitational radiation: Signal processing strategies and sensitivities},
2956: 
2957: \bibitem{robust gaussian}
B. Allen, J. D. E. Creighton, \'{E}. \'{E}. Flanagan, and
2959: J. D. Romano, Phys. Rev. D {\bf 65}, 122002 (2002) (also 
2960: gr-qc/0105100).
2961: %\emph{Robust statistics for deterministic and stochastic
2962: %gravitational waves in non-Gaussian noise.  I: Frequentist analyses},
2963: 
2964: \bibitem{robust gaussian II}
B. Allen, J. D. E. Creighton, \'{E}. \'{E}. Flanagan, and
2966: J. D. Romano, gr-qc/0205015. 
2967: %\emph{Robust statistics for deterministic and stochastic
2968: %gravitational waves in non-Gaussian noise.  II: Bayesian analyses},
2969: 
2970: \bibitem{Klimenko and Mitselmakher}
2971: S. Klimenko and G. Mitselmakher, LIGO Technical Report
2972: LIGO-T010125-00-D, 2001, (unpublished); {\it ibid.}, gr-qc/0208007.
2973: % A cross-correlation technique wavelet domain for detection of stochastic gravitational waves
2974: 
2975: \bibitem{general method}
2976: L. S. Finn, Phys. Rev. D {\bf 46}, 5236 (1992).
2977: %\emph{Detection, measurement, and gravitational radiation},
2978: 
2979: \bibitem{excess power}
2980: W. G. Anderson, P. R. Brady, J. D. E. Creighton, and \'{E}. \'{E}. Flanagan, Phys. Rev. D {\bf 63}, 042003 (2001)
2981: 
2982: \bibitem{sam joe}
2983: L. S. Finn and J. D. Romano, in preparation.
2984: %\emph{Detecting stochastic gravitational waves: Performance of maximum-likelihood and cross-correlation statistics}
2985: 
2986: \bibitem{sam unpublished}
2987: L. S. Finn, in preparation.
2988: %\emph{?}
2989: 
2990: \bibitem{maximum likelihood}
P. J. Bickel and K. A. Doksum,
\emph{Mathematical Statistics: Basic Ideas and Selected Topics} (Holden-Day, Inc., California, 1977),
Sec. 6.4.
2994: 
2995: \bibitem{Papoulis}
A. Papoulis, edited by S. W. Director, \emph{Probability, Random Variables, and Stochastic Processes},
2nd ed. (McGraw-Hill, New York, 1984).
2998: 
2999: \bibitem{Neyman and Pearson}
J. Neyman and E. S. Pearson, Philos. Trans. R. Soc. London Ser. A {\bf 231}, 289 (1933).
3001: 
3002: \bibitem{Ferguson}
T. S. Ferguson, edited by Z. W. Birnbaum and E. Lukacs, \emph{Mathematical Statistics: A Decision Theoretic Approach} (Academic Press, New York, 1967).
3004: 
3005: \bibitem{Bayes}
T. Bayes and R. Price, Philos. Trans. R. Soc. London {\bf 53}, 370 (1763).
3007: 
3008: \bibitem{Loredo}
T. J. Loredo, in \emph{Astronomical Data Analysis Software and Systems VIII}, edited by R. (Dick) Crutcher
and D. Mehringer, ASP Conference Series Vol. 172 (Astronomical Society of the Pacific, San Francisco, 1999), p. 297.
3011: %\emph{Computational technology for Bayesian inference},
3012: %astronomical data analysis software and systems VIII, ed. R. (Dick) Crutcher and D. Mehringer,
3013: %San Francisco: Astronomical Society of the Pacific, p. 297-306,(1999)
3014: 
3015: \bibitem{waveform catalog}
3016: T. Zwerger and E. M\"{u}ller, Astron. Astrophys. {\bf 320}, 209 (1997).
3017: %\emph{Dynamics and gravitational wave signature of axisymetric rotational core collapse},
3018: 
3019: \bibitem{new waveform catalog}
H. Dimmelmeier, J. A. Font, and E. M\"{u}ller, Astron. Astrophys. {\bf 393}, 523 (2002).
3021: %\emph{Relativistic simulations of rotational core collapse. II. Collapse and gravitational radiation}
3022: %astro-ph/0204289
3023: 
3024: \bibitem{MG9}
S. Drasco and \'{E}. \'{E}. Flanagan, {\it Detecting a non-Gaussian
stochastic background of gravitational radiation}, in \emph{Proceedings of the
Ninth Marcel Grossmann Meeting on General Relativity},
edited by V. G. Gurzadyan, R. T. Jantzen, and R. Ruffini (World
Scientific, Singapore, 2001) (also gr-qc/0101051).
3030: %in \emph{Proceedings of the Ninth Marcel Grossmann 
3031: %Meeting on General Relativity}, edited by V. G. Gurzadyan,  R. T. Jantzen and 
3032: %R. Ruffini, (World Scientific, Singapore, 2002), p. 1917 [?].
3033: 
3034: \bibitem{wainstein zubakov}
L. A. Wainstein and V. D. Zubakov, translated from Russian by R. A. Silverman,
\emph{Extraction of Signals from Noise} (Prentice-Hall, Inc., New Jersey, 1962).
3037: 
3038: 
3039: 
3040: \bibitem{Feller}
W. Feller, {\it An Introduction to Probability Theory and Its
  Applications}, Vol. II (Wiley, New York, 1971).
3043: 
3044: \end{thebibliography}
3045: 
3046: \end{document}
3047: