0210:gr-qc0210032/ms.tex

1: \documentclass[twocolumn,aps,prd,amssymb,eqsecnum,floatfix,nofootinbib]{revtex4}

2: %\documentclass[twocolumn,aps,prd,amssymb,eqsecnum,floatfix]{revtex4}

3: %\documentclass[preprint,aps,prd,amssymb,eqsecnum,floatfix]{revtex4}

4: \usepackage{epsfig}

5: \usepackage{amsmath}

6: \DeclareMathOperator{\erfc}{erfc}

7: \DeclareMathOperator{\erf}{erf}

8:

9: \begin{document}

10:

11: \title{Detection methods for non-Gaussian gravitational wave stochastic backgrounds}

12: \author{Steve Drasco\footnote{sd68@cornell.edu}}

13: %\email{sd68@cornell.edu}

14: \author{\'{E}anna \'{E}. Flanagan\footnote{eef3@cornell.edu.  Also

15:     Radcliffe Institute for Advanced Study, Putnam House, 10 Garden

16:     Street, Cambridge, MA 02138.}}

17: %\email{eef3@cornell.edu}

18: \affiliation{Newman Laboratory of Nuclear Studies, Cornell University, Ithaca, New York 14853}

19: %\affiliation{Center for Radiophysics and Space Research, Cornell University, Ithaca, New York 14853}

20: \date{\today}

21:

22: \begin{abstract}

23: A gravitational wave stochastic background can be produced by a

24: collection of independent gravitational wave events.  There are two

25: classes of such backgrounds, one for which the ratio of the average

26: time between events to the average duration of an event is small

27: (i.e., many events are on at once), and one for which the ratio is

28: large.  In the first case the signal is continuous, sounds something

29: like a constant {\em hiss}, and has a Gaussian probability

30: distribution.  In the second case, the discontinuous or intermittent

31: signal sounds something like popcorn popping, and is described by a

32: non-Gaussian probability distribution.  In this paper we address the

33: issue of finding an optimal detection method for such a non-Gaussian

34: background.  As a first step, we examine the idealized situation in

35: which the event durations are short compared to the detector sampling

36: time, so that the time structure of the events cannot be resolved, and

37: we assume white, Gaussian noise in two collocated, aligned

38: detectors.  For this situation we derive an appropriate version of the

39: maximum likelihood detection statistic.  We compare the performance of

40: this statistic to that of the standard cross-correlation statistic

41: both analytically and with Monte Carlo simulations.

42: In general the maximum likelihood statistic performs better than the

43: cross-correlation statistic when the stochastic background is

44: sufficiently non-Gaussian, resulting in a gain factor in the minimum

45: gravitational-wave energy density necessary for detection.

46: This gain factor ranges roughly between 1 and 3, depending on the duty

47: cycle of the background, for realistic observing times and signal

48: strengths for both ground and space based detectors.

49: The computational cost of the statistic, although significantly greater

50: than that of the cross-correlation statistic, is not unreasonable.

51: Before the statistic can be used in practice with real detector data,

52: further work is required to generalize our analysis to accommodate

53: separated, misaligned detectors with realistic, colored, non-Gaussian noise.

54: \end{abstract}

55:

56: \pacs{04.80.Nn, 04.30.Db, 95.55.Ym, 07.05.Kf}

57:

58: \keywords{gravitational waves; stochastic background}

59:

60: \maketitle

61:

62: \section{Introduction and summary}

63: \label{s:Introduction and summary}

64:

65: Along with a new generation of gravitational wave detectors around the world \cite{ligo,virgo,geo,tama}, detection

66: algorithms for a variety of sources are nearing completion. If the

67: signals from these sources are

68: detected, physicists stand to harvest unprecedented quantities of observational information concerning the

69: nature of gravitation and the cosmos as a whole.  The fruit of this harvest will be the outputs of detection

70: algorithms.  In this paper we introduce an algorithm designed for nearly optimal detection of a class of

71: gravitational wave stochastic backgrounds. The non-Gaussian nature of this class of backgrounds causes

72: the algorithm presented here to differ from the well studied cross-correlation based algorithms which are

73: nearly optimal for Gaussian backgrounds.

74:

75: \subsection{Gravitational wave stochastic backgrounds}

76: \label{ss:Gravitational stochastic backgrounds}

77:

78: Consider a large collection of similar gravitational wave sources.  If

79: we cannot resolve the individual signals produced by these sources and

80: know only their statistical properties, the signals form a stochastic background.

81: A wide variety of candidate sources of gravitational wave stochastic backgrounds have been studied

82: (for an excellent general review see Ref.~\cite{Allen Review}).

83: These include high redshift supernovae \cite{gaussian supernovae, non gaussian supernovae},

84: the first stars or so-called population III objects \cite{first stars},

85: rapidly rotating young neutron stars \cite{gaussian neutron stars 1, gaussian neutron stars 2},

86: early universe phase transitions and cosmic strings \cite{cosmic strings, bubbles},

87: inflation \cite{inflation},

88: and high redshift compact binaries \cite{binaries}.

89:

90: Detecting a gravitational wave stochastic background produced by any one of these candidate sources could

91: provide information on a variety of topics ranging from the evolution

92: of the star formation rate \cite{Coward} to the numbers and sizes of posited extra dimensions \cite{Hogan}.

93: Because of this, stochastic backgrounds have long been thought to be among the most interesting

94: possible types of gravitational radiation.

95:

96: \subsection{Gaussian stochastic backgrounds}

97: \label{ss:Gaussian stochastic backgrounds}

98:

99: In order to develop detection methods, it is traditionally assumed that the individual events making up a

100: background are uncorrelated and sufficiently frequent for the background to be Gaussian.  That is, it is

101: assumed that the conditions for applicability of the central limit theorem are satisfied.

102:

103: Unlike electromagnetic waves, gravitational waves cannot be screened from a detector.

104: Using a single gravitational wave detector,  there is no practical way to distinguish between

105: detector noise and a stochastic background of gravitational waves.

106: As a consequence the sensitivity of a single detector to gravitational backgrounds is severely limited.

107: By comparing the outputs of multiple detectors, sensitivity levels can be enhanced.

108: Michelson \cite{Michelson} was the first to give a detailed description of such a detection

109: method for a Gaussian stochastic background of gravitational waves in the presence of Gaussian

110: detector noise.  His detection strategy and its later refinements \cite{Christensen,Flanagan,Allen Romano} are

111: often referred to as the cross-correlation method.

112: Recently the cross-correlation method has been modified to treat more realistic detectors

113: which themselves have sources of non-Gaussian noise \cite{robust

114: gaussian, robust gaussian II, Klimenko and Mitselmakher}.

115:

116: We now briefly review the cross-correlation method.

117: Consider two gravitational wave detectors.

118: The output of each detector is a collection of dimensionless strain measurements.

119: Suppose that $N$ such measurements are made with each detector at regular time intervals. Denote these measurements

120: by an $N \times 2$ matrix $h$ with components $h_i^k$,

121: where $i=1,2$ labels the detector, and $k=1,2,\ldots,N$ is a time index.

122: To determine whether or not the data $h$ contains some desired signal,

123: one usually

124: compares the value of some detection statistic $\Gamma(h)$ to a threshold value $\Gamma_*$.  That is,

125: if $\Gamma(h) > \Gamma_*$ one concludes that a signal is present and otherwise

126: one concludes that no signal is present.

127: A detection statistic is said to be optimal if it yields the smallest probability of mistakenly concluding a signal

128: is present (false alarm probability) after choosing a threshold which fixes the probability for

129: mistakenly concluding a signal is absent (false dismissal probability).
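
As an illustration, the two error probabilities just described can be written schematically as follows.  The symbols $P_{\text{FA}}$ and $P_{\text{FD}}$ are shorthand introduced only for this remark, and $P(\,\cdot\,|\,\cdot\,)$ denotes a probability conditioned on whether or not a signal is present:
\begin{eqnarray*}
P_{\text{FA}}(\Gamma_*) &=& P\left( \Gamma(h) > \Gamma_* \,|\, \text{signal absent} \right), \\
P_{\text{FD}}(\Gamma_*) &=& P\left( \Gamma(h) \le \Gamma_* \,|\, \text{signal present} \right).
\end{eqnarray*}
Raising the threshold $\Gamma_*$ cannot increase $P_{\text{FA}}$ and cannot decrease $P_{\text{FD}}$, so fixing one of the two probabilities determines the threshold and hence the other.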

130:

131: Assume that the two detectors are collocated and aligned, and that each detector has white Gaussian noise with vanishing

132: mean with no correlations between the two detectors. Then the standard cross-correlation detection

133: statistic $\Lambda_{\text{CC}}$ for a Gaussian signal is

134: \begin{equation} \label{cross correlation}

135: \Lambda_{\text{CC}}(h) = \frac{\hat\alpha^2 }{ \bar \sigma_1 \bar \sigma_2},

136: \end{equation}

137: where

138: \begin{eqnarray}

139: \hat\alpha^2 &=& {\bar \alpha}^2 \theta({\bar \alpha}^2), \\

140: \bar\alpha^2 &=& \frac{1}{N}\sum_{k=1}^N h_1^{k} h_2^{k}, \\

141: \bar\sigma_i^2 &=& \frac{1}{N} \sum_{k=1}^N \left(h_i^k\right)^2, \label{intro bar sigma}

142: \end{eqnarray}

143: for $i=1,2$, and $\theta(x)$ is the Heaviside step function defined by

144: \begin{equation}

145: \label{stepfunction}

146: \theta(x) = \left\{

147: \begin{array}{ll}

148:         1 & \text{ if } x \ge 0 \\

149:         0 & \text{ if } x  < 0

150: \end{array}

151: \right. .

152: \end{equation}

153: This statistic is nearly optimal and can be derived

154: from a maximum likelihood framework (see Sec.~\ref{ss:Gaussian

155: signal}). The subscript CC in $\Lambda_{\text{CC}}$

156: denotes ``cross correlation''.  The generalization of this statistic

157: to allow for colored noise and non-collocated, non-aligned detectors is

158: discussed in Refs.\ \cite{Michelson,Christensen,Flanagan,Allen Romano}.

159:

160:

161:

162: % latex bug (?) causes start of next subsection to be squashed up

163: % against above text, the ~ below fixes it.

164:

165: ~

166:

167:

168:

169: \subsection{Non-Gaussian stochastic backgrounds}

170: \label{ss:Non-Gaussian stochastic backgrounds}

171:

172:

173: A particular class of events will produce a Gaussian background

174: if, on average, at any given moment, many individual events are arriving at the detector.

175: However, if the ratio of average time between events to the average duration

176: of events is large, then there are long stretches of ``silence'' or time during which no events arrive at

177: the detector.  The resulting stochastic background is non-Gaussian as the conditions for the applicability

178: of the central limit theorem are not  satisfied. Recent work has suggested that some candidate

179: gravitational wave stochastic backgrounds, of both cosmological and astrophysical origin, may  be

180: non-Gaussian \cite{non gaussian supernovae, cosmic strings, first stars}.  However,

181: predictions concerning the properties of most gravitational wave background sources rely heavily on theoretical

182: arguments which extrapolate well beyond observational support.  Such extrapolations are always in some sense

183: speculative. It is conceivable that backgrounds predicted to be Gaussian may in fact turn out to be non-Gaussian,

184: or vice versa.

185:

186: In Sec.~\ref{ss:Non-Gaussian signal} below, we apply a maximum

187: likelihood framework to derive a detection statistic for a particular

188: model of non-Gaussian stochastic background, which we now describe.

189: Let $h_i^k$ be the outputs of two collocated aligned gravitational

190: wave detectors with white, zero-mean, Gaussian noise with no

191: correlations between the two detectors.  The detector outputs $h_i^k$

192: consist of noise $n_i^k$ together with a common signal $s^k$:

193: \begin{eqnarray}

194: h_1^k &=& n_1^k + s^k \label{eq:common} \\

195: h_2^k &=& n_2^k + s^k. \nonumber

196: \end{eqnarray}

197: We wish to detect a non-Gaussian signal $s^k$ composed of long stretches of

198: silence which separate short bursts whose amplitudes are Gaussianly

199: distributed, and whose durations are smaller than the detector

200: resolution time (see Fig.~\ref{signal sketch}).  We therefore assume that each signal sample $s^k$ is

201: statistically independent with probability distribution [cf.\ Eq.\

202: (\ref{signal prior}) below]

203: \begin{equation}

204: p(s) = \xi {1 \over \sqrt{2 \pi} \alpha} \exp \left[-{s^2 \over 2

205: \alpha^2} \right] + (1 - \xi) \delta(s).

206: \label{eq:sigg}

207: \end{equation}

208: The parameter $\xi$ is what we call the {\it

209: Gaussianity parameter} of the

210: stochastic background; it is the probability that, at any randomly

211: chosen time, a burst is present in the detector.  Thus $\xi$ takes

212: values in the

213: interval $0 \le \xi \le 1$, and if $\xi=1$ then the background is

214: Gaussian.  The parameter $\xi$ can also be thought of as the duty

215: cycle of the background.  The parameter $\alpha$ in Eq.\ (\ref{eq:sigg}) is

216: the rms amplitude of the bursts.
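
To make the role of $\xi$ concrete, the low order moments of the distribution (\ref{eq:sigg}) can be worked out directly.  The following is a short illustrative calculation for $\xi > 0$, assuming as above that the noise has zero mean, is independent of the signal, and is uncorrelated between the two detectors.  Here $\sigma_i^2$ denotes the noise variance in detector $i$ and angle brackets denote expectation values:
\begin{eqnarray*}
\langle s^2 \rangle &=& \xi \alpha^2, \qquad \langle s^4 \rangle = 3 \xi \alpha^4, \\
\frac{\langle s^4 \rangle}{\langle s^2 \rangle^2} &=& \frac{3}{\xi}, \\
\langle h_1^k h_2^k \rangle &=& \xi \alpha^2, \qquad
\langle ( h_i^k )^2 \rangle = \sigma_i^2 + \xi \alpha^2 .
\end{eqnarray*}
The kurtosis $3/\xi$ reduces to the Gaussian value $3$ only at $\xi = 1$ and grows without bound as $\xi \to 0$.  The expectation value of the cross-correlation $\bar\alpha^2$ depends only on the product $\xi \alpha^2$, so it cannot by itself distinguish rare strong bursts from frequent weak ones.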

217:

218:

219: Our nearly-optimal detection statistic

220: $\Lambda_{\text{ML}}^{\text{NG}}$ for the signal model (\ref{eq:sigg})

221: is given by [cf.\ Eq.\ (\ref{main result2}) below]

222: \begin{widetext}

223: \begin{eqnarray}  \label{main result}

224: \Lambda_{\text{ML}}^{\text{NG}}(h) &=&

225: \max_{0<\xi\le 1}~ \max_{\alpha > 0}~ \max_{\sigma_1 \ge 0}~ \max_{\sigma_2 \ge 0}~ \prod_{k=1}^N

226: \left\{

227:         \frac{ \bar\sigma_1 \bar\sigma_2 \xi}{\sqrt{\sigma^2_1 \sigma^2_2 + \sigma^2_1 \alpha^2 + \sigma^2_2 \alpha^2}}

228:         \exp \left[ \frac{\left( \frac{h_1^k}{\sigma^2_1} + \frac{h_2^k}{\sigma^2_2}\right)^2}

229:         {2\left( \frac{1}{\sigma^2_1} + \frac{1}{\sigma^2_2} + \frac{1}{\alpha^2} \right)}

230:         - \frac{\left( h_1^k\right)^2}{2\sigma^2_1} - \frac{\left( h_2^k\right)^2}{2\sigma^2_2} + 1\right]  \right. \nonumber \\

231: &+& \left. \frac{\bar\sigma_1 \bar\sigma_2}{\sigma_1 \sigma_2}  (1-\xi)

232:         \exp \left[ - \frac{\left( h_1^k\right)^2}{2\sigma^2_1} - \frac{\left( h_2^k\right)^2}{2\sigma^2_2} + 1\right]\right\}.

233: \end{eqnarray}

234: \end{widetext}

235: Here the quantities $\bar\sigma_1$ and $\bar\sigma_2$ are defined by Eq.~(\ref{intro bar sigma}).

236: The values of $\xi$, $\alpha^2$, $\sigma^2_1$ and $\sigma^2_2$ which achieve the maximum in Eq.~(\ref{main result}) are, respectively,  estimators of

237: the signal's Gaussianity parameter, the variance of the signal events, and the variances of the noise in the two detectors.

238: If we calculate the quantity (\ref{main result}) at $\xi = 1$, instead of maximizing over $\xi$, the result is a statistic which is

239: equivalent to the standard cross-correlation statistic

240: $\Lambda_{\text{CC}}$.
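
The structure of Eq.~(\ref{main result}) can be made more transparent by the following sketch, which is offered as an illustration under the assumptions stated above and not as a substitute for the derivation in Sec.~\ref{ss:Non-Gaussian signal}.  The symbols $A$, $B_k$, $p_{\text{b}}$ and $p_0$ are introduced only for this remark.  Define
\begin{equation*}
A = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} + \frac{1}{\alpha^2}, \qquad
B_k = \frac{h_1^k}{\sigma_1^2} + \frac{h_2^k}{\sigma_2^2},
\end{equation*}
and note that $\sigma_1^2 \sigma_2^2 \alpha^2 A = \sigma_1^2 \sigma_2^2 + \sigma_1^2 \alpha^2 + \sigma_2^2 \alpha^2$.  Integrating the product of the two Gaussian noise densities over a Gaussian burst amplitude $s^k$ of variance $\alpha^2$ gives the density of a data pair when a burst is present,
\begin{equation*}
p_{\text{b}} = \frac{1}{2 \pi \sigma_1 \sigma_2 \alpha \sqrt{A}}
\exp \left[ \frac{B_k^2}{2 A} - \frac{( h_1^k )^2}{2 \sigma_1^2} - \frac{( h_2^k )^2}{2 \sigma_2^2} \right],
\end{equation*}
while for a sample with no burst
\begin{equation*}
p_0 = \frac{1}{2 \pi \sigma_1 \sigma_2}
\exp \left[ - \frac{( h_1^k )^2}{2 \sigma_1^2} - \frac{( h_2^k )^2}{2 \sigma_2^2} \right].
\end{equation*}
The quantity in braces in Eq.~(\ref{main result}) is then $2 \pi e \, \bar\sigma_1 \bar\sigma_2 \left[ \xi p_{\text{b}} + (1-\xi) p_0 \right]$.  Since the noise-only likelihood $\prod_k p_0$ is maximized at $\sigma_i = \bar\sigma_i$, where it takes the value $(2 \pi e \, \bar\sigma_1 \bar\sigma_2)^{-N}$, the statistic can be read as the ratio of the maximized likelihood of the signal model to the maximized likelihood of noise alone.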

241:

242: The subscript ML on $\Lambda_{\text{ML}}^{\text{NG}}$ stands for

243: ``maximum likelihood'', while the superscript NG stands for

244: ``non-Gaussian statistic''.  The superscript NG does {\it not}

245: necessarily mean that one is considering a non-Gaussian signal; both

246: of the statistics $\Lambda_{\rm CC}$ and

247: $\Lambda_{\text{ML}}^{\text{NG}}$ can be applied to data containing

248: either a Gaussian signal or a non-Gaussian signal.

249:

250: If the burst-amplitude parameter $\alpha$ is sufficiently large

251: and the bursts are well separated in time, then the

252: individual bursts can

253: be seen in the detector output.  In this

254: case one could use, for example, the simple burst statistic

255: \footnote{In reality the statistic (\ref{eq:lambdaBdef}) would

256: be especially susceptible to non-Gaussian noise bursts in the detector

257: and so would not be used in practice; instead one would need to search

258: for events where $|h_1^k|$ and $|h_2^k|$ are simultaneously large.  In

259: this paper we restrict attention for simplicity to Gaussian detector

260: noise; it will be important for future more general analyses to

261: allow for (uncorrelated) non-Gaussian noise components in the two

262: detectors.}

263: \begin{equation}

264: \Lambda_\text{B} \equiv \max_{1 \le k \le N} \ \left| h_1^k \right|

265: \label{eq:lambdaBdef}

266: \end{equation}

267: on the data from detector 1 to detect the signal.  The burst statistic

268: (\ref{eq:lambdaBdef}) and the cross-correlation statistic

269: $\Lambda_\text{CC}$ are used as references for comparison for

270: the maximum likelihood statistic below.
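
For orientation, the noise-only behavior of the burst statistic can be written in closed form.  As an illustrative sketch, assume that the samples $h_1^k$ are independent, zero-mean Gaussian noise with variance $\sigma_1^2$ and that no signal is present; $\Lambda_*$ denotes a threshold introduced only for this remark.  Then
\begin{equation*}
P\left( \Lambda_\text{B} > \Lambda_* \right) = 1 - \left[ \erf\left( \frac{\Lambda_*}{\sqrt{2}\,\sigma_1} \right) \right]^N ,
\end{equation*}
since each $|h_1^k|$ lies below $\Lambda_*$ with probability $\erf ( \Lambda_* / \sqrt{2}\,\sigma_1 )$.  For a fixed false alarm probability and large $N$, the required threshold therefore grows only slowly with the number of data points, roughly as $\sigma_1 \sqrt{2 \ln N}$.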

271:

272:

273: \subsection{Main results}

274: \label{ss:Main results}

275:

276: There are two main results in this paper.  The first result is the detection statistic $\Lambda_{\text{ML}}^{\text{NG}}$ given by Eq.~(\ref{main result}),

277: which is derived in Sec.~\ref{ss:Non-Gaussian signal}. This statistic is nearly optimal for

278: the detection of a class of non-Gaussian gravitational wave stochastic backgrounds incident on a pair of

279: idealized detectors.

280:

281:

282: The second main result, summarized in Figs.~\ref{omega gain} and \ref{fig:theoretical}, is a

283: comparison of

284: the performances of the maximum likelihood statistic

285: $\Lambda_{\text{ML}}^{\text{NG}}$, the

286: cross-correlation statistic $\Lambda_{\text{CC}}$, and the burst

287: statistic $\Lambda_\text{B}$.

288: \begin{figure}

289: \begin{center}

290: \epsfig{file=Figure1.eps,width=8.5cm}

291: \caption{

292: This plot shows the minimum gravitational-wave energy density

293: $\Omega_{\rm detectable}$ necessary for detection, for several

294: different detection statistics, as a function of the

295: background's Gaussianity parameter $\xi$.

296: The Gaussianity parameter $\xi$ is the probability that,

297: at any randomly chosen time, the waves from an event are incident on

298: the detectors, and thus takes values in the interval $0 \le \xi \le

299: 1$.  For a Gaussian background $\xi=1$.

300: The circles are the results of our Monte Carlo simulations for the

301: maximum likelihood statistic $\Lambda_\text{ML}^\text{NG}$, and the

302: solid curve shows the approximate theoretical prediction (\ref{eq:ansA}) and

303: (\ref{eq:ansB}) for

304: this statistic (expected to be accurate only to within a few tens of

305: percent).

306: The crosses are the Monte Carlo results for the

307: cross-correlation statistic $\Lambda_\text{CC}$, and the

308: dashed curve shows the theoretical prediction \protect{(\ref{analytic

309: detectable})} for

310: this statistic.  Finally the squares are the Monte Carlo results for the

311: burst statistic \protect{(\ref{eq:lambdaBdef})}, and the dotted curve shows the

312: corresponding theoretical prediction given by Eqs.\ \protect{(\ref{burstans1})}

313: and \protect{(\ref{burstans2})}.

314: For each statistic, the vertical error bars on the Monte Carlo

315: simulation results give the fluctuations computed from 4 different

316: runs, each with 2000 trials.

317: The number of data points is $N = 10^4$, and the false alarm and false

318: dismissal

319: probabilities are both $0.1$.

320: A detailed description of the

321: simulations and the analytical predictions can be found in

322: Sec.~\ref{s:Performance comparison}.

323: }

324: \label{omega gain}

325: \end{center}

326: \end{figure}

327: That comparison is quantified in terms of the minimum

328: gravitational-wave energy density  $\Omega_\text{detectable}$

329: necessary for detection.   The values of this quantity for the three

330: different statistics $\Lambda_\text{ML}^\text{NG}$,

331: $\Lambda_\text{CC}$ and $\Lambda_\text{B}$ we will denote by

332: $\Omega_\text{detectable}^\text{ML}$, $\Omega_{\rm

333: detectable}^\text{CC}$, and $\Omega_\text{detectable}^\text{B}$,

334: respectively.  Results for these three quantities obtained from Monte

335: Carlo simulations are shown in Fig.\ \ref{omega gain}, which gives

336: $\Omega_\text{detectable}$ as a function of $\xi$ for $N = 10^4$ data

337: points.  The Monte Carlo simulations are described in Sec.\

338: \ref{ss:Description of the simulation algorithm}

339: below.  The figure shows that in the limit $\xi \to 1$ of Gaussian

340: signals, the statistics $\Lambda_\text{ML}^\text{NG}$ and

341: $\Lambda_\text{CC}$ perform approximately equivalently (the

342: cross-correlation statistic is slightly better).  As the Gaussianity

343: parameter $\xi$ is decreased, the performance of

344: $\Lambda_\text{ML}^\text{NG}$ improves, until at $\xi \sim 10^{-2.5}$ it

345: is better than that of $\Lambda_\text{CC}$ by about a factor of $3$ in

346: energy density.  Finally, in the

347: limit $\xi \to 0$, the individual bursts become visible and the burst

348: statistic $\Lambda_\text{B}$ becomes the best statistic.

349:

350:

351:

352:

353: {}Figure \ref{omega gain} also shows theoretical curves for the three

354: quantities $\Omega_\text{detectable}^\text{ML}$, $\Omega_{\rm

355: detectable}^\text{CC}$, and $\Omega_\text{detectable}^\text{B}$.

356: These curves are derived and discussed in Sec.\ \ref{s:Performance

357: comparison} below.  For the burst and cross-correlation statistics,

358: the theoretical curves should have a fractional accuracy $\sim

359: 1/\sqrt{N}$. For the maximum likelihood statistic, the theoretical

360: prediction is expected to be accurate to a few tens of percent.  These

361: expected accuracies are confirmed by the Monte Carlo simulations, as

362: seen in Fig.\ \ref{omega gain}.

363:

364:

365: The value $N = 10^4$ of the number of data points is roughly

366: appropriate for a space based detector like LISA, for which the

367: duration of a measurement might be $\sim 1 $ year and the effective bandwidth

368: $\sim 10^{-3}$ Hz.  However, for year-long observations with

369: ground based detectors, the effective bandwidth will be $\sim 100$ Hz

370: and consequently the appropriate value of $N$ is $ \sim 10^9$.  We were

371: unable to perform Monte Carlo simulations for this large value of $N$ due to

372: limitations in available computing power.  However, we show in Fig.\

373: \ref{fig:theoretical} the theoretical curves for the three different

374: statistics as functions of $\xi$ for $N=10^9$.  In this case, the

375: maximum likelihood statistic starts to outperform the

376: cross-correlation statistic at $\xi \sim 10^{-3}$, and the maximum

377: gain factor in energy density is $\sim 2$.
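
The two values of $N$ used here follow from a simple order of magnitude estimate, taking the number of independent samples to be roughly the observation time $T$ multiplied by the effective bandwidth $\Delta f$.  The symbols $T$, $\Delta f$, $N_{\text{space}}$ and $N_{\text{ground}}$ are introduced only for this estimate, and factors of order unity are ignored.  With $T \sim 1$ year $\approx 3 \times 10^7$ s,
\begin{eqnarray*}
N_{\text{space}} &\sim& (3 \times 10^7\ \text{s}) (10^{-3}\ \text{Hz}) \sim 3 \times 10^4, \\
N_{\text{ground}} &\sim& (3 \times 10^7\ \text{s}) (10^{2}\ \text{Hz}) \sim 3 \times 10^9,
\end{eqnarray*}
which agree, to within factors of a few, with the values $10^4$ and $10^9$ adopted in the text.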

378:

379:

380: \begin{figure}

381: \begin{center}

382: \epsfig{file=Figure2.eps,width=8.5cm}

383: \caption{

384: The minimum gravitational-wave energy density $\Omega_{\rm

385: detectable}$ necessary for detection as a function of the

386: background's Gaussianity parameter $\xi$ for $N = 10^9$ data points,

387: which is a realistic number of data points for ground based detectors.

388: The false alarm and false dismissal probabilities are both 0.01.

389: The solid line is the theoretical prediction (\ref{eq:ansA}) and

390: (\ref{eq:ansB}) for the maximum

391: likelihood statistic, which is expected to be accurate to a few tens

392: of  percent.  The dashed line is the theoretical prediction

393: (\protect{\ref{analytic}}) for the cross correlation statistic, and

394: the dotted line is the theoretical prediction

395: (\ref{burstans1})--(\ref{burstans2}) for the burst

396: statistic; see caption to Fig.\ \protect{\ref{omega gain}}.

397: This plot indicates a maximum gain factor of $\sim 2$ in energy

398: density for duty cycles

399: in a narrow band near $\xi \sim 10^{-4}$.}

400: \label{fig:theoretical}

401: \end{center}

402: \end{figure}

403:

404:

405: We next discuss the computational cost of the maximum likelihood

406: statistic $\Lambda_{\rm ML}^{\rm NG}$.  As is well known, the

407: computational cost of trying to detect a stochastic background using

408: the cross-correlation statistic $\Lambda_{\text{CC}}$ is

409: negligible when compared to, say, matched-filter-based inspiral

410: waveform searches.  However, because of the non-trivial maximization

411: in Eq.~(\ref{main result}), the maximum likelihood statistic

412: $\Lambda_{\text{ML}}^{\text{NG}}$

413: is computationally intensive.  In fact, every evaluation of the function

414: to be maximized over the four parameters

415: $\xi$, $\alpha$, $\sigma_1$, and $\sigma_2$ requires computing a

416: length-$N$ sum or product, where $N$ is the number of data points,

417: and takes longer than the

418: entire cross-correlation detection method.  Depending on the method of calculation,

419: the computational cost of computing $\Lambda_{\text{ML}}^{\text{NG}}$ is larger than that

420: of computing $\Lambda_{\text{CC}}$ by a factor anywhere from $10^2$ to $10^4$.
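
In a numerical implementation it is natural to work with the logarithm of the quantity being maximized, since a product of $N$ factors can easily overflow or underflow in floating point arithmetic.  As a sketch, write $f_k(\xi,\alpha,\sigma_1,\sigma_2)$ for the expression in braces in Eq.~(\ref{main result}); the symbols $f_k$ and $M$ are introduced only for this remark.  Then
\begin{equation*}
\ln \Lambda_{\text{ML}}^{\text{NG}}(h) = \max_{\xi,\alpha,\sigma_1,\sigma_2}~ \sum_{k=1}^N \ln f_k(\xi,\alpha,\sigma_1,\sigma_2),
\end{equation*}
with the maximization taken over the same ranges as in Eq.~(\ref{main result}).  If the maximization requires $M$ evaluations of this sum, the total cost is of order $M N$ operations, compared with of order $N$ for $\Lambda_{\text{CC}}$.  The cost ratios quoted above would then correspond roughly to $M$ between $10^2$ and $10^4$, allowing for the larger cost of each term.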

421:

422:

423:

424:

425: To summarize, under the idealized assumptions of this paper, if one

426: searches for a stochastic background using the standard

427: cross-correlation statistic, then one might not detect a signal that

428: would have been detectable using our maximum likelihood statistic.

429: This conclusion probably generalizes to realistic detector

430: noise models and detector orientations.

431:

432:

433: \subsection{Outline of this paper}

434: \label{ss:Outline of this paper}

435:

436: In Sec.~\ref{s:General theory of detection statistics and parameters estimator} we introduce notation,

437: review the general theory of signal detection and parameter measurement, and derive a general form of the maximum

438: likelihood detection statistic.  Then, in Sec.~\ref{s:Application to stochastic background searches}, we derive the maximum likelihood

439: statistics for both a Gaussian background (Sec.~\ref{ss:Gaussian

440: signal}) and for the model (\ref{eq:sigg}) of a non-Gaussian background

441: (Sec.~\ref{ss:Non-Gaussian signal}), assuming two idealized detectors.

442: In Sec.~\ref{s:Performance comparison} we discuss analytical calculations and Monte Carlo simulations

443: comparing the performance of

444: the maximum likelihood and cross-correlation detection statistics.

445: Also in Sec.~\ref{s:Performance comparison} we show how the signal

446: parameters $\xi$ and $\alpha$ can be estimated, with reasonable

447: accuracy, for a strong non-Gaussian background. We conclude in

448: Sec.~\ref{s:Conclusions} with a

449: discussion of the results.

450:

451: \section{General theory of detection statistics and parameter estimation}

452: \label{s:General theory of detection statistics and parameters estimator}

453:

454: In this section we review various formal aspects of the theory of

455: signal detection and measurement.

456: We derive a form of the maximum likelihood detection statistic that is

457: more general than has been considered before in

458: the context of gravitational wave data analysis \cite{Allen Romano, general method, excess power, sam joe, sam unpublished}.

459: The material in this section can be found in a variety of texts

460: \cite{maximum likelihood}; we include this section for completeness

461: and to introduce notation.

462:

463: \subsection{Notational conventions}

464: \label{ss:Notational conventions}

465:

466: We use calligraphic letters $\mathcal{A, B, C, \ldots}$ to denote

467: random variables.

468: As described in Sec.~\ref{ss:Gaussian stochastic backgrounds}, given

469: $D$ detectors we can assemble an $N \times D$ detector

470: output matrix $\mathcal{H}$ with components $\mathcal{H}_i^k$ where $k=1,2,\ldots,N$ is

471: a time index, and $i=1,2,\dots,D$ labels the detector.

472: We assume that the detector outputs are made up of noise $\mathcal{N}$ and signal $\mathcal{S}$

473: with components $\mathcal{N}_i^k$ and $\mathcal{S}_i^k$ respectively, such that

474: \begin{equation} \label{detector output matrices}

475: \mathcal{H = N + S}.

476: \end{equation}

477: Specific realizations of random variables will be denoted by lower case

478: Roman symbols.  For example,

479: $h=n+s$ is a specific realization of Eq.~(\ref{detector output

480: matrices}), where the components of $h$ are $h_i^k$.

481:

482: Probability densities for random variables will always be denoted by a lowercase $p$ and will carry a subscript

483: to indicate which random variable is being described.  For example, $p_\mathcal{N}(n)d^{ND}n$

484: is the probability that $n<\mathcal{N}< n+dn$, where $d^{ND}n$ is the product

485: \begin{equation}  \label{differential product}

486: d^{ND}n = \prod_{k=1}^{N}\prod_{i=1}^{D}dn_i^k.

487: \end{equation}

488: We write the normalization requirement for $p_\mathcal{N}(n)$ as

489: \begin{equation} \label{normalization}

490: 1 = \int d^{ND}n~ p_\mathcal{N}(n).

491: \end{equation}

492: Unless otherwise specified, integrals are over $\mathbb{R}^{ND}$ where $\mathbb{R}$ is the set of real numbers.

493:

494: We assume a detector noise model with $Q_n$ parameters.  Let

495: $\mathcal{V}_n$ be a vector of length $Q_n$

496: whose components are the parameters characterizing the noise in the

497: detectors.  We denote by

498: $\Theta_n$ the space of all possible values of $\mathcal{V}_n$. Here

499: the subscript $n$ is not an index; it is merely short for ``noise''.

500: We denote joint probabilities in the usual way.  For example, $p_{{\mathcal N},{\mathcal V}_n}(n,{\bf v}_n)d^{ND}n~d^{Q_n}v_n$ is

501: the probability that $n<\mathcal{N}< n+dn$ and ${\bf v}_n<\mathcal{V}_n< {\bf v}_n+d{\bf v}_n$, where $d^{Q_n}v_n$ is defined by

502: \begin{equation}

503: d^{Q_n}v_n = \prod_{l=1}^{Q_n} dv_n^l,

504: \end{equation}

505: and $dv_n^l$ is the $l$th component of $d{\bf v}_n$.

506: We also use vertical bars to denote conditional

507: probabilities.  For example

508: \begin{equation} \label{conditional joint}

509: p_{\mathcal{N|V}_n}(n|{\bf v}_n) d^{ND}n =\frac{ p_{\mathcal{N,V}_n}(n,{\bf v}_n) d^{Q_n}v_n}{ p_{\mathcal{V}_n}({\bf v}_n) d^{Q_n}v_n}d^{ND}n

510: \end{equation}

511: is the probability that $n<\mathcal{N}< n+dn$ given that ${\bf

512: v}_n<\mathcal{V}_n< {\bf v}_n+d{\bf v}_n$.

513:

514: We will often use the so-called total probability theorem \cite{Papoulis} to write probability densities

515: for a specific random variable as an integral over the functional dependencies of that random variable.

516: An example is

517: \begin{equation} \label{total probability}

518: p_\mathcal{N}(n) =\int_{\Theta_n}d^{Q_n}v_n~ p_{\mathcal{N|V}_n}(n|{\bf v}_n) p_{\mathcal{V}_n}({\bf v}_n).

519: \end{equation}

520: Expanding probability densities in this way allows us to treat

521: parameters, such as the noise parameters $\mathcal{V}_n$ in

522: Eq.~(\ref{total probability}), as unknowns.  In fact, such a treatment of

523: the noise parameters

524: is the crucial difference between the derivations of this work and those in previous studies of

525: gravitational wave data analysis techniques \cite{Allen Romano, general method, excess power, sam joe, sam unpublished}.
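
As a simple illustration of Eq.~(\ref{total probability}), consider a single detector ($D=1$) with white, zero-mean Gaussian noise whose only parameter is its standard deviation $\sigma$, so that $Q_n = 1$, ${\bf v}_n = \sigma$ and $\Theta_n = (0,\infty)$.  This example is meant only to fix ideas, and the density $p_{\mathcal{V}_n}(\sigma)$ is left unspecified.  Writing $n^k$ for the $k$th noise sample,
\begin{equation*}
p_\mathcal{N}(n) = \int_0^\infty d\sigma~ p_{\mathcal{V}_n}(\sigma) \prod_{k=1}^N \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ - \frac{( n^k )^2}{2 \sigma^2} \right].
\end{equation*}
Although the samples are independent for any fixed value of $\sigma$, they are in general no longer independent once $\sigma$ is treated as unknown.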

526:

527: We assume that the signal model contains $Q_s$ parameters, which we

528: will treat as random variables

529: like the noise parameters.  We will denote by ${\mathcal V}_s$ the

530: random vector of length $Q_s$ containing the signal parameters,

531: and by $\Theta_s$ the space of all possible values of ${\mathcal V}_s$.

532:

533: We define the notions of ``signal present'' and ``signal absent'' in terms of a partition of the space

534: $\Theta_s$ of signal parameters into a disjoint union

535: \begin{equation}

536: \Theta_s = \Theta_{s0} \cup \Theta_{s1},

537: \end{equation}

538: where $\Theta_{s0}$ corresponds to the signal being absent, and $\Theta_{s1}$ the signal being present.

539: We define the random variable $\mathcal{T}$, taking values $\mathcal{T}=0$ or $\mathcal{T}=1$, according to

540: \begin{equation}

541: \mathcal{T} = \left\{

542: \begin{array}{ll}

543:         1 & \text{ if } \mathcal{V}_s \in \Theta_{s1} \\

544:         0 & \text{ if } \mathcal{V}_s \in \Theta_{s0}

545: \end{array} \right. .

546: \end{equation}

547: Thus $\mathcal{T}=1$ corresponds to a signal being present, and

548: $\mathcal{T}=0$ to no signal being present.  We define

549: \begin{equation}

550: p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,0) = \left\{

551: \begin{array}{ll}

552:         0  & \text{ if } {\bf v}_s \in \Theta_{s1} \\

553:         \delta^{ND}(s) & \text{ if } {\bf v}_s \in \Theta_{s0}

554: \end{array} \right. ,

555: \end{equation}

556: where $\delta^{ND}(s)$ is the $N \times D$ dimensional Dirac delta function.

557: We denote by  $p_\mathcal{H,T}(h,t)d^{ND}h$ the probability that

558: $\mathcal{T}=t$ and that $h < \mathcal{H} < h+dh$, where $t=0$ or $1$.

559: Similarly

560: \begin{equation}

561: p_\mathcal{H|T}(h|t) d^{ND}h = \frac{ p_\mathcal{H,T}(h,t) }{ P_\mathcal{T}(t)}d^{ND}h

562: \end{equation}

563: is the probability that $h < \mathcal{H} < h+dh$ given that $\mathcal{T}=t$.

564:

565:

566: We denote probabilities (as opposed to probability densities) with an uppercase $P$. For example $P_\mathcal{T}(1)$ is the probability that a signal

567: is present, and $P_\mathcal{T}(0)$ is the probability that a signal is

568: absent.
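As a brief illustration of this notation (a sketch that assumes a prior density $p_{\mathcal{V}_s}({\bf v}_s)$ on $\Theta_s$, written here following the conventions above), the two probabilities follow from the partition of $\Theta_s$:
\begin{equation}
P_\mathcal{T}(1) = \int_{\Theta_{s1}} d^{Q_s}v_s~ p_{\mathcal{V}_s}({\bf v}_s), \qquad
P_\mathcal{T}(0) = 1 - P_\mathcal{T}(1),
\end{equation}
and the total probability theorem applied to the discrete variable $\mathcal{T}$ gives
\begin{equation}
p_\mathcal{H}(h) = p_\mathcal{H|T}(h|0) P_\mathcal{T}(0) + p_\mathcal{H|T}(h|1) P_\mathcal{T}(1).
\end{equation}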

569:

570:

571: Before examining the detector outputs, we may have some idea, say from previous experiments,  of the probability

572: that a signal will be present. We denote this prior probability by $P^{(0)}$. We denote by $P^{(1)}$ the posterior

573: probability that the signal is present after examining $\mathcal{H}$ in the context of all prior experiments etc.

574: All posterior quantities have an implicit dependence on the detector outputs.  To simplify the notation

575: we will not explicitly show this dependence.  For example, we write $P^{(1)}$ rather than the more cumbersome

576: $P^{(1)}(\mathcal{H})$ for the posterior probability that a signal is present.

577:

578: There are prior and posterior versions of all probability densities. When necessary we will append superscripts

579: of $(0)$ and $(1)$ to distinguish priors and posteriors respectively.

580: For example $p^{(1)}_{\mathcal{V}_n}({\bf v}_n) = p_{\mathcal{V}_n|\mathcal{H}}({\bf v}_n|h)$ is the posterior probability density for

581: $\mathcal{V}_n$. The posterior distribution for the noise can be expanded in terms of $p^{(1)}_{\mathcal{V}_n}({\bf v}_n)$ as

582: \begin{equation}

583: p^{(1)}_\mathcal{N}(n) =\int_{\Theta_n}d^{Q_n}v_n~ p_{\mathcal{N|V}_n}(n|{\bf v}_n) p^{(1)}_{\mathcal{V}_n}({\bf v}_n).

584: \end{equation}

585:

586: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

587:

588: The conventions and symbols which have been introduced above are summarized in Tables

589: \ref{conventions} and \ref{symbols} respectively.

590:

591: \begin{table*}

592: \caption{\label{conventions} A summary of conventions introduced in Sec.~\ref{ss:Notational conventions}.}

593: \begin{ruledtabular}

594: \begin{tabular}{p{8.5cm}p{8.5cm}}

595: Convention & Example \\ \hline

596:

597: Random variables are denoted by upper case calligraphic letters. &

598: The detector output matrix is denoted by $\mathcal{H}$. \\

599:

600: Specific realizations of random variables are denoted by lower

601: case Roman letters (see next convention). &

602: A specific observation run may result in a specific detector output

603: matrix $h$ or say $x$. These results would be denoted $\mathcal{H}=h$

604: and $\mathcal{H}=x$ respectively. \\

605:

606: A lower case $p$ denotes a probability density function (PDF).

607: Its subscript determines the quantities with which it is associated. &

608: The PDF for the detector output $\mathcal{H}$ as a function of $h$, or say $x$,

609: is denoted by $p_\mathcal{H}(h)$ and $p_\mathcal{H}(x)$ respectively. \\

610:

611: A comma in a PDF subscript and argument indicates a joint PDF. &

612: The joint PDF for $\mathcal{N}$ and $\mathcal{V}_n$ as a function of

613: $n$ and ${\bf v}_n$ respectively is denoted by $p_{\mathcal{N},\mathcal{V}_n}(n,{\bf v}_n)$. \\

614:

615: A vertical bar in a PDF subscript and argument indicates a conditional PDF. &

616: The conditional PDF for $\mathcal{N}$ given $\mathcal{V}_n$ as a function of

617: $n$ and ${\bf v}_n$ respectively is denoted by $p_{\mathcal{N}|\mathcal{V}_n}(n|{\bf v}_n)$. \\

618:

619: An upper case $P$ denotes a probability. &

620:

621: The probability that $\mathcal{T}=1$ is denoted by $P_\mathcal{T}(1)$. \\

622:

623: Prior and posterior quantities are denoted by superscripts of $(0)$ and $(1)$ respectively. &

624:

625: The prior probability that a signal is present is denoted by $P^{(0)}$, while the posterior

626: probability that a signal is present, after an observation $\mathcal{H}=h$, is denoted by

627: $P^{(1)} = P_{\mathcal{T}|\mathcal{H}}(1|h)$.

628: \end{tabular}

629: \end{ruledtabular}

630: \end{table*}

631:

632: \begin{table}

633: \caption{\label{symbols} A summary of symbols introduced in Sec.~\ref{ss:Notational conventions}.}

634: \begin{ruledtabular}

635: \begin{tabular}{cp{7cm}}

636: Symbol & {Meaning} \\ \hline

637: $\mathcal{H},h$ & detector output matrix \\

638: $\mathcal{N},n$ & noise contribution to detector output matrix \\

639: $\mathcal{S},s$ & signal contribution to detector output matrix \\

640: $N$ & number of strain samples taken from one detector  \\

641: $D$ & number of detectors \\

642: $Q_n$ & number of parameters in the model noise PDF \\

643: $Q_s$ & number of parameters in the model signal PDF \\

644: $\mathcal{V}_n,{\bf v}_n$ & the parameters of the model noise PDF \\

645: $\mathcal{V}_s,{\bf v}_s$ & the parameters of the model signal PDF \\

646: $\Theta_n$ & the space of all possible values of $\mathcal{V}_n$\\

647: $\Theta_s$ & the space of all possible values of $\mathcal{V}_s$\\

648: $\Theta_{s0}$ & the subspace of $\Theta_s$ for which a signal is absent\\

649: $\Theta_{s1}$ & the subspace of $\Theta_s$ for which a signal is present \\

650: $\mathcal{T},t$ & 1 if a signal is present ($\mathcal{V}_s \in \Theta_{s1}$), otherwise 0\\

651: $P^{(0)}$ & prior probability that a signal is present\\

652: $P^{(1)}$ & posterior probability that a signal is present\\

653: \end{tabular}

654: \end{ruledtabular}

655: \end{table}

656: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

657:

658: \subsection{Detection statistics}

659: \label{ss:Detection statistics}

660:

661: To detect a signal one uses a detection statistic, say $\Gamma=\Gamma(\mathcal{H})$, that is some function

662: of the detector outputs ${\cal H}$.  A signal is said to have been

663: detected when

664: $\Gamma$ exceeds some threshold value $\Gamma_*$.

665:

666: Denote by $P_\text{FD}(\Gamma_*)$ the probability of false dismissal, that is, the probability

667: that we fail to detect a signal which is actually present.  Similarly, let $P_\text{FA}(\Gamma_*)$ be

668: the probability that we claim to have detected a signal which in fact is absent---the probability of false alarm.

669: For given signal and noise models and for a given statistic $\Gamma$, the

670: false alarm and false dismissal probabilities generate a curve in the

671: $P_\text{FA}$-$P_\text{FD}$ plane parametrized by the threshold $\Gamma_*$.

672: Such curves depend on the number of detectors $D$, the number of data points $N$,

673: the signal parameters $\mathcal{V}_s$, and the noise parameters $\mathcal{V}_n$.
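In the notation of Sec.~\ref{ss:Notational conventions}, these two probabilities can be sketched as integrals over the data. Writing $\theta(x)$ for the step function, equal to $1$ for $x>0$ and $0$ otherwise, and marginalizing over the unknown parameters (an assumption of this illustration; one may instead condition on particular values of $\mathcal{V}_s$ and $\mathcal{V}_n$), we have
\begin{eqnarray}
P_\text{FA}(\Gamma_*) &=& \int d^{ND}h~ p_\mathcal{H|T}(h|0)~ \theta\left[ \Gamma(h) - \Gamma_* \right], \\
P_\text{FD}(\Gamma_*) &=& \int d^{ND}h~ p_\mathcal{H|T}(h|1)~ \theta\left[ \Gamma_* - \Gamma(h) \right],
\end{eqnarray}
up to the event $\Gamma(h) = \Gamma_*$, which has zero probability for continuous distributions.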

674:

675: \begin{figure}

676: \begin{center}

677: \epsfig{file=FalseExpectations.eps,width=8.5cm}

678: \caption{False dismissal versus false alarm curves for typical detection statistics.}

679: \label{expected plots}

680: \end{center}

681: \end{figure}

682:

683: Suppose that the statistic $\Gamma$ is bounded in the sense that

684: there exist numbers $\Gamma_{\min}$ and $\Gamma_{\max}$ such that

685: $\Gamma_{\min} < \Gamma < \Gamma_{\max}$ for all ${\cal H}$.

686: Then it is clear that $P_\text{FD}(\Gamma_{\min}) = 0$ and that

687: $P_\text{FA}(\Gamma_{\min})=1$.  As the threshold $\Gamma_*$

688: increases toward $\Gamma_{\max}$,  $P_\text{FD}(\Gamma_*)$ will

689: increase while $P_\text{FA}(\Gamma_*)$

690: decreases, until finally at $\Gamma_* = \Gamma_{\rm max}$, $P_\text{FD} = 1$,

691: and $P_\text{FA} = 0$.  Thus, false dismissal-false alarm curves generally look

692: something like those sketched in Fig.~\ref{expected plots}.

693:

694:

695:

696: Note that if one uses a different statistic $f(\Gamma)$, where $f$ is

697: any function, then the shape of the $P_\text{FA}$-$P_\text{FD}$ curve does

698: not change as long as $f$ is monotonic in the sense that

699: \begin{equation} \label{transformation}

700: \Gamma > \Gamma_* ~\Rightarrow~ f(\Gamma) > f(\Gamma_*).

701: \end{equation}

702: Only the parametrization of the curve changes under such a

703: transformation.  Statistics related by transformations $f$ satisfying

704: the monotonicity property (\ref{transformation}) have identical false

705: dismissal versus false alarm curves.
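To see this explicitly (a short sketch, assuming that $f$ is strictly increasing so that the converse of (\ref{transformation}) also holds), note that the events $f(\Gamma) > f(\Gamma_*)$ and $\Gamma > \Gamma_*$ then coincide. Denoting by $P^{(f)}_\text{FA}$ and $P^{(f)}_\text{FD}$ the false alarm and false dismissal probabilities of the statistic $f(\Gamma)$ (notation introduced only for this illustration), we have
\begin{equation}
P^{(f)}_\text{FA}\left[f(\Gamma_*)\right] = P_\text{FA}(\Gamma_*), \qquad
P^{(f)}_\text{FD}\left[f(\Gamma_*)\right] = P_\text{FD}(\Gamma_*),
\end{equation}
so that each point of the curve is reached at the relabeled threshold $f(\Gamma_*)$.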

706:

707: In 1933 Neyman and Pearson considered a simple signal detection scenario

708: where the sets $\Theta_n$, $\Theta_{s1}$, and $\Theta_{s0}$ each contain a single element \cite{Neyman and Pearson}.

709: They showed that for this scenario the detection statistic which minimizes $P_\text{FD}$ for any $P_\text{FA}$

710: is the so-called \emph{likelihood ratio} $\Lambda$, defined by

711: \begin{equation} \label{def1}

712: \Lambda = \frac{p_\mathcal{H|T}(h|1)}{p_\mathcal{H|T}(h|0)}.

713: \end{equation}

714: One notion of optimality for detection statistics is that the

715: statistic should minimize the false dismissal probability

716: at a fixed value of the false alarm probability.  For the simple

717: scenario above, this criterion, known as the

718: Neyman-Pearson criterion, uniquely determines the likelihood ratio as the optimal statistic

719: \cite{Ferguson}.  However in general, when any of $\Theta_n$, $\Theta_{s1}$, or $\Theta_{s0}$ contains more than

720: one element, the statistic selected by this criterion is a function of the unknown parameters $\mathcal{V}_s$

721: and $\mathcal{V}_n$.  Thus, as is well known, the Neyman-Pearson

722: criterion does not single out a unique statistic in such cases.

723:

724:

725: In this paper we will obtain our detection statistics from Bayesian

726: considerations, but we will quantify their

727: effectiveness using the Neyman-Pearson criterion of comparing false

728: dismissal probabilities at fixed false alarm probabilities.

729:

730: \subsection{Likelihood ratio and likelihood function}

731: \label{ss:Likelhood ratio and likelihood function}

732:

733: {}From a Bayesian point of view, a natural criterion for

734: deciding that a signal is present

735: is for the posterior probability $P^{(1)}$ to

736: exceed some threshold \cite{Bayes}.

737: The posterior probability $P^{(1)}$ is related to the prior

738: probability $P^{(0)}$ and to the likelihood ratio $\Lambda$ defined by

739: Eq.~(\ref{def1}) by

740: \begin{equation} \label{def2}

741: \frac{P^{(1)}}{1-P^{(1)}} = \Lambda \frac{P^{(0)}}{1-P^{(0)}}.

742: \end{equation}

743: See appendix \ref{s:appendixA} for a derivation of Eq.~(\ref{def2}) in the most general context where the sets $\Theta_n$,

744: $\Theta_{s1}$, and $\Theta_{s0}$ are all non-trivial. It follows from Eq.~(\ref{def2}) that $P^{(1)}$ is a monotonic function

745: of $\Lambda$, so thresholding on $P^{(1)}$ is equivalent to thresholding on $\Lambda$.

746: This makes $\Lambda$, or approximate versions of it, the natural choice for a detection statistic.
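The monotonicity can be made explicit by solving Eq.~(\ref{def2}) for the posterior probability (a short worked step, valid for $0 < P^{(0)} < 1$):
\begin{equation}
P^{(1)} = \frac{ \Lambda P^{(0)} }{ 1 - P^{(0)} + \Lambda P^{(0)} }, \qquad
\frac{dP^{(1)}}{d\Lambda} = \frac{ P^{(0)} \left( 1 - P^{(0)} \right) }{ \left[ 1 - P^{(0)} + \Lambda P^{(0)} \right]^2 } > 0.
\end{equation}
As an illustrative numerical example (the values are hypothetical), a prior $P^{(0)} = 0.01$ and a likelihood ratio $\Lambda = 99$ give prior odds of $1/99$, posterior odds of $1$, and hence $P^{(1)} = 0.5$.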

747:

748:

749: We derive in Appendix \ref{s:appendixA} the following general formula

750: for the likelihood ratio as a function of the data ${\cal H} = h$:

751: \begin{widetext}

752: \begin{equation} \label{general likelihood ratio}

753: \Lambda = \frac{\displaystyle  \int_{\Theta_{s1}} d^{Q_s}v_s~ \int d^{ND}s~ \int_{\Theta_n} d^{Q_n}v_n~ p_{\mathcal{N|V}_n}(h-s|{\bf v}_n) p_{\mathcal{V}_n}({\bf v}_n)

754:                  p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,1) p_{\mathcal{V}_s|\mathcal{T}}({\bf v}_s|1) }

755:                {\displaystyle                       \int_{\Theta_n} d^{Q_n}v_n'~             p_{\mathcal{N|V}_n}(h|{\bf v}_n')  p_{\mathcal{V}_n}({\bf v}_n') }.

756: \end{equation}

757: \end{widetext}

758: The various probability distributions that appear in Eq.\ (\ref{general

759: likelihood ratio}) are (i) the prior distribution

760: $p_{\mathcal{V}_s|\mathcal{T}}({\bf v}_s|1)$ for the signal

761: parameters ${\bf v}_s$; (ii) the distribution

762: $p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,1)$ for the signal $s$

763: given the signal parameters ${\bf v}_s$; (iii) the prior distribution

764: $p_{\mathcal{V}_n}({\bf v}_n)$ for the noise parameters ${\bf v}_n$;

765: and (iv) the distribution $p_{\mathcal{N|V}_n}(n|{\bf v}_n)$ for the

766: noise $n$ given the noise parameters ${\bf v}_n$.

767:

768:

769: We can interpret Eq.\ (\ref{general likelihood ratio}) as follows.

770: In the simple signal detection scenario, we choose between a pair of

771: simple claims:

772: (i) $\mathcal{V}_s = {\bf v}_{s0}$ or (ii) $\mathcal{V}_s = {\bf v}_{s1}$.

773: In general we choose between a pair of complicated, or composite, claims:

774: (i)  $\mathcal{V}_s \in \Theta_{s0}$ or (ii) $\mathcal{V}_s \in

775: \Theta_{s1}$, where both $\Theta_{s0}$

776: and $\Theta_{s1}$ contain many elements.

777: Equation (\ref{general likelihood ratio}) says that the best way to

778: choose between a pair of complicated claims is

779: to

780: first break the complicated pair of claims into pairs of simple

781: claims, then compute the likelihood ratio for each pair of simple claims,

782: and sum the results of each choice. That is, the likelihood ratio can

783: be written as an integral over the parameters of the composite claims

784: \begin{equation} \label{likelihood function def}

785: \Lambda = \int_{\Theta_{s1}} d^{Q_s}v_s~ \int_{\Theta_n} d^{Q_n}v_n ~\Lambda({\bf v}_s,{\bf v}_n),

786: \end{equation}

787: where the integrand $\Lambda({\bf v}_s,{\bf v}_n)$, which we refer to

788: as the \emph{likelihood function},

789: can be read off from Eq.\ (\ref{general likelihood ratio}):

790: \begin{widetext}

791: \begin{equation} \label{likelihood function}

792: \Lambda({\bf v}_s,{\bf v}_n) = \frac{\displaystyle \int d^{ND}s~  p_{\mathcal{N|V}_n}(h-s|{\bf v}_n) p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,1)

793:                                          p_{\mathcal{V}_n}({\bf v}_n) p_{\mathcal{V}_s|\mathcal{T}}({\bf v}_s|1) }

794:                     {\displaystyle \int_{\Theta_n} d^{Q_n}v_n'~ p_{\mathcal{N|V}_n}(h|{\bf v}_n')  p_{\mathcal{V}_n}({\bf v}_n') }.

795: \end{equation}

796: \end{widetext}

797:

798: The likelihood function\footnote{There are two different conventions for the definition of the likelihood function.

799: Some authors include the probability distributions for $\mathcal{V}_s$ and $\mathcal{V}_n$ in the definition

800: of $\Lambda({\bf v}_s,{\bf v}_n)$ as we have in Eq.~(\ref{likelihood function}), while others leave these out

801: of $\Lambda({\bf v}_s,{\bf v}_n)$ and would show these distributions explicitly in

802: Eq.~(\ref{likelihood function def}).}

803: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

804: $\Lambda({\bf v}_s,{\bf v}_n)$ can be used to

805: compute the posterior probability density

806: $p^{(1)}_{\mathcal{V}_s,\mathcal{V}_n|\mathcal{T}}({\bf v}_s,{\bf

807: v}_n|1)$ for the signal and noise parameters given that a signal is

808: present, via the formula

809: \begin{equation} \label{distribution relation}

810: \frac{ P^{(1)} }{ 1 - P^{(1)} } p^{(1)}_{\mathcal{V}_s,\mathcal{V}_n|\mathcal{T}}({\bf v}_s,{\bf v}_n|1)

811: = \Lambda({\bf v}_s,{\bf v}_n) \frac{ P^{(0)} }{ 1 - P^{(0)} }.

812: \end{equation}

813: A derivation of Eq.~(\ref{distribution relation}) can be found in appendix \ref{s:appendixA}.
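As a consistency check (a sketch that uses only the normalization of the posterior density over $\Theta_{s1} \times \Theta_n$), integrating both sides of Eq.~(\ref{distribution relation}) over the signal and noise parameters and using Eq.~(\ref{likelihood function def}) gives
\begin{eqnarray}
\frac{ P^{(1)} }{ 1 - P^{(1)} } &=& \frac{ P^{(0)} }{ 1 - P^{(0)} } \int_{\Theta_{s1}} d^{Q_s}v_s~ \int_{\Theta_n} d^{Q_n}v_n ~\Lambda({\bf v}_s,{\bf v}_n) \nonumber \\
&=& \Lambda \frac{ P^{(0)} }{ 1 - P^{(0)} },
\end{eqnarray}
which recovers Eq.~(\ref{def2}).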

814:

815: \subsection{Maximum likelihood detection statistics and parameter estimators}

816: \label{ss:Maximum likelihood detection statistics and parameter estimators}

817:

818: In many applications, it is impractical to compute the detection

819: statistic (\ref{general likelihood ratio}) because of the

820: multi-dimensional integrals involved \cite{Loredo}.  However,

821: approximate versions of the statistic are often easier to compute and

822: useful.  If a signal is present with sufficiently large amplitude, then

823: the integrand in the numerator of Eq.\ (\ref{general likelihood ratio})

824: will be sharply peaked.  The integrand in the denominator of

825: Eq.\ (\ref{general likelihood ratio}) will also be sharply peaked when

826: there is sufficient data that the noise is well characterized.  Under

827: these circumstances, the integrals can be written as the values of the

828: corresponding integrands at the peaks multiplied by ``width

829: factors'', where the width factors depend only weakly

830: on the data $h$ and can be neglected without much affecting the

831: performance of the statistic.  [The width factors from the integrals

832: over the noise parameters will tend to cancel between the numerator

833: and denominator].  Also, frequently the prior distributions for

834: $\mathcal{V}_s$ and $\mathcal{V}_n$ are slowly varying, and neglecting

835: those distributions

836: has a negligible effect on the performance of the statistic.

837: Under these conditions the maximum likelihood detection statistic

838: $\Lambda_\text{ML}$ defined by

839: \begin{widetext}

840: \begin{equation} \label{general likelihood estimator}

841: \Lambda_\text{ML} = \frac{ \displaystyle \max_{{\bf v}_s\in\Theta_{s1}}~\max_{{\bf v}_n\in\Theta_n}~ \int d^{ND}s ~p_{\mathcal{N|V}_n}(h-s|{\bf v}_n)

842: 			   p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,1) }

843:                          { \displaystyle \max_{{\bf v}_n'\in\Theta_n}~ p_{\mathcal{N|V}_n}(h|{\bf v}_n') }

844: \end{equation}

845: \end{widetext}

846: is a natural approximate version of $\Lambda$

847: % FOOTNOTE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

848: \footnote{In the event that the priors for $\mathcal{V}_s$

849: and $\mathcal{V}_n$ restrict these parameters to regions $\Theta_{s1}' \subset \Theta_{s1}$ and

850: $\Theta_n' \subset \Theta_n$, the bounds of the maximizations in Eq.~(\ref{general likelihood estimator})

851: should be changed to $\Theta_{s1} \rightarrow \Theta_{s1}'$ and $\Theta_n \rightarrow \Theta_n'$.}.

852: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

853: The subscript ML denotes that (\ref{general likelihood estimator}) is the maximum likelihood approximate

854: version of $\Lambda$.

855: See Ref.~\cite{maximum likelihood} for further discussion of

856: $\Lambda_\text{ML}$ as an approximate version of $\Lambda$ \footnote{Note that $\Lambda_{\rm ML}$ is an

857: approximate version of $\Lambda$ only in the sense that the false dismissal versus false alarm curves

858: of the two statistics will be close to one another.  The numerical

859: values of $\Lambda_{\rm ML}$ and $\Lambda$ will in general differ

860: significantly, due to the width factors and priors.  Therefore the

861: statistic $\Lambda_{\rm ML}$ cannot be used in Eq.\ (\ref{def2}) to

862: compute Bayesian thresholds for detection given a desired value of

863: $P^{(1)}$.}.

864:

865:

866:

867: A particular special case of the detection

868: statistic (\ref{general likelihood estimator}), which is widely used,

869: is the following.  Assume that the noise

870: parameters have some known values $\mathcal{V}_n = {\bf v}_n$. Then the noise priors and the $\Theta_n$ integrals

871: in Eq.~(\ref{general likelihood ratio}) are trivial, and one obtains

872: the detection statistic

873: \begin{equation} \label{known likelihood estimator}

874: \tilde\Lambda_\text{ML}=\frac{\displaystyle \max_{{\bf v}_s\in\Theta_{s1}}~ \int d^{ND}s ~p_{\mathcal{N|V}_n}(h-s|{\bf v}_n) p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,1)}

875: 			     {\displaystyle p_{\mathcal{N|V}_n}(h|{\bf v}_n)}.

876: \end{equation}

877: See Ref.~\cite{sam joe} for an exploration of the statistic (\ref{known likelihood estimator}) in the

878: context of stochastic backgrounds.  We will show below that for a Gaussian stochastic background,

879: $\Lambda_\text{ML}$ reduces to the standard cross-correlation

880: statistic while the more specialized statistic

881: $\tilde\Lambda_\text{ML}$ does not.  Thus for stochastic backgrounds,

882: treating the noise parameters as unknowns is crucial \cite{robust

883: gaussian II}.

884:

885:

886: When the noise and signal parameters $\mathcal{V}_n$ and

887: $\mathcal{V}_s$ can take on many values, one naturally would like to

888: know which

889: values are realized. Equation (\ref{distribution relation}) suggests

890: using the values $\hat {\bf v}_n$ and $\hat {\bf v}_s$ defined by

891: \begin{equation} \label{ML estimates}

892: \Lambda(\hat {\bf v}_s, \hat {\bf v}_n) = \max_{{\bf v}_s\in\Theta_{s1}}~ \max_{{\bf v}_n\in\Theta_n}~ \Lambda({\bf v}_s,{\bf v}_n).

893: \end{equation}

894: The estimators $\hat {\bf v}_n$ and $\hat {\bf v}_s$ are known as maximum likelihood estimators.

895: Note that ${\bf v}_s=\hat {\bf v}_s$ and ${\bf v}_n=\hat {\bf v}_n$ also maximize the numerator in Eq.~(\ref{general likelihood estimator}).

896: For the remainder of this paper we will use $\Lambda_\text{ML}$, defined by Eq.~(\ref{general likelihood estimator}), as our

897: detection statistic, and $\hat {\bf v}_s$ and $\hat {\bf v}_n$, defined by Eq.~(\ref{ML estimates}), as parameter estimators.

898:

899: \section{Application to stochastic background searches}

900: \label{s:Application to stochastic background searches}

901:

902: In this section we derive the maximum likelihood detection statistic (\ref{general likelihood estimator})

903: for a simplified model of the detection problem for stochastic gravitational waves, and for a specific simple model

904: of a non-Gaussian stochastic background.

905:

906: \subsection{Assumptions}

907: \label{ss:Assumptions}

908:

909: We assume two detectors with outputs $\mathcal{H}_i^k$, where $i=1,2$

910: labels the detector

911: and $k=1,2,\ldots,N$ is a time index.

912: We assume that the noise in detector one is uncorrelated with the

913: noise in detector two.

914: We will require the noise in both detectors to have vanishing mean and to be both Gaussian and white, so that

915: \begin{equation} \label{assumption2}

916: p_{\mathcal{N|V}_n}\left[n|(\sigma_1,\sigma_2)\right] =

917: \prod_{k=1}^N \frac{ 1 }{ 2 \pi \sigma_1\sigma_2 } \exp\left[- \frac{ (n_1^k)^2 }{ 2 \sigma^2_1 } - \frac{ (n_2^k)^2 }{ 2 \sigma^2_2 } \right].

918: \end{equation}

919: The parameters $\sigma_1$ and $\sigma_2$ in Eq.~(\ref{assumption2})

920: are the square roots of the variances of the noise

921: in the two detectors.

922: For this model ${\bf v}_n = (\sigma_1,\sigma_2)$ and $\Theta_n =

923: \left\{ (\sigma_1,\sigma_2) ~|~ \sigma_1 \ge 0 \text{ and } \sigma_2

924: \ge 0\right\}$.

925:

926: We assume that the detectors are collocated and aligned, so that the

927: same signal is present in both detectors

928: \begin{equation}

929: \mathcal{S}_1^k = \mathcal{S}_2^k = \mathcal{S}^k.

930: \end{equation}

931: Lastly we assume that the individual signal samples are uncorrelated

932: and identically distributed, i.e., the signal is white, so that

933: \begin{equation} \label{assumption4}

934: p_\mathcal{S}(s) = \prod_{k=1}^N p_{\mathcal{S}^k}(s^k).

935: \end{equation}

936: Our assumptions (\ref{assumption2})-(\ref{assumption4})

937: are unrealistic for both ground-based and space-based detectors: we

938: expect the noise to be colored with significant non-Gaussian

939: components, and in general detectors will not be co-located and

940: aligned.  Our analysis is therefore just a first step, and will need

941: to be generalized.  However, we expect that our central conclusion,

942: the existence of statistics which outperform the standard

943: cross-correlation statistic for non-Gaussian signals, is robust, and

944: will not be altered when these complications are taken into account.

945:

946:

947:

948: We now derive a general formula for the maximum likelihood statistic

949: (\ref{general likelihood estimator}), which we apply in both the

950: Gaussian and non-Gaussian cases in the following two subsections.

951: The denominator in Eq.~(\ref{general likelihood estimator}) can be

952: written, from Eq.~(\ref{assumption2}), as

953: \begin{equation} \label{denominator to maximize}

954: \max_{\sigma_1 \ge 0}~ \max_{\sigma_2 \ge 0}~ \left\{  \left(2 \pi \sigma_1\sigma_2 \right)^{-N}

955: 	                         	\exp\left[ -\frac{N}{2} \left( \frac{\bar\sigma^2_1}{\sigma^2_1}

956: 					+ \frac{\bar\sigma^2_2}{\sigma^2_2} \right)\right] \right\},

957: \end{equation}

958: where $\bar\sigma_1^2$ and $\bar\sigma_2^2$ are defined by

959: \begin{equation} \label{sigma bar def}

960: \bar\sigma^2_i = \frac{1}{N} \sum_{k=1}^N \left(h_i^k\right)^2

961: \end{equation}

962: for $i=1,2$.

963: It is easily shown that the maximum in Eq.~(\ref{denominator to maximize}) is achieved at

964: $\sigma_i = \bar\sigma_i$.
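A brief sketch of this step: the logarithm of the quantity in braces in Eq.~(\ref{denominator to maximize}), which we denote here by $L(\sigma_1,\sigma_2)$ (a symbol introduced only for this illustration), is
\begin{equation}
L = -N \ln\left( 2\pi\sigma_1\sigma_2 \right) - \frac{N}{2}\left( \frac{\bar\sigma_1^2}{\sigma_1^2} + \frac{\bar\sigma_2^2}{\sigma_2^2} \right),
\end{equation}
so that
\begin{equation}
\frac{\partial L}{\partial \sigma_i} = -\frac{N}{\sigma_i} + \frac{N \bar\sigma_i^2}{\sigma_i^3} = 0
~\Rightarrow~ \sigma_i = \bar\sigma_i .
\end{equation}
Since the quantity in braces vanishes both as $\sigma_i \to 0$ and as $\sigma_i \to \infty$ (for $\bar\sigma_i > 0$), this stationary point is the maximum, and the maximum value is $\left( 2\pi \bar\sigma_1 \bar\sigma_2 \right)^{-N} \exp\left( -N\right)$.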

965: From Eq.~(\ref{general likelihood estimator}) this yields

966: \begin{widetext}

967: \begin{equation}

968: \Lambda_\text{ML} = \frac{\displaystyle \max_{{\bf v}_s\in\Theta_{s1}}~\max_{{\bf v}_n\in\Theta_n}~ \int d^{ND}s ~p_{\mathcal{N|V}_n}(h-s|{\bf v}_n) p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,1) }

969:                          {\displaystyle \left( 2\pi \bar\sigma_1 \bar\sigma_2 \right)^{-N} \exp\left( -N\right) }.

970: \end{equation}

971: Combining this with Eq.~(\ref{assumption4}) yields the following final general expression for the maximum likelihood statistic:

972: \begin{equation} \label{special likelihood estimator}

973: \Lambda_\text{ML} = \max_{{\bf v}_s\in\Theta_{s1}}~\max_{\sigma_1 \ge 0}~\max_{\sigma_2 \ge 0}~

974: 		    \prod_{k=1}^N  \frac{ \bar\sigma_1 \bar\sigma_2 }{ \sigma_1 \sigma_2 }

975: 		    \int_{-\infty}^\infty ds^k~ p_{\mathcal{S}^k|\mathcal{V}_s,\mathcal{T}}(s^k|{\bf v}_s,1)

976: 	            \exp\left[ -\frac{ \left(h_1^k - s^k \right)^2 }{ 2\sigma^2_1 }

977: 	                       - \frac{ \left( h_2^k - s^k \right)^2 }{ 2\sigma^2_2 }

978: 	       	               + 1 \right].

979: \end{equation}

980: \end{widetext}

981:

982:

983: \subsection{Gaussian signal}

984: \label{ss:Gaussian signal}

985:

986: We now consider the case where the signal is Gaussian and has a

987: vanishing mean.  We denote by $\alpha^2$ the variance of the signal,

988: so the prior for $\mathcal{S}$ is given by

989: \begin{equation} \label{Gaussian signal}

990: p_{\mathcal{S}^k|\mathcal{V}_s,\mathcal{T}}(s^k|\alpha,1) = \frac{ 1 }{ \sqrt{2 \pi} \alpha }

991: 			      \exp\left[- \frac{ \left( s^k \right)^2 }{ 2 \alpha^2 } \right].

992: \end{equation}

993: For this model ${\bf v}_s = (\alpha)$ has only one component, and

994: $\Theta_{s1}=\{ \alpha ~|~ \alpha > 0\}$.

995:

996:

997:

998:

999: Substituting the signal probability distribution (\ref{Gaussian signal}) into the general expression (\ref{special likelihood estimator})

1000: for $\Lambda_\text{ML}$ yields a Gaussian integral which is straightforward to evaluate.  The result is

1001: \begin{eqnarray}

1002: \label{long Gaussian stat}

1003: \Lambda^\text{G}_\text{ML} &=&  \max_{\alpha > 0}~\max_{\sigma_1 \ge 0}~\max_{\sigma_2 \ge 0}

1004: \left\{

1005:  \frac{\bar\sigma_1 \bar\sigma_2}{\sqrt{\sigma_1^2 \sigma_2^2 + \sigma_1^2 \alpha^2 + \sigma_2^2 \alpha^2}}   \right. \\

1006: &\times & \left. \exp\left[ \frac{ \frac{\bar\sigma_1^2}{\sigma_1^4} + \frac{\bar\sigma_2^2}{\sigma_2^4} + \frac{2\bar\alpha^2}{\sigma_1^2\sigma_2^2} }

1007: 	                   {2 \left( \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} + \frac{1}{\alpha^2} \right)}

1008: 		           - \frac{\bar\sigma_1^2}{2\sigma_1^2} - \frac{\bar\sigma_2^2}{2\sigma_2^2} + 1 \right] \right\}^N ,\nonumber

1009: \end{eqnarray}

1010: where

1011: \begin{equation}

1012: \bar \alpha^2 = \frac{1}{N}\sum_{k=1}^N h_1^k h_2^k,

1013: \label{baralphadef}

1014: \end{equation}

1015: and we have appended a superscript G on $\Lambda^\text{G}_\text{ML}$ to indicate the maximum likelihood detection

1016: statistic for a Gaussian signal.
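For reference, the single-sample integral behind Eq.~(\ref{long Gaussian stat}) can be sketched as follows. With the shorthand $A$ and $B_k$ (symbols introduced only for this illustration),
\begin{equation}
A = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} + \frac{1}{\alpha^2}, \qquad
B_k = \frac{h_1^k}{\sigma_1^2} + \frac{h_2^k}{\sigma_2^2},
\end{equation}
completing the square in $s^k$ gives
\begin{eqnarray}
&& \int_{-\infty}^\infty \frac{ds^k}{\sqrt{2\pi}\alpha}~ \exp\left[ -\frac{A}{2}\left(s^k\right)^2 + B_k s^k \right] \nonumber \\
&& \qquad = \frac{1}{\alpha\sqrt{A}} \exp\left( \frac{B_k^2}{2A} \right).
\end{eqnarray}
The prefactor follows from $\alpha^2 \sigma_1^2 \sigma_2^2 A = \sigma_1^2 \sigma_2^2 + \sigma_1^2 \alpha^2 + \sigma_2^2 \alpha^2$, and the exponent from
\begin{equation}
\frac{1}{N}\sum_{k=1}^N B_k^2 = \frac{\bar\sigma_1^2}{\sigma_1^4} + \frac{\bar\sigma_2^2}{\sigma_2^4} + \frac{2\bar\alpha^2}{\sigma_1^2\sigma_2^2},
\end{equation}
together with the $s^k$-independent terms $-(h_1^k)^2/2\sigma_1^2 - (h_2^k)^2/2\sigma_2^2 + 1$ of Eq.~(\ref{special likelihood estimator}).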

1017:

1018: One can show that the maximum in Eq.~(\ref{long Gaussian stat}) is achieved at $\alpha = \hat\alpha$, $\sigma_1  = \hat\sigma_1$, and

1019: $\sigma_2  = \hat\sigma_2$, where

1020: \begin{eqnarray}

1021: \hat\alpha^2 &=& \bar\alpha^2 ~\theta(\bar\alpha^2) ,  \label{gaussian estimator1}\\

1022: \hat\sigma^2_i &=& (\bar\sigma^2_i - \hat\alpha^2) ~\theta\left( \bar\sigma^2_i - \hat\alpha^2 \right) , \label{gaussian estimator2}

1023: \end{eqnarray}

1024: for $i=1,2$, and $\bar\sigma_1$ and $\bar\sigma_2$ are given by Eq.~(\ref{sigma bar def}).

1025: Here $\theta(x)$ is the step function (\ref{stepfunction}).

1026: The quantities (\ref{gaussian estimator1}) and (\ref{gaussian estimator2}) are the maximum likelihood estimators for

1027: the variance $\alpha^2$ of the signal and the variances $\sigma_1^2$

1028: and $\sigma_2^2$ of the noise in the two detectors. The step functions

1029: in Eqs.~(\ref{gaussian estimator1}) and (\ref{gaussian estimator2})

1030: arise as a result of the bounds of the maximization

1031: in Eq.~(\ref{special likelihood estimator}).

1032:

1033: The corresponding detection statistic is, from Eq.~(\ref{long Gaussian stat})

1034: % FOOTNOTE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1035: \footnote{To simplify the formula for $\Lambda_\text{ML}^\text{G}$ we assume that $\bar\sigma_i^2 - \bar\alpha^2> 0$.

1036: This will be true for any realistic value of $N$ since $\bar\sigma_i^2

1037: - \bar\alpha^2 = \sigma_{i,{\rm true}}^2 + O(1/\sqrt{N})$, where

1038: $\sigma_{i,{\rm true}}$ is the true value of $\sigma_i$ and the second term

1039: describes the statistical fluctuations.},

1040: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1041: \begin{equation} \label{Gaussian statistic}

1042: \Lambda^\text{G}_\text{ML} = \left[ 1 - \frac{\hat\alpha^4}{\bar\sigma_1^2 \bar\sigma_2^2} \right]^{-N/2}.

1043: \end{equation}
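As a check of this expression (a sketch for the case $\bar\alpha^2 > 0$, under the assumption $\bar\sigma_i^2 - \bar\alpha^2 > 0$ made in the footnote), substituting $\alpha^2 = \hat\alpha^2$ and $\sigma_i^2 = \bar\sigma_i^2 - \hat\alpha^2$ into Eq.~(\ref{long Gaussian stat}) gives
\begin{equation}
\sigma_1^2 \sigma_2^2 + \sigma_1^2 \alpha^2 + \sigma_2^2 \alpha^2 = \bar\sigma_1^2 \bar\sigma_2^2 - \hat\alpha^4 ,
\end{equation}
while the first term in the exponent reduces to $+\hat\alpha^2 (\sigma_1^2 + \sigma_2^2)/(2\sigma_1^2\sigma_2^2)$ and the sum of the remaining terms to $-\hat\alpha^2 (\sigma_1^2 + \sigma_2^2)/(2\sigma_1^2\sigma_2^2)$, so that the exponent vanishes. Only the prefactor survives, which is Eq.~(\ref{Gaussian statistic}).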

1044: The cross-correlation statistic $\Lambda_\text{CC}$ can be obtained

1045: from $\Lambda_\text{ML}^\text{G}$ via a monotonic transformation which

1046: preserves false dismissal versus false alarm curves [cf.\ Eq.\

1047: (\ref{transformation}) above]:

1048: \begin{eqnarray}

1049: \Lambda_\text{CC} &=& \sqrt{1 -(\Lambda^\text{G}_\text{ML})^{-2/N}}

1050: \nonumber \\

1051:  &=& \frac{\hat\alpha^2}{\bar\sigma_1 \bar\sigma_2}.

1052: \label{standard cross corr}

1053: \end{eqnarray}
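The monotonicity of this transformation can be seen by inverting it (a brief sketch): since $0 \le \Lambda_\text{CC} < 1$ under the assumptions above,
\begin{equation}
\Lambda^\text{G}_\text{ML} = \left( 1 - \Lambda_\text{CC}^2 \right)^{-N/2}
\end{equation}
is an increasing function of $\Lambda_\text{CC}$, so thresholding on either statistic selects the same sets of data $h$.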

1054:

1055:

1056:

1057:

1058: Note that if we had assumed the noise parameters ${\bf v}_n =

1059: (\sigma_1,\sigma_2)$ were known, and derived a statistic from

1060: Eq.~(\ref{known likelihood estimator}) rather than Eq.~(\ref{general

1061: likelihood estimator}), we would have found instead the detection

1062: statistic $\tilde\Lambda_\text{ML}^\text{G} =

1063: \bar\Lambda_\text{ML}^\text{G}~\theta(\bar\Lambda_\text{ML}^\text{G})$,

1064: where

1065: \begin{equation} \label{known varriance}

1066: \bar\Lambda_\text{ML}^\text{G} = \bar\alpha^2 + \frac{1}{2}\left[

1067:                     \frac{\sigma_2^2}{\sigma_1^2}(\bar\sigma_1^2 -

1068:                     \sigma_1^2)                     +

1069:                     \frac{\sigma_1^2}{\sigma_2^2}(\bar\sigma_2^2 -

1070:                     \sigma_2^2)  \right],

1071: \end{equation}

1072: which is different from the standard cross-correlation statistic. This

1073: non-standard result is obtained because of the unrealistic assumption

1074: that the noise parameters ${\bf v}_n = (\sigma_1,\sigma_2)$ are

1075: known.  Different derivations of the result (\ref{known varriance})

1076: can be found in Refs.~\cite{sam joe,robust gaussian II}.

1077:

1078:

1079: It is often useful to characterize the ``strength'' of a stochastic

1080: background in terms of the signal-to-noise ratio of the

1081: cross-correlation statistic (\ref{standard cross corr}), which we now

1082: define.  First note that for large $N$, the fractional fluctuations in

1083: $\hat\alpha^2$ will be much larger than those in

1084: $\bar\sigma_1\bar\sigma_2$

1085: \footnote{This is true at fixed signal-to-noise ratio $\rho$.}.

1086: For the purpose of defining the signal-to-noise ratio, we assume that $N$

1087: is large enough that

1088: $\bar\sigma_1$ and $\bar\sigma_2$ in Eq.~(\ref{standard cross corr})

1089: can be taken to be independent of $h$,

1090: so that $\Lambda_\text{CC}$ and $\hat\alpha^2$ are equivalent

1091: detection statistics.

1092: We also use ${\bar \alpha}^2$ instead of

1093: ${\hat \alpha}^2$ in the computations that follow, as is conventional

1094: when defining

1095: signal-to-noise ratios.  If a signal is present, then the expected

1096: value of $\bar\alpha^2$ is, from Eqs.\ (\ref{detector output

1097: matrices}), (\ref{assumption2})--(\ref{assumption4}), (\ref{Gaussian

1098: signal}) and (\ref{baralphadef}),

1099: \begin{equation} \label{expected value 1}

1100: \left< \bar \alpha^2 \right> = \alpha^2.

1101: \end{equation}

1102: If no signal is present, so that $\alpha^2=0$, then the fluctuations in $\bar\alpha^2$ are given by

1103: \begin{equation} \label{fluctuations 1}

1104: \Delta \left( \bar \alpha^2 \right) = \frac{\sigma_1\sigma_2}{\sqrt{N}}.

1105: \end{equation}

1106: The signal-to-noise ratio $\rho$ is defined to be the ratio of these two quantities:

1107: \begin{equation} \label{rho}

1108: \rho = \frac{\alpha^2\sqrt{N}}{\sigma_1\sigma_2}.

1109: \end{equation}
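The no-signal fluctuation (\ref{fluctuations 1}) can be checked in one line.  This is a sketch under two assumptions that are not restated in this section: that $\bar\alpha^2$ of Eq.~(\ref{baralphadef}) is the sample cross-correlation $N^{-1}\sum_k h_1^k h_2^k$, and that with no signal the outputs reduce to independent zero-mean Gaussian noise samples, written here as $n_i^k$ (notation introduced only for this illustration):
\begin{eqnarray*}
\left[ \Delta \left( \bar\alpha^2 \right) \right]^2
&=& \frac{1}{N^2} \sum_{k=1}^N \langle (n_1^k)^2 \rangle \langle (n_2^k)^2 \rangle \\
&=& \frac{\sigma_1^2 \sigma_2^2}{N}.
\end{eqnarray*}
The cross terms with $k \ne k'$ vanish because the samples are independent and have zero mean.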

1110:

1111: \subsection{Non-Gaussian signal}

1112: \label{ss:Non-Gaussian signal}

1113:

1114: As mentioned in the introduction, the traditional assumption that a

1115: gravitational wave stochastic background will be Gaussian

1116: requires the individual events to be sufficiently frequent and

1117: uncorrelated.  Our model for a non-Gaussian signal assumes instead that the

1118: events are infrequent.

1119:

1120: Consider a collection of similar events generating a stochastic background $\mathcal{S}$.

1121: Let $\xi$ be the probability that, at any randomly chosen time, the waves from an event

1122: are arriving at the detectors.  We assume that

1123: the time structure of individual

1124: events cannot be resolved by the detectors.  That is,

1125: we assume that the events occur over timescales smaller than the

1126: detectors' resolution time, as illustrated in Fig.~\ref{signal sketch}.

1127: \begin{figure}

1128: \begin{center}

1129: \epsfig{file=signal.eps,width=8cm}

1130: \caption{

1131: Sketched segment of the signal produced by a model non-Gaussian stochastic

1132: background of events unresolved by the detectors. Here we show two events.  The solid curve is the

1133: exact signal.  This exact signal's contributions to the detector outputs, shown as stemmed {\sf o}'s,

1134: are averages of the exact signal over the detector resolution timescale.

1135: }

1136: \label{signal sketch}

1137: \end{center}

1138: \end{figure}

1139: We assume that the distribution of the amplitudes of the events is

1140: Gaussian with variance $\alpha^2$.

1141: The probability distribution for the signal given the signal

1142: parameters $(\xi,\alpha)$ is therefore given by

1143: \begin{eqnarray} \label{signal model}

1144: p_{\mathcal{S}^k|\mathcal{V}_s,\mathcal{T}}[s^k| (\xi,\alpha),1] &=&

1145: \frac{\xi}{\sqrt{2\pi}\alpha}\exp \left[ -\frac{\left( s^k\right) ^2}{2\alpha^2} \right] \nonumber \\

1146: &+& (1-\xi) \delta \left( s^k \right) \label{signal prior},

1147: \end{eqnarray}

1148: together with Eq.~(\ref{assumption4}).

1149: Thus the signal model parameters are ${\bf v}_s=(\xi,\alpha)$, which

1150: give respectively the  ``event probability'' and ``event variance''

1151: characterizing the stochastic background.  The parameter space

1152: $\Theta_s$ for this model is

1153: \begin{equation}

1154: \Theta_{s} = \left\{ (\xi,\alpha) ~|~ 0 \le \xi \le 1 \text{ and } \alpha \ge 0 \right\},

1155: \end{equation}

1156: and the subset corresponding to a signal being present is

1157: \begin{equation}

1158: \Theta_{s1} = \left\{ (\xi,\alpha) ~|~ 0 < \xi \le 1 \text{ and } \alpha > 0 \right\}.

1159: \end{equation}
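As a quick illustration of how $\xi$ controls the non-Gaussianity, the low-order moments of the distribution (\ref{signal model}) follow directly from the moments of a zero-mean Gaussian, since the delta function contributes nothing to them:
\begin{eqnarray*}
\langle (s^k)^2 \rangle &=& \xi \alpha^2, \\
\langle (s^k)^4 \rangle &=& 3 \xi \alpha^4, \\
\frac{\langle (s^k)^4 \rangle}{\langle (s^k)^2 \rangle^2} &=& \frac{3}{\xi}.
\end{eqnarray*}
The last ratio reduces to the Gaussian value of 3 at $\xi = 1$ and grows without bound as $\xi \to 0$.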

1160:

1161:

1162:

1163:

1164: Note that our assumption that the time structure of events is not resolved by the detector is unrealistic.  Detector resolution times

1165: can be as small as 0.1 ms in the case of ground-based detectors like LIGO

1166: \footnote{For ground-based detectors, the effective resolution time in a cross-correlation between

1167: two detectors can be considerably longer than $0.1$ ms \cite{Allen

1168:   Romano}, which may help with this issue.},

1169: and even supernova bursts are expected to

1170: have time scales $\gtrsim 10$ ms \cite{waveform catalog,new waveform catalog}.

1171: It will be important for future studies to relax this assumption.

1172:

1173:

1174:

1175: We now compute the maximum likelihood detection statistic $\Lambda^\text{NG}_\text{ML}$ for our simple non-Gaussian signal model

1176: by substituting Eq.~(\ref{signal prior}) into Eq.~(\ref{special likelihood estimator}).

1177: This yields

1178: \begin{widetext}

1179: \begin{eqnarray}  \label{main result2}

1180: \Lambda_{\text{ML}}^{\text{NG}} &=&

1181: \max_{0<\xi\le 1}~ \max_{\alpha > 0}~ \max_{\sigma_1 \ge 0}~ \max_{\sigma_2 \ge 0}~ \prod_{k=1}^N

1182: \left\{

1183:         \frac{ \bar\sigma_1 \bar\sigma_2 \xi}{\sqrt{\sigma^2_1 \sigma^2_2 + \sigma^2_1 \alpha^2 + \sigma^2_2 \alpha^2}}

1184:         \exp \left[ \frac{\left( \frac{h_1^k}{\sigma^2_1} + \frac{h_2^k}{\sigma^2_2}\right)^2}

1185:         {2\left( \frac{1}{\sigma^2_1} + \frac{1}{\sigma^2_2} + \frac{1}{\alpha^2} \right)}

1186:         - \frac{\left( h_1^k\right)^2}{2\sigma^2_1} - \frac{\left( h_2^k\right)^2}{2\sigma^2_2} + 1\right]  \right. \nonumber \\

1187: &+& \left. \frac{\bar\sigma_1 \bar\sigma_2}{\sigma_1 \sigma_2}  (1-\xi)

1188:         \exp \left[ - \frac{\left( h_1^k\right)^2}{2\sigma^2_1} - \frac{\left( h_2^k\right)^2}{2\sigma^2_2} + 1\right]\right\}.

1189: \end{eqnarray}

1190: \end{widetext}

1191: The values of $\xi$, $\alpha^2$, $\sigma_1^2$, and $\sigma_2^2$ which achieve the maximum in Eq.~(\ref{main result2})

1192: are, respectively, estimators of the signal's Gaussianity parameter,

1193: the variance of the signal events,

1194: and the noise variances in the two detectors\footnote{See Ref.~\cite{MG9} for a derivation of a statistic similar to $\Lambda_\text{ML}^\text{NG}$ and

1195: designed for the same non-Gaussian signals which is based on Eq.~(\ref{known likelihood estimator}) rather

1196: than Eq.~(\ref{general likelihood estimator}).}.

1197: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1198: Note that if we evaluate Eq.~(\ref{main result2}) at $\xi=1$, rather than maximizing over $\xi$,

1199: we recover Eq.~(\ref{long Gaussian stat}) and the statistic $\Lambda_\text{ML}^\text{G}$.

1200:

1201: We mention in passing an approximate version of the statistic

1202: (\ref{main result2}) which is significantly easier to compute.

1203: Expanding the logarithm of the

1204: quantity to be maximized in Eq.~(\ref{main result2}) as a power

1205: series in $\alpha^2$ to fourth order about $\alpha^2=0$ yields the

1206: approximate statistic $\hat\Lambda_\text{ML}^\text{NG}$ given by

1207: \begin{eqnarray} \label{expanded}

1208: \ln \hat\Lambda_\text{ML}^\text{NG} &=& \max_{0<\xi\le 1}~ \max_{\alpha > 0}~ \max_{\sigma_1 \ge 0}~ \max_{\sigma_2 \ge 0}~

1209:                         \sum_{n=0}^4

1210:                         \sum_{l = 0}^{8}

1211:                         \sum_{m=0}^8

1212:                         \left(\frac{\alpha^2}{\sigma_1 \sigma_2}\right)^n \nonumber \\

1213: &\times&		C_{nlm}\left(\xi,\sigma_1,\sigma_2\right)

1214: 			\sum_{k=1}^N (h_1^k)^l (h_2^k)^m,

1215: \end{eqnarray}

1216: where the coefficients $C_{nlm}(\xi,\sigma_1,\sigma_2)$

1217: %are tabulated in Appendix \ref{s:coeffs}.  These coefficients

1218: vanish

1219: unless $l+m$ is even and $l+m \le 8$.

1220: In evaluating the statistic (\ref{expanded}), one can first evaluate the 24 sums

1221: \begin{equation}

1222: \sum_{k=1}^N (h_1^k)^l (h_2^k)^m

1223: \end{equation}

1224: for the required values of $l$ and $m$, and subsequently numerically maximize over the parameters $\xi$,

1225: $\alpha$, $\sigma_1$, and $\sigma_2$. Thus the length-$N$ sums need only be performed once, rather than each time one tries a new

1226: set of values for $\xi$, $\alpha$,

1227: $\sigma_1$, and $\sigma_2$. Therefore the computational cost of $\hat\Lambda_\text{ML}^\text{NG}$ is only about an order of

1228: magnitude greater than that of the cross correlation statistic

1229: $\Lambda_\text{CC}$, and this statistic may be useful to explore.
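One way to arrive at the count of 24 sums quoted above is the following bookkeeping sketch (it is not taken from the table of coefficients, which is omitted here).  For each even total $j = l + m \le 8$ there are $j+1$ pairs $(l,m)$, so the number of pairs is
\[
\sum_{j \in \{0,2,4,6,8\}} (j+1) = 1 + 3 + 5 + 7 + 9 = 25.
\]
The pair $l = m = 0$ gives the trivial sum $N$, which leaves 24 nontrivial sums over the data.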

1230:

1231:

1232:

1233:

1234: We now derive the signal-to-noise ratio $\rho$ for the cross-correlation

1235: statistic and for the non-Gaussian signal (\ref{signal model}).

1236: If the signal is present, then from Eqs.\ (\ref{detector output

1237: matrices}), (\ref{assumption4}),

1238: (\ref{baralphadef}), (\ref{gaussian estimator1}) and (\ref{signal model})

1239: the expected value

1240: of $\bar\alpha^2$ is

1241: \begin{equation} \label{expected value 2}

1242: \left< \bar \alpha^2 \right> = \xi\alpha^2.

1243: \end{equation}

1244: If no signal is present then the fluctuations in $\bar\alpha^2$ are given by

1245: \begin{equation} \label{fluctuations 2}

1246: \Delta \left( \bar \alpha^2 \right) = \frac{\sigma_1\sigma_2}{\sqrt{N}}.

1247: \end{equation}

1248: Therefore, taking the ratio of Eqs.\ (\ref{expected value 2}) and

1249: (\ref{fluctuations 2}), the signal-to-noise ratio $\rho$ is

1250: \begin{equation} \label{rho2}

1251: \rho = \frac{\xi\alpha^2\sqrt{N}}{\sigma_1\sigma_2}.

1252: \end{equation}
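For orientation, Eq.~(\ref{rho2}) can be inverted to give the signal-to-noise ratio of an individual event.  As an illustration, take $N = 5\times10^4$, $\rho = 1$, and $\xi = 0.01$ (the values used in Fig.~\ref{compare}):
\[
\frac{\alpha^2}{\sigma_1\sigma_2} = \frac{\rho}{\xi\sqrt{N}}
\approx \frac{1}{0.01 \times 223.6} \approx 0.45.
\]
In this example each individual event is weaker than the noise, even though the accumulated signal-to-noise ratio of the cross-correlation statistic is of order unity.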

1253:

1254:

1255: \section{Performance comparison}

1256: \label{s:Performance comparison}

1257:

1258:

1259: In this section we compare the performances of the cross-correlation

1260: statistic (\ref{standard cross corr}), the burst statistic

1261: (\ref{eq:lambdaBdef}), and the maximum

1262: likelihood statistic (\ref{main result2})

1263: for our model non-Gaussian signal described in Sec.~\ref{ss:Non-Gaussian signal}.

1264: The comparison is quantified in terms of the false alarm versus false

1265: dismissal curves, as discussed in Sec.\ \ref{s:General theory of detection

1266: statistics and parameters estimator} above.

1267: In Sec.\ \ref{ss:analytic} we discuss analytic predictions for these curves

1268: for the three different statistics.  Section \ref{ss:Description of the

1269: simulation algorithm} describes our Monte Carlo simulation algorithm,

1270: and Secs.\ \ref{ss:Results for detection} and \ref{ss:Results for

1271:   parameter estimation} describe the results.

1272:

1273:

1274:

1275:

1276: \subsection{Analytic computation of asymptotic behavior of statistics}

1277: \label{ss:analytic}

1278:

1279:

1280: We start by discussing the set of parameters on which the false dismissal versus false

1281: alarm curves can depend.  As before, we assume two detectors with noise characterized by

1282: Eq.~(\ref{assumption2}) with $\mathcal{V}_n=(\sigma_1,\sigma_2)$,  and a non-Gaussian

1283: signal characterized by Eqs.~(\ref{assumption4}) and (\ref{signal prior}) with

1284: $\mathcal{V}_s=(\xi,\alpha)$.

1285: The curves for each statistic are given by some function

1286: \begin{equation}\label{dependance1}

1287: P_\text{FD} = P_\text{FD}(P_\text{FA},\xi,\alpha,\sigma_1,\sigma_2,N)

1288: \end{equation}

1289: of the false alarm probability $P_{\rm FA}$, the Gaussianity parameter

1290: $\xi$, the rms amplitude $\alpha$ of events, the noise variances

1291: $\sigma_1^2$ and $\sigma_2^2$, and the number of data points $N$.

1292: We can simplify Eq.~(\ref{dependance1}) by replacing $\alpha$ with the

1293: signal-to-noise ratio $\rho$ using the definition (\ref{rho2}), and

1294: noting from dimensional analysis that $P_\text{FD}$ depends on $\sigma_1$

1295: and $\sigma_2$ at fixed $\rho$ only through the ratio

1296: $\sigma_1/\sigma_2$.  This gives

1297: \begin{equation}\label{dependance2}

1298: P_\text{FD} = P_\text{FD}(P_\text{FA},\xi,\rho,\sigma_1/\sigma_2,N).

1299: \end{equation}

1300: For simplicity, we specialize to $\sigma_1=\sigma_2$ for the remainder of this paper.

1301: This implies that

1302: \begin{equation}\label{dependance3}

1303: P_\text{FD} = P_\text{FD}(P_\text{FA},\xi,\rho,N).

1304: \end{equation}
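The dimensional-analysis step can be spelled out as a scaling argument (a sketch).  Multiplying both detector outputs by a common constant $\lambda > 0$ maps
\[
(\alpha,\sigma_1,\sigma_2) \rightarrow (\lambda\alpha,\lambda\sigma_1,\lambda\sigma_2),
\]
while leaving $\xi$, $N$, $\sigma_1/\sigma_2$, and $\rho \propto \alpha^2/(\sigma_1\sigma_2)$ unchanged.  For a statistic that is invariant under this rescaling, such as $\Lambda_\text{CC}$ in Eq.~(\ref{standard cross corr}), the false dismissal versus false alarm curve therefore cannot depend on $\sigma_1$ and $\sigma_2$ separately at fixed $\rho$.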

1305:

1306:

1307: \subsubsection{Cross correlation statistic}

1308:

1309:

1310:

1311: The false dismissal versus false alarm curves for the cross-correlation statistic can be computed

1312: analytically in the large $N$ limit, as we now describe.  Our derivation generalizes the analysis of

1313: Ref.~\cite{Allen Romano} from Gaussian to non-Gaussian signals. For any detection statistic $\Gamma$,

1314: we can express $P_\text{FA}$ and $P_\text{FD}$ in terms of the detection threshold $\Gamma_*$ as

1315: \begin{eqnarray}

1316: P_\text{FA}(\Gamma_*,\sigma_1,\sigma_2,N) &=&

1317:          \int_{\Gamma_*}^\infty dx~p_{\Gamma|\mathcal{T}}(x|0) , \label{simple Pfa}\\

1318: P_\text{FD}(\Gamma_*,\xi,\rho,\sigma_1,\sigma_2,N) &=&

1319:          1 - \int_{\Gamma_*}^\infty dx~p_{\Gamma|\mathcal{T}}(x|1). \label{simple Pfd}

1320: \end{eqnarray}

1321: Here the definition of the random variable $\mathcal{T}$ is such that

1322: if $\mathcal{T}=0$ then no signal is present ($\xi = \rho = 0$), and

1323: if $\mathcal{T}=1$ then a signal is present ($\xi \ne 0$ and $\rho \ne

1324: 0$); cf.\ Sec.\ \ref{ss:Notational conventions} above.

1325: Note that by eliminating $\Gamma_*$ between

1326: Eqs.~(\ref{simple Pfa}) and (\ref{simple Pfd}), we recover Eq.~(\ref{dependance1}).

1327:

1328: In the large $N$ limit, the distribution

1329: $p_{\Lambda_\text{CC}|\mathcal{T}}(x|t)$ is a Gaussian by the

1330: central limit theorem, and the integrals

1331: (\ref{simple Pfa}) and (\ref{simple Pfd}) can be evaluated

1332: analytically (see Appendix \ref{s:appendixB}) to give

1333: \begin{eqnarray} \label{analytic}

1334: &&P_\text{FD} \left(P_\text{FA},\xi,\rho,N \right) =  1  \\

1335: &&-\frac{1}{2} \erfc \left[ \frac{\displaystyle \erfc^{-1} \left(2P_\text{FA}\right) - \frac{\rho}{\sqrt{2}} }

1336:                            {\sqrt{\displaystyle \frac{\rho^2}{N}\left( \frac{3}{\xi} - 1 \right) + \frac{2\rho}{\sqrt{N}} + 1}}

1337: 		      \right] + O \left( {1 \over \sqrt{N}} \right). \nonumber

1338: \end{eqnarray}

1339: Here the function $\erfc(x)$ (known as the complement of the error

1340: function) is defined by

1341: \begin{equation}

1342: \erfc(x) = \frac{2}{\sqrt{\pi}}\int_x^\infty dy~e^{-y^2},

1343: \end{equation}

1344: and $\erfc^{-1}(x)$ is the inverse of $\erfc(x)$.

1345: The formula (\ref{analytic}) is valid only for $P_{\rm FA} < 1/2$;

1346: $P_\text{FD}$ is undefined for $1/2 \le P_\text{FA} < 1$.

1347: In deriving Eq.~(\ref{analytic}), we assumed

1348: that the statistics $\Lambda_\text{CC}$ and $\hat\alpha^2$

1349: are equivalent, and that the distribution for $\bar\alpha^2$ is

1350: Gaussian. Those assumptions

1351: are only valid up to fractional correction terms of order

1352: $1/\sqrt{N}$; hence the indicated correction term in Eq.\ (\ref{analytic}).
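The square root in the denominator of Eq.~(\ref{analytic}) can be understood from a short variance computation.  This is only a sketch, assuming $h_i^k = n_i^k + s^k$ with independent zero-mean Gaussian noise $n_i^k$ (notation introduced for this illustration) and $\sigma_1 = \sigma_2 = \sigma$; the full derivation is the one in Appendix \ref{s:appendixB}.  With a signal present, the variance of a single product $h_1 h_2$ is
\begin{eqnarray*}
\langle (h_1 h_2)^2 \rangle - \langle h_1 h_2 \rangle^2
&=& \sigma^4 + 2\xi\alpha^2\sigma^2 \\
& & + (3\xi - \xi^2)\alpha^4.
\end{eqnarray*}
Dividing by the no-signal value $\sigma^4$ and substituting $\xi\alpha^2/\sigma^2 = \rho/\sqrt{N}$ from Eq.~(\ref{rho2}) gives
\[
1 + \frac{2\rho}{\sqrt{N}} + \frac{\rho^2}{N}\left( \frac{3}{\xi} - 1 \right),
\]
which is the quantity under the square root in Eq.~(\ref{analytic}).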

1353:

1354:

1355: In the regime where $\rho^2\ll N \xi$ in addition to $N \gg 1$, the

1356: result (\ref{analytic}) simplifies to

1357: \begin{eqnarray}

1358: P_\text{FD} \left(P_\text{FA},\xi,\rho,N \right) &=&  1 - \frac{1}{2}

1359: \erfc \left[ \erfc^{-1} \left(2P_\text{FA}\right) -

1360: \frac{\rho}{\sqrt{2}} \right]  \nonumber \\

1361: &+& O\left( \frac{1}{ \sqrt{N} }\right) + O\left({ \rho \over

1362: \sqrt{N}} \right) + O\left({ \rho^2 \over

1363: N \xi } \right). \nonumber \\

1364:  \label{specialized analytic}

1365: \end{eqnarray}

1366: Note that the false dismissal versus false alarm relation

1367: (\ref{specialized analytic}) is independent of both $N$ and $\xi$.

1368: Sample curves from Eq.~(\ref{specialized analytic}) are shown in

1369: Fig.~\ref{analytical curves}.

1370: \begin{figure}

1371: \begin{center}

1372: \epsfig{file=analytical.eps,width=8.5cm}

1373: \caption{Sample false dismissal versus false alarm curves for

1374: the cross correlation statistic $\Lambda_\text{CC}$

1375: in the large $N$ limit, as prescribed by Eq.~(\ref{specialized

1376:   analytic}).  For these curves

1377: the signal-to-noise ratio $\rho$ has equally spaced values from 0.01

1378: to 1. Note that here $P_\text{FD}$

1379: is undefined for $1/2 \le P_\text{FA} <  1$.}

1380: \label{analytical curves}

1381: \end{center}

1382: \end{figure}

1383: The discontinuities at $P_\text{FA} = 1/2$ are a result of the step

1384: functions in the definition (\ref{gaussian estimator1})

1385: of $\hat\alpha^2$.
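Equation (\ref{specialized analytic}) takes a familiar form when rewritten in terms of the standard normal cumulative distribution function $\Phi(x) = \frac{1}{2}\erfc(-x/\sqrt{2})$ (notation introduced here for illustration).  Dropping the correction terms, and for $P_\text{FA} < 1/2$,
\[
P_\text{FD} \approx \Phi\left[ \Phi^{-1}(1 - P_\text{FA}) - \rho \right].
\]
For example, $P_\text{FA} = 0.1$ and $\rho = 1$ give $\Phi^{-1}(0.9) \approx 1.28$ and hence $P_\text{FD} \approx \Phi(0.28) \approx 0.61$.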

1386:

1387:

1388: \subsubsection{Burst statistic}

1389:

1390: By combining the definition (\ref{eq:lambdaBdef}) of the burst

1391: statistic together with the decomposition (\ref{eq:common}), the

1392: noise and signal distributions (\ref{assumption2}) and (\ref{signal

1393: prior}), and the change of variables (\ref{rho2}),

1394: it is straightforward to derive the exact false alarm versus

1395: false dismissal relation.  The result is given by

1396: \begin{eqnarray}

1397: (1 - P_{\rm FA})^{1/N} = {\rm erf}\left({\Lambda_* \over \sqrt{2}}\right)

1398: \label{burstans1}

1399: \end{eqnarray}

1400: and

1401: \begin{equation}

1402: P_{\rm FD}^{1/N} = \xi \, {\rm erf} \left[ { \Lambda_* \over \sqrt{

1403:        2 + {2 \rho \over \xi \sqrt{N}} }} \right] + (1 -

1404:       \xi) \, {\rm erf} \left( {\Lambda_* \over \sqrt{2}} \right),

1405: \label{burstans2}

1406: \end{equation}

1407: where $\Lambda_*$ is the value of the threshold.
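Two elementary rearrangements make Eqs.\ (\ref{burstans1}) and (\ref{burstans2}) easier to use; both are sketches that follow from these equations and Eq.~(\ref{rho2}) alone.  First, the threshold corresponding to a desired false alarm probability is
\[
\Lambda_* = \sqrt{2}\, \erf^{-1}\left[ (1 - P_{\rm FA})^{1/N} \right].
\]
Second, since $\rho/(\xi\sqrt{N}) = \alpha^2/(\sigma_1\sigma_2)$, the argument of the first error function in Eq.~(\ref{burstans2}) can be written as
\[
\frac{\Lambda_*}{\sqrt{2\left[ 1 + \alpha^2/(\sigma_1\sigma_2) \right]}},
\]
so that, apart from the weights $\xi$ and $1-\xi$, the signal enters only through the signal-to-noise ratio of individual events.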

1408:

1409: \subsubsection{Maximum likelihood statistic}

1410: \label{s:MLS}

1411:

1412:

1413: We start by discussing the

1414: different regimes present in the space of signal

1415: parameters $\xi$, $\rho$ and $N$, treating the

1416: false alarm probability $P_\text{FA}$ as fixed.  There are several

1417: different constraints

1418: on the three parameters $\xi$,

1419: $\rho$, and $N$ that define the regime in parameter space where we

1420: expect our maximum likelihood statistic to work well. First, it is clear that the total

1421: number of events $\sim \xi N$ in the data set must be large compared to one:

1422: \begin{equation}

1423: \xi \gg \frac{1}{N}.

1424: \label{eq:constraint2a}

1425: \end{equation}

1426:

1427:

1428: Second, if the signal-to-noise ratio $\alpha^2 / (\sigma_1 \sigma_2)$

1429: of individual burst events is large compared to one, then one can detect the

1430: individual events using the burst statistic (\ref{eq:lambdaBdef})

1431: and the method of this paper is not needed.  From

1432: Eq.\ (\ref{rho2}) we can write the constraint $\alpha^2 / (\sigma_1

1433: \sigma_2) \alt 1$ as

1434: \begin{equation}

1435: \xi \agt \frac{\rho}{\sqrt{N}}.

1436: \end{equation}

1437: A more precise version of this requirement can be obtained by noting

1438: that the detection threshold for the signal-to-noise ratio

1439: $\alpha^2 / (\sigma_1 \sigma_2)$ is $\sim \sqrt{2 \ln N}$, since there

1440: are $N$ independent trials.  This yields the constraint

1441: \begin{equation}

1442: \xi \agt \frac{\rho}{\sqrt{2 N \ln N}}.

1443: \label{eq:constraint2}

1444: \end{equation}

1445: The regime $\xi \sim \rho / \sqrt{2 N \ln N}$ is where the

1446: burst statistic $\Lambda_\text{B}$ starts becoming as sensitive as the

1447: cross correlation statistic, as can be seen by combining Eqs.\

1448: (\ref{specialized analytic}), (\ref{burstans1}) and (\ref{burstans2})

1449: above.  This behavior can also be seen

1450: in Figs.\ \ref{omega gain} and \ref{fig:theoretical} above.

1451:

1452:

1453: A third constraint on the space of signal parameters is derived as

1454: follows.  Consider the statistic

1455: \begin{equation}

1456: \eta = \frac{1}{N}\sum_{k=1}^N (h_1^k)^2(h_2^k)^2.

1457: \label{etadef}

1458: \end{equation}

1459: We can use this statistic to estimate the Gaussianity parameter $\xi$

1460: in the following way.

1461: The mean value of $\eta$ when a signal is present is given by

1462: \begin{equation} \label{eta mean}

1463: \left< \eta \right> = 3\xi\alpha^4 + \xi\alpha^2(\sigma_1^2+\sigma_2^2) + \sigma_1^2 \sigma_2^2,

1464: \end{equation}

1465: and the variance when a signal is absent is

1466: \begin{equation} \label{eta var}

1467: (\Delta \eta)^2 = \sigma^4_1\sigma^4_2 \frac{8}{N}.

1468: \end{equation}
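Both moments follow from standard Gaussian moment identities.  As a sketch, write $h_i^k = n_i^k + s^k$ with independent zero-mean Gaussian noise $n_i^k$ (notation introduced for this illustration), expand $(n_1+s)^2(n_2+s)^2$, and discard the terms that are odd in $n_1$ or $n_2$:
\begin{eqnarray*}
\langle \eta \rangle &=& \langle n_1^2 \rangle \langle n_2^2 \rangle
 + \langle s^2 \rangle \left( \langle n_1^2 \rangle + \langle n_2^2 \rangle \right) \\
& & + \langle s^4 \rangle.
\end{eqnarray*}
Inserting $\langle s^2 \rangle = \xi\alpha^2$ and $\langle s^4 \rangle = 3\xi\alpha^4$ reproduces Eq.~(\ref{eta mean}).  With no signal, $\langle n_i^4 \rangle = 3\sigma_i^4$ gives
\[
N (\Delta\eta)^2 = 9\sigma_1^4\sigma_2^4 - \sigma_1^4\sigma_2^4 = 8\sigma_1^4\sigma_2^4,
\]
in agreement with Eq.~(\ref{eta var}).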

1469: It follows from Eqs.\ (\ref{eta mean}), (\ref{eta var}), and the

1470: relation $\left<

1471: \hat\alpha^2 \right> = \xi\alpha^2$ that the estimator ${\hat \xi}$ of

1472: $\xi$ defined by

1473: \begin{equation}

1474: \hat \xi = \frac{3 \hat\alpha^4}{\eta - \hat\alpha^2 (\hat\sigma_1^2 + \hat\sigma_2^2) - \hat\sigma_1^2\hat\sigma_2^2}

1475: \end{equation}

1476: has a fractional accuracy of order

1477: \begin{equation}

1478: \frac{\Delta \xi}{\xi} \sim \frac{\xi \sqrt{N}}{\rho^2}.

1479: \label{Deltaxi}

1480: \end{equation}

1481: Now in the regime $\Delta \xi / \xi \ll 1$, we expect our maximum

1482: likelihood detection statistic to work well, since even the simple

1483: nonlinear statistic (\ref{etadef}) can then be used to measure the

1484: non-Gaussianity of the signal to high accuracy.

1485: In the regime $\Delta \xi / \xi \gg 1$, it is not obvious how the

1486: maximum likelihood detection statistic will perform, since it could

1487: have a performance much better than that of the statistic $\eta$.

1488: However, our Monte Carlo simulations [Sec.\ \ref{ss:Description of the

1489: simulation algorithm} below] and analytic

1490: computations [Appendix

1491: \ref{s:appendixC}] indicate that the maximum likelihood statistic does

1492: indeed perform poorly in the regime $\Delta \xi / \xi \gg 1$.

1493: Thus, our third constraint is $\Delta \xi / \xi \alt 1$, which from

1494: Eq.\ (\ref{Deltaxi}) can be written as

1495: \begin{equation}

1496: \xi \alt \frac{\rho^2}{\sqrt{N}}.

1497: \label{constraint3}

1498: \end{equation}

1499: Our Monte Carlo simulations show that for $\rho^2/\sqrt{N} \alt \xi

1500: \alt 1$, the maximum likelihood and cross-correlation statistics

1501: perform roughly equivalently, and that once $\xi$ becomes smaller than

1502: $\rho^2 / \sqrt{N}$, the maximum likelihood statistic starts to

1503: perform significantly better than the cross-correlation statistic; see

1504: Figs.\ \ref{omega gain} and \ref{fig:theoretical} above.
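The scaling (\ref{Deltaxi}) can be recovered by comparing the no-signal fluctuation $\Delta\eta \sim \sigma^4/\sqrt{N}$ of Eq.~(\ref{eta var}) with the non-Gaussian term $3\xi\alpha^4$ in Eq.~(\ref{eta mean}).  As a sketch with $\sigma_1 = \sigma_2 = \sigma$ and numerical factors dropped, Eq.~(\ref{rho2}) gives $\xi\alpha^4 = \rho^2\sigma^4/(\xi N)$, so that
\[
\frac{\Delta\xi}{\xi} \sim \frac{\sigma^4/\sqrt{N}}{\xi\alpha^4} = \frac{\xi\sqrt{N}}{\rho^2}.
\]
To get a feel for the three constraints, consider the illustrative values $N = 5\times10^4$ and $\rho = 1$:
\begin{eqnarray*}
\frac{1}{N} &=& 2\times10^{-5}, \\
\frac{\rho}{\sqrt{2N\ln N}} &\approx& 9.6\times10^{-4}, \\
\frac{\rho^2}{\sqrt{N}} &\approx& 4.5\times10^{-3}.
\end{eqnarray*}
These are order-of-magnitude boundaries only, since factors of order unity have been dropped throughout.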

1505:

1506:

1507: In Appendix \ref{s:appendixC} we derive analytically the approximate

1508: expression (\ref{eq:ansA}) for the false dismissal

1509: probability for the maximum likelihood statistic, which we expect to

1510: be accurate up to corrections of order $1/\rho^4$ or a few tens of

1511: percent.  We also derive the expression (\ref{eq:ansB}) for the false

1512: alarm probability using a combination of analytical and numerical

1513: techniques.  Combining these results gives the curves which are associated with the

1514: maximum likelihood statistic $\Lambda_\text{ML}^\text{NG}$ and labeled ``analytic'' in

1515: Figs.~\ref{omega gain}, \ref{fig:theoretical}, \ref{detection curves}, and \ref{LargeN}.

1516:

1517:

1518:

1519:

1520: \subsection{Description of the Monte Carlo simulation algorithm}

1521: \label{ss:Description of the simulation algorithm}

1522:

1523: Next we describe our Monte Carlo simulations of the performances of the various statistics.

1524: We numerically estimate the false dismissal and false alarm probabilities

1525: $P_\text{FD}$ and $P_\text{FA}$ by conducting an ensemble

1526: of $N_E$ simulated experiments.

1527: For each experiment we simulate a detector output matrix; half of the experiments

1528: have a signal present, and half do not.  Since we know in advance whether or not a signal is present, we can

1529: easily estimate $P_\text{FA}$ and $P_\text{FD}$.  More specifically,

1530: our algorithm for simulating false

1531: dismissal versus false alarm curves, for an arbitrary statistic

1532: $\Gamma$, is as follows:

1533: \begin{enumerate}

1534: \item Choose values for $\xi$, $\alpha$, $\sigma_1$, $\sigma_2$, and $N$.

1535: \item Choose the total number of trials $N_E$.

1536: \item For $r=1,2,\ldots,N_E/2$:

1537: 	\begin{enumerate}

1538:                 \item Generate a data train $h(\sigma_1,\sigma_2,N)$ of noise only.

1539:                 \item Compute $\Gamma$ and store result as $\Gamma_{r0}$.

1540: 		\item Generate a data train $h(\xi,\alpha,\sigma_1,\sigma_2,N)$ which has a signal present.

1541: 		\item Compute $\Gamma$ and store result as $\Gamma_{r1}$.

1542: 	\end{enumerate}

1543: \item Choose a discretization $\Gamma_{*j}$ of the set of thresholds,

1544: where $j=1,2,\ldots,M$.

1545: \item Set $P_\text{FA}(\Gamma_{*j}) = P_\text{FD}(\Gamma_{*j}) = 0$, for each $j$.

1546: \item For $r=1,2,\ldots,N_E/2$:

1547: 	\begin{enumerate}

1548: 		\item for each $j$, if $\Gamma_{r0} > \Gamma_{*j}$, increment $P_\text{FA}(\Gamma_{*j})$ by $2/N_E$.

1549: 		\item for each $j$, if $\Gamma_{r1} \le \Gamma_{*j}$, increment $P_\text{FD}(\Gamma_{*j})$ by $2/N_E$.

1550: 	\end{enumerate}

1551: \item Repeat steps 3--6 above several times to estimate the fluctuations in $P_\text{FA}(\Gamma_{*j})$ and $P_\text{FD}(\Gamma_{*j})$.

1552: \end{enumerate}
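Steps 5 and 6 of the algorithm amount to evaluating empirical distribution functions.  In compact form (a restatement, with $\theta(x) = 1$ for $x > 0$ and $\theta(x) = 0$ otherwise),
\begin{eqnarray*}
P_\text{FA}(\Gamma_{*j}) &=& \frac{2}{N_E} \sum_{r=1}^{N_E/2} \theta\left( \Gamma_{r0} - \Gamma_{*j} \right), \\
P_\text{FD}(\Gamma_{*j}) &=& \frac{2}{N_E} \sum_{r=1}^{N_E/2} \left[ 1 - \theta\left( \Gamma_{r1} - \Gamma_{*j} \right) \right].
\end{eqnarray*}
If the trials are independent, each estimated probability $P$ has a binomial standard error of $\sqrt{P(1-P)/(N_E/2)}$.  This standard rule of thumb is added here for illustration and is not taken from the simulations; it offers an independent check on the scatter found in step 7.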

1553:

1554: We use the above algorithm to simulate false dismissal versus false

1555: alarm curves for the three statistics $\Lambda_\text{CC}$,

1556: $\Lambda_{\rm B}$ and

1557: $\Lambda_\text{ML}^\text{NG}$.  The analytical expressions

1558: (\ref{analytic}) and (\ref{burstans1})--(\ref{burstans2}) for the

1559: cross-correlation and burst statistics are

1560: used as a check of the numerical method.

1561:

1562:

1563:

1564:

1565: \subsection{Simulation results}

1566: \label{ss:Results for detection}

1567:

1568: A family of simulated false dismissal versus false alarm curves for

1569: the cross correlation statistic $\Lambda_\text{CC}$ and

1570: the maximum likelihood statistic $\Lambda_\text{ML}^\text{NG}$ is

1571: shown in Fig.~\ref{compare}.

1572: \begin{figure}

1573: \begin{center}

1574: \epsfig{file=compare.eps,width=8.5cm}

1575: \caption{

1576: Plots of false dismissal probability ($P_\text{FD}$) versus false alarm

1577: probability ($P_\text{FA}$) for the standard cross-correlation

1578: statistic $\Lambda_\text{CC}$ and

1579: our maximum likelihood statistic $\Lambda_\text{ML}^\text{NG}$.  Each

1580: of these curves is characterized by a total number of trials $N_E =

1581: 2\times 10^4$, number of data points $N = 5\times10^4$, noise variances

1582: $\sigma_1^2 = \sigma_2^2 = 1$, and by the signal-to-noise ratio $\rho = 1$.

1583: The values of the Gaussianity parameter $\xi$ are 0.02, 0.012, and

1584: 0.01.  The solid curves are the results for $\Lambda_\text{CC}$;

1585: these curves are bunched together because

1586: $\rho$ is fixed.  The dashed curves are the results for $\Lambda_{\rm

1587: ML}^{\rm NG}$.  For the dashed curves, the lowest curve is for $\xi =

1588: 0.01$, while the highest curve is for $\xi = 0.02$.

1589: We estimate error bars for each of these curves by separating the $2 \times

1590: 10^4$ runs into 10 bins of $2 \times 10^3$, and generating 10 separate

1591: plots; the resulting fluctuations are $\alt 10^{-3}$.  The curves for

1592: the cross-correlation statistic $\Lambda_\text{CC}$ agree with

1593: the analytic prediction (\ref{analytic}) to within $\sim 10^{-3}$.

1594: This plot shows that $\Lambda_\text{ML}^\text{NG}$ can perform

1595: significantly better than $\Lambda_\text{CC}$.

1596: }

1597: \label{compare}

1598: \end{center}

1599: \end{figure}

1600: We see that at fixed $\rho$, as the Gaussianity $\xi$ of the signal

1601: decreases, $\Lambda_\text{ML}^\text{NG}$ performs increasingly better

1602: than $\Lambda_\text{CC}$.

1603: The curves for $\Lambda_\text{CC}$ are almost indistinguishable from

1604: each other because $\rho$ is fixed, and the curves depend only on

1605: $\rho$ and not on $\xi$ for this detection statistic in the large $N$

1606: limit [cf.\ Eq.\ (\ref{specialized analytic}) above].

1607:

1608: If we maintain the same value for $\rho$ as in Fig.~\ref{compare}, but

1609: take $\xi \gtrsim 0.03$, the

1610: curves for $\Lambda_\text{CC}$ and $\Lambda_\text{ML}^\text{NG}$ cannot be distinguished from each other.

1611: We find in general that for \emph{any} values of $N$, $\sigma_1$, $\sigma_2$, and $\rho$,

1612: as $\xi \rightarrow 1$, the false dismissal versus false alarm curves for $\Lambda_\text{CC}$

1613: and $\Lambda_\text{ML}^\text{NG}$ cannot be distinguished from each other.

1614: Thus, the two statistics are nearly equivalent for Gaussian

1615: signals, as expected.  However, for $\xi \ll 1$,

1616: Fig.\ \ref{compare} demonstrates that $\Lambda_\text{ML}^\text{NG}$ performs noticeably better than

1617: $\Lambda_\text{CC}$.

1618:

1619:

1620: We now discuss a comparison of the two statistics in terms of the

1621: minimum gravitational wave energy density necessary for detection,

1622: instead of in terms of the false dismissal versus false alarm curves.

1623: For a stochastic background with rms strain amplitude

1624: $h_\text{rms}$, we have $\Omega \propto h^2_\text{rms}$ \cite{Allen

1625: Review}, where $\Omega$ is the gravitational wave energy density.

1626: For our model signal (\ref{signal model}) we have $h_{\rm rms}^2

1627: \propto \xi \alpha^2$, and comparing this with the

1628: formula $\rho \propto \xi \alpha^2$ from Eq.\ (\ref{rho2}) shows that

1629: we can interpret the signal-to-noise

1630: ratio $\rho$ as being proportional to the energy density in the stochastic background,

1631: even for non-Gaussian signals.

1632:

1633:

1634: We compute the minimum detectable energy density or signal-to-noise

1635: ratio $\rho_{\rm detectable}$ as follows.  First, we choose

1636: thresholds $P_{\text{FA}*}$ and $P_{\text{FD}*}$ for the false alarm

1637: and false dismissal probabilities.  We refer to the pair

1638: $(P_{\text{FA}*},P_{\text{FD}*})$ as the \emph{detection point}.

1639: For any statistic $\Gamma$, the choice of detection point determines

1640: the detection threshold $\Gamma_*$, and inverting Eq.~(\ref{dependance3})

1641: gives the minimum detectable signal-to-noise ratio

1642: \begin{equation}

1643: \rho = \rho_\text{detectable}(P_{\text{FA}*},P_{\text{FD}*},\xi,N),

1644: \end{equation}

1645: as illustrated in Fig.~\ref{rho detectable}.

1646: \begin{figure}

1647: \begin{center}

1648: \epsfig{file=critical.eps,width=8.5cm}

1649: \caption{A family of false dismissal versus false alarm curves for fixed $\xi$.

1650: Here the detection point, at $P_{\text{FD}*} = P_{\text{FA}*} = 0.1$, is marked with $*$.}

1651: \label{rho detectable}

1652: \end{center}

1653: \end{figure}

1654: For the cross-correlation statistic $\Lambda_\text{CC}$ the

1655: result is, from Eq.~(\ref{analytic}),

1656: \begin{eqnarray}

1657: \rho_\text{detectable}^\text{CC} &=&

1658: \frac{2\sqrt{2}\gamma \left[1 + \gamma\sqrt{2/N}\right]}{ 1 +

1659:   2\gamma^2\left(1 - \frac{3}{\xi}

1660:   \right)/N } \left[ 1 + O\left( {1 \over \sqrt{N} }\right) \right] \nonumber \\

1661: & & \label{analytic detectable} \\

1662: &=& 2\sqrt{2}\gamma + O\left(\frac{1}{\sqrt{N}}\right) + O\left(

1663:      {\gamma \over \sqrt{N}} \right), \nonumber \\

1664: & & \label{specialized detectable}

1665: \end{eqnarray}

1666: where $\gamma = \erfc^{-1}(2P_{\text{FA}*})$ and we have assumed that

1667: $P_{\text{FA}*}=P_{\text{FD}*}$.

1668: This relation is plotted in Fig.~\ref{min rho^G_detectable}.
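As a numerical illustration, Eqs.~(\ref{analytic detectable}) and (\ref{specialized detectable}) can be evaluated directly.  The following is a minimal sketch, not the code used to produce the figures.  The function names \texttt{erfc\_inv}, \texttt{rho\_detectable\_cc} and \texttt{rho\_detectable\_cc\_limit} are hypothetical, the inverse of $\erfc$ is obtained by simple bisection, and the $O(1/\sqrt{N})$ correction factor in Eq.~(\ref{analytic detectable}) is dropped.

```python
import math


def erfc_inv(y, tol=1e-14):
    """Invert math.erfc on (0, 2) by bisection (erfc is strictly decreasing)."""
    if not 0.0 < y < 2.0:
        raise ValueError("erfc_inv is defined for 0 < y < 2")
    lo, hi = -6.0, 6.0  # erfc(-6) is about 2, erfc(6) is about 2e-17
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.erfc(mid) > y:
            lo = mid  # erfc(mid) too large, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rho_detectable_cc_limit(p_fa):
    """Large-N limit 2*sqrt(2)*gamma, assuming P_FD* = P_FA* = p_fa."""
    gamma = erfc_inv(2.0 * p_fa)
    return 2.0 * math.sqrt(2.0) * gamma


def rho_detectable_cc(p_fa, xi, n):
    """Finite-N expression, with the O(1/sqrt(N)) correction factor dropped."""
    gamma = erfc_inv(2.0 * p_fa)
    numerator = 2.0 * math.sqrt(2.0) * gamma * (1.0 + gamma * math.sqrt(2.0 / n))
    denominator = 1.0 + 2.0 * gamma**2 * (1.0 - 3.0 / xi) / n
    if denominator <= 0.0:
        # For very small xi*N the expression is no longer meaningful.
        raise ValueError("expression breaks down: xi * N is too small")
    return numerator / denominator


if __name__ == "__main__":
    print(rho_detectable_cc_limit(0.1))
    print(rho_detectable_cc(0.1, 0.02, 10**4))
```

For $P_{\text{FA}*} = P_{\text{FD}*} = 0.1$ the large-$N$ value is close to $2.56$, and the finite-$N$ expression at $\xi = 0.02$, $N = 10^4$ is a few percent larger.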

1669:

1670:

1671: \begin{figure}

1672: \begin{center}

1673: \epsfig{file=MinDetect.eps,width=8.5cm}

1674: \caption{The minimum detectable signal-to-noise ratio $\rho_\text{detectable}^\text{CC}$ for the cross-correlation statistic $\Lambda_\text{CC}$

1675: as a function of the false alarm probability threshold $P_{\text{FA}*}$. Note that we assume the false dismissal probability

1676: threshold $P_{\text{FD}*} = P_{\text{FA}*}$.}

1677: \label{min rho^G_detectable}

1678: \end{center}

1679: \end{figure}

1680:

1681: From the results of our simulations, we determine

1682: $\rho_\text{detectable}(P_{\text{FA}*},P_{\text{FD}*},\xi,N)$ by

1683: numerically solving the equation

1684: \begin{equation} \label{root}

1685: P_\text{FD}(P_{\text{FA}*},\xi,\rho,N) - P_{\text{FD}*} = 0

1686: \end{equation}

1687: for $\rho$.

1688: Unfortunately, evaluating the function on the left hand side of

1689: Eq.~(\ref{root})

1690: is computationally expensive.  Each evaluation involves simulating the

1691: false dismissal versus false alarm curve which

1692: is itself a computationally intensive task.  Moreover,

1693: it is only feasible for us to solve Eq.~(\ref{root}) for values of $N$

1694: $\alt 10^4$ while

1695: a realistic detection scenario for ground based detectors would

1696: involve a year's worth of data sampled at $\sim 100$

1697: Hz for which $N\sim 10^9$.  Therefore our conclusions about the

1698: applicability of the method to ground based detectors are based on our

1699: analytic results, as discussed in the Introduction.
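The root-finding step in Eq.~(\ref{root}) can be sketched as follows.  In our analysis the function $P_\text{FD}$ comes from Monte Carlo simulation; in this sketch we assume instead, purely as an inexpensive stand-in, the large-$N$ Gaussian approximation for the cross-correlation statistic derived in Appendix \ref{s:appendixB}, with $\sigma_1 = \sigma_2 = 1$.  The names \texttt{bisect}, \texttt{p\_fd\_cc} and \texttt{rho\_detectable} are hypothetical.

```python
import math


def bisect(f, lo, hi, tol=1e-12):
    """Find a root of f in [lo, hi] by bisection; f must change sign."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0.0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def p_fd_cc(p_fa, xi, rho, n):
    """Stand-in for P_FD: large-N Gaussian approximation, sigma_1 = sigma_2 = 1.

    Works with the rescaled statistic z = sqrt(N) * alpha_bar^2, which has
    mean 0 and unit variance without a signal, and mean rho and variance v
    with a signal.  Valid for 0 < p_fa < 0.5 (positive threshold).
    """
    # Threshold on z that gives the requested false alarm probability.
    z_star = bisect(lambda z: 0.5 * math.erfc(z / math.sqrt(2.0)) - p_fa, 0.0, 40.0)
    v = rho**2 * (3.0 - xi) / (xi * n) + 2.0 * rho / math.sqrt(n) + 1.0
    return 1.0 - 0.5 * math.erfc((z_star - rho) / math.sqrt(2.0 * v))


def rho_detectable(p_fa_star, p_fd_star, xi, n, p_fd=p_fd_cc, rho_max=50.0):
    """Solve P_FD(P_FA*, xi, rho, N) - P_FD* = 0 for rho."""
    return bisect(lambda rho: p_fd(p_fa_star, xi, rho, n) - p_fd_star, 0.0, rho_max)


if __name__ == "__main__":
    print(rho_detectable(0.1, 0.1, 0.02, 10**4))
```

Passing a simulation-based estimate of $P_\text{FD}$ as the \texttt{p\_fd} argument would follow the same pattern, although a noisy Monte Carlo estimate would call for a root finder that tolerates statistical scatter.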

1700:

1701:

1702:

1703: Figure \ref{detection curves} shows the results obtained from

1704: numerically solving Eq.~(\ref{root}) for $\rho_{\rm detectable}$ for

1705: the parameter values $\xi = 0.02$, $P_{\text{FA}*} = P_{\text{FD}*} = 0.1$,

1706: and Fig.~\ref{LargeN} shows the corresponding results for $\xi = 4.3

1707: \times 10^{-3}$.  For the cross-correlation statistic, the results are

1708: in good agreement with the analytic prediction

1709: (\ref{analytic detectable}).

1710: \begin{figure}

1711: \begin{center}

1712: \epsfig{file=detection.eps,width=8.5cm}

1713: \caption{The minimum detectable signal strength

1714:   $\rho_\text{detectable}$ as a function of

1715: the number of data points $N$,

1716: for the false alarm probability threshold $P_{\text{FA}*}=0.1$, false

1717: dismissal probability threshold $P_{\text{FD}*} = 0.1$, and

1718: Gaussianity parameter $\xi = 0.02$.

1719: The circles are the simulation results, and the error bars

1720: are estimated from ten

1721: different runs.  The solid curve is the analytical prediction (\ref{analytic detectable}) for

1722: $\Lambda_\text{CC}$, and the dotted line is the $N \to \infty$ limit

1723:   (\ref{specialized detectable}).

1724: The dashed line is the analytic prediction for $\Lambda_{\rm

1725:   ML}^{\rm NG}$ given by Eqs.\ (\ref{eq:ansA}) and (\ref{eq:ansB}).

1726: }

1727: \label{detection curves}

1728: \end{center}

1729: \end{figure}

1730: \begin{figure}

1731: \begin{center}

1732: \epsfig{file=LargeN.eps,width=8.5cm}

1733: \caption{Same as Fig.\ \protect{\ref{detection curves}} but with $\xi

1734:   = 4.3 \times 10^{-3}$.

1735: }

1736: \label{LargeN}

1737: \end{center}

1738: \end{figure}

1739:

1740:

1741: Figure \ref{omega gain} shows the minimum detectable energy density

1742: as a function of the Gaussianity parameter $\xi$ for $N=10^4$ (corresponding to space based detectors), for the

1743: cross-correlation and maximum likelihood statistics and also for the

1744: burst statistic (\ref{eq:lambdaBdef}).  We again use the values

1745: $P_{\text{FA}*} =  P_{\text{FD}*} = 0.1$.  The figure shows that the

1746: maximum likelihood statistic performs better than the other statistics

1747: by a factor which is roughly 3 for $\xi$ of order 1\%.  For smaller values

of $\xi$, the maximum likelihood statistic performs increasingly better than the
cross-correlation statistic, but is eventually comparable to the burst statistic.

1750: Thus the maximum likelihood statistic gives an improvement in sensitivity to backgrounds

1751: composed of roughly $10$ to $10^{3}$ events per year.

1752:

1753: Figure \ref{fig:theoretical} is a similar plot, without the Monte Carlo simulation results,

1754: for $N = 10^9$ (corresponding to ground based detectors). Here we use $P_{\text{FA}*} =  P_{\text{FD}*} = 0.01$.

1755: The results are similar to those in Fig.~\ref{omega gain}, except that here the gain in sensitivity

1756: occurs in the band $10^{-5} < \xi < 10^{-3}$.  This band corresponds to $10^4$-$10^6$ events per year.
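As a rough check of this correspondence, assume that each of the $N$ samples independently contains a burst with probability $\xi$, as in the signal model (\ref{signal model}), so that the expected number of bursts in the data set is $\xi N$.  For one year of data with $N \sim 10^9$, the edges of the band give
\begin{equation}
\xi N \sim 10^{-5} \times 10^{9} = 10^{4}
\quad \text{and} \quad
\xi N \sim 10^{-3} \times 10^{9} = 10^{6}
\end{equation}
events per year.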

1757:

1758:

1759:

1760:

1761:

1762:

1763: \subsection{Parameter estimation}

1764: \label{ss:Results for parameter estimation}

1765:

1766:

1767: The computation of the maximum likelihood statistic also serves to

1768: measure the parameters of the signal.

1769: The statistic $\Lambda_\text{ML}^\text{NG}$, from Eq.~(\ref{main result2}), can be written as

1770: \begin{equation} \label{main result simple form}

1771: \Lambda_\text{ML}^\text{NG} = \max_{0<\xi\le 1}~ \max_{\alpha^2>0}~

1772: 			\max_{\sigma^2_1 \ge 0}~ \max_{\sigma^2_2 \ge 0}~

1773: 			\lambda(\xi, \alpha^2, \sigma^2_1,\sigma^2_2).

1774: \end{equation}

1775: The point $(\hat \xi,\hat \alpha^2,\hat \sigma_1^2,\hat \sigma^2_2)$ where this maximum is achieved

1776: is the maximum likelihood estimator for $(\xi,\alpha^2,\sigma_1^2,\sigma^2_2)$. In Fig.~\ref{contours}

1777: we show contours of the function $\ln \lambda$ for a strong ($\rho = 20$) signal.

1778: \begin{figure}

1779: \begin{center}

1780: \epsfig{file=contour.eps,width=8.5cm}

1781: \caption{

1782: Representative contours of $\ln \lambda(\xi,\alpha^2,\hat \sigma_1^2,\hat \sigma_2^2)$.

1783: Here $\rho= 20$ and $N = 1.6 \times 10^5$.  The simulated signal is characterized by $\xi = 0.2$ and

1784: $\alpha^2 = 0.25$, marked with an $\times$.  The noise is characterized by $\sigma_1^2 = \sigma_2^2 = 1$.

1785: The maximum, marked with a $+$, is found at $\ln \lambda(0.207, 0.251, 0.993, 0.993) = 229$,

1786: while $\ln \lambda(0.2,0.25,1,1) = 227$.

1787: }

1788: \label{contours}

1789: \end{center}

1790: \end{figure}

1791: This figure shows that both $\xi$ and $\alpha^2$ can be measured with

1792: good accuracy.

1793:

1794: Note that the main benefit of using $\Lambda_{\rm ML}^{\rm NG}$

1795: is that it allows

us to detect signals that are too weak to be seen using $\Lambda_\text{CC}$.  Using
$\Lambda_\text{ML}^\text{NG}$ also allows one to test whether a detected signal is Gaussian,
via the estimate $\hat \xi$ obtained above, but this is not the main benefit of the method,
as there are other, simpler methods to test for non-Gaussianity.

1800:

1801: \section{Conclusions}

1802: \label{s:Conclusions}

1803:

1804: The use of our maximum likelihood statistic in searches for a

1805: non-Gaussian background gives a gain in sensitivity over the

1806: standard cross-correlation statistic.  Figures \ref{omega gain} and \ref{fig:theoretical} show

1807: that the gain factor can be significant for sufficiently non-Gaussian signals.

1808: However, computing the maximum likelihood statistic requires significantly more

1809: computational power than the cross-correlation statistic.

1810:

1811:

1812:

1813: The analysis presented here must be generalized in several ways before

1814: being usable in gravitational wave detectors.  These generalizations,

1815: listed in order of importance, are:

1816:

1817: \begin{itemize}

1818:

1819: \item Our signal model (\ref{signal model}) assumes a Gaussian

1820: distribution of amplitudes of the burst events.  This assumption simplified

1821: our analysis and resulted in a statistic with the useful property of

1822: being nearly equivalent to the cross-correlation statistic in the Gaussian

1823: signal limit.  In practice however, the distribution of the events

1824: should instead be based on the candidate sources.  For

1825: example, a popcorn-like stochastic background produced by a spatially

1826: uniform distribution

1827: of standard-candle sources out to some maximum redshift would have a

1828: signal distribution of the form (\ref{signal model}) with the Gaussian

1829: term replaced by a term proportional to $s^{-4}\theta(s-s_{\min})$,

1830: where $\theta$ is the step function and $s_{\rm min}$ is a cutoff

1831: signal strength.

1832:

1833: \item One should allow the burst durations to be longer than the

1834: detector resolution time.  For this situation one possibility would

1835: be to preprocess the data with a lowpass filter, and then apply

1836: the techniques developed here.  Another possibility would be to try to

1837: combine the analysis of this paper with the excess power detection

1838: method of Ref.\ \cite{excess power}.

1839:

1840: \item Real detector noise always contains non-Gaussian components, so

1841: one needs to generalize the analysis to allow for this.  Such a

1842: generalization for a Gaussian stochastic background can be found in

1843: Refs.\ \cite{robust gaussian,robust gaussian II}.

1844:

1845: \item It would be useful to consider a more general signal model which

1846: consists of a superposition of a Gaussian background and a

1847: non-Gaussian background, since the true gravitational wave background

1848: might consist of such a superposition.

1849:

1850: \item The analysis needs to be generalized to allow for colored

1851: detector noise, and separated, misaligned detectors.  This

1852: generalization should be fairly straightforward.

1853:

1854: \end{itemize}

1855:

1856:

1857:

1858:

1859: \begin{acknowledgments}

1860: We thank Wolfgang Tichy, Tom Loredo, Teviet Creighton, and Bernard Whiting for helpful

1861: discussions, and the web site {\it google.com} for providing useful

1862: references on the generalized central limit theorem.

1863: The analytic computations in Appendix \ref{s:appendixC}

1864: were carried out using the software package {\it Mathematica}.

1865: This work was supported in part by National Science

1866: Foundation awards PHY-9722189 and PHY-0140209, the Alfred P. Sloan

1867: foundation, the Radcliffe Institute for Advanced Study, and the

1868: NASA/New York Space Grant Consortium.

1869: \end{acknowledgments}

1870:

1871: \appendix

1872:

1873: \section{General form of the likelihood ratio}

1874: \label{s:appendixA}

1875:

1876: In this appendix we give two derivations of the general formula

1877: (\ref{general likelihood ratio}) for the likelihood ratio.

1878: The first derivation is based on Eq.~(\ref{def1}) while the second is based on Eq.~(\ref{def2}).

1879: We also derive the formula (\ref{distribution relation}) for the posterior probability density

1880: $p^{(1)}_{\mathcal{V}_s,\mathcal{V}_n|\mathcal{T}}({\bf v}_s,{\bf v}_n|1)$.

1881:

1882: \subsection{First derivation}

1883: \label{ss:First derivation}

1884:

1885: We can derive Eq.~(\ref{general likelihood ratio}) by using the total probability theorem to

1886: expand the distributions in the numerator and denominator of Eq.~(\ref{def1}). Note that all

1887: distributions in this derivation are priors.

1888:

1889: First expand $p_\mathcal{H}(h)$ just in terms of the random variable

1890: $\mathcal{T}$

1891: \begin{equation} \label{expansion goal}

1892: p_\mathcal{H}(h) = P_\mathcal{T}(1)p_\mathcal{H|T}(h|1)

1893:                  + P_\mathcal{T}(0) p_\mathcal{H|T}(h|0).

1894: \end{equation}

1895: Expanding $p_\mathcal{H}(h)$ in terms of all the degrees of freedom yields

1896: \begin{eqnarray}

1897: && p_\mathcal{H}(h) = \sum_{t=0}^1~ \int_{\Theta_s} d^{Q_s}v_s~ \int d^{ND}s~

1898: 		\int_{\Theta_n} d^{Q_n}v_n \label{expand1} \\

1899: &&		~\times~  p_{\mathcal{H|T,V}_s,\mathcal{S,V}_n}(h|t,{\bf v}_s,s,{\bf v}_n)

			  p_{\mathcal{T,V}_s,\mathcal{S,V}_n}(t,{\bf v}_s,s,{\bf v}_n). \nonumber

1901: \end{eqnarray}

1902: The ratio of the coefficients of $P_\mathcal{T}(1)$ and $P_\mathcal{T}(0)$ in Eq.~(\ref{expand1}) will

1903: give the general expression for the likelihood ratio by Eq.~(\ref{def1}).

1904:

1905: The conditional distribution for $\mathcal{H}$ in Eq.~(\ref{expand1}) can be translated into a conditional distribution

1906: for $\mathcal{N}$.  From Eq.~(\ref{detector output matrices}) it

1907: follows that

1908: \begin{equation} \label{trick1}

1909: p_\mathcal{H|S}(h|s)=p_\mathcal{N+S|S}(h|s) = p_\mathcal{N|S}(h-s|s),

1910: \end{equation}

1911: and since $\mathcal{S}$ and $\mathcal{N}$ are statistically independent we obtain

1912: \begin{equation} \label{trick3}

p_\mathcal{H|S}(h|s)=p_\mathcal{N}(h-s).

1914: \end{equation}

1915: Generalizing this argument gives

1916: \begin{equation}  \label{simplify1}

1917: p_{\mathcal{H|T,V}_s,\mathcal{S,V}_n}(h|t,{\bf v}_s,s,{\bf v}_n) =

1918: p_{\mathcal{N|V}_n}(h-s|{\bf v}_n),

1919: \end{equation}

1920: since a priori $\mathcal{T}$, $\mathcal{V}_s$, and $\mathcal{S}$ are statistically independent

1921: of $\mathcal{N}$ and  $\mathcal{V}_n$. For the same reason we can

1922: write the joint distribution that appears in Eq.~(\ref{expand1}) as

1923: \begin{equation} \label{simplify2}

1924: p_{\mathcal{T,V}_s,\mathcal{S,V}_n}(t,{\bf v}_s,s,{\bf v}_n) =

1925: p_{\mathcal{T,V}_s,\mathcal{S}}(t,{\bf v}_s,s)p_{\mathcal{V}_n}({\bf v}_n).

1926: \end{equation}

1927:

1928: Substituting Eqs.~(\ref{simplify1}) and (\ref{simplify2}) into Eq.~(\ref{expand1}) yields

1929: \begin{eqnarray}

1930: p_\mathcal{H}(h) &=& \sum_{t=0}^1~ \int_{\Theta_s} d^{Q_s}v_s~ \int d^{ND}s~

1931: 		\int_{\Theta_n} d^{Q_n}v_n \label{expand2} \\

1932:                 &\times&  p_{\mathcal{N|V}_n}(h-s|{\bf v}_n) p_{\mathcal{T,V}_s,\mathcal{S}}(t,{\bf v}_s,s)

1933: 		p_{\mathcal{V}_n}({\bf v}_n) .\nonumber

1934: \end{eqnarray}

1935: We can also rewrite the distribution

1936: $p_{\mathcal{T,V}_s,\mathcal{S}}(t,{\bf v}_s,s)$ as

1937: \begin{equation} \label{expand3}

1938: p_{\mathcal{T,V}_s,\mathcal{S}}(t,{\bf v}_s,s) = p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,t)

p_{\mathcal{V}_s|\mathcal{T}}({\bf v}_s|t) P_\mathcal{T}(t),

1940: \end{equation}

1941: by Eq.~(\ref{conditional joint}).

1942: Substituting Eq.~(\ref{expand3}) into Eq.~(\ref{expand2}) and explicitly evaluating the sum over $t$ yields

1943: \begin{widetext}

1944: \begin{eqnarray}

1945: p_\mathcal{H}(h) &=& P_\mathcal{T}(1)\int_{\Theta_{s1}}d^{Q_s}v_s~ \int d^{ND}s~ \int_{\Theta_n} d^{Q_n}v_n~

1946: 		 p_{\mathcal{N|V}_n}(h-s|{\bf v}_n)

1947: 		 p_{\mathcal{V}_n}({\bf v}_n)

1948: 		 p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,1)

		 p_{\mathcal{V}_s|\mathcal{T}}({\bf v}_s|1) \nonumber \\
		 &+& P_\mathcal{T}(0)\int_{\Theta_n} d^{Q_n}v_n~
		 p_{\mathcal{N|V}_n}(h|{\bf v}_n) p_{\mathcal{V}_n}({\bf v}_n). \label{expand4}

1952: \end{eqnarray}

1953: \end{widetext}

1954: Here we have used the following relations:

1955: \begin{eqnarray}

p_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s\in \Theta_{s0},0) &=& \delta^{ND}(s) \\

p_{\mathcal{V}_s|\mathcal{T}}({\bf v}_s\in \Theta_{s0}|1) &=& 0  \\
p_{\mathcal{V}_s|\mathcal{T}}({\bf v}_s\in \Theta_{s1}|0) &=& 0  \\

1959: \int_{\Theta_{s0}} d^{Q_s}v_s~ p_{\mathcal{V}_s|\mathcal{T}}({\bf v}_s|0) &=& 1.

1960: \end{eqnarray}

1961: By comparing Eqs.~(\ref{expansion goal}) and (\ref{expand4}) we can read off the distributions

1962: $p_\mathcal{H|T}(h|t)$ and construct Eq.~(\ref{general likelihood

1963: ratio}) from Eq.~(\ref{def1}).  Note that the expression (\ref{def1})

1964: is independent of the space $\Theta_{s0}$ of signal parameters

1965: corresponding to ``no signal present''.

1966:

1967:

1968: \subsection{Second derivation}

1969: \label{Second derivation}

1970:

1971: Here we derive Eq.~(\ref{general likelihood ratio}), and also Eq.~(\ref{distribution relation}), from Eq.~(\ref{def2}).

1972: Consider the distribution

1973: \begin{equation} \label{simple1}

1974: p_{\mathcal{T,V}_s,\mathcal{V}_n|\mathcal{H}}(1,{\bf v}_s,{\bf v}_n|h) =

1975: 	\frac{ p_{\mathcal{T,V}_s,\mathcal{V}_n,\mathcal{H}}(1,{\bf v}_s,{\bf v}_n,h) }{ p_\mathcal{H}(h) }.

1976: \end{equation}

1977: We will justify Eq.~(\ref{general likelihood ratio}) by the defining relation Eq.~(\ref{def2}), which explicitly refers

1978: to priors and posteriors.  Therefore we now append the appropriate superscripts as bookkeeping devices.

1979: Eq.~(\ref{simple1}) then reads

1980: \begin{equation} \label{simple2}

1981: p^{(1)}_{\mathcal{T,V}_s,\mathcal{V}_n}(1,{\bf v}_s,{\bf v}_n) =

1982: \frac{ p^{(0)}_{\mathcal{T,V}_s,\mathcal{V}_n,\mathcal{H}}(1,{\bf v}_s,{\bf v}_n,h) }

1983:      { p^{(0)}_\mathcal{H}(h) }.

1984: \end{equation}

1985:

1986: Using the expansion of $p_\mathcal{H}(h)$ given by Eq.~(\ref{expand4}), and what we will justify is the

1987: likelihood ratio $\Lambda$ given by Eq.~(\ref{general likelihood ratio}), we have

1988: \begin{equation} \label{simple3}

1989: p^{(1)}_{\mathcal{T,V}_s,\mathcal{V}_n}(1,{\bf v}_s,{\bf v}_n) =

1990: \frac{ \left[ \frac{p^{(0)}_{\mathcal{T,V}_s,\mathcal{V}_n,\mathcal{H}}(1,{\bf v}_s,{\bf v}_n,h)}

			{\int_{\Theta_n} d^{Q_n}v_n'~ p^{(0)}_{\mathcal{H|V}_n}(h|{\bf v}_n')
			p^{(0)}_{\mathcal{V}_n}({\bf v}_n')} \right] }

1993: 			{ \Lambda P^{(0)} + 1 - P^{(0)} }.

1994: \end{equation}

1995: Expanding the uppermost numerator in Eq.~(\ref{simple3}) over

1996: $\mathcal{S}$ by the total probability theorem gives

1997: \begin{eqnarray}  \label{simple4}

1998: p^{(0)}_{\mathcal{T,V}_s,\mathcal{V}_n,\mathcal{H}}(1,{\bf v}_s,{\bf v}_n,h) &=& \int d^{ND}s \\

&\times& p^{(0)}_{\mathcal{T,V}_s,\mathcal{V}_n,\mathcal{H,S}}(1,{\bf v}_s,{\bf v}_n,h,s), \nonumber

2000: \end{eqnarray}

2001: and rewriting this gives

2002: \begin{eqnarray} \label{simple expand}

&& p^{(0)}_{\mathcal{T,V}_s,\mathcal{V}_n,\mathcal{H,S}}(1,{\bf v}_s,{\bf v}_n,h,s) =
			p^{(0)}_{\mathcal{N|V}_n}(h-s|{\bf v}_n) \nonumber \\
&& \times		p^{(0)}_{\mathcal{V}_n}({\bf v}_n)
			p^{(0)}_{\mathcal{S|V}_s,\mathcal{T}}(s|{\bf v}_s,1)
			p^{(0)}_{\mathcal{V}_s|\mathcal{T}}({\bf v}_s|1)P^{(0)}.

2008: \end{eqnarray}

2009: After putting Eq.~(\ref{simple expand}) into Eq.~(\ref{simple4}), substitute the result into Eq.~(\ref{simple3}).

2010: Using $\Lambda({\bf v}_s,{\bf v}_n)$ given by Eq.~(\ref{likelihood function}) then yields

2011: \begin{equation} \label{simple5}

2012: P^{(1)}p^{(1)}_{\mathcal{V}_s,\mathcal{V}_n|\mathcal{T}}({\bf v}_s,{\bf v}_n|1) =

2013: \frac{ \Lambda({\bf v}_s,{\bf v}_n)P^{(0)} }{ \Lambda P^{(0)} + 1 - P^{(0)} }.

2014: \end{equation}

2015: On the left hand side of Eq.~(\ref{simple5}) we have used

2016: \begin{equation}

2017: p^{(1)}_{\mathcal{T,V}_s,\mathcal{V}_n}(1,{\bf v}_s,{\bf v}_n) = P^{(1)}

2018: p^{(1)}_{\mathcal{V}_s,\mathcal{V}_n|\mathcal{T}}({\bf v}_s,{\bf v}_n|1).

2019: \end{equation}

2020:

2021: Integrate Eq.~(\ref{simple5}) over $\Theta_n$ and $\Theta_{s1}$ using Eq.~(\ref{likelihood function def})

2022: and the normalization requirement

2023: \begin{equation} \label{simple normalization}

2024: \int_{\Theta_{s1}}d^{Q_s}v_s \int_{\Theta_n}d^{Q_n}v_n~

2025: p^{(1)}_{\mathcal{V}_s,\mathcal{V}_n|\mathcal{T}}({\bf v}_s,{\bf v}_n|1) = 1

2026: \end{equation}

2027: to get

2028: \begin{equation} \label{p1}

2029: P^{(1)} = \frac{\Lambda P^{(0)}}{\Lambda P^{(0)} + 1 - P^{(0)}}.

2030: \end{equation}
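As a consistency check, Eq.~(\ref{p1}) can be rewritten in terms of odds.  It implies $1 - P^{(1)} = (1 - P^{(0)}) / (\Lambda P^{(0)} + 1 - P^{(0)})$, and dividing Eq.~(\ref{p1}) by this expression gives
\begin{equation}
\frac{P^{(1)}}{1 - P^{(1)}} = \Lambda \, \frac{P^{(0)}}{1 - P^{(0)}},
\end{equation}
so that the data multiply the prior odds that a signal is present by the factor $\Lambda$.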

2031: Use Eq.~(\ref{p1}) and Eq.~(\ref{simple5}) to form the ratio on the left hand side of

Eq.~(\ref{distribution relation}).  This justifies Eq.~(\ref{distribution relation}).

2033:

2034: Integrate Eq.~(\ref{distribution relation}) over $\Theta_n$ and $\Theta_{s1}$ using

Eq.~(\ref{likelihood function def}) and Eq.~(\ref{simple normalization}) to see that the

2036: defining relation Eq.~(\ref{def2}) is satisfied and thus Eq.~(\ref{general likelihood ratio})

2037: is justified.

2038:

2039: \medskip

2040:

2041:

2042:

2043: \section{Analytical expressions for false dismissal versus false alarm

2044: curves for cross-correlation statistic}

2045: \label{s:appendixB}

2046: This appendix derives the analytical form (\ref{analytic})

2047: of the false dismissal versus false alarm curves for the

2048: cross-correlation statistic $\Lambda_\text{CC}$ in the large

2049: $N$ limit, for both Gaussian and non-Gaussian signals.

2050: A derivation for Gaussian signals can be found in Sec.~IV of

2051: Ref.~\cite{Allen Romano}.

2052:

2053: As noted in Sec.~\ref{ss:Gaussian signal}, the statistics

2054: $\Lambda_\text{CC}$ and $\hat\alpha^2$

2055: are equivalent in the large $N$ limit. Thus, in this limit, the false

2056: dismissal versus false alarm curves

2057: can be found by evaluating Eqs.~(\ref{simple Pfa}) and (\ref{simple

2058: Pfd}) with $\Gamma$ replaced by $\hat\alpha^2$.

2059: The relation (\ref{gaussian estimator1}) between the statistics ${\bar

2060: \alpha}^2$ and ${\hat \alpha}^2$ implies the following relation

2061: between their probability distributions

2062: $p_{\hat\alpha^2|\mathcal{T}}(x|t)$ and $p_{\bar\alpha^2|\mathcal{T}}(x|t)$:

2063: \begin{equation}

2064: p_{\hat\alpha^2|\mathcal{T}}(x|t)=

2065: \theta(x) p_{\bar\alpha^2|\mathcal{T}}(x|t)

2066: +\delta(x) \, \int_{-\infty}^0

2067: dy \, p_{\bar\alpha^2|\mathcal{T}}(y|t).

2068: \end{equation}

2069: Inserting this formula into Eqs.\ (\ref{simple Pfa}) and (\ref{simple

2070: Pfd}) gives

2071: \begin{eqnarray}

2072: P_\text{FA}(\hat\alpha^2_*) &=&\left\{

2073: \begin{array}{ll}

2074:  \displaystyle \int_{\hat\alpha^2_*}^\infty dx~p_{\bar\alpha^2|\mathcal{T}}(x|0)

2075: 	& \text{ if } \hat\alpha^2_* > 0 \\

2076:  \displaystyle 1

2077: 	& \text{ if } \hat\alpha^2_* \le 0 \\

2078: \end{array} \right. ,\label{Pfa} \\

2079: P_\text{FD}(\hat\alpha^2_*) &=&\left\{

2080: \begin{array}{ll}

2081:  \displaystyle 1 - \int_{\hat\alpha^2_*}^\infty dx~p_{\bar\alpha^2|\mathcal{T}}(x|1)

2082: 	& \text{ if } \hat\alpha^2_* > 0 \\

2083:  \displaystyle 0

2084: 	& \text{ if } \hat\alpha^2_*\le 0 \\

2085: \end{array} \right. \nonumber . \\ && \label{Pfd}

2086: \end{eqnarray}

2087:

2088: In the large $N$ limit, the distribution

$p_{\bar\alpha^2|\mathcal{T}}(x|t)$ must be Gaussian by the central

2090: limit theorem, and

2091: therefore this distribution is characterized entirely by its mean $\left<

2092: \bar\alpha^2_t \right>$ and variance

2093: $[\Delta(\bar\alpha^2_t)]^2$.  From Eqs.\ (\ref{detector output

2094: matrices}), (\ref{assumption4}), (\ref{baralphadef}), (\ref{gaussian

2095: estimator1}) and (\ref{signal model}), these are given by

2096: \begin{eqnarray}

2097: \left< \bar\alpha^2_0 \right> &=& 0 \label{mean0}\\

2098: \Delta(\bar\alpha^2_0)  &=& \frac{\sigma_1\sigma_2}{\sqrt{N}}  \label{var0} \\

2099: \left< \bar\alpha^2_1 \right> &=& \xi\alpha^2 \label{mean1} \\

2100: \Delta(\bar\alpha^2_1) &=& \sqrt{ \frac{ \xi\alpha^4(3-\xi) + \xi\alpha^2(\sigma_1^2+\sigma_2^2)

2101: 	+ \sigma_1^2 \sigma_2^2} {N} }.

2102: \nonumber \\ && \label{var1}

2103: \end{eqnarray}
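Equations (\ref{mean1}) and (\ref{var1}) can be checked with a small Monte Carlo experiment.  The sketch below assumes that ${\bar \alpha}^2$ is the sample mean of the products $h_1^k h_2^k$, and that each sample contains, with probability $\xi$, a zero-mean Gaussian burst of variance $\alpha^2$ common to both detectors; these definitions are given in the body of the paper and are restated here as assumptions.  The function names are hypothetical.  The quantities compared are per-sample moments, so that $[\Delta({\bar \alpha}^2_1)]^2$ is the per-sample variance divided by $N$.

```python
import math
import random


def predicted_moments(xi, alpha2, sigma1=1.0, sigma2=1.0):
    """Per-sample mean and variance of h1*h2 when a signal is present."""
    mean = xi * alpha2
    var = (xi * alpha2**2 * (3.0 - xi)
           + xi * alpha2 * (sigma1**2 + sigma2**2)
           + sigma1**2 * sigma2**2)
    return mean, var


def simulated_moments(xi, alpha2, sigma1=1.0, sigma2=1.0, n=200_000, seed=0):
    """Sample mean and variance of h1*h2 from n simulated samples."""
    rng = random.Random(seed)
    alpha = math.sqrt(alpha2)
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        # Assumed signal model: a burst is present with probability xi.
        s = rng.gauss(0.0, alpha) if rng.random() < xi else 0.0
        prod = (rng.gauss(0.0, sigma1) + s) * (rng.gauss(0.0, sigma2) + s)
        total += prod
        total_sq += prod * prod
    mean = total / n
    return mean, total_sq / n - mean * mean


if __name__ == "__main__":
    print(predicted_moments(0.2, 0.25))
    print(simulated_moments(0.2, 0.25))
```

The simulated and predicted values should agree to within the sampling error, which for the default $2 \times 10^5$ samples is at the percent level for the variance.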

2104: Substituting Gaussian distributions, with means and variances determined by

2105: Eqs.~(\ref{mean0})-(\ref{var1}), into Eqs.~(\ref{Pfa}) and (\ref{Pfd}) yields

2106: \begin{widetext}

2107: \begin{eqnarray}

2108: P_\text{FA}(\hat\alpha^2_*,\sigma_1,\sigma_2,N) &=&

2109: \left\{\begin{array}{ll}

2110:  \displaystyle \frac{1}{2} \erfc \left( \frac{\hat\alpha^2_*}{\sigma_1\sigma_2}\sqrt{\frac{N}{2}} \right)

2111: 	& \text{ if } \hat\alpha^2_* > 0 \\

2112:  \displaystyle 1

2113: 	& \text{ if } \hat\alpha^2_* \le 0 \\

2114: \end{array} , \right. \label{analytic1} \\

2115: P_\text{FD}(\hat\alpha^2_*,\xi,\alpha,\sigma_1,\sigma_2,N) &=&

2116: \left\{\begin{array}{ll}

2117:  \displaystyle 1 - \frac{1}{2} \erfc \left[ \left( \hat\alpha^2_*-\xi\alpha^2 \right)

2118:               \sqrt{ \frac{N}{2 \left[ \xi\alpha^4(3-\xi) + \xi\alpha^2(\sigma_1^2+\sigma_2^2)

2119:               + \sigma_1^2 \sigma_2^2 \right]} }\right]

2120: 	& \text{ if } \hat\alpha^2_* > 0 \\

2121:  \displaystyle 0

2122: 	& \text{ if } \hat\alpha^2_* \le 0 \\

2123: \end{array} \right. .\nonumber \\ &&

2124: \label{analytic2}

2125: \end{eqnarray}

2126: \end{widetext}

2127: If we now eliminate $\hat\alpha^2_*$ between Eqs.~(\ref{analytic1})

2128: and (\ref{analytic2}), change variables from $\alpha$ to $\rho$

2129: using Eq.~(\ref{rho2}), and set $\sigma_1 = \sigma_2$,

2130: we obtain Eq.~(\ref{analytic}).

2131:

2132: \section{Asymptotic behavior of maximum likelihood statistic}

2133: \label{s:appendixC}

2134:

2135:

2136: In this appendix we derive the large-$N$ behavior of

2137: the maximum likelihood statistic $\Lambda^{\rm NG}_{\rm ML}$.

2138: From Eq. (\ref{main result2}), we can write the statistic in the form

2139: \begin{equation}

2140: \Lambda_{\rm ML}^{\rm NG}(h) = \exp \left[ N {\cal L}(h) \right]

2141: \label{lambdadef}

2142: \end{equation}

2143: with

2144: \begin{equation}

2145: {\cal L}(h) = \max_{\sigma_1,\sigma_2,\xi,\alpha} \,

2146: g(\sigma_1,\sigma_2,\xi,\alpha,h)

2147: \label{lambdadef1a}

2148: \end{equation}

2149: where

2150: \begin{equation}

2151: g = \frac{1}{N}

2152: \sum_{k=1}^N \, g_k(\sigma_1,\sigma_2,\xi,\alpha),

2153: \label{lambdadef1}

2154: \end{equation}

2155: and the function $g_k = g_k(\sigma_1,\sigma_2,\xi,\alpha)$ is

2156: given by

2157: \begin{equation}

2158: e^{g_k} = \xi A_k(\alpha) + (1 - \xi) A_k(0)

2159: \end{equation}

2160: with

2161: \begin{eqnarray}

2162: A_k(\alpha) &=&

2163:  { \exp \left[ \frac{\left( \frac{h_1^k}{\sigma^2_1} + \frac{h_2^k}{\sigma^2_2}\right)^2}

2164:           {2\left( \frac{1}{\sigma^2_1} + \frac{1}{\sigma^2_2} + \frac{1}{\alpha^2} \right)}

2165:           - \frac{\left( h_1^k\right)^2}{2\sigma^2_1} - \frac{\left(

2166: 	h_2^k\right)^2}{2\sigma^2_2} +1 \right]} \nonumber \\

2167:  && \times { {\bar \sigma}_1 {\bar \sigma}_2  \over

2168: \sqrt{\sigma^2_1 \sigma^2_2 + \sigma^2_1 \alpha^2 + \sigma^2_2 \alpha^2}}.

2169: \label{f def}

2170: \end{eqnarray}
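For concreteness, the function $g$ of Eq.~(\ref{lambdadef1}) can be evaluated for a given data set as sketched below.  The quantities ${\bar \sigma}_1$ and ${\bar \sigma}_2$ are defined in the body of the paper; the sketch assumes that they are the rms values of the two detector outputs, ${\bar \sigma}_i^2 = N^{-1} \sum_k (h_i^k)^2$, which is the choice that makes $g$ vanish for $\xi = 0$, $\sigma_i = {\bar \sigma}_i$.  The function names \texttt{rms} and \texttt{mean\_log\_likelihood} are hypothetical, and no attempt is made to guard against overflow for extreme data values.

```python
import math


def rms(h):
    """Root-mean-square of a sequence (assumed definition of sigma-bar)."""
    return math.sqrt(sum(x * x for x in h) / len(h))


def mean_log_likelihood(h1, h2, sigma1, sigma2, xi, alpha):
    """Evaluate g = (1/N) sum_k ln[xi*A_k(alpha) + (1 - xi)*A_k(0)]."""
    n = len(h1)
    sbar1, sbar2 = rms(h1), rms(h2)
    s1, s2 = sigma1**2, sigma2**2

    def a_k(x1, x2, a):
        quad = -x1**2 / (2.0 * s1) - x2**2 / (2.0 * s2) + 1.0
        if a == 0.0:
            burst = 0.0  # the 1/alpha^2 term diverges, so this term vanishes
        else:
            burst = (x1 / s1 + x2 / s2) ** 2 / (
                2.0 * (1.0 / s1 + 1.0 / s2 + 1.0 / a**2))
        norm = sbar1 * sbar2 / math.sqrt(s1 * s2 + s1 * a**2 + s2 * a**2)
        return math.exp(burst + quad) * norm

    total = 0.0
    for x1, x2 in zip(h1, h2):
        total += math.log(xi * a_k(x1, x2, alpha) + (1.0 - xi) * a_k(x1, x2, 0.0))
    return total / n


if __name__ == "__main__":
    h1 = [0.3, -1.2, 0.7, 1.5, -0.4]
    h2 = [0.1, -0.9, 1.1, 1.3, 0.2]
    print(mean_log_likelihood(h1, h2, rms(h1), rms(h2), 0.2, 0.5))
```

Maximizing this function over $(\sigma_1,\sigma_2,\xi,\alpha)$, for example with a general-purpose numerical optimizer, would then give ${\cal L}(h)$ of Eq.~(\ref{lambdadef1a}).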

2171: We denote by ${\tilde \sigma}_1$, ${\tilde \sigma}_2$, ${\tilde \xi}$ and

2172: ${\tilde \alpha}$ the ``true'' parameters governing the distribution of

2173: the quantities $h_1^k$ and $h_2^k$ according to

2174: Eqs. (\ref{detector output matrices}), (\ref{assumption2}),

2175: (\ref{assumption4}), and (\ref{signal model}), with untilded

2176: quantities replaced by the corresponding tilded quantities.

2177: [These ``true parameters'' were denoted by $\sigma_1$, $\sigma_2$,

2178: $\xi$ and $\alpha$ in the body of the paper.]

2179: We define ${\tilde \rho}$ to

2180: be the signal-to-noise ratio (\ref{rho2}) with untilded

2181: quantities replaced by tilded quantities:

2182: \begin{equation}

2183: {\tilde \rho} \equiv \frac{ {\tilde \xi} {\tilde \alpha}^2 \sqrt{N} }{ {\tilde

2184: \sigma}_1 {\tilde \sigma}_2}.

2185: \label{rho2bar}

2186: \end{equation}

2187: For simplicity, in this appendix we restrict attention to the case

2188: ${\tilde \sigma}_1 = {\tilde \sigma}_2$.  Then, without loss of

2189: generality, we can take ${\tilde \sigma}_1 = {\tilde \sigma}_2 =1$ by

2190: rescaling our units of strain amplitude.

2191:

2192:

2193:

2194: We discuss separately the computation of the false alarm and false

2195: dismissal probabilities, as different techniques are required to

2196: compute each.

2197:

2198: \subsection{False dismissal probability}

2199:

2200:

2201:

2202: The false dismissal probability for the statistic (\ref{lambdadef1a})

2203: will be some function

2204: \begin{equation}

2205: P_{\rm FD} = P_{\rm FD}({\cal L}_*, N, {\tilde \xi}, {\tilde \rho})

2206: \end{equation}

2207: of the threshold ${\cal L}_*$ on ${\cal L}$, the number of data points

2208: $N$, the Gaussianity parameter ${\tilde \xi}$ and signal-to-noise

2209: ratio ${\tilde \rho}$ of the signal.  For applications to ground based

2210: detectors, we will have ${\tilde \rho} \sim $ (a few), in order that

2211: the signal be detectable, $N \sim 10^9$, and $10^{-3} \alt {\tilde

2212: \xi}\le 1$.  Therefore it would be useful to find approximate analytic

expressions for the false dismissal probability in the limit of large

2214: $N$.  There are actually several different, large $N$ regimes in the

2215: three dimensional parameter space $(N, {\tilde \xi}, {\tilde \rho})$

2216: that one might explore:

2217: \begin{itemize}

2218: \item The limit $N \to \infty$ with ${\tilde \alpha}$ and ${\tilde

2219: \xi}$ held fixed.  This corresponds to fixing the stochastic

2220: background signal and going to a limit of long observation times.  In

2221: this limit we have ${\tilde \rho} \propto \sqrt{N}$ which diverges.

2222: This is not a very realistic limit to explore.

2223:

2224: \item The limit $N \to \infty$ with ${\tilde \rho}$ and ${\tilde \xi}$

2225: held fixed.  In this limit, the signal-to-noise ratio is held fixed,

2226: and correspondingly the amplitude ${\tilde \alpha}$ of the stochastic

2227: background signal goes to zero, from Eq.\ (\ref{rho2bar}).  This would

2228: be the most natural

2229: limit to explore.  However, in this limit the statistical error

2230: $\Delta {\tilde \xi}$ in our measurement of the Gaussianity parameter

2231: would diverge, from Eq.\ (\ref{Deltaxi}), and therefore in this limit

2232: we do not expect to be able to compute analytically the value of the

2233: parameter $\xi$ which achieves the maximum in Eq.\ (\ref{lambdadef1a}).

2234: The analytic approximation methods which we discuss below do not work in this

2235: regime.  [In addition our Monte Carlo simulations show that the maximum

2236: likelihood statistic itself does not perform any better than the

2237: cross-correlation statistic in this regime, as discussed in the

2238: Introduction.]

2239:

2240: \item The limit we actually explore is the limit $N \to \infty$ with

2241: ${\tilde \xi}$ fixed and ${\tilde \rho}$ scaling $\propto N^{1/4}$,

2242: corresponding to ${\tilde \alpha} \propto N^{-1/8}$.  The reason for

2243: our choosing to explore this particular limit is simply that it is

2244: amenable to analytic computations.  Fractional corrections to our

2245: analytic results should scale like $1/N$ or as $1 / {\tilde \rho}^4$.

2246: Since ${\tilde \rho} \sim $ (a few) at the threshold for detection, the

2247: approximation should be good to $10\% - 20\%$ or so.

2248:

2249: \end{itemize}

2250:

2251:

2252:

2253: We now turn to a discussion of the computational technique.

2254: We write

2255: \begin{equation}

2256: {\tilde \alpha} = {\tilde \alpha}_0 N^{-1/8},

2257: \label{alpha0def}

2258: \end{equation}

2259: where ${\tilde \alpha}_0$ is independent of $N$.

2260: Correspondingly, from Eq.\ (\ref{signal model}) we can write

2261: \begin{equation}

2262: s^k = N^{-1/8} {\hat s}^k,

2263: \end{equation}

2264: where the distribution of ${\hat s}^k$ is given by Eq.\ (\ref{signal

2265: model}) with $\xi$ replaced by ${\tilde \xi}$ and $\alpha$ replaced

2266: by ${\tilde \alpha}_0$.  In particular, the distribution of ${\hat

2267: s}^k$ is independent of $N$.

2268: In computing the maximum over $(\xi,\alpha,\sigma_1, \sigma_2)$ in

Eq.\ (\ref{lambdadef1a}), it is useful to

2270: change variables from $\alpha$ to $\kappa$ defined by

2271: \begin{equation}

2272: \kappa = \rho N^{-1/4} = {\xi \alpha^2 N^{1/4}  \over \sigma_1

2273:   \sigma_2},

2274: \label{kappadef}

2275: \end{equation}

2276: which we expect to be independent of $N$ to leading order in the large

2277: $N$ limit.  The value of the variable $\kappa$ that characterizes the

2278: signal is

2279: \begin{equation}

2280: {\tilde \kappa} = {\tilde \rho} N^{-1/4} = {{\tilde \xi} {\tilde

2281:     \alpha}_0^2  \over {\tilde

2282:     \sigma}_1 {\tilde \sigma}_2};

2283: \end{equation}

2284: cf.\ Eqs.\ (\ref{alpha0def}) and (\ref{kappadef}).
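
As a quick consistency check (a worked substitution only, not an additional result), inserting Eq.\ (\ref{alpha0def}) into the definition (\ref{kappadef}), evaluated at the true signal parameters, shows explicitly how the powers of $N$ cancel:
\begin{eqnarray*}
{\tilde \kappa} &=& {{\tilde \xi} {\tilde \alpha}^2 N^{1/4} \over
  {\tilde \sigma}_1 {\tilde \sigma}_2}
= {{\tilde \xi} \left( {\tilde \alpha}_0 N^{-1/8} \right)^2 N^{1/4} \over
  {\tilde \sigma}_1 {\tilde \sigma}_2} \\
&=& {{\tilde \xi} {\tilde \alpha}_0^2 \over
  {\tilde \sigma}_1 {\tilde \sigma}_2}.
\end{eqnarray*}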

2285:

2286:

2287: We now consider fixed realizations of the infinite sequences of random

2288: variables $n_1^k$, $n_2^k$ and ${\hat s}^k$, for

2289: $1 \le k < \infty$, and examine the limiting

2290: behavior of ${\cal L}(h)$ as $N \to \infty$.

2291: We compute this limiting behavior by substituting into

2292: the right-hand side of Eq.\ (\ref{lambdadef})

2293: the relations

2294: \begin{equation}

2295: h_1^k = n_1^k + N^{-1/8} {\hat s}^k, \qquad

2296: h_2^k = n_2^k + N^{-1/8} {\hat s}^k,

2297: \end{equation}

2298: writing $\alpha$ in terms of $\kappa$ using Eq.\ (\ref{kappadef}), and

2299: expanding in powers of $N^{-1/8}$.

2300: The result is an expression which can be written in terms of

2301: the sums $Q_{abc}$ defined by

2302: \begin{equation}

2303: \label{sums0}

2304: Q_{abc} = \frac{1}{N}\sum_{k=1}^N \left( \hat s^k \right)^a

2305: 				  \left( n_1^k    \right)^b

2306: 				  \left( n_2^k    \right)^c,

2307: \end{equation}

2308: where $a$, $b$, and $c$ are non-negative integers.

2309: From the central limit theorem we can write

2310: \begin{equation}\label{sums}

2311: Q_{abc} = \mu_{abc} + \frac{1}{\sqrt{N}} \Delta_{abc},

2312: \end{equation}

2313: where $\mu_{abc}=\left< Q_{abc} \right>$ are computable functions of

2314: ${\tilde \xi}$ and ${\tilde \alpha}_0$, and where the random variables

2315: $(\Delta_{100},\Delta_{010},\ldots)$

2316: converge in distribution

2317: % FOOTNOTE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

2318: \footnote{See chapter 8 of Ref.\ \cite{Papoulis} for definitions of different

2319: notions of convergence for sequences of random variables. }

2320: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

2321: as $N \rightarrow \infty$ to a multivariate Gaussian of zero mean whose variance-covariance

2322: matrix is independent of $N$.  Thus, in particular, the joint distribution of all

2323: $\Delta_{abc}$'s is $N$-independent in the limit $N \rightarrow \infty$.

2324:

2325:

2326:

2327:

2328:

2329:

2330: We define the vector

2331: \begin{equation}

2332: {\bf v} = (v^1,v^2,v^3,v^4) = (\xi,\kappa,\sigma_1^2, \sigma_2^2).

2333: \end{equation}

2334: We denote the value of ${\bf v}$ that achieves the maximum in Eq.\

2335: (\ref{lambdadef1a}) by $\hat{{\bf v}}$:

2336: \begin{equation}

2337: g( \hat{{\bf v}}) = \max_{{\bf v}}~g({\bf v}),

2338: \end{equation}

2339: where $\hat{{\bf v}} = (\hat \xi, \hat \kappa,\widehat{\sigma_1^2},\widehat{\sigma_2^2})$.

2340: These estimators satisfy a system of four equations\footnote{Here we assume that the maximum is achieved at a local maximum in the interior of the four-dimensional parameter space.  Cases in which the maximum is achieved on the boundary are discussed below.}

2341: \begin{equation} \label{hard eq system}

2342: \left. \frac{\partial g }{\partial v^l} \right|_{{\bf v} = \hat{{\bf v}}} = 0.

2343: \end{equation}

2344: We solve Eq.~(\ref{hard eq system}) perturbatively.

2345: First assume that the estimators can be expanded in the form

2346: \begin{equation} \label{assume expand}

2347: \widehat{v^l} = \sum_{j=0}^{\infty} \widehat{v^{l}}^{[j]} \epsilon^j,

2348: \end{equation}

2349: where for ease of notation we have defined

2350: $\epsilon = N^{-1/8}$.

2351: We define the expansion coefficients $v^{l[j]}$ analogously by an

2352: expansion of the form (\ref{assume expand})

2353: but without the hats.  Now using Eq.\ (\ref{sums}) the function $g$

2354: can be expanded as a power

2355: series in $\epsilon$ whose coefficients are functions of

2356: $v^{l[k]}$, $\mu_{abc}$, and $\Delta_{abc}$:

2357: \begin{equation}

2358: g({\bf v}) = \sum_{j=0}^{\infty}

2359: g^{[j]}\left[ v^{l[k]}, \mu_{abc},\Delta_{abc} \right] \epsilon^j.

2360: \label{gexpand}

2361: \end{equation}

2362: Substituting the expansions (\ref{assume expand}) and (\ref{gexpand}) into

2363: the condition

2364: (\ref{hard eq system}) for a local extremum

2365: gives an infinite set of equations which must collectively be satisfied by

2366: the coefficients $\widehat{v^{l}}^{[j]}$

2367: \begin{equation} \label{easy set}

2368: \left. \frac{\partial g^{[j]}}{\partial v^{l[k]}} \right|_{v^{m[n]} = \widehat{v^m}^{[n]}} = 0.

2369: \end{equation}

2370: We solve these equations order by order to determine the coefficients

2371: $\widehat{{v^l}}^{[j]}$, and thereby justify a posteriori the ansatz

2372: (\ref{assume expand}).

2373:

2374:

2375:

2376:

2377: We find that in order to compute the leading order expression for

2378: ${\cal L}$, we must obtain the expansion for ${\hat \xi}$ to zeroth

2379: order in $\epsilon$, the expansion for ${\hat \kappa}$ to fourth order

2380: in $\epsilon$, and the expansions of ${\hat {\sigma_1^2}}$ and ${\hat

2381: {\sigma_2^2}}$ to sixth order in $\epsilon$.

2382: The leading order results are

2383: \begin{eqnarray}

2384: \label{kappaans}

2385: {\hat \kappa} &=& {\tilde \kappa} + \epsilon^2 X + O(\epsilon^3), \\

2386: \label{xians}

2387: {1 \over {\hat \xi}} &=& {1 \over {\tilde \xi}} + {Y \over \sqrt{6}

2388:   {\tilde \kappa}^2} + O(\epsilon), \\

2389: \widehat{\sigma_1^2} &=& 1 + O(\epsilon^2), \\

2390: \widehat{\sigma_2^2} &=& 1 + O(\epsilon^2),

2391: \label{sigmaans}

2392: \end{eqnarray}

2393: where

2394: \begin{eqnarray}

2395: \label{eq:Xdef}

2396: X = \Delta_{011}

2397: \end{eqnarray}

2398: and

2399: \begin{eqnarray}

2400: \label{eq:Ydef}

2401:  Y &=& {1 \over 8 \sqrt{6}} \bigg[ 4 (\Delta_{031} + \Delta_{013}) - 12

2402: (\Delta_{002} + \Delta_{020}) \nonumber \\

2403: && - 24 \Delta_{011} + \Delta_{040} + \Delta_{004} + 6 \Delta_{022} \bigg].

2404: \end{eqnarray}

2405: Using Eqs.\ (\ref{assumption2}), (\ref{signal model}), (\ref{sums0}) and

2406: (\ref{sums}) one can show that the random variables $X$ and $Y$ are

2407: independent Gaussian random variables of zero mean and unit variance.
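
The statement about $X$ and $Y$ can be spot-checked numerically.  The following is a minimal Monte Carlo sketch, not part of the original analysis.  It assumes unit-variance Gaussian noise (${\tilde \sigma}_1 = {\tilde \sigma}_2 = 1$) and uses only the noise-only sums $Q_{0bc}$, since $X$ and $Y$ do not involve ${\hat s}^k$.  The function names, sample sizes and seed are illustrative choices.

```python
import math
import random

# (b, c) index pairs of the noise-only sums Q_{0bc} that enter X and Y
PAIRS = [(1, 1), (2, 0), (0, 2), (3, 1), (1, 3), (4, 0), (0, 4), (2, 2)]


def gaussian_moment(p):
    """E[n^p] for a zero-mean, unit-variance Gaussian: 0 for odd p, (p-1)!! for even p."""
    if p % 2:
        return 0.0
    value = 1.0
    for j in range(p - 1, 0, -2):
        value *= j
    return value


def noise_deltas(n_samples, rng):
    """Return Delta_{0bc} = sqrt(N) (Q_{0bc} - mu_{0bc}) for one noise realization."""
    sums = {bc: 0.0 for bc in PAIRS}
    for _ in range(n_samples):
        n1 = rng.gauss(0.0, 1.0)
        n2 = rng.gauss(0.0, 1.0)
        for b, c in PAIRS:
            sums[(b, c)] += n1 ** b * n2 ** c
    root_n = math.sqrt(n_samples)
    return {
        (b, c): root_n * (total / n_samples - gaussian_moment(b) * gaussian_moment(c))
        for (b, c), total in sums.items()
    }


def x_and_y(d):
    """Combine the Delta_{0bc} into X and Y as in Eqs. (eq:Xdef) and (eq:Ydef)."""
    x = d[(1, 1)]
    y = (4 * (d[(3, 1)] + d[(1, 3)]) - 12 * (d[(0, 2)] + d[(2, 0)])
         - 24 * d[(1, 1)] + d[(4, 0)] + d[(0, 4)] + 6 * d[(2, 2)]) / (8 * math.sqrt(6))
    return x, y


def sample_stats(trials=400, n_samples=500, seed=0):
    """Sample means, variances and correlation coefficient of (X, Y) over many realizations."""
    rng = random.Random(seed)
    pairs = [x_and_y(noise_deltas(n_samples, rng)) for _ in range(trials)]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx = sum(xs) / trials
    my = sum(ys) / trials
    vx = sum((x - mx) ** 2 for x in xs) / trials
    vy = sum((y - my) ** 2 for y in ys) / trials
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / trials
    return mx, my, vx, vy, cov / math.sqrt(vx * vy)
```

Calling \texttt{sample\_stats()} should return means and a correlation coefficient close to zero and variances close to one, up to sampling noise.  The bracket in Eq.\ (\ref{eq:Ydef}) equals the fluctuation of $u^4 - 12 u^2$ with $u = n_1 + n_2$, whose variance is $384 = (8 \sqrt{6})^2$, which is why the prefactor gives unit variance.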

2408:

2409:

2410: In deriving Eqs.\ (\ref{kappaans}) -- (\ref{sigmaans}) we assumed that

2411: the value of ${\bf v}$ which achieves the maximum in Eq.\

2412: (\ref{lambdadef1a}) corresponds to a local maximum.  However, if the

2413: right hand side of Eq.\ (\ref{kappaans}) is negative, the maximum will

2414: instead be achieved on the boundary of the parameter space at ${\hat

2415: \kappa} =0$, since the variable $\kappa$ must be non-negative.

2416: Similarly, if the right hand side of Eq.\ (\ref{xians}) is less than

2417: 1, the maximum will be achieved at ${\hat \xi} =1$, since $1/\xi$ must

2418: lie in the interval $[1,\infty)$.
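
In formulas, the constrained estimators described in this paragraph can be summarized, to the orders displayed in Eqs.\ (\ref{kappaans}) and (\ref{xians}), as
\begin{eqnarray*}
{\hat \kappa} &=& \max \left[ 0, \ {\tilde \kappa} + \epsilon^2 X \right], \\
{1 \over {\hat \xi}} &=& \max \left[ 1, \ {1 \over {\tilde \xi}} + {Y
  \over \sqrt{6} {\tilde \kappa}^2} \right].
\end{eqnarray*}
This is only a restatement of the two boundary cases.  Writing $1/{\tilde \xi} = 1 + q$ as in Eq.\ (\ref{eq:qdef}), the interior solution for ${\hat \xi}$ applies when $Y + \sqrt{6} q {\tilde \kappa}^2 \ge 0$, and the interior solution for ${\hat \kappa}$ applies when ${\tilde \kappa} + \epsilon^2 X \ge 0$.  These are the two combinations that appear as arguments of the step functions in Eq.\ (\ref{eq:ML1}).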

2419:

2420: Substituting the results (\ref{kappaans}) -- (\ref{sigmaans}) [together

2421: with the higher order corrections to those results which we have not

2422: shown] into the expansion for the statistic ${\cal L}$, and taking

2423: into account the various special cases discussed in the last

2424: paragraph, gives

2425: \begin{eqnarray}

2426: {\cal L} &=&  \bigg[

2427: {1 \over 2} \left(Y + \sqrt{6} q {\tilde \kappa}^2 \right)^2 \epsilon^8

2428: \, \theta

2429: \left(Y + \sqrt{6} q {\tilde \kappa}^2 \right) \nonumber \\

2430: && + {1 \over 2} ({\tilde \kappa} + \epsilon^2 X)^2 \epsilon^4

2431: - {\tilde \kappa}^3 \epsilon^6 + {7 \over 4} {\tilde \kappa}^4

2432: \epsilon^8 \nonumber \\

2433:  &&

2434: + {\tilde \kappa} U \epsilon^7 + {\tilde \kappa} V \epsilon^8

2435: \bigg] \theta({\tilde \kappa} + \epsilon^2 X) + O(\epsilon^9).

2436: \label{eq:ML1}

2437: \end{eqnarray}

2438: Here $\theta(x)$ is the step function and

2439: \begin{eqnarray}

2440: \label{eq:qdef}

2441: q &=& {1 \over {\tilde \xi}}-1, \\

2442: U &=& \Delta_{101} + \Delta_{110}, \\

2443: V &=& \Delta_{200} - {1 \over 2} {\tilde \kappa} (\Delta_{002} +

2444: \Delta_{020}) - 2 {\tilde \kappa} \Delta_{011}.

2445: \end{eqnarray}

2446: We note that the corresponding expression for the statistic

2447: $(\ln \Lambda_{\rm ML}^{\rm G})/N$ [which is equivalent to the

2448: cross-correlation statistic by Eq.\ (\ref{Gaussian statistic})] is

2449: given by Eq.\ (\ref{eq:ML1}) with the first term in the square

2450: brackets dropped.

2451:

2452:

2453: Next we drop all the

2454: terms in the square bracket in Eq.\ (\ref{eq:ML1}) other than the

2455: first two terms.  The

2456: reason is that the dropped terms give corrections that are smaller than

2457: the terms retained (both in expected value and in fluctuations) by a

2458: factor of

2459: $$

2460: {\tilde \kappa} \epsilon^2 = {{\tilde \rho} \over \sqrt{N}},

2461: $$

2462: which will be small compared to unity for all cases we are interested

2463: in.  This gives for the false dismissal probability the expression

2464: \begin{eqnarray}

2465: P_{\rm FD} &=& P({\cal L} < {\cal L}_*) \nonumber \\

2466:  &=& \int_{\cal R} {dx dy  \over

2467: 2 \pi} \exp \left[ -{(x-x_0)^2 \over 2} - {(y-y_0)^2 \over 2} \right], \nonumber \\ & &

2468: \end{eqnarray}

2469: where

2470: \begin{eqnarray}

2471: \label{eq:def1}

2472: x_0 &=& {\tilde \kappa}/\epsilon^2 \\

2473: y_0 &=& \sqrt{6} q {\tilde \kappa}^2  \\

2474: r_0 &=& \sqrt{2 N {\cal L}_*}.

2475: \label{eq:def4}

2476: \end{eqnarray}

2477: Here the region ${\cal R}$ in the $x,y$ plane is the union of the

2478: two regions

2479: \begin{eqnarray}

2480: x &\ge& 0 \nonumber \\

2481: y &\ge& 0 \nonumber \\

2482: x^2 +  y^2 &\le& r_0^2

2483: \label{eq:region1a}

2484: \end{eqnarray}

2485: and

2486: \begin{eqnarray}

2487: y &\le& 0 \nonumber \\

2488: 0 \le x &\le& r_0.

2489: \label{eq:region2a}

2490: \end{eqnarray}
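
To see where the region ${\cal R}$ comes from, it may help to rewrite the two retained terms of Eq.\ (\ref{eq:ML1}) using $\epsilon^8 = 1/N$.  With the shifted variables $x = x_0 + X$ and $y = y_0 + Y$ (a shorthand chosen here to match the integration variables), one finds
\begin{equation*}
N {\cal L} = {1 \over 2} \left[ x^2 + y^2 \, \theta(y) \right] \theta(x),
\end{equation*}
since $\epsilon^4 ({\tilde \kappa} + \epsilon^2 X)^2 = \epsilon^8 (x_0 + X)^2$.  For $x \ge 0$ the condition ${\cal L} < {\cal L}_*$ therefore reads $x^2 + y^2 \theta(y) < r_0^2$, which is the quarter disk (\ref{eq:region1a}) when $y \ge 0$ and the strip (\ref{eq:region2a}) when $y \le 0$.  Because $X$ and $Y$ are independent unit Gaussians, $(x,y)$ is Gaussian distributed about $(x_0,y_0)$ with unit covariance, which gives the integrand above.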

2491:

2492:

2493: The integral over the region (\ref{eq:region2a}) is

2494: \begin{equation}

2495: P_{\rm FD}^{(1)} =

2496: {\cal P}(-y_0)[ {\cal P}(r_0 - x_0) - {\cal P}(-x_0)],

2497: \label{eq:1}

2498: \end{equation}

2499: where

2500: \begin{equation}

2501: {\cal P}(x) \equiv 1 - {1 \over 2} \erfc (x/\sqrt{2}) = \int_{-\infty}^x

2502: dt {1 \over \sqrt{2 \pi}} \exp[-t^2/2].

2503: \end{equation}

2504: The integral over the region (\ref{eq:region1a}) can be written as

2505: \begin{eqnarray}

2506: P_{\rm FD}^{(2)} &=& {1 \over 2 \pi} \int_0^{\pi/2} d\theta \,

2507: \int_{0}^{r_0} dr \, r

2508: \nonumber \\ &\times&

2509: \exp \left[ - {1 \over 2} (r \cos \theta -

2510: x_0)^2 - {1 \over 2} (r \sin \theta - y_0)^2 \right]. \nonumber \\

2511: & & \label{eq:pFD1}

2512: \end{eqnarray}

2513: The integrand in (\ref{eq:pFD1})

2514: peaks

2515: at $r \cos \theta =

2516: x_0$, $r \sin \theta = y_0$.  In order for $P_{\rm FD}$ to be

2517: small, it is necessary that this peak lie outside the domain of

2518: integration, at $r > r_0$.  So we must have

2519: \begin{equation}

2520: x_0^2 + y_0^2 \ge r_0^2.

2521: \label{eq:constraint}

2522: \end{equation}

2523: The criterion $x_0 \ge r_0$ is, in order of magnitude, just the usual

2524: criterion for detectability with the cross-correlation statistic.  The

2525: criterion $y_0 \agt r_0$ reduces to, in order of magnitude,

2526: \begin{equation}

2527: \xi \alt {\rho^2 \over \sqrt{N}}

2528: \end{equation}

2529: which is what we claimed earlier to be the regime where the maximum

2530: likelihood statistic starts

2531: to work well, cf.\ Sec.\ \ref{s:MLS} above.

2532:

2533: Evaluating the integral (\ref{eq:pFD1}) using the Laplace

2534: approximation gives

2535: \begin{eqnarray}

2536: P_{\rm FD}^{(2)} &=& {1 \over r_0 (\lambda-1) \sqrt{2 \pi \lambda}}

2537: \exp \left[

2538: - {1 \over 2} r_0^2 (\lambda-1)^2 \right] \nonumber \\

2539:  && \times \left[ 1 + O\left({1

2540: \over r_0}\right)\right],

2541: \label{eq:final}

2542: \end{eqnarray}

2543: where we define the variables $\lambda$ and $\gamma$ by

2544: \begin{equation}

2545: (x_0,y_0) = r_0 \lambda (\cos \gamma, \sin \gamma).

2546: \label{eq:lambdadef}

2547: \end{equation}

2548: However, the result (\ref{eq:final}) is not very accurate for small

2549: $r_0$.  Alternatively we can integrate over $r$ in Eq.\

2550: (\ref{eq:pFD1}) to obtain

2551: \begin{widetext}

2552: \begin{eqnarray}

2553: && P_{\rm FD}^{(2)} = \int_0^{\pi/2} d\theta \left\{

2554: \frac{1}{2\pi} e^{-\frac{{{r_0}}^2\,\left( 1 + {\lambda }^2 \right) }{2}}

2555: \left[ e^{\frac{{{r_0}}^2}{2}} - e^{{{r_0}}^2\,\lambda \,\cos (\gamma  - \theta )} \right] \right. \nonumber \\

2556: && + \left.

2557: \frac{r_0 \lambda}{2\sqrt{2\pi}}

2558: e^{\frac{ {{r_0}}^2 \lambda^2 }{4}

2559:    \left[ \cos (2\,\left\{ \gamma  - \theta  \right\} ) -1 \right] }

2560: \cos \left( \gamma  - \theta \right)

2561:      \left[ \erf\left( \frac{{r_0}\,\lambda \cos \{\gamma  - \theta \}}{{\sqrt{2}}} \right) +

2562:             \erf\left( \frac{{r_0}\,\left\{ 1 - \lambda \,\cos [\gamma  - \theta ]  \right\} }{{\sqrt{2}}} \right) \right]

2563: \right\}, \label{eq:c}

2564: \end{eqnarray}

2565: \end{widetext}

2566: where

2567: \begin{equation}

2568: \erf(x) = \frac{2}{\sqrt{\pi}}\int_0^x dy~ e^{-y^2}.

2569: \end{equation}

2570: The integral (\ref{eq:c}) can be evaluated numerically.

2571: The false dismissal probability is then given by

2572: \begin{equation}

2573: P_{\rm FD} = P_{\rm FD}^{(1)} + P_{\rm FD}^{(2)},

2574: \label{eq:ansA}

2575: \end{equation}

2576: with $P_{\rm FD}^{(1)}$ given by Eq.\ (\ref{eq:1})

2577: and $P_{\rm FD}^{(2)}$ given by Eq.\ (\ref{eq:c}).

2578:

2579:

2580: \subsection{False alarm probability}

2581:

2582: The false alarm probability is some function

2583: \begin{equation}

2584: P_{\rm FA} = P_{\rm FA}({\cal L}_*,N)

2585: \end{equation}

2586: of the threshold value ${\cal L}_*$ of the detection statistic

2587: (\ref{lambdadef1a}) and of the number of data points $N$.  It does not

2588: depend on the signal parameters ${\tilde \rho}$ and ${\tilde \xi}$

2589: because no signal is present.  We would like to evaluate this quantity

2590: in the large $N$ limit.

2591:

2592: We start by rewriting the statistic (\ref{lambdadef1a}) in the form

2593: \begin{equation}

2594: {\cal L} = \max_{\bf v} \left\{ {1 \over N} \sum_{k=1}^N \ln A_k(0)

2595: + {1 \over N} \sum_{k=1}^N \ln \left[ 1 + \xi {\cal D}_k(\alpha)

2596:   \right] \right\},

2597: \label{lambdadef4}

2598: \end{equation}

2599: where

2600: \begin{equation}

2601: {\cal D}_k(\alpha) = { A_k(\alpha) \over A_k(0)} -1.

2602: \label{calDdef}

2603: \end{equation}

2604: Consider the first term in Eq.\ (\ref{lambdadef4}).  Using the

2605: definition (\ref{f def}) of $A_k(\alpha)$ and the definition

2606: (\ref{intro bar sigma})

2607: of ${\bar \sigma}_1$ and ${\bar \sigma}_2$ we can

2608: write this term as

2609: \begin{equation}

2610: {1 \over N} \sum_{k=1}^N \ln A_k(0) = - { \Delta \sigma_1^2 \over

2611:   {\bar \sigma}_1^2}  - { \Delta \sigma_2^2 \over

2612:   {\bar \sigma}_2^2} + O( \Delta \sigma_1^3, \Delta \sigma_2^3),

2613: \end{equation}

2614: where $\Delta \sigma_1 = \sigma_1 - {\bar \sigma}_1$, $\Delta \sigma_2

2615: = \sigma_2 - {\bar \sigma}_2$.

2616: Therefore the first term is maximized at $\sigma_1 = {\bar \sigma}_1$,

2617: $\sigma_2 = {\bar \sigma}_2$.  Below we shall show that the second

2618: term in Eq.\ (\ref{lambdadef4}) is of order $\epsilon^2$, where in this

2619: subsection we define $\epsilon = 1/\sqrt{N}$.  Therefore the values of

2620: $\sigma_1$ and $\sigma_2$ that achieve the maximum are

2621: \begin{eqnarray}

2622: {\hat \sigma}_1 &=& {\bar \sigma}_1 \left[ 1 + O(\epsilon^2) \right] \nonumber

2623: \\

2624: {\hat \sigma}_2 &=& {\bar \sigma}_2 \left[ 1 + O(\epsilon^2) \right].

2625: \end{eqnarray}

2626: Moreover, in analyzing the second term it suffices to take $\sigma_1 =

2627: {\bar \sigma}_1$, $\sigma_2 = {\bar \sigma}_2$ in order to obtain the

2628: statistic to the leading $O(\epsilon^2)$ order.  Lastly, since we have

2629: assumed that ${\tilde \sigma}_1 = {\tilde \sigma}_2 =1$ and no signal

2630: is present, we have ${\bar \sigma}_{1,2} = 1 + O(\epsilon)$.  Hence,

2631: in analyzing the second term, it is sufficient to take $\sigma_1 =

2632: \sigma_2 = 1$.

2633:

2634:

2635: The statistic (\ref{lambdadef4}) therefore reduces to

2636: \begin{equation}

2637: {\cal L} = \max_{\alpha,\xi}

2638: {1 \over N} \sum_{k=1}^N \, \ln \left[ 1 + \xi {\cal D}_k(\alpha) \right] + O(\epsilon),

2639: \label{lambdadef5}

2640: \end{equation}

2641: where from Eqs.\ (\ref{f def}) and (\ref{calDdef})

2642: \begin{equation}

2643: {\cal D}_k(\alpha) = {1 \over \sqrt{1 + 2 \alpha}} \exp \left[ {w_k^2

2644:     \over 2 + {1 \over \alpha}} \right] -1.

2645: \end{equation}

2646: Here $w_k = (n_1^k + n_2^k)/\sqrt{2}$, $1 \le k \le N$, are

2647: independent Gaussian random variables of zero mean and unit variance.

2648:

2649:

2650:

2651: It is straightforward to numerically compute the distribution of the

2652: statistic (\ref{lambdadef5}), by generating the Gaussian variables

2653: $w_k$ and numerically maximizing over $\xi$ and $\alpha$.

2654: The result is shown in Fig.\ \ref{fig:fa}.  We find

2655: that at large $N$, the distribution of $N {\cal L}$ becomes

2656: independent of $N$, and is approximately given by

2657: \begin{equation}

2658: P(N {\cal L} > \xi) = \alpha_0 e^{-\beta_0 \xi}

2659: \end{equation}

2660: for $\xi > 0$, where $\alpha_0 \approx 0.42$ and $\beta_0 \approx

2661: 1.08$.  Therefore the false alarm probability is approximately given

2662: by

2663: \begin{equation}

2664: P_{\rm FA} = \alpha_0 \exp \left[ - \beta_0 N {\cal L}_* \right].

2665: \label{eq:ansB}

2666: \end{equation}
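
The numerical procedure just described, and the use of the fit (\ref{eq:ansB}) to set a threshold, can be sketched as follows.  This is an illustration under stated assumptions and not the code used to produce Fig.\ \ref{fig:fa}.  The maximization over $\xi$ and $\alpha$ is replaced by a coarse grid search, so it only bounds the true maximum from below.  The grids, function names and seed are hypothetical choices.  The constants are the fitted values $\alpha_0 \approx 0.42$ and $\beta_0 \approx 1.08$ quoted above.

```python
import math
import random

ALPHA0, BETA0 = 0.42, 1.08  # fitted values quoted in the text

# Illustrative search grids (assumptions): alpha from 0.01 to 10, xi from 0.01 to 1
ALPHAS = [0.01 * 10 ** (k / 5) for k in range(16)]
XIS = [10 ** (-k / 5) for k in range(11)]


def false_alarm_probability(l_star, n):
    """Fitted false alarm probability, Eq. (eq:ansB)."""
    return ALPHA0 * math.exp(-BETA0 * n * l_star)


def threshold_for(p_fa, n):
    """Invert Eq. (eq:ansB) for the threshold; meaningful only for p_fa < ALPHA0."""
    return math.log(ALPHA0 / p_fa) / (BETA0 * n)


def n_times_statistic(w, alphas=ALPHAS, xis=XIS):
    """Grid-search approximation to N*L from Eq. (lambdadef5), without the O(eps) term."""
    best = 0.0  # xi = 0 always gives zero, so the maximum is non-negative
    for a in alphas:
        prefactor = 1.0 / math.sqrt(1.0 + 2.0 * a)
        rate = 1.0 / (2.0 + 1.0 / a)
        d = [prefactor * math.exp(rate * wk * wk) - 1.0 for wk in w]  # D_k(alpha) > -1
        for xi in xis:
            value = sum(math.log1p(xi * dk) for dk in d)
            best = max(best, value)
    return best


def simulate(n, trials, seed=0):
    """Draw `trials` noise-only realizations of N*L for n data points."""
    rng = random.Random(seed)
    return [n_times_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(trials)]
```

For instance, \texttt{threshold\_for(0.01, 1000)} returns the value of ${\cal L}_*$ at which the fit predicts a $1\%$ false alarm probability for $N = 1000$, and the empirical fraction of values from \texttt{simulate} exceeding a given level can be compared with the exponential fit.  A finer grid or a proper optimizer would be needed for quantitative work.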

2667:

2668:

2669: \begin{figure}

2670: \begin{center}

2671: \epsfig{file=fa.eps,width=8.5cm}

2672: \caption{The cumulative distribution function for the leading order

2673: expression \protect{(\ref{lambdadef5})} for the statistic when no

2674: signal is present, obtained numerically.  The solid line is for $N =

2675: 1000$, and the dashed line for $N = 5000$.

2676: }

2677: \label{fig:fa}

2678: \end{center}

2679: \end{figure}

2680:

2681:

2682:

2683:

2684: Finally, we explain why it is plausible to expect the distribution of

2685: $N {\cal L}$ to be independent of $N$ in the large $N$ limit.  The

2686: numerical maximizations over $\xi$ and $\alpha$ in Eq.\

2687: (\ref{lambdadef5}) show that the maximum is nearly always achieved at

2688: $\alpha \ll 1$ or $\xi \ll 1$.  In both these regimes, one can obtain

2689: some information about the $N$-dependence of the statistic.

2690:

2691: Consider first the regime $\xi \ll 1$.  In this regime we can expand

2692: the expression (\ref{lambdadef5}) as a power series in $\xi$ to obtain

2693: \begin{equation}

2694: {\cal L} = \max_{\alpha,\xi}

2695: {1 \over N} \sum_{k=1}^N \, \left[ \xi {\cal D}_k(\alpha) - {1 \over

2696:     2} \xi^2 {\cal D}_k(\alpha)^2 + O(\xi^3) \right] + O(\epsilon).

2697: \label{lambdadef6}

2698: \end{equation}

2699: The generalized central limit theorem (reviewed in Appendix

2700: \ref{app:gclt}) implies that

2701: \begin{equation}

2702: {1 \over N} \sum_{k=1}^N \, {\cal D}_k(\alpha) = N^{1 - \gamma_1 \over

2703:   \gamma_1} \left( \ln N \right)^{\delta_1} {\cal

2704:   F}_N(\alpha),

2705: \label{levy1}

2706: \end{equation}

2707: where for each fixed $\alpha$, the distribution of the random variable ${\cal

2708: F}_N(\alpha)$ becomes independent of $N$ in the large $N$ limit.

2709: Here

2710: \begin{equation}

2711: \gamma_1 = \left\{ \begin{array}{ll} 2 & 0 <

2712:         \alpha \le 1/2 \\

2713:         1 + {1 \over 2 \alpha} &

2714:         1/2 \le \alpha \\ \end{array} \right.

2715: \end{equation}

2716: and

2717: \begin{equation}

2718: \delta_1 = \left\{ \begin{array}{ll} 0 & 0 <

2719:         \alpha \le 1/2 \\

2720:          { -\alpha \over 1 + 2 \alpha} &

2721:         1/2 \le \alpha.\\ \end{array} \right.

2722: \end{equation}

2723: The limiting distribution is a Levy distribution with parameters $p =

2724: 1$ and $\gamma = \gamma_1$.

2725: Similarly we have

2726: \begin{equation}

2727: {1 \over N} \sum_{k=1}^N \, {\cal D}_k(\alpha)^2 = N^{1 - \gamma_2 \over

2728:   \gamma_2} \left( \ln N \right)^{\delta_2} {\cal

2729:   G}_N(\alpha),

2730: \label{levy2}

2731: \end{equation}

2732: where as $N \to \infty$ at each fixed $\alpha$ the distribution of the

2733: random variable ${\cal G}_N(\alpha)$ tends to a Levy distribution with

2734: parameters $p=1$ and $\gamma = \gamma_2$, with

2735: \begin{equation}

2736: \gamma_2 = \left\{ \begin{array}{ll} 2 & 0 <

2737:         \alpha \le 1/6 \\

2738:          {1 + 2 \alpha \over 4 \alpha} &

2739:         1/6 \le \alpha.\\ \end{array} \right.

2740: \end{equation}

2741: and

2742: \begin{equation}

2743: \delta_2 = \left\{ \begin{array}{ll} 0 & 0 <

2744:         \alpha \le 1/6 \\

2745:          { - 2 \alpha \over 1 + 2 \alpha} &

2746:         1/6 \le \alpha. \\ \end{array} \right.

2747: \end{equation}
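
The values of $\gamma_1$ and $\gamma_2$ can be understood from the upper tail of ${\cal D}_k(\alpha)$.  The following is a rough sketch that ignores slowly varying (logarithmic) factors.  Since ${\cal D}_k > -1$, only the upper tail matters, consistent with $p=1$.  For large $d$, the condition ${\cal D}_k > d$ requires
\begin{equation*}
w_k^2 > \left( 2 + {1 \over \alpha} \right) \ln \left[ (1+d) \sqrt{1 + 2
  \alpha} \right],
\end{equation*}
and the probability that $w_k^2$ exceeds $t$ falls off as $e^{-t/2}$ up to a power of $t$.  Hence
\begin{equation*}
P({\cal D}_k > d) \sim d^{-\left(1 + {1 \over 2 \alpha}\right)},
\end{equation*}
up to logarithmic factors.  The tail index $1 + 1/(2\alpha)$ drops below $2$ for $\alpha > 1/2$, which is the second branch of $\gamma_1$.  For ${\cal D}_k^2$ the tail index is halved, giving $(1 + 2\alpha)/(4 \alpha)$, which drops below $2$ for $\alpha > 1/6$.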

2748:

2749:

2750:

2751: We now substitute the results (\ref{levy1}) and (\ref{levy2}) into the

2752: expression (\ref{lambdadef6}) for the statistic, and maximize analytically over

2753: the quadratic dependence on $\xi$.  For $\alpha \ge 1/2$, the value of

2754: $\xi$ which achieves the maximum goes to zero as $N \to \infty$,

2755: consistent with the assumption $\xi \ll 1$, and the result is

2756: \footnote{For $\alpha < 1/2$ this argument fails, which is why we must

2757: numerically verify that the distribution of $N {\cal L}$ is

2758: asymptotically independent of $N$.}

2759: \begin{equation}

2760: N {\cal L} = {1 \over 2 } \max_\alpha {{\cal F}_N(\alpha)^2 \over {\cal

2761:     G}_N(\alpha)} + O(\epsilon).

2762: \end{equation}
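
The cancellation of the $N$ dependence in this result can be made explicit.  As a sketch, let $A$ and $B$ denote the left hand sides of Eqs.\ (\ref{levy1}) and (\ref{levy2}) (a shorthand introduced only here), so that at fixed $\alpha$ Eq.\ (\ref{lambdadef6}) reads ${\cal L} \approx \xi A - \xi^2 B/2$.  For $A > 0$ the maximum over $\xi$ is at ${\hat \xi} = A/B$, with value $A^2/(2B)$.  For $\alpha \ge 1/2$ we have $\gamma_2 = \gamma_1/2$ and $\delta_2 = 2 \delta_1$, so that
\begin{eqnarray*}
N {A^2 \over B} &=& N^{{2 \over \gamma_1} - {1 \over \gamma_2}}
\left( \ln N \right)^{2 \delta_1 - \delta_2}
{{\cal F}_N^2 \over {\cal G}_N}
= {{\cal F}_N^2 \over {\cal G}_N}, \\
{\hat \xi} &=& N^{-1/\gamma_1} \left( \ln N \right)^{\alpha \over 1 + 2
  \alpha} {{\cal F}_N \over {\cal G}_N}.
\end{eqnarray*}
The second line shows that ${\hat \xi} \to 0$ as $N \to \infty$, as assumed.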

2763:

2764:

2765: In the regime $\alpha \ll 1$, if we expand the expression

2766: (\ref{lambdadef5}) to quadratic order in $\alpha$, the result is an

2767: expression which is a linear function of $1/\xi$ at fixed $\alpha

2768: \xi$.  Hence, when one

2769: maximizes over values of $\xi$ in the range $0 \le \xi \le 1$, the

2770: maximum is always achieved either at $\xi =0$ or $\xi =1$.  One can

2771: show that the maximum to this order is always achieved at $\xi = 1$, and the

2772: resulting expression is

2773: \begin{equation}

2774: N {\cal L} = {1 \over 4 } {\cal G}^2 + O(\epsilon),

2775: \end{equation}

2776: where

2777: \begin{equation}

2778: {\cal G} = \sqrt{N} \left[ {1 \over N} \sum_{k=1}^N w_k^2 \ -1 \right]

2779: \end{equation}

2780: has a distribution that is independent of $N$ in the large $N$ limit.
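
The limiting distribution of ${\cal G}$ follows from the ordinary central limit theorem: since $\left< w_k^2 \right> = 1$ and
\begin{equation*}
\left< w_k^4 \right> - \left< w_k^2 \right>^2 = 3 - 1 = 2,
\end{equation*}
the variable ${\cal G}$ tends to a Gaussian of zero mean and variance $2$ as $N \to \infty$.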

2781:

2782: \section{Generalized central limit theorem}

2783: \label{app:gclt}

2784:

2785: In this appendix we review the generalized central limit theorem that

2786: can be found on p.~574 of Ref.~\cite{Feller}.  First we define a

2787: particular distribution function called the Levy distribution.  It

2788: depends on three real parameters: a positive constant $C$, a parameter

2789: $\gamma$ in the range $0 < \gamma \le 2$, and a constant $p$ in the

2790: range $0 \le p \le 1$ \footnote{The parameter $\gamma$ is conventionally denoted by $\alpha$.  We use $\gamma$ here to avoid confusion with the variable

2791: $\alpha$ defined in Eq.\ (\ref{eq:sigg}).}.  We say a random

2792: variable $X$ has a Levy distribution with parameters $C$, $\gamma$ and

2793: $p$ if the characteristic function of $X$ is given by

2794: \begin{eqnarray}

2795: \left< e^{i \zeta X} \right> &=& \exp \bigg\{ | \zeta|^\gamma { C

2796:     \Gamma(3-\gamma) \over \gamma (\gamma-1)} \bigg[ \cos(\pi

2797:     \gamma/2) \nonumber \\

2798:   &&  + i \, {\rm sgn}(\zeta) (p-q) \sin(\pi \gamma/2) \bigg] \bigg\},

2799: \end{eqnarray}

2800: where $q = 1 - p$.

2801: The corresponding probability distribution function  is obtained by

2802: taking a Fourier transform and decays like $x^{-(1+\gamma)}$ at large

2803: $x$ for $\gamma < 2$ ($\gamma =2$ is the Gaussian case).

2804:

2805: Consider now a random

2806: variable $X$ with probability

2807: distribution function $f(x)$ whose variance is infinite.  Let

2808: \begin{equation}

2809: F(x) = \int_{-\infty}^x dy \, f(y)

2810: \end{equation}

2811: be the cumulative distribution function

2812: and define

2813: \begin{equation}

2814: \mu(x) = \int_{-x}^x dy \, y^2 f(y).

2815: \end{equation}

2816: Suppose that

2817: the distribution satisfies the following conditions:

2818: (i)  As $x \to \infty$ we have $\mu(x) \sim x^{2 - \gamma} L(x)$,

2819: where $0 < \gamma \le 2$, and $L(x)$ varies slowly in the sense that

2820: $L(tx)/L(t) \to 1$ as $t \to \infty$ for all $x>0$.  (ii) We have

2821: \begin{equation}

2822: {1 - F(x) \over F(-x) + 1 - F(x)} \to p \ \ \ \ \ {F(-x) \over F(-x) +

2823:   1 - F(x)} \to q

2824: \end{equation}

2825: as $x \to \infty$, where $0 \le p \le 1$, $0 \le q \le 1$ and $p+q=1$.

2826: (iii) For $1 < \gamma \le 2$, we assume that the expected value $\int

2827: dx \, x f(x)$ vanishes; this can be enforced by making a

2828: transformation of the form $X \to X + {\rm constant}$.

2829:

2830: We define the sequence of random variables

2831: \begin{equation}

2832: S_N = {1 \over a_N} \sum_{i=1}^N \, X_i,

2833: \end{equation}

2834: where the $X_i$ are independent, identically distributed random

2835: variables with distribution function $f$, and the constants

2836: $a_N$ are

2837: chosen to satisfy

2838: \begin{equation}

2839: {N \mu(a_N)  \over a_N^2} \to C

2840: \end{equation}

2841: as $N \to \infty$, where $C$ is a positive constant.  Then, the

2842: distribution functions of the random

2843: variables $S_N$ converge to a Levy distribution with parameters $C$,

2844: $\gamma$ and $p$ as $N \to \infty$.

2845:

2846:

2847: \begin{thebibliography}{0}

2848:

2849: \bibitem{ligo}

2850: A.  Abramovici \emph{et al.}, Science {\bf 256}, 325 (1992).

2851: %A.  Abramovici, W. E.  Althouse, R. W. P. Drever, Y. G\"{u}rsel, S. Kawamura, F. J. Raab,

2852: %D. Shoemaker, L. Siewers, R. E. Spero, K. S. Thorne, R. E. Vogt, R. Weiss, S. E. Whitcomb,

2853: %and M. E. Zucker,

2854: %\emph{LIGO: The laser interferometer gravitational-wave observatory}, Science, 256, 325--333, (1992)

2855:

2856: \bibitem{virgo}

2857: C. Bradaschia \emph{et al.}, Nucl. Instrum. Methods A {\bf 289}, 518 (1990).

2858: %C. Bradaschia, R. del Fabbro,  A. Virgilio, A. Giazotto, H. Kautzky,  V. Montelatici, D. Passuello,

2859: %A. Brillet, O. Cregut, P. Hello, C. N. Man, P. T. Manh, A. Marraud, D. Shoemaker, J. Y. Vinet, F. Barone,

2860: %L. di Fiore, L. Milano, G. Russo, J. M. Aguirregabiria, H. Bel, J. P. Duruisseau, G. le Denmat, P. Tourrenc,

2861: %M. Capozzi, M. Longo, M. Lops, I. Pinto, G. Rotoli, T. Damour, S. Bonazzola, J. A. Marck, Y. Gourghoulon,

2862: %L. E. Holloway, F. E. Fuligni, V. Iafolla, and G. Natale,

2863: %\emph{The VIRGO project: a wide band antenna for gravitational wave detection}

2864:

2865: \bibitem{geo}

2866: R. Schilling, AIP Conf. Proc. {\bf 456}, 217 (1998).

2867: % Second international LISA symposium on the detection and observation of gravitational waves in space

2868: %\emph{The GEO 600 ground-based interferometer for the detection of gravitational waves}

2869: % edited by William M. Folkner, Jet Propulsion Laboratory, California Institute of Technology,

2870: % Pasadena CA, December 1998

2871:

2872: \bibitem{tama}

2873: M. K. Fujimoto, Journal of the Communications Research Laboratory {\bf 46}, 437 (1999).

2874: %\emph{Japanese gravitational wave detector-TAMA 300},

2875: %Journal of the Communications Research Laboratory, 46, 3, 437-440, (1999)

2876:

2877: \bibitem{Allen Review}

2878: B. Allen, in

2879: \emph{Relativistic Gravitation and Gravitational Radiation, Proceedings of the

2880: Les Houches School of Physics, Les Houches, 1995}, edited by J. A. Marck and J. P. Lasota (CNRS, Observatoire de Paris, Meudon, 1997),

2881: p. 373.

2882: %\emph{The stochastic gravity-wave background: sources and detection},

2883: %Proceedings of the Les Houches School on Astrophysical Sources of Gravitational

2884: %waves, Cambridge University Press, (1996)

2885:

2886: \bibitem{gaussian supernovae}

2887: D. Blair and L. Ju, Mon. Not. R. Astron. Soc. {\bf 283}, 648 (1996).

2888: %\emph{A cosmological background of gravitational waves produced by supernovae in the early universe},

2889:

2890: \bibitem{non gaussian supernovae}

2891: V. Ferrari, S. Matarrese, and R. Schneider, Mon. Not. R. Astron. Soc. {\bf 303}, 247 (1999).

2892: %\emph{Gravitational wave background from a cosmological population of core-collapse supernovae},

2893:

2894: \bibitem{first stars}

2895: R. Schneider \emph{et al.}, Mon. Not. R. Astron. Soc. {\bf 317}, 385 (2000).

2896: %\emph{Gravitational wave signals from the collapse of the first stars},

2897:

2898: \bibitem{gaussian neutron stars 1}

2899: V. Ferrari, S. Matarrese, and R. Schneider, Mon. Not. R. Astron. Soc. {\bf 303}, 258 (1999).

2900: %\emph{Stochastic background of gravitational waves generated by a cosmological population

2901: %of young, rapidly rotating neutron stars},

2902:

2903: \bibitem{gaussian neutron stars 2}

2904: T. Regimbau and J. A. de Freitas Pacheco, astro-ph/0105260.

2905: %\emph{Cosmic background of gravitational waves from rotating neutron stars},

2906:

2907: \bibitem{cosmic strings}

2908: T. Damour and A. Vilenkin, Phys. Rev. D {\bf 64}, 064008 (2001) (also

2909: gr-qc/0104026).

2910: %\emph{Gravitational wave bursts from cusps and kinks on cosmic strings},

2911:

2912: \bibitem{bubbles}

2913: M. Kamionkowski, A. Kosowsky, and M. S. Turner, Phys. Rev. D {\bf 49}, 2837 (1994).

2914: %\emph{Gravitational radiation from first-order phase transitions},

2915:

2916: \bibitem{inflation}

2917: L. P. Grishchuk, Zh. Eksp. Teor. Fiz. {\bf 67}, 825

2918: (1974) [Sov. Phys. JETP {\bf 40}, 409 (1975)];

2919: E. W. Kolb and M. S. Turner, {\it The Early Universe} (Addison-Wesley,

2920: Redwood, CA, 1990), and references therein.

2921:

2922: \bibitem{binaries}

2923: R. Schneider, V. Ferrari, S. Matarrese, and S. F. Portegies Zwart,

2924: Mon. Not. R. Astron. Soc. {\bf 324}, 797 (2001)

2925: (also astro-ph/0002055).

2926: %\emph{Gravitational waves from cosmological compact binaries},

2927:

2928: \bibitem{Coward}

2929: D. M. Coward, R. R. Burman, and D. G. Blair, Mon. Not. R. Astron. Soc. {\bf 324}, 1015 (2001).

2930: %\emph{Simulating a stochastic background of gravitational waves from neutron star formation at cosmological distances},

2931:

2932: \bibitem{Hogan}

2933: C. J. Hogan, Phys. Rev. D {\bf 62}, 121302 (2000).

2934: %\emph{Scales of the extra dimensions and their gravitational wave backgrounds},

2935:

2936: \bibitem{Michelson}

2937: P. F. Michelson, Mon. Not. R. Astron. Soc. {\bf 227}, 933 (1987).

2938: %\emph{On detecting stochastic background gravitational radiation with terrestrial detectors},

2939:

2940: \bibitem{Christensen}

2941: N. Christensen, Phys. Rev. D {\bf 46}, 5250 (1992).

2942: %\emph{Measuring the stochastic gravitational-radiation background with laser interferometric antennas},

2943:

2944: \bibitem{Flanagan}

2945: \'{E}. \'{E}. Flanagan, Phys. Rev. D {\bf 48}, 2389 (1993).

2946: %\emph{Sensitivity of the Laser Interferometer Gravitational Wave Observatory to a stochastic background and its dependence

2947: %on the detector orientations}

2948:

2949: \bibitem{Allen Romano}

2950: B. Allen and J. D. Romano, Phys. Rev. D {\bf 59}, 102001 (1999) (also

2951: gr-qc/9710117).  Note that the criticism in this paper of the upper

2952: limit formula given in Eq. (6.5) of Ref.\ \cite{Flanagan} is incorrect

2953: as it misinterprets that formula as a frequentist upper limit rather

2954: than a Bayesian upper limit.

2955: %\emph{Detecting a stochastic background of gravitational radiation: Signal processing strategies and sensitivities},

2956:

2957: \bibitem{robust gaussian}

2958: B. Allen, J. D. E. Creighton, \'{E}. \'{E}. Flanagan, and

2959: J. D. Romano, Phys. Rev. D {\bf 65}, 122002 (2002) (also

2960: gr-qc/0105100).

2961: %\emph{Robust statistics for deterministic and stochastic

2962: %gravitational waves in non-Gaussian noise.  I: Frequentist analyses},

2963:

2964: \bibitem{robust gaussian II}

2965: B. Allen, J. D. E. Creighton, \'{E}. \'{E}. Flanagan, and

2966: J. D. Romano, gr-qc/0205015.

2967: %\emph{Robust statistics for deterministic and stochastic

2968: %gravitational waves in non-Gaussian noise.  II: Bayesian analyses},

2969:

2970: \bibitem{Klimenko and Mitselmakher}

2971: S. Klimenko and G. Mitselmakher, LIGO Technical Report

2972: LIGO-T010125-00-D, 2001 (unpublished); {\it ibid.}, gr-qc/0208007.

2973: % A cross-correlation technique in wavelet domain for detection of stochastic gravitational waves

2974:

2975: \bibitem{general method}

2976: L. S. Finn, Phys. Rev. D {\bf 46}, 5236 (1992).

2977: %\emph{Detection, measurement, and gravitational radiation},

2978:

2979: \bibitem{excess power}

2980: W. G. Anderson, P. R. Brady, J. D. E. Creighton, and \'{E}. \'{E}. Flanagan, Phys. Rev. D {\bf 63}, 042003 (2001).

2981:

2982: \bibitem{sam joe}

2983: L. S. Finn and J. D. Romano, in preparation.

2984: %\emph{Detecting stochastic gravitational waves: Performance of maximum-likelihood and cross-correlation statistics}

2985:

2986: \bibitem{sam unpublished}

2987: L. S. Finn, in preparation.

2988: %\emph{?}

2989:

2990: \bibitem{maximum likelihood}

2991: P. J. Bickel and K. A. Doksum,

2992: \emph{Mathematical statistics: basic ideas and selected topics} (Holden-Day, Inc., California, 1977),

2993: Sec. 6.4.

2994:

2995: \bibitem{Papoulis}

2996: A. Papoulis, edited by S. W. Director, \emph{Probability, random variables, and stochastic processes} (McGraw Hill, New York, 1984),

2997: second edition.

2998:

2999: \bibitem{Neyman and Pearson}

3000: J. Neyman and E. S. Pearson, Philos. Trans. R. Soc. London Ser. A {\bf 231}, 289 (1933).

3001:

3002: \bibitem{Ferguson}

3003: T. S. Ferguson, edited by Z. W. Birnbaum and E. Lukacs, \emph{Mathematical statistics: a decision theoretic approach} (Academic Press, New York, 1967).

3004:

3005: \bibitem{Bayes}

3006: T. Bayes and R. Price, Philos. Trans. {\bf 53}, 370 (1763).

3007:

3008: \bibitem{Loredo}

3009: T. J. Loredo, \emph{Astronomical Society of the Pacific Conference Series, San Francisco, 1999}, edited by R. (Dick) Crutcher

3010: and D. Mehringer, vol 172, p. 297.

3011: %\emph{Computational technology for Bayesian inference},

3012: %astronomical data analysis software and systems VIII, ed. R. (Dick) Crutcher and D. Mehringer,

3013: %San Francisco: Astronomical Society of the Pacific, p. 297-306,(1999)

3014:

3015: \bibitem{waveform catalog}

3016: T. Zwerger and E. M\"{u}ller, Astron. Astrophys. {\bf 320}, 209 (1997).

3017: %\emph{Dynamics and gravitational wave signature of axisymmetric rotational core collapse},

3018:

3019: \bibitem{new waveform catalog}

3020: H. Dimmelmeier, J. A. Font, and E. M\"{u}ller, Astron. Astrophys. {\bf 393}, 523 (2002).

3021: %\emph{Relativistic simulations of rotational core collapse. II. Collapse and gravitational radiation}

3022: %astro-ph/0204289

3023:

3024: \bibitem{MG9}

3025: S. Drasco and \'{E}. \'{E}. Flanagan, {\it Detecting a non-Gaussian

3026: stochastic background of gravitational radiation}, Proceedings of the

3027: Ninth Marcel Grossmann Meeting on General Relativity,

3028: edited by V. G. Gurzadyan, R. T. Jantzen, and R. Ruffini (World

3029: Scientific, Singapore, 2001) (also gr-qc/0101051).

3030: %in \emph{Proceedings of the Ninth Marcel Grossmann

3031: %Meeting on General Relativity}, edited by V. G. Gurzadyan,  R. T. Jantzen and

3032: %R. Ruffini, (World Scientific, Singapore, 2002), p. 1917 [?].

3033:

3034: \bibitem{wainstein zubakov}

3035: L. A. Wainstein and V. D. Zubakov, translated from Russian by R. A. Silverman,

3036: \emph{Extraction of signals from noise} (Prentice-Hall, Inc., New Jersey, 1962).

3037:

3038:

3039:

3040: \bibitem{Feller}

3041: W. Feller, {\it An Introduction to Probability Theory and Its

3042:   Applications} (Wiley, New York, 1971), Vol.~II.

3043:

3044: \end{thebibliography}

3045:

3046: \end{document}

3047: