% gr-qc0608114/ms.tex

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%                                                                  %%
%% TITLE: When is Enough Good Enough in Source Modeling?            %%
%% AUTHOR: Louis J. Rubbo                                           %%
%% DATE: August 28, 2006                                            %%
%%                                                                  %%
%% NOTES: This document requires the aipproc package to compile.    %%
%% The package is available at the AIP conference proceedings web   %%
%% page at                                                          %%
%%                                                                  %%
%%    http://proceedings.aip.org/proceedings/authors.jsp#latex      %%
%%                                                                  %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\documentclass[final]{aipproc}

%#### PACKAGES ####################################

\usepackage{amsmath, amssymb, latexsym}



%#### LENGTHS #####################################

% Page size
\layoutstyle{6x9}

% This call sets the amount of space on either side of the equals sign
% in the equation array environment
\setlength{\arraycolsep}{2pt}



%#### NEW COMMANDS ################################



%#### INFORMATION #################################

\begin{document}

\title{When is Enough Good Enough\\in Gravitational Wave Source Modeling?}

\classification{02.50.Cw, 04.80.Nn, 05.45.Tp}

\keywords{gravitational waves --- methods: data analysis}

\author{Louis J. Rubbo}{
  address={Center for Gravitational Wave Physics, 104 Davey Lab,
  University Park, PA 16802} }



%#### MAIN DOCUMENT ###############################

%==== Abstract ====================================


\begin{abstract}
A typical approach to developing an algorithm for analyzing
gravitational wave data is to assume a particular waveform and use its
characteristics to formulate detection criteria.  Once a detection
has been made, the algorithm uses those same characteristics to tease
out parameter estimates from the data set.  While an obvious
starting point, such an approach begins by assuming a single,
correct model for the waveform regardless of the signal strength,
observation length, noise, etc.  This paper introduces the method of
Bayesian model selection as a way to select the most plausible
waveform model from a set of models given the data and prior
information.  The discussion is set in the scientific context of the
proposed Laser Interferometer Space Antenna.
\end{abstract}


\maketitle



%==== Main matter =================================

\section{INTRODUCTION} \label{sec:intro}


The anticipated data from the proposed Laser Interferometer Space
Antenna (LISA) introduces a number of exciting and original
challenges.  Central to these challenges is the development of data
analysis routines capable of coaxing out and characterizing individual
signals from the noisy time series LISA will return.  A great deal of
work has already been invested in the development of algorithms
applicable to the LISA data.  While a number of these algorithms have
demonstrated favorable capabilities on simulated data, each makes an
initial assumption about the functional form of the waveform under
consideration.  This paper introduces the use of Bayesian model
selection as a quantitative method for selecting the waveform model.
Using Bayes' theorem we show how the data and prior information pick
out the most plausible model from a set of proposed models.


Gravitational wave data analysis can be loosely described as a
three-step process, as depicted in figure~\ref{fig:flowchart}.  In the
first step, a signal is detected within a set of noisy time streams
retrieved from the detector.  In step two, the signal is characterized
by producing estimates of the waveform parameters.  Finally,
step three is to make physical interpretations based on the estimated
parameter values.  These steps are not necessarily mutually exclusive:
there are no sharp boundaries between them, and areas of overlap do exist.
However, each step is necessary when analyzing a detected signal.
\begin{figure}
  \includegraphics[height=.27\textheight]{DataAnalysisBW.eps}
  \caption{Data analysis flow chart.}
  \label{fig:flowchart}
\end{figure}


In making the transition from detection to characterization (and quite
often in the detection process itself) a particular waveform is
assumed prior to the investigation.  While an obvious assumption to make
in the early developmental stages of an algorithm, it can lead to
needless complications and even misidentifications.  For example, if a
signal has a low signal-to-noise ratio, some of its
intricate waveform features can be lost in the noise, and a
simpler model would then have sufficed for the analysis.  In the Bayesian
model selection approach presented here, the data and prior
information justify the selection of a particular waveform model by
identifying the most plausible model from a proposed library of
models.


Bayesian model selection is not a new methodology, but it is one that
has not been fully adopted by the still infant gravitational wave
community.  The aim of this paper is to briefly summarize the theory
and to discuss possible applications to the analysis of the LISA data.
To this end, the paper first introduces the rules of probability theory,
including a derivation of Bayes' theorem.  It then outlines the
calculations necessary for performing model selection.
From there we give a simple, qualitative example of its use with the
LISA data.  We conclude by suggesting a few other applications
associated with LISA.




\section{BAYESIAN STATISTICS} \label{sec:bayes}

\subsection{Rules of Probability Theory} \label{sec:rules}


We begin by introducing a notation first used by
Jeffreys~\cite{Jeffreys:1961}.  We will denote the statement ``the
probability that proposition $A$ is true given proposition $B$'' as
$P(A|B)$.  Similarly, ``the joint probability that both $A$ and $B$
are true given $C$'' is denoted by $P(A,B|C)$.  The notation ``$|C)$''
indicates that proposition $C$ is assumed to be true.  In
Bayesian statistics, probability statements such as $P(A)$ are
incomplete because they do not explicitly state their dependencies;
\textit{all} probabilities are conditional.


Starting from the desiderata that degrees of plausibility are
represented by real numbers, that the rules for manipulating plausibility
statements should agree with common sense, and that they should be
consistent, it is possible to show that only two rules are
required for manipulating probabilities~\cite{Cox:1946}: the Sum Rule,
\begin{equation} \label{eq:sum_rule}
  P(A+B|C) = P(A|C) + P(B|C) - P(A, B|C) \;,
\end{equation}
where the plus sign inside the probability argument means ``or'', and
the Product Rule,
\begin{equation} \label{eq:multi_rule}
  P(A,B|C) = P(A|C) P(B|A,C) \;.
\end{equation}
By standard Aristotelian logic it must be the case that $P(A,B|C) =
P(B,A|C)$.  Consequently, the Product Rule may be re-expressed as
\begin{equation}
  P(B,A|C) = P(B|C) P(A|B,C) \;.
\end{equation}
Equating the last two expressions results in Bayes' theorem,
\begin{equation} \label{eq:Bayes}
  P(A|B,C) = P(A|C) \; \frac{P(B|A,C)}{P(B|C)} \;.
\end{equation}
Although Bayes' theorem receives the accolades, it is simply a
consistency statement for the Product Rule.


In words, Bayes' theorem is often stated as
\begin{displaymath}
  \textrm{Posterior} = \textrm{Prior} \;
  \frac{\textrm{Marginal Likelihood}}{\textrm{Global Likelihood}} \;.
\end{displaymath}
In this form it is evident that Bayes' theorem quantitatively
describes a learning process.  We start with a prior state of
knowledge about proposition $A$ when $C$ is assumed true, $P(A|C)$.
We then gain new information $B$, which in turn updates our
state of knowledge to that given by the posterior probability, $P(A|B,C)$.
The proportionality factor between our prior and posterior states of
knowledge is a normalized statement of how likely proposition
$B$ is given that both $A$ and $C$ are true.

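As a purely illustrative example, with hypothetical numbers chosen only
for arithmetic convenience, suppose $P(A|C) = 0.2$, $P(B|A,C) = 0.6$,
and $P(B|C) = 0.3$.  Bayes' theorem~\eqref{eq:Bayes} then gives
\begin{displaymath}
  P(A|B,C) = 0.2 \times \frac{0.6}{0.3} = 0.4 \;,
\end{displaymath}
so learning $B$ doubles the plausibility of $A$.  Had $B$ been less
probable given $A$ than it is overall, $P(B|A,C) < P(B|C)$, the same
update would instead have lowered the plausibility of $A$.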

While Bayes' theorem is a useful byproduct of the Product Rule, the
Sum Rule is equally important.  It is through the Sum Rule
that we are able to take a joint probability of multiple propositions
and reduce it to a distribution over a smaller subset of those
propositions.  For example, consider the joint distribution
of $A$ and a set of $n$ exhaustive $B_{i}$'s, given prior
information $I$.  Writing $\sum$ inside a probability argument for the
``or'' of many propositions, the Sum Rule gives
\begin{eqnarray}
  P(A, \sum_{i=1}^{n} B_{i} | I) &=& P(A|I) \nonumber\\
  &=& P(A, B_{1} | I) + P(A, \sum_{i=2}^{n} B_{i} | I) - P(A, B_{1},
  \sum_{i=2}^{n} B_{i} | I) \;,
\end{eqnarray}
where the first equality follows from the Product Rule and the fact
that the $B_{i}$'s are exhaustive, so that $P(\sum_{i} B_{i}|A,I) = 1$.
If the $B_{i}$'s are also mutually exclusive, that is, only one of them
can be true at a time, then the last term is zero.  Repeated application
of the Sum Rule then leads to
\begin{equation}
  P(A|I) = \sum_{i=1}^{n} P(A, B_{i} | I) \;.
\end{equation}
When $B$ takes on continuous values the sum goes over to an
integral,
\begin{equation}
  P(A | I) = \int P(A, B | I) \, dB \;,
\end{equation}
where $P(A, B | I)$ is now a probability density in $B$.
The process we have just described is referred to as
\textit{marginalization}.  In it we have removed a \textit{nuisance
parameter}, $B$, from a joint distribution by repeated application
of the Sum Rule.

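For two mutually exclusive and exhaustive propositions, $B_{1}$ and
$B_{2}$, writing each joint probability with the Product Rule turns
marginalization into the familiar weighted average
\begin{displaymath}
  P(A|I) = P(B_{1}|I) \, P(A|B_{1},I) + P(B_{2}|I) \, P(A|B_{2},I) \;.
\end{displaymath}
With hypothetical values $P(B_{1}|I) = 0.3$, $P(A|B_{1},I) = 0.5$, and
$P(A|B_{2},I) = 0.1$ (so that $P(B_{2}|I) = 0.7$), this gives
$P(A|I) = 0.15 + 0.07 = 0.22$, a plausibility for $A$ that no longer
depends on which $B_{i}$ is true.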


\subsection{Model Selection} \label{sec:ModelSel}


In model selection the central question is the
following: ``Given a particular set of data and prior information,
which hypothesis from a library $\mathcal{L} \equiv \{H_{1}, \ldots,
H_{\ell}\}$ of hypotheses is the most plausible?''  Key to this
question are the ideas that all prior information is included and that
the most plausible hypothesis is judged on the basis of the given data.
The hypotheses within a library are either assumed to be exhaustive or
are made so by a careful choice of models~\cite{Bretthorst:1996}.


A model itself consists of a functional form dependent on a vector of
parameters $\vec{\lambda}$, and two probability
distributions~\cite{MacKay:1992}.  The first is the distribution of
the parameter values given the model, prior to the new data,
$P(\vec{\lambda}|H_{\alpha})$.  This is a key
point: two models are distinct even if they have the same
parameterization but different priors about how those parameters are
believed to be distributed.  The second distribution is the
probability of a data set given the model and a particular set of
parameter values, $P(D|\vec{\lambda}, H_{\alpha})$.

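As a concrete but hypothetical illustration (the signal and noise
assumptions here are ours and are not part of the general argument),
suppose the data consist of $N_{s}$ samples $d_{j}$ taken at times
$t_{j}$, model $H_{\alpha}$ predicts a waveform $h(t;\vec{\lambda})$,
and the detector noise is white and Gaussian with known variance
$\sigma^{2}$.  The second distribution is then
\begin{displaymath}
  P(D|\vec{\lambda}, H_{\alpha}) = \prod_{j=1}^{N_{s}}
  \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left[ -\frac{\left(d_{j} -
  h(t_{j};\vec{\lambda})\right)^{2}}{2\sigma^{2}} \right] \;,
\end{displaymath}
while the first might be a uniform prior that is constant inside the
expected range of each parameter and zero outside it.  Replacing that
prior with, say, one uniform in the logarithm of an amplitude
parameter would, by the point made above, define a different model
even though $h(t;\vec{\lambda})$ is unchanged.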

From Bayes' theorem~\eqref{eq:Bayes}, the posterior probability for a
particular model is given by
\begin{equation}
  P(H_{\alpha}|D, I) = P(H_{\alpha}|I) \; \frac{P(D|H_{\alpha}, I)}{
  P(D| I)} \;,
\end{equation}
where $I$ symbolizes our unenumerated prior information.  The
denominator can be viewed as a normalization constant, obtained by
marginalizing over the exhaustive and mutually exclusive models in the
library,
\begin{equation}
  P(D|I) = \sum_{\alpha = 1}^{\ell} P(H_{\alpha}|I) P(D|H_{\alpha}, I)
  \;.
\end{equation}
By considering the \textit{odds ratio} between two competing models,
we can eliminate the need to calculate the normalization constant,
\begin{eqnarray} \label{eq:oddsratio}
  O_{12} &=& \frac{P(H_{1}|D,I)}{P(H_{2}|D,I)} =
  \frac{P(H_{1}|I) P(D|H_{1},I)}{P(H_{2}|I) P(D|H_{2},I)} \nonumber\\
  &=& \frac{P(D|H_{1},I)}{P(D|H_{2},I)} \;.
\end{eqnarray}
The second line arises by assuming that our prior information does not
favor one model over the other, $P(H_{1}|I) = P(H_{2}|I)$.  The odds
ratio gives us a means to directly compare competing models.  If our
library contains more than two models, one model may be used as a
reference.  For example, the reference model may be a constant (i.e.,
a model with no signal present), while the remaining library contains a
spectrum of waveform models.

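When the library is exhaustive, mutually exclusive, and contains only
two models, the posterior probabilities sum to one, so the odds ratio
determines each of them directly; for instance,
\begin{displaymath}
  P(H_{1}|D,I) = \frac{O_{12}}{1 + O_{12}} \;.
\end{displaymath}
A hypothetical odds ratio of $O_{12} = 20$ thus corresponds to
$P(H_{1}|D,I) = 20/21 \approx 0.95$.  More generally, with $H_{1}$ as
the reference model and $O_{\alpha 1} \equiv P(H_{\alpha}|D,I) /
P(H_{1}|D,I)$ (so that $O_{11} = 1$),
\begin{displaymath}
  P(H_{\alpha}|D,I) = \frac{O_{\alpha 1}}{\sum_{\beta=1}^{\ell}
  O_{\beta 1}} \;.
\end{displaymath}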

From the odds ratio it is apparent that to compare models in a library
only their marginal likelihoods need to be calculated.  The marginal
likelihoods are found by marginalizing, over all model parameters, the
joint distribution for the data and the model parameters,
\begin{equation} \label{eq:evidence}
  P(D|H_{\alpha}, I) = \int P(D, \vec{\lambda}_{\alpha} | H_{\alpha},
  I) \; d\vec{\lambda}_{\alpha} = \int P(\vec{\lambda}_{\alpha}|H_{\alpha},
  I) P(D|\vec{\lambda}_{\alpha}, H_{\alpha}, I) \;
  d\vec{\lambda}_{\alpha} \;,
\end{equation}
where the second equality follows from the Product Rule.

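To see the integral at work in the simplest possible setting, consider
a toy comparison of our own construction.  Model $H_{0}$ says a single
datum is $d = n$, pure Gaussian noise of standard deviation $\sigma$,
so it has no free parameter and its marginal likelihood is just its
likelihood.  Model $H_{1}$ says $d = \mu + n$, with one parameter $\mu$
whose prior is Gaussian with zero mean and standard deviation $\tau$.
Because a sum of independent Gaussian variables is Gaussian, the
integral in equation~\eqref{eq:evidence} can be done exactly, and with
equal prior odds
\begin{displaymath}
  O_{10} = \frac{P(d|H_{1},I)}{P(d|H_{0},I)} =
  \sqrt{\frac{\sigma^{2}}{\sigma^{2}+\tau^{2}}} \;
  \exp\left[ \frac{d^{2} \, \tau^{2}}{2\sigma^{2}(\sigma^{2}+\tau^{2})}
  \right] \;.
\end{displaymath}
The square-root prefactor is always less than one and penalizes the
extra parameter, while the exponential grows with $|d|/\sigma$, so
$H_{1}$ is favored only once the datum lies sufficiently far from zero.
For $d = 0$ the simpler model is favored by a factor of
$\sqrt{1 + \tau^{2}/\sigma^{2}}$.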

If the data is informative, i.e., we have learned something new, then
the parameter likelihood function, $P(D|\vec{\lambda}_{\alpha},
H_{\alpha}, I)$, will be more sharply peaked than the parameter priors,
$P(\vec{\lambda}_{\alpha} | H_{\alpha}, I)$.  Figure~\ref{fig:occam}
illustrates this for a one-dimensional model.
\begin{figure}
  \includegraphics[height=0.27\textheight]{OccamFactorBW.eps}
  \caption{A pictorial representation of the origins of Occam factors
    in Bayesian model comparisons.}
  \label{fig:occam}
\end{figure}
In this instance we can estimate the marginal likelihood as
\begin{equation}
  P(D|H_{\alpha},I) \approx P(D | \lambda_{ML}, H_{\alpha}, I) \left[
  P(\lambda_{ML}|H_{\alpha}, I) \; \delta\lambda \right] \;.
\end{equation}
Here $\lambda_{ML}$ is the parameter value at the maximum likelihood
and $\delta\lambda$ is the characteristic width of the parameter
likelihood function.  The term in square brackets is an \textit{Occam
factor}, a term that naturally penalizes complicated models.  To see
this, consider a uniform prior, $P(\lambda|H_{\alpha}, I) =
(\Delta\lambda)^{-1}$, where $\Delta\lambda$ is the width of the range
of expected parameter values before the data is collected.  The marginal
likelihood is now
\begin{equation}
  P(D|H_{\alpha},I) \approx P(D | \lambda_{ML}, H_{\alpha}, I) \;
  \frac{\delta\lambda}{\Delta\lambda} \;.
\end{equation}
For informative data $\delta\lambda < \Delta\lambda$, so the Occam
factor is always less than unity.  Consequently, for a complicated model
to be favored over a simpler one, the data must justify the extra
parameters through a correspondingly larger maximum likelihood,
$P(D | \lambda_{ML}, H_{\alpha}, I)$.

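The size of a single Occam factor is easy to estimate under an extra
assumption, ours rather than part of the argument above: if the
parameter likelihood is approximately Gaussian about $\lambda_{ML}$
with standard deviation $\sigma_{\lambda}$, and this peak lies well
inside the prior range, the integral over $\lambda$ gives
$\delta\lambda = \sqrt{2\pi}\,\sigma_{\lambda}$.  For illustrative
values $\sigma_{\lambda} = 1$ and $\Delta\lambda = 100$ (in the units
of $\lambda$),
\begin{displaymath}
  \frac{\delta\lambda}{\Delta\lambda} = \frac{\sqrt{2\pi}}{100}
  \approx 0.025 \;,
\end{displaymath}
so a model with this extra parameter must raise the maximum likelihood
by a factor of about 40 before it is preferred over an otherwise
identical model without it.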

The preceding argument is readily extended to multiple dimensions.
If the model has more than one parameter, then, neglecting correlations
between parameters, there is a corresponding Occam factor for each
parameter,
\begin{equation} \label{eq:apprxpost}
  P(D|H_{\alpha},I) \approx P(D | \vec{\lambda}_{ML}, H_{\alpha}, I)
  \; \frac{\delta\lambda_{1}}{\Delta\lambda_{1}} \cdots
  \frac{\delta\lambda_{k}}{\Delta\lambda_{k}} \;,
\end{equation}
where $k$ is the number of parameters.

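The product of one-dimensional Occam factors is a special case of the
standard Laplace (Gaussian) approximation to the marginal likelihood.
As a sketch, assume the likelihood is approximately Gaussian near its
peak and the prior varies slowly across it, and for a model with $k$
parameters let $\Sigma$ be the $k \times k$ covariance matrix of the
peak, i.e., the inverse of the Hessian of
$-\ln P(D|\vec{\lambda}, H_{\alpha}, I)$ at $\vec{\lambda}_{ML}$.  Then
\begin{displaymath}
  P(D|H_{\alpha},I) \approx P(D | \vec{\lambda}_{ML}, H_{\alpha}, I) \;
  P(\vec{\lambda}_{ML}|H_{\alpha}, I) \; (2\pi)^{k/2} \,
  (\det\Sigma)^{1/2} \;.
\end{displaymath}
For a uniform prior with widths $\Delta\lambda_{j}$ and a diagonal
$\Sigma$ with entries $\sigma_{j}^{2}$, the last three factors reduce
to $\prod_{j=1}^{k} \sqrt{2\pi}\,\sigma_{j} / \Delta\lambda_{j}$, a
product of one-dimensional Occam factors with $\delta\lambda_{j} =
\sqrt{2\pi}\,\sigma_{j}$.  For fixed marginal widths $\sigma_{j}$,
correlations between parameters reduce $\det\Sigma$ and hence
strengthen the combined penalty.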

As a last point of emphasis, it is not enough to perform a parameter
estimation analysis, find that $\lambda_{i}$ is consistent with zero,
and on that basis rule out the model that includes $\lambda_{i}$.
Doing so would neglect the Occam factors that arise in Bayesian model
selection but are not present in a parameter estimation analysis, even
a Bayesian one.



\section{WHITE DWARF TRANSFORM} \label{sec:wdtrans}


As a conceptually trivial but relevant example of Bayesian model
selection for the LISA mission, consider the detection of a
supermassive black hole binary inspiral.  For black hole binaries with
component masses in the range $10^{4}$--$10^{7}~\textrm{M}_{\odot}$, LISA
will observe the binary evolution as the binary sweeps through
frequencies from $\sim\!0.01$~mHz up to a few millihertz (depending on
the actual masses).  In this same range of frequencies is the
gravitational wave background formed by the $\sim\!10^{8}$ solar-mass
binaries in our own galaxy.  As the black holes inspiral, their
detected signal will overlap with the collective galactic background
signal.  Moreover, at any instant the black hole binary looks
like a monochromatic binary.  That is, as a supermassive black hole
binary with a time to coalescence of $t_{c}$ sweeps past a galactic
binary of period $T$, the two signals overlap significantly for
an interval approximately equal to the geometric mean of $t_{c}$ and
$T$, $\sqrt{t_{c} T}$~\cite{Cornish:2005}.  Consequently, the black
hole inspiral signal may be decomposed into a population of
monochromatic galactic binaries.  Such a decomposition is often
referred to as a \textit{white dwarf transform}.

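For a sense of scale, take illustrative values of our own choosing:
a time to coalescence $t_{c} = 1$~yr $\approx 3.2 \times 10^{7}$~s and
a galactic binary signal with period $T = 10^{3}$~s (a frequency of
1~mHz, assuming $T$ refers to the gravitational wave period).  The
overlap interval is then
\begin{displaymath}
  \sqrt{t_{c} T} \approx \sqrt{3.2 \times 10^{10}~\textrm{s}^{2}}
  \approx 1.8 \times 10^{5}~\textrm{s} \;,
\end{displaymath}
or roughly two days.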

For a gravitational wave data analyst the task is to select which of
two models is more plausible.  The models under consideration are
\begin{eqnarray*}
  H_{WD} &=& \left( \begin{array}{l} \text{the detected signal is from
  a population} \\ \text{of monochromatic galactic binaries}
  \end{array} \right) \\
  H_{BH} &=& \left( \begin{array}{l} \text{the detected signal is from
  a single} \\ \text{supermassive black hole binary} \end{array}
  \right) \;.
\end{eqnarray*}
Model $H_{WD}$ is parameterized by $7N$ variables, where $N$ is the
number of binaries required to describe the apparent inspiral signal.
For an inspiral signal between $0.01$ and 1~mHz and a one-year
observation, $N$ is on the order of $10^{4}$, assuming one binary per
frequency bin\footnote{A frequency bin $\Delta f$ is equal to one over
the observation time, $\Delta f = T_{\mathrm{obs}}^{-1}$.  For a one-year
observation, which is used here, $\Delta f = 3.2 \times 10^{-8}$~Hz.}.
By contrast, model $H_{BH}$ is characterized by only seventeen parameters.

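The order-of-magnitude estimate for $N$ follows from dividing the
frequency span by the width of a bin.  Writing $T_{\mathrm{obs}}
\approx 3.16 \times 10^{7}$~s for one year of observation,
\begin{displaymath}
  N \approx \frac{f_{\max} - f_{\min}}{\Delta f} = \left(10^{-3} -
  10^{-5}\right)~\textrm{Hz} \times 3.16 \times 10^{7}~\textrm{s}
  \approx 3 \times 10^{4} \;,
\end{displaymath}
so $H_{WD}$ carries $7N \approx 2 \times 10^{5}$ parameters, compared
with seventeen for $H_{BH}$.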

Estimating the marginal likelihoods using
equation~\eqref{eq:apprxpost} quickly leads to the conclusion that the
large parameter space of the white dwarf population model
carries an overwhelming number of Occam factors.  These
Occam factors penalize the white dwarf population model and in turn
make its plausibility extremely low.  The black hole
model, on the other hand, has only seventeen Occam factors and
therefore is not as severely penalized.  Consequently, although an
ensemble of galactic binaries could conspire to look like a
supermassive black hole binary inspiral, the relative probability of
such a model is, for comparable fits to the data, many orders of
magnitude smaller than that of a model containing a single black hole
binary.

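The size of this penalty can be made explicit with a deliberately crude
assumption of our own: suppose every parameter in either model
contributes an Occam factor of roughly the same size, $f < 1$.  With
equal prior probabilities, equation~\eqref{eq:apprxpost} then gives
\begin{displaymath}
  O_{BH,WD} = \frac{P(D|H_{BH},I)}{P(D|H_{WD},I)} \approx
  \frac{P(D|\vec{\lambda}_{ML}^{BH}, H_{BH}, I)}
  {P(D|\vec{\lambda}_{ML}^{WD}, H_{WD}, I)} \; f^{17 - 7N} \;.
\end{displaymath}
Taking, for example, $f = 0.1$ and $N = 10^{4}$, the Occam factors alone
favor the black hole model by a factor of $10^{7N - 17} =
10^{69983}$, a deficit the white dwarf population model could
overcome only with an enormously better fit to the data.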



\section{CONCLUDING REMARKS} \label{sec:conclusions}


The white dwarf transform is an obvious application of Bayesian model
selection.  More informative and interesting examples include using
Bayesian model selection as a criterion for deciding when a signal is
present in the data; characterizing detected but complicated signals
that have low signal-to-noise ratios; and counting the number of
detectable galactic binaries within the larger population.  The first
application simply answers the question: when does the data
justify declaring a detection for a particular waveform?  The second
application is concerned with determining the information content of a
weak signal, that is, which features of an emitting system are
actually measurable and which are lost to the noise.  Counting
the number of detectable galactic binaries is one of the few Bayesian
model selection applications in the LISA
literature~\cite{Umstatter:2005a, Stroeer:2006}.  Embedded within
Reversible Jump Markov Chain Monte Carlo techniques is the use of odds
ratios in deciding the number of galactic binaries that are
detectable.


In general, Bayesian model selection gives a logical and quantitative
approach to directly comparing competing models.  By using a model
selection procedure we are able to maximize the amount of information
we can extract from LISA's data.  The most plausible model is the one
that is most justified by the data and our state of knowledge
prior to the experiment.  As progress is made in the development of
LISA analysis routines, it is conceivable that Bayesian approaches will
become a central tool.




%==== Acknowledgments =============================

\begin{theacknowledgments}
The author would like to thank Edward Cazalas, Matthew Francis, and
Deirdre Shoemaker for a number of helpful discussions, and Lee
Samuel Finn for introducing the author to the Bayesian approach and
for guidance on a number of its subtler points.  This work was
supported by the Center for Gravitational Wave Physics.  The Center
for Gravitational Wave Physics is funded by the National Science
Foundation under cooperative agreement PHY 01-14375.
\end{theacknowledgments}



%==== Bibliography ================================

\bibliographystyle{aipproc}
\bibliography{References}


\end{document}
