0412:astro-ph0412703/ms.tex

1: %\documentclass[manuscript,onecolumn]{aastex}

2: \documentclass[manuscript,onecolumn]{emulateapj}

3: %\usepackage{emulateapj}

4:

5: %\documentclass[preprint,11pt]{aastex}

6:

7: \newcommand{\be}{\begin{equation}}

8: \newcommand{\ee}{\end{equation}}

9:

10: \slugcomment{accepted to AJ}

11: \shorttitle{Adaptive Scheduling}

12: \shortauthors{Ford}

13: \begin{document}

14:

15: \title{Adaptive Scheduling Algorithms for Planet Searches}

16:

17: \author{Eric B.\ Ford\altaffilmark{1,2,3,4,5}}

18:

19: %\email{eford@astro.ufl.edu}

20: \altaffiltext{1}{Department of Astrophysical Sciences,

21: 	Princeton University,

22: 	Peyton Hall,

23: 	Princeton, NJ 08544-1001, USA}

24: \altaffiltext{2}{Astronomy Department,

25: 	601 Campbell Hall,

26: 	University of California at Berkeley,

27: 	Berkeley, CA 94720-3411, USA}

28: \altaffiltext{3}{Harvard-Smithsonian Center for Astrophysics,

29:         MS-51,

30: 	60 Garden Street,

31: 	Cambridge, MA 02138, USA}

32: \altaffiltext{4}{Hubble Fellow}

33: \altaffiltext{5}{present address: Department of Astronomy,

34:         University of Florida,

35: 	211 Bryant Space Science Center,

36: 	P.O. Box 112055,

37: 	Gainesville, FL, 32611-2055, USA}

38:

39:

40:

41: \begin{abstract}

42: High-precision radial velocity planet searches have surveyed

43: over $\sim\!2000$ nearby stars and detected over $\sim\!200$ planets.  While

44: these same stars likely harbor many additional planets, they will

45: become increasingly challenging to detect, as they tend to have

46: relatively small masses and/or relatively long orbital periods.

47: Therefore, observers are increasing the precision of their

48: observations, continuing to monitor stars over decade timescales, and

49: also preparing to survey thousands more stars.  Given the considerable

50: amounts of telescope time required for such observing programs, it is

51: important to use the available resources as efficiently as possible.

52: Previous studies have found that a wide range of predetermined

53: scheduling algorithms result in planet searches with similar

54: sensitivities.  We have developed adaptive scheduling algorithms which

55: have a solid basis in Bayesian inference and information theory and

56: also are computationally feasible for modern planet searches.  We have

57: performed Monte Carlo simulations of plausible planet searches to test

58: the power of adaptive scheduling algorithms.  Our simulations

59: demonstrate that planet searches performed with adaptive scheduling

60: algorithms can simultaneously detect more planets, detect less massive

61: planets, and measure orbital parameters more accurately than

62: comparable surveys using a non-adaptive scheduling algorithm.  We

63: expect that these techniques will be particularly valuable for the

64: N2K radial velocity planet search for short-period planets as well as

65: future astrometric planet searches with the Space Interferometry

66: Mission which aim to detect terrestrial mass planets.

67: \end{abstract}

68:

69: \keywords{Subject headings: planetary systems -- methods: statistical

70: -- techniques: radial velocities}

71:

72:

73: \section{Introduction}

74:

75: Radial velocity planet searches have surveyed over 2000 nearby solar-type

76: stars and discovered over 200 planets.  The surveys require many

77: high-precision radial velocity observations of each star in the

78: survey, and hence a significant amount of observing time.

79: %

80: For example, the N2K project has recently begun surveying the next

81: $\sim 2000$ nearby stars for planets.  This project aims to discover

82: dozens of hot Jupiters and hopefully additional transiting planets

83: (Fischer et al.\ 2004).  Given the large target list for the N2K

84: project, it is essential that inferences about the presence of planets

85: and their orbits be made as efficiently as possible.  The

86: observing program aims to take three observations of each star on

87: consecutive nights, followed by an additional observation one to a few

88: months later.  In many cases, it is clear after a few observations

89: that the radial velocity observations have a dispersion significantly

90: greater than would be expected due to measurement errors only.

91: However, there may still be a large range of possible orbital

92: solutions, and several additional observations will often be

93: required to determine the planetary orbits.  Given the considerable

94: observation time required for such planet searches and the value of

95: telescope time, it is important that these surveys be as

96: efficient as possible.

97:

98: Previous studies have demonstrated that a variety of observing

99: schedules result in comparable efficiencies for detecting planets

100: (Sozzetti 2002, Ford 2004).  However, these studies have only

101: considered {\em non-adaptive } observing schedules, i.e., schedules

102: that are fully determined before any observations are taken.  In this

103: paper, we develop and test the efficiency of {\em adaptive}

104: scheduling algorithms.  We describe how adaptive scheduling algorithms

105: can help increase the efficiency of planet searches by utilizing

106: the information available from the previous observations to plan future

107: observations.

108:

109: % It is possible to develop adaptive scheduling algorithms based on the frequentist methods of sequential statistics (see Wetherill \& Glazebrook 1986).  For example, a planet could be considered detected when a $\chi^2$ statistic exceeds a threshold which depends on the number of observations made.  Unfortunately, the thresholds typically must be determined via simulation.  Since one must choose a threshold function (and not just a constant threshold as in the case of non-sequential statistics), there are an infinite number of threshold functions which have the same false alarm rate.  One must use large simulations to estimate the power of various threshold functions. More importantly, this approach typically relies on making decisions (e.g., Is the null hypothesis rejected?  Should we stop observing this star?) that introduce biases which much should be fully understood before making inferences.  In practice, large simulations are necessary to understand these biases.  Finally, with this type of adaptive scheduling algorithm it is not easy to test new hypotheses which were formulated after some data has been collected.

110:

111: Adaptive scheduling algorithms can be based on Bayesian

112: inference (Loredo 2004).  Within this framework, the need to make decisions is

113: minimized.  For example, the experimenter does not say that a planet

114: has been detected, but rather states that the posterior probability for

115: the null hypothesis is some small value.  This eliminates the need to

116: choose threshold functions.  More importantly, by eliminating

117: decisions and basing the scheduling algorithm on the posterior

118: probability distribution, the posterior probability distribution is

119: not biased by the choice of observing schedule.  Additionally, it is

120: straightforward to test new hypotheses which will inevitably be

121: formulated after making some observations.  For these reasons, we have

122: developed adaptive scheduling algorithms within the Bayesian

123: framework, always considering all possible models (weighted according

124: to their posterior probability).

125:

126: It should be noted that the use of adaptive scheduling algorithms ---

127: even those based on Bayesian inference--- does affect the distribution

128: of the posterior distributions.  Indeed, that is the purpose of

129: employing such algorithms.  For example, if an adaptive scheduling

130: algorithm is chosen to increase the sensitivity of an observing

131: program for detecting planets, then it is expected that an ensemble of

132: surveys employing the adaptive scheduling algorithm will result in

133: detecting more planets than a comparable ensemble of surveys using a

134: fixed scheduling algorithm.  It should be noted that this is also true

135: of non-adaptive scheduling algorithms.  For example, an observing

136: program which makes observations over a long duration will be

137: sensitive to planets with orbital periods comparable to or less than the

138: survey duration, but there may be ambiguities in determining the

139: orbital period associated with aliasing.  If an observing program

140: obtained the same number of observations with the same precision, but

141: performed all the observations during a smaller interval of time, then

142: the shorter survey would be less sensitive to planets with orbital

143: periods longer than the shorter survey's duration, but would likely

144: reduce aliasing ambiguities for short period planets.  Thus, when

145: analyzing the properties of a population of stars and planets, it is

146: always important to account for the scheduling algorithm used.

147:

148: Our approach applies the principles of Bayesian inference and

149: information theory to guide the choice of observation times (and later

150: choice of target stars).  First, following Loredo (2004), we assume a prior

151: for both the probability that each target star has a planet and the

152: distribution of orbital periods and masses of planets.  Second, a

153: small number of observations are taken of each target star.  Bayesian

154: inference is used to calculate the posterior probability distribution

155: for all model parameters.  Then, the posterior probability

156: distribution for the model parameters is used to calculate the

157: predictive distribution, the posterior probability distribution for

158: the radial velocities at some future time.  By comparing the

159: information contained in the predictive distribution at various future

160: times, it is possible to choose observing times at which additional

161: observations would be most valuable.  This technique requires

162: performing integration over several variables and in general is

163: extremely computationally intensive.  In this paper we describe a

164: relatively fast algorithm for performing the necessary integrations.

165:

166: %

167: % Loredo (2004) described such an algorithm and presented an example of how adaptive scheduling algorithms could result in improving the precision of orbital parameters more rapidly than the usual scaling law, the inverse square root of the number of observations.

168:

169: While it would be desirable to observe each star in a

170: survey at the optimal times for each particular star, it is more

171: realistic to consider a planet survey which is able to perform a fixed

172: number of observations at specific times.  In this case, the

173: information contained in the predictive distributions for several

174: stars can be used to select which target star should be observed at a

175: given time.

176:

177: We describe the algorithm for choosing observing times of a single

178: star in \S2.  In \S3 we describe a generalized algorithm that

179: allows for multiple target stars,

180: along with simulations which demonstrate the power of adaptive

181: scheduling algorithms.  In \S4 we present generalizations that allow

182: observing schedules to be optimized for meeting specific goals,

183: such as detecting planets in the habitable zone or maximizing the

184: number of planets detected.  Finally, we discuss the implications of

185: our findings and the challenges which remain in \S5.

186:

187:

188: \section{Adaptive Scheduling for a Single Target Star}

189: \label{SecSingleStar}

190:

191: \subsection{Priors}

192: \label{SecPriors}

193:

194: For each target star we assume a prior probability, $p(0)$, for the

195: null hypothesis that the radial velocity observations are consistent

196: with a constant radial velocity.  We assume a probability,

197: $p(1)=1-p(0)$, that the star has a single planet.  For the orbital

198: parameters of any planet, we take prior distributions that are flat in

199: $\log P$, $\log K$, and $\phi_o$, where $P$ is the orbital period, $K$

200: is the velocity semi-amplitude, and $\phi_o$ is the phase at a

201: given epoch.  These choices are standard for variables which represent

202: magnitudes and angles.  We limit the range of orbital parameters to

203: $P_{\min} \le P < P_{\max}$, $K_{\min} \le K < \infty$, and $0 \le \phi_o < 2

204: \pi$ (see \S2.3).  The choices for the prior distributions are supported by

205: scaling arguments as well as their approximate agreement with the

206: orbital parameters for the known extrasolar planets.

207:

208: There are a few differences in our prior distributions and the

209: distributions of orbital elements of known extrasolar planets.  First,

210: we apply a sharp cutoff for orbital periods less than $P_{\min}$.  The

211: OGLE transit searches have discovered planets with orbital periods as

212: short as $1.2$d (Konacki 2003).  While radial velocity searches are

213: very sensitive to such planets, the shortest orbital period discovered

214: by a radial velocity survey is 2.5d (Udry et al.\ 2003).  It is

215: important to recognize that the OGLE transit search surveys a much larger

216: number of stars than radial velocity surveys (Gaudi, Seager, \&

217: Mallen-Ornelas 2004).  Thus, the observations imply that the

218: distribution of orbital periods is roughly flat in $\log P$ for $P \ge

219: 3$d, but there is a very significant reduction in the number of

220: planets at shorter orbital periods (Gaudi, Seager, \&

221: Mallen-Ornelas 2004).  Therefore, it would be reasonable

222: to apply a cutoff for orbital periods $P < P_{\min}$ for any $P_{\min} <

223: 3$d, and we choose $P_{\min} = 2.5$d.  Given recent discoveries, we

224: do not suggest such a large $P_{\min}$ for future studies.

225:

226: %Additionally, it is worth noting that if $P_{\min}\le 1$d, then there

227: %can be a large range of possible orbital solutions with orbital

228: %periods very near one day.  Such planets are difficult for radial

229: %velocity surveys to detect when there are only a few closely spaced

230: %observations.  However, it is important to note that the possibility

231: %of planets with orbital periods very near one day can be well

232: %constrained by observing the systems again after a significant

233: %fraction of a year.  Therefore, when $T_{\mathrm obs} < 100$d, and

234: %$P_{\min} = 1$d when $T_{\mathrm obs} \ge 100$d.

235:

236: We also apply a sharp cutoff for orbital periods greater than

237: $P_{\max}$.  When a planet has an orbital period much longer than

238: the time span of the observations, $T_{\mathrm{obs}}$, there are degeneracies in the Keplerian

239: orbital parameters and the radial velocities can be well modeled by a

240: quadratic polynomial (Cumming 2004).  By replacing all the Keplerian

241: models that have $P \ge \pi T_{\mathrm{obs}}$ with a single quadratic model,

242: it is possible for our algorithm to detect planets with orbital

243: periods $P \ge \pi T_{\mathrm{obs}}$ efficiently.  We maintain the

244: flat prior in $\log P$ by setting the prior probability for the

245: quadratic model equal to the sum of the prior probabilities for

246: orbital periods in the range $\pi T_{\mathrm{obs}} < P < P_{\max}$.

247: Since the prior distribution is flat in $\log P$, this relative prior

248: probability is not sensitive to the exact choice of $P_{\max}$.  The

249: effect of varying $P_{\max}$ is to change the prior probability for a

250: planet having a long-period orbit relative to the prior probability

251: for a planet having an orbital period between $P_{\min}$ and

252: $P_{\max}$.  Since little is known about the abundance of extrasolar

253: planets with orbital periods greater than $\sim 10$yr, we choose

254: $P_{\max} = 40$yr guided by our own solar system.

255:
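
As an illustration of how the quadratic model inherits its share of the log-flat period prior, the following minimal Python sketch evaluates the fraction of prior probability assigned to $\pi T_{\mathrm{obs}} \le P < P_{\max}$.  The function name and the 100-day baseline are hypothetical choices made only for this example; the default limits are the values of $P_{\min}$ and $P_{\max}$ adopted above.

```python
import math

def long_period_prior_fraction(t_obs, p_min=2.5, p_max=40 * 365.25):
    """Fraction of a prior flat in log P that lies in pi*T_obs <= P < P_max.

    All times are in days.  The defaults follow the text (P_min = 2.5 d,
    P_max = 40 yr); the function itself is an illustrative assumption.
    """
    p_split = math.pi * t_obs
    if not (p_min < p_split < p_max):
        raise ValueError("pi * t_obs must lie between p_min and p_max")
    # For a log-flat prior, probability is proportional to the width in log P.
    return math.log(p_max / p_split) / math.log(p_max / p_min)

# Hypothetical survey baseline of 100 days.
fraction = long_period_prior_fraction(100.0)
print(f"prior fraction given to the quadratic model: {fraction:.3f}")
```

Because the fraction depends on $P_{\max}$ only logarithmically, moderate changes in $P_{\max}$ shift it slowly.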

256: Another difference between our prior distributions and the

257: distributions of orbital elements for known extrasolar planets is that

258: we assume the planetary orbits are circular, i.e., the orbital

259: eccentricity, $e$, is zero.

260: %

261: Since a circular orbit approximates a Keplerian orbit with a small

262: eccentricity, our algorithm is expected to identify planets with small

263: and even moderately eccentric orbits.  However, the efficiency for

264: detecting planets on moderately eccentric orbits may be somewhat

265: reduced compared to the efficiency for detecting a planet on a

266: circular orbit with comparable mass and orbital period.  The

267: reduction in efficiency is relatively mild for $e<0.4$, but rapidly

268: becomes more significant (Endl et al.\ 2002; Cumming 2004).

269: %

270: While many of the known extrasolar planets are on significantly

271: eccentric orbits, the planets with shorter orbital periods tend to

272: have smaller eccentricities.  This is likely due in part to tidal

273: circularization affecting planets with small orbital periods (Rasio,

274: Livio, \& Tout 1996).  While the assumption of circular orbits is

275: likely appropriate for many planets (especially short-period planets

276: targeted by the N2K project), there is no question that it would be

277: more desirable to include eccentricities.  Still, the assumption of

278: circular orbits permits significant computational advantages making it

279: extremely attractive when a large range of parameter space must be

280: searched (e.g., for planetary orbits which are poorly constrained by

281: the currently available data).  The reduction in computational

282: requirements makes it computationally feasible to explore the

283: properties of adaptive scheduling algorithms (as in this paper).

284:

285:

286:  \subsection{Initial Observations}

287:

288: Before making observations of a target star, there is no basis for

289: believing that the star is more or less likely to have a planet or

290: that various orbital parameters are preferred, except what is

291: suggested by the prior probability distribution.  Therefore, an

292: initial set of $N_{\min}$ observations is made for each star.  When

293: the number of observations, $N_{\mathrm{obs}}$, is less than $N_{\min}$,

294: the choices for when to observe are not affected by previous

295: observations; however, these choices may be affected by practical

296: considerations.  For example, observations must be made at night and

297: radial velocity surveys are typically allocated observing time near full

298: Moon.  Additionally, the airmass and atmospheric conditions in the

299: direction of each target may favor observing certain targets at

300: certain times during the available observing nights.

301:

302: %The N2K project

303: %aims to make three observations on consecutive (or nearly consecutive)

304: %nights and a fourth observation a few months later.  The first three

305: %observations provide valuable information for constraining possible

306: %hot Jupiters, and the fourth observation provides much improved

307: %sensitivity to planets with longer orbital periods.

308:

309: Under the null hypothesis, there is a single fit parameter for the

310: constant velocity of the star, $C_0$, and the star's velocity is given

311: by $v_{*,C_0}(t) = C_0$.  Next, we consider the alternative hypothesis

312: that the star has a single planet in a circular orbit.  There are four

313: parameters which can be varied to fit the velocity observations of

314: each star, $P$, $K$, $\phi_o$, and $C_1$, where $C_1$ is the constant

315: velocity of the star.  (Given the way the star's velocity is measured,

316: it is typically necessary to use different values of $C_1$ for

317: different observatories.  Thus, when there are only a small number of

318: observations of a given target star, it is extremely advantageous if

319: all the observations are made from a single observatory.  For the

320: purposes of this paper, we assume that all radial velocity

321: observations are made with a single observatory.)  The radial velocity

322: signature of a planet on a circular orbit can be written as

323: %

324: \begin{equation}

325: %

326: v_{*,\vec{x}}(t) = K \cos \left[ \frac{2\pi}{P}t + \phi_o \right]  + C_{1}.

327: %

328: \end{equation}

329: %

330: %  Also, since we assume circular orbits, $\omega$ and $M_o$ can not be determined separately, but only in the combination $\phi_o$.

331: %

332: After a set of $N_{\min}=3$ initial observations, it is possible to

333: evaluate the plausibility of the fit parameters, $P$, $K$,

334: $\phi_o$, and $C_1$.  Since each observation has some

335: observational uncertainties, the orbit still is not uniquely

336: determined, even if our general model is exactly correct.

337:

338: As noted in \S\ref{SecPriors}, when a planet has an orbital period

339: much longer than the time span of observations, then the fit

340: parameters used above are not well determined by the radial velocity

341: observations.  Therefore, for orbital periods $\pi T_{\mathrm{obs}} \le P \le P_{\max}$, we model the radial velocity of the star as

342: %

343: \begin{equation}

344: %

345: v_{*,\vec{a}}(t) = a_0 + a_1 t + a_2 t^2,

346: %

347: \end{equation}

348: %

349: where $\vec{a} = \left( a_0, a_1, a_2 \right)$ is the set of

350: coefficients for the polynomial model.

351:
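
The three velocity models introduced so far can be written as short functions.  The Python sketch below is only an illustration: the function names are hypothetical, and the rule in \texttt{choose\_planet\_model} simply restates the split at $P = \pi T_{\mathrm{obs}}$ described above.

```python
import math

def rv_null(t, c0):
    """Null hypothesis: constant stellar velocity C_0."""
    return c0

def rv_circular(t, period, k, phi0, c1):
    """Single planet on a circular orbit: K cos(2 pi t / P + phi_o) + C_1."""
    return k * math.cos(2.0 * math.pi * t / period + phi0) + c1

def rv_quadratic(t, a0, a1, a2):
    """Long-period planet: quadratic polynomial a_0 + a_1 t + a_2 t^2."""
    return a0 + a1 * t + a2 * t * t

def choose_planet_model(period, t_obs):
    """Pick the planet model: sinusoid for P < pi * T_obs, quadratic otherwise."""
    return "circular" if period < math.pi * t_obs else "quadratic"

# Example: a 3-day orbit with K = 50 m/s observed at t = 0.75 d.
print(rv_circular(0.75, period=3.0, k=50.0, phi0=0.0, c1=0.0))
print(choose_planet_model(period=3.0, t_obs=100.0))
```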

352:

353: \subsection{Inference}

354:

355: \label{SecInference}

356:

357: Once $N_{\mathrm{obs}} \ge N_{\min}$, we analyze the available

358: observations using the methods of Bayesian statistics after making

359: each new observation.  The results of the analysis can be used to make

360: informed choices for when stars should be targeted for additional

361: observations.  Let $\vec{d}$ denote the set of available data, in this

362: case the previous radial velocity observations.  We have already

363: introduced the prior probabilities for the null hypothesis, $p(0)$,

364: and for the single planet model, $p(1)$, as well as the prior

365: probability distribution for orbital parameters, $p(\vec{x}|1)$.

366: Next, we introduce the conditional probability for the observations

367: given the null hypothesis, $p( \vec{d} | 0 )$, and the conditional

368: probability for the observations given a fixed set of model

369: parameters, $p(\vec{d} | \vec{x} )$.  Since the observational errors

370: are assumed to be independent, both conditional probabilities can be

371: simply evaluated as the product of the probabilities for drawing each

372: observation given the relevant model for the stellar velocity.  Since

373: each radial velocity measurement is obtained by averaging the Doppler

374: shift measured for hundreds of spectral lines, the observational

375: uncertainties are very well approximated by a normal distribution, and

376: the conditional probabilities are given by

377: %

378: \begin{eqnarray}

379: p(\vec{d} | \vec{m} )

380: & = & \prod_i p(d_i | \vec{m})

381: = \prod_i \frac{\exp{\left[-\frac{\left(d_i - v_{*,\vec{m}}(t_i)\right)^2}{2\sigma_i^2} \right]}}{\sqrt{2 \pi} \sigma_i} \\

382: & = & \frac{\exp \left[\frac{-1}{2} \sum_i \left(\frac{d_i-v_{*,\vec{m}}(t_i)}{\sigma_i}\right)^2 \right]}{\left(2\pi\right)^{N_{\mathrm{obs}}/2} \prod_i \sigma_i}

383: \equiv \frac{\exp \left[ \frac{-\chi^2(\vec{m})}{2}\right]}{\left(2\pi\right)^{N_{\mathrm{obs}}/2} \prod_i \sigma_i},

384: \end{eqnarray}

385: %

386: where $\vec{m}$ represents the generalized model parameters, i.e.,

387: either $\vec{m}=(C)$ (the null hypothesis model), $\vec{m}=\vec{x}$ (the single

388: planet model with $P < \pi T_{\mathrm{obs}}$), or $\vec{m}=\vec{a}$ (the

389: polynomial model for a planet with $\pi T_{\mathrm{obs}} \le P \le

390: P_{\max}$).  Each individual observation, $d_i$, is made at a time,

391: $t_i$, and has an observational uncertainty, $\sigma_i$.  Since the

392: observational uncertainties are nearly Gaussian and assumed to be

393: independent, the conditional probability for all the

394: available observations depends on the model parameters only through the

395: goodness-of-fit statistic, $\chi^2(\vec{m})$, which can be

396: easily computed for each set of model parameters, $\vec{m}$.

397:
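
The likelihood above is straightforward to evaluate numerically.  The following Python sketch computes $\chi^2(\vec{m})$ and $\log p(\vec{d} | \vec{m})$ for independent Gaussian errors; the function names and the convention of passing the velocity model as a callable are assumptions made for this illustration.

```python
import math

def chi_squared(times, data, sigmas, model):
    """Sum of squared, error-normalized residuals for a velocity model v(t)."""
    return sum(((d - model(t)) / s) ** 2 for t, d, s in zip(times, data, sigmas))

def log_likelihood(times, data, sigmas, model):
    """log p(d | m) for independent Gaussian errors."""
    n_obs = len(data)
    # Normalization: -(N_obs / 2) log(2 pi) - sum_i log(sigma_i).
    log_norm = -0.5 * n_obs * math.log(2.0 * math.pi) - sum(math.log(s) for s in sigmas)
    return log_norm - 0.5 * chi_squared(times, data, sigmas, model)

# Example: two observations compared with a constant-velocity model.
times, data, sigmas = [0.0, 1.0], [1.0, 3.0], [1.0, 2.0]
print(chi_squared(times, data, sigmas, lambda t: 1.0))
print(log_likelihood(times, data, sigmas, lambda t: 1.0))
```

Working with the logarithm avoids numerical underflow when $\chi^2$ is large.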

398: Next, we introduce terminology from Bayesian statistics, $p(\vec{d},

399: 0)$, the joint probability for the observations and the null

400: hypothesis, and $p(\vec{d}, \vec{x})$, the joint probability for the

401: observations and the single planet hypothesis with a particular set of

402: model parameters, $\vec{x}$.  The joint probabilities can be written

403: as the product of the prior probability and the conditional

404: probability, e.g., $p(\vec{d}, 0) = p(0) p(\vec{d} | 0)$.  We will

405: also use Bayes' theorem, which states that

406: %

407: \begin{equation}

408: p(\vec{m} | \vec{d})

409:  = \frac{p(\vec{d}, \vec{m})}{p(\vec{d})}

410:  = \frac{p(\vec{m}) p(\vec{d} | \vec{m})}{\int d\vec{m} \, p(\vec{m}) p(\vec{d} | \vec{m})}.

411: \end{equation}

412: %

413: We use the joint probabilities and Bayes' theorem to compute the

414: posterior probabilities which incorporate both the prior probabilities

415: and the information contained in the observations, $\vec{d}$.  For

416: example, the posterior probability for the null hypothesis is

417: %

418: \begin{eqnarray}

419: p( 0 | \vec{d} ) &

420: = & \frac{ p(\vec{d}, 0) }{p(\vec{d}, 0) + \int d\vec{x} \, p(\vec{d}, \vec{x}) + \int d\vec{a} \, p(\vec{d}, \vec{a}) } \\

421: & = & \frac{\int dC \, p(C) p(\vec{d}| C) }{\int dC \, p(C) p(\vec{d}| C)  + \int d\vec{x} \, p(\vec{x}) p(\vec{d} | \vec{x}) + \int d\vec{a} \, p(\vec{a}) p(\vec{d} | \vec{a}) }  \\

422: & = & \frac{p(0) \int dC \, p(C | 0) p(\vec{d}| C) }{p(0) \int dC \, p(C | 0) p(\vec{d}| C)  + p(1) \int d\vec{x} \, p(\vec{x} | 1 ) p(\vec{d} | \vec{x}) + p(1) \int d\vec{a} \, p(\vec{a} | 1 ) p(\vec{d} | \vec{a}) },

423: \end{eqnarray}

424: %

425: where $p(C | 0)$ is the prior distribution (flat) for the constant velocity

426: given the null hypothesis, $p(\vec{x} | 1)$ is the prior distribution

427: for the sinusoidal fit parameters given that a planet is present, and

428: $p(\vec{a} | 1)$ is the prior distribution for the polynomial fit

429: parameters given that a planet is present.

430:

431: Unfortunately, the integrals, and particularly the integral over

432: $\vec{x}$ in the denominator, can be extremely difficult to evaluate.

433: In particular, $\chi^2(\vec{m})$ and hence $p(\vec{d} | \vec{x} )$

434: can be extremely ``bumpy'' functions (Ford 2005).  It is computationally

435: impractical to actually calculate $\chi^2(\vec{x})$ over the

436: entire range of parameter space with sufficient resolution to

437: approximate the integral accurately.  Therefore, we must find a way to

438: approximate the integrals in a computationally efficient manner.

439:

440: When the orbital parameters are well constrained, then the integral is

441: typically dominated by the contribution from a small region of

442: parameter space near the best-fit solution and the integrals are

443: easily evaluated.  Even when the orbital parameters are somewhat less

444: constrained, the method of Markov chain Monte Carlo provides a

445: powerful tool for evaluating the necessary integrals.  However, even

446: Markov chain Monte Carlo is not computationally practical when the observations still

447: permit a wide variety of distinct orbital solutions (e.g., when there

448: are only a small number of observations).  Since the N2K

449: project is particularly interested in working with small data sets, we

450: expect that it will frequently be necessary to analyze radial velocity

451: observations which only provide limited constraints on the orbital

452: parameters, given the typical planetary masses, orbital periods, and

453: measurement errors.  Therefore we have developed an efficient

454: algorithm for approximating the necessary integrals.

455:

456: % Rather than approximating the integral by integrating over the region of parameter space near the best-fit solution, we approximate the integral over $\vec{x}$ by summing the contributions from many regions of parameter space, each region centered on one of the many local maxima in $p(\vec{d} | \vec{x})$.

457: %

458: While the integrand typically has numerous local maxima which can be

459: spread across a wide range of orbital periods, if the orbital period

460: is held fixed at $P$, then the integral over the remaining fit

461: parameters is dominated by the contribution from the single maximum

462: (assuming a circular orbit).  Therefore, we approximate the integral

463: by separating the integral over orbital period, $P$, from the

464: integrals over the remaining fit parameters, $\vec{x}_P$.  We sum the

465: contributions to the integral from each of the regions around the

466: best-fit solutions for each orbital period.  Thus, for the purposes of

467: computation, we replace the integral over orbital period with a

468: summation and will approximate the integrals over $\vec{x}_P$, giving

469: %

470: \be

471: p( 0 | \vec{d} ) = \frac{ p(0) \int dC \, p(C | 0) p(\vec{d}| C) }{p(0) \int dC \, p(C | 0) p(\vec{d}| C)  + p(1) \sum_i  \Delta \log P_i \int d\vec{x}_{P_i} \, p(\vec{x} | 1 ) p(\vec{d} | \vec{x}) + p(1) \int d\vec{a} \, p(\vec{a} | 1 ) p(\vec{d} | \vec{a})},

472: \ee

473: %

474: where $\Delta \log P_i$ is the spacing between the logarithm of successive orbital

475: periods, and $\vec{x}_P$ is the set of fit parameters excluding the

476: orbital period, $P$.  Similarly, the posterior probability for a planet with orbital period near $P$ is given by

477: %

478: \be

479: p( P | \vec{d} ) \Delta \log P = \frac{p(1) \Delta \log P \int d\vec{x}_{P} \, p(\vec{x} | 1 ) p(\vec{d} | \vec{x}) }{p(0) \int dC \, p(C | 0) p(\vec{d}| C)  + p(1) \sum_i  \Delta \log P_i \int d\vec{x}_{P_i} \, p(\vec{x} | 1 ) p(\vec{d} | \vec{x}) + p(1) \int d\vec{a} \, p(\vec{a} | 1 ) p(\vec{d} | \vec{a})},

480: \ee

481: %

482: and the posterior probability for a planet with an orbital period greater than

483: $\pi T_{\mathrm{obs}}$ is

484: %

485: \be

486: p( P\ge \pi T_{\mathrm{obs}} | \vec{d} ) = \frac{p(1) \int d\vec{a} \, p(\vec{a} | 1 ) p(\vec{d} | \vec{a}) }{p(0) \int dC \, p(C | 0) p(\vec{d}| C)  + p(1) \sum_i  \Delta \log P_i \int d\vec{x}_{P_i} \, p(\vec{x} | 1 ) p(\vec{d} | \vec{x}) + p(1) \int d\vec{a} \, p(\vec{a} | 1 ) p(\vec{d} | \vec{a})  }  .

487: \ee

488: %

489: Clearly, the posterior probability that a star has a planet with any

490: orbital period is simply

491: %

492: \be

493: p( 1 | \vec{d} ) = \sum_i p( P_i | \vec{d} ) \Delta \log P_i + p(P\ge\pi T_{\mathrm{obs}} | \vec{d}) = 1 - p( 0 | \vec{d} ).

494: \ee

495: %

496:
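
Once the individual integrals have been evaluated, the posterior probabilities above follow from a single normalization.  The Python sketch below shows one way this bookkeeping might be done in log space; the function name and argument layout are hypothetical, and any normalization of the priors is assumed to be already included in the supplied log-integrals.

```python
import math

def posterior_summary(p0, log_i_null, log_i_periods, dlogp, log_i_quad):
    """Combine per-model integrals into posterior probabilities.

    p0            : prior probability of the null hypothesis, 0 < p0 < 1
    log_i_null    : log of the integral over C for the null hypothesis
    log_i_periods : logs of the integrals over x_P, one per trial period
    dlogp         : weights Delta log P_i, one per trial period
    log_i_quad    : log of the integral over a for the quadratic model
    Returns (p(0|d), [p(P_i|d) Delta log P_i], p(P >= pi T_obs | d)).
    """
    p1 = 1.0 - p0
    log_terms = [math.log(p0) + log_i_null]
    log_terms += [math.log(p1) + math.log(w) + li
                  for w, li in zip(dlogp, log_i_periods)]
    log_terms.append(math.log(p1) + log_i_quad)
    # Subtract the largest term before exponentiating to avoid underflow.
    shift = max(log_terms)
    weights = [math.exp(lt - shift) for lt in log_terms]
    total = sum(weights)
    post = [w / total for w in weights]
    return post[0], post[1:-1], post[-1]

# Toy numbers chosen only to exercise the function.
p_null, p_periods, p_long = posterior_summary(
    0.5, 0.0, [0.0, math.log(3.0)], [0.5, 0.5], 0.0)
print(p_null, p_periods, p_long)
```

By construction the returned probabilities sum to unity, so $p(1 | \vec{d}) = 1 - p(0 | \vec{d})$ holds automatically.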

497: For each orbital period, $P$, we must approximate each of the

498: integrals over $\vec{x}_P$.  Since the prior, $p(\vec{x} | 1, P_i)$, is

499: flat, we expand the argument of the exponential, $\chi^2(\vec{x}_P |

500: P)$, about its minimum ($P$ is held fixed).  Since we expand about a

501: minimum, the first derivatives of $\chi^2$ with respect to the

502: variables in $\vec{x}_P$ vanish.  Therefore, to second order the $\chi^2$ surface is a

503: quadratic function centered on the minimum, and we can approximate the

504: integral by extending the limits of integration to infinity.  The

505: resulting multidimensional Gaussian integral can then be evaluated

506: analytically, using only the value of $\chi^2$ at its minimum,

507: $\min_{\vec{x}_P} \chi^2(\vec{x}_P | P)$, and the determinant of the

508: covariance matrix, $\mathrm{Covar}\left(\chi^2(\vec{x}_P | P)\right)$,

509: as

510: %

511: \be

512: \int d\vec{x}_{P} \, p(\vec{x}_P, P | 1 ) p(\vec{d} | \vec{x}) \simeq

513: \frac{\sqrt{\mathrm{Det} \left|\mathrm{Covar}\left(\chi^2(\vec{x}_P | P)\right)\right|}}{\left(2\pi\right)^{\nu/2} \prod_i \sigma_i } \exp \left[ -\frac{1}{2} \min_{\vec{x}_P} \chi^2(\vec{x}_P | P)\right],

514: \ee

515: %

516: where $\nu = N_{\mathrm{obs}} - N_{\mathrm{fit}}$ and here $N_{\mathrm{fit}}=3$  (Sivia 1996; Cumming 2004).  In a similar

517: way, the integral over $\vec{a}$ can be approximated by

518: %

519: \be

520: \int d\vec{a} \, p(\vec{a} | 1) p(\vec{d} | \vec{a}) \simeq

521: \frac{\sqrt{\mathrm{Det} \left| \mathrm{Covar}\left(\chi^2(\vec{a})\right)\right|}}{\left(2\pi\right)^{\nu/2} \prod_i \sigma_i } \exp \left[-\frac{1}{2} \min_{\vec{a}} \chi^2(\vec{a})\right],

522: \ee

523: %

524: and the integral over $C$ can be approximated by

525: %

526: \be

527: \int dC \, p(C | 0) p(\vec{d}| C) \simeq

528: \frac{\sqrt{\mathrm{Var}\left(\chi^2(C)\right)}}{\left(2\pi\right)^{\nu/2} \prod_i \sigma_i } \exp \left[ -\frac{1}{2} \min_{C} \chi^2(C)\right],

529: \ee

530: %

531: where $N_{fit} = 1$ and the determinant of the covariance matrix has

532: been replaced by the variance of the single fit

533: parameter, $C$.  Thus, we approximate the necessary integrals by

534: explicitly summing the contributions from the null hypothesis, the

535: polynomial model, and all possible orbital periods, but approximate

536: the integrals over the remaining fit parameters as Gaussian integrals.

537: This provides a good approximation to the necessary integrals while

538: leaving only one dimension ($P$) which must be finely sampled.
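As a concrete check of the Gaussian-integral step, the following Python sketch evaluates the integral over $C$ for the constant-velocity model in closed form and compares it with a brute-force quadrature.  It is illustrative only and is not the code used to produce the results in this paper: the data values, the function names, and the flat prior of unit density on $C$ are assumptions of the sketch.  The normalization is written with the factor $(2\pi)^{-\nu/2}$, $\nu = N_{obs} - 1$, that results from integrating the Gaussian likelihood over a single parameter.

\begin{verbatim}
```python
import math

def constant_model_evidence(d, sigma):
    """Closed-form Gaussian integral of the likelihood over the constant C.

    Assumes a flat prior of unit density on C (an assumption of this sketch).
    Returns (c_hat, var_c, integral).
    """
    w = [1.0 / s**2 for s in sigma]
    var_c = 1.0 / sum(w)                      # variance of the fitted constant
    c_hat = var_c * sum(wi * di for wi, di in zip(w, d))
    chi2_min = sum(((di - c_hat) / s)**2 for di, s in zip(d, sigma))
    nu = len(d) - 1                           # N_obs - N_fit with N_fit = 1
    norm = (2.0 * math.pi)**(nu / 2.0) * math.prod(sigma)
    return c_hat, var_c, math.sqrt(var_c) / norm * math.exp(-0.5 * chi2_min)

def likelihood(c, d, sigma):
    """Exact Gaussian likelihood p(d | C)."""
    logl = 0.0
    for di, s in zip(d, sigma):
        logl += -0.5 * ((di - c) / s)**2 - math.log(s * math.sqrt(2.0 * math.pi))
    return math.exp(logl)

def brute_force_integral(d, sigma, lo, hi, n=20000):
    """Midpoint-rule integral of the likelihood over C, for comparison."""
    h = (hi - lo) / n
    return h * sum(likelihood(lo + (k + 0.5) * h, d, sigma) for k in range(n))

velocities = [3.1, -1.4, 0.6, 2.2, -0.9]      # m/s, made-up data
errors = [2.0, 2.5, 1.5, 2.0, 3.0]            # m/s, made-up uncertainties
c_hat, var_c, approx = constant_model_evidence(velocities, errors)
exact = brute_force_integral(velocities, errors, c_hat - 40.0, c_hat + 40.0)
```
\end{verbatim}

Because the constant model is linear in $C$, \texttt{approx} and \texttt{exact} should agree to numerical precision; for parameters that enter non-linearly the same expression is only an approximation.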

539:

540: It remains to identify the best fit solution for each orbital period

541: considered and to evaluate the probability of each of these possible

542: solutions.  This is equivalent to the problem of evaluating the

543: floating-mean periodogram.  The floating-mean periodogram and its

544: relationship to the standard periodogram is described by Cumming

545: (2004).  The periodogram is evaluated on a grid uniform in the

546: frequency, $f = 1/P$, rather than in $\log P$.  Thus, the factors

547: $\Delta \log P_i$ serve as weighting factors to ensure that we

548: maintain a prior which is uniform in $\log P$.  The necessary number

549: of orbital periods to consider is set by the ratio of the maximum

550: orbital period considered, $\pi T_{\mathrm{obs}}$, to the minimum

551: period considered, $P_{\min}$.  Since we do not want to miss a minimum

552: in $\min_{\vec{x}_P} \chi^2(\vec{x}_P | P)$, we oversample by a factor

553: $\zeta \simeq 4$.  Thus, the number of orbital periods considered is

554: $N_P \simeq \zeta \pi T_{\mathrm{obs}} / P_{\min}$, a constant times

555: the Nyquist frequency corresponding to the minimum period.  Note that

556: we do not count sine and cosine components separately, and we are

557: searching for periods up to $\pi T_{\mathrm{obs}}$, despite the fact

558: that models with orbital periods longer than $T_{\mathrm{obs}}$ are so

559: similar that they cannot be distinguished with the previous

560: observations.  The time span of observations can be as short as a

561: couple of months or extend for several years.  Hence it is typically

562: necessary to find the best-fit orbital parameters for thousands of

563: orbital periods.  Given the large number of global searches necessary,

564: the computation time required is significantly reduced if the model velocity can

565: be written as a linear function of the fit parameters.  While this is

566: impossible for eccentric Keplerian orbits, it is possible for circular

567: orbits by writing

568: %

569: \begin{equation}

570: %

571: v_{*,\vec{x}}(t) = A \cos \left(\frac{2\pi}{P}t\right) + B \sin \left(\frac{2\pi}{P}t\right) + C_{1},

572: %

573: \end{equation}

574: %

575: where $K = \sqrt{A^2+B^2}$ and $\phi_o = \tan^{-1} (B/A)$.  Using

576: this formulation allows each of the best-fit solutions and the

577: covariance matrices to be evaluated by linear least squares, which is

578: much faster and more robust than non-linear least squares.  Since we

579: used $A$ and $B$ as fit parameters rather than $\log K$ and $\phi_o$,

580: we must include a weight equal to the determinant of the Jacobian of

581: the transformation, $\left|J\right|=K^{-2}$.  While the Jacobian should

582: formally be inside the integral, we approximate the integral by

583: substituting the value of $K$ at the minimum in $\chi^2(\vec{x}_P |

584: P)$.
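The fixed-period linear fit and the frequency grid described above can be sketched in a few lines of Python.  This is a hedged illustration under stated assumptions, not the implementation used here: the function names (\texttt{solve3}, \texttt{fit\_circular}, \texttt{period\_grid}), the synthetic noiseless data, and the choice $P_{\min} = 2$ d are all hypothetical.  The model is written as $v = A\cos(2\pi t/P) + B\sin(2\pi t/P) + C$, so that $K = \sqrt{A^2+B^2}$ and $\phi_o = \tan^{-1}(B/A)$.

\begin{verbatim}
```python
import math

def solve3(m, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / a[r][r]
    return x

def fit_circular(t, v, sigma, period):
    """Weighted linear least squares for v = A cos(wt) + B sin(wt) + C."""
    w = 2.0 * math.pi / period
    basis = [(math.cos(w * ti), math.sin(w * ti), 1.0) for ti in t]
    m = [[sum(b[i] * b[j] / s**2 for b, s in zip(basis, sigma))
          for j in range(3)] for i in range(3)]
    rhs = [sum(b[i] * vi / s**2 for b, vi, s in zip(basis, v, sigma))
           for i in range(3)]
    a_fit, b_fit, c_fit = solve3(m, rhs)
    chi2 = sum(((vi - a_fit * b[0] - b_fit * b[1] - c_fit) / s)**2
               for b, vi, s in zip(basis, v, sigma))
    return a_fit, b_fit, c_fit, chi2

def period_grid(t_obs, p_min, zeta=4.0):
    """Periods on a grid uniform in frequency from 1/(pi T_obs) to 1/P_min."""
    n_p = int(round(zeta * math.pi * t_obs / p_min))
    f_lo, f_hi = 1.0 / (math.pi * t_obs), 1.0 / p_min
    return [1.0 / (f_lo + k * (f_hi - f_lo) / (n_p - 1)) for k in range(n_p)]

# Synthetic, noiseless circular orbit (all values are made up).
true_k, true_phase, true_c, true_p = 50.0, 0.7, 5.0, 10.0
t = [0.0, 1.3, 2.9, 4.1, 7.6, 11.2, 15.8, 19.5, 23.3, 30.1, 36.4, 41.0]
v = [true_k * math.cos(2.0 * math.pi * ti / true_p - true_phase) + true_c
     for ti in t]
sigma = [1.0] * len(t)

a_fit, b_fit, c_fit, chi2 = fit_circular(t, v, sigma, true_p)
k_fit = math.hypot(a_fit, b_fit)          # K = sqrt(A^2 + B^2)
phase_fit = math.atan2(b_fit, a_fit)      # phi_o = arctan(B / A)

# Scan the grid and keep the period with the smallest chi^2.
grid = period_grid(t_obs=t[-1] - t[0], p_min=2.0)
best_period = min(grid, key=lambda p: fit_circular(t, v, sigma, p)[3])
```
\end{verbatim}

Each grid point costs only one $3\times3$ linear solve, which is why a scan over thousands of trial periods remains inexpensive.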

585:

586: While $K$ is allowed to take on any positive value, for the purpose of

587: comparing the posterior probability of the no-planet and one-planet

588: models it is necessary to normalize the prior distribution for $K$.

589: For this purpose only, we assume $K_{\min} \le K \le K_{\max}$, where

590: $K_{\max}$ is the amplitude of a 10 Jupiter-mass planet orbiting a solar-mass

591: star with an orbital period of $P_{\min}$.  Again, solely for the purposes

592: of setting the normalization, we take $K_{\min}$ to be the signal amplitude for

593: which there would be a $\sim50\%$ probability of detecting the planet

594: and $K_{\max}$ to be the maximum velocity amplitude of a planet for the

595: specified orbital period.  For a planet on a circular orbit, $K_{\max}

596: \simeq (m_{\max}/M_*) \left( 2\pi G (M_*+m_{\max}) / P\right)^{1/3}$,

597: where $m_{\max}/M_* = 0.01$ is the ratio of the maximum planet mass to

598: the star mass and $G$ is the gravitational constant.  Note that

599: $K_{\max}$ and hence the normalization for the prior, $p(\vec{x})$,

600: varies with the orbital period.  For $K_{\min}$ we use the analytic

601: approximation from Cumming et al.\ (2002), $K_{\min} = 2

602: \sigma_{\mathrm{obs}} \sqrt{T_{\mathrm{obs}} / \left( P_{\min}

603: (N_{obs}-3) FAP \right)}$, where $\sigma_{\mathrm{obs}}$ is the

604: uncertainty of the individual velocity measurements, $N_{obs}$ is the

605: number of previous observations of the star, and $FAP$ is the false

606: alarm probability, which we set to $1/N_{\mathrm{targets}}$, the inverse

607: of the number of target stars.  As pointed out by an anonymous

608: referee, this choice is somewhat arbitrary.  Nevertheless, given our

609: choice of prior, it is necessary to make some choice to result in a

610: normalized probability distribution.  In the future, we would suggest

611: using a single normalized prior distribution that has support below

612: $K_{\min}$, such as the modified Jeffreys prior, $p(K) = (K+K_o)^{-1}

613: / \log\left[1+K_{\max}/K_o\right]$.
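For orientation, the sketch below evaluates the velocity amplitude for a circular, edge-on orbit, $K = [m/(M_*+m)]\,[2\pi G (M_*+m)/P]^{1/3}$, and checks numerically that a modified Jeffreys prior of the form $p(K) = (K+K_o)^{-1}/\log(1+K_{\max}/K_o)$ integrates to unity on $0 \le K \le K_{\max}$.  The physical constants are rounded, and the values $P_{\min} = 3$ d and $K_o = 1$ m/s, as well as the function names, are assumptions chosen purely for illustration.

\begin{verbatim}
```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2 (rounded)
M_SUN = 1.989e30     # solar mass, kg (rounded)
M_JUP = 1.898e27     # Jupiter mass, kg (rounded)
DAY = 86400.0        # seconds per day

def k_circular(m_planet, m_star, period):
    """Stellar velocity semi-amplitude (m/s) for an edge-on circular orbit.

    All inputs are in SI units (kg, kg, s).
    """
    m_tot = m_star + m_planet
    return (m_planet / m_tot) * (2.0 * math.pi * G * m_tot / period)**(1.0 / 3.0)

def modified_jeffreys(k, k_o, k_max):
    """Modified Jeffreys prior density, normalized on 0 <= K <= K_max."""
    if k < 0.0 or k > k_max:
        return 0.0
    return 1.0 / ((k + k_o) * math.log(1.0 + k_max / k_o))

# Sanity check: a Jupiter-mass planet with a one-year period.
k_jup_1yr = k_circular(M_JUP, M_SUN, 365.25 * DAY)

# Illustrative K_max: 10 Jupiter masses at an assumed P_min of 3 days.
k_max = k_circular(10.0 * M_JUP, M_SUN, 3.0 * DAY)

# Midpoint-rule check that the prior integrates to one.
n = 200000
h = k_max / n
total = h * sum(modified_jeffreys((i + 0.5) * h, 1.0, k_max) for i in range(n))
```
\end{verbatim}

Unlike a prior truncated at $K_{\min}$, this form remains normalizable as $K \to 0$, so no detection-threshold argument is needed to fix the normalization.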

614:

615: The above procedure allows us to efficiently calculate the probability

616: of the null hypothesis as well as a list of probabilities that there

617: is a planet with each of the orbital periods considered.  For each of

618: these probabilities, there is also a set of best-fit parameters and a

619: covariance matrix which describe the size and shape of the posterior

620: probability distribution for the remaining fit parameters.  To the

621: extent that our model and approximations are valid, these

622: probabilities and covariance matrices provide the optimal basis for

623: making inferences about the presence of a planet and its orbital

624: parameters.  Each time that a new observation is made, the entire procedure

625: is repeated to produce updated posterior distributions which

626: incorporate the new information from the latest observation.

627:

628: \subsection{Prediction}

629: \label{SecPrediction}

630:

631: Having estimated the posterior probability distributions for model

632: parameters in \S\ref{SecInference}, it is straightforward to sample

633: from $p(v(t) | \vec{d})$, the predictive probability distribution for a

634: hypothetical radial velocity observation at time $t$:

635: %

636: \be

637: p(v(t) | \vec{d} )

638: = \int d\vec{m} \, p(v(t) | \vec{m}) p(\vec{m} | \vec{d} )

639: = \int dC \, p(v(t)|C) p(C| \vec{d})

640: + \int d\vec{x} \, p(v(t)|\vec{x}) p(\vec{x} | \vec{d})

641: + \int d\vec{a} \, p(v(t) | 1, \vec{a}) p(\vec{a} | \vec{d}).

642: \ee

643: %

644:

645:

646: Various summary statistics can be computed for $p(v(t) | \vec{d})$ at

647: each of several possible future observing times.  Perhaps the

648: simplest is to calculate the mean ($E[v(t) | \vec{d}]$) and variance

649: ($E[\mathrm{Var}(v(t) | \vec{d})]$) of the velocities sampled from

650: $p(v(t) | \vec{d})$.  This can be done extremely efficiently and has

651: the added benefit that the mean and variance are straightforward

652: to interpret.
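A minimal Python sketch of these two summary statistics follows.  It assumes that the posterior is represented by a small weighted set of model curves (two sinusoids and a constant-velocity null model, all with made-up parameters) and that the measurement noise is Gaussian with $\sigma_{\mathrm{obs}} = 3$ m/s; the function names are hypothetical and the example is not the code used in this paper.

\begin{verbatim}
```python
import math

def predictive_moments(models, t, sigma_obs):
    """Mean and variance of p(v(t) | d) for a weighted set of model curves.

    `models` is a list of (weight, velocity_function) pairs standing in for
    posterior samples; the weights are normalized inside the function.
    """
    total = sum(w for w, _ in models)
    mean = sum(w * f(t) for w, f in models) / total
    second = sum(w * (f(t)**2 + sigma_obs**2) for w, f in models) / total
    return mean, second - mean**2

def sinusoid(k, period, phase, offset=0.0):
    """Circular-orbit velocity curve with amplitude k and the given period."""
    return lambda t: k * math.cos(2.0 * math.pi * t / period - phase) + offset

models = [
    (0.5, sinusoid(40.0, 10.0, 0.0)),
    (0.3, sinusoid(40.0, 10.5, 0.0)),
    (0.2, lambda t: 0.0),            # null hypothesis: constant velocity
]
mean0, var0 = predictive_moments(models, 0.0, 3.0)

# Scan candidate observing times and pick the one with the largest variance.
times = [0.5 * k for k in range(81)]
variances = [predictive_moments(models, t, 3.0)[1] for t in times]
t_best = times[variances.index(max(variances))]
```
\end{verbatim}

The variance never falls below $\sigma_{\mathrm{obs}}^2$; any excess reflects disagreement among the models that the data still allow.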

653:

654: Naively, it might seem desirable to make future observations when

655: $E[\mathrm{Var}(v(t) | \vec{d} )]$ is largest (see Fig.\ 1).  While this seems to be a

656: reasonable strategy, a more rigorous analysis will lead to a somewhat

657: different result, as demonstrated in \S\ref{SecDesign}.

658:

659: \subsection{Design}

660: \label{SecDesign}

661:

662: A more sophisticated analysis incorporates the concept of a utility

663: function from decision theory.

664: The utility function makes explicit

665: the utility of a specific combination of an action (e.g., observe at

666: time $t$) and an outcome (e.g., measure a velocity $v(t)$).  While the

667: experimenter can choose the action to be taken, the outcome is not

668: known a priori.  Nevertheless, the predictive distribution, $p(v(t) |

669: \vec{d})$, contains information about the likelihood of various outcomes

670: for a given action.  Thus, the experimenter can calculate the expected

671: value of the utility function for various possible actions.  Then the

672: action with the largest expected utility can be chosen.

673: Throughout this section, we closely follow the derivation of Loredo (2004).

674:

675: % TODO: Change below to use Kullback-Leibler entropy for multi star case

676:

677: While numerous utility functions are possible, one particularly well

678: motivated choice is to set the utility function equal to the change in

679: the information contained in the posterior probability distribution

680: for model parameters after incorporating the future observation.  Let

681: $I\left\{f(z)\right\}$ be the information contained in the distribution

682: $f(z)$, which is the negative of the Shannon entropy and is given by

683: %

684: \be

685: I\left\{f(z)\right\} = \int dz \, f(z) \log f(z).

686: \ee

687: %

688: The expectation for the information contained in the posterior

689: distribution for the model parameters after incorporating the future

690: observation is

691: %

692: \be

693: E[I\left\{p(\vec{m} | \vec{d}')\right\} ] = \int dv \, p(v|\vec{d}) I\left\{p(\vec{m} | \vec{d}')\right\},

694: \ee

695: %

696: where $\vec{d}'$ is the set of previous observations, $\vec{d}$,

697: augmented by the future observation, $v$.  Next, we invoke Shannon's theorem, which can be derived by considering the information contained in a joint probability distribution, in this case, $p(\vec{m},\vec{d}')=p(\vec{m}|\vec{d}')p(\vec{d}') = p(\vec{d}'|\vec{m})p(\vec{m})$.   By writing out the integrals contained in

698: %

699: \be

700: I\left\{p(\vec{m}|\vec{d}') p(\vec{d}') \right\} = I\left\{p(\vec{d}'|\vec{m}) p(\vec{m}) \right\},

701: \ee

702: %

703: separating integrals when possible, and simplifying integrals over probability densities which integrate to unity, we arrive at Shannon's theorem,

704: %

705: \be

706: I\left\{p(\vec{m} | \vec{d})\right\} + \int d\vec{m} \, p(\vec{m} | \vec{d} ) I\left\{ p(v|\vec{m}) \right\} =

707: I\left\{p(v | \vec{d} )\right\} + \int dv \, p(v | \vec{d}) I\left\{p(\vec{m} | \vec{d}') \right\},

708: \ee

709: %

710: which allows us to rewrite the expected information as

711: %

712: \be

713: \label{EqnExpInfo}

714: E[ I\left\{p(\vec{m} | \vec{d}')\right\} ] =

715: I\left\{p(\vec{m} | \vec{d})\right\} + \int d\vec{m} \, p(\vec{m} | \vec{d} ) I\left\{ p(v|\vec{m}) \right\}

716: - I\left\{p(v | \vec{d} )\right\}.

717: \ee

718: %

719: The first term in Eqn.\ \ref{EqnExpInfo} is simply the information about

720: the model parameters already available from the previous observations,

721: $\vec{d}$, and is independent of future observations.  The second term

722: is the weighted average of the information contained in the

723: probability distribution for the future observation conditioned on a

724: particular model.  Note that the distribution, $p(v(t)|\vec{m})$, is

725: the distribution of the observed velocities if the model were exactly

726: known.  Although the location of the distribution for the predicted

727: velocities depends on $t$ and $\vec{m}$, the shape and scale of the

728: distribution are independent of both $t$ and $\vec{m}$.  Since the

729: Shannon entropy of a distribution depends only on the scale and shape

730: of a distribution (and not the location where it is centered), the

731: second term of Eqn.\ \ref{EqnExpInfo} is also constant.  The remaining

732: term is the information content of the predictive distribution,

733: $p(v(t)|\vec{d})$, which depends explicitly on both $t$ and $\vec{d}$.  Thus,

734: the expected change in the information content of the posterior

735: probability distribution for the model parameters is

736: %

737: \be

738: E[ \Delta I\left\{p(\vec{m} | \vec{d}')\right\} ](t) = I\left\{ p(v(t) | v_{*}(t)) \right\}

739: - I\left\{p(v(t) | \vec{d} )\right\},

740: \ee

741: %

742: where $v_{*}(t)$ is the actual radial velocity of the star at time $t$ (as opposed to the observed velocity $v(t)$).  The first term depends only on the distribution for the measurement about the true value, which we assume is independent of time.  Therefore, the expected

743: change in the information is maximized if the next observation is

744: taken when the information content of the predictive distribution is

745: minimized and the entropy of the predictive distribution is maximized.

746: Thus, an observing program will more efficiently constrain the orbital

747: parameters of a given target star if future observations are made at

748: times when the uncertainty in the predictive distribution is large.

749:

750: The above analysis naturally leads to the technique of maximum entropy sampling.

751: Once $p(v(t) | \vec{d})$ has been estimated as outlined in

752: \S\ref{SecInference} and \S\ref{SecPrediction}, the Shannon entropy, $-I\left\{p(v(t) | \vec{d})\right\}$, can be

753: easily calculated for numerous possible future observation times.

754: In particular, the necessary integral can be written as

755: \be

756: I\left\{p(v(t)|\vec{d}) \right\} = \int d\vec{m} \, p(\vec{m}|\vec{d}) \int dv p(v|\vec{m}) \log \left[ \int d\vec{m}' p(\vec{m}'|\vec{d}) p(v|\vec{m}') \right],

757: \ee

758: %

759: where the first integral represents a sum sampling the model

760: parameters from $p(\vec{m}|\vec{d})$, and the second integral

761: represents a sum sampling the prospective velocity from $p(v|\vec{m})$

762: using the previously drawn model parameters.  For each velocity drawn

763: in this manner, we must calculate the probability of obtaining that

764: velocity according to the full posterior distribution as the argument

765: to the logarithm.
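The nested sampling just described can be sketched as follows.  This is a hedged Monte Carlo illustration, not the implementation used here: the posterior is assumed to be a discrete set of models with given weights, each predicting a single velocity at the time of interest, and the measurement noise is assumed Gaussian with width $\sigma_{\mathrm{obs}}$.  The function names, the number of draws, and the fixed seed are arbitrary choices.

\begin{verbatim}
```python
import math
import random

def normal_pdf(v, mu, sigma):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((v - mu) / sigma)**2) / (sigma * math.sqrt(2.0 * math.pi))

def predictive_information(weights, means, sigma_obs, n_draws=4000, seed=1):
    """Monte Carlo estimate of I{p(v|d)} = E[log p(v|d)] in nats.

    weights[j] is the posterior probability of model j and means[j] is the
    velocity it predicts at the chosen time.
    """
    norm = sum(weights)
    weights = [w / norm for w in weights]
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        j = rng.choices(range(len(weights)), weights=weights)[0]  # m ~ p(m|d)
        v = rng.gauss(means[j], sigma_obs)                        # v ~ p(v|m)
        # Probability of that velocity under the full predictive distribution.
        p_v = sum(w * normal_pdf(v, mu, sigma_obs) for w, mu in zip(weights, means))
        total += math.log(p_v)
    return total / n_draws

sigma_obs = 3.0
info_noise = -math.log(math.sqrt(2.0 * math.pi * math.e) * sigma_obs)
info_single = predictive_information([1.0], [0.0], sigma_obs)
info_double = predictive_information([0.5, 0.5], [0.0, 100.0], sigma_obs)
gain_single = info_noise - info_single   # expected information gain, one model
gain_double = info_noise - info_double   # two equally likely, distinct models
```
\end{verbatim}

With a single model the expected gain is consistent with zero, while two equally probable and well separated models give a gain close to $\log 2$, i.e., one observation is expected to decide between them.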

766:

767: By choosing to make the next observation when $I\left\{p(v(t) |

768: \vec{d})\right\}$ is near a minimum, the observation is expected to

769: yield more information about the model parameters than if the next

770: observation time were chosen randomly.  Once a new observation is

771: made, we must repeat the entire process of calculating a posterior

772: probability distribution for the model parameters, the predictive

773: distribution for the velocity at future times, and the entropy of the

774: predictive distribution at each time.

775:

776: \subsection{Maximum Entropy versus Maximum Variance}

777:

778: It is easy to demonstrate that the information (i.e., the negative of the Shannon entropy) of a Gaussian

779: distribution with standard deviation, $\sigma$, is $-\log (\sqrt{2\pi

780: e} \sigma)$.  If the uncertainty in the prospective measurement is

781: Gaussian with variance $\sigma_{\mathrm{obs}}^2$ and the predictive

782: distribution is also well approximated by a normal distribution with

783: variance $\sigma_{\mathrm{pred}}^2(t)$, then the expected change in information

784: reduces to

785: %

786: \be

787: E\left[ \Delta I\left\{p(\vec{m} | \vec{d}')\right\}(t)\right] = \log\left(\frac{\sigma_{\mathrm{pred}}(t)}{\sigma_{\mathrm{obs}}}\right).

788: \ee

789: %

790: Thus, the more simplistic strategy of choosing observation times to

791: maximize the variance (rather than the entropy) of the predictive

792: distribution (as described in \S\ref{SecPrediction}) is equivalent when

793: the predictive distribution is normal.  Based on visual inspection of

794: several predictive distributions, we observe that the predictive distribution

795: is typically well approximated by a normal distribution, if the period is assumed

796: to be known precisely.  While the predictive

797: distribution may be well approximated by a Gaussian distribution for

798: well constrained orbits, when $N_{\mathrm{obs}}$ is small, the

799: predictive distributions are generally not well approximated by a

800: Gaussian.  In particular, if there is a significant probability for

801: two qualitatively different models (e.g., null hypothesis and a planet

802: with orbital period near $P$), then the predictive distribution is

803: frequently bimodal with one mode centered on the best-fit constant

804: velocity and another mode near the velocity predicted by the best-fit

805: sinusoidal solution with a different orbital period.

806: For example, let us

807: consider a case where there is a posterior probability, $p_a$, for

808: models with orbital period near $P_a$ and predictive velocity

809: distribution approximately Gaussian centered on $v_a(t)$ with standard

810: deviation $\sigma_a(t)$, and there is a posterior probability, $p_b$,

811: for models with orbital period near $P_b$ (or perhaps the null model)

812: and a predictive distribution approximately Gaussian centered on

813: $v_b(t)$ with standard deviation $\sigma_b(t)$.  Then the predictive

814: distribution is approximated by

815: %

816: \be

817: p(v(t)|\vec{d}) \simeq p_a N(v_a(t), \sigma_a(t)) + p_b N(v_b(t), \sigma_b(t)),

818: \ee

819: %

820: and the information contained in the predictive distribution is approximated by

821: %

822: \be

823: I\left\{p(v(t)|\vec{d})\right\} \simeq

824: p_a \log p_a + p_b \log p_b + p_a I\left\{N(v_a(t),\sigma_a(t))\right\} + p_b I\left\{N(v_b(t),\sigma_b(t))\right\} =

825: p_a \log p_a + p_b \log p_b - 0.5 p_a \log\left(2\pi e \sigma_a^2\right) - 0.5 p_b \log\left(2\pi e \sigma_b^2\right),

826: \ee

827: %

828: if $\left|v_a(t)-v_b(t)\right| \gg \sqrt{\sigma^2_a(t) + \sigma^2_b(t)}$.  Most

829: notably, the information in the predictive distribution is not

830: sensitive to the separation $v_a(t)-v_b(t)$, but the variance in the

831: distribution obviously does depend on the separation $v_a(t)-v_b(t)$.

832:

833: As can be seen in this example, the entropy of such a distribution is not

834: sensitive to the separation between the modes, unlike the variance

835: which increases with the separation between the two modes.  Thus,

836: choosing observing times based on the variance rather than the entropy

837: of the predictive distribution will tend to favor observing at times

838: when the observations do not completely rule out another possible

839: model which predicts a very different velocity.  While choosing future

840: observation times based on the variance of the distribution may be

841: acceptable for well constrained orbits, it is particularly important

842: to use maximum entropy sampling when the observations are not yet able

843: to exclude qualitatively different models.
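The contrast between the two criteria can be checked numerically.  The sketch below integrates the Shannon entropy and the variance of a two-component Gaussian mixture on a grid for two separations of the modes.  The mixture weights, widths, and separations are made-up values, and the function names are hypothetical.  For well separated modes the entropy should approach $-\sum_j p_j \log p_j + \sum_j p_j \log(\sqrt{2\pi e}\,\sigma_j)$, which does not involve the separation, while the variance grows as $p_a p_b (v_a - v_b)^2$.

\begin{verbatim}
```python
import math

def mixture_pdf(v, p_a, v_a, s_a, v_b, s_b):
    """Density of p_a N(v_a, s_a) + (1 - p_a) N(v_b, s_b)."""
    ga = math.exp(-0.5 * ((v - v_a) / s_a)**2) / (s_a * math.sqrt(2.0 * math.pi))
    gb = math.exp(-0.5 * ((v - v_b) / s_b)**2) / (s_b * math.sqrt(2.0 * math.pi))
    return p_a * ga + (1.0 - p_a) * gb

def entropy_and_variance(p_a, v_a, s_a, v_b, s_b, n=40000):
    """Midpoint-rule Shannon entropy (nats) and variance of the mixture."""
    lo = min(v_a - 10.0 * s_a, v_b - 10.0 * s_b)
    hi = max(v_a + 10.0 * s_a, v_b + 10.0 * s_b)
    h = (hi - lo) / n
    entropy = mean = second = 0.0
    for k in range(n):
        v = lo + (k + 0.5) * h
        p = mixture_pdf(v, p_a, v_a, s_a, v_b, s_b)
        if p > 0.0:                      # skip points where the density underflows
            entropy -= p * math.log(p) * h
        mean += v * p * h
        second += v * v * p * h
    return entropy, second - mean**2

# Same weights and widths, two different separations between the modes.
h30, var30 = entropy_and_variance(0.6, 0.0, 3.0, 30.0, 3.0)
h60, var60 = entropy_and_variance(0.6, 0.0, 3.0, 60.0, 3.0)

# Analytic entropy for well separated modes.
h_limit = (-(0.6 * math.log(0.6) + 0.4 * math.log(0.4))
           + 0.5 * math.log(2.0 * math.pi * math.e * 9.0))
```
\end{verbatim}

Doubling the separation leaves the entropy essentially unchanged but increases the variance nearly fourfold, so a variance-based scheduler would rank the second case far higher even though a single observation distinguishes the two modes equally well in both.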

844:

845: \subsection{Examples}

846: \label{SecExamples}

847:

848: We have begun to apply the inference and predictive steps to some of

849: the observations taken near the beginning of the N2K project.  In

850: Fig.\ 1, we show the expected velocity and the 5th and 95th

851: percentiles of the predictive distribution as a function of time for

852: several target stars.  We show the median of the predictive

853: distribution as the heavy line and the credible intervals as the

854: thinner lines.

855:

856: By inspecting credible intervals for the predictive distributions based on

857: actual observational data, we have identified four common cases.

858: %

859: \begin{enumerate}

860: %

861: \item There is structure in the predictive distribution both during

862: the prior observations and significantly after the last observation.

863: The orbital period is well constrained and the structure is nearly

864: periodic with the same period (e.g., Fig.\ 1, top row).

865:

866: \item There is structure in the predictive distribution both during

867: the prior observations and significantly after the last observation.

868: The orbital period is not precisely known, and so the structure is not

869: periodic or is nearly periodic on a timescale significantly longer

870: than the orbital period (e.g., Fig.\ 1, lower left).

871:

872: \item In other cases, there is significant variability in the scale of

873: the predictive distribution at times in the past, but not at

874: times in the future (Fig.\ 1, lower right).  This can occur when the

875: orbital period is only weakly constrained.  The uncertainty in the

876: orbital period causes information about the orbital phase to be lost

877: with time and the uncertainty in the orbital phase dominates

878: the width of the predictive distribution.

879:

880: \item There is no significant variability in the predictive

881: distribution around the prior observations, but the variability grows

882: with time due to the possibility of a long period planet (polynomial

883: terms).

884: %

885: \end{enumerate}

886: %

887:

888: In the first two cases, maximum entropy sampling could provide a

889: valuable increase in the efficiency of constraining orbital

890: parameters.  In the first case, the structure in the predictive

891: distribution is periodic, so it is possible to identify the best time

892: to observe the system each orbital period.  However, in the second

893: case, it is not clear how frequently the system should be observed.

894: If the system is observed each time there is a local maximum in the

895: variance or entropy of the predictive distribution, then many

896: observations may be made during a single orbital period.

897: Alternatively, if the system is not observed at each local maximum,

898: then observations may skip an entire orbital period.  The last two

899: cases illustrate another problem with the maximum entropy sampling

900: algorithm applied to a single star.  When the entropy of the

901: predictive distribution increases with time, then maximum entropy

902: sampling does not identify any particular time.  In the next section

903: we present a variation on this algorithm which overcomes these

904: difficulties.

905:

906:

907: \section{Adaptive Scheduling for Multiple Target Stars}

908: \label{SecMulti}

909:

910: In modern planet searches, there is typically a large list of possible

911: target stars and another list of opportunities to observe a small

912: subset of these stars.  Here we rephrase the goal of adaptive

913: scheduling algorithms.  Instead of identifying the best time to

914: observe a particular target, we ask which targets would be best to

915: observe at a particular time.  In this context, an adaptive scheduling

916: algorithm determines both the times at which each target star is

917: observed and the number of times each target star is observed.

918: Adaptively choosing the observing times as in \S\ref{SecSingleStar}

919: can significantly improve the efficiency for constraining orbital

920: parameters for stars with planets as demonstrated by Loredo (2004).

921: Similarly, adaptively choosing the number of observations of each

922: target star can significantly improve the efficiency for detecting

923: planets.  Thus, adaptive scheduling algorithms can provide a double

924: benefit to planet searches.

925:

926:

927: \subsection{Maximum Entropy}

928: \label{SecMultiMaxEntropy}

929:

930: A straightforward generalization of the methods described in

931: \S\ref{SecSingleStar} is to apply the principles of maximum entropy

932: sampling to the joint posterior distribution function for model

933: parameters for each target star, $P \equiv p(\vec{m}_1, \vec{m}_2, \ldots,

934: \vec{m}_{N_{\mathrm{targets}}} | \vec{d}_1, \vec{d}_2, \ldots,

935: \vec{d}_{N_{\mathrm{targets}}})$, where $\vec{m}_i$ are the model

936: parameters for the $i$th star and $\vec{d}_i$ are the observations of

937: the $i$th star.  Since the posterior distributions for the fit

938: parameters of each star are independent of each other, $P = \prod_i p(\vec{m}_i | \vec{d}_i)$.

939: Therefore, the information contained in the joint posterior

940: distribution is simply the sum of the information contained in the

941: posterior distribution of each star independently,

942: %

943: \be

944: I\left\{ P \right\} = \sum_i I\left\{ p(\vec{m}_i | \vec{d}_i ) \right\},

945: \ee

946: %

947: and the expected

948: increase in information about the joint distribution is equal to the

949: expected increase in information about the posterior distribution of

950: the star being targeted,

951: %

952: \be

953: E\left[ \Delta I\left\{P\right\} \right](t,i) = E\left[\Delta I\left\{p(\vec{m}_i | \vec{d}_i)\right\}(t)\right],

954: \ee

955: %

956: where $i$ indicates which star is being targeted at time $t$.  Thus,

957: one can calculate the expected increase in information about the

958: posterior distribution for the model parameters for each star

959: separately and then choose to observe the star which is expected to

960: yield the most information.

961:

962: After each new observation is obtained, the above procedure can be

963: repeated and a new target star chosen.  In

964: practice, the procedure that we describe requires significant

965: computation time and dozens of radial velocity observations are made

966: on a clear night.  Since the orbital periods (and hence timescale for

967: variability in the predictive distributions) are typically long

968: compared to one night of observing, it is reasonable to calculate the

969: predictive distributions and entropy for each star at one time during

970: a night of observing (e.g., the time at which the star reaches its

971: maximum altitude during the night).  A list of the stars with the largest

972: expected increase in information can be targeted during that observing

973: night.
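As a simple illustration of this nightly ranking, the sketch below orders stars by their expected information gain and keeps as many as there are observing slots.  For brevity it assumes that each predictive distribution is Gaussian, so that the gain is $\log(\sigma_{\mathrm{pred}}/\sigma_{\mathrm{obs}})$; in general the entropy of the full predictive distribution would be used instead.  The star names, the widths, and the function names are made up.

\begin{verbatim}
```python
import math

def expected_gain(sigma_pred, sigma_obs):
    """Expected information gain (nats) for a Gaussian predictive distribution."""
    return math.log(sigma_pred / sigma_obs)

def select_targets(sigma_pred_by_star, sigma_obs, n_slots):
    """Return the n_slots stars with the largest expected information gain."""
    ranked = sorted(
        sigma_pred_by_star,
        key=lambda star: expected_gain(sigma_pred_by_star[star], sigma_obs),
        reverse=True,
    )
    return ranked[:n_slots]

# Predictive widths (m/s) at the time each star reaches maximum altitude.
sigma_pred = {"star_a": 4.0, "star_b": 25.0, "star_c": 9.0, "star_d": 60.0}
tonight = select_targets(sigma_pred, sigma_obs=3.0, n_slots=2)
gain_example = expected_gain(30.0, 3.0)   # log(10), about 2.3 nats
```
\end{verbatim}

Because the information in the joint posterior is a sum over stars, ranking each star separately is sufficient; no joint optimization over targets is required.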

974:

975: In principle, the predictive distributions and entropy for each star

976: could be calculated at several times during a night of observing.

977: This would make it possible to choose the observing time precisely,

978: rather than just which night to observe the star.  This could be

979: valuable for stars with very short orbital periods or highly eccentric

980: orbits (and hence short timescales for periastron passage).  In

981: practice, there are significant limitations on when a star can be

982: observed and costs associated with observing stars in an arbitrary

983: order.  (We will discuss how to incorporate these costs in

984: \S\ref{SecAltCosts}.)  For simplicity, in our simulations described below we do not

985: attempt to optimize the observing schedule across times within one

986: night.

987:

988:

989: \subsection{Example}

990: \label{SecMultiExample}

991:

992: To demonstrate the value of adaptive scheduling, we have simulated

993: radial velocity planet surveys using both regular and adaptive target

994: scheduling.  First, we generate a list of 1000 target stars and

995: randomly assign planets to some of them.  The frequency, mass, and

996: orbital period distributions are taken from Tabachnik \& Tremaine

997: (2002).  We then randomly choose 20 observing nights per year.  To simulate

998: the allocation of nights on a large telescope, we restrict the

999: possible observing nights to be during the quarter of the lunar month

1000: closest to full moon.  Each night 100 observation times are regularly

1001: spaced during the night.  For the regular scheduling algorithm, stars

1002: with the smallest number of observations are given the highest

1003: priority.  Among stars which have the same number of observations, the

1004: stars which are less frequently observable are given priority.  For

1005: the adaptive scheduling algorithm, each star is observed three times

1006: as with the regular scheduling algorithm.  Subsequently, a Bayesian

1007: analysis (as described in \S\ref{SecInference}) of the available

1008: observations is performed at the conclusion of each night a star is

1009: observed.  Before each observing night the predictive probability

1010: distribution is calculated (as described in \S\ref{SecPrediction}) for

1011: the velocity of each star observable on that night.  The exact time

1012: for calculating the predictive distribution is the time at which the

1013: star reaches maximum altitude during the night.  The possible target stars are

1014: prioritized based on the entropy of the predictive distribution.  The

1015: 100 stars with the highest priorities are observed that night in order of their

1016: right ascension (not necessarily at the exact time for which the

1017: predictive distribution was calculated).
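The two prioritization rules compared in these simulations can be sketched as follows.  This is a schematic reading of the description above rather than the simulation code itself: the \texttt{Star} record, its field names, and the example values are hypothetical, and we assume that stars with fewer than three observations are scheduled ahead of the entropy-ranked stars in the adaptive scheme.

\begin{verbatim}
```python
from dataclasses import dataclass

@dataclass
class Star:
    name: str
    ra_hours: float          # right ascension
    n_obs: int               # number of observations so far
    observable_frac: float   # fraction of nights on which the star is observable
    entropy: float           # entropy of tonight's predictive distribution

def regular_schedule(stars, n_slots):
    """Fewest observations first; ties go to the less often observable star."""
    return sorted(stars, key=lambda s: (s.n_obs, s.observable_frac))[:n_slots]

def adaptive_schedule(stars, n_slots, n_initial=3):
    """Initial observations as in the regular scheme, then rank by entropy.

    The selected stars are returned in order of right ascension.
    """
    new = sorted((s for s in stars if s.n_obs < n_initial),
                 key=lambda s: (s.n_obs, s.observable_frac))
    old = sorted((s for s in stars if s.n_obs >= n_initial),
                 key=lambda s: s.entropy, reverse=True)
    chosen = (new + old)[:n_slots]
    return sorted(chosen, key=lambda s: s.ra_hours)

stars = [
    Star("s1", 14.2, 5, 0.6, 2.1),
    Star("s2", 9.8, 3, 0.4, 3.4),
    Star("s3", 11.5, 3, 0.7, 2.9),
    Star("s4", 16.0, 2, 0.5, 0.0),
    Star("s5", 10.1, 6, 0.3, 3.0),
]
regular = [s.name for s in regular_schedule(stars, 3)]
adaptive = [s.name for s in adaptive_schedule(stars, 3)]
```
\end{verbatim}

In this toy example the regular rule selects \texttt{s3} because it has few observations, whereas the adaptive rule replaces it with \texttt{s5}, which has more observations but a broader predictive distribution.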

1018:

1019:

1020: Here we present a summary of the results of these simulations.  At the

1021: end of each night we monitor the posterior distributions for the model

1022: parameters of each system, paying particular attention to the number

1023: of planet detections (which we define to be systems for which the

1024: probability of the null hypothesis, no planet, is less than 0.1\%).

1025: %

1026: % and the number of false alarms (stars for which there is no planet, but the posterior probability for the no-planet hypothesis is less than 0.1\%).  First, we verify that the false alarm rate is approximately 0.1\%, as expected.

1027: %

1028: In Fig.~2, we show how the number of detections

1029: increases as a function of the number of observing nights.  The

1030: adaptive scheduling algorithm based on \S\ref{SecMulti} (dashed blue)

1031: %

1032: % FIXED

1033: %

1034: is clearly more efficient than the regular scheduling algorithm (solid

1035: black) for detecting planets, even though it is not explicitly

1036: optimized for detecting the largest number of planets.  More

1037: importantly, the additional planets that are being detected by the

1038: adaptive scheduling algorithm tend to be those with the smallest

1039: velocity amplitudes (see Fig.~3).  This is accomplished by observing some stars

1040: more frequently than others.  In Fig.~4 we present a histogram showing

1041: the fraction of stars that were observed a given number of times.

1042: While the regular scheduling algorithm observed each star ten times,

1043: the adaptive algorithms observed many stars slightly less frequently

1044: and a few stars much more frequently.  This makes the adaptive

1045: scheduling algorithms much more sensitive to planets with velocity

1046: amplitudes near the threshold of detection.  Thus, while the total

1047: number of planets detected increases by $\sim10$--$20\%$, the mass of

1048: the least massive planet detected by the adaptive scheduling algorithm

1049: is less than that of the regular scheduling algorithm by a factor

1050: $\sim2$ or more.  It is also important to note that the accuracy with

1051: which the orbital parameters are measured has not been sacrificed (see Fig.~5).

1052: While the orbital periods and amplitudes for planets with the largest

1053: velocity amplitudes ($\ge100$ m/s) are measured with similar

1054: accuracy, the adaptive scheduling algorithm provides a significant

1055: improvement in the accuracy of the orbital parameter determinations

1056: for planets with more modest velocity amplitudes ($\le30$ m/s).

1057:

1058:

1059: \section{Alternative Utility Functions}

1060: \label{SecAltUtility}

1061:

1062: In \S\ref{SecSingleStar} we focused on when to observe a single star, and

1063: hence the predictive distribution, $p(v | \vec{d})$, and its entropy

1064: were obvious choices for comparing the utility of observations at

1065: various times.  In \S\ref{SecMulti}, we focused on choosing which

1066: star (from a large list) should be targeted at the next observing

1067: opportunity.  There, we chose a utility function

1068: based on the joint posterior for the model parameters for all the

1069: target stars.  However, in this case the choice of utility function is

1070: less obvious.

1071: %

1072: Various surveys and investigators may have differing goals and hence

1073: differing utility functions.  For example, one possible goal would be

1074: to measure the orbital parameters to some desired accuracy.  Another

1075: reasonable goal might be to discover as many planets as possible given

1076: some fixed amount of observing time.  In that case, it would make

1077: sense to stop observing stars once it had been established that they

1078: harbored a planet, even if the orbital elements were not yet well

1079: constrained.   An even more extreme example is for a radial velocity

1080: survey intended to help select reference stars for astrometric surveys

1081: by future missions such as SIM.  In that case, one could stop

1082: observing a star before obtaining a rough measure of the orbital

1083: parameters or even before the false alarm rate (for detecting a

1084: planet) was small.  This is similar to a strategy for a survey aimed at

1085: discovering planets with a small mass which would eliminate stars once the

1086: radial velocities are observed to vary over too large a range to be

1087: due to a low mass planet.  Yet another possible goal would be to

1088: discover multiple planet systems or planets with long orbital periods.

1089: In this case, one would not want to stop observing a star even after

1090: the orbit of one planet had been well characterized, if it was still

1091: possible the system could have an additional planet with a longer

1092: orbital period.

1093:

1094: The above examples illustrate that simply targeting stars based on the

1095: maximum entropy method with the same utility function is not always

1096: the best strategy for a given application.  Nevertheless, we have

1097: demonstrated that adaptive scheduling algorithms can significantly

1098: increase the efficiency of an observing program.  Thus, it is

1099: important that the goals of an observing program be carefully

1100: considered and clearly identified.  Then, a utility function can be

1101: chosen that is appropriate for the particular purpose of the

1102: observations.  Once a utility function has been defined, the methods

1103: outlined in this paper can be used to optimize the observing program

1104: for the given utility function.

1105:

1106: The utility function discussed above, $E\left[\Delta

1107: I\left\{p(\vec{m}|\vec{d}')\right\}\right]$, is relatively easy to

1108: calculate based on the posterior distribution, $p(\vec{m}|\vec{d})$,

1109: giving it a practical advantage over many other possible utility

1110: functions.  In this section we describe simple generalizations of the

1111: above utility function which can be computed with a similar

1112: efficiency.  The generalized maximum entropy utility functions that we

1113: describe below provide a means for optimizing observing schedules for

1114: a broad range of goals.

1115: %

1116: % We caution that the results of a particular choice of utility function is not always obvious.  Therefore, we recommend that observers calculate the results of simulated observing programs using a particular choice of utility function to make sure there are no unintended consequences of the utility function chosen.

1117:

1118:

1119: \subsection{Information about a Subset of Models}

1120: \label{SecMultiAltSubset}

1121:

1122: One case worth considering is when we are only interested in models

1123: which satisfy certain criteria.  For example, we might only be

1124: interested in obtaining more information about stars with planets (and

1125: not about the constant velocity of stars without planets).  Similarly,

1126: we might be interested in companions with (minimum) masses less than

1127: some threshold, perhaps to exclude binary stars or perhaps to target

1128: terrestrial-mass planets.  In this case we could replace

1129: $I\left\{p(\vec{m})\right\}$ with

1130: %

1131: \be

1132: I_{\Theta}\left\{p(\vec{m})\right\} = \int d\vec{m} \, p(\vec{m}) \theta(\vec{m}) \log p(\vec{m}),

1133: \ee

1134: %

1135: where $\Theta$ is a region of the model parameter space,

1136: $\theta(\vec{m}) = 1$ when the parameters $\vec{m}$ satisfy the chosen

1137: criteria and $\theta(\vec{m}) = 0$ otherwise.  The expression for

1138: $I_{\Theta}\left\{p(\vec{m})\right\}$ can be thought of as the information contained in

1139: the distribution $p(\vec{m})$ about the subset of model parameters in $\Theta$.  In this case, the relevant utility function becomes

1140: %

1141: \be

1142: E\left[\Delta I_\Theta\left\{p(\vec{m}|\vec{d}')\right\}\right] = p(\vec{m} \in \Theta) I\left\{p(v|v_{actual})\right\} - \int d\vec{m} \, p(\vec{m}|\vec{d}) \theta(\vec{m}) \int dv \, p(v|\vec{m}) \log \left[ p(v|\vec{d}) \right],

1143: \ee

1144: %

1145: where the first term is again a constant, provided that the scale and

1146: shape of the sampling distribution does not depend on the model

1147: parameters.
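
As a rough illustration (the notation in this paragraph is introduced here and is not part of the derivation above), suppose the posterior is represented by $N$ samples, $\vec{m}_i$, drawn from $p(\vec{m}|\vec{d})$, and that for each sample $J$ velocities, $v_{ij}$, are drawn from $p(v|\vec{m}_i)$.  The sample sizes $N$ and $J$ and the use of Monte Carlo integration are assumptions made only for this sketch.  The second term of the utility function could then be estimated as
%
\be
\int d\vec{m} \, p(\vec{m}|\vec{d}) \theta(\vec{m}) \int dv \, p(v|\vec{m}) \log \left[ p(v|\vec{d}) \right] \approx \frac{1}{N J} \sum_{i=1}^{N} \theta(\vec{m}_i) \sum_{j=1}^{J} \log \left[ \frac{1}{N} \sum_{k=1}^{N} p(v_{ij}|\vec{m}_k) \right],
\ee
%
where the innermost sum approximates the predictive distribution, $p(v|\vec{d})$.  Samples outside $\Theta$ contribute nothing to the outer sum, but they must be retained in the innermost sum, since the predictive distribution is not restricted to $\Theta$.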

1148:

1149: In the above example, we used $\theta(\vec{m})$ as an indicator

1150: variable to specify when the parameters satisfied some criteria of

1151: interest, such as whether the model includes a planet or whether the

1152: orbital period is less than some threshold.  More specialized forms of

1153: $\theta(\vec{m})$ could be chosen for specific goals, such as finding

1154: planets with certain orbital periods (e.g., within the habitable

1155: zone).  In principle, $\theta(\vec{m})$ could be used as a weight,

1156: specifying the relative value of information about systems with

1157: various model parameters.  For example, a planet search aiming to

1158: discover low-mass planets could specify a $\theta(\vec{m})$ which

1159: decreases for high mass planets.

1160:

1161:

1162: \subsection{Information in Marginal Distributions}

1163: \label{SecMultiAltMargin}

1164:

1165: Another case worth considering is when some model parameters are of

1166: more scientific interest than others.  For example, the constant

1167: stellar velocities, $C_0$ and $C_1$, contain no information about

1168: extrasolar planets.  Similarly, the angle $\phi_o$ may be of less

1169: interest than other model parameters such as the orbital period, $P$.

1170: As an extreme example, one might be interested in determining only if

1171: a star has a planet and not be interested in measuring the orbital

1172: parameters.  In such cases, it is useful to subdivide the set of model

1173: parameters, $\vec{m}$, renaming them $(\vec{m},\vec{n})$, where

1174: $\vec{m}$ is the set of scientifically interesting model parameters

1175: and $\vec{n}$ is the set of ``nuisance'' parameters.  Now, we can

1176: marginalize over the nuisance parameters and consider the expected

1177: change in the information contained in the probability distribution,

1178: $p(\vec{m}|\vec{d}') = \int d\vec{n} \, p(\vec{m}, \vec{n} | \vec{d}')$, rather

1179: than using the joint distribution $p(\vec{m}, \vec{n} | \vec{d}')$

1180: as before.  Thus, the relevant utility function becomes

1181: %

1182: \be

1183: E\left[\Delta I\left\{p(\vec{m}|\vec{d}') \right\} \right] = \int d\vec{m} \, p(\vec{m}|\vec{d}) \int d\vec{n} \, p(\vec{n}|\vec{d},\vec{m}) \int dv \, p(v|\vec{m},\vec{n}) \log \left[ \frac{ \int d\vec{n}' p(\vec{n}'|\vec{d},\vec{m}) p(v|\vec{m},\vec{n}') }{\int d\vec{m}' \int d\vec{n}' p(\vec{m}',\vec{n}'|\vec{d}) p(v|\vec{m}',\vec{n}')} \right].

1184: \ee

1185: %

1186: Here the first two integrals sample over all possible models and the

1187: third integral samples over all possible values of the observed

1188: velocity, just as before (e.g., Eqn.\ 24).  Indeed, if the logarithm

1189: were split into the difference of two terms, then the term arising

1190: from the denominator would also be mathematically equivalent to the

1191: (negative of) Eqn.\ 24.  However, the term arising from the numerator

1192: causes this utility function to differ from Eqn.\ 23, as it no longer

1193: simplifies to equal the information of the upcoming observation if the

1194: actual velocity were known.  For the utility function in Eqn.\ 23, the

1195: entropy of the predictive distribution at the time of a hypothetical

1196: future observation is compared to the entropy of the probability

1197: distribution for the observed value given the actual value.  However,

1198: for this choice of utility function, the entropy of the predictive

1199: distribution at the time of a hypothetical future observation is

1200: compared to the entropy of the predictive distribution marginalized

1201: over the nuisance parameters (evaluated for the same time).  Thus,

1202: both terms depend on the time, and a hypothetical future precise

1203: observation would be expected to contribute less information at times

1204: where the predictive distribution is more sensitive to the values

1205: of the nuisance parameters.
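
To make this comparison explicit, it may help to define two shorthands (introduced here only for illustration): $p(v|\vec{d},\vec{m}) \equiv \int d\vec{n}' \, p(\vec{n}'|\vec{d},\vec{m}) p(v|\vec{m},\vec{n}')$ for the numerator of the logarithm and $p(v|\vec{d}) \equiv \int d\vec{m}' \int d\vec{n}' \, p(\vec{m}',\vec{n}'|\vec{d}) p(v|\vec{m}',\vec{n}')$ for the denominator.  Neither depends on $\vec{n}$, so the integral over $\vec{n}$ can be carried out in each term, and the utility function above can be rewritten as
%
\be
E\left[\Delta I\left\{p(\vec{m}|\vec{d}') \right\} \right] = \int d\vec{m} \, p(\vec{m}|\vec{d}) \int dv \, p(v|\vec{d},\vec{m}) \log \left[ p(v|\vec{d},\vec{m}) \right] - \int dv \, p(v|\vec{d}) \log \left[ p(v|\vec{d}) \right].
\ee
%
In this form, the first term is the posterior average of the information in the predictive distribution conditioned on the interesting parameters, and the second term is the information in the full predictive distribution.  Their difference is the mutual information between the upcoming velocity measurement and the interesting parameters, given the existing data.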

1206:

1207: If we marginalize over all the fit parameters, then we can obtain

1208: posterior distributions for the probability that the system does or

1209: does not have a planet.  A scheduling algorithm based on maximizing

1210: the expected increase in information contained in this distribution is

1211: expected to detect planets very efficiently.  Indeed, we can see that

1212: this is the case from the dotted red curve in Figs.~2 \& 3.

1213:

1214:

1215: \subsection{Non-Greedy Algorithms}

1216: \label{SecMultiAltNonGreedy}

1217:

1218: So far we have restricted our attention to ``greedy'' algorithms,

1219: since the utility functions have only considered the effect of a

1220: single additional observation (Cormen et al.\ 2001).  As we have demonstrated, these

1221: greedy algorithms perform quite well in the cases which we considered.

1222: However, it is worth briefly discussing alternative ``non-greedy''

1223: utility functions.

1224:

1225:

1226:

1227: Let us consider the case where making one additional observation is

1228: expected to provide little or no increase in information, but making

1229: multiple additional observations (perhaps at particular times) would

1230: be expected to provide significant additional information.  A simple

1231: example of such a situation is when a new target star is added to the

1232: survey.  The first two observations of any star can not result in the

1233: detection of a planet.  Yet, if the current target list has been

1234: searched thoroughly, then it could be more productive to add new

1235: stars to the target list.  While this effect is increased when the

1236: constant velocities are considered nuisance parameters, there is

1237: some effect even when $C_0$ and $C_1$ are considered parameters of interest.

1238:

1239: In principle, cases such as these can be handled by calculating the

1240: expected increase in information at some later time by which several

1241: additional observations could be made.  Multiple additional

1242: observations are considered by sampling over the various possible

1243: combinations of future observations.  This generalization has the

1244: obvious drawback that it is necessary to perform an additional

1245: integral over various possible combinations of observing schedules.

1246: One possible simplification is to relax the assumption that the total

1247: number of observations is held fixed (only for the purpose of

1248: evaluating the integral over future observing schedules) and to

1249: consider observing schedules for each star separately.  Then the integral

1250: over future observing schedules can be performed separately for each

1251: star by assuming a constant probability of making an observation at

1252: each possible future observing time.

1253:

1254: In principle, the above algorithm could identify combinations of

1255: multiple observations which are significantly more valuable than would

1256: be estimated by a greedy algorithm.  Another benefit of the above

1257: algorithm is that it can automatically account for differences in the

1258: fraction of time during which stars are observable (e.g., due to

1259: seasonal effects).  We have performed a few small simulations using

1260: the non-greedy algorithm described above and found that it provided

1261: a small increase in observing efficiency for the cases which we

1262: considered.  However, a thorough comparison of greedy and non-greedy

1263: algorithms will require significantly more computational power.

1264:

1265: \subsection{Utility Functions with Costs}

1266: \label{SecAltCosts}

1267:

1268: The utility function can also include information about the cost of

1269: making a given observation.  While economic cost is often used in

1270: Bayesian decision theory, for our applications it is preferable to use

1271: the observing time necessary to perform an observation as the measure of

1272: cost.  The observing time required for a given observation includes

1273: the time necessary to collect the desired number of photons, but also

1274: the time required for CCD readout and slewing the telescope from the

1275: previous target to the next target.  The time required for CCD readout

1276: is known and a constant for each exposure.  (For faint target stars,

1277: it may be necessary to use multiple exposures for a single

1278: observation, if the barycentric correction would vary significantly

1279: during the necessary integration time.)  For target stars near the

1280: previous target, the telescope can often slew to the target during CCD

1281: readout, resulting in no or minimal loss of observing time to slewing.

1282: %

1283: Typically, observing schedules within a night are chosen so that the

1284: vast majority of the observing time is spent integrating on targets.

1285: The main factor in determining the integration time necessary is the

1286: apparent magnitude of the target star.  In principle, an adaptive scheduling algorithm could plan an entire night's

1287: observations, considering the cost to observe each possible target

1288: throughout the night.  However, the exact amount of

1289: integration time depends on the atmospheric extinction (which we

1290: assume is proportional to the airmass) as well as time-variable atmospheric

1291: conditions that are generally not known in advance.

1292: %

1293: Therefore, we do not believe that it is worthwhile to include such a

1294: fine level of scheduling when selecting targets for a given night.  In

1295: practice, atmospheric conditions (e.g., seeing and cloud cover) change

1296: throughout a night, making it impossible to know in advance the number

1297: of targets that can be observed during that night.  Given the

1298: computational complexity of these algorithms, it is impractical to

1299: perform new analyses throughout the night as atmospheric conditions

1300: change.  By assuming that atmospheric extinction is a function of

1301: airmass only (e.g., ignoring the possibilities of cloud cover or

1302: atmospheric conditions changing throughout the night), we can compute

1303: the expected amount of observing time necessary for each possible

1304: target star.  Then, we identify the targets with the greatest expected

1305: increase in information per unit observing time (rather than per

1306: observation).
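
A minimal sketch of this selection rule follows.  The symbols $U_s$, $\tau_s$, $t_{{\rm exp},s}$, $t_{\rm read}$, $t_{{\rm slew},s}$, $V_s$, $k$, and $X_s$ are introduced here for illustration only, and the scaling assumes photon-limited exposures that collect a fixed number of photons.  Let $U_s$ be the expected increase in information from observing star $s$ and let $\tau_s$ be the expected observing time,
%
\be
\tau_s = t_{{\rm exp},s} + \max\left(t_{\rm read}, t_{{\rm slew},s}\right), \qquad t_{{\rm exp},s} \propto 10^{0.4 \left(V_s + k X_s\right)},
\ee
%
where $V_s$ is the apparent magnitude, $X_s$ is the airmass, and $k$ is the extinction in magnitudes per unit airmass.  The maximum reflects the case in which the telescope slews during CCD readout.  The preferred target is then the star that maximizes
%
\be
\frac{U_s}{\tau_s}.
\ee
%
For example, for an illustrative value of $k = 0.15$, observing at $X_s = 2$ rather than $X_s = 1$ lengthens the exposure by a factor of $10^{0.06} \approx 1.15$, while a star one magnitude fainter requires an exposure longer by a factor of $10^{0.4} \approx 2.5$.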

1307:

1308: % Therefore, we assume

1309: %that, if a candidate target star were to be observed on a given night,

1310: %then it would be observed at the optimal time during the night.  For

1311: %most stars, this means that we assume an airmass equal to the minimum

1312: %airmass for that star during the night.  For stars which transit

1313: %during the daytime or twilight, we assume the minimum of the airmass

1314: %at the beginning and ending of the available observing time.

1315:

1316:

1317: \section{Discussion}

1318: \label{SecDiscussion}

1319:

1320: %\subsection{Value of Technique}

1321:

1322: We have developed a practical algorithm for applying adaptive

1323: scheduling to radial velocity planet searches.  The algorithms

1324: presented are rigorously grounded in Bayesian data analysis and

1325: information theory, and still permit specialization for the specific

1326: goals of the observing program.  While such adaptive scheduling

1327: algorithms are computationally demanding, they can provide dramatic

1328: benefits.  Already, there is some element of ``adaptive'' scheduling

1329: due to human feedback (e.g., observers identify interesting targets to

1330: be observed more frequently, time allocation committees decide how

1331: much and when observing time will be made available).  Unfortunately,

1332: quantifying these effects is extremely difficult.  One advantage of

1333: following an algorithmic procedure is that any biases can be

1334: recognized, simulated, and quantified.

1335:

1336: As we demonstrated in Fig.~2, the use of an adaptive

1337: scheduling algorithm can significantly increase the number of planet

1338: detections in a survey with a fixed amount of observing time.

1339: Perhaps more significantly, the additional planets found tend to have

1340: the smallest velocity amplitudes, and hence the smallest masses, of those

1341: detected in the survey, as seen in Fig.~3.  For the survey parameters

1342: which we considered, we found that the least massive planet detected

1343: in a survey is a factor $\sim 2$ less massive when using our adaptive

1344: scheduling algorithms than when scheduling observations randomly.  It

1345: is important to appreciate that the increased number of planet

1346: detections does not require reducing the precision of measurements of

1347: orbital parameters.  In fact, the same algorithms which increase the

1348: number of planet detections can simultaneously improve the precision with

1349: which most of the orbital parameters are measured (Fig.~5).  While the precision is

1350: comparable for planets with very large velocity amplitudes, for low

1351: mass planets the adaptive algorithms typically measure orbital

1352: parameters with more than an order of magnitude smaller uncertainties.

1353: %

1354: % While the value of detecting

1355: % additional planets and measuring their orbital parameters more

1356: % precisely is subjective, it is clear that adaptive scheduling

1357: % algorithms can decrease the amount of telescope time needed for a

1358: % given scientific goal.  Thus, adaptive scheduling algorithms can be of

1359: % great value to the entire astronomical community.

1360:

1361:

1362: %\subsection{Challenges Addressed}

1363:

1364: The adaptive algorithms presented in this paper address several

1365: important challenges raised by previous studies.  In this paper, we

1366: dramatically reduce the computational requirements for Bayesian

1367: adaptive scheduling algorithms relative to Loredo (2004).  We

1368: accomplish this by separating the integrals over orbital period from

1369: the integrals over the other orbital parameters.  By assuming planets

1370: on circular orbits, the remaining integrals become Gaussian integrals

1371: which can be evaluated analytically.  The increased efficiency has

1372: made several other advances possible.  The increased computational

1373: efficiency of our algorithms allows us to conduct Monte Carlo

1374: simulations of entire planet search programs and quantify the increase

1375: in efficiency that adaptive scheduling algorithms offer.  More

1376: significantly, our algorithms make it practical to perform a Bayesian

1377: analysis of the orbital parameters even when there are extremely weak

1378: constraints on the orbital parameters, such as when only a few

1379: observations have been made.  This has allowed us to apply Bayesian

1380: hierarchical modeling to simultaneously consider both possibilities

1381: that a star has no planet and that the star has one planet,

1382: extending previous Bayesian techniques which assumed there was a

1383: single planet (Loredo 2004; Ford 2005, 2006).  By constructing hierarchical

1384: models, adaptive scheduling algorithms naturally consider the

1385: problems of planet detection and orbital parameter estimation

1386: simultaneously.  We also present several generalizations of our

1387: computationally efficient algorithms.  The generalized algorithms can

1388: accommodate a variety of utility functions which can be customized to

1389: the specific goals of an observational program.  These generalizations

1390: also allow the adaptive scheduling algorithm to consider practical

1391: complications such as stars of different brightnesses and observing

1392: seasons.

1393:

1394:

1395: %\subsection{New Challenges}

1396:

1397: This work also raises several new challenges.  Clearly, it would be

1398: desirable to generalize our algorithms to reflect the full variety of

1399: planetary systems.  For example, we assume that each star has a

1400: maximum of one planet, while there are already $\sim$12 stars known to

1401: have multiple planets.  In principle, it is easy to generalize our

1402: hierarchical models to allow for multiple planets, but in practice

1403: each additional planet would introduce an additional integral which

1404: cannot be evaluated analytically and would dramatically increase the

1405: required computations.

1406:

1407: Additionally, we have assumed circular planetary orbits, while most

1408: extrasolar planets have significant eccentricities.  We have conducted

1409: some simulations in which we consider a population of stars with

1410: planets on eccentric orbits but the adaptive scheduling algorithm

1411: assumes circular orbits.  While the adaptive scheduling algorithms

1412: still detect planets and measure orbital parameters more efficiently

1413: than non-adaptive algorithms, the improvement in efficiency is less

1414: than when applied to a population of stars with planets on circular

1415: orbits.  Intuitively, we expect that planets on eccentric orbits could

1416: benefit from adaptive scheduling algorithms even more than planets on

1417: circular orbits.  Therefore, we expect that the reduction in

1418: efficiency is due to the adaptive scheduling algorithm not having

1419: access to the appropriate model.  Again, in theory, it is

1420: straightforward to include eccentric orbits in a Bayesian analysis,

1421: but this would introduce an additional two integrals which can not be

1422: evaluated analytically and hence would significantly increase the

1423: required computations.  Incorporating eccentric orbits into adaptive

1424: scheduling algorithms could be accomplished by brute force, or it might

1425: be sufficient to use approximate models which expand the orbital

1426: motion in the eccentricity.

1427:

1428: Our simulations have also assumed that planetary perturbations are the

1429: only cause of variations in the star's radial velocity.  In practice,

1430: many stars appear to have intrinsic variability commonly known as

1431: stellar ``jitter'' which can be comparable to or exceed the

1432: observational uncertainties in the radial velocity measurements (Saar,

1433: Butler, \& Marcy 1998).  We have conducted some simulations in which

1434: we consider a population of stars where each star has a Gaussian

1435: jitter, but the adaptive scheduling algorithm assumes no jitter.  The

1436: primary effect of the jitter is to increase the false alarm rate

1437: (fraction of stars with no planets for which the Bayesian analysis

1438: determines the probability of having no planet is less than 0.1\%).

1439: While the threshold for announcing a planet detection can be altered

1440: to maintain a 0.1\% false alarm rate, such a treatment is simplistic

1441: and does not properly account for the unknown jitter.  The problems

1442: posed by jitter can be mitigated by replacing the observational

1443: uncertainties with the observational uncertainties added in quadrature

1444: to the amount of jitter expected based on the star's spectral

1445: properties.  Still, this approach could result in poor performance

1446: when the estimate for the stellar jitter is inaccurate.  Indeed, one

1447: of the advantages of Bayesian analysis is that it can naturally allow

1448: for noise sources of unknown magnitude.  A more rigorous analysis

1449: would treat the stellar jitter of each star as an unknown along with

1450: the orbital parameters.  Unfortunately, adding the stellar jitter as a

1451: model parameter introduces an additional integral and requires

1452: additional computation time.  For the purpose of testing adaptive

1453: scheduling algorithms when confronted with jitter, it may be useful to

1454: consider a special case in which the observational uncertainties are

1455: the same for each observation.  In this case, the integral over the

1456: unknown stellar jitter can be performed separately from the other

1457: integrals and reduces to a sum of exponential integrals which

1458: can be performed analytically (Cumming 2004).
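
The quadrature substitution mentioned above can be written explicitly.  The symbols $\sigma_i$, $\sigma_J$, and $\sigma_{{\rm eff},i}$ are assumptions introduced here for illustration: $\sigma_i$ is the observational uncertainty of the $i$th velocity measurement and $\sigma_J$ is the jitter expected from the star's spectral properties.  The effective uncertainty used in the analysis would be
%
\be
\sigma_{{\rm eff},i}^2 = \sigma_i^2 + \sigma_J^2.
\ee
%
For instance, $\sigma_i = 3$\,m/s and $\sigma_J = 4$\,m/s would give $\sigma_{{\rm eff},i} = 5$\,m/s, so that the jitter, rather than the measurement precision, dominates the effective noise.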

1459:

1460: This paper has focused its attention on adaptive scheduling algorithms

1461: for radial velocity planet searches.  Targeted astrometric planet

1462: searches such as those planned for the Space Interferometry Mission

1463: (SIM) are another obvious application of our methods.  Given the

1464: cost and finite lifetime of space missions such as SIM, observing time

1465: is extremely valuable and it is even more important to make the best

1466: use of the available observing time.  Successfully incorporating Bayesian

1467: adaptive design would require that the observation schedule not be fixed

1468: far in advance by logistical constraints.  Mission designers should aim

1469: to allow for frequent upload of revised target lists.

1470: In principle, the two types of

1471: planet searches are quite similar, with the main difference that

1472: astrometric surveys can measure the stellar position in two dimensions

1473: while radial velocity surveys can measure the stellar velocity in only

1474: one dimension.  Therefore, we expect that adaptive scheduling

1475: algorithms could also provide a significant improvement in the

1476: efficiency of the SIM planet searches, resulting in more planets being

1477: discovered, planet masses and orbital parameters being determined more

1478: accurately, and {\em significantly increasing the sensitivity of SIM

1479: to nearly-Earth-mass planets and multiple planet systems}.  For this

1480: potential benefit to be realized, the SIM design must be sufficiently

1481: flexible that the observing schedule and target stars can be chosen

1482: with a lead time much less than the duration of the mission,

1483: preferably a lead time of a month or less.  To fully simulate such

1484: an adaptive scheduling algorithm, it will be important to incorporate

1485: the practical observing constraints and costs (e.g., limits on the possible pointing

1486: directions and the time required to slew to different positions).

1487:

1488: \acknowledgments

1489:

1490: We thank Debra Fischer, Jeremy Goodman, Geoff Marcy, Scott

1491: Tremaine, and an anonymous referee for their suggestions.

1492: %

1493: This research was supported in part by the Miller Institute for Basic

1494: Research, NASA grants NAG5-10456 and NNG04H44g, and by NASA through

1495: Hubble Fellowship grant HST-HF-01195.01A awarded by the Space

1496: Telescope Science Institute, which is operated by the Association of

1497: Universities for Research in Astronomy, Inc., for NASA, under contract

1498: NAS 5-26555.

1499:

1500:

1501: \newpage

1502:

1503: \begin{thebibliography}{}

1504: \baselineskip 11pt

1505: \parsep 0pt

1506: \itemsep -3pt

1507:

1508: \bibitem[]{} Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C. 2001 Introduction to Algorithms.  (Cambridge, MA: MIT Press)

1509:

1510: %\bibitem[]{} Cumming, A., Marcy, G., \& Butler, R.P.\ 1999,

1511: %\newblock{ ApJ, } 526, 890.

1512:

1513: \bibitem[]{} Cumming, A., Marcy, G.W., Butler, R.P., \& Vogt, S.S. 2002,

1514: in ``Scientific Frontiers in Research on Extrasolar Planets'' eds. D. Deming \& S. Seager; ASP Conferences Series 294, 27.

1515:

1516: \bibitem[]{} Cumming, A. 2004

1517: \newblock{ MNRAS, } 354, 1165.

1518:

1519: \bibitem[]{} Endl, M., Kurster, M., Els, S., Hatzes, A.P., Cochran, W.D., Dennerl, K., Dobereiner, S. 2002 A\&A 392, 671.

1520:

1521: \bibitem[]{} Fischer, D.A., Laughlin, G., Butler, R.P., Marcy, G.W., Johnson, J., Henry, G., Valenti, J., Vogt, S.S., Ammons, M., Robinson, S., Spear, G., Strader, J., Driscoll, P., Fuller, A., Johnson, T., Manrao, E., McCarthy, C., Munoz, M., Tah, K.L., Wright, J., Ida, S., Sato, B., Minniti, D. 2005,

1522: \newblock{ ApJ, } 620, 481.

1523:

1524: \bibitem[]{} Ford, E.B. 2004,

1525: \newblock{  PASP, } 116, 1083.

1526:

1527: \bibitem[]{} Ford, E.B. 2005,

1528: \newblock{ AJ, } 129, 1706.

1529:

1530: \bibitem[]{} Ford, E.B. 2006,

1531: \newblock{ ApJ, } 642, 505.

1532:

1533: \bibitem[]{} Gaudi, B.S., Seager, S., Mallen-Ornelas, G. 2005,

1534: \newblock{ ApJ, } 623, 472.

1535:

1536: \bibitem[]{} Konacki, M., Torres, G., Jha, S., Sasselov, D. 2003,

1537: \newblock{ Nature, } 421, 507.

1538:

1539: \bibitem[]{} Loredo, T.J. 2004, in ``Bayesian Inference And Maximum Entropy Methods In Science And Engineering: 23rd International Workshop'' ed. G. J. Erickson and Y. Zhai; AIP Conference Proceedings 707, 330. % -346

1540:

1541: \bibitem[]{} Pourbaix, D. 2002,

1542: \newblock{ A\&A, } 385, 686. % -692.

1543:

1544: \bibitem[]{} Rasio, F.A., Tout, C.A., Lubow, S.H., Livio, M. 1996,

1545: \newblock{ ApJ, } 470, 1187.

1546:

1547: \bibitem[]{} Saar, S.H., Butler, R.P., Marcy, G.W. 1998,

1548: \newblock{ ApJ, } 498, 153.

1549:

1550: \bibitem[]{} Sivia, D.S. 1996 Data Analysis: A Bayesian Tutorial.  (New York, NY: Oxford University Press)

1551:

1552: \bibitem[]{} Sozzetti, A., Casertano, S., Brown, R.A., Lattanzi, M.G. 2002,

1553: \newblock { PASP, } 114, 117.  % 2002 astro-ph/0207222 % SIM: single planet

1554:

1555: \bibitem[]{} Tabachnik, S. \& Tremaine, S. 2002,

1556: \newblock{ MNRAS, } 335, 151.

1557:

1558: \bibitem[]{} Udry, S., Mayor, M., Clausen, J., Freyhammer, L., Helt, B., Lovis, C., Naef, D., Olsen, E., Pepe, F., Queloz, D., Santos, N. 2003,

1559: \newblock{ A\&A, } 407, 679.

1560:

1561: %\bibitem[]{} Whetherill, G.B. \& Glazebrook, K.D. 1986,

1562: %{\em Sequential Methods in Statistics,}

1563: %% (Monographs on applied probability and statistics)

1564: %New York: Chapman and Hall Ltd.

1565:

1566: \end{thebibliography}

1567:

1568: \newpage

1569:

1570: \begin{figure}

1571: \label{FigN2k}

1572: \plotone{f1.eps}

1573: \caption{Here we show the expected value for the radial velocity

1574: (solid line) during October and November 2004 (tick marks every five

1575: days) of four different target stars from the N2K project.  The dotted

1576: lines show the 95\% confidence intervals for $p(v|d)$ as a function of

1577: time.  These predictive distributions were based on observations taken

1578: between January 10 and July 11, 2004 (not shown) as a part of the N2K

1579: survey.  The arrows near the bottom of each panel indicate the time

1580: when the entropy of the predictive distribution is maximized.  The

1581: arrows near the bottom of each panel indicate the time when the

1582: standard deviation of the predictive distribution is maximized.  The

1583: predictive distributions are based on 4 (top left), 5 (top right), 11

1584: (bottom left), and 6 (bottom right) observations.  The best-fit

1585: orbital periods are near 53d (top left), 52d (top right), 16d (bottom

1586: left), and 27d (bottom right).

1587: %

1588: %For some  stars there are some times where observations would be particularly valuable.

1589: }

1590: %

1591: \end{figure}

1592:

1593: \begin{figure}

1594: \label{FigHd88133}

1595: \plotone{f2.eps}

1596: \caption{Here we show the expected change in the expectation of the information

1597: content following a single observation as a function of the time, $t$ (dotted line).

1598: This is to be compared to the log of the standard deviation of the predictive distribution (solid line).

1599: The figure is based on the first five of the observations of HD 88133, the first planet discovered by the N2K project (Fischer et al. 2004).  The times of the observations are indicated with vertical arrows.  While the two curves show similarities, the most/least favorable times according to an information theory analysis are not always coincident with what would be predicted using only the variance of the predictive distribution.

1600: %

1601: }

1602: %

1603: \end{figure}

1604:

1605: \begin{figure}

1606: \plotone{f3.eps}

1607: \caption{This figure shows the number of planets detected (stars for

1608: which the posterior probability of the no-planet model,

1609: $p(0|\vec{d})$, is less than 0.1\%) as a function of the number of

1610: observing nights.  The different line styles present the results for

1611: different scheduling algorithms: regular (solid black), the adaptive

1612: algorithm from \S\ref{SecMultiMaxEntropy} (dashed blue), and the

1613: %

1614: %FIXED

1615: %

1616: adaptive algorithm from \S\ref{SecMultiAltMargin} (dotted red), which

1617: %

1618: %FIXED

1619: %

1620: maximizes the detections at the expense of accuracy in orbital

1621: parameters by marginalizing over orbital parameters (compare to Fig.\

1622: 6).  The results for each scheduling algorithm have been averaged over

1623: ten simulated observing programs.

1624: %

1625: }

1626: %

1627: \end{figure}

1628:

1629: \begin{figure}

1630: \plotone{f4.eps}

1631: \caption{Here we show the fraction of planets detected as a function

1632: of the velocity semi-amplitude, $K$.  In these simulations each

1633: observation had an observational uncertainty of

1634: $\sigma_{\mathrm{obs}} = 3$~m/s.  The planet searches based on both adaptive algorithms are

1635: significantly more efficient at detecting planets with

1636: $K\sim 1$--$3\,\sigma_{\mathrm{obs}}$ than the fixed schedule (solid black line).  The

1637: line styles are as in Fig.~3.  }

1638: %

1639: \end{figure}

1640:

1641: \begin{figure}

1642: \plotone{f5.eps}

1643: \caption{In this figure we present a histogram of the number of times a star was

1644: observed.  For the regular scheduling algorithm, each star was

1645: observed ten times (not shown).  The adaptive algorithm from

1646: \S\ref{SecMultiMaxEntropy} (dashed blue) and the adaptive algorithm from

1647: %

1648: % FIXED

1649: %

1650: \S\ref{SecMultiAltMargin} (dotted red) both devote a large number of

1651: %

1652: % FIXED

1653: %

1654: observations to a few stars, significantly increasing the sensitivity

1655: to low-mass planets around these stars.  }

1656: %

1657: \end{figure}

1658:

1659: \begin{figure}

1660: \plotone{f6.eps}

1661: \caption{Here we show the median precision of the measurements of the orbital parameters, $P$ (top panel) and $K$ (bottom panel), as a function of the velocity semi-amplitude, $K$.  Both adaptive scheduling algorithms typically perform significantly better than the fixed schedule (solid black line).  However, the adaptive algorithm presented in \S\ref{SecMultiMaxEntropy} often allocates only a small number of observations to planets with large velocity amplitudes, and hence their orbit determinations are not as precise as with the fixed schedule.  The line styles are as in Fig.~3.

1662: }

1663: %

1664: \end{figure}

1665:

1666: \end{document}

1667: