1: %\documentclass[manuscript,onecolumn]{aastex}
2: \documentclass[manuscript,onecolumn]{emulateapj}
3: %\usepackage{emulateapj}
4:
5: %\documentclass[preprint,11pt]{aastex}
6:
7: \newcommand{\be}{\begin{equation}}
8: \newcommand{\ee}{\end{equation}}
9:
10: \slugcomment{accepted to AJ}
11: \shorttitle{Adaptive Scheduling}
12: \shortauthors{Ford}
13: \begin{document}
14:
15: \title{Adaptive Scheduling Algorithms for Planet Searches}
16:
17: \author{Eric B.\ Ford\altaffilmark{1,2,3,4,5}}
18:
19: %\email{eford@astro.ufl.edu}
20: \altaffiltext{1}{Department of Astrophysical Sciences,
21: Princeton University,
22: Peyton Hall,
23: Princeton, NJ 08544-1001, USA}
24: \altaffiltext{2}{Astronomy Department,
25: 601 Campbell Hall,
26: University of California at Berkeley,
27: Berkeley, CA 94720-3411, USA}
28: \altaffiltext{3}{Harvard-Smithsonian Center for Astrophysics,
29: MS-51,
30: 60 Garden Street,
31: Cambridge, MA 02138, USA}
32: \altaffiltext{4}{Hubble Fellow}
33: \altaffiltext{5}{present address: Department of Astronomy,
34: University of Florida,
35: 211 Bryant Space Science Center,
36: P.O. Box 112055,
37: Gainesville, FL, 32611-2055, USA}
38:
39:
40:
41: \begin{abstract}
High-precision radial velocity planet searches have surveyed
over 2000 nearby stars and detected over 200 planets. While
44: these same stars likely harbor many additional planets, they will
45: become increasingly challenging to detect, as they tend to have
46: relatively small masses and/or relatively long orbital periods.
47: Therefore, observers are increasing the precision of their
48: observations, continuing to monitor stars over decade timescales, and
49: also preparing to survey thousands more stars. Given the considerable
50: amounts of telescope time required for such observing programs, it is
important to use the available resources as efficiently as possible.
52: Previous studies have found that a wide range of predetermined
53: scheduling algorithms result in planet searches with similar
54: sensitivities. We have developed adaptive scheduling algorithms which
55: have a solid basis in Bayesian inference and information theory and
56: also are computationally feasible for modern planet searches. We have
57: performed Monte Carlo simulations of plausible planet searches to test
58: the power of adaptive scheduling algorithms. Our simulations
59: demonstrate that planet searches performed with adaptive scheduling
60: algorithms can simultaneously detect more planets, detect less massive
61: planets, and measure orbital parameters more accurately than
62: comparable surveys using a non-adaptive scheduling algorithm. We
63: expect that these techniques will be particularly valuable for the
64: N2K radial velocity planet search for short-period planets as well as
65: future astrometric planet searches with the Space Interferometry
66: Mission which aim to detect terrestrial mass planets.
67: \end{abstract}
68:
\keywords{planetary systems -- methods: statistical
70: -- techniques: radial velocities}
71:
72:
73: \section{Introduction}
74:
Radial velocity planet searches have surveyed over 2000 nearby
solar-type stars and discovered over 200 planets. The surveys require many
high-precision radial velocity observations of each star in the
78: survey, and hence a significant amount of observing time.
79: %
80: For example, the N2K project has recently begun surveying the next
81: $\sim 2000$ nearby stars for planets. This project aims to discover
82: dozens of hot Jupiters and hopefully additional transiting planets
83: (Fischer et al.\ 2004). Given the large target list for the N2K
84: project, it is essential that inferences about the presence of planets
85: and their orbits be made as efficiently as possible. The
86: observing program aims to take three observations of each star on
87: consecutive nights, followed by an additional observation one to a few
88: months later. In many cases, it is clear after a few observations
89: that the radial velocity observations have a dispersion significantly
90: greater than would be expected due to measurement errors only.
91: However, there may still be a large range of possible orbital
92: solutions, and several additional observations will often be
93: required to determine the planetary orbits. Given the considerable
94: observation time required for such planet searches and the value of
95: telescope time, it is important that these surveys be as
96: efficient as possible.
97:
98: Previous studies have demonstrated that a variety of observing
99: schedules result in comparable efficiencies for detecting planets
100: (Sozzetti 2002, Ford 2004). However, these studies have only
considered {\em non-adaptive} observing schedules, i.e., schedules
102: that are fully determined before any observations are taken. In this
103: paper, we develop and test the efficiency of {\em adaptive}
104: scheduling algorithms. We describe how adaptive scheduling algorithms
105: can help increase the efficiency of planet searches by utilizing
106: the information available from the previous observations to plan future
107: observations.
108:
109: % It is possible to develop adaptive scheduling algorithms based on the frequentist methods of sequential statistics (see Wetherill \& Glazebrook 1986). For example, a planet could be considered detected when a $\chi^2$ statistic exceeds a threshold which depends on the number of observations made. Unfortunately, the thresholds typically must be determined via simulation. Since one must choose a threshold function (and not just a constant threshold as in the case of non-sequential statistics), there are an infinite number of threshold functions which have the same false alarm rate. One must use large simulations to estimate the power of various threshold functions. More importantly, this approach typically relies on making decisions (e.g., Is the null hypothesis rejected? Should we stop observing this star?) that introduce biases which much should be fully understood before making inferences. In practice, large simulations are necessary to understand these biases. Finally, with this type of adaptive scheduling algorithm it is not easy to test new hypotheses which were formulated after some data has been collected.
110:
111: Adaptive scheduling algorithms can be based on Bayesian
112: inference (Loredo 2004). Within this framework, the need to make decisions is
113: minimized. For example, the experimenter does not say that a planet
114: has been detected, but rather states that the posterior probability for
115: the null hypothesis is some small value. This eliminates the need to
choose threshold functions. More importantly, by eliminating
117: decisions and basing the scheduling algorithm on the posterior
118: probability distribution, the posterior probability distribution is
119: not biased by the choice of observing schedule. Additionally, it is
120: straightforward to test new hypotheses which will inevitably be
121: formulated after making some observations. For these reasons, we have
122: developed adaptive scheduling algorithms within the Bayesian
123: framework, always considering all possible models (weighted according
124: to their posterior probability).
125:
It should be noted that the use of adaptive scheduling algorithms,
even those based on Bayesian inference, does affect the distribution
of the posterior distributions. Indeed, that is the purpose of
129: employing such algorithms. For example, if an adaptive scheduling
130: algorithm is chosen to increase the sensitivity of an observing
131: program for detecting planets, then it is expected that an ensemble of
132: surveys employing the adaptive scheduling algorithm will result in
133: detecting more planets than a comparable ensemble of surveys using a
fixed scheduling algorithm. This is also true
of non-adaptive scheduling algorithms. For example, an observing
program which makes observations over a long duration will be
sensitive to planets with orbital periods comparable to or less than the
138: survey duration, but there may be ambiguities in determining the
139: orbital period associated with aliasing. If an observing program
140: obtained the same number of observations with the same precision, but
141: performed all the observations during a smaller interval of time, then
142: the shorter survey would be less sensitive to planets with orbital
periods longer than the short survey's duration, but would likely
144: reduce aliasing ambiguities for short period planets. Thus, when
145: analyzing the properties of a population of stars and planets, it is
always important to account for the scheduling algorithm used.
147:
148: Our approach applies the principles of Bayesian inference and
149: information theory to guide the choice of observation times (and later
choice of target stars). First, following Loredo (2004), we assume a prior
151: for both the probability that each target star has a planet and the
152: distribution of orbital periods and masses of planets. Second, a
153: small number of observations are taken of each target star. Bayesian
154: inference is used to calculate the posterior probability distribution
155: for all model parameters. Then, the posterior probability
156: distribution for the model parameters is used to calculate the
157: predictive distribution, the posterior probability distribution for
158: the radial velocities at some future time. By comparing the
159: information contained in the predictive distribution at various future
160: times, it is possible to choose observing times at which additional
161: observations would be most valuable. This technique requires
162: performing integration over several variables and in general is
163: extremely computationally intensive. In this paper we describe a
164: relatively fast algorithm for performing the necessary integrations.
165:
166: %
167: % Loredo (2004) described such an algorithm and presented an example of how adaptive scheduling algorithms could result in improving the precision of orbital parameters more rapidly than the usual scaling law, the inverse square root of the number of observations.
168:
169: While it would be desirable to observe each star in a
170: survey at the optimal times for each particular star, it is more
171: realistic to consider a planet survey which is able to perform a fixed
172: number of observations at specific times. In this case, the
173: information contained in the predictive distributions for several
174: stars can be used to select which target star should be observed at a
175: given time.
176:
177: We describe the algorithm for choosing observing times of a single
178: star in \S2. In \S3 we describe a generalized algorithm that
179: allows for multiple target stars,
180: along with simulations which demonstrate the power of adaptive
181: scheduling algorithms. In \S4 we present generalizations that allow
182: observing schedules to be optimized for meeting specific goals,
183: such as detecting planets in the habitable zone or maximizing the
184: number of planets detected. Finally, we discuss the implications of
185: our findings and the challenges which remain in \S5.
186:
187:
188: \section{Adaptive Scheduling for a Single Target Star}
189: \label{SecSingleStar}
190:
191: \subsection{Priors}
192: \label{SecPriors}
193:
194: For each target star we assume a prior probability, $p(0)$, for the
195: null hypothesis that the radial velocity observations are consistent
196: with a constant radial velocity. We assume a probability,
197: $p(1)=1-p(0)$, that the star has a single planet. For the orbital
198: parameters of any planet, we take prior distributions that are flat in
199: $\log P$, $\log K$, and $\phi_o$, where $P$ is the orbital period, $K$
200: is the velocity semi-amplitude, and $\phi_o$ is the phase at a
201: given epoch. These choices are standard for variables which represent
202: magnitudes and angles. We limit the range of orbital parameters to
203: $P_{\min} \le P < P_{\max}$, $K_{\min} \le K < \infty$, and $0 \le \phi_o < 2
204: \pi$ (see \S2.3). The choices for the prior distributions are supported by
205: scaling arguments as well as their approximate agreement with the
206: orbital parameters for the known extrasolar planets.
207:
There are a few differences between our prior distributions and the
209: distributions of orbital elements of known extrasolar planets. First,
210: we apply a sharp cutoff for orbital periods less than $P_{\min}$. The
211: OGLE transit searches have discovered planets with orbital periods as
short as $1.2$d (Konacki et al.\ 2003). While radial velocity searches are
213: very sensitive to such planets, the shortest orbital period discovered
214: by a radial velocity survey is 2.5d (Udry et al.\ 2003). It is
important to recognize that the OGLE transit search surveys a much larger
216: number of stars than radial velocity surveys (Gaudi, Seager, \&
217: Mallen-Ornelas 2004). Thus, the observations imply that the
218: distribution of orbital periods is roughly flat in $\log P$ for $P \ge
3$d, but there is a very significant reduction in the number of
220: planets at shorter orbital periods (Gaudi, Seager, \&
221: Mallen-Ornelas 2004). Therefore, it would be reasonable
to apply a cutoff for orbital periods $P < P_{\min}$ for any $P_{\min} <
223: 3$d, and we choose $P_{\min} = 2.5$d. Given recent discoveries, we
224: do not suggest such a large $P_{\min}$ for future studies.
225:
226: %Additionally, it is worth noting that if $P_{\min}\le 1$d, then there
227: %can be a large range of possible orbital solutions with orbital
228: %periods very near one day. Such planets are difficult for radial
229: %velocity surveys to detect when there are only a few closely spaced
230: %observations. However, it is important to note that the possibility
231: %of planets with orbital periods very near one day can be well
232: %constrained by observing the systems again after a significant
233: %fraction of a year. Therefore, when $T_{\mathrm obs} < 100$d, and
234: %$P_{\min} = 1$d when $T_{\mathrm obs} \ge 100$d.
235:
236: We also apply a sharp cutoff for orbital periods greater than
$P_{\max}$. When a planet has an orbital period much longer than
the time span of the observations, $T_{\mathrm{obs}}$, there are degeneracies in the Keplerian
orbital parameters and the radial velocities can be well modeled by a
quadratic polynomial (Cumming 2004). By replacing all the Keplerian
models with $P \ge \pi T_{\mathrm{obs}}$ with a single quadratic model,
our algorithm can efficiently detect planets with orbital
periods $P \ge \pi T_{\mathrm{obs}}$. We maintain the
flat prior in $\log P$ by setting the prior probability for the
quadratic model equal to the sum of the prior probabilities for
orbital periods in the range $\pi T_{\mathrm{obs}} \le P < P_{\max}$.
247: Since the prior distribution is flat in $\log P$, this relative prior
248: probability is not sensitive to the exact choice of $P_{\max}$. The
249: effect of varying $P_{\max}$ is to change the prior probability for a
250: planet having a long-period orbit relative to the prior probability
for a planet having an orbital period between $P_{\min}$ and
252: $P_{\max}$. Since little is known about the abundance of extrasolar
253: planets with orbital periods greater than $\sim 10$yr, we choose
254: $P_{\max} = 40$yr guided by our own solar system.
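
As a rough illustration of the weight this assigns to the quadratic
model, consider the following sketch. It assumes that the flat prior in
$\log P$ is normalized over $P_{\min} \le P < P_{\max}$, and it takes
$T_{\mathrm{obs}} = 100$d purely as a hypothetical example value. The
prior probability that a planet, if present, has
$P \ge \pi T_{\mathrm{obs}}$ is then
%
\[
p(P \ge \pi T_{\mathrm{obs}} \,|\, 1)
= \frac{\ln \left( P_{\max} / \pi T_{\mathrm{obs}} \right)}{\ln \left( P_{\max} / P_{\min} \right)}
\simeq \frac{\ln \left( 14610 / 314 \right)}{\ln \left( 14610 / 2.5 \right)}
\simeq 0.44,
\]
%
where the periods are expressed in days and $P_{\max} = 40$yr $\simeq
14610$d. Because $P_{\max}$ enters only logarithmically, doubling
$P_{\max}$ to 80yr would raise this value only to about 0.48 under the
same assumptions.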
255:
256: Another difference between our prior distributions and the
257: distributions of orbital elements for known extrasolar planets is that
258: we assume the planetary orbits are circular, i.e., the orbital
259: eccentricity, $e$, is zero.
260: %
261: Since a circular orbit approximates a Keplerian orbit with a small
eccentricity, our algorithm is expected to identify planets on orbits with small
and even moderate eccentricities. However, the efficiency for
264: detecting planets on moderately eccentric orbits may be somewhat
265: reduced compared to the efficiency for detecting a planet on a
266: circular orbit with comparable mass and orbital period. The
267: reduction in efficiency is relatively mild for $e<0.4$, but rapidly
becomes more significant (Endl et al.\ 2002; Cumming 2004).
269: %
270: While many of the known extrasolar planets are on significantly
271: eccentric orbits, the planets with shorter orbital periods tend to
272: have smaller eccentricities. This is likely due in part to tidal
273: circularization affecting planets with small orbital periods (Rasio,
274: Livio, \& Tout 1996). While the assumption of circular orbits is
275: likely appropriate for many planets (especially short-period planets
276: targeted by the N2K project), there is no question that it would be
277: more desirable to include eccentricities. Still, the assumption of
278: circular orbits permits significant computational advantages making it
279: extremely attractive when a large range of parameter space must be
280: searched (e.g., for planetary orbits which are poorly constrained by
281: the currently available data). The reduction in computational
282: requirements makes it computationally feasible to explore the
283: properties of adaptive scheduling algorithms (as in this paper).
284:
285:
286: \subsection{Initial Observations}
287:
288: Before making observations of a target star, there is no basis for
289: believing that the star is more or less likely to have a planet or
290: that various orbital parameters are preferred, except what is
291: suggested by the prior probability distribution. Therefore, an
292: initial set of $N_{\min}$ observations is made for each star. When
the number of observations, $N_{\mathrm{obs}}$, is less than $N_{\min}$,
the choices for when to observe are not affected by previous
observations; however, these choices may be affected by practical
296: considerations. For example, observations must be made at night and
297: radial velocity surveys are typically allocated observing time near full
298: Moon. Additionally, the airmass and atmospheric conditions in the
299: direction of each target may favor observing certain targets at
300: certain times during the available observing nights.
301:
302: %The N2K project
303: %aims to make three observations on consecutive (or nearly consecutive)
304: %nights and a fourth observation a few months later. The first three
305: %observations provide valuable information for constraining possible
306: %hot Jupiters, and the fourth observation provides much improved
307: %sensitivity to planets with longer orbital periods.
308:
309: Under the null hypothesis, there is a single fit parameter for the
310: constant velocity of the star, $C_0$, and the star's velocity is given
311: by $v_{*,C_0}(t) = C_0$. Next, we consider the alternative hypothesis
312: that the star has a single planet in a circular orbit. There are four
313: parameters which can be varied to fit the velocity observations of
314: each star, $P$, $K$, $\phi_o$, and $C_1$, where $C_1$ is the constant
315: velocity of the star. (Given the way the star's velocity is measured,
316: it is typically necessary to use different values of $C_1$ for
317: different observatories. Thus, when there are only a small number of
318: observations of a given target star, it is extremely advantageous if
319: all the observations are made from a single observatory. For the
320: purposes of this paper, we assume that all radial velocity
observations are made with a single observatory.) Writing
$\vec{x} = (P, K, \phi_o, C_1)$ for these fit parameters, the radial velocity
signature of a planet on a circular orbit can be written as
323: %
324: \begin{equation}
325: %
326: v_{*,\vec{x}}(t) = K \cos \left[ \frac{2\pi}{P}t + \phi_o \right] + C_{1}.
327: %
328: \end{equation}
329: %
330: % Also, since we assume circular orbits, $\omega$ and $M_o$ can not be determined separately, but only in the combination $\phi_o$.
331: %
332: After a set of $N_{\min}=3$ initial observations, it is possible to
333: evaluate the plausibility of the fit parameters, $P$, $K$,
334: $\phi_o$, and $C_1$. Since each observation has some
335: observational uncertainties, the orbit still is not uniquely
336: determined, even if our general model is exactly correct.
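
It may help to note that, for a fixed orbital period, the circular-orbit
model is linear in its remaining parameters. This follows from a
standard trigonometric identity and is not specific to any particular
algorithm. The auxiliary amplitudes $A$ and $B$ below are notation
introduced only for this illustration:
%
\[
K \cos \left[ \frac{2\pi}{P}t + \phi_o \right] + C_1
= A \cos \left( \frac{2\pi t}{P} \right) + B \sin \left( \frac{2\pi t}{P} \right) + C_1,
\qquad
A = K \cos \phi_o, \quad B = -K \sin \phi_o ,
\]
%
so that $K = \sqrt{A^2 + B^2}$. At fixed $P$, the best-fit values of
$A$, $B$, and $C_1$ can therefore be obtained by linear least squares.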
337:
338: As noted in \S\ref{SecPriors}, when a planet has an orbital period
339: much longer than the time span of observations, then the fit
340: parameters used above are not well determined by the radial velocity
observations. Therefore, for orbital periods $\pi T_{\mathrm{obs}} \le P \le P_{\max}$, we model the radial velocity of the star as
342: %
343: \begin{equation}
344: %
345: v_{*,\vec{a}}(t) = a_0 + a_1 t + a_2 t^2,
346: %
347: \end{equation}
348: %
349: where $\vec{a} = \left( a_0, a_1, a_2 \right)$ is the set of
350: coefficients for the polynomial model.
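
The correspondence between the two models can be sketched with a Taylor
expansion. This is an illustrative expansion only: it is formally valid
when $2\pi t / P \ll 1$, and it need not match the parameterization
used in the fits. Expanding the circular-orbit model to second order in
$t$ gives
%
\[
K \cos \left[ \frac{2\pi}{P}t + \phi_o \right] + C_1
\simeq \left( C_1 + K \cos \phi_o \right)
- \frac{2\pi K \sin \phi_o}{P} \, t
- \frac{2\pi^2 K \cos \phi_o}{P^2} \, t^2 ,
\]
%
so that, in this limit, $a_0 \simeq C_1 + K \cos \phi_o$,
$a_1 \simeq -2\pi K \sin \phi_o / P$, and
$a_2 \simeq -2\pi^2 K \cos \phi_o / P^2$. The four sinusoidal parameters
enter only through three combinations, which is one way to see the
degeneracy among the Keplerian parameters for long-period orbits.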
351:
352:
353: \subsection{Inference}
354:
355: \label{SecInference}
356:
Once $N_{\mathrm{obs}} \ge N_{\min}$, we analyze the available
358: observations using the methods of Bayesian statistics after making
359: each new observation. The results of the analysis can be used to make
360: informed choices for when stars should be targeted for additional
361: observations. Let $\vec{d}$ denote the set of available data, in this
362: case the previous radial velocity observations. We have already
363: introduced the prior probabilities for the null hypothesis, $p(0)$,
364: and for the single planet model, $p(1)$, as well as the prior
365: probability distribution for orbital parameters, $p(\vec{x}|1)$.
366: Next, we introduce the conditional probability for the observations
367: given the null hypothesis, $p( \vec{d} | 0 )$, and the conditional
368: probability for the observations given a fixed set of model
369: parameters, $p(\vec{d} | \vec{x} )$. Since the observational errors
370: are assumed to be independent, both conditional probabilities can be
371: simply evaluated as the product of the probabilities for drawing each
372: observation given the relevant model for the stellar velocity. Since
373: each radial velocity measurement is obtained by averaging the Doppler
shift measured for hundreds of spectral lines, the observational
375: uncertainties are very well approximated by a normal distribution, and
376: the conditional probabilities are given by
377: %
378: \begin{eqnarray}
379: p(\vec{d} | \vec{m} )
380: & = & \prod_i p(d_i | \vec{m})
= \prod_i \frac{\exp{\left[-\frac{\left(d_i - v_{*,\vec{m}}(t_i)\right)^2}{2\sigma_i^2} \right]}}{\sqrt{2 \pi} \sigma_i} \\
& = & \frac{\exp \left[\frac{-1}{2} \sum_i \left(\frac{d_i-v_{*,\vec{m}}(t_i)}{\sigma_i}\right)^2 \right]}{\left(2\pi\right)^{N_{\mathrm{obs}}/2} \prod_i \sigma_i}
\equiv \frac{\exp \left[ \frac{-\chi^2(\vec{m})}{2}\right]}{\left(2\pi\right)^{N_{\mathrm{obs}}/2} \prod_i \sigma_i},
384: \end{eqnarray}
385: %
386: where $\vec{m}$ represents the generalized model parameters, i.e.,
either $\vec{m}=(C)$ (the null hypothesis model), $\vec{m}=\vec{x}$ (the single
planet model with $P < \pi T_{\mathrm{obs}}$), or $\vec{m}=\vec{a}$ (the
polynomial model for a planet with $\pi T_{\mathrm{obs}} \le P \le
P_{\max}$). Each individual observation, $d_i$, is made at a time,
$t_i$, and has an observational uncertainty, $\sigma_i$. Since the
observational uncertainties are nearly Gaussian and assumed to be
independent, the conditional probability for all the
available observations depends on the model parameters only through
the goodness-of-fit statistic, $\chi^2(\vec{m})$, which can be
easily computed for each set of model parameters, $\vec{m}$.
397:
398: Next, we introduce terminology from Bayesian statistics, $p(\vec{d},
399: 0)$, the joint probability for the observations and the null
400: hypothesis, and $p(\vec{d}, \vec{x})$, the joint probability for the
401: observations and the single planet hypothesis with a particular set of
402: model parameters, $\vec{x}$. The joint probabilities can be written
403: as the product of the prior probability and the conditional
404: probability, e.g., $p(\vec{d}, 0) = p(0) p(\vec{d} | 0)$. We will
also use Bayes' theorem, which states that
406: %
407: \begin{equation}
408: p(\vec{m} | \vec{d})
409: = \frac{p(\vec{d}, \vec{m})}{p(\vec{d})}
= \frac{p(\vec{m}) p(\vec{d} | \vec{m})}{\int d\vec{m} \, p(\vec{m}) p(\vec{d} | \vec{m})}.
411: \end{equation}
412: %
We use the joint probabilities and Bayes' theorem to compute the
414: posterior probabilities which incorporate both the prior probabilities
415: and the information contained in the observations, $\vec{d}$. For
416: example, the posterior probability for the null hypothesis is
417: %
418: \begin{eqnarray}
419: p( 0 | \vec{d} ) &
420: = & \frac{ p(\vec{d}, 0) }{p(\vec{d}, 0) + \int d\vec{x} \, p(\vec{d}, \vec{x}) + \int d\vec{a} \, p(\vec{d}, \vec{a}) } \\
421: & = & \frac{\int dC \, p(C) p(\vec{d}| C) }{\int dC \, p(C) p(\vec{d}| C) + \int d\vec{x} \, p(\vec{x}) p(\vec{d} | \vec{x}) + \int d\vec{a} \, p(\vec{a}) p(\vec{d} | \vec{a}) } \\
422: & = & \frac{p(0) \int dC \, p(C | 0) p(\vec{d}| C) }{p(0) \int dC \, p(C | 0) p(\vec{d}| C) + p(1) \int d\vec{x} \, p(\vec{x} | 1 ) p(\vec{d} | \vec{x}) + p(1) \int d\vec{a} \, p(\vec{a} | 1 ) p(\vec{d} | \vec{a}) },
423: \end{eqnarray}
424: %
425: where $p(C | 0)$ is the prior distribution (flat) for the constant velocity
426: given the null hypothesis, $p(\vec{x} | 1)$ is the prior distribution
427: for the sinusoidal fit parameters given that a planet is present, and
428: $p(\vec{a} | 1)$ is the prior distribution for the polynomial fit
429: parameters given that a planet is present.
430:
431: Unfortunately, the integrals, and particularly the integral over
432: $\vec{x}$ in the denominator, can be extremely difficult to evaluate.
433: In particular, $\chi^2(\vec{m})$ and hence $p(\vec{d} | \vec{x} )$
434: can be extremely ``bumpy'' functions (Ford 2005). It is computationally
435: impractical to actually calculate $\chi^2(\vec{x})$ over the
436: entire range of parameter space with sufficient resolution to
437: approximate the integral accurately. Therefore, we must find a way to
438: approximate the integrals in a computationally efficient manner.
439:
440: When the orbital parameters are well constrained, then the integral is
441: typically dominated by the contribution from a small region of
442: parameter space near the best-fit solution and the integrals are
443: easily evaluated. Even when the orbital parameters are somewhat less
444: constrained, the method of Markov chain Monte Carlo provides a
445: powerful tool for evaluating the necessary integrals. However, even
446: Markov chain Monte Carlo is not computationally practical when the observations still
447: permit a wide variety of distinct orbital solutions (e.g., when there
448: are only a small number of observations). Since the N2K
449: project is particularly interested in working with small data sets, we
450: expect that it will frequently be necessary to analyze radial velocity
451: observations which only provide limited constraints on the orbital
452: parameters, given the typical planetary masses, orbital periods, and
453: measurement errors. Therefore we have developed an efficient
454: algorithm for approximating the necessary integrals.
455:
456: % Rather than approximating the integral by integrating over the region of parameter space near the best-fit solution, we approximate the integral over $\vec{x}$ by summing the contributions from many regions of parameter space, each region centered on one of the many local maxima in $p(\vec{d} | \vec{x})$.
457: %
458: While the integrand typically has numerous local maxima which can be
459: spread across a wide range of orbital periods, if the orbital period
460: is held fixed at $P$, then the integral over the remaining fit
461: parameters is dominated by the contribution from the single maximum
462: (assuming a circular orbit). Therefore, we approximate the integral
463: by separating the integral over orbital period, $P$, from the
464: integrals over the remaining fit parameters, $\vec{x}_P$. We sum the
465: contributions to the integral from each of the regions around the
466: best-fit solutions for each orbital period. Thus, for the purposes of
467: computation, we replace the integral over orbital period with a
468: summation and will approximate the integrals over $\vec{x}_P$, giving
469: %
470: \be
p( 0 | \vec{d} ) = \frac{ p(0) \int dC \, p(C | 0) p(\vec{d}| C) }{p(0) \int dC \, p(C | 0) p(\vec{d}| C) + p(1) \sum_i \Delta \log P_i \int d\vec{x}_{P_i} \, p(\vec{x} | 1 ) p(\vec{d} | \vec{x}) + p(1) \int d\vec{a} \, p(\vec{a} | 1 ) p(\vec{d} | \vec{a})},
472: \ee
473: %
where $\Delta \log P_i$ is the spacing between the logarithms of successive orbital
475: periods, and $\vec{x}_P$ is the set of fit parameters excluding the
476: orbital period, $P$. Similarly, the posterior probability for a planet with orbital period near $P$ is given by
477: %
478: \be
p( P | \vec{d} ) \Delta \log P = \frac{p(1) \Delta \log P \int d\vec{x}_{P} \, p(\vec{x} | 1 ) p(\vec{d} | \vec{x}) }{p(0) \int dC \, p(C | 0) p(\vec{d}| C) + p(1) \sum_i \Delta \log P_i \int d\vec{x}_{P_i} \, p(\vec{x} | 1 ) p(\vec{d} | \vec{x}) + p(1) \int d\vec{a} \, p(\vec{a} | 1 ) p(\vec{d} | \vec{a})},
480: \ee
481: %
482: and the posterior probability for a planet with an orbital period greater than
$\pi T_{\mathrm{obs}}$ is
484: %
485: \be
p( P\ge \pi T_{\mathrm{obs}} | \vec{d} ) = \frac{p(1) \int d\vec{a} \, p(\vec{a} | 1 ) p(\vec{d} | \vec{a}) }{p(0) \int dC \, p(C | 0) p(\vec{d}| C) + p(1) \sum_i \Delta \log P_i \int d\vec{x}_{P_i} \, p(\vec{x} | 1 ) p(\vec{d} | \vec{x}) + p(1) \int d\vec{a} \, p(\vec{a} | 1 ) p(\vec{d} | \vec{a}) } .
487: \ee
488: %
489: Clearly, the posterior probability that a star has a planet with any
490: orbital period is simply
491: %
492: \be
p( 1 | \vec{d} ) = \sum_i p( P_i | \vec{d} ) \Delta \log P_i + p(P\ge\pi T_{\mathrm{obs}} | \vec{d} ) = 1 - p( 0 | \vec{d} ).
494: \ee
495: %
496:
497: For each orbital period, $P$, we must approximate each of the
integrals over $\vec{x}_P$. Since the prior, $p(\vec{x} | 1, P_i)$, is
flat, we expand the argument of the exponential, $\chi^2(\vec{x}_P |
P)$, about its minimum ($P$ is held fixed). Since we expand about a
minimum, the first derivatives of $\chi^2$ with respect to the
variables in $\vec{x}_P$ vanish. Therefore, to second order, the $\chi^2$ surface is a
quadratic function centered on the minimum, and we can approximate the
integral by extending the limits of integration to infinity. The
resulting multidimensional Gaussian integral can then be evaluated
analytically, using only the value of $\chi^2$ at its minimum,
507: $\min_{\vec{x}_P} \chi^2(\vec{x}_P | P)$, and the determinant of the
508: covariance matrix, $\mathrm{Covar}\left(\chi^2(\vec{x}_P | P)\right)$,
509: as
510: %
511: \be
512: \int d\vec{x}_{P} \, p(\vec{x}_P, P | 1 ) p(\vec{d} | \vec{x}) \simeq
\frac{\sqrt{\mathrm{Det} \left|\mathrm{Covar}\left(\chi^2(\vec{x}_P | P)\right)\right|}}{\left(2\pi\right)^{\nu/2} \prod_i \sigma_i } \exp \left[ -\frac{1}{2} \min_{\vec{x}_P} \chi^2(\vec{x}_P | P)\right],
\ee
%
where $\nu = N_{\mathrm{obs}} - N_{\mathrm{fit}}$ and here $N_{\mathrm{fit}}=3$ (Sivia 1996; Cumming 2004).
The factor $\left(2\pi\right)^{\nu/2}$ combines the $\left(2\pi\right)^{-N_{\mathrm{obs}}/2}$ from the
normalization of the likelihood with the $\left(2\pi\right)^{N_{\mathrm{fit}}/2}$ from the Gaussian
integral over the fit parameters. In a similar
way, the integral over $\vec{a}$ can be approximated by
%
\be
\int d\vec{a} \, p(\vec{a} | 1) p(\vec{d} | \vec{a}) \simeq
\frac{\sqrt{\mathrm{Det} \left| \mathrm{Covar}\left(\chi^2(\vec{a})\right)\right|}}{\left(2\pi\right)^{\nu/2} \prod_i \sigma_i } \exp \left[-\frac{1}{2} \min_{\vec{a}} \chi^2(\vec{a})\right],
\ee
%
and the integral over $C$ can be approximated by
%
\be
\int dC \, p(C | 0) p(\vec{d}| C) \simeq
\frac{\sqrt{\mathrm{Var}\left(\chi^2(C)\right)}}{\left(2\pi\right)^{\nu/2} \prod_i \sigma_i } \exp \left[ -\frac{1}{2} \min_{C} \chi^2(C)\right],
\ee
%
where $N_{\mathrm{fit}} = 1$ and the determinant of the covariance matrix has
been replaced by the variance of the single fit
parameter, $C$. Thus, we approximate the necessary integrals by
534: explicitly summing the contributions from the null hypothesis, the
535: polynomial model, and all possible orbital periods, but approximate
536: the integrals over the remaining fit parameters as Gaussian integrals.
537: This provides a good approximation to the necessary integrals while
538: leaving only one dimension ($P$) which must be finely sampled.
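
As a minimal sketch of the Gaussian approximation described above, the following Python fragment evaluates the integral over $C$ for the constant-velocity model and compares it with a brute-force sum. For a model that is linear in its parameters the approximation is exact, which makes the constant model a convenient check. The function names, the explicit flat prior of width \texttt{prior\_width}, and the synthetic data are illustrative assumptions and are not part of the analysis in this paper; the code uses the standard Gaussian normalization factors.

```python
import numpy as np

def laplace_evidence_constant(v, sigma, prior_width):
    """Gaussian (Laplace) approximation to int dC p(C|0) p(d|C).

    Assumes a flat prior on C of total width prior_width (hypothetical choice)
    and independent Gaussian errors sigma on the velocities v.
    """
    w = 1.0 / sigma**2
    var_c = 1.0 / w.sum()                  # variance of the fitted constant
    c_hat = var_c * (w * v).sum()          # weighted mean minimises chi^2
    chi2_min = (w * (v - c_hat) ** 2).sum()
    nu = v.size - 1                        # N_obs - N_fit with N_fit = 1
    norm = np.sqrt(var_c) / ((2.0 * np.pi) ** (nu / 2.0) * sigma.prod())
    return norm * np.exp(-0.5 * chi2_min) / prior_width

def brute_force_evidence_constant(v, sigma, prior_width, n_grid=200001):
    """Same integral evaluated as a direct sum over a fine grid in C."""
    c = np.linspace(-prior_width / 2.0, prior_width / 2.0, n_grid)
    resid = (v[None, :] - c[:, None]) / sigma[None, :]
    like = np.exp(-0.5 * (resid**2).sum(axis=1))
    like /= (2.0 * np.pi) ** (v.size / 2.0) * sigma.prod()
    return like.sum() * (c[1] - c[0]) / prior_width

# Synthetic example: eight velocities (m/s) scattered about a constant.
rng = np.random.default_rng(0)
sigma = np.full(8, 3.0)
v = 5.0 + sigma * rng.standard_normal(8)
approx = laplace_evidence_constant(v, sigma, prior_width=2000.0)
exact = brute_force_evidence_constant(v, sigma, prior_width=2000.0)
```

The two values agree as long as the prior range is much wider than the posterior for $C$, which is the condition under which extending the limits of integration to infinity is harmless.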
539:
540: It remains to identify the best fit solution for each orbital period
541: considered and to evaluate the probability of each of these possible
542: solutions. This is equivalent to the problem of evaluating the
543: floating-mean periodogram. The floating-mean periodogram and its
544: relationship to the standard periodogram is described by Cumming
545: (2004). The periodogram is evaluated on a grid uniform in the
546: frequency, $f = 1/P$, rather than in $\log P$. Thus, the factors
547: $\Delta \log P_i$ serve as weighting factors to ensure that we
548: maintain a prior which is uniform in $\log P$. The necessary number
of orbital periods to consider is set by the ratio of the maximum
orbital period considered, $\pi T_{\mathrm{obs}}$, to the minimum
period considered, $P_{\min}$. Since we do not want to miss a minimum
in $\min_{\vec{x}_P} \chi^2(\vec{x}_P | P)$, we oversample by a factor
$\zeta \simeq 4$. Thus, the number of orbital periods considered is
$N_P \simeq \zeta \pi T_{\mathrm{obs}} / P_{\min}$, a constant times
the Nyquist frequency corresponding to the minimum period. Note that
we do not count sine and cosine components separately, and we are
searching for periods up to $\pi T_{\mathrm{obs}}$, despite the fact
that models with orbital periods longer than $T_{\mathrm{obs}}$ are so
similar that they cannot be distinguished with the previous
observations. The time span of observations can be as short as a
561: couple of months or extend for several years. Hence it is typically
562: necessary to find the best-fit orbital parameters for thousands of
orbital periods. Given the large number of global searches necessary,
the computation time required is significantly reduced if the model
velocity can be written as a linear function of the fit parameters, so
that $\chi^2$ is quadratic in those parameters. While this is
impossible for eccentric Keplerian orbits, it is possible for circular
orbits by writing
568: %
\begin{equation}
%
v_{*,\vec{x}}(t) = A \cos \left(\frac{2\pi}{P}t\right) + B \sin \left(\frac{2\pi}{P}t\right) + C_{1},
%
\end{equation}
%
where $K = \sqrt{A^2+B^2}$ and $\phi_o = \tan^{-1} (B/A)$. Using
this formulation allows each of the best-fit solutions and the
covariance matrices to be evaluated by linear least squares, which is
much faster and more robust than non-linear least squares. Since we
579: used $A$ and $B$ as fit parameters rather than $\log K$ and $\phi_o$,
580: we must include a weight equal to the determinant of the Jacobian of
581: transformation, $\left|J\right|=K^{-2}$. While the Jacobian should
582: formally be inside the integral, we approximate the integral by
583: substituting the value of $K$ at the minimum in $\chi^2(\vec{x}_P |
584: P)$.
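
The linear formulation above can be sketched in a few lines of Python. The fragment builds a grid of trial periods that is uniform in frequency, with $N_P \simeq \zeta \pi T_{\mathrm{obs}} / P_{\min}$ points, and solves the weighted linear least-squares problem for $(A, B, C_1)$ at each trial period. The function names, the synthetic data set, and the choice of solving the normal equations directly are assumptions made for illustration, not a description of the code used for the results in this paper.

```python
import numpy as np

def period_grid(t_span, p_min, zeta=4.0):
    """Trial periods on a grid uniform in frequency, from p_min to pi * t_span."""
    p_max = np.pi * t_span
    n_p = int(np.ceil(zeta * p_max / p_min))   # N_P ~ zeta * pi * T_obs / P_min
    freq = np.linspace(1.0 / p_max, 1.0 / p_min, n_p)
    return 1.0 / freq

def fit_circular(t, v, sigma, period):
    """Weighted linear least squares for v = A cos(wt) + B sin(wt) + C1."""
    omega = 2.0 * np.pi / period
    design = np.column_stack([np.cos(omega * t), np.sin(omega * t), np.ones_like(t)])
    dw = design / sigma[:, None]               # whiten by the measurement errors
    vw = v / sigma
    cov = np.linalg.inv(dw.T @ dw)             # covariance of (A, B, C1)
    params = cov @ (dw.T @ vw)
    chi2 = float(np.sum((vw - dw @ params) ** 2))
    return params, cov, chi2

def scan_periods(t, v, sigma, periods):
    """Minimum chi^2 at each trial period (a floating-mean periodogram)."""
    return np.array([fit_circular(t, v, sigma, p)[2] for p in periods])

# Synthetic example: K = 50 m/s, P = 17.3 d, 30 epochs over 200 d.
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 200.0, 30))
sigma = np.full(t.size, 1.0)
v = 50.0 * np.cos(2.0 * np.pi * t / 17.3 - 0.7) + 4.0 + sigma * rng.standard_normal(t.size)

periods = period_grid(200.0, 2.0)
chi2 = scan_periods(t, v, sigma, periods)
best_period = periods[np.argmin(chi2)]
(a, b, c1), cov, _ = fit_circular(t, v, sigma, best_period)
k_best = np.hypot(a, b)                        # K = sqrt(A^2 + B^2)
phi_best = np.arctan2(b, a)                    # quadrant-safe form of arctan(B/A)
```

Because each fit is a $3 \times 3$ linear solve, scanning thousands of trial periods is inexpensive, and the same covariance matrix supplies the determinant needed for the Gaussian integral.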
585:
While $K$ is allowed to take on any positive value, for the purpose of
comparing the posterior probability of the no-planet and one-planet
models it is necessary to normalize the prior distribution for $K$.
For this purpose only, we assume $K_{\min} \le K \le K_{\max}$, where
$K_{\min}$ is the signal amplitude for which there would be a
$\sim50\%$ probability of detecting the planet and $K_{\max}$ is the
velocity amplitude induced by a 10 Jupiter-mass planet orbiting a
solar-mass star at the specified orbital period. For a planet on a
circular orbit, $K_{\max} \simeq (m_{\max}/M_*) \left( 2\pi G
(M_*+m_{\max}) / P\right)^{1/3}$,
where $m_{\max}/M_* = 0.01$ is the ratio of the maximum planet mass to
the star mass and $G$ is the gravitational constant. Note that
$K_{\max}$, and hence the normalization for the prior, $p(\vec{x})$,
varies with the orbital period. For $K_{\min}$ we use the analytic
approximation from Cumming et al.\ (2002), $K_{\min} = 2
\sigma_{\mathrm{obs}} \sqrt{T_{\mathrm{obs}} / \left( P_{\min}
(N_{obs}-3) FAP \right)}$, where $\sigma_{\mathrm{obs}}$ is the
604: uncertainty of the individual velocity measurements, $N_{obs}$ is the
605: number of previous observations of the star, and $FAP$ is the false
606: alarm probability which we set to $1/N_{\mathrm targets}$, the inverse
607: of the number of target stars. As pointed out by an anonymous
608: referee, this choice is somewhat arbitrary. Nevertheless, given our
609: choice of prior, it is necessary to make some choice to result in a
normalized probability distribution. In the future, we would suggest
using a single normalized prior distribution that has support below
$K_{\min}$, such as the modified Jeffreys prior, $p(K) = (K+K_o)^{-1}
/ \log\left[1+K_{\max}/K_o\right]$ for $0 \le K \le K_{\max}$.
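
To give a sense of scale for the upper limit on $K$, the following sketch evaluates the velocity semi-amplitude of a circular orbit from Kepler's third law, $K = [m/(M_*+m)] \left[2\pi G (M_*+m)/P\right]^{1/3}$ for an edge-on orbit. The physical constants are rounded, and the function name and the 3 day example period are illustrative assumptions.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2 (rounded)
M_SUN = 1.989e30     # solar mass, kg (rounded)
M_JUP = 1.898e27     # Jupiter mass, kg (rounded)
DAY = 86400.0        # seconds per day

def k_circular(m_planet, m_star, period_days):
    """Stellar velocity semi-amplitude (m/s) for an edge-on circular orbit."""
    period = period_days * DAY
    m_tot = m_star + m_planet
    return (m_planet / m_tot) * (2.0 * math.pi * G * m_tot / period) ** (1.0 / 3.0)

# Sanity check against Jupiter itself, then an illustrative K_max at P = 3 d.
k_jupiter = k_circular(M_JUP, M_SUN, 4332.6)
k_max_3d = k_circular(10.0 * M_JUP, M_SUN, 3.0)
```

For Jupiter's own orbit this gives roughly 12.5 m/s, the familiar reflex velocity of the Sun, while a 10 Jupiter-mass planet at a period of a few days gives an amplitude above 1 km/s.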
614:
615: The above procedure allows us to efficiently calculate the probability
616: of the null hypothesis as well as a list of probabilities that there
617: is a planet with each of the orbital periods considered. For each of
618: these probabilities, there is also a set of best-fit parameters and a
619: covariance matrix which describe the size and shape of the posterior
620: probability distribution for the remaining fit parameters. To the
621: extent that our model and approximations are valid, these
622: probabilities and covariance matrices provide the optimal basis for
623: making inferences about the presence of a planet and its orbital
624: parameters. Each time that a new observation is made, the entire procedure
625: is repeated to produce updated posterior distributions which
626: incorporate the new information from the latest observation.
627:
628: \subsection{Prediction}
629: \label{SecPrediction}
630:
631: Having estimated the posterior probability distributions for model
632: parameters in \S\ref{SecInference}, it is straightforward to sample
633: from $p(v(t) | \vec{d})$, the predictive probability distribution for a
hypothetical radial velocity observation at time $t$:
635: %
636: \be
637: p(v(t) | \vec{d} )
638: = \int d\vec{m} \, p(v(t) | \vec{m}) p(\vec{m} | \vec{d} )
639: = \int dC p(v(t)|C) \, p(C| \vec{d})
640: + \int d\vec{x} \, p(v(t)|\vec{x}) p(\vec{x} | \vec{d})
641: + \int d\vec{a} \, p(v(t) | 1, \vec{a}) p(\vec{a} | \vec{d})
642: \ee
643: %
644:
645:
Various summary statistics can be computed for $p(v(t) | \vec{d})$ at
each of several possible future observing times. Perhaps the
simplest is to calculate the mean ($E[v(t) | \vec{d}]$) and variance
($E[\mathrm{Var}(v(t) | \vec{d})]$) of the velocities sampled from
$p(v(t) | \vec{d})$. This can be done extremely efficiently and has
the added benefit that the mean and variance are straightforward
to interpret.
653:
654: Naively, it might seem desirable to make future observations when
$E[\mathrm{Var}(v(t) | \vec{d} )]$ is largest (see Fig.\ 1). While this seems to be a
656: reasonable strategy, a more rigorous analysis will lead to a somewhat
657: different result, as demonstrated in \S\ref{SecDesign}.
658:
659: \subsection{Design}
660: \label{SecDesign}
661:
662: A more sophisticated analysis incorporates the concept of a utility
663: function from decision theory.
664: The utility function makes explicit
665: the utility of a specific combination of an action (e.g., observe at
666: time $t$) and an outcome (e.g., measure a velocity $v(t)$). While the
667: experimenter can choose the action to be taken, the outcome is not
known a priori. Nevertheless, the predictive distribution, $p(v(t) |
\vec{d})$, contains information about the likelihood of various outcomes
for a given action. Thus, the experimenter can calculate the expected
671: value of the utility function for various possible actions. Then the
672: action with the largest expected utility can be chosen.
673: Throughout this section, we closely follow the derivation of Loredo (2004).
674:
675: % TODO: Change below to use Kullback-Leibler entropy for multi star case
676:
677: While numerous utility functions are possible, one particularly well
678: motivated choice is to set the utility function equal to the change in
679: the information contained in the posterior probability distribution
680: for model parameters after incorporating the future observation. Let
681: $I\left\{f(z)\right\}$ be the information contained in the distribution
682: $f(z)$, which is the negative of the Shannon entropy and is given by
683: %
684: \be
685: I\left\{f(z)\right\} = \int dz \, f(z) \log f(z).
686: \ee
687: %
688: The expectation for the information contained in the posterior
689: distribution for the model parameters after incorporating the future
690: observation is
691: %
692: \be
693: E[I\left\{p(\vec{m} | \vec{d}')\right\} ] = \int dv \, p(v|\vec{d}) I\left\{p(\vec{m} | \vec{d}')\right\},
694: \ee
695: %
696: where $\vec{d}'$ is the set of previous observations, $\vec{d}$,
augmented by the future observation, $v$. Next, we will invoke
Shannon's theorem, which can be derived by considering the information
contained in a joint probability distribution, in this case,
$p(\vec{m},\vec{d}')=p(\vec{m}|\vec{d}')p(\vec{d}') =
p(\vec{d}'|\vec{m})p(\vec{m})$. By writing out the integrals contained in
698: %
699: \be
700: I\left\{p(\vec{m}|\vec{d}') p(\vec{d}') \right\} = I\left\{p(\vec{d}'|\vec{m}) p(\vec{m}) \right\},
701: \ee
702: %
703: separating integrals when possible, and simplifying integrals over probability densities which integrate to unity, we arrive at Shannon's theorem,
704: %
705: \be
706: I\left\{p(\vec{m} | \vec{d})\right\} + \int d\vec{m} \, p(\vec{m} | \vec{d} ) I\left\{ p(v|\vec{m}) \right\} =
707: I\left\{p(v | \vec{d} )\right\} + \int dv \, p(v | \vec{d}) I\left\{p(\vec{m} | \vec{d}') \right\},
708: \ee
709: %
which allows us to rewrite the expected information as
711: %
712: \be
713: \label{EqnExpInfo}
714: E[ I\left\{p(\vec{m} | \vec{d}')\right\} ] =
715: I\left\{p(\vec{m} | \vec{d})\right\} + \int d\vec{m} \, p(\vec{m} | \vec{d} ) I\left\{ p(v|\vec{m}) \right\}
716: - I\left\{p(v | \vec{d} )\right\}.
717: \ee
718: %
The first term in Eqn.\ \ref{EqnExpInfo} is simply the information about
the model parameters already available from the previous observations,
$\vec{d}$, and is independent of future observations. The second term
is the weighted average of the information contained in the
probability distribution for the future observation conditioned on a
particular model. Note that the distribution, $p(v(t)|\vec{m})$, is
the distribution of the observed velocities if the model were exactly
known. Although the location of the distribution for the predicted
velocities depends on $t$ and $\vec{m}$, the shape and scale of the
distribution are independent of both $t$ and $\vec{m}$. Since the
Shannon entropy of a distribution depends only on the scale and shape
of a distribution (and not the location where it is centered), the
second term of Eqn.\ \ref{EqnExpInfo} is also constant. The remaining
732: term is the information content of the predictive distribution,
733: $p(v(t)|\vec{d})$, and has an explicit dependence on $\vec{d}$. Thus,
734: the expected change in the information content of the posterior
735: probability distribution for the model parameters is
736: %
737: \be
738: E[ \Delta I\left\{p(\vec{m} | \vec{d}')\right\} ](t) = I\left\{ p(v(t) | v_{*}(t)) \right\}
739: - I\left\{p(v(t) | \vec{d} )\right\},
740: \ee
741: %
where $v_{*}(t)$ is the actual radial velocity of the star at time $t$
(as opposed to the observed velocity $v(t)$). The first term depends
only on the distribution for the measurement about the true value,
which we assume is independent of time. Therefore, the expected
change in the information is maximized if the next observation is
taken when the information content of the predictive distribution is
minimized, i.e., when the entropy of the predictive distribution is maximized.
746: Thus, an observing program will more efficiently constrain the orbital
747: parameters of a given target star if future observations are made at
748: times when the uncertainty in the predictive distribution is large.
749:
750: The above analysis naturally leads to the technique of maximum entropy sampling.
Once $p(v(t) | \vec{d})$ has been estimated as outlined in
\S\ref{SecInference} and \S\ref{SecPrediction}, the Shannon entropy,
$-I\left\{p(v(t) | \vec{d})\right\}$, can be
easily calculated for numerous possible future observation times.
754: In particular, the necessary integral can be written as
755: \be
756: I\left\{p(v(t)|\vec{d}) \right\} = \int d\vec{m} \, p(\vec{m}|\vec{d}) \int dv p(v|\vec{m}) \log \left[ \int d\vec{m}' p(\vec{m}'|\vec{d}) p(v|\vec{m}') \right],
757: \ee
758: %
759: where the first integral represents a sum sampling the model
760: parameters from $p(\vec{m}|\vec{d})$, and the second integral
761: represents a sum sampling the prospective velocity from $p(v|\vec{m})$
762: using the previously drawn model parameters. For each velocity drawn
763: in this manner, we must calculate the probability of obtaining that
764: velocity according to the full posterior distribution as the argument
765: to the logarithm.
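
The nested sums described above can be sketched as a simple Monte Carlo estimate. In this fragment the posterior is represented by a list of model velocities $v_*(t)$ at the prospective time, one per posterior sample, together with posterior weights, and the measurement error is taken to be Gaussian with a known $\sigma_{\mathrm{obs}}$. These representations, the function name, and the number of draws are assumptions made for illustration only.

```python
import numpy as np

def predictive_information(v_model, weights, sigma_obs, n_draws=20000, seed=0):
    """Monte Carlo estimate of I{p(v(t)|d)} = E[log p(v(t)|d)].

    v_model : model velocity v_*(t) for each posterior sample
    weights : posterior weights of those samples (must sum to one)
    Assumes Gaussian measurement noise with standard deviation sigma_obs.
    """
    rng = np.random.default_rng(seed)
    # Outer sum: draw model parameters from the posterior.
    idx = rng.choice(v_model.size, size=n_draws, p=weights)
    # Inner sum: draw a prospective velocity from p(v | m).
    v = v_model[idx] + sigma_obs * rng.standard_normal(n_draws)
    # Argument of the logarithm: density of v under the full predictive mixture.
    z = (v[:, None] - v_model[None, :]) / sigma_obs
    dens = (weights[None, :] * np.exp(-0.5 * z**2)).sum(axis=1)
    dens /= np.sqrt(2.0 * np.pi) * sigma_obs
    return float(np.mean(np.log(dens)))

# Check: with a single model the predictive distribution is one Gaussian.
info_single = predictive_information(np.array([3.0]), np.array([1.0]), sigma_obs=2.0)
info_gauss = -np.log(np.sqrt(2.0 * np.pi * np.e) * 2.0)

# Two equally probable models predicting well separated velocities.
info_two = predictive_information(np.array([-40.0, 40.0]), np.array([0.5, 0.5]), 2.0)
```

Repeating the estimate on a grid of prospective times and selecting the time with the smallest information (largest entropy) corresponds to the maximum entropy sampling rule. The cost scales as the number of draws times the number of posterior samples, so in practice the posterior would need to be thinned or summarized.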
766:
767: By choosing to make the next observation when $I\left\{p(v(t) |
768: \vec{d})\right\}$ is near a minimum, the observation is expected to
769: yield more information about the model parameters than if the next
770: observation time were chosen randomly. Once a new observation is
771: made, we must repeat the entire process of calculating a posterior
772: probability distribution for the model parameters, the predictive
773: distribution for the velocity at future times, and the entropy of the
774: predictive distribution at each time.
775:
776: \subsection{Maximum Entropy versus Maximum Variance}
777:
It is easy to demonstrate that the Shannon entropy of a Gaussian
distribution with standard deviation $\sigma$ is $\log (\sqrt{2\pi
e} \sigma)$, so that its information content is $I = -\log (\sqrt{2\pi
e} \sigma)$. If the uncertainty in the prospective measurement is
Gaussian with variance $\sigma_{\mathrm{obs}}^2$ and the predictive
distribution is also well approximated by a normal distribution with
variance $\sigma_{\mathrm{pred}}^2(t)$, then the expected change in information
reduces to
785: %
786: \be
787: E\left[ \Delta I\left\{p(\vec{m} | \vec{d}')\right\}(t)\right] = \log\left(\frac{\sigma_{\mathrm pred}(t)}{\sigma_{\mathrm obs}}\right).
788: \ee
789: %
Thus, the more simplistic strategy of choosing observation times to
maximize the variance (rather than the entropy) of the predictive
distribution (as described in \S\ref{SecPrediction}) is equivalent when
the predictive distribution is normal. Based on visual inspection of
several predictive distributions, we observe that the predictive distribution
is typically well approximated by a normal distribution if the period is assumed
to be known precisely. However, while the predictive
distribution may be well approximated by a Gaussian distribution for
well constrained orbits, the predictive distributions are generally
not well approximated by a Gaussian when $N_{\mathrm{obs}}$ is small.
In particular, if there is a significant probability for
two qualitatively different models (e.g., the null hypothesis and a planet
with orbital period near $P$), then the predictive distribution is
frequently bimodal, with one mode centered on the velocity predicted by
one model (e.g., the best-fit constant velocity) and another mode near
the velocity predicted by the other (e.g., the best-fit sinusoidal solution).
806: For example, let us
807: consider a case where there is a posterior probability, $p_a$, for
808: models with orbital period near $P_a$ and predictive velocity
809: distribution approximately Gaussian centered on $v_a(t)$ with standard
810: deviation $\sigma_a(t)$, and there is a posterior probability, $p_b$,
811: for models with orbital period near $P_b$ (or perhaps the null model)
812: and a predictive distribution approximately Gaussian centered on
$v_b(t)$ with standard deviation $\sigma_b(t)$. Then the predictive
814: distribution is approximated by
815: %
816: \be
817: p(v(t)|\vec{d}) \simeq p_a N(v_a(t), \sigma_a(t)) + p_b N(v_b(t), \sigma_b(t)),
818: \ee
819: %
820: and the information contained in the predictive distribution is approximated by
821: %
\be
I\left\{p(v(t)|\vec{d})\right\} \simeq
p_a I\left\{N(v_a(t),\sigma_a(t))\right\} + p_b I\left\{N(v_b(t),\sigma_b(t))\right\} + p_a \log p_a + p_b \log p_b =
-\frac{1}{2} p_a \log\left(2\pi e \sigma_a^2\right) - \frac{1}{2} p_b \log\left(2\pi e \sigma_b^2\right) + p_a \log p_a + p_b \log p_b,
\ee
%
if $\left|v_a(t)-v_b(t)\right| \gg \sqrt{\sigma^2_a(t) + \sigma^2_b(t)}$. Most
notably, the information in the predictive distribution is not
sensitive to the separation $v_a(t)-v_b(t)$, whereas the variance of
the distribution increases with the separation between the two modes. Thus,
836: choosing observing times based on the variance rather than the entropy
837: of the predictive distribution will tend to favor observing at times
838: when the observations do not completely rule out another possible
839: model which predicts a very different velocity. While choosing future
840: observation times based on the variance of the distribution may be
841: acceptable for well constrained orbits, it is particularly important
842: to use maximum entropy sampling when the observations are not yet able
843: to exclude qualitatively different models.
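
The contrast between the two criteria can be checked numerically. The sketch below integrates a two-component Gaussian mixture on a fine grid and returns its information and variance, and it compares the information with the analytic value for well separated modes (including the term from the mixing probabilities, which does not depend on the separation). The function names and the particular values of $p_a$, $\sigma_a$, $\sigma_b$, and the separations are illustrative assumptions.

```python
import numpy as np

def gauss(v, mu, s):
    """Normal probability density with mean mu and standard deviation s."""
    return np.exp(-0.5 * ((v - mu) / s) ** 2) / (np.sqrt(2.0 * np.pi) * s)

def mixture_info_and_variance(p_a, v_a, s_a, v_b, s_b, n_grid=400001):
    """Information (negative entropy) and variance of a two-mode mixture."""
    p_b = 1.0 - p_a
    lo = min(v_a - 10.0 * s_a, v_b - 10.0 * s_b)
    hi = max(v_a + 10.0 * s_a, v_b + 10.0 * s_b)
    v = np.linspace(lo, hi, n_grid)
    dv = v[1] - v[0]
    pdf = p_a * gauss(v, v_a, s_a) + p_b * gauss(v, v_b, s_b)
    pos = pdf > 0.0                            # skip points that underflow to zero
    info = np.sum(pdf[pos] * np.log(pdf[pos])) * dv
    mean = np.sum(v * pdf) * dv
    var = np.sum((v - mean) ** 2 * pdf) * dv
    return info, var

def analytic_info(p_a, s_a, s_b):
    """Information of the mixture in the limit of well separated modes."""
    p_b = 1.0 - p_a
    return (-0.5 * p_a * np.log(2.0 * np.pi * np.e * s_a**2)
            - 0.5 * p_b * np.log(2.0 * np.pi * np.e * s_b**2)
            + p_a * np.log(p_a) + p_b * np.log(p_b))

# Same modes, separated by 60 m/s and by 600 m/s.
info_near, var_near = mixture_info_and_variance(0.6, 0.0, 3.0, 60.0, 4.0)
info_far, var_far = mixture_info_and_variance(0.6, 0.0, 3.0, 600.0, 4.0)
info_limit = analytic_info(0.6, 3.0, 4.0)
```

Increasing the separation by a factor of ten leaves the information essentially unchanged, while the variance grows by roughly two orders of magnitude, so a variance-based rule would rank these two epochs very differently even though they are expected to be equally informative.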
844:
845: \subsection{Examples}
846: \label{SecExamples}
847:
848: We have begun to apply the inference and predictive steps to some of
849: the observations taken near the beginning of the N2K project. In
850: Fig.\ 1, we show the expected velocity and the 5th and 95th
851: percentiles of the predictive distribution as a function of time for
852: several target stars. We show the median of the predictive
853: distribution as the heavy line and the credible intervals as the
854: thinner lines.
855:
By inspecting credible intervals for the predictive distributions based on
actual observational data, we have identified four common cases.
858: %
859: \begin{enumerate}
860: %
861: \item There is structure in the predictive distribution both during
862: the prior observations and significantly after the last observation.
863: The orbital period is well constrained and the structure is nearly
864: periodic with the same period (e.g., Fig.\ 1, top row).
865:
866: \item There is structure in the predictive distribution both during
867: the prior observations and significantly after the last observation.
868: The orbital period is not precisely known, and so the structure is not
869: periodic or is nearly periodic on a timescale significantly longer
870: than the orbital period (e.g., Fig.\ 1, lower left).
871:
\item In other cases, there is significant variability in the scale of
the predictive distribution at times in the past, but not at
times in the future (Fig.\ 1, lower right). This can occur when the
875: orbital period is only weakly constrained. The uncertainty in the
876: orbital period causes information about the orbital phase to be lost
877: with time and the uncertainty in the orbital phase dominates
878: the width of the predictive distribution.
879:
880: \item There is no significant variability in the predictive
881: distribution around the prior observations, but the variability grows
882: with time due to the possibility of a long period planet (polynomial
883: terms).
884: %
885: \end{enumerate}
886: %
887:
888: In the first two cases, maximum entropy sampling could provide a
889: valuable increase in the efficiency of constraining orbital
890: parameters. In the first case, the structure in the predictive
891: distribution is periodic, so it is possible to identify the best time
892: to observe the system each orbital period. However, in the second
893: case, it is not clear how frequently the system should be observed.
894: If the system is observed each time there is a local maximum in the
895: variance or entropy of the predictive distribution, then many
896: observations may be made during a single orbital period.
897: Alternatively, if the system is not observed at each local maximum,
898: then observations may skip an entire orbital period. The last two
cases illustrate another problem with the maximum entropy sampling
900: algorithm applied to a single star. When the entropy of the
901: predictive distribution increases with time, then maximum entropy
902: sampling does not identify any particular time. In the next section
903: we present a variation on this algorithm which overcomes these
904: difficulties.
905:
906:
907: \section{Adaptive Scheduling for Multiple Target Stars}
908: \label{SecMulti}
909:
910: In modern planet searches, there is typically a large list of possible
911: target stars and another list of opportunities to observe a small
912: subset of these stars. Here we rephrase the goal of adaptive
913: scheduling algorithms. Instead of identifying the best time to
914: observe a particular target, we ask which targets would be best to
915: observe at a particular time. In this context, an adaptive scheduling
916: algorithm determines both the times at which each target star is
917: observed and the number of times each target star is observed.
918: Adaptively choosing the observing times as in \S\ref{SecSingleStar}
919: can significantly improve the efficiency for constraining orbital
920: parameters for stars with planets as demonstrated by Loredo (2004).
921: Similarly, adaptively choosing the number of observations of each
922: target star can significantly improve the efficiency for detecting
planets. Thus, adaptive scheduling algorithms can provide a double
924: benefit to planet searches.
925:
926:
927: \subsection{Maximum Entropy}
928: \label{SecMultiMaxEntropy}
929:
930: A straightforward generalization of the methods described in
931: \S\ref{SecSingleStar} is to apply the principles of maximum entropy
932: sampling to the joint posterior distribution function for model
parameters for each target star, $P \equiv p(\vec{m}_1, \vec{m}_2, \ldots,
\vec{m}_{N_{\mathrm{targets}}} | \vec{d}_1, \vec{d}_2, \ldots,
\vec{d}_{N_{\mathrm{targets}}})$, where $\vec{m}_i$ are the model
parameters for the $i$th star and $\vec{d}_i$ are the observations of
the $i$th star. Since the posterior distributions for the fit
parameters of each star are independent of each other, $P = \prod_i p(\vec{m}_i | \vec{d}_i)$.
Therefore, the information contained in the joint posterior
distribution is simply the sum of the information contained in the
posterior distribution of each star independently,
942: %
943: \be
944: I\left\{ P \right\} = \sum_i I\left\{ p(\vec{m}_i | \vec{d}_i ) \right\},
945: \ee
946: %
947: and the expected
948: increase in information about the joint distribution is equal to the
949: expected increase in information about the posterior distribution of
950: the star being targeted,
951: %
952: \be
953: E\left[ \Delta I\left\{P\right\} \right](t,i) = E\left[\Delta I\left\{p(\vec{m}_i | \vec{d}_i)\right\}(t)\right],
954: \ee
955: %
956: where $i$ indicates which star is being targeted at time $t$. Thus,
957: one can calculate the expected increase in information about the
958: posterior distribution for the model parameters for each star
959: separately and then choose to observe the star which is expected to
960: yield the most information.
961:
962: After each new observation is obtained, the above procedure can be
963: repeated and a new target star chosen. In
964: practice, the procedure that we describe requires significant
965: computation time and dozens of radial velocity observations are made
966: on a clear night. Since the orbital periods (and hence timescale for
967: variability in the predictive distributions) are typically long
968: compared to one night of observing, it is reasonable to calculate the
969: predictive distributions and entropy for each star at one time during
a night of observing (e.g., the time at which the star reaches its
maximum altitude during the night). A list of the stars with the largest
972: expected increase in information can be targeted during that observing
973: night.
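
A minimal sketch of this nightly selection step is given below. It assumes that each predictive distribution is approximately Gaussian, so that the expected information gain reduces to $\log(\sigma_{\mathrm{pred}}/\sigma_{\mathrm{obs}})$; when that approximation fails, the entropy of the full predictive distribution should be used in its place. The function name, the observability mask, and the example values are hypothetical.

```python
import numpy as np

def select_targets(sigma_pred, sigma_obs, observable, n_slots):
    """Indices of the stars with the largest expected information gain.

    sigma_pred : width of each star's predictive distribution at its
                 reference time on this night (Gaussian approximation)
    sigma_obs  : measurement uncertainty (scalar or one value per star)
    observable : boolean mask of stars that can be observed tonight
    n_slots    : number of observations available tonight
    """
    gain = np.log(np.asarray(sigma_pred) / sigma_obs)
    gain = np.where(observable, gain, -np.inf)     # never pick unobservable stars
    order = np.argsort(-gain, kind="stable")       # largest expected gain first
    order = order[np.isfinite(gain[order])]
    return order[:n_slots]

# Four stars, two slots; the last star is not observable tonight.
sigma_pred = np.array([5.0, 50.0, 8.0, 20.0])
observable = np.array([True, True, True, False])
chosen = select_targets(sigma_pred, 3.0, observable, n_slots=2)
```

The selected stars can then be ordered for observation by any convenient criterion, since the selection itself does not depend on the order within the night.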
974:
975: In principle, the predictive distributions and entropy for each star
976: could be calculated at several times during a night of observing.
977: This would make it possible to choose the observing time precisely,
978: rather than just which night to observe the star. This could be
979: valuable for stars with very short orbital periods or highly eccentric
980: orbits (and hence short timescales for periastron passage). In
981: practice, there are significant limitations on when a star can be
982: observed and costs associated with observing stars in an arbitrary
order. (We will discuss how to incorporate these costs in
984: \S\ref{SecAltCosts}.) For simplicity, in our simulations described below we do not
985: attempt to optimize the observing schedule across times within one
986: night.
987:
988:
989: \subsection{Example}
990: \label{SecMultiExample}
991:
992: To demonstrate the value of adaptive scheduling, we have simulated
993: radial velocity planet surveys using both regular and adaptive target
994: scheduling. First, we generate a list of 1000 target stars and
995: randomly assign planets to some of them. The frequency, mass, and
996: orbital period distributions are taken from Tabachnik \& Tremaine
997: (2002). We then randomly choose 20 observing nights per year. To simulate
998: the allocation of nights on a large telescope, we restrict the
999: possible observing nights to be during the quarter of the lunar month
1000: closest to full moon. Each night 100 observation times are regularly
1001: spaced during the night. For the regular scheduling algorithm, stars
1002: with the smallest number of observations are given the highest
1003: priority. Among stars which have the same number of observations, the
stars which are less frequently observable are given priority. For
the adaptive scheduling algorithm, each star is first observed three times,
as with the regular scheduling algorithm. Subsequently, a Bayesian
1007: analysis (as described in \S\ref{SecInference}) of the available
1008: observations is performed at the conclusion of each night a star is
1009: observed. Before each observing night the predictive probability
1010: distribution is calculated (as described in \S\ref{SecPrediction}) for
1011: the velocity of each star observable on that night. The exact time
1012: for calculating the predictive distribution is the time at which the
1013: star reaches maximum altitude during the night. The possible target stars are
1014: prioritized based on the entropy of the predictive distribution. The
1015: 100 stars with the highest priorities are observed that night in order of their
1016: right ascension (not necessarily at the exact time for which the
1017: predictive distribution was calculated).
1018:
1019:
1020: Here we present a summary of the results of these simulations. At the
1021: end of each night we monitor the posterior distributions for the model
1022: parameters of each system, paying particular attention to the number
1023: of planet detections (which we define to be systems for which the
1024: probability of the null hypothesis, no planet, is less than 0.1\%).
1025: %
1026: % and the number of false alarms (stars for which there is no planet, but the posterior probability for the no-planet hypothesis is less than 0.1\%). First, we verify that the false alarm rate is approximately 0.1\%, as expected.
1027: %
1028: In Fig.~2, we show how the number of detections
1029: increases as a function of the number of observing nights. The
1030: adaptive scheduling algorithm based on \S\ref{SecMulti} (dashed blue)
1031: %
1032: % FIXED
1033: %
1034: is clearly more efficient than the regular scheduling algorithm (solid
1035: black) for detecting planets, even though it is not explicitly
1036: optimized for detecting the largest number of planets. More
1037: importantly, the additional planets that are being detected by the
adaptive scheduling algorithm tend to be those with the smallest
1039: velocity amplitudes (see Fig.~3). This is accomplished by observing some stars
1040: more frequently than others. In Fig.~4 we present a histogram showing
1041: the fraction of stars that were observed a given number of times.
1042: While the regular scheduling algorithm observed each star ten times,
1043: the adaptive algorithms observed many stars slightly less frequently
1044: and a few stars much more frequently. This makes the adaptive
1045: scheduling algorithms much more sensitive to planets with velocity
1046: amplitudes near the threshold of detection. Thus, while the total
number of planets detected increases by $\sim10$--$20\%$, the mass of
1048: the least massive planet detected by the adaptive scheduling algorithm
1049: is less than that of the regular scheduling algorithm by a factor
1050: $\sim2$ or more. It is also important to note that the accuracy with
1051: which the orbital parameters are measured has not been sacrificed (see Fig.~5).
While the orbital periods and amplitudes for planets with the largest
velocity amplitudes ($\ge100$\,m/s) are measured with similar
accuracy, the adaptive scheduling algorithm provides a significant
improvement in the accuracy of the orbital parameter determinations
for planets with more modest velocity amplitudes ($\le30$\,m/s).
1057:
1058:
1059: \section{Alternative Utility Functions}
1060: \label{SecAltUtility}
1061:
1062: In \S\ref{SecSingleStar} we focused on when to observe a single star, and
hence the predictive distribution, $p(v | \vec{d})$, and its entropy
were obvious choices for comparing the utility of observations at
various times. In \S\ref{SecMulti}, we focused on choosing which
star (from a large list) should be targeted at the next observing
opportunity. In \S\ref{SecMulti}, we chose a utility function
1068: based on the joint posterior for the model parameters for all the
1069: target stars. However, in this case the choice of utility function is
1070: less obvious.
1071: %
1072: Various surveys and investigators may have differing goals and hence
1073: differing utility functions. For example, one possible goal would be
1074: to measure the orbital parameters to some desired accuracy. Another
1075: reasonable goal might be to discover as many planets as possible given
1076: some fixed amount of observing time. In that case, it would make
1077: sense to stop observing stars once it had been established that they
1078: harbored a planet, even if the orbital elements were not yet well
constrained. An even more extreme example is a radial velocity
survey intended to help select reference stars for astrometric surveys
by future missions such as SIM. In that case, one could stop
1082: observing a star before obtaining a rough measure of the orbital
1083: parameters or even before the false alarm rate (for detecting a
planet) was small. This is similar to a strategy for a survey aimed at
discovering low-mass planets, which would eliminate stars once the
1086: radial velocities are observed to vary over too large a range to be
1087: due to a low mass planet. Yet another possible goal would be to
1088: discover multiple planet systems or planets with long orbital periods.
1089: In this case, one would not want to stop observing a star even after
1090: the orbit of one planet had been well characterized, if it was still
1091: possible the system could have an additional planet with a longer
1092: orbital period.
1093:
1094: The above examples illustrate that simply targeting stars based on the
1095: maximum entropy method with the same utility function is not always
1096: the best strategy for a given application. Nevertheless, we have
1097: demonstrated that adaptive scheduling algorithms can significantly
1098: increase the efficiency of an observing program. Thus, it is
1099: important that the goals of an observing program be carefully
1100: considered and clearly identified. Then, a utility function can be
1101: chosen that is appropriate for the particular purpose of the
1102: observations. Once a utility function has been defined, the methods
1103: outlined in this paper can be used to optimize the observing program
1104: for the given utility function.
1105:
1106: The utility function discussed above, $E\left[\Delta
1107: I\left\{p(\vec{m}|\vec{d}')\right\}\right]$, is relatively easy to
1108: calculate based on the posterior distribution, $p(\vec{m}|\vec{d})$,
1109: giving it a practical advantage over many other possible utility
functions. In this section we describe simple generalizations of the
above utility function which can be computed with similar
efficiency. The generalized maximum entropy utility functions that we
1113: describe below provide a means for optimizing observing schedules for
1114: a broad range of goals.
1115: %
1116: % We caution that the results of a particular choice of utility function is not always obvious. Therefore, we recommend that observers calculate the results of simulated observing programs using a particular choice of utility function to make sure there are no unintended consequences of the utility function chosen.
1117:
1118:
1119: \subsection{Information about a Subset of Models}
1120: \label{SecMultiAltSubset}
1121:
1122: One case worth considering is when we are only interested in models
1123: which satisfy certain criteria. For example, we might only be
1124: interested in obtaining more information about stars with planets (and
1125: not about the constant velocity of stars without planets). Similarly,
1126: we might be interested in companions with (minimum) masses less than
1127: some threshold, perhaps to exclude binary stars or perhaps to target
1128: terrestrial-mass planets. In this case we could replace
1129: $I\left\{p(\vec{m})\right\}$ with
1130: %
1131: \be
1132: I_{\Theta}\left\{p(\vec{m})\right\} = \int d\vec{m} \, p(\vec{m}) \theta(\vec{m}) \log p(\vec{m}),
1133: \ee
1134: %
where $\Theta$ is a region of the model parameter space, and
$\theta(\vec{m}) = 1$ when the parameters $\vec{m}$ satisfy a given
criterion (i.e., lie in $\Theta$) and $\theta(\vec{m}) = 0$ otherwise. The expression for
$I_{\Theta}\left\{p(\vec{m})\right\}$ can be thought of as the information contained in
the distribution $p(\vec{m})$ about the subset of model parameters in $\Theta$. In this case, the relevant utility function becomes
1140: %
1141: \be
1142: E\left[\Delta I_\Theta\left\{p(\vec{m}|\vec{d}')\right\}\right] = p(\vec{m} \in \Theta) I\left\{p(v|v_{actual})\right\} - \int d\vec{m} \, p(\vec{m}|\vec{d}) \theta(\vec{m}) \int dv \, p(v|\vec{m}) \log \left[ p(v|\vec{d}) \right],
1143: \ee
1144: %
where the first term is again a constant, provided that the scale and
shape of the sampling distribution do not depend on the model
parameters.
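
As a rough illustration of how the second term might be evaluated in
practice (a sketch under our own assumptions, not a prescription),
suppose that $N$ samples, $\vec{m}_i$, drawn from the posterior
$p(\vec{m}|\vec{d})$ are available. The symbols $N$ and $\vec{m}_i$ are
introduced here only for illustration. The predictive distribution
can then be approximated by
%
\[
p(v|\vec{d}) \simeq \frac{1}{N} \sum_{j=1}^{N} p(v|\vec{m}_j),
\]
%
and the second term by
%
\[
\int d\vec{m} \, p(\vec{m}|\vec{d}) \theta(\vec{m}) \int dv \, p(v|\vec{m}) \log \left[ p(v|\vec{d}) \right] \simeq \frac{1}{N} \sum_{i=1}^{N} \theta(\vec{m}_i) \int dv \, p(v|\vec{m}_i) \log \left[ p(v|\vec{d}) \right].
\]
%
In this approximation, samples outside $\Theta$ still contribute to the
predictive distribution, but not to the utility.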
1148:
1149: In the above example, we used $\theta(\vec{m})$ as an indicator
1150: variable to specify when the parameters satisfied some criteria of
1151: interest, such as whether the model includes a planet or whether the
1152: orbital period is less than some threshold. More specialized forms of
1153: $\theta(\vec{m})$ could be chosen for specific goals, such as finding
1154: planets with certain orbital periods (e.g., within the habitable
1155: zone). In principle, $\theta(\vec{m})$ could be used as a weight,
1156: specifying the relative value of information about systems with
1157: various model parameters. For example, a planet search aiming to
1158: discover low-mass planets could specify a $\theta(\vec{m})$ which
1159: decreases for high mass planets.
1160:
1161:
1162: \subsection{Information in Marginal Distributions}
1163: \label{SecMultiAltMargin}
1164:
1165: Another case worth considering is when some model parameters are of
1166: more scientific interest than others. For example, the constant
1167: stellar velocities, $C_0$ and $C_1$, contain no information about
1168: extrasolar planets. Similarly, the angle $\phi_o$ may be of less
1169: interest than other model parameters such as the orbital period, $P$.
1170: As an extreme example, one might be interested in determining only if
1171: a star has a planet and not be interested in measuring the orbital
1172: parameters. In such cases, it is useful to subdivide the set of model
1173: parameters, $\vec{m}$, renaming them $(\vec{m},\vec{n})$, where
1174: $\vec{m}$ is the set of scientifically interesting model parameters
1175: and $\vec{n}$ is the set of ``nuisance'' parameters. Now, we can
1176: marginalize over the nuisance parameters and consider the expected
1177: change in the information contained in the probability distribution,
$p(\vec{m}|\vec{d}') = \int d\vec{n} \, p(\vec{m}, \vec{n} | \vec{d}')$, rather
1179: than using the joint distribution $p(\vec{m}, \vec{n} | \vec{d}')$
1180: as before. Thus, the relevant utility function becomes
1181: %
1182: \be
1183: E\left[\Delta I\left\{p(\vec{m}|\vec{d}') \right\} \right] = \int d\vec{m} \, p(\vec{m}|\vec{d}) \int d\vec{n} \, p(\vec{n}|\vec{d},\vec{m}) \int dv \, p(v|\vec{m},\vec{n}) \log \left[ \frac{ \int d\vec{n}' p(\vec{n}'|\vec{d},\vec{m}) p(v|\vec{m},\vec{n}') }{\int d\vec{m}' \int d\vec{n}' p(\vec{m}',\vec{n}'|\vec{d}) p(v|\vec{m}',\vec{n}')} \right].
1184: \ee
1185: %
1186: Here the first two integrals sample over all possible models and the
1187: third integral samples over all possible values of the observed
1188: velocity, just as before (e.g., Eqn.\ 24). Indeed, if the logarithm
1189: were split into the difference of two terms, then the term arising
1190: from the denominator would also be mathematically equivalent to the
(negative of) Eqn.\ 24. However, the term arising from the numerator
1192: causes this utility function to differ from Eqn.\ 23, as it no longer
1193: simplifies to equal the information of the upcoming observation if the
1194: actual velocity were known. For the utility function in Eqn.\ 23, the
1195: entropy of the predictive distribution at the time of a hypothetical
1196: future observation is compared to the entropy of the probability
1197: distribution for the observed value given the actual value. However,
1198: for this choice of utility function, the entropy of the predictive
1199: distribution at the time of a hypothetical future observation is
1200: compared to the entropy of the predictive distribution marginalized
1201: over the nuisance parameters (evaluated for the same time). Thus,
1202: both terms depend on the time, and a hypothetical future precise
1203: observation would be expected to contribute less information at times
1204: where the predictive distribution is more sensitive to the values
1205: of the nuisance parameters.
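
To make this comparison concrete, the split of the logarithm can be
written out explicitly. The shorthand $p(v|\vec{d},\vec{m}) = \int
d\vec{n}' \, p(\vec{n}'|\vec{d},\vec{m}) p(v|\vec{m},\vec{n}')$ for
the numerator is introduced here only for illustration, and the
denominator is the predictive distribution, $p(v|\vec{d})$. Since
neither depends on $\vec{n}$, the integral over $\vec{n}$ can be
performed first, giving
%
\[
E\left[\Delta I\left\{p(\vec{m}|\vec{d}') \right\} \right] = \int d\vec{m} \, p(\vec{m}|\vec{d}) \int dv \, p(v|\vec{d},\vec{m}) \log \left[ p(v|\vec{d},\vec{m}) \right] - \int dv \, p(v|\vec{d}) \log \left[ p(v|\vec{d}) \right].
\]
%
The first term is the information in the predictive distribution
marginalized over the nuisance parameters, averaged over the
parameters of interest, and the second term is the information in the
full predictive distribution. Written this way, the utility function
is the mutual information between the future velocity measurement and
the parameters of interest.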
1206:
1207: If we marginalize over all the fit parameters, then we can obtain
1208: posterior distributions for the probability that the system does or
1209: does not have a planet. A scheduling algorithm based on maximizing
1210: the expected increase in information contained in this distribution is
1211: expected to detect planets very efficiently. Indeed, we can see that
1212: this is the case from the dotted red curve in Figs.~2 \& 3.
1213:
1214:
1215: \subsection{Non-Greedy Algorithms}
1216: \label{SecMultiAltNonGreedy}
1217:
1218: So far we have restricted our attention to ``greedy'' algorithms,
1219: since the utility functions have only considered the effect of a
1220: single additional observation (Cormen et al.\ 2001). As we have demonstrated, these
1221: greedy algorithms perform quite well in the cases which we considered.
1222: However, it is worth briefly discussing alternative ``non-greedy''
1223: utility functions.
1224:
1225:
1226:
Let us consider the case where making one additional observation is
expected to provide little or no increase in information, but making
1229: multiple additional observations (perhaps at particular times) would
1230: be expected to provide significant additional information. A simple
1231: example of such a situation is when a new target star is added to the
survey. The first two observations of any star cannot result in the
1233: detection of a planet. Yet, if the current target list has been
1234: searched thoroughly, then it could be more productive to add new
1235: stars to the target list. While this effect is increased when the
1236: constant velocities are considered nuisance parameters, there is
1237: some effect even when $C_0$ and $C_1$ are considered parameters of interest.
1238:
1239: In principle, cases such as these can be handled by calculating the
1240: expected increase in information at some later time by which several
1241: additional observations could be made. Multiple additional
1242: observations are considered by sampling over the various possible
1243: combinations of future observations. This generalization has the
1244: obvious drawback that it is necessary to perform an additional
1245: integral over various possible combinations of observing schedules.
1246: One possible simplification is to relax the assumption that the total
1247: number of observations is held fixed (only for the purpose of
1248: evaluating the integral over future observing schedules) and to
1249: consider observing schedules for each star separately. Then the integral
1250: over future observing schedules can be performed separately for each
1251: star by assuming a constant probability of making an observation at
1252: each possible future observing time.
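
One way to write this simplification is sketched below, with notation
that is introduced only for illustration. Let a given star have $K$
possible future observing times, $t_1, \ldots, t_K$, before the later
time of interest, let $q$ be the assumed constant probability of
observing the star at each of these times, and let $S$ denote a subset
of these times. The non-greedy utility for that star would then be the
binomially weighted average
%
\[
U = \sum_{S \subseteq \{t_1, \ldots, t_K\}} q^{|S|} \, (1-q)^{K-|S|} \, E\left[\Delta I \, | \, S\right],
\]
%
where $E\left[\Delta I \, | \, S\right]$ is the expected increase in
information if observations are made at exactly the times in $S$. The
weights sum to unity, and the sum contains $2^K$ terms, so for large
$K$ it would presumably be estimated by drawing random subsets rather
than by enumeration.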
1253:
1254: In principle, the above algorithm could identify combinations of
1255: multiple observations which are significantly more valuable than would
1256: be estimated by a greedy algorithm. Another benefit of the above
1257: algorithm is that it can automatically account for differences in the
1258: fraction of time during which stars are observable (e.g., due to
seasonal effects). We have performed a few small simulations using
the non-greedy algorithm described above and found that it provided
a small increase in observing efficiency for the cases which we
considered. However, a thorough comparison of greedy and non-greedy
algorithms will require significantly more computational power.
1264:
1265: \subsection{Utility Functions with Costs}
1266: \label{SecAltCosts}
1267:
1268: The utility function can also include information about the cost of
1269: making a given observation. While economic cost is often used in
Bayesian decision theory, for our applications it is preferable to use
the observing time necessary to perform an observation as the measure of
cost. The observing time required for a given observation includes
not only the time necessary to collect the desired number of photons, but also
the time required for CCD readout and for slewing the telescope from the
previous target to the next target. The time required for CCD readout
is known and constant for each exposure. (For faint target stars,
1277: it may be necessary to use multiple exposures for a single
1278: observation, if the barycentric correction would vary significantly
1279: during the necessary integration time.) For target stars near the
1280: previous target, the telescope can often slew to the target during CCD
1281: readout, resulting in no or minimal loss of observing time to slewing.
1282: %
1283: Typically, observing schedules within a night are chosen so that the
1284: vast majority of the observing time is spent integrating on targets.
1285: The main factor in determining the integration time necessary is the
apparent magnitude of the target star. In principle, an adaptive scheduling algorithm could plan an entire night's
observations, considering the cost to observe each possible target
throughout the night. However, the exact amount of
integration time depends on the atmospheric extinction (which we
assume is proportional to the airmass) as well as time-variable atmospheric
conditions that are generally not known in advance.
1292: %
1293: Therefore, we do not believe that it is worthwhile to include such a
1294: fine level of scheduling when selecting targets for a given night. In
1295: practice, atmospheric conditions (e.g., seeing and cloud cover) change
1296: throughout a night, making it impossible to know in advance the number
1297: of targets that can be observed during that night. Given the
1298: computational complexity of these algorithms, it is impractical to
perform new analyses throughout the night as atmospheric conditions
1300: change. By assuming that atmospheric extinction is a function of
1301: airmass only (e.g., ignoring the possibilities of cloud cover or
1302: atmospheric conditions changing throughout the night), we can compute
1303: the expected amount of observing time necessary for each possible
1304: target star. Then, we identify the targets with the greatest expected
1305: increase in information per unit observing time (rather than per
1306: observation).
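
The selection rule above can be summarized as follows. This is a
sketch under assumptions that we state explicitly, and all of the
symbols are introduced here only for illustration. For star $i$ with
apparent magnitude $V_i$ observed at airmass $X_i$, and an extinction
coefficient $k$ (in magnitudes per unit airmass), the integration time
needed to collect a fixed number of photons scales as
%
\[
t_{{\rm int},i} \propto 10^{0.4 \left( V_i + k X_i \right)},
\]
%
and targets are ranked by the expected increase in information per
unit observing time,
%
\[
\frac{E\left[\Delta I_i\right]}{t_{{\rm int},i} + \max\left( t_{\rm read}, t_{{\rm slew},i} \right)},
\]
%
where $E\left[\Delta I_i\right]$ is the expected increase in
information from observing star $i$, $t_{\rm read}$ is the CCD readout
time, and $t_{{\rm slew},i}$ is the time to slew from the previous
target. The maximum reflects the assumption that the telescope slews
during readout. The photon-limited scaling ignores sky background,
seeing, and cloud cover.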
1307:
1308: % Therefore, we assume
1309: %that, if a candidate target star were to be observed on a given night,
1310: %then it would be observed at the optimal time during the night. For
1311: %most stars, this means that we assume an airmass equal to the minimum
1312: %airmass for that star during the night. For stars which transit
1313: %during the daytime or twilight, we assume the minimum of the airmass
1314: %at the beginning and ending of the available observing time.
1315:
1316:
1317: \section{Discussion}
1318: \label{SecDiscussion}
1319:
1320: %\subsection{Value of Technique}
1321:
1322: We have developed a practical algorithm for applying adaptive
1323: scheduling to radial velocity planet searches. The algorithms
1324: presented are rigorously grounded in Bayesian data analysis and
1325: information theory, and still permit specialization for the specific
1326: goals of the observing program. While such adaptive scheduling
1327: algorithms are computationally demanding, they can provide dramatic
1328: benefits. Already, there is some element of ``adaptive'' scheduling
1329: due to human feedback (e.g., observers identify interesting targets to
1330: be observed more frequently, time allocation committees decide how
1331: much and when observing time will be made available). Unfortunately,
1332: quantifying these effects is extremely difficult. One advantage of
1333: following an algorithmic procedure is that any biases can be
1334: recognized, simulated, and quantified.
1335:
As we demonstrated in Fig.~2, the use of adaptive
scheduling algorithms can significantly increase the number of planet
detections in a survey with a fixed amount of observing time.
Perhaps more significantly, the additional planets found tend to have
the smallest velocity amplitudes, and hence the smallest masses, of those
detected in the survey, as seen in Fig.~3. For the survey parameters
which we considered, we found that the least massive planet detected
in a survey is a factor $\sim 2$ less massive when using our adaptive
scheduling algorithms than when scheduling observations randomly. It
is important to appreciate that the increased number of planet
detections does not require reducing the precision of measurements of
orbital parameters. In fact, the same algorithms which increase the
number of planet detections can simultaneously improve the precision with
which most of the orbital parameters are measured (Fig.~5). While the precision is
1350: comparable for planets with very large velocity amplitudes, for low
1351: mass planets the adaptive algorithms typically measure orbital
1352: parameters with more than an order of magnitude smaller uncertainties.
1353: %
1354: % While the value of detecting
1355: % additional planets and measuring their orbital parameters more
1356: % precisely is subjective, it is clear that adaptive scheduling
1357: % algorithms can decrease the amount of telescope time needed for a
1358: % given scientific goal. Thus, adaptive scheduling algorithms can be of
1359: % great value to the entire astronomical community.
1360:
1361:
1362: %\subsection{Challenges Addressed}
1363:
1364: The adaptive algorithms presented in this paper address several
1365: important challenges raised by previous studies. In this paper, we
1366: dramatically reduce the computational requirements for Bayesian
1367: adaptive scheduling algorithms relative to Loredo (2004). We
1368: accomplish this by separating the integrals over orbital period from
1369: the integrals over the other orbital parameters. By assuming planets
1370: on circular orbits, the remaining integrals become Gaussian integrals
1371: which can be evaluated analytically. The increased efficiency has
1372: made several other advances possible. The increased computational
1373: efficiency of our algorithms allows us to conduct Monte Carlo
1374: simulations of entire planet search programs and quantify the increase
1375: in efficiency that adaptive scheduling algorithms offer. More
1376: significantly, our algorithms make it practical to perform a Bayesian
1377: analysis of the orbital parameters even when there are extremely weak
1378: constraints on the orbital parameters, such as when only a few
1379: observations have been made. This has allowed us to apply Bayesian
1380: hierarchical modeling to simultaneously consider both possibilities
1381: that a star has no planet and that the star has one planet,
1382: extending previous Bayesian techniques which assumed there was a
1383: single planet (Loredo 2004; Ford 2005, 2006). By constructing hierarchical
1384: models, adaptive scheduling algorithms naturally consider the
1385: problems of planet detection and orbital parameter estimation
1386: simultaneously. We also present several generalizations of our
1387: computationally efficient algorithms. The generalized algorithms can
1388: accommodate a variety of utility functions which can be customized to
1389: the specific goals of an observational program. These generalizations
1390: also allow the adaptive scheduling algorithm to consider practical
1391: complications such as stars of different brightnesses and observing
1392: seasons.
1393:
1394:
1395: %\subsection{New Challenges}
1396:
1397: This work also raises several new challenges. Clearly, it would be
1398: desirable to generalize our algorithms to reflect the full variety of
1399: planetary systems. For example, we assume that each star has a
1400: maximum of one planet, while there are already $\sim$12 stars known to
have multiple planets. In principle, it is easy to generalize our
hierarchical models to allow multiple planets, but in practice
each additional planet would introduce an additional integral which
cannot be evaluated analytically and would dramatically increase the
required computations.
1406:
1407: Additionally, we have assumed circular planetary orbits, while most
1408: extrasolar planets have significant eccentricities. We have conducted
1409: some simulations in which we consider a population of stars with
1410: planets on eccentric orbits but the adaptive scheduling algorithm
1411: assumes circular orbits. While the adaptive scheduling algorithms
1412: still detect planets and measure orbital parameters more efficiently
1413: than non-adaptive algorithms, the improvement in efficiency is less
1414: than when applied to a population of stars with planets on circular
1415: orbits. Intuitively, we expect that planets on eccentric orbits could
1416: benefit from adaptive scheduling algorithms even more than planets on
1417: circular orbits. Therefore, we expect that the reduction in
1418: efficiency is due to the adaptive scheduling algorithm not having
1419: access to the appropriate model. Again, in theory, it is
1420: straightforward to include eccentric orbits in a Bayesian analysis,
but this would introduce an additional two integrals which cannot be
evaluated analytically and hence would significantly increase the
required computations. Incorporating eccentric orbits into adaptive
scheduling algorithms could be accomplished by brute force, or it might
1425: be sufficient to use approximate models which expand the orbital
1426: motion in the eccentricity.
1427:
1428: Our simulations have also assumed that planetary perturbations are the
1429: only cause of variations in the star's radial velocity. In practice,
1430: many stars appear to have intrinsic variability commonly known as
1431: stellar ``jitter'' which can be comparable to or exceed the
1432: observational uncertainties in the radial velocity measurements (Saar,
1433: Butler, \& Marcy 1998). We have conducted some simulations in which
1434: we consider a population of stars where each star has a Gaussian
1435: jitter, but the adaptive scheduling algorithm assumes no jitter. The
1436: primary effect of the jitter is to increase the false alarm rate
1437: (fraction of stars with no planets for which the Bayesian analysis
1438: determines the probability of having no planet is less than 0.1\%).
1439: While the threshold for announcing a planet detection can be altered
1440: to maintain a 0.1\% false alarm rate, such a treatment is simplistic
1441: and does not properly account for the unknown jitter. The problems
1442: posed by jitter can be mitigated by replacing the observational
1443: uncertainties with the observational uncertainties added in quadrature
1444: to the amount of jitter expected based on the star's spectral
1445: properties. Still, this approach could result in poor performance
1446: when the estimate for the stellar jitter is inaccurate. Indeed, one
1447: of the advantages of Bayesian analysis is that it can naturally allow
1448: for noise sources of unknown magnitude. A more rigorous analysis
1449: would treat the stellar jitter of each star as an unknown along with
1450: the orbital parameters. Unfortunately, adding the stellar jitter as a
1451: model parameter introduces an additional integral and requires
1452: additional computation time. For the purpose of testing adaptive
1453: scheduling algorithms when confronted with jitter, it may be useful to
1454: consider a special case in which the observational uncertainties are
1455: the same for each observation. In this case, the integral over the
1456: unknown stellar jitter can be performed separately from the other
1457: integrals and reduces to a sum of exponential integrals which
1458: can be performed analytically (Cumming 2004).
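
For reference, the quadrature replacement described above amounts to
using, for each observation $k$ of a given star, the effective
uncertainty
%
\[
\sigma_{{\rm eff},k}^2 = \sigma_{{\rm obs},k}^2 + \sigma_{\rm jit}^2,
\]
%
where $\sigma_{{\rm obs},k}$ is the observational uncertainty and
$\sigma_{\rm jit}$ is the jitter expected from the star's spectral
properties. The symbols $\sigma_{{\rm eff},k}$ and $\sigma_{\rm jit}$
are introduced here only for illustration. In the special case where
$\sigma_{{\rm obs},k}$ is the same for every observation, a Gaussian
likelihood depends on the jitter only through a single value of
$\sigma_{\rm eff}$.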
1459:
1460: This paper has focused its attention on adaptive scheduling algorithms
1461: for radial velocity planet searches. Targeted astrometric planet
1462: searches such as those planned for the Space Interferometry Mission
1463: (SIM) are another obvious application of our methods. Given the
1464: cost and finite lifetime of space missions such as SIM, observing time
1465: is extremely valuable and it is even more important to make the best
1466: use of the available observing time. Successfully incorporating Bayesian
1467: adaptive design would require that the observation schedule not be fixed
1468: far in advance by logistical constraints. Mission designers should aim
1469: to allow for frequent upload of revised target lists.
1470: In principle, the two types of
1471: planet searches are quite similar, with the main difference that
1472: astrometric surveys can measure the stellar position in two dimensions
1473: while radial velocity surveys can measure the stellar velocity in only
1474: one dimension. Therefore, we expect that adaptive scheduling
1475: algorithms could also provide a significant improvement in the
1476: efficiency of the SIM planet searches, resulting in more planets being
1477: discovered, planet masses and orbital parameters being determined more
1478: accurately, and {\em significantly increasing the sensitivity of SIM
1479: to nearly-Earth-mass planets and multiple planet systems}. For this
potential benefit to be realized, the SIM design must be sufficiently
1481: flexible that the observing schedule and target stars can be chosen
1482: with a lead time much less than the duration of the mission,
1483: preferably a lead time of a month or less. To fully simulate such
1484: an adaptive scheduling algorithm, it will be important to incorporate
the practical observing constraints and costs (e.g., limits on the possible pointing
directions and the time required to slew to different positions).
1487:
1488: \acknowledgments
1489:
1490: We thank Debra Fischer, Jeremy Goodman, Geoff Marcy, Scott
1491: Tremaine, and an anonymous referee for their suggestions.
1492: %
1493: This research was supported in part by the Miller Institute for Basic
1494: Research, NASA grants NAG5-10456 and NNG04H44g, and by NASA through
1495: Hubble Fellowship grant HST-HF-01195.01A awarded by the Space
1496: Telescope Science Institute, which is operated by the Association of
1497: Universities for Research in Astronomy, Inc., for NASA, under contract
1498: NAS 5-26555.
1499:
1500:
1501: \newpage
1502:
1503: \begin{thebibliography}{}
1504: \baselineskip 11pt
1505: \parsep 0pt
1506: \itemsep -3pt
1507:
1508: \bibitem[]{} Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C. 2001 Introduction to Algorithms. (Cambridge, MA: MIT Press)
1509:
1510: %\bibitem[]{} Cumming, A., Marcy, G., \& Butler, R.P.\ 1999,
1511: %\newblock{ ApJ, } 526, 890.
1512:
1513: \bibitem[]{} Cumming, A., Marcy, G.W., Butler, R.P., \& Vogt, S.S. 2002,
1514: in ``Scientific Frontiers in Research on Extrasolar Planets'' eds. D. Deming \& S. Seager; ASP Conferences Series 294, 27.
1515:
1516: \bibitem[]{} Cumming, A. 2004
1517: \newblock{ MNRAS, } 354, 1165.
1518:
1519: \bibitem[]{} Endl, M., Kurster, M., Els, S., Hatzes, A.P., Cochran, W.D., Dennerl, K., Dobereiner, S. 2002 A\&A 392, 671.
1520:
1521: \bibitem[]{} Fischer, D.A., Laughlin, G., Butler, R.P., Marcy, G.W., Johnson, J., Henry, G., Valenti, J., Vogt, S.S., Ammons, M., Robinson, S., Spear, G., Strader, J., Driscoll, P., Fuller, A., Johnson, T., Manrao, E., McCarthy, C., Munoz, M., Tah, K.L., Wright, J., Ida, S., Sato, B., Minniti, D. 2005,
1522: \newblock{ ApJ, } 620, 481.
1523:
1524: \bibitem[]{} Ford, E.B. 2004,
1525: \newblock{ PASP, } 116, 1083.
1526:
1527: \bibitem[]{} Ford, E.B. 2005,
1528: \newblock{ AJ, } 129, 1706.
1529:
1530: \bibitem[]{} Ford, E.B. 2006,
1531: \newblock{ ApJ, } 642, 505.
1532:
1533: \bibitem[]{} Gaudi, B.S., Seager, S., Mallen-Ornelas, G. 2005,
1534: \newblock{ ApJ, } 623, 472.
1535:
\bibitem[]{} Konacki, M., Torres, G., Jha, S., Sasselov, D. 2003,
1537: \newblock{ Nature, } 421, 507.
1538:
1539: \bibitem[]{} Loredo, T.J. 2004, in ``Bayesian Inference And Maximum Entropy Methods In Science And Engineering: 23rd International Workshop'' ed. G. J. Erickson and Y. Zhai; AIP Conference Proceedings 707, 330. % -346
1540:
1541: \bibitem[]{} Pourbaix, D. 2002,
1542: \newblock{ A\&A, } 385, 686. % -692.
1543:
\bibitem[]{} Rasio, F.A., Tout, C.A., Lubow, S.H., Livio, M. 1996,
1545: \newblock{ ApJ, } 470, 1187.
1546:
1547: \bibitem[]{} Saar, S.H., Butler, R.P., Marcy, G.W. 1998,
1548: \newblock{ ApJ, } 498, 153.
1549:
1550: \bibitem[]{} Sivia, D.S. 1996 Data Analysis: A Bayesian Tutorial. (New York, NY: Oxford University Press)
1551:
1552: \bibitem[]{} Sozzetti, A., Casertano, S., Brown, R.A., Lattanzi, M.G. 2002,
1553: \newblock { PASP, } 114, 117. % 2002 astro-ph/0207222 % SIM: single planet
1554:
1555: \bibitem[]{} Tabachnik, S. \& Tremaine, S. 2002,
1556: \newblock{ MNRAS, } 335, 151.
1557:
1558: \bibitem[]{} Udry, S., Mayor, M., Clausen, J., Freyhammer, L., Helt, B., Lovis, C., Naef, D., Olsen, E, Pepe, F., Queloz, D., Santos, N. 2003,
1559: \newblock{ A\&A, } 407, 679.
1560:
1561: %\bibitem[]{} Whetherill, G.B. \& Glazebrook, K.D. 1986,
1562: %{\em Sequential Methods in Statistics,}
1563: %% (Monographs on applied probability and statistics)
1564: %New York: Chapman and Hall Ltd.
1565:
1566: \end{thebibliography}
1567:
1568: \newpage
1569:
1570: \begin{figure}
\plotone{f1.eps}
\caption{Here we show the expected value for the radial velocity
(solid line) during October and November 2004 (tick marks every five
days) of four different target stars from the N2K project. The dotted
lines show the 95\% confidence intervals for $p(v|d)$ as a function of
time. These predictive distributions were based on observations taken
between January 10 and July 11, 2004 (not shown) as a part of the N2K
survey. The arrows near the bottom of each panel indicate the time
when the entropy of the predictive distribution is maximized. The
arrows near the bottom of each panel indicate the time when the
standard deviation of the predictive distribution is maximized. The
predictive distributions are based on 4 (top left), 5 (top right), 11
(bottom left), and 6 (bottom right) observations. The best-fit
orbital periods are near 53\,d (top left), 52\,d (top right), 16\,d (bottom
left), and 27\,d (bottom right).
%
%For some stars there are some times where observations would be particularly valuable.
}
\label{FigN2k}
1590: %
1591: \end{figure}
1592:
1593: \begin{figure}
\plotone{f2.eps}
\caption{Here we show the expected change in the information
content following a single observation as a function of the time, $t$ (dotted line).
This is to be compared to the log of the standard deviation of the predictive distribution (solid line).
The figure is based on the first five observations of HD 88133, the first planet discovered by the N2K project (Fischer et al.\ 2005). The times of the observations are indicated with vertical arrows. While the two curves show similarities, the most and least favorable times according to an information theory analysis do not always coincide with those predicted using only the variance of the predictive distribution.
%
}
\label{FigHd88133}
1602: %
1603: \end{figure}
1604:
1605: \begin{figure}
1606: \plotone{f3.eps}
1607: \caption{This figure shows the number of planets detected (stars for
1608: which the posterior probability of the no-planet model,
1609: $p(0|\vec{d})$, is less than 0.1\%) as a function of the number of
observing nights. The different line styles represent the results for
1611: different scheduling algorithms: regular (solid black), the adaptive
1612: algorithm from \S\ref{SecMultiMaxEntropy} (dashed blue), and the
1613: %
1614: %FIXED
1615: %
1616: adaptive algorithm from \S\ref{SecMultiAltMargin} (dotted red), which
1617: %
1618: %FIXED
1619: %
1620: maximizes the detections at the expense of accuracy in orbital
parameters by marginalizing over orbital parameters (compare to
Fig.~6). The results for each scheduling algorithm have been averaged over
1623: ten simulated observing programs.
1624: %
1625: }
1626: %
1627: \end{figure}
1628:
1629: \begin{figure}
1630: \plotone{f4.eps}
1631: \caption{Here we show the fraction of planets detected as a function
1632: of the velocity semi-amplitude, $K$. In these simulations each
observation had an observational uncertainty of $\sigma_{\mathrm{obs}}
= 3$~m/s. The planet searches based on both adaptive algorithms are
significantly more efficient at detecting planets with
$K\sim 1$--$3\,\sigma_{\mathrm{obs}}$ than the fixed schedule (solid black line). The
line styles are as in Fig.~3. }
1638: %
1639: \end{figure}
1640:
1641: \begin{figure}
1642: \plotone{f5.eps}
\caption{In this figure we present a histogram of the number of times each star was
1644: observed. For the regular scheduling algorithm, each star was
1645: observed ten times (not shown). The adaptive algorithm from
1646: \S\ref{SecMultiMaxEntropy} (dashed blue) and the adaptive algorithm from
1647: %
1648: % FIXED
1649: %
1650: \S\ref{SecMultiAltMargin} (dotted red) both devote a large number of
1651: %
1652: % FIXED
1653: %
1654: observations to a few stars, significantly increasing the sensitivity
1655: to low-mass planets around these stars. }
1656: %
1657: \end{figure}
1658:
1659: \begin{figure}
1660: \plotone{f6.eps}
\caption{Here we show the median precision of the measurements of the orbital parameters, $P$ (top panel) and $K$ (bottom panel), as a function of the velocity semi-amplitude, $K$. Both adaptive scheduling algorithms typically perform significantly better than the fixed schedule (solid black line). However, the adaptive algorithm presented in \S\ref{SecMultiMaxEntropy} often allocates only a small number of observations to planets with large velocity amplitudes, and hence their orbit determinations are not as precise as with the fixed schedule. The line styles are as in Fig.~3.
1662: }
1663: %
1664: \end{figure}
1665:
1666: \end{document}
1667: