
1: %%

2: %% Beginning of file 'sample.tex'

3: %%

4: %% Modified 2005 December 5

5: %%

6: %% This is a sample manuscript marked up using the

7: %% AASTeX v5.x LaTeX 2e macros.

8:

9: %% The first piece of markup in an AASTeX v5.x document

10: %% is the \documentclass command. LaTeX will ignore

11: %% any data that comes before this command.

12:

13: %% The command below calls the preprint style

14: %% which will produce a one-column, single-spaced document.

15: %% Examples of commands for other substyles follow. Use

16: %% whichever is most appropriate for your purposes.

17: %%

18: \documentclass[12pt,preprint]{aastex}

19:

20: %% manuscript produces a one-column, double-spaced document:

21:

22: %\documentclass[manuscript]{aastex}

23: %\documentclass[preprint]{aastex}

24:

25: %%\usepackage{amsmath}

26: %%\usepackage{amssymb}

27: %%\usepackage{graphicx}

28:

29: %% preprint2 produces a double-column, single-spaced document:

30:

31: %% \documentclass[preprint2]{aastex}

32:

33: %% Sometimes a paper's abstract is too long to fit on the

34: %% title page in preprint2 mode. When that is the case,

35: %% use the longabstract style option.

36:

37: %% \documentclass[preprint2,longabstract]{aastex}

38:

39: %% If you want to create your own macros, you can do so

40: %% using \newcommand. Your macros should appear before

41: %% the \begin{document} command.

42: %%

43: %% If you are submitting to a journal that translates manuscripts

44: %% into SGML, you need to follow certain guidelines when preparing

45: %% your macros. See the AASTeX v5.x Author Guide

46: %% for information.

47:

48: \newcommand{\vdag}{(v)^\dagger}

49: %\newcommand{\myemail}{skywalker@galaxy.far.far.away}

50: \newcommand{\etal}{{et al.}}

51: \newcommand{\kpc}{$\, {\rm kpc}$}

52: \newcommand{\kms}{$\, {\rm km\,s^{-1}}$}

53: \newcommand{\lsun}{$L_{\odot}$}

54: \newcommand{\msun}{\,$M_{\odot\,}$}

55: \newcommand{\sch}{Schwarzschild\,\,}

56: \newcommand{\ml}{$\Upsilon$}

57: \newcommand{\grad}{^{\circ}}

58: %\newcommand{\vect}[1]{\ensuremath{\mbox{\boldmath $#1$}}}

59: %\newcommand{\isot}{$\rm{^{12}C/^{13}C}\,\,$}

60: %\newcommand{\no}{$\rm{^{14}N/^{16}O}\,\,$}

61: %\newcommand{\feh}{${\rm [Fe/H]\,\,}$}

62: %\newcommand{\sgb}{$\rm SGB\,\,$}

63: %\newcommand{\arcsec}{{''\hskip-3pt .}}

64:

65: %\def\plotone#1{}

66:

67: %% You can insert a short comment on the title page using the command below.

68:

69: %\slugcomment{To be submitted to The Astrophysical Journal}

70:

71: %% If you wish, you may supply running head information, although

72: %% this information may be modified by the editorial offices.

73: %% The left head contains a list of authors,

74: %% usually a maximum of three (otherwise use et al.).  The right

75: %% head is a modified title of up to roughly 44 characters.

76: %% Running heads will not print in the manuscript style.

77:

78: \shorttitle{Schwarzschild models of discrete data}

79: \shortauthors{Kleyna et al.}

80:

81: %% This is the end of the preamble.  Indicate the beginning of the

82: %% paper itself with \begin{document}.

83:

84: \begin{document}

85:

86: %% LaTeX will automatically break titles if they run longer than

87: %% one line. However, you may use \\ to force a line break if

88: %% you desire.

89:

90:

91: \title{Constraining the Mass Profiles of Stellar Systems: \sch Modeling of

92: Discrete Velocity Datasets}

93:

94: %% Use \author, \affil, and the \and command to format

95: %% author and affiliation information.

96: %% Note that \email has replaced the old \authoremail command

97: %% from AASTeX v4.0. You can use \email to mark an email address

98: %% anywhere in the paper, not just in the front matter.

99: %% As in the title, use \\ to force line breaks.

100:

101: \author{Julio Chanam\'e\altaffilmark{1}, Jan Kleyna\altaffilmark{2}, \& Roeland van der Marel\altaffilmark{1}}

102: \altaffiltext{1}{Space Telescope Science Institute, 3700 San Martin Dr., Baltimore, MD 21218}

103: \altaffiltext{2}{Institute for Astronomy, University of Hawaii, 2680 Woodlawn Drive, Honolulu, HI 96822}

104:

105:

106:


120:

121: %% Mark off your abstract in the ``abstract'' environment. In the manuscript

122: %% style, abstract will output a Received/Accepted line after the

123: %% title and affiliation information. No date will appear since the author

124: %% does not have this information. The dates will be filled in by the

125: %% editorial office after submission.

126:

127: \begin{abstract}

128:

129: We present a new \sch orbit-superposition code that is designed to

130: model discrete datasets composed of velocity measurements of

131: individual kinematic tracers in a dynamical system. This constitutes

132: an extension of previous implementations that can only address

133: continuous data in the form of (the moments of) velocity

134: distributions, thus avoiding potentially important losses of

135: information due to data binning. Furthermore, the code can handle any

136: combination of available velocity components, i.e., only line-of-sight

137: velocities, only proper motions, or a combination of both. It can also

138: handle a combination of discrete and continuous data. The code

139: determines the combination of orbital mass weights (representing the

140: distribution function) as a function of the three integrals of motion

141: $E,L_z,$ and $I_3$ that best reproduces, in a maximum-likelihood

142: sense, the available kinematic and photometric observations in a given

143: axisymmetric gravitational potential. The overall best fit is the one

144: that maximizes the likelihood over a parameterized set of trial

145: potentials. The fully numerical approach ensures considerable freedom

146: in the form of the distribution function $f(E,L_z,I_3)$. This allows a

147: very general modeling of the orbital structure, thus avoiding

148: restrictive assumptions about the degree of (an)isotropy of the

149: orbits. We describe the implementation of the discrete code and

150: present a series of tests of its performance based on the modeling of

151: simulated (i.e., artificial) datasets generated from a known

152: distribution function. We explore pseudo-datasets with varying degrees

153: of overall rotation and different inclinations on the plane of the

154: sky, and study the results as a function of relevant observational

155: variables such as the size of the dataset and the type of velocity

156: information available. We find that the discrete \sch code recovers

157: the original orbital structure, mass-to-light ratio, and inclination

158: of the input datasets to satisfactory accuracy, as quantified by

159: various statistics. The code will be valuable, e.g., for modeling

160: stellar motions in Galactic globular clusters, and modeling the

161: motions of individual stars, planetary nebulae, or globular clusters

162: in nearby galaxies. This can shed new light on the total mass

163: distributions of these systems, with central black holes and dark

164: matter halos being of particular interest.

165:

166: \end{abstract}

167:

168: %% Keywords should appear after the \end{abstract} command. The uncommented

169: %% example has been keyed in ApJ style. See the instructions to authors

170: %% for the journal to which you are submitting your paper to determine

171: %% what keyword punctuation is appropriate.

172:

173: \keywords{stellar dynamics -- galaxies: kinematics and dynamics --

174: dark matter -- galaxies: halos -- methods: numerical}

175:

176:

177:


202:

203: \section{Introduction}

204: \label{sec.intro}

205:

206: The study of the internal dynamics of stellar systems plays an

207: essential role in astronomy.  From the observed positions and

208: velocities of the stars in galaxies and globular clusters it is

209: possible to infer their total (dark+luminous) mass distribution,

210: which, in particular, provides information on the presence and

211: properties of dark halos and massive black holes. In turn, this

212: structural knowledge constrains theories for the formation and

213: evolution of these systems.

214:

215: The dynamical state of a stellar system is determined by its phase

216: space distribution function, $f({\vec r}, {\vec v})$, which counts the

217: stars as a function of position ${\vec r}$ and velocity ${\vec v}$.

218: Typically, however, only three of the six phase-space coordinates are

219: available observationally: the projected sky position $(x',y')$, and

220: the velocity $v_{z'}$ along the line of sight (LOS). Proper motion

221: observations can provide the additional velocities $(v_{x'},v_{y'})$,

222: but such data are generally not available (with the notable exception

223: of some Galactic globular clusters). To make progress with the limited

224: information available, the dynamical theorist is often forced to make

225: simplifying assumptions about geometry (e.g., that the system is

226: spherical) or about the velocity distribution (e.g., that it is

227: isotropic). Such assumptions can have strong effects on the inferred

228: mass distribution (\citealt{bin82}). To obtain the most accurate

229: results it is therefore important to make models that are as general

230: as possible. Of particular importance for collisionless, unrelaxed

231: systems such as galaxies is to constrain the velocity anisotropy using

232: available data, rather than to assume it a priori.

233:

234: In a collisionless system the distribution function satisfies the

235: collisionless Boltzmann equation. Analytical methods to find solutions

236: of this equation usually rely on the Jeans Theorem, which states that

237: the distribution function must depend on the phase-space coordinates

238: through integrals of motion (quantities that are conserved along a

239: stellar orbit). In a spherical system all integrals are known

240: analytically, namely, the energy $E$ and the components of the angular

241: momentum vector ${\vec L}$. Analytical models for spherical systems

242: are therefore fairly easily constructed. In an axisymmetric system

243: things are more complicated (e.g., \citealt{bt87,mer99}). Only two

244: integrals are known analytically, $E$ and the vertical component

245: $L_{\rm z}$ of the angular momentum vector\footnote{We adopt the

246: notation in which $(x,y,z)$ denote the coordinates intrinsic to the

247: axisymmetric stellar system, with the plane $x-y$ being the equatorial

248: plane, and $z$ the symmetry axis. These relate via the inclination $i$

249: to the observable coordinates $(x',y')$ on the plane of the sky

250: (aligned, respectively, along the projected major and minor axes of

251: the stellar system), and $z'$ the line-of-sight direction, positive in

252: the direction away from us.}, but there is generally a third integral

253: for which no analytical expression exists. Therefore, it is not

254: generally possible to construct an axisymmetric model

255: analytically. The special class of so-called `two-integral'

256: ($f=f(E,L_z)$) models (e.g., \citealt{bat93,deh94,ver02}) has its uses

257: (e.g., \citealt{mag98,vdm06}), but these have an isotropic velocity

258: distribution in their meridional plane, which need not be a good fit

259: to real dynamical systems.

260:

261: The most practical way to model a general axisymmetric system is to do

262: it numerically. While a few methods exist to do this (e.g.,

263: \citealt{m2m,nmagic}), the most common approach uses Schwarzschild's

264: (1979) method. One starts with a trial guess for the gravitational

265: potential $\Psi$ and then numerically calculates an orbit library that

266: samples integral space in some complete and uniform way. The orbits

267: are integrated for several hundred orbital periods, and the

268: time-averaged intrinsic and projected properties (density, LOS

269: velocity, etc.) are stored as the integration progresses. The

270: construction of a model consists of finding a weighted superposition

271: of the orbits that: (1) reproduces the observed stellar or surface

272: brightness distribution on the sky; and (2) reproduces all available

273: kinematical data to within the observational error bars. Additional

274: constraints can be added to enforce that the distribution function in

275: phase space be smooth and reasonably well behaved, e.g., through

276: regularization or by requiring maximum entropy.

277:

278: Several axisymmetric Schwarzschild codes have been developed in the

279: last decade (e.g., \citealt{vdm98,cre99,geb00,val04,tho04}). These

280: codes deal with the situation in which information on the

281: line-of-sight velocity distribution (LOSVD) is available for a set of

282: positions on the projected plane of the sky. This is the case, e.g.,

283: when the kinematical data are from long-slit or integral-field

284: spectroscopic observations of unresolved galaxies. The optimization

285: problem for such data can be reduced to a linear matrix equation for

286: which one needs to find the least-squares solution with non-negative

287: weights \citep{rix97}. One dimension of the matrix corresponds to the

288: number of orbits in the library, while the other corresponds to the

289: number of (luminosity, kinematical and regularization) constraints

290: that must be reproduced. Both dimensions are typically in the range

291: $10^3$--$10^4$. Nonetheless, efficient numerical algorithms exist to

292: find the solution, which yield the orbital and the velocity

293: distribution of the model, as well as the $\chi^2$ of the fit to the

294: kinematical data. The procedure must then be iterated with different

295: gravitational potentials, to determine the potential that provides the

296: overall best $\chi^2$. The existing codes have been used and tested

297: extensively (e.g.,

298: \citealt{cre00,cap02,cap06,geb03,ben05,dav06}). Some questions remain,

299: e.g., about the importance of smoothing in phase space, the exact

300: meaning of the confidence regions determined using $\Delta \chi^2$

301: contours, and, in some situations, valid concerns have been raised

302: regarding whether the available data contain enough information so as

303: to warrant the conclusions of the \sch modeling

304: \citep{val04,cre04,kra05}. Nevertheless, on the whole Schwarzschild

305: codes have now been established as an accurate and versatile tool to

306: study a wide range of dynamical problems.

307:

308: A disadvantage of the existing codes is that they cannot be easily

309: applied to the large class of problems in which the kinematical

310: observations come in the form of discrete velocity measurements,

311: rather than as LOSVDs. This is encountered, e.g., when modeling the

312: dynamics of galaxies at large radii, where the low-surface brightness

313: prevents integrated-light spectroscopy. The only available data are

314: then often of a discrete nature, e.g., via the LOS velocities of

315: individual stars in galaxies of the Local Group (e.g.,

316: \citealt{bill00,jan01,jan02,lok02,wil04,lok05,wal06,geh06}), or via

317: planetary nebulae (e.g., \citealt{dou02,rom03,teo05}) and globular

318: clusters (e.g., \citealt{cote01,tom04}) surrounding giant

319: ellipticals. The kinematical data available for clusters of galaxies,

320: consisting of redshifts for individual galaxies, are of a similarly

321: discrete nature (e.g., \citealt{lok03}). The typical datasets in all

322: these cases consist of tens to hundreds of LOS velocities. Galactic

323: globular clusters constitute another class of object for which

324: kinematical data are often available only as discrete measurements,

325: rather than in the form of LOSVDs. From ground-based observations,

326: data sets of individual LOS velocities can be available for up to

327: thousands of stars in these systems (e.g.,

328: \citealt{sun96,may97,rei06}), and for $\omega$ Cen it has been

329: possible to assemble large samples of proper motions as well

330: \citep{vleu00}. With the capabilities of {\it HST}, accurate proper

331: motion data sets with up to $\sim 10^4$ stars are now becoming

332: available for several more Galactic globular clusters (e.g.,

333: \citealt{mcn03,mcl06}).

334:

335: Note that discrete datasets do not necessarily provide better or worse

336: information than datasets obtained from integrated-light

337: measurements. Both types of data have their advantages and

338: disadvantages. For discrete datasets, for example, interloper

339: contamination can be a problem (see also the end of

340: Section~\ref{sec:logL} below). By contrast, for integrated-light

341: measurements, it is often difficult to constrain the wings of the

342: LOSVD due to uncertainties associated with continuum

343: subtraction. Which type of data is most appropriate and most easily

344: obtained depends on the specific object under study. This is therefore

345: not a question that we address in this paper.  Instead, we focus on

346: the issue of how to best analyze discrete data, if that happens to be

347: what is available.

348:

349: Analyses of discrete datasets have often been more simplified than the

350: analyses that are now common for integrated-light data. For example,

351: the observations are analyzed using the Jeans equations (e.g.,

352: \citealt{ger02,lok03,cote03,dou07}), often with the help of data

353: binning to calculate rotation velocity and velocity dispersion

354: profiles (see, however, the ``spherical'' Schwarzschild models of M87

355: of \citealt{rom01}). The disadvantage of such an approach is that not

356: all the information content of the data is used, including information

357: on deviations of the velocity histograms from a Gaussian. Such

358: deviations are important because they constrain the velocity

359: dispersion anisotropy of the system (e.g.,

360: \citealt{vdm93,ger93,ger98}). This anisotropy is an important

361: ingredient in some existing controversies, e.g., regarding the presence

362: of dark halos around elliptical galaxies \citep{rom03,dek05}. Loss of

363: information can be avoided when large numbers of datapoints are

364: available, as is often the case for globular clusters. It is then

365: possible to create velocity histograms for binned areas on the

366: projected plane of the sky, after which analysis can be done with

367: existing Schwarzschild codes (e.g., \citealt{bos06}). While this is

368: possible for large datasets, such an approach is not viable for the

369: more typical, smaller datasets that are often available. The

370: availability of Schwarzschild codes that can fully exploit the

371: information content of such smaller datasets would therefore be

372: valuable to advance this subject.

373:

374: Motivated by these considerations we set out to adapt our existing

375: Schwarzschild code \citep{vdm98} to deal with discrete datasets. This

376: does not constitute a trivial change, since it changes the constrained

377: superposition procedure from a linear matrix problem to a more

378: complicated maximum likelihood one. For each observed velocity of a

379: particle in the system the question becomes: what is the probability

380: that this velocity would have been observed if the model is correct?

381: The overall likelihood of the data, given a trial model, is the

382: product of these probabilities for all observations. Such likelihood

383: problems have previously been solved for spherical systems

384: \citep{mer93,vdm00,wu06} and the special class of axisymmetric

385: $f(E,L_z)$ systems \citep{mer97,wu07}. However, for the axisymmetric

386: Schwarzschild modeling approach the problem corresponds to finding the

387: minimum of a function in a space with a dimension of

388: $10^3$--$10^4$. We show in this work, via the \sch modeling of

389: simulated datasets, that this problem can indeed be solved

390: successfully and efficiently. Moreover, we follow \cite{glenn06} and

391: implement in our new code the ability to calculate and fit proper

392: motions in addition to LOS velocities. Applications of the code to

393: real datasets will be presented in forthcoming papers.

394:

395: The structure of the paper is as follows. In Section \ref{sec:logL} we

396: phrase the new problem of fitting a \sch model to a dataset of

397: discrete velocities (of one, two, or three dimensions) of individual

398: kinematic tracers in terms of a likelihood formalism. Section

399: \ref{sec:code} describes the implementation of the discrete fitting

400: procedure into our existing \sch code. At the same time, we summarize

401: here the major steps involved in the construction of the probability

402: matrix that describes the likelihood of a given kinematic data point

403: belonging to some particular orbit of the library. We then present in

404: Section \ref{sec:tools} sets of simulated data that we use for the

405: purpose of testing the performance of the discrete \sch code. We also

406: describe the known input distribution functions from which these data

407: were drawn. The application of the code to the simulated datasets is

408: presented in Section \ref{sec:tests}. We present a thorough analysis

409: of the accuracy with which our discrete \sch code recovers the known

410: distribution function, mass-to-light ratio and inclination used to

411: generate the simulated data. Finally, in Section \ref{sec:end} we

412: summarize our findings and present our conclusions.

413:

414: \section{Linear and non-linear constraints in the likelihood formalism}

415: \label{sec:logL}

416:

417: In the \sch scheme the properties of every orbit $j$ in the orbit

418: library are computed and stored. The modeling consists of finding the

419: superposition of orbital weights $a_j^2$, i.e., the fraction of

420: particles in the system residing in each orbit, that best reproduces

421: some set of constraints. The weights are written as squares to ensure

422: that they never become negative. Linear constraints are of the form

423: %

424: \begin{equation}

425: \label{eq.constraint}

426:   {\gamma_k}^* = {\gamma_k} \pm {\sigma_k} , \quad k=1,\ldots,M .

427: \end{equation}

428: %

429: Here ${\gamma_k}$ is a constraint value that needs to be reproduced,

430: ${\sigma_k}$ is its uncertainty, and ${\gamma_k}^*$ is its model

431: prediction

432: %

433: \begin{equation}

434: \label{eq.gamma.def}

435:   {\gamma_k}^* = \sum_j B_{kj} a_j^2/\sum_j a_j^2 .

436: \end{equation}

437: %

438: The matrix $B_{kj}$ represents here, for orbit $j$, the probability

439: distribution corresponding to the constraint $\gamma_k$. The

440: constraints are generally one of the following: (a) the integrated

441: light (surface brightness) of a stellar population in some aperture

442: number in the projected plane of the sky, necessary to reproduce an

443: observational measurement of the surface brightness; (b) the mean LOS

444: velocity, velocity dispersion, or for data of sufficient quality, a

445: higher-order Gauss-Hermite moment in some aperture number in the

446: projected plane of the sky, necessary to reproduce an observational

447: measurement of the stellar kinematics; (c) the integrated mass in some

448: meridional $(R,z)$ plane grid point, necessary to provide a consistent

449: model; (d) a combination of distribution function moments in some

450: meridional $(R,z)$ plane grid point, if a model with a particular

451: dynamical structure is desired (e.g., one may want a model with $\rho

452: (\overline{v_R^2}-\overline{v_z^2})$ equal to zero in order to

453: simulate a two-integral $f(E,L_z)$ model); (e) a combination of orbit

454: weights, if regularization constraints are desired to enforce

455: smoothness of the model in phase space (e.g., one can set the N-th

456: order divided difference of adjacent orbit weights to zero, with an

457: uncertainty ${\sigma_k}$ that measures the desired amount of

458: smoothing).

459:
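As a purely illustrative example of equation~(\ref{eq.gamma.def}), with
hypothetical values chosen only for arithmetic convenience, consider a
library of two orbits with weights $a_1^2 = 1$ and $a_2^2 = 3$, and a
constraint $k$ for which $B_{k1} = 0.2$ and $B_{k2} = 0.6$. The model
prediction is then
%
\begin{displaymath}
  {\gamma_k}^* = \frac{0.2 \times 1 + 0.6 \times 3}{1 + 3}
               = \frac{2.0}{4} = 0.5 .
\end{displaymath}
%
Multiplying both weights by a common factor leaves ${\gamma_k}^*$
unchanged, so only the relative orbital weights matter.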

460: It is natural to choose the best-fitting model to be the one that

461: produces the maximum likelihood. To determine the likelihood we need

462: to write down an expression for the probability of measuring

463: $\gamma_k$ among all its possible values. To do this, we recall that

464: any model is not an attempt to reproduce a set of observations to

465: infinite accuracy, but instead to do it within the uncertainty

466: $\sigma_k$. For observational constraints, such as those in (a) and

467: (b) above, $\sigma_k$ is equal to the measurement uncertainty. For

468: other constraints, such as those in (c)-(e) above, $\sigma_k$ can be

469: used as a forcing parameter that sets how sharply the likelihood

470: needs to peak around a particular value of $\gamma_k$. If one assumes

471: that these uncertainties have a normal (Gaussian) distribution, then

472: the probability we are interested in is given by

473: %

474: \begin{eqnarray}

475: \label{eq.non.linear.term}

476: P(\gamma_k) &=&

477: {1\over \sqrt{2\pi} {\sigma_k}}

478: \exp\left[-{1\over 2 {\sigma_k}^2}

479: \left(\gamma_k- {\gamma_k}^* \right)^2 \right] .

480: %\\

481: %&=&{1\over \sqrt{2\pi} {\sigma_k}}

482: %\exp\left[-{1\over 2 {\sigma_k}^2}

483: %\left(\gamma_k- {\sum_j  B_{kj} a_j^2 \over\sum_j a_j^2} \right)^2

484: %\right] . \nonumber

485: \end{eqnarray}

486: %

487: The combined probability for the simultaneous occurrence of all $M$

488: linear constraints is then given by the product of the single

489: probabilities, $L_{\rm linear} = \prod_k P(\gamma_k)$. Using equation

490: (\ref{eq.non.linear.term}), the logarithm of this linear part of the

491: likelihood is therefore

492:

493: \begin{equation}

494: \label{eq:loglinear}

495:   \ln L_{\rm linear} = - \sum_{k=1}^M  \ln \left(\sqrt{2\pi}\, \sigma_k\right) \, - \,

496:           \sum_{k=1}^M \left(

497:                   \frac{\gamma_k-\gamma_k^*}{\sqrt{2}\,\sigma_k}\right)^2 .

498: \end{equation}

499: %

500: The first sum on the right-hand side of this expression does not

501: depend on the orbital weights $a_j^2$ and, therefore, does not affect

502: the likelihood maximization. The second sum is equal to one half of the

503: $\chi^2$ statistic. Maximizing the likelihood is therefore equivalent

504: to the minimization of this $\chi^2$. This can be done by finding the

505: solution of the set of equations~(\ref{eq.constraint}) and

506: (\ref{eq.gamma.def}), which can be rewritten as an overdetermined

507: matrix equation. This matrix equation can be solved with the use of

508: standard non-negative least-squares (NNLS) algorithms (see

509: \citealt{rix97} for a detailed description).

510:
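One possible way to make the matrix form explicit is sketched below. The
normalized weights $w_j$ are notation introduced here for illustration
only and are not used elsewhere in this paper. Defining
$w_j \equiv a_j^2 / \sum_{j'} a_{j'}^2$, so that $w_j \geq 0$ and
$\sum_j w_j = 1$, equation~(\ref{eq.gamma.def}) reads
${\gamma_k}^* = \sum_j B_{kj} w_j$, and the second sum in
equation~(\ref{eq:loglinear}) becomes
%
\begin{displaymath}
  \sum_{k=1}^M \left(
     \frac{\gamma_k-{\gamma_k}^*}{\sqrt{2}\,\sigma_k}\right)^2
  = \frac{1}{2} \sum_{k=1}^M \left(
     \frac{\gamma_k}{\sigma_k} - \sum_j \frac{B_{kj}}{\sigma_k}\, w_j
     \right)^2 .
\end{displaymath}
%
This is one half of the squared residual of the linear system
$\sum_j (B_{kj}/\sigma_k)\, w_j = \gamma_k/\sigma_k$, which has one row
per constraint and one column per orbit, and which is to be minimized
subject to $w_j \geq 0$. How the normalization $\sum_j w_j = 1$ is
imposed in practice (for instance, as an additional constraint row) is
an assumption of this sketch and is not specified here.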

511: In the case of discrete data, however, the introduction of constraints

512: of a ``non-linear'' type is inevitable in order to adequately exploit

513: the entire information content available, avoiding restrictive

514: simplifications and loss of information due to binning.  This occurs

515: because the individual probabilities do not necessarily have the

516: simple, Gaussian form of equation (\ref{eq.non.linear.term}).  The

517: procedure for finding the maximum likelihood then cannot be cast as

518: the solution of a linear matrix equation anymore.

519:

520: Suppose we have a kinematic dataset consisting of discrete

521: measurements which we are trying to model using the \sch

522: technique. Let $P_j({\bf q})$ be the phase-space probability

523: distribution of any given orbit, properly averaged azimuthally, and

524: normalized such that $\int{P_j({\bf q}){\rm d^3}r\,{\rm d^3}v} = 1$.

525: We use ${\bf q}$ to denote a vector of up to six Euclidean spatial and

526: velocity coordinates. Whenever ${\bf q}$ is shorter than 6 elements,

527: it is understood that the distribution has been marginalized over the

528: missing dimensions. Then the total probability of drawing a particle

529: from a superposition of orbits representing the whole system is

530: %

531: \begin{equation}

532: \label{eq.prob.q}

533:   P({\bf q}) = \sum_j a^2_j P_j({\bf q}) /\sum_j a_j^2 .

534: \end{equation}

535:
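As a consistency check, and assuming that every $P_j$ is normalized as
stated above, the superposition in equation~(\ref{eq.prob.q}) is itself
normalized:
%
\begin{displaymath}
  \int P({\bf q})\,{\rm d^3}r\,{\rm d^3}v =
  \frac{\sum_j a_j^2 \int P_j({\bf q})\,{\rm d^3}r\,{\rm d^3}v}
       {\sum_j a_j^2} = 1 .
\end{displaymath}
%
For a single measurement ${\bf q}_i$ (the index $i$ is notation
introduced here for illustration), the logarithm of
equation~(\ref{eq.prob.q}) is
%
\begin{displaymath}
  \ln P({\bf q}_i) = \ln \left[ \sum_j a_j^2 P_j({\bf q}_i) \right]
                   - \ln \left[ \sum_j a_j^2 \right] .
\end{displaymath}
%
Unlike the second sum in equation~(\ref{eq:loglinear}), this expression
is not a quadratic function of the normalized orbital weights, which
illustrates why the discrete problem cannot be cast as a linear matrix
equation.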

536: We now need to consider the total probability of the ensemble of $N$

537: particles with kinematic information that constitute our discrete

538: dataset. Before this, however, it is necessary to make the

539: distinction, in the language of probabilities, between the possible

540: modes of sampling of the tracers available in a system of particles.

541: The two main possibilities depend on whether the particles are

542: randomly or non-randomly drawn from their spatial distribution, and we

543: may refer to these, respectively, as random positional sampling and

544: incomplete positional sampling. Additionally, particles may be drawn

545: with or without velocity information, thus adding up to a total of

546: four possibilities. The case with incomplete positional sampling and

547: no velocity information, however, does not provide any useful

548: constraint to the analysis and therefore we restrict the discussion to

549: the remaining three cases.

550:

551: For particles drawn randomly from the spatial distribution with no

552: velocity information, the probability $P({\bf q})$ is

553:

554: \begin{equation}

555: \label{eq.prob.r}

556: P({\bf q}) = P({\bf r}) = \sum_j a^2_j P_j({\bf r}) /\sum_j a_j^2\,,

557: \end{equation}

558:

559: \noindent where ${\bf r}$ represents a 2- or 3-dimensional position. This

560: type of dataset could result from imaging of the resolved populations

561: of a stellar system, where the positional information could be used as

562: actual constraints. This would force the model to fit the underlying

563: spatial distribution of discrete tracers, instead of making use of a

564: parametrization of the (continuous) brightness profile of the system.

565: Of course, a dataset without velocity information cannot by itself

566: constrain the dynamical state or the mass of the system.

567:

568: In the case of random positional sampling including velocity

569: information, particles are randomly drawn from both the spatial and

570: velocity distributions. In this case, $P({\bf q})$ has the form

571:

572: \begin{equation}

573: \label{eq.prob.rv}

574: P({\bf q}) = P({\bf r},{\bf v}) = \sum_j a^2_j P_j({\bf r},{\bf v}) /\sum_j a_j^2\,,

575: \end{equation}

576:

577: \noindent where ${\bf r}$ is the same as above and ${\bf v}$

578: represents a general one-, two-, or three-dimensional velocity. This is

579: the case when the velocities of particles in a given field can be

580: obtained without introducing any spatial or velocity bias, such as

581: the proper motions of all stars (brighter than some magnitude limit)

582: in a sufficiently sparse stellar cluster, or when LOS velocities are

583: obtained for a complete (or possibly magnitude-limited) set of

584: globular clusters or planetary nebulae in a galaxy.

585:

586: In contrast, having {\it incomplete} positional sampling means that

587: the particles are drawn from a velocity distribution, with {\it a

588: priori} fixed positions. This can occur, for example, when because of

589: the usually limited availability of telescope time and resources, LOS

590: velocities are measured only for stars within some distance from the

591: photometric major or minor axes of a galaxy, or when because of the

592: finite size of fibers in a fiber-fed spectrograph, not all the

593: potentially observable kinematic tracers in the field can be actually

594: acquired. Incomplete positional sampling also arises when, even though

595: particles can be randomly drawn spatially, this is the case only for a

596: limited area. This occurs, for example, when the observations have to

597: avoid the innermost regions of a galaxy or globular cluster, where,

598: because of crowding, stars cannot be individually resolved. In these

599: cases, $P({\bf q})$ has the form

600:

601: \begin{equation}

602: \label{eq.prob.rfix}

603: P({\bf q}) = P({\bf v}|{\bf r}) = \sum_j a^2_j P_j({\bf r}) P_j({\bf v}|{\bf r})

604:     /\sum_j a_j^2 P_j({\bf r}),

605: \end{equation}

606:

607: \noindent where, rather than just $a_j^2$, the effective weights when

608: summing together the individual orbital distributions are

609: $a_j^2\,P_j({\bf r})$.
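
Equation (\ref{eq.prob.rfix}) follows from the definition of a
conditional probability applied to equations (\ref{eq.prob.r}) and
(\ref{eq.prob.rv}). As a short sketch, assuming only that each orbital
distribution factorizes as $P_j({\bf r},{\bf v}) = P_j({\bf r})\,
P_j({\bf v}|{\bf r})$,
%
\[
P({\bf v}|{\bf r}) = \frac{P({\bf r},{\bf v})}{P({\bf r})}
 = \frac{\sum_j a_j^2 P_j({\bf r},{\bf v}) / \sum_j a_j^2}
        {\sum_j a_j^2 P_j({\bf r}) / \sum_j a_j^2}
 = \frac{\sum_j a_j^2 P_j({\bf r})\, P_j({\bf v}|{\bf r})}
        {\sum_j a_j^2 P_j({\bf r})} .
\]
%
The overall normalization $\sum_j a_j^2$ cancels between numerator and
denominator, which is why it does not appear in equation
(\ref{eq.prob.rfix}).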

610:

611: Once the individual probabilities for all possible cases of spatial

612: sampling that comprise the data have been properly assigned, we can

613: proceed to the construction of the total probability of observing the

614: entire dataset. Let $N_1$ and $N_2$ be the number of observational

615: data points obtained under the mode of random positional sampling

616: without and with velocity information, respectively, and $N_3$ the

617: number of data points obtained with incomplete positional sampling

618: with kinematic information. Then, the total probability is simply the

619: product of the individual probabilities, with logarithm given by $\ln

620: L_{\rm discrete} = \sum_{i=1}^{N_1}\ln P({\bf r_i}) +

621: \sum_{i=1}^{N_2}\ln P({\bf r_i},{\bf v_i}) + \sum_{i=1}^{N_3}\ln

622: P({\bf v_i}|{\bf r_i})$. Using equations (\ref{eq.prob.r}) to

623: (\ref{eq.prob.rfix}), and adopting the abbreviated notation

624: $p^{(r)}_{i,j} \equiv P_j({\bf r_i})$, $p^{(q)}_{i,j} \equiv P_j({\bf

625: r_i,v_i})$, and $p^{(v)}_{i,j} \equiv P_j({\bf v_i | r_i})$ (all known

626: for each orbit $j$ and particle $i$ from the orbit library

627: calculation; see \S\,3), the quantity to maximize becomes

628: %

629: \begin{eqnarray}

630: \label{eq.total.likelihood.1}

631: \ln L_{\rm discrete}

632: &=&  \sum_{i=1}^{N_1} \left(\ln \sum_j a^2_j p^{(r)}_{i,j} -  \ln \sum_j a^2_j\right) \\

633: &+&  \sum_{i=1}^{N_2} \left(\ln \sum_j a^2_j p^{(q)}_{i,j} -  \ln \sum_j a^2_j\right) \nonumber \\

634: &+&  \sum_{i=1}^{N_3} \left( \ln \sum_j a^2_j p^{(r)}_{i,j}  p^{(v)}_{i,j} -  \ln \sum_j a^2_j  p^{(r)}_{i,j} \right). \nonumber

635: \end{eqnarray}

636: %

637: Joining the results in equations (\ref{eq:loglinear}) and

638: (\ref{eq.total.likelihood.1}), the complete log-likelihood for a

639: general application of the \sch method, which is the full expression

640: to be maximized with respect to the orbital weights $a_j$, is the sum

641: of the log-likelihoods for linear and discrete constraints

642: %

643: \begin{eqnarray}

644: \label{eq.logL}

645: \ln L = \ln L_{\rm linear} + \ln L_{\rm discrete}.

646: \end{eqnarray}

647: %

648: Finding the maximum likelihood corresponds to finding the solution of

649: $\partial(\ln L)/\partial a_l = 0$, for all $l$. Denoting $s=\sum_j

650: a_j^2$, the expression for the first derivative is

651: %

652: \begin{eqnarray}

653: \label{eq.1st.deriv}

654: {\partial \ln L \over \partial a_l}

655: &=& -\,{2 a_l \over s}

656: \sum_{k=1}^M {1\over {\sigma_k}^2}

657:        \left( \gamma_k - {\gamma_k}^* \right)

658:        \left( - B_{kl} + {{\gamma_k^*}}   \right)\\

659: && +\,2 a_l \sum_{i=1}^{N_1} \left( {p^{(r)}_{i,l}\over \sum_j  a^2_j p^{(r)}_{i,j}} - {1\over s} \right)\nonumber \\

660: && +\,2 a_l \sum_{i=1}^{N_2} \left( {p^{(q)}_{i,l}\over \sum_j  a^2_j p^{(q)}_{i,j}} - {1\over s} \right)\nonumber \\

661: && +\,2 a_l \sum_{i=1}^{N_3} \left( {p^{(r)}_{i,l} p^{(v)}_{i,l}\over

662:      \sum_j  a^2_j p^{(r)}_{i,j}  p^{(v)}_{i,j}}

663:   -  { p^{(r)}_{i,l} \over \sum_j  a^2_jp^{(r)}_{i,j} } \right) . \nonumber

664: \end{eqnarray}
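
The discrete terms in equation (\ref{eq.1st.deriv}) can be checked
with the chain rule applied to each logarithm in equation
(\ref{eq.total.likelihood.1}). As a sketch for a single datum $i$
obtained with random positional sampling and no velocity information,
%
\begin{eqnarray*}
{\partial \over \partial a_l} \ln \sum_j a_j^2\, p^{(r)}_{i,j}
  &=& {2 a_l\, p^{(r)}_{i,l} \over \sum_j a_j^2\, p^{(r)}_{i,j}} , \\
{\partial \over \partial a_l} \ln s
  &=& {2 a_l \over s} ,
\end{eqnarray*}
%
and the difference of the two lines, summed over $i$, gives the $N_1$
term. The $N_2$ and $N_3$ terms follow in the same way, with
$p^{(r)}_{i,j}$ replaced by $p^{(q)}_{i,j}$, or by the pair
$p^{(r)}_{i,j}\,p^{(v)}_{i,j}$ and $p^{(r)}_{i,j}$, respectively. Every
term carries the common factor $2a_l$, which is a direct consequence of
parametrizing the orbital weights as $a_j^2$.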

665:

666: One important remaining question concerns the estimation of

667: confidence regions around the parameters of the best-fitting model,

668: i.e., the (statistical) uncertainties around the likelihood maximum in

669: the case of non-linear constraints. Recalling that maximizing $\ln L$

670: is equivalent to minimizing the quantity $\lambda = -2\ln L$, it is

671: easy to realize that, if the probabilities involved in equation

672: (\ref{eq.total.likelihood.1}) were all of Gaussian form, then

673: $\lambda$ would simply reduce to the well known $\chi^2$ statistic, as

674: we have already seen for the case with linear constraints in equation

675: (\ref{eq:loglinear}). When dealing with non-linear constraints,

676: however, the likelihood does not reduce to a simple $\chi^2$

677: form. Nevertheless, one can still use another well-known theorem of

678: statistics, used before by \citet{mer93a} and \citet{vdm00}, which

679: states that the ``likelihood-ratio'' statistic $\lambda - \lambda_{\rm

680: min}$ does tend to a $\chi^2$ statistic in the limit of large $N$,

681: with the number of degrees of freedom equal to the number of free

682: parameters that have not yet been varied and chosen so as to optimize

683: the fit. Therefore, the likelihood-ratio statistic reduces to the

684: $\Delta\chi^2$ statistic for $N\rightarrow\infty$, even though the

685: probabilities in equation (\ref{eq.total.likelihood.1}) are not all

686: individually Gaussian. Since in the present work we explore datasets

687: consisting of 100 kinematic measurements or more, the condition of

688: large $N$ should be reasonably fulfilled. Therefore, following the

689: likelihood-ratio statistic, we assume $\Delta\chi^2 = -2(\ln L-\ln

690: L_{\rm max})$, and compute the confidence regions around the best-fit

691: parameters in the usual way (e.g., \citealt{recipes}), i.e., with the

692: $1\sigma$ error for a single parameter corresponding to wherever

693: $\Delta\chi^2 = 1$, and so forth. Other approaches to quantify the

694: uncertainties exist as well, e.g., using Bayesian statistics, but

695: these are generally more difficult to implement (e.g.,

696: \citealt{mag06}).
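
To make the ``and so forth'' concrete, the standard single-parameter
thresholds can be written directly in terms of the log-likelihood.
This is a worked restatement of the relation $\Delta\chi^2 = -2(\ln
L-\ln L_{\rm max})$ adopted above, not an additional result:
%
\[
\begin{array}{lll}
1\sigma: & \Delta\chi^2 = 1, & \ln L_{\rm max} - \ln L = 0.5, \\
2\sigma: & \Delta\chi^2 = 4, & \ln L_{\rm max} - \ln L = 2, \\
3\sigma: & \Delta\chi^2 = 9, & \ln L_{\rm max} - \ln L = 4.5 .
\end{array}
\]
%
For a joint confidence region in two parameters, the corresponding
$68.3\%$ threshold is $\Delta\chi^2 = 2.30$ rather than 1.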

697:

698: The equations described above assume that any possible ``interloper''

699: contaminants have already been removed, and that the targets with

700: observed velocities that enter the likelihood equations all belong to

701: the system under study. For realistic datasets, contamination by

702: interlopers, i.e., targets that happen to lie close to the line of

703: sight of the stellar system under study and are difficult to reject

704: from the sample, can certainly be a problem \citep{lok05}. However,

705: efficient interloper rejection schemes do exist for various types of

706: samples and these have been well described in the literature

707: \citep{woj07a,woj07b}. Moreover, the use of empirically calibrated

708: selection criteria (independent of the measured velocity) can produce

709: extraordinarily clean samples for kinematic analysis

710: \citep{gil06,gil07,sim07}. Either way, interloper rejection is best

711: discussed in the context of specific data sets. We therefore do not

712: discuss it further in the present paper. Interloper rejection for

713: discrete data sets can also be built in as part of the likelihood

714: analysis \citep{vdm00}, so a simple modification of the likelihood

715: equations given above could deal with interlopers explicitly. However,

716: we have not yet explored this in the present context.

717:

718: \section{Computational Implementation}

719: \label{sec:code}

720:

721: Given equations (\ref{eq.logL}) and (\ref{eq.1st.deriv}), fitting a

722: \sch model to the data requires the following two steps: (a)

723: determination of all the individual probabilities $p_{i,j}$ and matrix

724: elements $B_{kj}$, so that the only unknowns in equation

725: (\ref{eq.1st.deriv}) are the coefficients $a_l$; and (b) performing

726: the maximization of the total likelihood, i.e., finding the set of

727: orbital weights $a_l$ that satisfies $\partial(\ln L)/\partial a_l =

728: 0$, for all $l$, and therefore best fits the available constraints. The

729: elements of the matrix $B_{kj}$, corresponding to the linear

730: constraints discussed in \S\,\ref{sec:logL}, are calculated in the

731: same way as in the old (continuous) implementation of the code, and

732: for them we refer to \citet{rix97}, \citet{vdm98} and

733: \citet{cre99}. In what follows we concentrate on the probabilities

734: $p_{i,j}$ associated with the discrete treatment that is the subject

735: of this work.

736:

737: \subsection{Calculation of Individual Probabilities}

738: \label{sec:pij}

739:

740: The matrix elements $p_{i,j}$ in equation (\ref{eq.1st.deriv}), which

741: keep track of the probability that orbit $j$ of the library would have

742: produced the measurement $i$ of the dataset (each $j$ corresponding to

743: some combination of the three integrals of motion $E$, $L_z$, and

744: $I_3$), are stored as the orbit in question is being computed. That

745: is, at every time step during the orbit integration, we check whether

746: the position and velocity along the orbit are consistent with any of

747: the observational datapoints.  To accomplish this, it is necessary to

748: implement some degree of {\it smoothing}, both in position and

749: velocity space, since otherwise the probability of having a particle

750: on an orbit at exactly the observed position and velocity would be

751: infinitesimally small.

752:

753: Smoothing in the spatial coordinates is accomplished through the

754: definition of an {\it aperture} around the position of each particle

755: in the dataset, with the size of the aperture controlling the amount

756: of smoothing. The optimal aperture size will be somewhat dependent on

757: the sampling characteristics of the data. In general, apertures should

758: not be too small, or otherwise few time steps during orbit integration

759: will fall on any one of them. This would lead to large shot noise in

760: the computed probabilities $p_{i,j}$, unless the orbits are integrated

761: for very long times. Nor should the apertures be too large, so that

762: information on the orbital structure of the model is not erased by

763: excessive spatial smoothing. The choice of aperture shape is arbitrary

764: and a matter of numerical convenience. We adopt square apertures as in

765: previous implementations of the code (long-slit observations naturally

766: produce data for rectangular apertures), and set their sizes to a

767: user-supplied fraction of $R$, the radius in the projected plane at

768: the aperture's position.

769:

770: Once the spatial apertures are defined, and every time the projected

771: position along the orbit being integrated falls within an aperture, we

772: need to keep track of whether the orbital velocity matches the

773: observed velocity. In the old (continuous) implementation, the LOSVD

774: was computed and stored for every orbit $j$ at each aperture $i$, with

775: the size of the bins in the histogram determining the amount of

776: smoothing in velocity space. In our discrete treatment of the problem,

777: $p_{i,j}$ would simply be the histogram value for the bin that

778: contains the observed velocity. A direct, though informationally

779: incomplete, generalization of this implementation to kinematical data

780: in three dimensions would be to keep track of two additional

781: histograms at each aperture to account for $\mu_{x'}$ and

782: $\mu_{y'}$. This has been done by \citet{glenn06} and \citet{bos06},

783: who calculated moments of the three model velocity distributions and

784: fitted them to those obtained from binning LOS and proper-motion

785: observations of stars in $\omega$ Cen and M15, respectively (note that

786: these studies still handle the data in a continuous fashion, by

787: reducing the initially discrete datasets to binned velocity

788: distributions at a number of apertures on the sky, an approach only

789: possible thanks to the very large number of stars with measured

790: velocities in these systems).

791:

792: While reproducing the three-dimensional mean velocities and

793: dispersions of the stars in a stellar system is already an improvement

794: over all previous implementations of the \sch technique, doing so is

795: nevertheless a simplification of the problem. The reason is that it

796: implicitly assumes that the three velocity components are independent

797: of each other, i.e., it does not account for the fact that there is a

798: velocity ellipsoid whose cross terms are, in the most general case,

799: not identical to zero. The most complete treatment would be to store a

800: cube with entries for all possible combinations of

801: $(\mu_{x'},\mu_{y'},v_{z'})$, and do this at each spatial aperture where there

802: is kinematical data available.  This implementation would be, however,

803: expensive in terms of memory storage and, moreover, not absolutely

804: necessary, simply because we are not interested in the entire

805: probability cube. Instead, we only need probabilities in the cases

806: when the model velocities are close to the observed ones. Thus, in the

807: framework of velocity histograms or full velocity cubes, and because

808: of the discrete nature of the data, the large majority of the bins or

809: entries would be filled with weights that do not affect the likelihood

810: in equation (\ref{eq.logL}).

811:

812: Therefore, we adopt an approach in which, instead of storing velocity

813: histograms or cubes, every time an orbit $j$ passes through an

814: aperture $i$ with kinematical data, we add a Gaussian contribution to

815: $p_{i,j}$. This contribution is centered on the observed

816: (any-dimensional) velocity and has a dispersion that reflects the

817: measurement errors, and if desired, any amount of extra velocity

818: smoothing. Thus, denoting the actually measured components of the

819: particle's velocity in aperture $i$ as $v_{ik}$ and their associated

820: uncertainties as $e_{ik}$, with $k=1\ldots3$ corresponding to

821: $v_{x'}$, $v_{y'}$, and $v_{z'} = v_{\rm los}$, the multiplicative

822: contribution $w_{ik}^{(j)}$ to the probability has the form

823: %

824: \begin{eqnarray}

825: \label{eq.weights}

826: w_{ik}^{(j)} &=&

827: {1\over \sqrt{2\pi\left(\xi_k^2 + e_{ik}^2\right)}} \exp{\left[-\,\frac{\left(v_{jk}-v_{ik}\right)^2}{2 \left(\xi_k^2+e_{ik}^2\right)}\right]},

828: \end{eqnarray}

829: %

830: where $v_{jk}$ is the component $k$ of the velocity of a test particle

831: on orbit $j$. The quantity $\xi_k$ is the numerical smoothing assigned

832: to velocity component $k$. Whenever a particular component $k$ of the

833: velocity is not available, we set $w_{ik}^{(j)} = 1$. Finally, in

834: order to account for the fact that we represent a continuous orbit by

835: a discrete sequence of time steps, we weight this Gaussian factor by

836: multiplying it by the timestep $\Delta t_j$. Therefore, for every

837: orbit $j$, and every time the orbit integration falls within an

838: aperture, the probability is increased according to

839: %

840: \begin{eqnarray}

841: \label{eq.pij}

842: p_{i,j} = p_{i,j} + \Delta t_j \prod_{k=1}^3 w_{ik}^{(j)}.

843: \end{eqnarray}

844: %

845: When the integration of orbit $j$ is done, the $p_{i,j}$ elements for

846: all datapoints (apertures) are written to a file for later use by the

847: algorithm that performs the maximization of the likelihood.
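
Written out explicitly, and as a sketch in notation that is ours
rather than that of the code, the accumulation in equation
(\ref{eq.pij}) over the time steps $t_n$ of orbit $j$ amounts to the
sum
%
\[
p_{i,j} = \sum_{n} \Delta t_j\, \chi_i\left[{\bf r}_j(t_n)\right]
          \prod_{k=1}^{3} w_{ik}^{(j)}(t_n) ,
\]
%
where $\chi_i$ is an indicator function (introduced here for
illustration only) that equals 1 when the projected orbital position
${\bf r}_j(t_n)$ falls inside aperture $i$ and 0 otherwise, and
$w_{ik}^{(j)}(t_n)$ is equation (\ref{eq.weights}) evaluated with the
orbital velocity at $t_n$. For a datum with only a LOS velocity,
$w_{i1}^{(j)} = w_{i2}^{(j)} = 1$ and the product reduces to the
single Gaussian factor $w_{i3}^{(j)}$.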

848:

849: In most practical applications one can set $\xi_k = 0$, since the

850: error bars $e_{ik}$ on the data already provide sufficient natural

851: smoothing for numerical efficiency. We do this throughout the rest of

852: this paper. However, we note that there may be situations in which

853: non-zero $\xi_k$ may be beneficial, for example when the observational

854: errors $e_{ik}$ are much smaller than the velocity dispersions

855: $\sigma_k$ of the system. It then takes very long integrations to beat

856: down the shot noise in the orbital distributions $p_{i,j}$. Addition

857: of a numerical smoothing $\xi_k$ with $e_{ik} \ll \xi_k \ll \sigma_k$

858: can then speed up the calculations without affecting the accuracy of

859: the results.
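
As a purely illustrative example, with numbers that are assumed here
and not taken from any dataset in this paper, consider $e_{ik} =
1$\kms, $\xi_k = 5$\kms, and $\sigma_k = 50$\kms. The width of the
Gaussian in equation (\ref{eq.weights}) is then
%
\[
\sqrt{\xi_k^2 + e_{ik}^2} = \sqrt{26} \approx 5.1\ {\rm km\,s^{-1}} ,
\]
%
about five times wider than with $\xi_k = 0$, so that correspondingly
more time steps contribute appreciably to $p_{i,j}$. At the same time,
if the velocity distribution is roughly Gaussian, smoothing by $\xi_k$
broadens a dispersion $\sigma_k$ only to
%
\[
\sqrt{\sigma_k^2 + \xi_k^2} = \sqrt{2525} \approx 50.25\ {\rm km\,s^{-1}} ,
\]
%
a change of about $0.5\%$.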

860:

861: The approach of equations (\ref{eq.weights}) and (\ref{eq.pij})

862: assumes that the errors $e_{ik}$ for the different datapoints are

863: uncorrelated. Sometimes this is not true, as in the case of the proper

864: motions of stars in the globular cluster $\omega$ Cen, where relative

865: rotation between the old photographic plates used in their derivation

866: produces an artificial overall rotation of the cluster

867: \citep{glenn06}. If problems like these cannot be removed before

868: modeling, a more complicated treatment than the one described here

869: will be necessary.

870:

871: \subsection{Finding the maximum likelihood solution}

872: \label{sec.mkfitin}

873:

874: The non-linear nature of the discrete problem addressed in this paper

875: requires the use of a non-linear optimizer, and there is no guarantee

876: of a unique optimum. After experimentation with various optimization

877: algorithms, we settled on the TOMS 500 conjugate gradient optimizer of

878: \citet{sha80}. This code uses the function value and gradient to

879: optimize along successive vectors (lines) in the space of the orbital

880: weights, choosing the optimization direction at every pass in a manner

881: that attempts to minimize the number of such line minimizations needed

882: (see Chapter 10 of \citealt{recipes} for details on conjugate gradient

883: methods).

884:

885: In our code, we rely on the fact that the majority of orbits do not

886: contribute to any particular linear constraint, or to the likelihood

887: of any particular observational datum. In the notation of equation

888: (\ref{eq.1st.deriv}), the linear constraints $B_{k,l}$, and also the

889: $p_{i,l}$, are sparse matrices. Accordingly, the code to evaluate the

890: gradient in equation (\ref{eq.1st.deriv}) is written to store and

891: evaluate only non-zero terms of $B_{k,l}$ and $p_{i,l}$, reducing the

892: computational burden by a factor of four or five.

893:

894: To evaluate convergence and estimate the proximity of our final

895: likelihood maximum to the true (possibly local) maximum, we plot the

896: magnitude of the improvement of the likelihood $\delta\lambda$ as a

897: function of the number of function evaluations $N$. See Figure

898: \ref{fig:mkfitin}. We find that $\delta\lambda$ is well represented by

899: an exponential relation $\delta\lambda \sim \exp{(-aN)}$, where

900: $a\approx 10^{-5}$. Therefore, the future change in the likelihood if

901: the optimizer were allowed to run forever would be $\Delta \ln L \sim

902: \int_{N_0}^\infty \delta\lambda\, dN = a^{-1} (\delta\lambda)_0$,

903: where $(\delta\lambda)_0$ is the current change in likelihood at step

904: $N_0$. In practice, we terminate the optimization at $\delta\lambda =

905: 10^{-6}$--$10^{-7}$, so that we expect to be within an additive offset of

906: $\leq 0.1$ of the true likelihood maximum. This typically occurs after

907: a number $N\sim 10^5$ of function evaluations.  The final accuracy

908: scales only as $a^{-1}$ in the exponential coefficient $a$, so that this

909: accuracy estimate should be reasonably robust.
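
The arithmetic behind this estimate can be spelled out as follows,
assuming the exponential decay continues unchanged beyond $N_0$:
%
\[
\int_{N_0}^{\infty} (\delta\lambda)_0\, e^{-a(N-N_0)}\, {\rm d}N
  = \frac{(\delta\lambda)_0}{a} ,
\]
%
so that for $a \approx 10^{-5}$ a stopping value $(\delta\lambda)_0 =
10^{-6}$ gives a remaining change of $10^{5}\times 10^{-6} = 0.1$, and
$(\delta\lambda)_0 = 10^{-7}$ gives $10^{5}\times 10^{-7} = 0.01$.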

910:

911: We ran a variety of tests in order to establish whether the algorithm

912: tends to find local extrema as opposed to global ones. In

913: particular, for some of the test cases to be discussed later in

914: \S\,\ref{sec:tests}, we started the iterative algorithm from different

915: initial conditions, to verify that the solutions thus obtained were

916: always in (statistical) agreement. Also, as will be shown in

917: \S\,\ref{sec:tests}, we find that the algorithm recovers the

918: properties of known input models with reasonable accuracy. While this

919: does not prove that the \sch code cannot end up in a local maximum, at

920: least it shows that the code does not end up in (potential) local

921: maxima that are far from the correct solution.

922:

923: In practice we usually start the maximization procedure from a

924: homogeneous set of initial mass weights. We also investigated whether

925: the convergence to a solution can be sped up by starting the iterative

926: process from initial conditions that may already be reasonably close

927: to the final solution. For example, we ran tests starting from a set

928: of weights corresponding to a two-integral DF of the form $f(E,L_z)$

929: that already fits the light (surface brightness) profile followed by

930: the input data. Such a solution is easily obtained as the NNLS

931: solution of a matrix equation. We found that the same final answer was

932: reached in essentially the same number of iterations.

933:

934:

935: %\newpage

936:

937:

938: \section{Pseudo-Data and Comparison Distribution Functions}

939: \label{sec:tools}

940:

941: In order to test the performance of our discrete \sch code, we

942: generate sets of simulated data drawn from a known phase-space

943: distribution function (DF). Unlike the case of using actual

944: observations of a real stellar system, this approach offers the

945: advantage of unambiguously knowing in advance the input properties

946: underlying the data, which an optimally-working code should be able to

947: ``recover''.  It also provides flexibility by allowing the possibility

948: of adapting the input data at will in order to test different aspects

949: of the code (\S\,\ref{sec:tests}). We discuss here the construction of

950: various sets of pseudo-data and the properties of the underlying

951: models.

952:

953:

954:

955:

956: \subsection{Simulated Datasets}

957: \label{sec:data}

958:

959: Our simulated input data are obtained from a set of $f(E,L_z)$ DFs

960: derived by \citet{vdm98}, with the methodology for drawing N-body

961: initial conditions from these DFs described in \citet{vdm97b}. The

962: models have a constant mass-to-light ratio $\Upsilon$, and have

963: neither a central black hole nor an extended dark halo. They provide good

964: fits to available photometric and kinematic observations of the galaxy

965: M32 over the radial range $1$--$20$ arcsec. However, this property

966: has no bearing on the present analysis. We only use the fact that

967: there is a known DF, and not that this DF resembles any realistic

968: stellar system. A two-integral $f(E,L_z)$ DF provides a useful test

969: case (see also \citealt{cre99}, \citealt{ver02}); its use does not mean

970: that the model results would be less valid for more general DFs. Also,

971: the use of a constant $\Upsilon$ serves only to simplify the

972: test environment. Central black holes (e.g., \citealt{vdm98,geb00})

973: and extended dark halos (e.g., \citealt{rix97,cap06}) can be easily

974: implemented in any \sch code.

975:

976: The luminous mass density is assumed to be axisymmetric and is

977: parameterized according to

978:

979: \begin{equation}

980: \label{eq:light}

981: \rho(R,z) = \rho_{0}(m/b)^{\alpha}[1+(m/b)^2]^{\beta}[1+(m/c)^2]^{\gamma},

982: \end{equation}

983:

984: \noindent with $m^2 \equiv R^2 + (z/q)^2$. Here, $q$ is the (constant)

985: intrinsic axis ratio, related to the projected (observed) axis ratio

986: $q_p$ via the inclination angle $i$, $q_p^2 = \cos^2 i + q^2\sin^2

987: i$. The parameters in equation (\ref{eq:light}) are set to

988: $\alpha=-1.435$, $\beta=-0.423$, $\gamma=-1.298$, $b=0.\arcsec55$,

989: $c=102.\arcsec0$, $q_p=0.73$, and $\rho_0=j_{0}\Upsilon_0$, with the

990: $V$-band luminosity density $j_0 =

991: 0.463\times10^5(q_p/q)$\lsun\,pc$^{-3}$, and $\Upsilon_0$ the

992: mass-to-light ratio in the $V$-band and in solar units. The adopted

993: distance is 0.7 Mpc. The models share the property of appearing the

994: same in projection on the sky, but correspond to different intrinsic

995: axis ratios as determined by the inclination angle $i$.
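
As a worked example of the relation between $q_p$ and $q$, inverting
$q_p^2 = \cos^2 i + q^2\sin^2 i$ for the two inclinations used later in
this paper gives (values rounded, and computed here only for
illustration)
%
\[
q = \frac{\sqrt{q_p^2 - \cos^2 i}}{\sin i} , \qquad
q(i=90^\circ) = 0.73 , \qquad
q(i=55^\circ) = \sqrt{\frac{0.5329 - 0.3290}{0.6710}} \approx 0.55 .
\]
%
The edge-on model therefore has an intrinsic flattening equal to the
projected one, while the $i=55^\circ$ model is intrinsically flatter.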

996:

997: The even part $f_e$ of the DF $f(E,L_z)$ is uniquely determined by the

998: mass density $\rho(R,z)$ (e.g., \citealt{bt87}). To specify the odd

999: part $f_o$ of the DF we follow \citet{vdm94} and write

1000: %

1001: \begin{equation}

1002: \label{eq:odd}

1003: f_o = f_e\,(2\eta-1)\,h_u[L_z/L_{z,{\rm max}}(E)],

1004: \end{equation}

1005: %

1006: with $L_{z,{\rm max}}(E)$ being the angular momentum of a circular

1007: orbit of energy $E$ in the equatorial plane ($z=0$), and the auxiliary

1008: function $h_u$ defined by

1009: %

1010: \begin{equation}

1011: \label{eq:ha}

1012: h_u(x) = \left\{

1013:  \begin{array}{ll}

1014:     \tanh(ux/2)\,\,/\,\,\tanh(u/2)&  \mbox{$(u > 0)$},\\

1015:     x&  \mbox{$(u=0)$},\\

1016:     (2/u)\tanh^{-1}[x\tanh(u/2)]&  \mbox{$(u < 0)$}.

1017:  \end{array}\right.

1018: \end{equation}

1019: %

1020: The choice of the parameters $\eta$ and $u$ determines the degree of

1021: streaming of the dataset. These free parameters can have values in the

1022: ranges $0\leq \eta \leq 1$ and $-\infty < u < \infty$, with $\eta$

1023: controlling the fraction of stars in the equatorial plane with

1024: clockwise rotation, and $u$ controlling the behavior of the stellar

1025: streaming with orbital shape. The family of functions $h_u$ is shown

1026: in Figure 1 of \citet{vdm94}. Combinations of $(\eta,u)$ that fit data

1027: for M32 are also discussed in that paper. Here we explore a variety of

1028: input datasets with different amounts of mean streaming and test the

1029: recovery of these properties by our discrete \sch code.

1030:

1031: We generated 6 different datasets to test our discrete \sch code. By

1032: dataset we mean a number of particle $(x',y')$ positions on the sky

1033: with corresponding proper motions $(\mu_{x'},\mu_{y'})$ and

1034: LOS velocities $v_{z'}$. For two chosen inclinations on the sky,

1035: $i=90\grad$ and $i=55\grad$, we produced three datasets resembling

1036: systems with varying degrees of rotation: a non-streaming system

1037: ($\eta=0.5$ and $u=1$), a maximally-streaming system ($\eta=1$ and

1038: $u=\infty$), and a third system with intermediate streaming ($\eta=1$

1039: and $u=0$). We label our different datasets as 90ns, 90is, and 90ms to

1040: indicate the non-streaming, intermediate-streaming, and

1041: maximally-streaming cases of $i=90\grad$, respectively. Similarly, for

1042: the $i=55\grad$ case, we have the 55ns, 55is, and 55ms datasets. The

1043: mass-to-light ratio used to generate the datasets is $\Upsilon_0=2.51$

1044: for $i=90\grad$ and $\Upsilon_0=2.55$ for $i=55\grad$, in units of

1045: \msun/$L_{\odot,V}$.
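
The three cases can be read off from equations (\ref{eq:odd}) and
(\ref{eq:ha}). As a sketch, assuming the total DF is the sum $f = f_e +
f_o$ and writing $x \equiv L_z/L_{z,{\rm max}}(E)$ as a shorthand:
%
\[
f = \left\{
 \begin{array}{ll}
    f_e &  \mbox{($\eta=0.5$, any $u$; $f_o = 0$)},\\
    f_e\,(1 + x) &  \mbox{($\eta=1$, $u=0$)},\\
    f_e\,[1 + {\rm sgn}(x)] &  \mbox{($\eta=1$, $u\rightarrow\infty$)}.
 \end{array}\right.
\]
%
The last line uses $\tanh(ux/2)/\tanh(u/2) \rightarrow {\rm sgn}(x)$
for $u\rightarrow\infty$ and $x \neq 0$, so that all stars share the
same sense of rotation, whereas for $\eta = 0.5$ the factor
$(2\eta-1)$ vanishes and the value of $u$ is irrelevant.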

1046:

1047: Although we examined the performance of our \sch code with tests that

1048: involve all of the six simulated datasets introduced above, we chose

1049: to use the 55is dataset to show most of our results. Figure

1050: \ref{fig:data55is} shows some projections of the phase-space

1051: coordinates for the 55is dataset.

1052:

1053:

1054:

1055:

1056:

1057:

1058: \subsection{Comparison DF}

1059: \label{sec:DF}

1060:

1061: In order to quantitatively judge the performance of the three-integral

1062: \sch code, it is desirable to make a comparison between the properties

1063: of the input DF (i.e., that from which the pseudo-data were obtained)

1064: and those of the fitted DF (i.e., that found as the solution to the

1065: fitting or minimization problem). It is important to note in this

1066: context that the direct output of our \sch code is not in the form of

1067: a proper DF $f$, but rather in the form of ``mass weights'' $\zeta$

1068: associated with each set of integrals of motion $(E,L_z,I_3)$ that

1069: uniquely defines an orbit. The relation between the DF and the orbital

1070: mass weights is through a volume element dependent on the three

1071: integrals and an integration over the 3-dimensional space associated

1072: to the particular orbit (see \citealt{voort84} for a detailed

1073: discussion). Such a conversion can be done in Schwarzschild codes

1074: (e.g., \citealt{tho04}), but this is not necessary for the goals of

1075: the present paper. We therefore limit ourselves to the comparison

1076: between the input and the fitted orbital mass weight distributions,

1077: which from now on we denote by $\zeta_{\rm in}(E,L_z,I_3)$ and

1078: $\zeta_{\rm fit}(E,L_z,I_3)$, respectively.

1079:

1080: To validate the weights $\zeta_{\rm fit}(E,L_z,I_3)$ returned by the

1081: \sch code, we need to know the weights $\zeta_{\rm in}(E,L_z,I_3)$ for

1082: the model DF $f(E,L_z)$. This is not a simple problem in the absence

1083: of an analytic expression for $I_3$. However, two related functions

1084: are more easily accessible. The first is $\bar\zeta_{\rm in}(E,L_z)$,

1085: defined as the projection of $\zeta_{\rm in}(E,L_z,I_3)$ over the

1086: $E-L_z$ plane (i.e., integrated over $I_3$). Having the means of

1087: drawing N-body initial conditions from the DF \citep{vdm97b}, we know

1088: that the energy and z-component of the angular momentum of each

1089: particle are given by $E=\psi - {1\over 2}v^2$ and $L_z = R\cdot

1090: v_{\phi}$, respectively. Therefore, $\bar\zeta_{\rm in}(E,L_z)$ is

1091: easily obtained by binning a large N-body dataset $(N\sim 10^6)$ in

1092: $E$ and $L_z$. The second related function that is easily accessible

1093: is $\zeta_{\rm Kep,\lambda}(E,L_z,I_3)$, the distribution of mass

1094: weights for an $f(E,L_z)$ model of axial ratio $q$ and a power-law

1095: density profile with logarithmic slope $\lambda$ in a spherical Kepler

1096: potential. This function is calculated analytically in de Bruijne et

1097: al. (1996; their equation (38)), and has the form

1098: %

1099: \begin{equation}

1100: \label{eq:kep}

1101: \zeta_{\rm Kep,\lambda}(E,L_z,I_3) = E^{\lambda-4}\times j_{\lambda}\left[L_z/L_{z,{\rm max}}(E),I_3\right].

1102: \end{equation}

1103: %

1104: Here, $\lambda$ is the logarithmic slope of the mass distribution and

1105: $j_{\lambda}$ is a known function. The spherical Kepler potential is

1106: of course only an accurate approximation to our model at

1107: asymptotically large radii. Nonetheless, we can combine

1108: $\bar\zeta_{\rm in}(E,L_z)$ and $\zeta_{\rm Kep,\lambda}(E,L_z,I_3)$

1109: to obtain a reasonable approximation for $\zeta_{\rm in}(E,L_z,I_3)$

1110: throughout the system, namely

1111: %

1112: \begin{equation}

1113: \label{eq:zetain}

1114: \zeta_{\rm in}(E,L_z,I_3) \approx \bar\zeta_{\rm in}(E,L_z) \times g(I_3),

1115: %j_{\lambda}\left(g(L_z,E),I_3\right).

1116: \end{equation}

1117: %

1118: with

1119: %

1120: \begin{equation}

1121: \label{eq:I3}

1122: g(I_3) \equiv \frac{j_{\lambda}\left[L_z/L_{z,{\rm max}}(E),I_3\right]}{\int j_{\lambda}\left[L_z/L_{z,{\rm max}}(E),I_3\right]{\rm d} I_3}.

1123: \end{equation}

1124: %

1125: For $\lambda$ we take the slope of the mass distribution of equation

1126: (\ref{eq:light}) at $r=R_c$, the radius of the circular orbit of

1127: energy $E$ in the equatorial plane $(z=0)$. The function $\zeta_{\rm

1128: in}$ in equation (\ref{eq:zetain}) is correct (i.e., reduces to

1129: $\bar\zeta_{\rm in}$) when projected on the $E-L_z$ plane, and has

1130: approximately the correct distribution over $I_3$ at fixed

1131: $(E,L_z)$. In this way, we construct sets of orbital mass weights for

1132: each of our 6 simulated datasets described in \S\,\ref{sec:data}.
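
As a quick consistency check of this construction (a sketch that only restates the definitions above), note that $g(I_3)$ in equation (\ref{eq:I3}) integrates to unity over $I_3$ at fixed $(E,L_z)$, so that the right-hand side of equation (\ref{eq:zetain}) satisfies
%
\begin{displaymath}
\int \bar\zeta_{\rm in}(E,L_z)\, g(I_3)\,{\rm d}I_3 = \bar\zeta_{\rm in}(E,L_z) \int g(I_3)\,{\rm d}I_3 = \bar\zeta_{\rm in}(E,L_z).
\end{displaymath}
%
The Kepler approximation therefore enters only through the shape of the distribution over $I_3$, and not through its normalization at each $(E,L_z)$.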

1133:

1134:

1135:

1136:

1137:

1138: \section{Performance Tests}

1139: \label{sec:tests}

1140:

1141: Using all the kinematic (pseudo) datasets and their corresponding

1142: input DFs described in \S\,\ref{sec:tools} we now proceed to examine

1143: how accurately the discrete \sch code can recover the properties of

1144: the galaxy models used to generate the input datasets. By {\it

1145: recovery} we mean determining how close the obtained

1146: solution is to the known DF, known mass-to-light ratio \ml, and known

1147: inclination $i$ of the galaxy model corresponding to the simulated

1148: dataset that was provided as input to the code. At the same time, we

1149: investigate the reliability of the uncertainties provided by the code

1150: on each of these properties.

1151:

1152: In the general case of modeling real observations of an actual stellar

1153: system, the true radial mass density profile is not known a priori and

1154: is typically described following some parameterization. Since mass may

1155: not necessarily follow light, or may do so in some complicated way,

1156: different plausible mass models should be attempted, as well as

1157: allowing for possible variations of the mass-to-light ratio with

1158: position. For the purposes of the present tests, however, the

1159: underlying mass distribution is assumed to be perfectly known from

1160: equation (\ref{eq:light}), except for the value of \ml. Therefore, the

1161: assumed parameterization for the mass distribution is only a

1162: 1-parameter family, and includes the ``correct'' distribution

1163: $(\Upsilon = \Upsilon_0)$. In applications to real data,

1164: higher-parameter families may be necessary, and there is no guarantee

1165: that any member of the family would provide a good approximation to

1166: the true underlying distribution.

1167:

1168: The results of our tests are examined via three different exercises,

1169: which can be performed on each of the 6 different input datasets,

1170: providing a good baseline to judge the performance of our discrete

1171: \sch code. First, we explore the recovery of the internal orbital

1172: structure of the input dataset (i.e., the input DF, or more

1173: specifically, the input mass weights $\zeta_{\rm in}$) by feeding the

1174: code with the correct inclination and mass-to-light ratio \ml\,

1175: (\S\,\ref{sec:getDF}). Second, we fix the inclination to the correct

1176: value of the input dataset and study whether the code finds the

1177: minimum of the $\Delta \chi^2$ function at the correct value of $\Upsilon$

1178: (\S\,\ref{sec:getML}). And third, we explore grids of \sch models with

1179: different ($i$,\ml) combinations to study how well these two

1180: properties are recovered when they are both assumed unknown

1181: (\S\,\ref{sec:grids}).

1182:

1183: We run all the above exercises for various subsets of each of our 6

1184: datasets in order to explore the results as a function of relevant

1185: observational variables, particularly the size of the input dataset

1186: and the type of kinematical constraints available (i.e., only LOS

1187: velocities, only proper motions, or the complete three-dimensional

1188: velocities). This adds even more elements for a thorough assessment of

1189: the code's performance. It also provides insights into the types of

1190: datasets that will be necessary to constrain $i$ or $\Upsilon$ to some

1191: given uncertainty in realistic situations.

1192:

1193: Our \sch code has the capability of computing and storing, during a

1194: single orbit integration, the orbital properties for a series of

1195: different values of \ml. During the construction of the orbit

1196: library, each value of \ml\, is converted into a dimensionless

1197: factor $v_{\rm s}$ that multiplies all our original velocities, so

1198: that \ml\, scales simply as $v_{\rm s}^2$. We stress that this allows

1199: us to explore several values of \ml\, while computing only one orbit

1200: library. In our tests, we explore models for velocity factors in the

1201: range $0.8\leq v_{\rm s} \leq 1.2$. Given that our galaxy models with

1202: different inclinations have slightly different mass-to-light ratios

1203: $\Upsilon_0$, the use of this dimensionless representation also

1204: facilitates the visualization of the results in \S\,\ref{sec:getML}

1205: and \S\,\ref{sec:grids}. The correct (input) value of \ml\, is always

1206: at $v_{\rm s} = (\Upsilon/\Upsilon_0)^{1/2} = 1$.
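
As a worked illustration of this scaling (simple arithmetic from the range quoted above), the explored velocity factors map onto mass-to-light ratios as
%
\begin{displaymath}
\frac{\Upsilon}{\Upsilon_0} = v_{\rm s}^2, \qquad 0.8 \leq v_{\rm s} \leq 1.2 \;\; \Longrightarrow \;\; 0.64 \leq \frac{\Upsilon}{\Upsilon_0} \leq 1.44 ,
\end{displaymath}
%
so that a single orbit library covers values of \ml\, from 36\% below to 44\% above the input value.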

1207:

1208: %We start by discussing the ``standard'' parameter settings with which

1209: %we have run most of our tests (\S\,\ref{sec:settings}).

1210:

1211:

1212:

1213:

1214: \subsection{Standard Settings}

1215: \label{sec:settings}

1216:

1217: At each of its different steps, the \sch code requires the user to

1218: specify several settings (or dials) that control a corresponding

1219: number of tasks and functions of the modeling procedure. Here we list

1220: the settings that we use for our standard run. We concentrate on the

1221: settings that are new to the discrete implementation. All other

1222: settings that are needed to fit a \sch model (e.g., the resolution and

1223: limits of the polar grids used to compute the gravitational potential

1224: $\Psi$; the required numerical accuracies in the fitting of the mass

1225: in the meridional plane and/or the projected plane of the sky; etc.)

1226: are identical to previous implementations of the code, so for those we

1227: refer to \citet{vdm98} and \citet{cre99}.

1228:

1229: At the heart of the \sch method lies the generation of a comprehensive

1230: library of orbits that should be representative of all types of orbits

1231: possible in the gravitational potential under study. This is achieved

1232: by adequately sampling the ranges of values that the three integrals

1233: of motion $(E,L_z,I_3)$ can acquire, each set of values uniquely

1234: determining one possible orbit. In this work we build models using two

1235: libraries that only differ in their size. Most of our runs consist of

1236: the generation of initial conditions and libraries with

1237: $20\times14\times7 = 1960$ orbits, obtained by sampling the available

1238: integral space with 20 energies $E$, 14 angular momenta $L_z$ (7

1239: positive and 7 negative), and 7 third integrals $I_3$. In order to

1240: study the dependency of the results on the size of the orbit library,

1241: we also compute \sch models using a much larger orbit library, with

1242: $40\times28\times14 = 15680$ combinations of $(E,L_z,I_3)$.

1243:

1244: The energy $E$ is sampled via the corresponding radius $R_c$ of the

1245: circular orbit of that energy (that with maximum angular momentum) in

1246: the equatorial plane $(z=0)$. This radius is logarithmically sampled

1247: from a minimum value that we choose to be much smaller than the

1248: spatial resolution of the data, to a maximum value set much beyond the

1249: point at which most of the mass of the input distribution is actually

1250: encompassed. Because they are totally unconstrained by the data, the

1251: first and last few energy bins will mostly be of no interest (i.e., no

1252: mass gets assigned to them in the process of optimization). The

1253: vertical component of the angular momentum, $L_z$, is linearly sampled

1254: using the variable $\eta=L_z/L_{\rm max}$, where $\eta \in\,\,(0,1)$

1255: and $L_{\rm max}$ is the angular momentum of the circular orbit with

1256: energy $E$. While orbits with both positive and negative $L_z$ are

1257: included in the library, the latter need not be individually

1258: integrated because they are simply obtained by reversing the velocity

1259: vector at each point along the orbit. The third integral $I_3$ is

1260: sampled via an angle $w \in\,\,(0,w_{\rm th})$, where $w_{\rm th}$

1261: determines the position at which the ``thin tube'' orbit at the given

1262: $(E,L_z)$ touches its zero-velocity curve (defined by the equation $E

1263: = \Psi_{\rm eff}$, where $\Psi_{\rm eff} = \Psi -

1264: \frac{1}{2}L_z^2/R^2$ is the effective gravitational potential; see

1265: \citealt{vdm98} for a detailed presentation). Finally, in order to

1266: help alleviate the discrete nature of the numerical orbit library, some

1267: extra radial smoothing of the orbits is performed by randomly

1268: generating a small variation to the energy and computing and storing

1269: the contribution to the probabilities from the ``new'' orbit with

1270: integrals $(E+\delta E,L_z,I_3)$. It is possible to implement similar

1271: smoothing in $L_z$ and $I_3$ as well (e.g., \citealt{kra05,cap06}),

1272: but we leave this for a future version of our code. This energy

1273: smoothing is repeated, at each timestep, for 7 random $\delta E$

1274: values. Azimuthal averaging is also performed by randomly drawing 7

1275: $\phi$ values at each timestep.
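
To make the sampling described above concrete, one possible realization of the energy and angular momentum grids is sketched below. The symbols $R_{\rm min}$, $R_{\rm max}$, $N_E$ and $N_\eta$, as well as the exact placement of the grid points, are assumptions introduced here for illustration only and are not necessarily those adopted in the code:
%
\begin{displaymath}
R_{c,k} = R_{\rm min}\left(\frac{R_{\rm max}}{R_{\rm min}}\right)^{(k-1)/(N_E-1)}, \quad k=1,\ldots,N_E ; \qquad \eta_j = \frac{j-1/2}{N_\eta}, \quad j = 1,\ldots,N_\eta .
\end{displaymath}
%
With $N_E=20$ and $N_\eta=7$ this gives the 20 energies and the 7 positive values of $L_z$ of the standard library, the 7 negative values following by reversal of the velocity vector. The midpoint placement of $\eta_j$ is chosen in this sketch so that all values lie in the open interval $(0,1)$.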

1276:

1277: Smoothing in phase-space is accomplished with the use of apertures

1278: (\S\,\ref{sec:pij}). The size of the (square-shaped) spatial

1279: apertures is defined as a fraction of $R$, the distance to the center

1280: of the stellar system in the projected plane, and we set this fraction

1281: to 10\%. In velocity space, and for most practical applications, the

1282: measurement errors $e_{ik}$ themselves will provide sufficient

1283: ``natural'' smoothing for numerical purposes. Thus, we set the factors

1284: $\xi_k$ in equation (\ref{eq.weights}) to zero for all our tests (see

1285: also discussion in \S\,\ref{sec:pij}). In practice, the optimal value

1286: of $\xi_k$ will depend on the characteristics of the data

1287: (particularly the size of the velocity errors) as well as the stellar

1288: system under study. When dealing with actual data, therefore, at least

1289: a few different values should be tried in order to explore their

1290: impact on the results. Additionally, the extra smoothing provided by

1291: $\xi_k$ can also be useful to explore the validity of the quoted

1292: errors in any given application.

1293:

1294: The uncertainties $e_{ik}$ in the LOS velocities and/or proper motions

1295: are in practice determined by the details of the observations and,

1296: since they are obtained by different techniques (spectroscopy versus

1297: astrometry), are of different size in general. Furthermore, the

1298: uncertainties in the velocities tangential to the plane of the sky are

1299: affected by the uncertainty in the distance to the stellar system

1300: under study. Here, however, since we deal with simulated data, we

1301: assume kinematical data of the good quality typical of current observations, and simply

1302: set all these errors to a moderate and arbitrary value of $e_{ik} =

1303: 7.1$\kms.

1304:

1305: The large majority of our tests were done on the simulated datasets as

1306: described in \S\,\ref{sec:data} {\it without} the addition of

1307: simulated observational errors (i.e., random Gaussian deviates with

1308: dispersion $e_{ik}$). This simplification was made early on in our

1309: project, based on the fact that the velocity errors should not matter

1310: much as long as they are much smaller than the average one-dimensional

1311: velocity dispersion of the system under study. However, we realized

1312: later that this does induce a slight bias in our estimated

1313: mass-to-light ratios.  Our typical simulated datasets have dispersions

1314: of 48.4\kms and 46.3\kms, for the 55is and 90is cases,

1315: respectively. Therefore, by not adding any random velocity errors, the

1316: one-dimensional velocity dispersion of the pseudo-data that we

1317: actually analyzed is too small by a factor $h =

1318: (1+(e_{ik}/\sigma)^2)^{1/2}$. As a consequence of the virial theorem,

1319: it follows that we should expect to infer a mass-to-light ratio that

1320: is too small by a factor of $h^2$, corresponding to about 2.2\% and

1321: 2.4\% for the 55is and 90is datasets, respectively. Instead of

1322: rerunning all our calculations, which would have been computationally

1323: expensive, we therefore simply corrected for this bias {\it post

1324: facto}. So when studying the recovery of the mass-to-light ratio in

1325: \S\,\ref{sec:getML} and \S\,\ref{sec:grids}, instead of comparing the

1326: inferred values to the value $\Upsilon_0$ of the input model, we

1327: compare to the slightly smaller $\Upsilon_0^* = \Upsilon_0/h^2$. This

1328: quantity is $\Upsilon_0^* = 2.451$ for $i=90^{\circ}$ and

1329: $\Upsilon_0^* = 2.498$ for $i=55^{\circ}$.
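
For reference, the size of this correction follows directly from the numbers quoted above. Since the virial theorem implies $\Upsilon \propto \sigma^2$ at fixed luminosity and spatial scale, a dispersion that is too small by a factor $h$ yields a mass-to-light ratio that is too small by $h^2$, with
%
\begin{displaymath}
h^2 = 1 + \left(\frac{e_{ik}}{\sigma}\right)^2 \approx 1 + \left(\frac{7.1}{48.4}\right)^2 \approx 1.022 \;\; ({\rm 55is}), \qquad h^2 \approx 1 + \left(\frac{7.1}{46.3}\right)^2 \approx 1.024 \;\; ({\rm 90is}).
\end{displaymath}
%
These values reproduce the 2.2\% and 2.4\% corrections given in the text.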

1330:

1331: The sizes of currently existing kinematic datasets of discrete nature

1332: range from a few hundred datapoints (red giants in Local Group dwarf

1333: galaxies, planetary nebulae in the outskirts of giant ellipticals) to

1334: a few thousands (stars in Galactic globular clusters, systems of

1335: globular clusters around giant ellipticals). For our standard tests we

1336: adopt datasets with 1000 kinematic observational points, although we

1337: also examine the consequences of using datasets with sizes ranging

1338: from 100 to 2000 datapoints. In these tests, the small-$N$ datasets

1339: are subsets of the largest dataset ($N = 2000$), which means that

1340: there will be some correlation between the results of experiments done

1341: as a function of the number of available observations. This approach,

1342: we note, does not differ substantially from drawing completely disjoint

1343: datasets of different $N$ from the same simulation.

1344: The progression with $N$ should still follow the expected

1345: $N^{-1/2}$ statistical-convergence behavior (see

1346: Fig.~\ref{fig:errors_ML} and \S\,\ref{sec:getML}). The generation of

1347: one of our smaller $20\times14\times7$ orbit libraries, simultaneously

1348: storing discrete probabilities for a set of 1000 observational points

1349: with both LOS velocities and proper motions, takes 2.5 hours on a 3.6

1350: GHz, Pentium 4, 64-bit CPU with 2 GB of memory. An additional 0.5 hours

1351: are needed to find the maximum likelihood fit to the data. In

1352: practice, these steps must be iterated over a grid of gravitational

1353: potential parameterizations.

1354:

1355:

1356:

1357: %.... enforce meridional plane dispersion constraints?: NO ....

1358: %

1359: %.... fractional error in meridional plane mass: 0.05 ....

1360: %

1361: %.... fractional error in vel merid disp constraints: 0.20 ....

1362: %

1363: %.... settings on projected cubes used to store info ....

1364:

1365:

1366:

1367:

1368:

1369:

1370: \subsection{Recovery of the Distribution Function}

1371: \label{sec:getDF}

1372:

1373: In order to determine whether the best-fitting solution obtained by

1374: the discrete \sch code actually resembles the properties of the input

1375: data, we start by making detailed comparisons between the input and

1376: fitted DFs. To do this, we feed the code with the correct inclination

1377: and mass-to-light ratio \ml\, used to generate the input datasets, and

1378: compare the fitted mass weights $\zeta_{\rm fit}$ to those

1379: corresponding to the input data, $\zeta_{\rm in}(E,L_z,I_3)$,

1380: approximated using equation (\ref{eq:zetain}). We use datasets with

1381: 1000 LOS velocities and proper motions, and present results for both

1382: the small and big orbit libraries detailed in \S\,\ref{sec:settings}.

1383:

1384: The comparison is best achieved via the analysis of corresponding one-

1385: and two-dimensional projections of the cubes of mass weights

1386: $\zeta_{\rm fit}(E,L_z,I_3)$ and $\zeta_{\rm in}(E,L_z,I_3)$, obtained

1387: by integrating over two and one of the integrals of motion,

1388: respectively (Figs. \ref{fig:1Dplots} to \ref{fig:2Dplot55is}).  Also,

1389: we make comparisons of two-dimensional $L_z-I_3$ slices of both cubes

1390: at selected values of the energy (Fig. \ref{fig:Ebins55is}). For all

1391: of these projections we quantify the agreement between fits and input

1392: data by computing the RMS and median absolute deviation of the

1393: quantity $(\zeta_{\rm fit}-\zeta_{\rm in})/\zeta_{\rm in}$, i.e., the

1394: difference between fit and input mass weights normalized by the input

1395: mass weights. These statistics are listed for \sch models run on all

1396: our input datasets in Table 1. Since the RMS can be biased

1397: disproportionately by a small number of large outliers, in our

1398: discussion below we preferentially use the median absolute deviation.
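
One natural way to write these two statistics explicitly is sketched below. The symbol $\delta_n$, the index $n$ running over the $N_{\rm b}$ bins of a given projection, the abbreviation MAD, and the restriction to bins with non-zero input weight are notation and assumptions introduced here only for illustration:
%
\begin{displaymath}
\delta_n \equiv \frac{\zeta_{{\rm fit},n}-\zeta_{{\rm in},n}}{\zeta_{{\rm in},n}}, \qquad {\rm RMS} = \left(\frac{1}{N_{\rm b}}\sum_{n=1}^{N_{\rm b}} \delta_n^2\right)^{1/2}, \qquad {\rm MAD} = {\rm median}_n\,|\delta_n| .
\end{displaymath}
%
A single bin with $|\delta_n| \gg 1$ can dominate the sum in the RMS, whereas it leaves the median essentially unchanged.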

1399:

1400: Figure \ref{fig:1Dplots} shows, for the 55is case, the integrated mass

1401: weights as a function of each of the three integrals of motion, for

1402: both the input dataset and the discrete \sch fit. Inside the region

1403: actually constrained by kinematic data (containing 99.83\% of the

1404: total mass), the median absolute deviations between the fitted and input

1405: distributions of mass weights are 3\%, 16\%, and 18\%, for the

1406: integrated distributions as a function of $E$, $L_{\rm z}$, and $I_3$,

1407: respectively. As listed in Table 1, similar numbers are obtained for

1408: the other 5 simulated datasets, with the agreement between both

1409: distributions as a function of energy always better than 5\%. As a

1410: function of $L_z$, the largest disagreement actually corresponds to

1411: the one shown in Figure \ref{fig:1Dplots}, the 55is case. It goes down

1412: to 7\% for our case of closest agreement, the case labeled 55ns. The

1413: net rotation inherent to the 55is dataset (reflected in the middle

1414: panel by all the mass weights with positive $L_z$ being larger than

1415: those with negative $L_z$) is clearly reproduced by the \sch fit. As a

1416: function of the third integral, the median absolute deviation varies

1417: from 16\% for the 55ns case to up to 25\% for the 90ns case. Note

1418: that, since we are showing orbital mass weights instead of the actual

1419: DF, the $I_3$ distributions are not expected to be constant over

1420: $I_3$, even though the input DF underlying all simulated datasets is

1421: of the form $f(E,L_z)$.

1422:

1423: Next, integrating only over $I_3$, we show in Figures

1424: \ref{fig:2Dplot55ns} and \ref{fig:2Dplot55is} the agreement between

1425: the fitted and input sets of mass weights as a function of $E$ and

1426: $L_z$, for the 55ns and 55is cases, respectively. The upper panels of

1427: these figures show the results of the \sch fit ($\zeta_{\rm fit}$) and

1428: the lower panels the original input distributions ($\zeta_{\rm

1429: in}$). The left-hand panels show the results for a $(E,L_z,I_3)$

1430: library of $40\times28\times14$ orbits, 8 times larger (i.e., finer)

1431: than that of the right-hand panels, which correspond to our standard

1432: case of $20\times14\times7$ orbits.  Only the energy range constrained

1433: by the respective sets of data is shown. Black corresponds to zero

1434: weight, and the white (brightest) color in each pair of panels (fit

1435: and model, or upper and lower) has been assigned to the maximum

1436: orbital weight among the two panels, so that the comparison between

1437: fits and models is made using the same color scale.

1438: %(WILL HAVE TO CHANGE THIS AND SET WHITE TO

1439: %THE MAXIMUM AMONG ALL PANELS ... STILL TO DO).

1440:

1441: Both Figures \ref{fig:2Dplot55ns} and \ref{fig:2Dplot55is} show that

1442: the main features of the input $E-L_z$ distributions of mass weights

1443: are well reproduced by the 3-integral \sch fits. In particular, the

1444: mean streaming properties of both datasets are satisfactorily

1445: recovered.  In Figure \ref{fig:2Dplot55ns}, the two prominent

1446: phase-space blobs occupying symmetrical locations on the negative and

1447: positive sides of the $L_z$-axis correspond well with the non-rotating

1448: overall nature of the 55ns dataset. Moreover, this is recovered by

1449: both the models with standard and large orbit libraries (right- versus

1450: left-hand panels). Similarly, in Figure \ref{fig:2Dplot55is}, the

1451: single phase-space blob at positive $L_z$ with a pronounced elongation

1452: towards negative $L_z$ (in light blue and blue), indicative of the

1453: rotating nature of the 55is case, is reproduced by the \sch fit as

1454: well. The median absolute deviations between the fitted and input

1455: $E-L_z$ distributions, always restricted to the energy range

1456: constrained by the data, are 14\% and 19\% for the 55ns and 55is

1457: cases, respectively (Table 1).

1458:

1459: In Figure \ref{fig:Ebins55is} we show the 3-dimensional distributions

1460: of mass weights of our 55is case in the form of a series of $L_z-I_3$

1461: planes at different energies. Here again, the upper panels show the

1462: results of the discrete \sch fit ($\zeta_{\rm fit}$), the lower panels

1463: the distribution of mass weights corresponding to the input data

1464: ($\zeta_{\rm in}$), and the color scale is set up in the same way as

1465: in the $E-L_z$ figures. As the energy $E$ is sampled via the radius

1466: $R_c$ of the circular orbit (its value in arcmin indicated at the top

1467: of each pair of panels), this series of planes shows the variation of

1468: the $L_z-I_3$ distribution with increasing distance from the center of

1469: the galaxy. The fraction of the total mass at each energy slice is

1470: given as a percentage at the bottom of each panel.

1471:

1472: The bottom panels of Figure \ref{fig:Ebins55is} indicate that, in the

1473: inner regions (inside 0.2 arcmin), most of the mass in the 55is

1474: dataset is concentrated in orbits with $L_z$ near zero. The

1475: corresponding upper panels show that the \sch fit recovers this $L_z

1476: \approx 0$ component, but it distributes more weight than the input

1477: model into orbits with positive $L_z$. These inner regions,

1478: nevertheless, have a relatively low mass content in comparison with

1479: regions at larger radii. As the radius increases, the $L_z \approx 0$

1480: region of phase-space gets progressively depleted of stars in favor of

1481: orbits with high $L_z$. This transition is reasonably well reproduced

1482: by the \sch solution, and the agreement between fit and input data

1483: becomes better at large radii, at which point most of the mass at each

1484: energy is concentrated in orbits of high $L_z$.

1485:

1486: Note also that a common characteristic of Figures

1487: \ref{fig:2Dplot55ns}, \ref{fig:2Dplot55is}, and \ref{fig:Ebins55is} is

1488: that \sch fits typically present mass weight distributions that appear

1489: broader (more extended) and less peaked than the corresponding

1490: distributions displayed by the pseudo-data. The effect is most obvious

1491: among the right-most panels of Figure \ref{fig:Ebins55is}, where one

1492: can see that the $L_z-I_3$ mass-weight distributions of the input data

1493: (lower panels) have higher peaks and overall sharper features than the

1494: corresponding fitted distributions (upper panels). This is an expected

1495: effect and is due to the combined smoothing of the fitted distribution

1496: introduced both by the (necessary) use of velocity apertures for the

1497: computation of likelihoods (see \S\,\ref{sec:pij}), and by the

1498: regularization constraints imposed in order to enforce smoothness in

1499: phase space. While the first smoothing is particular to our discrete

1500: implementation, the second is a well-known procedure common to most

1501: \sch codes. Models without regularization tend to be unrealistically

1502: noisy \citep{vdm98} and unreliable for parameter estimation

1503: \citep{cre04}. Thus, although we choose to plot the input distribution

1504: of mass weights as they actually are, the fairest comparison

1505: would be one in which the \sch fit is compared with a smoothed version

1506: of the original mass weight distribution describing the input data. We

1507: explored this by convolving the input distribution of mass weights

1508: with a (circular) Gaussian kernel, and then computing the same

1509: statistics shown in Table 1 (but this time using the smoothed version

1510: of the input distribution) for different widths of the Gaussian

1511: kernel. We have verified that indeed it is possible to find a kernel

1512: width for which the agreement between fit and input data is best,

1513: improving both the RMS and median absolute deviations of Table 1 by

1514: factors between 1.2 and 1.5. Finally, we also note that the comparison

1515: in Figure \ref{fig:Ebins55is} might be affected by the accuracy of the

1516: approximation in equation (\ref{eq:zetain}), which means that the

1517: values in Table 1 are actually upper limits to the true errors of

1518: the \sch fits.

1519:

1520:

1521:

1522: From these tests we conclude that our discrete \sch code can

1523: successfully recover the original DF inside the region constrained by

1524: the kinematic data, at least for the case in which the inclination and

1525: mass-to-light ratio are assumed known.

1526:

1527:

1528:

1529:

1530: \subsection{Recovering the mass-to-light ratio}

1531: \label{sec:getML}

1532:

1533: For a large range of potential applications of a \sch code, such as

1534: investigating dark matter halos in galaxies, the most important

1535: property that one is interested in measuring with confidence is the

1536: mass-to-light ratio. In the present tests, this quantity is a scalar,

1537: \ml, although in more general applications it could be a function of

1538: radius. In this section we study in detail the capacity of our code to

1539: infer the correct \ml\, when the inclination of the system is assumed

1540: known. Tests were performed for a number of input datasets in order to

1541: investigate the dependence of the results on key observational

1542: variables such as the number of kinematic measurements and the type of

1543: kinematic constraints available (i.e., only-LOS velocities, only

1544: proper motions, as well as both LOS velocities and proper

1545: motions). All models in this section were computed using our small

1546: orbit library, with $20\times14\times7$ combinations of $(E,L_z,I_3)$.

1547: The results of these experiments are summarized in Figures

1548: \ref{fig:MLparabN}\,-\,\ref{fig:errors_ML}.

1549:

1550: For the 90is and 55is cases and using full 3-dimensional velocity

1551: information, Figure \ref{fig:MLparabN} shows the $\Delta \chi^2$

1552: parabolae obtained when applying the discrete \sch code with a number

1553: of \ml\, values, distributed around the correct one ($\Upsilon_0^*$),

1554: for datasets of varying sizes. The quantity

1555: $(\Upsilon/\Upsilon_0^*)^{1/2}$ along the abscissa denotes the

1556: velocity scaling; $(\Upsilon/\Upsilon_0^*)^{1/2} = 1$ corresponds to

1557: a \sch model with the input value $\Upsilon_0^*$, defined in

1558: \S\,\ref{sec:settings}. The zero point of the vertical axis (both in

1559: Figures \ref{fig:MLparabN} and \ref{fig:MLparab2}) is arbitrary, but

1560: the difference $\Delta\chi^2$ between points on the same curve has its

1561: usual statistical meaning, and indeed we compute the (random)

1562: uncertainties on the determination of \ml\, directly from them.

1563:

1564: Figure \ref{fig:MLparabN} shows that the difference $\Delta\chi^2$

1565: between points on the same curve becomes larger (the parabolae become

1566: narrower) as the number of available kinematic measurements increases.

1567: The determination of the best-fit \ml\, also depends on the type of

1568: available kinematic measurements. This is illustrated in Figure

1569: \ref{fig:MLparab2}, where we plot the $\Delta \chi^2$ parabolae

1570: obtained when considering only LOS velocities, only proper-motions, or

1571: the full 3-dimensional velocity information. All cases are for the

1572: 55is dataset with 1000 kinematic measurements. In this case, the

1573: $\Delta\chi^2$ parabolae become narrower as the number of available

1574: velocity components increases.

1575:

1576: Furthermore, the statistical errors are generally smaller for larger

1577: datasets, as well as when more velocity components are available. This

1578: is shown in Figure \ref{fig:errors_ML}, where we plot the behavior of

1579: the best-fit \ml\, and its uncertainties as a function of $\log(N)$,

1580: where $N$ is the number of datapoints. The uncertainties

1581: $\Delta\Upsilon$ displayed in the upper panel of Figure

1582: \ref{fig:errors_ML} represent the $1\sigma$ intervals around the

1583: minimum of the parabolae in Figures \ref{fig:MLparabN} and

1584: \ref{fig:MLparab2}, and are defined as half the distance between the

1585: points on the curve where $\Delta \chi^2 = 1$ with respect to the

1586: minimum. The statistical errors scale roughly as $N^{-1/2}$ over an

1587: interval of 1.3 dex in $\log N$. Also, the errors in the best-fit

1588: \ml\, associated with datasets with only proper motions (triangles) are

1589: smaller than those associated with only-LOS datasets (open circles) for

1590: any value of $N$.  In other words, our discrete \sch code satisfies

1591: the fundamental statistical expectation that it should become easier

1592: for the method to distinguish between models with different \ml\, when

1593: the amount of observational information is larger. In the case of

1594: datasets with the full 3-d velocity information, the 55is

1595: uncertainties do not quite seem to follow the $N^{-1/2}$ behavior

1596: expected from statistics. We attribute this to our tests having

1597: reached a fundamental floor due to the discrete nature of the models,

1598: a limit that cannot be overcome by increasing the number $N$ of

1599: available measurements. This can cause an apparent flattening with

1600: respect to the regular $N^{-1/2}$ behavior at large $N$.
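
The error definition used above can be made explicit with a short sketch, assuming that the $\Delta\chi^2$ curve is well approximated by a parabola near its minimum (the symbols $\Upsilon_{\rm best}$, $\Upsilon_\pm$ and $\sigma_\Upsilon$ are introduced here for illustration):
%
\begin{displaymath}
\Delta\chi^2(\Upsilon) \approx \left(\frac{\Upsilon-\Upsilon_{\rm best}}{\sigma_{\Upsilon}}\right)^2 \;\; \Longrightarrow \;\; \Delta\chi^2 = 1 \;\; {\rm at} \;\; \Upsilon_\pm = \Upsilon_{\rm best}\pm\sigma_{\Upsilon},
\end{displaymath}
%
\begin{displaymath}
\Delta\Upsilon = \frac{\Upsilon_+ - \Upsilon_-}{2} = \sigma_{\Upsilon}.
\end{displaymath}
%
For a pure $N^{-1/2}$ scaling, an interval of 1.3 dex in $\log N$ (e.g., from $N=100$ to $N=2000$) corresponds to a decrease in $\Delta\Upsilon$ by a factor $10^{1.3/2} \approx 4.5$.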

1601:

1602: To test the robustness of the errors estimated as above, we performed

1603: the following exercise. Selecting 10 different (disjoint) realizations

1604: of the N-body data (for the 55is case with 1000 measurements of only

1605: line-of-sight velocities, the case most often found in practice), we

1606: repeated the exercise of Figures \ref{fig:MLparabN} and

1607: \ref{fig:MLparab2} and computed discrete \sch models for a set of

1608: different \ml\, values distributed around the correct input one. This

1609: was done using our small orbit library. We obtained an average

1610: best-fit \ml\, of 2.46 (less than $1\sigma$ away from the input value,

1611: $\Upsilon_0^*=2.498$), with an RMS of 0.074 (corresponding to about

1612: 3\%). When computing the statistical uncertainties using the $\Delta

1613: \chi^2$ parabolae as described above, the average $1\sigma$ error in

1614: the best-fit \ml\, of the set of experiments turns out to be 0.204,

1615: equivalent to a fractional error of 8\%. This is a factor of 2.5

1616: larger than the scatter in the results from multiple independent

1617: realizations of the pseudo-data. This gap is smaller when additional

1618: information about the individual kinematics of the tracers is

1619: available. Indeed, repeating the above exercise for the same datasets

1620: but now using two-dimensional proper-motions instead of only

1621: line-of-sight velocities, the average error in \ml\, computed from our

1622: $\Delta \chi^2$ parabolae is 0.112, a factor of 1.7 larger than the

1623: scatter of the best-fit values, which was 0.067. Therefore, we

1624: conclude that our error estimation using $\Delta\chi^2$ is

1625: conservative.

1626:

1627: Despite the smaller statistical errors for the case with

1628: proper-motions alone, the bottom panel of Figure \ref{fig:errors_ML}

1629: indicates that the best-fit \ml\, is closer to the real value,

1630: $\Upsilon_0^*$, for the case with only-LOS velocities. While the

1631: best-fit \ml\, from datasets with only LOS velocities are well within

1632: $1\sigma$ of $\Upsilon_0^*$ for any $N$, this is not the case for the

1633: datasets with only proper-motions, with best-fit \ml\, values that are

1634: $2-4\sigma$ away from $\Upsilon_0^*$. Still, the formal best-fit \ml\,

1635: for the case of full 3-d velocities (thick squares) is on average

1636: within $2\sigma$ of the real value, $\Upsilon_0^*$, corresponding to

1637: better than $\sim 6$\% accuracy. One contribution to the small

1638: systematic bias in \ml\, may come from the fundamental nature of

1639: inverse problems in general (of which \sch modeling is an example),

1640: namely, that there may not necessarily be a unique solution: it may be

1641: possible to change the mass profile and the DF without appreciably

1642: changing the model predictions. If such is the case and there are

1643: multiple solutions, we do not necessarily expect a flat $\Delta

1644: \chi^2$ profile (i.e., with a number of equally acceptable solutions

1645: containing the correct one), most likely because of numerical noise

1646: and discretization effects. While we cannot rule this out, our results

1647: do show that this probably does not affect the recovered mass-to-light

1648: ratio at more than the $\sim 10$\% level (based on Figure

1649: \ref{fig:errors_ML}, built with models using our smaller orbit

1650: library). Unless superb data are available, random uncertainties are

1651: likely larger than such systematic errors. Currently, the only

1652: exception to this are some Galactic globular clusters, for which

1653: thousands of proper motions are being measured. However, such systems

1654: are often closer to spherical than galaxies, and hence one expects any

1655: theoretical degeneracies to be smaller. Alternatively, numerical noise

1656: in the orbit library may be the cause of this systematic bias in \ml\,

1657: seen in the bottom panel of Figure \ref{fig:errors_ML}.  Numerical

1658: noise may be reduced in part by the use of larger orbit

1659: libraries. Indeed, we show in \S\,\ref{sec:grids} below that a

1660: substantially larger orbit library tends to produce more accurate

1661: results overall.

1662:

1663: The likelihood ratio statistic $\Delta \chi^2$ in

1664: Figures~\ref{fig:MLparabN} and \ref{fig:MLparab2} allows us to find

1665: the best-fit model parameters and their confidence intervals.

1666: However, it does not shed light on the question whether the best-fit

1667: model is actually statistically consistent with the data. The

1668: likelihood $\ln L$ of the best-fit model also cannot be used for this

1669: purpose. There is no theorem of mathematics that states what the value

1670: of $\ln L$ should be for a statistically acceptable model, given that

1671: the underlying velocity distributions from which the particles are

1672: drawn are not known a priori (and are not generally Gaussian).

1673: Nonetheless, many other statistics can be defined to address this

1674: issue once the best-fit model has been found. For example, the

1675: velocity moments of the best-fit model can be calculated (as a

1676: function of position on the sky), and statistics can be defined that

1677: assess whether these moments are consistent with the observed data.

1678: Alternatively, one can draw random realizations of the data from the

1679: best-fit model and use a Kolmogorov-Smirnov test to assess whether the

1680: data and the realization are consistent with being drawn from the same

1681: underlying distribution. We have explored a subset of these approaches

1682: and these suggested that the best-fit models are indeed statistically

1683: consistent with the pseudo-data they were designed to fit.

1684:
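For reference, one common form of the test mentioned above is the
two-sample Kolmogorov-Smirnov statistic. The notation below is ours and
is meant only as a sketch, applied, e.g., to one velocity component at
a time:
%
\[
D_{\rm KS} = \sup_{v}\,\left| F_{\rm data}(v) - F_{\rm model}(v) \right| ,
\]
%
where $F_{\rm data}$ and $F_{\rm model}$ are the empirical cumulative
distributions of the observed velocities and of a random realization
drawn from the best-fit model, respectively. A value of $D_{\rm KS}$
that is large compared to its expected distribution under the null
hypothesis would indicate that the model is not consistent with the
data.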

1685: \subsection{Recovering the inclination and M/L}

1686: \label{sec:grids}

1687:

1688: In general neither the mass-to-light ratio nor the inclination of a

1689: stellar system under study is known in advance, and thus one has to

1690: explore models with several combinations of both parameters in a

1691: search for those values that provide the best fit to the data. In this

1692: section we present and discuss the results of running the discrete

1693: \sch code on grids of $(i,\Upsilon)$ values to study whether the

1694: correct input combination is recovered. As in \S\,\ref{sec:getML}, we

1695: perform tests on datasets with different types of kinematic

1696: constraints (LOS velocities and/or proper motions).

1697:

1698: The results of these tests are presented in Figures \ref{fig:55is_LOS_MU}

1699: and \ref{fig:grid_55_90}. They show $\Delta\chi^2$ contours that

1700: result when computing discrete \sch models on a grid of $(i,\Upsilon)$

1701: values, including the correct input combination, for a variety of

1702: input datasets of the 55is and 90is cases. The goodness-of-fit

1703: parameter $\Delta\chi^2$ shown in these plots is obtained by first

1704: rebinning with a much finer grid the $(i,\Upsilon)$ space explored by

1705: the models actually calculated (indicated by small dots), and then

1706: computing the values on this new grid via interpolation between the

1707: nearest calculated models. We then determine the minimum on the finer

1708: grid (whose location is indicated by the star) and subtract it from

1709: all grid points to obtain the $\Delta\chi^2$ parameter, for which

1710: contours are shown. As in the case of Figures \ref{fig:MLparabN} and

1711: \ref{fig:MLparab2}, the mass-to-light ratio is parameterized by the

1712: dimensionless velocity scaling $v_{\rm

1713: s}=(\Upsilon/\Upsilon_0^*)^{1/2}$, so that the input value corresponds

1714: to $v_{\rm s}=1$.

1715:
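To make the parameterization concrete, inverting the definition gives
$\Upsilon = v_{\rm s}^2\,\Upsilon_0^*$, so that to first order
$\delta\Upsilon/\Upsilon \approx 2\,\delta v_{\rm s}/v_{\rm s}$. As a
purely illustrative example (the value $v_{\rm s}=1.05$ is hypothetical
and not necessarily one of the grid points used here), with
$\Upsilon_0^*=2.498$ one has
%
\[
\Upsilon = (1.05)^2 \times 2.498 \approx 2.754,
\]
%
i.e., a 5\% change in the velocity scaling corresponds to a change of
about 10\% in the mass-to-light ratio.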

1716: We start by showing in Figure \ref{fig:55is_LOS_MU} the results of

1717: running grids of models for input datasets composed of only-LOS

1718: velocities and only proper motions, in both cases for the 55is case

1719: with 1000 observational datapoints, and using our small orbit library

1720: with $20\times14\times7$ combinations of $(E,L_z,I_3)$. Overall, and

1721: in agreement with the results of Figure \ref{fig:MLparab2} discussed

1722: in \S\,\ref{sec:getML}, the $\Delta\chi^2$ contours indicate that

1723: proper motions (bottom panel) better constrain the best-fit

1724: $(i,\Upsilon)$ combination than a dataset with only-LOS velocities

1725: (upper panel). The $3\sigma$ uncertainties (thick contours) obtained

1726: from the only-LOS dataset are about twice as large as those from the

1727: proper motions alone (31\% and 16\%, respectively). The input

1728: mass-to-light ratio $\Upsilon_0^*$ is adequately recovered by both

1729: datasets (to within the $1\sigma$ confidence region). The best-fit

1730: inclination, however, is offset from the actual input value

1731: $i=55\grad$ for both datasets, although somewhat closer to the correct

1732: value in the case of proper motions only. The $3\sigma$ uncertainties

1733: in the best-fit inclination are $\pm 6\grad$ and $\pm 11\grad$ for the

1734: only proper motions and only LOS cases, respectively.

1735:

1736: Difficulties in constraining the inclination using \sch modeling of

1737: stellar kinematics have been encountered in the past. A good recent

1738: example is that of \citet{kra05} who, based on integrated stellar LOS

1739: velocity profiles and ionized gas observations of the E4 galaxy NGC

1740: 2974, carried out a study analogous to the present one by constructing

1741: simulated observations of this galaxy, which they feed to their

1742: ``continuous'' (as opposed to discrete) \sch code in order to study

1743: the recovery of the input mass-to-light ratio and inclination. They

1744: find that even with artificially perfect input kinematics the

1745: inclination is very poorly constrained. The same conclusion is reached

1746: when attempting to fit the actually observed LOS velocity profiles

1747: with \sch models, so stellar LOS velocity profiles provide weak

1748: constraints on the inclination of this system, a statement they are

1749: confident about because the actual inclination of NGC 2974 is known

1750: from observations of its extended disc of neutral and ionized gas in

1751: rapid rotation.

1752:

1753: While one could expect that the availability of proper motion

1754: measurements in addition to LOS velocities would enhance the ability

1755: of the models to obtain useful constraints on the inclination of a

1756: stellar system in general, the reality is that the current

1757: state-of-the-art of \sch modeling does not have a definitive answer on

1758: this issue yet. As recent studies of the kinematics of stars in

1759: globular clusters seem to indicate, the chances of success are highly

1760: dependent on the quality and quantity of available data on the system

1761: under study (compare, for example, the results of \citealt{glenn06}

1762: and \citealt{bos06} regarding the best-fit inclinations of $\omega$

1763: Cen and M15, respectively).

1764:

1765: There are at least two factors that may contribute to the difficulty

1766: in recovering the inclination from stellar kinematics: degeneracies

1767: inherent to \sch models, and numerical noise. First, there is no

1768: guarantee that inclinations other than the correct one must fit the

1769: data worse. Indeed, in their modeling of high signal-to-noise

1770: integral-field data of NGC 2974, \citet{kra05} already observe that

1771: the differences between \sch models with different inclinations are

1772: smaller than the differences between the best-fitting model and the

1773: data, which they interpret as indication of a fundamental degeneracy

1774: in the recovery of the inclination with three-integral

1775: models. Numerical noise, on the other hand, is a consequence of \sch

1776: models being in the end only discrete representations of a smooth,

1777: continuous distribution of possible orbits, and it could be argued

1778: that this discreteness might have a more negative effect for high

1779: inclinations. For example, even a simple and smooth circular orbit

1780: presents cusps or discontinuities when viewed close to edge-on. The

1781: turning points of such an orbit may get smoothed out differently for

1782: different inclinations.

1783:

1784: The issue of degeneracy, nevertheless, can be avoided in those cases

1785: where the inclination is known to be uniquely determined by the

1786: data. This is the case, e.g., in the situations where the following

1787: conditions are met: (1) the kinematical dataset consists of proper

1788: motion measurements and LOS velocities, (2) the stellar system is

1789: reasonably close to axisymmetric, and (3) there exists an independent

1790: measurement of the distance $D$ to the system. As first used in

1791: practice by \citet{glenn06}, the inclination then follows directly

1792: from the following relationship between the mean LOS velocity (in

1793: units of \kms) and the mean proper motion along the short axis (in

1794: units of mas\,yr$^{-1}$),

1795: %

1796: \begin{equation}

1797: \label{eq:inclination}

1798: \langle\,v_{z'}\,\rangle = 4.74\,\,D\,\tan i\,\,\langle\,\mu_{y'}\,\rangle,

1799: \end{equation}

1800: %

1801: where $D$ is the distance in kpc, and the brackets denote an

1802: integration along the line-of-sight. This relation is true at each

1803: projected position $(x',y')$ in any axisymmetric system, and has been

1804: successfully applied to the Galactic globular clusters $\omega$ Cen

1805: and M15 \citep{glenn06,bos06}.

1806:
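The numerical factor in equation (\ref{eq:inclination}) is the usual
unit conversion: a proper motion of 1 mas\,yr$^{-1}$ at a distance of
1 kpc corresponds to a transverse velocity of 1 AU\,yr$^{-1} \approx
4.74\,{\rm km\,s^{-1}}$. Solving for the inclination gives
%
\[
i = \arctan\left( \frac{\langle\,v_{z'}\,\rangle}
{4.74\,D\,\langle\,\mu_{y'}\,\rangle} \right).
\]
%
As a sketch with hypothetical numbers (chosen only for illustration,
and ignoring the sign conventions of the axes), a system at $D=5$ kpc
with $\langle\,\mu_{y'}\,\rangle = 0.20$ mas\,yr$^{-1}$ and
$\langle\,v_{z'}\,\rangle = 6.77\,{\rm km\,s^{-1}}$ at some projected
position would yield
%
\[
\tan i = \frac{6.77}{4.74 \times 5 \times 0.20} \approx 1.43,
\qquad i \approx 55\grad .
\]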

1807: Here, in order to explore the applicability of this simple

1808: relationship, we take advantage of our a priori knowledge of the

1809: correct inclination for our simulated datasets, and study the

1810: circumstances under which the use of equation (\ref{eq:inclination})

1811: provides an accurate result. Unlike the case of integrated light

1812: measurements (where $\langle\,v_{z'}\,\rangle$ is simply the average

1813: of the LOSVD at any given projected position on the sky), in the

1814: context of discrete datasets neither $\langle\,v_{z'}\,\rangle$ nor

1815: $\langle\,\mu_{y'}\,\rangle$ are quantities that can be rigorously

1816: obtained from the data at any given $(x',y')$. Both quantities may,

1817: nevertheless, be approximated by averaging a number of kinematic

1818: measurements that fall within one or more apertures of a given size

1819: around projected positions $(x',y')$. Following this, we applied

1820: equation (\ref{eq:inclination}) to a series of subsets of our 6

1821: simulated datasets with varying number of kinematic measurements, and

1822: verified that indeed the correct inclination is reproduced provided:

1823: (a) the system is rotating (otherwise, while the relation is still

1824: valid, both averages are nearly zero and hence the inclination is not

1825: really constrained); (b) most of the datapoints are not located close

1826: to the minor axis (where rotation velocities are too small); and (c)

1827: the averages are computed from a sufficiently large number of

1828: kinematical measurements (so that the error in $\tan i$ is not too

1829: large). These are conditions that are certainly fulfilled by datasets

1830: on some Galactic globular clusters, currently the only class of

1831: stellar system for which there is 3-dimensional kinematic information

1832: available. Therefore, in those cases, equation (\ref{eq:inclination})

1833: can be safely applied. The \sch modeling can then concentrate on

1834: recovering the more interesting properties such as the orbital

1835: structure and mass-to-light ratios, which we have shown are

1836: successfully recovered when the inclination is assumed known.

1837:
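Condition (c) can be made more quantitative with a first-order error
propagation. The following is only a sketch, which assumes that the
errors $\sigma_v$, $\sigma_\mu$, and $\sigma_D$ in
$\langle\,v_{z'}\,\rangle$, $\langle\,\mu_{y'}\,\rangle$, and $D$ are
small and independent (this notation is ours):
%
\[
\frac{\sigma_{\tan i}}{\tan i} \simeq
\left[ \left(\frac{\sigma_v}{\langle\,v_{z'}\,\rangle}\right)^2 +
\left(\frac{\sigma_\mu}{\langle\,\mu_{y'}\,\rangle}\right)^2 +
\left(\frac{\sigma_D}{D}\right)^2 \right]^{1/2},
\qquad
\sigma_i \simeq \cos^2 i\,\,\sigma_{\tan i} =
\frac{\sin 2i}{2}\,\frac{\sigma_{\tan i}}{\tan i} .
\]
%
For $i=55\grad$ and a hypothetical fractional error of 10\% in
$\tan i$, this gives $\sigma_i \approx 0.5 \times \sin 110\grad \times
0.10 \approx 0.047$ rad, or about $2.7\grad$. The same expression also
illustrates condition (a): as the mean velocities approach zero the
fractional errors diverge, and the inclination is no longer
constrained.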

1838: To better understand the problem of numerical noise, we explored the

1839: dependence of the results on the size of the orbit library used to

1840: construct the \sch models. We did this for cases with 1000 datapoints

1841: with complete three-dimensional velocities, so that because of

1842: equation (\ref{eq:inclination}) we know that there is no theoretical

1843: degeneracy in inclination. Figure \ref{fig:grid_55_90} shows the

1844: $\Delta\chi^2$ contours resulting from fits of \sch models using our

1845: standard library of $20\times14\times7$ orbits (upper panels; same

1846: library size as in Figure~\ref{fig:55is_LOS_MU}) in comparison with

1847: fits that use a library 8 times larger, i.e., one with

1848: $40\times28\times14$ orbits (lower panels). We show results for the

1849: 55is (left-hand panels) and 90is (right-hand panels) cases.

1850:
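For reference, the number of $(E,L_z,I_3)$ combinations in the two
libraries follows from simple arithmetic,
%
\[
20\times14\times7 = 1960,
\qquad
40\times28\times14 = 15680 = 2^3 \times 1960,
\]
%
so that doubling the sampling along each of the three integrals
increases the library size by a factor of 8.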

1851: In all four panels of Figure \ref{fig:grid_55_90}, the best-fit

1852: mass-to-light ratio is always within $1\sigma$ of the input value

1853: $\Upsilon_0^*$, with the exception of the 55is case with the bigger

1854: library (lower left), where they agree at the $2\sigma$ level. The

1855: size of the confidence regions on the mass-to-light ratio does not

1856: change significantly when the orbit library is increased in size.

1857: Therefore, we conclude that libraries of $20\times14\times7$ orbits

1858: are large enough to properly constrain the mass-to-light ratio

1859: (provided that one uses regularization as we do here; see

1860: \citealt{cre04}). This provides further justification for our use of

1861: this library size in Section~\ref{sec:getML}.

1862:

1863: The top left panel in Figure \ref{fig:grid_55_90} is directly

1864: comparable to the two panels of Figure \ref{fig:55is_LOS_MU}, but now

1865: with three components of velocities observed, instead of just one or

1866: two, respectively. Consistent with the results in

1867: Figures~\ref{fig:errors_ML} and~\ref{fig:55is_LOS_MU}, we see that the

1868: addition of an extra component of velocity decreases the size of the

1869: confidence regions. More interestingly, a secondary minimum in $\Delta

1870: \chi^2$ appears close to the $(i,\Upsilon)$ values for the correct

1871: input model. This suggests that indeed all three components of

1872: velocity may be necessary to uniquely constrain the inclination of an

1873: axisymmetric stellar system. The bottom left panel shows the effect of

1874: increasing the orbit library size. There is now only a single minimum,

1875: centered at an inclination that agrees with the input value at the

1876: $\sim 2\sigma$ level.

1877:

1878: The right panels in Figure \ref{fig:grid_55_90} show the situation for

1879: the 90is case. With the small library (top right), the best-fit

1880: inclination is at $i\approx 70\grad$, substantially far from the input

1881: value. When the orbit library size is increased (lower right), the

1882: best-fit shifts to $i=80\grad$. This is only $10\grad$ from the

1883: correct input value, which may well be acceptable for many realistic

1884: applications. On the other hand, the best fit and the input value are

1885: inconsistent at the many sigma level, which is certainly reason for

1886: some concern. A possible cause for this is that the turning points of

1887: orbits in edge-on systems have very sharp edges in

1888: projection. Therefore, larger grid sizes than we have used may be

1889: necessary to correctly represent them in all the necessary detail.

1890: However, we have not explored this further for two reasons. First,

1891: information on all three velocity components may be necessary to be

1892: able to uniquely constrain the inclination. If that is available, then

1893: use of equation~(\ref{eq:inclination}) will be more accurate and

1894: efficient than use of Schwarzschild modeling. Second, in practice one

1895: is generally much more interested in the mass distribution than in the

1896: inclination. Figure \ref{fig:grid_55_90} shows that the mass-to-light

1897: ratio is correctly recovered, even when the inclination is

1898: systematically biased.

1899:

1900: In conclusion, our tests demonstrate that the recovery of the most

1901: important properties of the system (its orbital structure and the

1902: mass-to-light ratio) by our discrete \sch models is robust. Correct

1903: recovery of the inclination appears to be the most complicated aspect

1904: of the modeling. Sufficient observational data must be available and a

1905: large enough orbit library must be used. Our code can then adequately

1906: recover the inclination of sufficiently inclined systems. However, for

1907: edge-on systems there remains a systematic inclination bias of $\sim

1908: 10\grad$ that we have been unable to resolve. This is the primary

1909: shortcoming of our new approach that was unearthed by the pseudo-data

1910: tests that we have presented. This may be a generic property of

1911: Schwarzschild codes, since other authors have also reported

1912: difficulties in recovering inclinations. Either way, this is not

1913: believed to be a significant limitation for most potential practical

1914: applications of our code.

1915:

1916: \section{Summary and conclusions}

1917: \label{sec:end}

1918:

1919: Discrete kinematic datasets, composed of velocities of individual

1920: tracers (e.g., red giants, planetary nebulae, globular clusters,

1921: galaxies, etc.), are routinely being assembled for a variety of

1922: stellar systems of all scales (\S\,\ref{sec.intro}). These are not

1923: limited to LOS-velocity surveys: high-quality proper-motion databases

1924: already exist for Galactic globular clusters, and future facilities

1925: hold the promise of providing the same for stars in the nearest

1926: galaxies. However, the most sophisticated tools typically being used

1927: in the modeling of these observations were actually developed for the

1928: analysis of kinematic data in the form of LOSVDs, a rather different

1929: type of velocity information than the case of the velocities of

1930: kinematic tracers on a one-by-one basis. As a consequence, the

1931: information content of any particular dataset of a discrete nature is

1932: likely not being fully exploited. We thus have developed a specific

1933: tool for the modeling of discrete datasets, which we have presented in

1934: this paper along with detailed tests of its performance based on the

1935: modeling of simulated data.

1936:

1937: The new tool consists of a \sch orbit-superposition code that, adapted

1938: from the implementation of \citet{vdm98}, can handle any number of

1939: (one-, two-, or three-dimensional) velocities of individual kinematic

1940: tracers without relying on any binning of the data. Under the only

1941: assumptions that the system is in steady-state equilibrium (i.e., the

1942: gravitational potential is not changing in time) and may be well

1943: approximated as axisymmetric, the code finds the distribution function

1944: (a function of the three integrals of motion $E$, $L_z$, and $I_3$)

1945: that best reproduces the observations (the velocities of the tracers

1946: as well as the overall light distribution) in a given potential. The

1947: fact that the distribution function is free to have any dependence on

1948: the three integrals of motion allows for a very general description of

1949: the orbital structure, thus avoiding common restrictive assumptions

1950: about the degree of (an)isotropy of the orbits.

1951:

1952: Unlike previous implementations of the \sch technique, we cast the

1953: problem of finding the best superposition of orbits using a

1954: probabilistic approach, i.e., by building a likelihood function

1955: representing the probability that the entire set of measurements would

1956: have been observed assuming a particular form for the gravitational

1957: potential (\S\,\ref{sec:logL}). In this case, and in contrast with the

1958: old continuous versions, the dependence of the likelihood function on

1959: the orbital weights is non-linear, and the optimization problem cannot

1960: be reduced to a linear matrix equation. Instead, it becomes a

1961: problem of the maximization of a likelihood with respect to the set of

1962: weights associated with all possible combinations of the integrals

1963: $(E,L_z,I_3)$ that comprise the orbit library (\S\,\ref{sec:logL}),

1964: and which accounts for the observed positions and (any-dimensional)

1965: velocities of all particles in the dataset, including their

1966: uncertainties (\S\,\ref{sec:pij}). After extensive testing, a

1967: conjugate gradient algorithm was found to converge satisfactorily to

1968: the correct solution and was adopted for the remaining tests of the

1969: code's overall performance (\S\,\ref{sec.mkfitin}).

1970:
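Schematically, and in a notation that is only assumed here for
illustration (the actual definitions are those given in
\S\,\ref{sec:logL} and \S\,\ref{sec:pij}), if $w_j$ denotes the weight
of orbit $j$ and $p_{ij}$ the probability that orbit $j$ produces
measurement $i$, a likelihood of this kind takes the form
%
\[
\ln L = \sum_i \ln \left( \sum_j w_j\,p_{ij} \right),
\qquad
\frac{\partial \ln L}{\partial w_k} =
\sum_i \frac{p_{ik}}{\sum_j w_j\,p_{ij}} .
\]
%
The logarithm of a sum over the weights is what makes the dependence
non-linear, while the availability of a simple analytic gradient is
what makes gradient-based schemes, such as the conjugate gradient
algorithm, a natural choice.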

1971: In order to assess the reliability of our discrete \sch code, we

1972: applied it to several sets of simulated data, i.e., artificially

1973: generated kinematic observations obtained from a model of an

1974: axisymmetric galaxy of which the orbital structure, mass distribution,

1975: and inclination are known in advance. Pseudo-datasets were generated

1976: from a two-integral phase-space distribution function with varying

1977: degrees of overall rotation, types of velocity information (only-LOS,

1978: only proper motions, and both), total number of particles, and for two

1979: different inclinations on the plane of the sky (\S\,\ref{sec:data}).

1980:

1981: Using the various simulated datasets, we studied the recovery of the

1982: input orbital structure or DF, mass-to-light ratio, and

1983: inclination. For the purposes of these tests, we assumed complete

1984: knowledge of the radial profile of the underlying mass distribution

1985: and a mass-to-light ratio that remains constant as a function of

1986: radius. These restrictions are easily (and must be) lifted when

1987: modeling data on real systems, in which case one needs to explore a

1988: range of plausible underlying potentials and allow for variations of

1989: the mass-to-light ratio to properly account for the possibility of

1990: central black holes and dark halos.

1991:

1992: Inside the region constrained by data, we find that the distribution

1993: function (represented by the corresponding distributions of orbital

1994: mass weights) and streaming characteristics of the input datasets are

1995: satisfactorily recovered by the \sch fits when the correct inclination

1996: and mass-to-light ratio are known (Figs. \ref{fig:1Dplots} to

1997: \ref{fig:Ebins55is}). As measured by the mean absolute deviations

1998: between the integrated weight distributions, the agreement between the

1999: fitted and the input orbital weight distributions as a function of

2000: $E$, $L_z$, and $I_3$ is typically of the order of 3\%, 10\%, and

2001: 20\%, respectively (the numbers for our worst case being 5\%, 16\%,

2002: and 25\%). When eliminating the dependence on $I_3$, the agreement

2003: between the fitted and input $E-L_z$ distributions is of the order of

2004: 15\%, with the net rotational behavior of the input datasets cleanly

2005: recovered (Figs.  \ref{fig:2Dplot55ns} and

2006: \ref{fig:2Dplot55is}). Thus, we conclude that the discrete \sch code

2007: can successfully recover the orbital structure of the system under

2008: study.

2009:

2010: Assuming that the inclination of the system on the plane of the sky is

2011: known, we quantified the recovery of the input mass-to-light ratio as

2012: a function of the size of the input dataset (Fig. \ref{fig:MLparabN})

2013: and of the type of kinematic information available

2014: (Fig. \ref{fig:MLparab2}). We studied both the best-fit value as well

2015: as the uncertainty in its determination

2016: (Fig. \ref{fig:errors_ML}). The statistical expectation of better

2017: results when the amount of observational information is larger (either

2018: regarding the number of datapoints or the number of velocity

2019: components) is clearly reproduced by our discrete \sch models. For the

2020: smallest datasets used in our testing ($N=100$), and regardless of

2021: whether using only-LOS velocities, only proper motions, or both, the

2022: best-fit mass-to-light ratio is within 5-10\% of the input value, with

2023: formal $1\sigma$ uncertainties of the order of 15\%. When increasing

2024: either the number of available measurements or the number of measured

2025: velocity components, the mass-to-light ratio is always recovered to

2026: better than $\sim 10\%$ accuracy, with the corresponding random

2027: ($1\sigma$) uncertainties in the range of 5-10\%. The discrete \sch

2028: code, therefore, recovers the mass-to-light ratio of the input

2029: datasets to satisfactory levels of accuracy.

2030:

2031: The recovery of both the mass-to-light ratio and inclination when

2032: neither of these quantities is known in advance (as is usually the

2033: case with real observations) was studied using a grid of discrete \sch

2034: models, exploring also the dependence on the type of velocity

2035: components available (Fig. \ref{fig:55is_LOS_MU}). We find that the

2036: mass-to-light ratio was again successfully recovered, but the best-fit

2037: inclination was not identified correctly using small orbit

2038: libraries. We found that this was remedied by better sampling the

2039: available $(E,L_z,I_3)$ integral space using a larger orbit library

2040: (Fig. \ref{fig:grid_55_90}). For our input datasets with $i=55\grad$,

2041: the best-fit inclination obtained by our models with a large orbit

2042: library is $57\grad$, while for input datasets with $i=90\grad$ we

2043: obtain a best-fit model with $i=80\grad$. Given the known difficulty

2044: of \sch models in general for determining the inclination of stellar

2045: systems, and considering the low relative importance of this parameter

2046: compared to other properties such as the orbital structure and the

2047: mass-to-light ratio, we regard this small disagreement for the high

2048: inclination datasets as acceptable.

2049:

2050: In summary, we have shown that our new \sch code, designed to

2051: handle modern datasets composed of discrete measurements of

2052: kinematic tracers without any loss of information due to

2053: data binning or restrictive assumptions on the distribution function,

2054: is able to constrain satisfactorily the orbital structure,

2055: mass-to-light ratio, and inclination of the system under

2056: study. Applications to data for Galactic globular clusters and nearby

2057: dE galaxies will be presented in future papers. These are only two

2058: examples of a large range of dynamical problems in astronomy to which

2059: a discrete \sch code like ours can be applied, so we expect this new

2060: tool will contribute to the better understanding of stellar systems in

2061: general.

2062:

2063:

2064:

2065:

2066: %\begin{acknowledgements}

2067: %Thanks to  ....

2068: %\end{acknowledgements}

2069: %

2070: %\section{Acknowledgements}

2071: %\label{sec:gracias}

2072:

2073: \acknowledgements

2074:

2075: We are happy to thank Marla Geha and Raja Guhathakurta for their

2076: continued interest in the present work and its extension to the study

2077: of actual galaxies using their unique data on dwarf ellipticals. We

2078: also thank Glenn van de Ven for very useful discussions, his interest

2079: in the progress of this project and, last but not least, for his

2080: invaluable help with IDL routines. This paper also benefited from

2081: comments by Davor Krajnovic, Aaron Romanowsky, and David

2082: Merritt. Thanks also to George Meylan for his help with the writing of

2083: the HST Theory proposal specified below, and to the anonymous referee,

2084: whose comments and suggestions improved the presentation of the

2085: paper. This work was carried out as part of HST Theory Project \#9952

2086: and was supported by NASA through a grant from STScI, which is

2087: operated by AURA, Inc., under NASA contract NAS 5-26555.

2088:

2089:

2090:

2091:

2092:


2104:

2105: \begin{thebibliography}{}

2106:

2107: \bibitem[Batsleer \& Dejonghe(1993)]{bat93} Batsleer, P., \& Dejonghe, H.\ 1993, \aap, 271, 104

2108: \bibitem[Batsleer \& Dejonghe(1995)]{bat95} Batsleer, P., \& Dejonghe, H.\ 1995, \aap, 294, 693

2109: \bibitem[Bender et al.(2005)]{ben05} Bender, R., et al.\ 2005, \apj, 631, 280

2110: \bibitem[Binney \& Mamon(1982)]{bin82} Binney, J., \& Mamon, G.~A.\ 1982, \mnras, 200, 361

2111: \bibitem[Binney \& Tremaine(1987)]{bt87} Binney, J., \& Tremaine, S.\ 1987, Galactic Dynamics (Princeton, NJ: Princeton University Press)

2112: \bibitem[Cappellari et al.(2002)]{cap02} Cappellari, M., Verolme, E.~K., van der Marel, R.~P., Kleijn, G.~A.~V., Illingworth, G.~D., Franx, M., Carollo, C.~M., \& de Zeeuw, P.~T.\ 2002, \apj, 578, 787

2113: \bibitem[Cappellari et al.(2006)]{cap06} Cappellari, M., et al.\ 2006, \mnras, 366, 1126

2114: \bibitem[C{\^o}t{\'e} et al.(2001)]{cote01} C{\^o}t{\'e}, P., et al.\ 2001, \apj, 559, 828

2115: \bibitem[C{\^o}t{\'e} et al.(2003)]{cote03} C{\^o}t{\'e}, P., McLaughlin, D.~E., Cohen, J.~G., \& Blakeslee, J.~P.\ 2003, \apj, 591, 850

2116: \bibitem[Cretton et al.(1999)]{cre99} Cretton, N., de Zeeuw, P.~T., van der Marel, R.~P., \& Rix, H.-W.\ 1999, \apjs, 124, 383

2117: \bibitem[Cretton et al.(2000)]{cre00} Cretton, N., Rix, H.-W., \& de Zeeuw, P.~T.\ 2000, \apj, 536, 319

2118: \bibitem[Cretton \& Emsellem(2004)]{cre04} Cretton, N., \& Emsellem, E.\ 2004, \mnras, 347, L31

2119: \bibitem[Davies et al.(2006)]{dav06} Davies, R.~I., et al.\ 2006, \apj, 646, 754

2120: \bibitem[Dehnen \& Gerhard(1994)]{deh94} Dehnen, W., \& Gerhard, O.~E.\ 1994, \mnras, 268, 1019

2121: \bibitem[Dekel et al.(2005)]{dek05} Dekel, A., Stoehr, F., Mamon, G.~A., Cox, T.~J., Novak, G.~S., \& Primack, J.~R.\ 2005, \nat, 437, 707

2122: \bibitem[de Bruijne et al.(1996)]{deb96} de Bruijne, J.~H.~J., van der Marel, R.~P., \& de Zeeuw, P.~T.\ 1996, \mnras, 282, 909

2123: \bibitem[de Lorenzi et al.(2007)]{nmagic} de Lorenzi, F., Debattista, V.~P., Gerhard, O., \& Sambhus, N.\ 2007, \mnras, 376, 71

2124: \bibitem[Douglas et al.(2002)]{dou02} Douglas, N.~G., et al.\ 2002, \pasp, 114, 1234

2125: \bibitem[Douglas et al.(2007)]{dou07} Douglas, N.~G., et al.\ 2007, arXiv:astro-ph/0703047

2126: \bibitem[Gebhardt et al.(1997)]{geb97} Gebhardt, K., Pryor, C., Williams, T.~B., Hesser, J.~E., \& Stetson, P.~B.\ 1997, \aj, 113, 1026

2127: \bibitem[Gebhardt et al.(2000)]{geb00} Gebhardt, K., et al.\ 2000, \aj, 119, 1157

2128: \bibitem[Gebhardt et al.(2003)]{geb03} Gebhardt, K., et al.\ 2003, \apj, 583, 92

2129: \bibitem[Geha et al.(2006)]{geh06} Geha, M., Guhathakurta, P., Rich, R.~M., \& Cooper, M.~C.\ 2006, \aj, 131, 332

2130: \bibitem[Gerhard(1993)]{ger93} Gerhard, O.~E.\ 1993, \mnras, 265, 213

2131: \bibitem[Gerhard et al.(1998)]{ger98} Gerhard, O., Jeske, G., Saglia, R.~P., \& Bender, R.\ 1998, \mnras, 295, 197

2132: \bibitem[Gerssen et al.(2002)]{ger02} Gerssen, J., van der Marel, R.~P., Gebhardt, K., Guhathakurta, P., Peterson, R.~C., \& Pryor, C.\ 2002, \aj, 124, 3270

2133: \bibitem[Gilbert et al.(2006)]{gil06} Gilbert, K.~M., et al.\ 2006, \apj, 652, 1188

2134: \bibitem[Gilbert et al.(2007)]{gil07} Gilbert, K.~M., et al.\ 2007, \apj, 668, 245

2135: \bibitem[Kleyna et al.(2001)]{jan01} Kleyna, J.~T., Wilkinson, M.~I., Evans, N.~W., \& Gilmore, G.\ 2001, \apjl, 563, L115

2136: \bibitem[Kleyna et al.(2002)]{jan02} Kleyna, J., Wilkinson, M.~I., Evans, N.~W., Gilmore, G., \& Frayn, C.\ 2002, \mnras, 330, 792

2137: \bibitem[Krajnovi{\'c} et al.(2005)]{kra05} Krajnovi{\'c}, D., Cappellari, M., Emsellem, E., McDermid, R.~M., \& de Zeeuw, P.~T.\ 2005, \mnras, 357, 1113

2138: \bibitem[Kunkel et al.(1997)]{bill97} Kunkel, W.~E., Demers, S., Irwin, M.~J., \& Albert, L.\ 1997, \apjl, 488, L129

2139: \bibitem[Kunkel et al.(2000)]{bill00} Kunkel, W.~E., Demers, S., \& Irwin, M.~J.\ 2000, \aj, 119, 2789

2140: \bibitem[{\L}okas(2002)]{lok02} {\L}okas, E.~L.\ 2002, \mnras, 333, 697

2141: \bibitem[{\L}okas \& Mamon(2003)]{lok03} {\L}okas, E.~L., \& Mamon, G.~A.\ 2003, \mnras, 343, 401

2142: \bibitem[{\L}okas et al.(2005)]{lok05} {\L}okas, E.~L., Mamon, G.~A., \& Prada, F.\ 2005, \mnras, 363, 918

2143: \bibitem[Magorrian(2006)]{mag06} Magorrian, J.\ 2006, \mnras, 373, 425

2144: \bibitem[Magorrian et al.(1998)]{mag98} Magorrian, J., et al.\ 1998, \aj, 115, 2285

2145: \bibitem[Mayor et al.(1997)]{may97} Mayor, M., et al.\ 1997, \aj, 114, 1087

2146: \bibitem[McLaughlin et al.(2006)]{mcl06} McLaughlin, D.~E., Anderson, J., Meylan, G., Gebhardt, K., Pryor, C., Minniti, D., \& Phinney, S.\ 2006, \apjs, 166, 249

2147: \bibitem[McNamara et al.(2003)]{mcn03} McNamara, B.~J., Harrison, T.~E., \& Anderson, J.\ 2003, \apj, 595, 187

2148: \bibitem[Merritt(1993)]{mer93} Merritt, D.\ 1993, \apj, 413, 79

2149: \bibitem[Merritt \& Saha(1993)]{mer93a} Merritt, D., \& Saha, P.\ 1993, \apj, 409, 75

2150: \bibitem[Merritt et al.(1997)]{mer97} Merritt, D., Meylan, G., \& Mayor, M.\ 1997, \aj, 114, 1074

2151: \bibitem[Merritt(1999)]{mer99} Merritt, D.\ 1999, \pasp, 111, 129

2152: \bibitem[Press et al.(1992)]{recipes} Press, W.~H., Teukolsky, S.~A., Vetterling, W.~T., \& Flannery, B.~P.\ 1992, Cambridge University Press, 1992, 2nd ed.

2153: \bibitem[Reijns et al.(2006)]{rei06} Reijns, R.~A., Seitzer, P., Arnold, R., Freeman, K.~C., Ingerson, T., van den Bosch, R.~C.~E., van de Ven, G., \& de Zeeuw, P.~T.\ 2006, \aap, 445, 503

2154: \bibitem[Richstone \& Tremaine(1984)]{ric84} Richstone, D.~O., \& Tremaine, S.\ 1984, \apj, 286, 27

2155: \bibitem[Richtler et al.(2004)]{tom04} Richtler, T., et al.\ 2004, \aj, 127, 2094

2156: \bibitem[Rix et al.(1997)]{rix97} Rix, H.-W., de Zeeuw, P.~T., Cretton, N., van der Marel, R.~P., \& Carollo, C.~M.\ 1997, \apj, 488, 702

2157: \bibitem[Romanowsky \& Kochanek(2001)]{rom01} Romanowsky, A.~J., \& Kochanek, C.~S.\ 2001, \apj, 553, 722

2158: \bibitem[Romanowsky et al.(2003)]{rom03} Romanowsky, A.~J., Douglas, N.~G., Arnaboldi, M., Kuijken, K., Merrifield, M.~R., Napolitano, N.~R., Capaccioli, M., \& Freeman, K.~C.\ 2003, Science, 301, 1696

2159: \bibitem[Schwarzschild(1979)]{sch79} Schwarzschild, M.\ 1979, \apj, 232, 236

2160: \bibitem[Shanno \& Phua(1980)]{sha80} Shanno, D.~F., \& Phua, K.~H.\ 1980, ACM Transactions on Mathematical Software, 6, 618

2161: \bibitem[Simon \& Geha(2007)]{sim07} Simon, J.~D., \& Geha, M.\ 2007, \apj, 670, 313

2162: \bibitem[Suntzeff \& Kraft(1996)]{sun96} Suntzeff, N.~B., \& Kraft, R.~P.\ 1996, \aj, 111, 1913

2163: \bibitem[Syer \& Tremaine(1996)]{m2m} Syer, D., \& Tremaine, S.\ 1996, \mnras, 282, 223

2164: \bibitem[Teodorescu et al.(2005)]{teo05} Teodorescu, A.~M., M{\'e}ndez, R.~H., Saglia, R.~P., Riffeser, A., Kudritzki, R.-P., Gerhard, O.~E., \& Kleyna, J.\ 2005, \apj, 635, 290

2165: \bibitem[Thomas et al.(2004)]{tho04} Thomas, J., Saglia, R.~P., Bender, R., Thomas, D., Gebhardt, K., Magorrian, J., \& Richstone, D.\ 2004, \mnras, 353, 391

2166: \bibitem[Valluri et al.(2004)]{val04} Valluri, M., Merritt, D., \& Emsellem, E.\ 2004, \apj, 602, 66

2167: \bibitem[van de Ven et al.(2006)]{glenn06} van de Ven, G., van den Bosch, R.~C.~E., Verolme, E.~K., \& de Zeeuw, P.~T.\ 2006, \aap, 445, 513

2168: \bibitem[van den Bosch et al.(2006)]{bos06} van den Bosch, R., de Zeeuw, T., Gebhardt, K., Noyola, E., \& van de Ven, G.\ 2006, \apj, 641, 852

2169: \bibitem[van der Marel \& Franx(1993)]{vdm93} van der Marel, R.~P., \& Franx, M.\ 1993, \apj, 407, 525

2170: \bibitem[van der Marel et al.(1994)]{vdm94} van der Marel, R.~P., Evans, N.~W., Rix, H.-W., White, S.~D.~M., \& de Zeeuw, P.~T.\ 1994, \mnras, 271, 99

2171: \bibitem[van der Marel et al.(1997a)]{vdm97a} van der Marel, R.~P., de Zeeuw, P.~T., \& Rix, H.-W.\ 1997, \apj, 488, 119

2172: \bibitem[van der Marel et al.(1997b)]{vdm97b} van der Marel, R.~P., Sigurdsson, S., \& Hernquist, L.\ 1997, \apj, 487, 153

2173: \bibitem[van der Marel et al.(1998)]{vdm98} van der Marel, R.~P., Cretton, N., de Zeeuw, P.~T., \& Rix, H.-W.\ 1998, \apj, 493, 613

2174: \bibitem[van der Marel et al.(2000)]{vdm00} van der Marel, R.~P., Magorrian, J., Carlberg, R.~G., Yee, H.~K.~C., \& Ellingson, E.\ 2000, \aj, 119, 2038

2175: \bibitem[van der Marel et al.(2002)]{vdm02} van der Marel, R.~P., Alves, D.~R., Hardy, E., \& Suntzeff, N.~B.\ 2002, \aj, 124, 2639

2176: \bibitem[van der Marel \& van Dokkum(2006)]{vdm06} van der Marel, R.~P., \& van Dokkum, P.~G.\ 2006, astro-ph/0611571

2177: \bibitem[Vandervoort(1984)]{voort84} Vandervoort, P.~O.\ 1984, \apj, 287, 475

2178: \bibitem[van Leeuwen et al.(2000)]{vleu00} van Leeuwen, F., Le Poole, R.~S., Reijns, R.~A., Freeman, K.~C., \& de Zeeuw, P.~T.\ 2000, \aap, 360, 472

2179: \bibitem[Verolme \& de Zeeuw(2002)]{ver02} Verolme, E.~K., \& de Zeeuw, P.~T.\ 2002, \mnras, 331, 959

2180: \bibitem[Walker et al.(2006)]{wal06} Walker, M.~G., Mateo, M., Olszewski, E.~W., Bernstein, R., Wang, X., \& Woodroofe, M.\ 2006, \aj, 131, 2114

2181: \bibitem[Wilkinson et al.(2002)]{wil02} Wilkinson, M.~I., Kleyna, J.~T., Evans, N.~W., \& Gilmore, G.\ 2002, \mnras, 330, 778

2182: \bibitem[Wilkinson et al.(2004)]{wil04} Wilkinson, M.~I., Kleyna, J.~T., Evans, N.~W., Gilmore, G.~F., Irwin, M.~J., \& Grebel, E.~K.\ 2004, \apjl, 611, L21

2183: \bibitem[Wojtak \& {\L}okas(2007)]{woj07a} Wojtak, R., \& {\L}okas, E.~L.\ 2007, \mnras, 377, 843

2184: \bibitem[Wojtak et al.(2007)]{woj07b} Wojtak, R., {\L}okas, E.~L., Mamon, G.~A., Gottl{\"o}ber, S., Prada, F., \& Moles, M.\ 2007, \aap, 466, 437

2185: \bibitem[Wu(2007)]{wu07} Wu, X.\ 2007, astro-ph/0702233

2186: \bibitem[Wu \& Tremaine(2006)]{wu06} Wu, X., \& Tremaine, S.\ 2006, \apj, 643, 210

2187:

2188: \end{thebibliography}

2189:

2190:

2191: \clearpage

2192:

2193: %\begin{landscape}

2194: %\rotate

2195: \begin{deluxetable}{ccrlcrlcrlcrlcrl}

2196: \tablewidth{0pc}

2197: %\tabletypesize{\tiny}

2198: \tabletypesize{\scriptsize}

2199: \tablecaption{Comparison between input and fitted orbital mass weights}

2200: \tablehead{

2201: \multicolumn{1}{c}{dataset} &

2202: \multicolumn{1}{c}{} &

2203: \multicolumn{2}{c}{no projection} &

2204: \multicolumn{1}{c}{} &

2205: \multicolumn{2}{c}{$I_3$} &

2206: \multicolumn{1}{c}{} &

2207: \multicolumn{2}{c}{$L_z,I_3$} &

2208: \multicolumn{1}{c}{} &

2209: \multicolumn{2}{c}{$E,I_3$} &

2210: \multicolumn{1}{c}{} &

2211: \multicolumn{2}{c}{$E,L_z$} \\

2212: \multicolumn{1}{c}{} &

2213: \multicolumn{1}{c}{} &

2214: \multicolumn{1}{c}{RMS} &

2215: \multicolumn{1}{c}{$\mid{\rm med}\mid$} &

2216: \multicolumn{1}{c}{} &

2217: \multicolumn{1}{c}{RMS} &

2218: \multicolumn{1}{c}{$\mid{\rm med}\mid$} &

2219: \multicolumn{1}{c}{} &

2220: \multicolumn{1}{c}{RMS} &

2221: \multicolumn{1}{c}{$\mid{\rm med}\mid$} &

2222: \multicolumn{1}{c}{} &

2223: \multicolumn{1}{c}{RMS} &

2224: \multicolumn{1}{c}{$\mid{\rm med}\mid$} &

2225: \multicolumn{1}{c}{} &

2226: \multicolumn{1}{c}{RMS} &

2227: \multicolumn{1}{c}{$\mid{\rm med}\mid$}

2228: }

2229: \startdata

2230: 55ns & & 4.5138 & 0.4531 & & 0.2207 & 0.1404 & & 0.0908 & 0.0359 & & 0.1080 & 0.0719 & & 0.2634 & 0.1591 \\

2231: 55is & & 6.6525 & 0.5630 & & 0.3677 & 0.1915 & & 0.0939 & 0.0309 & & 0.1912 & 0.1598 & & 0.2460 & 0.1855 \\

2232: 55ms & & 3.1524 & 0.4208 & & 0.2123 & 0.1207 & & 0.0884 & 0.0334 & & 0.1845 & 0.0903 & & 0.2505 & 0.1122 \\

2233: 90ns & & 3.7560 & 0.5939 & & 0.2683 & 0.1633 & & 0.1419 & 0.0425 & & 0.2202 & 0.1365 & & 0.2361 & 0.2491 \\

2234: 90is & & 2.7956 & 0.6410 & & 0.6253 & 0.1629 & & 0.1366 & 0.0347 & & 0.4826 & 0.1178 & & 0.2566 & 0.2364 \\

2235: 90ms & & 1.4893 & 0.4927 & & 0.2298 & 0.1714 & & 0.1216 & 0.0384 & & 0.1456 & 0.1149 & & 0.2953 & 0.2333 \\

2236: %\\

2237: %55ns & & 8.7257 & 0.4920 & & 0.3179 & 0.1459 & & 0.0304 & 0.0173 & & 0.1512 & 0.1022 & & 0.5774 & 0.1500 \\

2238: %55is & & 9.3958 & 0.6237 & & 0.5400 & 0.2400 & & 0.0316 & 0.0171 & & 0.2450 & 0.2281 & & 0.7361 & 0.1604 \\

2239: %55ms & & 4.5211 & 0.4539 & & 0.2914 & 0.1731 & & 0.0305 & 0.0181 & & 0.2278 & 0.1553 & & 0.6633 & 0.1909 \\

2240: %90ns & & 4.1376 & 0.5866 & & 0.4036 & 0.1687 & & 0.0481 & 0.0375 & & 0.2173 & 0.1300 & & 0.7313 & 0.2064 \\

2241: %90is & & 5.8993 & 0.7236 & & 2.5684 & 0.2071 & & 0.0452 & 0.0324 & & 0.7889 & 0.1608 & & 0.7603 & 0.1658 \\

2242: %90ms & & 1.7923 & 0.5176 & & 0.3792 & 0.1960 & & 0.0290 & 0.0251 & & 0.1682 & 0.0896 & & 0.8509 & 0.1511 \\

2243: \enddata

2244:

2245: \tablecomments{The tabulated numbers are the root mean square and

2246: median absolute deviation of the quantity $(\zeta_{\rm fit}-\zeta_{\rm

2247: in})/\zeta_{\rm in}$, i.e., the difference between fit and input mass

2248: weights normalized by the input mass weights, for \sch models based on

2249: our small ($20\times14\times7$) orbit library. The statistics are

2250: always computed inside the energy range constrained by the data (see

2251: Fig.~\ref{fig:1Dplots}), and are shown for the full cubes of mass

2252: weights (columns labeled ``no projection'') and for various

2253: projections of these cubes. The projected distributions are obtained

2254: by integrating over one or two of the integrals of motion (i.e., by

2255: collapsing the 3-D cubes in one or two dimensions), and appear under

2256: the columns labeled by the integral(s) of motion over which the

2257: integration has been done. }

2258:

2259: \end{deluxetable}

2260: %\end{landscape}

2261:

2262:

2263: \clearpage

2264:


2288:

2289: %%\begin{figure}

2290: %%\vspace*{-1cm}

2291: %%%{\hspace{-15cm}

2292: %%%\plotfiddle{positions_55is_1E5.ps}{15cm}{0}{50}{50}{-25}{0}}

2293: %%\vspace*{-1cm}

2294: %%\caption{\label{fig:data55is} Phase-space projections of the 55is

2295: %%dataset. }\end{figure}

2296:

2297: \clearpage

2298:

2299: \begin{figure}

2300: %\epsscale{.80}

2301: \plotone{f1.eps}

2302: \caption{\label{fig:mkfitin} Maximization of the total likelihood $\ln

2303: L$ as a function of the number of function evaluations $N$, for a

2304: typical \sch fit using a dataset of 1000 discrete kinematic

2305: measurements with full 3-dimensional velocities (55is case;

2306: \S\,\ref{sec:data} and Figure \ref{fig:data55is}). Shown on the

2307: vertical axis is the change in the quantity $\lambda = -2\ln L$,

2308: denoted as $\delta\lambda$. This change becomes smaller as the

2309: optimization converges to a solution following approximately the

2310: exponential relation illustrated by the dotted line. See discussion in

2311: \S\,\ref{sec.mkfitin}. }

2312: \end{figure}

2313:

2314: \clearpage

2315:

2316: \begin{figure}

2317: %\epsscale{.80}

2318: %\plotone{3_maps.from_hoegaarden.10e4.ps}

2319: \plotone{f2.eps}

2320: \caption{\label{fig:data55is} Three phase-space projections using

2321: 10,000 particles of the 55is simulated dataset ($i=55\grad$ with

2322: intermediate streaming). All input datasets have been constructed by

2323: randomly drawing discrete particles from a two-integral distribution

2324: function of the form $f(E,L_z)$, and are built so that, regardless of

2325: their true inclination, they have the same light distribution when

2326: projected on the plane of the sky. The coordinate system $(x',y',z')$

2327: represents the observer's system, with $(x',y')$ on the plane of the

2328: sky, and $z'$ the direction along the line-of-sight, defined positive

2329: away from the observer. The coordinates $r$, $v_{\theta}$, and

2330: $v_{\phi}$ correspond to the usual spherical coordinates intrinsic to

2331: the system. Spatial coordinates are in units of 8.7 arcsec, and

2332: velocities in units of 250 \kms. Note the asymmetry with respect to

2333: $v_{\phi}=0$ in the bottom-right panel, reflecting the net rotation of

2334: the 55is dataset. }

2335: \end{figure}

2336:

2337: \clearpage

2338:

2339: \begin{figure}

2340: \vspace*{-1cm}

2341: %\plotone{ELzI3_55is.ps}

2342: \plotone{f3.eps}

2343: %{\hspace{-15cm}

2344: %\plotfiddle{ELzI3_55is.ps}{1cm}{0}{400}{500}{-25}{0}}

2345: %\vspace*{-3cm}

2346: \caption{\label{fig:1Dplots} Integrated mass weights as a function of

2347: the three integrals of motion, $E$, $L_z$, and $I_3$, for the 55is

2348: dataset ($i=55^{\circ}$ with intermediate streaming) with 1000

2349: kinematic constraints and full 3-dimensional velocities (both LOS

2350: velocities and proper motions). The \sch fit (solid lines) was

2351: obtained using a library with $40\times28\times14$ orbits (our

2352: ``large'' library), and satisfactorily reproduces the mass

2353: distributions associated with the input dataset (dashed lines). The

2354: vertical dotted lines in the upper panel indicate the energy range

2355: constrained by the kinematic data.  The middle panel, with the mass

2356: distribution at positive $L_z$ always higher than that at negative

2357: $L_z$, reflects the net rotation of the 55is dataset. Note that, since

2358: we are showing orbital mass weights instead of the actual distribution

2359: function, the $I_3$ distributions in the bottom panel are not constant

2360: over $I_3$, even though the input distribution function is of the form

2361: $f(E,L_z)$.}

2362: \end{figure}

2363:

2364: \clearpage

2365:

2366: \begin{figure}

2367: \vspace*{-1cm}

2368: %\plotone{FIGURE_DF.55ns.model_vs_fit.En_Lz_2D.40x14x14_vs_20x7x7.ps}

2369: \plotone{f4.eps}

2370: %{\hspace{-15cm}

2371: %\plotfiddle{FIGURE_DF.55ns.model_vs_fit.En_Lz_2D.1st_version.40x14x14.ps}{1cm}{0}{400}{500}{-25}{0}}

2372: %\vspace*{-3cm}

2373: \caption{\label{fig:2Dplot55ns} Comparison of the input and fitted

2374: distributions of mass weights as a function of energy and $L_z$ for

2375: the 55ns dataset ($i=55^{\circ}$ with no streaming) with 1000 LOS

2376: velocities and proper motions. Only the energy range containing most

2377: of the total mass is shown. Upper panels show the weight distribution

2378: obtained by the \sch fit when the inclination and mass-to-light ratio

2379: are assumed to be known a priori, and the bottom panels show the

2380: weight distribution associated with the simulated input

2381: data. Right-hand panels show the results of the \sch code for an orbit

2382: library with $20\times14\times7$ orbits, while the left-hand panels

2383: show the results for a library 8 times bigger, with

2384: $40\times28\times14$ combinations of $(E,L_z,I_3)$. Black corresponds

2385: to zero weight, and the white (brightest) color in each pair of panels

2386: (fit and model, or upper and lower) has been assigned to the maximum

2387: orbital weight among the two panels, so that the comparison between

2388: fits and models is made using the same color scale. The images in this

2389: and subsequent figures are based on two-dimensional spline curves

2390: fitted to the gridded information. While the two bottom panels

2391: represent the same input data, their visualizations differ due to a

2392: different coarseness in the gridding of integral space.}

2393: \end{figure}

2394:

2395: \clearpage

2396:

2397: \begin{figure}

2398: \vspace*{-1cm}

2399: %\plotone{FIGURE_DF.55is.model_vs_fit.En_Lz_2D.40x14x14_vs_20x7x7.ps}

2400: \plotone{f5.eps}

2401: %{\hspace{-15cm}

2402: %\plotfiddle{FIGURE_DF.55ns.model_vs_fit.En_Lz_2D.1st_version.40x14x14.ps}{1cm}{0}{400}{500}{-25}{0}}

2403: %\vspace*{-3cm}

2404: \caption{\label{fig:2Dplot55is} Same as in Figure \ref{fig:2Dplot55ns} but for

2405: the 55is dataset ($i=55^{\circ}$ with intermediate streaming).

2406: }

2407: \end{figure}

2408:

2409: \clearpage

2410:

2411: \begin{figure}

2412: \vspace*{-1cm}

2413: %\plotone{FIGURE_DF.55is.model_vs_fit.10_E_bins.40x14x14.ps}

2414: \plotone{f6.eps}

2415: %{\hspace{-15cm}

2416: %\plotfiddle{FIGURE_DF.55is.model_vs_fit.10_E_bins.40x14x14.ps}{1cm}{0}{400}{500}{-25}{0}}

2417: %\vspace*{-3cm}

2418: \caption{\label{fig:Ebins55is} Input ($\zeta_{\rm in}$; bottom) and

2419: fitted ($\zeta_{\rm fit}$; top) distributions of orbital mass weights

2420: as a function of $L_z$ and $I_3$ at fixed (non-consecutive) values of

2421: energy, for the 55is case with 1000 kinematic measurements with LOS

2422: velocities and proper motions (i.e., the same case as depicted in

2423: Figure \ref{fig:2Dplot55is}). These results correspond to our orbit

2424: library with $40\times28\times14$ combinations of $(E,L_z,I_3)$. From

2425: left to right, the panels show the weight distribution at increasing

2426: distances from the center of the galaxy, as indicated at the top of

2427: each pair of panels by the value $R_c$ (in arcmin) of the circular

2428: orbit at the corresponding energy. The fraction (in \%) of the total

2429: mass contained in each energy slice is indicated at the bottom of each

2430: panel. As in Figures \ref{fig:2Dplot55ns} and \ref{fig:2Dplot55is},

2431: black corresponds to zero weight and the white (brightest) color in

2432: each pair of panels (fit and model, or upper and lower) has been

2433: assigned to the maximum orbital weight among the two panels, so that

2434: the comparison between fits and models is made using the same color

2435: scale. }

2436: \end{figure}

2437:

2438: \clearpage

2439:

2440: \begin{figure}

2441: \vspace*{-1cm}

2442: %\plotone{ML_parabolas.N.eps}

2443: \plotone{f7.eps}

2444: %{\hspace{-15cm}

2445: %\plotfiddle{ML_parabola.55is_90is.40x14x14.eps}{1cm}{0}{400}{500}{-25}{0}}

2446: %\vspace*{-3cm}

2447: \caption{\label{fig:MLparabN} $\Delta \chi^2$ parabolae that

2448: illustrate the recovery of the input mass-to-light ratio

2449: $\Upsilon_0^*$ as a function of the number of available kinematic

2450: measurements. All input datasets include both LOS velocities and

2451: proper motions, and all \sch models have been computed using our small

2452: orbit library, the one with $20\times14\times7$ combinations of the

2453: $(E,L_z,I_3)$ integrals of motion. For any given input dataset, the

2454: symbols show the $\Delta \chi^2$ obtained by the discrete \sch code for

2455: a number of \ml\, values distributed around the correct one

2456: ($\Upsilon_0^*$). The curves connecting the computed models are

2457: polynomial fits of 5th order. When the number of datapoints $N$ is

2458: smaller, the $\Delta \chi^2$ parabola is shallower, and the

2459: statistical uncertainty on the inferred $\Upsilon$ is larger. The

2460: lowest curve is shown at its actual $\Delta \chi^2$.  Each subsequent

2461: curve was offset vertically by a value of 40 for visual clarity.}

2462: \end{figure}

2463:

2464: \clearpage

2465:

2466: \begin{figure}

2467: \vspace*{-1cm}

2468: %\plotone{ML_parabolas.type.eps}

2469: \plotone{f8.eps}

2470: %{\hspace{-15cm}

2471: %\plotfiddle{ML_parabola.55is_90is.40x14x14.eps}{1cm}{0}{400}{500}{-25}{0}}

2472: %\vspace*{-3cm}

2473: \caption{\label{fig:MLparab2} $\Delta \chi^2$ parabolae illustrating

2474: the recovery of the input mass-to-light ratio $\Upsilon_0^*$ for

2475: datasets with different types of kinematic information. All input

2476: datasets are of the 55is case ($i=55\grad$ with intermediate

2477: streaming) with 1000 measurements. As in Figure \ref{fig:MLparabN},

2478: all \sch models have been computed using our small orbit library, with

2479: $20\times14\times7$ combinations of the $(E,L_z,I_3)$ integrals of

2480: motion.  When fewer velocity components are observed, the $\Delta

2481: \chi^2$ parabola is shallower, and the statistical uncertainty on the

2482: inferred $\Upsilon$ is larger.

2483: %The lowest curve is shown at its actual

2484: %$\Delta \chi^2$.  Each subsequent curve was offset vertically by a

2485: %value of 50 for visual clarity.

2486: }

2487: \end{figure}

2488:

2489: \clearpage

2490:

2491: \begin{figure}

2492: \vspace*{-1cm}

2493: %\plotone{errors_ML.eps}

2494: \plotone{f9.eps}

2495: %{\hspace{-15cm}

2496: %\plotfiddle{ML_parabola.55is_90is.40x14x14.eps}{1cm}{0}{400}{500}{-25}{0}}

2497: %\vspace*{-3cm}

2498: \caption{\label{fig:errors_ML} Uncertainties in the recovery of the

2499: input mass-to-light ratio $\Upsilon_0^*$ as a function of the number

2500: of available kinematic measurements, and for input datasets with

2501: varying types of kinematic information. The upper panel shows the

2502: behavior of the statistical uncertainty in the determination of the

2503: best-fit \ml, i.e., the $1\sigma$ interval around the minimum of the

2504: corresponding parabolae in Figures \ref{fig:MLparabN} and

2505: \ref{fig:MLparab2}. The dashed lines in the upper panel have a slope

2506: of $-1/2$ and serve to demonstrate that the errors given by the \sch

2507: code roughly satisfy the $N^{-1/2}$ scaling expected from number

2508: statistics. The bottom panel shows the difference between the input

2509: mass-to-light ratio ($\Upsilon_0^*$) and the best-fit \ml\, given by

2510: the \sch code (i.e., the minimum of the parabolae of Figures

2511: \ref{fig:MLparabN} and \ref{fig:MLparab2}). The error bars are the

2512: $1\sigma$ errors from the upper panel. All \sch models in this figure

2513: have been computed using our small orbit library, with

2514: $20\times14\times7$ combinations of the $(E,L_z,I_3)$ integrals of

2515: motion. }

2516: \end{figure}

2517:

2518: \clearpage

2519:

2520: \begin{figure}

2521: \vspace*{0cm}

2522: %\plotone{delta_chisqr.fine_grid.55is_los_mu.eps}

2523: \plotone{f10.eps}

2524: %{\hspace*{-18cm}

2525: %\plotfiddle{55is.onlyLOS_vs_onlyMU.1000.ps}{1cm}{0}{1.0}{1.0}{0}{0}}

2526: %\vspace*{-3cm}

2527: \caption{\label{fig:55is_LOS_MU} Comparison of discrete \sch models

2528: based on data consisting purely of LOS velocities (upper panel) and

2529: purely of proper motions (lower panel), for the 55is dataset with 1000

2530: kinematic measurements and libraries with $20\times14\times7$

2531: orbits. The lines are $\Delta\chi^2$ contours overlaid on grids of

2532: actually computed models (indicated by the small dots) with different

2533: combinations of inclination and mass-to-light ratio \ml. The correct

2534: input model is indicated as a large black dot, and the best-fit model

2535: as a star.  The first three contours are spaced in increments of

2536: $1\sigma$ confidence, with the $3\sigma$ contour (99.7\% confidence

2537: level) highlighted with a thick line. Discrete \sch fits to LOS

2538: velocities alone and to proper motions alone both satisfactorily recover the

2539: input mass-to-light ratio, but not the input inclination. In terms of

2540: the uncertainties in the best-fit parameters (i.e., the size of the

2541: confidence intervals), proper motions provide tighter constraints than

2542: LOS velocities. }

2543: \end{figure}

2544:

2545: \clearpage

2546:

2547: \pagestyle{empty}

2548: \begin{figure}

2549: \vspace*{-25mm}

2550: %\plotone{delta_chisqr.fine_grid.library_size.eps}

2551: \plotone{f11.eps}

2552: %{\hspace{-15cm}

2553: %\plotfiddle{}{1cm}{0}{400}{500}{-25}{0}}

2554: %\vspace*{-3cm}

2555: \caption{\label{fig:grid_55_90} Recovery of the input inclination and

2556: mass-to-light ratio, when both are assumed unknown, for the 55is and 90is

2557: datasets (left- and right-hand panels, respectively). Shown are the

2558: $\Delta\chi^2$ contours obtained from grids of \sch models constructed

2559: using orbit libraries with different sampling of the available

2560: $(E,L_z,I_3)$ integral space. Upper panels correspond to libraries

2561: with $20\times14\times7$ orbits, while lower panels are based on

2562: libraries with $40\times28\times14$ orbits, i.e., with 8 times finer

2563: sampling.  The input mass-to-light ratio $\Upsilon_0^*$ is

2564: satisfactorily recovered regardless of the number of orbits (in all

2565: cases inside the $2\sigma$ confidence level). In terms of inclination,

2566: the shapes of the contours indicate that there may be two separate

2567: maxima providing similarly good fits to the data. For the smaller

2568: orbit library, the best-fit inclinations converge to the wrong

2569: solution, $i\approx 70\grad$, for both datasets. Nevertheless, the

2570: correct inclination is encompassed by the secondary maximum in the

2571: 55is case (upper left), and a clear elongation of the contours towards

2572: higher inclination is seen in the 90is case (upper right). When using

2573: the larger orbit library, however, the best-fit inclination is

2574: $i=57\grad$ for the 55is dataset, and $i=80\grad$ for the 90is

2575: dataset, in better agreement with the true values.  }

2576: \end{figure}

2577:

2578: \clearpage

2579:


2631:

2632: \end{document}

2633:

2634: %%

2635: %% End of file `sample.tex'.

2636: